English Pages [4164] Year 2022
Econometrics Toolbox™ User's Guide
R2022b
How to Contact MathWorks
Latest news:
www.mathworks.com
Sales and services:
www.mathworks.com/sales_and_services
User community:
www.mathworks.com/matlabcentral
Technical support:
www.mathworks.com/support/contact_us
Phone:
508-647-7000
The MathWorks, Inc.
1 Apple Hill Drive
Natick, MA 01760-2098

Econometrics Toolbox™ User's Guide
© COPYRIGHT 1999–2022 by The MathWorks, Inc.

The software described in this document is furnished under a license agreement. The software may be used or copied only under the terms of the license agreement. No part of this manual may be photocopied or reproduced in any form without prior written consent from The MathWorks, Inc.

FEDERAL ACQUISITION: This provision applies to all acquisitions of the Program and Documentation by, for, or through the federal government of the United States. By accepting delivery of the Program or Documentation, the government hereby agrees that this software or documentation qualifies as commercial computer software or commercial computer software documentation as such terms are used or defined in FAR 12.212, DFARS Part 227.72, and DFARS 252.227-7014. Accordingly, the terms and conditions of this Agreement and only those rights specified in this Agreement, shall pertain to and govern the use, modification, reproduction, release, performance, display, and disclosure of the Program and Documentation by the federal government (or other entity acquiring for or through the federal government) and shall supersede any conflicting contractual terms or conditions. If this License fails to meet the government's needs or is inconsistent in any respect with federal procurement law, the government agrees to return the Program and Documentation, unused, to The MathWorks, Inc.
Trademarks
MATLAB and Simulink are registered trademarks of The MathWorks, Inc. See www.mathworks.com/trademarks for a list of additional trademarks. Other product or brand names may be trademarks or registered trademarks of their respective holders.
Patents
MathWorks products are protected by one or more U.S. patents. Please see www.mathworks.com/patents for more information.
Revision History
October 2008      Online only    Version 1.0 (Release 2008b)
March 2009        Online only    Revised for Version 1.1 (Release 2009a)
September 2009    Online only    Revised for Version 1.2 (Release 2009b)
March 2010        Online only    Revised for Version 1.3 (Release 2010a)
September 2010    Online only    Revised for Version 1.4 (Release 2010b)
April 2011        Online only    Revised for Version 2.0 (Release 2011a)
September 2011    Online only    Revised for Version 2.0.1 (Release 2011b)
March 2012        Online only    Revised for Version 2.1 (Release 2012a)
September 2012    Online only    Revised for Version 2.2 (Release 2012b)
March 2013        Online only    Revised for Version 2.3 (Release 2013a)
September 2013    Online only    Revised for Version 2.4 (Release 2013b)
March 2014        Online only    Revised for Version 3.0 (Release 2014a)
October 2014      Online only    Revised for Version 3.1 (Release 2014b)
March 2015        Online only    Revised for Version 3.2 (Release 2015a)
September 2015    Online only    Revised for Version 3.3 (Release 2015b)
March 2016        Online only    Revised for Version 3.4 (Release 2016a)
September 2016    Online only    Revised for Version 3.5 (Release 2016b)
March 2017        Online only    Revised for Version 4.0 (Release 2017a)
September 2017    Online only    Revised for Version 4.1 (Release 2017b)
March 2018        Online only    Revised for Version 5.0 (Release 2018a)
September 2018    Online only    Revised for Version 5.1 (Release 2018b)
March 2019        Online only    Revised for Version 5.2 (Release 2019a)
September 2019    Online only    Revised for Version 5.3 (Release 2019b)
March 2020        Online only    Revised for Version 5.4 (Release 2020a)
September 2020    Online only    Revised for Version 5.5 (Release 2020b)
March 2021        Online only    Revised for Version 5.6 (Release 2021a)
September 2021    Online only    Revised for Version 5.7 (Release 2021b)
March 2022        Online only    Revised for Version 6.0 (Release 2022a)
September 2022    Online only    Revised for Version 6.1 (Release 2022b)
Contents
1 Getting Started

Econometrics Toolbox Product Description . . . . 1-2
Econometric Modeling . . . . 1-3
    Model Selection . . . . 1-3
    Econometrics Toolbox Features . . . . 1-3
Represent Time Series Models Using Econometrics Toolbox Objects . . . . 1-7
    Model Objects . . . . 1-7
    Model Properties . . . . 1-10
    Create Model Object . . . . 1-11
    Retrieve Model Properties . . . . 1-15
    Modify Model Properties . . . . 1-16
    Object Functions . . . . 1-17
Stochastic Process Characteristics . . . . 1-18
    What Is a Stochastic Process? . . . . 1-18
    Stationary Processes . . . . 1-19
    Linear Time Series Model . . . . 1-19
    Unit Root Process . . . . 1-20
    Lag Operator Notation . . . . 1-21
    Characteristic Equation . . . . 1-22
Bibliography . . . . 1-24

2 Data Preprocessing

Data Transformations . . . . 2-2
    Why Transform? . . . . 2-2
    Common Data Transformations . . . . 2-2
Trend-Stationary vs. Difference-Stationary Processes . . . . 2-6
    Nonstationary Processes . . . . 2-6
    Trend Stationary . . . . 2-7
    Difference Stationary . . . . 2-7
Specify Lag Operator Polynomials . . . . 2-9
    Lag Operator Polynomial of Coefficients . . . . 2-9
    Difference Lag Operator Polynomials . . . . 2-11
Nonseasonal Differencing . . . . 2-13
Nonseasonal and Seasonal Differencing . . . . 2-16
Time Series Decomposition . . . . 2-19
Moving Average Filter . . . . 2-21
Moving Average Trend Estimation . . . . 2-22
Parametric Trend Estimation . . . . 2-24
Hodrick-Prescott Filter . . . . 2-29
Use Hodrick-Prescott Filter to Reproduce Original Result . . . . 2-30
Seasonal Filters . . . . 2-34
    What Is a Seasonal Filter? . . . . 2-34
    Stable Seasonal Filter . . . . 2-34
    Sn × m seasonal filter . . . . 2-35
Seasonal Adjustment . . . . 2-37
    What Is Seasonal Adjustment? . . . . 2-37
    Deseasonalized Series . . . . 2-37
    Seasonal Adjustment Process . . . . 2-37
Seasonal Adjustment Using a Stable Seasonal Filter . . . . 2-39
Seasonal Adjustment Using S(n,m) Seasonal Filters . . . . 2-45

3 Model Selection

Box-Jenkins Methodology . . . . 3-2
Select ARIMA Model for Time Series Using Box-Jenkins Methodology . . . . 3-4
Autocorrelation and Partial Autocorrelation . . . . 3-11
    What Are Autocorrelation and Partial Autocorrelation? . . . . 3-11
    Theoretical ACF and PACF . . . . 3-11
    Sample ACF and PACF . . . . 3-11
    Compute Sample ACF and PACF in MATLAB® . . . . 3-12
Ljung-Box Q-Test . . . . 3-18
Detect Autocorrelation . . . . 3-20
    Compute Sample ACF and PACF . . . . 3-20
    Conduct the Ljung-Box Q-Test . . . . 3-22
Engle’s ARCH Test . . . . 3-26
Detect ARCH Effects . . . . 3-28
    Test Autocorrelation of Squared Residuals . . . . 3-28
    Conduct Engle's ARCH Test . . . . 3-30
Unit Root Nonstationarity . . . . 3-33
    What Is a Unit Root Test? . . . . 3-33
    Modeling Unit Root Processes . . . . 3-33
    Available Tests . . . . 3-37
    Testing for Unit Roots . . . . 3-38
Unit Root Tests . . . . 3-41
    Test Simulated Data for a Unit Root . . . . 3-41
    Test Time Series Data for Unit Root . . . . 3-45
    Test Stock Data for a Random Walk . . . . 3-48
Assess Stationarity of a Time Series . . . . 3-51
Information Criteria for Model Selection . . . . 3-54
    Compute Information Criteria Using aicbic . . . . 3-54
Model Comparison Tests . . . . 3-58
    Available Tests . . . . 3-58
    Likelihood Ratio Test . . . . 3-60
    Lagrange Multiplier Test . . . . 3-60
    Wald Test . . . . 3-60
    Covariance Matrix Estimation . . . . 3-61
Conduct Lagrange Multiplier Test . . . . 3-62
Conduct Wald Test . . . . 3-65
Compare GARCH Models Using Likelihood Ratio Test . . . . 3-67
Classical Model Misspecification Tests . . . . 3-70
Check Fit of Multiplicative ARIMA Model . . . . 3-81
Goodness of Fit . . . . 3-86
Residual Diagnostics . . . . 3-87
    Check Residuals for Normality . . . . 3-87
    Check Residuals for Autocorrelation . . . . 3-87
    Check Residuals for Conditional Heteroscedasticity . . . . 3-87
Assess Predictive Performance . . . . 3-89
Nonspherical Models . . . . 3-90
    What Are Nonspherical Models? . . . . 3-90
Plot a Confidence Band Using HAC Estimates . . . . 3-91
Change the Bandwidth of a HAC Estimator . . . . 3-98
Check Model Assumptions for Chow Test . . . . 3-104
Power of the Chow Test . . . . 3-110

4 Econometric Modeler

Analyze Time Series Data Using Econometric Modeler . . . . 4-2
    Prepare Data for Econometric Modeler App . . . . 4-3
    Import Time Series Variables . . . . 4-4
    Perform Exploratory Data Analysis . . . . 4-6
    Fitting Models to Data . . . . 4-15
    Conducting Goodness-of-Fit Checks . . . . 4-30
    Finding Model with Best In-Sample Fit . . . . 4-36
    Export Session Results . . . . 4-38
Specifying Univariate Lag Operator Polynomials Interactively . . . . 4-44
    Specify Lag Structure Using Lag Order Tab . . . . 4-45
    Specify Lag Structure Using Lag Vector Tab . . . . 4-47
Specifying Multivariate Lag Operator Polynomials and Coefficient Constraints Interactively . . . . 4-50
    Specify Lag Structure Using Lag Order Tab . . . . 4-51
    Specify Lag Structure Using Lag Vector Tab . . . . 4-52
    Specify Coefficient Matrix Equality Constraints for Estimation . . . . 4-54
Prepare Time Series Data for Econometric Modeler App . . . . 4-59
    Prepare Table of Multivariate Data for Import . . . . 4-59
    Prepare Numeric Vector for Import . . . . 4-60
Import Time Series Data into Econometric Modeler App . . . . 4-62
    Import Data from MATLAB Workspace . . . . 4-62
    Import Data from MAT-File . . . . 4-63
Plot Time Series Data Using Econometric Modeler App . . . . 4-66
    Plot Univariate Time Series Data . . . . 4-66
    Plot Multivariate Time Series and Correlations . . . . 4-67
Detect Serial Correlation Using Econometric Modeler App . . . . 4-71
    Plot ACF and PACF . . . . 4-71
    Conduct Ljung-Box Q-Test for Significant Autocorrelation . . . . 4-73
Detect ARCH Effects Using Econometric Modeler App . . . . 4-77
    Inspect Correlograms of Squared Residuals for ARCH Effects . . . . 4-77
    Conduct Ljung-Box Q-Test on Squared Residuals . . . . 4-80
    Conduct Engle's ARCH Test . . . . 4-82
Assess Stationarity of Time Series Using Econometric Modeler . . . . 4-84
    Test Assuming Unit Root Null Model . . . . 4-84
    Test Assuming Stationary Null Model . . . . 4-87
    Test Assuming Random Walk Null Model . . . . 4-90
Assess Collinearity Among Multiple Series Using Econometric Modeler App . . . . 4-94
Transform Time Series Using Econometric Modeler App . . . . 4-97
    Apply Log Transformation to Data . . . . 4-97
    Stabilize Time Series Using Nonseasonal Differencing . . . . 4-101
    Convert Prices to Returns . . . . 4-104
    Remove Seasonal Trend from Time Series Using Seasonal Difference . . . . 4-107
    Remove Deterministic Trend from Time Series . . . . 4-109
Implement Box-Jenkins Model Selection and Estimation Using Econometric Modeler App . . . . 4-112
Select ARCH Lags for GARCH Model Using Econometric Modeler App . . . . 4-122
Estimate Multiplicative ARIMA Model Using Econometric Modeler App . . . . 4-131
Perform ARIMA Model Residual Diagnostics Using Econometric Modeler App . . . . 4-141
Specify t Innovation Distribution Using Econometric Modeler App . . . . 4-150
Estimate Vector Autoregression Model Using Econometric Modeler . . . . 4-155
Conduct Cointegration Test Using Econometric Modeler . . . . 4-170
Estimate Vector Error-Correction Model Using Econometric Modeler . . . . 4-180
Compare Predictive Performance After Creating Models Using Econometric Modeler . . . . 4-193
Estimate ARIMAX Model Using Econometric Modeler App . . . . 4-200
Estimate Regression Model with ARMA Errors Using Econometric Modeler App . . . . 4-208
Compare Conditional Variance Model Fit Statistics Using Econometric Modeler App . . . . 4-221
Perform GARCH Model Residual Diagnostics Using Econometric Modeler App . . . . 4-230
Share Results of Econometric Modeler App Session . . . . 4-237

5 Time Series Regression Models

Time Series Regression Models . . . . 5-3
Regression Models with Time Series Errors . . . . 5-5
    What Are Regression Models with Time Series Errors? . . . . 5-5
    Conventions . . . . 5-5
Create Regression Models with ARIMA Errors . . . . 5-8
    Default Regression Model with ARIMA Errors Specifications . . . . 5-8
    Specify regARIMA Models Using Name-Value Pair Arguments . . . . 5-9
    Specify Linear Regression Models Using Econometric Modeler App . . . . 5-15
Specify the Default Regression Model with ARIMA Errors . . . . 5-19
Modify regARIMA Model Properties . . . . 5-21
    Modify Properties Using Dot Notation . . . . 5-21
    Nonmodifiable Properties . . . . 5-23
Create Regression Models with AR Errors . . . . 5-26
    Default Regression Model with AR Errors . . . . 5-26
    AR Error Model Without an Intercept . . . . 5-27
    AR Error Model with Nonconsecutive Lags . . . . 5-27
    Known Parameter Values for a Regression Model with AR Errors . . . . 5-28
    Regression Model with AR Errors and t Innovations . . . . 5-29
Create Regression Models with MA Errors . . . . 5-31
    Default Regression Model with MA Errors . . . . 5-31
    MA Error Model Without an Intercept . . . . 5-32
    MA Error Model with Nonconsecutive Lags . . . . 5-32
    Known Parameter Values for a Regression Model with MA Errors . . . . 5-33
    Regression Model with MA Errors and t Innovations . . . . 5-34
Create Regression Models with ARMA Errors . . . . 5-36
    Default Regression Model with ARMA Errors . . . . 5-36
    ARMA Error Model Without an Intercept . . . . 5-37
    ARMA Error Model with Nonconsecutive Lags . . . . 5-37
    Known Parameter Values for a Regression Model with ARMA Errors . . . . 5-38
    Regression Model with ARMA Errors and t Innovations . . . . 5-38
    Specify Regression Model with ARMA Errors Using Econometric Modeler App . . . . 5-40
Create Regression Models with ARIMA Errors . . . . 5-44
    Default Regression Model with ARIMA Errors . . . . 5-44
    ARIMA Error Model Without an Intercept . . . . 5-45
    ARIMA Error Model with Nonconsecutive Lags . . . . 5-45
    Known Parameter Values for a Regression Model with ARIMA Errors . . . . 5-46
    Regression Model with ARIMA Errors and t Innovations . . . . 5-47
Create Regression Models with SARIMA Errors . . . . 5-49
    SARMA Error Model Without an Intercept . . . . 5-49
    Known Parameter Values for a Regression Model with SARIMA Errors . . . . 5-50
    Regression Model with SARIMA Errors and t Innovations . . . . 5-50
Specify Regression Model with SARIMA Errors . . . . 5-53
Specify ARIMA Error Model Innovation Distribution . . . . 5-59
    About the Innovation Process . . . . 5-59
    Innovation Distribution Options . . . . 5-60
    Specify Innovation Distribution . . . . 5-60
Impulse Response of Regression Models with ARIMA Errors . . . . 5-64
Plot Impulse Response of Regression Model with ARIMA Errors . . . . 5-65
    Regression Model with AR Errors . . . . 5-65
    Regression Model with MA Errors . . . . 5-66
    Regression Model with ARMA Errors . . . . 5-67
    Regression Model with ARIMA Errors . . . . 5-69
Maximum Likelihood Estimation of regARIMA Models . . . . 5-72
    Innovation Distribution . . . . 5-72
    Loglikelihood Functions . . . . 5-72
regARIMA Model Estimation Using Equality Constraints . . . . 5-74
Presample Values for regARIMA Model Estimation . . . . 5-78
Initial Values for regARIMA Model Estimation . . . . 5-80
Optimization Settings for regARIMA Model Estimation . . . . 5-82
    Optimization Options . . . . 5-82
    Constraints on Regression Models with ARIMA Errors . . . . 5-84
Estimate Regression Model with ARIMA Errors . . . . 5-85
Estimate a Regression Model with Multiplicative ARIMA Errors . . . . 5-92
Select Regression Model with ARIMA Errors . . . . 5-100
Choose Lags for ARMA Error Model . . . . 5-102
Intercept Identifiability in Regression Models with ARIMA Errors . . . . 5-106
    Intercept Identifiability . . . . 5-106
    Intercept Identifiability Illustration . . . . 5-107
Alternative ARIMA Model Representations . . . . 5-110
    Mathematical Development of regARIMA to ARIMAX Model Conversion . . . . 5-110
    Show Conversion in MATLAB® . . . . 5-112
Simulate Regression Models with ARMA Errors . . . . 5-116
    Simulate an AR Error Model . . . . 5-116
    Simulate an MA Error Model . . . . 5-122
    Simulate an ARMA Error Model . . . . 5-128
Simulate Regression Models with Nonstationary Errors . . . . 5-135
    Simulate a Regression Model with Nonstationary Errors . . . . 5-135
    Simulate a Regression Model with Nonstationary Exponential Errors . . . . 5-138
Simulate Regression Models with Multiplicative Seasonal Errors . . . . 5-143
    Simulate a Regression Model with Stationary Multiplicative Seasonal Errors . . . . 5-143
    Untitled . . . . 5-145
Monte Carlo Simulation of Regression Models with ARIMA Errors . . . . 5-148
    What Is Monte Carlo Simulation? . . . . 5-148
    Generate Monte Carlo Sample Paths . . . . 5-148
    Monte Carlo Error . . . . 5-149
Presample Data for regARIMA Model Simulation . . . . 5-151
Transient Effects in regARIMA Model Simulations . . . . 5-152
    What Are Transient Effects? . . . . 5-152
    Illustration of Transient Effects on Regression . . . . 5-152
Forecast a Regression Model with ARIMA Errors . . . . 5-160
Forecast a Regression Model with Multiplicative Seasonal ARIMA Errors . . . . 5-163
Verify Predictive Ability Robustness of a regARIMA Model . . . . 5-167
MMSE Forecasting Regression Models with ARIMA Errors . . . . 5-169
    What Are MMSE Forecasts? . . . . 5-169
    How forecast Generates MMSE Forecasts . . . . 5-169
    Forecast Error . . . . 5-171
Monte Carlo Forecasting of regARIMA Models . . . . 5-172
    Monte Carlo Forecasts . . . . 5-172
    Advantage of Monte Carlo Forecasts . . . . 5-172
Time Series Regression I: Linear Models . . . . 5-173
Time Series Regression II: Collinearity and Estimator Variance . . . . 5-180
Time Series Regression III: Influential Observations . . . . 5-190
Time Series Regression IV: Spurious Regression . . . . 5-197
Time Series Regression V: Predictor Selection . . . . 5-209
Time Series Regression VI: Residual Diagnostics . . . . 5-220
Time Series Regression VII: Forecasting . . . . 5-231
Time Series Regression VIII: Lagged Variables and Estimator Bias . . . . 5-240
Time Series Regression IX: Lag Order Selection . . . . 5-261
Time Series Regression X: Generalized Least Squares and HAC Estimators . . . . 5-279

6 Bayesian Linear Regression

Bayesian Linear Regression . . . . 6-2
    Classical Versus Bayesian Analyses . . . . 6-2
    Main Bayesian Analysis Components . . . . 6-3
    Posterior Estimation and Inference . . . . 6-4
Implement Bayesian Linear Regression . . . . 6-10
    Workflow for Standard Bayesian Linear Regression Models . . . . 6-10
7
Workflow for Bayesian Predictor Selection . . . . . . . . . . . . . . . . . . . . . . .
6-13
Specify Gradient for HMC Sampler . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
6-18
Posterior Estimation and Simulation Diagnostics . . . . . . . . . . . . . . . . . . Diagnose MCMC Samples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Perform Sensitivity Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
6-27 6-27 6-34
Tune Slice Sampler for Posterior Estimation . . . . . . . . . . . . . . . . . . . . . .
6-36
Compare Robust Regression Techniques . . . . . . . . . . . . . . . . . . . . . . . . . .
6-43
Bayesian Lasso Regression . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
6-52
Bayesian Stochastic Search Variable Selection . . . . . . . . . . . . . . . . . . . .
6-63
Replacing Removed Syntaxes of estimate . . . . . . . . . . . . . . . . . . . . . . . . . Replace Removed Syntax When Estimating Analytical Marginal Posterior ..................................................... Replace Removed Syntax When Estimating Numerical Marginal Posterior ..................................................... Replace Removed Syntax When Estimating Conditional Posterior . . . . . .
6-73 6-74 6-75 6-77
Conditional Mean Models
Conditional Mean Models . . . . . 7-3
Unconditional vs. Conditional Mean . . . . . 7-3
Static vs. Dynamic Conditional Mean Models . . . . . 7-3
Conditional Mean Models for Stationary Processes . . . . . 7-3
Specify Conditional Mean Models . . . . . 7-5
Default ARIMA Model . . . . . 7-5
Specify Nonseasonal Models Using Name-Value Pairs . . . . . 7-7
Specify Multiplicative Models Using Name-Value Pairs . . . . . 7-11
Specify Conditional Mean Model Using Econometric Modeler App . . . . . 7-14
Autoregressive Model . . . . . 7-17
AR(p) Model . . . . . 7-17
Stationarity of the AR Model . . . . . 7-17
AR Model Specifications . . . . . 7-19
Default AR Model . . . . . 7-19
AR Model with No Constant Term . . . . . 7-19
AR Model with Nonconsecutive Lags . . . . . 7-20
ARMA Model with Known Parameter Values . . . . . 7-21
AR Model with t Innovation Distribution . . . . . 7-22
Specify AR Model Using Econometric Modeler App . . . . . 7-22
Moving Average Model . . . . . 7-26
MA(q) Model . . . . . 7-26
Invertibility of the MA Model . . . . . 7-26
MA Model Specifications . . . . . 7-28
Default MA Model . . . . . 7-28
MA Model with No Constant Term . . . . . 7-28
MA Model with Nonconsecutive Lags . . . . . 7-29
MA Model with Known Parameter Values . . . . . 7-30
MA Model with t Innovation Distribution . . . . . 7-31
Specify MA Model Using Econometric Modeler App . . . . . 7-31
Autoregressive Moving Average Model . . . . . 7-35
ARMA(p,q) Model . . . . . 7-35
Stationarity and Invertibility of the ARMA Model . . . . . 7-35
ARMA Model Specifications . . . . . 7-37
Default ARMA Model . . . . . 7-37
ARMA Model with No Constant Term . . . . . 7-37
ARMA Model with Known Parameter Values . . . . . 7-38
Specify ARMA Model Using Econometric Modeler App . . . . . 7-39
ARIMA Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
7-42
ARIMA Model Specifications . . . . . 7-44
Default ARIMA Model . . . . . 7-44
ARIMA Model with Known Parameter Values . . . . . 7-45
Specify ARIMA Model Using Econometric Modeler App . . . . . 7-45
Multiplicative ARIMA Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
7-49
Multiplicative ARIMA Model Specifications . . . . . 7-51
Seasonal ARIMA Model with No Constant Term . . . . . 7-51
Seasonal ARIMA Model with Known Parameter Values . . . . . 7-52
Specify Multiplicative ARIMA Model Using Econometric Modeler App . . . . . 7-53
Specify Multiplicative ARIMA Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
7-58
ARIMA Model Including Exogenous Covariates . . . . . 7-62
ARIMAX(p,D,q) Model . . . . . 7-62
Conventions and Extensions of the ARIMAX Model . . . . . 7-62
ARIMAX Model Specifications . . . . . 7-64
Create ARIMAX Model Using Name-Value Pairs . . . . . 7-64
Specify ARMAX Model Using Dot Notation . . . . . 7-65
Specify ARIMAX or SARIMAX Model Using Econometric Modeler App . . . . . 7-66
Modify Properties of Conditional Mean Model Objects . . . . . 7-70
Dot Notation . . . . . 7-70
Nonmodifiable Properties . . . . . 7-73
Specify Conditional Mean Model Innovation Distribution . . . . . 7-76
About the Innovation Process . . . . . 7-76
Choices for the Variance Model . . . . . 7-76
Choices for the Innovation Distribution . . . . . 7-76
Specify the Innovation Distribution . . . . . 7-77
Modify the Innovation Distribution . . . . . 7-79
Specify Conditional Mean and Variance Models . . . . . . . . . . . . . . . . . . . .
7-81
Plot the Impulse Response Function of Conditional Mean Model . . . . . 7-86
IRF of Moving Average Model . . . . . 7-86
IRF of Autoregressive Model . . . . . 7-90
IRF of ARMA Model . . . . . 7-95
IRF of Seasonal AR Model . . . . . 7-97
More About the Impulse Response Function . . . . . 7-101
Time Base Partitions for ARIMA Model Estimation . . . . . 7-103
Partition Time Series Data for Estimation . . . . . 7-105
Box-Jenkins Differencing vs. ARIMA Estimation . . . . . . . . . . . . . . . . . .
7-109
Maximum Likelihood Estimation for Conditional Mean Models . . . . . 7-112
Innovation Distribution . . . . . 7-112
Loglikelihood Functions . . . . . 7-112
Conditional Mean Model Estimation with Equality Constraints . . . . . .
7-114
Presample Data for Conditional Mean Model Estimation . . . . . . . . . . .
7-115
Initial Values for Conditional Mean Model Estimation . . . . . . . . . . . . .
7-117
Optimization Settings for Conditional Mean Model Estimation . . . . . 7-119
Optimization Options . . . . . 7-119
Conditional Mean Model Constraints . . . . . 7-121
Estimate Multiplicative ARIMA Model . . . . . . . . . . . . . . . . . . . . . . . . . .
7-122
Model Seasonal Lag Effects Using Indicator Variables . . . . . . . . . . . . .
7-125
Forecast IGD Rate from ARX Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
7-129
Estimate Conditional Mean and Variance Model . . . . . . . . . . . . . . . . . .
7-135
Choose ARMA Lags Using BIC . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
7-140
Infer Residuals for Diagnostic Checking . . . . . . . . . . . . . . . . . . . . . . . . .
7-143
Monte Carlo Simulation of Conditional Mean Models . . . . . 7-148
What Is Monte Carlo Simulation? . . . . . 7-148
Generate Monte Carlo Sample Paths . . . . . 7-148
Monte Carlo Error . . . . . 7-149
Presample Data for Conditional Mean Model Simulation . . . . . . . . . . .
7-150
Transient Effects in Conditional Mean Model Simulations . . . . . . . . . .
7-151
Simulate Stationary Processes . . . . . 7-152
Simulate AR Process . . . . . 7-152
Simulate MA Process . . . . . 7-156
Simulate Trend-Stationary and Difference-Stationary Processes . . . . .
7-160
Simulate Multiplicative ARIMA Models . . . . . . . . . . . . . . . . . . . . . . . . .
7-164
Simulate Conditional Mean and Variance Models . . . . . . . . . . . . . . . . .
7-167
Monte Carlo Forecasting of Conditional Mean Models . . . . . 7-171
Monte Carlo Forecasts . . . . . 7-171
Advantage of Monte Carlo Forecasting . . . . . 7-171
MMSE Forecasting of Conditional Mean Models . . . . . 7-172
What Are MMSE Forecasts? . . . . . 7-172
How forecast Generates MMSE Forecasts . . . . . 7-172
Forecast Error . . . . . 7-173
Convergence of AR Forecasts . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
7-175
Forecast Multiplicative ARIMA Model . . . . . . . . . . . . . . . . . . . . . . . . . . .
7-179
Specify Presample and Forecast Period Data to Forecast ARIMAX Model . . . . . 7-182
8
Forecast Conditional Mean and Variance Model . . . . . . . . . . . . . . . . . . .
7-186
Model and Simulate Electricity Spot Prices Using the Skew-Normal Distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
7-189
Conditional Variance Models
Conditional Variance Models . . . . . 8-2
General Conditional Variance Model Definition . . . . . 8-2
GARCH Model . . . . . 8-3
EGARCH Model . . . . . 8-3
GJR Model . . . . . 8-4
Specify GARCH Models . . . . . 8-6
Default GARCH Model . . . . . 8-6
Specify Default GARCH Model . . . . . 8-7
Using Name-Value Pair Arguments . . . . . 8-8
Specify GARCH Model Using Econometric Modeler App . . . . . 8-11
Specify GARCH Model with Mean Offset . . . . . 8-13
Specify GARCH Model with Known Parameter Values . . . . . 8-14
Specify GARCH Model with t Innovation Distribution . . . . . 8-14
Specify GARCH Model with Nonconsecutive Lags . . . . . 8-15
Specify EGARCH Models . . . . . 8-17
Default EGARCH Model . . . . . 8-17
Specify Default EGARCH Model . . . . . 8-18
Using Name-Value Pair Arguments . . . . . 8-19
Specify EGARCH Model Using Econometric Modeler App . . . . . 8-22
Specify EGARCH Model with Mean Offset . . . . . 8-24
Specify EGARCH Model with Nonconsecutive Lags . . . . . 8-24
Specify EGARCH Model with Known Parameter Values . . . . . 8-25
Specify EGARCH Model with t Innovation Distribution . . . . . 8-26
Specify GJR Models . . . . . 8-28
Default GJR Model . . . . . 8-28
Specify Default GJR Model . . . . . 8-29
Using Name-Value Pair Arguments . . . . . 8-30
Specify GJR Model Using Econometric Modeler App . . . . . 8-33
Specify GJR Model with Mean Offset . . . . . 8-35
Specify GJR Model with Nonconsecutive Lags . . . . . 8-35
Specify GJR Model with Known Parameter Values . . . . . 8-36
Specify GJR Model with t Innovation Distribution . . . . . 8-37
Modify Properties of Conditional Variance Models . . . . . 8-39
Dot Notation . . . . . 8-39
Nonmodifiable Properties . . . . . 8-41
Specify the Conditional Variance Model Innovation Distribution . . . . . .
8-44
Specify Conditional Variance Model for Exchange Rates . . . . . . . . . . . . .
8-47
Maximum Likelihood Estimation for Conditional Variance Models . . . . . 8-52
Innovation Distribution . . . . . 8-52
Loglikelihood Functions . . . . . 8-52
Conditional Variance Model Estimation with Equality Constraints . . . .
8-54
Presample Data for Conditional Variance Model Estimation . . . . . . . . . .
8-55
Initial Values for Conditional Variance Model Estimation . . . . . . . . . . . .
8-57
Optimization Settings for Conditional Variance Model Estimation . . . . . 8-58
Optimization Options . . . . . 8-58
Conditional Variance Model Constraints . . . . . 8-60
Infer Conditional Variances and Residuals . . . . . . . . . . . . . . . . . . . . . . . .
8-62
Likelihood Ratio Test for Conditional Variance Models . . . . . . . . . . . . . .
8-66
Compare Conditional Variance Models Using Information Criteria . . . .
8-69
Monte Carlo Simulation of Conditional Variance Models . . . . . 8-72
What Is Monte Carlo Simulation? . . . . . 8-72
Generate Monte Carlo Sample Paths . . . . . 8-72
Monte Carlo Error . . . . . 8-73
Presample Data for Conditional Variance Model Simulation . . . . . . . . . .
8-75
Simulate GARCH Models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
8-76
Assess EGARCH Forecast Bias Using Simulations . . . . . . . . . . . . . . . . . .
8-81
Simulate Conditional Variance Model . . . . . . . . . . . . . . . . . . . . . . . . . . . .
8-86
Monte Carlo Forecasting of Conditional Variance Models . . . . . 8-89
Monte Carlo Forecasts . . . . . 8-89
Advantage of Monte Carlo Forecasting . . . . . 8-89
9
MMSE Forecasting of Conditional Variance Models . . . . . 8-90
What Are MMSE Forecasts? . . . . . 8-90
EGARCH MMSE Forecasts . . . . . 8-90
How forecast Generates MMSE Forecasts . . . . . 8-90
Forecast GJR Models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
8-94
Forecast a Conditional Variance Model . . . . . . . . . . . . . . . . . . . . . . . . . . .
8-97
Converting from GARCH Functions to Model Objects . . . . . . . . . . . . . . .
8-99
Using Bootstrapping and Filtered Historical Simulation to Evaluate Market Risk . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
8-102
Using Extreme Value Theory and Copulas to Evaluate Market Risk . . .
8-114
Multivariate Time Series Models
Vector Autoregression (VAR) Models . . . . . 9-3
Types of Stationary Multivariate Time Series Models . . . . . 9-3
Lag Operator Representation . . . . . 9-5
Stable and Invertible Models . . . . . 9-6
Models with Regression Component . . . . . 9-6
VAR Model Workflow . . . . . 9-7
Multivariate Time Series Data Formats . . . . . 9-10
Multivariate Time Series Data . . . . . 9-10
Load Multivariate Economic Data . . . . . 9-10
Multivariate Data Format . . . . . 9-12
Preprocess Data . . . . . 9-15
Time Base Partitions for Estimation . . . . . 9-15
Partition Multivariate Time Series Data for Estimation . . . . . 9-18
Vector Autoregression (VAR) Model Creation . . . . . 9-20
Create VAR Model . . . . . 9-20
Fully Specified Model Object . . . . . 9-21
Model Template for Unrestricted Estimation . . . . . 9-23
Partially Specified Model Object for Restricted Estimation . . . . . 9-24
Display and Change Model Objects . . . . . 9-24
Select Appropriate Lag Order . . . . . 9-27
Create and Adjust VAR Model Using Shorthand Syntax . . . . . . . . . . . . . .
9-30
Create and Adjust VAR Model Using Longhand Syntax . . . . . . . . . . . . . .
9-32
VAR Model Estimation . . . . . 9-34
Preparing VAR Models for Fitting . . . . . 9-34
Fitting Models to Data . . . . . 9-34
Examining the Stability of a Fitted Model . . . . . 9-35
Convert VARMA Model to VAR Model . . . . . . . . . . . . . . . . . . . . . . . . . . . .
9-37
Fit VAR Model of CPI and Unemployment Rate . . . . . . . . . . . . . . . . . . . .
9-38
Fit VAR Model to Simulated Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
9-42
VAR Model Forecasting, Simulation, and Analysis . . . . . 9-44
VAR Model Forecasting . . . . . 9-44
Data Scaling . . . . . 9-46
Calculating Impulse Responses . . . . . 9-46
Generate VAR Model Impulse Responses . . . . . . . . . . . . . . . . . . . . . . . . .
9-48
Compare Generalized and Orthogonalized Impulse Response Functions . . . . . 9-52
Forecast VAR Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
9-60
Forecast VAR Model Using Monte Carlo Simulation . . . . . . . . . . . . . . . .
9-63
Forecast VAR Model Conditional Responses . . . . . . . . . . . . . . . . . . . . . . .
9-66
Implement Seemingly Unrelated Regression . . . . . . . . . . . . . . . . . . . . . .
9-70
Estimate Capital Asset Pricing Model Using SUR . . . . . . . . . . . . . . . . . .
9-75
Simulate Responses of Estimated VARX Model . . . . . . . . . . . . . . . . . . . .
9-78
Simulate VAR Model Conditional Responses . . . . . . . . . . . . . . . . . . . . . .
9-85
Simulate Responses Using filter . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
9-89
VAR Model Case Study . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
9-91
Convert from vgx Functions to Model Objects . . . . . . . . . . . . . . . . . . . .
9-105
Cointegration and Error Correction Analysis . . . . . 9-108
Integration and Cointegration . . . . . 9-108
Cointegration and Error Correction . . . . . 9-108
The Role of Deterministic Terms . . . . . 9-109
Cointegration Modeling . . . . . 9-110
Determine Cointegration Rank of VEC Model . . . . . . . . . . . . . . . . . . . .
9-112
Identifying Single Cointegrating Relations . . . . . 9-114
The Engle-Granger Test for Cointegration . . . . . 9-114
Limitations of the Engle-Granger Test . . . . . 9-114
Test for Cointegration Using the Engle-Granger Test . . . . . . . . . . . . . .
9-118
Estimate VEC Model Parameters Using egcitest . . . . . . . . . . . . . . . . . .
9-122
VEC Model Monte Carlo Forecasts . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
9-125
Generate VEC Model Impulse Responses . . . . . . . . . . . . . . . . . . . . . . . .
9-133
10
Identifying Multiple Cointegrating Relations . . . . . . . . . . . . . . . . . . . . .
9-137
Test for Cointegration Using the Johansen Test . . . . . . . . . . . . . . . . . . .
9-138
Estimate VEC Model Parameters Using jcitest . . . . . . . . . . . . . . . . . . . .
9-140
Compare Approaches to Cointegration Analysis . . . . . . . . . . . . . . . . . . .
9-143
Testing Cointegrating Vectors and Adjustment Speeds . . . . . . . . . . . . .
9-146
Test Cointegrating Vectors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
9-147
Test Adjustment Speeds . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
9-149
Model the United States Economy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
9-151
Structural Change Models
Discrete-Time Markov Chains . . . . . 10-2
What Are Discrete-Time Markov Chains? . . . . . 10-2
Discrete-Time Markov Chain Theory . . . . . 10-3
Markov Chain Modeling . . . . . 10-8
Discrete-Time Markov Chain Object Framework Overview . . . . . 10-8
Markov Chain Analysis Workflow . . . . . 10-11
Create and Modify Markov Chain Model Objects . . . . . 10-17
Create Markov Chain from Stochastic Transition Matrix . . . . . 10-17
Create Markov Chain from Random Transition Matrix . . . . . 10-18
Specify Structure for Random Markov Chain . . . . . 10-20
Work with State Transitions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
10-23
Visualize Markov Chain Structure and Evolution . . . . . . . . . . . . . . . . . .
10-27
Determine Asymptotic Behavior of Markov Chain . . . . . . . . . . . . . . . . .
10-39
Identify Classes in Markov Chain . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
10-47
Compare Markov Chain Mixing Times . . . . . . . . . . . . . . . . . . . . . . . . . . .
10-50
Simulate Random Walks Through Markov Chain . . . . . . . . . . . . . . . . . .
10-59
Compute State Distribution of Markov Chain at Each Time Step . . . . .
10-66
Create Threshold Transitions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
10-73
Visualize Threshold Transitions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
10-76
Evaluate Threshold Transitions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
10-82
11
Create Threshold-Switching Dynamic Regression Models . . . . . . . . . . .
10-88
Estimate Threshold-Switching Dynamic Regression Models . . . . . . . . .
10-93
Simulate Paths of Threshold-Switching Dynamic Regression Models
10-110
Forecast Threshold-Switching Dynamic Regression Models . . . . . . . .
10-117
Analyze US Unemployment Rate Using Threshold-Switching Model .
10-123
State-Space Models
What Are State-Space Models? . . . . . 11-3
Definitions . . . . . 11-3
State-Space Model Creation . . . . . 11-5
What Is the Kalman Filter? . . . . . 11-7
Standard Kalman Filter . . . . . 11-7
State Forecasts . . . . . 11-8
Filtered States . . . . . 11-8
Smoothed States . . . . . 11-9
Smoothed State Disturbances . . . . . 11-9
Forecasted Observations . . . . . 11-10
Smoothed Observation Innovations . . . . . 11-10
Kalman Gain . . . . . 11-11
Backward Recursion of the Kalman Filter . . . . . 11-11
Diffuse Kalman Filter . . . . . 11-12
Explicitly Create State-Space Model Containing Known Parameter Values . . . . . 11-14
Create State-Space Model with Unknown Parameters . . . . . 11-16
Explicitly Create State-Space Model Containing Unknown Parameters . . . . . 11-16
Implicitly Create Time-Invariant State-Space Model . . . . . 11-17
Create State-Space Model Containing ARMA State . . . . . 11-19
Implicitly Create State-Space Model Containing Regression Component . . . . . 11-22
Implicitly Create Diffuse State-Space Model Containing Regression Component . . . . . 11-24
Implicitly Create Time-Varying State-Space Model . . . . . . . . . . . . . . . .
11-26
Implicitly Create Time-Varying Diffuse State-Space Model . . . . . . . . . .
11-28
Create State-Space Model with Random State Coefficient . . . . . . . . . .
11-31
Estimate Time-Invariant State-Space Model . . . . . . . . . . . . . . . . . . . . . .
11-33
Estimate Time-Varying State-Space Model . . . . . . . . . . . . . . . . . . . . . . .
11-36
Estimate Time-Varying Diffuse State-Space Model . . . . . . . . . . . . . . . . .
11-40
Estimate State-Space Model Containing Regression Component . . . . .
11-44
Filter States of State-Space Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
11-46
Filter Time-Varying State-Space Model . . . . . . . . . . . . . . . . . . . . . . . . . .
11-49
Filter Data Through State-Space Model in Real Time . . . . . . . . . . . . . .
11-54
Filter Time-Varying Diffuse State-Space Model . . . . . . . . . . . . . . . . . . .
11-60
Filter States of State-Space Model Containing Regression Component . . . . . 11-66
Smooth States of State-Space Model . . . . . . . . . . . . . . . . . . . . . . . . . . . .
11-69
Smooth Time-Varying State-Space Model . . . . . . . . . . . . . . . . . . . . . . . .
11-72
Smooth Time-Varying Diffuse State-Space Model . . . . . . . . . . . . . . . . . .
11-78
Smooth States of State-Space Model Containing Regression Component . . . . . 11-84
Simulate States and Observations of Time-Invariant State-Space Model . . . . . 11-87
Simulate Time-Varying State-Space Model . . . . . . . . . . . . . . . . . . . . . . .
11-90
Simulate States of Time-Varying State-Space Model Using Simulation Smoother . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
11-94
Estimate Random Parameter of State-Space Model . . . . . . . . . . . . . . . .
11-97
Forecast State-Space Model Using Monte-Carlo Methods . . . . . . . . . .
11-104
Forecast State-Space Model Observations . . . . . . . . . . . . . . . . . . . . . .
11-110
Forecast Observations of State-Space Model Containing Regression Component . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
11-113
Forecast Time-Varying State-Space Model . . . . . . . . . . . . . . . . . . . . . .
11-117
Forecast State-Space Model Containing Regime Change in the Forecast Horizon . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11-121
Forecast Time-Varying Diffuse State-Space Model . . . . . . . . . . . . . . . .
11-126
Compare Simulation Smoother to Smoothed States . . . . . . . . . . . . . .
11-130
Rolling-Window Analysis of Time-Series Models . . . . . 11-135
Rolling-Window Analysis for Parameter Stability . . . . . 11-135
Rolling Window Analysis for Predictive Performance . . . . . 11-135
Assess State-Space Model Stability Using Rolling Window Analysis . . . . . 11-138
Assess Model Stability Using Rolling Window Analysis . . . . . 11-138
Assess Stability of Implicitly Created State-Space Model . . . . . 11-141
Choose State-Space Model Specification Using Backtesting . . . . . . . .
11-145
Apply State-Space Methodology to Analyze Diebold-Li Yield Curve Model . . . . . 11-148
12
A
Analyze Linearized DSGE Models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
11-178
Perform Outlier Detection Using Bayesian Non-Gaussian State-Space Models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
11-199
Functions
Appendices
Data Sets and Examples . . . . . A-2
1 Getting Started
• “Econometrics Toolbox Product Description” on page 1-2
• “Econometric Modeling” on page 1-3
• “Represent Time Series Models Using Econometrics Toolbox Objects” on page 1-7
• “Stochastic Process Characteristics” on page 1-18
• “Bibliography” on page 1-24
Econometrics Toolbox Product Description
Model and analyze financial and economic systems using statistical time series methods

Econometrics Toolbox provides functions and interactive workflows for analyzing and modeling time series data. It offers a wide range of visualizations and diagnostics for model selection, including tests for autocorrelation and heteroscedasticity, unit roots and stationarity, cointegration, causality, and structural change. You can estimate, simulate, and forecast economic systems using a variety of modeling frameworks that can be used either interactively, using the Econometric Modeler app, or programmatically, using functions provided in the toolbox. These frameworks include regression, ARIMA, state-space, GARCH, multivariate VAR and VEC, and switching models. The toolbox also provides Bayesian tools for developing time-varying models that learn from new data.
Econometric Modeling
In this section...
“Model Selection” on page 1-3
“Econometrics Toolbox Features” on page 1-3

Model Selection
A probabilistic time series model is necessary for a wide variety of analysis goals, including regression inference, forecasting, and Monte Carlo simulation. When selecting a model, aim to find the most parsimonious model that adequately describes your data. A simple model is easier to estimate, forecast, and interpret.
• Specification tests help you identify one or more model families that could plausibly describe the data generating process.
• Model comparisons help you compare the fit of competing models, with penalties for complexity.
• Goodness-of-fit checks help you assess the in-sample adequacy of your model, verify that all model assumptions hold, and evaluate out-of-sample forecast performance.
Model selection is an iterative process. When goodness-of-fit checks suggest that model assumptions are not satisfied, or that the predictive performance of the model is not satisfactory, consider making model adjustments. Additional specification tests, model comparisons, and goodness-of-fit checks help guide this process.
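As a rough illustration of a model comparison with a complexity penalty, the following sketch simulates data from an AR(2) process and compares AR(1) and AR(2) fits by using aicbic. The simulated data, the seed, the sample size, the candidate orders, and the parameter counts (constant, AR coefficients, and innovation variance) are assumptions chosen for this example and are not part of the guide. The sample size passed to aicbic is the full sample length, which is a simplification.

```matlab
% Sketch: compare two candidate models by information criteria.
% All numeric settings below are assumptions made for illustration.
rng(0);                                   % assumed seed, for reproducibility
TrueMdl = arima('AR',{0.8,-0.2},'Variance',0.2,'Constant',0);
numObs = 500;                             % assumed sample size
y = simulate(TrueMdl,numObs);

% Fit AR(1) and AR(2) candidates; the third output of estimate is the
% optimized loglikelihood.
[~,~,logL1] = estimate(arima(1,0,0),y,'Display','off');
[~,~,logL2] = estimate(arima(2,0,0),y,'Display','off');

% Number of estimated parameters: constant + AR coefficients + variance.
numParam = [3; 4];
[aic,bic] = aicbic([logL1; logL2],numParam,[numObs; numObs]);
```

Smaller criterion values indicate a better trade-off between fit and complexity; BIC penalizes additional parameters more heavily than AIC for samples of this size.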
Econometrics Toolbox Features

Modeling Question: What is the dimension of my response variable?
Features:
• The conditional mean and variance models, regression models with ARIMA errors, and Bayesian linear regression models in this toolbox are for modeling univariate, discrete-time data.
• Separate models are available for multivariate, discrete-time data, such as VAR and VEC models.
• State-space models support univariate or multivariate response variables.
Related Functions: arima, bayeslm, dssm, egarch, egcitest, garch, gjr, jcontest, regARIMA, ssm, varm

Modeling Question: Is my time series stationary?
Features:
• Stationarity tests are available. If your data is not stationary, consider transforming your data. Stationarity is the foundation of many time series models.
• Or, consider using a nonstationary ARIMA model if there is evidence of a unit root in your data.
Related Functions: arima, i10test, kpsstest, lmctest

Modeling Question: Does my time series have a unit root?
Features:
• Unit root tests are available. Evidence in favor of a unit root suggests your data is difference stationary.
• You can difference a series with a unit root until it is stationary, or model it using a nonstationary ARIMA model.
Related Functions: adftest, arima, i10test, pptest, vratiotest

Modeling Question: How can I handle seasonal effects?
Features:
• You can deseasonalize (seasonally adjust) your data. Use seasonal filters or regression models to estimate the seasonal component.
• Seasonal ARIMA models use seasonal differencing to remove seasonal effects. You can also include seasonal lags to model seasonal autocorrelation (both additively and multiplicatively).
Related Functions: arima, regARIMA

Modeling Question: Is my data autocorrelated?
Features:
• Sample autocorrelation and partial autocorrelation functions help identify autocorrelation.
• Conduct a Ljung-Box Q-test to test autocorrelations at several lags jointly.
• If autocorrelation is present, consider using a conditional mean model.
• For regression models with autocorrelated errors, consider using FGLS or HAC estimators. If the error model structure is an ARIMA model, consider using a regression model with ARIMA errors.
Related Functions: arima, autocorr, fgls, hac, lbqtest, parcorr, regARIMA

Modeling Question: What if my data is heteroscedastic (exhibits volatility clustering)?
Features:
• Looking for autocorrelation in the squared residual series is one way to detect conditional heteroscedasticity.
• Engle’s ARCH test evaluates evidence against the null of independent innovations in favor of an ARCH model alternative.
• To model conditional heteroscedasticity, consider using a conditional variance model.
• For regression models that exhibit heteroscedastic errors, consider using FGLS or HAC estimators.
Related Functions: archtest, egarch, fgls, garch, gjr, hac

Modeling Question: Is there an alternative to a Gaussian innovation distribution for leptokurtic data?
Features:
• You can use a Student’s t distribution to model fatter tails than a Gaussian distribution (excess kurtosis).
• You can specify a t innovation distribution for all conditional mean and variance models, and ARIMA error models in Econometrics Toolbox.
• You can estimate the degrees of freedom of the t distribution along with other model parameters.
Related Functions: arima, egarch, garch, gjr, regARIMA

Modeling Question: How do I decide between several model fits?
Features:
• You can compare nested models using misspecification tests, such as the likelihood ratio test, Wald’s test, or Lagrange multiplier test.
• Information criteria, such as AIC or BIC, compare model fit with a penalty for complexity.
Related Functions: aicbic, lmtest, lratiotest, waldtest

Modeling Question: Do I have two or more time series that are cointegrated?
Features:
• The Johansen and Engle-Granger cointegration tests assess evidence of cointegration.
• Consider using the VEC model for modeling multivariate, cointegrated series.
• Also consider cointegration when regressing time series. If present, it can introduce spurious regression effects.
Related Functions: egcitest, jcitest, jcontest

Modeling Question: What if I want to include predictor variables?
Features:
• ARIMAX, VARX, regression models with ARIMA errors, and Bayesian linear regression models are available in this toolbox.
• State-space models support predictor data.
Related Functions: arima, bayeslm, dssm, regARIMA, ssm, varm

Modeling Question: What if I want to implement regression, but the classical linear model assumptions might not apply?
Features:
• Regression models with ARIMA errors are available in this toolbox.
• Regress robustly using FGLS or HAC estimators.
• Use Bayesian linear regression.
• For a series of examples on time series regression techniques that illustrate common principles and tasks in time series regression modeling, see Econometrics Toolbox Examples.
• For more regression options, see Statistics and Machine Learning Toolbox™ documentation.
Related Functions: bayeslm, fgls, hac, mvregress, regARIMA

Modeling Question: What if observations of a dynamic process include measurement error?
Features:
• Standard, linear state-space modeling is available in this toolbox.
Related Functions: dssm, ssm
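The stationarity, unit root, and autocorrelation questions in the table can be explored with a few test calls. The following is a minimal sketch on a simulated random walk; the data, the seed, and the variable names are assumptions made for illustration, and every test runs with its default settings, which are rarely the best choice for real data.

```matlab
% Sketch: unit root, stationarity, and autocorrelation checks.
% The series is simulated (an assumption); substitute your own data.
rng(1);                        % assumed seed
y = cumsum(randn(200,1));      % random walk, which has a unit root

% Augmented Dickey-Fuller test. Null hypothesis: the series has a unit root.
hLevel = adftest(y);           % test the levels
hDiff  = adftest(diff(y));     % test the first differences

% KPSS test. Null hypothesis: the series is trend stationary.
hKPSS = kpsstest(y);

% Ljung-Box Q-test. Null hypothesis: no autocorrelation at the tested lags.
hLBQ = lbqtest(diff(y));
```

Each function returns a rejection decision h, where a value of 1 rejects the null hypothesis at the default significance level. Comparing the decisions for the levels and the differences is one way to judge whether differencing is warranted.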
See Also
Related Examples
• “Select ARIMA Model for Time Series Using Box-Jenkins Methodology” on page 3-4
• “Detect Autocorrelation” on page 3-20
• “Detect ARCH Effects” on page 3-28
• “Unit Root Tests” on page 3-41
• “Time Series Regression I: Linear Models” on page 5-173
• “Time Series Regression II: Collinearity and Estimator Variance” on page 5-180
• “Time Series Regression III: Influential Observations” on page 5-190
• “Time Series Regression IV: Spurious Regression” on page 5-197
• “Time Series Regression V: Predictor Selection” on page 5-209
• “Time Series Regression VI: Residual Diagnostics” on page 5-220
• “Time Series Regression VII: Forecasting” on page 5-231
• “Time Series Regression VIII: Lagged Variables and Estimator Bias” on page 5-240
• “Time Series Regression IX: Lag Order Selection” on page 5-261
• “Time Series Regression X: Generalized Least Squares and HAC Estimators” on page 5-279
More About
• “Trend-Stationary vs. Difference-Stationary Processes” on page 2-6
• “Box-Jenkins Methodology” on page 3-2
• “Goodness of Fit” on page 3-86
• “Regression Models with Time Series Errors” on page 5-5
• “Nonspherical Models” on page 3-90
• “Conditional Mean Models” on page 7-3
• “Conditional Variance Models” on page 8-2
• “Vector Autoregression (VAR) Models” on page 9-3
• “Cointegration and Error Correction Analysis” on page 9-108
Represent Time Series Models Using Econometrics Toolbox Objects
In this section...
“Model Objects” on page 1-7
“Model Properties” on page 1-10
“Create Model Object” on page 1-11
“Retrieve Model Properties” on page 1-15
“Modify Model Properties” on page 1-16
“Object Functions” on page 1-17
Model Objects
Econometrics Toolbox includes a number of model objects used to represent a variety of discrete-time, time series models. The supported models are univariate or multivariate, linear or nonlinear, and standard or Bayesian.
Model specification tests (see “Specification Testing”), economic theory, or your analysis goals can suggest a model, or set of models, for your data. After preprocessing your data, running specification tests, and selecting a set of candidate models, create the objects that best represent the models in MATLAB® to proceed with your analysis.
How you create an object depends on the object type. In general, you create a model object (see “Create Model Object” on page 1-11) by calling the object using its name and providing values for the corresponding model parameters. Models contain two main types of parameters: model infrastructure parameters, such as model dimensionality or number of lags, and estimable parameters, such as coefficients and an error variance. Objects store the parameter values, and other information, in model properties (see “Model Properties” on page 1-10). You operate on models by passing them and possibly other inputs, such as data, to object functions (see “Object Functions” on page 1-17).
The following tables contain the objects available with Econometrics Toolbox.

Univariate Linear Model Objects
This table contains the available objects that represent univariate linear models. You can create some models by using the Econometric Modeler app.

Model | Object | Econometric Modeler Support?
Integrated, autoregressive, moving average (ARIMA) model optionally containing exogenous predictor variables (ARIMAX) or seasonal components (SARIMA) | arima | Yes
Regression model with ARIMA errors | regARIMA | Yes
Generalized autoregressive conditional heteroscedasticity model (GARCH) | garch | Yes
Exponential GARCH model | egarch | Yes
Glosten-Jagannathan-Runkle model | gjr | Yes
Multivariate Linear Model Objects
This table contains the available objects that represent multivariate linear models.

Description | Object | Econometric Modeler Support?
Vector autoregression model (VAR) optionally containing exogenous predictor variables (VARX) | varm | Yes
Vector error-correction (VEC), or cointegrated VAR, model optionally containing exogenous predictor variables (VECX) | vecm | Yes
Nonlinear Model Objects
Nonlinear models included with Econometrics Toolbox are nonlinear because at least one model parameter or coefficient is time-varying. Regime-switching and time-varying state-space models have this characteristic. This table contains the available objects that represent multivariate nonlinear models.

Description | Object | Notes | Econometric Modeler Support?
Discrete-state threshold-switching dynamic regression model | tsVAR | A tsVAR object is the composition of arima or varm objects, specifying the dynamic structure in each state, and a threshold object, specifying the switching mechanism (see “Other Models” on page 1-10). | No
Discrete-state Markov-switching dynamic regression model | msVAR | An msVAR object is the composition of arima or varm objects, specifying the dynamic structure in each state, and a dtmc object, specifying the switching mechanism (see “Other Models” on page 1-10). | No
Standard, continuous state-space model optionally containing exogenous predictor variables | ssm | You can specify coefficient matrices explicitly or implicitly by supplying a custom function. | No
Continuous state-space model with diffuse initial states optionally containing exogenous predictor variables | dssm | You can specify coefficient matrices explicitly or implicitly by supplying a custom function. | No
Bayesian Model Objects
Econometrics Toolbox includes objects that represent a Bayesian view of some of the available models. A Bayesian model object specifies the parametric form of the model and the prior distributions on the parameters.

Bayesian Linear Regression Model Objects
Bayesian linear regression model objects specify a linear regression model for a univariate response variable and the joint prior distribution of the regression coefficients and disturbance variance. In addition to standard Bayesian linear regression, several objects implement Bayesian predictor selection. This table contains the available objects that represent Bayesian linear regression models. To create a Bayesian linear regression model object, you can call the object by name or use the bayeslm function.

Description | Object | Econometric Modeler Support?
Normal-inverse-gamma conjugate prior model. The regression coefficients and disturbance variance are dependent random variables. | conjugateblm | No
Normal-inverse-gamma semiconjugate prior model. The regression coefficients and disturbance variance are independent random variables. | semiconjugateblm | No
Joint prior distribution is proportional to the inverse of the disturbance variance. | diffuseblm | No
Joint prior distribution is specified by a random sample from the respective distributions. | empiricalblm | No
Joint prior distribution is specified in a custom function that you write. | customblm | No
Bayesian lasso regression | lassoblm | No
Stochastic search variable selection (SSVS). The regression coefficients and disturbance variance are dependent random variables (the prior and posterior distributions are conjugate). | mixconjugateblm | No
SSVS. The regression coefficients and disturbance variance are independent random variables (the prior and posterior distributions are semiconjugate). | mixsemiconjugateblm | No

Bayesian VAR Model
Bayesian VAR model objects specify a VAR model for the multivariate response variable and the joint prior distribution of the linear coefficient matrices and innovations covariance matrix. This table contains the available objects that represent Bayesian VAR models. To create a Bayesian VAR model object, you can call the object by name or use the bayesvarm function.

Description | Object | Econometric Modeler Support?
Normal conjugate prior on the coefficients and fixed covariance | normalbvarm | No
Matrix-normal-inverse-Wishart conjugate prior model. The VAR coefficients and innovations covariance are dependent random variables. | conjugatebvarm | No
Matrix-normal-inverse-Wishart semiconjugate prior model. The VAR coefficients and innovations covariance are independent random variables. | semiconjugatebvarm | No
Joint prior distribution is proportional to the inverse of the determinant of the innovations covariance. | diffusebvarm | No
Joint prior distribution is specified by a random sample from the respective distributions. | empiricalbvarm | No
Bayesian State-Space Model
Bayesian state-space model objects specify a linear Gaussian state-space model for the multivariate response variable and the joint prior distribution of the parameters. To create a Bayesian state-space model object, call the bssm function. Custom functions you write determine the structure of the state-space model and the joint prior distribution of the parameters.

Other Models
Econometrics Toolbox includes several objects that you cannot directly fit to data, but are useful for experimenting, characterizing, and visualizing dynamic systems. This table contains the available objects.

Description | Object | Estimation | Econometric Modeler Support?
Threshold transitions characterized by transition mid-levels and a transition type | threshold | Estimate threshold transitions of a threshold-switching dynamic regression model tsVAR. | No
Discrete-time Markov chain characterized by a transition matrix | dtmc | Estimate the transition matrix of a Markov-switching dynamic regression model msVAR. | No
Lag operator polynomial | LagOp | Not directly estimable | No
Model Properties
A model object holds all the information necessary for characterizing the model and performing operations, such as estimation and forecasting. This information is model dependent, but it can include the following quantities:
• Parametric form of the model
• Number of model parameters (e.g., the degree of the model)
• Innovation distribution (Gaussian or Student’s t)
• Amount of presample data needed to initialize the model
Such pieces of information are properties of the model, which are stored as fields within the model object. In this way, a model object resembles a MATLAB data structure (struct array).
All model objects have properties according to the econometric models they represent. Each property has a predefined name, which you cannot change.
For example, arima supports conditional mean models (multiplicative and additive AR, MA, ARMA, ARIMA, and ARIMAX processes). Every arima model object has these properties, shown with their corresponding names.

Property Name | Property Description
Constant | Model constant
AR | Nonseasonal AR coefficients
MA | Nonseasonal MA coefficients
SAR | Seasonal AR coefficients (in a multiplicative model)
SMA | Seasonal MA coefficients (in a multiplicative model)
D | Degree of nonseasonal differencing
Seasonality | Degree of seasonal differencing
Variance | Variance of the innovation distribution
Distribution | Parametric family of the innovation distribution
P | Amount of presample data needed to initialize the AR component of the model
Q | Amount of presample data needed to initialize the MA component of the model
Create Model Object
Create a model object by using its creation function and assigning values to model properties. Objects require values for model infrastructure parameters, either specified directly or inferred from other inputs. Estimable parameters can be specified or unspecified. The creation function assigns default values to any properties you do not, or cannot, specify.
Tip It is good practice to be aware of the default property values for any model you create.
You can fully specify a model by specifying all parameter values, or partially specify a model by providing only values of the required, model infrastructure parameters and optionally some estimable parameters. In most cases, an estimable parameter is configured for estimation when its value is NaN, which is the default value for estimable parameters for most models. Some objects accept a custom function specifying the model form. Most objects support parameter estimation.
For example, to create a model object representing a particular ARIMA model, use the arima function and specify at least the autoregressive and moving average polynomial degrees and the degree of nonseasonal integration. The function creates the model object of the corresponding type (arima) in the MATLAB workspace, as shown in the figure.
You can work with model objects as you would with any other variable in MATLAB. For example, you can assign the object variable a name, view it in the MATLAB Workspace, and display its value in the Command Window by typing its name.
When a model object exists in the workspace, double-click its name in the Workspace window to open the Variable Editor. The Variable Editor shows all model properties and their names. This image shows a workspace containing an arima model named Mdl.
Each property name is assigned a value. You can access or reassign writable properties by using dot notation, for example,
    Mdl.Constant = NaN;
In addition to having a predefined name, each model property has a predefined data type. When assigning or modifying a property’s value, the assignment must be consistent with the property data type. For example, the arima properties have these data types.
Property Name | Property Data Type
Constant | Scalar
AR | Cell array
MA | Cell array
SAR | Cell array
SMA | Cell array
D | Nonnegative integer
Seasonality | Nonnegative integer
Variance | Positive scalar
Distribution | struct array
P | Nonnegative integer (you cannot specify)
Q | Nonnegative integer (you cannot specify)
Specify an AR(2) Model
To illustrate assigning property values, consider specifying the AR(2) model
$y_t = 0.8y_{t-1} - 0.2y_{t-2} + \varepsilon_t$,
where the innovations are independent and identically distributed normal random variables with mean 0 and variance 0.2. Because the equation is a conditional mean model, use arima to create an object that represents the model. Assign values to model properties by using name-value pair arguments.
This model has two AR coefficients, 0.8 and -0.2. Assign these values to the property AR as a cell array, {0.8,-0.2}. Assign the value 0.2 to Variance, and 0 to Constant. You do not need to assign a value to Distribution because the default innovation distribution is 'Gaussian'. There are no MA terms, seasonal terms, or degrees of integration, so do not assign values to these properties. You cannot specify values for the properties P and Q.
In summary, specify the model as follows:
    Mdl = arima('AR',{0.8,-0.2},'Variance',0.2,'Constant',0)

    Mdl =
      arima with properties:

         Description: "ARIMA(2,0,0) Model (Gaussian Distribution)"
        Distribution: Name = "Gaussian"
                   P: 2
                   D: 0
                   Q: 0
            Constant: 0
                  AR: {0.8 -0.2} at lags [1 2]
                 SAR: {}
                  MA: {}
                 SMA: {}
         Seasonality: 0
                Beta: [1×0]
            Variance: 0.2
The output displays the value of the created model, Mdl. The property Seasonality appears in the output with the value 0 because the model has no seasonal integration. You can also inspect every property in the Variable Editor.
Mdl has values for every arima property, even though the specification included only three. arima assigns default values for the unspecified properties. The values of SAR, MA, and SMA are empty cell arrays because the model has no seasonal or MA terms. The values of D and Seasonality are 0 because there is no nonseasonal or seasonal differencing. arima sets:
• P equal to 2, the number of presample observations needed to initialize an AR(2) model.
• Q equal to 0 because there is no MA component to the model (i.e., no presample innovations are needed).

Specify a GARCH(1,1) Model
As another illustration, consider specifying the GARCH(1,1) model
$y_t = \varepsilon_t$,
where
$\varepsilon_t = \sigma_t z_t$
$\sigma_t^2 = \kappa + \gamma_1 \sigma_{t-1}^2 + \alpha_1 \varepsilon_{t-1}^2$.
Assume $z_t$ follows a standard normal distribution.
This model has one GARCH coefficient (corresponding to the lagged variance term) and one ARCH coefficient (corresponding to the lagged squared innovation term), both with unknown values. To specify this model, enter:
    Mdl = garch('GARCH',NaN,'ARCH',NaN)

    Mdl =
      garch with properties:

         Description: "GARCH(1,1) Conditional Variance Model (Gaussian Distribution)"
        Distribution: Name = "Gaussian"
                   P: 1
                   Q: 1
            Constant: NaN
               GARCH: {NaN} at lag [1]
                ARCH: {NaN} at lag [1]
              Offset: 0
The default value for the constant term is also NaN. Parameters with NaN values need to be estimated or otherwise specified before you can forecast or simulate the model. There is also a shorthand syntax to create a default GARCH(1,1) model:
    Mdl = garch(1,1)

    Mdl =
      garch with properties:

         Description: "GARCH(1,1) Conditional Variance Model (Gaussian Distribution)"
        Distribution: Name = "Gaussian"
                   P: 1
                   Q: 1
            Constant: NaN
               GARCH: {NaN} at lag [1]
                ARCH: {NaN} at lag [1]
              Offset: 0

The shorthand syntax returns a GARCH model with one GARCH coefficient and one ARCH coefficient, with default NaN values.
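To see how NaN-valued parameters become specified, the following sketch simulates data from a fully specified GARCH(1,1) model and then fits the shorthand garch(1,1) template to it. The coefficient values, the seed, and the sample size are assumptions chosen only so that the example runs; they do not come from the guide.

```matlab
% Sketch: replace NaN parameters with estimates.
% Coefficients, seed, and sample size are illustrative assumptions.
rng(2);                                        % assumed seed
TrueMdl = garch('Constant',0.1,'GARCH',0.7,'ARCH',0.2);

% simulate returns conditional variances v and responses y.
[v,y] = simulate(TrueMdl,1000);

% Template with NaN-valued Constant, GARCH, and ARCH parameters.
Mdl = garch(1,1);
EstMdl = estimate(Mdl,y,'Display','off');

% After estimation, no estimable parameter should remain NaN.
estValues = [EstMdl.Constant, cell2mat(EstMdl.GARCH), cell2mat(EstMdl.ARCH)];
isFullySpecified = ~any(isnan(estValues));
```

Because EstMdl is fully specified, you can pass it to forecast or simulate, which the template Mdl does not allow while its parameters are NaN.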
Retrieve Model Properties
The property values in an existing model are retrievable. Working with models resembles working with struct arrays because you can access model properties using dot notation. That is, type the model name, then the property name, separated by '.' (a period).
For example, consider the arima model with this AR(2) specification:
    Mdl = arima('AR',{0.8,-0.2},'Variance',0.2,'Constant',0);
To display the value of the property AR for the created model, enter:
    arCoefficients = Mdl.AR

    arCoefficients=1×2 cell array
        {[0.8000]}    {[-0.2000]}

AR is a cell array, so you must use cell-array syntax. The coefficient cell arrays are lag-indexed, so entering
    secondARCoefficient = Mdl.AR{2}

    secondARCoefficient = -0.2000

returns the coefficient at lag 2. You can also assign any property value to a new variable:
    ar = Mdl.AR

    ar=1×2 cell array
        {[0.8000]}    {[-0.2000]}
Modify Model Properties
You can also modify model properties using dot notation. For example, consider this AR(2) specification:
    Mdl = arima('AR',{0.8,-0.2},'Variance',0.2,'Constant',0)

    Mdl =
      arima with properties:

         Description: "ARIMA(2,0,0) Model (Gaussian Distribution)"
        Distribution: Name = "Gaussian"
                   P: 2
                   D: 0
                   Q: 0
            Constant: 0
                  AR: {0.8 -0.2} at lags [1 2]
                 SAR: {}
                  MA: {}
                 SMA: {}
         Seasonality: 0
                Beta: [1×0]
            Variance: 0.2

The created model has the default Gaussian innovation distribution. Change the innovation distribution to a Student's t distribution with eight degrees of freedom. The data type for Distribution is a struct array.
    Mdl.Distribution = struct('Name','t','DoF',8)

    Mdl =
      arima with properties:

         Description: "ARIMA(2,0,0) Model (t Distribution)"
        Distribution: Name = "t", DoF = 8
                   P: 2
                   D: 0
                   Q: 0
            Constant: 0
                  AR: {0.8 -0.2} at lags [1 2]
                 SAR: {}
                  MA: {}
                 SMA: {}
         Seasonality: 0
                Beta: [1×0]
            Variance: 0.2

The variable Mdl is updated accordingly.
Object Functions
Object functions are functions that accept model objects as inputs and perform an operation on the model and other inputs. In Econometrics Toolbox, these functions, which represent steps in an econometrics analysis workflow, accept most objects included in the toolbox:
• estimate
• forecast
• simulate
Models that you can fit to data have these three functions in common, but the model objects in the toolbox can have other object functions. Object functions can distinguish between model objects (e.g., an arima model vs. a garch model). That is, some object functions accept different optional inputs and return different outputs depending on the type of model that is input.
Find object function reference pages for a specific model by entering, for example, doc arima/estimate.
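The three common object functions can be chained into a small workflow. The following sketch reuses the AR(2) specification from this section as the data-generating model; the seed, the sample size, and the forecast horizon are assumptions made for illustration.

```matlab
% Sketch: simulate, estimate, and forecast with arima objects.
% Seed, sample size, and horizon are illustrative assumptions.
rng(0);                                        % assumed seed
TrueMdl = arima('AR',{0.8,-0.2},'Variance',0.2,'Constant',0);

% simulate: draw one path of 500 observations from the specified model.
y = simulate(TrueMdl,500);

% estimate: fit a partially specified AR(2) template (NaN parameters).
Mdl = arima(2,0,0);
EstMdl = estimate(Mdl,y,'Display','off');

% forecast: 10-period-ahead forecasts and mean squared errors, using the
% observed series as presample data (third positional input, as in R2022b).
[yF,yMSE] = forecast(EstMdl,10,y);
```

The same function names apply to other model objects, such as garch, although the inputs and outputs differ by model type.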
See Also
Related Examples
• “Econometric Modeling” on page 1-3
Stochastic Process Characteristics
In this section...
“What Is a Stochastic Process?” on page 1-18
“Stationary Processes” on page 1-19
“Linear Time Series Model” on page 1-19
“Unit Root Process” on page 1-20
“Lag Operator Notation” on page 1-21
“Characteristic Equation” on page 1-22

What Is a Stochastic Process?
A time series $y_t$ is a collection of observations on a variable indexed sequentially over several time points $t = 1, 2, \ldots, T$. Time series observations $y_1, y_2, \ldots, y_T$ are inherently dependent. From a statistical modeling perspective, this means it is inappropriate to treat a time series as a random sample of independent observations.
The goal of statistical modeling is finding a compact representation of the data-generating process for your data. The statistical building block of econometric time series modeling is the stochastic process. Heuristically, a stochastic process is a joint probability distribution for a collection of random variables. By modeling the observed time series $y_t$ as a realization from a stochastic process $y = \{y_t;\ t = 1, \ldots, T\}$, it is possible to accommodate the high-dimensional and dependent nature of the data. The set of observation times T can be discrete or continuous. “Figure 1-1, Monthly Average CO2” on page 1-19 displays the monthly average CO2 concentration (ppm) recorded by the Mauna Loa Observatory in Hawaii from 1980 to 2012 [3].
Figure 1-1, Monthly Average CO2
Stationary Processes
Stochastic processes are weakly stationary or covariance stationary (or simply, stationary) if their first two moments are finite and constant over time. Specifically, if $y_t$ is a stationary stochastic process, then for all t:
• $E(y_t) = \mu < \infty$.
• $V(y_t) = \sigma^2 < \infty$.
• $\mathrm{Cov}(y_t, y_{t-h}) = \gamma_h$ for all lags $h \neq 0$.
Does a plot of your stochastic process seem to increase or decrease without bound? The answer to this question indicates whether the stochastic process is stationary. “Yes” indicates that the stochastic process might be nonstationary. In “Figure 1-1, Monthly Average CO2” on page 1-19, the concentration of CO2 is increasing without bound, which indicates a nonstationary stochastic process.
Linear Time Series Model
Wold’s theorem [2] states that you can write all weakly stationary stochastic processes in the general linear form
$$y_t = \mu + \sum_{i=1}^{\infty} \psi_i \varepsilon_{t-i} + \varepsilon_t.$$
Here, $\varepsilon_t$ denotes a sequence of uncorrelated (but not necessarily independent) random variables from a well-defined probability distribution with mean zero. It is often called the innovation process because it captures all new information in the system at time t.
Unit Root Process
A linear time series model is a unit root process if the solution set to its characteristic equation (see “Characteristic Equation” on page 1-22) contains a root that is on the unit circle (i.e., has an absolute value of one). Consequently, the expected value, variance, or covariance of the elements of the stochastic process grows with time, so the process is nonstationary. If your series has a unit root, then differencing it might make it stationary.
For example, consider the linear time series model $y_t = y_{t-1} + \varepsilon_t$, where $\varepsilon_t$ is a white noise sequence of innovations with variance $\sigma^2$ (this is called the random walk). The characteristic equation of this model is $z - 1 = 0$, which has a root of one. If the initial observation $y_0$ is fixed, then you can write the model as
$$y_t = y_0 + \sum_{i=1}^{t} \varepsilon_i.$$
Its expected value is $y_0$, which is independent of time. However, the variance of the series is $t\sigma^2$, which grows with time, making the series unstable. Take the first difference to transform the series, and the model becomes $d_t = y_t - y_{t-1} = \varepsilon_t$. The characteristic equation for this series is $z = 0$, so it does not have a unit root. Note that
• $E(d_t) = 0$, which is independent of time,
• $V(d_t) = \sigma^2$, which is independent of time, and
• $\mathrm{Cov}(d_t, d_{t-s}) = 0$, which is independent of time for all integers $0 < s < t$.
“Figure 1-1, Monthly Average CO2” on page 1-19 appears nonstationary. What happens if you plot the first difference $d_t = y_t - y_{t-1}$ of this series? “Figure 1-2, Monthly Difference in CO2” on page 1-21 displays the $d_t$. Ignoring the fluctuations, the stochastic process does not seem to increase or decrease in general. You can conclude that $d_t$ is stationary, and that $y_t$ is unit root nonstationary. For details, see “Differencing” on page 2-3.
Figure 1-2, Monthly Difference in CO2
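The variance claims for the random walk can be checked numerically with base MATLAB functions. The following is a minimal Monte Carlo sketch; the number of paths, the series length, the innovation standard deviation, the seed, and the choice $y_0 = 0$ are all assumptions made for this example.

```matlab
% Sketch: variance of a random walk grows like t*sigma^2, while the
% variance of its first difference stays at sigma^2.
% numPaths, T, sigma, and the seed are illustrative assumptions.
rng(1);
numPaths = 5000;                 % independent simulated paths
T = 100;                         % length of each path
sigma = 1;                       % innovation standard deviation

e = sigma*randn(T,numPaths);     % white noise innovations, one column per path
y = cumsum(e,1);                 % random walk with y0 = 0

varLevels = var(y,0,2);          % variance across paths at each time t
d = diff(y,1,1);                 % first differences d_t = y_t - y_(t-1)
varDiffs = var(d,0,2);           % variance across paths at each time t

% Compare sample variances with the theoretical value t*sigma^2.
tCheck = [10; 50; 100];
comparison = [tCheck, varLevels(tCheck), tCheck*sigma^2];
disp(comparison)
```

The second column of comparison should track the third, while the entries of varDiffs should stay near sigma^2 for every t.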
Lag Operator Notation
The lag operator $L$ operates on a time series $y_t$ such that $L^i y_t = y_{t-i}$. An mth-degree lag polynomial of coefficients $b_1, b_2, \ldots, b_m$ is defined as

$$B(L) = 1 + b_1 L + b_2 L^2 + \ldots + b_m L^m.$$

In lag operator notation, you can write the general linear model using an infinite-degree polynomial $\psi(L) = 1 + \psi_1 L + \psi_2 L^2 + \ldots$,

$$y_t = \mu + \psi(L)\varepsilon_t.$$

You cannot estimate a model that has an infinite-degree polynomial of coefficients with a finite amount of data. However, if $\psi(L)$ is a rational polynomial (or approximately rational), you can write it (at least approximately) as the quotient of two finite-degree polynomials. Define the q-degree polynomial $\theta(L) = 1 + \theta_1 L + \theta_2 L^2 + \ldots + \theta_q L^q$ and the p-degree polynomial $\phi(L) = 1 + \phi_1 L + \phi_2 L^2 + \ldots + \phi_p L^p$. If $\psi(L)$ is rational, then

$$\psi(L) = \frac{\theta(L)}{\phi(L)}.$$

Thus, by Wold’s theorem, you can model (or closely approximate) every stationary stochastic process as

$$y_t = \mu + \frac{\theta(L)}{\phi(L)}\varepsilon_t,$$

which has p + q coefficients (a finite number).
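To illustrate how a quotient of two finite-degree lag polynomials generates an infinite-degree polynomial, the following sketch expands $\theta(L)/\phi(L)$ for first-degree $\theta(L)$ and $\phi(L)$. It uses the base MATLAB function filter to compute the response to a unit impulse, which gives the leading coefficients $\psi_0, \psi_1, \ldots$. The coefficient values and variable names are assumptions chosen for illustration, and the polynomials follow the sign convention of this section, $\theta(L) = 1 + \theta_1 L$ and $\phi(L) = 1 + \phi_1 L$.

```matlab
% Lag polynomial coefficients, ordered by increasing power of L.
theta = [1 0.4];      % theta(L) = 1 + 0.4L (assumed values)
phi   = [1 -0.5];     % phi(L)   = 1 - 0.5L (assumed values)

% The impulse response of the rational filter theta(L)/phi(L) gives
% the coefficients psi_0, psi_1, ..., psi_n of the infinite-degree polynomial.
n = 10;
impulse = [1 zeros(1,n)];
psi = filter(theta,phi,impulse);

% Closed form for this first-degree case:
% psi_0 = 1 and psi_j = (theta_1 - phi_1)*(-phi_1)^(j-1) for j >= 1.
j = 1:n;
psiClosed = [1 (theta(2) - phi(2))*(-phi(2)).^(j-1)];

disp(psi)
disp(max(abs(psi - psiClosed)))
```

Only two numbers (one coefficient in each polynomial) determine the whole sequence of $\psi$ coefficients in this sketch, which is the sense in which the rational form needs only p + q coefficients.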
Characteristic Equation
A degree p characteristic polynomial of the linear time series model $y_t = \phi_1 y_{t-1} + \phi_2 y_{t-2} + \ldots + \phi_p y_{t-p} + \varepsilon_t$ is

$$\phi(a) = a^p - \phi_1 a^{p-1} - \phi_2 a^{p-2} - \ldots - \phi_p.$$

It provides another way to assess whether a series is a stationary process. For example, the characteristic polynomial of $y_t = 0.5y_{t-1} - 0.02y_{t-2} + \varepsilon_t$ is $\phi(a) = a^2 - 0.5a + 0.02$.

The roots of the homogeneous characteristic equation $\phi(a) = 0$ (called the characteristic roots) determine whether the linear time series is stationary. If every root of $\phi(a)$ lies inside the unit circle, then the process is stationary. Roots lie inside the unit circle if they have an absolute value less than one. The process is a unit root process if one or more roots lie on the unit circle (that is, have an absolute value of one). Continuing the example, the characteristic roots of $\phi(a) = 0$ are $a = \{0.4562, 0.0438\}$. Because the absolute values of these roots are less than one, the linear time series model is stationary.
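The root check in this example can be sketched with the base MATLAB function roots, which takes polynomial coefficients ordered from the highest power down to the constant term. The variable names and the tolerance tol are assumptions introduced for illustration. The second part applies the same check to the random walk, whose characteristic polynomial is $a - 1$.

```matlab
% Coefficients of phi(a) = a^2 - 0.5a + 0.02, from highest power to constant.
phiCoeffs = [1 -0.5 0.02];
a = roots(phiCoeffs);              % characteristic roots
isStationary = all(abs(a) < 1);    % true when every root is inside the unit circle

% The random walk y_t = y_{t-1} + e_t has characteristic polynomial a - 1.
aRW = roots([1 -1]);
tol = 1e-8;                        % tolerance for "on the unit circle" (assumed value)
hasUnitRoot = any(abs(abs(aRW) - 1) < tol);

disp(a')
disp([isStationary hasUnitRoot])
```

Comparing moduli against one with a small tolerance, rather than testing for exact equality, is a design choice in this sketch because numerically computed roots are subject to rounding error.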
References
[1] Box, G. E. P., G. M. Jenkins, and G. C. Reinsel. Time Series Analysis: Forecasting and Control. 3rd ed. Englewood Cliffs, NJ: Prentice Hall, 1994.
[2] Wold, H. A Study in the Analysis of Stationary Time Series. Uppsala, Sweden: Almqvist & Wiksell, 1938.
[3] Tans, P., and R. Keeling. (2012, August). “Trends in Atmospheric Carbon Dioxide.” NOAA Research. Retrieved October 5, 2012 from https://gml.noaa.gov/ccgg/trends/mlo.html.
See Also
Related Examples
• “Specify Conditional Mean Models” on page 7-5
• “Specify GARCH Models” on page 8-6
• “Specify EGARCH Models” on page 8-17
• “Specify GJR Models” on page 8-28
• “Simulate Stationary Processes” on page 7-152
• “Assess Stationarity of a Time Series” on page 3-51
More About
• “Econometric Modeling” on page 1-3
• “Conditional Mean Models” on page 7-3
• “Conditional Variance Models” on page 8-2
1-35
2 Data Preprocessing

• “Data Transformations” on page 2-2
• “Trend-Stationary vs. Difference-Stationary Processes” on page 2-6
• “Specify Lag Operator Polynomials” on page 2-9
• “Nonseasonal Differencing” on page 2-13
• “Nonseasonal and Seasonal Differencing” on page 2-16
• “Time Series Decomposition” on page 2-19
• “Moving Average Filter” on page 2-21
• “Moving Average Trend Estimation” on page 2-22
• “Parametric Trend Estimation” on page 2-24
• “Hodrick-Prescott Filter” on page 2-29
• “Use Hodrick-Prescott Filter to Reproduce Original Result” on page 2-30
• “Seasonal Filters” on page 2-34
• “Seasonal Adjustment” on page 2-37
• “Seasonal Adjustment Using a Stable Seasonal Filter” on page 2-39
• “Seasonal Adjustment Using S(n,m) Seasonal Filters” on page 2-45
Data Transformations

In this section...
“Why Transform?” on page 2-2
“Common Data Transformations” on page 2-2
Why Transform?

You can transform time series to:
• Isolate temporal components of interest.
• Remove the effect of nuisance components (like seasonality).
• Make a series stationary.
• Reduce spurious regression effects.
• Stabilize variability that grows with the level of the series.
• Make two or more time series more directly comparable.

You can choose among many data transformations to address these (and other) aims. For example, you can use decomposition methods to describe and estimate time series components. Seasonal adjustment is a decomposition method you can use to remove a nuisance seasonal component. Detrending and differencing are transformations you can use to address nonstationarity due to a trending mean. Differencing can also help remove spurious regression effects due to cointegration.

In general, if you apply a data transformation before modeling your data, you then need to back-transform model forecasts to return to the original scale. This is not necessary in Econometrics Toolbox if you are modeling difference-stationary data. Use arima to model integrated series that are not a priori differenced. A key advantage of this approach is that arima also returns forecasts on the original scale automatically.
Common Data Transformations

• “Detrending” on page 2-2
• “Differencing” on page 2-3
• “Log Transformations” on page 2-4
• “Prices, Returns, and Compounding” on page 2-4

Detrending

Some nonstationary series can be modeled as the sum of a deterministic trend and a stationary stochastic process. That is, you can write the series $y_t$ as
$$y_t = \mu_t + \varepsilon_t,$$
where $\varepsilon_t$ is a stationary stochastic process with mean zero.
The deterministic trend, $\mu_t$, can have multiple components, such as nonseasonal and seasonal components. You can detrend (or decompose) the data to identify and estimate its various components. The detrending process proceeds as follows:
1 Estimate the deterministic trend component.
2 Remove the trend from the original data.
3 (Optional) Model the remaining residual series with an appropriate stationary stochastic process.
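The three steps above can be sketched in base MATLAB for the simplest case of a linear trend estimated by least squares. The synthetic series, its intercept and slope, and the variable names are illustrative assumptions, not part of the toolbox examples.

```matlab
% Synthetic series: linear trend plus stationary noise (illustrative values)
rng(0)
T = 200;
t = (1:T)';
y = 2 + 0.05*t + randn(T,1);

% Step 1: estimate the deterministic trend by least squares
X = [ones(T,1) t];      % design matrix: intercept and time index
b = X\y;                % least-squares coefficient estimates
trendHat = X*b;         % fitted trend

% Step 2: remove the estimated trend from the original data
res = y - trendHat;

% Step 3 (optional): examine res and fit a stationary model to it
```

Because the design matrix includes an intercept, the residual series `res` has sample mean zero by construction.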
Several techniques are available for estimating the trend component. You can estimate it parametrically using least squares, nonparametrically using filters (moving averages), or a combination of both. Detrending yields estimates of all trend and stochastic components, which might be desirable. However, estimating trend components can require making additional assumptions, performing extra steps, and estimating additional parameters.

Differencing

Differencing is an alternative transformation for removing a mean trend from a nonstationary series. This approach is advocated in the Box-Jenkins approach to model specification [1]. According to this methodology, the first step in building a model is to difference your data until it looks stationary. Differencing is appropriate for removing stochastic trends (e.g., random walks).

Define the first difference as
$$\Delta y_t = y_t - y_{t-1},$$
where $\Delta$ is called the differencing operator. In lag operator notation, where $L^i y_t = y_{t-i}$,
$$\Delta y_t = (1 - L)y_t.$$
You can create lag operator polynomial objects using LagOp. Similarly, define the second difference as
$$\Delta^2 y_t = (1 - L)^2 y_t = (y_t - y_{t-1}) - (y_{t-1} - y_{t-2}) = y_t - 2y_{t-1} + y_{t-2}.$$

Like taking derivatives, taking a first difference makes a linear trend constant, taking a second difference makes a quadratic trend constant, and so on for higher-degree polynomials. Many complex stochastic trends can also be eliminated by taking relatively low-order differences. Taking $D$ differences makes a process with $D$ unit roots stationary.

For series with seasonal periodicity, seasonal differencing can address seasonal unit roots. For data with periodicity $s$ (e.g., quarterly data have $s = 4$ and monthly data have $s = 12$), the seasonal differencing operator is defined as
$$\Delta_s y_t = (1 - L^s)y_t = y_t - y_{t-s}.$$

Using a differencing transformation eliminates the intermediate estimation steps required for detrending. However, this means you cannot obtain separate estimates of the trend and stochastic components.
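As a quick numerical check of these statements, the following base MATLAB sketch differences small deterministic series. The series and their coefficients are made up for illustration; `diff` is the built-in differencing function, used here in place of lag operator polynomial objects.

```matlab
t = (0:9)';

% A first difference turns a linear trend into a constant (the slope)
yLin = 3 + 2*t;
d1 = diff(yLin);            % every element equals 2

% A second difference turns a quadratic trend into a constant
yQuad = 1 + t + 0.5*t.^2;
d2 = diff(yQuad,2);         % every element equals 2*0.5 = 1

% Seasonal difference with periodicity s: y(t) - y(t-s)
s = 4;
yS = repmat([1;2;3;4],3,1) + (1:12)';   % quarterly pattern plus unit slope
dS = yS(s+1:end) - yS(1:end-s);         % every element equals s*1 = 4
```

Each differencing pass shortens the series: `d1` has one fewer observation than `yLin`, `d2` has two fewer than `yQuad`, and `dS` has `s` fewer than `yS`.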
Log Transformations

For a series with exponential growth and variance that grows with the level of the series, a log transformation can help linearize and stabilize the series. If you have negative values in your time series, you should add a constant large enough to make all observations greater than zero before taking the log transformation.

In some application areas, working with differenced, logged series is the norm. For example, the first differences of a logged time series,
$$\Delta \log y_t = \log y_t - \log y_{t-1},$$
are approximately the rates of change of the series.

Prices, Returns, and Compounding

The rates of change of a price series are called returns. Whereas price series do not typically fluctuate around a constant level, the returns series often looks stationary. Thus, returns series are typically used instead of price series in many applications.

Denote successive price observations made at times $t$ and $t + 1$ as $y_t$ and $y_{t+1}$, respectively. The continuously compounded returns series is the transformed series
$$r_t = \log\frac{y_{t+1}}{y_t} = \log y_{t+1} - \log y_t.$$
This is the first difference of the log price series, and is sometimes called the log return.

An alternative transformation for price series is simple returns,
$$r_t = \frac{y_{t+1} - y_t}{y_t} = \frac{y_{t+1}}{y_t} - 1.$$

For series with relatively high frequency (e.g., daily or weekly observations), the difference between the two transformations is small. Econometrics Toolbox has price2ret for converting price series to returns series (with either continuous or simple compounding), and ret2price for the inverse operation.
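The two return definitions can be compared directly in base MATLAB. This is a minimal sketch with made-up prices; it computes the returns from the formulas rather than calling price2ret or ret2price.

```matlab
% Hypothetical price observations (illustrative values only)
p = [100; 101; 99.5; 102];

% Continuously compounded (log) returns: first difference of log prices
rLog = diff(log(p));

% Simple returns: relative price changes
rSimple = diff(p)./p(1:end-1);

% The two are linked exactly by rLog = log(1 + rSimple)
gap = max(abs(rLog - log(1 + rSimple)));

% Inverting log returns recovers the prices, given the initial price
pBack = p(1)*exp(cumsum([0; rLog]));
```

For small price changes, `log(1 + r)` is close to `r`, which is why the two return series nearly coincide for high-frequency data.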
References

[1] Box, G. E. P., G. M. Jenkins, and G. C. Reinsel. Time Series Analysis: Forecasting and Control. 3rd ed. Englewood Cliffs, NJ: Prentice Hall, 1994.

See Also
LagOp | price2ret | ret2price

Related Examples
• “Transform Time Series Using Econometric Modeler App” on page 4-97
• “Moving Average Trend Estimation” on page 2-22
• “Nonseasonal Differencing” on page 2-13
• “Nonseasonal and Seasonal Differencing” on page 2-16
• “Parametric Trend Estimation” on page 2-24
• “Specify Lag Operator Polynomials” on page 2-9

More About
• “Trend-Stationary vs. Difference-Stationary Processes” on page 2-6
• “Moving Average Filter” on page 2-21
• “Seasonal Adjustment” on page 2-37
• “Time Series Decomposition” on page 2-19
Trend-Stationary vs. Difference-Stationary Processes

In this section...
“Nonstationary Processes” on page 2-6
“Trend Stationary” on page 2-7
“Difference Stationary” on page 2-7
Nonstationary Processes

The stationary stochastic process is a building block of many econometric time series models. Many observed time series, however, have empirical features that are inconsistent with the assumptions of stationarity. For example, the following plot shows quarterly U.S. GDP measured from 1947 to 2005. The series has an obvious upward trend that any model for the process should incorporate.

    load Data_GDP
    plot(Data)
    xlim([0,234])
    title('Quarterly U.S. GDP, 1947-2005')
A trending mean is a common violation of stationarity. There are two popular models for nonstationary series with a trending mean.

• Trend stationary: The mean trend is deterministic. Once the trend is estimated and removed from the data, the residual series is a stationary stochastic process.
• Difference stationary: The mean trend is stochastic. Differencing the series $D$ times yields a stationary stochastic process.

The distinction between a deterministic and stochastic trend has important implications for the long-term behavior of a process:

• Time series with a deterministic trend always revert to the trend in the long run (the effects of shocks are eventually eliminated). Forecast intervals have constant width.
• Time series with a stochastic trend never recover from shocks to the system (the effects of shocks are permanent). Forecast intervals grow over time.

Unfortunately, for any finite amount of data there is a deterministic and a stochastic trend that fit the data equally well [1]. Unit root tests are a tool for assessing the presence of a stochastic trend in an observed series.
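A small simulation can make the difference in shock persistence concrete. The sketch below uses only base MATLAB; the AR(1) coefficient 0.5, the drift 0.1, and the shock at time 100 are arbitrary illustrative choices, not values taken from this guide.

```matlab
rng(1)
T = 500;
t = (1:T)';
e = randn(T,1);                         % common innovations

% Trend stationary: linear trend plus AR(1) deviations u(t) = 0.5*u(t-1) + e(t)
yTS = 0.1*t + filter(1,[1 -0.5],e);

% Difference stationary: random walk with drift 0.1
yDS = cumsum(0.1 + e);

% Add a unit shock at time 100 and rebuild both series
eShock = e;
eShock(100) = eShock(100) + 1;
yTSshock = 0.1*t + filter(1,[1 -0.5],eShock);
yDSshock = cumsum(0.1 + eShock);

% Effect of the shock on each path
dTS = yTSshock - yTS;   % decays geometrically: 1, 0.5, 0.25, ...
dDS = yDSshock - yDS;   % stays at 1 from time 100 onward
```

Plotting `dTS` and `dDS` shows the trend-stationary series returning to its trend while the difference-stationary series is shifted permanently.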
Trend Stationary

You can write a trend-stationary process, $y_t$, as
$$y_t = \mu_t + \varepsilon_t,$$
where:
• $\mu_t$ is a deterministic mean trend.
• $\varepsilon_t$ is a stationary stochastic process with mean zero.

In some applications, the trend is of primary interest. Time series decomposition methods focus on decomposing $\mu_t$ into different trend sources (e.g., secular trend component and seasonal component). You can decompose series nonparametrically using filters (moving averages), or parametrically using regression methods.

Given an estimate $\hat{\mu}_t$, you can explore the residual series $y_t - \hat{\mu}_t$ for autocorrelation, and optionally model it using a stationary stochastic process model.
Difference Stationary

In the Box-Jenkins modeling approach [2], nonstationary time series are differenced until stationarity is achieved. You can write a difference-stationary process, $y_t$, as
$$\Delta^D y_t = \mu + \psi(L)\varepsilon_t,$$
where:
• $\Delta^D = (1 - L)^D$ is a $D$th-degree differencing operator.
• $\psi(L) = 1 + \psi_1 L + \psi_2 L^2 + \dots$ is an infinite-degree lag operator polynomial with absolutely summable coefficients and all roots lying outside the unit circle.
• $\varepsilon_t$ is an uncorrelated innovation process with mean zero.

Time series that can be made stationary by differencing are called integrated processes. Specifically, when $D$ differences are required to make a series stationary, that series is said to be integrated of order $D$, denoted $I(D)$. Processes with $D \ge 1$ are often said to have a unit root.
References

[1] Hamilton, J. D. Time Series Analysis. Princeton, NJ: Princeton University Press, 1994.
[2] Box, G. E. P., G. M. Jenkins, and G. C. Reinsel. Time Series Analysis: Forecasting and Control. 3rd ed. Englewood Cliffs, NJ: Prentice Hall, 1994.

See Also

Related Examples
• “Transform Time Series Using Econometric Modeler App” on page 4-97
• “Nonseasonal Differencing” on page 2-13
• “Moving Average Trend Estimation” on page 2-22
• “Specify Lag Operator Polynomials” on page 2-9

More About
• “Moving Average Filter” on page 2-21
• “Time Series Decomposition” on page 2-19
• “ARIMA Model” on page 7-42
Specify Lag Operator Polynomials

In this section...
“Lag Operator Polynomial of Coefficients” on page 2-9
“Difference Lag Operator Polynomials” on page 2-11
Lag Operator Polynomial of Coefficients

Define the lag operator $L$ such that $L^i y_t = y_{t-i}$. An $m$-degree polynomial of coefficients $A$ in the lag operator $L$ is given by
$$A(L) = A_0 + A_1 L^1 + \dots + A_m L^m.$$
Here, the coefficient $A_0$ corresponds to lag 0, $A_1$ corresponds to lag 1, and so on, to $A_m$, which corresponds to lag $m$.

To specify a coefficient lag operator polynomial in Econometrics Toolbox, use LagOp. Specify the (nonzero) coefficients $A_0, \dots, A_m$ as a cell array, and the lags of the nonzero coefficients as a vector.

The coefficients of lag operator polynomial objects are designed to look and feel like traditional MATLAB cell arrays. There is, however, an important difference: elements of cell arrays are accessible by positive integer sequential indexing, i.e., 1, 2, 3,.... The coefficients of lag operator polynomial objects are accessible by lag-based indexing. That is, you can specify any nonnegative integer lags, including lag 0.

For example, consider specifying the polynomial $A(L) = 1 - 0.3L + 0.6L^4$. This polynomial has coefficient 1 at lag 0, coefficient –0.3 at lag 1, and coefficient 0.6 at lag 4. Enter:

    A = LagOp({1,-0.3,0.6},'Lags',[0,1,4])

    A =
        1-D Lag Operator Polynomial:
        -----------------------------
            Coefficients: [1 -0.3 0.6]
                    Lags: [0 1 4]
                  Degree: 4
               Dimension: 1
The created lag operator object A corresponds to a lag operator polynomial of degree 4. A LagOp object has a number of properties describing it:
• Coefficients, a cell array of coefficients.
• Lags, a vector indicating the lags of nonzero coefficients.
• Degree, the degree of the polynomial.
• Dimension, the dimension of the polynomial (relevant for multivariate time series).

To access properties of the object, use dot notation. That is, enter the variable name and then the property name, separated by a period. To access specific coefficients, use dot notation along with cell array syntax (consistent with the Coefficients data type).

To illustrate, return the coefficient at lag 4:

    A.Coefficients{4}

    ans = 0.6000
Return the coefficient at lag 0:

    A.Coefficients{0}

    ans = 1

This last command illustrates lag indexing. The index 0 is valid, and corresponds to the lag 0 coefficient.

Notice what happens if you index a lag larger than the degree of the polynomial:

    A.Coefficients{6}

    ans = 0

This does not return an error. Rather, it returns 0, the coefficient at lag 6 (and at all other lags with coefficient zero).

Use similar syntax to add new nonzero coefficients. For example, to add the coefficient 0.4 at lag 6,

    A.Coefficients{6} = 0.4

    A =
        1-D Lag Operator Polynomial:
        -----------------------------
            Coefficients: [1 -0.3 0.6 0.4]
                    Lags: [0 1 4 6]
                  Degree: 6
               Dimension: 1
The lag operator polynomial object A now has nonzero coefficients at lags 0, 1, 4, and 6, and is degree 6.

When lag indices are placed inside parentheses, the result is another lag-based cell array that represents a subset of the original polynomial.

    A0 = A.Coefficients(0)

    A0 =
        1-D Lag-Indexed Cell Array Created at Lags [0] with
        Non-Zero Coefficients at Lags [0].

A0 is a new object that preserves lag-based indexing and is suitable for assignment to and from lag operator polynomials.

    class(A0)

    ans = 'internal.econ.LagIndexedArray'

In contrast, when lag indices are placed inside curly braces, the result has the same data type as the coefficients themselves:

    class(A.Coefficients{0})

    ans = 'double'
Difference Lag Operator Polynomials

You can express the differencing operator, $\Delta$, in lag operator polynomial notation as
$$\Delta = (1 - L).$$
More generally,
$$\Delta^D = (1 - L)^D.$$

To specify a first differencing operator polynomial using LagOp, specify coefficients 1 and –1 at lags 0 and 1:

    D1 = LagOp({1,-1},'Lags',[0,1])

    D1 =
        1-D Lag Operator Polynomial:
        -----------------------------
            Coefficients: [1 -1]
                    Lags: [0 1]
                  Degree: 1
               Dimension: 1

Similarly, the seasonal differencing operator in lag polynomial notation is
$$\Delta_s = (1 - L^s).$$
This has coefficients 1 and –1 at lags 0 and $s$, where $s$ is the periodicity of the seasonality. For example, for monthly data with periodicity $s = 12$,

    D12 = LagOp({1,-1},'Lags',[0,12])

    D12 =
        1-D Lag Operator Polynomial:
        -----------------------------
            Coefficients: [1 -1]
                    Lags: [0 12]
                  Degree: 12
               Dimension: 1

This results in a polynomial object with degree 12.

When a difference lag operator polynomial is applied to a time series $y_t$, $(1 - L)^D y_t$, this is equivalent to filtering the time series. Note that filtering a time series using a polynomial of degree $D$ results in the loss of the first $D$ observations.

Consider taking second differences of a time series $y_t$, $(1 - L)^2 y_t$. You can write this differencing polynomial as $(1 - L)^2 = (1 - L)(1 - L)$.

Create the second differencing polynomial by multiplying the polynomial D1 by itself to get the second-degree differencing polynomial:

    D2 = D1*D1

    D2 =
        1-D Lag Operator Polynomial:
        -----------------------------
            Coefficients: [1 -2 1]
                    Lags: [0 1 2]
                  Degree: 2
               Dimension: 1

The coefficients in the second-degree differencing polynomial correspond to the coefficients in the difference equation
$$(1 - L)^2 y_t = y_t - 2y_{t-1} + y_{t-2}.$$

To see the effect of filtering (differencing) on the length of a time series, simulate a data set with 10 observations to filter:

    rng('default')
    Y = randn(10,1);

Filter the time series Y using D2:

    Yf = filter(D2,Y);
    length(Yf)

    ans = 8

The filtered series has two fewer observations than the original series. The time indices for the new series can be optionally returned as a second output argument:

    [Yf,Tidx] = filter(D2,Y);

Note that the time indices are given relative to time 0. That is, the original series corresponds to times 0,...,9. The filtered series loses the observations at the first two times (times 0 and 1), resulting in a series corresponding to times 2,...,9.

You can also filter a time series, say Y, with a lag operator polynomial, say D2, using this shorthand syntax:

    Yf = D2(Y);
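As a cross-check, filtering with the second-degree differencing polynomial should agree with the built-in `diff` applied twice. The sketch below assumes Econometrics Toolbox is installed (for LagOp and its filter method) and reuses the names from the example above; the comparison against `diff` is an addition for illustration, and the tolerance allows for floating-point rounding.

```matlab
rng('default')
Y = randn(10,1);

% Second differencing polynomial (1 - L)^2, as constructed above
D1 = LagOp({1,-1},'Lags',[0,1]);
D2 = D1*D1;

% Filter with the lag operator polynomial
Yf = filter(D2,Y);

% Same transformation using base MATLAB second differences
Yd = diff(Y,2);

% Largest discrepancy between the two results
maxGap = max(abs(Yf(:) - Yd));
```

Both results have 8 elements, one for each of the times 2,...,9.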
See Also
LagOp | filter

Related Examples
• “Specifying Univariate Lag Operator Polynomials Interactively” on page 4-44
• “Nonseasonal Differencing” on page 2-13
• “Nonseasonal and Seasonal Differencing” on page 2-16
• “Plot the Impulse Response Function of Conditional Mean Model” on page 7-86

More About
• “Moving Average Filter” on page 2-21
Nonseasonal Differencing

This example shows how to take a nonseasonal difference of a time series. The time series is quarterly U.S. GDP measured from 1947 to 2005.

Load the GDP data set included with the toolbox.

    load Data_GDP
    Y = Data;
    N = length(Y);

    figure
    plot(Y)
    xlim([0,N])
    title('U.S. GDP')
The time series has a clear upward trend.

Take a first difference of the series to remove the trend,
$$\Delta y_t = (1 - L)y_t = y_t - y_{t-1}.$$
First create a differencing lag operator polynomial object, and then use it to filter the observed series.

    D1 = LagOp({1,-1},'Lags',[0,1]);
    dY = filter(D1,Y);

    figure
    plot(2:N,dY)
    xlim([0,N])
    title('First Differenced GDP Series')
The series still has some remaining upward trend after taking first differences.

Take a second difference of the series,
$$\Delta^2 y_t = (1 - L)^2 y_t = y_t - 2y_{t-1} + y_{t-2}.$$

    D2 = D1*D1;
    ddY = filter(D2,Y);

    figure
    plot(3:N,ddY)
    xlim([0,N])
    title('Second Differenced GDP Series')

The second-differenced series appears more stationary.
See Also
LagOp | filter

Related Examples
• “Transform Time Series Using Econometric Modeler App” on page 4-97
• “Nonseasonal and Seasonal Differencing” on page 2-16
• “Specify Lag Operator Polynomials” on page 2-9

More About
• “Data Transformations” on page 2-2
• “Trend-Stationary vs. Difference-Stationary Processes” on page 2-6
Nonseasonal and Seasonal Differencing

This example shows how to apply both nonseasonal and seasonal differencing using lag operator polynomial objects. The time series is monthly international airline passenger counts from 1949 to 1960.

Load the airline data set (Data_Airline.mat).

    load Data_Airline
    y = log(DataTimeTable.PSSG);
    T = length(y);

    figure
    plot(DataTimeTable.Time,y)
    title('Log Airline Passenger Counts')
The data shows a linear trend and a seasonal component with periodicity 12.

Take the first difference to address the linear trend, and the 12th difference to address the periodicity. If $y_t$ is the series to be transformed, the transformation is
$$\Delta\Delta_{12} y_t = (1 - L)(1 - L^{12})y_t,$$
where $\Delta$ denotes the difference operator, and $L$ denotes the lag operator.

Create the lag operator polynomials $1 - L$ and $1 - L^{12}$. Then, multiply them to get the desired lag operator polynomial.

    D1 = LagOp({1,-1},'Lags',[0,1]);
    D12 = LagOp({1,-1},'Lags',[0,12]);
    D = D1*D12

    D =
        1-D Lag Operator Polynomial:
        -----------------------------
            Coefficients: [1 -1 -1 1]
                    Lags: [0 1 12 13]
                  Degree: 13
               Dimension: 1
The first polynomial, $1 - L$, has coefficient 1 at lag 0 and coefficient -1 at lag 1. The seasonal differencing polynomial, $1 - L^{12}$, has coefficient 1 at lag 0, and -1 at lag 12. The product of these polynomials is
$$(1 - L)(1 - L^{12}) = 1 - L - L^{12} + L^{13},$$
which has coefficient 1 at lags 0 and 13, and coefficient -1 at lags 1 and 12.

Filter the data with differencing polynomial D to get the nonseasonally and seasonally differenced series.

    dY = filter(D,y);
    length(y) - length(dY)

    ans = 13

The filtered series is 13 observations shorter than the original series. This is due to applying a degree 13 polynomial filter.

    figure
    plot(DataTimeTable.Time(14:end),dY)
    title('Differenced Log Airline Passenger Counts')

The differenced series has neither the trend nor seasonal component exhibited by the original series.
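The combined operator can also be written out with plain array indexing, which makes the loss of 13 observations explicit. The sketch below uses a synthetic monthly series (a linear trend plus a fixed seasonal pattern) rather than the airline data; the pattern values and variable names are illustrative assumptions.

```matlab
% Synthetic monthly series: linear trend plus a repeating 12-month pattern
T = 48;
t = (1:T)';
seas = repmat([3 1 0 -1 -2 0 2 4 1 -1 -3 -4]',4,1);
y = 5 + 0.2*t + seas;

% Apply (1 - L^12) first, then (1 - L)
dSeas = y(13:end) - y(1:end-12);    % seasonal difference, T-12 values
dY = diff(dSeas);                   % first difference, T-13 values

% Equivalent expanded form: y(t) - y(t-1) - y(t-12) + y(t-13)
dY2 = y(14:end) - y(13:end-1) - y(2:end-12) + y(1:end-13);
```

Because this synthetic series contains only a linear trend and an exactly periodic component, both `dY` and `dY2` are zero up to rounding error.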
See Also
LagOp | filter

Related Examples
• “Transform Time Series Using Econometric Modeler App” on page 4-97
• “Nonseasonal Differencing” on page 2-13
• “Specify Lag Operator Polynomials” on page 2-9

More About
• “Data Transformations” on page 2-2
• “Trend-Stationary vs. Difference-Stationary Processes” on page 2-6
Time Series Decomposition

Time series decomposition involves separating a time series into several distinct components. There are three components that are typically of interest:

• $T_t$, a deterministic, nonseasonal secular trend component. This component is sometimes restricted to being a linear trend, though higher-degree polynomials are also used.
• $S_t$, a deterministic seasonal component with known periodicity. This component captures level shifts that repeat systematically within the same period (e.g., month or quarter) between successive years. It is often considered to be a nuisance component, and seasonal adjustment is a process for eliminating it.
• $I_t$, a stochastic irregular component. This component is not necessarily a white noise process. It can exhibit autocorrelation and cycles of unpredictable duration. For this reason, it is often thought to contain information about the business cycle, and is usually the most interesting component.

There are three functional forms that are most often used for representing a time series $y_t$ as a function of its trend, seasonal, and irregular components:

• Additive decomposition, where
$$y_t = T_t + S_t + I_t.$$
This is the classical decomposition. It is appropriate when there is no exponential growth in the series, and the amplitude of the seasonal component remains constant over time. For identifiability from the trend component, the seasonal and irregular components are assumed to fluctuate around zero.
• Multiplicative decomposition, where
$$y_t = T_t S_t I_t.$$
This decomposition is appropriate when there is exponential growth in the series, and the amplitude of the seasonal component grows with the level of the series. For identifiability from the trend component, the seasonal and irregular components are assumed to fluctuate around one.
• Log-additive decomposition, where
$$\log y_t = T_t + S_t + I_t.$$
This is an alternative to the multiplicative decomposition. If the original series has a multiplicative decomposition, then the logged series has an additive decomposition. Using the logs can be preferable when the time series contains many small observations. For identifiability from the trend component, the seasonal and irregular components are assumed to fluctuate around zero.

You can estimate the trend and seasonal components by using filters (moving averages) or parametric regression models. Given estimates $\hat{T}_t$ and $\hat{S}_t$, the irregular component is estimated as
$$\hat{I}_t = y_t - \hat{T}_t - \hat{S}_t$$
using the additive decomposition, and
$$\hat{I}_t = \frac{y_t}{\hat{T}_t \hat{S}_t}$$
using the multiplicative decomposition.

The series $y_t - \hat{T}_t$ (or $y_t/\hat{T}_t$ using the multiplicative decomposition) is called a detrended series. Similarly, the series $y_t - \hat{S}_t$ (or $y_t/\hat{S}_t$) is called a deseasonalized series.
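The relationship between the multiplicative and log-additive forms can be checked on synthetic components. This base MATLAB sketch builds a multiplicative series from known components; the growth rate, seasonal amplitude, and noise scale are arbitrary illustrative values, and the known components stand in for estimates.

```matlab
rng(4)
t = (1:48)';

% Known multiplicative components (seasonal and irregular fluctuate around one)
Tm = exp(0.02*t);                   % exponential-growth trend
Sm = exp(0.1*sin(2*pi*t/12));       % seasonal factor with periodicity 12
Im = exp(0.01*randn(48,1));         % irregular factor

% Multiplicative decomposition: y = T*S*I
yMul = Tm.*Sm.*Im;

% Taking logs gives an additive decomposition of log(y)
logGap = max(abs(log(yMul) - (log(Tm) + log(Sm) + log(Im))));

% Irregular, detrended, and deseasonalized series in multiplicative form
Ihat = yMul./(Tm.*Sm);
detrended = yMul./Tm;
deseasonalized = yMul./Sm;
```

With real data the components are unknown, so `Tm` and `Sm` would be replaced by estimates from filters or regression.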
See Also

Related Examples
• “Moving Average Trend Estimation” on page 2-22
• “Seasonal Adjustment Using a Stable Seasonal Filter” on page 2-39
• “Seasonal Adjustment Using S(n,m) Seasonal Filters” on page 2-45
• “Parametric Trend Estimation” on page 2-24

More About
• “Data Transformations” on page 2-2
• “Moving Average Filter” on page 2-21
• “Seasonal Adjustment” on page 2-37
Moving Average Filter

Some time series are decomposable into various trend components. To estimate a trend component without making parametric assumptions, you can consider using a filter.

Filters are functions that turn one time series into another. By appropriate filter selection, certain patterns in the original time series can be clarified or eliminated in the new series. For example, a low-pass filter removes high frequency components, yielding an estimate of the slow-moving trend.

A specific example of a linear filter is the moving average. Consider a time series $y_t$, $t = 1, \dots, N$. A symmetric (centered) moving average filter of window length $2q + 1$ is given by
$$m_t = \sum_{j=-q}^{q} b_j y_{t+j}, \qquad q < t \le N - q.$$

You can choose any weights $b_j$ that sum to one. To estimate a slow-moving trend, typically $q = 2$ is a good choice for quarterly data (a 5-term moving average), or $q = 6$ for monthly data (a 13-term moving average). Because symmetric moving averages have an odd number of terms, a reasonable choice for the weights is $b_j = 1/(4q)$ for $j = \pm q$, and $b_j = 1/(2q)$ otherwise. Implement a moving average by convolving a time series with a vector of weights using conv.

You cannot apply a symmetric moving average to the $q$ observations at the beginning and end of the series. This results in observation loss. One option is to use an asymmetric moving average at the ends of the series to preserve all observations.
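The weighting scheme described above can be built for any half-width q and applied with conv. The following base MATLAB sketch uses a made-up linear series so the result is easy to verify; the variable names are illustrative.

```matlab
q = 6;                                   % half-width for monthly data

% End weights 1/(4q), interior weights 1/(2q): 2q+1 terms in total
b = [1/(4*q); repmat(1/(2*q),2*q-1,1); 1/(4*q)];
wSum = sum(b);                           % weights sum to one

% Apply the filter; 'valid' keeps only points with a full window
y = (1:30)';                             % toy series with a linear trend
m = conv(y,b,'valid');                   % smoothed values for t = q+1,...,N-q

% A symmetric moving average reproduces a linear trend exactly
gap = max(abs(m - y(q+1:end-q)));
```

The output `m` has `numel(y) - 2*q` elements, reflecting the q observations lost at each end of the series.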
See Also
conv

Related Examples
• “Moving Average Trend Estimation” on page 2-22
• “Parametric Trend Estimation” on page 2-24

More About
• “Data Transformations” on page 2-2
• “Time Series Decomposition” on page 2-19
• “Seasonal Filters” on page 2-34
Moving Average Trend Estimation

This example shows how to estimate a long-term trend using a symmetric moving average function. This is a convolution that you can implement using conv. The time series is monthly international airline passenger counts from 1949 to 1960.

Load the airline data set (Data_Airline).

    load Data_Airline
    y = log(DataTimeTable.PSSG);
    T = length(y);

    figure
    plot(DataTimeTable.Time,y)
    title 'Log Airline Passenger Counts';
    hold on

The data shows a linear trend and a seasonal component with periodicity 12.

The periodicity of the data is monthly, so a 13-term moving average is a reasonable choice for estimating the long-term trend. Use weight 1/24 for the first and last terms, and weight 1/12 for the interior terms. Add the moving average trend estimate to the observed time series plot.

    wts = [1/24; repmat(1/12,11,1); 1/24];
    yS = conv(y,wts,'valid');

    h = plot(DataTimeTable.Time(7:end-6),yS,'r','LineWidth',2);
    legend(h,'13-Term Moving Average')
    hold off

When you use the shape parameter 'valid' in the call to conv, observations at the beginning and end of the series are lost. Here, the moving average has window length 13, so the first and last 6 observations do not have smoothed values.
See Also
conv

Related Examples
• “Seasonal Adjustment Using a Stable Seasonal Filter” on page 2-39
• “Seasonal Adjustment Using S(n,m) Seasonal Filters” on page 2-45
• “Parametric Trend Estimation” on page 2-24

More About
• “Time Series Decomposition” on page 2-19
• “Moving Average Filter” on page 2-21
Parametric Trend Estimation

This example shows how to estimate nonseasonal and seasonal trend components using parametric models. The time series is monthly accidental deaths in the U.S. from 1973 to 1978 (Brockwell and Davis, 2002).

Load Data

Load the accidental deaths data set.

    load Data_Accidental
    y = DataTimeTable.NUMD;
    T = length(y);

    figure
    plot(DataTimeTable.Time,y/1000)
    title('Monthly Accidental Deaths')
    ylabel('Number of Deaths (thousands)')
    hold on

The data shows a potential quadratic trend and a strong seasonal component with periodicity 12.

Fit Quadratic Trend

Fit the polynomial
$$T_t = \beta_0 + \beta_1 t + \beta_2 t^2$$
to the observed series.

    t = (1:T)';
    X = [ones(T,1) t t.^2];
    b = X\y;
    tH = X*b;

    h2 = plot(DataTimeTable.Time,tH/1000,'r','LineWidth',2);
    legend(h2,'Quadratic Trend Estimate')
    hold off

Detrend Original Series

Subtract the fitted quadratic trend from the original data.

    xt = y - tH;
Estimate Seasonal Indicator Variables

Create indicator (dummy) variables for each month. The first indicator is equal to one for January observations, and zero otherwise. The second indicator is equal to one for February observations, and zero otherwise. A total of 12 indicator variables are created for the 12 months. Regress the detrended series against the seasonal indicators.

    mo = repmat((1:12)',6,1);
    sX = dummyvar(mo);
    bS = sX\xt;
    st = sX*bS;

    figure
    plot(DataTimeTable.Time,st/1000)
    ylabel 'Number of Deaths (thousands)';
    title('Parametric Estimate of Seasonal Component (Indicators)')

In this regression, all 12 seasonal indicators are included in the design matrix. To prevent collinearity, an intercept term is not included (alternatively, you can include 11 indicators and an intercept term).

Deseasonalize Original Series

Subtract the estimated seasonal component from the original series.

    dt = y - st;

    figure
    plot(DataTimeTable.Time,dt/1000)
    title('Monthly Accidental Deaths (Deseasonalized)')
    ylabel('Number of Deaths (thousands)')
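Returning to the collinearity remark in the seasonal indicator regression, the following sketch shows why 12 indicators and an intercept cannot be used together, and that the two valid parameterizations give the same fitted seasonal component. It uses synthetic data and builds the indicators with a logical comparison (implicit expansion, available in R2016b and later) as a base MATLAB stand-in for dummyvar; all names and values are illustrative assumptions.

```matlab
rng(2)
mo = repmat((1:12)',6,1);           % month index for 6 years of monthly data
sX = double(mo == 1:12);            % 72-by-12 matrix of monthly indicators

% Synthetic detrended series: month-specific level plus noise
xt = sX*(1:12)' + randn(72,1);

% Option 1: all 12 indicators, no intercept (coefficients are monthly means)
bS = sX\xt;
fit1 = sX*bS;

% Option 2: intercept plus 11 indicators (January is the baseline)
X2 = [ones(72,1) sX(:,2:end)];
b2 = X2\xt;
fit2 = X2*b2;

% Intercept plus all 12 indicators is rank deficient (rank 12, not 13)
rFull = rank([ones(72,1) sX]);
```

The indicator columns sum to a column of ones, which is exactly the intercept column, so adding it to all 12 indicators introduces a perfect linear dependence.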
The quadratic trend is much clearer with the seasonal component removed.

Estimate Irregular Component

Subtract the trend and seasonal estimates from the original series. The remainder is an estimate of the irregular component.

    bt = y - tH - st;

    figure
    plot(DataTimeTable.Time,bt/1000)
    title('Irregular Component')
    ylabel('Number of Deaths (thousands)')

You can optionally model the irregular component using a stochastic process model.

References

Box, G. E. P., G. M. Jenkins, and G. C. Reinsel. Time Series Analysis: Forecasting and Control. 3rd ed. Englewood Cliffs, NJ: Prentice Hall, 1994.
See Also
dummyvar

Related Examples
• “Moving Average Trend Estimation” on page 2-22
• “Seasonal Adjustment Using a Stable Seasonal Filter” on page 2-39
• “Seasonal Adjustment Using S(n,m) Seasonal Filters” on page 2-45

More About
• “Time Series Decomposition” on page 2-19
• “Seasonal Adjustment” on page 2-37
Hodrick-Prescott Filter

The Hodrick-Prescott (HP) filter is a specialized filter for trend and business cycle estimation (no seasonal component). Suppose a time series $y_t$ can be additively decomposed into a trend and business cycle component. Denote the trend component $g_t$ and the cycle component $c_t$. Then $y_t = g_t + c_t$.

The HP filter finds a trend estimate, $\hat{g}_t$, by solving a penalized optimization problem. The smoothness of the trend estimate depends on the choice of penalty parameter. The cycle component, which is often of interest to business cycle analysts, is estimated as $\hat{c}_t = y_t - \hat{g}_t$.

hpfilter returns the estimated trend and cycle components of a time series.
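To make the penalized problem concrete, here is a minimal sketch under a stated assumption: the trend minimizes the sum of squared cycle values plus the penalty parameter times the sum of squared second differences of the trend. The first-order conditions of that problem are a linear system, which the sketch solves densely on synthetic data. This is not a description of how hpfilter is implemented, and every name below is illustrative.

```matlab
% Sketch only: HP trend via the normal equations of the penalized problem
T = 40;
t = (1:T)';
y = 0.05*t + 0.5*sin(2*pi*t/8);      % synthetic series, not toolbox data
lambda = 1600;                        % penalty (smoothing) parameter

D = diff(eye(T),2);                   % (T-2)-by-T second-difference matrix
g = (eye(T) + lambda*(D'*D)) \ y;     % trend estimate
c = y - g;                            % cycle estimate

% A straight line has zero second differences, so it incurs no penalty
% and passes through the filter unchanged
yLine = 2 + 3*t;
gLine = (eye(T) + lambda*(D'*D)) \ yLine;
lineErr = max(abs(gLine - yLine));    % numerically zero
```

A larger lambda weights the second-difference penalty more heavily and gives a smoother trend; a smaller lambda lets the trend follow the data more closely.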
See Also
hpfilter

Related Examples
• “Use Hodrick-Prescott Filter to Reproduce Original Result” on page 2-30

More About
• “Moving Average Filter” on page 2-21
• “Seasonal Filters” on page 2-34
• “Time Series Decomposition” on page 2-19
Use Hodrick-Prescott Filter to Reproduce Original Result

This example shows how to use the Hodrick-Prescott filter to decompose a time series. The Hodrick-Prescott filter separates a time series into growth and cyclical components with

$$y_t = g_t + c_t,$$

where $y_t$ is a time series, $g_t$ is the growth component of $y_t$, and $c_t$ is the cyclical component of $y_t$ for $t = 1, \ldots, T$. The objective function for the Hodrick-Prescott filter has the form

$$\sum_{t=1}^{T} c_t^2 + \lambda \sum_{t=2}^{T-1} \bigl((g_{t+1} - g_t) - (g_t - g_{t-1})\bigr)^2$$
with a smoothing parameter $\lambda$. The programming problem is to minimize the objective over all $g_1, \ldots, g_T$. The conceptual basis for this programming problem is that the first sum minimizes the difference between the data and its growth component (which is the cyclical component) and the second sum minimizes the second-order difference of the growth component, which is analogous to minimization of the second derivative of the growth component. Note that this filter is equivalent to a cubic spline smoother.

Use Hodrick-Prescott Filter to Analyze GNP Cyclicality

Using data similar to the data found in Hodrick and Prescott [1], plot the cyclical component of GNP. This result should coincide with the results in the paper. However, since the GNP data here and in the paper are both adjusted for seasonal variations with conversion from nominal to real values, differences can be expected due to differences in the sources for the pair of adjustments. Note that our data comes from the St. Louis Federal Reserve FRED database [2], which was downloaded with the Datafeed Toolbox™.

    load Data_GNP

    % Change the following two lines to try alternative periods
    startdate = datetime(1950,1,1);
    enddate = datetime(1979,4,1);
    DTT = DataTimeTable(startdate:enddate,:);
    DTT.GNPRLog = log(DTT.GNPR);
We run the filter for different smoothing parameters $\lambda$ = 400, 1600, 6400, and $\infty$. The infinite smoothing parameter just detrends the data.

    [TTbl4,CTbl4] = hpfilter(DTT,400,DataVariables="GNPRLog");
    [TTbl16,CTbl16] = hpfilter(DTT,1600,DataVariables="GNPRLog");
    [TTbl64,CTbl64] = hpfilter(DTT,6400,DataVariables="GNPRLog");
    [TTblInf,CTblInf] = hpfilter(DTT,Inf,DataVariables="GNPRLog");
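One way to read "just detrends": as the smoothing parameter grows, the penalty pushes the second differences of the growth component toward zero, so the growth component approaches a straight line, and the best-fitting straight line is the least-squares line. The sketch below checks this numerically on synthetic data, using a large finite value as a stand-in for Inf and a dense linear solve rather than hpfilter. All names and values are illustrative assumptions.

```matlab
T = 40;
t = (1:T)';
y = 1 + 0.05*t + 0.5*sin(2*pi*t/8);   % synthetic series

D = diff(eye(T),2);                    % second-difference matrix
lambdaBig = 1e9;                       % large finite stand-in for Inf
gBig = (eye(T) + lambdaBig*(D'*D)) \ y;

X = [ones(T,1) t];
gOLS = X*(X\y);                        % ordinary least-squares line

gap = max(abs(gBig - gOLS));           % small relative to the scale of y
```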
Plot Cyclical GNP and Its Relationship With Long-Term Trend

The following code generates Figure 1 from Hodrick and Prescott [1].

    plot(DTT.Time,CTbl16.GNPRLog,"b");
    hold all
    plot(DTT.Time,CTblInf.GNPRLog - CTbl16.GNPRLog,"r");
    title("\bf Figure 1 from Hodrick and Prescott");
    ylabel("\bf GNP Growth");
    legend(["Cyclical GNP" "Difference"]);
    hold off
The blue line is the cyclical component with smoothing parameter 1600 and the red line is the difference with respect to the detrended cyclical component. The difference is smooth enough to suggest that the choice of smoothing parameter is appropriate.

Statistical Tests on Cyclical GNP

We will now reconstruct Table 1 from Hodrick and Prescott [1]. With the cyclical components, we compute standard deviations, autocorrelations for lags 1 to 10, and perform a Dickey-Fuller unit root test to assess non-stationarity. As in the article, we see that as lambda increases, standard deviations increase, autocorrelations increase over longer lags, and the unit root hypothesis is rejected for all but the detrended case. Together, these results imply that any of the cyclical series with finite smoothing is effectively stationary.

    ACFTbl4 = autocorr(CTbl4,NumLags=10,DataVariable="GNPRLog");
    ACFTbl16 = autocorr(CTbl16,NumLags=10,DataVariable="GNPRLog");
    ACFTbl64 = autocorr(CTbl64,NumLags=10,DataVariable="GNPRLog");
    ACFTblInf = autocorr(CTblInf,NumLags=10,DataVariable="GNPRLog");

    [StatTbl4] = adftest(CTbl4,Model="ARD");
    [StatTbl16] = adftest(CTbl16,Model="ARD");
    [StatTbl64] = adftest(CTbl64,Model="ARD");
    [StatTblInf] = adftest(CTblInf,Model="ARD");

    displayResults(CTbl4,CTbl16,CTbl64,CTblInf, ...
        ACFTbl4,ACFTbl16,ACFTbl64,ACFTblInf, ...
        StatTbl4,StatTbl16,StatTbl64,StatTblInf);

    Table 1 from Hodrick and Prescott Reference
                Smoothing Parameter
                       400       1600       6400   Infinity
    Std. Dev.         1.52       1.75       2.06       3.11
    Autocorrelations
             1        0.74       0.78       0.82       0.92
             2        0.38       0.47       0.57       0.81
             3        0.05       0.17       0.33       0.70
             4       -0.21      -0.07       0.12       0.59
             5       -0.36      -0.24      -0.03       0.50
             6       -0.39      -0.30      -0.10       0.44
             7       -0.35      -0.31      -0.13       0.39
             8       -0.28      -0.29      -0.15       0.35
             9       -0.22      -0.26      -0.15       0.31
            10       -0.19      -0.25      -0.17       0.26
    Unit Root        -4.35      -4.13      -3.79      -2.28
    Reject H0            1          1          1          0
Local Function

    function displayResults(gnpcycle4,gnpcycle16,gnpcycle64,gnpcycleinf,...
        gnpacf4,gnpacf16,gnpacf64,gnpacfinf,...
        gnptest4,gnptest16,gnptest64,gnptestinf)
    % DISPLAYRESULTS Display cyclical GNP test results in tabular form

    fprintf(1,'Table 1 from Hodrick and Prescott Reference\n');
    fprintf(1,' %10s %s\n',' ','Smoothing Parameter');
    fprintf(1,' %10s %10s %10s %10s %10s\n',' ','400','1600','6400','Infinity');
    fprintf(1,' %-10s %10.2f %10.2f %10.2f %10.2f\n','Std. Dev.', ...
        100*std(gnpcycle4.GNPRLog),100*std(gnpcycle16.GNPRLog), ...
        100*std(gnpcycle64.GNPRLog),100*std(gnpcycleinf.GNPRLog));
    fprintf(1,' Autocorrelations\n');
    for i = 2:11
        fprintf(1,' %10g %10.2f %10.2f %10.2f %10.2f\n',(i-1), ...
            gnpacf4.ACF(i),gnpacf16.ACF(i),gnpacf64.ACF(i),gnpacfinf.ACF(i))
    end
    fprintf(1,' %-10s %10.2f %10.2f %10.2f %10.2f\n','Unit Root', ...
        gnptest4.stat,gnptest16.stat,gnptest64.stat,gnptestinf.stat);
    fprintf(1,' %-10s %10d %10d %10d %10d\n','Reject H0',...
        gnptest4.h,gnptest16.h,gnptest64.h,gnptestinf.h);
    end
References

[1] Hodrick, Robert J., and Edward C. Prescott. "Postwar U.S. Business Cycles: An Empirical Investigation." Journal of Money, Credit, and Banking. Vol. 29, No. 1, 1997, pp. 1–16.

[2] U.S. Federal Reserve Economic Data (FRED), Federal Reserve Bank of St. Louis, https://fred.stlouisfed.org/.

See Also
hpfilter

More About
• “Hodrick-Prescott Filter” on page 2-29
• “Time Series Decomposition” on page 2-19
Seasonal Filters

In this section...
“What Is a Seasonal Filter?” on page 2-34
“Stable Seasonal Filter” on page 2-34
“$S_{n \times m}$ seasonal filter” on page 2-35
What Is a Seasonal Filter?

You can use a seasonal filter (moving average) to estimate the seasonal component of a time series. For example, seasonal moving averages play a large role in the X-11-ARIMA seasonal adjustment program of Statistics Canada [1] and the X-12-ARIMA seasonal adjustment program of the U.S. Census Bureau [2].

For observations made during period $k$, $k = 1,\ldots,s$ (where $s$ is the known periodicity of the seasonality), a seasonal filter is a convolution of weights and observations made during past and future periods $k$. For example, given monthly data ($s = 12$), a smoothed January observation is a symmetric, weighted average of January data.

In general, for a time series $x_t$, $t = 1,\ldots,N$, the seasonally smoothed observation at time $k + js$, $j = 1,\ldots,N/s - 1$, is

$$\tilde{s}_{k+js} = \sum_{l=-r}^{r} a_l \, x_{k+(j+l)s}, \qquad \text{(2-1)}$$

with weights $a_l$ such that $\sum_{l=-r}^{r} a_l = 1$.

The two most commonly used seasonal filters are the stable seasonal filter and the $S_{n \times m}$ seasonal filter.
Stable Seasonal Filter

Use a stable seasonal filter if the seasonal level does not change over time, or if you have a short time series (under 5 years).

Let $n_k$ be the total number of observations made in period $k$. A stable seasonal filter is given by

$$\tilde{s}_k = \frac{1}{n_k} \sum_{j=1}^{(N/s)-1} x_{k+js},$$

for $k = 1,\ldots,s$, and $\tilde{s}_k = \tilde{s}_{k-s}$ for $k > s$.

Define $\bar{s} = (1/s)\sum_{k=1}^{s} \tilde{s}_k$. For identifiability from the trend component,
• Use $\hat{s}_k = \tilde{s}_k - \bar{s}$ to estimate the seasonal component for an additive decomposition model (that is, constrain the component to fluctuate around zero).
• Use $\hat{s}_k = \tilde{s}_k / \bar{s}$ to estimate the seasonal component for a multiplicative decomposition model (that is, constrain the component to fluctuate around one).
$S_{n \times m}$ seasonal filter

To apply an $S_{n \times m}$ seasonal filter, take a symmetric $n$-term moving average of $m$-term averages. This is equivalent to taking a symmetric, unequally weighted moving average with $n + m - 1$ terms (that is, use $r = (n + m - 2)/2$ in “Equation 2-1”, so that the sum has $2r + 1 = n + m - 1$ terms).

An $S_{3 \times 3}$ filter has five terms with weights 1/9, 2/9, 1/3, 2/9, 1/9. To illustrate, suppose you have monthly data over 10 years. Let $\mathrm{Jan}_{yy}$ denote the value observed in January, 20yy. The $S_{3 \times 3}$-filtered value for January 2005 is

$$\widehat{\mathrm{Jan}}_{05} = \frac{1}{3}\left[\frac{1}{3}\left(\mathrm{Jan}_{03} + \mathrm{Jan}_{04} + \mathrm{Jan}_{05}\right) + \frac{1}{3}\left(\mathrm{Jan}_{04} + \mathrm{Jan}_{05} + \mathrm{Jan}_{06}\right) + \frac{1}{3}\left(\mathrm{Jan}_{05} + \mathrm{Jan}_{06} + \mathrm{Jan}_{07}\right)\right].$$

Similarly, an $S_{3 \times 5}$ filter has seven terms with weights 1/15, 2/15, 1/5, 1/5, 1/5, 2/15, 1/15.

When using a symmetric filter, observations are lost at the beginning and end of the series. You can apply asymmetric weights at the ends of the series to prevent observation loss.

To center the seasonal estimate, define a moving average of the seasonally filtered series, $\bar{s}_t = \sum_{j=-q}^{q} b_j \tilde{s}_{t+j}$. A reasonable choice for the weights is $b_j = 1/(4q)$ for $j = \pm q$ and $b_j = 1/(2q)$ otherwise. Here, $q = 2$ for quarterly data (a 5-term average), or $q = 6$ for monthly data (a 13-term average).

For identifiability from the trend component,
• Use $\hat{s}_t = \tilde{s}_t - \bar{s}_t$ to estimate the seasonal component of an additive model (that is, constrain the component to fluctuate approximately around zero).
• Use $\hat{s}_t = \tilde{s}_t / \bar{s}_t$ to estimate the seasonal component of a multiplicative model (that is, constrain the component to fluctuate approximately around one).
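The quoted weights can be checked directly, because a moving average of moving averages is the convolution of the two sets of equal weights. The short sketch below does this for the 3×3 and 3×5 filters and builds the monthly centering weights; the variable names are illustrative.

```matlab
% S(3,3): a 3-term average of 3-term averages
w33 = conv(ones(3,1)/3, ones(3,1)/3);     % equals [1 2 3 2 1]'/9

% S(3,5): a 3-term average of 5-term averages
w35 = conv(ones(3,1)/3, ones(5,1)/5);     % equals [1 2 3 3 3 2 1]'/15

% Centering weights for monthly data (q = 6):
% 1/(4q) at the two ends and 1/(2q) for the interior terms
q = 6;
b = [1/(4*q); repmat(1/(2*q),2*q-1,1); 1/(4*q)];
```

Each weight vector sums to one, so a series that is constant within a period passes through the filter unchanged.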
References

[1] Dagum, E. B. The X-11-ARIMA Seasonal Adjustment Method. Number 12–564E. Statistics Canada, Ottawa, 1980.

[2] Findley, D. F., B. C. Monsell, W. R. Bell, M. C. Otto, and B.-C. Chen. “New Capabilities and Methods of the X-12-ARIMA Seasonal-Adjustment Program.” Journal of Business & Economic Statistics. Vol. 16, Number 2, 1998, pp. 127–152.
See Also

Related Examples
• “Seasonal Adjustment Using a Stable Seasonal Filter” on page 2-39
• “Seasonal Adjustment Using S(n,m) Seasonal Filters” on page 2-45

More About
• “Moving Average Filter” on page 2-21
• “Seasonal Adjustment” on page 2-37
• “Time Series Decomposition” on page 2-19
Seasonal Adjustment

In this section...
“What Is Seasonal Adjustment?” on page 2-37
“Deseasonalized Series” on page 2-37
“Seasonal Adjustment Process” on page 2-37

What Is Seasonal Adjustment?

Economists and other practitioners are sometimes interested in extracting the global trends and business cycles of a time series, free from the effect of known seasonality. Small movements in the trend can be masked by a seasonal component, a trend with fixed and known periodicity (e.g., monthly or quarterly). The presence of seasonality can make it difficult to compare relative changes in two or more series.

Seasonal adjustment is the process of removing a nuisance periodic component. The result of a seasonal adjustment is a deseasonalized time series. Deseasonalized data is useful for exploring the trend and any remaining irregular component. Because information is lost during the seasonal adjustment process, you should retain the original data for future modeling purposes.
Deseasonalized Series

Consider decomposing a time series, $y_t$, into three components:
• Trend component, $T_t$
• Seasonal component, $S_t$ with known periodicity $s$
• Irregular (stationary) stochastic component, $I_t$

The most common decompositions are additive, multiplicative, and log-additive.

To seasonally adjust a time series, first obtain an estimate of the seasonal component, $\hat{S}_t$. The estimate $\hat{S}_t$ should be constrained to fluctuate around zero (at least approximately) for additive models, and around one, approximately, for multiplicative models. These constraints allow the seasonal component to be identifiable from the trend component.

Given $\hat{S}_t$, the deseasonalized series is calculated by subtracting (or dividing by) the estimated seasonal component, depending on the assumed decomposition.
• For an additive decomposition, the deseasonalized series is given by $d_t = y_t - \hat{S}_t$.
• For a multiplicative decomposition, the deseasonalized series is given by $d_t = y_t / \hat{S}_t$.
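For reference, the three decompositions named in this section are usually written as follows. This is a sketch in the notation used here; the log-additive form corresponds to the multiplicative form after taking logarithms, which assumes a positive series.

```latex
\begin{aligned}
\text{additive:}       \quad & y_t = T_t + S_t + I_t, \\
\text{multiplicative:} \quad & y_t = T_t \, S_t \, I_t, \\
\text{log-additive:}   \quad & \log y_t = T_t + S_t + I_t.
\end{aligned}
```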
Seasonal Adjustment Process

To best estimate the seasonal component of a series, you should first estimate and remove the trend component. Conversely, to best estimate the trend component, you should first estimate and remove the seasonal component. Thus, seasonal adjustment is typically performed as an iterative process. The following steps for seasonal adjustment resemble those used within the X-12-ARIMA seasonal adjustment program of the U.S. Census Bureau [1].

1. Obtain a first estimate of the trend component, $\hat{T}_t$, using a moving average or parametric trend estimate.
2. Detrend the original series. For an additive decomposition, calculate $x_t = y_t - \hat{T}_t$. For a multiplicative decomposition, calculate $x_t = y_t / \hat{T}_t$.
3. Apply a seasonal filter to the detrended series, $x_t$, to obtain an estimate of the seasonal component, $\hat{S}_t$. Center the estimate to fluctuate around zero or one, depending on the chosen decomposition. Use an $S_{3 \times 3}$ seasonal filter if you have adequate data, or a stable seasonal filter otherwise.
4. Deseasonalize the original series. For an additive decomposition, calculate $d_t = y_t - \hat{S}_t$. For a multiplicative decomposition, calculate $d_t = y_t / \hat{S}_t$.
5. Obtain a second estimate of the trend component, $\hat{T}_t$, using the deseasonalized series $d_t$. Consider using a Henderson filter [1], with asymmetric weights at the ends of the series.
6. Detrend the original series again. For an additive decomposition, calculate $x_t = y_t - \hat{T}_t$. For a multiplicative decomposition, calculate $x_t = y_t / \hat{T}_t$.
7. Apply a seasonal filter to the detrended series, $x_t$, to obtain an estimate of the seasonal component, $\hat{S}_t$. Consider using an $S_{3 \times 5}$ seasonal filter if you have adequate data, or a stable seasonal filter otherwise.
8. Deseasonalize the original series. For an additive decomposition, calculate $d_t = y_t - \hat{S}_t$. For a multiplicative decomposition, calculate $d_t = y_t / \hat{S}_t$. This is the final deseasonalized series.
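The first pass (steps 1 through 4) can be sketched for an additive decomposition as follows. The series is synthetic, the seasonal filter is the stable one, and the end handling simply repeats the nearest full-window trend value. These are illustrative choices, not the X-12-ARIMA procedure itself, and steps 5 through 8 are not shown.

```matlab
% Synthetic monthly series: linear trend plus a zero-mean seasonal pattern
s = 12;  T = 72;
t = (1:T)';
sTrue = 10*sin(2*pi*(1:s)'/s);            % true seasonal effect for each month
y = 100 + 0.5*t + repmat(sTrue,T/s,1);

% Step 1: first trend estimate with a 13-term moving average
w13 = [1/24; repmat(1/12,11,1); 1/24];
That = conv(y,w13,'same');
That(1:6) = That(7);                      % crude end handling: repeat values
That(T-5:T) = That(T-6);

% Step 2: detrend (additive)
x = y - That;

% Step 3: stable seasonal filter, then center around zero
sTilde = zeros(s,1);
for k = 1:s
    sTilde(k) = mean(x(k:s:T));           % average all observations of period k
end
sHat = sTilde - mean(sTilde);
Shat = repmat(sHat,T/s,1);

% Step 4: deseasonalize the original series
d = y - Shat;
```

In this synthetic case the seasonal estimate differs slightly from the true pattern, because the repeated end values of the trend estimate leak a small amount of trend into the first and last half-years. Reducing that kind of error is the motivation for the second pass in steps 5 through 8.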
References

[1] Findley, D. F., B. C. Monsell, W. R. Bell, M. C. Otto, and B.-C. Chen. “New Capabilities and Methods of the X-12-ARIMA Seasonal-Adjustment Program.” Journal of Business & Economic Statistics. Vol. 16, Number 2, 1998, pp. 127–152.

See Also

Related Examples
• “Moving Average Trend Estimation” on page 2-22
• “Seasonal Adjustment Using a Stable Seasonal Filter” on page 2-39
• “Seasonal Adjustment Using S(n,m) Seasonal Filters” on page 2-45
• “Parametric Trend Estimation” on page 2-24

More About
• “Time Series Decomposition” on page 2-19
• “Seasonal Filters” on page 2-34
• “Moving Average Filter” on page 2-21
Seasonal Adjustment Using a Stable Seasonal Filter

This example shows how to use a stable seasonal filter to deseasonalize a time series (using an additive decomposition). The time series is monthly accidental deaths in the U.S. from 1973 to 1978 (Brockwell and Davis, 2002).

Load Data

Load the accidental deaths data set.

    load Data_Accidental
    y = DataTimeTable.NUMD;
    T = length(y);

    figure
    plot(DataTimeTable.Time,y/1000)
    title('Monthly Accidental Deaths')
    ylabel('Number of deaths (thousands)')
    hold on

The data exhibits a strong seasonal component with periodicity 12.
Apply 13-term moving average

Smooth the data using a 13-term moving average. To prevent observation loss, repeat the first and last smoothed values six times. Subtract the smoothed series from the original series to detrend the data. Add the moving average trend estimate to the observed time series plot.

    sW13 = [1/24; repmat(1/12,11,1); 1/24];
    yS = conv(y,sW13,'same');
    yS(1:6) = yS(7);
    yS(T-5:T) = yS(T-6);

    xt = y-yS;

    plot(DataTimeTable.Time,yS/1000,'r','LineWidth',2);
    legend('13-Term Moving Average')
    hold off
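The reason for overwriting the first and last six smoothed values can be seen on a constant series. With the 'same' option, conv keeps the output the same length as the input by treating values beyond the ends as zeros, so the first and last six outputs are not full 13-term averages. The sketch below uses an illustrative constant series, not the accidental deaths data.

```matlab
w13 = [1/24; repmat(1/12,11,1); 1/24];   % 13-term weights from the example
y = ones(30,1);                           % constant series (illustrative)

yS = conv(y,w13,'same');                  % same length as y, zero-padded at the ends

interiorErr = max(abs(yS(7:24) - 1));     % interior reproduces the constant
edgeValue   = yS(1);                      % less than 1: part of the window is padding

% Edge handling used in the example: repeat the nearest full-window values
yS(1:6) = yS(7);
yS(25:30) = yS(24);
```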
The detrended time series is xt. Using the shape parameter 'same' when calling conv returns a smoothed series the same length as the original series.

Create Seasonal Indices

Create a cell array, sidx, to store the indices corresponding to each period. The data is monthly, with periodicity 12, so the first element of sidx is a vector with elements 1, 13, 25,...,61 (corresponding to January observations). The second element of sidx is a vector with elements 2, 14, 26,...,62 (corresponding to February observations). This is repeated for all 12 months.

    s = 12;
    sidx = cell(s,1);
    for i = 1:s
        sidx{i,1} = i:s:T;
    end

    sidx{1:2}

    ans = 1×6
         1    13    25    37    49    61

    ans = 1×6
         2    14    26    38    50    62
Using a cell array to store the indices allows for the possibility that each period does not occur the same number of times within the span of the observed series.

Apply Stable Seasonal Filter

Apply a stable seasonal filter to the detrended series, xt. Using the indices constructed in the previous step, average the detrended data corresponding to each period. That is, average all of the January values (at indices 1, 13, 25,...,61), and then average all of the February values (at indices 2, 14, 26,...,62), and so on for the remaining months. Put the smoothed values back into a single vector.

Center the seasonal estimate to fluctuate around zero.

    sst = cellfun(@(x) mean(xt(x)),sidx);

    % Store smoothed values in a vector of length T
    nc = floor(T/s); % Num. complete years
    rm = mod(T,s);   % Num. extra months
    sst = [repmat(sst,nc,1);sst(1:rm)];

    % Center the seasonal estimate (additive)
    sBar = mean(sst);
    sst = sst-sBar;

    figure
    plot(DataTimeTable.Time,sst/1000)
    title('Stable Seasonal Component')
    ylabel('Number of deaths (thousands)')
The stable seasonal component has constant amplitude across the series. The seasonal estimate is centered, and fluctuates around zero.

Deseasonalize Series

Subtract the estimated seasonal component from the original data.

    dt = y - sst;

    figure
    plot(DataTimeTable.Time,dt/1000)
    title('Deseasonalized Series')
    ylabel('Number of deaths (thousands)')

The deseasonalized series consists of the long-term trend and irregular components. A large-scale quadratic trend in the number of accidental deaths is clear with the seasonal component removed.

References

Brockwell, P. J. and R. A. Davis. Introduction to Time Series and Forecasting. 2nd ed. New York, NY: Springer, 2002.
See Also
conv | cellfun

Related Examples
• “Moving Average Trend Estimation” on page 2-22
• “Seasonal Adjustment Using S(n,m) Seasonal Filters” on page 2-45
• “Parametric Trend Estimation” on page 2-24

More About
• “Time Series Decomposition” on page 2-19
• “Moving Average Filter” on page 2-21
• “Seasonal Filters” on page 2-34
• “Seasonal Adjustment” on page 2-37
Seasonal Adjustment Using S(n,m) Seasonal Filters

This example shows how to apply $S_{n \times m}$ seasonal filters to deseasonalize a time series (using a multiplicative decomposition). The time series is monthly international airline passenger counts from 1949 to 1960.

Load Data

Load the airline data set.

    load Data_Airline
    y = DataTimeTable.PSSG;
    T = length(y);

    figure
    plot(DataTimeTable.Time,y)
    title('Airline Passenger Counts')
    hold on
The data shows an upward linear trend and a seasonal component with periodicity 12.

Detrend Data Using 13-term Moving Average

Before estimating the seasonal component, estimate and remove the linear trend. Apply a 13-term symmetric moving average, repeating the first and last smoothed values six times to prevent data loss. Use weight 1/24 for the first and last terms in the moving average, and weight 1/12 for all interior terms.

Divide the original series by the smoothed series to detrend the data. Add the moving average trend estimate to the observed time series plot.

    sW13 = [1/24;repmat(1/12,11,1);1/24];
    yS = conv(y,sW13,'same');
    yS(1:6) = yS(7);
    yS(T-5:T) = yS(T-6);

    xt = y./yS;

    plot(DataTimeTable.Time,yS,'r','LineWidth',2)
    legend(["Passenger counts" "13-Term Moving Average"])
    hold off
Create Seasonal Indices

Create a cell array, sidx, to store the indices corresponding to each period. The data is monthly, with periodicity 12, so the first element of sidx is a vector with elements 1, 13, 25,...,133 (corresponding to January observations). The second element of sidx is a vector with elements 2, 14, 26,...,134 (corresponding to February observations). This is repeated for all 12 months.

    s = 12;
    sidx = cell(s,1); % Preallocation

    for i = 1:s
        sidx{i,1} = i:s:T;
    end

    sidx{1:2}

    ans = 1×12
         1    13    25    37    49    61    73    85    97   109   121   133

    ans = 1×12
         2    14    26    38    50    62    74    86    98   110   122   134
Using a cell array to store the indices allows for the possibility that each period does not occur the same number of times within the span of the observed series.

Apply S(3,3) Filter

Apply a 5-term $S_{3 \times 3}$ seasonal moving average to the detrended series xt. That is, apply a moving average to the January values (at indices 1, 13, 25,...,133), and then apply a moving average to the February series (at indices 2, 14, 26,...,134), and so on for the remaining months.

Use asymmetric weights at the ends of the moving average (using conv2). Put the smoothed values back into a single vector.

To center the seasonal component around one, estimate, and then divide by, a 13-term moving average of the estimated seasonal component.

    % S3x3 seasonal filter
    % Symmetric weights
    sW3 = [1/9;2/9;1/3;2/9;1/9];
    % Asymmetric weights for end of series
    aW3 = [.259 .407;.37 .407;.259 .185;.111 0];

    % Apply filter to each month
    shat = NaN*y;
    for i = 1:s
        ns = length(sidx{i});
        first = 1:4;
        last = ns - 3:ns;
        dat = xt(sidx{i});

        sd = conv(dat,sW3,'same');
        sd(1:2) = conv2(dat(first),1,rot90(aW3,2),'valid');
        sd(ns -1:ns) = conv2(dat(last),1,aW3,'valid');
        shat(sidx{i}) = sd;
    end

    % 13-term moving average of filtered series
    sW13 = [1/24;repmat(1/12,11,1);1/24];
    sb = conv(shat,sW13,'same');
    sb(1:6) = sb(s+1:s+6);
    sb(T-5:T) = sb(T-s-5:T-s);

    % Center to get final estimate
    s33 = shat./sb;

    figure
    plot(DataTimeTable.Time,s33)
    title('Estimated Seasonal Component')
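The conv2 calls in the filtering loop are compact, so it may help to see what they compute. In the three-input form, conv2 convolves the columns of the weight matrix with the data vector; with the 'valid' option and a data vector as long as the weight columns, only the single fully overlapping term is kept for each column. The result is one weighted sum per column, with the weights applied in reverse row order. The sketch below compares that with an explicit product, using made-up data values and the end weights from the example.

```matlab
x   = [10; 20; 30; 40];                          % four values of one month (made up)
aW3 = [.259 .407; .37 .407; .259 .185; .111 0];  % end weights from the example

% End of series: x holds the last four observations
viaConv2End = conv2(x,1,aW3,'valid');            % 1-by-2 row vector
byHandEnd   = x' * flipud(aW3);                  % same sums, written out

% Start of series: x holds the first four observations;
% rot90(aW3,2) reverses both the rows and the columns
viaConv2Start = conv2(x,1,rot90(aW3,2),'valid');
byHandStart   = x' * fliplr(aW3);
```

Under this reading, the first column of aW3 gives the weights for the second-to-last observation and the second column gives the weights for the last observation, with the largest weights falling on the observations nearest the end.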
Notice that the seasonal level changes over the range of the data. This illustrates the difference between an $S_{n \times m}$ seasonal filter and a stable seasonal filter. A stable seasonal filter assumes that the seasonal level is constant over the range of the data.

Apply 13-term Henderson Filter

To get an improved estimate of the trend component, apply a 13-term Henderson filter to the seasonally adjusted series. The necessary symmetric and asymmetric weights are provided in the following code.

    % Deseasonalize series
    dt = y./s33;

    % Henderson filter weights
    sWH = [-0.019;-0.028;0;.066;.147;.214;
          .24;.214;.147;.066;0;-0.028;-0.019];
    % Asymmetric weights for end of series
    aWH = [-.034  -.017   .045   .148   .279   .421;
           -.005   .051   .130   .215   .292   .353;
            .061   .135   .201   .241   .254   .244;
            .144   .205   .230   .216   .174   .120;
            .211   .233   .208   .149   .080   .012;
            .238   .210   .144   .068   .002  -.058;
            .213   .146   .066   .003  -.039  -.092;
            .147   .066   .004  -.025  -.042  0    ;
            .066   .003  -.020  -.016  0      0    ;
            .001  -.022  -.008  0      0      0    ;
           -.026  -.011   0     0      0      0    ;
           -.016   0      0     0      0      0    ];

    % Apply 13-term Henderson filter
    first = 1:12;
    last = T-11:T;
    h13 = conv(dt,sWH,'same');
    h13(T-5:end) = conv2(dt(last),1,aWH,'valid');
    h13(1:6) = conv2(dt(first),1,rot90(aWH,2),'valid');

    % New detrended series
    xt = y./h13;

    figure
    plot(DataTimeTable.Time,y)
    hold on
    plot(DataTimeTable.Time,h13,'r','LineWidth',2);
    legend(["Passenger counts" "13-Term Henderson Filter"])
    title('Airline Passenger Counts')
    hold off
Apply S(3,5) Seasonal Filter

To get an improved estimate of the seasonal component, apply a 7-term $S_{3 \times 5}$ seasonal moving average to the newly detrended series. The symmetric and asymmetric weights are provided in the following code. Center the seasonal estimate to fluctuate around 1.

Deseasonalize the original series by dividing it by the centered seasonal estimate.

    % S3x5 seasonal filter
    % Symmetric weights
    sW5 = [1/15;2/15;repmat(1/5,3,1);2/15;1/15];
    % Asymmetric weights for end of series
    aW5 = [.150 .250 .293;
           .217 .250 .283;
           .217 .250 .283;
           .217 .183 .150;
           .133 .067    0;
           .067    0    0];

    % Apply filter to each month
    shat = NaN*y;
    for i = 1:s
        ns = length(sidx{i});
        first = 1:6;
        last = ns-5:ns;
        dat = xt(sidx{i});

        sd = conv(dat,sW5,'same');
        sd(1:3) = conv2(dat(first),1,rot90(aW5,2),'valid');
        sd(ns-2:ns) = conv2(dat(last),1,aW5,'valid');
        shat(sidx{i}) = sd;
    end

    % 13-term moving average of filtered series
    sW13 = [1/24;repmat(1/12,11,1);1/24];
    sb = conv(shat,sW13,'same');
    sb(1:6) = sb(s+1:s+6);
    sb(T-5:T) = sb(T-s-5:T-s);

    % Center to get final estimate
    s35 = shat./sb;

    % Deseasonalized series
    dt = y./s35;

    figure
    plot(DataTimeTable.Time,dt)
    title('Deseasonalized Airline Passenger Counts')
The deseasonalized series consists of the long-term trend and irregular components. With the seasonal component removed, it is easier to see turning points in the trend.

Plot the components and the original series.

Compare the original series to a series reconstructed using the component estimates.

    figure
    plot(DataTimeTable.Time,y,'Color',[.85,.85,.85],'LineWidth',4)
    hold on
    plot(DataTimeTable.Time,h13,'r','LineWidth',2)
    plot(DataTimeTable.Time,h13.*s35,'k--','LineWidth',1.5)
    legend(["Passenger counts" "13-Term Henderson Filter" ...
        "Trend and Seasonal Components"])
    hold off
    title('Airline Passenger Counts')
Estimate Irregular Component

Detrend and deseasonalize the original series. Plot the remaining estimate of the irregular component.

    Irr = dt./h13;

    figure
    plot(DataTimeTable.Time,Irr)
    title('Airline Passenger Counts Irregular Component')

You can optionally model the detrended and deseasonalized series using a stationary stochastic process model.
See Also
conv | conv2 | cellfun

Related Examples
• “Moving Average Trend Estimation” on page 2-22
• “Seasonal Adjustment Using a Stable Seasonal Filter” on page 2-39
• “Parametric Trend Estimation” on page 2-24

More About
• “Time Series Decomposition” on page 2-19
• “Moving Average Filter” on page 2-21
• “Seasonal Filters” on page 2-34
• “Seasonal Adjustment” on page 2-37
3 Model Selection

• “Box-Jenkins Methodology” on page 3-2
• “Select ARIMA Model for Time Series Using Box-Jenkins Methodology” on page 3-4
• “Autocorrelation and Partial Autocorrelation” on page 3-11
• “Ljung-Box Q-Test” on page 3-18
• “Detect Autocorrelation” on page 3-20
• “Engle’s ARCH Test” on page 3-26
• “Detect ARCH Effects” on page 3-28
• “Unit Root Nonstationarity” on page 3-33
• “Unit Root Tests” on page 3-41
• “Assess Stationarity of a Time Series” on page 3-51
• “Information Criteria for Model Selection” on page 3-54
• “Model Comparison Tests” on page 3-58
• “Conduct Lagrange Multiplier Test” on page 3-62
• “Conduct Wald Test” on page 3-65
• “Compare GARCH Models Using Likelihood Ratio Test” on page 3-67
• “Classical Model Misspecification Tests” on page 3-70
• “Check Fit of Multiplicative ARIMA Model” on page 3-81
• “Goodness of Fit” on page 3-86
• “Residual Diagnostics” on page 3-87
• “Assess Predictive Performance” on page 3-89
• “Nonspherical Models” on page 3-90
• “Plot a Confidence Band Using HAC Estimates” on page 3-91
• “Change the Bandwidth of a HAC Estimator” on page 3-98
• “Check Model Assumptions for Chow Test” on page 3-104
• “Power of the Chow Test” on page 3-110
Box-Jenkins Methodology

The Box-Jenkins methodology [1] is a five-step process for identifying, selecting, and assessing conditional mean models (for discrete, univariate time series data).

1. Establish the stationarity of your time series. If your series is not stationary, successively difference your series to attain stationarity. The sample autocorrelation function (ACF) and partial autocorrelation function (PACF) of a stationary series decay exponentially (or cut off completely after a few lags).
2. Identify a (stationary) conditional mean model for your data. The sample ACF and PACF functions can help with this selection. For an autoregressive (AR) process, the sample ACF decays gradually, but the sample PACF cuts off after a few lags. Conversely, for a moving average (MA) process, the sample ACF cuts off after a few lags, but the sample PACF decays gradually. If both the ACF and PACF decay gradually, consider an ARMA model.
3. Specify the model, and estimate the model parameters. When fitting nonstationary models in Econometrics Toolbox, it is not necessary to manually difference your data and fit a stationary model. Instead, use your data on the original scale, and create an arima model object with the desired degree of nonseasonal and seasonal differencing. Fitting an ARIMA model directly is advantageous for forecasting: forecasts are returned on the original scale (not differenced).
4. Conduct goodness-of-fit checks to ensure the model describes your data adequately. Residuals should be uncorrelated, homoscedastic, and normally distributed with constant mean and variance. If the residuals are not normally distributed, you can change your innovation distribution to a Student’s t.
5. After choosing a model, and checking its fit and forecasting ability, you can use the model to forecast or generate Monte Carlo simulations over a future time horizon.
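To illustrate the differencing in step 1 and why step 3 notes that forecasts from a directly fitted ARIMA model are on the original scale, the sketch below differences a synthetic series and then undoes the differencing by cumulative summation. This is the bookkeeping that would otherwise fall to you if you fit a stationary model to manually differenced data. The series and variable names are illustrative, and no toolbox functions are used.

```matlab
e = sin((1:50)');                  % stand-in for stationary increments
y = 3 + cumsum(e);                 % integrated series on the original scale

dY = diff(y);                      % first difference: one observation shorter
yBack = y(1) + [0; cumsum(dY)];    % integrate back, using the first level

roundTripErr = max(abs(yBack - y));
```

Recovering the level series requires the initial value y(1), which is lost when you keep only the differenced data.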
References [1] Box, G. E. P., G. M. Jenkins, and G. C. Reinsel. Time Series Analysis: Forecasting and Control. 3rd ed. Englewood Cliffs, NJ: Prentice Hall, 1994.
See Also

Apps
Econometric Modeler

Objects
arima

Functions
autocorr | parcorr

Related Examples
• “Implement Box-Jenkins Model Selection and Estimation Using Econometric Modeler App” on page 4-112
• “Detect Serial Correlation Using Econometric Modeler App” on page 4-71
• “Select ARIMA Model for Time Series Using Box-Jenkins Methodology” on page 3-4
• “Assess Predictive Performance” on page 3-89
• “Check Fit of Multiplicative ARIMA Model” on page 3-81
• “Box-Jenkins Differencing vs. ARIMA Estimation” on page 7-109

More About
• “Trend-Stationary vs. Difference-Stationary Processes” on page 2-6
• “Autocorrelation and Partial Autocorrelation” on page 3-11
• “Goodness of Fit” on page 3-86
• “Conditional Mean Models” on page 7-3
Select ARIMA Model for Time Series Using Box-Jenkins Methodology This example shows how to use the Box-Jenkins methodology to select an ARIMA model. The time series is the log quarterly Australian Consumer Price Index (CPI) measured from 1972 and 1991. Load the Data Load and plot the Australian CPI data. load Data_JAustralian y = DataTable.PAU; T = length(y); figure plot(y) h1 = gca; h1.XLim = [0,T]; h1.XTick = 1:10:T; h1.XTickLabel = datestr(dates(1:10:T),17); title('Log Quarterly Australian CPI')
The series is nonstationary, with a clear upward trend.
Plot the Sample ACF and PACF
Plot the sample autocorrelation function (ACF) and partial autocorrelation function (PACF) for the CPI series.
    figure
    subplot(2,1,1)
    autocorr(y)
    subplot(2,1,2)
    parcorr(y)
The significant, linearly decaying sample ACF indicates a nonstationary process.
Difference the Data
Take a first difference of the data, and plot the differenced series.
    dY = diff(y);
    figure
    plot(dY)
    h2 = gca;
    h2.XLim = [0,T];
    h2.XTick = 1:10:T;
    h2.XTickLabel = datestr(dates(2:10:T),17);
    title('Differenced Log Quarterly Australian CPI')
Differencing removes the linear trend. The differenced series appears more stationary.
Plot the Sample ACF and PACF of the Differenced Series
Plot the sample ACF and PACF of the differenced series to look for behavior more consistent with a stationary process.
    figure
    subplot(2,1,1)
    autocorr(dY)
    subplot(2,1,2)
    parcorr(dY)
The sample ACF of the differenced series decays more quickly. The sample PACF cuts off after lag 2. This behavior is consistent with a second-degree autoregressive (AR(2)) model.
Specify and Estimate an ARIMA(2,1,0) Model
Specify, and then estimate, an ARIMA(2,1,0) model for the log quarterly Australian CPI. This model has one degree of nonseasonal differencing and two AR lags. By default, the innovation distribution is Gaussian with a constant variance.
    Mdl = arima(2,1,0);
    EstMdl = estimate(Mdl,y);

    ARIMA(2,1,0) Model (Gaussian Distribution):

                  Value        StandardError    TStatistic      PValue
                __________     _____________    __________    __________
    Constant    0.010072       0.0032802        3.0707        0.0021356
    AR{1}       0.21206        0.095428         2.2222        0.02627
    AR{2}       0.33728        0.10378          3.2499        0.0011543
    Variance    9.2302e-05     1.1112e-05       8.3066        9.8491e-17
Both AR coefficients are significant at the 0.05 significance level.
Check Goodness of Fit
Infer the residuals from the fitted model. Check that the residuals are normally distributed and uncorrelated.
    res = infer(EstMdl,y);
    figure
    subplot(2,2,1)
    plot(res./sqrt(EstMdl.Variance))
    title('Standardized Residuals')
    subplot(2,2,2)
    qqplot(res)
    subplot(2,2,3)
    autocorr(res)
    subplot(2,2,4)
    parcorr(res)
    hvec = findall(gcf,'Type','axes');
    set(hvec,'TitleFontSizeMultiplier',0.8,...
        'LabelFontSizeMultiplier',0.8);
The residuals are reasonably normally distributed and uncorrelated.
Generate Forecasts
Generate forecasts and approximate 95% forecast intervals for the next 4 years (16 quarters).
    [yF,yMSE] = forecast(EstMdl,16,y);
    UB = yF + 1.96*sqrt(yMSE);
    LB = yF - 1.96*sqrt(yMSE);
    figure
    h4 = plot(y,'Color',[.75,.75,.75]);
    hold on
    h5 = plot(78:93,yF,'r','LineWidth',2);
    h6 = plot(78:93,UB,'k--','LineWidth',1.5);
    plot(78:93,LB,'k--','LineWidth',1.5);
    fDates = [dates; dates(T) + cumsum(diff(dates(T-16:T)))];
    h7 = gca;
    h7.XTick = 1:10:(T+16);
    h7.XTickLabel = datestr(fDates(1:10:end),17);
    legend([h4,h5,h6],'Log CPI','Forecast',...
        'Forecast Interval','Location','Northwest')
    title('Log Australian CPI Forecast')
    hold off
References:
Box, G. E. P., G. M. Jenkins, and G. C. Reinsel. Time Series Analysis: Forecasting and Control. 3rd ed. Englewood Cliffs, NJ: Prentice Hall, 1994.
See Also Apps Econometric Modeler Objects arima Functions autocorr | parcorr | estimate | infer | forecast
Related Examples
• “Implement Box-Jenkins Model Selection and Estimation Using Econometric Modeler App” on page 4-112
• “Detect Serial Correlation Using Econometric Modeler App” on page 4-71
• “Box-Jenkins Differencing vs. ARIMA Estimation” on page 7-109
• “Nonseasonal Differencing” on page 2-13
• “Infer Residuals for Diagnostic Checking” on page 7-143
• “Specify Conditional Mean Models” on page 7-5
More About
• “Box-Jenkins Methodology” on page 3-2
• “Trend-Stationary vs. Difference-Stationary Processes” on page 2-6
• “Goodness of Fit” on page 3-86
• “MMSE Forecasting of Conditional Mean Models” on page 7-172
Autocorrelation and Partial Autocorrelation
In this section...
“What Are Autocorrelation and Partial Autocorrelation?” on page 3-11
“Theoretical ACF and PACF” on page 3-11
“Sample ACF and PACF” on page 3-11
“Compute Sample ACF and PACF in MATLAB®” on page 3-12
What Are Autocorrelation and Partial Autocorrelation?
Autocorrelation is the linear dependence of a variable with itself at two points in time. For stationary processes, autocorrelation between any two observations depends only on the time lag h between them. Define $\operatorname{Cov}(y_t, y_{t-h}) = \gamma_h$. Lag-h autocorrelation is given by
$$\rho_h = \operatorname{Corr}(y_t, y_{t-h}) = \frac{\gamma_h}{\gamma_0}.$$
The denominator $\gamma_0$ is the lag 0 covariance, that is, the unconditional variance of the process. Correlation between two variables can result from a mutual linear dependence on other variables (confounding). Partial autocorrelation is the autocorrelation between $y_t$ and $y_{t-h}$ after the removal of any linear dependence on the intervening observations $y_{t-1}, y_{t-2}, \ldots, y_{t-h+1}$. The partial lag-h autocorrelation is denoted $\phi_{h,h}$.
Theoretical ACF and PACF
The autocorrelation function (ACF) for a time series $y_t$, t = 1,...,N, is the sequence $\rho_h$, h = 1, 2,...,N – 1. The partial autocorrelation function (PACF) is the sequence $\phi_{h,h}$, h = 1, 2,...,N – 1. The theoretical ACF and PACF for the AR, MA, and ARMA conditional mean models are known, and are different for each model. These differences among models are important to keep in mind when you select models.

Conditional Mean Model    ACF Behavior             PACF Behavior
AR(p)                     Tails off gradually      Cuts off after p lags
MA(q)                     Cuts off after q lags    Tails off gradually
ARMA(p,q)                 Tails off gradually      Tails off gradually
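As a worked illustration of the cutoff and tail-off patterns in the table, consider two standard textbook cases. The coefficient symbols $\phi$, $\theta_1$, and $\theta_2$ and the numeric values below are chosen here for illustration only.

```latex
% AR(1): y_t = \phi y_{t-1} + \varepsilon_t, with |\phi| < 1
\rho_h = \phi^{h} \quad (h \ge 0), \qquad
\phi_{1,1} = \phi, \quad \phi_{h,h} = 0 \ \text{for } h > 1 .

% MA(2): y_t = \varepsilon_t + \theta_1 \varepsilon_{t-1} + \theta_2 \varepsilon_{t-2}
\rho_1 = \frac{\theta_1 + \theta_1\theta_2}{1 + \theta_1^2 + \theta_2^2}, \qquad
\rho_2 = \frac{\theta_2}{1 + \theta_1^2 + \theta_2^2}, \qquad
\rho_h = 0 \ \text{for } h > 2 .

% Example: \theta_1 = -1, \ \theta_2 = 1
\rho_1 = \frac{-1 - 1}{3} = -\tfrac{2}{3}, \qquad \rho_2 = \tfrac{1}{3} .
```

The AR(1) ACF decays geometrically while its PACF is zero beyond lag 1. The MA(2) ACF is exactly zero beyond lag 2, which is the cutoff behavior that the table describes.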
Sample ACF and PACF
Sample autocorrelation and sample partial autocorrelation are statistics that estimate the theoretical autocorrelation and partial autocorrelation. Using these qualitative model selection tools, you can compare the sample ACF and PACF of your data against known theoretical autocorrelation functions [1]. For an observed series $y_1, y_2, \ldots, y_T$, denote the sample mean $\bar{y}$. The sample lag-h autocorrelation is given by
$$\hat{\rho}_h = \frac{\sum_{t=h+1}^{T} (y_t - \bar{y})(y_{t-h} - \bar{y})}{\sum_{t=1}^{T} (y_t - \bar{y})^2}.$$
The standard error for testing the significance of a single lag-h autocorrelation, $\hat{\rho}_h$, is approximately
$$SE_\rho = \sqrt{\Bigl(1 + 2\sum_{i=1}^{h-1} \hat{\rho}_i^{\,2}\Bigr)/N},$$
where N is the number of observations (N = T for the series above). When you use autocorr to plot the sample autocorrelation function (also known as the correlogram), approximate 95% confidence intervals are drawn at $\pm 2\,SE_\rho$ by default. Optional input arguments let you modify the calculation of the confidence bounds.
The sample lag-h partial autocorrelation is the estimated lag-h coefficient in an AR model containing h lags, $\hat{\phi}_{h,h}$. The standard error for testing the significance of a single lag-h partial autocorrelation is approximately $1/\sqrt{N}$. When you use parcorr to plot the sample partial autocorrelation function, approximate 95% confidence intervals are drawn at $\pm 2/\sqrt{N}$ by default. Optional input arguments let you modify the calculation of the confidence bounds.
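The sample autocorrelation formula can be checked directly against autocorr. The following sketch evaluates the sums in the formula with a loop and compares the result with the toolbox output. The simulated MA(2) series and the variable names acfManual, acfToolbox, and maxDiff are illustrative choices.

```matlab
% Minimal sketch: evaluate the sample ACF formula by hand and compare it
% with the output of autocorr. The data are simulated for illustration.
rng('default')                          % For reproducibility
e = randn(1000,1);
y = filter([1 -1 1],1,e);               % MA(2) series

T = numel(y);
ybar = mean(y);
numLags = 20;
den = sum((y - ybar).^2);               % Denominator: sum over t = 1,...,T

acfManual = zeros(numLags+1,1);
for h = 0:numLags
    % Numerator: sum over t = h+1,...,T of (y_t - ybar)(y_{t-h} - ybar)
    acfManual(h+1) = sum((y(h+1:T) - ybar).*(y(1:T-h) - ybar))/den;
end

acfToolbox = autocorr(y,'NumLags',numLags);
maxDiff = max(abs(acfManual - acfToolbox));   % Expected to be near machine precision
```

Element h + 1 of each vector holds the lag-h value, so the first element is the lag 0 autocorrelation, which is always 1.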
Compute Sample ACF and PACF in MATLAB®
This example shows how to compute and plot the sample ACF and PACF of a time series by using the Econometrics Toolbox™ functions autocorr and parcorr, and the Econometric Modeler app.
Generate Synthetic Time Series
Simulate an MA(2) process $y_t$ by filtering a series of 1000 standard Gaussian deviates $\varepsilon_t$ through the difference equation
$$y_t = \varepsilon_t - \varepsilon_{t-1} + \varepsilon_{t-2}.$$
    rng('default') % For reproducibility
    e = randn(1000,1);
    y = filter([1 -1 1],1,e);
Plot and Compute ACF
Plot the sample ACF of $y_t$ by passing the simulated time series to autocorr.
    autocorr(y)
The sample autocorrelation of lags greater than 2 is insignificant. Compute the sample ACF by calling autocorr again. Return the first output argument.
    acf = autocorr(y)

    acf = 21×1
        1.0000
       -0.6682
        0.3618
       -0.0208
        0.0146
       -0.0311
        0.0611
       -0.0828
        0.0772
       -0.0493
          ⋮
acf(j) is the sample autocorrelation of $y_t$ at lag j – 1.
Plot and Compute PACF
Plot the sample PACF of $y_t$ by passing the simulated time series to parcorr.
    parcorr(y)
The sample PACF gradually decreases with increasing lag. Compute the sample PACF by calling parcorr again. Return the first output argument.
    pacf = parcorr(y)

    pacf = 21×1
        1.0000
       -0.6697
       -0.1541
        0.2929
        0.3421
        0.0314
       -0.1483
       -0.2290
       -0.0394
        0.1419
          ⋮
pacf(j) is the sample partial autocorrelation of $y_t$ at lag j – 1. The sample ACF and PACF suggest that $y_t$ is an MA(2) process.
Use Econometric Modeler
Open the Econometric Modeler app by entering econometricModeler at the command prompt.
    econometricModeler
Load the simulated time series y.
1. On the Econometric Modeler tab, in the Import section, select Import > Import From Workspace.
2. In the Import Data dialog box, in the Import? column, select the check box for the y variable.
3. Click Import.
The variable y1 appears in the Data Browser, and its time series plot appears in the Time Series Plot(y1) figure window. Plot the sample ACF by clicking ACF on the Plots tab. Plot the sample PACF by clicking PACF on the Plots tab. Position the PACF plot below the ACF plot by dragging the PACF(y1) tab to the lower half of the document.
References [1] Box, George E. P., Gwilym M. Jenkins, and Gregory C. Reinsel. Time Series Analysis: Forecasting and Control. 3rd ed. Englewood Cliffs, NJ: Prentice Hall, 1994.
See Also Apps Econometric Modeler Functions autocorr | parcorr
More About
• “Detect Serial Correlation Using Econometric Modeler App” on page 4-71
• “Detect Autocorrelation” on page 3-20
• “Detect ARCH Effects Using Econometric Modeler App” on page 4-77
• “Detect ARCH Effects” on page 3-28
• “Select ARIMA Model for Time Series Using Box-Jenkins Methodology” on page 3-4
• “Ljung-Box Q-Test” on page 3-18
• “Autoregressive Model” on page 7-17
• “Moving Average Model” on page 7-26
• “Autoregressive Moving Average Model” on page 7-35
Ljung-Box Q-Test
The sample autocorrelation function (ACF) and partial autocorrelation function (PACF) are useful qualitative tools to assess the presence of autocorrelation at individual lags. The Ljung-Box Q-test is a more quantitative way to test for autocorrelation at multiple lags jointly [1]. The null hypothesis for this test is that the first m autocorrelations are jointly zero,
$$H_0: \rho_1 = \rho_2 = \ldots = \rho_m = 0.$$
The choice of m affects test performance. If N is the length of your observed time series, choosing m ≈ ln(N) is recommended for power [2]. You can test at multiple values of m. If seasonal autocorrelation is possible, you might consider testing at larger values of m, such as 10 or 15.
The Ljung-Box test statistic is given by
$$Q(m) = N(N+2)\sum_{h=1}^{m} \frac{\hat{\rho}_h^{\,2}}{N-h}.$$
This is a modification of the Box-Pierce Portmanteau “Q” statistic [3]. Under the null hypothesis, Q(m) follows a $\chi^2_m$ distribution.
You can use the Ljung-Box Q-test to assess autocorrelation in any series with a constant mean. This includes residual series, which can be tested for autocorrelation during model diagnostic checks. If the residuals result from fitting a model with g parameters, you should compare the test statistic to a $\chi^2$ distribution with m – g degrees of freedom. Optional input arguments to lbqtest let you modify the degrees of freedom of the null distribution.
You can also test for conditional heteroscedasticity by conducting a Ljung-Box Q-test on a squared residual series. An alternative test for conditional heteroscedasticity is Engle’s ARCH test (archtest).
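The degrees-of-freedom adjustment for residual series can be sketched as follows. This example assumes the ARIMA(2,1,0) fit to the Australian CPI data used elsewhere in this chapter, so g = 2 counts the two estimated AR coefficients. The choice m = 10 is arbitrary and made only for illustration.

```matlab
% Minimal sketch: Ljung-Box Q-test on model residuals with the null
% distribution adjusted to m - g degrees of freedom.
load Data_JAustralian
y = DataTable.PAU;                        % Log quarterly Australian CPI

EstMdl = estimate(arima(2,1,0),y,'Display','off');
res = infer(EstMdl,y);                    % Residuals from the fitted model

m = 10;                                   % Number of lags tested (illustrative)
g = 2;                                    % Fitted AR coefficients (assumption for this model)
[h,pValue] = lbqtest(res,'Lags',m,'DoF',m - g);
```

A result of h = 0 means that the test fails to reject the null hypothesis of no residual autocorrelation through lag m.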
References
[1] Ljung, G. and G. E. P. Box. “On a Measure of Lack of Fit in Time Series Models.” Biometrika. Vol. 65, No. 2, 1978, pp. 297–303.
[2] Tsay, R. S. Analysis of Financial Time Series. 3rd ed. Hoboken, NJ: John Wiley & Sons, Inc., 2010.
[3] Box, G. E. P. and D. Pierce. “Distribution of Residual Autocorrelations in Autoregressive-Integrated Moving Average Time Series Models.” Journal of the American Statistical Association. Vol. 65, 1970, pp. 1509–1526.
See Also Apps Econometric Modeler Functions lbqtest | archtest
Related Examples
• “Detect Serial Correlation Using Econometric Modeler App” on page 4-71
• “Detect ARCH Effects Using Econometric Modeler App” on page 4-77
• “Detect Autocorrelation” on page 3-20
• “Detect ARCH Effects” on page 3-28
More About
• “Autocorrelation and Partial Autocorrelation” on page 3-11
• “Engle’s ARCH Test” on page 3-26
• “Residual Diagnostics” on page 3-87
• “Conditional Mean Models” on page 7-3
Detect Autocorrelation
In this section...
“Compute Sample ACF and PACF” on page 3-20
“Conduct the Ljung-Box Q-Test” on page 3-22
Compute Sample ACF and PACF
This example shows how to compute the sample autocorrelation function (ACF) and partial autocorrelation function (PACF) to qualitatively assess autocorrelation. The time series is 57 consecutive days of overshorts from a gasoline tank in Colorado.
Step 1. Load the data.
Load the time series of overshorts.
    load('Data_Overshort.mat')
    Y = Data;
    N = length(Y);
    figure
    plot(Y)
    xlim([0,N])
    title('Overshorts for 57 Consecutive Days')
The series appears to be stationary.
Step 2. Plot the sample ACF and PACF.
Plot the sample autocorrelation function (ACF) and partial autocorrelation function (PACF).
    figure
    subplot(2,1,1)
    autocorr(Y)
    subplot(2,1,2)
    parcorr(Y)
The sample ACF and PACF exhibit significant autocorrelation. The sample ACF has significant autocorrelation at lag 1. The sample PACF has significant autocorrelation at lags 1, 3, and 4. The distinct cutoff of the ACF combined with the more gradual decay of the PACF suggests an MA(1) model might be appropriate for this data.
Step 3. Store the sample ACF and PACF values.
Store the sample ACF and PACF values up to lag 15.
    acf = autocorr(Y,'NumLags',15);
    pacf = parcorr(Y,'NumLags',15);
    [length(acf) length(pacf)]

    ans = 1×2
        16    16
The outputs acf and pacf are vectors storing the sample autocorrelation and partial autocorrelation at lags 0, 1,...,15 (a total of 16 lags).
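After you store the values, you can flag the significant lags programmatically instead of reading them off the plot. The sketch below relies on the additional lags and bounds outputs of autocorr and parcorr. The variable names sigACF and sigPACF are hypothetical, and using max(bounds) as the threshold is a simplifying assumption that treats the confidence bounds as symmetric about zero.

```matlab
% Minimal sketch: list the lags whose sample ACF or PACF falls outside the
% approximate confidence bounds returned by autocorr and parcorr.
load('Data_Overshort.mat')
Y = Data;

[acf,lagsA,boundsA] = autocorr(Y,'NumLags',15);
[pacf,lagsP,boundsP] = parcorr(Y,'NumLags',15);

% Exclude lag 0, which always equals 1
sigACF  = lagsA(lagsA > 0 & abs(acf)  > max(boundsA));
sigPACF = lagsP(lagsP > 0 & abs(pacf) > max(boundsP));
```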
Conduct the Ljung-Box Q-Test
This example shows how to conduct the Ljung-Box Q-test for autocorrelation.
The time series is 57 consecutive days of overshorts from a gasoline tank in Colorado.
Step 1. Load the data.
Load the time series of overshorts.
    load('Data_Overshort.mat')
    Y = Data;
    N = length(Y);
    figure
    plot(Y)
    xlim([0,N])
    title('Overshorts for 57 Consecutive Days')
The data appears to fluctuate around a constant mean, so no data transformations are needed before conducting the Ljung-Box Q-test.
Step 2. Conduct the Ljung-Box Q-test.
Conduct the Ljung-Box Q-test for autocorrelation at lags 5, 10, and 15.
    [h,p,Qstat,crit] = lbqtest(Y,'Lags',[5,10,15])

    h = 1x3 logical array
       1   1   1

    p = 1×3
        0.0016    0.0007    0.0013

    Qstat = 1×3
       19.3604   30.5986   36.9639

    crit = 1×3
       11.0705   18.3070   24.9958
All outputs are vectors with three elements, corresponding to tests at each of the three lags. The first element of each output corresponds to the test at lag 5, the second element corresponds to the test at lag 10, and the third element corresponds to the test at lag 15. The test decisions are stored in the vector h. The value h = 1 means reject the null hypothesis. Vector p contains the p-values for the three tests. At the α = 0.05 significance level, the null hypothesis of no autocorrelation is rejected at all three lags. The conclusion is that there is significant autocorrelation in the series. The test statistics and $\chi^2$ critical values are given in outputs Qstat and crit, respectively.
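To connect these outputs with the Ljung-Box formula, the following sketch recomputes the statistic and p-value at lag 5 from the sample autocorrelations. The names Q and pManual are illustrative, and chi2cdf comes from Statistics and Machine Learning Toolbox.

```matlab
% Minimal sketch: compute Q(m) = N(N+2)*sum(rho_h^2/(N-h)) by hand and
% compare it with the statistic that lbqtest returns.
load('Data_Overshort.mat')
Y = Data;
N = length(Y);
m = 5;                                   % Number of lags tested

acf = autocorr(Y,'NumLags',m);
rho = acf(2:end);                        % Drop lag 0
rho = rho(:);
Q = N*(N+2)*sum(rho.^2./(N - (1:m)'));   % Ljung-Box statistic
pManual = 1 - chi2cdf(Q,m);              % Upper-tail chi-square probability

[~,p,Qstat] = lbqtest(Y,'Lags',m);       % Toolbox result for comparison
```

Q and Qstat should agree up to rounding error, as should pManual and p.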
References [1] Brockwell, P. J. and R. A. Davis. Introduction to Time Series and Forecasting. 2nd ed. New York, NY: Springer, 2002.
See Also Apps Econometric Modeler Functions autocorr | lbqtest | parcorr
Related Examples
• “Detect Serial Correlation Using Econometric Modeler App” on page 4-71
• “Detect ARCH Effects” on page 3-28
• “Choose ARMA Lags Using BIC” on page 7-140
• “Specify Multiplicative ARIMA Model” on page 7-58
• “Specify Conditional Mean and Variance Models” on page 7-81
More About
• “Autocorrelation and Partial Autocorrelation” on page 3-11
• “Ljung-Box Q-Test” on page 3-18
• “Moving Average Model” on page 7-26
• “Goodness of Fit” on page 3-86
Engle’s ARCH Test
An uncorrelated time series can still be serially dependent due to a dynamic conditional variance process. A time series exhibiting conditional heteroscedasticity, or autocorrelation in the squared series, is said to have autoregressive conditional heteroscedastic (ARCH) effects. Engle’s ARCH test is a Lagrange multiplier test to assess the significance of ARCH effects [1].
Consider a time series
$$y_t = \mu_t + \varepsilon_t,$$
where $\mu_t$ is the conditional mean of the process, and $\varepsilon_t$ is an innovation process with mean zero. Suppose the innovations are generated as
$$\varepsilon_t = \sigma_t z_t,$$
where $z_t$ is an independent and identically distributed process with mean 0 and variance 1. Thus, $E(\varepsilon_t \varepsilon_{t+h}) = 0$ for all lags h ≠ 0 and the innovations are uncorrelated.
Let $H_t$ denote the history of the process available at time t. The conditional variance of $y_t$ is
$$\operatorname{Var}(y_t \mid H_{t-1}) = \operatorname{Var}(\varepsilon_t \mid H_{t-1}) = E(\varepsilon_t^2 \mid H_{t-1}) = \sigma_t^2.$$
Thus, conditional heteroscedasticity in the variance process is equivalent to autocorrelation in the squared innovation process.
Define the residual series
$$e_t = y_t - \hat{\mu}_t.$$
If all autocorrelation in the original series, $y_t$, is accounted for in the conditional mean model, then the residuals are uncorrelated with mean zero. However, the residuals can still be serially dependent.
The alternative hypothesis for Engle’s ARCH test is autocorrelation in the squared residuals, given by the regression
$$H_a: e_t^2 = \alpha_0 + \alpha_1 e_{t-1}^2 + \ldots + \alpha_m e_{t-m}^2 + u_t,$$
where $u_t$ is a white noise error process. The null hypothesis is
$$H_0: \alpha_1 = \ldots = \alpha_m = 0.$$
The intercept $\alpha_0$ is not restricted under the null, because it represents the constant variance level.
To conduct Engle’s ARCH test using archtest, you need to specify the lag m in the alternative hypothesis. One way to choose m is to compare loglikelihood values for different choices of m. You can use the likelihood ratio test (lratiotest) or information criteria (aicbic) to compare loglikelihood values.
To generalize to a GARCH alternative, note that a GARCH(P,Q) model is locally equivalent to an ARCH(P + Q) model. This suggests also considering values m = P + Q for reasonable choices of P and Q.
The test statistic for Engle’s ARCH test is the usual F statistic for the regression on the squared residuals. Under the null hypothesis, the statistic asymptotically follows a $\chi^2$ distribution with m degrees of freedom. A test statistic larger than the critical value indicates rejection of the null hypothesis in favor of the alternative.
As an alternative to Engle’s ARCH test, you can check for serial dependence (ARCH effects) in a residual series by conducting a Ljung-Box Q-test on the first m lags of the squared residual series with lbqtest. Similarly, you can explore the sample autocorrelation and partial autocorrelation functions of the squared residual series for evidence of significant autocorrelation.
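The auxiliary regression behind the test can be sketched by hand. The code below regresses the squared residuals on m of their own lags and forms the Lagrange multiplier statistic as the effective sample size times the regression R², which is the form Engle proposed. This is an illustration of the idea under stated assumptions (constant-only mean model, m = 2, NASDAQ returns from Data_EquityIdx), not a reproduction of the archtest internals, so small numerical differences from archtest are possible.

```matlab
% Minimal sketch: Engle's LM statistic from an OLS regression of squared
% residuals on their own lags. Variable names are illustrative.
load Data_EquityIdx
r = 100*price2ret(DataTable.NASDAQ);     % Percentage returns
e = r - mean(r);                         % Residuals from a constant-only mean model
m = 2;                                   % Lags in the alternative hypothesis

e2 = e.^2;
X = lagmatrix(e2,1:m);                   % Columns hold e2 at lags 1,...,m
X = X(m+1:end,:);                        % Drop rows with presample NaNs
yReg = e2(m+1:end);
Teff = numel(yReg);                      % Effective sample size

D = [ones(Teff,1) X];                    % Design matrix with intercept
b = D\yReg;                              % OLS coefficients [alpha0; alpha1; ...; alpham]
u = yReg - D*b;
R2 = 1 - sum(u.^2)/sum((yReg - mean(yReg)).^2);

LM = Teff*R2;                            % Lagrange multiplier statistic
crit = chi2inv(0.95,m);                  % 5% critical value, chi-square with m dof
rejectNull = LM > crit;
```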
References [1] Engle, Robert F. “Autoregressive Conditional Heteroskedasticity with Estimates of the Variance of United Kingdom Inflation.” Econometrica. Vol. 50, 1982, pp. 987–1007.
See Also archtest | lbqtest | lratiotest | aicbic
Related Examples
• “Detect ARCH Effects Using Econometric Modeler App” on page 4-77
• “Detect ARCH Effects” on page 3-28
• “Specify Conditional Mean and Variance Models” on page 7-81
More About
• “Ljung-Box Q-Test” on page 3-18
• “Autocorrelation and Partial Autocorrelation” on page 3-11
• “Model Comparison Tests” on page 3-58
• “Information Criteria for Model Selection” on page 3-54
• “Conditional Variance Models” on page 8-2
Detect ARCH Effects
In this section...
“Test Autocorrelation of Squared Residuals” on page 3-28
“Conduct Engle's ARCH Test” on page 3-30
Test Autocorrelation of Squared Residuals
This example shows how to inspect a squared residual series for autocorrelation by plotting the sample autocorrelation function (ACF) and partial autocorrelation function (PACF). Then, conduct a Ljung-Box Q-test to more formally assess autocorrelation.
Load the Data.
Load the NASDAQ data included with the toolbox. Convert the daily close composite index series to a percentage return series.
    load Data_EquityIdx;
    y = DataTable.NASDAQ;
    r = 100*price2ret(y);
    T = length(r);
    figure
    plot(r)
    xlim([0,T])
    title('NASDAQ Daily Returns')
The returns appear to fluctuate around a constant level, but exhibit volatility clustering. Large changes in the returns tend to cluster together, and small changes tend to cluster together. That is, the series exhibits conditional heteroscedasticity.
The returns are of relatively high frequency. Therefore, the daily changes can be small. For numerical stability, it is good practice to scale such data.
Plot the Sample ACF and PACF.
Plot the sample ACF and PACF for the squared residual series.
    e = r - mean(r);
    figure
    subplot(2,1,1)
    autocorr(e.^2)
    subplot(2,1,2)
    parcorr(e.^2)
The sample ACF and PACF show significant autocorrelation in the squared residual series. This indicates that volatility clustering is present in the residual series.
Conduct a Ljung-Box Q-test.
Conduct a Ljung-Box Q-test on the squared residual series at lags 5 and 10.
    [h,p] = lbqtest(e.^2,'Lags',[5,10])

    h = 1x2 logical array
       1   1

    p = 1×2
         0     0
The null hypothesis is rejected for the two tests (h = 1). The p-values for both tests are 0. Thus, not all of the autocorrelations up to lag 5 (or 10) are zero, indicating volatility clustering in the residual series.
Conduct Engle's ARCH Test
This example shows how to conduct Engle's ARCH test for conditional heteroscedasticity.
Load and Preprocess Data
Load the NASDAQ data included with the toolbox. Convert the daily close composite index series to a return series.
    load Data_EquityIdx
    ReturnsTbl = price2ret(DataTable);
    figure
    plot(ReturnsTbl.NASDAQ)
    title('NASDAQ Daily Returns')
    axis tight
The returns appear to fluctuate around a constant level, but exhibit volatility clustering. Large changes in the returns tend to cluster together, and small changes tend to cluster together. That is, the series exhibits conditional heteroscedasticity.
The returns are of relatively high frequency. Therefore, the daily changes can be small. For numerical stability, it is good practice to scale such data.
Conduct Engle's ARCH Test
Conduct Engle's ARCH test for conditional heteroscedasticity on the residual series from a fit of the percent returns series to a constant-only model. Specify two lags in the alternative hypothesis.
    ReturnsTbl.Residuals_NASDAQ = 100*(ReturnsTbl.NASDAQ - mean(ReturnsTbl.NASDAQ));
    StatTbl = archtest(ReturnsTbl,DataVariable="Residuals_NASDAQ",Lags=2)
    StatTbl=1×6 table
                    h      pValue     stat     cValue    Lags    Alpha
                  _____    ______    ______    ______    ____    _____
        Test 1    true       0       399.97    5.9915     2      0.05

The null hypothesis is soundly rejected (h = 1, p = 0) in favor of the ARCH(2) alternative. The F statistic for the test is 399.97, much larger than the critical value from the $\chi^2$ distribution with two degrees of freedom, 5.99. The test concludes there is significant volatility clustering in the residual series.
See Also archtest | autocorr | lbqtest | parcorr
Related Examples
• “Detect ARCH Effects Using Econometric Modeler App” on page 4-77
• “Detect Autocorrelation” on page 3-20
• “Specify Conditional Mean and Variance Models” on page 7-81
More About
• “Engle’s ARCH Test” on page 3-26
• “Autocorrelation and Partial Autocorrelation” on page 3-11
• “Conditional Variance Models” on page 8-2
Unit Root Nonstationarity
In this section...
“What Is a Unit Root Test?” on page 3-33
“Modeling Unit Root Processes” on page 3-33
“Available Tests” on page 3-37
“Testing for Unit Roots” on page 3-38
What Is a Unit Root Test? A unit root process is a data-generating process whose first difference is stationary. In other words, a unit root process yt has the form yt = yt–1 + stationary process. A unit root test attempts to determine whether a given time series is consistent with a unit root process. The next section gives more details of unit root processes, and suggests why it is important to detect them.
Modeling Unit Root Processes
There are two basic models for economic data with linear growth characteristics:
• Trend-stationary process (TSP): $y_t = c + \delta t + {}$ stationary process
• Unit root process, also called a difference-stationary process (DSP): $\Delta y_t = \delta + {}$ stationary process
Here Δ is the differencing operator, $\Delta y_t = y_t - y_{t-1} = (1 - L)y_t$, where L is the lag operator defined by $L^i y_t = y_{t-i}$.
The processes are indistinguishable for finite data. In other words, there are both a TSP and a DSP that fit a finite data set arbitrarily well. However, the processes are distinguishable when restricted to a particular subclass of data-generating processes, such as AR(p) processes. After fitting a model to data, a unit root test checks if the AR(1) coefficient is 1.
There are two main reasons to distinguish between these types of processes:
• “Forecasting” on page 3-33
• “Spurious Regression” on page 3-37
Forecasting
A TSP and a DSP produce different forecasts. Basically, shocks to a TSP return to the trend line c + δt as time increases. In contrast, shocks to a DSP might be persistent over time.
For example, consider the simple trend-stationary model
$$y_{1,t} = 0.9\,y_{1,t-1} + 0.02\,t + \varepsilon_{1,t}$$
and the difference-stationary model
$$y_{2,t} = 0.2 + y_{2,t-1} + \varepsilon_{2,t}.$$
In these models, $\varepsilon_{1,t}$ and $\varepsilon_{2,t}$ are independent innovation processes. For this example, the innovations are independent and distributed N(0,1).
Both processes grow at rate 0.2. To calculate the growth rate for the TSP, which has a linear term 0.02t, set $\varepsilon_{1,t} = 0$. Then, solve the model $y_{1,t} = c + \delta t$ for c and δ:
$$c + \delta t = 0.9\,(c + \delta(t-1)) + 0.02\,t.$$
The solution is c = −1.8, δ = 0.2.
A plot for t = 1:1000 shows the TSP stays very close to the trend line, while the DSP has persistent deviations away from the trend line.
    T = 1000;           % Sample size
    t = (1:T)';         % Period vector
    rng(5);             % For reproducibility
    randm = randn(T,2); % Innovations
    y = zeros(T,2);     % Columns of y are data series

    % Build trend stationary series
    y(:,1) = .02*t + randm(:,1);
    for ii = 2:T
        y(ii,1) = y(ii,1) + y(ii-1,1)*.9;
    end

    % Build difference stationary series
    y(:,2) = .2 + randm(:,2);
    y(:,2) = cumsum(y(:,2));

    figure
    plot(y(:,1),'b')
    hold on
    plot(y(:,2),'g')
    plot((1:T)*0.2,'k--')
    legend('Trend Stationary','Difference Stationary',...
        'Trend Line','Location','NorthWest')
    hold off
Forecasts based on the two series are different. To see this difference, plot the predicted behavior of the two series using varm, estimate, and forecast. The following plot shows the last 100 data points in the two series and predictions of the next 100 points, including confidence bounds.
    AR = {[NaN 0; 0 NaN]}; % Independent response series
    trend = [NaN; 0];      % Linear trend in first series only
    Mdl = varm('AR',AR,'Trend',trend);
    EstMdl = estimate(Mdl,y);
    EstMdl.SeriesNames = ["Trend stationary" "Difference stationary"];

    [ynew,ycov] = forecast(EstMdl,100,y);
    % This generates predictions for 100 time steps

    seY = sqrt(diag(EstMdl.Covariance))'; % Extract standard deviations of y
    CIY = zeros([size(y) 2]);             % In-sample intervals
    CIY(:,:,1) = y - seY;
    CIY(:,:,2) = y + seY;

    extractFSE = cellfun(@(x)sqrt(diag(x))',ycov,'UniformOutput',false);
    seYNew = cell2mat(extractFSE);
    CIYNew = zeros([size(ynew) 2]);       % Forecast intervals
    CIYNew(:,:,1) = ynew - seYNew;
    CIYNew(:,:,2) = ynew + seYNew;

    tx = (T-100:T+100);
    hs = 1:2;
    figure;
    for j = 1:Mdl.NumSeries
        hs(j) = subplot(2,1,j);
        hold on;
        h1 = plot(tx,tx*0.2,'k--');
        axis tight;
        ha = gca;
        h2 = plot(tx,[y(end-100:end,j); ynew(:,j)]);
        h3 = plot(tx(1:101),squeeze(CIY(end-100:end,j,:)),'r:');
        plot(tx(102:end),squeeze(CIYNew(:,j,:)),'r:');
        h4 = fill([tx(102) ha.XLim([2 2]) tx(102)],ha.YLim([1 1 2 2]),[0.7 0.7 0.7],...
            'FaceAlpha',0.1,'EdgeColor','none');
        title(EstMdl.SeriesNames{j});
        hold off;
    end
    legend(hs(1),[h1 h2 h3(1) h4],...
        {'Trend','Process','Interval estimate','Forecast horizon'},'Location','Best');
Examine the fitted parameters by passing the estimated model to summarize, and you find estimate did an excellent job. The TSP has confidence intervals that do not grow with time, whereas the DSP has confidence intervals that grow. Furthermore, the TSP goes to the trend line quickly, while the DSP does not tend towards the trend line y = 0.2t asymptotically.
Spurious Regression
The presence of unit roots can lead to false inferences in regressions between time series. Suppose $x_t$ and $y_t$ are unit root processes with independent increments, such as random walks with drift
$$x_t = c_1 + x_{t-1} + \varepsilon_1(t)$$
$$y_t = c_2 + y_{t-1} + \varepsilon_2(t),$$
where $\varepsilon_i(t)$ are independent innovations processes. Regressing y on x results, in general, in a nonzero regression coefficient, and significant coefficient of determination $R^2$. This result holds despite $x_t$ and $y_t$ being independent random walks.
If both processes have trends ($c_i \ne 0$), there is a correlation between x and y because of their linear trends. However, even if the $c_i = 0$, the presence of unit roots in the $x_t$ and $y_t$ processes yields correlation. For more information on spurious regression, see Granger and Newbold [1] and “Time Series Regression IV: Spurious Regression” on page 5-197.
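A small simulation makes the effect concrete. The sketch below regresses one driftless random walk on another, independent one, and then repeats the regression on their first differences. The sample size, seed, and variable names are arbitrary choices for illustration, and the exact numbers depend on the random draw.

```matlab
% Minimal sketch: spurious regression between two independent random walks
% (c1 = c2 = 0), compared with a regression on their first differences.
rng(1)                                   % For reproducibility
T = 500;
x = cumsum(randn(T,1));                  % Random walk 1
y = cumsum(randn(T,1));                  % Random walk 2, independent of x

% OLS of y on x in levels
X = [ones(T,1) x];
b = X\y;
u = y - X*b;
R2 = 1 - sum(u.^2)/sum((y - mean(y)).^2);
s2 = sum(u.^2)/(T - 2);                  % Residual variance estimate
seb = sqrt(s2*diag((X'*X)\eye(2)));      % Conventional OLS standard errors
tSlope = b(2)/seb(2);                    % Often large in magnitude despite independence

% OLS on first differences, which are stationary
dX = [ones(T-1,1) diff(x)];
db = dX\diff(y);
du = diff(y) - dX*db;
R2diff = 1 - sum(du.^2)/sum((diff(y) - mean(diff(y))).^2);
```

Comparing R2 with R2diff over several seeds shows how often the levels regression suggests a relationship that disappears after differencing.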
Available Tests
There are four Econometrics Toolbox tests for unit roots. These functions test for the existence of a single unit root. When there are two or more unit roots, the results of these tests might not be valid.
• “Dickey-Fuller and Phillips-Perron Tests” on page 3-37
• “KPSS Test” on page 3-38
• “Variance Ratio Test” on page 3-38
Dickey-Fuller and Phillips-Perron Tests
adftest performs the augmented Dickey-Fuller test. pptest performs the Phillips-Perron test. These two classes of tests have a null hypothesis of a unit root process of the form
$$y_t = y_{t-1} + c + \delta t + \varepsilon_t,$$
which the functions test against an alternative model
$$y_t = \gamma y_{t-1} + c + \delta t + \varepsilon_t,$$
where γ < 1. The null and alternative models for a Dickey-Fuller test are like those for a Phillips-Perron test. The difference is adftest extends the model with extra parameters accounting for serial correlation among the innovations:
$$y_t = c + \delta t + \gamma y_{t-1} + \phi_1 \Delta y_{t-1} + \phi_2 \Delta y_{t-2} + \ldots + \phi_p \Delta y_{t-p} + \varepsilon_t,$$
where
• L is the lag operator: $L y_t = y_{t-1}$.
• Δ = 1 – L, so $\Delta y_t = y_t - y_{t-1}$.
• $\varepsilon_t$ is the innovations process.
Phillips-Perron adjusts the test statistics to account for serial correlation.
There are three variants of both adftest and pptest, corresponding to the following values of the 'model' parameter:
• 'AR' assumes c and δ, which appear in the preceding equations, are both 0; the 'AR' alternative has mean 0.
• 'ARD' assumes δ is 0. The 'ARD' alternative has mean c/(1–γ).
• 'TS' makes no assumption about c and δ.
For information on how to choose the appropriate value of 'model', see “Choose Models to Test” on page 3-38.
KPSS Test
The KPSS test, kpsstest, is an inverse of the Phillips-Perron test: it reverses the null and alternative hypotheses. The KPSS test uses the model:
$$y_t = c_t + \delta t + u_t, \quad \text{with } c_t = c_{t-1} + v_t.$$
Here $u_t$ is a stationary process, and $v_t$ is an i.i.d. process with mean 0 and variance $\sigma^2$. The null hypothesis is that $\sigma^2 = 0$, so that the random walk term $c_t$ becomes a constant intercept. The alternative is $\sigma^2 > 0$, which introduces the unit root in the random walk.
Variance Ratio Test
The variance ratio test, vratiotest, is based on the fact that the variance of a random walk increases linearly with time. vratiotest can also take into account heteroscedasticity, where the variance increases at a variable rate with time. The test has a null hypothesis of a random walk:
$$\Delta y_t = \varepsilon_t.$$
Testing for Unit Roots
• “Transform Data” on page 3-38
• “Choose Models to Test” on page 3-38
• “Determine Appropriate Lags” on page 3-39
• “Conduct Unit Root Tests at Multiple Lags” on page 3-39
Transform Data
Transform your time series to be approximately linear before testing for a unit root. If a series has exponential growth, take its logarithm. For example, GDP and consumer prices typically have exponential growth, so test their logarithms for unit roots. If you want to transform your data to be stationary instead of approximately linear, unit root tests can help you determine whether to difference your data or to subtract a linear trend. For a discussion of this topic, see “What Is a Unit Root Test?” on page 3-33.
Choose Models to Test
• For adftest or pptest, choose model as follows:
  • If your data show a linear trend, set model to 'TS'.
  • If your data show no trend but seem to have a nonzero mean, set model to 'ARD'.
  • If your data show no trend and seem to have a zero mean, set model to 'AR' (the default).
• For kpsstest, set trend to true (default) if the data show a linear trend. Otherwise, set trend to false.
• For vratiotest, set IID to true if you want to test for independent, identically distributed innovations (no heteroscedasticity). Otherwise, leave IID at the default value, false. Linear trends do not affect vratiotest.
Determine Appropriate Lags
Setting appropriate lags depends on the test you use:
• adftest: One method is to begin with a maximum lag, such as the one recommended by Schwert [2]. Then, test down by assessing the significance of the coefficient of the term at lag pmax. Schwert recommends a maximum lag of
$$p_{\max} = \left\lfloor 12\,(T/100)^{1/4} \right\rfloor,$$
where $\lfloor x \rfloor$ is the integer part of x. The usual t statistic is appropriate for testing the significance of coefficients, as reported in the reg output structure. Another method is to combine a measure of fit, such as SSR, with information criteria such as AIC, BIC, and HQC. These statistics also appear in the reg output structure. Ng and Perron [3] provide further guidelines.
• kpsstest: One method is to begin with few lags, and then evaluate the sensitivity of the results by adding more lags. For consistency of the Newey-West estimator, the number of lags must go to infinity as the sample size increases. Kwiatkowski et al. [4] suggest using a number of lags on the order of $T^{1/2}$, where T is the sample size. For an example of choosing lags for kpsstest, see “Test Time Series Data for Unit Root” on page 3-45.
• pptest: One method is to begin with few lags, and then evaluate the sensitivity of the results by adding more lags. Another method is to look at sample autocorrelations of $y_t - y_{t-1}$; slow rates of decay require more lags. The Newey-West estimator is consistent if the number of lags is $O(T^{1/4})$, where T is the effective sample size, adjusted for lag and missing values. White and Domowitz [5] and Perron [6] provide further guidelines. For an example of choosing lags for pptest, see “Test Time Series Data for Unit Root” on page 3-45.
• vratiotest does not use lags.
Conduct Unit Root Tests at Multiple Lags
Run multiple tests simultaneously by entering a vector of parameters for lags, alpha, model, or test. All vector parameters must have the same length. The test expands any scalar parameter to the length of a vector parameter. For an example using this technique, see “Test Time Series Data for Unit Root” on page 3-45.
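The lag guidelines above can be sketched in a few lines. The sample size and the simulated series here are hypothetical; the sketch computes Schwert's maximum lag, a lag count of order T^(1/2) for kpsstest, and then runs adftest once per lag by passing a vector of lags.

```matlab
T = 200;                               % hypothetical sample size
pmax = floor(12*(T/100)^(1/4));        % Schwert's maximum lag for adftest
kpssLags = floor(sqrt(T));             % order T^(1/2), as suggested for kpsstest

rng(0);                                % for reproducibility
y = cumsum(0.1 + randn(T,1));          % simulated random walk with drift

% A vector of lags runs one test per lag; scalar arguments are expanded.
[h,pValue] = adftest(y,Model="TS",Lags=0:pmax);
disp([(0:pmax)' h(:) pValue(:)])       % columns: lag, decision, p-value
```

Scanning the displayed table shows how sensitive the decision is to the lag choice, which is one way to start the "test down" procedure described above.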
References
[1] Granger, C. W. J., and P. Newbold. “Spurious Regressions in Econometrics.” Journal of Econometrics. Vol. 2, 1974, pp. 111–120.
[2] Schwert, W. “Tests for Unit Roots: A Monte Carlo Investigation.” Journal of Business and Economic Statistics. Vol. 7, 1989, pp. 147–159.
[3] Ng, S., and P. Perron. “Unit Root Tests in ARMA Models with Data-Dependent Methods for the Selection of the Truncation Lag.” Journal of the American Statistical Association. Vol. 90, 1995, pp. 268–281.
[4] Kwiatkowski, D., P. C. B. Phillips, P. Schmidt, and Y. Shin. “Testing the Null Hypothesis of Stationarity against the Alternative of a Unit Root.” Journal of Econometrics. Vol. 54, 1992, pp. 159–178.
[5] White, H., and I. Domowitz. “Nonlinear Regression with Dependent Observations.” Econometrica. Vol. 52, 1984, pp. 143–162.
[6] Perron, P. “Trends and Random Walks in Macroeconomic Time Series: Further Evidence from a New Approach.” Journal of Economic Dynamics and Control. Vol. 12, 1988, pp. 297–332.
See Also adftest | kpsstest | pptest | vratiotest
Related Examples
• “Assess Stationarity of Time Series Using Econometric Modeler” on page 4-84
• “Unit Root Tests” on page 3-41
• “Assess Stationarity of a Time Series” on page 3-51
Unit Root Tests In this section... “Test Simulated Data for a Unit Root” on page 3-41 “Test Time Series Data for Unit Root” on page 3-45 “Test Stock Data for a Random Walk” on page 3-48
Test Simulated Data for a Unit Root
This example shows how to test univariate time series models for stationarity. It shows how to simulate data from four types of models: trend stationary, difference stationary, stationary (AR(1)), and a heteroscedastic random walk model. It also shows that the tests yield expected results.
Simulate four time series.
T = 1e3;                                       % Sample size
t = (1:T)';                                    % Time multiple
rng(142857);                                   % For reproducibility
y1 = randn(T,1) + .2*t;                        % Trend stationary
Mdl2 = arima(D=1,Constant=0.2,Variance=1);
y2 = simulate(Mdl2,T,Y0=0);                    % Difference stationary
Mdl3 = arima(AR=0.99,Constant=0.2,Variance=1);
y3 = simulate(Mdl3,T,Y0=0);                    % AR(1)
Mdl4 = arima(D=1,Constant=0.2,Variance=1);
sigma = (sin(t/200) + 1.5)/2;                  % Std deviation
e = randn(T,1).*sigma;                         % Innovations
y4 = filter(Mdl4,e,Y0=0);                      % Heteroscedastic
Plot the first 100 points in each series.
y = [y1 y2 y3 y4];
figure;
plot1 = plot(y(1:100,:));
plot1(1).LineWidth = 2;
plot1(3).LineStyle = ":";
plot1(3).LineWidth = 2;
plot1(4).LineStyle = ":";
plot1(4).LineWidth = 2;
title("\bf First 100 Periods of Each Series");
legend("Trend Stationary","Difference Stationary","AR(1)", ...
    "Heteroscedastic",Location="NorthWest");
All of the models appear nonstationary and behave similarly. Therefore, you might find it difficult to distinguish which series comes from which model simply by looking at their initial segments.
Plot the entire data set.
plot2 = plot(y);
plot2(1).LineWidth = 2;
plot2(3).LineStyle = ":";
plot2(3).LineWidth = 2;
plot2(4).LineStyle = ":";
plot2(4).LineWidth = 2;
title("\bf Each Entire Series");
legend("Trend Stationary","Difference Stationary","AR(1)", ...
    "Heteroscedastic",Location="NorthWest");
The differences between the series are clearer here:
• The trend stationary series has little deviation from its mean trend.
• The difference stationary and heteroscedastic series have persistent deviations away from the trend line.
• The AR(1) series exhibits long-run stationary behavior; the others grow linearly.
• The difference stationary and heteroscedastic series appear similar. However, the heteroscedastic series has much more local variability near period 300, and much less near period 900. The model variance is maximal when sin(t/200) = 1, at time 100π ≈ 314. The model variance is minimal when sin(t/200) = −1, at time 300π ≈ 942. Therefore, the visual variability matches the model.
Use the augmented Dickey-Fuller test on the three growing series (y1, y2, and y4) to assess whether the series have a unit root. Since the series are growing, specify that there is a trend. In this case, the null hypothesis is
$$H_0: y_t = y_{t-1} + c + b_1 \Delta y_{t-1} + b_2 \Delta y_{t-2} + \varepsilon_t$$
and the alternative hypothesis is
$$H_1: y_t = a y_{t-1} + c + \delta t + b_1 \Delta y_{t-1} + b_2 \Delta y_{t-2} + \varepsilon_t.$$
Set the number of lags to 2 for demonstration purposes.
hY1 = adftest(y1,Model="ts",Lags=2)
hY1 = logical
   1
hY2 = adftest(y2,Model="ts",Lags=2)
hY2 = logical
   0
hY4 = adftest(y4,Model="ts",Lags=2)
hY4 = logical
   0
• hY1 = 1 indicates that there is sufficient evidence to suggest that y1 is trend stationary. This is the correct decision because y1 is trend stationary by construction.
• hY2 = 0 indicates that there is not enough evidence to suggest that y2 is trend stationary. This is the correct decision because y2 is difference stationary by construction.
• hY4 = 0 indicates that there is not enough evidence to suggest that y4 is trend stationary. This is the correct decision; however, the Dickey-Fuller test is not appropriate for a heteroscedastic series.
Use the augmented Dickey-Fuller test on the AR(1) series (y3) to assess whether the series has a unit root. Since the series is not growing, specify that the series is autoregressive with a drift term. In this case, the null hypothesis is
$$H_0: y_t = y_{t-1} + b_1 \Delta y_{t-1} + b_2 \Delta y_{t-2} + \varepsilon_t$$
and the alternative hypothesis is
$$H_1: y_t = a y_{t-1} + c + b_1 \Delta y_{t-1} + b_2 \Delta y_{t-2} + \varepsilon_t.$$
Set the number of lags to 2 for demonstration purposes.
hY3 = adftest(y3,Model="ard",Lags=2)
hY3 = logical
   1
hY3 = 1 indicates that there is enough evidence to suggest that y3 is a stationary, autoregressive process with a drift term. This is the correct decision because y3 is an autoregressive process with a drift term by construction.
Use the KPSS test to assess whether the series are unit root nonstationary. Specify that there is a trend in the growing series (y1, y2, and y4). The KPSS test assumes the following model:
$$y_t = c_t + \delta t + u_t$$
$$c_t = c_{t-1} + \varepsilon_t,$$
where $u_t$ is a stationary process and $\varepsilon_t$ is an independent and identically distributed process with mean 0 and variance $\sigma^2$. Whether or not there is a trend in the model, the null hypothesis is $H_0: \sigma^2 = 0$ (the series is trend stationary) and the alternative hypothesis is $H_1: \sigma^2 > 0$ (not trend stationary).
Set the number of lags to 2 for demonstration purposes.
hY1 = kpsstest(y1,Lags=2,Trend=true)
hY1 = logical
   0
hY2 = kpsstest(y2,Lags=2,Trend=true)
hY2 = logical
   1
hY3 = kpsstest(y3,Lags=2)
hY3 = logical
   1
hY4 = kpsstest(y4,Lags=2,Trend=true)
hY4 = logical
   1
All tests result in the correct decision.
Use the variance ratio test on all four series to assess whether the series are random walks. The null hypothesis is H0: Var(Δyt) is constant, and the alternative hypothesis is H1: Var(Δyt) is not constant. Specify that the innovations are independent and identically distributed for all but y1. Test y4 both ways.
hY1 = vratiotest(y1)
hY1 = logical
   1
hY2 = vratiotest(y2,IID=true)
hY2 = logical
   0
hY3 = vratiotest(y3,IID=true)
hY3 = logical
   0
hY4NotIID = vratiotest(y4)
hY4NotIID = logical
   0
hY4IID = vratiotest(y4,IID=true)
hY4IID = logical
   0
All tests result in the correct decisions, except for hY4IID = 0. This test does not reject the hypothesis that the heteroscedastic process is an IID random walk. This inconsistency might be associated with the random seed.
Alternatively, you can assess stationarity using pptest.
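A minimal sketch of the pptest alternative follows. It does not reuse the series from the example; instead it simulates a trend stationary and a difference stationary series with base MATLAB functions, and the seed and lag count are assumptions chosen only for illustration.

```matlab
rng(142857);                             % for reproducibility
T = 1000;
t = (1:T)';
yTrend = 0.2*t + randn(T,1);             % trend stationary by construction
yDiff  = cumsum(0.2 + randn(T,1));       % difference stationary by construction

% Null hypothesis for both calls: unit root with drift, against a
% trend stationary alternative.
[hTrend,pTrend] = pptest(yTrend,Model="TS",Lags=2);
[hDiff,pDiff]   = pptest(yDiff,Model="TS",Lags=2);

fprintf("Trend stationary series:      h = %d, p = %.4f\n",hTrend,pTrend)
fprintf("Difference stationary series: h = %d, p = %.4f\n",hDiff,pDiff)
```

Because pptest shares its null and alternative models with adftest, the decisions are read the same way: a value of 1 rejects the unit root null in favor of the trend stationary alternative.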
Test Time Series Data for Unit Root
This example shows how to test a univariate time series for a unit root. It uses wages data (1900-1970) in the manufacturing sector. The series is in the Nelson-Plosser data set.
Load the Nelson-Plosser data. Extract the nominal wages data.
load Data_NelsonPlosser
wages = DataTable.WN;
Trim the NaN values from the series and the corresponding dates (this step is optional because the test ignores NaN values).
wDates = dates(isfinite(wages));
wages = wages(isfinite(wages));
Plot the data to look for trends.
plot(wDates,wages)
title('Wages')
The plot suggests exponential growth. Transform the data using the log function to linearize the series.
logWages = log(wages);
plot(wDates,logWages)
title('Log Wages')
The plot suggests that the time series has a linear trend.
Test the null hypothesis that there is no unit root (trend stationary) against the alternative hypothesis that the series is a unit root process with a trend (difference stationary). Set 'Lags',7:2:11, as suggested in Kwiatkowski et al., 1992.
[h1,pValue1] = kpsstest(logWages,'Lags',7:2:11)
h1 = 1x3 logical array
   0   0   0
pValue1 = 1×3
    0.1000    0.1000    0.1000
kpsstest fails to reject the null hypothesis that the wage series is trend stationary.
Test the null hypothesis that the series is a unit root process (difference stationary) against the alternative hypothesis that the series is trend stationary.
[h2,pValue2] = adftest(logWages,'Model','ts')
h2 = logical
   0
pValue2 = 0.8327
adftest fails to reject the null hypothesis that the wage series is a unit root process.
Because the results of the two tests are inconsistent, it is unclear whether the wage series has a unit root. This is a typical result of tests on many macroeconomic series.
kpsstest has a limited set of calculated critical values. When it calculates a test statistic that is outside this range, the test reports the p-value at the appropriate endpoint. So, in this case, pValue1 reflects the closest tabulated value. When a test statistic lies inside the span of tabulated values, kpsstest linearly interpolates the p-value.
Test Stock Data for a Random Walk
This example shows how to assess whether a time series is a random walk. It uses market data for daily returns of stocks and cash (money market) from the period January 1, 2000 to November 7, 2005.
Load the data.
load CAPMuniverse
Extract two series to test. The first column of data is the daily return of a technology stock. The last (14th) column is the daily return for cash (the daily money market rate).
tech1 = Data(:,1);
money = Data(:,14);
The returns are the logs of the ratios of values at the end of a day over the values at the beginning of the day. Convert the data to prices (values) instead of returns. vratiotest takes prices as inputs, as opposed to returns.
tech1 = cumsum(tech1);
money = cumsum(money);
Plot the data to see whether they appear to be stationary.
subplot(2,1,1)
plot(Dates,tech1);
title('Log(relative stock value)')
datetick('x')
hold on
subplot(2,1,2);
plot(Dates,money)
title('Log(accumulated cash)')
datetick('x')
hold off
Cash has a small variability, and appears to have long-term trends. The stock series has a good deal of variability, and no definite trend, though it appears to increase towards the end.
Test whether the stock series matches a random walk.
[h,pValue,stat,cValue,ratio] = vratiotest(tech1)
h = logical
   0
pValue = 0.1646
stat = -1.3899
cValue = 1.9600
ratio = 0.9436
vratiotest does not reject the hypothesis that a random walk is a reasonable model for the stock series.
Test whether an i.i.d. random walk is a reasonable model for the stock series.
[h,pValue,stat,cValue,ratio] = vratiotest(tech1,'IID',true)
h = logical
   1
pValue = 0.0304
stat = -2.1642
cValue = 1.9600
ratio = 0.9436
vratiotest rejects the hypothesis that an i.i.d. random walk is a reasonable model for the tech1 stock series at the 5% level. Thus, vratiotest indicates that the most appropriate model of the tech1 series is a heteroscedastic random walk.
Test whether the cash series matches a random walk.
[h,pValue,stat,cValue,ratio] = vratiotest(money)
h = logical
   1
pValue = 4.6093e-145
stat = 25.6466
cValue = 1.9600
ratio = 2.0006
vratiotest emphatically rejects the hypothesis that a random walk is a reasonable model for the cash series (pValue = 4.6093e-145). The removal of a trend from the series does not affect the resulting statistics.
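To make the reported ratio more concrete, the following sketch computes a plain variance ratio by hand for a simulated random walk, using only base MATLAB functions. It is a simplified estimator written for illustration and omits any bias corrections or robust standard errors that vratiotest may apply; the seed, sample size, and horizon q are assumptions.

```matlab
rng(7);                            % for reproducibility
T = 5000;
y = cumsum(randn(T,1));            % simulated random walk (illustrative)

q = 2;                             % horizon of the long difference
d1 = diff(y);                      % one-period increments
dq = y(1+q:end) - y(1:end-q);      % overlapping q-period increments

% For a random walk, the variance of q-period increments is about q times
% the variance of one-period increments, so the ratio is close to 1.
ratio = var(dq)/(q*var(d1));
fprintf("Variance ratio at q = %d: %.4f\n",q,ratio)
```

A ratio well above 1, such as the value near 2 reported for the cash series, points to positively correlated increments rather than a random walk.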
References [1] Kwiatkowski, D., P. C. B. Phillips, P. Schmidt and Y. Shin. “Testing the Null Hypothesis of Stationarity against the Alternative of a Unit Root.” Journal of Econometrics. Vol. 54, 1992, pp. 159–178.
See Also adftest | kpsstest | pptest | vratiotest
More About
• “Assess Stationarity of Time Series Using Econometric Modeler” on page 4-84
• “Unit Root Nonstationarity” on page 3-33
Assess Stationarity of a Time Series
This example shows how to check whether a linear time series is a unit root process in several ways. You can assess unit root nonstationarity statistically, visually, and algebraically.
Simulate Data
Suppose that the true model for a linear time series is
$$(1 - 0.2L)(1 - L)y_t = (1 - 0.5L)\varepsilon_t,$$
where the innovation series is iid with mean 0 and variance 1.5. Simulate data from this model. This model is a unit root process because the autoregressive lag polynomial on the left side has characteristic root 1.
Mdl = arima('AR',0.2,'MA',-0.5,'D',1,'Constant',0,...
    'Variance',1.5);
T = 30;
rng(5);
Y = simulate(Mdl,T);
Assess Stationarity Statistically
Econometrics Toolbox™ has four formal tests to choose from to check if a time series is nonstationary: adftest, kpsstest, pptest, and vratiotest. Use adftest to perform the Dickey-Fuller test on the data that you simulated in the previous steps.
adftest(Y)
ans = logical
   0
The test result indicates that you should not reject the null hypothesis that the series is a unit root process.
Assess Stationarity Visually
Suppose you don't have the time series model, but you have the data. Inspect a plot of the data. Also, inspect the plots of the sample autocorrelation function (ACF) and sample partial autocorrelation function (PACF).
plot(Y);
title('Simulated Time Series')
xlabel('t')
ylabel('Y')
figure              % New figure so that the ACF and PACF do not replace the series plot
subplot(2,1,1)
autocorr(Y)
subplot(2,1,2)
parcorr(Y)
The downward sloping of the plot indicates a unit root process. The lengths of the line segments on the ACF plot gradually decay, and continue this pattern for increasing lags. This behavior indicates a nonstationary series.
Assess Stationarity Algebraically
Suppose you have the model in standard form:
$$y_t = 1.2y_{t-1} - 0.2y_{t-2} + \varepsilon_t - 0.5\varepsilon_{t-1}.$$
Write the equation in lag operator notation and solve for $y_t$ to get
$$y_t = \frac{1 - 0.5L}{1 - 1.2L + 0.2L^2}\,\varepsilon_t.$$
Use LagOp to convert the rational polynomial to a polynomial. Also, use isStable to inspect the characteristic roots of the denominator.
num = LagOp([1 -0.5]);
denom = LagOp([1 -1.2 0.2]);
quot = mrdivide(num,denom);
[r1,r2] = isStable(denom)
Warning: Termination window not currently open and coefficients are not below tolerance.
r1 = logical
   0
r2 =
    1.0000
    0.2000
This warning indicates that the resulting quotient has a degree larger than 1001; that is, there might not be a terminating degree. This indicates instability. r1 = 0 indicates that the denominator is unstable. r2 is a vector of characteristic roots; one of the roots is 1. Therefore, this is a unit root process.
isStable is a numerical routine that calculates the characteristic values of a polynomial. If you use quot as an argument to isStable, then the output might indicate that the polynomial is stable (i.e., all characteristic values are slightly less than 1). You might need to adjust the tolerance options of isStable to get more accurate results.
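As a cross-check that does not depend on LagOp, the characteristic roots of the denominator polynomial can be sketched with the base MATLAB function roots. The variable names are illustrative; the coefficients are those of the denominator polynomial 1 − 1.2L + 0.2L² used above.

```matlab
% Coefficients of 1 - 1.2L + 0.2L^2, ordered by increasing power of L
a = [1 -1.2 0.2];

% roots expects coefficients in descending powers, so flip the vector.
% These are the roots of the lag polynomial in L (here 5 and 1).
lagRoots = roots(fliplr(a));

% The characteristic roots are the reciprocals of the lag polynomial
% roots; compare them with the values in r2.
charRoots = sort(1./lagRoots,"descend");

% A characteristic root with modulus 1 signals a unit root process.
hasUnitRoot = any(abs(abs(charRoots) - 1) < 1e-8);
disp(charRoots')
```

The tolerance 1e-8 is an arbitrary choice for this sketch; numerically computed roots are rarely exactly 1, so some tolerance is needed.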
See Also
More About
• “Assess Stationarity of Time Series Using Econometric Modeler” on page 4-84
Information Criteria for Model Selection
Misspecification tests, such as the likelihood ratio (lratiotest), Lagrange multiplier (lmtest), and Wald (waldtest) tests, are appropriate only for comparing nested models. In contrast, information criteria are model selection tools to compare any models fit to the same data; the models being compared do not need to be nested.
Information criteria are likelihood-based measures of model fit that include a penalty for complexity (specifically, the number of parameters). Different information criteria are distinguished by the form of the penalty, and can favor different models.
Let $\log L(\hat\theta)$ denote the value of the maximized loglikelihood objective function for a model with k parameters fit to T data points. The aicbic function returns these information criteria:
• Akaike information criterion (AIC): The AIC compares models from the perspective of information entropy, as measured by Kullback-Leibler divergence. The AIC for a given model is
$$-2\log L(\hat\theta) + 2k.$$
• Bayesian (Schwarz) information criterion (BIC): The BIC compares models from the perspective of decision theory, as measured by expected loss. The BIC for a given model is
$$-2\log L(\hat\theta) + k\log(T).$$
• Corrected AIC (AICc): In small samples, AIC tends to overfit. The AICc adds a second-order bias-correction term to the AIC for better performance in small samples. The AICc for a given model is
$$\mathrm{AIC} + \frac{2k(k+1)}{T-k-1}.$$
The bias-correction term increases the penalty on the number of parameters relative to the AIC. Because the term approaches 0 with increasing sample size, AICc approaches AIC asymptotically. The analysis in [3] suggests using AICc when numObs/numParam < 40.
• Consistent AIC (CAIC): The CAIC imposes an additional penalty for complex models, as compared to the BIC. The CAIC for a given model is
$$-2\log L(\hat\theta) + k(\log(T) + 1) = \mathrm{BIC} + k.$$
• Hannan-Quinn criterion (HQC): The HQC imposes a smaller penalty on complex models than the BIC in large samples. The HQC for a given model is
$$-2\log L(\hat\theta) + 2k\log(\log(T)).$$
Regardless of the information criterion, when you compare values for multiple models, smaller values of the criterion indicate a better, more parsimonious fit.
Some experts scale information criteria values by T. aicbic scales results when you set the 'Normalize' name-value pair argument to true.
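The formulas above can be checked with a short sketch that evaluates each criterion by hand and compares the results with the structure returned by aicbic. The loglikelihood, parameter count, and sample size are hypothetical values chosen for the illustration.

```matlab
logL = -42;      % hypothetical maximized loglikelihood
k = 2;           % number of estimated parameters
T = 50;          % sample size

% Criteria computed directly from the formulas
aic  = -2*logL + 2*k;
bic  = -2*logL + k*log(T);
aicc = aic + 2*k*(k+1)/(T-k-1);
caic = -2*logL + k*(log(T)+1);       % equals bic + k
hqc  = -2*logL + 2*k*log(log(T));

% Cross-check against aicbic (unnormalized results)
[~,~,ic] = aicbic(logL,k,T);
byHand = [aic bic aicc caic hqc];
byFunc = [ic.aic ic.bic ic.aicc ic.caic ic.hqc];
maxDiff = max(abs(byHand - byFunc));
fprintf("Largest discrepancy: %g\n",maxDiff)
```

Dividing each hand-computed value by T should correspond to the results that aicbic returns when 'Normalize' is true, as described above.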
Compute Information Criteria Using aicbic
This example shows how to use aicbic to compute information criteria for several competing GARCH models fit to simulated data. Although this example uses aicbic, some Statistics and Machine Learning Toolbox™ and Econometrics Toolbox™ model fitting functions also return information criteria in their estimation summaries.
Simulate Data
Simulate a random path of length 50 from the ARCH(1) data generating process (DGP)
$$y_t = \varepsilon_t$$
$$\sigma_t^2 = 0.5 + 0.1\varepsilon_{t-1}^2,$$
where $\varepsilon_t$ is a random Gaussian series of innovations with conditional variance $\sigma_t^2$.
rng(1) % For reproducibility
DGP = garch('ARCH',{0.1},'Constant',0.5);
T = 50;
y = simulate(DGP,T);
plot(y)
ylabel('Innovation')
xlabel('Time')
Create Competing Models
Assume that the DGP is unknown, and that the ARCH(1), GARCH(1,1), ARCH(2), and GARCH(1,2) models are appropriate for describing the DGP.
For each competing model, create a garch model template for estimation.
Mdl(1) = garch(0,1);
Mdl(2) = garch(1,1);
Mdl(3) = garch(0,2);
Mdl(4) = garch(1,2);
Estimate Models
Fit each model to the simulated data y, compute the loglikelihood, and suppress the estimation display.
numMdl = numel(Mdl);
logL = zeros(numMdl,1); % Preallocate
numParam = zeros(numMdl,1);
for j = 1:numMdl
    [EstMdl,~,logL(j)] = estimate(Mdl(j),y,'Display','off');
    results = summarize(EstMdl);
    numParam(j) = results.NumEstimatedParameters;
end
Compute and Compare Information Criteria
For each model, compute all available information criteria. Normalize the results by the sample size T.
[~,~,ic] = aicbic(logL,numParam,T,'Normalize',true)
ic = struct with fields:
     aic: [1.7619 1.8016 1.8019 1.8416]
     bic: [1.8384 1.9163 1.9167 1.9946]
    aicc: [1.7670 1.8121 1.8124 1.8594]
    caic: [1.8784 1.9763 1.9767 2.0746]
     hqc: [1.7911 1.8453 1.8456 1.8999]
ic is a 1-D structure array with a field for each information criterion. Each field contains a vector of measurements; element j corresponds to the model yielding loglikelihood logL(j).
For each criterion, determine the model that yields the minimum value.
[~,minIdx] = structfun(@min,ic);
[Mdl(minIdx).Description]'
ans = 5x1 string
    "GARCH(0,1) Conditional Variance Model (Gaussian Distribution)"
    "GARCH(0,1) Conditional Variance Model (Gaussian Distribution)"
    "GARCH(0,1) Conditional Variance Model (Gaussian Distribution)"
    "GARCH(0,1) Conditional Variance Model (Gaussian Distribution)"
    "GARCH(0,1) Conditional Variance Model (Gaussian Distribution)"
The model that minimizes all criteria is the ARCH(1) model, which has the same structure as the DGP.
References
[1] Akaike, Hirotugu. “Information Theory and an Extension of the Maximum Likelihood Principle.” In Selected Papers of Hirotugu Akaike, edited by Emanuel Parzen, Kunio Tanabe, and Genshiro Kitagawa, 199–213. New York: Springer, 1998. https://doi.org/10.1007/978-1-4612-1694-0_15.
[2] Akaike, Hirotugu. “A New Look at the Statistical Model Identification.” IEEE Transactions on Automatic Control 19, no. 6 (December 1974): 716–23. https://doi.org/10.1109/TAC.1974.1100705.
[3] Burnham, Kenneth P., and David R. Anderson. Model Selection and Multimodel Inference: A Practical Information-Theoretic Approach. 2nd ed. New York: Springer, 2002.
[4] Hannan, Edward J., and Barry G. Quinn. “The Determination of the Order of an Autoregression.” Journal of the Royal Statistical Society: Series B (Methodological) 41, no. 2 (January 1979): 190–95. https://doi.org/10.1111/j.2517-6161.1979.tb01072.x.
[5] Lütkepohl, Helmut, and Markus Krätzig, editors. Applied Time Series Econometrics. 1st ed. Cambridge University Press, 2004. https://doi.org/10.1017/CBO9780511606885.
[6] Schwarz, Gideon. “Estimating the Dimension of a Model.” The Annals of Statistics 6, no. 2 (March 1978): 461–64. https://doi.org/10.1214/aos/1176344136.
See Also aicbic | lratiotest | lmtest | waldtest
More About
• “Choose ARMA Lags Using BIC” on page 7-140
• “Compare Conditional Variance Models Using Information Criteria” on page 8-69
• “Model Comparison Tests” on page 3-58
• “Goodness of Fit” on page 3-86
Model Comparison Tests In this section... “Available Tests” on page 3-58 “Likelihood Ratio Test” on page 3-60 “Lagrange Multiplier Test” on page 3-60 “Wald Test” on page 3-60 “Covariance Matrix Estimation” on page 3-61
Available Tests
The primary goal of model selection is choosing the most parsimonious model that adequately fits your data. Three asymptotically equivalent tests compare a restricted model (the null model) against an unrestricted model (the alternative model), fit to the same data:
• Likelihood ratio (LR) test
• Lagrange multiplier (LM) test
• Wald (W) test
For a model with parameters θ, consider the restriction r(θ) = 0, which is satisfied by the null model. For example, consider testing the null hypothesis $\theta = \theta_0$. The restriction function for this test is $r(\theta) = \theta - \theta_0$.
The LR, LM, and Wald tests approach the problem of comparing the fit of a restricted model against an unrestricted model differently. For a given data set, let $l(\theta_0^{MLE})$ denote the loglikelihood function evaluated at the maximum likelihood estimate (MLE) of the restricted (null) model. Let $l(\theta_A^{MLE})$ denote the loglikelihood function evaluated at the MLE of the unrestricted (alternative) model. The following figure illustrates the rationale behind each test.
• Likelihood ratio test. If the restricted model is adequate, then the difference between the maximized objective functions, $l(\theta_A^{MLE}) - l(\theta_0^{MLE})$, should not significantly differ from zero.
• Lagrange multiplier test. If the restricted model is adequate, then the slope of the tangent of the loglikelihood function at the restricted MLE (indicated by T0 in the figure) should not significantly differ from zero (which is the slope of the tangent of the loglikelihood function at the unrestricted MLE, indicated by T).
• Wald test. If the restricted model is adequate, then the restriction function evaluated at the unrestricted MLE should not significantly differ from zero (which is the value of the restriction function at the restricted MLE).
The three tests are asymptotically equivalent. Under the null, the LR, LM, and Wald test statistics are all distributed as $\chi^2$ with degrees of freedom equal to the number of restrictions. If the test statistic exceeds the test critical value (equivalently, the p-value is less than or equal to the significance level), the null hypothesis is rejected. That is, the restricted model is rejected in favor of the unrestricted model.
Choosing among the LR, LM, and Wald test is largely determined by computational cost:
• To conduct a likelihood ratio test, you need to estimate both the restricted and unrestricted models.
• To conduct a Lagrange multiplier test, you only need to estimate the restricted model (but the test requires an estimate of the variance-covariance matrix).
• To conduct a Wald test, you only need to estimate the unrestricted model (but the test requires an estimate of the variance-covariance matrix). All things being equal, the LR test is often the preferred choice for comparing nested models. Econometrics Toolbox has functionality for all three tests.
Likelihood Ratio Test
You can conduct a likelihood ratio test using lratiotest. The required inputs are:
• Value of the maximized unrestricted loglikelihood, $l(\theta_A^{MLE})$
• Value of the maximized restricted loglikelihood, $l(\theta_0^{MLE})$
• Number of restrictions (degrees of freedom)
Given these inputs, the likelihood ratio test statistic is
$$G^2 = 2 \times \left[ l(\theta_A^{MLE}) - l(\theta_0^{MLE}) \right].$$
When estimating conditional mean and variance models (using arima, garch, egarch, or gjr), you can return the value of the loglikelihood objective function as an optional output argument of estimate or infer. For multivariate time series models, you can get the value of the loglikelihood objective function using estimate.
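The following sketch shows the three required inputs in use. The loglikelihood values and the number of restrictions are hypothetical; in practice they would come from estimate or infer, as described above. The statistic is also computed by hand for comparison.

```matlab
uLogL = -100;    % hypothetical maximized unrestricted loglikelihood
rLogL = -103;    % hypothetical maximized restricted loglikelihood
dof   = 1;       % number of restrictions

% Likelihood ratio statistic computed directly from the formula
G2 = 2*(uLogL - rLogL);

% Same test using lratiotest; stat should match G2
[h,pValue,stat,cValue] = lratiotest(uLogL,rLogL,dof);
fprintf("G2 = %g, stat = %g, cValue = %.4f, h = %d\n",G2,stat,cValue,h)
```

Rejection (h = 1) favors the unrestricted model, in line with the decision rule stated for all three tests.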
Lagrange Multiplier Test The required inputs for conducting a Lagrange multiplier test are: • Gradient of the unrestricted likelihood evaluated at the restricted MLEs (the score), S • Variance-covariance matrix for the unrestricted parameters evaluated at the restricted MLEs, V Given these inputs, the LM test statistic is LM = S′VS . You can conduct an LM test using lmtest. A specific example of an LM test is Engle’s ARCH test, which you can conduct using archtest.
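As a small illustration of the LM statistic, the sketch below evaluates S′VS by hand and passes the same inputs to lmtest. The score vector, covariance matrix, and number of restrictions are made-up values, not the output of any fitted model.

```matlab
S = [0.8 -0.3];                % hypothetical score (row vector)
V = [0.5 0.1; 0.1 0.4];        % hypothetical covariance estimate (symmetric)
dof = 1;                       % number of restrictions

% LM statistic computed directly from the formula
LM = S*V*S';

% Same test using lmtest; stat should match LM
[h,pValue,stat,cValue] = lmtest(S,V,dof);
fprintf("LM = %.4f, stat = %.4f, p-value = %.4f\n",LM,stat,pValue)
```

In an application, S and V would be evaluated at the restricted MLEs, so only the restricted model has to be estimated.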
Wald Test
The required inputs for conducting a Wald test are:
• Restriction function evaluated at the unrestricted MLE, r
• Jacobian of the restriction function evaluated at the unrestricted MLEs, R
• Variance-covariance matrix for the unrestricted parameters evaluated at the unrestricted MLEs, V
Given these inputs, the test statistic for the Wald test is
$$W = r'(RVR')^{-1}r.$$
You can conduct a Wald test using waldtest.
Tip You can often compute the Jacobian of the restriction function analytically. Or, if you have Symbolic Math Toolbox™, you can use the function jacobian.
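A minimal sketch of the Wald inputs for a single linear restriction follows. The parameter estimates and covariance matrix are hypothetical, and the restriction tested is that the second parameter equals zero, so the Jacobian can be written down analytically.

```matlab
thetaHat = [0.6; 0.15];            % hypothetical unrestricted estimates
V = [0.04 0.01; 0.01 0.02];        % hypothetical covariance of thetaHat

r = thetaHat(2);                   % restriction function r(theta) = theta2
R = [0 1];                         % Jacobian of r with respect to theta

% Wald statistic computed directly from the formula r'(RVR')^(-1)r
W = r'*((R*V*R')\r);

% Same test using waldtest; stat should match W
[h,pValue,stat,cValue] = waldtest(r,R,V);
fprintf("W = %.4f, stat = %.4f, p-value = %.4f\n",W,stat,pValue)
```

For nonlinear restrictions, R would instead hold the partial derivatives of the restriction function evaluated at the unrestricted MLEs.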
Covariance Matrix Estimation
For estimating a variance-covariance matrix, there are several common methods, including:
• Outer product of gradients (OPG). Let G be the matrix of gradients of the loglikelihood function. If your data set has N observations, and there are m parameters in the unrestricted likelihood, then G is an N × m matrix.
The matrix $(G'G)^{-1}$ is the OPG estimate of the variance-covariance matrix.
For arima, garch, egarch, and gjr models, the estimate method returns the OPG estimate of the variance-covariance matrix.
• Inverse negative Hessian (INH). Given the loglikelihood function l(θ), the INH covariance estimate has elements
$$\mathrm{cov}(i,j) = \left[ -\frac{\partial^2 l(\theta)}{\partial \theta_i \, \partial \theta_j} \right]^{-1}.$$
The estimation function for multivariate models, estimate, returns the expected Hessian variance-covariance matrix.
Tip If you have Symbolic Math Toolbox, you can use jacobian twice to calculate the Hessian matrix for your loglikelihood function.
See Also Objects arima | garch | egarch | gjr Functions lmtest | waldtest | lratiotest
Related Examples
• “Conduct Lagrange Multiplier Test” on page 3-62
• “Conduct Wald Test” on page 3-65
• “Compare GARCH Models Using Likelihood Ratio Test” on page 3-67
More About
• “Goodness of Fit” on page 3-86
• “Information Criteria for Model Selection” on page 3-54
• “Maximum Likelihood Estimation for Conditional Mean Models” on page 7-112
• “Maximum Likelihood Estimation for Conditional Variance Models” on page 8-52
Conduct Lagrange Multiplier Test
This example shows how to calculate the required inputs for conducting a Lagrange multiplier (LM) test with lmtest. The LM test compares the fit of a restricted model against an unrestricted model by testing whether the gradient of the loglikelihood function of the unrestricted model, evaluated at the restricted maximum likelihood estimates (MLEs), is significantly different from zero.
The required inputs for lmtest are the score function and an estimate of the unrestricted variance-covariance matrix evaluated at the restricted MLEs. This example compares the fit of an AR(1) model against an AR(2) model.
Compute Restricted MLE
Obtain the restricted MLE by fitting an AR(1) model (with a Gaussian innovation distribution) to the given data. Assume you have presample observations $(y_{-1}, y_0)$ = (9.6249, 9.6396).
Y = [10.1591; 10.1675; 10.1957; 10.6558; 10.2243; 10.4429;
     10.5965; 10.3848; 10.3972; 9.9478; 9.6402; 9.7761;
     10.0357; 10.8202; 10.3668; 10.3980; 10.2892; 9.6310;
     9.6318; 9.1378; 9.6318; 9.1378];
Y0 = [9.6249; 9.6396];
Mdl = arima(1,0,0);
EstMdl = estimate(Mdl,Y,'Y0',Y0);
ARIMA(1,0,0) Model (Gaussian Distribution):
                 Value     StandardError    TStatistic     PValue
                _______    _____________    __________    _________
    Constant     3.2999        2.4606         1.3411        0.17988
    AR{1}       0.67097       0.24635         2.7237      0.0064564
    Variance    0.12506      0.043015         2.9074      0.0036441
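When conducting an LM test, only the restricted model needs to be fit.

Compute Gradient Matrix

Estimate the variance-covariance matrix for the unrestricted AR(2) model using the outer product of gradients (OPG) method.

For an AR(2) model with Gaussian innovations, the contribution to the loglikelihood function at time t is given by

$$\log L_t = -0.5\log(2\pi\sigma_\varepsilon^2) - \frac{(y_t - c - \phi_1 y_{t-1} - \phi_2 y_{t-2})^2}{2\sigma_\varepsilon^2},$$

where $\sigma_\varepsilon^2$ is the variance of the innovation distribution.

The contribution to the gradient at time t is

$$\left[ \frac{\partial \log L_t}{\partial c} \quad \frac{\partial \log L_t}{\partial \phi_1} \quad \frac{\partial \log L_t}{\partial \phi_2} \quad \frac{\partial \log L_t}{\partial \sigma_\varepsilon^2} \right],$$

where

$$\frac{\partial \log L_t}{\partial c} = \frac{y_t - c - \phi_1 y_{t-1} - \phi_2 y_{t-2}}{\sigma_\varepsilon^2}$$

$$\frac{\partial \log L_t}{\partial \phi_1} = \frac{y_{t-1}\,(y_t - c - \phi_1 y_{t-1} - \phi_2 y_{t-2})}{\sigma_\varepsilon^2}$$

$$\frac{\partial \log L_t}{\partial \phi_2} = \frac{y_{t-2}\,(y_t - c - \phi_1 y_{t-1} - \phi_2 y_{t-2})}{\sigma_\varepsilon^2}$$

$$\frac{\partial \log L_t}{\partial \sigma_\varepsilon^2} = -\frac{1}{2\sigma_\varepsilon^2} + \frac{(y_t - c - \phi_1 y_{t-1} - \phi_2 y_{t-2})^2}{2\sigma_\varepsilon^4}$$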
Evaluate the gradient matrix, G, at the restricted MLEs (using $\phi_2 = 0$).

    c = EstMdl.Constant;
    phi1 = EstMdl.AR{1};
    phi2 = 0;
    sig2 = EstMdl.Variance;
    Yt = Y;
    Yt1 = [9.6396; Y(1:end-1)];
    Yt2 = [9.6249; Yt1(1:end-1)];
    N = length(Y);
    G = zeros(N,4);
    G(:,1) = (Yt-c-phi1*Yt1-phi2*Yt2)/sig2;
    G(:,2) = Yt1.*(Yt-c-phi1*Yt1-phi2*Yt2)/sig2;
    G(:,3) = Yt2.*(Yt-c-phi1*Yt1-phi2*Yt2)/sig2;
    G(:,4) = -0.5/sig2 + 0.5*(Yt-c-phi1*Yt1-phi2*Yt2).^2/sig2^2;

Estimate Variance-Covariance Matrix

Compute the OPG variance-covariance matrix estimate.

    V = inv(G'*G)

    V = 4×4

        6.1431   -0.6966    0.0827    0.0367
       -0.6966    0.1535   -0.0846   -0.0061
        0.0827   -0.0846    0.0771    0.0024
        0.0367   -0.0061    0.0024    0.0019

Numerical inaccuracies can occur due to computer precision. To make the variance-covariance matrix symmetric, combine half of its value with half of its transpose.

    V = V/2 + V'/2;

Calculate Score Function

Evaluate the score function (the sum of the individual contributions to the gradient).

    score = sum(G);
Conduct Lagrange Multiplier Test

Conduct the Lagrange multiplier test to compare the restricted AR(1) model against the unrestricted AR(2) model. The number of restrictions (the degree of freedom) is one.

    [h,p,LMstat,crit] = lmtest(score,V,1)

    h = logical
       0

    p = 0.5787

    LMstat = 0.3084

    crit = 3.8415
The restricted AR(1) model is not rejected in favor of the AR(2) model (h = 0).
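As a cross-check on the reported output, the following sketch reproduces the p-value and the 5% critical value from the printed statistic using only base MATLAB functions. It assumes the LM statistic (the quadratic form score*V*score') is compared with a chi-square distribution with one degree of freedom. The erfinv shortcut for the critical value applies only when the degree of freedom is one.

```matlab
% Sketch: reproduce the p-value and 5% critical value reported by lmtest
% from the printed statistic, using only base MATLAB functions.
LMstat = 0.3084;   % statistic reported above (rounded)
dof    = 1;        % number of restrictions
alpha  = 0.05;     % default significance level

% Upper-tail probability of a chi-square(dof) variable
p = gammainc(LMstat/2,dof/2,'upper');

% For dof = 1, the chi-square quantile is a squared standard normal quantile
crit = 2*erfinv(1-alpha)^2;

h = LMstat > crit;  % true only if the restricted model is rejected
fprintf('p = %.4f, crit = %.4f, h = %d\n',p,crit,h)
```

Because the input statistic is rounded to four decimals, the recomputed p-value should agree with the reported one only up to rounding.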
See Also

Objects
arima

Functions
estimate | lmtest

Related Examples
• “Conduct Wald Test” on page 3-65
• “Compare GARCH Models Using Likelihood Ratio Test” on page 3-67

More About
• “Model Comparison Tests” on page 3-58
• “Goodness of Fit” on page 3-86
• “Autoregressive Model” on page 7-17
Conduct Wald Test

This example shows how to calculate the required inputs for conducting a Wald test with waldtest. The Wald test compares the fit of a restricted model against an unrestricted model by testing whether the restriction function, evaluated at the unrestricted maximum likelihood estimates (MLEs), is significantly different from zero.

The required inputs for waldtest are a restriction function, the Jacobian of the restriction function evaluated at the unrestricted MLEs, and an estimate of the variance-covariance matrix evaluated at the unrestricted MLEs. This example compares the fit of an AR(1) model against an AR(2) model.

Compute Unrestricted MLE

Obtain the unrestricted MLEs by fitting an AR(2) model (with a Gaussian innovation distribution) to the given data. Assume you have presample observations $(y_{-1}, y_0) = (9.6249, 9.6396)$.

    Y = [10.1591; 10.1675; 10.1957; 10.6558; 10.2243; 10.4429;
         10.5965; 10.3848; 10.3972;  9.9478;  9.6402;  9.7761;
         10.0357; 10.8202; 10.3668; 10.3980; 10.2892;  9.6310;
          9.6318;  9.1378;  9.6318;  9.1378];
    Y0 = [9.6249; 9.6396];

    Mdl = arima(2,0,0);
    [EstMdl,V] = estimate(Mdl,Y,'Y0',Y0);

    ARIMA(2,0,0) Model (Gaussian Distribution):

                     Value      StandardError    TStatistic      PValue
                    _______     _____________    __________    _________
        Constant     2.8802        2.5239          1.1412        0.25379
        AR{1}       0.60623       0.40372          1.5016         0.1332
        AR{2}       0.10631       0.29283         0.36303        0.71658
        Variance    0.12386      0.042598          2.9076      0.0036425
When conducting a Wald test, only the unrestricted model needs to be fit. estimate returns the estimated variance-covariance matrix as an optional output.

Compute Jacobian Matrix

Define the restriction function, and calculate its Jacobian matrix.

For comparing an AR(1) model to an AR(2) model, the restriction function is

$$r(c, \phi_1, \phi_2, \sigma_\varepsilon^2) = \phi_2 - 0 = 0.$$

The Jacobian of the restriction function is

$$\left[ \frac{\partial r}{\partial c} \quad \frac{\partial r}{\partial \phi_1} \quad \frac{\partial r}{\partial \phi_2} \quad \frac{\partial r}{\partial \sigma_\varepsilon^2} \right] = \left[ 0 \quad 0 \quad 1 \quad 0 \right].$$

Evaluate the restriction function and Jacobian at the unrestricted MLEs.

    r = EstMdl.AR{2};
    R = [0 0 1 0];

Conduct Wald Test

Conduct a Wald test to compare the restricted AR(1) model against the unrestricted AR(2) model.

    [h,p,Wstat,crit] = waldtest(r,R,V)

    h = logical
       0

    p = 0.7166

    Wstat = 0.1318

    crit = 3.8415
The restricted AR(1) model is not rejected in favor of the AR(2) model (h = 0).
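With a single linear restriction that picks out one coefficient, the Wald statistic reduces to the squared t-statistic of that coefficient. The sketch below rebuilds the statistic from the rounded values in the estimation table, assuming the Wald form r'*inv(R*V*R')*r and assuming that the reported standard errors are the square roots of the diagonal of V. The matrix V here is a stand-in with only the AR{2} variance filled in, not the full matrix returned by estimate.

```matlab
% Sketch: Wald statistic for the single restriction phi2 = 0, rebuilt from
% the values printed in the estimation table above (rounded).
r  = 0.10631;            % restriction value: estimated AR{2} coefficient
R  = [0 0 1 0];          % Jacobian of the restriction
se = 0.29283;            % reported standard error of AR{2}

% Only the (3,3) element of V matters here, because R selects AR{2}.
% The other entries are left at zero purely for illustration.
V = zeros(4);
V(3,3) = se^2;

Wstat = r'*((R*V*R')\r);            % r'*inv(R*V*R')*r
p = gammainc(Wstat/2,1/2,'upper');  % chi-square(1) upper tail
fprintf('Wstat = %.4f, p = %.4f\n',Wstat,p)
```

The result should match the reported Wstat and p up to rounding, and it also equals the square of the AR{2} t-statistic in the table.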
See Also
arima | estimate | waldtest

Related Examples
• “Conduct Lagrange Multiplier Test” on page 3-62
• “Compare GARCH Models Using Likelihood Ratio Test” on page 3-67

More About
• “Model Comparison Tests” on page 3-58
• “Goodness of Fit” on page 3-86
• “Autoregressive Model” on page 7-17
Compare GARCH Models Using Likelihood Ratio Test

This example shows how to conduct a likelihood ratio test to choose the number of lags in a GARCH model.

Load Data

Load the Deutschmark/British pound foreign-exchange rate data included with the toolbox. Convert the daily rates to returns.

    load Data_MarkPound
    Y = Data;
    r = price2ret(Y);
    N = length(r);

    figure
    plot(r)
    xlim([0,N])
    title('Mark-Pound Exchange Rate Returns')

The daily returns exhibit volatility clustering. Large changes in the returns tend to cluster together, and small changes tend to cluster together. That is, the series exhibits conditional heteroscedasticity.

The returns are of relatively high frequency. Therefore, the daily changes can be small. For numerical stability, scale the data to percentage returns.

    r = 100*r;
Fit GARCH(1,1) Model

Create and fit a GARCH(1,1) model (with a mean offset) to the returns series. Return the value of the loglikelihood objective function.

    Mdl1 = garch('Offset',NaN,'GARCHLags',1,'ARCHLags',1);
    [EstMdl1,~,logL1] = estimate(Mdl1,r);

    GARCH(1,1) Conditional Variance Model with Offset (Gaussian Distribution):

                      Value       StandardError    TStatistic      PValue
                    __________    _____________    __________    __________
        Constant      0.010761       0.001323         8.1342     4.1454e-16
        GARCH{1}       0.80597        0.01656         48.669              0
        ARCH{1}        0.15313       0.013974         10.959      6.038e-28
        Offset      -0.0061904      0.0084336       -0.73402        0.46294
Fit GARCH(2,1) Model

Create and fit a GARCH(2,1) model with a mean offset.

    Mdl2 = garch(2,1);
    Mdl2.Offset = NaN;
    [EstMdl2,~,logL2] = estimate(Mdl2,r);

    GARCH(2,1) Conditional Variance Model with Offset (Gaussian Distribution):

                      Value       StandardError    TStatistic      PValue
                    __________    _____________    __________    __________
        Constant      0.011226       0.001538         7.2992     2.8947e-13
        GARCH{1}       0.48964        0.11159         4.3878     1.1453e-05
        GARCH{2}       0.29769        0.10218         2.9133       0.003576
        ARCH{1}        0.16842       0.016583         10.156     3.1158e-24
        Offset      -0.0049837      0.0084764       -0.58795        0.55657
Conduct Likelihood Ratio Test

Conduct a likelihood ratio test to compare the restricted GARCH(1,1) model fit to the unrestricted GARCH(2,1) model fit. The degree of freedom for this test is one (the number of restrictions).

    [h,p] = lratiotest(logL2,logL1,1)

    h = logical
       1

    p = 0.0218
At the 0.05 significance level, the null GARCH(1,1) model is rejected (h = 1) in favor of the unrestricted GARCH(2,1) alternative.
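The example prints only h and p, so the size of the loglikelihood gain is not visible. As a rough sketch, the statistic can be backed out from the rounded p-value, assuming the usual likelihood ratio form 2*(logL2 - logL1) and a chi-square reference distribution with one degree of freedom. The erfinv shortcut below is specific to one degree of freedom.

```matlab
% Sketch: back out the approximate LR statistic from the reported p-value.
% Valid for one restriction, where chi-square(1) is a squared standard normal.
p = 0.0218;                    % p-value reported by lratiotest (rounded)
LRstat = 2*erfinv(1-p)^2;      % chi-square(1) quantile at 1-p
dLogL  = LRstat/2;             % implied logL2 - logL1
crit   = 2*erfinv(0.95)^2;     % 5% critical value for one degree of freedom
fprintf('LRstat = %.2f, logL2 - logL1 = %.2f, crit = %.4f\n',LRstat,dLogL,crit)
```

The implied statistic is a little above 5, so the second GARCH lag improves the loglikelihood by roughly 2.6. That is enough to reject at the 5% level but not at the 1% level.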
See Also

Objects
garch

Functions
estimate | lratiotest

Related Examples
• “Conduct Lagrange Multiplier Test” on page 3-62
• “Conduct Wald Test” on page 3-65
• “Compare Conditional Variance Models Using Information Criteria” on page 8-69

More About
• “Model Comparison Tests” on page 3-58
• “Goodness of Fit” on page 3-86
• “GARCH Model” on page 8-3
Classical Model Misspecification Tests

This example shows the use of the likelihood ratio, Wald, and Lagrange multiplier tests. These tests are useful in the evaluation and assessment of model restrictions and, ultimately, the selection of a model that balances the often competitive goals of adequacy and simplicity.

Introduction

Econometric models are a balance. On the one hand, they must be sufficiently detailed to account for relevant economic factors and their influence on observed data patterns. On the other hand, they must avoid unnecessary complexities that lead to computational challenges, over-fitting, or problems with interpretation. Working models are often developed by considering a sequence of nested specifications, in which larger, theoretical models are examined for simplifying restrictions on the parameters. If the parameters are estimated by maximum likelihood, three classical tests are typically used to assess the adequacy of the restricted models. They are the likelihood ratio test, the Wald test, and the Lagrange multiplier test.

The loglikelihood of model parameters $\theta$, given data $d$, is denoted $L(\theta \mid d)$. With no restrictions on the model, $L$ is optimized at the maximum likelihood estimate (MLE) $\hat{\theta}$. With restrictions of the form $r(\theta) = 0$, $L$ is optimized at $\tilde{\theta}$ with a generally reduced loglikelihood of describing the data. The classical tests evaluate the statistical significance of model restrictions using information obtained from these optimizations. The framework is very general; it encompasses both linear and nonlinear models, and both linear and nonlinear restrictions. In particular, it extends the familiar framework of t and F tests for linear models.

Each test uses the geometry of the loglikelihood surface to evaluate the significance of model restrictions in a different way:

• The likelihood ratio test considers the difference in loglikelihoods at $\hat{\theta}$ and $\tilde{\theta}$. If the restrictions are insignificant, this difference should be near zero.
• The Wald test considers the value of $r$ at $\hat{\theta}$. If the restrictions are insignificant, this value should be near the value of $r$ at $\tilde{\theta}$, which is zero.
• The Lagrange multiplier test considers the gradient, or score, of $L$ at $\tilde{\theta}$. If the restrictions are insignificant, this vector should be near the score at $\hat{\theta}$, which is zero.

The likelihood ratio test evaluates the difference in loglikelihoods directly. The Wald and Lagrange multiplier tests do so indirectly, with the idea that insignificant changes in the evaluated quantities can be identified with insignificant changes in the parameters. This identification depends on the curvature of the loglikelihood surface in the neighborhood of the MLE. As a result, the Wald and Lagrange multiplier tests include an estimate of parameter covariance in the formulation of the test statistic.

Econometrics Toolbox™ software implements the likelihood ratio, Wald, and Lagrange multiplier tests in the functions lratiotest, waldtest, and lmtest, respectively.

Data and Models

Consider the following data from the U.S. Census Bureau, giving average annual earnings by educational attainment level:
    load Data_Income2
    numLevels = 8;
    X = 100*repmat(1:numLevels,numLevels,1); % Levels: 100, 200, ..., 800
    x = X(:); % Education
    y = Data(:); % Income
    n = length(y); % Sample size
    levelNames = DataTable.Properties.VariableNames;
    boxplot(Data,'labels',levelNames)
    grid on
    xlabel('Educational Attainment')
    ylabel('Average Annual Income (1999 Dollars)')
    title('{\bf Income and Education}')

The income distributions in the data are conditional on the educational attainment level x. This pattern is also evident in a histogram of the data, which shows the small sample size:

    figure
    edges = [0:0.2:2]*1e5;
    centers =[0.1:0.2:1.9]*1e5;
    BinCounts = zeros(length(edges)-1,numLevels);
    for j = 1:numLevels
        BinCounts(:,j) = histcounts(Data(:,j),edges);
    end;
    h = bar(centers,BinCounts);
    axis tight
    grid on
    legend(h,levelNames)
    xlabel('Average Annual Income (1999 Dollars)')
    ylabel('Number of Observations')
    title('{\bf Income and Education}')
A common model for this type of data is the gamma distribution, with conditional density

$$f(y_i \mid x_i, \beta, \rho) = \frac{\beta_i^{\rho}}{\Gamma(\rho)}\, y_i^{\rho-1} e^{-y_i\beta_i},$$

where $\beta_i = 1/(\beta + x_i)$ and $i = 1, \ldots, n$.

Gamma distributions are sums of $\rho$ exponential distributions, and so admit natural restrictions on the value of $\rho$. The exponential distribution, with $\rho$ equal to 1, is monotonically decreasing and too simple to describe the unimodal distributions in the data. For the purposes of illustration, we will maintain a restricted model that is the sum of two exponential distributions, obtained by imposing the restriction
$$r(\beta, \rho) = \rho - 2 = 0.$$

This null model will be tested against the unrestricted alternative represented by the general gamma distribution.

The loglikelihood function of the conditional gamma density, and its derivatives, are found analytically:

$$L(\beta, \rho \mid x) = \rho\sum_{i=1}^{n}\ln\beta_i - n\ln\Gamma(\rho) + (\rho-1)\sum_{i=1}^{n}\ln y_i - \sum_{i=1}^{n} y_i\beta_i$$

$$\frac{\partial L}{\partial \beta} = -\rho\sum_{i=1}^{n}\beta_i + \sum_{i=1}^{n} y_i\beta_i^2$$

$$\frac{\partial L}{\partial \rho} = \sum_{i=1}^{n}\ln\beta_i - n\Psi(\rho) + \sum_{i=1}^{n}\ln y_i$$

$$\frac{\partial^2 L}{\partial \beta^2} = \rho\sum_{i=1}^{n}\beta_i^2 - 2\sum_{i=1}^{n} y_i\beta_i^3$$

$$\frac{\partial^2 L}{\partial \rho^2} = -n\Psi'(\rho)$$

$$\frac{\partial^2 L}{\partial \beta\,\partial \rho} = -\sum_{i=1}^{n}\beta_i$$

where $\Psi$ is the digamma function, the derivative of $\ln\Gamma$.

The loglikelihood function is used to find MLEs for the restricted and unrestricted models. The derivatives are used to construct gradients and parameter covariance estimates for the Wald and Lagrange multiplier tests.

Maximum Likelihood Estimation

Since optimizers in MATLAB® and Optimization Toolbox™ software find minima, we maximize the loglikelihood by minimizing the negative loglikelihood function. Using the L found above, we code the negative loglikelihood function with parameter vector p = [beta;rho]:

    nLLF = @(p)sum(p(2)*(log(p(1)+x))+gammaln(p(2))-(p(2)-1)*log(y)+y./(p(1)+x));
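Before relying on the analytic derivatives, it can be worth checking them numerically against the coded loglikelihood. The sketch below compares the analytic gradient with central finite differences. It uses synthetic data so that it runs without Data_Income2; the seed, the true scale offset of 5000, the evaluation point p0, and the step sizes are all illustrative assumptions.

```matlab
% Sketch: check the analytic gradient of the gamma loglikelihood against
% central finite differences. The data below are SYNTHETIC (hypothetical),
% generated only so that the block runs without Data_Income2.
rng(0)                                   % reproducible draws
x = repmat(100*(1:8)',8,1);              % education levels 100,...,800
n = numel(x);
% Sum of two exponentials with scale (5000 + x): a gamma variate with rho = 2
y = -(5000 + x).*(log(rand(n,1)) + log(rand(n,1)));

% Negative loglikelihood, as defined in the text, with p = [beta;rho]
nLLF = @(p)sum(p(2)*(log(p(1)+x))+gammaln(p(2))-(p(2)-1)*log(y)+y./(p(1)+x));

p0 = [4000; 2.5];                        % arbitrary evaluation point
b  = 1./(p0(1) + x);                     % beta_i = 1/(beta + x_i)

% Analytic gradient of L (not of -L), from the derivative formulas
gAnalytic = [-p0(2)*sum(b) + sum(y.*b.^2); ...
             sum(log(b)) - n*psi(p0(2)) + sum(log(y))];

% Central finite differences of L = -nLLF
h = [1e-2; 1e-4];                        % step sizes matched to parameter scales
gNumeric = zeros(2,1);
for k = 1:2
    e = zeros(2,1); e(k) = h(k);
    gNumeric(k) = -(nLLF(p0+e) - nLLF(p0-e))/(2*h(k));
end

disp([gAnalytic gNumeric])               % the two columns should agree closely
```

The two parameters live on very different scales, which is why the finite-difference steps differ by two orders of magnitude.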
We use the function fmincon to compute the restricted parameter estimates at $\rho = 2$. The lower bound on $\beta$ assures that the logarithm in nLLF is evaluated at positive arguments:

    options = optimoptions(@fmincon,'TolFun',1e-10,'Display','off');
    rp0 = [1 1]; % Initial values
    rlb = [-min(x) 2]; % Lower bounds
    rub = [Inf 2]; % Upper bounds
    [rmle,rnLL] = fmincon(nLLF,rp0,[],[],[],[],rlb,rub,[],options);
    rbeta = rmle(1); % Restricted beta estimate
    rrho = rmle(2); % Restricted rho estimate
    rLL = -rnLL; % Restricted loglikelihood
Unrestricted parameter estimates are computed in a similar manner, starting from initial values given by the restricted estimates:

    up0 = [rbeta rrho]; % Initial values
    ulb = [-min(x) 0]; % Lower bounds
    uub = [Inf Inf]; % Upper bounds
    [umle,unLL] = fmincon(nLLF,up0,[],[],[],[],ulb,uub,[],options);
    ubeta = umle(1); % Unrestricted beta estimate
    urho = umle(2); % Unrestricted rho estimate
    uLL = -unLL; % Unrestricted loglikelihood

We display the MLEs on a logarithmic contour plot of the negative loglikelihood surface:

    betas = 1e3:1e2:4e4;
    rhos = 0:0.1:10;
    [BETAS,RHOS] = meshgrid(betas,rhos);
    NLL = zeros(size(BETAS));
    for i = 1:numel(NLL)
        NLL(i) = nLLF([BETAS(i),RHOS(i)]);
    end
    L = log10(unLL);
    v = logspace(L-0.1,L+0.1,100);
    contour(BETAS,RHOS,NLL,v) % Negative loglikelihood surface
    colorbar
    hold on
    plot(ubeta,urho,'bo','MarkerFaceColor','b') % Unrestricted MLE
    line([1e3 4e4],[2 2],'Color','k','LineWidth',2) % Restriction
    plot(rbeta,rrho,'bs','MarkerFaceColor','b') % Restricted MLE
    legend('nllf','umle','restriction','rmle')
    xlabel('\beta')
    ylabel('\rho')
    title('{\bf Unrestricted and Restricted MLEs}')
Covariance Estimators

The intuitive relationship between the curvature of the loglikelihood surface and the variance/covariance of the parameter estimates is formalized by the information matrix equality, which identifies the negative expected value of the Hessian with the Fisher information matrix. The second derivatives in the Hessian express loglikelihood concavities. The Fisher information matrix expresses parameter variance; its inverse is the asymptotic covariance matrix.

Covariance estimators required by the Wald and Lagrange multiplier tests are computed in a variety of ways. One approach is to use the outer product of gradients (OPG), which requires only first derivatives of the loglikelihood. While popular for its relative simplicity, the OPG estimator can be unreliable, especially with small samples. Another, often preferable, estimator is the inverse of the negative expected Hessian. By the information matrix equality, this estimator is the asymptotic covariance, appropriate for large samples. If analytic expectations are difficult to compute, the expected Hessian can be replaced by the Hessian evaluated at the parameter estimates, the so-called "observed" Fisher information.

We compute each of the three estimators, using the derivatives of L found earlier. Conditional expectations in the Hessian are found using $E[g(X)Y \mid X] = g(X)E[Y \mid X]$. We evaluate the estimators at the unrestricted parameter estimates, for the Wald test, and then at the restricted parameter estimates, for the Lagrange multiplier test.

Different scales for the $\beta$ and $\rho$ parameters are reflected in the relative sizes of the variances. The small sample size is reflected in the differences among the estimators. We increase the precision of the displays to show these differences:

    format long
Evaluated at the unrestricted parameter estimates, the estimators are:

    % OPG estimator:
    UG = [-urho./(ubeta+x)+y.*(ubeta+x).^(-2),-log(ubeta+x)-psi(urho)+log(y)];
    Uscore = sum(UG)';
    UEstCov1 = inv(UG'*UG) %#ok

    UEstCov1 = 2×2
    10^6 ×

       6.163694782854278  -0.002335589407070
      -0.002335589407070   0.000000949847037

    % Hessian estimator (observed information):
    UDPsi = (psi(urho+0.0001)-psi(urho-0.0001))/(0.0002); % Digamma derivative
    UH = [sum(urho./(ubeta+x).^2)-2*sum(y./(ubeta+x).^3),-sum(1./(ubeta+x)); ...
          -sum(1./(ubeta+x)),-n*UDPsi];
    UEstCov2 = -inv(UH) %#ok

    UEstCov2 = 2×2
    10^6 ×

       5.914336186238142  -0.001864364370431
      -0.001864364370431   0.000000648730014

    % Expected Hessian estimator (expected information):
    UEH = [-sum(urho./((ubeta+x).^2)), -sum(1./(ubeta+x)); ...
           -sum(1./(ubeta+x)),-n*UDPsi];
    UEstCov3 = -inv(UEH) %#ok

    UEstCov3 = 2×2
    10^6 ×

       4.993544524752526  -0.001574105056079
      -0.001574105056079   0.000000557232151
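The example approximates the digamma derivative by a central difference with step 0.0001. As a sanity check on that approximation, the sketch below compares it with the built-in polygamma call psi(1,x), which returns the first derivative of the digamma function (the trigamma function). The evaluation point rho = 2 is an illustrative choice, not one of the estimates.

```matlab
% Sketch: compare the central-difference approximation of the digamma
% derivative used above with the built-in trigamma function psi(1,x).
rho = 2;                                        % illustrative evaluation point
h   = 1e-4;                                     % same step as in the example
dPsiFD    = (psi(rho+h) - psi(rho-h))/(2*h);    % finite-difference derivative
dPsiExact = psi(1,rho);                         % trigamma: first derivative of psi
fprintf('FD = %.10f, psi(1,rho) = %.10f, diff = %.2e\n', ...
    dPsiFD,dPsiExact,abs(dPsiFD-dPsiExact))
```

At this point the two values should differ only far beyond the digits that matter for the covariance estimates, so either form could be used for the Hessian entries.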
Evaluated at the restricted parameter estimates, the estimators are:

    % OPG estimator:
    RG = [-rrho./(rbeta+x)+y.*(rbeta+x).^(-2),-log(rbeta+x)-psi(rrho)+log(y)];
    Rscore = sum(RG)';
    REstCov1 = inv(RG'*RG) %#ok

    REstCov1 = 2×2
    10^7 ×

       6.614327250443059  -0.000476569931624
      -0.000476569931624   0.000000040110448

    % Hessian estimator (observed information):
    RDPsi = (psi(rrho+0.0001)-psi(rrho-0.0001))/(0.0002); % Digamma derivative
    RH = [sum(rrho./(rbeta+x).^2)-2*sum(y./(rbeta+x).^3),-sum(1./(rbeta+x)); ...
          -sum(1./(rbeta+x)),-n*RDPsi];
    REstCov2 = -inv(RH) %#ok

    REstCov2 = 2×2
    10^7 ×

       2.708411988833666  -0.000153135043006
      -0.000153135043006   0.000000011081064

    % Expected Hessian estimator (expected information):
    REH = [-sum(rrho./((rbeta+x).^2)),-sum(1./(rbeta+x)); ...
           -sum(1./(rbeta+x)),-n*RDPsi];
    REstCov3 = -inv(REH) %#ok

    REstCov3 = 2×2
    10^7 ×

       2.613663217708711  -0.000147777897490
      -0.000147777897490   0.000000010778169

Return to short numerical displays:

    format short
The Likelihood Ratio Test

The likelihood ratio test, which evaluates the statistical significance of the difference in loglikelihoods at the unrestricted and restricted parameter estimates, is generally considered to be the most reliable of the three classical tests. Its main disadvantage is that it requires estimation of both models. This may be an issue if either the unrestricted model or the restrictions are nonlinear, making significant demands on the necessary optimizations.

Once the required loglikelihoods have been obtained through maximum likelihood estimation, use lratiotest to run the likelihood ratio test:

    dof = 1; % Number of restrictions
    [LRh,LRp,LRstat,cV] = lratiotest(uLL,rLL,dof) %#ok

    LRh = logical
       1

    LRp = 7.9882e-05

    LRstat = 15.5611

    cV = 3.8415
The test rejects the restricted model (LRh = 1), with a p-value (LRp = 7.9882e-05) well below the default significance level (alpha = 0.05), and a test statistic (LRstat = 15.5611) well above the critical value (cV = 3.8415).

Like the Wald and Lagrange multiplier tests, the likelihood ratio test is asymptotic; the test statistic is evaluated with a limiting distribution obtained by letting the sample size tend to infinity. The same chi-square distribution, with degree of freedom dof, is used to evaluate the individual test statistics of each of the three tests, with the same critical value cV. Consequences for drawing inferences from small samples should be apparent, and this is one of the reasons why the three tests are often used together, as checks against one another.

The Wald Test

The Wald test is appropriate in situations where the restrictions impose significant demands on parameter estimation, as in the case of multiple nonlinear constraints. The Wald test has the advantage that it requires only the unrestricted parameter estimate. Its main disadvantage is that, unlike the likelihood ratio test, it also requires a reasonably accurate estimate of the parameter covariance.

To perform a Wald test, restrictions must be formulated as functions from p-dimensional parameter space to q-dimensional restriction space:

$$r = \begin{pmatrix} r_1(\theta_1, \ldots, \theta_p) \\ \vdots \\ r_q(\theta_1, \ldots, \theta_p) \end{pmatrix}$$

with Jacobian

$$R = \begin{pmatrix} \dfrac{\partial r_1}{\partial \theta_1} & \cdots & \dfrac{\partial r_1}{\partial \theta_p} \\ \vdots & \ddots & \vdots \\ \dfrac{\partial r_q}{\partial \theta_1} & \cdots & \dfrac{\partial r_q}{\partial \theta_p} \end{pmatrix}.$$

For the gamma distribution under consideration, the single restriction $r(\beta, \rho) = \rho - 2$ maps 2-dimensional parameter space to 1-dimensional restriction space with Jacobian [0 1].

Use waldtest to run the Wald test with each of the unrestricted covariance estimates computed previously. The number of restrictions dof is the length of the input vector r, so it does not have to be input explicitly as for lratiotest or lmtest:

    r = urho-2; % Restriction vector
    R = [0 1]; % Jacobian
    restrictions = {r,r,r};
    Jacobians = {R,R,R};
    UEstCov = {UEstCov1,UEstCov2,UEstCov3};
    [Wh,Wp,Wstat,cV] = waldtest(restrictions,Jacobians,UEstCov) %#ok

    Wh = 1x3 logical array
       1   1   1

    Wp = 1×3
        0.0144    0.0031    0.0014

    Wstat = 1×3
        5.9878    8.7671   10.2066

    cV = 1×3
        3.8415    3.8415    3.8415
The test rejects the restricted model with each of the covariance estimates.

Hypothesis tests in Econometrics Toolbox and Statistics Toolbox™ software operate at a default 5% significance level. The significance level can be changed with an optional input:

    alpha = 0.01; % 1% significance level
    [Wh2,Wp2,Wstat2,cV2] = waldtest(restrictions,Jacobians,UEstCov,alpha) %#ok

    Wh2 = 1x3 logical array
       0   1   1

    Wp2 = 1×3
        0.0144    0.0031    0.0014

    Wstat2 = 1×3
        5.9878    8.7671   10.2066

    cV2 = 1×3
        6.6349    6.6349    6.6349
The OPG estimator fails to reject the restricted model at the new significance level.

The Lagrange Multiplier Test

The Lagrange multiplier test is appropriate in situations where the unrestricted model imposes significant demands on parameter estimation, as in the case where the restricted model is linear but the unrestricted model is not. The Lagrange multiplier test has the advantage that it requires only the restricted parameter estimate. Its main disadvantage is that, like the Wald test, it also requires a reasonably accurate estimate of the parameter covariance.

Use lmtest to run the Lagrange multiplier test with each of the restricted covariance estimates computed previously:
    scores = {Rscore,Rscore,Rscore};
    REstCov = {REstCov1,REstCov2,REstCov3};
    [LMh,LMp,LMstat,cV] = lmtest(scores,REstCov,dof) %#ok

    LMh = 1x3 logical array
       1   1   1

    LMp = 1×3
        0.0000    0.0024    0.0027

    LMstat = 1×3
       33.4617    9.2442    8.9916

    cV = 1×3
        3.8415    3.8415    3.8415
The test again rejects the restricted model with each of the covariance estimates at the default significance level. The reliability of the OPG estimator is called into question by the anomalously large value of the first test statistic.

Summary

The three classical model misspecification tests form a natural toolkit for econometricians. In the context of maximum likelihood estimation, they all attempt to make the same distinction, between an unrestricted and a restricted model in some hierarchy of progressively simpler descriptions of the data. Each test, however, comes with different requirements, and so may be useful in different modeling situations, depending on the computational demands. When used together, inferences can vary among the tests, especially with small samples. Users should consider the tests as only one component of a wider statistical and economic analysis.
Check Fit of Multiplicative ARIMA Model

This example shows how to do goodness of fit checks. Residual diagnostic plots help verify model assumptions, and cross-validation prediction checks help assess predictive performance. The time series is monthly international airline passenger numbers from 1949 to 1960.

Load the data and estimate a model.

Load the airline data set.

    load Data_Airline
    y = log(Data);
    T = length(y);

    Mdl = arima('Constant',0,'D',1,'Seasonality',12,...
                'MALags',1,'SMALags',12);
    EstMdl = estimate(Mdl,y);

    ARIMA(0,1,1) Model Seasonally Integrated with Seasonal MA(12) (Gaussian Distribution):

                      Value      StandardError    TStatistic      PValue
                    _________    _____________    __________    __________
        Constant            0              0           NaN             NaN
        MA{1}        -0.37716       0.066794       -5.6466      1.6364e-08
        SMA{12}      -0.57238       0.085439       -6.6992      2.0952e-11
        Variance    0.0012634     0.00012395        10.193      2.1406e-24
Check the residuals for normality.

One assumption of the fitted model is that the innovations follow a Gaussian distribution. Infer the residuals, and check them for normality.

    res = infer(EstMdl,y);
    stres = res/sqrt(EstMdl.Variance);

    figure
    subplot(1,2,1)
    qqplot(stres)

    x = -4:.05:4;
    [f,xi] = ksdensity(stres);
    subplot(1,2,2)
    plot(xi,f,'k','LineWidth',2);
    hold on
    plot(x,normpdf(x),'r--','LineWidth',2)
    legend('Standardized Residuals','Standard Normal')
    hold off
The quantile-quantile plot (QQ-plot) and kernel density estimate show no obvious violations of the normality assumption.

Check the residuals for autocorrelation.

Confirm that the residuals are uncorrelated. Look at the sample autocorrelation function (ACF) and partial autocorrelation function (PACF) plots for the standardized residuals.

    figure
    subplot(2,1,1)
    autocorr(stres)
    subplot(2,1,2)
    parcorr(stres)

    [h,p] = lbqtest(stres,'lags',[5,10,15],'dof',[3,8,13])

    h = 1x3 logical array
       0   0   0

    p = 1×3
        0.1842    0.3835    0.7321

The sample ACF and PACF plots show no significant autocorrelation. More formally, conduct a Ljung-Box Q-test at lags 5, 10, and 15, with degrees of freedom 3, 8, and 13, respectively. The degrees of freedom account for the two estimated moving average coefficients.

The Ljung-Box Q-test confirms the sample ACF and PACF results. The null hypothesis that all autocorrelations are jointly equal to zero up to the tested lag is not rejected (h = 0) for any of the three lags.

Check predictive performance.

Use a holdout sample to compute the predictive MSE of the model. Use the first 100 observations to estimate the model, and then forecast the next 44 periods.

    y1 = y(1:100);
    y2 = y(101:end);
    Mdl1 = estimate(Mdl,y1);

    ARIMA(0,1,1) Model Seasonally Integrated with Seasonal MA(12) (Gaussian Distribution):

                      Value      StandardError    TStatistic      PValue
                    _________    _____________    __________    __________
        Constant            0              0           NaN             NaN
        MA{1}        -0.35674       0.089461       -3.9876      6.6739e-05
        SMA{12}      -0.63319       0.098744       -6.4124      1.4326e-10
        Variance    0.0013285     0.00015882         8.365       6.013e-17

    yF1 = forecast(Mdl1,44,'Y0',y1);
    pmse = mean((y2-yF1).^2)

    pmse = 0.0069

    figure
    plot(y2,'r','LineWidth',2)
    hold on
    plot(yF1,'k--','LineWidth',1.5)
    xlim([0,44])
    title('Prediction Error')
    legend('Observed','Forecast','Location','northwest')
    hold off
The predictive ability of the model is quite good. You can optionally compare the PMSE for this model with the PMSE for a competing model to help with model selection.
See Also

Objects
arima

Functions
autocorr | parcorr | lbqtest | estimate | infer | forecast

More About
• “Specify Multiplicative ARIMA Model” on page 7-58
• “Estimate Multiplicative ARIMA Model” on page 7-122
• “Simulate Multiplicative ARIMA Models” on page 7-164
• “Forecast Multiplicative ARIMA Model” on page 7-179
• “Detect Autocorrelation” on page 3-20
• “Goodness of Fit” on page 3-86
• “Residual Diagnostics” on page 3-87
• “Assess Predictive Performance” on page 3-89
• “MMSE Forecasting of Conditional Mean Models” on page 7-172
• “Autocorrelation and Partial Autocorrelation” on page 3-11
• “Ljung-Box Q-Test” on page 3-18
Goodness of Fit

After specifying a model and estimating its parameters, it is good practice to perform goodness-of-fit checks to diagnose the adequacy of your fitted model. When assessing model adequacy, areas of primary concern are:

• Violations of model assumptions, potentially resulting in bias and inaccurate standard errors
• Poor predictive performance
• Missing explanatory variables

Goodness-of-fit checks can help you identify areas of model inadequacy. They can also suggest ways to improve your model. For example, if you conduct a test for residual autocorrelation and get a significant result, you might be able to improve your model fit by adding additional autoregressive or moving average terms.

Some strategies for evaluating goodness of fit are:

• Compare your model against an augmented alternative. Make comparisons, for example, by conducting a likelihood ratio test. Testing your model against a more elaborate alternative model is a way to assess evidence of inadequacy. Give careful thought when choosing an alternative model.
• Making residual diagnostic plots is an informal but useful way to assess violation of model assumptions. You can plot residuals to check for normality, residual autocorrelation, residual heteroscedasticity, and missing predictors. Formal tests for autocorrelation and heteroscedasticity can also help quantify possible model violations.
• Predictive performance checks. Divide your data into two parts: a training set and a validation set. Fit your model using only the training data, and then forecast the fitted model over the validation period. By comparing model forecasts against the true, holdout observations, you can assess the predictive performance of your model. Prediction mean square error (PMSE) can be calculated as a numerical summary of the predictive performance. When choosing among competing models, you can look at their respective PMSE values to compare predictive performance.
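The PMSE comparison described in the last strategy can be sketched in a few lines. All numbers below are hypothetical and serve only to show the calculation; in practice the forecasts would come from models fitted to the training set.

```matlab
% Sketch: compare two competing forecasts by prediction mean square error.
% All numbers are HYPOTHETICAL and serve only to show the calculation.
yHoldout = [5.1; 5.3; 5.0; 5.4];      % validation-set observations
yF_A     = [5.0; 5.2; 5.1; 5.3];      % forecasts from model A
yF_B     = [5.4; 5.0; 5.3; 5.0];      % forecasts from model B

pmseA = mean((yHoldout - yF_A).^2);   % average squared forecast error, model A
pmseB = mean((yHoldout - yF_B).^2);   % average squared forecast error, model B
fprintf('PMSE A = %.4f, PMSE B = %.4f\n',pmseA,pmseB)
```

With these made-up values, model A has the smaller PMSE (0.0100 against 0.1075), so it would be preferred on predictive grounds alone.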
See Also

Related Examples
• “Select ARIMA Model for Time Series Using Box-Jenkins Methodology” on page 3-4
• “Check Fit of Multiplicative ARIMA Model” on page 3-81
• “Compare GARCH Models Using Likelihood Ratio Test” on page 3-67

More About
• “Residual Diagnostics” on page 3-87
• “Model Comparison Tests” on page 3-58
• “Assess Predictive Performance” on page 3-89
Residual Diagnostics
In this section...
“Check Residuals for Normality” on page 3-87
“Check Residuals for Autocorrelation” on page 3-87
“Check Residuals for Conditional Heteroscedasticity” on page 3-87
Check Residuals for Normality
A common assumption of time series models is a Gaussian innovation distribution. After fitting a model, you can infer residuals and check them for normality. If the Gaussian innovation assumption holds, the residuals should look approximately normally distributed. Some plots for assessing normality are:
• Histogram
• Box plot
• Quantile-quantile plot
• Kernel density estimate
The last three plots are in Statistics and Machine Learning Toolbox. If you see that your standardized residuals have excess kurtosis (fatter tails) compared to a standard normal distribution, you can consider using a Student’s t innovation distribution.
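A minimal sketch of the normality check follows, assuming a simulated AR(1) series stands in for real data. The simulated series, the model order, and the variable names are assumptions made for illustration. The qqplot and kurtosis functions require Statistics and Machine Learning Toolbox.

```matlab
rng(0)                                               % for reproducibility
DGP = arima(Constant=0,AR=0.5,Variance=1);           % hypothetical data-generating model
y = simulate(DGP,200);                               % stand-in for observed data

EstMdl = estimate(arima(1,0,0),y,Display="off");     % fit an AR(1) model
[res,v] = infer(EstMdl,y);                           % residuals and conditional variances
stdRes = res./sqrt(v);                               % standardized residuals

excessKurt = kurtosis(stdRes) - 3;                   % near 0 for Gaussian residuals
figure
qqplot(stdRes)                                       % points near the line suggest normality
```

A clearly positive excess kurtosis, together with a quantile-quantile plot that bends away from the line in both tails, would point toward a t innovation distribution.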
Check Residuals for Autocorrelation
In time series models, the innovation process is assumed to be uncorrelated. After fitting a model, you can infer residuals and check them for any unmodeled autocorrelation. As an informal check, you can plot the sample autocorrelation function (ACF) and partial autocorrelation function (PACF). If either plot shows significant autocorrelation in the residuals, you can consider modifying your model to include additional autoregressive or moving average terms.
More formally, you can conduct a Ljung-Box Q-test on the residual series. This tests the null hypothesis of jointly zero autocorrelations up to lag m, against the alternative of at least one nonzero autocorrelation. You can conduct the test at several values of m. The degrees of freedom for the Q-test are usually m. However, for testing a residual series, you should use degrees of freedom m – p – q, where p and q are the number of AR and MA coefficients in the fitted model, respectively.
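The degrees-of-freedom adjustment can be sketched with lbqtest. Here, white noise stands in for the residuals of a fitted ARMA(1,1) model; the series and the orders p and q are assumptions for illustration.

```matlab
rng(1)                        % for reproducibility
res = randn(200,1);           % stand-in for residuals of a fitted ARMA(1,1) model
p = 1;                        % number of AR coefficients (assumed)
q = 1;                        % number of MA coefficients (assumed)
m = 10;                       % number of lags in the test

% Ljung-Box Q-test with degrees of freedom reduced to m - p - q
[h,pValue] = lbqtest(res,Lags=m,DoF=m-p-q);
fprintf("h = %d, p-value = %.4f\n",h,pValue)
```

Repeating the call for several values of m shows whether the conclusion is sensitive to the choice of lag.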
Check Residuals for Conditional Heteroscedasticity
A white noise innovation process has constant variance. After fitting a model, you can infer residuals and check them for heteroscedasticity (nonconstant variance). As an informal check, you can plot the sample ACF and PACF of the squared residual series. If either plot shows significant autocorrelation, you can consider modifying your model to include a conditional variance process.
More formally, you can conduct Engle’s ARCH test on the residual series. This tests the null hypothesis of no ARCH effects against the alternative of an ARCH model with k lags.
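A short sketch of Engle's ARCH test with k = 2 lags follows. The white noise series is a stand-in for model residuals and is an assumption for illustration.

```matlab
rng(2)                        % for reproducibility
res = randn(200,1);           % stand-in for a residual series (no ARCH effects)
k = 2;                        % number of ARCH lags under the alternative

[h,pValue,stat,cValue] = archtest(res,Lags=k);
fprintf("h = %d, p-value = %.4f, stat = %.4f, cValue = %.4f\n", ...
    h,pValue,stat,cValue)
```

The critical value is the 95th percentile of a chi-square distribution with k degrees of freedom, so it depends only on k and the significance level, not on the data.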
See Also Apps Econometric Modeler Functions histogram | boxplot | qqplot | ksdensity | autocorr | parcorr | lbqtest | archtest
Related Examples •
“Implement Box-Jenkins Model Selection and Estimation Using Econometric Modeler App” on page 4-112
•
“Select ARIMA Model for Time Series Using Box-Jenkins Methodology” on page 3-4
•
“Detect Autocorrelation” on page 3-20
•
“Detect ARCH Effects” on page 3-28
•
“Check Fit of Multiplicative ARIMA Model” on page 3-81
More About
•
“Goodness of Fit” on page 3-86
•
“Assess Predictive Performance” on page 3-89
•
“Ljung-Box Q-Test” on page 3-18
•
“Engle’s ARCH Test” on page 3-26
•
“Autocorrelation and Partial Autocorrelation” on page 3-11
Assess Predictive Performance
If you plan to use a fitted model for forecasting, a good practice is to assess the predictive ability of the model. Models that fit well in-sample are not guaranteed to forecast well. For example, overfitting can lead to good in-sample fit, but poor predictive performance. When checking predictive performance, it is important to not use your data twice. That is, the data you use to fit your model should be different from the data you use to assess forecasts. You can use cross validation to evaluate out-of-sample forecasting ability:
1. Divide your time series into two parts: a training set and a validation set.
2. Fit a model to your training data.
3. Forecast the fitted model over the validation period.
4. Compare the forecasts to the holdout validation observations using plots and numerical summaries (such as prediction mean square error).
Prediction mean square error (PMSE) measures the discrepancy between model forecasts and observed data. Suppose you have a time series of length N, and you set aside M validation points, denoted $y_1^v, y_2^v, \ldots, y_M^v$. After fitting your model to the first N – M data points (the training set), generate forecasts $\hat{y}_1^v, \hat{y}_2^v, \ldots, \hat{y}_M^v$. The model PMSE is calculated as
$$\mathrm{PMSE} = \frac{1}{M}\sum_{i=1}^{M}\left(y_i^v - \hat{y}_i^v\right)^2.$$
You can calculate PMSE for various choices of M to verify the robustness of your results.
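The holdout procedure can be sketched as follows. The simulated series, the ARIMA(1,1,0) candidate model, and the split sizes are assumptions chosen for illustration, not recommendations.

```matlab
rng(3)                                    % for reproducibility
N = 120;                                  % series length
M = 20;                                   % number of validation points
y = cumsum(0.1 + randn(N,1));             % hypothetical series (random walk with drift)

yTrain = y(1:N-M);                        % training set: first N - M observations
yValid = y(N-M+1:end);                    % validation set: last M observations

Mdl = arima(1,1,0);                       % candidate model (assumed)
EstMdl = estimate(Mdl,yTrain,Display="off");
yF = forecast(EstMdl,M,yTrain);           % M-step-ahead forecasts

PMSE = mean((yValid - yF).^2);            % prediction mean square error
fprintf("PMSE = %.4f\n",PMSE)
```

Rerunning the sketch with different values of M, or with a competing model in place of Mdl, gives PMSE values that you can compare directly because they are computed over the same validation points.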
See Also Related Examples •
“Check Fit of Multiplicative ARIMA Model” on page 3-81
More About •
“Goodness of Fit” on page 3-86
•
“Residual Diagnostics” on page 3-87
•
“MMSE Forecasting of Conditional Mean Models” on page 7-172
•
“MMSE Forecasting of Conditional Variance Models” on page 8-90
Nonspherical Models
What Are Nonspherical Models?
Consider the linear time series model $y_t = x_t\beta + \varepsilon_t$, where $y_t$ is the response, $x_t$ is a vector of values for the r predictors, $\beta$ is the vector of regression coefficients, and $\varepsilon_t$ is the random innovation at time t. Ordinary least squares (OLS) estimation and inference techniques for this framework depend on certain assumptions, e.g., homoscedastic and uncorrelated innovations. For more details on the classical linear model, see “Time Series Regression I: Linear Models” on page 5-173. If your data exhibits signs of assumption violations, then OLS estimates or inferences based on them might not be valid.
In particular, if the data is generated with an innovations process that exhibits autocorrelation or heteroscedasticity, then the model (or the residuals) are nonspherical. These characteristics are often detected through testing of model residuals (for details, see “Time Series Regression VI: Residual Diagnostics” on page 5-220).
Nonspherical residuals are often considered a sign of model misspecification, and models are revised to whiten the residuals and improve the reliability of standard estimation techniques. In some cases, however, nonspherical models must be accepted as they are, and estimated as accurately as possible using revised techniques. Cases include:
• Models presented by theory
• Models with predictors that are dictated by policy
• Models without available data sources, for which predictor proxies must be found
A variety of alternative estimation techniques have been developed to deal with these situations.
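The term can be made concrete with the innovation covariance matrix. The following is the standard textbook formulation, stated here for context; the stacked notation (X for the T-row predictor matrix, ε for the innovation vector, Ω for its covariance) is introduced for this sketch only.

```latex
% Spherical innovations: constant variance, no correlation
\operatorname{Var}(\varepsilon) = \sigma^2 I_T
\quad\Longrightarrow\quad
\operatorname{Var}(\hat{\beta}_{\mathrm{OLS}}) = \sigma^2 (X'X)^{-1}.

% Nonspherical innovations: general covariance \Omega \neq \sigma^2 I_T
\operatorname{Var}(\varepsilon) = \Omega
\quad\Longrightarrow\quad
\operatorname{Var}(\hat{\beta}_{\mathrm{OLS}}) = (X'X)^{-1} X' \Omega X \, (X'X)^{-1}.
```

When Ω is not proportional to the identity, the usual formula σ²(X′X)⁻¹ no longer describes the sampling variability of the OLS estimates, which is why the alternative estimation techniques are needed.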
See Also Related Examples
•
“Plot a Confidence Band Using HAC Estimates” on page 3-91
•
“Change the Bandwidth of a HAC Estimator” on page 3-98
•
“Time Series Regression I: Linear Models” on page 5-173
•
“Time Series Regression VI: Residual Diagnostics” on page 5-220
•
“Time Series Regression X: Generalized Least Squares and HAC Estimators” on page 5-279
Plot a Confidence Band Using HAC Estimates
This example shows how to plot heteroscedasticity-and-autocorrelation consistent (HAC) corrected confidence bands using Newey-West robust standard errors.
One way to estimate the coefficients of a linear model is by OLS. However, time series models tend to have innovations that are autocorrelated and heteroscedastic (i.e., the errors are nonspherical). If a time series model has nonspherical errors, then the usual formulae for standard errors of OLS coefficients are biased and inconsistent. Inference based on these inconsistent standard errors tends to inflate the Type I error rate. One way to account for nonspherical errors is to use HAC standard errors. In particular, the Newey-West estimator of the OLS coefficient covariance is relatively robust against nonspherical errors.
Load Data
Load the Canadian electric power consumption data set from the World Bank. The response is Canada's electrical energy consumption in kWh (DataTimeTable.consump), the predictor is Canada's GDP in year 2000 USD (DataTimeTable.gdp), and the data set also contains the GDP deflator (DataTimeTable.gdp_deflator). Because DataTimeTable is a timetable, DataTimeTable.Time is the sample year.
    load Data_PowerConsumption
Define Model
Model the behavior of the annual difference in electrical energy consumption with respect to real GDP as a linear model:
$$\mathrm{consumpDiff}_t = \beta_0 + \beta_1\,\mathrm{rGDP}_t + \varepsilon_t.$$
    consumpDiff = diff(DataTimeTable.consump);                % Annual difference in consumption
    consumpDiff = consumpDiff/1.0e+10;                        % Scale for numerical stability
    T = size(consumpDiff,1);
    rGDP = DataTimeTable.gdp./(DataTimeTable.gdp_deflator);   % Deflate GDP
    rGDP = rGDP(2:end)/1.0e+10;
    Mdl = fitlm(rGDP,consumpDiff);
    coeff = Mdl.Coefficients(:,1);
    EstParamCov = Mdl.CoefficientCovariance;
    resid = Mdl.Residuals.Raw;
Plot Data
Plot the difference in energy consumption, consumpDiff, versus the real GDP to check for possible heteroscedasticity.
    figure
    plot(rGDP,consumpDiff,".")
    title("Annual Difference in Energy Consumption vs real GDP - Canada")
    xlabel("Real GDP (year 2000 USD)")
    ylabel("Annual Difference in Energy Consumption (kWh)")
The figure indicates that heteroscedasticity might be present in the annual difference in energy consumption. As real GDP increases, the annual difference in energy consumption seems to be less variable.
Plot Residuals
Plot the residuals from Mdl against the fitted values and year to assess heteroscedasticity and autocorrelation.
    figure
    tiledlayout("flow")
    nexttile([1 2])
    plot(Mdl.Fitted,resid,".")
    hold on
    plot([min(Mdl.Fitted) max(Mdl.Fitted)],[0 0],"k-")
    title("Residual Plots")
    xlabel("Fitted Consumption")
    ylabel("Residuals")
    axis tight
    hold off
    nexttile
    autocorr(resid)
    h1 = gca;
    h1.FontSize = 8;
    nexttile
    parcorr(resid)
    h2 = gca;
    h2.FontSize = 8;
The residual plot reveals decreasing residual variance with increasing fitted consumption. The autocorrelation function shows that autocorrelation might be present in the first few lagged residuals.
Test for Heteroscedasticity and Autocorrelation
Test for conditional heteroscedasticity using Engle's ARCH test. Test for autocorrelation using the Ljung-Box Q-test. Test for overall correlation using the Durbin-Watson test.
    [~,englePValue] = archtest(resid);
    englePValue

    englePValue = 0.1463

    [~,lbqPValue] = lbqtest(resid,Lag=1:3);   % Significance of first three lags
    lbqPValue

    lbqPValue = 1×3
        0.0905    0.1966    0.0522

    [dwPValue] = dwtest(Mdl);
    dwPValue

    dwPValue = 0.0024
The p-value of Engle's ARCH test suggests significant conditional heteroscedasticity at the 15% significance level. The p-values for the Ljung-Box Q-test suggest significant autocorrelation at the first and third lags at the 10% significance level. The p-value for the Durbin-Watson test suggests that there is strong evidence for overall residual autocorrelation. The results of the tests suggest that the standard linear model conditions of homoscedasticity and uncorrelated errors are violated, and inferences based on the OLS coefficient covariance matrix are suspect.
One way to proceed with inference (such as constructing a confidence band) is to correct the OLS coefficient covariance matrix by estimating the Newey-West coefficient covariance.
Estimate Newey-West Coefficient Covariance
Correct the OLS coefficient covariance matrix by estimating the Newey-West coefficient covariance using hac. Compute the maximum lag to be weighted for the standard Newey-West estimate, maxLag (Newey and West, 1994). Use hac to estimate the standard Newey-West coefficient covariance.
    maxLag = floor(4*(T/100)^(2/9));
    [NWEstParamCov,~,NWCoeff] = hac(Mdl,Type="HAC", ...
        Bandwidth=maxLag+1);

    Estimator type: HAC
    Estimation method: BT
    Bandwidth: 4.0000
    Whitening order: 0
    Effective sample size: 49
    Small sample correction: on

    Coefficient Covariances:

           |  Const      x1
    --------------------------
     Const |  0.3720  -0.2990
     x1    | -0.2990   0.2454
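As a check on the bandwidth reported by hac, the lag-selection rule can be worked through by hand. This sketch assumes T = 49, taken from the effective sample size in the displayed output.

```matlab
T = 49;                               % effective sample size reported by hac
maxLag = floor(4*(T/100)^(2/9));      % (0.49)^(2/9) is about 0.853, so 4*0.853 floors to 3
bandwidth = maxLag + 1;               % bandwidth passed to hac
fprintf("maxLag = %d, bandwidth = %d\n",maxLag,bandwidth)
```

The result, a bandwidth of 4, agrees with the "Bandwidth: 4.0000" line in the hac display.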
The Newey-West standard error for the coefficient of rGDP, labeled x1 in the table, is less than the usual OLS standard error. This suggests that, in this data set, correcting for residual heteroscedasticity and autocorrelation increases the precision in measuring the linear effect of real GDP on energy consumption.
Calculate Working-Hotelling Confidence Bands
Compute the 95% Working-Hotelling confidence band for each covariance estimate using nlpredci (Kutner et al., 2005).
    rGDPdes = [ones(T,1) rGDP];                           % Design matrix
    modelfun = @(b,x)(b(1)*x(:,1)+b(2)*x(:,2));           % Define the linear model
    [beta,nlresid,~,EstParamCov] = nlinfit(rGDPdes, ...
        consumpDiff,modelfun,[1,1]);                      % Estimate the model
    [fity,fitcb] = nlpredci(modelfun,rGDPdes,beta,nlresid, ...
        Covar=EstParamCov,SimOpt="on");                   % Margin of error
    conbandnl = [fity - fitcb fity + fitcb];              % Confidence bands
    [fity,NWfitcb] = nlpredci(modelfun,rGDPdes, ...
        beta,nlresid,Covar=NWEstParamCov,SimOpt="on");    % Corrected margin of error
    NWconbandnl = [fity - NWfitcb fity + NWfitcb];        % Corrected confidence bands
Plot Working-Hotelling Confidence Bands
Plot the Working-Hotelling confidence bands on the same axes twice: one plot displaying electrical energy consumption with respect to real GDP, and the other displaying the electrical energy consumption time series.
    figure
    l1 = plot(rGDP,consumpDiff,"k.");
    hold on
    l2 = plot(rGDP,fity,"b-",LineWidth=2);
    l3 = plot(rGDP,conbandnl,"r-");
    l4 = plot(rGDP,NWconbandnl,"g--");
    title("Data with 95% Working-Hotelling Conf. Bands")
    xlabel("real GDP (year 2000 USD)")
    ylabel("Consumption (kWh)")
    axis([0.7 1.4 -2 2.5])
    legend([l1 l2 l3(1) l4(1)],"Data","Fitted","95% conf. band", ...
        "Newey-West 95% conf. band",Location="southeast")
    hold off

    figure
    year = DataTimeTable.Time(2:end);
    l1 = plot(year,consumpDiff);
    hold on
    l2 = plot(year,fity,"k-",LineWidth=2);
    l3 = plot(year,conbandnl,"r-");
    l4 = plot(year,NWconbandnl,"g--");
    title("Consumption with 95% Working-Hotelling Conf. Bands")
    xlabel("Year")
    ylabel("Consumption (kWh)")
    legend([l1 l2 l3(1) l4(1)],"Consumption","Fitted", ...
        "95% conf. band","Newey-West 95% conf. band", ...
        Location="southwest")
    hold off
The plots show that the Newey-West estimator accounts for the heteroscedasticity in that the confidence band is wide in areas of high volatility, and thin in areas of low volatility. The OLS coefficient covariance estimator ignores this pattern of volatility.
References
[1] Kutner, M. H., C. J. Nachtsheim, J. Neter, and W. Li. Applied Linear Statistical Models. 5th Ed. New York: McGraw-Hill/Irwin, 2005.
[2] Newey, W. K., and K. D. West. "A Simple, Positive Semi-definite, Heteroskedasticity and Autocorrelation Consistent Covariance Matrix." Econometrica. Vol. 55, No. 3, 1987, pp. 703-708.
[3] Newey, W. K., and K. D. West. "Automatic Lag Selection in Covariance Matrix Estimation." The Review of Economic Studies. Vol. 61, No. 4, 1994, pp. 631-653.
See Also Related Examples •
“Change the Bandwidth of a HAC Estimator” on page 3-98
•
“Time Series Regression VI: Residual Diagnostics” on page 5-220
•
“Time Series Regression X: Generalized Least Squares and HAC Estimators” on page 5-279
More About •
“Nonspherical Models” on page 3-90
Change the Bandwidth of a HAC Estimator
This example shows how to change the bandwidth when estimating a HAC coefficient covariance, and compare estimates over varying bandwidths and kernels.
How does the bandwidth affect HAC estimators? If you change it, are there large differences in the estimates, and, if so, are the differences practically significant? Explore bandwidth effects by estimating HAC coefficient covariances over a grid of bandwidths.
Load and Plot Data
Determine how the cost of living affects the behavior of nominal wages. Load the Nelson-Plosser data set to explore their statistical relationship.
    load Data_NelsonPlosser
    DTT = rmmissing(DataTimeTable);
    cpi = DTT.CPI;    % Cost of living
    wm = DTT.WN;      % Nominal wages
    figure
    plot(DTT.CPI,DTT.WN,"o")
    hFit = lsline;    % Regression line
    xlabel("Consumer Price Index (1967 = 100)")
    ylabel("Nominal Wages (current $)")
    legend(hFit,"OLS Line",Location="southeast")
    title("{\bf Cost of Living}")
    grid on
The plot suggests that a linear model might capture the relationship between the two variables.
Define Model
Model the behavior of nominal wages with respect to CPI as this linear model.
$$\mathrm{wm}_t = \beta_0 + \beta_1\,\mathrm{cpi}_t + \varepsilon_t$$
    Mdl = fitlm(DTT.CPI,DTT.WN)

    Mdl =
    Linear regression model:
        y ~ 1 + x1

    Estimated Coefficients:
                       Estimate      SE       tStat       pValue
                       ________    ______    _______    _________
        (Intercept)    -2541.5     174.64    -14.553    2.407e-21
        x1              88.041     2.6784     32.871    4.507e-40

    Number of observations: 62, Error degrees of freedom: 60
    Root Mean Squared Error: 494
    R-squared: 0.947, Adjusted R-Squared: 0.947
    F-statistic vs. constant model: 1.08e+03, p-value = 4.51e-40
    coeffCPI = Mdl.Coefficients.Estimate(2);
    seCPI = Mdl.Coefficients.SE(2);
Plot Residuals
Plot the residuals from Mdl in observation order to assess heteroscedasticity and autocorrelation.
    figure
    stem(Mdl.Residuals.Raw)
    xlabel("Observation")
    ylabel("Residual")
    title("{\bf Linear Model Residuals}")
    axis tight
    grid on
The residual plot shows varying levels of dispersion, which indicates heteroscedasticity. Neighboring residuals (with respect to observation) tend to have the same sign and magnitude, which indicates the presence of autocorrelation.
Estimate HAC Standard Errors
Obtain HAC standard errors over varying bandwidths using the Bartlett (for the Newey-West estimate) and quadratic spectral kernels.
    numEstimates = 10;
    stdErrBT = zeros(numEstimates,1);
    stdErrQS = zeros(numEstimates,1);
    for bw = 1:numEstimates
        [~,CoeffTbl] = hac(DTT,ResponseVariable="WN",PredictorVariables="CPI", ...
            Bandwidth=bw,Display="off");                  % Newey-West
        [~,CoeffTblQS] = hac(DTT,ResponseVariable="WN",PredictorVariables="CPI", ...
            Weights="QS",Bandwidth=bw,Display="off");     % HAC using quadratic spectral kernel
        stdErrBT(bw) = CoeffTbl.SE(2);
        stdErrQS(bw) = CoeffTblQS.SE(2);
    end
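To see why the two kernels in the loop can give different standard errors at the same bandwidth, it helps to compare the weights they assign to lagged autocovariances. The sketch below uses the textbook forms of the Bartlett and quadratic spectral kernels (Andrews, 1991) evaluated at lag/bandwidth. That hac applies exactly these forms internally is an assumption here, and the variable names are illustrative.

```matlab
bw = 4;                                  % example bandwidth
lags = (0:bw)';                          % lags 0 through bw
x = lags/bw;                             % kernel argument

% Bartlett kernel: linearly decaying weights, zero at and beyond the bandwidth
wBT = max(1 - abs(x),0);

% Quadratic spectral kernel: smooth weights that do not truncate
z = 6*pi*x/5;
wQS = ones(size(x));                     % limiting value at x = 0 is 1
nz = x ~= 0;
wQS(nz) = 25./(12*pi^2*x(nz).^2).*(sin(z(nz))./z(nz) - cos(z(nz)));

disp(table(lags,wBT,wQS))
```

The Bartlett weights reach zero at the bandwidth, while the quadratic spectral weights remain nonzero there, so the latter kernel draws on autocovariances at longer lags for the same bandwidth value.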
You can increase numEstimates to discover how increasing bandwidths affect the HAC estimates.
Plot Standard Errors
Visually compare the Newey-West standard errors of β1 to those using the quadratic spectral kernel over the bandwidth grid.
    figure
    hold on
    hCoeff = plot(1:numEstimates,repmat(coeffCPI,numEstimates,1), ...
        LineWidth=2);
    hOLS = plot(1:numEstimates,repmat(coeffCPI+seCPI,numEstimates,1), ...
        "g--");
    plot(1:numEstimates,repmat(coeffCPI-seCPI,numEstimates,1),"g--")
    hBT = plot(1:numEstimates,coeffCPI+stdErrBT,"ro--");
    plot(1:numEstimates,coeffCPI-stdErrBT,"ro--")
    hQS = plot(1:numEstimates,coeffCPI+stdErrQS,"kp--", ...
        LineWidth=2);
    plot(1:numEstimates,coeffCPI-stdErrQS,"kp--",LineWidth=2)
    hold off
    xlabel("Bandwidth")
    ylabel("CPI Coefficient")
    legend([hCoeff,hOLS,hBT,hQS],["OLS estimate" "OLS SE" ...
        "Newey-West SE" "Quadratic spectral SE"],Location="east")
    title("{\bf CPI Coefficient Standard Errors}")
    grid on
The plot suggests that, for this data set, accounting for heteroscedasticity and autocorrelation using either HAC estimate results in more conservative intervals than the usual OLS standard error. The precision of the HAC estimates decreases as the bandwidth increases along the defined grid.
For this data set, the Newey-West estimates are slightly more precise than those using the quadratic spectral kernel. This might be because the latter captures heteroscedasticity and autocorrelation better than the former.
References
[1] Andrews, D. W. K. "Heteroskedasticity and Autocorrelation Consistent Covariance Matrix Estimation." Econometrica. Vol. 59, 1991, pp. 817-858.
[2] Newey, W. K., and K. D. West. "A Simple, Positive Semi-definite, Heteroskedasticity and Autocorrelation Consistent Covariance Matrix." Econometrica. Vol. 55, No. 3, 1987, pp. 703-708.
[3] Newey, W. K., and K. D. West. "Automatic Lag Selection in Covariance Matrix Estimation." The Review of Economic Studies. Vol. 61, No. 4, 1994, pp. 631-653.
See Also Related Examples
•
“Plot a Confidence Band Using HAC Estimates” on page 3-91
•
“Time Series Regression VI: Residual Diagnostics” on page 5-220
•
“Time Series Regression X: Generalized Least Squares and HAC Estimators” on page 5-279
More About •
“Nonspherical Models” on page 3-90
Check Model Assumptions for Chow Test
This example shows how to check the model assumptions for a Chow test. The model is of U.S. gross domestic product (GDP), with consumer price index (CPI) and paid compensation of employees (COE) as predictors. The forecast horizon is 2007 - 2009, just before and after the 2008 U.S. recession began.
Load and Inspect Data
Load the U.S. macroeconomic data set.
    load Data_USEconModel
The time series in the data set contain quarterly, macroeconomic measurements from 1947 to 2009. For more details, a list of variables, and descriptions, enter Description at the command line.
Extract the predictors and the response from the table. Focus the sample on observations taken from 1960 - 2009.
    idx = year(DataTimeTable.Time) >= 1960;
    dates = DataTimeTable.Time(idx);
    y = DataTimeTable.GDP(idx);
    X = DataTimeTable{idx,["CPIAUCSL" "COE"]};
    varNames = ["CPIAUCSL" "COE" "GDP"];
Identify forecast horizon indices.
    fHIdx = year(dates) >= 2007;
Plot all series individually. Identify the periods of recession.
    figure
    tiledlayout(2,2)
    nexttile
    plot(dates,y)
    title(varNames{end});
    xlabel("Year");
    axis tight;
    datetick;
    recessionplot;
    for j = 1:size(X,2)
        nexttile
        plot(dates,X(:,j))
        title(varNames{j});
        xlabel("Year");
        axis tight;
        datetick;
        recessionplot;
    end
All variables appear to grow exponentially. Also, around the last recession, a decline appears. Suppose that a linear regression model of GDP onto CPI and COE is appropriate, and you want to test whether there is a structural change in the model in 2007.
Check Chow Test Assumptions
Chow tests rely on:
• Independent, Gaussian-distributed innovations
• Constancy of the innovations variance within subsamples
• Constancy of the innovations across any structural breaks
If a model violates these assumptions, then the Chow test result might not be correct, or the Chow test might lack power. Investigate whether the assumptions hold. If any do not, preprocess the data further.
Fit the linear model to the entire series. Include an intercept.
    Mdl = fitlm(X,y);
Mdl is a LinearModel model object.
Draw two residual plots: one of each residual against the previous (lagged) residual, and the other of the residuals in case order.
    figure
    tiledlayout(2,1)
    nexttile
    plotResiduals(Mdl,"lagged");
    nexttile
    plotResiduals(Mdl,"caseorder");
Because the scatter plot of residual vs. lagged residual forms a trend, autocorrelation exists in the residuals. Also, residuals on the extremes seem to flare out, which suggests the presence of heteroscedasticity.
Conduct Engle's ARCH test at the 5% level of significance to assess whether the innovations have conditional heteroscedasticity with ARCH(1) effects. Supply the table of residuals and specify the raw residuals.
    StatTbl = archtest(Mdl.Residuals,DataVariable="Raw")

    StatTbl=1×6 table
                    h      pValue     stat     cValue    Lags    Alpha
                  _____    ______    ______    ______    ____    _____
        Test 1    true       0       109.37    3.8415     1      0.05
h = 1 indicates rejection of the null hypothesis that the entire residual series has no conditional heteroscedasticity.
Apply the log transformation to all series that appear to grow exponentially to reduce the effects of heteroscedasticity.
    y = log(y);
    X = log(X);
To account for autocorrelation, create predictor variables for all exponential series by lagging them by one period.
    LagMat = lagmatrix([X y],1);
    X = [X(2:end,:) LagMat(2:end,:)];   % Concatenate data and remove first row
    fHIdx = fHIdx(2:end);
    y = y(2:end);
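The reason the first row is removed becomes clear from a small illustration of lagmatrix. The matrix Z below is a toy input invented for this sketch.

```matlab
Z = [1 10; 2 20; 3 30];      % toy data: two series, three observations
L = lagmatrix(Z,1);          % shift each column down by one period

% The first row of L has no predecessor, so it is filled with NaN:
%   L = [NaN NaN; 1 10; 2 20]
disp(L)
```

Because the lagged predictors are undefined for the first observation, the example drops the first row of the data, the response, and the forecast horizon index so that all arrays stay aligned.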
Based on the residual diagnostics, choose this linear model for GDP:
$$\mathrm{GDP}_t = \beta_0 + \beta_1\mathrm{CPIAUCSL}_t + \beta_2\mathrm{COE}_t + \beta_3\mathrm{CPIAUCSL}_{t-1} + \beta_4\mathrm{COE}_{t-1} + \beta_5\mathrm{GDP}_{t-1} + \varepsilon_t.$$
$\varepsilon_t$ should be a Gaussian series of innovations with mean zero and constant variance $\sigma^2$.
Diagnose the residuals again.
    Mdl = fitlm(X,y);
    figure
    tiledlayout(2,1)
    nexttile
    plotResiduals(Mdl,"lagged");
    nexttile
    plotResiduals(Mdl,"caseorder");
    StatTbl = archtest(Mdl.Residuals,DataVariable="Raw")

    StatTbl=1×6 table
                    h      pValue      stat     cValue    Lags    Alpha
                  _____    _______    ______    ______    ____    _____
        Test 1    false    0.28133    1.1607    3.8415     1      0.05

Compare the residual variances of the two subsamples by using a two-sample F-test for equal variances (vartest2).
    SubMdl = {fitlm(X(~fHIdx,:),y(~fHIdx)) fitlm(X(fHIdx,:),y(fHIdx))};
    subRes = {SubMdl{1}.Residuals.Raw SubMdl{2}.Residuals.Raw};
    [hVT2,pValueVT2] = vartest2(subRes{1},subRes{2})

    hVT2 = 0
    pValueVT2 = 0.1645
The residual plots and tests suggest that the innovations are homoscedastic and uncorrelated.
Conduct a Kolmogorov-Smirnov test to assess whether the innovations are Gaussian.
    [hKS,pValueKS] = kstest(Mdl.Residuals.Raw/std(Mdl.Residuals.Raw))

    hKS = logical
       0
    pValueKS = 0.2347
hKS = 0 indicates a failure to reject the null hypothesis that the innovations are Gaussian.
For the distributed lag model, the Chow test assumptions appear valid.
Conduct Chow Test
Treating 2007 and beyond as a post-recession regime, test whether the linear model is stable. Specify that the break point is the last quarter of 2006. Because the complementary subsample size is greater than the number of coefficients, conduct a break point test.
    bp = find(~fHIdx,1,'last');
    chowtest(X,y,bp,'Display','summary');

    RESULTS SUMMARY
    ***************
    Test 1
    Sample size: 196
    Breakpoint: 187
    Test type: breakpoint
    Coefficients tested: All
    Statistic: 1.3741
    Critical value: 2.1481
    P value: 0.2272
    Significance level: 0.0500
    Decision: Fail to reject coefficient stability
The test fails to reject the stability of the linear model. Evidence is insufficient to infer a structural change between Q4-2006 and Q1-2007.
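For intuition about what a break point test measures, the following sketch computes the textbook Chow break point F statistic by hand on simulated data. It is an illustration under stated assumptions, not a description of the internals of chowtest; the data, the helper rss, and all variable names are hypothetical.

```matlab
rng(4)                                     % for reproducibility
n = 60;                                    % sample size
bp = 40;                                   % last observation of the initial subsample
X = [ones(n,1) randn(n,2)];                % design matrix with intercept
k = size(X,2);                             % number of coefficients
y = X*[1; 2; 3] + 0.5*randn(n,1);          % no structural break by construction

rss = @(X,y) sum((y - X*(X\y)).^2);        % residual sum of squares of an OLS fit

rssPooled = rss(X,y);                      % one regression over the full sample
rss1 = rss(X(1:bp,:),y(1:bp));             % initial subsample
rss2 = rss(X(bp+1:end,:),y(bp+1:end));     % complementary subsample

% Chow break point statistic: F(k, n - 2k) under coefficient stability
F = ((rssPooled - (rss1 + rss2))/k) / ((rss1 + rss2)/(n - 2*k));
fprintf("Chow F statistic = %.4f\n",F)
```

The statistic compares the fit of a single pooled regression with the combined fit of separate subsample regressions. In the example above, with 196 observations and 6 coefficients, the same construction would have (6, 184) degrees of freedom.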
See Also chowtest | archtest | vartest2 | fitlm | LinearModel
Related Examples •
“Power of the Chow Test” on page 3-110
Power of the Chow Test
This example shows how to estimate the power of a Chow test using a Monte Carlo simulation.
Introduction
Statistical power is the probability of rejecting the null hypothesis given that it is actually false. To estimate the power of a test:
1. Simulate many data sets from a model that typifies the alternative hypothesis.
2. Test each data set.
3. Estimate the power, which is the proportion of times the test rejects the null hypothesis.
The following can compromise the power of the Chow test:
• Linear model assumption departures
• Relatively large innovation variance
• Using the forecast test when the sample size of the complementary subsample is greater than the number of coefficients in the test [39].
Departures from model assumptions allow for an examination of the factors that most affect the power of the Chow test.
Consider the model
$$y = \begin{bmatrix} \mathtt{X1} & 0 \\ 0 & \mathtt{X2} \end{bmatrix}\begin{bmatrix} \mathtt{beta1} \\ \mathtt{beta2} \end{bmatrix} + \mathtt{innov}$$
• innov is a vector of random Gaussian variates with mean zero and standard deviation sigma.
• X1 and X2 are the sets of predictor data for initial and complementary subsamples, respectively.
• beta1 and beta2 are the regression coefficient vectors for the initial and complementary subsamples, respectively.
Simulate Predictor Data
Specify four predictors, 50 observations, and a break point at period 44 for the simulated linear model.
    numPreds = 4;
    numObs = 50;
    bp = 44;
    rng(1);   % For reproducibility
Form the predictor data by specifying means for the predictors, and then adding random, standard Gaussian noise to each of the means.
    mu = [0 1 2 3];
    X = repmat(mu,numObs,1) + randn(numObs,numPreds);
To indicate an intercept, add a column of ones to the predictor data.
    X = [ones(numObs,1) X];
    X1 = X(1:bp,:);       % Initial subsample predictors
    X2 = X(bp+1:end,:);   % Complementary subsample predictors
Specify the true values of the regression coefficients.
    beta1 = [1 2 3 4 5]';   % Initial subsample coefficients
Estimate Power for Small and Large Jump
Compare the power between the break point and forecast tests for jumps of different sizes in the intercept and second regression coefficient. In this example, a small jump is a 10% increase in the current value, and a large jump is a 15% increase.
    % Complementary subsample coefficients
    beta2Small = beta1 + [beta1(1)*0.1 0 beta1(3)*0.1 0 0 ]';
    beta2Large = beta1 + [beta1(1)*0.15 0 beta1(3)*0.15 0 0 ]';
Simulate 1000 response paths of the linear model for each of the small and large coefficient jumps. Specify that sigma is 0.2. Choose to test the intercept and the second regression coefficient.
    M = 1000;
    sigma = 0.2;
    Coeffs = [true false true false false];
    h1BP = nan(M,2);   % Preallocation
    h1F = nan(M,2);
    for j = 1:M
        innovSmall = sigma*randn(numObs,1);
        innovLarge = sigma*randn(numObs,1);
        ySmall = [X1 zeros(bp,size(X2,2)); ...
            zeros(numObs - bp,size(X1,2)) X2]*[beta1; beta2Small] + innovSmall;
        yLarge = [X1 zeros(bp,size(X2,2)); ...
            zeros(numObs - bp,size(X1,2)) X2]*[beta1; beta2Large] + innovLarge;
        h1BP(j,1) = chowtest(X,ySmall,bp,'Intercept',false,'Coeffs',Coeffs,...
            'Display','off')';
        h1BP(j,2) = chowtest(X,yLarge,bp,'Intercept',false,'Coeffs',Coeffs,...
            'Display','off')';
        h1F(j,1) = chowtest(X,ySmall,bp,'Intercept',false,'Coeffs',Coeffs,...
            'Test','forecast','Display','off')';
        h1F(j,2) = chowtest(X,yLarge,bp,'Intercept',false,'Coeffs',Coeffs,...
            'Test','forecast','Display','off')';
    end
Estimate the power by computing the proportion of times chowtest correctly rejected the null hypothesis of coefficient stability.
    power1BP = mean(h1BP);
    power1F = mean(h1F);
    table(power1BP',power1F','RowNames',{'Small_Jump','Large_Jump'},...
        'VariableNames',{'Breakpoint','Forecast'})

    ans=2×2 table
                      Breakpoint    Forecast
                      __________    ________
        Small_Jump      0.717        0.645
        Large_Jump      0.966         0.94
In this scenario, the Chow test can detect a change in the coefficient with more power when the jump is larger. The break point test has greater power to detect the jump than the forecast test.
Estimate Power for Large Innovations Variance
Simulate 1000 response paths of the linear model for a large coefficient jump. Specify that sigma is 0.4. Choose to test the intercept and the second regression coefficient.
    sigma = 0.4;
    h2BP = nan(M,1);
    h2F = nan(M,1);
    for j = 1:M
        innov = sigma*randn(numObs,1);
        y = [X1 zeros(bp,size(X2,2)); ...
            zeros(numObs - bp,size(X1,2)) X2]*[beta1; beta2Large] + innov;
        h2BP(j) = chowtest(X,y,bp,'Intercept',false,'Coeffs',Coeffs,...
            'Display','off')';
        h2F(j) = chowtest(X,y,bp,'Intercept',false,'Coeffs',Coeffs,...
            'Test','forecast','Display','off')';
    end
    power2BP = mean(h2BP);
    power2F = mean(h2F);
    table([power1BP(2); power2BP],[power1F(2); power2F],...
        'RowNames',{'Small_sigma','Large_Sigma'},...
        'VariableNames',{'Breakpoint','Forecast'})

    ans=2×2 table
                       Breakpoint    Forecast
                       __________    ________
        Small_sigma      0.966         0.94
        Large_Sigma      0.418        0.352
For larger innovation variance, both Chow tests have difficulty detecting the large structural breaks in the intercept and second regression coefficient.
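The block matrix that both simulation loops assemble by hand can also be built with the base MATLAB function blkdiag. The small matrices below are toy values invented to make the equivalence easy to verify; they are not the example's data.

```matlab
X1 = [1 2; 3 4; 5 6];                 % toy initial subsample predictors
X2 = [7 8; 9 10];                     % toy complementary subsample predictors
beta1 = [1; 2];                       % toy initial subsample coefficients
beta2 = [3; 4];                       % toy complementary subsample coefficients

Xblock = blkdiag(X1,X2);              % same as [X1 zeros(3,2); zeros(2,2) X2]
yMean = Xblock*[beta1; beta2];        % noise-free response

% Each subsample response uses only its own coefficients
disp([yMean [X1*beta1; X2*beta2]])
```

The two displayed columns are identical, which shows that the block model is equivalent to stacking two separate regressions, one per subsample.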
See Also chowtest
Related Examples •
“Check Model Assumptions for Chow Test” on page 3-104
4 Econometric Modeler
• “Analyze Time Series Data Using Econometric Modeler” on page 4-2
• “Specifying Univariate Lag Operator Polynomials Interactively” on page 4-44
• “Specifying Multivariate Lag Operator Polynomials and Coefficient Constraints Interactively” on page 4-50
• “Prepare Time Series Data for Econometric Modeler App” on page 4-59
• “Import Time Series Data into Econometric Modeler App” on page 4-62
• “Plot Time Series Data Using Econometric Modeler App” on page 4-66
• “Detect Serial Correlation Using Econometric Modeler App” on page 4-71
• “Detect ARCH Effects Using Econometric Modeler App” on page 4-77
• “Assess Stationarity of Time Series Using Econometric Modeler” on page 4-84
• “Assess Collinearity Among Multiple Series Using Econometric Modeler App” on page 4-94
• “Transform Time Series Using Econometric Modeler App” on page 4-97
• “Implement Box-Jenkins Model Selection and Estimation Using Econometric Modeler App” on page 4-112
• “Select ARCH Lags for GARCH Model Using Econometric Modeler App” on page 4-122
• “Estimate Multiplicative ARIMA Model Using Econometric Modeler App” on page 4-131
• “Perform ARIMA Model Residual Diagnostics Using Econometric Modeler App” on page 4-141
• “Specify t Innovation Distribution Using Econometric Modeler App” on page 4-150
• “Estimate Vector Autoregression Model Using Econometric Modeler” on page 4-155
• “Conduct Cointegration Test Using Econometric Modeler” on page 4-170
• “Estimate Vector Error-Correction Model Using Econometric Modeler” on page 4-180
• “Compare Predictive Performance After Creating Models Using Econometric Modeler” on page 4-193
• “Estimate ARIMAX Model Using Econometric Modeler App” on page 4-200
• “Estimate Regression Model with ARMA Errors Using Econometric Modeler App” on page 4-208
• “Compare Conditional Variance Model Fit Statistics Using Econometric Modeler App” on page 4-221
• “Perform GARCH Model Residual Diagnostics Using Econometric Modeler App” on page 4-230
• “Share Results of Econometric Modeler App Session” on page 4-237
Analyze Time Series Data Using Econometric Modeler
The Econometric Modeler app is an interactive tool for analyzing univariate or multivariate time series data. The app is well suited for visualizing and transforming data, performing statistical specification and model identification tests, fitting models to data, and iterating among these actions. When you are satisfied with a model, you can export it to the MATLAB Workspace to forecast future responses or for further analysis. You can also generate code or a report from a session.
Start Econometric Modeler by entering econometricModeler at the MATLAB command line, or by clicking Econometric Modeler under Computational Finance in the apps gallery (Apps tab on the MATLAB Toolstrip).
The following workflow describes how to find a model with the best in-sample fit to time series data using Econometric Modeler. The workflow is not a strict prescription; the steps you implement depend on your goals and the model type. You can skip steps and iterate over several steps as needed. The app is well suited to the Box-Jenkins approach to time series model building [1].
1. Prepare data for Econometric Modeler on page 4-3: Select the response variable to analyze and model, or select multiple response variables for a multivariate analysis. Optionally, select explanatory variables to include in the model.
Note You can import only one variable from the MATLAB Workspace into Econometric Modeler. Therefore, at the command line, you must synchronize and concatenate multiple series into one variable.
2. Import time series variables on page 4-4: Import data into Econometric Modeler from the MATLAB Workspace or a MAT-file. After importing data, you can adjust variable properties or the presence of variables.
3. Perform exploratory data analysis on page 4-6: View the series in various ways, stabilize the series by transforming them, and detect time series properties by performing statistical tests.
  • Visualize time series data on page 4-6: Supported plots include time series plots and correlograms (for example, the autocorrelation function (ACF)).
  • Perform specification and model identification hypothesis tests on page 4-9: Test series for stationarity, heteroscedasticity, autocorrelation, and collinearity or cointegration among multiple series. For ARIMA and GARCH models, this step can include determining the appropriate number of lags to include in the model. Supported tests include the augmented Dickey-Fuller test, Engle's ARCH test, the Ljung-Box Q-test, Belsley collinearity diagnostics, and the Johansen cointegration test.
  • Transform time series on page 4-14: Supported transformations include the log transformation and seasonal and nonseasonal differencing.
4. Fit candidate models to the data on page 4-15: Choose model parametric forms for univariate or multivariate response series based on the exploratory data analysis or dictated by economic theory. Then, estimate the model. Supported univariate models include seasonal and nonseasonal conditional mean (for example, ARIMA), conditional variance (for example, GARCH), and multiple linear regression models (optionally containing ARMA errors). Supported multivariate models include vector autoregression (VAR) and vector error-correction (VEC) models.
5. Conduct goodness-of-fit checks on page 4-30: Ensure that the model adequately describes the data by performing residual diagnostics.
  • Visualize the residuals to check whether they are centered on zero, normally distributed, homoscedastic, and serially uncorrelated. Supported plots include quantile-quantile and ACF plots.
  • Test the residuals for homoscedasticity and autocorrelation. Supported tests include the Ljung-Box Q-test and Engle's ARCH test on the squared residuals.
6. Find the model with the best in-sample fit on page 4-36: Estimate multiple models within the same family, and then choose the model that yields the minimal fit statistic, for example, Akaike information criterion (AIC).
7. Export session results on page 4-38: After you find a model or models that perform adequately, summarize the results of the session. The method you choose depends on your goals. Supported methods include:
  • Export variables on page 4-38: Econometric Modeler exports selected variables to the MATLAB Workspace. If a session in the app does not complete your analysis goal, such as forecasting responses, then you can export variables (including estimated models) for further analysis at the command line.
  • Generate a function on page 4-39: Econometric Modeler generates a MATLAB plain text or live function that returns a selected model given the imported data. This method helps you understand the command-line functions that the app uses to create predictive models. You can modify the generated function to accomplish your analysis goals.
  • Generate a report on page 4-40: Econometric Modeler produces a document, such as a PDF, describing your activities on selected variables or models. This method provides a clear and convenient summary of your analysis when you complete your goal in the app.
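As a rough command-line counterpart to steps 4 through 6 of the workflow, the following sketch fits several candidate AR models and compares them by AIC. The simulated series, the candidate orders, and the parameter count numParam are illustrative assumptions; this is not the code that the app generates.

```matlab
% Sketch: fit candidate AR(p) models and choose the one with minimal AIC.
% The data are simulated purely for illustration.
rng(1)                                          % reproducible simulation
TrueMdl = arima('Constant',0.5,'AR',{0.6,-0.2},'Variance',1);
y = simulate(TrueMdl,300);                      % 300 simulated observations

maxP = 4;                                       % largest candidate order (assumption)
aic = zeros(maxP,1);
for p = 1:maxP
    Mdl = arima(p,0,0);                         % AR(p) with unknown coefficients
    [~,~,logL] = estimate(Mdl,y,'Display','off');
    numParam = p + 2;                           % p AR terms + constant + variance
    aic(p) = aicbic(logL,numParam);             % Akaike information criterion
end
[~,bestP] = min(aic);                           % order with the smallest AIC
fprintf('Order with minimal AIC: %d\n',bestP)
```

Comparing models within one family on the same data keeps the AIC values comparable, which matches the advice in step 6.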
Prepare Data for Econometric Modeler App
You can import only one variable from the MATLAB Workspace into Econometric Modeler. Therefore, before importing data, concatenate the response series and any predictor series into one variable. Econometric Modeler supports these variable data types.
• MATLAB timetable: Variables must be double-precision numeric vectors. A best practice is to import your data in a timetable because Econometric Modeler:
  • Names variables by using the names stored in the VariableNames field of the Properties property.
  • Uses the time variable values as tick labels for any axis that represents time. Otherwise, tick labels representing time are indices.
  • Enables you to overlay recession bands on time series plots (see recessionplot).
  For more details on timetables, see “Create Timetables”.
• MATLAB table: Variables must be double-precision numeric vectors. Variable names are the names in the VariableNames field of the Properties property.
• Numeric vector or matrix: For a matrix, each column is a separate variable named variableNamej, where j is the corresponding column index.
Regardless of variable type, Econometric Modeler assumes that rows correspond to time points (observations).
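The following minimal sketch shows one way to synchronize and concatenate separately stored series into a single timetable before importing. The dates, values, and the variable names Response and Predictor are made up for illustration.

```matlab
% Sketch: combine two separately stored series into one timetable.
% All dates, values, and names below are hypothetical.
t1 = datetime(2020,1,1) + calmonths(0:11)';     % Jan 2020 through Dec 2020
t2 = datetime(2020,3,1) + calmonths(0:11)';     % Mar 2020 through Feb 2021
TT1 = timetable((1:12)','RowTimes',t1,'VariableNames',{'Response'});
TT2 = timetable((101:112)','RowTimes',t2,'VariableNames',{'Predictor'});

% Align both series on the union of their time stamps. Times present in
% only one series are filled with NaN in the other.
TT = synchronize(TT1,TT2);

% If you prefer a numeric matrix, concatenate the aligned columns instead.
X = [TT.Response TT.Predictor];
```

TT is then the single variable to select in the Import Data dialog box. Importing the matrix X instead would lose the time stamps and the variable names.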
Import Time Series Variables
The data set can exist in the MATLAB Workspace or in a MAT-file that you can access from your machine.
• To import a data set from the Workspace, on the Econometric Modeler tab, in the Import section, click the Import button. In the Import Data dialog box, select the check box in the Import? column for the variable containing the data, and then click Import. All Workspace variables of a supported data type appear in the dialog box, but you can choose only one.
• To import data from a MAT-file, in the Import section, click Import, then select Import From MAT-file. In the Select a MAT-file dialog box, browse to the folder containing the data set, then double-click the MAT-file.
After you import data, Econometric Modeler performs all the following actions.
• The name of each variable (column) in the data set appears in the Time Series pane.
• The value of the variable selected in the Time Series pane appears in the Preview pane.
• A time series plot including all variables appears in the Time Series Plot(VariableName) figure window, where VariableName is the name of one of the variables in the Time Series pane.
You can interact with the variables in the Time Series pane in several ways.
• To select a variable, for example to perform a statistical test or create a plot, click the variable in the Time Series pane. If you double-click the variable instead, then the app also plots it in a separate time series plot.
• To open, delete, or export a variable, right-click it in the Time Series pane. Then, from the context menu, choose the desired action.
• To select or operate on multiple time series simultaneously, press Ctrl and click each variable you want to use.
Consider importing the data in the Data_USEconModel MAT-file.
1. At the command line, load the data into the Workspace.
load Data_USEconModel
2. In Econometric Modeler, in the Import section of the Econometric Modeler tab, click the Import button. The Import Data dialog box appears.
3. Data_USEconModel stores several variables. Data, DataTable, and DataTimeTable contain the same data, but DataTable is a table, and DataTimeTable is a timetable. Import DataTimeTable by selecting the corresponding Import? check box, then clicking Import.
All variables in DataTimeTable appear in the Time Series pane. Suppose that you want to retain COE, FEDFUNDS, and GDP only. Select all other variables, right-click one of them, and select Delete.
After working in the app, you can import another data set. After you click Import, Econometric Modeler displays the following dialog box.
If you click OK, then Econometric Modeler deletes all variables from the Time Series and Models panes, and closes all documents in the right pane.
Perform Exploratory Data Analysis
An exploratory data analysis includes determining characteristics of your variables and relationships between them, with the formation of a predictive model in mind. For time series data, identify series that grow exponentially, contain trends, or are nonstationary, and then transform them appropriately. For ARIMA models, to identify the model form and significant lags in the serial correlation structure of the response series, use the Box-Jenkins methodology [1]. If you plan to create GARCH models, then assess whether the series contain volatility clustering and significant lags. For multiple regression models, identify collinear predictors and those predictors that are linearly related to the response. For multivariate models, in addition to univariate analyses, you can test whether series are cointegrated.
For time series data analysis, an exploratory analysis usually includes iterating among visualizing the data, performing statistical specification and model identification tests, and transforming the data.
Visualizing Time Series Data
After you import a data set, Econometric Modeler selects all variables in the imported data and displays a time series plot of them in the right pane by default. For example, after you import DataTimeTable in the Data_USEconModel data set, the app displays this time series plot.
To create your own time series plot:
1. In the Time Series pane, select the appropriate number of series for the plot.
2. Click the Plots tab in the toolstrip.
3. Click the button for the type of plot you want.
Econometric Modeler supports the following time series plots.
Plot and Goals
• Time series
  • Identify missing values or outliers.
  • Identify series that grow exponentially or that contain a trend.
  • Identify nonstationary series.
  • Identify series that contain volatility clustering.
  • Compare two series with different scales in the same plot (Y-Y Axis).
  • Compare multiple series with similar scales in the same plot.
• Autocorrelation function (ACF)
  • Identify series with serial correlation.
  • Determine whether an AR model is appropriate.
  • Identify significant MA lags for model identification.
• Partial ACF (PACF)
  • Identify series with serial correlation.
  • Determine whether an MA model is appropriate.
  • Identify significant AR lags for model identification.
• Correlations
  • Inspect variable distributions.
  • Identify variables with linear relationships pairwise.
You can interact with an existing plot by:
• Right-clicking it
• Using the plot buttons that appear when you pause on the plot
• Using the options on the figure window
Supported interactions vary by plot type.
• Save a figure: Right-click the figure, then select Export. Save the figure that appears.
• Add or remove time series in a plot: Right-click the figure, point to the Show Time Series menu, then select the time series to add or remove.
• Plot recession bands: Right-click a time series plot, then select Show Recessions.
• Show grid lines: Pause on the figure, then click the grid lines button.
• Toggle legend: Pause on the figure, then click the legend button.
• Pan: Pause on the figure, then click the pan button. For more details on panning, see “Zoom, Pan, and Rotate Data”.
• Zoom: Pause on the figure. To zoom in, click the zoom-in button. To zoom out, click the zoom-out button. For more details, see “Zoom, Pan, and Rotate Data”.
• Restore view: To return the plot to its original view after you pan or zoom, pause on the figure, then click the restore view button.
For serial correlation function plots, additional options exist in the ACF or PACF tabs. You can specify the:
• Number of lags to display
• Number of standard deviations for the confidence bands
• MA or AR order beyond which the theoretical ACF or PACF, respectively, is effectively zero
Econometric Modeler updates the plot in real time as you adjust the parameters.
As you explore your data, plots and computation results accumulate in the right pane under tabs. You can customize the display of the documents in the right pane to, for example, view multiple plots simultaneously, by performing any of the following actions:
• Orient the plot tabs by dragging them into different sections of the right pane. As you drag a plot, the app highlights possible sections in which to place it. To undo the last document or figure window positioning, pause on the dot located in the middle of the partition, then click the button that appears.
• Click the Document Actions button on the upper-right corner of the document. Options include:
  • Tile All: Choose a layout for multiple plots.
  • Tab Position: Select where to display the figure tab.
Consider an ARIMA model for the effective federal funds rate (FEDFUNDS). To identify the model characteristics (for example, the number of AR or MA lags), plot the time series, ACF, and PACF side by side.
1. In the Time Series pane, double-click FEDFUNDS.
2. Add recession bands to the plot by right-clicking the plot in the Time Series Plot(FEDFUNDS) figure window, then selecting Show Recessions.
3. On the Plots tab, click ACF.
4. Click PACF.
5. Click the Time Series Plot(FEDFUNDS) figure window and drag it to the left side of the right pane. Click the PACF(FEDFUNDS) figure window and drag it to the bottom right of the pane.
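The plotting steps above have an approximate command-line counterpart, sketched below with autocorr and parcorr. The choices of 20 lags and 2 standard deviations are illustrative assumptions, and dropping missing observations assumes that the missing values of FEDFUNDS occur only at the ends of the series.

```matlab
% Sketch: plot FEDFUNDS with its sample ACF and PACF in one figure.
load Data_USEconModel                     % data set used in this example
t = DataTimeTable.Properties.RowTimes;    % time stamps of the timetable
y = DataTimeTable.FEDFUNDS;
keep = ~isnan(y);                         % drop missing observations (assumed
t = t(keep);                              % to lie only at the series ends)
y = y(keep);

figure
subplot(3,1,1)
plot(t,y)
title('FEDFUNDS')
subplot(3,1,2)
autocorr(y,'NumLags',20,'NumSTD',2)       % lags shown and confidence band width
subplot(3,1,3)
parcorr(y,'NumLags',20,'NumSTD',2)
```

The NumLags and NumSTD arguments play the same role as the lag and confidence band settings on the ACF and PACF tabs.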
The ACF dies out slowly and the PACF cuts off after the first lag. The behavior of the ACF suggests that the time series must be transformed before you choose the form of the ARIMA model.
In the right pane, observe the dot in the middle of the horizontal partition between the correlograms (below the Lag x axis label of the ACF). To undo this correlogram positioning, that is, separate the correlograms by tabs, pause on the dot and click the button that appears.
Performing Specification and Model Identification Hypothesis Tests
You can perform hypothesis tests to confirm time series properties that you observe visually or to test for properties that are difficult to see. Econometric Modeler enables you to run tests multiple times with different parameter settings. Econometric Modeler supports these tests for univariate series.
Test and Hypotheses
• Augmented Dickey-Fuller: H0: Series has a unit root. H1: Series is stationary. For details on the supported parameters, see adftest.
• Kwiatkowski, Phillips, Schmidt, Shin (KPSS): H0: Series is trend stationary. H1: Series has a unit root. For details on the supported parameters, see kpsstest.
• Leybourne-McCabe: H0: Series is a trend stationary AR(p) process. H1: Series is an ARIMA(p,1,1) process. To specify p, adjust the Number of Lags parameter. For details on the supported parameters, see lmctest.
• Phillips-Perron: H0: Series has a unit root. H1: Series is stationary. For details on the supported parameters, see pptest.
• Variance ratio: H0: Series is a random walk. H1: Series is not a random walk. For details on the supported parameters, see vratiotest.
• Engle's ARCH: H0: Series exhibits no conditional heteroscedasticity (ARCH effects). H1: Series is an ARCH(p) model, with p > 0. To specify p, adjust the Number of Lags parameter. For details on the supported parameters, see archtest.
• Ljung-Box Q-test: H0: Series exhibits no autocorrelation in the first m lags, that is, corresponding coefficients are jointly zero. H1: Series has at least one nonzero autocorrelation coefficient ρj, j ∈ {1, …, m}. To specify m, adjust the Number of Lags parameter. For details on the supported parameters, see lbqtest.
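Several of the tests listed above can also be run directly at the command line with the functions named in the list. The following is a minimal sketch on a simulated random walk; the series, the lag choices, and the use of the demeaned first difference as the residual series are assumptions made for illustration.

```matlab
% Sketch: run several univariate tests on simulated data.
rng(0)                                    % reproducible simulation
y   = cumsum(randn(250,1));               % random walk, so it has a unit root
dy  = diff(y);                            % first difference
res = dy - mean(dy);                      % demeaned series for residual tests

[hADF,pADF]   = adftest(y);               % H0: series has a unit root
[hKPSS,pKPSS] = kpsstest(y);              % H0: series is trend stationary
[hPP,pPP]     = pptest(y);                % H0: series has a unit root
[hARCH,pARCH] = archtest(res,'Lags',2);   % H0: no ARCH effects
[hLBQ,pLBQ]   = lbqtest(res,'Lags',5);    % H0: no autocorrelation in 5 lags

% Each h is true when the test rejects its null hypothesis at the
% default 5% significance level.
disp([hADF hKPSS hPP hARCH hLBQ])
```

Note that the unit root tests and the KPSS test have opposite null hypotheses, so agreement between them means that one rejects while the other does not.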
Note Before conducting tests, Econometric Modeler removes leading and trailing missing values (NaN values) in the series. Engle's ARCH test does not support missing values within the series, that is, NaN values preceded and succeeded by observations.
The stationarity test results suggest whether you should transform a series to stabilize it, and which transformation is appropriate. For ARIMA models, stationarity test results suggest whether to include degrees of integration. Engle's ARCH test results indicate whether the series exhibits volatility clustering and suggest lags to include in a GARCH model. Ljung-Box Q-test results suggest how many AR lags are required in an ARIMA model.
To perform a univariate test in Econometric Modeler:
1. Select a variable in the Time Series pane.
2. On the Econometric Modeler tab, in the Tests section, click New Test.
3. In the tests gallery, click the test you want to conduct. A new tab for the test type appears in the toolstrip, and a new document for the test results appears in the right pane.
4. On the test-type tab, in the Parameters section, adjust parameters for the test. For example, consider performing an Engle's ARCH test. On the ARCH tab, in the Parameters section, select the number of lags in the test statistic using the Number of Lags spin box, or the significance level (that is, the value of α) using the Significance Level spin box.
5. On the test-type tab, in the Tests section, click Run Test. The test results, including whether to reject the null hypothesis, the p-value, and the parameter settings, appear in a new row in the Results table of the test results document. If the null hypothesis has been rejected, then the app highlights the row in yellow.
If you run multiple tests on a particular series, the results of each test appear as a new row in the Results table. To remove a row from the Results table, select the corresponding check box in the Select column, then click Clear Tests in the test-type tab.
Note Multiple testing inflates the false discovery rate. One conservative way to maintain an overall false discovery rate of α is to apply the Bonferroni correction to the significance level of each test. That is, for a total of t tests, set the Significance Level value to α/t.
Econometric Modeler supports these tests and diagnostics for multiple series.
Tests and Diagnostics, and Description
• Belsley Collinearity Diagnostics: For details on the supported parameters and results, see collintest.
• Engle-Granger: H0: The series do not exhibit cointegration. H1: The series exhibit cointegration. For details on the supported parameters, see egcitest.
• Johansen: For a specified cointegration rank r, H0: The series exhibit at most rank r cointegration. H1: The series exhibit cointegration with rank greater than r. For details on the supported parameters, see jcitest.
To diagnose multiple series in Econometric Modeler:
1. Select at least two variables in the Time Series pane.
2. On the Econometric Modeler tab, in the Tests section, click New Test.
3. In the tests gallery, select the diagnostics you want to run. A new tab for the diagnostic appears in the toolstrip, and a new document for the results appears in the right pane.
4. On the tab for the diagnostic, you can adjust parameters for the diagnostic in the appropriate section. For example, consider conducting an Engle-Granger cointegration test on the selected series. In the tests gallery, select Engle-Granger Test. On the EGCI tab, in the Parameters section, select the cointegration regression form by using the Cointegration Regression Form list and select the test to conduct on the regression residuals by using the Residual Regression Form list.
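The Engle-Granger settings in the last step correspond roughly to name-value arguments of egcitest. The sketch below is a hedged illustration: it assumes that columns 3 through 5 of the Data matrix in the Data_Canada data set hold the three interest rate series, and the choices 'c' (constant in the cointegrating regression) and 'ADF' (augmented Dickey-Fuller test on the residuals) are examples, not recommendations.

```matlab
% Sketch: Engle-Granger cointegration test at the command line.
load Data_Canada                 % example data set shipped with the toolbox
Y = Data(:,3:end);               % assumed: columns 3 to 5 are interest rates

% 'creg' sets the cointegrating regression form and 'rreg' sets the
% residual regression (test) form.
[h,pValue] = egcitest(Y,'creg','c','rreg','ADF');

% h is true when the test rejects the null of no cointegration.
fprintf('Reject H0 (no cointegration): %d, p-value: %.4f\n',h,pValue)
```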
For example, consider a predictive model containing Canadian inflation and interest rates as predictor variables. Determine whether the variables are collinear. The Data_Canada data set contains the time series.
1. Import the DataTimeTable variable in the Data_Canada data set into Econometric Modeler (see “Import Time Series Variables” on page 4-4). The time series plot appears in the right pane.
All series appear to contain autocorrelation. Although you should remove autocorrelation from predictor variables before creating a predictive model, this example proceeds without removing autocorrelation.
2. In the Tests section, click New Test. In the Collinearity section, click Belsley Collinearity Diagnostics. Econometric Modeler creates a new tab for the Belsley collinearity diagnostics in the toolstrip, and it creates a new document for the results in the right pane.
The results contain a table of the singular values, condition indices, and the variance-decomposition proportions for each series. Rows that Econometric Modeler highlights in yellow have a condition index greater than the tolerance specified by the Condition Index parameter value (default is 30) in the Tolerances section of the Collinearity tab. The columns of the table labeled with series names form the matrix of variance-decomposition proportions. Those series with a variance-decomposition proportion greater than the tolerance specified by the Variance-Decomposition Proportion parameter value (default is 0.5) in the highlighted row exhibit multicollinearity.
Because their variance-decomposition proportions in the highlighted row exceed the tolerance (default is 0.5), the collinear predictors are INT_L, INT_M, and INT_S.
3. You can add or remove time series from the diagnostics. For example, remove the inflation rates from the diagnostics by performing the following procedure.
  a. In the test-results document, right-click the Results table or plot.
  b. Point to Show Time Series. A list of all variables appears.
  c. Remove the inflation rate series INF_C and INF_G from the diagnostics by deselecting the corresponding check boxes. As you deselect series, Econometric Modeler recomputes the results based on the selected series.
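The diagnostics in this example can be approximated at the command line with collintest. The sketch below is an illustration under assumptions: it uses the numeric matrix Data in Data_Canada, assumes that its five columns are ordered INF_C, INF_G, INT_S, INT_M, INT_L, and applies the default tolerances of 30 and 0.5 by hand.

```matlab
% Sketch: Belsley collinearity diagnostics at the command line.
load Data_Canada
[sValue,condIdx,VarDecomp] = collintest(Data);   % also displays the results

tolIdx  = 30;     % condition index tolerance (default in the app)
tolProp = 0.5;    % variance-decomposition proportion tolerance (default)

% Rows whose condition index exceeds the tolerance, and the columns
% (series) whose proportions in those rows exceed the proportion tolerance.
flaggedRows = condIdx > tolIdx;
flaggedVars = any(VarDecomp(flaggedRows,:) > tolProp,1);
disp(find(flaggedVars))                          % column indices of flagged series
```

Recomputing the diagnostics on a subset of series, as in step 3, amounts to passing only the corresponding columns, for example collintest(Data(:,3:end)).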
For more details on the Belsley collinearity diagnostics results and multicollinearity, see collintest and “Time Series Regression II: Collinearity and Estimator Variance” on page 5-180.
Transforming Time Series
The Box-Jenkins methodology [1] for ARIMA model selection assumes that the response series is stationary, and spurious regression models can result from a model containing nonstationary predictors and response variables (for more details, see “Time Series Regression IV: Spurious Regression” on page 5-197). To stabilize your series, Econometric Modeler supports these transformations in the Transforms section of the Econometric Modeler tab.
Transformation, Use When Series ..., and Notes
• Log
  Use when series: Has an exponential trend or variance that grows with its levels
  Notes: All values in the series must be positive.
• Linear detrend
  Use when series: Has a linear deterministic trend that can be identified using least squares
  Notes: When Econometric Modeler detrends the series, it ignores leading or trailing missing (NaN) values. If any missing values occur between observed values, then the app returns a vector of NaN values with length equal to the series.
• First-order difference
  Use when series: Has a stochastic trend
  Notes: Econometric Modeler prepends the differenced series with a NaN value. This action ensures that the differenced series has the same length and time base as the original series.
• Seasonal difference
  Use when series: Has a seasonal, stochastic trend
  Notes: You can specify the period in a season using the spin box. For example, 12 indicates a seasonal difference for monthly data. Econometric Modeler prepends the differenced series with nan(period,1), where period is the specified period in a season. This action ensures that the differenced series has the same length and time base as the original series.
For more details, see “Data Transformations” on page 2-2.
To transform a variable, select the variable in the Time Series pane, then click a transformation. After you transform a series, a new variable representing the transformed series appears in the Time Series pane. Also, Econometric Modeler plots and selects the new variable. To create the variable name, the app appends the transformation name to the end of the variable name. You can rename the transformed variable by clicking it twice in the Time Series pane to select the text of the variable name, and then entering the new name.
You can select multiple series by pressing Ctrl and clicking each series, and then apply the same transformation to the selected series simultaneously. The app creates new variables for each series, appends the transformation name to the end of each transformed variable name, and plots the transformed variables in the same figure.
For example, suppose that the GDP series in Data_USEconModel has an exponential trend and a stochastic trend. Stabilize the GDP by applying the log transformation and then applying the second difference.
1. Import the DataTimeTable variable in the Data_USEconModel data set into Econometric Modeler (see “Import Time Series Variables” on page 4-4).
2. In the Time Series pane, select GDP.
3. On the Econometric Modeler tab, in the Transforms section, click Log. The app creates a variable named GDPLog, which appears in the Time Series pane, and displays a plot for the time series.
4. In the Transforms section, click Difference. The app creates a variable named GDPLogDiff and displays a plot for the time series.
5. In the Transforms section, click Difference. The app creates a variable named GDPLogDiffDiff and displays a plot for the time series.
GDPLogDiffDiff is the stabilized GDP.
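The transformations described in this section can be mimicked at the command line with basic MATLAB operations. The sketch below uses a simulated stand-in for GDP so that it runs without the data set; the series, the period of 12, and the least-squares detrending code are assumptions for illustration and are not taken from the app's implementation.

```matlab
% Sketch: log, difference, seasonal difference, and linear detrend.
rng(2)                                           % reproducible simulation
n = 120;
gdp = exp(0.01*(1:n)' + cumsum(0.02*randn(n,1)));   % positive, trending series

gdpLog         = log(gdp);                       % requires positive values
gdpLogDiff     = [NaN; diff(gdpLog)];            % first difference, NaN-padded
gdpLogDiffDiff = [NaN; diff(gdpLogDiff)];        % second difference

% Seasonal difference with the same NaN padding convention.
period = 12;
gdpLogSDiff = [nan(period,1); gdpLog(period+1:end) - gdpLog(1:end-period)];

% Linear detrend: remove a least-squares line in time.
X = [ones(n,1) (1:n)'];
b = X \ gdpLog;                                  % intercept and slope
gdpLogDetrend = gdpLog - X*b;
```

Padding with NaN values keeps every transformed series the same length as the original, which mirrors the behavior described in the Notes entries above.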
Fitting Models to Data
The results of an exploratory data analysis can suggest several candidate models. To choose a model, in the Time Series pane, select the time series for the response. On the Econometric Modeler tab, in the Models section, click a model or click one in the models gallery. Econometric Modeler allows you to select only those models that are appropriate for the number of selected response series. After you select a model, you configure it for estimation. Econometric Modeler supports the following models.
Model, Response, and Type
• Conditional mean: ARMA/ARIMA Models section (univariate response)
  • Stationary autoregressive (AR). For details, see “Autoregressive Model” on page 7-17, arima, and estimate.
  • Stationary moving average (MA). For details, see “Moving Average Model” on page 7-26, arima, and estimate.
  • Stationary ARMA. For details, see “Autoregressive Moving Average Model” on page 7-35, arima, and estimate.
  • Nonstationary, integrated ARMA (ARIMA). For details, see “ARIMA Model” on page 7-42, arima, and estimate.
  • Seasonal (Multiplicative) ARIMA (SARIMA). For details, see “Multiplicative ARIMA Model” on page 7-49, arima, and estimate.
  • ARIMA including exogenous predictors (ARIMAX). For details, see “ARIMA Model Including Exogenous Covariates” on page 7-62, arima, and estimate.
  • Seasonal ARIMAX. For details, see arima and estimate.
• Conditional variance: GARCH Models section (univariate response)
  • Generalized autoregressive conditional heteroscedastic (GARCH). For details, see “GARCH Model” on page 8-3, garch, and estimate.
  • Exponential GARCH (EGARCH). For details, see “EGARCH Model” on page 8-3, egarch, and estimate.
  • Glosten, Jagannathan, and Runkle (GJR). For details, see “GJR Model” on page 8-4, gjr, and estimate.
• Multiple linear regression: Regression Models section (univariate response)
  • Multiple linear regression. For details, see “Time Series Regression I: Linear Models” on page 5-173, LinearModel, and fitlm.
  • Regression model with ARMA errors. For details, see “Regression Models with Time Series Errors” on page 5-5, regARIMA, and estimate.
• Vector Autoregression Models: Multivariate Models section (multivariate response)
  • Stationary vector autoregression (VAR). For details, see “Vector Autoregression (VAR) Models” on page 9-3 and varm.
  • VAR including exogenous variable (VARX). For details, see “Vector Autoregression (VAR) Models” on page 9-3 and varm.
  • Vector error-correction (VEC) or cointegrated VAR. For details, see vecm.
For univariate conditional mean model estimation, SARIMA and SARIMAX are the most flexible models. You can create any conditional mean model that excludes exogenous predictors by clicking SARIMA, or you can create any conditional mean model that includes at least one exogenous predictor by clicking SARIMAX.
For multivariate model estimation, the model you choose depends on whether the selected time series are stationary or cointegrated. For stationary series, create a VAR model by clicking VAR. To include exogenous predictors, click VARX instead. For cointegrated series, click VEC.
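The model choices described above map onto the model objects named in the list, such as arima and varm. The following sketch creates and estimates one univariate and one multivariate model on simulated data; the data and the model orders are assumptions chosen only to show the general pattern.

```matlab
% Sketch: create and estimate an ARIMA model and a VAR model.
rng(3)                                   % reproducible simulation
y = cumsum(randn(300,1));                % simulated integrated series
Y = filter(1,[1 -0.5],randn(300,2));     % two simulated stationary series

% Univariate conditional mean model: ARIMA(1,1,1).
MdlARIMA = arima(1,1,1);
EstARIMA = estimate(MdlARIMA,y,'Display','off');

% Multivariate model: VAR with 2 response series and 1 lag.
MdlVAR = varm(2,1);
EstVAR = estimate(MdlVAR,Y);

summarize(EstVAR)                        % display the estimation results
```

In both cases, coefficients left unspecified when the model is created are estimated from the data.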
After you select a model, the app displays the Type Model Parameters dialog box, where Type is the model type. For example, this figure shows the SARIMAX Model Parameters dialog box.
Adjustable parameters in the Type Model Parameters window depend on Type. In general, adjustable parameters include: • Deterministic terms and linear regression coefficients corresponding to predictor variables (see “Adjusting Deterministic Terms and Regression Component Parameters” on page 4-19) • Time series component parameters for univariate models, which include seasonal and nonseasonal lags and degrees of integration (see “Adjusting Time Series Component Parameters for Univariate Models” on page 4-20) • Time series component parameters for multivariate models, which include AR lags and AR coefficient elements to specify equality constraints during estimation (see “Adjusting Time Series Component Parameters for Multivariate Models” on page 4-21) • For univariate models, the innovation distribution (see “Adjusting Innovation Distribution Parameters for Univariate Models” on page 4-21) As you adjust parameter values, the equation in the Model Equation section changes to match your specifications. Adjustable parameters correspond to input and name-value pair arguments described in corresponding model creation reference pages. For details, see the function reference page for a specific model. Regardless of the model you choose, all unspecified coefficients in the model are unknown and estimable, including the t-distribution degrees of freedom parameter (when you specify a t innovation distribution). Note Econometric Modeler does not support: • Optimization option adjustments for estimation. 4-18
Analyze Time Series Data Using Econometric Modeler
• Composite conditional mean and variance models. For details, see “Specify Conditional Mean and Variance Models” on page 7-81.
• Equality constraints on univariate model parameters during estimation (except for holding some parameters fixed at zero during estimation).

To adjust optimization options, estimate composite conditional mean and variance models, or apply equality constraints, use the MATLAB command line.

Adjusting Deterministic Terms and Regression Component Parameters

Supported deterministic terms depend on the selected model and include a model constant (offset or intercept) and linear time trend. To include a model constant (offset or intercept) term, select the Include Constant Term or Include Offset Term check box. Similarly, to include a linear time trend, select the Include Trend check box. To remove a deterministic term (that is, constrain it to zero during estimation), clear the check box. The location and type of the check box in the Type Model Parameters dialog box depend on the model type. By default, Econometric Modeler includes a model constant in all model types except conditional variance models.

To select predictors for the regression component, in the Predictors list, select the check box in the Include? column corresponding to the predictors you want to include in the model. By default, the app does not include a regression component in any model type.
• If you select ARIMAX, SARIMAX, or RegARMA, then you must choose at least one predictor.
• If you select MLR, then you can specify one of the following:
  • An MLR model when you choose at least one predictor
  • A constant mean model (intercept-only model) when you clear all check boxes in the Include? column and select the Include Intercept check box
  • An error-only model when you clear all check boxes in the Include? column and clear the Include Intercept check box

Consider a linear regression model of GDP onto CPI and the unemployment rate. To specify the regression:
1. Import the DataTimeTable variable in the Data_USEconModel data set into Econometric Modeler (see “Import Time Series Variables” on page 4-4).
2. In the Time Series pane, select the response variable GDP.
3. On the Econometric Modeler tab, in the Models section, click the arrow to display the models gallery.
4. In the models gallery, in the Regression Models section, click MLR.
5. In the MLR Model Parameters dialog box, in the Include? column, select the CPIAUCSL and UNRATE check boxes.
6. Click the Estimate button.
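For comparison, the same regression can be sketched at the command line. This is a minimal sketch, not the code the app runs: it assumes that Statistics and Machine Learning Toolbox is available for fitlm, and the variable names Tbl and Mdl are chosen only for illustration.

```matlab
% Load the data set that ships with Econometrics Toolbox.
load Data_USEconModel

% Convert the timetable to a plain table; the row times are not needed here.
Tbl = timetable2table(DataTimeTable,'ConvertRowTimes',false);

% Regress GDP onto CPI and the unemployment rate.
% fitlm includes an intercept by default and omits rows that contain NaNs.
Mdl = fitlm(Tbl,'GDP ~ CPIAUCSL + UNRATE');

% Display coefficient estimates, standard errors, t statistics, and p-values.
disp(Mdl.Coefficients)
```

The formula string plays the role of the Include? check boxes: only the listed predictors enter the model.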
Adjusting Time Series Component Parameters for Univariate Models

In general for univariate models, time series component parameters contain lags to include in the seasonal and nonseasonal lag operator polynomials, and seasonal and nonseasonal degrees of integration.
• For conditional mean models, you can specify seasonal and nonseasonal autoregressive lags, and seasonal and nonseasonal moving average lags. You can also adjust seasonal and nonseasonal degrees of integration.
• For conditional variance models, you can specify ARCH and GARCH lags. EGARCH and GJR models also support leverage lags.
• For regression models with ARMA errors, you can specify nonseasonal autoregressive and moving average lags. For models containing seasonal lags or degrees of seasonal or nonseasonal integration, use the command line instead.

Econometric Modeler supports two options to adjust the parameters. The adjustment options are on separate tabs of the Type Model Parameters dialog box: the Lag Order and Lag Vector tabs. On the Lag Order tab, you can specify the orders of lag operator polynomials. This feature enables you to include all lags efficiently, from 1 through the specified order, in a lag operator polynomial. On the Lag Vector tab, you can specify the individual lags that comprise a lag operator polynomial. This feature is well suited for creating flexible models. For more details, see “Specifying Univariate Lag Operator Polynomials Interactively” on page 4-44.

Adjusting Innovation Distribution Parameters for Univariate Models

For univariate models, you can specify that the distribution of the innovations is Gaussian. For all models, except multiple linear regression models, you can specify the Student's t distribution instead to address leptokurtic innovation distributions (for more details, see “Maximum Likelihood Estimation for Conditional Mean Models” on page 7-112, “Maximum Likelihood Estimation for Conditional Variance Models” on page 8-52, or “Maximum Likelihood Estimation of regARIMA Models” on page 5-72). If you specify the t distribution, then Econometric Modeler estimates its degrees of freedom parameter using maximum likelihood. By default, Econometric Modeler uses the Gaussian distribution for the innovations.

To change the innovation distribution, in the Type Model Parameters dialog box, click the Innovation Distribution button and select a distribution from the list.
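The two tab styles and the distribution setting have rough command-line analogues. The following sketch only creates model templates and estimates nothing; the variable names are illustrative, and the mapping to the app's tabs is an interpretation, not a statement about how the app is implemented.

```matlab
% Lag Order style: all consecutive AR lags 1 through 3 and one degree of
% nonseasonal integration, that is, an ARIMA(3,1,0) template.
MdlOrder = arima(3,1,0);

% Lag Vector style: only AR lags 1 and 4, with Student's t innovations.
% The degrees of freedom are left as NaN, so estimate would fit them.
MdlVector = arima('ARLags',[1 4],'D',1,'Distribution','t');

% Conditional variance template with one GARCH, ARCH, and leverage lag.
MdlGJR = gjr('GARCHLags',1,'ARCHLags',1,'LeverageLags',1);

disp(MdlVector)
```

In MdlVector, the coefficients at the skipped lags 2 and 3 are held at zero, which is what makes the lag vector form more flexible than the lag order form.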
Adjusting Time Series Component Parameters for Multivariate Models

Supported time series component parameters for multivariate models depend on the model type. All types enable you to include nonseasonal AR lag coefficients. However, VEC models additionally support the following parameter specifications:
• The Johansen model form, which specifies which deterministic terms (overall or within the cointegrating relation) to include in the model. For details, see “Johansen Form” on page 12-2415.
• Cointegration rank r
• Adjustment speed matrix A
• Cointegration matrix B

This figure is an example of the Type Model Parameters dialog box.
As with univariate models, Econometric Modeler supports adjusting the AR or short-run lag operator polynomial efficiently by specifying the lag order (Lag Order tab) or, for flexibility, by specifying individual lags (Lag Vector tab). Unlike univariate models, Econometric Modeler supports estimation equality constraints on individual entries of the AR or short-run lag coefficient matrix, which correspond to self- or cross-variable lag coefficients in the model. Coefficient constraints enable you to test economic scenarios. Econometric Modeler holds the specified values fixed during estimation. For VEC models, you can specify equality constraints on the entire matrix A or B, except for Johansen forms H* and H1*, which support equality constraints only for A.

To specify such equality constraints:
1. In the Type Model Parameters dialog box, select the lag to constrain by using the AR Coefficients (ϕ) (for VAR or VARX) or Short-Run Coefficients (Φ) (for VEC) list.
2. Perform one of the following alternatives:
   • Click the elements of the matrices to constrain, and then enter the value. Econometric Modeler estimates all NaN entries.
   • Import a matrix.
     a. At the command line, create an appropriately sized matrix of constraints or mix of constraints and NaN values for AR or short-run lags.
     b. In Econometric Modeler, in the Type Model Parameters dialog box, click Import.
     c. In the dialog box, select the variable to import for the coefficient matrix.
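A constraint matrix for import might be prepared as in the following sketch. The three-series dimension and the choice of which entry to fix are assumptions made only for illustration; the varm call shows one way the same constraint could be expressed in a command-line model template.

```matlab
numSeries = 3;            % assumed number of response series

% Start with all entries estimable (NaN), then fix one entry.
AR1 = nan(numSeries);
AR1(2,3) = 0;             % hold the lag 1 effect of series 3 on series 2 at zero

% Equivalent command-line template: a VAR(1) with the same constraint.
Mdl = varm('AR',{AR1});
disp(Mdl.AR{1})
```

After the matrix AR1 exists in the workspace, it can be selected in the Import dialog box for the corresponding lag.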
For details, see “Specifying Multivariate Lag Operator Polynomials and Coefficient Constraints Interactively” on page 4-50.

Estimating a Univariate Model

Econometric Modeler treats all parameters in the model as unknown and estimable. After you specify a model, fit it to the data by clicking Estimate in the Type Model Parameters dialog box.

Note
• Econometric Modeler requires initial values for estimable parameters and presample observations to initialize the model for estimation. Econometric Modeler always chooses the default initial and presample values as described in the estimate reference page of the model you want to estimate.
• If Econometric Modeler issues an error during estimation, then:
  • The specified model poorly describes the data. Adjust model parameters, then estimate the new model.
  • At the command line, adjust optimization options and estimate the model. For details, see “Optimization Settings for Conditional Mean Model Estimation” on page 7-119, “Optimization Settings for Conditional Variance Model Estimation” on page 8-58, or “Optimization Settings for regARIMA Model Estimation” on page 5-82.
After you estimate a model:
• A new variable that describes the estimated model appears in the Models pane with the name Type_response. Type is the model type and response is the response variable to which Econometric Modeler fit the model, for example, ARIMA_FEDFUNDS. You operate on an estimated model in the Models pane by right-clicking it. In addition to the options available for time series variables (see “Import Time Series Variables” on page 4-4), the context menu includes the Modify option, which enables you to modify and re-estimate a model. For example, right-click a model and select Modify. Then, in the Type Model Parameters dialog box, adjust parameters and click Estimate.
• The object display of the model appears in the Preview pane.
• The Model Summary(Type_response) document summarizing the estimation results appears in the right pane. Results shown depend on the model type. For conditional mean and regression models, results include:
  • Model Fit: A time series plot of the response series and the fitted values $\hat{y}$
  • Parameters: An estimation summary table containing parameter estimates, standard errors, and t statistics and p-values for testing the null hypothesis that the corresponding parameter is 0
  • Residual Plot: A time series plot of the residuals
  • Goodness of Fit: Akaike information criterion (AIC) and Bayesian information criterion (BIC) model fit statistics

For conditional variance models, the results also include an estimation summary table and goodness-of-fit statistics, but Econometric Modeler plots:
  • Conditional Variances: A time series plot of the inferred conditional variances $\hat{\sigma}_t^2$
  • Standardized Residuals: A time series plot of the standardized residuals $(y_t - \hat{c})/\sqrt{\hat{\sigma}_t^2}$, where $\hat{c}$ is the estimated offset

You can interact with individual plots by pausing on one and selecting an interaction (see “Visualizing Time Series Data” on page 4-6). You can also interact with the summary by right-clicking the document. Options include:
• Export: Place plot in a separate figure window.
• Show Model: Display the summary of another estimated model by pointing to Show Model, then selecting a model in the list.
• Show Recessions: Plot recession bands in time series plots.

Consider a SARIMA(0,1,1)×(0,1,1)12 model for the monthly international airline passenger numbers from 1949 to 1960 in the Data_Airline data set. To estimate this model using Econometric Modeler:
1. Import the DataTimeTable variable in the Data_Airline data set into Econometric Modeler (see “Import Time Series Variables” on page 4-4).
2. On the Econometric Modeler tab, in the Models section, click the arrow > SARIMA.
3. In the SARIMA Model Parameters dialog box, on the Lag Order tab:
   • Nonseasonal section
     a. Set Degrees of Integration to 1.
     b. Set Moving Average Order to 1.
     c. Clear the Include Constant Term check box.
   • Seasonal section
     a. Set Period to 12 to indicate monthly data.
     b. Set Moving Average Order to 1.
     c. Select the Include Seasonal Difference check box.
4. Click Estimate.

As a result:
• A variable named SARIMA_PSSG appears in the Models pane.
• The value of SARIMA_PSSG appears in the Preview pane.
• An estimation summary appears in the new Model Summary(SARIMA_PSSG) document.
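The app steps for the airline model correspond roughly to the following command-line sketch. It assumes that the DataTimeTable variable in Data_Airline stores the passenger counts in a variable named PSSG, as the model name SARIMA_PSSG suggests; the names Mdl and EstMdl are illustrative.

```matlab
load Data_Airline
y = DataTimeTable.PSSG;                 % monthly passenger counts

% SARIMA(0,1,1)x(0,1,1)12 template without a constant:
% D = 1 is the nonseasonal difference, Seasonality = 12 the seasonal
% difference, and the MA terms sit at lag 1 and seasonal lag 12.
Mdl = arima('Constant',0,'D',1,'Seasonality',12, ...
    'MALags',1,'SMALags',12);

% Fit by maximum likelihood and print an estimation summary.
EstMdl = estimate(Mdl,y,'Display','off');
summarize(EstMdl)
```

The summary printed by summarize contains the same kind of information as the Parameters and Goodness of Fit sections of the Model Summary document.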
Estimating a Multivariate Model

Econometric Modeler treats all parameters in the model as unknown and estimable by default. However, unlike univariate models, Econometric Modeler supports equality constraints on some parameters for estimation. You can specify coefficient values in the app or import a lag coefficient matrix from the workspace. After you configure the model, fit it to the data by clicking Estimate in the Type Model Parameters dialog box.

Note
• To initialize the model for estimation, Econometric Modeler removes the first p observations from the response data to use as a presample, and then the function fits the model to the remaining observations.
• If Econometric Modeler issues an error during estimation, the specified model poorly describes the data. Adjust model parameters, and then estimate the new model.
After you estimate a model, Econometric Modeler shows results similar to those of univariate estimation (see “Estimating a Univariate Model” on page 4-23), and you can interact with an estimated multivariate model in the same ways as with univariate models. Notable differences include:
• A new variable that describes the estimated model appears in the Models pane with the name Typej, where Type is the model type and j is estimated model j of that type. For example, VAR2 is the second estimated VAR model during the session.
• The Model Summary(Type_response) document summarizing the estimation results appears in the right pane. However, the plots shown depend on the model type and the selected time series in the Time Series list at the top of the document. For VAR models, results include:
  • Model Fit: A time series plot of the selected time series and the corresponding fitted values $\hat{y}$
  • Residual Plot: A time series plot of the residuals corresponding to the selected time series
  For VEC models, the results additionally include a time series plot of the cointegrating relation, which is invariant to the selected time series.

Consider a 3-D VAR(4) model of quarterly measurements of the US gross domestic product (GDP), M1 money supply, and the 3-month T-bill rate from 1947 through 2009. The file Data_USEconModel.mat contains the series, among other economic measurements. To estimate this model using Econometric Modeler:
1. Import the DataTimeTable variable in the Data_USEconModel data set into Econometric Modeler (see “Import Time Series Variables” on page 4-4).
2. In the Time Series pane, click GDP, and then press Ctrl and click M1SL.
3. Because GDP and M1SL exhibit exponential growth, use their growth rates in the model. On the Econometric Modeler tab, in the Transforms section, click Log, and then click Difference. Change the names of the transformed variables to GDPRate and M1SLRate, respectively.
4. In the Time Series pane, click TB3MS, and then stabilize the series by clicking Difference in the Transforms section. Change the name of the transformed series to TB3MSRate.
5. With TB3MSRate selected, select the three rate series for the VAR model by pressing Ctrl and clicking GDPRate and M1SLRate.
6. On the Econometric Modeler tab, in the Models section, click VAR.
7. In the VAR Model Parameters dialog box, on the Lag Order tab, set Autoregressive Order to 4.
8. Click Estimate.

As a result:
• A variable named VAR appears in the Models pane.
• The value of VAR appears in the Preview pane.
• An estimation summary appears in the new Model Summary(VAR) document. The plots are relative to the GDPRate series. The indices of the parameter estimates in the Parameters section correspond to the order of the series in the Time Series list (for example, AR{1}(2,3) is the lag 1 AR coefficient of the TB3MSRate series in the equation of the M1SLRate series).
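A command-line counterpart of the VAR(4) example might look like the following sketch. The series order [GDPRate M1SLRate TB3MSRate] is an assumption chosen to match the index example AR{1}(2,3), and rows with missing values are dropped explicitly before estimation; the app may treat the presample and missing values differently.

```matlab
load Data_USEconModel

% Growth rates for GDP and M1SL, first difference for the T-bill rate.
GDPRate   = diff(log(DataTimeTable.GDP));
M1SLRate  = diff(log(DataTimeTable.M1SL));
TB3MSRate = diff(DataTimeTable.TB3MS);

% Stack the series and remove rows that contain missing values.
Y = rmmissing([GDPRate M1SLRate TB3MSRate]);

% Three-dimensional VAR(4) template; all coefficients are estimable.
Mdl = varm(3,4);
EstMdl = estimate(Mdl,Y);

% Lag 1 coefficient of TB3MSRate in the M1SLRate equation.
phi23 = EstMdl.AR{1}(2,3);
summarize(EstMdl)
```

Row indices of each AR matrix identify the equation and column indices identify the lagged series, which is the convention behind AR{1}(2,3).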
Conducting Goodness-of-Fit Checks

After you estimate a model, a good practice is to determine the adequacy of the fitted model (see “Goodness of Fit” on page 3-86). Econometric Modeler is well suited for visually assessing the in-sample fit (for all models except conditional variance models) and performing residual diagnostics. Residual diagnostics include evaluating the model assumptions and investigating whether you must respecify the model to address other properties of the data. Model assumptions to assess include checking whether the residuals are centered on zero, normally distributed, homoscedastic, and serially uncorrelated. If the residuals do not demonstrate all these properties, then you must determine the severity of the departure, whether to transform the data, and whether to specify a different model. For more details on residual diagnostics, see “Time Series Regression VI: Residual Diagnostics” on page 5-220 and “Residual Diagnostics” on page 3-87.

To perform goodness-of-fit checks using Econometric Modeler, in the Models pane, select an estimated model. Then, complete the following steps:
• To visually assess the in-sample fit for all models (except conditional variance models), inspect the Model Fit plot in the Model Summary document. For multivariate models, Econometric Modeler displays fitted values of one series. You can select a different series in the model to plot by clicking the series in the Time Series list.
• To visually assess whether the residuals are centered on zero, autocorrelated, and heteroscedastic, inspect the Residual Plot in the Model Summary document. For multivariate models, Econometric Modeler displays residuals of one series. You can select a different residual series to plot by clicking the corresponding series in the Time Series list.
• On the Econometric Modeler tab, in the Diagnostics section, click Residual Diagnostics. The diagnostics gallery provides these residual plots and tests.

Method                             Diagnostic
Residual histogram                 Visually assess normality
Residual quantile-quantile plot    Visually assess normality and skewness
ACF                                Visually assess whether residuals are autocorrelated
Ljung-Box Q-test                   Test residuals for significant autocorrelation
ACF of squared residuals           Visually assess whether residuals have conditional heteroscedasticity
Engle's ARCH test                  Test residuals for conditional heteroscedasticity (significant ARCH effects)
Alternatively, to plot a histogram, quantile-quantile plot, or ACF of the residuals of an estimated model:
1. Select a model in the Models pane.
2. Click the Plots tab.
3. In the Plots section, click the arrow and then click one of the plots in the Model Plots section of the gallery.
For multivariate models:
• Econometric Modeler plots residual diagnostics for all model series within the same document.
• Econometric Modeler runs residual diagnostic tests simultaneously for all series, but it shows results of each residual series separately. You can choose which results to show by clicking, in the test tab, the series in the Time Series list.

Note
Another important goodness-of-fit check is predictive-performance assessment. To assess the predictive performance of several models:
1. Fit a set of models to the data using Econometric Modeler.
2. Perform residual diagnostics on all models.
3. Choose a subset of models with desirable residual properties and minimal fit statistics (see “Finding Model with Best In-Sample Fit” on page 4-36).
4. Export the chosen models to the MATLAB Workspace (see “Export Session Results” on page 4-38).
5. Perform a predictive performance assessment at the command line (see “Assess Predictive Performance” on page 3-89).
For an example, see “Compare Predictive Performance After Creating Models Using Econometric Modeler” on page 4-193.

Consider performing goodness-of-fit checks on the estimated SARIMA(0,1,1)×(0,1,1)12 model for the airline counts data in “Estimating a Univariate Model” on page 4-23.
1. In the right pane, on the Model Summary(SARIMA_PSSG) document:
   a. Model Fit suggests that the model fits the data fairly well.
   b. Residual Plot suggests that the residuals have a mean of zero. However, the residuals appear heteroscedastic and serially correlated.
2. On the Econometric Modeler tab, in the Diagnostics section, click Residual Diagnostics. In the diagnostics gallery:
   a. Click Residual Q-Q Plot. The right pane displays a figure window named QQPlot(SARIMA_PSSG) containing a quantile-quantile plot of the residuals. The plot suggests that the residuals are approximately normal, but with slightly heavier tails.
   b. Click Autocorrelation Function. In the toolstrip, the ACF tab appears and contains plot options. The right pane displays a figure window named ACF(SARIMA_PSSG) containing the ACF of the residuals. Because almost all the sample autocorrelation values lie within the confidence bounds, the residuals are likely not serially correlated.
   c. Click Engle's ARCH Test. On the ARCH tab, in the Tests section, click Run Test to run the test using default options. The right pane displays the ARCH(SARIMA_PSSG) document, which shows the test results in the Results table. The results suggest rejection of the null hypothesis that the residuals exhibit no ARCH effects at a 5% level of significance. You can try removing heteroscedasticity by applying the log transformation to the series.
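The residual tests in this example also exist as command-line functions. The sketch below refits the airline model so that it is self-contained, then runs the Ljung-Box Q-test and Engle's ARCH test with their default options; those defaults are not guaranteed to match the app's, so p-values can differ from the ones the app reports.

```matlab
load Data_Airline
y = DataTimeTable.PSSG;

% Refit the SARIMA(0,1,1)x(0,1,1)12 model without a constant.
Mdl = arima('Constant',0,'D',1,'Seasonality',12, ...
    'MALags',1,'SMALags',12);
EstMdl = estimate(Mdl,y,'Display','off');

% Infer the residuals from the fitted model.
res = infer(EstMdl,y);

% Ljung-Box Q-test: null hypothesis of no residual autocorrelation.
[hLBQ,pLBQ] = lbqtest(res);

% Engle's ARCH test: null hypothesis of no ARCH effects.
[hARCH,pARCH] = archtest(res - mean(res));

fprintf('Ljung-Box: h = %d, p = %.4f\n',hLBQ,pLBQ);
fprintf('ARCH test: h = %d, p = %.4f\n',hARCH,pARCH);
```

A value of h equal to 1 indicates rejection of the corresponding null hypothesis at the default 5% level.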
Finding Model with Best In-Sample Fit

Econometric Modeler enables you to fit multiple related models to a data set efficiently. After you estimate a model, you can estimate other models by iterating the methods in “Perform Exploratory Data Analysis” on page 4-6, “Fitting Models to Data” on page 4-15, and “Conducting Goodness-of-Fit Checks” on page 4-30. After each iteration, a new model variable appears in the Models pane. For models in the same parametric family that you fit to the same response series, you can determine the model with the best parsimonious, in-sample fit among the estimated models by comparing their fit statistics.

From a subset of candidate models, to determine the model of best fit using Econometric Modeler:
1. In the Models pane, double-click an estimated model. In the right pane, estimation results of the model appear in the Model Summary(Model) document, where Model is the name of the selected model.
2. On the Model Summary(Model) document, in the Goodness of Fit table, choose a fit statistic (AIC or BIC) and record its value.
3. Iterate the previous steps for all candidate models.
4. Choose the model that yields the minimal fit statistic.
For more details on goodness-of-fit statistics, see “Information Criteria for Model Selection” on page 3-54.

Consider finding the best-fitting SARIMA model, with a period of 12, for the log of the airline passenger counts in the Data_Airline data set. Fit a subset of SARIMA models, considering all combinations of models that include up to two seasonal and nonseasonal MA lags.
1. Import the DataTimeTable variable in the Data_Airline data set into Econometric Modeler (see “Import Time Series Variables” on page 4-4).
2. Apply the log transformation to PSSG (see “Transforming Time Series” on page 4-14).
3. Fit a SARIMA(0,1,q)×(0,1,q12)12 to PSSGLog, where all unknown orders are 0 (see “Estimating a Univariate Model” on page 4-23).
4. In the right pane, on the Model Summary(SARIMA_PSSGLog) document, in the Goodness of Fit table, record the AIC value.
5. In the Models pane, select PSSGLog.
6. Repeat steps 3 through 5, adjusting q and q12 to cover the nine combinations of q ∈ {0,1,2} and q12 ∈ {0,1,2}. Econometric Modeler distinguishes subsequent models of the same type by appending consecutive digits to the end of the variable name.

The resulting AIC values are in this table.

Model                        Variable Name      AIC
SARIMA(0,1,0)×(0,1,0)12      SARIMA_PSSGLog1    -491.8042
SARIMA(0,1,0)×(0,1,1)12      SARIMA_PSSGLog2    -530.5327
SARIMA(0,1,0)×(0,1,2)12      SARIMA_PSSGLog3    -528.5330
SARIMA(0,1,1)×(0,1,0)12      SARIMA_PSSGLog4    -508.6853
SARIMA(0,1,1)×(0,1,1)12      SARIMA_PSSGLog5    -546.3970
SARIMA(0,1,1)×(0,1,2)12      SARIMA_PSSGLog6    -544.6444
SARIMA(0,1,2)×(0,1,0)12      SARIMA_PSSGLog7    -506.8027
SARIMA(0,1,2)×(0,1,1)12      SARIMA_PSSGLog8    -544.4789
SARIMA(0,1,2)×(0,1,2)12      SARIMA_PSSGLog9    -542.7171

Because it yields the minimal AIC, the SARIMA(0,1,1)×(0,1,1)12 model is the model with the best parsimonious, in-sample fit.
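The same grid search can be sketched as a loop at the command line. This sketch leaves the model constant estimable (the arima default), which is an assumption because the example does not state the setting; the parameter count and the resulting AIC values therefore need not reproduce the table.

```matlab
load Data_Airline
y = log(DataTimeTable.PSSG);

aic = zeros(3,3);                        % rows: q = 0,1,2; columns: q12 = 0,1,2
for q = 0:2
    for q12 = 0:2
        % Build the name-value list, adding MA terms only when present.
        args = {'D',1,'Seasonality',12};
        if q > 0
            args = [args {'MALags',1:q}];
        end
        if q12 > 0
            args = [args {'SMALags',12*(1:q12)}];
        end
        Mdl = arima(args{:});

        % The third output of estimate is the optimized loglikelihood.
        [~,~,logL] = estimate(Mdl,y,'Display','off');

        % Estimated parameters: MA terms, constant, and innovation variance.
        numParam = q + q12 + 2;
        aic(q+1,q12+1) = aicbic(logL,numParam);
    end
end

% Locate the combination with the smallest AIC.
[~,idx] = min(aic(:));
[iq,iq12] = ind2sub(size(aic),idx);
fprintf('Minimal AIC at q = %d, q12 = %d\n',iq-1,iq12-1);
```

Comparing AIC values this way is meaningful only because every candidate is fit to the same response series with the same differencing.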
Export Session Results

Econometric Modeler offers several options for you to share your session results. The option you choose depends on your analysis goals. The options for sharing your results are in the Export section of the Econometric Modeler tab. This table describes the available options.

Option: Description

Export Variables: Export time series and model variables to the MATLAB Workspace. Choose this option to perform further analysis at the MATLAB command line. For example, you can generate forecasts from an estimated model or check the predictive performance of several models.

Generate Function: Generate a MATLAB function to use outside the app. The function accepts the data loaded into the app as input, and outputs a model estimated in the app session. Choose this option to:
• Understand the functions used by Econometric Modeler to create and estimate the model.
• Modify the generated function in the MATLAB Editor for further use.

Generate Live Function: Generate a MATLAB live function to use outside the app. The function accepts the data loaded into the app as input, and outputs a model estimated in the app session. Choose this option to:
• Understand the functions used by Econometric Modeler to create and estimate the model.
• Modify the generated function in the Live Editor for further use.

Generate Report: Generate a report that summarizes the session. Choose this option when you achieve your analysis goals in Econometric Modeler, and you want to share a summary of the results.
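As an illustration of the further analysis that Export Variables makes possible, the following sketch forecasts from an estimated model. To stay self-contained, it estimates a stand-in for the exported model at the command line; in a real session, the exported variable (for example, SARIMA_PSSGLog) would take the place of EstMdl.

```matlab
load Data_Airline
y = log(DataTimeTable.PSSG);

% Stand-in for a model exported from the app.
Mdl = arima('Constant',0,'D',1,'Seasonality',12, ...
    'MALags',1,'SMALags',12);
EstMdl = estimate(Mdl,y,'Display','off');

% Forecast the log series 12 months ahead, using the data as the presample.
[yF,yMSE] = forecast(EstMdl,12,y);

% Approximate 95% forecast intervals on the log scale.
lower = yF - 1.96*sqrt(yMSE);
upper = yF + 1.96*sqrt(yMSE);
disp([lower yF upper])
```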
Exporting Variables

To export time series and estimated model variables from the Time Series or Models pane to the MATLAB Workspace:
1. On the Econometric Modeler tab, in the Export section, click Export > Export Variables.
2. In the Export Variables dialog box, all time series variables appear in the left pane and all model variables appear in the right pane. Choose time series and model variables to export by selecting the corresponding check boxes in the Select column. The app selects the check boxes of all time series or model variables that are selected in the Time Series and Models panes. Clear the check boxes of any variables that you do not want to export. For example, this figure shows how to select the PSSGLog time series and the SARIMA_PSSGLog SARIMA model.
3. Click Export.
The selected variables appear in the MATLAB Workspace. Time series variables are double-precision column vectors. The type of an estimated model object depends on the model (for example, an exported ARIMA model is an arima object).

Alternatively, you can export variables by selecting at least one variable, right-clicking a selected variable, and selecting Export.

Generating a Function

The app can generate a plain text function or a live function. The main difference between the two functions is the editor used to modify the generated function: you edit plain text functions in the MATLAB Editor and live functions in the Live Editor. For more details on the differences between the two function types, see “What Is a Live Script or Function?”.

Regardless of the function type you choose, the generated function accepts the data loaded into the app as input, and outputs a model estimated in the app session.

To export a MATLAB function or live function that creates a model estimated in an app session:
1. In the Models pane, select an estimated model.
2. On the Econometric Modeler tab, in the Export section, click Export. In the Export menu, choose Generate Function or Generate Live Function.

The MATLAB Editor or Live Editor displays an untitled, unsaved function containing the code that estimates the model.
• By default, the function name is modelTimeSeries.
• The function accepts the originally imported data set as input.
• Before the function estimates the model, it extracts the variables from the input data set used in estimation, and applies the same transformations to the variables that you applied in Econometric Modeler.
• The function returns the selected estimated model.
Consider generating a live function that returns SARIMA_PSSGLog, the SARIMA(0,1,1)×(0,1,1)12 model fit to the log of airline passenger data (see “Estimating a Univariate Model” on page 4-23). This figure shows the generated live function.
Generating a Report

Econometric Modeler can produce a report describing your activities on selected time series and model variables. The app organizes the report into chapters corresponding to selected time series and model variables. Chapters describe session activities that you perform on the corresponding variable. Chapters on time series variables describe transformations, plots, and tests that you perform on the selected variable in the session. Estimated model chapters contain an estimation summary, that is, elements of the Model Summary document (see “Estimating a Univariate Model” on page 4-23), and residual diagnostics plots and tests.

You can export the report as one of the following document types:
• Hypertext Markup Language (HTML)
• Microsoft® Word XML Format Document (DOCX)
• Portable Document Format (PDF)

To export a report:
1. On the Econometric Modeler tab, in the Export section, click Export > Generate Report.
2. In the Select Variables for Report dialog box, all time series variables in the Time Series pane appear in the left pane and all model variables in the Models pane appear in the right pane. Choose variables to include in the report by selecting their check boxes in the Select column.
3. Select a document type by clicking Report Format and selecting the format you want.
4. Click OK.
5. In the Select File to Write window:
   a. Browse to the folder in which you want to save the report.
   b. In the File name box, type a name for the report.
   c. Click Save.
Consider generating an HTML report for the analysis of the airline passenger data (see “Conducting Goodness-of-Fit Checks” on page 4-30). This figure shows how to select all variables and the HTML format.
This figure shows a sample of the generated report.
References [1] Box, George E. P., Gwilym M. Jenkins, and Gregory C. Reinsel. Time Series Analysis: Forecasting and Control. 3rd ed. Englewood Cliffs, NJ: Prentice Hall, 1994.
See Also

Apps
Econometric Modeler

More About
• “Prepare Time Series Data for Econometric Modeler App” on page 4-59
• “Import Time Series Data into Econometric Modeler App” on page 4-62
• “Plot Time Series Data Using Econometric Modeler App” on page 4-66
• “Detect Serial Correlation Using Econometric Modeler App” on page 4-71
• “Detect ARCH Effects Using Econometric Modeler App” on page 4-77
• “Assess Stationarity of Time Series Using Econometric Modeler” on page 4-84
• “Assess Collinearity Among Multiple Series Using Econometric Modeler App” on page 4-94
• “Conduct Cointegration Test Using Econometric Modeler” on page 4-170
• “Transform Time Series Using Econometric Modeler App” on page 4-97
• “Implement Box-Jenkins Model Selection and Estimation Using Econometric Modeler App” on page 4-112
• “Estimate Multiplicative ARIMA Model Using Econometric Modeler App” on page 4-131
• “Specify t Innovation Distribution Using Econometric Modeler App” on page 4-150
• “Perform ARIMA Model Residual Diagnostics Using Econometric Modeler App” on page 4-141
• “Estimate ARIMAX Model Using Econometric Modeler App” on page 4-200
• “Estimate Vector Autoregression Model Using Econometric Modeler” on page 4-155
• “Estimate Vector Error-Correction Model Using Econometric Modeler” on page 4-180
• “Compare Predictive Performance After Creating Models Using Econometric Modeler” on page 4-193
• “Estimate Regression Model with ARMA Errors Using Econometric Modeler App” on page 4-208
• “Select ARCH Lags for GARCH Model Using Econometric Modeler App” on page 4-122
• “Compare Conditional Variance Model Fit Statistics Using Econometric Modeler App” on page 4-221
• “Share Results of Econometric Modeler App Session” on page 4-237
• Creating ARIMA Models Using Econometric Modeler App
Specifying Univariate Lag Operator Polynomials Interactively

Consider building a predictive, univariate time series model (conditional mean, conditional variance, or regression model with ARMA errors) by using the Econometric Modeler app. After you choose candidate models for estimation (see “Perform Exploratory Data Analysis” on page 4-6), you can specify the model structure of each. To do so, on the Econometric Modeler tab, in the Models section, click a model or display the gallery of supported models and click the model you want.
After you select a time series model, the Type Model Parameters dialog box appears, where Type is the model type. For example, if you select SARIMAX, then the SARIMAX Model Parameters dialog box appears.
• For all dynamic models, Econometric Modeler supports two options to specify the lag operator polynomials. The adjustment options are on separate tabs: the Lag Order and Lag Vector tabs. The Lag Order tab options offer a straightforward way to include consecutive lags from lag 1 and degrees of integration (see “Specify Lag Structure Using Lag Order Tab” on page 4-45). The Lag Vector tab options allow you to create flexible models (see “Specify Lag Structure Using Lag Vector Tab” on page 4-47).
• The Type Model Parameters dialog box contains a Nonseasonal or Seasonal section. The Seasonal section is absent in strictly nonseasonal model dialog boxes. To specify the nonseasonal lag operator polynomial structure, use the parameters in the Nonseasonal section. To adjust the seasonal lag operator polynomial structure, including seasonality, use the parameters in the Seasonal section.
• To specify the degrees of nonseasonal integration, in the Nonseasonal section, in the Degree of Integration box, type the degree, or use the arrows next to the box.
• To specify the seasonal periodicity for seasonal models, in the Seasonal section, in the Period box, type the periodicity, or use the arrows next to the box. When the periodicity is greater than 0, you can specify the seasonal autoregressive or moving average polynomial order similarly, and you can specify one degree of seasonal integration by selecting the Include Seasonal Difference check box.
• For verification, the model form appears in the Model Equation section. The model form updates to your specifications in real time.
Specify Lag Structure Using Lag Order Tab
On the Lag Order tab, in the Nonseasonal section, you can specify the order of each lag operator polynomial in the nonseasonal component. In the appropriate lag polynomial order box (for example, the Autoregressive Order box), type the nonnegative integer order or click the appropriate arrow in the box. The app includes all consecutive lags from 1 through L in the polynomial, where L is the specified order.
For seasonal models, on the Lag Order tab, in the Seasonal section:
1
Specify the period of the season by entering the nonnegative integer period in the Period box or by clicking the appropriate arrow in the box.
2
Specify the seasonal lag operator polynomial order. In the appropriate lag polynomial order box (for example, the Autoregressive Order box), type the nonnegative integer order, ignoring seasonality, or click the appropriate arrow in the box. The lag operator exponents in the resulting polynomial are multiples of the specified period.
For example, if Period is 12 and Autoregressive Order in the Seasonal section is 3, then the seasonal autoregressive polynomial is $1 - \Phi_{12}L^{12} - \Phi_{24}L^{24} - \Phi_{36}L^{36}$.
To specify seasonal integration, select the Include Seasonal Difference check box. A seasonal difference polynomial appears in the Model Equation section, and its lag operator exponent is equal to the specified period.
Consider a $\text{SARIMA}(0,1,1)\times(0,1,2)_4$ model, a seasonal multiplicative ARIMA model with four periods in a season. To specify this model using the parameters in the Lag Order tab:
1
Select a time series variable in the Time Series pane.
2
On the Econometric Modeler tab, in the Models section, click the arrow > SARIMA.
3
In the SARIMA Model Parameters dialog box, on the Lag Order tab, enter these values for the corresponding parameters.
• In the Nonseasonal section, in the Degree of Integration box, type 1.
• In the Nonseasonal section, in the Moving Average Order box, type 1.
• In the Seasonal section, in the Period box, type 4. This value indicates a quarterly season.
• In the Seasonal section, in the Moving Average Order box, type 2. This action includes seasonal MA lags 4 and 8 in the equation.
• In the Seasonal section, select the Include Seasonal Difference check box.
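For comparison, the same lag structure can be sketched at the command line with the arima object listed under See Also. This is an illustration only, not the code that the app generates, and the variable name Mdl is an assumption.

```matlab
% Sketch: command-line counterpart of the SARIMA(0,1,1)x(0,1,2)_4 specification.
% Mdl is a hypothetical variable name; all coefficients stay unknown (NaN).
Mdl = arima('D',1, ...            % one degree of nonseasonal integration
            'MALags',1, ...       % nonseasonal MA lag 1
            'Seasonality',4, ...  % seasonal difference with period 4
            'SMALags',[4 8]);     % seasonal MA order 2 with period 4: lags 4 and 8
disp(Mdl)
```

The seasonal MA lags are written out as multiples of the period, which mirrors what the Lag Order tab does when you type 2 in the seasonal Moving Average Order box.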
Specify Lag Structure Using Lag Vector Tab
On the Lag Vector tab, you specify the lags in the corresponding seasonal or nonseasonal lag operator polynomial. This figure shows the Lag Vector tab in the SARIMA Model Parameters dialog box.
To specify the lags that compose each lag operator polynomial, type a list of nonnegative, unique integers in the corresponding box. Separate values by commas or spaces, or use the colon operator (for example, 4:4:12). Specify the seasonal-difference degree by typing a nonnegative integer in the Seasonality box or by clicking the appropriate arrow in the box.
Consider a $\text{SARIMA}(0,1,1)\times(0,1,2)_4$ model, a seasonal multiplicative ARIMA model with four periods in a season. To specify this model using the parameters in the Lag Vector tab:
1
Select a time series variable in the Time Series pane.
2
On the Econometric Modeler tab, in the Models section, click the arrow > SARIMA.
3
In the SARIMA Model Parameters dialog box, click the Lag Vector tab, then enter these values for the corresponding parameters. • In the Nonseasonal section, in the Degree of Integration box, type 1. • In the Nonseasonal section, in the Moving Average Lags box, type 1. • In the Seasonal section, in the Seasonality box, type 4. Therefore, a seasonal-difference polynomial of degree 4 appears in the equation in the Model Equation section. • In the Seasonal section, in the Moving Average Lags box, type 4 8. This action includes seasonal MA lags 4 and 8 in the equation.
See Also
Apps
Econometric Modeler
Objects
arima | regARIMA | garch | gjr | egarch
More About •
“Analyze Time Series Data Using Econometric Modeler” on page 4-2
•
“Implement Box-Jenkins Model Selection and Estimation Using Econometric Modeler App” on page 4-112
•
“Select ARCH Lags for GARCH Model Using Econometric Modeler App” on page 4-122
Specifying Multivariate Lag Operator Polynomials and Coefficient Constraints Interactively
Consider building a predictive, multivariate time series model (vector autoregression (VAR) or vector error-correction (VEC)) by using the Econometric Modeler app. After you choose candidate models for estimation (see “Perform Exploratory Data Analysis” on page 4-6), you can specify the model structure of each. To do so, on the Econometric Modeler tab, in the Time Series pane, select all series in the model, and then, in the Models section, click the model you want.
After you select a time series model, the Type Model Parameters dialog box appears, where Type is the model type. For example, if you select VAR, then the VAR Model Parameters dialog box appears.
• Econometric Modeler supports two options for specifying the autoregressive (AR) or short-run lag operator polynomials. The options are on separate tabs: the Lag Order and Lag Vector tabs. The Lag Order tab offers a straightforward way to include consecutive lags starting from lag 1 (see “Specify Lag Structure Using Lag Order Tab” on page 4-51). The Lag Vector tab lets you create flexible models (see “Specify Lag Structure Using Lag Vector Tab” on page 4-52).
• Regardless of which lag option you choose, you can specify equality constraints for estimation on individual lag coefficients (entries) within the AR (for VAR models) or short-run (for VEC models) matrix. Similarly, for VEC models, you can specify equality constraints on entire adjustment-speed and cointegration matrices, except for Johansen forms H* and H1*, which support equality constraints only for the entire adjustment-speed matrix. Econometric Modeler enables you to enter equality constraints in the matrices provided in the dialog box or import entire matrices containing constraints from the workspace. For more details, see “Specify Coefficient Matrix Equality Constraints for Estimation” on page 4-54.
• For verification, the model form appears in the Model Equation section. The model form updates to your specifications in real time.
Specify Lag Structure Using Lag Order Tab
On the Lag Order tab, you can specify the order of the lag operator polynomial by using the parameter appropriate for the model.
• For VAR or VARX models, use the Autoregressive Order box to specify the order of the autoregressive polynomial.
• For VEC models, use the Number of Lags box to specify the order of the short-run polynomial.
Type a nonnegative integer or click the appropriate arrow in the box. The app includes all consecutive lags from 1 through L in the polynomial, where L is the specified order.
When you specify an order, you can verify the lag operator polynomial in the Model Equation section. Also, Econometric Modeler includes all consecutive lags through L in the AR Coefficients (ϕ) (for VAR models) or Short-Run Coefficients (Φ) (for VEC models) list. The matrix below the list is the coefficient matrix of the selected lag, which you can use to specify equality constraints (see “Specify Coefficient Matrix Equality Constraints for Estimation” on page 4-54).
Consider a 3-D VAR(4) model. To specify this model using the parameters in the Lag Order tab:
1
Select the three time series of the model in the Time Series pane.
2
On the Econometric Modeler tab, in the Models section, click VAR.
3
In the VAR Model Parameters dialog box, on the Lag Order tab, in the Autoregressive Order box, type 4.
4
Verify that the model in the Model Equation section and the available lags in the AR Coefficients (ϕ) list contain lags 1 through 4.
Specify Lag Structure Using Lag Vector Tab
On the Lag Vector tab, you specify a list of individual lags in the corresponding lag operator polynomial. This figure shows the Lag Vector tab in the VAR Model Parameters dialog box.
To specify the lags that compose each lag operator polynomial, type a list of nonnegative, unique integers in the corresponding box. Separate values by commas or spaces, or use the colon operator (for example, 4:4:12).
Consider a 3-D VAR(8) model that includes only lags 1, 4, and 8. To specify this model using the parameters in the Lag Vector tab:
1
Select the three time series of the model in the Time Series pane.
2
On the Econometric Modeler tab, in the Models section, click VAR.
3
In the VAR Model Parameters dialog box, on the Lag Vector tab, in the Autoregressive Lags box, type 1 4 8.
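The two tab options correspond roughly to two ways of creating a varm model template at the command line. The following is a minimal sketch under that assumption; the variable names MdlOrder and MdlLags are hypothetical, and the app does not necessarily build its models this way.

```matlab
% Sketch: VAR templates that mirror the two dialog box tabs.
% NaN-valued coefficient matrices mark parameters to estimate.
MdlOrder = varm(3,4);                         % Lag Order tab: 3 series, lags 1 through 4

AR = {nan(3) nan(3) nan(3)};                  % one unknown 3-by-3 matrix per included lag
MdlLags = varm('AR',AR,'Lags',[1 4 8]);       % Lag Vector tab: only lags 1, 4, and 8

disp(MdlOrder.P)                              % AR polynomial degree of the first template
disp(MdlLags.P)                               % degree is the largest included lag
```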
Specify Coefficient Matrix Equality Constraints for Estimation
Econometric Modeler supports equality constraints when estimating the following parameters:
• For VAR and VARX models, you can specify equality constraints on individual entries of the AR coefficient matrices. In other words, you can specify known values for some matrix elements and estimate the others.
• For VEC models, you can specify equality constraints on:
  • Individual entries of the short-run coefficient matrices, like the AR coefficients of VAR models.
  • For all Johansen forms, the entire adjustment-speed matrix A. That is, Econometric Modeler does not support constraints on individual elements of A.
  • For all Johansen forms except for H* and H1*, the entire cointegration matrix B.
To specify equality constraints, enter the constraints in the appropriate matrix provided in the Type Model Parameters dialog box below the name of the matrix. By default, Econometric Modeler populates all matrices with NaN values, which indicate unknown, estimable parameters.
For all matrices, each row corresponds to a response equation in the model. For AR and short-run coefficient matrices, each column corresponds to the lagged variable whose coefficient appears in that equation. For example, in the figure, entry (2,3) is the lag 1 coefficient of TB3MSRate in the equation of M1SLRate.
For VEC models, the adjustment-speed matrix A and the cointegration matrix B each have r linearly independent columns, where r is the cointegration rank specified in the Rank box. Each column corresponds to a cointegrating relation.
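The NaN convention can be illustrated with a small workspace matrix of the kind you might prepare for import. This is a minimal sketch: the matrix name AR1, the three-variable dimension, and the constrained value 0 are all hypothetical choices for illustration.

```matlab
% Sketch: a 3-by-3 lag 1 coefficient matrix with one equality constraint.
% NaN marks a coefficient to estimate; a number fixes that coefficient.
AR1 = nan(3);      % start with all nine coefficients unknown
AR1(2,3) = 0;      % hypothetical constraint: variable 3 at lag 1 is excluded from equation 2
disp(AR1)
```

Rows index the response equations and columns index the lagged variables, matching the layout described above.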
Econometric Modeler enables you to specify constraints in two ways:
• You can click the elements you want to constrain in the matrices provided, and enter the constraint.
• You can use the Import button above the matrix to import an appropriately sized and configured matrix of constraints from the workspace.
For example, consider a VEC(1) model for short-, medium-, and long-term annual Canadian interest rate series from 1954 through 1994. Suppose economic theory suggests the following characteristics of the model:
• Each series is a unit root process, and the cointegration rank is 2.
• The deterministic terms in the model are a vector of intercepts in the cointegrating relations and a deterministic linear trend vector in the levels of the data (H1 Johansen form).
• The linear effect, on the current long-term interest rate, of the short-term interest rate in the previous year is 0.5.
• The cointegrating relations that produce stationary processes are 2.1INT_L - 2.1INT_M + 0.1INT_S and 1.7INT_L - 3.7INT_M + 1.8INT_S.
To specify this model and its characteristics, follow this procedure.
1
Load the Canadian inflation and interest rate data Data_Canada.mat.
load Data_Canada
2
At the command line, open the Econometric Modeler app.
econometricModeler
Alternatively, open the app from the apps gallery (see Econometric Modeler).
3
Import the DataTimeTable variable in the Data_Canada data set into Econometric Modeler (see “Import Time Series Variables” on page 4-4).
4
Select the series INT_L, INT_M, and INT_S in the Time Series pane.
5
On the Econometric Modeler tab, in the Models section, click VEC.
6
In the VEC Model Parameters dialog box, in the Rank box, type 2.
7
In the matrix below the Short-Run Coefficients (Φ) list, in element (1,3), type 0.5.
8
Below Cointegration Matrix (B), observe that the order of the variables in the matrices is INT_L, INT_M, and INT_S. At the command line, create the cointegration matrix with the corresponding variable order. Each column holds the coefficients of one cointegrating relation.
B = [2.1 1.7; -2.1 -3.7; 0.1 1.8];
In the VEC Model Parameters dialog box, at Cointegration Matrix (B), click Import. In the dialog box, select B.
The figure shows the parameter configurations in the VEC Model Parameters dialog box.
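A command-line counterpart of this constrained specification might look like the following sketch, which uses the vecm object listed under See Also. The variable names Y, SR, Mdl, and EstMdl are assumptions, and the code is not what the app runs internally.

```matlab
% Sketch: constrained VEC(1) model for the three Canadian interest rate series.
load Data_Canada
Y = DataTimeTable{:,{'INT_L','INT_M','INT_S'}};   % same variable order as in the dialog box

Mdl = vecm(3,2,1);           % 3 series, cointegration rank 2, 1 short-run lag
SR = nan(3);                 % NaN entries are estimated
SR(1,3) = 0.5;               % fix the lagged INT_S coefficient in the INT_L equation
Mdl.ShortRun = {SR};
% Columns are the cointegrating relations stated in the model characteristics.
Mdl.Cointegration = [2.1 1.7; -2.1 -3.7; 0.1 1.8];

EstMdl = estimate(Mdl,Y,'Model','H1');   % H1 Johansen form
```

Because the whole cointegration matrix is supplied, estimation treats it as known and fits the remaining free parameters.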
See Also
Apps
Econometric Modeler
Objects
vecm | varm
More About •
“Analyze Time Series Data Using Econometric Modeler” on page 4-2
•
“Implement Box-Jenkins Model Selection and Estimation Using Econometric Modeler App” on page 4-112
•
“Conduct Cointegration Test Using Econometric Modeler” on page 4-170
•
“Estimate Vector Autoregression Model Using Econometric Modeler” on page 4-155
•
“Estimate Vector Error-Correction Model Using Econometric Modeler” on page 4-180
Prepare Time Series Data for Econometric Modeler App
These examples show how to prepare time series data at the MATLAB command line for use in the Econometric Modeler app.
You can import only one variable into Econometric Modeler. The variable can exist in the MATLAB Workspace or a MAT-file.
A row in a MATLAB timetable contains simultaneously sampled observations. When you import a timetable, the app plots associated times on the x-axis of time series plots and enables you to overlay recession bands on the plots. Therefore, these examples show how to create timetables for univariate and multivariate time series data. For other supported data types and variable orientation, see “Prepare Data for Econometric Modeler App” on page 4-3.
Prepare Table of Multivariate Data for Import
This example shows how to create a MATLAB timetable from synchronized data stored in a MATLAB table. The data set contains annual Canadian inflation and interest rates from 1954 through 1994.
At the command line, clear the Workspace, then load the Data_Canada.mat data set. Display all variables in the workspace.
clear all
load Data_Canada
whos
  Name               Size     Bytes  Class        Attributes

  Data               41x5      1640  double
  DataTable          41x5      8107  table
  DataTimeTable      41x5      3827  timetable
  Description        34x55     3740  char
  dates              41x1       328  double
  series             1x5        878  cell
Data, DataTable, and DataTimeTable contain the time series, and dates contains the sampling years as a numeric vector. The row names of DataTable are the sampling years. For more details about the data set, enter Description at the command line.
Time series plots in Econometric Modeler label the x-axis with sampling times when they are attached to the data. Although you can import DataTimeTable to use this feature, consider preparing DataTable as a timetable.
Clear the row names of DataTable.
DataTable.Properties.RowNames = {};
Convert the sampling years to a datetime vector. Specify the years, and assume that measurements were taken at the end of December. Specify that the time format is the sampling year.
dates = datetime(dates,12,31,'Format','yyyy');
Convert the table DataTable to a timetable by associating the rows with the sampling times in dates.
DTT = table2timetable(DataTable,'RowTimes',dates);
DTT is a timetable containing the five time series and a variable named Time representing the time base. DTT is prepared for importing into Econometric Modeler. If your time series are not synchronized (that is, do not share a common time base), then you must synchronize them before you import them into the app. For more details, see synchronize and “Combine Timetables and Synchronize Their Data”.
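As a minimal sketch of the synchronization step, the following code aligns two toy timetables that cover different years. The timetables TT1 and TT2 and their variables A and B are hypothetical data invented for illustration.

```matlab
% Sketch: align two series observed over different years before import.
t1 = datetime(2000:2004,12,31)';      % year-end times, 2000 through 2004
t2 = datetime(2002:2004,12,31)';      % year-end times, 2002 through 2004
TT1 = timetable(t1,(1:5)','VariableNames',{'A'});
TT2 = timetable(t2,[10;20;30],'VariableNames',{'B'});

% Keep only the row times that both timetables share.
TT = synchronize(TT1,TT2,'intersection');
disp(TT)
```

The 'intersection' option discards times that appear in only one timetable. Other options, such as 'union', keep all times and fill the gaps with missing values.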
Prepare Numeric Vector for Import
This example shows how to create a timetable from a univariate time series stored as a numeric column vector. The data set contains the quarterly US gross domestic product (GDP) prices from 1947 through 2005.
At the command line, clear the workspace, then load the Data_GDP.mat data set. Display all variables in the workspace.
clear all
load Data_GDP
whos
  Name               Size     Bytes  Class        Attributes

  Data               234x1     1872  double
  DataTable          234x1    32337  table
  DataTimeTable      234x1     4727  timetable
  Description        22x59     2596  char
  dates              234x1     1872  double
Data contains the time series, and dates contains the sampling times as serial date numbers. For more details about the data set, enter Description at the command line.
Convert the sampling times to a datetime vector. By default, MATLAB stores the hours, minutes, and seconds when converting from serial date numbers. Remove these clock times from the data.
dates = datetime(dates,'ConvertFrom','datenum','Format','ddMMMyyyy',...
    'Locale','en_US');
Create a timetable containing the data, and associate each row with the corresponding sampling time in dates. Name the variable GDP.
DTT = timetable(Data,'RowTimes',dates,'VariableNames',{'GDP'});
DTT is a timetable, and is prepared for importing into Econometric Modeler.
See Also
Apps
Econometric Modeler
Objects
timetable
Functions
table2timetable | synchronize
More About •
“Analyze Time Series Data Using Econometric Modeler” on page 4-2
•
“Timetables”
•
“Prepare Data for Econometric Modeler App” on page 4-3
Import Time Series Data into Econometric Modeler App
These examples show how to import time series data into the Econometric Modeler app. Before you import the data, you must prepare the data at the MATLAB command line (see “Prepare Time Series Data for Econometric Modeler App” on page 4-59).
Import Data from MATLAB Workspace
This example shows how to import data from the MATLAB Workspace into Econometric Modeler. The data set Data_Airline.mat contains monthly counts of airline passengers.
At the command line, load the Data_Airline.mat data set.
load Data_Airline
At the command line, open the Econometric Modeler app.
econometricModeler
Alternatively, open the app from the apps gallery (see Econometric Modeler).
Import the MATLAB timetable containing the data:
1
On the Econometric Modeler tab, in the Import section, click the Import button.
2
In the Import Data dialog box, in the Import? column, select the check box for the DataTimeTable variable.
3
Click Import.
The variable PSSG appears in the Time Series pane, and its time series plot appears in the Time Series Plot(PSSG) figure window.
The series exhibits a seasonal trend, serial correlation, and possible exponential growth. For an interactive analysis of serial correlation, see “Detect Serial Correlation Using Econometric Modeler App” on page 4-71.
Import Data from MAT-File
This example shows how to import data from a MAT-file, stored on your machine, into Econometric Modeler. Suppose that the data set is named Data_Airline.mat and is stored in the MyData folder of your C drive.
At the command line, open the Econometric Modeler app.
econometricModeler
Alternatively, open the app from the apps gallery (see Econometric Modeler).
Import the MAT-file containing the data:
1
On the Econometric Modeler tab, in the Import section, click Import > Import From MAT-file.
2
In the Select a MAT-file window, browse to the C:\MyData folder. Select Data_Airline.mat, then click Open.
3
In the Import Data dialog box, in the Import? column, select the check box for the DataTimeTable variable.
4
Click Import.
The variable PSSG appears in the Time Series pane, and its time series plot appears in the Time Series Plot(PSSG) figure window.
See Also Apps Econometric Modeler Objects timetable
More About •
“Analyze Time Series Data Using Econometric Modeler” on page 4-2
•
“Prepare Time Series Data for Econometric Modeler App” on page 4-59
•
“Plot Time Series Data Using Econometric Modeler App” on page 4-66
•
Creating ARIMA Models Using Econometric Modeler App
Plot Time Series Data Using Econometric Modeler App
These examples show how to plot univariate and multivariate time series data by using the Econometric Modeler app. After plotting time series, you can interact with the plots.
Plot Univariate Time Series Data
This example shows how to plot univariate time series data, then overlay recession bands in the plot. The data set contains the quarterly US gross domestic product (GDP) prices from 1947 through 2005.
At the command line, load the Data_GDP.mat data set.
load Data_GDP
DataTimeTable is a timetable containing time series data.
At the command line, open the Econometric Modeler app.
econometricModeler
Alternatively, open the app from the apps gallery (see Econometric Modeler).
Import DataTimeTable into the app:
1
On the Econometric Modeler tab, in the Import section, click the Import button.
2
In the Import Data dialog box, in the Import? column, select the check box for the DataTimeTable variable.
3
Click Import.
The variable GDP appears in the Time Series pane, and its time series plot appears in the Time Series Plot(GDP) figure window.
Overlay recession bands by right-clicking the plot in the figure window, then selecting Show Recessions. Overlay a grid by pausing on the plot and clicking the grid icon.
Focus on the GDP from 1970 to the end of the sampling period:
1
Pause on the plot, then click the zoom-in icon.
2
Position the cross hair at (1970,12000), then drag the cross hair to (2005,3500).
The GDP appears flat or decreasing before and during periods of recession.
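A similar plot can be sketched at the command line with the recessionplot function listed under See Also. The code assumes that recessionplot accepts a datetime time axis in your release and that the timetable variable is named GDP, as the Time Series pane suggests.

```matlab
% Sketch: plot GDP with recession bands and focus on 1970 onward.
load Data_GDP
figure
plot(DataTimeTable.Properties.RowTimes,DataTimeTable.GDP)
recessionplot                                     % overlay recession bands on the current axes
grid on                                           % counterpart of the grid overlay in the app
xlim([datetime(1970,1,1) datetime(2005,12,31)])   % horizontal range of the zoom step
title('GDP')
```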
Plot Multivariate Time Series and Correlations
This example shows how to plot multiple series on the same time series plot, interact with the resulting plot, and plot the correlations among the variables. The data set, stored in Data_Canada, contains annual Canadian inflation and interest rates from 1954 through 1994.
At the command line, load the Data_Canada.mat data set.
load Data_Canada
At the command line, open the Econometric Modeler app.
econometricModeler
Alternatively, open the app from the apps gallery (see Econometric Modeler).
Import DataTimeTable into the app:
1
On the Econometric Modeler tab, in the Import section, click the Import button.
2
In the Import Data dialog box, in the Import? column, select the check box for the DataTimeTable variable.
3
Click Import.
The Canadian interest and inflation rate variables appear in the Time Series pane, and a time series plot containing all the series appears in the Time Series Plot(INF_C) figure window.
Overlay recession bands by right-clicking the plot and selecting Show Recessions. Overlay a grid by pausing on the plot and clicking the grid icon.
Remove the inflation rates (INF_C and INF_G) from the time series plot:
1
Right-click the plot.
2
Point to Show Time Series, then clear INF_C.
3
Repeat steps 1 and 2, but clear INF_G instead.
Generate a correlation plot for all variables: 1
Select all variables in the Time Series pane.
2
Click the Plots tab, then click Correlations.
A correlations plot appears in the Correlations(INF_C) figure window. Remove the inflation rate based on GDP (INF_G) from the correlations plot: 1
Right-click the plot.
2
Point to Show Time Series, then clear INF_G.
All variables appear to be skewed to the right. According to the Pearson correlation coefficients (top left of the off-diagonal plots):
• The inflation rate explains at least 70% of the variability in the interest rates (when used as a predictor in a linear regression).
• The interest rates are highly correlated; each explains at least 94% of the variability in another series.
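To inspect the coefficients numerically, you can compute them at the command line. The sketch below relies on the fact that, in a simple linear regression with one predictor, the squared Pearson correlation equals the proportion of variance explained. The variable names X, names, R, and R2 are assumptions.

```matlab
% Sketch: Pearson correlations and their squares for the Canadian series.
load Data_Canada
X = DataTimeTable{:,:};                            % 41-by-5 numeric matrix
names = DataTimeTable.Properties.VariableNames;    % series names used as labels

R = corrcoef(X);                                   % 5-by-5 Pearson correlation matrix
R2 = R.^2;                                         % squared correlations
disp(array2table(R2,'VariableNames',names,'RowNames',names))

corrplot(X,'VarNames',names)                       % command-line correlation plot
```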
See Also
Apps
Econometric Modeler
Functions
corrplot | recessionplot
More About
•
“Analyze Time Series Data Using Econometric Modeler” on page 4-2
•
“Prepare Time Series Data for Econometric Modeler App” on page 4-59
•
“Import Time Series Data into Econometric Modeler App” on page 4-62
•
“Detect Serial Correlation Using Econometric Modeler App” on page 4-71
•
“Transform Time Series Using Econometric Modeler App” on page 4-97
•
Creating ARIMA Models Using Econometric Modeler App
Detect Serial Correlation Using Econometric Modeler App
These examples show how to assess serial correlation by using the Econometric Modeler app. Methods include plotting the autocorrelation function (ACF) and partial autocorrelation function (PACF), and testing for significant lag coefficients using the Ljung-Box Q-test. The data set Data_Overshort.mat contains 57 consecutive days of overshorts from a gasoline tank in Colorado.
Plot ACF and PACF
This example shows how to plot the ACF and PACF of a time series.
At the command line, load the Data_Overshort.mat data set.
load Data_Overshort
At the command line, open the Econometric Modeler app.
econometricModeler
Alternatively, open the app from the apps gallery (see Econometric Modeler).
Import DataTimeTable into the app:
1
On the Econometric Modeler tab, in the Import section, click the Import button.
2
In the Import Data dialog box, in the Import? column, select the check box for the DataTimeTable variable.
3
Click Import.
The variable OSHORT appears in the Time Series pane, and its time series plot appears in the Time Series Plot(OSHORT) figure window.
The series appears to be stationary.
Close the Time Series Plot(OSHORT) figure window. Plot the ACF of OSHORT by clicking the Plots tab then clicking ACF. The ACF appears in the ACF(OSHORT) figure window. Plot the PACF of OSHORT by clicking the Plots tab then clicking PACF. The PACF appears in the PACF(OSHORT) figure window. Position the correlograms so that you can view them at the same time by dragging the PACF(OSHORT) figure window to the bottom of the right pane.
The sample ACF and PACF exhibit significant autocorrelation (that is, both contain lags that are more than two standard deviations away from 0). The sample ACF shows that the autocorrelation at lag 1 is significant. The sample PACF shows that the autocorrelations at lags 1, 3, and 4 are significant. The distinct cutoff of the ACF and the more gradual decay of the PACF suggest an MA(1) model might be appropriate for this data.
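The same correlograms can be sketched at the command line with the autocorr and parcorr functions listed under See Also. The variable name y is an assumption.

```matlab
% Sketch: stacked sample ACF and PACF of the overshort series.
load Data_Overshort
y = DataTimeTable.OSHORT;

figure
subplot(2,1,1)
autocorr(y)      % sample ACF with confidence bounds
subplot(2,1,2)
parcorr(y)       % sample PACF with confidence bounds
```

Stacking the two plots in one figure reproduces the side-by-side comparison that the app arrangement is meant to support.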
Conduct Ljung-Box Q-Test for Significant Autocorrelation
This example shows how to conduct the Ljung-Box Q-test for significant autocorrelation lags.
At the command line, load the Data_Overshort.mat data set.
load Data_Overshort
At the command line, open the Econometric Modeler app.
econometricModeler
Alternatively, open the app from the apps gallery (see Econometric Modeler).
Import DataTimeTable into the app:
1
On the Econometric Modeler tab, in the Import section, click the Import button.
2
In the Import Data dialog box, in the Import? column, select the check box for the DataTimeTable variable.
3
Click Import.
The variable OSHORT appears in the Time Series pane, and its time series plot appears in the Time Series Plot(OSHORT) figure window.
The series appears to be stationary, and it fluctuates around a constant mean. Therefore, you do not need to transform the data before conducting the test. Conduct three Ljung-Box Q-Tests for testing the null hypothesis that the first 10, 5, and 1 autocorrelations are jointly zero:
1
On the Econometric Modeler tab, in the Tests section, click New Test > Ljung-Box Q-Test.
2
On the LBQ tab, in the Parameters section:
a
Set Number of Lags to 10.
b
Set DOF to 10.
c
To achieve a false positive rate below 0.05, use the Bonferroni correction to set Significance Level to 0.05/3 = 0.0167.
3
In the Tests section, click Run Test.
4
Repeat steps 2 and 3 twice, with these changes: a
Set Number of Lags to 5 and the DOF to 5.
b
Set Number of Lags to 1 and the DOF to 1.
The test results appear in the Results table of the LBQ(OSHORT) document.
The results show that not every autocorrelation up to lag 5 (or 10) is zero, indicating significant serial correlation in the series.
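The three tests can also be run in one call to the lbqtest function listed under See Also. This is a minimal sketch that assumes the app tests the OSHORT series as imported; the variable names are hypothetical.

```matlab
% Sketch: three Ljung-Box Q-tests with a Bonferroni-adjusted significance level.
load Data_Overshort
y = DataTimeTable.OSHORT;

lags = [10 5 1];                  % number of lags for each test
alpha = 0.05/numel(lags);         % 0.05/3, approximately 0.0167

[h,pValue,stat,cValue] = lbqtest(y,'Lags',lags,'DOF',lags,'Alpha',alpha);

% h = 1 rejects the null hypothesis of jointly zero autocorrelations.
disp(table(lags(:),h(:),pValue(:),stat(:),cValue(:), ...
    'VariableNames',{'Lags','Reject','PValue','Statistic','CriticalValue'}))
```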
See Also
Apps
Econometric Modeler
Functions
parcorr | autocorr | lbqtest
More About •
“Detect Autocorrelation” on page 3-20
•
“Analyze Time Series Data Using Econometric Modeler” on page 4-2
•
“Prepare Time Series Data for Econometric Modeler App” on page 4-59
•
“Plot Time Series Data Using Econometric Modeler App” on page 4-66
•
“Detect ARCH Effects Using Econometric Modeler App” on page 4-77
•
“Box-Jenkins Methodology” on page 3-2
•
“Autocorrelation and Partial Autocorrelation” on page 3-11
•
“Ljung-Box Q-Test” on page 3-18
•
“Implement Box-Jenkins Model Selection and Estimation Using Econometric Modeler App” on page 4-112
•
Creating ARIMA Models Using Econometric Modeler App
Detect ARCH Effects Using Econometric Modeler App
These examples show how to assess whether a series has volatility clustering by using the Econometric Modeler app. Methods include inspecting correlograms of squared residuals and testing for significant ARCH lags. The data set, stored in Data_EquityIdx.mat, contains a series of daily NASDAQ closing prices from 1990 through 2001.
Inspect Correlograms of Squared Residuals for ARCH Effects
This example shows how to visually determine whether a series has significant ARCH effects by plotting the autocorrelation function (ACF) and partial autocorrelation function (PACF) of a series of squared residuals.
At the command line, load the Data_EquityIdx.mat data set.
load Data_EquityIdx
The data set contains a table of NASDAQ and NYSE closing prices, among other variables. For more details about the data set, enter Description at the command line.
At the command line, open the Econometric Modeler app.
econometricModeler
Alternatively, open the app from the apps gallery (see Econometric Modeler).
Import DataTimeTable into the app:
1
On the Econometric Modeler tab, in the Import section, click the Import button.
2
In the Import Data dialog box, in the Import? column, select the check box for the DataTimeTable variable.
3
Click Import.
The variables appear in the Time Series pane, and a time series plot of all the series appears in the Time Series Plot(NASDAQ) figure window. Convert the daily close NASDAQ index series to a percentage return series by taking the log of the series, then taking the first difference of the logged series: 1
In the Time Series pane, select NASDAQ.
2
On the Econometric Modeler tab, in the Transforms section, click Log.
3
With NASDAQLog selected, in the Transforms section, click Difference.
4
In the Time Series pane, rename the NASDAQLogDiff variable by clicking it twice to select its name and entering NASDAQReturns.
The time series plot of the NASDAQ returns appears in the Time Series Plot(NASDAQReturns) figure window.
The returns appear to fluctuate around a constant level, but exhibit volatility clustering. Large changes in the returns tend to cluster together, and small changes tend to cluster together. That is, the series exhibits conditional heteroscedasticity.
Compute squared residuals:
1
Export NASDAQReturns to the MATLAB Workspace:
a
In the Time Series pane, right-click NASDAQReturns.
b
In the context menu, select Export.
NASDAQReturns appears in the MATLAB Workspace.
2
At the command line:
a
For numerical stability, scale the returns by a factor of 100.
b
Create a residual series by removing the mean from the scaled returns series. Because you took the first difference of the NASDAQ prices to create the returns, the first element of the returns is missing. Therefore, to estimate the sample mean of the series, call mean(NASDAQReturns,'omitnan').
c
Square the residuals.
d
Add the squared residuals as a new variable to the DataTimeTable timetable.
NASDAQReturns = 100*NASDAQReturns;
NASDAQResiduals = NASDAQReturns - mean(NASDAQReturns,'omitnan');
NASDAQResiduals2 = NASDAQResiduals.^2;
DataTimeTable.NASDAQResiduals2 = NASDAQResiduals2;
In Econometric Modeler, import DataTimeTable:
1. On the Econometric Modeler tab, in the Import section, click the Import button.
2. In the Econometric Modeler dialog box, click OK to clear all variables and documents in the app.
3. In the Import Data dialog box, in the Import? column, select the check box for DataTimeTable.
4. Click Import.
Plot the ACF and PACF:
1. In the Time Series pane, select the NASDAQResiduals2 time series.
2. Click the Plots tab, then click ACF.
3. Click the Plots tab, then click PACF.
4. Close the Time Series Plot(NASDAQ) figure window. Then, position the ACF(NASDAQResiduals2) figure window above the PACF(NASDAQResiduals2) figure window.
The sample ACF and PACF show significant autocorrelation in the squared residuals. This result indicates that volatility clustering is present.
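The same correlograms can also be drawn at the command line with autocorr and parcorr, both listed under See Also for this topic. The following is a minimal sketch rather than the app's own implementation: the variable names ret, res, and res2 are illustrative, and the leading NaN is left out instead of being prepended.

```matlab
% Command-line sketch of the correlogram inspection.
% Assumes Econometrics Toolbox and its shipped Data_EquityIdx data set.
load Data_EquityIdx                          % provides DataTimeTable

ret  = 100*price2ret(DataTimeTable.NASDAQ);  % scaled returns (illustrative name)
res  = ret - mean(ret);                      % mean-adjusted residuals
res2 = res.^2;                               % squared residuals

figure
subplot(2,1,1)
autocorr(res2)                               % sample ACF with confidence bounds
subplot(2,1,2)
parcorr(res2)                                % sample PACF with confidence bounds
```

Stacking the two plots in one figure mirrors the arrangement of the ACF and PACF figure windows described in the steps above.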
Conduct Ljung-Box Q-Test on Squared Residuals
This example shows how to test squared residuals for significant ARCH effects using the Ljung-Box Q-test.
At the command line:
1. Load the Data_EquityIdx.mat data set.
2. Convert the NASDAQ prices to returns. To maintain the correct time base, prepend the resulting returns with a NaN value.
3. Scale the NASDAQ returns.
4. Compute residuals by removing the mean from the scaled returns.
5. Square the residuals.
6. Add the vector of squared residuals as a variable to DataTimeTable.
For more details on the steps, see “Inspect Correlograms of Squared Residuals for ARCH Effects” on page 4-77.
load Data_EquityIdx
NASDAQReturns = 100*price2ret(DataTimeTable.NASDAQ);
NASDAQReturns = [NaN; NASDAQReturns];
NASDAQResiduals2 = (NASDAQReturns - mean(NASDAQReturns,'omitnan')).^2;
DataTimeTable.NASDAQResiduals2 = NASDAQResiduals2;
At the command line, open the Econometric Modeler app.
econometricModeler
Alternatively, open the app from the apps gallery (see Econometric Modeler).
Import DataTimeTable into the app:
1. On the Econometric Modeler tab, in the Import section, click the Import button.
2. In the Import Data dialog box, in the Import? column, select the check box for the DataTimeTable variable.
3. Click Import.
The variables appear in the Time Series pane, and a time series plot of all the series appears in the Time Series Plot(NASDAQ) figure window.
Test the null hypothesis that the first m = 5 autocorrelation lags of the squared residuals are jointly zero by using the Ljung-Box Q-test. Then, test the null hypothesis that the first m = 10 autocorrelation lags of the squared residuals are jointly zero.
1. In the Time Series pane, select the NASDAQResiduals2 time series.
2. On the Econometric Modeler tab, in the Tests section, click New Test > Ljung-Box Q-Test.
3. On the LBQ tab, in the Parameters section, set both the Number of Lags and DOF to 5. To maintain a significance level of 0.05 for the two tests, set Significance Level to 0.025.
4. In the Tests section, click Run Test.
5. Repeat steps 3 and 4, but set both the Number of Lags and DOF to 10 instead.
The test results appear in the Results table of the LBQ(NASDAQResiduals2) document.
The null hypothesis is rejected for the two tests. The p-value for each test is 0. The results show that not every autocorrelation of the squared residuals up to lag 5 (or 10) is zero, which indicates volatility clustering in the residual series.
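A command-line counterpart of the two tests can be sketched with lbqtest, which appears under See Also for this topic. The sketch assumes the same settings as the app steps (5 and 10 lags, matching DOF, and a 0.025 level per test); the variable names are illustrative, and the leading NaN is omitted rather than prepended.

```matlab
% Sketch: Ljung-Box Q-tests on squared residuals at the command line.
% Assumes Econometrics Toolbox and its shipped Data_EquityIdx data set.
load Data_EquityIdx

ret  = 100*price2ret(DataTimeTable.NASDAQ);  % scaled returns (illustrative name)
res2 = (ret - mean(ret)).^2;                 % squared mean-adjusted residuals

% One call runs both tests: m = 5 and m = 10 lags, each at level 0.025.
[h,pValue,stat,cValue] = lbqtest(res2, ...
    'Lags',[5 10],'DOF',[5 10],'Alpha',[0.025 0.025]);
```

Each output is a vector with one element per test, so h(1) and h(2) hold the rejection decisions for 5 and 10 lags, respectively.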
Conduct Engle's ARCH Test
This example shows how to test residuals for significant ARCH effects using Engle's ARCH test.
At the command line:
1. Load the Data_EquityIdx.mat data set.
2. Convert the NASDAQ prices to returns. To maintain the correct time base, prepend the resulting returns with a NaN value.
3. Scale the NASDAQ returns.
4. Compute residuals by removing the mean from the scaled returns.
5. Add the vector of residuals as a variable to DataTimeTable.
For more details on the steps, see “Inspect Correlograms of Squared Residuals for ARCH Effects” on page 4-77.
load Data_EquityIdx
NASDAQReturns = 100*price2ret(DataTimeTable.NASDAQ);
NASDAQReturns = [NaN; NASDAQReturns];
NASDAQResiduals = NASDAQReturns - mean(NASDAQReturns,'omitnan');
DataTimeTable.NASDAQResiduals = NASDAQResiduals;
At the command line, open the Econometric Modeler app.
econometricModeler
Alternatively, open the app from the apps gallery (see Econometric Modeler).
Import DataTimeTable into the app:
1. On the Econometric Modeler tab, in the Import section, click the Import button.
2. In the Import Data dialog box, in the Import? column, select the check box for the DataTimeTable variable.
3. Click Import.
The variables appear in the Time Series pane, and a time series plot of all the series appears in the Time Series Plot(NASDAQ) figure window.
Test the null hypothesis that the NASDAQ residuals series exhibits no ARCH effects by using Engle's ARCH test. Specify an ARCH(2) model as the alternative.
1. In the Time Series pane, select the NASDAQResiduals time series.
2. On the Econometric Modeler tab, in the Tests section, click New Test > Engle's ARCH Test.
3. On the ARCH tab, in the Parameters section, set Number of Lags to 2.
4. In the Tests section, click Run Test.
The test results appear in the Results table of the ARCH(NASDAQResiduals) document.
The null hypothesis is rejected in favor of the ARCH(2) alternative. The test result indicates significant volatility clustering in the residuals.
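For comparison, the same test can be sketched at the command line with archtest, listed under See Also for this topic. The variable names below are illustrative, and the leading NaN is omitted rather than prepended.

```matlab
% Sketch: Engle's ARCH test against an ARCH(2) alternative.
% Assumes Econometrics Toolbox and its shipped Data_EquityIdx data set.
load Data_EquityIdx

ret = 100*price2ret(DataTimeTable.NASDAQ);   % scaled returns (illustrative name)
res = ret - mean(ret);                       % mean-adjusted residuals

% h = 1 indicates rejection of the no-ARCH-effects null hypothesis.
[h,pValue,stat,cValue] = archtest(res,'Lags',2);
```

Unlike the Ljung-Box approach, archtest takes the residuals themselves, not their squares.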
See Also
Apps
Econometric Modeler
Functions
parcorr | autocorr | lbqtest | archtest
More About
• “Detect ARCH Effects” on page 3-28
• “Analyze Time Series Data Using Econometric Modeler” on page 4-2
• “Prepare Time Series Data for Econometric Modeler App” on page 4-59
• “Assess Stationarity of Time Series Using Econometric Modeler” on page 4-84
• “Autocorrelation and Partial Autocorrelation” on page 3-11
• “Ljung-Box Q-Test” on page 3-18
• “Engle’s ARCH Test” on page 3-26
Assess Stationarity of Time Series Using Econometric Modeler
These examples show how to conduct statistical hypothesis tests for assessing whether a time series is a unit root process by using the Econometric Modeler app. The test you use depends on your assumptions about the nature of the nonstationarity of an underlying model.
Test Assuming Unit Root Null Model
This example uses the Augmented Dickey-Fuller and Phillips-Perron tests to assess whether a time series is a unit root process. The null hypothesis for both tests is that the time series is a unit root process. The data set, stored in Data_USEconModel.mat, contains the US gross domestic product (GDP) measured quarterly, among other series.
At the command line, load the Data_USEconModel.mat data set.
load Data_USEconModel
At the command line, open the Econometric Modeler app.
econometricModeler
Alternatively, open the app from the apps gallery (see Econometric Modeler).
Import DataTimeTable into the app:
1. On the Econometric Modeler tab, in the Import section, click the Import button.
2. In the Import Data dialog box, in the Import? column, select the check box for the DataTimeTable variable.
3. Click Import.
The variables, including GDP, appear in the Time Series pane, and a time series plot of all the series appears in the Time Series Plot(COE) figure window.
In the Time Series pane, double-click GDP. A time series plot of GDP appears in the Time Series Plot(GDP) figure window.
The series appears to grow without bound. Apply the log transformation to GDP. On the Econometric Modeler tab, in the Transforms section, click Log. In the Time Series pane, a variable representing the logged GDP (GDPLog) appears. A time series plot of the logged GDP appears in the Time Series Plot(GDPLog) figure window.
The logged GDP series appears to have a time trend or drift term.
Using the Augmented Dickey-Fuller test, test the null hypothesis that the logged GDP series has a unit root against a trend stationary AR(1) model alternative. Conduct a separate test for an AR(1) model with drift alternative. For the null hypothesis of both tests, include the restriction that the trend and drift terms, respectively, are zero by conducting F tests.
1. With GDPLog selected in the Time Series pane, on the Econometric Modeler tab, in the Tests section, click New Test > Augmented Dickey-Fuller Test.
2. On the ADF tab, in the Parameters section:
   a. Set Number of Lags to 1.
   b. Select Model > Trend Stationary.
   c. Select Test Statistic > F statistic.
3. In the Tests section, click Run Test.
4. Repeat steps 2 and 3, but select Model > Autoregressive with Drift instead.
The test results appear in the Results table of the ADF(GDPLog) document.
For the test supposing a trend stationary AR(1) model alternative, the null hypothesis is not rejected. For the test assuming an AR(1) model with drift, the null hypothesis is rejected.
Apply the Phillips-Perron test using the same assumptions as in the Augmented Dickey-Fuller tests, except the trend and drift terms in the null model cannot be zero.
1. With GDPLog selected in the Time Series pane, click the Econometric Modeler tab. Then, in the Tests section, click New Test > Phillips-Perron Test.
2. On the PP tab, in the Parameters section:
   a. Set Number of Lags to 1.
   b. Select Model > Trend Stationary.
3. In the Tests section, click Run Test.
4. Repeat steps 2 and 3, but select Model > Autoregressive with Drift instead.
The test results appear in the Results table of the PP(GDPLog) document.
The null hypothesis is not rejected in either test. These results suggest that the logged GDP possibly has a unit root. The difference in the null models can account for the differences between the Augmented Dickey-Fuller and Phillips-Perron test results.
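The four unit root tests above can be sketched at the command line with adftest (listed under See Also for this topic) and pptest. This is an illustrative sketch: the output variable names are made up, and the mapping of the app's Trend Stationary and Autoregressive with Drift options to the 'TS' and 'ARD' model codes is an assumption.

```matlab
% Sketch: Augmented Dickey-Fuller and Phillips-Perron tests on logged GDP.
% Assumes Econometrics Toolbox and its shipped Data_USEconModel data set.
load Data_USEconModel
logGDP = log(DataTimeTable.GDP);

% ADF F tests with one lagged difference term.
[hAdfTS,pAdfTS]   = adftest(logGDP,'Model','TS','Lags',1,'Test','F');
[hAdfARD,pAdfARD] = adftest(logGDP,'Model','ARD','Lags',1,'Test','F');

% Phillips-Perron tests with one autocovariance lag.
[hPpTS,pPpTS]   = pptest(logGDP,'Model','TS','Lags',1);
[hPpARD,pPpARD] = pptest(logGDP,'Model','ARD','Lags',1);
```

In each case h = 1 means the unit root null hypothesis is rejected in favor of the named alternative.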
Test Assuming Stationary Null Model
This example uses the Kwiatkowski, Phillips, Schmidt, and Shin (KPSS) test to assess whether a time series is a unit root process. The null hypothesis is that the time series is stationary. The data set, stored in Data_NelsonPlosser.mat, contains annual nominal wages, among other US macroeconomic series.
At the command line, load the Data_NelsonPlosser.mat data set.
load Data_NelsonPlosser
At the command line, open the Econometric Modeler app.
econometricModeler
Alternatively, open the app from the apps gallery (see Econometric Modeler).
Import DataTimeTable into the app:
1. On the Econometric Modeler tab, in the Import section, click the Import button.
2. In the Import Data dialog box, in the Import? column, select the check box for the DataTimeTable variable.
3. Click Import.
The variables, including the nominal wages WN, appear in the Time Series pane, and a time series plot of all the series appears in the Time Series Plot(BY) figure window.
In the Time Series pane, double-click WN. A time series plot of WN appears in the Time Series Plot(WN) figure window.
The series appears to grow without bound, and wage measurements are missing before 1900. To zoom into values occurring after 1900, pause on the plot, click the zoom-in button, and enclose the time series in the box produced by dragging the cross hair.
Apply the log transformation to WN. On the Econometric Modeler tab, in the Transforms section, click Log. In the Time Series pane, a variable representing the logged wages (WNLog) appears. The logged series appears in the Time Series Plot(WNLog) figure window.
The logged wages appear to have a linear trend.
Using the KPSS test, test the null hypothesis that the logged wages are trend stationary against the unit root alternative. As suggested in [1], conduct three separate tests by specifying 7, 9, and 11 lags in the autoregressive model.
1. With WNLog selected in the Time Series pane, on the Econometric Modeler tab, in the Tests section, click New Test > KPSS Test.
2. On the KPSS tab, in the Parameters section, set Number of Lags to 7.
3. In the Tests section, click Run Test.
4. Repeat steps 2 and 3, but set Number of Lags to 9 instead.
5. Repeat steps 2 and 3, but set Number of Lags to 11 instead.
The test results appear in the Results table of the KPSS(WNLog) document.
All tests fail to reject the null hypothesis that the logged wages are trend stationary.
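The three KPSS tests can be sketched in a single call to kpsstest, which is listed under See Also for this topic. The variable name logWN is illustrative, and the sketch relies on the function's default of including a trend term, which is assumed to match the trend stationary null used in the app.

```matlab
% Sketch: KPSS tests on logged nominal wages with 7, 9, and 11 lags.
% Assumes Econometrics Toolbox and its shipped Data_NelsonPlosser data set.
load Data_NelsonPlosser

logWN = log(DataTimeTable.WN);
logWN = logWN(~isnan(logWN));      % drop the years with missing wage data

% A vector of lags conducts one test per element.
[h,pValue,stat,cValue] = kpsstest(logWN,'Lags',[7 9 11]);
```

Here h = 0 for an element means that the corresponding test fails to reject trend stationarity.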
Test Assuming Random Walk Null Model
This example uses the variance ratio test to assess the null hypothesis that a time series is a random walk. The data set, stored in CAPMuniverse.mat, contains market data for daily returns of stocks and cash (money market) from the period January 1, 2000 to November 7, 2005.
At the command line, load the CAPMuniverse.mat data set.
load CAPMuniverse
The series are in the timetable AssetsTimeTable. The first column of data (AAPL) is the daily return of a technology stock. The last column is the daily return for cash (the daily money market rate, CASH).
Accumulate the daily technology stock and cash returns.
AssetsTimeTable.AAPLcumsum = cumsum(AssetsTimeTable.AAPL);
AssetsTimeTable.CASHcumsum = cumsum(AssetsTimeTable.CASH);
At the command line, open the Econometric Modeler app.
econometricModeler
Alternatively, open the app from the apps gallery (see Econometric Modeler).
Import AssetsTimeTable into the app:
1. On the Econometric Modeler tab, in the Import section, click the Import button.
2. In the Import Data dialog box, in the Import? column, select the check box for the AssetsTimeTable variable.
3. Click Import.
The variables, including stock and cash prices (AAPLcumsum and CASHcumsum), appear in the Time Series pane, and a time series plot of all the series appears in the Time Series Plot(AAPL) figure window. In the Time Series pane, double-click AAPLcumsum. A time series plot of AAPLcumsum appears in the Time Series Plot(AAPLcumsum) figure window.
The accumulated returns of the stock appear to wander at first, with high variability, and then grow without bound after 2004.
Using the variance ratio test, test the null hypothesis that the series of accumulated stock returns is a random walk. First, test without assuming IID innovations for the alternative model, then test assuming IID innovations.
1. With AAPLcumsum selected in the Time Series pane, on the Econometric Modeler tab, in the Tests section, click New Test > Variance Ratio Test.
2. On the VRatio tab, in the Tests section, click Run Test.
3. On the VRatio tab, in the Parameters section, select the IID Innovations check box.
4. In the Tests section, click Run Test.
The test results appear in the Results table of the VRatio(AAPLcumsum) document.
Without assuming IID innovations for the alternative model, the test fails to reject the random walk null model. However, assuming IID innovations, the test rejects the null hypothesis. This result might be due to heteroscedasticity in the series, that is, the series might be a heteroscedastic random walk. In the Time Series pane, double-click CASHcumsum. A time series plot of CASHcumsum appears in the Time Series Plot(CASHcumsum) figure window.
The series of accumulated cash returns exhibits low variability and appears to have long-term trends.
Test the null hypothesis that the series of accumulated cash returns is a random walk:
1. With CASHcumsum selected in the Time Series pane, on the Econometric Modeler tab, in the Tests section, click New Test > Variance Ratio Test.
2. On the VRatio tab, in the Parameters section, clear the IID Innovations box.
3. In the Tests section, click Run Test.
The test results appear in the Results table of the VRatio(CASHcumsum) document.
The test rejects the null hypothesis that the series of accumulated cash returns is a random walk.
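The variance ratio tests on both accumulated series can be sketched at the command line with vratiotest, listed under See Also for this topic. The variable names are illustrative; the sketch assumes the function's default (innovations not assumed IID) corresponds to a cleared IID Innovations check box in the app.

```matlab
% Sketch: variance ratio tests for a random walk null model.
% Assumes Econometrics Toolbox and its shipped CAPMuniverse data set.
load CAPMuniverse

aaplCum = cumsum(AssetsTimeTable.AAPL);   % accumulated stock returns
cashCum = cumsum(AssetsTimeTable.CASH);   % accumulated cash returns

[hAaplHet,pAaplHet] = vratiotest(aaplCum);             % IID not assumed
[hAaplIID,pAaplIID] = vratiotest(aaplCum,'IID',true);  % IID assumed
[hCash,pCash]       = vratiotest(cashCum);             % IID not assumed
```

Comparing hAaplHet with hAaplIID shows how the IID assumption alone can change the conclusion for the same series.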
References
[1] Kwiatkowski, D., P. C. B. Phillips, P. Schmidt, and Y. Shin. “Testing the Null Hypothesis of Stationarity against the Alternative of a Unit Root.” Journal of Econometrics. Vol. 54, 1992, pp. 159–178.
See Also
adftest | kpsstest | lmctest | vratiotest
More About
• “Analyze Time Series Data Using Econometric Modeler” on page 4-2
• “Prepare Time Series Data for Econometric Modeler App” on page 4-59
• “Assess Stationarity of a Time Series” on page 3-51
• “Unit Root Nonstationarity” on page 3-33
• “Unit Root Tests” on page 3-41
Assess Collinearity Among Multiple Series Using Econometric Modeler App
This example shows how to assess the strengths and sources of collinearity among multiple series by using Belsley collinearity diagnostics in the Econometric Modeler app. The data set, stored in Data_Canada.mat, contains annual Canadian inflation and interest rates from 1954 through 1994.
At the command line, load the Data_Canada.mat data set.
load Data_Canada
At the command line, open the Econometric Modeler app.
econometricModeler
Alternatively, open the app from the apps gallery (see Econometric Modeler).
Import DataTimeTable into the app:
1. On the Econometric Modeler tab, in the Import section, click the Import button.
2. In the Import Data dialog box, in the Import? column, select the check box for the DataTimeTable variable.
3. Click Import.
The Canadian interest and inflation rate variables appear in the Time Series pane, and a time series plot of all the series appears in the Time Series Plot(INF_C) figure window.
Perform Belsley collinearity diagnostics on all series. On the Econometric Modeler tab, in the Tests section, click New Test > Belsley Collinearity Diagnostics. The Collinearity(INF_C) document appears with the following results:
• A table of singular values, corresponding condition indices, and corresponding variable variance-decomposition proportions
• A plot of the variable variance-decomposition proportions corresponding to the condition index that is above the threshold, and a horizontal line indicating the variance-decomposition threshold
The interest rates have variance-decomposition proportions exceeding the default tolerance, 0.5, indicated by red markers in the plot. This result suggests that the interest rates exhibit multicollinearity. If you use the three interest rates as predictors in a linear regression model, then the predictor data matrix can be ill conditioned.
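The same diagnostics can be sketched at the command line with collintest, listed under See Also for this topic. The variable names X and names are illustrative, and the sketch passes the series as a numeric matrix and keeps the function's default tolerances, which are assumed to match the app defaults.

```matlab
% Sketch: Belsley collinearity diagnostics on all series in the timetable.
% Assumes Econometrics Toolbox and its shipped Data_Canada data set.
load Data_Canada

X     = DataTimeTable.Variables;                  % numeric matrix of the series
names = DataTimeTable.Properties.VariableNames;   % series names for display

% Displays the diagnostics table and plots the variance-decomposition
% proportions for condition indices above the default tolerance.
[sValue,condIdx,VarDecomp] = collintest(X,'VarNames',names,'Plot','on');
```

Rows of VarDecomp correspond to condition indices and columns to variables, so a row with several large proportions points to the variables involved in a near dependency.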
See Also
collintest
More About
• “Analyze Time Series Data Using Econometric Modeler” on page 4-2
• “Prepare Time Series Data for Econometric Modeler App” on page 4-59
• “Time Series Regression II: Collinearity and Estimator Variance” on page 5-180
Transform Time Series Using Econometric Modeler App
The Econometric Modeler app enables you to transform time series data based on deterministic or stochastic trends you see in plots or hypothesis test conclusions. Available transformations in the app are log, seasonal and nonseasonal difference, and linear detrend. These examples show how to apply each transformation to time series data.
Apply Log Transformation to Data
This example shows how to stabilize a time series, whose variability grows with the level of the series, by applying the log transformation. The data set Data_Airline.mat contains monthly counts of airline passengers.
At the command line, load the Data_Airline.mat data set.
load Data_Airline
At the command line, open the Econometric Modeler app.
econometricModeler
Alternatively, open the app from the apps gallery (see Econometric Modeler).
Import DataTimeTable into the app:
1. On the Econometric Modeler tab, in the Import section, click the Import button.
2. In the Import Data dialog box, in the Import? column, select the check box for the DataTimeTable variable.
3. Click Import.
The variable PSSG appears in the Time Series pane, and its time series plot is in the Time Series Plot(PSSG) figure window.
Fit a SARIMA(0,1,1)×(0,1,1)₁₂ model to the data in levels:
1. On the Econometric Modeler tab, in the Models section, click the arrow to display the model gallery.
2. In the models gallery, in the ARMA/ARIMA Models section, click SARIMA.
3. In the SARIMA Model Parameters dialog box, on the Lag Order tab:
   • Nonseasonal section
     a. Set Degrees of Integration to 1.
     b. Set Moving Average Order to 1.
     c. Clear the Include Constant Term check box.
   • Seasonal section
     a. Set Period to 12 to indicate monthly data.
     b. Set Moving Average Order to 1.
     c. Select the Include Seasonal Difference check box.
4. Click Estimate.
The model variable SARIMA_PSSG appears in the Models pane, its value appears in the Preview pane, and its estimation summary appears in the Model Summary(SARIMA_PSSG) document.
The spread of the residuals increases with the level of the data, which is indicative of heteroscedasticity.
Apply the log transform to PSSG:
1. In the Time Series pane, select PSSG.
2. On the Econometric Modeler tab, in the Transforms section, click Log.
The transformed variable PSSGLog appears in the Time Series pane, and its time series plot appears in the Time Series Plot(PSSGLog) figure window.
The exponential growth appears removed from the series.
With PSSGLog selected in the Time Series pane, fit the SARIMA(0,1,1)×(0,1,1)₁₂ model to the logged series using the same dialog box settings that you used for PSSG. The estimation summary appears in the Model Summary(SARIMA_PSSGLog) document.
The spread of the residuals does not appear to change systematically with the levels of the data.
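The comparison between the fits in levels and in logs can be sketched at the command line with arima, estimate, and infer. This is an illustrative sketch, not the code the app generates: the variable names are made up, and the name-value settings are an assumed translation of the dialog box choices above (one nonseasonal difference, a seasonal difference of period 12, MA and seasonal MA terms, no constant).

```matlab
% Sketch: fit SARIMA(0,1,1)x(0,1,1)_12 to the levels and to the logs,
% then compare the residual spread of the two fits.
% Assumes Econometrics Toolbox and its shipped Data_Airline data set.
load Data_Airline
y = DataTimeTable.PSSG;

Mdl = arima('Constant',0,'D',1,'Seasonality',12, ...
    'MALags',1,'SMALags',12);                       % model template

EstLevels = estimate(Mdl,y,'Display','off');        % fit to levels
EstLogs   = estimate(Mdl,log(y),'Display','off');   % fit to logged series

resLevels = infer(EstLevels,y);                     % residuals, levels fit
resLogs   = infer(EstLogs,log(y));                  % residuals, log fit

figure
subplot(2,1,1); plot(resLevels); title('Residuals: levels')
subplot(2,1,2); plot(resLogs);   title('Residuals: logs')
```

Plotting the two residual series on separate axes makes it easier to see whether the spread still grows with the level of the data after the log transformation.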
Stabilize Time Series Using Nonseasonal Differencing
This example shows how to stabilize a time series by applying multiple nonseasonal difference operations. The data set, which is stored in Data_USEconModel.mat, contains the US gross domestic product (GDP) measured quarterly, among other series.
At the command line, load the Data_USEconModel.mat data set.
load Data_USEconModel
At the command line, open the Econometric Modeler app.
econometricModeler
Alternatively, open the app from the apps gallery (see Econometric Modeler).
Import DataTimeTable into the app:
1. On the Econometric Modeler tab, in the Import section, click the Import button.
2. In the Import Data dialog box, in the Import? column, select the check box for the DataTimeTable variable.
3. Click Import.
The variables, including GDP, appear in the Time Series pane, and a time series plot of all the series appears in the Time Series Plot(COE) figure window. In the Time Series pane, double-click GDP. A time series plot of GDP appears in the Time Series Plot(GDP) figure window.
The series appears to grow without bound.
Apply the first difference to GDP. On the Econometric Modeler tab, in the Transforms section, click Difference. In the Time Series pane, a variable representing the differenced GDP (GDPDiff) appears. A time series plot of the differenced GDP appears in the Time Series Plot(GDPDiff) figure window.
The differenced GDP series appears to grow without bound after 1970.
Apply the second difference to the GDP by differencing the differenced GDP. With GDPDiff selected in the Time Series pane, on the Econometric Modeler tab, in the Transforms section, click Difference. In the Time Series pane, a variable representing the transformed differenced GDP (GDPDiffDiff) appears. A time series plot of the transformed differenced GDP appears in the Time Series Plot(GDPDiffDiff) figure window.
The transformed differenced GDP series appears stationary, although heteroscedastic.
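The two differencing steps have a direct command-line analogue in the base MATLAB diff function. The following is a minimal sketch with illustrative variable names. Note that diff shortens the series by one observation per difference.

```matlab
% Sketch: first and second nonseasonal differences of GDP.
% Assumes the Data_USEconModel data set shipped with Econometrics Toolbox.
load Data_USEconModel
gdp = DataTimeTable.GDP;

gdpDiff     = diff(gdp);      % first difference:  y(t) - y(t-1)
gdpDiffDiff = diff(gdp,2);    % second difference, same as diff(diff(gdp))

figure
plot(DataTimeTable.Time(3:end),gdpDiffDiff)
title('Second difference of GDP')
```

The time vector is trimmed by two observations in the plot call so that it lines up with the twice-differenced series.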
Convert Prices to Returns
This example shows how to convert multiple series of prices to returns. The data set, which is stored in Data_USEconModel.mat, contains the US GDP and personal consumption expenditures measured quarterly, among other series.
At the command line, load the Data_USEconModel.mat data set.
load Data_USEconModel
At the command line, open the Econometric Modeler app.
econometricModeler
Alternatively, open the app from the apps gallery (see Econometric Modeler).
Import DataTimeTable into the app:
1. On the Econometric Modeler tab, in the Import section, click the Import button.
2. In the Import Data dialog box, in the Import? column, select the check box for the DataTimeTable variable.
3. Click Import.
GDP and PCEC, among other series, appear in the Time Series pane, and a time series plot containing all series appears in the figure window. In the Time Series pane, click GDP, then press Ctrl and click PCEC. Both series are selected. Click the Plots tab, then click Time Series. A time series plot of GDP and PCEC appears in the Time Series Plot(GDP) figure window.
Both series, as prices, appear to grow without bound.
Convert the GDP and personal consumption expenditure prices to returns:
1. Click the Econometric Modeler tab. Ensure that GDP and PCEC are selected in the Time Series pane.
2. In the Transforms section, click Log. The Time Series pane displays variables representing the logged GDP series (GDPLog) and the logged personal consumption expenditure series (PCECLog).
3. With GDPLog and PCECLog selected in the Time Series pane, in the Transforms section, click Difference.
The Time Series pane displays variables representing the GDP returns (GDPLogDiff) and personal consumption expenditure returns (PCECLogDiff). A time series plot of the GDP and personal consumption expenditure returns appears in the Time Series Plot(GDPLogDiff) figure window. In the Time Series pane, rename the GDPLogDiff and PCECLogDiff variables. Click GDPLogDiff twice to select its name and enter GDPReturns. Click PCECLogDiff twice to select its name and enter PCECReturns. The app updates the names of all documents associated with both returns.
The series of GDP and personal consumption expenditure returns appear stationary, but observations within each series appear serially correlated.
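The log-then-difference conversion can be sketched at the command line for both series at once, because log and diff operate column by column. The variable names below are illustrative.

```matlab
% Sketch: convert two price-like series to returns via log differences.
% Assumes the Data_USEconModel data set shipped with Econometrics Toolbox.
load Data_USEconModel

prices  = [DataTimeTable.GDP DataTimeTable.PCEC];   % one series per column
returns = diff(log(prices));                        % log(p(t)) - log(p(t-1))

figure
plot(DataTimeTable.Time(2:end),returns)
legend('GDPReturns','PCECReturns')
```

For continuously compounded returns, the toolbox function price2ret applied to the same matrix should give matching values up to rounding.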
Remove Seasonal Trend from Time Series Using Seasonal Difference
This example shows how to stabilize a time series exhibiting seasonal integration by applying a seasonal difference. The data set Data_Airline.mat contains monthly counts of airline passengers.
At the command line, load the Data_Airline.mat data set.
load Data_Airline
At the command line, open the Econometric Modeler app.
econometricModeler
Alternatively, open the app from the apps gallery (see Econometric Modeler).
Import DataTimeTable into the app:
1. On the Econometric Modeler tab, in the Import section, click the Import button.
2. In the Import Data dialog box, in the Import? column, select the check box for the DataTimeTable variable.
3. Click Import.
The variable PSSG appears in the Time Series pane, and its time series plot appears in the Time Series Plot(PSSG) figure window. Address the seasonal trend by applying the 12th order seasonal difference. On the Econometric Modeler tab, in the Transforms section, set Seasonal to 12. Then, click Seasonal. The transformed variable PSSGSeasonalDiff appears in the Time Series pane, and its time series plot appears in the Time Series Plot(PSSGSeasonalDiff) figure window.
The transformed series appears to have a nonseasonal trend.
Address the nonseasonal trend by applying the first difference. With PSSGSeasonalDiff selected in the Time Series pane, on the Econometric Modeler tab, in the Transforms section, click Difference. The transformed variable PSSGSeasonalDiffDiff appears in the Time Series pane, and its time series plot appears in the Time Series Plot(PSSGSeasonalDiffDiff) figure window.
The transformed series appears stationary, but observations appear serially correlated. In the Time Series pane, rename the PSSGSeasonalDiffDiff variable by clicking it twice to select its name and entering PSSGStable. The app updates the names of all documents associated with the transformed series.
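The seasonal difference followed by the first difference can be sketched at the command line with plain indexing and diff. The variable names are illustrative, and the result is 13 observations shorter than the original series because no missing values are prepended.

```matlab
% Sketch: 12th-order seasonal difference, then a first difference.
% Assumes the Data_Airline data set shipped with Econometrics Toolbox.
load Data_Airline
y = DataTimeTable.PSSG;
s = 12;                                   % seasonal period (monthly data)

ySeasDiff = y(s+1:end) - y(1:end-s);      % (1 - L^12) y(t)
yStable   = diff(ySeasDiff);              % (1 - L)(1 - L^12) y(t)

figure
plot(DataTimeTable.Time(s+2:end),yStable)
title('Seasonally and nonseasonally differenced PSSG')
```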
Remove Deterministic Trend from Time Series
This example shows how to remove a least-squares-derived deterministic trend from a nonstationary time series. The data set Data_Airline.mat contains monthly counts of airline passengers.
At the command line, load the Data_Airline.mat data set.
load Data_Airline
At the command line, open the Econometric Modeler app.
econometricModeler
Alternatively, open the app from the apps gallery (see Econometric Modeler).
Import DataTimeTable into the app:
1. On the Econometric Modeler tab, in the Import section, click the Import button.
2. In the Import Data dialog box, in the Import? column, select the check box for the DataTimeTable variable.
3. Click Import.
The variable PSSG appears in the Time Series pane, and its time series plot appears in the Time Series Plot(PSSG) figure window. Apply the log transformation to the series. On the Econometric Modeler tab, in the Transforms section, click Log. The transformed variable PSSGLog appears in the Time Series pane, and its time series plot appears in the Time Series Plot(PSSGLog) figure window. Identify the deterministic trend by using least squares. Then, detrend the series by removing the identified deterministic trend. On the Econometric Modeler tab, in the Transforms section, click Detrend. The transformed variable PSSGLogDetrend appears in the Time Series pane, and its time series plot appears in the Time Series Plot(PSSGLogDetrend) figure window.
PSSGLogDetrend does not appear to have a deterministic trend, although it has a marked cyclic trend.
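A least-squares detrend of the logged series can be sketched at the command line with polyfit and polyval. The sketch assumes the deterministic trend is a straight line in time; the variable names are illustrative, and the app's internal computation may differ in detail.

```matlab
% Sketch: remove a least-squares linear trend from the logged series.
% Assumes the Data_Airline data set shipped with Econometrics Toolbox.
load Data_Airline
logY = log(DataTimeTable.PSSG);

t      = (1:numel(logY))';            % time index
coeffs = polyfit(t,logY,1);           % least-squares slope and intercept
trend  = polyval(coeffs,t);           % fitted linear trend
logYDetrend = logY - trend;           % detrended series

figure
plot(DataTimeTable.Time,logYDetrend)
title('Detrended log passenger counts')
```

Because the fitted line includes an intercept, the detrended series has a sample mean of approximately zero.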
See Also
Econometric Modeler
More About
• “Analyze Time Series Data Using Econometric Modeler” on page 4-2
• “Data Transformations” on page 2-2
• “Detect Serial Correlation Using Econometric Modeler App” on page 4-71
• “Plot Time Series Data Using Econometric Modeler App” on page 4-66
• Creating ARIMA Models Using Econometric Modeler App
Implement Box-Jenkins Model Selection and Estimation Using Econometric Modeler App
This example shows how to use the Box-Jenkins methodology to select and estimate an ARIMA model by using the Econometric Modeler app. Then, it shows how to export the estimated model to generate forecasts. The data set, which is stored in Data_JAustralian.mat, contains the log quarterly Australian Consumer Price Index (CPI) measured from 1972 to 1991, among other time series.
Prepare Data for Econometric Modeler
At the command line, load the Data_JAustralian.mat data set.
load Data_JAustralian
Import Data into Econometric Modeler
At the command line, open the Econometric Modeler app.
econometricModeler
Alternatively, open the app from the apps gallery (see Econometric Modeler).
Import DataTimeTable into the app:
1. On the Econometric Modeler tab, in the Import section, click the Import button.
2. In the Import Data dialog box, in the Import? column, select the check box for the DataTimeTable variable.
3. Click Import.
The variables, including PAU, appear in the Time Series pane, and a time series plot of all the series appears in the Time Series Plot(EXCH) figure window. Create a time series plot of PAU by double-clicking PAU in the Time Series pane.
The series appears nonstationary because it has a clear upward trend.

Plot Sample ACF and PACF of Series

Plot the sample autocorrelation function (ACF) and partial autocorrelation function (PACF).

1. In the Time Series pane, select the PAU time series.
2. Click the Plots tab, then click ACF.
3. Click the Plots tab, then click PACF.
4. Close all figure windows except for the correlograms. Then, drag the ACF(PAU) figure window above the PACF(PAU) figure window.
The significant, linearly decaying sample ACF indicates a nonstationary process. Close the ACF(PAU) and PACF(PAU) figure windows.

Difference the Series

Take a first difference of the data. With PAU selected in the Time Series pane, on the Econometric Modeler tab, in the Transforms section, click Difference. The transformed variable PAUDiff appears in the Time Series pane, and its time series plot appears in the Time Series Plot(PAUDiff) figure window.
Differencing removes the linear trend. The differenced series appears more stationary.

Plot Sample ACF and PACF of Differenced Series

Plot the sample ACF and PACF of PAUDiff. With PAUDiff selected in the Time Series pane:

1. Click the Plots tab, then click ACF.
2. Click the Plots tab, then click PACF.
3. Close the Time Series Plot(PAUDiff) figure window. Then, drag the ACF(PAUDiff) figure window above the PACF(PAUDiff) figure window.
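For comparison, the differencing and correlogram steps can be sketched at the command line. This sketch assumes that DataTimeTable in Data_JAustralian.mat holds the PAU variable. Note that diff returns a series one observation shorter than its input, whereas the app keeps the series aligned by prepending a NaN.

```matlab
% Assumption: Data_JAustralian.mat provides DataTimeTable with the variable PAU.
load Data_JAustralian
PAU = DataTimeTable.PAU;
PAUDiff = diff(PAU);            % first difference; one observation shorter than PAU

figure
subplot(2,1,1)
autocorr(PAUDiff)               % sample ACF of the differenced series
subplot(2,1,2)
parcorr(PAUDiff)                % sample PACF of the differenced series
```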
The sample ACF of the differenced series decays more quickly. The sample PACF cuts off after lag 2. This behavior is consistent with a second-degree autoregressive (AR(2)) model for the differenced series. Close the ACF(PAUDiff) and PACF(PAUDiff) figure windows.

Specify and Estimate ARIMA Model

Estimate an ARIMA(2,1,0) model for the log quarterly Australian CPI. This model has one degree of nonseasonal differencing and two AR lags.

1. In the Time Series pane, select the PAU time series.
2. On the Econometric Modeler tab, in the Models section, click ARIMA.
3. In the ARIMA Model Parameters dialog box, on the Lag Order tab:
   a. Set Degree of Integration to 1.
   b. Set Autoregressive Order to 2.
4. Click Estimate.
The model variable ARIMA_PAU appears in the Models pane, its value appears in the Preview pane, and its estimation summary appears in the Model Summary(ARIMA_PAU) document.
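A command-line counterpart of this estimation can be sketched with the arima object and the estimate function listed under See Also. The sketch assumes that DataTimeTable in Data_JAustralian.mat holds PAU; the names Mdl and EstMdl are illustrative and are not created by the app.

```matlab
% Assumption: Data_JAustralian.mat provides DataTimeTable with the variable PAU.
load Data_JAustralian
PAU = DataTimeTable.PAU;

Mdl = arima(2,1,0);             % ARIMA(2,1,0) template with unknown coefficients
EstMdl = estimate(Mdl,PAU);     % fits the model and displays the estimation table
```

The estimation table printed by estimate reports the coefficient estimates with their standard errors and t statistics.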
Both AR coefficients are significant at a 5% significance level.

Check Goodness of Fit

Check that the residuals are normally distributed and uncorrelated by plotting a histogram, quantile-quantile plot, and ACF of the residuals.

1. Close the Model Summary(ARIMA_PAU) document.
2. With ARIMA_PAU selected in the Models pane, on the Econometric Modeler tab, in the Diagnostics section, click Residual Diagnostics > Residual Histogram.
3. Click Residual Diagnostics > Residual Q-Q Plot.
4. Click Residual Diagnostics > Autocorrelation Function.
5. In the right pane, drag the Histogram(ARIMA_PAU) and QQPlot(ARIMA_PAU) figure windows so that they occupy the upper two quadrants, and drag the ACF figure window so that it occupies the lower two quadrants.
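The three residual plots can also be produced at the command line. This is a sketch under the assumption that the model is refit from DataTimeTable.PAU rather than exported from the app; qqplot requires Statistics and Machine Learning Toolbox, and the layout only imitates the arrangement described in the steps.

```matlab
% Assumption: Data_JAustralian.mat provides DataTimeTable with the variable PAU.
load Data_JAustralian
PAU = DataTimeTable.PAU;
EstMdl = estimate(arima(2,1,0),PAU,'Display','off');
res = infer(EstMdl,PAU);        % residuals of the fitted model

figure
subplot(2,2,1)
histogram(res)
title('Residual Histogram')
subplot(2,2,2)
qqplot(res)                     % Statistics and Machine Learning Toolbox
subplot(2,2,[3 4])
autocorr(res)                   % sample ACF of the residuals
```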
The residual plots suggest that the residuals are approximately normally distributed and uncorrelated. However, there is some indication of an excess of large residuals. This behavior suggests that a t innovation distribution might be appropriate.

Export Model to Workspace

Export the model to the MATLAB Workspace.

1. In the Time Series pane, select the PAU time series.
2. On the Econometric Modeler tab, in the Export section, click Export > Export Variables.
3. In the Export Variables dialog box, select the Select check box for the ARIMA_PAU model. The check box for the PAU time series is already selected.
4. Click Export.
The variables PAU and ARIMA_PAU appear in the workspace.

Generate Forecasts at Command Line

Generate forecasts and approximate 95% forecast intervals from the estimated ARIMA(2,1,0) model for the next four years (16 quarters). Use the entire series as a presample for the forecasts.

[PAUF,PAUMSE] = forecast(ARIMA_PAU,16,'Y0',PAU);
UB = PAUF + 1.96*sqrt(PAUMSE);
LB = PAUF - 1.96*sqrt(PAUMSE);
datesF = dates(end) + calquarters(1:16);
figure
h4 = plot(dates,PAU,'Color',[.75,.75,.75]);
hold on
h5 = plot(datesF,PAUF,'r','LineWidth',2);
h6 = plot(datesF,UB,'k--','LineWidth',1.5);
plot(datesF,LB,'k--','LineWidth',1.5);
legend([h4,h5,h6],'Log CPI','Forecast',...
    'Forecast Interval','Location','Northwest')
title('Log Australian CPI Forecast')
hold off
References
[1] Box, George E. P., Gwilym M. Jenkins, and Gregory C. Reinsel. Time Series Analysis: Forecasting and Control. 3rd ed. Englewood Cliffs, NJ: Prentice Hall, 1994.

See Also
Apps
Econometric Modeler
Objects
arima
Functions
estimate | forecast

More About
• “Analyze Time Series Data Using Econometric Modeler” on page 4-2
• “Perform ARIMA Model Residual Diagnostics Using Econometric Modeler App” on page 4-141
• “Select ARIMA Model for Time Series Using Box-Jenkins Methodology” on page 3-4
• “Box-Jenkins Methodology” on page 3-2
• “Detect Serial Correlation Using Econometric Modeler App” on page 4-71
• “Share Results of Econometric Modeler App Session” on page 4-237
• Creating ARIMA Models Using Econometric Modeler App
Select ARCH Lags for GARCH Model Using Econometric Modeler App

This example shows how to select the appropriate number of ARCH and GARCH lags for a GARCH model by using the Econometric Modeler app. The data set, stored in Data_MarkPound, contains daily Deutschmark/British pound bilateral spot exchange rates from 1984 through 1991.

Import Data into Econometric Modeler

At the command line, load the Data_MarkPound.mat data set.

load Data_MarkPound

At the command line, open the Econometric Modeler app.

econometricModeler

Alternatively, open the app from the apps gallery (see Econometric Modeler). Import Data into the app:

1. On the Econometric Modeler tab, in the Import section, click the Import button.
2. In the Import Data dialog box, in the Import? column, select the check box for the Data variable.
3. Click Import.
The variable Data1 appears in the Time Series pane, and its time series plot appears in the Time Series Plot(Data1) figure window.
The exchange rate looks nonstationary (it does not appear to fluctuate around a fixed level).

Transform Data

Convert the exchange rates to returns.

1. With Data1 selected in the Time Series pane, on the Econometric Modeler tab, in the Transforms section, click Log. In the Time Series pane, a variable representing the logged exchange rates (Data1Log) appears, and its time series plot appears in the Time Series Plot(Data1Log) figure window.
2. In the Time Series pane, select Data1Log.
3. On the Econometric Modeler tab, in the Transforms section, click Difference.

In the Time Series pane, a variable representing the returns (Data1LogDiff) appears. A time series plot of the differenced series appears in the Time Series Plot(Data1LogDiff) figure window.

Check for Autocorrelation

In the Time Series pane, rename the Data1LogDiff variable by clicking it twice to select its name and entering Returns.
The app updates the names of all documents associated with the returns.
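The log and difference transforms that produce the returns can be sketched at the command line as follows. The sketch assumes that Data_MarkPound.mat provides the exchange rates in the vector Data, as the import step indicates; unlike the app, diff does not prepend a NaN.

```matlab
% Assumption: Data_MarkPound.mat provides the exchange rates in the vector Data.
load Data_MarkPound
Returns = diff(log(Data));      % log returns; one observation shorter than Data

figure
plot(Returns)
title('Mark/Pound Log Returns')
```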
The returns series fluctuates around a common level, but exhibits volatility clustering. Large changes in the returns tend to cluster together, and small changes tend to cluster together. That is, the series exhibits conditional heteroscedasticity. Visually assess whether the returns have serial correlation by plotting the sample ACF and PACF:
1. Close all figure windows in the right pane.
2. In the Time Series pane, select the Returns time series.
3. Click the Plots tab, then click ACF.
4. Click the Plots tab, then click PACF.
5. Drag the PACF(Returns) figure window below the ACF(Returns) figure window so that you can view them simultaneously.
The sample ACF and PACF show virtually no significant autocorrelation.

Conduct the Ljung-Box Q-test to assess whether there is significant serial correlation in the returns for at most 5, 10, and 15 lags. To keep the overall false-rejection rate across the three tests at approximately 0.05, specify a significance level of 0.05/3 = 0.0167 for each test.

1. Close the ACF(Returns) and PACF(Returns) figure windows.
2. With Returns selected in the Time Series pane, on the Econometric Modeler tab, in the Tests section, click New Test > Ljung-Box Q-Test.
3. On the LBQ tab, in the Parameters section, set Number of Lags to 5.
4. Set Significance Level to 0.0167.
5. In the Tests section, click Run Test.
6. Repeat steps 3 through 5 twice, with these changes.
   a. Set Number of Lags to 10 and the DOF to 10.
   b. Set Number of Lags to 15 and the DOF to 15.

The test results appear in the Results table of the LBQ(Returns) document.
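The three Ljung-Box Q-tests can be run in one call at the command line. This is a sketch, assuming that the returns are recomputed from the vector Data in Data_MarkPound.mat and that the default degrees of freedom (equal to the number of lags) match the app settings described in the steps.

```matlab
% Assumption: Data_MarkPound.mat provides the exchange rates in the vector Data.
load Data_MarkPound
Returns = diff(log(Data));

% One test per lag count, each at significance level 0.05/3.
[h,pValue] = lbqtest(Returns,'Lags',[5 10 15],'Alpha',0.05/3);
```

Each element of h is 1 if the corresponding test rejects the null hypothesis of no autocorrelation, and 0 otherwise.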
The Ljung-Box Q-test null hypothesis that all autocorrelations up to the tested lags are zero is not rejected for tests at lags 5, 10, and 15. These results, and the ACF and PACF, suggest that a conditional mean model is not needed for this returns series.

Check for Conditional Heteroscedasticity

To check the returns for conditional heteroscedasticity, Econometric Modeler requires a series of squared residuals. After importing the squared residuals into the app, visually assess whether there is conditional heteroscedasticity by plotting the ACF and PACF of the squared residuals. Then, determine the appropriate number of lags for a GARCH model of the returns by conducting Engle's ARCH test.

Compute the series of squared residuals at the command line by demeaning the returns, then squaring each element of the result. Export Returns to the command line:

1. In the Time Series pane, right-click Returns.
2. In the context menu, select Export.

Returns appears in the MATLAB Workspace. Remove the mean from the returns, then square each element of the result. To ensure all series in the Time Series pane are synchronized, Econometric Modeler prepends first-differenced series with a NaN value. Therefore, to estimate the sample mean, use mean(Returns,'omitnan').

Residuals = Returns - mean(Returns,'omitnan');
Residuals2 = Residuals.^2;

Create a table containing the Returns, Residuals, and Residuals2 variables.

Tbl = table(Returns,Residuals,Residuals2);
Import Tbl into Econometric Modeler:

1. On the Econometric Modeler tab, in the Import section, click the Import button.
2. The app must clear the right pane and all documents before importing new data. Therefore, after clicking Import, in the Econometric Modeler dialog box, click OK.
3. In the Import Data dialog box, in the Import? column, select the check box for the Tbl variable.
4. Click Import.
The variables appear in the Time Series pane, and a time series plot of all the series appears in the Time Series Plot(Residuals) figure window.

Plot the ACF and PACF of the squared residuals.

1. Close the Time Series Plot(Residuals) figure window.
2. In the Time Series pane, select the Residuals2 time series.
3. Click the Plots tab, then click ACF.
4. Click the Plots tab, then click PACF.
5. Drag the PACF(Residuals2) figure window below the ACF(Residuals2) figure window so that you can view them simultaneously.

The sample ACF and PACF of the squared residuals show significant autocorrelation. This result suggests that a GARCH model with lagged variances and lagged squared innovations might be appropriate for modeling the returns.
Conduct Engle's ARCH test on the residuals series. Specify a two-lag ARCH model alternative hypothesis.

1. Close all figure windows.
2. In the Time Series pane, select the Residuals time series.
3. On the Econometric Modeler tab, in the Tests section, click New Test > Engle's ARCH Test.
4. On the ARCH tab, in the Parameters section, set Number of Lags to 2.
5. In the Tests section, click Run Test.
The test results appear in the Results table of the ARCH(Residuals) document.
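Engle's ARCH test is also available at the command line through archtest. The following sketch assumes that the returns are recomputed from the vector Data in Data_MarkPound.mat (so no NaN is prepended and a plain mean suffices) and uses the default significance level.

```matlab
% Assumption: Data_MarkPound.mat provides the exchange rates in the vector Data.
load Data_MarkPound
Returns = diff(log(Data));
Residuals = Returns - mean(Returns);    % demeaned returns

% Alternative hypothesis: ARCH model with two lagged squared innovations.
[h,pValue] = archtest(Residuals,'Lags',2);
```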
Engle's ARCH test rejects the null hypothesis of no ARCH effects in favor of the alternative ARCH model with two lagged squared innovations. An ARCH model with two lagged innovations is locally equivalent to a GARCH(1,1) model.

Create and Fit GARCH Model

Fit a GARCH(1,1) model to the returns series.

1. In the Time Series pane, select the Returns time series.
2. Click the Econometric Modeler tab. Then, in the Models section, click the arrow to display the models gallery.
3. In the models gallery, in the GARCH Models section, click GARCH.
4. In the GARCH Model Parameters dialog box, on the Lag Order tab:
   a. Set GARCH Degree to 1.
   b. Set ARCH Degree to 1.
   c. Because the returns required demeaning, include an offset by selecting the Include Offset check box.
5. Click Estimate.
The model variable GARCH_Returns appears in the Models pane, its value appears in the Preview pane, and its estimation summary appears in the Model Summary(GARCH_Returns) document.
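A command-line sketch of the same specification uses the garch object and the estimate function listed under See Also. It assumes that the returns are recomputed from the vector Data in Data_MarkPound.mat; setting Offset to NaN asks estimate to fit the offset, which corresponds to selecting Include Offset in the dialog box.

```matlab
% Assumption: Data_MarkPound.mat provides the exchange rates in the vector Data.
load Data_MarkPound
Returns = diff(log(Data));

% GARCH(1,1) with an offset (mean) term to be estimated.
Mdl = garch('GARCHLags',1,'ARCHLags',1,'Offset',NaN);
EstMdl = estimate(Mdl,Returns);
```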
An alternative way to select lags for a GARCH model is by fitting several models containing different lag polynomial degrees. Then, choose the model yielding the minimal AIC.
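The AIC-based alternative can be sketched at the command line as a small grid search. This is an illustration only: the search limit maxLag is a hypothetical choice, the returns are recomputed from the vector Data in Data_MarkPound.mat, and the number of estimated parameters is counted from the nonzero columns of the estimated covariance matrix.

```matlab
% Assumption: Data_MarkPound.mat provides the exchange rates in the vector Data.
load Data_MarkPound
Returns = diff(log(Data));

maxLag = 2;                             % hypothetical search limit
aic = nan(maxLag,maxLag);
for p = 1:maxLag
    for q = 1:maxLag
        Mdl = garch('GARCHLags',1:p,'ARCHLags',1:q,'Offset',NaN);
        [~,EstParamCov,logL] = estimate(Mdl,Returns,'Display','off');
        numParams = sum(any(EstParamCov));   % number of estimated parameters
        aic(p,q) = aicbic(logL,numParams);   % AIC of the GARCH(p,q) fit
    end
end

[~,idx] = min(aic(:));
[bestP,bestQ] = ind2sub(size(aic),idx); % lags of the minimum-AIC model
```

Rows of aic index the GARCH degree and columns index the ARCH degree.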
See Also
Apps
Econometric Modeler
Objects
garch
Functions
estimate

More About
• “Specify Conditional Variance Model for Exchange Rates” on page 8-47
• “Analyze Time Series Data Using Econometric Modeler” on page 4-2
Estimate Multiplicative ARIMA Model Using Econometric Modeler App

This example uses the Box-Jenkins method [1] to determine a multiplicative seasonal ARIMA (SARIMA) model for a univariate series by using the Econometric Modeler app. Specifically, the example shows this procedure:

1. Visually determine whether the series has an exponential trend, and then remove the trend if one is present. The remaining steps determine the structure of the conditional mean model to fit to the series without the exponential trend.
2. Visually determine whether the transformed series has seasonal integration, and then remove the seasonal integration, if it is present, by taking the seasonal difference. Record the degrees of seasonal integration removed.
3. Determine whether the transformed series has a unit root by conducting a statistical test, and then apply the first difference to the series to stabilize it, if a unit root is present. Record whether a unit root exists.
4. Determine the nonseasonal AR and MA degrees of the conditional mean model for the stationary series by inspecting correlograms of the stabilized series. Record the polynomial degrees.
5. Use the results of steps 2 through 4 to create and estimate a conditional mean model for the nonstationary series with the exponential trend removed.

The data set Data_Airline.mat contains monthly counts of airline passengers.

Import Data into Econometric Modeler

At the command line, load the Data_Airline.mat data set.

load Data_Airline

At the command line, open the Econometric Modeler app.

econometricModeler

Alternatively, open the app from the apps gallery (see Econometric Modeler). Import DataTimeTable into the app:

1. On the Econometric Modeler tab, in the Import section, click the Import button.
2. In the Import Data dialog box, in the Import? column, select the check box for the DataTimeTable variable.
3. Click Import.
The variable PSSG appears in the Time Series pane, its value appears in the Preview pane, and its time series plot appears in the Time Series Plot(PSSG) figure window.
The series exhibits a seasonal trend, serial correlation, and possible exponential growth. For an interactive analysis of serial correlation, see “Detect Serial Correlation Using Econometric Modeler App” on page 4-71.

Stabilize Series

To properly determine the lag operator polynomial orders for the conditional mean model, correlograms require a stationary series. Address the exponential trend by applying the log transform to PSSG.

1. In the Time Series pane, select PSSG.
2. On the Econometric Modeler tab, in the Transforms section, click Log.
The transformed variable PSSGLog appears in the Time Series pane, its value appears in the Preview pane, and its time series plot appears in the Time Series Plot(PSSGLog) figure window.
The exponential growth appears to be removed from the series. The rest of the example determines the parameters of the conditional mean model for the series PSSGLog through several visual and statistical diagnostics and transformations.

Address the seasonal trend by applying the 12th-order seasonal difference. With PSSGLog selected in the Time Series pane, on the Econometric Modeler tab, in the Transforms section, set Seasonal to 12. Then, click Seasonal. The transformed variable PSSGLogSeasonalDiff appears in the Time Series pane, and its time series plot appears in the Time Series Plot(PSSGLogSeasonalDiff) figure window.
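As a worked illustration, the 12th-order seasonal difference subtracts from each observation the value recorded 12 months earlier. The sketch below assumes that DataTimeTable in Data_Airline.mat holds PSSG; the result is 12 observations shorter than the input because no NaN values are prepended.

```matlab
% Assumption: Data_Airline.mat provides DataTimeTable with the variable PSSG.
load Data_Airline
PSSGLog = log(DataTimeTable.PSSG);

s = 12;                                               % seasonal period (months)
PSSGLogSeasonalDiff = PSSGLog(s+1:end) - PSSGLog(1:end-s);
```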
The transformed series appears to have a unit root.

Test the null hypothesis that PSSGLogSeasonalDiff has a unit root by using the Augmented Dickey-Fuller test. Specify that the alternative is an AR(0) model, then test again specifying an AR(1) model. Adjust the significance level to 0.025 to maintain a total significance level of 0.05.

1. With PSSGLogSeasonalDiff selected in the Time Series pane, on the Econometric Modeler tab, in the Tests section, click New Test > Augmented Dickey-Fuller Test.
2. On the ADF tab, in the Parameters section, set Significance Level to 0.025.
3. In the Tests section, click Run Test.
4. In the Parameters section, set Number of Lags to 1.
5. In the Tests section, click Run Test.
The test results appear in the Results table of the ADF(PSSGLogSeasonalDiff) document.
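Both Augmented Dickey-Fuller tests can be run in one adftest call at the command line. This sketch assumes that the seasonally differenced series is recomputed from DataTimeTable.PSSG and that the default adftest model variant matches the app's default alternative.

```matlab
% Assumption: Data_Airline.mat provides DataTimeTable with the variable PSSG.
load Data_Airline
PSSGLog = log(DataTimeTable.PSSG);
y = PSSGLog(13:end) - PSSGLog(1:end-12);    % 12th-order seasonal difference

% Lags 0 and 1, each tested at significance level 0.025.
[h,pValue] = adftest(y,'Lags',0:1,'Alpha',0.025);
```

An element of h equal to 0 means that the corresponding test fails to reject the unit-root null hypothesis.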
Both tests fail to reject the null hypothesis that the series is a unit root process. Address the unit root by applying the first difference to PSSGLogSeasonalDiff. With PSSGLogSeasonalDiff selected in the Time Series pane, click the Econometric Modeler tab. Then, in the Transforms section, click Difference. The transformed variable PSSGLogSeasonalDiffDiff appears in the Time Series pane, and its time series plot appears in the Time Series Plot(PSSGLogSeasonalDiffDiff) figure window. In the Time Series pane, rename the PSSGLogSeasonalDiffDiff variable by clicking it twice to select its name and entering PSSGStable. The app updates the names of all documents associated with the transformed series.
Identify Model for Series

Determine the lag structure for a conditional mean model of the series by plotting the sample autocorrelation function (ACF) and partial autocorrelation function (PACF) of the stabilized series.

1. With PSSGStable selected in the Time Series pane, click the Plots tab, then click ACF.
2. Show the first 50 lags of the ACF. On the ACF tab, set Number of Lags to 50.
3. Click the Plots tab, then click PACF.
4. Show the first 50 lags of the PACF. On the PACF tab, set Number of Lags to 50.
5. Drag the ACF(PSSGStable) figure window above the PACF(PSSGStable) figure window.
According to [1], the autocorrelations in the ACF and PACF, the results in previous steps, and the balance of model parsimony with complexity suggest that the following SARIMA$(0,1,1)\times(0,1,1)_{12}$ model is appropriate for PSSGLog, the nonstationary series with the exponential trend removed:

$(1 - L)(1 - L^{12})\,y_t = (1 + \theta_1 L)(1 + \Theta_{12} L^{12})\,\varepsilon_t.$

Close all figure windows.

Specify and Estimate SARIMA Model

Specify the SARIMA$(0,1,1)\times(0,1,1)_{12}$ model.

1. In the Time Series pane, select the PSSGLog time series.
2. On the Econometric Modeler tab, in the Models section, click the arrow > SARIMA.
3. In the SARIMA Model Parameters dialog box, on the Lag Order tab:
   • Nonseasonal section
     a. Set Degrees of Integration to 1.
     b. Set Moving Average Order to 1.
     c. Clear the Include Constant Term check box.
   • Seasonal section
     a. Set Period to 12 to indicate monthly data.
     b. Set Moving Average Order to 1.
     c. Select the Include Seasonal Difference check box.
4. Click Estimate.
The model variable SARIMA_PSSGLog appears in the Models pane, its value appears in the Preview pane, and its estimation summary appears in the Model Summary(SARIMA_PSSGLog) document.
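The same multiplicative model can be sketched at the command line with name-value arguments of arima. The sketch assumes that DataTimeTable in Data_Airline.mat holds PSSG; the names Mdl and EstMdl are illustrative.

```matlab
% Assumption: Data_Airline.mat provides DataTimeTable with the variable PSSG.
load Data_Airline
PSSGLog = log(DataTimeTable.PSSG);

% (1 - L)(1 - L^12) y_t = (1 + theta_1 L)(1 + Theta_12 L^12) e_t, no constant.
Mdl = arima('Constant',0,'D',1,'Seasonality',12, ...
    'MALags',1,'SMALags',12);
EstMdl = estimate(Mdl,PSSGLog);
```

Here 'D' and 'Seasonality' supply the nonseasonal and seasonal differences, and 'MALags' and 'SMALags' supply the two moving average polynomials.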
The results include:
• Model Fit: A time series plot of PSSGLog and the fitted values from SARIMA_PSSGLog.
• Residual Plot: A time series plot of the residuals of SARIMA_PSSGLog.
• Parameters: A table of estimated parameters of SARIMA_PSSGLog. Because the constant term was held fixed to 0 during estimation, its value and standard error are 0.
• Goodness of Fit: The AIC and BIC fit statistics of SARIMA_PSSGLog.
References
[1] Box, George E. P., Gwilym M. Jenkins, and Gregory C. Reinsel. Time Series Analysis: Forecasting and Control. 3rd ed. Englewood Cliffs, NJ: Prentice Hall, 1994.

See Also
Apps
Econometric Modeler
Objects
arima
Functions
estimate

More About
• “Specify Multiplicative ARIMA Model” on page 7-58
• “Estimate Multiplicative ARIMA Model” on page 7-122
• “Analyze Time Series Data Using Econometric Modeler” on page 4-2
• “Share Results of Econometric Modeler App Session” on page 4-237
• Creating ARIMA Models Using Econometric Modeler App
Perform ARIMA Model Residual Diagnostics Using Econometric Modeler App

This example shows how to evaluate ARIMA model assumptions by performing residual diagnostics in the Econometric Modeler app. The data set, which is stored in Data_JAustralian.mat, contains the log quarterly Australian Consumer Price Index (CPI) measured from 1972 through 1991, among other time series.

Import Data into Econometric Modeler

At the command line, load the Data_JAustralian.mat data set.

load Data_JAustralian

At the command line, open the Econometric Modeler app.

econometricModeler

Alternatively, open the app from the apps gallery (see Econometric Modeler). Import DataTimeTable into the app:

1. On the Econometric Modeler tab, in the Import section, click the Import button.
2. In the Import Data dialog box, in the Import? column, select the check box for the DataTimeTable variable.
3. Click Import.
The variables, including PAU, appear in the Time Series pane, and a time series plot containing all the series appears in the Time Series Plot(EXCH) figure window. Create a time series plot of PAU by double-clicking PAU in the Time Series pane.
Specify and Estimate ARIMA Model

Estimate an ARIMA(2,1,0) model for the log quarterly Australian CPI (for details, see “Implement Box-Jenkins Model Selection and Estimation Using Econometric Modeler App” on page 4-112).

1. In the Time Series pane, select the PAU time series.
2. On the Econometric Modeler tab, in the Models section, click ARIMA.
3. In the ARIMA Model Parameters dialog box, on the Lag Order tab:
   a. Set the Degree of Integration to 1.
   b. Set the Autoregressive Order to 2.
4. Click Estimate.
The model variable ARIMA_PAU appears in the Models pane, its value appears in the Preview pane, and its estimation summary appears in the Model Summary(ARIMA_PAU) document.
In the Model Summary(ARIMA_PAU) document, the Residual Plot figure is a time series plot of the residuals. The plot suggests that the residuals are centered at y = 0 and they exhibit volatility clustering.

Perform Residual Diagnostics

Visually assess whether the residuals are normally distributed by plotting their histogram and a quantile-quantile plot:

1. Close the Model Summary(ARIMA_PAU) document.
2. With ARIMA_PAU selected in the Models pane, on the Econometric Modeler tab, in the Diagnostics section, click Residual Diagnostics > Residual Histogram.
3. Click Residual Diagnostics > Residual Q-Q Plot.
Inspect the histogram by clicking the Histogram(ARIMA_PAU) figure window.
Inspect the quantile-quantile plot by clicking the QQPlot(ARIMA_PAU) figure window.
The residuals appear approximately normally distributed. However, there is an excess of large residuals, which indicates that a t innovation distribution might be a reasonable model modification. Visually assess whether the residuals are serially correlated by plotting their autocorrelations. With ARIMA_PAU selected in the Models pane, in the Diagnostics section, click Residual Diagnostics > Autocorrelation Function.
All lags that are greater than 0 correspond to insignificant autocorrelations. Therefore, the residuals appear to be uncorrelated in time.

Visually assess whether the residuals exhibit heteroscedasticity by plotting the ACF of the squared residuals. With ARIMA_PAU selected in the Models pane, click the Econometric Modeler tab. Then, in the Diagnostics section, click Residual Diagnostics > Squared Residual Autocorrelation.
Significant autocorrelations occur at lags 4 and 5, which suggests a composite conditional mean and variance model for PAU.
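The squared-residual correlogram can be sketched at the command line by using infer, which is listed under See Also. The sketch assumes that the ARIMA(2,1,0) model is refit from DataTimeTable.PAU instead of being exported from the app.

```matlab
% Assumption: Data_JAustralian.mat provides DataTimeTable with the variable PAU.
load Data_JAustralian
PAU = DataTimeTable.PAU;
EstMdl = estimate(arima(2,1,0),PAU,'Display','off');
res = infer(EstMdl,PAU);            % model residuals

figure
autocorr(res.^2)                    % sample ACF of the squared residuals
title('Sample ACF of Squared Residuals')
```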
See Also
Apps
Econometric Modeler
Objects
arima
Functions
estimate | infer

More About
• “Infer Residuals for Diagnostic Checking” on page 7-143
• “Perform GARCH Model Residual Diagnostics Using Econometric Modeler App” on page 4-230
• “Implement Box-Jenkins Model Selection and Estimation Using Econometric Modeler App” on page 4-112
• “Specify t Innovation Distribution Using Econometric Modeler App” on page 4-150
• Creating ARIMA Models Using Econometric Modeler App
Specify t Innovation Distribution Using Econometric Modeler App

This example shows how to specify a t innovation distribution for an ARIMA model by using the Econometric Modeler app. The example also shows how to fit the model to data. The data set, which is stored in Data_JAustralian.mat, contains the log quarterly Australian Consumer Price Index (CPI) measured from 1972 through 1991, among other time series.

Import Data into Econometric Modeler

At the command line, load the Data_JAustralian.mat data set.

load Data_JAustralian

At the command line, open the Econometric Modeler app.

econometricModeler

Alternatively, open the app from the apps gallery (see Econometric Modeler). Import DataTimeTable into the app:

1. On the Econometric Modeler tab, in the Import section, click the Import button.
2. In the Import Data dialog box, in the Import? column, select the check box for the DataTimeTable variable.
3. Click Import.
The variables, including PAU, appear in the Time Series pane, and a time series plot containing all the series appears in the Time Series Plot(EXCH) figure window. Create a time series plot of PAU by double-clicking PAU in the Time Series pane.
Specify and Estimate ARIMA Model

Estimate an ARIMA(2,1,0) model for the log quarterly Australian CPI. Specify a t innovation distribution. (For details, see “Implement Box-Jenkins Model Selection and Estimation Using Econometric Modeler App” on page 4-112 and “Perform ARIMA Model Residual Diagnostics Using Econometric Modeler App” on page 4-141.)

1. In the Time Series pane, select the PAU time series.
2. On the Econometric Modeler tab, in the Models section, click ARIMA.
3. In the ARIMA Model Parameters dialog box, on the Lag Order tab:
   a. Set the Degree of Integration to 1.
   b. Set the Autoregressive Order to 2.
   c. Click the Innovation Distribution button, then select t.
4. Click Estimate.
The model variable ARIMA_PAU appears in the Models pane, its value appears in the Preview pane, and its estimation summary appears in the Model Summary(ARIMA_PAU) document.
The app estimates the t innovation degrees of freedom (DoF) along with the model coefficients and variance.
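At the command line, a comparable specification sets the Distribution property of the arima object before estimation. This is a sketch, assuming that DataTimeTable in Data_JAustralian.mat holds PAU; the names Mdl, EstMdl, and dof are illustrative.

```matlab
% Assumption: Data_JAustralian.mat provides DataTimeTable with the variable PAU.
load Data_JAustralian
PAU = DataTimeTable.PAU;

Mdl = arima(2,1,0);
Mdl.Distribution = 't';             % t innovations with unknown degrees of freedom
EstMdl = estimate(Mdl,PAU);

dof = EstMdl.Distribution.DoF;      % estimated degrees of freedom
```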
See Also
Apps
Econometric Modeler
Objects
arima
Functions
estimate

More About
• “Specify Conditional Mean Model Innovation Distribution” on page 7-76
• “Implement Box-Jenkins Model Selection and Estimation Using Econometric Modeler App” on page 4-112
• “Perform ARIMA Model Residual Diagnostics Using Econometric Modeler App” on page 4-141
Estimate Vector Autoregression Model Using Econometric Modeler

This example models the quarterly US GDP growth rate, M1 money supply rate, and the 3-month T-bill rate series by using the Econometric Modeler app. The example shows how to perform the following actions in the app:

1. Stabilize the raw nonstationary series.
2. Fit several competing vector autoregression (VAR) models, and choose the one with the best, parsimonious fit.
3. Diagnose each residual series.
4. Export the chosen model to the command line.
At the command line, the example conducts Granger-causality tests on the series given the estimated model, and it uses the model to generate forecasts. The data set, which is stored in Data_USEconModel.mat, contains the raw, quarterly US GDP, M1 money supply, and 3-month T-bill rate, among other series, from 1947 through 2009.

Load and Import Data into Econometric Modeler

At the command line, load the Data_USEconModel.mat data set.

load Data_USEconModel

At the command line, open the Econometric Modeler app.

econometricModeler

Alternatively, open the app from the apps gallery (see Econometric Modeler). Import DataTimeTable into the app:

1. On the Econometric Modeler tab, in the Import section, click the Import button.
2. In the Import Data dialog box, in the Import? column, select the check box for the DataTimeTable variable.
3. Click Import.
GDP, M1SL, and TB3MS, among other series, appear in the Time Series pane, and a time series plot containing all series appears in the figure window. Create separate time series plots by double-clicking each of GDP, M1SL, and TB3MS in the Time Series pane. Position the Time Series Plot(series) tabs by clicking and dragging each to see all plots simultaneously.
The US GDP and M1 money supply series exhibit exponential growth, and the 3-month T-bill series resembles a random walk.

Diagnose and Transform Series

Remove the exponential trend from the GDP and M1 money supply series by applying the log transform to each series. Click the Econometric Modeler tab, and then, in the Time Series pane, click GDP and Ctrl+click M1SL. In the Transforms section, click Log. The transformed series GDPLog and M1SLLog appear in the Time Series pane, and their time series plot appears in the Time Series Plot(GDPLog) figure window.
4-156
Estimate Vector Autoregression Model Using Econometric Modeler
Test the null hypothesis that each series is a unit root process against a stationary AR(p) with drift alternative, for p = 4, 3, 2, and 1. For each series GDPLog, M1SLLog, and TB3MS:

1. In the Time Series pane, click the series.
2. On the Econometric Modeler tab, in the Tests section, click New Test > Augmented Dickey-Fuller Test.
3. On the ADF tab, in the Parameters section, in the Number of Lags box, type 4.
4. In the Tests section, click Run Test.
5. Repeat steps 3 and 4 for lags 3, 2, and 1.

The following figures show the results. The tests fail to reject the null hypothesis of a unit root in all cases, which suggests that each series is difference stationary.
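The same tests can be approximated at the command line with adftest. The following is a minimal sketch, not the app's exact procedure: it assumes that the "ARD" model matches the AR-with-drift alternative described above, and it removes missing values with rmmissing before testing. The variable names series, names, and results are illustrative.

```matlab
% Load the data set that ships with Econometrics Toolbox.
load Data_USEconModel

% Log-transform the two trending series; keep the T-bill rate in levels.
series = [log(DataTimeTable.GDP) log(DataTimeTable.M1SL) DataTimeTable.TB3MS];
names = ["GDPLog" "M1SLLog" "TB3MS"];

% For each series, run four tests: AR(p) with drift alternative, p = 1,...,4.
h = false(3,4);        % h(j,p) = true means the unit root null is rejected
pValue = zeros(3,4);
for j = 1:3
    y = rmmissing(series(:,j));                      % drop NaN observations
    [hj,pj] = adftest(y,Model="ARD",Lags=1:4);
    h(j,:) = hj(:)';
    pValue(j,:) = pj(:)';
end

% Collect the p-values in a table, one row per series and one column per lag.
results = array2table(pValue,RowNames=names,VariableNames="Lags" + (1:4))
```

A row of large p-values indicates that the unit root null cannot be rejected for that series at any of the tested lag orders.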
Stabilize the series by applying the first difference to each series.

1. Click the Econometric Modeler tab, and then, in the Time Series pane, click GDPLog and Ctrl+click M1SLLog and TB3MS.
2. In the Transforms section, click Difference. The transformed series GDPLogDiff, M1SLLogDiff, and TB3MSDiff appear in the Time Series pane, and their time series plot appears in the Time Series Plot(GDPLogDiff) figure window.
3. Rename GDPLogDiff and M1SLLogDiff to GDPRate and M1SLRate by clicking their names twice in the Time Series pane, and typing their new names.
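For reference, the log and difference transforms can be sketched at the command line as follows. Padding each differenced series with a leading NaN, so that it stays aligned with DataTimeTable.Time, is an assumption made here for convenience; it is not necessarily how the app stores the result.

```matlab
load Data_USEconModel

% Growth rates: first difference of the log series.
GDPRate  = [NaN; diff(log(DataTimeTable.GDP))];
M1SLRate = [NaN; diff(log(DataTimeTable.M1SL))];

% Change in the T-bill rate: first difference of the level series.
TB3MSDiff = [NaN; diff(DataTimeTable.TB3MS)];
```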
Estimate VAR Models

Estimate 3-D VAR(p) models of the US quarterly GDP growth rate series GDPRate, M1 money supply growth rate series M1SLRate, and change in the 3-month treasury bill rate series TB3MSDiff, where p = 1 through 4.

1. In the Time Series pane, click GDPRate and Ctrl+click M1SLRate and TB3MSDiff.
2. In the Models section, click VAR.
3. Fit a VAR(1) model (the default) by clicking Estimate in the VAR Model Parameters dialog box. The model variable VAR appears in the Models pane, its value appears in the Preview pane, and its estimation summary appears in the Model Summary(VAR) document.
4. Repeat steps 1 through 3 for each AR order p = 2 through 4. In the VAR Model Parameters dialog box, set the AR order by using the Autoregressive Order box before you click Estimate. Similar to the VAR(1) estimation, a variable for each model (VAR2, VAR3, and VAR4) appears in the Models pane, and their estimation summaries appear in their respective Model Summary(ModelName) documents.

You can view properties of an estimated model in the Preview pane by clicking the model in the Models pane. For example, click VAR4.
Select Model with Best In-Sample Fit

The estimation summary in each Model Summary(VARp) tab contains a plot of fitted values and residuals with respect to the time series selected in the Time Series list, a standard statistical table of estimates and inferences, and a table of information criteria.

Compare the information criteria of all estimated models simultaneously by positioning the estimation summary documents so that they occupy the four quadrants of the right pane. The model with the lowest information criterion value has the best parsimonious fit.
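A comparable model comparison can be sketched at the command line with varm, estimate, and summarize. This is an illustrative sketch under stated assumptions: it rebuilds the transformed series from the raw data, drops rows with missing values, and reserves the first four observations as a common presample so that every model is fit to the same estimation sample. The app may treat the sample differently, so the criteria need not match the app's values exactly.

```matlab
load Data_USEconModel

% Rebuild the three stationary series and drop rows that contain NaNs.
Y = [diff(log(DataTimeTable.GDP)) diff(log(DataTimeTable.M1SL)) ...
    diff(DataTimeTable.TB3MS)];
Y = rmmissing(Y);

maxP = 4;
Y0 = Y(1:maxP,:);          % common presample (assumption of this sketch)
YEst = Y(maxP+1:end,:);    % common estimation sample

aic = zeros(maxP,1);
bic = zeros(maxP,1);
for p = 1:maxP
    Mdl = varm(3,p);                    % 3-D VAR(p) template
    EstMdl = estimate(Mdl,YEst,Y0=Y0);  % uses the last p rows of Y0
    results = summarize(EstMdl);        % struct with AIC and BIC fields
    aic(p) = results.AIC;
    bic(p) = results.BIC;
end

% Lower values indicate a better trade-off between fit and complexity.
criteria = table((1:maxP)',aic,bic,VariableNames=["p" "AIC" "BIC"])
```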
The VAR(2) model VAR2 produces the lowest AIC and BIC values. Choose this model for further analysis.

Check Goodness of Fit

Inspect the following VAR(2) plots of each residual series:

• Histograms, for center, normality, and outliers
• Quantile-quantile plots, for normality, skewness, and tails
• Autocorrelation function (ACF), for serial correlation
• ACF of squared residual series, for heteroscedasticity

This example diagnoses the residuals visually. Alternatively, you can conduct statistical tests to diagnose the residuals.

In the Model Summary(VAR2) document, click Document Actions > Tile All, and then click the Single radio button.
In the Models pane, click VAR2.

Plot separate residual histograms. On the Econometric Modeler tab, in the Diagnostics section, click Residual Diagnostics > Residual Histogram. Histograms of each residual series appear in the Histogram(VAR2) document.

Each residual series appears centered around 0, with varying degrees of slight skewness and possible outliers. This example proceeds without addressing possible skewness and outliers.

Plot separate residual quantile-quantile plots. With VAR2 selected in the Models pane, on the Econometric Modeler tab, in the Diagnostics section, click Residual Diagnostics > Residual QQ Plot. Quantile-quantile plots of each residual series appear in the QQPlot(VAR2) document.

Each series has slightly fatter tails than expected under the normal distribution. This example proceeds without addressing possible excess kurtosis.

Plot separate ACF plots of each residual series. With VAR2 selected in the Models pane, on the Econometric Modeler tab, in the Diagnostics section, click Residual Diagnostics > Autocorrelation Function. ACF plots of each residual series appear in the ACF(VAR2) document.

Each series has several significant, albeit small, autocorrelations. For example, the GDPRate residual series has significant autocorrelations at lags 4 and 16, and the M1SLRate and TB3MSDiff residual series both have significant autocorrelations at lag 7. To address the autocorrelations, you can include higher AR lags in the VAR model for estimation, but this example proceeds without addressing the autocorrelations this way.

Plot separate ACF plots of each squared residual series. With VAR2 selected in the Models pane, on the Econometric Modeler tab, in the Diagnostics section, click Residual Diagnostics > Squared Residual Autocorrelation. ACF plots of each squared residual series appear in the ACF(VAR2)2 document.
The squared residual series of M1SLRate and TB3MSDiff have significant autocorrelations, which suggests that heteroscedasticity is present. This example proceeds without addressing possible heteroscedasticity.

Export Model to Workspace

Export the model to the workspace.

1. With the VAR2 model selected in the Models pane, on the Econometric Modeler tab, in the Export section, click Export > Export Variables.
2. In the Export Variables dialog box, select the Select check box for GDPRate, M1SLRate, and TB3MSDiff.
3. Click Export.

The variables GDPRate, M1SLRate, TB3MSDiff, and VAR2 appear in the workspace.
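With a fitted model in the workspace, the visual residual checks can be complemented by formal tests. The sketch below is one possible approach, not part of the original example: it refits a VAR(2) model at the command line (the name EstMdl is a stand-in for the exported VAR2), infers the residuals, and applies the Ljung-Box Q-test and Engle's ARCH test to each residual series with the default settings of lbqtest and archtest.

```matlab
load Data_USEconModel

% Rebuild the stationary series and fit a VAR(2) model (stand-in for VAR2).
Y = rmmissing([diff(log(DataTimeTable.GDP)) diff(log(DataTimeTable.M1SL)) ...
    diff(DataTimeTable.TB3MS)]);
EstMdl = estimate(varm(3,2),Y);

% Infer the residuals; each column corresponds to one response series.
E = infer(EstMdl,Y);
numSeries = size(E,2);

hLBQ = false(numSeries,1);    % true = reject "no residual autocorrelation"
hARCH = false(numSeries,1);   % true = reject "no ARCH effects"
for j = 1:numSeries
    hLBQ(j) = lbqtest(E(:,j));
    hARCH(j) = archtest(E(:,j));
end

diagnostics = table(hLBQ,hARCH,RowNames=["GDPRate" "M1SLRate" "TB3MSDiff"])
```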
Conduct Causality Analysis at Command Line

Determine whether series in the system Granger-cause the other series by conducting 1-step, leave-one-out Granger-causality tests. Pass the estimated VAR(2) model to gctest.

    gctest(VAR2)

                          H0                              Decision          Distribution    Statistic
    ________________________________________________   __________________   ____________    _________
    "Exclude lagged M1SLRate in GDPRate equation"       "Reject H0"          "Chi2(2)"       8.…
    "Exclude lagged TB3MSDiff in GDPRate equation"      "Reject H0"          "Chi2(2)"       16…
    "Exclude lagged GDPRate in M1SLRate equation"       "Cannot reject H0"   "Chi2(2)"       3.…
    "Exclude lagged TB3MSDiff in M1SLRate equation"     "Reject H0"          "Chi2(2)"       20…
    "Exclude lagged GDPRate in TB3MSDiff equation"      "Reject H0"          "Chi2(2)"       20…
    "Exclude lagged M1SLRate in TB3MSDiff equation"     "Cannot reject H0"   "Chi2(2)"       0.4…

The null hypothesis of the 1-step leave-one-out Granger-causality test is that a series does not 1-step Granger-cause another series, conditioned on all other series being present in the system. The results suggest:

• Given that the 3-month T-bill change is in the system, enough evidence exists to suggest that the M1 money supply rate 1-step Granger-causes the GDP rate.
• Given that the M1 money supply rate is in the system, enough evidence exists to suggest that the 3-month T-bill change 1-step Granger-causes the GDP rate.
• Given that the 3-month T-bill change is in the system, not enough evidence exists to suggest that the GDP rate 1-step Granger-causes the M1 money supply rate.
• Given that the GDP rate is in the system, enough evidence exists to suggest that the 3-month T-bill change 1-step Granger-causes the M1 money supply rate.
• Given that the M1 money supply rate is in the system, enough evidence exists to suggest that the GDP rate 1-step Granger-causes the 3-month T-bill change.
• Given that the GDP rate is in the system, not enough evidence exists to suggest that the M1 money supply rate 1-step Granger-causes the 3-month T-bill change.

Generate Forecasts at Command Line

Generate forecasts and approximate 95% forecast intervals from the estimated VAR(2) model for the next four years (16 quarters). For convenience, use the entire series as a presample for the forecasts. The forecast function discards all specified presample observations except for the required final two observations.

    Y = [GDPRate M1SLRate TB3MSDiff];
    [YF,YFMSE] = forecast(VAR2,16,Y);

    % Forecast standard errors: square roots of the MSE matrix diagonals.
    YFSE = cell2mat(cellfun(@(x)sqrt(diag(x)'),YFMSE, ...
        UniformOutput=false));
    UB = YF + 1.96*YFSE;    % upper bound of approximate 95% interval
    LB = YF - 1.96*YFSE;    % lower bound of approximate 95% interval
    datesF = DataTimeTable.Time(end) + calquarters(1:16);

    figure
    tiledlayout(3,1)
    for j = 1:VAR2.NumSeries
        nexttile
        % Plot the last 31 observations of series j.
        h1 = plot(DataTimeTable.Time(end-30:end),Y(end-30:end,j), ...
            Color=[.75,.75,.75]);
        hold on
        h2 = plot(datesF,YF(:,j),"r",LineWidth=2);
        h3 = plot(datesF,UB(:,j),"k--",LineWidth=1.5);
        plot(datesF,LB(:,j),"k--",LineWidth=1.5);
        % Connect the last observation to the first forecast and bounds.
        ct = [Y(end,j) YF(1,j);
              Y(end,j) LB(1,j);
              Y(end,j) UB(1,j);];
        plot([DataTimeTable.Time(end); datesF(1)],ct,Color=[.75,.75,.75])
        legend([h1 h2 h3],VAR2.SeriesNames(j),"Forecast", ...
            "Forecast interval",Location="northwest")
        hold off
    end
See Also

Apps
Econometric Modeler

Objects
varm

Functions
gctest | forecast

More About

• “Analyze Time Series Data Using Econometric Modeler” on page 4-2
• “Specifying Multivariate Lag Operator Polynomials and Coefficient Constraints Interactively” on page 4-50
• “Forecast VAR Model Conditional Responses” on page 9-66
Conduct Cointegration Test Using Econometric Modeler

This example models the annual Canadian inflation and interest rate series by using the Econometric Modeler app. The example performs the following actions in the app:

1. Test each raw series for stationarity.
2. Test for cointegration using the Engle-Granger cointegration test.
3. Test for cointegration among all possible cointegration ranks using the Johansen cointegration test.

The data set, which is stored in Data_Canada, contains annual Canadian inflation and interest rates from 1954 through 1994.

Load and Import Data into Econometric Modeler

At the command line, load the Data_Canada.mat data set.

    load Data_Canada

At the command line, open the Econometric Modeler app.

    econometricModeler

Alternatively, open the app from the apps gallery (see Econometric Modeler).

Import DataTimeTable into the app:

1. On the Econometric Modeler tab, in the Import section, click the Import button.
2. In the Import Data dialog box, in the Import? column, select the check box for the DataTimeTable variable.
3. Click Import.

The Canadian interest and inflation rate variables appear in the Time Series pane, and a time series plot of all the series appears in the Time Series Plot(INF_C) figure window.
Plot only the three interest rate series INT_L, INT_M, and INT_S together. In the Time Series pane, click INT_L and Ctrl+click INT_M and INT_S. Then, on the Plots tab, in the Plots section, click Time Series. The time series plot appears in the Time Series(INT_L) document.
The interest rate series each appear nonstationary, and they appear to move together with a mean-reverting spread. In other words, they appear to be cointegrated. To establish these properties, this example conducts statistical tests.

Conduct Stationarity Test

Assess whether each interest rate series is stationary by conducting Phillips-Perron unit root tests. For each series, assume a stationary AR(1) process with drift for the alternative hypothesis. You can confirm this property by viewing the autocorrelation and partial autocorrelation function plots.

Close all plots in the right pane, and perform the following procedure for each series INT_L, INT_M, and INT_S.

1. In the Time Series pane, click a series.
2. On the Econometric Modeler tab, in the Tests section, click New Test > Phillips-Perron Test.
3. On the PP tab, in the Parameters section, in the Number of Lags box, type 1, and in the Model list select Autoregressive with Drift.
4. In the Tests section, click Run Test. Test results for the selected series appear in the PP(series) tab.

Position the test results to view them simultaneously.
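The procedure above corresponds roughly to calling pptest at the command line. The following sketch assumes that the app's Autoregressive with Drift option maps to Model="ARD"; the loop and the results table are illustrative.

```matlab
load Data_Canada

names = ["INT_L" "INT_M" "INT_S"];
h = false(1,3);        % true = reject the unit root null
pValue = zeros(1,3);
for j = 1:3
    y = DataTimeTable.(names(j));
    % Phillips-Perron test, AR-with-drift alternative, one autocovariance lag.
    [h(j),pValue(j)] = pptest(y,Model="ARD",Lags=1);
end

results = table(h',pValue',RowNames=names,VariableNames=["h" "pValue"])
```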
All tests fail to reject the null hypothesis that the series contains a unit root.

Conduct Cointegration Tests

Econometric Modeler supports the Engle-Granger and Johansen cointegration tests. Before conducting a cointegration test, determine whether there is visual evidence of cointegration by regressing each interest rate on the other interest rates.

1. In the cointegrating relation, assign INT_L as the dependent variable and INT_M and INT_S as the independent variables. In the Time Series pane, click INT_L.
2. On the Econometric Modeler tab, in the Models section, click the arrow > MLR.
3. In the MLR Model Parameters dialog box, select the Include? check boxes of the time series INT_M and INT_S.
4. Click Estimate. The model variable MLR_INT_L appears in the Models pane, its value appears in the Preview pane, and its estimation summary appears in the Model Summary(MLR_INT_L) document.
5. Iterate steps 1 through 4 twice. For the first iteration, assign INT_M as the dependent variable and INT_L and INT_S as the independent variables. For the second iteration, assign INT_S as the dependent variable and INT_M and INT_L as the independent variables.
6. On each Model Summary(MLR_seriesName) tab, determine whether the residual plot appears stationary.
The residual series of the regression on INT_L appears to trend, and the other residual series appear stationary. For the Engle-Granger test, choose INT_S as the dependent variable.

Conduct Engle-Granger Test

The Engle-Granger test assesses whether one cointegrating relation exists by performing two univariate regressions: the cointegrating regression and the subsequent residual regression (for more details, see “Identifying Single Cointegrating Relations” on page 9-114). Therefore, to perform the cointegrating regression, the test requires:

• The response series that takes the role of the dependent variable in the first regression.
• Deterministic terms to include in the cointegrating regression.

Then, the test assesses whether the residuals resulting from the cointegrating regression are a unit root process. Available tests are the Augmented Dickey-Fuller (adftest) or Phillips-Perron (pptest) test. Both perform a residual regression to form the test statistic. Therefore, the test also requires:

• The unit root test to conduct, either the adftest or pptest function. The residual regression model for both tests is an AR model without deterministic terms. For more flexible models, such as AR models with drift, call the functions at the command line.
• Number of lagged residuals to include in the AR model.

Conduct the Engle-Granger test.

1. In the Time Series pane, click INT_L and Ctrl+click INT_M and INT_S.
2. In the Tests section, click New Test > Engle-Granger Test.
3. On the EGCI tab, in the Parameters section:
   a. In the Dependent Variable list, select INT_S.
   b. In the Residual Regression Form list, select PP for the Phillips-Perron unit root test.
   c. In the Number of Lags box, type 1.
4. In the Tests section, click Run Test. Test results appear in the EGCI document.
5. In the Tests section, click Run Test again. Test results and a plot of the cointegration relation for the largest rank appear in the EGCI tab.
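A command-line counterpart of these settings can be sketched with egcitest. The function treats the first column of the data matrix as the dependent variable of the cointegrating regression, so INT_S is placed first. The choice CReg="c" (an intercept in the cointegrating regression) is an assumption of this sketch; the steps above do not state which deterministic terms the app includes.

```matlab
load Data_Canada

% First column is the dependent variable of the cointegrating regression.
Y = [DataTimeTable.INT_S DataTimeTable.INT_M DataTimeTable.INT_L];

% Engle-Granger test with a Phillips-Perron residual regression and 1 lag.
[h,pValue,stat,cValue] = egcitest(Y,CReg="c",RReg="PP",Lags=1);

% h = 1 rejects the null hypothesis of no cointegration.
fprintf("h = %d, p-value = %.4f, statistic = %.4f, critical value = %.4f\n", ...
    h,pValue,stat,cValue)
```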
The test rejects the null hypothesis that the series do not exhibit cointegration. Although the test is suited to determine whether series are cointegrated, the results are limited to that. For example, the test does not give insight into the cointegrating rank, which is required to form a VEC model of the series. For more details, see “Identifying Single Cointegrating Relations” on page 9-114.

Johansen Test

The Johansen cointegration test runs a separate test for each possible cointegration rank 0 through m – 1, where m is the number of series. This characteristic makes the Johansen test better suited than the Engle-Granger test to determine the cointegrating rank for a VEC model of the series. Also, the Johansen test framework is multivariate, and, therefore, results are not relative to an arbitrary choice for a univariate regression response variable. For more details, see “Identifying Multiple Cointegrating Relations” on page 9-137.
The Johansen test requires:

• Deterministic terms to include in the cointegrating relation and the model in levels.
• Number of short-run lags in the VEC model of the series.

Because the raw series do not contain a linear trend, assume that the only deterministic term in the model is an intercept in the cointegrating relation (H1* Johansen form), and include 1 lagged difference term in the model.

1. With INT_L, INT_M, and INT_S selected in the Time Series pane, in the Tests section, click New Test > Johansen Test.
2. On the JCI tab, in the Parameters section, in the Number of Lags box, type 1, and in the Model list select H1*.
3. In the Tests section, click Run Test. Test results and a plot of the cointegration relation for the largest rank appear in the JCI tab.
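The same test settings can be sketched at the command line with jcitest, using the H1* form and one lagged difference. The output variable names are illustrative. By default, jcitest also prints a summary of the test for each rank in the Command Window.

```matlab
load Data_Canada

Y = [DataTimeTable.INT_L DataTimeTable.INT_M DataTimeTable.INT_S];

% Johansen test for ranks 0 through 2: intercept in the cointegrating
% relation only (H1* form), one lagged difference in the VEC model.
[h,pValue,stat,cValue] = jcitest(Y,Model="H1*",Lags=1);

% Rejection decisions, one per tested rank.
disp(h)
```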
Econometric Modeler conducts a separate test for each cointegration rank 0 through 2 (the number of series – 1). The test rejects the null hypothesis of no cointegration (Cointegration rank = 0), but fails to reject the null hypotheses of Cointegration rank ≤ 1 and Cointegration rank ≤ 2. The results suggest that the cointegration rank is 1.
See Also

Apps
Econometric Modeler

Objects
vecm

Functions
forecast

More About

• “Analyze Time Series Data Using Econometric Modeler” on page 4-2
• “Estimate Vector Error-Correction Model Using Econometric Modeler” on page 4-180
Estimate Vector Error-Correction Model Using Econometric Modeler

This example models the annual Canadian inflation and interest rate series by using the Econometric Modeler app. The example shows how to perform the following actions in the app:

1. Test each raw series for stationarity.
2. Test for cointegration and determine a Johansen cointegration form if cointegration is present.
3. Fit several competing vector error-correction (VEC) models, and choose the one with the best, parsimonious fit.
4. Diagnose each residual series.
5. Export the chosen model to the command line.

At the command line, the example uses the model to generate forecasts.

The data set, which is stored in Data_Canada, contains annual Canadian inflation and interest rates from 1954 through 1994.

Load and Import Data into Econometric Modeler

At the command line, load the Data_Canada.mat data set.

    load Data_Canada

At the command line, open the Econometric Modeler app.

    econometricModeler

Alternatively, open the app from the apps gallery (see Econometric Modeler).

Import DataTimeTable into the app:

1. On the Econometric Modeler tab, in the Import section, click the Import button.
2. In the Import Data dialog box, in the Import? column, select the check box for the DataTimeTable variable.
3. Click Import.

The Canadian interest and inflation rate variables appear in the Time Series pane, and a time series plot of all the series appears in the Time Series Plot(INF_C) figure window.
Plot only the three interest rate series INT_L, INT_M, and INT_S together. In the Time Series pane, click INT_L and Ctrl+click INT_M and INT_S. Then, on the Plots tab, in the Plots section, click Time Series. The time series plot appears in the Time Series(INT_L) document.
The interest rate series each appear nonstationary, and they appear to move together with a mean-reverting spread. In other words, they appear to be cointegrated. To establish these properties, this example conducts statistical tests.

Conduct Stationarity Test

Assess whether each interest rate series is stationary by conducting Phillips-Perron unit root tests. For each series, assume a stationary AR(1) process with drift for the alternative hypothesis. You can confirm this property by viewing the autocorrelation and partial autocorrelation function plots.

Close all plots in the right pane, and perform the following procedure for each series INT_L, INT_M, and INT_S.

1. In the Time Series pane, click a series.
2. On the Econometric Modeler tab, in the Tests section, click New Test > Phillips-Perron Test.
3. On the PP tab, in the Parameters section, in the Number of Lags box, type 1, and in the Model list select Autoregressive with Drift.
4. In the Tests section, click Run Test. Test results for the selected series appear in the PP(series) tab.

Position the test results to view them simultaneously.
All tests fail to reject the null hypothesis that the series contains a unit root.

Conduct Cointegration Test

To create and estimate a VEC model of unit root series, the series need to exhibit cointegration. Conduct a Johansen test. Because the raw series do not contain a linear trend, assume that the only deterministic term in the model is an intercept in the cointegrating relation (H1* Johansen form), and include 1 lagged difference term in the model.

1. In the Time Series pane, click INT_L and Ctrl+click INT_M and INT_S.
2. In the Tests section, click New Test > Johansen Test.
3. On the JCI tab, in the Parameters section, in the Number of Lags box, type 1. In the Model list, select H1*.
4. In the Tests section, click Run Test. Test results and a plot of the cointegration relation for the largest rank appear in the JCI tab.
Econometric Modeler conducts a separate test for each cointegration rank 0 through 2 (the number of series – 1). The test rejects the null hypothesis of no cointegration (Cointegration rank = 0), but fails to reject the null hypothesis of Cointegration rank ≤ 1. The conclusion is to set the cointegration rank of the VEC model to 1.

Estimate VEC Models

Estimate 3-D VEC(p) models of the interest rate series, with a cointegration rank of 1 and p = 1 and 2.

1. With INT_L, INT_M, and INT_S selected in the Time Series pane, in the Models section, click VEC.
2. In the VEC Model Parameters dialog box, in the Johansen Form list, select H1*. Fit the VEC(1) model by clicking Estimate. The model variable VEC appears in the Models pane, its value appears in the Preview pane, and its estimation summary appears in the Model Summary(VEC) document.
3. Repeat steps 1 and 2 for the short-run polynomial order p = 2. Similar to the VEC(1) estimation, the variable VEC2 appears in the Models pane, and its estimation summary appears in the Model Summary(VEC2) document.

You can view properties of an estimated model in the Preview pane by clicking the model in the Models pane. For example, click VEC.
Select Model with Best In-Sample Fit

The estimation summary in each Model Summary(VECp) tab contains a plot of fitted values and residuals with respect to the time series selected in the Time Series list, a standard statistical table of estimates and inferences, and a table of information criteria.

Compare the information criteria of both estimated models simultaneously by positioning the estimation summary documents so that they occupy the left and right sections of the right pane. The model with the lowest information criterion value has the best parsimonious fit.

The VEC(1) model VEC produces the lowest AIC and BIC values. Choose this model for further analysis.
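A comparable comparison can be sketched at the command line with vecm, estimate, and summarize. This is an illustrative sketch: the variable names are hypothetical, and because each fit here uses its own presample taken from the start of the data, the criteria need not match the values that the app reports.

```matlab
load Data_Canada

Y = [DataTimeTable.INT_L DataTimeTable.INT_M DataTimeTable.INT_S];
cointRank = 1;     % cointegration rank suggested by the Johansen test
maxP = 2;

aic = zeros(maxP,1);
bic = zeros(maxP,1);
for p = 1:maxP
    Mdl = vecm(3,cointRank,p);              % 3-D VEC(p) template
    EstMdl = estimate(Mdl,Y,Model="H1*");   % H1* Johansen form
    results = summarize(EstMdl);            % struct with AIC and BIC fields
    aic(p) = results.AIC;
    bic(p) = results.BIC;
end

criteria = table((1:maxP)',aic,bic,VariableNames=["p" "AIC" "BIC"])
```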
Check Goodness of Fit

Inspect the following VEC(1) plots of each residual series:

• Histograms, for center, normality, and outliers
• Quantile-quantile plots, for normality, skewness, and tails
• Autocorrelation function (ACF), for serial correlation
• ACF of squared residual series, for heteroscedasticity

This example diagnoses the residuals visually. Alternatively, you can conduct statistical tests to diagnose the residuals.

Dismiss the Model Summary(VEC2) document by clicking the close button on its tab.

In the Models pane, click VEC.

Plot separate residual histograms. On the Econometric Modeler tab, in the Diagnostics section, click Residual Diagnostics > Residual Histogram. Histograms of each residual series appear in the Histogram(VEC) document.
Each residual series appears approximately centered around 0 and approximately normal.

Plot separate residual quantile-quantile plots. With VEC selected in the Models pane, on the Econometric Modeler tab, in the Diagnostics section, click Residual Diagnostics > Residual QQ Plot. Quantile-quantile plots of each residual series appear in the QQPlot(VEC) document.

The residual series are slightly skewed left and have lighter tails than the normal distribution. This example proceeds without addressing possible skewness and light tails.

Plot separate ACF plots of each residual series. With VEC selected in the Models pane, on the Econometric Modeler tab, in the Diagnostics section, click Residual Diagnostics > Autocorrelation Function. ACF plots of each residual series appear in the ACF(VEC) document.

The residuals do not exhibit significant autocorrelation.

Plot separate ACF plots of each squared residual series. With VEC selected in the Models pane, on the Econometric Modeler tab, in the Diagnostics section, click Residual Diagnostics > Squared Residual Autocorrelation. ACF plots of each squared residual series appear in the ACF(VEC)2 document.
Each squared residual series has significant autocorrelations at early lags, which suggests that heteroscedasticity is present in all series. This example proceeds without addressing possible heteroscedasticity.

Export Model to Workspace

Export the model to the workspace.

1. With the VEC model selected in the Models pane, on the Econometric Modeler tab, in the Export section, click Export > Export Variables.
2. In the Export Variables dialog box, select the Select check box for INT_L, INT_M, and INT_S.
3. Click Export.
The variables INT_L, INT_M, INT_S, and VEC appear in the workspace.

Generate Forecasts at Command Line

Generate forecasts and approximate 95% forecast intervals from the estimated VEC(1) model for the next five years. For convenience, use the entire series as a presample for the forecasts. The forecast function discards all specified presample observations except for the required final observation.

    Y0 = [INT_L INT_M INT_S];
    [YF,YFMSE] = forecast(VEC,5,Y0);

    % Forecast standard errors: square roots of the MSE matrix diagonals.
    YFSE = cell2mat(cellfun(@(x)sqrt(diag(x)'),YFMSE,UniformOutput=false));
    UB = YF + 1.96*YFSE;    % upper bound of approximate 95% interval
    LB = YF - 1.96*YFSE;    % lower bound of approximate 95% interval
    datesF = DataTimeTable.Time(end) + calyears(1:5);

    figure
    tiledlayout(3,1)
    for j = 1:VEC.NumSeries
        nexttile
        h1 = plot(DataTimeTable.Time,Y0(:,j),Color=[.75,.75,.75]);
        hold on
        h2 = plot(datesF,YF(:,j),"r",LineWidth=2);
        h3 = plot(datesF,UB(:,j),"k--",LineWidth=1.5);
        plot(datesF,LB(:,j),"k--",LineWidth=1.5);
        % Connect the last observation to the first forecast and bounds.
        ct = [Y0(end,j) YF(1,j);
              Y0(end,j) LB(1,j);
              Y0(end,j) UB(1,j);];
        plot([DataTimeTable.Time(end); datesF(1)],ct,Color=[.75,.75,.75])
        legend([h1 h2 h3],VEC.SeriesNames(j),"Forecast", ...
            "Forecast interval",Location="northwest")
        hold off
    end
See Also

Apps
Econometric Modeler

Objects
vecm

Functions
forecast

More About

• “Analyze Time Series Data Using Econometric Modeler” on page 4-2
• “Conduct Cointegration Test Using Econometric Modeler” on page 4-170
• “Specifying Multivariate Lag Operator Polynomials and Coefficient Constraints Interactively” on page 4-50
Compare Predictive Performance After Creating Models Using Econometric Modeler

This example shows how to choose lags for an ARIMA model by comparing the AIC values of estimated models using the Econometric Modeler app. The example also shows how to compare the predictive performance of several models that have the best in-sample fits at the command line. The data set Data_Airline.mat contains monthly counts of airline passengers.

Import Data into Econometric Modeler

At the command line, load the Data_Airline.mat data set.

    load Data_Airline

To compare predictive performance later, reserve the last two years of data as a holdout sample.

    fHorizon = 24;
    HoldoutTimeTable = DataTimeTable((end - fHorizon + 1):end,:);
    DataTimeTable((end - fHorizon + 1):end,:) = [];

At the command line, open the Econometric Modeler app.

    econometricModeler

Alternatively, open the app from the apps gallery (see Econometric Modeler).

Import DataTimeTable into the app:

1. On the Econometric Modeler tab, in the Import section, click the Import button.
2. In the Import Data dialog box, in the Import? column, select the check box for the DataTimeTable variable.
3. Click Import.

The variable PSSG appears in the Time Series pane, its value appears in the Preview pane, and its time series plot appears in the Time Series Plot(PSSG) figure window.
The series exhibits a seasonal trend, serial correlation, and possible exponential growth. For an interactive analysis of serial correlation, see “Detect Serial Correlation Using Econometric Modeler App” on page 4-71.

Remove Exponential Trend

Address the exponential trend by applying the log transform to PSSG.

1. In the Time Series pane, select PSSG.
2. On the Econometric Modeler tab, in the Transforms section, click Log.

The transformed variable PSSGLog appears in the Time Series pane, its value appears in the Preview pane, and its time series plot appears in the Time Series Plot(PSSGLog) figure window.
The exponential growth appears to be removed from the series.
Compare In-Sample Model Fits
Box, Jenkins, and Reinsel suggest a SARIMA(0,1,1)×(0,1,1)₁₂ model without a constant for PSSGLog [1] (for more details, see “Estimate Multiplicative ARIMA Model Using Econometric Modeler App” on page 4-131). However, consider all combinations of monthly SARIMA models that include up to two seasonal and nonseasonal MA lags. Specifically, iterate the following steps for each of the nine models of the form SARIMA(0,1,q)×(0,1,q₁₂)₁₂, where q ∈ {0,1,2} and q₁₂ ∈ {0,1,2}.
1. For the first iteration:
   a. Let q = q₁₂ = 0.
   b. With PSSGLog selected in the Time Series pane, click the Econometric Modeler tab. In the Models section, click the arrow to display the models gallery.
   c. In the models gallery, in the ARMA/ARIMA Models section, click SARIMA.
   d. In the SARIMA Model Parameters dialog box, on the Lag Order tab:
      • Nonseasonal section
        i. Set Degrees of Integration to 1.
        ii. Set Moving Average Order to 0.
        iii. Clear the Include Constant Term check box.
      • Seasonal section
        i. Set Period to 12 to indicate monthly data.
        ii. Set Moving Average Order to 0.
        iii. Select the Include Seasonal Difference check box.
   e. Click Estimate.
2. Rename the new model variable.
   a. In the Models pane, click the new model variable twice to select its name.
   b. Enter SARIMA01qx01q12. For example, when q = q₁₂ = 0, rename the variable to SARIMA010x010.
3. In the Model Summary(SARIMA01qx01q12) document, in the Goodness of Fit table, note the AIC value. For example, for the model variable SARIMA010x010, the AIC is in this figure.
4. For the next iteration, choose values of q and q₁₂. For example, q = 0 and q₁₂ = 1 for the second iteration.
5. In the Models pane, right-click SARIMA01qx01q12. In the context menu, select Modify to open the SARIMA Model Parameters dialog box with the current settings for the selected model.
6. In the SARIMA Model Parameters dialog box:
   a. In the Nonseasonal section, set Moving Average Order to q.
   b. In the Seasonal section, set Moving Average Order to q₁₂.
   c. Click Estimate.
After you complete the steps, the Models pane contains nine estimated models named SARIMA010x010 through SARIMA012x012. The resulting AIC values are in this table.
Model                      Variable Name    AIC
SARIMA(0,1,0)×(0,1,0)₁₂    SARIMA010x010    -410.3520
SARIMA(0,1,0)×(0,1,1)₁₂    SARIMA010x011    -443.0009
SARIMA(0,1,0)×(0,1,2)₁₂    SARIMA010x012    -441.0010
SARIMA(0,1,1)×(0,1,0)₁₂    SARIMA011x010    -422.8680
SARIMA(0,1,1)×(0,1,1)₁₂    SARIMA011x011    -452.0039
SARIMA(0,1,1)×(0,1,2)₁₂    SARIMA011x012    -450.0605
SARIMA(0,1,2)×(0,1,0)₁₂    SARIMA012x010    -420.9760
SARIMA(0,1,2)×(0,1,1)₁₂    SARIMA012x011    -450.0087
SARIMA(0,1,2)×(0,1,2)₁₂    SARIMA012x012    -448.0650
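The nine-model comparison can also be sketched at the command line. The following is a minimal sketch, not code generated by the app. It assumes that DataTimeTable.PSSG contains only the observations you want to fit, and the variable names y, aic, args, and numParams are illustrative. The parameter count (MA coefficients, seasonal MA coefficients, and the innovation variance) is an assumption about how the fit statistic is tallied, so the values might differ slightly from those in the table.

```matlab
% Sketch only: uses whatever observations DataTimeTable currently contains.
load Data_Airline
y = log(DataTimeTable.PSSG);
aic = nan(3,3);                      % rows index q, columns index q12
for q = 0:2
    for q12 = 0:2
        % Build the name-value list, adding MA lags only when they exist.
        args = {'Constant',0,'D',1,'Seasonality',12};
        if q > 0
            args = [args {'MALags',1:q}];
        end
        if q12 > 0
            args = [args {'SMALags',12*(1:q12)}];
        end
        Mdl = arima(args{:});
        [~,~,logL] = estimate(Mdl,y,'Display','off');
        % Assumed count: MA terms, seasonal MA terms, innovation variance.
        numParams = q + q12 + 1;
        aic(q+1,q12+1) = aicbic(logL,numParams);
    end
end
% Locate the smallest AIC and convert the position back to (q,q12).
[~,idx] = min(aic(:));
[r,c] = ind2sub(size(aic),idx);
bestQ = r - 1;
bestQ12 = c - 1;
```

Storing the AIC values in a 3-by-3 matrix keeps the layout parallel to the table, with q along the rows and q12 along the columns.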
The three models yielding the lowest AIC values are SARIMA(0,1,1)×(0,1,1)₁₂, SARIMA(0,1,1)×(0,1,2)₁₂, and SARIMA(0,1,2)×(0,1,1)₁₂. These models have the best parsimonious in-sample fit.
Export Best Models to Workspace
Export the models with the best in-sample fits.
1. On the Econometric Modeler tab, in the Export section, click the Export button.
2. In the Export Variables dialog box, in the Models column, select the check boxes for SARIMA011x011, SARIMA011x012, and SARIMA012x011. Clear the check box for any other selected models.
3. Click Export.
The arima model objects SARIMA011x011, SARIMA011x012, and SARIMA012x011 appear in the MATLAB Workspace.
Estimate Forecasts
At the command line, estimate two-year-ahead forecasts for each model.
f5 = forecast(SARIMA011x011,fHorizon);
f6 = forecast(SARIMA011x012,fHorizon);
f8 = forecast(SARIMA012x011,fHorizon);
f5, f6, and f8 are 24-by-1 vectors containing the forecasts.
Compare Prediction Mean Square Errors
Estimate the prediction mean square error (PMSE) for each of the forecast vectors.
logPSSGHO = log(HoldoutTimeTable.Variables);
pmse5 = mean((logPSSGHO - f5).^2);
pmse6 = mean((logPSSGHO - f6).^2);
pmse8 = mean((logPSSGHO - f8).^2);
Identify the model yielding the lowest PMSE.
[~,bestIdx] = min([pmse5 pmse6 pmse8],[],2)
The SARIMA(0,1,1)×(0,1,1)₁₂ model performs the best in-sample and out-of-sample.
References
[1] Box, George E. P., Gwilym M. Jenkins, and Gregory C. Reinsel. Time Series Analysis: Forecasting and Control. 3rd ed. Englewood Cliffs, NJ: Prentice Hall, 1994.
See Also
Apps
Econometric Modeler
Objects
arima
Functions
estimate | infer
More About
• “Assess Predictive Performance” on page 3-89
• “Estimate Multiplicative ARIMA Model Using Econometric Modeler App” on page 4-131
• “Share Results of Econometric Modeler App Session” on page 4-237
Estimate ARIMAX Model Using Econometric Modeler App
This example shows how to specify and estimate an ARIMAX model using the Econometric Modeler app. The data set, which is stored in Data_CreditDefaults.mat, contains annual investment-grade corporate bond default rates, among other predictors, from 1984 through 2004. Consider modeling corporate bond default rates as a linear, dynamic function of the other time series in the data set.
Import Data into Econometric Modeler
At the command line, load the Data_CreditDefaults.mat data set.
load Data_CreditDefaults
For more details on the data set, enter Description at the command line.
At the command line, open the Econometric Modeler app.
econometricModeler
Alternatively, open the app from the apps gallery (see Econometric Modeler).
Import DataTimeTable into the app:
1. On the Econometric Modeler tab, in the Import section, click the Import button.
2. In the Import Data dialog box, in the Import? column, select the check box for the DataTimeTable variable.
3. Click Import.
The variables, including IGD, appear in the Time Series pane, and a time series plot containing all the series appears in the Time Series Plot(AGE) figure window.
Assess Stationarity of Dependent Variable
In the Time Series pane, double-click IGD. The value of IGD appears in the Preview pane, and a time series plot for IGD appears in the Time Series Plot(IGD) figure window.
IGD appears to be stationary. Assess whether IGD has a unit root by conducting a Phillips-Perron test:
1. On the Econometric Modeler tab, in the Tests section, click New Test > Phillips-Perron Test.
2. On the PP tab, in the Parameters section, set Number of Lags to 1.
3. In the Tests section, click Run Test.
The test results appear in the Results table of the PP(IGD) document.
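A command-line counterpart of this test can be sketched with pptest. This is a minimal sketch; it sets only the number of lags and leaves every other option at its pptest default, which is an assumption about how the app configures the test.

```matlab
% Sketch only: Phillips-Perron test of IGD with one lag.
load Data_CreditDefaults
[h,pValue,stat,cValue] = pptest(DataTimeTable.IGD,'Lags',1);
% h = 1 indicates rejection of the unit-root null at the default level.
```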
The test rejects the null hypothesis that IGD contains a unit root.
Inspect Correlation and Collinearity Among Variables
Plot the pairwise correlations between variables.
1. Select all variables in the Time Series pane by clicking AGE, then press Shift and click SPR.
2. Click the Plots tab, then click Correlations.
A correlations plot appears in the Correlations(AGE) figure window.
All predictors appear weakly associated with IGD. You can test whether the correlation coefficients are significant by using corrplot at the command line.
Assess whether any variables are collinear by performing Belsley collinearity diagnostics:
1. In the Time Series pane, select all variables.
2. Click the Econometric Modeler tab. Then, in the Tests section, click New Test > Belsley Collinearity Diagnostics.
Tabular results appear in the Collinearity(AGE) document.
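The Belsley diagnostics can also be sketched at the command line with collintest. This is a minimal sketch that passes all variables in DataTimeTable as a numeric matrix; the names X, names, and maxCondIdx are illustrative, and the comparison against 30 mirrors the tolerance quoted in this example.

```matlab
% Sketch only: Belsley collinearity diagnostics on all imported series.
load Data_CreditDefaults
X = DataTimeTable.Variables;                      % numeric data matrix
names = DataTimeTable.Properties.VariableNames;   % labels for the output
[sValue,condIdx,VarDecomp] = collintest(X,'VarNames',names,'Display','off');
maxCondIdx = max(condIdx);
exceedsTolerance = maxCondIdx > 30;               % tolerance quoted in the text
```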
None of the condition indices are greater than the condition-index tolerance (30). Therefore, the variables do not exhibit multicollinearity.
Specify and Estimate ARIMAX Model
Consider an ARIMAX(0,0,1) model for IGD containing all predictors. Specify and estimate the model.
1. In the Time Series pane, click IGD.
2. Click the Econometric Modeler tab. Then, in the Models section, click the arrow to display the models gallery.
3. In the models gallery, in the ARMA/ARIMA Models section, click ARIMAX.
4. In the ARIMAX Model Parameters dialog box, on the Lag Order tab, set Moving Average Order to 1.
5. In the Predictors section, select the Include? check box for each time series.
6. Click Estimate.
The model variable ARIMAX_IGD appears in the Models pane, its value appears in the Preview pane, and its estimation summary appears in the Model Summary(ARIMAX_IGD) document.
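An equivalent model can be sketched at the command line with arima and estimate. This is a minimal sketch, not the app's generated code; it treats every variable in DataTimeTable other than IGD as a predictor, and the names predNames, X, and EstMdl are illustrative.

```matlab
% Sketch only: ARIMAX(0,0,1) model of IGD on all other series.
load Data_CreditDefaults
y = DataTimeTable.IGD;
% Use every variable except the response as a predictor.
predNames = setdiff(DataTimeTable.Properties.VariableNames,{'IGD'},'stable');
X = DataTimeTable{:,predNames};
Mdl = arima('MALags',1);              % constant and MA(1) term, no AR terms
EstMdl = estimate(Mdl,y,'X',X);       % prints the estimation summary
```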
At a 0.10 significance level, all predictors and the MA coefficient are significant. Close all figure windows and documents.
Check Goodness of Fit
Check that the residuals are normally distributed and uncorrelated by plotting a histogram, quantile-quantile plot, and ACF of the residuals.
1. In the Models pane, select ARIMAX_IGD.
2. On the Econometric Modeler tab, in the Diagnostics section, click Residual Diagnostics > Residual Histogram.
3. Click Residual Diagnostics > Residual Q-Q Plot.
4. Click Residual Diagnostics > Autocorrelation Function.
5. In the right pane, drag the Histogram(ARIMAX_IGD) and QQPlot(ARIMAX_IGD) figure windows so that they occupy the upper two quadrants, and drag the ACF(ARIMAX_IGD) figure window so that it occupies the lower two quadrants.
The residual histogram and quantile-quantile plots suggest that the residuals might not be normally distributed. According to the ACF plot, the residuals do not exhibit serial correlation. Standard inferences rely on the normality of the residuals. To remedy nonnormality, you can try transforming the response, then estimating the model using the transformed response.
See Also
Apps
Econometric Modeler
Objects
arima
Functions
estimate | pptest | corrplot
More About
• “Forecast IGD Rate from ARX Model” on page 7-129
• “Analyze Time Series Data Using Econometric Modeler” on page 4-2
Estimate Regression Model with ARMA Errors Using Econometric Modeler App
This example shows how to specify and estimate a regression model with ARMA errors using the Econometric Modeler app. The data set, which is stored in Data_USEconModel.mat, contains the US personal consumption expenditures measured quarterly, among other series. Consider modeling the US personal consumption expenditures (PCEC, in $ billions) as a linear function of the effective federal funds rate (FEDFUNDS), unemployment rate (UNRATE), and real gross domestic product (GDP, in $ billions with respect to the year 2000).
Import Data into Econometric Modeler
At the command line, load the Data_USEconModel.mat data set.
load Data_USEconModel
Convert the federal funds and unemployment rates from percents to decimals.
DataTimeTable.UNRATE = 0.01*DataTimeTable.UNRATE;
DataTimeTable.FEDFUNDS = 0.01*DataTimeTable.FEDFUNDS;
Convert the nominal GDP to real GDP by dividing all values by the GDP deflator (GDPDEF) and scaling the result by 100. Create a column in DataTimeTable for the real GDP series.
DataTimeTable.RealGDP = 100*DataTimeTable.GDP./DataTimeTable.GDPDEF;
At the command line, open the Econometric Modeler app.
econometricModeler
Alternatively, open the app from the apps gallery (see Econometric Modeler).
Import DataTimeTable into the app:
1. On the Econometric Modeler tab, in the Import section, click the Import button.
2. In the Import Data dialog box, in the Import? column, select the check box for the DataTimeTable variable.
3. Click Import.
All time series variables in DataTimeTable appear in the Time Series pane, and a time series plot of the series appears in the Time Series Plot(COE) figure window.
Plot the Series
Plot the PCEC, RealGDP, FEDFUNDS, and UNRATE series on separate plots.
1. In the Time Series pane, double-click PCEC.
2. Repeat step 1 for RealGDP, FEDFUNDS, and UNRATE.
3. In the right pane, drag the Time Series Plot(PCEC) figure window to the top so that it occupies the first two quadrants.
4. Drag the Time Series Plot(RealGDP) figure window to the first quadrant.
5. Drag the Time Series Plot(UNRATE) figure window to the third quadrant.
The PCEC and RealGDP series appear to have an exponential trend. The UNRATE and FEDFUNDS series appear to have a stochastic trend.
Right-click the tab for any figure window, then select Close All to close all the figure windows.
Assess Collinearity Among Series
Check whether the series are collinear by performing Belsley collinearity diagnostics.
1. In the Time Series pane, select PCEC. Then, press Ctrl and click to select RealGDP, FEDFUNDS, and UNRATE.
2. On the Econometric Modeler tab, in the Tests section, click New Test > Belsley Collinearity Diagnostics.
The Belsley collinearity diagnostics results appear in the Collinearity(FEDFUNDS) document.
All condition indices are below the default condition-index tolerance, which is 30. The time series do not appear to be collinear.
Specify and Estimate Linear Model
Specify a linear model in which PCEC is the response and RealGDP, FEDFUNDS, and UNRATE are predictors.
1. In the Time Series pane, select PCEC.
2. Click the Econometric Modeler tab. Then, in the Models section, click the arrow to display the models gallery.
3. In the models gallery, in the Regression Models section, click MLR.
4. In the MLR Model Parameters dialog box, in the Predictors section, select the Include? check box for the FEDFUNDS, RealGDP, and UNRATE time series.
5. Click Estimate.
The model variable MLR_PCEC appears in the Models pane, its value appears in the Preview pane, and its estimation summary appears in the Model Summary(MLR_PCEC) document.
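The same linear model can be sketched at the command line with fitlm. This is a minimal sketch; it repeats the data preparation from the start of this example so that it runs on its own, and the names Tbl and MdlLM are illustrative. Converting the timetable to a table before calling fitlm is a cautious choice, not a documented requirement.

```matlab
% Sketch only: multiple linear regression of PCEC on three predictors.
load Data_USEconModel
DataTimeTable.UNRATE = 0.01*DataTimeTable.UNRATE;
DataTimeTable.FEDFUNDS = 0.01*DataTimeTable.FEDFUNDS;
DataTimeTable.RealGDP = 100*DataTimeTable.GDP./DataTimeTable.GDPDEF;
% Drop the row times so that fitlm receives an ordinary table.
Tbl = timetable2table(DataTimeTable,'ConvertRowTimes',false);
MdlLM = fitlm(Tbl,'PCEC ~ RealGDP + FEDFUNDS + UNRATE');
res = MdlLM.Residuals.Raw;            % raw residuals for later diagnostics
```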
In the Model Summary(MLR_PCEC) document, the residual plot suggests that the standard linear model assumption of uncorrelated errors is violated. The residuals appear autocorrelated, nonstationary, and possibly heteroscedastic.
Stabilize Variables
To stabilize the residuals, stabilize the response and predictor series by converting the PCEC and RealGDP prices to returns, and by applying the first difference to FEDFUNDS and UNRATE.
Convert PCEC and RealGDP prices to returns:
1. In the Time Series pane, select the PCEC time series, then press Ctrl and select the RealGDP time series.
2. On the Econometric Modeler tab, in the Transforms section, click Log, then click Difference. In the Time Series pane, variables representing the logged and differenced time series appear.
3. In the Time Series pane, rename the PCECLogDiff and RealGDPLogDiff variables. Click the PCECLogDiff variable twice to select its name and enter PCECReturns. Click the RealGDPLogDiff variable twice to select its name and enter RealGDPReturns.
Apply the first difference to FEDFUNDS and UNRATE:
1. In the Time Series pane, select the FEDFUNDS time series, then press Ctrl and select the UNRATE time series.
2. On the Econometric Modeler tab, in the Transforms section, click Difference. In the Time Series pane, variables representing the first difference of the time series appear.
3. Close all figure windows and documents.
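The transformations above can be sketched at the command line with log and diff. This is a minimal sketch that repeats the data preparation so that it runs on its own. Note that diff returns a series one observation shorter than its input, so the workspace variables might not align row for row with the series that the app creates.

```matlab
% Sketch only: returns for PCEC and RealGDP, first differences for the rates.
load Data_USEconModel
DataTimeTable.UNRATE = 0.01*DataTimeTable.UNRATE;
DataTimeTable.FEDFUNDS = 0.01*DataTimeTable.FEDFUNDS;
DataTimeTable.RealGDP = 100*DataTimeTable.GDP./DataTimeTable.GDPDEF;
PCECReturns = diff(log(DataTimeTable.PCEC));        % log, then difference
RealGDPReturns = diff(log(DataTimeTable.RealGDP));
FEDFUNDSDiff = diff(DataTimeTable.FEDFUNDS);        % first difference
UNRATEDiff = diff(DataTimeTable.UNRATE);
```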
Respecify and Estimate Linear Model
Respecify the linear model, but use the stabilized series instead.
1. In the Time Series pane, select PCECReturns.
2. On the Econometric Modeler tab, in the Models section, click the arrow to display the models gallery.
3. In the models gallery, in the Regression Models section, click MLR.
4. In the MLR Model Parameters dialog box, in the Predictors section, select the Include? check box for the FEDFUNDSDiff, RealGDPReturns, and UNRATEDiff time series.
5. Click Estimate.
The model variable MLR_PCECReturns appears in the Models pane, its value appears in the Preview pane, and its estimation summary appears in the Model Summary(MLR_PCECReturns) document.
The residual plot suggests that the residuals are autocorrelated.
Check Goodness of Fit of Linear Model
Assess whether the residuals are normally distributed and autocorrelated by generating quantile-quantile and ACF plots.
Create a quantile-quantile plot of the MLR_PCECReturns model residuals:
1. In the Models pane, select the MLR_PCECReturns model.
2. On the Econometric Modeler tab, in the Diagnostics section, click Residual Diagnostics > Residual Q-Q Plot.
The residuals are skewed to the right.
Plot the ACF of the residuals:
1. In the Models pane, select the MLR_PCECReturns model.
2. On the Econometric Modeler tab, in the Diagnostics section, click Residual Diagnostics > Autocorrelation Function.
3. On the ACF tab, set Number of Lags to 40.
The plot shows autocorrelation in the first 34 lags.
Specify and Estimate Regression Model with ARMA Errors
Attempt to remedy the autocorrelation in the residuals by specifying a regression model with ARMA(1,1) errors for PCECReturns.
1. In the Time Series pane, select PCECReturns.
2. Click the Econometric Modeler tab. Then, in the Models section, click the arrow to display the models gallery.
3. In the models gallery, in the Regression Models section, click RegARMA.
4. In the regARMA Model Parameters dialog box:
   a. On the Lag Order tab:
      i. Set Autoregressive Order to 1.
      ii. Set Moving Average Order to 1.
   b. In the Predictors section, select the Include? check box for the FEDFUNDSDiff, RealGDPReturns, and UNRATEDiff time series.
   c. Click Estimate.
The model variable RegARMA_PCECReturns appears in the Models pane, its value appears in the Preview pane, and its estimation summary appears in the Model Summary(RegARMA_PCECReturns) document.
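A comparable model can be sketched at the command line with regARIMA. This is a minimal sketch, not the app's generated code; it rebuilds the stabilized series so that it runs on its own and removes rows with missing values explicitly, which is an assumption about how the app treats them. The names y, X, keep, and EstMdl are illustrative.

```matlab
% Sketch only: regression model with ARMA(1,1) errors for PCEC returns.
load Data_USEconModel
realGDP = 100*DataTimeTable.GDP./DataTimeTable.GDPDEF;
y = diff(log(DataTimeTable.PCEC));
X = [diff(0.01*DataTimeTable.FEDFUNDS), ...
     diff(log(realGDP)), ...
     diff(0.01*DataTimeTable.UNRATE)];
% Keep only the rows in which the response and all predictors are observed.
keep = all(~isnan([y X]),2);
y = y(keep);
X = X(keep,:);
Mdl = regARIMA(1,0,1);                % AR(1) and MA(1) error terms
EstMdl = estimate(Mdl,y,'X',X);       % prints the estimation summary
```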
The t statistics suggest that all coefficients are significant, except for the coefficient of UNRATEDiff. The residuals appear to fluctuate around y = 0 without autocorrelation.
Check Goodness of Fit of ARMA Error Model
Assess whether the residuals of the RegARMA_PCECReturns model are normally distributed and autocorrelated by generating quantile-quantile and ACF plots.
Create a quantile-quantile plot of the RegARMA_PCECReturns model residuals:
1. In the Models pane, select the RegARMA_PCECReturns model.
2. On the Econometric Modeler tab, in the Diagnostics section, click Residual Diagnostics > Residual Q-Q Plot.
The residuals appear approximately normally distributed.
Plot the ACF of the residuals:
1. In the Models pane, select the RegARMA_PCECReturns model.
2. On the Econometric Modeler tab, in the Diagnostics section, click Residual Diagnostics > Autocorrelation Function.
The first autocorrelation lag is significant. From here, you can estimate multiple models that differ by the number of autoregressive and moving average polynomial orders in the ARMA error model. Then, choose the model with the lowest fit statistic. Or, you can check the predictive performance of the models by comparing forecasts to out-of-sample data.
See Also
Apps
Econometric Modeler
Objects
regARIMA | LinearModel
Functions
estimate | fitlm | autocorr | collintest
More About
• “Estimate Regression Model with ARIMA Errors” on page 5-85
• “Analyze Time Series Data Using Econometric Modeler” on page 4-2
• “Compare Predictive Performance After Creating Models Using Econometric Modeler” on page 4-193
Compare Conditional Variance Model Fit Statistics Using Econometric Modeler App
This example shows how to specify and fit GARCH, EGARCH, and GJR models to data using the Econometric Modeler app. Then, the example determines the model that fits the data best by comparing fit statistics. The data set, which is stored in Data_FXRates.mat, contains foreign exchange rates measured daily from 1979–1998. Consider creating a predictive model for the Swiss franc to US dollar exchange rate (CHF).
Import Data into Econometric Modeler
At the command line, load the Data_FXRates.mat data set.
load Data_FXRates
At the command line, open the Econometric Modeler app.
econometricModeler
Alternatively, open the app from the apps gallery (see Econometric Modeler).
Import DataTimeTable into the app:
1. On the Econometric Modeler tab, in the Import section, click the Import button.
2. In the Import Data dialog box, in the Import? column, select the check box for the DataTimeTable variable.
3. Click Import.
All time series variables in DataTimeTable appear in the Time Series pane, and a time series plot of all the series appears in the Time Series Plot(AUD) figure window.
Plot the Series
Plot the Swiss franc exchange rates by double-clicking the CHF time series in the Time Series pane.
Highlight periods of recession:
1. In the Time Series Plot(CHF) figure window, right-click the plot.
2. In the context menu, select Show Recessions.
The CHF series appears to have a stochastic trend.
Stabilize the Series
Stabilize the Swiss franc exchange rates by applying the first difference to CHF.
1. In the Time Series pane, select CHF.
2. On the Econometric Modeler tab, in the Transforms section, click Difference.
3. Highlight periods of recession:
   a. In the Time Series Plot(CHFDiff) figure window, right-click the plot.
   b. In the context menu, select Show Recessions.
A variable named CHFDiff, representing the differenced series, appears in the Time Series pane, its value appears in the Preview pane, and its time series plot appears in the Time Series Plot(CHFDiff) figure window.
The series appears to be stable, but it exhibits volatility clustering.
Assess Presence of Conditional Heteroscedasticity
Test the stable Swiss franc exchange rate series for conditional heteroscedasticity by conducting Engle's ARCH test. Run the test assuming an ARCH(1) alternative model, then run the test again assuming an ARCH(2) alternative model. Maintain an overall significance level of 0.05 by decreasing the significance level of each test to 0.05/2 = 0.025.
1. In the Time Series pane, select CHFDiff.
2. On the Econometric Modeler tab, in the Tests section, click New Test > Engle's ARCH Test.
3. On the ARCH tab, in the Parameters section, set Number of Lags to 1.
4. Set Significance Level to 0.025.
5. In the Tests section, click Run Test.
6. Repeat steps 3 through 5, but set Number of Lags to 2 instead.
The test results appear in the Results table of the ARCH(CHFDiff) document.
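Both tests can be sketched at the command line in a single call to archtest. This is a minimal sketch. Because archtest expects a residual series, the sketch centers the differenced series first; whether the app applies the same centering is an assumption. The names CHFDiff and res are illustrative workspace variables.

```matlab
% Sketch only: Engle's ARCH test with ARCH(1) and ARCH(2) alternatives.
load Data_FXRates
CHFDiff = diff(DataTimeTable.CHF);
CHFDiff = CHFDiff(~isnan(CHFDiff));   % drop missing values, if any
res = CHFDiff - mean(CHFDiff);        % assumed centering before the test
% One test per lag, each at the adjusted 0.025 significance level.
[h,pValue] = archtest(res,'Lags',[1 2],'Alpha',0.025);
```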
The tests reject the null hypothesis of no ARCH effects against the alternative models. This result suggests specifying a conditional variance model for CHFDiff containing at least two ARCH lags. Conditional variance models with two ARCH lags are locally equivalent to models with one ARCH and one GARCH lag. Consider GARCH(1,1), EGARCH(1,1), and GJR(1,1) models for CHFDiff.
Estimate GARCH Model
Specify a GARCH(1,1) model and fit it to the CHFDiff series.
1. In the Time Series pane, select the CHFDiff time series.
2. Click the Econometric Modeler tab. Then, in the Models section, click the arrow to display the models gallery.
3. In the models gallery, in the GARCH Models section, click GARCH.
4. In the GARCH Model Parameters dialog box, on the Lag Order tab:
   a. Set GARCH Degree to 1.
   b. Set ARCH Degree to 1.
   c. Click Estimate.
The model variable GARCH_CHFDiff appears in the Models pane, its value appears in the Preview pane, and its estimation summary appears in the Model Summary(GARCH_CHFDiff) document.
Specify and Estimate EGARCH Model
Specify an EGARCH(1,1) model containing a leverage term at the first lag, and fit the model to the CHFDiff series.
1. In the Time Series pane, select the CHFDiff time series.
2. On the Econometric Modeler tab, in the Models section, click the arrow to display the models gallery.
3. In the models gallery, in the GARCH Models section, click EGARCH.
4. In the EGARCH Model Parameters dialog box, on the Lag Order tab:
   a. Set GARCH Degree to 1.
   b. Set ARCH Degree to 1. Consequently, the app includes a corresponding leverage lag. You can remove or adjust leverage lags on the Lag Vector tab.
5. Click Estimate.
The model variable EGARCH_CHFDiff appears in the Models pane, its value appears in the Preview pane, and its estimation summary appears in the Model Summary(EGARCH_CHFDiff) document.
Specify and Estimate GJR Model
Specify a GJR(1,1) model containing a leverage term at the first lag, and fit the model to the CHFDiff series.
1. In the Time Series pane, select the CHFDiff time series.
2. On the Econometric Modeler tab, in the Models section, click the arrow to display the models gallery.
3. In the models gallery, in the GARCH Models section, click GJR.
4. In the GJR Model Parameters dialog box, on the Lag Order tab:
   a. Set GARCH Degree to 1.
   b. Set ARCH Degree to 1. Consequently, the app includes a corresponding leverage lag. You can remove or adjust leverage lags on the Lag Vector tab.
   c. Click Estimate.
The model variable GJR_CHFDiff appears in the Models pane, its value appears in the Preview pane, and its estimation summary appears in the Model Summary(GJR_CHFDiff) document.
Choose Model
Choose the model with the best parsimonious in-sample fit. Base your decision on the model yielding the minimal Akaike information criterion (AIC).
The table shows the AIC fit statistics of the estimated models, as given in the Goodness of Fit section of the estimation summary of each model.
Model          AIC
GARCH(1,1)     -28730
EGARCH(1,1)    -28726
GJR(1,1)       -28737
The GJR(1,1) model yields the minimal AIC value. Therefore, it has the best parsimonious in-sample fit of all the estimated models.
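The three fits and their AIC values can be sketched at the command line. This is a minimal sketch, not the app's generated code. The parameter counts (three for GARCH(1,1), and four for EGARCH(1,1) and GJR(1,1) because each adds one leverage coefficient) are stated assumptions, and the resulting values might differ slightly from those in the table.

```matlab
% Sketch only: compare GARCH, EGARCH, and GJR fits by AIC.
load Data_FXRates
y = diff(DataTimeTable.CHF);
y = y(~isnan(y));                           % drop missing values, if any
Mdls = {garch(1,1), egarch(1,1), gjr(1,1)};
modelNames = ["GARCH(1,1)" "EGARCH(1,1)" "GJR(1,1)"];
numParams = [3 4 4];                        % assumed estimated-parameter counts
aic = nan(1,3);
for k = 1:3
    [~,~,logL] = estimate(Mdls{k},y,'Display','off');
    aic(k) = aicbic(logL,numParams(k));
end
[~,bestIdx] = min(aic);
bestModel = modelNames(bestIdx);
```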
See Also
Apps
Econometric Modeler
Objects
egarch | gjr | garch
Functions
estimate
More About
• “Compare Conditional Variance Models Using Information Criteria” on page 8-69
• “Compare Predictive Performance After Creating Models Using Econometric Modeler” on page 4-193
Perform GARCH Model Residual Diagnostics Using Econometric Modeler App
This example shows how to evaluate GARCH model assumptions by performing residual diagnostics using the Econometric Modeler app. The data set, stored in CAPMuniverse.mat, contains market data for daily returns of stocks and cash (money market) from the period January 1, 2000 to November 7, 2005. Consider modeling the market index returns (MARKET).
Import Data into Econometric Modeler
At the command line, load the CAPMuniverse.mat data set.
load CAPMuniverse
The series are in the timetable AssetsTimeTable.
At the command line, open the Econometric Modeler app.
econometricModeler
Alternatively, open the app from the apps gallery (see Econometric Modeler).
Import AssetsTimeTable into the app:
1. On the Econometric Modeler tab, in the Import section, click the Import button.
2. In the Import Data dialog box, in the Import? column, select the check box for the AssetsTimeTable variable.
3. Click Import.
The market index variables, including MARKET, appear in the Time Series pane, and a time series plot containing all the series appears in the Time Series Plot(AAPL) figure window.
Plot the Series
Plot the market index series by double-clicking the MARKET time series in the Time Series pane.
The series appears to fluctuate around y = 0 and exhibits volatility clustering. Consider a GARCH(1,1) model without a mean offset for the series.
Specify and Estimate GARCH Model
Specify a GARCH(1,1) model without a mean offset.
1. In the Time Series pane, select MARKET.
2. On the Econometric Modeler tab, in the Models section, click the arrow to display the models gallery.
3. In the models gallery, in the GARCH Models section, click GARCH.
4. In the GARCH Model Parameters dialog box, on the Lag Order tab:
   a. Set GARCH Degree to 1.
   b. Set ARCH Degree to 1.
5. Click Estimate.
The model variable GARCH_MARKET appears in the Models pane, its value appears in the Preview pane, and its estimation summary appears in the Model Summary(GARCH_MARKET) document.
The p values of the coefficient estimates are close to zero, which indicates that the estimates are significant. The inferred conditional variances show high volatility through 2003, then small volatility through 2005. The standardized residuals appear to fluctuate around y = 0, and there are several large (in magnitude) residuals.
Check Goodness of Fit
Assess whether the standardized residuals are normally distributed and uncorrelated. Then, assess whether the residual series has lingering conditional heteroscedasticity.
Assess whether the standardized residuals are normally distributed by plotting their histogram and a quantile-quantile plot:
1. In the Models pane, select GARCH_MARKET.
2. On the Econometric Modeler tab, in the Diagnostics section, click Residual Diagnostics > Residual Histogram.
3. In the Diagnostics section, click Residual Diagnostics > Residual Q-Q Plot.
The histogram and quantile-quantile plot appear in the Histogram(GARCH_MARKET) and QQPlot(GARCH_MARKET) figure windows, respectively.
Assess whether the standardized residuals are autocorrelated by plotting their autocorrelation function (ACF).
1. In the Models pane, select GARCH_MARKET.
2. On the Econometric Modeler tab, in the Diagnostics section, click Residual Diagnostics > Autocorrelation Function.
The ACF plot appears in the ACF(GARCH_MARKET) figure window.
Assess whether the residual series has lingering conditional heteroscedasticity by plotting the ACF of the squared standardized residuals:
1. In the Models pane, select GARCH_MARKET.
2. Click the Econometric Modeler tab. Then, in the Diagnostics section, click Residual Diagnostics > Squared Residual Autocorrelation.
The ACF of the squared standardized residuals appears in the ACF(GARCH_MARKET)2 figure window.
Arrange the histogram, quantile-quantile plot, ACF, and the ACF of the squared standardized residual series so that they occupy the four quadrants of the right pane. On the Documents pane, click the Document Actions button, select Tile All, and place the pointer in the (2,2) position of the matrix of squares.
Although the results show a few large standardized residuals, they appear to be approximately normally distributed. The ACF plots of the standardized and squared standardized residuals do not contain any significant autocorrelations. Therefore, it is reasonable to conclude that the standardized residuals are uncorrelated and homoscedastic.
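The four diagnostic plots can be sketched at the command line. This is a minimal sketch, not the app's generated code; it fits the GARCH(1,1) model without an offset, standardizes the series by the inferred conditional standard deviations, and tiles the plots in one figure. The names y, v, and stdRes are illustrative, and removing missing values before estimation is an assumption about the data handling.

```matlab
% Sketch only: residual diagnostics for a GARCH(1,1) model of MARKET.
load CAPMuniverse
y = AssetsTimeTable.MARKET;
y = y(~isnan(y));                          % drop missing values, if any
EstMdl = estimate(garch(1,1),y,'Display','off');
v = infer(EstMdl,y);                       % inferred conditional variances
stdRes = y./sqrt(v);                       % standardized residuals (no offset)
figure
subplot(2,2,1); histogram(stdRes); title('Standardized Residuals')
subplot(2,2,2); qqplot(stdRes)
subplot(2,2,3); autocorr(stdRes)           % serial correlation check
subplot(2,2,4); autocorr(stdRes.^2)        % remaining ARCH effects check
```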
See Also
Apps
Econometric Modeler
Objects
garch
Functions
estimate | infer | autocorr
More About
• “Infer Conditional Variances and Residuals” on page 8-62
• “Perform ARIMA Model Residual Diagnostics Using Econometric Modeler App” on page 4-141
• “Compare Predictive Performance After Creating Models Using Econometric Modeler” on page 4-193
Share Results of Econometric Modeler App Session
This example shows how to share the results of an Econometric Modeler app session by:
• Exporting time series and model variables to the MATLAB Workspace
• Generating MATLAB plain text and live functions to use outside the app
• Generating a report of your activities on time series and estimated models
During the session, the example transforms and plots data, runs statistical tests, and estimates a multiplicative seasonal ARIMA model. The data set Data_Airline.mat contains monthly counts of airline passengers.
Import Data into Econometric Modeler
At the command line, load the Data_Airline.mat data set.
load Data_Airline
At the command line, open the Econometric Modeler app.
econometricModeler
Alternatively, open the app from the apps gallery (see Econometric Modeler).
Import DataTimeTable into the app:
1. On the Econometric Modeler tab, in the Import section, click the Import button.
2. In the Import Data dialog box, in the Import? column, select the check box for the DataTimeTable variable.
3. Click Import.
The variable PSSG appears in the Time Series pane, its value appears in the Preview pane, and its time series plot appears in the Time Series Plot(PSSG) figure window.
The series exhibits a seasonal trend, serial correlation, and possible exponential growth. For an interactive analysis of serial correlation, see “Detect Serial Correlation Using Econometric Modeler App” on page 4-71.
Stabilize Series
Address the exponential trend by applying the log transform to PSSG.
1. In the Time Series pane, select PSSG.
2. On the Econometric Modeler tab, in the Transforms section, click Log.
The transformed variable PSSGLog appears in the Time Series pane, and its time series plot appears in the Time Series Plot(PSSGLog) figure window.
The exponential growth appears to be removed from the series. Address the seasonal trend by applying the 12th order seasonal difference. With PSSGLog selected in the Time Series pane, on the Econometric Modeler tab, in the Transforms section, set Seasonal to 12. Then, click Seasonal. The transformed variable PSSGLogSeasonalDiff appears in the Time Series pane, and its time series plot appears in the Time Series Plot(PSSGLogSeasonalDiff) figure window.
The transformed series appears to have a unit root. Test the null hypothesis that PSSGLogSeasonalDiff has a unit root by using the Augmented Dickey-Fuller test. Specify that the alternative is an AR(0) model, then test again specifying an AR(1) model. Adjust the significance level to 0.025 to maintain a total significance level of 0.05.
1. With PSSGLogSeasonalDiff selected in the Time Series pane, on the Econometric Modeler tab, in the Tests section, click New Test > Augmented Dickey-Fuller Test.
2. On the ADF tab, in the Parameters section, set Significance Level to 0.025.
3. In the Tests section, click Run Test.
4. In the Parameters section, set Number of Lags to 1.
5. In the Tests section, click Run Test.
The test results appear in the Results table of the ADF(PSSGLogSeasonalDiff) document.
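The two app tests can also be approximated at the command line. The following is a minimal sketch, not the code the app generates: it assumes that Data_Airline.mat supplies DataTimeTable with the variable PSSG (as in this example), recomputes the seasonally differenced log series by hand, and passes a vector of lags to adftest so that both tests run in one call. The variable name sdiff is illustrative.

```matlab
% Sketch: command-line analogue of the two Augmented Dickey-Fuller tests.
load Data_Airline                                % provides DataTimeTable
PSSGLog = log(DataTimeTable.PSSG);               % log transform
sdiff = PSSGLog(13:end) - PSSGLog(1:end-12);     % 12th-order seasonal difference

% Run the test with 0 and 1 lagged difference terms, each at the 0.025 level.
[h,pValue] = adftest(sdiff,'Lags',0:1,'Alpha',0.025);

% h(j) = 0 means the test with j-1 lags fails to reject the unit-root null.
disp(table((0:1)',h(:),pValue(:),'VariableNames',{'Lags','Reject','PValue'}))
```

Testing each alternative at 0.025 is the same Bonferroni-style adjustment the steps above use to keep the overall level at 0.05.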
Both tests fail to reject the null hypothesis that the series is a unit root process. Address the unit root by applying the first difference to PSSGLogSeasonalDiff. With PSSGLogSeasonalDiff selected in the Time Series pane, click the Econometric Modeler tab. Then, in the Transforms section, click Difference. The transformed variable PSSGLogSeasonalDiffDiff appears in the Time Series pane, and its time series plot appears in the Time Series Plot(PSSGLogSeasonalDiffDiff) figure window.
In the Time Series pane, rename the PSSGLogSeasonalDiffDiff variable by clicking it twice to select its name and entering PSSGStable. The app updates the names of all documents associated with the transformed series.
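For reference, the three transformations applied in the app (log, seasonal difference, first difference) can be sketched with base MATLAB functions. This is an illustrative equivalent under the assumption that DataTimeTable.PSSG holds the raw counts. Unlike the app, whose exported PSSGStable is 144-by-1 in this example, the sketch simply drops the observations lost to differencing.

```matlab
% Sketch: reproduce the app's transformations at the command line.
load Data_Airline                                % provides DataTimeTable
PSSG = DataTimeTable.PSSG;

PSSGLog = log(PSSG);                                        % remove exponential growth
PSSGLogSeasonalDiff = PSSGLog(13:end) - PSSGLog(1:end-12);  % apply (1 - L^12)
PSSGStable = diff(PSSGLogSeasonalDiff);                     % apply (1 - L)

% Each difference shortens the series by its lag order.
fprintf('Lengths: %d, %d, %d\n',numel(PSSGLog), ...
    numel(PSSGLogSeasonalDiff),numel(PSSGStable))
```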
Identify Model for Series
Determine the lag structure for a conditional mean model of the data by plotting the sample autocorrelation function (ACF) and partial autocorrelation function (PACF).
1. With PSSGStable selected in the Time Series pane, click the Plots tab, then click ACF.
2. Show the first 50 lags of the ACF. On the ACF tab, set Number of Lags to 50.
3. Click the Plots tab, then click PACF.
4. Show the first 50 lags of the PACF. On the PACF tab, set Number of Lags to 50.
5. Drag the ACF(PSSGStable) figure window above the PACF(PSSGStable) figure window.
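A command-line counterpart of these plots can be sketched with autocorr and parcorr. The sketch assumes the stabilized series is recomputed from DataTimeTable.PSSG rather than exported from the app, and the stacked subplot layout is only one way to mimic placing the ACF window above the PACF window.

```matlab
% Sketch: sample ACF and PACF of the stabilized series, 50 lags each.
load Data_Airline                                % provides DataTimeTable
y = log(DataTimeTable.PSSG);
y = diff(y(13:end) - y(1:end-12));               % seasonal, then first difference

figure
subplot(2,1,1)
autocorr(y,'NumLags',50)                         % ACF on top
subplot(2,1,2)
parcorr(y,'NumLags',50)                          % PACF below

% The same functions return the values when called with an output.
acf = autocorr(y,'NumLags',50);                  % lags 0 through 50
```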
According to [1], the autocorrelations in the ACF and PACF suggest that the following SARIMA(0,1,1)×(0,1,1)$_{12}$ model is appropriate for PSSGLog.
$$(1 - L)(1 - L^{12})y_t = (1 + \theta_1 L)(1 + \Theta_{12}L^{12})\varepsilon_t.$$
Close all figure windows.

Specify and Estimate SARIMA Model
Specify the SARIMA(0,1,1)×(0,1,1)$_{12}$ model.
1. In the Time Series pane, select the PSSGLog time series.
2. On the Econometric Modeler tab, in the Models section, click the arrow to display the models gallery.
3. In the models gallery, in the ARMA/ARIMA Models section, click SARIMA.
4. In the SARIMA Model Parameters dialog box, on the Lag Order tab:
   • Nonseasonal section
     a. Set Degrees of Integration to 1.
     b. Set Moving Average Order to 1.
     c. Clear the Include Constant Term check box.
   • Seasonal section
     a. Set Period to 12 to indicate monthly data.
     b. Set Moving Average Order to 1.
     c. Select the Include Seasonal Difference check box.
5. Click Estimate.
The model variable SARIMA_PSSGLog appears in the Models pane, its value appears in the Preview pane, and its estimation summary appears in the Model Summary(SARIMA_PSSGLog) document.
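The dialog box settings above correspond to a short arima specification. The following is a sketch of a command-line equivalent, not the code the app generates; it assumes DataTimeTable.PSSG holds the raw counts, and the estimates can differ slightly from the app's depending on presample handling.

```matlab
% Sketch: specify and estimate a SARIMA(0,1,1)x(0,1,1)_12 model without the app.
load Data_Airline                                % provides DataTimeTable
PSSGLog = log(DataTimeTable.PSSG);

% No constant, one nonseasonal difference, one seasonal difference of period 12,
% an MA term at lag 1, and a seasonal MA term at lag 12 (both estimated).
Mdl = arima('Constant',0,'D',1,'Seasonality',12, ...
    'MALags',1,'SMALags',12);

EstMdl = estimate(Mdl,PSSGLog,'Display','off');
summarize(EstMdl)
```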
Export Variables to Workspace
Export PSSGLog, PSSGStable, and SARIMA_PSSGLog to the MATLAB Workspace.
1. On the Econometric Modeler tab, in the Export section, click the Export button.
2. In the Export Variables dialog box, select the Select check boxes for the PSSGLog and PSSGStable time series, and the SARIMA_PSSGLog model (if necessary). The app automatically selects the check boxes for all variables that are highlighted in the Time Series and Models panes.
3. Click Export.
At the command line, list all variables in the workspace.
    whos

    Name                Size      Bytes    Class        Attributes

    Data                144x1      1152    double
    DataTable           144x2      3525    table
    DataTimeTable       144x1      3311    timetable
    Description         22x54      2376    char
    PSSGLog             144x1      1152    double
    PSSGStable          144x1      1152    double
    SARIMA_PSSGLog      1x1        7963    arima
    dates               144x1      1152    double
    series              1x1         162    cell
The contents of Data_Airline.mat, the numeric vectors PSSGLog and PSSGStable, and the estimated arima model object SARIMA_PSSGLog are variables in the workspace.
Forecast the next three years (36 months) of log airline passenger counts using SARIMA_PSSGLog. Specify PSSGLog as presample data.
    numObs = 36;
    fPSSG = forecast(SARIMA_PSSGLog,numObs,'Y0',PSSGLog);
Plot the passenger counts and the forecasts.
    fh = DataTimeTable.Time(end) + calmonths(1:numObs);
    figure;
    plot(DataTimeTable.Time,exp(PSSGLog));
    hold on
    plot(fh,exp(fPSSG));
    legend('Airline Passenger Counts','Forecasted Counts',...
        'Location','best')
    title('Monthly Airline Passenger Counts, 1949-1963')
    ylabel('Passenger counts')
    hold off
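The forecast call above returns point forecasts only. As an optional extension that is not part of the original example, the second output of forecast (the forecast mean squared errors) can be used to sketch approximate 95% intervals. The sketch re-estimates the model at the command line so that it is self-contained, and the names EstMdl, lower, and upper are illustrative.

```matlab
% Sketch: point forecasts with approximate 95% intervals, mapped back to counts.
load Data_Airline                                % provides DataTimeTable
PSSGLog = log(DataTimeTable.PSSG);
Mdl = arima('Constant',0,'D',1,'Seasonality',12,'MALags',1,'SMALags',12);
EstMdl = estimate(Mdl,PSSGLog,'Display','off');

numObs = 36;
[fPSSG,fMSE] = forecast(EstMdl,numObs,'Y0',PSSGLog);

% Gaussian interval on the log scale, then exponentiate both bounds.
lower = exp(fPSSG - 1.96*sqrt(fMSE));
upper = exp(fPSSG + 1.96*sqrt(fMSE));

fh = DataTimeTable.Time(end) + calmonths(1:numObs);
figure
plot(DataTimeTable.Time,exp(PSSGLog))
hold on
plot(fh,exp(fPSSG))
plot(fh,lower,'k--')
plot(fh,upper,'k--')
hold off
legend('Airline Passenger Counts','Forecasted Counts', ...
    'Approx. 95% Bounds','Location','best')
```

Because the model is fit to logs, exponentiating the bounds gives intervals for the counts that widen asymmetrically as the horizon grows.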
Generate Plain Text Function from App Session
Generate a MATLAB function for use outside the app. The function returns the estimated model SARIMA_PSSGLog given DataTimeTable.
1. In the Models pane of the app, select the SARIMA_PSSGLog model.
2. On the Econometric Modeler tab, in the Export section, click Export > Generate Function. The MATLAB Editor opens and contains a function named modelTimeSeries. The function accepts DataTimeTable (the variable you imported in this session), transforms data, and returns the estimated SARIMA(0,1,1)×(0,1,1)$_{12}$ model SARIMA_PSSGLog.
3. On the Editor tab, click Save > Save.
4. Save the function to your current folder by clicking Save in the Select File for Save As dialog box.
At the command line, estimate the SARIMA(0,1,1)×(0,1,1)$_{12}$ model by passing DataTimeTable to modelTimeSeries. Name the model SARIMA_PSSGLog2. Compare the estimated model to SARIMA_PSSGLog.
    SARIMA_PSSGLog2 = modelTimeSeries(DataTimeTable);
    summarize(SARIMA_PSSGLog)
    summarize(SARIMA_PSSGLog2)

    ARIMA(0,1,1) Model Seasonally Integrated with Seasonal MA(12) (Gaussian Distribution)

    Effective Sample Size: 144
    Number of Estimated Parameters: 3
    LogLikelihood: 276.198
    AIC: -546.397
    BIC: -537.488

                  Value       StandardError    TStatistic      PValue
                _________    _____________    __________    __________

    Constant            0                0          NaN            NaN
    MA{1}        -0.37716         0.066794      -5.6466    1.6364e-08
    SMA{12}      -0.57238         0.085439      -6.6992    2.0952e-11
    Variance    0.0012634       0.00012395       10.193    2.1406e-24

    ARIMA(0,1,1) Model Seasonally Integrated with Seasonal MA(12) (Gaussian Distribution)

    Effective Sample Size: 144
    Number of Estimated Parameters: 3
    LogLikelihood: 276.198
    AIC: -546.397
    BIC: -537.488

                  Value       StandardError    TStatistic      PValue
                _________    _____________    __________    __________

    Constant            0                0          NaN            NaN
    MA{1}        -0.37716         0.066794      -5.6466    1.6364e-08
    SMA{12}      -0.57238         0.085439      -6.6992    2.0952e-11
    Variance    0.0012634       0.00012395       10.193    2.1406e-24
As expected, the models are identical.

Generate Live Function from App Session
Unlike a plain text function, a live function contains formatted text and equations that you can modify by using the Live Editor.
Generate a live function for use outside the app. The function returns the estimated model SARIMA_PSSGLog given DataTimeTable.
1. In the Models pane of the app, select the SARIMA_PSSGLog model.
2. On the Econometric Modeler tab, in the Export section, click Export > Generate Live Function. The Live Editor opens and contains a function named modelTimeSeries. The function accepts DataTimeTable (the variable you imported in this session), transforms data, and returns the estimated SARIMA(0,1,1)×(0,1,1)$_{12}$ model SARIMA_PSSGLog.
3. To ensure the function does not shadow the M-file function, change the name of the function to modelTimeSeriesMLX.
4. On the Live Editor tab, in the File section, click Save > Save.
5. Save the function to your current folder by clicking Save in the Select File for Save As dialog box.
At the command line, estimate the SARIMA(0,1,1)×(0,1,1)$_{12}$ model by passing DataTimeTable to modelTimeSeriesMLX. Name the model SARIMA_PSSGLog2. Compare the estimated model to SARIMA_PSSGLog.
    SARIMA_PSSGLog2 = modelTimeSeriesMLX(DataTimeTable);
    summarize(SARIMA_PSSGLog)
    summarize(SARIMA_PSSGLog2)

    ARIMA(0,1,1) Model Seasonally Integrated with Seasonal MA(12) (Gaussian Distribution)

    Effective Sample Size: 144
    Number of Estimated Parameters: 3
    LogLikelihood: 276.198
    AIC: -546.397
    BIC: -537.488

                  Value       StandardError    TStatistic      PValue
                _________    _____________    __________    __________

    Constant            0                0          NaN            NaN
    MA{1}        -0.37716         0.066794      -5.6466    1.6364e-08
    SMA{12}      -0.57238         0.085439      -6.6992    2.0952e-11
    Variance    0.0012634       0.00012395       10.193    2.1406e-24

    ARIMA(0,1,1) Model Seasonally Integrated with Seasonal MA(12) (Gaussian Distribution)

    Effective Sample Size: 144
    Number of Estimated Parameters: 3
    LogLikelihood: 276.198
    AIC: -546.397
    BIC: -537.488

                  Value       StandardError    TStatistic      PValue
                _________    _____________    __________    __________

    Constant            0                0          NaN            NaN
    MA{1}        -0.37716         0.066794      -5.6466    1.6364e-08
    SMA{12}      -0.57238         0.085439      -6.6992    2.0952e-11
    Variance    0.0012634       0.00012395       10.193    2.1406e-24
As expected, the models are identical.

Generate Report
Generate a PDF report of all your actions on the PSSGLog and PSSGStable time series, and the SARIMA_PSSGLog model.
1. On the Econometric Modeler tab, in the Export section, click Export > Generate Report.
2. In the Select Variables for Report dialog box, select the Select check boxes for the PSSGLog and PSSGStable time series, and the SARIMA_PSSGLog model (if necessary). The app automatically selects the check boxes for all variables that are highlighted in the Time Series and Models panes.
3. Click OK.
4. In the Select File to Write dialog box, navigate to the C:\MyData folder.
5. In the File name box, type SARIMAReport.
6. Click Save.
The app publishes the code required to create PSSGLog, PSSGStable, and SARIMA_PSSGLog in the PDF C:\MyData\SARIMAReport.pdf. The report includes:
• A title page and table of contents
• Plots that include the selected time series
• Descriptions of transformations applied to the selected time series
• Results of statistical tests conducted on the selected time series
• Estimation summaries of the selected models
References [1] Box, George E. P., Gwilym M. Jenkins, and Gregory C. Reinsel. Time Series Analysis: Forecasting and Control. 3rd ed. Englewood Cliffs, NJ: Prentice Hall, 1994.
See Also
Apps
Econometric Modeler
Objects
arima
Functions
summarize | estimate

More About
• “Specify Multiplicative ARIMA Model” on page 7-58
• “Estimate Multiplicative ARIMA Model” on page 7-122
• “Analyze Time Series Data Using Econometric Modeler” on page 4-2
• Creating ARIMA Models Using Econometric Modeler App
5 Time Series Regression Models
• “Time Series Regression Models” on page 5-3
• “Regression Models with Time Series Errors” on page 5-5
• “Create Regression Models with ARIMA Errors” on page 5-8
• “Specify the Default Regression Model with ARIMA Errors” on page 5-19
• “Modify regARIMA Model Properties” on page 5-21
• “Create Regression Models with AR Errors” on page 5-26
• “Create Regression Models with MA Errors” on page 5-31
• “Create Regression Models with ARMA Errors” on page 5-36
• “Create Regression Models with ARIMA Errors” on page 5-44
• “Create Regression Models with SARIMA Errors” on page 5-49
• “Specify Regression Model with SARIMA Errors” on page 5-53
• “Specify ARIMA Error Model Innovation Distribution” on page 5-59
• “Impulse Response of Regression Models with ARIMA Errors” on page 5-64
• “Plot Impulse Response of Regression Model with ARIMA Errors” on page 5-65
• “Maximum Likelihood Estimation of regARIMA Models” on page 5-72
• “regARIMA Model Estimation Using Equality Constraints” on page 5-74
• “Presample Values for regARIMA Model Estimation” on page 5-78
• “Initial Values for regARIMA Model Estimation” on page 5-80
• “Optimization Settings for regARIMA Model Estimation” on page 5-82
• “Estimate Regression Model with ARIMA Errors” on page 5-85
• “Estimate a Regression Model with Multiplicative ARIMA Errors” on page 5-92
• “Select Regression Model with ARIMA Errors” on page 5-100
• “Choose Lags for ARMA Error Model” on page 5-102
• “Intercept Identifiability in Regression Models with ARIMA Errors” on page 5-106
• “Alternative ARIMA Model Representations” on page 5-110
• “Simulate Regression Models with ARMA Errors” on page 5-116
• “Simulate Regression Models with Nonstationary Errors” on page 5-135
• “Simulate Regression Models with Multiplicative Seasonal Errors” on page 5-143
• “Monte Carlo Simulation of Regression Models with ARIMA Errors” on page 5-148
• “Presample Data for regARIMA Model Simulation” on page 5-151
• “Transient Effects in regARIMA Model Simulations” on page 5-152
• “Forecast a Regression Model with ARIMA Errors” on page 5-160
• “Forecast a Regression Model with Multiplicative Seasonal ARIMA Errors” on page 5-163
• “Verify Predictive Ability Robustness of a regARIMA Model” on page 5-167
• “MMSE Forecasting Regression Models with ARIMA Errors” on page 5-169
• “Monte Carlo Forecasting of regARIMA Models” on page 5-172
• “Time Series Regression I: Linear Models” on page 5-173
• “Time Series Regression II: Collinearity and Estimator Variance” on page 5-180
• “Time Series Regression III: Influential Observations” on page 5-190
• “Time Series Regression IV: Spurious Regression” on page 5-197
• “Time Series Regression V: Predictor Selection” on page 5-209
• “Time Series Regression VI: Residual Diagnostics” on page 5-220
• “Time Series Regression VII: Forecasting” on page 5-231
• “Time Series Regression VIII: Lagged Variables and Estimator Bias” on page 5-240
• “Time Series Regression IX: Lag Order Selection” on page 5-261
• “Time Series Regression X: Generalized Least Squares and HAC Estimators” on page 5-279
Time Series Regression Models
Time series regression models attempt to explain the current response using the response history (autoregressive dynamics) and the transfer of dynamics from relevant predictors (or otherwise). Theoretical frameworks for potential relationships among variables often permit different representations of the system.
Use time series regression models to analyze time series data, which are measurements that you take at successive time points. For example, use time series regression modeling to:
• Examine the linear effects of the current and past unemployment rates and past inflation rates on the current inflation rate.
• Forecast GDP growth rates by using an ARIMA model and include the CPI growth rate as a predictor.
• Determine how a unit increase in rainfall, amount of fertilizer, and labor affect crop yield.
You can start a time series analysis by building a design matrix ($X_t$), which can include current and past observations of predictors. You can also complement the regression component with an autoregressive (AR) component to account for the possibility of response ($y_t$) dynamics. For example, include past measurements of inflation rate in the regression component to explain the current inflation rate. AR terms account for dynamics unexplained by the regression component, which is necessarily underspecified in econometric applications. Also, the AR terms absorb residual autocorrelations, simplify innovation models, and generally improve forecast performance. Then, apply ordinary least squares (OLS) to the multiple linear regression (MLR) model:
$$y_t = X_t\beta + u_t.$$
If a residual analysis suggests departures from the classical linear model assumptions, such as heteroscedasticity or autocorrelation (i.e., nonspherical errors), then:
• You can estimate robust HAC (heteroscedasticity and autocorrelation consistent) standard errors (for details, see hac).
• If you know the innovation covariance matrix (at least up to a scaling factor), then you can apply generalized least squares (GLS). Given that the innovation covariance matrix is correct, GLS effectively reduces the problem to a linear regression where the residuals have covariance I.
• If you do not know the structure of the innovation covariance matrix, but know the nature of the heteroscedasticity and autocorrelation, then you can apply feasible generalized least squares (FGLS). FGLS applies GLS iteratively, but uses the estimated residual covariance matrix. FGLS estimators are efficient under certain conditions. For details, see [1], Chapter 11.
There are time series models that model the dynamics more explicitly than MLR models. These models can account for AR and predictor effects as with MLR models, but have the added benefits of:
• Accounting for moving average (MA) effects. Include MA terms to reduce the number of AR lags, effectively reducing the number of observations required to initialize the model.
• Easily modeling seasonal effects. In order to model seasonal effects with an MLR model, you have to build an indicator design matrix.
• Modeling nonseasonal and seasonal integration for unit root nonstationary processes.
These models also differ from MLR in that they rely on distribution assumptions (i.e., they use maximum likelihood for estimation).
Popular types of time series regression models include:
• Autoregressive integrated moving average with exogenous predictors (ARIMAX). This is an ARIMA model that linearly includes predictors (exogenous or otherwise). For details, see arima or “ARIMAX(p,D,q) Model” on page 7-62.
• Regression model with ARIMA time series errors. This is an MLR model where the unconditional disturbance process ($u_t$) is an ARIMA time series. In other words, you explicitly model $u_t$ as a linear time series. For details, see regARIMA.
• Distributed lag model (DLM). This is an MLR model that includes the effects of predictors that persist over time. In other words, the regression component contains coefficients for contemporaneous and lagged values of predictors. Econometrics Toolbox does not contain functions that model DLMs explicitly, but you can use regARIMA or fitlm with an appropriately constructed predictor (design) matrix to analyze a DLM.
• Transfer function (autoregressive distributed lag) model. This model extends the distributed lag framework in that it includes autoregressive terms (lagged responses). Econometrics Toolbox does not contain functions that model DLMs explicitly, but you can use the arima functionality with an appropriately constructed predictor matrix to analyze an autoregressive DLM.
The choice you make on which model to use depends on your goals for the analysis, and the properties of the data.
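As an illustration of the "appropriately constructed predictor matrix" idea, the following sketch fits a small distributed lag model by OLS and then computes HAC standard errors. The data are simulated and every name and coefficient value in it (x, XLag, the lag weights 0.8, 0.4, and 0.2) is hypothetical. It assumes lagmatrix and hac from Econometrics Toolbox and fitlm from Statistics and Machine Learning Toolbox.

```matlab
% Sketch: distributed lag regression with OLS and HAC standard errors.
rng(0)                                   % reproducible simulated data
T = 200;
x = randn(T,1);                          % hypothetical predictor series
u = filter(1,[1 -0.5],randn(T,1));       % autocorrelated AR(1) disturbances

XLag = lagmatrix(x,0:2);                 % columns: x(t), x(t-1), x(t-2)
XLag = XLag(3:end,:);                    % drop rows with NaN presample lags
u = u(3:end);
y = 1 + XLag*[0.8; 0.4; 0.2] + u;        % response with intercept 1

Mdl = fitlm(XLag,y);                                  % OLS fit
[EstCov,se,coeff] = hac(XLag,y,'Display','off');      % HAC covariance, SEs, OLS coefficients

disp(table(coeff,Mdl.Coefficients.SE,se, ...
    'VariableNames',{'Coeff','OLS_SE','HAC_SE'}))
```

The coefficient estimates are the same in both columns of standard errors; only the uncertainty attached to them changes when the errors are autocorrelated.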
References [1] Greene, W. H. Econometric Analysis. 6th ed. Englewood Cliffs, NJ: Prentice Hall, 2008.
See Also
arima | regARIMA | hac | fitlm

More About
• “ARIMAX(p,D,q) Model” on page 7-62
• “Regression Models with Time Series Errors” on page 5-5
Regression Models with Time Series Errors
In this section...
“What Are Regression Models with Time Series Errors?” on page 5-5
“Conventions” on page 5-5

What Are Regression Models with Time Series Errors?
Regression models with time series errors attempt to explain the mean behavior of a response series ($y_t$, t = 1,...,T) by accounting for linear effects of predictors ($X_t$) using a multiple linear regression (MLR). However, the errors ($u_t$), called unconditional disturbances, are time series rather than white noise, which is a departure from the linear model assumptions. Unlike the ARIMA model that includes exogenous predictors, regression models with time series errors preserve the sensitivity interpretation of the regression coefficients (β) [2].
These models are particularly useful for econometric data. Use these models to:
• Analyze the effects of a new policy on a market indicator (an intervention model).
• Forecast population size adjusting for predictor effects, such as expected prevalence of a disease.
• Study the behavior of a process adjusting for calendar effects. For example, you can analyze traffic volume by adjusting for the effects of major holidays. For details, see [3].
• Estimate the trend by including time (t) in the model.
• Forecast total energy consumption accounting for current and past prices of oil and electricity (distributed lag model).
Use these tools in Econometrics Toolbox to:
• Specify a regression model with ARIMA errors (see regARIMA).
• Estimate parameters using a specified model, and response and predictor data (see estimate).
• Simulate responses using a model and predictor data (see simulate).
• Forecast responses using a model and future predictor data (see forecast).
• Infer residuals and estimated unconditional disturbances from a model using the model and predictor data (see infer).
• Filter innovations through a model using the model and predictor data (see filter).
• Generate impulse responses (see impulse).
• Compare a regression model with ARIMA errors to an ARIMAX model (see arima).
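The tools listed above fit together roughly as in the following sketch. It uses simulated data, and the model orders, coefficient values, and variable names (TrueMdl, X, XF) are all hypothetical choices made for illustration.

```matlab
% Sketch: specify, simulate, estimate, infer, and forecast with regARIMA.
rng(1)                                   % reproducible simulated data
T = 200;
X = randn(T,2);                          % two hypothetical predictor series

% Fully specified model with AR(1) errors, used only to generate data.
TrueMdl = regARIMA('Intercept',1,'Beta',[0.5 -2], ...
    'AR',{0.6},'Variance',0.5);
y = simulate(TrueMdl,T,'X',X);

% Template with unknown (NaN) parameters, then maximum likelihood estimation.
Mdl = regARIMA(1,0,0);
EstMdl = estimate(Mdl,y,'X',X,'Display','off');

% Residuals E and unconditional disturbances U implied by the fitted model.
[E,U] = infer(EstMdl,y,'X',X);

% Forecast 10 periods ahead; future predictor values XF are required.
XF = randn(10,2);
[yF,yMSE] = forecast(EstMdl,10,'Y0',y,'X0',X,'XF',XF);
```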
Conventions
A regression model with time series errors has the following form (in lag operator notation):
$$\begin{aligned} y_t &= c + X_t\beta + u_t \\ a(L)A(L)(1 - L)^D(1 - L^s)u_t &= b(L)B(L)\varepsilon_t, \end{aligned} \tag{5-1}$$
where
• $t = 1,\ldots,T$.
• $y_t$ is the response series.
• $X_t$ is row t of X, which is the matrix of concatenated predictor data vectors. That is, $X_t$ is observation t of each predictor series.
• $c$ is the regression model intercept.
• $\beta$ is the regression coefficient.
• $u_t$ is the disturbance series.
• $\varepsilon_t$ is the innovations series.
• $L^j y_t = y_{t-j}$.
• $a(L) = 1 - a_1L - \ldots - a_pL^p$, which is the degree p, nonseasonal autoregressive polynomial.
• $A(L) = 1 - A_1L - \ldots - A_{p_s}L^{p_s}$, which is the degree $p_s$, seasonal autoregressive polynomial.
• $(1 - L)^D$, which is the degree D, nonseasonal integration polynomial.
• $(1 - L^s)$, which is the degree s, seasonal integration polynomial.
• $b(L) = 1 + b_1L + \ldots + b_qL^q$, which is the degree q, nonseasonal moving average polynomial.
• $B(L) = 1 + B_1L + \ldots + B_{q_s}L^{q_s}$, which is the degree $q_s$, seasonal moving average polynomial.
Following Box and Jenkins methodology, $u_t$ is a stationary or unit root nonstationary, regular, linear time series. However, if $u_t$ is unit root nonstationary, then you do not have to explicitly difference the series as they recommend in [1]. You can simply specify the seasonal and nonseasonal integration degree using the software. For details, see “Create Regression Models with ARIMA Errors” on page 5-8.
Another deviation from the Box and Jenkins methodology is that $u_t$ does not have a constant term (conditional mean), and therefore its unconditional mean is 0. However, the regression model contains an intercept term, c.
Note If the unconditional disturbance process is nonstationary (i.e., the nonseasonal or seasonal integration degree is greater than 0), then the regression intercept, c, is not identifiable. For details, see “Intercept Identifiability in Regression Models with ARIMA Errors” on page 5-106.
The software enforces stability and invertibility of the ARMA process. That is,
$$\psi(L) = \frac{b(L)B(L)}{a(L)A(L)} = 1 + \psi_1L + \psi_2L^2 + \ldots,$$
where the series $\{\psi_t\}$ must be absolutely summable. The conditions for $\{\psi_t\}$ to be absolutely summable are:
• a(L) and A(L) are stable (i.e., the eigenvalues of a(L) = 0 and A(L) = 0 lie inside the unit circle).
• b(L) and B(L) are invertible (i.e., the eigenvalues of b(L) = 0 and B(L) = 0 lie inside the unit circle).
The software uses maximum likelihood for parameter estimation. You can choose either a Gaussian or Student’s t distribution for the innovations, $\varepsilon_t$.
The software treats predictors as nonstochastic variables for estimation and inference.
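The stability and invertibility conditions can be checked by hand for a small nonseasonal example. The sketch below uses base MATLAB only and illustrative coefficient values; it is not a description of how the software performs its internal check. The eigenvalues are taken as the roots of the companion-form characteristic polynomial, for example $z^2 - a_1z - a_2$ for an AR(2) polynomial.

```matlab
% Sketch: check stability/invertibility and compute the first psi weights.
a = [0.8 -0.2];          % illustrative AR coefficients a1, a2
b = [0.5 0.2];           % illustrative MA coefficients b1, b2

arEig = roots([1 -a]);   % roots of z^2 - a1*z - a2
maEig = roots([1 b]);    % roots of z^2 + b1*z + b2

isStable = all(abs(arEig) < 1);        % all eigenvalues inside the unit circle
isInvertible = all(abs(maEig) < 1);

% psi(L) = b(L)/a(L): the psi weights are the impulse response of the filter
% with numerator [1 b1 b2] and denominator [1 -a1 -a2].
psi = filter([1 b],[1 -a],[1; zeros(20,1)]);

fprintf('Stable: %d, invertible: %d, psi_1 = %.2f\n', ...
    isStable,isInvertible,psi(2))
```

Here $\psi_0 = 1$ and $\psi_1 = b_1 + a_1 = 1.3$, and the weights decay toward zero, which is what absolute summability requires in practice.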
References
[1] Box, G. E. P., G. M. Jenkins, and G. C. Reinsel. Time Series Analysis: Forecasting and Control. 3rd ed. Englewood Cliffs, NJ: Prentice Hall, 1994.
[2] Hyndman, R. J. (2010, October). “The ARIMAX Model Muddle.” Rob J. Hyndman. Retrieved May 4, 2017 from https://robjhyndman.com/hyndsight/arimax/.
[3] Tsay, R. S. “Regression Models with Time Series Errors.” Journal of the American Statistical Association. Vol. 79, Number 385, March 1984, pp. 118–124.
See Also
regARIMA | arima | estimate | filter | forecast | impulse | infer | simulate

Related Examples
• “Alternative ARIMA Model Representations” on page 5-110
• “Intercept Identifiability in Regression Models with ARIMA Errors” on page 5-106

More About
• “ARIMA Model Including Exogenous Covariates” on page 7-62
• “Create Regression Models with ARIMA Errors” on page 5-8
Create Regression Models with ARIMA Errors
In this section...
“Default Regression Model with ARIMA Errors Specifications” on page 5-8
“Specify regARIMA Models Using Name-Value Pair Arguments” on page 5-9
“Specify Linear Regression Models Using Econometric Modeler App” on page 5-15

Default Regression Model with ARIMA Errors Specifications
Regression models with ARIMA errors have the following form (in lag operator notation on page 121):
$$\begin{aligned} y_t &= c + X_t\beta + u_t \\ a(L)A(L)(1 - L)^D(1 - L^s)u_t &= b(L)B(L)\varepsilon_t, \end{aligned}$$
where
• $t = 1,\ldots,T$.
• $y_t$ is the response series.
• $X_t$ is row t of X, which is the matrix of concatenated predictor data vectors. That is, $X_t$ is observation t of each predictor series.
• $c$ is the regression model intercept.
• $\beta$ is the regression coefficient.
• $u_t$ is the disturbance series.
• $\varepsilon_t$ is the innovations series.
• $L^j y_t = y_{t-j}$.
• $a(L) = 1 - a_1L - \ldots - a_pL^p$, which is the degree p, nonseasonal autoregressive polynomial.
• $A(L) = 1 - A_1L - \ldots - A_{p_s}L^{p_s}$, which is the degree $p_s$, seasonal autoregressive polynomial.
• $(1 - L)^D$, which is the degree D, nonseasonal integration polynomial.
• $(1 - L^s)$, which is the degree s, seasonal integration polynomial.
• $b(L) = 1 + b_1L + \ldots + b_qL^q$, which is the degree q, nonseasonal moving average polynomial.
• $B(L) = 1 + B_1L + \ldots + B_{q_s}L^{q_s}$, which is the degree $q_s$, seasonal moving average polynomial.
For simplicity, use the shorthand notation Mdl = regARIMA(p,D,q) to specify a regression model with ARIMA(p,D,q) errors, where p, D, and q are nonnegative integers. Mdl has the following default properties.
    Property Name    Property Data Type
    AR               Length p cell vector of NaNs
    Beta             Empty vector [] of regression coefficients, corresponding to the predictor series
    D                Nonnegative scalar, corresponding to D
    Distribution     "Gaussian", corresponding to the distribution of εt
    Intercept        NaN, corresponding to c
    MA               Length q cell vector of NaNs
    P                Number of AR terms plus degree of integration, p + D
    Q                Number of MA terms, q
    SAR              Empty cell vector
    SMA              Empty cell vector
    Variance         NaN, corresponding to the variance of εt
    Seasonality      0, corresponding to s
If you specify nonseasonal ARIMA errors, then
• The properties D and Q are the inputs D and q, respectively.
• Property P = p + D, which is the degree of the compound, nonseasonal autoregressive polynomial. In other words, P is the degree of the product of the nonseasonal autoregressive polynomial, $a(L)$, and the nonseasonal integration polynomial, $(1 - L)^D$.
The values of properties P and Q indicate how many presample observations the software requires to initialize the time series.
You can modify the properties of Mdl using dot notation. For example, Mdl.Variance = 0.5 sets the innovation variance to 0.5.
For maximum flexibility in specifying a regression model with ARIMA errors, use name-value pair arguments to, for example, set each of the autoregressive parameters to a value, or specify multiplicative seasonal terms. For example, Mdl = regARIMA('AR',{0.2 0.1}) defines a regression model with AR(2) errors, and the coefficients are $a_1 = 0.2$ and $a_2 = 0.1$.
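The shorthand syntax, the derived properties P and Q, and dot notation can be tried together in a few lines. This is a small sketch with arbitrary orders (p = 2, D = 1, q = 1) chosen only for illustration.

```matlab
% Sketch: shorthand specification, derived properties, and dot notation.
Mdl = regARIMA(2,1,1);                 % regression model with ARIMA(2,1,1) errors

% P is the degree of a(L)(1 - L)^D, so P = p + D; Q equals q.
fprintf('P = %d, D = %d, Q = %d\n',Mdl.P,Mdl.D,Mdl.Q)

% Unknown coefficients are stored as NaN until you set or estimate them.
disp(Mdl.AR)

Mdl.Variance = 0.5;                    % fix the innovation variance

% Name-value syntax sets coefficient values directly.
Mdl2 = regARIMA('AR',{0.2 0.1});       % AR(2) errors with a1 = 0.2, a2 = 0.1
```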
Specify regARIMA Models Using Name-Value Pair Arguments
You can only specify the nonseasonal autoregressive and moving average polynomial degrees, and nonseasonal integration degree using the shorthand notation regARIMA(p,D,q). Some tasks, such as forecasting and simulation, require you to specify values for parameters. You cannot specify parameter values using shorthand notation. For maximum flexibility, use name-value pair arguments to specify regression models with ARIMA errors.
The nonseasonal ARIMA error model might contain the following polynomials:
• The degree p autoregressive polynomial $a(L) = 1 - a_1L - a_2L^2 - \ldots - a_pL^p$. The eigenvalues of a(L) must lie within the unit circle (i.e., a(L) must be a stable polynomial).
• The degree q moving average polynomial $b(L) = 1 + b_1L + b_2L^2 + \ldots + b_qL^q$. The eigenvalues of b(L) must lie within the unit circle (i.e., b(L) must be an invertible polynomial).
• The degree D nonseasonal integration polynomial is $(1 - L)^D$.
The following table contains the name-value pair arguments that you use to specify the ARIMA error model (i.e., a regression model with ARIMA errors, but without a regression component and intercept):
$$\begin{aligned} y_t &= u_t \\ a(L)(1 - L)^D u_t &= b(L)\varepsilon_t. \end{aligned} \tag{5-2}$$
Name-Value Pair Arguments for Nonseasonal ARIMA Error Models Name
Corresponding Model Term(s) in “Equation 5-2”
When to Specify
AR
Nonseasonal AR coefficients: a1, a2,...,ap
• To set equality constraints for the AR coefficients. For example, to specify the AR coefficients in the ARIMA error model ut = 0.8ut − 1 − 0.2ut − 2 + εt, specify 'AR',{0.8,-0.2}. • You only need to specify the nonzero elements of AR. If the nonzero coefficients are at nonconsecutive lags, specify the corresponding lags using ARLags. • The coefficients must correspond to a stable AR polynomial.
ARLags
Lags corresponding • ARLags is not a model property. to nonzero, Use this argument as a shortcut for specifying AR nonseasonal AR when the nonzero AR coefficients correspond to coefficients nonconsecutive lags. For example, to specify nonzero AR coefficients at lags 1 and 12, e.g., ut = a1ut − 1 + a2ut − 12 + εt, specify 'ARLags',[1,12]. • Use AR and ARLags together to specify known nonzero AR coefficients at nonconsecutive lags. For example, if in the given AR(12) error model with a1 = 0.6 and a12 = –0.3, then specify 'AR', {0.6,-0.3},'ARLags',[1,12].
D
Degree of nonseasonal differencing, D
• To specify a degree of nonseasonal differencing greater than zero. For example, to specify one degree of differencing, specify 'D',1. • By default, D has value 0 (meaning no nonseasonal integration).
Distribution
Distribution of the • Use this argument to specify a Student’s t innovation process, distribution. By default, the innovation distribution εt is "Gaussian". For example, to specify a t distribution with unknown degrees of freedom, specify 'Distribution','t'. • To specify a t innovation distribution with known degrees of freedom, assign Distribution a structure with fields Name and DoF. For example, for a t distribution with nine degrees of freedom, specify 'Distribution',struct('Name','t','DoF' ,9).
5-11
5
Time Series Regression Models
Name
Corresponding Model Term(s) in “Equation 5-2”
When to Specify
MA
Nonseasonal MA coefficients: b1, b2,...,bq
• To set equality constraints for the MA coefficients. For example, to specify the MA coefficients in the ARIMA error model ut = εt + 0.5εt − 1 + 0.2εt − 2, specify 'MA',{0.5,0.2}. • You only need to specify the nonzero elements of MA. If the nonzero coefficients are at nonconsecutive lags, specify the corresponding lags using MALags. • The coefficients must correspond to an invertible MA polynomial.
MALags
Lags corresponding to nonzero, nonseasonal MA coefficients
• MALags is not a model property.
• Use this argument as a shortcut for specifying MA when the nonzero MA coefficients correspond to nonconsecutive lags. For example, to specify nonzero MA coefficients at lags 1 and 4, e.g., $u_t = \varepsilon_t + b_1\varepsilon_{t-1} + b_4\varepsilon_{t-4}$, specify 'MALags',[1,4].
• Use MA and MALags together to specify known nonzero MA coefficients at nonconsecutive lags. For example, if $b_1 = 0.5$ and $b_4 = 0.2$ in the given MA(4) error model, specify 'MA',{0.5,0.2},'MALags',[1,4].
Variance
Scalar variance, $\sigma^2$, of the innovation process, $\varepsilon_t$
To set equality constraints for $\sigma^2$. For example, for an ARIMA error model with known innovation variance 0.1, specify 'Variance',0.1. By default, Variance has value NaN.
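As a rough illustration of how the arguments in this table combine, the following sketch builds one error model from the example values used in the rows above (AR lags 1 and 12, MA lags 1 and 4, one degree of differencing, a known variance, and t innovations with nine degrees of freedom). Combining them in a single call is an assumption made for illustration. The variable names and the root check with the base MATLAB function roots are not part of the original text.

```matlab
% Combine the nonseasonal name-value arguments from the table in one call.
Mdl = regARIMA('AR',{0.6,-0.3},'ARLags',[1,12], ...
    'MA',{0.5,0.2},'MALags',[1,4], ...
    'D',1,'Variance',0.1, ...
    'Distribution',struct('Name','t','DoF',9));

% P and Q are read-only and derived by the software:
% here P = p + D = 12 + 1 and Q = q = 4.
disp([Mdl.P Mdl.Q])

% Illustrative stability check (an assumption for this sketch):
% the AR polynomial a(z) = 1 - 0.6z + 0.3z^12 is stable when all of its
% roots lie outside the unit circle. roots expects descending powers.
arRoots = roots([0.3, zeros(1,10), -0.6, 1]);
isStableAR = all(abs(arRoots) > 1);
```

Because the AR and MA values are not NaN, estimate would treat them as equality constraints and estimate only the intercept and any regression coefficients.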
Use the name-value pair arguments in the following table in conjunction with those in Name-Value Pair Arguments for Nonseasonal ARIMA Error Models to specify the regression components of the regression model with ARIMA errors:

$$y_t = c + X_t\beta + u_t$$
$$a(L)(1 - L)^D u_t = b(L)\varepsilon_t. \tag{5-3}$$
Name-Value Pair Arguments for the Regression Component of the regARIMA Model
Name
Corresponding Model Term(s) in “Equation 5-3”
When to Specify
Beta
Regression coefficient values corresponding to the predictor series, $\beta$
• Use this argument to specify the values of the coefficients of the predictor series. For example, use 'Beta',[0.5 7 -2] to specify $\beta = [0.5\ \ 7\ \ {-2}]'$.
• By default, Beta is an empty vector, [].
Intercept
Intercept term for the regression model, c
• To set equality constraints for c. For example, for a model with no intercept term, specify 'Intercept',0.
• By default, Intercept has value NaN.
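The two regression arguments can be sketched together as follows. This is a minimal example under assumed values: the AR(1) error structure is chosen only so that the model has an error component, and the variable name numPredictors is hypothetical.

```matlab
% Regression component with known coefficients and no intercept.
% The AR(1) coefficient and the innovation variance stay NaN (estimable).
Mdl = regARIMA('ARLags',1,'Intercept',0,'Beta',[0.5 7 -2]);

% Beta holds one coefficient per predictor series.
numPredictors = numel(Mdl.Beta);
```

With this specification, any predictor matrix passed to estimate, simulate, or forecast would need three columns to match Beta.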
If the time series has seasonality s, then
• The degree $p_s$ seasonal autoregressive polynomial is $A(L) = 1 - A_1L - A_2L^2 - \dots - A_{p_s}L^{p_s}$.
• The degree $q_s$ seasonal moving average polynomial is $B(L) = 1 + B_1L + B_2L^2 + \dots + B_{q_s}L^{q_s}$.
• The degree s seasonal integration polynomial is $(1 - L^s)$.
Use the name-value pair arguments in the following table in conjunction with those in tables Name-Value Pair Arguments for Nonseasonal ARIMA Error Models and Name-Value Pair Arguments for the Regression Component of the regARIMA Model to specify the regression model with multiplicative seasonal ARIMA errors:

$$y_t = c + X_t\beta + u_t$$
$$a(L)(1 - L)^D A(L)(1 - L^s) u_t = b(L)B(L)\varepsilon_t. \tag{5-4}$$
Name-Value Pair Arguments for Seasonal ARIMA Models
Argument
Corresponding Model Term(s) in “Equation 5-4”
When to Specify
SAR
Seasonal AR coefficients: $A_1, A_2, \dots, A_{p_s}$
• To set equality constraints for the seasonal AR coefficients.
• Use SARLags to specify the lags of the nonzero seasonal AR coefficients. Specify the lags associated with the seasonal polynomials in the periodicity of the observed data (e.g., 4, 8,... for quarterly data, or 12, 24,... for monthly data), and not as multiples of the seasonality (e.g., 1, 2,...). For example, to specify the ARIMA error model $(1 - 0.8L)(1 - 0.2L^{12})u_t = \varepsilon_t$, specify 'AR',0.8,'SAR',0.2,'SARLags',12.
• The coefficients must correspond to a stable seasonal AR polynomial.
SARLags
Lags corresponding to nonzero seasonal AR coefficients, in the periodicity of the responses
• SARLags is not a model property.
• Use this argument when specifying SAR to indicate the lags of the nonzero seasonal AR coefficients. For example, to specify the ARIMA error model $(1 - a_1L)(1 - A_{12}L^{12})u_t = \varepsilon_t$, specify 'ARLags',1,'SARLags',12.
SMA
Seasonal MA coefficients: $B_1, B_2, \dots, B_{q_s}$
• To set equality constraints for the seasonal MA coefficients.
• Use SMALags to specify the lags of the nonzero seasonal MA coefficients. Specify the lags associated with the seasonal polynomials in the periodicity of the observed data (e.g., 4, 8,... for quarterly data, or 12, 24,... for monthly data), and not as multiples of the seasonality (e.g., 1, 2,...). For example, to specify the ARIMA error model $u_t = (1 + 0.6L)(1 + 0.2L^4)\varepsilon_t$, specify 'MA',0.6,'SMA',0.2,'SMALags',4.
• The coefficients must correspond to an invertible seasonal MA polynomial.
SMALags
Lags corresponding to the nonzero seasonal MA coefficients, in the periodicity of the responses
• SMALags is not a model property.
• Use this argument when specifying SMA to indicate the lags of the nonzero seasonal MA coefficients. For example, to specify the model $u_t = (1 + b_1L)(1 + B_4L^4)\varepsilon_t$, specify 'MALags',1,'SMALags',4.
Seasonality
Seasonal periodicity, s
• To specify the degree of seasonal integration s in the seasonal differencing polynomial $\Delta_s = 1 - L^s$. For example, to specify the periodicity for seasonal integration of quarterly data, specify 'Seasonality',4.
• By default, Seasonality has value 0 (meaning no periodicity and no seasonal integration).
Note You cannot assign values to the properties P and Q. For multiplicative ARIMA error models,
• regARIMA sets P equal to $p + D + p_s + s$.
• regARIMA sets Q equal to $q + q_s$.
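The bookkeeping in this note can be checked on a small seasonal specification. The sketch below assumes monthly data and a hypothetical error model with one nonseasonal and one seasonal lag in each polynomial; the scalar variables p, D, ps, s, q, and qs exist only to mirror the notation of the note.

```matlab
% Multiplicative seasonal error model: AR lag 1, seasonal AR lag 12,
% MA lag 1, seasonal MA lag 12, one nonseasonal difference, and
% seasonal integration of period 12. All coefficients remain NaN.
Mdl = regARIMA('ARLags',1,'SARLags',12,'MALags',1,'SMALags',12, ...
    'D',1,'Seasonality',12);

% Degrees implied by the specification, in the notation of the note.
p = 1; D = 1; ps = 12; s = 12;
q = 1; qs = 12;

expectedP = p + D + ps + s;   % compound AR polynomial degree
expectedQ = q + qs;           % compound MA polynomial degree
```

Comparing expectedP and expectedQ with Mdl.P and Mdl.Q shows how many presample observations the software needs to initialize the model.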
Specify Linear Regression Models Using Econometric Modeler App
You can specify the predictor variables in the regression component, and the error model lag structure and innovation distribution, using the Econometric Modeler app. The app treats all coefficients as unknown and estimable. At the command line, open the Econometric Modeler app.

    econometricModeler
Alternatively, open the app from the apps gallery (see Econometric Modeler). In the app, you can see all supported models by selecting a time series variable for the response in the Time Series pane. Then, on the Econometric Modeler tab, in the Models section, click the arrow to display the models gallery.
The Regression Models section contains supported regression models. To specify a multiple linear regression (MLR) model, select MLR. To specify regression models with ARMA errors, select RegARMA. After you select a model, the app displays the Type Model Parameters dialog box, where Type is the model type. This figure shows the RegARMA Model Parameters dialog box.
Adjustable parameters depend on the model Type. In general, adjustable parameters include:
• Predictor variables for the linear regression component, listed in the Predictors section.
  • For regression models with ARMA errors, you must include at least one predictor in the model. To include a predictor, select the corresponding check box in the Include? column.
  • For MLR models, you can clear all check boxes in the Include? column. In this case, you can specify a constant mean model (intercept-only model) by selecting the Include Intercept check box. Or, you can specify an error-only model by clearing the Include Intercept check box.
• The innovation distribution and nonseasonal lags for the error model, for regression models with ARMA errors.
As you adjust parameter values, the equation in the Model Equation section changes to match your specifications. Adjustable parameters correspond to input and name-value pair arguments described in the previous sections and in the regARIMA reference page. For more details on specifying models using the app, see “Fitting Models to Data” on page 4-15 and “Specifying Univariate Lag Operator Polynomials Interactively” on page 4-44.
See Also
Apps
Econometric Modeler
Objects
regARIMA

Related Examples
• “Analyze Time Series Data Using Econometric Modeler” on page 4-2
• “Specifying Univariate Lag Operator Polynomials Interactively” on page 4-44
• “Specify the Default Regression Model with ARIMA Errors” on page 5-19
• “Modify regARIMA Model Properties” on page 5-21
• “Create Regression Models with AR Errors” on page 5-26
• “Create Regression Models with MA Errors” on page 5-31
• “Create Regression Models with ARMA Errors” on page 5-36
• “Create Regression Models with SARIMA Errors” on page 5-49
• “Specify ARIMA Error Model Innovation Distribution” on page 5-59

More About
• “Regression Models with Time Series Errors” on page 5-5
Specify the Default Regression Model with ARIMA Errors
This example shows how to specify the default regression model with ARIMA errors using the shorthand ARIMA(p, D, q) notation corresponding to the following equation:

$$y_t = c + u_t$$
$$(1 - \phi_1L - \phi_2L^2 - \phi_3L^3)(1 - L)^D u_t = (1 + \theta_1L + \theta_2L^2)\varepsilon_t.$$

Specify a regression model with ARIMA(3,1,2) errors.

    Mdl = regARIMA(3,1,2)

    Mdl =
      regARIMA with properties:
         Description: "ARIMA(3,1,2) Error Model (Gaussian Distribution)"
        Distribution: Name = "Gaussian"
           Intercept: NaN
                Beta: [1×0]
                   P: 4
                   D: 1
                   Q: 2
                  AR: {NaN NaN NaN} at lags [1 2 3]
                 SAR: {}
                  MA: {NaN NaN} at lags [1 2]
                 SMA: {}
            Variance: NaN
The model specification for Mdl appears in the Command Window. By default, regARIMA sets:
• The autoregressive (AR) parameter values to NaN at lags [1 2 3]
• The moving average (MA) parameter values to NaN at lags [1 2]
• The variance (Variance) of the innovation process, $\varepsilon_t$, to NaN
• The distribution (Distribution) of $\varepsilon_t$ to Gaussian
• The regression model intercept to NaN
There is no regression component (Beta) by default.
The properties P, D, and Q have these meanings:
• P = p + D, which represents the number of presample observations that the software requires to initialize the autoregressive component of the model to perform, for example, estimation.
• D represents the level of nonseasonal integration.
• Q represents the number of presample observations that the software requires to initialize the moving average component of the model to perform, for example, estimation.
Fit Mdl to data by passing it and the data into estimate. If you pass the predictor series into estimate, then estimate estimates Beta by default. You can modify the properties of Mdl using dot notation.

References:
Box, G. E. P., G. M. Jenkins, and G. C. Reinsel. Time Series Analysis: Forecasting and Control. 3rd ed. Englewood Cliffs, NJ: Prentice Hall, 1994.
See Also
regARIMA | estimate | simulate | forecast

Related Examples
• “Create Regression Models with ARIMA Errors” on page 5-8
• “Modify regARIMA Model Properties” on page 5-21
• “Create Regression Models with AR Errors” on page 5-26
• “Create Regression Models with MA Errors” on page 5-31
• “Create Regression Models with ARMA Errors” on page 5-36
• “Create Regression Models with SARIMA Errors” on page 5-49
• “Specify ARIMA Error Model Innovation Distribution” on page 5-59

More About
• “Regression Models with Time Series Errors” on page 5-5
Modify regARIMA Model Properties
In this section...
“Modify Properties Using Dot Notation” on page 5-21
“Nonmodifiable Properties” on page 5-23

Modify Properties Using Dot Notation
If you create a regression model with ARIMA errors using regARIMA, then the software assigns values to all of its properties. To change any of these property values, you do not need to reconstruct the entire model. You can modify property values of an existing model using dot notation. To access the property, type the model name, then the property name, separated by '.' (a period).
Specify the regression model with ARIMA(3,1,2) errors

$$y_t = c + u_t$$
$$(1 - \phi_1L - \phi_2L^2 - \phi_3L^3)(1 - L)^D u_t = (1 + \theta_1L + \theta_2L^2)\varepsilon_t.$$

    Mdl = regARIMA(3,1,2);

Use cell array notation to set the autoregressive and moving average parameters to values.

    Mdl.AR = {0.2 0.1 0.05};
    Mdl.MA = {0.1 -0.05}

    Mdl =
      regARIMA with properties:
         Description: "ARIMA(3,1,2) Error Model (Gaussian Distribution)"
        Distribution: Name = "Gaussian"
           Intercept: NaN
                Beta: [1×0]
                   P: 4
                   D: 1
                   Q: 2
                  AR: {0.2 0.1 0.05} at lags [1 2 3]
                 SAR: {}
                  MA: {0.1 -0.05} at lags [1 2]
                 SMA: {}
            Variance: NaN
Use dot notation to display the autoregressive coefficients of Mdl in the Command Window.

    ARCoeff = Mdl.AR

    ARCoeff=1×3 cell array
        {[0.2000]}    {[0.1000]}    {[0.0500]}

ARCoeff is a 1-by-3 cell array. Each successive cell contains the coefficient at the next autoregressive lag.
You can also add more lag coefficients.

    Mdl.MA = {0.1 -0.05 0.01}

    Mdl =
      regARIMA with properties:
         Description: "ARIMA(3,1,3) Error Model (Gaussian Distribution)"
        Distribution: Name = "Gaussian"
           Intercept: NaN
                Beta: [1×0]
                   P: 4
                   D: 1
                   Q: 3
                  AR: {0.2 0.1 0.05} at lags [1 2 3]
                 SAR: {}
                  MA: {0.1 -0.05 0.01} at lags [1 2 3]
                 SMA: {}
            Variance: NaN
By default, the specification sets the new coefficient to the next consecutive lag. The addition of the new coefficient increases Q by 1.
You can assign a coefficient to a specific lag by using cell indexing.

    Mdl.AR{12} = 0.01

    Mdl =
      regARIMA with properties:
         Description: "ARIMA(12,1,3) Error Model (Gaussian Distribution)"
        Distribution: Name = "Gaussian"
           Intercept: NaN
                Beta: [1×0]
                   P: 13
                   D: 1
                   Q: 3
                  AR: {0.2 0.1 0.05 0.01} at lags [1 2 3 12]
                 SAR: {}
                  MA: {0.1 -0.05 0.01} at lags [1 2 3]
                 SMA: {}
            Variance: NaN
The autoregressive coefficient 0.01 is located at the 12th lag. Property P increases to 13 with the new specification.
Set the innovation distribution to the t distribution with NaN degrees of freedom.

    Distribution = struct('Name','t','DoF',NaN);
    Mdl.Distribution = Distribution

    Mdl =
      regARIMA with properties:
         Description: "ARIMA(12,1,3) Error Model (t Distribution)"
        Distribution: Name = "t", DoF = NaN
           Intercept: NaN
                Beta: [1×0]
                   P: 13
                   D: 1
                   Q: 3
                  AR: {0.2 0.1 0.05 0.01} at lags [1 2 3 12]
                 SAR: {}
                  MA: {0.1 -0.05 0.01} at lags [1 2 3]
                 SMA: {}
            Variance: NaN
If DoF is NaN, then estimate estimates the degrees of freedom. For other tasks, such as simulating or forecasting a model, you must specify a value for DoF.
To specify a regression coefficient, assign a vector to the property Beta.

    Mdl.Beta = [1; 3; -5]

    Mdl =
      regARIMA with properties:
         Description: "Regression with ARIMA(12,1,3) Error Model (t Distribution)"
        Distribution: Name = "t", DoF = NaN
           Intercept: NaN
                Beta: [1 3 -5]
                   P: 13
                   D: 1
                   Q: 3
                  AR: {0.2 0.1 0.05 0.01} at lags [1 2 3 12]
                 SAR: {}
                  MA: {0.1 -0.05 0.01} at lags [1 2 3]
                 SMA: {}
            Variance: NaN
If you pass Mdl into estimate with the response data and three predictor series, then the software fixes the non-NaN parameters at their values and estimates Intercept, Variance, and DoF. If you instead want to simulate data from this model, then you must first specify values for the remaining NaN parameters (Intercept, Variance, and DoF).
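The dot-notation workflow can be sketched end to end as follows. This is a minimal illustration, assuming arbitrary values for Intercept, Variance, and DoF and a hypothetical predictor matrix X of random numbers; none of these values come from the original example.

```matlab
% Start from the default ARIMA(3,1,2) error model and fill in properties.
Mdl = regARIMA(3,1,2);
Mdl.AR = {0.2 0.1 0.05};
Mdl.MA = {0.1 -0.05};
Mdl.Beta = [1; 3; -5];

% Assumed values for the parameters that are still NaN.
Mdl.Intercept = 0;
Mdl.Variance = 0.5;
Mdl.Distribution = struct('Name','t','DoF',10);

% With no NaN values left, the model can be simulated.
rng(1);                       % for reproducibility
X = randn(50,3);              % hypothetical predictor data, one column per Beta
[y,e,u] = simulate(Mdl,50,'X',X);
```

Here y holds the simulated responses, e the innovations, and u the unconditional disturbances, each with one row per simulated observation.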
Nonmodifiable Properties
Not all properties of a regARIMA model are modifiable. To change them directly, you must redefine the model using regARIMA. Nonmodifiable properties include:
• P, which is the compound autoregressive polynomial degree. The software determines P from p, D, ps, and s. For details on notation, see “Regression Model with ARIMA Time Series Errors” on page 12-1837.
• Q, which is the compound moving average degree. The software determines Q from q and qs.
• DoF, which is the degrees of freedom for models having a t-distributed innovation process.
Though they are not explicitly properties, you cannot reassign or print the lag structure using ARLags, MALags, SARLags, or SMALags. Pass these and the lag structure into regARIMA as name-value pair arguments when you specify the model.
For example, specify a regression model with ARMA(4,1) errors using regARIMA, where the autoregressive coefficients occur at lags 1 and 4.

    Mdl = regARIMA('ARLags',[1 4],'MALags',1)

    Mdl =
      regARIMA with properties:
         Description: "ARMA(4,1) Error Model (Gaussian Distribution)"
        Distribution: Name = "Gaussian"
           Intercept: NaN
                Beta: [1×0]
                   P: 4
                   Q: 1
                  AR: {NaN NaN} at lags [1 4]
                 SAR: {}
                  MA: {NaN} at lag [1]
                 SMA: {}
            Variance: NaN
You can produce the same results by specifying a regression model with ARMA(1,1) errors, then adding an autoregressive coefficient at the fourth lag.

    Mdl = regARIMA(1,0,1);
    Mdl.AR{4} = NaN

    Mdl =
      regARIMA with properties:
         Description: "ARMA(4,1) Error Model (Gaussian Distribution)"
        Distribution: Name = "Gaussian"
           Intercept: NaN
                Beta: [1×0]
                   P: 4
                   Q: 1
                  AR: {NaN NaN} at lags [1 4]
                 SAR: {}
                  MA: {NaN} at lag [1]
                 SMA: {}
            Variance: NaN
To change the value of DoF, you must define a new structure for the distribution, and use dot notation to pass it into the model. For example, specify a regression model with AR(1) errors having t-distributed innovations.

    Mdl = regARIMA('AR',0.5,'Distribution','t')

    Mdl =
      regARIMA with properties:
         Description: "ARMA(1,0) Error Model (t Distribution)"
        Distribution: Name = "t", DoF = NaN
           Intercept: NaN
                Beta: [1×0]
                   P: 1
                   Q: 0
                  AR: {0.5} at lag [1]
                 SAR: {}
                  MA: {}
                 SMA: {}
            Variance: NaN

The value of DoF is NaN by default. Specify that the t distribution has 10 degrees of freedom.

    Distribution = struct('Name','t','DoF',10);
    Mdl.Distribution = Distribution

    Mdl =
      regARIMA with properties:
         Description: "ARMA(1,0) Error Model (t Distribution)"
        Distribution: Name = "t", DoF = 10
           Intercept: NaN
                Beta: [1×0]
                   P: 1
                   Q: 0
                  AR: {0.5} at lag [1]
                 SAR: {}
                  MA: {}
                 SMA: {}
            Variance: NaN
See Also
regARIMA | estimate | simulate | forecast

Related Examples
• “Create Regression Models with ARIMA Errors” on page 5-8
• “Specify the Default Regression Model with ARIMA Errors” on page 5-19
• “Create Regression Models with AR Errors” on page 5-26
• “Create Regression Models with MA Errors” on page 5-31
• “Create Regression Models with ARMA Errors” on page 5-36
• “Create Regression Models with SARIMA Errors” on page 5-49
• “Specify ARIMA Error Model Innovation Distribution” on page 5-59

More About
• “Regression Models with Time Series Errors” on page 5-5
Create Regression Models with AR Errors
In this section...
“Default Regression Model with AR Errors” on page 5-26
“AR Error Model Without an Intercept” on page 5-27
“AR Error Model with Nonconsecutive Lags” on page 5-27
“Known Parameter Values for a Regression Model with AR Errors” on page 5-28
“Regression Model with AR Errors and t Innovations” on page 5-29

These examples show how to create regression models with AR errors using regARIMA. For details on specifying regression models with AR errors using the Econometric Modeler app, see “Specify Regression Model with ARMA Errors Using Econometric Modeler App” on page 5-40.
Default Regression Model with AR Errors
This example shows how to apply the shorthand regARIMA(p,D,q) syntax to specify a regression model with AR errors.
Specify the default regression model with AR(3) errors:

$$y_t = c + X_t\beta + u_t$$
$$u_t = a_1u_{t-1} + a_2u_{t-2} + a_3u_{t-3} + \varepsilon_t.$$

    Mdl = regARIMA(3,0,0)

    Mdl =
      regARIMA with properties:
         Description: "ARMA(3,0) Error Model (Gaussian Distribution)"
        Distribution: Name = "Gaussian"
           Intercept: NaN
                Beta: [1×0]
                   P: 3
                   Q: 0
                  AR: {NaN NaN NaN} at lags [1 2 3]
                 SAR: {}
                  MA: {}
                 SMA: {}
            Variance: NaN
The software sets the innovation distribution to Gaussian, and each parameter to NaN. The AR coefficients are at lags 1 through 3. Pass Mdl into estimate with data to estimate the parameters set to NaN. Though Beta is not in the display, if you pass a matrix of predictors ( Xt) into estimate, then estimate estimates Beta. The estimate function infers the number of regression coefficients in Beta from the number of columns in Xt. Tasks such as simulation and forecasting using simulate and forecast do not accept models with at least one NaN for a parameter value. Use dot notation to modify parameter values.
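The estimation step described above might look like the following sketch. The data here are simulated from an assumed "true" model (TrueMdl, with arbitrary coefficient values) only so that the example is self-contained; in practice y and X would be observed data.

```matlab
rng(1);                                   % for reproducibility
T = 300;
X = randn(T,2);                           % two hypothetical predictor series

% Assumed data-generating model, used only to create example data.
TrueMdl = regARIMA('AR',{0.5,-0.2,0.1},'Intercept',1, ...
    'Beta',[2; -1],'Variance',0.5);
y = simulate(TrueMdl,T,'X',X);

% Default regression model with AR(3) errors; every parameter is NaN.
Mdl = regARIMA(3,0,0);

% estimate infers the length of Beta from the number of columns of X.
EstMdl = estimate(Mdl,y,'X',X,'Display','off');
numBeta = numel(EstMdl.Beta);
```

Because X has two columns, EstMdl.Beta contains two estimated regression coefficients even though Mdl.Beta was empty.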
AR Error Model Without an Intercept
This example shows how to specify a regression model with AR errors without a regression intercept.
Specify the default regression model with AR(3) errors:

$$y_t = X_t\beta + u_t$$
$$u_t = a_1u_{t-1} + a_2u_{t-2} + a_3u_{t-3} + \varepsilon_t.$$

    Mdl = regARIMA('ARLags',1:3,'Intercept',0)

    Mdl =
      regARIMA with properties:
         Description: "ARMA(3,0) Error Model (Gaussian Distribution)"
        Distribution: Name = "Gaussian"
           Intercept: 0
                Beta: [1×0]
                   P: 3
                   Q: 0
                  AR: {NaN NaN NaN} at lags [1 2 3]
                 SAR: {}
                  MA: {}
                 SMA: {}
            Variance: NaN
The software sets Intercept to 0, but all other estimable parameters in Mdl are NaN values by default. Since Intercept is not a NaN, it is an equality constraint during estimation. In other words, if you pass Mdl and data into estimate, then estimate sets Intercept to 0 during estimation. You can modify the properties of Mdl using dot notation.
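A quick way to see the equality constraint in action is to fit the model and inspect the intercept afterward. In this sketch the response is built by hand with the base MATLAB function filter, using assumed AR coefficients and regression coefficients; the data are hypothetical.

```matlab
rng(2);                                   % for reproducibility
T = 300;
X = randn(T,2);                           % hypothetical predictors
e = randn(T,1);                           % Gaussian innovations

% AR(3) disturbances: u_t = 0.5u_{t-1} - 0.2u_{t-2} + 0.1u_{t-3} + e_t
u = filter(1,[1 -0.5 0.2 -0.1],e);
y = X*[2; -1] + u;                        % response without an intercept

% Intercept is fixed at 0; the AR coefficients, Beta, and Variance are estimated.
Mdl = regARIMA('ARLags',1:3,'Intercept',0);
EstMdl = estimate(Mdl,y,'X',X,'Display','off');
```

After estimation, EstMdl.Intercept is still exactly 0, while the AR coefficients no longer contain NaN values.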
AR Error Model with Nonconsecutive Lags
This example shows how to specify a regression model with AR errors, where the nonzero AR terms are at nonconsecutive lags.
Specify the regression model with AR(4) errors:

$$y_t = c + X_t\beta + u_t$$
$$u_t = a_1u_{t-1} + a_4u_{t-4} + \varepsilon_t.$$

    Mdl = regARIMA('ARLags',[1,4])

    Mdl =
      regARIMA with properties:
         Description: "ARMA(4,0) Error Model (Gaussian Distribution)"
        Distribution: Name = "Gaussian"
           Intercept: NaN
                Beta: [1×0]
                   P: 4
                   Q: 0
                  AR: {NaN NaN} at lags [1 4]
                 SAR: {}
                  MA: {}
                 SMA: {}
            Variance: NaN
The AR coefficients are at lags 1 and 4. Verify that the AR coefficients at lags 2 and 3 are 0.

    Mdl.AR

    ans=1×4 cell array
        {[NaN]}    {[0]}    {[0]}    {[NaN]}

The software displays a 1-by-4 cell array. Each consecutive cell contains the corresponding AR coefficient value.
Pass Mdl and data into estimate. The software estimates all parameters that have the value NaN, and holds $a_2 = 0$ and $a_3 = 0$ during estimation.
Known Parameter Values for a Regression Model with AR Errors
This example shows how to specify values for all parameters of a regression model with AR errors.
Specify the regression model with AR(4) errors:

$$y_t = X_t\begin{bmatrix} -2 \\ 0.5 \end{bmatrix} + u_t$$
$$u_t = 0.2u_{t-1} + 0.1u_{t-4} + \varepsilon_t,$$

where $\varepsilon_t$ is Gaussian with unit variance.

    Mdl = regARIMA('AR',{0.2,0.1},'ARLags',[1,4], ...
        'Intercept',0,'Beta',[-2;0.5],'Variance',1)

    Mdl =
      regARIMA with properties:
         Description: "Regression with ARMA(4,0) Error Model (Gaussian Distribution)"
        Distribution: Name = "Gaussian"
           Intercept: 0
                Beta: [-2 0.5]
                   P: 4
                   Q: 0
                  AR: {0.2 0.1} at lags [1 4]
                 SAR: {}
                  MA: {}
                 SMA: {}
            Variance: 1
There are no NaN values in any Mdl properties, and therefore there is no need to estimate Mdl using estimate. However, you can simulate or forecast responses from Mdl using simulate or forecast.
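For instance, a fully specified model can be simulated and then used for forecasting, as in the sketch below. The predictor matrix X is hypothetical random data, and the split into 100 in-sample rows and 10 forecast-horizon rows is an assumption made for illustration.

```matlab
% Fully specified regression model with AR errors at lags 1 and 4.
Mdl = regARIMA('AR',{0.2,0.1},'ARLags',[1,4], ...
    'Intercept',0,'Beta',[-2;0.5],'Variance',1);

rng(1);                                   % for reproducibility
X = randn(110,2);                         % hypothetical predictor data
X0 = X(1:100,:);                          % in-sample predictors
XF = X(101:110,:);                        % predictors over the forecast horizon

% Simulate 100 responses, then forecast 10 periods ahead.
y = simulate(Mdl,100,'X',X0);
[yF,yMSE] = forecast(Mdl,10,'Y0',y,'X0',X0,'XF',XF);
```

The presample responses Y0 and predictors X0 let forecast infer the presample disturbances, and XF supplies the regression component over the forecast horizon.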
Regression Model with AR Errors and t Innovations
This example shows how to set the innovation distribution of a regression model with AR errors to a t distribution.
Specify the regression model with AR(4) errors:

$$y_t = X_t\begin{bmatrix} -2 \\ 0.5 \end{bmatrix} + u_t$$
$$u_t = 0.2u_{t-1} + 0.1u_{t-4} + \varepsilon_t,$$

where $\varepsilon_t$ has a t distribution with the default degrees of freedom and unit variance.

    Mdl = regARIMA('AR',{0.2,0.1},'ARLags',[1,4],...
        'Intercept',0,'Beta',[-2;0.5],'Variance',1,...
        'Distribution','t')

    Mdl =
      regARIMA with properties:
         Description: "Regression with ARMA(4,0) Error Model (t Distribution)"
        Distribution: Name = "t", DoF = NaN
           Intercept: 0
                Beta: [-2 0.5]
                   P: 4
                   Q: 0
                  AR: {0.2 0.1} at lags [1 4]
                 SAR: {}
                  MA: {}
                 SMA: {}
            Variance: 1

The default degrees of freedom is NaN. If you don't know the degrees of freedom, then you can estimate it by passing Mdl and the data to estimate.
Specify a $t_{10}$ distribution.

    Mdl.Distribution = struct('Name','t','DoF',10)

    Mdl =
      regARIMA with properties:
         Description: "Regression with ARMA(4,0) Error Model (t Distribution)"
        Distribution: Name = "t", DoF = 10
           Intercept: 0
                Beta: [-2 0.5]
                   P: 4
                   Q: 0
                  AR: {0.2 0.1} at lags [1 4]
                 SAR: {}
                  MA: {}
                 SMA: {}
            Variance: 1
You can simulate or forecast responses using simulate or forecast because Mdl is completely specified. In applications, such as simulation, the software normalizes the random t innovations. In other words, Variance overrides the theoretical variance of the t random variable (which is DoF/(DoF - 2)), but preserves the kurtosis of the distribution.
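The normalization can be written out explicitly. The following is a sketch of the standard scaling argument, not a statement of how the toolbox implements it internally. Here $T_\nu$ denotes a standard Student's t variable with $\nu$ = DoF > 2 degrees of freedom, and $\sigma^2$ denotes Variance; both symbols are introduced only for this illustration.

```latex
\operatorname{Var}(T_\nu) = \frac{\nu}{\nu - 2},
\qquad
\varepsilon_t = \sigma \sqrt{\frac{\nu - 2}{\nu}}\; T_\nu
\quad\Longrightarrow\quad
\operatorname{Var}(\varepsilon_t) = \sigma^2 \cdot \frac{\nu - 2}{\nu} \cdot \frac{\nu}{\nu - 2} = \sigma^2 .
```

Because rescaling by a constant does not change kurtosis, the scaled innovations keep the excess kurtosis of the t distribution, which is $6/(\nu - 4)$ for $\nu > 4$ (for example, 1 when DoF is 10).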
See Also
Apps
Econometric Modeler
Objects
regARIMA
Functions
estimate | simulate | forecast

Related Examples
• “Analyze Time Series Data Using Econometric Modeler” on page 4-2
• “Specifying Univariate Lag Operator Polynomials Interactively” on page 4-44
• “Create Regression Models with ARIMA Errors” on page 5-8
• “Specify the Default Regression Model with ARIMA Errors” on page 5-19
• “Create Regression Models with MA Errors” on page 5-31
• “Create Regression Models with ARMA Errors” on page 5-36
• “Create Regression Models with SARIMA Errors” on page 5-49
• “Specify ARIMA Error Model Innovation Distribution” on page 5-59

More About
• “Regression Models with Time Series Errors” on page 5-5
Create Regression Models with MA Errors
In this section...
“Default Regression Model with MA Errors” on page 5-31
“MA Error Model Without an Intercept” on page 5-32
“MA Error Model with Nonconsecutive Lags” on page 5-32
“Known Parameter Values for a Regression Model with MA Errors” on page 5-33
“Regression Model with MA Errors and t Innovations” on page 5-34

These examples show how to create regression models with MA errors using regARIMA. For details on specifying regression models with MA errors using the Econometric Modeler app, see “Specify Regression Model with ARMA Errors Using Econometric Modeler App” on page 5-40.
Default Regression Model with MA Errors
This example shows how to apply the shorthand regARIMA(p,D,q) syntax to specify the regression model with MA errors.
Specify the default regression model with MA(2) errors:

$$y_t = c + X_t\beta + u_t$$
$$u_t = \varepsilon_t + b_1\varepsilon_{t-1} + b_2\varepsilon_{t-2}.$$

    Mdl = regARIMA(0,0,2)

    Mdl =
      regARIMA with properties:
         Description: "ARMA(0,2) Error Model (Gaussian Distribution)"
        Distribution: Name = "Gaussian"
           Intercept: NaN
                Beta: [1×0]
                   P: 0
                   Q: 2
                  AR: {}
                 SAR: {}
                  MA: {NaN NaN} at lags [1 2]
                 SMA: {}
            Variance: NaN
The software sets each parameter to NaN, and the innovation distribution to Gaussian. The MA coefficients are at lags 1 and 2. Pass Mdl into estimate with data to estimate the parameters set to NaN. Though Beta is not in the display, if you pass a matrix of predictors ( Xt) into estimate, then estimate estimates Beta. The estimate function infers the number of regression coefficients in Beta from the number of columns in Xt. Tasks such as simulation and forecasting using simulate and forecast do not accept models with at least one NaN for a parameter value. Use dot notation to modify parameter values.
MA Error Model Without an Intercept
This example shows how to specify a regression model with MA errors without a regression intercept.
Specify the default regression model with MA(2) errors:

$$y_t = X_t\beta + u_t$$
$$u_t = \varepsilon_t + b_1\varepsilon_{t-1} + b_2\varepsilon_{t-2}.$$

    Mdl = regARIMA('MALags',1:2,'Intercept',0)

    Mdl =
      regARIMA with properties:
         Description: "ARMA(0,2) Error Model (Gaussian Distribution)"
        Distribution: Name = "Gaussian"
           Intercept: 0
                Beta: [1×0]
                   P: 0
                   Q: 2
                  AR: {}
                 SAR: {}
                  MA: {NaN NaN} at lags [1 2]
                 SMA: {}
            Variance: NaN
The software sets Intercept to 0, but all other parameters in Mdl are NaN values by default. Since Intercept is not a NaN, it is an equality constraint during estimation. In other words, if you pass Mdl and data into estimate, then estimate sets Intercept to 0 during estimation. You can modify the properties of Mdl using dot notation.
MA Error Model with Nonconsecutive Lags
This example shows how to specify a regression model with MA errors, where the nonzero MA terms are at nonconsecutive lags.
Specify the regression model with MA(12) errors:

$$y_t = c + X_t\beta + u_t$$
$$u_t = \varepsilon_t + b_1\varepsilon_{t-1} + b_{12}\varepsilon_{t-12}.$$

    Mdl = regARIMA('MALags',[1, 12])

    Mdl =
      regARIMA with properties:
         Description: "ARMA(0,12) Error Model (Gaussian Distribution)"
        Distribution: Name = "Gaussian"
           Intercept: NaN
                Beta: [1×0]
                   P: 0
                   Q: 12
                  AR: {}
                 SAR: {}
                  MA: {NaN NaN} at lags [1 12]
                 SMA: {}
            Variance: NaN
The MA coefficients are at lags 1 and 12. Verify that the MA coefficients at lags 2 through 11 are 0.

    Mdl.MA'

    ans=12×1 cell array
        {[NaN]}
        {[  0]}
        {[  0]}
        {[  0]}
        {[  0]}
        {[  0]}
        {[  0]}
        {[  0]}
        {[  0]}
        {[  0]}
        {[  0]}
        {[NaN]}

After applying the transpose, the software displays a 12-by-1 cell array. Each consecutive cell contains the corresponding MA coefficient value.
Pass Mdl and data into estimate. The software estimates all parameters that have the value NaN, and holds $b_2 = b_3 = \dots = b_{11} = 0$ during estimation.
Known Parameter Values for a Regression Model with MA Errors
This example shows how to specify values for all parameters of a regression model with MA errors.
Specify the regression model with MA(2) errors:

$$y_t = X_t\begin{bmatrix} 0.5 \\ -3 \\ 1.2 \end{bmatrix} + u_t$$
$$u_t = \varepsilon_t + 0.5\varepsilon_{t-1} - 0.1\varepsilon_{t-2},$$

where $\varepsilon_t$ is Gaussian with unit variance.

    Mdl = regARIMA('Intercept',0,'Beta',[0.5; -3; 1.2],...
        'MA',{0.5, -0.1},'Variance',1)

    Mdl =
      regARIMA with properties:
         Description: "Regression with ARMA(0,2) Error Model (Gaussian Distribution)"
        Distribution: Name = "Gaussian"
           Intercept: 0
                Beta: [0.5 -3 1.2]
                   P: 0
                   Q: 2
                  AR: {}
                 SAR: {}
                  MA: {0.5 -0.1} at lags [1 2]
                 SMA: {}
            Variance: 1
The parameters in Mdl do not contain NaN values, and therefore there is no need to estimate Mdl using estimate. However, you can simulate or forecast responses from Mdl using simulate or forecast.
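Because every parameter is known, quantities implied by the model can be computed directly from its properties. As one illustration, the sketch below computes the unconditional variance of the MA(2) disturbances, which for a pure MA process is $\sigma^2(1 + b_1^2 + b_2^2)$. This formula is a standard time series result rather than something stated in the original example, and the variable names b and varU are hypothetical.

```matlab
% Fully specified regression model with MA(2) errors.
Mdl = regARIMA('Intercept',0,'Beta',[0.5; -3; 1.2], ...
    'MA',{0.5, -0.1},'Variance',1);

% Collect the MA coefficients [b1 b2] from the cell array property.
b = cell2mat(Mdl.MA);

% Unconditional variance of u_t for an MA(q) process:
% Var(u_t) = sigma^2 * (1 + b1^2 + ... + bq^2).
varU = Mdl.Variance*(1 + sum(b.^2));
```

The same pattern (reading coefficients out of Mdl.AR or Mdl.MA with cell2mat) is a convenient way to reuse known parameter values in other calculations.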
Regression Model with MA Errors and t Innovations
This example shows how to set the innovation distribution of a regression model with MA errors to a t distribution.
Specify the regression model with MA(2) errors:

$$y_t = X_t\begin{bmatrix} 0.5 \\ -3 \\ 1.2 \end{bmatrix} + u_t$$
$$u_t = \varepsilon_t + 0.5\varepsilon_{t-1} - 0.1\varepsilon_{t-2},$$

where $\varepsilon_t$ has a t distribution with the default degrees of freedom and unit variance.

    Mdl = regARIMA('Intercept',0,'Beta',[0.5; -3; 1.2],...
        'MA',{0.5, -0.1},'Variance',1,'Distribution','t')

    Mdl =
      regARIMA with properties:
         Description: "Regression with ARMA(0,2) Error Model (t Distribution)"
        Distribution: Name = "t", DoF = NaN
           Intercept: 0
                Beta: [0.5 -3 1.2]
                   P: 0
                   Q: 2
                  AR: {}
                 SAR: {}
                  MA: {0.5 -0.1} at lags [1 2]
                 SMA: {}
            Variance: 1
The default degrees of freedom is NaN. If you don't know the degrees of freedom, then you can estimate it by passing Mdl and the data to estimate.
Specify a $t_{15}$ distribution.

    Mdl.Distribution = struct('Name','t','DoF',15)

    Mdl =
      regARIMA with properties:
         Description: "Regression with ARMA(0,2) Error Model (t Distribution)"
        Distribution: Name = "t", DoF = 15
           Intercept: 0
                Beta: [0.5 -3 1.2]
                   P: 0
                   Q: 2
                  AR: {}
                 SAR: {}
                  MA: {0.5 -0.1} at lags [1 2]
                 SMA: {}
            Variance: 1
You can simulate and forecast responses by passing Mdl to simulate or forecast because Mdl is completely specified. In applications, such as simulation, the software normalizes the random t innovations. In other words, Variance overrides the theoretical variance of the t random variable (which is DoF/(DoF - 2)), but preserves the kurtosis of the distribution.
See Also
Apps
Econometric Modeler
Objects
regARIMA
Functions
estimate | simulate | forecast

Related Examples
• “Analyze Time Series Data Using Econometric Modeler” on page 4-2
• “Specifying Univariate Lag Operator Polynomials Interactively” on page 4-44
• “Create Regression Models with ARIMA Errors” on page 5-8
• “Specify the Default Regression Model with ARIMA Errors” on page 5-19
• “Create Regression Models with AR Errors” on page 5-26
• “Create Regression Models with ARMA Errors” on page 5-36
• “Create Regression Models with SARIMA Errors” on page 5-49
• “Specify ARIMA Error Model Innovation Distribution” on page 5-59

More About
• “Regression Models with Time Series Errors” on page 5-5
Create Regression Models with ARMA Errors
In this section...
“Default Regression Model with ARMA Errors” on page 5-36
“ARMA Error Model Without an Intercept” on page 5-37
“ARMA Error Model with Nonconsecutive Lags” on page 5-37
“Known Parameter Values for a Regression Model with ARMA Errors” on page 5-38
“Regression Model with ARMA Errors and t Innovations” on page 5-38
“Specify Regression Model with ARMA Errors Using Econometric Modeler App” on page 5-40
Default Regression Model with ARMA Errors
This example shows how to apply the shorthand regARIMA(p,D,q) syntax to specify the regression model with ARMA errors.
Specify the default regression model with ARMA(3,2) errors:

$$y_t = c + X_t\beta + u_t$$
$$u_t = a_1u_{t-1} + a_2u_{t-2} + a_3u_{t-3} + \varepsilon_t + b_1\varepsilon_{t-1} + b_2\varepsilon_{t-2}.$$

    Mdl = regARIMA(3,0,2)

    Mdl =
      regARIMA with properties:
         Description: "ARMA(3,2) Error Model (Gaussian Distribution)"
        Distribution: Name = "Gaussian"
           Intercept: NaN
                Beta: [1×0]
                   P: 3
                   Q: 2
                  AR: {NaN NaN NaN} at lags [1 2 3]
                 SAR: {}
                  MA: {NaN NaN} at lags [1 2]
                 SMA: {}
            Variance: NaN
The software sets each parameter to NaN, and the innovation distribution to Gaussian. The AR coefficients are at lags 1 through 3, and the MA coefficients are at lags 1 and 2.
Pass Mdl into estimate with data to estimate the parameters set to NaN. The regARIMA model sets Beta to an empty array, which the display shows as [1×0]. If you pass a matrix of predictors (Xt) into estimate, then estimate estimates Beta. The estimate function infers the number of regression coefficients in Beta from the number of columns in Xt.
Tasks such as simulation and forecasting using simulate and forecast do not accept models with at least one NaN for a parameter value. Use dot notation to modify parameter values.
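As a minimal sketch of the dot-notation workflow, the following code fills in every NaN of the default ARMA(3,2) error model and then simulates from it. The coefficient values are illustrative assumptions borrowed from the known-parameter examples in this section, not estimates.

```matlab
% Start from the default ARMA(3,2) error model; every parameter is NaN.
Mdl = regARIMA(3,0,2);

% Fill in the parameters with dot notation (illustrative values only).
Mdl.Intercept = 0;
Mdl.AR = {0.7, -0.3, 0.1};
Mdl.MA = {0.5, 0.2};
Mdl.Variance = 1;

% With no NaN values left, simulate accepts the model.
rng(1);                 % for reproducibility
y = simulate(Mdl,100);  % one simulated response path of length 100
```

Because Beta is left empty, the simulated model has no regression component, so no predictor data is supplied.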
ARMA Error Model Without an Intercept
This example shows how to specify a regression model with ARMA errors without a regression intercept.
Specify the default regression model with ARMA(3,2) errors:
$$y_t = X_t\beta + u_t$$
$$u_t = a_1 u_{t-1} + a_2 u_{t-2} + a_3 u_{t-3} + \varepsilon_t + b_1\varepsilon_{t-1} + b_2\varepsilon_{t-2}.$$
Mdl = regARIMA('ARLags',1:3,'MALags',1:2,'Intercept',0)

Mdl =
  regARIMA with properties:

     Description: "ARMA(3,2) Error Model (Gaussian Distribution)"
    Distribution: Name = "Gaussian"
       Intercept: 0
            Beta: [1×0]
               P: 3
               Q: 2
              AR: {NaN NaN NaN} at lags [1 2 3]
             SAR: {}
              MA: {NaN NaN} at lags [1 2]
             SMA: {}
        Variance: NaN
The software sets Intercept to 0, but all other parameters in Mdl are NaN values by default. Since Intercept is not a NaN, it is an equality constraint during estimation. In other words, if you pass Mdl and data into estimate, then estimate sets Intercept to 0 during estimation. You can modify the properties of Mdl using dot notation.
ARMA Error Model with Nonconsecutive Lags
This example shows how to specify a regression model with ARMA errors, where the nonzero ARMA terms are at nonconsecutive lags.
Specify the regression model with ARMA(8,4) errors:
$$y_t = c + X_t\beta + u_t$$
$$u_t = a_1 u_{t-1} + a_4 u_{t-4} + a_8 u_{t-8} + \varepsilon_t + b_1\varepsilon_{t-1} + b_4\varepsilon_{t-4}.$$
Mdl = regARIMA('ARLags',[1,4,8],'MALags',[1,4])

Mdl =
  regARIMA with properties:

     Description: "ARMA(8,4) Error Model (Gaussian Distribution)"
    Distribution: Name = "Gaussian"
       Intercept: NaN
            Beta: [1×0]
               P: 8
               Q: 4
              AR: {NaN NaN NaN} at lags [1 4 8]
             SAR: {}
              MA: {NaN NaN} at lags [1 4]
             SMA: {}
        Variance: NaN
The AR coefficients are at lags 1, 4, and 8, and the MA coefficients are at lags 1 and 4. The software sets the coefficients at the interim lags to 0.
Pass Mdl and data into estimate. The software estimates all parameters that have the value NaN, and holds all interim lag coefficients at 0 during estimation.
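One way to inspect the lag structure is to convert the coefficient cell vectors to numeric vectors. This is a sketch under the assumption that the AR and MA properties store one cell per lag up to the highest lag, with 0 at the interim lags; if your release stores only the specified lags, the second and fourth queries return empty results.

```matlab
Mdl = regARIMA('ARLags',[1,4,8],'MALags',[1,4]);

arCoeffs = cell2mat(Mdl.AR);   % numeric row vector of AR coefficients
maCoeffs = cell2mat(Mdl.MA);   % numeric row vector of MA coefficients

arFree  = find(isnan(arCoeffs));   % AR positions that estimate will fit
arFixed = find(arCoeffs == 0);     % AR positions held at 0, if stored
maFree  = find(isnan(maCoeffs));   % MA positions that estimate will fit
maFixed = find(maCoeffs == 0);     % MA positions held at 0, if stored
```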
Known Parameter Values for a Regression Model with ARMA Errors
This example shows how to specify values for all parameters of a regression model with ARMA errors.
Specify the regression model with ARMA(3,2) errors:
$$y_t = X_t\begin{bmatrix} 2.5 \\ -0.6 \end{bmatrix} + u_t$$
$$u_t = 0.7u_{t-1} - 0.3u_{t-2} + 0.1u_{t-3} + \varepsilon_t + 0.5\varepsilon_{t-1} + 0.2\varepsilon_{t-2},$$
where εt is Gaussian with unit variance.
Mdl = regARIMA('Intercept',0,'Beta',[2.5; -0.6],...
    'AR',{0.7, -0.3, 0.1},'MA',{0.5, 0.2},'Variance',1)

Mdl =
  regARIMA with properties:

     Description: "Regression with ARMA(3,2) Error Model (Gaussian Distribution)"
    Distribution: Name = "Gaussian"
       Intercept: 0
            Beta: [2.5 -0.6]
               P: 3
               Q: 2
              AR: {0.7 -0.3 0.1} at lags [1 2 3]
             SAR: {}
              MA: {0.5 0.2} at lags [1 2]
             SMA: {}
        Variance: 1
The parameters in Mdl do not contain NaN values, and therefore there is no need to estimate Mdl using estimate. However, you can simulate or forecast responses from Mdl using simulate or forecast.
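The following sketch shows one way to simulate and then forecast from the fully specified model. The predictor matrices X and XF are hypothetical random data, introduced only so that the example is self-contained; in practice you supply your own in-sample and future predictor observations.

```matlab
Mdl = regARIMA('Intercept',0,'Beta',[2.5; -0.6],...
    'AR',{0.7, -0.3, 0.1},'MA',{0.5, 0.2},'Variance',1);

rng(1);               % for reproducibility
X  = randn(100,2);    % hypothetical in-sample predictors (two columns, one per Beta)
XF = randn(10,2);     % hypothetical future predictors for the forecast horizon

% Simulate one response path, then forecast 10 periods beyond it.
y = simulate(Mdl,100,'X',X);
[yF,yMSE] = forecast(Mdl,10,'Y0',y,'X0',X,'XF',XF);
```

The number of columns in X and XF must match the number of elements in Beta.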
Regression Model with ARMA Errors and t Innovations
This example shows how to set the innovation distribution of a regression model with ARMA errors to a t distribution.
Specify the regression model with ARMA(3,2) errors:
$$y_t = X_t\begin{bmatrix} 2.5 \\ -0.6 \end{bmatrix} + u_t$$
$$u_t = 0.7u_{t-1} - 0.3u_{t-2} + 0.1u_{t-3} + \varepsilon_t + 0.5\varepsilon_{t-1} + 0.2\varepsilon_{t-2},$$
where εt has a t distribution with the default degrees of freedom and unit variance.
Mdl = regARIMA('Intercept',0,'Beta',[2.5; -0.6],...
    'AR',{0.7, -0.3, 0.1},'MA',{0.5, 0.2},'Variance',1,...
    'Distribution','t')

Mdl =
  regARIMA with properties:

     Description: "Regression with ARMA(3,2) Error Model (t Distribution)"
    Distribution: Name = "t", DoF = NaN
       Intercept: 0
            Beta: [2.5 -0.6]
               P: 3
               Q: 2
              AR: {0.7 -0.3 0.1} at lags [1 2 3]
             SAR: {}
              MA: {0.5 0.2} at lags [1 2]
             SMA: {}
        Variance: 1
The default degrees of freedom is NaN. If you don't know the degrees of freedom, then you can estimate it by passing Mdl and the data to estimate.
Specify a t5 distribution.
Mdl.Distribution = struct('Name','t','DoF',5)

Mdl =
  regARIMA with properties:

     Description: "Regression with ARMA(3,2) Error Model (t Distribution)"
    Distribution: Name = "t", DoF = 5
       Intercept: 0
            Beta: [2.5 -0.6]
               P: 3
               Q: 2
              AR: {0.7 -0.3 0.1} at lags [1 2 3]
             SAR: {}
              MA: {0.5 0.2} at lags [1 2]
             SMA: {}
        Variance: 1
You can simulate or forecast responses from Mdl using simulate or forecast because Mdl is completely specified.
In applications, such as simulation, the software normalizes the random t innovations. In other words, Variance overrides the theoretical variance of the t random variable (which is DoF/(DoF - 2)), but preserves the kurtosis of the distribution.
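The effect of this normalization can be illustrated outside the toolbox. The sketch below is an assumption about the idea, not the toolbox's internal code: it rescales raw t draws (generated with trnd from Statistics and Machine Learning Toolbox) so that their variance matches a target value, and checks that the rescaling leaves the sample kurtosis unchanged.

```matlab
rng(1);          % for reproducibility
nu     = 5;      % degrees of freedom (must exceed 2 for a finite variance)
sigma2 = 1;      % target innovation variance, playing the role of Variance

z = trnd(nu,1e5,1);               % raw t draws with variance nu/(nu - 2)
e = sqrt(sigma2*(nu-2)/nu)*z;     % rescaled draws with variance sigma2

rawVar    = var(z);                      % close to nu/(nu - 2) = 5/3
scaledVar = var(e);                      % close to sigma2 = 1
kurtGap   = kurtosis(z) - kurtosis(e);   % zero up to rounding
```

Multiplying by a constant changes the scale of the draws but not the shape of their distribution, which is why the kurtosis is preserved.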
Specify Regression Model with ARMA Errors Using Econometric Modeler App
In the Econometric Modeler app, you can specify the predictor variables in the regression component, and the error model lag structure and innovation distribution of a regression model with ARMA(p,q) errors, by following these steps. All specified coefficients are unknown but estimable parameters.
1. At the command line, open the Econometric Modeler app.
   econometricModeler
   Alternatively, open the app from the apps gallery (see Econometric Modeler).
2. In the Time Series pane, select the response time series to which the model will be fit.
3. On the Econometric Modeler tab, in the Models section, click the arrow to display the models gallery.
4. In the models gallery, in the Regression Models section, click RegARMA. The RegARMA Model Parameters dialog box appears.
5. Choose the error model lag structure. To specify a regression model with ARMA(p,q) errors that includes all AR lags from 1 through p and all MA lags from 1 through q, use the Lag Order tab. For the flexibility to specify the inclusion of particular lags, use the Lag Vector tab. For more details, see “Specifying Univariate Lag Operator Polynomials Interactively” on page 4-44. Regardless of the tab you use, you can verify the model form by inspecting the equation in the Model Equation section.
6. In the Predictors section, choose at least one predictor variable by selecting the Include? check box for the time series.
For example, suppose you are working with the Data_USEconModel.mat data set and its variables are listed in the Time Series pane.
• To specify a regression model with AR(3) errors for the unemployment rate containing all consecutive AR lags from 1 through its order, Gaussian-distributed innovations, and the predictor variables COE, CPIAUCSL, FEDFUNDS, and GDP:
  1. In the Time Series pane, select the UNRATE time series.
  2. On the Econometric Modeler tab, in the Models section, click the arrow to display the models gallery.
  3. In the models gallery, in the Regression Models section, click RegARMA.
  4. In the regARMA Model Parameters dialog box, on the Lag Order tab, set Autoregressive Order to 3.
  5. In the Predictors section, select the Include? check box for the COE, CPIAUCSL, FEDFUNDS, and GDP time series.
• To specify a regression model with MA(2) errors for the unemployment rate containing all MA lags from 1 through its order, Gaussian-distributed innovations, and the predictor variables COE and CPIAUCSL:
  1. In the Time Series pane, select the UNRATE time series.
  2. On the Econometric Modeler tab, in the Models section, click the arrow to display the models gallery.
  3. In the models gallery, in the Regression Models section, click RegARMA.
  4. In the regARMA Model Parameters dialog box, on the Lag Order tab, set Moving Average Order to 2.
  5. In the Predictors section, select the Include? check box for the COE and CPIAUCSL time series.
• To specify the regression model with ARMA(8,4) errors for the unemployment rate containing nonconsecutive lags
  $$y_t = c + \beta_1 \mathrm{COE}_t + \beta_2 \mathrm{CPIAUCSL}_t + u_t$$
  $$(1 - \alpha_1 L - \alpha_4 L^4 - \alpha_8 L^8)u_t = (1 + b_1 L + b_4 L^4)\varepsilon_t,$$
  where εt is a series of IID Gaussian innovations:
  1. In the Time Series pane, select the UNRATE time series.
  2. On the Econometric Modeler tab, in the Models section, click the arrow to display the models gallery.
  3. In the models gallery, in the Regression Models section, click RegARMA.
  4. In the regARMA Model Parameters dialog box, click the Lag Vector tab:
     a. In the Autoregressive Lags box, type 1 4 8.
     b. In the Moving Average Lags box, type 1 4.
  5. In the Predictors section, select the Include? check box for the COE and CPIAUCSL time series.
• To specify a regression model with ARMA(3,2) errors for the unemployment rate containing all consecutive AR and MA lags through their respective orders, the predictor variables COE and CPIAUCSL, and t-distributed innovations:
  1. In the Time Series pane, select the UNRATE time series.
  2. On the Econometric Modeler tab, in the Models section, click the arrow to display the models gallery.
  3. In the models gallery, in the Regression Models section, click RegARMA.
  4. In the regARMA Model Parameters dialog box, click the Lag Order tab:
     a. Set Autoregressive Order to 3.
     b. Set Moving Average Order to 2.
  5. Click the Innovation Distribution button, then select t.
  6. In the Predictors section, select the Include? check box for the COE and CPIAUCSL time series.
  The degrees of freedom parameter of the t distribution is an unknown but estimable parameter.
After you specify a model, click Estimate to estimate all unknown parameters in the model.
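For comparison, a command-line template with the same structure as the last app example might look like the following sketch. It only creates the model; the predictors enter later, when you pass the response and a predictor matrix to estimate.

```matlab
% ARMA(3,2) error model with all consecutive lags and t-distributed
% innovations. Coefficients and degrees of freedom are NaN (estimable).
Mdl = regARIMA('ARLags',1:3,'MALags',1:2,'Distribution','t');

dist = Mdl.Distribution;   % distribution name "t" with DoF = NaN
```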
See Also
Apps
Econometric Modeler
Objects
regARIMA
Functions
estimate | simulate | forecast

Related Examples
• “Analyze Time Series Data Using Econometric Modeler” on page 4-2
• “Specifying Univariate Lag Operator Polynomials Interactively” on page 4-44
• “Create Regression Models with ARIMA Errors” on page 5-8
• “Specify the Default Regression Model with ARIMA Errors” on page 5-19
• “Create Regression Models with AR Errors” on page 5-26
• “Create Regression Models with MA Errors” on page 5-31
• “Create Regression Models with ARIMA Errors” on page 5-44
• “Create Regression Models with SARIMA Errors” on page 5-49
• “Specify ARIMA Error Model Innovation Distribution” on page 5-59

More About
• “Regression Models with Time Series Errors” on page 5-5
Create Regression Models with ARIMA Errors
In this section...
“Default Regression Model with ARIMA Errors” on page 5-44
“ARIMA Error Model Without an Intercept” on page 5-45
“ARIMA Error Model with Nonconsecutive Lags” on page 5-45
“Known Parameter Values for a Regression Model with ARIMA Errors” on page 5-46
“Regression Model with ARIMA Errors and t Innovations” on page 5-47
Default Regression Model with ARIMA Errors
This example shows how to apply the shorthand regARIMA(p,D,q) syntax to specify the regression model with ARIMA errors.
Specify the default regression model with ARIMA(3,1,2) errors:
$$y_t = c + X_t\beta + u_t$$
$$(1 - a_1 L - a_2 L^2 - a_3 L^3)(1 - L)u_t = (1 + b_1 L + b_2 L^2)\varepsilon_t.$$
Mdl = regARIMA(3,1,2)

Mdl =
  regARIMA with properties:

     Description: "ARIMA(3,1,2) Error Model (Gaussian Distribution)"
    Distribution: Name = "Gaussian"
       Intercept: NaN
            Beta: [1×0]
               P: 4
               D: 1
               Q: 2
              AR: {NaN NaN NaN} at lags [1 2 3]
             SAR: {}
              MA: {NaN NaN} at lags [1 2]
             SMA: {}
        Variance: NaN
The software sets each parameter to NaN, and the innovation distribution to Gaussian. The AR coefficients are at lags 1 through 3, and the MA coefficients are at lags 1 and 2. The property P = p + D = 3 + 1 = 4. Therefore, the software requires at least four presample values to initialize the time series.
Pass Mdl into estimate with data to estimate the parameters set to NaN. The regARIMA model sets Beta to an empty array, which the display shows as [1×0]. If you pass a matrix of predictors (Xt) into estimate, then estimate estimates Beta. The estimate function infers the number of regression coefficients in Beta from the number of columns in Xt.
Tasks such as simulation and forecasting using simulate and forecast do not accept models with at least one NaN for a parameter value. Use dot notation to modify parameter values.
Be aware that the regression model intercept (Intercept) is not identifiable in regression models with ARIMA errors. If you want to estimate Mdl, then you must set Intercept to a value using, for example, dot notation. Otherwise, estimate might return a spurious estimate of Intercept.
ARIMA Error Model Without an Intercept
This example shows how to specify a regression model with ARIMA errors without a regression intercept.
Specify the default regression model with ARIMA(3,1,2) errors:
$$y_t = X_t\beta + u_t$$
$$(1 - a_1 L - a_2 L^2 - a_3 L^3)(1 - L)u_t = (1 + b_1 L + b_2 L^2)\varepsilon_t.$$
Mdl = regARIMA('ARLags',1:3,'MALags',1:2,'D',1,'Intercept',0)

Mdl =
  regARIMA with properties:

     Description: "ARIMA(3,1,2) Error Model (Gaussian Distribution)"
    Distribution: Name = "Gaussian"
       Intercept: 0
            Beta: [1×0]
               P: 4
               D: 1
               Q: 2
              AR: {NaN NaN NaN} at lags [1 2 3]
             SAR: {}
              MA: {NaN NaN} at lags [1 2]
             SMA: {}
        Variance: NaN
The software sets Intercept to 0, but all other parameters in Mdl are NaN values by default. Since Intercept is not a NaN, it is an equality constraint during estimation. In other words, if you pass Mdl and data into estimate, then estimate sets Intercept to 0 during estimation.
In general, if you want to use estimate to estimate a regression model with ARIMA errors where D > 0 or s > 0, then you must set Intercept to a value before estimation.
You can modify the properties of Mdl using dot notation.
ARIMA Error Model with Nonconsecutive Lags
This example shows how to specify a regression model with ARIMA errors, where the nonzero AR and MA terms are at nonconsecutive lags.
Specify the regression model with ARIMA(8,1,4) errors:
$$y_t = X_t\beta + u_t$$
$$(1 - a_1 L - a_4 L^4 - a_8 L^8)(1 - L)u_t = (1 + b_1 L + b_4 L^4)\varepsilon_t.$$
Mdl = regARIMA('ARLags',[1,4,8],'D',1,'MALags',[1,4],...
    'Intercept',0)

Mdl =
  regARIMA with properties:

     Description: "ARIMA(8,1,4) Error Model (Gaussian Distribution)"
    Distribution: Name = "Gaussian"
       Intercept: 0
            Beta: [1×0]
               P: 9
               D: 1
               Q: 4
              AR: {NaN NaN NaN} at lags [1 4 8]
             SAR: {}
              MA: {NaN NaN} at lags [1 4]
             SMA: {}
        Variance: NaN
The AR coefficients are at lags 1, 4, and 8, and the MA coefficients are at lags 1 and 4. The software sets the coefficients at the interim lags to 0.
Pass Mdl and data into estimate. The software estimates all parameters that have the value NaN, and holds all interim lag coefficients at 0 during estimation.
Known Parameter Values for a Regression Model with ARIMA Errors
This example shows how to specify values for all parameters of a regression model with ARIMA errors.
Specify the regression model with ARIMA(3,1,2) errors:
$$y_t = X_t\begin{bmatrix} 2.5 \\ -0.6 \end{bmatrix} + u_t$$
$$(1 - 0.7L + 0.3L^2 - 0.1L^3)(1 - L)u_t = (1 + 0.5L + 0.2L^2)\varepsilon_t,$$
where εt is Gaussian with unit variance.
Mdl = regARIMA('Intercept',0,'Beta',[2.5; -0.6],...
    'AR',{0.7, -0.3, 0.1},'MA',{0.5, 0.2},...
    'Variance',1,'D',1)

Mdl =
  regARIMA with properties:

     Description: "Regression with ARIMA(3,1,2) Error Model (Gaussian Distribution)"
    Distribution: Name = "Gaussian"
       Intercept: 0
            Beta: [2.5 -0.6]
               P: 4
               D: 1
               Q: 2
              AR: {0.7 -0.3 0.1} at lags [1 2 3]
             SAR: {}
              MA: {0.5 0.2} at lags [1 2]
             SMA: {}
        Variance: 1
The parameters in Mdl do not contain NaN values, and therefore there is no need to estimate it. However, you can simulate or forecast responses by passing Mdl to simulate or forecast.
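To see where P = 4 comes from, you can multiply the nonseasonal AR polynomial by the differencing polynomial. The sketch below uses conv from base MATLAB on the coefficient vectors (ordered from L^0 upward) and compares the resulting degree with the P property; the variable names are illustrative.

```matlab
arPoly   = [1 -0.7 0.3 -0.1];        % 1 - 0.7L + 0.3L^2 - 0.1L^3
diffPoly = [1 -1];                   % 1 - L
compound = conv(arPoly,diffPoly);    % coefficients of L^0 through L^4
degreeP  = numel(compound) - 1;      % degree of the compound AR polynomial

Mdl = regARIMA('Intercept',0,'Beta',[2.5; -0.6],...
    'AR',{0.7, -0.3, 0.1},'MA',{0.5, 0.2},...
    'Variance',1,'D',1);
sameDegree = isequal(degreeP,Mdl.P); % compare with the P property
```

The product is 1 - 1.7L + L^2 - 0.4L^3 + 0.1L^4, a degree 4 polynomial, which matches the number of presample values that the model requires.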
Regression Model with ARIMA Errors and t Innovations
This example shows how to set the innovation distribution of a regression model with ARIMA errors to a t distribution.
Specify the regression model with ARIMA(3,1,2) errors:
$$y_t = X_t\begin{bmatrix} 2.5 \\ -0.6 \end{bmatrix} + u_t$$
$$(1 - 0.7L + 0.3L^2 - 0.1L^3)(1 - L)u_t = (1 + 0.5L + 0.2L^2)\varepsilon_t,$$
where εt has a t distribution with the default degrees of freedom and unit variance.
Mdl = regARIMA('Intercept',0,'Beta',[2.5; -0.6],...
    'AR',{0.7, -0.3, 0.1},'MA',{0.5, 0.2},'Variance',1,...
    'Distribution','t','D',1)

Mdl =
  regARIMA with properties:

     Description: "Regression with ARIMA(3,1,2) Error Model (t Distribution)"
    Distribution: Name = "t", DoF = NaN
       Intercept: 0
            Beta: [2.5 -0.6]
               P: 4
               D: 1
               Q: 2
              AR: {0.7 -0.3 0.1} at lags [1 2 3]
             SAR: {}
              MA: {0.5 0.2} at lags [1 2]
             SMA: {}
        Variance: 1
The default degrees of freedom is NaN. If you don't know the degrees of freedom, then you can estimate it by passing Mdl and the data to estimate.
Specify a t10 distribution.
Mdl.Distribution = struct('Name','t','DoF',10)

Mdl =
  regARIMA with properties:

     Description: "Regression with ARIMA(3,1,2) Error Model (t Distribution)"
    Distribution: Name = "t", DoF = 10
       Intercept: 0
            Beta: [2.5 -0.6]
               P: 4
               D: 1
               Q: 2
              AR: {0.7 -0.3 0.1} at lags [1 2 3]
             SAR: {}
              MA: {0.5 0.2} at lags [1 2]
             SMA: {}
        Variance: 1
You can simulate or forecast responses by passing Mdl to simulate or forecast because Mdl is completely specified. In applications, such as simulation, the software normalizes the random t innovations. In other words, Variance overrides the theoretical variance of the t random variable (which is DoF/(DoF - 2)), but preserves the kurtosis of the distribution.
See Also
regARIMA | estimate | simulate | forecast

Related Examples
• “Create Regression Models with ARIMA Errors” on page 5-8
• “Specify the Default Regression Model with ARIMA Errors” on page 5-19
• “Create Regression Models with AR Errors” on page 5-26
• “Create Regression Models with MA Errors” on page 5-31
• “Create Regression Models with ARMA Errors” on page 5-36
• “Create Regression Models with SARIMA Errors” on page 5-49
• “Specify ARIMA Error Model Innovation Distribution” on page 5-59

More About
• “Regression Models with Time Series Errors” on page 5-5
Create Regression Models with SARIMA Errors
In this section...
“SARMA Error Model Without an Intercept” on page 5-49
“Known Parameter Values for a Regression Model with SARIMA Errors” on page 5-50
“Regression Model with SARIMA Errors and t Innovations” on page 5-50
SARMA Error Model Without an Intercept
This example shows how to specify a regression model with SARMA errors without a regression intercept.
Specify the default regression model with SARMA(1, 1) × (2, 1, 1)_4 errors:
$$y_t = X_t\beta + u_t$$
$$(1 - a_1 L)(1 - A_4 L^4 - A_8 L^8)(1 - L^4)u_t = (1 + b_1 L)(1 + B_4 L^4)\varepsilon_t.$$
Mdl = regARIMA('ARLags',1,'SARLags',[4, 8],...
    'Seasonality',4,'MALags',1,'SMALags',4,'Intercept',0)

Mdl =
  regARIMA with properties:

     Description: "ARMA(1,1) Error Model Seasonally Integrated with Seasonal AR(8) and MA(4) (Gau
    Distribution: Name = "Gaussian"
       Intercept: 0
            Beta: [1×0]
               P: 13
               Q: 5
              AR: {NaN} at lag [1]
             SAR: {NaN NaN} at lags [4 8]
              MA: {NaN} at lag [1]
             SMA: {NaN} at lag [4]
     Seasonality: 4
        Variance: NaN
The name-value pair argument:
• 'ARLags',1 specifies which lags have nonzero coefficients in the nonseasonal autoregressive polynomial, so $a(L) = (1 - a_1 L)$.
• 'SARLags',[4 8] specifies which lags have nonzero coefficients in the seasonal autoregressive polynomial, so $A(L) = (1 - A_4 L^4 - A_8 L^8)$.
• 'MALags',1 specifies which lags have nonzero coefficients in the nonseasonal moving average polynomial, so $b(L) = (1 + b_1 L)$.
• 'SMALags',4 specifies which lags have nonzero coefficients in the seasonal moving average polynomial, so $B(L) = (1 + B_4 L^4)$.
• 'Seasonality',4 specifies the degree of seasonal integration and corresponds to $(1 - L^4)$.
The software sets Intercept to 0, but all other parameters in Mdl are NaN values by default.
Property P = p + D + ps + s = 1 + 0 + 8 + 4 = 13, and property Q = q + qs = 1 + 4 = 5. Therefore, the software requires at least 13 presample observations to initialize Mdl.
Since Intercept is not a NaN, it is an equality constraint during estimation. In other words, if you pass Mdl and data into estimate, then estimate sets Intercept to 0 during estimation.
You can modify the properties of Mdl using dot notation.
Be aware that the regression model intercept (Intercept) is not identifiable in regression models with ARIMA errors. If you want to estimate Mdl, then you must set Intercept to a value using, for example, dot notation. Otherwise, estimate might return a spurious estimate of Intercept.
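The degree arithmetic can be cross-checked by multiplying placeholder polynomials with the same lag structure. In this sketch, every unknown coefficient is replaced by the hypothetical value 1, because only the polynomial degrees matter here, not the coefficient values.

```matlab
Mdl = regARIMA('ARLags',1,'SARLags',[4, 8],...
    'Seasonality',4,'MALags',1,'SMALags',4,'Intercept',0);

% Placeholder polynomials (coefficients of L^0, L^1, ...); the value 1
% stands in for each unknown coefficient.
a    = [1 -1];                  % 1 - a1*L
A    = [1 0 0 0 -1 0 0 0 -1];   % 1 - A4*L^4 - A8*L^8
seas = [1 0 0 0 -1];            % 1 - L^4
b    = [1 1];                   % 1 + b1*L
B    = [1 0 0 0 1];             % 1 + B4*L^4

degAR = numel(conv(conv(a,A),seas)) - 1;   % compound AR degree
degMA = numel(conv(b,B)) - 1;              % compound MA degree

arCheck = [degAR, Mdl.P];   % compare with the P property
maCheck = [degMA, Mdl.Q];   % compare with the Q property
```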
Known Parameter Values for a Regression Model with SARIMA Errors
This example shows how to specify values for all parameters of a regression model with SARIMA errors.
Specify the regression model with SARIMA(1, 1, 1) × (1, 1, 0)_12 errors:
$$y_t = X_t\beta + u_t$$
$$(1 - 0.2L)(1 - L)(1 - 0.25L^{12} - 0.1L^{24})(1 - L^{12})u_t = (1 + 0.15L)\varepsilon_t,$$
where εt is Gaussian with unit variance.
Mdl = regARIMA('AR',0.2,'SAR',{0.25, 0.1},'SARLags',[12 24],...
    'D',1,'Seasonality',12,'MA',0.15,'Intercept',0,'Variance',1)

Mdl =
  regARIMA with properties:

     Description: "ARIMA(1,1,1) Error Model Seasonally Integrated with Seasonal AR(24) (Gaussian
    Distribution: Name = "Gaussian"
       Intercept: 0
            Beta: [1×0]
               P: 38
               D: 1
               Q: 1
              AR: {0.2} at lag [1]
             SAR: {0.25 0.1} at lags [12 24]
              MA: {0.15} at lag [1]
             SMA: {}
     Seasonality: 12
        Variance: 1
The parameters in Mdl do not contain NaN values, and therefore there is no need to estimate Mdl. However, you can simulate or forecast responses by passing Mdl to simulate or forecast.
Regression Model with SARIMA Errors and t Innovations
This example shows how to set the innovation distribution of a regression model with SARIMA errors to a t distribution.
Specify the regression model with SARIMA(1, 1, 1) × (1, 1, 0)_12 errors:
$$y_t = X_t\beta + u_t$$
$$(1 - 0.2L)(1 - L)(1 - 0.25L^{12} - 0.1L^{24})(1 - L^{12})u_t = (1 + 0.15L)\varepsilon_t,$$
where εt has a t distribution with the default degrees of freedom and unit variance.
Mdl = regARIMA('AR',0.2,'SAR',{0.25, 0.1},'SARLags',[12 24],...
    'D',1,'Seasonality',12,'MA',0.15,'Intercept',0,...
    'Variance',1,'Distribution','t')

Mdl =
  regARIMA with properties:

     Description: "ARIMA(1,1,1) Error Model Seasonally Integrated with Seasonal AR(24) (t Distrib
    Distribution: Name = "t", DoF = NaN
       Intercept: 0
            Beta: [1×0]
               P: 38
               D: 1
               Q: 1
              AR: {0.2} at lag [1]
             SAR: {0.25 0.1} at lags [12 24]
              MA: {0.15} at lag [1]
             SMA: {}
     Seasonality: 12
        Variance: 1
The default degrees of freedom is NaN. If you don't know the degrees of freedom, then you can estimate it by passing Mdl and the data to estimate.
Specify a t10 distribution.
Mdl.Distribution = struct('Name','t','DoF',10)

Mdl =
  regARIMA with properties:

     Description: "ARIMA(1,1,1) Error Model Seasonally Integrated with Seasonal AR(24) (t Distrib
    Distribution: Name = "t", DoF = 10
       Intercept: 0
            Beta: [1×0]
               P: 38
               D: 1
               Q: 1
              AR: {0.2} at lag [1]
             SAR: {0.25 0.1} at lags [12 24]
              MA: {0.15} at lag [1]
             SMA: {}
     Seasonality: 12
        Variance: 1
You can simulate or forecast responses by passing Mdl to simulate or forecast because Mdl is completely specified.
In applications, such as simulation, the software normalizes the random t innovations. In other words, Variance overrides the theoretical variance of the t random variable (which is DoF/(DoF - 2)), but preserves the kurtosis of the distribution.
See Also
regARIMA | estimate | simulate | forecast

Related Examples
• “Create Regression Models with ARIMA Errors” on page 5-8
• “Specify the Default Regression Model with ARIMA Errors” on page 5-19
• “Create Regression Models with AR Errors” on page 5-26
• “Create Regression Models with MA Errors” on page 5-31
• “Create Regression Models with ARMA Errors” on page 5-36
• “Specify Regression Model with SARIMA Errors” on page 5-53
• “Specify ARIMA Error Model Innovation Distribution” on page 5-59

More About
• “Regression Models with Time Series Errors” on page 5-5
Specify Regression Model with SARIMA Errors
This example shows how to specify a regression model with multiplicative seasonal ARIMA errors.
Load the Airline data set from the MATLAB® root folder, and load the recession data set. Plot the monthly passenger totals and log-totals.
load Data_Airline.mat
load Data_Recessions
y = DataTimeTable.PSSG;
logY = log(y);
figure
tiledlayout(2,1)
nexttile
plot(DataTimeTable.Time,y)
title('{\bf Monthly Passenger Totals (Jan1949 - Dec1960)}')
nexttile
plot(DataTimeTable.Time,log(y))
title('{\bf Monthly Passenger Log-Totals (Jan1949 - Dec1960)}')
The log transformation seems to linearize the time series.
Construct a predictor that indicates whether the country was in a recession during the sampled period. A value of 0 means the country was not in a recession, and a value of 1 means that it was in a recession.
X = zeros(numel(dates),1); % Preallocation
for j = 1:size(Recessions,1)
    X(dates >= Recessions(j,1) & dates

0 or s > 0, and you want to estimate the intercept, c, then c is not identifiable. You can show that this is true.
• Consider “Equation 5-8”. Solve for ut in the second equation and substitute it into the first.
Intercept Identifiability in Regression Models with ARIMA Errors
$$y_t = c + X_t\beta + \mathrm{H}^{-1}(L)\mathrm{N}(L)\varepsilon_t,$$
where
  • $\mathrm{H}(L) = a(L)(1 - L)^D A(L)(1 - L^s)$.
  • $\mathrm{N}(L) = b(L)B(L)$.
• The likelihood function is based on the distribution of εt. Solve for εt.
$$\varepsilon_t = \mathrm{N}^{-1}(L)\mathrm{H}(L)y_t - \mathrm{N}^{-1}(L)\mathrm{H}(L)c - \mathrm{N}^{-1}(L)\mathrm{H}(L)X_t\beta.$$
• Note that $L^j c = c$. The constant term contributes to the likelihood as follows.
$$\mathrm{N}^{-1}(L)\mathrm{H}(L)c = \mathrm{N}^{-1}(L)a(L)A(L)(1 - L)^D(1 - L^s)c = \mathrm{N}^{-1}(L)a(L)A(L)(1 - L)^D(c - c) = 0$$
or
$$\mathrm{N}^{-1}(L)\mathrm{H}(L)c = \mathrm{N}^{-1}(L)a(L)A(L)(1 - L^s)(1 - L)^D c = \mathrm{N}^{-1}(L)a(L)A(L)(1 - L^s)(1 - L)^{D-1}(1 - L)c = \mathrm{N}^{-1}(L)a(L)A(L)(1 - L^s)(1 - L)^{D-1}(c - c) = 0.$$
Therefore, when the ARIMA error model is integrated, the likelihood objective function based on the distribution of εt is invariant to the value of c.
In general, the effective constant in the equivalent ARIMAX representation of a regression model with ARIMA errors is a function of the compound autoregressive coefficients and the original intercept c, and incorporates a nonlinear constraint. This constraint is seamlessly incorporated for applications such as Monte Carlo simulation of integrated models with nonzero intercepts. However, for estimation, the ARIMAX model is unable to identify the constant in the presence of an integrated polynomial, and this results in spurious or unusual parameter estimates.
You should exclude an intercept from integrated models in most applications.
Intercept Identifiability Illustration
As an illustration, consider the regression model with ARIMA(2,1,1) errors without predictors
$$y_t = 0.5 + u_t$$
$$(1 - 0.8L + 0.4L^2)(1 - L)u_t = (1 + 0.3L)\varepsilon_t, \qquad \text{(5-9)}$$
or
$$y_t = 0.5 + u_t$$
$$(1 - 1.8L + 1.2L^2 - 0.4L^3)u_t = (1 + 0.3L)\varepsilon_t. \qquad \text{(5-10)}$$
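The expansion from Equation 5-9 to Equation 5-10 can be checked numerically with conv from base MATLAB. The sketch below also sums the compound coefficients: because L^j c = c, applying a lag operator polynomial to a constant multiplies the constant by the sum of the polynomial coefficients. The variable names are illustrative.

```matlab
arPoly   = [1 -0.8 0.4];             % 1 - 0.8L + 0.4L^2
diffPoly = [1 -1];                   % 1 - L
compound = conv(arPoly,diffPoly);    % coefficients of 1 - 1.8L + 1.2L^2 - 0.4L^3

% Value of the compound polynomial at L = 1; this is the factor that
% multiplies the intercept 0.5.
constantMultiplier = sum(compound);  % zero up to rounding
```

A multiplier of zero means the intercept drops out of the equivalent ARIMA representation, whatever its value.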
You can rewrite “Equation 5-10” using substitution and some manipulation
$$y_t = (1 - 1.8 + 1.2 - 0.4)0.5 + 1.8y_{t-1} - 1.2y_{t-2} + 0.4y_{t-3} + \varepsilon_t + 0.3\varepsilon_{t-1}.$$
Note that
$$(1 - 1.8 + 1.2 - 0.4)0.5 = 0(0.5) = 0.$$
Therefore, the regression model with ARIMA(2,1,1) errors in “Equation 5-10” has an ARIMA(2,1,1) model representation
$$y_t = 1.8y_{t-1} - 1.2y_{t-2} + 0.4y_{t-3} + \varepsilon_t + 0.3\varepsilon_{t-1}.$$
You can see that the constant is not present in the model (which implies its value is 0), even though the value of the regression model with ARIMA errors intercept is 0.5.
You can also simulate this behavior. Start by specifying the regression model with ARIMA(2,1,1) errors in “Equation 5-10”.
Mdl0 = regARIMA('D',1,'AR',{0.8 -0.4},'MA',0.3,...
    'Intercept',0.5,'Variance', 0.2);
Simulate 1000 observations.
rng(1);
T = 1000;
y = simulate(Mdl0, T);

Fit Mdl to the data.
Mdl = regARIMA('ARLags',1:2,'MALags',1,'D',1); % "Empty" model to pass into estimate
[EstMdl,EstParamCov] = estimate(Mdl,y,'Display','params');
Warning: When ARIMA error model is integrated, the intercept is unidentifiable and cannot be esti

ARIMA(2,1,1) Error Model (Gaussian Distribution):

                  Value      StandardError    TStatistic      PValue
                 ________    _____________    __________    ___________
    Intercept         NaN             NaN           NaN             NaN
    AR{1}         0.89647        0.048507        18.481      2.9207e-76
    AR{2}        -0.45102        0.038916        -11.59      4.6573e-31
    MA{1}         0.18804        0.054505          3.45      0.00056069
    Variance      0.19789       0.0083512        23.696     3.9373e-124
estimate displays a warning to inform you that the intercept is not identifiable, and sets its estimate, standard error, and t-statistic to NaN.
Plot the profile likelihood for the intercept.
c = linspace(Mdl0.Intercept - 50,...
    Mdl0.Intercept + 50,100); % Grid of intercepts
logL = nan(numel(c),1); % For preallocation
for i = 1:numel(logL)
    EstMdl.Intercept = c(i);
    [~,~,~,logL(i)] = infer(EstMdl,y);
end
figure
plot(c,logL)
title('Profile Log-Likelihood with Respect to the Intercept')
xlabel('Intercept')
ylabel('Loglikelihood')
The loglikelihood does not change over the grid of intercept values. The slight oscillation is a result of the numerical routine used by infer.
See Also

Related Examples
• “Estimate Regression Model with ARIMA Errors” on page 5-85
Alternative ARIMA Model Representations
In this section...
“Mathematical Development of regARIMA to ARIMAX Model Conversion” on page 5-110
“Show Conversion in MATLAB®” on page 5-112
Mathematical Development of regARIMA to ARIMAX Model Conversion
ARIMAX models and regression models with ARIMA errors are closely related, and the choice of which to use is generally dictated by your goals for the analysis. If your objective is to fit a parsimonious model to data and forecast responses, then there is very little difference between the two models.
If you are more interested in preserving the usual interpretation of a regression coefficient as a measure of sensitivity, i.e., the effect of a unit change in a predictor variable on the response, then use a regression model with ARIMA errors. Regression coefficients in ARIMAX models do not possess that interpretation because of the dynamic dependence on the response [1].
Suppose that you have the parameter estimates from a regression model with ARIMA errors, and you want to see how the model structure compares to an ARIMAX model. Or, suppose you want some insight into the underlying relationship between the two models.
The ARIMAX model is (t = 1,...,T):
$$\mathrm{H}(L)y_t = c + X_t\beta + \mathrm{N}(L)\varepsilon_t, \qquad \text{(5-11)}$$
where
• yt is the univariate response series.
• Xt is row t of X, which is the matrix of concatenated predictor series. That is, Xt is observation t of each predictor series.
• β is the regression coefficient.
• c is the regression model intercept.
• $\mathrm{H}(L) = \phi(L)(1 - L)^D\Phi(L)(1 - L^s) = 1 - \eta_1 L - \eta_2 L^2 - ... - \eta_P L^P$, which is the degree P lag operator polynomial that captures the combined effect of the seasonal and nonseasonal autoregressive polynomials, and the seasonal and nonseasonal integration polynomials. For more details on notation, see “Multiplicative ARIMA Model” on page 7-49.
• $\mathrm{N}(L) = \theta(L)\Theta(L) = 1 + \nu_1 L + \nu_2 L^2 + ... + \nu_Q L^Q$, which is the degree Q lag operator polynomial that captures the combined effect of the seasonal and nonseasonal moving average polynomials.
• εt is a white noise innovation process.
The regression model with ARIMA errors is (t = 1,...,T)
$$y_t = c + X_t\beta + u_t$$
$$A(L)u_t = B(L)\varepsilon_t, \qquad \text{(5-12)}$$
where
• ut is the unconditional disturbances process.
• $A(L) = \phi(L)(1 - L)^D\Phi(L)(1 - L^s) = 1 - a_1 L - a_2 L^2 - ... - a_P L^P$, which is the degree P lag operator polynomial that captures the combined effect of the seasonal and nonseasonal autoregressive polynomials, and the seasonal and nonseasonal integration polynomials.
• $B(L) = \theta(L)\Theta(L) = 1 + b_1 L + b_2 L^2 + ... + b_Q L^Q$, which is the degree Q lag operator polynomial that captures the combined effect of the seasonal and nonseasonal moving average polynomials.
The values of the variables defined in “Equation 5-12” are not necessarily equivalent to the values of the variables in “Equation 5-11”, even though the notation might be similar.
Consider “Equation 5-12”, the regression model with ARIMA errors. Use the following operations to convert the regression model with ARIMA errors to its corresponding ARIMAX model.
1. Solve for ut.
   $$y_t = c + X_t\beta + u_t$$
   $$u_t = \frac{B(L)}{A(L)}\varepsilon_t.$$
2. Substitute ut into the regression equation.
   $$y_t = c + X_t\beta + \frac{B(L)}{A(L)}\varepsilon_t$$
   $$A(L)y_t = A(L)c + A(L)X_t\beta + B(L)\varepsilon_t.$$
3. Solve for yt.
   $$y_t = A(L)c + A(L)X_t\beta + \sum_{k=1}^{P} a_k y_{t-k} + B(L)\varepsilon_t$$
   $$\phantom{y_t} = A(L)c + Z_t\Gamma + \sum_{k=1}^{P} a_k y_{t-k} + B(L)\varepsilon_t. \qquad \text{(5-13)}$$
In “Equation 5-13”,
• $A(L)c = (1 - a_1 - a_2 - \dots - a_P)c$. That is, the constant in the ARIMAX model is the intercept in the regression model with ARIMA errors with a nonlinear constraint. Though applications, such as simulate, handle this constraint, estimate cannot incorporate such a constraint. In the latter case, the models are equivalent when you fix the intercept and constant to 0.
• In the term $A(L)X_t\beta$, the lag operator polynomial A(L) filters the T-by-1 vector $X_t\beta$, which is the linear combination of the predictors weighted by the regression coefficients. This filtering process requires P presample observations of the predictor series.
• arima constructs the matrix $Z_t$ as follows:
  • Each column of $Z_t$ corresponds to a term in A(L).
  • The first column of $Z_t$ is the vector $X_t\beta$.
  • The second column of $Z_t$ is a sequence of $d_2$ NaNs ($d_2$ is the degree of the second term in A(L)), followed by the product $L^{d_2}X_t\beta$. That is, the software attaches $d_2$ NaNs at the beginning of the T-by-1 column, attaches $X_t\beta$ after the NaNs, but truncates the end of that product by $d_2$ observations.
  • The jth column of $Z_t$ is a sequence of $d_j$ NaNs ($d_j$ is the degree of the jth term in A(L)), followed by the product $L^{d_j}X_t\beta$. That is, the software attaches $d_j$ NaNs at the beginning of the T-by-1 column, attaches $X_t\beta$ after the NaNs, but truncates the end of that product by $d_j$ observations.
• Γ = [1 –a1 –a2 ... –aP]'.

The arima converter removes all zero-valued autoregressive coefficients of the difference equation. Subsequently, the arima converter does not associate zero-valued autoregressive coefficients with columns in $Z_t$, nor does it include corresponding, zero-valued coefficients in Γ.

4. Rewrite “Equation 5-13”, writing out the lag operator polynomials A(L) and B(L) term by term:

$$y_t = \left(1 - \sum_{k=1}^{P} a_k\right)c + X_t\beta - \sum_{k=1}^{P} a_k X_{t-k}\beta + \sum_{k=1}^{P} a_k y_{t-k} + \varepsilon_t + \sum_{k=1}^{Q} b_k\varepsilon_{t-k}.$$
For example, consider the following regression model whose errors are ARMA(2,1):

$$\begin{aligned} y_t &= 0.2 + 0.5X_t + u_t \\ (1 - 0.8L + 0.4L^2)u_t &= (1 + 0.3L)\varepsilon_t. \end{aligned} \tag{5-14}$$

The equivalent ARMAX model is:

$$\begin{aligned} y_t &= 0.12 + (0.5 - 0.4L + 0.2L^2)X_t + 0.8y_{t-1} - 0.4y_{t-2} + (1 + 0.3L)\varepsilon_t \\ &= 0.12 + Z_t\Gamma + 0.8y_{t-1} - 0.4y_{t-2} + (1 + 0.3L)\varepsilon_t, \end{aligned}$$

or

$$(1 - 0.8L + 0.4L^2)y_t = 0.12 + Z_t\Gamma + (1 + 0.3L)\varepsilon_t,$$

where Γ = [1 –0.8 0.4]' and

$$Z_t = 0.5\begin{bmatrix} x_1 & \mathrm{NaN} & \mathrm{NaN} \\ x_2 & x_1 & \mathrm{NaN} \\ x_3 & x_2 & x_1 \\ \vdots & \vdots & \vdots \\ x_T & x_{T-1} & x_{T-2} \end{bmatrix}.$$

This model is not integrated because all of the eigenvalues associated with the AR polynomial are within the unit circle, but the predictors might affect the otherwise stable process. Also, you need presample predictor data going back at least 2 periods to, for example, fit the model to data.
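The arithmetic behind this example can be checked with base MATLAB functions only. The following is a minimal sketch, not the arima converter itself: the short predictor series x is a hypothetical illustration, and the loop only mimics the column layout of Zt described above for the case where no autoregressive coefficient is zero.

```matlab
% Regression model with ARMA(2,1) errors from Equation 5-14.
c    = 0.2;             % regression intercept
beta = 0.5;             % regression coefficient
A    = [1 -0.8 0.4];    % A(L) = 1 - 0.8L + 0.4L^2, increasing powers of L

% ARIMAX constant: A(L) applied to a constant sums the coefficients of A(L).
constant = sum(A)*c;

% Coefficients that A(L)*beta places on X(t), X(t-1), and X(t-2).
xCoeffs = conv(A,beta);

% Eigenvalues associated with the AR polynomial (roots of z^2 - 0.8z + 0.4).
maxModulus = max(abs(roots(A)));

% Column layout of Z for a hypothetical predictor series x (illustration only).
x = (1:5)';
T = numel(x);
Z = NaN(T,numel(A));
for j = 1:numel(A)
    d = j - 1;                      % degree of the jth term of A(L)
    Z(d+1:T,j) = beta*x(1:T-d);     % d leading NaNs, then truncated beta*x
end
Gamma = A';                         % [1 -0.8 0.4]'
```

Under these assumptions, constant reproduces the ARMAX constant 0.12, xCoeffs reproduces the filtered predictor coefficients 0.5, –0.4, and 0.2, and maxModulus is less than 1, which is consistent with the statement that the model is not integrated. Rows of Z that contain NaNs correspond to periods that require presample predictor data.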
Show Conversion in MATLAB®

Illustrate the conversion in MATLAB® by model simulation and estimation.

Specify a regression model with ARIMA errors that has the intercept and ARMA(2,1) error model of “Equation 5-14”. Unlike “Equation 5-14”, this example uses two predictors, with regression coefficients 0.3 and –0.2.

MdlregARIMA0 = regARIMA('Intercept',0.2,'AR',{0.8 -0.4}, ...
    'MA',0.3,'Beta',[0.3 -0.2],'Variance',0.2);
Generate presample observations and predictor data.

rng(1); % For reproducibility
T = 100;
maxPQ = max(MdlregARIMA0.P,MdlregARIMA0.Q);
numObs = T + maxPQ; % Adjust number of observations to account for presample
XregARIMA = randn(numObs,2); % Simulate predictor data
u0 = randn(maxPQ,1); % Presample unconditional disturbances u(t)
e0 = randn(maxPQ,1); % Presample innovations e(t)

Simulate data from the regression model with ARIMA errors MdlregARIMA0.

rng(100) % For consistent seed with later call
[y1,e1,u1] = simulate(MdlregARIMA0,T,'U0',u0, ...
    'E0',e0,'X',XregARIMA);

Convert the regression model with ARIMA errors to an ARIMAX model.

[MdlARIMAX0,XARIMAX] = arima(MdlregARIMA0,'X',XregARIMA);
MdlARIMAX0

MdlARIMAX0 =
  arima with properties:

     Description: "ARIMAX(2,0,1) Model (Gaussian Distribution)"
    Distribution: Name = "Gaussian"
               P: 2
               D: 0
               Q: 1
        Constant: 0.12
              AR: {0.8 -0.4} at lags [1 2]
             SAR: {}
              MA: {0.3} at lag [1]
             SMA: {}
     Seasonality: 0
            Beta: [1 -0.8 0.4]
        Variance: 0.2

Generate presample responses for the ARIMAX model to ensure consistency with the regression model with ARIMA errors. Simulate data from the ARIMAX model.

y0 = MdlregARIMA0.Intercept + XregARIMA(1:maxPQ,:)*MdlregARIMA0.Beta' + u0;
rng(100) % For consistent seed with earlier call
y2 = simulate(MdlARIMAX0,T,'Y0',y0,'E0',e0,'X',XARIMAX);

figure
plot(y1,'LineWidth',3)
hold on
plot(y2,'r:','LineWidth',2.5)
hold off
title("\bf Simulated Paths")
legend("regARIMA Model","ARIMAX Model",'Location','best')
The simulated paths are equal because the arima converter enforces the nonlinear constraint when it converts the regression model intercept to the ARIMAX model constant.

Fit a regression model with ARIMA errors to the simulated data.

MdlregARIMA0 = regARIMA('ARLags',[1 2],'MALags',1);
EstMdlregARIMA = estimate(MdlregARIMA0,y1,'E0',e0,'U0',u0,'X',XregARIMA);

    Regression with ARMA(2,1) Error Model (Gaussian Distribution):

                  Value      StandardError    TStatistic      PValue
                 ________    _____________    __________    __________
    Intercept     0.14074       0.1014           1.3879        0.16518
    AR{1}         0.83061       0.1375           6.0407     1.5349e-09
    AR{2}        -0.45402       0.1164          -3.9007     9.5927e-05
    MA{1}         0.42803       0.15145          2.8262      0.0047109
    Beta(1)       0.29552       0.022938         12.883      5.597e-38
    Beta(2)      -0.17601       0.030607        -5.7506     8.8941e-09
    Variance      0.18231       0.027765         6.5663     5.1569e-11

Fit an ARIMAX model to the simulated data.

MdlARIMAX = arima('ARLags',[1 2],'MALags',1);
EstMdlARIMAX = estimate(MdlARIMAX,y2,'E0',e0,'Y0',...
    y0,'X',XARIMAX);

    ARIMAX(2,0,1) Model (Gaussian Distribution):

                  Value      StandardError    TStatistic      PValue
                 ________    _____________    __________    __________
    Constant     0.084996       0.064217         1.3236        0.18564
    AR{1}         0.83136       0.13634          6.0975     1.0775e-09
    AR{2}        -0.45599       0.11788         -3.8683      0.0001096
    MA{1}           0.426       0.15753          2.7043      0.0068446
    Beta(1)         1.053       0.13685          7.6949     1.4166e-14
    Beta(2)       -0.6904       0.19262         -3.5843     0.00033796
    Beta(3)       0.45399       0.15352          2.9572      0.0031047
    Variance      0.18112       0.028836          6.281     3.3634e-10
Convert the estimated regression model with ARIMA errors EstMdlregARIMA to an ARIMAX model.

ConvertedMdlARIMAX = arima(EstMdlregARIMA,'X',XregARIMA)

ConvertedMdlARIMAX =
  arima with properties:

     Description: "ARIMAX(2,0,1) Model (Gaussian Distribution)"
    Distribution: Name = "Gaussian"
               P: 2
               D: 0
               Q: 1
        Constant: 0.087737
              AR: {0.830611 -0.454025} at lags [1 2]
             SAR: {}
              MA: {0.428031} at lag [1]
             SMA: {}
     Seasonality: 0
            Beta: [1 -0.830611 0.454025]
        Variance: 0.182313
The estimated ARIMAX model constant is not equal to the ARIMAX model constant converted from the regression model with ARIMA errors: EstMdlARIMAX.Constant is 0.084996 and ConvertedMdlARIMAX.Constant is 0.087737. The discrepancy arises because estimate does not enforce the nonlinear constraint that the arima converter enforces. As a result, the other estimates are close, but not equal.
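As a quick check of the nonlinear constraint, the converted constant can be reproduced by hand from the displayed regARIMA estimates. This sketch uses only the rounded values printed above, so agreement with the displayed converted constant is approximate.

```matlab
% Rounded estimates copied from the displays above.
intercept = 0.14074;                 % estimated regression intercept
ar        = [0.830611 -0.454025];    % estimated AR coefficients

% Nonlinear constraint enforced by the converter: constant = A(1)*intercept.
convertedConstant = (1 - sum(ar))*intercept;

% Constant that estimate reports for the directly fitted ARIMAX model.
estimatedConstant = 0.084996;
gap = convertedConstant - estimatedConstant;
```

The value of convertedConstant agrees with the displayed 0.087737 to within rounding, while gap is nonzero because the directly fitted ARIMAX model estimates its constant freely.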
References [1] Hyndman, R. J. (2010, October). "The ARIMAX Model Muddle." Rob J. Hyndman. Retrieved May 4, 2017 from https://robjhyndman.com/hyndsight/arimax/.
See Also estimate | estimate | arima
Related Examples
• “Estimate Regression Model with ARIMA Errors” on page 5-85
Simulate Regression Models with ARMA Errors

In this section...
“Simulate an AR Error Model” on page 5-116
“Simulate an MA Error Model” on page 5-122
“Simulate an ARMA Error Model” on page 5-128

Simulate an AR Error Model

This example shows how to simulate sample paths from a regression model with AR errors without specifying presample disturbances.

Specify the regression model with AR(2) errors:

$$\begin{aligned} y_t &= 2 + X_t\begin{bmatrix} -2 \\ 1.5 \end{bmatrix} + u_t \\ u_t &= 0.75u_{t-1} - 0.5u_{t-2} + \varepsilon_t, \end{aligned}$$

where $\varepsilon_t$ is Gaussian with mean 0 and variance 1.

Beta = [-2; 1.5];
Intercept = 2;
a1 = 0.75;
a2 = -0.5;
Variance = 1;
Mdl = regARIMA('AR',{a1, a2},'Intercept',Intercept,...
    'Beta',Beta,'Variance',Variance);
Generate two length T = 50 predictor series by random selection from the standard Gaussian distribution.

T = 50;
rng(1); % For reproducibility
X = randn(T,2);

The software treats the predictors as nonstochastic series.

Generate and plot one sample path of responses from Mdl.

rng(2);
ySim = simulate(Mdl,T,'X',X);

figure
plot(ySim)
title('{\bf Simulated Response Series}')

simulate requires P = 2 presample unconditional disturbances ($u_t$) to initialize the error series. Without them, as in this case, simulate sets the necessary presample unconditional disturbances to 0.

Alternatively, filter a random innovation series through Mdl using filter.

rng(2);
e = randn(T,1);
yFilter = filter(Mdl,e,'X',X);

figure
plot(yFilter)
title('{\bf Simulated Response Series Using Filtered Innovations}')

The plots suggest that the simulated responses and the responses generated from the filtered innovations are equivalent.

Simulate 1000 response paths from Mdl. Assess transient effects by plotting the unconditional disturbance (U) variances across the simulated paths at each period.

numPaths = 1000;
[Y,~,U] = simulate(Mdl,T,'NumPaths',numPaths,'X',X);

figure
h1 = plot(Y,'Color',[.85,.85,.85]);
title('{\bf 1000 Simulated Response Paths}')
hold on
h2 = plot(1:T,Intercept+X*Beta,'k--','LineWidth',2);
legend([h1(1),h2],'Simulated Path','Mean')
hold off

figure
h1 = plot(var(U,0,2),'r','LineWidth',2);
hold on
theoVarFix = ((1-a2)*Variance)/((1+a2)*((1-a2)^2-a1^2));
h2 = plot([1 T],[theoVarFix theoVarFix],'k--','LineWidth',2);
title('{\bf Unconditional Disturbance Variance}')
legend([h1,h2],'Simulation Variance','Theoretical Variance')
hold off
The simulated response paths follow their theoretical mean, c + Xβ, which is not constant over time (and might look nonstationary).

The variance of the process is not constant, but levels off at the theoretical variance by the 10th period. The theoretical variance of the AR(2) error model is

$$\frac{(1 - a_2)\sigma_\varepsilon^2}{(1 + a_2)\left[(1 - a_2)^2 - a_1^2\right]} = \frac{(1 + 0.5)}{(1 - 0.5)\left[(1 + 0.5)^2 - 0.75^2\right]} = 1.78.$$
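The transient can also be quantified without simulation. The sketch below assumes, as stated above, that the presample unconditional disturbances are 0, so the disturbance at period t is a weighted sum of only the first t innovations and its variance is the innovation variance times the running sum of squared impulse-response weights. It uses the base MATLAB filter function (not the regARIMA filter method), and the truncation at 200 weights is an arbitrary choice for illustration.

```matlab
a1 = 0.75;  a2 = -0.5;  sigma2 = 1;     % AR(2) error model of this example

% Impulse-response (psi) weights of u(t) = a1*u(t-1) + a2*u(t-2) + e(t).
impulse = [1; zeros(199,1)];
psi = filter(1,[1 -a1 -a2],impulse);

% Var(u(t)) for t = 1,2,... when presample disturbances are fixed at 0.
varByPeriod = sigma2*cumsum(psi.^2);

% Closed-form stationary variance, as in the example.
theoVar = ((1-a2)*sigma2)/((1+a2)*((1-a2)^2-a1^2));

% Relative gap between the period-10 variance and the stationary variance.
relGap10 = abs(varByPeriod(10) - theoVar)/theoVar;
```

Under these assumptions, relGap10 is well below one percent, which is consistent with the variance leveling off by about the 10th period.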
You can reduce transient effects by partitioning the simulated data into a burn-in portion and a portion for analysis. Do not use the burn-in portion for analysis. Include enough periods in the burn-in portion to overcome the transient effects.

burnIn = 1:10;
notBurnIn = burnIn(end)+1:T;
Y = Y(notBurnIn,:);
X = X(notBurnIn,:);
U = U(notBurnIn,:);

figure
h1 = plot(notBurnIn,Y,'Color',[.85,.85,.85]);
hold on
h2 = plot(notBurnIn,Intercept+X*Beta,'k--','LineWidth',2);
title('{\bf 1000 Simulated Response Paths for Analysis}')
legend([h1(1),h2],'Simulated Path','Mean')
hold off

figure
h1 = plot(notBurnIn,var(U,0,2),'r','LineWidth',2);
hold on
h2 = plot([notBurnIn(1) notBurnIn(end)],...
    [theoVarFix theoVarFix],'k--','LineWidth',2);
title('{\bf Converged Unconditional Disturbance Variance}')
legend([h1,h2],'Simulation Variance','Theoretical Variance')
hold off

Unconditional disturbance simulation variances fluctuate around the theoretical variance due to Monte Carlo sampling error. Be aware that the exclusion of the burn-in sample from analysis reduces the effective sample size.
Simulate an MA Error Model

This example shows how to simulate responses from a regression model with MA errors without specifying a presample.

Specify the regression model with MA(8) errors:

$$\begin{aligned} y_t &= 2 + X_t\begin{bmatrix} -2 \\ 1.5 \end{bmatrix} + u_t \\ u_t &= \varepsilon_t + 0.4\varepsilon_{t-1} - 0.3\varepsilon_{t-4} + 0.2\varepsilon_{t-8}, \end{aligned}$$

where $\varepsilon_t$ is Gaussian with mean 0 and variance 0.5.

Beta = [-2; 1.5];
Intercept = 2;
b1 = 0.4;
b4 = -0.3;
b8 = 0.2;
Variance = 0.5;
Mdl = regARIMA('MA',{b1, b4, b8},'MALags',[1 4 8],...
    'Intercept',Intercept,'Beta',Beta,'Variance',Variance);
Generate two length T = 100 predictor series by random selection from the standard Gaussian distribution.

T = 100;
rng(4); % For reproducibility
X = randn(T,2);

The software treats the predictors as nonstochastic series.

Generate and plot one sample path of responses from Mdl.

rng(5);
ySim = simulate(Mdl,T,'X',X);

figure
plot(ySim)
title('{\bf Simulated Response Series}')

simulate requires Q = 8 presample innovations ($\varepsilon_t$) to initialize the error series. Without them, as in this case, simulate sets the necessary presample innovations to 0.

Alternatively, use filter to filter a random innovation series through Mdl.

rng(5);
e = randn(T,1);
yFilter = filter(Mdl,e,'X',X);

figure
plot(yFilter)
title('{\bf Simulated Response Series Using Filtered Innovations}')

The plots suggest that the simulated responses and the responses generated from the filtered innovations are equivalent.

Simulate 1000 response paths from Mdl. Assess transient effects by plotting the unconditional disturbance (U) variances across the simulated paths at each period.

numPaths = 1000;
[Y,~,U] = simulate(Mdl,T,'NumPaths',numPaths,'X',X);

figure
h1 = plot(Y,'Color',[.85,.85,.85]);
title('{\bf 1000 Simulated Response Paths}')
hold on
h2 = plot(1:T,Intercept+X*Beta,'k--','LineWidth',2);
legend([h1(1),h2],'Simulated Path','Mean')
hold off

figure
h1 = plot(var(U,0,2),'r','LineWidth',2);
hold on
theoVarFix = (1+b1^2+b4^2+b8^2)*Variance;
h2 = plot([1 T],[theoVarFix theoVarFix],'k--','LineWidth',2);
title('{\bf Unconditional Disturbance Variance}')
legend([h1,h2],'Simulation Variance','Theoretical Variance')
hold off
The simulated paths follow their theoretical mean, c + Xβ, which is not constant over time (and might look nonstationary).

The variance of the process is not constant, but levels off at the theoretical variance by the 15th period. The theoretical variance of the MA(8) error model is

$$(1 + b_1^2 + b_4^2 + b_8^2)\sigma_\varepsilon^2 = \left(1 + 0.4^2 + (-0.3)^2 + 0.2^2\right)(0.5) = 0.645.$$

You can reduce transient effects by partitioning the simulated data into a burn-in portion and a portion for analysis. Do not use the burn-in portion for analysis. Include enough periods in the burn-in portion to overcome the transient effects.

burnIn = 1:15;
notBurnIn = burnIn(end)+1:T;
Y = Y(notBurnIn,:);
X = X(notBurnIn,:);
U = U(notBurnIn,:);

figure
h1 = plot(notBurnIn,Y,'Color',[.85,.85,.85]);
hold on
h2 = plot(notBurnIn,Intercept+X*Beta,'k--','LineWidth',2);
title('{\bf 1000 Simulated Response Paths for Analysis}')
legend([h1(1),h2],'Simulated Path','Mean')
axis tight
hold off

figure
h1 = plot(notBurnIn,var(U,0,2),'r','LineWidth',2);
hold on
h2 = plot([notBurnIn(1) notBurnIn(end)],...
    [theoVarFix theoVarFix],'k--','LineWidth',2);
title('{\bf Converged Unconditional Disturbance Variance}')
legend([h1,h2],'Simulation Variance','Theoretical Variance')
axis tight
hold off

Unconditional disturbance simulation variances fluctuate around the theoretical variance due to Monte Carlo sampling error. Be aware that the exclusion of the burn-in sample from analysis reduces the effective sample size.
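For the MA(8) error model, the length of the transient follows directly from the lag structure. The sketch below assumes that the presample innovations are 0, as in this example, so the disturbance at period t includes only the moving average terms whose lags are smaller than t. The variable names are illustrative and the computation uses base MATLAB only.

```matlab
sigma2 = 0.5;                        % innovation variance of this example
b = zeros(1,8);
b([1 4 8]) = [0.4 -0.3 0.2];         % MA coefficients at lags 1, 4, and 8

% Weights on e(t), e(t-1), ..., e(t-8).
psi = [1 b];

% Var(u(t)) for t = 1,...,9 when presample innovations are fixed at 0.
varByPeriod = sigma2*cumsum(psi.^2);

% Theoretical variance once all nine weights contribute.
theoVar = sigma2*sum(psi.^2);
```

Under this assumption, varByPeriod starts at 0.5 in period 1 and equals theoVar = 0.645 from period 9 onward, so a burn-in of 15 periods is more than sufficient for this model.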
Simulate an ARMA Error Model

This example shows how to simulate responses from a regression model with ARMA errors without specifying a presample.

Specify the regression model with ARMA(2,1) errors:

$$\begin{aligned} y_t &= 2 + X_t\begin{bmatrix} -2 \\ 1.5 \end{bmatrix} + u_t \\ u_t &= 0.9u_{t-1} - 0.1u_{t-2} + \varepsilon_t + 0.5\varepsilon_{t-1}, \end{aligned}$$

where $\varepsilon_t$ is t-distributed with 15 degrees of freedom and variance 1.

Beta = [-2; 1.5];
Intercept = 2;
a1 = 0.9;
a2 = -0.1;
b1 = 0.5;
Variance = 1;
Distribution = struct('Name','t','DoF',15);
Mdl = regARIMA('AR',{a1, a2},'MA',b1,...
    'Distribution',Distribution,'Intercept',Intercept,...
    'Beta',Beta,'Variance',Variance);
Generate two length T = 50 predictor series by random selection from the standard Gaussian distribution.

T = 50;
rng(6); % For reproducibility
X = randn(T,2);

The software treats the predictors as nonstochastic series.

Generate and plot one sample path of responses from Mdl.

rng(7);
ySim = simulate(Mdl,T,'X',X);

figure
plot(ySim)
title('{\bf Simulated Response Series}')

simulate requires:
• P = 2 presample unconditional disturbances to initialize the autoregressive component of the error series.
• Q = 1 presample innovations to initialize the moving average component of the error series.

Without them, as in this case, simulate sets the necessary presample errors to 0.

Alternatively, use filter to filter a random innovation series through Mdl.

rng(7);
e = randn(T,1);
yFilter = filter(Mdl,e,'X',X);

figure
plot(yFilter)
title('{\bf Simulated Response Series Using Filtered Innovations}')

The plots suggest that the simulated responses and the responses generated from the filtered innovations are equivalent.

Simulate 1000 response paths from Mdl. Assess transient effects by plotting the unconditional disturbance (U) variances across the simulated paths at each period.

numPaths = 1000;
[Y,~,U] = simulate(Mdl,T,'NumPaths',numPaths,'X',X);

figure
h1 = plot(Y,'Color',[.85,.85,.85]);
title('{\bf 1000 Simulated Response Paths}')
hold on
h2 = plot(1:T,Intercept+X*Beta,'k--','LineWidth',2);
legend([h1(1),h2],'Simulated Path','Mean')
hold off

figure
h1 = plot(var(U,0,2),'r','LineWidth',2);
hold on
theoVarFix = Variance*(a1*b1*(1+a2)+(1-a2)*(1+a1*b1+b1^2))/...
    ((1+a2)*((1-a2)^2-a1^2));
h2 = plot([1 T],[theoVarFix theoVarFix],'k--','LineWidth',2);
title('{\bf Unconditional Disturbance Variance}')
legend([h1,h2],'Simulation Variance','Theoretical Variance',...
    'Location','Best')
hold off
The simulated paths follow their theoretical mean, c + Xβ, which is not constant over time (and might look nonstationary).

The variance of the process is not constant, but levels off at the theoretical variance by the 10th period. The theoretical variance of the ARMA(2,1) error model is:

$$\frac{\sigma_\varepsilon^2\left[a_1b_1(1 + a_2) + (1 - a_2)(1 + a_1b_1 + b_1^2)\right]}{(1 + a_2)\left[(1 - a_2)^2 - a_1^2\right]} = \frac{0.9(0.5)(1 - 0.1) + (1 + 0.1)\left(1 + 0.9(0.5) + 0.5^2\right)}{(1 - 0.1)\left[(1 + 0.1)^2 - 0.9^2\right]} = 6.32.$$
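The closed-form variance can be cross-checked by solving the Yule-Walker equations of the ARMA(2,1) error model for the autocovariances at lags 0, 1, and 2. This is a sketch of a standard textbook check, not a toolbox routine; the matrix M and right-hand side rhs are illustrative names, and the only model-specific input beyond the coefficients is the first impulse-response weight a1 + b1.

```matlab
a1 = 0.9;  a2 = -0.1;  b1 = 0.5;  sigma2 = 1;   % ARMA(2,1) error model

% Yule-Walker equations in the unknowns g = [gamma0; gamma1; gamma2]:
%   gamma0 = a1*gamma1 + a2*gamma2 + sigma2*(1 + b1*(a1 + b1))
%   gamma1 = a1*gamma0 + a2*gamma1 + sigma2*b1
%   gamma2 = a1*gamma1 + a2*gamma0
M   = [ 1    -a1    -a2
       -a1   1-a2    0
       -a2   -a1     1 ];
rhs = sigma2*[1 + b1*(a1 + b1); b1; 0];
g   = M\rhs;

% Closed-form variance, as computed in the example.
theoVar = sigma2*(a1*b1*(1+a2)+(1-a2)*(1+a1*b1+b1^2))/...
    ((1+a2)*((1-a2)^2-a1^2));
```

The first element of g is the stationary variance of the disturbances, and it agrees with theoVar, which rounds to 6.32.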
You can reduce transient effects by partitioning the simulated data into a burn-in portion and a portion for analysis. Do not use the burn-in portion for analysis. Include enough periods in the burn-in portion to overcome the transient effects.

burnIn = 1:10;
notBurnIn = burnIn(end)+1:T;
Y = Y(notBurnIn,:);
X = X(notBurnIn,:);
U = U(notBurnIn,:);

figure
h1 = plot(notBurnIn,Y,'Color',[.85,.85,.85]);
hold on
h2 = plot(notBurnIn,Intercept+X*Beta,'k--','LineWidth',2);
title('{\bf 1000 Simulated Response Paths for Analysis}')
legend([h1(1),h2],'Simulated Path','Mean')
axis tight
hold off

figure
h1 = plot(notBurnIn,var(U,0,2),'r','LineWidth',2);
hold on
h2 = plot([notBurnIn(1) notBurnIn(end)],...
    [theoVarFix theoVarFix],'k--','LineWidth',2);
title('{\bf Converged Unconditional Disturbance Variance}')
legend([h1,h2],'Simulation Variance','Theoretical Variance')
axis tight
hold off

Unconditional disturbance simulation variances fluctuate around the theoretical variance due to Monte Carlo sampling error. Be aware that the exclusion of the burn-in sample from analysis reduces the effective sample size.
Simulate Regression Models with Nonstationary Errors

In this section...
“Simulate a Regression Model with Nonstationary Errors” on page 5-135
“Simulate a Regression Model with Nonstationary Exponential Errors” on page 5-138

Simulate a Regression Model with Nonstationary Errors

This example shows how to simulate responses from a regression model with ARIMA unconditional disturbances, assuming that the predictors are white noise sequences.

Specify the regression model with ARIMA errors:

$$\begin{aligned} y_t &= 3 + X_t\begin{bmatrix} 2 \\ -1.5 \end{bmatrix} + u_t \\ \Delta u_t &= 0.5\Delta u_{t-1} + \varepsilon_t + 1.4\varepsilon_{t-1} + 0.8\varepsilon_{t-2}, \end{aligned}$$

where the innovations are Gaussian with variance 1.

T = 150; % Sample size
Mdl = regARIMA('MA',{1.4,0.8},'AR',0.5,'Intercept',3,...
    'Variance',1,'Beta',[2;-1.5],'D',1);
Simulate two Gaussian predictor series with mean 0 and variance 1.

rng(1); % For reproducibility
X = randn(T,2);

Simulate and plot the response series.

y = simulate(Mdl,T,'X',X);

figure;
plot(y);
title 'Simulated Responses';
axis tight;

Regress y onto X. Plot the residuals, and test them for a unit root.

RegMdl = fitlm(X,y);

figure;
subplot(2,1,1);
plotResiduals(RegMdl,'caseorder');
subplot(2,1,2);
plotResiduals(RegMdl,'lagged');

h = adftest(RegMdl.Residuals.Raw)

h = logical
   0
The residual plots indicate that the residuals are autocorrelated and possibly nonstationary (as constructed). h = 0 indicates that there is insufficient evidence to suggest that the residual series is not a unit root process.

Treat the nonstationary unconditional disturbances by transforming the data appropriately. In this case, difference the responses and predictors. Reestimate the regression model using the transformed responses, and plot the residuals.

dY = diff(y);
dX = diff(X);
dRegMdl = fitlm(dX,dY);

figure;
subplot(2,1,1);
plotResiduals(dRegMdl,'caseorder','LineStyle','-');
subplot(2,1,2);
plotResiduals(dRegMdl,'lagged');

h = adftest(dRegMdl.Residuals.Raw)

h = logical
   1

The residual plots indicate that the residuals are still autocorrelated, but stationary. h = 1 indicates that there is enough evidence to suggest that the residual series is not a unit root process.

Once the residuals appear stationary, you can determine the appropriate number of lags for the error model using Box and Jenkins methodology. Then, use regARIMA to completely model the regression model with ARIMA errors.
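To see why differencing both the responses and the predictors is an appropriate transformation in this example, apply the difference operator to the regression equation. The following is a short worked derivation from the model specified above, not additional toolbox output:

```latex
\begin{aligned}
y_t &= 3 + X_t\beta + u_t, \\
\Delta y_t &= y_t - y_{t-1}
            = (X_t - X_{t-1})\beta + (u_t - u_{t-1})
            = \Delta X_t\,\beta + \Delta u_t, \\
\Delta u_t &= 0.5\,\Delta u_{t-1} + \varepsilon_t + 1.4\,\varepsilon_{t-1} + 0.8\,\varepsilon_{t-2}.
\end{aligned}
```

The intercept cancels, the regression coefficients are unchanged, and the differenced disturbances follow a stationary ARMA(1,2) process. This is consistent with residuals that remain autocorrelated but no longer contain a unit root. Because fitlm includes an intercept term by default, its estimate in the differenced regression should be close to 0.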
Simulate a Regression Model with Nonstationary Exponential Errors

This example shows how to simulate responses from a regression model with nonstationary, exponential, unconditional disturbances. Assume that the predictors are white noise sequences.

Specify the following ARIMA error model:

$$\Delta u_t = 0.9\Delta u_{t-1} + \varepsilon_t,$$

where the innovations are Gaussian with mean 0 and variance 0.05.

T = 50; % Sample size
MdlU = arima('AR',0.9,'Variance',0.05,'D',1,'Constant',0);

Simulate unconditional disturbances. Exponentiate the simulated errors.

rng(10); % For reproducibility
u = simulate(MdlU,T,'Y0',[0.5:1.5]');
expU = exp(u);

Simulate two Gaussian predictor series with mean 0 and variance 1.

X = randn(T,2);

Generate responses from the regression model with time series errors:

$$y_t = 3 + X_t\begin{bmatrix} 2 \\ -1.5 \end{bmatrix} + e^{u_t}.$$

Beta = [2;-1.5];
Intercept = 3;
y = Intercept + X*Beta + expU;

Plot the responses.

figure
plot(y)
title('Simulated Responses')
axis tight
The response series seems to grow exponentially (as constructed).

Regress y onto X. Plot the residuals.

RegMdl1 = fitlm(X,y);

figure
subplot(2,1,1)
plotResiduals(RegMdl1,'caseorder','LineStyle','-')
subplot(2,1,2)
plotResiduals(RegMdl1,'lagged')

The residuals seem to grow exponentially, and seem autocorrelated (as constructed).

Treat the nonstationary unconditional disturbances by transforming the data appropriately. In this case, take the log of the response series. Difference the logged responses. It is recommended to transform the predictors the same way as the responses to maintain the original interpretation of their relationship. However, do not transform the predictors in this case because they contain negative values. Reestimate the regression model using the transformed responses, and plot the residuals.

dLogY = diff(log(y));
RegMdl2 = fitlm(X(2:end,:),dLogY);

figure
subplot(2,1,1)
plotResiduals(RegMdl2,'caseorder','LineStyle','-')
subplot(2,1,2)
plotResiduals(RegMdl2,'lagged')

h = adftest(RegMdl2.Residuals.Raw)

h = logical
   1

The residual plots indicate that the residuals are still autocorrelated, but stationary. h = 1 indicates that there is enough evidence to suggest that the residual series is not a unit root process.

Once the residuals appear stationary, you can determine the appropriate number of lags for the error model using Box and Jenkins methodology. Then, use regARIMA to completely model the regression model with ARIMA errors.
References [1] Box, G. E. P., G. M. Jenkins, and G. C. Reinsel. Time Series Analysis: Forecasting and Control. 3rd ed. Englewood Cliffs, NJ: Prentice Hall, 1994.
See Also
regARIMA

More About
• “Box-Jenkins Methodology” on page 3-2
Simulate Regression Models with Multiplicative Seasonal Errors

In this section...
“Simulate a Regression Model with Stationary Multiplicative Seasonal Errors” on page 5-143
“Untitled” on page 5-145

Simulate a Regression Model with Stationary Multiplicative Seasonal Errors

This example shows how to simulate sample paths from a regression model with multiplicative seasonal ARIMA errors using simulate. The time series is monthly international airline passenger numbers from 1949 to 1960.

Load the airline and recessions data sets.

load Data_Airline
load Data_Recessions

Transform the airline data by applying the logarithm, and the 1st and 12th differences.

y = DataTimeTable.PSSG;
logY = log(y);
DiffPoly = LagOp([1 -1]);
SDiffPoly = LagOp([1 -1],'Lags',[0, 12]);
dLogY = filter(DiffPoly*SDiffPoly,logY);
Construct the predictor (X), which determines whether the country was in a recession during the sampled period. A 0 in row t means the country was not in a recession in month t, and a 1 in row t means that it was in a recession in month t. X = zeros(numel(dates),1); % Preallocation for j = 1:size(Recessions,1) X(dates >= Recessions(j,1) & dates 1 displays results in the command window.
The value of Display applies to all tests.
Example: Display="off"
Data Types: char | string

Plot — Flag indicating whether to plot test results
"off" | "on"
Flag indicating whether to plot test results, specified as a value in this table.

Value    Description                                            Default Value When
"off"    cusumtest does not produce any plots.                  cusumtest returns any output argument.
"on"     cusumtest produces individual plots for each test.     cusumtest does not return any output arguments.

Depending on the value of Test, the plots show the sequence of cusums or cusums of squares together with critical lines determined by the value of Alpha.
The value of Plot applies to all tests.
Example: Plot="off"
Data Types: char | string

ResponseVariable — Variable in Tbl to use for response
first variable in Tbl (default) | string vector | cell vector of character vectors | vector of integers | logical vector
Variable in Tbl to use for response, specified as a string vector or cell vector of character vectors containing variable names in Tbl.Properties.VariableNames, or an integer or logical vector representing the indices of names. The selected variables must be numeric.
cusumtest uses the same specified response variable for all tests.
Example: ResponseVariable="GDP"
Example: ResponseVariable=[true false false false] or ResponseVariable=1 selects the first table variable as the response.
Data Types: double | logical | char | cell | string

PredictorVariables — Variables in Tbl to use for the predictors
string vector | cell vector of character vectors | vector of integers | logical vector
Variables in Tbl to use for the predictors, specified as a string vector or cell vector of character vectors containing variable names in Tbl.Properties.VariableNames, or an integer or logical vector representing the indices of names. The selected variables must be numeric.
cusumtest uses the same specified predictors for all tests.
By default, cusumtest uses all variables in Tbl that are not specified by the ResponseVariable name-value argument.
Example: PredictorVariables=["UN" "CPI"]
Example: PredictorVariables=[false true true false] or PredictorVariables=[2 3] selects the second and third table variables.
Data Types: double | logical | char | cell | string

Note
• When cusumtest conducts multiple tests, the function applies all single settings (scalars or character vectors) to each test.
• All vector-valued specifications that control the number of tests must have equal length.
• If the value of any option is a row vector, so is output h. Array and table outputs retain their specified dimensions.
Output Arguments

h — Test rejection decisions
logical scalar | logical vector
Test rejection decisions, returned as a logical scalar or vector with length equal to the number of tests numTests. cusumtest returns h when you supply the inputs X and y.
Hypotheses are independent of the value of Test.
• H0: Coefficients in β are equal in all sequential subsamples.
• H1: Coefficients in β change during the period of the sample.
Elements of h have the following values and meanings.
• A value of 1 indicates rejection of H0 in favor of H1.
• A value of 0 indicates failure to reject H0.

H — Sequence of test rejection decisions
logical matrix | table
Sequence of test rejection decisions for each iteration of the cusum tests, returned as a numTests-by-(numObs – numPreds) logical matrix or table of logical variables.
Rows correspond to separate cusum tests and columns or variables correspond to iterations. When H is a table, variable j has label Hj.
• For tests in which Direction is "forward", columns or variables correspond to times numPreds + 1,...,numObs.
• For tests in which Direction is "backward", columns or variables correspond to times numObs – (numPreds + 1),...,1.
Rows corresponding to tests in which Intercept is true contain one fewer iteration, and the value in the first column of H defaults to false.
For a particular test (row), if any test decision in the sequence is 1, then h is true; that is, h = any(H,2). Otherwise, h is false.

Stat — Sequence of test statistics
numeric matrix | table
Sequence of test statistics for each iteration of the cusum tests, returned as a numTests-by-(numObs – numPreds) numeric matrix or table of numeric variables.
Rows correspond to separate cusum tests and columns or variables correspond to iterations. When Stat is a table, variable j has label Statj.
Values in any row depend on the value of Test. Array indices correspond to the indexing in H.
Rows corresponding to tests in which Intercept is true contain one fewer iteration, and the value in the first column of Stat defaults to NaN.

W — Sequence of standardized recursive residuals
numeric matrix | table
Sequence of standardized recursive residuals, returned as a numTests-by-(numObs – numPreds) numeric matrix or table of numeric variables.
Values in any row depend on the value of Test. Array indices correspond to the indexing in H. When W is a table, variable j has label Wj.
Rows corresponding to tests in which Intercept is true contain one fewer iteration, and the value in the first column of W defaults to NaN.

B — Sequence of recursive regression coefficient estimates
numeric array
Sequence of recursive regression coefficient estimates, returned as a (numPreds + 1)-by-(numObs – numPreds)-by-numTests numeric array.
• B(i,j,k) corresponds to coefficient i at iteration j for test k. At iteration j of test k, cusumtest estimates the coefficients using

B(:,j,k) = X(1:numPreds+j,inRegression)\y(1:numPreds+j);

inRegression is a logical vector indicating the predictors in the regression at iteration j of test k.
• During forward iterations, initially constant predictors can cause multicollinearity. Therefore, cusumtest holds out constant predictors until their data changes. For iterations in which cusumtest excludes predictors from the regression, corresponding coefficient estimates default to NaN. Similarly, for backward regression, cusumtest holds out terminally constant predictors. For more details, see [1].
• Tests in which:
  • Intercept is true contain one fewer iteration, and all values in the first column of B default to NaN.
  • Intercept is false contain one fewer coefficient, and the value in the first row, which corresponds to the intercept, defaults to NaN.

sumPlots — Handles to plotted graphics objects
graphics array
Handles to plotted graphics objects, returned as a 3-by-numTests graphics array. sumPlots contains unique plot identifiers, which you can use to query or modify properties of the plot.
Limitations Cusum tests have little power to detect structural changes in the following cases. • Late in the sample period • When multiple changes produce cancellations in the cusums
More About
Cusum Tests
Cusum tests provide useful diagnostics for various model misspecifications, including gradual structural change, multiple structural changes, missing predictors, and neglected nonlinearities. The tests, formulated in [1], are based on cumulative sums, or cusums, of residuals resulting from recursive regressions.
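To make the idea of cusums of recursive residuals concrete, the following is a minimal sketch that uses only base MATLAB. It is not the cusumtest implementation: the simulated data, the variable names, and the scaling by the residual standard deviation are assumptions chosen for illustration, following the usual textbook construction of standardized one-step-ahead forecast errors.

```matlab
% Illustrative sketch only; not the cusumtest source code.
rng(5);                                  % for reproducibility
T = 60;                                  % hypothetical sample size
X = [ones(T,1) randn(T,2)];              % hypothetical design matrix with intercept
beta = [1; 2; -1];                       % hypothetical true coefficients
y = X*beta + 0.5*randn(T,1);

k = size(X,2);                           % number of coefficients
w = zeros(T-k,1);                        % recursive residuals
for t = k+1:T
    Xp = X(1:t-1,:);                     % data available before time t
    b  = Xp\y(1:t-1);                    % recursive coefficient estimate
    x  = X(t,:)';
    f  = 1 + x'*((Xp'*Xp)\x);            % forecast-error variance factor
    w(t-k) = (y(t) - x'*b)/sqrt(f);      % standardized one-step-ahead forecast error
end

sigmaHat = sqrt(sum(w.^2)/(T-k));        % assumed scale estimate
cusum    = cumsum(w)/sigmaHat;           % cumulative sum of scaled residuals
cusumsq  = cumsum(w.^2)/sum(w.^2);       % cumulative sum of squares, ends at 1
```

Under a stable regression relationship, the cusum path wanders around zero and the cusum of squares path rises roughly linearly from 0 to 1. Sustained drift away from those references is what the tests look for.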
Tips • The cusum of squares test: • Is a “useful complement to the cusum test, particularly when the departure from constancy of the [recursive coefficients] is haphazard rather than systematic” [1] • Has greater power for cases in which multiple shifts are likely to cancel • Is often suggested for detecting structural breaks in volatility • Alpha specifies the nominal significance levels for the tests. The actual size of a test depends on various assumptions and approximations that cusumtest uses to compute the critical lines. Plots of the recursive residuals are the best indicator of structural change. Brown, et al. suggest that the tests “should be regarded as yardsticks for the interpretation of data rather than leading to hard and fast decisions” [1]. • To produce basic diagnostic plots of the recursive coefficient estimates having the same scale for test n, enter plot(B(:,:,n)')
recreg produces similar plots, optionally using robust standard error bands.
Algorithms
• cusumtest handles initially constant predictor data using the method suggested in [1]. If a predictor's data is constant for the first numCoeffs observations and this results in multicollinearity with an intercept or another predictor, then cusumtest drops the predictor from regressions and the computation of recursive residuals until its data changes. Similarly, cusumtest temporarily holds out terminally constant predictors from backward regressions. Initially constant predictors in backward regressions, or terminally constant predictors in forward regressions, are not held out by cusumtest and can lead to rank deficiency in terminal iterations.
• cusumtest computes critical lines for inference in essentially different ways for the two test statistics. For cusums, cusumtest solves the normal CDF equation in [1] dynamically for each value of Alpha. For the cusums of squares test, cusumtest interpolates parameter values from the table in [2], using the method suggested in [1]. Sample sizes with degrees of freedom less than 4 are below tabulated values, and cusumtest cannot compute critical lines. Sample sizes with degrees of freedom greater than 202 are above tabulated values, and cusumtest uses the critical value associated with the largest tabulated sample size.
Version History
Introduced in R2016a

cusumtest returns some outputs in tables when you supply a table of data
If you supply a table of time series data Tbl, cusumtest returns H, Stat, and W as tables containing variables for recursive decision statistics with rows corresponding to separate tests. Before R2022a, cusumtest returned the matrix outputs when you supplied a table of input data. Starting in R2022a, if you supply a table of input data, update your code by accessing results in H, Stat, and W using table indexing. For more details, see “Access Data in Tables”.
References
[1] Brown, R. L., J. Durbin, and J. M. Evans. "Techniques for Testing the Constancy of Regression Relationships Over Time." Journal of the Royal Statistical Society, Series B. Vol. 37, 1975, pp. 149–192.
[2] Durbin, J. "Tests for Serial Correlation in Regression Analysis Based on the Periodogram of Least Squares Residuals." Biometrika. Vol. 56, 1969, pp. 1–15.
See Also recreg | fitlm | LinearModel | chowtest
corrplot Plot variable correlations
Syntax [R,PValue] = corrplot(X) [R,PValue] = corrplot(Tbl) [ ___ ] = corrplot( ___ ,Name=Value) corrplot( ___ ) corrplot(ax, ___ ) [ ___ ,H] = corrplot( ___ )
Description
[R,PValue] = corrplot(X) plots Pearson's correlation coefficients between all pairs of variables in the matrix of time series data X. The plot is a numVars-by-numVars grid, where numVars is the number of time series variables (columns) in X, including the following subplots:
• Each off-diagonal subplot contains a scatterplot of a pair of variables with a least-squares reference line, the slope of which is equal to the displayed correlation coefficient.
• Each diagonal subplot contains the distribution of a variable as a histogram.
Also, the function returns the correlation matrix displayed in the plots, R, and a matrix of p-values, PValue, for testing the null hypothesis that each pair of variables is uncorrelated against the alternative hypothesis of a nonzero correlation.

[R,PValue] = corrplot(Tbl) plots the Pearson's correlation coefficients between all pairs of variables in the table or timetable Tbl, and also returns tables for the correlation matrix R and matrix of p-values PValue. To select a subset of variables in Tbl for which to plot the correlation matrix, use the DataVariables name-value argument.

[ ___ ] = corrplot( ___ ,Name=Value) specifies options using one or more name-value arguments in addition to any of the input argument combinations in previous syntaxes. corrplot returns the output argument combination for the corresponding input arguments. For example, corrplot(Tbl,Type="Spearman",TestR="on",DataVariables=1:5) computes Spearman’s rank correlation coefficient for the first 5 variables of the table Tbl and tests for significant correlation coefficients.

corrplot( ___ ) plots the correlation matrix.

corrplot(ax, ___ ) plots on the axes specified by ax instead of the current axes (gca). ax can precede any of the input argument combinations in the previous syntaxes.

[ ___ ,H] = corrplot( ___ ) plots the diagnostics of the input series and additionally returns handles to plotted graphics objects H. Use elements of H to modify properties of the plot after you create it.
Examples
Plot and Return Pearson's Correlation Coefficients Between Variables in Matrix of Data
Plot and return Pearson's correlation coefficients between pairs of time series using the default options of corrplot. Input the time series data as a numeric matrix.
Load data of Canadian inflation and interest rates Data_Canada.mat, which contains the series in the matrix Data.
load Data_Canada

Plot and return the correlation matrix between all pairs of variables in the data.
R = corrplot(Data)

R = 5×5

    1.0000    0.9266    0.7401    0.7287    0.7136
    0.9266    1.0000    0.5908    0.5716    0.5556
    0.7401    0.5908    1.0000    0.9758    0.9384
    0.7287    0.5716    0.9758    1.0000    0.9861
    0.7136    0.5556    0.9384    0.9861    1.0000

The correlation plot shows that the short-term, medium-term, and long-term interest rates are highly correlated.
Plot and Return Correlations and p-values Between Table Variables
Plot correlations between time series, which are variables in a table, using default options. Return a table of pairwise correlations and a table of corresponding significance-test p-values.
Load data of Canadian inflation and interest rates Data_Canada.mat. Convert the table DataTable to a timetable.
load Data_Canada
dates = datetime(dates,ConvertFrom="datenum");
TT = table2timetable(DataTable,RowTimes=dates);
TT.Observations = [];

Plot and return the correlation matrix, with corresponding significance-test p-values, between all pairs of variables in the data.
[R,PValue] = corrplot(TT)

R=5×5 table
              INF_C      INF_G      INT_S      INT_M      INT_L
             _______    _______    _______    _______    _______
    INF_C          1    0.92665    0.74007    0.72867     0.7136
    INF_G    0.92665          1    0.59077    0.57159    0.55557
    INT_S    0.74007    0.59077          1     0.9758    0.93843
    INT_M    0.72867    0.57159     0.9758          1    0.98609
    INT_L     0.7136    0.55557    0.93843    0.98609          1

PValue=5×5 table
               INF_C         INF_G         INT_S         INT_M         INT_L
             __________    __________    __________    __________    __________
    INF_C             1    3.6657e-18    3.2113e-08    6.6174e-08    1.6318e-07
    INF_G    3.6657e-18             1    4.7739e-05    9.4769e-05    0.00016278
    INT_S    3.2113e-08    4.7739e-05             1    2.3206e-27    1.3408e-19
    INT_M    6.6174e-08    9.4769e-05    2.3206e-27             1    5.1602e-32
    INT_L    1.6318e-07    0.00016278    1.3408e-19    5.1602e-32             1
corrplot returns the correlation matrix and corresponding matrix of p-values in tables R and PValue, respectively. By default, corrplot computes correlations between all pairs of variables in the input table. To select a subset of variables from an input table, set the DataVariables option.
Plot Correlations Between Selected Variables
Plot the correlation matrix for selected time series.
Load the credit default data set Data_CreditDefaults.mat. The table DataTable contains the default rate of investment-grade corporate bonds series (IGD, the response variable) and several predictor variables.
load Data_CreditDefaults

Consider a multiple regression model for the default rate that includes an intercept term. Include a variable in the table of data that represents the intercept in the design matrix (that is, a column of ones). Place the intercept variable at the beginning of the table.
Const = ones(height(DataTable),1);
DataTable = addvars(DataTable,Const,Before=1);

Create a variable that contains all predictor variable names.
varnames = DataTable.Properties.VariableNames;
prednames = varnames(varnames ~= "IGD");

Graph a correlation plot of all predictor variables except for the intercept dummy variable.
corrplot(DataTable,DataVariables=prednames(2:end));
The predictor BBB is moderately linearly associated with the other predictors, while all other predictors appear unassociated with each other.
Plot and Test Kendall's Rank Correlation Coefficients
Plot Kendall's rank correlations between multiple time series. Conduct a hypothesis test to determine which correlations are significantly different from zero.
Load data on Canadian inflation and interest rates.
load Data_Canada

Plot the Kendall's rank correlation coefficients between all pairs of variables. Identify which correlations are significantly different from zero by conducting hypothesis tests.
corrplot(DataTable,Type="Kendall",TestR="on")
The correlation coefficients highlighted in red indicate which pairs of variables have correlations significantly different from zero. For these time series, all pairs of variables have correlations significantly different from zero.
Conduct Right-Tailed Correlation Tests
Test for correlations greater than zero between multiple time series.
Load data on Canadian inflation and interest rates Data_Canada.mat.
load Data_Canada

Return the pairwise Pearson's correlations and corresponding p-values for testing the null hypothesis of no correlation against the right-tailed alternative that the correlations are greater than zero.
[R,PValue] = corrplot(DataTable,Tail="right");
PValue

PValue=5×5 table
               INF_C         INF_G         INT_S         INT_M         INT_L
             __________    __________    __________    __________    __________
    INF_C             1    1.8329e-18    1.6056e-08    3.3087e-08    8.1592e-08
    INF_G    1.8329e-18             1    2.3869e-05    4.7384e-05    8.1392e-05
    INT_S    1.6056e-08    2.3869e-05             1    1.1603e-27    6.7041e-20
    INT_M    3.3087e-08    4.7384e-05    1.1603e-27             1    2.5801e-32
    INT_L    8.1592e-08    8.1392e-05    6.7041e-20    2.5801e-32             1
The output PValue has pairwise p-values all less than the default 0.05 significance level, indicating that all pairs of variables have correlation significantly greater than zero.
Input Arguments
X — Time series data
numeric matrix
Time series data, specified as a numObs-by-numVars numeric matrix. Each column of X corresponds to a variable, and each row corresponds to an observation.
Data Types: double

Tbl — Time series data
table | timetable
Time series data, specified as a table or timetable with numObs rows. Each row of Tbl is an observation.
Specify numVars variables to include in the diagnostics computations by using the DataVariables argument. The selected variables must be numeric.

ax — Axes on which to plot
Axes object
Axes on which to plot, specified as an Axes object.
By default, corrplot plots to the current axes (gca).
corrplot does not support UIAxes targets.

Name-Value Arguments
Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter.
Before R2021a, use commas to separate each name and value, and enclose Name in quotes.
Example: corrplot(Tbl,Type="Spearman",TestR="on",DataVariables=1:5) computes Spearman’s rank correlation coefficient for the first 5 variables of the table Tbl and tests for significant correlation coefficients.

Type — Correlation coefficient
"Pearson" (default) | "Kendall" | "Spearman" | character vector
Correlation coefficient to compute, specified as a value in this table.

    Value         Description
    "Pearson"     Pearson’s linear correlation coefficient
    "Kendall"     Kendall’s rank correlation coefficient (τ)
    "Spearman"    Spearman’s rank correlation coefficient (ρ)

Example: Type="Kendall"
Data Types: char | string

Rows — Option for handling rows in input time series data that contain NaN values
"pairwise" (default) | "all" | "complete" | character vector
Option for handling rows in the input time series data that contain NaN values, specified as a value in this table.

    Value         Description
    "all"         Use all rows, regardless of any NaN entries.
    "complete"    Use only rows that do not contain NaN entries.
    "pairwise"    Use rows that do not contain NaN entries in column (variable) i or j to compute R(i,j).
Example: Rows="complete"
Data Types: char | string

Tail — Alternative hypothesis
"both" (default) | "right" | "left" | character vector
Alternative hypothesis Ha used to compute the p-values PValue, specified as a value in this table.

    Value      Description
    "both"     Ha: Correlation is not zero.
    "right"    Ha: Correlation is greater than zero.
    "left"     Ha: Correlation is less than zero.
Example: Tail="left"
Data Types: char | string

VarNames — Unique variable names to use in plots
string vector | character vector | cell vector of strings | cell vector of character vectors
Unique variable names used in the plots, specified as a string vector or cell vector of strings of length numVars. VarNames(j) specifies the name to use for variable X(:,j) or DataVariables(j).
• If the input time series data is the matrix X, the default is {'var1','var2',...}.
• If the input time series data is the table or timetable Tbl, the default is Tbl.Properties.VariableNames.
Example: VarNames=["Const" "AGE" "BBD"]
Data Types: char | cell | string

TestR — Flag for testing whether correlations are significant
"off" (default) | "on" | character vector
Flag for testing whether correlations are significant, specified as a value in this table.

    Value    Description
    "on"     corrplot highlights significant correlations in the correlation matrix plot using red font.
    "off"    All correlations in the correlation matrix plot have black font.

Example: TestR="on"
Data Types: char | string

Alpha — Significance level
0.05 (default) | scalar in [0,1]
Significance level for correlation tests, specified as a scalar in the interval [0,1].
Example: Alpha=0.01
Data Types: double

DataVariables — Variables in Tbl
all variables (default) | string vector | cell vector of character vectors | vector of integers | logical vector
Variables in Tbl that corrplot includes in the correlation matrix plot, specified as a string vector or cell vector of character vectors containing variable names in Tbl.Properties.VariableNames, or an integer or logical vector representing the indices of names. The selected variables must be numeric.
Example: DataVariables=["GDP" "CPI"]
Example: DataVariables=[true true true false] or DataVariables=1:3 selects the first through third table variables.
Data Types: double | logical | char | cell | string
Output Arguments
R — Correlations
numeric matrix | table
Correlations between pairs of variables in the input time series data that are displayed in the plots, returned as one of the following quantities:
• numVars-by-numVars numeric matrix when you supply the input X.
• numVars-by-numVars table when you supply the input Tbl, where numVars is the selected number of variables in the DataVariables argument.

PValue — p-values
numeric matrix | table
p-values corresponding to significance tests on the elements of R, returned as one of the following quantities:
• numVars-by-numVars numeric matrix when you supply the input X.
• numVars-by-numVars table when you supply the input Tbl, where the variables specified by the DataVariables argument determine numVars and the names of the rows and columns of the output table.
The p-values are used to test the null hypothesis of no correlation against the alternative hypothesis specified by the Tail argument (by default, a nonzero correlation).

H — Handles to plotted graphics objects
graphics array
Handles to plotted graphics objects, returned as one of the following quantities:
• numVars-by-numVars matrix of graphics objects when you supply the input X
• numVars-by-numVars table of graphics objects when you supply the input Tbl, where the variables specified by the DataVariables argument determine numVars and the names of the rows and columns of the output table
H contains unique plot identifiers, which you can use to query or modify properties of the plot.
Tips • The setting Rows="pairwise" (the default) can return a correlation matrix that is not positive definite. The setting Rows="complete" returns a positive-definite matrix, but, in general, the estimates are based on fewer observations.
Algorithms
• corrplot computes p-values for Pearson’s correlation by transforming the correlation to create a t-statistic with numObs – 2 degrees of freedom. The transformation is exact when the input time series data is normal.
• corrplot computes p-values for Kendall’s and Spearman’s rank correlations by using either the exact permutation distributions (for small sample sizes) or large-sample approximations.
• corrplot computes p-values for two-tailed tests by doubling the more significant of the two one-tailed p-values.
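The Pearson p-value computation described above can be sketched with base MATLAB functions only. This is an illustration under stated assumptions, not the corrplot source code: the simulated series and all variable names are hypothetical, and the Student's t tail probability is evaluated through the regularized incomplete beta function (betainc) so that no additional toolbox is required.

```matlab
% Illustrative sketch (base MATLAB only); not the corrplot source code.
rng(0);                              % for reproducibility
n = 50;                              % number of observations (numObs)
x = randn(n,1);                      % hypothetical series
y = 0.4*x + randn(n,1);              % hypothetical correlated series

% Pearson correlation from centered data
dx = x - mean(x);
dy = y - mean(y);
r  = sum(dx.*dy)/sqrt(sum(dx.^2)*sum(dy.^2));

% t-statistic with n - 2 degrees of freedom
v = n - 2;
t = r*sqrt(v/(1 - r^2));

% Two-tailed p-value: for a Student's t variable with v degrees of freedom,
% 2*P(T >= |t|) equals the regularized incomplete beta function below
pBoth = betainc(v/(v + t^2),v/2,1/2);

% One-tailed p-values
if t >= 0
    pRight = pBoth/2;
else
    pRight = 1 - pBoth/2;
end
pLeft = 1 - pRight;

% Doubling the more significant one-tailed p-value recovers the two-tailed value
pDoubled = 2*min(pRight,pLeft);
```

When the sample correlation is positive, the right-tailed p-value is half the two-tailed p-value, which is the relationship visible between the Tail="right" and default p-value tables in the examples.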
Version History Introduced in R2012a corrplot returns results in tables when you supply a table of data If you supply a table of time series data Tbl, corrplot returns all outputs in separate tables. Rows and variables in the tables correspond to the variables specified by DataVariables. Before R2022a, corrplot returned each output as a matrix when you supplied a table of input data. Starting in R2022a, if you supply a table of input data and return any of the outputs, access results by using table indexing. For more details, see “Access Data in Tables”.
See Also
Apps
Econometric Modeler
Functions
collintest | corr
Topics
“Time Series Regression II: Collinearity and Estimator Variance” on page 5-180
“Plot Time Series Data Using Econometric Modeler App” on page 4-66
crosscorr Sample cross-correlation
Syntax [xcf,lags] = crosscorr(y1,y2) XCFTbl = crosscorr(Tbl) [ ___ ,bounds] = crosscorr( ___ ) [ ___ ] = crosscorr( ___ ,Name=Value) crosscorr( ___ ) crosscorr(ax, ___ ) [ ___ ,h] = crosscorr( ___ )
Description
[xcf,lags] = crosscorr(y1,y2) returns the sample cross-correlation function on page 12-390 (XCF) xcf and associated lags lags between the univariate time series y1 and y2.

XCFTbl = crosscorr(Tbl) returns the table XCFTbl containing variables for the sample XCF and associated lags of the last two variables in the input table or timetable Tbl. To select different variables in Tbl for which to compute the XCF, use the DataVariables name-value argument.

[ ___ ,bounds] = crosscorr( ___ ) uses any input-argument combination in the previous syntaxes, and returns the output-argument combination for the corresponding input arguments and the approximate upper and lower confidence bounds bounds on the XCF.

[ ___ ] = crosscorr( ___ ,Name=Value) uses additional options specified by one or more name-value arguments. For example, crosscorr(Tbl,DataVariables=["RGDP" "CPI"],NumLags=10,NumSTD=1.96) returns the sample XCF for lags -10 through 10 of the table variables "RGDP" and "CPI" in Tbl and 95% confidence bounds.

crosscorr( ___ ) plots the sample XCF between the input series with confidence bounds.

crosscorr(ax, ___ ) plots on the axes specified by ax instead of the current axes (gca). ax can precede any of the input argument combinations in the previous syntaxes.

[ ___ ,h] = crosscorr( ___ ) plots the sample XCF between the input series and additionally returns handles to plotted graphics objects. Use elements of h to modify properties of the plot after you create it.
Examples
Compute XCF Between Vectors of Time Series Data
Compute the XCF between two univariate time series. Input the time series data as numeric vectors.
Load the equity index data Data_EquityIdx.mat. The variable Data is a 3028-by-2 matrix of daily closing prices from the NASDAQ and NYSE composite indices. Plot the two series.
load Data_EquityIdx
yyaxis left
dt = datetime(dates,ConvertFrom="datenum");
plot(dt,Data(:,1))
ylabel("NASDAQ")
yyaxis right
plot(dt,Data(:,2))
ylabel("NYSE")
title("Daily Closing Prices, 1990-2001")

The series exhibit exponential growth.
Compute the returns of each series.
Ret = price2ret(Data);

Ret is a 3027-by-2 series of returns; it has one less observation than Data.
Compute the XCF between the NASDAQ and NYSE returns, and return the associated lags.
rnasdaq = Ret(:,1);
rnyse = Ret(:,2);
[xcf,lags] = crosscorr(rnasdaq,rnyse);

xcf and lags are 41-by-1 vectors that describe the XCF.
Display several values of the XCF.
XCF = [xcf lags];
XCF([1:3 20:22 end-2:end],:)

ans = 9×2

   -0.0108  -20.0000
    0.0186  -19.0000
   -0.0002  -18.0000
    0.0345   -1.0000
    0.7080         0
    0.0651    1.0000
   -0.0461   18.0000
    0.0010   19.0000
    0.0015   20.0000

The correlation between the current NASDAQ return and the NYSE return from 20 days before is xcf(1) = -0.0108. The contemporaneous (lag 0) correlation between the NASDAQ and NYSE returns is xcf(21) = 0.7080. The correlation between the NASDAQ return from 20 days ago and the current NYSE return is xcf(41) = 0.0015.
Compute XCF of Table Variable
Compute the XCF between two univariate time series, which are two variables in a table.
Load the equity index data Data_EquityIdx.mat. The variable DataTable is a 3028-by-2 table of daily closing prices from the NYSE and NASDAQ composite indices, which are stored in the variables NYSE and NASDAQ.
load Data_EquityIdx
DataTable.Properties.VariableNames

ans = 1x2 cell
    {'NYSE'}    {'NASDAQ'}

Compute the returns of the series. Store the results in a new table.
RetTbl = price2ret(DataTable);
head(RetTbl)

    Tick    Interval       NYSE         NASDAQ
    ____    ________    __________    __________
     2         1        -0.0010106     0.0034122
     3         1        -0.0076633    -0.0032816
     4         1        -0.0084415    -0.0025501
     5         1         0.0035387     0.0010688
     6         1         -0.010188    -0.0042382
     7         1        -0.0063818     -0.013378
     8         1         0.0034295    -0.0040909
     9         1         -0.023407     -0.020573

RetTbl is a 3027-by-4 table containing the returns of the indices, ticks (days by default), and time intervals between successive prices.
Compute the XCF between the NASDAQ and NYSE return series.
XCFTbl = crosscorr(RetTbl)

XCFTbl=41×2 table
    Lags        XCF
    ____    ___________
    -20       -0.010809
    -19        0.018571
    -18     -0.00016185
    -17       -0.020271
    -16       -0.029353
    -15      0.00023188
    -14      -0.0080616
    -13        0.041498
    -12        0.078821
    -11       -0.013793
    -10       0.0076655
     -9         0.01763
     -8      -0.0011033
     -7       -0.011457
     -6       -0.016523
     -5       -0.046749
      ⋮

crosscorr returns the results in the table XCFTbl, where variables correspond to the XCF (XCF) and associated lags (Lags).
By default, crosscorr computes the XCF of the last two variables in the table. To select variables from an input table, set the DataVariables option.
Return XCF Confidence Bounds
Consider the equity index series in “Compute XCF of Table Variable” on page 12-381. Load the NYSE and NASDAQ closing price series in Data_EquityIdx.mat and preprocess the series. Compute the XCF and return the XCF confidence bounds.
load Data_EquityIdx
RetTbl = price2ret(DataTable);
[XCFTbl,bounds] = crosscorr(RetTbl)

XCFTbl=41×2 table
    Lags        XCF
    ____    ___________
    -20       -0.010809
    -19        0.018571
    -18     -0.00016185
    -17       -0.020271
    -16       -0.029353
    -15      0.00023188
    -14      -0.0080616
    -13        0.041498
    -12        0.078821
    -11       -0.013793
    -10       0.0076655
     -9         0.01763
     -8      -0.0011033
     -7       -0.011457
     -6       -0.016523
     -5       -0.046749
      ⋮

bounds = 2×1

    0.0364
   -0.0364
Assuming the NYSE and NASDAQ return series are uncorrelated, an approximate 95.4% confidence interval on the XCF is (-0.0364, 0.0364).
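As a quick arithmetic check of the numbers above, the following sketch assumes that the standard error of a sample cross-correlation between uncorrelated series is approximately 1/sqrt(T). That large-sample form is an assumption made here for illustration; it is consistent with the displayed bounds for T = 3027 returns, but it is not taken from the crosscorr source code.

```matlab
% Illustrative check (base MATLAB only); the 1/sqrt(T) standard error is an
% assumed large-sample approximation, not taken from the crosscorr source.
T = 3027;                                  % number of return observations
numSTD = 2;                                % default number of standard errors
sigma = 1/sqrt(T);                         % assumed standard error under no correlation
bounds = [numSTD*sigma; -numSTD*sigma];    % upper and lower bounds

% Normal coverage probability of an interval of +/- numSTD standard errors
coverage = erf(numSTD/sqrt(2));
```

Here bounds agrees with the displayed ±0.0364 after rounding to four decimal places, and coverage is approximately 0.954, which is where the 95.4% figure comes from. Setting numSTD to 1.96 gives coverage of approximately 0.95.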
Plot the XCF
Generate 100 random variates from a Gaussian distribution with mean 0 and variance 1.
rng(3); % For reproducibility
x = randn(100,1);

Create a 4-period delayed version of x.
y = lagmatrix(x,4);

Plot the XCF between x and y. Because lagmatrix prepends lagged series with NaN values and crosscorr does not support NaN values, start the series at observation 5.
crosscorr(x(5:end),y(5:end))
The upper and lower confidence bounds are the horizontal lines in the XCF plot. By design, the XCF peaks at lag 4.
Select Table Variables for XCF Plot
Load the currency exchange rates data set Data_FXRates.mat. The table DataTable contains daily exchange rates of several countries, relative to the US dollar from 1980 through 1998 (with omissions).
load Data_FXRates.mat
dt = datetime(dates,ConvertFrom="datenum");

Plot the UK pound and French franc exchange rates.
yyaxis left
plot(dt,DataTable.GBP)
ylabel("UK Pound/$")
yyaxis right
plot(dt,DataTable.FRF)
ylabel("French Franc/$")

The series appear to be correlated.
Stabilize all series in the table by computing the first difference.
DiffDT = varfun(@diff,DataTable);
DiffDT.Properties.VariableNames = DataTable.Properties.VariableNames;

Determine whether lags of one series are associated with the other series by computing the XCF between the daily changes in the UK pound and French franc exchange rates.
figure
crosscorr(DiffDT,DataVariables=["GBP" "FRF"]);
The series have a high contemporaneous correlation, but all other cross-correlations are either insignificant or below 0.1.
Specify Additional XCF Lags
Specify the AR(1) model for the first series

$y_{1,t} = 2 + 0.3y_{1,t-1} + \varepsilon_t,$

where $\varepsilon_t$ is Gaussian with mean 0 and variance 1.
MdlY1 = arima(AR=0.3,Constant=2,Variance=1);

MdlY1 is a fully specified arima object representing the AR(1) model.
Simulate data from the AR(1) model.
rng(3); % For reproducibility
T = 1000;
y1 = simulate(MdlY1,T);

Simulate standard Gaussian variates for the second series; induce correlation at lag 36.
y2 = [randn(36,1); y1(1:end-36) + randn(T-36,1)*0.1];

Plot the XCF by using the default settings.
crosscorr(y1,y2)

All correlations in the plot are within the 2-standard-error confidence bounds. Therefore, none are significant.
Plot the XCF for 60 lags on both sides of lag 0. Specify 3 standard errors for the confidence bounds.
crosscorr(y1,y2,NumLags=60,NumSTD=3)
The plot shows significant correlations at and around lag 36.
Input Arguments
y1 — Univariate time series data
numeric vector
Univariate time series data, specified as a numeric vector of length T1.
Data Types: double

y2 — Univariate time series data
numeric vector
Univariate time series data, specified as a numeric vector of length T2.
Data Types: double

Tbl — Time series data
table | timetable
Time series data, specified as a table or timetable with T rows. Each row of Tbl contains contemporaneous observations of all variables.
Specify the two input series (variables) by using the DataVariables argument. The selected variables must be numeric.

ax — Axes on which to plot
Axes object
Axes on which to plot, specified as an Axes object.
By default, crosscorr plots to the current axes (gca).

Note Missing observations, specified by NaN entries in the input series, result in a NaN-valued XCF.

Name-Value Arguments
Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter.
Before R2021a, use commas to separate each name and value, and enclose Name in quotes.
Example: crosscorr(Tbl,DataVariables=["RGDP" "CPI"],NumLags=10,NumSTD=1.96) returns the sample XCF for lags -10 through 10 of the table variables "RGDP" and "CPI" in Tbl and 95% confidence bounds.

NumLags — Number of lags
positive integer
Number of lags in the sample XCF, specified as a positive integer. crosscorr uses lags 0, ±1, ±2, …, ±NumLags to compute the sample XCF.
If you supply y1 and y2, the default is min(20, min(T1,T2) – 1). If you supply Tbl, the default is min(20, T – 1).
Example: crosscorr(y1,y2,NumLags=10) plots the sample XCF between y1 and y2 for lags –10 through 10.
Data Types: double

NumSTD — Number of standard errors in confidence bounds
2 (default) | nonnegative scalar
Number of standard errors in the confidence bounds, specified as a nonnegative scalar. The confidence bounds are 0 ± NumSTD*σ, where σ is the estimated standard error of the sample cross-correlation between the input series assuming the series are uncorrelated.
The default yields approximate 95% confidence bounds.
Example: crosscorr(y1,y2,NumSTD=1.5) plots the XCF of y1 and y2 with confidence bounds 1.5 standard errors away from 0.
Data Types: double

DataVariables — Two variables in Tbl
last two variables (default) | string vector | cell vector of character vectors | vector of integers | logical vector
Two variables in Tbl for which crosscorr computes the XCF, specified as a string vector or cell vector of character vectors containing two variable names in Tbl.Properties.VariableNames, or an integer or logical vector representing the indices of two names. The selected variables must be numeric.
Example: DataVariables=["GDP" "CPI"]
Example: DataVariables=[true true false false] or DataVariables=[1 2] selects the first and second table variables.
Data Types: double | logical | char | string
Output Arguments
xcf — Sample XCF
numeric vector
Sample XCF between the input time series, returned as a numeric vector of length 2*NumLags + 1.
The elements of xcf correspond to the elements of lags. The center element is the lag 0 cross-correlation.
crosscorr returns xcf only when you supply the inputs y1 and y2.

lags — XCF lags
numeric vector
XCF lags, returned as a numeric vector with elements (-NumLags):NumLags having the same orientation as y1.
crosscorr returns lags only when you supply the inputs y1 and y2.

XCFTbl — Sample XCF
table
Sample XCF, returned as a table with variables for the outputs xcf and lags.
crosscorr returns XCFTbl only when you supply the input Tbl.

bounds — Approximate upper and lower XCF confidence bounds
numeric vector
Approximate upper and lower XCF confidence bounds assuming the input series are uncorrelated, returned as a two-element numeric vector. The NumSTD option specifies the number of standard errors from 0 in the confidence bounds.

h — Handles to plotted graphics objects
graphics array
Handles to plotted graphics objects, returned as a graphics array. h contains unique plot identifiers, which you can use to query or modify properties of the plot.
More About
Cross-Correlation Function
The cross-correlation function (XCF) measures the similarity between a time series and lagged versions of another time series as a function of the lag. Consider the time series y1,t and y2,t and lags k = 0, ±1, ±2, …. For data pairs (y1,1,y2,1), (y1,2,y2,2), …, (y1,T,y2,T), an estimate of the lag k cross-covariance is

$$c_{y_1 y_2}(k) = \begin{cases} \dfrac{1}{T}\displaystyle\sum_{t=1}^{T-k} \left(y_{1,t} - \bar{y}_1\right)\left(y_{2,t+k} - \bar{y}_2\right); & k = 0, 1, 2, \ldots \\[2ex] \dfrac{1}{T}\displaystyle\sum_{t=1}^{T+k} \left(y_{2,t} - \bar{y}_2\right)\left(y_{1,t-k} - \bar{y}_1\right); & k = 0, -1, -2, \ldots \end{cases}$$

where $\bar{y}_1$ and $\bar{y}_2$ are the sample means of the series. The sample standard deviations of the series are:
• $s_{y_1} = \sqrt{c_{y_1 y_1}(0)}$, where $c_{y_1 y_1}(0) = \mathrm{Var}(y_1)$.
• $s_{y_2} = \sqrt{c_{y_2 y_2}(0)}$, where $c_{y_2 y_2}(0) = \mathrm{Var}(y_2)$.
An estimate of the cross-correlation is

$$r_{y_1 y_2}(k) = \frac{c_{y_1 y_2}(k)}{s_{y_1} s_{y_2}}; \quad k = 0, \pm 1, \pm 2, \ldots$$
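The estimator above can be sketched in base MATLAB by direct summation. This is only an illustration of the formulas, not the code that crosscorr runs; the series y1 and y2 and the value of numLags are hypothetical.

```matlab
% Sketch: sample XCF by direct summation (illustrative, not crosscorr itself).
y1 = [2; 4; 1; 5; 3; 6; 2; 7];       % hypothetical series
y2 = [1; 3; 2; 6; 4; 5; 3; 8];       % hypothetical series of the same length
T = numel(y1);
numLags = 3;                          % hypothetical maximum lag
lags = (-numLags:numLags)';
d1 = y1 - mean(y1);                   % deviations from the sample means
d2 = y2 - mean(y2);
c = zeros(size(lags));
for i = 1:numel(lags)
    k = lags(i);
    if k >= 0
        % (1/T) * sum over t = 1..T-k of d1(t)*d2(t+k)
        c(i) = sum(d1(1:T-k) .* d2(1+k:T)) / T;
    else
        % (1/T) * sum over t = 1..T+k of d2(t)*d1(t-k)
        c(i) = sum(d2(1:T+k) .* d1(1-k:T)) / T;
    end
end
s1 = sqrt(sum(d1.^2)/T);              % square root of the lag 0 autocovariance of y1
s2 = sqrt(sum(d2.^2)/T);              % square root of the lag 0 autocovariance of y2
xcf = c / (s1*s2);                    % cross-correlation at lags -numLags..numLags
```

The center element xcf(numLags + 1) is the ordinary sample correlation between the two series, which matches the statement that the center element is the lag 0 cross-correlation.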
Algorithms
• If y1 and y2 have different lengths, crosscorr appends enough zeros to the end of the shorter vector to make both vectors the same size.
• crosscorr uses a Fourier transform (fft) to compute the XCF in the frequency domain, and then crosscorr converts back to the time domain using an inverse Fourier transform (ifft).
• NaN values in the input series result in NaN values in the output XCF. Unlike autocorr and parcorr, crosscorr does not treat NaN values as missing completely at random. Whereas autocorr and parcorr compute coefficients in the time domain, crosscorr uses fft and ifft to compute coefficients in the frequency domain. Therefore, missing data treatments follow fft and ifft defaults.
• crosscorr plots the XCF when you do not request any output or when you request the fourth output.
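The frequency-domain idea can be sketched with base MATLAB fft and ifft. This is a minimal sketch under the assumption of equal-length series, not the internal implementation of crosscorr; the data, numLags, and the padding length n are illustrative choices.

```matlab
% Sketch: cross-covariance through fft/ifft (illustrative only).
y1 = [2; 4; 1; 5; 3; 6; 2; 7];        % hypothetical series
y2 = [1; 3; 2; 6; 4; 5; 3; 8];        % hypothetical series of the same length
T = numel(y1);
numLags = 3;                           % hypothetical maximum lag
d1 = y1 - mean(y1);
d2 = y2 - mean(y2);
n = 2^nextpow2(2*T - 1);               % zero-pad so circular wrap-around cannot mix lags
F1 = fft(d1,n);
F2 = fft(d2,n);
r = real(ifft(conj(F1) .* F2));        % r(k+1) = sum over t of d1(t)*d2(t+k), k >= 0
% Negative lags sit at the end of r; reorder to lags -numLags..numLags.
cFFT = [r(n-numLags+1:n); r(1:numLags+1)] / T;
xcfFFT = cFFT / sqrt(sum(d1.^2)/T * sum(d2.^2)/T);
```

Up to floating-point rounding, cFFT agrees with the direct sums in the cross-covariance definition.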
Version History Introduced before R2006a
References [1] Box, George E. P., Gwilym M. Jenkins, and Gregory C. Reinsel. Time Series Analysis: Forecasting and Control. 3rd ed. Englewood Cliffs, NJ: Prentice Hall, 1994.
See Also autocorr | parcorr | filter
diffuseblm Bayesian linear regression model with diffuse conjugate prior for data likelihood
Description
The Bayesian linear regression model on page 12-401 object diffuseblm specifies that the joint prior distribution of (β,σ2) is proportional to 1/σ2 (the diffuse prior model). The data likelihood is $\prod_{t=1}^{T} \phi\left(y_t; x_t\beta, \sigma^2\right)$, where ϕ(yt;xtβ,σ2) is the Gaussian probability density evaluated at yt with mean xtβ and variance σ2. The resulting marginal and conditional posterior distributions are analytically tractable. For details on the posterior distribution, see “Analytically Tractable Posteriors” on page 6-5.
In general, when you create a Bayesian linear regression model object, it specifies the joint prior distribution and characteristics of the linear regression model only. That is, the model object is a template intended for further use. Specifically, to incorporate data into the model for posterior distribution analysis, pass the model object and data to the appropriate object function on page 12-393.
Creation Syntax PriorMdl = diffuseblm(NumPredictors) PriorMdl = diffuseblm(NumPredictors,Name,Value) Description PriorMdl = diffuseblm(NumPredictors) creates a Bayesian linear regression model on page 12-401 object (PriorMdl) composed of NumPredictors predictors and an intercept, and sets the NumPredictors property. The joint prior distribution of (β, σ2) is the diffuse model. PriorMdl is a template that defines the prior distributions and the dimensionality of β. PriorMdl = diffuseblm(NumPredictors,Name,Value) sets properties on page 12-392 (except NumPredictors) using name-value pair arguments. Enclose each property name in quotes. For example, diffuseblm(2,'VarNames',["UnemploymentRate"; "CPI"]) specifies the names of the two predictor variables in the model.
Properties You can set writable property values when you create the model object by using name-value pair argument syntax, or after you create the model object by using dot notation. For example, to exclude an intercept from the model, enter PriorMdl.Intercept = false;
NumPredictors — Number of predictor variables
nonnegative integer
Number of predictor variables in the Bayesian multiple linear regression model, specified as a nonnegative integer. NumPredictors must be the same as the number of columns in your predictor data, which you specify during model estimation or simulation. When specifying NumPredictors, exclude any intercept term from the value. After creating a model, if you change the value of NumPredictors using dot notation, then VarNames reverts to its default value.
Data Types: double

Intercept — Flag for including regression model intercept
true (default) | false
Flag for including a regression model intercept, specified as a value in this table.
Value | Description
false | Exclude an intercept from the regression model. Therefore, β is a p-dimensional vector, where p is the value of NumPredictors.
true  | Include an intercept in the regression model. Therefore, β is a (p + 1)-dimensional vector. This specification causes a T-by-1 vector of ones to be prepended to the predictor data during estimation and simulation.
If you include a column of ones in the predictor data for an intercept term, then set Intercept to false.
Example: 'Intercept',false
Data Types: logical

VarNames — Predictor variable names
string vector | cell vector of character vectors
Predictor variable names for displays, specified as a string vector or cell vector of character vectors. VarNames must contain NumPredictors elements. VarNames(j) is the name of the variable in column j of the predictor data set, which you specify during estimation, simulation, or forecasting. The default is {'Beta(1)','Beta(2)',...,'Beta(p)'}, where p is the value of NumPredictors.
Example: 'VarNames',["UnemploymentRate"; "CPI"]
Data Types: string | cell | char
Object Functions
estimate  | Estimate posterior distribution of Bayesian linear regression model parameters
simulate  | Simulate regression coefficients and disturbance variance of Bayesian linear regression model
forecast  | Forecast responses of Bayesian linear regression model
plot      | Visualize prior and posterior densities of Bayesian linear regression model parameters
summarize | Distribution summary statistics of standard Bayesian linear regression model
Examples
Create Diffuse Prior Model
Consider the multiple linear regression model that predicts U.S. real gross national product (GNPR) using a linear combination of industrial production index (IPI), total employment (E), and real wages (WR).
GNPRt = β0 + β1IPIt + β2Et + β3WRt + εt.
For all t time points, εt is a series of independent Gaussian disturbances with a mean of 0 and variance σ2.
Suppose that the regression coefficients β = [β0,...,β3]′ and the disturbance variance σ2 are random variables, and you have no prior knowledge of their values or distribution. That is, you want to use the noninformative Jeffreys prior: the joint prior distribution is proportional to 1/σ2. These assumptions and the data likelihood imply an analytically tractable posterior distribution.
Create a diffuse prior model for the linear regression parameters, which is the default prior model type. Specify the number of predictors p.
p = 3;
Mdl = bayeslm(p)
Mdl =
  diffuseblm with properties:
    NumPredictors: 3
        Intercept: 1
         VarNames: {4x1 cell}

           | Mean  Std      CI95      Positive       Distribution
-----------------------------------------------------------------------------
 Intercept |  0    Inf  [ NaN, NaN]    0.500    Proportional to one
 Beta(1)   |  0    Inf  [ NaN, NaN]    0.500    Proportional to one
 Beta(2)   |  0    Inf  [ NaN, NaN]    0.500    Proportional to one
 Beta(3)   |  0    Inf  [ NaN, NaN]    0.500    Proportional to one
 Sigma2    | Inf   Inf  [ NaN, NaN]    1.000    Proportional to 1/Sigma2

Mdl is a diffuseblm Bayesian linear regression model object representing the prior distribution of the regression coefficients and disturbance variance. At the command window, bayeslm displays a summary of the prior distributions. Because the prior is noninformative and the data have not been incorporated yet, the summary is trivial.
You can set writable property values of created models using dot notation. Set the regression coefficient names to the corresponding variable names.
Mdl.VarNames = ["IPI" "E" "WR"]
Mdl =
  diffuseblm with properties:
    NumPredictors: 3
        Intercept: 1
         VarNames: {4x1 cell}

           | Mean  Std      CI95      Positive       Distribution
-----------------------------------------------------------------------------
 Intercept |  0    Inf  [ NaN, NaN]    0.500    Proportional to one
 IPI       |  0    Inf  [ NaN, NaN]    0.500    Proportional to one
 E         |  0    Inf  [ NaN, NaN]    0.500    Proportional to one
 WR        |  0    Inf  [ NaN, NaN]    0.500    Proportional to one
 Sigma2    | Inf   Inf  [ NaN, NaN]    1.000    Proportional to 1/Sigma2
Estimate Marginal Posterior Distributions
Consider the linear regression model in “Create Diffuse Prior Model” on page 12-394.
Create a diffuse prior model for the linear regression parameters. Specify the number of predictors, p, and the names of the regression coefficients.
p = 3;
PriorMdl = bayeslm(p,'ModelType','diffuse','VarNames',["IPI" "E" "WR"]);
Load the Nelson-Plosser data set. Create variables for the response and predictor series.
load Data_NelsonPlosser
X = DataTable{:,PriorMdl.VarNames(2:end)};
y = DataTable{:,'GNPR'};
Estimate the marginal posterior distributions of β and σ2.
PosteriorMdl = estimate(PriorMdl,X,y);
Method: Analytic posterior distributions
Number of observations: 62
Number of predictors:   4

           |   Mean      Std           CI95        Positive       Distribution
------------------------------------------------------------------------------------
 Intercept | -24.2536   9.5314  [-43.001, -5.506]    0.006   t (-24.25, 9.37^2, 58)
 IPI       |   4.3913   0.1535  [ 4.089,  4.693]     1.000   t (4.39, 0.15^2, 58)
 E         |   0.0011   0.0004  [ 0.000,  0.002]     0.999   t (0.00, 0.00^2, 58)
 WR        |   2.4682   0.3787  [ 1.723,  3.213]     1.000   t (2.47, 0.37^2, 58)
 Sigma2    |  51.9790  10.0034  [35.965, 74.937]     1.000   IG(29.00, 0.00069)
PosteriorMdl is a conjugateblm model object storing the joint marginal posterior distribution of β and σ2 given the data. estimate displays a summary of the marginal posterior distributions to the command window. Rows of the summary correspond to regression coefficients and the disturbance variance, and columns to characteristics of the posterior distribution. The characteristics include: • CI95, which contains the 95% Bayesian equitailed credible intervals for the parameters. For example, the posterior probability that the regression coefficient of WR is in [1.723, 3.213] is 0.95. • Positive, which contains the posterior probability that the parameter is greater than 0. For example, the probability that the intercept is greater than 0 is 0.006. • Distribution, which contains descriptions of the posterior distributions of the parameters. For example, the marginal posterior distribution of IPI is t with a mean of 4.39, a standard deviation of 0.15, and 58 degrees of freedom. Access properties of the posterior distribution using dot notation. For example, display the marginal posterior means by accessing the Mu property. PosteriorMdl.Mu ans = 4×1 -24.2536 4.3913 0.0011 2.4682
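The quantities in the summary can be related to the textbook result for a regression under the prior proportional to 1/σ2: the marginal posterior of β is a multivariate t centered at the least-squares estimate with T − (p + 1) degrees of freedom, and σ2 has an inverse gamma posterior with shape equal to half the degrees of freedom. The following base MATLAB sketch applies those formulas to synthetic data. It is not the code inside estimate, and X, y, and the coefficient values are hypothetical.

```matlab
% Sketch: analytic posterior quantities under the diffuse prior (synthetic data).
T = 62;  p = 3;                               % sizes chosen to mirror the example
t = (1:T)';
X = [ones(T,1) sin(t) cos(2*t) mod(t,5)];     % hypothetical predictors plus intercept column
y = X*[1; 2; -1; 0.5] + 0.7*sin(7*t);         % hypothetical responses (deterministic "noise")

betaHat  = X\y;                               % least-squares fit = posterior mean of beta
resid    = y - X*betaHat;
dof      = T - (p + 1);                       % degrees of freedom of the marginal t
s2       = (resid'*resid)/dof;
scaleCov = s2*((X'*X)\eye(p + 1));            % squared scale matrix of the multivariate t
postStd  = sqrt(diag(scaleCov)*dof/(dof - 2)); % posterior standard deviations of beta
igShape  = dof/2;                             % inverse gamma shape for sigma^2
sigma2Mean = (resid'*resid)/(dof - 2);        % posterior mean of sigma^2
```

With 62 observations and 4 coefficients, dof is 58 and igShape is 29, which is consistent with the t (…, 58) and IG(29.00, …) entries in the display. The posterior standard deviation exceeds the t scale by the factor sqrt(dof/(dof − 2)), which is why the Std column is slightly larger than the scale shown in the Distribution column.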
Estimate Conditional Posterior Distribution
Consider the linear regression model in “Create Diffuse Prior Model” on page 12-394.
Create a diffuse prior model for the linear regression parameters. Specify the number of predictors p, and the names of the regression coefficients.
p = 3;
PriorMdl = bayeslm(p,'ModelType','diffuse','VarNames',["IPI" "E" "WR"])
PriorMdl =
  diffuseblm with properties:
    NumPredictors: 3
        Intercept: 1
         VarNames: {4x1 cell}

           | Mean  Std      CI95      Positive       Distribution
-----------------------------------------------------------------------------
 Intercept |  0    Inf  [ NaN, NaN]    0.500    Proportional to one
 IPI       |  0    Inf  [ NaN, NaN]    0.500    Proportional to one
 E         |  0    Inf  [ NaN, NaN]    0.500    Proportional to one
 WR        |  0    Inf  [ NaN, NaN]    0.500    Proportional to one
 Sigma2    | Inf   Inf  [ NaN, NaN]    1.000    Proportional to 1/Sigma2

Load the Nelson-Plosser data set. Create variables for the response and predictor series.
load Data_NelsonPlosser
X = DataTable{:,PriorMdl.VarNames(2:end)};
y = DataTable{:,'GNPR'};
Estimate the conditional posterior distribution of β given the data and σ2 = 2, and return the estimation summary table to access the estimates.
[Mdl,Summary] = estimate(PriorMdl,X,y,'Sigma2',2);
Method: Analytic posterior distributions
Conditional variable: Sigma2 fixed at 2
Number of observations: 62
Number of predictors:   4

           |   Mean      Std           CI95         Positive     Distribution
--------------------------------------------------------------------------------
 Intercept | -24.2536  1.8696  [-27.918, -20.589]    0.000   N (-24.25, 1.87^2)
 IPI       |   4.3913  0.0301  [ 4.332,  4.450]      1.000   N (4.39, 0.03^2)
 E         |   0.0011  0.0001  [ 0.001,  0.001]      1.000   N (0.00, 0.00^2)
 WR        |   2.4682  0.0743  [ 2.323,  2.614]      1.000   N (2.47, 0.07^2)
 Sigma2    |   2       0       [ 2.000,  2.000]      1.000   Fixed value
estimate displays a summary of the conditional posterior distribution of β. Because σ2 is fixed at 2 during estimation, inferences on it are trivial.
Extract the mean vector and covariance matrix of the conditional posterior of β from the estimation summary table.
condPostMeanBeta = Summary.Mean(1:(end - 1))
condPostMeanBeta = 4×1
  -24.2536
    4.3913
    0.0011
    2.4682
CondPostCovBeta = Summary.Covariances(1:(end - 1),1:(end - 1))
CondPostCovBeta = 4×4
    3.4956    0.0350   -0.0001    0.0241
    0.0350    0.0009   -0.0000   -0.0013
   -0.0001   -0.0000    0.0000   -0.0000
    0.0241   -0.0013   -0.0000    0.0055
Display Mdl.
Mdl
Mdl =
  diffuseblm with properties:
    NumPredictors: 3
        Intercept: 1
         VarNames: {4x1 cell}

           | Mean  Std      CI95      Positive       Distribution
-----------------------------------------------------------------------------
 Intercept |  0    Inf  [ NaN, NaN]    0.500    Proportional to one
 IPI       |  0    Inf  [ NaN, NaN]    0.500    Proportional to one
 E         |  0    Inf  [ NaN, NaN]    0.500    Proportional to one
 WR        |  0    Inf  [ NaN, NaN]    0.500    Proportional to one
 Sigma2    | Inf   Inf  [ NaN, NaN]    1.000    Proportional to 1/Sigma2
Because estimate computes the conditional posterior distribution, it returns the original prior model, not the posterior, in the first position of the output argument list.
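For intuition about the conditional summary, the standard result under the diffuse prior is that β given σ2 and the data is Gaussian with mean equal to the least-squares estimate and covariance σ2(X′X)⁻¹. The base MATLAB sketch below applies that formula to synthetic data. It is an illustration of the formula, not the toolbox implementation, and X and y are hypothetical.

```matlab
% Sketch: conditional posterior of beta given sigma^2 (synthetic data).
T = 62;
t = (1:T)';
X = [ones(T,1) sin(t) cos(2*t) mod(t,5)];     % hypothetical predictors plus intercept column
y = X*[1; 2; -1; 0.5] + 0.7*sin(7*t);         % hypothetical responses
sigma2 = 2;                                   % value at which sigma^2 is held fixed

condMean = X\y;                               % conditional posterior mean of beta
condCov  = sigma2*((X'*X)\eye(size(X,2)));    % conditional posterior covariance
condStd  = sqrt(diag(condCov));
ci95 = [condMean - 1.96*condStd, condMean + 1.96*condStd];   % equitailed 95% intervals
```

The same arithmetic reproduces the displayed interval for the intercept: −24.2536 ± 1.96 × 1.8696 gives approximately [−27.918, −20.589].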
Estimate Posterior Probability Using Monte Carlo Simulation Consider the linear regression model in “Estimate Marginal Posterior Distributions” on page 12-395. Create a prior model for the regression coefficients and disturbance variance, then estimate the marginal posterior distributions. p = 3; PriorMdl = bayeslm(p,'ModelType','diffuse','VarNames',["IPI" "E" "WR"]); load Data_NelsonPlosser X = DataTable{:,PriorMdl.VarNames(2:end)}; y = DataTable{:,'GNPR'}; PosteriorMdl = estimate(PriorMdl,X,y); Method: Analytic posterior distributions Number of observations: 62 Number of predictors: 4 | Mean Std CI95 Positive Distribution -----------------------------------------------------------------------------------Intercept | -24.2536 9.5314 [-43.001, -5.506] 0.006 t (-24.25, 9.37^2, 58) IPI | 4.3913 0.1535 [ 4.089, 4.693] 1.000 t (4.39, 0.15^2, 58) E | 0.0011 0.0004 [ 0.000, 0.002] 0.999 t (0.00, 0.00^2, 58) WR | 2.4682 0.3787 [ 1.723, 3.213] 1.000 t (2.47, 0.37^2, 58) Sigma2 | 51.9790 10.0034 [35.965, 74.937] 1.000 IG(29.00, 0.00069)
Extract the posterior mean of β from the posterior model, and the posterior covariance of β from the estimation summary returned by summarize. estBeta = PosteriorMdl.Mu; Summary = summarize(PosteriorMdl); estBetaCov = Summary.Covariances{1:(end - 1),1:(end - 1)};
Suppose that if the coefficient of real wages is below 2.5, then a policy is enacted. Although the posterior distribution of WR is known, and you can calculate probabilities directly, you can estimate the probability using Monte Carlo simulation instead.
Draw 1e6 samples from the marginal posterior distribution of β.
NumDraws = 1e6;
rng(1);
BetaSim = simulate(PosteriorMdl,'NumDraws',NumDraws);
BetaSim is a 4-by-1e6 matrix containing the draws. Rows correspond to the regression coefficients and columns to successive draws.
Isolate the draws corresponding to the coefficient of real wages, and then identify which draws are less than 2.5.
isWR = PosteriorMdl.VarNames == "WR";
wrSim = BetaSim(isWR,:);
isWRLT2p5 = wrSim < 2.5;
Find the marginal posterior probability that the regression coefficient of WR is below 2.5 by computing the proportion of draws that are less than 2.5.
probWRLT2p5 = mean(isWRLT2p5)
probWRLT2p5 = 0.5341
The posterior probability that the coefficient of real wages is less than 2.5 is about 0.53.
The marginal posterior distribution of the coefficient of WR is a t58, but centered at 2.47 and scaled by 0.37. Directly compute the posterior probability that the coefficient of WR is less than 2.5.
center = estBeta(isWR);
stdBeta = sqrt(diag(estBetaCov));
scale = stdBeta(isWR);
t = (2.5 - center)/scale;
dof = 58; % Degrees of freedom shown in the estimation summary
directProb = tcdf(t,dof)
directProb = 0.5333
The posterior probabilities are nearly identical.
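The comparison can also be sketched without any toolbox function. The block below builds location-scale Student t draws from normal draws and evaluates the t CDF through the regularized incomplete beta function betainc. The values of center, scale, and dof are the rounded numbers from the Distribution column, numDraws is an arbitrary choice, and the draws are not seeded, so the Monte Carlo estimate varies slightly from run to run.

```matlab
% Sketch: Monte Carlo versus direct probability for a location-scale t (base MATLAB).
center = 2.47;  scale = 0.37;  dof = 58;    % rounded values from the Distribution column
threshold = 2.5;
numDraws = 2e4;                             % hypothetical number of draws

% Student t draws built from normal draws: Z ./ sqrt(chi2/dof)
z = randn(1,numDraws);
chi2 = sum(randn(dof,numDraws).^2,1);
betaDraws = center + scale*z./sqrt(chi2/dof);
mcProb = mean(betaDraws < threshold);       % proportion of draws below the threshold

% Student t CDF through the regularized incomplete beta function
tval = (threshold - center)/scale;
tail = 0.5*betainc(dof/(dof + tval^2),dof/2,0.5);   % one-sided tail probability
directProb = 0.5 + sign(tval)*(0.5 - tail);
```

Because this sketch uses the rounded center and the t scale rather than the unrounded posterior mean and standard deviation, its result is close to, but not identical to, the probabilities reported in the example.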
Forecast Responses Using Posterior Predictive Distribution
Consider the linear regression model in “Estimate Marginal Posterior Distributions” on page 12-395.
Create a prior model for the regression coefficients and disturbance variance, then estimate the marginal posterior distributions. Hold out the last 10 periods of data from estimation so you can use them to forecast real GNP. Turn the estimation display off.
p = 3;
PriorMdl = bayeslm(p,'ModelType','diffuse','VarNames',["IPI" "E" "WR"]);
load Data_NelsonPlosser
fhs = 10; % Forecast horizon size
X = DataTable{1:(end - fhs),PriorMdl.VarNames(2:end)};
y = DataTable{1:(end - fhs),'GNPR'};
XF = DataTable{(end - fhs + 1):end,PriorMdl.VarNames(2:end)}; % Future predictor data
yFT = DataTable{(end - fhs + 1):end,'GNPR'}; % True future responses
PosteriorMdl = estimate(PriorMdl,X,y,'Display',false);
Forecast responses using the posterior predictive distribution and the future predictor data XF. Plot the true values of the response and the forecasted values.
yF = forecast(PosteriorMdl,XF);
figure;
plot(dates,DataTable.GNPR);
hold on
plot(dates((end - fhs + 1):end),yF)
h = gca;
hp = patch([dates(end - fhs + 1) dates(end) dates(end) dates(end - fhs + 1)],...
    h.YLim([1,1,2,2]),[0.8 0.8 0.8]);
uistack(hp,'bottom');
legend('Forecast Horizon','True GNPR','Forecasted GNPR','Location','NW')
title('Real Gross National Product');
ylabel('rGNP');
xlabel('Year');
hold off
yF is a 10-by-1 vector of future values of real GNP corresponding to the future predictor data.
Estimate the forecast root mean squared error (RMSE).
frmse = sqrt(mean((yF - yFT).^2))
frmse = 25.5489
The forecast RMSE is a relative measure of forecast accuracy. Specifically, you estimate several models using different assumptions. The model with the lowest forecast RMSE is the best-performing model of the ones being compared. Copyright 2018 The MathWorks, Inc.
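As a small illustration of using forecast RMSE as a relative measure, the sketch below compares two sets of forecasts against the same held-out responses. All numbers and the helper name forecastRMSE are hypothetical.

```matlab
% Sketch: rank competing forecasts by RMSE (hypothetical numbers).
yTrue = [10; 12; 11; 13];                  % held-out responses
yF1   = [ 9; 12; 12; 14];                  % forecasts from a first candidate model
yF2   = [10; 11; 11; 12];                  % forecasts from a second candidate model
forecastRMSE = @(f) sqrt(mean((f - yTrue).^2));
allRMSE = [forecastRMSE(yF1) forecastRMSE(yF2)];
[~,best] = min(allRMSE);                   % index of the model with the lowest RMSE
```

The RMSE values are meaningful only relative to each other, because they are computed on the same held-out period and in the units of the response.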
More About
Bayesian Linear Regression Model
A Bayesian linear regression model treats the parameters β and σ2 in the multiple linear regression (MLR) model yt = xtβ + εt as random variables.
For times t = 1,...,T:
• yt is the observed response.
• xt is a 1-by-(p + 1) row vector of observed values of p predictors. To accommodate a model intercept, x1t = 1 for all t.
• β is a (p + 1)-by-1 column vector of regression coefficients corresponding to the variables that compose the columns of xt.
• εt is the random disturbance with a mean of zero and Cov(ε) = σ2IT×T, while ε is a T-by-1 vector containing all disturbances. These assumptions imply that the data likelihood is

$$\ell\left(\beta, \sigma^2 \mid y, x\right) = \prod_{t=1}^{T} \phi\left(y_t; x_t\beta, \sigma^2\right).$$

ϕ(yt;xtβ,σ2) is the Gaussian probability density with mean xtβ and variance σ2 evaluated at yt.
Before considering the data, you impose a joint prior distribution assumption on (β,σ2). In a Bayesian analysis, you update the distribution of the parameters by using information about the parameters obtained from the likelihood of the data. The result is the joint posterior distribution of (β,σ2) or the conditional posterior distributions of the parameters.
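The likelihood can be evaluated numerically at any candidate (β,σ2). The base MATLAB sketch below computes the Gaussian log-likelihood in closed form and checks it against the sum of the log densities; the data and parameter values are hypothetical.

```matlab
% Sketch: Gaussian log-likelihood of the MLR model at a candidate (beta, sigma^2).
X = [ones(5,1) [0.5; 1.0; 1.5; 2.0; 2.5]];   % hypothetical predictor data with intercept column
y = [1.1; 1.9; 3.2; 3.8; 5.1];               % hypothetical responses
beta = [0; 2];                               % candidate coefficients
sigma2 = 0.25;                               % candidate disturbance variance
T = numel(y);

resid = y - X*beta;
logL = -0.5*T*log(2*pi*sigma2) - (resid'*resid)/(2*sigma2);   % closed form

% Same quantity as the sum of log Gaussian densities phi(y_t; x_t*beta, sigma^2)
phi = exp(-resid.^2/(2*sigma2))/sqrt(2*pi*sigma2);
logLCheck = sum(log(phi));
```

Working on the log scale avoids the numerical underflow that the raw product of T densities can cause for long series.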
Alternatives The bayeslm function can create any supported prior model object for Bayesian linear regression.
Version History Introduced in R2017a
See Also
Objects
semiconjugateblm | conjugateblm | customblm | empiricalblm
Functions
bayeslm
Topics
“Bayesian Linear Regression” on page 6-2
“Implement Bayesian Linear Regression” on page 6-10
diffusebvarm Bayesian vector autoregression (VAR) model with diffuse prior for data likelihood
Description The Bayesian VAR model on page 12-416 object diffusebvarm specifies the joint prior distribution of the array of model coefficients Λ and the innovations covariance matrix Σ of an m-D VAR(p) model. The joint prior distribution (Λ,Σ) is the diffuse model on page 12-1892. A diffuse prior model does not enable you to specify hyperparameter values for coefficient sparsity; all AR lags in the model are weighted equally. To implement Minnesota regularization, create a conjugate, semiconjugate, or normal prior model by using bayesvarm. In general, when you create a Bayesian VAR model object, it specifies the joint prior distribution and characteristics of the VARX model only. That is, the model object is a template intended for further use. Specifically, to incorporate data into the model for posterior distribution analysis, pass the model object and data to the appropriate object function on page 12-407.
Creation
Syntax
PriorMdl = diffusebvarm(numseries,numlags)
PriorMdl = diffusebvarm(numseries,numlags,Name,Value)
Description
To create a diffusebvarm object, use either the diffusebvarm function (described here) or the bayesvarm function.
PriorMdl = diffusebvarm(numseries,numlags) creates a numseries-D Bayesian VAR(numlags) model object PriorMdl, which specifies dimensionalities and prior assumptions for all model coefficients $\lambda = \mathrm{vec}(\Lambda) = \mathrm{vec}\left(\left[\Phi_1\ \Phi_2\ \cdots\ \Phi_p\ c\ \delta\ \mathrm{B}\right]'\right)$ and the innovations covariance Σ, where:
• numseries = m, the number of response time series variables.
• numlags = p, the AR polynomial order.
• The joint prior distribution of (λ,Σ) is the diffuse model on page 12-417.
PriorMdl = diffusebvarm(numseries,numlags,Name,Value) sets writable properties on page 12-404 (except NumSeries and P) using name-value pair arguments. Enclose each property name in quotes. For example, diffusebvarm(3,2,'SeriesNames',["UnemploymentRate" "CPI" "FEDFUNDS"]) specifies the names of the three response variables in the Bayesian VAR(2) model.
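To get a feel for the dimensionality of λ, the following base MATLAB sketch counts the coefficients implied by stacking the AR matrices, the constant, the trend, and the regression coefficients. It assumes that the constant, trend, and predictor blocks are counted only when the model includes them; the variable names are illustrative and are not toolbox properties.

```matlab
% Sketch: number of coefficients in lambda for an m-D VAR(p) model.
numseries = 3;                 % m, number of response series
numlags = 4;                   % p, AR polynomial order
includeConstant = true;        % model constant c present
includeTrend = false;          % linear time trend delta absent
numPredictors = 0;             % no exogenous predictors

% Each equation has m*p AR coefficients plus optional constant, trend, and predictors.
coeffsPerEquation = numseries*numlags + includeConstant + includeTrend + numPredictors;
numCoeffs = numseries*coeffsPerEquation;   % length of lambda
```

For a 3-D VAR(4) model with a constant and no trend or predictors, this count is 39, which agrees with the number of estimated parameters that estimate reports for that model in the example on this page.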
Input Arguments numseries — Number of time series m 1 (default) | positive integer Number of time series m, specified as a positive integer. numseries specifies the dimensionality of the multivariate response variable yt and innovation εt. numseries sets the NumSeries property. Data Types: double numlags — Number of lagged responses nonnegative integer Number of lagged responses in each equation of yt, specified as a nonnegative integer. The resulting model is a VAR(numlags) model; each lag has a numseries-by-numseries coefficient matrix. numlags sets the P property. Data Types: double
Properties You can set writable property values when you create the model object by using name-value pair argument syntax, or after you create the model object by using dot notation. For example, to create a 3-D Bayesian VAR(1) model and label the first through third response variables, and then include a linear time trend term, enter: PriorMdl = diffusebvarm(3,1,'SeriesNames',["UnemploymentRate" "CPI" "FEDFUNDS"]); PriorMdl.IncludeTrend = true; Model Characteristics and Dimensionality
Description — Model description
string scalar | character vector
Model description, specified as a string scalar or character vector. The default value describes the model dimensionality, for example '2-Dimensional VAR(3) Model'.
Example: "Model 1"
Data Types: string | char

NumSeries — Number of time series m
positive integer
This property is read-only.
Number of time series m, specified as a positive integer. NumSeries specifies the dimensionality of the multivariate response variable yt and innovation εt.
Data Types: double

P — Multivariate autoregressive polynomial order
nonnegative integer
This property is read-only.
Multivariate autoregressive polynomial order, specified as a nonnegative integer. P is the maximum lag that has a nonzero coefficient matrix. P specifies the number of presample observations required to initialize the model.
Data Types: double

SeriesNames — Response series names
string vector | cell array of character vectors
Response series names, specified as a NumSeries length string vector. The default is ['Y1' 'Y2' ... 'YNumSeries']. diffusebvarm stores SeriesNames as a string vector.
Example: ["UnemploymentRate" "CPI" "FEDFUNDS"]
Data Types: string

IncludeConstant — Flag for including model constant c
true (default) | false
Flag for including a model constant c, specified as a value in this table.
Value | Description
false | Response equations do not include a model constant.
true  | All response equations contain a model constant.
Data Types: logical

IncludeTrend — Flag for including linear time trend term δt
false (default) | true
Flag for including a linear time trend term δt, specified as a value in this table.
Value | Description
false | Response equations do not include a linear time trend term.
true  | All response equations contain a linear time trend term.
Data Types: logical

NumPredictors — Number of exogenous predictor variables in model regression component
0 (default) | nonnegative integer
Number of exogenous predictor variables in the model regression component, specified as a nonnegative integer. diffusebvarm includes all predictor variables symmetrically in each response equation.

VAR Model Parameters Derived from Distribution Hyperparameters
AR — Distribution mean of autoregressive coefficient matrices Φ1,…,Φp cell vector of numeric matrices 12-405
This property is read-only. Distribution mean of the autoregressive coefficient matrices Φ1,…,Φp associated with the lagged responses, specified as a P-D cell vector of NumSeries-by-NumSeries numeric matrices. AR{j} is Φj, the coefficient matrix of lag j . Rows correspond to equations and columns correspond to lagged response variables; SeriesNames determines the order of response variables and equations. Coefficient signs are those of the VAR model expressed in difference-equation notation. If P = 0, AR is an empty cell. Otherwise, AR is the collection of AR coefficient means extracted from Mu. Data Types: cell Constant — Distribution mean of model constant c numeric vector This property is read-only. Distribution mean of the model constant c (or intercept), specified as a NumSeries-by-1 numeric vector. Constant(j) is the constant in equation j; SeriesNames determines the order of equations. If IncludeConstant = false, Constant is an empty array. Otherwise, Constant is the model constant vector mean extracted from Mu. Data Types: double Trend — Distribution mean of linear time trend δ numeric vector This property is read-only. Distribution mean of the linear time trend δ, specified as a NumSeries-by-1 numeric vector. Trend(j) is the linear time trend in equation j; SeriesNames determines the order of equations. If IncludeTrend = false (the default), Trend is an empty array. Otherwise, Trend is the linear time trend coefficient mean extracted from Mu. Data Types: double Beta — Distribution mean of regression coefficient matrix Β numeric matrix This property is read-only. Distribution mean of the regression coefficient matrix B associated with the exogenous predictor variables, specified as a NumSeries-by-NumPredictors numeric matrix. Beta(j,:) contains the regression coefficients of each predictor in the equation of response variable j yj,t. Beta(:,k) contains the regression coefficient in each equation of predictor xk. 
By default, all predictor variables are in the regression component of all response equations. You can down-weight a predictor from an equation by specifying, for the corresponding coefficient, a prior mean of 0 in Mu and a small variance in V. When you create a model, the predictor variables are hypothetical. You specify predictor data when you operate on the model (for example, when you estimate the posterior by using estimate). Columns of the predictor data determine the order of the columns of Beta. 12-406
Data Types: double Covariance — Distribution mean of innovations covariance matrix Σ nan(NumSeries,NumSeries) Distribution mean of the innovations covariance matrix Σ of the NumSeries innovations at each time t = 1,...,T, specified as a NumSeries-by-NumSeries matrix of NaN. Because the prior model is diffuse, the mean of Σ is unknown, a priori.
Object Functions
estimate  | Estimate posterior distribution of Bayesian vector autoregression (VAR) model parameters
forecast  | Forecast responses from Bayesian vector autoregression (VAR) model
simsmooth | Simulation smoother of Bayesian vector autoregression (VAR) model
simulate  | Simulate coefficients and innovations covariance matrix of Bayesian vector autoregression (VAR) model
summarize | Distribution summary statistics of Bayesian vector autoregression (VAR) model
Examples
Create Diffuse Prior Model
Consider the 3-D VAR(4) model for the US inflation (INFL), unemployment (UNRATE), and federal funds (FEDFUNDS) rates.

$$\begin{bmatrix} \mathrm{INFL}_t \\ \mathrm{UNRATE}_t \\ \mathrm{FEDFUNDS}_t \end{bmatrix} = c + \sum_{j=1}^{4} \Phi_j \begin{bmatrix} \mathrm{INFL}_{t-j} \\ \mathrm{UNRATE}_{t-j} \\ \mathrm{FEDFUNDS}_{t-j} \end{bmatrix} + \begin{bmatrix} \varepsilon_{1,t} \\ \varepsilon_{2,t} \\ \varepsilon_{3,t} \end{bmatrix}.$$

For all t, εt is a series of independent 3-D normal innovations with a mean of 0 and covariance Σ. Assume that the joint prior distribution of the VAR model parameters ([Φ1,...,Φ4,c]′,Σ) is diffuse.
Create a diffuse prior model for the 3-D VAR(4) model parameters.
numseries = 3;
numlags = 4;
PriorMdl = diffusebvarm(numseries,numlags)
PriorMdl =
  diffusebvarm with properties:
        Description: "3-Dimensional VAR(4) Model"
          NumSeries: 3
                  P: 4
        SeriesNames: ["Y1" "Y2" "Y3"]
    IncludeConstant: 1
       IncludeTrend: 0
      NumPredictors: 0
                 AR: {[3x3 double] [3x3 double] [3x3 double] [3x3 double]}
           Constant: [3x1 double]
              Trend: [3x0 double]
               Beta: [3x0 double]
         Covariance: [3x3 double]
PriorMdl is a diffusebvarm Bayesian VAR model object representing the prior distribution of the coefficients and innovations covariance of the 3-D VAR(4) model. The command line display shows properties of the model. You can display properties by using dot notation.
Display the prior mean matrices of the four AR coefficients by setting each matrix in the cell to a variable.
AR1 = PriorMdl.AR{1}
AR1 = 3×3
     0     0     0
     0     0     0
     0     0     0
AR2 = PriorMdl.AR{2}
AR2 = 3×3
     0     0     0
     0     0     0
     0     0     0
AR3 = PriorMdl.AR{3}
AR3 = 3×3
     0     0     0
     0     0     0
     0     0     0
AR4 = PriorMdl.AR{4}
AR4 = 3×3
     0     0     0
     0     0     0
     0     0     0
diffusebvarm centers all AR coefficients at 0 by default. Because the model is diffuse, the data informs the posterior distribution.
Create Diffuse Bayesian AR(2) Model
Consider a 1-D Bayesian AR(2) model for the daily NASDAQ returns from January 2, 1990 through December 31, 2001.

$$y_t = c + \phi_1 y_{t-1} + \phi_2 y_{t-2} + \varepsilon_t.$$

The joint prior is diffuse.
Create a diffuse prior model for the AR(2) model parameters.
numseries = 1;
numlags = 2;
PriorMdl = diffusebvarm(numseries,numlags)
PriorMdl =
  diffusebvarm with properties:
        Description: "1-Dimensional VAR(2) Model"
          NumSeries: 1
                  P: 2
        SeriesNames: "Y1"
    IncludeConstant: 1
       IncludeTrend: 0
      NumPredictors: 0
                 AR: {[0] [0]}
           Constant: 0
              Trend: [1x0 double]
               Beta: [1x0 double]
         Covariance: NaN
Specify Response Names and Include Linear Time Trend
Consider adding a linear time trend term to the 3-D VAR(4) model of “Create Diffuse Prior Model” on page 12-407:

$$\begin{bmatrix} \mathrm{INFL}_t \\ \mathrm{UNRATE}_t \\ \mathrm{FEDFUNDS}_t \end{bmatrix} = c + \delta t + \sum_{j=1}^{4} \Phi_j \begin{bmatrix} \mathrm{INFL}_{t-j} \\ \mathrm{UNRATE}_{t-j} \\ \mathrm{FEDFUNDS}_{t-j} \end{bmatrix} + \begin{bmatrix} \varepsilon_{1,t} \\ \varepsilon_{2,t} \\ \varepsilon_{3,t} \end{bmatrix}.$$

Create a diffuse prior model for the 3-D VAR(4) model parameters. Specify response variable names.
numseries = 3;
numlags = 4;
seriesnames = ["INFL"; "UNRATE"; "FEDFUNDS"];
PriorMdl = diffusebvarm(numseries,numlags,'SeriesNames',seriesnames,...
    'IncludeTrend',true)
PriorMdl =
  diffusebvarm with properties:
        Description: "3-Dimensional VAR(4) Model"
          NumSeries: 3
                  P: 4
        SeriesNames: ["INFL" "UNRATE" "FEDFUNDS"]
    IncludeConstant: 1
       IncludeTrend: 1
      NumPredictors: 0
                 AR: {[3x3 double] [3x3 double] [3x3 double] [3x3 double]}
           Constant: [3x1 double]
              Trend: [3x1 double]
               Beta: [3x0 double]
         Covariance: [3x3 double]
Prepare Prior for Exogenous Predictor Variables
Consider the 2-D VARX(1) model for the US real GDP (RGDP) and investment (GCE) rates that treats the personal consumption (PCEC) rate as exogenous:

$$\begin{bmatrix} \mathrm{RGDP}_t \\ \mathrm{GCE}_t \end{bmatrix} = c + \Phi \begin{bmatrix} \mathrm{RGDP}_{t-1} \\ \mathrm{GCE}_{t-1} \end{bmatrix} + \mathrm{PCEC}_t\,\beta + \varepsilon_t.$$

For all t, εt is a series of independent 2-D normal innovations with a mean of 0 and covariance Σ. Assume that the joint prior distribution is diffuse.
Create a diffuse prior model for the 2-D VARX(1) model parameters.
numseries = 2;
numlags = 1;
numpredictors = 1;
PriorMdl = diffusebvarm(numseries,numlags,'NumPredictors',numpredictors)
PriorMdl =
  diffusebvarm with properties:
        Description: "2-Dimensional VAR(1) Model"
          NumSeries: 2
                  P: 1
        SeriesNames: ["Y1" "Y2"]
    IncludeConstant: 1
       IncludeTrend: 0
      NumPredictors: 1
                 AR: {[2x2 double]}
           Constant: [2x1 double]
              Trend: [2x0 double]
               Beta: [2x1 double]
         Covariance: [2x2 double]
Work with Prior and Posterior Distributions

Consider the 3-D VAR(4) model of “Create Diffuse Prior Model” on page 12-407. Estimate the posterior distribution, and generate forecasts from the corresponding posterior predictive distribution.

Load and Preprocess Data

Load the US macroeconomic data set. Compute the inflation rate. Plot all response series.

    load Data_USEconModel
    seriesnames = ["INFL" "UNRATE" "FEDFUNDS"];
    DataTimeTable.INFL = 100*[NaN; price2ret(DataTimeTable.CPIAUCSL)];

    figure
    plot(DataTimeTable.Time,DataTimeTable{:,seriesnames})
    legend(seriesnames)

Stabilize the unemployment and federal funds rates by applying the first difference to each series.

    DataTimeTable.DUNRATE = [NaN; diff(DataTimeTable.UNRATE)];
    DataTimeTable.DFEDFUNDS = [NaN; diff(DataTimeTable.FEDFUNDS)];
    seriesnames(2:3) = "D" + seriesnames(2:3);

Remove all missing values from the data.

    rmDataTimeTable = rmmissing(DataTimeTable);

Create Prior Model

Create a diffuse Bayesian VAR(4) prior model for the three response series. Specify the response variable names.

    numseries = numel(seriesnames);
    numlags = 4;
    PriorMdl = diffusebvarm(numseries,numlags,'SeriesNames',seriesnames);
Estimate Posterior Distribution

Estimate the posterior distribution by passing the prior model and entire data series to estimate.

    rng(1); % For reproducibility
    PosteriorMdl = estimate(PriorMdl,rmDataTimeTable{:,seriesnames},'Display','equation');

    Bayesian VAR under diffuse priors
    Effective Sample Size:          197
    Number of equations:            3
    Number of estimated Parameters: 39

                                       VAR Equations
               | INFL(-1)  DUNRATE(-1)  DFEDFUNDS(-1)  INFL(-2)  DUNRATE(-2)  DFEDFUNDS(-2)  INFL(-3)
    ---------------------------------------------------------------------------------------------------
     INFL      |  0.1241    -0.4809       0.1005        0.3236    -0.0503       0.0450        0.4272
               | (0.0762)   (0.1536)     (0.0390)      (0.0868)   (0.1647)     (0.0413)      (0.0860)
     DUNRATE   | -0.0219     0.4716       0.0391        0.0913     0.2414       0.0536       -0.0389
               | (0.0413)   (0.0831)     (0.0211)      (0.0469)   (0.0891)     (0.0223)      (0.0465)
     DFEDFUNDS | -0.1586    -1.4368      -0.2905        0.3403    -0.2968      -0.3117        0.2848
               | (0.1632)   (0.3287)     (0.0835)      (0.1857)   (0.3526)     (0.0883)      (0.1841)

           Innovations Covariance Matrix
               |   INFL     DUNRATE   DFEDFUNDS
    -------------------------------------------
     INFL      |  0.3028   -0.0217     0.1579
               | (0.0321)  (0.0124)   (0.0499)
     DUNRATE   | -0.0217    0.0887    -0.1435
               | (0.0124)  (0.0094)   (0.0283)
     DFEDFUNDS |  0.1579   -0.1435     1.3872
               | (0.0499)  (0.0283)   (0.1470)
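The parameter count in the display can be checked by hand. As a worked sketch, assume each response equation has k coefficients (m·p lag coefficients plus one constant, with no trend or predictors in this model), where m = 3 series and p = 4 lags:

```latex
k = mp + 1 = 3 \cdot 4 + 1 = 13,
\qquad
mk = 3 \cdot 13 = 39 .
```

The product mk matches the 39 estimated parameters that estimate reports for the three equations.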
PosteriorMdl is a conjugatebvarm model object; the posterior is analytically tractable. By default, estimate uses the first four observations as a presample to initialize the model.

Generate Forecasts from Posterior Predictive Distribution

From the posterior predictive distribution, generate forecasts over a two-year horizon. Because sampling from the posterior predictive distribution requires the entire data set, specify the prior model in forecast instead of the posterior.

    fh = 8;
    FY = forecast(PriorMdl,fh,rmDataTimeTable{:,seriesnames});

FY is an 8-by-3 matrix of forecasts.

Plot the end of the data set and the forecasts.

    fp = rmDataTimeTable.Time(end) + calquarters(1:fh);
    figure
    plotdata = [rmDataTimeTable{end - 10:end,seriesnames}; FY];
    plot([rmDataTimeTable.Time(end - 10:end); fp'],plotdata)
    hold on
    plot([fp(1) fp(1)],ylim,'k-.')
    legend(seriesnames)
    title('Data and Forecasts')
    hold off

Compute Impulse Responses

Plot impulse response functions by passing the posterior estimates to armairf.

    armairf(PosteriorMdl.AR,[],'InnovCov',PosteriorMdl.Covariance)
More About

Bayesian Vector Autoregression (VAR) Model

A Bayesian VAR model treats all coefficients and the innovations covariance matrix as random variables in the m-dimensional, stationary VARX(p) model. The model has one of the three forms described in this table.

| Model | Equation |
| Reduced-form VAR(p) in difference-equation notation | $y_t = \Phi_1 y_{t-1} + \dots + \Phi_p y_{t-p} + c + \delta t + \mathrm{B}x_t + \varepsilon_t$ |
| Multivariate regression | $y_t = Z_t\lambda + \varepsilon_t$ |
| Matrix regression | $y_t = \Lambda' z_t' + \varepsilon_t$ |
For each time t = 1,...,T:
• yt is the m-dimensional observed response vector, where m = numseries.
• Φ1,…,Φp are the m-by-m AR coefficient matrices of lags 1 through p, where p = numlags.
• c is the m-by-1 vector of model constants if IncludeConstant is true.
• δ is the m-by-1 vector of linear time trend coefficients if IncludeTrend is true.
• Β is the m-by-r matrix of regression coefficients of the r-by-1 vector of observed exogenous predictors xt, where r = NumPredictors. All predictor variables appear in each equation.
• $z_t = [\,y_{t-1}' \;\; y_{t-2}' \;\; \cdots \;\; y_{t-p}' \;\; 1 \;\; t \;\; x_t'\,]$, which is a 1-by-(mp + r + 2) vector, and Zt is the m-by-m(mp + r + 2) block diagonal matrix

$$Z_t = \begin{bmatrix} z_t & 0_z & \cdots & 0_z\\ 0_z & z_t & \cdots & 0_z\\ \vdots & \vdots & \ddots & \vdots\\ 0_z & 0_z & \cdots & z_t \end{bmatrix},$$

  where 0z is a 1-by-(mp + r + 2) vector of zeros.
• $\Lambda = [\,\Phi_1 \;\; \Phi_2 \;\; \cdots \;\; \Phi_p \;\; c \;\; \delta \;\; \mathrm{B}\,]'$, which is an (mp + r + 2)-by-m random matrix of the coefficients, and the m(mp + r + 2)-by-1 vector λ = vec(Λ).
• εt is an m-by-1 vector of random, serially uncorrelated, multivariate normal innovations with the zero vector for the mean and the m-by-m matrix Σ for the covariance. This assumption implies that the data likelihood is

$$\ell(\Lambda,\Sigma \mid y,x) = \prod_{t=1}^{T} f(y_t;\Lambda,\Sigma,z_t),$$
where f is the m-dimensional multivariate normal density with mean ztΛ and covariance Σ, evaluated at yt.

Before considering the data, you impose a joint prior distribution assumption on (Λ,Σ), which is governed by the distribution π(Λ,Σ). In a Bayesian analysis, the distribution of the parameters is updated with information about the parameters obtained from the data likelihood. The result is the joint posterior distribution π(Λ,Σ|Y,X,Y0), where:
• Y is a T-by-m matrix containing the entire response series {yt}, t = 1,…,T.
• X is a T-by-r matrix containing the entire exogenous series {xt}, t = 1,…,T.
• Y0 is a p-by-m matrix of presample data used to initialize the VAR model for estimation.

Diffuse Model

The diffuse model is an m-D Bayesian VAR model on page 12-416 that has the noninformative joint prior distribution

$$\pi(\Lambda,\Sigma) \propto |\Sigma|^{-\frac{m+1}{2}}.$$
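As an illustration of how the multivariate regression and matrix regression forms relate, the following sketch builds zt, the block diagonal Zt, and λ = vec(Λ) for arbitrary values, and compares Ztλ with Λ′zt′. It uses only base MATLAB; the dimensions and random values are assumptions chosen for illustration and are not toolbox output.

```matlab
% Sketch: Z_t*lambda and Lambda'*z_t' give the same conditional mean.
% All numeric values are arbitrary illustrations.
rng(0);                        % for reproducibility
m = 2; p = 1; r = 1;           % 2 series, 1 lag, 1 exogenous predictor
k = m*p + r + 2;               % constant and trend both included

Lambda = randn(k,m);           % k-by-m coefficient matrix
lambda = Lambda(:);            % lambda = vec(Lambda), an mk-by-1 vector

ylag = randn(m,1);             % stands in for y_{t-1}
t = 5;                         % time index used by the trend term
x = randn(r,1);                % stands in for the predictor x_t
z = [ylag' 1 t x'];            % 1-by-k row vector z_t

Z = kron(eye(m),z);            % m-by-mk block diagonal matrix Z_t

meanMultivariate = Z*lambda;   % multivariate regression form
meanMatrix = Lambda'*z';       % matrix regression form

maxAbsDiff = max(abs(meanMultivariate - meanMatrix));
```

The two forms agree because block i of λ is column i of Λ, so row i of Ztλ equals zt times that column.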
The diffuse model is the limiting case of the conjugate prior model (see conjugatebvarm) when Μ → 0, V⁻¹ → 0, Ω → 0, and ν → −k, where:
• k = mp + r + 1c + 1δ, the number of coefficients per response equation.
• r = NumPredictors.
• 1c is 1 if IncludeConstant is true, and 0 otherwise.
• 1δ is 1 if IncludeTrend is true, and 0 otherwise.

If the sample size is large enough to satisfy least-squares estimation, the posterior distributions are proper and analytically tractable:

$$\Lambda \mid \Sigma, y_t, x_t \sim N_{(mp + r + 1_c + 1_\delta)\times m}(\mathrm{M}, V, \Sigma)$$

$$\Sigma \mid y_t, x_t \sim \text{Inverse Wishart}(\Omega,\nu),$$

where:
• $\mathrm{M} = \left(\sum_{t=1}^{T} z_t' z_t\right)^{-1}\sum_{t=1}^{T} z_t' y_t'.$
• $V = \left(\sum_{t=1}^{T} z_t' z_t\right)^{-1}.$
• $\Omega = \sum_{t=1}^{T}\left(y_t - \mathrm{M}'z_t'\right)\left(y_t - \mathrm{M}'z_t'\right)'.$
• ν = T − k, which follows from updating the limiting prior value ν → −k with T observations.
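The sums that define Μ, V, and Ω are ordinary least-squares quantities, so they can be sketched in base MATLAB without the toolbox. The example below simulates a small 2-D VAR(1) with a constant (the coefficients, noise scale, and sample size are assumptions for illustration), stacks the rows zt into a matrix Z, and evaluates the three formulas. It is a minimal sketch of the algebra, not the implementation that estimate uses.

```matlab
% Sketch: least-squares quantities behind the diffuse-prior posterior.
% Phi, c, the noise scale, and T are illustrative assumptions.
rng(1);                               % for reproducibility
m = 2; p = 1; T = 200;
Phi = [0.5 0.1; -0.2 0.3];            % stable AR coefficient matrix
c = [1; -1];                          % model constant

Y = zeros(T + p,m);                   % first p rows act as the presample
for t = (p + 1):(T + p)
    Y(t,:) = (c + Phi*Y(t-1,:)' + 0.1*randn(m,1))';
end

Yest = Y(p+1:end,:);                  % T-by-m responses y_t'
Z = [Y(1:end-1,:) ones(T,1)];         % row t is z_t = [y_{t-1}' 1]
k = size(Z,2);                        % k = m*p + 1 coefficients per equation

ZtZ = Z'*Z;                           % sum of z_t'*z_t
M = ZtZ\(Z'*Yest);                    % k-by-m matrix of posterior means
V = inv(ZtZ);                         % k-by-k scale matrix
E = Yest - Z*M;                       % rows are residuals (y_t - M'*z_t')'
Omega = E'*E;                         % m-by-m sum of residual outer products
```

Here M' approximates [Phi c], and Omega is the residual sum-of-squares matrix that scales the inverse Wishart posterior of Σ.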
Algorithms • If you pass a diffusebvarm object and data to estimate, MATLAB returns a conjugatebvarm object representing the posterior distribution.
Version History Introduced in R2020a
See Also Functions bayesvarm Objects conjugatebvarm | normalbvarm | semiconjugatebvarm
distplot

Plot Markov chain redistributions

Syntax

    distplot(mc,X)
    distplot(mc,X,Name,Value)
    distplot(ax, ___ )
    h = distplot( ___ )
Description distplot(mc,X) creates a heatmap from the data X showing the evolution of a distribution of states in the discrete-time Markov chain mc. distplot(mc,X,Name,Value) uses additional options specified by one or more name-value arguments. For example, specify the type of plot or the frame rate for animated plots. distplot(ax, ___ ) plots on the axes specified by ax instead of the current axes (gca) using any of the input argument combinations in the previous syntaxes. h = distplot( ___ ) returns a handle to the distribution plot. Use h to modify properties of the plot after you create it.
Examples

Visualize Evolution of State Distribution

Create a four-state Markov chain from a randomly generated transition matrix containing eight infeasible transitions.

    rng('default'); % For reproducibility
    mc = mcmix(4,'Zeros',8);

mc is a dtmc object.

Plot a digraph of the Markov chain.

    figure;
    graphplot(mc);

State 4 is an absorbing state.

Compute the state redistributions at each step for 10 discrete time steps. Assume an initial uniform distribution over the states.

    X = redistribute(mc,10)

    X = 11×4

        0.2500    0.2500    0.2500    0.2500
        0.0869    0.2577    0.3088    0.3467
        0.1073    0.2990    0.1536    0.4402
        0.0533    0.2133    0.1844    0.5489
        0.0641    0.2010    0.1092    0.6257
        0.0379    0.1473    0.1162    0.6985
        0.0404    0.1316    0.0765    0.7515
        0.0266    0.0997    0.0746    0.7991
        0.0259    0.0864    0.0526    0.8351
        0.0183    0.0670    0.0484    0.8663
          ⋮
X is an 11-by-4 matrix. Rows correspond to time steps, and columns correspond to states. Visualize the state redistribution. figure; distplot(mc,X)
After 10 transitions, the distribution appears to settle with a majority of the probability mass in state 4.
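The redistribution at each step is the previous row of X multiplied by the transition matrix. The sketch below reproduces that recursion in base MATLAB for a hypothetical four-state chain with an absorbing fourth state; the matrix P here is an assumption for illustration and is not the matrix that mcmix generates in the example.

```matlab
% Sketch: state redistribution x_{t+1} = x_t*P for a right-stochastic P.
% P is a hypothetical transition matrix; state 4 is absorbing.
P = [0.5 0.5 0   0;
     0.2 0.3 0.5 0;
     0   0.4 0.3 0.3;
     0   0   0   1];

numSteps = 10;
numStates = size(P,1);

X = zeros(numSteps + 1,numStates);
X(1,:) = ones(1,numStates)/numStates;   % uniform initial distribution

for t = 1:numSteps
    X(t+1,:) = X(t,:)*P;                % one-step redistribution
end

rowSums = sum(X,2);                     % each row remains a distribution
massInState4 = X(:,4);                  % mass in the absorbing state
```

Each row of X sums to 1, and the mass in the absorbing state never decreases, which is the pattern the heatmap in the example shows for state 4.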
Animate Evolution of Markov Chain

Consider this theoretical, right-stochastic transition matrix of a stochastic process.

$$P = \begin{bmatrix}
0 & 0 & 1/2 & 1/4 & 1/4 & 0 & 0\\
0 & 0 & 1/3 & 0 & 2/3 & 0 & 0\\
0 & 0 & 0 & 0 & 0 & 1/3 & 2/3\\
0 & 0 & 0 & 0 & 0 & 1/2 & 1/2\\
0 & 0 & 0 & 0 & 0 & 3/4 & 1/4\\
1/2 & 1/2 & 0 & 0 & 0 & 0 & 0\\
1/4 & 3/4 & 0 & 0 & 0 & 0 & 0
\end{bmatrix}.$$

Create the Markov chain that is characterized by the transition matrix P.

    P = [ 0   0  1/2 1/4 1/4  0   0 ;
          0   0  1/3  0  2/3  0   0 ;
          0   0   0   0   0  1/3 2/3;
          0   0   0   0   0  1/2 1/2;
          0   0   0   0   0  3/4 1/4;
         1/2 1/2  0   0   0   0   0 ;
         1/4 3/4  0   0   0   0   0 ];
    mc = dtmc(P);

Compute the state redistributions at each step for 20 discrete time steps.

    X = redistribute(mc,20);

Animate the redistributions in a histogram. Specify a half-second frame rate.

    figure;
    distplot(mc,X,'Type','histogram','FrameRate',0.5);
Input Arguments

mc - Discrete-time Markov chain
dtmc object

Discrete-time Markov chain with NumStates states and transition matrix P, specified as a dtmc object. P must be fully specified (no NaN entries).

X - Evolution of state probabilities
nonnegative numeric matrix

Evolution of state probabilities, specified as a (1 + numSteps)-by-NumStates nonnegative numeric matrix returned by redistribute. The first row is the initial state distribution. Subsequent rows are the redistributions at each step. distplot normalizes the rows by their respective sums before plotting.
Data Types: double

ax - Axes on which to plot
Axes object

Axes on which to plot, specified as an Axes object. By default, distplot plots to the current axes (gca).

Name-Value Pair Arguments

Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter. Before R2021a, use commas to separate each name and value, and enclose Name in quotes.
Example: 'Type','graph','FrameRate',3 creates an animated plot of the redistributions using a frame rate of 3 seconds.

Type - Plot type
'evolution' (default) | 'histogram' | 'graph'

Plot type, specified as the comma-separated pair consisting of 'Type' and a value in this table.

| Value | Description |
| 'evolution' | Evolution of the initial distribution. The plot is a (1 + NumSteps)-by-NumStates heatmap. Row i displays the redistribution at step i. |
| 'histogram' | Animated histogram of the redistributions. The vertical axis displays probability mass, and the horizontal axis displays states. The 'FrameRate' name-value pair argument controls the animation progress. |
| 'graph' | Animated graph of the redistributions. distplot colors the nodes by their probability mass at each step. The 'FrameRate' name-value pair argument controls the animation progress. |

Example: 'Type','graph'
Data Types: string | char

FrameRate - Length of discrete time steps
positive scalar

Length of discrete time steps, in seconds, for animated plots, specified as the comma-separated pair consisting of 'FrameRate' and a positive scalar. The default is a pause at each time step. The animation proceeds when you press the space bar.
Example: 'FrameRate',3
Data Types: double
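The description of X states that distplot normalizes each row by its sum before plotting. The following base MATLAB sketch shows what that normalization amounts to for a small, hypothetical matrix of unnormalized weights; it illustrates the documented behavior and is not the function's internal code.

```matlab
% Sketch: row normalization of a nonnegative matrix.
% X holds hypothetical unnormalized state weights (one row per time step).
X = [2   1    1;
     0.5 0.25 0.25;
     3   3    6];

Xnorm = X./sum(X,2);        % divide each row by its own sum

rowSums = sum(Xnorm,2);     % every normalized row sums to 1
```

If the rows of X already sum to 1, as they do for the output of redistribute, this normalization leaves them unchanged up to rounding.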
Output Arguments h — Handle to distribution plot graphics object Handle to the distribution plot, returned as a graphics object. h contains a unique plot identifier, which you can use to query or modify properties of the plot.
Version History Introduced in R2017b
See Also Objects dtmc Functions redistribute | simplot Topics “Markov Chain Modeling” on page 10-8 “Create and Modify Markov Chain Model Objects” on page 10-17 “Visualize Markov Chain Structure and Evolution” on page 10-27 “Compute State Distribution of Markov Chain at Each Time Step” on page 10-66
disp

Class: dssm

Display summary information for diffuse state-space model

Syntax

    disp(Mdl)
    disp(Mdl,params)
    disp( ___ ,Name,Value)
Description disp(Mdl) displays summary information for the diffuse state-space model on page 11-4 (dssm model object) Mdl. The display includes the state and observation equations as a system of scalar equations to facilitate model verification. The display also includes the coefficient dimensionalities, notation, and initial state distribution types. The software displays unknown parameter values using c1 for the first unknown parameter, c2 for the second unknown parameter, and so on. For time-varying models with more than 20 different sets of equations, the software displays the first and last 10 groups in terms of time (the last group is the latest). disp(Mdl,params) displays the dssm model Mdl and applies initial values to the model parameters (params). disp( ___ ,Name,Value) displays Mdl using additional options specified by one or more Name,Value pair arguments. For example, you can specify the number of digits to display after the decimal point for model coefficients, or the number of terms per row for state and observation equations. You can use any of the input arguments in the previous syntaxes.
Input Arguments

Mdl - Diffuse state-space model
dssm model object

Diffuse state-space model, specified as a dssm model object returned by dssm or estimate.

params - Initial values for unknown parameters
[] (default) | numeric vector

Initial values for unknown parameters, specified as a numeric vector. The elements of params correspond to the unknown parameters in the state-space model matrices A, B, C, and D, and, optionally, the initial state mean Mean0 and covariance matrix Cov0.
• If you created Mdl explicitly (that is, by specifying the matrices without a parameter-to-matrix mapping function), then the software maps the elements of params to NaNs in the state-space model matrices and initial state values. The software searches for NaNs column-wise, following the order A, B, C, D, Mean0, Cov0.
• If you created Mdl implicitly (that is, by specifying the matrices with a parameter-to-matrix mapping function), then you must set initial parameter values for the state-space model matrices, initial state values, and state types within the parameter-to-matrices mapping function. To set the type of initial state distribution, see dssm.
Data Types: double

Name-Value Pair Arguments

Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter. Before R2021a, use commas to separate each name and value, and enclose Name in quotes.

MaxStateEq - Maximum number of equations to display
100 (default) | positive integer

Maximum number of equations to display, specified as the comma-separated pair consisting of 'MaxStateEq' and a positive integer. If the maximum number of states among all periods is no larger than MaxStateEq, then the software displays the model equation by equation.
Example: 'MaxStateEq',10
Data Types: double

NumDigits - Number of digits to display after decimal point
2 (default) | nonnegative integer

Number of digits to display after the decimal point for known or estimated model coefficients, specified as the comma-separated pair consisting of 'NumDigits' and a nonnegative integer.
Example: 'NumDigits',0
Data Types: double

Period - Period to display state and observation equations
positive integer

Period to display state and observation equations for time-varying state-space models, specified as the comma-separated pair consisting of 'Period' and a positive integer. By default, the software displays state and observation equations for all periods. If Period exceeds the maximum number of observations that the model supports, then the software displays state and observation equations for all periods. If the model has more than 20 different sets of equations, then the software displays the first and last 10 groups in terms of time (the last group is the latest).
Example: 'Period',120
Data Types: double

PredictorsPerRow - Number of equation terms to display per row
5 (default) | positive integer

Number of equation terms to display per row, specified as the comma-separated pair consisting of 'PredictorsPerRow' and a positive integer.
Example: 'PredictorsPerRow',3
Data Types: double
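For explicitly created models, the description of params says that values are mapped to NaN entries column-wise, in the order A, B, C, D. The base MATLAB sketch below imitates that ordering for small example matrices so that the correspondence between positions in params and matrix entries is easy to see. The loop is an illustrative assumption about the ordering only; it omits Mean0 and Cov0 and is not the software's own code.

```matlab
% Sketch: fill NaN entries column-wise, matrix by matrix (A, B, C, D).
A = [NaN 0; 0 NaN];
B = [NaN 0; 0 NaN];
C = [1 1];
D = [];
params = [0.1; 0.1; 0.2; 0.2];

mats = {A,B,C,D};
next = 1;                               % index of the next unused parameter
for i = 1:numel(mats)
    idx = find(isnan(mats{i}));         % linear indices run down columns
    n = numel(idx);
    if n > 0
        mats{i}(idx) = params(next:next + n - 1);
        next = next + n;
    end
end
[A,B,C,D] = mats{:};
```

With these inputs, the first two elements of params fill the diagonal of A and the last two fill the diagonal of B, while C and D are unchanged because they contain no NaN values.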
Examples

Verify Explicitly Created Diffuse State-Space Model

An important step in state-space model analysis is to ensure that the software interprets the state and observation equation matrices as you intend. Use disp to help you verify the diffuse state-space model.

Define a diffuse state-space model, where the state equation is an AR(2) model, and the observation equation is the difference between the current and previous state plus the observation error. Symbolically, the state-space model is

$$\begin{bmatrix}x_{1,t}\\ x_{2,t}\\ x_{3,t}\end{bmatrix} = \begin{bmatrix}0.6 & 0.2 & 0.5\\ 1 & 0 & 0\\ 0 & 0 & 1\end{bmatrix}\begin{bmatrix}x_{1,t-1}\\ x_{2,t-1}\\ x_{3,t-1}\end{bmatrix} + \begin{bmatrix}0.3\\ 0\\ 0\end{bmatrix}u_{1,t}$$

$$y_t = \begin{bmatrix}1 & -1 & 0\end{bmatrix}\begin{bmatrix}x_{1,t}\\ x_{2,t}\\ x_{3,t}\end{bmatrix} + 0.1\,\varepsilon_t.$$

Assume the initial state distribution is diffuse.

There are three states: x1,t is the AR(2) process, x2,t represents x1,t−1, and x3,t is the AR(2) model constant.

Define the state-transition matrix.

    A = [0.6 0.2 0.5; 1 0 0; 0 0 1];

Define the state-disturbance-loading matrix.

    B = [0.3; 0; 0];

Define the measurement-sensitivity matrix.

    C = [1 -1 0];

Define the observation-innovation matrix.

    D = 0.1;

Specify the state-space model using dssm. Identify the type of initial state distributions (StateType) by noting the following:
• x1,t is an AR(2) process with diffuse initial distribution.
• x2,t is the same AR(2) process as x1,t.
• x3,t is the constant 1 for all periods.
    StateType = [2 2 1];
    Mdl = dssm(A,B,C,D,'StateType',StateType);

Mdl is a dssm model. Verify the diffuse state-space model using disp.

    disp(Mdl)

    State-space model type: dssm

    State vector length: 3
    Observation vector length: 1
    State disturbance vector length: 1
    Observation innovation vector length: 1
    Sample size supported by model: Unlimited

    State variables: x1, x2,...
    State disturbances: u1, u2,...
    Observation series: y1, y2,...
    Observation innovations: e1, e2,...

    State equations:
    x1(t) = (0.60)x1(t-1) + (0.20)x2(t-1) + (0.50)x3(t-1) + (0.30)u1(t)
    x2(t) = x1(t-1)
    x3(t) = x3(t-1)

    Observation equation:
    y1(t) = x1(t) - x2(t) + (0.10)e1(t)

    Initial state distribution:

    Initial state means
     x1  x2  x3
      0   0   1

    Initial state covariance matrix
         x1   x2   x3
     x1  Inf  0    0
     x2  0    Inf  0
     x3  0    0    0

    State types
        x1       x2        x3
     Diffuse  Diffuse  Constant
Cov0 has infinite variance for the AR(2) states.
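To see how the three states interact, the state and observation equations can be iterated directly in base MATLAB. The sketch below uses the matrices from this example; because a diffuse initial distribution cannot be sampled, it starts the AR(2) states at an arbitrary finite value of 0, which is an assumption made only for illustration.

```matlab
% Sketch: iterate the state and observation equations of the example.
% The starting values of x1 and x2 are arbitrary; x3 is the constant 1.
rng(0);                          % for reproducibility
A = [0.6 0.2 0.5; 1 0 0; 0 0 1];
B = [0.3; 0; 0];
C = [1 -1 0];
D = 0.1;

T = 50;
x = zeros(3,T + 1);
x(:,1) = [0; 0; 1];              % column t holds the state at time t-1
y = zeros(1,T);

for t = 1:T
    x(:,t+1) = A*x(:,t) + B*randn;    % state equation
    y(t) = C*x(:,t+1) + D*randn;      % observation equation
end

lagCheck = isequal(x(2,2:end),x(1,1:end-1));   % x2(t) equals x1(t-1)
constCheck = all(x(3,:) == 1);                 % x3 stays at 1
```

The two checks confirm the roles that the displayed equations assign to the states: the second state stores the lag of the first, and the third state carries the model constant.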
Display Diffuse State-Space Model with Initial Values

Create a diffuse state-space model containing two independent, autoregressive states, and where the observations are the deterministic sum of the two states. Symbolically, the system of equations is

$$\begin{bmatrix}x_{t,1}\\ x_{t,2}\end{bmatrix} = \begin{bmatrix}\phi_1 & 0\\ 0 & \phi_2\end{bmatrix}\begin{bmatrix}x_{t-1,1}\\ x_{t-1,2}\end{bmatrix} + \begin{bmatrix}\sigma_1 & 0\\ 0 & \sigma_2\end{bmatrix}\begin{bmatrix}u_{t,1}\\ u_{t,2}\end{bmatrix}$$

$$y_t = \begin{bmatrix}1 & 1\end{bmatrix}\begin{bmatrix}x_{t,1}\\ x_{t,2}\end{bmatrix}.$$

Specify the state-transition matrix.

    A = [NaN 0; 0 NaN];

Specify the state-disturbance-loading matrix.

    B = [NaN 0; 0 NaN];

Specify the measurement-sensitivity matrix.

    C = [1 1];

Create the diffuse state-space model by using dssm. Specify that the first state is stationary and the second is diffuse.

    StateType = [0; 2];
    Mdl = dssm(A,B,C,'StateType',StateType);

Mdl is a dssm model object.

Display the state-space model. Specify initial values for the unknown parameters and the initial state means and covariance matrix as follows:
• ϕ1,0 = ϕ2,0 = 0.1.
• σ1,0 = σ2,0 = 0.2.

    params = [0.1; 0.1; 0.2; 0.2];
    disp(Mdl,params)

    State-space model type: dssm

    State vector length: 2
    Observation vector length: 1
    State disturbance vector length: 2
    Observation innovation vector length: 0
    Sample size supported by model: Unlimited
    Unknown parameters for estimation: 4

    State variables: x1, x2,...
    State disturbances: u1, u2,...
    Observation series: y1, y2,...
    Observation innovations: e1, e2,...
    Unknown parameters: c1, c2,...

    State equations:
    x1(t) = (c1)x1(t-1) + (c3)u1(t)
    x2(t) = (c2)x2(t-1) + (c4)u2(t)

    Observation equation:
    y1(t) = x1(t) + x2(t)

    Initial state distribution:

    Initial state means
     x1  x2
      0   0

    Initial state covariance matrix
          x1    x2
     x1  0.04   0
     x2   0    Inf

    State types
         x1         x2
     Stationary  Diffuse
The software computes the initial state mean and variance of the stationary state using its stationary distribution.
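The displayed variance of the stationary state can be checked by hand. As a worked sketch, assume the state disturbance has unit variance, so the first state is an AR(1) process with coefficient ϕ1 = 0.1 and disturbance scale σ1 = 0.2; its stationary mean is 0 and its stationary variance is

```latex
\operatorname{Var}(x_{t,1}) = \frac{\sigma_1^2}{1 - \phi_1^2}
= \frac{0.2^2}{1 - 0.1^2}
= \frac{0.04}{0.99}
\approx 0.0404 ,
```

which the display rounds to 0.04 at the default two digits after the decimal point.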
Explicitly Create and Display Time-Varying Diffuse State-Space Model

From periods 1 through 50, the state model is a diffuse AR(2) and a stationary MA(1) model, and the observation model is the sum of the two states. From periods 51 through 100, the state model includes the first AR(2) model only.

Symbolically, the state-space model is, for periods 1 through 50,

$$\begin{bmatrix}x_{1t}\\ x_{2t}\\ x_{3t}\\ x_{4t}\end{bmatrix} = \begin{bmatrix}\phi_1 & \phi_2 & 0 & 0\\ 1 & 0 & 0 & 0\\ 0 & 0 & 0 & \theta\\ 0 & 0 & 0 & 0\end{bmatrix}\begin{bmatrix}x_{1,t-1}\\ x_{2,t-1}\\ x_{3,t-1}\\ x_{4,t-1}\end{bmatrix} + \begin{bmatrix}\sigma_1 & 0 & 0 & 0\\ 0 & 0 & 0 & 0\\ 0 & 0 & 1 & 0\\ 0 & 0 & 1 & 0\end{bmatrix}\begin{bmatrix}u_{1t}\\ u_{2t}\\ u_{3t}\\ u_{4t}\end{bmatrix},$$

$$y_t = a_1\left(x_{1t} + x_{3t}\right) + \sigma_2\varepsilon_t,$$

for period 51,

$$\begin{bmatrix}x_{1t}\\ x_{2t}\end{bmatrix} = \begin{bmatrix}\phi_1 & \phi_2 & 0 & 0\\ 1 & 0 & 0 & 0\end{bmatrix}\begin{bmatrix}x_{1,t-1}\\ x_{2,t-1}\\ x_{3,t-1}\\ x_{4,t-1}\end{bmatrix} + \begin{bmatrix}\sigma_1 & 0 & 0 & 0\\ 0 & 0 & 0 & 0\end{bmatrix}\begin{bmatrix}u_{1t}\\ u_{2t}\\ u_{3t}\\ u_{4t}\end{bmatrix},$$

$$y_t = a_2 x_{1t} + \sigma_3\varepsilon_t,$$

and for periods 52 through 100,

$$\begin{bmatrix}x_{1t}\\ x_{2t}\end{bmatrix} = \begin{bmatrix}\phi_1 & \phi_2\\ 1 & 0\end{bmatrix}\begin{bmatrix}x_{1,t-1}\\ x_{2,t-1}\end{bmatrix} + \begin{bmatrix}\sigma_1 & 0\\ 0 & 0\end{bmatrix}\begin{bmatrix}u_{1t}\\ u_{2t}\end{bmatrix},$$

$$y_t = a_2 x_{1t} + \sigma_3\varepsilon_t.$$

Specify the state-transition coefficient matrix.

    A1 = {[NaN NaN 0 0; 1 0 0 0; 0 0 0 NaN; 0 0 0 0]};
    A2 = {[NaN NaN 0 0; 1 0 0 0]};
    A3 = {[NaN NaN; 1 0]};
    A = [repmat(A1,50,1);A2;repmat(A3,49,1)];

Specify the state-disturbance-loading coefficient matrix.

    B1 = {[NaN 0 0 0; 0 0 0 0; 0 0 1 0; 0 0 1 0]};
    B2 = {[NaN 0 0 0; 0 0 0 0]};
    B3 = {[NaN 0; 0 0]};
    B = [repmat(B1,50,1);B2;repmat(B3,49,1)];

Specify the measurement-sensitivity coefficient matrix.

    C1 = {[NaN 0 NaN 0]};
    C3 = {[NaN 0]};
    C = [repmat(C1,50,1);repmat(C3,50,1)];

Specify the observation-disturbance coefficient matrix.

    D1 = {NaN};
    D3 = {NaN};
    D = [repmat(D1,50,1);repmat(D3,50,1)];

Create the diffuse state-space model. Specify that the initial state distributions are diffuse for the states composing the AR model and stationary for those composing the MA model.

    StateType = [2; 2; 0; 0];
    Mdl = dssm(A,B,C,D,'StateType',StateType);
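Because every NaN in an explicitly specified coefficient matrix is treated as a separate unknown, the number of parameters grows with the number of periods. The base MATLAB sketch below rebuilds the cell arrays from this example and counts their NaN entries; countNaN is a helper name introduced here for illustration.

```matlab
% Sketch: count the unknown (NaN) entries in the time-varying coefficients.
A1 = {[NaN NaN 0 0; 1 0 0 0; 0 0 0 NaN; 0 0 0 0]};
A2 = {[NaN NaN 0 0; 1 0 0 0]};
A3 = {[NaN NaN; 1 0]};
A = [repmat(A1,50,1); A2; repmat(A3,49,1)];

B1 = {[NaN 0 0 0; 0 0 0 0; 0 0 1 0; 0 0 1 0]};
B2 = {[NaN 0 0 0; 0 0 0 0]};
B3 = {[NaN 0; 0 0]};
B = [repmat(B1,50,1); B2; repmat(B3,49,1)];

C1 = {[NaN 0 NaN 0]};
C3 = {[NaN 0]};
C = [repmat(C1,50,1); repmat(C3,50,1)];

D1 = {NaN};
D3 = {NaN};
D = [repmat(D1,50,1); repmat(D3,50,1)];

% Hypothetical helper: total number of NaN entries across a cell array
countNaN = @(M) sum(cellfun(@(x) nnz(isnan(x)),M));

nA = countNaN(A);            % 50*3 + 2 + 49*2 = 250
nB = countNaN(B);            % 50 + 1 + 49     = 100
nC = countNaN(C);            % 50*2 + 50       = 150
nD = countNaN(D);            % 100
total = nA + nB + nC + nD;   % 600
```

The total of 600 agrees with the number of unknown parameters that disp reports for this model.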
Mdl is a dssm model object.

The model is large and contains a different set of parameters for each period. The software displays state and observation equations for the first 10 and last 10 periods. You can choose which periods to display the equations for using the 'Period' name-value pair argument.

Display the diffuse state-space model, and use 'Period' to display the state and observation equations for the 50th, 51st, and 52nd periods.

    disp(Mdl,'Period',50)

    State-space model type: dssm

    State vector length: Time-varying
    Observation vector length: 1
    State disturbance vector length: Time-varying
    Observation innovation vector length: 1
    Sample size supported by model: 100
    Unknown parameters for estimation: 600

    State variables: x1, x2,...
    State disturbances: u1, u2,...
    Observation series: y1, y2,...
    Observation innovations: e1, e2,...
    Unknown parameters: c1, c2,...

    State equations (in period 50):
    x1(t) = (c148)x1(t-1) + (c149)x2(t-1) + (c300)u1(t)
    x2(t) = x1(t-1)
    x3(t) = (c150)x4(t-1) + u3(t)
    x4(t) = u3(t)

    Time-varying transition matrix A contains unknown parameters:
    c1 c2 c3 ... c248 c249 c250

    Time-varying state disturbance loading matrix B contains unknown parameters:
    c251 c252 c253 ... c348 c349 c350

    Observation equation (in period 50):
    y1(t) = (c449)x1(t) + (c450)x3(t) + (c550)e1(t)

    Time-varying measurement sensitivity matrix C contains unknown parameters:
    c351 c352 c353 ... c498 c499 c500

    Time-varying observation innovation loading matrix D contains unknown parameters:
    c501 c502 c503 ... c598 c599 c600

    Initial state distribution:

    Initial state means are not specified.
    Initial state covariance matrix is not specified.

    State types
        x1       x2         x3          x4
     Diffuse  Diffuse  Stationary  Stationary

    disp(Mdl,'Period',51)

    State-space model type: dssm

    State vector length: Time-varying
    Observation vector length: 1
    State disturbance vector length: Time-varying
    Observation innovation vector length: 1
    Sample size supported by model: 100
    Unknown parameters for estimation: 600

    State variables: x1, x2,...
    State disturbances: u1, u2,...
    Observation series: y1, y2,...
    Observation innovations: e1, e2,...
    Unknown parameters: c1, c2,...

    State equations (in period 51):
    x1(t) = (c151)x1(t-1) + (c152)x2(t-1) + (c301)u1(t)
    x2(t) = x1(t-1)

    Time-varying transition matrix A contains unknown parameters:
    c1 c2 c3 ... c248 c249 c250

    Time-varying state disturbance loading matrix B contains unknown parameters:
    c251 c252 c253 ... c348 c349 c350

    Observation equation (in period 51):
    y1(t) = (c451)x1(t) + (c551)e1(t)

    Time-varying measurement sensitivity matrix C contains unknown parameters:
    c351 c352 c353 ... c498 c499 c500

    Time-varying observation innovation loading matrix D contains unknown parameters:
    c501 c502 c503 ... c598 c599 c600

    Initial state distribution:

    Initial state means are not specified.
    Initial state covariance matrix is not specified.

    State types
        x1       x2         x3          x4
     Diffuse  Diffuse  Stationary  Stationary

    disp(Mdl,'Period',52)

    State-space model type: dssm

    State vector length: Time-varying
    Observation vector length: 1
    State disturbance vector length: Time-varying
    Observation innovation vector length: 1
    Sample size supported by model: 100
    Unknown parameters for estimation: 600

    State variables: x1, x2,...
    State disturbances: u1, u2,...
    Observation series: y1, y2,...
    Observation innovations: e1, e2,...
    Unknown parameters: c1, c2,...

    State equations (in period 52):
    x1(t) = (c153)x1(t-1) + (c154)x2(t-1) + (c302)u1(t)
    x2(t) = x1(t-1)

    Time-varying transition matrix A contains unknown parameters:
    c1 c2 c3 ... c248 c249 c250

    Time-varying state disturbance loading matrix B contains unknown parameters:
    c251 c252 c253 ... c348 c349 c350

    Observation equation (in period 52):
    y1(t) = (c452)x1(t) + (c552)e1(t)

    Time-varying measurement sensitivity matrix C contains unknown parameters:
    c351 c352 c353 ... c498 c499 c500

    Time-varying observation innovation loading matrix D contains unknown parameters:
    c501 c502 c503 ... c598 c599 c600

    Initial state distribution:

    Initial state means are not specified.
    Initial state covariance matrix is not specified.

    State types
        x1       x2         x3          x4
     Diffuse  Diffuse  Stationary  Stationary
The software attributes a different set of coefficients for each period. You might experience numerical issues when you estimate such models. To reuse parameters among groups of periods, consider creating a parameter-to-matrix mapping function.
Tips
• The software always displays explicitly specified state-space models (that is, models you create without using a parameter-to-matrix mapping function). Try explicitly specifying state-space models first so that you can verify them using disp.
• A parameter-to-matrix function that you specify to create Mdl is a black box to the software. Therefore, the software might not display complex, implicitly defined state-space models.

Algorithms
• If you implicitly create Mdl, and if the software cannot infer locations for unknown parameters from the parameter-to-matrix function, then the software evaluates these parameters using their initial values and displays them as numeric values. This evaluation can occur when the parameter-to-matrix function has a random, unknown coefficient, which is a convenient form for a Monte Carlo study.
• The software displays the initial state distributions as numeric values. This type of display occurs because, in many cases, the initial distribution depends on the values of the state equation matrices A and B. These values are often a complicated function of unknown parameters. In such situations, the software does not display the initial distribution symbolically. Additionally, if Mean0 and Cov0 contain unknown parameters, then the software evaluates and displays numeric values for the unknown parameters.
Version History Introduced in R2015b
References [1] Durbin J., and S. J. Koopman. Time Series Analysis by State Space Methods. 2nd ed. Oxford: Oxford University Press, 2012.
See Also dssm | ssm | estimate | filter | smooth | forecast Topics “What Are State-Space Models?” on page 11-3
disp Class: ssm Display summary information for state-space model
Syntax
disp(Mdl)
disp(Mdl,params)
disp( ___ ,Name,Value)

Description
disp(Mdl) displays summary information for the state-space model on page 11-3 (ssm model object) Mdl. The display includes the state and observation equations as a system of scalar equations to facilitate model verification. The display also includes the coefficient dimensionalities, notation, and initial state distribution types.

The software displays unknown parameter values using c1 for the first unknown parameter, c2 for the second unknown parameter, and so on.

For time-varying models with more than 20 different sets of equations, the software displays the first and last 10 groups in terms of time (the last group is the latest).

disp(Mdl,params) displays the ssm model Mdl and applies initial values to the model parameters (params).

disp( ___ ,Name,Value) displays the ssm model with additional options specified by one or more Name,Value pair arguments. For example, you can specify the number of digits to display after the decimal point for model coefficients, or the number of terms per row for state and observation equations. You can use any of the input arguments in the previous syntaxes.

Input Arguments

Mdl — Standard state-space model
ssm model object
Standard state-space model, specified as an ssm model object returned by ssm or estimate.

params — Initial values for unknown parameters
[] (default) | numeric vector
Initial values for unknown parameters, specified as a numeric vector. The elements of params correspond to the unknown parameters in the state-space model matrices A, B, C, and D, and, optionally, the initial state mean Mean0 and covariance matrix Cov0.
• If you created Mdl explicitly (that is, by specifying the matrices without a parameter-to-matrix mapping function), then the software maps the elements of params to NaNs in the state-space model matrices and initial state values. The software searches for NaNs column-wise, following the order A, B, C, D, Mean0, Cov0.
• If you created Mdl implicitly (that is, by specifying the matrices with a parameter-to-matrix mapping function), then you must set initial parameter values for the state-space model matrices, initial state values, and state types within the parameter-to-matrix mapping function.
To set the type of initial state distribution, see ssm.
Data Types: double

Name-Value Pair Arguments
Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter. Before R2021a, use commas to separate each name and value, and enclose Name in quotes.

MaxStateEq — Maximum number of equations to display
100 (default) | positive integer
Maximum number of equations to display, specified as the comma-separated pair consisting of 'MaxStateEq' and a positive integer. If the maximum number of states among all periods is no larger than MaxStateEq, then the software displays the model equation by equation.
Example: 'MaxStateEq',10
Data Types: double

NumDigits — Number of digits to display after decimal point
2 (default) | nonnegative integer
Number of digits to display after the decimal point for known or estimated model coefficients, specified as the comma-separated pair consisting of 'NumDigits' and a nonnegative integer.
Example: 'NumDigits',0
Data Types: double

Period — Period to display state and observation equations
positive integer
Period to display state and observation equations for time-varying state-space models, specified as the comma-separated pair consisting of 'Period' and a positive integer.
By default, the software displays state and observation equations for all periods. If Period exceeds the maximum number of observations that the model supports, then the software displays state and observation equations for all periods. If the model has more than 20 different sets of equations, then the software displays the first and last 10 groups in terms of time (the last group is the latest).
Example: 'Period',120
Data Types: double

PredictorsPerRow — Number of equation terms to display per row
5 (default) | positive integer
Number of equation terms to display per row, specified as the comma-separated pair consisting of 'PredictorsPerRow' and a positive integer.
Example: 'PredictorsPerRow',3
Data Types: double
Examples

Verify Explicitly Created State-Space Model
An important step in state-space model analysis is to ensure that the software interprets the state and observation equation matrices as you intend. Use disp to help you verify the state-space model.

Define a state-space model, where the state equation is an AR(2) model, and the observation equation is the difference between the current and previous state plus the observation error. Symbolically, the state-space model is

$$\begin{bmatrix} x_{1,t} \\ x_{2,t} \\ x_{3,t} \end{bmatrix} = \begin{bmatrix} 0.6 & 0.2 & 0.5 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} x_{1,t-1} \\ x_{2,t-1} \\ x_{3,t-1} \end{bmatrix} + \begin{bmatrix} 0.3 \\ 0 \\ 0 \end{bmatrix} u_{1,t}$$

$$y_t = \begin{bmatrix} 1 & -1 & 0 \end{bmatrix} \begin{bmatrix} x_{1,t} \\ x_{2,t} \\ x_{3,t} \end{bmatrix} + 0.1\varepsilon_t.$$

There are three states: x_{1,t} is the AR(2) process, x_{2,t} represents x_{1,t-1}, and x_{3,t} is the AR(2) model constant.

Define the state-transition matrix.
    A = [0.6 0.2 0.5; 1 0 0; 0 0 1];

Define the state-disturbance-loading matrix.
    B = [0.3; 0; 0];

Define the measurement-sensitivity matrix.
    C = [1 -1 0];

Define the observation-innovation matrix.
    D = 0.1;

Specify the state-space model using ssm. Set the initial-state mean (Mean0) and covariance matrix (Cov0). Identify the type of initial state distributions (StateType) by noting the following:
• x_{1,t} is a stationary, AR(2) process.
• x_{2,t} is also a stationary, AR(2) process.
• x_{3,t} is the constant 1 for all periods.

    Mean0 = [0; 0; 1];                                                  % The mean of the AR(2)
    varAR2 = 0.3*(1 - 0.2)/((1 + 0.2)*((1 - 0.2)^2 - 0.6^2));   % The variance of the AR(2)
    Cov1AR2 = 0.6*0.3/((1 + 0.2)*((1 - 0.2)^2) - 0.6^2);        % The covariance of the AR(2)
    Cov0 = zeros(3);
    Cov0(1:2,1:2) = varAR2*eye(2) + Cov1AR2*flip(eye(2));
    StateType = [0; 0; 1];
    Mdl = ssm(A,B,C,D,'Mean0',Mean0,'Cov0',Cov0,'StateType',StateType);

Mdl is an ssm model.

Verify the state-space model using disp.
    disp(Mdl)

State-space model type: ssm

State vector length: 3
Observation vector length: 1
State disturbance vector length: 1
Observation innovation vector length: 1
Sample size supported by model: Unlimited

State variables: x1, x2,...
State disturbances: u1, u2,...
Observation series: y1, y2,...
Observation innovations: e1, e2,...

State equations:
x1(t) = (0.60)x1(t-1) + (0.20)x2(t-1) + (0.50)x3(t-1) + (0.30)u1(t)
x2(t) = x1(t-1)
x3(t) = x3(t-1)

Observation equation:
y1(t) = x1(t) - x2(t) + (0.10)e1(t)

Initial state distribution:

Initial state means
 x1  x2  x3
  0   0   1

Initial state covariance matrix
      x1    x2    x3
 x1  0.71  0.44    0
 x2  0.44  0.71    0
 x3    0     0     0

State types
     x1          x2         x3
 Stationary  Stationary  Constant
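As a cross-check on hand-computed initial moments, the stationary covariance of the stochastic states (x1 and x2) can also be obtained numerically by solving the discrete Lyapunov equation Sigma = A*Sigma*A' + B*B'. The following is a minimal sketch that uses base MATLAB only. The variable names (As, Bs, Sigma, gamma0, gamma1) are illustrative, and the sketch assumes that the state-disturbance variance is B*B' (that is, 0.3^2), so its result need not coincide with the varAR2 and Cov1AR2 values computed above, which use 0.3 in the numerator.

```matlab
% Stochastic block of the example: states x1 and x2 only (x3 is a constant).
As = [0.6 0.2; 1 0];      % AR(2) companion matrix
Bs = [0.3; 0];            % disturbance loading; assumed variance is Bs*Bs'

% Solve Sigma = As*Sigma*As' + Bs*Bs' using vec(A*S*A') = kron(A,A)*vec(S).
Q = Bs*Bs';
n = size(As,1);
vecSigma = (eye(n^2) - kron(As,As)) \ Q(:);
Sigma = reshape(vecSigma,n,n);

% Closed-form AR(2) moments under the same assumption, for comparison.
phi1 = 0.6; phi2 = 0.2; sigma2 = 0.3^2;
gamma0 = sigma2*(1 - phi2)/((1 + phi2)*((1 - phi2)^2 - phi1^2));  % variance
gamma1 = phi1/(1 - phi2)*gamma0;                                  % lag-1 autocovariance

disp(Sigma)
disp([gamma0 gamma1])
```

The diagonal of Sigma should agree with gamma0 and the off-diagonal with gamma1. Comparing such a numerical solution with the values passed as Cov0 is one way to catch slips in hand-derived formulas before estimating or filtering.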
Display State-Space Model and Initial Values
Define a state-space model containing two independent, autoregressive states, and where the observations are the deterministic sum of the two states. Symbolically, the system of equations is

$$\begin{bmatrix} x_{t,1} \\ x_{t,2} \end{bmatrix} = \begin{bmatrix} \phi_1 & 0 \\ 0 & \phi_2 \end{bmatrix} \begin{bmatrix} x_{t-1,1} \\ x_{t-1,2} \end{bmatrix} + \begin{bmatrix} \sigma_1 & 0 \\ 0 & \sigma_2 \end{bmatrix} \begin{bmatrix} u_{t,1} \\ u_{t,2} \end{bmatrix}$$

$$y_t = \begin{bmatrix} 1 & 1 \end{bmatrix} \begin{bmatrix} x_{t,1} \\ x_{t,2} \end{bmatrix}.$$

Specify the state-transition matrix.
    A = [NaN 0; 0 NaN];

Specify the state-disturbance-loading matrix.
    B = [NaN 0; 0 NaN];

Specify the measurement-sensitivity matrix.
    C = [1 1];

Specify an empty matrix for the observation disturbance matrix.
    D = [];

Use ssm to define the state-space model. Specify the initial state means and covariance matrix as unknown parameters. Specify that the states are stationary.
    Mean0 = nan(2,1);
    Cov0 = nan(2,2);
    StateType = zeros(2,1);
    Mdl = ssm(A,B,C,D,'Mean0',Mean0,'Cov0',Cov0,'StateType',StateType);

Mdl is an ssm model containing unknown parameters.

Use disp to display the state-space model. Specify initial values for the unknown parameters and the initial state means and covariance matrix as follows:
• $\phi_{1,0} = \phi_{2,0} = 0.1$.
• $\sigma_{1,0} = \sigma_{2,0} = 0.2$.
• $x_{1,0} = 1$ and $x_{2,0} = 0.5$.
• $\Sigma_{x_{1,0},x_{2,0}} = I_2$.

    params = [0.1; 0.1; 0.2; 0.2; 1; 0.5; 1; 0; 0; 1];
    disp(Mdl,params)

State-space model type: ssm

State vector length: 2
Observation vector length: 1
State disturbance vector length: 2
Observation innovation vector length: 0
Sample size supported by model: Unlimited
Unknown parameters for estimation: 10

State variables: x1, x2,...
State disturbances: u1, u2,...
Observation series: y1, y2,...
Observation innovations: e1, e2,...
Unknown parameters: c1, c2,...

State equations:
x1(t) = (c1)x1(t-1) + (c3)u1(t)
x2(t) = (c2)x2(t-1) + (c4)u2(t)

Observation equation:
y1(t) = x1(t) + x2(t)

Initial state distribution:

Initial state means
 x1   x2
  1  0.50

Initial state covariance matrix
     x1  x2
 x1   1   0
 x2   0   1

State types
     x1          x2
 Stationary  Stationary
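The correspondence between the elements of params and the NaNs in an explicitly created model can be reproduced by hand. The following is a minimal sketch in base MATLAB, assuming only the documented search rule (column-wise within each matrix, in the order A, B, C, D, Mean0, Cov0). It is not the toolbox's own implementation, and the names mats and next are illustrative.

```matlab
% Matrices and initial values from the example above.
A = [NaN 0; 0 NaN];
B = [NaN 0; 0 NaN];
C = [1 1];
D = [];
Mean0 = nan(2,1);
Cov0 = nan(2,2);
params = [0.1; 0.1; 0.2; 0.2; 1; 0.5; 1; 0; 0; 1];

% Fill the NaNs in the documented order: A, B, C, D, Mean0, Cov0.
mats = {A, B, C, D, Mean0, Cov0};
next = 1;                              % index of the next unused element of params
for k = 1:numel(mats)
    idx = find(isnan(mats{k}));        % linear indices run column-wise
    n = numel(idx);
    if n > 0
        mats{k}(idx) = params(next:next+n-1);
        next = next + n;
    end
end
[A, B, C, D, Mean0, Cov0] = mats{:};

disp(A)       % diagonal holds params(1:2)
disp(B)       % diagonal holds params(3:4)
disp(Mean0)   % params(5:6)
disp(Cov0)    % params(7:10), filled column by column
```

Under this rule, c1 and c2 are the diagonal elements of A, c3 and c4 are the diagonal elements of B, and the remaining six values fill Mean0 and Cov0, which is consistent with the display above.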
Explicitly Create and Display Time-Varying State-Space Model
From periods 1 through 50, the state model is an AR(2) and an MA(1) model, and the observation model is the sum of the two states. From periods 51 through 100, the state model includes the first AR(2) model only.

Symbolically, the state-space model is, for periods 1 through 50,

$$\begin{bmatrix} x_{1,t} \\ x_{2,t} \\ x_{3,t} \\ x_{4,t} \end{bmatrix} = \begin{bmatrix} \phi_1 & \phi_2 & 0 & 0 \\ 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & \theta \\ 0 & 0 & 0 & 0 \end{bmatrix} \begin{bmatrix} x_{1,t-1} \\ x_{2,t-1} \\ x_{3,t-1} \\ x_{4,t-1} \end{bmatrix} + \begin{bmatrix} \sigma_1 & 0 \\ 0 & 0 \\ 0 & 1 \\ 0 & 1 \end{bmatrix} \begin{bmatrix} u_{1,t} \\ u_{3,t} \end{bmatrix},$$

$$y_t = a_1\left(x_{1,t} + x_{3,t}\right) + \sigma_2\varepsilon_t,$$

for period 51,

$$\begin{bmatrix} x_{1,t} \\ x_{2,t} \end{bmatrix} = \begin{bmatrix} \phi_1 & \phi_2 & 0 & 0 \\ 1 & 0 & 0 & 0 \end{bmatrix} \begin{bmatrix} x_{1,t-1} \\ x_{2,t-1} \\ x_{3,t-1} \\ x_{4,t-1} \end{bmatrix} + \begin{bmatrix} \sigma_1 \\ 0 \end{bmatrix} u_{1,t},$$

$$y_t = a_2 x_{1,t} + \sigma_3\varepsilon_t,$$

and for periods 52 through 100,

$$\begin{bmatrix} x_{1,t} \\ x_{2,t} \end{bmatrix} = \begin{bmatrix} \phi_1 & \phi_2 \\ 1 & 0 \end{bmatrix} \begin{bmatrix} x_{1,t-1} \\ x_{2,t-1} \end{bmatrix} + \begin{bmatrix} \sigma_1 \\ 0 \end{bmatrix} u_{1,t},$$

$$y_t = a_2 x_{1,t} + \sigma_3\varepsilon_t.$$

Specify the state-transition coefficient matrix.
    A1 = {[NaN NaN 0 0; 1 0 0 0; 0 0 0 NaN; 0 0 0 0]};
    A2 = {[NaN NaN 0 0; 1 0 0 0]};
    A3 = {[NaN NaN; 1 0]};
    A = [repmat(A1,50,1);A2;repmat(A3,49,1)];

Specify the state-disturbance-loading coefficient matrix.
    B1 = {[NaN 0;0 0; 0 1; 0 1]};
    B2 = {[NaN; 0]};
    B3 = {[NaN; 0]};
    B = [repmat(B1,50,1);B2;repmat(B3,49,1)];

Specify the measurement-sensitivity coefficient matrix.
    C1 = {[NaN 0 NaN 0]};
    C3 = {[NaN 0]};
    C = [repmat(C1,50,1);repmat(C3,50,1)];

Specify the observation-disturbance coefficient matrix.
    D1 = {NaN};
    D3 = {NaN};
    D = [repmat(D1,50,1);repmat(D3,50,1)];

Specify the state-space model. Set the initial state means and covariance matrix to unknown parameters. Specify that the initial state distributions are stationary.
    Mean0 = nan(4,1);
    Cov0 = nan(4,4);
    StateType = [0; 0; 0; 0];
    Mdl = ssm(A,B,C,D,'Mean0',Mean0,'Cov0',Cov0,'StateType',StateType);
Mdl is an ssm model.

The model is large and contains a different set of parameters for each period. The software displays state and observation equations for the first 10 and last 10 periods. You can choose which periods to display the equations for using the 'Period' name-value pair argument.

Display the state-space model, and use 'Period' to display the state and observation equations for the 50th, 51st, and 52nd periods.
    disp(Mdl,'Period',50)

State-space model type: ssm

State vector length: Time-varying
Observation vector length: 1
State disturbance vector length: Time-varying
Observation innovation vector length: 1
Sample size supported by model: 100
Unknown parameters for estimation: 620

State variables: x1, x2,...
State disturbances: u1, u2,...
Observation series: y1, y2,...
Observation innovations: e1, e2,...
Unknown parameters: c1, c2,...

State equations (in period 50):
x1(t) = (c148)x1(t-1) + (c149)x2(t-1) + (c300)u1(t)
x2(t) = x1(t-1)
x3(t) = (c150)x4(t-1) + u2(t)
x4(t) = u2(t)

Time-varying transition matrix A contains unknown parameters:
c1 c2 c3 c4 c5 c6 c7 c8 c9 c10 c11 c12 c13 c14 c15 c16 c17 c18 c19 c20
c21 c22 c23 c24 c25 c26 c27 c28 c29 c30 c31 c32 c33 c34 c35 c36 c37 c38 c39 c40
c41 c42 c43 c44 c45 c46 c47 c48 c49 c50 c51 c52 c53 c54 c55 c56 c57 c58 c59 c60
c61 c62 c63 c64 c65 c66 c67 c68 c69 c70 c71 c72 c73 c74 c75 c76 c77 c78 c79 c80
c81 c82 c83 c84 c85 c86 c87 c88 c89 c90 c91 c92 c93 c94 c95 c96 c97 c98 c99 c100
c101 c102 c103 c104 c105 c106 c107 c108 c109 c110 c111 c112 c113 c114 c115 c116 c117 c118 c119 c120
c121 c122 c123 c124 c125 c126 c127 c128 c129 c130 c131 c132 c133 c134 c135 c136 c137 c138 c139 c140
c141 c142 c143 c144 c145 c146 c147 c148 c149 c150 c151 c152 c153 c154 c155 c156 c157 c158 c159 c160
c161 c162 c163 c164 c165 c166 c167 c168 c169 c170 c171 c172 c173 c174 c175 c176 c177 c178 c179 c180
c181 c182 c183 c184 c185 c186 c187 c188 c189 c190 c191 c192 c193 c194 c195 c196 c197 c198 c199 c200
c201 c202 c203 c204 c205 c206 c207 c208 c209 c210 c211 c212 c213 c214 c215 c216 c217 c218 c219 c220
c221 c222 c223 c224 c225 c226 c227 c228 c229 c230 c231 c232 c233 c234 c235 c236 c237 c238 c239 c240
c241 c242 c243 c244 c245 c246 c247 c248 c249 c250

Time-varying state disturbance loading matrix B contains unknown parameters:
c251 c252 c253 c254 c255 c256 c257 c258 c259 c260 c261 c262 c263 c264 c265 c266 c267 c268 c269 c270
c271 c272 c273 c274 c275 c276 c277 c278 c279 c280 c281 c282 c283 c284 c285 c286 c287 c288 c289 c290
c291 c292 c293 c294 c295 c296 c297 c298 c299 c300 c301 c302 c303 c304 c305 c306 c307 c308 c309 c310
c311 c312 c313 c314 c315 c316 c317 c318 c319 c320 c321 c322 c323 c324 c325 c326 c327 c328 c329 c330
c331 c332 c333 c334 c335 c336 c337 c338 c339 c340 c341 c342 c343 c344 c345 c346 c347 c348 c349 c350

Observation equation (in period 50):
y1(t) = (c449)x1(t) + (c450)x3(t) + (c550)e1(t)

Time-varying measurement sensitivity matrix C contains unknown parameters:
c351 c352 c353 c354 c355 c356 c357 c358 c359 c360 c361 c362 c363 c364 c365 c366 c367 c368 c369 c370
c371 c372 c373 c374 c375 c376 c377 c378 c379 c380 c381 c382 c383 c384 c385 c386 c387 c388 c389 c390
c391 c392 c393 c394 c395 c396 c397 c398 c399 c400 c401 c402 c403 c404 c405 c406 c407 c408 c409 c410
c411 c412 c413 c414 c415 c416 c417 c418 c419 c420 c421 c422 c423 c424 c425 c426 c427 c428 c429 c430
c431 c432 c433 c434 c435 c436 c437 c438 c439 c440 c441 c442 c443 c444 c445 c446 c447 c448 c449 c450
c451 c452 c453 c454 c455 c456 c457 c458 c459 c460 c461 c462 c463 c464 c465 c466 c467 c468 c469 c470
c471 c472 c473 c474 c475 c476 c477 c478 c479 c480 c481 c482 c483 c484 c485 c486 c487 c488 c489 c490
c491 c492 c493 c494 c495 c496 c497 c498 c499 c500

Time-varying observation innovation loading matrix D contains unknown parameters:
c501 c502 c503 c504 c505 c506 c507 c508 c509 c510 c511 c512 c513 c514 c515 c516 c517 c518 c519 c520
c521 c522 c523 c524 c525 c526 c527 c528 c529 c530 c531 c532 c533 c534 c535 c536 c537 c538 c539 c540
c541 c542 c543 c544 c545 c546 c547 c548 c549 c550 c551 c552 c553 c554 c555 c556 c557 c558 c559 c560
c561 c562 c563 c564 c565 c566 c567 c568 c569 c570 c571 c572 c573 c574 c575 c576 c577 c578 c579 c580
c581 c582 c583 c584 c585 c586 c587 c588 c589 c590 c591 c592 c593 c594 c595 c596 c597 c598 c599 c600

Initial state distribution:

Initial state means
 x1   x2   x3   x4
 NaN  NaN  NaN  NaN

Initial state covariance matrix
     x1   x2   x3   x4
 x1  NaN  NaN  NaN  NaN
 x2  NaN  NaN  NaN  NaN
 x3  NaN  NaN  NaN  NaN
 x4  NaN  NaN  NaN  NaN

State types
     x1          x2          x3          x4
 Stationary  Stationary  Stationary  Stationary
    disp(Mdl,'Period',51)

State-space model type: ssm

State vector length: Time-varying
Observation vector length: 1
State disturbance vector length: Time-varying
Observation innovation vector length: 1
Sample size supported by model: 100
Unknown parameters for estimation: 620

State variables: x1, x2,...
State disturbances: u1, u2,...
Observation series: y1, y2,...
Observation innovations: e1, e2,...
Unknown parameters: c1, c2,...

State equations (in period 51):
x1(t) = (c151)x1(t-1) + (c152)x2(t-1) + (c301)u1(t)
x2(t) = x1(t-1)

Time-varying transition matrix A contains unknown parameters:
c1 c2 c3 c4 c5 c6 c7 c8 c9 c10 c11 c12 c13 c14 c15 c16 c17 c18 c19 c20
c21 c22 c23 c24 c25 c26 c27 c28 c29 c30 c31 c32 c33 c34 c35 c36 c37 c38 c39 c40
c41 c42 c43 c44 c45 c46 c47 c48 c49 c50 c51 c52 c53 c54 c55 c56 c57 c58 c59 c60
c61 c62 c63 c64 c65 c66 c67 c68 c69 c70 c71 c72 c73 c74 c75 c76 c77 c78 c79 c80
c81 c82 c83 c84 c85 c86 c87 c88 c89 c90 c91 c92 c93 c94 c95 c96 c97 c98 c99 c100
c101 c102 c103 c104 c105 c106 c107 c108 c109 c110 c111 c112 c113 c114 c115 c116 c117 c118 c119 c120
c121 c122 c123 c124 c125 c126 c127 c128 c129 c130 c131 c132 c133 c134 c135 c136 c137 c138 c139 c140
c141 c142 c143 c144 c145 c146 c147 c148 c149 c150 c151 c152 c153 c154 c155 c156 c157 c158 c159 c160
c161 c162 c163 c164 c165 c166 c167 c168 c169 c170 c171 c172 c173 c174 c175 c176 c177 c178 c179 c180
c181 c182 c183 c184 c185 c186 c187 c188 c189 c190 c191 c192 c193 c194 c195 c196 c197 c198 c199 c200
c201 c202 c203 c204 c205 c206 c207 c208 c209 c210 c211 c212 c213 c214 c215 c216 c217 c218 c219 c220
c221 c222 c223 c224 c225 c226 c227 c228 c229 c230 c231 c232 c233 c234 c235 c236 c237 c238 c239 c240
c241 c242 c243 c244 c245 c246 c247 c248 c249 c250

Time-varying state disturbance loading matrix B contains unknown parameters:
c251 c252 c253 c254 c255 c256 c257 c258 c259 c260 c261 c262 c263 c264 c265 c266 c267 c268 c269 c270
c271 c272 c273 c274 c275 c276 c277 c278 c279 c280 c281 c282 c283 c284 c285 c286 c287 c288 c289 c290
c291 c292 c293 c294 c295 c296 c297 c298 c299 c300 c301 c302 c303 c304 c305 c306 c307 c308 c309 c310
c311 c312 c313 c314 c315 c316 c317 c318 c319 c320 c321 c322 c323 c324 c325 c326 c327 c328 c329 c330
c331 c332 c333 c334 c335 c336 c337 c338 c339 c340 c341 c342 c343 c344 c345 c346 c347 c348 c349 c350

Observation equation (in period 51):
y1(t) = (c451)x1(t) + (c551)e1(t)

Time-varying measurement sensitivity matrix C contains unknown parameters:
c351 c352 c353 c354 c355 c356 c357 c358 c359 c360 c361 c362 c363 c364 c365 c366 c367 c368 c369 c370
c371 c372 c373 c374 c375 c376 c377 c378 c379 c380 c381 c382 c383 c384 c385 c386 c387 c388 c389 c390
c391 c392 c393 c394 c395 c396 c397 c398 c399 c400 c401 c402 c403 c404 c405 c406 c407 c408 c409 c410
c411 c412 c413 c414 c415 c416 c417 c418 c419 c420 c421 c422 c423 c424 c425 c426 c427 c428 c429 c430
c431 c432 c433 c434 c435 c436 c437 c438 c439 c440 c441 c442 c443 c444 c445 c446 c447 c448 c449 c450
c451 c452 c453 c454 c455 c456 c457 c458 c459 c460 c461 c462 c463 c464 c465 c466 c467 c468 c469 c470
c471 c472 c473 c474 c475 c476 c477 c478 c479 c480 c481 c482 c483 c484 c485 c486 c487 c488 c489 c490
c491 c492 c493 c494 c495 c496 c497 c498 c499 c500

Time-varying observation innovation loading matrix D contains unknown parameters:
c501 c502 c503 c504 c505 c506 c507 c508 c509 c510 c511 c512 c513 c514 c515 c516 c517 c518 c519 c520
c521 c522 c523 c524 c525 c526 c527 c528 c529 c530 c531 c532 c533 c534 c535 c536 c537 c538 c539 c540
c541 c542 c543 c544 c545 c546 c547 c548 c549 c550 c551 c552 c553 c554 c555 c556 c557 c558 c559 c560
c561 c562 c563 c564 c565 c566 c567 c568 c569 c570 c571 c572 c573 c574 c575 c576 c577 c578 c579 c580
c581 c582 c583 c584 c585 c586 c587 c588 c589 c590 c591 c592 c593 c594 c595 c596 c597 c598 c599 c600

Initial state distribution:

Initial state means
 x1   x2   x3   x4
 NaN  NaN  NaN  NaN

Initial state covariance matrix
     x1   x2   x3   x4
 x1  NaN  NaN  NaN  NaN
 x2  NaN  NaN  NaN  NaN
 x3  NaN  NaN  NaN  NaN
 x4  NaN  NaN  NaN  NaN

State types
     x1          x2          x3          x4
 Stationary  Stationary  Stationary  Stationary
    disp(Mdl,'Period',52)

State-space model type: ssm

State vector length: Time-varying
Observation vector length: 1
State disturbance vector length: Time-varying
Observation innovation vector length: 1
Sample size supported by model: 100
Unknown parameters for estimation: 620

State variables: x1, x2,...
State disturbances: u1, u2,...
Observation series: y1, y2,...
Observation innovations: e1, e2,...
Unknown parameters: c1, c2,...

State equations (in period 52):
x1(t) = (c153)x1(t-1) + (c154)x2(t-1) + (c302)u1(t)
x2(t) = x1(t-1)

Time-varying transition matrix A contains unknown parameters:
c1 c2 c3 c4 c5 c6 c7 c8 c9 c10 c11 c12 c13 c14 c15 c16 c17 c18 c19 c20
c21 c22 c23 c24 c25 c26 c27 c28 c29 c30 c31 c32 c33 c34 c35 c36 c37 c38 c39 c40
c41 c42 c43 c44 c45 c46 c47 c48 c49 c50 c51 c52 c53 c54 c55 c56 c57 c58 c59 c60
c61 c62 c63 c64 c65 c66 c67 c68 c69 c70 c71 c72 c73 c74 c75 c76 c77 c78 c79 c80
c81 c82 c83 c84 c85 c86 c87 c88 c89 c90 c91 c92 c93 c94 c95 c96 c97 c98 c99 c100
c101 c102 c103 c104 c105 c106 c107 c108 c109 c110 c111 c112 c113 c114 c115 c116 c117 c118 c119 c120
c121 c122 c123 c124 c125 c126 c127 c128 c129 c130 c131 c132 c133 c134 c135 c136 c137 c138 c139 c140
c141 c142 c143 c144 c145 c146 c147 c148 c149 c150 c151 c152 c153 c154 c155 c156 c157 c158 c159 c160
c161 c162 c163 c164 c165 c166 c167 c168 c169 c170 c171 c172 c173 c174 c175 c176 c177 c178 c179 c180
c181 c182 c183 c184 c185 c186 c187 c188 c189 c190 c191 c192 c193 c194 c195 c196 c197 c198 c199 c200
c201 c202 c203 c204 c205 c206 c207 c208 c209 c210 c211 c212 c213 c214 c215 c216 c217 c218 c219 c220
c221 c222 c223 c224 c225 c226 c227 c228 c229 c230 c231 c232 c233 c234 c235 c236 c237 c238 c239 c240
c241 c242 c243 c244 c245 c246 c247 c248 c249 c250

Time-varying state disturbance loading matrix B contains unknown parameters:
c251 c252 c253 c254 c255 c256 c257 c258 c259 c260 c261 c262 c263 c264 c265 c266 c267 c268 c269 c270
c271 c272 c273 c274 c275 c276 c277 c278 c279 c280 c281 c282 c283 c284 c285 c286 c287 c288 c289 c290
c291 c292 c293 c294 c295 c296 c297 c298 c299 c300 c301 c302 c303 c304 c305 c306 c307 c308 c309 c310
c311 c312 c313 c314 c315 c316 c317 c318 c319 c320 c321 c322 c323 c324 c325 c326 c327 c328 c329 c330
c331 c332 c333 c334 c335 c336 c337 c338 c339 c340 c341 c342 c343 c344 c345 c346 c347 c348 c349 c350

Observation equation (in period 52):
y1(t) = (c452)x1(t) + (c552)e1(t)

Time-varying measurement sensitivity matrix C contains unknown parameters:
c351 c352 c353 c354 c355 c356 c357 c358 c359 c360 c361 c362 c363 c364 c365 c366 c367 c368 c369 c370
c371 c372 c373 c374 c375 c376 c377 c378 c379 c380 c381 c382 c383 c384 c385 c386 c387 c388 c389 c390
c391 c392 c393 c394 c395 c396 c397 c398 c399 c400 c401 c402 c403 c404 c405 c406 c407 c408 c409 c410
c411 c412 c413 c414 c415 c416 c417 c418 c419 c420 c421 c422 c423 c424 c425 c426 c427 c428 c429 c430
c431 c432 c433 c434 c435 c436 c437 c438 c439 c440 c441 c442 c443 c444 c445 c446 c447 c448 c449 c450
c451 c452 c453 c454 c455 c456 c457 c458 c459 c460 c461 c462 c463 c464 c465 c466 c467 c468 c469 c470
c471 c472 c473 c474 c475 c476 c477 c478 c479 c480 c481 c482 c483 c484 c485 c486 c487 c488 c489 c490
c491 c492 c493 c494 c495 c496 c497 c498 c499 c500

Time-varying observation innovation loading matrix D contains unknown parameters:
c501 c502 c503 c504 c505 c506 c507 c508 c509 c510 c511 c512 c513 c514 c515 c516 c517 c518 c519 c520
c521 c522 c523 c524 c525 c526 c527 c528 c529 c530 c531 c532 c533 c534 c535 c536 c537 c538 c539 c540
c541 c542 c543 c544 c545 c546 c547 c548 c549 c550 c551 c552 c553 c554 c555 c556 c557 c558 c559 c560
c561 c562 c563 c564 c565 c566 c567 c568 c569 c570 c571 c572 c573 c574 c575 c576 c577 c578 c579 c580
c581 c582 c583 c584 c585 c586 c587 c588 c589 c590 c591 c592 c593 c594 c595 c596 c597 c598 c599 c600

Initial state distribution:

Initial state means
 x1   x2   x3   x4
 NaN  NaN  NaN  NaN

Initial state covariance matrix
     x1   x2   x3   x4
 x1  NaN  NaN  NaN  NaN
 x2  NaN  NaN  NaN  NaN
 x3  NaN  NaN  NaN  NaN
 x4  NaN  NaN  NaN  NaN

State types
     x1          x2          x3          x4
 Stationary  Stationary  Stationary  Stationary
The software attributes a different set of coefficients for each period. You might experience numerical issues when you estimate such models. To reuse parameters among groups of periods, consider creating a parameter-to-matrix mapping function.
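The parameter count and the parameter labels reported in these displays can be reproduced from the coefficient cell arrays alone. The following is a minimal sketch in base MATLAB. It assumes that the unknown parameters are numbered matrix by matrix (A, B, C, D, Mean0, Cov0) and, within each cell array, period by period, which is consistent with the labels shown above. The helper nanPerPeriod and the other variable names are illustrative.

```matlab
% Coefficient cell arrays from the time-varying example.
A1 = {[NaN NaN 0 0; 1 0 0 0; 0 0 0 NaN; 0 0 0 0]};
A2 = {[NaN NaN 0 0; 1 0 0 0]};
A3 = {[NaN NaN; 1 0]};
A = [repmat(A1,50,1); A2; repmat(A3,49,1)];
B = [repmat({[NaN 0; 0 0; 0 1; 0 1]},50,1); repmat({[NaN; 0]},50,1)];
C = [repmat({[NaN 0 NaN 0]},50,1); repmat({[NaN 0]},50,1)];
D = repmat({NaN},100,1);
Mean0 = nan(4,1);
Cov0 = nan(4,4);

% Number of NaNs in each period's matrix (illustrative helper).
nanPerPeriod = @(M) cellfun(@(x) nnz(isnan(x)),M);
nA = nanPerPeriod(A);
nB = nanPerPeriod(B);
nC = nanPerPeriod(C);
nD = nanPerPeriod(D);

% Unknown parameters contributed by A, B, C, D, Mean0, and Cov0.
counts = [sum(nA) sum(nB) sum(nC) sum(nD) nnz(isnan(Mean0)) nnz(isnan(Cov0))];
total = sum(counts);          % 620, as reported by disp

% Labels of the parameters of A in period 52 under the assumed numbering.
lastA = cumsum(nA);
firstA = lastA - nA + 1;
t = 52;
idxA52 = firstA(t):lastA(t);  % [153 154], matching c153 and c154 above
```

Because every period receives its own parameters, the count grows with the sample size, which is one reason to share parameters across periods through a parameter-to-matrix mapping function.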
Tips
• The software always displays explicitly specified state-space models (that is, models you create without using a parameter-to-matrix mapping function). Try explicitly specifying state-space models first so that you can verify them using disp.
• A parameter-to-matrix function that you specify to create Mdl is a black box to the software. Therefore, the software might not display complex, implicitly defined state-space models.

Algorithms
• If you implicitly create Mdl, and if the software cannot infer locations for unknown parameters from the parameter-to-matrix function, then the software evaluates these parameters using their initial values and displays them as numeric values. This evaluation can occur when the parameter-to-matrix function has a random, unknown coefficient, which is a convenient form for a Monte Carlo study.
• The software displays the initial state distributions as numeric values. This type of display occurs because, in many cases, the initial distribution depends on the values of the state equation matrices A and B. These values are often a complicated function of unknown parameters. In such situations, the software does not display the initial distribution symbolically. Additionally, if Mean0 and Cov0 contain unknown parameters, then the software evaluates and displays numeric values for the unknown parameters.
References [1] Durbin J., and S. J. Koopman. Time Series Analysis by State Space Methods. 2nd ed. Oxford: Oxford University Press, 2012.
See Also ssm | estimate | filter | smooth | forecast | simulate Topics “Create State-Space Model with Random State Coefficient” on page 11-31 “What Are State-Space Models?” on page 11-3
dssm class Create diffuse state-space model
Description
dssm creates a linear diffuse state-space model on page 11-4 with independent Gaussian state disturbances and observation innovations. A diffuse state-space model contains diffuse states, and variances of the initial distributions of diffuse states are Inf. All diffuse states are independent of each other and all other states. The software implements the diffuse Kalman filter for filtering, smoothing, and parameter estimation.

You can:
• Specify a time-invariant on page 11-4 or time-varying on page 11-5 model.
• Specify whether states are stationary, static on page 12-468, or nonstationary.
• Specify the state-transition, state-disturbance-loading, measurement-sensitivity, or observation-innovation matrices:
  • Explicitly by providing the matrices
  • Implicitly by providing a function that maps the parameters to the matrices, that is, a parameter-to-matrix mapping function

After creating a diffuse state-space model containing unknown parameters, you can estimate its parameters by passing the created dssm model object and data to estimate. The estimate function builds the likelihood function using the diffuse Kalman filter.

Use a completely specified model (that is, all parameter values of the model are known) to:
• Filter or smooth states using filter or smooth, respectively. These functions apply the diffuse Kalman filter and data to the state-space model.
• Forecast states or observations using forecast.
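For orientation, the four coefficient matrices named above enter the model as follows. This is a sketch in the notation of this page, assuming no regression component in the observation equation; the symbols $u_t$ and $\varepsilon_t$ denote the state disturbances and observation innovations, taken here to be standard Gaussian.

```latex
\begin{aligned}
x_t &= A_t x_{t-1} + B_t u_t, & u_t &\sim N(0, I),\\
y_t &= C_t x_t + D_t \varepsilon_t, & \varepsilon_t &\sim N(0, I).
\end{aligned}
```

For a time-invariant model, the subscript t on A, B, C, and D can be dropped.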
Construction Mdl = dssm(A,B,C) creates a diffuse state-space model on page 11-4 (Mdl) using state-transition matrix A, state-disturbance-loading matrix B, and measurement-sensitivity matrix C. Mdl = dssm(A,B,C,D) creates a diffuse state-space model using state-transition matrix A, statedisturbance-loading matrix B, measurement-sensitivity matrix C, and observation-innovation matrix D. Mdl = dssm( ___ ,Name,Value) uses any of the input arguments in the previous syntaxes and additional options that you specify by one or more Name,Value pair arguments. Mdl = dssm(ParamMap) creates a diffuse state-space model using a parameter-to-matrix mapping function (ParamMap) that you write. The function maps a vector of parameters to the matrices A, B, and C. Optionally, ParamMap can map parameters to D, Mean0, Cov0. To specify the types of states, the function can return StateType. To accommodate a regression component in the observation equation, ParamMap can also return deflated observation data. 12-448
dssm class
Mdl = dssm(SSMMdl) converts a state-space model object (SSMMdl) to a diffuse state-space model object (Mdl). dssm sets all initial variances of diffuse states in SSMMdl.Cov0 to Inf. Input Arguments A — State-transition coefficient matrix matrix | cell vector of matrices State-transition coefficient matrix for explicit state-space model creation, specified as a matrix or cell vector of matrices. The state-transition coefficient matrix, At, specifies how the states, xt, are expected to transition from period t – 1 to t, for all t = 1,...,T. That is, the expected state-transition equation at period t is E(xt|xt–1) = Atxt–1. For time-invariant state-space models, specify A as an m-by-m matrix, where m is the number of states per period. For time-varying state-space models, specify A as a T-dimensional cell array, where A{t} contains an mt-by-mt – 1 state-transition coefficient matrix. If the number of states changes from period t – 1 to t, then mt ≠ mt – 1. NaN values in any coefficient matrix indicate unique, unknown parameters in the state-space model. A contributes: • sum(isnan(A(:))) unknown parameters to time-invariant state-space models. In other words, if the state-space model is time invariant, then the software uses the same unknown parameters defined in A at each period. • numParamsA unknown parameters to time-varying state-space models, where numParamsA = sum(cell2mat(cellfun(@(x)sum(sum(isnan(x))),A,'UniformOutput',0))). In other words, if the state-space model is time varying, then the software assigns a new set of parameters for each matrix in A. You cannot specify A and ParamMap simultaneously. Data Types: double | cell B — State-disturbance-loading coefficient matrix matrix | cell vector of matrices State-disturbance-loading coefficient matrix for explicit state-space model creation, specified as a matrix or cell vector of matrices. The state disturbances, ut, are independent Gaussian random variables with mean 0 and standard deviation 1. 
The state-disturbance-loading coefficient matrix, Bt, specifies the additive error structure in the state-transition equation from period t – 1 to t, for all t = 1,...,T. That is, the state-transition equation at period t is xt = Atxt–1 + Btut. For time-invariant state-space models, specify B as an m-by-k matrix, where m is the number of states and k is the number of state disturbances per period. B*B' is the state-disturbance covariance matrix for all periods. For time-varying state-space models, specify B as a T-dimensional cell array, where B{t} contains an mt-by-kt state-disturbance-loading coefficient matrix. If the number of states or state disturbances
changes at period t, then the matrix dimensions between B{t-1} and B{t} vary. B{t}*B{t}' is the state-disturbance covariance matrix for period t. NaN values in any coefficient matrix indicate unique, unknown parameters in the state-space model. B contributes: • sum(isnan(B(:))) unknown parameters to time-invariant state-space models. In other words, if the state-space model is time invariant, then the software uses the same unknown parameters defined in B at each period. • numParamsB unknown parameters to time-varying state-space models, where numParamsB = sum(cell2mat(cellfun(@(x)sum(sum(isnan(x))),B,'UniformOutput',0))). In other words, if the state-space model is time varying, then the software assigns a new set of parameters for each matrix in B. You cannot specify B and ParamMap simultaneously. Data Types: double | cell C — Measurement-sensitivity coefficient matrix matrix | cell vector of matrices Measurement-sensitivity coefficient matrix for explicit state-space model creation, specified as a matrix or cell vector of matrices. The measurement-sensitivity coefficient matrix, Ct, specifies how the states are expected to linearly combine at period t to form the observations, yt, for all t = 1,...,T. That is, the expected observation equation at period t is E(yt|xt) = Ctxt. For time-invariant state-space models, specify C as an n-by-m matrix, where n is the number of observations and m is the number of states per period. For time-varying state-space models, specify C as a T-dimensional cell array, where C{t} contains an nt-by-mt measurement-sensitivity coefficient matrix. If the number of states or observations changes at period t, then the matrix dimensions between C{t-1} and C{t} vary. NaN values in any coefficient matrix indicate unique, unknown parameters in the state-space model. C contributes: • sum(isnan(C(:))) unknown parameters to time-invariant state-space models. 
In other words, if the state-space model is time invariant, then the software uses the same unknown parameters defined in C at each period.
• numParamsC unknown parameters to time-varying state-space models, where numParamsC = sum(cell2mat(cellfun(@(x)sum(sum(isnan(x))),C,'UniformOutput',0))). In other words, if the state-space model is time varying, then the software assigns a new set of parameters for each matrix in C.
You cannot specify C and ParamMap simultaneously.
Data Types: double | cell
D — Observation-innovation coefficient matrix
[] (default) | matrix | cell vector of matrices
Observation-innovation coefficient matrix for explicit state-space model creation, specified as a matrix or cell vector of matrices.
The observation innovations, εt, are independent Gaussian random variables with mean 0 and standard deviation 1.
The observation-innovation coefficient matrix, Dt, specifies the additive error structure in the observation equation at period t, for all t = 1,...,T. That is, the observation equation at period t is yt = Ctxt + Dtεt.
For time-invariant state-space models, specify D as an n-by-h matrix, where n is the number of observations and h is the number of observation innovations per period. D*D' is the observation-innovation covariance matrix for all periods.
For time-varying state-space models, specify D as a T-dimensional cell array, where D{t} contains an nt-by-ht matrix. If the number of observations or observation innovations changes at period t, then the matrix dimensions between D{t-1} and D{t} vary. D{t}*D{t}' is the observation-innovation covariance matrix for period t.
NaN values in any coefficient matrix indicate unique, unknown parameters in the state-space model. D contributes:
• sum(isnan(D(:))) unknown parameters to time-invariant state-space models. In other words, if the state-space model is time invariant, then the software uses the same unknown parameters defined in D at each period.
• numParamsD unknown parameters to time-varying state-space models, where numParamsD = sum(cell2mat(cellfun(@(x)sum(sum(isnan(x))),D,'UniformOutput',0))). In other words, if the state-space model is time varying, then the software assigns a new set of parameters for each matrix in D.
By default, D is an empty matrix indicating no observation innovations in the state-space model.
You cannot specify D and ParamMap simultaneously.
Data Types: double | cell
ParamMap — Parameter-to-matrix mapping function
empty array ([]) (default) | function handle
Parameter-to-matrix mapping function for implicit state-space model creation, specified as a function handle. ParamMap must be a function that takes at least one input argument and returns at least three output arguments.
The requisite input argument is a vector of unknown parameters, and the requisite output arguments correspond to the coefficient matrices A, B, and C, respectively. If your parameter-to-matrix mapping function requires the input parameter vector argument only, then implicitly create a diffuse state-space model by entering the following:
Mdl = dssm(@ParamMap)
In general, you can write an intermediate function, for example, ParamFun, using this syntax:
function [A,B,C,D,Mean0,Cov0,StateType,DeflateY] = ...
    ParamFun(params,...otherInputArgs...)
In this general case, create the diffuse state-space model by entering
Mdl = dssm(@(params)ParamFun(params,...otherInputArgs...))
However:
• Follow the order of the output arguments.
• params is a vector, and each element corresponds to an unknown parameter.
• ParamFun must return A, B, and C, which correspond to the state-transition, state-disturbance-loading, and measurement-sensitivity coefficient matrices, respectively.
• If you specify more input arguments than the parameter vector (params), such as observed responses and predictors, then implicitly create the diffuse state-space model using the syntax pattern
Mdl = dssm(@(params)ParamFun(params,y,z))
• For the optional output arguments D, Mean0, Cov0, StateType, and DeflateY:
  • The optional output arguments correspond to the observation-innovation coefficient matrix D and the name-value pair arguments Mean0, Cov0, and StateType.
  • To skip specifying an optional output argument, set the argument to [] in the function body. For example, to skip specifying D, set D = []; in the function.
  • DeflateY is the deflated-observation data, which accommodates a regression component in the observation equation. For example, in this function, which has a linear regression component, Y is the vector of observed responses and Z is the vector of predictor data.
function [A,B,C,D,Mean0,Cov0,StateType,DeflateY] = ParamFun(params,Y,Z)
    ...
    DeflateY = Y - params(9) - params(10)*Z;
    ...
end
• For the default values of Mean0, Cov0, and StateType, see “Algorithms” on page 12-2194. • It is best practice to: • Load the data to the MATLAB Workspace before specifying the model. • Create the parameter-to-matrix mapping function as its own file. If you specify ParamMap, then you cannot specify any name-value pair arguments or any other input arguments. Data Types: function_handle SSMMdl — State-space model ssm model object State-space model to convert to a diffuse state-space model, specified as an ssm model object. dssm sets all initial variances of diffuse states in SSMMdl.Cov0 to Inf. To use the diffuse Kalman filter for filtering, smoothing, and parameter estimation instead of the standard Kalman filter, convert a state-space model to a diffuse state-space model. Name-Value Pair Arguments
Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter. Before R2021a, use commas to separate each name and value, and enclose Name in quotes.
Mean0 — Initial state mean
numeric vector
Initial state mean for explicit state-space model creation, specified as the comma-separated pair consisting of 'Mean0' and a numeric vector with length equal to the number of initial states. For the default values, see “Algorithms” on page 12-2194.
If you specify ParamMap, then you cannot specify Mean0. Instead, specify the initial state mean in the parameter-to-matrix mapping function.
Data Types: double
Cov0 — Initial state covariance matrix
square matrix
Initial state covariance matrix for explicit state-space model creation, specified as the comma-separated pair consisting of 'Cov0' and a square matrix with dimensions equal to the number of initial states. For the default values, see “Algorithms” on page 12-2194.
If you specify ParamMap, then you cannot specify Cov0. Instead, specify the initial state covariance in the parameter-to-matrix mapping function.
Data Types: double
StateType — Initial state distribution indicator
0 | 1 | 2
Initial state distribution indicator for explicit state-space model creation, specified as the comma-separated pair consisting of 'StateType' and a numeric vector with length equal to the number of initial states. This table summarizes the available types of initial state distributions.
Value   Initial State Distribution Type
0       Stationary (for example, ARMA models)
1       The constant 1 (that is, the state is 1 with probability 1)
2       Diffuse or nonstationary (for example, random walk model, seasonal linear time series) or static state on page 12-2193
For example, suppose that the state equation has two state variables: The first state variable is an AR(1) process, and the second state variable is a random walk. Specify the initial distribution types by setting 'StateType',[0; 2].
If you specify ParamMap, then you cannot specify StateType. Instead, specify the initial state distribution indicator in the parameter-to-matrix mapping function. For the default values, see “Algorithms” on page 12-2194.
Data Types: double
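The NaN-counting rules given for the coefficient matrices can be checked in base MATLAB before calling dssm. The following is a minimal sketch; the matrices Ainv and Avar are hypothetical and are not taken from this guide.

```matlab
% Hypothetical time-invariant state-transition matrix with two unknowns.
Ainv = [NaN NaN; 0 1];
numParamsInv = sum(isnan(Ainv(:)));

% Hypothetical time-varying specification over T = 3 periods. Avar{t} is
% m_t-by-m_(t-1); the state dimension grows from 1 to 2 at period 2.
Avar = {NaN, [NaN; 1], [NaN 0; 0 NaN]};
numParamsVar = sum(cell2mat(cellfun(@(x)sum(sum(isnan(x))),Avar,'UniformOutput',0)));

fprintf('Time-invariant A: %d unknown parameters\n',numParamsInv);
fprintf('Time-varying A:   %d unknown parameters\n',numParamsVar);
```

In the time-varying case every NaN in every cell counts separately, which is why the cell array contributes more parameters than any single matrix in it.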
Properties
A — State-transition coefficient matrix
matrix | cell vector of matrices | empty array ([])
State-transition coefficient matrix for explicitly created state-space models, specified as a matrix, a cell vector of matrices, or an empty array ([]). For implicitly created state-space models and before estimation, A is [] and read only. The state-transition coefficient matrix, At, specifies how the states, xt, are expected to transition from period t – 1 to t, for all t = 1,...,T. That is, the expected state-transition equation at period t is E(xt|xt–1) = Atxt–1. For time-invariant state-space models, A is an m-by-m matrix, where m is the number of states per period. For time-varying state-space models, A is a T-dimensional cell array, where A{t} contains an mt-by-mt – 1 state-transition coefficient matrix. If the number of states changes from period t – 1 to t, then mt ≠ mt – 1. NaN values in any coefficient matrix indicate unknown parameters in the state-space model. A contributes: • sum(isnan(A(:))) unknown parameters to time-invariant state-space models. In other words, if the state-space model is time invariant, then the software uses the same unknown parameters defined in A at each period. • numParamsA unknown parameters to time-varying state-space models, where numParamsA = sum(cell2mat(cellfun(@(x)sum(sum(isnan(x))),A,'UniformOutput',0))). In other words, if the state-space model is time varying, then the software assigns a new set of parameters for each matrix in A. Data Types: double | cell B — State-disturbance-loading coefficient matrix matrix | cell vector of matrices | empty array ([]) State-disturbance-loading coefficient matrix for explicitly created state-space models, specified as a matrix, a cell vector of matrices, or an empty array ([]). For implicitly created state-space models and before estimation, B is [] and read only. The state disturbances, ut, are independent Gaussian random variables with mean 0 and standard deviation 1. 
The state-disturbance-loading coefficient matrix, Bt, specifies the additive error structure in the state-transition equation from period t – 1 to t, for all t = 1,...,T. That is, the state-transition equation at period t is xt = Atxt–1 + Btut.
For time-invariant state-space models, B is an m-by-k matrix, where m is the number of states and k is the number of state disturbances per period. B*B' is the state-disturbance covariance matrix for all periods.
For time-varying state-space models, B is a T-dimensional cell array, where B{t} contains an mt-by-kt state-disturbance-loading coefficient matrix. If the number of states or state disturbances changes at period t, then the matrix dimensions between B{t-1} and B{t} vary. B{t}*B{t}' is the state-disturbance covariance matrix for period t.
NaN values in any coefficient matrix indicate unknown parameters in the state-space model. B contributes:
• sum(isnan(B(:))) unknown parameters to time-invariant state-space models. In other words, if the state-space model is time invariant, then the software uses the same unknown parameters defined in B at each period.
• numParamsB unknown parameters to time-varying state-space models, where numParamsB = sum(cell2mat(cellfun(@(x)sum(sum(isnan(x))),B,'UniformOutput',0))). In other words, if the state-space model is time varying, then the software assigns a new set of parameters for each matrix in B. Data Types: double | cell C — Measurement-sensitivity coefficient matrix matrix | cell vector of matrices | empty array ([]) Measurement-sensitivity coefficient matrix for explicitly created state-space models, specified as a matrix, a cell vector of matrices, or an empty array ([]). For implicitly created state-space models and before estimation, C is [] and read only. The measurement-sensitivity coefficient matrix, Ct, specifies how the states are expected to combine linearly at period t to form the observations, yt, for all t = 1,...,T. That is, the expected observation equation at period t is E(yt|xt) = Ctxt. For time-invariant state-space models, C is an n-by-m matrix, where n is the number of observations and m is the number of states per period. For time-varying state-space models, C is a T-dimensional cell array, where C{t} contains an nt-by-mt measurement-sensitivity coefficient matrix. If the number of states or observations changes at period t, then the matrix dimensions between C{t-1} and C{t} vary. NaN values in any coefficient matrix indicate unknown parameters in the state-space model. C contributes: • sum(isnan(C(:))) unknown parameters to time-invariant state-space models. In other words, if the state-space model is time invariant, then the software uses the same unknown parameters defined in C at each period. • numParamsC unknown parameters to time-varying state-space models, where numParamsC = sum(cell2mat(cellfun(@(x)sum(sum(isnan(x))),C,'UniformOutput',0))). In other words, if the state-space model is time varying, then the software assigns a new set of parameters for each matrix in C. 
Data Types: double | cell
D — Observation-innovation coefficient matrix
matrix | cell vector of matrices | empty array ([])
Observation-innovation coefficient matrix for explicitly created state-space models, specified as a matrix, a cell vector of matrices, or an empty array ([]). For implicitly created state-space models and before estimation, D is [] and read only.
The observation innovations, εt, are independent Gaussian random variables with mean 0 and standard deviation 1.
The observation-innovation coefficient matrix, Dt, specifies the additive error structure in the observation equation at period t, for all t = 1,...,T. That is, the observation equation at period t is yt = Ctxt + Dtεt.
For time-invariant state-space models, D is an n-by-h matrix, where n is the number of observations and h is the number of observation innovations per period. D*D' is the observation-innovation covariance matrix for all periods.
For time-varying state-space models, D is a T-dimensional cell array, where D{t} contains an nt-by-ht matrix. If the number of observations or observation innovations changes at period t, then the matrix dimensions between D{t-1} and D{t} vary. D{t}*D{t}' is the observation-innovation covariance matrix for period t.
NaN values in any coefficient matrix indicate unknown parameters in the state-space model. D contributes:
• sum(isnan(D(:))) unknown parameters to time-invariant state-space models. In other words, if the state-space model is time invariant, then the software uses the same unknown parameters defined in D at each period.
• numParamsD unknown parameters to time-varying state-space models, where numParamsD = sum(cell2mat(cellfun(@(x)sum(sum(isnan(x))),D,'UniformOutput',0))). In other words, if the state-space model is time varying, then the software assigns a new set of parameters for each matrix in D.
Data Types: double | cell
Mean0 — Initial state mean
numeric vector | empty array ([])
Initial state mean, specified as a numeric vector or an empty array ([]). Mean0 has length equal to the number of initial states (size(A,1) or size(A{1},1)).
Mean0 is the mean of the Gaussian distribution of the states at period 0.
For implicitly created state-space models and before estimation, Mean0 is [] and read only. However, estimate specifies Mean0 after estimation.
Data Types: double
Cov0 — Initial state covariance matrix
square matrix | empty array ([])
Initial state covariance matrix, specified as a square matrix or an empty array ([]). Cov0 has dimensions equal to the number of initial states (size(A,1) or size(A{1},1)).
Cov0 is the covariance of the Gaussian distribution of the states at period 0.
For implicitly created state-space models and before estimation, Cov0 is [] and read only. However, estimate specifies Cov0 after estimation.
Diagonal elements of Cov0 that have value Inf correspond to diffuse initial state distributions.
This specification indicates complete ignorance or no prior knowledge of the initial state value. Subsequently, the software filters, smooths, and estimates parameters in the presence of diffuse initial state distributions using the diffuse Kalman filter. To use the standard Kalman filter for diffuse states instead, set each diagonal element of Cov0 to a large, positive value, for example, 1e7. This specification suggests relatively weak knowledge of the initial state value.
Data Types: double
StateType — Initial state distribution type
numeric vector | empty array ([])
Initial state distribution indicator, specified as a numeric vector or empty array ([]). StateType has length equal to the number of initial states.
For implicitly created state-space models or models with unknown parameters, StateType is [] and read only. This table summarizes the available types of initial state distributions.
Value   Initial State Distribution Type
0       Stationary (e.g., ARMA models)
1       The constant 1 (that is, the state is 1 with probability 1)
2       Nonstationary (e.g., random walk model, seasonal linear time series) or static state on page 12-448
For example, suppose that the state equation has two state variables: The first state variable is an AR(1) process, and the second state variable is a random walk. Then, StateType is [0; 2]. For nonstationary states, dssm sets Cov0 to Inf by default. Subsequently, the software assumes that diffuse states are uncorrelated and implements the diffuse Kalman filter for filtering, smoothing, and parameter estimation. This specification imposes no prior knowledge on the initial state values of diffuse states. Data Types: double ParamMap — Parameter-to-matrix mapping function function handle | empty array ([]) Parameter-to-matrix mapping function, specified as a function handle or an empty array ([]). ParamMap completely specifies the structure of the state-space model. That is, ParamMap defines A, B, C, D, and, optionally, Mean0, Cov0, and StateType. For explicitly created state-space models, ParamMap is [] and read only. Data Types: function_handle
Object Functions
Fit Model to Data
estimate    Maximum likelihood parameter estimation of diffuse state-space models
refine      Refine initial parameters to aid diffuse state-space model estimation
disp        Display summary information for diffuse state-space model
Estimate State Variables
filter      Forward recursion of diffuse state-space models
smooth      Backward recursion of diffuse state-space models
Impulse Response Function
irf         Impulse response function (IRF) of state-space model
irfplot     Plot impulse response function (IRF) of state-space model
Generate Minimum Mean Square Error Forecasts
forecast    Forecast states and observations of diffuse state-space models
Copy Semantics Value. To learn how value classes affect copy operations, see Copying Objects.
Examples
Explicitly Create Diffuse State-Space Model Containing Known and Unknown Parameters
Create a diffuse state-space model containing two independent states, x1,t and x3,t, and an observation, yt, that is the deterministic sum of the two states at time t. x1,t is an AR(1) model with a constant and x3,t is a random walk. Symbolically, the state-space model is
$$\begin{bmatrix} x_{1,t} \\ x_{2,t} \\ x_{3,t} \end{bmatrix} = \begin{bmatrix} \phi_1 & c_1 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} x_{1,t-1} \\ x_{2,t-1} \\ x_{3,t-1} \end{bmatrix} + \begin{bmatrix} \sigma_1 & 0 \\ 0 & 0 \\ 0 & \sigma_2 \end{bmatrix} \begin{bmatrix} u_{1,t} \\ u_{3,t} \end{bmatrix}$$
$$y_t = \begin{bmatrix} 1 & 0 & 1 \end{bmatrix} \begin{bmatrix} x_{1,t} \\ x_{2,t} \\ x_{3,t} \end{bmatrix}.$$
The state disturbances, u1,t and u3,t, are standard Gaussian random variables.
Specify the state-transition matrix.
A = [NaN NaN 0; 0 1 0; 0 0 1];
The NaN values indicate unknown parameters. Specify the state-disturbance-loading matrix. B = [NaN 0; 0 0; 0 NaN];
Specify the measurement-sensitivity matrix. C = [1 0 1];
Create a vector that specifies the state types. In this example:
• x1,t is a stationary AR(1) model, so its state type is 0.
• x2,t is a placeholder for the constant in the AR(1) model. Because the constant is unknown and is expressed in the first equation, x2,t is 1 for the entire series. Therefore, its state type is 1.
• x3,t is a nonstationary random walk, so its state type is 2.
StateType = [0 1 2];
Create the state-space model using dssm.
Mdl = dssm(A,B,C,'StateType',StateType)
Mdl = 
State-space model type: dssm

State vector length: 3
Observation vector length: 1
State disturbance vector length: 2
Observation innovation vector length: 0
Sample size supported by model: Unlimited
Unknown parameters for estimation: 4

State variables: x1, x2,...
State disturbances: u1, u2,...
Observation series: y1, y2,...
Observation innovations: e1, e2,...
Unknown parameters: c1, c2,...

State equations:
x1(t) = (c1)x1(t-1) + (c2)x2(t-1) + (c3)u1(t)
x2(t) = x2(t-1)
x3(t) = x3(t-1) + (c4)u2(t)

Observation equation:
y1(t) = x1(t) + x3(t)

Initial state distribution:

Initial state means are not specified.
Initial state covariance matrix is not specified.
State types
     x1         x2        x3   
 Stationary  Constant  Diffuse 
Mdl is a dssm model object containing unknown parameters. A detailed summary of Mdl prints to the Command Window. If you do not specify the initial state covariance matrix, then the initial variance of x3, t is Inf. It is good practice to verify that the state and observation equations are correct. If the equations are not correct, then expand the state-space equation and verify it manually.
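One way to verify the equations manually is to count the NaN entries and then step the state equation forward once with trial values. The following base MATLAB sketch does both; the trial parameter values, the previous state xPrev, and the disturbance draw u are hypothetical and chosen only for illustration.

```matlab
A = [NaN NaN 0; 0 1 0; 0 0 1];
B = [NaN 0; 0 0; 0 NaN];
C = [1 0 1];

% Total number of unknown parameters; compare with the model summary.
numParams = sum(isnan(A(:))) + sum(isnan(B(:))) + sum(isnan(C(:)));

% Substitute hypothetical values for c1,...,c4 (column-major NaN order).
Atrial = A;  Atrial(isnan(A)) = [0.5; 1];    % c1 = 0.5, c2 = 1
Btrial = B;  Btrial(isnan(B)) = [0.3; 0.1];  % c3 = 0.3, c4 = 0.1

% Advance the state one period and form the observation.
xPrev = [2; 1; 5];       % hypothetical previous state; x2 is the constant 1
u = [1; -1];             % hypothetical disturbance draw
xNext = Atrial*xPrev + Btrial*u;
yNext = C*xNext;

fprintf('Unknown parameters: %d\n',numParams);
disp(xNext')
disp(yNext)
```

The second element of xNext stays at 1, which is what the constant placeholder state requires, and yNext is the sum of the first and third states.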
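Explicitly Create Diffuse State-Space Model Containing Observation Error
Create a diffuse state-space model containing two random walk states. The observations are the sum of the two states, plus Gaussian error. Symbolically, the equation is
$$\begin{bmatrix} x_{t,1} \\ x_{t,2} \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} \begin{bmatrix} x_{t-1,1} \\ x_{t-1,2} \end{bmatrix} + \begin{bmatrix} \sigma_1 & 0 \\ 0 & \sigma_2 \end{bmatrix} \begin{bmatrix} u_{t,1} \\ u_{t,2} \end{bmatrix}$$
$$y_t = \begin{bmatrix} 1 & 1 \end{bmatrix} \begin{bmatrix} x_{t,1} \\ x_{t,2} \end{bmatrix} + \sigma_3 \varepsilon_t.$$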
Define the state-transition matrix. A = [1 0; 0 1];
Define the state-disturbance-loading matrix.
B = [NaN 0; 0 NaN];
Define the measurement-sensitivity matrix. C = [1 1];
Define the observation-innovation matrix. D = NaN;
Create a vector that specifies that both states are nonstationary. StateType = [2; 2];
Create the state-space model using dssm.
Mdl = dssm(A,B,C,D,'StateType',StateType)
Mdl = 
State-space model type: dssm

State vector length: 2
Observation vector length: 1
State disturbance vector length: 2
Observation innovation vector length: 1
Sample size supported by model: Unlimited
Unknown parameters for estimation: 3

State variables: x1, x2,...
State disturbances: u1, u2,...
Observation series: y1, y2,...
Observation innovations: e1, e2,...
Unknown parameters: c1, c2,...

State equations:
x1(t) = x1(t-1) + (c1)u1(t)
x2(t) = x2(t-1) + (c2)u2(t)

Observation equation:
y1(t) = x1(t) + x2(t) + (c3)e1(t)

Initial state distribution:

Initial state means are not specified.
Initial state covariance matrix is not specified.
State types
    x1       x2   
 Diffuse  Diffuse 
Mdl is a dssm model object containing unknown parameters. A detailed summary of Mdl prints to the Command Window. Pass the data and Mdl to estimate to estimate the parameters. During estimation, the initial state variances are Inf, and estimate implements the diffuse Kalman filter.
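As a rough illustration of that last step, the following sketch generates data from the model in base MATLAB and then passes it to estimate. The disturbance scales in sigma, the sample size T, and the starting values params0 are hypothetical. The estimation part runs only if dssm is found on the path.

```matlab
rng(1);                               % for reproducibility
T = 200;                              % hypothetical sample size
sigma = [0.5 0.2 0.1];                % hypothetical sigma1, sigma2, sigma3

% Two independent random walks and their noisy sum.
x = cumsum(randn(T,2).*sigma(1:2));
y = sum(x,2) + sigma(3)*randn(T,1);

if ~isempty(which('dssm'))            % requires Econometrics Toolbox
    Mdl = dssm([1 0; 0 1],[NaN 0; 0 NaN],[1 1],NaN,'StateType',[2; 2]);
    params0 = [0.3; 0.3; 0.3];        % hypothetical starting values
    EstMdl = estimate(Mdl,y,params0);
end
```

Because y depends on the two random walks only through their sum, the two disturbance loadings are likely to be weakly pinned down individually, so treat the separate estimates with caution.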
Create Known Diffuse State-Space Model with Initial State Values
Create a diffuse state-space model, where:
• The state x1,t is a stationary AR(2) model with ϕ1 = 0.6, ϕ2 = 0.2, and a constant 0.5. The state disturbance is a mean zero Gaussian random variable with standard deviation 0.3.
• The state x4,t is a random walk. The state disturbance is a mean zero Gaussian random variable with standard deviation 0.05.
• The observation y1,t is the difference between the current and previous value in the AR(2) state, plus a mean 0 Gaussian observation innovation with standard deviation 0.1.
• The observation y2,t is the random walk state plus a mean 0 Gaussian observation innovation with standard deviation 0.02.
Symbolically, the state-space model is
$$\begin{bmatrix} x_{1,t} \\ x_{2,t} \\ x_{3,t} \\ x_{4,t} \end{bmatrix} = \begin{bmatrix} 0.6 & 0.2 & 0.5 & 0 \\ 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} x_{1,t-1} \\ x_{2,t-1} \\ x_{3,t-1} \\ x_{4,t-1} \end{bmatrix} + \begin{bmatrix} 0.3 & 0 \\ 0 & 0 \\ 0 & 0 \\ 0 & 0.05 \end{bmatrix} \begin{bmatrix} u_{1,t} \\ u_{4,t} \end{bmatrix}$$
$$\begin{bmatrix} y_{1,t} \\ y_{2,t} \end{bmatrix} = \begin{bmatrix} 1 & -1 & 0 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} x_{1,t} \\ x_{2,t} \\ x_{3,t} \\ x_{4,t} \end{bmatrix} + \begin{bmatrix} 0.1 & 0 \\ 0 & 0.02 \end{bmatrix} \begin{bmatrix} \varepsilon_{1,t} \\ \varepsilon_{2,t} \end{bmatrix}.$$
The model has four states: x1, t is the AR(2) process, x2, t represents x1, t − 1, x3, t is the AR(2) model constant, and x4, t is the random walk. Define the state-transition matrix. A = [0.6 0.2 0.5 0; 1 0 0 0; 0 0 1 0; 0 0 0 1];
Define the state-disturbance-loading matrix. B = [0.3 0; 0 0; 0 0; 0 0.05];
Define the measurement-sensitivity matrix. C = [1 -1 0 0; 0 0 0 1];
Define the observation-innovation matrix. Each observation has its own innovation, so D is diagonal.
D = [0.1 0; 0 0.02];
Use dssm to create the state-space model. Identify the type of initial state distributions (StateType) by noting the following:
• x1,t is a stationary AR(2) process.
• x2,t is also a stationary AR(2) process.
• x3,t is the constant 1 for all periods.
• x4,t is nonstationary.
Set the initial state means (Mean0) to 0. The initial state mean for constant states must be 1.
Mean0 = [0; 0; 1; 0];
StateType = [0; 0; 1; 2];
Mdl = dssm(A,B,C,D,'Mean0',Mean0,'StateType',StateType)
Mdl = 
State-space model type: dssm

State vector length: 4
Observation vector length: 2
State disturbance vector length: 2
Observation innovation vector length: 2
Sample size supported by model: Unlimited

State variables: x1, x2,...
State disturbances: u1, u2,...
Observation series: y1, y2,...
Observation innovations: e1, e2,...

State equations:
x1(t) = (0.60)x1(t-1) + (0.20)x2(t-1) + (0.50)x3(t-1) + (0.30)u1(t)
x2(t) = x1(t-1)
x3(t) = x3(t-1)
x4(t) = x4(t-1) + (0.05)u2(t)

Observation equations:
y1(t) = x1(t) - x2(t) + (0.10)e1(t)
y2(t) = x4(t) + (0.02)e2(t)

Initial state distribution:

Initial state means
 x1  x2  x3  x4 
  0   0   1   0 

Initial state covariance matrix
      x1    x2   x3   x4  
 x1  0.21  0.16   0    0  
 x2  0.16  0.21   0    0  
 x3   0     0     0    0  
 x4   0     0     0   Inf 

State types
     x1          x2         x3        x4   
 Stationary  Stationary  Constant  Diffuse 
Mdl is a dssm model object. dssm sets the initial state:
• Covariance matrix for the stationary states to the asymptotic covariance of the AR(2) model
• Variance for constant states to 0
• Variance for diffuse states to Inf
You can display or modify properties of Mdl using dot notation. For example, display the initial state covariance matrix.
Mdl.Cov0
ans = 4×4
    0.2143    0.1607         0         0
    0.1607    0.2143         0         0
         0         0         0         0
         0         0         0       Inf
Reset the initial state means for the stationary states to their asymptotic values.
Mdl.Mean0(1:2) = 0.5/(1-0.2-0.6);
Mdl.Mean0
ans = 4×1
    2.5000
    2.5000
    1.0000
         0
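The asymptotic values shown in this example can be reproduced in base MATLAB. The sketch below assumes that the stationary block of Cov0 solves the discrete Lyapunov equation P = A12*P*A12' + B12*B12' for the AR(2) states; the names A12, B12, Q, P, and mu are hypothetical and introduced only for this check.

```matlab
% AR(2) block of the state equation (states x1 and x2).
A12 = [0.6 0.2; 1 0];
B12 = [0.3; 0];
Q = B12*B12';                         % state-disturbance covariance

% Solve P = A12*P*A12' + Q through vec(P) = (I - kron(A12,A12)) \ vec(Q).
P = reshape((eye(4) - kron(A12,A12)) \ Q(:),2,2);

% Asymptotic mean of an AR(2) process with constant 0.5.
mu = 0.5/(1 - 0.6 - 0.2);

disp(round(P,4))   % compare with Mdl.Cov0(1:2,1:2)
disp(mu)           % compare with Mdl.Mean0(1:2)
```

The Kronecker form is convenient for a two-state block; for larger models a dedicated Lyapunov solver would scale better.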
Implicitly Create Time-Invariant State-Space Model
Use a parameter mapping function to create a time-invariant state-space model, where the state model is an AR(1) model. The states are observed with bias, but without random error. Set the initial state mean and variance, and specify that the state is stationary.
Write a function that specifies how the parameters in params map to the state-space model matrices, the initial state values, and the type of state. Symbolically, with ϕ = params(1), σ² = exp(params(2)), and a = params(3) as in the function that follows, the model is
$$x_t = \phi x_{t-1} + \sigma u_t$$
$$y_t = a x_t.$$
% Copyright 2015 The MathWorks, Inc.
function [A,B,C,D,Mean0,Cov0,StateType] = timeInvariantParamMap(params)
% Time-invariant state-space model parameter mapping function example. This
% function maps the vector params to the state-space matrices (A, B, C, and
% D), the initial state value and the initial state variance (Mean0 and
% Cov0), and the type of state (StateType). The state model is AR(1)
% without observation error.
    varu1 = exp(params(2)); % Positive variance constraint
    A = params(1);
    B = sqrt(varu1);
    C = params(3);
    D = [];
    Mean0 = 0.5;
    Cov0 = 100;
    StateType = 0;
end
Save this code as a file named timeInvariantParamMap.m to a folder on your MATLAB® path. Create the state-space model by passing the function timeInvariantParamMap as a function handle to dssm.
Mdl = dssm(@timeInvariantParamMap);
dssm implicitly creates the state-space model. Usually, you cannot verify implicitly defined state-space models.
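Although the model object itself gives little to inspect, the mapping can still be sanity-checked by evaluating it at a trial parameter vector. The sketch below is an assumption-laden stand-in: paramMapCheck is a hypothetical anonymous function that mirrors the body of timeInvariantParamMap by using deal, so that the example runs without the saved file, and the trial values in params are arbitrary.

```matlab
% Hypothetical anonymous stand-in that mirrors timeInvariantParamMap.
paramMapCheck = @(p) deal(p(1),sqrt(exp(p(2))),p(3),[],0.5,100,0);

% Arbitrary trial values: AR coefficient, log variance, and bias factor.
params = [0.5; log(0.09); 2];
[A,B,C,D,Mean0,Cov0,StateType] = paramMapCheck(params);

% Inspect what the mapping produces for these values.
fprintf('A = %g, B = %g, C = %g\n',A,B,C);
fprintf('Mean0 = %g, Cov0 = %g, StateType = %d\n',Mean0,Cov0,StateType);
fprintf('D is empty: %d\n',isempty(D));
```

If the file is on the path, calling timeInvariantParamMap(params) directly with the same seven outputs gives the same kind of check.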
Convert Standard to Diffuse State-Space Model
By default, ssm assigns a large scalar (1e7) to the initial state variance of all diffuse states in a standard state-space model. Using this specification, the software subsequently estimates, filters, and smooths a standard state-space model using the standard Kalman filter. A standard state-space model treatment is an approximation to results from an analysis that treats diffuse states using infinite variance. To implement the diffuse Kalman filter instead, convert the standard state-space model to a diffuse state-space model. This conversion attributes infinite variance to all diffuse states.
Explicitly create a two-dimensional standard state-space model. Specify that the first state equation is x1,t = x1,t−1 + u1,t and that the second state equation is x2,t = 0.2x2,t−1 + u2,t. Specify that the first observation equation is y1,t = x1,t + ε1,t and that the second observation equation is y2,t = x2,t + ε2,t. Specify that the states are diffuse and stationary, respectively.
A = [1 0; 0 0.2];
B = [1 0; 0 1];
C = [1 0; 0 1];
D = [1 0; 0 1];
StateType = [2 0];
MdlSSM = ssm(A,B,C,D,'StateType',StateType)
MdlSSM = 
State-space model type: ssm

State vector length: 2
Observation vector length: 2
State disturbance vector length: 2
Observation innovation vector length: 2
Sample size supported by model: Unlimited

State variables: x1, x2,...
State disturbances: u1, u2,...
Observation series: y1, y2,...
Observation innovations: e1, e2,...

State equations:
x1(t) = x1(t-1) + u1(t)
x2(t) = (0.20)x2(t-1) + u2(t)

Observation equations:
y1(t) = x1(t) + e1(t)
y2(t) = x2(t) + e2(t)

Initial state distribution:

Initial state means
 x1  x2 
  0   0 

Initial state covariance matrix
        x1       x2  
 x1  1.00e+07    0   
 x2     0       1.04 

State types
    x1        x2     
 Diffuse  Stationary 
MdlSSM is an ssm model object. In some cases, ssm can detect the state type, but it is good practice to specify whether the state is stationary, diffuse, or the constant 1. Because the model does not contain any unknown parameters, ssm infers the initial state distributions.

Convert MdlSSM to a diffuse state-space model.

    Mdl = dssm(MdlSSM)

    Mdl =
    State-space model type: dssm

    State vector length: 2
    Observation vector length: 2
    State disturbance vector length: 2
    Observation innovation vector length: 2
    Sample size supported by model: Unlimited

    State variables: x1, x2,...
    State disturbances: u1, u2,...
    Observation series: y1, y2,...
    Observation innovations: e1, e2,...

    State equations:
    x1(t) = x1(t-1) + u1(t)
    x2(t) = (0.20)x2(t-1) + u2(t)

    Observation equations:
    y1(t) = x1(t) + e1(t)
    y2(t) = x2(t) + e2(t)

    Initial state distribution:

    Initial state means
     x1  x2
      0   0

    Initial state covariance matrix
         x1   x2
     x1  Inf  0
     x2  0    1.04

    State types
        x1         x2
     Diffuse  Stationary
Mdl is a dssm model object. The structures of Mdl and MdlSSM are equivalent, except that the initial state variance of the diffuse state in Mdl is Inf rather than 1e7.

To see the difference between the two models, simulate 10 periods of data from a state-space model that is similar to MdlSSM. Set the initial state covariance matrix to $I_2$.

    Mdl0 = MdlSSM;
    Mdl0.Cov0 = eye(2);
    T = 10;
    rng(1); % For reproducibility
    y = simulate(Mdl0,T);

Obtain filtered and smoothed states from Mdl and MdlSSM using the simulated data.

    fY = filter(MdlSSM,y);
    fYD = filter(Mdl,y);
    sY = smooth(MdlSSM,y);
    sYD = smooth(Mdl,y);
Plot the filtered and smoothed states.

    figure;
    subplot(2,1,1)
    plot(1:T,y(:,1),'-o',1:T,fY(:,1),'-d',1:T,fYD(:,1),'-*');
    title('Filtered States for x_{1,t}')
    legend('Simulated Data','Filtered States -- MdlSSM','Filtered States -- Mdl');
    subplot(2,1,2)
    plot(1:T,y(:,1),'-o',1:T,sY(:,1),'-d',1:T,sYD(:,1),'-*');
    title('Smoothed States for x_{1,t}')
    legend('Simulated Data','Smoothed States -- MdlSSM','Smoothed States -- Mdl');

    figure;
    subplot(2,1,1)
    plot(1:T,y(:,2),'-o',1:T,fY(:,2),'-d',1:T,fYD(:,2),'-*');
    title('Filtered States for x_{2,t}')
    legend('Simulated Data','Filtered States -- MdlSSM','Filtered States -- Mdl');
    subplot(2,1,2)
    plot(1:T,y(:,2),'-o',1:T,sY(:,2),'-d',1:T,sYD(:,2),'-*');
    title('Smoothed States for x_{2,t}')
    legend('Simulated Data','Smoothed States -- MdlSSM','Smoothed States -- Mdl');
Apart from apparent transient behavior in the random walk, the filtered and smoothed states of the standard and diffuse state-space models appear nearly equivalent. The slight difference occurs because filter and smooth set all diffuse state estimates in the diffuse state-space model to 0 while they implement the diffuse Kalman filter. Once the covariance matrices of the smoothed states attain full rank, filter and smooth switch to using the standard Kalman filter. In this case, the switching time occurs after the first period.
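A rough way to see why a large finite initial variance approximates the diffuse treatment is to look at the first Kalman update of a scalar random walk. The following is a minimal sketch in base MATLAB, not toolbox code. It assumes a local-level model with unit disturbance and innovation variances, and it assumes that the initial variance describes the state one period before the first observation; the variable names are illustrative only.

```matlab
% Hypothetical scalar illustration (not toolbox code): local-level model
%   x_t = x_{t-1} + u_t,   y_t = x_t + e_t,   Var(u_t) = Var(e_t) = 1.
% Assumption: v0 is the variance of the state one period before the
% first observation.
v0 = [1 1e3 1e7];                 % candidate initial state variances
vPred = v0 + 1;                   % predicted state variance at t = 1
gain = vPred ./ (vPred + 1);      % Kalman gain at t = 1
vFilt = (1 - gain) .* vPred;      % filtered state variance at t = 1
limitGap = 1 - gain;              % distance from the infinite-variance limit (gain = 1)
disp([v0; gain; vFilt; limitGap])
```

As the initial variance grows, the gain approaches 1 and the filtered variance approaches the innovation variance, which is the limit that an infinite initial variance would give in this simplified setting.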
More About

Static State

A static state does not change in value throughout the sample, that is, $P(x_{t+1} = x_t) = 1$ for all $t = 1,\ldots,T$.
Tip
• Specify ParamMap in a more general or complex setting, where, for example:
  • The initial state values are parameters.
  • In time-varying models, you want to use the same parameters for more than one period.
  • You want to impose parameter constraints.
• You can create a dssm model object that does not contain any diffuse states. However, subsequent computations, for example, filtering and parameter estimation, can be inefficient. If all states have stationary distributions or are the constant 1, then create an ssm model object instead.
Algorithms
• Default values for Mean0 and Cov0:
  • If you explicitly specify the state-space model (that is, you provide the coefficient matrices A, B, C, and optionally D), then:
    • For stationary states, the software generates the initial value using the stationary distribution. If you provide all values in the coefficient matrices (that is, your model has no unknown parameters), then dssm generates the initial values. Otherwise, the software generates the initial values during estimation.
    • For states that are always the constant 1, dssm sets Mean0 to 1 and Cov0 to 0.
    • For diffuse states, the software sets Mean0 to 0 and Cov0 to Inf by default.
  • If you implicitly specify the state-space model (that is, you provide the parameter vector to the coefficient-matrices-mapping function ParamMap), then the software generates the initial values during estimation.
• For static states that do not equal 1 throughout the sample, the software cannot assign a value to the degenerate, initial state distribution. Therefore, set static states to 2 using the name-value pair argument StateType. Subsequently, the software treats static states as nonstationary and assigns the static state a diffuse initial distribution.
• It is best practice to set StateType for each state. By default, the software generates StateType, but this behavior might not be accurate. For example, the software cannot distinguish between a constant 1 state and a static state.
• The software cannot infer StateType from data because the data theoretically comes from the observation equation. The realizations of the state equation are unobservable.
• dssm models do not store observed responses or predictor data. Supply the data wherever necessary using the appropriate input or name-value pair arguments.
• Suppose that you want to create a diffuse state-space model using a parameter-to-matrix mapping function with this signature:

    [A,B,C,D,Mean0,Cov0,StateType,DeflateY] = paramMap(params,Y,Z)

  and you specify the model using an anonymous function

    Mdl = dssm(@(params)paramMap(params,Y,Z))

  The observed responses Y and predictor data Z are not input arguments in the anonymous function. If Y and Z exist in the MATLAB Workspace before you create Mdl, then the software establishes a link to them. Otherwise, if you pass Mdl to estimate, the software throws an error. The link to the data established by the anonymous function overrides all other corresponding input argument values of estimate. This distinction is important particularly when conducting a rolling window analysis. For details, see “Rolling-Window Analysis of Time-Series Models” on page 11-135.
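The link described above follows from how MATLAB anonymous functions work in general: they capture the values of workspace variables at the moment the handle is created. The following is a minimal sketch in base MATLAB that shows this behavior. The function paramMapDemo is a hypothetical stand-in, not a real parameter-to-matrix mapping function, and no state-space model is created.

```matlab
% paramMapDemo is a hypothetical stand-in for a parameter-to-matrix
% mapping function; it only returns a scalar so the effect is visible.
paramMapDemo = @(params,Y) params * mean(Y);

Y = [1; 2; 3];
f = @(params) paramMapDemo(params,Y);   % Y = [1;2;3] is captured here

Y = [10; 20; 30];                       % later changes do not reach f
valueOld = f(1);                        % still uses the captured Y
g = @(params) paramMapDemo(params,Y);   % re-create the handle to pick up new data
valueNew = g(1);
```

In a rolling-window loop, this suggests re-creating the anonymous function (and the model built from it) inside each window, so that each window works with its own slice of the data.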
Alternatives

Create an ssm model object instead of a dssm model object when:
• The model does not contain any diffuse states.
• The diffuse states are correlated with each other or with other states.
• You want to implement the standard Kalman filter.
Version History Introduced in R2015b
References [1] Durbin J., and S. J. Koopman. Time Series Analysis by State Space Methods. 2nd ed. Oxford: Oxford University Press, 2012.
See Also
ssm

Topics
“Implicitly Create Time-Varying Diffuse State-Space Model” on page 11-28
“Implicitly Create Diffuse State-Space Model Containing Regression Component” on page 11-24
“What Are State-Space Models?” on page 11-3
“Rolling-Window Analysis of Time-Series Models” on page 11-135
dtmc
dtmc Create discrete-time Markov chain
Description dtmc creates a discrete-time, finite-state, time-homogeneous Markov chain from a specified state transition matrix. After creating a dtmc object, you can analyze the structure and evolution of the Markov chain, and visualize the Markov chain in various ways, by using the object functions on page 12-472. Also, you can use a dtmc object to specify the switching mechanism of a Markov-switching dynamic regression model (msVAR). To create a switching mechanism, governed by threshold transitions and threshold variable data, for a threshold-switching dynamic regression model, see threshold and tsVAR.
Creation

Syntax
    mc = dtmc(P)
    mc = dtmc(P,'StateNames',stateNames)

Description
mc = dtmc(P) creates the discrete-time Markov chain object mc specified by the state transition matrix P.
mc = dtmc(P,'StateNames',stateNames) optionally associates the names stateNames to the states.

Input Arguments
P — State transition matrix
nonnegative numeric matrix
State transition matrix, specified as a numStates-by-numStates nonnegative numeric matrix. P(i,j) is either the theoretical probability of a transition from state i to state j or an empirical count of observed transitions from state i to state j. P can be fully specified (all elements are nonnegative numbers), partially specified (elements are a mix of nonnegative numbers and NaN values), or unknown (completely composed of NaN values).
dtmc normalizes each row of P without any NaN values to sum to 1, then stores the normalized matrix in the property P.
Data Types: double
Properties

You can set writable property values when you create the model object by using name-value pair argument syntax, or after you create the model object by using dot notation. For example, for the two-state model mc, to label the first and second states Depression and Recession, respectively, enter:

    mc.StateNames = ["Depression" "Recession"];
P — Normalized transition matrix
nonnegative numeric matrix
This property is read-only.
Normalized transition matrix, specified as a numStates-by-numStates nonnegative numeric matrix. If x is a row vector of length numStates specifying a distribution of states at time t (x sums to 1), then x*P is the distribution of states at time t + 1.
NaN entries indicate estimable transition probabilities. The estimate function of msVAR treats the known elements of P as equality constraints during optimization.
Data Types: double

NumStates — Number of states
positive scalar
This property is read-only.
Number of states, specified as a positive scalar.
Data Types: double

StateNames — Unique state labels
string(1:numStates) (default) | string vector | cell vector of character vectors | numeric vector
Unique state labels, specified as a string vector, cell vector of character vectors, or numeric vector of length numStates. Elements correspond to rows and columns of P.
Example: ["Depression" "Recession" "Stagnant" "Boom"]
Data Types: string
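The statement that x*P is the distribution at time t + 1 can be checked with plain matrix arithmetic. The following is a minimal sketch in base MATLAB; it does not create a dtmc object, and the two-state matrix is only an illustrative choice.

```matlab
% Illustrative right-stochastic matrix: each row sums to 1.
P = [0.90 0.10; 0.25 0.75];
x0 = [1 0];        % start in state 1 with probability 1
x1 = x0 * P;       % distribution of states at t + 1
x2 = x1 * P;       % distribution of states at t + 2 (same as x0 * P^2)
disp([x0; x1; x2])
```

Each row of the displayed result sums to 1, because multiplying a probability row vector by a right-stochastic matrix preserves the total mass.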
Object Functions

dtmc objects require a fully specified transition matrix P.

Determine Markov Chain Structure
    asymptotics     Determine Markov chain asymptotics
    isergodic       Check Markov chain for ergodicity
    isreducible     Check Markov chain for reducibility
    classify        Classify Markov chain states
    lazy            Adjust Markov chain state inertia
    subchain        Extract Markov subchain

Describe Markov Chain Evolution
    redistribute    Compute Markov chain redistributions
    simulate        Simulate Markov chain state walks

Visualize Markov Chain
    distplot        Plot Markov chain redistributions
    eigplot         Plot Markov chain eigenvalues
    graphplot       Plot Markov chain directed graph
    simplot         Plot Markov chain simulations
Examples

Create Markov Chain Using Matrix of Transition Probabilities

Consider this theoretical, right-stochastic transition matrix of a stochastic process.

$$P = \begin{bmatrix} 0.5 & 0.5 & 0 & 0 \\ 0.5 & 0 & 0.5 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 1 & 0 \end{bmatrix}.$$

Element $P_{ij}$ is the probability that the process transitions to state j at time t + 1 given that it is in state i at time t, for all t.

Create the Markov chain that is characterized by the transition matrix P.

    P = [0.5 0.5 0 0; 0.5 0 0.5 0; 0 0 0 1; 0 0 1 0];
    mc = dtmc(P);

mc is a dtmc object that represents the Markov chain.

Display the number of states in the Markov chain.

    numstates = mc.NumStates

    numstates = 4

Plot a directed graph of the Markov chain.

    figure;
    graphplot(mc);
Observe that states 3 and 4 form an absorbing class, while states 1 and 2 are transient.
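The same observation can be checked numerically without the graph. The following is a minimal sketch in base MATLAB that propagates a distribution through the transition matrix of this example; it does not use the toolbox function classify, which is the documented way to classify states. The horizon of 50 steps is an arbitrary choice.

```matlab
P = [0.5 0.5 0 0; 0.5 0 0.5 0; 0 0 0 1; 0 0 1 0];

% Rows 3 and 4 place no probability on states 1 and 2, so the chain
% cannot leave the class {3,4} once it enters.
closedClass = all(all(P(3:4,1:2) == 0));

% Start in state 1 and propagate 50 steps.
x0 = [1 0 0 0];
x50 = x0 * P^50;
massTransient = sum(x50(1:2));   % probability still in states 1 and 2
massAbsorbing = sum(x50(3:4));   % probability in states 3 and 4
disp([massTransient massAbsorbing])
```

Almost all probability mass ends up in states 3 and 4, which is consistent with states 1 and 2 being transient.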
Create Markov Chain Using Matrix of Observed Transition Counts

Consider this transition matrix in which element (i, j) is the observed number of times state i transitions to state j.

$$P = \begin{bmatrix} 16 & 2 & 3 & 13 \\ 5 & 11 & 10 & 8 \\ 9 & 7 & 6 & 12 \\ 4 & 14 & 15 & 1 \end{bmatrix}.$$

For example, $P_{32} = 7$ implies that state 3 transitions to state 2 seven times.

    P = [16  2  3 13;
          5 11 10  8;
          9  7  6 12;
          4 14 15  1];

Create the Markov chain that is characterized by the transition matrix P.

    mc = dtmc(P);
Display the normalized transition matrix stored in mc. Verify that the elements within rows sum to 1 for all rows.

    mc.P

    ans = 4×4

        0.4706    0.0588    0.0882    0.3824
        0.1471    0.3235    0.2941    0.2353
        0.2647    0.2059    0.1765    0.3529
        0.1176    0.4118    0.4412    0.0294

    sum(mc.P,2)

    ans = 4×1

         1
         1
         1
         1
Plot a directed graph of the Markov chain. figure; graphplot(mc);
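The normalized matrix shown in this example is the count matrix with each row divided by its row sum. The following is a minimal sketch in base MATLAB that reproduces the arithmetic. It illustrates the normalization described for fully specified rows and is not the toolbox implementation; the variable names are illustrative.

```matlab
counts = [16  2  3 13;
           5 11 10  8;
           9  7  6 12;
           4 14 15  1];

rowTotals = sum(counts,2);      % transitions observed out of each state
Pnorm = counts ./ rowTotals;    % divide each row by its total
disp(round(Pnorm,4))
```

Every row total in this example is 34, so, for instance, the entry for the transition from state 3 to state 2 is 7/34, or approximately 0.2059.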
Label Markov Chain States

Consider the two-state business cycle of the US real gross national product (GNP) in [3] p. 697. At time t, real GNP can be in a state of expansion or contraction. Suppose that the following statements are true during the sample period.
• If real GNP is expanding at time t, then the probability that it will continue in an expansion state at time t + 1 is $p_{11} = 0.90$.
• If real GNP is contracting at time t, then the probability that it will continue in a contraction state at time t + 1 is $p_{22} = 0.75$.

Create the transition matrix for the model.

    p11 = 0.90;
    p22 = 0.75;
    P = [p11 (1 - p11); (1 - p22) p22];

Create the Markov chain that is characterized by the transition matrix P. Label the two states.

    mc = dtmc(P,'StateNames',["Expansion" "Contraction"])

    mc =
      dtmc with properties:

                 P: [2x2 double]
        StateNames: ["Expansion"    "Contraction"]
         NumStates: 2
Plot a directed graph of the Markov chain. Indicate the probability of transition by using edge colors. figure; graphplot(mc,'ColorEdges',true);
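For a two-state chain such as this one, the long-run behavior has a simple closed form. The following is a minimal sketch in base MATLAB that computes the stationary distribution and the expected length of a stay in each state. These formulas are standard results for two-state chains and are added here for illustration; they do not come from the example itself, and the toolbox function asymptotics is the documented route for general chains.

```matlab
p11 = 0.90;
p22 = 0.75;
P = [p11 (1 - p11); (1 - p22) p22];

% Stationary distribution of a two-state chain: piStat * P = piStat.
piStat = [1 - p22, 1 - p11] / (2 - p11 - p22);

% Expected number of consecutive periods spent in each state
% (mean of a geometric distribution).
expectedDuration = 1 ./ (1 - [p11 p22]);

disp(piStat)
disp(expectedDuration)
```

Under these transition probabilities, the chain spends roughly 71% of the time in expansion, and an expansion lasts 10 periods on average compared with 4 periods for a contraction.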
Create Markov Chain from Random Transition Matrix

To help you explore the dtmc object functions, mcmix creates a Markov chain from a random transition matrix using only a specified number of states.

Create a five-state Markov chain from a random transition matrix.

    rng(1); % For reproducibility
    mc = mcmix(5)

    mc =
      dtmc with properties:

                 P: [5x5 double]
        StateNames: ["1"    "2"    "3"    "4"    "5"]
         NumStates: 5
mc is a dtmc object. Plot the eigenvalues of the transition matrix on the complex plane. figure; eigplot(mc)
This spectrum determines structural properties of the Markov chain, such as periodicity and mixing rate.
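To make the link between eigenvalues and structure concrete without relying on a random matrix, the following minimal sketch in base MATLAB computes the spectrum of the deterministic four-state transition matrix from the first example on this page. It uses eig rather than eigplot, so it only lists the eigenvalues instead of plotting them.

```matlab
P = [0.5 0.5 0 0; 0.5 0 0.5 0; 0 0 0 1; 0 0 1 0];

lambda = eig(P);                              % eigenvalues of the transition matrix
hasUnitEig = any(1e-8 > abs(lambda - 1));     % every stochastic matrix has eigenvalue 1
hasMinusOne = any(1e-8 > abs(lambda + 1));    % eigenvalue -1 lies on the unit circle
disp(sort(real(lambda),'descend'))
```

The eigenvalue at -1 reflects the period-2 alternation between states 3 and 4, and the remaining eigenvalues lie strictly inside the unit circle.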
Create Markov Chain with Unknown Transition Matrix Entries

Consider a Markov-switching autoregression (msVAR) model for the US GDP containing four economic regimes: depression, recession, stagnation, and expansion. To estimate the transition probabilities of the switching mechanism, you must supply a dtmc model with unknown transition matrix entries to the msVAR framework.

Create a 4-regime Markov chain with an unknown transition matrix (all NaN entries). Specify the regime names.

    P = nan(4);
    statenames = ["Depression" "Recession" ...
        "Stagnation" "Expansion"];
    mcUnknown = dtmc(P,'StateNames',statenames)

    mcUnknown =
      dtmc with properties:

                 P: [4x4 double]
        StateNames: ["Depression"    "Recession"    "Stagnation"    "Expansion"]
         NumStates: 4

    mcUnknown.P

    ans = 4×4

       NaN   NaN   NaN   NaN
       NaN   NaN   NaN   NaN
       NaN   NaN   NaN   NaN
       NaN   NaN   NaN   NaN
Suppose economic theory states that the US economy never transitions to an expansion from a recession or depression. Create a 4-regime Markov chain with a partially known transition matrix representing the situation.

    P(1,4) = 0;
    P(2,4) = 0;
    mcPartial = dtmc(P,'StateNames',statenames)

    mcPartial =
      dtmc with properties:

                 P: [4x4 double]
        StateNames: ["Depression"    "Recession"    "Stagnation"    "Expansion"]
         NumStates: 4

    mcPartial.P

    ans = 4×4

       NaN   NaN   NaN     0
       NaN   NaN   NaN     0
       NaN   NaN   NaN   NaN
       NaN   NaN   NaN   NaN
The estimate function of msVAR treats the known elements of mcPartial.P as equality constraints during optimization. For more details on Markov-switching dynamic regression models, see msVAR.
Alternatives You also can create a Markov chain object using mcmix.
Version History Introduced in R2017b
References
[1] Gallager, R.G. Stochastic Processes: Theory for Applications. Cambridge, UK: Cambridge University Press, 2013.
[2] Haggstrom, O. Finite Markov Chains and Algorithmic Applications. Cambridge, UK: Cambridge University Press, 2002.
[3] Hamilton, James D. Time Series Analysis. Princeton, NJ: Princeton University Press, 1994.
[4] Norris, J. R. Markov Chains. Cambridge, UK: Cambridge University Press, 1997.
See Also mcmix | msVAR Topics “Discrete-Time Markov Chains” on page 10-2 “Markov Chain Modeling” on page 10-8
Econometric Modeler
Econometric Modeler Analyze and model econometric time series
Description

The Econometric Modeler app provides an interface for interactive exploratory data analysis. The flexible interface supports analysis of univariate and multivariate time series, and estimation of conditional mean (for example, ARIMA), conditional variance (for example, GARCH), multivariate (for example, VAR and VEC), and time series regression models. Using the app, you can:
• Visualize and transform time series data.
• Perform statistical specification and model identification tests.
• Estimate candidate models and compare fits.
• Perform post-fit assessments and residual diagnostics.
• Automatically generate code or a report from a session.
Open the Econometric Modeler App
• MATLAB Toolstrip: On the Apps tab, under Computational Finance, click the app icon.
• MATLAB command prompt: Enter econometricModeler.
Examples
• “Prepare Time Series Data for Econometric Modeler App” on page 4-59
• “Import Time Series Data into Econometric Modeler App” on page 4-62
• “Plot Time Series Data Using Econometric Modeler App” on page 4-66
• “Detect Serial Correlation Using Econometric Modeler App” on page 4-71
• “Detect ARCH Effects Using Econometric Modeler App” on page 4-77
• “Assess Stationarity of Time Series Using Econometric Modeler” on page 4-84
• “Assess Collinearity Among Multiple Series Using Econometric Modeler App” on page 4-94
• “Conduct Cointegration Test Using Econometric Modeler” on page 4-170
• “Transform Time Series Using Econometric Modeler App” on page 4-97
• “Implement Box-Jenkins Model Selection and Estimation Using Econometric Modeler App” on page 4-112
• “Estimate Multiplicative ARIMA Model Using Econometric Modeler App” on page 4-131
• “Perform ARIMA Model Residual Diagnostics Using Econometric Modeler App” on page 4-141
• “Specify t Innovation Distribution Using Econometric Modeler App” on page 4-150
• “Estimate ARIMAX Model Using Econometric Modeler App” on page 4-200
• “Estimate Regression Model with ARMA Errors Using Econometric Modeler App” on page 4-208
• “Estimate Vector Autoregression Model Using Econometric Modeler” on page 4-155
• “Estimate Vector Error-Correction Model Using Econometric Modeler” on page 4-180
• “Compare Predictive Performance After Creating Models Using Econometric Modeler” on page 4-193
• “Select ARCH Lags for GARCH Model Using Econometric Modeler App” on page 4-122
• “Compare Conditional Variance Model Fit Statistics Using Econometric Modeler App” on page 4-221
• “Perform GARCH Model Residual Diagnostics Using Econometric Modeler App” on page 4-230
• “Share Results of Econometric Modeler App Session” on page 4-237
Version History Introduced in R2018a
See Also

Objects
arima | regARIMA | garch | gjr | egarch | varm | vecm

Functions
autocorr | parcorr | crosscorr | adftest | kpsstest | lmctest | lbqtest | archtest | vratiotest | pptest | collintest | egcitest | jcitest | aicbic

Topics
“Prepare Time Series Data for Econometric Modeler App” on page 4-59
“Import Time Series Data into Econometric Modeler App” on page 4-62
“Plot Time Series Data Using Econometric Modeler App” on page 4-66
“Detect Serial Correlation Using Econometric Modeler App” on page 4-71
“Detect ARCH Effects Using Econometric Modeler App” on page 4-77
“Assess Stationarity of Time Series Using Econometric Modeler” on page 4-84
“Assess Collinearity Among Multiple Series Using Econometric Modeler App” on page 4-94
“Conduct Cointegration Test Using Econometric Modeler” on page 4-170
“Transform Time Series Using Econometric Modeler App” on page 4-97
“Implement Box-Jenkins Model Selection and Estimation Using Econometric Modeler App” on page 4-112
“Estimate Multiplicative ARIMA Model Using Econometric Modeler App” on page 4-131
“Perform ARIMA Model Residual Diagnostics Using Econometric Modeler App” on page 4-141
“Specify t Innovation Distribution Using Econometric Modeler App” on page 4-150
“Estimate ARIMAX Model Using Econometric Modeler App” on page 4-200
“Estimate Regression Model with ARMA Errors Using Econometric Modeler App” on page 4-208
“Estimate Vector Autoregression Model Using Econometric Modeler” on page 4-155
“Estimate Vector Error-Correction Model Using Econometric Modeler” on page 4-180
“Compare Predictive Performance After Creating Models Using Econometric Modeler” on page 4-193
“Select ARCH Lags for GARCH Model Using Econometric Modeler App” on page 4-122
“Compare Conditional Variance Model Fit Statistics Using Econometric Modeler App” on page 4-221
“Perform GARCH Model Residual Diagnostics Using Econometric Modeler App” on page 4-230
“Share Results of Econometric Modeler App Session” on page 4-237
“Analyze Time Series Data Using Econometric Modeler” on page 4-2
“Specifying Univariate Lag Operator Polynomials Interactively” on page 4-44
“Specifying Multivariate Lag Operator Polynomials and Coefficient Constraints Interactively” on page 4-50
Creating ARIMA Models Using Econometric Modeler App
egarch EGARCH conditional variance time series model
Description

Use egarch to specify a univariate EGARCH (exponential generalized autoregressive conditional heteroscedastic) model. The egarch function returns an egarch object specifying the functional form of an EGARCH(P,Q) model on page 12-500, and stores its parameter values.

The key components of an egarch model include the:
• GARCH polynomial, which is composed of lagged, logged conditional variances. The degree is denoted by P.
• ARCH polynomial, which is composed of the magnitudes of lagged standardized innovations.
• Leverage polynomial, which is composed of lagged standardized innovations.
• Maximum of the ARCH and leverage polynomial degrees, denoted by Q.

P is the maximum nonzero lag in the GARCH polynomial, and Q is the maximum nonzero lag in the ARCH and leverage polynomials. Other model components include an innovation mean model offset, a conditional variance model constant, and the innovations distribution.

All coefficients are unknown (NaN values) and estimable unless you specify their values using name-value pair argument syntax. To estimate models containing all or partially unknown parameter values given data, use estimate. For completely specified models (models in which all parameter values are known), simulate or forecast responses using simulate or forecast, respectively.
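For orientation, the three polynomials listed above can be written out in a single equation. The following sketch reflects a common way of stating the EGARCH(P,Q) conditional variance equation and is consistent with the symbols κ and γ used for the unconditional variance on this page. The symbols α and ξ for the ARCH and leverage coefficients are assumptions of this sketch; the model definition on page 12-500 is the authoritative statement.

```latex
\log \sigma_t^2 = \kappa
  + \sum_{i=1}^{P} \gamma_i \log \sigma_{t-i}^2
  + \sum_{j=1}^{Q} \alpha_j \left[ \frac{|\varepsilon_{t-j}|}{\sigma_{t-j}}
      - E\left\{ \frac{|\varepsilon_{t-j}|}{\sigma_{t-j}} \right\} \right]
  + \sum_{j=1}^{Q} \xi_j \left( \frac{\varepsilon_{t-j}}{\sigma_{t-j}} \right)
```

In this form, the first sum is the GARCH polynomial applied to logged conditional variances, the second is the ARCH polynomial applied to magnitudes of standardized innovations, and the third is the leverage polynomial applied to the signed standardized innovations.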
Creation

Syntax
    Mdl = egarch
    Mdl = egarch(P,Q)
    Mdl = egarch(Name,Value)

Description
Mdl = egarch creates a zero-degree conditional variance egarch object.

Mdl = egarch(P,Q) creates an EGARCH conditional variance model object (Mdl) with a GARCH polynomial with a degree of P, and ARCH and leverage polynomials each with a degree of Q. All polynomials contain all consecutive lags from 1 through their degrees, and all coefficients are NaN values.
This shorthand syntax enables you to create a template in which you specify the polynomial degrees explicitly. The model template is suited for unrestricted parameter estimation, that is, estimation without any parameter equality constraints. However, after you create a model, you can alter property values using dot notation.

Mdl = egarch(Name,Value) sets properties on page 12-486 or additional options using name-value pair arguments. Enclose each name in quotes. For example, 'ARCHLags',[1 4],'ARCH',{0.2 0.3} specifies the two ARCH coefficients in ARCH at lags 1 and 4.
This longhand syntax enables you to create more flexible models.

Input Arguments
The shorthand syntax provides an easy way for you to create model templates that are suitable for unrestricted parameter estimation. For example, to create an EGARCH(1,2) model containing unknown parameter values, enter:

    Mdl = egarch(1,2);
To impose equality constraints on parameter values during estimation, set the appropriate property on page 12-486 values using dot notation.

P — GARCH polynomial degree
nonnegative integer
GARCH polynomial degree, specified as a nonnegative integer. In the GARCH polynomial and at time t, MATLAB includes all consecutive logged conditional variance terms from lag t – 1 through lag t – P.
You can specify this argument using the egarch(P,Q) shorthand syntax only.
If P > 0, then you must specify Q as a positive integer.
Example: egarch(1,1)
Data Types: double

Q — ARCH polynomial degree
nonnegative integer
ARCH polynomial degree, specified as a nonnegative integer. In the ARCH polynomial and at time t, MATLAB includes all consecutive magnitudes of standardized innovation terms (for the ARCH polynomial) and all standardized innovation terms (for the leverage polynomial) from lag t – 1 through lag t – Q.
You can specify this argument using the egarch(P,Q) shorthand syntax only.
If P > 0, then you must specify Q as a positive integer.
Example: egarch(1,1)
Data Types: double

Name-Value Pair Arguments
Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter.
Before R2021a, use commas to separate each name and value, and enclose Name in quotes.
The longhand syntax enables you to create models in which some or all coefficients are known. During estimation, estimate imposes equality constraints on any known parameters.
Example: 'ARCHLags',[1 4],'ARCH',{NaN NaN} specifies an EGARCH(0,4) model and unknown, but nonzero, ARCH coefficients at lags 1 and 4.

GARCHLags — GARCH polynomial lags
1:P (default) | numeric vector of unique positive integers
GARCH polynomial lags, specified as the comma-separated pair consisting of 'GARCHLags' and a numeric vector of unique positive integers.
GARCHLags(j) is the lag corresponding to the coefficient GARCH{j}. The lengths of GARCHLags and GARCH must be equal.
Assuming all GARCH coefficients (specified by the GARCH property) are positive or NaN values, max(GARCHLags) determines the value of the P property.
Example: 'GARCHLags',[1 4]
Data Types: double

ARCHLags — ARCH polynomial lags
1:Q (default) | numeric vector of unique positive integers
ARCH polynomial lags, specified as the comma-separated pair consisting of 'ARCHLags' and a numeric vector of unique positive integers.
ARCHLags(j) is the lag corresponding to the coefficient ARCH{j}. The lengths of ARCHLags and ARCH must be equal.
Assuming all ARCH and leverage coefficients (specified by the ARCH and Leverage properties) are positive or NaN values, max([ARCHLags LeverageLags]) determines the value of the Q property.
Example: 'ARCHLags',[1 4]
Data Types: double

LeverageLags — Leverage polynomial lags
1:Q (default) | numeric vector of unique positive integers
Leverage polynomial lags, specified as the comma-separated pair consisting of 'LeverageLags' and a numeric vector of unique positive integers.
LeverageLags(j) is the lag corresponding to the coefficient Leverage{j}. The lengths of LeverageLags and Leverage must be equal.
Assuming all ARCH and leverage coefficients (specified by the ARCH and Leverage properties) are positive or NaN values, max([ARCHLags LeverageLags]) determines the value of the Q property.
Example: 'LeverageLags',1:4
Data Types: double
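The effect of the lag arguments on the read-only degrees can be inspected directly after creation. The following is a minimal sketch that requires Econometrics Toolbox and uses only the name-value pairs from the example above. The values in the comments follow from the statement that this specification yields an EGARCH(0,4) model.

```matlab
% Requires Econometrics Toolbox.
% Two unknown ARCH coefficients, placed at lags 1 and 4.
Mdl = egarch('ARCHLags',[1 4],'ARCH',{NaN NaN});

garchDegree = Mdl.P;   % no GARCH lags specified, so the degree is 0
archDegree  = Mdl.Q;   % largest ARCH or leverage lag, here 4
disp([garchDegree archDegree])
```

Because P and Q are derived from the specified lags and coefficients, they cannot be set directly; change the lag or coefficient specification instead.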
Properties

You can set writable property values when you create the model object by using name-value pair argument syntax, or after you create the model object by using dot notation. For example, to create an EGARCH(1,1) model with unknown coefficients, and then specify a t innovation distribution with unknown degrees of freedom, enter:

    Mdl = egarch('GARCHLags',1,'ARCHLags',1);
    Mdl.Distribution = "t";
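Dot notation is also how a template from the shorthand syntax can be turned into a partially specified model. The following is a minimal sketch that requires Econometrics Toolbox. The particular values (a constant of 0 and 10 degrees of freedom) are arbitrary choices for illustration, not recommendations.

```matlab
% Requires Econometrics Toolbox.
Mdl = egarch(1,1);          % template: all coefficients are NaN

Mdl.Constant = 0;           % known value, held fixed by estimate
Mdl.Distribution = struct('Name',"t",'DoF',10);   % t innovations with known DoF

stillUnknown = isnan(Mdl.GARCH{1});   % the GARCH coefficient remains estimable
```

When such a model is passed to estimate, the known values act as equality constraints and only the remaining NaN-valued parameters are estimated.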
P — GARCH polynomial degree
nonnegative integer
This property is read-only.
GARCH polynomial degree, specified as a nonnegative integer. P is the maximum lag in the GARCH polynomial with a coefficient that is positive or NaN. Lags that are less than P can have coefficients equal to 0.
P specifies the minimum number of presample conditional variances required to initialize the model.
If you use name-value pair arguments to create the model, then MATLAB implements one of these alternatives (assuming the coefficient of the largest lag is positive or NaN):
• If you specify GARCHLags, then P is the largest specified lag.
• If you specify GARCH, then P is the number of elements of the specified value. If you also specify GARCHLags, then egarch uses GARCHLags to determine P instead.
• Otherwise, P is 0.
Data Types: double

Q — Maximum degree of ARCH and leverage polynomials
nonnegative integer
This property is read-only.
Maximum degree of ARCH and leverage polynomials, specified as a nonnegative integer. Q is the maximum lag in the ARCH and leverage polynomials in the model. In either type of polynomial, lags that are less than Q can have coefficients equal to 0.
Q specifies the minimum number of presample innovations required to initiate the model.
If you use name-value pair arguments to create the model, then MATLAB implements one of these alternatives (assuming the coefficients of the largest lags in the ARCH and leverage polynomials are positive or NaN):
• If you specify ARCHLags or LeverageLags, then Q is the maximum between the two specifications.
• If you specify ARCH or Leverage, then Q is the maximum number of elements between the two specifications. If you also specify ARCHLags or LeverageLags, then egarch uses their values to determine Q instead.
• Otherwise, Q is 0.
Data Types: double

Constant — Conditional variance model constant
NaN (default) | numeric scalar
Conditional variance model constant, specified as a numeric scalar or NaN value.
Data Types: double
GARCH — GARCH polynomial coefficients cell vector of positive scalars or NaN values GARCH polynomial coefficients, specified as a cell vector of positive scalars or NaN values. • If you specify GARCHLags, then the following conditions apply. • The lengths of GARCH and GARCHLags are equal. • GARCH{j} is the coefficient of lag GARCHLags(j). • By default, GARCH is a numel(GARCHLags)-by-1 cell vector of NaN values. • Otherwise, the following conditions apply. • The length of GARCH is P. • GARCH{j} is the coefficient of lag j. • By default, GARCH is a P-by-1 cell vector of NaN values. The coefficients in GARCH correspond to coefficients in an underlying LagOp lag operator polynomial, and are subject to a near-zero tolerance exclusion test. If you set a coefficient to 1e–12 or below, egarch excludes that coefficient and its corresponding lag in GARCHLags from the model. Data Types: cell ARCH — ARCH polynomial coefficients cell vector of positive scalars or NaN values ARCH polynomial coefficients, specified as a cell vector of positive scalars or NaN values. • If you specify ARCHLags, then the following conditions apply. • The lengths of ARCH and ARCHLags are equal. • ARCH{j} is the coefficient of lag ARCHLags(j). • By default, ARCH is a Q-by-1 cell vector of NaN values. For more details, see the Q property. • Otherwise, the following conditions apply. • The length of ARCH is Q. • ARCH{j} is the coefficient of lag j. • By default, ARCH is a Q-by-1 cell vector of NaN values. The coefficients in ARCH correspond to coefficients in an underlying LagOp lag operator polynomial, and are subject to a near-zero tolerance exclusion test. If you set a coefficient to 1e–12 or below, egarch excludes that coefficient and its corresponding lag in ARCHLags from the model. Data Types: cell Leverage — Leverage polynomial coefficients cell vector of numeric scalars or NaN values Leverage polynomial coefficients, specified as a cell vector of numeric scalars or NaN values. 
• If you specify LeverageLags, then the following conditions apply.
  • The lengths of Leverage and LeverageLags are equal.
  • Leverage{j} is the coefficient of lag LeverageLags(j).
  • By default, Leverage is a Q-by-1 cell vector of NaN values. For more details, see the Q property.
• Otherwise, the following conditions apply.
  • The length of Leverage is Q.
  • Leverage{j} is the coefficient of lag j.
  • By default, Leverage is a Q-by-1 cell vector of NaN values.
The coefficients in Leverage correspond to coefficients in an underlying LagOp lag operator polynomial, and are subject to a near-zero tolerance exclusion test. If you set a coefficient to 1e-12 or below, egarch excludes that coefficient and its corresponding lag in LeverageLags from the model.
Data Types: cell

UnconditionalVariance - Model unconditional variance
positive scalar

This property is read-only.
The model unconditional variance, specified as a positive scalar. The unconditional variance is
$$\sigma_\varepsilon^2 = \exp\left\{\frac{\kappa}{1 - \sum_{i=1}^{P}\gamma_i}\right\}.$$
κ is the conditional variance model constant (Constant).
Data Types: double

Offset - Innovation mean model offset
0 (default) | numeric scalar | NaN

Innovation mean model offset, or additive constant, specified as a numeric scalar or NaN value.
Data Types: double

Distribution - Conditional probability distribution of innovation process
"Gaussian" (default) | "t" | structure array

Conditional probability distribution of the innovation process, specified as a string or structure array. egarch stores the value as a structure array.

Distribution    String        Structure Array
Gaussian        "Gaussian"    struct('Name',"Gaussian")
Student's t     "t"           struct('Name',"t",'DoF',DoF)

The 'DoF' field specifies the t distribution degrees of freedom parameter.
• DoF > 2 or DoF = NaN.
• DoF is estimable.
• If you specify "t", DoF is NaN by default. You can change its value by using dot notation after you create the model. For example, Mdl.Distribution.DoF = 3.
• If you supply a structure array to specify the Student's t distribution, then you must specify both the 'Name' and 'DoF' fields.
Example: struct('Name',"t",'DoF',10)

Description - Model description
string scalar | character vector

Model description, specified as a string scalar or character vector. egarch stores the value as a string scalar. The default value describes the parametric form of the model, for example "EGARCH(1,1) Conditional Variance Model (Gaussian Distribution)".
Data Types: string | char

Note
• All NaN-valued model parameters, which include coefficients and the t-innovation-distribution degrees of freedom (if present), are estimable. When you pass the resulting egarch object and data to estimate, MATLAB estimates all NaN-valued parameters. During estimation, estimate treats known parameters as equality constraints, that is, estimate holds any known parameters fixed at their values.
• Typically, the lags in the ARCH and leverage polynomials are the same, but their equality is not a requirement. Differing polynomials occur when:
  • Either ARCH{Q} or Leverage{Q} meets the near-zero exclusion tolerance. In this case, MATLAB excludes the corresponding lag from the polynomial.
  • You specify polynomials of differing lengths by specifying ARCHLags or LeverageLags, or by setting the ARCH or Leverage property.
  In either case, Q is the maximum lag between the two polynomials.
Object Functions
estimate     Fit conditional variance model to data
filter       Filter disturbances through conditional variance model
forecast     Forecast conditional variances from conditional variance models
infer        Infer conditional variances of conditional variance models
simulate     Monte Carlo simulation of conditional variance models
summarize    Display estimation results of conditional variance model
Examples
Create Default EGARCH Model
Create a default egarch model object and specify its parameter values using dot notation.
Create an EGARCH(0,0) model.

    Mdl = egarch

    Mdl =
      egarch with properties:

         Description: "EGARCH(0,0) Conditional Variance Model (Gaussian Distribution)"
        Distribution: Name = "Gaussian"
                   P: 0
                   Q: 0
            Constant: NaN
               GARCH: {}
                ARCH: {}
            Leverage: {}
              Offset: 0

Mdl is an egarch model. It contains an unknown constant, its offset is 0, and the innovation distribution is 'Gaussian'. The model does not have GARCH, ARCH, or leverage polynomials.
Specify two unknown ARCH and leverage coefficients for lags one and two using dot notation.

    Mdl.ARCH = {NaN NaN};
    Mdl.Leverage = {NaN NaN};
    Mdl

    Mdl =
      egarch with properties:

         Description: "EGARCH(0,2) Conditional Variance Model (Gaussian Distribution)"
        Distribution: Name = "Gaussian"
                   P: 0
                   Q: 2
            Constant: NaN
               GARCH: {}
                ARCH: {NaN NaN} at lags [1 2]
            Leverage: {NaN NaN} at lags [1 2]
              Offset: 0

The Q, ARCH, and Leverage properties update to 2, {NaN NaN}, and {NaN NaN}, respectively. The two ARCH and leverage coefficients are associated with lags 1 and 2.
Create EGARCH Model Using Shorthand Syntax
Create an egarch model object using the shorthand notation egarch(P,Q), where P is the degree of the GARCH polynomial and Q is the degree of the ARCH and leverage polynomials.
Create an EGARCH(3,2) model.

    Mdl = egarch(3,2)

    Mdl =
      egarch with properties:

         Description: "EGARCH(3,2) Conditional Variance Model (Gaussian Distribution)"
        Distribution: Name = "Gaussian"
                   P: 3
                   Q: 2
            Constant: NaN
               GARCH: {NaN NaN NaN} at lags [1 2 3]
                ARCH: {NaN NaN} at lags [1 2]
            Leverage: {NaN NaN} at lags [1 2]
              Offset: 0

Mdl is an egarch model object. The coefficient properties of Mdl (Constant, GARCH, ARCH, and Leverage) are NaN values; P, Q, Distribution, and Offset have known values. By default, the software:
• Includes a conditional variance model constant
• Excludes a conditional mean model offset (i.e., the offset is 0)
• Includes all lag terms in the GARCH polynomial up to lag P
• Includes all lag terms in the ARCH and leverage polynomials up to lag Q
Mdl specifies only the functional form of an EGARCH model. Because it contains unknown parameter values, you can pass Mdl and time-series data to estimate to estimate the parameters.
Create EGARCH Model Using Longhand Syntax
Create an egarch model object using name-value pair arguments.
Specify an EGARCH(1,1) model. By default, the conditional mean model offset is zero. Specify that the offset is NaN. Include a leverage term.

    Mdl = egarch('GARCHLags',1,'ARCHLags',1,'LeverageLags',1,'Offset',NaN)

    Mdl =
      egarch with properties:

         Description: "EGARCH(1,1) Conditional Variance Model with Offset (Gaussian Distribution)"
        Distribution: Name = "Gaussian"
                   P: 1
                   Q: 1
            Constant: NaN
               GARCH: {NaN} at lag [1]
                ARCH: {NaN} at lag [1]
            Leverage: {NaN} at lag [1]
              Offset: NaN

Mdl is an egarch model object. The software sets all parameters to NaN, except P, Q, and Distribution.
Since Mdl contains NaN values, Mdl is appropriate for estimation only. Pass Mdl and time-series data to estimate.
Create EGARCH Model with Known Coefficients
Create an EGARCH(1,3) model with mean offset,
$$y_t = 0.5 + \varepsilon_t,$$
where $\varepsilon_t = \sigma_t z_t$,
$$\log\sigma_t^2 = 0.0001 + 0.75\log\sigma_{t-1}^2 + 0.1\left[\frac{|\varepsilon_{t-1}|}{\sigma_{t-1}} - \sqrt{\frac{2}{\pi}}\right] - 0.3\,\frac{\varepsilon_{t-1}}{\sigma_{t-1}} + 0.01\,\frac{\varepsilon_{t-3}}{\sigma_{t-3}},$$
and $z_t$ is an independent and identically distributed standard Gaussian process.

    Mdl = egarch('Constant',0.0001,'GARCH',0.75,...
        'ARCH',0.1,'Offset',0.5,'Leverage',{-0.3 0 0.01})

    Mdl =
      egarch with properties:

         Description: "EGARCH(1,3) Conditional Variance Model with Offset (Gaussian Distribution)"
        Distribution: Name = "Gaussian"
                   P: 1
                   Q: 3
            Constant: 0.0001
               GARCH: {0.75} at lag [1]
                ARCH: {0.1} at lag [1]
            Leverage: {-0.3 0.01} at lags [1 3]
              Offset: 0.5

egarch assigns default values to any properties you do not specify with name-value pair arguments. An alternative way to specify the leverage component is 'Leverage',{-0.3 0.01},'LeverageLags',[1 3].
Access EGARCH Model Properties
Access the properties of a created egarch model object using dot notation.
Create an egarch model object.

    Mdl = egarch(3,2)

    Mdl =
      egarch with properties:

         Description: "EGARCH(3,2) Conditional Variance Model (Gaussian Distribution)"
        Distribution: Name = "Gaussian"
                   P: 3
                   Q: 2
            Constant: NaN
               GARCH: {NaN NaN NaN} at lags [1 2 3]
                ARCH: {NaN NaN} at lags [1 2]
            Leverage: {NaN NaN} at lags [1 2]
              Offset: 0

Remove the second GARCH term from the model. That is, specify that the GARCH coefficient of the second lagged conditional variance is 0.

    Mdl.GARCH{2} = 0

    Mdl =
      egarch with properties:

         Description: "EGARCH(3,2) Conditional Variance Model (Gaussian Distribution)"
        Distribution: Name = "Gaussian"
                   P: 3
                   Q: 2
            Constant: NaN
               GARCH: {NaN NaN} at lags [1 3]
                ARCH: {NaN NaN} at lags [1 2]
            Leverage: {NaN NaN} at lags [1 2]
              Offset: 0

The GARCH polynomial has two unknown parameters corresponding to lags 1 and 3.
Display the distribution of the disturbances.

    Mdl.Distribution

    ans = struct with fields:
        Name: "Gaussian"

The disturbances are Gaussian with mean 0 and variance 1.
Specify that the underlying disturbances have a t distribution with five degrees of freedom.

    Mdl.Distribution = struct('Name','t','DoF',5)

    Mdl =
      egarch with properties:

         Description: "EGARCH(3,2) Conditional Variance Model (t Distribution)"
        Distribution: Name = "t", DoF = 5
                   P: 3
                   Q: 2
            Constant: NaN
               GARCH: {NaN NaN} at lags [1 3]
                ARCH: {NaN NaN} at lags [1 2]
            Leverage: {NaN NaN} at lags [1 2]
              Offset: 0

Specify that the ARCH coefficients are 0.2 for the first lag and 0.1 for the second lag.

    Mdl.ARCH = {0.2 0.1}

    Mdl =
      egarch with properties:

         Description: "EGARCH(3,2) Conditional Variance Model (t Distribution)"
        Distribution: Name = "t", DoF = 5
                   P: 3
                   Q: 2
            Constant: NaN
               GARCH: {NaN NaN} at lags [1 3]
                ARCH: {0.2 0.1} at lags [1 2]
            Leverage: {NaN NaN} at lags [1 2]
              Offset: 0

To estimate the remaining parameters, you can pass Mdl and your data to estimate and use the specified parameters as equality constraints. Or, you can specify the rest of the parameter values, and then simulate or forecast conditional variances from the EGARCH model by passing the fully specified model to simulate or forecast, respectively.
Estimate EGARCH Model
Fit an EGARCH model to an annual time series of Danish nominal stock returns from 1922-1999.
Load the Data_Danish data set. Plot the nominal returns (RN).

    load Data_Danish;
    nr = DataTable.RN;

    figure;
    plot(dates,nr);
    hold on;
    plot([dates(1) dates(end)],[0 0],'r:'); % Plot y = 0
    hold off;
    title('Danish Nominal Stock Returns');
    ylabel('Nominal return (%)');
    xlabel('Year');

The nominal return series seems to have a nonzero conditional mean offset and seems to exhibit volatility clustering. That is, the variability is smaller for earlier years than it is for later years. For this example, assume that an EGARCH(1,1) model is appropriate for this series.
Create an EGARCH(1,1) model. The conditional mean offset is zero by default. To estimate the offset, specify that it is NaN. Include a leverage lag.

    Mdl = egarch('GARCHLags',1,'ARCHLags',1,'LeverageLags',1,'Offset',NaN);

Fit the EGARCH(1,1) model to the data.

    EstMdl = estimate(Mdl,nr);

    EGARCH(1,1) Conditional Variance Model with Offset (Gaussian Distribution):

                        Value      StandardError    TStatistic     PValue
                      _________    _____________    __________    _________
        Constant       -0.62723       0.74401        -0.84304        0.3992
        GARCH{1}        0.77419       0.23628          3.2766     0.0010507
        ARCH{1}         0.38636       0.37361          1.0341       0.30107
        Leverage{1}   -0.002499       0.19222       -0.013001       0.98963
        Offset          0.10325      0.037727          2.7368     0.0062047
EstMdl is a fully specified egarch model object. That is, it does not contain NaN values. You can assess the adequacy of the model by generating residuals using infer, and then analyzing them.
To simulate conditional variances or responses, pass EstMdl to simulate.
To forecast conditional variances, pass EstMdl to forecast.
Simulate EGARCH Model Observations and Conditional Variances
Simulate conditional variance or response paths from a fully specified egarch model object. That is, simulate from an estimated egarch model or a known egarch model in which you specify all parameter values.
Load the Data_Danish data set.

    load Data_Danish;
    rn = DataTable.RN;

Create an EGARCH(1,1) model with an unknown conditional mean offset. Fit the model to the annual, nominal return series. Include a leverage term.

    Mdl = egarch('GARCHLags',1,'ARCHLags',1,'LeverageLags',1,'Offset',NaN);
    EstMdl = estimate(Mdl,rn);

    EGARCH(1,1) Conditional Variance Model with Offset (Gaussian Distribution):

                        Value      StandardError    TStatistic     PValue
                      _________    _____________    __________    _________
        Constant       -0.62723       0.74401        -0.84304        0.3992
        GARCH{1}        0.77419       0.23628          3.2766     0.0010507
        ARCH{1}         0.38636       0.37361          1.0341       0.30107
        Leverage{1}   -0.002499       0.19222       -0.013001       0.98963
        Offset          0.10325      0.037727          2.7368     0.0062047

Simulate 100 paths of conditional variances and responses from the estimated EGARCH model.

    numObs = numel(rn); % Sample size (T)
    numPaths = 100;     % Number of paths to simulate
    rng(1);             % For reproducibility
    [VSim,YSim] = simulate(EstMdl,numObs,'NumPaths',numPaths);

VSim and YSim are T-by-numPaths matrices. Rows correspond to a sample period, and columns correspond to a simulated path.
Plot the average and the 97.5% and 2.5% percentiles of the simulated paths. Compare the simulation statistics to the original data.

    VSimBar = mean(VSim,2);
    VSimCI = quantile(VSim,[0.025 0.975],2);
    YSimBar = mean(YSim,2);
    YSimCI = quantile(YSim,[0.025 0.975],2);

    figure;
    subplot(2,1,1);
    h1 = plot(dates,VSim,'Color',0.8*ones(1,3));
    hold on;
    h2 = plot(dates,VSimBar,'k--','LineWidth',2);
    h3 = plot(dates,VSimCI,'r--','LineWidth',2);
    hold off;
    title('Simulated Conditional Variances');
    ylabel('Cond. var.');
    xlabel('Year');

    subplot(2,1,2);
    h1 = plot(dates,YSim,'Color',0.8*ones(1,3));
    hold on;
    h2 = plot(dates,YSimBar,'k--','LineWidth',2);
    h3 = plot(dates,YSimCI,'r--','LineWidth',2);
    hold off;
    title('Simulated Nominal Returns');
    ylabel('Nominal return (%)');
    xlabel('Year');
    legend([h1(1) h2 h3(1)],{'Simulated path' 'Mean' 'Confidence bounds'},...
        'FontSize',7,'Location','NorthWest');
Forecast EGARCH Model Conditional Variances
Forecast conditional variances from a fully specified egarch model object. That is, forecast from an estimated egarch model or a known egarch model in which you specify all parameter values. The example follows from “Estimate EGARCH Model” on page 12-495.
Load the Data_Danish data set.

    load Data_Danish;
    nr = DataTable.RN;

Create an EGARCH(1,1) model with an unknown conditional mean offset and include a leverage term. Fit the model to the annual nominal return series.

    Mdl = egarch('GARCHLags',1,'ARCHLags',1,'LeverageLags',1,'Offset',NaN);
    EstMdl = estimate(Mdl,nr);

    EGARCH(1,1) Conditional Variance Model with Offset (Gaussian Distribution):

                        Value      StandardError    TStatistic     PValue
                      _________    _____________    __________    _________
        Constant       -0.62723       0.74401        -0.84304        0.3992
        GARCH{1}        0.77419       0.23628          3.2766     0.0010507
        ARCH{1}         0.38636       0.37361          1.0341       0.30107
        Leverage{1}   -0.002499       0.19222       -0.013001       0.98963
        Offset          0.10325      0.037727          2.7368     0.0062047

Forecast the conditional variance of the nominal return series 10 years into the future using the estimated EGARCH model. Specify the entire returns series as presample observations. The software infers presample conditional variances using the presample observations and the model.

    numPeriods = 10;
    vF = forecast(EstMdl,numPeriods,nr);

Plot the forecasted conditional variances of the nominal returns. Compare the forecasts to the conditional variances inferred over the estimation sample.

    v = infer(EstMdl,nr);

    figure;
    plot(dates,v,'k:','LineWidth',2);
    hold on;
    plot(dates(end):dates(end) + 10,[v(end);vF],'r','LineWidth',2);
    title('Forecasted Conditional Variances of Nominal Returns');
    ylabel('Conditional variances');
    xlabel('Year');
    legend({'Estimation sample cond. var.','Forecasted cond. var.'},...
        'Location','Best');
More About
EGARCH Model
An EGARCH model is a dynamic model that addresses conditional heteroscedasticity, or volatility clustering, in an innovations process. Volatility clustering occurs when an innovations process does not exhibit significant autocorrelation, but the variance of the process changes with time.
An EGARCH model posits that the logarithm of the current conditional variance is the sum of these linear processes:
• Past logged conditional variances (the GARCH component or polynomial)
• Magnitudes of past standardized innovations (the ARCH component or polynomial)
• Past standardized innovations (the leverage component or polynomial)
Consider the time series
$$y_t = \mu + \varepsilon_t,$$
where $\varepsilon_t = \sigma_t z_t$. The EGARCH(P,Q) conditional variance process, $\sigma_t^2$, has the form
$$\log\sigma_t^2 = \kappa + \sum_{i=1}^{P}\gamma_i\log\sigma_{t-i}^2 + \sum_{j=1}^{Q}\alpha_j\left[\frac{|\varepsilon_{t-j}|}{\sigma_{t-j}} - E\left\{\frac{|\varepsilon_{t-j}|}{\sigma_{t-j}}\right\}\right] + \sum_{j=1}^{Q}\xi_j\left(\frac{\varepsilon_{t-j}}{\sigma_{t-j}}\right).$$
The table shows how the variables correspond to the properties of the egarch object.

Variable    Description                                                          Property
μ           Innovation mean model constant offset                                'Offset'
κ           Conditional variance model constant                                  'Constant'
γi          GARCH component coefficients                                         'GARCH'
αj          ARCH component coefficients                                          'ARCH'
ξj          Leverage component coefficients                                      'Leverage'
zt          Series of independent random variables with mean 0 and variance 1    'Distribution'
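The recursion can be sketched by hand in base MATLAB for an EGARCH(1,1) process with Gaussian z_t, for which E|z| = sqrt(2/π). This is only an illustration of the equation, not how simulate is implemented; all parameter values and variable names below are hypothetical. Because ε/σ = z, the standardized innovation enters the recursion directly.

```matlab
% Hand-rolled EGARCH(1,1) recursion (illustrative sketch, Gaussian z assumed)
rng(1);                         % For reproducibility
T      = 200;                   % Number of periods (hypothetical)
mu     = 0;                     % Offset (hypothetical)
kappa  = -0.6;                  % Constant (hypothetical)
gamma1 = 0.77;                  % GARCH{1} (hypothetical)
alpha1 = 0.39;                  % ARCH{1} (hypothetical)
xi1    = -0.1;                  % Leverage{1} (hypothetical)
EabsZ  = sqrt(2/pi);            % E|z| for standard Gaussian z

z    = randn(T,1);              % Standardized innovations
logv = zeros(T,1);
logv(1) = kappa/(1 - gamma1);   % Start at the unconditional mean of log variance
for t = 2:T
    logv(t) = kappa + gamma1*logv(t-1) ...
        + alpha1*(abs(z(t-1)) - EabsZ) ...   % ARCH (magnitude) term
        + xi1*z(t-1);                        % Leverage (sign) term
end
v = exp(logv);                  % Conditional variances, positive by construction
y = mu + sqrt(v).*z;            % Responses y_t = mu + sigma_t*z_t
```

Exponentiating the log variance guarantees v > 0 for any coefficient signs, which is why the model needs no positivity constraints on the coefficients.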
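If $z_t$ is Gaussian, then
$$E\left\{\frac{|\varepsilon_{t-j}|}{\sigma_{t-j}}\right\} = E\left\{|z_{t-j}|\right\} = \sqrt{\frac{2}{\pi}}.$$
If $z_t$ is t distributed with $\nu > 2$ degrees of freedom, then
$$E\left\{\frac{|\varepsilon_{t-j}|}{\sigma_{t-j}}\right\} = E\left\{|z_{t-j}|\right\} = \sqrt{\frac{\nu-2}{\pi}}\,\frac{\Gamma\left(\frac{\nu-1}{2}\right)}{\Gamma\left(\frac{\nu}{2}\right)}.$$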
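As a rough numerical check of the two expectations, the following base MATLAB sketch evaluates both closed forms and compares the t case against a Monte Carlo average. The degrees of freedom value and the construction of t draws from normal draws (valid for integer ν) are choices made for this illustration only.

```matlab
% Closed-form E|z| for Gaussian and unit-variance t innovations (sketch)
EabsGauss = sqrt(2/pi);

nu    = 5;                                   % Degrees of freedom (hypothetical, integer > 2)
EabsT = sqrt((nu-2)/pi)*gamma((nu-1)/2)/gamma(nu/2);

% Monte Carlo check: build Student's t draws from normals, then
% rescale to variance 1 as the model assumes for z.
rng(1);                                      % For reproducibility
n    = 1e5;
tRaw = randn(n,1)./sqrt(sum(randn(n,nu).^2,2)/nu);   % t with nu degrees of freedom
z    = sqrt((nu-2)/nu)*tRaw;                 % Standardize to unit variance
EabsMC = mean(abs(z));                       % Should be close to EabsT
```

For a fixed unit variance, the heavier-tailed t distribution has a smaller mean absolute value than the Gaussian, so EabsT is below EabsGauss.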
To ensure a stationary EGARCH model, all roots of the GARCH lag operator polynomial, (1 − γ1L − … − γPL^P), must lie outside of the unit circle.
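A minimal sketch of this check in base MATLAB follows. It uses the Constant and GARCH{1} estimates reported in the estimation examples on this page, and it evaluates the unconditional variance formula given for the UnconditionalVariance property. The variable names are hypothetical, and this is not the toolbox's internal check.

```matlab
% Stationarity check on the GARCH lag operator polynomial (sketch)
kappa       = -0.62723;    % Constant estimate from the examples
garchCoeffs = 0.77419;     % GARCH coefficients [gamma_1 ... gamma_P]

% Polynomial 1 - gamma_1*L - ... - gamma_P*L^P.
% roots expects coefficients in descending powers of L.
p = [-fliplr(garchCoeffs(:)') 1];
r = roots(p);
isStationary = all(abs(r) > 1);    % All roots outside the unit circle

uncondVar = NaN;
if isStationary
    % Unconditional variance: exp(kappa/(1 - sum of GARCH coefficients))
    uncondVar = exp(kappa/(1 - sum(garchCoeffs)));
end
```

With a single GARCH coefficient, the only root is 1/γ1, so the condition reduces to |γ1| < 1.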
The EGARCH model differs from the GARCH and GJR models because it models the logarithm of the variance. By modeling the logarithm, positivity constraints on the model parameters are relaxed. However, forecasts of conditional variances from an EGARCH model are biased, because by Jensen’s inequality,
$$E(\sigma_t^2) \ge \exp\left\{E(\log\sigma_t^2)\right\}.$$
EGARCH models are appropriate when positive and negative shocks of equal magnitude do not contribute equally to volatility [1].
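The direction of the bias can be illustrated with a short base MATLAB sketch. The distribution chosen for the log variance below is hypothetical and is not produced by any fitted model; the point is only that averaging after exponentiating never gives less than exponentiating the average.

```matlab
% Jensen's inequality for the exponential function (illustrative sketch)
rng(1);                               % For reproducibility
logv = -2.8 + 0.5*randn(1e5,1);       % Hypothetical draws of log conditional variance

meanOfVar    = mean(exp(logv));       % Sample analog of E(sigma_t^2)
expOfMeanLog = exp(mean(logv));       % Sample analog of exp(E(log sigma_t^2))
gap = meanOfVar - expOfMeanLog;       % Nonnegative by Jensen's inequality
```

The gap grows with the dispersion of the log variance, so back-transformed log-variance forecasts understate the expected variance more when uncertainty is larger.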
Tips
• You can specify an egarch model as part of a composition of conditional mean and variance models. For details, see arima.
• An EGARCH(1,1) specification is complex enough for most applications. Typically in these models, the GARCH and ARCH coefficients are positive, and the leverage coefficients are negative. If you get these signs, then large unanticipated downward shocks increase the variance. If you get signs opposite to those signs that are expected, you can encounter difficulties inferring volatility sequences and forecasting. A negative ARCH coefficient is problematic. In this case, an EGARCH model might not be the best choice for your application.
Version History
Introduced in R2012a

References
[1] Tsay, R. S. Analysis of Financial Time Series. 3rd ed. Hoboken, NJ: John Wiley & Sons, Inc., 2010.

See Also
Objects
garch | gjr | arima
Topics
“Specify EGARCH Models” on page 8-17
“Modify Properties of Conditional Variance Models” on page 8-39
“Specify Conditional Mean and Variance Models” on page 7-81
“Infer Conditional Variances and Residuals” on page 8-62
“Compare Conditional Variance Models Using Information Criteria” on page 8-69
“Assess EGARCH Forecast Bias Using Simulations” on page 8-81
“Forecast a Conditional Variance Model” on page 8-97
“Conditional Variance Models” on page 8-2
“EGARCH Model” on page 8-3
egcitest
Engle-Granger cointegration test

Syntax
h = egcitest(Y)
[h,pValue,stat,cValue] = egcitest(Y)
StatTbl = egcitest(Tbl)
[ ___ ] = egcitest( ___ ,Name=Value)
[ ___ ,reg1,reg2] = egcitest( ___ )
Description
h = egcitest(Y) returns the rejection decision h from conducting the Engle-Granger cointegration test for assessing the null hypothesis of no cointegration among the variables in the multivariate time series Y. egcitest forms test statistics by regressing the response data Y(:,1) onto the predictor data Y(:,2:end), and then tests the residuals for a unit root.

[h,pValue,stat,cValue] = egcitest(Y) also returns the p-value pValue, test statistic stat, and critical value cValue of the test.

StatTbl = egcitest(Tbl) returns the table StatTbl containing variables for the test results, statistics, and settings from conducting the Engle-Granger cointegration test on the variables of the table or timetable Tbl. The response variable in the regression is the first table variable, and all other variables are the predictor variables.
To select a different response variable for the regression, use the ResponseVariable name-value argument. To select different predictor variables, use the PredictorVariables name-value argument.

[ ___ ] = egcitest( ___ ,Name=Value) specifies options using one or more name-value arguments in addition to any of the input argument combinations in previous syntaxes. egcitest returns the output argument combination for the corresponding input arguments.
Some options control the number of tests to conduct. The following conditions apply when egcitest conducts multiple tests:
• egcitest treats each test as separate from all other tests.
• If you specify Y, all outputs are vectors.
• If you specify Tbl, each row of StatTbl contains the results of the corresponding test.
For example, egcitest(Tbl,ResponseVariable="GDP",Alpha=0.025,Lags=[0 1]) chooses GDP as the response variable from the table Tbl and conducts two tests at a level of significance of 0.025. The first test includes 0 lags in the residual regression, and the second test includes 1 lag in the residual regression.

[ ___ ,reg1,reg2] = egcitest( ___ ) additionally returns the following structures of regression statistics, which are required to form the test statistic:
• reg1 – Statistics resulting from the cointegrating regression of the specified response variable ResponseVariable onto the specified predictor variables PredictorVariables
• reg2 – Statistics resulting from the residual regression implemented by the specified unit root test RReg
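The two regressions can be sketched by hand in base MATLAB on synthetic data. This is only an illustration of the two-step idea: it assumes a constant in the cointegrating regression, a residual regression with no lags and no deterministic terms, and made-up data and variable names. It does not reproduce the critical values or p-values that egcitest reports, because the residual-based statistic does not follow a standard t distribution.

```matlab
% Two-step Engle-Granger sketch on synthetic data (illustration only)
rng(1);                              % For reproducibility
T = 200;
x = cumsum(randn(T,1));              % Random-walk predictor
Y = [2 + 0.5*x + randn(T,1), x];     % Response cointegrated with x by construction

% Step 1: cointegrating regression of Y(:,1) on a constant and Y(:,2:end)
X     = [ones(T,1) Y(:,2:end)];
coeff = X\Y(:,1);                    % Least-squares estimates [a; b]
res   = Y(:,1) - X*coeff;            % Cointegrating-regression residuals

% Step 2: residual regression, diff(res) on lagged res (no lags, no constant)
dres   = diff(res);
lagRes = res(1:end-1);
rho    = lagRes\dres;                % Slope of the Dickey-Fuller style regression
u      = dres - lagRes*rho;
se     = sqrt((u'*u)/(numel(dres) - 1)/(lagRes'*lagRes));
tauStat = rho/se;                    % t-type statistic; large negative values favor cointegration
```

The statistic must be compared against critical values specific to residual-based cointegration tests, which is the part egcitest handles.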
Examples
Conduct Engle-Granger Cointegration Test on Matrix of Data
Test a multivariate time series for cointegration using the default values of the Engle-Granger cointegration test. Input the time series data as a numeric matrix.
Load data of Canadian inflation and interest rates Data_Canada.mat, which contains the series in the matrix Data.

    load Data_Canada
    series'

    ans = 5x1 cell
        {'(INF_C) Inflation rate (CPI-based)'         }
        {'(INF_G) Inflation rate (GDP deflator-based)'}
        {'(INT_S) Interest rate (short-term)'         }
        {'(INT_M) Interest rate (medium-term)'        }
        {'(INT_L) Interest rate (long-term)'          }

Test the interest rate series for cointegration by using the Engle-Granger cointegration test. Use default options and return the rejection decision.

    h = egcitest(Data(:,3:end))

    h = logical
       0

egcitest uses the τ test by default, and it fails to reject the null hypothesis (h = 0) of no cointegration among the interest rate series.
Return Test p-Value and Decision Statistics
Load data of Canadian inflation and interest rates Data_Canada.mat.

    load Data_Canada

Test the interest rate series for cointegration by using the Engle-Granger cointegration test. Use default options and return the rejection decision, p-value, τ-test statistic, and critical value.

    [h,pValue,stat,cValue] = egcitest(Data(:,3:end))

    h = logical
       0

    pValue = 0.0526
    stat = -3.9321
    cValue = -3.9563
Conduct Default Engle-Granger Cointegration Test on Table Variables
Conduct the Engle-Granger cointegration test on a multivariate time series using default options, which use the first table variable as the response, use all other table variables as predictors, and include a constant term in the cointegrating regression. Return a table of test results.
Load data of Canadian inflation and interest rates Data_Canada.mat. Convert the table DataTable to a timetable.

    load Data_Canada
    dates = datetime(dates,12,31);
    TT = table2timetable(DataTable,RowTimes=dates);
    TT.Observations = [];

Conduct the Engle-Granger cointegration test by passing the timetable to egcitest and using default options. For the cointegrating regression, egcitest uses the CPI-based inflation rate as the response variable and all other variables in the timetable as predictors.

    StatTbl = egcitest(TT)

    StatTbl=1×9 table
                    h       pValue       stat      cValue     Lags    Alpha     Test      CReg     RR
                  _____    _________    _______    _______    ____    _____    ______    _____    ___
        Test 1    true     0.0023851    -6.2491    -4.7673     0      0.05     {'t1'}    {'c'}    {'A

StatTbl is a table of test results. The rows correspond to tests, and the columns correspond to the rejection decision, the corresponding p-value and decision statistics, and the specified test options. In this case, the test rejects the null hypothesis in favor of the alternative of cointegration among all the table variables.
By default, egcitest includes all input table variables in the cointegration test. To select a response variable for the cointegrating regression, set the ResponseVariable option. To select predictor variables, set the PredictorVariables option.
Select Test Statistics and Plot Cointegrating Relation
Load data of Canadian inflation and interest rates Data_Canada.mat. Convert the table DataTable to a timetable of the interest rate series only.

    load Data_Canada
    dates = datetime(dates,12,31);
    idxINT = contains(DataTable.Properties.VariableNames,"INT");
    TT = table2timetable(DataTable(:,idxINT),RowTimes=dates);
    TT.Observations = [];

Plot the interest rate series.

    figure
    plot(TT.Time,TT.Variables)
    legend(series(idxINT),Location="northwest")
    grid on

Reproduce row 1 of Table II in [3] by testing for cointegration, specifying the default variable assignments for the cointegrating regression and deterministic terms (response variable y1 is INT_S, the other interest rates y2 and y3 are predictors, and the model has a constant c), and specifying the τ and z tests. Return the cointegrating regression statistics.

    [StatTbl,reg] = egcitest(TT,Test=["t1" "t2"]);
    StatTbl

    StatTbl=2×9 table
                    h       pValue      stat      cValue     Lags    Alpha     Test      CReg     RRe
                  _____    ________    _______    _______    ____    _____    ______    _____    ____
        Test 1    false    0.052627    -3.9321    -3.9563     0      0.05     {'t1'}    {'c'}    {'AD
        Test 2    true     0.020157    -25.454    -22.115     0      0.05     {'t2'}    {'c'}    {'AD

The τ test (Test 1) fails to reject the null hypothesis, but the z test (Test 2) rejects the null hypothesis in favor of the presence of cointegration.
Plot the estimated cointegrating relation using the regression statistics from the z test,
$$y_1 - \begin{bmatrix} y_2 & y_3 \end{bmatrix}\begin{bmatrix} b_1 \\ b_2 \end{bmatrix} - Xa,$$
where $Xa = c$.

    c = reg(2).coeff(1);
    b = reg(2).coeff(2:3);

    figure
    plot(TT.Time,TT.Variables*[1; -b] - c)
    grid on
Input Arguments
Y - Data representing observations of multivariate time series yt
numeric matrix

Data representing observations of a multivariate time series yt, specified as a numObs-by-numDims numeric matrix. Each column of Y corresponds to a variable, and each row corresponds to an observation. The test regresses the response variable Y(:,1) on the predictor variables Y(:,2:end).
Data Types: double

Tbl - Data representing observations of multivariate time series yt
table | timetable

Data representing observations of a multivariate time series yt, specified as a table or timetable with numObs rows. Each row of Tbl is an observation.
The test regresses the response variable, which is the first variable in Tbl, on the predictor variables, which are all other variables in Tbl. To select a different response variable for the regression, use the ResponseVariable name-value argument. To select different predictor variables, use the PredictorVariables name-value argument. The selected variables must be numeric.

Note egcitest removes, from the specified data, all observations containing at least one missing value, represented by a NaN value.

Name-Value Arguments
Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter.
Before R2021a, use commas to separate each name and value, and enclose Name in quotes.
Example: egcitest(Tbl,ResponseVariable="GDP",Alpha=0.025,Lags=[0 1]) chooses GDP as the response variable from the table Tbl and conducts two tests at a level of significance of 0.025. The first test includes 0 lags in the residual regression, and the second test includes 1 lag in the residual regression.

CReg - Cointegrating regression form
"c" (default) | "nc" | "ct" | "ctt" | character vector | string vector | cell vector of character vectors

Cointegrating regression form, specified as the name of a form, or a string vector or cell vector of form names. In general, the cointegrating regression is
$$y_1 = Xa + Y_2 b + \varepsilon,$$
where y1 is the response variable, Y2 contains the predictor variables, and X is a design matrix for optional deterministic coefficients a, including a constant, linear time trend, and quadratic time trend.
This table contains the supported forms and their names.

Form Name    Description
"nc"         The regression does not include X; no constant or trends.
"c"          X contains a variable for the constant, but not for the trends.
"ct"         X contains variables for the constant and the linear time trend.
"ctt"        X contains variables for the constant, linear time trend, and quadratic time trend.
egcitest conducts a separate test for each form name in CReg.
Example: CReg=["ct" "ctt"] includes a constant and linear time trend terms in the cointegrating regression for the first test, and then includes all three deterministic terms in the cointegrating regression for the second test.
Data Types: char | string | cell
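The deterministic design matrix X for each form can be pictured with a short base MATLAB sketch. The time index 1,...,T and the struct layout are assumptions made for illustration; the source does not state how egcitest scales or indexes the trend internally.

```matlab
% Deterministic design matrices for each CReg form (illustrative sketch)
T = 6;                       % Number of observations (hypothetical)
t = (1:T)';                  % Time index, assumed to run 1,...,T

X = struct();
X.nc  = zeros(T,0);          % "nc":  no deterministic terms
X.c   = ones(T,1);           % "c":   constant only
X.ct  = [ones(T,1) t];       % "ct":  constant and linear trend
X.ctt = [ones(T,1) t t.^2];  % "ctt": constant, linear trend, quadratic trend

% Number of deterministic coefficients in a for each form, in field order
numDeterministic = structfun(@(M) size(M,2), X);
```

The column counts 0, 1, 2, and 3 are the possible lengths of the deterministic coefficient vector a.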
CVec - Cointegrating-regression coefficient equality constraints
numeric vector (default) | cell vector of numeric vectors

Cointegrating-regression coefficient equality constraints, specified as the numeric vector [a; b] or cell vector of such numeric vectors.
a contains the equality constraints of the deterministic terms in the cointegrating regression. The length of a depends on the corresponding value of the CReg name-value argument, one of 0, 1, 2, or 3. For coefficients in the regression, their order in a is constant, linear trend, and quadratic trend.
b contains the numDims − 1 equality constraints for the coefficients of the corresponding predictor variables in Y2.
Specify NaN entries to estimate the corresponding coefficient in the regression.
When CVec is completely specified (does not contain any NaN values), egcitest does not perform the cointegrating regression.
By default, CVec is a completely unspecified cointegrating vector (completely composed of NaN values). Consequently, egcitest estimates all coefficients.
egcitest conducts a separate test for each set of equality constraints in CVec.
Example: egcitest(Tbl,CVec=[2 NaN NaN]) fixes the constant in the cointegrating regression to 2 and estimates the coefficients of the two predictor variables in Tbl.
Example: egcitest(Tbl,CVec={[2 NaN NaN]; nan(3,1)}), for the first test, fixes the constant in the cointegrating regression to 2 and estimates the coefficients of the two predictor variables in Tbl, and for the second test, estimates all coefficients.
Example: egcitest(Tbl,CReg="ctt",CVec=[2 0.5 0.25 NaN NaN]) fixes the constant to 2, the linear trend to 0.5, and the quadratic trend to 0.25, and estimates the coefficients of the two predictor variables in Tbl.
Data Types: double | cell

RReg - Residual regression form
"adf" (default) | "pp" | character vector | string vector | cell vector of character vectors

Residual regression form, specified as the name of a form, or a string vector or cell vector of form names.

Form Name    Description
"adf"        Augmented Dickey-Fuller test (adftest) of residuals from the cointegrating regression
"pp"         Phillips-Perron test (pptest) of residuals from the cointegrating regression

egcitest computes test statistics by calling adftest and pptest with the setting Model="AR". This setting requires residuals from appropriately demeaned and detrended data, which is specified by the cointegrating-regression form CReg.
egcitest conducts a separate test for each form name in RReg.
Example: RReg=["adf" "pp"] performs the augmented Dickey-Fuller test for the residual regression of the first test, and then performs the Phillips-Perron test for the residual regression of the second test.
egcitest
Data Types: char | string | cell

Lags — Number of lags in residual regression
0 (default) | nonnegative integer | vector of nonnegative integers
Number of lags in the residual regression, specified as a nonnegative integer or vector of nonnegative integers. The meaning of Lags depends on the value of the RReg name-value argument. For more details, see the Lags argument of the adftest and pptest functions.
egcitest conducts a separate test for each element in Lags.
Example: Lags=[0 1] includes no lags in the residual regression for the first test, and then includes one lag for the residual regression for the second test.
Data Types: double

Test — Test statistic type from residual regression
"t1" (default) | "t2" | character vector | string vector | cell vector of character vectors
Test statistic type from residual regression, specified as a test name, or a string vector or cell vector of test names. This table contains the supported test names.

Test Name    Description
"t1"         τ test
"t2"         z test

For more details, see the Test argument of the adftest and pptest functions.
egcitest conducts a separate test for each element in Test.
Example: Test=["t1" "t2"] computes the τ test from the residual regression for the first test, and then computes the z test from the residual regression for the second test.
Data Types: char | cell | string

Alpha — Nominal significance level
0.05 (default) | numeric scalar | numeric vector
Nominal significance level for the hypothesis test, specified as a numeric scalar between 0.001 and 0.999 or a numeric vector of such values.
egcitest conducts a separate test for each value in Alpha.
Example: Alpha=[0.01 0.05] uses a level of significance of 0.01 for the first test, and then uses a level of significance of 0.05 for the second test.
Data Types: double

ResponseVariable — Variable in Tbl to use for response
first variable in Tbl (default) | string vector | cell vector of character vectors | vector of integers | logical vector
Variable in Tbl to use for response in the cointegrating regression, specified as a string vector or cell vector of character vectors containing variable names in Tbl.Properties.VariableNames, or an integer or logical vector representing the indices of names. The selected variables must be numeric.
egcitest uses the same specified response variable for all tests.
Example: ResponseVariable="GDP"
Data Types: double | logical | char | cell | string

PredictorVariables — Variables in Tbl to use for the predictors
string vector | cell vector of character vectors | vector of integers | logical vector
Variables in Tbl to use for the predictors in the cointegrating regression, specified as a string vector or cell vector of character vectors containing variable names in Tbl.Properties.VariableNames, or an integer or logical vector representing the indices of names. The selected variables must be numeric.
egcitest uses the same specified predictors for all tests.
By default, egcitest uses all variables in Tbl that are not specified by the ResponseVariable name-value argument.
Example: PredictorVariables=["UN" "CPI"]
Example: PredictorVariables=[true true false false] or PredictorVariables=[1 2] selects the first and second table variables.
Data Types: double | logical | char | cell | string

Note
• When egcitest conducts multiple tests, the function applies all single settings (scalars or character vectors) to each test.
• All vector-valued specifications that control the number of tests must have equal length.
• If you specify the matrix Y and any value is a row vector, all outputs are row vectors.
• A lagged and differenced time series has a reduced sample size. Absent presample values, if the test series y_t is defined for t = 1,…,T, the lagged series y_{t−k} is defined for t = k+1,…,T. The first difference applied to the lagged series y_{t−k} further reduces the time base to k+2,…,T. With p lagged differences, the common time base is p+2,…,T and the effective sample size is T−(p+1).
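The sample-size bookkeeping in the last note can be checked with a short base-MATLAB sketch. The series, the sizes T and p, and all variable names below are hypothetical; the code only builds the lagged level and the p lagged differences by hand and does not call egcitest.

```matlab
% Sketch: effective sample size after lagging and differencing (no presample).
rng(0);                          % for reproducibility
T = 100;                         % hypothetical series length
p = 3;                           % hypothetical number of lagged differences
y = cumsum(randn(T,1));          % hypothetical test series, t = 1,...,T

t = (p+2:T).';                   % common time base p+2,...,T
dy = y(t) - y(t-1);              % first difference on the common time base
lagLevel = y(t-1);               % lagged level y_{t-1}
lagDiffs = zeros(numel(t),p);    % lagged differences, one column per lag
for k = 1:p
    lagDiffs(:,k) = y(t-k) - y(t-k-1);   % smallest index used is t-p-1 = 1
end

effN = numel(t);                 % effective sample size
fprintf('Effective sample size: %d (T - (p+1) = %d)\n',effN,T-(p+1));
```

The deepest lag, y(t-p-1), needs t-p-1 ≥ 1, which is why the common time base starts at p+2.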
Output Arguments
h — Test rejection decisions
logical scalar | logical vector
Test rejection decisions, returned as a logical scalar or vector with length equal to the number of tests. egcitest returns h when you supply the input Y.
• Values of 1 indicate rejection of the null hypothesis in favor of the alternative of cointegration.
• Values of 0 indicate failure to reject the null hypothesis.

pValue — Test statistic p-values
numeric scalar | numeric vector
Test statistic p-values, returned as a numeric scalar or vector with length equal to the number of tests. egcitest returns pValue when you supply the input Y. The p-values are left-tailed probabilities.
stat — Test statistics
numeric scalar | numeric vector
Test statistics, returned as a numeric scalar or vector with length equal to the number of tests. egcitest returns stat when you supply the input Y. The RReg and Test settings of a particular test determine the test statistic. For more details, see adftest and pptest.

cValue — Critical values
numeric scalar | numeric vector
Critical values, returned as a numeric scalar or vector with length equal to the number of tests. egcitest returns cValue when you supply the input Y. The critical values are for left-tailed probabilities.
Because egcitest estimates the residuals (that is, residuals are unobserved), critical values are different from those used in adftest or pptest (unless the cointegrating vector is completely specified by the CVec setting). egcitest loads tables of critical values from the file Data_EGCITest.mat, and then linearly interpolates test critical values from the tables. Critical values in the tables derive from methods described in [3].

StatTbl — Test summary
table
Test summary, returned as a table with variables for the outputs h, pValue, stat, and cValue, and with a row for each test. egcitest returns StatTbl when you supply the input Tbl. StatTbl contains variables for the test settings specified by Lags, Alpha, Test, CReg, and RReg.

reg1 — Cointegrating regression statistics
structure array
Cointegrating regression statistics, returned as a structure array. The number of records equals the number of tests. egcitest regresses the response variable ResponseVariable onto the predictor variables PredictorVariables using the regression form CReg and specified equality constraints CVec.
Each element of reg1 has the fields in this table. You can access a field using dot notation, for example, reg1(3).coeff contains the coefficient estimates of the third test.

Field     Description
num       Length of input series with NaNs removed
size      Effective sample size, adjusted for lags and difference
names     Regression coefficient names
coeff     Estimated coefficient values
se        Estimated coefficient standard errors
Cov       Estimated coefficient covariance matrix
tStats    t statistics of coefficients and p-values
FStat     F statistic and p-value
yMu       Mean of the lag-adjusted input series
ySigma    Standard deviation of the lag-adjusted input series
yHat      Fitted values of the lag-adjusted input series
res       Regression residuals
DWStat    Durbin-Watson statistic
SSR       Regression sum of squares
SSE       Error sum of squares
SST       Total sum of squares
MSE       Mean square error
RMSE      Standard error of the regression
RSq       R² statistic
aRSq      Adjusted R² statistic
LL        Loglikelihood of data under Gaussian innovations
AIC       Akaike information criterion
BIC       Bayesian (Schwarz) information criterion
HQC       Hannan-Quinn information criterion
reg2 — Residual regression statistics
structure array
Residual regression statistics, returned as a structure array containing the same fields as reg1. The number of records equals the number of tests. egcitest tests the residuals of the cointegrating regression for a unit root by passing the residuals, and the values of Lags and Test, to the test specified by RReg. The tests form the test statistic by a regression of the residuals using the specified options. For more details on the test options and the fields of reg2, see adftest or pptest.
Tips
• To draw valid inferences from the test, determine a suitable value for Lags. For more details, see the adftest “Tips” on page 12-17 and the pptest “Tips” on page 12-1759.
• Samples with fewer than approximately 20 to 40 observations (depending on the dimension of the data numDims) can yield unreliable critical values, and therefore unreliable inferences. See [3].
• If a test result suggests that the time series are cointegrated, you can use the residuals as data for the error-correction term in a VEC representation of the variables. Follow this procedure:
1 Extract the residuals from the reg1 output (reg1.res).
2 Estimate autoregressive model components using the estimate function of varm, and treat the extracted residual series as exogenous for estimation.
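The idea behind this two-step procedure can be sketched in base MATLAB on simulated data. This is not the toolbox workflow: ordinary least squares (the backslash operator) stands in for both egcitest and the varm estimate function, and the simulated series, coefficients, and variable names are all hypothetical.

```matlab
% Sketch of the two-step idea using plain least squares on simulated data.
rng(0);                              % for reproducibility
T = 200;                             % hypothetical sample size
x = cumsum(randn(T,1));              % I(1) predictor (random walk)
y = 1 + 2*x + randn(T,1);            % response cointegrated with x

% Step 1 analog: cointegrating regression and its residuals
% (plays the role of reg1.coeff and reg1.res).
Z = [ones(T,1) x];
coeff = Z\y;                         % [constant; slope]
res = y - Z*coeff;

% Step 2 analog: regress first differences on the lagged residual,
% which acts as the exogenous error-correction term.
dY = diff([y x]);                    % (T-1)-by-2 differenced data
ect = res(1:end-1);                  % residual lagged one period
W = [ones(T-1,1) ect];
A = W\dY;                            % row 1: constants; row 2: adjustment speeds
alpha = A(2,:);

fprintf('Cointegrating slope: %.3f\n',coeff(2));
fprintf('Adjustment speeds (y, x): %.3f, %.3f\n',alpha(1),alpha(2));
```

In this simulated setup, only y responds to the lagged equilibrium error, so its adjustment speed is negative while the adjustment speed of x is close to zero.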
Alternative Functionality App The Econometric Modeler app enables you to conduct the Engle-Granger cointegration test.
Version History Introduced in R2011a
References [1] Engle, R. F. and C. W. J. Granger. "Co-Integration and Error-Correction: Representation, Estimation, and Testing." Econometrica. Vol. 55, 1987, pp. 251–276. [2] Hamilton, James D. Time Series Analysis. Princeton, NJ: Princeton University Press, 1994. [3] MacKinnon, J. G. "Numerical Distribution Functions for Unit Root and Cointegration Tests." Journal of Applied Econometrics. Vol. 11, 1996, pp. 601–618.
See Also Objects vecm | varm Functions estimate | jcitest | adftest | pptest | vec2var Topics “Cointegration and Error Correction Analysis” on page 9-108 “Identifying Single Cointegrating Relations” on page 9-114 “Test for Cointegration Using the Engle-Granger Test” on page 9-118 “Estimate VEC Model Parameters Using egcitest” on page 9-122 “Specifying Multivariate Lag Operator Polynomials and Coefficient Constraints Interactively” on page 4-50
eigplot Plot Markov chain eigenvalues
Syntax
eigplot(mc)
eVals = eigplot(mc)
eigplot(ax,mc)
[eVals,h] = eigplot( ___ )
Description
eigplot(mc) creates a plot containing the eigenvalues of the transition matrix of the discrete-time Markov chain mc on the complex plane. The plot highlights the following:
• Unit circle
• Perron-Frobenius eigenvalue at (1,0)
• Circle of second largest eigenvalue magnitude (SLEM)
• Spectral gap between the two circles, which determines the mixing time
eVals = eigplot(mc) additionally returns the eigenvalues eVals sorted by magnitude.
eigplot(ax,mc) plots on the axes specified by ax instead of the current axes (gca).
[eVals,h] = eigplot( ___ ) additionally returns the handle to the eigenvalue plot, using any of the input arguments in the previous syntaxes. Use h to modify properties of the plot after you create it.
Examples
Plot Markov Chain Eigenvalues
Create 10-state Markov chains from two random transition matrices, with one transition matrix being more sparse than the other.
rng(1); % For reproducibility
numstates = 10;
mc1 = mcmix(numstates,'Zeros',20);
mc2 = mcmix(numstates,'Zeros',80); % mc2.P is more sparse than mc1.P
Plot the eigenvalues of the transition matrices on separate complex planes.
figure;
eigplot(mc1);
figure;
eigplot(mc2);
The pink disc in each plot shows the spectral gap (the difference between the two largest eigenvalue moduli). The spectral gap determines the mixing time of the Markov chain. Large gaps indicate faster mixing, whereas thin gaps indicate slower mixing. Because the spectral gap of mc1 is thicker than the spectral gap of mc2, mc1 mixes faster than mc2.
Compute Transition Matrix Eigenvalues
Consider this theoretical, right-stochastic transition matrix of a stochastic process.
$$P = \begin{bmatrix}
0 & 0 & 1/2 & 1/4 & 1/4 & 0 & 0 \\
0 & 0 & 1/3 & 0 & 2/3 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 1/3 & 2/3 \\
0 & 0 & 0 & 0 & 0 & 1/2 & 1/2 \\
0 & 0 & 0 & 0 & 0 & 3/4 & 1/4 \\
1/2 & 1/2 & 0 & 0 & 0 & 0 & 0 \\
1/4 & 3/4 & 0 & 0 & 0 & 0 & 0
\end{bmatrix}.$$
Create the Markov chain that is characterized by the transition matrix P.
P = [ 0   0  1/2 1/4 1/4  0   0 ;
      0   0  1/3  0  2/3  0   0 ;
      0   0   0   0   0  1/3 2/3;
      0   0   0   0   0  1/2 1/2;
      0   0   0   0   0  3/4 1/4;
     1/2 1/2  0   0   0   0   0 ;
     1/4 3/4  0   0   0   0   0 ];
mc = dtmc(P);
Plot and return the eigenvalues of the transition matrix on the complex plane.
figure;
eVals = eigplot(mc)
eVals = 7×1 complex
  -0.5000 - 0.8660i
  -0.5000 + 0.8660i
   1.0000 + 0.0000i
  -0.3207 + 0.0000i
   0.1604 - 0.2777i
   0.1604 + 0.2777i
  -0.0000 + 0.0000i
Three eigenvalues have modulus one, which indicates that the period of mc is three.
Compute the mixing time of the Markov chain.
[~,tMix] = asymptotics(mc)
tMix = 0.8793
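The same quantities can be cross-checked in base MATLAB, without dtmc or asymptotics. The sketch below assumes that the SLEM is taken over the eigenvalues strictly inside the unit circle; that assumption reproduces the tMix value reported in this example. The tolerance 1e-8 and the variable names are illustrative choices.

```matlab
% Sketch: period and mixing time from the eigenvalues of P (base MATLAB only).
P = [ 0   0  1/2 1/4 1/4  0   0 ;
      0   0  1/3  0  2/3  0   0 ;
      0   0   0   0   0  1/3 2/3;
      0   0   0   0   0  1/2 1/2;
      0   0   0   0   0  3/4 1/4;
     1/2 1/2  0   0   0   0   0 ;
     1/4 3/4  0   0   0   0   0 ];

lambda = eig(P);                       % eigenvalues of the transition matrix
modulus = abs(lambda);
onCircle = abs(modulus - 1) < 1e-8;    % eigenvalues on the unit circle (tolerance is a choice)
period = nnz(onCircle);                % number of unit-modulus eigenvalues
mu = max(modulus(~onCircle));          % assumed SLEM: largest modulus inside the circle
tMix = -1/log(mu);                     % characteristic mixing time

fprintf('period = %d, SLEM = %.4f, tMix = %.4f\n',period,mu,tMix);
```

Counting unit-modulus eigenvalues recovers the period here because the chain has a single recurrent class.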
Input Arguments
mc — Discrete-time Markov chain
dtmc object
Discrete-time Markov chain with NumStates states and transition matrix P, specified as a dtmc object. P must be fully specified (no NaN entries).

ax — Axes on which to plot
Axes object
Axes on which to plot, specified as an Axes object. By default, eigplot plots to the current axes (gca).

Output Arguments
eVals — Transition matrix eigenvalues
numeric vector
Transition matrix eigenvalues sorted by magnitude, returned as a numeric vector.

h — Handles to plotted graphics objects
graphics array
Handles to plotted graphics objects, returned as a graphics array. h contains unique plot identifiers, which you can use to query or modify properties of the plot.

Note
• By the Perron-Frobenius Theorem [2], a chain with a single recurrent communicating class (a unichain) has exactly one eigenvalue equal to 1 (the Perron-Frobenius eigenvalue), and an accompanying nonnegative left eigenvector that normalizes to a unique stationary distribution. All other eigenvalues have modulus less than or equal to 1. The inequality is strict unless the recurrent class is periodic. When there is periodicity of period k, there are k eigenvalues on the unit circle at the k roots of unity.
• For an ergodic unichain, any initial distribution converges to the stationary distribution at a rate determined by the second largest eigenvalue modulus (SLEM), μ. The spectral gap, 1 – μ, provides a visual measure, with large gaps (smaller SLEM circles) producing faster convergence. Rates are exponential, with a characteristic time given by
$$t_{\mathrm{Mix}} = -\frac{1}{\log \mu}.$$
See asymptotics.
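As a rough illustration of these notes, the following base-MATLAB sketch computes the stationary distribution, SLEM, spectral gap, and characteristic time for a small ergodic chain, and then tracks how quickly an initial distribution approaches the stationary one. The 3-state transition matrix and all variable names are hypothetical; the sketch does not use dtmc, eigplot, or asymptotics.

```matlab
% Sketch: stationary distribution, SLEM, and convergence for an ergodic chain.
P = [0.5 0.3 0.2;
     0.2 0.6 0.2;
     0.1 0.3 0.6];                     % hypothetical ergodic transition matrix

[V,D] = eig(P.');                      % left eigenvectors of P = right eigenvectors of P.'
lambda = diag(D);
[~,order] = sort(abs(lambda),'descend');
lambda = lambda(order);
V = V(:,order);

v1 = real(V(:,1));                     % eigenvector of the Perron-Frobenius eigenvalue
piStat = (v1/sum(v1)).';               % normalize to a probability (row) vector
mu = abs(lambda(2));                   % second largest eigenvalue modulus (SLEM)
gap = 1 - mu;                          % spectral gap
tMix = -1/log(mu);                     % characteristic mixing time

x = [1 0 0];                           % start in state 1 with probability one
err = zeros(1,20);
for n = 1:20
    x = x*P;                           % distribution after n steps
    err(n) = norm(x - piStat,1);       % distance to the stationary distribution
end

fprintf('SLEM = %.4f, gap = %.4f, tMix = %.4f\n',mu,gap,tMix);
```

For this matrix, the eigenvalues are 1, 0.4, and 0.3, so the distance to the stationary distribution shrinks roughly by a factor of 0.4 per step.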
Version History Introduced in R2017b
References [1] Gallager, R.G. Stochastic Processes: Theory for Applications. Cambridge, UK: Cambridge University Press, 2013. [2] Horn, R., and C. R. Johnson. Matrix Analysis. Cambridge, UK: Cambridge University Press, 1985. [3] Seneta, E. Non-negative Matrices and Markov Chains. New York, NY: Springer-Verlag, 1981.
See Also Objects dtmc Functions asymptotics Topics “Markov Chain Modeling” on page 10-8 “Create and Modify Markov Chain Model Objects” on page 10-17 “Visualize Markov Chain Structure and Evolution” on page 10-27 “Determine Asymptotic Behavior of Markov Chain” on page 10-39 “Compare Markov Chain Mixing Times” on page 10-50
empiricalblm Bayesian linear regression model with samples from prior or posterior distributions
Description
The Bayesian linear regression model on page 12-528 object empiricalblm contains samples from the prior distributions of β and σ2, which MATLAB uses to characterize the prior or posterior distributions.
The data likelihood is $\prod_{t=1}^{T} \phi(y_t; x_t\beta, \sigma^2)$, where ϕ(yt;xtβ,σ2) is the Gaussian probability density evaluated at yt with mean xtβ and variance σ2. Because the forms of the prior distribution functions are unknown, the resulting posterior distributions are not analytically tractable. Hence, to estimate or simulate from posterior distributions, MATLAB implements sampling importance resampling on page 12-529.
You can create a Bayesian linear regression model with an empirical prior directly using bayeslm or empiricalblm. However, for empirical priors, estimating the posterior distribution requires that the prior closely resemble the posterior. Hence, empirical models are better suited for updating posterior distributions estimated using Monte Carlo sampling (for example, semiconjugate and custom prior models) given new data.
Creation
Either the estimate function returns an empiricalblm object or you directly create one by using empiricalblm.
• Return empiricalblm object using estimate: For semiconjugate, empirical, custom, and variable-selection prior models, estimate estimates the posterior distribution using Monte Carlo sampling. Specifically, estimate characterizes the posterior distribution by a large number of draws from that distribution. estimate stores the draws in the BetaDraws and Sigma2Draws properties of the returned Bayesian linear regression model object. Hence, when you estimate semiconjugateblm, empiricalblm, customblm, lassoblm, mixconjugateblm, and mixsemiconjugateblm model objects, estimate returns an empiricalblm object.
• Create empiricalblm object directly: If you want to update an estimated posterior distribution using new data, and you have draws from the posterior distribution of β and σ2, you can create an empirical model using empiricalblm.
Syntax
PriorMdl = empiricalblm(NumPredictors,'BetaDraws',BetaDraws,'Sigma2Draws',Sigma2Draws)
PriorMdl = empiricalblm(NumPredictors,'BetaDraws',BetaDraws,'Sigma2Draws',Sigma2Draws,Name,Value)

Description
PriorMdl = empiricalblm(NumPredictors,'BetaDraws',BetaDraws,'Sigma2Draws',Sigma2Draws) creates a Bayesian linear regression model on page 12-528 object (PriorMdl) composed of NumPredictors predictors and an intercept, and sets the NumPredictors property. The random samples from the prior distributions of β and σ2, BetaDraws and Sigma2Draws, respectively, characterize the prior distributions. PriorMdl is a template that defines the prior distributions and the dimensionality of β.
PriorMdl = empiricalblm(NumPredictors,'BetaDraws',BetaDraws,'Sigma2Draws',Sigma2Draws,Name,Value) sets properties on page 12-521 (except NumPredictors) using name-value pair arguments. Enclose each property name in quotes. For example, empiricalblm(2,'BetaDraws',BetaDraws,'Sigma2Draws',Sigma2Draws,'Intercept',false) specifies the random samples from the prior distributions of β and σ2 and specifies a regression model with 2 regression coefficients, but no intercept.
Properties
You can set writable property values when you create the model object by using name-value pair argument syntax, or after you create the model object by using dot notation. For example, to specify that there is no model intercept in PriorMdl, a Bayesian linear regression model containing three model coefficients, enter
PriorMdl.Intercept = false;

NumPredictors — Number of predictor variables
nonnegative integer
Number of predictor variables in the Bayesian multiple linear regression model, specified as a nonnegative integer.
NumPredictors must be the same as the number of columns in your predictor data, which you specify during model estimation or simulation.
When specifying NumPredictors, exclude any intercept term from the value.
After creating a model, if you change the value of NumPredictors using dot notation, then VarNames reverts to its default value.
Data Types: double

Intercept — Flag for including regression model intercept
true (default) | false
Flag for including a regression model intercept, specified as a value in this table.

Value    Description
false    Exclude an intercept from the regression model. Therefore, β is a p-dimensional vector, where p is the value of NumPredictors.
true     Include an intercept in the regression model. Therefore, β is a (p + 1)-dimensional vector. This specification causes a T-by-1 vector of ones to be prepended to the predictor data during estimation and simulation.

If you include a column of ones in the predictor data for an intercept term, then set Intercept to false.
Example: 'Intercept',false
Data Types: logical

VarNames — Predictor variable names
string vector | cell vector of character vectors
Predictor variable names for displays, specified as a string vector or cell vector of character vectors. VarNames must contain NumPredictors elements. VarNames(j) is the name of the variable in column j of the predictor data set, which you specify during estimation, simulation, or forecasting.
The default is {'Beta(1)','Beta(2)',...,'Beta(p)'}, where p is the value of NumPredictors.
Example: 'VarNames',["UnemploymentRate"; "CPI"]
Data Types: string | cell | char

BetaDraws — Random sample from prior distribution of β
numeric matrix
Random sample from the prior distribution of β, specified as a (Intercept + NumPredictors)-by-NumDraws numeric matrix. Rows correspond to regression coefficients; the first row corresponds to the intercept, and the subsequent rows correspond to columns in the predictor data. Columns correspond to successive draws from the prior distribution.
BetaDraws and Sigma2Draws must have the same number of columns.
NumDraws should be reasonably large.
Data Types: double

Sigma2Draws — Random sample from prior distribution of σ2
numeric matrix
Random sample from the prior distribution of σ2, specified as a 1-by-NumDraws numeric matrix. Columns correspond to successive draws from the prior distribution.
BetaDraws and Sigma2Draws must have the same number of columns.
NumDraws should be reasonably large.
Data Types: double
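To make the required shapes concrete, the following base-MATLAB sketch generates draws with the dimensions described for BetaDraws and Sigma2Draws. The prior choices (independent N(0, 100²) coefficients and the reciprocal of a Gamma(3,1) variate for σ2) and the variable names are illustrative assumptions; the empiricalblm call is left as a comment because it requires Econometrics Toolbox.

```matlab
% Sketch: build prior draws with the shapes that BetaDraws and Sigma2Draws require.
rng(1);                                % for reproducibility
numPredictors = 3;                     % hypothetical number of predictors
numDraws = 1e4;                        % "reasonably large" number of draws
intercept = true;                      % model includes an intercept

% (Intercept + NumPredictors)-by-NumDraws coefficient draws; row 1 is the intercept.
BetaDraws = 100*randn(intercept + numPredictors,numDraws);

% 1-by-NumDraws variance draws. A Gamma(3,1) variate is the sum of three
% unit exponentials, and its reciprocal is used here as an assumed prior.
Sigma2Draws = 1 ./ sum(-log(rand(3,numDraws)),1);

% With Econometrics Toolbox, the draws could then be passed to the constructor:
% PriorMdl = empiricalblm(numPredictors,'BetaDraws',BetaDraws,'Sigma2Draws',Sigma2Draws);

disp(size(BetaDraws));
disp(size(Sigma2Draws));
```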
Object Functions
estimate     Estimate posterior distribution of Bayesian linear regression model parameters
simulate     Simulate regression coefficients and disturbance variance of Bayesian linear regression model
forecast     Forecast responses of Bayesian linear regression model
plot         Visualize prior and posterior densities of Bayesian linear regression model parameters
summarize    Distribution summary statistics of standard Bayesian linear regression model
Examples
Create Empirical Prior Model
Consider the multiple linear regression model that predicts the US real gross national product (GNPR) using a linear combination of industrial production index (IPI), total employment (E), and real wages (WR).
$$\mathrm{GNPR}_t = \beta_0 + \beta_1 \mathrm{IPI}_t + \beta_2 \mathrm{E}_t + \beta_3 \mathrm{WR}_t + \varepsilon_t.$$
For all t time points, εt is a series of independent Gaussian disturbances with a mean of 0 and variance σ2.
Assume that the prior distributions are:
• β | σ2 ∼ N4(M, V). M is a 4-by-1 vector of means, and V is a scaled 4-by-4 positive definite covariance matrix.
• σ2 ∼ IG(A, B). A and B are the shape and scale, respectively, of an inverse gamma distribution.
These assumptions and the data likelihood imply a normal-inverse-gamma semiconjugate model. That is, the conditional posteriors are conjugate to the prior with respect to the data likelihood, but the marginal posterior is analytically intractable.
Create a normal-inverse-gamma semiconjugate prior model for the linear regression parameters. Specify the number of predictors p.
p = 3;
PriorMdl = bayeslm(p,'ModelType','semiconjugate')
PriorMdl =
  semiconjugateblm with properties:

    NumPredictors: 3
        Intercept: 1
         VarNames: {4x1 cell}
               Mu: [4x1 double]
                V: [4x4 double]
                A: 3
                B: 1

           |  Mean     Std           CI95          Positive     Distribution
-------------------------------------------------------------------------------
 Intercept |  0        100    [-195.996, 195.996]   0.500   N (0.00, 100.00^2)
 Beta(1)   |  0        100    [-195.996, 195.996]   0.500   N (0.00, 100.00^2)
 Beta(2)   |  0        100    [-195.996, 195.996]   0.500   N (0.00, 100.00^2)
 Beta(3)   |  0        100    [-195.996, 195.996]   0.500   N (0.00, 100.00^2)
 Sigma2    | 0.5000   0.5000  [ 0.138,  1.616]      1.000   IG(3.00, 1)

PriorMdl is a semiconjugateblm Bayesian linear regression model object representing the prior distribution of the regression coefficients and disturbance variance. At the command window, bayeslm displays a summary of the prior distributions.
Load the Nelson-Plosser data set. Create variables for the response and predictor series.
load Data_NelsonPlosser
VarNames = {'IPI'; 'E'; 'WR'};
X = DataTable{:,VarNames};
y = DataTable{:,'GNPR'};
Estimate the marginal posterior distributions of β and σ2.
rng(1); % For reproducibility
PosteriorMdl = estimate(PriorMdl,X,y);
Method: Gibbs sampling with 10000 draws
Number of observations: 62
Number of predictors:   4

           |   Mean      Std          CI95         Positive  Distribution
-------------------------------------------------------------------------
 Intercept | -23.9922  9.0520  [-41.734, -6.198]    0.005     Empirical
 Beta(1)   |   4.3929  0.1458   [ 4.101,  4.678]    1.000     Empirical
 Beta(2)   |   0.0011  0.0003   [ 0.000,  0.002]    0.999     Empirical
 Beta(3)   |   2.4711  0.3576   [ 1.762,  3.178]    1.000     Empirical
 Sigma2    |  46.7474  8.4550   [33.099, 66.126]    1.000     Empirical
PosteriorMdl is an empiricalblm model object storing draws from the posterior distributions of β and σ2 given the data. estimate displays a summary of the marginal posterior distributions to the command window. Rows of the summary correspond to regression coefficients and the disturbance variance, and columns to characteristics of the posterior distribution. The characteristics include:
• CI95, which contains the 95% Bayesian equitailed credible intervals for the parameters. For example, the posterior probability that the regression coefficient of WR is in [1.762, 3.178] is 0.95.
• Positive, which contains the posterior probability that the parameter is greater than 0. For example, the probability that the intercept is greater than 0 is 0.005.
In this case, the marginal posterior is analytically intractable. Therefore, estimate uses Gibbs sampling to draw from the posterior and estimate the posterior characteristics.
Update Marginal Posterior Distribution for New Data
Consider the linear regression model in “Create Empirical Prior Model” on page 12-523.
Create a normal-inverse-gamma semiconjugate prior model for the linear regression parameters. Specify the number of predictors p and the names of the regression coefficients.
p = 3;
PriorMdl = bayeslm(p,'ModelType','semiconjugate','VarNames',["IPI" "E" "WR"]);
Load the Nelson-Plosser data set. Partition the data by reserving the last five periods in the series.
load Data_NelsonPlosser
X0 = DataTable{1:(end - 5),PriorMdl.VarNames(2:end)};
y0 = DataTable{1:(end - 5),'GNPR'};
X1 = DataTable{(end - 4):end,PriorMdl.VarNames(2:end)};
y1 = DataTable{(end - 4):end,'GNPR'};
Estimate the marginal posterior distributions of β and σ2.
rng(1); % For reproducibility
PosteriorMdl0 = estimate(PriorMdl,X0,y0);
Method: Gibbs sampling with 10000 draws
Number of observations: 57
Number of predictors:   4

           |   Mean      Std           CI95          Positive  Distribution
---------------------------------------------------------------------------
 Intercept | -34.3887  10.5218  [-55.350, -13.615]    0.001     Empirical
 IPI       |   3.9076   0.2786   [ 3.356,  4.459]     1.000     Empirical
 E         |   0.0011   0.0003   [ 0.000,  0.002]     0.999     Empirical
 WR        |   3.2146   0.4967   [ 2.228,  4.196]     1.000     Empirical
 Sigma2    |  45.3098   8.5597   [31.620, 64.972]     1.000     Empirical

PosteriorMdl0 is an empiricalblm model object storing the Gibbs-sampling draws from the posterior distribution.
Update the posterior distribution based on the last 5 periods of data by passing those observations and the posterior distribution to estimate.
PosteriorMdl1 = estimate(PosteriorMdl0,X1,y1);
Method: Importance sampling/resampling with 10000 draws
Number of observations: 5
Number of predictors:   4

           |   Mean      Std          CI95         Positive  Distribution
-------------------------------------------------------------------------
 Intercept | -24.3152  9.3408  [-41.163, -5.301]    0.008     Empirical
 IPI       |   4.3893  0.1440   [ 4.107,  4.658]    1.000     Empirical
 E         |   0.0011  0.0004   [ 0.000,  0.002]    0.998     Empirical
 WR        |   2.4763  0.3694   [ 1.630,  3.170]    1.000     Empirical
 Sigma2    |  46.5211  8.2913   [33.646, 65.402]    1.000     Empirical
To update the posterior distributions based on draws, estimate uses sampling importance resampling.
Estimate Posterior Probability Using Monte Carlo Simulation
Consider the linear regression model in “Estimate Marginal Posterior Distribution” on page 12-1861.
Create a prior model for the regression coefficients and disturbance variance, then estimate the marginal posterior distributions.
p = 3;
PriorMdl = bayeslm(p,'ModelType','semiconjugate','VarNames',["IPI" "E" "WR"]);
load Data_NelsonPlosser
X = DataTable{:,PriorMdl.VarNames(2:end)};
y = DataTable{:,'GNPR'};
rng(1); % For reproducibility
PosteriorMdl = estimate(PriorMdl,X,y);
Method: Gibbs sampling with 10000 draws
Number of observations: 62
Number of predictors:   4

           |   Mean      Std          CI95         Positive  Distribution
-------------------------------------------------------------------------
 Intercept | -23.9922  9.0520  [-41.734, -6.198]    0.005     Empirical
 IPI       |   4.3929  0.1458   [ 4.101,  4.678]    1.000     Empirical
 E         |   0.0011  0.0003   [ 0.000,  0.002]    0.999     Empirical
 WR        |   2.4711  0.3576   [ 1.762,  3.178]    1.000     Empirical
 Sigma2    |  46.7474  8.4550   [33.099, 66.126]    1.000     Empirical

Estimate posterior distribution summary statistics for β by using the draws from the posterior distribution stored in the posterior model.
estBeta = mean(PosteriorMdl.BetaDraws,2);
EstBetaCov = cov(PosteriorMdl.BetaDraws');
Suppose that if the coefficient of real wages is below 2.5, then a policy is enacted. Although the posterior distribution of WR is known, and so you can calculate probabilities directly, you can estimate the probability using Monte Carlo simulation instead.
Draw 1e6 samples from the marginal posterior distribution of β.
NumDraws = 1e6;
rng(1);
BetaSim = simulate(PosteriorMdl,'NumDraws',NumDraws);
BetaSim is a 4-by-1e6 matrix containing the draws. Rows correspond to the regression coefficients and columns to successive draws.
Isolate the draws corresponding to the coefficient of real wages, and then identify which draws are less than 2.5.
isWR = PosteriorMdl.VarNames == "WR";
wrSim = BetaSim(isWR,:);
isWRLT2p5 = wrSim < 2.5;
Find the marginal posterior probability that the regression coefficient of WR is below 2.5 by computing the proportion of draws that are less than 2.5.
probWRLT2p5 = mean(isWRLT2p5)
probWRLT2p5 = 0.5283
The posterior probability that the coefficient of real wages is less than 2.5 is about 0.53.
Forecast Responses Using Posterior Predictive Distribution
Consider the linear regression model in “Estimate Marginal Posterior Distribution” on page 12-1861.
Create a prior model for the regression coefficients and disturbance variance, then estimate the marginal posterior distributions. Hold out the last 10 periods of data from estimation so you can use them to forecast real GNP. Turn the estimation display off.
p = 3;
PriorMdl = bayeslm(p,'ModelType','semiconjugate','VarNames',["IPI" "E" "WR"]);
load Data_NelsonPlosser
fhs = 10; % Forecast horizon size
X = DataTable{1:(end - fhs),PriorMdl.VarNames(2:end)};
y = DataTable{1:(end - fhs),'GNPR'};
XF = DataTable{(end - fhs + 1):end,PriorMdl.VarNames(2:end)}; % Future predictor data
yFT = DataTable{(end - fhs + 1):end,'GNPR'}; % True future responses
rng(1); % For reproducibility
PosteriorMdl = estimate(PriorMdl,X,y,'Display',false);
Forecast responses using the posterior predictive distribution and using the future predictor data XF. Plot the true values of the response and the forecasted values.
yF = forecast(PosteriorMdl,XF);
figure;
plot(dates,DataTable.GNPR);
hold on
plot(dates((end - fhs + 1):end),yF)
h = gca;
hp = patch([dates(end - fhs + 1) dates(end) dates(end) dates(end - fhs + 1)],...
    h.YLim([1,1,2,2]),[0.8 0.8 0.8]);
uistack(hp,'bottom');
legend('Forecast Horizon','True GNPR','Forecasted GNPR','Location','NW')
title('Real Gross National Product: 1909 - 1970');
ylabel('rGNP');
xlabel('Year');
hold off
yF is a 10-by-1 vector of future values of real GNP corresponding to the future predictor data.
Estimate the forecast root mean squared error (RMSE).
frmse = sqrt(mean((yF - yFT).^2))
frmse = 25.1938
The forecast RMSE is a relative measure of forecast accuracy. Specifically, you estimate several models using different assumptions. The model with the lowest forecast RMSE is the best-performing model of the ones being compared.
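As a minimal illustration of this comparison, the following base-MATLAB sketch computes the forecast RMSE for two sets of forecasts of the same held-out responses and selects the smaller one. All numbers are hypothetical and are not derived from the Nelson-Plosser data.

```matlab
% Sketch: compare two candidate models by forecast RMSE (hypothetical numbers).
yFT = [10; 12; 15; 11];                  % true held-out responses
yF1 = [ 9; 13; 14; 12];                  % forecasts from a first candidate model
yF2 = [ 7; 15; 18;  8];                  % forecasts from a second candidate model

frmse1 = sqrt(mean((yF1 - yFT).^2));     % every error is +/-1, so the RMSE is 1
frmse2 = sqrt(mean((yF2 - yFT).^2));     % every error is +/-3, so the RMSE is 3

[~,best] = min([frmse1 frmse2]);         % index of the better-performing model
fprintf('RMSE 1 = %.4f, RMSE 2 = %.4f, best model = %d\n',frmse1,frmse2,best);
```

The RMSE values are meaningful only relative to each other, because both are computed on the same held-out responses.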
More About
Bayesian Linear Regression Model
A Bayesian linear regression model treats the parameters β and σ² in the multiple linear regression (MLR) model yt = xtβ + εt as random variables. For times t = 1,...,T:
• yt is the observed response.
• xt is a 1-by-(p + 1) row vector of observed values of p predictors. To accommodate a model intercept, x1t = 1 for all t.
• β is a (p + 1)-by-1 column vector of regression coefficients corresponding to the variables that compose the columns of xt.
• εt is the random disturbance with a mean of zero and Cov(ε) = σ²I(T×T), while ε is a T-by-1 vector containing all disturbances.
These assumptions imply that the data likelihood is
$$\ell(\beta,\sigma^2 \mid y,x) = \prod_{t=1}^{T} \phi(y_t;\, x_t\beta,\, \sigma^2).$$
ϕ(yt;xtβ,σ²) is the Gaussian probability density with mean xtβ and variance σ² evaluated at yt. Before considering the data, you impose a joint prior distribution assumption on (β,σ²). In a Bayesian analysis, you update the distribution of the parameters by using information about the parameters obtained from the likelihood of the data. The result is the joint posterior distribution of (β,σ²) or the conditional posterior distributions of the parameters.
Sampling Importance Resampling
In the Bayesian statistics context, sampling importance resampling is a Monte Carlo algorithm for drawing samples from the posterior distribution. In general, the algorithm draws a large sample of parameter values from their prior distributions, weights each draw by its likelihood, and then resamples the draws with replacement according to those weights. This is the algorithm in the context of Bayesian linear regression.
1 Randomly draw a large number K of samples from the joint prior distribution π(β,σ²). B is the (p + 1)-by-K matrix containing the values of β, and Σ is the 1-by-K vector containing the values of σ².
2 For each draw k = 1,...,K:
a Estimate the residuals rkj = yj − Xjβk, where yj is response j, Xj is the 1-by-(p + 1) vector of predictor observations (including the 1 for the intercept), and βk is draw k from B.
b Evaluate the loglikelihood $\ell_k = \sum_j \log \phi(r_{kj};\, 0,\, \sigma_k^2)$, where ϕ(rkj;0,σk²) is the pdf of the normal distribution with a mean of 0 and variance of σk² (draw k from Σ).
c Estimate the normalized importance weight
$$w_k = \frac{\exp(\ell_k)}{\sum_k \exp(\ell_k)}.$$
3 Randomly draw, with replacement, K parameters from B and Σ with respect to the normalized importance weights.
The resulting sample is approximately from the joint posterior distribution π(β,σ²|y,X). The prior draws that the algorithm is likely to choose for the posterior sample are those that yield higher data likelihood values. If the prior draws yield poor likelihood values, then the chosen posterior sample poorly approximates the actual posterior distribution. For details on diagnostics, see “Algorithms” on page 12-529.
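The following base-MATLAB sketch walks through the three steps on simulated data. It is an illustration only, not the toolbox implementation: the data, the normal prior on β, the lognormal prior on σ², and all variable names are assumptions chosen for the example.

```matlab
rng(1);                                   % for reproducibility
T = 50;  p = 2;  K = 2000;                % observations, predictors, prior draws

% Simulated data (hypothetical); the first column of X is the intercept
X = [ones(T,1) randn(T,p)];
y = X*[1; 0.5; -0.3] + 0.5*randn(T,1);

% Step 1: draw from an assumed prior (normal for beta, lognormal for sigma2)
B = [0.8; 0.4; -0.2] + 0.3*randn(p + 1,K);   % (p + 1)-by-K draws of beta
Sigma = exp(-1.5 + 0.5*randn(1,K));          % 1-by-K draws of sigma2

% Step 2: residuals, loglikelihoods, and normalized importance weights
R = y - X*B;                                               % T-by-K residuals
logL = -0.5*T*log(2*pi*Sigma) - 0.5*sum(R.^2,1)./Sigma;    % 1-by-K
w = exp(logL - max(logL));     % subtracting max(logL) avoids underflow
w = w/sum(w);                  % same weights as exp(logL)/sum(exp(logL))

% Step 3: resample K draws with replacement by inverting the weight CDF
cdf = cumsum(w);
u = rand(K,1);
idx = min(sum(u > cdf,2) + 1,K);          % K-by-1 resampled indices
BPost = B(:,idx);
SigmaPost = Sigma(idx);
```

Subtracting the maximum loglikelihood before exponentiating leaves the normalized weights unchanged and is a common numerical safeguard. The columns of BPost and the elements of SigmaPost then act as approximate posterior draws.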
Algorithms
• After implementing sampling importance resampling on page 12-529 to sample from the posterior distribution, estimate, simulate, and forecast compute the effective sample size (ESS), which is the number of samples required to yield reasonable posterior statistics and inferences. Its formula is
$$ESS = \frac{1}{\sum_j w_j^2}.$$
If ESS < 0.01*NumDraws, then MATLAB throws a warning. The warning implies that, given the sample from the prior distribution, the sample from the proposal distribution is too small to yield good quality posterior statistics and inferences.
• If the effective sample size is too small, then:
  • Increase the sample size of the draws from the prior distributions.
  • Adjust the prior distribution hyperparameters, and then resample from them.
  • Specify BetaDraws and Sigma2Draws as samples from informative prior distributions. That is, if the proposal draws come from nearly flat distributions, then the algorithm can be inefficient.
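To get a feel for the ESS formula, this sketch evaluates it at the two extremes of the normalized weights. The variable names and the value of K are illustrative assumptions; the threshold mirrors the 0.01*NumDraws rule stated above.

```matlab
K = 1000;                               % number of draws (NumDraws)

wUniform = ones(1,K)/K;                 % every draw equally weighted
wDegenerate = [1 zeros(1,K - 1)];       % a single draw carries all the weight

essUniform = 1/sum(wUniform.^2);        % approximately K
essDegenerate = 1/sum(wDegenerate.^2);  % exactly 1

% Check corresponding to the documented warning threshold
tooSmall = [essUniform essDegenerate] < 0.01*K;
```

Equal weights give an ESS near K, while weights concentrated on one draw give an ESS of 1, which falls below the threshold.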
Alternatives The bayeslm function can create any supported prior model object for Bayesian linear regression.
Version History Introduced in R2017a
See Also
Objects
customblm | semiconjugateblm | lassoblm | mixconjugateblm | mixsemiconjugateblm
Functions
bayeslm
Topics
“Bayesian Linear Regression” on page 6-2
“Implement Bayesian Linear Regression” on page 6-10
empiricalbvarm Bayesian vector autoregression (VAR) model with samples from prior or posterior distribution
Description The Bayesian VAR model on page 12-540 object empiricalbvarm contains samples from the distributions of the coefficients Λ and innovations covariance matrix Σ of a VAR(p) model, which MATLAB uses to characterize the corresponding prior or posterior distributions. For Bayesian VAR model objects that have an intractable posterior, the estimate function returns an empiricalbvarm object representing the empirical posterior distribution. However, if you have random draws from the prior or posterior distributions of the coefficients and innovations covariance matrix, you can create a Bayesian VAR model with an empirical prior directly by using empiricalbvarm.
Creation
Syntax
Mdl = empiricalbvarm(numseries,numlags,'CoeffDraws',CoeffDraws,'SigmaDraws',SigmaDraws)
Mdl = empiricalbvarm(numseries,numlags,'CoeffDraws',CoeffDraws,'SigmaDraws',SigmaDraws,Name,Value)
Description
Mdl = empiricalbvarm(numseries,numlags,'CoeffDraws',CoeffDraws,'SigmaDraws',SigmaDraws) creates a numseries-D Bayesian VAR(numlags) model object Mdl characterized by the random samples CoeffDraws and SigmaDraws from the prior or posterior distributions of λ = vec(Λ) = vec([Φ1 Φ2 ⋯ Φp c δ Β]′) and Σ, respectively.
• numseries = m, a positive integer specifying the number of response time series variables.
• numlags = p, a nonnegative integer specifying the AR polynomial order (that is, number of numseries-by-numseries AR coefficient matrices in the VAR model).
Mdl = empiricalbvarm(numseries,numlags,'CoeffDraws',CoeffDraws,'SigmaDraws',SigmaDraws,Name,Value) sets writable properties on page 12-532 (except NumSeries and P) using name-value pair arguments. Enclose each property name in quotes. For example, empiricalbvarm(3,2,'CoeffDraws',CoeffDraws,'SigmaDraws',SigmaDraws,'SeriesNames',["UnemploymentRate" "CPI" "FEDFUNDS"]) specifies the random samples from the distributions of λ and Σ and the names of the three response variables.
Because the posterior distributions of a semiconjugate prior model (semiconjugatebvarm) are analytically intractable, estimate returns an empiricalbvarm object that characterizes the posteriors and contains the Gibbs sampler draws from the full conditionals.
Input Arguments numseries — Number of time series m 1 (default) | positive integer Number of time series m, specified as a positive integer. numseries specifies the dimensionality of the multivariate response variable yt and innovation εt. numseries sets the NumSeries property. Data Types: double numlags — Number of lagged responses nonnegative integer Number of lagged responses in each equation of yt, specified as a nonnegative integer. The resulting model is a VAR(numlags) model; each lag has a numseries-by-numseries coefficient matrix. numlags sets the P property. Data Types: double
Properties You can set writable property values when you create the model object by using name-value pair argument syntax, or after you create the model object by using dot notation. For example, to create a 3-D Bayesian VAR(1) model from the coefficient and innovations covariance arrays of draws CoeffDraws and SigmaDraws, respectively, and then label the response variables, enter: Mdl = empiricalbvarm(3,1,'CoeffDraws',CoeffDraws,'SigmaDraws',SigmaDraws); Mdl.SeriesNames = ["UnemploymentRate" "CPI" "FEDFUNDS"]; Required Draws from Distribution
CoeffDraws — Random sample from prior or posterior distribution of λ
numeric matrix
Random sample from the prior or posterior distribution of λ, specified as a (NumSeries*k)-by-numdraws numeric matrix, where k = NumSeries*P + IncludeConstant + IncludeTrend + NumPredictors (the number of coefficients in a response equation). CoeffDraws represents the empirical distribution of λ based on a size numdraws sample.
Columns correspond to successive draws from the distribution. CoeffDraws(1:k,:) corresponds to all coefficients in the equation of response variable SeriesNames(1), CoeffDraws((k + 1):(2*k),:) corresponds to all coefficients in the equation of response variable SeriesNames(2), and so on. For a set of row indices corresponding to an equation:
• Elements 1 through NumSeries correspond to the lag 1 AR coefficients of the response variables ordered by SeriesNames.
• Elements NumSeries + 1 through 2*NumSeries correspond to the lag 2 AR coefficients of the response variables ordered by SeriesNames.
• In general, elements (q – 1)*NumSeries + 1 through q*NumSeries correspond to the lag q AR coefficients of the response variables ordered by SeriesNames.
• If IncludeConstant is true, element NumSeries*P + 1 is the model constant.
• If IncludeTrend is true, element NumSeries*P + 2 is the linear time trend coefficient.
• If NumPredictors > 0, elements NumSeries*P + 3 through k constitute the vector of regression coefficients of the exogenous variables.
This figure shows the row structure of CoeffDraws for a 2-D VAR(3) model that contains a constant vector and four exogenous predictors (displayed transposed, with each group labeled by its response equation):
$$\Big[\underbrace{\phi_{1,11}\ \phi_{1,12}\ \phi_{2,11}\ \phi_{2,12}\ \phi_{3,11}\ \phi_{3,12}\ c_1\ \beta_{11}\ \beta_{12}\ \beta_{13}\ \beta_{14}}_{y_{1,t}}\ \ \underbrace{\phi_{1,21}\ \phi_{1,22}\ \phi_{2,21}\ \phi_{2,22}\ \phi_{3,21}\ \phi_{3,22}\ c_2\ \beta_{21}\ \beta_{22}\ \beta_{23}\ \beta_{24}}_{y_{2,t}}\Big],$$
where
• ϕq,jk is element (j,k) of the lag q AR coefficient matrix.
• cj is the model constant in the equation of response variable j.
• βju is the regression coefficient of the exogenous variable u in the equation of response variable j.
CoeffDraws and SigmaDraws must be based on the same number of draws, and both must represent draws from either the prior or posterior distribution. numdraws should be reasonably large, for example, 1e6.
Data Types: double
SigmaDraws — Random sample from prior or posterior distribution of Σ
array of positive definite numeric matrices
Random sample from the prior or posterior distribution of Σ, specified as a NumSeries-by-NumSeries-by-numdraws array of positive definite numeric matrices. SigmaDraws represents the empirical distribution of Σ based on a size numdraws sample.
Rows and columns correspond to innovations in the equations of the response variables ordered by SeriesNames. Pages correspond to successive draws from the distribution.
CoeffDraws and SigmaDraws must be based on the same number of draws, and both must represent draws from either the prior or posterior distribution. numdraws should be reasonably large, for example, 1e6.
Data Types: double
Model Characteristics and Dimensionality
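The row layout of CoeffDraws can be unpacked with reshape, as in this sketch for the 2-D VAR(3) case with a constant, no trend, and four predictors. The draws here are random placeholders rather than output of a fitted model, and the variable names are illustrative.

```matlab
m = 2;  P = 3;  r = 4;                  % NumSeries, lags, NumPredictors
k = m*P + 1 + 0 + r;                    % constant included, no trend: k = 11
numdraws = 500;

rng(1);
CoeffDraws = randn(m*k,numdraws);       % placeholder draws (not from a real model)

lambdaBar = mean(CoeffDraws,2);         % (m*k)-by-1 mean of lambda
LambdaBar = reshape(lambdaBar,k,m);     % k-by-m; column j holds equation j

AR = cell(1,P);
for q = 1:P
    rows = (q - 1)*m + (1:m);           % lag q AR rows within each equation
    AR{q} = LambdaBar(rows,:)';         % m-by-m; row j is equation j
end
Constant = LambdaBar(m*P + 1,:)';       % m-by-1 model constants
Beta = LambdaBar(m*P + 2:k,:)';         % m-by-r regression coefficients
```

Because this example omits the trend, the regression coefficients start at element NumSeries*P + 2 of each equation rather than NumSeries*P + 3.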
Description — Model description
string scalar | character vector
Model description, specified as a string scalar or character vector. The default value describes the model dimensionality, for example '2-Dimensional VAR(3) Model'.
Example: "Model 1"
Data Types: string | char
NumSeries — Number of time series m
positive integer
This property is read-only.
Number of time series m, specified as a positive integer. NumSeries specifies the dimensionality of the multivariate response variable yt and innovation εt.
Data Types: double
P — Multivariate autoregressive polynomial order
nonnegative integer
This property is read-only.
Multivariate autoregressive polynomial order, specified as a nonnegative integer. P is the maximum lag that has a nonzero coefficient matrix. P specifies the number of presample observations required to initialize the model.
Data Types: double
SeriesNames — Response series names
string vector | cell array of character vectors
Response series names, specified as a NumSeries length string vector. The default is ['Y1' 'Y2' ... 'YNumSeries']. empiricalbvarm stores SeriesNames as a string vector.
Example: ["UnemploymentRate" "CPI" "FEDFUNDS"]
Data Types: string
IncludeConstant — Flag for including model constant c
true (default) | false
Flag for including a model constant c, specified as a value in this table.
Value    Description
false    Response equations do not include a model constant.
true     All response equations contain a model constant.
Data Types: logical
IncludeTrend — Flag for including linear time trend term δt
false (default) | true
Flag for including a linear time trend term δt, specified as a value in this table.
Value    Description
false    Response equations do not include a linear time trend term.
true     All response equations contain a linear time trend term.
Data Types: logical
NumPredictors — Number of exogenous predictor variables in model regression component 0 (default) | nonnegative integer Number of exogenous predictor variables in the model regression component, specified as a nonnegative integer. empiricalbvarm includes all predictor variables symmetrically in each response equation. VAR Model Parameters Derived from Distribution Draws
AR — Distribution mean of autoregressive coefficient matrices Φ1,…,Φp
cell vector of numeric matrices
This property is read-only.
Distribution mean of the autoregressive coefficient matrices Φ1,…,Φp associated with the lagged responses, specified as a P-D cell vector of NumSeries-by-NumSeries numeric matrices.
AR{j} is Φj, the coefficient matrix of lag j. Rows correspond to equations and columns correspond to lagged response variables; SeriesNames determines the order of response variables and equations. Coefficient signs are those of the VAR model expressed in difference-equation notation.
If P = 0, AR is an empty cell. Otherwise, AR is the collection of AR coefficient means extracted from Mu.
Data Types: cell
Constant — Distribution mean of model constant c
numeric vector
This property is read-only.
Distribution mean of the model constant c (or intercept), specified as a NumSeries-by-1 numeric vector. Constant(j) is the constant in equation j; SeriesNames determines the order of equations.
If IncludeConstant = false, Constant is an empty array. Otherwise, Constant is the model constant vector mean extracted from Mu.
Data Types: double
Trend — Distribution mean of linear time trend δ
numeric vector
This property is read-only.
Distribution mean of the linear time trend δ, specified as a NumSeries-by-1 numeric vector. Trend(j) is the linear time trend in equation j; SeriesNames determines the order of equations.
If IncludeTrend = false (the default), Trend is an empty array. Otherwise, Trend is the linear time trend coefficient mean extracted from Mu.
Data Types: double
Beta — Distribution mean of regression coefficient matrix Β
numeric matrix
This property is read-only.
Distribution mean of the regression coefficient matrix B associated with the exogenous predictor variables, specified as a NumSeries-by-NumPredictors numeric matrix. Beta(j,:) contains the regression coefficients of each predictor in the equation of response variable j yj,t. Beta(:,k) contains the regression coefficient in each equation of predictor xk. By default, all predictor variables are in the regression component of all response equations. You can down-weight a predictor from an equation by specifying, for the corresponding coefficient, a prior mean of 0 in Mu and a small variance in V. When you create a model, the predictor variables are hypothetical. You specify predictor data when you operate on the model (for example, when you estimate the posterior by using estimate). Columns of the predictor data determine the order of the columns of Beta. Data Types: double Covariance — Distribution mean of innovations covariance matrix Σ positive definite numeric matrix This property is read-only. Distribution mean of the innovations covariance matrix Σ of the NumSeries innovations at each time t = 1,...,T, specified as a NumSeries-by-NumSeries positive definite numeric matrix. Rows and columns correspond to innovations in the equations of the response variables ordered by SeriesNames. Data Types: double
Object Functions
summarize    Distribution summary statistics of Bayesian vector autoregression (VAR) model
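Examples
Create Empirical Model
Consider the 3-D VAR(4) model for the US inflation (INFL), unemployment (UNRATE), and federal funds (FEDFUNDS) rates.
$$\begin{bmatrix} \mathrm{INFL}_t \\ \mathrm{UNRATE}_t \\ \mathrm{FEDFUNDS}_t \end{bmatrix} = c + \sum_{j=1}^{4} \Phi_j \begin{bmatrix} \mathrm{INFL}_{t-j} \\ \mathrm{UNRATE}_{t-j} \\ \mathrm{FEDFUNDS}_{t-j} \end{bmatrix} + \begin{bmatrix} \varepsilon_{1,t} \\ \varepsilon_{2,t} \\ \varepsilon_{3,t} \end{bmatrix}.$$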
For all t, εt is a series of independent 3-D normal innovations with a mean of 0 and covariance Σ. You can create an empirical Bayesian VAR model for the coefficients [Φ1,…,Φ4,c]′ and innovations covariance matrix Σ in two ways:
1 Indirectly create an empiricalbvarm model by estimating the posterior distribution of a semiconjugate prior model.
2 Directly create an empiricalbvarm model by supplying draws from the prior or posterior distribution of the parameters.
Indirect Creation
Assume the following prior distributions:
• vec([Φ1,…,Φ4,c]′)|Σ ∼ Ν39(μ,V), where μ is a 39-by-1 vector of means and V is the 39-by-39 covariance matrix.
• Σ ∼ Inverse Wishart(Ω,ν), where Ω is the 3-by-3 scale matrix and ν is the degrees of freedom.
Create a semiconjugate prior model for the 3-D VAR(4) model parameters.
numseries = 3;
numlags = 4;
PriorMdl = semiconjugatebvarm(numseries,numlags)
PriorMdl =
  semiconjugatebvarm with properties:
        Description: "3-Dimensional VAR(4) Model"
          NumSeries: 3
                  P: 4
        SeriesNames: ["Y1" "Y2" "Y3"]
    IncludeConstant: 1
       IncludeTrend: 0
      NumPredictors: 0
                 Mu: [39x1 double]
                  V: [39x39 double]
              Omega: [3x3 double]
                DoF: 13
                 AR: {[3x3 double] [3x3 double] [3x3 double] [3x3 double]}
           Constant: [3x1 double]
              Trend: [3x0 double]
               Beta: [3x0 double]
         Covariance: [3x3 double]
PriorMdl is a semiconjugatebvarm Bayesian VAR model object representing the prior distribution of the coefficients and innovations covariance of the 3-D VAR(4) model.
Load the US macroeconomic data set. Compute the inflation rate. Plot all response series.
load Data_USEconModel
seriesnames = ["INFL" "UNRATE" "FEDFUNDS"];
DataTimeTable.INFL = 100*[NaN; price2ret(DataTimeTable.CPIAUCSL)];
figure
plot(DataTimeTable.Time,DataTimeTable{:,seriesnames})
legend(seriesnames)
Stabilize the unemployment and federal funds rates by applying the first difference to each series. DataTimeTable.DUNRATE = [NaN; diff(DataTimeTable.UNRATE)]; DataTimeTable.DFEDFUNDS = [NaN; diff(DataTimeTable.FEDFUNDS)]; seriesnames(2:3) = "D" + seriesnames(2:3);
Remove all missing values from the data. rmDataTimeTable = rmmissing(DataTimeTable);
Estimate the posterior distribution by passing the prior model and entire data series to estimate.
rng(1); % For reproducibility
PosteriorMdl = estimate(PriorMdl,rmDataTimeTable{:,seriesnames},'Display','off')
PosteriorMdl =
  empiricalbvarm with properties:
        Description: "3-Dimensional VAR(4) Model"
          NumSeries: 3
                  P: 4
        SeriesNames: ["Y1" "Y2" "Y3"]
    IncludeConstant: 1
       IncludeTrend: 0
      NumPredictors: 0
         CoeffDraws: [39x10000 double]
         SigmaDraws: [3x3x10000 double]
                 AR: {[3x3 double] [3x3 double] [3x3 double] [3x3 double]}
           Constant: [3x1 double]
              Trend: [3x0 double]
               Beta: [3x0 double]
         Covariance: [3x3 double]
PosteriorMdl is an empiricalbvarm model representing the empirical posterior distribution of the coefficients and innovations covariance matrix. empiricalbvarm stores the draws from the posteriors of λ and Σ in the CoeffDraws and SigmaDraws properties, respectively.
Direct Creation
Draw a random sample of size 1000 from the prior distribution PriorMdl.
numdraws = 1000;
[CoeffDraws,SigmaDraws] = simulate(PriorMdl,'NumDraws',numdraws);
size(CoeffDraws)
ans = 1×2
    39    1000
size(SigmaDraws)
ans = 1×3
     3     3    1000
Create a Bayesian VAR model characterizing the empirical prior distributions of the parameters.
PriorMdlEmp = empiricalbvarm(numseries,numlags,'CoeffDraws',CoeffDraws,...
    'SigmaDraws',SigmaDraws)
PriorMdlEmp =
  empiricalbvarm with properties:
        Description: "3-Dimensional VAR(4) Model"
          NumSeries: 3
                  P: 4
        SeriesNames: ["Y1" "Y2" "Y3"]
    IncludeConstant: 1
       IncludeTrend: 0
      NumPredictors: 0
         CoeffDraws: [39x1000 double]
         SigmaDraws: [3x3x1000 double]
                 AR: {[3x3 double] [3x3 double] [3x3 double] [3x3 double]}
           Constant: [3x1 double]
              Trend: [3x0 double]
               Beta: [3x0 double]
         Covariance: [3x3 double]
Display the prior means of the four AR coefficient matrices by assigning each matrix in the cell vector to a variable.
AR1 = PriorMdlEmp.AR{1}
AR1 = 3×3
   -0.0198    0.0181   -0.0273
   -0.0207   -0.0301   -0.0070
   -0.0009    0.0638    0.0113
AR2 = PriorMdlEmp.AR{2}
AR2 = 3×3
   -0.0453    0.0371    0.0110
   -0.0103   -0.0304   -0.0011
    0.0277   -0.0253    0.0061
AR3 = PriorMdlEmp.AR{3}
AR3 = 3×3
    0.0368   -0.0059    0.0018
   -0.0306   -0.0106    0.0179
   -0.0314   -0.0276    0.0116
AR4 = PriorMdlEmp.AR{4}
AR4 = 3×3
    0.0159    0.0406   -0.0315
   -0.0178    0.0415   -0.0024
    0.0476   -0.0128   -0.0165
More About
Bayesian Vector Autoregression (VAR) Model
A Bayesian VAR model treats all coefficients and the innovations covariance matrix as random variables in the m-dimensional, stationary VARX(p) model. The model has one of the three forms described in this table.
Model                                                    Equation
Reduced-form VAR(p) in difference-equation notation      $y_t = \Phi_1 y_{t-1} + \dots + \Phi_p y_{t-p} + c + \delta t + \mathrm{B}x_t + \varepsilon_t$
Multivariate regression                                  $y_t = Z_t\lambda + \varepsilon_t$
Matrix regression                                        $y_t = \Lambda' z_t' + \varepsilon_t$
For each time t = 1,...,T:
• yt is the m-dimensional observed response vector, where m = numseries.
• Φ1,…,Φp are the m-by-m AR coefficient matrices of lags 1 through p, where p = numlags.
• c is the m-by-1 vector of model constants if IncludeConstant is true.
• δ is the m-by-1 vector of linear time trend coefficients if IncludeTrend is true.
• Β is the m-by-r matrix of regression coefficients of the r-by-1 vector of observed exogenous predictors xt, where r = NumPredictors. All predictor variables appear in each equation.
• $z_t = [\,y_{t-1}'\ \ y_{t-2}'\ \cdots\ y_{t-p}'\ \ 1\ \ t\ \ x_t'\,]$, which is a 1-by-(mp + r + 2) vector, and Zt is the m-by-m(mp + r + 2) block diagonal matrix
$$Z_t = \begin{bmatrix} z_t & 0_z & \cdots & 0_z \\ 0_z & z_t & \cdots & 0_z \\ \vdots & \vdots & \ddots & \vdots \\ 0_z & 0_z & \cdots & z_t \end{bmatrix},$$
where 0z is a 1-by-(mp + r + 2) vector of zeros.
• Λ = [Φ1 Φ2 ⋯ Φp c δ Β]′, which is an (mp + r + 2)-by-m random matrix of the coefficients, and the m(mp + r + 2)-by-1 vector λ = vec(Λ).
• εt is an m-by-1 vector of random, serially uncorrelated, multivariate normal innovations with the zero vector for the mean and the m-by-m matrix Σ for the covariance. This assumption implies that the data likelihood is
$$\ell(\Lambda,\Sigma \mid y,x) = \prod_{t=1}^{T} f(y_t;\, \Lambda,\, \Sigma,\, z_t),$$
where f is the m-dimensional multivariate normal density with mean ztΛ and covariance Σ, evaluated at yt.
Before considering the data, you impose a joint prior distribution assumption on (Λ,Σ), which is governed by the distribution π(Λ,Σ). In a Bayesian analysis, the distribution of the parameters is updated with information about the parameters obtained from the data likelihood. The result is the joint posterior distribution π(Λ,Σ|Y,X,Y0), where:
• Y is a T-by-m matrix containing the entire response series {yt}, t = 1,…,T.
• X is a T-by-r matrix containing the entire exogenous series {xt}, t = 1,…,T.
• Y0 is a p-by-m matrix of presample data used to initialize the VAR model for estimation.
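As a numerical check that the three model forms describe the same conditional mean, this sketch builds zt, Zt, Λ, and λ for a small case and compares the results. All values are random placeholders and the variable names are illustrative; kron(eye(m),zt) is one convenient way to form the block diagonal matrix.

```matlab
m = 2;  p = 2;  r = 1;                  % series, lags, exogenous predictors
k = m*p + r + 2;                        % length of z_t (constant and trend included)

rng(1);
Phi1 = 0.1*randn(m);  Phi2 = 0.1*randn(m);
c = randn(m,1);  delta = randn(m,1);  Beta = randn(m,r);

Lambda = [Phi1 Phi2 c delta Beta]';     % k-by-m coefficient matrix
lambda = Lambda(:);                     % (m*k)-by-1, lambda = vec(Lambda)

ylag1 = randn(m,1);  ylag2 = randn(m,1);  t = 5;  xt = randn(r,1);
zt = [ylag1' ylag2' 1 t xt'];           % 1-by-k
Zt = kron(eye(m),zt);                   % m-by-(m*k) block diagonal matrix

meanReduced = Phi1*ylag1 + Phi2*ylag2 + c + delta*t + Beta*xt;
meanMultivariate = Zt*lambda;           % multivariate regression form
meanMatrix = Lambda'*zt';               % matrix regression form
```

The three mean vectors agree up to floating-point rounding, which is why the forms can be used interchangeably.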
Version History Introduced in R2020a
See Also Functions bayesvarm Objects semiconjugatebvarm
estimate Fit autoregressive integrated moving average (ARIMA) model to data
Syntax EstMdl = estimate(Mdl,y) EstMdl = estimate(Mdl,y,Name,Value) [EstMdl,EstParamCov,logL,info] = estimate( ___ )
Description
EstMdl = estimate(Mdl,y) estimates parameters of the partially specified ARIMA(p,D,q) model Mdl given the observed univariate time series y using maximum likelihood. EstMdl is the corresponding fully specified ARIMA model that stores the parameter estimates.
EstMdl = estimate(Mdl,y,Name,Value) uses additional options specified by one or more name-value arguments. For example, 'X',X includes a linear regression component in the model for the exogenous data in X.
[EstMdl,EstParamCov,logL,info] = estimate( ___ ) also returns the variance-covariance matrix associated with the estimated parameters EstParamCov, optimized loglikelihood objective function value logL, and summary information info, using any of the input argument combinations in the previous syntaxes.
Examples
Estimate ARMA Model
Fit an ARMA(2,1) model to simulated data.
Simulate Data from Known Model
Suppose that the data generating process (DGP) is
$$y_t = 0.5y_{t-1} - 0.3y_{t-2} + \varepsilon_t + 0.2\varepsilon_{t-1},$$
where εt is a series of iid Gaussian random variables with mean 0 and variance 0.1.
Create the ARMA(2,1) model representing the DGP.
DGP = arima('AR',{0.5,-0.3},'MA',0.2,...
    'Constant',0,'Variance',0.1)
DGP =
  arima with properties:
     Description: "ARIMA(2,0,1) Model (Gaussian Distribution)"
    Distribution: Name = "Gaussian"
               P: 2
               D: 0
               Q: 1
        Constant: 0
              AR: {0.5 -0.3} at lags [1 2]
             SAR: {}
              MA: {0.2} at lag [1]
             SMA: {}
     Seasonality: 0
            Beta: [1×0]
        Variance: 0.1
DGP is a fully specified arima model object. Simulate a random 500 observation path from the ARMA(2,1) model. rng(5); % For reproducibility T = 500; y = simulate(DGP,T);
y is a 500-by-1 column vector representing a simulated response path from the ARMA(2,1) model DGP.
Estimate Model
Create an ARMA(2,1) model template for estimation.
Mdl = arima(2,0,1)
Mdl =
  arima with properties:
     Description: "ARIMA(2,0,1) Model (Gaussian Distribution)"
    Distribution: Name = "Gaussian"
               P: 2
               D: 0
               Q: 1
        Constant: NaN
              AR: {NaN NaN} at lags [1 2]
             SAR: {}
              MA: {NaN} at lag [1]
             SMA: {}
     Seasonality: 0
            Beta: [1×0]
        Variance: NaN
Mdl is a partially specified arima model object. Only required, nonestimable parameters that determine the model structure are specified. NaN-valued properties, including ϕ1, ϕ2, θ1, c, and σ², are unknown model parameters to be estimated.
Fit the ARMA(2,1) model to y.
EstMdl = estimate(Mdl,y)
ARIMA(2,0,1) Model (Gaussian Distribution):
                  Value       StandardError    TStatistic      PValue
                _________    _____________    __________    __________
    Constant    0.0089018       0.018417        0.48334        0.62886
    AR{1}         0.49563        0.10323         4.8013     1.5767e-06
    AR{2}        -0.25495       0.070155        -3.6341     0.00027897
    MA{1}         0.27737        0.10732         2.5846      0.0097491
    Variance      0.10004      0.0066577         15.027     4.9017e-51
EstMdl =
  arima with properties:
     Description: "ARIMA(2,0,1) Model (Gaussian Distribution)"
    Distribution: Name = "Gaussian"
               P: 2
               D: 0
               Q: 1
        Constant: 0.00890178
              AR: {0.495632 -0.254951} at lags [1 2]
             SAR: {}
              MA: {0.27737} at lag [1]
             SMA: {}
     Seasonality: 0
            Beta: [1×0]
        Variance: 0.100043
MATLAB® displays a table containing an estimation summary, which includes parameter estimates and inferences. For example, the Value column contains corresponding maximum-likelihood estimates, and the PValue column contains p-values for the asymptotic t-test of the null hypothesis that the corresponding parameter is 0. EstMdl is a fully specified, estimated arima model object; its estimates resemble the parameter values of the DGP.
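The relationship among the columns of the estimation summary can be checked by hand. This sketch recomputes the t-statistics and two-sided p-values from the displayed estimates and standard errors. Using a standard normal reference distribution is an assumption here, although it appears consistent with the displayed values; the variable names are illustrative.

```matlab
% Estimates and standard errors copied from the estimation summary
value = [0.0089018; 0.49563; -0.25495; 0.27737; 0.10004];
se    = [0.018417; 0.10323; 0.070155; 0.10732; 0.0066577];

tStat = value./se;                   % ratio of estimate to standard error
pValue = erfc(abs(tStat)/sqrt(2));   % two-sided p-value, standard normal reference
```

Small differences from the displayed table are expected because the inputs are rounded to five significant digits.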
Apply Equality Constraints to Parameters During Estimation
Fit an AR(2) model to simulated data while holding the model constant fixed during estimation.
Simulate Data from Known Model
Suppose the DGP is
$$y_t = 0.5y_{t-1} - 0.3y_{t-2} + \varepsilon_t,$$
where εt is a series of iid Gaussian random variables with mean 0 and variance 0.1.
Create the AR(2) model representing the DGP.
DGP = arima('AR',{0.5,-0.3},...
    'Constant',0,'Variance',0.1);
Simulate a random 500 observation path from the model. rng(5); % For reproducibility T = 500; y = simulate(DGP,T);
Create Model Object Specifying Constraint
Assume that the mean of yt is 0, which implies that c is 0.
Create an AR(2) model for estimation. Set c to 0.
Mdl = arima('ARLags',1:2,'Constant',0)
Mdl =
  arima with properties:
     Description: "ARIMA(2,0,0) Model (Gaussian Distribution)"
    Distribution: Name = "Gaussian"
               P: 2
               D: 0
               Q: 0
        Constant: 0
              AR: {NaN NaN} at lags [1 2]
             SAR: {}
              MA: {}
             SMA: {}
     Seasonality: 0
            Beta: [1×0]
        Variance: NaN
Mdl is a partially specified arima model object. Specified parameters include all required parameters and the model constant. NaN-valued properties, including ϕ1, ϕ2, and σ², are unknown model parameters to be estimated.
Estimate Model
Fit the AR(2) model template containing the constraint to y.
EstMdl = estimate(Mdl,y)
ARIMA(2,0,0) Model (Gaussian Distribution):
                 Value      StandardError    TStatistic      PValue
                ________    _____________    __________    __________
    Constant           0              0           NaN             NaN
    AR{1}        0.56342       0.044225         12.74      3.5474e-37
    AR{2}       -0.29355       0.041786       -7.0252       2.137e-12
    Variance     0.10022       0.006644        15.085      2.0476e-51
EstMdl =
  arima with properties:
     Description: "ARIMA(2,0,0) Model (Gaussian Distribution)"
    Distribution: Name = "Gaussian"
               P: 2
               D: 0
               Q: 0
        Constant: 0
              AR: {0.563425 -0.293554} at lags [1 2]
             SAR: {}
              MA: {}
             SMA: {}
     Seasonality: 0
            Beta: [1×0]
        Variance: 0.100222
EstMdl is a fully specified, estimated arima model object; its estimates resemble the parameter values of the AR(2) model DGP. The value of c in the estimation summary and object display is 0, and corresponding inferences are trivial or do not apply.
Initialize Model Estimation Using Presample Response Data
Because an ARIMA model is a function of previous values, estimate requires presample data to initialize the model early in the sampling period. Although estimate backcasts for presample data by default, you can specify the required presample data instead. The P property of an arima model object specifies the required number of presample observations.
Load Data
Load the US equity index data set Data_EquityIdx.
load Data_EquityIdx
The table DataTable includes the time series variable NYSE, which contains daily NYSE composite closing prices from January 1990 through December 1995. Convert the table to a timetable. dt = datetime(dates,'ConvertFrom','datenum','Format','yyyy-MM-dd'); TT = table2timetable(DataTable,'RowTimes',dt); T = size(TT,1); % Total sample size
Create Model Template
Suppose that an ARIMA(1,1,1) model is appropriate to model NYSE composite series during the sample period.
Create an ARIMA(1,1,1) model template for estimation.
Mdl = arima(1,1,1)
Mdl =
  arima with properties:
     Description: "ARIMA(1,1,1) Model (Gaussian Distribution)"
    Distribution: Name = "Gaussian"
               P: 2
               D: 1
               Q: 1
        Constant: NaN
              AR: {NaN} at lag [1]
             SAR: {}
              MA: {NaN} at lag [1]
             SMA: {}
     Seasonality: 0
            Beta: [1×0]
        Variance: NaN
Mdl is a partially specified arima model object. Partition Sample Create vectors of indices that partition the sample into presample and estimation sample periods, so that the presample occurs first and contains Mdl.P = 2 observations, and the estimation sample contains the remaining observations. presample = 1:Mdl.P; estsample = (Mdl.P + 1):T;
Estimate Model
Fit an ARIMA(1,1,1) model to the estimation sample. Specify the presample responses.
EstMdl = estimate(Mdl,TT{estsample,"NYSE"},'Y0',TT{presample,"NYSE"});
ARIMA(1,1,1) Model (Gaussian Distribution):
                 Value      StandardError    TStatistic    PValue
                ________    _____________    __________    _______
    Constant     0.15775       0.097888        1.6115      0.10707
    AR{1}       -0.21985        0.15652       -1.4046      0.16015
    MA{1}        0.28529        0.15393        1.8534      0.06382
    Variance       17.17        0.20065        85.573            0
EstMdl is a fully specified, estimated arima model object.
Specify Initial Parameter Values for Optimization Fit an ARIMA(1,1,1) model to the daily close of the NYSE Composite Index. Specify initial parameter values obtained from an analysis of a pilot sample. Load Data Load the US equity index data set Data_EquityIdx. load Data_EquityIdx
The table DataTable includes the time series variable NYSE, which contains daily NYSE composite closing prices from January 1990 through December 1995. Convert the table to a timetable. dt = datetime(dates,'ConvertFrom','datenum','Format','yyyy-MM-dd'); TT = table2timetable(DataTable,'RowTimes',dt);
Fit Model to Pilot Sample Suppose that an ARIMA(1,1,1) model is appropriate to model NYSE composite series during the sample period. Create an ARIMA(1,1,1) model template for estimation. Mdl = arima(1,1,1);
Mdl is a partially specified arima model object.
Treat the first two years as a pilot sample for obtaining initial parameter values when fitting the model to the remaining data. Fit the model to the pilot sample.
endPilot = datetime(1991,12,31);
pilottr = timerange(TT.Time(1),endPilot,'days');
EstMdl0 = estimate(Mdl,TT{pilottr,"NYSE"},'Display','off');
EstMdl0 is a fully specified, estimated arima model object.
Estimate Model
Fit an ARIMA(1,1,1) model to the estimation sample. Specify the estimated parameters from the pilot sample fit as initial values for optimization.
esttr = timerange(endPilot + days(1),TT.Time(end),'days');
c0 = EstMdl0.Constant;
ar0 = EstMdl0.AR;
ma0 = EstMdl0.MA;
var0 = EstMdl0.Variance;
EstMdl = estimate(Mdl,TT{esttr,"NYSE"},'Constant0',c0,'AR0',ar0,...
    'MA0',ma0,'Variance0',var0);
ARIMA(1,1,1) Model (Gaussian Distribution):
                 Value     StandardError    TStatistic    PValue
                _______    _____________    __________    _______
    Constant    0.17424       0.11648         1.4959      0.13468
    AR{1}       -0.2262       0.18587        -1.2169      0.22363
    MA{1}       0.29047       0.18276         1.5893      0.11199
    Variance     20.053       0.27603          72.65            0
EstMdl is a fully specified, estimated arima model object.
Estimate ARIMA Model Containing Exogenous Predictors (ARIMAX)

Fit an ARIMAX model to simulated time series data.

Simulate Predictor and Response Data

Create the ARIMAX(2,1,0) model for the DGP, represented by $y_t$ in the equation

$$(1 - 0.5L + 0.3L^2)(1 - L)^1 y_t = 2 + 1.5x_{1,t} + 2.6x_{2,t} - 0.3x_{3,t} + \varepsilon_t,$$

where $\varepsilon_t$ is a series of iid Gaussian random variables with mean 0 and variance 0.1.

DGP = arima('AR',{0.5,-0.3},'D',1,'Constant',2,...
    'Variance',0.1,'Beta',[1.5 2.6 -0.3]);

Assume that the exogenous variables $x_{1,t}$, $x_{2,t}$, and $x_{3,t}$ are represented by the AR(1) processes

$$x_{1,t} = 0.1x_{1,t-1} + \eta_{1,t}$$
$$x_{2,t} = 0.2x_{2,t-1} + \eta_{2,t}$$
$$x_{3,t} = 0.3x_{3,t-1} + \eta_{3,t},$$

where $\eta_{i,t}$ follows a Gaussian distribution with mean 0 and variance 0.01 for $i \in \{1, 2, 3\}$. Create ARIMA models that represent the exogenous variables.

MdlX1 = arima('AR',0.1,'Constant',0,'Variance',0.01);
MdlX2 = arima('AR',0.2,'Constant',0,'Variance',0.01);
MdlX3 = arima('AR',0.3,'Constant',0,'Variance',0.01);
Simulate length 1000 exogenous series from the AR models. Store the simulated data in a matrix.

T = 1000;
rng(10); % For reproducibility
x1 = simulate(MdlX1,T);
x2 = simulate(MdlX2,T);
x3 = simulate(MdlX3,T);
X = [x1 x2 x3];

X is a 1000-by-3 matrix of simulated time series data. Each row corresponds to an observation in the time series, and each column corresponds to an exogenous variable.

Simulate a length 1000 series from the DGP. Specify the simulated exogenous data.

y = simulate(DGP,T,'X',X);

y is a 1000-by-1 vector of response data.

Estimate Model

Create an ARIMA(2,1,0) model template for estimation.

Mdl = arima(2,1,0)

Mdl =
  arima with properties:

     Description: "ARIMA(2,1,0) Model (Gaussian Distribution)"
    Distribution: Name = "Gaussian"
               P: 3
               D: 1
               Q: 0
        Constant: NaN
              AR: {NaN NaN} at lags [1 2]
             SAR: {}
              MA: {}
             SMA: {}
     Seasonality: 0
            Beta: [1×0]
        Variance: NaN
The model description (Description property) and value of Beta suggest that the partially specified arima model object Mdl is agnostic of the exogenous predictors.

Estimate the ARIMAX(2,1,0) model; specify the exogenous predictor data. Because estimate backcasts for presample responses (a process that requires presample predictor data for ARIMAX models), fit the model to the latest T – Mdl.P responses. (Alternatively, you can specify presample responses by using the 'Y0' name-value pair argument.)

EstMdl = estimate(Mdl,y((Mdl.P + 1):T),'X',X);

    ARIMAX(2,1,0) Model (Gaussian Distribution):

                 Value       StandardError    TStatistic      PValue
                ________    _____________    __________    ___________

    Constant      1.7519      0.021143         82.859                0
    AR{1}        0.56076      0.016511         33.963      7.9497e-253
    AR{2}       -0.26625      0.015966        -16.676       1.9636e-62
    Beta(1)       1.4764       0.10157         14.536       7.1228e-48
    Beta(2)       2.5638       0.10445         24.547      4.6633e-133
    Beta(3)     -0.34422      0.098623        -3.4903       0.00048249
    Variance     0.10673     0.0047273         22.577      7.3161e-113
EstMdl is a fully specified, estimated arima model object.

When you estimate the model by using estimate and supply the exogenous data by specifying the 'X' name-value pair argument, MATLAB® recognizes the model as an ARIMAX(2,1,0) model and includes a linear regression component for the exogenous variables.

The estimated model is

$$(1 - 0.56L + 0.27L^2)(1 - L)^1 y_t = 1.75 + 1.48x_{1,t} + 2.56x_{2,t} - 0.34x_{3,t} + \varepsilon_t,$$

which resembles the DGP represented by DGP. Because MATLAB returns the AR coefficients of the model expressed in difference-equation notation, their signs are opposite in the equation.
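The alternative mentioned in the example, supplying presample responses through 'Y0' rather than relying on backcasting, might look like the following minimal sketch. It regenerates the simulated data with the same seed and call order as the example; the names y0, yEst, and EstMdlY0 are hypothetical and introduced only for illustration. The estimates are not expected to match the backcast-based fit exactly.

```matlab
% Recreate the simulated predictor and response data from the example.
DGP = arima('AR',{0.5,-0.3},'D',1,'Constant',2,...
    'Variance',0.1,'Beta',[1.5 2.6 -0.3]);
MdlX1 = arima('AR',0.1,'Constant',0,'Variance',0.01);
MdlX2 = arima('AR',0.2,'Constant',0,'Variance',0.01);
MdlX3 = arima('AR',0.3,'Constant',0,'Variance',0.01);
T = 1000;
rng(10); % For reproducibility
X = [simulate(MdlX1,T) simulate(MdlX2,T) simulate(MdlX3,T)];
y = simulate(DGP,T,'X',X);

Mdl = arima(2,1,0);

% Use the first Mdl.P responses as the presample (hypothetical names).
y0 = y(1:Mdl.P);
yEst = y((Mdl.P + 1):T);

% With 'Y0' specified, X needs at least numel(yEst) rows; estimate uses
% the latest rows of X and aligns them with yEst.
EstMdlY0 = estimate(Mdl,yEst,'Y0',y0,'X',X,'Display','off');
```

Because the presample is observed rather than backcast, this variant does not require presample predictor data.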
Compute Estimated Standard Errors

Load the US equity index data set Data_EquityIdx.

load Data_EquityIdx

The table DataTable includes the time series variable NYSE, which contains daily NYSE composite closing prices from January 1990 through December 1995. Convert the table to a timetable.

dt = datetime(dates,'ConvertFrom','datenum','Format','yyyy-MM-dd');
TT = table2timetable(DataTable,'RowTimes',dt);

Suppose that an ARIMA(1,1,1) model is appropriate to model NYSE composite series during the sample period. Fit an ARIMA(1,1,1) model to the data, and return the estimated parameter covariance matrix.

Mdl = arima(1,1,1);
[EstMdl,EstParamCov] = estimate(Mdl,TT{:,"NYSE"});

    ARIMA(1,1,1) Model (Gaussian Distribution):

                 Value      StandardError    TStatistic     PValue
                ________    _____________    __________    ________

    Constant     0.15745       0.09783         1.6094       0.10753
    AR{1}       -0.21995       0.15642        -1.4062       0.15967
    MA{1}         0.2854       0.15382         1.8554      0.063541
    Variance      17.159       0.20038         85.632             0

EstParamCov

EstParamCov = 4×4

    0.0096   -0.0002    0.0002    0.0023
   -0.0002    0.0245   -0.0240   -0.0060
    0.0002   -0.0240    0.0237    0.0057
    0.0023   -0.0060    0.0057    0.0402
EstMdl is a fully specified, estimated arima model object. Rows and columns of EstParamCov correspond to the rows in the table of estimates and inferences; for example, $\mathrm{Cov}(\hat\phi_1, \hat\theta_1) = -0.024$.

Compute estimated parameter standard errors by taking the square root of the diagonal elements of the covariance matrix.

estParamSE = sqrt(diag(EstParamCov))

estParamSE = 4×1

    0.0978
    0.1564
    0.1538
    0.2004

Compute a Wald-based 95% confidence interval on ϕ.

T = size(TT,1); % Effective sample size
phihat = EstMdl.AR{1};
sephihat = estParamSE(2);
ciphi = phihat + tinv([0.025 0.975],T - 3)*sephihat

ciphi = 1×2

   -0.5266    0.0867

The interval contains 0, which suggests that ϕ is insignificant.
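As a rough check on the interval, the computation can be written out by hand. The notation below is an illustrative assumption, not taken from the guide; it uses the rounded estimates from the table and approximates the t quantile by 1.96, which is reasonable because the sample of daily prices is large.

```latex
\hat{\phi}_1 \pm t_{0.975,\,T-3}\,\widehat{\mathrm{se}}(\hat{\phi}_1)
  \approx -0.22 \pm 1.96 \times 0.156
  \approx (-0.53,\; 0.09)
```

To two decimal places this agrees with the interval that ciphi reports.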
Compute Fitted Response Values

Load the US equity index data set Data_EquityIdx.

load Data_EquityIdx

The table DataTable includes the time series variable NYSE, which contains daily NYSE composite closing prices from January 1990 through December 1995. Convert the table to a timetable.

dt = datetime(dates,'ConvertFrom','datenum','Format','yyyy-MM-dd');
TT = table2timetable(DataTable,'RowTimes',dt);
T = size(TT,1);

Suppose that an ARIMA(1,1,1) model is appropriate to model NYSE composite series during the sample period. Fit an ARIMA(1,1,1) model to the data. Specify the required presample and turn off the estimation display.

Mdl = arima(1,1,1);
preidx = 1:Mdl.P;
estidx = (Mdl.P + 1):T;
EstMdl = estimate(Mdl,TT{estidx,"NYSE"},...
    'Y0',TT{preidx,"NYSE"},'Display','off');

Infer the residuals $\hat\varepsilon_t$ from the estimated model, and specify the required presample.

resid = infer(EstMdl,TT{estidx,"NYSE"},...
    'Y0',TT{preidx,"NYSE"});

resid is a (T – Mdl.P)-by-1 vector of residuals.

Compute the fitted values $\hat{y}_t$ by subtracting the residuals from the observations.

yhat = TT{estidx,"NYSE"} - resid;

Plot the observations and the fitted values on the same graph.

plot(TT.Time(estidx),TT{estidx,"NYSE"},'r',TT.Time(estidx),yhat,'b--','LineWidth',2)
The fitted values closely track the observations.

Plot the residuals versus the fitted values.

plot(yhat,resid,'.')
ylabel('Residuals')
xlabel('Fitted values')

Residual variance appears larger for larger fitted values. One remedy for this behavior is to apply the log transform to the data.
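The suggested remedy could be explored with a sketch along these lines, which repeats the fit on the log of the series and redraws the diagnostic plot. The variable names logNYSE, EstMdlLog, residLog, and yhatLog are hypothetical; whether the transform actually stabilizes the residual variance for this data set is something to inspect in the resulting plot rather than assume.

```matlab
load Data_EquityIdx

% Work on the log scale (assumes all prices are positive).
logNYSE = log(DataTable.NYSE);
T = numel(logNYSE);

Mdl = arima(1,1,1);
preidx = 1:Mdl.P;
estidx = (Mdl.P + 1):T;

% Fit the same ARIMA(1,1,1) structure to the log series.
EstMdlLog = estimate(Mdl,logNYSE(estidx),...
    'Y0',logNYSE(preidx),'Display','off');

% Residuals and fitted values on the log scale.
residLog = infer(EstMdlLog,logNYSE(estidx),'Y0',logNYSE(preidx));
yhatLog = logNYSE(estidx) - residLog;

% Repeat the residuals-versus-fitted diagnostic.
plot(yhatLog,residLog,'.')
ylabel('Residuals (log scale)')
xlabel('Fitted values (log scale)')
```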
Input Arguments

Mdl — Partially specified ARIMA model
arima model object

Partially specified ARIMA model used to indicate constrained and estimable model parameters, specified as an arima model object returned by arima or estimate. Properties of Mdl describe the model structure and specify the parameters.

estimate fits unspecified (NaN-valued) parameters to the data y. estimate treats specified parameters as equality constraints during estimation.

y — Single path of response data
numeric column vector

Single path of response data to which the model Mdl is fit, specified as a numeric column vector. The last observation of y is the latest observation.

Data Types: double

Name-Value Pair Arguments

Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter.

Before R2021a, use commas to separate each name and value, and enclose Name in quotes.

Example: 'Y0',Y0,'X',X uses the vector Y0 as presample responses required for estimation, and includes a linear regression component for the exogenous predictor data in X.

Estimation Options
X — Exogenous predictor data
matrix

Exogenous predictor data for the linear regression component, specified as the comma-separated pair consisting of 'X' and a matrix. The columns of X are separate, synchronized time series. The last row contains the latest observations.

If you do not specify presample response data using the 'Y0' name-value pair argument, the number of rows of X must be at least numel(y) + Mdl.P. Otherwise, the number of rows of X must be at least the length of y.

If the number of rows of X exceeds the number needed, estimate uses the latest observations only. estimate synchronizes X and y so that the latest observations (last rows) occur simultaneously.

By default, estimate does not estimate the regression coefficients, regardless of their presence in Mdl.

Data Types: double

Options — Optimization options
optimoptions optimization controller

Optimization options, specified as the comma-separated pair consisting of 'Options' and an optimoptions optimization controller. For details on modifying the default values of the optimizer, see optimoptions or fmincon in Optimization Toolbox.

For example, to change the constraint tolerance to 1e-6, set Options = optimoptions(@fmincon,'ConstraintTolerance',1e-6,'Algorithm','sqp'). Then, pass Options into estimate using 'Options',Options.

By default, estimate uses the same default options as fmincon, except Algorithm is 'sqp' and ConstraintTolerance is 1e-7.

Display — Command Window display option
'params' (default) | 'diagnostics' | 'full' | 'iter' | 'off' | string vector | cell vector of character vectors

Command Window display option, specified as the comma-separated pair consisting of 'Display' and one or more of the values in this table.

Value | Information Displayed
'diagnostics' | Optimization diagnostics
'full' | Maximum likelihood parameter estimates, standard errors, t statistics, iterative optimization information, and optimization diagnostics
'iter' | Iterative optimization information
'off' | None
'params' | Maximum likelihood parameter estimates, standard errors, and t statistics

Example: 'Display','off' is well suited for running a simulation that estimates many models.

Example: 'Display',{'params','diagnostics'} displays all estimation results and the optimization diagnostics.

Data Types: char | cell | string

Presample Specifications
Y0 — Presample response data
numeric column vector

Presample response data for initializing the model, specified as the comma-separated pair consisting of 'Y0' and a numeric column vector.

The length of Y0 must be at least Mdl.P. If Y0 has extra rows, estimate uses only the latest Mdl.P presample responses. The last row contains the latest presample responses.

By default, estimate backward forecasts (backcasts) for the necessary amount of presample responses.

For details on partitioning data for estimation, see “Time Base Partitions for ARIMA Model Estimation” on page 7-103.

Data Types: double

E0 — Presample innovations
numeric column vector

Presample innovations εt for initializing the model, specified as the comma-separated pair consisting of 'E0' and a numeric column vector.

The length of E0 must be at least Mdl.Q. If E0 has extra rows, estimate uses only the latest Mdl.Q presample innovations. The last row contains the latest presample innovation.

If Mdl.Variance is a conditional variance model object, such as a garch model, estimate can require more than Mdl.Q presample innovations.

By default, estimate sets all required presample innovations to 0, which is their mean.

Data Types: double

V0 — Presample conditional variances
numeric positive column vector

Presample conditional variances $\sigma_t^2$ for initializing any conditional variance model, specified as the comma-separated pair consisting of 'V0' and a numeric positive column vector.

The length of V0 must be at least the number of observations required to initialize the conditional variance model (see estimate). If V0 has extra rows, estimate uses only the latest observations. The last row contains the latest observation.

If the variance is constant, estimate ignores V0.

By default, estimate sets the necessary presample conditional variances to the average of the squared inferred innovations.

Data Types: double

Initial Value Specifications
Constant0 — Initial estimate of model constant
numeric scalar

Initial estimate of the model constant c, specified as the comma-separated pair consisting of 'Constant0' and a numeric scalar.

By default, estimate derives initial estimates using standard time series techniques.

Data Types: double

AR0 — Initial estimates of nonseasonal AR polynomial coefficients
numeric vector

Initial estimates of the nonseasonal AR polynomial coefficients ϕ(L), specified as the comma-separated pair consisting of 'AR0' and a numeric vector.

The length of AR0 must equal the number of lags associated with nonzero coefficients in the nonseasonal AR polynomial. Elements of AR0 correspond to elements of Mdl.AR.

By default, estimate derives initial estimates using standard time series techniques.

Data Types: double

SAR0 — Initial estimates of seasonal autoregressive polynomial coefficients
numeric vector

Initial estimates of the seasonal autoregressive polynomial coefficients Φ(L), specified as the comma-separated pair consisting of 'SAR0' and a numeric vector.

The length of SAR0 must equal the number of lags associated with nonzero coefficients in the seasonal autoregressive polynomial SARLags. Elements of SAR0 correspond to elements of Mdl.SAR.

By default, estimate derives initial estimates using standard time series techniques.

Data Types: double

MA0 — Initial estimates of nonseasonal moving average polynomial coefficients
numeric vector

Initial estimates of the nonseasonal moving average polynomial coefficients θ(L), specified as the comma-separated pair consisting of 'MA0' and a numeric vector.

The length of MA0 must equal the number of lags associated with nonzero coefficients in the nonseasonal moving average polynomial MALags. Elements of MA0 correspond to elements of Mdl.MA.

By default, estimate derives initial estimates using standard time series techniques.

Data Types: double

SMA0 — Initial estimates of seasonal moving average polynomial coefficients
numeric vector

Initial estimates of the seasonal moving average polynomial coefficients Θ(L), specified as the comma-separated pair consisting of 'SMA0' and a numeric vector.

The length of SMA0 must equal the number of lags associated with nonzero coefficients in the seasonal moving average polynomial SMALags. Elements of SMA0 correspond to elements of Mdl.SMA.

By default, estimate derives initial estimates using standard time series techniques.

Data Types: double

Beta0 — Initial estimates of regression coefficients
numeric vector

Initial estimates of the regression coefficients β, specified as the comma-separated pair consisting of 'Beta0' and a numeric vector.

The length of Beta0 must equal the number of columns of X. Elements of Beta0 correspond to the predictor variables represented by the columns of X.

By default, estimate derives initial estimates using standard time series techniques.

Data Types: double

DoF0 — Initial estimate of t-distribution degrees-of-freedom parameter
10 (default) | positive scalar

Initial estimate of the t-distribution degrees-of-freedom parameter ν, specified as the comma-separated pair consisting of 'DoF0' and a positive scalar. DoF0 must exceed 2.

Data Types: double

Variance0 — Initial estimates of variances of innovations
positive scalar | cell vector of name-value pair arguments

Initial estimates of variances of innovations, specified as the comma-separated pair consisting of 'Variance0' and a positive scalar or a cell vector of name-value pair arguments.
Mdl.Variance Value | Description | 'Variance0' Value
Numeric scalar or NaN | Constant variance | Positive scalar
garch, egarch, or gjr model object | Conditional variance model | Cell vector of name-value pair arguments for specifying initial estimates, see the estimate function of the conditional variance model objects

By default, estimate derives initial estimates using standard time series techniques.

Example: For a model with a constant variance, set 'Variance0',2 to specify an initial variance estimate of 2.

Example: For a composite conditional mean and variance model, set 'Variance0',{'Constant0',2,'ARCH0',0.1} to specify an initial estimate of 2 for the conditional variance model constant, and an initial estimate of 0.1 for the lag 1 coefficient in the ARCH polynomial.

Data Types: double | cell

Note NaNs in input data indicate missing values. estimate uses listwise deletion to delete all sampled times (rows) in the input data containing at least one missing value. Specifically, estimate performs these steps:

1 Synchronize, or merge, the presample data sets E0, V0, and Y0 and the effective sample data X and y to create the separate sets Presample and EffectiveSample.
2 Remove all rows from Presample and EffectiveSample containing at least one NaN.

Listwise deletion reduces the sample size and can create irregular time series.
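The effect of listwise deletion can be illustrated with a small sketch. The simulated AR(1) series, the positions of the missing values, and the names yMiss, EstMdlMiss, and EstMdlManual are assumptions made for illustration. The sketch assumes that, when only y is supplied, listwise deletion amounts to dropping the NaN entries of y; the two fits are shown side by side rather than asserted to be identical.

```matlab
rng(1); % For reproducibility

% Hypothetical data-generating process used only for this illustration.
DGP = arima('AR',0.5,'Constant',1,'Variance',1);
y = simulate(DGP,200);

% Introduce two missing observations.
yMiss = y;
yMiss([50 120]) = NaN;

Mdl = arima(1,0,0);

% Fit with NaNs present; estimate removes the affected rows.
EstMdlMiss = estimate(Mdl,yMiss,'Display','off');

% Fit after removing the NaNs manually, for comparison.
EstMdlManual = estimate(Mdl,yMiss(~isnan(yMiss)),'Display','off');

% Compare the AR(1) coefficient estimates from the two fits.
arComparison = [EstMdlMiss.AR{1} EstMdlManual.AR{1}]
```

Either way, the observations on both sides of a removed row become adjacent, which is the irregularity that the note warns about.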
Output Arguments

EstMdl — Estimated ARIMA model
arima model object

Estimated ARIMA model, returned as an arima model object.

EstMdl is a copy of Mdl that has NaN values replaced with parameter estimates. EstMdl is fully specified.

EstParamCov — Estimated covariance matrix of maximum likelihood estimates
positive semidefinite numeric matrix

Estimated covariance matrix of maximum likelihood estimates known to the optimizer, returned as a positive semidefinite numeric matrix.

The rows and columns contain the covariances of the parameter estimates. The standard error of each parameter estimate is the square root of the corresponding main diagonal entry.

The rows and columns corresponding to any parameters held fixed as equality constraints are zero vectors.

Parameters corresponding to the rows and columns of EstParamCov appear in the following order:

• Constant
• Nonzero AR coefficients at positive lags, from the smallest to largest lag
• Nonzero SAR coefficients at positive lags, from the smallest to largest lag
• Nonzero MA coefficients at positive lags, from the smallest to largest lag
• Nonzero SMA coefficients at positive lags, from the smallest to largest lag
• Regression coefficients (when you specify exogenous data X), ordered by the columns of X
• Variance parameters, a scalar for constant variance models and vector for conditional variance models (see estimate for the order of parameters)
• Degrees of freedom (t-innovation distribution only)

Data Types: double

logL — Optimized loglikelihood objective function value
numeric scalar

Optimized loglikelihood objective function value, returned as a numeric scalar.

Data Types: double

info — Optimization summary
structure array

Optimization summary, returned as a structure array with the fields described in this table.

Field | Description
exitflag | Optimization exit flag (see fmincon in Optimization Toolbox)
options | Optimization options controller (see optimoptions and fmincon in Optimization Toolbox)
X | Vector of final parameter estimates
X0 | Vector of initial parameter estimates

For example, you can display the vector of final estimates by entering info.X in the Command Window.

Data Types: struct

Tip

• To access values of the estimation results, including the number of free parameters in the model, pass EstMdl to summarize.
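A minimal sketch of collecting all four outputs and the summarize results follows. The simulated ARMA(1,1) series is a hypothetical stand-in for real data, and the variable names se, exitflag, finalEstimates, results, and numFree are introduced only for illustration.

```matlab
rng(1); % For reproducibility

% Hypothetical data-generating process used only for this illustration.
DGP = arima('AR',0.5,'MA',0.3,'Constant',1,'Variance',1);
y = simulate(DGP,500);

Mdl = arima(1,0,1);

% Request every output of estimate.
[EstMdl,EstParamCov,logL,info] = estimate(Mdl,y,'Display','off');

% Standard errors, in the order Constant, AR{1}, MA{1}, Variance.
se = sqrt(diag(EstParamCov));

% Optimization diagnostics stored in the info structure.
exitflag = info.exitflag;
finalEstimates = info.X;

% summarize returns a structure when called with an output argument.
results = summarize(EstMdl);
numFree = results.NumEstimatedParameters;
```

Here all four parameters are free, so EstParamCov is 4-by-4 with no zero rows or columns.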
Algorithms

• estimate infers innovations and conditional variances (when present) of the underlying response series, and then uses constrained maximum likelihood to fit the model Mdl to the response data y.
• Because you can specify presample data inputs Y0, E0, and V0 of differing lengths, estimate assumes that all specified sets have these characteristics:
    • The final observation (row) in each set occurs simultaneously.
    • The first observation in the estimation sample immediately follows the last observation in the presample, with respect to the sampling frequency.
• If you specify the 'Display' name-value pair argument, the value overrides the Diagnostics and Display settings of the 'Options' name-value pair argument. Otherwise, estimate displays optimization information using 'Options' settings.
• estimate uses the outer product of gradients (OPG) method to perform covariance matrix estimation on page 3-61.
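The interaction between 'Options' and 'Display' described in the list above can be sketched as follows. The simulated AR(1) series and the names DGP and opts are assumptions made for illustration; the optimoptions call itself mirrors the one given in the description of the 'Options' argument.

```matlab
rng(1); % For reproducibility

% Hypothetical data-generating process used only for this illustration.
DGP = arima('AR',0.5,'Constant',1,'Variance',1);
y = simulate(DGP,200);

Mdl = arima(1,0,0);

% Loosen the constraint tolerance relative to the estimate default of 1e-7.
opts = optimoptions(@fmincon,'ConstraintTolerance',1e-6,'Algorithm','sqp');

% Because 'Display' is specified, it takes precedence over the Display
% and Diagnostics settings stored in opts.
EstMdl = estimate(Mdl,y,'Options',opts,...
    'Display',{'params','diagnostics'});
```

Omitting 'Display' in the last call would leave the display behavior to the settings in opts.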
Version History Introduced in R2012a
References

[1] Box, George E. P., Gwilym M. Jenkins, and Gregory C. Reinsel. Time Series Analysis: Forecasting and Control. 3rd ed. Englewood Cliffs, NJ: Prentice Hall, 1994.
[2] Enders, Walter. Applied Econometric Time Series. Hoboken, NJ: John Wiley & Sons, Inc., 1995.
[3] Greene, William H. Econometric Analysis. 6th ed. Upper Saddle River, NJ: Prentice Hall, 2008.
[4] Hamilton, James D. Time Series Analysis. Princeton, NJ: Princeton University Press, 1994.

See Also

Objects
arima

Functions
simulate | forecast | infer | summarize

Topics
“Time Base Partitions for ARIMA Model Estimation” on page 7-103
“Estimate Multiplicative ARIMA Model” on page 7-122
“Estimate Conditional Mean and Variance Model” on page 7-135
“Model Seasonal Lag Effects Using Indicator Variables” on page 7-125
“Maximum Likelihood Estimation for Conditional Mean Models” on page 7-112
“Conditional Mean Model Estimation with Equality Constraints” on page 7-114
“Presample Data for Conditional Mean Model Estimation” on page 7-115
“Initial Values for Conditional Mean Model Estimation” on page 7-117
“Optimization Settings for Conditional Mean Model Estimation” on page 7-119
estimate Estimate posterior distribution of Bayesian linear regression model parameters
Syntax

PosteriorMdl = estimate(PriorMdl,X,y)
PosteriorMdl = estimate(PriorMdl,X,y,Name,Value)
[PosteriorMdl,Summary] = estimate( ___ )

Description

To perform predictor variable selection for a Bayesian linear regression model, see estimate.

PosteriorMdl = estimate(PriorMdl,X,y) returns the Bayesian linear regression on page 12-578 model PosteriorMdl that characterizes the joint posterior distributions of the coefficients β and the disturbance variance σ2. PriorMdl specifies the joint prior distribution of the parameters and the structure of the linear regression model. X is the predictor data and y is the response data. PriorMdl and PosteriorMdl might not be the same object type.

To produce PosteriorMdl, the estimate function updates the prior distribution with information about the parameters that it obtains from the data.

NaNs in the data indicate missing values, which estimate removes by using list-wise deletion.

PosteriorMdl = estimate(PriorMdl,X,y,Name,Value) specifies additional options using one or more name-value pair arguments. For example, you can specify a value for either β or σ2 to estimate the conditional posterior distribution of one parameter given the specified value of the other parameter.

If you specify the Beta or Sigma2 name-value pair argument, then PosteriorMdl and PriorMdl are equal.

[PosteriorMdl,Summary] = estimate( ___ ) uses any of the input argument combinations in the previous syntaxes to return a table that contains the following for each parameter: the posterior mean and standard deviation, 95% credible interval, posterior probability that the parameter is greater than 0, and description of the posterior distribution (if one exists). Also, the table contains the posterior covariance matrix of β and σ2. If you specify the Beta or Sigma2 name-value pair argument, then estimate returns conditional posterior estimates.
Examples

Compare Default Prior and Marginal Posterior Estimation to OLS Estimates

Consider a model that predicts the fuel economy (in MPG) of a car given its engine displacement and weight.

Load the carsmall data set.

load carsmall
x = [Displacement Weight];
y = MPG;

Regress fuel economy onto engine displacement and weight, including an intercept to obtain ordinary least-squares (OLS) estimates.

Mdl = fitlm(x,y)

Mdl =
Linear regression model:
    y ~ 1 + x1 + x2

Estimated Coefficients:
                    Estimate        SE          tStat       pValue
                   __________    _________    _______    __________

    (Intercept)        46.925       2.0858     22.497    6.0509e-39
    x1              -0.014593    0.0082695    -1.7647      0.080968
    x2             -0.0068422    0.0011337    -6.0353    3.3838e-08

Number of observations: 94, Error degrees of freedom: 91
Root Mean Squared Error: 4.09
R-squared: 0.747, Adjusted R-Squared: 0.741
F-statistic vs. constant model: 134, p-value = 7.22e-28

Mdl.MSE

ans = 16.7100

Create a default, diffuse prior distribution for two predictors.

p = 2;
PriorMdl = bayeslm(p);

PriorMdl is a diffuseblm model object.

Use default options to estimate the posterior distribution.

PosteriorMdl = estimate(PriorMdl,x,y);

Method: Analytic posterior distributions
Number of observations: 94
Number of predictors:   3

           |   Mean     Std          CI95        Positive       Distribution
--------------------------------------------------------------------------------
 Intercept | 46.9247  2.1091  [42.782, 51.068]    1.000   t (46.92, 2.09^2, 91)
 Beta(1)   | -0.0146  0.0084  [-0.031,  0.002]    0.040   t (-0.01, 0.01^2, 91)
 Beta(2)   | -0.0068  0.0011  [-0.009, -0.005]    0.000   t (-0.01, 0.00^2, 91)
 Sigma2    | 17.0855  2.5905  [12.748, 22.866]    1.000   IG(45.50, 0.0013)

PosteriorMdl is a conjugateblm model object.

The posterior means and the OLS coefficient estimates are almost identical. Also, the posterior standard deviations and OLS standard errors are almost identical. The posterior mean of Sigma2 is close to the OLS mean squared error (MSE).
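The comparison can also be made numerically with a sketch such as the following, which places the OLS and posterior quantities side by side. The names comparison, dfe, and sigma2PostMean are hypothetical. The last line rests on an assumption inferred from the displayed IG(45.50, 0.0013) posterior: that its shape equals half the error degrees of freedom and its mean equals the sum of squared errors divided by the error degrees of freedom minus 2.

```matlab
load carsmall
x = [Displacement Weight];
y = MPG;

% OLS fit and diffuse-prior Bayesian fit of the same regression.
Mdl = fitlm(x,y);
PriorMdl = bayeslm(2);
[~,Summary] = estimate(PriorMdl,x,y,'Display',false);

% Side-by-side coefficient estimates and their uncertainty measures.
comparison = table(Mdl.Coefficients.Estimate,Summary.Mean(1:3),...
    Mdl.Coefficients.SE,Summary.Std(1:3),...
    'VariableNames',{'OLS','PosteriorMean','OLSSE','PosteriorStd'})

% Assumed relationship: posterior mean of Sigma2 = SSE/(dfe - 2),
% which rescales the OLS MSE = SSE/dfe by dfe/(dfe - 2).
dfe = Mdl.DFE;
sigma2PostMean = Mdl.MSE*dfe/(dfe - 2)
```

With 91 error degrees of freedom and an MSE of 16.71, the rescaled value is about 17.09, in line with the Sigma2 row of the posterior summary.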
12
Functions
Estimate Posterior Using Hamiltonian Monte Carlo Sampler

Consider the multiple linear regression model that predicts the US real gross national product (GNPR) using a linear combination of total employment (E) and real wages (WR).

$$\mathrm{GNPR}_t = \beta_0 + \beta_1 \mathrm{E}_t + \beta_2 \mathrm{WR}_t + \varepsilon_t.$$

For all $t$, $\varepsilon_t$ is a series of independent Gaussian disturbances with a mean of 0 and variance $\sigma^2$. Assume these prior distributions:

• $\beta$ is a 3-D t distribution with 10 degrees of freedom for each component, correlation matrix C, location ct, and scale st.
• $\sigma^2 \sim IG(a,b)$, with shape $a$ and scale $b$.

bayeslm treats these assumptions and the data likelihood as if the corresponding posterior is analytically intractable.

Declare a MATLAB® function that:

• Accepts values of $\beta$ and $\sigma^2$ together in a column vector, and accepts values of the hyperparameters
• Returns the value of the log of the joint prior distribution, given the values of $\beta$ and $\sigma^2$
function logPDF = priorMVTIG(params,ct,st,dof,C,a,b)
%priorMVTIG Log density of multivariate t times inverse gamma
%   priorMVTIG passes params(1:end-1) to the multivariate t density
%   function with dof degrees of freedom for each component and positive
%   definite correlation matrix C. priorMVTIG returns the log of the product of
%   the two evaluated densities.
%
%   params: Parameter values at which the densities are evaluated, an
%           m-by-1 numeric vector.
%
%       ct: Multivariate t distribution component centers, an (m-1)-by-1
%           numeric vector. Elements correspond to the first m-1 elements
%           of params.
%
%       st: Multivariate t distribution component scales, an (m-1)-by-1
%           numeric vector. Elements correspond to the first m-1 elements
%           of params.
%
%      dof: Degrees of freedom for the multivariate t distribution, a
%           numeric scalar or (m-1)-by-1 numeric vector. priorMVTIG expands
%           scalars such that dof = dof*ones(m-1,1). Elements of dof
%           correspond to the elements of params(1:end-1).
%
%        C: Correlation matrix for the multivariate t distribution, an
%           (m-1)-by-(m-1) symmetric, positive definite matrix. Rows and
%           columns correspond to the elements of params(1:end-1).
%
%        a: Inverse gamma shape parameter, a positive numeric scalar.
%
%        b: Inverse gamma scale parameter, a positive scalar.
%
beta = params(1:(end-1));
sigma2 = params(end);

tVal = (beta - ct)./st;
mvtDensity = mvtpdf(tVal,C,dof);
igDensity = sigma2^(-a-1)*exp(-1/(sigma2*b))/(gamma(a)*b^a);

logPDF = log(mvtDensity*igDensity);
end
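Written out, the quantity that priorMVTIG computes can be summarized as below. The symbols $\pi$, $f_t$, $f_{IG}$, $z$, and $\nu$ are illustrative notation introduced here, not taken from the guide; the expressions are transcribed from the function body.

```latex
\log \pi(\beta,\sigma^2)
  = \log f_t(z;\, C,\, \nu) + \log f_{IG}(\sigma^2;\, a,\, b),
\qquad
z_j = \frac{\beta_j - c_{t,j}}{s_{t,j}},

f_{IG}(\sigma^2;\, a,\, b)
  = \frac{(\sigma^2)^{-a-1}\exp\!\left(-\dfrac{1}{b\,\sigma^2}\right)}{\Gamma(a)\, b^{a}} .
```

As coded, the t density is evaluated at the standardized coefficients without the factor $1/\prod_j s_{t,j}$. That factor does not depend on the parameters, so omitting it changes the log density only by an additive constant.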
Create an anonymous function that operates like priorMVTIG, but accepts the parameter values only, and holds the hyperparameter values fixed at arbitrarily chosen values.

prednames = ["E" "WR"];
p = numel(prednames);
numcoeff = p + 1;

rng(1); % For reproducibility
dof = 10;
V = rand(numcoeff);
Sigma = 0.5*(V + V') + numcoeff*eye(numcoeff);
st = sqrt(diag(Sigma));
C = diag(1./st)*Sigma*diag(1./st);
ct = rand(numcoeff,1);
a = 10*rand;
b = 10*rand;
logPDF = @(params)priorMVTIG(params,ct,st,dof,C,a,b);

Create a custom joint prior model for the linear regression parameters. Specify the number of predictors p. Also, specify the function handle for priorMVTIG and the variable names.

PriorMdl = bayeslm(p,'ModelType','custom','LogPDF',logPDF,...
    'VarNames',prednames);

PriorMdl is a customblm Bayesian linear regression model object representing the prior distribution of the regression coefficients and disturbance variance.

Load the Nelson-Plosser data set. Create variables for the response and predictor series.

load Data_NelsonPlosser
X = DataTable{:,PriorMdl.VarNames(2:end)};
y = DataTable{:,"GNPR"};
Estimate the marginal posterior distributions of $\beta$ and $\sigma^2$ using the Hamiltonian Monte Carlo (HMC) sampler. Specify drawing 10,000 samples and a burn-in period of 1000 draws.

PosteriorMdl = estimate(PriorMdl,X,y,'Sampler','hmc','NumDraws',1e4,...
    'Burnin',1e3);

Method: MCMC sampling with 10000 draws
Number of observations: 62
Number of predictors:   3

           |    Mean       Std            CI95           Positive  Distribution
------------------------------------------------------------------------------
 Intercept |   -3.6344    5.6044  [-16.124,    6.211]     0.246     Empirical
 E         |   -0.0056    0.0006  [ -0.007,   -0.004]     0.000     Empirical
 WR        |   15.2548    0.7768  [ 13.711,   16.775]     1.000     Empirical
 Sigma2    | 1285.5647  240.9600  [896.743, 1833.793]     1.000     Empirical
PosteriorMdl is an empiricalblm model object storing the draws from the posterior distributions.

View a trace plot and an ACF plot of the draws from the posterior of $\beta_1$ (for example) and the disturbance variance. Do not plot the burn-in period.

figure;
subplot(2,1,1)
plot(PosteriorMdl.BetaDraws(2,1001:end));
title(['Trace Plot ' char(8212) ' \beta_1']);
xlabel('MCMC Draw')
ylabel('Simulation Index')
subplot(2,1,2)
autocorr(PosteriorMdl.BetaDraws(2,1001:end))

figure;
subplot(2,1,1)
plot(PosteriorMdl.Sigma2Draws(1001:end));
title(['Trace Plot ' char(8212) ' Disturbance Variance']);
xlabel('MCMC Draw')
ylabel('Simulation Index')
subplot(2,1,2)
autocorr(PosteriorMdl.Sigma2Draws(1001:end))
The MCMC sample of the disturbance variance appears to mix well.
Estimate Conditional Posterior Distributions

Consider the regression model in “Estimate Posterior Using Hamiltonian Monte Carlo Sampler” on page 12-564. This example uses the same data and context, but assumes a diffuse prior model instead.

Create a diffuse prior model for the linear regression parameters. Specify the number of predictors p and the names of the regression coefficients.

p = 3;
PriorMdl = bayeslm(p,'ModelType','diffuse','VarNames',["IPI" "E" "WR"])

PriorMdl =
  diffuseblm with properties:

    NumPredictors: 3
        Intercept: 1
         VarNames: {4x1 cell}

           | Mean  Std      CI95      Positive       Distribution
-----------------------------------------------------------------------------
 Intercept |  0    Inf  [ NaN,  NaN]   0.500    Proportional to one
 IPI       |  0    Inf  [ NaN,  NaN]   0.500    Proportional to one
 E         |  0    Inf  [ NaN,  NaN]   0.500    Proportional to one
 WR        |  0    Inf  [ NaN,  NaN]   0.500    Proportional to one
 Sigma2    |  Inf  Inf  [ NaN,  NaN]   1.000    Proportional to 1/Sigma2

PriorMdl is a diffuseblm model object.

Load the Nelson-Plosser data set. Create variables for the response and predictor series.

load Data_NelsonPlosser
X = DataTable{:,PriorMdl.VarNames(2:end)};
y = DataTable{:,'GNPR'};
Estimate the conditional posterior distribution of β given the data and that σ² = 2, and return the estimation summary table to access the estimates.

[Mdl,SummaryBeta] = estimate(PriorMdl,X,y,'Sigma2',2);

Method: Analytic posterior distributions
Conditional variable: Sigma2 fixed at 2
Number of observations: 62
Number of predictors:   4

           |   Mean      Std           CI95          Positive     Distribution
--------------------------------------------------------------------------------
 Intercept | -24.2536  1.8696  [-27.918, -20.589]     0.000    N (-24.25, 1.87^2)
 IPI       |   4.3913  0.0301  [  4.332,   4.450]     1.000    N (4.39, 0.03^2)
 E         |   0.0011  0.0001  [  0.001,   0.001]     1.000    N (0.00, 0.00^2)
 WR        |   2.4682  0.0743  [  2.323,   2.614]     1.000    N (2.47, 0.07^2)
 Sigma2    |   2       0       [  2.000,   2.000]     1.000    Fixed value
estimate displays a summary of the conditional posterior distribution of β. Because σ² is fixed at 2 during estimation, inferences on it are trivial.

Extract the mean vector and covariance matrix of the conditional posterior of β from the estimation summary table.

condPostMeanBeta = SummaryBeta.Mean(1:(end - 1))

condPostMeanBeta = 4×1

  -24.2536
    4.3913
    0.0011
    2.4682

CondPostCovBeta = SummaryBeta.Covariances(1:(end - 1),1:(end - 1))

CondPostCovBeta = 4×4

    3.4956    0.0350   -0.0001    0.0241
    0.0350    0.0009   -0.0000   -0.0013
   -0.0001   -0.0000    0.0000   -0.0000
    0.0241   -0.0013   -0.0000    0.0055
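Because the conditional posterior of each coefficient in this example is normal, the CI95 column can be cross-checked from the Mean and Std columns. The following is a minimal sketch of that check for the intercept. It uses only base MATLAB and the rounded values printed in the display, so agreement holds only to display precision. The variable names (postMean, postStd, z, ci95) are illustrative and are not part of the toolbox.

```matlab
% Values copied from the estimation display for the intercept (rounded).
postMean = -24.2536;
postStd  = 1.8696;

% 0.975 quantile of the standard normal, computed without toolbox functions:
% norminv(p) equals -sqrt(2)*erfcinv(2*p).
z = -sqrt(2)*erfcinv(2*0.975);

% Equitailed 95% interval of a normal distribution: mean -/+ z*std.
ci95 = postMean + z*postStd*[-1 1];
fprintf('[%.3f, %.3f]\n', ci95(1), ci95(2));
```

The result agrees with the displayed interval [-27.918, -20.589] to three decimals.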
Display Mdl.

Mdl

Mdl =
  diffuseblm with properties:

    NumPredictors: 3
        Intercept: 1
         VarNames: {4x1 cell}

           |  Mean  Std      CI95      Positive       Distribution
-----------------------------------------------------------------------------
 Intercept |   0    Inf  [ NaN, NaN]    0.500    Proportional to one
 IPI       |   0    Inf  [ NaN, NaN]    0.500    Proportional to one
 E         |   0    Inf  [ NaN, NaN]    0.500    Proportional to one
 WR        |   0    Inf  [ NaN, NaN]    0.500    Proportional to one
 Sigma2    |  Inf   Inf  [ NaN, NaN]    1.000    Proportional to 1/Sigma2
Because estimate computes the conditional posterior distribution, it returns the original prior model, not the posterior, in the first position of the output argument list.

Estimate the conditional posterior distribution of σ² given that β is condPostMeanBeta.

[~,SummarySigma2] = estimate(PriorMdl,X,y,'Beta',condPostMeanBeta);

Method: Analytic posterior distributions
Conditional variable: Beta fixed at -24.2536      4.3913  0.00112035     2.46823
Number of observations: 62
Number of predictors:   4

           |   Mean      Std           CI95          Positive     Distribution
--------------------------------------------------------------------------------
 Intercept | -24.2536  0       [-24.254, -24.254]     0.000    Fixed value
 IPI       |   4.3913  0       [  4.391,   4.391]     1.000    Fixed value
 E         |   0.0011  0       [  0.001,   0.001]     1.000    Fixed value
 WR        |   2.4682  0       [  2.468,   2.468]     1.000    Fixed value
 Sigma2    |  48.5138  9.0088  [ 33.984,  69.098]     1.000    IG(31.00, 0.00069)
estimate displays a summary of the conditional posterior distribution of σ². Because β is fixed to condPostMeanBeta during estimation, inferences on it are trivial.

Extract the mean and variance of the conditional posterior of σ² from the estimation summary table.

condPostMeanSigma2 = SummarySigma2.Mean(end)

condPostMeanSigma2 = 48.5138

CondPostVarSigma2 = SummarySigma2.Covariances(end,end)

CondPostVarSigma2 = 81.1581
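The reported mean, standard deviation, and IG(31.00, 0.00069) label can be checked against each other. The sketch below assumes that IG(A,B) denotes an inverse gamma distribution with shape A and scale B parameterized so that the mean is 1/(B(A − 1)) for A > 1 and the variance is mean²/(A − 2) for A > 2. That parameterization is an assumption of this sketch, and all inputs are the rounded values from the display, so the comparison holds only approximately.

```matlab
A        = 31;        % shape shown in the display, IG(31.00, 0.00069)
condMean = 48.5138;   % Mean column for Sigma2
condStd  = 9.0088;    % Std column for Sigma2

% Scale implied by mean = 1/(B*(A - 1)).
impliedB = 1/(condMean*(A - 1));

% Variance implied by variance = mean^2/(A - 2).
impliedVar = condMean^2/(A - 2);

fprintf('Implied B:        %.5f\n', impliedB);
fprintf('Implied variance: %.4f (Std^2 = %.4f)\n', impliedVar, condStd^2);
```

Under that assumption, the implied scale rounds to the displayed 0.00069, and the implied variance is close to CondPostVarSigma2.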
Access Estimates in Estimation Display

Consider the regression model in “Estimate Posterior Using Hamiltonian Monte Carlo Sampler” on page 12-564. This example uses the same data and context, but assumes a semiconjugate prior model instead.

Create a semiconjugate prior model for the linear regression parameters. Specify the number of predictors p and the names of the regression coefficients.

p = 3;
PriorMdl = bayeslm(p,'ModelType','semiconjugate',...
    'VarNames',["IPI" "E" "WR"]);
PriorMdl is a semiconjugateblm model object.

Load the Nelson-Plosser data set. Create variables for the response and predictor series.

load Data_NelsonPlosser
X = DataTable{:,PriorMdl.VarNames(2:end)};
y = DataTable{:,'GNPR'};
Estimate the marginal posterior distributions of β and σ².

rng(1); % For reproducibility
[PosteriorMdl,Summary] = estimate(PriorMdl,X,y);

Method: Gibbs sampling with 10000 draws
Number of observations: 62
Number of predictors:   4

           |   Mean      Std          CI95         Positive  Distribution
-------------------------------------------------------------------------
 Intercept | -23.9922  9.0520  [-41.734, -6.198]    0.005     Empirical
 IPI       |   4.3929  0.1458  [  4.101,  4.678]    1.000     Empirical
 E         |   0.0011  0.0003  [  0.000,  0.002]    0.999     Empirical
 WR        |   2.4711  0.3576  [  1.762,  3.178]    1.000     Empirical
 Sigma2    |  46.7474  8.4550  [ 33.099, 66.126]    1.000     Empirical
PosteriorMdl is an empiricalblm model object because marginal posterior distributions of semiconjugate models are analytically intractable, so estimate must implement a Gibbs sampler. Summary is a table containing the estimates and inferences that estimate displays at the command line.

Display the summary table.

Summary

Summary=5×6 table
                   Mean          Std                 CI95              Positive    Distribution
                 _________    __________    ________________________    ________    _____________
    Intercept      -23.992         9.052       -41.734       -6.1976      0.0053    {'Empirical'}
    IPI             4.3929       0.14578        4.1011        4.6782           1    {'Empirical'}
    E            0.0011124    0.00033976    0.00045128     0.0017883      0.9989    {'Empirical'}
    WR              2.4711        0.3576        1.7622        3.1781           1    {'Empirical'}
    Sigma2          46.747         8.455        33.099        66.126           1    {'Empirical'}
Access the 95% equitailed credible interval of the regression coefficient of IPI.

Summary.CI95(2,:)

ans = 1×2

    4.1011    4.6782
Input Arguments

PriorMdl — Bayesian linear regression model
conjugateblm model object | semiconjugateblm model object | diffuseblm model object | empiricalblm model object | customblm model object

Bayesian linear regression model representing a prior model, specified as an object in this table.

Model Object        Description
conjugateblm        Dependent, normal-inverse-gamma conjugate model returned by bayeslm or estimate
semiconjugateblm    Independent, normal-inverse-gamma semiconjugate model returned by bayeslm
diffuseblm          Diffuse prior model returned by bayeslm
empiricalblm        Prior model characterized by samples from prior distributions, returned by bayeslm or estimate
customblm           Prior distribution function that you declare, returned by bayeslm
PriorMdl can also represent a joint posterior model returned by estimate, either a conjugateblm or empiricalblm model object. In this case, estimate updates the joint posterior distribution using the new observations in X and y.

X — Predictor data
numeric matrix

Predictor data for the multiple linear regression model, specified as a numObservations-by-PriorMdl.NumPredictors numeric matrix. numObservations is the number of observations and must be equal to the length of y.
Data Types: double

y — Response data
numeric vector

Response data for the multiple linear regression model, specified as a numeric vector with numObservations elements.
Data Types: double

Name-Value Pair Arguments

Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter.

Before R2021a, use commas to separate each name and value, and enclose Name in quotes.

Example: 'Sigma2',2 specifies estimating the conditional posterior distribution of the regression coefficients given the data and that the specified disturbance variance is 2.

Options for All Prior Distributions
Display — Flag to display Bayesian estimator summary at command line
true (default) | false

Flag to display Bayesian estimator summary at the command line, specified as the comma-separated pair consisting of 'Display' and a value in this table.

Value    Description
true     estimate prints estimation information and a table summarizing the Bayesian estimators to the command line.
false    estimate does not print to the command line.
The estimation information includes the estimation method, fixed parameters, the number of observations, and the number of predictors. The summary table contains estimated posterior means and standard deviations (square root of the posterior variance), 95% equitailed credible intervals, the posterior probability that the parameter is greater than 0, and a description of the posterior distribution (if known).

If you specify one of Beta or Sigma2, then estimate includes your specification in the display, and corresponding posterior estimates are trivial.

Example: 'Display',false
Data Types: logical

Options for All Prior Distributions Except Empirical
Beta — Value of regression coefficients for conditional posterior distribution estimation of disturbance variance
empty array ([]) (default) | numeric column vector

Value of the regression coefficients for conditional posterior distribution estimation of the disturbance variance, specified as the comma-separated pair consisting of 'Beta' and a (PriorMdl.Intercept + PriorMdl.NumPredictors)-by-1 numeric vector. estimate estimates the characteristics of π(σ²|y,X,β = Beta), where y is y, X is X, and Beta is the value of 'Beta'. If PriorMdl.Intercept is true, then Beta(1) corresponds to the model intercept. All other values correspond to the predictor variables that compose the columns of X. Beta cannot contain any NaN values (that is, all coefficients must be known).

You cannot specify Beta and Sigma2 simultaneously.

By default, estimate does not compute characteristics of the conditional posterior of σ².

Example: 'Beta',1:3
Data Types: double

Sigma2 — Value of disturbance variance for conditional posterior distribution estimation of regression coefficients
empty array ([]) (default) | positive numeric scalar

Value of the disturbance variance for conditional posterior distribution estimation of the regression coefficients, specified as the comma-separated pair consisting of 'Sigma2' and a positive numeric scalar. estimate estimates characteristics of π(β|y,X,Sigma2), where y is y, X is X, and Sigma2 is the value of 'Sigma2'.

You cannot specify Sigma2 and Beta simultaneously.

By default, estimate does not compute characteristics of the conditional posterior of β.

Example: 'Sigma2',1
Data Types: double

Options for Semiconjugate, Empirical, and Custom Prior Distributions
NumDraws — Monte Carlo simulation adjusted sample size
1e5 (default) | positive integer

Monte Carlo simulation adjusted sample size, specified as the comma-separated pair consisting of 'NumDraws' and a positive integer. estimate actually draws BurnIn + NumDraws*Thin samples, but it bases the estimates on only NumDraws of them. For details on how estimate reduces the full Monte Carlo sample, see “Algorithms” on page 12-579.

If PriorMdl is a semiconjugateblm model and you specify Beta or Sigma2, then MATLAB ignores NumDraws.

Example: 'NumDraws',1e7
Data Types: double

Options for Semiconjugate and Custom Prior Distributions
BurnIn — Number of draws to remove from beginning of Monte Carlo sample
5000 (default) | nonnegative scalar

Number of draws to remove from the beginning of the Monte Carlo sample to reduce transient effects, specified as the comma-separated pair consisting of 'BurnIn' and a nonnegative scalar. For details on how estimate reduces the full Monte Carlo sample, see “Algorithms” on page 12-579.

Tip To help you specify the appropriate burn-in period size, determine the extent of the transient behavior in the Monte Carlo sample by specifying 'BurnIn',0, simulating a few thousand observations using simulate, and then plotting the paths.

Example: 'BurnIn',0
Data Types: double

Thin — Monte Carlo adjusted sample size multiplier
1 (default) | positive integer

Monte Carlo adjusted sample size multiplier, specified as the comma-separated pair consisting of 'Thin' and a positive integer.

The actual Monte Carlo sample size is BurnIn + NumDraws*Thin. After discarding the burn-in, estimate discards every Thin – 1 draws, and then retains the next. For details on how estimate reduces the full Monte Carlo sample, see “Algorithms” on page 12-579.

Tip To reduce potential large serial correlation in the Monte Carlo sample, or to reduce the memory consumption of the draws stored in PosteriorMdl, specify a large value for Thin.

Example: 'Thin',5
Data Types: double

BetaStart — Starting values of regression coefficients for MCMC sample
numeric column vector

Starting values of the regression coefficients for the Markov chain Monte Carlo (MCMC) sample, specified as the comma-separated pair consisting of 'BetaStart' and a numeric column vector with (PriorMdl.Intercept + PriorMdl.NumPredictors) elements. By default, BetaStart is the ordinary least-squares (OLS) estimate.

Tip A good practice is to run estimate multiple times using different parameter starting values. Verify that the solutions from each run converge to similar values.

Example: 'BetaStart',[1; 2; 3]
Data Types: double

Sigma2Start — Starting values of disturbance variance for MCMC sample
positive numeric scalar

Starting values of the disturbance variance for the MCMC sample, specified as the comma-separated pair consisting of 'Sigma2Start' and a positive numeric scalar. By default, Sigma2Start is the OLS residual mean squared error.

Tip A good practice is to run estimate multiple times using different parameter starting values. Verify that the solutions from each run converge to similar values.

Example: 'Sigma2Start',4
Data Types: double

Options for Custom Prior Distributions
Reparameterize — Reparameterization of σ² as log(σ²)
false (default) | true

Reparameterization of σ² as log(σ²) during posterior estimation and simulation, specified as the comma-separated pair consisting of 'Reparameterize' and a value in this table.

Value    Description
false    estimate does not reparameterize σ².
true     estimate reparameterizes σ² as log(σ²). estimate converts results back to the original scale and does not change the functional form of PriorMdl.LogPDF.
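The reparameterization is an ordinary change of variables. As a reminder of the standard identity involved (general probability theory, not a description of the internal implementation of estimate), a density on σ² maps to a density on θ = log(σ²) through the Jacobian of the transformation:

```latex
\theta = \log \sigma^2, \qquad
\pi_{\theta}(\theta \mid y, x) = \pi_{\sigma^2}\!\left(e^{\theta} \mid y, x\right) e^{\theta},
\qquad \theta \in (-\infty, \infty).
```

Because θ is unrestricted, a sampler working on θ never proposes values across the boundary at σ² = 0, and exponentiating the draws of θ returns them to the original scale.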
Tip If you experience numeric instabilities during the posterior estimation or simulation of σ², then specify 'Reparameterize',true.

Example: 'Reparameterize',true
Data Types: logical

Sampler — MCMC sampler
'slice' (default) | 'metropolis' | 'hmc'

MCMC sampler, specified as the comma-separated pair consisting of 'Sampler' and a value in this table.

Value           Description
'slice'         Slice sampler
'metropolis'    Random walk Metropolis sampler
'hmc'           Hamiltonian Monte Carlo (HMC) sampler
Tip
• To increase the quality of the MCMC draws, tune the sampler.
  1 Before calling estimate, specify the tuning parameters and their values by using sampleroptions. For example, to specify the slice sampler width width, use:
    options = sampleroptions('Sampler',"slice",'Width',width);
  2 Specify the object containing the tuning parameter specifications returned by sampleroptions by using the 'Options' name-value pair argument. For example, to use the tuning parameter specifications in options, specify:
    'Options',options
• If you specify the HMC sampler, then a best practice is to provide the gradient for at least some variables. estimate resorts to numerical computation of any missing partial derivatives (NaN values) in the gradient vector.
Example: 'Sampler',"hmc"
Data Types: string

Options — Sampler options
[] (default) | structure array

Sampler options, specified as the comma-separated pair consisting of 'Options' and a structure array returned by sampleroptions. Use 'Options' to specify the MCMC sampler and its tuning-parameter values.

Example: 'Options',sampleroptions('Sampler',"hmc")
Data Types: struct

Width — Typical sampling-interval width
positive numeric scalar | numeric vector of positive values

Typical sampling-interval width around the current value in the marginal distributions for the slice sampler, specified as the comma-separated pair consisting of 'Width' and a positive numeric scalar or a (PriorMdl.Intercept + PriorMdl.NumPredictors + 1)-by-1 numeric vector of positive values. The first element corresponds to the model intercept, if one exists in the model. The next PriorMdl.NumPredictors elements correspond to the coefficients of the predictor variables ordered by the predictor data columns. The last element corresponds to the model variance.

• If Width is a scalar, then estimate applies Width to all PriorMdl.NumPredictors + PriorMdl.Intercept + 1 marginal distributions.
• If Width is a numeric vector, then estimate applies the first element to the intercept (if one exists), the next PriorMdl.NumPredictors elements to the regression coefficients corresponding to the predictor variables in X, and the last element to the disturbance variance.
• If the sample size (size(X,1)) is less than 100, then Width is 10 by default.
• If the sample size is at least 100, then estimate sets Width to the vector of corresponding posterior standard deviations by default, assuming a diffuse prior model (diffuseblm).

The typical width of the slice sampler does not affect convergence of the MCMC sample. It does affect the number of required function evaluations, that is, the efficiency of the algorithm. If the width is too small, then the algorithm can implement an excessive number of function evaluations to determine the appropriate sampling width. If the width is too large, then the algorithm might have to decrease the width to an appropriate size, which requires function evaluations.

estimate sends Width to the slicesample function. For more details, see slicesample.

Tip
• For maximum flexibility, specify the slice sampler width width by using the 'Options' name-value pair argument. For example:
  'Options',sampleroptions('Sampler',"slice",'Width',width)

Example: 'Width',[100*ones(3,1);10]
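The element ordering described for Width can be pictured with a short base-MATLAB sketch. It assumes a hypothetical model with an intercept and two predictors, so that there are four marginal distributions in total. The variable names are illustrative, and the expansion of a scalar into a vector only mirrors the description above; it is not a statement about how estimate is implemented.

```matlab
numPredictors = 2;      % hypothetical model size
hasIntercept  = true;   % plays the role of PriorMdl.Intercept
numParams = hasIntercept + numPredictors + 1;   % coefficients plus Sigma2

% Scalar specification: the same width applies to every marginal.
width = 10;
if isscalar(width)
    widthVec = repmat(width,numParams,1);
else
    widthVec = width(:);
end

% Vector specification in the style of 'Width',[100*ones(3,1);10]:
% one width per coefficient (intercept first), then one for Sigma2.
widthByParam = [100*ones(hasIntercept + numPredictors,1); 10];
```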
Output Arguments

PosteriorMdl — Bayesian linear regression model storing distribution characteristics
conjugateblm model object | semiconjugateblm model object | diffuseblm model object | empiricalblm model object | customblm model object

Bayesian linear regression model storing distribution characteristics, returned as a conjugateblm, semiconjugateblm, diffuseblm, empiricalblm, or customblm model object.

• If you do not specify either Beta or Sigma2 (their values are []), then estimate updates the prior model using the data likelihood to form the posterior distribution. PosteriorMdl characterizes the posterior distribution. Its object type depends on the prior model type (PriorMdl).

  Model Object    PriorMdl
  conjugateblm    conjugateblm or diffuseblm
  empiricalblm    semiconjugateblm, empiricalblm, or customblm
• If you specify either Beta or Sigma2, then PosteriorMdl equals PriorMdl (the two models are the same object storing the same property values). estimate does not update the prior model to form the posterior model. However, Summary stores the conditional posterior estimates.

For more details on the display of PosteriorMdl, see Summary.

For details on supported posterior distributions that are analytically tractable, see “Analytically Tractable Posteriors” on page 6-5.

Summary — Summary of Bayesian estimators
table

Summary of Bayesian estimators, returned as a table. Summary contains the same information as the display of the estimation summary (Display). Rows correspond to parameters, and columns correspond to these posterior characteristics for each parameter:

• Mean – Posterior mean
• Std – Posterior standard deviation
• CI95 – 95% equitailed credible interval
• Positive – The posterior probability that the parameter is greater than 0
• Distribution – Description of the marginal or conditional posterior distribution of the parameter, when known
• Covariances – Estimated covariance matrix of the coefficients and disturbance variance

Row names are the names in PriorMdl.VarNames. The name of the last row is Sigma2.

Alternatively, pass PosteriorMdl to summarize to obtain a summary of Bayesian estimators.
Limitations

If PriorMdl is an empiricalblm model object, then you cannot specify Beta or Sigma2. That is, you cannot estimate conditional posterior distributions by using an empirical prior distribution.
More About

Bayesian Linear Regression Model

A Bayesian linear regression model treats the parameters β and σ² in the multiple linear regression (MLR) model y_t = x_tβ + ε_t as random variables.

For times t = 1,...,T:

• y_t is the observed response.
• x_t is a 1-by-(p + 1) row vector of observed values of p predictors. To accommodate a model intercept, x_1t = 1 for all t.
• β is a (p + 1)-by-1 column vector of regression coefficients corresponding to the variables that compose the columns of x_t.
• ε_t is the random disturbance with a mean of zero and Cov(ε) = σ²I_{T×T}, while ε is a T-by-1 vector containing all disturbances.

These assumptions imply that the data likelihood is

$$\ell\left(\beta,\sigma^2 \mid y,x\right) = \prod_{t=1}^{T} \phi\left(y_t;\, x_t\beta,\, \sigma^2\right).$$

ϕ(y_t; x_tβ, σ²) is the Gaussian probability density with mean x_tβ and variance σ² evaluated at y_t.

Before considering the data, you impose a joint prior distribution assumption on (β,σ²). In a Bayesian analysis, you update the distribution of the parameters by using information about the parameters obtained from the likelihood of the data. The result is the joint posterior distribution of (β,σ²) or the conditional posterior distributions of the parameters.
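To make the likelihood concrete, the following base-MATLAB sketch evaluates its logarithm at one value of (β,σ²) for synthetic data. The data, dimensions, and variable names are all hypothetical and chosen only for illustration. The sketch computes the log-likelihood in closed form and again as a sum of per-observation Gaussian log densities, which must agree because the likelihood is a product over t.

```matlab
rng(1);                          % reproducible synthetic data
T = 50;                          % number of observations (hypothetical)
X = [ones(T,1) randn(T,2)];      % first column of ones carries the intercept
beta   = [1; 2; -0.5];           % (p + 1)-by-1 coefficient vector, p = 2
sigma2 = 0.25;                   % disturbance variance
y = X*beta + sqrt(sigma2)*randn(T,1);

% Log of the product of Gaussian densities phi(y_t; x_t*beta, sigma2).
resid  = y - X*beta;
logLik = -0.5*T*log(2*pi*sigma2) - (resid'*resid)/(2*sigma2);

% Same quantity accumulated one observation at a time.
perObs = -0.5*log(2*pi*sigma2) - resid.^2/(2*sigma2);
fprintf('logLik = %.6f, sum of terms = %.6f\n', logLik, sum(perObs));
```

Working on the log scale avoids the numerical underflow that the raw product of T small densities would cause.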
Tips

• Monte Carlo simulation is subject to variation. If estimate uses Monte Carlo simulation, then estimates and inferences might vary when you call estimate multiple times under seemingly equivalent conditions. To reproduce estimation results, set a random number seed by using rng before calling estimate.
• If estimate issues an error while estimating the posterior distribution using a custom prior model, then try adjusting initial parameter values by using BetaStart or Sigma2Start, or try adjusting the declared log prior function, and then reconstructing the model. The error might indicate that the log of the prior distribution is –Inf at the specified initial values.
Algorithms

• Whenever the prior distribution (PriorMdl) and the data likelihood yield an analytically tractable posterior distribution, estimate evaluates the closed-form solutions to Bayes estimators. Otherwise, estimate resorts to Monte Carlo simulation to estimate parameters and draw inferences. For more details, see “Posterior Estimation and Inference” on page 6-4.
• This figure illustrates how estimate reduces the Monte Carlo sample using the values of NumDraws, Thin, and BurnIn. Rectangles represent successive draws from the distribution. estimate removes the white rectangles from the Monte Carlo sample. The remaining NumDraws black rectangles compose the Monte Carlo sample.
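The reduction of the Monte Carlo sample can also be sketched as index arithmetic. The example below uses deliberately small, hypothetical values of NumDraws, Thin, and BurnIn, and a vector of integers stands in for the successive draws. It follows the rule stated for Thin (after the burn-in, discard Thin – 1 draws and retain the next); it is an illustration of that rule, not the code that estimate runs.

```matlab
numDraws = 4;    % draws to keep (small values for illustration)
thin     = 3;    % keep one draw out of every 'thin'
burnIn   = 5;    % initial draws to drop

totalDraws = burnIn + numDraws*thin;     % size of the full sample
fullSample = 1:totalDraws;               % stand-in for the successive draws

afterBurnIn = fullSample(burnIn+1:end);  % drop the burn-in
kept = afterBurnIn(thin:thin:end);       % skip thin-1 draws, keep the next
disp(kept)
```

With these values, the full sample has 17 draws and the retained draws are numbers 8, 11, 14, and 17, which is exactly NumDraws draws.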
Version History

Introduced in R2017a
estimate returns only an estimated model object and estimation summary
Errors starting in R2019b

For a simpler interface, estimate returns only an estimated model and an estimation summary table. Now, the supported syntaxes are:

PosteriorMdl = estimate(...);
[PosteriorMdl,Summary] = estimate(...);

You can obtain estimated posterior means and covariances, based on the marginal or conditional distributions, from the estimation summary table.

In past releases, estimate returned these output arguments:

[PosteriorMdl,estBeta,EstBetaCov,estSigma2,estSigma2Var,Summary] = estimate(...);

estBeta, EstBetaCov, estSigma2, and estSigma2Var are posterior means and covariances of β and σ².

Starting in R2019b, if you request any output argument in a position greater than the second position, estimate issues this error:

Too many output arguments.
For details on how to update your code, see “Replacing Removed Syntaxes of estimate” on page 6-73.
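As a rough illustration of the update, the quantities that the removed outputs carried can be read from the Mean and Covariances variables of the summary table, using the same indexing as the conditional-posterior example earlier on this page. To keep the sketch self-contained, it builds a stand-in table from the rounded values of that example (σ² fixed at 2) instead of calling estimate. The stand-in table contains only two of the six summary variables and is an assumption of this sketch.

```matlab
% Stand-in for the Summary table, filled with rounded values from the
% conditional-posterior example (Sigma2 fixed at 2). In practice, obtain it
% with [~,Summary] = estimate(PriorMdl,X,y,...).
Mean = [-24.2536; 4.3913; 0.0011; 2.4682; 2];
Covariances = zeros(5);
Covariances(1:4,1:4) = [ 3.4956  0.0350 -0.0001  0.0241
                         0.0350  0.0009 -0.0000 -0.0013
                        -0.0001 -0.0000  0.0000 -0.0000
                         0.0241 -0.0013 -0.0000  0.0055];
Summary = table(Mean,Covariances, ...
    'RowNames',{'Intercept','IPI','E','WR','Sigma2'});

% Quantities that the removed output arguments used to return.
estBeta      = Summary.Mean(1:end-1);                   % means of beta
EstBetaCov   = Summary.Covariances(1:end-1,1:end-1);    % covariance of beta
estSigma2    = Summary.Mean(end);                       % mean of sigma2
estSigma2Var = Summary.Covariances(end,end);            % variance of sigma2
```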
See Also

Objects
conjugateblm | semiconjugateblm | diffuseblm | empiricalblm | customblm

Functions
bayeslm | forecast | simulate | summarize | plot | sampleroptions

Topics
“Replacing Removed Syntaxes of estimate” on page 6-73
“Bayesian Linear Regression” on page 6-2
“Implement Bayesian Linear Regression” on page 6-10
“Specify Gradient for HMC Sampler” on page 6-18
“Tune Slice Sampler for Posterior Estimation” on page 6-36
estimate

Perform predictor variable selection for Bayesian linear regression models
Syntax

PosteriorMdl = estimate(PriorMdl,X,y)
PosteriorMdl = estimate(PriorMdl,X,y,Name,Value)
[PosteriorMdl,Summary] = estimate( ___ )
Description

To estimate the posterior distribution of a standard Bayesian linear regression model, see estimate.

PosteriorMdl = estimate(PriorMdl,X,y) returns the model that characterizes the joint posterior distributions of β and σ² of a Bayesian linear regression model (see page 12-595). estimate also performs predictor variable selection.

PriorMdl specifies the joint prior distribution of the parameters, the structure of the linear regression model, and the variable selection algorithm. X is the predictor data and y is the response data. PriorMdl and PosteriorMdl are not the same object type.

To produce PosteriorMdl, estimate updates the prior distribution with information about the parameters that it obtains from the data.

NaNs in the data indicate missing values, which estimate removes using list-wise deletion.

PosteriorMdl = estimate(PriorMdl,X,y,Name,Value) uses additional options specified by one or more name-value pair arguments. For example, 'Lambda',0.5 specifies that the shrinkage parameter value for Bayesian lasso regression is 0.5 for all coefficients except the intercept.

If you specify Beta or Sigma2, then PosteriorMdl and PriorMdl are equal.

[PosteriorMdl,Summary] = estimate( ___ ) uses any of the input argument combinations in the previous syntaxes and also returns a table that includes the following for each parameter: posterior estimates, standard errors, 95% credible intervals, and posterior probability that the parameter is greater than 0.
Examples

Select Variables Using Bayesian Lasso Regression

Consider the multiple linear regression model that predicts US real gross national product (GNPR) using a linear combination of industrial production index (IPI), total employment (E), and real wages (WR).

GNPR_t = β_0 + β_1 IPI_t + β_2 E_t + β_3 WR_t + ε_t.

For all t, ε_t is a series of independent Gaussian disturbances with a mean of 0 and variance σ².

Assume the prior distributions are:

• For k = 0,...,3, β_k|σ² has a Laplace distribution with a mean of 0 and a scale of σ²/λ, where λ is the shrinkage parameter. The coefficients are conditionally independent.
• σ² ∼ IG(A,B). A and B are the shape and scale, respectively, of an inverse gamma distribution.

Create a prior model for Bayesian lasso regression. Specify the number of predictors, the prior model type, and variable names. Specify these shrinkages:

• 0.01 for the intercept
• 10 for IPI and WR
• 1e5 for E because it has a scale that is several orders of magnitude larger than the other variables

The order of the shrinkages follows the order of the specified variable names, but the first element is the shrinkage of the intercept.

p = 3;
PriorMdl = bayeslm(p,'ModelType','lasso','Lambda',[0.01; 10; 1e5; 10],...
    'VarNames',["IPI" "E" "WR"]);
PriorMdl is a lassoblm Bayesian linear regression model object representing the prior distribution of the regression coefficients and disturbance variance.

Load the Nelson-Plosser data set. Create variables for the response and predictor series.

load Data_NelsonPlosser
X = DataTable{:,PriorMdl.VarNames(2:end)};
y = DataTable{:,"GNPR"};
Perform Bayesian lasso regression by passing the prior model and data to estimate, that is, by estimating the posterior distribution of β and σ². Bayesian lasso regression uses Markov chain Monte Carlo (MCMC) to sample from the posterior. For reproducibility, set a random seed.

rng(1);
PosteriorMdl = estimate(PriorMdl,X,y);

Method: lasso MCMC sampling with 10000 draws
Number of observations: 62
Number of predictors:   4

           |   Mean      Std          CI95         Positive  Distribution
-------------------------------------------------------------------------
 Intercept |  -1.3472   6.8160  [-15.169, 11.590]    0.427     Empirical
 IPI       |   4.4755   0.1646  [  4.157,  4.799]    1.000     Empirical
 E         |   0.0001   0.0002  [ -0.000,  0.000]    0.796     Empirical
 WR        |   3.1610   0.3136  [  2.538,  3.760]    1.000     Empirical
 Sigma2    |  60.1452  11.1180  [ 42.319, 85.085]    1.000     Empirical

PosteriorMdl is an empiricalblm model object that stores draws from the posterior distributions of β and σ² given the data. estimate displays a summary of the marginal posterior distributions at the MATLAB® command line. Rows of the summary correspond to regression coefficients and the disturbance variance, and columns correspond to characteristics of the posterior distribution. The characteristics include:

• CI95, which contains the 95% Bayesian equitailed credible intervals for the parameters. For example, the posterior probability that the regression coefficient of IPI is in [4.157, 4.799] is 0.95.
• Positive, which contains the posterior probability that the parameter is greater than 0. For example, the probability that the intercept is greater than 0 is 0.427.

Plot the posterior distributions.

plot(PosteriorMdl)
Given the shrinkages, the distribution of E is fairly dense around 0. Therefore, E might not be an important predictor.

By default, estimate draws and discards a burn-in sample of size 5000. However, a good practice is to inspect a trace plot of the draws for adequate mixing and lack of transience. Plot a trace plot of the draws for each parameter. You can access the draws that compose the distribution (the properties BetaDraws and Sigma2Draws) using dot notation.

figure;
for j = 1:(p + 1)
    subplot(2,2,j);
    plot(PosteriorMdl.BetaDraws(j,:));
    title(sprintf('%s',PosteriorMdl.VarNames{j}));
end

figure;
plot(PosteriorMdl.Sigma2Draws);
title('Sigma2');
The trace plots indicate that the draws seem to mix well. The plots show no detectable transience or serial correlation, and the draws do not jump between states.
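When the posterior is stored as draws, summaries such as CI95 and Positive are functions of those draws. The sketch below shows one simple way to compute comparable quantities with base MATLAB, using order statistics for the interval bounds. The draws are synthetic stand-ins for one row of BetaDraws, and the exact method that estimate uses for these summaries is not stated here, so treat the computation as an assumption.

```matlab
rng(1);                               % reproducible synthetic draws
draws = 4.4 + 0.15*randn(1,10000);    % stand-in for one row of BetaDraws

sorted = sort(draws);
n = numel(sorted);

% Equitailed bounds: leave about 2.5% of the draws in each tail.
ci95 = [sorted(round(0.025*n)) sorted(round(0.975*n))];

% Share of draws above 0, an estimate of the Positive column.
probPositive = mean(draws > 0);

% Share of draws inside the interval, about 0.95 by construction.
coverage = mean(draws >= ci95(1) & draws <= ci95(2));
```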
Select Variables Using SSVS

Consider the regression model in “Select Variables Using Bayesian Lasso Regression” on page 12-581.

Create a prior model for performing stochastic search variable selection (SSVS). Assume that β and σ² are dependent (a conjugate mixture model). Specify the number of predictors p and the names of the regression coefficients.

p = 3;
PriorMdl = mixconjugateblm(p,'VarNames',["IPI" "E" "WR"]);
Load the Nelson-Plosser data set. Create variables for the response and predictor series.

load Data_NelsonPlosser
X = DataTable{:,PriorMdl.VarNames(2:end)};
y = DataTable{:,'GNPR'};
Implement SSVS by estimating the marginal posterior distributions of β and σ². Because SSVS uses Markov chain Monte Carlo for estimation, set a random number seed to reproduce the results.

rng(1);
PosteriorMdl = estimate(PriorMdl,X,y);

Method: MCMC sampling with 10000 draws
Number of observations: 62
Number of predictors:   4

           |   Mean      Std          CI95         Positive  Distribution  Regime
----------------------------------------------------------------------------------
 Intercept | -18.8333  10.1851  [-36.965,  0.716]    0.037     Empirical   0.8806
 IPI       |   4.4554   0.1543  [  4.165,  4.764]    1.000     Empirical   0.4545
 E         |   0.0010   0.0004  [  0.000,  0.002]    0.997     Empirical   0.0925
 WR        |   2.4686   0.3615  [  1.766,  3.197]    1.000     Empirical   0.1734
 Sigma2    |  47.7557   8.6551  [ 33.858, 66.875]    1.000     Empirical    NaN
PosteriorMdl is an empiricalblm model object that stores draws from the posterior distributions of β and σ² given the data. estimate displays a summary of the marginal posterior distributions at the command line. Rows of the summary correspond to regression coefficients and the disturbance variance, and columns correspond to characteristics of the posterior distribution. The characteristics include:

• CI95, which contains the 95% Bayesian equitailed credible intervals for the parameters. For example, the posterior probability that the regression coefficient of E (standardized) is in [0.000, 0.002] is 0.95.
• Regime, which contains the marginal posterior probability of variable inclusion (γ = 1 for a variable). For example, the posterior probability that E should be included in the model is 0.0925.

Assuming that variables with Regime < 0.1 should be removed from the model, the results suggest that you can exclude total employment (E) from the model.

By default, estimate draws and discards a burn-in sample of size 5000. However, a good practice is to inspect a trace plot of the draws for adequate mixing and lack of transience. Plot a trace plot of the draws for each parameter. You can access the draws that compose the distribution (the properties BetaDraws and Sigma2Draws) using dot notation.

figure;
for j = 1:(p + 1)
    subplot(2,2,j);
    plot(PosteriorMdl.BetaDraws(j,:));
    title(sprintf('%s',PosteriorMdl.VarNames{j}));
end

figure;
plot(PosteriorMdl.Sigma2Draws);
title('Sigma2');
The trace plots indicate that the draws seem to mix well. The plots show no detectable transience or serial correlation, and the draws do not jump between states.
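A visual check can be complemented with a rough numeric one. The following is a minimal sketch, not part of the toolbox workflow: it compares the first and second halves of a chain, because a transient start tends to shift the mean of the early draws. The vector draws is a synthetic stand-in (an assumption) for one row of PosteriorMdl.BetaDraws, and the names meanShift and spreadRatio are illustrative only.

```matlab
% Sketch: split-half comparison of one chain of posterior draws.
% "draws" is a synthetic stand-in for PosteriorMdl.BetaDraws(j,:) (assumption).
rng(1);                                   % for reproducibility
draws = 4.45 + 0.15*randn(1,10000);

half = floor(numel(draws)/2);
firstHalf = draws(1:half);
secondHalf = draws(half+1:end);

% Difference of the half-sample means, in units of the overall spread.
meanShift = abs(mean(firstHalf) - mean(secondHalf))/std(draws);

% Ratio of the half-sample spreads; values near 1 suggest a stable chain.
spreadRatio = std(firstHalf)/std(secondHalf);

fprintf('Standardized mean shift: %.4f\n',meanShift);
fprintf('Spread ratio:            %.4f\n',spreadRatio);
```

A large mean shift or a spread ratio far from 1 would suggest inspecting the trace plot again or lengthening the burn-in period. This check is informal and does not replace the plots.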
Estimate Conditional Posterior Distributions
Consider the regression model and prior distribution in “Select Variables Using Bayesian Lasso Regression” on page 12-581. Create a Bayesian lasso regression prior model for 3 predictors and specify variable names. Specify the shrinkage values 0.01, 10, 1e5, and 10 for the intercept and the coefficients of IPI, E, and WR, respectively.
p = 3;
PriorMdl = bayeslm(p,'ModelType','lasso','VarNames',["IPI" "E" "WR"],...
    'Lambda',[0.01; 10; 1e5; 10]);
Load the Nelson-Plosser data set. Create variables for the response and predictor series.
load Data_NelsonPlosser
X = DataTable{:,PriorMdl.VarNames(2:end)};
y = DataTable{:,"GNPR"};
Estimate the conditional posterior distribution of β given the data and that σ2 = 10, and return the estimation summary table to access the estimates.
rng(1); % For reproducibility
[Mdl,SummaryBeta] = estimate(PriorMdl,X,y,'Sigma2',10);

Method: lasso MCMC sampling with 10000 draws
Conditional variable: Sigma2 fixed at 10
Number of observations: 62
Number of predictors:   4

           |   Mean     Std          CI95         Positive  Distribution
--------------------------------------------------------------------------
 Intercept | -8.0643  4.1992  [-16.384,  0.018]    0.025     Empirical
 IPI       |  4.4454  0.0679  [  4.312,  4.578]    1.000     Empirical
 E         |  0.0004  0.0002  [  0.000,  0.001]    0.999     Empirical
 WR        |  2.9792  0.1672  [  2.651,  3.305]    1.000     Empirical
 Sigma2    | 10       0       [ 10.000, 10.000]    1.000     Empirical
estimate displays a summary of the conditional posterior distribution of β. Because σ2 is fixed at 10 during estimation, inferences on it are trivial. Display Mdl.
Mdl

Mdl =
  lassoblm with properties:

    NumPredictors: 3
        Intercept: 1
         VarNames: {4x1 cell}
           Lambda: [4x1 double]
                A: 3
                B: 1

           |  Mean     Std            CI95           Positive   Distribution
------------------------------------------------------------------------------
 Intercept |  0      100      [-200.000, 200.000]     0.500     Scale mixture
 IPI       |  0        0.1000 [  -0.200,   0.200]     0.500     Scale mixture
 E         |  0        0.0000 [  -0.000,   0.000]     0.500     Scale mixture
 WR        |  0        0.1000 [  -0.200,   0.200]     0.500     Scale mixture
 Sigma2    |  0.5000   0.5000 [   0.138,   1.616]     1.000     IG(3.00, 1)
Because estimate computes the conditional posterior distribution, it returns the model input PriorMdl, not the conditional posterior, in the first position of the output argument list. Display the estimation summary table.
SummaryBeta

SummaryBeta=5×6 table
                    Mean          Std                  CI95              Positive    Distribution
                 __________    __________    ________________________    ________    _____________
    Intercept       -8.0643        4.1992       -16.384       0.01837      0.0254    {'Empirical'}
    IPI              4.4454      0.067949         4.312        4.5783           1    {'Empirical'}
    E            0.00039896    0.00015673    9.4925e-05    0.00070697      0.9987    {'Empirical'}
    WR               2.9792       0.16716        2.6506        3.3046           1    {'Empirical'}
    Sigma2               10             0            10            10           1    {'Empirical'}
SummaryBeta contains the conditional posterior estimates. Estimate the conditional posterior distribution of σ2 given that β is the conditional posterior mean of β | σ2, X, y (stored in SummaryBeta.Mean(1:(end - 1))). Return the estimation summary table.
condPostMeanBeta = SummaryBeta.Mean(1:(end - 1));
[~,SummarySigma2] = estimate(PriorMdl,X,y,'Beta',condPostMeanBeta);

Method: lasso MCMC sampling with 10000 draws
Conditional variable: Beta fixed at -8.0643   4.4454   0.00039896   2.9792
Number of observations: 62
Number of predictors:   4

           |   Mean      Std          CI95        Positive  Distribution
--------------------------------------------------------------------------
 Intercept | -8.0643   0.0000  [-8.064, -8.064]    0.000     Empirical
 IPI       |  4.4454   0.0000  [ 4.445,  4.445]    1.000     Empirical
 E         |  0.0004   0.0000  [ 0.000,  0.000]    1.000     Empirical
 WR        |  2.9792   0.0000  [ 2.979,  2.979]    1.000     Empirical
 Sigma2    | 56.8314  10.2921  [39.947, 79.731]    1.000     Empirical
estimate displays an estimation summary of the conditional posterior distribution of σ2 given the data and that β is condPostMeanBeta. In the display, inferences on β are trivial.
Access Estimates in Estimation Summary Display
Consider the regression model in “Select Variables Using Bayesian Lasso Regression” on page 12-581. Create a prior model for performing SSVS. Assume that β and σ2 are dependent (a conjugate mixture model). Specify the number of predictors p and the names of the regression coefficients.
p = 3;
PriorMdl = mixconjugateblm(p,'VarNames',["IPI" "E" "WR"]);
Load the Nelson-Plosser data set. Create variables for the response and predictor series.
load Data_NelsonPlosser
X = DataTable{:,PriorMdl.VarNames(2:end)};
y = DataTable{:,'GNPR'};
Implement SSVS by estimating the marginal posterior distributions of β and σ2. Because SSVS uses Markov chain Monte Carlo for estimation, set a random number seed to reproduce the results. Suppress the estimation display, but return the estimation summary table.
rng(1);
[PosteriorMdl,Summary] = estimate(PriorMdl,X,y,'Display',false);
PosteriorMdl is an empiricalblm model object that stores draws from the posterior distributions of β and σ2 given the data. Summary is a table with columns corresponding to posterior characteristics and rows corresponding to the coefficients (PosteriorMdl.VarNames) and disturbance variance (Sigma2). Display the estimated parameter covariance matrix (Covariances) and proportion of times the algorithm includes each predictor (Regime).
Covariances = Summary(:,"Covariances")

Covariances=5×1 table
                                          Covariances
                 ______________________________________________________________________
    Intercept        103.74         1.0486     -0.0031629         0.6791         7.3916
    IPI              1.0486       0.023815    -1.3637e-05      -0.030387        0.06611
    E            -0.0031629    -1.3637e-05     1.3481e-07    -8.8792e-05    -0.00025044
    WR               0.6791      -0.030387    -8.8792e-05        0.13066       0.089039
    Sigma2           7.3916        0.06611    -0.00025044       0.089039         74.911
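The covariance matrix is easier to interpret once converted to standard deviations and correlations. The following sketch retypes the displayed values into a plain matrix (in a live session, Summary.Covariances would presumably supply the same matrix, which is an assumption here) and uses the hypothetical names postStd and Corr.

```matlab
% Sketch: convert the displayed posterior covariance matrix into standard
% deviations and a correlation matrix. Values are copied from the display.
names = {'Intercept','IPI','E','WR','Sigma2'};
Cov = [ 103.74       1.0486     -0.0031629    0.6791       7.3916
        1.0486       0.023815   -1.3637e-05  -0.030387     0.06611
       -0.0031629   -1.3637e-05  1.3481e-07  -8.8792e-05  -0.00025044
        0.6791      -0.030387   -8.8792e-05   0.13066      0.089039
        7.3916       0.06611    -0.00025044   0.089039    74.911 ];

postStd = sqrt(diag(Cov));          % posterior standard deviations
Corr = Cov./(postStd*postStd');     % elementwise scaling to correlations

for j = 1:numel(names)
    fprintf('%-9s  Std = %.4g\n',names{j},postStd(j));
end
```

The square roots of the diagonal agree with the Std column of the estimation display (for example, about 10.185 for the intercept), which is a quick consistency check on the summary table.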
Regime = Summary(:,"Regime")

Regime=5×1 table
                 Regime
                 ______
    Intercept    0.8806
    IPI          0.4545
    E            0.0925
    WR           0.1734
    Sigma2          NaN
Regime contains the marginal posterior probability of variable inclusion (γ = 1 for a variable). For example, the posterior probability that E should be included in the model is 0.0925. Assuming that variables with Regime < 0.1 should be removed from the model, the results suggest that you can exclude the unemployment rate from the model.
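The threshold rule can be applied programmatically. The sketch below rebuilds a small stand-in for the Regime column from the displayed values (so that it runs without the toolbox) and flags the rows whose inclusion probability falls below 0.1. The 0.1 cutoff is the working assumption stated above, not a toolbox default, and the variable names are illustrative.

```matlab
% Sketch: flag variables whose posterior inclusion probability is below a
% user-chosen threshold. The table is a stand-in built from the display.
Regime = [0.8806; 0.4545; 0.0925; 0.1734; NaN];
Summary = table(Regime,'RowNames',{'Intercept','IPI','E','WR','Sigma2'});

threshold = 0.1;                            % assumed cutoff
isExcluded = Summary.Regime < threshold;    % NaN < 0.1 is false, so Sigma2 is never flagged
excludedNames = Summary.Properties.RowNames(isExcluded);

fprintf('Candidates for exclusion: %s\n',strjoin(excludedNames(:)',', '));
```

Because comparisons with NaN return false, the Sigma2 row, which has no inclusion probability, drops out of the selection without special handling.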
Input Arguments
PriorMdl — Bayesian linear regression model for predictor variable selection
mixconjugateblm model object | mixsemiconjugateblm model object | lassoblm model object
Bayesian linear regression model for predictor variable selection, specified as a model object in this table.

Model Object           Description
mixconjugateblm        Dependent, Gaussian-mixture-inverse-gamma conjugate model for SSVS predictor variable selection, returned by bayeslm
mixsemiconjugateblm    Independent, Gaussian-mixture-inverse-gamma semiconjugate model for SSVS predictor variable selection, returned by bayeslm
lassoblm               Bayesian lasso regression model returned by bayeslm
X — Predictor data
numeric matrix
Predictor data for the multiple linear regression model, specified as a numObservations-by-PriorMdl.NumPredictors numeric matrix. numObservations is the number of observations and must be equal to the length of y.
Data Types: double

y — Response data
numeric vector
Response data for the multiple linear regression model, specified as a numeric vector with numObservations elements.
Data Types: double

Name-Value Pair Arguments
Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter. Before R2021a, use commas to separate each name and value, and enclose Name in quotes.
Example: 'Sigma2',2 specifies estimating the conditional posterior distribution of the regression coefficients given the data and that the specified disturbance variance is 2.

Display — Flag to display Bayesian estimator summary to command line
true (default) | false
Flag to display Bayesian estimator summary to the command line, specified as the comma-separated pair consisting of 'Display' and a value in this table.

Value    Description
true     estimate prints estimation information and a table summarizing the Bayesian estimators to the command line.
false    estimate does not print to the command line.
The estimation information includes the estimation method, fixed parameters, the number of observations, and the number of predictors. The summary table contains estimated posterior means, standard deviations (square root of the posterior variance), 95% equitailed credible intervals, the posterior probability that the parameter is greater than 0, and a description of the posterior distribution (if known). For models that perform SSVS, the display table includes a column for variable-inclusion probabilities. If you specify either Beta or Sigma2, then estimate includes your specification in the display. Corresponding posterior estimates are trivial. Example: 'Display',false
Data Types: logical

Beta — Value of regression coefficients for conditional posterior distribution estimation of disturbance variance
empty array ([]) (default) | numeric column vector
Value of the regression coefficients for conditional posterior distribution estimation of the disturbance variance, specified as the comma-separated pair consisting of 'Beta' and a (PriorMdl.Intercept + PriorMdl.NumPredictors)-by-1 numeric vector. estimate estimates the characteristics of π(σ2|y,X,β = Beta), where y is y, X is X, and Beta is the value of 'Beta'. If PriorMdl.Intercept is true, then Beta(1) corresponds to the model intercept. All other values correspond to the predictor variables that compose the columns of X. Beta cannot contain any NaN values (that is, all coefficients must be known). You cannot specify Beta and Sigma2 simultaneously. By default, estimate does not compute characteristics of the conditional posterior of σ2.
Example: 'Beta',1:3
Data Types: double

Sigma2 — Value of disturbance variance for conditional posterior distribution estimation of regression coefficients
empty array ([]) (default) | positive numeric scalar
Value of the disturbance variance for conditional posterior distribution estimation of the regression coefficients, specified as the comma-separated pair consisting of 'Sigma2' and a positive numeric scalar. estimate estimates characteristics of π(β|y,X,Sigma2), where y is y, X is X, and Sigma2 is the value of 'Sigma2'. You cannot specify Sigma2 and Beta simultaneously. By default, estimate does not compute characteristics of the conditional posterior of β.
Example: 'Sigma2',1
Data Types: double

NumDraws — Monte Carlo simulation adjusted sample size
1e5 (default) | positive integer
Monte Carlo simulation adjusted sample size, specified as the comma-separated pair consisting of 'NumDraws' and a positive integer. estimate actually draws BurnIn + NumDraws*Thin samples, and then discards the burn-in draws and thins the remainder. Therefore, estimate bases the estimates on NumDraws samples.
For details on how estimate reduces the full Monte Carlo sample, see “Algorithms” on page 12-579. Example: 'NumDraws',1e7 Data Types: double BurnIn — Number of draws to remove from beginning of Monte Carlo sample 5000 (default) | nonnegative scalar Number of draws to remove from the beginning of the Monte Carlo sample to reduce transient effects, specified as the comma-separated pair consisting of 'BurnIn' and a nonnegative scalar. For details on how estimate reduces the full Monte Carlo sample, see “Algorithms” on page 12-579.
Tip To help you specify the appropriate burn-in period size, determine the extent of the transient behavior in the Monte Carlo sample by specifying 'BurnIn',0, simulating a few thousand observations using simulate, and then plotting the paths.
Example: 'BurnIn',0
Data Types: double

Thin — Monte Carlo adjusted sample size multiplier
1 (default) | positive integer
Monte Carlo adjusted sample size multiplier, specified as the comma-separated pair consisting of 'Thin' and a positive integer. The actual Monte Carlo sample size is BurnIn + NumDraws*Thin. After discarding the burn-in, estimate discards every Thin – 1 draws, and then retains the next. For details on how estimate reduces the full Monte Carlo sample, see “Algorithms” on page 12-579.
Tip To reduce potential large serial correlation in the Monte Carlo sample, or to reduce the memory consumption of the draws stored in PosteriorMdl, specify a large value for Thin.
Example: 'Thin',5
Data Types: double

BetaStart — Starting values of regression coefficients for MCMC sample
numeric column vector
Starting values of the regression coefficients for the Markov chain Monte Carlo (MCMC) sample, specified as the comma-separated pair consisting of 'BetaStart' and a numeric column vector with (PriorMdl.Intercept + PriorMdl.NumPredictors) elements. By default, BetaStart is the ordinary least-squares (OLS) estimate.
Tip A good practice is to run estimate multiple times using different parameter starting values. Verify that the solutions from each run converge to similar values.
Example: 'BetaStart',[1; 2; 3]
Data Types: double

Sigma2Start — Starting values of disturbance variance for MCMC sample
positive numeric scalar
Starting values of the disturbance variance for the MCMC sample, specified as the comma-separated pair consisting of 'Sigma2Start' and a positive numeric scalar. By default, Sigma2Start is the OLS residual mean squared error.
Tip A good practice is to run estimate multiple times using different parameter starting values. Verify that the solutions from each run converge to similar values.
Example: 'Sigma2Start',4 Data Types: double
Output Arguments
PosteriorMdl — Bayesian linear regression model storing distribution characteristics
mixconjugateblm model object | mixsemiconjugateblm model object | lassoblm model object | empiricalblm model object
Bayesian linear regression model storing distribution characteristics, returned as a mixconjugateblm, mixsemiconjugateblm, lassoblm, or empiricalblm model object.
• If you do not specify either Beta or Sigma2 (their values are []), then estimate updates the prior model using the data likelihood to form the posterior distribution. PosteriorMdl characterizes the posterior distribution and is an empiricalblm model object. Information that PosteriorMdl stores or displays helps you decide whether predictor variables are important.
• If you specify either Beta or Sigma2, then PosteriorMdl equals PriorMdl (the two models are the same object storing the same property values). estimate does not update the prior model to form the posterior model. However, Summary stores conditional posterior estimates.
For more details on the display of PosteriorMdl, see Summary.

Summary — Summary of Bayesian estimators
table
Summary of Bayesian estimators, returned as a table. Summary contains the same information as the display of the estimation summary (Display). Rows correspond to parameters, and columns correspond to these posterior characteristics:
• Mean – Posterior mean
• Std – Posterior standard deviation
• CI95 – 95% equitailed credible interval
• Positive – Posterior probability that the parameter is greater than 0
• Distribution – Description of the marginal or conditional posterior distribution of the parameter, when known
• Covariances – Estimated covariance matrix of the coefficients and disturbance variance
• Regime – Variable-inclusion probabilities for models that perform SSVS; low probabilities indicate that the variable should be excluded from the model
Row names are the names in PriorMdl.VarNames. The name of the last row is Sigma2.
Alternatively, pass PosteriorMdl to summarize to obtain a summary of Bayesian estimators.
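To make the Summary columns concrete, the following sketch computes the same kinds of quantities from a vector of draws using only base MATLAB. The vector draws is synthetic (an assumption standing in for one row of BetaDraws), and the empirical-quantile rule used for the interval is one simple choice; the exact rule that estimate uses is not stated here.

```matlab
% Sketch: empirical analogs of the Mean, Std, CI95, and Positive columns.
% "draws" is a synthetic stand-in for posterior draws of one parameter.
rng(1);
draws = 2 + 0.5*randn(1,10000);

postMean = mean(draws);
postStd  = std(draws);

% Equitailed 95% interval from the sorted draws (simple order-statistic rule).
sorted = sort(draws);
n = numel(sorted);
ciLower = sorted(max(1,round(0.025*n)));
ciUpper = sorted(min(n,round(0.975*n)));

% Fraction of draws greater than zero.
probPositive = mean(draws > 0);

fprintf('Mean %.4f  Std %.4f  CI95 [%.3f, %.3f]  Positive %.3f\n', ...
    postMean,postStd,ciLower,ciUpper,probPositive);
```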
More About Bayesian Linear Regression Model A Bayesian linear regression model treats the parameters β and σ2 in the multiple linear regression (MLR) model yt = xtβ + εt as random variables. For times t = 1,...,T:
• yt is the observed response.
• xt is a 1-by-(p + 1) row vector of observed values of p predictors. To accommodate a model intercept, x1t = 1 for all t.
• β is a (p + 1)-by-1 column vector of regression coefficients corresponding to the variables that compose the columns of xt.
• εt is the random disturbance with a mean of zero and Cov(ε) = σ2IT×T, while ε is a T-by-1 vector containing all disturbances.
These assumptions imply that the data likelihood is
$$\ell(\beta,\sigma^2 \mid y,x) = \prod_{t=1}^{T} \phi(y_t;\, x_t\beta,\, \sigma^2).$$
ϕ(yt;xtβ,σ2) is the Gaussian probability density with mean xtβ and variance σ2 evaluated at yt.
Before considering the data, you impose a joint prior distribution assumption on (β,σ2). In a Bayesian analysis, you update the distribution of the parameters by using information about the parameters obtained from the likelihood of the data. The result is the joint posterior distribution of (β,σ2) or the conditional posterior distributions of the parameters.
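As an illustration of the likelihood, the sketch below evaluates its logarithm for a tiny made-up data set at fixed values of β and σ2, once with the closed-form sum and once by summing the log of each Gaussian density term. All data and parameter values are hypothetical, and the first column of X is the intercept column of ones.

```matlab
% Sketch: Gaussian log-likelihood of the MLR model at fixed (beta, sigma2).
% Data and parameter values are made up for illustration.
X = [1 0.5; 1 1.5; 1 2.5; 1 3.5];    % first column is the intercept
y = [1.1; 1.9; 3.2; 3.8];
beta = [0.5; 1];
sigma2 = 0.25;

resid = y - X*beta;                  % y_t - x_t*beta for each t
T = numel(y);

% Closed form: sum over t of log phi(y_t; x_t*beta, sigma2).
logLik = -0.5*T*log(2*pi*sigma2) - sum(resid.^2)/(2*sigma2);

% Same quantity, term by term from the density definition.
dens = exp(-resid.^2/(2*sigma2))./sqrt(2*pi*sigma2);
logLikCheck = sum(log(dens));

fprintf('Log-likelihood: %.6f (term-by-term: %.6f)\n',logLik,logLikCheck);
```

Working on the log scale turns the product over t into a sum, which is the form a sampler or optimizer would typically evaluate.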
Tip • Monte Carlo simulation is subject to variation. If estimate uses Monte Carlo simulation, then estimates and inferences might vary when you call estimate multiple times under seemingly equivalent conditions. To reproduce estimation results, before calling estimate, set a random number seed by using rng.
Algorithms This figure shows how estimate reduces the Monte Carlo sample using the values of NumDraws, Thin, and BurnIn.
Rectangles represent successive draws from the distribution. estimate removes the white rectangles from the Monte Carlo sample. The remaining NumDraws black rectangles compose the Monte Carlo sample.
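The reduction can also be written out as index arithmetic. The sketch below follows the description of Thin given earlier (after the burn-in, discard Thin – 1 draws and retain the next) with small made-up settings; it illustrates the bookkeeping only and is not the toolbox implementation.

```matlab
% Sketch: which draws survive burn-in removal and thinning.
% Small illustrative settings; not the function defaults.
BurnIn = 4;
NumDraws = 3;
Thin = 2;

totalDraws = BurnIn + NumDraws*Thin;        % size of the full Monte Carlo sample
keptIdx = BurnIn + Thin*(1:NumDraws);       % positions of the retained draws

isKept = false(1,totalDraws);
isKept(keptIdx) = true;

fprintf('Total draws: %d, retained: %d\n',totalDraws,nnz(isKept));
disp(keptIdx)
```

With these settings the full sample has 10 draws and the retained positions are 6, 8, and 10, so exactly NumDraws draws remain.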
Version History Introduced in R2018b
See Also Objects mixconjugateblm | mixsemiconjugateblm | lassoblm Functions summarize | forecast | simulate | plot Topics “Bayesian Linear Regression” on page 6-2 “Implement Bayesian Linear Regression” on page 6-10 “Bayesian Stochastic Search Variable Selection” on page 6-63 “Bayesian Lasso Regression” on page 6-52
estimate Estimate posterior distribution of Bayesian state-space model parameters
Syntax PosteriorMdl = estimate(PriorMdl,Y,params0) PosteriorMdl = estimate(PriorMdl,Y,params0,Name=Value) [PosteriorMdl,estParams,EstParamCov,Summary] = estimate( ___ )
Description PosteriorMdl = estimate(PriorMdl,Y,params0) returns the posterior Bayesian state-space model PosteriorMdl from combining the Bayesian state-space model prior distribution and likelihood PriorMdl with the response data Y. The input argument params0 is the vector of initial values for the unknown state-space model parameters Θ in PriorMdl. PosteriorMdl = estimate(PriorMdl,Y,params0,Name=Value) specifies additional options using one or more name-value arguments. For example, estimate(Mdl,Y,params0,NumDraws=1e6,Thin=3,DoF=10) uses the multivariate t10 distribution for the Metropolis-Hastings [1][2] proposal, draws 3e6 random vectors of parameters, and thins the sample to reduce serial correlation by discarding every 2 draws until it retains 1e6 draws. [PosteriorMdl,estParams,EstParamCov,Summary] = estimate( ___ ) additionally returns the following quantities using any of the input-argument combinations in the previous syntaxes: • estParams — A vector containing the estimated parameters Θ. • EstParamCov — The estimated variance-covariance matrix of the estimated parameters Θ. • Summary — Estimation summary of the posterior distribution of parameters Θ. If the distribution of the state disturbances or observation innovations is non-Gaussian, Summary also includes the estimation summary of the final state values, and any estimated disturbance or innovation distribution hyperparameters.
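For readers unfamiliar with Metropolis-Hastings sampling, here is a generic random-walk sketch on a toy target. It is not the sampler that estimate runs: it uses a Gaussian proposal with a fixed step rather than a tuned multivariate t proposal, and the target logTarget, the step size stepStd, and all other names are assumptions chosen for illustration.

```matlab
% Sketch: generic random-walk Metropolis sampler on a toy 2-D target.
% NOT the toolbox algorithm; Gaussian proposal with a fixed, untuned step.
rng(1);
logTarget = @(theta) -0.5*sum(theta.^2);   % toy target: standard normal (unnormalized)

numDraws = 5000;
d = 2;
stepStd = 1;                               % proposal standard deviation (assumed)

draws = zeros(d,numDraws);
theta = zeros(d,1);
logp = logTarget(theta);
numAccepted = 0;

for k = 1:numDraws
    proposal = theta + stepStd*randn(d,1);     % symmetric proposal
    logpProp = logTarget(proposal);
    if log(rand) < logpProp - logp             % accept with prob min(1, ratio)
        theta = proposal;
        logp = logpProp;
        numAccepted = numAccepted + 1;
    end
    draws(:,k) = theta;                        % repeat current value if rejected
end

acceptRate = numAccepted/numDraws;
fprintf('Proposal acceptance rate = %.2f%%\n',100*acceptRate);
```

The acceptance rate printed here plays the same role as the "Proposal acceptance rate" line in the estimation display: a proposal that is too wide lowers it, and one that is too narrow raises it while slowing exploration.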
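Examples
Estimate Posterior Distribution of Time-Invariant Model
Simulate observed responses from a known state-space model, then treat the model as Bayesian and estimate the posterior distribution of the parameters by treating the state-space model parameters as unknown. Suppose the following state-space model is a data-generating process (DGP).
$$\begin{bmatrix} x_{t,1} \\ x_{t,2} \end{bmatrix} = \begin{bmatrix} 0.5 & 0 \\ 0 & -0.75 \end{bmatrix} \begin{bmatrix} x_{t-1,1} \\ x_{t-1,2} \end{bmatrix} + \begin{bmatrix} 1 & 0 \\ 0 & 0.5 \end{bmatrix} \begin{bmatrix} u_{t,1} \\ u_{t,2} \end{bmatrix}$$
$$y_t = \begin{bmatrix} 1 & 1 \end{bmatrix} \begin{bmatrix} x_{t,1} \\ x_{t,2} \end{bmatrix}.$$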
Create a standard state-space model object ssm that represents the DGP.
trueTheta = [0.5; -0.75; 1; 0.5];
A = [trueTheta(1) 0; 0 trueTheta(2)];
B = [trueTheta(3) 0; 0 trueTheta(4)];
C = [1 1];
DGP = ssm(A,B,C);
Simulate a response path from the DGP.
rng(1); % For reproducibility
y = simulate(DGP,200);
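To see what the simulation does, the recursion can be written out by hand. The sketch below is an illustration in base MATLAB and will not reproduce the numbers from simulate: it starts the states at zero (an assumption; ssm objects use their own initial state distribution) and draws the disturbances in its own order. The variable names are illustrative.

```matlab
% Sketch: simulate the DGP by iterating the state and observation equations.
% Assumes zero initial states; results differ from simulate(DGP,200).
rng(1);
T = 200;
A = [0.5 0; 0 -0.75];        % state-transition matrix
B = [1 0; 0 0.5];            % state-disturbance loading matrix
C = [1 1];                   % measurement-sensitivity matrix

x = zeros(2,1);              % assumed initial state
ySim = zeros(T,1);
for t = 1:T
    u = randn(2,1);          % standard Gaussian state disturbances
    x = A*x + B*u;           % state equation
    ySim(t) = C*x;           % observation equation (no observation noise)
end

fprintf('Simulated %d responses; sample std = %.3f\n',T,std(ySim));
```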
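Suppose the structure of the DGP is known, but the state parameters trueTheta are unknown, explicitly
$$\begin{bmatrix} x_{t,1} \\ x_{t,2} \end{bmatrix} = \begin{bmatrix} \phi_1 & 0 \\ 0 & \phi_2 \end{bmatrix} \begin{bmatrix} x_{t-1,1} \\ x_{t-1,2} \end{bmatrix} + \begin{bmatrix} \sigma_1 & 0 \\ 0 & \sigma_2 \end{bmatrix} \begin{bmatrix} u_{t,1} \\ u_{t,2} \end{bmatrix}$$
$$y_t = \begin{bmatrix} 1 & 1 \end{bmatrix} \begin{bmatrix} x_{t,1} \\ x_{t,2} \end{bmatrix}.$$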
Consider a Bayesian state-space model representing the model with unknown parameters. Arbitrarily assume that the prior distributions of ϕ1, ϕ2, σ1², and σ2² are independent Gaussian with mean 0.5 and variance 1. The Local Functions on page 12-602 section contains two functions required to specify the Bayesian state-space model. You can use the functions only within this script.
The paramMap function accepts a vector of the unknown state-space model parameters and returns all the following quantities:
• A = [ϕ1 0; 0 ϕ2].
• B = [σ1 0; 0 σ2].
• C = [1 1].
• D = 0.
• Mean0 and Cov0 are empty arrays [], which specify the defaults.
• StateType = [0; 0], indicating that each state is stationary.
The priorDistribution function accepts the same vector of unknown parameters as does paramMap, but it returns the log prior density of the parameters at their current values. Specify that parameter values outside the parameter space have log prior density of -Inf.
Create the Bayesian state-space model by passing function handles to paramMap and priorDistribution to bssm.
PriorMdl = bssm(@paramMap,@priorDistribution)

PriorMdl =
Mapping that defines a state-space model:
    @paramMap
Log density of parameter prior distribution:
    @priorDistribution
PriorMdl is a bssm object representing the Bayesian state-space model with unknown parameters. Estimate the posterior distribution using default options of estimate. Specify a random set of positive values in [0,1] to initialize the MCMC algorithm.
numParams = 4;
theta0 = rand(numParams,1);
PosteriorMdl = estimate(PriorMdl,y,theta0);

Local minimum found.
Optimization completed because the size of the gradient is less than the value of the optimality tolerance.

         Optimization and Tuning
      | Params0  Optimized  ProposalStd
----------------------------------------
 c(1) |  0.6968    0.4459      0.0798
 c(2) |  0.7662   -0.8781      0.0483
 c(3) |  0.3425    0.9633      0.0694
 c(4) |  0.8459    0.3978      0.0726

           Posterior Distributions
      |   Mean     Std    Quantile05  Quantile95
------------------------------------------------
 c(1) |  0.4491  0.0905     0.3031      0.6164
 c(2) | -0.8577  0.0606    -0.9400     -0.7365
 c(3) |  0.9589  0.0695     0.8458      1.0699
 c(4) |  0.4316  0.0893     0.3045      0.6023
Proposal acceptance rate = 40.10%

PosteriorMdl.ParamMap

ans = function_handle with value:
    @paramMap

ThetaPostDraws = PosteriorMdl.ParamDistribution;
[numParams,numDraws] = size(ThetaPostDraws)

numParams = 4
numDraws = 1000
estimate finds an optimal proposal distribution for the Metropolis-Hastings sampler by using the tune function. Therefore, by default, estimate prints convergence information from tune. Also, estimate displays a summary of the posterior distribution of the parameters. The true values of the parameters are close to their corresponding posterior means; all lie between the displayed 5% and 95% posterior quantiles (Quantile05 and Quantile95). PosteriorMdl is a bssm object representing the posterior distribution.
• PosteriorMdl.ParamMap is the function handle to the function representing the state-space model structure; it is the same function as PriorMdl.ParamMap.
• ThetaPostDraws is a 4-by-1000 matrix of draws from the posterior distribution. By default, estimate treats the first 100 draws as a burn-in sample and removes them from the matrix.
To diagnose the Markov chain induced by the Metropolis-Hastings sampler, create trace plots of the posterior parameter draws.
paramNames = ["\phi_1" "\phi_2" "\sigma_1" "\sigma_2"];
figure
h = tiledlayout(4,1);
for j = 1:numParams
    nexttile
    plot(ThetaPostDraws(j,:))
    hold on
    yline(trueTheta(j))
    ylabel(paramNames(j))
end
title(h,"Posterior Trace Plots")
The sampler eventually settles near the true values of the parameters. In this case, the sample shows serial correlation and transient behavior. You can remedy serial correlation in the sample by adjusting the Thin name-value argument, and you can remedy transient effects by increasing the burn-in period using the BurnIn name-value argument.
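The effect of thinning on serial correlation can be illustrated with a synthetic chain. The sketch below generates an AR(1) series as a hypothetical stand-in for a strongly correlated MCMC chain, drops an initial segment, keeps the first draw of each block of Thin successive draws, and compares lag-1 sample autocorrelations. The chain, the coefficient rho, and the helper acf1 are assumptions for illustration only.

```matlab
% Sketch: thinning reduces serial correlation in a correlated chain.
% The AR(1) series is a synthetic stand-in for an MCMC chain.
rng(1);
rho = 0.9;                                   % assumed chain persistence
n = 60500;
chain = filter(1,[1 -rho],randn(n,1));       % AR(1): c(t) = rho*c(t-1) + e(t)

BurnIn = 500;
Thin = 30;
afterBurn = chain(BurnIn+1:end);             % remove the burn-in segment
kept = afterBurn(1:Thin:end);                % first draw of each block of Thin draws

% Lag-1 sample autocorrelation.
acf1 = @(x) sum((x(1:end-1)-mean(x)).*(x(2:end)-mean(x)))/sum((x-mean(x)).^2);

fprintf('Lag-1 autocorrelation, full chain:    %.3f\n',acf1(chain));
fprintf('Lag-1 autocorrelation, thinned chain: %.3f\n',acf1(kept));
fprintf('Retained draws: %d\n',numel(kept));
```

The cost of thinning is computational: retaining a fixed number of draws with Thin=30 requires simulating thirty times as many.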
Local Functions
This example uses the following functions. paramMap is the parameter-to-matrix mapping function and priorDistribution is the log prior distribution of the parameters.
function [A,B,C,D,Mean0,Cov0,StateType] = paramMap(theta)
    A = [theta(1) 0; 0 theta(2)];
    B = [theta(3) 0; 0 theta(4)];
    C = [1 1];
    D = 0;
    Mean0 = [];         % MATLAB uses default initial state mean
    Cov0 = [];          % MATLAB uses default initial state covariances
    StateType = [0; 0]; % Two stationary states
end

function logprior = priorDistribution(theta)
    % Flag values outside the parameter space: nonstationary AR
    % coefficients or negative disturbance scales.
    paramconstraints = [(abs(theta(1)) >= 1) (abs(theta(2)) >= 1) ...
        (theta(3) < 0) (theta(4) < 0)];
    if(sum(paramconstraints))
        logprior = -Inf;
    else
        mu0 = 0.5*ones(numel(theta),1);
        sigma0 = 1;
        p = normpdf(theta,mu0,sigma0);
        logprior = sum(log(p));
    end
end
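An alternative way to write the log prior is to evaluate the Gaussian log density in closed form instead of taking the logarithm of normpdf. The sketch below is a variation under the same prior (mean 0.5, standard deviation 1), not the code used in the example; the handles isOutside and logNormal and the test points are hypothetical. The closed form stays finite far in the tails, where the density itself underflows to zero.

```matlab
% Sketch: closed-form Gaussian log prior with the same support constraints.
% A variation on priorDistribution, written with anonymous functions.
mu0 = 0.5;
sigma0 = 1;
isOutside = @(theta) any(abs(theta(1:2)) >= 1) || any(theta(3:4) < 0);
logNormal = @(theta) sum(-0.5*log(2*pi*sigma0^2) - (theta - mu0).^2/(2*sigma0^2));

% Columns: a valid point, a point violating |phi_1| < 1, a far-tail point.
candidates = [0.5 1.2 0.5
              0.5 0.5 0.5
              0.5 0.5 45
              0.5 0.5 0.5];

logPrior = zeros(1,size(candidates,2));
for k = 1:size(candidates,2)
    theta = candidates(:,k);
    if isOutside(theta)
        logPrior(k) = -Inf;            % outside the parameter space
    else
        logPrior(k) = logNormal(theta);
    end
end

% Taking the log of the density underflows for the far-tail component.
naiveTail = log(exp(-(45 - mu0)^2/(2*sigma0^2))/sqrt(2*pi*sigma0^2));

disp(logPrior)
disp(naiveTail)
```

A log prior of -Inf at a point inside the parameter space would make the sampler reject that point unconditionally, so the closed form is the safer choice when initial values can be far from the prior mean.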
Improve Markov Chain Convergence Consider the model in the example “Estimate Posterior Distribution of Time-Invariant Model” on page 12-598. Improve the Markov chain convergence by adjusting sampler options. Create a standard state-space model object ssm that represents the DGP, and then simulate a response path. trueTheta = [0.5; -0.75; 1; 0.5]; A = [trueTheta(1) 0; 0 trueTheta(2)]; B = [trueTheta(3) 0; 0 trueTheta(4)]; C = [1 1]; DGP = ssm(A,B,C); rng(1); % For reproducibility y = simulate(DGP,200);
Create the Bayesian state-space model by passing function handles to paramMap and priorDistribution to bssm (the functions are in Local Functions on page 12-605).
Mdl = bssm(@paramMap,@priorDistribution)

Mdl =
Mapping that defines a state-space model:
    @paramMap
Log density of parameter prior distribution:
    @priorDistribution
Estimate the posterior distribution. Specify the simulated response path as observed responses, specify a random set of positive values in [0,1] to initialize the MCMC algorithm, and shut off all optimization displays. The plots in “Estimate Posterior Distribution of Time-Invariant Model” on page 12-598 suggest that the Markov chain settles after 500 draws. Therefore, specify a burn-in period of 500 (BurnIn=500). Specify thinning the sample by keeping the first draw of each set of 30 successive draws (Thin=30). Retain 2000 random parameter vectors (NumDraws=2000).
numParams = 4;
theta0 = rand(numParams,1);
options = optimoptions("fminunc",Display="off");
PosteriorMdl = estimate(Mdl,y,theta0,Options=options, ...
    NumDraws=2000,BurnIn=500,Thin=30);

         Optimization and Tuning
      | Params0  Optimized  ProposalStd
----------------------------------------
 c(1) |  0.6968    0.4459      0.0798
 c(2) |  0.7662   -0.8781      0.0483
 c(3) |  0.3425    0.9633      0.0694
 c(4) |  0.8459    0.3978      0.0726

           Posterior Distributions
      |   Mean     Std    Quantile05  Quantile95
------------------------------------------------
 c(1) |  0.4495  0.0822     0.3135      0.5858
 c(2) | -0.8561  0.0587    -0.9363     -0.7468
 c(3) |  0.9645  0.0744     0.8448      1.0863
 c(4) |  0.4333  0.0860     0.3086      0.5889
Proposal acceptance rate = 38.85%

ThetaPostDraws = PosteriorMdl.ParamDistribution;
Plot trace plots and correlograms of the parameters. paramNames = ["\phi_1" "\phi_2" "\sigma_1" "\sigma_2"]; figure h = tiledlayout(4,1); for j = 1:numParams nexttile plot(ThetaPostDraws(j,:)) hold on yline(trueTheta(j)) ylabel(paramNames(j)) end title(h,"Posterior Trace Plots")
figure h = tiledlayout(4,1); for j = 1:numParams nexttile autocorr(ThetaPostDraws(j,:)); ylabel(paramNames(j)); title([]); end title(h,"Posterior Sample Correlograms")
The sampler quickly settles near the true values of the parameters. The sample shows little serial correlation and no transient behavior.
Local Functions
This example uses the following functions. paramMap is the parameter-to-matrix mapping function and priorDistribution is the log prior distribution of the parameters.
function [A,B,C,D,Mean0,Cov0,StateType] = paramMap(theta)
    A = [theta(1) 0; 0 theta(2)];
    B = [theta(3) 0; 0 theta(4)];
    C = [1 1];
    D = 0;
    Mean0 = [];         % MATLAB uses default initial state mean
    Cov0 = [];          % MATLAB uses default initial state covariances
    StateType = [0; 0]; % Two stationary states
end

function logprior = priorDistribution(theta)
    paramconstraints = [(abs(theta(1)) >= 1) (abs(theta(2)) >= 1) ...
        (theta(3) < 0) (theta(4) < 0)];
    if(sum(paramconstraints))
        logprior = -Inf;
    else
        mu0 = 0.5*ones(numel(theta),1);
        sigma0 = 1;
        p = normpdf(theta,mu0,sigma0);
        logprior = sum(log(p));
    end
end
Estimate Degrees of Freedom of t-Distributed State Disturbances
Simulate observed responses from a DGP, then treat the model as Bayesian and estimate the posterior distribution of the model parameters Θ and the degrees of freedom νu of multivariate t-distributed state disturbances. Consider the following DGP.
$$\begin{bmatrix} x_{t,1} \\ x_{t,2} \end{bmatrix} = \begin{bmatrix} \phi_1 & 0 \\ 0 & \phi_2 \end{bmatrix} \begin{bmatrix} x_{t-1,1} \\ x_{t-1,2} \end{bmatrix} + \begin{bmatrix} \sigma_1 & 0 \\ 0 & \sigma_2 \end{bmatrix} \begin{bmatrix} u_{t,1} \\ u_{t,2} \end{bmatrix}$$
$$y_t = \begin{bmatrix} 1 & 3 \end{bmatrix} \begin{bmatrix} x_{t,1} \\ x_{t,2} \end{bmatrix}.$$
• The true value of the state-space parameter set is Θ = {ϕ1, ϕ2, σ1, σ2} = {0.5, −0.75, 1, 0.5}.
• The state disturbances u1,t and u2,t are jointly a multivariate Student's t random series with νu = 5 degrees of freedom.
Create a vector autoregression (VAR) model that represents the state equation of the DGP.
trueTheta = [0.5; -0.75; 1; 0.5];
trueDoF = 5;
phi = [trueTheta(1) 0; 0 trueTheta(2)];
Sigma = [trueTheta(3)^2 0; 0 trueTheta(4)^2];
DGP = varm(AR={phi},Covariance=Sigma,Constant=[0; 0]);
Filter a random 2-D multivariate central t series of length 500 through the VAR model to obtain state values. Set the degrees of freedom to 5. rng(10) % For reproducibility T = 500; trueU = mvtrnd(eye(DGP.NumSeries),trueDoF,T); X = filter(DGP,trueU);
Obtain a series of observations from the DGP by the linear combination yt = x1,t + 3x2,t.
C = [1 3];
y = X*C';
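The same kind of data can be generated without mvtrnd and varm, which may help clarify what those calls do. The sketch below builds multivariate t disturbances as Gaussian vectors divided by the square root of a scaled chi-square draw shared within each period, and then iterates the state equation by hand. It assumes zero presample states and an integer number of degrees of freedom, uses its own random number stream, and therefore does not reproduce the series above. Names such as Usketch and ySketch are illustrative.

```matlab
% Sketch: multivariate t state disturbances and the resulting observations,
% using base MATLAB only. Assumes zero presample states and integer nu.
rng(10);
T = 500;
nu = 5;                                  % degrees of freedom

Z = randn(T,2);                          % independent standard normals
W = sum(randn(T,nu).^2,2);               % chi-square(nu) draw for each period
Usketch = Z./sqrt(W/nu);                 % rows are bivariate t vectors (identity scale)

phi = [0.5 0; 0 -0.75];                  % AR coefficient matrix
sig = [1 0; 0 0.5];                      % disturbance scales sigma_1, sigma_2
C = [1 3];

Xsketch = zeros(T,2);
x = zeros(2,1);                          % assumed presample state
for t = 1:T
    x = phi*x + sig*Usketch(t,:)';       % state equation
    Xsketch(t,:) = x';
end
ySketch = Xsketch*C';                    % y_t = x_{1,t} + 3*x_{2,t}

fprintf('Largest absolute disturbance: %.2f\n',max(abs(Usketch(:))));
```

Sharing one chi-square draw across both components in a period is what makes the disturbances jointly t distributed rather than two independent univariate t series.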
Consider a Bayesian state-space model representing the model with parameters Θ and νu treated as unknown. Arbitrarily assume that the prior distribution of the parameters in Θ are independent Gaussian random variables with mean 0.5 and variance 1. Assume that the prior on the degrees of freedom νu is flat. The functions in Local Functions on page 12-608 specify the state-space structure and prior distributions. Create the Bayesian state-space model by passing function handles to the paramMap and priorDistribution functions to bssm. Specify that the state disturbance distribution is multivariate Student's t with unknown degrees of freedom.
PriorMdl = bssm(@paramMap,@priorDistribution,StateDistribution="t");
PriorMdl is a bssm object representing the Bayesian state-space model with unknown parameters. Estimate the posterior distribution by using estimate. Specify a random set of positive values in [0,1] to initialize the MCMC algorithm. Set the burn-in period of the MCMC algorithm to 1000 draws, thin the entire MCMC sample by retaining every third draw, and set the proposal scale matrix proportionality constant to 0.25 to increase the proposal acceptance rate.
numParamsTheta = 4;
theta0 = rand(numParamsTheta,1);
PosteriorMdl = estimate(PriorMdl,y,theta0,Thin=3,BurnIn=1000,Proportion=0.25);

Local minimum found.
Optimization completed because the size of the gradient is less than the value of the optimality tolerance.

         Optimization and Tuning
      | Params0  Optimized  ProposalStd
----------------------------------------
 c(1) |  0.9219    0.3622      0.1151
 c(2) |  0.9475   -0.7530      0.0454
 c(3) |  0.2299    1.3465      0.1917
 c(4) |  0.6759    0.5891      0.0545

             Posterior Distributions
          |   Mean     Std    Quantile05  Quantile95
----------------------------------------------------
 c(1)     |  0.4516  0.1036     0.2641      0.5972
 c(2)     | -0.7459  0.0376    -0.8085     -0.6863
 c(3)     |  0.9816  0.1526     0.7430      1.2494
 c(4)     |  0.4843  0.0402     0.4234      0.5520
 x(1)     | -1.1901  0.9271    -2.7359      0.2539
 x(2)     |  0.2133  0.3090    -0.2680      0.7286
 StateDoF |  5.3980  0.7931     4.1574      6.7111
Proposal acceptance rate = 51.10%

ThetaPostDraws = PosteriorMdl.ParamDistribution;
[numParams,numDraws] = size(ThetaPostDraws)

numParams = 4
numDraws = 1000
estimate displays a summary of the posterior distribution of the parameters Θ (c(1) through c(4)), the final values of the two states (x(1) and x(2)), and νu (StateDoF). The true values of the parameters are close to their corresponding posterior means; all are within their corresponding 95% credible intervals.

PosteriorMdl is a bssm object representing the posterior distribution.

• PosteriorMdl.ParamMap is the function handle to the function representing the state-space model structure. It is the same function as PriorMdl.ParamMap.
• ThetaPostDraws is a 4-by-1000 matrix of draws from the posterior distribution of Θ | y.

To diagnose the Markov chain induced by the Metropolis-Hastings sampler, create trace plots of the posterior parameter draws of Θ.
    paramNames = ["\phi_1" "\phi_2" "\sigma_1" "\sigma_2"];
    figure
    h = tiledlayout(4,1);
    for j = 1:numParams
        nexttile
        plot(ThetaPostDraws(j,:))
        hold on
        yline(trueTheta(j))
        ylabel(paramNames(j))
    end
    title(h,"Posterior Trace Plots")
The sample shows some serial correlation. You can increase Thin to remedy this behavior.

Local Functions

This example uses the following functions. paramMap is the parameter-to-matrix mapping function and priorDistribution is the log prior distribution of the parameters.

    function [A,B,C,D,Mean0,Cov0,StateType] = paramMap(theta)
        A = [theta(1) 0; 0 theta(2)];
        B = [theta(3) 0; 0 theta(4)];
        C = [1 3];
        D = 0;
        Mean0 = [];         % MATLAB uses default initial state mean
        Cov0 = [];          % MATLAB uses default initial state covariances
        StateType = [0; 0]; % Two stationary states
    end

    function logprior = priorDistribution(theta)
        % Reject nonstationary AR coefficients and negative scale parameters
        paramconstraints = [(abs(theta(1)) >= 1) (abs(theta(2)) >= 1) ...
            (theta(3) < 0) (theta(4) < 0)];
        if(sum(paramconstraints))
            logprior = -Inf;
        else
            % Independent Gaussian priors with mean 0.5 and variance 1
            mu0 = 0.5*ones(numel(theta),1);
            sigma0 = 1;
            p = normpdf(theta,mu0,sigma0);
            logprior = sum(log(p));
        end
    end
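The serial correlation visible in the trace plots can also be quantified numerically. The following is a minimal sketch, not part of the shipped example: it builds a synthetic AR(1) series as a stand-in for one row of posterior draws (the persistence value 0.8 is an assumption), computes sample autocorrelations with base MATLAB operations, and compares the lag-1 autocorrelation before and after keeping every third draw.

```matlab
% Stand-in chain: an AR(1) series mimics a serially correlated MCMC sample.
% In practice, replace chain with one row of the posterior draws,
% for example ThetaPostDraws(1,:).
rng(0)                      % For reproducibility
numDraws = 5000;
rho = 0.8;                  % Assumed persistence of the stand-in chain
chain = zeros(1,numDraws);
for t = 2:numDraws
    chain(t) = rho*chain(t-1) + randn;
end

% Sample autocorrelation at lags 0 through maxLag
maxLag = 20;
z = chain - mean(chain);
acf = zeros(1,maxLag+1);
for k = 0:maxLag
    acf(k+1) = sum(z(1+k:end).*z(1:end-k))/sum(z.^2);
end

% Keep every third draw, as Thin=3 does after the burn-in
thinned = chain(3:3:end);
zt = thinned - mean(thinned);
acfThinnedLag1 = sum(zt(2:end).*zt(1:end-1))/sum(zt.^2);

fprintf("Lag-1 autocorrelation: %.3f (full), %.3f (thinned)\n", ...
    acf(2),acfThinnedLag1)
```

If the autocorrelations of the actual draws decay slowly, a larger Thin value is one remedy, at the cost of more sampler iterations for the same retained sample size.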
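Estimate Posterior of Time-Varying Model

Consider the following time-varying, state-space model for a DGP:

• From periods 1 through 250, the state equation includes stationary AR(2) and MA(1) models, respectively, and the observation model is the weighted sum of the two states.
• From periods 251 through 500, the state model includes only the first AR(2) model.
• $\mu_0 = (0.5,\ 0.5,\ 0,\ 0)$ and $\Sigma_0$ is the identity matrix.

Symbolically, the DGP is

\[
\begin{aligned}
\begin{bmatrix} x_{1,t} \\ x_{2,t} \\ x_{3,t} \\ x_{4,t} \end{bmatrix}
&=
\begin{bmatrix} \phi_1 & \phi_2 & 0 & 0 \\ 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & \theta \\ 0 & 0 & 0 & 0 \end{bmatrix}
\begin{bmatrix} x_{1,t-1} \\ x_{2,t-1} \\ x_{3,t-1} \\ x_{4,t-1} \end{bmatrix}
+
\begin{bmatrix} \sigma_1 & 0 \\ 0 & 0 \\ 0 & 1 \\ 0 & 1 \end{bmatrix}
\begin{bmatrix} u_{1,t} \\ u_{2,t} \end{bmatrix} \\
y_t &= c_1\left(x_{1,t} + x_{3,t}\right) + \sigma_2\varepsilon_t
\end{aligned}
\qquad \text{for } t = 1,\ldots,250,
\]

\[
\begin{aligned}
\begin{bmatrix} x_{1,t} \\ x_{2,t} \end{bmatrix}
&=
\begin{bmatrix} \phi_1 & \phi_2 & 0 & 0 \\ 1 & 0 & 0 & 0 \end{bmatrix}
\begin{bmatrix} x_{1,t-1} \\ x_{2,t-1} \\ x_{3,t-1} \\ x_{4,t-1} \end{bmatrix}
+
\begin{bmatrix} \sigma_1 \\ 0 \end{bmatrix} u_{1,t} \\
y_t &= c_2 x_{1,t} + \sigma_3\varepsilon_t
\end{aligned}
\qquad \text{for } t = 251,
\]

\[
\begin{aligned}
\begin{bmatrix} x_{1,t} \\ x_{2,t} \end{bmatrix}
&=
\begin{bmatrix} \phi_1 & \phi_2 \\ 1 & 0 \end{bmatrix}
\begin{bmatrix} x_{1,t-1} \\ x_{2,t-1} \end{bmatrix}
+
\begin{bmatrix} \sigma_1 \\ 0 \end{bmatrix} u_{1,t} \\
y_t &= c_2 x_{1,t} + \sigma_3\varepsilon_t
\end{aligned}
\qquad \text{for } t = 252,\ldots,500,
\]

where:

• The AR(2) parameters $\{\phi_1,\phi_2\} = \{0.5,-0.2\}$ and $\sigma_1 = 0.4$.
• The MA(1) parameter $\theta = 0.3$.
• The observation equation parameters $\{c_1,c_2\} = \{2,3\}$ and $\{\sigma_2,\sigma_3\} = \{0.1,0.2\}$.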
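To make the three regimes concrete, the following sketch assembles per-period coefficient matrices as cell vectors in base MATLAB. It is a hypothetical illustration written as a script; the shipped function timeVariantParamMapBayes.m may differ in detail. The parameter order [ϕ1 ϕ2 σ1 θ c1 σ2 c2 σ3] is an assumption inferred from the parameter values stated for the DGP.

```matlab
% Hypothetical sketch of a time-varying parameter mapping (script form)
params = [0.5; -0.2; 0.4; 0.3; 2; 0.1; 3; 0.2];  % Assumed order:
                          % [phi1 phi2 sigma1 theta c1 sigma2 c2 sigma3]
numObs = 500;
T1 = 250;                 % Periods with both the AR(2) and MA(1) states
T2 = numObs - T1 - 1;     % Periods 252 through 500

% Periods 1 through 250: four states, two disturbances
A1 = {[params(1) params(2) 0 0; 1 0 0 0; 0 0 0 params(4); 0 0 0 0]};
B1 = {[params(3) 0; 0 0; 0 1; 0 1]};
C1 = {params(5)*[1 0 1 0]};
D1 = {params(6)};

% Period 251: transition from four states to two states
A2 = {[params(1) params(2) 0 0; 1 0 0 0]};
B2 = {[params(3); 0]};

% Periods 252 through 500: two states, one disturbance
A3 = {[params(1) params(2); 1 0]};
B3 = {[params(3); 0]};
C3 = {params(7)*[1 0]};
D3 = {params(8)};

% Stack one coefficient matrix per period
A = [repmat(A1,T1,1); A2; repmat(A3,T2,1)];
B = [repmat(B1,T1,1); B2; repmat(B3,T2,1)];
C = [repmat(C1,T1,1); repmat(C3,T2+1,1)];
D = [repmat(D1,T1,1); repmat(D3,T2+1,1)];

Mean0 = [0.5; 0.5; 0; 0];   % Initial state mean
Cov0 = eye(4);              % Initial state covariance
StateType = [0; 0; 0; 0];   % All initial states stationary
```

The 2-by-4 transition matrix in period 251 is what lets the state dimension drop from four to two without breaking the recursion.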
Simulate a response path of length 500 from the model. The function timeVariantParamMapBayes.m, stored in matlabroot/examples/econ/main, specifies the state-space model structure.

    params = [0.5; -0.2; 0.4; 0.3; 2; 0.1; 3; 0.2];
    numParams = numel(params);
    numObs = 500;
    [A,B,C,D,mean0,Cov0,stateType] = timeVariantParamMapBayes(params,numObs);
    DGP = ssm(A,B,C,D,Mean0=mean0,Cov0=Cov0,StateType=stateType);
    rng(1) % For reproducibility
    y = simulate(DGP,numObs);
    plot(y)
    ylabel("y")
Create a time-varying, Bayesian state-space model that uses the structure of the DGP, but in which all parameters are unknown and the prior density is flat (the function flatPriorBSSM.m in matlabroot/examples/econ/main is the prior density). Create a bssm object representing the Bayesian state-space model.

    Mdl = bssm(@(params)timeVariantParamMapBayes(params,numObs),@flatPriorBSSM);
Estimate the posterior distribution. Initialize the parameter values for the MCMC algorithm with a random set of positive values in [0,0.5]. Set the proposal distribution to multivariate t25. Set the proportionality constant of the proposal distribution scale matrix to 0.005. Shut off all displays. Return the posterior distribution, estimated posterior means of the parameters, estimated covariance matrix of the estimated parameters, and an estimation summary.

    params0 = 0.5*rand(numParams,1);
    options = optimoptions("fminunc",Display="off");
    [PostParams,estParams,EstParamCov,Summary] = estimate(Mdl,y,params0, ...
        DoF=25,Proportion=0.005,Display=false,Options=options);
    [params estParams]

    ans = 8×2

        0.5000    0.5065
       -0.2000   -0.2376
        0.4000    1.0243
        0.3000    0.8592
        2.0000    0.9569
        0.1000    1.6787
        3.0000    1.2119
        0.2000    0.1781

    EstParamCov

    EstParamCov = 8×8

        0.0013   -0.0004    0.0006   -0.0022    0.0005   -0.0000   -0.0011   -0.0016
       -0.0004    0.0014   -0.0023   -0.0011    0.0024    0.0002    0.0025    0.0004
        0.0006   -0.0023    0.0396    0.0245   -0.0260    0.0001   -0.0436    0.0070
       -0.0022   -0.0011    0.0245    0.0745   -0.0443    0.0108   -0.0256    0.0174
        0.0005    0.0024   -0.0260   -0.0443    0.0355   -0.0069    0.0275   -0.0115
       -0.0000    0.0002    0.0001    0.0108   -0.0069    0.0038    0.0001    0.0023
       -0.0011    0.0025   -0.0436   -0.0256    0.0275    0.0001    0.0485   -0.0066
       -0.0016    0.0004    0.0070    0.0174   -0.0115    0.0023   -0.0066    0.0136

    Summary

    Summary=8×4 table
         Mean         Std        Quantile05    Quantile95
        _______    ________    __________    __________
        0.50653    0.035976      0.46127        0.5786
        -0.2376     0.03725     -0.29337      -0.17496
         1.0243     0.19899      0.75605        1.3784
         0.8592     0.27297      0.40785        1.3276
        0.95694     0.18852       0.6326        1.2624
         1.6787    0.061687       1.5636        1.7739
         1.2119     0.22027      0.83207        1.5171
        0.17813     0.11659     0.015488       0.41109
The sampler has some trouble estimating several parameters. You can improve posterior estimation by diagnosing the properties of the Metropolis-Hastings sample.
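One quick way to see which parameters are poorly estimated is to check whether each true value lies between the reported 5% and 95% posterior quantiles. The sketch below copies the true values and the quantiles from the output of this example and performs that comparison; the variable names isCovered, coveredIdx, and numCovered are introduced here for illustration only.

```matlab
% True parameter values and the 5% and 95% posterior quantiles,
% copied from the output of this example
params = [0.5; -0.2; 0.4; 0.3; 2; 0.1; 3; 0.2];
q05 = [0.46127; -0.29337; 0.75605; 0.40785; 0.6326; 1.5636; 0.83207; 0.015488];
q95 = [0.5786; -0.17496; 1.3784; 1.3276; 1.2624; 1.7739; 1.5171; 0.41109];

% Flag the parameters whose true value lies inside the interval
isCovered = (params >= q05) & (params <= q95);
coveredIdx = find(isCovered)'      % Parameters 1, 2, and 8
numCovered = sum(isCovered)        % 3 of 8
```

Only the two AR coefficients and the final observation scale parameter fall inside their intervals, which is consistent with the remark that the sampler has trouble with several parameters.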
Estimate Posterior from Model with Deflated Response

When you work with a state-space model that contains a deflated response variable, you must have data for the predictors.

Consider a regression of the US unemployment rate onto the real gross national product (RGNP) rate, and suppose the resulting innovations are an ARMA(1,1) process. The state-space form of the relationship is

\[
\begin{aligned}
\begin{bmatrix} x_{1,t} \\ x_{2,t} \end{bmatrix}
&=
\begin{bmatrix} \phi & \theta \\ 0 & 0 \end{bmatrix}
\begin{bmatrix} x_{1,t-1} \\ x_{2,t-1} \end{bmatrix}
+
\begin{bmatrix} \sigma \\ 1 \end{bmatrix} u_t \\
y_t - \beta Z_t &= x_{1,t},
\end{aligned}
\]

where:

• $x_{1,t}$ is the ARMA process.
• $x_{2,t}$ is a dummy state for the MA(1) effect.
• $y_t$ is the observed unemployment rate, which is deflated by a constant and the RGNP rate ($Z_t$).
• $u_t$ is an iid Gaussian series with mean 0 and standard deviation 1.

Load the Nelson-Plosser data set, which contains a table DataTable that has the unemployment rate and RGNP series, among other series.

    load Data_NelsonPlosser
Create a variable in DataTable that represents the returns of the raw RGNP series. Because price-to-returns conversion reduces the sample size by one, prepad the series with NaN.

    DataTable.RGNPRate = [NaN; price2ret(DataTable.GNPR)];
    T = height(DataTable);
Create variables for the regression. Represent the unemployment rate as the observation series and the constant and RGNP rate series as the deflation data Zt.

    Z = [ones(T,1) DataTable.RGNPRate];
    y = DataTable.UR;
The functions armaDeflateYBayes.m and flatPriorDeflateY.m in matlabroot/examples/econ/main specify the state-space model structure and likelihood, and the flat prior distribution. Create a bssm object representing the Bayesian state-space model. Specify the parameter-to-matrix mapping function as a handle to a function solely of the parameters.

    numParams = 5;
    Mdl = bssm(@(params)armaDeflateYBayes(params,y,Z),@flatPriorDeflateY)

    Mdl =
    Mapping that defines a state-space model:
        @(params)armaDeflateYBayes(params,y,Z)

    Log density of parameter prior distribution:
        @flatPriorDeflateY
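The deflation step itself is a plain regression adjustment, which the following sketch illustrates with synthetic data in base MATLAB. It is not the shipped armaDeflateYBayes.m; the parameter order [ϕ θ σ β0 β1] and all numeric values are assumptions made for illustration.

```matlab
% Hypothetical sketch of the deflated-response mapping, written as a script.
% Synthetic data stand in for the unemployment rate and RGNP rate.
rng(1)                             % For reproducibility
T = 10;
Z = [ones(T,1) randn(T,1)];        % Constant and one predictor
y = randn(T,1);                    % Observed response
params = [0.5; 0.2; 1; 0.3; -2];   % Assumed order: [phi theta sigma beta0 beta1]

% ARMA(1,1) state-space coefficients
A = [params(1) params(2); 0 0];
B = [params(3); 1];
C = [1 0];
D = 0;

% Deflate the response: remove the regression component Z*beta
beta = params(4:5);
deflatedY = y - Z*beta;
```

Any row of Z that contains NaN, such as the prepadded first RGNP rate, produces NaN in the deflated response for that period.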
Estimate the posterior distribution. Initialize the MCMC algorithm with a random set of positive values in [0,0.5]. Set the proportionality constant to 0.01. Set a burn-in period of 2000 draws, set a thinning factor of 50, and specify retaining 1000 draws. Return an estimation summary table.

    rng(1) % For reproducibility
    params0 = 0.5*rand(numParams,1);
    options = optimoptions("fminunc",Display="off");
    [~,~,~,Summary] = estimate(Mdl,y,params0,Proportion=0.01, ...
        BurnIn=2000,NumDraws=1000,Thin=50,Options=options,Display=false);
    Summary

    Summary=5×4 table
         Mean         Std        Quantile05    Quantile95
        _______    ________    __________    __________
        0.83476    0.069346      0.71916       0.94427
         1.8213     0.24575        1.392        2.2163
         1.7568     0.33037       1.2312        2.3438
         7.8658      3.1097       3.0791        13.164
        -15.168        1.92      -18.011       -11.761
Input Arguments

PriorMdl — Prior Bayesian state-space model
bssm model object

Prior Bayesian state-space model, specified as a bssm model object returned by bssm or ssm2bssm. The function handles of the properties PriorMdl.ParamDistribution and PriorMdl.ParamMap determine the prior and the data likelihood, respectively.

Y — Observed response data
numeric matrix | cell vector of numeric vectors

Observed response data, from which estimate forms the posterior distribution, specified as a numeric matrix or a cell vector of numeric vectors.

• If PriorMdl is time invariant with respect to the observation equation, Y is a T-by-n matrix. Each row of the matrix corresponds to a period and each column corresponds to a particular observation in the model. T is the sample size and n is the number of observations per period. The last row of Y contains the latest observations.
• If PriorMdl is time varying with respect to the observation equation, Y is a T-by-1 cell vector. Y{t} contains an nt-dimensional vector of observations for period t, where t = 1, ..., T. The dimensions of the corresponding coefficient matrices C{t} and D{t}, which are outputs of PriorMdl.ParamMap, must be consistent with the vector in Y{t} for all periods. The last cell of Y contains the latest observations.

NaN elements indicate missing observations. For details on how the Kalman filter accommodates missing observations, see “Algorithms” on page 12-620.

Data Types: double | cell

params0 — Initial values for parameters Θ
numeric vector

Initial parameter values for the parameters Θ, specified as a numParams-by-1 numeric vector. Elements of params0 must correspond to the elements of the first input arguments of PriorMdl.ParamMap and PriorMdl.ParamDistribution.

Data Types: double

Name-Value Arguments

Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter.

Before R2021a, use commas to separate each name and value, and enclose Name in quotes.

Example: estimate(Mdl,Y,params0,NumDraws=1e6,Thin=3,DoF=10) uses the multivariate t10 distribution for the Metropolis-Hastings proposal, draws 3e6 random vectors of parameters, and thins the sample to reduce serial correlation by discarding every 2 draws until it retains 1e6 draws.

Kalman Filter Options
Univariate — Univariate treatment of multivariate series flag
false (default) | true

Univariate treatment of a multivariate series flag, specified as a value in this table.

Value    Description
true     Applies the univariate treatment of a multivariate series, also known as sequential filtering
false    Does not apply sequential filtering

The univariate treatment can accelerate and improve numerical stability of the Kalman filter. However, all observation innovations must be uncorrelated. That is, DtDt' must be diagonal, where Dt (t = 1, ..., T) is the output coefficient matrix D of PriorMdl.ParamMap.

Example: Univariate=true
Data Types: logical

SquareRoot — Square root filter method flag
false (default) | true

Square root filter method flag, specified as a value in this table.

Value    Description
true     Applies the square root filter method for the Kalman filter
false    Does not apply the square root filter method

If you suspect that the eigenvalues of the filtered state or forecasted observation covariance matrices are close to zero, then specify SquareRoot=true. The square root filter is robust to numerical issues arising from the finite precision of calculations, but requires more computational resources.

Example: SquareRoot=true
Data Types: logical
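The two conditions described for the Kalman filter options can be checked with base MATLAB functions before calling estimate. The following is a minimal sketch under assumed example matrices: D1, D2, and P are made-up values, not outputs of any toolbox function.

```matlab
% Example observation-innovation coefficient matrices (assumed values)
D1 = [0.5 0; 0 2];        % Uncorrelated observation innovations
D2 = [0.5 0.1; 0 2];      % Correlated observation innovations

% Univariate=true requires D*D' to be diagonal
isOK1 = isdiag(D1*D1');                                % true

% A tolerance-based variant guards against round-off in computed matrices
tol = 1e-12;
DD2 = D2*D2';
isOK2 = all(abs(DD2 - diag(diag(DD2))) < tol,"all");   % false

% Eigenvalues near zero in a covariance matrix suggest trying SquareRoot=true
P = [1 1; 1 1 + 1e-12];   % Nearly singular covariance (assumed example)
minEig = min(eig(P));
```

For a time-varying model, the diagonality check applies to the D matrix of every period, so the same test would run inside a loop over the cells of D.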
Posterior Sampling Options
NumDraws — Number of Metropolis-Hastings sampler draws
1000 (default) | positive integer

Number of Metropolis-Hastings sampler draws from the posterior distribution Π(θ|Y), specified as a positive integer.

Example: NumDraws=1e7
Data Types: double

BurnIn — Number of draws to remove from beginning of sample
100 (default) | nonnegative scalar

Number of draws to remove from the beginning of the sample to reduce transient effects, specified as a nonnegative scalar. For details on how estimate reduces the full sample, see “Algorithms” on page 12-620.

Tip To help you specify the appropriate burn-in period size:

1 Determine the extent of the transient behavior in the sample by setting the BurnIn name-value argument to 0.
2 Simulate a few thousand observations by using simulate.
3 Create trace plots.
Example: BurnIn=1000
Data Types: double

Thin — Adjusted sample size multiplier
1 (no thinning) (default) | positive integer

Adjusted sample size multiplier, specified as a positive integer. The actual sample size is BurnIn + NumDraws*Thin. After discarding the burn-in, estimate discards every Thin - 1 draws, and then retains the next draw. For more details on how estimate reduces the full sample, see “Algorithms” on page 12-620.

Tip To reduce potential large serial correlation in the posterior sample, or to reduce the memory consumption of the output sample, specify a large value for Thin.

Example: Thin=5
Data Types: double

DoF — Proposal distribution degrees of freedom
Inf (default) | positive scalar

Proposal distribution degrees of freedom for parameter updates using the Metropolis-Hastings sampler, specified as a value in this table.

Value              Metropolis-Hastings Proposal Distribution
Positive scalar    Multivariate t with DoF degrees of freedom
Inf                Multivariate normal
The following options specify other aspects of the proposal distribution:

• Proportion — Optional proportionality constant that scales the proposal scale matrix
• Center — Optional expected value
• StdDoF — Optional proposal standard deviation of the degrees of freedom parameter of the t-distributed state disturbance or observation innovation process

Example: DoF=10
Data Types: double

Proportion — Proposal scale matrix proportionality constant
1 (default) | positive scalar

Proposal scale matrix proportionality constant, specified as a positive scalar.

Tip For higher proposal acceptance rates, experiment with relatively small values for Proportion.

Example: Proportion=1
Data Types: double

Center — Proposal distribution center
[] (default) | numeric vector

Proposal distribution center, specified as a value in this table.

Value                            Description
numParams-by-1 numeric vector    estimate uses the independence Metropolis-Hastings sampler. Center is the center of the proposal distribution.
[] (empty array)                 estimate uses the random-walk Metropolis-Hastings sampler. The center of the proposal density is the current state of the Markov chain.
Example: Center=ones(10,1)
Data Types: double

StdDoF — Proposal standard deviation for degrees of freedom parameter
0.1 (default) | positive numeric scalar

Proposal standard deviation for the degrees of freedom parameter of the t-distributed state disturbance or observation innovation process, specified as a positive numeric scalar.

StdDoF applies to the corresponding process when PriorMdl.StateDistribution.Name is "t" or PriorMdl.ObservationDistribution.Name is "t", and their associated degrees of freedom are estimable (the DoF field is NaN or a function handle). For example, if the following conditions hold, StdDoF applies only to the t-distribution degrees of freedom of the state disturbance process:

• PriorMdl.StateDistribution.Name is "t".
• PriorMdl.StateDistribution.DoF is NaN.
• The limiting distribution of the observation innovations is Gaussian (PriorMdl.ObservationDistribution.Name is "Gaussian" or PriorMdl.ObservationDistribution is struct("Name","t","DoF",Inf)).

Example: StdDoF=0.5
Data Types: double

Proposal Tuning Options
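The effect of the proposal scale on the acceptance rate can be seen in a toy random-walk Metropolis-Hastings sampler. The sketch below is not the algorithm that estimate implements: it targets an assumed one-dimensional standard normal posterior, uses a normal proposal (the analogue of DoF=Inf with Center=[]), and treats the hypothetical variable scales as the proposal standard deviation rather than as the Proportion constant itself.

```matlab
% Toy target: standard normal log density, up to an additive constant (assumed)
logPost = @(theta) -0.5*theta.^2;

rng(1)                         % For reproducibility
numDraws = 5000;
scales = [0.25 4];             % Two proposal standard deviations to compare
acceptRate = zeros(size(scales));

for s = 1:numel(scales)
    theta = 0;                 % Initial value of the chain
    numAccept = 0;
    for k = 1:numDraws
        % Random-walk proposal centered at the current state of the chain
        candidate = theta + scales(s)*randn;
        % Accept with probability min(1, posterior ratio); the proposal is
        % symmetric, so the proposal densities cancel
        if log(rand) < logPost(candidate) - logPost(theta)
            theta = candidate;
            numAccept = numAccept + 1;
        end
    end
    acceptRate(s) = numAccept/numDraws;
end
acceptRate
```

The smaller scale yields the higher acceptance rate, which mirrors the tip for Proportion. A very small scale is not automatically better, because the chain then moves in tiny steps and the draws become more serially correlated.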
Hessian — Hessian approximation method
"difference" (default) | "diagonal" | "opg" | "optimizer" | character vector

Hessian approximation method for the Metropolis-Hastings proposal distribution scale matrix, specified as a value in this table.

Value           Description
"difference"    Finite differencing
"diagonal"      Diagonalized result of finite differencing
"opg"           Outer product of gradients, ignoring the prior distribution
"optimizer"     Posterior distribution optimized by fmincon or fminunc. Specify optimization options by using the Options name-value argument.

Tip The Hessian="difference" setting can be computationally intensive and inaccurate, and the resulting scale matrix can be nonnegative definite. Try one of the other options for better results.

Example: Hessian="opg"
Data Types: char | string

Lower — Parameter lower bounds
[] (default) | numeric vector

Parameter lower bounds when computing the Hessian matrix (see Hessian), specified as a numParams-by-1 numeric vector.

Lower(j) specifies the lower bound of parameter theta(j), the first input argument of PriorMdl.ParamMap and PriorMdl.ParamDistribution.

The default value [] specifies no lower bounds.

Note Lower does not apply to posterior simulation. To apply parameter constraints on the posterior, code them in the log prior distribution function PriorMdl.ParamDistribution by setting the log prior of values outside the distribution support to -Inf.
Example: Lower=[0 -5 -1e7]
Data Types: double

Upper — Parameter upper bounds
[] (default) | numeric vector

Parameter upper bounds when computing the Hessian matrix (see Hessian), specified as a numParams-by-1 numeric vector.

Upper(j) specifies the upper bound of parameter theta(j), the first input argument of PriorMdl.ParamMap and PriorMdl.ParamDistribution.

The default value [] specifies no upper bounds.

Note Upper does not apply to posterior simulation. To apply parameter constraints on the posterior, code them in the log prior distribution function PriorMdl.ParamDistribution by setting the log prior of values outside the distribution support to -Inf.

Example: Upper=[5 100 1e7]
Data Types: double

Options — Optimization options
optimoptions optimization controller

Optimization options for the setting Hessian="optimizer", specified as an optimoptions optimization controller. Options replaces default optimization options of the optimizer. For details on altering default values of the optimizer, see the optimization controller optimoptions, the constrained optimization function fmincon, or the unconstrained optimization function fminunc in Optimization Toolbox.

For example, to change the constraint tolerance to 1e-6, set options = optimoptions(@fmincon,ConstraintTolerance=1e-6,Algorithm="sqp"). Then, pass Options by using Options=options.

By default, estimate uses the default options of the optimizer.

Simplex — Simplex search flag
true (default) | false

Simplex search flag to improve initial parameter values, specified as a value in this table.

Value    Description
true     Applies the simplex search method to improve initial parameter values for proposal optimization. For more details, see “fminsearch Algorithm”.
false    Does not apply the simplex search method.

estimate applies simplex search when the numerical optimization exit flag is not positive.

Example: Simplex=false
Data Types: logical
Display — Proposal tuning results display flag
true (default) | false

Proposal tuning results display flag, specified as a value in this table.

Value    Description
true     Displays tuning results
false    Suppresses tuning results display

Example: Display=false
Data Types: logical
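The notes for Lower and Upper say that constraints on the posterior belong in the log prior function, which should return -Inf outside the support. A minimal sketch of that idea follows, assuming a flat prior on a box; the bounds reuse the example values for Lower and Upper, and flatBoxLogPrior is a hypothetical name, not a toolbox function.

```matlab
% Assumed bounds for a three-parameter model (illustrative values)
lb = [0; -5; -1e7];
ub = [5; 100; 1e7];

% Flat log prior on the box [lb,ub]: 0 inside the box, -Inf outside
flatBoxLogPrior = @(theta) log(double(all(theta >= lb & theta <= ub)));

insideValue = flatBoxLogPrior([1; 0; 0])     % 0
outsideValue = flatBoxLogPrior([-1; 0; 0])   % -Inf
```

A handle of this form could serve as the second input to bssm. Because a proposal with log prior -Inf has zero posterior density, the Metropolis-Hastings sampler never accepts it.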
Output Arguments

PosteriorMdl — Posterior Bayesian state-space model
bssm model object

Posterior Bayesian state-space model, returned as a bssm model object.

The property PosteriorMdl.ParamDistribution is a numParams-by-NumDraws sample from the posterior distribution of Θ|y; estimate obtains the sample by using the Metropolis-Hastings sampler. PosteriorMdl.ParamDistribution(j,:) contains a NumDraws sample of the parameter theta(j), where theta corresponds to Θ, and it is the first input argument of PriorMdl.ParamMap and PriorMdl.ParamDistribution.

To obtain posterior draws of degrees of freedom parameters νu and νε, use simulate and specify an output function, using the OutputFunction name-value argument, that returns the draws after each MCMC iteration.

The properties PosteriorMdl.ParamMap and PriorMdl.ParamMap are equal.

estParams — Posterior means of Θ
numeric vector

Posterior means of Θ, returned as a numParams-by-1 numeric vector. estParams(j) contains the posterior mean of the parameter theta(j).

EstParamCov — Variance-covariance matrix of estimates of Θ
numeric matrix

Variance-covariance matrix of estimates of Θ, returned as a numParams-by-numParams numeric matrix. EstParamCov(j,k) contains the estimated covariance of the parameter estimates of theta(j) and theta(k).

Summary — Summary of all Bayesian estimators
table

Summary of all Bayesian estimators, returned as a table.

Rows of the table correspond to elements of Θ, final state values, and estimated state disturbance and observation innovation hyperparameters:

• For the first numParams rows of the table, row j contains the posterior estimation summary of theta(j).
• If the distribution of the state disturbances or observation innovations is non-Gaussian, the following statements hold:
  • Row numParams + k contains the posterior estimation summary of the final value of state k.
  • Rows after numParams + mT contain the posterior estimation summary of any estimated state disturbance hyperparameters, followed by any estimated observation innovation hyperparameters. mT is the number of states in the final period T.

Columns contain the posterior mean Mean, standard deviation Std, 5th percentile Quantile05, and 95th percentile Quantile95.
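The summary statistics described above are ordinary sample statistics of the posterior draws. The following sketch reproduces them for a synthetic draws matrix with base MATLAB functions. The draws are made up, and the nearest-rank quantile rule is an assumption; estimate may compute quantiles with a different convention.

```matlab
% Synthetic stand-in for a numParams-by-numDraws matrix of posterior draws
rng(1)                          % For reproducibility
numParams = 3;
numDraws = 1000;
draws = [0.5; -0.2; 1] + 0.1*randn(numParams,numDraws);

postMean = mean(draws,2);       % Analogue of estParams
postStd = std(draws,0,2);
postCov = cov(draws');          % Analogue of EstParamCov

% Nearest-rank empirical quantiles (an assumed convention)
sortedDraws = sort(draws,2);
q05 = sortedDraws(:,ceil(0.05*numDraws));
q95 = sortedDraws(:,ceil(0.95*numDraws));

Summary = table(postMean,postStd,q05,q95, ...
    VariableNames=["Mean" "Std" "Quantile05" "Quantile95"])
```

With a real model, the draws matrix would be PosteriorMdl.ParamDistribution, with one row per parameter.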
Algorithms

• estimate uses the tune function to create the proposal distribution for the Metropolis-Hastings sampler. You can tune the sampler by using the proposal tuning options on page 12-617.
• When estimate tunes the proposal distribution, the optimizer that estimate uses to search for the posterior mode before computing the Hessian matrix depends on your specifications.
  • estimate uses fminunc by default.
  • estimate uses fmincon when you perform constrained optimization by specifying the Lower or Upper name-value argument.
  • estimate further refines the solution returned by fminunc or fmincon by using fminsearch when Simplex=true, which is the default.
• This figure shows how estimate reduces the sample by using the values of NumDraws, Thin, and BurnIn. Rectangles represent successive draws from the distribution. estimate removes the white rectangles from the sample. The remaining NumDraws black rectangles compose the sample.
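The sample reduction can be written out as index arithmetic. The sketch below uses small assumed values of NumDraws, BurnIn, and Thin so that the retained positions are easy to read; it follows the rule stated for Thin (discard Thin - 1 draws after the burn-in, then retain the next draw) and is an illustration rather than the internal implementation.

```matlab
NumDraws = 5;   % Draws to retain
BurnIn = 4;     % Initial draws to discard
Thin = 3;       % Retain every third draw after the burn-in

% Total number of draws the sampler generates
totalDraws = BurnIn + NumDraws*Thin;            % 19

% After the burn-in, skip Thin - 1 draws, then keep the next one
keepIdx = BurnIn + Thin : Thin : totalDraws;    % 7 10 13 16 19

fullChain = 1:totalDraws;        % Stand-in for the full sequence of draws
retained = fullChain(keepIdx);
```

The retained sample always has NumDraws elements, so increasing Thin or BurnIn lengthens the run without changing the size of the output.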
Version History Introduced in R2022a
References [1] Hastings, Wilfred K. "Monte Carlo Sampling Methods Using Markov Chains and Their Applications." Biometrika 57 (April 1970): 97–109. https://doi.org/10.1093/biomet/57.1.97.
[2] Metropolis, Nicholas, Arianna W. Rosenbluth, Marshall N. Rosenbluth, Augusta H. Teller, and Edward Teller. "Equation of State Calculations by Fast Computing Machines." The Journal of Chemical Physics 21 (June 1953): 1087–92. https://doi.org/10.1063/1.1699114.
See Also

Objects
bssm

Functions
ssm2bssm | tune | simulate

Topics
“What Are State-Space Models?” on page 11-3
“What Is the Kalman Filter?” on page 11-7
“Analyze Linearized DSGE Models” on page 11-178
“Perform Outlier Detection Using Bayesian Non-Gaussian State-Space Models” on page 11-199
estimate Estimate posterior distribution of Bayesian vector autoregression (VAR) model parameters
Syntax

    PosteriorMdl = estimate(PriorMdl,Y)
    PosteriorMdl = estimate(PriorMdl,Y,Name,Value)
    [PosteriorMdl,Summary] = estimate( ___ )

Description

PosteriorMdl = estimate(PriorMdl,Y) returns the Bayesian VAR(p) model on page 12-642 PosteriorMdl that characterizes the joint posterior distributions of the coefficients Λ and innovations covariance matrix Σ. PriorMdl specifies the joint prior distribution of the parameters and the structure of the VAR model. Y is the multivariate response data. PriorMdl and PosteriorMdl might not be the same object type.

NaNs in the data indicate missing values, which estimate removes by using list-wise deletion.

PosteriorMdl = estimate(PriorMdl,Y,Name,Value) specifies additional options using one or more name-value pair arguments. For example, you can specify presample data to initialize the VAR model by using the 'Y0' name-value pair argument.

[PosteriorMdl,Summary] = estimate( ___ ) also returns an estimation summary of the posterior distribution Summary, using any of the input argument combinations in the previous syntaxes.
Examples

Estimate Posterior Distribution

Consider the 3-D VAR(4) model for the US inflation (INFL), unemployment (UNRATE), and federal funds (FEDFUNDS) rates.

\[
\begin{bmatrix} \mathrm{INFL}_t \\ \mathrm{UNRATE}_t \\ \mathrm{FEDFUNDS}_t \end{bmatrix}
= c + \sum_{j=1}^{4} \Phi_j
\begin{bmatrix} \mathrm{INFL}_{t-j} \\ \mathrm{UNRATE}_{t-j} \\ \mathrm{FEDFUNDS}_{t-j} \end{bmatrix}
+ \begin{bmatrix} \varepsilon_{1,t} \\ \varepsilon_{2,t} \\ \varepsilon_{3,t} \end{bmatrix}.
\]

For all t, εt is a series of independent 3-D normal innovations with a mean of 0 and covariance Σ. Assume that the joint prior distribution of the VAR model parameters ([Φ1, ..., Φ4, c]′, Σ) is diffuse.

Load and Preprocess Data

Load the US macroeconomic data set. Compute the inflation rate. Plot all response series.

    load Data_USEconModel
    seriesnames = ["INFL" "UNRATE" "FEDFUNDS"];
    DataTimeTable.INFL = 100*[NaN; price2ret(DataTimeTable.CPIAUCSL)];

    figure
    plot(DataTimeTable.Time,DataTimeTable{:,seriesnames})
    legend(seriesnames)
Stabilize the unemployment and federal funds rates by applying the first difference to each series.

    DataTimeTable.DUNRATE = [NaN; diff(DataTimeTable.UNRATE)];
    DataTimeTable.DFEDFUNDS = [NaN; diff(DataTimeTable.FEDFUNDS)];
    seriesnames(2:3) = "D" + seriesnames(2:3);

Remove all missing values from the data.

    rmDataTimeTable = rmmissing(DataTimeTable);
Create Prior Model

Create a diffuse Bayesian VAR(4) prior model for the three response series. Specify the response variable names.

    numseries = numel(seriesnames);
    numlags = 4;
    PriorMdl = bayesvarm(numseries,numlags,'SeriesNames',seriesnames)

    PriorMdl =
      diffusebvarm with properties:

            Description: "3-Dimensional VAR(4) Model"
              NumSeries: 3
                      P: 4
            SeriesNames: ["INFL"    "DUNRATE"    "DFEDFUNDS"]
        IncludeConstant: 1
           IncludeTrend: 0
          NumPredictors: 0
                     AR: {[3x3 double]  [3x3 double]  [3x3 double]  [3x3 double]}
               Constant: [3x1 double]
                  Trend: [3x0 double]
                   Beta: [3x0 double]
             Covariance: [3x3 double]
PriorMdl is a diffusebvarm model object.

Estimate Posterior Distribution

Estimate the posterior distribution by passing the prior model and entire data series to estimate.

    PosteriorMdl = estimate(PriorMdl,rmDataTimeTable{:,seriesnames})

    Bayesian VAR under diffuse priors
    Effective Sample Size:          197
    Number of equations:            3
    Number of estimated Parameters: 39
                 |   Mean     Std
    -------------------------------
     Constant(1) |  0.1007  0.0832
     Constant(2) | -0.0499  0.0450
     Constant(3) | -0.4221  0.1781
     AR{1}(1,1)  |  0.1241  0.0762
     AR{1}(2,1)  | -0.0219  0.0413
     AR{1}(3,1)  | -0.1586  0.1632
     AR{1}(1,2)  | -0.4809  0.1536
     AR{1}(2,2)  |  0.4716  0.0831
     AR{1}(3,2)  | -1.4368  0.3287
     AR{1}(1,3)  |  0.1005  0.0390
     AR{1}(2,3)  |  0.0391  0.0211
     AR{1}(3,3)  | -0.2905  0.0835
     AR{2}(1,1)  |  0.3236  0.0868
     AR{2}(2,1)  |  0.0913  0.0469
     AR{2}(3,1)  |  0.3403  0.1857
     AR{2}(1,2)  | -0.0503  0.1647
     AR{2}(2,2)  |  0.2414  0.0891
     AR{2}(3,2)  | -0.2968  0.3526
     AR{2}(1,3)  |  0.0450  0.0413
     AR{2}(2,3)  |  0.0536  0.0223
     AR{2}(3,3)  | -0.3117  0.0883
     AR{3}(1,1)  |  0.4272  0.0860
     AR{3}(2,1)  | -0.0389  0.0465
     AR{3}(3,1)  |  0.2848  0.1841
     AR{3}(1,2)  |  0.2738  0.1620
     AR{3}(2,2)  |  0.0552  0.0876
     AR{3}(3,2)  | -0.7401  0.3466
     AR{3}(1,3)  |  0.0523  0.0428
     AR{3}(2,3)  |  0.0008  0.0232
     AR{3}(3,3)  |  0.0028  0.0917
     AR{4}(1,1)  |  0.0167  0.0901
     AR{4}(2,1)  |  0.0285  0.0488
     AR{4}(3,1)  | -0.0690  0.1928
     AR{4}(1,2)  | -0.1830  0.1520
     AR{4}(2,2)  | -0.1795  0.0822
     AR{4}(3,2)  |  0.1494  0.3253
     AR{4}(1,3)  |  0.0067  0.0395
     AR{4}(2,3)  |  0.0088  0.0214
     AR{4}(3,3)  | -0.1372  0.0845

           Innovations Covariance Matrix
               |   INFL     DUNRATE  DFEDFUNDS
    -------------------------------------------
     INFL      |  0.3028   -0.0217    0.1579
               | (0.0321)  (0.0124)  (0.0499)
     DUNRATE   | -0.0217    0.0887   -0.1435
               | (0.0124)  (0.0094)  (0.0283)
     DFEDFUNDS |  0.1579   -0.1435    1.3872
               | (0.0499)  (0.0283)  (0.1470)

    PosteriorMdl =
      conjugatebvarm with properties:

            Description: "3-Dimensional VAR(4) Model"
              NumSeries: 3
                      P: 4
            SeriesNames: ["INFL"    "DUNRATE"    "DFEDFUNDS"]
        IncludeConstant: 1
           IncludeTrend: 0
          NumPredictors: 0
                     Mu: [39x1 double]
                      V: [13x13 double]
                  Omega: [3x3 double]
                    DoF: 184
                     AR: {[3x3 double]  [3x3 double]  [3x3 double]  [3x3 double]}
               Constant: [3x1 double]
                  Trend: [3x0 double]
                   Beta: [3x0 double]
             Covariance: [3x3 double]
PosteriorMdl is a conjugatebvarm model object; the posterior is analytically tractable. The command line displays the posterior means (Mean) and standard deviations (Std) of all coefficients and the innovations covariance matrix. Row AR{k}(i,j) contains the posterior estimates of ϕk,ij, the lag k AR coefficient of response variable j in response equation i. By default, estimate uses the first four observations as a presample to initialize the model.

Display the posterior means of the AR coefficient matrices by using dot notation.

    AR1 = PosteriorMdl.AR{1}

    AR1 = 3×3

        0.1241   -0.4809    0.1005
       -0.0219    0.4716    0.0391
       -0.1586   -1.4368   -0.2905

    AR2 = PosteriorMdl.AR{2}

    AR2 = 3×3

        0.3236   -0.0503    0.0450
        0.0913    0.2414    0.0536
        0.3403   -0.2968   -0.3117

    AR3 = PosteriorMdl.AR{3}

    AR3 = 3×3

        0.4272    0.2738    0.0523
       -0.0389    0.0552    0.0008
        0.2848   -0.7401    0.0028

    AR4 = PosteriorMdl.AR{4}

    AR4 = 3×3

        0.0167   -0.1830    0.0067
        0.0285   -0.1795    0.0088
       -0.0690    0.1494   -0.1372
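The parameter count in the estimation display and the sizes of the Mu and V properties follow from the model dimensions. A short worked sketch of the arithmetic is below; the variable names numconst, coeffsPerEquation, and numCoeffs are introduced here for illustration.

```matlab
numseries = 3;    % Response series
numlags = 4;      % AR lags
numconst = 1;     % One constant per equation (IncludeConstant is 1)

% Coefficients per equation: one per lagged response, plus the constant
coeffsPerEquation = numseries*numlags + numconst;    % 13
% Coefficients in the whole system
numCoeffs = numseries*coeffsPerEquation;             % 39
```

These values agree with the 39 estimated parameters in the display, the 39-by-1 size of Mu, and the 13-by-13 size of V.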
Specify Presample Data

Consider the 3-D VAR(4) model of “Estimate Posterior Distribution” on page 12-622. In this case, fit the model to the data starting in 1970.

Load and Preprocess Data

Load the US macroeconomic data set. Compute the inflation rate, stabilize the unemployment and federal funds rates, and remove missing values.

    load Data_USEconModel
    seriesnames = ["INFL" "UNRATE" "FEDFUNDS"];
    DataTimeTable.INFL = 100*[NaN; price2ret(DataTimeTable.CPIAUCSL)];
    DataTimeTable.DUNRATE = [NaN; diff(DataTimeTable.UNRATE)];
    DataTimeTable.DFEDFUNDS = [NaN; diff(DataTimeTable.FEDFUNDS)];
    seriesnames(2:3) = "D" + seriesnames(2:3);
    rmDataTimeTable = rmmissing(DataTimeTable);

Create Prior Model

Create a diffuse Bayesian VAR(4) prior model for the three response series. Specify the response variable names.

    numseries = numel(seriesnames);
    numlags = 4;
    PriorMdl = diffusebvarm(numseries,numlags,'SeriesNames',seriesnames);
Partition Time Base for Subsamples

A VAR(4) model requires p = 4 presample observations to initialize the AR component for estimation. Define index sets corresponding to the required presample and estimation samples.

    idxpre = rmDataTimeTable.Time < datetime('1970','InputFormat','yyyy'); % Presample indices
    idxest = ~idxpre;                                                      % Estimation sample indices
    T = sum(idxest)

    T = 157
The effective sample size is 157 observations. Estimate Posterior Distribution Estimate the posterior distribution. Specify only the required presample observations by using the 'Y0' name-value pair argument. Y0 = rmDataTimeTable{find(idxpre,PriorMdl.P,'last'),seriesnames}; PosteriorMdl = estimate(PriorMdl,rmDataTimeTable{idxest,seriesnames},... 'Y0',Y0); Bayesian VAR under diffuse priors Effective Sample Size: 157 Number of equations: 3 Number of estimated Parameters: 39 | Mean Std ------------------------------Constant(1) | 0.1431 0.1134 Constant(2) | -0.0132 0.0588 Constant(3) | -0.6864 0.2418 AR{1}(1,1) | 0.1314 0.0869 AR{1}(2,1) | -0.0187 0.0450 AR{1}(3,1) | -0.2009 0.1854 AR{1}(1,2) | -0.5009 0.1834 AR{1}(2,2) | 0.4881 0.0950 AR{1}(3,2) | -1.6913 0.3912 AR{1}(1,3) | 0.1089 0.0446 AR{1}(2,3) | 0.0555 0.0231 AR{1}(3,3) | -0.3588 0.0951 AR{2}(1,1) | 0.2942 0.1012 AR{2}(2,1) | 0.0786 0.0524 AR{2}(3,1) | 0.3767 0.2157 AR{2}(1,2) | 0.0208 0.2042 AR{2}(2,2) | 0.3238 0.1058 AR{2}(3,2) | -0.4530 0.4354 AR{2}(1,3) | 0.0634 0.0487 AR{2}(2,3) | 0.0747 0.0252 AR{2}(3,3) | -0.3594 0.1038 AR{3}(1,1) | 0.4503 0.1002 AR{3}(2,1) | -0.0388 0.0519 AR{3}(3,1) | 0.3580 0.2136 AR{3}(1,2) | 0.3119 0.2008 AR{3}(2,2) | 0.0966 0.1040 AR{3}(3,2) | -0.8212 0.4282 AR{3}(1,3) | 0.0659 0.0502 AR{3}(2,3) | 0.0155 0.0260 AR{3}(3,3) | -0.0269 0.1070
AR{4}(1,1) | -0.0141 0.1046 AR{4}(2,1) | 0.0105 0.0542 AR{4}(3,1) | 0.0263 0.2231 AR{4}(1,2) | -0.2274 0.1875 AR{4}(2,2) | -0.1734 0.0972 AR{4}(3,2) | 0.1328 0.3999 AR{4}(1,3) | 0.0028 0.0456 AR{4}(2,3) | 0.0094 0.0236 AR{4}(3,3) | -0.1487 0.0973 Innovations Covariance Matrix | INFL DUNRATE DFEDFUNDS ------------------------------------------INFL | 0.3597 -0.0333 0.1987 | (0.0433) (0.0161) (0.0672) DUNRATE | -0.0333 0.0966 -0.1647 | (0.0161) (0.0116) (0.0365) DFEDFUNDS | 0.1987 -0.1647 1.6355 | (0.0672) (0.0365) (0.1969)
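The partitioning step in this example can be sketched on a small synthetic time base using only base MATLAB. The dates and response values below are hypothetical; the point is how find(idxpre,p,'last') selects exactly the p most recent presample rows.

```matlab
p = 4;   % VAR order, so p presample rows are required

% Hypothetical quarterly time base: 1968Q1 through 1971Q4 (16 rows).
t = (datetime(1968,1,1):calquarters(1):datetime(1971,10,1))';
Y = [(1:16)' (101:116)'];           % hypothetical two-series response data

idxpre = t < datetime(1970,1,1);    % rows before 1970 are presample candidates
idxest = ~idxpre;                   % remaining rows form the estimation sample

rowsY0 = find(idxpre,p,'last');     % the p latest presample rows
Y0     = Y(rowsY0,:);               % presample data
Yest   = Y(idxest,:);               % estimation data
T      = sum(idxest);               % effective sample size
```

Because the presample is supplied separately, none of the estimation rows is consumed for initialization, which is why the effective sample size equals the full count of rows from 1970 onward.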
Display and Return Estimation Summaries Consider the 3-D VAR(4) model of “Estimate Posterior Distribution” on page 12-622. Load the US macroeconomic data set. Compute the inflation rate, stabilize the unemployment and federal funds rates, and remove missing values. load Data_USEconModel seriesnames = ["INFL" "UNRATE" "FEDFUNDS"]; DataTimeTable.INFL = 100*[NaN; price2ret(DataTimeTable.CPIAUCSL)]; DataTimeTable.DUNRATE = [NaN; diff(DataTimeTable.UNRATE)]; DataTimeTable.DFEDFUNDS = [NaN; diff(DataTimeTable.FEDFUNDS)]; seriesnames(2:3) = "D" + seriesnames(2:3); rmDataTimeTable = rmmissing(DataTimeTable);
Create a diffuse Bayesian VAR(4) prior model for the three response series. Specify the response variable names. numseries = numel(seriesnames); numlags = 4; PriorMdl = diffusebvarm(numseries,numlags,'SeriesNames',seriesnames);
You can display estimation output in three ways, or turn off the display. Compare the display types. estimate(PriorMdl,rmDataTimeTable{:,seriesnames}); % 'table', the default Bayesian VAR under diffuse priors Effective Sample Size: 197 Number of equations: 3 Number of estimated Parameters: 39 | Mean Std ------------------------------Constant(1) | 0.1007 0.0832 Constant(2) | -0.0499 0.0450 Constant(3) | -0.4221 0.1781
AR{1}(1,1) | 0.1241 0.0762 AR{1}(2,1) | -0.0219 0.0413 AR{1}(3,1) | -0.1586 0.1632 AR{1}(1,2) | -0.4809 0.1536 AR{1}(2,2) | 0.4716 0.0831 AR{1}(3,2) | -1.4368 0.3287 AR{1}(1,3) | 0.1005 0.0390 AR{1}(2,3) | 0.0391 0.0211 AR{1}(3,3) | -0.2905 0.0835 AR{2}(1,1) | 0.3236 0.0868 AR{2}(2,1) | 0.0913 0.0469 AR{2}(3,1) | 0.3403 0.1857 AR{2}(1,2) | -0.0503 0.1647 AR{2}(2,2) | 0.2414 0.0891 AR{2}(3,2) | -0.2968 0.3526 AR{2}(1,3) | 0.0450 0.0413 AR{2}(2,3) | 0.0536 0.0223 AR{2}(3,3) | -0.3117 0.0883 AR{3}(1,1) | 0.4272 0.0860 AR{3}(2,1) | -0.0389 0.0465 AR{3}(3,1) | 0.2848 0.1841 AR{3}(1,2) | 0.2738 0.1620 AR{3}(2,2) | 0.0552 0.0876 AR{3}(3,2) | -0.7401 0.3466 AR{3}(1,3) | 0.0523 0.0428 AR{3}(2,3) | 0.0008 0.0232 AR{3}(3,3) | 0.0028 0.0917 AR{4}(1,1) | 0.0167 0.0901 AR{4}(2,1) | 0.0285 0.0488 AR{4}(3,1) | -0.0690 0.1928 AR{4}(1,2) | -0.1830 0.1520 AR{4}(2,2) | -0.1795 0.0822 AR{4}(3,2) | 0.1494 0.3253 AR{4}(1,3) | 0.0067 0.0395 AR{4}(2,3) | 0.0088 0.0214 AR{4}(3,3) | -0.1372 0.0845 Innovations Covariance Matrix | INFL DUNRATE DFEDFUNDS ------------------------------------------INFL | 0.3028 -0.0217 0.1579 | (0.0321) (0.0124) (0.0499) DUNRATE | -0.0217 0.0887 -0.1435 | (0.0124) (0.0094) (0.0283) DFEDFUNDS | 0.1579 -0.1435 1.3872 | (0.0499) (0.0283) (0.1470) estimate(PriorMdl,rmDataTimeTable{:,seriesnames},... 'Display','equation'); Bayesian VAR under diffuse priors Effective Sample Size: 197 Number of equations: 3 Number of estimated Parameters: 39
VAR Equations
           |  INFL(-1)  DUNRATE(-1)  DFEDFUNDS(-1)  INFL(-2)  DUNRATE(-2)  DFEDFUNDS(-2)  INFL(-3)
---------------------------------------------------------------------------------------------------
 INFL      |   0.1241    -0.4809       0.1005        0.3236    -0.0503       0.0450        0.4272
           |  (0.0762)   (0.1536)     (0.0390)      (0.0868)   (0.1647)     (0.0413)      (0.0860)
 DUNRATE   |  -0.0219     0.4716       0.0391        0.0913     0.2414       0.0536       -0.0389
           |  (0.0413)   (0.0831)     (0.0211)      (0.0469)   (0.0891)     (0.0223)      (0.0465)
 DFEDFUNDS |  -0.1586    -1.4368      -0.2905        0.3403    -0.2968      -0.3117        0.2848
           |  (0.1632)   (0.3287)     (0.0835)      (0.1857)   (0.3526)     (0.0883)      (0.1841)
Innovations Covariance Matrix | INFL DUNRATE DFEDFUNDS ------------------------------------------INFL | 0.3028 -0.0217 0.1579 | (0.0321) (0.0124) (0.0499) DUNRATE | -0.0217 0.0887 -0.1435 | (0.0124) (0.0094) (0.0283) DFEDFUNDS | 0.1579 -0.1435 1.3872 | (0.0499) (0.0283) (0.1470) estimate(PriorMdl,rmDataTimeTable{:,seriesnames},... 'Display','matrix'); Bayesian VAR under diffuse priors Effective Sample Size: 197 Number of equations: 3 Number of estimated Parameters: 39 VAR Coefficient Matrix of Lag 1 | INFL(-1) DUNRATE(-1) DFEDFUNDS(-1) -------------------------------------------------INFL | 0.1241 -0.4809 0.1005 | (0.0762) (0.1536) (0.0390) DUNRATE | -0.0219 0.4716 0.0391 | (0.0413) (0.0831) (0.0211) DFEDFUNDS | -0.1586 -1.4368 -0.2905 | (0.1632) (0.3287) (0.0835) VAR Coefficient Matrix of Lag 2 | INFL(-2) DUNRATE(-2) DFEDFUNDS(-2) -------------------------------------------------INFL | 0.3236 -0.0503 0.0450 | (0.0868) (0.1647) (0.0413) DUNRATE | 0.0913 0.2414 0.0536 | (0.0469) (0.0891) (0.0223) DFEDFUNDS | 0.3403 -0.2968 -0.3117 | (0.1857) (0.3526) (0.0883) VAR Coefficient Matrix of Lag 3 | INFL(-3) DUNRATE(-3) DFEDFUNDS(-3) -------------------------------------------------INFL | 0.4272 0.2738 0.0523 | (0.0860) (0.1620) (0.0428) DUNRATE | -0.0389 0.0552 0.0008 | (0.0465) (0.0876) (0.0232) DFEDFUNDS | 0.2848 -0.7401 0.0028 | (0.1841) (0.3466) (0.0917) VAR Coefficient Matrix of Lag 4 | INFL(-4) DUNRATE(-4) DFEDFUNDS(-4) -------------------------------------------------INFL | 0.0167 -0.1830 0.0067 | (0.0901) (0.1520) (0.0395) DUNRATE | 0.0285 -0.1795 0.0088
           |  (0.0488)   (0.0822)   (0.0214)
 DFEDFUNDS |  -0.0690     0.1494    -0.1372
           |  (0.1928)   (0.3253)   (0.0845)
Constant Term
 INFL      |   0.1007
           |  (0.0832)
 DUNRATE   |  -0.0499
           |   0.0450
 DFEDFUNDS |  -0.4221
           |   0.1781
Innovations Covariance Matrix | INFL DUNRATE DFEDFUNDS ------------------------------------------INFL | 0.3028 -0.0217 0.1579 | (0.0321) (0.0124) (0.0499) DUNRATE | -0.0217 0.0887 -0.1435 | (0.0124) (0.0094) (0.0283) DFEDFUNDS | 0.1579 -0.1435 1.3872 | (0.0499) (0.0283) (0.1470)
Return the estimation summary, which is a structure that contains the same information regardless of display type. [PosteriorMdl,Summary] = estimate(PriorMdl,rmDataTimeTable{:,seriesnames}); Bayesian VAR under diffuse priors Effective Sample Size: 197 Number of equations: 3 Number of estimated Parameters: 39 | Mean Std ------------------------------Constant(1) | 0.1007 0.0832 Constant(2) | -0.0499 0.0450 Constant(3) | -0.4221 0.1781 AR{1}(1,1) | 0.1241 0.0762 AR{1}(2,1) | -0.0219 0.0413 AR{1}(3,1) | -0.1586 0.1632 AR{1}(1,2) | -0.4809 0.1536 AR{1}(2,2) | 0.4716 0.0831 AR{1}(3,2) | -1.4368 0.3287 AR{1}(1,3) | 0.1005 0.0390 AR{1}(2,3) | 0.0391 0.0211 AR{1}(3,3) | -0.2905 0.0835 AR{2}(1,1) | 0.3236 0.0868 AR{2}(2,1) | 0.0913 0.0469 AR{2}(3,1) | 0.3403 0.1857 AR{2}(1,2) | -0.0503 0.1647 AR{2}(2,2) | 0.2414 0.0891 AR{2}(3,2) | -0.2968 0.3526 AR{2}(1,3) | 0.0450 0.0413 AR{2}(2,3) | 0.0536 0.0223 AR{2}(3,3) | -0.3117 0.0883 AR{3}(1,1) | 0.4272 0.0860 AR{3}(2,1) | -0.0389 0.0465 AR{3}(3,1) | 0.2848 0.1841 AR{3}(1,2) | 0.2738 0.1620
AR{3}(2,2) | 0.0552 0.0876 AR{3}(3,2) | -0.7401 0.3466 AR{3}(1,3) | 0.0523 0.0428 AR{3}(2,3) | 0.0008 0.0232 AR{3}(3,3) | 0.0028 0.0917 AR{4}(1,1) | 0.0167 0.0901 AR{4}(2,1) | 0.0285 0.0488 AR{4}(3,1) | -0.0690 0.1928 AR{4}(1,2) | -0.1830 0.1520 AR{4}(2,2) | -0.1795 0.0822 AR{4}(3,2) | 0.1494 0.3253 AR{4}(1,3) | 0.0067 0.0395 AR{4}(2,3) | 0.0088 0.0214 AR{4}(3,3) | -0.1372 0.0845 Innovations Covariance Matrix | INFL DUNRATE DFEDFUNDS ------------------------------------------INFL | 0.3028 -0.0217 0.1579 | (0.0321) (0.0124) (0.0499) DUNRATE | -0.0217 0.0887 -0.1435 | (0.0124) (0.0094) (0.0283) DFEDFUNDS | 0.1579 -0.1435 1.3872 | (0.0499) (0.0283) (0.1470) Summary Summary = struct with fields: Description: "3-Dimensional VAR(4) Model" NumEstimatedParameters: 39 Table: [39x2 table] CoeffMap: [39x1 string] CoeffMean: [39x1 double] CoeffStd: [39x1 double] SigmaMean: [3x3 double] SigmaStd: [3x3 double]
The CoeffMap field contains a list of the coefficient names. The order of the names corresponds to the order of all coefficient vector inputs and outputs. Display CoeffMap. Summary.CoeffMap ans = 39x1 string "AR{1}(1,1)" "AR{1}(1,2)" "AR{1}(1,3)" "AR{2}(1,1)" "AR{2}(1,2)" "AR{2}(1,3)" "AR{3}(1,1)" "AR{3}(1,2)" "AR{3}(1,3)" "AR{4}(1,1)" "AR{4}(1,2)" "AR{4}(1,3)" "Constant(1)" "AR{1}(2,1)" "AR{1}(2,2)" "AR{1}(2,3)"
"AR{2}(2,1)" "AR{2}(2,2)" "AR{2}(2,3)" "AR{3}(2,1)" "AR{3}(2,2)" "AR{3}(2,3)" "AR{4}(2,1)" "AR{4}(2,2)" "AR{4}(2,3)" "Constant(2)" "AR{1}(3,1)" "AR{1}(3,2)" "AR{1}(3,3)" "AR{2}(3,1)" ⋮
Estimate Posterior with Sparse Coefficients
Consider the 3-D VAR(4) model of “Estimate Posterior Distribution” on page 12-622. In this example, create a normal conjugate prior model with a fixed innovations covariance matrix instead of a diffuse model. The model contains 39 coefficients. For coefficient sparsity in the posterior, apply the Minnesota regularization method during estimation.
Load the US macroeconomic data set. Compute the inflation rate, stabilize the unemployment and federal funds rates, and remove missing values.
load Data_USEconModel
seriesnames = ["INFL" "UNRATE" "FEDFUNDS"];
DataTimeTable.INFL = 100*[NaN; price2ret(DataTimeTable.CPIAUCSL)];
DataTimeTable.DUNRATE = [NaN; diff(DataTimeTable.UNRATE)];
DataTimeTable.DFEDFUNDS = [NaN; diff(DataTimeTable.FEDFUNDS)];
seriesnames(2:3) = "D" + seriesnames(2:3);
rmDataTimeTable = rmmissing(DataTimeTable);
Create a normal conjugate Bayesian VAR(4) prior model for the three response series. Specify the response variable names, and set the innovations covariance matrix
$$\Sigma = \begin{bmatrix} 10^{-4} & 0 & 10^{-3} \\ 0 & 0.1 & -0.2 \\ 10^{-3} & -0.2 & 1.6 \end{bmatrix}.$$
According to the Minnesota regularization method, specify the following:
• Each response is an AR(1) model, on average, with lag 1 coefficient 0.75.
• The prior self-lag coefficients have variance 100. This large variance setting allows the data to influence the posterior more than the prior.
• The prior cross-lag coefficients have variance 0.01. This small variance setting tightens the cross-lag coefficients to zero during estimation.
• Prior coefficient covariances decay with increasing lag at a rate of 10 (that is, lower lags are more important than higher lags).
numseries = numel(seriesnames); numlags = 4; Sigma = [10e-5 0 10e-4; 0 0.1 -0.2; 10e-4 -0.2 1.6]; PriorMdl = bayesvarm(numseries,numlags,'Model','normal','SeriesNames',seriesnames,... 'Center',0.75,'SelfLag',100,'CrossLag',0.01,'Decay',10,... 'Sigma',Sigma);
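To get a feel for what a decay rate of 10 implies, the following sketch tabulates prior variances by lag. It assumes a common Minnesota-style rule in which the variance at lag k is the lag 1 variance divided by k^Decay; this section does not state the exact scaling that bayesvarm applies, so treat the formula and the variable names as illustrative assumptions rather than the toolbox's definition.

```matlab
p        = 4;      % number of lags
selfLag  = 100;    % prior variance of self-lag coefficients at lag 1
crossLag = 0.01;   % prior variance of cross-lag coefficients at lag 1
decay    = 10;     % lag decay rate

k = (1:p)';

% ASSUMPTION: variance at lag k = (lag 1 variance) / k^decay.
selfVar  = selfLag  ./ k.^decay;
crossVar = crossLag ./ k.^decay;

disp(table(k,selfVar,crossVar))
```

Under this assumed rule, the lag 2 variances are already smaller than the lag 1 variances by a factor of 2^10 = 1024, which is one way to read the statement that lower lags are more important than higher lags.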
Estimate the posterior distribution, and display the posterior response equations. PosteriorMdl = estimate(PriorMdl,rmDataTimeTable{:,seriesnames},'Display','equation'); Bayesian VAR under normal priors and fixed Sigma Effective Sample Size: 197 Number of equations: 3 Number of estimated Parameters: 39
VAR Equations | INFL(-1) DUNRATE(-1) DFEDFUNDS(-1) INFL(-2) DUNRATE(-2) DFEDFUNDS(-2) INFL(-3) ------------------------------------------------------------------------------------------------INFL | 0.1234 -0.4373 0.1050 0.3343 -0.0342 0.0308 0.4441 | (0.0014) (0.0027) (0.0007) (0.0015) (0.0021) (0.0006) (0.0015) DUNRATE | 0.0521 0.3636 0.0125 0.0012 0.1720 0.0009 0.0000 | (0.0252) (0.0723) (0.0191) (0.0031) (0.0666) (0.0031) (0.0004) DFEDFUNDS | -0.0105 -0.1394 -0.1368 0.0002 -0.0000 -0.1227 0.0000 | (0.0749) (0.0948) (0.0713) (0.0031) (0.0031) (0.0633) (0.0004) Innovations Covariance Matrix | INFL DUNRATE DFEDFUNDS ---------------------------------------INFL | 0.0001 0 0.0010 | (0) (0) (0) DUNRATE | 0 0.1000 -0.2000 | (0) (0) (0) DFEDFUNDS | 0.0010 -0.2000 1.6000 | (0) (0) (0)
Compare the results to a posterior in which you specify no prior regularization. PriorMdlNoReg = bayesvarm(numseries,numlags,'Model','normal','SeriesNames',seriesnames,... 'Sigma',Sigma); PosteriorMdlNoReg = estimate(PriorMdlNoReg,rmDataTimeTable{:,seriesnames},'Display','equation'); Bayesian VAR under normal priors and fixed Sigma Effective Sample Size: 197 Number of equations: 3 Number of estimated Parameters: 39
VAR Equations | INFL(-1) DUNRATE(-1) DFEDFUNDS(-1) INFL(-2) DUNRATE(-2) DFEDFUNDS(-2) INFL(-3) ------------------------------------------------------------------------------------------------INFL | 0.1242 -0.4794 0.1007 0.3233 -0.0502 0.0450 0.4270 | (0.0014) (0.0028) (0.0007) (0.0016) (0.0030) (0.0007) (0.0016) DUNRATE | -0.0264 0.3428 0.0089 0.0969 0.1578 0.0292 0.0042 | (0.0347) (0.0714) (0.0203) (0.0356) (0.0714) (0.0203) (0.0337) DFEDFUNDS | -0.0351 -0.1248 -0.0411 0.0416 -0.0224 -0.1358 0.0014 | (0.0787) (0.0949) (0.0696) (0.0631) (0.0689) (0.0663) (0.0533) Innovations Covariance Matrix | INFL DUNRATE DFEDFUNDS ----------------------------------------
 INFL      |  0.0001    0         0.0010
           | (0)       (0)       (0)
 DUNRATE   |  0         0.1000   -0.2000
           | (0)       (0)       (0)
 DFEDFUNDS |  0.0010   -0.2000    1.6000
           | (0)       (0)       (0)
The posterior estimates of the Minnesota prior have lower magnitude, in general, compared to the estimates of the default normal conjugate prior model.
Adjust Posterior Sampling Options
Consider the 3-D VAR(4) model of “Estimate Posterior Distribution” on page 12-622. In this case, assume that the coefficients and innovations covariance matrix are independent (a semiconjugate prior model).
Load the US macroeconomic data set. Compute the inflation rate, stabilize the unemployment and federal funds rates, and remove missing values.
load Data_USEconModel
seriesnames = ["INFL" "UNRATE" "FEDFUNDS"];
DataTimeTable.INFL = 100*[NaN; price2ret(DataTimeTable.CPIAUCSL)];
DataTimeTable.DUNRATE = [NaN; diff(DataTimeTable.UNRATE)];
DataTimeTable.DFEDFUNDS = [NaN; diff(DataTimeTable.FEDFUNDS)];
seriesnames(2:3) = "D" + seriesnames(2:3);
rmDataTimeTable = rmmissing(DataTimeTable);
Create a semiconjugate Bayesian VAR(4) prior model for the three response series. Specify the response variable names. numseries = numel(seriesnames); numlags = 4; PriorMdl = bayesvarm(numseries,numlags,'Model','semiconjugate',... 'SeriesNames',seriesnames);
Because the joint posterior of a semiconjugate prior model is analytically intractable, estimate uses a Gibbs sampler to form the joint posterior by sampling from the tractable full conditionals. Estimate the posterior distribution. For the Gibbs sampler, specify an effective number of draws of 20,000, a burn-in period of 5000, and a thinning factor of 10. rng(1) % For reproducibility PosteriorMdl = estimate(PriorMdl,rmDataTimeTable{:,seriesnames},... 'Display','equation','NumDraws',20000,'Burnin',5000,'Thin',10); Bayesian VAR under semiconjugate priors Effective Sample Size: 197 Number of equations: 3 Number of estimated Parameters: 39
VAR Equations
           |  INFL(-1)  DUNRATE(-1)  DFEDFUNDS(-1)  INFL(-2)  DUNRATE(-2)  DFEDFUNDS(-2)  INFL(-3)
---------------------------------------------------------------------------------------------------
 INFL      |   0.2243    -0.0824       0.1365        0.2515    -0.0098       0.0329        0.2888
           |  (0.0662)   (0.0821)     (0.0319)      (0.0701)   (0.0636)     (0.0309)      (0.0662)
 DUNRATE   |  -0.0262     0.3666       0.0148        0.0929     0.1637       0.0336        0.0016
           |  (0.0342)   (0.0728)     (0.0197)      (0.0354)   (0.0713)     (0.0198)      (0.0334)
 DFEDFUNDS |  -0.0251    -0.1285      -0.0527        0.0379    -0.0256      -0.1452       -0.0040
           |  (0.0785)   (0.0962)     (0.0673)      (0.0630)   (0.0688)     (0.0643)      (0.0531)
Innovations Covariance Matrix | INFL DUNRATE DFEDFUNDS ------------------------------------------INFL | 0.2984 -0.0219 0.1754 | (0.0305) (0.0121) (0.0499) DUNRATE | -0.0219 0.0890 -0.1496 | (0.0121) (0.0092) (0.0292) DFEDFUNDS | 0.1754 -0.1496 1.4754 | (0.0499) (0.0292) (0.1506)
PosteriorMdl is an empiricalbvarm model represented by draws from the full conditionals. After removing the first burn-in period draws and thinning the remaining draws by keeping every 10th draw, estimate stores the draws in the CoeffDraws and SigmaDraws properties.
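The sampler settings in this example imply a much larger raw Monte Carlo sample than the 20,000 draws that are kept. A minimal sketch of the arithmetic, using the relation BurnIn + NumDraws*Thin given in the NumDraws argument description (the byte count additionally assumes double-precision storage of the 39 coefficients for each retained draw):

```matlab
numDraws = 20000;   % retained draws
burnIn   = 5000;    % initial draws discarded
thin     = 10;      % keep one draw out of every 10 after burn-in

totalDraws = burnIn + numDraws*thin;   % raw draws generated by the sampler
fracKept   = numDraws/totalDraws;      % share of raw draws that is retained

% ASSUMPTION: 39 coefficients stored as 8-byte doubles per retained draw.
coeffBytes = 39*numDraws*8;

fprintf('Raw draws: %d, retained: %d (%.1f%%)\n', ...
    totalDraws,numDraws,100*fracKept);
```

Raising Thin therefore increases run time roughly in proportion, while the storage needed for the retained draws depends only on NumDraws.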
Estimate VARX Model
Consider the 2-D VARX(1) model for the US real GDP (RGDP) and investment (GCE) rates that treats the personal consumption (PCEC) rate as exogenous:
$$\begin{bmatrix} \mathrm{RGDP}_t \\ \mathrm{GCE}_t \end{bmatrix} = c + \Phi \begin{bmatrix} \mathrm{RGDP}_{t-1} \\ \mathrm{GCE}_{t-1} \end{bmatrix} + \mathrm{PCEC}_t\,\beta + \varepsilon_t.$$
For all t, $\varepsilon_t$ is a series of independent 2-D normal innovations with a mean of 0 and covariance Σ. Assume the following prior distributions:
• $\begin{bmatrix} \Phi & c & \beta \end{bmatrix}' \mid \Sigma \sim N_{4 \times 2}(M, V, \Sigma)$, where M is a 4-by-2 matrix of means and V is the 4-by-4 among-coefficient scale matrix. Equivalently, $\mathrm{vec}\left(\begin{bmatrix} \Phi & c & \beta \end{bmatrix}'\right) \mid \Sigma \sim N_8(\mathrm{vec}(M), \Sigma \otimes V)$.
• $\Sigma \sim \text{Inverse Wishart}(\Omega, \nu)$, where Ω is the 2-by-2 scale matrix and ν is the degrees of freedom.
Load the US macroeconomic data set. Compute the real GDP, investment, and personal consumption rate series. Remove all missing values from the resulting series.
load Data_USEconModel
DataTimeTable.RGDP = DataTimeTable.GDP./DataTimeTable.GDPDEF;
seriesnames = ["PCEC"; "RGDP"; "GCE"];
rates = varfun(@price2ret,DataTimeTable,'InputVariables',seriesnames);
rates = rmmissing(rates);
rates.Properties.VariableNames = seriesnames;
Create a conjugate prior model for the 2-D VARX(1) model parameters. numseries = 2; numlags = 1; numpredictors = 1; PriorMdl = conjugatebvarm(numseries,numlags,'NumPredictors',numpredictors,... 'SeriesNames',seriesnames(2:end));
Estimate the posterior distribution. Specify the exogenous predictor data.
PosteriorMdl = estimate(PriorMdl,rates{:,2:end},... 'X',rates{:,1},'Display','equation'); Bayesian VAR under conjugate priors Effective Sample Size: 247 Number of equations: 2 Number of estimated Parameters: 8 VAR Equations | RGDP(-1) GCE(-1) Constant X1 ----------------------------------------------RGDP | 0.0083 -0.0027 0.0078 0.0105 | (0.0625) (0.0606) (0.0043) (0.0625) GCE | 0.0059 0.0477 0.0166 0.0058 | (0.0644) (0.0624) (0.0044) (0.0645) Innovations Covariance Matrix | RGDP GCE --------------------------RGDP | 0.0040 0.0000 | (0.0004) (0.0003) GCE | 0.0000 0.0043 | (0.0003) (0.0004)
By default, estimate uses the first p = 1 observations in the specified response data as a presample, and it removes the corresponding observations in the predictor data from the sample. The posterior means (and standard deviations) of the regression coefficients appear below the X1 column of the estimation summary table.
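The default presample handling described above can be sketched with plain indexing. The data below are hypothetical, and the code only imitates the documented behavior (the first p response rows become the presample, and the predictor rows are aligned with the remaining observations); it is not the toolbox's internal implementation.

```matlab
p      = 1;                      % VAR order
numobs = 10;                     % hypothetical number of supplied rows
Y      = [(1:numobs)' (11:20)']; % hypothetical two-series response data
X      = (101:110)';             % hypothetical single predictor

Y0   = Y(1:p,:);                 % default presample: first p response rows
Yest = Y(p+1:end,:);             % rows that enter the likelihood
T    = size(Yest,1);             % effective sample size, numobs - p

% Keep only the latest T predictor rows so X lines up with Yest.
Xest = X(end-T+1:end,:);
```

Supplying the presample through 'Y0' instead leaves all numobs rows of Y available for estimation.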
Input Arguments
PriorMdl — Prior Bayesian VAR model
conjugatebvarm model object | semiconjugatebvarm model object | diffusebvarm model object | normalbvarm model object
Prior Bayesian VAR model, specified as a model object in this table.
Model Object | Description
conjugatebvarm | Dependent, matrix-normal-inverse-Wishart conjugate model returned by bayesvarm, conjugatebvarm, or estimate
semiconjugatebvarm | Independent, normal-inverse-Wishart semiconjugate prior model returned by bayesvarm or semiconjugatebvarm
diffusebvarm | Diffuse prior model returned by bayesvarm or diffusebvarm
normalbvarm | Normal conjugate model with a fixed innovations covariance matrix, returned by bayesvarm, normalbvarm, or estimate
PriorMdl can also represent a joint posterior model returned by estimate, either a conjugatebvarm or normalbvarm model object. In this case, estimate updates the joint posterior distribution using the new observations.
For a semiconjugatebvarm model, estimate uses a Gibbs sampler to estimate the posterior distribution. To tune the sampler, see “Options for Semiconjugate Prior Distributions.”
Y — Observed multivariate response series numeric matrix Observed multivariate response series to which estimate fits the model, specified as a numobs-bynumseries numeric matrix. numobs is the sample size. numseries is the number of response variables (PriorMdl.NumSeries). Rows correspond to observations, and the last row contains the latest observation. Columns correspond to individual response variables. Y represents the continuation of the presample response series in Y0. Data Types: double Name-Value Pair Arguments Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter. Before R2021a, use commas to separate each name and value, and enclose Name in quotes. Example: 'Y0',Y0,'Display','off' specifies the presample data Y0 and suppresses the estimation display. Options for All Prior Distributions
Y0 — Presample response data
numeric matrix
Presample response data to initialize the VAR model for estimation, specified as the comma-separated pair consisting of 'Y0' and a numpreobs-by-numseries numeric matrix. numpreobs is the number of presample observations.
Rows correspond to presample observations, and the last row contains the latest observation. Y0 must have at least PriorMdl.P rows. If you supply more rows than necessary, estimate uses the latest PriorMdl.P observations only.
Columns must correspond to the response series in Y.
By default, estimate uses Y(1:PriorMdl.P,:) as presample observations, and then estimates the posterior using Y((PriorMdl.P + 1):end,:). This action reduces the effective sample size.
Data Types: double
X — Predictor data
numeric matrix
Predictor data for the exogenous regression component in the model, specified as the comma-separated pair consisting of 'X' and a numobs-by-PriorMdl.NumPredictors numeric matrix.
Rows correspond to observations, and the last row contains the latest observation. estimate does not use the regression component in the presample period. X must have at least as many observations as the observations used after the presample period.
• If you specify Y0, then X must have at least numobs rows (see Y).
• Otherwise, X must have at least numobs – PriorMdl.P observations to account for the presample removal.
In either case, if you supply more rows than necessary, estimate uses the latest observations only.
Columns correspond to individual predictor variables. All predictor variables are present in the regression component of each response equation.
Data Types: double
Display — Estimation display style
'table' (default) | 'off' | 'equation' | 'matrix'
Estimation display style printed to the command line, specified as the comma-separated pair consisting of 'Display' and a value in this table.
Value | Description
'off' | estimate does not print to the command line.
'table' | estimate prints the following:
  • Estimation information
  • Tabular summary of coefficient posterior means and standard deviations; each row corresponds to a coefficient, and each column corresponds to an estimate type
  • Posterior mean of the innovations covariance matrix with standard deviations in parentheses
'equation' | estimate prints the following:
  • Estimation information
  • Tabular summary of posterior means and standard deviations; each row corresponds to a response variable in the system, and each column corresponds to a coefficient in the equation (for example, the column labeled Y1(-1) contains the estimates of the lag 1 coefficient of the first response variable in each equation)
  • Posterior mean of the innovations covariance matrix with standard deviations in parentheses
'matrix' | estimate prints the following:
  • Estimation information
  • Separate tabular displays of posterior means and standard deviations (in parentheses) for each parameter in the model Φ1,…,Φp, c, δ, Β, and Σ
The estimation information includes the effective sample size, the number of equations in the system, and the number of estimated parameters.
Example: 'Display','matrix'
Data Types: char | string
Options for Semiconjugate Prior Distributions
NumDraws — Monte Carlo simulation adjusted sample size
1e5 (default) | positive integer
Monte Carlo simulation adjusted sample size, specified as the comma-separated pair consisting of 'NumDraws' and a positive integer. estimate actually draws BurnIn + NumDraws*Thin samples, but bases the estimates on NumDraws samples. For details on how estimate reduces the full Monte Carlo sample, see “Algorithms” on page 12-643.
Example: 'NumDraws',1e7
Data Types: double
BurnIn — Number of draws to remove from beginning of Monte Carlo sample
5000 (default) | nonnegative scalar
Number of draws to remove from the beginning of the Monte Carlo sample to reduce transient effects, specified as the comma-separated pair consisting of 'BurnIn' and a nonnegative scalar. For details on how estimate reduces the full Monte Carlo sample, see “Algorithms” on page 12-643.
Tip To help you specify the appropriate burn-in period size:
1 Determine the extent of the transient behavior in the sample by setting the BurnIn name-value argument to 0.
2 Simulate a few thousand observations by using simulate.
3 Create trace plots.
Example: 'BurnIn',0
Data Types: double
Thin — Adjusted sample size multiplier
1 (default) | positive integer
Adjusted sample size multiplier, specified as the comma-separated pair consisting of 'Thin' and a positive integer.
The actual Monte Carlo sample size is BurnIn + NumDraws*Thin. After discarding the burn-in, estimate discards every Thin – 1 draws, and then retains the next. For details on how estimate reduces the full Monte Carlo sample, see “Algorithms” on page 12-643.
Tip To reduce potential large serial correlation in the Monte Carlo sample, or to reduce the memory consumption of the draws stored in PosteriorMdl, specify a large value for Thin.
Example: 'Thin',5
Data Types: double
Coeff0 — Starting values of VAR model coefficients for Gibbs sampler
numeric column vector
Starting values of the VAR model coefficients for the Gibbs sampler, specified as the comma-separated pair consisting of 'Coeff0' and a numel(PriorMdl.Mu)-by-1 numeric column vector. Elements correspond to the elements of PriorMdl.Mu (see Mu).
By default, Coeff0 is the ordinary least-squares (OLS) estimate.
Tip
• Create Coeff0 by vertically stacking the transpose of all initial coefficients in the following order (skip coefficients not in the model):
  1 All coefficient matrices ordered by lag
  2 Constant vector
  3 Linear time trend vector
  4 Exogenous regression coefficient matrix
  Specify the vectorized result Coeff0(:).
• A good practice is to run estimate multiple times using different parameter starting values. Verify that the solutions from each run converge to similar values.
Data Types: double
Sigma0 — Starting values of innovations covariance matrix for Gibbs sampler
numeric positive definite matrix
Starting values of the innovations covariance matrix for the Gibbs sampler, specified as the comma-separated pair consisting of 'Sigma0' and a numeric positive definite matrix. Rows and columns correspond to response equations.
By default, Sigma0 is the OLS residual mean squared error.
Tip A good practice is to run estimate multiple times using different parameter starting values. Verify that the solutions from each run converge to similar values.
Data Types: double
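The stacking rule in the Coeff0 tip can be sketched for a small hypothetical model. The example below assumes a 2-D VAR(2) with a constant and no trend or predictors; the coefficient values are made up for illustration.

```matlab
% Hypothetical starting values for a 2-D VAR(2) model with a constant.
Phi1 = [0.5 0.1; 0.2 0.4];    % lag 1 AR coefficient matrix
Phi2 = [0.05 0; 0 0.05];      % lag 2 AR coefficient matrix
c    = [1; 2];                % constant vector

% Stack the transposes in the documented order: AR matrices by lag, then
% the constant (trend and regression terms are absent from this model).
Lambda0 = [Phi1'; Phi2'; c']; % (m*p + 1)-by-m matrix

% Vectorize column by column to obtain the starting-value vector.
Coeff0 = Lambda0(:);
disp(Coeff0')
```

Each column of Lambda0 holds the coefficients of one response equation, so Coeff0 lists all coefficients of the first equation, then all coefficients of the second. Sigma0 can be checked for positive definiteness beforehand, for instance with chol.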
Output Arguments
PosteriorMdl — Posterior Bayesian VAR model
conjugatebvarm model object | normalbvarm model object | empiricalbvarm model object
Posterior Bayesian VAR model, returned as a model object in the table.
Model Object | PriorMdl | Posterior Form
conjugatebvarm | conjugatebvarm or diffusebvarm | Analytically tractable
normalbvarm | normalbvarm | Analytically tractable
empiricalbvarm | semiconjugatebvarm | Analytically intractable
Summary — Summary of Bayesian estimators
structure array
Summary of Bayesian estimators, returned as a structure array containing the fields in this table.
Field | Description | Data Type
Description | Model description | String scalar
NumEstimatedParameters | Number of estimated coefficients | Numeric scalar
Table | Table of coefficient posterior means and standard deviations; each row corresponds to a coefficient, and each column corresponds to an estimate type | Table
CoeffMap | Coefficient names | String vector
CoeffMean | Coefficient posterior means | Numeric vector; rows correspond to CoeffMap
CoeffStd | Coefficient posterior standard deviations | Numeric vector; rows correspond to CoeffMap
SigmaMean | Innovations covariance posterior mean matrix | Numeric matrix; rows and columns correspond to response equations
SigmaStd | Innovations covariance posterior standard deviation matrix | Numeric matrix; rows and columns correspond to response equations
Alternatively, pass PosteriorMdl to summarize to obtain a summary of Bayesian estimators.
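The CoeffMap display in the earlier example lists coefficients equation by equation: all AR coefficients of the first equation ordered by lag, then its constant, and so on. The following sketch regenerates names in that order for a 3-D VAR(4) model with a constant. It is an illustration of the ordering only, not the toolbox's own code, and the variable names are hypothetical.

```matlab
m = 3;   % number of response series
p = 4;   % number of lags

names = strings(m*(m*p + 1),1);
idx = 0;
for j = 1:m                 % response equation
    for k = 1:p             % lag
        for i = 1:m         % lagged response variable
            idx = idx + 1;
            names(idx) = sprintf("AR{%d}(%d,%d)",k,j,i);
        end
    end
    idx = idx + 1;
    names(idx) = sprintf("Constant(%d)",j);   % constant closes each equation
end
disp(names(1:14))
```

With this ordering, element n of CoeffMean and CoeffStd refers to the coefficient named in element n of the list.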
More About
Bayesian Vector Autoregression (VAR) Model
A Bayesian VAR model treats all coefficients and the innovations covariance matrix as random variables in the m-dimensional, stationary VARX(p) model. The model has one of the three forms described in this table.
Model | Equation
Reduced-form VAR(p) in difference-equation notation | $y_t = \Phi_1 y_{t-1} + \dots + \Phi_p y_{t-p} + c + \delta t + B x_t + \varepsilon_t.$
Multivariate regression | $y_t = Z_t \lambda + \varepsilon_t.$
Matrix regression | $y_t = \Lambda' z_t' + \varepsilon_t.$
For each time t = 1,...,T:
• $y_t$ is the m-dimensional observed response vector, where m = numseries.
• $\Phi_1,\dots,\Phi_p$ are the m-by-m AR coefficient matrices of lags 1 through p, where p = numlags.
• c is the m-by-1 vector of model constants if IncludeConstant is true.
• δ is the m-by-1 vector of linear time trend coefficients if IncludeTrend is true.
• B is the m-by-r matrix of regression coefficients of the r-by-1 vector of observed exogenous predictors $x_t$, where r = NumPredictors. All predictor variables appear in each equation.
• $z_t = \begin{bmatrix} y_{t-1}' & y_{t-2}' & \cdots & y_{t-p}' & 1 & t & x_t' \end{bmatrix}$, which is a 1-by-(mp + r + 2) vector, and $Z_t$ is the m-by-m(mp + r + 2) block diagonal matrix
$$Z_t = \begin{bmatrix} z_t & 0_z & \cdots & 0_z \\ 0_z & z_t & \cdots & 0_z \\ \vdots & \vdots & \ddots & \vdots \\ 0_z & 0_z & 0_z & z_t \end{bmatrix},$$
where $0_z$ is a 1-by-(mp + r + 2) vector of zeros.
• $\Lambda = \begin{bmatrix} \Phi_1 & \Phi_2 & \cdots & \Phi_p & c & \delta & B \end{bmatrix}'$, which is an (mp + r + 2)-by-m random matrix of the coefficients, and the m(mp + r + 2)-by-1 vector λ = vec(Λ).
• $\varepsilon_t$ is an m-by-1 vector of random, serially uncorrelated, multivariate normal innovations with the zero vector for the mean and the m-by-m matrix Σ for the covariance. This assumption implies that the data likelihood is
$$\ell(\Lambda, \Sigma \mid y, x) = \prod_{t=1}^{T} f(y_t; \Lambda, \Sigma, z_t),$$
where f is the m-dimensional multivariate normal density with mean $z_t \Lambda$ and covariance Σ, evaluated at $y_t$.
Before considering the data, you impose a joint prior distribution assumption on (Λ,Σ), which is governed by the distribution π(Λ,Σ). In a Bayesian analysis, the distribution of the parameters is updated with information about the parameters obtained from the data likelihood. The result is the joint posterior distribution π(Λ,Σ|Y,X,Y0), where:
• Y is a T-by-m matrix containing the entire response series {yt}, t = 1,…,T.
• X is a T-by-r matrix containing the entire exogenous series {xt}, t = 1,…,T.
• Y0 is a p-by-m matrix of presample data used to initialize the VAR model for estimation.
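The multivariate regression and matrix regression forms describe the same conditional mean. A short numerical check, using randomly generated hypothetical values and representing the block diagonal matrix Z_t as a Kronecker product, illustrates the equivalence:

```matlab
rng(0)                       % for reproducibility of the hypothetical values
m = 2; p = 2; r = 1;         % dimensions chosen for illustration
k = m*p + r + 2;             % length of z_t

Lambda = randn(k,m);         % hypothetical coefficient matrix
zt     = randn(1,k);         % hypothetical regressor row vector z_t

Zt     = kron(eye(m),zt);    % m-by-m*k block diagonal matrix with z_t blocks
lambda = Lambda(:);          % lambda = vec(Lambda)

meanRegression = Zt*lambda;  % Z_t*lambda (multivariate regression form)
meanMatrix     = Lambda'*zt';% Lambda'*z_t' (matrix regression form)

disp(max(abs(meanRegression - meanMatrix)))
```

Row j of Z_t carries z_t in its jth block, so multiplying by vec(Λ) picks out column j of Λ, which is exactly the jth element of Λ′z_t′.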
Tips • Monte Carlo simulation is subject to variation. If estimate uses Monte Carlo simulation, then estimates and inferences might vary when you call estimate multiple times under seemingly equivalent conditions. To reproduce estimation results, set a random number seed by using rng before calling estimate.
Algorithms • Whenever the prior distribution PriorMdl and the data likelihood yield an analytically tractable posterior distribution, estimate evaluates the closed-form solutions to Bayes estimators. Otherwise, estimate uses the Gibbs sampler to estimate the posterior. • This figure illustrates how estimate reduces the Monte Carlo sample using the values of NumDraws, Thin, and BurnIn. Rectangles represent successive draws from the distribution. estimate removes the white rectangles from the Monte Carlo sample. The remaining NumDraws black rectangles compose the Monte Carlo sample.
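The reduction described above can be sketched with a short index computation. It follows the rule stated for the Thin argument (after the burn-in, discard Thin – 1 draws and retain the next); the small values are chosen only so the retained positions are easy to read, and the code is an illustration rather than the function's internal implementation.

```matlab
burnIn   = 3;   % draws removed from the start
thin     = 2;   % retain every 2nd draw after the burn-in
numDraws = 4;   % draws that remain in the final sample

total = burnIn + numDraws*thin;       % size of the full Monte Carlo sample
draws = 1:total;                      % positions of the successive draws

kept = draws(burnIn+thin:thin:end);   % positions that survive the reduction
disp(kept)
```

In this sketch the full sample has 11 draws, and the retained draws sit at positions 5, 7, 9, and 11.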
Version History Introduced in R2020a
See Also Objects normalbvarm | conjugatebvarm | semiconjugatebvarm | diffusebvarm | empiricalbvarm Functions summarize | bayesvarm
estimate Fit conditional variance model to data
Syntax EstMdl = estimate(Mdl,y) EstMdl = estimate(Mdl,y,Name,Value) [EstMdl,EstParamCov,logL,info] = estimate( ___ )
Description
EstMdl = estimate(Mdl,y) estimates the unknown parameters of the conditional variance model object Mdl with the observed univariate time series y, using maximum likelihood. EstMdl is a fully specified conditional variance model object that stores the results. It is the same model type as Mdl (see garch, egarch, and gjr).
EstMdl = estimate(Mdl,y,Name,Value) estimates the conditional variance model with additional options specified by one or more Name,Value pair arguments. For example, you can display iterative optimization information or specify presample innovations.
[EstMdl,EstParamCov,logL,info] = estimate( ___ ) additionally returns the following, using any of the input arguments in the previous syntaxes:
• EstParamCov, the variance-covariance matrix associated with estimated parameters.
• logL, the optimized loglikelihood objective function value.
• info, a data structure of summary information.
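A parameter covariance matrix such as EstParamCov is commonly turned into standard errors and test statistics as sketched below. The estimates and covariance values are hypothetical, and the p-value line assumes a standard normal reference distribution; whether the estimation display uses exactly this convention is an assumption.

```matlab
% Hypothetical parameter estimates and covariance matrix (illustration only).
theta       = [1e-4; 0.45; 0.26];
EstParamCov = diag([3e-5 0.11 0.057].^2);

se     = sqrt(diag(EstParamCov));     % standard errors from the diagonal
tStat  = theta./se;                   % ratio of estimate to standard error

% ASSUMPTION: two-sided p-value under a standard normal reference.
pValue = erfc(abs(tStat)/sqrt(2));

disp(table(theta,se,tStat,pValue))
```

Only the diagonal of the covariance matrix is needed for standard errors; the off-diagonal elements matter when testing joint restrictions on several parameters.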
Examples
Estimate GARCH Model Parameters Without Initial Values
Fit a GARCH(1,1) model to simulated data.
Simulate 500 data points from the GARCH(1,1) model $y_t = \varepsilon_t$, where $\varepsilon_t = \sigma_t z_t$ and
$$\sigma_t^2 = 0.0001 + 0.5\,\sigma_{t-1}^2 + 0.2\,\varepsilon_{t-1}^2.$$
Use the default Gaussian innovation distribution for $z_t$.
    Mdl0 = garch('Constant',0.0001,'GARCH',0.5,...
        'ARCH',0.2);
    rng default; % For reproducibility
    [v,y] = simulate(Mdl0,500);
The output v contains simulated conditional variances. y is a column vector of simulated responses (innovations).
Specify a GARCH(1,1) model with unknown coefficients, and fit it to the series y.
    Mdl = garch(1,1);
    EstMdl = estimate(Mdl,y)

    GARCH(1,1) Conditional Variance Model (Gaussian Distribution):

                  Value        StandardError    TStatistic      PValue
                __________     _____________    __________    __________
    Constant    9.8911e-05      3.0726e-05        3.2191        0.001286
    GARCH{1}       0.45394         0.11193        4.0557      4.9987e-05
    ARCH{1}        0.26374        0.056931        4.6326      3.6111e-06

    EstMdl =
      garch with properties:

         Description: "GARCH(1,1) Conditional Variance Model (Gaussian Distribution)"
        Distribution: Name = "Gaussian"
                   P: 1
                   Q: 1
            Constant: 9.89108e-05
               GARCH: {0.453935} at lag [1]
                ARCH: {0.263739} at lag [1]
              Offset: 0
The result is a new garch model called EstMdl. The parameter estimates in EstMdl resemble the parameter values that generated the simulated data.
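One way to quantify how closely the estimates resemble the generating values is to compare the implied unconditional variances. The following is a minimal sketch that uses only the numbers shown in this example; the variable names (trueParams, estParams, uncondVar) are illustrative, and the formula constant/(1 − GARCH − ARCH) is the standard unconditional variance of a stationary GARCH(1,1) process.

```matlab
% Values copied from the example: generating parameters and displayed estimates.
trueParams = [0.0001 0.5 0.2];              % [Constant GARCH ARCH]
estParams  = [9.8911e-05 0.45394 0.26374];  % [Constant GARCH ARCH]

% Unconditional variance of a stationary GARCH(1,1) process:
% constant / (1 - GARCH coefficient - ARCH coefficient).
uncondVar = @(p) p(1)/(1 - p(2) - p(3));

trueVar = uncondVar(trueParams);
estVar  = uncondVar(estParams);
relDiff = abs(estVar - trueVar)/trueVar;    % relative difference

fprintf('True: %.4e  Estimated: %.4e  Relative difference: %.1f%%\n', ...
    trueVar, estVar, 100*relDiff);
```

The same comparison does not carry over unchanged to EGARCH or GJR models, whose long-run variances depend on additional terms.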
Estimate EGARCH Model Parameters Without Initial Values
Fit an EGARCH(1,1) model to simulated data.
Simulate 500 data points from an EGARCH(1,1) model $y_t = \varepsilon_t$, where $\varepsilon_t = \sigma_t z_t$, and
$$\log\sigma_t^2 = 0.001 + 0.7\log\sigma_{t-1}^2 + 0.5\left[\frac{|\varepsilon_{t-1}|}{\sigma_{t-1}} - \sqrt{\frac{2}{\pi}}\right] - 0.3\left(\frac{\varepsilon_{t-1}}{\sigma_{t-1}}\right)$$
(the distribution of $z_t$ is Gaussian).
    Mdl0 = egarch('Constant',0.001,'GARCH',0.7,...
        'ARCH',0.5,'Leverage',-0.3);
    rng default % For reproducibility
    [v,y] = simulate(Mdl0,500);
The output v contains simulated conditional variances. y is a column vector of simulated responses (innovations).
Specify an EGARCH(1,1) model with unknown coefficients, and fit it to the series y.
    Mdl = egarch(1,1);
    EstMdl = estimate(Mdl,y)

    EGARCH(1,1) Conditional Variance Model (Gaussian Distribution):

                      Value         StandardError    TStatistic      PValue
                   ___________      _____________    __________    __________
    Constant       -0.00063867        0.031698       -0.020149        0.98392
    GARCH{1}           0.70506        0.067359          10.467     1.2221e-25
    ARCH{1}            0.56774        0.074746          7.5956      3.063e-14
    Leverage{1}       -0.32116        0.053345         -6.0204     1.7398e-09

    EstMdl =
      egarch with properties:

         Description: "EGARCH(1,1) Conditional Variance Model (Gaussian Distribution)"
        Distribution: Name = "Gaussian"
                   P: 1
                   Q: 1
            Constant: -0.00063867
               GARCH: {0.705065} at lag [1]
                ARCH: {0.567741} at lag [1]
            Leverage: {-0.321158} at lag [1]
              Offset: 0
The result is a new egarch model called EstMdl. The parameter estimates in EstMdl resemble the parameter values that generated the simulated data.
Estimate GJR Model Parameters Without Initial Values
Fit a GJR(1,1) model to simulated data.
Simulate 500 data points from a GJR(1,1) model $y_t = \varepsilon_t$, where $\varepsilon_t = \sigma_t z_t$ and
$$\sigma_t^2 = 0.001 + 0.5\,\sigma_{t-1}^2 + 0.2\,\varepsilon_{t-1}^2 + 0.2\,I[\varepsilon_{t-1} < 0]\,\varepsilon_{t-1}^2.$$
Use the default Gaussian innovation distribution for $z_t$.
    Mdl0 = gjr('Constant',0.001,'GARCH',0.5,...
        'ARCH',0.2,'Leverage',0.2);
    rng default; % For reproducibility
    [v,y] = simulate(Mdl0,500);
The output v contains simulated conditional variances. y is a column vector of simulated responses (innovations).
Specify a GJR(1,1) model with unknown coefficients, and fit it to the series y.
    Mdl = gjr(1,1);
    EstMdl = estimate(Mdl,y)

    GJR(1,1) Conditional Variance Model (Gaussian Distribution):

                     Value        StandardError    TStatistic      PValue
                   __________     _____________    __________    __________
    Constant       0.00097382      0.00025135        3.8743      0.00010694
    GARCH{1}          0.46056        0.071793        6.4151      1.4077e-10
    ARCH{1}           0.24126        0.063409        3.8047      0.00014196
    Leverage{1}       0.25051         0.11265        2.2237         0.02617

    EstMdl =
      gjr with properties:

         Description: "GJR(1,1) Conditional Variance Model (Gaussian Distribution)"
        Distribution: Name = "Gaussian"
                   P: 1
                   Q: 1
            Constant: 0.000973819
               GARCH: {0.460555} at lag [1]
                ARCH: {0.241256} at lag [1]
            Leverage: {0.250507} at lag [1]
              Offset: 0
The result is a new gjr model called EstMdl. The parameter estimates in EstMdl resemble the parameter values that generated the simulated data.
Estimate GARCH Model Parameters Using Presample Data
Fit a GARCH(1,1) model to the daily close NASDAQ Composite Index returns.
Load the NASDAQ data included with the toolbox. Convert the index to returns.
    load Data_EquityIdx
    nasdaq = DataTable.NASDAQ;
    y = price2ret(nasdaq);
    T = length(y);

    figure
    plot(y)
    xlim([0,T])
    title('NASDAQ Returns')
The returns exhibit volatility clustering.
Specify a GARCH(1,1) model, and fit it to the series. One presample innovation is required to initialize this model. Use the first observation of y as the necessary presample innovation.
    Mdl = garch(1,1);
    [EstMdl,EstParamCov] = estimate(Mdl,y(2:end),'E0',y(1))

    GARCH(1,1) Conditional Variance Model (Gaussian Distribution):

                  Value        StandardError    TStatistic      PValue
                __________     _____________    __________    __________
    Constant    1.9987e-06      5.4228e-07        3.6857      0.00022807
    GARCH{1}       0.88356       0.0084341        104.76               0
    ARCH{1}        0.10903       0.0076471        14.257      4.0408e-46

    EstMdl =
      garch with properties:

         Description: "GARCH(1,1) Conditional Variance Model (Gaussian Distribution)"
        Distribution: Name = "Gaussian"
                   P: 1
                   Q: 1
            Constant: 1.99867e-06
               GARCH: {0.883563} at lag [1]
                ARCH: {0.109027} at lag [1]
              Offset: 0

    EstParamCov = 3×3
    10^-4 ×

        0.0000   -0.0000    0.0000
       -0.0000    0.7113   -0.5343
        0.0000   -0.5343    0.5848

The output EstMdl is a new garch model with estimated parameters.
Use the output variance-covariance matrix to calculate the estimate standard errors.
    se = sqrt(diag(EstParamCov))

    se = 3×1

        0.0000
        0.0084
        0.0076
These are the standard errors shown in the estimation output display. They correspond (in order) to the constant, GARCH coefficient, and ARCH coefficient.
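The remaining columns of the estimation display follow from the estimates and their standard errors. The sketch below recomputes the t statistics and p-values from the numbers shown in this example. It assumes the p-values are two-sided and based on a standard normal reference distribution; that assumption is not stated here, but it reproduces the displayed PValue column. The variable names are illustrative, and erfc is used so that no additional toolbox is required.

```matlab
% Estimates and standard errors copied from the GARCH(1,1) display in this example.
names = {'Constant'; 'GARCH{1}'; 'ARCH{1}'};
est   = [1.9987e-06; 0.88356; 0.10903];
se    = [5.4228e-07; 0.0084341; 0.0076471];

% t statistic = estimate divided by its standard error.
tStat = est./se;

% Two-sided p-value under a standard normal reference distribution
% (an assumption; it matches the displayed PValue column).
pValue = erfc(abs(tStat)/sqrt(2));

for k = 1:numel(names)
    fprintf('%-10s t = %8.3f   p = %.4g\n', names{k}, tStat(k), pValue(k));
end
```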
Estimate EGARCH Model Parameters Using Presample Data
Fit an EGARCH(1,1) model to the daily close NASDAQ Composite Index returns.
Load the NASDAQ data included with the toolbox. Convert the index to returns.
    load Data_EquityIdx
    nasdaq = DataTable.NASDAQ;
    y = price2ret(nasdaq);
    T = length(y);

    figure
    plot(y)
    xlim([0,T])
    title('NASDAQ Returns')
The returns exhibit volatility clustering.
Specify an EGARCH(1,1) model, and fit it to the series. One presample innovation is required to initialize this model. Use the first observation of y as the necessary presample innovation.
    Mdl = egarch(1,1);
    [EstMdl,EstParamCov] = estimate(Mdl,y(2:end),'E0',y(1))

    EGARCH(1,1) Conditional Variance Model (Gaussian Distribution):

                     Value        StandardError    TStatistic      PValue
                   _________      _____________    __________    __________
    Constant        -0.13478        0.022092         -6.101      1.0539e-09
    GARCH{1}         0.98391       0.0024221         406.22               0
    ARCH{1}          0.19964        0.013966         14.296      2.3324e-46
    Leverage{1}    -0.060243        0.005647        -10.668      1.4358e-26

    EstMdl =
      egarch with properties:

         Description: "EGARCH(1,1) Conditional Variance Model (Gaussian Distribution)"
        Distribution: Name = "Gaussian"
                   P: 1
                   Q: 1
            Constant: -0.134784
               GARCH: {0.983909} at lag [1]
                ARCH: {0.199645} at lag [1]
            Leverage: {-0.0602431} at lag [1]
              Offset: 0

    EstParamCov = 4×4
    10^-3 ×

        0.4881    0.0533   -0.1018    0.0106
        0.0533    0.0059   -0.0118    0.0017
       -0.1018   -0.0118    0.1950    0.0016
        0.0106    0.0017    0.0016    0.0319

The output EstMdl is a new egarch model with estimated parameters.
Use the output variance-covariance matrix to calculate the estimate standard errors.
    se = sqrt(diag(EstParamCov))

    se = 4×1

        0.0221
        0.0024
        0.0140
        0.0056
These are the standard errors shown in the estimation output display. They correspond (in order) to the constant, GARCH coefficient, ARCH coefficient, and leverage coefficient.
Estimate GJR Model Parameters Using Presample Data
Fit a GJR(1,1) model to the daily close NASDAQ Composite Index returns.
Load the NASDAQ data included with the toolbox. Convert the index to returns.
    load Data_EquityIdx
    nasdaq = DataTable.NASDAQ;
    y = price2ret(nasdaq);
    T = length(y);

    figure
    plot(y)
    xlim([0,T])
    title('NASDAQ Returns')
The returns exhibit volatility clustering.
Specify a GJR(1,1) model, and fit it to the series. One presample innovation is required to initialize this model. Use the first observation of y as the necessary presample innovation.
    Mdl = gjr(1,1);
    [EstMdl,EstParamCov] = estimate(Mdl,y(2:end),'E0',y(1))

    GJR(1,1) Conditional Variance Model (Gaussian Distribution):

                     Value        StandardError    TStatistic      PValue
                   __________     _____________    __________    __________
    Constant       2.4571e-06      5.6865e-07         4.321       1.553e-05
    GARCH{1}          0.88133       0.0094908        92.862               0
    ARCH{1}          0.064132       0.0092023        6.9692      3.1882e-12
    Leverage{1}      0.088804       0.0099182        8.9536      3.4396e-19

    EstMdl =
      gjr with properties:

         Description: "GJR(1,1) Conditional Variance Model (Gaussian Distribution)"
        Distribution: Name = "Gaussian"
                   P: 1
                   Q: 1
            Constant: 2.45713e-06
               GARCH: {0.881334} at lag [1]
                ARCH: {0.0641323} at lag [1]
            Leverage: {0.0888043} at lag [1]
              Offset: 0

    EstParamCov = 4×4
    10^-4 ×

        0.0000   -0.0000    0.0000    0.0000
       -0.0000    0.9007   -0.6939    0.0001
        0.0000   -0.6939    0.8468   -0.3613
        0.0000    0.0001   -0.3613    0.9837

The output EstMdl is a new gjr model with estimated parameters.
Use the output variance-covariance matrix to calculate the estimate standard errors.
    se = sqrt(diag(EstParamCov))

    se = 4×1

        0.0000
        0.0095
        0.0092
        0.0099
These are the standard errors shown in the estimation output display. They correspond (in order) to the constant, GARCH coefficient, ARCH coefficient, and leverage coefficient.
Input Arguments
Mdl — Conditional variance model
garch model object | egarch model object | gjr model object
Conditional variance model containing unknown parameters, specified as a garch, egarch, or gjr model object.
estimate treats non-NaN elements in Mdl as equality constraints, and does not estimate the corresponding parameters.

y — Single path of response data
numeric column vector
Single path of response data, specified as a numeric column vector. The software infers the conditional variances from y, that is, the data to which the model is fit.
y is usually an innovation series with mean 0 and conditional variance characterized by the model specified in Mdl. In this case, y is a continuation of the innovation series E0.
y can also represent an innovation series with mean 0 plus an offset. A nonzero Offset signals the inclusion of an offset in Mdl.
The last observation of y is the latest observation.
Data Types: double
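As an illustration of how non-NaN elements of Mdl act as equality constraints, the following sketch holds the model constant fixed and estimates only the GARCH and ARCH coefficients. It requires Econometrics Toolbox and reuses the simulated GARCH(1,1) setup from the examples on this page; fixing the constant at its generating value 0.0001 is a choice made only for illustration.

```matlab
% Simulate data from a known GARCH(1,1) model (same setup as the examples).
Mdl0 = garch('Constant',0.0001,'GARCH',0.5,'ARCH',0.2);
rng default  % For reproducibility
[~,y] = simulate(Mdl0,500);

% Template with unknown (NaN) coefficients.
Mdl = garch(1,1);

% Replace the NaN constant with a known value. Because the value is non-NaN,
% estimate treats it as an equality constraint and does not estimate it.
Mdl.Constant = 0.0001;

% Only the GARCH and ARCH coefficients are estimated.
EstMdl = estimate(Mdl,y,'Display','off');
disp(EstMdl.Constant)   % remains at the fixed value
```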
Name-Value Pair Arguments
Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter.
Before R2021a, use commas to separate each name and value, and enclose Name in quotes.
Example: 'Display','iter','E0',[0.1; 0.05] specifies to display iterative optimization information, and [0.1; 0.05] as presample innovations.
For GARCH, EGARCH, and GJR Models
ARCH0 — Initial coefficient estimates corresponding to past innovation terms
numeric vector
Initial coefficient estimates corresponding to past innovation terms, specified as the comma-separated pair consisting of 'ARCH0' and a numeric vector.
• For GARCH(P,Q) and GJR(P,Q) models:
  • ARCH0 must be a numeric vector containing nonnegative elements.
  • ARCH0 contains the initial coefficient estimates associated with the past squared innovation terms that compose the ARCH polynomial.
  • By default, estimate derives initial estimates using standard time series techniques.
• For EGARCH(P,Q) models:
  • ARCH0 contains the initial coefficient estimates associated with the magnitude of the past standardized innovations that compose the ARCH polynomial.
  • By default, estimate sets the initial coefficient estimate associated with the first nonzero lag in the model to a small positive value. All other values are zero.
The number of coefficients in ARCH0 must equal the number of lags associated with nonzero coefficients in the ARCH polynomial, as specified in the ARCHLags property of Mdl.
Data Types: double

Constant0 — Initial conditional variance model constant estimate
numeric scalar
Initial conditional variance model constant estimate, specified as the comma-separated pair consisting of 'Constant0' and a numeric scalar.
For GARCH(P,Q) and GJR(P,Q) models, Constant0 must be a positive scalar.
By default, estimate derives initial estimates using standard time series techniques.
Data Types: double

Display — Command Window display option
'params' (default) | 'diagnostics' | 'full' | 'iter' | 'off' | string vector | cell vector of character vectors
Command Window display option, specified as the comma-separated pair consisting of 'Display' and one or more of the values in this table.

    Value            Information Displayed
    'diagnostics'    Optimization diagnostics
    'full'           Maximum likelihood parameter estimates, standard errors, t statistics, iterative optimization information, and optimization diagnostics
    'iter'           Iterative optimization information
    'off'            None
    'params'         Maximum likelihood parameter estimates, standard errors, and t statistics
Example: 'Display','off' is well suited for running a simulation that estimates many models.
Example: 'Display',{'params','diagnostics'} displays all estimation results and the optimization diagnostics.
Data Types: char | cell | string

DoF0 — Initial estimate of t-distribution degrees-of-freedom parameter
10 (default) | positive scalar
Initial estimate of the t-distribution degrees-of-freedom parameter ν, specified as the comma-separated pair consisting of 'DoF0' and a positive scalar. DoF0 must exceed 2.
Data Types: double

E0 — Presample innovations
numeric column vector
Presample innovations, specified as the comma-separated pair consisting of 'E0' and a numeric column vector. The presample innovations provide initial values for the innovations process of the conditional variance model Mdl. The presample innovations derive from a distribution with mean 0.
E0 must contain at least Mdl.Q rows. If E0 contains extra rows, then estimate uses the latest Mdl.Q presample innovations. The last row contains the latest presample innovation.
The defaults are:
• For GARCH(P,Q) and GJR(P,Q) models, estimate sets any necessary presample innovations to the square root of the average squared value of the offset-adjusted response series y.
• For EGARCH(P,Q) models, estimate sets any necessary presample innovations to zero.
Data Types: double

GARCH0 — Initial coefficient estimates for past conditional variance terms
numeric vector
Initial coefficient estimates for past conditional variance terms, specified as the comma-separated pair consisting of 'GARCH0' and a numeric vector.
• For GARCH(P,Q) and GJR(P,Q) models:
  • GARCH0 must be a numeric vector containing nonnegative elements.
  • GARCH0 contains the initial coefficient estimates associated with the past conditional variance terms that compose the GARCH polynomial.
• For EGARCH(P,Q) models, GARCH0 contains the initial coefficient estimates associated with past log conditional variance terms that compose the GARCH polynomial.
The number of coefficients in GARCH0 must equal the number of lags associated with nonzero coefficients in the GARCH polynomial, as specified in the GARCHLags property of Mdl.
By default, estimate derives initial estimates using standard time series techniques.
Data Types: double

Offset0 — Initial innovation mean model offset estimate
scalar
Initial innovation mean model offset estimate, specified as the comma-separated pair consisting of 'Offset0' and a scalar.
By default, estimate sets the initial estimate to the sample mean of y.
Data Types: double

Options — Optimization options
optimoptions optimization controller
Optimization options, specified as the comma-separated pair consisting of 'Options' and an optimoptions optimization controller. For details on modifying the default values of the optimizer, see optimoptions or fmincon in Optimization Toolbox.
For example, to change the constraint tolerance to 1e-6, set Options = optimoptions(@fmincon,'ConstraintTolerance',1e-6,'Algorithm','sqp'). Then, pass Options into estimate using 'Options',Options.
By default, estimate uses the same default options as fmincon, except Algorithm is 'sqp' and ConstraintTolerance is 1e-7.

V0 — Presample conditional variances
numeric column vector with positive entries
Presample conditional variances, specified as the comma-separated pair consisting of 'V0' and a numeric column vector with positive entries. V0 provides initial values for the conditional variance process of the conditional variance model Mdl.
For GARCH(P,Q) and GJR(P,Q) models, V0 must have at least Mdl.P rows.
For EGARCH(P,Q) models, V0 must have at least max(Mdl.P,Mdl.Q) rows.
If the number of rows in V0 exceeds the necessary number, only the latest observations are used. The last row contains the latest observation.
By default, estimate sets the necessary presample conditional variances to the average squared value of the offset-adjusted response series y.
Data Types: double

For EGARCH and GJR Models
Leverage0 — Initial coefficient estimates for past leverage terms
0 (default) | numeric vector
Initial coefficient estimates for past leverage terms, specified as the comma-separated pair consisting of 'Leverage0' and a numeric vector.
For EGARCH(P,Q) models, Leverage0 contains the initial coefficient estimates associated with past standardized innovation terms that compose the leverage polynomial.
For GJR(P,Q) models, Leverage0 contains the initial coefficient estimates associated with past, squared, negative innovations that compose the leverage polynomial.
The number of coefficients in Leverage0 must equal the number of lags associated with nonzero coefficients in the leverage polynomial (Leverage), as specified in LeverageLags.
Data Types: double

Notes
• NaNs in the presample or estimation data indicate missing data, and estimate removes them. The software merges the presample data (E0 and V0) separately from the effective sample data (y), and then uses list-wise deletion to remove rows containing at least one NaN. Removing NaNs in the data reduces the sample size, and can also create irregular time series.
• estimate assumes that you synchronize the presample data such that the latest observations occur simultaneously.
• If you specify a value for Display, then it takes precedence over the specifications of the optimization options Diagnostics and Display. Otherwise, estimate honors all selections related to the display of optimization information in the optimization options.
• If you do not specify E0 and V0, then estimate derives the necessary presample observations from the unconditional, or long-run, variance of the offset-adjusted response process.
  • For all conditional variance models, V0 is the sample average of the squared disturbances of the offset-adjusted response data y.
  • For GARCH(P,Q) and GJR(P,Q) models, E0 is the square root of the average squared value of the offset-adjusted response series y.
  • For EGARCH(P,Q) models, E0 is 0.
  These specifications minimize initial transient effects.
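The default presample values described in these notes, and the way they seed the conditional variance recursion, can be pictured with a small script. This is a minimal sketch in base MATLAB, not the toolbox implementation: it assumes a GARCH(1,1) model with zero offset and Gaussian innovations, evaluates the loglikelihood at fixed parameter values rather than optimizing it, and uses illustrative names (kappa, g1, a1, sig2).

```matlab
% Fixed GARCH(1,1) parameters (illustrative): constant, GARCH, ARCH coefficients.
kappa = 1e-4;  g1 = 0.5;  a1 = 0.2;
T = 500;
rng(1);                                  % For reproducibility

% Simulate a response series from the model without toolbox functions.
y = zeros(T,1);
vPrev = kappa/(1 - g1 - a1);             % start at the unconditional variance
ePrev = sqrt(vPrev);
for t = 1:T
    vt    = kappa + g1*vPrev + a1*ePrev^2;
    y(t)  = sqrt(vt)*randn;
    vPrev = vt;
    ePrev = y(t);
end

% Default presample values from the notes (offset assumed to be 0):
V0 = mean(y.^2);                         % average squared response
E0 = sqrt(V0);                           % GARCH/GJR default; EGARCH would use 0

% Conditional variance recursion seeded with the presample values.
sig2  = zeros(T,1);
vPrev = V0;
ePrev = E0;
for t = 1:T
    sig2(t) = kappa + g1*vPrev + a1*ePrev^2;
    vPrev   = sig2(t);
    ePrev   = y(t);
end

% Gaussian loglikelihood of y at these parameter values.
logL = -0.5*sum(log(2*pi) + log(sig2) + y.^2./sig2);
fprintf('V0 = %.4e, E0 = %.4e, logL = %.2f\n', V0, E0, logL);
```

Seeding both presample quantities at the long-run level keeps the first few fitted variances from being dominated by an arbitrary starting point.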
Output Arguments
EstMdl — Conditional variance model containing parameter estimates
garch model object | egarch model object | gjr model object
Conditional variance model containing parameter estimates, returned as a garch, egarch, or gjr model object. estimate uses maximum likelihood to calculate all parameter estimates not constrained by Mdl (that is, constrained parameters have known values).
EstMdl is a fully specified conditional variance model. To infer conditional variances for diagnostic checking, pass EstMdl to infer. To simulate or forecast conditional variances, pass EstMdl to simulate or forecast, respectively.

EstParamCov — Variance-covariance matrix of maximum likelihood estimates
numeric matrix
Variance-covariance matrix of maximum likelihood estimates of model parameters known to the optimizer, returned as a numeric matrix.
The rows and columns associated with any parameters estimated by maximum likelihood contain the covariances of estimation error. The standard errors of the parameter estimates are the square roots of the entries along the main diagonal.
The rows and columns associated with any parameters that are held fixed as equality constraints contain 0s.
estimate uses the outer product of gradients (OPG) method to perform covariance matrix estimation on page 3-61.
estimate orders the parameters in EstParamCov as follows:
• Constant
• Nonzero GARCH coefficients at positive lags
• Nonzero ARCH coefficients at positive lags
• For EGARCH and GJR models, nonzero leverage coefficients at positive lags
• Degrees of freedom (t innovation distribution only)
• Offset (models with nonzero offset only)
Data Types: double

logL — Optimized loglikelihood objective function value
scalar
Optimized loglikelihood objective function value, returned as a scalar.
Data Types: double

info — Optimization summary
structure array
Optimization summary, returned as a structure array with the fields described in this table.

    Field       Description
    exitflag    Optimization exit flag (see fmincon in Optimization Toolbox)
    options     Optimization options controller (see optimoptions and fmincon in Optimization Toolbox)
    X           Vector of final parameter estimates
    X0          Vector of initial parameter estimates

For example, you can display the vector of final estimates by entering info.X in the Command Window.
Data Types: struct

Tip
• To access values of the estimation results, including the number of free parameters in the model, pass EstMdl to summarize.
Version History Introduced in R2012a
References
[1] Bollerslev, Tim. “Generalized Autoregressive Conditional Heteroskedasticity.” Journal of Econometrics 31 (April 1986): 307–27. https://doi.org/10.1016/0304-4076(86)90063-1.
[2] Bollerslev, Tim. “A Conditionally Heteroskedastic Time Series Model for Speculative Prices and Rates of Return.” The Review of Economics and Statistics 69 (August 1987): 542–47. https://doi.org/10.2307/1925546.
[3] Box, G. E. P., G. M. Jenkins, and G. C. Reinsel. Time Series Analysis: Forecasting and Control. 3rd ed. Englewood Cliffs, NJ: Prentice Hall, 1994.
[4] Enders, W. Applied Econometric Time Series. Hoboken, NJ: John Wiley & Sons, 1995.
[5] Engle, Robert F. “Autoregressive Conditional Heteroscedasticity with Estimates of the Variance of United Kingdom Inflation.” Econometrica 50 (July 1982): 987–1007. https://doi.org/10.2307/1912773.
[6] Glosten, L. R., R. Jagannathan, and D. E. Runkle. “On the Relation between the Expected Value and the Volatility of the Nominal Excess Return on Stocks.” The Journal of Finance 48, no. 5 (1993): 1779–1801.
[7] Greene, W. H. Econometric Analysis. 3rd ed. Upper Saddle River, NJ: Prentice Hall, 1997.
[8] Hamilton, James D. Time Series Analysis. Princeton, NJ: Princeton University Press, 1994.
See Also
Objects
garch | egarch | gjr
Functions
filter | forecast | infer | simulate | summarize
Topics
“Compare Conditional Variance Models Using Information Criteria” on page 8-69
“Likelihood Ratio Test for Conditional Variance Models” on page 8-66
“Estimate Conditional Mean and Variance Model” on page 7-135
“Maximum Likelihood Estimation for Conditional Variance Models” on page 8-52
“Conditional Variance Model Estimation with Equality Constraints” on page 8-54
“Presample Data for Conditional Variance Model Estimation” on page 8-55
“Initial Values for Conditional Variance Model Estimation” on page 8-57
“Optimization Settings for Conditional Variance Model Estimation” on page 8-58
estimate Class: dssm Maximum likelihood parameter estimation of diffuse state-space models
Syntax
EstMdl = estimate(Mdl,Y,params0)
EstMdl = estimate(Mdl,Y,params0,Name,Value)
[EstMdl,estParams,EstParamCov,logL,Output] = estimate( ___ )
Description
EstMdl = estimate(Mdl,Y,params0) returns an estimated diffuse state-space model on page 114 from fitting the dssm model Mdl to the response data Y. params0 is the vector of initial values for the unknown parameters in Mdl.

EstMdl = estimate(Mdl,Y,params0,Name,Value) estimates the diffuse state-space model with additional options specified by one or more Name,Value pair arguments. For example, you can specify to deflate the observations by a linear regression using predictor data, control how the results appear in the Command Window, and indicate which estimation method to use for the parameter covariance matrix.

[EstMdl,estParams,EstParamCov,logL,Output] = estimate( ___ ) additionally returns these arguments using any of the input arguments in the previous syntaxes.
• estParams, a vector containing the estimated parameters
• EstParamCov, the estimated variance-covariance matrix of the estimated parameters
• logL, the optimized loglikelihood value
• Output, optimization diagnostic information structure
Input Arguments
Mdl — Diffuse state-space model
dssm model object
Diffuse state-space model containing unknown parameters, specified as a dssm model object returned by dssm.
• For explicitly created state-space models, the software estimates all NaN values in the coefficient matrices (Mdl.A, Mdl.B, Mdl.C, and Mdl.D) and the initial state means and covariance matrix (Mdl.Mean0 and Mdl.Cov0). For details on explicit and implicit model creation, see dssm.
• For implicitly created state-space models, you specify the model structure and the location of the unknown parameters using the parameter-to-matrix mapping function. Implicitly create a state-space model to estimate complex models, impose parameter constraints, and estimate initial states. The parameter-to-matrix mapping function can also accommodate additional output arguments.
Note Mdl does not store observed responses or predictor data. Supply the data wherever necessary using the appropriate input and name-value pair arguments.

Y — Observed response data
numeric matrix | cell vector of numeric vectors
Observed response data to which Mdl is fit, specified as a numeric matrix or a cell vector of numeric vectors.
• If Mdl is time invariant with respect to the observation equation, then Y is a T-by-n matrix. Each row of the matrix corresponds to a period and each column corresponds to a particular observation in the model. Therefore, T is the sample size and n is the number of observations per period. The last row of Y contains the latest observations.
• If Mdl is time varying with respect to the observation equation, then Y is a T-by-1 cell vector. Y{t} contains an n_t-dimensional vector of observations for period t, where t = 1,...,T. The corresponding dimensions of the coefficient matrices in Mdl.C{t} and Mdl.D{t} must be consistent with the matrix in Y{t} for all periods. The last cell of Y contains the latest observations.
Suppose that you create Mdl implicitly by specifying a parameter-to-matrix mapping function, and the function has input arguments for the observed responses or predictors. Then, the mapping function establishes a link to observed responses and the predictor data in the MATLAB workspace, which overrides the value of Y.
NaN elements indicate missing observations. For details on how the Kalman filter accommodates missing observations, see “Algorithms” on page 12-1819.
Data Types: double | cell

params0 — Initial values of unknown parameters
numeric vector
Initial values of unknown parameters for numeric maximum likelihood estimation, specified as a numeric vector.
The elements of params0 correspond to the unknown parameters in the state-space model matrices A, B, C, and D, and, optionally, the initial state mean Mean0 and covariance matrix Cov0.
• If you created Mdl explicitly (that is, by specifying the matrices without a parameter-to-matrix mapping function), then the software maps the elements of params0 to NaNs in the state-space model matrices and initial state values. The software searches for NaNs column-wise, following the order A, B, C, D, Mean0, Cov0.
• If you created Mdl implicitly (that is, by specifying the matrices with a parameter-to-matrix mapping function), then set initial parameter values for the state-space model matrices, initial state values, and state types within the parameter-to-matrix mapping function.
Data Types: double

Name-Value Pair Arguments
Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter.
Before R2021a, use commas to separate each name and value, and enclose Name in quotes.
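Returning to the params0 argument for explicitly created models: the column-wise search order for NaNs can be pictured with a small script. This is a sketch in base MATLAB that mimics the documented ordering on two toy matrices; it is not the toolbox's internal code, and the matrices and values are made up for illustration.

```matlab
% Toy coefficient matrices with NaN placeholders for unknown parameters.
A = [NaN 0.5; NaN NaN];
B = [NaN; 1];
params0 = [0.1 0.2 0.3 0.4];   % initial values, in search order

% Visit matrices in the documented order (only A and B in this toy case)
% and fill NaNs column-wise using linear (column-major) indexing.
mats = {A, B};
k = 0;                          % number of parameters consumed so far
for i = 1:numel(mats)
    idx = find(isnan(mats{i})); % column-major positions of the NaNs
    mats{i}(idx) = params0(k + (1:numel(idx)));
    k = k + numel(idx);
end

Afilled = mats{1};
Bfilled = mats{2};
disp(Afilled)
disp(Bfilled)
```

In this toy case, the first three elements of params0 go to A(1,1), A(2,1), and A(2,2), in that order, and the fourth goes to B(1,1).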
Example: 'CovMethod','hessian','Display','diagnostics','Predictors',Z specifies to estimate the asymptotic parameter covariance using the negative, inverted Hessian matrix, display optimization diagnostics at the Command Window, and to deflate the observations by a linear regression containing the predictor data Z. Estimation Options
Beta0 — Initial values of regression coefficients
numeric matrix
Initial values of regression coefficients, specified as the comma-separated pair consisting of 'Beta0' and a d-by-n numeric matrix. d is the number of predictor variables (see Predictors) and n is the number of observed response series (see Y).
By default, Beta0 is the ordinary least-squares estimate of Y onto Predictors.
Data Types: double

CovMethod — Asymptotic covariance estimation method
'opg' (default) | 'hessian' | 'sandwich'
Asymptotic covariance estimation method, specified as the comma-separated pair consisting of 'CovMethod' and a value in this table.

    Value         Description
    'hessian'     Negative, inverted Hessian matrix
    'opg'         Outer product of gradients (OPG)
    'sandwich'    Both Hessian and OPG
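For orientation, the three options correspond to the following standard textbook estimators. The notation is an assumption introduced here, not taken from this page: $H$ is the Hessian of the loglikelihood evaluated at the estimates, and $G$ is the $T$-by-$k$ matrix whose rows are the per-period score (gradient) contributions. Scaling conventions can differ between references, so treat this as a sketch of the general forms rather than the exact expressions the software evaluates.

```latex
\widehat{\mathrm{Cov}}_{\text{hessian}} = (-H)^{-1}, \qquad
\widehat{\mathrm{Cov}}_{\text{opg}} = \left(G^{\top} G\right)^{-1}, \qquad
\widehat{\mathrm{Cov}}_{\text{sandwich}} = H^{-1} \left(G^{\top} G\right) H^{-1}
```

Under a correctly specified likelihood the three forms agree asymptotically; the sandwich form is the one usually preferred when misspecification is a concern.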
Example: 'CovMethod','sandwich'
Data Types: char

Display — Command Window display option
'params' (default) | 'diagnostics' | 'full' | 'iter' | 'off' | string vector | cell vector of character vectors
Command Window display option, specified as the comma-separated pair consisting of 'Display' and one or more of the values in this table.

    Value            Information Displayed
    'diagnostics'    Optimization diagnostics
    'full'           Maximum likelihood parameter estimates, standard errors, t statistics, iterative optimization information, and optimization diagnostics
    'iter'           Iterative optimization information
    'off'            None
    'params'         Maximum likelihood parameter estimates, standard errors, and t statistics

Example: 'Display','off' is well suited for running a simulation that estimates many models.
Example: 'Display',{'params','diagnostics'} displays all estimation results and the optimization diagnostics.
Data Types: char | cell | string

Options — Optimization options
optimoptions optimization controller

Optimization options, specified as the comma-separated pair consisting of 'Options' and an optimoptions optimization controller. Options replaces default optimization options of the optimizer. For details on altering default values of the optimizer, see the optimization controller optimoptions, the constrained optimization function fmincon, or the unconstrained optimization function fminunc in Optimization Toolbox.

For example, to change the constraint tolerance to 1e-6, set Options = optimoptions(@fmincon,'ConstraintTolerance',1e-6,'Algorithm','sqp'). Then, pass Options into estimate using 'Options',Options.

By default:
• For constrained optimization, estimate maximizes the likelihood objective function using fmincon and its default options, but sets 'Algorithm','interior-point'.
• For unconstrained optimization, estimate maximizes the likelihood objective function using fminunc and its default options, but sets 'Algorithm','quasi-newton'.

Predictors — Predictor data
[] (default) | numeric matrix

Predictor data for the regression component in the observation equation, specified as the comma-separated pair consisting of 'Predictors' and a T-by-d numeric matrix. T is the number of periods and d is the number of predictor variables. Row t corresponds to the observed predictors at period t (Z_t) in the expanded observation equation

y_t − Z_t β = C x_t + D u_t.

In other words, the predictor series serve as observation deflators. β is a d-by-n time-invariant matrix of regression coefficients that the software estimates with all other parameters.
• For n observations per period, the software regresses all predictor series onto each observation.
• If you specify Predictors, then Mdl must be time invariant. Otherwise, the software returns an error.
• By default, the software excludes a regression component from the state-space model.
Data Types: double

SwitchTime — Final period for diffuse state initialization
positive integer

Final period for diffuse state initialization, specified as the comma-separated pair consisting of 'SwitchTime' and a positive integer. That is, estimate uses the observations from period 1 to period SwitchTime as a presample to implement the exact initial Kalman filter (see “Diffuse Kalman Filter” on page 11-12 and [1]). After initializing the diffuse states, estimate applies the standard Kalman filter on page 11-7 to the observations from periods SwitchTime + 1 to T.
The default value for SwitchTime is the last period in which the estimated smoothed state precision matrix (that is, the inverse of the covariance matrix) is singular. This specification represents the fewest number of observations required to initialize the diffuse states. Therefore, it is a best practice to use the default value.

If you set SwitchTime to a value greater than the default, then the effective sample size decreases. If you set SwitchTime to a value that is less than the default, then estimate might not have enough observations to initialize the diffuse states, which can result in an error or improper values.

In general, estimating, filtering, and smoothing state-space models with at least one diffuse state requires SwitchTime to be at least one. The default estimation display contains the effective sample size.
Data Types: double

Tolerance — Forecast uncertainty threshold
0 (default) | nonnegative scalar

Forecast uncertainty threshold, specified as the comma-separated pair consisting of 'Tolerance' and a nonnegative scalar. If the forecast uncertainty for a particular observation is less than Tolerance during numerical estimation, then the software removes the uncertainty corresponding to the observation from the forecast covariance matrix before its inversion.

It is best practice to set Tolerance to a small number, for example, 1e-15, to overcome numerical obstacles during estimation.
Example: 'Tolerance',1e-15
Data Types: double

Univariate — Univariate treatment of multivariate series flag
false (default) | true

Univariate treatment of a multivariate series flag, specified as the comma-separated pair consisting of 'Univariate' and true or false. Univariate treatment of a multivariate series is also known as sequential filtering.

The univariate treatment can accelerate and improve numerical stability of the Kalman filter. However, all observation innovations must be uncorrelated. That is, D_t D_t' must be diagonal, where D_t, t = 1,...,T, is one of the following:
• The matrix D{t} in a time-varying state-space model
• The matrix D in a time-invariant state-space model
Example: 'Univariate',true
Data Types: logical

Constrained Optimization Options for fmincon
Aeq — Linear equality constraint parameter transformer
matrix

Linear equality constraint parameter transformer for constrained likelihood objective function maximization, specified as the comma-separated pair consisting of 'Aeq' and a matrix.
If you specify Aeq and beq, then estimate maximizes the likelihood objective function using the equality constraint Aeq θ = beq, where θ is a vector containing every Mdl parameter.

The number of rows of Aeq is the number of constraints, and the number of columns is the number of parameters that the software estimates. Order the columns of Aeq by Mdl.A, Mdl.B, Mdl.C, Mdl.D, Mdl.Mean0, Mdl.Cov0, and the regression coefficient (if the model has one).

Specify Aeq and beq together. Otherwise, estimate returns an error.

Aeq directly corresponds to the input argument Aeq of fmincon, not to the state-transition coefficient matrix Mdl.A.

By default, if you did not specify any constraint (linear inequality, linear equality, or upper and lower bound), then estimate maximizes the likelihood objective function using unconstrained maximization.

Aineq — Linear inequality constraint parameter transformer
matrix

Linear inequality constraint parameter transformer for constrained likelihood objective function maximization, specified as the comma-separated pair consisting of 'Aineq' and a matrix.

If you specify Aineq and bineq, then estimate maximizes the likelihood objective function using the inequality constraint Aineq θ ≤ bineq, where θ is a vector containing every Mdl parameter.

The number of rows of Aineq is the number of constraints, and the number of columns is the number of parameters that the software estimates. Order the columns of Aineq by Mdl.A, Mdl.B, Mdl.C, Mdl.D, Mdl.Mean0, Mdl.Cov0, and the regression coefficient (if the model has one).

Specify Aineq and bineq together. Otherwise, estimate returns an error.

Aineq directly corresponds to the input argument A of fmincon, not to the state-transition coefficient matrix Mdl.A.

By default, if you did not specify any constraint (linear inequality, linear equality, or upper and lower bound), then estimate maximizes the likelihood objective function using unconstrained maximization.
Data Types: double

beq — Linear equality constraints of transformed parameters
numeric vector

Linear equality constraints of the transformed parameters for constrained likelihood objective function maximization, specified as the comma-separated pair consisting of 'beq' and a numeric vector.

If you specify Aeq and beq, then estimate maximizes the likelihood objective function using the equality constraint Aeq θ = beq, where θ is a vector containing every Mdl parameter.

Specify Aeq and beq together. Otherwise, estimate returns an error.

beq directly corresponds to the input argument beq of fmincon, and is not associated with any component of Mdl.

By default, if you did not specify any constraint (linear inequality, linear equality, or upper and lower bound), then estimate maximizes the likelihood objective function using unconstrained maximization.
Data Types: double

bineq — Linear inequality constraint upper bounds
numeric vector

Linear inequality constraint upper bounds of the transformed parameters for constrained likelihood objective function maximization, specified as the comma-separated pair consisting of 'bineq' and a numeric vector.

If you specify Aineq and bineq, then estimate maximizes the likelihood objective function using the inequality constraint Aineq θ ≤ bineq, where θ is a vector containing every Mdl parameter.

Specify Aineq and bineq together. Otherwise, estimate returns an error.

bineq directly corresponds to the input argument b of fmincon, and is not associated with any component of Mdl.

By default, if you did not specify any constraint (linear inequality, linear equality, or upper and lower bound), then estimate maximizes the likelihood objective function using unconstrained maximization.
Data Types: double

lb — Lower bounds of parameters
numeric vector

Lower bounds of the parameters for constrained likelihood objective function maximization, specified as the comma-separated pair consisting of 'lb' and a numeric vector.

If you specify lb and ub, then estimate maximizes the likelihood objective function subject to lb ≤ θ ≤ ub, where θ is a vector containing every Mdl parameter.

Order the elements of lb by Mdl.A, Mdl.B, Mdl.C, Mdl.D, Mdl.Mean0, Mdl.Cov0, and the regression coefficient (if the model has one).

By default, if you did not specify any constraint (linear inequality, linear equality, or upper and lower bound), then estimate maximizes the likelihood objective function using unconstrained maximization.
Data Types: double

ub — Upper bounds of parameters
numeric vector

Upper bounds of the parameters for constrained likelihood objective function maximization, specified as the comma-separated pair consisting of 'ub' and a numeric vector.

If you specify lb and ub, then estimate maximizes the likelihood objective function subject to lb ≤ θ ≤ ub, where θ is a vector containing every Mdl parameter.

Order the elements of ub by Mdl.A, Mdl.B, Mdl.C, Mdl.D, Mdl.Mean0, Mdl.Cov0, and the regression coefficient (if the model has one).

By default, if you did not specify any constraint (linear inequality, linear equality, or upper and lower bound), then estimate maximizes the likelihood objective function using unconstrained maximization.
Data Types: double
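To make the column ordering of the constraint inputs concrete, here is a minimal sketch for a hypothetical model whose only unknowns are one element of A, one of B, and one of D, so that θ = [ϕ; σ1; σ2]. The specific constraints (ϕ ≤ 0.99 and σ1 = 2σ2) and the starting values are illustrative assumptions, not recommendations.

```matlab
% Hypothetical parameter vector theta = [phi; sigma1; sigma2], ordered A, B, D
params0 = [0.9; 0.5; 0.25];

% Linear inequality Aineq*theta <= bineq, here phi <= 0.99
Aineq = [1 0 0];
bineq = 0.99;

% Linear equality Aeq*theta = beq, here sigma1 - 2*sigma2 = 0
Aeq = [0 1 -2];
beq = 0;

% Bounds lb <= theta <= ub
lb = [-Inf; 0; 0];
ub = [Inf; Inf; Inf];

% Check that the starting values satisfy every constraint
isFeasible = all(Aineq*params0 <= bineq) && ...
             all(abs(Aeq*params0 - beq) < 1e-12) && ...
             all(params0 >= lb) && all(params0 <= ub);
```

Each row of Aineq or Aeq is one constraint, and each column lines up with one estimated parameter. The arrays would then be passed to estimate through the 'Aineq', 'bineq', 'Aeq', 'beq', 'lb', and 'ub' name-value pair arguments.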
Output Arguments

EstMdl — Diffuse state-space model containing parameter estimates
dssm model object

Diffuse state-space model containing the parameter estimates, returned as a dssm model object.

estimate uses the diffuse Kalman filter and maximum likelihood to calculate all parameter estimates.

Regardless of how you created Mdl, EstMdl stores:
• The parameter estimates of the coefficient matrices in the properties A, B, C, and D.
• The initial state means and covariance matrix in the properties Mean0 and Cov0.

Note: EstMdl does not store observed responses or predictor data. If you plan to filter (using filter), forecast (using forecast), or smooth (using smooth) using EstMdl, then you need to supply the appropriate data.

estParams — Maximum likelihood estimates of model parameters
numeric vector

Maximum likelihood estimates of the model parameters known to the optimizer, returned as a numeric vector. estParams has the same dimensions as params0.

estimate arranges the estimates in estParams corresponding to unknown parameters in this order.
1 EstMdl.A(:), that is, estimates in EstMdl.A listed column-wise
2 EstMdl.B(:)
3 EstMdl.C(:)
4 EstMdl.D(:)
5 EstMdl.Mean0
6 EstMdl.Cov0(:)
7 In models with predictors, estimated regression coefficients listed column-wise
EstParamCov — Variance-covariance matrix of maximum likelihood estimates
numeric matrix

Variance-covariance matrix of maximum likelihood estimates of the model parameters known to the optimizer, returned as a numeric matrix.

The rows and columns contain the covariances of the parameter estimates. The standard errors of the parameter estimates are the square roots of the entries along the main diagonal.

estimate arranges the estimates in the rows and columns of EstParamCov corresponding to unknown parameters in this order.
1 EstMdl.A(:), that is, estimates in EstMdl.A listed column-wise
2 EstMdl.B(:)
3 EstMdl.C(:)
4 EstMdl.D(:)
5 EstMdl.Mean0
6 EstMdl.Cov0(:)
7 In models with predictors, estimated regression coefficients listed column-wise
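As a small illustration of how the standard errors follow from EstParamCov, the sketch below uses made-up values for estParams and EstParamCov in a three-parameter model. The p-value line assumes a two-sided normal approximation; that is an assumption of this sketch rather than a statement about how estimate computes its display.

```matlab
% Hypothetical outputs for a three-parameter model (illustrative values only)
estParams   = [0.56; 0.76; 0.57];
EstParamCov = [ 0.0324  0.0010 -0.0020
                0.0010  0.0576  0.0030
               -0.0020  0.0030  0.0784];

stdErr = sqrt(diag(EstParamCov));     % standard errors from the main diagonal
tStat  = estParams./stdErr;           % t statistics
pValue = erfc(abs(tStat)/sqrt(2));    % two-sided p-values, normal approximation (assumption)
```

The off-diagonal entries of EstParamCov are not needed for the standard errors, but they matter when testing linear combinations of parameters.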
logL — Optimized loglikelihood value
scalar

Optimized loglikelihood value, returned as a scalar.

Missing observations do not contribute to the loglikelihood. Only the observations after the first SwitchTime periods contribute to the loglikelihood.

Output — Optimization information
structure array

Optimization information, returned as a structure array.

This table describes the fields of Output.

Field       Description
ExitFlag    Optimization exit flag that describes the exit condition. For details, see fmincon and fminunc.
Options     Optimization options that the optimizer used for numerical estimation. For details, see optimoptions.

Data Types: struct
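Returning to logL, the following sketch shows how a loglikelihood can accumulate over a scalar state-space model while skipping missing observations and the first SwitchTime periods. It runs a standard Kalman filter with a very large initial state variance as a rough stand-in for a diffuse prior; it is not the exact-initial diffuse filter that estimate applies, and all parameter values, data, and variable names are hypothetical.

```matlab
% Hypothetical scalar model: x(t) = phi*x(t-1) + s1*u(t), y(t) = x(t) + s2*e(t)
phi = 0.5; s1 = 1; s2 = 0.1;          % illustrative parameter values
y = [1.2; 0.7; NaN; 0.4; -0.3];       % illustrative data with one missing value
switchTime = 1;                       % assumed presample length for the diffuse state

xPred = 0;                            % predicted state mean
PPred = 1e7;                          % large variance as a stand-in for a diffuse prior
logL = 0;
nContrib = 0;                         % number of observations entering logL
for t = 1:numel(y)
    if isnan(y(t))
        xFilt = xPred;                % no update when the observation is missing
        PFilt = PPred;
    else
        v = y(t) - xPred;             % one-step-ahead forecast error
        F = PPred + s2^2;             % forecast error variance
        if t > switchTime
            logL = logL - 0.5*(log(2*pi) + log(F) + v^2/F);
            nContrib = nContrib + 1;
        end
        K = PPred/F;                  % Kalman gain
        xFilt = xPred + K*v;
        PFilt = (1 - K)*PPred;
    end
    xPred = phi*xFilt;                % predict the next state
    PPred = phi^2*PFilt + s1^2;
end
```

In this sketch, period 1 initializes the state and period 3 is missing, so only three of the five observations contribute to logL.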
Examples

Fit Time-Invariant Diffuse State-Space Model to Data

Generate data from a known model, and then fit a diffuse state-space model to the data.

Suppose that a latent process is this AR(1) process

x_t = 0.5 x_{t-1} + u_t,

where u_t is Gaussian with mean 0 and standard deviation 1.

Generate a random series of 100 observations from x_t, assuming that the series starts at 1.5.

T = 100;
ARMdl = arima('AR',0.5,'Constant',0,'Variance',1);
x0 = 1.5;
rng(1); % For reproducibility
x = simulate(ARMdl,T,'Y0',x0);
Suppose further that the latent process is subject to additive measurement error as indicated in the equation

y_t = x_t + ε_t,

where ε_t is Gaussian with mean 0 and standard deviation 0.1.

Use the random latent state process (x) and the observation equation to generate observations.

y = x + 0.1*randn(T,1);
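For readers who want to see the data-generating recursion spelled out, here is a sketch that produces a series of the same form using only base MATLAB functions. It is an alternative under the same model assumptions, not a reproduction of simulate; because the random draws are consumed differently, the numbers will not match the series generated above. The variable names with the suffix Alt are hypothetical.

```matlab
rng(1);                              % for reproducibility of this sketch only
T = 100;
phi = 0.5;                           % AR(1) coefficient
x0 = 1.5;                            % starting value of the latent series

u = randn(T,1);                      % state disturbances, standard deviation 1
xAlt = zeros(T,1);
xPrev = x0;
for t = 1:T
    xAlt(t) = phi*xPrev + u(t);      % x_t = 0.5*x_{t-1} + u_t
    xPrev = xAlt(t);
end
yAlt = xAlt + 0.1*randn(T,1);        % y_t = x_t + eps_t, standard deviation 0.1
```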
Together, the latent process and observation equations compose a state-space model. Supposing that the coefficients and variances are unknown parameters, the state-space model is

x_t = ϕ x_{t-1} + σ1 u_t
y_t = x_t + σ2 ε_t.

Specify the state-transition matrix. Use NaN values for unknown parameters.

A = NaN;

Specify the state-disturbance-loading coefficient matrix.

B = NaN;

Specify the measurement-sensitivity coefficient matrix.

C = 1;

Specify the observation-innovation coefficient matrix.

D = NaN;

Create the state-space model using the coefficient matrices and specify that the state variable is diffuse. A diffuse state specification indicates complete ignorance about the values of the states.

StateType = 2;
Mdl = dssm(A,B,C,D,'StateType',StateType);
Mdl is a dssm model. Verify that the model is correctly specified by viewing its display in the Command Window.

Pass the observations to estimate to estimate the parameters. Set starting values for the parameters in params0. σ1 and σ2 must be positive, so set the lower bound constraints using the 'lb' name-value pair argument. Specify that the lower bound of ϕ is -Inf.

params0 = [0.9; 0.5; 0.1];
EstMdl = estimate(Mdl,y,params0,'lb',[-Inf; 0; 0])

Method: Maximum likelihood (fmincon)
Effective Sample size:             99
Logarithmic  likelihood:     -138.968
Akaike   info criterion:      283.936
Bayesian info criterion:      291.752
      |     Coeff       Std Err    t Stat     Prob
-------------------------------------------------
 c(1) |  0.56114       0.18045    3.10975  0.00187
 c(2) |  0.75836       0.24569    3.08661  0.00202
 c(3) |  0.57129       0.27455    2.08086  0.03745
      |
      |   Final State   Std Dev     t Stat    Prob
 x(1) |  1.24096       0.46532    2.66690  0.00766

EstMdl =
State-space model type: dssm

State vector length: 1
Observation vector length: 1
State disturbance vector length: 1
Observation innovation vector length: 1
Sample size supported by model: Unlimited

State variables: x1, x2,...
State disturbances: u1, u2,...
Observation series: y1, y2,...
Observation innovations: e1, e2,...

State equation:
x1(t) = (0.56)x1(t-1) + (0.76)u1(t)

Observation equation:
y1(t) = x1(t) + (0.57)e1(t)

Initial state distribution:

Initial state means
 x1
  0

Initial state covariance matrix
     x1
 x1  Inf

State types
    x1
 Diffuse
EstMdl is a dssm model. The results of the estimation appear in the Command Window, contain the fitted state-space equations, and contain a table of parameter estimates, their standard errors, t statistics, and p-values.

Use dot notation to use or display the fitted state-transition matrix.

EstMdl.A

ans = 0.5611
Pass EstMdl to forecast to forecast observations, or to simulate to conduct a Monte Carlo study.
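The information criteria in the estimation display above can be reproduced from the printed loglikelihood with the usual formulas AIC = −2 logL + 2k and BIC = −2 logL + k log(n). In the sketch below, the choice n = 100 is an inference from the printed numbers (it matches the displayed BIC, whereas the effective sample size of 99 does not); it is not documented behavior.

```matlab
logL = -138.968;      % loglikelihood copied from the estimation display
numParams = 3;        % phi, sigma1, and sigma2
numObs = 100;         % number of observations passed to estimate (assumed basis for BIC)

aic = -2*logL + 2*numParams;            % about 283.936
bic = -2*logL + numParams*log(numObs);  % about 291.752
```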
Estimate Diffuse State-Space Model Containing Regression Component

Suppose that the linear relationship between unemployment rate and the nominal gross national product (nGNP) is of interest. Suppose further that unemployment rate is an AR(1) series. Symbolically, and in state-space form, the model is

x_t = ϕ x_{t-1} + σ u_t
y_t − β Z_t = x_t,

where:
• x_t is the unemployment rate at time t.
• y_t is the observed change in the unemployment rate being deflated by the return of nGNP (Z_t).
• u_t is the Gaussian series of state disturbances having mean 0 and unknown standard deviation σ.

Load the Nelson-Plosser data set, which contains the unemployment rate and nGNP series, among other things.

load Data_NelsonPlosser
Preprocess the data by taking the natural logarithm of the nGNP series and removing the starting NaN values from each series.

isNaN = any(ismissing(DataTable),2); % Flag periods containing NaNs
gnpn = DataTable.GNPN(~isNaN);
y = diff(DataTable.UR(~isNaN));
T = size(gnpn,1); % The sample size
Z = price2ret(gnpn);
This example continues using the series without NaN values. However, using the Kalman filter framework, the software can accommodate series containing missing values.

Specify the coefficient matrices.

A = NaN;
B = NaN;
C = 1;

Create the state-space model using dssm by supplying the coefficient matrices and specifying that the state values come from a diffuse distribution. The diffuse specification indicates complete ignorance about the moments of the initial distribution.

StateType = 2;
Mdl = dssm(A,B,C,'StateType',StateType);
Estimate the parameters. Specify the regression component and its initial value for optimization using the 'Predictors' and 'Beta0' name-value pair arguments, respectively. Display the estimates and all optimization diagnostic information. Restrict the estimate of σ to all positive, real numbers.

params0 = [0.3 0.2]; % Initial values chosen arbitrarily
Beta0 = 0.1;
EstMdl = estimate(Mdl,y,params0,'Predictors',Z,'Display','full',...
    'Beta0',Beta0,'lb',[-Inf 0 -Inf]);

____________________________________________________________
   Diagnostic Information

Number of variables: 3

Functions
Objective:       @(c)-fML(c,Mdl,Y,Predictors,unitFlag,sqrtFlag,mexFlag,mexTv
Gradient:        finite-differencing
Hessian:         bfgs

Constraints
Nonlinear constraints:                     do not exist

Number of linear inequality constraints:   0
Number of linear equality constraints:     0
Number of lower bound constraints:         1
Number of upper bound constraints:         0

Algorithm selected
   interior-point
____________________________________________________________
   End diagnostic information

                                            First-order      Norm of
 Iter F-count            f(x)  Feasibility   optimality         step
    0       4    5.084683e+03    0.000e+00    5.096e+04
    1       8    6.405732e+02    0.000e+00    7.720e-02    1.457e+04
    2      12    6.405620e+02    0.000e+00    7.713e-02    1.058e-01
    3      16    6.405063e+02    0.000e+00    7.683e-02    5.285e-01
    4      20    6.402322e+02    0.000e+00    7.531e-02    2.632e+00
    5      24    6.389682e+02    0.000e+00    6.816e-02    1.289e+01
    6      28    6.346900e+02    0.000e+00    4.146e-02    5.821e+01
    7      32    6.314789e+02    0.000e+00    1.601e-02    8.771e+01
    8      36    6.307024e+02    0.000e+00    7.462e-03    5.266e+01
    9      40    6.304200e+02    0.000e+00    4.104e-03    4.351e+01
   10      44    6.303324e+02    0.000e+00    4.116e-03    3.167e+01
   11      48    6.303036e+02    0.000e+00    4.120e-03    2.418e+01
   12      52    6.302943e+02    0.000e+00    4.121e-03    1.814e+01
   13      56    6.302913e+02    0.000e+00    4.121e-03    1.377e+01
   14      60    6.302903e+02    0.000e+00    4.121e-03    1.061e+01
   15      64    6.302899e+02    0.000e+00    4.121e-03    8.547e+00
   16      68    6.302897e+02    0.000e+00    4.121e-03    9.148e+00
   17      72    6.302895e+02    0.000e+00    4.121e-03    9.938e+00
   18      76    6.302895e+02    0.000e+00    4.121e-03    4.980e+01
   19      80    6.302893e+02    0.000e+00    4.121e-03    4.247e+01
   20      84    6.302892e+02    0.000e+00    4.121e-03    7.644e+00
   21      89    6.302888e+02    0.000e+00    4.121e-03    2.258e+01
   22      94    6.302888e+02    0.000e+00    4.121e-03    3.886e+00
   23      98    6.302888e+02    0.000e+00    4.121e-03    2.733e-01
   24     102    6.302887e+02    0.000e+00    4.121e-03    4.954e-01
   25     106    6.302886e+02    0.000e+00    4.121e-03    4.668e-01
   26     110    6.302877e+02    0.000e+00    4.121e-03    5.800e-01
   27     114    6.302835e+02    0.000e+00    4.122e-03    1.192e+00
   28     118    6.302622e+02    0.000e+00    4.123e-03    5.206e+00
   29     122    6.301555e+02    0.000e+00    4.131e-03    2.585e+01
   30     126    6.296185e+02    0.000e+00    4.168e-03    1.294e+02
   31     130    6.268336e+02    0.000e+00    4.366e-03    6.530e+02
   32     134    6.096673e+02    0.000e+00    5.812e-03    3.420e+03
   33     139    5.687352e+02    0.000e+00    1.143e-02    5.136e+03
   34     144    5.316167e+02    0.000e+00    9.612e-02    2.581e+03
   35     149    5.141556e+02    0.000e+00    5.743e-01    1.297e+03
   36     155    4.740045e+02    0.000e+00    5.648e-01    3.450e+02
   37     162    4.645023e+02    0.000e+00    6.317e-01    1.301e+02
   38     167    4.344382e+02    0.000e+00    2.566e+00    4.287e+02
   39     172    4.215472e+02    0.000e+00    8.738e+00    2.154e+02
   40     177    3.863045e+02    0.000e+00    1.820e+01    1.084e+02
   41     182    3.077590e+02    0.000e+00    1.419e+01    5.463e+01
   42     187    2.971254e+02    0.000e+00    5.061e+01    2.744e+01
   43     192    2.710109e+02    0.000e+00    1.408e+01    4.612e+00
   44     196    2.700521e+02    0.000e+00    4.378e+00    1.051e+00
   45     203    2.696883e+02    0.000e+00    9.303e+00    2.995e+00
   46     207    2.691209e+02    0.000e+00    2.573e-01    5.053e-01
   47     211    2.689815e+02    0.000e+00    4.715e-01    5.473e-01
   48     215    2.682443e+02    0.000e+00    5.862e-01    2.851e+00
   49     219    2.643332e+02    0.000e+00    9.749e-01    1.457e+01
   50     223    2.375538e+02    0.000e+00    4.233e-01    7.788e+01
   51     228    1.976603e+02    0.000e+00    2.132e+00    6.776e+01
   52     233    1.598195e+02    0.000e+00    8.813e+00    3.505e+01
   53     238    1.333217e+02    0.000e+00    2.315e+01    1.828e+01
   54     244    1.257063e+02    0.000e+00    4.639e+01    7.067e+00
   55     252    1.189732e+02    0.000e+00    2.119e+01    1.413e+01
   56     256    1.150467e+02    0.000e+00    1.117e+01    5.427e+00
   57     261    1.111083e+02    0.000e+00    7.660e+00    1.521e+00
   58     265    1.105664e+02    0.000e+00    3.301e+00    5.585e-01
   59     269    1.104798e+02    0.000e+00    6.910e-01    4.229e-01
   60     273    1.104776e+02    0.000e+00    1.858e-01    1.471e-02
   61     277    1.104772e+02    0.000e+00    7.298e-02    2.136e-02
   62     281    1.104771e+02    0.000e+00    1.584e-02    6.396e-03
   63     285    1.104771e+02    0.000e+00    1.397e-03    6.868e-03
   64     289    1.104771e+02    0.000e+00    6.555e-04    1.744e-03
   65     293    1.104771e+02    0.000e+00    1.311e-04    8.531e-05
   66     297    1.104771e+02    0.000e+00    9.537e-06    4.028e-06
   67     301    1.104771e+02    0.000e+00    1.311e-06    1.517e-07
Local minimum possible. Constraints satisfied.

fmincon stopped because the size of the current step is less than the value of the step size tolerance and constraints are satisfied to within the value of the constraint tolerance.

Method: Maximum likelihood (fmincon)
Effective Sample size:             60
Logarithmic  likelihood:     -110.477
Akaike   info criterion:      226.954
Bayesian info criterion:      233.287
      |     Coeff       Std Err    t Stat     Prob
--------------------------------------------------------
 c(1) |  0.59436       0.09408    6.31738   0
 c(2) |  1.52554       0.10758   14.17991   0
y eps)

ans =
ans(:,:,1) =
     0
ans(:,:,2) =
     0
ans(:,:,3) =
     0
ans(:,:,4) =
     0
Row sums among the pages are close to 1.

Display the contributions to the forecast error variance of the bond rate when real income is shocked at time 0.

Decomposition(:,2,3)

ans = 20×1
    0.0499
    0.1389
    0.1700
    0.1807
    0.1777
    0.1694
    0.1601
    0.1516
    0.1446
    0.1390
    ⋮

Plot the FEVDs of all series on separate plots by passing the estimated AR coefficient matrices and innovations covariance matrix of EstMdl to armafevd.

armafevd(EstMdl.AR,[],"InnovCov",EstMdl.Covariance);
Each plot shows the four FEVDs of a variable when all other variables are shocked at time 0. Mdl.SeriesNames specifies the variable order.
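To show what an orthogonalized FEVD measures, the sketch below computes one by hand for a hypothetical bivariate VAR(1) using only base MATLAB functions. It follows the textbook construction (moving average coefficients combined with the lower Cholesky factor of the innovations covariance) and arranges the result like Decomposition, with columns for the shocked variable and pages for the response variable. The coefficient values are made up, and the code is an illustration of the idea rather than the implementation inside fevd or armafevd.

```matlab
% Hypothetical bivariate VAR(1): y(t) = A*y(t-1) + e(t), Cov(e) = Sigma
A = [0.5 0.1; 0.2 0.3];
Sigma = [1.0 0.3; 0.3 0.5];
numObs = 20;
n = size(A,1);

P = chol(Sigma,'lower');            % lower Cholesky factor, Sigma = P*P'
Psi = eye(n);                       % moving average coefficient at horizon 0
cumSq = zeros(n);                   % cumSq(i,j): cumulative squared response of variable i to shock j
Decomposition = zeros(numObs,n,n);
for h = 1:numObs
    cumSq = cumSq + (Psi*P).^2;
    share = cumSq./sum(cumSq,2);    % divide each row by variable i's forecast error variance
    Decomposition(h,:,:) = reshape(share.',[1 n n]);   % columns: shocked variable, pages: response
    Psi = A*Psi;                    % moving average coefficient for the next horizon
end
rowSums = sum(Decomposition,2);     % numObs-by-1-by-n array of row sums
```

Because the orthogonalized shocks are uncorrelated, the contributions to each variable's forecast error variance add up to 1 at every horizon, which is why the row sums among the pages are close to 1.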
Estimate Generalized FEVD of VAR Model

Consider the 4-D VAR(2) model in “Specify Data in Numeric Matrix When Plotting FEVD” on page 12-812. Estimate the generalized FEVD of the system for 100 periods.

Load the Danish money and income data set, then estimate the VAR(2) model.

load Data_JDanish
Mdl = varm(4,2);
Mdl.SeriesNames = DataTable.Properties.VariableNames;
Mdl = estimate(Mdl,DataTable.Series);

Estimate the generalized FEVD from the estimated VAR(2) model over a forecast horizon with length 100.

Decomposition = fevd(Mdl,Method="generalized",NumObs=100);

Decomposition is a 100-by-4-by-4 array representing the generalized FEVD of Mdl.

Plot the generalized FEVD of the bond rate when real income is shocked at time 0.
figure;
plot(1:100,Decomposition(:,2,3))
title("FEVD of IB When Y Is Shocked")
xlabel("Forecast Horizon")
ylabel("Variance Contribution")
grid on
When real income is shocked, its contribution to the forecast error variance of the bond rate settles at approximately 0.061.
Specify Data in Timetables When Computing FEVD and Confidence Intervals

Fit a 4-D VAR(2) model to Danish money and income rate series data in a timetable. Then, estimate and plot the orthogonalized FEVD and corresponding confidence intervals from the estimated model.

Load the Danish money and income data set.

load Data_JDanish

The data set includes four time series in the timetable DataTimeTable. For more details on the data set, enter Description at the command line.

Assuming that the series are stationary, create a varm model object that represents a 4-D VAR(2) model. Specify the variable names.
Mdl = varm(4,2);
Mdl.SeriesNames = DataTimeTable.Properties.VariableNames;

Mdl is a varm model object specifying the structure of a 4-D VAR(2) model; it is a template for estimation.

Fit the VAR(2) model to the data set.

EstMdl = estimate(Mdl,DataTimeTable);

EstMdl is a fully specified varm model object representing an estimated 4-D VAR(2) model.

Estimate the orthogonalized FEVD and corresponding 95% confidence intervals from the estimated VAR(2) model. To return confidence intervals, you must set a name-value argument that controls confidence intervals, for example, Confidence. Set Confidence to 0.95.

rng(1); % For reproducibility
Tbl = fevd(EstMdl,Confidence=0.95);
size(Tbl)

ans = 1×2
    20    12
Tbl is a timetable with 20 rows, representing the periods in the FEVD, and 12 variables. Each variable is a 20-by-4 matrix of the FEVD or confidence bound associated with a variable in the model EstMdl. For example, Tbl.M2_FEVD(:,2) is the FEVD of M2 resulting from a 1-standard-deviation shock on 01-Apr-1974 (period 0) to Mdl.SeriesNames(2), which is the variable Y. [Tbl.M2_FEVD_LowerBound(:,2),Tbl.M2_FEVD_UpperBound(:,2)] are the corresponding 95% confidence intervals.

Plot the FEVD of M2 and its 95% confidence interval resulting from a 1-standard-deviation shock on 01-Apr-1974 (period 0) to Mdl.SeriesNames(2), which is the variable Y.

idxM2 = startsWith(Tbl.Properties.VariableNames,"M2");
M2FEVD = Tbl(:,idxM2);
shockIdx = 2;

figure
hold on
plot(M2FEVD.Time,M2FEVD.M2_FEVD(:,shockIdx),"-o")
plot(M2FEVD.Time,[M2FEVD.M2_FEVD_LowerBound(:,shockIdx) ...
    M2FEVD.M2_FEVD_UpperBound(:,shockIdx)],"-o",Color="r")
legend("FEVD","95% confidence interval")
title('M2 FEVD, Shock to Y')
hold off
Monte Carlo Confidence Intervals on True FEVD

Consider the 4-D VAR(2) model in “Specify Data in Numeric Matrix When Plotting FEVD” on page 12-812. Estimate and plot its orthogonalized FEVD and 95% Monte Carlo confidence intervals on the true FEVD.

Load the Danish money and income data set, then estimate the VAR(2) model.

load Data_JDanish
Mdl = varm(4,2);
Mdl.SeriesNames = DataTable.Properties.VariableNames;
Mdl = estimate(Mdl,DataTable.Series);

Estimate the FEVD and corresponding 95% Monte Carlo confidence intervals from the estimated VAR(2) model.

rng(1); % For reproducibility
[Decomposition,Lower,Upper] = fevd(Mdl);
Decomposition, Lower, and Upper are 20-by-4-by-4 arrays representing the orthogonalized FEVD of Mdl and corresponding lower and upper bounds of the confidence intervals. For all arrays, rows correspond to consecutive time points from time 1 to 20, columns correspond to variables receiving a one-standard-deviation innovation shock at time 0, and pages correspond to the variables whose forecast error variance fevd decomposes. Mdl.SeriesNames specifies the variable order.
Plot the orthogonalized FEVD with its confidence bounds of the bond rate when real income is shocked at time 0.

fevdshock2resp3 = Decomposition(:,2,3);
FEVDCIShock2Resp3 = [Lower(:,2,3) Upper(:,2,3)];

figure;
h1 = plot(1:20,fevdshock2resp3);
hold on
h2 = plot(1:20,FEVDCIShock2Resp3,'r--');
legend([h1 h2(1)],["FEVD" "95% Confidence Interval"],...
    'Location',"best")
xlabel("Forecast Horizon");
ylabel("Variance Contribution");
title("FEVD of IB When Y Is Shocked");
grid on
hold off
In the long run, and when real income is shocked, the proportion of forecast error variance of the bond rate settles between approximately 0 and 0.5 with 95% confidence.
Bootstrap Confidence Intervals on True FEVD

Consider the 4-D VAR(2) model in “Specify Data in Numeric Matrix When Plotting FEVD” on page 12-812. Estimate and plot its orthogonalized FEVD and 90% bootstrap confidence intervals on the true FEVD.

Load the Danish money and income data set, then estimate the VAR(2) model. Return the residuals from model estimation.

load Data_JDanish
Mdl = varm(4,2);
Mdl.SeriesNames = DataTable.Properties.VariableNames;
[Mdl,~,~,Res] = estimate(Mdl,DataTable.Series);
T = size(DataTable,1) % Total sample size

T = 55

n = size(Res,1) % Effective sample size

n = 53

Res is a 53-by-4 array of residuals. Columns correspond to the variables in Mdl.SeriesNames. The estimate function requires Mdl.P = 2 observations to initialize a VAR(2) model for estimation. Because presample data (Y0) is unspecified, estimate takes the first two observations in the specified response data to initialize the model. Therefore, the resulting effective sample size is T – Mdl.P = 53, and rows of Res correspond to the observation indices 3 through T.

Estimate the orthogonalized FEVD and corresponding 90% bootstrap confidence intervals from the estimated VAR(2) model. Draw 500 paths of length n from the series of residuals.

rng(1); % For reproducibility
[Decomposition,Lower,Upper] = fevd(Mdl,E=Res,NumPaths=500, ...
    Confidence=0.9);
Plot the orthogonalized FEVD with its confidence bounds of the bond rate when real income is shocked at time 0.

fevdshock2resp3 = Decomposition(:,2,3);
FEVDCIShock2Resp3 = [Lower(:,2,3) Upper(:,2,3)];

figure;
h1 = plot(1:20,fevdshock2resp3);
hold on
h2 = plot(1:20,FEVDCIShock2Resp3,"r--");
legend([h1 h2(1)],["FEVD" "90% Confidence Interval"],...
    'Location',"best")
xlabel("Forecast Horizon");
ylabel("Variance Contribution");
title("FEVD of IB When Y Is Shocked");
grid on
hold off
In the long run, and when real income is shocked, the proportion of forecast error variance of the bond rate settles between approximately 0.05 and 0.4 with 90% confidence.
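The resampling step behind a residual bootstrap can be sketched in a few lines of base MATLAB. The example below draws rows of a residual array with replacement to form innovation paths; the random Res array is a stand-in for the 53-by-4 residuals above, and the variable names are hypothetical. How fevd filters each path through the model and turns the results into confidence bounds is not shown here.

```matlab
rng(1);                               % for reproducibility
Res = randn(53,4);                    % hypothetical stand-in for the 53-by-4 residual array
numPaths = 500;
sampleSize = size(Res,1);             % path length, here equal to the number of residuals

% Each page of EBoot holds one resampled path of innovations
EBoot = zeros(sampleSize,size(Res,2),numPaths);
for p = 1:numPaths
    idx = randi(size(Res,1),sampleSize,1);   % row indices drawn with replacement
    EBoot(:,:,p) = Res(idx,:);               % whole rows, so cross-series correlation is kept
end
```

Sampling whole rows rather than individual elements keeps the contemporaneous dependence among the four series in every resampled path.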
Input Arguments

Mdl — VAR model
varm model object

VAR model, specified as a varm model object created by varm or estimate. Mdl must be fully specified.

If Mdl is an estimated model (returned by estimate), you must supply any optional data using the same data type as the input response data, to which the model is fit.

If Mdl is a custom varm model object (an object not returned by estimate or modified after estimation), fevd can require a sample size for the simulation SampleSize or presample responses Y0.

Name-Value Pair Arguments

Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter.

Before R2021a, use commas to separate each name and value, and enclose Name in quotes.
Example: fevd(Mdl,NumObs=10,Method="generalized",E=Res) specifies estimating a generalized FEVD for periods 1 through 10, and bootstraps the residuals in the numeric array Res to compute the 95% confidence bounds. Options for All FEVDs
NumObs — Number of periods
20 (default) | positive integer

Number of periods for which fevd computes the FEVD (the forecast horizon), specified as a positive integer. NumObs specifies the number of observations to include in the FEVD (the number of rows in Decomposition).
Example: NumObs=10 specifies estimation of the FEVD for times 1 through 10.
Data Types: double

Method — FEVD computation method
"orthogonalized" (default) | "generalized" | character vector

FEVD computation method, specified as a value in this table.

Value               Description
"orthogonalized"    Compute variance decompositions using orthogonalized, one-standard-deviation innovation shocks. fevd uses the Cholesky factorization of Mdl.Covariance for orthogonalization.
"generalized"       Compute variance decompositions using one-standard-deviation innovation shocks.

Example: Method="generalized"
Data Types: char | string

Options for Confidence Bound Estimation
NumPaths — Number of sample paths 100 (default) | positive integer Number of sample paths (trials) to generate, specified as a positive integer. Example: NumPaths=1000 generates 1000 sample paths from which the software derives the confidence bounds. Data Types: double SampleSize — Number of observations for Monte Carlo simulation or bootstrap per sample path positive integer Number of observations for the Monte Carlo simulation or bootstrap per sample path, specified as a positive integer. • If Mdl is an estimated varm model object (an object returned by estimate and unmodified thereafter), the default is the sample size of the data to which the model is fit (see summarize). 12-824
fevd
• Otherwise: • If fevd estimates confidence bounds by conducting a Monte Carlo simulation, you must specify SampleSize. • If fevd estimates confidence bounds by bootstrapping residuals, the default is the length of the specified series of residuals (size(Res,1), where Res is the number of residuals in E or InSample). Example: If you specify SampleSize=100 and do not specify the E name-value argument, the software estimates confidence bounds from NumPaths random paths of length 100 from Mdl. Example: If you specify SampleSize=100,E=Res, the software resamples, with replacement, 100 observations (rows) from Res to form a sample path of innovations to filter through Mdl. The software forms NumPaths random sample paths from which it derives confidence bounds. Data Types: double Y0 — Presample response data numeric matrix Presample response data that provides initial values for model estimation during the simulation, specified as a numpreobs-by-numseries numeric matrix. Use Y0 only in the following situations: • You supply other optional data inputs as numeric matrices. • Mdl is an estimated varm model object (an object returned by estimate and unmodified thereafter) fit to a numeric matrix of response data. numpreobs is the number of presample observations. numseries is Mdl.NumSeries, the dimensionality of the input model. Each row is a presample observation, and measurements in each row occur simultaneously. The last row contains the latest presample observation. numpreobs is the number of specified presample responses and it must be at least Mdl.P. If you supply more rows than necessary, fevd uses the latest Mdl.P observations only. numseries is the dimensionality of the input VAR model Mdl.NumSeries. Columns must correspond to the response variables in Mdl.SeriesNames. The following situations determine the default or whether presample response data is required. 
• If Mdl is an unmodified estimated model, fevd sets Y0 to the presample response data used for estimation by default (see the Y0 name-value argument of estimate).
• If Mdl is a custom model and you return confidence bounds Lower or Upper, you must specify Y0.
Data Types: double

Presample — Presample data
table | timetable
Presample data that provide initial values for the model Mdl, specified as a table or timetable with numprevars variables and numpreobs rows. Use Presample only in the following situations:
• You supply other optional data inputs as tables or timetables.
• Mdl is an estimated varm model object (an object returned by estimate and unmodified thereafter) fit to response data in a table or timetable.
Each row is a presample observation, and measurements in each row occur simultaneously. numpreobs must be at least Mdl.P. If you supply more rows than necessary, fevd uses the latest Mdl.P observations only.
Each variable is a numpreobs-by-1 numeric vector representing one path. To control presample variable selection, see the optional PresampleResponseVariables name-value argument.
If Presample is a timetable, all the following conditions must be true:
• Presample must represent a sample with a regular datetime time step (see isregular).
• The datetime vector of sample timestamps Presample.Time must be ascending or descending.
If Presample is a table, the following conditions hold:
• The last row contains the latest presample observation.
• Presample.Properties.RowNames must be empty.
The following situations determine the default or whether presample response data is required.
• If Mdl is an unmodified estimated model, fevd sets Presample to the presample response data used for estimation by default (see the Presample name-value argument of estimate).
• If Mdl is a custom model (for example, you modify a model after estimation by using dot notation) and you return confidence bounds in the table or timetable Tbl, you must specify Presample.
PresampleResponseVariables — Variables to select from Presample to use for presample response data
string vector | cell vector of character vectors | vector of integers | logical vector
Variables to select from Presample to use for presample data, specified as one of the following data types:
• String vector or cell vector of character vectors containing numseries variable names in Presample.Properties.VariableNames
• A length numseries vector of unique indices (integers) of variables to select from Presample.Properties.VariableNames
• A length numprevars logical vector, where PresampleResponseVariables(j) = true selects variable j from Presample.Properties.VariableNames, and sum(PresampleResponseVariables) is numseries
PresampleResponseVariables applies only when you specify Presample.
The selected variables must be numeric vectors and cannot contain missing values (NaN).
PresampleResponseVariables does not need to contain the same names as in Mdl.SeriesNames; fevd uses the data in selected variable PresampleResponseVariables(j) as a presample for Mdl.SeriesNames(j).
If the number of variables in Presample matches Mdl.NumSeries, the default specifies all variables in Presample. If the number of variables in Presample exceeds Mdl.NumSeries, the default matches variables in Presample to names in Mdl.SeriesNames.
Example: PresampleResponseVariables=["GDP" "CPI"]
Example: PresampleResponseVariables=[true false true false] or PresampleResponseVariables=[1 3] selects the first and third table variables for presample data.
Data Types: double | logical | char | cell | string X — Predictor data numeric matrix Predictor data xt for estimating the model regression component during the simulation, specified as a numeric matrix containing numpreds columns. Use X only in the following situations: • You supply other optional data inputs as numeric matrices. • Mdl is an estimated varm model object (an object returned by estimate and unmodified thereafter) fit to a numeric matrix of response data. numpreds is the number of predictor variables (size(Mdl.Beta,2)). Each row corresponds to an observation, and measurements in each row occur simultaneously. The last row contains the latest observation. X must have at least SampleSize rows. If you supply more rows than necessary, fevd uses only the latest observations. fevd does not use the regression component in the presample period. Columns correspond to individual predictor variables. All predictor variables are present in the regression component of each response equation. To maintain model consistency when fevd estimates the confidence bounds, a good practice is to specify predictor data when Mdl has a regression component. If Mdl is an estimated model, specify the predictor data used during model estimation (see the X name-value argument of estimate). By default, fevd excludes the regression component from confidence bound estimation, regardless of its presence in Mdl. Data Types: double E — Series of residuals et from which to draw bootstrap samples numeric matrix Series of residuals from which to draw bootstrap samples, specified as a numperiods-by-numseries numeric matrix. fevd assumes that E is free of serial correlation. Use E only in the following situations: • You supply other optional data inputs as numeric matrices. • Mdl is an estimated varm model object (an object returned by estimate and unmodified thereafter) fit to a numeric matrix of response data. 
Each column is the residual series corresponding to the response series names in Mdl.SeriesNames.
Each row corresponds to a period in the FEVD and the corresponding confidence bounds.
If Mdl is an estimated varm model object (an object returned by estimate), you can specify E as the inferred residuals from estimation (see the E output argument of estimate or infer).
By default, fevd derives confidence bounds by conducting a Monte Carlo simulation.
Data Types: double

InSample — Time series data
table | timetable
Time series data containing numvars variables, including numseries variables of residuals et to bootstrap or numpreds predictor variables xt for the model regression component, specified as a table or timetable. Use InSample only in the following situations:
• You supply other optional data inputs as tables or timetables.
• Mdl is an estimated varm model object (an object returned by estimate and unmodified thereafter) fit to response data in a table or timetable.
Each variable is a single path of observations, which fevd applies to all NumPaths sample paths. If you specify InSample, you must specify which variables are residuals and which are predictors; see the ResidualVariables and PredictorVariables name-value arguments.
Each row is an observation, and measurements in each row occur simultaneously. InSample must have at least SampleSize rows. If you supply more rows than necessary, fevd uses only the latest observations.
If InSample is a timetable, the following conditions apply:
• InSample must represent a sample with a regular datetime time step (see isregular).
• The datetime vector InSample.Time must be strictly ascending or descending.
• Presample must immediately precede InSample, with respect to the sampling frequency.
If InSample is a table, the following conditions hold:
• The last row contains the latest observation.
• InSample.Properties.RowNames must be empty.
By default, fevd derives confidence bounds by conducting a Monte Carlo simulation and does not use the model regression component, regardless of its presence in Mdl.
ResidualVariables — Variables to select from InSample to treat as residuals et for bootstrapping
string vector | cell vector of character vectors | vector of integers | logical vector
Variables to select from InSample to treat as residuals for bootstrapping, specified as one of the following data types:
• String vector or cell vector of character vectors containing numseries variable names in InSample.Properties.VariableNames
• A length numseries vector of unique indices (integers) of variables to select from InSample.Properties.VariableNames
• A length numvars logical vector, where ResidualVariables(j) = true selects variable j from InSample.Properties.VariableNames, and sum(ResidualVariables) is numseries
Regardless, selected residual variable j is the residual series for Mdl.SeriesNames(j).
The selected variables must be numeric vectors and cannot contain missing values (NaN).
By default, fevd derives confidence bounds by conducting a Monte Carlo simulation.
Example: ResidualVariables=["GDP_Residuals" "CPI_Residuals"]
Example: ResidualVariables=[true false true false] or ResidualVariables=[1 3] selects the first and third table variables as the residual variables.
Data Types: double | logical | char | cell | string

PredictorVariables — Variables to select from InSample to treat as exogenous predictor variables xt
string vector | cell vector of character vectors | vector of integers | logical vector
Variables to select from InSample to treat as exogenous predictor variables xt, specified as one of the following data types:
• String vector or cell vector of character vectors containing numpreds variable names in InSample.Properties.VariableNames
• A length numpreds vector of unique indices (integers) of variables to select from InSample.Properties.VariableNames
• A length numvars logical vector, where PredictorVariables(j) = true selects variable j from InSample.Properties.VariableNames, and sum(PredictorVariables) is numpreds
Regardless, selected predictor variable j corresponds to the coefficients Mdl.Beta(:,j).
PredictorVariables applies only when you specify InSample.
The selected variables must be numeric vectors and cannot contain missing values (NaN).
By default, fevd excludes the regression component, regardless of its presence in Mdl.
Example: PredictorVariables=["M1SL" "TB3MS" "UNRATE"]
Example: PredictorVariables=[true false true false] or PredictorVariables=[1 3] selects the first and third table variables as the predictor variables.
Data Types: double | logical | char | cell | string

Confidence — Confidence level
0.95 (default) | numeric scalar in [0,1]
Confidence level for the confidence bounds, specified as a numeric scalar in the interval [0,1].
For each period, randomly drawn confidence intervals cover the true response 100*Confidence% of the time.
The default value is 0.95, which implies that the confidence bounds represent 95% confidence intervals.
Example: Confidence=0.9 specifies 90% confidence intervals.
Data Types: double

Note
• NaN values in Y0, X, and E indicate missing data. fevd removes missing data from these arguments by list-wise deletion. For each argument, if a row contains at least one NaN, then fevd removes the entire row. List-wise deletion reduces the sample size, can create irregular time series, and can cause E and X to be unsynchronized.
• fevd issues an error when any table or timetable input contains missing values.
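The list-wise deletion described in the note can be illustrated with a minimal base-MATLAB sketch. The matrices E and X below are hypothetical stand-ins, and the code only mimics the row-removal rule stated above; it is not the fevd implementation. It shows why deleting rows separately from each argument can leave E and X unsynchronized.

```matlab
% Hypothetical residual and predictor data with missing values (NaN).
E = [0.1 -0.2; NaN 0.3; 0.4 0.5; -0.6 NaN; 0.7 0.8];
X = [1; 2; NaN; 4; 5];

% List-wise deletion applied to each argument separately:
% a row is dropped if it contains at least one NaN.
keepE  = ~any(isnan(E),2);
keepX  = ~any(isnan(X),2);
Eclean = E(keepE,:);     % keeps rows 1, 3, 5
Xclean = X(keepX,:);     % keeps rows 1, 2, 4, 5

% The surviving rows no longer refer to the same periods.
% One way to keep the data aligned is to delete rows jointly beforehand.
keepBoth = keepE & keepX;
Esync = E(keepBoth,:);   % rows 1 and 5
Xsync = X(keepBoth,:);   % rows 1 and 5
```

Removing missing rows jointly before calling fevd, as in the last three lines, is one possible way to avoid the synchronization problem.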
Output Arguments

Decomposition — FEVD
numeric array
FEVD of each response variable, returned as a numobs-by-numseries-by-numseries numeric array. numobs is the value of NumObs. Columns and pages correspond to the response variables in Mdl.SeriesNames.
fevd returns Decomposition only in the following situations:
• You supply optional data inputs as numeric matrices.
• Mdl is an estimated model fit to a numeric matrix of response data.
Decomposition(t,j,k) is the contribution to the variance decomposition of variable k at time t attributable to a one-standard-deviation innovation shock to variable j at time 0, for t = 1,2,…,numobs, j = 1,2,...,numseries, and k = 1,2,...,numseries.

Lower — Lower confidence bounds
numeric array
Lower confidence bounds, returned as a numobs-by-numseries-by-numseries numeric array. Elements of Lower correspond to elements of Decomposition.
fevd returns Lower only in the following situations:
• You supply optional data inputs as numeric matrices.
• Mdl is an estimated model fit to a numeric matrix of response data.
Lower(t,j,k) is the lower bound of the 100*Confidence-th percentile interval on the true contribution to the variance decomposition of variable k at time t attributable to a one-standard-deviation innovation shock to variable j at time 0.

Upper — Upper confidence bounds
numeric array
Upper confidence bounds, returned as a numobs-by-numseries-by-numseries numeric array. Elements of Upper correspond to elements of Decomposition.
fevd returns Upper only in the following situations:
• You supply optional data inputs as numeric matrices.
• Mdl is an estimated model fit to a numeric matrix of response data.
Upper(t,j,k) is the upper bound of the 100*Confidence-th percentile interval on the true contribution to the variance decomposition of variable k at time t attributable to a one-standard-deviation innovation shock to variable j at time 0.
Tbl — FEVD and confidence bounds
table | timetable
FEVD and confidence bounds, returned as a table or timetable with numobs rows. fevd returns Tbl only in the following situations:
• You supply optional data inputs as tables or timetables.
• Mdl is an estimated model object fit to response data in a table or timetable.
Regardless, the data type of Tbl is the same as the data type of specified data.
Tbl contains the following variables:
• The FEVD of each series in yt. Each FEVD variable in Tbl is a numobs-by-numseries numeric matrix, where numobs is the value of NumObs and numseries is the value of Mdl.NumSeries. fevd names the FEVD of response variable ResponseJ in Mdl.SeriesNames ResponseJ_FEVD. For example, if Mdl.SeriesNames(j) is GDP, Tbl contains a variable for the corresponding FEVD with the name GDP_FEVD. ResponseJ_FEVD(t,k) is the contribution to the variance decomposition of response variable ResponseJ at time t attributable to a one-standard-deviation innovation shock to variable k at time 0, for t = 1,2,…,numobs, J = 1,2,...,numseries, and k = 1,2,...,numseries.
• The lower and upper confidence bounds on the true FEVD of the response series, when you set at least one name-value argument that controls the confidence bounds. Each confidence bound variable in Tbl is a numobs-by-numseries numeric matrix. ResponseJ_FEVD_LowerBound and ResponseJ_FEVD_UpperBound are the names of the lower and upper bound variables, respectively, of the confidence interval on the FEVD of response variable Mdl.SeriesNames(J) = ResponseJ. For example, if Mdl.SeriesNames(j) is GDP, Tbl contains variables for the corresponding lower and upper bounds of the confidence interval with the names GDP_FEVD_LowerBound and GDP_FEVD_UpperBound. (ResponseJ_FEVD_LowerBound(t,k), ResponseJ_FEVD_UpperBound(t,k)) is the 100*Confidence% confidence interval (95% by default) on the FEVD of response variable ResponseJ at time t attributable to a one-standard-deviation innovation shock to variable k at time 0, for t = 1,2,…,numobs, J = 1,2,...,numseries, and k = 1,2,...,numseries.
If Tbl is a timetable, the row order of Tbl, either ascending or descending, matches the row order of InSample, when you specify it.
If you do not specify InSample and you specify Presample, the row order of Tbl is the same as the row order of Presample.
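The naming convention for the variables of Tbl can be sketched in base MATLAB. The series names and the zero-filled matrix below are hypothetical placeholders (no model is estimated); the sketch only shows how the documented suffixes combine with series names and how a column of a matrix-valued table variable is addressed.

```matlab
% Hypothetical series names standing in for Mdl.SeriesNames.
seriesNames = ["M2" "Y" "IB" "ID"];
numseries   = numel(seriesNames);
numobs      = 20;

% Variable names following the documented suffix convention.
fevdNames  = seriesNames + "_FEVD";
lowerNames = seriesNames + "_FEVD_LowerBound";
upperNames = seriesNames + "_FEVD_UpperBound";
allNames   = [fevdNames lowerNames upperNames];  % ordering here is illustrative only

% Each table variable is a numobs-by-numseries matrix; zeros are placeholders.
Tbl = table(zeros(numobs,numseries),'VariableNames',fevdNames(1));

% Column k of M2_FEVD holds the part of M2's forecast error variance
% attributed to a shock to series k (here k = 2, the series named Y).
shareFromY = Tbl.M2_FEVD(:,2);
```

With four series and confidence bounds requested, this convention yields 3 × 4 = 12 variables.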
More About

Forecast Error Variance Decomposition
The forecast error variance decomposition (FEVD) of a multivariate, dynamic system shows the relative importance of a shock to each innovation in affecting the forecast error variance of all variables in the system.
Consider a numseries-D VAR(p) model on page 12-832 for the multivariate response variable $y_t$. In lag operator notation, the infinite lag MA representation of $y_t$ is:

$$y_t = \Phi^{-1}(L)\left(c + \beta x_t + \delta t\right) + \Phi^{-1}(L)\varepsilon_t = \Omega(L)\left(c + \beta x_t + \delta t\right) + \Omega(L)\varepsilon_t .$$

The general form of the FEVD of $y_{kt}$ (variable k) m periods into the future, attributable to a one-standard-deviation innovation shock to $y_{jt}$, is

$$\gamma_{mjk} = \frac{\sum_{t=0}^{m-1} \left(e_k' C_t e_j\right)^2}{\sum_{t=0}^{m-1} e_k' \Omega_t \Sigma \Omega_t' e_k} .$$
• $e_j$ is a selection vector of length numseries containing a 1 in element j and zeros elsewhere.
• For orthogonalized FEVDs, $C_m = \Omega_m P$, where P is the lower triangular factor in the Cholesky factorization of Σ.
• For generalized FEVDs, $C_m = \sigma_j^{-1}\Omega_m \Sigma$, where $\sigma_j$ is the standard deviation of innovation j.
• The numerator is the contribution of an innovation shock to variable j to the forecast error variance of the m-step-ahead forecast of variable k. The denominator is the mean square error (MSE) of the m-step-ahead forecast of variable k [3].

Vector Autoregression Model
A vector autoregression (VAR) model is a stationary multivariate time series model consisting of a system of numseries equations of numseries distinct response variables as linear functions of lagged responses and other terms.
A VAR(p) model in difference-equation notation and in reduced form is

$$y_t = c + \Phi_1 y_{t-1} + \Phi_2 y_{t-2} + \dots + \Phi_p y_{t-p} + \beta x_t + \delta t + \varepsilon_t .$$

• $y_t$ is a numseries-by-1 vector of values corresponding to numseries response variables at time t, where t = 1,...,T. The structural coefficient is the identity matrix.
• c is a numseries-by-1 vector of constants.
• $\Phi_j$ is a numseries-by-numseries matrix of autoregressive coefficients, where j = 1,...,p and $\Phi_p$ is not a matrix containing only zeros.
• $x_t$ is a numpreds-by-1 vector of values corresponding to numpreds exogenous predictor variables.
• β is a numseries-by-numpreds matrix of regression coefficients.
• δ is a numseries-by-1 vector of linear time-trend values.
• $\varepsilon_t$ is a numseries-by-1 vector of random Gaussian innovations, each with a mean of 0 and collectively a numseries-by-numseries covariance matrix Σ. For t ≠ s, $\varepsilon_t$ and $\varepsilon_s$ are independent.
Condensed and in lag operator notation, the system is

$$\Phi(L) y_t = c + \beta x_t + \delta t + \varepsilon_t ,$$

where $\Phi(L) = I - \Phi_1 L - \Phi_2 L^2 - \dots - \Phi_p L^p$, $\Phi(L) y_t$ is the multivariate autoregressive polynomial, and I is the numseries-by-numseries identity matrix.
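The FEVD definition above can be evaluated directly for a small model. The following base-MATLAB sketch assumes a hypothetical 2-D VAR(1) model, for which the MA coefficients are powers of the AR coefficient matrix; the matrices A and Sigma are made-up values, and the code illustrates the formula rather than the internal computations of fevd.

```matlab
% Hypothetical 2-D VAR(1): y_t = A*y_{t-1} + e_t, cov(e_t) = Sigma.
A     = [0.5 0.1; 0.2 0.3];
Sigma = [1.0 0.3; 0.3 0.5];
m     = 10;                      % forecast horizon
n     = size(A,1);

P     = chol(Sigma,'lower');     % lower triangular Cholesky factor of Sigma
sigma = sqrt(diag(Sigma));       % innovation standard deviations

numOrth = zeros(n,n);            % numerators, stored as (shock j, response k)
numGen  = zeros(n,n);
mse     = zeros(1,n);            % denominators: m-step MSE of each response k
Omega   = eye(n);                % Omega_0 = I
for t = 0:m-1
    Corth   = Omega*P;                   % orthogonalized: C_t = Omega_t*P
    Cgen    = (Omega*Sigma)./sigma';     % generalized: column j scaled by 1/sigma_j
    numOrth = numOrth + (Corth.^2)';     % accumulate (e_k'*C_t*e_j)^2
    numGen  = numGen  + (Cgen.^2)';
    mse     = mse + diag(Omega*Sigma*Omega')';
    Omega   = A*Omega;                   % Omega_{t+1} = A^(t+1) for a VAR(1)
end
gammaOrth = numOrth./mse;        % gammaOrth(j,k): shock j, response k
gammaGen  = numGen./mse;

disp(sum(gammaOrth,1))           % orthogonalized shares of each response sum to 1
disp(sum(gammaGen,1))            % generalized shares need not sum to 1
```

In this sketch the first row of gammaOrth equals the first row of gammaGen, which is consistent with the statement in the Algorithms section that the two methods agree for shocks to the first variable.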
Algorithms
• If Method is "orthogonalized", then fevd orthogonalizes the innovation shocks by applying the Cholesky factorization of the model covariance matrix Mdl.Covariance. The covariance of the orthogonalized innovation shocks is the identity matrix, and the FEVD of each variable sums to one (that is, the sum along any row of Decomposition or rows associated with FEVD variables in Tbl is one). Therefore, the orthogonalized FEVD represents the proportion of forecast error variance attributable to various shocks in the system. However, the orthogonalized FEVD generally depends on the order of the variables.
• If Method is "generalized", then the resulting FEVD is invariant to the order of the variables, and is not based on an orthogonal transformation. Also, the resulting FEVD sums to one for a particular variable only when Mdl.Covariance is diagonal [4]. Therefore, the generalized FEVD represents the contribution to the forecast error variance of equation-wise shocks to the response variables in the model.
• If Mdl.Covariance is a diagonal matrix, then the resulting generalized and orthogonalized FEVDs are identical. Otherwise, the resulting generalized and orthogonalized FEVDs are identical only when the first variable in Mdl.SeriesNames shocks all variables (for example, all else being the same, both methods yield the same value of Decomposition(:,1,:)).
• The predictor data in X or InSample represents a single path of exogenous multivariate time series. If you specify X or InSample and the model Mdl has a regression component (Mdl.Beta is not an empty array), fevd applies the same exogenous data to all paths used for confidence interval estimation.
• fevd conducts a simulation to estimate the confidence bounds Lower and Upper or associated variables in Tbl.
  • If you do not specify residuals by supplying E or using InSample, fevd conducts a Monte Carlo simulation by following this procedure:
    1. Simulate NumPaths response paths of length SampleSize from Mdl.
    2. Fit NumPaths models that have the same structure as Mdl to the simulated response paths. If Mdl contains a regression component and you specify predictor data by supplying X or using InSample, fevd fits the NumPaths models to the simulated response paths and the same predictor data (the same predictor data applies to all paths).
    3. Estimate NumPaths FEVDs from the NumPaths estimated models.
    4. For each time point t = 0,…,NumObs, estimate the confidence intervals by computing 1 – Confidence and Confidence quantiles (the lower and upper bounds, respectively).
  • Otherwise, fevd conducts a nonparametric bootstrap by following this procedure:
    1. Resample, with replacement, SampleSize residuals from E or InSample. Perform this step NumPaths times to obtain NumPaths paths.
    2. Center each path of bootstrapped residuals.
    3. Filter each path of centered, bootstrapped residuals through Mdl to obtain NumPaths bootstrapped response paths of length SampleSize.
    4. Complete steps 2 through 4 of the Monte Carlo simulation, but replace the simulated response paths with the bootstrapped response paths.
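The bootstrap procedure can be mimicked on a toy problem in base MATLAB. Everything in the following sketch is an assumption made for illustration: the VAR(1) coefficient matrix A stands in for Mdl, Res is simulated rather than inferred from data, the refit is a plain least-squares VAR(1) without a constant, only one FEVD entry is tracked, and the bounds are equal-tailed empirical percentiles (the quantile levels that fevd actually uses may differ).

```matlab
rng(1);                                    % for reproducibility
A   = [0.5 0.1; 0.2 0.3];                  % hypothetical VAR(1) coefficients
Res = randn(60,2)*chol([1 0.3; 0.3 0.5]);  % hypothetical residual series
numPaths = 200; sampleSize = 60; confidence = 0.9; m = 5;
n = size(Res,2);

share = zeros(numPaths,1);   % FEVD of variable 2 due to shock 1 at horizon m
for p = 1:numPaths
    % Step 1: resample rows of the residuals with replacement.
    Eb = Res(randi(size(Res,1),sampleSize,1),:);
    % Step 2: center the bootstrapped path.
    Eb = Eb - mean(Eb,1);
    % Step 3: filter through the VAR(1), starting from a zero presample.
    Y = zeros(sampleSize,n); prev = zeros(n,1);
    for t = 1:sampleSize
        prev   = A*prev + Eb(t,:)';
        Y(t,:) = prev';
    end
    % Step 4a: refit a VAR(1) by least squares.
    B    = Y(1:end-1,:)\Y(2:end,:);        % Y(t,:) is approximately Y(t-1,:)*B
    Ahat = B';
    U    = Y(2:end,:) - Y(1:end-1,:)*B;    % fitted residuals
    Shat = (U'*U)/size(U,1);
    % Step 4b: orthogonalized FEVD entry from the refitted model.
    P = chol(Shat,'lower'); Omega = eye(n); num = 0; den = 0;
    for t = 0:m-1
        C     = Omega*P;
        num   = num + C(2,1)^2;            % contribution of shock 1
        den   = den + sum(C(2,:).^2);      % MSE of variable 2
        Omega = Ahat*Omega;
    end
    share(p) = num/den;
end

% Step 4c: empirical percentile bounds across paths (equal-tailed assumption).
alpha    = 1 - confidence;
sorted   = sort(share);
lowerBnd = sorted(max(1,floor(alpha/2*numPaths)));
upperBnd = sorted(min(numPaths,ceil((1 - alpha/2)*numPaths)));
```

Replacing step 1 and step 2 with draws of Gaussian innovations from the model covariance would turn the same loop into a sketch of the Monte Carlo variant.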
Version History Introduced in R2019a
References
[1] Hamilton, James D. Time Series Analysis. Princeton, NJ: Princeton University Press, 1994.
[2] Lütkepohl, H. "Asymptotic Distributions of Impulse Response Functions and Forecast Error Variance Decompositions of Vector Autoregressive Models." Review of Economics and Statistics. Vol. 72, 1990, pp. 116–125.
[3] Lütkepohl, Helmut. New Introduction to Multiple Time Series Analysis. New York, NY: Springer-Verlag, 2007.
[4] Pesaran, H. H., and Y. Shin. "Generalized Impulse Response Analysis in Linear Multivariate Models." Economics Letters. Vol. 58, 1998, pp. 17–29.
See Also Objects varm Functions estimate | simulate | filter | irf | armafevd
fevd Generate vector error-correction (VEC) model forecast error variance decomposition (FEVD)
Syntax
Decomposition = fevd(Mdl)
Decomposition = fevd(Mdl,Name=Value)
[Decomposition,Lower,Upper] = fevd( ___ )
Tbl = fevd( ___ )
Description The fevd function returns the forecast error decomposition on page 12-856 (FEVD) of the variables in a VEC(p – 1) model on page 12-857 attributable to shocks to each response variable in the system. A fully specified vecm model object characterizes the VEC model. The FEVD provides information about the relative importance of each innovation in affecting the forecast error variance of all response variables in the system. In contrast, the impulse response function (IRF) traces the effects of an innovation shock to one variable on the response of all variables in the system. To estimate the IRF of a VEC model characterized by a vecm model object, see irf. You can supply optional data, such as a presample, as a numeric array, table, or timetable. However, all specified input data must be the same data type. When the input model is estimated (returned by estimate), supply the same data type as the data used to estimate the model. The data type of the outputs matches the data type of the specified input data. Decomposition = fevd(Mdl) returns a numeric array containing the orthogonalized FEVDs of the response variables that compose the VEC(p – 1) model Mdl characterized by a fully specified vecm model object. fevd shocks variables at time 0, and returns the FEVD for times 1 through 20. If Mdl is an estimated model (returned by estimate) fit to a numeric matrix of input response data, this syntax applies. Decomposition = fevd(Mdl,Name=Value) uses additional options specified by one or more name-value arguments. fevd returns numeric arrays when all optional input data are numeric arrays. For example, fevd(Mdl,NumObs=10,Method="generalized") specifies estimating a generalized FEVD for periods 1 through 10. If Mdl is an estimated model fit to a numeric matrix of input response data, this syntax applies. 
[Decomposition,Lower,Upper] = fevd( ___ ) returns numeric arrays of lower Lower and upper Upper 95% confidence bounds for confidence intervals on the true FEVD, for each period and variable in the FEVD, using any input argument combination in the previous syntaxes. By default, fevd estimates confidence bounds by conducting Monte Carlo simulation. If Mdl is an estimated model fit to a numeric matrix of input response data, this syntax applies.
If Mdl is a custom vecm model object (an object not returned by estimate or modified after estimation), fevd can require a sample size for the simulation SampleSize or presample responses Y0. Tbl = fevd( ___ ) returns the table or timetable Tbl containing the FEVDs and, optionally, corresponding 95% confidence bounds, of the response variables that compose the VEC(p – 1) model Mdl. The FEVD of the corresponding response is a variable in Tbl containing a matrix with columns corresponding to the variables in the system shocked at time 0. If you set at least one name-value argument that controls the 95% confidence bounds on the FEVD, Tbl also contains a variable for each of the lower and upper bounds. For example, Tbl contains confidence bounds when you set the NumPaths name-value argument. If Mdl is an estimated model fit to a table or timetable of input response data, this syntax applies.
Examples Specify Data in Numeric Matrix When Plotting FEVD Fit a 4-D VEC(2) model with two cointegrating relations to Danish money and income rate series data in a numeric matrix. Then, estimate and plot the orthogonalized FEVD from the estimated model. Load the Danish money and income data set. load Data_JDanish
For more details on the data set, enter Description at the command line. Create a vecm model object that represents a 4-D VEC(2) model with two cointegrating relations. Specify the variable names. Mdl = vecm(4,2,2); Mdl.SeriesNames = DataTable.Properties.VariableNames;
Mdl is a vecm model object specifying the structure of a 4-D VEC(2) model; it is a template for estimation. Fit the VEC(2) model to the numeric matrix of time series data Data. Mdl = estimate(Mdl,Data);
Mdl is now a fully specified vecm model object representing an estimated 4-D VEC(2) model. Estimate the orthogonalized FEVD from the estimated VEC(2) model. Decomposition = fevd(Mdl);
Decomposition is a 20-by-4-by-4 array representing the FEVD of Mdl. Rows correspond to consecutive time points from time 1 to 20, columns correspond to variables receiving a one-standard-deviation innovation shock at time 0, and pages correspond to variables whose forecast error variance fevd decomposes. Mdl.SeriesNames specifies the variable order. By default, fevd uses the H1 Johansen form, which is the same default form that estimate uses.
Because Decomposition represents an orthogonalized FEVD, rows should sum to 1. This characteristic illustrates that orthogonalized FEVDs represent proportions of variance contributions. Confirm that all rows of Decomposition sum to 1.
rowsums = sum(Decomposition,2);
sum((rowsums - 1).^2 > eps)
ans =
ans(:,:,1) = 0
ans(:,:,2) = 0
ans(:,:,3) = 0
ans(:,:,4) = 0
Row sums among the pages are close to 1.
Display the contributions to the forecast error variance of the bond rate when real income is shocked at time 0.
Decomposition(:,2,3)
ans = 20×1
    0.0694
    0.1744
    0.1981
    0.2182
    0.2329
    0.2434
    0.2490
    0.2522
    0.2541
    0.2559
    ⋮
The armafevd function plots the FEVD of VAR models characterized by AR coefficient matrices. Plot the FEVD of a VEC model by:
1. Expressing the VEC(2) model as a VAR(3) model by passing Mdl to varm
2. Passing the VAR model AR coefficients and innovations covariance matrix to armafevd
Plot the VEC(2) model FEVD for 40 periods.
VARMdl = varm(Mdl);
armafevd(VARMdl.AR,[],InnovCov=VARMdl.Covariance, ...
    NumObs=40);
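The VEC(2)-to-VAR(3) step relies on the standard textbook correspondence between the two forms. The following base-MATLAB sketch applies that algebra to hypothetical 2-D coefficient matrices (Alpha, Beta, Gamma1, and Gamma2 are made-up values, and deterministic terms are omitted); it illustrates the relationship only and is not the implementation of varm.

```matlab
% Hypothetical VEC(2) without deterministic terms:
%   dy_t = Pi*y_{t-1} + Gamma1*dy_{t-1} + Gamma2*dy_{t-2} + e_t
n      = 2;
Alpha  = [-0.2; 0.1];            % adjustment speeds (n-by-r, r = 1)
Beta   = [1; -1];                % cointegrating vector (n-by-r)
Pi     = Alpha*Beta';            % impact matrix
Gamma1 = [0.3 0.0; 0.1 0.2];     % short-run coefficients
Gamma2 = [0.1 0.0; 0.0 0.1];

% Equivalent VAR(3): y_t = Phi1*y_{t-1} + Phi2*y_{t-2} + Phi3*y_{t-3} + e_t
Phi1 = eye(n) + Pi + Gamma1;
Phi2 = Gamma2 - Gamma1;
Phi3 = -Gamma2;

% Check the two forms on an arbitrary history of levels (rows are y_{t-1}, y_{t-2}, y_{t-3}).
yLag = [1.0 0.5; 0.8 0.4; 0.9 0.7];
vecStep = yLag(1,:)' + Pi*yLag(1,:)' ...
    + Gamma1*(yLag(1,:) - yLag(2,:))' + Gamma2*(yLag(2,:) - yLag(3,:))';
varStep = Phi1*yLag(1,:)' + Phi2*yLag(2,:)' + Phi3*yLag(3,:)';
disp(norm(vecStep - varStep))    % the two one-step predictions coincide
```

Because the innovations are unchanged by this rewriting, the innovations covariance matrix carries over to the VAR form as is.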
Each plot shows the four FEVDs of a variable when all other variables are shocked at time 0. Mdl.SeriesNames specifies the variable order.
Estimate Generalized FEVD of VEC Model Consider the 4-D VEC(2) model with two cointegrating relations in “Specify Data in Numeric Matrix When Plotting FEVD” on page 12-836. Estimate the generalized FEVD of the system for 100 periods. Load the Danish money and income data set, then estimate the VEC(2) model. load Data_JDanish Mdl = vecm(4,2,2); Mdl.SeriesNames = DataTable.Properties.VariableNames; Mdl = estimate(Mdl,DataTable.Series);
Estimate the generalized FEVD from the estimated VEC(2) model over a forecast horizon with length 100. Decomposition = fevd(Mdl,Method="generalized",NumObs=100);
Decomposition is a 100-by-4-by-4 array representing the generalized FEVD of Mdl. Plot the generalized FEVD of the bond rate when real income is shocked at time 0.
figure;
plot(1:100,Decomposition(:,2,3))
title("FEVD of IB When Y Is Shocked")
xlabel("Forecast Horizon")
ylabel("Variance Contribution")
grid on
When real income is shocked, its contribution to the forecast error variance of the bond rate settles at approximately 0.08.
Specify Data in Timetables When Computing FEVD and Confidence Intervals Fit a 4-D VEC(2) model with two cointegrating relations to Danish money and income rate series data in a timetable. Then, estimate and plot the orthogonalized FEVD and corresponding confidence intervals from the estimated model. Load the Danish money and income data set. load Data_JDanish
Create a vecm model object that represents a 4-D VEC(2) model with two cointegrating relations. Specify the variable names. Mdl = vecm(4,2,2); Mdl.SeriesNames = DataTable.Properties.VariableNames;
Mdl is a vecm model object specifying the structure of a 4-D VEC(2) model; it is a template for estimation. Fit the VEC(2) model to the data set. EstMdl = estimate(Mdl,DataTimeTable);
EstMdl is a fully specified vecm model object representing an estimated 4-D VEC(2) model. Estimate the orthogonalized FEVD and corresponding 95% confidence intervals from the estimated VEC(2) model. To return confidence intervals, you must set a name-value argument that controls confidence intervals, for example, Confidence. Set Confidence to 0.95.
rng(1); % For reproducibility
Tbl = fevd(EstMdl,Confidence=0.95);
Tbl.Time(1)
ans = datetime
   01-Oct-1974
size(Tbl)
ans = 1×2
    20    12
Tbl is a timetable with 20 rows, representing the periods in the FEVD, and 12 variables. Each variable is a 20-by-4 matrix of the FEVD or confidence bound associated with a variable in the model EstMdl. For example, Tbl.M2_FEVD(:,2) is the FEVD of M2 resulting from a 1-standard-deviation shock on 01-Jul-1974 (period 0) to Mdl.SeriesNames(2), which is the variable Y. [Tbl.M2_FEVD_LowerBound(:,2),Tbl.M2_FEVD_UpperBound(:,2)] are the corresponding 95% confidence intervals. By default, fevd uses the H1 Johansen form, which is the same default form that estimate uses.
Plot the FEVD of M2 and its 95% confidence interval resulting from a 1-standard-deviation shock on 01-Jul-1974 (period 0) to Mdl.SeriesNames(2), which is the variable Y.
idxM2 = startsWith(Tbl.Properties.VariableNames,"M2");
M2FEVD = Tbl(:,idxM2);
shockIdx = 2;
figure
hold on
plot(M2FEVD.Time,M2FEVD.M2_FEVD(:,shockIdx),"-o")
plot(M2FEVD.Time,[M2FEVD.M2_FEVD_LowerBound(:,shockIdx) ...
    M2FEVD.M2_FEVD_UpperBound(:,shockIdx)],"-o",Color="r")
legend("FEVD","95% confidence interval")
title('M2 FEVD, Shock to Y')
hold off
Monte Carlo Confidence Intervals on True FEVD Consider the 4-D VEC(2) model with two cointegrating relations in “Specify Data in Numeric Matrix When Plotting FEVD” on page 12-836. Estimate and plot its orthogonalized FEVD and 95% Monte Carlo confidence intervals on the true FEVD. Load the Danish money and income data set, then estimate the VEC(2) model. load Data_JDanish Mdl = vecm(4,2,2); Mdl.SeriesNames = DataTable.Properties.VariableNames; Mdl = estimate(Mdl,DataTable.Series);
Estimate the FEVD and corresponding 95% Monte Carlo confidence intervals from the estimated VEC(2) model. rng(1); % For reproducibility [Decomposition,Lower,Upper] = fevd(Mdl);
Decomposition, Lower, and Upper are 20-by-4-by-4 arrays representing the orthogonalized FEVD of Mdl and corresponding lower and upper bounds of the confidence intervals. For all arrays, rows correspond to consecutive time points from time 1 to 20, columns correspond to variables receiving a one-standard-deviation innovation shock at time 0, and pages correspond to the variables whose forecast error variance fevd decomposes. Mdl.SeriesNames specifies the variable order.
Plot the orthogonalized FEVD with its confidence bounds of the bond rate when real income is shocked at time 0.

    fevdshock2resp3 = Decomposition(:,2,3);
    FEVDCIShock2Resp3 = [Lower(:,2,3) Upper(:,2,3)];

    figure;
    h1 = plot(1:20,fevdshock2resp3);
    hold on
    h2 = plot(1:20,FEVDCIShock2Resp3,"r--");
    legend([h1 h2(1)],["FEVD" "95% Confidence Interval"],...
        Location="best")
    xlabel("Forecast Horizon");
    ylabel("Variance Contribution");
    title("FEVD of IB When Y Is Shocked");
    grid on
    hold off
In the long run, and when real income is shocked, the proportion of forecast error variance of the bond rate settles between approximately 0 and 0.7 with 95% confidence.
Bootstrap Confidence Intervals on True FEVD

Consider the 4-D VEC(2) model with two cointegrating relations in “Specify Data in Numeric Matrix When Plotting FEVD” on page 12-836. Estimate and plot its orthogonalized FEVD and 90% bootstrap confidence intervals on the true FEVD.

Load the Danish money and income data set, then estimate the VEC(2) model. Return the residuals from model estimation.

    load Data_JDanish
    Mdl = vecm(4,2,2);
    Mdl.SeriesNames = DataTable.Properties.VariableNames;
    [Mdl,~,~,Res] = estimate(Mdl,DataTable.Series);
    T = size(DataTable,1) % Total sample size

    T = 55

    n = size(Res,1) % Effective sample size

    n = 52
Res is a 52-by-4 array of residuals. Columns correspond to the variables in Mdl.SeriesNames. The estimate function requires Mdl.P = 3 observations to initialize a VEC(2) model for estimation. Because presample data (Y0) is unspecified, estimate takes the first three observations in the specified response data to initialize the model. Therefore, the resulting effective sample size is T – Mdl.P = 52, and rows of Res correspond to the observation indices 4 through T.

Estimate the orthogonalized FEVD and corresponding 90% bootstrap confidence intervals from the estimated VEC(2) model. Draw 500 paths of length n from the series of residuals.

    rng(1); % For reproducibility
    [Decomposition,Lower,Upper] = fevd(Mdl,E=Res,NumPaths=500, ...
        Confidence=0.9);
Plot the orthogonalized FEVD with its confidence bounds of the bond rate when real income is shocked at time 0. Rows of the output arrays correspond to periods 1 through 20.

    fevdshock2resp3 = Decomposition(:,2,3);
    FEVDCIShock2Resp3 = [Lower(:,2,3) Upper(:,2,3)];

    figure;
    h1 = plot(1:20,fevdshock2resp3);
    hold on
    h2 = plot(1:20,FEVDCIShock2Resp3,'r--');
    legend([h1 h2(1)],["FEVD" "90% Confidence Interval"], ...
        Location="best")
    xlabel("Forecast Horizon");
    ylabel("Variance Contribution");
    title("FEVD of IB When Y Is Shocked");
    grid on
    hold off
In the long run, and when real income is shocked, the proportion of forecast error variance of the bond rate settles between approximately 0 and 0.6 with 90% confidence.
Input Arguments

Mdl — VEC model
vecm model object

VEC model, specified as a vecm model object created by vecm or estimate. Mdl must be fully specified.

If Mdl is an estimated model (returned by estimate), you must supply any optional data using the same data type as the input response data to which the model is fit.

If Mdl is a custom vecm model object (an object not returned by estimate or modified after estimation), fevd can require a sample size for the simulation (SampleSize) or presample responses (Y0).

Name-Value Pair Arguments

Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter.

Before R2021a, use commas to separate each name and value, and enclose Name in quotes.

Example: fevd(Mdl,NumObs=10,Method="generalized",E=Res) specifies estimating a generalized FEVD for periods 1 through 10, and bootstraps the residuals in the numeric array Res to compute the 95% confidence bounds.

Options for All FEVDs
NumObs — Number of periods
20 (default) | positive integer

Number of periods for which fevd computes the FEVD (the forecast horizon), specified as a positive integer. NumObs specifies the number of observations to include in the FEVD (the number of rows in Decomposition).
Example: NumObs=10 specifies estimation of the FEVD for times 1 through 10.
Data Types: double

Method — FEVD computation method
"orthogonalized" (default) | "generalized" | character vector

FEVD computation method, specified as a value in this table.

Value               Description
"orthogonalized"    Compute variance decompositions using orthogonalized, one-standard-deviation innovation shocks. fevd uses the Cholesky factorization of Mdl.Covariance for orthogonalization.
"generalized"       Compute variance decompositions using one-standard-deviation innovation shocks.
Example: Method="generalized"
Data Types: char | string

Model — Johansen form of VEC(p – 1) model deterministic terms
"H1" (default) | "H2" | "H1*" | "H*" | "H" | character vector

Johansen form of the VEC(p – 1) model deterministic terms [2], specified as a value in this table (for variable definitions, see “Vector Error-Correction Model” on page 12-1455).

Value    Error-Correction Term                          Description
"H2"     $AB'y_{t-1}$                                   No intercepts or trends are present in the cointegrating relations, and no deterministic trends are present in the levels of the data. Specify this model only when all response series have a mean of zero.
"H1*"    $A(B'y_{t-1} + c_0)$                           Intercepts are present in the cointegrating relations, and no deterministic trends are present in the levels of the data.
"H1"     $A(B'y_{t-1} + c_0) + c_1$                     Intercepts are present in the cointegrating relations, and deterministic linear trends are present in the levels of the data.
"H*"     $A(B'y_{t-1} + c_0 + d_0 t) + c_1$             Intercepts and linear trends are present in the cointegrating relations, and deterministic linear trends are present in the levels of the data.
"H"      $A(B'y_{t-1} + c_0 + d_0 t) + c_1 + d_1 t$     Intercepts and linear trends are present in the cointegrating relations, and deterministic quadratic trends are present in the levels of the data. If quadratic trends are not present in the data, this model can produce good in-sample fits but poor out-of-sample forecasts.
For more details on the Johansen forms, see estimate.
• If Mdl is an estimated vecm model object (an object returned by estimate and unmodified thereafter), the default is the Johansen form used for estimation (see Model).
• Otherwise, the default is "H1".

Tip A best practice is to maintain model consistency during the simulation that estimates confidence bounds. Therefore, if Mdl is an estimated vecm model object (an object returned by estimate and unmodified thereafter), incorporate any constraints imposed during estimation by deferring to the default value of Model.

Example: Model="H1*"
Data Types: string | char

Options for Confidence Bound Estimation
NumPaths — Number of sample paths
100 (default) | positive integer

Number of sample paths (trials) to generate, specified as a positive integer.
Example: NumPaths=1000 generates 1000 sample paths from which the software derives the confidence bounds.
Data Types: double

SampleSize — Number of observations for Monte Carlo simulation or bootstrap per sample path
positive integer

Number of observations for the Monte Carlo simulation or bootstrap per sample path, specified as a positive integer.
• If Mdl is an estimated vecm model object (an object returned by estimate and unmodified thereafter), the default is the sample size of the data to which the model is fit (see summarize).
• Otherwise:
  • If fevd estimates confidence bounds by conducting a Monte Carlo simulation, you must specify SampleSize.
  • If fevd estimates confidence bounds by bootstrapping residuals, the default is the length of the specified series of residuals (size(Res,1), where Res is the array of residuals in E or InSample).
Example: If you specify SampleSize=100 and do not specify the E name-value argument, the software estimates confidence bounds from NumPaths random paths of length 100 from Mdl.
Example: If you specify SampleSize=100,E=Res, the software resamples, with replacement, 100 observations (rows) from Res to form a sample path of innovations to filter through Mdl. The software forms NumPaths random sample paths from which it derives confidence bounds.
Data Types: double

Y0 — Presample response data
numeric matrix

Presample response data that provides initial values for model estimation during the simulation, specified as a numpreobs-by-numseries numeric matrix. Use Y0 only in the following situations:
• You supply other optional data inputs as numeric matrices.
• Mdl is an estimated vecm model object (an object returned by estimate and unmodified thereafter) fit to a numeric matrix of response data.

Each row is a presample observation, and measurements in each row occur simultaneously. The last row contains the latest presample observation. numpreobs is the number of specified presample responses, and it must be at least Mdl.P. If you supply more rows than necessary, fevd uses the latest Mdl.P observations only.

numseries is the dimensionality of the input VEC model, Mdl.NumSeries. Columns must correspond to the response variables in Mdl.SeriesNames.

The following situations determine the default or whether presample response data is required.
• If Mdl is an unmodified estimated model, fevd sets Y0 to the presample response data used for estimation by default (see the Y0 name-value argument of estimate).
• If Mdl is a custom model and you return confidence bounds Lower or Upper, you must specify Y0.
Data Types: double

Presample — Presample data
table | timetable

Presample data that provide initial values for the model Mdl, specified as a table or timetable with numprevars variables and numpreobs rows. Use Presample only in the following situations:
• You supply other optional data inputs as tables or timetables.
• Mdl is an estimated vecm model object (an object returned by estimate and unmodified thereafter) fit to response data in a table or timetable.

Each row is a presample observation, and measurements in each row occur simultaneously. The last row contains the latest presample observation. numpreobs is the number of specified presample responses, and it must be at least Mdl.P. If you supply more rows than necessary, fevd uses the latest Mdl.P observations only.

Each variable is a numpreobs-by-1 numeric vector representing one path. To control presample variable selection, see the optional PresampleResponseVariables name-value argument.

If Presample is a timetable, all the following conditions must be true:
• Presample must represent a sample with a regular datetime time step (see isregular).
• The datetime vector of sample timestamps Presample.Time must be ascending or descending.

If Presample is a table, the following conditions hold:
• The last row contains the latest presample observation.
• Presample.Properties.RowNames must be empty.

The following situations determine the default or whether presample response data is required.
• If Mdl is an unmodified estimated model, fevd sets Presample to the presample response data used for estimation by default (see the Presample name-value argument of estimate).
• If Mdl is a custom model (for example, you modify a model after estimation by using dot notation) and you return confidence bounds in the table or timetable Tbl, you must specify Presample.
PresampleResponseVariables — Variables to select from Presample to use for presample response data
string vector | cell vector of character vectors | vector of integers | logical vector

Variables to select from Presample to use for presample data, specified as one of the following data types:
• String vector or cell vector of character vectors containing numseries variable names in Presample.Properties.VariableNames
• A length numseries vector of unique indices (integers) of variables to select from Presample.Properties.VariableNames
• A length numprevars logical vector, where PresampleResponseVariables(j) = true selects variable j from Presample.Properties.VariableNames, and sum(PresampleResponseVariables) is numseries

PresampleResponseVariables applies only when you specify Presample.

The selected variables must be numeric vectors and cannot contain missing values (NaN).

PresampleResponseVariables does not need to contain the same names as in Mdl.SeriesNames; fevd uses the data in selected variable PresampleResponseVariables(j) as a presample for Mdl.SeriesNames(j).

If the number of variables in Presample matches Mdl.NumSeries, the default specifies all variables in Presample. If the number of variables in Presample exceeds Mdl.NumSeries, the default matches variables in Presample to names in Mdl.SeriesNames.

Example: PresampleResponseVariables=["GDP" "CPI"]
Example: PresampleResponseVariables=[true false true false] or PresampleResponseVariables=[1 3] selects the first and third table variables for presample data.
Data Types: double | logical | char | cell | string

X — Predictor data
numeric matrix

Predictor data xt for estimating the model regression component during the simulation, specified as a numeric matrix containing numpreds columns. Use X only in the following situations:
• You supply other optional data inputs as numeric matrices.
• Mdl is an estimated vecm model object (an object returned by estimate and unmodified thereafter) fit to a numeric matrix of response data.

numpreds is the number of predictor variables (size(Mdl.Beta,2)).

Each row corresponds to an observation, and measurements in each row occur simultaneously. The last row contains the latest observation. X must have at least SampleSize rows. If you supply more rows than necessary, fevd uses only the latest observations. fevd does not use the regression component in the presample period.

Columns correspond to individual predictor variables. All predictor variables are present in the regression component of each response equation.

To maintain model consistency when fevd estimates the confidence bounds, a good practice is to specify predictor data when Mdl has a regression component. If Mdl is an estimated model, specify the predictor data used during model estimation (see the X name-value argument of estimate).

By default, fevd excludes the regression component from confidence bound estimation, regardless of its presence in Mdl.
Data Types: double

E — Series of residuals from which to draw bootstrap samples
numeric matrix

Series of residuals from which to draw bootstrap samples, specified as a numperiods-by-numseries numeric matrix. fevd assumes that E is free of serial correlation. Use E only in the following situations:
• You supply other optional data inputs as numeric matrices.
• Mdl is an estimated vecm model object (an object returned by estimate and unmodified thereafter) fit to a numeric matrix of response data.

Each column is the residual series corresponding to the response series names in Mdl.SeriesNames.

Each row corresponds to a period in the FEVD and the corresponding confidence bounds.

If Mdl is an estimated vecm model object (an object returned by estimate), you can specify E as the inferred residuals from estimation (see the E output argument of estimate or infer).

By default, fevd derives confidence bounds by conducting a Monte Carlo simulation.
Data Types: double
InSample — Time series data
table | timetable

Time series data containing numvars variables, including numseries variables of residuals et to bootstrap or numpreds predictor variables xt for the model regression component, specified as a table or timetable. Use InSample only in the following situations:
• You supply other optional data inputs as tables or timetables.
• Mdl is an estimated vecm model object (an object returned by estimate and unmodified thereafter) fit to response data in a table or timetable.

Each variable is a single path of observations, which fevd applies to all NumPaths sample paths. If you specify Presample, you must specify which variables are residuals and predictors (see the ResidualVariables and PredictorVariables name-value arguments).

Each row is an observation, and measurements in each row occur simultaneously. InSample must have at least SampleSize rows. If you supply more rows than necessary, fevd uses only the latest observations.

If InSample is a timetable, the following conditions apply:
• InSample must represent a sample with a regular datetime time step (see isregular).
• The datetime vector InSample.Time must be strictly ascending or descending.
• Presample must immediately precede InSample, with respect to the sampling frequency.

If InSample is a table, the following conditions hold:
• The last row contains the latest observation.
• InSample.Properties.RowNames must be empty.

By default, fevd derives confidence bounds by conducting a Monte Carlo simulation and does not use the model regression component, regardless of its presence in Mdl.
ResidualVariables — Variables to select from InSample to treat as residuals et for bootstrapping
string vector | cell vector of character vectors | vector of integers | logical vector

Variables to select from InSample to treat as residuals for bootstrapping, specified as one of the following data types:
• String vector or cell vector of character vectors containing numseries variable names in InSample.Properties.VariableNames
• A length numseries vector of unique indices (integers) of variables to select from InSample.Properties.VariableNames
• A length numvars logical vector, where ResidualVariables(j) = true selects variable j from InSample.Properties.VariableNames, and sum(ResidualVariables) is numseries

Regardless, selected residual variable j is the residual series for Mdl.SeriesNames(j).

The selected variables must be numeric vectors and cannot contain missing values (NaN).

By default, fevd derives confidence bounds by conducting a Monte Carlo simulation.

Example: ResidualVariables=["GDP_Residuals" "CPI_Residuals"]
Example: ResidualVariables=[true false true false] or ResidualVariables=[1 3] selects the first and third table variables as the residual variables.
Data Types: double | logical | char | cell | string

PredictorVariables — Variables to select from InSample to treat as exogenous predictor variables xt
string vector | cell vector of character vectors | vector of integers | logical vector

Variables to select from InSample to treat as exogenous predictor variables xt, specified as one of the following data types:
• String vector or cell vector of character vectors containing numpreds variable names in InSample.Properties.VariableNames
• A length numpreds vector of unique indices (integers) of variables to select from InSample.Properties.VariableNames
• A length numvars logical vector, where PredictorVariables(j) = true selects variable j from InSample.Properties.VariableNames, and sum(PredictorVariables) is numpreds

Regardless, selected predictor variable j corresponds to the coefficients Mdl.Beta(:,j).

PredictorVariables applies only when you specify InSample.

The selected variables must be numeric vectors and cannot contain missing values (NaN).

By default, fevd excludes the regression component, regardless of its presence in Mdl.

Example: PredictorVariables=["M1SL" "TB3MS" "UNRATE"]
Example: PredictorVariables=[true false true false] or PredictorVariables=[1 3] selects the first and third table variables as the predictor variables.
Data Types: double | logical | char | cell | string

Confidence — Confidence level
0.95 (default) | numeric scalar in [0,1]

Confidence level for the confidence bounds, specified as a numeric scalar in the interval [0,1].

For each period, randomly drawn confidence intervals cover the true FEVD 100*Confidence% of the time.

The default value is 0.95, which implies that the confidence bounds represent 95% confidence intervals.

Example: Confidence=0.9 specifies 90% confidence intervals.
Data Types: double

Note
• NaN values in Y0, X, and E indicate missing data. fevd removes missing data from these arguments by list-wise deletion. For each argument, if a row contains at least one NaN, then fevd removes the entire row. List-wise deletion reduces the sample size, can create irregular time series, and can cause E and X to be unsynchronized.
• fevd issues an error when any table or timetable input contains missing values.
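The desynchronization that list-wise deletion can cause is easy to see on a small example. The following sketch uses base MATLAB only; the arrays E and X are hypothetical, and the masking code mimics the behavior described in the note rather than reproducing the internal implementation of fevd.

```matlab
% Hypothetical residuals (4-by-2) and predictor data (4-by-1) with missing values.
E = [0.1 0.2; NaN 0.3; 0.4 0.5; 0.6 0.7];
X = [1; 2; NaN; 4];

% Argument-wise list-wise deletion: drop every row that contains a NaN.
Eclean = E(~any(isnan(E),2),:);   % keeps original rows 1, 3, and 4
Xclean = X(~any(isnan(X),2),:);   % keeps original rows 1, 2, and 4

% Row 2 of Eclean (period 3) now sits beside row 2 of Xclean (period 2),
% so the two arrays are no longer synchronized.

% One way to avoid this is to remove incomplete rows jointly beforehand.
keep  = ~any(isnan([E X]),2);     % rows complete in both arrays: 1 and 4
Esync = E(keep,:);
Xsync = X(keep,:);
```

Removing incomplete rows jointly before calling fevd keeps the residual and predictor observations aligned, at the cost of a smaller sample.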
Output Arguments

Decomposition — FEVD
numeric array

FEVD of each response variable, returned as a numobs-by-numseries-by-numseries numeric array. numobs is the value of NumObs. Columns and pages correspond to the response variables in Mdl.SeriesNames.

fevd returns Decomposition only in the following situations:
• You supply optional data inputs as numeric matrices.
• Mdl is an estimated model fit to a numeric matrix of response data.

Decomposition(t,j,k) is the contribution, at period t, to the variance decomposition of variable k attributable to a one-standard-deviation innovation shock to variable j at time 0, for t = 1,2,…,numobs, j = 1,2,...,numseries, and k = 1,2,...,numseries.

Lower — Lower confidence bounds
numeric array

Lower confidence bounds, returned as a numobs-by-numseries-by-numseries numeric array. Elements of Lower correspond to elements of Decomposition.

fevd returns Lower only in the following situations:
• You supply optional data inputs as numeric matrices.
• Mdl is an estimated model fit to a numeric matrix of response data.

Lower(t,j,k) is the lower bound of the 100*Confidence-th percentile interval on the true contribution, at period t, to the variance decomposition of variable k attributable to a one-standard-deviation innovation shock to variable j at time 0.

Upper — Upper confidence bounds
numeric array

Upper confidence bounds, returned as a numobs-by-numseries-by-numseries numeric array. Elements of Upper correspond to elements of Decomposition.

fevd returns Upper only in the following situations:
• You supply optional data inputs as numeric matrices.
• Mdl is an estimated model fit to a numeric matrix of response data.

Upper(t,j,k) is the upper bound of the 100*Confidence-th percentile interval on the true contribution, at period t, to the variance decomposition of variable k attributable to a one-standard-deviation innovation shock to variable j at time 0.

Tbl — FEVD and confidence bounds
table | timetable

FEVD and confidence bounds, returned as a table or timetable with numobs rows. fevd returns Tbl only in the following situations:
• You supply optional data inputs as tables or timetables.
• Mdl is an estimated model object fit to response data in a table or timetable.

Regardless, the data type of Tbl is the same as the data type of specified data.

Tbl contains the following variables:
• The FEVD of each series in yt. Each FEVD variable in Tbl is a numobs-by-numseries numeric matrix, where numobs is the value of NumObs and numseries is the value of Mdl.NumSeries. fevd names the FEVD of response variable ResponseJ in Mdl.SeriesNames ResponseJ_FEVD. For example, if Mdl.SeriesNames(j) is GDP, Tbl contains a variable for the corresponding FEVD with the name GDP_FEVD.
  ResponseJ_FEVD(t,k) is the contribution, at period t, to the variance decomposition of response variable ResponseJ attributable to a one-standard-deviation innovation shock to variable k at time 0, for t = 1,2,…,numobs, J = 1,2,...,numseries, and k = 1,2,...,numseries.
• The lower and upper confidence bounds on the true FEVD of the response series, when you set at least one name-value argument that controls the confidence bounds. Each confidence bound variable in Tbl is a numobs-by-numseries numeric matrix. ResponseJ_FEVD_LowerBound and ResponseJ_FEVD_UpperBound are the names of the lower and upper bound variables, respectively, of the confidence interval on the FEVD of response variable Mdl.SeriesNames(J) = ResponseJ. For example, if Mdl.SeriesNames(j) is GDP, Tbl contains variables for the corresponding lower and upper bounds of the confidence interval with the names GDP_FEVD_LowerBound and GDP_FEVD_UpperBound.
  (ResponseJ_FEVD_LowerBound(t,k), ResponseJ_FEVD_UpperBound(t,k)) is the 100*Confidence% confidence interval (95% by default) on the FEVD of response variable ResponseJ attributable to a one-standard-deviation innovation shock to variable k at time 0, for t = 1,2,…,numobs, J = 1,2,...,numseries, and k = 1,2,...,numseries.
If Tbl is a timetable, the row order of Tbl, either ascending or descending, matches the row order of InSample, when you specify it. If you do not specify InSample and you specify Presample, the row order of Tbl is the same as the row order of Presample.
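The array and table outputs hold the same quantities in different layouts. Comparing the two descriptions above, the table variable ResponseJ_FEVD appears to correspond to page J of Decomposition. The following sketch illustrates that correspondence with base MATLAB only; the placeholder array is random (not output of fevd), and the series names are borrowed from the Danish data example.

```matlab
% Placeholder FEVD array: rows = periods, columns = shocked variable,
% pages = variable whose forecast error variance is decomposed.
rng(1)
numobs = 20;
seriesNames = ["M2" "Y" "IB" "ID"];       % series names from the Danish data example
numseries = numel(seriesNames);
Decomposition = rand(numobs,numseries,numseries);
Decomposition = Decomposition./sum(Decomposition,2);   % make each row sum to one

% Build a table laid out like Tbl: one numobs-by-numseries matrix per response.
Tbl = table();
for J = 1:numseries
    Tbl.(seriesNames(J) + "_FEVD") = Decomposition(:,:,J);
end

% Contribution of a shock to Y (variable 2) to the forecast error variance of M2.
shareYtoM2 = Tbl.M2_FEVD(:,2);            % same values as Decomposition(:,2,1)
```

Under this layout, column k of a table variable plays the role of the column (shock) index j of the array, and the table variable name plays the role of the page (response) index.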
More About

Forecast Error Variance Decomposition

The forecast error variance decomposition (FEVD) of a multivariate, dynamic system shows the relative importance of a shock to each innovation in affecting the forecast error variance of all variables in the system.

Consider a numseries-D VEC(p – 1) model on page 12-857 for the multivariate response variable $y_t$. In lag operator notation, the equivalent VAR(p) representation of a VEC(p – 1) model is:

$$\Gamma(L)y_t = c + dt + \beta x_t + \varepsilon_t,$$

where $\Gamma(L) = I - \Gamma_1 L - \Gamma_2 L^2 - \dots - \Gamma_p L^p$ and $I$ is the numseries-by-numseries identity matrix.

In lag operator notation, the infinite lag MA representation of $y_t$ is:

$$y_t = \Gamma^{-1}(L)\left(c + \beta x_t + dt\right) + \Gamma^{-1}(L)\varepsilon_t = \Omega(L)\left(c + \beta x_t + dt\right) + \Omega(L)\varepsilon_t.$$

The general form of the FEVD of $y_{kt}$ (variable k) m periods into the future, attributable to a one-standard-deviation innovation shock to $y_{jt}$, is

$$\gamma_{mjk} = \frac{\sum_{t=0}^{m-1}\left(e_k' C_t e_j\right)^2}{\sum_{t=0}^{m-1} e_k' \Omega_t \Sigma \Omega_t' e_k}.$$
• $e_j$ is a selection vector of length numseries containing a 1 in element j and zeros elsewhere.
• For orthogonalized FEVDs, $C_t = \Omega_t P$, where $P$ is the lower triangular factor in the Cholesky factorization of $\Sigma$.
• For generalized FEVDs, $C_t = \sigma_j^{-1}\Omega_t\Sigma$, where $\sigma_j$ is the standard deviation of innovation j.
• The numerator is the contribution of an innovation shock to variable j to the forecast error variance of the m-step-ahead forecast of variable k. The denominator is the mean square error (MSE) of the m-step-ahead forecast of variable k [4].

Vector Error-Correction Model

A vector error-correction (VEC) model is a multivariate, stochastic time series model consisting of a system of m = numseries equations of m distinct, differenced response variables. Equations in the system can include an error-correction term, which is a linear function of the responses in levels used to stabilize the system. The cointegrating rank r is the number of cointegrating relations that exist in the system.

Each response equation can include an autoregressive polynomial composed of first differences of the response series (short-run polynomial of degree p – 1), a constant, a time trend, exogenous predictor variables, and a constant and time trend in the error-correction term.

A VEC(p – 1) model in difference-equation notation and in reduced form can be expressed in two ways:
• This equation is the component form of a VEC model, where the cointegration adjustment speeds and cointegration matrix are explicit, whereas the impact matrix is implied.

$$\Delta y_t = A\left(B'y_{t-1} + c_0 + d_0 t\right) + c_1 + d_1 t + \Phi_1\Delta y_{t-1} + \dots + \Phi_{p-1}\Delta y_{t-(p-1)} + \beta x_t + \varepsilon_t$$
$$= c + dt + AB'y_{t-1} + \Phi_1\Delta y_{t-1} + \dots + \Phi_{p-1}\Delta y_{t-(p-1)} + \beta x_t + \varepsilon_t.$$

  The cointegrating relations are $B'y_{t-1} + c_0 + d_0 t$ and the error-correction term is $A(B'y_{t-1} + c_0 + d_0 t)$.
• This equation is the impact form of a VEC model, where the impact matrix is explicit, whereas the cointegration adjustment speeds and cointegration matrix are implied.

$$\Delta y_t = \Pi y_{t-1} + A\left(c_0 + d_0 t\right) + c_1 + d_1 t + \Phi_1\Delta y_{t-1} + \dots + \Phi_{p-1}\Delta y_{t-(p-1)} + \beta x_t + \varepsilon_t$$
$$= c + dt + \Pi y_{t-1} + \Phi_1\Delta y_{t-1} + \dots + \Phi_{p-1}\Delta y_{t-(p-1)} + \beta x_t + \varepsilon_t.$$

In the equations:
• $y_t$ is an m-by-1 vector of values corresponding to m response variables at time t, where t = 1,...,T.
• $\Delta y_t = y_t - y_{t-1}$. The structural coefficient is the identity matrix.
• r is the number of cointegrating relations and, in general, 0 < r < m.
• $A$ is an m-by-r matrix of adjustment speeds.
• $B$ is an m-by-r cointegration matrix.
• $\Pi$ is an m-by-m impact matrix with a rank of r.
• $c_0$ is an r-by-1 vector of constants (intercepts) in the cointegrating relations.
• $d_0$ is an r-by-1 vector of linear time trends in the cointegrating relations.
• $c_1$ is an m-by-1 vector of constants (deterministic linear trends in $y_t$).
• $d_1$ is an m-by-1 vector of linear time-trend values (deterministic quadratic trends in $y_t$).
• $c = Ac_0 + c_1$ and is the overall constant.
• $d = Ad_0 + d_1$ and is the overall time-trend coefficient.
• $\Phi_j$ is an m-by-m matrix of short-run coefficients, where j = 1,...,p – 1 and $\Phi_{p-1}$ is not a matrix containing only zeros.
• $x_t$ is a k-by-1 vector of values corresponding to k exogenous predictor variables.
• $\beta$ is an m-by-k matrix of regression coefficients.
• $\varepsilon_t$ is an m-by-1 vector of random Gaussian innovations, each with a mean of 0 and collectively an m-by-m covariance matrix $\Sigma$. For t ≠ s, $\varepsilon_t$ and $\varepsilon_s$ are independent.

Condensed and in lag operator notation, the system is

$$\Phi(L)(1 - L)y_t = A\left(B'y_{t-1} + c_0 + d_0 t\right) + c_1 + d_1 t + \beta x_t + \varepsilon_t = c + dt + AB'y_{t-1} + \beta x_t + \varepsilon_t,$$

where $\Phi(L) = I - \Phi_1 L - \Phi_2 L^2 - \dots - \Phi_{p-1}L^{p-1}$, $I$ is the m-by-m identity matrix, and $Ly_t = y_{t-1}$.

If m = r, then the VEC model is a stable VAR(p) model in the levels of the responses. If r = 0, then the error-correction term is a matrix of zeros, and the VEC(p – 1) model is a stable VAR(p – 1) model in the first differences of the responses.
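The FEVD formula can be evaluated by hand for a small model. The following sketch uses base MATLAB only and is not the toolbox implementation. All parameter values are hypothetical. It assumes the standard conversion of a VEC(1) model in impact form to a VAR(2) model in levels ($\Gamma_1 = I + \Pi + \Phi_1$, $\Gamma_2 = -\Phi_1$) and the usual recursion for the MA coefficients ($\Omega_0 = I$, $\Omega_t = \sum_i \Gamma_i\Omega_{t-i}$), then accumulates the numerator and denominator of the FEVD for both computation methods.

```matlab
% Hypothetical 2-D VEC(1) parameters (illustrative values, not estimates).
A     = [-0.2; 0.1];            % adjustment speeds (m-by-r)
B     = [1; -1];                % cointegration matrix (m-by-r)
Phi1  = [0.3 0.1; 0.0 0.2];     % short-run coefficient matrix
Sigma = [1.0 0.4; 0.4 0.5];     % innovations covariance
m     = size(Sigma,1);

% Equivalent VAR(2) in levels: Gamma1 = I + Pi + Phi1, Gamma2 = -Phi1.
Impact = A*B';                  % impact matrix Pi
Gamma  = {eye(m) + Impact + Phi1, -Phi1};
p      = numel(Gamma);

% MA coefficients by recursion: Omega_0 = I, Omega_t = sum_i Gamma_i*Omega_(t-i).
numObs = 10;
Omega  = cell(numObs,1);        % Omega{t+1} stores Omega_t
Omega{1} = eye(m);
for t = 1:numObs-1
    Omega{t+1} = zeros(m);
    for i = 1:min(t,p)
        Omega{t+1} = Omega{t+1} + Gamma{i}*Omega{t-i+1};
    end
end

% Accumulate numerators and MSE denominators over horizons.
P     = chol(Sigma,"lower");    % Sigma = P*P'
sigma = sqrt(diag(Sigma));      % innovation standard deviations
orthFEVD = zeros(numObs,m,m);   % (horizon, shocked variable j, response k)
genFEVD  = zeros(numObs,m,m);
numOrth  = zeros(m);  numGen = zeros(m);  mse = zeros(1,m);
for h = 1:numObs
    Ot      = Omega{h};                              % Omega_(h-1)
    numOrth = numOrth + ((Ot*P).^2)';                % element (j,k): (e_k'*Omega_t*P*e_j)^2
    numGen  = numGen  + (((Ot*Sigma)./sigma').^2)';  % column j scaled by 1/sigma_j
    mse     = mse + diag(Ot*Sigma*Ot')';             % MSE of each response
    orthFEVD(h,:,:) = numOrth./mse;                  % divide column k by MSE of response k
    genFEVD(h,:,:)  = numGen./mse;
end
```

In this sketch, sum(orthFEVD,2) equals one at every horizon, whereas sum(genFEVD,2) does not because Sigma is not diagonal. The slices orthFEVD(:,1,:) and genFEVD(:,1,:) agree, because the first column of the Cholesky factor equals the first column of Sigma divided by the first innovation standard deviation.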
Algorithms

• If Method is "orthogonalized", then fevd orthogonalizes the innovation shocks by applying the Cholesky factorization of the model covariance matrix Mdl.Covariance. The covariance of the orthogonalized innovation shocks is the identity matrix, and the FEVD of each variable sums to one, that is, the sum along any row of Decomposition or rows associated with FEVD variables in Tbl is one. Therefore, the orthogonalized FEVD represents the proportion of forecast error variance attributable to various shocks in the system. However, the orthogonalized FEVD generally depends on the order of the variables.
  If Method is "generalized", then the resulting FEVD is invariant to the order of the variables, and is not based on an orthogonal transformation. Also, the resulting FEVD sums to one for a particular variable only when Mdl.Covariance is diagonal [5]. Therefore, the generalized FEVD represents the contribution to the forecast error variance of equation-wise shocks to the response variables in the model.
• If Mdl.Covariance is a diagonal matrix, then the resulting generalized and orthogonalized FEVDs are identical. Otherwise, the resulting generalized and orthogonalized FEVDs are identical only when the first variable in Mdl.SeriesNames shocks all variables (for example, all else being the same, both methods yield the same value of Decomposition(:,1,:)).
• The predictor data in X or InSample represents a single path of exogenous multivariate time series. If you specify X or InSample and the model Mdl has a regression component (Mdl.Beta is not an empty array), fevd applies the same exogenous data to all paths used for confidence interval estimation.
• fevd conducts a simulation to estimate the confidence bounds Lower and Upper or associated variables in Tbl.
  • If you do not specify residuals by supplying E or using InSample, fevd conducts a Monte Carlo simulation by following this procedure:
    1. Simulate NumPaths response paths of length SampleSize from Mdl.
    2. Fit NumPaths models that have the same structure as Mdl to the simulated response paths. If Mdl contains a regression component and you specify predictor data by supplying X or using InSample, fevd fits the NumPaths models to the simulated response paths and the same predictor data (the same predictor data applies to all paths).
    3. Estimate NumPaths FEVDs from the NumPaths estimated models.
    4. For each time point t = 1,…,NumObs, estimate the confidence intervals by computing 1 – Confidence and Confidence quantiles (the lower and upper bounds, respectively).
• Otherwise, fevd conducts a nonparametric bootstrap by following this procedure:
1. Resample, with replacement, SampleSize residuals from E or InSample. Perform this step NumPaths times to obtain NumPaths paths.
2. Center each path of bootstrapped residuals.
3. Filter each path of centered, bootstrapped residuals through Mdl to obtain NumPaths bootstrapped response paths of length SampleSize.
4. Complete steps 2 through 4 of the Monte Carlo simulation, but replace the simulated response paths with the bootstrapped response paths.
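The resampling and centering steps of the bootstrap, and the quantile step that closes both procedures, can be sketched in base MATLAB. This is a minimal illustration under stated assumptions, not the internal implementation of fevd: the residual matrix E is simulated here, the variable names are hypothetical, the statistic is a toy stand-in for an FEVD value, and the empirical quantile rule (sort and index) is one simple convention.

```matlab
rng(1);                                    % for reproducibility
E = randn(120,2);                          % hypothetical residuals: 120 observations, 2 series
sampleSize = size(E,1);
numPaths = 200;
EBoot = zeros(sampleSize,size(E,2),numPaths);
for j = 1:numPaths
    idx = randi(size(E,1),sampleSize,1);   % bootstrap step 1: draw rows with replacement
    path = E(idx,:);
    EBoot(:,:,j) = path - mean(path,1);    % bootstrap step 2: center each path
end

% Quantile step, applied to numPaths replicates of a toy statistic
% (the variance of the first series in each path).
confidence = 0.95;
stat = squeeze(var(EBoot(:,1,:),0,1));     % numPaths-by-1 vector of replicates
sorted = sort(stat);
lower = sorted(max(1,floor((1 - confidence)*numPaths)));
upper = sorted(ceil(confidence*numPaths));
```

In the actual procedure, each centered path is filtered through Mdl and a model is refit before any statistic is computed; the sketch omits those steps because they require a fitted model.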
Version History Introduced in R2019a
References
[1] Hamilton, James D. Time Series Analysis. Princeton, NJ: Princeton University Press, 1994.
[2] Johansen, S. Likelihood-Based Inference in Cointegrated Vector Autoregressive Models. Oxford: Oxford University Press, 1995.
[3] Juselius, K. The Cointegrated VAR Model. Oxford: Oxford University Press, 2006.
[4] Lütkepohl, Helmut. New Introduction to Multiple Time Series Analysis. New York, NY: Springer-Verlag, 2007.
[5] Pesaran, H. H., and Y. Shin. "Generalized Impulse Response Analysis in Linear Multivariate Models." Economics Letters. Vol. 58, 1998, pp. 17–29.
See Also
Objects
vecm
Functions
estimate | simulate | filter | irf | varm
fgls Feasible generalized least squares
Syntax
[coeff,se,EstCoeffCov] = fgls(X,y)
[CoeffTbl,CovTbl] = fgls(Tbl)
[ ___ ] = fgls( ___ ,Name=Value)
[ ___ ] = fgls(ax, ___ ,Plot=plot)
[ ___ ,iterPlots] = fgls( ___ ,Plot=plot)

Description
[coeff,se,EstCoeffCov] = fgls(X,y) returns vectors of coefficient estimates coeff and corresponding standard errors se, and the estimated coefficient covariance matrix EstCoeffCov, from applying feasible generalized least squares on page 12-881 (FGLS) to the multiple linear regression model y = Xβ + ε. y is a vector of response data and X is a matrix of predictor data.

[CoeffTbl,CovTbl] = fgls(Tbl) applies FGLS to the variables in the table or timetable Tbl, and returns FGLS coefficient estimates and standard errors in the table CoeffTbl and the FGLS estimated coefficient covariance matrix in the table CovTbl. The response variable in the regression is the last table variable, and all other variables are the predictor variables. To select a different response variable for the regression, use the ResponseVariable name-value argument. To select different predictor variables, use the PredictorVariables name-value argument.

[ ___ ] = fgls( ___ ,Name=Value) specifies options using one or more name-value arguments in addition to any of the input argument combinations in previous syntaxes. fgls returns the output argument combination for the corresponding input arguments. For example, fgls(Tbl,ResponseVariable="GDP",InnovMdl="HC4",Plot="all") provides coefficient, standard error, and residual mean-squared error (MSE) plots of iterations of FGLS for a regression model with White's robust innovations covariance, and the table variable GDP is the response while all other variables are predictors.

[ ___ ] = fgls(ax, ___ ,Plot=plot) plots on the axes specified in ax instead of the axes of new figures when plot is not "off". ax can precede any of the input argument combinations in the previous syntaxes.

[ ___ ,iterPlots] = fgls( ___ ,Plot=plot) returns handles to plotted graphics objects when plot is not "off". Use elements of iterPlots to modify properties of the plots after you create them.
Examples
Estimate FGLS Coefficients and Uncertainty Measures
Suppose the sensitivity of the US consumer price index (CPI) to changes in the paid compensation of employees (COE) is of interest.
Load the US macroeconomic data set, which contains the timetable of data DataTimeTable. Extract the COE and CPI series from the table.
load Data_USEconModel.mat
COE = DataTimeTable.COE;
CPI = DataTimeTable.CPIAUCSL;
dt = DataTimeTable.Time;

Plot the series.
tiledlayout(2,1)
nexttile
plot(dt,CPI);
title("\bf Consumer Price Index, Q1 in 1947 to Q1 in 2009");
axis tight
nexttile
plot(dt,COE);
title("\bf Compensation Paid to Employees, Q1 in 1947 to Q1 in 2009");
axis tight
The series are nonstationary. Stabilize them by computing their returns.
rCPI = price2ret(CPI);
rCOE = price2ret(COE);

Regress rCPI onto rCOE, including an intercept, to obtain ordinary least squares (OLS) estimates, standard errors, and the estimated coefficient covariance. Generate a lagged residual plot.
Mdl = fitlm(rCOE,rCPI);
clmCoeff = Mdl.Coefficients.Estimate
clmCoeff = 2×1
    0.0033
    0.3513
clmSE = Mdl.Coefficients.SE
clmSE = 2×1
    0.0010
    0.0490
CLMEstCoeffCov = Mdl.CoefficientCovariance
CLMEstCoeffCov = 2×2
    0.0000   -0.0000
   -0.0000    0.0024
figure
plotResiduals(Mdl,"lagged")
The residual plot exhibits an upward trend, which suggests that the innovations comprise an autoregressive process. This violates one of the classical linear model assumptions. Consequently, hypothesis tests based on the regression coefficients are incorrect, even asymptotically.
Estimate the regression coefficients, standard errors, and coefficient covariances using FGLS. By default, fgls includes an intercept in the regression model and imposes an AR(1) model on the innovations.
[coeff,se,EstCoeffCov] = fgls(rCPI,rCOE)
coeff = 2×1
    0.0148
    0.1961
se = 2×1
    0.0012
    0.0685
EstCoeffCov = 2×2
    0.0000   -0.0000
   -0.0000    0.0047

Row 1 of the outputs corresponds to the intercept and row 2 corresponds to the coefficient of rCOE. If the COE series is exogenous with respect to the CPI, then the FGLS estimates coeff are consistent and asymptotically more efficient than the OLS estimates.
Estimate FGLS Coefficients and Uncertainty Measures on Table Data
Load the US macroeconomic data set, which contains the timetable of data DataTimeTable.
load Data_USEconModel

Stabilize all series by computing their returns.
RDT = price2ret(DataTimeTable);

RDT is a timetable of returns of all variables in DataTimeTable. The price2ret function conserves variable names.
Estimate the regression coefficients, standard errors, and the coefficient covariance matrix using FGLS. Specify the response and predictor variable names.
[CoeffTbl,CoeffCovTbl] = fgls(RDT,ResponseVariable="CPIAUCSL",PredictorVariables="COE")
CoeffTbl=2×2 table
               Coeff          SE
             __________    _________
    Const    6.2416e-05    1.336e-05
    COE         0.20562     0.055615

CoeffCovTbl=2×2 table
                Const           COE
             ___________    ___________
    Const     1.7848e-10    -5.6329e-07
    COE      -5.6329e-07       0.003093

When you supply a table or timetable of data, fgls returns tables of estimates.
Specify AR Lags For FGLS Estimation
Suppose the sensitivity of the US consumer price index (CPI) to changes in the paid compensation of employees (COE) is of interest. This example expands on the analysis outlined in the example “Estimate FGLS Coefficients and Uncertainty Measures” on page 12-861.
Load the US macroeconomic data set.
load Data_USEconModel

The series are nonstationary. Stabilize them by applying the log, and then the first difference.
LDT = price2ret(Data);
rCOE = LDT(:,1);
rCPI = LDT(:,2);

Regress rCPI onto rCOE, including an intercept, to obtain OLS estimates. Plot correlograms for the residuals.
Mdl = fitlm(rCOE,rCPI);
u = Mdl.Residuals.Raw;
figure;
subplot(2,1,1)
autocorr(u);
subplot(2,1,2);
parcorr(u);
The correlograms suggest that the innovations have significant AR effects. According to “Box-Jenkins Methodology” on page 3-2, the innovations seem to comprise an AR(3) series.
Estimate the regression coefficients using FGLS. By default, fgls assumes that the innovations are autoregressive. Specify that the innovations are AR(3) by using the ARLags name-value argument, and print the final estimates to the command window by using the Display name-value argument.
fgls(rCPI,rCOE,ARLags=3,Display="final");
OLS Estimates:
       |  Coeff     SE
------------------------
 Const |  0.0122  0.0009
 x1    |  0.4915  0.0686

FGLS Estimates:
       |  Coeff     SE
------------------------
 Const |  0.0148  0.0012
 x1    |  0.1972  0.0684

If the COE rate series is exogenous with respect to the CPI rate, the FGLS estimates are consistent and asymptotically more efficient than the OLS estimates.
Account for Residual Heteroscedasticity Using FGLS Estimation
Model the nominal GNP (GNPN) growth rate, accounting for the effects of the growth rates of the consumer price index CPI, real wages WR, and the money stock MS. Account for classical linear model departures.
Load the Nelson-Plosser data set, which contains the data in the table DataTable. Remove all observations containing at least one missing value.
load Data_NelsonPlosser
DT = rmmissing(DataTable);
T = height(DT);

Plot the series.
predNames = ["CPI" "WR" "MS"];
tiledlayout(2,2)
for j = ["GNPN" predNames]
    nexttile
    plot(DT{:,j});
    xticklabels(DT.Dates)
    title(j);
    axis tight
end
All series appear nonstationary. For each series, compute the returns.
RetDT = price2ret(DT);

RetDT is a table of the returns of the variables in DT. The variable names are conserved.
Regress the GNPN rate onto the CPI, WR, and MS rates. Examine a scatter plot and correlograms of the residuals.
Mdl = fitlm(RetDT,ResponseVar="GNPN",PredictorVar=predNames);
figure
plotResiduals(Mdl,"caseorder");
axis tight

figure
tiledlayout(2,1)
nexttile
autocorr(Mdl.Residuals.Raw);
nexttile
parcorr(Mdl.Residuals.Raw);

The residuals appear to flare in, which is indicative of heteroscedasticity. The correlograms suggest that there is no autocorrelation.
Estimate FGLS coefficients by accounting for the heteroscedasticity of the residuals. Specify that the estimated innovation covariance is diagonal with the squared residuals as weights (that is, White's robust estimator HC0).
fgls(RetDT,ResponseVariable="GNPN",PredictorVariables=predNames, ...
    InnovMdl="HC0",Display="final");
OLS Estimates:
       |  Coeff      SE
-------------------------
 Const | -0.0076   0.0085
 CPI   |  0.9037   0.1544
 WR    |  0.9036   0.1906
 MS    |  0.4285   0.1379

FGLS Estimates:
       |  Coeff      SE
-------------------------
 Const | -0.0102   0.0017
 CPI   |  0.8853   0.0169
 WR    |  0.8897   0.0294
 MS    |  0.4874   0.0291
Estimate FGLS Coefficients of Models Containing ARMA Errors
Create this regression model with ARMA(1,2) errors, where εt is Gaussian with mean 0 and variance 1.
$$y_t = 1 + x_t \begin{bmatrix} 2 \\ 3 \end{bmatrix} + u_t$$
$$u_t = 0.2u_{t-1} + \varepsilon_t - 0.3\varepsilon_{t-1} + 0.1\varepsilon_{t-2}.$$
beta = [2 3];
phi = 0.2;
theta = [-0.3 0.1];
Mdl = regARIMA(AR=phi,MA=theta,Intercept=1, ...
    Beta=beta,Variance=1);

Mdl is a regARIMA model. You can access its properties using dot notation.
Simulate 500 periods of 2-D standard Gaussian values for xt, and then simulate responses using Mdl.
numObs = 500;
rng(1); % For reproducibility
X = randn(numObs,2);
y = simulate(Mdl,numObs,X=X);

fgls supports AR(p) innovation models. You can convert an ARMA model polynomial to an infinite-lag AR model polynomial using arma2ar. By default, arma2ar returns the coefficients for the first 10 terms. After the conversion, determine how many lags of the resulting AR model are practically significant by checking the length of the returned vector of coefficients. Choose the number of terms that exceed 0.00001.
format long
arParams = arma2ar(phi,theta)
arParams = 1×3
  -0.100000000000000   0.070000000000000   0.031000000000000
arLags = sum(abs(arParams) > 0.00001);
format short

Some of the parameters have small magnitude. You might want to reduce the number of lags to include in the innovations model for fgls.
Estimate the coefficients and their standard errors using FGLS and the simulated data. Specify that the innovations comprise an AR(arLags) process.
[coeff,~,EstCoeffCov] = fgls(X,y,InnovMdl="AR",ARLags=arLags)
coeff = 3×1
    1.0372
    2.0366
    2.9918
EstCoeffCov = 3×3
    0.0026   -0.0000    0.0001
   -0.0000    0.0022    0.0000
    0.0001    0.0000    0.0024

The estimated coefficients are close to their true values.
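The ARMA-to-AR conversion in this example can also be checked by hand in base MATLAB. The following sketch expands the ratio of the AR polynomial 1 − 0.2L to the MA polynomial 1 − 0.3L + 0.1L² by passing a unit impulse through the base MATLAB filter function. The variable names ratio and arEquiv are illustrative, and the sign flip assumes the convention in which the AR coefficients appear on the right side of the equation for ut. The first three values work out to −0.1, 0.07, and 0.031, which agree with the arParams values displayed in the example.

```matlab
phi = 0.2;                       % AR coefficient from the example
theta = [-0.3 0.1];              % MA coefficients from the example
numTerms = 10;                   % number of AR terms to compute

% Power-series expansion of (1 - 0.2L)/(1 - 0.3L + 0.1L^2): the response
% of the rational filter to a unit impulse gives the series coefficients.
ratio = filter([1 -phi],[1 theta],[1 zeros(1,numTerms)]);

% Coefficients on u(t-1), u(t-2), ... in the equivalent AR representation.
arEquiv = -ratio(2:end);
disp(arEquiv(1:3))
```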
Plot Iterations of FGLS Estimation
This example expands on the analysis in “Estimate FGLS Coefficients of Models Containing ARMA Errors” on page 12-871.
Create this regression model with ARMA(1,4) errors, where εt is Gaussian with mean 0 and variance 1.
$$y_t = 1 + x_t \begin{bmatrix} 1.5 \\ 2 \end{bmatrix} + u_t$$
$$u_t = 0.9u_{t-1} + \varepsilon_t - 0.4\varepsilon_{t-1} + 0.2\varepsilon_{t-4}.$$
beta = [1.5 2];
phi = 0.9;
theta = [-0.4 0.2];
Mdl = regARIMA(AR=phi,MA=theta,MALags=[1 4],Intercept=1,Beta=beta,Variance=1);

Suppose the distribution of the predictors is
$$x_t \sim N\left(\begin{bmatrix} -1 \\ 1 \end{bmatrix}, \begin{bmatrix} 0.25 & 0 \\ 0 & 1 \end{bmatrix}\right).$$
Simulate 30 periods from xt, and then simulate 30 corresponding responses from the regression model with ARMA errors Mdl.
numObs = 30;
rng(1); % For reproducibility
muX = [-1 1];
sigX = [0.5 1];
X = randn(numObs,numel(beta)).*sigX + muX;
y = simulate(Mdl,numObs,X=X);

Convert the ARMA model polynomial to an infinite-lag AR model polynomial using arma2ar. By default, arma2ar returns the coefficients for the first 10 terms. Find the number of terms that exceed 0.00001.
arParams = arma2ar(phi,theta);
arLags = sum(abs(arParams) > 1e-5);

Estimate the regression coefficients by using eight iterations of FGLS, and specify the number of lags in the AR innovation model (arLags). Also, specify to plot the coefficient estimates and their standard errors for each iteration, and to display the final estimates and the OLS estimates in tabular form.
[coeff,~,EstCoeffCov] = fgls(X,y,InnovMdl="AR",ARLags=arLags, ...
    NumIter=8,Plot=["coeff" "se"],Display="final");
OLS Estimates:
       |  Coeff     SE
------------------------
 Const |  1.7619  0.4514
 x1    |  1.9637  0.3480
 x2    |  1.7242  0.2152

FGLS Estimates:
       |  Coeff     SE
------------------------
 Const |  1.0845  0.6972
 x1    |  1.7020  0.2919
 x2    |  2.0825  0.1603
The algorithm seems to converge after four iterations. The FGLS estimates are closer to the true values than the OLS estimates.
Properties of iterative FGLS estimates in finite samples are difficult to establish. For asymptotic properties, one iteration of FGLS is sufficient, but fgls supports iterative FGLS for experimentation.
If the estimates or standard errors show instability after successive iterations, then the estimated innovations covariance might be ill conditioned. Consider scaling the residuals by using the ResCond name-value argument to improve the conditioning of the estimated innovations covariance.
Input Arguments
X — Predictor data X
numeric matrix
Predictor data X for the multiple linear regression model, specified as a numObs-by-numPreds numeric matrix. Each row represents one of the numObs observations and each column represents one of the numPreds predictor variables.
Data Types: double

y — Response data y
numeric vector
Response data y for the multiple linear regression model, specified as a numObs-by-1 numeric vector. Rows of y and X correspond.
Data Types: double

Tbl — Combined predictor and response data
table | timetable
Combined predictor and response data for the multiple linear regression model, specified as a table or timetable with numObs rows. Each row of Tbl is an observation.
fgls regresses the response variable, which is the last variable in Tbl, on the predictor variables, which are all other variables in Tbl. To select a different response variable for the regression, use the ResponseVariable name-value argument. To select different predictor variables, use the PredictorVariables name-value argument to select numPreds predictors.

ax — Axes on which to plot
vector of Axes objects
Axes on which to plot, specified as a vector of Axes objects with length equal to the number of plots specified by the Plot name-value argument.
By default, fgls creates a separate figure for each plot.

Note NaNs in X, y, or Tbl indicate missing values, and fgls removes observations containing at least one NaN. That is, to remove NaNs in X or y, fgls merges the variables [X y], and then it uses list-wise deletion to remove any row that contains at least one NaN. fgls also removes any row of Tbl containing at least one NaN. Removing NaNs in the data reduces the sample size and can create irregular time series.

Name-Value Arguments
Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter.
Before R2021a, use commas to separate each name and value, and enclose Name in quotes.
Example: fgls(Tbl,ResponseVariable="GDP",InnovMdl="HC4",Plot="all") provides coefficient, standard error, and residual mean-squared error (MSE) plots of iterations of FGLS for a regression model with White's robust innovations covariance, and the table variable GDP is the response while all other variables are predictors.

VarNames — Unique variable names to use in display
string vector | character vector | cell vector of strings | cell vector of character vectors
Unique variable names used in the display, specified as a string vector or cell vector of strings of length numCoeffs:
• If Intercept=true, VarNames(1) is the name of the intercept (for example 'Const') and VarNames(j + 1) specifies the name to use for variable X(:,j) or PredictorVariables(j).
• If Intercept=false, VarNames(j) specifies the name to use for variable X(:,j) or PredictorVariables(j).
The default is one of the following alternatives, prepended by 'Const' when an intercept is present in the model:
• {'x1','x2',...} when you supply inputs X and y
• Tbl.Properties.VariableNames when you supply input table or timetable Tbl
Example: VarNames=["Const" "AGE" "BBD"]
Data Types: char | cell | string

Intercept — Flag to include intercept
true (default) | false
Flag to include a model intercept, specified as a value in this table.
Value    Description
true     fgls includes an intercept term in the regression model. numCoeffs = numPreds + 1.
false    fgls does not include an intercept when fitting the regression model. numCoeffs = numPreds.
Example: Intercept=false
Data Types: logical

InnovMdl — Model for innovations covariance estimate
"AR" (default) | "CLM" | "HC0" | "HC1" | "HC2" | "HC3" | "HC4" | character vector
Model for the innovations covariance estimate, specified as a model name in the following table.
Set InnovMdl to specify the structure of the innovations covariance estimator $\hat{\Omega}$.
• For diagonal innovations covariance models (that is, models with heteroscedasticity), $\hat{\Omega} = \mathrm{diag}(\omega)$, where ω = {ωi; i = 1,...,T} is a vector of innovation variance estimates for the observations, and T = numObs.
fgls estimates the data-driven vector ω using the corresponding model residuals (ε), their leverages $h_i = x_i (X'X)^{-1} x_i'$, and the degrees of freedom dfe.

Model Name    Weight                                                            Reference
"CLM"         $\omega_i = \frac{1}{dfe}\sum_{i=1}^{T}\varepsilon_i^2$           [4]
"HC0"         $\omega_i = \varepsilon_i^2$                                      [6]
"HC1"         $\omega_i = \frac{T}{dfe}\varepsilon_i^2$                         [5]
"HC2"         $\omega_i = \frac{\varepsilon_i^2}{1 - h_i}$                      [5]
"HC3"         $\omega_i = \frac{\varepsilon_i^2}{(1 - h_i)^2}$                  [5]
"HC4"         $\omega_i = \frac{\varepsilon_i^2}{(1 - h_i)^{d_i}}$, where $d_i = \min\left(4, \frac{h_i}{\bar{h}}\right)$    [1]
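The diagonal weights in the preceding table can be sketched in base MATLAB as follows. This is an illustration of the formulas only, under stated assumptions: the data are simulated, the residuals come from a plain OLS fit, the struct w and the other variable names are hypothetical, and $\bar{h}$ is taken to be the mean leverage. It is not the code that fgls runs.

```matlab
rng(3);                                        % for reproducibility
T = 40;
X = [ones(T,1) randn(T,2)];                    % design matrix with intercept (simulated)
y = X*[1; 2; -1] + (1 + abs(X(:,2))).*randn(T,1);   % heteroscedastic innovations

b = X\y;                                       % OLS coefficients
res = y - X*b;                                 % OLS residuals
dfe = T - size(X,2);                           % degrees of freedom
h = sum((X/(X'*X)).*X,2);                      % leverages h_i = x_i*inv(X'*X)*x_i'
d = min(4,h/mean(h));                          % HC4 exponents, with hbar = mean(h)

w.CLM = repmat(sum(res.^2)/dfe,T,1);           % constant variance estimate
w.HC0 = res.^2;
w.HC1 = (T/dfe)*res.^2;
w.HC2 = res.^2./(1 - h);
w.HC3 = res.^2./(1 - h).^2;
w.HC4 = res.^2./(1 - h).^d;
```

Because each leverage lies between 0 and 1, the HC2 weights are at least as large as the HC0 weights, and the HC3 weights are at least as large as the HC2 weights.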
• For full innovation covariance models (that is, models having heteroscedasticity and autocorrelation), specify "AR". fgls imposes an AR(p) model on the innovations, and constructs $\hat{\Omega}$ using the number of lags, p, specified by the ARLags name-value argument and the Yule-Walker equations.
If the NumIter name-value argument is 1 and you specify the InnovCov0 name-value argument, fgls ignores InnovMdl.
Example: InnovMdl="HC0"
Data Types: char | string

ARLags — Number of lags
1 (default) | positive integer
Number of lags to include in the autoregressive (AR) innovations model, specified as a positive integer.
If the InnovMdl name-value argument is not "AR" (that is, for diagonal models), fgls ignores ARLags.
For general ARMA innovations models, convert the innovations model to the equivalent AR form by performing one of the following actions.
• Construct the ARMA innovations model lag operator polynomial using LagOp. Then, divide the AR polynomial by the MA polynomial using, for example, mrdivide. The result is the infinite-order AR representation of the ARMA model.
• Use arma2ar, which returns the coefficients of the infinite-order AR representation of the ARMA model.
Example: ARLags=4
Data Types: double

InnovCov0 — Initial innovations covariance
[] (default) | positive vector | positive definite matrix | positive semidefinite matrix
Initial innovations covariance, specified as a positive vector, positive semidefinite matrix, or a positive definite matrix.
InnovCov0 replaces the data-driven estimate of the innovations covariance ($\hat{\Omega}$) in the first iteration of GLS.
• For diagonal innovations covariance models (that is, models with heteroscedasticity), specify a numObs-by-1 vector. InnovCov0(j) is the variance of innovation j.
• For full innovation covariance models (that is, models having heteroscedasticity and autocorrelation), specify a numObs-by-numObs matrix. InnovCov0(j,k) is the covariance of innovations j and k.
By default, fgls uses a data-driven $\hat{\Omega}$ (see the InnovMdl name-value argument).
Data Types: double

NumIter — Number of iterations
1 (default) | positive integer
Number of iterations to implement for the FGLS algorithm, specified as a positive integer.
fgls estimates the innovations covariance $\hat{\Omega}$ at each iteration from the residual series according to the innovations covariance model InnovMdl. Then, the software computes the GLS estimates of the model coefficients.
Example: NumIter=10
Data Types: double

ResCond — Flag to scale residuals
false (default) | true
Flag to scale the residuals at each iteration of FGLS, specified as a value in this table.
Value    Description
true     fgls scales the residuals at each iteration.
false    fgls does not scale the residuals at each iteration.
Tip The setting ResCond=true can improve the conditioning of the estimation of the innovations covariance $\hat{\Omega}$.
Data Types: logical

Display — Command window display control
"off" (default) | "final" | "iter" | character vector
Command window display control, specified as a value in this table.
Value      Description
"final"    fgls displays the final estimates.
"iter"     fgls displays the estimates after each iteration.
"off"      fgls suppresses command window display.
fgls shows estimation results in tabular form.
Example: Display="iter"
Data Types: char | string

Plot — Control for plotting results
"off" (default) | "all" | "coeff" | "mse" | "se" | character vector | string vector | cell array of character vectors
Control for plotting results after each iteration, specified as a value in the following table, or a string vector or cell array of character vectors of such values.
To examine the convergence of the FGLS algorithm, specify plotting the estimates for each iteration.
Value      Description
"all"      fgls plots the estimated coefficients, their standard errors, and the residual mean-squared error (MSE) on separate plots.
"coeff"    fgls plots the estimated coefficients.
"mse"      fgls plots the MSEs.
"off"      fgls does not plot the results.
"se"       fgls plots the estimated coefficient standard errors.
Example: Plot="all"
Example: Plot=["coeff" "se"] separately plots iterative coefficient estimates and their standard errors.
Data Types: char | string | cell

ResponseVariable — Variable in Tbl to use for response
last variable in Tbl (default) | string vector | cell vector of character vectors | vector of integers | logical vector
Variable in Tbl to use for response, specified as a string vector or cell vector of character vectors containing variable names in Tbl.Properties.VariableNames, or an integer or logical vector representing the indices of names. The selected variables must be numeric.
fgls uses the specified response variable for the regression.
Example: ResponseVariable="GDP"
Example: ResponseVariable=[true false false false] or ResponseVariable=1 selects the first table variable as the response.
Data Types: double | logical | char | cell | string

PredictorVariables — Variables in Tbl to use for the predictors
string vector | cell vector of character vectors | vector of integers | logical vector
Variables in Tbl to use for the predictors, specified as a string vector or cell vector of character vectors containing variable names in Tbl.Properties.VariableNames, or an integer or logical vector representing the indices of names. The selected variables must be numeric.
fgls uses the specified predictors for the regression.
By default, fgls uses all variables in Tbl that are not specified by the ResponseVariable name-value argument.
Example: PredictorVariables=["UN" "CPI"]
Example: PredictorVariables=[false true true false] or PredictorVariables=[2 3] selects the second and third table variables.
Data Types: double | logical | char | cell | string
Output Arguments
coeff — FGLS coefficient estimates
numeric vector
FGLS coefficient estimates, returned as a numCoeffs-by-1 numeric vector. fgls returns coeff when you supply the inputs X and y.
Rows of coeff correspond to the predictor matrix columns, with the first row corresponding to the intercept when Intercept=true. For example, in a model with an intercept, the value of β1 (corresponding to the predictor x1) is in position 2 of coeff.

se — Coefficient standard error estimates
numeric vector
Coefficient standard error estimates, returned as a numCoeffs-by-1 numeric vector. The elements of se are sqrt(diag(EstCoeffCov)). fgls returns se when you supply the inputs X and y.
Rows of se correspond to the predictor matrix columns, with the first row corresponding to the intercept when Intercept=true. For example, in a model with an intercept, the estimated standard error of β1 (corresponding to the predictor x1) is in position 2 of se, and is the square root of the value in position (2,2) of EstCoeffCov.

EstCoeffCov — Coefficient covariance matrix estimate
numeric matrix
Coefficient covariance matrix estimate, returned as a numCoeffs-by-numCoeffs numeric matrix. fgls returns EstCoeffCov when you supply the inputs X and y.
Rows and columns of EstCoeffCov correspond to the predictor matrix columns, with the first row and column corresponding to the intercept when Intercept=true. For example, in a model with an intercept, the estimated covariance of β1 (corresponding to the predictor x1) and β2 (corresponding to the predictor x2) is in positions (2,3) and (3,2) of EstCoeffCov.

CoeffTbl — FGLS coefficient estimates and standard errors
table
FGLS coefficient estimates and standard errors, returned as a numCoeffs-by-2 table. fgls returns CoeffTbl when you supply the input Tbl.
For j = 1,…,numCoeffs, row j of CoeffTbl contains estimates of coefficient j in the regression model and it has label VarNames(j). The first variable Coeff contains the coefficient estimates coeff and the second variable SE contains the standard errors se.

CovTbl — Coefficient covariance matrix estimate
table
Coefficient covariance matrix estimate, returned as a numCoeffs-by-numCoeffs table containing the coefficient covariance matrix estimate EstCoeffCov. fgls returns CovTbl when you supply the input Tbl.
For each pair (i,j), CovTbl(i,j) contains the covariance estimate of coefficients i and j in the regression model. The label of row and variable j is VarNames(j), j = 1,…,numCoeffs.

iterPlots — Handles to plotted graphics objects
structure array of graphics objects
Handles to plotted graphics objects, returned as a structure array of graphics objects. iterPlots contains unique plot identifiers, which you can use to query or modify properties of the plot.
iterPlots is not available if the value of the Plot name-value argument is "off".
More About
Feasible Generalized Least Squares
Feasible generalized least squares (FGLS) estimates the coefficients of a multiple linear regression model and their covariance matrix in the presence of nonspherical innovations with an unknown covariance matrix.
Let $y_t = X_t\beta + \varepsilon_t$ be a multiple linear regression model, where the innovations process εt is Gaussian with mean 0, but with true, nonspherical covariance matrix Ω (for example, the innovations are heteroscedastic or autocorrelated). Also, suppose that the sample size is T and there are p predictors (including an intercept). Then, the FGLS estimator of β is
$$\hat{\beta}_{FGLS} = \left(X^\top \hat{\Omega}^{-1} X\right)^{-1} X^\top \hat{\Omega}^{-1} y,$$
where $\hat{\Omega}$ is an innovations covariance estimate based on a model (for example, the innovations process forms an AR(1) model). The estimated coefficient covariance matrix is
$$\hat{\Sigma}_{FGLS} = \hat{\sigma}^2_{FGLS}\left(X^\top \hat{\Omega}^{-1} X\right)^{-1},$$
where
$$\hat{\sigma}^2_{FGLS} = \frac{y^\top\left[\hat{\Omega}^{-1} - \hat{\Omega}^{-1}X\left(X^\top \hat{\Omega}^{-1} X\right)^{-1}X^\top \hat{\Omega}^{-1}\right]y}{T - p}.$$
FGLS estimates are computed as follows:
1. OLS is applied to the data, and then residuals $\hat{\varepsilon}_t$ are computed.
2. $\hat{\Omega}$ is estimated based on a model for the innovations covariance.
3. $\hat{\beta}_{FGLS}$ is estimated, along with its covariance matrix $\hat{\Sigma}_{FGLS}$.
4. Optional: This process can be iterated by performing the following steps until $\hat{\beta}_{FGLS}$ converges.
   a. Compute the residuals of the fitted model using the FGLS estimates.
   b. Apply steps 2–3.
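Steps 1 through 3 can be sketched in base MATLAB for an AR(1) innovations model. This is a minimal illustration of the formulas above, not the algorithm that fgls implements: the data are simulated, the variable names are hypothetical, and the AR(1) coefficient is estimated by a simple lag-1 regression of the residuals, which is an assumption of this sketch.

```matlab
rng(0);                                   % for reproducibility
T = 200;
X = [ones(T,1) randn(T,1)];               % intercept plus one simulated predictor
e = randn(T,1);
u = zeros(T,1);                           % AR(1) innovations with coefficient 0.6
u(1) = e(1);
for t = 2:T
    u(t) = 0.6*u(t-1) + e(t);
end
y = X*[1; 2] + u;

% Step 1: OLS estimates and residuals
bOLS = X\y;
res = y - X*bOLS;

% Step 2: innovations covariance estimate under an AR(1) model
% (lag-1 regression of the residuals; an assumption of this sketch)
rhoHat = (res(1:end-1)'*res(2:end))/(res(1:end-1)'*res(1:end-1));
Omega = toeplitz(rhoHat.^(0:T-1));        % AR(1) correlation pattern, up to scale

% Step 3: FGLS coefficients and their covariance matrix
p = size(X,2);
OinvX = Omega\X;                          % inv(Omega)*X
A = X'*OinvX;                             % X'*inv(Omega)*X
XtOinvy = OinvX'*y;                       % X'*inv(Omega)*y, because Omega is symmetric
bFGLS = A\XtOinvy;
sigma2 = (y'*(Omega\y) - XtOinvy'*(A\XtOinvy))/(T - p);
covFGLS = sigma2*(A\eye(p));
seFGLS = sqrt(diag(covFGLS));
```

The covariance pattern is specified only up to a scale factor because the factor cancels in both the coefficient estimator and the product that defines the coefficient covariance.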
If $\hat{\Omega}$ is a consistent estimator of Ω and the predictors that comprise X are exogenous, then FGLS estimators are consistent and efficient.
Asymptotic distributions of FGLS estimators are unchanged by repeated iteration. However, iterations might change finite sample distributions.

Generalized Least Squares
Generalized least squares (GLS) estimates the coefficients of a multiple linear regression model and their covariance matrix in the presence of nonspherical innovations with known covariance matrix.
The setup and process for obtaining GLS estimates is the same as in FGLS on page 12-881, but replace $\hat{\Omega}$ with the known innovations covariance matrix Ω.
In the presence of nonspherical innovations, and with known innovations covariance, GLS estimators are unbiased, efficient, and consistent, and hypothesis tests based on the estimates are valid.

Weighted Least Squares
Weighted least squares (WLS) estimates the coefficients of a multiple linear regression model and their covariance matrix in the presence of uncorrelated but heteroscedastic innovations with known, diagonal covariance matrix.
The setup and process to obtain WLS estimates is the same as in FGLS on page 12-881, but replace $\hat{\Omega}$ with the known, diagonal matrix of weights. Typically, the diagonal elements are the inverse of the variances of the innovations.
In the presence of heteroscedastic innovations, and when the variances of the innovations are known, WLS estimators are unbiased, efficient, and consistent, and hypothesis tests based on the estimates are valid.
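The relationship between WLS and GLS with a known diagonal covariance can be checked with the base MATLAB function lscov, which accepts either a vector of weights or a covariance matrix. The following sketch uses simulated data and hypothetical variable names, and it assumes that the innovation variances v are known exactly. The three coefficient vectors coincide up to rounding error.

```matlab
rng(2);                                   % for reproducibility
T = 100;
X = [ones(T,1) randn(T,2)];               % design matrix with intercept (simulated)
v = 0.5 + rand(T,1);                      % known innovation variances (hypothetical)
y = X*[1; -2; 0.5] + sqrt(v).*randn(T,1);

w = 1./v;                                 % WLS weights are the inverse variances
bWLS = lscov(X,y,w);                      % weighted least squares
bGLS = lscov(X,y,diag(v));                % GLS with the known diagonal covariance
bExplicit = (X'*(w.*X))\(X'*(w.*y));      % closed form (X'WX)\(X'Wy) with W = diag(w)
```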
Tips
• To obtain standard generalized least squares on page 12-882 (GLS) estimates:
  • Set the InnovCov0 name-value argument to the known innovations covariance.
  • Set the NumIter name-value argument to 1.
• To obtain weighted least squares on page 12-882 (WLS) estimates, set the InnovCov0 name-value argument to a vector of inverse weights (for example, innovations variance estimates).
• In specific models and with repeated iterations, scale differences in the residuals might produce a badly conditioned estimated innovations covariance and induce numerical instability. Conditioning improves when you set ResCond=true.

Algorithms
• In the presence of nonspherical innovations, GLS produces efficient estimates relative to OLS and consistent coefficient covariances, conditional on the innovations covariance. The degree to which fgls maintains these properties depends on the accuracy of both the model and estimation of the innovations covariance.
• Rather than computing FGLS estimates the usual way, fgls uses methods that are faster and more stable, and are applicable to rank-deficient cases.
• Traditional FGLS methods, such as the Cochrane-Orcutt procedure, use low-order, autoregressive models. These methods, however, estimate parameters in the innovations covariance matrix using OLS, whereas fgls uses maximum likelihood estimation (MLE) [2].
Version History
Introduced in R2014b

fgls returns estimates in tables when you supply a table of data
If you supply a table of time series data Tbl, fgls returns the following outputs:
• In the first position, fgls returns the table CoeffTbl containing variables for coefficient estimates Coeff and standard errors SE with rows corresponding to, and labeled as, VarNames.
• In the second position, fgls returns a table containing the estimated coefficient covariances CovTbl with rows and variables corresponding to, and labeled as, VarNames.
Before R2022a, fgls returned the numeric outputs in separate positions of the output when you supplied a table of input data.
Starting in R2022a, if you supply a table of input data, update your code to return all outputs in the first through third output positions.
[CoeffTbl,CovTbl,iterPlots] = fgls(Tbl,Name=Value)
If you request more outputs, fgls issues an error. Also, access results by using table indexing. For more details, see “Access Data in Tables”.
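A minimal sketch of table indexing on such an output follows. The table below is a hand-built stand-in for CoeffTbl with made-up values and variable names; it is not produced by fgls.

```matlab
% Hypothetical stand-in for the CoeffTbl output (values are made up).
VarNames = {'Const','x1','x2'};
CoeffTbl = table([0.5; 1.2; -0.3],[0.10; 0.25; 0.05], ...
    'VariableNames',{'Coeff','SE'},'RowNames',VarNames);

coeff = CoeffTbl.Coeff;                 % All coefficient estimates as a numeric vector
seX1 = CoeffTbl{'x1','SE'};             % Standard error of the coefficient labeled x1
tStat = CoeffTbl.Coeff./CoeffTbl.SE;    % Derived quantities operate on extracted columns
```

Dot indexing returns a whole variable, while curly-brace indexing with row and variable names returns individual numeric values.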
References [1] Cribari-Neto, F. "Asymptotic Inference Under Heteroskedasticity of Unknown Form." Computational Statistics & Data Analysis. Vol. 45, 2004, pp. 215–233. [2] Hamilton, James D. Time Series Analysis. Princeton, NJ: Princeton University Press, 1994. [3] Judge, G. G., W. E. Griffiths, R. C. Hill, H. Lütkepohl, and T. C. Lee. The Theory and Practice of Econometrics. New York, NY: John Wiley & Sons, Inc., 1985. [4] Kutner, M. H., C. J. Nachtsheim, J. Neter, and W. Li. Applied Linear Statistical Models. 5th ed. New York: McGraw-Hill/Irwin, 2005. [5] MacKinnon, J. G., and H. White. "Some Heteroskedasticity-Consistent Covariance Matrix Estimators with Improved Finite Sample Properties." Journal of Econometrics. Vol. 29, 1985, pp. 305–325. [6] White, H. "A Heteroskedasticity-Consistent Covariance Matrix and a Direct Test for Heteroskedasticity." Econometrica. Vol. 48, 1980, pp. 817–838.
See Also Functions fitlm | lscov | hac | arma2ar Objects regARIMA Topics “Classical Model Misspecification Tests” on page 3-70 “Time Series Regression I: Linear Models” on page 5-173 “Time Series Regression VI: Residual Diagnostics” on page 5-220 “Time Series Regression X: Generalized Least Squares and HAC Estimators” on page 5-279 “Autocorrelation and Partial Autocorrelation” on page 3-11 “Engle’s ARCH Test” on page 3-26 “Nonspherical Models” on page 3-90 “Time Series Regression Models” on page 5-3
filter Filter disturbances using ARIMA or ARIMAX model
Syntax Y = filter(Mdl,Z) Y = filter(Mdl,Z,Name,Value) [Y,E] = filter( ___ ) [Y,E,V] = filter( ___ )
Description Y = filter(Mdl,Z) returns the response series Y resulting from filtering the underlying disturbance series Z. Z is associated with the model innovations process through the ARIMA model Mdl. Y = filter(Mdl,Z,Name,Value) uses additional options specified by one or more name-value arguments. For example, filter(Mdl,Z,'X',X,'Z0',Z0) specifies exogenous predictor data X and presample disturbances Z0 to initialize the model. [Y,E] = filter( ___ ) returns the model innovations series E using any of the input arguments in the previous syntaxes. [Y,E,V] = filter( ___ ) returns the conditional variances V for a composite conditional mean and variance model (for example, an ARIMA and GARCH composite model).
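The link between the disturbances and the three outputs can be summarized as follows. The symbols are introduced here for illustration: $z_t$ is an element of Z, $\varepsilon_t$ the corresponding element of E, and $\sigma_t^2$ the corresponding element of V.

```latex
\varepsilon_t = \sigma_t z_t,
\qquad
\sigma_t^2 = \operatorname{Var}\left(\varepsilon_t \mid \varepsilon_{t-1}, \varepsilon_{t-2}, \ldots\right)
```

For a model without a conditional variance component, $\sigma_t^2$ is constant and equal to the Variance property of Mdl.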
Examples Compute Impulse Response Function Specify a mean zero ARIMA(2,0,1) model. Mdl = arima('Constant',0,'AR',{0.5,-0.8},'MA',-0.5,... 'Variance',0.1);
Simulate the first 20 responses of the impulse response function (IRF). Generate a disturbance series with a one-time, unit impulse, and then filter it. z = [1; zeros(19,1)]; y = filter(Mdl,z);
y is a 20-by-1 response path resulting from filtering the disturbance path z through the model. y represents the IRF. The filter function requires one presample observation to initialize the model. By default, filter uses the unconditional mean of the process, which is 0.
Normalize the IRF such that the first element is 1. y = y/y(1);
Plot the impulse response function. figure; stem((0:numel(y)-1)',y,'filled'); title 'Impulse Response';
The impulse response assesses the dynamic behavior of a system to a one-time, unit impulse. Alternatively, you can use the impulse function to plot the IRF for an ARIMA process.
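As a cross-check on the example above, the same impulse response can be sketched with an explicit recursion in base MATLAB. The difference equation written in the comments is the assumed form of the ARIMA(2,0,1) model, the variable names are illustrative, and the call to filter at the end is the base MATLAB function for rational transfer functions (numeric inputs), not the arima method.

```matlab
% Assumed difference equation for the model in this example:
% y(t) = 0.5*y(t-1) - 0.8*y(t-2) + e(t) - 0.5*e(t-1), with e(t) = sqrt(0.1)*z(t).
phi = [0.5 -0.8];        % AR coefficients
theta = -0.5;            % MA coefficient
sigma = sqrt(0.1);       % Innovation standard deviation

z = [1; zeros(19,1)];    % One-time unit impulse
e = sigma*z;             % Innovations implied by the disturbances
yManual = zeros(20,1);
for t = 1:20
    % Presample responses and innovations are taken to be 0.
    y1 = 0; y2 = 0; e1 = 0;
    if t > 1, y1 = yManual(t-1); e1 = e(t-1); end
    if t > 2, y2 = yManual(t-2); end
    yManual(t) = phi(1)*y1 + phi(2)*y2 + e(t) + theta*e1;
end
yManual = yManual/yManual(1);    % Normalize so that the first element is 1

% Cross-check with the base MATLAB rational transfer function filter.
yCheck = filter([1 theta],[1 -phi],e);
yCheck = yCheck/yCheck(1);
```

After normalization the innovation scale cancels, so the path depends only on the AR and MA coefficients.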
Simulate Step Response Assess the dynamic behavior of a system to a persistent change in a variable by plotting a step response. Specify a mean zero ARIMA(2,0,1) process. Mdl = arima('Constant',0,'AR',{0.5,-0.8},'MA',-0.5,... 'Variance',0.1);
Simulate the first 20 responses to a sequence of unit disturbances. Generate a disturbance series of ones, and then filter it. Set all presample observations equal to zero. Z = ones(20,1); Y = filter(Mdl,Z,'Y0',zeros(Mdl.P,1)); Y = Y/Y(1);
The last step normalizes the step response function to ensure that the first element is 1. Plot the step response function. figure; stem((0:numel(Y)-1)',Y,'filled'); title 'Step Response';
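For a linear model started from zero presample values, the step response is the running sum of the impulse response. The following base MATLAB sketch illustrates this under the assumption that the model corresponds to the lag polynomials shown in the comments. It uses the base MATLAB filter function with numeric inputs rather than the arima method, and it omits the innovation scale because the normalization cancels it.

```matlab
b = [1 -0.5];            % MA polynomial: 1 - 0.5L
a = [1 -0.5 0.8];        % AR polynomial: 1 - 0.5L + 0.8L^2

irf = filter(b,a,[1; zeros(19,1)]);    % Response to a one-time unit impulse
step = filter(b,a,ones(20,1));         % Response to a persistent unit disturbance

stepFromIrf = cumsum(irf);             % Superposition of shifted impulse responses
stepNormalized = step/step(1);         % Same normalization as in the example above
```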
Simulate Responses from ARIMAX Model Create models for the response and predictor series. Specify an ARIMAX(2,1,3) model, MdlY, for the response and an AR(1) model, MdlX, for the predictor. MdlY = arima('AR',{0.1 0.2},'D',1,'MA',{-0.1 0.1 0.05},... 'Constant',1,'Variance',0.5,'Beta',2); MdlX = arima('AR',0.5,'Constant',0,'Variance',0.1);
Simulate a length 100 predictor series x and a series of iid normal disturbances z having mean zero and variance 1. rng(1); z = randn(100,1); x = simulate(MdlX,100);
Filter the disturbances z using MdlY to produce the response series y, and plot y.
y = filter(MdlY,z,'X',x); figure; plot(y); title 'Filter to simulate ARIMA(2,1,3)'; xlabel 'Time'; ylabel 'Response';
Simulate and Filter Multiple Paths Create a mean zero ARIMA(2,0,1) model. Mdl = arima('Constant',0,'AR',{0.5,-0.8},'MA',-0.5,... 'Variance',0.1);
Generate 20 random paths from the model. rng(1); % For reproducibility [ySim,eSim,vSim] = simulate(Mdl,100,'NumPaths',20);
ySim, eSim, and vSim are 100-by-20 matrices of 20 simulated response, innovation, and conditional variance paths of length 100, respectively. Because Mdl does not have a conditional variance model, vSim is a matrix completely composed of the value of Mdl.Variance. Obtain disturbance paths by standardizing the simulated innovations.
zSim = eSim./sqrt(vSim);
Filter the disturbance paths through the model. [yFil,eFil] = filter(Mdl,zSim);
yFil and eFil are 100-by-20 matrices. The columns are independent paths generated from filtering corresponding disturbance paths in zSim through the model Mdl. Confirm that the outputs of simulate and filter are identical. sameE = norm(eSim - eFil)
1 columns or one column. Otherwise, forecast issues an error. For example, if Y0 has five columns, representing five paths, then V0 can either have five columns or one column. If V0 has one column, then forecast applies V0 to each path. • NaN values in presample data sets indicate missing data. forecast removes missing data from the presample data sets following this procedure:
1. forecast horizontally concatenates the specified presample data sets Y0 and V0 such that the latest observations occur simultaneously. The result can be a jagged array because the presample data sets can have a different number of rows. In this case, forecast prepads variables with an appropriate number of zeros to form a matrix.
2. forecast applies list-wise deletion to the combined presample matrix by removing all rows containing at least one NaN.
3. forecast extracts the processed presample data sets from the result of step 2, and removes all prepadded zeros.
List-wise deletion reduces the sample size and can create irregular time series.
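The three steps above can be sketched in base MATLAB as follows. This is an illustration of the described procedure under simple assumptions (one path per series, made-up values, illustrative variable names); it is not the internal implementation of forecast.

```matlab
% Hypothetical presample data: Y0 has 5 rows, V0 has 3 rows.
Y0 = [0.2; NaN; -0.1; 0.4; 0.3];
V0 = [0.5; NaN; 0.7];

% Step 1: align the latest observations by prepadding the shorter series with zeros.
n = max(numel(Y0),numel(V0));
padY = n - numel(Y0);
padV = n - numel(V0);
combined = [[zeros(padY,1); Y0], [zeros(padV,1); V0]];
isPadY = (1:n)' <= padY;       % Rows of column 1 that are padding
isPadV = (1:n)' <= padV;       % Rows of column 2 that are padding

% Step 2: list-wise deletion of every row containing at least one NaN.
keep = ~any(isnan(combined),2);

% Step 3: extract each series again and drop its prepadded zeros.
Y0clean = combined(keep & ~isPadY,1);
V0clean = combined(keep & ~isPadV,2);
```

Note that the NaN in V0 also removes the fourth response in Y0, which is how list-wise deletion can shorten a series and make it irregular.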
Version History Introduced in R2012a Conditional variance models require specification of presample response data to forecast conditional variances forecast now has a third input argument for you to supply presample response data. forecast(Mdl,numperiods,Y0) forecast(Mdl,numperiods,Y0,Name,Value)
Before R2019a, the syntaxes were: forecast(Mdl,numperiods) forecast(Mdl,numperiods,Name,Value)
You could optionally supply presample responses using the 'Y0' name-value pair argument. There are no plans to remove the previous syntaxes or the 'Y0' name-value pair argument at this time. However, you are encouraged to supply presample responses because, to forecast conditional variances from a conditional variance model, forecast must initialize models containing lagged variables. Without specified presample responses, forecast initializes models by using reasonable default values, but the default might not support all workflows. This table describes the default values for each conditional variance model object.
Model Object    Presample Default
garch           All presample responses are the unconditional standard deviation of the process.
egarch          All presample responses are 0.
gjr             All presample responses are the unconditional standard deviation of the process.
Update Code
Update your code by specifying presample response data in the third input argument. If you do not supply presample responses, then forecast provides default presample values that might not support all workflows.
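To see why the presample responses matter, consider a GARCH(1,1) recursion written out by hand. The sketch below is base MATLAB and does not call forecast. The parameter values, the variable names, the zero offset, and the choice to start the conditional variance at its unconditional value are all assumptions made for illustration.

```matlab
kappa = 0.1; gamma = 0.5; alpha = 0.2;     % Hypothetical GARCH(1,1) parameters
uncondVar = kappa/(1 - gamma - alpha);     % Unconditional variance of the process
numperiods = 5;

% Two choices for the last presample response: the unconditional standard
% deviation (the garch default) and a hypothetical recent shock.
yLast = [sqrt(uncondVar), 3*sqrt(uncondVar)];
V = zeros(numperiods,2);
for j = 1:2
    % The one-step-ahead forecast uses the squared presample response.
    V(1,j) = kappa + gamma*uncondVar + alpha*yLast(j)^2;
    % Later steps replace the unknown squared response with its expectation.
    for h = 2:numperiods
        V(h,j) = kappa + (gamma + alpha)*V(h-1,j);
    end
end
disp(V)
```

With the default presample value, the forecast path stays flat at the unconditional variance. With the shock, the path starts higher and decays toward the unconditional variance, which the default cannot reproduce.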
References [1] Bollerslev, T. “Generalized Autoregressive Conditional Heteroskedasticity.” Journal of Econometrics. Vol. 31, 1986, pp. 307–327. [2] Bollerslev, T. “A Conditionally Heteroskedastic Time Series Model for Speculative Prices and Rates of Return.” The Review of Economics and Statistics. Vol. 69, 1987, pp. 542–547. [3] Box, G. E. P., G. M. Jenkins, and G. C. Reinsel. Time Series Analysis: Forecasting and Control. 3rd ed. Englewood Cliffs, NJ: Prentice Hall, 1994. [4] Enders, W. Applied Econometric Time Series. Hoboken, NJ: John Wiley & Sons, 1995. [5] Engle, R. F. “Autoregressive Conditional Heteroskedasticity with Estimates of the Variance of United Kingdom Inflation.” Econometrica. Vol. 50, 1982, pp. 987–1007. [6] Glosten, L. R., R. Jagannathan, and D. E. Runkle. “On the Relation between the Expected Value and the Volatility of the Nominal Excess Return on Stocks.” The Journal of Finance. Vol. 48, No. 5, 1993, pp. 1779–1801. [7] Hamilton, J. D. Time Series Analysis. Princeton, NJ: Princeton University Press, 1994. [8] Nelson, D. B. “Conditional Heteroskedasticity in Asset Returns: A New Approach.” Econometrica. Vol. 59, 1991, pp. 347–370.
See Also Objects garch | egarch | gjr Functions estimate | filter | simulate Topics “Forecast a Conditional Variance Model” on page 8-97 “Forecast GJR Models” on page 8-94 “MMSE Forecasting of Conditional Variance Models” on page 8-90
forecast Class: dssm Forecast states and observations of diffuse state-space models
Syntax [Y,YMSE] = forecast(Mdl,numPeriods,Y0) [Y,YMSE] = forecast(Mdl,numPeriods,Y0,Name,Value) [Y,YMSE,X,XMSE] = forecast( ___ )
Description [Y,YMSE] = forecast(Mdl,numPeriods,Y0) returns forecasted observations on page 11-10 (Y) and their corresponding variances (YMSE) from forecasting the diffuse state-space model on page 11-4 Mdl using a numPeriods forecast horizon and in-sample observations Y0. [Y,YMSE] = forecast(Mdl,numPeriods,Y0,Name,Value) uses additional options specified by one or more Name,Value pair arguments. For example, for state-space models that include a linear regression component in the observation model, include in-sample predictor data, predictor data for the forecast horizon, and the regression coefficient. [Y,YMSE,X,XMSE] = forecast( ___ ) uses any of the input arguments in the previous syntaxes to additionally return state forecasts on page 11-8 (X) and their corresponding variances (XMSE).
Input Arguments Mdl — Diffuse state-space model dssm model object Diffuse state-space model, specified as a dssm model object returned by dssm or estimate. If Mdl is not fully specified (that is, Mdl contains unknown parameters), then specify values for the unknown parameters using the 'Params' name-value pair argument. Otherwise, the software issues an error. estimate returns fully-specified state-space models. Mdl does not store observed responses or predictor data. Supply the data wherever necessary using the appropriate input or name-value pair arguments. numPeriods — Forecast horizon positive integer Forecast horizon, specified as a positive integer. That is, the software returns 1,...,numPeriods forecasts. Data Types: double Y0 — In-sample, observed responses cell vector of numeric vectors | numeric matrix In-sample, observed responses, specified as a cell vector of numeric vectors or a matrix.
• If Mdl is time invariant, then Y0 is a T-by-n numeric matrix, where each row corresponds to a period and each column corresponds to a particular observation in the model. Therefore, T is the sample size and n is the number of observations per period. The last row of Y0 contains the latest observations. • If Mdl is time varying with respect to the observation equation, then Y0 is a T-by-1 cell vector. Each element of the cell vector corresponds to a period and contains an nt-dimensional vector of observations for that period. The corresponding dimensions of the coefficient matrices in Mdl.C{t} and Mdl.D{t} must be consistent with the matrix in Y0{t} for all periods. The last cell of Y0 contains the latest observations. If Mdl is an estimated state-space model (that is, returned by estimate), then it is best practice to set Y0 to the same data set that you used to fit Mdl. NaN elements indicate missing observations. For details on how the Kalman filter accommodates missing observations, see “Algorithms” on page 12-1065. Data Types: double | cell Name-Value Pair Arguments Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter. Before R2021a, use commas to separate each name and value, and enclose Name in quotes. Example: 'Beta',beta,'Predictors',Z specifies to deflate the observations by the regression component composed of the predictor data Z and the coefficient matrix beta. A — Forecast-horizon, state-transition, coefficient matrices cell vector of numeric matrices Forecast-horizon, state-transition, coefficient matrices, specified as the comma-separated pair consisting of 'A' and a cell vector of numeric matrices. • A must contain at least numPeriods cells. Each cell must contain a matrix specifying how the states transition in the forecast horizon.
If the length of A is greater than numPeriods, then the software uses the first numPeriods cells. The last cell indicates the latest period in the forecast horizon. • If Mdl is time invariant with respect to the states, then each cell of A must contain an m-by-m matrix, where m is the number of the in-sample states per period. By default, the software uses Mdl.A throughout the forecast horizon. • If Mdl is time varying with respect to the states, then the dimensions of the matrices in the cells of A can vary, but the dimensions of each matrix must be consistent with the matrices in B and C in the corresponding periods. By default, the software uses Mdl.A{end} throughout the forecast horizon. Note The matrices in A cannot contain NaN values. Data Types: cell B — Forecast-horizon, state-disturbance-loading, coefficient matrices cell vector of matrices
Forecast-horizon, state-disturbance-loading, coefficient matrices, specified as the comma-separated pair consisting of 'B' and a cell vector of matrices. • B must contain at least numPeriods cells. Each cell must contain a matrix specifying how the state disturbances load onto the states in the forecast horizon. If the length of B is greater than numPeriods, then the software uses the first numPeriods cells. The last cell indicates the latest period in the forecast horizon. • If Mdl is time invariant with respect to the states and state disturbances, then each cell of B must contain an m-by-k matrix, where m is the number of the in-sample states per period, and k is the number of in-sample, state disturbances per period. By default, the software uses Mdl.B throughout the forecast horizon. • If Mdl is time varying, then the dimensions of the matrices in the cells of B can vary, but the dimensions of each matrix must be consistent with the matrices in A in the corresponding periods. By default, the software uses Mdl.B{end} throughout the forecast horizon. Note The matrices in B cannot contain NaN values. Data Types: cell C — Forecast-horizon, measurement-sensitivity, coefficient matrices cell vector of matrices Forecast-horizon, measurement-sensitivity, coefficient matrices, specified as the comma-separated pair consisting of 'C' and a cell vector of matrices. • C must contain at least numPeriods cells. Each cell must contain a matrix specifying how the observations depend on the states in the forecast horizon. If the length of C is greater than numPeriods, then the software uses the first numPeriods cells. The last cell indicates the latest period in the forecast horizon. • If Mdl is time invariant with respect to the states and the observations, then each cell of C must contain an n-by-m matrix, where n is the number of the in-sample observations per period, and m is the number of in-sample states per period. By default, the software uses Mdl.C throughout the forecast horizon.
• If Mdl is time varying with respect to the states or the observations, then the dimensions of the matrices in the cells of C can vary, but the dimensions of each matrix must be consistent with the matrices in A and D in the corresponding periods. By default, the software uses Mdl.C{end} throughout the forecast horizon. Note The matrices in C cannot contain NaN values. Data Types: cell
D — Forecast-horizon, observation-innovation, coefficient matrices cell vector of matrices
Forecast-horizon, observation-innovation, coefficient matrices, specified as the comma-separated pair consisting of 'D' and a cell vector of matrices. • D must contain at least numPeriods cells. Each cell must contain a matrix specifying how the observation innovations enter the observations in the forecast horizon. If the length of D is greater than numPeriods, then the software uses the first numPeriods cells. The last cell indicates the latest period in the forecast horizon. • If Mdl is time invariant with respect to the observations and the observation innovations, then each cell of D must contain an n-by-h matrix, where n is the number of the in-sample observations per period, and h is the number of in-sample, observation innovations per period. By default, the software uses Mdl.D throughout the forecast horizon. • If Mdl is time varying with respect to the observations or the observation innovations, then the dimensions of the matrices in the cells of D can vary, but the dimensions of each matrix must be consistent with the matrices in C in the corresponding periods. By default, the software uses Mdl.D{end} throughout the forecast horizon. Note The matrices in D cannot contain NaN values. Data Types: cell
Beta — Regression coefficients [] (default) | numeric matrix
Regression coefficients corresponding to predictor variables, specified as the comma-separated pair consisting of 'Beta' and a d-by-n numeric matrix. d is the number of predictor variables (see Predictors0 and PredictorsF) and n is the number of observed response series (see Y0). • If you specify Beta, then you must also specify Predictors0 and PredictorsF. • If Mdl is an estimated state-space model, then specify the estimated regression coefficients stored in Mdl.estParams. By default, the software excludes a regression component from the state-space model.
Predictors0 — In-sample, predictor variables in state-space model observation equation [] (default) | matrix
In-sample, predictor variables in the state-space model observation equation, specified as the comma-separated pair consisting of 'Predictors0' and a matrix. The columns of Predictors0 correspond to individual predictor variables. Predictors0 must have T rows, where row t corresponds to the observed predictors at period t ($Z_t$). The expanded observation equation is $y_t - Z_t\beta = Cx_t + Du_t$.
In other words, the software deflates the observations using the regression component. β is the time-invariant vector of regression coefficients that the software estimates with all other parameters. • If there are n observations per period, then the software regresses all predictor series onto each observation. • If you specify Predictors0, then Mdl must be time invariant. Otherwise, the software returns an error. • If you specify Predictors0, then you must also specify Beta and PredictorsF. • If Mdl is an estimated state-space model (that is, returned by estimate), then it is best practice to set Predictors0 to the same predictor data set that you used to fit Mdl. By default, the software excludes a regression component from the state-space model.
Data Types: double
PredictorsF — Forecast-horizon, predictor variables in state-space model observation equation [] (default) | numeric matrix
Forecast-horizon, predictor variables in the state-space model observation equation, specified as the comma-separated pair consisting of 'PredictorsF' and a numPeriods-by-d numeric matrix. numPeriods is the forecast horizon and d is the number of predictor variables. Row t corresponds to the predictors at period t of the forecast horizon ($Z_t$). The expanded observation equation is $y_t - Z_t\beta = Cx_t + Du_t$. In other words, the software deflates the observations using the regression component. β is the time-invariant vector of regression coefficients that the software estimates with all other parameters. • If there are n observations per period, then the software regresses all predictor series onto each observation. • If you specify PredictorsF, then Mdl must be time invariant. Otherwise, the software returns an error. • If you specify PredictorsF, then you must also specify Beta and Predictors0. By default, the software excludes a regression component from the state-space model. Data Types: double
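The deflation described for the predictor arguments can be illustrated with a few lines of base MATLAB. All values and variable names below are made up, and the step that adds the regression component back to a forecast is an assumption about how deflated forecasts map to the original scale, not a statement about the internals of forecast.

```matlab
% Hypothetical in-sample data: T = 4 periods, d = 2 predictors, n = 1 response series.
Z0 = [1 0.5; 1 0.7; 1 0.2; 1 0.9];    % Plays the role of Predictors0 (T-by-d)
beta = [0.3; 2];                      % Plays the role of Beta (d-by-n)
Y0 = [1.5; 1.9; 0.6; 2.4];            % Observed responses (T-by-n)

deflatedY0 = Y0 - Z0*beta;            % Left side of y_t - Z_t*beta = C*x_t + D*u_t

% In the forecast horizon, the regression component is added back to the
% forecast of the deflated series (a placeholder of zeros is used here).
ZF = [1 0.4; 1 0.6];                  % Plays the role of PredictorsF (numPeriods-by-d)
deflatedForecast = zeros(2,1);        % Placeholder for forecasts of C*x_t
YF = deflatedForecast + ZF*beta;
```

The dimensions show why Beta must be d-by-n: Z0*beta has to match the T-by-n shape of Y0.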
Output Arguments Y — Forecasted observations matrix | cell vector of numeric vectors Forecasted observations on page 11-10, returned as a matrix or a cell vector of numeric vectors. If Mdl is a time-invariant, state-space model with respect to the observations, then Y is a numPeriods-by-n matrix. If Mdl is a time-varying, state-space model with respect to the observations, then Y is a numPeriods-by-1 cell vector of numeric vectors. Cell t of Y contains an nt-by-1 numeric vector of forecasted observations for period t. YMSE — Error variances of forecasted observations matrix | cell vector of numeric vectors Error variances of forecasted observations, returned as a matrix or a cell vector of numeric vectors. If Mdl is a time-invariant, state-space model with respect to the observations, then YMSE is a numPeriods-by-n matrix. If Mdl is a time-varying, state-space model with respect to the observations, then YMSE is a numPeriods-by-1 cell vector of numeric vectors. Cell t of YMSE contains an nt-by-1 numeric vector of error variances for the corresponding forecasted observations for period t.
X — State forecasts matrix | cell vector of numeric vectors State forecasts on page 11-8, returned as a matrix or a cell vector of numeric vectors. If Mdl is a time-invariant, state-space model with respect to the states, then X is a numPeriods-by-m matrix. If Mdl is a time-varying, state-space model with respect to the states, then X is a numPeriods-by-1 cell vector of numeric vectors. Cell t of X contains an mt-by-1 numeric vector of forecasted states for period t. XMSE — Error variances of state forecasts matrix | cell vector of numeric vectors Error variances of state forecasts, returned as a matrix or a cell vector of numeric vectors. If Mdl is a time-invariant, state-space model with respect to the states, then XMSE is a numPeriods-by-m matrix. If Mdl is a time-varying, state-space model with respect to the states, then XMSE is a numPeriods-by-1 cell vector of numeric vectors. Cell t of XMSE contains an mt-by-1 numeric vector of error variances for the corresponding forecasted states for period t.
Examples Forecast Observations of Time-Invariant Diffuse State-Space Model Suppose that a latent process is a random walk. The state equation is $x_t = x_{t-1} + u_t$, where $u_t$ is Gaussian with mean 0 and standard deviation 1. Generate a random series of 100 observations from $x_t$, assuming that the series starts at 1.5.
T = 100;
x0 = 1.5;
rng(1); % For reproducibility
u = randn(T,1);
x = cumsum([x0;u]);
x = x(2:end);
Suppose further that the latent process is subject to additive measurement error. The observation equation is $y_t = x_t + \varepsilon_t$, where $\varepsilon_t$ is Gaussian with mean 0 and standard deviation 0.75. Together, the latent process and observation equations compose a state-space model. Use the random latent state process (x) and the observation equation to generate observations.
y = x + 0.75*randn(T,1);
Specify the four coefficient matrices.
A = 1;
B = 1;
C = 1;
D = 0.75;
Create the diffuse state-space model using the coefficient matrices. Specify that the initial state distribution is diffuse.
Mdl = dssm(A,B,C,D,'StateType',2)
Mdl =
State-space model type: dssm
State vector length: 1
Observation vector length: 1
State disturbance vector length: 1
Observation innovation vector length: 1
Sample size supported by model: Unlimited
State variables: x1, x2,...
State disturbances: u1, u2,...
Observation series: y1, y2,...
Observation innovations: e1, e2,...
State equation:
x1(t) = x1(t-1) + u1(t)
Observation equation:
y1(t) = x1(t) + (0.75)e1(t)
Initial state distribution:
Initial state means
 x1
  0
Initial state covariance matrix
     x1
 x1  Inf
State types
   x1
 Diffuse
Mdl is a dssm model. Verify that the model is correctly specified using the display in the Command Window. Forecast observations 10 periods into the future, and estimate the mean squared errors of the forecasts. numPeriods = 10; [ForecastedY,YMSE] = forecast(Mdl,numPeriods,y);
Plot the forecasts with the in-sample responses, and 95% Wald-type forecast intervals.
ForecastIntervals(:,1) = ForecastedY - 1.96*sqrt(YMSE); ForecastIntervals(:,2) = ForecastedY + 1.96*sqrt(YMSE); figure plot(T-20:T,y(T-20:T),'-k',T+1:T+numPeriods,ForecastedY,'-.r',... T+1:T+numPeriods,ForecastIntervals,'-.b',... T:T+1,[y(end)*ones(3,1),[ForecastedY(1);ForecastIntervals(1,:)']],':k',... 'LineWidth',2) hold on title({'Observed Responses and Their Forecasts'}) xlabel('Period') ylabel('Responses') legend({'Observations','Forecasted observations','95% forecast intervals'},... 'Location','Best') hold off
The forecast intervals flare out because the process is nonstationary.
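For the random walk plus noise model in this example, the widening can be made explicit. Writing $P_T$ for the variance of the filtered state at the end of the sample (notation introduced here for illustration), the $h$-step-ahead forecast error variance of the observation is

```latex
\operatorname{Var}\left(y_{T+h} - \hat{y}_{T+h \mid T}\right)
  = P_T + h\,B^2 + D^2
  = P_T + h + 0.75^2 ,
```

because each additional period adds one state disturbance with variance $B^2 = 1$, while the measurement error contributes $D^2$ only once. The variance grows linearly in $h$, so the interval half-width grows roughly like $\sqrt{h}$.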
Forecast Observations of Diffuse State-Space Model Containing Regression Component Suppose that the linear relationship between unemployment rate and the nominal gross national product (nGNP) is of interest. Suppose further that unemployment rate is an AR(1) series. Symbolically, and in state-space form, the model is
$$\begin{aligned} x_t &= \phi x_{t-1} + \sigma u_t \\ y_t - \beta Z_t &= x_t, \end{aligned}$$
where:
• $x_t$ is the unemployment rate at time t.
• $y_t$ is the observed change in the unemployment rate being deflated by the return of nGNP ($Z_t$).
• $u_t$ is the Gaussian series of state disturbances having mean 0 and unknown standard deviation σ.
Load the Nelson-Plosser data set, which contains the unemployment rate and nGNP series, among other things. load Data_NelsonPlosser
Preprocess the data by taking the natural logarithm of the nGNP series and removing the starting NaN values from each series.
isNaN = any(ismissing(DataTable),2); % Flag periods containing NaNs
gnpn = DataTable.GNPN(~isNaN);
y = diff(DataTable.UR(~isNaN));
T = size(gnpn,1); % The sample size
Z = price2ret(gnpn);
This example continues using the series without NaN values. However, using the Kalman filter framework, the software can accommodate series containing missing values. Determine how well the model forecasts observations by removing the last 10 observations for comparison.
numPeriods = 10; % Forecast horizon
isY = y(1:end-numPeriods); % In-sample observations
oosY = y(end-numPeriods+1:end); % Out-of-sample observations
ISZ = Z(1:end-numPeriods); % In-sample predictors
OOSZ = Z(end-numPeriods+1:end); % Out-of-sample predictors
Specify the coefficient matrices. A = NaN; B = NaN; C = 1;
Create the state-space model using dssm by supplying the coefficient matrices and specifying that the state values come from a diffuse distribution. The diffuse specification indicates complete ignorance about the moments of the initial distribution. StateType = 2; Mdl = dssm(A,B,C,'StateType',StateType);
Estimate the parameters. Specify the regression component and its initial value for optimization using the 'Predictors' and 'Beta0' name-value pair arguments, respectively. Display the estimates and all optimization diagnostic information. Restrict the estimate of σ to all positive, real numbers. params0 = [0.3 0.2]; % Initial values chosen arbitrarily Beta0 = 0.1;
[EstMdl,estParams] = estimate(Mdl,y,params0,'Predictors',Z,'Beta0',Beta0,... 'lb',[-Inf 0 -Inf]); Method: Maximum likelihood (fmincon) Effective Sample size: 60 Logarithmic likelihood: -110.477 Akaike info criterion: 226.954 Bayesian info criterion: 233.287 | Coeff Std Err t Stat Prob -------------------------------------------------------c(1) | 0.59436 0.09408 6.31738 0 c(2) | 1.52554 0.10758 14.17991 0 y = dt1980Q1; figure; plot(dates(idx),arate(idx),'k',... dates(end) + calquarters(1:numPeriods),yf,'r--') xlabel("Year") ylabel("GDP (Annualized Rate)") recessionplot legend("Observations","Forecasts")
Forecast Model with VARX Submodels Compute optimal and estimated forecasts and corresponding forecast error covariance matrices from a three-state Markov-switching dynamic regression model for a 2-D VARX response process. This example uses arbitrary parameter values for the DGP. Create Fully Specified Model for DGP Create a three-state discrete-time Markov chain model for the switching mechanism. P = [10 1 1; 1 10 1; 1 1 10]; mc = dtmc(P);
mc is a fully specified dtmc object. dtmc normalizes the rows of P so that they sum to 1. For each regime, use varm to create a VAR model that describes the response process within the regime. Specify all parameter values. % Constants C1 = [1;-1]; C2 = [2;-2]; C3 = [3;-3]; % Autoregression coefficients AR1 = {}; AR2 = {[0.5 0.1; 0.5 0.5]}; AR3 = {[0.25 0; 0 0] [0 0; 0.25 0]}; % Regression coefficients Beta1 = [1;-1]; Beta2 = [2 2;-2 -2]; Beta3 = [3 3 3;-3 -3 -3]; % Innovations covariances Sigma1 = [1 -0.1; -0.1 1]; Sigma2 = [2 -0.2; -0.2 2]; Sigma3 = [3 -0.3; -0.3 3]; % VARX submodels mdl1 = varm('Constant',C1,'AR',AR1,'Beta',Beta1,'Covariance',Sigma1); mdl2 = varm('Constant',C2,'AR',AR2,'Beta',Beta2,'Covariance',Sigma2); mdl3 = varm('Constant',C3,'AR',AR3,'Beta',Beta3,'Covariance',Sigma3); mdl = [mdl1; mdl2; mdl3];
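The row normalization that dtmc applies to P, mentioned above, amounts to dividing each row by its sum. A minimal base MATLAB sketch (the name Pnorm is illustrative, and the code does not call dtmc):

```matlab
P = [10 1 1; 1 10 1; 1 1 10];    % Unnormalized nonnegative weights from the example
Pnorm = P./sum(P,2);             % Divide each row by its row sum
disp(Pnorm)                      % Each row now sums to 1
```

With these weights, each regime persists with probability 10/12 and moves to either of the other regimes with probability 1/12.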
mdl contains three fully specified varm model objects. For the DGP, create a fully specified Markov-switching dynamic regression model from the switching mechanism mc and the submodels mdl. Mdl = msVAR(mc,mdl);
Mdl is a fully specified msVAR model. Forecast Model Ignoring Regression Component If you do not supply exogenous data, simulate and forecast ignore the regression components in the submodels. forecast requires enough data before the forecast horizon to initialize the model. Simulate 120 observations from the DGP.
rng('default'); % For reproducibility Y = simulate(Mdl,120);
Y is a 120-by-2 matrix of simulated responses. Rows correspond to time points, and columns correspond to variables in the system. Treat the first 100 observations of the simulated response data as the presample for the forecast, and treat the last 20 observations as a holdout sample.
idx0 = 1:100;
idx1 = 101:120;
Y0 = Y(idx0,:); % Forecast sample
Y1 = Y(idx1,:); % Holdout sample
Compute 1- through 20-step-ahead optimal and estimated point forecasts from the model. Compute forecast error covariance matrices corresponding to the estimated forecasts. YF1 = forecast(Mdl,Y0,20); [YF2,EstCov] = forecast(Mdl,Y0,20);
YF1 and YF2 are 20-by-2 matrices of optimal and estimated forecasts, respectively. EstCov is a 2-by-2-by-20 array of forecast error covariances. Extract the forecast error variances of each response for each period in the forecast horizon. estVar1(:) = EstCov(1,1,:); estVar2(:) = EstCov(2,2,:);
estVar1 and estVar2 are 1-by-20 vectors of forecast error variances.

Plot the data, forecasts, and 95% forecast intervals of each variable on separate subplots.

figure
subplot(2,1,1)
hold on
plot(idx0,Y0(:,1),'b');
h = plot(idx1,Y1(:,1),'b--');
h1 = plot(idx1,YF1(:,1),'r');
h2 = plot(idx1,YF2(:,1),'m');
ciu = YF2(:,1) + 1.96*sqrt(estVar1'); % Upper 95% confidence level
cil = YF2(:,1) - 1.96*sqrt(estVar1'); % Lower 95% confidence level
plot(idx1,ciu,'m-.');
plot(idx1,cil,'m-.');
yfill = [ylim,fliplr(ylim)];
xfill = [idx0(end) idx0(end) idx1(end) idx1(end)];
fill(xfill,yfill,'k','FaceAlpha',0.05)
legend([h h1 h2],["Actual" "Optimal" "Estimated"],...
    'Location',"NorthWest")
title("Point and Interval Forecasts: Series 1")
hold off
subplot(2,1,2)
hold on
plot(idx0,Y0(:,2),'b');
h = plot(idx1,Y1(:,2),'b--');
h1 = plot(idx1,YF1(:,2),'r');
h2 = plot(idx1,YF2(:,2),'m');
ciu = YF2(:,2) + 1.96*sqrt(estVar2'); % Upper 95% confidence level
cil = YF2(:,2) - 1.96*sqrt(estVar2'); % Lower 95% confidence level
plot(idx1,ciu,'m-.');
plot(idx1,cil,'m-.');
yfill = [ylim,fliplr(ylim)];
xfill = [idx0(end) idx0(end) idx1(end) idx1(end)];
fill(xfill,yfill,'k','FaceAlpha',0.05)
legend([h h1 h2],["Actual" "Optimal" "Estimated"],...
    'Location',"NorthWest")
title("Point and Interval Forecasts: Series 2")
hold off

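The interval construction in the plotting code can be written more compactly for any number of series. The following is a minimal sketch that uses hypothetical placeholder values for YF and EstCov (they are not output from forecast). It collects the diagonal of each covariance slice and forms approximate 95% bounds. Because forecast errors of a Markov-switching model are generally nonnormal, treat the 1.96 multiplier as an approximation.

```matlab
% Hypothetical stand-ins for the outputs of forecast (not real results)
numPeriods = 20;
numSeries = 2;
YF = zeros(numPeriods,numSeries);               % Placeholder point forecasts
EstCov = repmat(eye(numSeries),1,1,numPeriods); % Placeholder covariances

% Collect the diagonal (variance) entries of each covariance slice
estVar = zeros(numPeriods,numSeries);
for j = 1:numSeries
    estVar(:,j) = squeeze(EstCov(j,j,:));
end

z = 1.96; % Approximate 97.5th percentile of the standard normal
upper = YF + z*sqrt(estVar); % Upper bounds, one column per series
lower = YF - z*sqrt(estVar); % Lower bounds, one column per series
```

Each column of upper and lower corresponds to one response series, so the bounds can be plotted against idx1 in the same way as ciu and cil.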
Forecast Model Including Regression Component

Simulate exogenous data for the three regressors by generating 120 random observations from the 3-D standard Gaussian distribution.

X = randn(120,3);

Simulate 120 observations from the DGP. Specify the exogenous data for the regression component.

rng('default')
Y = simulate(Mdl,120,'X',X);

Treat the first 100 observations of the simulated response and exogenous data as the presample for the forecast, and treat the last 20 observations as a holdout sample.

idx0 = 1:100;
idx1 = 101:120;
Y0 = Y(idx0,:);
Y1 = Y(idx1,:);
X1 = X(idx1,:);

Compute 1- through 20-step-ahead optimal and estimated point forecasts from the model. Compute forecast error covariance matrices corresponding to the estimated forecasts. Specify the forecast-period exogenous data for the regression component.

YF1 = forecast(Mdl,Y0,20,'X',X1);
[YF2,EstCov] = forecast(Mdl,Y0,20,'X',X1);
estVar1(:) = EstCov(1,1,:);
estVar2(:) = EstCov(2,2,:);

Plot the data, forecasts, and 95% forecast intervals of each variable on separate subplots.

figure
subplot(2,1,1)
hold on
plot(idx0,Y0(:,1),'b');
h = plot(idx1,Y1(:,1),'b--');
h1 = plot(idx1,YF1(:,1),'r');
h2 = plot(idx1,YF2(:,1),'m');
ciu = YF2(:,1) + 1.96*sqrt(estVar1'); % Upper 95% confidence level
cil = YF2(:,1) - 1.96*sqrt(estVar1'); % Lower 95% confidence level
plot(idx1,ciu,'m-.');
plot(idx1,cil,'m-.');
yfill = [ylim,fliplr(ylim)];
xfill = [idx0(end) idx0(end) idx1(end) idx1(end)];
fill(xfill,yfill,'k','FaceAlpha',0.05)
legend([h h1 h2],["Actual" "Optimal" "Estimated"],...
    'Location',"NorthWest")
title("Point and Interval Forecasts: Series 1")
hold off
subplot(2,1,2)
hold on
plot(idx0,Y0(:,2),'b');
h = plot(idx1,Y1(:,2),'b--');
h1 = plot(idx1,YF1(:,2),'r');
h2 = plot(idx1,YF2(:,2),'m');
ciu = YF2(:,2) + 1.96*sqrt(estVar2'); % Upper 95% confidence level
cil = YF2(:,2) - 1.96*sqrt(estVar2'); % Lower 95% confidence level
plot(idx1,ciu,'m-.');
plot(idx1,cil,'m-.');
yfill = [ylim,fliplr(ylim)];
xfill = [idx0(end) idx0(end) idx1(end) idx1(end)];
fill(xfill,yfill,'k','FaceAlpha',0.05)
legend([h h1 h2],["Actual" "Optimal" "Estimated"],...
    'Location',"NorthWest")
title("Point and Interval Forecasts: Series 2")
hold off

Input Arguments

Mdl — Fully specified Markov-switching dynamic regression model
msVAR model object

Fully specified Markov-switching dynamic regression model, specified as an msVAR model object returned by msVAR or estimate. Properties of a fully specified model object do not contain NaN values.

Y — Response data
numeric matrix

Response data that provides initial values for the forecasts, specified as a numObs-by-numSeries numeric matrix. numObs is the sample size. numSeries is the number of response variables (Mdl.NumSeries). Rows correspond to observations, and the last row contains the latest observation. Columns correspond to individual response variables. The forecasts YF represent the continuation of Y.

Data Types: double

numPeriods — Forecast horizon
positive integer

Forecast horizon, or the number of time points in the forecast period, specified as a positive integer.

Data Types: double

Name-Value Pair Arguments

Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter. Before R2021a, use commas to separate each name and value, and enclose Name in quotes.

Example: 'X',X uses the matrix X as exogenous data in the forecast horizon to evaluate regression components in the model.

S0 — Initial state probabilities
nonnegative numeric vector

Initial state probabilities from which to forecast, specified as the comma-separated pair consisting of 'S0' and a nonnegative numeric vector of length numStates. S0 corresponds to the end of the response data sample Y. forecast normalizes S0 to produce a distribution. By default, forecast applies smooth to Y using default settings, then sets S0 to the terminal distribution of states in the data.

Example: 'S0',[0.2 0.2 0.6]
Example: 'S0',[0 1] specifies state 2 as the initial state.

Data Types: double

X — Predictor data
numeric matrix | cell vector of numeric matrices

Predictor data in the forecast horizon used to evaluate regression components in all submodels of Mdl, specified as the comma-separated pair consisting of 'X' and a numeric matrix or a cell vector of numeric matrices. The first row of X contains observations in the period after the period represented by the last observation in Y.

To use a subset of the same predictors in each state, specify X as a matrix with numPreds columns and at least numPeriods rows. Columns correspond to distinct predictor variables. Submodels use initial columns of the associated matrix, in order, up to the number of submodel predictors.
The number of columns in the Beta property of Mdl.SubModels(j) determines the number of exogenous variables in the regression component of submodel j. If the number of rows exceeds numPeriods, then forecast uses the earliest observations.

To use different predictors in each state, specify a cell vector of such matrices with length numStates.

By default, forecast ignores the regression components in Mdl.

Data Types: double

NumPaths — Number of sample paths to generate
100 (default) | positive integer

Number of sample paths to generate for the simulation, specified as the comma-separated pair consisting of 'NumPaths' and a positive integer. If forecast returns only YF, it ignores NumPaths.

Example: 'NumPaths',1000

Data Types: double

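The rules described above for S0 and X can be mimicked with a few lines of base MATLAB. The following is a minimal sketch with hypothetical values. It only imitates the documented behavior (normalizing S0 to a distribution, keeping the earliest numPeriods rows of X, and keeping the initial columns for a submodel with fewer predictors); it is not the code that forecast runs.

```matlab
% Hypothetical unnormalized state weights
S0 = [1 1 3];
S0 = S0/sum(S0); % Normalized to [0.2 0.2 0.6]

% Hypothetical predictor data with more rows and columns than needed
numPeriods = 20;
numSubmodelPreds = 2; % Predictors used by one submodel (assumption)
X = randn(30,3);

% Earliest numPeriods rows, initial numSubmodelPreds columns
XUsed = X(1:numPeriods,1:numSubmodelPreds);
```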
Output Arguments

YF — Point forecasts
numeric matrix

Point forecasts, returned as a numPeriods-by-numSeries numeric matrix. If forecast returns only YF, then point forecasts are optimal. Otherwise, forecast uses Monte Carlo simulation to estimate the point forecasts.

EstCov — Forecast error covariances
numeric column vector | numeric array

Forecast error covariances, returned as a numeric column vector or numeric array. If the submodels Mdl.SubModels represent univariate ARX models, EstCov is a numPeriods-by-1 vector. If Mdl.SubModels represent multivariate VARX models, EstCov is a numSeries-by-numSeries-by-numPeriods array. forecast performs Monte Carlo simulation to compute EstCov.

Algorithms

Hamilton [2] provides a statistically optimal, one-step-ahead point forecast YF for a Markov-switching dynamic regression model. forecast computes YF iteratively to the forecast horizon when called with a single output. Nonlinearity of the Markov-switching dynamic regression model leads to nonnormal forecast errors, which complicate interval and density forecasts [3]. As a result, forecast switches to Monte Carlo methods when it returns EstCov.

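For intuition only, consider the special case in which each submodel is a constant, with no autoregressive terms and no regression component. In that case, the optimal h-step-ahead point forecast is the average of the state means weighted by the forecasted state probabilities, which are obtained by propagating the initial distribution through the transition matrix. The sketch below assumes a hypothetical two-state transition matrix P, state means mu, and initial distribution S0. It illustrates the iteration idea and does not reproduce the implementation of forecast.

```matlab
% Hypothetical two-state switching-mean model
P = [0.9 0.1; 0.3 0.7]; % Row-stochastic transition matrix (assumption)
mu = [1; -1];           % State-specific means (assumption)
S0 = [0.5 0.5];         % State distribution at the end of the sample
numPeriods = 5;

yf = zeros(numPeriods,1);
s = S0;
for h = 1:numPeriods
    s = s*P;      % State distribution h steps ahead
    yf(h) = s*mu; % Probability-weighted mean
end
```

With autoregressive terms, the state-conditional forecasts also depend on lagged responses, so the recursion is more involved than this sketch.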
Version History

Introduced in R2019b

References

[1] Chauvet, M., and J. D. Hamilton. "Dating Business Cycle Turning Points." In Nonlinear Analysis of Business Cycles (Contributions to Economic Analysis, Volume 276). (C. Milas, P. Rothman, and D. van Dijk, eds.). Amsterdam: Emerald Group Publishing Limited, 2006.
[2] Hamilton, J. D. "Analysis of Time Series Subject to Changes in Regime." Journal of Econometrics. Vol. 45, 1990, pp. 39–70.
[3] Krolzig, H.-M. Markov-Switching Vector Autoregressions. Berlin: Springer, 1997.

See Also

Objects
msVAR

Functions
estimate | simulate

forecast
Class: regARIMA

Forecast responses of regression model with ARIMA errors

Syntax

[Y,YMSE] = forecast(Mdl,numperiods)
[Y,YMSE,U] = forecast(Mdl,numperiods)
[Y,YMSE,U] = forecast(Mdl,numperiods,Name,Value)

Description

[Y,YMSE] = forecast(Mdl,numperiods) forecasts responses (Y) for a regression model with ARIMA time series errors and generates corresponding mean square errors (YMSE).

[Y,YMSE,U] = forecast(Mdl,numperiods) additionally forecasts unconditional disturbances for a regression model with ARIMA errors.

[Y,YMSE,U] = forecast(Mdl,numperiods,Name,Value) forecasts with additional options specified by one or more Name,Value pair arguments.

Input Arguments

Mdl — Regression model with ARIMA errors
regARIMA model

Regression model with ARIMA errors, specified as a regARIMA model returned by regARIMA or estimate. The properties of Mdl cannot contain NaNs.

numperiods — Forecast horizon
positive integer

Forecast horizon, or the number of time points in the forecast period, specified as a positive integer.

Data Types: double

Name-Value Pair Arguments

Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter. Before R2021a, use commas to separate each name and value, and enclose Name in quotes.

E0 — Presample innovations
numeric column vector | numeric matrix

Presample innovations that initialize the moving average (MA) component of the ARIMA error model, specified as the comma-separated pair consisting of 'E0' and a numeric column vector or numeric matrix. forecast assumes that the presample innovations have a mean of 0.

• If E0 is a column vector, then forecast applies it to each forecasted path.
• If E0, Y0, and U0 are matrices with multiple paths, then they must have the same number of columns.
• E0 requires at least Mdl.Q rows. If E0 contains extra rows, then forecast uses the latest presample innovations. The last row contains the latest presample innovation.

By default, if U0 contains at least Mdl.P + Mdl.Q rows, then forecast infers E0 from U0. If U0 has an insufficient number of rows, and forecast cannot infer sufficient observations of U0 from the presample data (Y0 and X0), then E0 is 0.

Data Types: double

U0 — Presample unconditional disturbances
numeric column vector | numeric matrix

Presample unconditional disturbances that initialize the autoregressive (AR) component of the ARIMA error model, specified as the comma-separated pair consisting of 'U0' and a numeric column vector or numeric matrix. If you do not specify presample innovations E0, forecast uses U0 to infer them.

• If U0 is a column vector, then forecast applies it to each forecasted path.
• If U0, Y0, and E0 are matrices with multiple paths, then they must have the same number of columns.
• U0 requires at least Mdl.P rows. If U0 contains extra rows, then forecast uses the latest presample unconditional disturbances. The last row contains the latest presample unconditional disturbance.

By default, if the presample data (Y0 and X0) contains at least Mdl.P rows, then forecast infers U0 from the presample data. If you do not specify presample data, then all required presample unconditional disturbances are 0.
Data Types: double

X0 — Presample predictor data
numeric matrix

Presample predictor data that initializes the model for forecasting, specified as the comma-separated pair consisting of 'X0' and a numeric matrix. The columns of X0 are separate time series variables. forecast uses X0 to infer presample unconditional disturbances U0. Therefore, if you specify U0, forecast ignores X0.

• If you do not specify U0, then X0 requires at least Mdl.P rows to infer U0. If X0 contains extra rows, then forecast uses the latest observations. The last row contains the latest observation of each series.
• X0 requires the same number of columns as the length of Mdl.Beta.
• If you specify X0, then you must also specify XF.
• forecast treats X0 as a fixed (nonstochastic) matrix.

Data Types: double

XF — Forecasted or future predictor data
numeric matrix

Forecasted or future predictor data, specified as the comma-separated pair consisting of 'XF' and a numeric matrix. The columns of XF are separate time series, each corresponding to forecasts of the series in X0. Row t of XF contains the t-period-ahead forecasts of X0.

If you specify X0, then you must also specify XF. XF and X0 require the same number of columns. XF must have at least numperiods rows. If XF exceeds numperiods rows, then forecast uses the first numperiods forecasts.

forecast treats XF as a fixed (nonstochastic) matrix.

By default, forecast does not include a regression component in the model, regardless of the presence of regression coefficients in Mdl.

Data Types: double

Y0 — Presample response data
numeric column vector | numeric matrix

Presample response data that initializes the model for forecasting, specified as the comma-separated pair consisting of 'Y0' and a numeric column vector or numeric matrix. forecast uses Y0 to infer presample unconditional disturbances U0. Therefore, if you specify U0, forecast ignores Y0.

• If Y0 is a column vector, forecast applies it to each forecasted path.
• If Y0, E0, and U0 are matrices with multiple paths, then they must have the same number of columns.
• If you do not specify U0, then Y0 requires at least Mdl.P rows to infer U0. If Y0 contains extra rows, then forecast uses the latest observations. The last row contains the latest observation.

Data Types: double

Notes

• NaNs in E0, U0, X0, XF, and Y0 indicate missing values and forecast removes them. The software merges the presample data sets (E0, U0, X0, and Y0), then uses list-wise deletion to remove any NaNs. forecast similarly removes NaNs from XF. Removing NaNs in the data reduces the sample size. Such removal can also create irregular time series.
• forecast assumes that you synchronize presample data such that the latest observation of each presample series occurs simultaneously.
• Set X0 to the same predictor matrix as X used in the estimation, simulation, or inference of Mdl. This assignment ensures correct inference of the unconditional disturbances, U0.
• To include a regression component in the response forecast, you must specify the forecasted predictor data XF. That is, you can specify XF without also specifying X0, but forecast issues an error when you specify X0 without also specifying XF.

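The list-wise deletion described in the notes can be pictured with a small base MATLAB sketch. The arrays below are hypothetical and already synchronized, so that the last row of each holds the latest observation. The sketch merges the presample series column-wise and drops every row that contains a NaN in any series. It imitates the documented behavior and is not the internal code of forecast.

```matlab
% Hypothetical synchronized presample data (last row is the latest period)
Y0 = [1; NaN; 3; 4];
X0 = [10 20; 11 21; NaN 22; 13 23];

merged = [Y0 X0];                 % Merge the presample series
keep = ~any(isnan(merged),2);     % Rows with no missing values
Y0Clean = Y0(keep);
X0Clean = X0(keep,:);
```

Here, rows 2 and 3 are removed from both series, which leaves two observations that are no longer adjacent in time. This is the irregular spacing that the first note warns about.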
Output Arguments

Y — Minimum mean square error forecasts of response data
numeric matrix

Minimum mean square error (MMSE) forecasts of the response data, returned as a numeric matrix. Y has numperiods rows and numPaths columns.

• If you do not specify Y0, E0, and U0, then Y is a numperiods column vector.
• If you specify Y0, E0, and U0, all having numPaths columns, then Y is a numperiods-by-numPaths matrix.
• Row i of Y contains the forecasts for the ith period.

Data Types: double

YMSE — Mean square errors of forecasted responses
numeric matrix

Mean square errors (MSEs) of the forecasted responses, returned as a numeric matrix. YMSE has numperiods rows and numPaths columns.

• If you do not specify Y0, E0, and U0, then YMSE is a numperiods column vector.
• If you specify Y0, E0, and U0, all having numPaths columns, then YMSE is a numperiods-by-numPaths matrix.
• Row i of YMSE contains the forecast error variances for the ith period.
• The predictor data does not contribute variability to YMSE because forecast treats XF as a nonstochastic matrix.
• The square roots of YMSE are the standard errors of the forecasts of Y.

Data Types: double

U — Minimum mean square error forecasts of future ARIMA error model unconditional disturbances
numeric matrix

Minimum mean square error (MMSE) forecasts of future ARIMA error model unconditional disturbances, returned as a numeric matrix. U has numperiods rows and numPaths columns.

• If you do not specify Y0, E0, and U0, then U is a numperiods column vector.
• If you specify Y0, E0, and U0, all having numPaths columns, then U is a numperiods-by-numPaths matrix.
• Row i of U contains the forecasted unconditional disturbances for the ith period.

Data Types: double

Examples

Forecast Responses of a Regression Model with ARIMA Errors

Forecast responses from the following regression model with ARMA(2,1) errors over a 30-period horizon:

$$y_t = X_t \begin{bmatrix} 0.1 \\ -0.2 \end{bmatrix} + u_t$$
$$u_t = 0.5u_{t-1} - 0.8u_{t-2} + \varepsilon_t - 0.5\varepsilon_{t-1},$$

where ε_t is Gaussian with variance 0.1.

Specify the model. Simulate responses from the model and two predictor series.

Mdl0 = regARIMA('Intercept',0,'AR',{0.5 -0.8},...
    'MA',-0.5,'Beta',[0.1 -0.2],'Variance',0.1);
rng(1); % For reproducibility
X = randn(130,2);
y = simulate(Mdl0,130,'X',X);

Fit the model to the first 100 observations, and reserve the remaining 30 observations to evaluate forecast performance.

Mdl = regARIMA('ARLags',1:2);
EstMdl = estimate(Mdl,y(1:100),'X',X(1:100,:));

Regression with ARMA(2,0) Error Model (Gaussian Distribution):

                  Value      StandardError    TStatistic      PValue
                 ________    _____________    __________    __________
    Intercept    0.004358      0.021314         0.20446        0.83799
    AR{1}         0.36833      0.067103          5.4891     4.0408e-08
    AR{2}        -0.75063      0.090865         -8.2609     1.4453e-16
    Beta(1)      0.076398      0.023008          3.3205     0.00089863
    Beta(2)       -0.1396      0.023298         -5.9919     2.0741e-09
    Variance     0.079876       0.01342          5.9522     2.6453e-09

EstMdl is a new regARIMA model containing the estimates. The estimates are close to their true values.

Use EstMdl to forecast a 30-period horizon. Visually compare the forecasts to the holdout data using a plot.

[yF,yMSE] = forecast(EstMdl,30,'Y0',y(1:100),...
    'X0',X(1:100,:),'XF',X(101:end,:));

figure
plot(y,'Color',[.7,.7,.7]);
hold on
plot(101:130,yF,'b','LineWidth',2);
plot(101:130,yF+1.96*sqrt(yMSE),'r:',...
    'LineWidth',2);
plot(101:130,yF-1.96*sqrt(yMSE),'r:','LineWidth',2);
h = gca;
ph = patch([repmat(101,1,2) repmat(130,1,2)],...
    [h.YLim fliplr(h.YLim)],...
    [0 0 0 0],'b');
ph.FaceAlpha = 0.1;
legend('Observed','Forecast',...
    '95% Forecast Interval','Location','Best');
title(['30-Period Forecasts and Approximate 95% '...
    'Forecast Intervals'])
axis tight
hold off

Many observations in the holdout sample fall beyond the 95% forecast intervals. Two reasons for this are:

• The predictors are randomly generated in this example. estimate treats the predictors as fixed. The 95% forecast intervals based on the estimates from estimate do not account for the variability in the predictors.
• By sheer chance, the estimation period seems less volatile than the forecast period. estimate uses the less volatile estimation period data to estimate the parameters. Therefore, forecast intervals based on these estimates cannot be expected to cover observations generated by an innovations process with larger variability.

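The share of holdout observations that falls inside the intervals can be quantified rather than judged from the plot. The following is a minimal sketch with hypothetical numbers standing in for the holdout responses, the forecasts, and their MSEs (they are not values from this example). It computes the empirical coverage of approximate 95% intervals.

```matlab
% Hypothetical holdout responses, forecasts, and forecast MSEs
yHold = [0.2; -1.1; 0.9; 0.1];
yF = [0; 0; 0; 0];
yMSE = [0.1; 0.1; 0.1; 0.1];

halfWidth = 1.96*sqrt(yMSE);           % Interval half-widths
inside = abs(yHold - yF) <= halfWidth; % True when the interval covers
coverage = mean(inside);               % Fraction of covered observations
```

A coverage well below 0.95 over a long holdout sample points to interval widths that are too narrow, for example because of the two reasons listed above.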
Forecast the GDP Using Regression Model with ARMA Errors

Forecast stationary, log GDP using a regression model with ARMA(1,1) errors, including CPI as a predictor.

Load the U.S. macroeconomic data set and preprocess the data.

load Data_USEconModel;
logGDP = log(DataTimeTable.GDP);
dlogGDP = diff(logGDP); % For stationarity
dCPI = diff(DataTimeTable.CPIAUCSL); % For stationarity
numObs = length(dlogGDP);
gdp = dlogGDP(1:end-15); % Estimation sample
cpi = dCPI(1:end-15);
T = length(gdp); % Effective sample size
frstHzn = T+1:numObs; % Forecast horizon
hoCPI = dCPI(frstHzn); % Holdout sample
dts = DataTimeTable.Time(2:end);

Fit a regression model with ARMA(1,1) errors.

Mdl = regARIMA('ARLags',1,'MALags',1);
EstMdl = estimate(Mdl,gdp,'X',cpi);

Regression with ARMA(1,1) Error Model (Gaussian Distribution):

                   Value       StandardError    TStatistic      PValue
                 __________    _____________    __________    __________
    Intercept      0.014793      0.0016289         9.0818     1.0684e-19
    AR{1}           0.57601        0.10009         5.7548     8.6755e-09
    MA{1}          -0.15258        0.11978        -1.2738        0.20272
    Beta(1)       0.0028972      0.0013989          2.071       0.038355
    Variance     9.5734e-05     6.5562e-06         14.602      2.723e-48

Forecast the GDP rate over a 15-quarter horizon. Use the estimation sample as a presample for the forecast.

[gdpF,gdpMSE] = forecast(EstMdl,15,'Y0',gdp,...
    'X0',cpi,'XF',hoCPI);

Plot the forecasts and 95% forecast intervals.

figure
h1 = plot(dts(end-65:end),dlogGDP(end-65:end),...
    'Color',[.7,.7,.7]);
datetick
hold on
h2 = plot(dts(frstHzn),gdpF,'b','LineWidth',2);
h3 = plot(dts(frstHzn),gdpF+1.96*sqrt(gdpMSE),'r:',...
    'LineWidth',2);
plot(dts(frstHzn),gdpF-1.96*sqrt(gdpMSE),'r:','LineWidth',2);
ha = gca;
title('{\bf GDP Rate Forecasts and Approximate 95% Intervals}')
ph = patch([repmat(dts(frstHzn(1)),1,2) repmat(dts(frstHzn(end)),1,2)],...
    [ha.YLim fliplr(ha.YLim)],...
    [0 0 0 0],'b');
ph.FaceAlpha = 0.1;
legend([h1 h2 h3],{'Observed GDP rate','Forecasted GDP rate ',...
    '95% Forecast Interval'},'Location','Best','AutoUpdate','off');
axis tight
hold off

Forecast Regression Model with ARIMA Errors With Known Intercept

Forecast unit root nonstationary, log GDP using a regression model with ARIMA(1,1,1) errors, including CPI as a predictor and a known intercept.

Load the U.S. macroeconomic data set and preprocess the data.

load Data_USEconModel;
numObs = length(DataTimeTable.GDP);
logGDP = log(DataTimeTable.GDP(1:end-15));
cpi = DataTimeTable.CPIAUCSL(1:end-15);
T = length(logGDP); % Effective sample size
frstHzn = T+1:numObs; % Forecast horizon
hoCPI = DataTimeTable.CPIAUCSL(frstHzn); % Holdout sample
dt = DataTimeTable.Time;

Specify the model for the estimation period.

Mdl = regARIMA('ARLags',1,'MALags',1,'D',1);

The intercept is not identifiable in a model with integrated errors, so fix its value before estimation. One way to do this is to estimate the intercept using simple linear regression.

Reg4Int = [ones(T,1), cpi]\logGDP;
intercept = Reg4Int(1);

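The backslash operator in the preceding code solves a least-squares problem whose design matrix has a column of ones followed by the predictor, so the first coefficient is the intercept. The following is a minimal sketch on hypothetical noise-free data (the variables x and y are illustrative and are not the GDP and CPI series), which makes it easy to confirm that the intercept is recovered.

```matlab
% Hypothetical noise-free data: y = 2 + 0.5*x
T = 50;
x = (1:T)';
y = 2 + 0.5*x;

coef = [ones(T,1), x]\y; % Least-squares fit: [intercept; slope]
intercept = coef(1);
slope = coef(2);
```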
Consider performing a sensitivity analysis by using a grid of intercepts.

Set the intercept and fit the regression model with ARIMA(1,1,1) errors.

Mdl.Intercept = intercept;
EstMdl = estimate(Mdl,logGDP,'X',cpi,'Display','off')

EstMdl =
  regARIMA with properties:

     Description: "ARIMA(1,1,1) Error Model (Gaussian Distribution)"
    Distribution: Name = "Gaussian"
       Intercept: 5.80142
            Beta: [0.00396706]
               P: 2
               D: 1
               Q: 1
              AR: {0.922709} at lag [1]
             SAR: {}
              MA: {-0.387843} at lag [1]
             SMA: {}
        Variance: 0.000108943

   Regression with ARIMA(1,1,1) Error Model (Gaussian Distribution)

Forecast GDP over a 15-quarter horizon. Use the estimation sample as a presample for the forecast.

[gdpF,gdpMSE] = forecast(EstMdl,15,'Y0',logGDP,...
    'X0',cpi,'XF',hoCPI);

Plot the forecasts and 95% forecast intervals.

figure
h1 = plot(dt(end-65:end),log(DataTimeTable.GDP(end-65:end)),...
    'Color',[.7,.7,.7]);
hold on
h2 = plot(dt(frstHzn),gdpF,'b','LineWidth',2);
h3 = plot(dt(frstHzn),gdpF+1.96*sqrt(gdpMSE),'r:',...
    'LineWidth',2);
plot(dt(frstHzn),gdpF-1.96*sqrt(gdpMSE),'r:',...
    'LineWidth',2);
ha = gca;
title('{\bf Log GDP Forecasts and Approximate 95% Intervals}')
ph = patch([repmat(dt(frstHzn(1)),1,2) repmat(dt(frstHzn(end)),1,2)],...
    [ha.YLim fliplr(ha.YLim)],...
    [0 0 0 0],'b');
ph.FaceAlpha = 0.1;
legend([h1 h2 h3],{'Observed GDP','Forecasted GDP',...
    '95% Forecast Interval'},'Location','Best','AutoUpdate','off');
axis tight
hold off

The unconditional disturbances, u_t, are nonstationary. Therefore, the widths of the forecast intervals grow with time.

More About

Time Base Partitions for Forecasting

Time base partitions for forecasting are two disjoint, contiguous intervals of the time base; each interval contains time series data for forecasting a dynamic model. The forecast period (forecast horizon) is a numperiods length partition at the end of the time base during which forecast generates forecasts Y from the dynamic model Mdl. The presample period is the entire partition occurring before the forecast period. forecast can require observed responses Y0, regression data X0, unconditional disturbances U0, or innovations E0 in the presample period to initialize the dynamic model for forecasting. The model structure determines the types and amounts of required presample observations.

A common practice is to fit a dynamic model to a portion of the data set, then validate the predictability of the model by comparing its forecasts to observed responses. During forecasting, the presample period contains the data to which the model is fit, and the forecast period contains the holdout sample for validation. Suppose that y_t is an observed response series; x_{1,t}, x_{2,t}, and x_{3,t} are observed exogenous series; and time t = 1,…,T. Consider forecasting responses from a dynamic model of y_t containing a regression component over numperiods = K periods. Suppose that the dynamic model is fit to the data in the interval [1,T – K] (for more details, see estimate). This figure shows the time base partitions for forecasting.

For example, to generate forecasts Y from a regression model with AR(2) errors, forecast requires presample unconditional disturbances U0 and future predictor data XF.

• forecast infers unconditional disturbances given enough readily available presample responses and predictor data. To initialize an AR(2) error model,

$$Y0 = \begin{bmatrix} y_{T-K-1} & y_{T-K} \end{bmatrix}' \quad \text{and} \quad X0 = \begin{bmatrix} x_{1,T-K-1} & x_{2,T-K-1} & x_{3,T-K-1} \\ x_{1,T-K} & x_{2,T-K} & x_{3,T-K} \end{bmatrix}.$$

• To include the regression component in the forecasts, forecast requires future exogenous data

$$XF = \begin{bmatrix} x_{1,(T-K+1):T} & x_{2,(T-K+1):T} & x_{3,(T-K+1):T} \end{bmatrix}.$$

This figure shows the arrays of required observations for the general case, with corresponding input and output arguments.

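The partitions described above translate directly into index ranges. The following is a minimal sketch that uses hypothetical data (y and X are illustrative arrays, and T, K, and P are assumed values). It selects the last P presample rows needed to initialize an AR(2) error model and the K rows of future predictor data.

```matlab
% Hypothetical sizes: T periods in total, K forecast periods, AR degree P
T = 130;
K = 30;
P = 2;

% Hypothetical data whose values encode the time index for easy checking
y = (1:T)';
X = (1:T)'*[1 10 100];

presampleIdx = 1:(T-K);     % Periods used for estimation and initialization
forecastIdx = (T-K+1):T;    % Forecast period (holdout sample)

Y0 = y(presampleIdx(end-P+1:end));   % Latest P presample responses
X0 = X(presampleIdx(end-P+1:end),:); % Matching presample predictors
XF = X(forecastIdx,:);               % Future predictor data
```

Passing longer presample arrays is also valid, because forecast uses only the latest observations that it needs.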
Algorithms

• forecast computes the forecasted response MSEs, YMSE, by treating the predictor data matrices (X0 and XF) as nonstochastic and statistically independent of the model innovations. Therefore, YMSE reflects the variance associated with the unconditional disturbances of the ARIMA error model alone.
• forecast uses Y0 and X0 to infer U0. Therefore, if you specify U0, forecast ignores Y0 and X0.

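To see what "the variance associated with the unconditional disturbances alone" means, recall the textbook MMSE result for an ARMA process with known parameters: the h-step-ahead forecast error variance equals the innovation variance times the sum of the first h squared impulse response (psi) weights. The following sketch applies that formula to the ARMA(2,1) error model from the first example. It assumes known parameters and is an illustration of the textbook formula, so the values returned by forecast for an estimated model can differ.

```matlab
% ARMA(2,1) error model: u(t) = 0.5u(t-1) - 0.8u(t-2) + e(t) - 0.5e(t-1)
arPoly = [1 -0.5 0.8]; % AR lag polynomial 1 - 0.5L + 0.8L^2
maPoly = [1 -0.5];     % MA lag polynomial 1 - 0.5L
sigma2 = 0.1;          % Innovation variance
numPeriods = 30;

% Psi weights are the impulse response of the ARMA filter
impulse = [1 zeros(1,numPeriods-1)];
psi = filter(maPoly,arPoly,impulse);

% h-step-ahead MSE: sigma2 times the cumulative sum of squared psi weights
mse = sigma2*cumsum(psi.^2)';
```

The predictor data never enters this calculation, which is why YMSE does not reflect any uncertainty in XF.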
See Also

regARIMA | estimate | infer | simulate

Topics
“MMSE Forecasting of Conditional Mean Models” on page 7-172
“Monte Carlo Forecasting of Conditional Mean Models” on page 7-171

forecast
Class: ssm

Forecast states and observations of state-space models

Syntax

[Y,YMSE] = forecast(Mdl,numPeriods,Y0)
[Y,YMSE] = forecast(Mdl,numPeriods,Y0,Name,Value)
[Y,YMSE,X,XMSE] = forecast( ___ )

Description

[Y,YMSE] = forecast(Mdl,numPeriods,Y0) returns forecasted observations on page 11-10 (Y) and their corresponding variances (YMSE) from forecasting the state-space model on page 11-3 Mdl using a numPeriods forecast horizon and in-sample observations Y0.

[Y,YMSE] = forecast(Mdl,numPeriods,Y0,Name,Value) uses additional options specified by one or more Name,Value pair arguments. For example, for state-space models that include a linear regression component in the observation model, include in-sample predictor data, predictor data for the forecast horizon, and the regression coefficient.

[Y,YMSE,X,XMSE] = forecast( ___ ) uses any of the input arguments in the previous syntaxes to additionally return state forecasts on page 11-8 (X) and their corresponding variances (XMSE).

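As background for the four outputs, forecasts of a time-invariant linear state-space model follow a standard recursion: each step propagates the state mean and covariance through the transition equation, then maps them through the observation equation. The sketch below assumes hypothetical coefficient matrices A, B, C, and D, unit-variance disturbances and innovations, and a hypothetical final filtered state xT with covariance PT (in practice, these come from Kalman filtering Y0). It illustrates the textbook recursion and is not the code that forecast runs.

```matlab
% Hypothetical time-invariant model with two states and one observation
A = [0.8 0; 0 0.5]; % State-transition matrix
B = eye(2);         % State-disturbance-loading matrix
C = [1 1];          % Measurement-sensitivity matrix
D = 0.5;            % Observation-innovation matrix

xT = [2; 4];        % Hypothetical filtered state at the end of the sample
PT = zeros(2);      % Hypothetical filtered state covariance
numPeriods = 3;

X = zeros(numPeriods,2); XMSE = zeros(numPeriods,2);
Y = zeros(numPeriods,1); YMSE = zeros(numPeriods,1);
x = xT;
P = PT;
for h = 1:numPeriods
    x = A*x;                 % State forecast
    P = A*P*A' + B*B';       % State forecast covariance
    X(h,:) = x.';
    XMSE(h,:) = diag(P).';   % State forecast variances
    Y(h,:) = (C*x).';        % Observation forecast
    YMSE(h,:) = diag(C*P*C' + D*D').'; % Observation forecast variances
end
```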
Input Arguments

Mdl — Standard state-space model
ssm model object

Standard state-space model, specified as an ssm model object returned by ssm or estimate.

If Mdl is not fully specified (that is, Mdl contains unknown parameters), then specify values for the unknown parameters using the 'Params' name-value argument. Otherwise, the software issues an error. estimate returns fully specified state-space models.

Mdl does not store observed responses or predictor data. Supply the data wherever necessary using the appropriate input or name-value arguments.

numPeriods — Forecast horizon
positive integer

Forecast horizon, specified as a positive integer. That is, the software returns 1,..,numPeriods forecasts.

Data Types: double

Y0 — In-sample, observed responses
cell vector of numeric vectors | numeric matrix

In-sample, observed responses, specified as a cell vector of numeric vectors or a matrix.
• If Mdl is time invariant, then Y0 is a T-by-n numeric matrix, where each row corresponds to a period and each column corresponds to a particular observation in the model. Therefore, T is the sample size and n is the number of observations per period. The last row of Y0 contains the latest observations.
• If Mdl is time varying with respect to the observation equation, then Y0 is a T-by-1 cell vector. Each element of the cell vector corresponds to a period and contains an n_t-dimensional vector of observations for that period. The corresponding dimensions of the coefficient matrices in Mdl.C{t} and Mdl.D{t} must be consistent with the matrix in Y0{t} for all periods. The last cell of Y0 contains the latest observations.

If Mdl is an estimated state-space model (that is, returned by estimate), then it is best practice to set Y0 to the same data set that you used to fit Mdl.

NaN elements indicate missing observations. For details on how the Kalman filter accommodates missing observations, see “Algorithms” on page 12-1105.

Data Types: double | cell

Name-Value Pair Arguments

Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter. Before R2021a, use commas to separate each name and value, and enclose Name in quotes.

Example: 'Beta',beta,'Predictors',Z specifies to deflate the observations by the regression component composed of the predictor data Z and the coefficient matrix beta.

A — Forecast-horizon, state-transition, coefficient matrices
cell vector of numeric matrices

Forecast-horizon, state-transition, coefficient matrices, specified as the comma-separated pair consisting of 'A' and a cell vector of numeric matrices.

• A must contain at least numPeriods cells. Each cell must contain a matrix specifying how the states transition in the forecast horizon.
If the length of A is greater than numPeriods, then the software uses the first numPeriods cells. The last cell indicates the latest period in the forecast horizon.
• If Mdl is time invariant with respect to the states, then each cell of A must contain an m-by-m matrix, where m is the number of the in-sample states per period. By default, the software uses Mdl.A throughout the forecast horizon.
• If Mdl is time varying with respect to the states, then the dimensions of the matrices in the cells of A can vary, but the dimensions of each matrix must be consistent with the matrices in B and C in the corresponding periods. By default, the software uses Mdl.A{end} throughout the forecast horizon.

Note The matrices in A cannot contain NaN values.

Data Types: cell

B — Forecast-horizon, state-disturbance-loading, coefficient matrices
cell vector of matrices

Forecast-horizon, state-disturbance-loading, coefficient matrices, specified as the comma-separated pair consisting of 'B' and a cell vector of matrices.

• B must contain at least numPeriods cells. Each cell must contain a matrix specifying how the state disturbances load into the states in the forecast horizon. If the length of B is greater than numPeriods, then the software uses the first numPeriods cells. The last cell indicates the latest period in the forecast horizon.
• If Mdl is time invariant with respect to the states and state disturbances, then each cell of B must contain an m-by-k matrix, where m is the number of the in-sample states per period, and k is the number of in-sample, state disturbances per period. By default, the software uses Mdl.B throughout the forecast horizon.
• If Mdl is time varying, then the dimensions of the matrices in the cells of B can vary, but the dimensions of each matrix must be consistent with the matrices in A in the corresponding periods. By default, the software uses Mdl.B{end} throughout the forecast horizon.

Note The matrices in B cannot contain NaN values.

Data Types: cell

C — Forecast-horizon, measurement-sensitivity, coefficient matrices
cell vector of matrices

Forecast-horizon, measurement-sensitivity, coefficient matrices, specified as the comma-separated pair consisting of 'C' and a cell vector of matrices.

• C must contain at least numPeriods cells. Each cell must contain a matrix specifying how the states map to the observations in the forecast horizon. If the length of C is greater than numPeriods, then the software uses the first numPeriods cells. The last cell indicates the latest period in the forecast horizon.
• If Mdl is time invariant with respect to the states and the observations, then each cell of C must contain an n-by-m matrix, where n is the number of the in-sample observations per period, and m is the number of in-sample states per period. By default, the software uses Mdl.C throughout the forecast horizon.
• If Mdl is time varying with respect to the states or the observations, then the dimensions of the matrices in the cells of C can vary, but the dimensions of each matrix must be consistent with the matrices in A and D in the corresponding periods. By default, the software uses Mdl.C{end} throughout the forecast horizon.
Note The matrices in C cannot contain NaN values.
Data Types: cell
D — Forecast-horizon, observation-innovation, coefficient matrices
cell vector of matrices
Forecast-horizon, observation-innovation, coefficient matrices, specified as the comma-separated pair consisting of 'D' and a cell vector of matrices.
• D must contain at least numPeriods cells. Each cell must contain a matrix specifying how the observation innovations load onto the observations in the forecast horizon. If the length of D is greater than numPeriods, then the
software uses the first numPeriods cells. The last cell indicates the latest period in the forecast horizon.
• If Mdl is time invariant with respect to the observations and the observation innovations, then each cell of D must contain an n-by-h matrix, where n is the number of in-sample observations per period, and h is the number of in-sample observation innovations per period. By default, the software uses Mdl.D throughout the forecast horizon.
• If Mdl is time varying with respect to the observations or the observation innovations, then the dimensions of the matrices in the cells of D can vary, but the dimensions of each matrix must be consistent with the matrices in C in the corresponding periods. By default, the software uses Mdl.D{end} throughout the forecast horizon.
Note The matrices in D cannot contain NaN values.
Data Types: cell
Beta — Regression coefficients
[] (default) | numeric matrix
Regression coefficients corresponding to predictor variables, specified as the comma-separated pair consisting of 'Beta' and a d-by-n numeric matrix. d is the number of predictor variables (see Predictors0 and PredictorsF) and n is the number of observed response series (see Y0).
• If you specify Beta, then you must also specify Predictors0 and PredictorsF.
• If Mdl is an estimated state-space model, then specify the estimated regression coefficients returned by estimate (for example, in its estParams output).
By default, the software excludes a regression component from the state-space model.
Predictors0 — In-sample, predictor variables in state-space model observation equation
[] (default) | matrix
In-sample, predictor variables in the state-space model observation equation, specified as the comma-separated pair consisting of 'Predictors0' and a matrix. The columns of Predictors0 correspond to individual predictor variables. Predictors0 must have T rows, where row t corresponds to the observed predictors at period t (Z_t). The expanded observation equation is
$$y_t - Z_t\beta = Cx_t + Du_t.$$
In other words, the software deflates the observations using the regression component. β is the time-invariant vector of regression coefficients that the software estimates with all other parameters.
• If there are n observations per period, then the software regresses all predictor series onto each observation.
• If you specify Predictors0, then Mdl must be time invariant. Otherwise, the software returns an error.
• If you specify Predictors0, then you must also specify Beta and PredictorsF.
• If Mdl is an estimated state-space model (that is, returned by estimate), then it is best practice to set Predictors0 to the same predictor data set that you used to fit Mdl.
By default, the software excludes a regression component from the state-space model.
Data Types: double
PredictorsF — Forecast-horizon, predictor variables in state-space model observation equation
[] (default) | numeric matrix
Forecast-horizon, predictor variables in the state-space model observation equation, specified as the comma-separated pair consisting of 'PredictorsF' and a numeric matrix with d columns, where d is the number of predictor variables. Each row corresponds to one period in the forecast horizon, and row t contains the predictors at forecast period t (Z_t). The expanded observation equation is
$$y_t - Z_t\beta = Cx_t + Du_t.$$
In other words, the software deflates the observations using the regression component. β is the time-invariant vector of regression coefficients that you supply in Beta.
• If there are n observations per period, then the software regresses all predictor series onto each observation.
• If you specify PredictorsF, then Mdl must be time invariant. Otherwise, the software returns an error.
• If you specify PredictorsF, then you must also specify Beta and Predictors0.
By default, the software excludes a regression component from the state-space model.
Data Types: double
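The deflation step in the expanded observation equation can be illustrated with base MATLAB alone. The following is a minimal sketch, not the toolbox's implementation: the predictor matrix Z, the coefficient vector beta, and the response y are made-up values chosen only to show the dimensions (Z is T-by-d, beta is d-by-n, and y is T-by-n with n = 1).

```matlab
% Hypothetical data: T = 4 periods, d = 2 predictors (intercept and one series)
Z = [ones(4,1) [0.1; 0.3; -0.2; 0.4]];   % T-by-d predictor matrix
beta = [0.5; -2];                        % d-by-n regression coefficients (n = 1)
y = [1.0; 0.2; 1.1; -0.1];               % T-by-n observed responses

% Deflate the observations: the state-space part then models y - Z*beta
yDeflated = y - Z*beta;
disp(yDeflated')
```

The deflated series, rather than the raw responses, is what the measurement equation Cx_t + Du_t describes when a regression component is present.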
Output Arguments
Y — Forecasted observations
matrix | cell vector of numeric vectors
Forecasted observations on page 11-10, returned as a matrix or a cell vector of numeric vectors.
If Mdl is a time-invariant, state-space model with respect to the observations, then Y is a numPeriods-by-n matrix.
If Mdl is a time-varying, state-space model with respect to the observations, then Y is a numPeriods-by-1 cell vector of numeric vectors. Cell t of Y contains an n_t-by-1 numeric vector of forecasted observations for period t.
YMSE — Error variances of forecasted observations
matrix | cell vector of numeric vectors
Error variances of forecasted observations, returned as a matrix or a cell vector of numeric vectors.
If Mdl is a time-invariant, state-space model with respect to the observations, then YMSE is a numPeriods-by-n matrix.
If Mdl is a time-varying, state-space model with respect to the observations, then YMSE is a numPeriods-by-1 cell vector of numeric vectors. Cell t of YMSE contains an n_t-by-1 numeric vector of error variances for the corresponding forecasted observations for period t.
X — State forecasts
matrix | cell vector of numeric vectors
State forecasts on page 11-8, returned as a matrix or a cell vector of numeric vectors.
If Mdl is a time-invariant, state-space model with respect to the states, then X is a numPeriods-by-m matrix.
If Mdl is a time-varying, state-space model with respect to the states, then X is a numPeriods-by-1 cell vector of numeric vectors. Cell t of X contains an m_t-by-1 numeric vector of forecasted states for period t.
XMSE — Error variances of state forecasts
matrix | cell vector of numeric vectors
Error variances of state forecasts, returned as a matrix or a cell vector of numeric vectors.
If Mdl is a time-invariant, state-space model with respect to the states, then XMSE is a numPeriods-by-m matrix.
If Mdl is a time-varying, state-space model with respect to the states, then XMSE is a numPeriods-by-1 cell vector of numeric vectors. Cell t of XMSE contains an m_t-by-1 numeric vector of error variances for the corresponding forecasted states for period t.
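To make the relationship among the four outputs concrete, the following base-MATLAB sketch applies the textbook Kalman prediction recursion to a time-invariant model. It is an illustration under stated assumptions, not the toolbox's code: the end-of-sample filtered state xT and its variance PT are hypothetical values, and the coefficient values are borrowed from the AR(1)-plus-noise example in this section.

```matlab
% Time-invariant coefficients (values from the AR(1)-plus-noise example)
A = 0.5; B = 1; C = 1; D = 0.75;
numPeriods = 10;

% Hypothetical filtered state mean and variance at the last in-sample period
xT = 0.8;
PT = 0.4;

m = size(A,1);                 % number of states
n = size(C,1);                 % number of observations
X = zeros(numPeriods,m);  XMSE = zeros(numPeriods,m);
Y = zeros(numPeriods,n);  YMSE = zeros(numPeriods,n);

x = xT;  P = PT;
for h = 1:numPeriods
    x = A*x;                   % propagate the state mean one period
    P = A*P*A' + B*B';         % propagate the state error covariance
    S = C*P*C' + D*D';         % observation forecast error covariance
    X(h,:) = x';     XMSE(h,:) = diag(P)';
    Y(h,:) = (C*x)'; YMSE(h,:) = diag(S)';
end
```

In this sketch each output has numPeriods rows, matching the time-invariant shapes described above, and the state error variance settles toward the stationary variance of the state as the horizon grows.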
Examples
Forecast Observations of Time-Invariant State-Space Model
Suppose that a latent process is an AR(1). The state equation is
$$x_t = 0.5x_{t-1} + u_t,$$
where u_t is Gaussian with mean 0 and standard deviation 1.
Generate a random series of 100 observations from x_t, assuming that the series starts at 1.5.
    T = 100;
    ARMdl = arima('AR',0.5,'Constant',0,'Variance',1);
    x0 = 1.5;
    rng(1); % For reproducibility
    x = simulate(ARMdl,T,'Y0',x0);
Suppose further that the latent process is subject to additive measurement error. The observation equation is
$$y_t = x_t + \varepsilon_t,$$
where ε_t is Gaussian with mean 0 and standard deviation 0.75. Together, the latent process and observation equations compose a state-space model.
Use the random latent state process (x) and the observation equation to generate observations.
    y = x + 0.75*randn(T,1);
Specify the four coefficient matrices.
    A = 0.5;
    B = 1;
    C = 1;
    D = 0.75;
Specify the state-space model using the coefficient matrices.
    Mdl = ssm(A,B,C,D)

    Mdl =
    State-space model type: ssm

    State vector length: 1
    Observation vector length: 1
    State disturbance vector length: 1
    Observation innovation vector length: 1
    Sample size supported by model: Unlimited

    State variables: x1, x2,...
    State disturbances: u1, u2,...
    Observation series: y1, y2,...
    Observation innovations: e1, e2,...

    State equation:
    x1(t) = (0.50)x1(t-1) + u1(t)

    Observation equation:
    y1(t) = x1(t) + (0.75)e1(t)

    Initial state distribution:

    Initial state means
     x1
      0

    Initial state covariance matrix
         x1
     x1  1.33

    State types
         x1
     Stationary
Mdl is an ssm model. Verify that the model is correctly specified using the display in the Command Window. The software infers that the state process is stationary. Subsequently, the software sets the initial state mean and covariance to the mean and variance of the stationary distribution of an AR(1) model. Forecast the observations 10 periods into the future, and estimate their variances. numPeriods = 10; [ForecastedY,YMSE] = forecast(Mdl,numPeriods,y);
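The initial state moments in the model display (mean 0 and variance 1.33) can be checked by hand. For a stationary AR(1) state with coefficient φ and disturbance variance σ_u², the stationary mean is 0 (there is no constant) and the stationary variance is σ_u²/(1 − φ²). The short sketch below, which uses only base MATLAB and illustrative variable names, reproduces that arithmetic.

```matlab
% Stationary moments of the AR(1) state x(t) = phi*x(t-1) + sigmaU*u(t)
phi = 0.5;        % AR(1) coefficient (A in the example)
sigmaU = 1;       % state-disturbance loading (B in the example)

stateMean = 0;                        % no constant in the state equation
stateVar = sigmaU^2/(1 - phi^2);      % 1/(1 - 0.25) = 4/3

fprintf('Initial state mean: %g, variance: %.2f\n',stateMean,stateVar)
```

The printed variance, rounded to two decimals, agrees with the 1.33 in the initial state covariance matrix.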
Plot the forecasts with the in-sample responses, and 95% Wald-type forecast intervals.
    ForecastIntervals(:,1) = ForecastedY - 1.96*sqrt(YMSE);
    ForecastIntervals(:,2) = ForecastedY + 1.96*sqrt(YMSE);
    figure
    plot(T-20:T,y(T-20:T),'-k',T+1:T+numPeriods,ForecastedY,'-.r',...
        T+1:T+numPeriods,ForecastIntervals,'-.b',...
        T:T+1,[y(end)*ones(3,1),[ForecastedY(1);ForecastIntervals(1,:)']],':k',...
        'LineWidth',2)
    hold on
    title({'Observed Responses and Their Forecasts'})
    xlabel('Period')
    ylabel('Responses')
    legend({'Observations','Forecasted observations','95% forecast intervals'},...
        'Location','Best')
    hold off
Forecast Observations of State-Space Model Containing Regression Component Suppose that the linear relationship between the change in the unemployment rate and the nominal gross national product (nGNP) growth rate is of interest. Suppose further that the first difference of the unemployment rate is an ARMA(1,1) series. Symbolically, and in state-space form, the model is
$$\begin{bmatrix} x_{1,t} \\ x_{2,t} \end{bmatrix} = \begin{bmatrix} \phi & \theta \\ 0 & 0 \end{bmatrix} \begin{bmatrix} x_{1,t-1} \\ x_{2,t-1} \end{bmatrix} + \begin{bmatrix} 1 \\ 1 \end{bmatrix} u_{1,t}$$
$$y_t - \beta Z_t = x_{1,t} + \sigma\varepsilon_t,$$
where:
• x_{1,t} is the change in the unemployment rate at time t.
• x_{2,t} is a dummy state for the MA(1) effect.
• y_t is the observed change in the unemployment rate being deflated by the growth rate of nGNP (Z_t).
• u_{1,t} is the Gaussian series of state disturbances having mean 0 and standard deviation 1.
• ε_t is the Gaussian series of observation innovations having mean 0 and standard deviation σ.
Load the Nelson-Plosser data set, which contains the unemployment rate and nGNP series, among other things.
    load Data_NelsonPlosser
Preprocess the data by taking the natural logarithm of the nGNP series, and the first difference of each series. Also, remove the starting NaN values from each series.
    isNaN = any(ismissing(DataTable),2);       % Flag periods containing NaNs
    gnpn = DataTable.GNPN(~isNaN);
    u = DataTable.UR(~isNaN);
    T = size(gnpn,1);                          % Sample size
    Z = [ones(T-1,1) diff(log(gnpn))];
    y = diff(u);
Though this example removes missing values, the software can accommodate series containing missing values in the Kalman filter framework.
To determine how well the model forecasts observations, remove the last 10 observations for comparison.
    numPeriods = 10;                   % Forecast horizon
    isY = y(1:end-numPeriods);         % In-sample observations
    oosY = y(end-numPeriods+1:end);    % Out-of-sample observations
    ISZ = Z(1:end-numPeriods,:);       % In-sample predictors
    OOSZ = Z(end-numPeriods+1:end,:);  % Out-of-sample predictors
Specify the coefficient matrices.
    A = [NaN NaN; 0 0];
    B = [1; 1];
    C = [1 0];
    D = NaN;
Specify the state-space model using ssm. Mdl = ssm(A,B,C,D);
Estimate the model parameters, and use a random set of initial parameter values for optimization. Specify the regression component and its initial value for optimization using the 'Predictors' and 'Beta0' name-value pair arguments, respectively. Restrict the estimate of σ to all positive, real
numbers. For numerical stability, specify the Hessian when the software computes the parameter covariance matrix, using the 'CovMethod' name-value pair argument.
    params0 = [0.3 0.2 0.1]; % Chosen arbitrarily
    [EstMdl,estParams] = estimate(Mdl,isY,params0,'Predictors',ISZ,...
        'Beta0',[0.1 0.2],'lb',[-Inf,-Inf,0,-Inf,-Inf],'CovMethod','hessian');

    Method: Maximum likelihood (fmincon)
    Sample size: 51
    Logarithmic likelihood:     -87.2409
    Akaike   info criterion:     184.482
    Bayesian info criterion:     194.141
          |    Coeff     Std Err    t Stat     Prob
    ----------------------------------------------------------
     c(1) | -0.31780    0.19429   -1.63572   0.10190
     c(2) |  1.21242    0.48882    2.48032   0.01313
     c(3) |  0.45583    0.63930    0.71302   0.47584

y 2 or DoF = NaN.
• DoF is estimable.
• If you specify "t", DoF is NaN by default. You can change its value by using dot notation after you create the model. For example, Mdl.Distribution.DoF = 3.
• If you supply a structure array to specify the Student's t distribution, then you must specify both the 'Name' and 'DoF' fields.
Example: struct('Name',"t",'DoF',10)
Description — Model description
string scalar | character vector
Model description, specified as a string scalar or character vector. garch stores the value as a string scalar. The default value describes the parametric form of the model, for example "GARCH(1,1) Conditional Variance Model (Gaussian Distribution)".
Example: 'Description','Model 1'
Data Types: string | char
Note All NaN-valued model parameters, which include coefficients and the t-innovation-distribution degrees of freedom (if present), are estimable. When you pass the resulting garch object and data to estimate, MATLAB estimates all NaN-valued parameters. During estimation, estimate treats known parameters as equality constraints; that is, estimate holds any known parameters fixed at their values.
Object Functions
    estimate     Fit conditional variance model to data
    filter       Filter disturbances through conditional variance model
    forecast     Forecast conditional variances from conditional variance models
    infer        Infer conditional variances of conditional variance models
    simulate     Monte Carlo simulation of conditional variance models
    summarize    Display estimation results of conditional variance model
Examples
Create Default GARCH Model
Create a default garch model object and specify its parameter values using dot notation.
Create a GARCH(0,0) model.
    Mdl = garch

    Mdl =
      garch with properties:

         Description: "GARCH(0,0) Conditional Variance Model (Gaussian Distribution)"
        Distribution: Name = "Gaussian"
                   P: 0
                   Q: 0
            Constant: NaN
               GARCH: {}
                ARCH: {}
              Offset: 0
Mdl is a garch model. It contains an unknown constant, its offset is 0, and the innovation distribution is 'Gaussian'. The model does not have a GARCH or ARCH polynomial.
Specify two unknown ARCH coefficients for lags one and two using dot notation.
    Mdl.ARCH = {NaN NaN}

    Mdl =
      garch with properties:

         Description: "GARCH(0,2) Conditional Variance Model (Gaussian Distribution)"
        Distribution: Name = "Gaussian"
                   P: 0
                   Q: 2
            Constant: NaN
               GARCH: {}
                ARCH: {NaN NaN} at lags [1 2]
              Offset: 0
The Q and ARCH properties are updated to 2 and {NaN NaN}. The two ARCH coefficients are associated with lags 1 and 2.
Create GARCH Model Using Shorthand Syntax
Create a garch model using the shorthand notation garch(P,Q), where P is the degree of the GARCH polynomial and Q is the degree of the ARCH polynomial.
Create a GARCH(3,2) model.
    Mdl = garch(3,2)

    Mdl =
      garch with properties:

         Description: "GARCH(3,2) Conditional Variance Model (Gaussian Distribution)"
        Distribution: Name = "Gaussian"
                   P: 3
                   Q: 2
            Constant: NaN
               GARCH: {NaN NaN NaN} at lags [1 2 3]
                ARCH: {NaN NaN} at lags [1 2]
              Offset: 0
Mdl is a garch model object. The constant and all GARCH and ARCH coefficients of Mdl are NaN values; the offset is 0. By default, the software:
• Includes a conditional variance model constant
• Excludes a conditional mean model offset (i.e., the offset is 0)
• Includes all lag terms in the ARCH and GARCH lag-operator polynomials up to lags Q and P, respectively
Mdl specifies only the functional form of a GARCH model. Because it contains unknown parameter values, you can pass Mdl and the time-series data to estimate to estimate the parameters.
Create GARCH Model Using Longhand Syntax
Create a garch model using name-value pair arguments.
Specify a GARCH(1,1) model. By default, the conditional mean model offset is zero. Specify that the offset is NaN.
    Mdl = garch('GARCHLags',1,'ARCHLags',1,'Offset',NaN)

    Mdl =
      garch with properties:

         Description: "GARCH(1,1) Conditional Variance Model with Offset (Gaussian Distribution)"
        Distribution: Name = "Gaussian"
                   P: 1
                   Q: 1
            Constant: NaN
               GARCH: {NaN} at lag [1]
                ARCH: {NaN} at lag [1]
              Offset: NaN
Mdl is a garch model object. The software sets all parameters (the properties of the model object) to NaN, except P, Q, and Distribution. Because Mdl contains NaN values, Mdl is appropriate for estimation only. Pass Mdl and time-series data to estimate.
Create GARCH Model with Known Coefficients
Create a GARCH(1,1) model with mean offset,
$$y_t = 0.5 + \varepsilon_t,$$
where $\varepsilon_t = \sigma_t z_t$,
$$\sigma_t^2 = 0.0001 + 0.75\sigma_{t-1}^2 + 0.1\varepsilon_{t-1}^2,$$
and z_t is an independent and identically distributed standard Gaussian process.
    Mdl = garch('Constant',0.0001,'GARCH',0.75,...
        'ARCH',0.1,'Offset',0.5)

    Mdl =
      garch with properties:

         Description: "GARCH(1,1) Conditional Variance Model with Offset (Gaussian Distribution)"
        Distribution: Name = "Gaussian"
                   P: 1
                   Q: 1
            Constant: 0.0001
               GARCH: {0.75} at lag [1]
                ARCH: {0.1} at lag [1]
              Offset: 0.5
garch assigns default values to any properties you do not specify with name-value pair arguments.
Access GARCH Model Properties
Access the properties of a garch model object using dot notation.
Create a garch model object.
    Mdl = garch(3,2)

    Mdl =
      garch with properties:

         Description: "GARCH(3,2) Conditional Variance Model (Gaussian Distribution)"
        Distribution: Name = "Gaussian"
                   P: 3
                   Q: 2
            Constant: NaN
               GARCH: {NaN NaN NaN} at lags [1 2 3]
                ARCH: {NaN NaN} at lags [1 2]
              Offset: 0
Remove the second GARCH term from the model. That is, specify that the GARCH coefficient of the second lagged conditional variance is 0.
    Mdl.GARCH{2} = 0

    Mdl =
      garch with properties:

         Description: "GARCH(3,2) Conditional Variance Model (Gaussian Distribution)"
        Distribution: Name = "Gaussian"
                   P: 3
                   Q: 2
            Constant: NaN
               GARCH: {NaN NaN} at lags [1 3]
                ARCH: {NaN NaN} at lags [1 2]
              Offset: 0
The GARCH polynomial has two unknown parameters corresponding to lags 1 and 3.
Display the distribution of the disturbances.
    Mdl.Distribution

    ans = struct with fields:
        Name: "Gaussian"
The disturbances are Gaussian with mean 0 and variance 1.
Specify that the underlying i.i.d. disturbances have a t distribution with five degrees of freedom.
    Mdl.Distribution = struct('Name','t','DoF',5)

    Mdl =
      garch with properties:

         Description: "GARCH(3,2) Conditional Variance Model (t Distribution)"
        Distribution: Name = "t", DoF = 5
                   P: 3
                   Q: 2
            Constant: NaN
               GARCH: {NaN NaN} at lags [1 3]
                ARCH: {NaN NaN} at lags [1 2]
              Offset: 0
Specify that the ARCH coefficients are 0.2 for the first lag and 0.1 for the second lag.
    Mdl.ARCH = {0.2 0.1}

    Mdl =
      garch with properties:

         Description: "GARCH(3,2) Conditional Variance Model (t Distribution)"
        Distribution: Name = "t", DoF = 5
                   P: 3
                   Q: 2
            Constant: NaN
               GARCH: {NaN NaN} at lags [1 3]
                ARCH: {0.2 0.1} at lags [1 2]
              Offset: 0
To estimate the remaining parameters, you can pass Mdl and your data to estimate and use the specified parameters as equality constraints. Or, you can specify the rest of the parameter values, and then simulate or forecast conditional variances from the GARCH model by passing the fully specified model to simulate or forecast, respectively.
Estimate GARCH Model
Fit a GARCH model to an annual time series of Danish nominal stock returns from 1922-1999.
Load the Data_Danish data set. Plot the nominal returns (nr).
    load Data_Danish;
    nr = DataTable.RN;
    figure;
    plot(dates,nr);
    hold on;
    plot([dates(1) dates(end)],[0 0],'r:'); % Plot y = 0
    hold off;
    title('Danish Nominal Stock Returns');
    ylabel('Nominal return (%)');
    xlabel('Year');
The nominal return series seems to have a nonzero conditional mean offset and seems to exhibit volatility clustering. That is, the variability is smaller for earlier years than it is for later years. For this example, assume that a GARCH(1,1) model is appropriate for this series. Create a GARCH(1,1) model. The conditional mean offset is zero by default. To estimate the offset, specify that it is NaN. Mdl = garch('GARCHLags',1,'ARCHLags',1,'Offset',NaN);
Fit the GARCH(1,1) model to the data.
    EstMdl = estimate(Mdl,nr);

    GARCH(1,1) Conditional Variance Model with Offset (Gaussian Distribution):

                      Value       StandardError    TStatistic     PValue
                    _________    _____________    __________    _________
        Constant    0.0044476      0.007814        0.56918       0.56923
        GARCH{1}      0.84932       0.26495         3.2056     0.0013477
        ARCH{1}       0.07325       0.14953        0.48986       0.62423
        Offset        0.11227      0.039214         2.8629     0.0041974
EstMdl is a fully specified garch model object. That is, it does not contain NaN values. You can assess the adequacy of the model by generating residuals using infer, and then analyzing them.
To simulate conditional variances or responses, pass EstMdl to simulate.
To forecast conditional variances, pass EstMdl to forecast.
Simulate GARCH Model Observations and Conditional Variances Simulate conditional variance or response paths from a fully specified garch model object. That is, simulate from an estimated garch model or a known garch model in which you specify all parameter values. Load the Data_Danish data set. load Data_Danish; nr = DataTable.RN;
Create a GARCH(1,1) model with an unknown conditional mean offset. Fit the model to the annual nominal return series.
    Mdl = garch('GARCHLags',1,'ARCHLags',1,'Offset',NaN);
    EstMdl = estimate(Mdl,nr);

    GARCH(1,1) Conditional Variance Model with Offset (Gaussian Distribution):

                      Value       StandardError    TStatistic     PValue
                    _________    _____________    __________    _________
        Constant    0.0044476      0.007814        0.56918       0.56923
        GARCH{1}      0.84932       0.26495         3.2056     0.0013477
        ARCH{1}       0.07325       0.14953        0.48986       0.62423
        Offset        0.11227      0.039214         2.8629     0.0041974
Simulate 100 paths of conditional variances and responses for each period from the estimated GARCH model.
    numObs = numel(nr); % Sample size (T)
    numPaths = 100;     % Number of paths to simulate
    rng(1);             % For reproducibility
    [VSim,YSim] = simulate(EstMdl,numObs,'NumPaths',numPaths);
VSim and YSim are T-by-numPaths matrices. Rows correspond to a sample period, and columns correspond to a simulated path.
Plot the average and the 97.5% and 2.5% percentiles of the simulated paths. Compare the simulation statistics to the original data.
    VSimBar = mean(VSim,2);
    VSimCI = quantile(VSim,[0.025 0.975],2);
    YSimBar = mean(YSim,2);
    YSimCI = quantile(YSim,[0.025 0.975],2);
    figure;
    subplot(2,1,1);
    h1 = plot(dates,VSim,'Color',0.8*ones(1,3));
    hold on;
    h2 = plot(dates,VSimBar,'k--','LineWidth',2);
    h3 = plot(dates,VSimCI,'r--','LineWidth',2);
    hold off;
    title('Simulated Conditional Variances');
    ylabel('Cond. var.');
    xlabel('Year');
    subplot(2,1,2);
    h1 = plot(dates,YSim,'Color',0.8*ones(1,3));
    hold on;
    h2 = plot(dates,YSimBar,'k--','LineWidth',2);
    h3 = plot(dates,YSimCI,'r--','LineWidth',2);
    hold off;
    title('Simulated Nominal Returns');
    ylabel('Nominal return (%)');
    xlabel('Year');
    legend([h1(1) h2 h3(1)],{'Simulated path' 'Mean' 'Confidence bounds'},...
        'FontSize',7,'Location','NorthWest');
Forecast GARCH Model Conditional Variances Forecast conditional variances from a fully specified garch model object. That is, forecast from an estimated garch model or a known garch model in which you specify all parameter values. The example follows from “Estimate GARCH Model” on page 12-1202. Load the Data_Danish data set. load Data_Danish; nr = DataTable.RN;
Create a GARCH(1,1) model with an unknown conditional mean offset, and fit the model to the annual, nominal return series.
    Mdl = garch('GARCHLags',1,'ARCHLags',1,'Offset',NaN);
    EstMdl = estimate(Mdl,nr);

    GARCH(1,1) Conditional Variance Model with Offset (Gaussian Distribution):

                      Value       StandardError    TStatistic     PValue
                    _________    _____________    __________    _________
        Constant    0.0044476      0.007814        0.56918       0.56923
        GARCH{1}      0.84932       0.26495         3.2056     0.0013477
        ARCH{1}       0.07325       0.14953        0.48986       0.62423
        Offset        0.11227      0.039214         2.8629     0.0041974
Forecast the conditional variance of the nominal return series 10 years into the future using the estimated GARCH model. Specify the entire returns series as presample observations. The software infers presample conditional variances using the presample observations and the model. numPeriods = 10; vF = forecast(EstMdl,numPeriods,nr);
Plot the forecasted conditional variances of the nominal returns. Compare the forecasts to the conditional variances inferred over the estimation sample.
    v = infer(EstMdl,nr);
    figure;
    plot(dates,v,'k:','LineWidth',2);
    hold on;
    plot(dates(end):dates(end) + 10,[v(end);vF],'r','LineWidth',2);
    title('Forecasted Conditional Variances of Nominal Returns');
    ylabel('Conditional variances');
    xlabel('Year');
    legend({'Estimation sample cond. var.','Forecasted cond. var.'},...
        'Location','Best');
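The shape of a GARCH(1,1) variance forecast can be reproduced by iterating the variance equation by hand. The sketch below is a base-MATLAB illustration of the standard multi-step recursion, not the forecast function itself. The coefficients are the estimates reported in the table above; the last in-sample conditional variance vT and innovation eT are hypothetical placeholders, because the actual values depend on the data.

```matlab
% Estimated GARCH(1,1) coefficients (from the estimation table)
kappa  = 0.0044476;    % conditional variance constant
gamma1 = 0.84932;      % GARCH{1} coefficient
alpha1 = 0.07325;      % ARCH{1} coefficient

% Hypothetical end-of-sample quantities (placeholders, not from the data)
vT = 0.05;             % last in-sample conditional variance
eT = 0.2;              % last in-sample innovation (return minus offset)

numPeriods = 10;
vF = zeros(numPeriods,1);

% One step ahead uses the observed squared innovation
vF(1) = kappa + gamma1*vT + alpha1*eT^2;

% Beyond one step, the expected squared innovation equals the
% forecasted variance, so the two coefficients combine
for h = 2:numPeriods
    vF(h) = kappa + (gamma1 + alpha1)*vF(h-1);
end

% Long-run (unconditional) variance that the forecasts approach
vInf = kappa/(1 - gamma1 - alpha1);
```

With these placeholder starting values the forecasts rise monotonically toward vInf, because the first forecast lies below the long-run variance and the persistence gamma1 + alpha1 is less than 1.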
More About
GARCH Model
A GARCH model is a dynamic model that addresses conditional heteroscedasticity, or volatility clustering, in an innovations process. Volatility clustering occurs when an innovations process does not exhibit significant autocorrelation, but the variance of the process changes with time.
A GARCH model posits that the current conditional variance is the sum of these linear processes, with coefficients for each term:
• Past conditional variances (the GARCH component or polynomial)
• Past squared innovations (the ARCH component or polynomial)
• Constant offsets for the innovation mean and conditional variance models
Consider the time series
$$y_t = \mu + \varepsilon_t,$$
where $\varepsilon_t = \sigma_t z_t$. The GARCH(P,Q) conditional variance process, $\sigma_t^2$, has the form
$$\sigma_t^2 = \kappa + \gamma_1\sigma_{t-1}^2 + \ldots + \gamma_P\sigma_{t-P}^2 + \alpha_1\varepsilon_{t-1}^2 + \ldots + \alpha_Q\varepsilon_{t-Q}^2.$$
In lag operator notation, the model is
$$\left(1 - \gamma_1 L - \ldots - \gamma_P L^P\right)\sigma_t^2 = \kappa + \left(\alpha_1 L + \ldots + \alpha_Q L^Q\right)\varepsilon_t^2.$$
The table shows how the variables correspond to the properties of the garch model object.

    Variable    Description                                                          Property
    μ           Innovation mean model constant offset                                'Offset'
    κ > 0       Conditional variance model constant                                  'Constant'
    γ_i ≥ 0     GARCH component coefficients                                         'GARCH'
    α_j ≥ 0     ARCH component coefficients                                          'ARCH'
    z_t         Series of independent random variables with mean 0 and variance 1    'Distribution'
For stationarity and positivity, GARCH models use these constraints:
• $\kappa > 0$
• $\gamma_i \ge 0$, $\alpha_j \ge 0$
• $\sum_{i=1}^{P}\gamma_i + \sum_{j=1}^{Q}\alpha_j < 1$
Engle’s original ARCH(Q) model is equivalent to a GARCH(0,Q) specification. GARCH models are appropriate when positive and negative shocks of equal magnitude contribute equally to volatility [1].
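The variance recursion and its constraints can be exercised without the toolbox. The following base-MATLAB sketch simulates one path from the GARCH(1,1) model with known coefficients used earlier in the examples (κ = 0.0001, γ₁ = 0.75, α₁ = 0.1, μ = 0.5). Starting the recursion at the unconditional variance κ/(1 − γ₁ − α₁) is an assumption of this sketch; the simulate function may initialize presample values differently.

```matlab
% Known GARCH(1,1) coefficients (from the known-coefficients example)
kappa = 0.0001;  gamma1 = 0.75;  alpha1 = 0.1;  mu = 0.5;

% Check the positivity and stationarity constraints
isValid = kappa > 0 && gamma1 >= 0 && alpha1 >= 0 && (gamma1 + alpha1) < 1;

% Unconditional variance implied by the stationarity constraint
uncondVar = kappa/(1 - gamma1 - alpha1);

rng(1);                      % For reproducibility
T = 500;
z = randn(T,1);              % i.i.d. standard Gaussian disturbances
sigma2 = zeros(T,1);         % conditional variances
e = zeros(T,1);              % innovations

sigma2(1) = uncondVar;       % assumed starting value for this sketch
e(1) = sqrt(sigma2(1))*z(1);
for t = 2:T
    sigma2(t) = kappa + gamma1*sigma2(t-1) + alpha1*e(t-1)^2;
    e(t) = sqrt(sigma2(t))*z(t);
end
y = mu + e;                  % responses with conditional mean offset
```

Because κ is positive and the coefficients are nonnegative, every simulated conditional variance stays strictly positive, which is the purpose of the positivity constraints.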
Tips You can specify a garch model as part of a composition of conditional mean and variance models. For details, see arima.
Version History Introduced in R2012a
References [1] Tsay, R. S. Analysis of Financial Time Series. 3rd ed. Hoboken, NJ: John Wiley & Sons, Inc., 2010.
See Also
Objects
egarch | gjr | arima
Topics
“Specify GARCH Models” on page 8-6
“Modify Properties of Conditional Variance Models” on page 8-39
“Specify Conditional Mean and Variance Models” on page 7-81
“Infer Conditional Variances and Residuals” on page 8-62
“Compare Conditional Variance Models Using Information Criteria” on page 8-69
“Simulate GARCH Models” on page 8-76
“Forecast a Conditional Variance Model” on page 8-97
“Conditional Variance Models” on page 8-2
“GARCH Model” on page 8-3
gctest Block-wise Granger causality and block exogeneity tests
Syntax
    h = gctest(Y1,Y2)
    h = gctest(Y1,Y2,Y3)
    h = gctest( ___ ,Name,Value)
    [h,pvalue,stat,cvalue] = gctest( ___ )
Description The gctest function conducts a block-wise Granger causality test on page 12-1220 by accepting sets of time series data representing the "cause" and "effect" multivariate response variables in the test. gctest supports the inclusion of optional endogenous conditioning variables in the model for the test. To conduct the leave-one-out, exclude-all, and block-wise Granger causality tests on the response variables of a fully specified VAR model (represented by a varm model object), see gctest. h = gctest(Y1,Y2) returns the test decision h from conducting a block-wise Granger causality test on page 12-1220 for assessing whether a set of time series variables Y1 Granger-causes a distinct set of the time series variables Y2. The gctest function conducts tests in the vector autoregression on page 12-1221 (VAR) framework and treats Y1 and Y2 as response (endogenous) variables during testing. h = gctest(Y1,Y2,Y3) conducts a 1-step Granger causality test for Y1 and Y2, conditioned on a distinct set of time series variables Y3. The variable Y3 is endogenous in the underlying VAR model, but gctest does not consider it a "cause" or an "effect" in the test. h = gctest( ___ ,Name,Value) specifies options using one or more name-value arguments in addition to the input argument combinations in previous syntaxes. For example, Test="f",NumLags=2 specifies conducting an F test that compares the residual sum of squares between restricted and unrestricted VAR(2) models for all response variables. [h,pvalue,stat,cvalue] = gctest( ___ ) additionally returns the p-value pvalue, test statistic stat, and critical value cvalue for the test.
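The idea of comparing restricted and unrestricted models can be sketched with ordinary least squares in base MATLAB. The example below is a textbook single-equation F statistic for a bivariate VAR(1) on simulated data; it is an illustration only, and the statistic, degrees of freedom, and estimation details that gctest uses may differ. All variable names and the data-generating coefficients are made up for the sketch.

```matlab
% Simulate two series in which y1 helps predict y2 (hypothetical DGP)
rng(1);                          % For reproducibility
T = 200;
y1 = zeros(T,1);  y2 = zeros(T,1);
for t = 2:T
    y1(t) = 0.5*y1(t-1) + randn;
    y2(t) = 0.3*y2(t-1) + 0.4*y1(t-1) + randn;
end

% "Effect" equation of the VAR(1): regress y2(t) on its own lag and y1's lag
yEff = y2(2:T);
XU = [ones(T-1,1) y2(1:T-1) y1(1:T-1)];   % unrestricted regressors
XR = XU(:,1:2);                           % restricted: drop the lag of y1

% Residual sums of squares from least-squares fits
rssU = sum((yEff - XU*(XU\yEff)).^2);
rssR = sum((yEff - XR*(XR\yEff)).^2);

% F statistic for the single exclusion restriction
q = 1;                                    % number of restrictions
dfU = numel(yEff) - size(XU,2);           % residual degrees of freedom
Fstat = ((rssR - rssU)/q)/(rssU/dfU);
```

A large Fstat relative to the F(q,dfU) critical value is evidence against the null hypothesis that y1 does not Granger-cause y2 in this simplified setting.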
Examples Conduct Granger Causality Test Conduct a Granger causality test to assess whether the M1 money supply has an impact on the predictive distribution of the consumer price index (CPI). Load the US macroeconomic data set Data_USEconModel.mat. load Data_USEconModel
The data set includes the MATLAB® timetable DataTimeTable, which contains 14 variables measured from Q1 1947 through Q1 2009. M1SL is the table variable containing the M1 money supply, and CPIAUCSL is the table variable containing the CPI. For more details, enter Description at the command line.
Visually assess whether the series are stationary by plotting them in the same figure.
    figure;
    yyaxis left
    plot(DataTimeTable.Time,DataTimeTable.CPIAUCSL)
    ylabel("CPI");
    yyaxis right
    plot(DataTimeTable.Time,DataTimeTable.M1SL);
    ylabel("Money Supply");
Both series are nonstationary. Stabilize the series by converting them to rates. m1slrate = price2ret(DataTimeTable.M1SL); inflation = price2ret(DataTimeTable.CPIAUCSL);
Assume that a VAR(1) model is an appropriate multivariate model for the rates. Conduct a default Granger causality test to assess whether the M1 money supply rate Granger-causes the inflation rate. h = gctest(m1slrate,inflation)
h = logical 1
The test decision h is 1, which indicates the rejection of the null hypothesis that the M1 money supply rate does not Granger-cause inflation.
Test Series for Feedback Time series undergo feedback when they Granger-cause each other. Assess whether the US inflation and M1 money supply rates undergo feedback. Load the US macroeconomic data set Data_USEconModel.mat. Convert the price series to returns. load Data_USEconModel inflation = price2ret(DataTimeTable.CPIAUCSL); m1slrate = price2ret(DataTimeTable.M1SL);
Conduct a Granger causality test to assess whether the inflation rate Granger-causes the M1 money supply rate. Assume that an underlying VAR(1) model is appropriate for the two series. The default level of significance α for the test is 0.05. Because this example conducts two tests, decrease α by half for each test to achieve a family-wise level of significance of 0.05. hIRgcM1 = gctest(inflation,m1slrate,Alpha=0.025) hIRgcM1 = logical 1
The test decision hIRgcM1 = 1 indicates rejection of the null hypothesis of noncausality. There is enough evidence to suggest that the inflation rate Granger-causes the M1 money supply rate at 0.025 level of significance. Conduct another Granger causality test to assess whether the M1 money supply rate Granger-causes the inflation rate. hM1gcIR = gctest(m1slrate,inflation,Alpha=0.025) hM1gcIR = logical 0
The test decision hM1gcIR = 0 indicates that the null hypothesis of noncausality should not be rejected. There is not enough evidence to suggest that the M1 money supply rate Granger-causes the inflation rate at the 0.025 level of significance. Because not enough evidence exists to suggest that the M1 money supply rate Granger-causes the inflation rate, the tests do not support feedback between the two series.
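The reasoning in this example can be sketched in a few lines of base MATLAB. The sketch splits the family-wise level across the two tests and combines the two reported decisions; the variable names familyAlpha, numTests, alphaPerTest, and feedback are introduced here for illustration only.

```matlab
% Split a family-wise significance level evenly across two tests.
familyAlpha = 0.05;
numTests = 2;
alphaPerTest = familyAlpha/numTests;   % level used for each gctest call

% Test decisions reported in this example.
hIRgcM1 = true;    % inflation rate -> M1 money supply rate
hM1gcIR = false;   % M1 money supply rate -> inflation rate

% Feedback requires rejection of noncausality in both directions.
feedback = hIRgcM1 && hM1gcIR;
```

Because only one of the two decisions is a rejection, feedback is false.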
Conduct 1-Step Granger Causality Test Conditioned on Variable
Assess whether the US gross domestic product (GDP) Granger-causes CPI conditioned on the M1 money supply.
Load the US macroeconomic data set Data_USEconModel.mat.
    load Data_USEconModel
The variables GDP and GDPDEF of DataTimeTable are the US GDP and its deflator with respect to year 2000 dollars, respectively. Both series are nonstationary.
Convert the M1 money supply and CPI to rates. Convert the US GDP to the real GDP rate.
    m1slrate = price2ret(DataTimeTable.M1SL);
    inflation = price2ret(DataTimeTable.CPIAUCSL);
    rgdprate = price2ret(DataTimeTable.GDP./DataTimeTable.GDPDEF);
Assume that a VAR(1) model is an appropriate multivariate model for the rates. Conduct a Granger causality test to assess whether the real GDP rate has an impact on the predictive distribution of the inflation rate, conditioned on the M1 money supply. The inclusion of a conditional variable forces gctest to conduct a 1-step Granger causality test.
    h = gctest(rgdprate,inflation,m1slrate)
    h = logical
       0
The test decision h is 0, which indicates failure to reject the null hypothesis that the real GDP rate is not a 1-step Granger-cause of inflation when you account for the M1 money supply rate. gctest includes the M1 money supply rate as a response variable in the underlying VAR(1) model, but it does not include the M1 money supply in the computation of the test statistics.
Conduct the test again, but without conditioning on the M1 money supply rate.
    h = gctest(rgdprate,inflation)
    h = logical
       0
The test result is the same as before, suggesting that the real GDP rate does not Granger-cause inflation at any period in a forecast horizon, regardless of whether you account for the M1 money supply rate in the underlying VAR(1) model.
Adjust Number of Lags for Underlying VAR Model
By default, gctest assumes an underlying VAR(1) model for all specified response variables. However, a VAR(1) model might be an inappropriate representation of the data. For example, the model might not capture all the serial correlation present in the variables. To specify a more complex underlying VAR model, you can increase the number of lags by specifying the NumLags name-value argument of gctest.
Consider the Granger causality tests conducted in “Conduct 1-Step Granger Causality Test Conditioned on Variable” on page 12-1212. Load the US macroeconomic data set Data_USEconModel.mat. Convert the M1 money supply and CPI to rates. Convert the US GDP to the real GDP rate.
    load Data_USEconModel
    m1slrate = price2ret(DataTimeTable.M1SL);
    inflation = price2ret(DataTimeTable.CPIAUCSL);
    rgdprate = price2ret(DataTimeTable.GDP./DataTimeTable.GDPDEF);
Preprocess the data by removing all missing observations (indicated by NaN).
    idx = sum(isnan([m1slrate inflation rgdprate]),2) < 1;
    m1slrate = m1slrate(idx);
    inflation = inflation(idx);
    rgdprate = rgdprate(idx);
    T = numel(m1slrate); % Total sample size
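The row filter used in this step keeps only rows in which no series is missing. A minimal sketch on a small, made-up matrix shows the effect and compares it with an equivalent filter based on any. The matrix A and the names idxSum, idxAny, and Aclean are hypothetical.

```matlab
% Hypothetical data with missing values in different columns.
A = [1   2   3;
     NaN 5   6;
     7   8   NaN;
     10  11  12];

% Row filter in the style used in this example: keep rows with zero NaNs.
idxSum = sum(isnan(A),2) < 1;

% Equivalent filter that states the intent more directly.
idxAny = ~any(isnan(A),2);

% Only the complete rows remain.
Aclean = A(idxAny,:);
```

Applying one common index to every series keeps the observations aligned in time.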
Fit VAR models, with lags ranging from 1 to 4, to the real GDP and inflation rate series. Initialize each fit by specifying the first four observations. Store the Akaike information criteria (AIC) of the fits.
    numseries = 2;
    numlags = (1:4)';
    nummdls = numel(numlags);

    % Partition time base.
    maxp = max(numlags); % Maximum number of required presample responses
    idxpre = 1:maxp;
    idxest = (maxp + 1):T;

    % Preallocation
    EstMdl(1:nummdls) = varm(numseries,0);
    aic = zeros(nummdls,1);

    % Fit VAR models to data.
    Y0 = [rgdprate(idxpre) inflation(idxpre)]; % Presample
    Y = [rgdprate(idxest) inflation(idxest)]; % Estimation sample
    for j = 1:numel(numlags)
        Mdl = varm(numseries,numlags(j));
        Mdl.SeriesNames = ["rGDP" "Inflation"];
        EstMdl(j) = estimate(Mdl,Y,Y0=Y0);
        results = summarize(EstMdl(j));
        aic(j) = results.AIC;
    end
    p = numlags(aic == min(aic))
    p = 3
A VAR(3) model yields the best fit. Assess whether the real GDP rate Granger-causes inflation. gctest removes p observations from the beginning of the input data to initialize the underlying VAR(p) model for estimation. Prepend only the required p = 3 presample observations to the estimation sample. Specify the concatenated series as input data. Return the p-value of the test.
    rgdprate3 = [Y0((end - p + 1):end,1); Y(:,1)];
    inflation3 = [Y0((end - p + 1):end,2); Y(:,2)];
    [h,pvalue] = gctest(rgdprate3,inflation3,NumLags=p)
    h = logical
       1
    pvalue = 7.7741e-04
The p-value is approximately 0.0008, indicating the existence of strong evidence to reject the null hypothesis of noncausality, that is, that the three real GDP rate lags in the inflation rate equation are jointly zero. Given the VAR(3) model, there is enough evidence to suggest that the real GDP rate Granger-causes at least one future value of the inflation rate.
Alternatively, you can conduct the same test by passing the estimated VAR(3) model (represented by the varm model object in EstMdl(3)) to the object function gctest. Specify a block-wise test and the "cause" and "effect" series names.
    h = gctest(EstMdl(3),Type="block-wise", ...
        Cause="rGDP",Effect="Inflation")

                         H0                              Decision      Distribution    Statistic
    ____________________________________________    ___________    ____________    _________
    "Exclude lagged rGDP in Inflation equations"    "Reject H0"     "Chi2(3)"        16.799

    h = logical
       1
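To see how the reported statistic, reference distribution, and p-value fit together, the following sketch recomputes the p-value from the displayed statistic. It assumes the usual upper-tail decision rule for a Wald test and uses chi2cdf and chi2inv from Statistics and Machine Learning Toolbox. The variable names are illustrative, and the statistic is the rounded value from the display, so the result agrees with the reported p-value only approximately.

```matlab
% Values reported by the block-wise test in this example.
stat = 16.799;   % Wald statistic (rounded in the display)
dof = 3;         % degrees of freedom of the Chi2(3) reference distribution
alpha = 0.05;    % default significance level

% Upper-tail probability and critical value.
pvalue = chi2cdf(stat,dof,'upper');
cvalue = chi2inv(1 - alpha,dof);

% Both forms of the decision rule agree.
hFromP = pvalue < alpha;
hFromStat = stat > cvalue;
```

The three degrees of freedom correspond to the three lagged real GDP rate coefficients that the null hypothesis sets to zero in the inflation rate equation.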
Address Integrated Series During Testing
If you are testing integrated series for Granger causality, then the Wald test statistic does not follow a χ² or F distribution, and test results can be unreliable. However, you can implement the Granger causality test in [5] by specifying the maximum integration order among all the variables in the system using the Integration name-value argument.
Consider the Granger causality tests conducted in “Conduct 1-Step Granger Causality Test Conditioned on Variable” on page 12-1212. Load the US macroeconomic data set Data_USEconModel.mat and take the log of real GDP and CPI.
    load Data_USEconModel
    cpi = log(DataTimeTable.CPIAUCSL);
    rgdp = log(DataTimeTable.GDP./DataTimeTable.GDPDEF);
Assess whether the real GDP Granger-causes CPI. Assume the series are I(1), or order-1 integrated. Also, specify an underlying VAR(3) model and the F test. Return the test statistic and p-value.
    [h,pvalue,stat] = gctest(rgdp,cpi,NumLags=3, ...
        Integration=1,Test="f")
    h = logical
       1
    pvalue = 0.0031
    stat = 4.7557
The p-value is 0.0031, which indicates strong evidence to reject the null hypothesis of noncausality, that is, that the three real GDP lags in the CPI equation are jointly zero. Given the VAR(3) model, there is enough evidence to suggest that real GDP Granger-causes at least one future value of the CPI. In this case, the test augments the VAR(3) model with an additional lag. In other words, the fitted model is a VAR(4) model. However, gctest tests only whether the coefficients of the first three lags are 0.
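The lag accounting in this example can be written out explicitly. The sketch assumes, as the example describes, that each order of integration adds one lag to the fitted model and that the extra lags are left out of the tested restrictions. The names fittedOrder, testedLags, and untestedLags are illustrative.

```matlab
numLags = 3;       % NumLags: lags whose coefficients are tested
integration = 1;   % Integration: maximum order of integration

% Order of the VAR model that is estimated.
fittedOrder = numLags + integration;

% Lags restricted under the null hypothesis, and extra lags left unrestricted.
testedLags = 1:numLags;
untestedLags = (numLags + 1):fittedOrder;
```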
Test for Block Exogeneity
Time series are block exogenous if they do not Granger-cause any other variables in a multivariate system. Test whether the effective federal funds rate is block exogenous with respect to the real GDP, personal consumption expenditures, and inflation rates.
Load the US macroeconomic data set Data_USEconModel.mat. Convert the price series to returns.
    load Data_USEconModel
    inflation = price2ret(DataTimeTable.CPIAUCSL);
    rgdprate = price2ret(DataTimeTable.GDP./DataTimeTable.GDPDEF);
    pcerate = price2ret(DataTimeTable.PCEC);
Test whether the federal funds rate is nonstationary by conducting an augmented Dickey-Fuller test. Specify that the alternative model has a drift term.
    StatTbl = adftest(DataTimeTable,DataVariable="FEDFUNDS",Model="ard")

    StatTbl=1×8 table
                h       pValue      stat      cValue     Lags    Alpha     Model      Test
              _____    ________    _______    _______    ____    _____    _______    ______
    Test 1    false    0.071419    -2.7257    -2.8751     0      0.05     {'ARD'}    {'T1'}
The test decision h = 0 indicates that the null hypothesis that the series has a unit root should not be rejected, at the 0.05 significance level.
To stabilize the federal funds rate series, apply the first difference to it.
    dfedfunds = diff(DataTimeTable.FEDFUNDS);
Assume a 4-D VAR(2) model for the four series. Assess whether the federal funds rate is block exogenous with respect to the real GDP, personal consumption expenditures, and inflation rates. Conduct an F-based Wald test, and return the p-value, test statistic, and critical value.
    cause = dfedfunds;
    effects = [inflation rgdprate pcerate];
    [hgc,pvalue,stat,cvalue] = gctest(cause,effects,NumLags=2,Test="f")
    hgc = logical
       1
    pvalue = 4.1619e-10
    stat = 10.4383
    cvalue = 2.1426
The test decision hgc = 1 indicates that the null hypothesis that the federal funds rate is block exogenous should be rejected. This result suggests that the federal funds rate Granger-causes at least one of the other variables in the system. To determine which variables the federal funds rate Granger-causes, you can run a leave-one-out test. For more details, see gctest.
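The size of the null hypothesis and the decision in this example can be checked with a short sketch. It assumes that the block-wise null hypothesis sets one numEffect-by-numCause coefficient block to zero per lag, and that the test rejects when the statistic exceeds the critical value. The names numCause, numEffect, and numRestrictions are illustrative.

```matlab
numCause = 1;    % differenced federal funds rate
numEffect = 3;   % inflation, real GDP, and personal consumption expenditure rates
numLags = 2;     % NumLags value passed to gctest in this example

% Number of coefficients set to zero under the null hypothesis.
numRestrictions = numCause*numEffect*numLags;

% Reported output of the F-based test.
stat = 10.4383;
cvalue = 2.1426;

% Reject block exogeneity when the statistic exceeds the critical value.
hgc = stat > cvalue;
```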
Input Arguments
Y1 — Data for response variables representing Granger-causes
numeric vector | numeric matrix
Data for the response variables representing the Granger-causes in the test, specified as a numobs1-by-1 numeric vector or a numobs1-by-numseries1 numeric matrix. numobs1 is the number of observations and numseries1 is the number of time series variables.
Row t contains the observation in time t, and the last row contains the latest observation. Y1 must have enough rows to initialize and estimate the underlying VAR model. gctest uses the first NumLags observations to initialize the model for estimation.
Columns correspond to distinct time series variables.
Data Types: double | single

Y2 — Data for response variables affected by Granger-causes
numeric vector | numeric matrix
Data for response variables affected by the Granger-causes in the test, specified as a numobs2-by-1 numeric vector or a numobs2-by-numseries2 numeric matrix. numobs2 is the number of observations in the data and numseries2 is the number of time series variables.
Row t contains the observation in time t, and the last row contains the latest observation. Y2 must have enough rows to initialize and estimate the underlying VAR model. gctest uses the first NumLags observations to initialize the model for estimation.
Columns correspond to distinct time series variables.
Data Types: double | single

Y3 — Data for conditioning response variables
numeric vector | numeric matrix
Data for conditioning response variables, specified as a numobs3-by-1 numeric vector or a numobs3-by-numseries3 numeric matrix. numobs3 is the number of observations in the data and numseries3 is the number of time series variables.
Row t contains the observation in time t, and the last row contains the latest observation. Y3 must have enough rows to initialize and estimate the underlying VAR model. gctest uses the first NumLags observations to initialize the model for estimation.
Columns correspond to distinct time series variables.
If you specify Y3, then Y1, Y2, and Y3 represent the response variables in the underlying VAR model. gctest assesses whether Y1 is a 1-step Granger-cause of Y2.
Data Types: double | single

Name-Value Arguments
Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter.
Before R2021a, use commas to separate each name and value, and enclose Name in quotes.
Example: Alpha=0.10,NumLags=2 specifies a 0.10 significance level for the test and uses an underlying VAR(2) model for all response variables.

NumLags — Number of lagged responses
1 (default) | nonnegative integer
Number of lagged responses to include in the underlying VAR model for all response variables, specified as a nonnegative integer. The resulting underlying model is a VAR(NumLags) model.
Example: NumLags=2
Data Types: double | single

Integration — Maximum order of integration
0 (default) | nonnegative integer
Maximum order of integration among all response variables, specified as a nonnegative integer.
To address integration, gctest augments the VAR(NumLags) model by adding additional lagged responses beyond NumLags to all equations during estimation. For more details, see [5] and [3].
Example: Integration=1
Data Types: double | single

Constant — Flag indicating inclusion of model intercepts
true (default) | false
Flag indicating the inclusion of model intercepts (constants) in the underlying VAR model, specified as a value in this table.
    Value    Description
    true     All equations in the underlying VAR model have an intercept. gctest estimates the intercepts with all other estimable parameters.
    false    No equation in the underlying VAR model has an intercept. gctest sets all intercepts to 0.
Example: Constant=false
Data Types: logical
Trend — Flag indicating inclusion of linear time trends
false (default) | true
Flag indicating the inclusion of linear time trends in the underlying VAR model, specified as a value in this table.
    Value    Description
    true     All equations in the underlying VAR model have a linear time trend. gctest estimates the linear time trend coefficients with all other estimable parameters.
    false    No equation in the underlying VAR model has a linear time trend.
Example: Trend=false
Data Types: logical

X — Predictor data
numeric matrix
Predictor data for the regression component in the underlying VAR model, specified as a numeric matrix containing numpreds columns. numpreds is the number of predictor variables.
Row t contains the observation in time t, and the last row contains the latest observation. gctest does not use the regression component in the presample period. X must have at least as many observations as the number of observations used by gctest after the presample period. Specifically, X must have at least numobs – Mdl.P observations, where numobs = min([numobs1 numobs2 numobs3]). If you supply more rows than necessary, gctest uses the latest observations only.
Columns correspond to individual predictor variables. gctest treats predictors as exogenous. All predictor variables are present in the regression component of each response equation.
By default, gctest excludes a regression component from all equations.
Data Types: double | single

Alpha — Significance level
0.05 (default) | numeric scalar in (0,1)
Significance level for the test, specified as a numeric scalar in (0,1).
Example: Alpha=0.1
Data Types: double | single

Test — Test statistic distribution under the null hypothesis
"chi-square" (default) | "f"
Test statistic distribution under the null hypothesis, specified as a value in this table.
    Value           Description
    "chi-square"    gctest derives outputs from conducting a χ² test.
    "f"             gctest derives outputs from conducting an F test.
For test statistic forms, see [4].
Example: Test="f"
Data Types: char | string
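The name-value arguments described above can be combined in a single call. The following sketch applies them to made-up data in which the second series depends on the first lag of the first series by construction. The data-generating coefficients, the seed, and the variable names are hypothetical; the call requires Econometrics Toolbox and the Name=Value syntax available from R2021a.

```matlab
% Simulate hypothetical "cause" and "effect" series.
rng(1);               % for reproducibility of the made-up data
T = 200;
y1 = randn(T,1);      % "cause" series
y2 = zeros(T,1);      % "effect" series
e = randn(T,1);
for t = 2:T
    % y2 depends on its own first lag and on the first lag of y1.
    y2(t) = 0.5*y2(t-1) + 0.4*y1(t-1) + e(t);
end

% F-based test at the 0.10 level with an underlying VAR(2) model that
% includes intercepts but no linear time trends.
[h,pvalue,stat,cvalue] = gctest(y1,y2,NumLags=2,Alpha=0.10, ...
    Constant=true,Trend=false,Test="f");
```

No particular decision is asserted here, because the outcome depends on the simulated draw.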
Output Arguments
h — Block-wise Granger causality test decision
logical scalar
Block-wise Granger causality test on page 12-1220 decision, returned as a logical scalar.
• h = 1 indicates rejection of H0.
  • If you specify the conditioning response data Y3, then sufficient evidence exists to suggest that the response variables represented in Y1 are 1-step Granger-causes of the response variables represented in Y2, conditioned on the response variables represented in Y3.
  • Otherwise, sufficient evidence exists to suggest that the variables in Y1 are h-step Granger-causes of the variables in Y2 for some h ≥ 1. In other words, Y1 is block endogenous with respect to Y2.
• h = 0 indicates failure to reject H0.
  • If you specify Y3, then the variables in Y1 are not 1-step Granger-causes of the variables in Y2, conditioned on Y3.
  • Otherwise, Y1 does not Granger-cause Y2. In other words, there is not enough evidence to reject block exogeneity of Y1 with respect to Y2.

pvalue — p-value
numeric scalar
p-value, returned as a numeric scalar.

stat — Test statistic
numeric scalar
Test statistic, returned as a numeric scalar.

cvalue — Critical value
numeric scalar
Critical value for the significance level Alpha, returned as a numeric scalar.
More About
Granger Causality Test
The Granger causality test is a statistical hypothesis test that assesses whether past and present values of a set of m1 = numseries1 time series variables y1,t, called the "cause" variables, affect the predictive distribution of a distinct set of m2 = numseries2 time series variables y2,t, called the "effect" variables. The impact is a reduction in forecast mean squared error (MSE) of y2,t. If past values of y1,t affect y2,t + h, then y1,t is an h-step Granger-cause of y2,t. In other words, y1,t Granger-causes y2,t if y1,t is an h-step Granger-cause of y2,t for some h ≥ 1.
Consider a stationary VAR(p) model on page 12-1221 for [y1,t y2,t]:
\[
\begin{bmatrix} y_{1,t} \\ y_{2,t} \end{bmatrix}
= c + \delta t + \beta x_t
+ \begin{bmatrix} \Phi_{11,1} & \Phi_{12,1} \\ \Phi_{21,1} & \Phi_{22,1} \end{bmatrix}
\begin{bmatrix} y_{1,t-1} \\ y_{2,t-1} \end{bmatrix}
+ \dots
+ \begin{bmatrix} \Phi_{11,p} & \Phi_{12,p} \\ \Phi_{21,p} & \Phi_{22,p} \end{bmatrix}
\begin{bmatrix} y_{1,t-p} \\ y_{2,t-p} \end{bmatrix}
+ \begin{bmatrix} \varepsilon_{1,t} \\ \varepsilon_{2,t} \end{bmatrix}.
\]
Assume the following conditions:
• Future values cannot inform past values.
• y1,t uniquely informs y2,t (no other variable has the information to inform y2,t).
If Φ21,1 = … = Φ21,p = 0m2,m1, where 0m2,m1 is an m2-by-m1 matrix of zeros, then y1,t is not the block-wise Granger cause of y2,t + h for all h ≥ 1. Also, y1,t is block exogenous with respect to y2,t. Consequently, the block-wise Granger causality test hypotheses are:
\[
\begin{aligned}
H_0 &: \Phi_{21,1} = \dots = \Phi_{21,p} = 0_{m_2,m_1} \\
H_1 &: \exists\, j \in \{1,\dots,p\} \text{ such that } \Phi_{21,j} \neq 0_{m_2,m_1}.
\end{aligned}
\]
H1 implies that at least one h ≥ 1 exists such that y1,t is an h-step Granger-cause of y2,t.
gctest conducts χ²-based or F-based Wald tests (see Test). For test statistic forms, see [4].
Distinct conditioning endogenous variables y3,t can be included in the system (see Y3). In this case, the VAR(p) model is:
\[
\begin{bmatrix} y_{1,t} \\ y_{2,t} \\ y_{3,t} \end{bmatrix}
= c + \delta t + \beta x_t
+ \begin{bmatrix} \Phi_{11,1} & \Phi_{12,1} & \Phi_{13,1} \\ \Phi_{21,1} & \Phi_{22,1} & \Phi_{23,1} \\ \Phi_{31,1} & \Phi_{32,1} & \Phi_{33,1} \end{bmatrix}
\begin{bmatrix} y_{1,t-1} \\ y_{2,t-1} \\ y_{3,t-1} \end{bmatrix}
+ \dots
+ \begin{bmatrix} \Phi_{11,p} & \Phi_{12,p} & \Phi_{13,p} \\ \Phi_{21,p} & \Phi_{22,p} & \Phi_{23,p} \\ \Phi_{31,p} & \Phi_{32,p} & \Phi_{33,p} \end{bmatrix}
\begin{bmatrix} y_{1,t-p} \\ y_{2,t-p} \\ y_{3,t-p} \end{bmatrix}
+ \begin{bmatrix} \varepsilon_{1,t} \\ \varepsilon_{2,t} \\ \varepsilon_{3,t} \end{bmatrix}.
\]
gctest does not test the parameters associated with the conditioning variables. The test assesses only whether y1,t is a 1-step Granger-cause of y2,t.

Vector Autoregression Model
A vector autoregression (VAR) model is a stationary multivariate time series model consisting of a system of m equations of m distinct response variables as linear functions of lagged responses and other terms.
A VAR(p) model in difference-equation notation and in reduced form is
\[
y_t = c + \Phi_1 y_{t-1} + \Phi_2 y_{t-2} + \dots + \Phi_p y_{t-p} + \beta x_t + \delta t + \varepsilon_t.
\]
• yt is a numseries-by-1 vector of values corresponding to numseries response variables at time t, where t = 1,...,T. The structural coefficient is the identity matrix.
• c is a numseries-by-1 vector of constants.
• Φj is a numseries-by-numseries matrix of autoregressive coefficients, where j = 1,...,p and Φp is not a matrix containing only zeros.
• xt is a numpreds-by-1 vector of values corresponding to numpreds exogenous predictor variables.
• β is a numseries-by-numpreds matrix of regression coefficients.
• δ is a numseries-by-1 vector of linear time-trend values.
• εt is a numseries-by-1 vector of random Gaussian innovations, each with a mean of 0 and collectively a numseries-by-numseries covariance matrix Σ. For t ≠ s, εt and εs are independent.
Condensed and in lag operator notation, the system is
\[
\Phi(L) y_t = c + \beta x_t + \delta t + \varepsilon_t,
\]
where \(\Phi(L) = I - \Phi_1 L - \Phi_2 L^2 - \dots - \Phi_p L^p\), Φ(L)yt is the multivariate autoregressive polynomial, and I is the numseries-by-numseries identity matrix.
For example, a VAR(1) model containing two response series and three exogenous predictor variables has this form:
\[
\begin{aligned}
y_{1,t} &= c_1 + \phi_{11} y_{1,t-1} + \phi_{12} y_{2,t-1} + \beta_{11} x_{1,t} + \beta_{12} x_{2,t} + \beta_{13} x_{3,t} + \delta_1 t + \varepsilon_{1,t} \\
y_{2,t} &= c_2 + \phi_{21} y_{1,t-1} + \phi_{22} y_{2,t-1} + \beta_{21} x_{1,t} + \beta_{22} x_{2,t} + \beta_{23} x_{3,t} + \delta_2 t + \varepsilon_{2,t}.
\end{aligned}
\]
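To connect the VAR notation with the Granger causality hypotheses, the following base MATLAB sketch simulates a bivariate VAR(1) model without predictors or time trends. The coefficient values, the seed, and the variable names are made up for illustration. The coefficient ϕ21 is set to zero, so lagged y1 does not enter the equation for y2, which is the situation described by the null hypothesis when y1 is the "cause" and y2 is the "effect."

```matlab
rng(0);                  % reproducible made-up innovations
T = 500;
c = [0.1; 0.2];          % intercepts
Phi1 = [0.5 0.1;         % row 1: equation for y1
        0.0 0.3];        % row 2: equation for y2, with phi21 = 0

% A VAR(1) model is stable when all eigenvalues of Phi1 lie inside the unit circle.
isStable = all(abs(eig(Phi1)) < 1);

% Simulate y(t) = c + Phi1*y(t-1) + e(t) with standard Gaussian innovations.
Y = zeros(T,2);
for t = 2:T
    Y(t,:) = (c + Phi1*Y(t-1,:)' + randn(2,1))';
end

% Coefficient block restricted by the test: effect row, cause column.
Phi21 = Phi1(2,1);
```

In contrast, Phi1(1,2) is nonzero, so in this simulated system y2 enters the equation for y1.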
Version History Introduced in R2019a
References
[1] Granger, C. W. J. "Investigating Causal Relations by Econometric Models and Cross-Spectral Methods." Econometrica. Vol. 37, 1969, pp. 424–438.
[2] Hamilton, James D. Time Series Analysis. Princeton, NJ: Princeton University Press, 1994.
[3] Dolado, J. J., and H. Lütkepohl. "Making Wald Tests Work for Cointegrated VAR Systems." Econometric Reviews. Vol. 15, 1996, pp. 369–386.
[4] Lütkepohl, Helmut. New Introduction to Multiple Time Series Analysis. New York, NY: Springer-Verlag, 2007.
[5] Toda, H. Y., and T. Yamamoto. "Statistical Inferences in Vector Autoregressions with Possibly Integrated Processes." Journal of Econometrics. Vol. 66, 1995, pp. 225–250.
See Also
Objects
varm
Functions
armairf | armafevd | estimate | gctest
gctest
Granger causality and block exogeneity tests for vector autoregression (VAR) models

Syntax
    h = gctest(Mdl)
    h = gctest(Mdl,Name,Value)
    [h,Summary] = gctest( ___ )
Description
The gctest object function can conduct leave-one-out, exclude-all, and block-wise Granger causality tests on page 12-1235 for the response variables of a fully specified vector autoregression on page 12-1237 (VAR) model (represented by a varm model object).
To conduct a block-wise Granger causality test from specified sets of time series data representing "cause" and "effect" multivariate response variables, or to address possibly integrated series for the test, see the gctest function.
h = gctest(Mdl) returns the test decision h from conducting leave-one-out Granger causality tests on page 12-1235 on all response variables that compose the VAR(p) model Mdl.
h = gctest(Mdl,Name,Value) uses additional options specified by one or more name-value pair arguments. For example, 'Type',"block-wise",'Cause',1:2,'Effect',3:5 specifies conducting a block-wise test to assess whether the response variables Mdl.SeriesNames(1:2) Granger-cause the response variables Mdl.SeriesNames(3:5) conditioned on all other variables in the model.
[h,Summary] = gctest( ___ ) additionally returns a table that summarizes all test results, using any of the input argument combinations in the previous syntaxes. Results include test decisions and p-values.
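The number of tests and the degrees of freedom that the leave-one-out and exclude-all test types produce can be anticipated from the model dimensions. The following sketch assumes that a leave-one-out test excludes the p lags of one variable from one equation and that an exclude-all test excludes the p lags of all other variables from one equation. The formulas are inferred from the example displays in this section (six Chi2(3) tests and three Chi2(6) tests for a 3-D VAR(3) model), and the variable names are illustrative.

```matlab
numseries = 3;   % number of response variables in the VAR model
p = 3;           % number of lags in the VAR model

% Leave-one-out: one test per ordered pair of distinct variables.
numLeaveOneOut = numseries*(numseries - 1);
dofLeaveOneOut = p;

% Exclude-all: one test per equation.
numExcludeAll = numseries;
dofExcludeAll = (numseries - 1)*p;
```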
Examples
Conduct Leave-One-Out Granger Causality Test
Conduct a leave-one-out Granger causality test to assess whether each variable in a 3-D VAR model Granger-causes another variable, given the third variable. The variables in the VAR model are the M1 money supply, consumer price index (CPI), and US gross domestic product (GDP).
Load the US macroeconomic data set Data_USEconModel.mat.
    load Data_USEconModel
The data set includes the MATLAB® timetable DataTimeTable, which contains 14 variables measured from Q1 1947 through Q1 2009.
• M1SL is the table variable containing the M1 money supply.
• CPIAUCSL is the table variable containing the CPI.
• GDP is the table variable containing the US GDP.
For more details, enter Description at the command line.
Visually assess whether the series are stationary.
    plot(DataTimeTable.Time,DataTimeTable.CPIAUCSL)
    ylabel("CPI");

    plot(DataTimeTable.Time,DataTimeTable.M1SL)
    ylabel("Money Supply");

    plot(DataTimeTable.Time,DataTimeTable.GDP)
    ylabel("GDP")
All series are nonstationary. Stabilize the series.
• Convert the M1 money supply prices to returns.
• Convert the CPI to the inflation rate.
• Convert the GDP to the real GDP rate with respect to year 2000 dollars.
    m1slrate = price2ret(DataTimeTable.M1SL);
    inflation = price2ret(DataTimeTable.CPIAUCSL);
    rgdprate = price2ret(DataTimeTable.GDP./DataTimeTable.GDPDEF);
Preprocess the data by removing all missing observations (indicated by NaN).
    tbl = table(m1slrate,inflation,rgdprate);
    tbl = rmmissing(tbl);
    T = size(tbl,1); % Total sample size
Fit VAR models, with lags ranging from 1 to 4, to the series. Initialize each fit by specifying the first four observations. Store the Akaike information criteria (AIC) of the fits.
    numseries = 3;
    numlags = (1:4)';
    nummdls = numel(numlags);

    % Partition time base.
    maxp = max(numlags); % Maximum number of required presample responses
    idxpre = 1:maxp;
    idxest = (maxp + 1):T;

    % Preallocation
    EstMdl(nummdls) = varm(numseries,0);
    aic = zeros(nummdls,1);

    % Fit VAR models to data.
    Y0 = tbl{idxpre,:}; % Presample
    Y = tbl{idxest,:}; % Estimation sample
    for j = 1:numel(numlags)
        Mdl = varm(numseries,numlags(j));
        Mdl.SeriesNames = tbl.Properties.VariableNames;
        EstMdl(j) = estimate(Mdl,Y,'Y0',Y0);
        results = summarize(EstMdl(j));
        aic(j) = results.AIC;
    end
    [~,bestidx] = min(aic);
    p = numlags(bestidx)
    p = 3
    BestMdl = EstMdl(bestidx);
A VAR(3) model yields the best fit.
For each variable and equation in the system, conduct a leave-one-out Granger causality test to assess whether a variable in the fitted VAR(3) model is the 1-step Granger-cause of another variable, given the third variable.
    h = gctest(BestMdl)

                          H0                                Decision          Distribution    Stat
    _______________________________________________    __________________    ____________    ____
    "Exclude lagged inflation in m1slrate equation"    "Cannot reject H0"     "Chi2(3)"      7.0
    "Exclude lagged rgdprate in m1slrate equation"     "Cannot reject H0"     "Chi2(3)"      2.5
    "Exclude lagged m1slrate in inflation equation"    "Cannot reject H0"     "Chi2(3)"      2.7
    "Exclude lagged rgdprate in inflation equation"    "Reject H0"            "Chi2(3)"      14.
    "Exclude lagged m1slrate in rgdprate equation"     "Cannot reject H0"     "Chi2(3)"      7.0
    "Exclude lagged inflation in rgdprate equation"    "Reject H0"            "Chi2(3)"      12.

    h = 6x1 logical array
       0
       0
       0
       1
       0
       1
gctest conducts six tests, so h is a 6-by-1 logical vector of test decisions corresponding to the rows of the test summary table. The results indicate the following decisions, at a 5% level of significance:
• Do not reject the claim that the inflation rate is not the 1-step Granger-cause of the M1 money supply rate, given the real GDP rate (h(1) = 0).
• Do not reject the claim that the real GDP rate is not the 1-step Granger-cause of the M1 money supply rate, given the inflation rate (h(2) = 0).
• Do not reject the claim that the M1 money supply rate is not the 1-step Granger-cause of the inflation rate, given the real GDP rate (h(3) = 0).
• Reject the claim that the real GDP rate is not the 1-step Granger-cause of the inflation rate, given the M1 money supply rate (h(4) = 1).
• Do not reject the claim that the M1 money supply rate is not the 1-step Granger-cause of the real GDP rate, given the inflation rate (h(5) = 0).
• Reject the claim that the inflation rate is not the 1-step Granger-cause of the real GDP rate, given the M1 money supply rate (h(6) = 1).
Because the inflation and real GDP rates are 1-step Granger-causes of each other, they constitute a feedback loop.
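The feedback conclusion can also be derived programmatically from the decision vector. The following base MATLAB sketch maps the six decisions to (cause, effect) pairs in the row order of the summary table and then looks for pairs with rejections in both directions. The index vectors cause and effect, the matrix C, and the name feedbackPairs are introduced here for illustration and are not outputs of gctest.

```matlab
names = ["m1slrate" "inflation" "rgdprate"];
h = logical([0 0 0 1 0 1])';   % decisions reported in this example

% (cause, effect) index pairs in the row order of the summary table.
cause  = [2 3 1 3 1 2]';
effect = [1 1 2 2 3 3]';

% C(i,j) is true when variable i is found to be a 1-step Granger-cause of variable j.
C = false(numel(names));
C(sub2ind(size(C),cause,effect)) = h;

% Feedback pairs: causality detected in both directions.
[i,j] = find(triu(C & C',1));
feedbackPairs = [names(i)' names(j)'];
```

For these decisions, feedbackPairs contains the single pair inflation and rgdprate.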
Conduct Exclude-All Granger Causality Test
Consider the 3-D VAR(3) model and leave-one-out Granger causality test in “Conduct Leave-One-Out Granger Causality Test” on page 12-1223. Load the US macroeconomic data set Data_USEconModel.mat. Preprocess the data. Fit a VAR(3) model to the preprocessed data.
    load Data_USEconModel
    m1slrate = price2ret(DataTimeTable.M1SL);
    inflation = price2ret(DataTimeTable.CPIAUCSL);
    rgdprate = price2ret(DataTimeTable.GDP./DataTimeTable.GDPDEF);
    tbl = table(m1slrate,inflation,rgdprate);
    tbl = rmmissing(tbl);
    Mdl = varm(3,3);
    Mdl.SeriesNames = tbl.Properties.VariableNames;
    EstMdl = estimate(Mdl,tbl{5:end,:},'Y0',tbl{2:4,:});
Conduct an exclude-all Granger causality test on the variables of the fitted model.
    h = gctest(EstMdl,'Type',"exclude-all");

                               H0                                       Decision          Distribution
    ________________________________________________________    __________________    ____________
    "Exclude all but lagged m1slrate in m1slrate equation"      "Cannot reject H0"     "Chi2(6)"
    "Exclude all but lagged inflation in inflation equation"    "Reject H0"            "Chi2(6)"
    "Exclude all but lagged rgdprate in rgdprate equation"      "Reject H0"            "Chi2(6)"

gctest conducts numtests = 3 tests. The results indicate the following decisions, each at a 5% level of significance:
• Do not reject the claim that the inflation and real GDP rates do not Granger-cause the M1 money supply rate.
• Reject the claim that the M1 money supply and real GDP rates do not Granger-cause the inflation rate.
• Reject the claim that the M1 money supply and inflation rates do not Granger-cause the real GDP rate.
Adjust Significance Level for Multiple Tests
The probability of at least one false rejection (the family-wise error rate) increases with the number of simultaneous hypothesis tests you conduct. To combat the increase, decrease the level of significance per test by using the 'Alpha' name-value pair argument.
Consider the 3-D VAR(3) model and leave-one-out Granger causality test in “Conduct Leave-One-Out Granger Causality Test” on page 12-1223. Load the US macroeconomic data set Data_USEconModel.mat. Preprocess the data. Fit a VAR(3) model to the preprocessed data.
    load Data_USEconModel
    m1slrate = price2ret(DataTimeTable.M1SL);
    inflation = price2ret(DataTimeTable.CPIAUCSL);
    rgdprate = price2ret(DataTimeTable.GDP./DataTimeTable.GDPDEF);
    tbl = table(m1slrate,inflation,rgdprate);
    tbl = rmmissing(tbl);
    Mdl = varm(3,3);
    Mdl.SeriesNames = tbl.Properties.VariableNames;
    EstMdl = estimate(Mdl,tbl{5:end,:},'Y0',tbl{2:4,:});
A leave-one-out Granger causality test on the variables in the model results in numtests = 6 simultaneous tests. Conduct the tests, but specify a family-wise significance level of 0.05 by specifying a level of significance of alpha = 0.05/numtests for each test.
    numtests = 6;
    alpha = 0.05/numtests
    alpha = 0.0083
    gctest(EstMdl,'Alpha',alpha);

                          H0                                Decision          Distribution    Stat
    _______________________________________________    __________________    ____________    ____
    "Exclude lagged inflation in m1slrate equation"    "Cannot reject H0"     "Chi2(3)"      7.0
    "Exclude lagged rgdprate in m1slrate equation"     "Cannot reject H0"     "Chi2(3)"      2.5
    "Exclude lagged m1slrate in inflation equation"    "Cannot reject H0"     "Chi2(3)"      2.7
    "Exclude lagged rgdprate in inflation equation"    "Reject H0"            "Chi2(3)"      14.
    "Exclude lagged m1slrate in rgdprate equation"     "Cannot reject H0"     "Chi2(3)"      7.0
    "Exclude lagged inflation in rgdprate equation"    "Reject H0"            "Chi2(3)"      12.

The test decisions of these more conservative tests are the same as the test decisions in “Conduct Leave-One-Out Granger Causality Test” on page 12-1223. However, the conclusions of the conservative tests hold simultaneously at a 5% level of significance.
Access Test p-Values
Consider the 3-D VAR(3) model and leave-one-out Granger causality test in “Conduct Leave-One-Out Granger Causality Test” on page 12-1223. Load the US macroeconomic data set Data_USEconModel.mat. Preprocess the data. Fit a VAR(3) model to the preprocessed data.
    load Data_USEconModel
    m1slrate = price2ret(DataTimeTable.M1SL);
    inflation = price2ret(DataTimeTable.CPIAUCSL);
    rgdprate = price2ret(DataTimeTable.GDP./DataTimeTable.GDPDEF);
    tbl = table(m1slrate,inflation,rgdprate);
    tbl = rmmissing(tbl);
    Mdl = varm(3,3);
    Mdl.SeriesNames = tbl.Properties.VariableNames;
    EstMdl = estimate(Mdl,tbl{5:end,:},'Y0',tbl{2:4,:});
Conduct a leave-one-out Granger causality test on the variables of the fitted model. Return the test result summary table and suppress the test results display.
    [~,Summary] = gctest(EstMdl,'Display',false)

    Summary=6×6 table
                          H0                                Decision          Distribution    Stat
    _______________________________________________    __________________    ____________    ____
    "Exclude lagged inflation in m1slrate equation"    "Cannot reject H0"     "Chi2(3)"      7.0
    "Exclude lagged rgdprate in m1slrate equation"     "Cannot reject H0"     "Chi2(3)"      2.5
    "Exclude lagged m1slrate in inflation equation"    "Cannot reject H0"     "Chi2(3)"      2.7
    "Exclude lagged rgdprate in inflation equation"    "Reject H0"            "Chi2(3)"      14.
    "Exclude lagged m1slrate in rgdprate equation"     "Cannot reject H0"     "Chi2(3)"      7.0
    "Exclude lagged inflation in rgdprate equation"    "Reject H0"            "Chi2(3)"      12.

Summary is a MATLAB table containing numtests = 6 rows. The rows contain the results of each test. The columns are table variables containing characteristics of the tests.
Extract the p-values of the tests.
    pvalues = Summary.PValue
    pvalues = 6×1
        0.0698
        0.4648
        0.4398
        0.0025
        0.0708
        0.0074
gctest
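Because Summary is an ordinary MATLAB table, you can use logical indexing on its variables to pick out the tests of interest. The following sketch uses base MATLAB only. SummaryCopy is a hypothetical stand-in that retypes the H0 descriptions and p-values displayed above; it is not produced by gctest here.

```matlab
% Hypothetical stand-in for two columns of Summary, typed in from the
% results displayed above (gctest is not called in this sketch).
cause  = ["inflation"; "rgdprate"; "m1slrate"; "rgdprate"; "m1slrate"; "inflation"];
effect = ["m1slrate"; "m1slrate"; "inflation"; "inflation"; "rgdprate"; "rgdprate"];
H0 = "Exclude lagged " + cause + " in " + effect + " equation";
PValue = [0.0698; 0.4648; 0.4398; 0.0025; 0.0708; 0.0074];
SummaryCopy = table(H0,PValue);

alpha = 0.05;                              % significance level of each test
isRejected = SummaryCopy.PValue < alpha;   % true where the p-value is below alpha
Rejected = SummaryCopy(isRejected,:);      % keep only the rejected tests
disp(Rejected)
```

The selected rows are tests 4 and 6, which agrees with the "Reject H0" entries in the Decision column.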
Test for Block Exogeneity

Time series are block exogenous if they do not Granger-cause any other variables in a multivariate system. Test whether the effective federal funds rate is block exogenous with respect to the real GDP, personal consumption expenditures, and inflation rates.

Load the US macroeconomic data set Data_USEconModel.mat. Convert the price series to returns.

    load Data_USEconModel
    inflation = price2ret(DataTimeTable.CPIAUCSL);
    rgdprate = price2ret(DataTimeTable.GDP./DataTimeTable.GDPDEF);
    pcerate = price2ret(DataTimeTable.PCEC);
Test whether the federal funds rate is nonstationary by conducting an augmented Dickey-Fuller test. Specify that the alternative model has a drift term. Because the call does not set the test type, adftest uses the t statistic, which the output reports as 'T1'.

    StatTbl = adftest(DataTimeTable,DataVariable="FEDFUNDS",Model="ard")

    StatTbl=1×8 table
                    h       pValue       stat      cValue     Lags    Alpha     Model       Test
                  _____    ________    _______    _______    ____    _____    _______    ______
        Test 1    false    0.071419    -2.7257    -2.8751     0      0.05     {'ARD'}    {'T1'}
The test decision h = 0 indicates that the null hypothesis that the series has a unit root should not be rejected at the 0.05 significance level.

To stabilize the federal funds rate series, apply the first difference to it.

    dfedfunds = diff(DataTimeTable.FEDFUNDS);

Preprocess the data by removing all missing observations (indicated by NaN).

    tbl = table(inflation,pcerate,rgdprate,dfedfunds);
    tbl = rmmissing(tbl);
    T = size(tbl,1); % Total sample size

Assume a 4-D VAR(3) model for the four series. Initialize the model by using the first three observations, and fit the model to the rest of the data. Assign names to the series in the model.

    Mdl = varm(4,3);
    Mdl.SeriesNames = tbl.Properties.VariableNames;
    EstMdl = estimate(Mdl,tbl{4:end,:},Y0=tbl{1:3,:});

Assess whether the federal funds rate is block exogenous with respect to the real GDP, personal consumption expenditures, and inflation rates. Conduct an F-based Wald test, and return the test decision and summary table. Suppress the test results display.

    cause = "dfedfunds";
    effects = ["inflation" "pcerate" "rgdprate"];
    [h,Summary] = gctest(EstMdl,Type="block-wise", ...
        Cause=cause,Effect=effects,Test="f",Display=false);
gctest conducts one test. h = 1 indicates, at a 5% level of significance, rejection of the null hypothesis that the federal funds rate is block exogenous with respect to the other variables in the VAR model. This result suggests that the federal funds rate Granger-causes at least one of the other variables in the system.
Alternatively, you can conduct the same blockwise Granger causality test by passing the data to the gctest function.

    causedata = tbl.dfedfunds;
    EffectsData = tbl{:,effects};
    [hgc,pvalue,stat,cvalue] = gctest(causedata,EffectsData,...
        NumLags=3,Test="f")

    hgc = logical
       1
    pvalue = 9.0805e-09
    stat = 6.9869
    cvalue = 1.9265
To determine which variables are Granger-caused by the federal funds rate, conduct a leave-one-out test and specify the "cause" and "effects."

    gctest(EstMdl,Cause=cause,Effect=effects);

                            H0                                 Decision      Distribution    Statistic
        ________________________________________________    ___________    ____________    _________
        "Exclude lagged dfedfunds in inflation equation"    "Reject H0"    "Chi2(3)"       26.157
        "Exclude lagged dfedfunds in pcerate equation"      "Reject H0"    "Chi2(3)"       10.151
        "Exclude lagged dfedfunds in rgdprate equation"     "Reject H0"    "Chi2(3)"       10.651

The test results suggest the following decisions, each at a 5% level of significance:
• Reject the claim that the federal funds rate is not a 1-step Granger-cause of the inflation rate, given all other variables in the VAR model.
• Reject the claim that the federal funds rate is not a 1-step Granger-cause of the personal consumption expenditures rate, given all other variables in the VAR model.
• Reject the claim that the federal funds rate is not a 1-step Granger-cause of the real GDP rate, given all other variables in the VAR model.

Input Arguments

Mdl — VAR model
varm model object

VAR model, specified as a varm model object created by varm or estimate. Mdl must be fully specified.

Name-Value Pair Arguments

Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter. Before R2021a, use commas to separate each name and value, and enclose Name in quotes.
Example: 'Type',"block-wise",'Cause',1:2,'Effect',3:5 specifies conducting a block-wise test to assess whether the response variables Mdl.SeriesNames(1:2) Granger-cause the response variables Mdl.SeriesNames(3:5) conditioned on all other variables in the model.

Type — Granger causality test to conduct
"leave-one-out" (default) | "exclude-all" | "block-wise"

Granger causality test to conduct, specified as the comma-separated pair consisting of 'Type' and a value in this table. Suppose that the VAR model Mdl is m-D (m = Mdl.NumSeries).

Value
    Description

"leave-one-out"
    Leave-one-out test. For j = 1,…,m, k = 1,…,m, and j ≠ k, gctest tests the null hypothesis that variable j does not Granger-cause variable k, conditioned on all other variables in the model. This setting conducts numtests = m(m – 1) tests.

"exclude-all"
    Exclude-all test. For j = 1,…,m, gctest tests the null hypothesis that all other response variables do not jointly Granger-cause response variable j. This setting conducts numtests = m tests.

"block-wise"
    Block-wise test. gctest tests the null hypothesis that the response variables specified by Cause do not jointly Granger-cause the response variables specified by Effect. This setting conducts numtests = 1 test. If you do not specify Cause and Effect, then gctest tests the null hypothesis that all lag coefficients, including self lags, are zero.
Example: 'Type',"exclude-all"
Data Types: char | string

Alpha — Significance level
0.05 (default) | numeric scalar in (0,1)

Significance level for each conducted test (see Type), specified as the comma-separated pair consisting of 'Alpha' and a numeric scalar in (0,1).
Example: 'Alpha',0.1
Data Types: double | single

Test — Test statistic distribution under null hypothesis
"chi-square" (default) | "f"

Test statistic distribution under the null hypothesis, specified as the comma-separated pair consisting of 'Test' and a value in this table.

Value
    Description

"chi-square"
    gctest derives outputs from conducting a χ2 test.

"f"
    gctest derives outputs from conducting an F test.
For test statistic forms, see [4].
Example: 'Test',"f"
Data Types: char | string

Cause — VAR model response variables representing Granger-causes
numeric vector of variable indices | vector of variable names

VAR model response variables representing Granger-causes in the 1-step block-wise test, specified as the comma-separated pair consisting of 'Cause' and a numeric vector of variable indices or a vector of variable names. For either input type, values correspond to the response series names in the SeriesNames property of the input VAR model object Mdl, which you access by using dot notation: Mdl.SeriesNames.
Example: 'Cause',["rGDP" "m1sl"]
Example: 'Cause',1:2
Data Types: single | double | char | string

Effect — VAR model response variables affected by Granger-causes
numeric vector of variable indices | vector of variable names

VAR model response variables affected by Granger-causes in the 1-step block-wise test, specified as the comma-separated pair consisting of 'Effect' and a numeric vector of variable indices or a vector of variable names. For either input type, values correspond to the response series names in the SeriesNames property of the input VAR model object Mdl, which you access by using dot notation: Mdl.SeriesNames.
Example: 'Effect',"inflation"
Example: 'Effect',3
Data Types: single | double | char | string

Display — Flag to display test summary table
true (default) | false

Flag to display a test summary table at the command line, specified as the comma-separated pair consisting of 'Display' and a value in this table.

Value
    Description

true
    Display a test summary table, as returned in Summary, at the command line.

false
    Do not display a test summary table.

Example: 'Display',false
Data Types: logical
Output Arguments

h — Granger causality test decisions
logical scalar | logical vector

Granger causality test decisions, returned as a logical scalar or numtests-by-1 logical vector. For j = 1,…,numtests:
• h(j) = 1 indicates that test j rejects the null hypothesis H0 that the "cause" variables are not 1-step Granger-causes of the "effect" variables. Sufficient evidence exists to support Granger causality and endogeneity of the "cause" variables.
• h(j) = 0 indicates failure to reject H0.

The number of tests conducted depends on the specified test type (see Type). For more details on the conducted tests, display or return the test summary table (see Display and Summary, respectively).

Summary — Summary of test results
table

Summary of the test results, returned as a table. Each row of Summary corresponds to one of the numtests conducted tests. The columns describe the characteristics of the tests.

    Column Name      Description                                                               Data Type
    H0               Granger causality or block exogeneity test null hypothesis description    String scalar
    Decision         Test decisions corresponding to h                                         String scalar
    Distribution     Test statistic distribution under the null hypothesis                     String scalar
    Statistic        Value of test statistic                                                   Numeric scalar
    PValue           Test p-value                                                              Numeric scalar
    CriticalValue    Critical value for the significance level Alpha                           Numeric scalar
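The Statistic, PValue, CriticalValue, and Decision columns are linked through the null distribution named in the Distribution column. As a rough illustration for the χ2 case, the following base MATLAB sketch recomputes p-values and a critical value from the three Chi2(3) statistics displayed in “Test for Block Exogeneity”. It assumes only that the null distribution is chi-square with 3 degrees of freedom. The helper chi2tail is hypothetical and is built from gammainc; it is not how gctest is documented to compute its outputs.

```matlab
stat  = [26.157; 10.151; 10.651];   % Statistic values typed in from the example display
dof   = 3;                          % degrees of freedom in "Chi2(3)"
alpha = 0.05;                       % significance level

% Hypothetical helper: upper-tail probability of a chi-square(dof) variable,
% expressed with the regularized upper incomplete gamma function.
chi2tail = @(x) gammainc(x/2, dof/2, 'upper');

pValue = chi2tail(stat);                                      % one p-value per test
criticalValue = fzero(@(x) chi2tail(x) - alpha, [0.01 100]);  % solves tail(x) = alpha

h = stat > criticalValue;           % reject H0 when the statistic exceeds the critical value
disp(table(stat, pValue, h))
```

Comparing the statistic with the critical value and comparing the p-value with alpha are equivalent rules, so both reproduce the three "Reject H0" decisions. With Statistics and Machine Learning Toolbox, chi2cdf and chi2inv can replace the helper.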
More About

Granger Causality Test

The Granger causality test is a statistical hypothesis test that assesses whether past and present values of a set of m1 time series variables, called the "cause" variables, affect the predictive distribution of a distinct set of m2 time series variables, called the "effect" variables. The impact is a reduction in forecast mean squared error (MSE) of the "effect" variables. If past values of the "cause" variables affect the "effect" variables h steps into the forecast horizon, the "cause" variables are h-step Granger-causes of the "effect" variables. If the "cause" variables are h-step Granger-causes of the "effect" variables for all h ≥ 1, the "cause" variables Granger-cause the "effect" variables.

gctest provides block-wise, leave-one-out, and exclude-all variations of the Granger causality tests (see 'Type') and χ2-based or F-based Wald tests (see 'Test'). For test statistic forms, see [4].

For all test types, assume the following conditions:
• Future values cannot inform past values.
• The "cause" variables uniquely inform the "effect" variables. No other variables have the information to inform the "effect" variables.

Block-Wise Test
Let y1,t denote the m1 "cause" variables and y2,t denote the m2 "effect" variables. Consider a stationary VAR(p) model on page 12-1237 for [y1,t y2,t]:

$$
\begin{bmatrix} y_{1,t} \\ y_{2,t} \end{bmatrix}
= c + \delta t + \beta x_t
+ \begin{bmatrix} \Phi_{11,1} & \Phi_{12,1} \\ \Phi_{21,1} & \Phi_{22,1} \end{bmatrix}
\begin{bmatrix} y_{1,t-1} \\ y_{2,t-1} \end{bmatrix}
+ \dots
+ \begin{bmatrix} \Phi_{11,p} & \Phi_{12,p} \\ \Phi_{21,p} & \Phi_{22,p} \end{bmatrix}
\begin{bmatrix} y_{1,t-p} \\ y_{2,t-p} \end{bmatrix}
+ \begin{bmatrix} \varepsilon_{1,t} \\ \varepsilon_{2,t} \end{bmatrix}.
$$

If Φ21,1 = … = Φ21,p = 0m2,m1, where 0m2,m1 is an m2-by-m1 matrix of zeros, then y1,t is not the block-wise Granger-cause of y2,t+h for all h ≥ 1. Also, y1,t is block exogenous with respect to y2,t. Consequently, the block-wise Granger causality test hypotheses are:

$$
\begin{aligned}
H_0 &: \Phi_{21,1} = \dots = \Phi_{21,p} = 0_{m_2,m_1} \\
H_1 &: \exists\, j \in \{1,\dots,p\} \text{ such that } \Phi_{21,j} \neq 0_{m_2,m_1}.
\end{aligned}
$$

H1 implies that at least one h ≥ 1 exists such that y1,t is the h-step Granger-cause of y2,t.

Distinct endogenous variables in the VAR model that are not "causes" or "effects" in the block-wise test are conditioning variables. If conditioning variables exist in the model, h = 1. In other words, gctest tests the null hypothesis of 1-step noncausality.

Leave-One-Out Test
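For each response variable and equation in the VAR model, gctest removes lags of a variable from an equation, except self lags, and tests the null hypothesis of 1-step noncausality. Specifically, consider the m-D VAR(p) model

$$
\begin{bmatrix} y_{j,t} \\ y_{k,t} \\ y_{s,t} \end{bmatrix}
= c + \delta t + \beta x_t
+ \begin{bmatrix} \phi_{11,1} & \phi_{12,1} & \phi_{13,1} \\ \phi_{21,1} & \phi_{22,1} & \phi_{23,1} \\ \phi_{31,1} & \phi_{32,1} & \Phi_{33,1} \end{bmatrix}
\begin{bmatrix} y_{j,t-1} \\ y_{k,t-1} \\ y_{s,t-1} \end{bmatrix}
+ \dots
+ \begin{bmatrix} \phi_{11,p} & \phi_{12,p} & \phi_{13,p} \\ \phi_{21,p} & \phi_{22,p} & \phi_{23,p} \\ \phi_{31,p} & \phi_{32,p} & \Phi_{33,p} \end{bmatrix}
\begin{bmatrix} y_{j,t-p} \\ y_{k,t-p} \\ y_{s,t-p} \end{bmatrix}
+ \begin{bmatrix} \varepsilon_{j,t} \\ \varepsilon_{k,t} \\ \varepsilon_{s,t} \end{bmatrix},
$$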
where:
• yj,t and yk,t are 1-D series representing the "cause" and "effect" variables, respectively.
• ys,t is an (m – 2)-D series of all other endogenous variables; s = {1,…,m} \ {j,k}.
• For ℓ = 1,…,p:
  • ϕ11,ℓ, ϕ12,ℓ, ϕ21,ℓ, and ϕ22,ℓ are scalar lag coefficients.
  • ϕ13,ℓ, ϕ31,ℓ, ϕ23,ℓ, and ϕ32,ℓ are (m – 2)-D vectors of lag coefficients.
  • Φ33,ℓ is an (m – 2)-by-(m – 2) matrix of lag coefficients.

For j = 1,…,m, k = 1,…,m, and j ≠ k, gctest tests the null hypothesis that yj,t is not a 1-step Granger-cause of yk,t, given ys,t:

$$
\begin{aligned}
H_0 &: \phi_{21,1} = \dots = \phi_{21,p} = 0 \\
H_1 &: \exists\, r \in \{1,\dots,p\} \text{ such that } \phi_{21,r} \neq 0.
\end{aligned}
$$

gctest conducts m(m – 1) tests.

Exclude-All Test
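For each equation in the VAR model, gctest removes all lags from the equation, except self lags, and tests for h-step noncausality. Specifically, consider the m-D VAR(p) model

$$
\begin{bmatrix} y_{k,t} \\ y_{-k,t} \end{bmatrix}
= c + \delta t + \beta x_t
+ \begin{bmatrix} \phi_{kk,1} & \phi_{k,-k,1} \\ \phi_{-k,k,1} & \Phi_{-k,-k,1} \end{bmatrix}
\begin{bmatrix} y_{k,t-1} \\ y_{-k,t-1} \end{bmatrix}
+ \dots
+ \begin{bmatrix} \phi_{kk,p} & \phi_{k,-k,p} \\ \phi_{-k,k,p} & \Phi_{-k,-k,p} \end{bmatrix}
\begin{bmatrix} y_{k,t-p} \\ y_{-k,t-p} \end{bmatrix}
+ \begin{bmatrix} \varepsilon_{k,t} \\ \varepsilon_{-k,t} \end{bmatrix},
$$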
where:
• y−k,t is an (m – 1)-D series of all endogenous variables in the VAR model (except yk,t) representing the "cause" variables.
• yk,t is the 1-D series representing the "effect" variable.
• For ℓ = 1,…,p:
  • ϕkk,ℓ is a scalar lag coefficient.
  • ϕk,−k,ℓ and ϕ−k,k,ℓ are (m – 1)-D vectors of lag coefficients.
  • Φ−k,−k,ℓ is an (m – 1)-by-(m – 1) matrix of lag coefficients.

For k = 1,…,m, gctest tests the null hypothesis that the variables in y−k,t are not h-step Granger-causes of yk,t. In the notation of this model, the coefficients on lagged y−k,t in the yk,t equation are ϕk,−k,ℓ, so the hypotheses are:

$$
\begin{aligned}
H_0 &: \phi_{k,-k,1} = \dots = \phi_{k,-k,p} = 0 \\
H_1 &: \exists\, r \in \{1,\dots,p\} \text{ such that } \phi_{k,-k,r} \neq 0.
\end{aligned}
$$

gctest conducts m tests.

Vector Autoregression Model

A vector autoregression (VAR) model is a stationary multivariate time series model consisting of a system of m equations of m distinct response variables as linear functions of lagged responses and other terms.

A VAR(p) model in difference-equation notation and in reduced form is

$$
y_t = c + \Phi_1 y_{t-1} + \Phi_2 y_{t-2} + \dots + \Phi_p y_{t-p} + \beta x_t + \delta t + \varepsilon_t .
$$

• yt is a numseries-by-1 vector of values corresponding to numseries response variables at time t, where t = 1,...,T. The structural coefficient is the identity matrix.
• c is a numseries-by-1 vector of constants.
• Φj is a numseries-by-numseries matrix of autoregressive coefficients, where j = 1,...,p and Φp is not a matrix containing only zeros.
• xt is a numpreds-by-1 vector of values corresponding to numpreds exogenous predictor variables.
• β is a numseries-by-numpreds matrix of regression coefficients.
• δ is a numseries-by-1 vector of linear time-trend values.
• εt is a numseries-by-1 vector of random Gaussian innovations, each with a mean of 0 and collectively a numseries-by-numseries covariance matrix Σ. For t ≠ s, εt and εs are independent.

Condensed and in lag operator notation, the system is

$$
\Phi(L) y_t = c + \beta x_t + \delta t + \varepsilon_t ,
$$

where $\Phi(L) = I - \Phi_1 L - \Phi_2 L^2 - \dots - \Phi_p L^p$ is the multivariate autoregressive polynomial applied to yt, and I is the numseries-by-numseries identity matrix.

For example, a VAR(1) model containing two response series and three exogenous predictor variables has this form:

$$
\begin{aligned}
y_{1,t} &= c_1 + \phi_{11} y_{1,t-1} + \phi_{12} y_{2,t-1} + \beta_{11} x_{1,t} + \beta_{12} x_{2,t} + \beta_{13} x_{3,t} + \delta_1 t + \varepsilon_{1,t} \\
y_{2,t} &= c_2 + \phi_{21} y_{1,t-1} + \phi_{22} y_{2,t-1} + \beta_{21} x_{1,t} + \beta_{22} x_{2,t} + \beta_{23} x_{3,t} + \delta_2 t + \varepsilon_{2,t} .
\end{aligned}
$$
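To make the two-equation VAR(1) form concrete, the following base MATLAB sketch generates responses directly from the difference equation. Every numeric value (coefficients, sample size, predictors, and the zero presample response) is a hypothetical choice made for illustration, and the loop is a plain recursion rather than the toolbox simulate function. Setting ϕ21 = 0 means that lagged y1 does not enter the y2 equation, which is the restriction a Granger causality test of y1 on y2 examines in this two-variable system.

```matlab
rng(1);                              % reproducible random draws
T     = 200;                         % sample size (hypothetical)
c     = [0.1; -0.2];                 % constants c1 and c2
Phi   = [0.5 0.1; 0 0.3];            % [phi11 phi12; phi21 phi22], with phi21 = 0
Beta  = [0.2 0 -0.1; 0 0.4 0.3];     % numseries-by-numpreds regression coefficients
delta = [0.01; 0];                   % linear time-trend coefficients
Sigma = [1 0.3; 0.3 1];              % innovation covariance matrix

X = randn(T,3);                      % three exogenous predictors (hypothetical data)
E = randn(T,2)*chol(Sigma);          % Gaussian innovations with covariance Sigma

Y = zeros(T,2);
yPrev = [0; 0];                      % presample response y_0 (assumed zero)
for t = 1:T
    % y_t = c + Phi*y_{t-1} + Beta*x_t + delta*t + e_t
    y = c + Phi*yPrev + Beta*X(t,:)' + delta*t + E(t,:)';
    Y(t,:) = y';
    yPrev = y;
end
```

Each row of Y holds the two responses at one time point, in the same orientation as the data passed to estimate in the examples.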
Tip
• gctest uses the series names in Mdl in test result summaries. To make the output more meaningful for your application, specify series names by setting the SeriesNames property of the VAR model object Mdl by using dot notation before calling gctest. For example, the following code assigns names to the variables in the 3-D VAR model object Mdl:

    Mdl.SeriesNames = ["rGDP" "m1sl" "inflation"];

• The exclude-all and leave-one-out Granger causality tests conduct multiple, simultaneous tests. Conducting several tests at the same level increases the probability of at least one false rejection (the family-wise error rate). To control this increase, decrease the level of significance Alpha when you conduct multiple tests. For example, to achieve a family-wise significance level of at most 0.05, specify 'Alpha',0.05/numtests.
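The adjustment in the last tip can be worked through numerically. This base MATLAB sketch assumes a 3-D VAR model and uses the test counts given for the Type argument. The p-values are the ones displayed in “Access Test p-Values”; the variable names are hypothetical.

```matlab
m = 3;                               % number of response series in the VAR model
alphaFamily = 0.05;                  % target family-wise significance level

numtestsLOO = m*(m - 1);             % leave-one-out conducts m(m - 1) tests
numtestsEA  = m;                     % exclude-all conducts m tests

alphaLOO = alphaFamily/numtestsLOO;  % per-test level to pass as 'Alpha' (leave-one-out)
alphaEA  = alphaFamily/numtestsEA;   % per-test level to pass as 'Alpha' (exclude-all)

% Apply the leave-one-out adjustment to the p-values from the earlier example.
pvalues   = [0.0698; 0.4648; 0.4398; 0.0025; 0.0708; 0.0074];
hAdjusted = pvalues < alphaLOO;
fprintf('Per-test Alpha: %.4f (leave-one-out), %.4f (exclude-all)\n', alphaLOO, alphaEA);
```

Here the adjusted leave-one-out decisions are the same as the unadjusted ones, because both rejected tests have p-values below 0.05/6.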
Algorithms

The name-value pair arguments Cause and Effect apply to the block-wise Granger causality test because they specify which equations have lag coefficients set to 0 for the null hypothesis. Because the leave-one-out and exclude-all Granger causality tests cycle through all combinations of variables in the VAR model, the information provided by Cause and Effect is not necessary. However, you can specify a leave-one-out or exclude-all Granger causality test and the Cause and Effect variables to conduct unusual tests such as constraints on self lags. For example, the following code assesses the null hypothesis that the first variable in the VAR model Mdl is not the 1-step Granger-cause of itself:

    gctest(Mdl,'Type',"leave-one-out",'Cause',1,'Effect',1);
Version History Introduced in R2019a
References

[1] Granger, C. W. J. "Investigating Causal Relations by Econometric Models and Cross-Spectral Methods." Econometrica. Vol. 37, 1969, pp. 424–438.
[2] Hamilton, James D. Time Series Analysis. Princeton, NJ: Princeton University Press, 1994.
[3] Dolado, J. J., and H. Lütkepohl. "Making Wald Tests Work for Cointegrated VAR Systems." Econometric Reviews. Vol. 15, 1996, pp. 369–386.
[4] Lütkepohl, Helmut. New Introduction to Multiple Time Series Analysis. New York, NY: Springer-Verlag, 2007.
[5] Toda, H. Y., and T. Yamamoto. "Statistical Inferences in Vector Autoregressions with Possibly Integrated Processes." Journal of Econometrics. Vol. 66, 1995, pp. 225–250.
See Also

Objects
varm

Functions
armairf | armafevd | estimate | gctest

Topics
“Estimate Vector Autoregression Model Using Econometric Modeler” on page 4-155
gjr
GJR conditional variance time series model

Description

Use gjr to specify a univariate GJR (Glosten, Jagannathan, and Runkle) model. The gjr function returns a gjr object specifying the functional form of a GJR(P,Q) model on page 12-1256, and stores its parameter values.

The key components of a gjr model include the:
• GARCH polynomial, which is composed of lagged conditional variances. The degree is denoted by P.
• ARCH polynomial, which is composed of the lagged squared innovations.
• Leverage polynomial, which is composed of lagged squared, negative innovations.
• Maximum of the ARCH and leverage polynomial degrees, denoted by Q.

P is the maximum nonzero lag in the GARCH polynomial, and Q is the maximum nonzero lag in the ARCH and leverage polynomials. Other model components include an innovation mean model offset, a conditional variance model constant, and the innovations distribution.

All coefficients are unknown (NaN values) and estimable unless you specify their values using name-value pair argument syntax. To estimate models containing all or partially unknown parameter values given data, use estimate. For completely specified models (models in which all parameter values are known), simulate or forecast responses using simulate or forecast, respectively.
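For orientation, the components listed above combine as follows in the usual statement of the GJR(P,Q) conditional variance equation. The symbols are chosen to match those in the UnconditionalVariance property formula (κ for the constant, γ for GARCH, α for ARCH, and ξ for leverage coefficients). Treat this as a sketch of the standard form; the definition on page 12-1256 is authoritative.

```latex
\sigma_t^2 = \kappa
  + \sum_{i=1}^{P} \gamma_i \,\sigma_{t-i}^2
  + \sum_{j=1}^{Q} \alpha_j \,\varepsilon_{t-j}^2
  + \sum_{j=1}^{Q} \xi_j \, I\!\left[\varepsilon_{t-j} < 0\right] \varepsilon_{t-j}^2
```

Here σt² is the conditional variance at time t, εt is the innovation, and the indicator I[εt−j < 0] equals 1 when the lagged innovation is negative and 0 otherwise, so the leverage terms add to the variance only after negative innovations.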
Creation

Syntax

    Mdl = gjr
    Mdl = gjr(P,Q)
    Mdl = gjr(Name,Value)

Description

Mdl = gjr returns a zero-degree conditional variance gjr object.

Mdl = gjr(P,Q) creates a GJR conditional variance model object (Mdl) with a GARCH polynomial with a degree of P and ARCH and leverage polynomials each with a degree of Q. All polynomials contain all consecutive lags from 1 through their degrees, and all coefficients are NaN values.

This shorthand syntax enables you to create a template in which you specify the polynomial degrees explicitly. The model template is suited for unrestricted parameter estimation, that is, estimation without any parameter equality constraints. However, after you create a model, you can alter property values using dot notation.
Mdl = gjr(Name,Value) sets properties on page 12-1242 or additional options using name-value pair arguments. Enclose each property name in quotes. For example, 'ARCHLags',[1 4],'ARCH',{0.2 0.3} specifies the two ARCH coefficients in ARCH at lags 1 and 4.

This longhand syntax enables you to create more flexible models.

Input Arguments

The shorthand syntax provides an easy way for you to create model templates that are suitable for unrestricted parameter estimation. For example, to create a GJR(1,2) model containing unknown parameter values, enter:

    Mdl = gjr(1,2);

To impose equality constraints on parameter values during estimation, set the appropriate property on page 12-1242 values using dot notation.

P — GARCH polynomial degree
nonnegative integer

GARCH polynomial degree, specified as a nonnegative integer. In the GARCH polynomial and at time t, MATLAB includes all consecutive conditional variance terms from lag t – 1 through lag t – P.

You can specify this argument using the gjr(P,Q) shorthand syntax only.

If P > 0, then you must specify Q as a positive integer.
Example: gjr(1,1)
Data Types: double

Q — ARCH polynomial degree
nonnegative integer

ARCH polynomial degree, specified as a nonnegative integer. At time t, MATLAB includes all consecutive squared innovation terms (for the ARCH polynomial) and squared, negative innovation terms (for the leverage polynomial) from lag t – 1 through lag t – Q.

You can specify this argument using the gjr(P,Q) shorthand syntax only.

If P > 0, then you must specify Q as a positive integer.
Example: gjr(1,1)
Data Types: double

Name-Value Pair Arguments
Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter. Before R2021a, use commas to separate each name and value, and enclose Name in quotes.

The longhand syntax enables you to create models in which some or all coefficients are known. During estimation, estimate imposes equality constraints on any known parameters.

Example: 'ARCHLags',[1 4],'ARCH',{NaN NaN} specifies a GJR(0,4) model and unknown, but nonzero, ARCH coefficients at lags 1 and 4.

GARCHLags — GARCH polynomial lags
1:P (default) | numeric vector of unique positive integers

GARCH polynomial lags, specified as the comma-separated pair consisting of 'GARCHLags' and a numeric vector of unique positive integers.

GARCHLags(j) is the lag corresponding to the coefficient GARCH{j}. The lengths of GARCHLags and GARCH must be equal.

Assuming all GARCH coefficients (specified by the GARCH property) are positive or NaN values, max(GARCHLags) determines the value of the P property.
Example: 'GARCHLags',[1 4]
Data Types: double

ARCHLags — ARCH polynomial lags
1:Q (default) | numeric vector of unique positive integers

ARCH polynomial lags, specified as the comma-separated pair consisting of 'ARCHLags' and a numeric vector of unique positive integers.

ARCHLags(j) is the lag corresponding to the coefficient ARCH{j}. The lengths of ARCHLags and ARCH must be equal.

Assuming all ARCH and leverage coefficients (specified by the ARCH and Leverage properties) are positive or NaN values, max([ARCHLags LeverageLags]) determines the value of the Q property.
Example: 'ARCHLags',[1 4]
Data Types: double

LeverageLags — Leverage polynomial lags
1:Q (default) | numeric vector of unique positive integers

Leverage polynomial lags, specified as the comma-separated pair consisting of 'LeverageLags' and a numeric vector of unique positive integers.

LeverageLags(j) is the lag corresponding to the coefficient Leverage{j}. The lengths of LeverageLags and Leverage must be equal.

Assuming all ARCH and leverage coefficients (specified by the ARCH and Leverage properties) are positive or NaN values, max([ARCHLags LeverageLags]) determines the value of the Q property.
Example: 'LeverageLags',1:4
Data Types: double
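As a brief illustration of how the lag arguments determine the model degrees, the following sketch combines the example values shown for the three lag arguments. It requires Econometrics Toolbox, and the comments about P and Q follow from the max(GARCHLags) and max([ARCHLags LeverageLags]) rules stated above rather than from a recorded run.

```matlab
% Requires Econometrics Toolbox. All coefficients are left as NaN (unknown).
Mdl = gjr('GARCHLags',[1 4],'ARCHLags',[1 4],'LeverageLags',1:4);

% Per the rules above: P = max(GARCHLags) and Q = max([ARCHLags LeverageLags]).
disp(Mdl.P)
disp(Mdl.Q)
```

Because P and Q are read-only, changing the lag structure means creating a new model or editing the coefficient properties with dot notation.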
Properties

You can set writable property values when you create the model object by using name-value pair argument syntax, or after you create the model object by using dot notation. For example, to create a GJR(1,1) model with unknown coefficients, and then specify a t innovation distribution with unknown degrees of freedom, enter:

    Mdl = gjr('GARCHLags',1,'ARCHLags',1);
    Mdl.Distribution = "t";
P — GARCH polynomial degree
nonnegative integer

This property is read-only.

GARCH polynomial degree, specified as a nonnegative integer. P is the maximum lag in the GARCH polynomial with a coefficient that is positive or NaN. Lags that are less than P can have coefficients equal to 0.

P specifies the minimum number of presample conditional variances required to initialize the model.

If you use name-value pair arguments to create the model, then MATLAB implements one of these alternatives (assuming the coefficient of the largest lag is positive or NaN):
• If you specify GARCHLags, then P is the largest specified lag.
• If you specify GARCH, then P is the number of elements of the specified value. If you also specify GARCHLags, then gjr uses GARCHLags to determine P instead.
• Otherwise, P is 0.
Data Types: double

Q — Maximum degree of ARCH and leverage polynomials
nonnegative integer

This property is read-only.

Maximum degree of ARCH and leverage polynomials, specified as a nonnegative integer. Q is the maximum lag in the ARCH and leverage polynomials in the model. In either type of polynomial, lags that are less than Q can have coefficients equal to 0.

Q specifies the minimum number of presample innovations required to initialize the model.

If you use name-value pair arguments to create the model, then MATLAB implements one of these alternatives (assuming the coefficients of the largest lags in the ARCH and leverage polynomials are positive or NaN):
• If you specify ARCHLags or LeverageLags, then Q is the maximum between the two specifications.
• If you specify ARCH or Leverage, then Q is the maximum number of elements between the two specifications. If you also specify ARCHLags or LeverageLags, then gjr uses their values to determine Q instead.
• Otherwise, Q is 0.
Data Types: double

Constant — Conditional variance model constant
NaN (default) | positive scalar

Conditional variance model constant, specified as a positive scalar or NaN value.
Data Types: double
GARCH — GARCH polynomial coefficients
cell vector of positive scalars or NaN values

GARCH polynomial coefficients, specified as a cell vector of positive scalars or NaN values.
• If you specify GARCHLags, then the following conditions apply.
  • The lengths of GARCH and GARCHLags are equal.
  • GARCH{j} is the coefficient of lag GARCHLags(j).
  • By default, GARCH is a numel(GARCHLags)-by-1 cell vector of NaN values.
• Otherwise, the following conditions apply.
  • The length of GARCH is P.
  • GARCH{j} is the coefficient of lag j.
  • By default, GARCH is a P-by-1 cell vector of NaN values.

The coefficients in GARCH correspond to coefficients in an underlying LagOp lag operator polynomial, and are subject to a near-zero tolerance exclusion test. If you set a coefficient to 1e–12 or below, gjr excludes that coefficient and its corresponding lag in GARCHLags from the model.
Data Types: cell

ARCH — ARCH polynomial coefficients
cell vector of positive scalars or NaN values

ARCH polynomial coefficients, specified as a cell vector of positive scalars or NaN values.
• If you specify ARCHLags, then the following conditions apply.
  • The lengths of ARCH and ARCHLags are equal.
  • ARCH{j} is the coefficient of lag ARCHLags(j).
  • By default, ARCH is a Q-by-1 cell vector of NaN values. For more details, see the Q property.
• Otherwise, the following conditions apply.
  • The length of ARCH is Q.
  • ARCH{j} is the coefficient of lag j.
  • By default, ARCH is a Q-by-1 cell vector of NaN values.

The coefficients in ARCH correspond to coefficients in an underlying LagOp lag operator polynomial, and are subject to a near-zero tolerance exclusion test. If you set a coefficient to 1e–12 or below, gjr excludes that coefficient and its corresponding lag in ARCHLags from the model.
Data Types: cell

Leverage — Leverage polynomial coefficients
cell vector of numeric scalars or NaN values

Leverage polynomial coefficients, specified as a cell vector of numeric scalars or NaN values.
• If you specify LeverageLags, then the following conditions apply.
  • The lengths of Leverage and LeverageLags are equal.
  • Leverage{j} is the coefficient of lag LeverageLags(j).
  • By default, Leverage is a Q-by-1 cell vector of NaN values. For more details, see the Q property.
• Otherwise, the following conditions apply.
  • The length of Leverage is Q.
  • Leverage{j} is the coefficient of lag j.
  • By default, Leverage is a Q-by-1 cell vector of NaN values.

The coefficients in Leverage correspond to coefficients in an underlying LagOp lag operator polynomial, and are subject to a near-zero tolerance exclusion test. If you set a coefficient to 1e–12 or below, gjr excludes that coefficient and its corresponding lag in LeverageLags from the model.
Data Types: cell

UnconditionalVariance — Model unconditional variance
positive scalar

This property is read-only.

The model unconditional variance, specified as a positive scalar. The unconditional variance is

$$
\sigma_\varepsilon^2 = \frac{\kappa}{1 - \sum_{i=1}^{P} \gamma_i - \sum_{j=1}^{Q} \alpha_j - \frac{1}{2} \sum_{j=1}^{Q} \xi_j} .
$$

κ is the conditional variance model constant (Constant).
Data Types: double

Offset — Innovation mean model offset
0 (default) | numeric scalar | NaN

Innovation mean model offset, or additive constant, specified as a numeric scalar or NaN value.
Data Types: double

Distribution — Conditional probability distribution of innovation process
"Gaussian" (default) | "t" | structure array

Conditional probability distribution of the innovation process, specified as a string or structure array. gjr stores the value as a structure array.

    Distribution    String        Structure Array
    Gaussian        "Gaussian"    struct('Name',"Gaussian")
    Student’s t     "t"           struct('Name',"t",'DoF',DoF)

The 'DoF' field specifies the t distribution degrees of freedom parameter.
• DoF > 2 or DoF = NaN.
• DoF is estimable.
• If you specify "t", DoF is NaN by default. You can change its value by using dot notation after you create the model. For example, Mdl.Distribution.DoF = 3.
• If you supply a structure array to specify the Student's t distribution, then you must specify both the 'Name' and 'DoF' fields.
Example: struct('Name',"t",'DoF',10)

Description — Model description
string scalar | character vector

Model description, specified as a string scalar or character vector. gjr stores the value as a string scalar. The default value describes the parametric form of the model, for example "GJR(1,1) Conditional Variance Model (Gaussian Distribution)".
Data Types: string | char

Note
• All NaN-valued model parameters, which include coefficients and the t-innovation-distribution degrees of freedom (if present), are estimable. When you pass the resulting gjr object and data to estimate, MATLAB estimates all NaN-valued parameters. During estimation, estimate treats known parameters as equality constraints, that is, estimate holds any known parameters fixed at their values.
• Typically, the lags in the ARCH and leverage polynomials are the same, but their equality is not a requirement. Differing polynomials occur when:
  • Either ARCH{Q} or Leverage{Q} meets the near-zero exclusion tolerance. In this case, MATLAB excludes the corresponding lag from the polynomial.
  • You specify polynomials of differing lengths by specifying ARCHLags or LeverageLags, or by setting the ARCH or Leverage property.
  In either case, Q is the maximum lag between the two polynomials.
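The formula given for the UnconditionalVariance property can be evaluated by hand for a fully specified model. The following base MATLAB sketch does so for a hypothetical GJR(1,1) parameter set; the coefficient values and variable names are illustrative assumptions, and the sketch does not create a gjr object or read its UnconditionalVariance property.

```matlab
% Hypothetical GJR(1,1) coefficients, stored as cell vectors in the same way
% as the GARCH, ARCH, and Leverage properties.
Constant = 0.1;      % kappa, the conditional variance model constant
GARCH    = {0.6};    % gamma_1
ARCH     = {0.1};    % alpha_1
Leverage = {0.2};    % xi_1

% Sum in the denominator: GARCH + ARCH + half of the leverage coefficients.
persistence = sum([GARCH{:}]) + sum([ARCH{:}]) + 0.5*sum([Leverage{:}]);

if persistence < 1
    uncondVar = Constant/(1 - persistence);   % kappa/(1 - sums), as in the formula
else
    uncondVar = NaN;                          % the formula does not yield a positive finite value
end
fprintf('Unconditional variance: %.4f\n', uncondVar);
```

With these values the denominator is 1 – 0.6 – 0.1 – 0.1 = 0.2, so the unconditional variance is 0.1/0.2 = 0.5.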
Object Functions
estimate     Fit conditional variance model to data
filter       Filter disturbances through conditional variance model
forecast     Forecast conditional variances from conditional variance models
infer        Infer conditional variances of conditional variance models
simulate     Monte Carlo simulation of conditional variance models
summarize    Display estimation results of conditional variance model
Examples

Create Default GJR Model

Create a default gjr model object and specify its parameter values using dot notation.

Create a GJR(0,0) model.

Mdl = gjr

Mdl = 
  gjr with properties:

     Description: "GJR(0,0) Conditional Variance Model (Gaussian Distribution)"
    Distribution: Name = "Gaussian"
               P: 0
               Q: 0
        Constant: NaN
           GARCH: {}
            ARCH: {}
        Leverage: {}
          Offset: 0

Mdl is a gjr model object. It contains an unknown constant, its offset is 0, and the innovation distribution is 'Gaussian'. The model does not have GARCH, ARCH, or leverage polynomials.

Specify two unknown ARCH and leverage coefficients for lags one and two using dot notation.

Mdl.ARCH = {NaN NaN};
Mdl.Leverage = {NaN NaN};
Mdl

Mdl = 
  gjr with properties:

     Description: "GJR(0,2) Conditional Variance Model (Gaussian Distribution)"
    Distribution: Name = "Gaussian"
               P: 0
               Q: 2
        Constant: NaN
           GARCH: {}
            ARCH: {NaN NaN} at lags [1 2]
        Leverage: {NaN NaN} at lags [1 2]
          Offset: 0

The Q, ARCH, and Leverage properties update to 2, {NaN NaN}, and {NaN NaN}, respectively. The two ARCH and leverage coefficients are associated with lags 1 and 2.
Create GJR Model Using Shorthand Syntax

Create a gjr model object using the shorthand notation gjr(P,Q), where P is the degree of the GARCH polynomial and Q is the degree of the ARCH and leverage polynomials.

Create a GJR(3,2) model.

Mdl = gjr(3,2)

Mdl = 
  gjr with properties:

     Description: "GJR(3,2) Conditional Variance Model (Gaussian Distribution)"
    Distribution: Name = "Gaussian"
               P: 3
               Q: 2
        Constant: NaN
           GARCH: {NaN NaN NaN} at lags [1 2 3]
            ARCH: {NaN NaN} at lags [1 2]
        Leverage: {NaN NaN} at lags [1 2]
          Offset: 0

Mdl is a gjr model object. The model constant and all GARCH, ARCH, and leverage coefficients of Mdl are NaN values. By default, the software:
• Includes a conditional variance model constant
• Excludes a conditional mean model offset (i.e., the offset is 0)
• Includes all lag terms in the GARCH polynomial up to lag P
• Includes all lag terms in the ARCH and leverage polynomials up to lag Q

Mdl specifies only the functional form of a GJR model. Because it contains unknown parameter values, you can pass Mdl and time-series data to estimate to estimate the parameters.
Create GJR Model Using Longhand Syntax

Create a gjr model using name-value pair arguments.

Specify a GJR(1,1) model.

Mdl = gjr('GARCHLags',1,'ARCHLags',1,'LeverageLags',1)

Mdl = 
  gjr with properties:

     Description: "GJR(1,1) Conditional Variance Model (Gaussian Distribution)"
    Distribution: Name = "Gaussian"
               P: 1
               Q: 1
        Constant: NaN
           GARCH: {NaN} at lag [1]
            ARCH: {NaN} at lag [1]
        Leverage: {NaN} at lag [1]
          Offset: 0

Mdl is a gjr model object. The software sets all parameters to NaN, except P, Q, Distribution, and Offset (which is 0 by default).

Because Mdl contains NaN values, Mdl is appropriate for estimation only. Pass Mdl and time-series data to estimate.
Create GJR Model with Known Coefficients

Create a GJR(1,3) model with mean offset

$$y_t = 0.5 + \varepsilon_t,$$

where $\varepsilon_t = \sigma_t z_t$,

$$\sigma_t^2 = 0.0001 + 0.35\sigma_{t-1}^2 + 0.1\varepsilon_{t-1}^2 + 0.03\varepsilon_{t-1}^2 I(\varepsilon_{t-1} < 0) + 0.01\varepsilon_{t-3}^2 I(\varepsilon_{t-3} < 0),$$

and $z_t$ is an independent and identically distributed standard Gaussian process.

Mdl = gjr('Constant',0.0001,'GARCH',0.35,...
    'ARCH',0.1,'Offset',0.5,'Leverage',{0.03 0 0.01})

Mdl = 
  gjr with properties:

     Description: "GJR(1,3) Conditional Variance Model with Offset (Gaussian Distribution)"
    Distribution: Name = "Gaussian"
               P: 1
               Q: 3
        Constant: 0.0001
           GARCH: {0.35} at lag [1]
            ARCH: {0.1} at lag [1]
        Leverage: {0.03 0.01} at lags [1 3]
          Offset: 0.5
gjr assigns default values to any properties you do not specify with name-value pair arguments. An alternative way to specify the leverage component is 'Leverage',{0.03 0.01},'LeverageLags',[1 3].
Access GJR Model Properties

Access the properties of a gjr model object using dot notation.

Create a gjr model object.

Mdl = gjr(3,2)

Mdl = 
  gjr with properties:

     Description: "GJR(3,2) Conditional Variance Model (Gaussian Distribution)"
    Distribution: Name = "Gaussian"
               P: 3
               Q: 2
        Constant: NaN
           GARCH: {NaN NaN NaN} at lags [1 2 3]
            ARCH: {NaN NaN} at lags [1 2]
        Leverage: {NaN NaN} at lags [1 2]
          Offset: 0

Remove the second GARCH term from the model. That is, specify that the GARCH coefficient of the second lagged conditional variance is 0.

Mdl.GARCH{2} = 0

Mdl = 
  gjr with properties:

     Description: "GJR(3,2) Conditional Variance Model (Gaussian Distribution)"
    Distribution: Name = "Gaussian"
               P: 3
               Q: 2
        Constant: NaN
           GARCH: {NaN NaN} at lags [1 3]
            ARCH: {NaN NaN} at lags [1 2]
        Leverage: {NaN NaN} at lags [1 2]
          Offset: 0

The GARCH polynomial has two unknown parameters corresponding to lags 1 and 3.

Display the distribution of the disturbances.

Mdl.Distribution

ans = struct with fields:
    Name: "Gaussian"

The disturbances are Gaussian with mean 0 and variance 1.

Specify that the underlying disturbances have a t distribution with five degrees of freedom.

Mdl.Distribution = struct('Name','t','DoF',5)

Mdl = 
  gjr with properties:

     Description: "GJR(3,2) Conditional Variance Model (t Distribution)"
    Distribution: Name = "t", DoF = 5
               P: 3
               Q: 2
        Constant: NaN
           GARCH: {NaN NaN} at lags [1 3]
            ARCH: {NaN NaN} at lags [1 2]
        Leverage: {NaN NaN} at lags [1 2]
          Offset: 0

Specify that the ARCH coefficients are 0.2 for the first lag and 0.1 for the second lag.

Mdl.ARCH = {0.2 0.1}

Mdl = 
  gjr with properties:

     Description: "GJR(3,2) Conditional Variance Model (t Distribution)"
    Distribution: Name = "t", DoF = 5
               P: 3
               Q: 2
        Constant: NaN
           GARCH: {NaN NaN} at lags [1 3]
            ARCH: {0.2 0.1} at lags [1 2]
        Leverage: {NaN NaN} at lags [1 2]
          Offset: 0
To estimate the remaining parameters, you can pass Mdl and your data to estimate and use the specified parameters as equality constraints. Or, you can specify the rest of the parameter values, and then simulate or forecast conditional variances from the GJR model by passing the fully specified model to simulate or forecast, respectively.
Estimate GJR Model

Fit a GJR model to an annual time series of stock price index returns from 1861-1970.

Load the Nelson-Plosser data set. Convert the yearly stock price indices (SP) to returns. Plot the returns.

load Data_NelsonPlosser;
sp = price2ret(DataTable.SP);

figure;
plot(dates(2:end),sp);
hold on;
plot([dates(2) dates(end)],[0 0],'r:'); % Plot y = 0
hold off;
title('Returns');
ylabel('Return (%)');
xlabel('Year');
axis tight;

The return series does not seem to have a conditional mean offset, and seems to exhibit volatility clustering. That is, the variability is smaller for earlier years than it is for later years. For this example, assume that a GJR(1,1) model is appropriate for this series.

Create a GJR(1,1) model. The conditional mean offset is zero by default. The software includes a conditional variance model constant by default.

Mdl = gjr('GARCHLags',1,'ARCHLags',1,'LeverageLags',1);
Fit the GJR(1,1) model to the data.

EstMdl = estimate(Mdl,sp);

    GJR(1,1) Conditional Variance Model (Gaussian Distribution):

                     Value      StandardError    TStatistic     PValue 
                   _________    _____________    __________    ________

    Constant       0.0045728      0.0044199        1.0346       0.30086
    GARCH{1}         0.55808           0.24        2.3253      0.020057
    ARCH{1}          0.20461        0.17886         1.144       0.25263
    Leverage{1}      0.18066        0.26802       0.67406       0.50027
EstMdl is a fully specified gjr model object. That is, it does not contain NaN values. You can assess the adequacy of the model by generating residuals using infer, and then analyzing them.

To simulate conditional variances or responses, pass EstMdl to simulate.

To forecast conditional variances, pass EstMdl to forecast.
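The adequacy check mentioned above can be sketched as follows. This is a minimal illustration, not a prescribed workflow: it assumes that standardized residuals are formed by subtracting the model offset and dividing by the inferred conditional standard deviations, and it uses lbqtest on the squared standardized residuals as one possible diagnostic. The call to rmmissing is an added precaution so that the return series and the inferred variances have the same length.

```matlab
% Sketch: standardized residual diagnostics for a fitted GJR(1,1) model.
% Assumes Econometrics Toolbox and the Data_NelsonPlosser data set.
load Data_NelsonPlosser
sp = rmmissing(price2ret(DataTable.SP));   % drop missing returns up front

EstMdl = estimate(gjr(1,1),sp,'Display','off');

v = infer(EstMdl,sp);                      % inferred conditional variances
z = (sp - EstMdl.Offset)./sqrt(v);         % standardized residuals

% Ljung-Box Q-test on squared standardized residuals: remaining
% autocorrelation would suggest unmodeled conditional heteroscedasticity.
[h,pValue] = lbqtest(z.^2);
```

If h is 0, the test does not reject the null hypothesis of no residual autocorrelation in the squared standardized residuals at the default significance level.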
Simulate GJR Model Observations and Conditional Variances

Simulate conditional variance or response paths from a fully specified gjr model object. That is, simulate from an estimated gjr model or a known gjr model in which you specify all parameter values.

Load the Nelson-Plosser data set. Convert the yearly stock price indices to returns.

load Data_NelsonPlosser;
sp = price2ret(DataTable.SP);

Create a GJR(1,1) model. Fit the model to the return series.

Mdl = gjr(1,1);
EstMdl = estimate(Mdl,sp);

    GJR(1,1) Conditional Variance Model (Gaussian Distribution):

                     Value      StandardError    TStatistic     PValue 
                   _________    _____________    __________    ________

    Constant       0.0045728      0.0044199        1.0346       0.30086
    GARCH{1}         0.55808           0.24        2.3253      0.020057
    ARCH{1}          0.20461        0.17886         1.144       0.25263
    Leverage{1}      0.18066        0.26802       0.67406       0.50027

Simulate 100 paths of conditional variances and responses from the estimated GJR model.

numObs = numel(sp); % Sample size (T)
numPaths = 100;     % Number of paths to simulate
rng(1);             % For reproducibility
[VSim,YSim] = simulate(EstMdl,numObs,'NumPaths',numPaths);
VSim and YSim are T-by-numPaths matrices. Rows correspond to a sample period, and columns correspond to a simulated path.

Plot the average and the 97.5% and 2.5% percentiles of the simulated paths. Compare the simulation statistics to the original data.

dates = dates(2:end);
VSimBar = mean(VSim,2);
VSimCI = quantile(VSim,[0.025 0.975],2);
YSimBar = mean(YSim,2);
YSimCI = quantile(YSim,[0.025 0.975],2);

figure;
subplot(2,1,1);
h1 = plot(dates,VSim,'Color',0.8*ones(1,3));
hold on;
h2 = plot(dates,VSimBar,'k--','LineWidth',2);
h3 = plot(dates,VSimCI,'r--','LineWidth',2);
hold off;
title('Simulated Conditional Variances');
ylabel('Cond. var.');
xlabel('Year');
axis tight;

subplot(2,1,2);
h1 = plot(dates,YSim,'Color',0.8*ones(1,3));
hold on;
h2 = plot(dates,YSimBar,'k--','LineWidth',2);
h3 = plot(dates,YSimCI,'r--','LineWidth',2);
hold off;
title('Simulated Nominal Returns');
ylabel('Nominal return (%)');
xlabel('Year');
axis tight;
legend([h1(1) h2 h3(1)],{'Simulated path' 'Mean' 'Confidence bounds'},...
    'FontSize',7,'Location','NorthWest');
Forecast GJR Model Conditional Variances

Forecast conditional variances from a fully specified gjr model object. That is, forecast from an estimated gjr model or a known gjr model in which you specify all parameter values.

Load the Nelson-Plosser data set. Convert the yearly stock price indices (SP) to returns.

load Data_NelsonPlosser;
sp = price2ret(DataTable.SP);

Create a GJR(1,1) model and fit it to the return series.

Mdl = gjr('GARCHLags',1,'ARCHLags',1,'LeverageLags',1);
EstMdl = estimate(Mdl,sp);

    GJR(1,1) Conditional Variance Model (Gaussian Distribution):

                     Value      StandardError    TStatistic     PValue 
                   _________    _____________    __________    ________

    Constant       0.0045728      0.0044199        1.0346       0.30086
    GARCH{1}         0.55808           0.24        2.3253      0.020057
    ARCH{1}          0.20461        0.17886         1.144       0.25263
    Leverage{1}      0.18066        0.26802       0.67406       0.50027

Forecast the conditional variance of the nominal return series 10 years into the future using the estimated GJR model. Specify the entire return series as presample observations. The software infers presample conditional variances using the presample observations and the model.

numPeriods = 10;
vF = forecast(EstMdl,numPeriods,sp);

Plot the forecasted conditional variances of the nominal returns. Compare the forecasts to the observed conditional variances.

v = infer(EstMdl,sp);
nV = size(v,1);
dates = dates((end - nV + 1):end);

figure;
plot(dates,v,'k:','LineWidth',2);
hold on;
plot(dates(end):dates(end) + 10,[v(end);vF],'r','LineWidth',2);
title('Forecasted Conditional Variances of Returns');
ylabel('Conditional variances');
xlabel('Year');
axis tight;
legend({'Estimation Sample Cond. Var.','Forecasted Cond. var.'},...
    'Location','NorthWest');
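As a rough sanity check on the forecasts, the long-run level that multi-step variance forecasts approach can be computed directly from the estimates in the table above. The sketch below relies on a standard result that this page does not state: for a stationary GJR(1,1) model with a symmetric innovation distribution, the unconditional variance is the constant divided by one minus the persistence, where the leverage coefficient enters with weight 1/2. The variable names are illustrative.

```matlab
% Sketch: unconditional variance implied by the reported GJR(1,1) estimates.
% Values are copied from the estimation display; names are hypothetical.
kappaHat = 0.0045728;   % Constant
garchHat = 0.55808;     % GARCH{1}
archHat  = 0.20461;     % ARCH{1}
levHat   = 0.18066;     % Leverage{1}

% Persistence; the leverage term counts half because, under a symmetric
% distribution, negative innovations occur with probability 1/2.
persistence = garchHat + archHat + 0.5*levHat;

uncondVar = kappaHat/(1 - persistence);
fprintf('Persistence: %.5f, unconditional variance: %.5f\n', ...
    persistence,uncondVar);
```

With these estimates the persistence is about 0.853 and the implied unconditional variance is about 0.031, so the forecast path should level off near that value as the horizon grows.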
More About

GJR Model

The Glosten, Jagannathan, and Runkle (GJR) model is a dynamic model that addresses conditional heteroscedasticity, or volatility clustering, in an innovations process. Volatility clustering occurs when an innovations process does not exhibit significant autocorrelation, but the variance of the process changes with time.

The GJR model is a generalization of the GARCH model that is appropriate for modeling asymmetric volatility clustering [1]. Specifically, the model posits that the current conditional variance is the sum of these linear processes, with coefficients:
• Past conditional variances (the GARCH component or polynomial).
• Past squared innovations (the ARCH component or polynomial).
• Past squared, negative innovations (the leverage component or polynomial).

Consider the time series

$$y_t = \mu + \varepsilon_t,$$

where $\varepsilon_t = \sigma_t z_t$. The GJR(P,Q) conditional variance process, $\sigma_t^2$, has the form

$$\sigma_t^2 = \kappa + \sum_{i=1}^{P} \gamma_i \sigma_{t-i}^2 + \sum_{j=1}^{Q} \alpha_j \varepsilon_{t-j}^2 + \sum_{j=1}^{Q} \xi_j I\left[\varepsilon_{t-j} < 0\right] \varepsilon_{t-j}^2.$$
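The recursion above can be traced by hand with base MATLAB. The following sketch iterates a GJR(1,1) process; the parameter values, the sample size, and the choice to start the recursion at the model constant are all illustrative assumptions, and the code is not how the toolbox's simulate function is implemented.

```matlab
% Sketch: manual GJR(1,1) recursion (base MATLAB only).
% All parameter values below are hypothetical.
kappa     = 0.0001;   % conditional variance model constant
garchCoef = 0.35;     % gamma_1
archCoef  = 0.10;     % alpha_1
levCoef   = 0.03;     % xi_1

rng(1);               % for reproducibility
T = 500;
z = randn(T,1);       % iid standard Gaussian z_t

sigma2 = zeros(T,1);  % conditional variances
innov  = zeros(T,1);  % innovations eps_t = sigma_t*z_t

sigma2(1) = kappa;    % arbitrary starting value (assumption)
innov(1)  = sqrt(sigma2(1))*z(1);

for t = 2:T
    isNegative = innov(t-1) < 0;               % indicator I[eps_{t-1} < 0]
    sigma2(t) = kappa ...
        + garchCoef*sigma2(t-1) ...            % GARCH term
        + archCoef*innov(t-1)^2 ...            % ARCH term
        + levCoef*isNegative*innov(t-1)^2;     % leverage term
    innov(t) = sqrt(sigma2(t))*z(t);
end
```

Because the leverage term is active only after a negative innovation, a negative shock of a given size raises the next conditional variance by more than a positive shock of the same size whenever levCoef is positive.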
The table shows how the variables correspond to the properties of the gjr object. In the table, the indicator function I[x < 0] equals 1 if x < 0, and 0 otherwise.

Variable    Description                                                          Property
μ           Innovation mean model constant offset                                'Offset'
κ > 0       Conditional variance model constant                                  'Constant'
γi          GARCH component coefficients                                         'GARCH'
αj          ARCH component coefficients                                          'ARCH'
ξj          Leverage component coefficients                                      'Leverage'
zt          Series of independent random variables with mean 0 and variance 1    'Distribution'

For stationarity and positivity, GJR models use these constraints:
• $\kappa > 0$
• $\gamma_i \ge 0$, $\alpha_j \ge 0$
• $\alpha_j + \xi_j \ge 0$
• $\sum_{i=1}^{P} \gamma_i + \sum_{j=1}^{Q} \alpha_j + \frac{1}{2}\sum_{j=1}^{Q} \xi_j < 1$

GJR models are appropriate when negative shocks contribute more to volatility than positive shocks [2].

If all leverage coefficients are zero, then the GJR model reduces to the GARCH model. Because the GARCH model is nested in the GJR model, you can use likelihood ratio tests to compare a GARCH model fit against a GJR model fit.
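The nested comparison described above might look like the following sketch. It assumes that the third output of estimate is the optimized loglikelihood and that lratiotest accepts the unrestricted loglikelihood, the restricted loglikelihood, and the number of restrictions, in that order. The single restriction here is the one leverage coefficient set to zero; the data set and the rmmissing step are illustrative choices.

```matlab
% Sketch: likelihood ratio test of GARCH(1,1) against GJR(1,1).
% Assumes Econometrics Toolbox and the Data_NelsonPlosser data set.
load Data_NelsonPlosser
sp = rmmissing(price2ret(DataTable.SP));

% Restricted model: GARCH(1,1), that is, GJR(1,1) with zero leverage.
[~,~,logLRestricted] = estimate(garch(1,1),sp,'Display','off');

% Unrestricted model: GJR(1,1).
[~,~,logLUnrestricted] = estimate(gjr(1,1),sp,'Display','off');

numRestrictions = 1;   % one leverage coefficient constrained to 0
[h,pValue] = lratiotest(logLUnrestricted,logLRestricted,numRestrictions);
```

A value of h = 1 would indicate that the leverage term improves the fit significantly at the default level; h = 0 would favor the simpler GARCH specification.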
Tip
You can specify a gjr model as part of a composition of conditional mean and variance models. For details, see arima.

Version History
Introduced in R2012a

References
[1] Glosten, L. R., R. Jagannathan, and D. E. Runkle. “On the Relation between the Expected Value and the Volatility of the Nominal Excess Return on Stocks.” The Journal of Finance. Vol. 48, No. 5, 1993, pp. 1779–1801.
[2] Tsay, R. S. Analysis of Financial Time Series. 3rd ed. Hoboken, NJ: John Wiley & Sons, Inc., 2010.

See Also
Objects
garch | egarch | arima

Topics
“Specify GJR Models” on page 8-28
“Modify Properties of Conditional Variance Models” on page 8-39
“Specify Conditional Mean and Variance Models” on page 7-81
“Infer Conditional Variances and Residuals” on page 8-62
“Compare Conditional Variance Models Using Information Criteria” on page 8-69
“Simulate GARCH Models” on page 8-76
“Forecast GJR Models” on page 8-94
“Conditional Variance Models” on page 8-2
“GJR Model” on page 8-4
graphplot
Plot Markov chain directed graph

Syntax
graphplot(mc)
graphplot(mc,Name,Value)
graphplot(ax, ___ )
h = graphplot( ___ )

Description
graphplot(mc) creates a plot of the directed graph (digraph) of the discrete-time Markov chain mc. Nodes correspond to the states of mc. Directed edges correspond to nonzero transition probabilities in the transition matrix mc.P.

graphplot(mc,Name,Value) uses additional options specified by one or more name-value arguments. Options include highlighting transition probabilities, communicating classes, and specifying class properties of recurrence/transience and period. Also, you can plot the condensed digraph instead, with communicating classes as supernodes.

graphplot(ax, ___ ) plots on the axes specified by ax instead of the current axes (gca) using any of the input argument combinations in the previous syntaxes. The option ax can precede any of the input argument combinations in the previous syntaxes.

h = graphplot( ___ ) returns the handle to the digraph plot. Use h to modify properties of the plot after you create it.
Examples

Plot Digraph of Markov Chain

Consider this theoretical, right-stochastic transition matrix of a stochastic process.

$$P = \begin{bmatrix}
0 & 0 & 1/2 & 1/4 & 1/4 & 0 & 0 \\
0 & 0 & 1/3 & 0 & 2/3 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 1/3 & 2/3 \\
0 & 0 & 0 & 0 & 0 & 1/2 & 1/2 \\
0 & 0 & 0 & 0 & 0 & 3/4 & 1/4 \\
1/2 & 1/2 & 0 & 0 & 0 & 0 & 0 \\
1/4 & 3/4 & 0 & 0 & 0 & 0 & 0
\end{bmatrix}.$$

Create the Markov chain that is characterized by the transition matrix P.

P = [ 0   0  1/2 1/4 1/4  0   0 ;
      0   0  1/3  0  2/3  0   0 ;
      0   0   0   0   0  1/3 2/3;
      0   0   0   0   0  1/2 1/2;
      0   0   0   0   0  3/4 1/4;
     1/2 1/2  0   0   0   0   0 ;
     1/4 3/4  0   0   0   0   0 ];
mc = dtmc(P);

Plot a directed graph of the Markov chain.

figure;
graphplot(mc);
Identify Communicating Classes in Digraph

Consider this theoretical, right-stochastic transition matrix of a stochastic process.

$$P = \begin{bmatrix}
0.5 & 0.5 & 0 & 0 \\
0.5 & 0 & 0.5 & 0 \\
0 & 0 & 0 & 1 \\
0 & 0 & 1 & 0
\end{bmatrix}.$$

Create the Markov chain that is characterized by the transition matrix P. Name the states Regime 1 through Regime 4.

P = [0.5 0.5 0 0; 0.5 0 0.5 0; 0 0 0 1; 0 0 1 0];
mc = dtmc(P,'StateNames',["Regime 1" "Regime 2" "Regime 3" "Regime 4"]);

Plot a directed graph of the Markov chain. Identify the communicating classes in the digraph and color the edges according to the probability of transition.

figure;
graphplot(mc,'ColorNodes',true,'ColorEdges',true)
States 3 and 4 compose a communicating class with period 2. States 1 and 2 are transient.
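The class structure read off the plot can be cross-checked numerically. The sketch below uses only the base MATLAB digraph and conncomp functions rather than the toolbox's classify function, on the assumption that communicating classes coincide with the strongly connected components of the transition graph.

```matlab
% Sketch: communicating classes as strongly connected components.
% Uses base MATLAB graph functions; P is the matrix from this example.
P = [0.5 0.5 0   0;
     0.5 0   0.5 0;
     0   0   0   1;
     0   0   1   0];

G = digraph(P);        % edges for nonzero transition probabilities
bins = conncomp(G);    % for a digraph, components are strong by default

% States sharing a bin number communicate with each other.
sameClass12 = bins(1) == bins(2);
sameClass34 = bins(3) == bins(4);
separate    = bins(1) ~= bins(3);
```

Here states 1 and 2 share one bin and states 3 and 4 share another, which agrees with the two classes highlighted in the plot.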
Suppress State Labels

Create a "dumbbell" Markov chain containing 10 states in each "weight" and three states in the "bar."
• Specify random transition probabilities between states within each weight.
• If the Markov chain reaches the state in a weight that is closest to the bar, then specify a high probability of transitioning to the bar.
• Specify uniform transitions between states in the bar.

rng(1); % For reproducibility
w = 10;                              % Dumbbell weights
DBar = [0 1 0; 1 0 1; 0 1 0];        % Dumbbell bar
DB = blkdiag(rand(w),DBar,rand(w));  % Transition matrix

% Connect dumbbell weights and bar
DB(w,w+1) = 1;
DB(w+1,w) = 1;
DB(w+3,w+4) = 1;
DB(w+4,w+3) = 1;

db = dtmc(DB);

Plot a directed graph of the Markov chain. Return the plot handle.

figure;
h = graphplot(db);

Observe that the state labels are difficult to read. Remove the labels entirely.

h.NodeLabel = {};
Input Arguments

mc — Discrete-time Markov chain
dtmc object
Discrete-time Markov chain with NumStates states and transition matrix P, specified as a dtmc object. P must be fully specified (no NaN entries).

ax — Axes on which to plot
Axes object
Axes on which to plot, specified as an Axes object.
By default, graphplot plots to the current axes (gca).

Name-Value Pair Arguments
Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter.
Before R2021a, use commas to separate each name and value, and enclose Name in quotes.
Example: 'ColorEdges',true,'ColorNodes',true colors the edges to indicate transition probabilities and colors the nodes based on their communicating class.

LabelNodes — Flag for labeling nodes using state names
true (default) | false
Flag for labeling nodes using state names, specified as the comma-separated pair consisting of 'LabelNodes' and a value in this table.

Value    Description
true     Label nodes using the names in mc.StateNames.
false    Label nodes using state numbers.

Example: 'LabelNodes',false
Data Types: logical

ColorNodes — Flag for coloring nodes based on communicating class
false (default) | true
Flag for coloring nodes based on communicating class, specified as the comma-separated pair consisting of 'ColorNodes' and a value in this table.

Value    Description
true     Nodes in the same communicating class have the same color. Solid markers represent nodes in recurrent classes, and hollow markers represent nodes in transient classes. The legend contains the periodicity of recurrent classes.
false    All nodes have the same color.

Example: 'ColorNodes',true
Data Types: logical

LabelEdges — Flag for labeling edges with transition probabilities
false (default) | true
Flag for labeling edges with the transition probabilities in the transition matrix mc.P, specified as the comma-separated pair consisting of 'LabelEdges' and a value in this table.

Value    Description
true     Label edges with transition probabilities rounded to two decimal places.
false    Do not label edges.

Example: 'LabelEdges',true
Data Types: logical

ColorEdges — Flag for coloring edges to indicate transition probabilities
false (default) | true
Flag for coloring edges to indicate transition probabilities, specified as the comma-separated pair consisting of 'ColorEdges' and a value in this table.
Value    Description
true     Color edges to indicate transition probabilities. Include a color bar, which summarizes the color coding.
false    Use the same color for all edges.

Example: 'ColorEdges',true
Data Types: logical

Condense — Flag for condensing the graph
false (default) | true
Flag for condensing the graph, with each communicating class represented by a single supernode, specified as the comma-separated pair consisting of 'Condense' and a value in this table.

Value    Description
true     Nodes are supernodes containing communicating classes. Node labels list the component states of each supernode. An edge from supernode i to supernode j indicates a nonzero probability of transition from some state in supernode i to some state in supernode j. Transition probabilities between supernodes are not well defined, and graphplot disables edge information.
false    Nodes are states in mc.

Example: 'Condense',true
Data Types: logical

Output Arguments

h — Handle to graph plot
graphics object
Handle to the graph plot, returned as a graphics object. h is a unique identifier, which you can use to query or modify properties of the plot.
Tips
• To produce the directed graph as a MATLAB digraph object and use additional functions of that object, enter:
  G = digraph(mc.P)
• For readability, the 'LabelNodes' name-value pair argument allows you to turn off lengthy node labels and replace them with node numbers. To remove node labels completely, set h.NodeLabel = {};.
• To compute node information on communicating classes and their properties, use classify.
• To extract a communicating class in the graph, use subchain.
• The condensed graph is useful for:
  • Identifying transient classes (supernodes with positive outdegree)
  • Identifying recurrent classes (supernodes with zero outdegree)
  • Visualizing the overall structure of unichains (chains with a single recurrent class and any transient classes that transition into it)
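The outdegree rule in the last tip can be tried out on the digraph object itself. The following sketch builds the condensed graph with the base MATLAB condensation function and flags supernodes with zero outdegree as recurrent. The transition matrix is the four-state example from earlier on this page, and treating condensation as equivalent to the 'Condense' plot option is an assumption made for illustration.

```matlab
% Sketch: classify supernodes of the condensed graph by outdegree.
% Base MATLAB only; P is the four-state example matrix used above.
P = [0.5 0.5 0   0;
     0.5 0   0.5 0;
     0   0   0   1;
     0   0   1   0];

G = digraph(P);            % digraph of the chain, as in G = digraph(mc.P)
C = condensation(G);       % one node per strongly connected component

outDeg = outdegree(C);     % edges leaving each supernode
isRecurrent = outDeg == 0; % zero outdegree: recurrent class
isTransient = outDeg > 0;  % positive outdegree: transient class

numRecurrent = nnz(isRecurrent);
```

For this matrix the condensed graph has two supernodes, one transient and one recurrent. A chain with exactly one recurrent supernode is a unichain in the sense of the tip above.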
Version History
Introduced in R2017b

References
[1] Gallager, R.G. Stochastic Processes: Theory for Applications. Cambridge, UK: Cambridge University Press, 2013.
[2] Horn, R., and C. R. Johnson. Matrix Analysis. Cambridge, UK: Cambridge University Press, 1985.
[3] Jarvis, J. P., and D. R. Shier. "Graph-Theoretic Analysis of Finite Markov Chains." In Applied Mathematical Modeling: A Multidisciplinary Approach. Boca Raton: CRC Press, 2000.

See Also
Objects
digraph

Functions
classify | subchain

Topics
“Markov Chain Modeling” on page 10-8
“Create and Modify Markov Chain Model Objects” on page 10-17
“Visualize Markov Chain Structure and Evolution” on page 10-27
“Identify Classes in Markov Chain” on page 10-47
hac
Heteroscedasticity and autocorrelation consistent covariance estimators

Syntax
[EstCoeffCov,se,coeff] = hac(X,y)
[EstCoeffCov,se,coeff] = hac(Mdl)
[CovTbl,CoeffTbl] = hac(Tbl)
[ ___ ] = hac( ___ ,Name=Value)

Description
[EstCoeffCov,se,coeff] = hac(X,y) returns a robust covariance matrix estimate EstCoeffCov, and vectors of corrected standard errors se and OLS coefficient estimates coeff from applying ordinary least squares (OLS) on the multiple linear regression model y = Xβ + ε under general forms of heteroscedasticity and autocorrelation in the innovations process ε. y is a vector of response data and X is a matrix of predictor data.

[EstCoeffCov,se,coeff] = hac(Mdl) returns a robust covariance matrix estimate, and vectors of corrected standard errors and OLS coefficient estimates from a fitted multiple linear regression model Mdl, as returned by fitlm.

[CovTbl,CoeffTbl] = hac(Tbl) returns a robust covariance matrix estimate in the table CovTbl, and corrected standard errors and OLS coefficient estimates in the table CoeffTbl from a fitted multiple linear regression model of the variables in the table or timetable Tbl. The response variable in the regression is the last table variable, and all other variables are the predictor variables. To select a different response variable for the regression, use the ResponseVariable name-value argument. To select different predictor variables, use the PredictorVariables name-value argument.

[ ___ ] = hac( ___ ,Name=Value) specifies options using one or more name-value arguments in addition to any of the input argument combinations in previous syntaxes. hac returns the output argument combination for the corresponding input arguments. For example, hac(Tbl,ResponseVariable="GDP",Type="HC") provides a heteroscedasticity-consistent coefficient covariance estimate of a regression where the table variable GDP is the response while all other variables are predictors.
Examples

Return Newey-West Coefficient Covariance Estimate

By default, hac returns the Newey-West coefficient covariance estimate, which is appropriate when residuals from a linear regression fit show evidence of heteroscedasticity and autocorrelation. Simulate data from a linear model in which the innovations process is heteroscedastic and autocorrelated. Compare coefficient covariance estimates from regular OLS and the Newey-West estimator by using the default options of hac.

Simulate Data

Simulate a series of 1000 observations from the following innovations process.

$$\varepsilon_t = 0.5\varepsilon_{t-1} + u_t, \qquad u_t \sim N(0,\sigma_t^2),$$

where $\sigma_t^2 = 0.001t$ and $\varepsilon_0 = 0.1$.

rng(1); % For reproducibility
T = 1000;
sigma2 = 0.001*(1:T)';
ut = randn(T,1).*sqrt(sigma2);
eMdl = arima(AR=0.5,Constant=0,Variance=1);
et = filter(eMdl,ut,Y0=0.1);

Simulate responses from the following linear model.

$$y_t = X_t \begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix} + \varepsilon_t,$$

where $X_t = [1 \;\; x_{1,t} \;\; x_{2,t}]$ and $x_{j,t} \sim N\left(j, \tfrac{j}{2}\right)$.

X = [1 2] + randn(T,2).*sqrt([1/2 1]);
beta = (1:3)';
y = [ones(T,1) X]*beta + et;

Fit Linear Model

Fit a classical linear model (CLM) to the simulated data. Plot the residuals in case order and plot each residual with respect to the previous residual.

Mdl = fitlm(X,y);

tiledlayout(2,1)
nexttile
plotResiduals(Mdl,"caseorder")
nexttile
plotResiduals(Mdl,"lagged")
The residuals exhibit heteroscedasticity and autocorrelation.

Compute Newey-West Coefficient Covariance Estimate

Estimate the Newey-West covariance, which accounts for the heteroscedasticity and autocorrelation of the residuals, by passing the data to hac. Return standard errors and coefficients. Compare results against the CLM approach.

[EstCoeffCov,se,coeff] = hac(X,y)

Estimator type: HAC
Estimation method: BT
Bandwidth: 11.1758
Whitening order: 0
Effective sample size: 1000
Small sample correction: on

Coefficient Covariances:

       |  Const      x1       x2   
-----------------------------------
 Const |  0.0042  -0.0009  -0.0011 
 x1    | -0.0009   0.0012  -0.0000 
 x2    | -0.0011  -0.0000   0.0006 

EstCoeffCov = 3×3

    0.0042   -0.0009   -0.0011
   -0.0009    0.0012   -0.0000
   -0.0011   -0.0000    0.0006

se = 3×1

    0.0645
    0.0341
    0.0254

coeff = 3×1

    1.0444
    2.0153
    2.9567

EstCoeffCovOLS = Mdl.CoefficientCovariance

EstCoeffCovOLS = 3×3

    0.0045   -0.0012   -0.0013
   -0.0012    0.0012   -0.0000
   -0.0013   -0.0000    0.0007

seOLS = Mdl.Coefficients.SE

seOLS = 3×1

    0.0669
    0.0348
    0.0257

coeffOLS = Mdl.Coefficients.Estimate

coeffOLS = 3×1

    1.0444
    2.0153
    2.9567

Between the CLM and Newey-West approaches, the coefficient estimates are equal, but the standard errors and covariances are different.
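To make the comparison above more concrete, the sketch below assembles a Bartlett-kernel (Newey-West) sandwich estimate by hand in base MATLAB. It is deliberately simplified and will not reproduce the hac output: the data are freshly simulated, the truncation lag is fixed by a rule of thumb rather than selected from the data, and no small-sample correction is applied. All variable names are illustrative.

```matlab
% Sketch: hand-rolled Newey-West covariance with Bartlett weights.
% Base MATLAB only; simplified relative to hac (see lead-in).
rng(1);
T = 200;
X = [ones(T,1) randn(T,2)];          % design matrix with intercept
u = randn(T,1);
e = filter(1,[1 -0.5],u);            % AR(1) errors: e_t = 0.5*e_{t-1} + u_t
y = X*[1;2;3] + e;

b = X\y;                             % OLS coefficients
res = y - X*b;                       % OLS residuals
k = size(X,2);

L = floor(4*(T/100)^(2/9));          % truncation lag (rule of thumb)
Z = X.*res;                          % row t holds x_t*e_t
S = Z'*Z;                            % lag-0 term
for l = 1:L
    wl = 1 - l/(L + 1);              % Bartlett weight
    Gl = Z(l+1:end,:)'*Z(1:end-l,:); % lag-l cross-product
    S = S + wl*(Gl + Gl');
end

XtXinv = (X'*X)\eye(k);
nwCov = XtXinv*S*XtXinv;             % sandwich estimate
nwSE = sqrt(diag(nwCov));

olsCov = (res'*res)/(T - k)*XtXinv;  % classical OLS covariance
olsSE = sqrt(diag(olsCov));
```

The coefficient vector b is the same under both approaches; only the covariance, and therefore the standard errors, change.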
Estimate White's Robust Covariance from Fitted Linear Model

Model an automobile's price with its curb weight, engine size, and cylinder bore diameter using the linear model:

$$\text{price}_i = \beta_0 + \beta_1 \text{curbWeight}_i + \beta_2 \text{engineSize}_i + \beta_3 \text{bore}_i + \varepsilon_i.$$

Estimate model coefficients and White's robust covariance.

Load the 1985 automobile imports data set imports-85.mat, which contains all variables in the matrix X. Collect the variables in the regression model in a table.

load imports-85
varnames = ["CurbWeight" "EngineSize" "Bore" "Price"];
Tbl = array2table(X(:,[7:9 15]),VariableNames=varnames);

Fit the linear model to the data and plot the residuals versus the fitted values.

Mdl = fitlm(Tbl);
plotResiduals(Mdl,"fitted")
The residuals appear to flare out, which indicates heteroscedasticity.

Use hac to compute the usual OLS coefficient covariance estimate and White's robust covariance estimate.

[LSCov,lsSE,lsCoeff] = hac(Mdl,Type="HC",Weights="CLM", ...
    Display="off") % OLS estimates

LSCov = 4×4

   13.7124    0.0000    0.0120   -4.5609
    0.0000    0.0000   -0.0000   -0.0005
    0.0120   -0.0000    0.0002   -0.0017
   -4.5609   -0.0005   -0.0017    1.8195

lsSE = 4×1

    3.7030
    0.0011
    0.0133
    1.3489

lsCoeff = 4×1

   64.0948
   -0.0087
   -0.0158
   -2.6998

[WhiteCov,whiteSE,whiteCoeff] = hac(Mdl,Type="HC",Weights="HC0", ...
    Display="off") % White's estimates

WhiteCov = 4×4

   15.5122   -0.0008    0.0137   -4.4461
   -0.0008    0.0000   -0.0000   -0.0003
    0.0137   -0.0000    0.0001   -0.0010
   -4.4461   -0.0003   -0.0010    1.5707

whiteSE = 4×1

    3.9385
    0.0011
    0.0105
    1.2533

whiteCoeff = 4×1

   64.0948
   -0.0087
   -0.0158
   -2.6998

The OLS coefficient covariance estimate is not equal to White's robust estimate because the latter accounts for the heteroscedasticity in the residuals.
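The difference between the two estimates comes from the middle term of the covariance formula. The sketch below computes both by hand on simulated data, assuming the textbook definitions: the classical estimate scales the inverse of X'X by a single residual variance, while the HC0 estimate weights each observation by its own squared residual. The simulated data and variable names are illustrative and unrelated to the automobile data set.

```matlab
% Sketch: classical OLS covariance versus White's HC0 sandwich estimate.
% Base MATLAB only; the data are simulated for illustration.
rng(1);
n = 200;
x = randn(n,1);
X = [ones(n,1) x];                 % design matrix with intercept
e = randn(n,1).*(1 + abs(x));      % error spread grows with |x|
y = X*[1;2] + e;

b = X\y;                           % OLS coefficients
res = y - X*b;                     % OLS residuals
k = size(X,2);
XtXinv = (X'*X)\eye(k);

% Classical estimate: one common residual variance.
olsCov = (res'*res)/(n - k)*XtXinv;

% HC0 estimate: X'*diag(res.^2)*X in the middle of the sandwich.
hc0Cov = XtXinv*(X'*(X.*res.^2))*XtXinv;

olsSE = sqrt(diag(olsCov));
hc0SE = sqrt(diag(hc0Cov));
```

As in the example above, the coefficient estimates are identical under both approaches, and only the standard errors differ.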
Set Bandwidth for Newey-West Coefficient Covariance Estimate

Model the nominal GNP (GNPN) with consumer price index (CPI), real wages (WR), and the money stock (MS) using the linear model:

$$\text{GNPN}_i = \beta_0 + \beta_1 \text{CPI}_i + \beta_2 \text{WR}_i + \beta_3 \text{MS}_i + \varepsilon_i.$$

Estimate the model coefficients and the Newey-West OLS coefficient covariance matrix.

Load the Nelson-Plosser data set Data_NelsonPlosser.mat, which contains the table DataTable. Remove all observations containing at least one NaN and compute the log of the series.

load Data_NelsonPlosser
Tbl = rmmissing(DataTable);
LogTbl = varfun(@log,Tbl);
T = height(LogTbl);

Fit the linear model using the log of all series. Plot the residuals by case order and plot a correlogram of them.

Mdl = fitlm(LogTbl,"log_GNPN ~ 1 + log_CPI + log_WR + log_MS");
resid = Mdl.Residuals.Raw;

figure
tiledlayout(2,1)
nexttile
plotResiduals(Mdl,"caseorder")
axis tight
nexttile
autocorr(resid)

The residual plot exhibits signs of heteroscedasticity, autocorrelation, and possibly model misspecification. The sample autocorrelation function clearly exhibits autocorrelation.

Calculate the lag selection parameter for the standard Newey-West HAC estimate [2].

maxLag = floor(4*(T/100)^(2/9));
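The lag rule can be evaluated by hand for this sample. The sketch below plugs in T = 62, the effective sample size reported in the hac display for this example, and also lists the Bartlett weights implied by a bandwidth of maxLag + 1. The weight formula is the usual Bartlett kernel, which is assumed here to correspond to the "BT" estimation method.

```matlab
% Sketch: Newey-West lag rule and implied Bartlett weights for T = 62.
T = 62;                                  % effective sample size in this example
maxLag = floor(4*(T/100)^(2/9));         % lag selection parameter
bandwidth = maxLag + 1;                  % bandwidth passed to hac

lags = 0:maxLag;
bartlettWeights = 1 - lags/bandwidth;    % weights decline linearly to zero

fprintf('maxLag = %d, bandwidth = %d\n',maxLag,bandwidth);
disp(bartlettWeights)
```

The rule gives maxLag = 3, so the bandwidth is 4, and autocovariances at lags 0 through 3 receive weights 1, 0.75, 0.5, and 0.25.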
Estimate the standard Newey-West OLS coefficient covariance by using hac. Set the bandwidth to maxLag + 1. Display the OLS coefficient estimates, and their standard errors and covariance matrix. prednames = ["log_CPI" "log_WR" "log_MS"]; [CovTbl,CoeffTbl] = hac(LogTbl,ResponseVariable="log_GNPN", ... PredictorVariables=prednames,Bandwidth=maxLag + 1, ... Display="full") Estimator type: HAC Estimation method: BT Bandwidth: 4.0000 Whitening order: 0 Effective sample size: 62 Small sample correction: on Coefficient Estimates: | Coeff SE --------------------------Const | -4.3473 0.4297 log_CPI | 0.9947 0.1001 log_WR | 1.3954 0.1283 log_MS | 0.0789 0.0625 Coefficient Covariances: | Const log_CPI log_WR log_MS ---------------------------------------------Const | 0.1847 -0.0318 -0.0433 0.0239 log_CPI | -0.0318 0.0100 0.0027 -0.0043 log_WR | -0.0433 0.0027 0.0165 -0.0064 log_MS | 0.0239 -0.0043 -0.0064 0.0039 CovTbl=4×4 table
Const log_CPI log_WR log_MS
Const _________
log_CPI __________
log_WR _________
log_MS __________
0.18468 -0.03177 -0.043323 0.023939
-0.03177 0.010015 0.0026864 -0.0043025
-0.043323 0.0026864 0.016454 -0.006435
0.023939 -0.0043025 -0.006435 0.0039086
CoeffTbl=4×2 table Coeff ________ Const log_CPI log_WR log_MS
-4.3473 0.99472 1.3954 0.078933
SE ________ 0.42974 0.10008 0.12827 0.062519
The first table in the output contains the OLS estimates (β0, ..., β3, respectively) and their standard errors. The second table is the Newey-West coefficient covariance matrix estimate. For example, Cov(β1, β2) = 0.0027. hac returns tables of estimates when you supply a table of data.
Plot Kernel Densities Plot the kernel density functions available in hac. Set domain, x, and range w. x = (0:0.001:3.2)'; w = zeros(size(x));
Compute the truncated kernel density. cTR = 2; % Renormalization constant TR = (abs(x) 0), for all i,j [2]. To determine reducibility, isreducible computes Q. • By the Perron-Frobenius Theorem [2], irreducible Markov chains have unique stationary distributions. Unichains, which consist of a single recurrent class plus transient classes, also have unique stationary distributions (with zero probability mass in the transient classes). Reducible chains with multiple recurrent classes have stationary distributions that depend on the initial distribution.
Version History Introduced in R2017b
References [1] Gallager, R.G. Stochastic Processes: Theory for Applications. Cambridge, UK: Cambridge University Press, 2013. [2] Horn, R., and C. R. Johnson. Matrix Analysis. Cambridge, UK: Cambridge University Press, 1985.
See Also Objects dtmc Functions asymptotics | classify | isergodic Topics “Markov Chain Modeling” on page 10-8 “Create and Modify Markov Chain Model Objects” on page 10-17 “Determine Asymptotic Behavior of Markov Chain” on page 10-39 “Identify Classes in Markov Chain” on page 10-47
isStable Determine stability of lag operator polynomial
Syntax [indicator,eigenvalues] = isStable(A)
Description [indicator,eigenvalues] = isStable(A) takes a lag operator polynomial object A and checks if it is stable. The stability condition requires that the magnitudes of all roots of the characteristic polynomial are less than 1 to within a small numerical tolerance.
Input Arguments A Lag operator polynomial object, as produced by LagOp.
Output Arguments indicator Boolean value for the stability test. true indicates that A(L) is stable and that the magnitudes of all eigenvalues of its characteristic polynomial are less than one; false indicates that A(L) is unstable and that the magnitude of at least one of the eigenvalues of its characteristic polynomial is greater than or equal to one. eigenvalues Eigenvalues of the characteristic polynomial associated with A(L). The length of eigenvalues is the product of the degree and dimension of A(L).
Examples Check a Lag Operator Polynomial for Stability Divide two Lag Operator polynomial objects and check if the resulting polynomial is stable: A = LagOp({1 -0.6 0.08}); B = LagOp({1 -0.5}); [indicator,eigenvalues]=isStable(A\B) indicator = logical 1 eigenvalues = 4×1 complex
   0.3531 + 0.0000i
  -0.0723 + 0.3003i
  -0.0723 - 0.3003i
  -0.3086 + 0.0000i
Tips • Zero-degree polynomials are always stable. • For polynomials of degree greater than zero, the presence of NaN-valued coefficients returns a false stability indicator and vector of NaNs in eigenvalues. • When testing for stability, the comparison incorporates a small numerical tolerance. The indicator is true when the magnitudes of all eigenvalues are less than 1-10*eps, where eps is machine precision. Users who wish to incorporate their own tolerance (including 0) may simply ignore indicator and determine stability as follows: [~,eigenvalues] = isStable(A); indicator = all(abs(eigenvalues) < (1-tol));
for some small, nonnegative tolerance tol.
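The custom-tolerance check above can be illustrated without the toolbox for a scalar polynomial. The following is a minimal sketch, assuming that the eigenvalues in question are those of the companion matrix built from the lag coefficients; the variable names and the tolerance value are illustrative only, and the sketch is not a description of how isStable is implemented.

```matlab
% Coefficients of the scalar polynomial A(L) = 1 - 0.6L + 0.08L^2.
a = [1 -0.6 0.08];
p = numel(a) - 1;                      % polynomial degree
% Companion matrix of lambda^2 - 0.6*lambda + 0.08 (assumed convention).
companion = [-a(2:end)/a(1); eye(p-1) zeros(p-1,1)];
eigenvalues = eig(companion);          % roots are 0.2 and 0.4
% User-chosen tolerance (illustrative value).
tol = 1e-8;
indicator = all(abs(eigenvalues) < (1 - tol));
```

Both eigenvalues lie well inside the unit circle, so indicator is true for any reasonable choice of tol.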
References [1] Hamilton, J. D. Time Series Analysis. Princeton, NJ: Princeton University Press, 1994.
See Also mldivide | mrdivide Topics “Specify Lag Operator Polynomials” on page 2-9 “Plot the Impulse Response Function of Conditional Mean Model” on page 7-86
jcitest Johansen cointegration test
Syntax h = jcitest(Y) h = jcitest(Tbl) h = jcitest( ___ ,Name=Value) [h,pValue,stat,cValue] = jcitest( ___ ) [h,pValue,stat,cValue,mles] = jcitest( ___ )
Description h = jcitest(Y) returns the rejection decisions h from conducting the Johansen test, which assesses each null hypothesis H(r) of cointegration rank less than or equal to r among the numDims-dimensional multivariate time series Y against the alternatives H(numDims) (trace test) or H(r + 1) (maxeig test). The tests produce maximum likelihood estimates of the parameters in a vector error-correction (VEC on page 12-1514) model of the cointegrated series. h = jcitest(Tbl) returns rejection decisions from conducting the Johansen test on the variables of the table or timetable Tbl. To select a subset of variables in Tbl to test, use the DataVariables name-value argument. h = jcitest( ___ ,Name=Value) uses additional options specified by one or more name-value arguments, using any input-argument combination in the previous syntaxes. Some options control the number of tests to conduct. The following conditions apply when jcitest conducts multiple tests: • jcitest treats each test as separate from all other tests. • Each row of all outputs contains the results of the corresponding test. For example, jcitest(Tbl,Model="H2",DataVariables=1:5) tests the first 5 variables in the input table Tbl using the Johansen model that excludes all deterministic terms. [h,pValue,stat,cValue] = jcitest( ___ ) displays, at the command window, the results of the Johansen test and returns the p-values pValue, test statistics stat, and critical values cValue of the test. The results display includes the ranks r, corresponding rejection decisions, p-values, decision statistics, and specified options. [h,pValue,stat,cValue,mles] = jcitest( ___ ) also returns a structure of maximum likelihood estimates associated with the VEC(q) model of the multivariate time series yt.
Examples
Conduct Johansen Cointegration Test on Matrix of Data Test a multivariate time series for cointegration using the default values of the Johansen cointegration test. Input the time series data as a numeric matrix. Load data of Canadian inflation and interest rates Data_Canada.mat, which contains the series in the matrix Data. load Data_Canada series' ans = 5x1 cell {'(INF_C) Inflation rate (CPI-based)' } {'(INF_G) Inflation rate (GDP deflator-based)'} {'(INT_S) Interest rate (short-term)' } {'(INT_M) Interest rate (medium-term)' } {'(INT_L) Interest rate (long-term)' }
Test the interest rate series for cointegration by using the Johansen cointegration test. Use default options and return the rejection decision.
h = jcitest(Data(:,3:end))
h=1×7 table
           r0      r1       r2       Model     Lags      Test       Alpha
          ____    ____    _____    ______    ____    _________    _____
    t1    true    true    false    {'H1'}     0      {'trace'}    0.05
By default, jcitest conducts the trace test and uses the H1 Johansen form. The test fails to reject the null hypothesis of rank 2 cointegration in the series.
Conduct Default Johansen Cointegration Test on Table Variables Conduct the Johansen cointegration test on a multivariate time series using default options, which tests all table variables. Load data of Canadian inflation and interest rates Data_Canada.mat. Convert the table DataTable to a timetable. load Data_Canada dates = datetime(dates,12,31); TT = table2timetable(DataTable,RowTimes=dates); TT.Observations = [];
Conduct the Johansen cointegration test by passing the timetable to jcitest and using default options. jcitest tests for cointegration among all table variables by default.
h = jcitest(TT)
h=1×9 table
           r0      r1       r2       r3       r4      Model     Lags      Test       Alpha
          ____    ____    _____    _____    ____    ______    ____    _________    _____
    t1    true    true    false    false    true    {'H1'}     0      {'trace'}    0.05
The test fails to reject the null hypotheses of rank 2 and 3 cointegration among the series. By default, jcitest includes all input table variables in the cointegration test. To select a subset of variables to test, set the DataVariables option.
Conduct Johansen Test for Each Test Statistic jcitest supports two types of Johansen tests. Conduct a test for each type. Load data of Canadian inflation and interest rates Data_Canada.mat. Convert the table DataTable to a timetable. Identify the interest rate series. load Data_Canada dates = datetime(dates,12,31); TT = table2timetable(DataTable,RowTimes=dates); TT.Observations = []; idxINT = contains(TT.Properties.VariableNames,"INT");
Conduct the Johansen cointegration test to assess cointegration among the interest rate series. Specify both test types trace and maxeig, and set the level of significance to 2.5%.
h = jcitest(TT,DataVariables=idxINT,Test=["trace" "maxeig"],Alpha=0.025)
h=2×7 table
           r0       r1       r2       Model     Lags       Test       Alpha
          _____    _____    _____    ______    ____    __________    _____
    t1    true     false    false    {'H1'}     0      {'trace' }    0.025
    t2    false    false    false    {'H1'}     0      {'maxeig'}    0.025
h is a 2-row table; rows contain results of separate tests. At 2.5% level of significance: • The trace test fails to reject the null hypotheses of ranks 1 and 2 cointegration among the series. • The maxeig test fails to reject the null hypotheses for each cointegration rank.
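The decisions above can be cross-checked against the p-values that jcitest reports for the same series (they appear in the results display of the p-values example on this reference page). The following is a minimal sketch, assuming that a null hypothesis is rejected exactly when its p-value falls below Alpha; the variable names are illustrative.

```matlab
% p-values for null ranks r = 0, 1, 2, copied from the jcitest results display.
pTrace  = [0.0050497 0.034294 0.073661];
pMaxeig = [0.050346  0.06874  0.073661];
% Significance level used in this example.
alpha = 0.025;
% Reject a null hypothesis when its p-value is below alpha (assumed rule).
hTrace  = pTrace  < alpha;
hMaxeig = pMaxeig < alpha;
disp([hTrace; hMaxeig])
```

Only the rank 0 null of the trace test is rejected at the 2.5% level, which matches the two rows of h.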
Return Test p-Values and Decision Statistics Load data of Canadian inflation and interest rates Data_Canada.mat. Convert the table DataTable to a timetable. Identify the interest rate series. load Data_Canada dates = datetime(dates,12,31); TT = table2timetable(DataTable,RowTimes=dates); TT.Observations = []; idxINT = contains(TT.Properties.VariableNames,"INT");
Conduct the Johansen cointegration test to assess cointegration among the interest rate series. Specify both test types trace and maxeig.
[h,pValue,stat,cValue] = jcitest(TT,DataVariables=idxINT,Test=["trace" "maxeig"])
************************
Results Summary (Test 1)
Data: TT
Effective sample size: 40
Model: H1
Lags: 0
Statistic: trace
Significance level: 0.05
r  h  stat      cValue    pValue   eigVal
----------------------------------------
0  1  37.6886   29.7976   0.0050   0.4101
1  1  16.5770   15.4948   0.0343   0.2842
2  0   3.2003    3.8415   0.0737   0.0769
************************
Results Summary (Test 2)
Data: TT
Effective sample size: 40
Model: H1
Lags: 0
Statistic: maxeig
Significance level: 0.05
r  h  stat      cValue    pValue   eigVal
----------------------------------------
0  0  21.1116   21.1323   0.0503   0.4101
1  0  13.3767   14.2644   0.0687   0.2842
2  0   3.2003    3.8415   0.0737   0.0769
h=2×7 table
           r0       r1       r2       Model     Lags       Test       Alpha
          _____    _____    _____    ______    ____    __________    _____
    t1    true     true     false    {'H1'}     0      {'trace' }    0.05
    t2    false    false    false    {'H1'}     0      {'maxeig'}    0.05
pValue=2×7 table
             r0           r1          r2        Model     Lags       Test       Alpha
          _________    ________    ________    ______    ____    __________    _____
    t1    0.0050497    0.034294    0.073661    {'H1'}     0      {'trace' }    0.05
    t2     0.050346     0.06874    0.073661    {'H1'}     0      {'maxeig'}    0.05
stat=2×7 table
            r0        r1        r2       Model     Lags       Test       Alpha
          ______    ______    ______    ______    ____    __________    _____
    t1    37.689    16.577    3.2003    {'H1'}     0      {'trace' }    0.05
    t2    21.112    13.377    3.2003    {'H1'}     0      {'maxeig'}    0.05
cValue=2×7 table
            r0        r1        r2       Model     Lags       Test       Alpha
          ______    ______    ______    ______    ____    __________    _____
    t1    29.798    15.495    3.8415    {'H1'}     0      {'trace' }    0.05
    t2    21.132    14.264    3.8415    {'H1'}     0      {'maxeig'}    0.05
jcitest prints a results display for each test to the command window. All outputs are tables containing the corresponding statistics and test options.
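The statistics in the two summaries can be reproduced approximately from the displayed eigenvalues and the effective sample size. The following is a minimal sketch in base MATLAB, assuming the trace statistic for null rank r sums −T log(1 − λ) over the eigenvalues of index r + 1 through m, and the maxeig statistic uses only the eigenvalue of index r + 1. The eigenvalues are the four-digit values from the display, so the results agree with the printed statistics only to about two decimal places.

```matlab
% Effective sample size and ordered eigenvalues, copied from the display.
T = 40;
eigVal = [0.4101; 0.2842; 0.0769];
% One term per eigenvalue: -T*log(1 - lambda_i).
logTerms = -T*log(1 - eigVal);
% maxeig statistic for null rank r uses only the term for lambda_(r+1).
maxeigStat = logTerms;
% trace statistic for null rank r sums the terms for lambda_(r+1),...,lambda_m.
traceStat = flipud(cumsum(flipud(logTerms)));
disp([traceStat maxeigStat])
```

For the largest null rank, r = 2, the two statistics coincide because only one eigenvalue remains in the sum, as the shared value 3.2003 in both displays shows.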
Plot Estimated Cointegrating Relations Load data of Canadian inflation and interest rates Data_Canada.mat. Convert the table DataTable to a timetable. load Data_Canada dates = datetime(dates,12,31); TT = table2timetable(DataTable,RowTimes=dates); TT.Observations = []; idxINT = contains(TT.Properties.VariableNames,"INT");
Plot the interest series. plot(TT.Time,TT{:,idxINT}) legend(series(idxINT),Location="northwest") grid on
Test the interest rate series for cointegration; use the default Johansen form H1. Return all outputs.
[h,pValue,stat,cValue,mles] = jcitest(TT,DataVariables=idxINT);
************************
Results Summary (Test 1)
Data: TT
Effective sample size: 40
Model: H1
Lags: 0
Statistic: trace
Significance level: 0.05
r  h  stat      cValue    pValue   eigVal
----------------------------------------
0  1  37.6886   29.7976   0.0050   0.4101
1  1  16.5770   15.4948   0.0343   0.2842
2  0   3.2003    3.8415   0.0737   0.0769
h
h=1×7 table
           r0      r1       r2       Model     Lags      Test       Alpha
          ____    ____    _____    ______    ____    _________    _____
    t1    true    true    false    {'H1'}     0      {'trace'}    0.05
pValue
pValue=1×7 table
             r0           r1          r2        Model     Lags      Test       Alpha
          _________    ________    ________    ______    ____    _________    _____
    t1    0.0050497    0.034294    0.073661    {'H1'}     0      {'trace'}    0.05
The test fails to reject the null hypothesis of rank 2 cointegration in the series. Plot the estimated cointegrating relations B′yt − 1 + c0: TTLag = lagmatrix(TT,1); T = height(TTLag); B = mles.r2.paramVals.B; c0 = mles.r2.paramVals.c0; plot(TTLag.Time,TTLag{:,idxINT}*B+repmat(c0',T,1)) grid on
Input Arguments Y — Data representing observations of multivariate time series yt numeric matrix
Data representing observations of a multivariate time series yt, specified as a numObs-by-numDims numeric matrix. Each column of Y corresponds to a variable, and each row corresponds to an observation. Data Types: double Tbl — Data representing observations of multivariate time series yt table | timetable Data representing observations of a multivariate time series yt, specified as a table or timetable with numObs rows. Each row of Tbl is an observation. To select a subset of variables in Tbl to test, use the DataVariables name-value argument. Note jcitest removes the following observations from the specified data: • All rows containing at least one missing observation, represented by a NaN value • From the beginning of the data, initial values required to initialize lagged variables
Name-Value Arguments Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter. Before R2021a, use commas to separate each name and value, and enclose Name in quotes. Example: jcitest(Tbl,Model="H2",DataVariables=1:5) tests the first 5 variables in the input table Tbl using the Johansen model that excludes all deterministic terms.
Model — Johansen form of VEC(q) model deterministic terms "H1" (default) | "H2" | "H1*" | "H*" | "H" | character vector | string vector | cell vector of character vectors
Johansen form of the VEC(q) model deterministic terms [3], specified as a Johansen form name in the table, or a string vector or cell vector of character vectors of such values (for model parameter definitions, see “Vector Error-Correction (VEC) Model” on page 12-1514).
Value | Error-Correction Term | Description
"H2" | AB′yt−1 | No intercepts or trends are present in the cointegrating relations, and no deterministic trends are present in the levels of the data. Specify this model only when all response series have a mean of zero.
"H1*" | A(B′yt−1 + c0) | Intercepts are present in the cointegrating relations, and no deterministic trends are present in the levels of the data.
"H1" | A(B′yt−1 + c0) + c1 | Intercepts are present in the cointegrating relations, and deterministic linear trends are present in the levels of the data.
"H*" | A(B′yt−1 + c0 + d0t) + c1 | Intercepts and linear trends are present in the cointegrating relations, and deterministic linear trends are present in the levels of the data.
"H" | A(B′yt−1 + c0 + d0t) + c1 + d1t | Intercepts and linear trends are present in the cointegrating relations, and deterministic quadratic trends are present in the levels of the data. If quadratic trends are not present in the data, this model can produce good in-sample fits but poor out-of-sample forecasts.
jcitest conducts a separate test for each value in Model. Example: Model="H1*" uses the Johansen form H1* for all tests. Example: Model=["H1*" "H1"] uses Johansen form H1* for the first test, and then uses Johansen form H1 for the second test. Data Types: string | char | cell
Lags — Number of lagged differences q 0 (default) | nonnegative integer | vector of nonnegative integers
Number of lagged differences q in the VEC(q) model, specified as a nonnegative integer or vector of nonnegative integers. jcitest conducts a separate test for each value in Lags. Example: Lags=1 includes Δyt−1 in the model for all tests. Example: Lags=[0 1] includes no lags in the model for the first test, and then includes Δyt−1 in the model for the second test. Data Types: double
Test — Test to perform "trace" (default) | "maxeig" | character vector | string vector | cell vector of character vectors
Test to perform, specified as a value in this table, or a string vector or cell vector of character vectors of such values.
Value | Description
"trace" | The alternative hypothesis is H(numDims), and the test statistics are −T[log(1 − λ_{r+1}) + … + log(1 − λ_m)].
"maxeig" | The alternative hypothesis is H(r + 1), and the test statistics are −T log(1 − λ_{r+1}).
Both tests assess the null hypothesis H(r) of cointegration rank less than or equal to r. jcitest computes statistics using the effective sample size T ≤ numObs and ordered estimates of the eigenvalues of C = AB′, λ1 > ... > λm, where m = numDims.
jcitest conducts a separate test for each value in Test. Example: Test="maxeig" conducts the maxeig test for all tests. Example: Test=["maxeig" "trace"] conducts the maxeig test for the first test, and then conducts the trace test for the second test. Data Types: char | string | cell
Alpha — Nominal significance level 0.05 (default) | numeric scalar | numeric vector
Nominal significance level for the hypothesis test, specified as a numeric scalar between 0.001 and 0.999 or a numeric vector of such values. jcitest conducts a separate test for each value in Alpha. Example: Alpha=[0.01 0.05] uses a level of significance of 0.01 for the first test, and then uses a level of significance of 0.05 for the second test. Data Types: double
Display — Command window display control "off" | "summary" | "params" | "full"
Command window display control, specified as a value in this table.
Value | Description
"off" | jcitest does not display the results to the command window. If jcitest returns h or no outputs, this display is the default.
"summary" | jcitest displays a tabular summary of test results. The tabular display includes null ranks r = 0:(numDims − 1) in the first column of each summary. jcitest displays multiple test results in separate summaries. When jcitest returns any other output than h (for example, pValue), this display is the default. You cannot set this display when jcitest returns h or no outputs.
"params" | jcitest displays maximum likelihood estimates of the parameter values associated with the reduced-rank VEC(q) model of yt. You can set this display only when jcitest returns mles. jcitest returns the displayed parameter estimates in the field mles.rn(j).paramVals for null rank r = n and test j.
"full" | jcitest displays both "summary" and "params".
Example: Display="off" Data Types: char | string
DataVariables — Variables in Tbl all variables (default) | string vector | cell vector of character vectors | vector of integers | logical vector
Variables in Tbl for which jcitest conducts the test, specified as a string vector or cell vector of character vectors containing variable names in Tbl.Properties.VariableNames, or an integer or logical vector representing the indices of names. The selected variables must be numeric.
Example: DataVariables=["GDP" "CPI"]
Example: DataVariables=[true true false false] or DataVariables=[1 2] selects the first and second table variables. Data Types: double | logical | char | cell | string Note • When jcitest conducts multiple tests, the function applies all single settings (scalars or character vectors) to each test. • All vector-valued specifications that control the number of tests must have equal length. • A lagged and differenced time series has a reduced sample size. Absent presample values, if the test series yt is defined for t = 1,…,T, the lagged series yt– k is defined for t = k+1,…,T. The first difference applied to the lagged series yt– k further reduces the time base to k+2,…,T. With p lagged differences, the common time base is p+2,…,T and the effective sample size is T–(p+1).
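The sample-size accounting in the last note can be checked numerically. The following is a minimal sketch in base MATLAB that builds p lagged differences of a toy series (the series, its length, and all variable names are illustrative assumptions) and counts the rows for which every term is defined.

```matlab
% Toy series of length T and number of lagged differences p (illustrative).
T = 10;
p = 2;
y = cumsum(ones(T,1));
% First difference; dy(t) is defined for t = 2,...,T.
dy = [NaN; diff(y)];
% Column k holds the lagged difference dy(t-k), defined for t = k+2,...,T.
X = NaN(T,p);
for k = 1:p
    X(k+1:T,k) = dy(1:T-k);
end
% Rows in which the difference and all of its lags are available.
complete = all(~isnan([dy X]),2);
firstRow = find(complete,1);   % expected: p + 2
effT = sum(complete);          % expected: T - (p + 1)
```

With the toy values, the common time base starts at observation 4 and contains 7 observations. The same accounting is consistent with the effective sample size of 40 that the results displays on this page report for Lags = 0.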
Output Arguments h — Test rejection decisions table Test rejection decisions, returned as a numTests-by-(numDims + 4) table, where numTests is the number of tests, which is determined by specified options. Row j of h corresponds to test j, as specified by the values of the last four variables Model, Lags, Test, and Alpha. Row labels are t1, t2, …, tu, where u = numTests. Variables of h correspond to different, maintained cointegration ranks r = 0, 1, …, numDims – 1 and specified name-value arguments that control the number of tests. Variable labels are r0, r1, …, rR, where R = numDims – 1, and Model, Lags, Test, and Alpha. To access results, for example, the result for test j of null rank k, use h.rk(j). Variable k, labeled rk, is a logical vector whose entries have the following interpretations: • 1 (true) indicates rejection of the null hypothesis of cointegration rank k in favor of the alternative hypothesis. • 0 (false) indicates failure to reject the null hypothesis of cointegration rank k. pValue — Test statistic p-values table Test statistic p-values, returned as a table with the same dimensions and labels as h. Variable k, labeled rk, is a numeric vector of p-values for the corresponding tests. The p-values are right-tailed probabilities. When test statistics are outside tabulated critical values, jcitest returns maximum (0.999) or minimum (0.001) p-values. stat — Test statistics table
Test statistics, returned as a table with the same dimensions and labels as h. The Test setting of a particular test determines the test statistic. cValue — Critical values table Critical values, returned as a table with the same dimensions and labels as h. Variable k, labeled rk, is a numeric vector of critical values for the corresponding tests. The critical values are for right-tailed probabilities determined by Alpha. jcitest loads tables of critical values from the file Data_JCITest.mat, and then linearly interpolates test critical values from the tables. Critical values in the tables derive from methods described in [4]. mles — Maximum likelihood estimates (MLE) associated with VEC(q) model of yt structure array Maximum likelihood estimates associated with the VEC(q) model of yt, returned as a table with the same dimensions and labels as h. Variable k, labeled rk, is a structure array of MLEs with elements for the corresponding tests. Each element of mles.rk has the fields in this table. You can access a field using dot notation, for example, mles.r2(3).paramVals contains the parameter estimates of the third test corresponding to the null hypothesis of rank 2 cointegration.
Field | Description
paramNames | Cell vector of parameter names, of the form: {A, B, B1, … Bq, c0, d0, c1, d1}. Elements depend on the values of the Lags and Model name-value arguments.
paramVals | Structure of parameter estimates with field names corresponding to the parameter names in paramNames.
res | T-by-numDims matrix of residuals, where T is the effective sample size, obtained by fitting the VEC(q) model of yt to the input data.
EstCov | Estimated covariance Q of the innovations process εt.
eigVal | Eigenvalue associated with H(r).
eigVec | Eigenvector associated with the eigenvalue in eigVal. Eigenvectors v are normalized so that v′S11v = 1, where S11 is defined as in [3].
rLL | Restricted loglikelihood of yt under the null.
uLL | Unrestricted loglikelihood of yt under the alternative.
More About Vector Error-Correction (VEC) Model A vector error-correction (VEC) model is a multivariate, stochastic time series model consisting of a system of m = numDims equations of m distinct, differenced response variables. Equations in the system can include an error-correction term, which is a linear function of the responses in levels used to stabilize the system. The cointegrating rank r is the number of cointegrating relations that exist in the system. Each response equation can include a degree q autoregressive polynomial composed of first differences of the response series, a constant, a time trend, and a constant and time trend in the error-correction term. Expressed in lag operator notation, a VEC(q) model for a multivariate time series yt is
Φ(L)(1 − L)yt = A(B′yt−1 + c0 + d0t) + c1 + d1t + εt = c + dt + Cyt−1 + εt,
where • yt is an m = numDims dimensional time series corresponding to m response variables at time t, t = 1,...,T. • Φ(L) = I − Φ1L − Φ2L^2 − ... − ΦqL^q, I is the m-by-m identity matrix, and Lyt = yt−1. • The cointegrating relations are B′yt−1 + c0 + d0t and the error-correction term is A(B′yt−1 + c0 + d0t). • r is the number of cointegrating relations and, in general, 0 ≤ r ≤ m. • A is an m-by-r matrix of adjustment speeds. • B is an m-by-r cointegration matrix. • C = AB′ is an m-by-m impact matrix with a rank of r. • c0 is an r-by-1 vector of constants (intercepts) in the cointegrating relations. • d0 is an r-by-1 vector of linear time trends in the cointegrating relations. • c1 is an m-by-1 vector of constants (deterministic linear trends in yt). • d1 is an m-by-1 vector of linear time-trend values (deterministic quadratic trends in yt). • c = Ac0 + c1 and is the overall constant. • d = Ad0 + d1 and is the overall time-trend coefficient. • Φj is an m-by-m matrix of short-run coefficients, where j = 1,...,q and Φq is not a matrix containing only zeros. • εt is an m-by-1 vector of random Gaussian innovations, each with a mean of 0 and collectively an m-by-m covariance matrix Σ. For t ≠ s, εt and εs are independent. If m = r, then the VEC model is a stable VAR(q + 1) model in the levels of the responses.
If r = 0, then the error-correction term is a matrix of zeros, and the VEC(q) model is a stable VAR(q) model in the first differences of the responses.
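The dimension and rank statements in the parameter list can be illustrated with a small numerical case. The following is a minimal sketch in base MATLAB with m = 3 responses and r = 1 cointegrating relation; all parameter values are made up for illustration and are not estimates from any data set.

```matlab
% Dimensions: m responses, r cointegrating relations (illustrative values).
m = 3;
r = 1;
A  = [-0.3; 0.1; 0.2];    % m-by-r adjustment speeds
B  = [1; -1; 0.5];        % m-by-r cointegration matrix
c0 = 0.4;                 % r-by-1 intercept in the cointegrating relation
c1 = [0.1; 0; -0.2];      % m-by-1 constants outside the cointegrating relation
% Impact matrix: m-by-m with rank r.
C = A*B';
rankC = rank(C);
% Overall constant c = A*c0 + c1.
c = A*c0 + c1;
```

Here C is a 3-by-3 matrix of rank 1, and c combines the intercept inside the cointegrating relation with the constants outside it.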
Tips • To convert VEC(q) model parameters in the mles output to VAR(q + 1) model parameters, use vec2var. • To test linear constraints on the error-correction speeds A and the space of cointegrating relations spanned by B, use jcontest.
Algorithms • jcitest identifies deterministic terms that are outside of the cointegrating relations, c1 and d1, by projecting constant and linear regression coefficients, respectively, onto the orthogonal complement of A. • If jcitest fails to reject the null hypothesis of cointegration rank r = 0, the inference is that the error-correction coefficient C is zero, and the VEC(q) model reduces to a standard VAR(q) model in first differences. If jcitest rejects all cointegration ranks r less than numDims, the inference is that C has full rank, and yt is stationary in levels. • The parameters A and B in the reduced-rank VEC(q) model are not identifiable, but their product C = AB′ is identifiable. jcitest constructs B = V(:,1:r) using the orthonormal eigenvectors V returned by eig, and then renormalizes so that V'*S11*V = I [3]. • The time series in the specified input data can be stationary in levels or first differences (that is, I(0) or I(1)). Rather than pretesting series for unit roots (using, for example, adftest, pptest, kpsstest, or lmctest), the Johansen procedure formulates the question within the model. An I(0) series is associated with a standard unit vector in the space of cointegrating relations, and jcontest can test for its presence. • Deterministic cointegration, where cointegrating relations, perhaps with an intercept, produce stationary series, is the traditional sense of cointegration introduced by Engle and Granger [1] (see egcitest). Stochastic cointegration, where cointegrating relations produce trend-stationary series (that is, d0 is nonzero), extends the definition of cointegration to accommodate a greater variety of economic series. • Unless higher-order trends are present in the data, models with fewer restrictions can produce good in-sample fits, but poor out-of-sample forecasts.
Alternative Functionality App The Econometric Modeler app enables you to conduct the Johansen cointegration test.
Version History Introduced in R2011a
References [1] Engle, R. F. and C. W. J. Granger. "Co-Integration and Error-Correction: Representation, Estimation, and Testing." Econometrica. Vol. 55, 1987, pp. 251–276. [2] Hamilton, James D. Time Series Analysis. Princeton, NJ: Princeton University Press, 1994. [3] Johansen, S. Likelihood-Based Inference in Cointegrated Vector Autoregressive Models. Oxford: Oxford University Press, 1995. [4] MacKinnon, J. G. "Numerical Distribution Functions for Unit Root and Cointegration Tests." Journal of Applied Econometrics. Vol. 11, 1996, pp. 601–618. [5] Turner, P. M. "Testing for Cointegration Using the Johansen Approach: Are We Using the Correct Critical Values?" Journal of Applied Econometrics. Vol. 24, 2009, pp. 825–831.
See Also Objects vecm | varm Functions estimate | vec2var | egcitest | jcontest Topics “Cointegration and Error Correction Analysis” on page 9-108 “Identifying Single Cointegrating Relations” on page 9-114 “Compare Approaches to Cointegration Analysis” on page 9-143 “Test for Cointegration Using the Johansen Test” on page 9-138 “Test Cointegrating Vectors” on page 9-147 “Estimate VEC Model Parameters Using jcitest” on page 9-140 “Testing Cointegrating Vectors and Adjustment Speeds” on page 9-146 “Specifying Multivariate Lag Operator Polynomials and Coefficient Constraints Interactively” on page 4-50 “Estimate Vector Error-Correction Model Using Econometric Modeler” on page 4-180
jcontest Johansen constraint test
Syntax h = jcontest(Y,r,test,Cons) [h,pValue,stat,cValue] = jcontest(Y,r,test,Cons) StatTbl = jcontest(Tbl,r,test,Cons) [ ___ ] = jcontest( ___ ,Name=Value) [ ___ ,mles] = jcontest( ___ )
Description h = jcontest(Y,r,test,Cons) returns the rejection decisions h from conducting the Johansen constraint test, which assesses linear constraints on either the error-correction (adjustment) speeds A or the cointegration space spanned by the cointegrating matrix B in the reduced-rank VEC(q) on page 12-1531 model of the multivariate time series yt, where: • Y is a matrix of observations of yt. • r is the common rank of matrices A and B. • test specifies the constraint types, including linear or equality constraints on A or B. • Cons specifies the test constraint values. For a particular test, the constraint type and values form the null hypotheses tested against the alternative hypothesis H(r) of cointegration rank less than or equal to r (an unconstrained VEC model). The tests also produce maximum likelihood estimates of the parameters in the VEC(q) model, subject to the constraints. Each element of test and Cons results in a separate test. [h,pValue,stat,cValue] = jcontest(Y,r,test,Cons) also returns the p-values pValue, test statistics stat, and critical values cValue of the test. StatTbl = jcontest(Tbl,r,test,Cons) returns the table StatTbl containing variables for the test results, statistics, and settings from conducting the Johansen constraint test on all variables of the input table or timetable Tbl. To select a subset of variables in Tbl to test, use the DataVariables name-value argument. [ ___ ] = jcontest( ___ ,Name=Value) specifies options using one or more name-value arguments in addition to any of the input argument combinations in previous syntaxes. jcontest returns the output argument combination for the corresponding input arguments. Some options control the number of tests to conduct. For example, jcontest(Tbl,r,test,Cons,Model="H2",DataVariables=1:5) tests the first 5 variables in the input table Tbl using the Johansen model that excludes all deterministic terms.
[ ___ ,mles] = jcontest( ___ ) also returns a structure of maximum likelihood estimates associated with the constrained VEC(q) models of the multivariate time series yt.
Examples
Conduct Johansen Constraint Test on Matrix of Data
Test for weak exogeneity of a time series, with respect to other series in the system, by using default options of jcontest. Input the time series data as a numeric matrix.
Load data of Canadian inflation and interest rates Data_Canada.mat, which contains the series in the matrix Data.
    load Data_Canada
    series'

    ans = 5x1 cell
        {'(INF_C) Inflation rate (CPI-based)'         }
        {'(INF_G) Inflation rate (GDP deflator-based)'}
        {'(INT_S) Interest rate (short-term)'         }
        {'(INT_M) Interest rate (medium-term)'        }
        {'(INT_L) Interest rate (long-term)'          }
Use the Johansen constraint test to assess whether the CPI-based inflation rate y1,t is weakly exogenous with respect to the three interest rate series by testing the following constraint in a 4-D VEC model of the series:
$$(1 - L)y_{1,t} = c + \varepsilon_t.$$
Specify a rank of 1 for the test, a linear constraint on the 4-by-1 adjustment speed vector A so that a1 = 0, and default options. Return the rejection decision.
    Cons = [1; 0; 0; 0];
    Y = Data(:,[1 3:5]);
    h = jcontest(Y,1,"ACon",Cons)

    h = logical
       0
Given default options and assumptions, h = 0 suggests that the test fails to reject the null hypothesis of the constrained model, which is that the inflation rate is weakly exogenous with respect to the interest rate series.
Return Test p-Values and Decision Statistics
Load data of Canadian inflation and interest rates Data_Canada.mat, which contains the series in the matrix Data.
    load Data_Canada
Conduct the default Johansen constraint test to assess whether the CPI-based inflation rate is weakly exogenous with respect to the interest rate series. Return the test decision, p-value, test statistic, and critical value.
    Cons = [1; 0; 0; 0];
    Y = Data(:,[1 3:5]);
    [h,pValue,stat,cValue] = jcontest(Y,1,"ACon",Cons)

    h = logical
       0
    pValue = 0.3206
    stat = 0.9865
    cValue = 3.8415
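The reported decision statistics are consistent with a chi-square reference distribution and a right-tail test. The following minimal sketch reproduces the critical value and p-value from the reported statistic. It assumes one degree of freedom (inferred from the reported critical value, not stated in the example) and uses chi2inv and chi2cdf from Statistics and Machine Learning Toolbox.

```matlab
% Reported test statistic from the example. dof = 1 is an assumption
% inferred from the reported critical value of 3.8415.
stat  = 0.9865;
dof   = 1;
alpha = 0.05;                          % default significance level

cValue = chi2inv(1 - alpha,dof);       % right-tail critical value
pValue = chi2cdf(stat,dof,'upper');    % right-tail probability
h      = stat > cValue;                % reject only if stat exceeds cValue
```

Because stat is smaller than cValue, the sketch arrives at the same decision as the example: the test fails to reject the constrained model.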
Conduct Johansen Constraint Test on Table Variables
Test for weak exogeneity of time series, which are variables in a table, with respect to the other time series in the table. Return a table of results.
Load data of Canadian inflation and interest rates Data_Canada.mat. Convert the table DataTable to a timetable.
    load Data_Canada
    dates = datetime(dates,ConvertFrom="datenum");
    TT = table2timetable(DataTable,RowTimes=dates);
    TT.Observations = [];
Use the Johansen constraint test to assess whether the CPI-based and GDP-deflator-based inflation rates (y1,t and y2,t, respectively) are weakly exogenous with respect to the three interest rate series by testing the following constraints in a 5-D VEC model of the series:
$$(1 - L)y_{1,t} = c_{1,1} + \varepsilon_{1,t}$$
$$(1 - L)y_{2,t} = c_{2,1} + \varepsilon_{2,t}.$$
Specify a rank of 2 for the test, a linear constraint on the 5-by-2 adjustment speed matrix A so that a1 = 0 and a2 = 0, and default options.
    Cons = [1 0;
            0 1;
            0 0;
            0 0;
            0 0];
    StatTbl = jcontest(TT,2,"ACon",Cons)

    StatTbl=1×8 table
                  h        pValue       stat     cValue    Lags    Alpha    Model       Test
                _____    __________    ______    ______    ____    _____    ______    ________
    Test 1      true     1.3026e-05    27.907    9.4877     0      0.05     {'H1'}    {'acon'}
StatTbl.h = 1 means that the test rejects the null hypothesis of the constrained model that the inflation rates are jointly weakly exogenous. StatTbl.pValue = 1.3026e-5 suggests that the evidence to reject is strong.
By default, jcontest conducts the Johansen constraint test on all variables in the input table. To select a subset of variables from an input table, set the DataVariables option.
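As a minimal sketch of the DataVariables option, the following code selects the CPI-based inflation rate and the three interest rate series from the timetable by position and repeats the single-constraint test from the matrix example. It assumes that, after removing Observations, the timetable variables appear in the same order as the series listing shown earlier.

```matlab
load Data_Canada
dates = datetime(dates,ConvertFrom="datenum");
TT = table2timetable(DataTable,RowTimes=dates);
TT.Observations = [];

% Assumed variable order: INF_C, INF_G, INT_S, INT_M, INT_L.
idx = [1 3 4 5];                 % skip the GDP-deflator-based inflation rate
Cons = [1; 0; 0; 0];             % a1 = 0 in the 4-D model
StatTbl = jcontest(TT,1,"ACon",Cons,DataVariables=idx);
```

Under that ordering assumption, the test uses the same four series as Data(:,[1 3:5]), so StatTbl should agree with the matrix-input results.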
Test Purchasing Power Parity Using jcontest
Use the Johansen framework to test multivariate time series with the following characteristics:
1 The log Australian CPI, log US CPI, and the exchange rate series of the countries are stationary.
2 The three series exhibit cointegration.
3 The Australian and the US dollars have the same purchasing power.
Load and Inspect Data
Load the data on Australian and U.S. prices Data_JAustralian.mat, which contains the table DataTable. Convert the table to a timetable.
    load Data_JAustralian
    dates = datetime(dates,ConvertFrom="datenum");
    TT = table2timetable(DataTable,RowTimes=dates);
    TT.Dates = [];
Plot the log Australian and log US CPI series (PAU and PUS, respectively), and the log AUD/USD exchange rate series EXCH.
    varnames = ["PAU" "PUS" "EXCH"];
    plot(TT.Time,TT{:,varnames})
    legend(varnames,Location="best")
    grid on
Pretest for Stationarity
Use jcontest to test the null hypothesis that the individual series are stationary by specifying, for each variable j, the following constrained model:
$$(1 - L)y_t = \begin{bmatrix} a_{1,j} \\ a_{2,j} \\ a_{3,j} \end{bmatrix}\left(y_{j,t-1} + c_0\right) + c_1 + \varepsilon_t.$$
Specify the variables to use in the test.
    Cons = num2cell(eye(3),1)

    Cons=1×3 cell array
        {3x1 double}    {3x1 double}    {3x1 double}

    StatTbl0 = jcontest(TT,1,"BVec",Cons,DataVariables=varnames)

    StatTbl0=3×8 table
                  h        pValue       stat     cValue    Lags    Alpha    Model       Test
                _____    __________    ______    ______    ____    _____    ______    ________
    Test 1      true      1.307e-05     22.49    5.9915     0      0.05     {'H1'}    {'bvec'}
    Test 2      true     1.0274e-05    22.972    5.9915     0      0.05     {'H1'}    {'bvec'}
    Test 3      false       0.06571     5.445    5.9915     0      0.05     {'H1'}    {'bvec'}
jcontest returns a table of test results. Each row corresponds to a separate test, and columns correspond to results or specified options for each test. StatTbl0.h(j) = 1 rejects the null hypothesis of stationarity of variable j, and StatTbl0.h(j) = 0 fails to reject stationarity.
Test for Cointegration
Test for cointegration by using jcitest.
    StatTbl1 = jcitest(TT,DataVariables=varnames)

    StatTbl1=1×7 table
           r0       r1       r2      Model     Lags      Test       Alpha
          _____    _____    _____    ______    ____    _________    _____
    t1    true     false    false    {'H1'}     0      {'trace'}    0.05
StatTbl1.r0 = 1 rejects a cointegration rank of 0, whereas StatTbl1.r1 = 0 and StatTbl1.r2 = 0 fail to reject ranks of 1 and 2. Together, the results suggest that the series exhibit at least rank 1 cointegration.
Test for Purchasing Power Parity
Test for purchasing power parity PAU = PUS + EXCH.
    StatTbl2 = jcontest(TT,1,"BCon",[1 -1 -1]',DataVariables=varnames)

    StatTbl2=1×8 table
                  h       pValue      stat     cValue    Lags    Alpha    Model       Test
                _____    ________    ______    ______    ____    _____    ______    ________
    Test 1      false    0.053995    3.7128    3.8415     0      0.05     {'H1'}    {'bcon'}

StatTbl2.h = 0 means that the test fails to reject the null hypothesis of the constrained model, that is, purchasing power parity between the Australian and US dollars should not be rejected.
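The cointegration pretest and the constraint test in this example can be chained so that the rank passed to jcontest comes from the jcitest decisions. The following is a minimal sketch of one such rule, choosing the smallest rank that jcitest fails to reject. The variable names RankTbl, reject, and r are illustrative, and the selection rule is a common convention rather than something this page prescribes.

```matlab
load Data_JAustralian
dates = datetime(dates,ConvertFrom="datenum");
TT = table2timetable(DataTable,RowTimes=dates);
TT.Dates = [];
varnames = ["PAU" "PUS" "EXCH"];

% Rejection decisions for null ranks 0, 1, and 2.
RankTbl = jcitest(TT,DataVariables=varnames);
reject = RankTbl{1,["r0" "r1" "r2"]};

% Smallest null rank that is not rejected (index 1 corresponds to rank 0).
r = find(~reject,1) - 1;
if isempty(r) || r == 0
    error("No cointegration rank in [1,numDims-1] is supported by the data.")
end

% Test purchasing power parity at the inferred rank.
StatTbl = jcontest(TT,r,"BCon",[1 -1 -1]',DataVariables=varnames);
```

With the decisions reported above (r0 rejected, r1 and r2 not rejected), the rule gives r = 1, which matches the rank used in the example.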
Inspect Maximum Likelihood Estimates of Constrained VEC Models
Compare the four types of supported constraints on the adjustment-speed and cointegrating matrices.
Load the data on Australian and US prices Data_JAustralian.mat, which contains the table DataTable. Convert the table to a timetable. Consider a 3-D VEC model consisting of the log Australian and US CPI, and the log AUD/USD exchange rate series.
    load Data_JAustralian
    dates = datetime(dates,ConvertFrom="datenum");
    TT = table2timetable(DataTable,RowTimes=dates);
    TT.Dates = [];
    varnames = ["PAU" "PUS" "EXCH"];
Conduct the four Johansen constraint tests; specify the arbitrary constraint value [1 −1 −1]′. Return the test results and maximum likelihood estimates of the constrained model.
    [StatTbl,mle] = jcontest(TT,1,["ACon" "AVec" "BCon" "BVec"], ...
        [1 -1 -1]',DataVariables=varnames);
    StatTbl

    StatTbl=4×8 table
                  h        pValue       stat     cValue    Lags    Alpha    Model       Test
                _____    __________    ______    ______    ____    _____    ______    ________
    Test 1      false       0.11047    2.5475    3.8415     0      0.05     {'H1'}    {'acon'}
    Test 2      true     3.0486e-08    34.612    5.9915     0      0.05     {'H1'}    {'avec'}
    Test 3      false      0.053995    3.7128    3.8415     0      0.05     {'H1'}    {'bcon'}
    Test 4      false      0.074473    5.1946    5.9915     0      0.05     {'H1'}    {'bvec'}
mle is a 4-by-1 structure array with fields containing the maximum likelihood estimates of the constrained models for each test.
For each test, display the estimates of A and B, and compute the MLE of the impact matrix C = AB′. The impactmat function is a local supporting function on page 12-1526 that computes the MLE of the impact matrix and displays the estimated matrices.
    [AACon,BACon,CACon] = impactmat(mle(1))

    AACon = 3×1
        0.0043
        0.0055
       -0.0012

    BACon = 3×1
        2.8496
       -2.3341
       -6.2670

    CACon = 3×3
        0.0121   -0.0099   -0.0267
        0.0156   -0.0128   -0.0343
       -0.0035    0.0028    0.0076

    [1 -1 -1]*AACon

    ans = -1.7347e-18

    [AAVec,BAVec,CAVec] = impactmat(mle(2))

    AAVec = 3×1
         1
        -1
        -1

    BAVec = 3×1
       -0.0204
        0.0158
        0.0246

    CAVec = 3×3
       -0.0204    0.0158    0.0246
        0.0204   -0.0158   -0.0246
        0.0204   -0.0158   -0.0246

    [ABCon,BBCon,CBCon] = impactmat(mle(3))

    ABCon = 3×1
       -0.0043
       -0.0052
       -0.0089

    BBCon = 3×1
        1.8001
       -3.9210
        5.7211

    CBCon = 3×3
       -0.0078    0.0170   -0.0248
       -0.0094    0.0206   -0.0300
       -0.0159    0.0347   -0.0507

    [1 -1 -1]*BBCon

    ans = 0

    [ABVec,BBVec,CBVec] = impactmat(mle(4))

    ABVec = 3×1
        0.0252
        0.0422
        0.0556

    BBVec = 3×1
         1
        -1
        -1

    CBVec = 3×3
        0.0252   -0.0252   -0.0252
        0.0422   -0.0422   -0.0422
        0.0556   -0.0556   -0.0556
Observe that the AVec and BVec constraints apply the constraint value directly to the coefficients, whereas the estimates of the ACon and BCon constraints fulfill the corresponding linear constraint.
Supporting Function
    function [A,B,C] = impactmat(mlest)
        A = mlest.paramVals.A;
        B = mlest.paramVals.B;
        C = A*B';
    end
Input Arguments
Y — Data representing observations of multivariate time series yt
numeric matrix
Data representing observations of a multivariate time series yt, specified as a numObs-by-numDims numeric matrix. Each column of Y corresponds to a variable, and each row corresponds to an observation.
Data Types: double

Tbl — Data representing observations of multivariate time series yt
table | timetable
Data representing observations of a multivariate time series yt, specified as a table or timetable with numObs rows. Each row of Tbl is an observation. To select a subset of variables in Tbl to test, use the DataVariables name-value argument.

r — Common rank of A and B
positive integer in [1,numDims − 1]
Common rank of A and B, specified as a positive integer in the interval [1,numDims − 1].
Tip Infer r by conducting a Johansen test using jcitest.
Data Types: double

test — Null hypothesis constraint types
"ACon" | "AVec" | "BCon" | "BVec" | character vector | string vector | cell vector of character vectors
Null hypothesis constraint type, specified as a constraint name in the table, or a string vector or cell vector of character vectors of such values.

    Constraint Name    Description
    "ACon"             Test linear constraints on A.
    "AVec"             Test specific vectors in A.
    "BCon"             Test linear constraints on B.
    "BVec"             Test specific vectors in B.
jcontest conducts a separate test for each value in test.
Data Types: string | char | cell

Cons — Null hypothesis constraint values R
numeric matrix | cell vector of numeric matrices
Null hypothesis constraint values, specified as a value for the corresponding constraint type test, or a cell vector of such values, in this table.
For constraints on B, the number of rows in each matrix numDims1 is one of the following, where numDims is the number of dimensions in the input data:
• numDims + 1 when the Model name-value argument is "H*" or "H1*" and constraints include the restricted deterministic term in the model
• numDims otherwise

    Constraint Type test    Constraint Value Cons                       Description
    "ACon"                  R, a numDims-by-numCons numeric matrix      Specifies numCons constraints on A given by R'A = 0, where numCons ≤ numDims − r.
    "AVec"                  numDims1-by-numCons numeric matrix          Specifies numCons equality constraints imposed on error-correction speed vectors in A, where numCons ≤ r.
    "BCon"                  R, a numDims-by-numCons numeric matrix      Specifies numCons constraints on B given by R'B = 0, where numCons ≤ numDims − r.
    "BVec"                  numDims1-by-numCons numeric matrix          Specifies numCons equality constraints imposed on numCons of the cointegrating vectors in B, where numCons ≤ r.
Tip When you construct constraint values, use the following interpretations of the rows and columns of A and B.
• Row i of A contains the adjustment speeds of variable yi,t to disequilibrium in each of the r cointegrating relations.
• Column j of A contains the adjustment speeds of each of the numDims variables to disequilibrium in cointegrating relation j.
• Row i of B contains the coefficients of variable yi,t in each of the r cointegrating relations.
• Column j of B contains the coefficients of each of the numDims variables in cointegrating relation j.

jcontest conducts a separate test for each cell in Cons.
Data Types: double | cell
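The size rules for Cons can be checked before calling jcontest. The following minimal sketch applies the rules from the constraint table to a 3-D system with rank 1 under the default model, where constraint values have numDims rows. The names okCon, okVec, and numTests are illustrative only and are not part of the toolbox.

```matlab
numDims = 3;                       % number of series in the system
r       = 1;                       % cointegration rank

% One linear constraint for "ACon" or "BCon": numCons <= numDims - r.
R = [1 -1 -1]';
okCon = size(R,1) == numDims && size(R,2) <= numDims - r;

% One fixed vector for "AVec" or "BVec": numCons <= r.
V = [1 -1 -1]';
okVec = size(V,1) == numDims && size(V,2) <= r;

% One test per cell: three unit vectors give three separate tests.
Cons = num2cell(eye(numDims),1);
numTests = numel(Cons);
```

Both checks pass here, and the cell vector Cons has the same form as the one used in the stationarity pretest example.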
Note jcontest removes the following observations from the specified data:
• All rows containing at least one missing observation, represented by a NaN value
• From the beginning of the data, initial values required to initialize lagged variables
Name-Value Pair Arguments
Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter.
Before R2021a, use commas to separate each name and value, and enclose Name in quotes.
Example: jcontest(Tbl,r,test,Cons,Model="H2",DataVariables=1:5) tests the first 5 variables in the input table Tbl using the Johansen model that excludes all deterministic terms.

Model — Johansen form of VEC(q) model deterministic terms
"H1" (default) | "H2" | "H1*" | "H*" | "H" | character vector | string vector | cell vector of character vectors
Johansen form of the VEC(q) model deterministic terms [3], specified as a Johansen form name in the table, or a string vector or cell vector of character vectors of such values (for model parameter definitions, see “Vector Error-Correction (VEC) Model” on page 12-1531).

    Value    Error-Correction Term        Description
    "H2"     AB′yt−1                      No intercepts or trends are present in the cointegrating relations, and no deterministic trends are present in the levels of the data. Specify this model only when all response series have a mean of zero.
    "H1*"    A(B′yt−1+c0)                 Intercepts are present in the cointegrating relations, and no deterministic trends are present in the levels of the data.
    "H1"     A(B′yt−1+c0)+c1              Intercepts are present in the cointegrating relations, and deterministic linear trends are present in the levels of the data.
    "H*"     A(B′yt−1+c0+d0t)+c1          Intercepts and linear trends are present in the cointegrating relations, and deterministic linear trends are present in the levels of the data.
    "H"      A(B′yt−1+c0+d0t)+c1+d1t      Intercepts and linear trends are present in the cointegrating relations, and deterministic quadratic trends are present in the levels of the data. If quadratic trends are not present in the data, this model can produce good in-sample fits but poor out-of-sample forecasts.

jcontest conducts a separate test for each value in Model.
Example: Model="H1*" uses the Johansen form H1* for all tests.
Example: Model=["H1*" "H1"] uses Johansen form H1* for the first test, and then uses Johansen form H1 for the second test.
Data Types: string | char | cell

Lags — Number of lagged differences q
0 (default) | nonnegative integer | vector of nonnegative integers
Number of lagged differences q in the VEC(q) model, specified as a nonnegative integer or vector of nonnegative integers.
jcontest conducts a separate test for each value in Lags.
Example: Lags=1 includes Δyt–1 in the model for all tests.
Example: Lags=[0 1] includes no lags in the model for the first test, and then includes Δyt–1 in the model for the second test.
Data Types: double

Alpha — Nominal significance level
0.05 (default) | numeric scalar | numeric vector
Nominal significance level for the hypothesis test, specified as a numeric scalar between 0.001 and 0.999 or a numeric vector of such values.
jcontest conducts a separate test for each value in Alpha.
Example: Alpha=[0.01 0.05] uses a level of significance of 0.01 for the first test, and then uses a level of significance of 0.05 for the second test.
Data Types: double

DataVariables — Variables in Tbl
all variables (default) | string vector | cell vector of character vectors | vector of integers | logical vector
Variables in Tbl for which jcontest conducts the test, specified as a string vector or cell vector of character vectors containing variable names in Tbl.Properties.VariableNames, or an integer or logical vector representing the indices of names. The selected variables must be numeric.
Example: DataVariables=["GDP" "CPI"]
Example: DataVariables=[true true false false] or DataVariables=[1 2] selects the first and second table variables.
Data Types: double | logical | char | cell | string

Note
• When jcontest conducts multiple tests, the function applies all single settings (scalars or character vectors) to each test.
• All vector-valued specifications that control the number of tests must have equal length.
• If you specify the matrix Y and any value is a row vector, all outputs are row vectors.
• A lagged and differenced time series has a reduced sample size. Absent presample values, if the test series yt is defined for t = 1,…,T, the lagged series yt–k is defined for t = k+1,…,T. The first difference applied to the lagged series yt–k further reduces the time base to k+2,…,T. With p lagged differences, the common time base is p+2,…,T and the effective sample size is T–(p+1).
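The notes above can be sketched with a vector-valued Lags setting: one constraint value is reused across three tests, and the effective sample size for each test follows T − (p + 1). The variable names effT and resRows are illustrative. That the residual row counts equal effT is the behavior this page documents for data without missing values; the sketch does not verify it independently.

```matlab
load Data_JAustralian                         % data set used in the examples
Y = DataTable{:,["PAU" "PUS" "EXCH"]};        % numeric matrix of the three series
T = size(Y,1);

lags = [0 1 2];                               % one test per element of Lags
[h,pValue,~,~,mles] = jcontest(Y,1,"BCon",[1 -1 -1]',Lags=lags);

effT = T - (lags + 1);                        % effective sample size per test
resRows = arrayfun(@(s) size(s.res,1),mles);  % residual rows stored per test
```

Comparing resRows with effT shows how each additional lagged difference removes one more observation from the start of the sample.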
Output Arguments
h — Test rejection decisions
logical scalar | logical vector
Test rejection decisions, returned as a logical scalar or vector with length equal to the number of tests. jcontest returns h when you supply the input Y.
• Values of 1 indicate rejection of the null hypothesis that the specified constraints test and Cons hold in favor of the alternative hypothesis that they do not hold.
• Values of 0 indicate failure to reject the null hypothesis that the constraints hold.

pValue — Test statistic p-values
numeric scalar | numeric vector
Test statistic p-values, returned as a numeric scalar or vector with length equal to the number of tests. jcontest returns pValue when you supply the input Y.
The p-values are right-tail probabilities.

stat — Test statistics
numeric scalar | numeric vector
Test statistics, returned as a numeric scalar or vector with length equal to the number of tests. jcontest returns stat when you supply the input Y.
The test statistics are likelihood ratios determined by the test.

cValue — Critical values
numeric scalar | numeric vector
Critical values, returned as a numeric scalar or vector with length equal to the number of tests. jcontest returns cValue when you supply the input Y.
The asymptotic distributions of the test statistics are chi-square, with the degree-of-freedom parameter determined by the test. The critical values of the test statistics are for a right-tail probability.

StatTbl — Test summary
table
Test summary, returned as a table with variables for the outputs h, pValue, stat, and cValue, and with a row for each test. jcontest returns StatTbl when you supply the input Tbl.
StatTbl contains variables for the test settings specified by Lags, Alpha, Model, and Test.

mles — Maximum likelihood estimates (MLE) associated with constrained VEC(q) model of yt
structure array
Maximum likelihood estimates associated with the constrained VEC(q) model of yt, returned as a structure array with the number of records equal to the number of tests.
Each element of mles has the fields in this table. You can access a field using dot notation, for example, mles(3).paramVals contains a structure of the parameter estimates.

    Field         Description
    paramNames    Cell vector of parameter names, of the form: {A, B, B1, … Bq, c0, d0, c1, d1}. Elements depend on the values of Lags and Model.
    paramVals     Structure of parameter estimates with field names corresponding to the parameter names in paramNames.
    res           T-by-numDims matrix of residuals, where T is the effective sample size, obtained by fitting the VEC(q) model of yt to the input data.
    EstCov        Estimated covariance Q of the innovations process εt.
    rLL           Restricted loglikelihood of yt under the null hypothesis.
    uLL           Unrestricted loglikelihood of yt under the alternative hypothesis.
    dof           Degrees of freedom of the asymptotic chi-square distribution of the test statistic.
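The fields rLL, uLL, and dof relate to the test outputs in the way a likelihood ratio test usually does. As a hedged sketch, assuming jcontest uses the standard likelihood ratio form (this page states only that the statistics are likelihood ratios with asymptotic chi-square distributions), the relationship can be written as:

```latex
\mathrm{stat} = 2\,(\mathrm{uLL} - \mathrm{rLL}), \qquad
\mathrm{pValue} = \Pr\!\left(\chi^2_{\mathrm{dof}} > \mathrm{stat}\right), \qquad
\mathrm{cValue} = F^{-1}_{\chi^2_{\mathrm{dof}}}(1 - \alpha)
```

Here $F^{-1}_{\chi^2_{\mathrm{dof}}}$ denotes the inverse chi-square cumulative distribution function with dof degrees of freedom, and $\alpha$ is the value of Alpha.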
More About
Vector Error-Correction (VEC) Model
A vector error-correction (VEC) model is a multivariate, stochastic time series model consisting of a system of m = numDims equations of m distinct, differenced response variables. Equations in the system can include an error-correction term, which is a linear function of the responses in levels used to stabilize the system. The cointegrating rank r is the number of cointegrating relations that exist in the system.
Each response equation can include a degree q autoregressive polynomial composed of first differences of the response series, a constant, a time trend, and a constant and time trend in the error-correction term.
Expressed in lag operator notation, a VEC(q) model for a multivariate time series yt is
$$\Phi(L)(1 - L)y_t = A(B'y_{t-1} + c_0 + d_0 t) + c_1 + d_1 t + \varepsilon_t = c + dt + Cy_{t-1} + \varepsilon_t,$$
where
• yt is an m = numDims dimensional time series corresponding to m response variables at time t, t = 1,...,T.
• Φ(L) = I − Φ1L − Φ2L^2 − ... − ΦqL^q, I is the m-by-m identity matrix, and Lyt = yt–1.
• The cointegrating relations are B'yt–1 + c0 + d0t and the error-correction term is A(B'yt–1 + c0 + d0t).
• r is the number of cointegrating relations and, in general, 0 ≤ r ≤ m.
• A is an m-by-r matrix of adjustment speeds.
• B is an m-by-r cointegration matrix.
• C = AB′ is an m-by-m impact matrix with a rank of r.
• c0 is an r-by-1 vector of constants (intercepts) in the cointegrating relations.
• d0 is an r-by-1 vector of linear time trends in the cointegrating relations.
• c1 is an m-by-1 vector of constants (deterministic linear trends in yt).
• d1 is an m-by-1 vector of linear time-trend values (deterministic quadratic trends in yt).
• c = Ac0 + c1 and is the overall constant.
• d = Ad0 + d1 and is the overall time-trend coefficient.
• Φj is an m-by-m matrix of short-run coefficients, where j = 1,...,q and Φq is not a matrix containing only zeros.
• εt is an m-by-1 vector of random Gaussian innovations, each with a mean of 0 and collectively an m-by-m covariance matrix Σ. For t ≠ s, εt and εs are independent.
If m = r, then the VEC model is a stable VAR(q + 1) model in the levels of the responses. If r = 0, then the error-correction term is a matrix of zeros, and the VEC(q) model is a stable VAR(q) model in the first differences of the responses.
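For the simplest case q = 0, rearranging (1 − L)yt = Cyt–1 + c + εt gives a VAR(1) model in levels with coefficient matrix I + C. The following minimal sketch builds the impact matrix from rounded values of A and B reported for the BVec test in the examples, and inspects its rank and the eigenvalue moduli of I + C. The names A1, rankC, and lambda are illustrative.

```matlab
% Rounded BVec estimates from the examples (m = 3, r = 1).
A = [0.0252; 0.0422; 0.0556];     % adjustment speeds, m-by-r
B = [1; -1; -1];                  % cointegrating vector, m-by-r

C  = A*B';                        % impact matrix, m-by-m with rank r
A1 = eye(3) + C;                  % VAR(1) coefficient in levels when q = 0

rankC  = rank(C);                             % equals r
lambda = sort(abs(eig(A1)),"descend");        % eigenvalue moduli of A1
```

With rank 1 in a 3-D system, two moduli equal 1 (the m − r common stochastic trends) and the third is 1 + B′A, which is below 1 for these values.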
Tips
• jcontest compares finite-sample statistics to asymptotic critical values, and tests can show significant size distortions for small samples [2]. Larger samples lead to more reliable inferences.
• To convert VEC(q) model parameters in the mles output to vector autoregressive (VAR) model parameters, use the vec2var function.
Algorithms
• jcontest identifies deterministic terms that are outside of the cointegrating relations, c1 and d1, by projecting constant and linear regression coefficients, respectively, onto the orthogonal complement of A.
• The parameters A and B in the reduced-rank VEC(q) model are not identifiable. jcontest identifies B using the methods in [3], depending on the test.
• Tests on B answer questions about the space of cointegrating relations. Tests on A answer questions about common driving forces in the system. For example, an all-zero row in A indicates a variable that is weakly exogenous with respect to the coefficients in B. Such a variable can affect other variables, but it does not adjust to disequilibrium in the cointegrating relations. Similarly, a standard unit vector column in A indicates a variable that is exclusively adjusting to disequilibrium in a particular cointegrating relation.
• Constraint matrices R satisfying R′A = 0 or R′B = 0 are equivalent to A = Hφ or B = Hφ, where H is the orthogonal complement of R (null(R')) and φ is a vector of free parameters.
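The equivalence in the last item can be checked numerically with base MATLAB. In this minimal sketch, the values of phi are arbitrary illustrative numbers, not estimates.

```matlab
R = [1 -1 -1]';            % constraint matrix, numDims-by-numCons
H = null(R');              % orthonormal basis for the orthogonal complement of R

phi = [2; -3];             % arbitrary free parameters (hypothetical values)
A   = H*phi;               % any A of this form satisfies the constraint

residual = R'*A;           % zero up to floating-point error
```

Because R has one column in a 3-D system, H has two columns, so the constrained vector keeps two free parameters.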
Version History
Introduced in R2011a
References
[1] Hamilton, James D. Time Series Analysis. Princeton, NJ: Princeton University Press, 1994.
[2] Haug, A. “Testing Linear Restrictions on Cointegrating Vectors: Sizes and Powers of Wald Tests in Finite Samples.” Econometric Theory. v. 18, 2002, pp. 505–524.
[3] Johansen, S. Likelihood-Based Inference in Cointegrated Vector Autoregressive Models. Oxford: Oxford University Press, 1995.
[4] Juselius, K. The Cointegrated VAR Model. Oxford: Oxford University Press, 2006.
[5] Morin, N. “Likelihood Ratio Tests on Cointegrating Vectors, Disequilibrium Adjustment Vectors, and their Orthogonal Complements.” European Journal of Pure and Applied Mathematics. v. 3, 2010, pp. 541–571.
See Also
Objects
vecm | varm
Functions
jcitest | vec2var
Topics
“Cointegration and Error Correction Analysis” on page 9-108
“Identifying Single Cointegrating Relations” on page 9-114
“Compare Approaches to Cointegration Analysis” on page 9-143
“Test for Cointegration Using the Johansen Test” on page 9-138
“Test Cointegrating Vectors” on page 9-147
“Estimate VEC Model Parameters Using jcitest” on page 9-140
“Testing Cointegrating Vectors and Adjustment Speeds” on page 9-146
kpsstest KPSS test for stationarity
Syntax
h = kpsstest(y)
[h,pValue,stat,cValue] = kpsstest(y)
StatTbl = kpsstest(Tbl)
[ ___ ] = kpsstest( ___ ,Name=Value)
[ ___ ,reg] = kpsstest( ___ )
Description
h = kpsstest(y) returns the rejection decision h from conducting the Kwiatkowski, Phillips, Schmidt, and Shin (KPSS) test on page 12-1544 for a unit root in the univariate time series y.

[h,pValue,stat,cValue] = kpsstest(y) also returns the p-value pValue, test statistic stat, and critical value cValue of the test.

StatTbl = kpsstest(Tbl) returns the table StatTbl containing variables for the test results, statistics, and settings from conducting the KPSS test for a unit root in the last variable of the input table or timetable Tbl. To select a different variable in Tbl to test, use the DataVariable name-value argument.

[ ___ ] = kpsstest( ___ ,Name=Value) specifies options using one or more name-value arguments in addition to any of the input argument combinations in previous syntaxes. kpsstest returns the output argument combination for the corresponding input arguments.
Some options control the number of tests to conduct. The following conditions apply when kpsstest conducts multiple tests:
• kpsstest treats each test as separate from all other tests.
• If you specify y, all outputs are vectors.
• If you specify Tbl, each row of StatTbl contains the results of the corresponding test.
For example, kpsstest(Tbl,DataVariable="GDP",Alpha=0.025,Lags=[0 1]) conducts two tests, at a level of significance of 0.025, for the presence of a unit root in the variable GDP of the table Tbl. The first test includes 0 autocovariance lags in the Newey-West estimator of the long-run variance and the second test includes 1 autocovariance lag.

[ ___ ,reg] = kpsstest( ___ ) additionally returns a structure reg of regression statistics for the hypothesis test.
Examples
Conduct KPSS Test on Vector of Data
Test a time series for a unit root using the default options of kpsstest. Input the time series data as a numeric vector.
Load the Nelson-Plosser macroeconomic series data set. Plot the real gross national product (RGNP).
    load Data_NelsonPlosser
    rgnp = DataTable.GNPR;
    dt = datetime(dates,ConvertFrom="datenum");
    plot(dt,rgnp)
    title("Real Gross National Product")
The series exhibits exponential growth. Linearize the RGNP series.
    linRGNP = log(rgnp);
Assess the null hypothesis of the KPSS test, which is that the series is trend stationary. Use default options.
    h = kpsstest(linRGNP)

    h = logical
       1
h = 1 indicates that, at a 5% level of significance, the test rejects the null hypothesis that the linearized Real GNP series is trend stationary, which suggests that the series is unit root nonstationary.
Return Test p-Value and Decision Statistics
Load the Nelson-Plosser macroeconomic series data set, and linearize the RGNP series.
    load Data_NelsonPlosser
    linRGNP = log(DataTable.GNPR);
Assess the null hypothesis that the series is trend stationary. Return the test decision, p-value, test statistic, and critical value.
    [h,pValue,stats,cValue] = kpsstest(linRGNP)

    h = logical
       1
    pValue = 0.0100
    stats = 0.6299
    cValue = 0.1460
Conduct KPSS Test on Table Variable
Test whether a time series, which is one variable in a table, is trend stationary using the default options.
Load the Nelson-Plosser macroeconomic series data set, which contains annual measurements of macroeconomic variables in the table DataTable. Linearize the RGNP series by applying the log transformation, and store the result in DataTable.
    load Data_NelsonPlosser
    DataTable.LinRGNP = log(DataTable.GNPR);
    DataTable.Properties.VariableNames{end}

    ans = 'LinRGNP'
Test the null hypothesis that the linearized RGNP series is trend stationary.
    StatTbl = kpsstest(DataTable)

    StatTbl=1×7 table
                  h      pValue     stat      cValue    Lags    Alpha    Trend
                _____    ______    _______    ______    ____    _____    _____
    Test 1      true      0.01     0.62989    0.146      0      0.05     true

kpsstest returns test results and settings in the table StatTbl, where variables correspond to test results (h, pValue, stat, and cValue) and settings (Lags, Alpha, Trend), and rows correspond to individual tests (in this case, kpsstest conducts one test).
By default, kpsstest tests the last variable in the table. To select a variable from an input table to test, set the DataVariable option.
Specify Lags for Newey-West Estimator by Testing Up
Conduct multiple tests on the linearized RGNP series that reproduce the first row of the second half of Table 5 in [2].
Load the Nelson-Plosser macroeconomic series data set, which contains annual measurements of macroeconomic variables in the table DataTable. Apply the log transformation to all variables in the table.
    load Data_NelsonPlosser
    LogDT = varfun(@log,DataTable);
    LogDT.Properties.VariableNames{end}

    ans = 'log_SP'
varfun applies log to all variables in DataTable, prepends log_ to all transformed variable names, and stores the result in the table LogDT. The final variable is the log of the stock price index series (SP).
Assess the null hypothesis that the linearized RGNP series is trend stationary over a range of lags. Specify the variable name of the linearized RGNP series log_GNPR.
    lags = (0:8);
    StatTbl = kpsstest(LogDT,DataVariable="log_GNPR",Lags=lags)

    StatTbl=9×7 table
                  h       pValue      stat      cValue    Lags    Alpha    Trend
                _____    ________    _______    ______    ____    _____    _____
    Test 1      true         0.01    0.62989    0.146      0      0.05     true
    Test 2      true         0.01    0.33666    0.146      1      0.05     true
    Test 3      true         0.01    0.24209    0.146      2      0.05     true
    Test 4      true       0.0169     0.1976    0.146      3      0.05     true
    Test 5      true     0.027579    0.17291    0.146      4      0.05     true
    Test 6      true      0.04015    0.15782    0.146      5      0.05     true
    Test 7      true     0.048417     0.1479    0.146      6      0.05     true
    Test 8      false     0.05886    0.14122    0.146      7      0.05     true
    Test 9      false    0.066757    0.13695    0.146      8      0.05     true
The tests corresponding to 0 ≤ lags ≤ 2 produce p-values that are less than 0.01. For 2 < lags < 7, the tests indicate sufficient evidence to suggest that log RGNP is unit root nonstationary (as opposed to the series being trend stationary) at the default 5% level.
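To see why the statistic depends on the lag choice, the following minimal sketch computes a KPSS-type statistic on simulated data using the standard form from the KPSS literature: partial sums of detrended residuals, scaled by a Newey-West long-run variance estimate with Bartlett weights. The simulated series, the seed, and all variable names are hypothetical, and the sketch does not claim that kpsstest uses exactly these normalizations.

```matlab
rng(0);                           % hypothetical seed for reproducibility
T = 100;
t = (1:T)';
y = 0.5 + 0.03*t + randn(T,1);    % simulated trend-stationary series
lags = 2;                         % number of autocovariance lags

X = [ones(T,1) t];                % intercept and linear trend
e = y - X*(X\y);                  % OLS residuals from the trend regression
S = cumsum(e);                    % partial sums of the residuals

s2 = (e'*e)/T;                    % lag-0 term of the long-run variance
for k = 1:lags
    w  = 1 - k/(lags + 1);        % Bartlett weight for lag k
    s2 = s2 + 2*w*(e(k+1:end)'*e(1:end-k))/T;
end

stat = sum(S.^2)/(T^2*s2);        % KPSS-type statistic
```

Positively autocorrelated residuals increase s2 as more lags enter the sum, which lowers the statistic; this matches the declining stat column in the table above.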
Select Newey-West Estimator Lags Using Sample Size
Test whether the wage series in the manufacturing sector (1900–1970) has a unit root. Use the advice in [2] to select the number of lags in the Newey-West estimator of the long-run variance.
Load the Nelson-Plosser macroeconomic data set. Remove all missing values from the data relative to the wage series WN.
    load Data_NelsonPlosser
    [DataTable,idx] = rmmissing(DataTable,DataVariables="WN");
    dt = dates(~idx);
Compute the effective sample size T and its square root, where the latter is approximately the number of lags recommended for the Newey-West estimator.
    T = height(DataTable);
    sqrtT = sqrt(T);
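A candidate lag range around the square root of T can be built with integer rounding. This minimal sketch assumes T = 71, which corresponds to annual observations from 1900 through 1970; the one-lag margin on each side is an illustrative choice rather than a rule stated in [2].

```matlab
T = 71;                                       % assumed sample size (1900-1970, annual)
sqrtT = sqrt(T);                              % approximately 8.43

% Integer lags bracketing sqrt(T), widened by one lag on each side.
lags = floor(sqrtT) - 1 : ceil(sqrtT) + 1;
```

For T = 71, the result is the range 7:10, which is the range of lags tested later in this example.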
Plot the wage series. plot(dt,DataTable.WN) title("Wages")
The wage series appears to grow exponentially. Linearize the wages series by applying the log transformation to all variables in the table. LogDT = varfun(@log,DataTable); plot(dt,LogDT.log_WN) title("Log Wages")
The log wage series appears to have a linear trend.
Test the null hypothesis that the log wage series is trend stationary (no unit root) against the alternative hypothesis that the log wage series is difference stationary. Conduct the test by setting a range of lags for the Newey-West estimator around √T.
    StatTbl = kpsstest(LogDT,DataVariable="log_WN",Lags=7:10)
StatTbl=4×7 table
                h      pValue      stat      cValue    Lags    Alpha    Trend
              _____    ______    ________    ______    ____    _____    _____
    Test 1    false     0.1       0.10678    0.146       7     0.05     true
    Test 2    false     0.1       0.10074    0.146       8     0.05     true
    Test 3    false     0.1      0.096634    0.146       9     0.05     true
    Test 4    false     0.1      0.094058    0.146      10     0.05     true
All tests fail to reject the null hypothesis that the log wages series is trend stationary. The p-values are larger than 0.1. The software compares the test statistic to critical values and computes p-values that it interpolates from tables in [2].
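The lag range used in this example can also be derived from the sample size instead of being typed by hand. The following is a minimal sketch, assuming T = 71 (one yearly observation from 1900 through 1970, hard-coded here as an assumption rather than computed from the data) and an illustrative choice of bracketing √T by one lag on each side.

```matlab
% Assumption: 71 yearly observations (1900 through 1970).
T = 71;
sqrtT = sqrt(T);                               % approximately 8.43

% Bracket sqrt(T) by one lag on each side (an illustrative choice).
lags = (floor(sqrtT) - 1):(ceil(sqrtT) + 1);
disp(lags)
```

With T = 71, this produces the range 7:10, which you can pass to kpsstest through the Lags argument.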
Inspect Regression Statistics
Load the Nelson-Plosser macroeconomic series data set. Apply the log transformation to all variables in the table.
    load Data_NelsonPlosser
    LogDT = varfun(@log,DataTable);
Assess the null hypothesis that the linearized RGNP series is trend stationary. Use the Trend option to conduct the test with (true) and without (false) a deterministic time trend term in the response model. Return the regression statistics.
    [~,reg] = kpsstest(LogDT,DataVariable="log_GNPR",Trend=[true false]);
reg is a structure array of length 2 with fields that store the OLS regression results. Each element corresponds to a test. Compare the coefficient estimates.
    withTrend = reg(1).coeff
withTrend = 2×1
    4.5834
    0.0310
    woTrend = reg(2).coeff
woTrend = 5.5595
For the first test, the response model for the regression includes a trend term, so the regression coefficients withTrend include a model intercept (under the null hypothesis) 4.5834 and the coefficient of the time trend 0.0310. For the second test, the response model includes an intercept only for the regression, so the intercept woTrend is 5.5595.
Display the coefficient standard errors for the first test.
    reg(1).se
ans = 2×1
    0.0344
    0.0010
The Lags option includes autocovariance lags in the Newey-West estimator of the long-run variance. Therefore, the option does not affect the estimated OLS coefficients, standard errors, or MSE. Conduct a KPSS test for each lag from 0 through 4. Compare the standard OLS and the Newey-West estimates.
    lags = 0:4;
    [~,regLags] = kpsstest(LogDT,DataVariable="log_GNPR",Lags=lags);
    coeffs = table(regLags.coeff,VariableNames="Lags_"+lags, ...
        RowNames=["Intercept" "Trend"]);
    se = table(regLags.se,VariableNames="Lags_"+lags, ...
        RowNames=["SE_Intercept" "SE_Trend"]);
    mse = table(regLags.MSE,VariableNames="Lags_"+lags, ...
        RowNames="MSE");
    nw = table(regLags.NWEst,VariableNames="Lags_"+lags, ...
        RowNames="NWVar");
    [coeffs; se; mse; nw]
ans=6×5 table
                      Lags_0        Lags_1        Lags_2        Lags_3        Lags_4
                    __________    __________    __________    __________    __________
    Intercept           4.5834        4.5834        4.5834        4.5834        4.5834
    Trend             0.030988      0.030988      0.030988      0.030988      0.030988
    SE_Intercept       0.03443       0.03443       0.03443       0.03443       0.03443
    SE_Trend        0.00095035    0.00095035    0.00095035    0.00095035    0.00095035
    MSE               0.017933      0.017933      0.017933      0.017933      0.017933
    NWVar             0.017354       0.03247      0.045154      0.055321      0.063222
Input Arguments
y — Univariate time series data
numeric vector
Univariate time series data, specified as a numeric vector. Each element of y represents an observation.
Data Types: double
Tbl — Time series data
table | timetable
Time series data, specified as a table or timetable. Each row of Tbl is an observation. Specify a single series (variable) to test by using the DataVariable argument. The selected variable must be numeric.
Note: kpsstest removes missing observations, represented by NaN values, from the input series.
Name-Value Arguments
Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter. Before R2021a, use commas to separate each name and value, and enclose Name in quotes.
Example: kpsstest(Tbl,DataVariable="GDP",Alpha=0.025,Lags=[0 1]) conducts two tests, at a level of significance of 0.025, for the presence of a unit root in the variable GDP of the table Tbl. The first test includes 0 autocovariance lags in the Newey-West estimator of the long-run variance and the second test includes 1 autocovariance lag.
Lags — Number of autocovariance lags
0 (default) | nonnegative integer | vector of nonnegative integers
Number of autocovariance lags to include in the Newey-West estimator of the long-run variance, specified as a nonnegative integer or vector of nonnegative integers. If Lags(j) > 0, kpsstest includes lags 1 through Lags(j) in the estimator for test j. kpsstest conducts a separate test for each element in Lags.
Example: Lags=0:2 includes zero lagged autocovariance terms in the Newey-West estimator for the first test, the lag 1 autocovariance term for the second test, and autocovariance lags 1 and 2 in the third test.
Data Types: double
Trend — Flag for including deterministic trend term δt
true (default) | false | logical vector
Flag for including deterministic trend δt in the model, specified as a logical scalar or vector. kpsstest conducts a separate test for each element in Trend.
Example: Trend=false excludes δt from the response model for all tests.
Data Types: logical
Alpha — Significance level
0.05 (default) | numeric scalar | numeric vector
Significance level for the hypothesis test, specified as a numeric scalar or vector with entries between 0.01 and 0.10. kpsstest conducts a separate test for each element in Alpha.
Example: Alpha=[0.01 0.05] uses a level of significance of 0.01 for the first test, and then uses a level of significance of 0.05 for the second test.
Data Types: double
DataVariable — Variable in Tbl to test
last variable (default) | string scalar | character vector | integer | logical vector
Variable in Tbl to test, specified as a string scalar or character vector containing a variable name in Tbl.Properties.VariableNames, or an integer or logical vector representing the index of a name. The selected variable must be numeric.
Example: DataVariable="GDP"
Example: DataVariable=[false true false false] or DataVariable=2 tests the second table variable.
Data Types: double | logical | char | string
Note
• When kpsstest conducts multiple tests, the function applies all single settings (scalars or character vectors) to each test.
• All vector-valued specifications that control the number of tests must have equal length.
• If you specify the vector y and any value is a row vector, all outputs are row vectors.
Output Arguments
h — Test rejection decisions
logical scalar | logical vector
Test rejection decisions, returned as a logical scalar or vector with length equal to the number of tests. kpsstest returns h when you supply the input y.
• Values of 1 indicate rejection of the trend-stationary null hypothesis in favor of the unit root alternative.
• Values of 0 indicate failure to reject the trend-stationary null hypothesis.
pValue — Test statistic p-values
numeric scalar | numeric vector
Test statistic p-values, returned as a numeric scalar or vector with length equal to the number of tests. kpsstest returns pValue when you supply the input y.
The p-values are right-tail probabilities.
When test statistics are outside tabulated critical values, kpsstest returns maximum (0.10) or minimum (0.01) p-values.
stat — Test statistics
numeric scalar | numeric vector
Test statistics, returned as a numeric scalar or vector with length equal to the number of tests. kpsstest returns stat when you supply the input y.
kpsstest computes test statistics by using an ordinary least squares (OLS) regression (for more details, see KPSS Test on page 12-1544).
• If you set Trend=false, kpsstest regresses y on an intercept.
• Otherwise, kpsstest regresses y on an intercept and trend term.
cValue — Critical values
numeric scalar | numeric vector
Critical values, returned as a numeric scalar or vector with length equal to the number of tests. kpsstest returns cValue when you supply the input y.
Critical values are for right-tail probabilities.
StatTbl — Test summary
table
Test summary, returned as a table with variables for the outputs h, pValue, stat, and cValue, and with a row for each test. kpsstest returns StatTbl when you supply the input Tbl.
StatTbl contains variables for the test settings specified by Lags, Alpha, and Trend.
reg — Regression statistics
structure array
Regression statistics for OLS estimation of the coefficients in the model, returned as a structure array with the number of records equal to the number of tests.
Each element of reg has the fields in this table. You can access a field using dot notation, for example, reg(1).coeff contains the coefficient estimates of the first test.
    Field      Description
    _______    ___________________________________________________
    num        Length of input series with NaNs removed
    size       Effective sample size T, adjusted for lags
    names      Regression coefficient names
    coeff      Estimated coefficient values
    se         Estimated coefficient standard errors
    Cov        Estimated coefficient covariance matrix
    tStats     t statistics of coefficients and p-values
    FStat      F statistic and p-value
    yMu        Mean of the lag-adjusted input series
    ySigma     Standard deviation of the lag-adjusted input series
    yHat       Fitted values of the lag-adjusted input series
    res        Regression residuals
    autoCov    Estimated residual autocovariances
    NWEst      Newey-West coefficient standard error estimates
    DWStat     Durbin-Watson statistic
    SSR        Regression sum of squares
    SSE        Error sum of squares
    SST        Total sum of squares
    MSE        Mean square error
    RMSE       Standard error of the regression
    RSq        R² statistic
    aRSq       Adjusted R² statistic
    LL         Loglikelihood of data under Gaussian innovations
    AIC        Akaike information criterion
    BIC        Bayesian (Schwarz) information criterion
    HQC        Hannan-Quinn information criterion
More About
Kwiatkowski, Phillips, Schmidt, and Shin (KPSS) Test
The KPSS test assesses the null hypothesis that a univariate time series is trend stationary against the alternative that it is a nonstationary unit root process.
The test uses the structural model
$$y_t = c_t + \delta t + u_{1t}$$
$$c_t = c_{t-1} + u_{2t},$$
where
• $\delta$ is the trend coefficient (see the Trend argument).
• $u_{1t}$ is a stationary process.
• $u_{2t}$ is an independent and identically distributed process with mean 0 and variance $\sigma^2$.
The null hypothesis is that $\sigma^2 = 0$, which implies that the random walk term ($c_t$) is constant and acts as the model intercept. The alternative hypothesis is that $\sigma^2 > 0$, which introduces the unit root in the random walk.
An OLS regression of $y_t$ onto $X_t$ yields the residual series $\{e_t\}$, where $X_t$ has one of the following forms:
• $X_t = 1$ for all t when Trend is false.
• $X_t = [1 \;\; t]$ when Trend is true.
The test statistic is
$$\frac{\sum_{t=1}^{T} S_t^2}{s^2 T^2},$$
where
• $T$ is the effective sample size.
• $s^2$ is the Newey-West estimate of the long-run variance.
• $S_t = e_1 + e_2 + \dots + e_t$ is the partial sum of the residuals through time t.
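The statistic can be traced step by step in plain MATLAB. The following is a minimal sketch, not the kpsstest implementation: it assumes a simulated trend-stationary series, a model with a trend term, a hypothetical choice of 4 lags, and Bartlett weights 1 − j/(numLags + 1) for the Newey-West estimate of the long-run variance. All variable names are illustrative.

```matlab
rng(0);                                  % for reproducibility
T = 100;
t = (1:T)';
y = 2 + 0.05*t + randn(T,1);             % hypothetical trend-stationary data

% OLS regression of y on an intercept and a time trend
X = [ones(T,1) t];
b = X\y;
e = y - X*b;                             % residuals e_t

% Partial sums S_t = e_1 + ... + e_t
S = cumsum(e);

% Newey-West long-run variance estimate with Bartlett weights (assumed form)
numLags = 4;                             % hypothetical lag choice
s2 = (e'*e)/T;                           % lag 0 autocovariance
for j = 1:numLags
    gammaj = (e(j+1:end)'*e(1:end-j))/T; % lag j autocovariance
    s2 = s2 + 2*(1 - j/(numLags + 1))*gammaj;
end

% Test statistic: sum of squared partial sums over s^2*T^2
stat = sum(S.^2)/(s2*T^2);
disp(stat)
```

Because the regression includes an intercept, the residuals sum to zero, so the final partial sum S(end) is zero up to rounding error. Large values of stat are evidence against the trend-stationary null hypothesis.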
Tips
• To draw valid inferences from a KPSS test, you must determine a suitable value for the Lags argument. The following methods can determine a suitable number of lags:
  • Begin with a small number of lags, and then evaluate the sensitivity of the results by adding more lags.
  • Kwiatkowski et al. [2] suggest that a number of lags on the order of √T, where T is the effective sample size, is often satisfactory under both the null and the alternative.
  For consistency of the Newey-West estimator, the number of lags must approach infinity as the sample size increases.
• With a specific testing strategy in mind, determine the value of the Trend argument by the growth characteristics of the input time series.
  • If the input series grows, include a trend term by setting Trend to true (default). This setting provides a reasonable comparison of a trend stationary null and a unit root process with drift.
  • If a series does not exhibit long-term growth characteristics, exclude a trend term by setting Trend to false.
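The first tip, checking how sensitive the results are to the number of lags, can be sketched as follows. This is an illustrative example only: it assumes Econometrics Toolbox is installed, and it uses a simulated random walk as hypothetical data in place of a real series.

```matlab
rng(1);                                  % for reproducibility
T = 120;
y = cumsum(randn(T,1));                  % hypothetical data: simulated random walk

% Candidate lags from 0 up to roughly sqrt(T)
lags = (0:round(sqrt(T)))';

% One test per element of lags (default Trend and Alpha settings)
[h,pValue,stat] = kpsstest(y,Lags=lags);

% Collect the results to see how the decision changes with the lag
results = table(lags(:),h(:),pValue(:),stat(:), ...
    VariableNames=["Lags" "h" "pValue" "stat"]);
disp(results)
```

If the rejection decision h flips as the number of lags grows, treat the conclusion with caution and examine the lags near √T more closely.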
Algorithms
• Test statistics follow nonstandard distributions under the null, even asymptotically. Kwiatkowski et al. [2] use Monte Carlo simulations, for models with and without a trend, to tabulate asymptotic critical values for a standard set of significance levels between 0.01 and 0.1. kpsstest interpolates critical values and p-values from these tables.
Version History
Introduced in R2009b
References
[1] Hamilton, James D. Time Series Analysis. Princeton, NJ: Princeton University Press, 1994.
[2] Kwiatkowski, D., P. C. B. Phillips, P. Schmidt, and Y. Shin. “Testing the Null Hypothesis of Stationarity against the Alternative of a Unit Root.” Journal of Econometrics. Vol. 54, 1992, pp. 159–178.
[3] Newey, W. K., and K. D. West. “A Simple, Positive Semidefinite, Heteroskedasticity and Autocorrelation Consistent Covariance Matrix.” Econometrica. Vol. 55, 1987, pp. 703–708.
See Also
pptest | adftest | vratiotest | lmctest
Topics
“Unit Root Tests” on page 3-41
“Unit Root Nonstationarity” on page 3-33
lagmatrix
Create lagged time series data
Syntax
YLag = lagmatrix(Y,lags)
[YLag,TLag] = lagmatrix(Y,lags)
LagTbl = lagmatrix(Tbl,lags)
[ ___ ] = lagmatrix( ___ ,Name=Value)
Description
YLag = lagmatrix(Y,lags) shifts the input regular series Y in time by the lags (positive) or leads (negative) in lags, and returns the matrix of shifted series YLag.
[YLag,TLag] = lagmatrix(Y,lags) also returns a vector TLag representing the common time base for the shifted series relative to the original time base of 1, 2, 3, …, numObs.
LagTbl = lagmatrix(Tbl,lags) shifts all variables in the input table or timetable Tbl, which represent regular time series, and returns the table or timetable of shifted series LagTbl. To select different variables in Tbl to shift, use the DataVariables name-value argument.
[ ___ ] = lagmatrix( ___ ,Name=Value) specifies options using one or more name-value arguments in addition to any of the input argument combinations in previous syntaxes. lagmatrix returns the output argument combination for the corresponding input arguments. For example, lagmatrix(Tbl,1,Y0=zeros(1,5),DataVariables=1:5) lags, by one period, the first five variables in the input table Tbl and sets the presample of each series to 0.
Examples
Shift Matrix of Data
Create a bivariate time series matrix Y with five observations of each series.
    Y = [1 -1; 2 -2; 3 -3; 4 -4; 5 -5]
Y = 5×2
     1    -1
     2    -2
     3    -3
     4    -4
     5    -5
Create a shifted matrix, which is composed of the original Y and its first two lags.
    lags = [0 1 2];
    XLag = lagmatrix(Y,lags)
XLag = 5×6
     1    -1   NaN   NaN   NaN   NaN
     2    -2     1    -1   NaN   NaN
     3    -3     2    -2     1    -1
     4    -4     3    -3     2    -2
     5    -5     4    -4     3    -3
XLag is a 5-by-6 matrix:
• The first two columns contain the original data (lag 0).
• Columns 3 and 4 contain the data lagged by one unit.
• Columns 5 and 6 contain the data lagged by two units.
By default, lagmatrix returns only values corresponding to the time base of the original data, and the function fills unknown presample values using NaNs.
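To make the column layout concrete, the same shifted matrix can be assembled by hand with base MATLAB indexing. This is a minimal sketch for nonnegative lags on the original time base only; it is not how lagmatrix is implemented, and the variable names are illustrative.

```matlab
Y = [1 -1; 2 -2; 3 -3; 4 -4; 5 -5];
lags = [0 1 2];                          % nonnegative lags only in this sketch
[numObs,numVars] = size(Y);

% Start from all NaN, then copy each shifted block into place.
XLagManual = NaN(numObs,numVars*numel(lags));
for k = 1:numel(lags)
    cols = (k-1)*numVars + (1:numVars);  % columns that hold lags(k)
    XLagManual(lags(k)+1:end,cols) = Y(1:end-lags(k),:);
end
disp(XLagManual)
```

Each lag occupies one block of numVars columns, and a lag of j pushes the data down by j rows, which leaves j rows of NaN at the top of that block.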
Return Time Base of Shifted Series
Create a bivariate time series matrix Y with five observations of each series.
    Y = [1 -1; 2 -2; 3 -3; 4 -4; 5 -5]
Y = 5×2
     1    -1
     2    -2
     3    -3
     4    -4
     5    -5
Create a shifted matrix, which is composed of the original Y and its first two lags. Return the time base of the shifted series.
    lags = [0 1 2];
    [XLag,TLag] = lagmatrix(Y,lags);
By default, lagmatrix returns the time base of the input data.
Shift Table Variables
Shift multiple time series, which are variables in tables, using the default options of lagmatrix.
Load data of yearly Canadian inflation and interest rates Data_Canada.mat, which contains five series in the table DataTable.
    load Data_Canada
Create a timetable from the table of data.
    dates = datetime(dates,12,31);
    TT = table2timetable(DataTable,RowTimes=dates);
    TT.Observations = [];
    tail(TT)
       Time         INF_C      INF_G     INT_S     INT_M     INT_L
    ___________    _______    _______    ______    ______    ______
    31-Dec-1987     4.2723      4.608    8.1692    9.4158    9.9267
    31-Dec-1988     3.9439     4.5256    9.4158    9.7717    10.227
    31-Dec-1989     4.8743     4.7258    12.016    10.203    9.9217
    31-Dec-1990     4.6547     3.1015    12.805    11.193    10.812
    31-Dec-1991     5.4633     2.8614    8.8301    9.1625    9.8067
    31-Dec-1992     1.4946     1.2281    6.5088    7.4317    8.7717
    31-Dec-1993     1.8246     1.0473    4.9268    6.4583    7.8767
    31-Dec-1994    0.18511    0.60929    5.4168    7.7867      8.58
Create a timetable containing all series lagged by one year, the series themselves, and the series led by a year.
    lags = [1 0 -1];
    LagTT = lagmatrix(TT,lags);
    head(LagTT)
       Time        Lag1INF_C    Lag1INF_G    Lag1INT_S    Lag1INT_M    Lag1INT_L    Lag0INF_C
    ___________    _________    _________    _________    _________    _________    _________
    31-Dec-1954         NaN          NaN          NaN          NaN          NaN       0.6606
    31-Dec-1955      0.6606       1.4468       1.4658       2.6683        3.255     0.077402
    31-Dec-1956    0.077402      0.76162       1.5533       2.7908       3.1892       1.4218
    31-Dec-1957      1.4218       3.0433       2.9025       3.7575       3.6058       3.1546
    31-Dec-1958      3.1546       2.3148       3.7775        4.565        4.125       2.4828
    31-Dec-1959      2.4828       1.3636       2.2925       3.4692        4.115        1.183
    31-Dec-1960       1.183       2.0722        4.805       4.9383       5.0492       1.2396
    31-Dec-1961      1.2396       1.2139       3.3242       4.5192       5.1892       1.0156
LagTT is a timetable containing the shifted series. lagmatrix prepends Lagj or Leadj to the name of each variable of the input timetable, depending on whether the series is a lag or lead, with j indicating the number of shifting units.
By default, lagmatrix shifts all variables in the input table. You can choose a subset of variables to shift by using the DataVariables name-value argument. For example, shift only the inflation rate series.
    LagTTINF = lagmatrix(TT,lags,DataVariables=["INF_C" "INF_G"]);
    head(LagTTINF)
       Time        Lag1INF_C    Lag1INF_G    Lag0INF_C    Lag0INF_G    Lead1INF_C    Lead1INF_G
    ___________    _________    _________    _________    _________    __________    __________
    31-Dec-1954         NaN          NaN       0.6606       1.4468      0.077402       0.76162
    31-Dec-1955      0.6606       1.4468     0.077402      0.76162        1.4218        3.0433
    31-Dec-1956    0.077402      0.76162       1.4218       3.0433        3.1546        2.3148
    31-Dec-1957      1.4218       3.0433       3.1546       2.3148        2.4828        1.3636
    31-Dec-1958      3.1546       2.3148       2.4828       1.3636         1.183        2.0722
    31-Dec-1959      2.4828       1.3636        1.183       2.0722        1.2396        1.2139
    31-Dec-1960       1.183       2.0722       1.2396       1.2139        1.0156       0.46074
    31-Dec-1961      1.2396       1.2139       1.0156      0.46074        1.1088        1.3737
Specify Presample and Postsample Data
Create a vector of univariate time series data.
    y = [0.1 0.4 -0.2 0.1 0.2]';
Create vectors representing presample and postsample data.
    y0 = [0.50; 0.75]*y(1)
y0 = 2×1
    0.0500
    0.0750
    yF = [0.75; 0.50]*y(end)
yF = 2×1
    0.1500
    0.1000
Shift the series by two units in both directions. Specify the presample and postsample data, and return a matrix containing the shifted series on the time base of the input data (the default).
    lags = [2 0 -2];
    [YLag,TLag] = lagmatrix(y,lags,Y0=y0,YF=yF)
YLag = 5×3
    0.0500    0.1000   -0.2000
    0.0750    0.4000    0.1000
    0.1000   -0.2000    0.2000
    0.4000    0.1000    0.1500
   -0.2000    0.2000    0.1000
TLag = 5×1
     1
     2
     3
     4
     5
Because the presample and postsample have enough observations to cover the time base of the input data, the shifted series YLag is completely specified (it does not contain NaN entries).
Shift the series in the same way, but return a matrix containing shifted series for the entire time base by specifying "full" for the Shape name-value argument.
    [YLagFull,TLagFull] = lagmatrix(y,lags,Y0=y0,YF=yF,Shape="full")
YLagFull = 9×3
       NaN    0.0500    0.1000
       NaN    0.0750    0.4000
    0.0500    0.1000   -0.2000
    0.0750    0.4000    0.1000
    0.1000   -0.2000    0.2000
    0.4000    0.1000    0.1500
   -0.2000    0.2000    0.1000
    0.1000    0.1500       NaN
    0.2000    0.1000       NaN
TLagFull = 9×1
    -1
     0
     1
     2
     3
     4
     5
     6
     7
Because the presample and postsample do not contain enough observations to cover the full time base, which includes presample through postsample times, lagmatrix fills unknown sample units using NaN values.
Input Arguments
Y — Time series data
numeric matrix
Time series data, specified as a numObs-by-numVars numeric matrix. Each column of Y corresponds to a variable, and each row corresponds to an observation.
Data Types: double
lags — Data shifts
integer | integer-valued vector
Data shifts, specified as an integer or integer-valued vector of length numShifts.
• Lags are positive integers, which shift the input series forward over the time base.
• Leads are negative integers, which shift the input series backward over the time base.
lagmatrix applies each specified shift in lags, in order, to each input series. Shifts of regular time series have units of one time step.
Data Types: double
Tbl — Time series data
table | timetable
Time series data, specified as a table or timetable with numObs rows. Each row of Tbl is an observation. If Tbl is a timetable, it must represent a sample with a regular datetime time step (see isregular).
Specify numVars variables to shift by using the DataVariables argument. The selected variables must be numeric.
Name-Value Arguments
Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter. Before R2021a, use commas to separate each name and value, and enclose Name in quotes.
Example: lagmatrix(Tbl,1,Y0=zeros(1,5),DataVariables=1:5) lags, by one period, the first 5 variables in the input table Tbl and sets the presample of each series to 0.
Y0 — Presample data
NaN (default) | numeric matrix | table | timetable
Presample data to backward fill lagged series, specified as a matrix with numVars columns, or a table or timetable. For a table or timetable, the DataVariables name-value argument selects the variables in Y0 to shift.
Y0 must have the same data type as the input data. Timetables must have regular sample times preceding times in Tbl.
lagmatrix fills required presample values from the end of Y0.
Example: Y0=zeros(size(Y,2),2)
YF — Postsample data
NaN (default) | numeric matrix | table | timetable
Postsample data to frontward fill led series, specified as a matrix with numVars columns, or a table or timetable. For a table or timetable, the DataVariables name-value argument selects the variables in YF to shift. The default for postsample data is NaN.
YF must have the same data type as the input data. Timetables must have regular sample times following times in Tbl.
lagmatrix fills required postsample values from the beginning of YF.
Example: YF=ones(size(Y,2),3)
DataVariables — Variables in Tbl, Y0, and YF
all variables (default) | string vector | cell vector of character vectors | vector of integers | logical vector
Variables in Tbl, Y0, and YF, from which lagmatrix creates shifted time series data, specified as a string vector or cell vector of character vectors containing variable names in Tbl.Properties.VariableNames, or an integer or logical vector representing the indices of names. The selected variables must be numeric.
Example: DataVariables=["GDP" "CPI"]
Example: DataVariables=[true true false false] or DataVariables=[1 2] selects the first and second table variables.
Data Types: double | logical | char | cell | string
Shape — Part of shifted series to appear in outputs
"same" (default) | "full" | "valid" | character vector
Part of the shifted series to appear in the outputs, specified as a value in this table.
    Value      Description
    _______    ______________________________________________________________________
    "full"     Outputs contain all values in the input time series data and all specified presample Y0 or postsample YF values on an expanded time base.
    "same"     Outputs contain only values on the original time base.
    "valid"    Outputs contain values for times at which all series have specified (non-NaN) values.
To illustrate the shape of the output shifted time series for each value of Shape, suppose the input time series data is a 2-D series with numObs = T observations $[y_{1,t} \;\; y_{2,t}]$, and lags is [1 0 -1]. The output shifted series is one of the three T-by-6 matrix arrays in this figure.
Example: Shape="full"
Data Types: char | string
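As an illustration of the "valid" setting, the following sketch shifts the small bivariate matrix from the earlier examples and keeps only the rows on which every shifted series is specified. It assumes Econometrics Toolbox is available; the output variable names are illustrative.

```matlab
Y = [1 -1; 2 -2; 3 -3; 4 -4; 5 -5];
lags = [0 1 2];

% Keep only the times at which all shifted series have non-NaN values.
[YLagValid,TLagValid] = lagmatrix(Y,lags,Shape="valid");

disp(YLagValid)
disp(TLagValid')                         % times retained from the original base
```

Because no presample data is supplied, the rows lost to the two-period lag are dropped rather than filled with NaN values, which is convenient when the result feeds directly into a regression.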
Output Arguments
YLag — Shifted time series variables
numeric matrix
Shifted time series variables in Y, returned as a numeric matrix. lagmatrix returns YLag when you supply the input Y.
Columns are, in order, all series in Y shifted by lags(1), all series in Y shifted by lags(2), …, all series in Y shifted by lags(end). Rows depend on the value of the Shape name-value argument.
For example, suppose Y is the 2-D time series of numObs = T observations $[y_{1,t} \;\; y_{2,t}]$, lags is [1 0 -1], and Shape is "full". YLag is the (T + 2)-by-6 matrix
$$\begin{bmatrix}
\text{NaN} & \text{NaN} & \text{NaN} & \text{NaN} & y_{1,1} & y_{2,1} \\
\text{NaN} & \text{NaN} & y_{1,1} & y_{2,1} & y_{1,2} & y_{2,2} \\
y_{1,1} & y_{2,1} & y_{1,2} & y_{2,2} & y_{1,3} & y_{2,3} \\
\vdots & \vdots & \vdots & \vdots & \vdots & \vdots \\
y_{1,T-2} & y_{2,T-2} & y_{1,T-1} & y_{2,T-1} & y_{1,T} & y_{2,T} \\
y_{1,T-1} & y_{2,T-1} & y_{1,T} & y_{2,T} & \text{NaN} & \text{NaN} \\
y_{1,T} & y_{2,T} & \text{NaN} & \text{NaN} & \text{NaN} & \text{NaN}
\end{bmatrix}.$$
TLag — Common time base for the shifted series
numeric matrix
Common time base for the shifted series relative to the original time base of 1, 2, 3, …, numObs, returned as a vector of length equal to the number of observations in YLag. lagmatrix returns TLag when you supply the input Y.
Series with lags (lags > 0) have higher indices; series with leads (lags < 0) have lower indices. For example, the value of TLag for the example in the YLag output description is the column vector with entries 0:(T+1).
LagTbl — Shifted time series variables and common time base
table | timetable
Shifted time series variables and common time base, returned as a table or timetable, the same data type as Tbl. lagmatrix returns LagTbl when you supply the input Tbl.
LagTbl contains the outputs YLag and TLag. The following conditions apply:
• Each lagged variable of LagTbl has a label Lagjvarname, where varname is the corresponding variable name in DataVariables and j is lag j in lags.
• Each lead variable has a label Leadjvarname, where j is lead j in lags.
• If LagTbl is a table, the variable labeled TLag contains TLag.
• If LagTbl is a timetable, the Time variable contains TLag.
Version History
Introduced before R2006a
See Also
Functions
filter
Objects
LinearModel
Topics
“Check Model Assumptions for Chow Test” on page 3-104
“Time Series Regression IX: Lag Order Selection” on page 5-261
LagOp
Create lag operator polynomial
Description
Create a p-degree, m-dimensional lag operator polynomial $A(L) = A_0 + A_1 L^1 + A_2 L^2 + \dots + A_p L^p$ by specifying the coefficient matrices $A_0, \dots, A_p$ and, optionally, the corresponding lags. L is the lag (or backshift) operator such that $L^j y_t = y_{t-j}$.
LagOp object functions on page 12-1558 enable you to work with specified polynomials. For example, you can filter time series data through a polynomial, determine whether one is stable, or combine multiple polynomials by performing polynomial algebra including addition, subtraction, multiplication, and division.
To fit a dynamic model containing lag operator polynomials to data, create the appropriate model object, and then fit it to the data. For univariate models, see arima and estimate; for multivariate models, see varm and estimate. For further analysis, you can create a LagOp object from the resulting estimated coefficients.
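As a brief illustration of the stability check and polynomial algebra described above, the following sketch multiplies two univariate polynomials and cross-checks the product against base MATLAB polynomial multiplication (conv). It assumes Econometrics Toolbox is available; the polynomial B and all variable names are hypothetical choices for the example.

```matlab
A = LagOp([1 -0.6 0.08]);                % A(L) = 1 - 0.6L + 0.08L^2
B = LagOp([1 0.5]);                      % B(L) = 1 + 0.5L (hypothetical)

stableA = isStable(A);                   % stability check for A(L)
C = A*B;                                 % polynomial multiplication (mtimes)

% Cross-check the product coefficients with base MATLAB.
cLagOp = cell2mat(toCellArray(C));       % coefficients of lags 0 through C.Degree
cConv = conv([1 -0.6 0.08],[1 0.5]);     % 1 - 0.1L - 0.22L^2 + 0.04L^3
disp(max(abs(cLagOp - cConv)))
```

The zeros of A(L) are 2.5 and 5, both outside the unit circle, so the stability check is expected to succeed for this polynomial.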
Creation
Syntax
A = LagOp(coefficients)
A = LagOp(coefficients,Name,Value)
Description
A = LagOp(coefficients) creates a lag operator polynomial A with coefficients coefficients, and sets the Coefficients property.
A = LagOp(coefficients,Name,Value) specifies additional options using one or more name-value pair arguments. For example, LagOp(coefficients,'Lags',[0 4 8],'Tolerance',1e-10) associates the coefficients to lags 0, 4, and 8, and sets the lag inclusion tolerance to 1e-10.
Input Arguments
coefficients — Lag operator polynomial coefficients
numeric vector | square numeric matrix | cell vector of square numeric matrices
Lag operator polynomial coefficients, specified as a numeric vector, m-by-m square numeric matrix, or cell vector of m-by-m square numeric matrices.
    Value                                                         Polynomial Returned by LagOp
    __________________________________________________________    ____________________________________________________________
    Numeric vector of length p + 1                                A p-degree, 1-D lag operator polynomial, where coefficients(j) is the coefficient of lag j – 1
    m-by-m square numeric matrix                                  A 0-degree, m-D lag operator polynomial, where coefficients is A0, the coefficient of lag 0
    Length p + 1 cell vector of m-by-m square numeric matrices    A p-degree, m-D lag operator polynomial, where coefficients{j} is the coefficient matrix of lag j – 1
Example: LagOp(1:3) creates the polynomial $A(L) = 1 + 2L^1 + 3L^2$.
Name-Value Pair Arguments
Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter. Before R2021a, use commas to separate each name and value, and enclose Name in quotes.
Example: 'Lags',[4 8],'Tolerance',1e-10 associates the coefficients to lags 4 and 8, and sets the coefficient magnitude tolerance to 1e-10.
Lags — Lags associated with polynomial coefficients
vector of unique, nonnegative integers
Lags associated with the polynomial coefficients, specified as the comma-separated pair consisting of 'Lags' and a vector of unique, nonnegative integers. Lags must have numcoeff elements, where numcoeff is one of the following:
• If coefficients is a numeric or cell vector, numcoeff = numel(coefficients). The coefficient of lag Lags(j) is coefficients(j).
• If coefficients is a matrix, numcoeff = 1.
Example: 'Lags',[0 4 8 12]
Tolerance — Lag inclusion tolerance
1e–12 (default) | nonnegative numeric scalar
Lag inclusion tolerance, specified as the comma-separated pair consisting of 'Tolerance' and a nonnegative numeric scalar. Let c be the element of coefficient j with the largest magnitude. If c ≤ Tolerance, LagOp removes coefficient j from the polynomial. Consequently, MATLAB performs the following actions:
• Remove the corresponding lag from the vector in the Lags property.
• Replace the corresponding coefficient of the array in the Coefficients property with zeros(m).
• If a removed lag is the final lag in the polynomial, MATLAB reduces the degree of the polynomial to the degree of the next largest lag present in the polynomial. For example, if MATLAB removes lags 3 and 4 from a degree 4 polynomial because their coefficient magnitudes are below Tolerance, the resulting polynomial has a degree of 2.
Example: 'Tolerance',1e-10
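The effect of Tolerance on the polynomial degree can be sketched with a coefficient that sits between the default tolerance and a larger one. This is an illustrative example that assumes Econometrics Toolbox is available; the coefficient values are hypothetical.

```matlab
coeffs = [1 0.25 1e-11];                 % hypothetical coefficients of lags 0, 1, 2

% Default Tolerance (1e-12): the lag 2 coefficient exceeds it and is kept.
A1 = LagOp(coeffs);

% Larger Tolerance: the lag 2 coefficient does not exceed 1e-10, so LagOp drops it.
A2 = LagOp(coeffs,'Tolerance',1e-10);

disp([A1.Degree A2.Degree])
```

Under the removal rule described above, A1 should keep degree 2, while A2 should drop to degree 1 because its final lag is removed.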
Properties
Coefficients — Lag operator polynomial coefficients
lag-indexed cell array of numeric scalars | lag-indexed cell array of numeric matrices
Lag operator polynomial coefficients, specified as a lag-indexed cell array of numeric scalars or m-by-m matrices. Coefficients{j} is the coefficient of lag j, where the integer j ≥ 0. For example, A.Coefficients{0} stores the lag 0 coefficient of A.
The Lags property stores the indices of nonzero coefficients of Coefficients.
Degree — Polynomial degree p
nonnegative numeric scalar
This property is read-only.
Polynomial degree p, specified as a nonnegative numeric scalar. Degree = max(Lags), the largest lag associated with a nonzero coefficient.
Data Types: double
Dimension — Polynomial dimension m
positive numeric scalar
This property is read-only.
Polynomial dimension m, specified as a positive numeric scalar. You can apply the polynomial A only to an m-D time series variable.
Data Types: double
Lags — Polynomial lags associated with nonzero coefficients
lag-indexed vector of nonnegative integers
This property is read-only.
Polynomial lags associated with nonzero coefficients, specified as a lag-indexed vector of nonnegative integers.
To work with the value of Lags, convert it to a standard MATLAB vector by entering the following code.
    lags = A.Lags;
Data Types: double
Object Functions
    filter         Apply lag operator polynomial to filter time series
    isEqLagOp      Determine if two LagOp objects are same mathematical polynomial
    isNonZero      Find lags associated with nonzero coefficients of LagOp objects
    isStable       Determine stability of lag operator polynomial
    minus          Lag operator polynomial subtraction
    mldivide       Lag operator polynomial left division
    mrdivide       Lag operator polynomial right division
    mtimes         Lag operator polynomial multiplication
    plus           Lag operator polynomial addition
    reflect        Reflect lag operator polynomial coefficients around lag zero
    toCellArray    Convert lag operator polynomial object to cell array
Examples
Create and Modify Univariate Lag Operator Polynomial
Create a LagOp object that represents the lag operator polynomial $A(L) = 1 - 0.6L + 0.08L^2$.
A = LagOp([1 -0.6 0.08])
A =
    1-D Lag Operator Polynomial:
    -----------------------------
        Coefficients: [1 -0.6 0.08]
                Lags: [0 1 2]
              Degree: 2
           Dimension: 1
Display the coefficient of lag 0.
a0 = A.Coefficients{0}
a0 = 1
Assign a nonzero coefficient to the third lag:
A.Coefficients{3} = 0.5
A =
    1-D Lag Operator Polynomial:
    -----------------------------
        Coefficients: [1 -0.6 0.08 0.5]
                Lags: [0 1 2 3]
              Degree: 3
           Dimension: 1
The polynomial degree increases to 3.
Specify Polynomial Lags
Create a LagOp object that represents the lag operator polynomial $A(L) = 1 + 0.25L^4 + 0.1L^8 + 0.05L^{12}$.
nonzeroCoeffs = [1 0.25 0.1 0.05];
lags = [0 4 8 12];
A = LagOp(nonzeroCoeffs,'Lags',lags)
A =
    1-D Lag Operator Polynomial:
    -----------------------------
        Coefficients: [1 0.25 0.1 0.05]
                Lags: [0 4 8 12]
              Degree: 12
           Dimension: 1
Extract the coefficients from the lag operator polynomial, and display all coefficients from lags 0 through 12.
allCoeffs = toCellArray(A);          % Extract coefficients
allCoeffs = cell2mat(allCoeffs');
allLags = 0:A.Degree;                % Prepare lags for display
table(allCoeffs,'RowNames',"Lag " + string(allLags))
ans=13×1 table
              allCoeffs
              _________
    Lag 0         1
    Lag 1         0
    Lag 2         0
    Lag 3         0
    Lag 4      0.25
    Lag 5         0
    Lag 6         0
    Lag 7         0
    Lag 8       0.1
    Lag 9         0
    Lag 10        0
    Lag 11        0
    Lag 12     0.05
Create Multivariate Lag Operator Polynomial
Create a LagOp object that represents the lag operator polynomial
$$A(L) = \begin{bmatrix} 0.5 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & -0.5 \end{bmatrix} + \begin{bmatrix} 1 & 0.25 & 0.1 \\ -0.5 & 1 & -0.5 \\ 0.15 & -0.2 & 1 \end{bmatrix} L^4.$$
Phi0 = [0.5  0   0;...
        0    1   0;...
        0    0  -0.5];
Phi4 = [1     0.25   0.1;...
        -0.5  1     -0.5;...
        0.15  -0.2   1];
Phi = {Phi0 Phi4};
lags = [0 4];
A = LagOp(Phi,'Lags',lags)
A =
    3-D Lag Operator Polynomial:
    -----------------------------
        Coefficients: [Lag-Indexed Cell Array with 2 Non-Zero Coefficients]
                Lags: [0 4]
              Degree: 4
           Dimension: 3
Work with Lag Operator Polynomials
Create Multivariate Lag Operator Polynomial
Create the 2-D, degree 3 lag operator polynomial
$$A(L) = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} + \begin{bmatrix} 0.5 & 0.25 \\ -0.1 & 0.4 \end{bmatrix} L + \begin{bmatrix} 0.05 & 0.025 \\ -0.01 & 0.04 \end{bmatrix} L^3.$$
m = 2;
A0 = eye(m);
A1 = [0.5 0.25; -0.1 0.4];
A3 = 0.1*A1;
Coeffs = {A0 A1 A3};
lags = [0 1 3];
A = LagOp(Coeffs,'Lags',lags)
A =
    2-D Lag Operator Polynomial:
    -----------------------------
        Coefficients: [Lag-Indexed Cell Array with 3 Non-Zero Coefficients]
                Lags: [0 1 3]
              Degree: 3
           Dimension: 2
Determine Polynomial Stability
A lag operator polynomial is stable if the magnitudes of all eigenvalues of its characteristic polynomial are less than 1. Determine whether the polynomial is stable, and return the eigenvalues of its characteristic polynomial.
[tf,evals] = isStable(A)
tf = logical
   1
evals = 6×1 complex
  -0.5820 + 0.1330i
  -0.5820 - 0.1330i
   0.0821 + 0.2824i
   0.0821 - 0.2824i
   0.0499 + 0.2655i
   0.0499 - 0.2655i
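The stability check can be reproduced without the toolbox from a companion matrix. The following is a minimal sketch in base MATLAB, assuming that the relevant eigenvalues are those of the companion form of the difference equation implied by A(L). The internal algorithm of isStable is not described here, so treat the construction as an illustration.

```matlab
% Coefficients of A(L) = A0 + A1*L + A3*L^3 from the example (A2 = 0).
m  = 2;
A0 = eye(m);
A1 = [0.5 0.25; -0.1 0.4];
A2 = zeros(m);
A3 = 0.1*A1;

% Assumption: stability is judged from the companion matrix of
% y(t) = -(A0\A1)*y(t-1) - (A0\A2)*y(t-2) - (A0\A3)*y(t-3).
C = [-(A0\A1) -(A0\A2) -(A0\A3);
      eye(2*m)  zeros(2*m,m)];

evals = eig(C);
tf = all(abs(evals) < 1);   % true when every eigenvalue lies inside the unit circle
```

The six eigenvalues come in complex-conjugate pairs because C is real, which matches the pattern of the isStable output.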
Invert Polynomial
Compute the inverse of $A(L)$ by right dividing the 2-by-2 identity matrix by $A(L)$.
Ainv = mrdivide(eye(A.Dimension),A)
Ainv =
    2-D Lag Operator Polynomial:
    -----------------------------
        Coefficients: [Lag-Indexed Cell Array with 10 Non-Zero Coefficients]
                Lags: [0 1 2 3 4 5 6 7 8 9]
              Degree: 9
           Dimension: 2
Ainv is a LagOp object representing the inverse of $A(L)$, a degree 9 lag operator polynomial. In theory, the inverse of a lag operator polynomial is of infinite degree, but the mrdivide coefficient-magnitude tolerances truncate the polynomial.
Multiply $A(L)$ and its inverse.
checkinv = Ainv*A
checkinv =
    2-D Lag Operator Polynomial:
    -----------------------------
        Coefficients: [Lag-Indexed Cell Array with 4 Non-Zero Coefficients]
                Lags: [0 10 11 12]
              Degree: 12
           Dimension: 2
Because the inverse computation returns the truncated theoretical inverse, the product checkinv contains lags representing the remainder. You can decrease the mrdivide coefficient-magnitude tolerances to obtain an inverse polynomial that is more precise.
Filter Time Series
Produce the 2-D time series $y_t = A(L)e_t = A_0e_t + A_1e_{t-1} + A_2e_{t-2} + A_3e_{t-3}$ by filtering the 2-D series of 100 random standard Gaussian deviates $e_t$ through the polynomial.
T = 100;
e = randn(T,m);
y = filter(A,e);
plot((A.Degree + 1):T,y)
title('Filtered Series')
y is a 97-by-2 matrix representing $y_t$. y has p fewer observations than e because filter requires the first p observations of e to initialize the dynamic series when producing y(1:4,:).
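To see why p observations are lost, the filtering equation can be applied by hand. This is a minimal sketch in base MATLAB. The input series is synthetic, and the loop is an illustration of the equation rather than the implementation of filter.

```matlab
rng(1);                       % for reproducibility of the synthetic input
m = 2;  T = 100;
A0 = eye(m);
A1 = [0.5 0.25; -0.1 0.4];
A3 = 0.1*A1;                  % A2 = 0, so lag 2 drops out of the sum
p  = 3;                       % polynomial degree

e = randn(T,m);               % rows are observations, columns are variables
y = zeros(T - p,m);
for t = (p + 1):T
    % y_t = A0*e_t + A1*e_{t-1} + A3*e_{t-3}; row t of e holds e_t
    yt = A0*e(t,:)' + A1*e(t-1,:)' + A3*e(t-3,:)';
    y(t - p,:) = yt';
end
sz = size(y);                 % T - p rows, one per time t = p+1,...,T
```

The first computable value is at t = p + 1 because the sum needs e back to lag p, so the output has T - p rows.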
Version History Introduced in R2010a
See Also arima | varm Topics “Specify Lag Operator Polynomials” on page 2-9 “Stochastic Process Characteristics” on page 1-18 “AR Model Specifications” on page 7-19 “MA Model Specifications” on page 7-28 “Vector Autoregression (VAR) Model Creation” on page 9-20
lassoblm Bayesian linear regression model with lasso regularization
Description
The Bayesian linear regression model on page 12-1578 object lassoblm specifies the joint prior distribution of the regression coefficients and the disturbance variance (β, σ²) for implementing Bayesian lasso regression [1]. For j = 1,…,NumPredictors, the conditional prior distribution of βj|σ² is the Laplace (double exponential) distribution with a mean of 0 and scale σ²/λ, where λ is the lasso regularization, or shrinkage, parameter. The prior distribution of σ² is inverse gamma with shape A and scale B.
The data likelihood is
$$\prod_{t=1}^{T} \phi(y_t; x_t\beta, \sigma^2),$$
where $\phi(y_t; x_t\beta, \sigma^2)$ is the Gaussian probability density evaluated at $y_t$ with mean $x_t\beta$ and variance $\sigma^2$. The resulting posterior distribution is not analytically tractable. For details on the posterior distribution, see “Analytically Tractable Posteriors” on page 6-5.
In general, when you create a Bayesian linear regression model object, it specifies the joint prior distribution and characteristics of the linear regression model only. That is, the model object is a template intended for further use. Specifically, to incorporate data into the model for posterior distribution analysis and feature selection, pass the model object and data to the appropriate object function on page 12-1567.
Creation
Syntax
PriorMdl = lassoblm(NumPredictors)
PriorMdl = lassoblm(NumPredictors,Name,Value)
Description
PriorMdl = lassoblm(NumPredictors) creates a Bayesian linear regression model on page 12-1868 object (PriorMdl) composed of NumPredictors predictors and an intercept, and sets the NumPredictors property. The joint prior distribution of (β, σ²) is appropriate for implementing Bayesian lasso regression [1]. PriorMdl is a template that defines the prior distributions and specifies the values of the lasso regularization parameter λ and the dimensionality of β.
PriorMdl = lassoblm(NumPredictors,Name,Value) sets properties on page 12-1565 (except NumPredictors) using name-value pair arguments. Enclose each property name in quotes. For example, lassoblm(3,'Lambda',0.5) specifies a shrinkage of 0.5 for the three coefficients (not the intercept).
Properties You can set writable property values when you create the model object by using name-value pair argument syntax, or after you create the model object by using dot notation. For example, to set the shrinkage for all coefficients, except the intercept, to 0.5, enter PriorMdl.Lambda = 0.5;
NumPredictors — Number of predictor variables
nonnegative integer
Number of predictor variables in the Bayesian multiple linear regression model, specified as a nonnegative integer. NumPredictors must be the same as the number of columns in your predictor data, which you specify during model estimation or simulation. When specifying NumPredictors, exclude any intercept term for the value. After creating a model, if you change the value of NumPredictors using dot notation, then these parameters revert to the default values:
• Variable names (VarNames)
• Shrinkage parameter (Lambda)
Data Types: double
Intercept — Flag for including regression model intercept
true (default) | false
Flag for including a regression model intercept, specified as a value in this table.
Value    Description
false    Exclude an intercept from the regression model. Therefore, β is a p-dimensional vector, where p is the value of NumPredictors.
true     Include an intercept in the regression model. Therefore, β is a (p + 1)-dimensional vector. This specification causes a T-by-1 vector of ones to be prepended to the predictor data during estimation and simulation.
If you include a column of ones in the predictor data for an intercept term, then set Intercept to false.
Example: 'Intercept',false
Data Types: logical
VarNames — Predictor variable names
string vector | cell vector of character vectors
Predictor variable names for displays, specified as a string vector or cell vector of character vectors. VarNames must contain NumPredictors elements. VarNames(j) is the name of the variable in column j of the predictor data set, which you specify during estimation, simulation, or forecasting.
The default is {'Beta(1)','Beta(2)',...,'Beta(p)'}, where p is the value of NumPredictors.
Example: 'VarNames',["UnemploymentRate"; "CPI"]
Data Types: string | cell | char
Lambda — Lasso regularization parameter
1 (default) | positive numeric scalar | positive numeric vector
Lasso regularization parameter for all regression coefficients, specified as a positive numeric scalar or (Intercept + NumPredictors)-by-1 positive numeric vector. Larger values of Lambda cause corresponding coefficients to shrink closer to zero. Suppose X is a T-by-NumPredictors matrix of predictor data, which you specify during estimation, simulation, or forecasting.
• If Lambda is a vector and Intercept is true, Lambda(1) is the shrinkage for the intercept, Lambda(2) is the shrinkage for the coefficient of the first predictor X(:,1), Lambda(3) is the shrinkage for the coefficient of the second predictor X(:,2),…, and Lambda(NumPredictors + 1) is the shrinkage for the coefficient of the last predictor X(:,NumPredictors).
• If Lambda is a vector and Intercept is false, Lambda(1) is the shrinkage for the coefficient of the first predictor X(:,1),…, and Lambda(NumPredictors) is the shrinkage for the coefficient of the last predictor X(:,NumPredictors).
• If you supply the scalar s for Lambda, then all coefficients of the predictors in X have a shrinkage of s.
  • If Intercept is true, the intercept has a shrinkage of 0.01, and lassoblm stores [0.01; s*ones(NumPredictors,1)] in Lambda.
  • Otherwise, lassoblm stores s*ones(NumPredictors,1) in Lambda.
Example: 'Lambda',6
Data Types: double
A — Shape hyperparameter of inverse gamma prior on σ²
3 (default) | numeric scalar
Shape hyperparameter of the inverse gamma prior on σ², specified as a numeric scalar. A must be at least –(Intercept + NumPredictors)/2. With B held fixed, the inverse gamma distribution becomes taller and more concentrated as A increases.
This characteristic weighs the prior model of σ² more heavily than the likelihood during posterior estimation. For the functional form of the inverse gamma distribution, see “Analytically Tractable Posteriors” on page 6-5.
Example: 'A',0.1
Data Types: double
B — Scale hyperparameter of inverse gamma prior on σ²
1 (default) | positive scalar | Inf
Scale parameter of inverse gamma prior on σ², specified as a positive scalar or Inf. With A held fixed, the inverse gamma distribution becomes taller and more concentrated as B increases. This characteristic weighs the prior model of σ² more heavily than the likelihood during posterior estimation.
Example: 'B',5
Data Types: double
Object Functions
estimate     Perform predictor variable selection for Bayesian linear regression models
simulate     Simulate regression coefficients and disturbance variance of Bayesian linear regression model
forecast     Forecast responses of Bayesian linear regression model
plot         Visualize prior and posterior densities of Bayesian linear regression model parameters
summarize    Distribution summary statistics of Bayesian linear regression model for predictor variable selection
Examples
Create Prior Model for Bayesian Lasso Regression
Consider the multiple linear regression model that predicts the US real gross national product (GNPR) using a linear combination of industrial production index (IPI), total employment (E), and real wages (WR).
$$\text{GNPR}_t = \beta_0 + \beta_1\text{IPI}_t + \beta_2\text{E}_t + \beta_3\text{WR}_t + \varepsilon_t.$$
For all t, $\varepsilon_t$ is a series of independent Gaussian disturbances with a mean of 0 and variance σ². Assume these prior distributions:
• For j = 0,...,3, βj|σ² has a Laplace distribution with a mean of 0 and a scale of σ²/λ, where λ is the shrinkage parameter. The coefficients are conditionally independent.
• σ² ∼ IG(A, B). A and B are the shape and scale, respectively, of an inverse gamma distribution.
Create a prior model for Bayesian linear regression. Specify the number of predictors p.
p = 3;
Mdl = lassoblm(p);
Mdl is a lassoblm Bayesian linear regression model object representing the prior distribution of the regression coefficients and disturbance variance. At the command window, lassoblm displays a summary of the prior distributions.
Alternatively, you can create a prior model for Bayesian lasso regression by passing the number of predictors to bayeslm and setting the ModelType name-value pair argument to 'lasso'.
MdlBayesLM = bayeslm(p,'ModelType','lasso')
MdlBayesLM =
  lassoblm with properties:

    NumPredictors: 3
        Intercept: 1
         VarNames: {4x1 cell}
           Lambda: [4x1 double]
                A: 3
                B: 1

           |  Mean     Std           CI95          Positive   Distribution
---------------------------------------------------------------------------
 Intercept |  0       100     [-200.000, 200.000]   0.500     Scale mixture
 Beta(1)   |  0        1      [-2.000, 2.000]       0.500     Scale mixture
 Beta(2)   |  0        1      [-2.000, 2.000]       0.500     Scale mixture
 Beta(3)   |  0        1      [-2.000, 2.000]       0.500     Scale mixture
 Sigma2    | 0.5000  0.5000   [ 0.138, 1.616]       1.000     IG(3.00, 1)
Mdl and MdlBayesLM are equivalent model objects. You can set writable property values of created models using dot notation. Set the regression coefficient names to the corresponding variable names.
Mdl.VarNames = ["IPI" "E" "WR"]
Mdl =
  lassoblm with properties:

    NumPredictors: 3
        Intercept: 1
         VarNames: {4x1 cell}
           Lambda: [4x1 double]
                A: 3
                B: 1

           |  Mean     Std           CI95          Positive   Distribution
---------------------------------------------------------------------------
 Intercept |  0       100     [-200.000, 200.000]   0.500     Scale mixture
 IPI       |  0        1      [-2.000, 2.000]       0.500     Scale mixture
 E         |  0        1      [-2.000, 2.000]       0.500     Scale mixture
 WR        |  0        1      [-2.000, 2.000]       0.500     Scale mixture
 Sigma2    | 0.5000  0.5000   [ 0.138, 1.616]       1.000     IG(3.00, 1)
MATLAB® associates the variable names to the regression coefficients in displays.
Perform Variable Selection Using Default Lasso Shrinkage
Consider the linear regression model in “Create Prior Model for Bayesian Lasso Regression” on page 12-1567. Create a prior model for performing Bayesian lasso regression. Specify the number of predictors p and the names of the regression coefficients.
p = 3;
PriorMdl = bayeslm(p,'ModelType','lasso','VarNames',["IPI" "E" "WR"]);
shrinkage = PriorMdl.Lambda
shrinkage = 4×1
    0.0100
    1.0000
    1.0000
    1.0000
PriorMdl stores the shrinkage values for all predictors in its Lambda property. shrinkage(1) is the shrinkage for the intercept, and the elements of shrinkage(2:end) correspond to the coefficients of the predictors in PriorMdl.VarNames. The default shrinkage for the intercept is 0.01, and the default is 1 for all other coefficients.
Load the Nelson-Plosser data set. Create variables for the response and predictor series. Because lasso is sensitive to variable scales, standardize all variables.
load Data_NelsonPlosser
X = DataTable{:,PriorMdl.VarNames(2:end)};
y = DataTable{:,'GNPR'};
X = (X - mean(X,'omitnan'))./std(X,'omitnan');
y = (y - mean(y,'omitnan'))/std(y,'omitnan');
Although this example standardizes variables, you can specify different shrinkage values for each coefficient by setting the Lambda property of PriorMdl to a numeric vector of shrinkage values.
Implement Bayesian lasso regression by estimating the marginal posterior distributions of β and σ². Because Bayesian lasso regression uses Markov chain Monte Carlo (MCMC) for estimation, set a random number seed to reproduce the results.
rng(1);
PosteriorMdl = estimate(PriorMdl,X,y);
Method: lasso MCMC sampling with 10000 draws
Number of observations: 62
Number of predictors:   4

           |   Mean     Std         CI95        Positive  Distribution
-----------------------------------------------------------------------
 Intercept | -0.4490  0.0527  [-0.548, -0.344]    0.000     Empirical
 IPI       |  0.6679  0.1063  [ 0.456,  0.878]    1.000     Empirical
 E         |  0.1114  0.1223  [-0.110,  0.365]    0.827     Empirical
 WR        |  0.2215  0.1367  [-0.024,  0.494]    0.956     Empirical
 Sigma2    |  0.0343  0.0062  [ 0.024,  0.048]    1.000     Empirical
PosteriorMdl is an empiricalblm model object that stores draws from the posterior distributions of β and σ² given the data. estimate displays a summary of the marginal posterior distributions at the command line. Rows of the summary correspond to regression coefficients and the disturbance variance, and columns correspond to characteristics of the posterior distribution. The characteristics include:
• CI95, which contains the 95% Bayesian equitailed credible intervals for the parameters. For example, the posterior probability that the regression coefficient of E (standardized) is in [-0.110, 0.365] is 0.95.
• Positive, which contains the posterior probability that the parameter is greater than 0. For example, the probability that the intercept is greater than 0 is 0.
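The CI95 and Positive characteristics can be approximated directly from a vector of posterior draws. The following is a minimal sketch in base MATLAB. The draws are synthetic stand-ins, and the order-statistic rule for the interval is an assumption, so the numbers need not match what estimate reports.

```matlab
rng(1);
% Hypothetical stand-in for one row of posterior draws, for example
% PosteriorMdl.BetaDraws(j,:). The values are synthetic, not model output.
draws = 0.1 + 0.12*randn(1,10000);

sorted = sort(draws);
n = numel(sorted);
% Equitailed 95% interval: cut 2.5% of the draws from each tail.
ci95 = [sorted(ceil(0.025*n)) sorted(ceil(0.975*n))];
% The share of draws above zero estimates the probability of a positive value.
positive = mean(draws > 0);
```

To apply the same calculation to a fitted model, replace draws with a row of the BetaDraws property or with Sigma2Draws.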
By default, estimate draws and discards a burn-in sample of size 5000. However, a good practice is to inspect a trace plot of the draws for adequate mixing and lack of transience. Plot a trace plot of the draws for each parameter. You can access the draws that compose the distribution (the properties BetaDraws and Sigma2Draws) using dot notation.
figure;
for j = 1:(p + 1)
    subplot(2,2,j);
    plot(PosteriorMdl.BetaDraws(j,:));
    title(sprintf('%s',PosteriorMdl.VarNames{j}));
end
figure;
plot(PosteriorMdl.Sigma2Draws);
title('Sigma2');
The trace plots indicate that the draws seem to mix well. The plot shows no detectable transience or serial correlation, and the draws do not jump between states. Plot the posterior distributions of the coefficients and disturbance variance. figure; plot(PosteriorMdl)
E and WR might not be important predictors because 0 is within the region of high density in their posterior distributions.
Attribute Different Shrinkage Values to Coefficients Consider the linear regression model in “Create Prior Model for Bayesian Lasso Regression” on page 12-1567 and its implementation in “Perform Variable Selection Using Default Lasso Shrinkage” on page 12-1568. When you implement lasso regression, a common practice is to standardize variables. However, if you want to preserve the interpretation of the coefficients, but the variables have different scales, then you can perform differential shrinkage by specifying a different shrinkage for each coefficient. Create a prior model for performing Bayesian lasso regression. Specify the number of predictors p and the names of the regression coefficients. p = 3; PriorMdl = bayeslm(p,'ModelType','lasso','VarNames',["IPI" "E" "WR"]);
Load the Nelson-Plosser data set. Create variables for the response and predictor series. Determine whether the variables have exponential trends by plotting each in separate figures.
load Data_NelsonPlosser
X = DataTable{:,PriorMdl.VarNames(2:end)};
y = DataTable{:,'GNPR'};
figure;
plot(dates,y)
title('GNPR')
for j = 1:3
    figure;
    plot(dates,X(:,j));
    title(PriorMdl.VarNames(j + 1));
end
The variables GNPR, IPI, and WR appear to have an exponential trend. Remove the exponential trend from the variables GNPR, IPI, and WR. y = log(y); X(:,[1 3]) = log(X(:,[1 3]));
All predictor variables have different scales (for more details, enter Description at the command line). Display the mean of each predictor. Remove observations containing leading missing values from all predictors.
predmeans = mean(X,'omitnan')
predmeans = 1×3
10^4 ×
    0.0002    4.7700    0.0004
The values of the second predictor are much greater than those of the other two predictors and the response. Therefore, the regression coefficient of the second predictor can appear close to zero. Using dot notation, attribute a very low shrinkage to the intercept, a shrinkage of 0.1 to the first and third predictors, and a shrinkage of 1e4 to the second predictor.
PriorMdl.Lambda = [1e-5 0.1 1e4 0.1];
Implement Bayesian lasso regression by estimating the marginal posterior distributions of β and σ². Because Bayesian lasso regression uses MCMC for estimation, set a random number seed to reproduce the results.
rng(1);
PosteriorMdl = estimate(PriorMdl,X,y);
Method: lasso MCMC sampling with 10000 draws
Number of observations: 62
Number of predictors:   4

           |   Mean     Std         CI95        Positive  Distribution
----------------------------------------------------------------------
 Intercept |  2.0281  0.6839  [ 0.679,  3.323]    0.999     Empirical
 IPI       |  0.3534  0.2497  [-0.139,  0.839]    0.923     Empirical
 E         |  0.0000  0.0000  [-0.000,  0.000]    0.762     Empirical
 WR        |  0.5250  0.3482  [-0.126,  1.209]    0.937     Empirical
 Sigma2    |  0.0315  0.0055  [ 0.023,  0.044]    1.000     Empirical
Forecast Responses Using Posterior Predictive Distribution
Consider the linear regression model in “Create Prior Model for Bayesian Lasso Regression” on page 12-1567. Perform Bayesian lasso regression:
1. Create a Bayesian lasso prior model for the regression coefficients and disturbance variance. Use the default shrinkage.
2. Hold out the last 10 periods of data from estimation.
3. Estimate the marginal posterior distributions.
p = 3;
PriorMdl = bayeslm(p,'ModelType','lasso','VarNames',["IPI" "E" "WR"]);
load Data_NelsonPlosser
fhs = 10; % Forecast horizon size
X = DataTable{1:(end - fhs),PriorMdl.VarNames(2:end)};
y = DataTable{1:(end - fhs),'GNPR'};
XF = DataTable{(end - fhs + 1):end,PriorMdl.VarNames(2:end)}; % Future predictor data
yFT = DataTable{(end - fhs + 1):end,'GNPR'}; % True future responses
rng(1); % For reproducibility
PosteriorMdl = estimate(PriorMdl,X,y,'Display',false);
Forecast responses using the posterior predictive distribution and the future predictor data XF. Plot the true values of the response and the forecasted values.
yF = forecast(PosteriorMdl,XF);
figure;
plot(dates,DataTable.GNPR);
hold on
plot(dates((end - fhs + 1):end),yF)
h = gca;
hp = patch([dates(end - fhs + 1) dates(end) dates(end) dates(end - fhs + 1)],...
    h.YLim([1,1,2,2]),[0.8 0.8 0.8]);
uistack(hp,'bottom');
legend('Forecast Horizon','True GNPR','Forecasted GNPR','Location','NW')
title('Real Gross National Product: 1909 - 1970');
ylabel('rGNP');
xlabel('Year');
hold off
yF is a 10-by-1 vector of future values of real GNP corresponding to the future predictor data. Estimate the forecast root mean squared error (RMSE). frmse = sqrt(mean((yF - yFT).^2)) frmse = 25.4831
The forecast RMSE is a relative measure of forecast accuracy. Specifically, you estimate several models using different assumptions. The model with the lowest forecast RMSE is the best-performing model of the ones being compared. When you perform Bayesian lasso regression, a best practice is to search for appropriate shrinkage values. One way to do so is to estimate the forecast RMSE over a grid of shrinkage values, and choose the shrinkage that minimizes the forecast RMSE.
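The grid search described above can be sketched as follows. The sketch assumes Econometrics Toolbox and its Data_NelsonPlosser data set are available. The candidate shrinkage values and the variable name lambdaGrid are hypothetical, and the same shrinkage is applied to every predictor coefficient while the intercept keeps its default.

```matlab
load Data_NelsonPlosser
p = 3;  fhs = 10;                     % predictors and forecast horizon size
varNames = ["IPI" "E" "WR"];
X   = DataTable{1:(end - fhs),varNames};
y   = DataTable{1:(end - fhs),'GNPR'};
XF  = DataTable{(end - fhs + 1):end,varNames};
yFT = DataTable{(end - fhs + 1):end,'GNPR'};

lambdaGrid = [0.01 0.1 1 10 100];     % hypothetical candidate shrinkages
frmse = zeros(size(lambdaGrid));
for k = 1:numel(lambdaGrid)
    PriorMdl = bayeslm(p,'ModelType','lasso','VarNames',varNames);
    PriorMdl.Lambda = lambdaGrid(k);  % scalar: shrinkage for all predictor coefficients
    rng(1);                           % same seed for every fit
    PosteriorMdl = estimate(PriorMdl,X,y,'Display',false);
    yF = forecast(PosteriorMdl,XF);
    frmse(k) = sqrt(mean((yF - yFT).^2));
end
[~,best] = min(frmse);
bestLambda = lambdaGrid(best);        % shrinkage with the lowest forecast RMSE
```

A finer grid around bestLambda, or a separate grid per coefficient, is a natural refinement when the predictors are not standardized.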
More About
Bayesian Linear Regression Model
A Bayesian linear regression model treats the parameters β and σ² in the multiple linear regression (MLR) model $y_t = x_t\beta + \varepsilon_t$ as random variables. For times t = 1,...,T:
• $y_t$ is the observed response.
• $x_t$ is a 1-by-(p + 1) row vector of observed values of p predictors. To accommodate a model intercept, $x_{1t} = 1$ for all t.
• β is a (p + 1)-by-1 column vector of regression coefficients corresponding to the variables that compose the columns of $x_t$.
• $\varepsilon_t$ is the random disturbance with a mean of zero and $\mathrm{Cov}(\varepsilon) = \sigma^2 I_{T \times T}$, while ε is a T-by-1 vector containing all disturbances.
These assumptions imply that the data likelihood is
$$\ell(\beta, \sigma^2 \mid y, x) = \prod_{t=1}^{T} \phi(y_t; x_t\beta, \sigma^2).$$
$\phi(y_t; x_t\beta, \sigma^2)$ is the Gaussian probability density with mean $x_t\beta$ and variance $\sigma^2$ evaluated at $y_t$.
Before considering the data, you impose a joint prior distribution assumption on (β,σ²). In a Bayesian analysis, you update the distribution of the parameters by using information about the parameters obtained from the likelihood of the data. The result is the joint posterior distribution of (β,σ²) or the conditional posterior distributions of the parameters.
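The update described above is Bayes' rule applied to the likelihood and the prior. Writing $\pi(\beta, \sigma^2)$ for the joint prior density (a notation assumed here for illustration), the joint posterior is proportional to the product of the two:

```latex
\pi(\beta, \sigma^2 \mid y, x)
  \;\propto\; \ell(\beta, \sigma^2 \mid y, x)\,\pi(\beta, \sigma^2)
  \;=\; \left[\prod_{t=1}^{T} \phi(y_t; x_t\beta, \sigma^2)\right] \pi(\beta, \sigma^2).
```

The normalizing constant is the integral of the right side over (β, σ²). It has no closed form under the lasso prior, which is why estimation relies on MCMC sampling.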
Tips • Lambda is a tuning parameter. Therefore, perform Bayesian lasso regression using a grid of shrinkage values, and choose the model that best balances a fit criterion and model complexity. • For estimation, simulation, and forecasting, MATLAB does not standardize predictor data. If the variables in the predictor data have different scales, then specify a shrinkage parameter for each predictor by supplying a numeric vector for Lambda.
Alternative Functionality The bayeslm function can create any supported prior model object for Bayesian linear regression.
Version History Introduced in R2018b
References [1] Park, T., and G. Casella. "The Bayesian Lasso." Journal of the American Statistical Association. Vol. 103, No. 482, 2008, pp. 681–686.
See Also Objects empiricalblm | mixconjugateblm | mixsemiconjugateblm Topics “Bayesian Linear Regression” on page 6-2 “Implement Bayesian Linear Regression” on page 6-10
lazy Adjust Markov chain state inertia
Syntax lc = lazy(mc) lc = lazy(mc,w)
Description lc = lazy(mc) transforms the discrete-time Markov chain mc into the lazy chain on page 12-1588 lc with an adjusted state inertia. lc = lazy(mc,w) applies the inertial weights w for the transformation.
Examples
Create Lazy Markov Chain
Consider this three-state transition matrix.
$$P = \begin{bmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 1 & 0 & 0 \end{bmatrix}.$$
Create the irreducible and periodic Markov chain that is characterized by the transition matrix P.
P = [0 1 0; 0 0 1; 1 0 0];
mc = dtmc(P);
At time t = 1,..., T, mc is forced to move to another state deterministically. Determine the stationary distribution of the Markov chain and whether it is ergodic.
xFix = asymptotics(mc)
xFix = 1×3
    0.3333    0.3333    0.3333
isergodic(mc)
ans = logical
   0
mc is irreducible and not ergodic. As a result, mc has a stationary distribution, but it is not a limiting distribution for all initial distributions. Show why xFix is not a limiting distribution for all initial distributions.
x0 = [1 0 0];
x1 = x0*P
x1 = 1×3
     0     1     0
x2 = x1*P
x2 = 1×3
     0     0     1
x3 = x2*P
x3 = 1×3
     1     0     0
sum(x3 == x0) == mc.NumStates
ans = logical
   1
The initial distribution is reached again after several steps, which implies that the subsequent state distributions cycle through the same sets of distributions indefinitely. Therefore, mc does not have a limiting distribution. Create a lazy version of the Markov chain mc.
lc = lazy(mc)
lc =
  dtmc with properties:

             P: [3x3 double]
    StateNames: ["1"    "2"    "3"]
     NumStates: 3
lc.P
ans = 3×3
    0.5000    0.5000         0
         0    0.5000    0.5000
    0.5000         0    0.5000
lc is a dtmc object. At time t = 1,..., T, lc "flips a fair coin". It remains in its current state if the "coin shows heads" and transitions to another state if the "coin shows tails". Determine the stationary distribution of the lazy chain and whether it is ergodic.
lcxFix = asymptotics(lc)
lcxFix = 1×3
    0.3333    0.3333    0.3333
isergodic(lc)
ans = logical
   1
lc and mc have the same stationary distributions, but only lc is ergodic. Therefore, the limiting distribution of lc exists and is equal to its stationary distribution.
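The difference between the two chains also shows up when an initial distribution is propagated many steps forward. This is a minimal sketch in base MATLAB that uses the transition matrix of the example and the default weight of 0.5. The step count of 30 is an arbitrary choice.

```matlab
P  = [0 1 0; 0 0 1; 1 0 0];     % periodic chain from the example
PL = 0.5*P + 0.5*eye(3);        % default lazy transformation (w = 0.5)
x0 = [1 0 0];                   % start in state 1 with certainty

% The periodic chain returns to x0 every three steps and never settles.
xPeriodic = x0*P^30;
% The lazy chain forgets x0 and approaches the stationary distribution.
xLazy = x0*PL^30;
```

Because 30 is a multiple of the period, xPeriodic equals x0, while xLazy is close to the uniform stationary distribution.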
Supply Inertial Weights for Lazy Chain Transformation
Consider this theoretical, right-stochastic transition matrix of a stochastic process.
$$P = \begin{bmatrix}
0 & 0 & 1/2 & 1/4 & 1/4 & 0 & 0 \\
0 & 0 & 1/3 & 0 & 2/3 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 1/3 & 2/3 \\
0 & 0 & 0 & 0 & 0 & 1/2 & 1/2 \\
0 & 0 & 0 & 0 & 0 & 3/4 & 1/4 \\
1/2 & 1/2 & 0 & 0 & 0 & 0 & 0 \\
1/4 & 3/4 & 0 & 0 & 0 & 0 & 0
\end{bmatrix}.$$
Create the Markov chain that is characterized by the transition matrix P.
P = [ 0   0  1/2 1/4 1/4  0   0 ;
      0   0  1/3  0  2/3  0   0 ;
      0   0   0   0   0  1/3 2/3;
      0   0   0   0   0  1/2 1/2;
      0   0   0   0   0  3/4 1/4;
     1/2 1/2  0   0   0   0   0 ;
     1/4 3/4  0   0   0   0   0 ];
mc = dtmc(P);
Plot the eigenvalues of the transition matrix on the complex plane. figure; eigplot(mc); title('Original Markov Chain')
Three eigenvalues have modulus one, which indicates that the period of mc is three. Create lazy versions of the Markov chain mc using various inertial weights. Plot the eigenvalues of the lazy chains on separate complex planes.
w2 = 0.1; % More active Markov chain
w3 = 0.9; % Lazier Markov chain
w4 = [0.9 0.1 0.25 0.5 0.25 0.001 0.999]; % Laziness differs between states
lc1 = lazy(mc);
lc2 = lazy(mc,w2);
lc3 = lazy(mc,w3);
lc4 = lazy(mc,w4);
figure;
eigplot(lc1);
title('Default Laziness');
figure;
eigplot(lc2);
title('More Active Chain');
figure;
eigplot(lc3);
title('Lazier Chain');
figure;
eigplot(lc4);
title('Differing Laziness Levels');
All lazy chains have only one eigenvalue with modulus one. Therefore, they are aperiodic. The spectral gap (distance between inner and outer circle) determines the mixing time. Observe that all lazy chains take longer to mix than the original Markov chain. Chains with different inertial weights than the default take longer to mix than the default lazy chain.
Input Arguments
mc — Discrete-time Markov chain
dtmc object
Discrete-time Markov chain with NumStates states and transition matrix P, specified as a dtmc object. P must be fully specified (no NaN entries).
w — Inertial weights
0.5 (default) | numeric scalar | numeric vector
Inertial weights, specified as a numeric scalar or vector of length NumStates. Values must be between 0 and 1.
• If w is a scalar, lazy applies it to all states. That is, the transition matrix of the lazy chain (lc.P) is the result of the linear transformation
$$P_{\text{lazy}} = (1 - w)P + wI.$$
P is mc.P and I is the NumStates-by-NumStates identity matrix.
• If w is a vector, lazy applies the weights state by state (row by row).
Data Types: double
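The transformation can be written out in base MATLAB. The scalar case follows the formula above. The vector case is a sketch that assumes "row by row" means each row i is scaled by 1 - w(i) and w(i) is added on the diagonal. The weights in wv are hypothetical.

```matlab
P = [0 1 0; 0 0 1; 1 0 0];
n = size(P,1);

% Scalar weight: Plazy = (1 - w)*P + w*I
w = 0.5;
PlazyScalar = (1 - w)*P + w*eye(n);

% Vector weights (assumed form): row i is (1 - wv(i))*P(i,:) plus wv(i)
% added to the diagonal entry, so each state gets its own inertia.
wv = [0.9 0.1 0.5];
PlazyVector = (eye(n) - diag(wv))*P + diag(wv);

rowSums = sum(PlazyVector,2)';   % each row should still sum to 1
```

Both results remain right-stochastic because each row is a convex combination of a row of P and the matching row of the identity matrix.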
Output Arguments lc — Discrete-time Markov chain dtmc object Discrete-time Markov chain, returned as a dtmc object. lc is the lazy version of mc.
More About Lazy Chain A lazy version of a Markov chain has, for each state, a probability of staying in the same state equal to at least 0.5. In a directed graph of a Markov chain, the default lazy transformation ensures self-loops on all states, eliminating periodicity. If the Markov chain is irreducible, then its lazy version is ergodic. See graphplot.
Version History Introduced in R2017b
References [1] Gallager, R.G. Stochastic Processes: Theory for Applications. Cambridge, UK: Cambridge University Press, 2013.
See Also Objects dtmc Functions graphplot | asymptotics Topics “Markov Chain Modeling” on page 10-8 “Create and Modify Markov Chain Model Objects” on page 10-17 “Determine Asymptotic Behavior of Markov Chain” on page 10-39
lbqtest Ljung-Box Q-test for residual autocorrelation
Syntax h = lbqtest(res) [h,pValue,stat,cValue] = lbqtest(res) StatTbl = lbqtest(Tbl) [ ___ ] = lbqtest( ___ ,Name=Value)
Description
h = lbqtest(res) returns the rejection decision h from conducting a Ljung-Box Q-test on page 12-1600 for autocorrelation in the residual series res.
[h,pValue,stat,cValue] = lbqtest(res) also returns the p-value pValue, test statistic stat, and critical value cValue of the test.
StatTbl = lbqtest(Tbl) returns the table StatTbl containing variables for the test results, statistics, and settings from conducting a Ljung-Box Q-test for residual autocorrelation in the last variable of the input table or timetable Tbl. To select a different variable in Tbl to test, use the DataVariable name-value argument.
[ ___ ] = lbqtest( ___ ,Name=Value) specifies options using one or more name-value arguments in addition to any of the input argument combinations in previous syntaxes. lbqtest returns the output argument combination for the corresponding input arguments.
Some options control the number of tests to conduct. The following conditions apply when lbqtest conducts multiple tests:
• lbqtest treats each test as separate from all other tests.
• If you specify res, all outputs are vectors.
• If you specify Tbl, each row of StatTbl contains the results of the corresponding test.
For example, lbqtest(Tbl,DataVariable="ResidualGDP",Alpha=0.025,Lags=[1 4]) conducts two tests, at a level of significance of 0.025, for the presence of residual autocorrelation in the variable ResidualGDP of the table Tbl. The first test includes 1 lag in the test statistic, and the second test includes 4 lags.
Examples Conduct Ljung-Box Q-Test on Vector of Data Test a time series for residual autocorrelation using default options of lbqtest. Input the time series data as a numeric vector. Load the Deutschmark/British pound foreign-exchange rate data set. load Data_MarkPound
Data is a time series vector of daily Deutschmark/British pound bilateral spot exchange rates. Plot the series. plot(Data) title("\bf Deutschmark/British Pound Bilateral Spot Exchange Rate") ylabel("Spot Exchange Rate") xlabel("Business Days Since January 2, 1984")
The series appears nonstationary. To stabilize the series, convert the spot exchange rates to returns. returns = price2ret(Data); plot(returns) title("\bf Deutschmark/British Pound Bilateral Spot Exchange Rate") ylabel("Return") xlabel("Business Days Since January 3, 1984")
Compute the deviations of the return series from the mean. residuals = returns - mean(returns);
At the 0.05 level of significance, test the residual series for autocorrelation using the default options of the Ljung-Box Q-test. h = lbqtest(residuals) h = logical 0
The result h = 0 indicates that insufficient evidence exists to reject the null hypothesis of no residual autocorrelation through 20 lags.
Return Test p-Value and Decision Statistics Load the Deutschmark/British pound foreign-exchange rate data set. load Data_MarkPound
Preprocess the data by following this procedure:
1. Stabilize the series by computing daily returns.
2. Compute the deviations from the mean return.
returns = price2ret(Data); residuals = returns - mean(returns);
Test the residual series for a significant autocorrelation from 1 through 20 lags. Return the test decision, p-value, test statistic, and critical value. [h,pValue,stat,cValue] = lbqtest(residuals) h = logical 0 pValue = 0.1131 stat = 27.8445 cValue = 31.4104
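The four outputs are linked through the chi-square reference distribution of the test statistic. The following sketch is not part of the original example. It recomputes the decision and the p-value from the reported statistic, assuming the default DoF of 20 (equal to the default Lags) and using the base MATLAB function gammainc for the chi-square upper tail, which is a substitution made here so that the sketch does not depend on chi2cdf.

```matlab
% Values reported by lbqtest in the example above
stat   = 27.8445;   % test statistic
cValue = 31.4104;   % critical value at Alpha = 0.05
dof    = 20;        % assumed: default DoF equals the default Lags
alpha  = 0.05;      % default significance level

% Upper-tail chi-square probability P(X > stat) for X ~ chi-square(dof),
% written with the regularized upper incomplete gamma function
pValue = gammainc(stat/2,dof/2,'upper');

% The test rejects when the statistic exceeds the critical value,
% which is equivalent to pValue < alpha
h = stat > cValue;
```

The recomputed pValue should agree with the reported value of 0.1131 up to rounding of stat, and h is false because the statistic is below the critical value.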
Conduct Ljung-Box Q-Test on Table Variable Test a time series, which is one variable in a table, for residual autocorrelation using default options of lbqtest. Load the equity index data set Data_EquityIdx. Preprocess the daily NASDAQ closing prices by performing the following actions:
1. Convert the price series to a percentage return series by using price2ret.
2. Represent the series as residuals that fluctuate around a constant level by centering the returns series.
Store the residual series in the table with the rest of the data. Because the price-to-return conversion reduces the sample size from the head of the series, replace the missing residual with a NaN. load Data_EquityIdx ret = 100*price2ret(DataTable.NASDAQ); res = ret - mean(ret); DataTable.Residuals_NASDAQ = [NaN; res]; DataTable.Properties.VariableNames{end} ans = 'Residuals_NASDAQ'
The residual series is the last variable in the table. Conduct a Ljung-Box Q-test on the residual series at a 5% significance level by supplying the entire data set to lbqtest. StatTbl = lbqtest(DataTable)
StatTbl=1×7 table
                h        pValue       stat     cValue    Lags    Alpha    DoF
              _____    __________    ______    ______    ____    _____    ___
    Test 1    true     2.8182e-11    92.395    31.41      20     0.05     20
lbqtest returns test results and settings in the table StatTbl, where variables correspond to test results (h, pValue, stat, and cValue) and settings (Lags, Alpha, and DoF), and rows correspond to individual tests (in this case, lbqtest conducts one test). h = 1 and pValue = 2.82e-11 rejects the null hypothesis and suggests that the evidence for at least one significant autocorrelation in lags 1 through 20 in the NASDAQ returns residual series is strong. By default, lbqtest tests the last variable in the table. To select a variable from an input table to test, set the DataVariable option.
Test Time Series for Autocorrelation and ARCH Effects Load the Deutschmark/British pound foreign-exchange rate data set. load Data_MarkPound
Convert the prices to returns. returns = price2ret(Data);
Compute the deviations of the return series. res = returns - mean(returns);
Test the hypothesis that the residual series is not autocorrelated, using the default number of lags. h1 = lbqtest(res) h1 = logical 0
h1 = 0 indicates that there is insufficient evidence to reject the null hypothesis that the residuals of the returns are not autocorrelated. Test the hypothesis that there are significant ARCH effects, using the default number of lags [3]. h2 = lbqtest(res.^2) h2 = logical 1
h2 = 1 indicates that there are significant ARCH effects in the residuals of the returns. Test for residual heteroscedasticity using archtest. Specify an alternative ARCH(L) model, where L ≤ 20, for consistency with h2. h3 = archtest(res,Lags=20)
h3 = logical 1
h3 = 1 indicates that the null hypothesis of no residual heteroscedasticity should be rejected in favor of an ARCH(L) model, where L ≤ 20. This result is consistent with h2.
Test Several Autocorrelation Lags Conduct multiple Ljung-Box Q-tests for autocorrelation by specifying several lags for the test statistic. The data set is a time series of 57 consecutive days of overshorts from an underground gasoline tank in Colorado [2]. That is, the current overshort (yt) represents the accuracy in measuring the amount of fuel:
• In the tank at the end of day t
• In the tank at the end of day t − 1
• Delivered to the tank on day t
• Sold on day t
Load the data set. load Data_Overshort T = height(DataTable); figure plot(DataTable.OSHORT) title('Daily Gasoline Overshorts')
lbqtest is appropriate for a series with a constant mean. Because the series appears to fluctuate around a constant mean, you do not need to stabilize it. Compute deviations from the mean. DataTable.Residuals_OSHORT = DataTable.OSHORT - mean(DataTable.OSHORT);
Assess whether the residuals are autocorrelated. Include 5, 10, and 15 lags in the test statistic, and adjust the significance level of each test to 0.05/3. StatTbl = lbqtest(DataTable,DataVariable="Residuals_OSHORT", ... Lags=[5 10 15],Alpha=0.05/3)
StatTbl=3×7 table
                h        pValue       stat     cValue    Lags     Alpha      DoF
              _____    __________    ______    ______    ____    ________    ___
    Test 1    true      0.0016465     19.36    13.839      5     0.016667     5
    Test 2    true     0.00068328    30.599    21.707     10     0.016667    10
    Test 3    true       0.001281    36.964     28.88     15     0.016667    15
Rows of StatTbl contain results of separate tests conducted for each specified lag. Each test rejects the null hypothesis at the 0.0167 level of significance.
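The adjustment of the significance level to 0.05/3 divides an overall level of 0.05 evenly across the three tests (commonly called a Bonferroni-style adjustment; that term is an addition here, not from the example). A minimal sketch of the comparison that each row of StatTbl encodes, using the p-values reported above and illustrative variable names:

```matlab
% p-values reported in StatTbl for Lags = 5, 10, and 15
pValue = [0.0016465; 0.00068328; 0.001281];

numTests     = numel(pValue);          % three tests
alphaOverall = 0.05;                   % overall level before adjustment
alphaPerTest = alphaOverall/numTests;  % 0.016667, the value in the Alpha column

% Each test is judged separately against the adjusted level
h = pValue < alphaPerTest;
```

All three elements of h are true, which matches the h column of StatTbl.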
Assess Autocorrelation in Inferred Residuals Infer residuals from an estimated ARIMA model, and assess whether the residuals exhibit autocorrelation using lbqtest. Load the Australian Consumer Price Index (CPI) data set. The time series (cpi) is the log quarterly CPI from 1972 to 1991. Remove the trend in the series by taking the first difference. load Data_JAustralian cpi = DataTable.PAU; T = length(cpi); dCPI = diff(cpi); dt = datetime(dates,ConvertFrom="datenum"); figure plot(dt(2:T),dCPI) title("Differenced Australian CPI") xlabel("Year") ylabel("CPI Growth Rate") axis tight
The differenced series appears stationary. Fit an AR(1) model to the series, and then infer residuals from the estimated model. Mdl = arima(1,0,0); EstMdl = estimate(Mdl,dCPI);
    ARIMA(1,0,0) Model (Gaussian Distribution):

                  Value       StandardError    TStatistic      PValue
                _________    _____________    __________    __________
    Constant     0.015564      0.0028766        5.4106      6.2808e-08
    AR{1}         0.29646        0.11048        2.6834       0.0072876
    Variance    0.0001038     1.1932e-05        8.6994      3.3362e-18
res = infer(EstMdl,dCPI); stdRes = res/sqrt(EstMdl.Variance); % Standardized residuals
Assess whether the residuals are autocorrelated by conducting a Ljung-Box Q-test. The standardized residuals originate from the estimated model (EstMdl) containing parameters. When using such residuals, perform the following actions: • Adjust the degrees of freedom (DoF) of the test statistic distribution to account for the estimated parameters. • Set the number of lags to include in the test statistic. • When you count the estimated parameters, skip the constant and variance parameters. lags = 10; dof = lags - 1; % One autoregressive parameter [h,pValue] = lbqtest(stdRes,Lags=lags,DoF=dof) h = logical 1 pValue = 0.0119
pValue = 0.0119 suggests that the residuals have significant autocorrelation in at least one lag of lags 1 through 10, at the 5% level.
Input Arguments
res — Residual series
vector
Residual series, specified as a numeric vector. Each element of res corresponds to an observation. Typically, res contains the (standardized) residuals from a model fit to observed time series.
Data Types: double
Tbl — Time series data
table | timetable
Time series data, specified as a table or timetable. Each row of Tbl is an observation. Specify a single residual series (variable) to test by using the DataVariable argument. The selected variable must be numeric.
Note Specify missing observations using NaN. The lbqtest function treats missing values as missing completely at random on page 12-1600.
Name-Value Arguments
Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter. Before R2021a, use commas to separate each name and value, and enclose Name in quotes.
Example: lbqtest(Tbl,DataVariable="ResidualGDP",Alpha=0.025,Lags=[1 4]) conducts two tests, at a level of significance of 0.025, for the presence of residual autocorrelation in the variable ResidualGDP of the table Tbl. The first test includes 1 lag in the test statistic, and the second test includes 4 lags.
Lags — Number of lags
min([20,T-1]) (default) | positive integer | vector of positive integers
Number of lags L to include in the test statistic, specified as a positive integer that is less than T or a vector of such positive integers, where T is the effective sample size (the number of nonmissing values in the input series). lbqtest conducts a separate test for each element in Lags.
Example: Lags=[1 4] conducts two tests. The first test includes only the first lag in the test statistic, and the second test includes the first through fourth lags.
Data Types: double
Alpha — Significance level
0.05 (default) | numeric scalar | numeric vector
Significance level for the hypothesis test, specified as a numeric scalar in the interval (0,1) or a numeric vector of such values. lbqtest conducts a separate test for each value in Alpha.
Example: Alpha=[0.01 0.05] uses a level of significance of 0.01 for the first test, and then uses a level of significance of 0.05 for the second test.
Data Types: double
DoF — Degrees of freedom
Lags (default) | positive integer | vector of positive integers
Degrees of freedom for the asymptotic chi-square distribution of the test statistic under the null hypothesis, specified as a positive integer or vector of positive integers. lbqtest conducts a separate test for each value in DoF. If DoF is an integer, then it must be less than or equal to Lags. Otherwise, each element of DoF must be less than or equal to the corresponding element of Lags.
Example: DoF=15 specifies 15 degrees of freedom for the distribution of the test statistic.
Data Types: double
DataVariable — Variable in Tbl to test
last variable (default) | string scalar | character vector | integer | logical vector
Variable in Tbl to test, specified as a string scalar or character vector containing a variable name in Tbl.Properties.VariableNames, or an integer or logical vector representing the index of a name.
Example: DataVariable="ResidualGDP"
Example: DataVariable=[false true false false] or DataVariable=2 tests the second table variable.
Data Types: double | logical | char | string
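The defaults for Lags and DoF depend on the effective sample size T, which counts only nonmissing observations. The following sketch shows that arithmetic on a small hypothetical series; the data and variable names are illustrative, and the code only mirrors the documented defaults rather than calling lbqtest.

```matlab
% Hypothetical residual series with two missing observations
res = [0.3; NaN; -0.1; 0.5; NaN; -0.4; 0.2];

% Effective sample size: number of nonmissing values
T = sum(~isnan(res));

% Documented default for Lags, and the default DoF that follows from it
defaultLags = min([20,T-1]);
defaultDoF  = defaultLags;

% Any user-specified lag must be a positive integer less than T
validLags = 1:T-1;
```

Here T is 5, so the default test uses 4 lags and 4 degrees of freedom.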
Output Arguments h — Test rejection decisions logical scalar | logical vector Test rejection decisions, returned as a logical scalar or vector with length equal to the number of tests. lbqtest returns h when you supply the input res. • Values of 1 indicate rejection of the no residual autocorrelation null hypothesis in favor of the alternative. • Values of 0 indicate failure to reject the no residual autocorrelation null hypothesis. pValue — Test statistic p-values numeric scalar | numeric vector Test statistic p-values, returned as a numeric scalar or vector with length equal to the number of tests. lbqtest returns pValue when you supply the input res. stat — Test statistics numeric scalar | numeric vector Test statistics, returned as a numeric scalar or vector with length equal to the number of tests. lbqtest returns stat when you supply the input res. cValue — Test critical values numeric scalar | numeric vector Test critical values, determined by Alpha, returned as a numeric scalar or vector with length equal to the number of tests. lbqtest returns cValue when you supply the input res. StatTbl — Test summary table Test summary, returned as a table with variables for the outputs h, pValue, stat, and cValue, and with a row for each test. lbqtest returns StatTbl when you supply the input Tbl. StatTbl contains variables for the test settings specified by Lags, Alpha, and DoF.
More About Ljung-Box Q-Test The Ljung-Box Q-test is a "portmanteau" test that assesses the null hypothesis that a series of residuals exhibits no autocorrelation for a fixed number of lags L (see Lags), against the alternative that some autocorrelation coefficient ρ(k), k = 1, ..., L is nonzero. The test statistic is
$$Q = T(T+2)\sum_{k=1}^{L}\frac{\rho(k)^{2}}{T-k},$$
where T is the sample size, L is the number of autocorrelation lags, and ρ(k) is the sample autocorrelation at lag k. Under the null hypothesis, the asymptotic distribution of Q is chi-square with L degrees of freedom. Missing Completely at Random Observations of a random variable are missing completely at random if the tendency of an observation to be missing is independent of both the random variable and the tendency of all other observations to be missing.
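As a worked illustration of the statistic, the following sketch evaluates Q by hand using only base MATLAB. The alternating series is hypothetical and chosen so that the arithmetic is easy to follow; the sketch assumes no missing values and is not the implementation that lbqtest uses.

```matlab
% Hypothetical alternating series, chosen so the arithmetic is easy to check
res = repmat([1; -1],5,1);    % T = 10 observations
L   = 3;                      % number of lags in the statistic
T   = numel(res);

e  = res - mean(res);         % center the series
c0 = sum(e.^2)/T;             % lag-0 sample autocovariance

rho = zeros(L,1);
for k = 1:L
    % Lag-k sample autocorrelation
    rho(k) = sum(e(k+1:T).*e(1:T-k))/(T*c0);
end

% Ljung-Box statistic and its chi-square(L) upper-tail probability
Q      = T*(T+2)*sum(rho.^2./(T-(1:L)'));
pValue = gammainc(Q/2,L/2,'upper');
```

For this series the sample autocorrelations are −0.9, 0.8, and −0.7, so Q = 120(0.81/9 + 0.64/8 + 0.49/7) = 28.8.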
Tips If you obtain the input residual series by fitting a model to data, reduce the degrees of freedom DoF by the number of estimated coefficients, excluding constants. For example, if you obtain the input residuals by fitting an ARMA(p,q) model, set DoF=L−p−q, where L is the value of Lags.
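The following sketch applies this tip to an ARMA(2,1) fit. The simulated series, its coefficients, and the variable names are hypothetical and serve only to give the model something to fit; the point is the DoF arithmetic.

```matlab
rng(1)                                          % reproducible hypothetical data
y = filter([1 0.4],[1 -0.5 0.2],randn(300,1));  % simulated ARMA(2,1) series

p = 2;                                          % AR order
q = 1;                                          % MA order
EstMdl = estimate(arima(p,0,q),y,Display="off");

res    = infer(EstMdl,y);                       % inferred residuals
stdRes = res/sqrt(EstMdl.Variance);             % standardized residuals

L   = 10;                                       % lags in the test statistic
dof = L - p - q;                                % constant and variance are not counted
[h,pValue] = lbqtest(stdRes,Lags=L,DoF=dof);
```

With L = 10, p = 2, and q = 1, the test uses 7 degrees of freedom instead of the default of 10.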
Algorithms • The value of the Lags argument L affects the power of the test. • If L is too small, the test does not detect high-order autocorrelations. • If L is too large, the test loses power when a significant correlation at one lag is washed out by insignificant correlations at other lags. • Box, Jenkins, and Reinsel suggest the default setting Lags=min[20,T-1] [1]. • Tsay cites simulation evidence showing better test power performance when L is approximately log(T) [5]. • lbqtest does not directly test for serial dependencies other than autocorrelation. However, you can use it to identify conditional heteroscedasticity (ARCH effects) by testing squared residuals [4]. Engle's test assesses the significance of ARCH effects directly. For details, see archtest.
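Because lbqtest conducts one test per element of Lags, both lag choices mentioned above can be compared in a single call. This is a minimal sketch on hypothetical white-noise residuals; rounding log(T) to the nearest integer is an assumption made here to turn "approximately log(T)" into a usable lag count.

```matlab
rng(0)                            % reproducible hypothetical residuals
T   = 500;
res = randn(T,1);

lagsDefault = min([20,T-1]);      % default setting
lagsLogT    = round(log(T));      % assumed reading of "approximately log(T)"

lags = unique([lagsLogT lagsDefault]);   % one test per distinct lag count
[h,pValue] = lbqtest(res,Lags=lags);
```

For T = 500, the two settings are 6 and 20 lags, so h and pValue each contain two elements.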
Version History Introduced before R2006a
References
[1] Box, George E. P., Gwilym M. Jenkins, and Gregory C. Reinsel. Time Series Analysis: Forecasting and Control. 3rd ed. Englewood Cliffs, NJ: Prentice Hall, 1994.
[2] Brockwell, P. J. and R. A. Davis. Introduction to Time Series and Forecasting. 2nd ed. New York, NY: Springer, 2002.
[3] Gourieroux, C. ARCH Models and Financial Applications. New York: Springer-Verlag, 1997.
[4] McLeod, A. I. and W. K. Li. "Diagnostic Checking ARMA Time Series Models Using Squared-Residual Autocorrelations." Journal of Time Series Analysis. Vol. 4, 1983, pp. 269–273.
[5] Tsay, R. S. Analysis of Financial Time Series. 2nd ed. Hoboken, NJ: John Wiley & Sons, Inc., 2005.
See Also autocorr | archtest Topics “Detect Autocorrelation” on page 3-20 “Check Fit of Multiplicative ARIMA Model” on page 3-81 “Specify Conditional Mean and Variance Models” on page 7-81 “Time Series Regression VI: Residual Diagnostics” on page 5-220 “Ljung-Box Q-Test” on page 3-18
lmctest Leybourne-McCabe stationarity test
Syntax h = lmctest(y) [h,pValue,stat,cValue] = lmctest(y) StatTbl = lmctest(Tbl) [ ___ ] = lmctest( ___ ,Name=Value) [ ___ ,reg1,reg2] = lmctest( ___ )
Description h = lmctest(y) returns the rejection decision h from conducting the Leybourne-McCabe stationarity test on page 12-1611 for assessing whether the univariate time series y is stationary. [h,pValue,stat,cValue] = lmctest(y) also returns the p-value pValue, test statistic stat, and critical value cValue of the test. StatTbl = lmctest(Tbl) returns the table StatTbl containing variables for the test results, statistics, and settings from conducting the Leybourne-McCabe stationarity test on the last variable of the input table or timetable Tbl. To select a different variable in Tbl to test, use the DataVariable name-value argument. [ ___ ] = lmctest( ___ ,Name=Value) specifies options using one or more name-value arguments in addition to any of the input argument combinations in previous syntaxes. lmctest returns the output argument combination for the corresponding input arguments. Some options control the number of tests to conduct. The following conditions apply when lmctest conducts multiple tests: • lmctest treats each test as separate from all other tests. • If you specify y, all outputs are vectors. • If you specify Tbl, each row of StatTbl contains the results of the corresponding test. For example, lmctest(Tbl,DataVariable="GDP",Alpha=0.025,Lags=[0 1]) conducts two tests, at a level of significance of 0.025, on the variable GDP of the table Tbl. The first test includes 0 lagged terms in the structural model, and the second test includes 1 lagged term in the structural model. [ ___ ,reg1,reg2] = lmctest( ___ ) additionally returns structures of regression statistics, which are required to form the test statistic on page 12-1612. • reg1 – Maximum likelihood estimation of the reduced-form model • reg2 – Deterministic local level model of filtered response data, with Gaussian noise and an optional linear trend
Examples
Conduct Leybourne-McCabe Stationarity Test on Vector of Data Test a time series for stationarity using the default options of lmctest. Input the time series data as a numeric vector. Load Schwert's macroeconomic data set. Extract the monthly unemployment rate UN. load Data_SchwertMacro un = DataTableMth.UN;
Represent the series as a growth rate by applying the first difference. unr = diff(un);
The first difference operation causes unr to have one less observation than un. The timebase of unr starts at observation 2. Plot the unemployment growth rate. dts = datetime(datesMth,ConvertFrom="datenum"); plot(dts(2:end),unr) title("Unemployment Growth Rate")
Assess the null hypothesis of the Leybourne-McCabe stationarity test that the unemployment growth rate series is a trend-stationary AR(0) model. Use default options. h = lmctest(unr)
h = logical 0
h = 0 indicates that, at a 5% level of significance, the test fails to reject the null hypothesis that the unemployment growth rate series is a trend-stationary AR(0) model.
Return Test p-Value and Decision Statistics Load Schwert's macroeconomic data set Data_SchwertMacro.mat. Extract the monthly unemployment rate UN, and apply the first difference to the series. load Data_SchwertMacro unr = diff(DataTableMth.UN);
Assess the null hypothesis that the series is a trend-stationary AR(0) process. Return the test decision, p-value, test statistic, and critical value. [h,pValue,stats,cValue] = lmctest(unr) h = logical 0 pValue = 0.1000 stats = 0.0978 cValue = 0.1460
pValue = 0.1000 is the maximum tabulated value; its actual value can be larger than 0.1000.
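Because lmctest reports p-values only within the tabulated range (0.01 to 0.10, as described for the pValue output), a reported value at either bound is a limit rather than an exact probability. The following sketch flags such cases; the two p-values are copied from the examples on this page, and the variable names are illustrative.

```matlab
% p-values copied from the examples on this page
pValue = [0.1000; 0.020721];

pMin = 0.01;   % smallest tabulated p-value
pMax = 0.10;   % largest tabulated p-value

% true where the reported p-value sits at a tabulated bound
atBound = pValue <= pMin | pValue >= pMax;
```

The first element of atBound is true, so the first p-value should be read as "at least 0.10", while the second is an interior value.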
Conduct Leybourne-McCabe Stationarity Test on Table Variable Test a time series, which is one variable in a table, for stationarity using the default options. Load Schwert's macroeconomic data set. Convert the table of monthly series to a timetable. load Data_SchwertMacro dates = datetime(datesMth,ConvertFrom="datenum"); TT = table2timetable(DataTableMth,RowTimes=dates);
Apply the first difference to all monthly series. DTT = varfun(@diff,TT(:,2:end)); DTT.Properties.VariableNames{end} ans = 'diff_SIG'
Assess the null hypothesis of the Leybourne-McCabe stationarity test that the rate of the volatility of returns to Standard & Poor's composite index series is a trend-stationary AR(0) model. StatTbl = lmctest(DTT)
StatTbl=1×8 table
                h      pValue      stat       cValue    Lags    Alpha    Trend      Test
              _____    ______    _________    ______    ____    _____    _____    ________
    Test 1    false     0.1      0.0027953    0.146      0      0.05     true     {'var2'}
lmctest returns test results and settings in the table StatTbl, where variables correspond to test results (h, pValue, stat, and cValue) and settings (Lags, Alpha, Trend, and Test), and rows correspond to individual tests (in this case, lmctest conducts one test). By default, lmctest tests the last variable in the table. To select a variable from an input table to test, set the DataVariable option.
Assess Whether Series Is Trend Stationary and AR(p) Test the growth of the US unemployment rate using the data in [5]. Load Schwert's macroeconomic data set. Convert the table of monthly series to a timetable. Apply the first difference to all variables in the timetable. load Data_SchwertMacro dates = datetime(datesMth,ConvertFrom="datenum"); TT = table2timetable(DataTableMth,RowTimes=dates); TT.Dates = []; DTT = varfun(@diff,TT);
Define the time range of the sample considered in [4]. Extract a subtable with the corresponding dates. trLM = timerange("1948-01-01","1985-12-01","closed"); DTT4 = DTT(trLM,:);
Assess the null hypothesis that the unemployment rate growth is a trend-stationary, AR(1) process. Conduct the same test twice. For the first test, specify the test statistic that uses the estimated variance from OLS regression (Test = "var1"). For the second test, specify the test statistic that uses estimated variance from the maximum likelihood of the reduced-form regression model (Test = "var2"). StatTbl = lmctest(DTT4,DataVariable="diff_UN", ... Lags=1,Test=["var1" "var2"])
StatTbl=2×8 table
                h       pValue       stat      cValue    Lags    Alpha    Trend      Test
              _____    ________    ________    ______    ____    _____    _____    ________
    Test 1    false         0.1    0.099166    0.146      1      0.05     true     {'var1'}
    Test 2    true     0.020721     0.18741    0.146      1      0.05     true     {'var2'}
Row Test 1 of StatTbl contains the results of the first test. StatTbl.h(1) = 0 indicates that, at a 5% level of significance, there is not enough evidence to reject that the unemployment growth rate is a trend-stationary, AR(1) process. Row Test 2 of StatTbl contains the results of the second test. StatTbl.h(2) = 1 indicates that, at a 5% level of significance, there is enough evidence to reject that the unemployment growth rate is a trend-stationary, AR(1) process, which implies that the unemployment rate growth is nonstationary. Leybourne and McCabe report that the original LMC statistic fails to reject stationarity, while the modified LMC statistic does reject it [4].
Inspect Regression Statistics Load Schwert's macroeconomic data set. Convert the table of monthly series to a timetable. Apply the first difference to all variables in the timetable. load Data_SchwertMacro dates = datetime(datesMth,ConvertFrom="datenum"); TT = table2timetable(DataTableMth,RowTimes=dates); TT.Dates = []; DTT = varfun(@diff,TT);
Assess the null hypothesis that the unemployment rate growth is a trend-stationary, AR(1) process. Return the regression statistics from the maximum likelihood estimation of reduced-form model and from the OLS estimation of deterministic local level model of filtered response data. [StatTbl,reg1,reg2] = lmctest(DTT,DataVariable="diff_UN",Lags=1);
Display the coefficients of both regressions. rownames = ["c_0"; "delta"; "b_1"]; varnames = ["Coefficient"; "SE"; "PValue"]; coeff = reg1.coeff; se = reg1.se; pval = reg1.tStats.pVal; table(coeff,se,pval,RowNames=rownames,VariableNames=varnames) ans=3×3 table
             Coefficient       SE          PValue
             ___________    _________    __________
    c_0      -0.00051942    0.0041996       0.90162
    delta       -0.25044     0.051797    1.8305e-06
    b_1          0.63665     0.046451    5.6215e-36
rownames = ["Intercept"; "Trend"]; coeff = reg2.coeff; se = reg2.se; pval = reg2.tStats.pVal; table(coeff,se,pval,RowNames=rownames,VariableNames=varnames) ans=2×3 table
                 Coefficient        SE         PValue
                 ___________    __________    _______
    Intercept       0.013904      0.024521    0.57099
    Trend        -2.2372e-05    9.3396e-05    0.81079
Input Arguments
y — Univariate time series data
numeric vector
Univariate time series data, specified as a numeric vector. Each element of y represents an observation.
Data Types: double
Tbl — Time series data
table | timetable
Time series data, specified as a table or timetable. Each row of Tbl is an observation. Specify a single series (variable) to test by using the DataVariable argument. The selected variable must be numeric.
Note lmctest removes missing observations, represented by NaN values, from the input series.
Name-Value Arguments
Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter. Before R2021a, use commas to separate each name and value, and enclose Name in quotes.
Example: lmctest(Tbl,DataVariable="GDP",Alpha=0.025,Lags=[0 1]) conducts two tests, at a level of significance of 0.025, on the variable GDP of the table Tbl. The first test includes 0 lagged terms in the structural model, and the second test includes 1 lagged term in the structural model.
Lags — Number p of lagged response terms
0 (default) | nonnegative integer | vector of nonnegative integers
Number p of lagged values of yt to include in the structural model, specified as a nonnegative integer or vector of nonnegative integers. p is equal to the number of lagged differences of yt in the reduced-form model. lmctest conducts a separate test for each element in Lags. Lags > 0 decreases the sample size (see “Algorithms” on page 12-1612).
Tip To draw valid inferences from a Leybourne-McCabe stationarity test, you must determine a suitable value for the Lags argument. The test is robust when p is greater than its true value in the data-generating process, and simulation evidence shows that, under the null hypothesis, the marginal distribution of the MLE of bp is asymptotically normal [3]. Therefore, you can use a standard t-test to determine whether bp is significant. However, estimated coefficient standard errors are unreliable when the MA(1) coefficient a is near 1. For a model-order test, valid under both the null and alternative hypotheses, that relies only on the MLEs of bp and a, and not on their standard errors, see [4].
Example: Lags=[0 1] includes no lags in the structural model for the first test, and then includes yt−1 in the structural model for the second test.
Data Types: double
Trend — Flag for including deterministic trend term δt
true (default) | false | logical vector
Flag for including deterministic trend δt in the structural model, specified as a logical scalar or vector. Trend=true is equivalent to including the drift term δ in the reduced-form model. lmctest conducts a separate test for each element in Trend.
Tip With a specific testing strategy in mind, determine the value of the Trend argument by the growth characteristics of the input time series.
• If the input series grows, include a trend term by setting Trend to true (default). This setting provides a reasonable comparison of a trend stationary null and a unit root process with drift.
• If a series does not exhibit long-term growth characteristics, exclude a trend term by setting Trend to false.
Example: Trend=false excludes δt from the structural model for all tests.
Data Types: logical
Test — Variance estimate
"var2" (default) | "var1" | character vector | string vector | cell vector of character vectors
Variance estimate $\sigma_1^2$ to use for the test statistic, specified as an estimate name, or a string vector or cell vector of estimate names, in this table.
Test Name    Description
"var1"       $\sigma_1^2 = e_2' e_2 / T$, where e2 is the residual vector from the regression of the deterministic local level model of the filtered responses.
"var2"       $\sigma_1^2 = a\sigma^2$, where a and σ2 are MLEs from the reduced-form model regression.
For more details, see “Test Statistics” on page 12-1612. lmctest conducts a separate test for each estimate name in Test.
Example: Test=["var1" "var2"] conducts two tests. The first test uses the variance estimate described by "var1" to compute the test statistic, and the second test uses the variance estimate "var2".
Data Types: char | cell | string
Alpha — Nominal significance level
0.05 (default) | numeric scalar | numeric vector
Nominal significance level for the hypothesis test, specified as a numeric scalar between 0.01 and 0.10 or a numeric vector of such values. lmctest conducts a separate test for each value in Alpha.
Example: Alpha=[0.01 0.05] uses a level of significance of 0.01 for the first test, and then uses a level of significance of 0.05 for the second test.
Data Types: double
DataVariable — Variable in Tbl to test
last variable (default) | string scalar | character vector | integer | logical vector
Variable in Tbl to test, specified as a string scalar or character vector containing a variable name in Tbl.Properties.VariableNames, or an integer or logical vector representing the index of a name. The selected variable must be numeric.
Example: DataVariable="GDP"
Example: DataVariable=[false true false false] or DataVariable=2 tests the second table variable.
Data Types: double | logical | char | string
Output Arguments
h — Test rejection decisions
logical scalar | logical vector
Test rejection decisions, returned as a logical scalar or vector with length equal to the number of tests. lmctest returns h when you supply the input y.
• Values of 1 indicate rejection of the AR(p) null hypothesis in favor of the ARIMA(p,1,1) alternative.
• Values of 0 indicate failure to reject the AR(p) null hypothesis.
pValue — Test statistic p-values
numeric scalar | numeric vector
Test statistic p-values, returned as a numeric scalar or vector with length equal to the number of tests. lmctest returns pValue when you supply the input y. The p-values are right-tail probabilities. When test statistics are outside tabulated critical values, lmctest returns maximum (0.10) or minimum (0.01) p-values.
stat — Test statistics
numeric scalar | numeric vector
Test statistics, returned as a numeric scalar or vector with length equal to the number of tests. lmctest returns stat when you supply the input y. For details, see “Test Statistics” on page 12-1612.
cValue — Critical values
numeric scalar | numeric vector
Critical values, returned as a numeric scalar or vector with length equal to the number of tests. lmctest returns cValue when you supply the input y. Critical values are for right-tail probabilities.
StatTbl — Test summary
table
Test summary, returned as a table with variables for the outputs h, pValue, stat, and cValue, and with a row for each test. lmctest returns StatTbl when you supply the input Tbl. StatTbl contains variables for the test settings specified by Lags, Alpha, Trend, and Test.
reg1 — Regression statistics from maximum likelihood estimation of reduced-form model
structure array
Regression statistics from the maximum likelihood estimation of the reduced-form model, returned as a structure array with the number of records equal to the number of tests. Each element of reg1 has the fields in this table. You can access a field using dot notation, for example, reg1(1).coeff contains the coefficient estimates of the first test.
Field     Description
num       Length of input series with NaNs removed, T – 1
size      Effective sample size, adjusted for lags and difference, T – (p + 1)
names     Regression coefficient names
coeff     Estimated coefficient values
se        Estimated coefficient standard errors
Cov       Estimated coefficient covariance matrix
tStats    t statistics of coefficients and p-values
FStat     F statistic and p-value
yMu       Mean of the lag-adjusted input series
ySigma    Standard deviation of the lag-adjusted input series
yHat      Fitted values of the lag-adjusted input series
res       Regression residuals
DWStat    Durbin-Watson statistic
SSR       Regression sum of squares
SSE       Error sum of squares
SST       Total sum of squares
MSE       Mean square error
RMSE      Standard error of the regression
RSq       R² statistic
aRSq      Adjusted R² statistic
LL        Loglikelihood of data under Gaussian innovations
AIC       Akaike information criterion
BIC       Bayesian (Schwarz) information criterion
HQC       Hannan-Quinn information criterion
reg2 — Regression statistics from estimation of deterministic local level model of filtered response data
structure array
Regression statistics from the estimation of the deterministic local level model of the filtered response data, returned as a structure array with the number of records equal to the number of tests. reg2 has the same fields as reg1, but with the differences described in the following table.

Field     Description
num       Length of input series with NaNs removed, T – p
size      Effective sample size, adjusted for lags and difference, T – p
More About

Leybourne-McCabe Stationarity Test
The Leybourne-McCabe stationarity test assesses the null hypothesis that a response series $y_t$ is a trend-stationary autoregressive process of degree p (AR(p)) against the alternative hypothesis that $y_t$ is nonstationary.

Structural Model for Hypotheses
The structural model for the response series is

$$y_t = c_t + \delta t + b_1 y_{t-1} + \cdots + b_p y_{t-p} + u_{1,t}$$
$$c_t = c_{t-1} + u_{2,t},$$

where
• $u_{1,t} \sim \mathrm{iid}(0, \sigma_1^2)$ and $u_{2,t} \sim \mathrm{iid}(0, \sigma_2^2)$.
• $u_{1,t}$ and $u_{2,t}$ are independent.
• The number of lags p is the value of the Lags option.
• The trend term $\delta t$ is present when Trend=true.
• T is the sample size without missing observations.

The model is second-order equivalent in moments to the reduced-form ARIMA(p,1,1) model

$$\Delta y_t = \delta + b_1 \Delta y_{t-1} + \cdots + b_p \Delta y_{t-p} + (1 - aL) v_t,$$

where L is the lag operator, $L y_t = y_{t-1}$, and $v_t \sim \mathrm{iid}(0, \sigma^2)$.

The null hypothesis is that $\sigma_2^2 = 0$ in the structural model, which is equivalent to a = 1 in the reduced-form model. The alternative is that $\sigma_2^2 > 0$ or a < 1. Under the null hypothesis, the structural model is AR(p) with intercept $c_0$ and trend $\delta t$; the reduced-form model is an over-differenced ARIMA(p,1,1) representation of the same process.
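As an illustration of the structural model, the following minimal sketch simulates one path with p = 1 and no trend. The parameter values, the initial level c0 = 1, and the zero presample are assumptions chosen for the example; they do not come from lmctest.

```matlab
rng(1);                        % For reproducibility
T = 200;                       % Sample size (assumed)
b1 = 0.5;                      % AR(1) coefficient (assumed)
sigma1 = 1;                    % Standard deviation of u1 (assumed)
sigma2 = 0;                    % Standard deviation of u2; 0 corresponds to the null
u1 = sigma1*randn(T,1);
u2 = sigma2*randn(T,1);
c = 1 + cumsum(u2);            % Level c_t = c_{t-1} + u2_t, with c_0 = 1 (assumed)
y = filter(1,[1 -b1],c + u1);  % y_t = c_t + b1*y_{t-1} + u1_t, zero presample
```

With sigma2 = 0 the level c stays constant, so y is a stationary AR(1) process. Setting sigma2 to a positive value turns c into a random walk, which corresponds to the alternative.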
Test Statistics
lmctest computes test statistics using this two-stage method:

1 Perform OLS regression of the reduced-form model to obtain maximum likelihood estimates (MLEs) of the coefficients. Specifically, regress $\Delta y_t$ on p lagged differences of y. lmctest stores the regression results in reg1.

2 Regress the filtered data $z_t$ onto $x_t$, where
• $z_t = y_t - b_1 y_{t-1} - \cdots - b_p y_{t-p}$.
• t = p+1,…,T.
• $x_t$ is one of the following:
  • 1, for an intercept-only model when Trend=false
  • [1 t], for a model with an intercept and deterministic time trend when Trend=true
lmctest stores the regression results in reg2.

The test statistic s* (output stat) is

$$s^* = \frac{e_1' V e_1}{s^2 T^2},$$

where
• $e_1$ is a vector of the residuals from the reduced-form model regression.
• V(i,j) = min(i,j).
• $s^2$ is an estimate of $\sigma^2$ that depends on the value of the variance-estimation algorithm option Test:
  • "var1" — The estimate is
    $$\hat{\sigma}_1^2 = \frac{e_2' e_2}{T},$$
    where $e_2$ is the residual vector from the regression of $z_t$ onto $x_t$ (output reg2). This selection is the original Leybourne-McCabe test, as described in [3], and it has a rate of consistency O(T).
  • "var2" (default) — The estimate is $a\sigma^2$, where a and $\sigma^2$ are MLEs from the reduced-form model regression (output reg1). This selection is the modified Leybourne-McCabe test, as described in [4], and it has a rate of consistency O(T²).
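The quadratic form in the statistic can be sketched numerically. The example below is not the lmctest implementation: the residual vector e is a random stand-in, and reusing it for the variance estimate is a simplification made only to keep the sketch short. It shows how V(i,j) = min(i,j) can be built, and that the quadratic form equals a sum of squared reverse cumulative sums.

```matlab
rng(0);                              % For reproducibility
T = 50;                              % Number of residuals (assumed)
e = randn(T,1);                      % Stand-in residual vector (hypothetical)
e = e - mean(e);                     % Center, as residuals from a fit with an intercept
[I,J] = ndgrid(1:T);
V = min(I,J);                        % V(i,j) = min(i,j)
s2 = (e'*e)/T;                       % Variance estimate in the style of "var1"
stat = (e'*V*e)/(s2*T^2);            % Quadratic-form version of the statistic

% Equivalent computation without forming V:
% e'*V*e is the sum of squared reverse cumulative sums of e.
revCum = flipud(cumsum(flipud(e)));
statAlt = sum(revCum.^2)/(s2*T^2);
```

The second form avoids the T-by-T matrix, which can matter for long series.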
Tips • The alternative hypothesis that σ22 > 0 implies 0 < a < 1. As a result, an alternative model with a = 0 and a random walk, reduced-form model with iid errors is not possible. The class of I(1) alternatives represented by such a model is appropriate for economic series with significant MA(1) components [3]. To test for a random walk, use vratiotest.
Algorithms
• The value of the Lags option lags the response in the structural model, and the reduced-form model operates on the first difference of the response. In general, when a time series is lagged or differenced, the sample size is reduced. Without a presample, if $y_t$ is defined for t = 1,…,T, the lagged series $y_{t-k}$ is defined for t = k+1,…,T. When $y_{t-k}$ is differenced, the time base reduces to k+2,…,T. p lagged differences reduce the common time base to p+2,…,T, and the effective sample size is T – (p+1).
• Test statistics follow nonstandard distributions under the null, even asymptotically. Asymptotic critical values for a standard set of significance levels between 0.01 and 0.1, for models with and without a trend, have been tabulated in [2] using Monte Carlo simulations. Critical values cValue and p-values pValue reported by lmctest are interpolated from the tables. The tabulated values are identical to those for kpsstest.
• Bootstrapped critical values, used by tests with a unit root null (such as adftest and pptest), are not possible for lmctest [1]. As a result, size distortions for small samples may be significant, especially for highly persistent processes.
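The sample-size bookkeeping in the first bullet can be checked with a short sketch. The series and the values of T and p are arbitrary assumptions; lagmatrix pads unavailable observations with NaN, so counting complete rows gives the effective sample size.

```matlab
T = 10;                          % Sample size (assumed)
p = 2;                           % Number of lagged differences (assumed)
y = (1:T)'.^2;                   % Arbitrary series defined for t = 1,...,T
dy = [NaN; diff(y)];             % First difference, defined for t = 2,...,T
X = lagmatrix(dy,1:p);           % p lagged differences, padded with NaN
valid = all(~isnan([dy X]),2);   % Rows on the common time base
firstIdx = find(valid,1);        % First usable time index, p + 2
effSize = sum(valid);            % Effective sample size, T - (p + 1)
```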
Version History Introduced in R2010a
References
[1] Caner, M., and L. Kilian. "Size Distortions of Tests of the Null Hypothesis of Stationarity: Evidence and Implications for the PPP Debate." Journal of International Money and Finance. Vol. 20, 2001, pp. 639–657.
[2] Kwiatkowski, D., P. C. B. Phillips, P. Schmidt, and Y. Shin. "Testing the Null Hypothesis of Stationarity against the Alternative of a Unit Root." Journal of Econometrics. Vol. 54, 1992, pp. 159–178.
[3] Leybourne, S. J., and B. P. M. McCabe. "A Consistent Test for a Unit Root." Journal of Business and Economic Statistics. Vol. 12, 1994, pp. 157–166.
[4] Leybourne, S. J., and B. P. M. McCabe. "Modified Stationarity Tests with Data-Dependent Model-Selection Rules." Journal of Business and Economic Statistics. Vol. 17, 1999, pp. 264–270.
[5] Schwert, G. W. "Effects of Model Specification on Tests for Unit Roots in Macroeconomic Data." Journal of Monetary Economics. Vol. 20, 1987, pp. 73–103.
See Also pptest | adftest | vratiotest | kpsstest Topics “Unit Root Nonstationarity” on page 3-33
lmtest Lagrange multiplier test of model specification
Syntax h = lmtest(score,ParamCov,dof) h = lmtest(score,ParamCov,dof,alpha) [h,pValue] = lmtest( ___ ) [h,pValue,stat,cValue] = lmtest( ___ )
Description h = lmtest(score,ParamCov,dof) returns a logical value (h) with the rejection decision from conducting a Lagrange multiplier test on page 12-1622 of model specification at the 5% significance level. lmtest constructs the test statistic using the score function (score), the estimated parameter covariance (ParamCov), and the degrees of freedom (dof). h = lmtest(score,ParamCov,dof,alpha) returns the rejection decision of the Lagrange multiplier test conducted at significance level alpha. • If score and ParamCov are length k cell arrays, then all other arguments must be length k vectors or scalars. lmtest treats each cell as a separate test, and returns a vector of rejection decisions. • If score is a row cell array, then lmtest returns a row vector. [h,pValue] = lmtest( ___ ) returns the rejection decision and p-value (pValue) for the hypothesis test, using any of the input arguments in the previous syntaxes. [h,pValue,stat,cValue] = lmtest( ___ ) additionally returns the test statistic (stat) and critical value (cValue) for the hypothesis test.
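The cell-array calling convention can be sketched as follows. The scores and covariance matrices are made-up numbers used only to show the shapes of the inputs and outputs; they do not come from any fitted model.

```matlab
% Hypothetical scores and covariance estimates for two independent tests
score = {[0.5; -1.2; 2.0], [3; 1]};         % 1-by-2 (row) cell array
ParamCov = {diag([0.4 0.3 0.2]), eye(2)};   % One symmetric matrix per test
dof = 1;                                    % Scalar, applied to both tests
alpha = 0.05;                               % Scalar, applied to both tests
[h,pValue,stat,cValue] = lmtest(score,ParamCov,dof,alpha);
```

Because score is a row cell array, h, pValue, stat, and cValue are returned as 1-by-2 row vectors, one element per test.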
Examples

Choose the Best AR Model Specification
Compare AR model specifications for a simulated response series using lmtest.

Consider the AR(3) model

$$y_t = 1 + 0.9y_{t-1} - 0.5y_{t-2} + 0.4y_{t-3} + \varepsilon_t,$$

where $\varepsilon_t$ is Gaussian with mean 0 and variance 1. Specify this model using arima.

Mdl = arima('Constant',1,'Variance',1,'AR',{0.9,-0.5,0.4});
Mdl is a fully specified AR(3) model. Simulate presample and effective sample responses from Mdl.

T = 100;
rng(1); % For reproducibility
n = max(Mdl.P,Mdl.Q); % Number of presample observations
y = simulate(Mdl,T + n);
y is a random path from Mdl that includes presample observations. Specify the restricted model

$$y_t = c + \phi_1 y_{t-1} + \phi_2 y_{t-2} + \varepsilon_t,$$

where $\varepsilon_t$ is Gaussian with mean 0 and variance $\sigma^2$.

Mdl0 = arima(3,0,0);
Mdl0.AR{3} = 0;
The structure of Mdl0 is the same as Mdl. However, every parameter is unknown, except that $\phi_3 = 0$. This is an equality constraint during estimation. Estimate the restricted model using the simulated data (y).

[EstMdl0,EstParamCov] = estimate(Mdl0,y((n+1):end),...
    'Y0',y(1:n),'display','off');
phi10 = EstMdl0.AR{1};
phi20 = EstMdl0.AR{2};
phi30 = 0;
c0 = EstMdl0.Constant;
phi0 = [c0;phi10;phi20;phi30];
v0 = EstMdl0.Variance;
EstMdl0 contains the parameter estimates of the restricted model. lmtest requires the unrestricted model score evaluated at the restricted model estimates. The unrestricted model gradient is

$$\frac{\partial l(\phi_1,\phi_2,\phi_3,c,\sigma^2;\, y_t,\ldots,y_{t-3})}{\partial c} = \frac{1}{\sigma^2}\left(y_t - c - \phi_1 y_{t-1} - \phi_2 y_{t-2} - \phi_3 y_{t-3}\right)$$

$$\frac{\partial l(\phi_1,\phi_2,\phi_3,c,\sigma^2;\, y_t,\ldots,y_{t-3})}{\partial \phi_j} = \frac{1}{\sigma^2}\left(y_t - c - \phi_1 y_{t-1} - \phi_2 y_{t-2} - \phi_3 y_{t-3}\right) y_{t-j}$$

$$\frac{\partial l(\phi_1,\phi_2,\phi_3,c,\sigma^2;\, y_t,\ldots,y_{t-3})}{\partial \sigma^2} = -\frac{1}{2\sigma^2} + \frac{1}{2\sigma^4}\left(y_t - c - \phi_1 y_{t-1} - \phi_2 y_{t-2} - \phi_3 y_{t-3}\right)^2.$$

MatY = lagmatrix(y,1:3);
LagY = MatY(all(~isnan(MatY),2),:);
cGrad = (y((n+1):end)-[ones(T,1),LagY]*phi0)/v0;
phi1Grad = ((y((n+1):end)-[ones(T,1),LagY]*phi0).*LagY(:,1))/v0;
phi2Grad = ((y((n+1):end)-[ones(T,1),LagY]*phi0).*LagY(:,2))/v0;
phi3Grad = ((y((n+1):end)-[ones(T,1),LagY]*phi0).*LagY(:,3))/v0;
vGrad = -1/(2*v0)+((y((n+1):end)-[ones(T,1),LagY]*phi0).^2)/(2*v0^2);
Grad = [cGrad,phi1Grad,phi2Grad,phi3Grad,vGrad]; % Gradient matrix
score = sum(Grad)'; % Score under the restricted model
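For reference, the gradient expressions used in this example follow from the conditional Gaussian loglikelihood contribution of a single observation. The shorthand $\varepsilon_t$ for the one-step residual and $l_t$ for the contribution of observation t are introduced here only for brevity.

```latex
\begin{aligned}
\varepsilon_t &= y_t - c - \phi_1 y_{t-1} - \phi_2 y_{t-2} - \phi_3 y_{t-3},\\
l_t &= -\tfrac{1}{2}\log(2\pi) - \tfrac{1}{2}\log\sigma^2 - \frac{\varepsilon_t^2}{2\sigma^2},\\
\frac{\partial l_t}{\partial c} &= \frac{\varepsilon_t}{\sigma^2},\qquad
\frac{\partial l_t}{\partial \phi_j} = \frac{\varepsilon_t\, y_{t-j}}{\sigma^2},\qquad
\frac{\partial l_t}{\partial \sigma^2} = -\frac{1}{2\sigma^2} + \frac{\varepsilon_t^2}{2\sigma^4}.
\end{aligned}
```

Summing these contributions over the effective sample gives the score vector, which is what sum(Grad)' computes.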
Evaluate the unrestricted parameter covariance estimator using the restricted MLEs and the outer product of gradients (OPG) method.

EstParamCov0 = inv(Grad'*Grad);
dof = 1; % Number of model restrictions
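The OPG estimator sums the outer products of the per-observation gradients and inverts the result, which is why a single matrix product suffices. A minimal sketch with a made-up 3-by-2 gradient matrix G (hypothetical values) shows the equivalence:

```matlab
G = [1 2; 3 4; 5 6];                % Hypothetical per-observation gradients (rows)
opg = zeros(2);
for t = 1:size(G,1)
    opg = opg + G(t,:)'*G(t,:);     % Outer product for observation t
end
covEst = inv(opg);                  % OPG estimate of the parameter covariance
```

The loop gives the same matrix as G'*G, so the vectorized form is the more idiomatic choice.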
Test the null hypothesis that $\phi_3 = 0$ at a 1% significance level using lmtest.

[h,pValue] = lmtest(score,EstParamCov0,dof,0.01)

h = logical
   1
pValue = 2.2524e-09
pValue is close to 0, which suggests that there is strong evidence to reject the restricted, AR(2) model in favor of the unrestricted, AR(3) model.
Assess Model Specifications Using the Lagrange Multiplier Test
Compare two model specifications for simulated education and income data. The unrestricted model has the following loglikelihood:

$$l(\beta,\rho) = -n\log\Gamma(\rho) + \rho\sum_{k=1}^{n}\log\beta_k + (\rho - 1)\sum_{k=1}^{n}\log(y_k) - \sum_{k=1}^{n} y_k\beta_k,$$

where
• $\beta_k = \dfrac{1}{\beta + x_k}$.
• $x_k$ is the number of grades that person k completed.
• $y_k$ is the income (in thousands of USD) of person k.

That is, the income of person k given the number of grades that person k completed is Gamma distributed with shape ρ and rate $\beta_k$. The restricted model sets ρ = 1, which implies that the income of person k given the number of grades person k completed is exponentially distributed with mean $\beta + x_k$.

The restricted model is $H_0: \rho = 1$. To compare this model to the unrestricted model, you require:
• The gradient vector of the unrestricted model
• The maximum likelihood estimate (MLE) under the restricted model
• The parameter covariance estimator evaluated under the MLEs of the restricted model

Load the data.

load Data_Income1
x = DataTable.EDU;
y = DataTable.INC;
Estimate the restricted model parameters by maximizing l(ρ, β) with respect to β subject to the restriction ρ = 1. The gradient of l(ρ, β) is

$$\frac{\partial l(\rho,\beta)}{\partial\beta} = \sum_{k=1}^{n}\left(y_k\beta_k^2 - \rho\beta_k\right)$$

$$\frac{\partial l(\rho,\beta)}{\partial\rho} = -n\Psi(\rho) + \sum_{k=1}^{n}\log(\beta_k y_k),$$

where Ψ(ρ) is the digamma function.

rho0 = 1; % Restricted rho
dof = 1; % Number of restrictions
dLBeta = @(beta) sum(y./((beta + x).^2) - rho0./(beta + x)); % Anonymous gradient function
[betaHat0,fVal,exitFlag] = fzero(dLBeta,0)

betaHat0 = 15.6027
fVal = 2.7756e-17
exitFlag = 1

beta = [0:0.1:50];
plot(beta,arrayfun(dLBeta,beta))
hold on
plot([beta(1);beta(end)],zeros(2,1),'k:')
plot(betaHat0,fVal,'ro','MarkerSize',10)
xlabel('{\beta}')
ylabel('Loglikelihood Gradient')
title('{\bf Loglikelihood Gradient with Respect to \beta}')
hold off
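A small sanity check of the root-finding step is possible on made-up data. When every covariate value is 0, the restricted model is an exponential distribution with mean β, whose MLE is the sample mean. The data below are hypothetical, and the bracket [1 10] is an assumption chosen so that the gradient changes sign inside it.

```matlab
y = [1; 2; 3];                    % Hypothetical incomes
x = zeros(3,1);                   % No covariate effect, so the mean is beta
rho0 = 1;                         % Restricted shape parameter
dLBeta = @(beta) sum(y./((beta + x).^2) - rho0./(beta + x));
betaHat = fzero(dLBeta,[1 10]);   % Root of the gradient inside the bracket
```

Supplying a bracket rather than a single starting point keeps fzero away from the singularity of the gradient at beta = -min(x).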
The gradient with respect to β (dLBeta) is decreasing, which suggests that there is a local maximum at its root. Therefore, betaHat0 is the MLE for the restricted model. fVal indicates that the value of the gradient is very close to 0 at betaHat0. The exit flag (exitFlag) is 1, which indicates that fzero found a root of the gradient without a problem.

Estimate the parameter covariance under the restricted model using the outer product of gradients (OPG).

rGradient = [-rho0./(betaHat0+x)+y.*(betaHat0+x).^(-2),...
    log(y./(betaHat0+x))-psi(rho0)]; % Gradient per unit
rScore = sum(rGradient)'; % Score function
rEstParamCov = inv(rGradient'*rGradient); % Parameter covariance estimate
Test the unrestricted model against the restricted model using the Lagrange multiplier test.

[h,pValue] = lmtest(rScore,rEstParamCov,dof)

h = logical
   1
pValue = 7.4744e-05
pValue is close to 0, which indicates that there is strong evidence to suggest that the unrestricted model fits the data better than the restricted model.
Assess Conditional Heteroscedasticity Using the Lagrange Multiplier Test
Test whether there are significant ARCH effects in a simulated response series using lmtest. The parameter values in this example are arbitrary.

Specify the AR(1) model with an ARCH(1) variance:

$$y_t = 0.9y_{t-1} + \varepsilon_t,$$

where
• $\varepsilon_t = w_t\sqrt{h_t}$.
• $h_t = 1 + 0.5\varepsilon_{t-1}^2$.
• $w_t$ is Gaussian with mean 0 and variance 1.

VarMdl = garch('ARCH',0.5,'Constant',1);
Mdl = arima('Constant',0,'Variance',VarMdl,'AR',0.9);

Mdl is a fully specified AR(1) model with an ARCH(1) variance. Simulate presample and effective sample responses from Mdl.

T = 100;
rng(1); % For reproducibility
n = 2; % Number of presample observations required for the gradient
[y,ep,v] = simulate(Mdl,T + n);
ep is the random path of innovations from VarMdl. The software filters ep through Mdl to yield the random response path y.

Specify the restricted model and assume that the AR model constant is 0:

$$y_t = \phi_1 y_{t-1} + \varepsilon_t,$$

where $h_t = \alpha_0 + \alpha_1\varepsilon_{t-1}^2$.

VarMdl0 = garch(0,1);
VarMdl0.ARCH{1} = 0;
Mdl0 = arima('ARLags',1,'Constant',0,'Variance',VarMdl0);

The structure of Mdl0 is the same as Mdl. However, every parameter is unknown, except for the restriction $\alpha_1 = 0$. This is an equality constraint during estimation. You can interpret Mdl0 as an AR(1) model with Gaussian innovations that have mean 0 and constant variance.

Estimate the restricted model using the simulated data (y).

psI = 1:n; % Presample indices
esI = (n + 1):(T + n); % Estimation sample indices
[EstMdl0,EstParamCov] = estimate(Mdl0,y(esI),...
    'Y0',y(psI),'E0',ep(psI),'V0',v(psI),'display','off');
phi10 = EstMdl0.AR{1};
alpha00 = EstMdl0.Variance.Constant;

EstMdl0 contains the parameter estimates of the restricted model.
lmtest requires the unrestricted model score evaluated at the restricted model estimates. The unrestricted model loglikelihood function is

$$l(\phi_1,\alpha_0,\alpha_1) = \sum_{t=2}^{T}\left(-0.5\log(2\pi) - 0.5\log h_t - \frac{\varepsilon_t^2}{2h_t}\right),$$

where $\varepsilon_t = y_t - \phi_1 y_{t-1}$. The unrestricted gradient is

$$\frac{\partial l(\phi_1,\alpha_0,\alpha_1)}{\partial\alpha} = \sum_{t=2}^{T}\frac{1}{2h_t} z_t f_t,$$

where $z_t = [1, \varepsilon_{t-1}^2]$ and $f_t = \dfrac{\varepsilon_t^2}{h_t} - 1$. The information matrix is

$$I = \sum_{t=2}^{T}\frac{1}{2h_t^2} z_t' z_t.$$

Under the null, restricted model, $h_t = h_0 = \alpha_0$ for all t, where $\alpha_0$ is the estimate from the restricted model analysis.

Evaluate the gradient and information matrix under the restricted model. Estimate the parameter covariance by inverting the information matrix.

e = y - phi10*lagmatrix(y,1);
eLag1Sq = lagmatrix(e,1).^2;
h0 = alpha00;
ft = (e(esI).^2/h0 - 1);
zt = [ones(T,1),eLag1Sq(esI)]';
score0 = 1/(2*h0)*zt*ft; % Score function
InfoMat0 = (1/(2*h0^2))*(zt*zt');
EstParamCov0 = inv(InfoMat0); % Estimated parameter covariance
dof = 1; % Number of model restrictions
Test the null hypothesis that $\alpha_1 = 0$ at the 5% significance level using lmtest.

[h,pValue] = lmtest(score0,EstParamCov0,dof)

h = logical
   1
pValue = 4.0443e-06
pValue is close to 0, which suggests that there is evidence to reject the restricted AR(1) model in favor of the unrestricted AR(1) model with an ARCH(1) variance.
Input Arguments

score — Unrestricted model loglikelihood gradients
vector | cell array of vectors
Unrestricted model loglikelihood gradients evaluated at the restricted model parameter estimates, specified as a vector or cell vector.
• For a single test, score can be a p-vector or a singleton cell array containing a p-by-1 vector. p is the number of parameters in the unrestricted model.
• For conducting k > 1 tests, score must be a length k cell array. Cell j must contain one pj-by-1 vector that corresponds to one independent test. pj is the number of parameters in the unrestricted model of test j.
Data Types: double | cell

ParamCov — Parameter covariance estimate
matrix | cell array of matrices
Parameter covariance estimate, specified as a symmetric matrix or cell array of symmetric matrices. ParamCov is the unrestricted model parameter covariance estimator evaluated at the restricted model parameter estimates.
• For a single test, ParamCov can be a p-by-p matrix or singleton cell array containing a p-by-p matrix. p is the number of parameters in the unrestricted model.
• For conducting k > 1 tests, ParamCov must be a length k cell array. Cell j must contain one pj-by-pj matrix that corresponds to one independent test. pj is the number of parameters in the unrestricted model of test j.
Data Types: double | cell

dof — Degrees of freedom
positive integer | vector of positive integers
Degrees of freedom for the asymptotic, chi-square distribution of the test statistics, specified as a positive integer or vector of positive integers.
For each corresponding test, the elements of dof:
• Are the number of model restrictions
• Should be less than the number of parameters in the unrestricted model
When conducting k > 1 tests,
• If dof is a scalar, then the software expands it to a k-by-1 vector.
• If dof is a vector, then it must have length k.

alpha — Nominal significance levels
0.05 (default) | scalar | vector
Nominal significance levels for the hypothesis tests, specified as a scalar or vector. Each element of alpha must be greater than 0 and less than 1.
When conducting k > 1 tests,
• If alpha is a scalar, then the software expands it to a k-by-1 vector.
• If alpha is a vector, then it must have length k.
Data Types: double
Output Arguments h — Test rejection decisions logical | vector of logicals Test rejection decisions, returned as a logical value or vector of logical values with a length equal to the number of tests that the software conducts. • h = 1 indicates rejection of the null, restricted model in favor of the alternative, unrestricted model. • h = 0 indicates failure to reject the null, restricted model. pValue — Test statistic p-values scalar | vector Test statistic p-values, returned as a scalar or vector with a length equal to the number of tests that the software conducts. stat — Test statistics scalar | vector Test statistics, returned as a scalar or vector with a length equal to the number of tests that the software conducts. cValue — Critical values scalar | vector Critical values determined by alpha, returned as a scalar or vector with a length equal to the number of tests that the software conducts.
More About

Lagrange Multiplier Test
This test compares specifications of nested models by assessing the significance of restrictions to an extended model with unrestricted parameters.

The test statistic (LM) is

$$LM = S'VS,$$

where
• S is the gradient of the unrestricted loglikelihood function, evaluated at the restricted parameter estimates (score), i.e.,
  $$S = \left.\frac{\partial l(\theta)}{\partial\theta}\right|_{\theta = \hat{\theta}_{0,MLE}}.$$
• V is the covariance estimator for the unrestricted model parameters, evaluated at the restricted parameter estimates.

If LM exceeds a critical value in its asymptotic distribution, then the test rejects the null, restricted (nested) model in favor of the alternative, unrestricted model.

The asymptotic distribution of LM is chi-square. Its degrees of freedom (dof) is the number of restrictions in the corresponding model comparison. The nominal significance level of the test (alpha) determines the critical value (cValue).
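The pieces of this definition can be assembled by hand, as in the following minimal sketch. It is not the lmtest source code; S and V are made-up values, and the chi-square functions chi2cdf and chi2inv come from Statistics and Machine Learning Toolbox.

```matlab
S = [0.5; -1.2; 2.0];                 % Hypothetical score at the restricted estimates
V = diag([0.4 0.3 0.2]);              % Hypothetical parameter covariance estimate
dof = 1;                              % Number of restrictions (assumed)
alpha = 0.05;                         % Nominal significance level
LM = S'*V*S;                          % Test statistic
pValue = chi2cdf(LM,dof,'upper');     % Right-tail chi-square probability
cValue = chi2inv(1 - alpha,dof);      % Critical value determined by alpha
h = LM > cValue;                      % Reject the restricted model when true
```

For these inputs LM is smaller than the critical value, so the sketch does not reject the restricted model.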
Tips
• lmtest requires the unrestricted model score and parameter covariance estimator evaluated at parameter estimates for the restricted model. For example, to compare competing, nested arima models:
  1 Analytically compute the score and parameter covariance estimator based on the innovation distribution.
  2 Use estimate to estimate the restricted model parameters.
  3 Evaluate the score and covariance estimator at the restricted model estimates.
  4 Pass the evaluated score, restricted covariance estimate, and the number of restrictions (i.e., the degrees of freedom) into lmtest.
• If you find estimating parameters in the unrestricted model difficult, then use lmtest. By comparison:
  • waldtest only requires unrestricted parameter estimates.
  • lratiotest requires both unrestricted and restricted parameter estimates.
Algorithms • lmtest performs multiple, independent tests when inputs are cell arrays. • If the gradients and covariance estimates are the same for all tests, but the restricted parameter estimates vary, then lmtest “tests down” against multiple restricted models. • If the gradients and covariance estimates vary, but the restricted parameter estimates do not, then lmtest “tests up” against multiple unrestricted models. • Otherwise, lmtest compares model specifications pair-wise. • alpha is nominal in that it specifies a rejection probability in the asymptotic distribution. The actual rejection probability can differ from the nominal significance. Lagrange multiplier tests tend to under-reject for small values of alpha, and over-reject for large values of alpha. Lagrange multiplier tests typically yield lower rejection errors than likelihood ratio and Wald tests.
Version History Introduced in R2009a
References
[1] Davidson, R. and J. G. MacKinnon. Econometric Theory and Methods. Oxford, UK: Oxford University Press, 2004.
[2] Godfrey, L. G. Misspecification Tests in Econometrics. Cambridge, UK: Cambridge University Press, 1997.
[3] Greene, W. H. Econometric Analysis. 6th ed. Upper Saddle River, NJ: Pearson Prentice Hall, 2008.
[4] Hamilton, J. D. Time Series Analysis. Princeton, NJ: Princeton University Press, 1994.
See Also
Objects
arima | varm | garch | regARIMA
Functions
lratiotest | waldtest | estimate
Topics
“Model Comparison Tests” on page 3-58
“Classical Model Misspecification Tests” on page 3-70
“Conduct Lagrange Multiplier Test” on page 3-62
lratiotest Likelihood ratio test of model specification
Syntax h = lratiotest(uLogL,rLogL,dof) h = lratiotest(uLogL,rLogL,dof,alpha) [h,pValue] = lratiotest( ___ ) [h,pValue,stat,cValue] = lratiotest( ___ )
Description h = lratiotest(uLogL,rLogL,dof) returns a logical value (h) with the rejection decision from conducting a likelihood ratio test on page 12-1631 of model specification. lratiotest constructs the test statistic using the loglikelihood objective function evaluated at the unrestricted model parameter estimates (uLogL) and the restricted model parameter estimates (rLogL). The test statistic distribution has dof degrees of freedom. • If uLogL or rLogL is a vector, then the other must be a scalar or vector of equal length. lratiotest(uLogL,rLogL,dof) treats each element of a vector input as a separate test, and returns a vector of rejection decisions. • If uLogL or rLogL is a row vector, then lratiotest(uLogL,rLogL,dof) returns a row vector. h = lratiotest(uLogL,rLogL,dof,alpha) returns the rejection decision of the likelihood ratio test conducted at significance level alpha. [h,pValue] = lratiotest( ___ ) returns the rejection decision and p-value (pValue) for the hypothesis test, using any of the input arguments in the previous syntaxes. [h,pValue,stat,cValue] = lratiotest( ___ ) additionally returns the test statistic (stat) and critical value (cValue) for the hypothesis test.
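The vector calling convention can be sketched with made-up loglikelihood values. The numbers below are hypothetical and serve only to show how a scalar uLogL is paired with each element of a vector rLogL.

```matlab
uLogL = -140;                 % Hypothetical unrestricted loglikelihood maximum
rLogL = [-141; -150];         % Hypothetical restricted maxima, one per test
dof = [1; 2];                 % Number of restrictions in each test
[h,pValue,stat,cValue] = lratiotest(uLogL,rLogL,dof);
```

Each test statistic is twice the difference between the unrestricted and restricted loglikelihoods, so here the first restricted model is not rejected at the default 5% level and the second one is.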
Examples

Assess Model Specifications Using the Likelihood Ratio Test
Compare two model specifications for simulated education and income data. The unrestricted model has the following loglikelihood:

$$l(\beta,\rho) = -n\log\Gamma(\rho) + \rho\sum_{k=1}^{n}\log\beta_k + (\rho - 1)\sum_{k=1}^{n}\log(y_k) - \sum_{k=1}^{n} y_k\beta_k,$$

where
• $\beta_k = \dfrac{1}{\beta + x_k}$.
• $x_k$ is the number of grades that person k completed.
• $y_k$ is the income (in thousands of USD) of person k.

That is, the income of person k given the number of grades that person k completed is Gamma distributed with shape ρ and rate $\beta_k$. The restricted model sets ρ = 1, which implies that the income of person k given the number of grades person k completed is exponentially distributed with mean $\beta + x_k$.

The restricted model is $H_0: \rho = 1$. Comparing this model to the unrestricted model using lratiotest requires the following:
• The loglikelihood function
• The maximum likelihood estimate (MLE) under the unrestricted model
• The MLE under the restricted model

Load the data.

load Data_Income1
x = DataTable.EDU;
y = DataTable.INC;
To estimate the unrestricted model parameters, maximize l(ρ, β) with respect to ρ and β. The gradient of l(ρ, β) is

$$\frac{\partial l(\rho,\beta)}{\partial\rho} = -n\psi(\rho) + \sum_{k=1}^{n}\log(y_k\beta_k)$$

$$\frac{\partial l(\rho,\beta)}{\partial\beta} = \sum_{k=1}^{n}\beta_k(\beta_k y_k - \rho),$$

where ψ(ρ) is the digamma function.

nLogLGradFun = @(theta) deal(-sum(-gammaln(theta(1)) - ...
    theta(1)*log(theta(2) + x) + (theta(1)-1)*log(y) - ...
    y./(theta(2)+x)),...
    -[sum(-psi(theta(1))+log(y./(theta(2)+x)));...
    sum(1./(theta(2)+x).*(y./(theta(2)+x)-theta(1)))]);
nLogLGradFun is an anonymous function that returns the negative loglikelihood and the gradient given the input theta, which holds the parameters ρ and β, respectively.

Numerically optimize the negative loglikelihood function using fmincon, which minimizes an objective function subject to constraints.

theta0 = randn(2,1); % Initial value for optimization
uLB = [0 -min(x)]; % Unrestricted model lower bound
uUB = [Inf Inf]; % Unrestricted model upper bound
options = optimoptions('fmincon','Algorithm','interior-point',...
    'FunctionTolerance',1e-10,'Display','off',...
    'SpecifyObjectiveGradient',true); % Optimization options
[uMLE,uLogL] = fmincon(nLogLGradFun,theta0,[],[],[],[],uLB,uUB,[],options);
uLogL = -uLogL;
uMLE is the unrestricted maximum likelihood estimate, and uLogL is the loglikelihood maximum.

Impose the restriction on the loglikelihood by setting the corresponding lower and upper bound constraints of ρ to 1. Minimize the negative, restricted loglikelihood.

dof = 1; % Number of restrictions
rLB = [1 -min(x)]; % Restricted model lower bound
rUB = [1 Inf]; % Restricted model upper bound
[rMLE,rLogL] = fmincon(nLogLGradFun,theta0,[],[],[],[],rLB,rUB,[],options);
rLogL = -rLogL;

rMLE is the restricted maximum likelihood estimate, and rLogL is the restricted loglikelihood maximum.

Use the likelihood ratio test to assess whether the data provide enough evidence to favor the unrestricted model over the restricted model.

[h,pValue,stat] = lratiotest(uLogL,rLogL,dof)

h = logical
   1
pValue = 8.9146e-04
stat = 11.0404
pValue is close to 0, which indicates that there is strong evidence suggesting that the unrestricted model fits the data better than the restricted model.
Test Among Multiple Nested Model Specifications
Assess model specifications by testing down among multiple restricted models using simulated data. The true model is the ARMA(2,1)

$$y_t = 3 + 0.9y_{t-1} - 0.5y_{t-2} + \varepsilon_t + 0.7\varepsilon_{t-1},$$

where $\varepsilon_t$ is Gaussian with mean 0 and variance 1.

Specify the true ARMA(2,1) model, and simulate 100 response values.

TrueMdl = arima('AR',{0.9,-0.5},'MA',0.7,...
    'Constant',3,'Variance',1);
T = 100;
rng(1); % For reproducibility
y = simulate(TrueMdl,T);
Specify the unrestricted model and the candidate models for testing down.

Mdl = {arima(2,0,2),arima(2,0,1),arima(2,0,0),arima(1,0,2),arima(1,0,1),...
    arima(1,0,0),arima(0,0,2),arima(0,0,1)};
rMdlNames = {'ARMA(2,1)','AR(2)','ARMA(1,2)','ARMA(1,1)',...
    'AR(1)','MA(2)','MA(1)'};

Mdl is a 1-by-8 cell array. Mdl{1} is the unrestricted model, and all other cells contain a candidate model. rMdlNames holds the names of the seven candidate models.

Fit the candidate models to the simulated data.

logL = zeros(size(Mdl,2),1); % Preallocate loglikelihoods
dof = logL; % Preallocate degrees of freedom
for k = 1:size(Mdl,2)
    [EstMdl,~,logL(k)] = estimate(Mdl{k},y,'Display','off');
    dof(k) = 4 - (EstMdl.P + EstMdl.Q); % Number of restricted parameters
end
uLogL = logL(1);
rLogL = logL(2:end);
dof = dof(2:end);
uLogL and rLogL are the values of the unrestricted loglikelihood evaluated at the unrestricted and restricted model parameter estimates, respectively.

Apply the likelihood ratio test at a 1% significance level to find the appropriate, restricted model specification(s).

alpha = .01;
h = lratiotest(uLogL,rLogL,dof,alpha);
RestrictedModels = rMdlNames(~h)

RestrictedModels = 1x4 cell
    {'ARMA(2,1)'}    {'ARMA(1,2)'}    {'ARMA(1,1)'}    {'MA(2)'}
The most appropriate restricted models are ARMA(2,1), ARMA(1,2), ARMA(1,1), or MA(2). You can test down again, but use ARMA(2,1) as the unrestricted model. In this case, you must remove MA(2) from the possible restricted models.
Assess Conditional Heteroscedasticity Using the Likelihood Ratio Test
Test whether there are significant ARCH effects in a simulated response series using lratiotest. The parameter values in this example are arbitrary.

Specify the AR(1) model with an ARCH(1) variance:

$$y_t = 0.9y_{t-1} + \varepsilon_t,$$

where
• $\varepsilon_t = w_t\sqrt{h_t}$.
• $h_t = 1 + 0.5\varepsilon_{t-1}^2$.
• $w_t$ is Gaussian with mean 0 and variance 1.

VarMdl = garch('ARCH',0.5,'Constant',1);
Mdl = arima('Constant',0,'Variance',VarMdl,'AR',0.9);

Mdl is a fully specified AR(1) model with an ARCH(1) variance. Simulate presample and effective sample responses from Mdl.

T = 100;
rng(1); % For reproducibility
n = 2; % Number of presample observations required for the gradient
[y,epsilon,condVariance] = simulate(Mdl,T + n);
psI = 1:n; % Presample indices
esI = (n + 1):(T + n); % Estimation sample indices
epsilon is the random path of innovations from VarMdl. The software filters epsilon through Mdl to yield the random response path y.

Specify the unrestricted model assuming that the conditional mean model constant is 0:

$$y_t = \phi_1 y_{t-1} + \varepsilon_t,$$

where $h_t = \alpha_0 + \alpha_1\varepsilon_{t-1}^2$. Fit the unrestricted model to the simulated data (y) using the presample observations.

UVarMdl = garch(0,1);
UMdl = arima('ARLags',1,'Constant',0,'Variance',UVarMdl);
[~,~,uLogL] = estimate(UMdl,y(esI),'Y0',y(psI),'E0',epsilon(psI),...
    'V0',condVariance(psI),'Display','off');

uLogL is the maximum value of the unrestricted loglikelihood function.

Specify the restricted model assuming that the conditional mean model constant is 0:

$$y_t = \phi_1 y_{t-1} + \varepsilon_t,$$

where $h_t = \alpha_0$. Fit the restricted model to the simulated data (y) using the presample observations.

RVarMdl = garch(0,1);
RVarMdl.ARCH{1} = 0;
RMdl = arima('ARLags',1,'Constant',0,'Variance',RVarMdl);
[~,~,rLogL] = estimate(RMdl,y(esI),'Y0',y(psI),'E0',epsilon(psI),...
    'V0',condVariance(psI),'Display','off');

The structure of RMdl is the same as UMdl. However, every parameter is unknown, except for the restriction $\alpha_1 = 0$. This is an equality constraint during estimation. You can interpret RMdl as an AR(1) model with Gaussian innovations that have mean 0 and constant variance.

Test the null hypothesis that $\alpha_1 = 0$ at the default 5% significance level using lratiotest.

dof = (UMdl.P + UMdl.Q + UVarMdl.P + UVarMdl.Q) ...
    - (RMdl.P + RMdl.Q + RVarMdl.P + RVarMdl.Q);
[h,pValue,stat,cValue] = lratiotest(uLogL,rLogL,dof)

h = logical
   1
pValue = 6.7505e-04
stat = 11.5567
cValue = 3.8415
12-1629
12
Functions
h = 1 indicates that the null, restricted model should be rejected in favor of the alternative, unrestricted model. pValue is close to 0, suggesting that there is strong evidence for the rejection. stat is the value of the chi-square test statistic, and cValue is the critical value for the test.
Input Arguments

uLogL — Unrestricted model loglikelihood maxima
scalar | vector
Unrestricted model loglikelihood maxima, specified as a scalar or vector. If uLogL is a scalar, then the software expands it to the same length as rLogL.
Data Types: double

rLogL — Restricted model loglikelihood maxima
scalar | vector
Restricted model loglikelihood maxima, specified as a scalar or vector. If rLogL is a scalar, then the software expands it to the same length as uLogL. Elements of rLogL should not exceed the corresponding elements of uLogL.
Data Types: double

dof — Degrees of freedom
positive integer | vector of positive integers
Degrees of freedom for the asymptotic, chi-square distribution of the test statistics, specified as a positive integer or vector of positive integers.
For each corresponding test, the elements of dof:
• Are the number of model restrictions
• Should be less than the number of parameters in the unrestricted model
When conducting k > 1 tests,
• If dof is a scalar, then the software expands it to a k-by-1 vector.
• If dof is a vector, then it must have length k.
Data Types: double

alpha — Nominal significance levels
0.05 (default) | scalar | vector
Nominal significance levels for the hypothesis tests, specified as a scalar or vector. Each element of alpha must be greater than 0 and less than 1.
When conducting k > 1 tests,
• If alpha is a scalar, then the software expands it to a k-by-1 vector.
• If alpha is a vector, then it must have length k.
Data Types: double
Output Arguments

h — Test rejection decisions
logical | vector of logicals
Test rejection decisions, returned as a logical value or vector of logical values with a length equal to the number of tests that the software conducts.
• h = 1 indicates rejection of the null, restricted model in favor of the alternative, unrestricted model.
• h = 0 indicates failure to reject the null, restricted model.

pValue — Test statistic p-values
scalar | vector
Test statistic p-values, returned as a scalar or vector with a length equal to the number of tests that the software conducts.

stat — Test statistics
scalar | vector
Test statistics, returned as a scalar or vector with a length equal to the number of tests that the software conducts.

cValue — Critical values
scalar | vector
Critical values determined by alpha, returned as a scalar or vector with a length equal to the number of tests that the software conducts.
More About

Likelihood Ratio Test
The likelihood ratio test compares specifications of nested models by assessing the significance of restrictions to an extended model with unrestricted parameters.
The test uses the following algorithm:
1 Maximize the loglikelihood function $l(\theta)$ under the restricted and unrestricted model assumptions. Denote the MLEs for the restricted and unrestricted models $\hat{\theta}_0$ and $\hat{\theta}$, respectively.
2 Evaluate the loglikelihood objective function at the restricted and unrestricted MLEs, i.e., $\hat{l}_0 = l(\hat{\theta}_0)$ and $\hat{l} = l(\hat{\theta})$.
3 Compute the likelihood ratio test statistic, $LR = 2(\hat{l} - \hat{l}_0)$.
4 If LR exceeds a critical value ($C_\alpha$) relative to its asymptotic distribution, then reject the null, restricted model in favor of the alternative, unrestricted model.
• Under the null hypothesis, LR is $\chi_d^2$ distributed with d degrees of freedom.
• The degrees of freedom for the test (d) is the number of restricted parameters.
• The significance level of the test ($\alpha$) determines the critical value ($C_\alpha$).
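The steps above can be sketched by hand in base MATLAB. This is a minimal illustration, not the internals of lratiotest: the two loglikelihood values are hypothetical, and the p-value uses the identity that the upper tail of a chi-square distribution with d degrees of freedom at x equals the regularized upper incomplete gamma function evaluated at x/2 with shape d/2.

```matlab
% Hypothetical loglikelihood maxima (illustration only)
uLogL = -150.00;   % unrestricted model
rLogL = -155.78;   % restricted model
dof   = 1;         % number of restrictions
alpha = 0.05;      % nominal significance level

% Step 3: likelihood ratio statistic, LR = 2*(lhat - lhat0)
stat = 2*(uLogL - rLogL);

% Upper-tail chi-square probability via the regularized incomplete gamma function
pValue = gammainc(stat/2,dof/2,'upper');

% Step 4, expressed through the p-value: rejecting when pValue <= alpha is
% equivalent to rejecting when stat exceeds the critical value
h = pValue <= alpha;
```

Because the decision is made through the p-value, the sketch does not need an inverse chi-square function to obtain the critical value.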
Tips
• Estimate unrestricted and restricted univariate linear time series models, such as arima or garch, or time series regression models (regARIMA) using estimate. Estimate unrestricted and restricted VAR models (varm) using estimate. The estimate functions return loglikelihood maxima, which you can use as inputs to lratiotest.
• If you can easily compute both restricted and unrestricted parameter estimates, then use lratiotest. By comparison:
  • waldtest requires only unrestricted parameter estimates.
  • lmtest requires only restricted parameter estimates.
Algorithms
• lratiotest performs multiple, independent tests when the unrestricted or restricted model loglikelihood maxima (uLogL and rLogL, respectively) is a vector.
  • If rLogL is a vector and uLogL is a scalar, then lratiotest “tests down” against multiple restricted models.
  • If uLogL is a vector and rLogL is a scalar, then lratiotest “tests up” against multiple unrestricted models.
  • Otherwise, lratiotest compares model specifications pair-wise.
• alpha is nominal in that it specifies a rejection probability in the asymptotic distribution. The actual rejection probability is generally greater than the nominal significance.
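As a rough sketch of “testing down”, the following call passes one unrestricted loglikelihood and a vector of restricted loglikelihoods. The numeric values and the restriction counts are hypothetical; in practice they come from estimate, as in the example earlier on this page.

```matlab
% Hypothetical loglikelihood maxima (illustration only)
uLogL = -150.0;             % one unrestricted model
rLogL = [-151.0; -155.8];   % two nested restricted models
dof   = [1; 2];             % number of restrictions in each restricted model

% The scalar uLogL is expanded to the length of rLogL, so two tests run
[h,pValue,stat,cValue] = lratiotest(uLogL,rLogL,dof);

% Show statistic, critical value, and p-value for each test, one row per test
disp([stat(:) cValue(:) pValue(:)])
```

Each row of the displayed matrix corresponds to one restricted model compared against the same unrestricted model.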
Version History Introduced before R2006a
References [1] Davidson, R. and J. G. MacKinnon. Econometric Theory and Methods. Oxford, UK: Oxford University Press, 2004. [2] Godfrey, L. G. Misspecification Tests in Econometrics. Cambridge, UK: Cambridge University Press, 1997. [3] Greene, W. H. Econometric Analysis. 6th ed. Upper Saddle River, NJ: Pearson Prentice Hall, 2008. [4] Hamilton, J. D. Time Series Analysis. Princeton, NJ: Princeton University Press, 1994.
See Also
Objects
arima | varm | garch | regARIMA
Functions
lmtest | waldtest | estimate
Topics “Compare GARCH Models Using Likelihood Ratio Test” on page 3-67 “Classical Model Misspecification Tests” on page 3-70 “Model Comparison Tests” on page 3-58
mcmix Create random Markov chain with specified mixing structure
Syntax mc = mcmix(numStates) mc = mcmix(numStates,Name=Value)
Description
mc = mcmix(numStates) returns the discrete-time Markov chain mc containing numStates states. mc is characterized by random transition probabilities.
mc = mcmix(numStates,Name=Value) uses additional options specified by one or more name-value arguments to structure mc to simulate different mixing times. For example, you can control the pattern of feasible transitions.
Examples

Generate Markov Chain from Random Transition Matrix
Generate a six-state Markov chain from a random transition matrix.

rng(1); % For reproducibility
mc = mcmix(6);
mc is a dtmc object. Display the transition matrix.

mc.P

ans = 6×6
    0.2732    0.1116    0.1145    0.1957    0.0407    0.2642
    0.3050    0.2885    0.0475    0.0195    0.1513    0.1882
    0.0078    0.0439    0.0082    0.2439    0.2950    0.4013
    0.2480    0.1481    0.2245    0.0485    0.1369    0.1939
    0.2708    0.2488    0.0580    0.1614    0.0137    0.2474
    0.2791    0.1095    0.0991    0.2611    0.1999    0.0513
Plot a digraph of the Markov chain. Specify coloring the edges according to the probability of transition.

figure;
graphplot(mc,ColorEdges=true);
Decrease Feasible Transitions
Generate random transition matrices containing a specified number of zeros in random locations. A zero in location (i, j) indicates that state i does not transition to state j.
Generate two 10-state Markov chains from random transition matrices. Specify the random placement of 10 zeros within one chain and 30 zeros within the other chain.

rng(1); % For reproducibility
numStates = 10;
mc1 = mcmix(numStates,Zeros=10);
mc2 = mcmix(numStates,Zeros=30);

mc1 and mc2 are dtmc objects. Estimate the mixing times for each Markov chain.

[~,tMix1] = asymptotics(mc1)

tMix1 = 0.7567

[~,tMix2] = asymptotics(mc2)

tMix2 = 0.8137

mc1, the Markov chain with higher connectivity, mixes more quickly than mc2.
Fix Specific Transition Probabilities
Generate a Markov chain characterized by a partially random transition matrix. Also, decrease the number of feasible transitions.
Generate a 4-by-4 matrix of missing (NaN) values, which represents the transition matrix.

P = NaN(4);

Specify that state 1 transitions to state 2 with probability 0.5, and that state 2 transitions to state 1 with the same probability.

P(1,2) = 0.5;
P(2,1) = 0.5;

Create a Markov chain characterized by the partially known transition matrix. Among the remaining unknown transition probabilities, specify that five randomly chosen transitions are infeasible. An infeasible transition is a transition whose probability of occurring is zero.

rng(1); % For reproducibility
mc = mcmix(4,Fix=P,Zeros=5);
mc is a dtmc object. With the exception of the fixed elements (1,2) and (2,1) of the transition matrix, mcmix places five zeros in random locations and generates random probabilities for the remaining nine locations. The probabilities in a particular row sum to 1.
Display the transition matrix and plot a digraph of the Markov chain. In the plot, indicate transition probabilities by specifying edge colors.

P = mc.P

P = 4×4
         0    0.5000    0.1713    0.3287
    0.5000         0    0.1829    0.3171
    0.1632         0    0.8368         0
         0    0.5672    0.1676    0.2652

figure;
graphplot(mc,'ColorEdges',true);
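The claims about the generated matrix can be checked directly. The sketch below copies the displayed (rounded) matrix into a literal instead of calling mcmix, so it runs in base MATLAB; the tolerance of 1e-3 is an assumption chosen to absorb the four-decimal rounding of the display.

```matlab
% Transition matrix copied from the display above (rounded to four decimals)
P = [0      0.5000 0.1713 0.3287
     0.5000 0      0.1829 0.3171
     0.1632 0      0.8368 0
     0      0.5672 0.1676 0.2652];

numZeros  = nnz(P == 0);                     % count of infeasible transitions
rowsSumTo1 = all(abs(sum(P,2) - 1) < 1e-3);  % each row is a probability distribution
fixedKept = P(1,2) == 0.5 && P(2,1) == 0.5;  % the two fixed entries are unchanged
```

With the matrix shown, numZeros is 5, which matches the Zeros=5 specification.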
Input Arguments

numStates — Number of states
positive integer
Number of states, specified as a positive integer. If you do not specify any name-value arguments, mcmix constructs a Markov chain with random transition probabilities.
Data Types: double

Name-Value Arguments
Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter.
Before R2021a, use commas to separate each name and value, and enclose Name in quotes.
Example: Zeros=10 places 0 at 10 random locations in the transition matrix.

Fix — Locations and values of fixed transition probabilities
NaN(numStates) (default) | numeric matrix
Locations and values of fixed transition probabilities, specified as a numStates-by-numStates numeric matrix. Probabilities in any row must have a sum less than or equal to 1. Rows that sum to 1 also fix 0 values in the rest of the row. mcmix assigns random probabilities to locations containing NaN values.
Example: Fix=[0.5 NaN NaN; NaN 0.5 NaN; NaN NaN 0.5]
Data Types: double

Zeros — Number of zero-valued transition probabilities
0 (default) | positive integer
Number of zero-valued transition probabilities to assign to random locations in the transition matrix, specified as a positive integer less than numStates. The mcmix function assigns Zeros zeros to the locations containing a NaN in Fix.
Example: Zeros=10
Data Types: double

StateNames — Unique state labels
string(1:numStates) (default) | string vector | cell vector of character vectors | numeric vector
Unique state labels, specified as a string vector, cell vector of character vectors, or numeric vector of numStates length. Elements correspond to rows and columns of the transition matrix.
Example: StateNames=["Depression" "Recession" "Stagnant" "Boom"]
Data Types: double | string | cell
Output Arguments mc — Discrete-time Markov chain dtmc object Discrete-time Markov chain, returned as a dtmc object.
Version History Introduced in R2017b
References [1] Gallager, R.G. Stochastic Processes: Theory for Applications. Cambridge, UK: Cambridge University Press, 2013. [2] Horn, R., and C. R. Johnson. Matrix Analysis. Cambridge, UK: Cambridge University Press, 1985.
See Also
dtmc | asymptotics
Topics “Discrete-Time Markov Chains” on page 10-2 “Markov Chain Modeling” on page 10-8
minus Lag operator polynomial subtraction
Syntax
C = minus(A, B, 'Tolerance', tolerance)
C = A - B

Description
Given two lag operator polynomials A(L) and B(L), C = minus(A, B, 'Tolerance', tolerance) performs a polynomial subtraction C(L) = A(L) – B(L) with tolerance tolerance. 'Tolerance' is the nonnegative scalar tolerance used to determine which coefficients are included in the result. The default tolerance is 1e-12. Specifying a tolerance greater than 0 lets you exclude polynomial lags with near-zero coefficients. A coefficient matrix of a given lag is excluded only if the magnitudes of all elements of the matrix are less than or equal to the specified tolerance.
C = A - B performs a polynomial subtraction.
If at least one of A or B is a lag operator polynomial object, the other can be a cell array of matrices (initial lag operator coefficients), or a single matrix (zero-degree lag operator).

Examples

Subtract Two Lag Operator Polynomials
Create two LagOp polynomials and subtract one from the other:

A = LagOp({1 -0.6 0.08});
B = LagOp({1 -0.5});
A-B

ans =
    1-D Lag Operator Polynomial:
    -----------------------------
        Coefficients: [-0.1 0.08]
                Lags: [1 2]
              Degree: 2
           Dimension: 1
Algorithms The subtraction operator (–) invokes minus, but the optional coefficient tolerance is available only by calling minus directly.
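A short sketch of the difference between the operator form and a direct call follows. The perturbation of 1e-10 and the tolerance of 1e-8 are hypothetical values chosen for illustration: the lag-2 coefficients of A and B differ by roughly 1e-10, which is above the default tolerance but below the looser one.

```matlab
A = LagOp({1 -0.6 0.08});
B = LagOp({1 -0.5 0.08 + 1e-10});   % lag-2 coefficient differs from A's by about 1e-10

C1 = A - B;                         % operator form, default tolerance
C2 = minus(A,B,'Tolerance',1e-8);   % direct call with a looser tolerance

% Compare which lags survive in each result
C1.Lags
C2.Lags
```

Under the exclusion rule in the Description, the lag-2 term should be kept in C1 and dropped from C2, because its magnitude lies between the two tolerances.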
See Also
plus
mixconjugateblm Bayesian linear regression model with conjugate priors for stochastic search variable selection (SSVS)
Description The Bayesian linear regression model on page 12-1654 object mixconjugateblm specifies the joint prior distribution of the regression coefficients and the disturbance variance (β, σ2) for implementing SSVS on page 12-1655 (see [1] and [2]) assuming β and σ2 are dependent random variables. In general, when you create a Bayesian linear regression model object, it specifies the joint prior distribution and characteristics of the linear regression model only. That is, the model object is a template intended for further use. Specifically, to incorporate data into the model for posterior distribution analysis and feature selection, pass the model object and data to the appropriate object function on page 12-1645.
Creation Syntax PriorMdl = mixconjugateblm(NumPredictors) PriorMdl = mixconjugateblm(NumPredictors,Name,Value) Description PriorMdl = mixconjugateblm(NumPredictors) creates a Bayesian linear regression model on page 12-1654 object (PriorMdl) composed of NumPredictors predictors and an intercept, and sets the NumPredictors property. The joint prior distribution of (β, σ2) is appropriate for implementing SSVS for predictor selection [2]. PriorMdl is a template that defines the prior distributions and the dimensionality of β. PriorMdl = mixconjugateblm(NumPredictors,Name,Value) sets properties on page 12-1641 (except NumPredictors) using name-value pair arguments. Enclose each property name in quotes. For example, mixconjugateblm(3,'Probability',abs(rand(4,1))) specifies random prior regime probabilities for all four coefficients in the model.
Properties You can set writable property values when you create the model object by using name-value pair argument syntax, or after you create the model object by using dot notation. For example, to exclude an intercept from the model, enter PriorMdl.Intercept = false;
NumPredictors — Number of predictor variables
nonnegative integer
Number of predictor variables in the Bayesian multiple linear regression model, specified as a nonnegative integer.
NumPredictors must be the same as the number of columns in your predictor data, which you specify during model estimation or simulation.
When specifying NumPredictors, exclude any intercept term from the value.
After creating a model, if you change the value of NumPredictors using dot notation, then these parameters revert to the default values:
• Variable names (VarNames)
• Prior mean of β (Mu)
• Prior variances of β for each regime (V)
• Prior correlation matrix of β (Correlation)
• Prior regime probabilities (Probability)
Data Types: double

Intercept — Flag for including regression model intercept
true (default) | false
Flag for including a regression model intercept, specified as a value in this table.

Value    Description
false    Exclude an intercept from the regression model. Therefore, β is a p-dimensional vector, where p is the value of NumPredictors.
true     Include an intercept in the regression model. Therefore, β is a (p + 1)-dimensional vector. This specification causes a T-by-1 vector of ones to be prepended to the predictor data during estimation and simulation.

If you include a column of ones in the predictor data for an intercept term, then set Intercept to false.
Example: 'Intercept',false
Data Types: logical

VarNames — Predictor variable names
string vector | cell vector of character vectors
Predictor variable names for displays, specified as a string vector or cell vector of character vectors. VarNames must contain NumPredictors elements. VarNames(j) is the name of the variable in column j of the predictor data set, which you specify during estimation, simulation, or forecasting.
The default is {'Beta(1)','Beta(2)',...,'Beta(p)'}, where p is the value of NumPredictors.
Example: 'VarNames',["UnemploymentRate"; "CPI"]
Data Types: string | cell | char
Mu — Component-wise mean hyperparameter of Gaussian mixture prior on β
zeros(Intercept + NumPredictors,2) (default) | numeric matrix
Component-wise mean hyperparameter of the Gaussian mixture prior on β, specified as an (Intercept + NumPredictors)-by-2 numeric matrix. The first column contains the prior means for component 1 (the variable-inclusion regime, that is, γ = 1). The second column contains the prior means for component 2 (the variable-exclusion regime, that is, γ = 0).
• If Intercept is false, then Mu has NumPredictors rows. mixconjugateblm sets the prior mean of the NumPredictors coefficients corresponding to the columns in the predictor data set, which you specify during estimation, simulation, or forecasting.
• Otherwise, Mu has NumPredictors + 1 rows. The first row corresponds to the prior means of the intercept, and all other rows correspond to the predictor variables.
Tip To perform SSVS, use the default value of Mu.
Example: In a 3-coefficient model, 'Mu',[0.5 0; 0.5 0; 0.5 0] sets the component 1 prior mean of all coefficients to 0.5 and sets the component 2 prior mean of all coefficients to 0.
Data Types: double

V — Component-wise variance factor hyperparameter of Gaussian mixture prior on β
repmat([10 0.1],Intercept + NumPredictors,1) (default) | positive numeric matrix
Component-wise variance factor hyperparameter of the Gaussian mixture prior on β, specified as an (Intercept + NumPredictors)-by-2 positive numeric matrix. The first column contains the prior variance factors for component 1 (the variable-inclusion regime, that is, γ = 1). The second column contains the prior variance factors for component 2 (the variable-exclusion regime, that is, γ = 0). Regardless of regime or coefficient, the prior variance of a coefficient is the variance factor times σ2.
• If Intercept is false, then V has NumPredictors rows. mixconjugateblm sets the prior variance factor of the NumPredictors coefficients corresponding to the columns in the predictor data set, which you specify during estimation, simulation, or forecasting.
• Otherwise, V has NumPredictors + 1 rows. The first row corresponds to the prior variance factors of the intercept, and all other rows correspond to the predictor variables.
Tip
• To perform SSVS, specify a larger variance factor for regime 1 than for regime 2 (for all j, specify V(j,1) > V(j,2)).
• For more details on what value to specify for V, see [1].
Example: In a 3-coefficient model, 'V',[100 1; 100 1; 100 1] sets the component 1 prior variance factor of all coefficients to 100 and sets the component 2 prior variance factor of all coefficients to 1.
Data Types: double
Probability — Prior probability distribution for variable inclusion and exclusion regimes
0.5*ones(Intercept + NumPredictors,1) (default) | numeric vector of values in [0,1] | function handle
Prior probability distribution for the variable inclusion and exclusion regimes, specified as an (Intercept + NumPredictors)-by-1 numeric vector of values in [0,1], or a function handle in the form @fcnName, where fcnName is the function name. Probability represents the prior probability distribution of γ = {γ1,…,γK}, where:
• K = Intercept + NumPredictors, which is the number of coefficients in the regression model.
• γk ∈ {0,1} for k = 1,…,K. Therefore, the sample space has a cardinality of $2^K$.
• γk = 1 indicates variable VarNames(k) is included in the model, and γk = 0 indicates that the variable is excluded from the model.
If Probability is a numeric vector:
• Rows correspond to the variable names in VarNames. For models containing an intercept, the prior probability for intercept inclusion is Probability(1).
• For k = 1,…,K, the prior probability for excluding variable k is 1 – Probability(k).
• Prior probabilities of the variable-inclusion regime, among all variables and the intercept, are independent.
If Probability is a function handle, then it represents a custom prior distribution of the variable-inclusion regime probabilities. The corresponding function must have this declaration statement (the argument and function names can vary):

logprob = regimeprior(varinc)

• logprob is a numeric scalar representing the log of the prior distribution. You can write the prior distribution up to a proportionality constant.
• varinc is a K-by-1 logical vector. Elements correspond to the variable names in VarNames and indicate the regime in which the corresponding variable exists. varinc(k) = true indicates VarNames(k) is included in the model, and varinc(k) = false indicates it is excluded from the model.
You can include more input arguments, but they must be known when you call mixconjugateblm.
For details on what value to specify for Probability, see [1].
Example: In a 3-coefficient model, 'Probability',rand(3,1) assigns random prior variable-inclusion probabilities to each coefficient.
Data Types: double | function_handle

Correlation — Prior correlation matrix of β
eye(Intercept + NumPredictors) (default) | numeric, positive definite matrix
Prior correlation matrix of β for both components in the mixture model, specified as an (Intercept + NumPredictors)-by-(Intercept + NumPredictors) numeric, positive definite matrix. Consequently, the prior covariance matrix for component j in the mixture model is sigma2*diag(sqrt(V(:,j)))*Correlation*diag(sqrt(V(:,j))), where sigma2 is σ2 and V is the matrix of coefficient variance factors.
Rows and columns correspond to the variable names in VarNames.
By default, regression coefficients are uncorrelated, conditional on the regime.
Note You can supply any appropriately sized numeric matrix. However, if your specification is not positive definite, mixconjugateblm issues a warning and replaces your specification with CorrelationPD, where:

CorrelationPD = 0.5*(Correlation + Correlation.');

For details on what value to specify for Correlation, see [1].
Data Types: double

A — Shape hyperparameter of inverse gamma prior on σ2
3 (default) | numeric scalar
Shape hyperparameter of the inverse gamma prior on σ2, specified as a numeric scalar.
A must be at least -(Intercept + NumPredictors)/2.
With B held fixed, the inverse gamma distribution becomes taller and more concentrated as A increases. This characteristic weighs the prior model of σ2 more heavily than the likelihood during posterior estimation.
For the functional form of the inverse gamma distribution, see “Analytically Tractable Posteriors” on page 6-5.
Example: 'A',0.1
Data Types: double

B — Scale hyperparameter of inverse gamma prior on σ2
1 (default) | positive scalar | Inf
Scale hyperparameter of the inverse gamma prior on σ2, specified as a positive scalar or Inf.
With A held fixed, the inverse gamma distribution becomes taller and more concentrated as B increases. This characteristic weighs the prior model of σ2 more heavily than the likelihood during posterior estimation.
Example: 'B',5
Data Types: double
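To make the covariance expression in the Correlation description concrete, the following base MATLAB sketch evaluates it for both mixture components using the default V and Correlation of a four-coefficient model. The value of sigma2 is a hypothetical fixed number chosen for illustration; in the model, σ2 is random.

```matlab
K = 4;                          % Intercept + NumPredictors
V = repmat([10 0.1],K,1);       % default variance factors, one row per coefficient
Correlation = eye(K);           % default prior correlation matrix
sigma2 = 0.5;                   % hypothetical value of the disturbance variance

% Prior covariance of beta, conditional on sigma2, for each component j
Sigma = cell(1,2);
for j = 1:2
    D = diag(sqrt(V(:,j)));     % standard deviation factors for component j
    Sigma{j} = sigma2*D*Correlation*D;
end

inclusionCov = Sigma{1};        % variable-inclusion regime (gamma = 1)
exclusionCov = Sigma{2};        % variable-exclusion regime (gamma = 0)
```

With the defaults, both matrices are diagonal: the inclusion-regime variances equal 10*sigma2 and the exclusion-regime variances equal 0.1*sigma2.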
Object Functions
estimate     Perform predictor variable selection for Bayesian linear regression models
simulate     Simulate regression coefficients and disturbance variance of Bayesian linear regression model
forecast     Forecast responses of Bayesian linear regression model
plot         Visualize prior and posterior densities of Bayesian linear regression model parameters
summarize    Distribution summary statistics of Bayesian linear regression model for predictor variable selection
Examples
Create Prior Model for SSVS
Consider the linear regression model that predicts the US real gross national product (GNPR) using a linear combination of industrial production index (IPI), total employment (E), and real wages (WR).

$\mathrm{GNPR}_t = \beta_0 + \beta_1 \mathrm{IPI}_t + \beta_2 \mathrm{E}_t + \beta_3 \mathrm{WR}_t + \varepsilon_t.$

For all t, $\varepsilon_t$ is a series of independent Gaussian disturbances with a mean of 0 and variance $\sigma^2$.
Assume these prior distributions for k = 0,...,3:
• $\beta_k \mid \sigma^2, \gamma_k = \gamma_k \sigma \sqrt{V_{k1}} Z_1 + (1 - \gamma_k) \sigma \sqrt{V_{k2}} Z_2$, where $Z_1$ and $Z_2$ are independent, standard normal random variables. Therefore, the coefficients have a Gaussian mixture distribution. Assume all coefficients are conditionally independent, a priori, but they are dependent on the disturbance variance.
• $\sigma^2 \sim IG(A, B)$. A and B are the shape and scale, respectively, of an inverse gamma distribution.
• $\gamma_k \in \{0, 1\}$, and it represents the random variable-inclusion regime variable with a discrete uniform distribution.
Create a prior model for SSVS. Specify the number of predictors p.

p = 3;
PriorMdl = mixconjugateblm(p);
PriorMdl is a mixconjugateblm Bayesian linear regression model object representing the prior distribution of the regression coefficients and disturbance variance. mixconjugateblm displays a summary of the prior distributions at the command line.
Alternatively, you can create a prior model for SSVS by passing the number of predictors to bayeslm and setting the ModelType name-value pair argument to 'mixconjugate'.

MdlBayesLM = bayeslm(p,'ModelType','mixconjugate')

MdlBayesLM =
  mixconjugateblm with properties:

    NumPredictors: 3
        Intercept: 1
         VarNames: {4x1 cell}
               Mu: [4x2 double]
                V: [4x2 double]
      Probability: [4x1 double]
      Correlation: [4x4 double]
                A: 3
                B: 1

           |  Mean     Std          CI95        Positive      Distribution
------------------------------------------------------------------------------
 Intercept |  0       1.5890  [-3.547, 3.547]    0.500   Mixture distribution
 Beta(1)   |  0       1.5890  [-3.547, 3.547]    0.500   Mixture distribution
 Beta(2)   |  0       1.5890  [-3.547, 3.547]    0.500   Mixture distribution
 Beta(3)   |  0       1.5890  [-3.547, 3.547]    0.500   Mixture distribution
 Sigma2    |  0.5000  0.5000  [ 0.138, 1.616]    1.000   IG(3.00, 1)
PriorMdl and MdlBayesLM are equivalent model objects.
You can set writable property values of created models using dot notation. Set the regression coefficient names to the corresponding variable names.

PriorMdl.VarNames = ["IPI" "E" "WR"]

PriorMdl =
  mixconjugateblm with properties:

    NumPredictors: 3
        Intercept: 1
         VarNames: {4x1 cell}
               Mu: [4x2 double]
                V: [4x2 double]
      Probability: [4x1 double]
      Correlation: [4x4 double]
                A: 3
                B: 1

           |  Mean     Std          CI95        Positive      Distribution
------------------------------------------------------------------------------
 Intercept |  0       1.5890  [-3.547, 3.547]    0.500   Mixture distribution
 IPI       |  0       1.5890  [-3.547, 3.547]    0.500   Mixture distribution
 E         |  0       1.5890  [-3.547, 3.547]    0.500   Mixture distribution
 WR        |  0       1.5890  [-3.547, 3.547]    0.500   Mixture distribution
 Sigma2    |  0.5000  0.5000  [ 0.138, 1.616]    1.000   IG(3.00, 1)
MATLAB® associates the variable names with the regression coefficients in displays.
Plot the prior distributions.

plot(PriorMdl);
The prior distribution of each coefficient is a mixture of two Gaussians: both components have a mean of zero, but component 1 has a large variance relative to component 2. Therefore, their distributions are centered at zero and have the spike-and-slab appearance.
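The Std value reported for each coefficient in the prior summary can be reproduced by hand. The sketch below assumes the default settings shown in this example (variance factors 10 and 0.1, inclusion probability 0.5, zero prior means) and takes the prior mean of σ2, 0.5, from the Sigma2 row of the displayed summary. Because both components have mean zero, the law of total variance gives Var(βk) = E[σ2]·(p·Vk1 + (1 − p)·Vk2).

```matlab
V = [10 0.1];        % default variance factors: [inclusion exclusion]
prob = 0.5;          % default prior probability of variable inclusion
meanSigma2 = 0.5;    % prior mean of sigma2, read from the displayed summary

% Marginal prior variance of a coefficient under the zero-mean mixture
priorVar = meanSigma2*(prob*V(1) + (1 - prob)*V(2));
priorStd = sqrt(priorVar);

fprintf('%.4f\n',priorStd)   % prints 1.5890
```

The result agrees with the Std column of the prior summary.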
Perform Variable Selection Using SSVS and Default Options
Consider the linear regression model in “Create Prior Model for SSVS” on page 12-1645.
Create a prior model for performing SSVS. Assume that β and σ2 are dependent (a conjugate mixture model). Specify the number of predictors p and the names of the regression coefficients.

p = 3;
PriorMdl = mixconjugateblm(p,'VarNames',["IPI" "E" "WR"]);

Display the prior regime probabilities and Gaussian mixture variance factors of the prior β.

priorProbabilities = table(PriorMdl.Probability,'RowNames',PriorMdl.VarNames,...
    'VariableNames',"Probability")

priorProbabilities=4×1 table
                 Probability
                 ___________
    Intercept        0.5
    IPI              0.5
    E                0.5
    WR               0.5

priorV = array2table(PriorMdl.V,'RowNames',PriorMdl.VarNames,...
    'VariableNames',["gammaIs1" "gammaIs0"])

priorV=4×2 table
                 gammaIs1    gammaIs0
                 ________    ________
    Intercept       10          0.1
    IPI             10          0.1
    E               10          0.1
    WR              10          0.1
PriorMdl stores prior regime probabilities in the Probability property and the regime variance factors in the V property. The default prior probability of variable inclusion is 0.5. The default variance factors for each coefficient are 10 for the variable-inclusion regime and 0.1 for the variable-exclusion regime.
Load the Nelson-Plosser data set. Create variables for the response and predictor series.

load Data_NelsonPlosser
X = DataTable{:,PriorMdl.VarNames(2:end)};
y = DataTable{:,'GNPR'};
Implement SSVS by estimating the marginal posterior distributions of β and σ2. Because SSVS uses Markov chain Monte Carlo (MCMC) for estimation, set a random number seed to reproduce the results.

rng(1);
PosteriorMdl = estimate(PriorMdl,X,y);

Method: MCMC sampling with 10000 draws
Number of observations: 62
Number of predictors:   4

           |   Mean      Std          CI95         Positive  Distribution  Regime
----------------------------------------------------------------------------------
 Intercept | -18.8333  10.1851  [-36.965,  0.716]    0.037     Empirical   0.8806
 IPI       |   4.4554   0.1543  [  4.165,  4.764]    1.000     Empirical   0.4545
 E         |   0.0010   0.0004  [  0.000,  0.002]    0.997     Empirical   0.0925
 WR        |   2.4686   0.3615  [  1.766,  3.197]    1.000     Empirical   0.1734
 Sigma2    |  47.7557   8.6551  [ 33.858, 66.875]    1.000     Empirical    NaN
PosteriorMdl is an empiricalblm model object that stores draws from the posterior distributions of β and σ2 given the data. estimate displays a summary of the marginal posterior distributions at the command line. Rows of the summary correspond to regression coefficients and the disturbance variance, and columns correspond to characteristics of the posterior distribution. The characteristics include:
• CI95, which contains the 95% Bayesian equitailed credible intervals for the parameters. For example, the posterior probability that the regression coefficient of E (standardized) is in [0.000, 0.002] is 0.95.
• Regime, which contains the marginal posterior probability of variable inclusion (γ = 1 for a variable). For example, the posterior probability that E should be included in the model is 0.0925.
Assuming that variables with Regime < 0.1 should be removed from the model, the results suggest that you can exclude total employment (E) from the model.
By default, estimate draws and discards a burn-in sample of size 5000. However, a good practice is to inspect a trace plot of the draws for adequate mixing and lack of transience. Plot a trace plot of the draws for each parameter. You can access the draws that compose the distribution (the properties BetaDraws and Sigma2Draws) using dot notation.

figure;
for j = 1:(p + 1)
    subplot(2,2,j);
    plot(PosteriorMdl.BetaDraws(j,:));
    title(sprintf('%s',PosteriorMdl.VarNames{j}));
end

figure;
plot(PosteriorMdl.Sigma2Draws);
title('Sigma2');
The trace plots indicate that the draws seem to mix well. The plots show no detectable transience or serial correlation, and the draws do not jump between states.
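The exclusion rule discussed in this example (drop variables whose posterior inclusion probability is below 0.1) can be applied programmatically. The following base MATLAB sketch copies the Regime column from the estimation summary above into a literal vector; the 0.1 cutoff is the same working assumption used in the text, not a toolbox default.

```matlab
% Posterior inclusion probabilities copied from the Regime column above
names  = ["Intercept" "IPI" "E" "WR"];
regime = [0.8806 0.4545 0.0925 0.1734];

threshold = 0.1;                       % cutoff assumed in this example
keep    = names(regime >= threshold);  % coefficients retained
dropped = names(regime <  threshold);  % candidates for exclusion
```

With these values, dropped contains only "E".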
Specify Custom Prior Regime Probability Distribution for SSVS Consider the linear regression model in “Create Prior Model for SSVS” on page 12-1645. Load the Nelson-Plosser data set. Create variables for the response and predictor series. Add example-specific files to the MATLAB® path. load Data_NelsonPlosser VarNames = ["IPI" "E" "WR"]; X = DataTable{:,VarNames}; y = DataTable{:,"GNPR"}; path = fullfile(matlabroot,'examples','econ','main'); addpath(path);
Assume the following:
• The intercept is in the model with probability 0.9.
• IPI and E are in the model with probability 0.75.
• If E is included in the model, then the probability that WR is included in the model is 0.9.
• If E is excluded from the model, then the probability that WR is included is 0.25.
Declare a function named priorssvsexample.m that:
• Accepts a logical vector indicating whether the intercept and variables are in the model (true for model inclusion). Element 1 corresponds to the intercept, and the rest of the elements correspond to the variables in the data.
• Returns a numeric scalar representing the log of the described prior regime probability distribution.
    function logprior = priorssvsexample(varinc)
    %PRIORSSVSEXAMPLE Log prior regime probability distribution for SSVS
    %   PRIORSSVSEXAMPLE is an example of a custom log prior regime probability
    %   distribution for SSVS with dependent random variables. varinc is
    %   a 4-by-1 logical vector indicating whether 4 coefficients are in a model
    %   and logprior is a numeric scalar representing the log of the prior
    %   distribution of the regime probabilities.
    %
    %   Coefficients enter a model according to these rules:
    %   * varinc(1) is included with probability 0.9.
    %   * varinc(2) and varinc(3) are in the model with probability 0.75.
    %   * If varinc(3) is included in the model, then the probability that
    %     varinc(4) is included in the model is 0.9.
    %   * If varinc(3) is excluded from the model, then the probability
    %     that varinc(4) is included is 0.25.
    logprior = log(0.9) + 2*log(0.75) + log(varinc(3)*0.9 + (1-varinc(3))*0.25);
    end
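The four rules can also be written so that every term of the log prior depends on the corresponding element of varinc. The following is a minimal sketch under that reading of the rules, not the shipped example file. The names logBern, qWR, and logRegimePrior are hypothetical. The sketch enumerates all 2^4 regimes to check that the implied probabilities sum to 1.

```matlab
% Log-probability of an inclusion state g (true/false) when the inclusion
% probability is q. Hypothetical helper, not part of the toolbox.
logBern = @(g,q) log(g.*q + (1 - g).*(1 - q));

% Inclusion probability of WR conditional on the state of E (rules 3 and 4).
qWR = @(gE) gE*0.9 + (1 - gE)*0.25;

% varinc = [Intercept; IPI; E; WR], a 4-by-1 logical vector.
logRegimePrior = @(varinc) logBern(varinc(1),0.9) + ...
    logBern(varinc(2),0.75) + logBern(varinc(3),0.75) + ...
    logBern(varinc(4),qWR(varinc(3)));

% Enumerate all 2^K regimes (one per row) and sum their probabilities.
K = 4;
regimes = dec2bin(0:2^K - 1) == '1';     % 16-by-4 logical matrix
logp = zeros(2^K,1);
for i = 1:2^K
    logp(i) = logRegimePrior(regimes(i,:)');
end
totalProb = sum(exp(logp));              % should equal 1 up to rounding
```

Because the Probability property accepts a log prior known only up to a proportionality constant, normalization is not required; the check is only a convenient way to confirm that the rules were encoded consistently.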
priorssvsexample.m is an example-specific file included with Econometrics Toolbox™. To access it, enter edit priorssvsexample.m at the command line.
Create a prior model for performing SSVS. Assume that β and σ2 are dependent (a conjugate mixture model). Specify the number of predictors p, the names of the regression coefficients, and the custom prior probability distribution of the variable-inclusion regimes.
    p = 3;
    PriorMdl = mixconjugateblm(p,'VarNames',["IPI" "E" "WR"],...
        'Probability',@priorssvsexample);
Implement SSVS by estimating the marginal posterior distributions of β and σ2. Because SSVS uses MCMC for estimation, set a random number seed to reproduce the results.
    rng(1);
    PosteriorMdl = estimate(PriorMdl,X,y);

Method: MCMC sampling with 10000 draws
Number of observations: 62
Number of predictors:   4

           |   Mean      Std          CI95         Positive  Distribution  Regime
----------------------------------------------------------------------------------
 Intercept | -18.7971  10.1644  [-37.002,  0.765]    0.039     Empirical   0.8797
 IPI       |   4.4559   0.1530  [ 4.166,  4.760]     1.000     Empirical   0.4623
 E         |   0.0010   0.0004  [ 0.000,  0.002]     0.997     Empirical   0.2665
 WR        |   2.4684   0.3618  [ 1.759,  3.196]     1.000     Empirical   0.1727
 Sigma2    |  47.7391   8.6741  [33.823, 67.024]     1.000     Empirical    NaN
Assuming that variables with Regime < 0.1 should be removed from the model, the results suggest that you can include all variables in the model.
Because this example requires path to access example-specific files, clean up by removing path from the MATLAB® path.
    rmpath(path);
Forecast Responses Using Posterior Predictive Distribution
Consider the regression model in “Create Prior Model for SSVS” on page 12-1645.
Perform SSVS:
1 Create a Bayesian regression model for SSVS with a conjugate prior for the data likelihood. Use the default settings.
2 Hold out the last 10 periods of data from estimation.
3 Estimate the marginal posterior distributions.
    p = 3;
    PriorMdl = bayeslm(p,'ModelType','mixconjugate','VarNames',["IPI" "E" "WR"]);
    load Data_NelsonPlosser
    fhs = 10; % Forecast horizon size
    X = DataTable{1:(end - fhs),PriorMdl.VarNames(2:end)};
    y = DataTable{1:(end - fhs),'GNPR'};
    XF = DataTable{(end - fhs + 1):end,PriorMdl.VarNames(2:end)}; % Future predictor data
    yFT = DataTable{(end - fhs + 1):end,'GNPR'};                  % True future responses
    rng(1); % For reproducibility
    PosteriorMdl = estimate(PriorMdl,X,y,'Display',false);
Forecast responses using the posterior predictive distribution and the future predictor data XF. Plot the true values of the response and the forecasted values.
    yF = forecast(PosteriorMdl,XF);
    figure;
    plot(dates,DataTable.GNPR);
    hold on
    plot(dates((end - fhs + 1):end),yF)
    h = gca;
    hp = patch([dates(end - fhs + 1) dates(end) dates(end) dates(end - fhs + 1)],...
        h.YLim([1,1,2,2]),[0.8 0.8 0.8]);
    uistack(hp,'bottom');
    legend('Forecast Horizon','True GNPR','Forecasted GNPR','Location','NW')
    title('Real Gross National Product: 1909 - 1970');
    ylabel('rGNP');
    xlabel('Year');
    hold off
yF is a 10-by-1 vector of future values of real GNP corresponding to the future predictor data.
Estimate the forecast root mean squared error (RMSE).
    frmse = sqrt(mean((yF - yFT).^2))

    frmse = 18.8470
The forecast RMSE is a relative measure of forecast accuracy. Specifically, you estimate several models using different assumptions. The model with the lowest forecast RMSE is the best-performing model of the ones being compared. When you perform Bayesian regression with SSVS, a best practice is to tune the hyperparameters. One way to do so is to estimate the forecast RMSE over a grid of hyperparameter values, and choose the value that minimizes the forecast RMSE.
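The grid-based tuning described above can be sketched as follows. This is an illustrative outline rather than a recommended procedure: the candidate values in slabGrid and the choice to tune only the first column of V are assumptions made for the example, and the code requires Econometrics Toolbox and the Data_NelsonPlosser data set.

```matlab
load Data_NelsonPlosser
VarNames = ["IPI" "E" "WR"];
p = numel(VarNames);
fhs = 10;                                        % forecast horizon size
X   = DataTable{1:(end - fhs),VarNames};
y   = DataTable{1:(end - fhs),'GNPR'};
XF  = DataTable{(end - fhs + 1):end,VarNames};   % future predictor data
yFT = DataTable{(end - fhs + 1):end,'GNPR'};     % true future responses

slabGrid = [1 10 100 1000];    % hypothetical candidates for V(:,1)
frmseGrid = zeros(size(slabGrid));
for i = 1:numel(slabGrid)
    % Same inclusion-regime variance factor for every coefficient;
    % exclusion-regime factor held at the default value 0.1.
    V = repmat([slabGrid(i) 0.1],p + 1,1);
    PriorMdl = mixconjugateblm(p,'VarNames',VarNames,'V',V);
    rng(1);                    % same seed for each candidate
    PosteriorMdl = estimate(PriorMdl,X,y,'Display',false);
    yF = forecast(PosteriorMdl,XF);
    frmseGrid(i) = sqrt(mean((yF - yFT).^2));
end
[~,best] = min(frmseGrid);
bestSlab = slabGrid(best);     % candidate with the lowest forecast RMSE
```

Because the same hold-out sample both selects the hyperparameter and measures accuracy, the minimum RMSE found this way tends to be optimistic; a separate validation period is one way to reduce that effect.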
More About
Bayesian Linear Regression Model
A Bayesian linear regression model treats the parameters β and σ2 in the multiple linear regression (MLR) model $y_t = x_t\beta + \varepsilon_t$ as random variables.
For times t = 1,...,T:
• yt is the observed response.
• xt is a 1-by-(p + 1) row vector of observed values of p predictors. To accommodate a model intercept, x1t = 1 for all t.
• β is a (p + 1)-by-1 column vector of regression coefficients corresponding to the variables that compose the columns of xt.
• εt is the random disturbance with a mean of zero and $\mathrm{Cov}(\varepsilon) = \sigma^2 I_{T \times T}$, while ε is a T-by-1 vector containing all disturbances.
These assumptions imply that the data likelihood is
$$\ell(\beta,\sigma^2 \mid y,x) = \prod_{t=1}^{T} \phi(y_t;\, x_t\beta,\, \sigma^2).$$
ϕ(yt; xtβ, σ2) is the Gaussian probability density with mean xtβ and variance σ2 evaluated at yt.
Before considering the data, you impose a joint prior distribution assumption on (β,σ2). In a Bayesian analysis, you update the distribution of the parameters by using information about the parameters obtained from the likelihood of the data. The result is the joint posterior distribution of (β,σ2) or the conditional posterior distributions of the parameters.
Stochastic Search Variable Selection
Stochastic search variable selection (SSVS) is a predictor variable selection method for Bayesian linear regression that searches the space of potential models for models with high posterior probability, and averages the models it finds after it completes the search.
SSVS assumes that the prior distribution of each regression coefficient is a mixture of two Gaussian distributions, and the prior distribution of σ2 is inverse gamma with shape A and scale B. Let γ = {γ1,…,γK} be a latent, random regime indicator for the regression coefficients β, where:
• K is the number of coefficients in the model (Intercept + NumPredictors).
• γk = 1 means that βk|σ2,γk is Gaussian with mean 0 and variance c1.
• γk = 0 means that βk|σ2,γk is Gaussian with mean 0 and variance c2.
• A probability mass function governs the distribution of γ, and the sample space of γ is composed of 2^K elements.
More specifically, given γk and σ2, $\beta_k = \gamma_k\sqrt{c_1}\,Z + (1 - \gamma_k)\sqrt{c_2}\,Z$, where:
• Z is a standard normal random variable.
• For conjugate models (mixconjugateblm), cj = σ2Vj, j = 1,2.
• For semiconjugate models (mixsemiconjugateblm), cj = Vj.
c1 is relatively large, which implies that the corresponding predictor is more likely to be in the model. c2 is relatively small, which implies that the corresponding predictor is less likely to be in the model because its distribution is concentrated around 0.
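As an illustration of the mixture prior, the following sketch draws one coefficient from the conjugate form of the prior and compares the sample variance with the mixture variance. The values of sigma2, V, and prob are chosen only for the example, and the variable names are hypothetical.

```matlab
rng(1);                          % for reproducibility
N = 1e5;                         % number of prior draws
V = [10 0.1];                    % variance factors: [inclusion exclusion]
sigma2 = 2;                      % a fixed disturbance variance, for illustration
prob = 0.5;                      % prior probability that gamma_k = 1
c = sigma2*V;                    % conjugate model: c_j = sigma^2 * V_j

inSlab = rand(N,1) < prob;       % regime indicator gamma_k for each draw
Z = randn(N,1);                  % standard normal draws
beta = inSlab.*sqrt(c(1)).*Z + (1 - inSlab).*sqrt(c(2)).*Z;

sampleVar = var(beta);
theoryVar = prob*c(1) + (1 - prob)*c(2);   % variance of the zero-mean mixture
```

For a semiconjugate prior, the same sketch applies with c = V, because the regime variances do not scale with σ2.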
In this framework, if the potential exists for a total of K coefficients in a model, then the space has 2^K models through which to search. Because computing posterior probabilities of all 2^K models can be computationally expensive, SSVS uses MCMC to sample γ = {γ1,…,γK} and estimate posterior probabilities of corresponding models. The models that the algorithm chooses often have higher posterior probabilities. The algorithm composes the estimated posterior distributions of β and σ2 by computing the weighted average of the sampled models. The algorithm attributes a larger weight to those models sampled more often.
The resulting posterior distribution for conjugate mixture models is analytically tractable (see “Algorithms” on page 12-1656). For details on the posterior distribution, see “Analytically Tractable Posteriors” on page 6-5.
Algorithms
A closed-form posterior exists for conjugate mixture priors in the SSVS framework with K coefficients. However, because the prior β|σ2,γ, marginalized by γ, is a 2^K-component Gaussian mixture, MATLAB uses MCMC instead to sample from the posterior for numerical stability.
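To make the link between sampled regimes and model probabilities concrete, the following sketch estimates inclusion and model probabilities from a matrix of regime draws. The draws here are toy data generated from the hypothetical probabilities in incProb; they stand in for MCMC output and do not represent how the toolbox stores or computes its results.

```matlab
rng(1);                               % for reproducibility
K = 3;                                % number of coefficients
N = 5000;                             % number of retained draws
incProb = [0.9; 0.5; 0.1];            % hypothetical, used only to create toy draws
draws = rand(K,N) < incProb;          % K-by-N logical matrix, one regime per column

% Monte Carlo estimate of P(gamma_k = 1 | data): share of draws that include k.
inclusionFreq = mean(draws,2);

% Monte Carlo estimate of each visited model's probability: relative frequency.
[models,~,idx] = unique(double(draws'),'rows');
modelFreq = accumarray(idx,1)/N;
[~,top] = max(modelFreq);
topModel = models(top,:);             % most frequently sampled regime
```

Models that are never sampled receive an estimated probability of zero, which is why the search concentrates on high-probability regions of the 2^K model space rather than enumerating it.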
Alternative Functionality The bayeslm function can create any supported prior model object for Bayesian linear regression.
Version History Introduced in R2018b
References
[1] George, E. I., and R. E. McCulloch. "Variable Selection Via Gibbs Sampling." Journal of the American Statistical Association. Vol. 88, No. 423, 1993, pp. 881–889.
[2] Koop, G., D. J. Poirier, and J. L. Tobias. Bayesian Econometric Methods. New York, NY: Cambridge University Press, 2007.
See Also
Objects
empiricalblm | lassoblm | mixsemiconjugateblm
Topics
“Bayesian Linear Regression” on page 6-2
“Bayesian Lasso Regression” on page 6-52
“Bayesian Stochastic Search Variable Selection” on page 6-63
mixsemiconjugateblm Bayesian linear regression model with semiconjugate priors for stochastic search variable selection (SSVS)
Description
The Bayesian linear regression model on page 12-1671 object mixsemiconjugateblm specifies the joint prior distribution of the regression coefficients and the disturbance variance (β, σ2) for implementing SSVS on page 12-1671 (see [1] and [2]) assuming β and σ2 are independent random variables.
In general, when you create a Bayesian linear regression model object, it specifies the joint prior distribution and characteristics of the linear regression model only. That is, the model object is a template intended for further use. Specifically, to incorporate data into the model for posterior distribution analysis and feature selection, pass the model object and data to the appropriate object function on page 12-1661.
Creation
Syntax
    PriorMdl = mixsemiconjugateblm(NumPredictors)
    PriorMdl = mixsemiconjugateblm(NumPredictors,Name,Value)
Description
PriorMdl = mixsemiconjugateblm(NumPredictors) creates a Bayesian linear regression model on page 12-1654 object (PriorMdl) composed of NumPredictors predictors and an intercept, and sets the NumPredictors property. The joint prior distribution of (β, σ2) is appropriate for implementing SSVS for predictor selection [2]. PriorMdl is a template that defines the prior distributions and the dimensionality of β.
PriorMdl = mixsemiconjugateblm(NumPredictors,Name,Value) sets properties on page 12-1657 (except NumPredictors) using name-value pair arguments. Enclose each property name in quotes. For example, mixsemiconjugateblm(3,'Probability',abs(rand(4,1))) specifies random prior regime probabilities for all four coefficients in the model.
Properties You can set writable property values when you create the model object by using name-value pair argument syntax, or after you create the model object by using dot notation. For example, to exclude an intercept from the model, enter PriorMdl.Intercept = false;
NumPredictors — Number of predictor variables
nonnegative integer
Number of predictor variables in the Bayesian multiple linear regression model, specified as a nonnegative integer.
NumPredictors must be the same as the number of columns in your predictor data, which you specify during model estimation or simulation.
When specifying NumPredictors, exclude any intercept term for the value.
After creating a model, if you change the value of NumPredictors using dot notation, then these parameters revert to the default values:
• Variable names (VarNames)
• Prior mean of β (Mu)
• Prior variances of β for each regime (V)
• Prior correlation matrix of β (Correlation)
• Prior regime probabilities (Probability)
Data Types: double
Intercept — Flag for including regression model intercept
true (default) | false
Flag for including a regression model intercept, specified as a value in this table.
Value    Description
false    Exclude an intercept from the regression model. Therefore, β is a p-dimensional vector, where p is the value of NumPredictors.
true     Include an intercept in the regression model. Therefore, β is a (p + 1)-dimensional vector. This specification causes a T-by-1 vector of ones to be prepended to the predictor data during estimation and simulation.
If you include a column of ones in the predictor data for an intercept term, then set Intercept to false.
Example: 'Intercept',false
Data Types: logical
VarNames — Predictor variable names
string vector | cell vector of character vectors
Predictor variable names for displays, specified as a string vector or cell vector of character vectors. VarNames must contain NumPredictors elements. VarNames(j) is the name of the variable in column j of the predictor data set, which you specify during estimation, simulation, or forecasting.
The default is {'Beta(1)','Beta(2)',...,'Beta(p)'}, where p is the value of NumPredictors.
Example: 'VarNames',["UnemploymentRate"; "CPI"]
Data Types: string | cell | char
Mu — Component-wise mean hyperparameter of Gaussian mixture prior on β
zeros(Intercept + NumPredictors,2) (default) | numeric matrix
Component-wise mean hyperparameter of the Gaussian mixture prior on β, specified as an (Intercept + NumPredictors)-by-2 numeric matrix. The first column contains the prior means for component 1 (the variable-inclusion regime, that is, γ = 1). The second column contains the prior means for component 2 (the variable-exclusion regime, that is, γ = 0).
• If Intercept is false, then Mu has NumPredictors rows. mixsemiconjugateblm sets the prior mean of the NumPredictors coefficients corresponding to the columns in the predictor data set, which you specify during estimation, simulation, or forecasting.
• Otherwise, Mu has NumPredictors + 1 rows. The first row corresponds to the prior means of the intercept, and all other rows correspond to the predictor variables.
Tip To perform SSVS, use the default value of Mu.
Example: In a 3-coefficient model, 'Mu',[0.5 0; 0.5 0; 0.5 0] sets the component 1 prior mean of all coefficients to 0.5 and sets the component 2 prior mean of all coefficients to 0.
Data Types: double
V — Component-wise variance hyperparameter of Gaussian mixture prior on β
repmat([10 0.1],Intercept + NumPredictors,1) (default) | positive numeric matrix
Component-wise variance hyperparameter of the Gaussian mixture prior on β, specified as an (Intercept + NumPredictors)-by-2 positive numeric matrix. The first column contains the prior variance factors for component 1 (the variable-inclusion regime, that is, γ = 1). The second column contains the prior variance factors for component 2 (the variable-exclusion regime, that is, γ = 0).
• If Intercept is false, then V has NumPredictors rows. mixsemiconjugateblm sets the prior variance factor of the NumPredictors coefficients corresponding to the columns in the predictor data set, which you specify during estimation, simulation, or forecasting.
• Otherwise, V has NumPredictors + 1 rows. The first row corresponds to the prior variance factor of the intercept, and all other rows correspond to the predictor variables.
Tip
• To perform SSVS, specify a larger variance factor for regime 1 than for regime 2 (for all j, specify V(j,1) > V(j,2)).
• For more details on what value to specify for V, see [1].
Example: In a 3-coefficient model, 'V',[100 1; 100 1; 100 1] sets the component 1 prior variance factor of all coefficients to 100 and sets the component 2 prior variance factor of all coefficients to 1.
Data Types: double
Probability — Prior probability distribution for variable inclusion and exclusion regimes
0.5*ones(Intercept + NumPredictors,1) (default) | numeric vector of values in [0,1] | function handle
Prior probability distribution for the variable inclusion and exclusion regimes, specified as an (Intercept + NumPredictors)-by-1 numeric vector of values in [0,1], or a function handle in the form @fcnName, where fcnName is the function name. Probability represents the prior probability distribution of γ = {γ1,…,γK}, where:
• K = Intercept + NumPredictors, which is the number of coefficients in the regression model.
• γk ∈ {0,1} for k = 1,…,K. Therefore, the sample space has a cardinality of 2^K.
• γk = 1 indicates variable VarNames(k) is included in the model, and γk = 0 indicates that the variable is excluded from the model.
If Probability is a numeric vector:
• Rows correspond to the variable names in VarNames. For models containing an intercept, the prior probability for intercept inclusion is Probability(1).
• For k = 1,…,K, the prior probability for excluding variable k is 1 – Probability(k).
• Prior probabilities of the variable-inclusion regime, among all variables and the intercept, are independent.
If Probability is a function handle, then it represents a custom prior distribution of the variable-inclusion regime probabilities. The corresponding function must have this declaration statement (the argument and function names can vary):
    logprob = regimeprior(varinc)
• logprob is a numeric scalar representing the log of the prior distribution. You can write the prior distribution up to a proportionality constant.
• varinc is a K-by-1 logical vector. Elements correspond to the variable names in VarNames and indicate the regime in which the corresponding variable exists. varinc(k) = true indicates VarNames(k) is included in the model, and varinc(k) = false indicates it is excluded from the model.
You can include more input arguments, but they must be known when you call mixsemiconjugateblm.
For details on what value to specify for Probability, see [1].
Example: In a 3-coefficient model, 'Probability',rand(3,1) assigns random prior variable-inclusion probabilities to each coefficient.
Data Types: double | function_handle
Correlation — Prior correlation matrix of β
eye(Intercept + NumPredictors) (default) | numeric, positive definite matrix
Prior correlation matrix of β for both components in the mixture model, specified as an (Intercept + NumPredictors)-by-(Intercept + NumPredictors) numeric, positive definite matrix. Consequently, the prior covariance matrix for component j in the mixture model is diag(sqrt(V(:,j)))*Correlation*diag(sqrt(V(:,j))), where V is the matrix of coefficient variances.
Rows and columns correspond to the variable names in VarNames.
By default, regression coefficients are uncorrelated, conditional on the regime.
Note You can supply any appropriately sized numeric matrix. However, if your specification is not positive definite, mixsemiconjugateblm issues a warning and replaces your specification with CorrelationPD, where: CorrelationPD = 0.5*(Correlation + Correlation.');
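The covariance expression in the Correlation description can be evaluated directly. The following sketch uses the default V for a 4-coefficient model and a hypothetical correlation of 0.3 between the second and third coefficients; it builds the component 1 covariance matrix and checks positive definiteness with chol.

```matlab
V = repmat([10 0.1],4,1);             % default variance factors, 4 coefficients
Correlation = eye(4);
Correlation(2,3) = 0.3;               % hypothetical prior correlation
Correlation(3,2) = 0.3;               % keep the matrix symmetric

j = 1;                                % component 1: variable-inclusion regime
D = diag(sqrt(V(:,j)));               % diagonal matrix of standard deviations
Sigma1 = D*Correlation*D;             % prior covariance for component j

[~,flag] = chol(Sigma1);              % flag == 0 means positive definite
```

The diagonal of Sigma1 reproduces V(:,1), and the off-diagonal entry Sigma1(2,3) equals 0.3*sqrt(10)*sqrt(10) = 3.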
Tip For details on what value to specify for Correlation, see [1]. Data Types: double A — Shape hyperparameter of inverse gamma prior on σ2 3 (default) | numeric scalar Shape hyperparameter of the inverse gamma prior on σ2, specified as a numeric scalar. A must be at least –(Intercept + NumPredictors)/2. With B held fixed, the inverse gamma distribution becomes taller and more concentrated as A increases. This characteristic weighs the prior model of σ2 more heavily than the likelihood during posterior estimation. For the functional form of the inverse gamma distribution, see “Analytically Tractable Posteriors” on page 6-5. Example: 'A',0.1 Data Types: double B — Scale hyperparameter of inverse gamma prior on σ2 1 (default) | positive scalar | Inf Scale parameter of inverse gamma prior on σ2, specified as a positive scalar or Inf. With A held fixed, the inverse gamma distribution becomes taller and more concentrated as B increases. This characteristic weighs the prior model of σ2 more heavily than the likelihood during posterior estimation. Example: 'B',5 Data Types: double
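The effect of A and B on the prior for σ2 can be checked numerically. The sketch below assumes that the inverse gamma density is proportional to (σ2)^(–A–1)·exp(–1/(Bσ2)), so that 1/σ2 has a gamma distribution with shape A and scale B. That parameterization is an assumption here; it is consistent with the prior summary in “Create Prior Model for SSVS”, where IG(3.00, 1) has mean 0.5000, standard deviation 0.5000, and a 95% interval of [0.138, 1.616].

```matlab
A = 3;                                   % default shape
B = 1;                                   % default scale

igMean = 1/(B*(A - 1));                  % defined for A > 1
igStd  = igMean/sqrt(A - 2);             % defined for A > 2

% Equitailed 95% interval: invert the gamma CDF of 1/sigma2, then take
% reciprocals (which swaps the lower and upper tails).
ci95 = [1/(B*gammaincinv(0.975,A)), 1/(B*gammaincinv(0.025,A))];
```

Under this parameterization, increasing B with A fixed lowers both the mean and the standard deviation, which matches the statement that the distribution becomes more concentrated as B increases.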
Object Functions
estimate     Perform predictor variable selection for Bayesian linear regression models
simulate     Simulate regression coefficients and disturbance variance of Bayesian linear regression model
forecast     Forecast responses of Bayesian linear regression model
plot         Visualize prior and posterior densities of Bayesian linear regression model parameters
summarize    Distribution summary statistics of Bayesian linear regression model for predictor variable selection
Examples
Create Prior Model for SSVS
Consider the multiple linear regression model that predicts the US real gross national product (GNPR) using a linear combination of industrial production index (IPI), total employment (E), and real wages (WR).
$$\mathrm{GNPR}_t = \beta_0 + \beta_1 \mathrm{IPI}_t + \beta_2 \mathrm{E}_t + \beta_3 \mathrm{WR}_t + \varepsilon_t.$$
For all t, εt is a series of independent Gaussian disturbances with a mean of 0 and variance σ2.
Assume these prior distributions for k = 0,...,3:
• $\beta_k \mid \sigma^2, \gamma_k = \gamma_k\sqrt{V_{k1}}\,Z_1 + (1 - \gamma_k)\sqrt{V_{k2}}\,Z_2$, where Z1 and Z2 are independent, standard normal random variables. Therefore, the coefficients have a Gaussian mixture distribution. Assume all coefficients are conditionally independent, a priori.
• σ2 ∼ IG(A, B). A and B are the shape and scale, respectively, of an inverse gamma distribution.
• γk ∈ {0, 1} and it represents the random variable-inclusion regime variable with a discrete uniform distribution.
Create a prior model for SSVS. Specify the number of predictors p.
    p = 3;
    PriorMdl = mixsemiconjugateblm(p);
PriorMdl is a mixsemiconjugateblm Bayesian linear regression model object representing the prior distribution of the regression coefficients and disturbance variance. mixsemiconjugateblm displays a summary of the prior distributions at the command line.
Alternatively, you can create a prior model for SSVS by passing the number of predictors to bayeslm and setting the ModelType name-value pair argument to 'mixsemiconjugate'.
    MdlBayesLM = bayeslm(p,'ModelType','mixsemiconjugate')

MdlBayesLM =
  mixsemiconjugateblm with properties:

    NumPredictors: 3
        Intercept: 1
         VarNames: {4x1 cell}
               Mu: [4x2 double]
                V: [4x2 double]
      Probability: [4x1 double]
      Correlation: [4x4 double]
                A: 3
                B: 1

           |  Mean     Std         CI95        Positive      Distribution
------------------------------------------------------------------------------
 Intercept |  0       2.2472  [-5.201, 5.201]    0.500   Mixture distribution
 Beta(1)   |  0       2.2472  [-5.201, 5.201]    0.500   Mixture distribution
 Beta(2)   |  0       2.2472  [-5.201, 5.201]    0.500   Mixture distribution
 Beta(3)   |  0       2.2472  [-5.201, 5.201]    0.500   Mixture distribution
 Sigma2    | 0.5000   0.5000  [ 0.138, 1.616]    1.000   IG(3.00, 1)

PriorMdl and MdlBayesLM are equivalent model objects.
You can set writable property values of created models using dot notation. Set the regression coefficient names to the corresponding variable names.
    PriorMdl.VarNames = ["IPI" "E" "WR"]

PriorMdl =
  mixsemiconjugateblm with properties:

    NumPredictors: 3
        Intercept: 1
         VarNames: {4x1 cell}
               Mu: [4x2 double]
                V: [4x2 double]
      Probability: [4x1 double]
      Correlation: [4x4 double]
                A: 3
                B: 1

           |  Mean     Std         CI95        Positive      Distribution
------------------------------------------------------------------------------
 Intercept |  0       2.2472  [-5.201, 5.201]    0.500   Mixture distribution
 IPI       |  0       2.2472  [-5.201, 5.201]    0.500   Mixture distribution
 E         |  0       2.2472  [-5.201, 5.201]    0.500   Mixture distribution
 WR        |  0       2.2472  [-5.201, 5.201]    0.500   Mixture distribution
 Sigma2    | 0.5000   0.5000  [ 0.138, 1.616]    1.000   IG(3.00, 1)
MATLAB® associates the variable names to the regression coefficients in displays. Plot the prior distributions. plot(PriorMdl);
The prior distribution of each coefficient is a mixture of two Gaussians: both components have a mean of zero, but component 1 has a large variance relative to component 2. Therefore, their distributions are centered at zero and have the spike-and-slab appearance.
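The Std and CI95 entries for the coefficients in the prior summary follow from the default hyperparameters. As a check, this sketch computes the standard deviation and the upper 97.5% quantile of an equally weighted mixture of N(0, 10) and N(0, 0.1). It uses only base MATLAB functions, and the helper names normcdf0 and mixcdf are hypothetical.

```matlab
V = [10 0.1];        % default variance factors: [inclusion exclusion]
prob = 0.5;          % default prior inclusion probability

% Standard deviation of a zero-mean, two-component Gaussian mixture.
priorStd = sqrt(prob*V(1) + (1 - prob)*V(2));

% Zero-mean normal CDF with standard deviation s, written with erfc.
normcdf0 = @(x,s) 0.5*erfc(-x./(s*sqrt(2)));
mixcdf = @(x) prob*normcdf0(x,sqrt(V(1))) + (1 - prob)*normcdf0(x,sqrt(V(2)));

% Upper limit of the equitailed 95% interval; the lower limit is its negative
% because the mixture is symmetric about zero.
upper = fzero(@(x) mixcdf(x) - 0.975,[0 20]);
```

The results agree with the displayed values 2.2472 and [-5.201, 5.201] to the printed precision.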
Perform Variable Selection Using SSVS and Default Options Consider the linear regression model in “Create Prior Model for SSVS” on page 12-1661. Create a prior model for performing SSVS. Assume that β and σ2 are independent (a semiconjugate mixture model). Specify the number of predictors p and the names of the regression coefficients. p = 3; PriorMdl = mixsemiconjugateblm(p,'VarNames',["IPI" "E" "WR"]);
Display the prior regime probabilities and Gaussian mixture variance factors of the prior β.
    priorProbabilities = table(PriorMdl.Probability,'RowNames',PriorMdl.VarNames,...
        'VariableNames',"Probability")

priorProbabilities=4×1 table
                 Probability
                 ___________
    Intercept        0.5
    IPI              0.5
    E                0.5
    WR               0.5

    priorV = array2table(PriorMdl.V,'RowNames',PriorMdl.VarNames,...
        'VariableNames',["gammaIs1" "gammaIs0"])

priorV=4×2 table
                 gammaIs1    gammaIs0
                 ________    ________
    Intercept       10         0.1
    IPI             10         0.1
    E               10         0.1
    WR              10         0.1
PriorMdl stores prior regime probabilities in the Probability property and the regime variance factors in the V property. The default prior probability of variable inclusion is 0.5. The default variance factors for each coefficient are 10 for the variable-inclusion regime and 0.1 for the variable-exclusion regime.
Load the Nelson-Plosser data set. Create variables for the response and predictor series.
    load Data_NelsonPlosser
    X = DataTable{:,PriorMdl.VarNames(2:end)};
    y = DataTable{:,'GNPR'};
Implement SSVS by estimating the marginal posterior distributions of β and σ2. Because SSVS uses Markov chain Monte Carlo (MCMC) for estimation, set a random number seed to reproduce the results.
    rng(1);
    PosteriorMdl = estimate(PriorMdl,X,y);

Method: MCMC sampling with 10000 draws
Number of observations: 62
Number of predictors:   4

           |   Mean     Std         CI95         Positive  Distribution  Regime
-------------------------------------------------------------------------------
 Intercept | -1.5629  2.6816  [-7.879,  2.703]    0.300     Empirical   0.5901
 IPI       |  4.6217  0.1222  [ 4.384,  4.865]    1.000     Empirical     1
 E         |  0.0004  0.0002  [ 0.000,  0.001]    0.976     Empirical   0.0918
 WR        |  2.6098  0.3691  [ 1.889,  3.347]    1.000     Empirical     1
 Sigma2    | 50.9169  9.4955  [35.838, 72.707]    1.000     Empirical    NaN
PosteriorMdl is an empiricalblm model object that stores draws from the posterior distributions of β and σ2 given the data. estimate displays a summary of the marginal posterior distributions at the command line. Rows of the summary correspond to regression coefficients and the disturbance variance, and columns correspond to characteristics of the posterior distribution. The characteristics include:
• CI95, which contains the 95% Bayesian equitailed credible intervals for the parameters. For example, the posterior probability that the regression coefficient of E is in [0.000, 0.001] is 0.95.
• Regime, which contains the marginal posterior probability of variable inclusion (γ = 1 for a variable). For example, the posterior probability that E should be included in the model is 0.0918.
Assuming that variables with Regime < 0.1 should be removed from the model, the results suggest that you can exclude total employment (E) from the model.
By default, estimate draws and discards a burn-in sample of size 5000. However, a good practice is to inspect a trace plot of the draws for adequate mixing and lack of transience. Plot a trace plot of the draws for each parameter. You can access the draws that compose the distribution (the properties BetaDraws and Sigma2Draws) using dot notation.
    figure;
    for j = 1:(p + 1)
        subplot(2,2,j);
        plot(PosteriorMdl.BetaDraws(j,:));
        title(sprintf('%s',PosteriorMdl.VarNames{j}));
    end
figure; plot(PosteriorMdl.Sigma2Draws); title('Sigma2');
The trace plots indicate that the draws seem to mix well. The plots show no detectable transience or serial correlation, and the draws do not jump between states.
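Visual inspection can be supplemented with simple numerical summaries of a row of draws. The sketch below is illustrative only: the vector draws is toy data standing in for one row of PosteriorMdl.BetaDraws, and the quantile rule is a basic empirical one that might differ from the method estimate uses for CI95.

```matlab
rng(1);                                   % for reproducibility
draws = 2.6 + 0.37*randn(1,10000);        % toy stand-in for BetaDraws(j,:)

% Equitailed 95% interval from the sorted draws.
sorted = sort(draws);
n = numel(sorted);
ci95 = sorted([ceil(0.025*n) ceil(0.975*n)]);

% Share of draws above zero, analogous to the Positive column.
positive = mean(draws > 0);

% Lag-1 sample autocorrelation; values near zero suggest little serial
% correlation between consecutive draws.
d = draws - mean(draws);
lag1 = sum(d(1:end-1).*d(2:end))/sum(d.^2);
```

With real MCMC output, a lag-1 autocorrelation far from zero would suggest thinning the chain or increasing the number of draws.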
Specify Custom Prior Regime Probability Distribution for SSVS
Consider the linear regression model in “Create Prior Model for SSVS” on page 12-1661.
Load the Nelson-Plosser data set. Create variables for the response and predictor series. Add example-specific files to the MATLAB® path.
    load Data_NelsonPlosser
    VarNames = ["IPI" "E" "WR"];
    X = DataTable{:,VarNames};
    y = DataTable{:,"GNPR"};
    path = fullfile(matlabroot,'examples','econ','main');
    addpath(path);
Assume the following:
• The intercept is in the model with probability 0.9.
• IPI and E are in the model with probability 0.75.
• If E is included in the model, then the probability that WR is included in the model is 0.9.
• If E is excluded from the model, then the probability that WR is included is 0.25.
Declare a function named priorssvsexample.m that:
• Accepts a logical vector indicating whether the intercept and variables are in the model (true for model inclusion). Element 1 corresponds to the intercept, and the rest of the elements correspond to the variables in the data.
• Returns a numeric scalar representing the log of the described prior regime probability distribution.
    function logprior = priorssvsexample(varinc)
    %PRIORSSVSEXAMPLE Log prior regime probability distribution for SSVS
    %   PRIORSSVSEXAMPLE is an example of a custom log prior regime probability
    %   distribution for SSVS with dependent random variables. varinc is
    %   a 4-by-1 logical vector indicating whether 4 coefficients are in a model
    %   and logprior is a numeric scalar representing the log of the prior
    %   distribution of the regime probabilities.
    %
    %   Coefficients enter a model according to these rules:
    %   * varinc(1) is included with probability 0.9.
    %   * varinc(2) and varinc(3) are in the model with probability 0.75.
    %   * If varinc(3) is included in the model, then the probability that
    %     varinc(4) is included in the model is 0.9.
    %   * If varinc(3) is excluded from the model, then the probability
    %     that varinc(4) is included is 0.25.
    logprior = log(0.9) + 2*log(0.75) + log(varinc(3)*0.9 + (1-varinc(3))*0.25);
    end
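The Probability property description notes that a prior function can take more input arguments, provided that they are known when you create the model. One way to satisfy that requirement is to bind the extra arguments in an anonymous function, as in this sketch. The names indepPrior, q, and regimePrior are hypothetical, and the prior shown is a simple independent one rather than the dependent prior of this example.

```matlab
% Log prior for independent inclusion regimes with inclusion probabilities q.
% varinc is a K-by-1 logical vector; q is a K-by-1 vector of values in (0,1).
indepPrior = @(varinc,q) sum(log(varinc.*q + (1 - varinc).*(1 - q)));

% Bind q now so that the resulting handle takes varinc as its only input.
q = [0.9; 0.75; 0.75; 0.5];
regimePrior = @(varinc) indepPrior(varinc,q);

% Evaluate the prior at one regime: intercept and IPI in, E and WR out.
lp = regimePrior(logical([1; 1; 0; 0]));   % log(0.9*0.75*0.25*0.5)

% With Econometrics Toolbox, the handle can then be supplied as the prior:
% PriorMdl = mixsemiconjugateblm(3,'Probability',regimePrior);
```

Because this prior treats the regimes as independent, it describes the same distribution as passing q directly as a numeric Probability vector; the function-handle form becomes necessary only when the regimes depend on one another.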
priorssvsexample.m is an example-specific file included with Econometrics Toolbox™. To access it, enter edit priorssvsexample.m at the command line.
Create a prior model for performing SSVS. Assume that β and σ2 are independent (a semiconjugate mixture model). Specify the number of predictors p, the names of the regression coefficients, and the custom prior probability distribution of the variable-inclusion regimes.
    p = 3;
    PriorMdl = mixsemiconjugateblm(p,'VarNames',["IPI" "E" "WR"],...
        'Probability',@priorssvsexample);
Implement SSVS by estimating the marginal posterior distributions of β and σ2. Because SSVS uses MCMC for estimation, set a random number seed to reproduce the results.
    rng(1);
    PosteriorMdl = estimate(PriorMdl,X,y);

Method: MCMC sampling with 10000 draws
Number of observations: 62
Number of predictors:   4

           |   Mean     Std         CI95         Positive  Distribution  Regime
-------------------------------------------------------------------------------
 Intercept | -1.4658  2.6046  [-7.781,  2.546]    0.308     Empirical   0.5516
 IPI       |  4.6227  0.1222  [ 4.385,  4.866]    1.000     Empirical     1
 E         |  0.0004  0.0002  [ 0.000,  0.001]    0.976     Empirical   0.2557
 WR        |  2.6105  0.3692  [ 1.886,  3.346]    1.000     Empirical     1
 Sigma2    | 50.9621  9.4999  [35.860, 72.596]    1.000     Empirical    NaN
Assuming that variables with Regime < 0.1 should be removed from the model, the results suggest that you can include all variables in the model.
Because this example requires path to access example-specific files, clean up by removing path from the MATLAB® path.
    rmpath(path);
Forecast Responses Using Posterior Predictive Distribution
Consider the regression model in “Create Prior Model for SSVS” on page 12-1661.
Perform SSVS:
1. Create a Bayesian regression model for SSVS with a semiconjugate prior for the data likelihood. Use the default settings.
2. Hold out the last 10 periods of data from estimation.
3. Estimate the marginal posterior distributions.
p = 3;
PriorMdl = bayeslm(p,'ModelType','mixsemiconjugate','VarNames',["IPI" "E" "WR"]);
load Data_NelsonPlosser
fhs = 10; % Forecast horizon size
X = DataTable{1:(end - fhs),PriorMdl.VarNames(2:end)};
y = DataTable{1:(end - fhs),'GNPR'};
XF = DataTable{(end - fhs + 1):end,PriorMdl.VarNames(2:end)}; % Future predictor data
yFT = DataTable{(end - fhs + 1):end,'GNPR'};                  % True future responses
rng(1); % For reproducibility
PosteriorMdl = estimate(PriorMdl,X,y,'Display',false);
Forecast responses using the posterior predictive distribution and the future predictor data XF. Plot the true values of the response and the forecasted values.
yF = forecast(PosteriorMdl,XF);
figure;
plot(dates,DataTable.GNPR);
hold on
plot(dates((end - fhs + 1):end),yF)
h = gca;
hp = patch([dates(end - fhs + 1) dates(end) dates(end) dates(end - fhs + 1)],...
    h.YLim([1,1,2,2]),[0.8 0.8 0.8]);
uistack(hp,'bottom');
legend('Forecast Horizon','True GNPR','Forecasted GNPR','Location','NW')
title('Real Gross National Product: 1909 - 1970');
ylabel('rGNP');
xlabel('Year');
hold off
yF is a 10-by-1 vector of future values of real GNP corresponding to the future predictor data.
Estimate the forecast root mean squared error (RMSE).
frmse = sqrt(mean((yF - yFT).^2))
frmse = 4.5935
The forecast RMSE is a relative measure of forecast accuracy. Specifically, you estimate several models using different assumptions. The model with the lowest forecast RMSE is the best-performing model of the ones being compared. When you perform Bayesian regression with SSVS, a best practice is to tune the hyperparameters. One way to do so is to estimate the forecast RMSE over a grid of hyperparameter values, and choose the value that minimizes the forecast RMSE. Copyright 2018 The MathWorks, Inc.
More About
Bayesian Linear Regression Model
A Bayesian linear regression model treats the parameters β and σ² in the multiple linear regression (MLR) model y_t = x_tβ + ε_t as random variables.
For times t = 1,...,T:
• y_t is the observed response.
• x_t is a 1-by-(p + 1) row vector of observed values of p predictors. To accommodate a model intercept, x_{1t} = 1 for all t.
• β is a (p + 1)-by-1 column vector of regression coefficients corresponding to the variables that compose the columns of x_t.
• ε_t is the random disturbance with a mean of zero and Cov(ε) = σ²I_{T×T}, while ε is a T-by-1 vector containing all disturbances.
These assumptions imply that the data likelihood is
$$\ell(\beta,\sigma^2 \mid y,x) = \prod_{t=1}^{T} \phi(y_t;\, x_t\beta,\, \sigma^2).$$
ϕ(y_t; x_tβ, σ²) is the Gaussian probability density with mean x_tβ and variance σ² evaluated at y_t.
Before considering the data, you impose a joint prior distribution assumption on (β,σ²). In a Bayesian analysis, you update the distribution of the parameters by using information about the parameters obtained from the likelihood of the data. The result is the joint posterior distribution of (β,σ²) or the conditional posterior distributions of the parameters.
Stochastic Search Variable Selection
Stochastic search variable selection (SSVS) is a predictor variable selection method for Bayesian linear regression that searches the space of potential models for models with high posterior probability, and averages the models it finds after it completes the search.
SSVS assumes that the prior distribution of each regression coefficient is a mixture of two Gaussian distributions, and the prior distribution of σ² is inverse gamma with shape A and scale B. Let γ = {γ_1,…,γ_K} be a latent, random regime indicator for the regression coefficients β, where:
• K is the number of coefficients in the model (Intercept + NumPredictors).
• γ_k = 1 means that β_k|σ²,γ_k is Gaussian with mean 0 and variance c_1.
• γ_k = 0 means that β_k|σ²,γ_k is Gaussian with mean 0 and variance c_2.
• A probability mass function governs the distribution of γ, and the sample space of γ is composed of 2^K elements.
More specifically, given γ_k and σ²,
$$\beta_k = \gamma_k\sqrt{c_1}\,Z + (1-\gamma_k)\sqrt{c_2}\,Z,$$
where:
• Z is a standard normal random variable.
• For conjugate models (mixconjugateblm), c_j = σ²V_j, j = 1,2.
• For semiconjugate models (mixsemiconjugateblm), c_j = V_j.
c_1 is relatively large, which implies that the corresponding predictor is more likely to be in the model. c_2 is relatively small, which implies that the corresponding predictor is less likely to be in the model because the distribution is dense around 0.
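The mixture prior can be sketched by simulation. The block below draws a coefficient given its regime indicator for a semiconjugate model, where each coefficient has standard deviation equal to the square root of its regime variance. The values V1 = 10 and V2 = 0.1, the inclusion probability of 0.5, and all variable names are assumptions chosen for illustration.

```matlab
rng(1);                            % For reproducibility
numDraws = 1e5;
V1 = 10;                           % Assumed variance when gamma_k = 1
V2 = 0.1;                          % Assumed variance when gamma_k = 0
gammaK = rand(numDraws,1) < 0.5;   % Assumed inclusion probability of 0.5
Z = randn(numDraws,1);             % Standard normal draws

% beta_k = gamma_k*sqrt(c1)*Z + (1 - gamma_k)*sqrt(c2)*Z, with c_j = V_j
betaK = gammaK.*sqrt(V1).*Z + (1 - gammaK).*sqrt(V2).*Z;

varIn  = var(betaK(gammaK));       % Sample variance, close to V1
varOut = var(betaK(~gammaK));      % Sample variance, close to V2
```

Draws in the excluded regime concentrate near 0, which is how the small-variance component expresses that a predictor is unlikely to matter.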
In this framework, if the potential exists for a total of K coefficients in a model, then the space has 2^K models through which to search. Because computing posterior probabilities of all 2^K models can be computationally expensive, SSVS uses MCMC to sample γ = {γ_1,…,γ_K} and estimate posterior probabilities of corresponding models. The models that the algorithm chooses often have higher posterior probabilities. The algorithm composes the estimated posterior distributions of β and σ² by computing the weighted average of the sampled models. The algorithm attributes a larger weight to those models sampled more often.
The resulting posterior distribution for semiconjugate mixture models is analytically intractable. For details on the posterior distribution, see “Analytically Tractable Posteriors” on page 6-5.
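The weighting by sampling frequency can be sketched as follows. The matrix gammaDraws is a small hypothetical set of sampled regime vectors (one row per MCMC draw, one column per coefficient); it is not output from estimate, and the variable names are illustrative.

```matlab
% Hypothetical sampled regimes: rows are draws, columns are coefficients
gammaDraws = [1 1 0
              1 1 0
              1 0 0
              1 1 1
              1 1 0];

% Share of draws in which each coefficient is in the model
inclusionFreq = mean(gammaDraws,1);

% Distinct sampled models and the share of draws spent in each
[models,~,idx] = unique(gammaDraws,'rows');
modelWeight = accumarray(idx,1)/size(gammaDraws,1);
```

Models that the sampler visits more often receive a larger entry in modelWeight, which mirrors the larger weight described above.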
Alternative Functionality The bayeslm function can create any supported prior model object for Bayesian linear regression.
Version History Introduced in R2018b
References [1] George, E. I., and R. E. McCulloch. "Variable Selection Via Gibbs Sampling." Journal of the American Statistical Association. Vol. 88, No. 423, 1993, pp. 881–889. [2] Koop, G., D. J. Poirier, and J. L. Tobias. Bayesian Econometric Methods. New York, NY: Cambridge University Press, 2007.
See Also Objects empiricalblm | lassoblm | mixconjugateblm Topics “Bayesian Linear Regression” on page 6-2 “Bayesian Lasso Regression” on page 6-52 “Bayesian Stochastic Search Variable Selection” on page 6-63
mldivide Lag operator polynomial left division
Syntax
B = A\C
B = mldivide(A,C,'PropertyName',PropertyValue)
Description
Given two lag operator polynomials, A(L) and C(L), B = A\C performs a left division so that C(L) = A(L)*B(L), or B(L) = A(L)\C(L). Left division requires invertibility of the coefficient matrix associated with lag 0 of the denominator polynomial A(L).
B = mldivide(A,C,'PropertyName',PropertyValue) accepts one or more comma-separated property name/value pairs.
Input Arguments
A
Denominator (divisor) lag operator polynomial object, as produced by LagOp, in the quotient A(L)\C(L).
C
Numerator (dividend) lag operator polynomial object, as produced by LagOp, in the quotient A(L)\C(L).
If at least one of A or C is a lag operator polynomial object, the other can be a cell array of matrices (initial lag operator coefficients), or a single matrix (zero-degree lag operator).
'AbsTol'
Nonnegative scalar absolute tolerance used as part of the termination criterion of the calculation of the quotient coefficients and, subsequently, to determine which coefficients to include in the quotient. Specifying an absolute tolerance allows for customization of the termination criterion.
Once the algorithm has terminated, 'AbsTol' is used to exclude polynomial lags with near-zero coefficients. A coefficient matrix for a given lag is excluded if the magnitudes of all elements of the matrix are less than or equal to the absolute tolerance.
Default: 1e-12
'RelTol'
Nonnegative scalar relative tolerance used as part of the termination criterion of the calculation of the quotient coefficients. At each lag, a coefficient matrix is calculated and its 2-norm compared to the largest coefficient 2-norm. If the ratio of the current norm to the largest norm is less than or equal to 'RelTol', then the relative termination criterion is satisfied.
Default: 0.01
'Window' Positive integer indicating the size of the window used to check termination tolerances. Window represents the number of consecutive lags for which coefficients must satisfy a tolerance-based termination criterion in order to terminate the calculation of the quotient coefficients. If coefficients remain below tolerance for the length of the specified tolerance window, they are assumed to have died out sufficiently to terminate the algorithm (see notes below). Default: 20 'Degree' Nonnegative integer indicating the maximum degree of the quotient polynomial. For stable denominators, the default is the power to which the magnitude of the largest eigenvalue of the denominator must be raised to equal the relative termination tolerance 'RelTol'; for unstable denominators, the default is the power to which the magnitude of the largest eigenvalue must be raised to equal the largest positive floating point number (see realmax). The default is 1000, regardless of the stability of the denominator. Default: 1000
Output Arguments B Quotient lag operator polynomial object, such that B(L) = A(L)\C(L).
Examples
Divide Lag Operator Polynomials
Create two LagOp polynomial objects:
A = LagOp({1 -0.6 0.08});
B = LagOp({1 -0.5});
The ratios A/B and B\A are equal:
isEqLagOp(A/B,B\A)
ans = logical
   1
Tips
The left division operator (\) invokes mldivide, but the optional inputs are available only by calling mldivide directly.
To left-invert a stable A(L), set C(L) = eye(A.Dimension).
Algorithms
Lag operator polynomial division generally results in infinite-degree polynomials. mldivide imposes a termination criterion to truncate the degree of the quotient polynomial.
If 'Degree' is unspecified, the maximum degree of the quotient is determined by the stability of the denominator. Stable denominator polynomials usually result in quotients whose coefficients exhibit geometric decay in absolute value. (When coefficients change sign, it is the coefficient envelope that decays geometrically.) Unstable denominators usually result in quotients whose coefficients exhibit geometric growth in absolute value. In either case, the maximum degree does not exceed the value of 'Degree'.
To control the truncation error that results from terminating the coefficient sequence too early, the termination criterion involves three steps:
1. At each lag in the quotient polynomial, a coefficient matrix is calculated and tested against both a relative and an absolute tolerance (see the 'RelTol' and 'AbsTol' inputs).
2. If the current coefficient matrix is below either tolerance, then a tolerance window is opened to ensure that all subsequent coefficients remain below tolerance for a number of lags determined by 'Window'.
3. If any subsequent coefficient matrix within the window is above both tolerances, then the tolerance window is closed and additional coefficients are calculated, repeating steps (1) and (2) until a subsequent coefficient matrix is again below either tolerance, and a new window is opened.
Steps (1)-(3) are repeated until a coefficient is below tolerance and subsequent coefficients remain below tolerance for 'Window' lags, or until the maximum 'Degree' is encountered, or until a coefficient becomes numerically unstable (NaN or +/-Inf).
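The three steps can be sketched for scalar polynomials in base MATLAB. This is an illustration of the described criterion under stated assumptions, not the mldivide implementation: it handles scalar coefficients only, counts the lag that opens the window toward the window length, and uses the hypothetical names a, c, b, numBelow, and lastLag.

```matlab
% Scalar sketch of B(L) = A(L)\C(L) with a tolerance window (illustrative only)
a = [1 -0.5];      % A(L) = 1 - 0.5L, denominator coefficients at lags 0 and 1
c = 1;             % C(L) = 1, numerator coefficient at lag 0
relTol = 0.01;     % Tolerances and window mirror the documented defaults
absTol = 1e-12;
window = 20;
maxDegree = 1000;

b = zeros(1,maxDegree + 1);   % Quotient coefficients at lags 0,...,maxDegree
numBelow = 0;                 % Consecutive lags below tolerance
lastLag = maxDegree;
for k = 0:maxDegree
    % Match the coefficients of L^k in C(L) = A(L)*B(L)
    acc = 0;
    if k < numel(c)
        acc = c(k + 1);
    end
    for j = 1:min(k,numel(a) - 1)
        acc = acc - a(j + 1)*b(k - j + 1);
    end
    b(k + 1) = acc/a(1);

    if ~isfinite(b(k + 1))    % Numerically unstable coefficient
        lastLag = k;
        break
    end

    % Below either tolerance keeps the window open; otherwise it closes
    largest = max(abs(b(1:k + 1)));
    if abs(b(k + 1)) <= absTol || abs(b(k + 1)) <= relTol*largest
        numBelow = numBelow + 1;
    else
        numBelow = 0;
    end
    if numBelow >= window
        lastLag = k;
        break
    end
end
b = b(1:lastLag + 1);
```

The sketch keeps every coefficient it computes, so the length of b need not match the degree that mldivide reports for the same inputs.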
References [1] Box, G.E.P., G.M. Jenkins, and G.C. Reinsel. Time Series Analysis: Forecasting and Control. 3rd ed. Englewood Cliffs, NJ: Prentice Hall, 1994. [2] Hayashi, F. Econometrics. Princeton, NJ: Princeton University Press, 2000. [3] Hamilton, J. D. Time Series Analysis. Princeton, NJ: Princeton University Press, 1994.
See Also mrdivide Topics “Specify Lag Operator Polynomials” on page 2-9 “Plot the Impulse Response Function of Conditional Mean Model” on page 7-86
mrdivide Lag operator polynomial right division
Syntax
A = C/B
A = mrdivide(C,B,'PropertyName',PropertyValue)
Description
A = C/B returns the quotient lag operator polynomial (A), which is the result of C(L)/B(L).
A = mrdivide(C,B,'PropertyName',PropertyValue) accepts one or more optional comma-separated property name/value pairs.
Input Arguments
C
Numerator (dividend) lag operator polynomial object, as produced by LagOp, in the quotient C(L)/B(L).
B
Denominator (divisor) lag operator polynomial object, as produced by LagOp, in the quotient C(L)/B(L).
If at least one of C or B is a lag operator polynomial object, the other can be a cell array of matrices (initial lag operator coefficients), or a single matrix (zero-degree lag operator).
'AbsTol'
Nonnegative scalar absolute tolerance used as part of the termination criterion of the calculation of the quotient coefficients and, subsequently, to determine which coefficients to include in the quotient. Specifying an absolute tolerance allows for customization of the termination criterion.
Once the algorithm has terminated, 'AbsTol' is used to exclude polynomial lags with near-zero coefficients. A coefficient matrix for a given lag is excluded if the magnitudes of all elements of the matrix are less than or equal to the absolute tolerance.
Default: 1e-12
'RelTol'
Nonnegative scalar relative tolerance used as part of the termination criterion of the calculation of the quotient coefficients. At each lag, a coefficient matrix is calculated and its 2-norm compared to the largest coefficient 2-norm. If the ratio of the current norm to the largest norm is less than or equal to 'RelTol', then the relative termination criterion is satisfied.
Default: 0.01
'Window' Positive integer indicating the size of the window used to check termination tolerances. Window represents the number of consecutive lags for which coefficients must satisfy a tolerance-based termination criterion in order to terminate the calculation of the quotient coefficients. If coefficients remain below tolerance for the length of the specified tolerance window, they are assumed to have died out sufficiently to terminate the algorithm (see notes below). Default: 20 'Degree' Nonnegative integer indicating the maximum degree of the quotient polynomial. For stable denominators, the default is the power to which the magnitude of the largest eigenvalue of the denominator must be raised to equal the relative termination tolerance 'RelTol'; for unstable denominators, the default is the power to which the magnitude of the largest eigenvalue must be raised to equal the largest positive floating point number (see realmax). The default is 1000, regardless of the stability of the denominator. Default: 1000
Output Arguments A Quotient lag operator polynomial object, with A(L) = C(L)/B(L).
Examples Invert a Lag Operator Polynomial Create a LagOp polynomial object with a sequence of scalar coefficients specified as a cell array: A = LagOp({1 -0.5});
Invert the polynomial by using the short-hand slash ("/") operator:
a = 1 / A
a = 
    1-D Lag Operator Polynomial:
    -----------------------------
        Coefficients: [1 0.5 0.25 0.125 0.0625 0.03125 0.015625]
                Lags: [0 1 2 3 4 5 6]
              Degree: 6
           Dimension: 1
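One way to see the effect of truncation is to multiply the truncated inverse back by A(L). The sketch below uses base MATLAB conv on the coefficient vectors shown in the display rather than LagOp objects; the names aCoef, invCoef, and prodCoef are illustrative.

```matlab
aCoef = [1 -0.5];          % Coefficients of A(L) = 1 - 0.5L
invCoef = 0.5.^(0:6);      % Displayed coefficients of 1/A(L) at lags 0 to 6

% Polynomial multiplication is convolution of the coefficient vectors
prodCoef = conv(aCoef,invCoef);
% prodCoef = [1 0 0 0 0 0 0 -0.0078125]
```

The product equals 1 except for a remainder of -0.5^7 at lag 7, whose magnitude is below the default 'RelTol' of 0.01.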
Tips
The right division operator (/) invokes mrdivide, but the optional inputs are available only by calling mrdivide directly.
To right-invert a stable B(L), set C(L) = eye(B.Dimension).
Algorithms
Lag operator polynomial division generally results in infinite-degree polynomials. mrdivide imposes a termination criterion to truncate the degree of the quotient polynomial.
If 'Degree' is unspecified, the maximum degree of the quotient is determined by the stability of the denominator. Stable denominator polynomials usually result in quotients whose coefficients exhibit geometric decay in absolute value. (When coefficients change sign, it is the coefficient envelope that decays geometrically.) Unstable denominators usually result in quotients whose coefficients exhibit geometric growth in absolute value. In either case, the maximum degree does not exceed the value of 'Degree'.
To control the truncation error that results from terminating the coefficient sequence too early, the termination criterion involves three steps:
1. At each lag in the quotient polynomial, a coefficient matrix is calculated and tested against both a relative and an absolute tolerance (see the 'RelTol' and 'AbsTol' inputs).
2. If the current coefficient matrix is below either tolerance, then a tolerance window is opened to ensure that all subsequent coefficients remain below tolerance for a number of lags determined by 'Window'.
3. If any subsequent coefficient matrix within the window is above both tolerances, then the tolerance window is closed and additional coefficients are calculated, repeating steps (1) and (2) until a subsequent coefficient matrix is again below either tolerance, and a new window is opened.
The algorithm repeats steps 1–3 until a coefficient is below tolerance and subsequent coefficients remain below tolerance for 'Window' lags, or until the maximum 'Degree' is encountered, or until a coefficient becomes numerically unstable (NaN or +/-Inf).
References [1] Box, G.E.P., G.M. Jenkins, and G.C. Reinsel. Time Series Analysis: Forecasting and Control. 3rd ed. Englewood Cliffs, NJ: Prentice Hall, 1994. [2] Hayashi, F. Econometrics. Princeton, NJ: Princeton University Press, 2000. [3] Hamilton, J. D. Time Series Analysis. Princeton, NJ: Princeton University Press, 1994.
See Also mldivide
msVAR Create Markov-switching dynamic regression model
Description The msVAR function returns an msVAR object that specifies the functional form of a Markov-switching dynamic regression model on page 12-1694 for the univariate or multivariate response process yt. The msVAR object also stores the parameter values of the model. An msVAR object has two key components: • Switching mechanism among states, represented by a discrete-time Markov chain (dtmc object) • State-specific submodels, either autoregressive (ARX) or vector autoregression (VARX) models (arima or varm objects), which can contain exogenous regression components The components completely specify the model structure. The Markov chain transition matrix and submodel parameters, such as the AR coefficients and innovation-distribution variance, are unknown and estimable unless you specify their values. To estimate a model containing unknown parameter values, pass the model and data to estimate. To work with an estimated or fully specified msVAR object, pass it to an object function on page 12-1681. Alternatively, to create a threshold-switching dynamic regression model, which has a switching mechanism governed by threshold transitions and observations of a threshold variable, see threshold and tsVAR.
Creation
Syntax
Mdl = msVAR(mc,mdl)
Mdl = msVAR(mc,mdl,'SeriesNames',seriesNames)
Description
Mdl = msVAR(mc,mdl) creates a Markov-switching dynamic regression model Mdl (an msVAR object) that has the discrete-time Markov chain switching mechanism among states mc and the state-specific, stable dynamic regression submodels mdl.
Mdl = msVAR(mc,mdl,'SeriesNames',seriesNames) optionally sets the SeriesNames property, which associates the names seriesNames to the time series of the model.
Input Arguments
mc — Discrete-time Markov chain for switching mechanism among states
dtmc object
Discrete-time Markov chain for the switching mechanism among states, specified as a dtmc object.
The states represented in the rows and columns of the transition matrix mc.P correspond to the states represented in the submodel vector mdl. msVAR processes and stores mc in the Switch property. mdl — State-specific dynamic regression submodels vector of arima objects | vector of varm objects State-specific dynamic regression submodels, specified as a length mc.NumStates vector of model objects individually constructed by arima or varm. All submodels must be of the same type (arima or varm) and have the same number of series. Unlike other model estimation tools, estimate does not infer the size of submodel regression coefficient arrays during estimation. Therefore, you must specify the Beta property of each submodel appropriately. For example, to include and estimate three predictors of the regression component of univariate submodel j, set mdl(j).Beta = NaN(3,1). msVAR processes and stores mdl in the property Submodels.
Properties You can set only the SeriesNames property when you create a model by using name-value argument syntax or by using dot notation. MATLAB derives the values of all other properties from inputs mc and mdl. For example, create a Markov-switching model for a 2-D response series, and then label the first and second series "GDP" and "CPI", respectively. Mdl = msVAR(mc,mdl); Mdl.SeriesNames = ["GDP" "CPI"];
NumStates — Number of states
positive scalar
This property is read-only.
Number of states, specified as a positive scalar.
Data Types: double

NumSeries — Number of time series
positive integer
This property is read-only.
Number of time series, specified as a positive integer. NumSeries specifies the dimensionality of the response variable and innovation in all submodels.
Data Types: double

StateNames — State labels
string vector
This property is read-only.
State labels, specified as a string vector of length NumStates.
Data Types: string

SeriesNames — Unique series labels
string(1:numSeries) (default) | string vector | cell array of character vectors | numeric vector
Unique series labels, specified as a string vector, cell array of character vectors, or a numeric vector of length numSeries. msVAR stores the series names as a string vector.
Data Types: string

Switch — Discrete-time Markov chain for switching mechanism among states
dtmc object
This property is read-only.
Discrete-time Markov chain for the switching mechanism among states, specified as a dtmc object.

Submodels — State-specific vector autoregression submodels
vector of varm objects
This property is read-only.
State-specific vector autoregression submodels, specified as a vector of varm objects of length NumStates.
msVAR removes unsupported submodel components.
• For arima submodels, msVAR does not support the moving average (MA), differencing, and seasonal components. If any submodel is a composite conditional mean and variance model (for example, its Variance property is a garch object), msVAR issues an error.
• For varm submodels, msVAR does not support the trend component.
msVAR converts submodels specified as arima objects to 1-D varm objects.

Notes
• NaN-valued elements in either the properties of Switch or the submodels of Submodels indicate unknown, estimable parameters. Specified elements, except submodel innovation variances, indicate equality constraints on parameters in model estimation.
• All unknown submodel parameters are state dependent.
Object Functions
estimate     Fit Markov-switching dynamic regression model to data
filter       Filtered inference of operative latent states in Markov-switching dynamic regression data
forecast     Forecast sample paths from Markov-switching dynamic regression model
simulate     Simulate sample paths of Markov-switching dynamic regression model
smooth       Smoothed inference of operative latent states in Markov-switching dynamic regression data
summarize    Summarize Markov-switching dynamic regression model estimation results
Examples
Create Fully Specified Univariate Model
Create a two-state Markov-switching dynamic regression model for a 1-D response process. Specify all parameter values (this example uses arbitrary values).
Create a two-state discrete-time Markov chain model that describes the regime switching mechanism. Label the regimes.
P = [0.9 0.1; 0.3 0.7];
mc = dtmc(P,'StateNames',["Expansion" "Recession"])
mc = 
  dtmc with properties:

             P: [2x2 double]
    StateNames: ["Expansion"    "Recession"]
     NumStates: 2
mc is a dtmc object.
For each regime, use arima to create an AR model that describes the response process within the regime.
% Constants
C1 = 5;
C2 = -5;
% AR coefficients
AR1 = [0.3 0.2]; % 2 lags
AR2 = 0.1;       % 1 lag
% Innovations variances
v1 = 2;
v2 = 1;
% AR Submodels
mdl1 = arima('Constant',C1,'AR',AR1,...
    'Variance',v1,'Description','Expansion State')
mdl1 = 
  arima with properties:

     Description: "Expansion State"
    Distribution: Name = "Gaussian"
               P: 2
               D: 0
               Q: 0
        Constant: 5
              AR: {0.3 0.2} at lags [1 2]
             SAR: {}
              MA: {}
             SMA: {}
     Seasonality: 0
            Beta: [1×0]
        Variance: 2

   ARIMA(2,0,0) Model (Gaussian Distribution)
mdl2 = arima('Constant',C2,'AR',AR2,...
    'Variance',v2,'Description','Recession State')
mdl2 = 
  arima with properties:

     Description: "Recession State"
    Distribution: Name = "Gaussian"
               P: 1
               D: 0
               Q: 0
        Constant: -5
              AR: {0.1} at lag [1]
             SAR: {}
              MA: {}
             SMA: {}
     Seasonality: 0
            Beta: [1×0]
        Variance: 1

   ARIMA(1,0,0) Model (Gaussian Distribution)
mdl1 and mdl2 are fully specified arima objects. Store the submodels in a vector with order corresponding to the regimes in mc.StateNames. mdl = [mdl1; mdl2];
Use msVAR to create a Markov-switching dynamic regression model from the switching mechanism mc and the state-specific submodels mdl.
Mdl = msVAR(mc,mdl)
Mdl = 
  msVAR with properties:

      NumStates: 2
      NumSeries: 1
     StateNames: ["Expansion"    "Recession"]
    SeriesNames: "1"
         Switch: [1x1 dtmc]
      Submodels: [2x1 varm]
Mdl.Submodels(1)
ans = 
  varm with properties:

     Description: "AR-Stationary 1-Dimensional VAR(2) Model"
     SeriesNames: "Y1"
       NumSeries: 1
               P: 2
        Constant: 5
              AR: {0.3 0.2} at lags [1 2]
           Trend: 0
            Beta: [1×0 matrix]
      Covariance: 2
Mdl.Submodels(2)
ans = 
  varm with properties:

     Description: "AR-Stationary 1-Dimensional VAR(1) Model"
     SeriesNames: "Y1"
       NumSeries: 1
               P: 1
        Constant: -5
              AR: {0.1} at lag [1]
           Trend: 0
            Beta: [1×0 matrix]
      Covariance: 1
Mdl is a fully specified msVAR object representing a univariate two-state Markov-switching dynamic regression model. msVAR stores specified arima submodels as varm objects. Because Mdl is fully specified, you can pass it to any msVAR object function for further analysis (see “Object Functions” on page 12-1681). Or, you can specify that the parameters of Mdl are initial values for the estimation procedure (see estimate).
Create Fully Specified Model for US GDP Rate Consider a two-state Markov-switching dynamic regression model of the postwar US real GDP growth rate. The model has the parameter estimates presented in [1]. Create a discrete-time Markov chain model that describes the regime switching mechanism. Label the regimes. P = [0.92 0.08; 0.26 0.74]; mc = dtmc(P,'StateNames',["Expansion" "Recession"]);
mc is a fully specified dtmc object. Create separate AR(0) models (constant only) for the two regimes. sigma = 3.34; % Homoscedastic models across states mdl1 = arima('Constant',4.62,'Variance',sigma^2); mdl2 = arima('Constant',-0.48,'Variance',sigma^2); mdl = [mdl1 mdl2];
Create the Markov-switching dynamic regression model that describes the behavior of the US GDP growth rate.
Mdl = msVAR(mc,mdl)
Mdl = 
  msVAR with properties:

      NumStates: 2
      NumSeries: 1
     StateNames: ["Expansion"    "Recession"]
    SeriesNames: "1"
         Switch: [1x1 dtmc]
      Submodels: [2x1 varm]
Mdl is a fully specified msVAR object.
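For intuition about the specified regimes, the long-run behavior implied by these parameter values can be computed with base MATLAB. This sketch is not part of the original example; piStat, expectedDuration, and longRunMean are illustrative names, and the calculation assumes that the chain has a unique stationary distribution.

```matlab
P = [0.92 0.08; 0.26 0.74];     % Transition matrix from the example
mu = [4.62; -0.48];             % State-specific constants of the AR(0) submodels

% Stationary distribution: left eigenvector of P for the eigenvalue 1
[V,D] = eig(P');
[~,idx] = min(abs(diag(D) - 1));
piStat = V(:,idx)'/sum(V(:,idx));       % Approximately [0.7647 0.2353]

expectedDuration = 1./(1 - diag(P))';   % [12.5 3.8462] periods per visit
longRunMean = piStat*mu;                % Approximately 3.42
```

Under these values, the chain spends roughly three quarters of the time in the Expansion state, and the probability-weighted average of the two state constants is about 3.42.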
Create Partially Specified Univariate Model for Estimation
Consider fitting to data a two-state Markov-switching model for a 1-D response process.
Create a discrete-time Markov chain model for the switching mechanism. Specify a 2-by-2 matrix of NaN values for the transition matrix. This setting indicates that you want to estimate all transition probabilities. Label the states.
P = NaN(2);
mc = dtmc(P,'StateNames',["Expansion" "Recession"])
mc = 
  dtmc with properties:

             P: [2x2 double]
    StateNames: ["Expansion"    "Recession"]
     NumStates: 2
mc.P
ans = 2×2
   NaN   NaN
   NaN   NaN
mc is a partially specified dtmc object. The transition matrix mc.P is completely unknown and estimable.
Create AR(1) and AR(2) models by using the shorthand syntax of arima. After you create each model, specify the model description by using dot notation.
mdl1 = arima(1,0,0);
mdl1.Description = "Expansion State"
mdl1 = 
  arima with properties:

     Description: "Expansion State"
    Distribution: Name = "Gaussian"
               P: 1
               D: 0
               Q: 0
        Constant: NaN
              AR: {NaN} at lag [1]
             SAR: {}
              MA: {}
             SMA: {}
     Seasonality: 0
            Beta: [1×0]
        Variance: NaN

   ARIMA(1,0,0) Model (Gaussian Distribution)
mdl2 = arima(2,0,0);
mdl2.Description = "Recession State"
mdl2 = 
  arima with properties:

     Description: "Recession State"
    Distribution: Name = "Gaussian"
               P: 2
               D: 0
               Q: 0
        Constant: NaN
              AR: {NaN NaN} at lags [1 2]
             SAR: {}
              MA: {}
             SMA: {}
     Seasonality: 0
            Beta: [1×0]
        Variance: NaN

   ARIMA(2,0,0) Model (Gaussian Distribution)
mdl1 and mdl2 are partially specified arima objects. NaN-valued properties correspond to unknown, estimable parameters. Store the submodels in a vector with order corresponding to the regimes in mc.StateNames. mdl = [mdl1; mdl2];
Create a Markov-switching model template from the switching mechanism mc and the state-specific submodels mdl.
Mdl = msVAR(mc,mdl)
Mdl = 
  msVAR with properties:

      NumStates: 2
      NumSeries: 1
     StateNames: ["Expansion"    "Recession"]
    SeriesNames: "1"
         Switch: [1x1 dtmc]
      Submodels: [2x1 varm]
Mdl is a partially specified msVAR object representing a univariate two-state Markov-switching dynamic regression model.
Mdl.Submodels(1)
ans = 
  varm with properties:

     Description: "1-Dimensional VAR(1) Model"
     SeriesNames: "Y1"
       NumSeries: 1
               P: 1
        Constant: NaN
              AR: {NaN} at lag [1]
           Trend: 0
            Beta: [1×0 matrix]
      Covariance: NaN
Mdl.Submodels(2)
ans = 
  varm with properties:

     Description: "1-Dimensional VAR(2) Model"
     SeriesNames: "Y1"
       NumSeries: 1
               P: 2
        Constant: NaN
              AR: {NaN NaN} at lags [1 2]
           Trend: 0
            Beta: [1×0 matrix]
      Covariance: NaN
msVAR converts the arima object submodels to 1-D varm object equivalents. Mdl is prepared for estimation. You can pass Mdl, along with data and a fully specified model containing initial values for optimization, to estimate.
Create Fully Specified Multivariate Model
Create a three-state Markov-switching dynamic regression model for a 2-D response process. Specify all parameter values (this example uses arbitrary values).
Create a three-state discrete-time Markov chain model that describes the regime switching mechanism.
P = [10 1 1; 1 10 1; 1 1 10];
mc = dtmc(P);
mc.P
ans = 3×3
    0.8333    0.0833    0.0833
    0.0833    0.8333    0.0833
    0.0833    0.0833    0.8333
mc is a dtmc object. dtmc normalizes P so that each row sums to 1.
For each regime, use varm to create a VAR model that describes the response process within the regime. Specify all parameter values.
% Constants (numSeries x 1 vectors)
C1 = [1;-1];
C2 = [2;-2];
C3 = [3;-3];
% Autoregression coefficients (numSeries x numSeries matrices)
AR1 = {};                            % 0 lags
AR2 = {[0.5 0.1; 0.5 0.5]};          % 1 lag
AR3 = {[0.25 0; 0 0] [0 0; 0.25 0]}; % 2 lags
% Innovations covariances (numSeries x numSeries matrices)
Sigma1 = [1 -0.1; -0.1 1];
Sigma2 = [2 -0.2; -0.2 2];
Sigma3 = [3 -0.3; -0.3 3];
% VAR Submodels
mdl1 = varm('Constant',C1,'AR',AR1,'Covariance',Sigma1);
mdl2 = varm('Constant',C2,'AR',AR2,'Covariance',Sigma2);
mdl3 = varm('Constant',C3,'AR',AR3,'Covariance',Sigma3);
mdl1, mdl2, and mdl3 are fully specified varm objects. Store the submodels in a vector with order corresponding to the regimes in mc.StateNames. mdl = [mdl1; mdl2; mdl3];
Use msVAR to create a Markov-switching dynamic regression model from the switching mechanism mc and the state-specific submodels mdl.
Mdl = msVAR(mc,mdl)
Mdl = 
  msVAR with properties:

      NumStates: 3
      NumSeries: 2
     StateNames: ["1"    "2"    "3"]
    SeriesNames: ["1"    "2"]
         Switch: [1x1 dtmc]
      Submodels: [3x1 varm]
Mdl.Submodels(1)
ans = 
  varm with properties:

     Description: "2-Dimensional VAR(0) Model"
     SeriesNames: "Y1"  "Y2"
       NumSeries: 2
               P: 0
        Constant: [1 -1]'
              AR: {}
           Trend: [2×1 vector of zeros]
            Beta: [2×0 matrix]
      Covariance: [2×2 matrix]
Mdl.Submodels(2)
ans = 
  varm with properties:

     Description: "AR-Stationary 2-Dimensional VAR(1) Model"
     SeriesNames: "Y1"  "Y2"
       NumSeries: 2
               P: 1
        Constant: [2 -2]'
              AR: {2×2 matrix} at lag [1]
           Trend: [2×1 vector of zeros]
            Beta: [2×0 matrix]
      Covariance: [2×2 matrix]
Mdl.Submodels(3)
ans = 
  varm with properties:

     Description: "AR-Stationary 2-Dimensional VAR(2) Model"
     SeriesNames: "Y1"  "Y2"
       NumSeries: 2
               P: 2
        Constant: [3 -3]'
              AR: {2×2 matrices} at lags [1 2]
           Trend: [2×1 vector of zeros]
            Beta: [2×0 matrix]
      Covariance: [2×2 matrix]
Mdl is a fully specified msVAR object representing a multivariate three-state Markov-switching dynamic regression model.
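The row normalization that dtmc applies to P in this example can be reproduced with base MATLAB arithmetic. The following is a minimal sketch for checking the displayed values, not the toolbox implementation; the variable name Pnorm is introduced here for illustration only.

```matlab
% Unnormalized transition weights from the example above
P = [10 1 1; 1 10 1; 1 1 10];

% Divide each row by its sum so that every row is a probability vector
% (implicit expansion requires R2016b or later)
Pnorm = P ./ sum(P,2);

disp(Pnorm)          % diagonal entries are 10/12, off-diagonal entries are 1/12
disp(sum(Pnorm,2))   % each row sums to 1
```

The diagonal value 10/12 rounds to the 0.8333 shown in the mc.P display.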
Create Fully Specified Model Containing Regression Component

Consider including regression components for exogenous variables in each submodel of the Markov-switching dynamic regression model in “Create Fully Specified Multivariate Model” on page 12-1687.

Create a three-state discrete-time Markov chain model that describes the regime switching mechanism.

    P = [10 1 1; 1 10 1; 1 1 10];
    mc = dtmc(P);

For each regime, use varm to create a VARX model that describes the response process within the regime. Specify all parameter values.

    % Constants (numSeries x 1 vectors)
    C1 = [1;-1];
    C2 = [2;-2];
    C3 = [3;-3];

    % Autoregression coefficients (numSeries x numSeries matrices)
    AR1 = {};                            % 0 lags
    AR2 = {[0.5 0.1; 0.5 0.5]};          % 1 lag
    AR3 = {[0.25 0; 0 0] [0 0; 0.25 0]}; % 2 lags

    % Regression coefficients (numSeries x numRegressors matrices)
    Beta1 = [1;-1];           % 1 regressor
    Beta2 = [2 2;-2 -2];      % 2 regressors
    Beta3 = [3 3 3;-3 -3 -3]; % 3 regressors

    % Innovations covariances (numSeries x numSeries matrices)
    Sigma1 = [1 -0.1; -0.1 1];
    Sigma2 = [2 -0.2; -0.2 2];
    Sigma3 = [3 -0.3; -0.3 3];

    % VARX Submodels
    mdl1 = varm('Constant',C1,'AR',AR1,'Beta',Beta1,...
        'Covariance',Sigma1);
    mdl2 = varm('Constant',C2,'AR',AR2,'Beta',Beta2,...
        'Covariance',Sigma2);
    mdl3 = varm('Constant',C3,'AR',AR3,'Beta',Beta3,...
        'Covariance',Sigma3);
mdl1, mdl2, and mdl3 are fully specified varm objects representing the state-specific submodels.

Store the submodels in a vector with order corresponding to the regimes in mc.StateNames.

    mdl = [mdl1; mdl2; mdl3];

Use msVAR to create a Markov-switching dynamic regression model from the switching mechanism mc and the state-specific submodels mdl.

    Mdl = msVAR(mc,mdl)

    Mdl = 
      msVAR with properties:

          NumStates: 3
          NumSeries: 2
         StateNames: ["1"    "2"    "3"]
        SeriesNames: ["1"    "2"]
             Switch: [1x1 dtmc]
          Submodels: [3x1 varm]

    Mdl.Submodels(1)

    ans = 
      varm with properties:

         Description: "2-Dimensional VARX(0) Model with 1 Predictor"
         SeriesNames: "Y1"  "Y2"
           NumSeries: 2
                   P: 0
            Constant: [1 -1]'
                  AR: {}
               Trend: [2×1 vector of zeros]
                Beta: [2×1 matrix]
          Covariance: [2×2 matrix]

    Mdl.Submodels(2)

    ans = 
      varm with properties:

         Description: "AR-Stationary 2-Dimensional VARX(1) Model with 2 Predictors"
         SeriesNames: "Y1"  "Y2"
           NumSeries: 2
                   P: 1
            Constant: [2 -2]'
                  AR: {2×2 matrix} at lag [1]
               Trend: [2×1 vector of zeros]
                Beta: [2×2 matrix]
          Covariance: [2×2 matrix]

    Mdl.Submodels(3)

    ans = 
      varm with properties:

         Description: "AR-Stationary 2-Dimensional VARX(2) Model with 3 Predictors"
         SeriesNames: "Y1"  "Y2"
           NumSeries: 2
                   P: 2
            Constant: [3 -3]'
                  AR: {2×2 matrices} at lags [1 2]
               Trend: [2×1 vector of zeros]
                Beta: [2×3 matrix]
          Covariance: [2×2 matrix]
Create Partially Specified Multivariate Model for Estimation

Consider fitting to data a three-state Markov-switching model for a 2-D response process.

Create a discrete-time Markov chain model for the switching mechanism. Specify a 3-by-3 matrix of NaN values for the transition matrix. This setting indicates that you want to estimate all transition probabilities.

    P = nan(3);
    mc = dtmc(P);

mc is a partially specified dtmc object. The transition matrix mc.P is completely unknown and estimable.

Create 2-D VAR(0), VAR(1), and VAR(2) models by using the shorthand syntax of varm. Store the models in a vector.

    mdl1 = varm(2,0);
    mdl2 = varm(2,1);
    mdl3 = varm(2,2);
    mdl = [mdl1 mdl2 mdl3];
    mdl(1)

    ans = 
      varm with properties:

         Description: "2-Dimensional VAR(0) Model"
         SeriesNames: "Y1"  "Y2"
           NumSeries: 2
                   P: 0
            Constant: [2×1 vector of NaNs]
                  AR: {}
               Trend: [2×1 vector of zeros]
                Beta: [2×0 matrix]
          Covariance: [2×2 matrix of NaNs]
mdl contains three state-specific varm model templates for estimation. NaN values in the properties indicate estimable parameters.

Create a Markov-switching model template from the switching mechanism mc and the state-specific submodels mdl.

    Mdl = msVAR(mc,mdl)

    Mdl = 
      msVAR with properties:

          NumStates: 3
          NumSeries: 2
         StateNames: ["1"    "2"    "3"]
        SeriesNames: ["1"    "2"]
             Switch: [1x1 dtmc]
          Submodels: [3x1 varm]

    Mdl.Submodels(1)

    ans = 
      varm with properties:

         Description: "2-Dimensional VAR(0) Model"
         SeriesNames: "Y1"  "Y2"
           NumSeries: 2
                   P: 0
            Constant: [2×1 vector of NaNs]
                  AR: {}
               Trend: [2×1 vector of zeros]
                Beta: [2×0 matrix]
          Covariance: [2×2 matrix of NaNs]

    Mdl.Submodels(2)

    ans = 
      varm with properties:

         Description: "2-Dimensional VAR(1) Model"
         SeriesNames: "Y1"  "Y2"
           NumSeries: 2
                   P: 1
            Constant: [2×1 vector of NaNs]
                  AR: {2×2 matrix of NaNs} at lag [1]
               Trend: [2×1 vector of zeros]
                Beta: [2×0 matrix]
          Covariance: [2×2 matrix of NaNs]

    Mdl.Submodels(3)

    ans = 
      varm with properties:

         Description: "2-Dimensional VAR(2) Model"
         SeriesNames: "Y1"  "Y2"
           NumSeries: 2
                   P: 2
            Constant: [2×1 vector of NaNs]
                  AR: {2×2 matrices of NaNs} at lags [1 2]
               Trend: [2×1 vector of zeros]
                Beta: [2×0 matrix]
          Covariance: [2×2 matrix of NaNs]
Mdl is a partially specified msVAR model for estimation.
Specify Model Regression Component for Estimation

Consider including regression components for exogenous variables in the submodels of the Markov-switching dynamic regression model in “Create Partially Specified Multivariate Model for Estimation” on page 12-1691. Assume that the VAR(0) model includes the regressor x1,t, the VAR(1) model includes the regressors x1,t and x2,t, and the VAR(2) model includes the regressors x1,t, x2,t, and x3,t.

Create the discrete-time Markov chain.

    P = nan(3);
    mc = dtmc(P);

Create 2-D VARX(0), VARX(1), and VARX(2) models by using the shorthand syntax of varm. For each model, set the Beta property to a numSeries-by-numRegressors matrix of NaN values by using dot notation. Store all models in a vector.

    numSeries = 2;
    mdl1 = varm(numSeries,0);
    mdl1.Beta = NaN(numSeries,1);
    mdl2 = varm(numSeries,1);
    mdl2.Beta = NaN(numSeries,2);
    mdl3 = varm(numSeries,2);
    mdl3.Beta = nan(numSeries,3);
    mdl = [mdl1; mdl2; mdl3];

Create a Markov-switching dynamic regression model from the switching mechanism mc and the state-specific submodels mdl.

    Mdl = msVAR(mc,mdl);
    Mdl.Submodels(2)

    ans = 
      varm with properties:

         Description: "2-Dimensional VARX(1) Model with 2 Predictors"
         SeriesNames: "Y1"  "Y2"
           NumSeries: 2
                   P: 1
            Constant: [2×1 vector of NaNs]
                  AR: {2×2 matrix of NaNs} at lag [1]
               Trend: [2×1 vector of zeros]
                Beta: [2×2 matrix of NaNs]
          Covariance: [2×2 matrix of NaNs]
Create Model Specifying Equality Constraints for Estimation

Consider the model in “Create Partially Specified Multivariate Model for Estimation” on page 12-1691. Suppose theory dictates that states do not persist.

Create a discrete-time Markov chain model for the switching mechanism. Specify a 3-by-3 matrix of NaN values for the transition matrix. Indicate that states do not persist by setting the diagonal elements of the matrix to 0.

    P = nan(3);
    P(logical(eye(3))) = 0;
    mc = dtmc(P);

mc is a partially specified dtmc object.

Create the submodels and store them in a vector.

    mdl1 = varm(2,0);
    mdl2 = varm(2,1);
    mdl3 = varm(2,2);
    submdl = [mdl1; mdl2; mdl3];

Create a Markov-switching dynamic regression model from the switching mechanism mc and the state-specific submodels submdl.

    Mdl = msVAR(mc,submdl);
    Mdl.Switch.P

    ans = 3×3

         0   NaN   NaN
       NaN     0   NaN
       NaN   NaN     0
estimate treats the known diagonal elements of the transition matrix as equality constraints during estimation. For more details, see estimate.
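To see which transition probabilities remain to be estimated under such constraints, the NaN pattern of the matrix can be inspected with base MATLAB. This is an illustrative sketch only; the names isFree, numUnknown, and numFreePerRow are introduced here and are not part of the toolbox.

```matlab
% Transition matrix template: zeros on the diagonal are equality constraints
P = nan(3);
P(logical(eye(3))) = 0;

% NaN entries are unknown and estimable; numeric entries are held fixed
isFree = isnan(P);
[fromState,toState] = find(isFree);   % row/column indices of unknown entries
numUnknown = nnz(isFree);             % number of NaN entries

% Because each row of a transition matrix sums to 1, a row with u unknown
% entries has u - 1 free parameters
numFreePerRow = max(sum(isFree,2) - 1, 0);

disp([fromState toState])
disp(numFreePerRow.')
```

With the diagonal fixed at 0, each row has two unknown entries that must sum to 1, so only one probability per row is free.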
More About

Markov-Switching Dynamic Regression Model

A Markov-switching dynamic regression model of a univariate or multivariate response series yt describes the dynamic behavior of the series in the presence of structural breaks or regime changes. A collection of state-specific dynamic regression submodels describes the dynamic behavior of yt within the regimes.

$$
y_t \sim
\begin{cases}
f_1(y_t;\, x_t, \theta_1), & s_t = 1 \\
f_2(y_t;\, x_t, \theta_2), & s_t = 2 \\
\quad\vdots & \quad\vdots \\
f_n(y_t;\, x_t, \theta_n), & s_t = n,
\end{cases}
$$

where:
• st is a discrete-time Markov chain representing the switching mechanism among regimes (Switch).
• n is the number of regimes (NumStates).
• fi(yt; xt, θi) is the regime i dynamic regression model of yt (Submodels(i)). Submodels are either univariate (ARX) or multivariate (VARX).
• xt is a vector of observed exogenous variables at time t.
• θi is the regime i collection of parameters of the dynamic regression model, such as AR coefficients and the innovation variances.

Hamilton [2] proposes a general model, known as Markov-switching autoregression (MSAR), allowing for lagged values of the switching state s. Hamilton [3] shows how to convert an MSAR model into a dynamic regression model with a higher-dimensional state space, supported by msVAR.
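The data-generating mechanism described above can be sketched in base MATLAB for a univariate, two-regime case. All numeric values and variable names below (P, c, phi, sigma, s, y) are hypothetical and chosen only for illustration; the sketch omits exogenous variables, and it is not how the toolbox simulates msVAR models.

```matlab
rng(1);                      % reproducible random draws

P     = [0.9 0.1; 0.2 0.8];  % hypothetical transition matrix (rows sum to 1)
c     = [1 -1];              % hypothetical regime-specific constants
phi   = [0.5 0];             % hypothetical regime-specific AR(1) coefficients
sigma = [1 2];               % hypothetical innovation standard deviations
T     = 200;                 % number of observations

s = zeros(T,1);              % regime path s_t
y = zeros(T,1);              % response path y_t
s(1) = 1;                    % start in regime 1
y(1) = c(1);                 % arbitrary starting value

for t = 2:T
    % Draw the next regime from the row of P for the current regime
    s(t) = find(rand <= cumsum(P(s(t-1),:)),1);
    % Apply the submodel f_i of the active regime i = s(t)
    y(t) = c(s(t)) + phi(s(t))*y(t-1) + sigma(s(t))*randn;
end
```

Each observation is produced by the submodel of whichever regime the Markov chain occupies at that time, which is the structure that Switch and Submodels encode.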
Version History Introduced in R2019b
References

[1] Chauvet, M., and J. D. Hamilton. "Dating Business Cycle Turning Points." In Nonlinear Analysis of Business Cycles (Contributions to Economic Analysis, Volume 276). (C. Milas, P. Rothman, and D. van Dijk, eds.). Amsterdam: Emerald Group Publishing Limited, 2006.
[2] Hamilton, J. D. "A New Approach to the Economic Analysis of Nonstationary Time Series and the Business Cycle." Econometrica. Vol. 57, 1989, pp. 357–384.
[3] Hamilton, J. D. "Analysis of Time Series Subject to Changes in Regime." Journal of Econometrics. Vol. 45, 1990, pp. 39–70.
[4] Hamilton, J. D. Time Series Analysis. Princeton, NJ: Princeton University Press, 1994.
[5] Krolzig, H.-M. Markov-Switching Vector Autoregressions. Berlin: Springer, 1997.
See Also dtmc | arima | varm | tsVAR
mtimes
Lag operator polynomial multiplication

Syntax

    C = mtimes(A, B, 'Tolerance',tolerance)
    C = A * B

Description

Given two lag operator polynomials A(L) and B(L), C = mtimes(A, B, 'Tolerance',tolerance) performs a polynomial multiplication C(L) = A(L) * B(L). If at least one of A or B is a lag operator polynomial object, the other can be a cell array of matrices (initial lag operator coefficients), or a single matrix (zero-degree lag operator). 'Tolerance' is the nonnegative scalar tolerance used to determine which coefficients are included in the result. The default tolerance is 1e-12. Specifying a tolerance greater than 0 allows the user to exclude polynomial lags with near-zero coefficients. A coefficient matrix of a given lag is excluded only if the magnitudes of all elements of the matrix are less than or equal to the specified tolerance.

C = A * B performs a polynomial multiplication C(L) = A(L) * B(L).

Examples

Multiply Two Lag Operator Polynomials

Create two LagOp polynomials and multiply them together:

    A = LagOp({1 -0.6 0.08});
    B = LagOp({1 -0.5});
    mtimes(A,B)

    ans = 
        1-D Lag Operator Polynomial:
        -----------------------------
            Coefficients: [1 -1.1 0.38 -0.04]
                    Lags: [0 1 2 3]
                  Degree: 3
               Dimension: 1
Tips The multiplication operator (*) invokes mtimes, but the optional coefficient tolerance is available only by calling mtimes directly.
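For 1-D polynomials, the product in the example above can be cross-checked with the base MATLAB function conv, because multiplying scalar lag polynomials is a convolution of their coefficient vectors. This is a sketch for verification only; the thresholding step merely imitates the idea behind 'Tolerance' and is not the toolbox code. The names a, b, c, tol, and keep are introduced here.

```matlab
a = [1 -0.6 0.08];   % coefficients of A(L) at lags 0, 1, 2
b = [1 -0.5];        % coefficients of B(L) at lags 0, 1

c = conv(a,b);       % coefficients of C(L) = A(L)*B(L) at lags 0 through 3
lags = 0:numel(c)-1;

% Imitate the tolerance rule: keep only lags whose coefficient magnitude
% exceeds the threshold
tol  = 1e-12;
keep = abs(c) > tol;

disp([lags(keep); c(keep)])
```

The resulting coefficients match the Coefficients line of the LagOp display, up to floating-point rounding.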
See Also mldivide | mrdivide
normalbvarm
normalbvarm Bayesian vector autoregression (VAR) model with normal conjugate prior and fixed covariance for data likelihood
Description

The Bayesian VAR model object normalbvarm (see “Bayesian VAR model” on page 12-1717) specifies the prior distribution of the array of model coefficients Λ in an m-D VAR(p) model, where the innovations covariance matrix Σ is known and fixed. The prior distribution of Λ is the normal conjugate prior model on page 12-1718.

In general, when you create a Bayesian VAR model object, it specifies the joint prior distribution and characteristics of the VARX model only. That is, the model object is a template intended for further use. Specifically, to incorporate data into the model for posterior distribution analysis, pass the model object and data to the appropriate object function (see “Object Functions” on page 12-1702).

Creation

Syntax

    PriorMdl = normalbvarm(numseries,numlags)
    PriorMdl = normalbvarm(numseries,numlags,Name,Value)

Description

To create a normalbvarm object, use either the normalbvarm function (described here) or the bayesvarm function. The syntaxes for each function are similar, but the options differ. bayesvarm enables you to set prior hyperparameter values for Minnesota prior [1] regularization easily, whereas normalbvarm requires the entire specification of prior distribution hyperparameters.

PriorMdl = normalbvarm(numseries,numlags) creates a numseries-D Bayesian VAR(numlags) model object PriorMdl, which specifies dimensionalities and prior assumptions for all model coefficients

$$
\lambda = \operatorname{vec}(\Lambda) = \operatorname{vec}\left(\begin{bmatrix}\Phi_1 & \Phi_2 & \cdots & \Phi_p & c & \delta & \mathrm{B}\end{bmatrix}'\right),
$$

where:
• numseries = m, the number of response time series variables.
• numlags = p, the AR polynomial order.
• The prior distribution of λ is the normal conjugate prior model on page 12-1718.
• The fixed innovations covariance Σ is the m-by-m identity matrix.

PriorMdl = normalbvarm(numseries,numlags,Name,Value) sets writable properties on page 12-1698 (except NumSeries and P) using name-value pair arguments. Enclose each property name in quotes. For example, normalbvarm(3,2,'Sigma',4*eye(3),'SeriesNames',["UnemploymentRate" "CPI" "FEDFUNDS"]) specifies the names of the three response variables in the Bayesian VAR(2) model, and fixes the innovations covariance matrix at 4*eye(3).
Input Arguments numseries — Number of time series m 1 (default) | positive integer Number of time series m, specified as a positive integer. numseries specifies the dimensionality of the multivariate response variable yt and innovation εt. numseries sets the NumSeries property. Data Types: double numlags — Number of lagged responses nonnegative integer Number of lagged responses in each equation of yt, specified as a nonnegative integer. The resulting model is a VAR(numlags) model; each lag has a numseries-by-numseries coefficient matrix. numlags sets the P property. Data Types: double
Properties You can set writable property values when you create the model object by using name-value pair argument syntax, or after you create the model object by using dot notation. For example, to create a 3-D Bayesian VAR(1) model and label the first through third response variables, and then include a linear time trend term, enter: PriorMdl = normalbvarm(3,1,'SeriesNames',["UnemploymentRate" "CPI" "FEDFUNDS"]); PriorMdl.IncludeTrend = true; Model Characteristics and Dimensionality
Description — Model description
string scalar | character vector

Model description, specified as a string scalar or character vector. The default value describes the model dimensionality, for example '2-Dimensional VAR(3) Model'.
Example: "Model 1"
Data Types: string | char

NumSeries — Number of time series m
positive integer

This property is read-only.
Number of time series m, specified as a positive integer. NumSeries specifies the dimensionality of the multivariate response variable yt and innovation εt.
Data Types: double

P — Multivariate autoregressive polynomial order
nonnegative integer

This property is read-only.
Multivariate autoregressive polynomial order, specified as a nonnegative integer. P is the maximum lag that has a nonzero coefficient matrix. P specifies the number of presample observations required to initialize the model.
Data Types: double

SeriesNames — Response series names
string vector | cell array of character vectors

Response series names, specified as a NumSeries length string vector. The default is ['Y1' 'Y2' ... 'YNumSeries']. normalbvarm stores SeriesNames as a string vector.
Example: ["UnemploymentRate" "CPI" "FEDFUNDS"]
Data Types: string

IncludeConstant — Flag for including model constant c
true (default) | false

Flag for including a model constant c, specified as a value in this table.

    Value    Description
    false    Response equations do not include a model constant.
    true     All response equations contain a model constant.

Data Types: logical

IncludeTrend — Flag for including linear time trend term δt
false (default) | true

Flag for including a linear time trend term δt, specified as a value in this table.

    Value    Description
    false    Response equations do not include a linear time trend term.
    true     All response equations contain a linear time trend term.

Data Types: logical

NumPredictors — Number of exogenous predictor variables in model regression component
0 (default) | nonnegative integer

Number of exogenous predictor variables in the model regression component, specified as a nonnegative integer. normalbvarm includes all predictor variables symmetrically in each response equation.

Distribution Hyperparameters
Mu — Mean of multivariate normal prior on λ
zeros(NumSeries*(NumSeries*P + IncludeConstant + IncludeTrend + NumPredictors),1) (default) | numeric vector

Mean of the multivariate normal prior on λ, specified as a NumSeries*k-by-1 numeric vector, where k = NumSeries*P + IncludeConstant + IncludeTrend + NumPredictors (the number of coefficients in a response equation).

Mu(1:k) corresponds to all coefficients in the equation of response variable SeriesNames(1), Mu((k + 1):(2*k)) corresponds to all coefficients in the equation of response variable SeriesNames(2), and so on. For a set of indices corresponding to an equation:
• Elements 1 through NumSeries correspond to the lag 1 AR coefficients of the response variables ordered by SeriesNames.
• Elements NumSeries + 1 through 2*NumSeries correspond to the lag 2 AR coefficients of the response variables ordered by SeriesNames.
• In general, elements (q – 1)*NumSeries + 1 through q*NumSeries correspond to the lag q AR coefficients of the response variables ordered by SeriesNames.
• If IncludeConstant is true, element NumSeries*P + 1 is the model constant.
• If IncludeTrend is true, the next element (NumSeries*P + 2 when the model also includes a constant) is the linear time trend coefficient.
• If NumPredictors > 0, the remaining elements through k (NumSeries*P + 3 through k when the model includes both a constant and a trend) constitute the vector of regression coefficients of the exogenous variables.

This figure shows the structure of the transpose of Mu for a 2-D VAR(3) model that contains a constant vector and four exogenous predictors:

$$
\Big[\underbrace{\phi_{1,11}\ \phi_{1,12}\ \phi_{2,11}\ \phi_{2,12}\ \phi_{3,11}\ \phi_{3,12}\ c_1\ \beta_{11}\ \beta_{12}\ \beta_{13}\ \beta_{14}}_{y_{1,t}}\ \ \underbrace{\phi_{1,21}\ \phi_{1,22}\ \phi_{2,21}\ \phi_{2,22}\ \phi_{3,21}\ \phi_{3,22}\ c_2\ \beta_{21}\ \beta_{22}\ \beta_{23}\ \beta_{24}}_{y_{2,t}}\Big],
$$

where
• ϕq,jk is element (j,k) of the lag q AR coefficient matrix.
• cj is the model constant in the equation of response variable j.
• βju is the regression coefficient of the exogenous variable u in the equation of response variable j.

Tip
bayesvarm enables you to specify Mu easily by using the Minnesota regularization method. To specify Mu directly:
1 Set separate variables for the prior mean of each coefficient matrix and vector.
2 Horizontally concatenate all coefficient means in this order: Coeff = [Φ1 Φ2 ⋯ Φp c δ Β].
3 Vectorize the transpose of the coefficient mean matrix.

    Mu = Coeff.';
    Mu = Mu(:);
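The three steps of the tip can be traced on a small case. The sketch below builds Mu for a 2-D VAR(1) model with a constant and no trend or predictors; the prior mean values in Phi1 and c are hypothetical and chosen only so that the ordering is easy to see.

```matlab
% Step 1: prior means of each coefficient block (hypothetical values)
Phi1 = [0.5 0; 0 0.5];   % lag 1 AR coefficient matrix (2-by-2)
c    = [0; 0];           % model constant (2-by-1)

% Step 2: horizontally concatenate in the order [Phi1 ... Phip c delta Beta]
Coeff = [Phi1 c];        % 2-by-3; row j holds the coefficients of equation j

% Step 3: vectorize the transpose so that equation 1 comes first
Mu = Coeff.';
Mu = Mu(:);              % 6-by-1

disp(Mu.')               % [phi_{1,11} phi_{1,12} c_1 phi_{1,21} phi_{1,22} c_2]

% With Econometrics Toolbox, the vector can then be supplied as
% PriorMdl = normalbvarm(2,1,'Mu',Mu);
```

Transposing before vectorizing matters: Coeff(:) alone would stack the coefficients column by column, mixing the two equations.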
Data Types: double

V — Conditional covariance matrix of multivariate normal prior on λ
eye(NumSeries*(NumSeries*P + IncludeConstant + IncludeTrend + NumPredictors)) (default) | symmetric, positive definite numeric matrix

Conditional covariance matrix of multivariate normal prior on λ, specified as a NumSeries*k-by-NumSeries*k symmetric, positive definite matrix, where k = NumSeries*P + IncludeConstant + IncludeTrend + NumPredictors (the number of coefficients in a response equation).

Row and column indices correspond to the model coefficients in the same way as Mu. For example, consider a 3-D VAR(2) model containing a constant and four exogenous variables.
• V(1,1) is Var(ϕ1,11).
• V(5,6) is Cov(ϕ2,12,ϕ2,13).
• V(8,9) is Cov(β11,β12).

Tip
bayesvarm enables you to create any Bayesian VAR prior model and specify V easily by using the Minnesota regularization method.

Data Types: double

Sigma — Fixed innovations covariance matrix Σ
eye(NumSeries) (default) | positive definite numeric matrix

Fixed innovations covariance matrix Σ of the NumSeries innovations at each time t = 1,...,T, specified as a NumSeries-by-NumSeries positive definite numeric matrix. Rows and columns correspond to innovations in the equations of the response variables ordered by SeriesNames.
Sigma sets the Covariance property.
Data Types: double

VAR Model Parameters Derived from Distribution Hyperparameters
AR — Distribution mean of autoregressive coefficient matrices Φ1,…,Φp
cell vector of numeric matrices

This property is read-only.
Distribution mean of the autoregressive coefficient matrices Φ1,…,Φp associated with the lagged responses, specified as a P-D cell vector of NumSeries-by-NumSeries numeric matrices.
AR{j} is Φj, the coefficient matrix of lag j. Rows correspond to equations and columns correspond to lagged response variables; SeriesNames determines the order of response variables and equations. Coefficient signs are those of the VAR model expressed in difference-equation notation.
If P = 0, AR is an empty cell. Otherwise, AR is the collection of AR coefficient means extracted from Mu.
Data Types: cell

Constant — Distribution mean of model constant c
numeric vector

This property is read-only.
Distribution mean of the model constant c (or intercept), specified as a NumSeries-by-1 numeric vector. Constant(j) is the constant in equation j; SeriesNames determines the order of equations.
If IncludeConstant = false, Constant is an empty array. Otherwise, Constant is the model constant vector mean extracted from Mu.
Data Types: double

Trend — Distribution mean of linear time trend δ
numeric vector

This property is read-only.
Distribution mean of the linear time trend δ, specified as a NumSeries-by-1 numeric vector. Trend(j) is the linear time trend in equation j; SeriesNames determines the order of equations.
If IncludeTrend = false (the default), Trend is an empty array. Otherwise, Trend is the linear time trend coefficient mean extracted from Mu.
Data Types: double

Beta — Distribution mean of regression coefficient matrix Β
numeric matrix

This property is read-only.
Distribution mean of the regression coefficient matrix Β associated with the exogenous predictor variables, specified as a NumSeries-by-NumPredictors numeric matrix.
Beta(j,:) contains the regression coefficients of each predictor in the equation of response variable j (yj,t). Beta(:,k) contains the regression coefficient in each equation of predictor xk. By default, all predictor variables are in the regression component of all response equations. You can down-weight a predictor from an equation by specifying, for the corresponding coefficient, a prior mean of 0 in Mu and a small variance in V.
When you create a model, the predictor variables are hypothetical. You specify predictor data when you operate on the model (for example, when you estimate the posterior by using estimate). Columns of the predictor data determine the order of the columns of Beta.
Data Types: double

Covariance — Fixed innovations covariance matrix Σ
positive definite numeric matrix

This property is read-only.
Fixed innovations covariance matrix Σ of the NumSeries innovations at each time t = 1,...,T, specified as a NumSeries-by-NumSeries symmetric, positive definite numeric matrix. Rows and columns correspond to innovations in the equations of the response variables ordered by SeriesNames.
The Sigma property sets Covariance.
Data Types: double
Object Functions

    estimate     Estimate posterior distribution of Bayesian vector autoregression (VAR) model parameters
    forecast     Forecast responses from Bayesian vector autoregression (VAR) model
    simsmooth    Simulation smoother of Bayesian vector autoregression (VAR) model
    simulate     Simulate coefficients and innovations covariance matrix of Bayesian vector autoregression (VAR) model
    summarize    Distribution summary statistics of Bayesian vector autoregression (VAR) model
Examples

Create Normal Conjugate Prior Model

Consider the 3-D VAR(4) model for the US inflation (INFL), unemployment (UNRATE), and federal funds (FEDFUNDS) rates.

$$
\begin{bmatrix} \mathrm{INFL}_t \\ \mathrm{UNRATE}_t \\ \mathrm{FEDFUNDS}_t \end{bmatrix}
= c + \sum_{j=1}^{4} \Phi_j
\begin{bmatrix} \mathrm{INFL}_{t-j} \\ \mathrm{UNRATE}_{t-j} \\ \mathrm{FEDFUNDS}_{t-j} \end{bmatrix}
+ \begin{bmatrix} \varepsilon_{1,t} \\ \varepsilon_{2,t} \\ \varepsilon_{3,t} \end{bmatrix}.
$$

For all t, εt is a series of independent 3-D normal innovations with a mean of 0 and fixed covariance Σ = I, the 3-D identity matrix. Assume that the prior distribution is $\operatorname{vec}\left([\Phi_1, \ldots, \Phi_4, c]'\right) \sim N_{39}(\mu, V)$, where μ is a 39-by-1 vector of means and V is the 39-by-39 covariance matrix.

Create a normal conjugate prior model for the 3-D VAR(4) model parameters.

    numseries = 3;
    numlags = 4;
    PriorMdl = normalbvarm(numseries,numlags)

    PriorMdl = 
      normalbvarm with properties:

            Description: "3-Dimensional VAR(4) Model"
              NumSeries: 3
                      P: 4
            SeriesNames: ["Y1"    "Y2"    "Y3"]
        IncludeConstant: 1
           IncludeTrend: 0
          NumPredictors: 0
                     Mu: [39x1 double]
                      V: [39x39 double]
                  Sigma: [3x3 double]
                     AR: {[3x3 double]  [3x3 double]  [3x3 double]  [3x3 double]}
               Constant: [3x1 double]
                  Trend: [3x0 double]
                   Beta: [3x0 double]
             Covariance: [3x3 double]

PriorMdl is a normalbvarm Bayesian VAR model object representing the prior distribution of the coefficients of the 3-D VAR(4) model. The command line display shows properties of the model. You can display properties by using dot notation.

Display the prior mean matrices of the four AR coefficients by setting each matrix in the cell to a variable.
    AR1 = PriorMdl.AR{1}

    AR1 = 3×3

         0     0     0
         0     0     0
         0     0     0

    AR2 = PriorMdl.AR{2}

    AR2 = 3×3

         0     0     0
         0     0     0
         0     0     0

    AR3 = PriorMdl.AR{3}

    AR3 = 3×3

         0     0     0
         0     0     0
         0     0     0

    AR4 = PriorMdl.AR{4}

    AR4 = 3×3

         0     0     0
         0     0     0
         0     0     0

normalbvarm centers all AR coefficients at 0 by default. The AR property is read-only, but it is derived from the writable property Mu.

Display the fixed innovations covariance Σ.

    PriorMdl.Covariance

    ans = 3×3

         1     0     0
         0     1     0
         0     0     1

Covariance is a read-only property. To set the value of Σ, use the 'Sigma' name-value pair argument or specify the Sigma property by using dot notation. For example:

    PriorMdl.Sigma = 4*eye(PriorMdl.NumSeries);
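The sizes 39-by-1 for Mu and 39-by-39 for V in this example follow from the coefficient count k per equation given in the property descriptions. The arithmetic can be checked with a short base MATLAB sketch; the variable names below are introduced here for illustration and are not toolbox properties.

```matlab
numseries = 3;             % m, number of response series
numlags   = 4;             % p, number of lags

includeConstant = true;    % default for normalbvarm
includeTrend    = false;   % default for normalbvarm
numPredictors   = 0;       % default for normalbvarm

% Coefficients per response equation: AR terms, constant, trend, predictors
k = numseries*numlags + includeConstant + includeTrend + numPredictors;

% Total number of coefficients across all equations (length of Mu)
numCoeffs = numseries*k;

fprintf('k = %d, numel(Mu) = %d, size(V) = %d-by-%d\n', ...
    k,numCoeffs,numCoeffs,numCoeffs)
```

Here k is 13 (twelve AR coefficients plus the constant), so the three equations together contain 39 coefficients.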
Specify Innovations Covariance Matrix

Consider the 3-D VAR(4) model of “Create Normal Conjugate Prior Model” on page 12-1703. Suppose econometric theory dictates that

$$
\Sigma = \begin{bmatrix} 10^{-5} & 0 & 10^{-4} \\ 0 & 0.1 & -0.2 \\ 10^{-4} & -0.2 & 1.6 \end{bmatrix}.
$$
Create a normal conjugate prior model for the VAR model coefficients. Specify the value of Σ.

    numseries = 3;
    numlags = 4;
    Sigma = [10e-5 0 10e-4; 0 0.1 -0.2; 10e-4 -0.2 1.6];
    PriorMdl = normalbvarm(numseries,numlags,'Sigma',Sigma)

    PriorMdl = 
      normalbvarm with properties:

            Description: "3-Dimensional VAR(4) Model"
              NumSeries: 3
                      P: 4
            SeriesNames: ["Y1"    "Y2"    "Y3"]
        IncludeConstant: 1
           IncludeTrend: 0
          NumPredictors: 0
                     Mu: [39x1 double]
                      V: [39x39 double]
                  Sigma: [3x3 double]
                     AR: {[3x3 double]  [3x3 double]  [3x3 double]  [3x3 double]}
               Constant: [3x1 double]
                  Trend: [3x0 double]
                   Beta: [3x0 double]
             Covariance: [3x3 double]

In MATLAB, the literals 10e-5 and 10e-4 evaluate to 1e-4 and 1e-3, respectively, which are the values that appear in the Sigma and Covariance displays. To specify entries of exactly 10^-5 and 10^-4, write 1e-5 and 1e-4 instead.
Because Σ is fixed for normalbvarm prior models, PriorMdl.Sigma and PriorMdl.Covariance are equal.

    PriorMdl.Sigma

    ans = 3×3

        0.0001         0    0.0010
             0    0.1000   -0.2000
        0.0010   -0.2000    1.6000

    PriorMdl.Covariance

    ans = 3×3

        0.0001         0    0.0010
             0    0.1000   -0.2000
        0.0010   -0.2000    1.6000
Create Bayesian AR(2) Model with Normal Conjugate Coefficient Prior

Consider a 1-D Bayesian AR(2) model for the daily NASDAQ returns from January 2, 1990 through December 31, 2001.

$$
y_t = c + \phi_1 y_{t-1} + \phi_2 y_{t-2} + \varepsilon_t.
$$

The coefficient prior distribution is $[\phi_1\ \phi_2\ c]' \mid \sigma^2 \sim N_3(\mu, V)$, where μ is a 3-by-1 vector of coefficient means and V is a 3-by-3 covariance matrix. Assume Var(εt) is 2.

Create a normal conjugate prior model for the AR(2) model parameters.

    numseries = 1;
    numlags = 2;
    PriorMdl = normalbvarm(numseries,numlags,'Sigma',2)

    PriorMdl = 
      normalbvarm with properties:

            Description: "1-Dimensional VAR(2) Model"
              NumSeries: 1
                      P: 2
            SeriesNames: "Y1"
        IncludeConstant: 1
           IncludeTrend: 0
          NumPredictors: 0
                     Mu: [3x1 double]
                      V: [3x3 double]
                  Sigma: 2
                     AR: {[0]  [0]}
               Constant: 0
                  Trend: [1x0 double]
                   Beta: [1x0 double]
             Covariance: 2
Specify High Tightness for Lags and Response Names

In the 3-D VAR(4) model of “Create Normal Conjugate Prior Model” on page 12-1703, consider excluding lags 2 and 3 from the model. You cannot exclude coefficient matrices from models, but you can specify high prior tightness on zero for coefficients that you want to exclude.

Create a normal conjugate prior model for the 3-D VAR(4) model parameters. Specify response variable names. By default, AR coefficient prior means are zero. Specify high tightness values for lags 2 and 3 by setting their prior variances to 1e-6. Leave all other coefficient tightness values at their defaults:
• 1 for AR coefficient variances
• 1e3 for constant vector variances
• 0 for all coefficient covariances

    numseries = 3;
    numlags = 4;
    seriesnames = ["INFL"; "UNRATE"; "FEDFUNDS"];
    vPhi1 = ones(numseries,numseries);
    vPhi2 = 1e-6*ones(numseries,numseries);
    vPhi3 = 1e-6*ones(numseries,numseries);
    vPhi4 = ones(numseries,numseries);
    vc = 1e3*ones(3,1);
    Vmat = [vPhi1 vPhi2 vPhi3 vPhi4 vc]';
    V = diag(Vmat(:));
    PriorMdl = normalbvarm(numseries,numlags,'SeriesNames',seriesnames,...
        'V',V)

    PriorMdl = 
      normalbvarm with properties:

            Description: "3-Dimensional VAR(4) Model"
              NumSeries: 3
                      P: 4
            SeriesNames: ["INFL"    "UNRATE"    "FEDFUNDS"]
        IncludeConstant: 1
           IncludeTrend: 0
          NumPredictors: 0
                     Mu: [39x1 double]
                      V: [39x39 double]
                  Sigma: [3x3 double]
                     AR: {[3x3 double]  [3x3 double]  [3x3 double]  [3x3 double]}
               Constant: [3x1 double]
                  Trend: [3x0 double]
                   Beta: [3x0 double]
             Covariance: [3x3 double]
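To confirm that the tight variances land on the lag 2 and lag 3 coefficients, the diagonal of V can be inspected one equation at a time with base MATLAB. The sketch below repeats the variance construction from this example; the names v, k, eq1, and tightIdx are introduced here for illustration.

```matlab
numseries = 3;
numlags   = 4;

% Prior variances per coefficient block, as in the example above
vPhi1 = ones(numseries);
vPhi2 = 1e-6*ones(numseries);
vPhi3 = 1e-6*ones(numseries);
vPhi4 = ones(numseries);
vc    = 1e3*ones(numseries,1);

% After the transpose, column j of Vmat holds the variances of equation j
Vmat = [vPhi1 vPhi2 vPhi3 vPhi4 vc]';
v    = Vmat(:);                  % diagonal of V, 39-by-1

k   = numseries*numlags + 1;     % coefficients per equation (13)
eq1 = v(1:k);                    % variances for the first equation

% Positions within the equation that received the tight prior variance
tightIdx = find(eq1 < 1e-3).';
disp(tightIdx)
```

Within each equation, positions 4 through 9 are the lag 2 and lag 3 AR coefficients, and position 13 is the constant with variance 1e3.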
Set Prior Hyperparameters for Minnesota Regularization

normalbvarm options enable you to specify coefficient prior hyperparameter values directly, but bayesvarm options are well suited for tuning hyperparameters following the Minnesota regularization method.

Consider the 3-D VAR(4) model of “Create Normal Conjugate Prior Model” on page 12-1703. The model contains 39 coefficients. For coefficient sparsity, create a normal conjugate Bayesian VAR model by using bayesvarm. Specify the following, a priori:
• Each response is an AR(1) model, on average, with lag 1 coefficient 0.75.
• Prior self-lag coefficients have variance 100. This large variance setting allows the data to influence the posterior more than the prior.
• Prior cross-lag coefficients have variance 1. This small variance setting tightens the cross-lag coefficients to zero during estimation.
• Prior coefficient covariances decay with increasing lag at a rate of 2 (that is, lower lags are more important than higher lags).
• The innovations covariance Σ = I.

    numseries = 3;
    numlags = 4;
    seriesnames = ["INFL"; "UNRATE"; "FEDFUNDS"];
    Sigma = eye(numseries);
    PriorMdl = bayesvarm(numseries,numlags,'ModelType','normal','Sigma',Sigma,...
        'Center',0.75,'SelfLag',100,'CrossLag',1,'Decay',2,'SeriesNames',seriesnames)

    PriorMdl = 
      normalbvarm with properties:

            Description: "3-Dimensional VAR(4) Model"
              NumSeries: 3
                      P: 4
            SeriesNames: ["INFL"    "UNRATE"    "FEDFUNDS"]
        IncludeConstant: 1
           IncludeTrend: 0
          NumPredictors: 0
                     Mu: [39x1 double]
                      V: [39x39 double]
                  Sigma: [3x3 double]
                     AR: {[3x3 double]  [3x3 double]  [3x3 double]  [3x3 double]}
               Constant: [3x1 double]
                  Trend: [3x0 double]
                   Beta: [3x0 double]
             Covariance: [3x3 double]

Display all prior coefficient means.

    Phi1 = PriorMdl.AR{1}

    Phi1 = 3×3

        0.7500         0         0
             0    0.7500         0
             0         0    0.7500

    Phi2 = PriorMdl.AR{2}

    Phi2 = 3×3

         0     0     0
         0     0     0
         0     0     0

    Phi3 = PriorMdl.AR{3}

    Phi3 = 3×3

         0     0     0
         0     0     0
         0     0     0

    Phi4 = PriorMdl.AR{4}

    Phi4 = 3×3

         0     0     0
         0     0     0
         0     0     0
Display a heatmap of the prior coefficient covariances for each response equation.
    % Number of exogenous coefficients per equation
    numexocoeffseqn = PriorMdl.IncludeConstant + ...
        PriorMdl.IncludeTrend + PriorMdl.NumPredictors;
    % Total number of coefficients per equation
    numcoeffseqn = PriorMdl.NumSeries*PriorMdl.P + numexocoeffseqn;
    arcoeffnames = strings(numseries,numlags,numseries);
    for j = 1:numseries             % Equations
        for r = 1:numlags
            for k = 1:numseries     % Response Variables
                arcoeffnames(k,r,j) = "\phi_{"+r+","+j+k+"}";
            end
        end
        arcoeffseqn = arcoeffnames(:,:,j);
        % Indices of the AR coefficients of equation j within V
        idx = ((j-1)*numcoeffseqn + 1):(numcoeffseqn*j) - numexocoeffseqn;
        Veqn = PriorMdl.V(idx,idx);
        figure
        heatmap(arcoeffseqn(:),arcoeffseqn(:),Veqn);
        title(sprintf('Equation of %s',seriesnames(j)))
    end
Work with Prior and Posterior Distributions

Consider the 3-D VAR(4) model of “Create Normal Conjugate Prior Model” on page 12-1703. Estimate the posterior distribution, and generate forecasts from the corresponding posterior predictive distribution.

Load and Preprocess Data

Load the US macroeconomic data set. Compute the inflation rate. Plot all response series.

load Data_USEconModel
seriesnames = ["INFL" "UNRATE" "FEDFUNDS"];
DataTimeTable.INFL = 100*[NaN; price2ret(DataTimeTable.CPIAUCSL)];
figure
plot(DataTimeTable.Time,DataTimeTable{:,seriesnames})
legend(seriesnames)

Stabilize the unemployment and federal funds rates by applying the first difference to each series.

DataTimeTable.DUNRATE = [NaN; diff(DataTimeTable.UNRATE)];
DataTimeTable.DFEDFUNDS = [NaN; diff(DataTimeTable.FEDFUNDS)];
seriesnames(2:3) = "D" + seriesnames(2:3);

Remove all missing values from the data.

rmDataTimeTable = rmmissing(DataTimeTable);

Create Prior Model

Create a normal conjugate Bayesian VAR(4) prior model for the three response series. Specify the response variable names. Assume that the innovations covariance is the identity matrix.

numseries = numel(seriesnames);
numlags = 4;
PriorMdl = normalbvarm(numseries,numlags,'SeriesNames',seriesnames);
Estimate Posterior Distribution

Estimate the posterior distribution by passing the prior model and entire data series to estimate.

PosteriorMdl = estimate(PriorMdl,rmDataTimeTable{:,seriesnames},'Display','equation');

Bayesian VAR under normal priors and fixed Sigma
Effective Sample Size:          197
Number of equations:            3
Number of estimated Parameters: 39

           VAR Equations
           | INFL(-1)  DUNRATE(-1)  DFEDFUNDS(-1)  INFL(-2)  DUNRATE(-2)  DFEDFUNDS(-2)  INFL(-3)
 -------------------------------------------------------------------------------------------------
 INFL      |  0.1260    -0.4400       0.1049        0.3176    -0.0545       0.0440        0.4173
           | (0.1367)   (0.2673)     (0.0700)      (0.1551)   (0.2854)     (0.0739)      (0.1536)
 DUNRATE   | -0.0236     0.4440       0.0350        0.0900     0.2295       0.0520       -0.0330
           | (0.1367)   (0.2673)     (0.0700)      (0.1551)   (0.2854)     (0.0739)      (0.1536)
 DFEDFUNDS | -0.1514    -1.3408      -0.2762        0.3275    -0.2971      -0.3041        0.2609
           | (0.1367)   (0.2673)     (0.0700)      (0.1551)   (0.2854)     (0.0739)      (0.1536)

       Innovations Covariance Matrix
           |  INFL  DUNRATE  DFEDFUNDS
 --------------------------------------
 INFL      |   1       0         0
           |  (0)     (0)       (0)
 DUNRATE   |   0       1         0
           |  (0)     (0)       (0)
 DFEDFUNDS |   0       0         1
           |  (0)     (0)       (0)
Because the prior is conjugate for the data likelihood, the posterior is analytically tractable. By default, estimate uses the first four observations as a presample to initialize the model.

Generate Forecasts from Posterior Predictive Distribution

From the posterior predictive distribution, generate forecasts over a two-year horizon. Because sampling from the posterior predictive distribution requires the entire data set, specify the prior model in forecast instead of the posterior.

fh = 8;
rng(1); % For reproducibility
FY = forecast(PriorMdl,fh,rmDataTimeTable{:,seriesnames});

FY is an 8-by-3 matrix of forecasts. Plot the end of the data set and the forecasts.

fp = rmDataTimeTable.Time(end) + calquarters(1:fh);
figure
plotdata = [rmDataTimeTable{end - 10:end,seriesnames}; FY];
plot([rmDataTimeTable.Time(end - 10:end); fp'],plotdata)
hold on
plot([fp(1) fp(1)],ylim,'k-.')
legend(seriesnames)
title('Data and Forecasts')
hold off

Compute Impulse Responses

Plot impulse response functions by passing the posterior estimates to armairf.

armairf(PosteriorMdl.AR,[],'InnovCov',PosteriorMdl.Covariance)
More About

Bayesian Vector Autoregression (VAR) Model

A Bayesian VAR model treats all coefficients and the innovations covariance matrix as random variables in the m-dimensional, stationary VARX(p) model. The model has one of the three forms described in this table.

Model                                                  Equation
Reduced-form VAR(p) in difference-equation notation    $y_t = \Phi_1 y_{t-1} + \dots + \Phi_p y_{t-p} + c + \delta t + \mathrm{B} x_t + \varepsilon_t$
Multivariate regression                                $y_t = Z_t \lambda + \varepsilon_t$
Matrix regression                                      $y_t = \Lambda' z_t' + \varepsilon_t$
For each time t = 1,...,T:
• $y_t$ is the m-dimensional observed response vector, where m = numseries.
• $\Phi_1,\ldots,\Phi_p$ are the m-by-m AR coefficient matrices of lags 1 through p, where p = numlags.
• c is the m-by-1 vector of model constants if IncludeConstant is true.
• δ is the m-by-1 vector of linear time trend coefficients if IncludeTrend is true.
• Β is the m-by-r matrix of regression coefficients of the r-by-1 vector of observed exogenous predictors $x_t$, where r = NumPredictors. All predictor variables appear in each equation.
• $z_t = [\,y_{t-1}' \;\; y_{t-2}' \;\; \cdots \;\; y_{t-p}' \;\; 1 \;\; t \;\; x_t'\,]$, which is a 1-by-(mp + r + 2) vector, and $Z_t$ is the m-by-m(mp + r + 2) block diagonal matrix

$$Z_t = \begin{bmatrix} z_t & 0_z & \cdots & 0_z \\ 0_z & z_t & \cdots & 0_z \\ \vdots & \vdots & \ddots & \vdots \\ 0_z & 0_z & \cdots & z_t \end{bmatrix},$$

where $0_z$ is a 1-by-(mp + r + 2) vector of zeros.
• $\Lambda = [\,\Phi_1 \;\; \Phi_2 \;\; \cdots \;\; \Phi_p \;\; c \;\; \delta \;\; \mathrm{B}\,]'$, which is an (mp + r + 2)-by-m random matrix of the coefficients, and the m(mp + r + 2)-by-1 vector $\lambda = \mathrm{vec}(\Lambda)$.
• $\varepsilon_t$ is an m-by-1 vector of random, serially uncorrelated, multivariate normal innovations with the zero vector for the mean and the m-by-m matrix Σ for the covariance. This assumption implies that the data likelihood is

$$\ell(\Lambda,\Sigma \mid y,x) = \prod_{t=1}^{T} f(y_t;\Lambda,\Sigma,z_t),$$

where f is the m-dimensional multivariate normal density with mean $z_t\Lambda$ and covariance Σ, evaluated at $y_t$.

Before considering the data, you impose a joint prior distribution assumption on (Λ,Σ), which is governed by the distribution π(Λ,Σ). In a Bayesian analysis, the distribution of the parameters is updated with information about the parameters obtained from the data likelihood. The result is the joint posterior distribution π(Λ,Σ|Y,X,Y0), where:
• Y is a T-by-m matrix containing the entire response series {$y_t$}, t = 1,…,T.
• X is a T-by-r matrix containing the entire exogenous series {$x_t$}, t = 1,…,T.
• Y0 is a p-by-m matrix of presample data used to initialize the VAR model for estimation.

Normal Conjugate Prior Model

The normal conjugate prior model, outlined in [1], is an m-D Bayesian VAR model on page 12-1717 in which the innovations covariance matrix Σ is known and fixed, while the coefficient vector λ = vec(Λ) has the prior distribution

$$\lambda \sim N_{m(mp + r + 1_c + 1_\delta)}(\mu, V),$$

where:
• r = NumPredictors.
• $1_c$ is 1 if IncludeConstant is true, and 0 otherwise.
• $1_\delta$ is 1 if IncludeTrend is true, and 0 otherwise.

The posterior distribution is proper and analytically tractable:

$$\lambda \mid y_t, x_t \sim N_{m(mp + r + 1_c + 1_\delta)}(\bar{\mu}, \bar{V}),$$

where:
• $\bar{\mu} = \left[ V^{-1} + \sum_{t=1}^{T} Z_t'\Sigma^{-1}Z_t \right]^{-1} \left[ V^{-1}\mu + \sum_{t=1}^{T} Z_t'\Sigma^{-1}y_t \right].$
• $\bar{V} = \left[ V^{-1} + \sum_{t=1}^{T} Z_t'\Sigma^{-1}Z_t \right]^{-1}.$
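The posterior update can be checked numerically with a few lines of base MATLAB. The following is a minimal sketch, not the toolbox implementation: it assumes a hypothetical 2-D VAR(1) with a constant, no trend, and no predictors, so that each row vector z_t is [y_{t-1}' 1]. The true coefficients, the prior settings, and all variable names are illustrative assumptions.

```matlab
% Sketch: posterior mean and covariance of lambda for a hypothetical VAR(1)
rng(1);                              % for reproducibility
m = 2;                               % number of response series
T = 200;                             % effective sample size
Phi = [0.5 0.1; 0 0.3];              % hypothetical true AR(1) matrix
c = [1; -1];                         % hypothetical true constants
Sigma = eye(m);                      % known, fixed innovations covariance

% Simulate the series; row 1 of Y is the presample observation.
Y = zeros(T + 1,m);
for t = 2:T + 1
    Y(t,:) = (Phi*Y(t-1,:)' + c + randn(m,1))';
end

k = m + 1;                           % coefficients per equation (lag 1 plus constant)
mu = zeros(m*k,1);                   % prior mean of lambda (assumed)
V = 10*eye(m*k);                     % prior covariance of lambda (assumed)

precision = inv(V);                  % accumulates V^{-1} + sum of Zt'*Sigma^{-1}*Zt
rhs = V\mu;                          % accumulates V^{-1}*mu + sum of Zt'*Sigma^{-1}*yt
for t = 2:T + 1
    zt = [Y(t-1,:) 1];               % 1-by-k row vector [y_{t-1}' 1]
    Zt = kron(eye(m),zt);            % m-by-(m*k) block diagonal matrix
    precision = precision + Zt'*(Sigma\Zt);
    rhs = rhs + Zt'*(Sigma\Y(t,:)');
end

Vbar = inv(precision);               % posterior covariance
mubar = precision\rhs;               % posterior mean
LambdaBar = reshape(mubar,k,m);      % column j holds the coefficients of equation j
disp(LambdaBar')                     % rows: equations; columns: [Phi c]
```

Because λ = vec(Λ) stacks the columns of Λ, reshaping the posterior mean into a k-by-m matrix recovers one column of coefficients per equation.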
Version History Introduced in R2020a
References [1] Litterman, Robert B. "Forecasting with Bayesian Vector Autoregressions: Five Years of Experience." Journal of Business and Economic Statistics 4, no. 1 (January 1986): 25–38. https://doi.org/10.2307/1391384.
See Also Functions bayesvarm Objects semiconjugatebvarm | diffusebvarm | conjugatebvarm
parcorr

Sample partial autocorrelation

Syntax

[pacf,lags] = parcorr(y)
PACFTbl = parcorr(Tbl)
[ ___ ,bounds] = parcorr( ___ )
[ ___ ] = parcorr( ___ ,Name=Value)
parcorr( ___ )
parcorr(ax, ___ )
[ ___ ,h] = parcorr( ___ )

Description

[pacf,lags] = parcorr(y) returns the sample partial autocorrelation function on page 12-1731 (PACF) pacf and associated lags lags of the univariate time series y.

PACFTbl = parcorr(Tbl) returns the table PACFTbl containing variables for the sample PACF and associated lags of the last variable in the input table or timetable Tbl. To select a different variable in Tbl for which to compute the PACF, use the DataVariable name-value argument.

[ ___ ,bounds] = parcorr( ___ ) uses any input-argument combination in the previous syntaxes, and returns the output-argument combination for the corresponding input arguments and the approximate upper and lower confidence bounds bounds on the PACF.

[ ___ ] = parcorr( ___ ,Name=Value) uses additional options specified by one or more name-value arguments. For example, parcorr(Tbl,DataVariable="RGDP",NumLags=10,NumSTD=1.96) returns 10 lags of the sample PACF of the table variable "RGDP" in Tbl and 95% confidence bounds.

parcorr( ___ ) plots the sample PACF of the input series with confidence bounds.

parcorr(ax, ___ ) plots on the axes specified by ax instead of the current axes (gca). ax can precede any of the input argument combinations in the previous syntaxes.

[ ___ ,h] = parcorr( ___ ) plots the sample PACF of the input series and additionally returns handles to plotted graphics objects. Use elements of h to modify properties of the plot after you create it.
Examples

Compute PACF from Vector of Time Series Data

Compute the PACF of a univariate time series. Input the time series data as a numeric vector.

Load the quarterly real GDP series in Data_GDP.mat. Plot the series, which is stored in the numeric vector Data.

load Data_GDP
plot(Data)

The series exhibits exponential growth. Compute the returns of the series.

ret = price2ret(Data);

ret is a series of real GDP returns; it has one less observation than the real GDP series.

Compute the PACF of the real GDP returns, and return the associated lags.

[pacf,lags] = parcorr(ret);
[pacf lags]

ans = 21×2
    1.0000         0
    0.3329    1.0000
    0.0828    2.0000
   -0.1205    3.0000
   -0.1080    4.0000
   -0.0869    5.0000
    0.0226    6.0000
   -0.0254    7.0000
   -0.0243    8.0000
    0.0699    9.0000
      ⋮

Let $y_t$ be the real GDP return at time t. pacf(3) = 0.0828 means that the correlation between $y_t$ and $y_{t-2}$, after adjusting for the linear effects of $y_{t-1}$ on $y_t$, is 0.0828.
Compute PACF of Table Variable

Compute the PACF of a time series, which is one variable in a table.

Load the electricity spot price data set Data_ElectricityPrices.mat, which contains the daily spot prices in the timetable DataTimeTable.

load Data_ElectricityPrices.mat
DataTimeTable.Properties.VariableNames

ans = 1x1 cell array
    {'SpotPrice'}

Plot the series.

plot(DataTimeTable.SpotPrice)

The time series plot does not clearly indicate an exponential trend or unit root.

Compute the PACF of the raw spot price series.

PACFTbl = parcorr(DataTimeTable)

PACFTbl=21×2 table
    Lags      PACF  
    ____    ________
      0            1
      1       0.5541
      2      0.10938
      3     0.099833
      4     0.029511
      5     0.038836
      6     0.065892
      7     0.029965
      8     0.034951
      9     0.050091
     10     0.051031
     11     0.033994
     12     0.051877
     13     0.028973
     14     0.047456
     15      0.11895
      ⋮

parcorr returns the results in the table PACFTbl, where variables correspond to the PACF (PACF) and associated lags (Lags).

By default, parcorr computes the PACF of the last variable in the table. To select a variable from an input table, set the DataVariable option.
Return PACF Confidence Bounds

Consider the electricity spot prices in “Compute PACF of Table Variable” on page 12-1722.

Load the electricity spot price data set Data_ElectricityPrices.mat. Compute the PACF and return the PACF confidence bounds.

load Data_ElectricityPrices
[PACFTbl,bounds] = parcorr(DataTimeTable)

PACFTbl=21×2 table
    Lags      PACF  
    ____    ________
      0            1
      1       0.5541
      2      0.10938
      3     0.099833
      4     0.029511
      5     0.038836
      6     0.065892
      7     0.029965
      8     0.034951
      9     0.050091
     10     0.051031
     11     0.033994
     12     0.051877
     13     0.028973
     14     0.047456
     15      0.11895
      ⋮

bounds = 2×1
    0.0532
   -0.0532

Assuming the spot prices follow a Gaussian white noise series, an approximate 95.4% confidence interval on the PACF is (-0.0532, 0.0532).
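The 95.4% figure follows from the default of two standard errors under a normal approximation. The sketch below reproduces that arithmetic in base MATLAB, assuming the white-noise standard error of 1/sqrt(T). The sample size T = 1000 is a hypothetical value chosen for illustration, not the size of the electricity price data set.

```matlab
% Sketch: white-noise confidence bounds on the PACF and their coverage
T = 1000;                          % hypothetical effective sample size
numSTD = 2;                        % number of standard errors (the default)

se = 1/sqrt(T);                    % approximate standard error under white noise
bounds = numSTD*se*[1; -1];        % upper and lower bounds around 0

% Probability that a standard normal variable lies within numSTD of 0
coverage = erf(numSTD/sqrt(2));

fprintf('bounds = (%.4f, %.4f), coverage = %.4f\n',bounds(2),bounds(1),coverage)
```

With numSTD = 2, the coverage is about 0.9545, which matches the 95.4% quoted for the bounds. Setting numSTD = 1.96 gives coverage of about 0.95.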
Compare OLS and Yule-Walker PACFs

Load the US quarterly macroeconomic series data Data_USEconModel.mat. Remove all missing values from the timetable of data DataTimeTable by using listwise deletion.

load Data_USEconModel
DataTimeTable = rmmissing(DataTimeTable);

Compute the PACF of the raw effective federal funds rate FEDFUNDS by using the OLS method (the default method when the data does not contain any missing values). Change the name of the PACF variable of the output table to ols.

PACFTbl = parcorr(DataTimeTable,DataVariable="FEDFUNDS",Method="ols");
PACFTbl = renamevars(PACFTbl,"PACF","ols");

Compute the PACF of the raw effective federal funds rate FEDFUNDS by solving the Yule-Walker equations. Store the result as the variable yw in PACFTbl.

PACFTbl.yw = parcorr(DataTimeTable.FEDFUNDS,Method="yule-walker");

Compare the PACFs between the methods.

PACFTbl

PACFTbl=21×3 table
    Lags       ols          yw    
    ____    _________    _________
      0             1            1
      1       0.92881      0.91502
      2      0.074551     0.061337
      3       0.14949      0.15031
      4      -0.23745     -0.20808
      5      0.073346     0.073911
      6      -0.25854     -0.21128
      7      0.021783     0.018625
      8       0.10817     0.092159
      9       0.16147      0.10987
     10     -0.081344    -0.077452
     11      0.050378     0.034769
     12      0.075574     0.070003
     13      0.026074      0.02403
     14     -0.036989    -0.025924
     15      0.074656     0.048657
      ⋮

Plot the PACFs in the same stem plot.

stem(repmat(PACFTbl.Lags,1,2),PACFTbl{:,["ols" "yw"]},"filled")
title("PACF of Effective Federal Funds Rate")
legend(["OLS" "Yule-Walker"])
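For intuition about the Yule-Walker method, the following base MATLAB sketch computes a PACF from sample autocorrelations by solving the Yule-Walker system for each order and keeping the last coefficient. It is a textbook illustration on hypothetical simulated AR(2) data, and the internal computations of parcorr may differ in details such as normalization and missing-value handling.

```matlab
% Sketch: Yule-Walker PACF from sample autocorrelations
rng(1);                                  % for reproducibility
T = 500;
e = randn(T,1);
y = zeros(T,1);
for t = 3:T
    y(t) = 0.6*y(t-1) - 0.5*y(t-2) + e(t);   % hypothetical AR(2) data
end

numLags = 5;
yc = y - mean(y);                        % centered series
acf = zeros(numLags + 1,1);              % acf(k+1) is the lag k autocorrelation
for k = 0:numLags
    acf(k+1) = (yc(1:T-k)'*yc(k+1:T))/(yc'*yc);
end

pacfYW = ones(numLags + 1,1);            % lag 0 partial autocorrelation is 1
for k = 1:numLags
    R = toeplitz(acf(1:k));              % k-by-k autocorrelation matrix
    phi = R\acf(2:k+1);                  % Yule-Walker AR(k) coefficients
    pacfYW(k+1) = phi(end);              % last coefficient is the lag k PACF
end
disp([(0:numLags)' pacfYW])
```

At lag 1 the system has a single equation, so the lag 1 partial autocorrelation equals the lag 1 autocorrelation.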
Plot PACF of Simulated Time Series

Specify the AR(2) model:

$$y_t = 0.6y_{t-1} - 0.5y_{t-2} + \varepsilon_t,$$

where $\varepsilon_t$ is Gaussian with mean 0 and variance 1.

rng(1); % For reproducibility
Mdl = arima(AR={0.6 -0.5},Constant=0,Variance=1)

Mdl = 
  arima with properties:

     Description: "ARIMA(2,0,0) Model (Gaussian Distribution)"
    Distribution: Name = "Gaussian"
               P: 2
               D: 0
               Q: 0
        Constant: 0
              AR: {0.6 -0.5} at lags [1 2]
             SAR: {}
              MA: {}
             SMA: {}
     Seasonality: 0
            Beta: [1×0]
        Variance: 1

Simulate 1000 observations from Mdl.

y = simulate(Mdl,1000);

Plot the PACF. Specify that the series is an AR(2) process.

parcorr(y,NumAR=2)

The PACF cuts off after the second lag. This behavior indicates an AR(2) process.
Specify Additional Lags for PACF Plot

Specify the multiplicative seasonal ARMA$(2,0,1)\times(3,0,0)_{12}$ model:

$$(1 - 0.75L - 0.15L^2)(1 - 0.9L^{12} + 0.75L^{24} - 0.5L^{36})y_t = 2 + \varepsilon_t - 0.5\varepsilon_{t-1},$$

where $\varepsilon_t$ is Gaussian with mean 0 and variance 1.

Mdl = arima(AR={0.75 0.15},SAR={0.9 -0.75 0.5}, ...
    SARLags=[12 24 36],MA=-0.5,Constant=2, ...
    Variance=1);

Simulate data from Mdl.

rng(1);
y = simulate(Mdl,1000);

Plot the default partial autocorrelation function (PACF).

figure
parcorr(y)

The default correlogram does not display the dependence structure for higher lags.

Plot the PACF for 40 lags.

figure
parcorr(y,NumLags=40)

The correlogram shows the larger correlations at lags 12, 24, and 36.
Input Arguments

y — Observed univariate time series
numeric vector

Observed univariate time series for which parcorr computes or plots the PACF, specified as a numeric vector.
Data Types: double

Tbl — Time series data
table | timetable

Time series data, specified as a table or timetable. Each row of Tbl contains contemporaneous observations of all variables. Specify a single series (variable) by using the DataVariable argument. The selected variable must be numeric.

ax — Axes on which to plot
Axes object

Axes on which to plot, specified as an Axes object. By default, parcorr plots to the current axes (gca).

Note Specify missing observations using NaN. The parcorr function treats missing values as missing completely at random on page 12-1731.

Name-Value Arguments

Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter.
Before R2021a, use commas to separate each name and value, and enclose Name in quotes.
Example: parcorr(Tbl,DataVariable="RGDP",NumLags=10,NumSTD=3) plots 10 lags of the sample PACF of the variable "RGDP" in Tbl, and displays confidence bounds consisting of 3 standard errors away from 0.

NumLags — Number of lags
positive integer

Number of lags in the sample PACF, specified as a positive integer. parcorr uses lags 0:NumLags to estimate the PACF.
The default is min([20,T – 1]), where T is the effective sample size of the input time series.
Example: parcorr(y,NumLags=10) plots the sample PACF of y for lags 0 through 10.
Data Types: double

NumAR — Number of lags in theoretical AR model
0 (default) | nonnegative integer

Number of lags in a theoretical AR model of the input time series, specified as a nonnegative integer less than NumLags.
parcorr uses NumAR to estimate confidence bounds. For lags > NumAR, parcorr assumes that the input time series is a Gaussian white noise process. Consequently, the standard error is approximately $1/\sqrt{T}$, where T is the effective sample size of the input time series.
Example: parcorr(y,NumAR=10) specifies that y is an AR(10) process and plots confidence bounds for all lags greater than 10.
Data Types: double

NumSTD — Number of standard errors in confidence bounds
2 (default) | nonnegative scalar

Number of standard errors in the confidence bounds, specified as a nonnegative scalar.
For all lags greater than NumAR, the confidence bounds are 0 ± NumSTD*σ, where σ is the estimated standard error of the sample partial autocorrelation.
The default yields approximate 95% confidence bounds.
Example: parcorr(y,NumSTD=1.5) plots the PACF of y with confidence bounds 1.5 standard errors away from 0.
Data Types: double

Method — PACF estimation method
"ols" | "yule-walker" | character vector

PACF estimation method, specified as a value in this table.

Value            Description                     Restrictions
"ols"            Ordinary least squares (OLS)    The input time series must be fully observed (it cannot contain any NaN values).
"yule-walker"    Yule-Walker equations           None.

If the input time series is fully observed, the default is "ols". Otherwise, the default is "yule-walker".
Example: parcorr(y,Method="yule-walker") computes the PACF of y using the Yule-Walker equations.
Data Types: char | string

DataVariable — Variable in Tbl
last variable (default) | string scalar | character vector | integer | logical vector

Variable in Tbl for which parcorr computes the PACF, specified as a string scalar or character vector containing a variable name in Tbl.Properties.VariableNames, or an integer or logical vector representing the index of a name. The selected variable must be numeric.
Example: DataVariable="GDP"
Example: DataVariable=[false true false false] or DataVariable=2 selects the second table variable.
Data Types: double | logical | char | string
Output Arguments

pacf — Sample PACF
numeric vector

Sample PACF, returned as a numeric vector of length NumLags + 1. parcorr returns pacf only when you supply the input y.
The elements of pacf correspond to lags 0, 1, 2, ..., NumLags (that is, elements of lags). For all time series, the lag 0 partial autocorrelation pacf(1) = 1.

lags — PACF lags
numeric vector

PACF lags, returned as a numeric vector with elements 0:NumLags. parcorr returns lags only when you supply the input y.

PACFTbl — Sample PACF
table

Sample PACF, returned as a table with variables for the outputs pacf and lags. parcorr returns PACFTbl only when you supply the input Tbl.

bounds — Approximate upper and lower confidence bounds
numeric vector

Approximate upper and lower confidence bounds assuming the input series is an AR(NumAR) process, returned as a two-element numeric vector. The NumSTD option specifies the number of standard errors from 0 in the confidence bounds.

h — Handles to plotted graphics objects
graphics array

Handles to plotted graphics objects, returned as a graphics array. h contains unique plot identifiers, which you can use to query or modify properties of the plot.

More About

Partial Autocorrelation Function

The partial autocorrelation function measures the correlation between $y_t$ and $y_{t+k}$ after adjusting for the linear effects of $y_{t+1},\ldots,y_{t+k-1}$.
The estimation of the PACF involves solving the Yule-Walker equations with respect to the autocorrelations. However, if the time series is fully observed, then the PACF can be estimated by fitting successive autoregressive models of orders 1, 2, ... using ordinary least squares. For details, see [1], Chapter 3.

Missing Completely at Random

Observations of a random variable are missing completely at random if the tendency of an observation to be missing is independent of both the random variable and the tendency of all other observations to be missing.
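The OLS approach described above can be sketched in base MATLAB: fit autoregressive models of orders 1, 2, ... by least squares and take the last lag coefficient of each fit as the partial autocorrelation at that lag. This is a minimal illustration on hypothetical simulated data. It assumes each regression includes an intercept, and the exact regression that parcorr runs may differ.

```matlab
% Sketch: PACF by successive OLS autoregressions
rng(1);                                  % for reproducibility
T = 500;
e = randn(T,1);
y = zeros(T,1);
for t = 3:T
    y(t) = 0.6*y(t-1) - 0.5*y(t-2) + e(t);   % hypothetical AR(2) data
end

numLags = 5;
pacfOLS = ones(numLags + 1,1);           % lag 0 partial autocorrelation is 1
for k = 1:numLags
    X = ones(T - k,k + 1);               % column 1 is the intercept (assumed)
    for j = 1:k
        X(:,j+1) = y(k+1-j:T-j);         % regressor y_{t-j} for t = k+1,...,T
    end
    b = X\y(k+1:T);                      % least-squares AR(k) fit
    pacfOLS(k+1) = b(end);               % coefficient on y_{t-k}
end
disp([(0:numLags)' pacfOLS])
```

For this simulated AR(2) process, the estimate at lag 2 should be close to the true coefficient of -0.5, and estimates at higher lags should be close to 0.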
Tips • To plot the PACF without confidence bounds, set NumSTD=0.
Algorithms parcorr plots the PACF when you do not request any output or when you request the fourth output h.
Version History Introduced before R2006a
References

[1] Box, George E. P., Gwilym M. Jenkins, and Gregory C. Reinsel. Time Series Analysis: Forecasting and Control. 3rd ed. Englewood Cliffs, NJ: Prentice Hall, 1994.

[2] Hamilton, James D. Time Series Analysis. Princeton, NJ: Princeton University Press, 1994.
See Also Apps Econometric Modeler Functions autocorr | crosscorr | filter Topics “Select ARIMA Model for Time Series Using Box-Jenkins Methodology” on page 3-4 “Detect Autocorrelation” on page 3-20 “Box-Jenkins Methodology” on page 3-2 “Autocorrelation and Partial Autocorrelation” on page 3-11
plot

Visualize prior and posterior densities of Bayesian linear regression model parameters

Syntax

plot(PosteriorMdl)
plot(PriorMdl)
plot(PosteriorMdl,PriorMdl)
plot( ___ ,Name,Value)
pointsUsed = plot( ___ )
[pointsUsed,posteriorDensity,priorDensity] = plot( ___ )
[pointsUsed,posteriorDensity,priorDensity,FigureHandle] = plot( ___ )

Description

plot(PosteriorMdl) or plot(PriorMdl) plots the posterior or prior distributions of the parameters in the Bayesian linear regression model on page 12-1747 PosteriorMdl or PriorMdl, respectively. plot adds subplots for each parameter to one figure and overwrites the same figure when you call plot multiple times.

plot(PosteriorMdl,PriorMdl) plots the posterior and prior distributions in the same subplot. plot uses solid blue lines for posterior densities and dashed red lines for prior densities.

plot( ___ ,Name,Value) uses any of the input argument combinations in the previous syntaxes and additional options specified by one or more name-value pair arguments. For example, you can evaluate the posterior or prior density by supplying values of β and σ2, or choose which parameter distributions to include in the figure.

pointsUsed = plot( ___ ) also returns the values of the parameters that plot uses to evaluate the densities in the subplots.

[pointsUsed,posteriorDensity,priorDensity] = plot( ___ ) also returns the values of the evaluated densities. If you specify one model, then plot returns the density values in posteriorDensity. Otherwise, plot returns the posterior density values in posteriorDensity and the prior density values in priorDensity.

[pointsUsed,posteriorDensity,priorDensity,FigureHandle] = plot( ___ ) returns the figure handle of the figure containing the distributions.
Examples

Plot Prior and Posterior Distributions

Consider the multiple linear regression model that predicts the US real gross national product (GNPR) using a linear combination of industrial production index (IPI), total employment (E), and real wages (WR).

$$\mathrm{GNPR}_t = \beta_0 + \beta_1 \mathrm{IPI}_t + \beta_2 \mathrm{E}_t + \beta_3 \mathrm{WR}_t + \varepsilon_t.$$

For all t, $\varepsilon_t$ is a series of independent Gaussian disturbances with a mean of 0 and variance $\sigma^2$.

Assume these prior distributions:
• $\beta \mid \sigma^2 \sim N_4(M, \sigma^2 V)$. M is a 4-by-1 vector of means, and V is a scaled 4-by-4 positive definite covariance matrix.
• $\sigma^2 \sim IG(A, B)$. A and B are the shape and scale, respectively, of an inverse gamma distribution.

These assumptions and the data likelihood imply a normal-inverse-gamma conjugate model.

Create a normal-inverse-gamma conjugate prior model for the linear regression parameters. Specify the number of predictors p and the variable names.

p = 3;
VarNames = ["IPI" "E" "WR"];
PriorMdl = bayeslm(p,'ModelType','conjugate','VarNames',VarNames);

PriorMdl is a conjugateblm Bayesian linear regression model object representing the prior distribution of the regression coefficients and disturbance variance.

Plot the prior distributions.

plot(PriorMdl);
plot plots the marginal prior distributions of the intercept, regression coefficients, and disturbance variance.

Suppose that the mean of the regression coefficients is $[-20 \;\; 4 \;\; 0.001 \;\; 2]'$ and their scaled covariance matrix is

$$\begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 0.001 & 0 & 0 \\ 0 & 0 & 10^{-8} & 0 \\ 0 & 0 & 0 & 0.1 \end{bmatrix}.$$

Also, the prior scale of the disturbance variance is 0.01. Specify the prior information using dot notation.

PriorMdl.Mu = [-20; 4; 0.001; 2];
PriorMdl.V = diag([1 0.001 1e-8 0.1]);
PriorMdl.B = 0.01;
Request a new figure and plot the prior distribution.

plot(PriorMdl);

plot replaces the current distribution figure with a plot of the prior distribution of the disturbance variance.

Load the Nelson-Plosser data set, and create variables for the predictor and response data.

load Data_NelsonPlosser
X = DataTable{:,PriorMdl.VarNames(2:end)};
y = DataTable.GNPR;

Estimate the posterior distributions.

PosteriorMdl = estimate(PriorMdl,X,y,'Display',false);

PosteriorMdl is a conjugateblm model object that contains the posterior distributions of β and σ2.

Plot the posterior distributions.

plot(PosteriorMdl);

Plot the prior and posterior distributions of the parameters on the same subplots.

plot(PosteriorMdl,PriorMdl);
Plot Distributions to Separate Figures

Consider the regression model in “Plot Prior and Posterior Distributions” on page 12-1733.

Load the Nelson-Plosser data set, create a default conjugate prior model, and then estimate the posterior using the first 75% of the data. Turn off the estimation display.

p = 3;
VarNames = ["IPI" "E" "WR"];
PriorMdl = bayeslm(p,'ModelType','conjugate','VarNames',VarNames);
load Data_NelsonPlosser
X = DataTable{:,PriorMdl.VarNames(2:end)};
y = DataTable.GNPR;
d = 0.75;
PosteriorMdlFirst = estimate(PriorMdl,X(1:floor(d*end),:),y(1:floor(d*end)),...
    'Display',false);

Plot the prior distribution and the posterior distribution of the disturbance variance. Return the figure handle.

[~,~,~,h] = plot(PosteriorMdlFirst,PriorMdl,'VarNames','Sigma2');
h is the figure handle for the distribution plot. If you change the tag name of the figure by changing the Tag property, then the next plot call places all new distribution plots on a different figure.

Change the tag name of the figure to FirstHalfData using dot notation.

h.Tag = 'FirstHalfData';

Estimate the posterior distribution using the rest of the data. Specify the posterior distribution based on the first 75% of the data as the prior distribution.

PosteriorMdl = estimate(PosteriorMdlFirst,X(ceil(d*end):end,:),...
    y(ceil(d*end):end),'Display',false);

Plot the posteriors of the disturbance variance, based on the first 75% of the data and on all the data, to a new figure.

plot(PosteriorMdl,PosteriorMdlFirst,'VarNames','Sigma2');
This type of plot shows the evolution of the posterior distribution when you incorporate new data.
Return Default Distribution and Evaluations

Consider the regression model in “Plot Prior and Posterior Distributions” on page 12-1733.

Load the Nelson-Plosser data set and create a default conjugate prior model.

p = 3;
VarNames = ["IPI" "E" "WR"];
PriorMdl = bayeslm(p,'ModelType','conjugate','VarNames',VarNames);
load Data_NelsonPlosser
X = DataTable{:,PriorMdl.VarNames(2:end)};
y = DataTable.GNPR;

Plot the prior distributions. Request the values of the parameters used to create the plots and their respective densities.

[pointsUsedPrior,priorDensities1] = plot(PriorMdl);
pointsUsedPrior is a 5-by-1 cell array of 1-by-1000 numeric vectors representing the values of the parameters that plot uses to plot the corresponding densities. The first element corresponds to the intercept, the next three elements correspond to the regression coefficients, and the last element corresponds to the disturbance variance. priorDensities1 has the same dimensions as pointsUsedPrior and contains the corresponding density values.

Estimate the posterior distribution. Turn off the estimation display.

PosteriorMdl = estimate(PriorMdl,X,y,'Display',false);

Plot the posterior distributions. Request the values of the parameters used to create the plots and their respective densities.

[pointsUsedPost,posteriorDensities1] = plot(PosteriorMdl);

pointsUsedPost and posteriorDensities1 have the same dimensions as pointsUsedPrior. The pointsUsedPost output can be different from pointsUsedPrior. posteriorDensities1 contains the posterior density values.

Plot the prior and posterior distributions. Request the values of the parameters used to create the plots and their respective densities.

[pointsUsedPP,posteriorDensities2,priorDensities2] = plot(PosteriorMdl,PriorMdl);
All output values have the same dimensions as pointsUsedPrior. The posteriorDensities2 output contains the posterior density values. The priorDensities2 output contains the prior density values.

Confirm that pointsUsedPP is equal to pointsUsedPost.

compare = @(a,b)sum(a == b) == numel(a);
cellfun(compare,pointsUsedPost,pointsUsedPP)

ans = 5x1 logical array
   1
   1
   1
   1
   1

The points used are equivalent.

Confirm that the posterior densities are the same, but that the prior densities are not.

cellfun(compare,posteriorDensities1,posteriorDensities2)

ans = 5x1 logical array
   1
   1
   1
   1
   1

cellfun(compare,priorDensities1,priorDensities2)

ans = 5x1 logical array
   0
   0
   0
   0
   0

When plotting only the prior distribution, plot evaluates the prior densities at points that produce a clear plot of the prior distribution. When plotting both a prior and posterior distribution, plot prefers to plot the posterior clearly. Therefore, plot can determine a different set of points to use.
Specify Values for Density Evaluation and Plotting

Consider the regression model in “Plot Prior and Posterior Distributions” on page 12-1733.

Load the Nelson-Plosser data set and create a default conjugate prior model for the regression coefficients and disturbance variance. Then, estimate the posterior distribution and obtain the estimation summary table from summarize.

p = 3;
VarNames = ["IPI" "E" "WR"];
PriorMdl = bayeslm(p,'ModelType','conjugate','VarNames',VarNames);
load Data_NelsonPlosser
X = DataTable{:,PriorMdl.VarNames(2:end)};
y = DataTable.GNPR;
PosteriorMdl = estimate(PriorMdl,X,y);

Method: Analytic posterior distributions
Number of observations: 62
Number of predictors:   4
Log marginal likelihood: -259.348

           |   Mean      Std          CI95         Positive       Distribution
 -----------------------------------------------------------------------------------
 Intercept | -24.2494  8.7821  [-41.514, -6.985]    0.003   t (-24.25, 8.65^2, 68)
 IPI       |   4.3913  0.1414  [ 4.113,  4.669]     1.000   t (4.39, 0.14^2, 68)
 E         |   0.0011  0.0003  [ 0.000,  0.002]     1.000   t (0.00, 0.00^2, 68)
 WR        |   2.4683  0.3490  [ 1.782,  3.154]     1.000   t (2.47, 0.34^2, 68)
 Sigma2    |  44.1347  7.8020  [31.427, 61.855]     1.000   IG(34.00, 0.00069)

summaryTbl = summarize(PosteriorMdl);
summaryTbl = summaryTbl.MarginalDistributions;

summaryTbl is a table containing the statistics that estimate displays at the command line.
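As a rough consistency check on the Sigma2 row, the displayed inverse gamma parameters can be converted to a mean and standard deviation by hand. The sketch below assumes the parameterization in which the IG(A,B) density is proportional to x^(-(A+1))*exp(-1/(B*x)), so that the mean is 1/(B*(A - 1)) and the standard deviation is the mean divided by sqrt(A - 2). That parameterization is an assumption inferred from the displayed numbers, not a statement from this page, and the inputs are the rounded values from the display.

```matlab
% Sketch: moments implied by the displayed posterior IG(34.00, 0.00069)
A = 34;                            % shape, as displayed
B = 0.00069;                       % scale, as displayed (rounded)

igMean = 1/(B*(A - 1));            % assumed mean formula, valid for A > 1
igStd  = igMean/sqrt(A - 2);       % assumed standard deviation, valid for A > 2

fprintf('mean = %.2f, std = %.2f\n',igMean,igStd)
```

The results land near the displayed Mean of 44.1347 and Std of 7.8020. The small gaps are what you would expect from rounding B to two significant digits.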
For each parameter, determine a set of 50 evenly spaced values within three standard deviations of the mean. Put the values into the cells of a 5-by-1 cell vector following the order of the parameters that comprise the rows of the estimation summary table.

Points = cell(numel(summaryTbl.Mean),1); % Preallocation
for j = 1:numel(summaryTbl.Mean)
    Points{j} = linspace(summaryTbl.Mean(j) - 3*summaryTbl.Std(j),...
        summaryTbl.Mean(j) + 3*summaryTbl.Std(j),50);
end

Plot the posterior distributions within their respective intervals.

plot(PosteriorMdl,'Points',Points)
Input Arguments PosteriorMdl — Bayesian linear regression model storing posterior distribution characteristics conjugateblm model object | empiricalblm model object Bayesian linear regression model storing posterior distribution characteristics, specified as a conjugateblm or empiricalblm model object returned by estimate.
12-1744
plot
When you also specify PriorMdl, then PosteriorMdl is the posterior distribution composed of PriorMdl and data. If the NumPredictors and VarNames properties of the two models are not equal, plot issues an error. PriorMdl — Bayesian linear regression model storing prior distribution characteristics conjugateblm model object | semiconjugateblm model object | diffuseblm model object | mixconjugateblm model object | lassoblm model object Bayesian linear regression model storing prior distribution characteristics, specified as a conjugateblm, semiconjugateblm, diffuseblm, empiricalblm, customblm, mixconjugateblm, mixsemiconjugateblm, or lassoblm model object returned by bayeslm. When you also specify PosteriorMdl, then PriorMdl is the prior distribution that, when combined with the data likelihood, forms PosteriorMdl. If the NumPredictors and VarNames properties of the two models are not equal, plot issues an error. Name-Value Pair Arguments Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter. Before R2021a, use commas to separate each name and value, and enclose Name in quotes. Example: 'VarNames',["Beta1"; "Beta2"; "Sigma2"] plots the distributions of regression coefficients corresponding to the names Beta1 and Beta2 in the VarNames property of the model object and the disturbance variance Sigma2. VarNames — Parameter names cell vector of character vectors | string vector Parameter names indicating which densities to plot in the figure, specified as the comma-separated pair consisting of 'VarNames' and a cell vector of character vectors or string vector. VarNames must include "Intercept", any name in the VarNames property of PriorMdl or PosteriorMdl, or "Sigma2". By default, plot chooses "Intercept" (if an intercept exists in the model), all regression coefficients, and "Sigma2". 
If the model has more than 34 regression coefficients, then plot chooses the first through the 34th only. VarNames is case insensitive.
Tip If your model contains many variables, then try plotting subsets of the parameters on separate plots for a better view of the distributions.
Example: 'VarNames',["Beta(1)","Beta(2)"]
Data Types: string | cell
Points — Parameter values for density evaluation and plotting
numeric vector | cell vector of numeric vectors
Parameter values for density evaluation and plotting, specified as the comma-separated pair consisting of 'Points' and a numPoints-dimensional numeric vector or a numVarNames-dimensional cell vector of numeric vectors. numPoints is the number of parameter values at which plot evaluates and plots the density.
• If Points is a numeric vector, then plot evaluates and plots the densities of all specified distributions by using its elements (see VarNames).
• If Points is a cell vector of numeric vectors, then:
  • numVarNames must be numel(VarNames), where VarNames is the value of VarNames.
  • Cells correspond to the elements of VarNames.
  • For j = 1,…,numVarNames, plot evaluates and plots the density of the parameter named VarNames{j} by using the vector of points in cell Points(j).
By default, plot determines 1000 adequate values at which to evaluate and plot the density for each parameter.
Example: 'Points',{1:0.1:10 10:0.2:25 1:0.01:2}
Data Types: double | cell
Output Arguments
pointsUsed — Parameter values used for density evaluation and plotting
cell vector of numeric vectors
Parameter values used for density evaluation and plotting, returned as a cell vector of numeric vectors. Suppose Points and VarNames are the values of Points and VarNames, respectively. If Points is a numeric vector, then pointsUsed is repmat({Points},numel(VarNames),1). Otherwise, pointsUsed equals Points. Cells correspond to the names in VarNames.
posteriorDensity — Evaluated and plotted posterior densities
cell vector of numeric row vectors
Evaluated and plotted posterior densities, returned as a numVarNames-by-1 cell vector of numeric row vectors. numVarNames is numel(VarNames), where VarNames is the value of VarNames. Cells correspond to the names in VarNames. posteriorDensity has the same dimensions as priorDensity.
priorDensity — Evaluated and plotted prior densities
cell vector of numeric row vectors
Evaluated and plotted prior densities, returned as a numVarNames-by-1 cell vector of numeric row vectors. priorDensity has the same dimensions as posteriorDensity.
FigureHandle — Figure window containing distributions
figure object
Figure window containing distributions, returned as a figure object. plot overwrites the figure window that it produces. If you rename FigureHandle for a new figure window, or call figure before calling plot, then plot continues to overwrite the current figure. To plot distributions to a different figure window, change the figure identifier of the current figure window by renaming its Tag property. For example, to rename the current figure window called FigureHandle to newFigure, at the command line, enter:
FigureHandle.Tag = "newFigure";
Limitations Because improper distributions (distributions with densities that do not integrate to 1) are not well defined, plot cannot plot them very well.
More About
Bayesian Linear Regression Model
A Bayesian linear regression model treats the parameters β and σ² in the multiple linear regression (MLR) model yt = xtβ + εt as random variables.
For times t = 1,...,T:
• yt is the observed response.
• xt is a 1-by-(p + 1) row vector of observed values of p predictors. To accommodate a model intercept, x1t = 1 for all t.
• β is a (p + 1)-by-1 column vector of regression coefficients corresponding to the variables that compose the columns of xt.
• εt is the random disturbance with a mean of zero and Cov(ε) = σ²I(T×T), while ε is a T-by-1 vector containing all disturbances. These assumptions imply that the data likelihood is
$$\ell(\beta,\sigma^2 \mid y,x) = \prod_{t=1}^{T} \phi(y_t;\, x_t\beta,\, \sigma^2).$$
ϕ(yt;xtβ,σ²) is the Gaussian probability density with mean xtβ and variance σ² evaluated at yt.
Before considering the data, you impose a joint prior distribution assumption on (β,σ²). In a Bayesian analysis, you update the distribution of the parameters by using information about the parameters obtained from the likelihood of the data. The result is the joint posterior distribution of (β,σ²) or the conditional posterior distributions of the parameters.
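As a minimal sketch of the data likelihood described above, the following code evaluates the product of Gaussian densities at fixed parameter values and compares its logarithm with the closed-form Gaussian loglikelihood. The data X and y and the values beta and sigma2 are hypothetical and chosen only for illustration. The sketch uses base MATLAB operations only and does not reproduce how bayeslm or plot work internally.

```matlab
% Hypothetical data: T = 5 observations, an intercept column and one predictor.
X = [ones(5,1) [0.5; 1.0; 1.5; 2.0; 2.5]];
y = [1.1; 1.9; 3.2; 3.9; 5.1];

% Hypothetical parameter values at which to evaluate the likelihood.
beta = [0; 2];
sigma2 = 0.25;

% Residuals y_t - x_t*beta and the Gaussian density of each observation.
res = y - X*beta;
dens = exp(-res.^2/(2*sigma2))/sqrt(2*pi*sigma2);

% The likelihood is the product of the T densities.
likelihood = prod(dens);

% Closed-form loglikelihood for comparison.
T = numel(y);
logL = -0.5*T*log(2*pi*sigma2) - (res'*res)/(2*sigma2);
```

For long series, working with logL rather than likelihood avoids numerical underflow in the product.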
Version History Introduced in R2017a
See Also
Objects
conjugateblm | semiconjugateblm | diffuseblm | empiricalblm | customblm | mixconjugateblm | mixsemiconjugateblm | lassoblm
Functions
forecast | simulate | summarize
Topics “Bayesian Linear Regression” on page 6-2 “Implement Bayesian Linear Regression” on page 6-10
plus Lag operator polynomial addition
Syntax
C = plus(A, B, 'Tolerance', tolerance)
C = A + B
Description
Given two lag operator polynomials A(L) and B(L), C = plus(A, B, 'Tolerance', tolerance) performs the polynomial addition C(L) = A(L) + B(L) with tolerance tolerance. 'Tolerance' is the nonnegative scalar tolerance used to determine which coefficients are included in the result. The default tolerance is 1e-12. Specifying a tolerance greater than 0 allows the user to exclude polynomial lags with near-zero coefficients. A coefficient matrix of a given lag is excluded only if the magnitudes of all elements of the matrix are less than or equal to the specified tolerance.
C = A + B performs a polynomial addition.
If at least one of A or B is a lag operator polynomial object, the other can be a cell array of matrices (initial lag operator coefficients), or a single matrix (zero-degree lag operator).
Examples
Add Two Lag Operator Polynomials
Create two LagOp polynomials and add them:
A = LagOp({1 -0.6 0.08});
B = LagOp({1 -0.5});
plus(A,B)
ans =
    1-D Lag Operator Polynomial:
    -----------------------------
        Coefficients: [2 -1.1 0.08]
                Lags: [0 1 2]
              Degree: 2
           Dimension: 1
Algorithms The addition operator (+) invokes plus, but the optional coefficient tolerance is available only by calling plus directly.
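The tolerance rule described above can be sketched for the scalar (1-D) case with plain numeric vectors. This is an illustration under stated assumptions, not the LagOp implementation: the variable names a, b, c, keep, lags, and coeffs are hypothetical, and coefficients are stored as row vectors ordered by lag 0, 1, 2, and so on.

```matlab
% Scalar coefficients of A(L) and B(L), ordered by lag 0, 1, 2, ...
a = [1 -0.6 0.08];
b = [1 -0.5];
tol = 1e-12;                      % default tolerance stated in the description

% Pad the shorter polynomial with zeros and add coefficient by coefficient.
n = max(numel(a),numel(b));
c = [a zeros(1,n - numel(a))] + [b zeros(1,n - numel(b))];

% Keep only lags whose coefficient magnitude exceeds the tolerance.
keep = abs(c) > tol;
lags = find(keep) - 1;            % lags are zero-based
coeffs = c(keep);
```

With these inputs, coeffs and lags agree with the Coefficients and Lags shown in the example output. For multidimensional polynomials, the description states that a lag is excluded only when every element of its coefficient matrix is within the tolerance, which this scalar sketch does not cover.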
See Also
minus
pptest Phillips-Perron test for one unit root
Syntax
h = pptest(y)
[h,pValue,stat,cValue] = pptest(y)
StatTbl = pptest(Tbl)
[ ___ ] = pptest( ___ ,Name=Value)
[ ___ ,reg] = pptest( ___ )
Description
h = pptest(y) returns the rejection decision h from conducting the Phillips-Perron test on page 12-1758 for a unit root in the univariate time series y.
[h,pValue,stat,cValue] = pptest(y) also returns the p-value pValue, test statistic stat, and critical value cValue of the test.
StatTbl = pptest(Tbl) returns the table StatTbl containing variables for the test results, statistics, and settings from conducting the Phillips-Perron test on the last variable of the input table or timetable Tbl. To select a different variable in Tbl to test, use the DataVariable name-value argument.
[ ___ ] = pptest( ___ ,Name=Value) specifies options using one or more name-value arguments in addition to any of the input argument combinations in previous syntaxes. pptest returns the output argument combination for the corresponding input arguments.
Some options control the number of tests to conduct. The following conditions apply when pptest conducts multiple tests:
• pptest treats each test as separate from all other tests.
• If you specify y, all outputs are vectors.
• If you specify Tbl, each row of StatTbl contains the results of the corresponding test.
For example, pptest(Tbl,DataVariable="GDP",Alpha=0.025,Lags=[0 1]) conducts two tests, at a level of significance of 0.025, on the variable GDP of the table Tbl. The first test includes 0 autocovariance lags in the Newey-West covariance estimator and the second test includes 1 autocovariance lag.
[ ___ ,reg] = pptest( ___ ) additionally returns a structure of regression statistics reg for the hypothesis test.
Examples
Conduct Phillips-Perron Test on Vector of Data
Test a time series for a unit root using the default options of pptest. Input the time series data as a numeric vector.
Load the Canadian inflation rate data and extract the CPI-based inflation rate INF_C.
load Data_Canada
y = DataTable.INF_C;
Test the time series for a unit root.
h = pptest(y)
h = logical
   0
The result h = 0 indicates that this test fails to reject the null hypothesis of a unit root against the AR(1) alternative.
Return Test p-Value and Decision Statistics
Load Canadian inflation rate data and extract the CPI-based inflation rate INF_C.
load Data_Canada
y = DataTable.INF_C;
Test the time series for a unit root. Return the test decision, p-value, test statistic, and critical value.
[h,pValue,stat,cValue] = pptest(y)
h = logical
   0
pValue = 0.3255
stat = -0.8769
cValue = -1.9476
Conduct Phillips-Perron Test on Table Variable
Test a time series, which is one variable in a table, for a unit root using default options.
Load Canadian inflation rate data, which contains yearly measurements on five time series variables in the table DataTable.
load Data_Canada
Test the long-term bond rate series INT_L, the last variable in the table, for a unit root.
StatTbl = pptest(DataTable)
StatTbl=1×8 table
                h      pValue     stat       cValue     Lags    Alpha    Model      Test 
              _____    ______    _______    _______    ____    _____    ______    ______
    Test 1    false    0.7358    0.24601    -1.9476     0      0.05     {'AR'}    {'T1'}
pptest returns test results and settings in the table StatTbl, where variables correspond to test results (h, pValue, stat, and cValue) and settings (Lags, Alpha, Model, and Test), and rows correspond to individual tests (in this case, pptest conducts one test). By default, pptest tests the last variable in the table. To select a variable from an input table to test, set the DataVariable option.
Assess Stationarity Using Phillips-Perron Test
Test GDP data for a unit root using a trend-stationary alternative with 0, 1, and 2 lags for the Newey-West estimator.
Load the GDP data set.
load Data_GDP
logGDP = log(Data);
Perform the Phillips-Perron test including 0, 1, and 2 autocovariance lags in the Newey-West robust covariance estimator.
h = pptest(logGDP,Model="TS",Lags=0:2)
h = 1x3 logical array
   0   0   0
Each test returns h = 0, which means the test fails to reject the unit-root null hypothesis for each set of lags. Therefore, there is not enough evidence to suggest that log GDP is trend stationary.
Inspect Regression Statistics
Test a time series for a unit root against trend-stationary alternatives. Inspect the regression statistics corresponding to each of the tests.
Load a US macroeconomic data set Data_USEconModel.mat. Compute the log of the GDP and include the result as a new variable called LogGDP in the data set.
load Data_USEconModel
DataTimeTable.LogGDP = log(DataTimeTable.GDP);
Test for a unit root in the logged GDP series using three different choices for the number of autocovariance lags. Return the regression statistics for each alternative model.
[StatTbl,reg] = pptest(DataTimeTable,DataVariable="LogGDP",Model="TS",Lags=0:2);
StatTbl
StatTbl=3×8 table
                h      pValue      stat       cValue     Lags    Alpha    Model      Test 
              _____    _______    _______    _______    ____    _____    ______    ______
    Test 1    false      0.999     1.0247    -3.4302     0      0.05     {'TS'}    {'T1'}
    Test 2    false      0.999    0.56702    -3.4302     1      0.05     {'TS'}    {'T1'}
    Test 3    false    0.99829    0.31644    -3.4302     2      0.05     {'TS'}    {'T1'}
pptest treats each of the three lag choices as separate tests, and returns results and settings for each test along the rows of the table StatTbl. reg is a 3-by-1 structure array containing regression statistics corresponding to each of the three alternative models.
Display the names of the coefficients, and their estimates and corresponding p-values resulting from the regressions of the three alternative models.
test1 = array2table([reg(1).coeff reg(1).tStats.pVal], ...
    RowNames=reg(1).names,VariableNames=["Coeff" "pValue"])
test1=3×2 table
            Coeff         pValue   
         ___________    ___________
    c      -0.014026         0.6654
    d    -0.00013104        0.22383
    a         1.0061    8.5908e-255
test2 = array2table([reg(2).coeff reg(2).tStats.pVal], ...
    RowNames=reg(2).names,VariableNames=["Coeff" "pValue"])
test2=3×2 table
            Coeff         pValue   
         ___________    ___________
    c      -0.014026         0.6654
    d    -0.00013104        0.22383
    a         1.0061    8.5908e-255
test3 = array2table([reg(3).coeff reg(3).tStats.pVal], ...
    RowNames=reg(3).names,VariableNames=["Coeff" "pValue"])
test3=3×2 table
            Coeff         pValue   
         ___________    ___________
    c      -0.014026         0.6654
    d    -0.00013104        0.22383
    a         1.0061    8.5908e-255
The coefficients c and d are the drift and deterministic trend, respectively, in the alternative model. a is the lag 1 AR term in the null and alternative models. Although each test uses a different number of lags, all coefficient estimates are the same. This result occurs because Lags factors into the Newey-West estimate of the long-run variance, not the model itself.
Compare the Newey-West variance estimates among the tests.
reg.NWEst
ans = 1.2352e-04
ans = 1.8025e-04
ans = 2.2355e-04
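To illustrate why the number of lags changes the long-run variance estimate but not the regression coefficients, the following sketch computes a Newey-West style estimate from a fixed residual vector for 0, 1, and 2 lags. It assumes the Bartlett weights 1 − j/(q + 1) of Newey and West [4] and autocovariances scaled by 1/T. The exact scaling that pptest applies is an assumption here, and the residual vector e is hypothetical.

```matlab
% Hypothetical regression residuals (illustration only, not toolbox output).
e = [0.5; -0.2; 0.1; -0.4; 0.3; 0.0; -0.3];
T = numel(e);
maxLag = 2;

% nw(q + 1) holds the long-run variance estimate that uses q autocovariance lags.
nw = zeros(maxLag + 1,1);
for q = 0:maxLag
    lrv = (e'*e)/T;                                % lag 0 autocovariance
    for j = 1:q
        gammaj = (e(j+1:end)'*e(1:end-j))/T;       % lag j autocovariance
        lrv = lrv + 2*(1 - j/(q + 1))*gammaj;      % Bartlett-weighted contribution
    end
    nw(q + 1) = lrv;
end
```

The residuals e are the same for every q, so only the variance estimate changes across the three cases, which mirrors the identical coefficient tables and differing NWEst values in the example.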
Input Arguments
y — Univariate time series data
numeric vector
Univariate time series data, specified as a numeric vector. Each element of y represents an observation.
Data Types: double
Tbl — Time series data
table | timetable
Time series data, specified as a table or timetable. Each row of Tbl is an observation. Specify a single series (variable) to test by using the DataVariable argument. The selected variable must be numeric.
Note pptest removes missing observations, represented by NaN values, from the input series.
Name-Value Arguments
Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter. Before R2021a, use commas to separate each name and value, and enclose Name in quotes.
Example: pptest(Tbl,DataVariable="GDP",Alpha=0.025,Lags=[0 1]) conducts two tests, at a level of significance of 0.025, for the presence of a unit root in the variable GDP of the table Tbl. The first test includes 0 autocovariance lags in the Newey-West estimator of the long-run variance and the second test includes 1 autocovariance lag.
Lags — Number of autocovariance lags
0 (default) | nonnegative integer | vector of nonnegative integers
Number of autocovariance lags to include in the Newey-West estimator of the long-run variance, specified as a nonnegative integer or vector of nonnegative integers. If Lags(j) > 0, pptest includes lags 1 through Lags(j) in the estimator for test j. pptest conducts a separate test for each element in Lags.
Example: Lags=0:2 includes zero lagged autocovariance terms in the Newey-West estimator for the first test, the lag 1 autocovariance term for the second test, and autocovariance lags 1 and 2 in the third test.
Data Types: double
Model — Model variant
"AR" (default) | "ARD" | "TS" | character vector | string vector | cell vector of character vectors
Model variant, specified as a model variant name, or a string vector or cell vector of model names. This table contains the supported model variant names.
Model Variant Name    Description
"AR"     Autoregressive model variant, which specifies a test of the null model yt = yt – 1 + εt against the alternative model yt = ϕyt – 1 + εt with AR(1) coefficient |ϕ| < 1.
"ARD"    Autoregressive model with drift variant, which specifies a test of the null model yt = yt – 1 + εt against the alternative model yt = c + ϕyt – 1 + εt with drift coefficient c and AR(1) coefficient |ϕ| < 1.
"TS"     Trend-stationary model variant, which specifies a test of the null model yt = yt – 1 + εt against the alternative model yt = c + δt + ϕyt – 1 + εt with drift coefficient c, deterministic trend coefficient δ, and AR(1) coefficient |ϕ| < 1.
pptest conducts a separate test for each model variant name in Model.
Example: Model=["AR" "ARD"] uses the stationary AR model as the alternative hypothesis for the first test, and then uses the stationary AR model with drift as the alternative hypothesis for the second test.
Data Types: char | cell | string
Test — Test statistic
"t1" (default) | "t2" | character vector | string vector | cell vector of character vectors
Test statistic, specified as a test name, or a string vector or cell vector of test names. This table contains the supported test names.
Test Name    Description
"t1"     Modification of the standard t statistic
$$t_1 = \frac{\hat{\phi} - 1}{\mathrm{SE}(\hat{\phi})},$$
computed using the ordinary least squares (OLS) estimate $\hat{\phi}$ of the AR(1) coefficient and its standard error $\mathrm{SE}(\hat{\phi})$, in the alternative model.
"t2"     Modification of the unstudentized t statistic
$$t_2 = T(\hat{\phi} - 1),$$
computed using the OLS estimate of the AR(1) coefficient ϕ in the alternative model. T is the effective sample size, which is adjusted for lags and missing values. The test assesses the significance of the restriction ϕ − 1 = 0.
pptest modifies the test statistics to account for serial correlations in the innovations process εt. pptest conducts a separate test for each test name in Test.
Example: Test="t2" computes the t2 test statistic for all tests.
Data Types: char | cell | string
Alpha — Nominal significance level
0.05 (default) | numeric scalar | numeric vector
Nominal significance level for the hypothesis test, specified as a numeric scalar between 0.001 and 0.999 or a numeric vector of such values. pptest conducts a separate test for each value in Alpha.
Example: Alpha=[0.01 0.05] uses a level of significance of 0.01 for the first test, and then uses a level of significance of 0.05 for the second test.
Data Types: double
DataVariable — Variable in Tbl to test
last variable (default) | string scalar | character vector | integer | logical vector
Variable in Tbl to test, specified as a string scalar or character vector containing a variable name in Tbl.Properties.VariableNames, or an integer or logical vector representing the index of a name. The selected variable must be numeric.
Example: DataVariable="GDP"
Example: DataVariable=[false true false false] or DataVariable=2 tests the second table variable.
Data Types: double | logical | char | string
Note
• When pptest conducts multiple tests, the function applies all single settings (scalars or character vectors) to each test.
• All vector-valued specifications that control the number of tests must have equal length.
• If you specify the vector y and any value is a row vector, all outputs are row vectors.
Output Arguments h — Test rejection decisions logical scalar | logical vector Test rejection decisions, returned as a logical scalar or vector with length equal to the number of tests. pptest returns h when you supply the input y. • Values of 1 indicate rejection of the unit-root null hypothesis in favor of the alternative. • Values of 0 indicate failure to reject the unit-root null hypothesis. pValue — Test statistic p-values numeric scalar | numeric vector Test statistic p-values, returned as a numeric scalar or vector with length equal to the number of tests. pptest returns pValue when you supply the input y. The p-values are left-tail probabilities. When test statistics are outside tabulated critical values, pptest returns maximum (0.999) or minimum (0.001) p-values. stat — Test statistics numeric scalar | numeric vector Test statistics, returned as a numeric scalar or vector with length equal to the number of tests. pptest returns stat when you supply the input y. pptest computes test statistics using OLS estimates of the coefficients in the alternative model. cValue — Critical values numeric scalar | numeric vector Critical values, returned as a numeric scalar or vector with length equal to the number of tests. pptest returns cValue when you supply the input y. The critical values are for left-tail probabilities. StatTbl — Test summary table Test summary, returned as a table with variables for the outputs h, pValue, stat, and cValue, and with a row for each test. pptest returns StatTbl when you supply the input Tbl. StatTbl contains variables for the test settings specified by Lags, Alpha, Model, and Test. reg — Regression statistics structure array Regression statistics from the OLS estimation of coefficients in the alternative model, returned as a structure array with number of records equal to the number of tests. Each element of reg has the fields in this table. 
You can access a field using dot notation, for example, reg(1).coeff contains the coefficient estimates of the first test.
Field      Description
num        Length of input series with NaNs removed
size       Effective sample size, adjusted for lags
names      Regression coefficient names
coeff      Estimated coefficient values
se         Estimated coefficient standard errors
Cov        Estimated coefficient covariance matrix
tStats     t statistics of coefficients and p-values
FStat      F statistic and p-value
yMu        Mean of the lag-adjusted input series
ySigma     Standard deviation of the lag-adjusted input series
yHat       Fitted values of the lag-adjusted input series
res        Regression residuals
autoCov    Estimated residual autocovariances
NWEst      Newey-West estimator
DWStat     Durbin-Watson statistic
SSR        Regression sum of squares
SSE        Error sum of squares
SST        Total sum of squares
MSE        Mean square error
RMSE       Standard error of the regression
RSq        R² statistic
aRSq       Adjusted R² statistic
LL         Loglikelihood of data under Gaussian innovations
AIC        Akaike information criterion
BIC        Bayesian (Schwarz) information criterion
HQC        Hannan-Quinn information criterion
More About
Phillips-Perron Test
The Phillips-Perron test assesses the null hypothesis of a unit root in a univariate time series yt, where
yt = c + δt + ϕyt – 1 + εt
and
• c is the drift coefficient (see Model).
• δ is the deterministic trend coefficient (see Model).
• εt is a mean zero innovation process.
The null hypothesis of a unit root restricts ϕ = 1. The alternative hypothesis is ϕ < 1. A test that fails to reject the null hypothesis fails to rule out the possibility of a unit root.
Variants of the model allow for different growth characteristics (see Model). The model with δ = 0 has no trend component, and the model with c = 0 and δ = 0 has no drift or trend.
Tips
• To draw valid inferences from a Phillips-Perron test, you must determine a suitable value for the Lags argument. The following methods help determine a suitable value:
  • Begin by setting a small value and then evaluate the sensitivity of the results by adding more lags.
  • Inspect sample autocorrelations of yt − yt−1; slow rates of decay require more lags.
  The Newey-West estimator is consistent when the number of lags is O(T^(1/4)), where T is the effective sample size, adjusted for lags and missing values. For more details, see [9] and [5].
• With a specific testing strategy in mind, determine the value of Model by the growth characteristics of yt. If you include too many regressors (see Lags), the test loses power; if you include too few regressors, the test is biased towards favoring the null model [2]. In general, if a series grows, the "TS" model (see Model) provides a reasonable trend-stationary alternative to a unit-root process with drift. If a series does not grow, the "AR" and "ARD" models provide reasonable stationary alternatives to a unit-root process without drift. The "ARD" alternative model has a mean of c/(1 – a); the "AR" alternative model has mean 0.
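The second method in the first tip can be sketched in base MATLAB as follows. The series y is hypothetical, and the starting value floor(T^(1/4)) is only one illustrative choice of a lag count of order T^(1/4). It is an assumption of this sketch and not a rule that pptest applies.

```matlab
% Hypothetical series built by accumulating arbitrary increments.
y = cumsum([0.2; -0.1; 0.4; 0.3; -0.2; 0.1; 0.5; -0.3; 0.2; 0.1]);

% First differences y_t - y_{t-1}, centered at their sample mean.
dy = diff(y);
dy0 = dy - mean(dy);

% Sample autocorrelations of the differences at lags 1 through maxLag.
maxLag = 3;
acf = zeros(maxLag,1);
for k = 1:maxLag
    acf(k) = (dy0(k+1:end)'*dy0(1:end-k))/(dy0'*dy0);
end

% Illustrative starting point of order T^(1/4) for a sensitivity check.
startLags = floor(numel(dy)^(1/4));
```

If the values in acf decay slowly, the tip suggests rerunning the test with more lags than startLags and comparing the results.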
Algorithms
• In general, when a time series is lagged, the sample size is reduced. Without a presample, if yt is defined for t = 1,…,T, the lag k series yt–k is defined for t = k+1,…,T. Consequently, the effective sample size of the common time base is T − k.
• To account for serial correlations in the innovations process εt, pptest uses modified Dickey-Fuller statistics (see adftest).
• Phillips-Perron statistics stat follow nonstandard distributions under the null, even asymptotically. pptest uses tabulated critical values, generated by Monte Carlo simulations, for a range of sample sizes and significance levels of the null model with Gaussian innovations and five million replications per sample size. pptest interpolates critical values cValue and p-values pValue from the tables. Tables for tests of Test types "t1" and "t2" are identical to those for adftest.
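The first two points can be sketched for the "AR" variant in base MATLAB: align the series with its lag, fit the AR(1) coefficient by OLS, and form the unmodified Dickey-Fuller statistics. The series y is hypothetical, and the residual degrees of freedom used for the standard error are an assumption. The sketch stops before the Phillips-Perron corrections for serial correlation, so its values are not what pptest returns in stat.

```matlab
% Hypothetical series; values are for illustration only.
y = [1.0; 1.3; 1.1; 1.6; 1.8; 1.7; 2.1; 2.0];

% Align y_t with its lag-k series on the common time base t = k+1,...,T.
k = 1;
yt = y(k+1:end);
ylag = y(1:end-k);
Teff = numel(yt);                 % effective sample size, T - k

% OLS fit of the "AR" alternative y_t = phi*y_{t-1} + e_t (no intercept).
phiHat = ylag\yt;
res = yt - ylag*phiHat;
s2 = (res'*res)/(Teff - 1);       % residual variance, one estimated coefficient
sePhi = sqrt(s2/(ylag'*ylag));    % OLS standard error of phiHat

% Unmodified statistics; pptest applies further corrections to these forms.
t1 = (phiHat - 1)/sePhi;
t2 = Teff*(phiHat - 1);
```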
Version History Introduced in R2009b
References
[1] Davidson, R., and J. G. MacKinnon. Econometric Theory and Methods. Oxford, UK: Oxford University Press, 2004.
[2] Elder, J., and P. E. Kennedy. "Testing for Unit Roots: What Should Students Be Taught?" Journal of Economic Education. Vol. 32, 2001, pp. 137–146.
[3] Hamilton, James D. Time Series Analysis. Princeton, NJ: Princeton University Press, 1994.
[4] Newey, W. K., and K. D. West. "A Simple, Positive Semidefinite, Heteroskedasticity and Autocorrelation Consistent Covariance Matrix." Econometrica. Vol. 55, 1987, pp. 703–708.
[5] Perron, P. "Trends and Random Walks in Macroeconomic Time Series: Further Evidence from a New Approach." Journal of Economic Dynamics and Control. Vol. 12, 1988, pp. 297–332.
[6] Phillips, P. "Time Series Regression with a Unit Root." Econometrica. Vol. 55, 1987, pp. 277–301.
[7] Phillips, P., and P. Perron. "Testing for a Unit Root in Time Series Regression." Biometrika. Vol. 75, 1988, pp. 335–346.
[8] Schwert, W. "Tests for Unit Roots: A Monte Carlo Investigation." Journal of Business and Economic Statistics. Vol. 7, 1989, pp. 147–159.
[9] White, H., and I. Domowitz. "Nonlinear Regression with Dependent Observations." Econometrica. Vol. 52, 1984, pp. 143–162.
See Also adftest | kpsstest | vratiotest | lmctest Topics “Unit Root Nonstationarity” on page 3-33
price2ret Convert prices to returns
Syntax
[Returns,intervals] = price2ret(Prices)
ReturnTbl = price2ret(PriceTbl)
[ ___ ] = price2ret( ___ ,Name=Value)
Description
[Returns,intervals] = price2ret(Prices) returns the matrix of numVars continuously compounded return series Returns, and corresponding time intervals intervals, from the matrix of numVars price series Prices.
ReturnTbl = price2ret(PriceTbl) returns the table or timetable of continuously compounded return series ReturnTbl of each variable in the table or timetable of price series PriceTbl. To select different variables in PriceTbl from which to compute returns, use the DataVariables name-value argument.
[ ___ ] = price2ret( ___ ,Name=Value) specifies options using one or more name-value arguments in addition to any of the input argument combinations in previous syntaxes. price2ret returns the output argument combination for the corresponding input arguments. For example, price2ret(Tbl,Method="periodic",DataVariables=1:5) computes the simple periodic returns of the first five variables in the input table Tbl.
Examples
Compute Return Series from Price Series in Vector of Data
Load the Schwert Stock data set Data_SchwertStock.mat, which contains daily prices of the S&P index from 1930 through 2008, among other variables (enter Description for more details).
load Data_SchwertStock
numObs = height(DataTableDly)
numObs = 20838
dates = datetime(datesDly,ConvertFrom="datenum");
Convert the S&P price series to returns.
prices = DataTableDly.SP;
returns = price2ret(prices);
returns is a 20837-by-1 vector of daily S&P returns compounded continuously.
r9 = returns(9)
r9 = 0.0033
p9_10 = [prices(9) prices(10)]
p9_10 = 1×2
   21.4500   21.5200
returns(9) = 0.0033 is the daily return of the prices in the interval [21.45, 21.52].
plot(dates,DataTableDly.SP)
ylabel("Price")
yyaxis right
plot(dates(1:end-1),returns)
ylabel("Return")
title("S&P Index Prices and Returns")
Compute Simple Periodic Return Series from Table of Price Series
Convert the price series in a table to simple periodic return series.
Load the US equity indices data set, which contains the table DataTable of daily closing prices of the NYSE and NASDAQ composite indices from 1990 through 2011.
load Data_EquityIdx
Create a timetable from the table.
dates = datetime(dates,ConvertFrom="datenum");
TT = table2timetable(DataTable,RowTimes=dates);
numObs = height(TT);
Convert the NASDAQ and NYSE prices to simple periodic and continuously compounded returns.
varnames = ["NASDAQ" "NYSE"];
TTRetC = price2ret(TT,DataVariables=varnames);
TTRetP = price2ret(TT,DataVariables=varnames,Method="periodic");
Because TT is a timetable, TTRetC and TTRetP are timetables.
Plot the return series with the corresponding prices for the last 50 observations.
idx = ((numObs - 1) - 51):(numObs - 1);
figure
plot(dates(idx + 1),TT.NYSE(idx + 1))
title("NYSE Index Prices and Returns")
ylabel("Price")
yyaxis right
h = plot(dates(idx),[TTRetC.NYSE(idx) TTRetP.NYSE(idx)]);
h(2).Marker = 'o';
h(2).Color = 'k';
ylabel("Return")
legend(["Price" "Continuous" "Periodic"],Location="northwest")
axis tight
figure
plot(dates(idx + 1),TT.NASDAQ(idx + 1))
title("NASDAQ Index Prices and Returns")
ylabel("Price")
yyaxis right
h = plot(dates(idx),[TTRetC.NASDAQ(idx) TTRetP.NASDAQ(idx)]);
h(2).Marker = 'o';
h(2).Color = 'k';
ylabel("Return")
legend(["Price" "Continuous" "Periodic"],Location="northwest")
axis tight
In this case, the simple periodic and continuously compounded returns of each price series are similar.
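A short sketch shows why the two return types are close for small price changes. It uses the two S&P prices from the first example and the standard definitions of a continuously compounded return, log(p2/p1), and a simple periodic return, p2/p1 − 1, over a unit time interval. That price2ret uses exactly these forms for unit intervals is an assumption here, although the continuous value agrees with returns(9) in the first example.

```matlab
% Consecutive prices taken from the first example (prices(9) and prices(10)).
p = [21.45; 21.52];

% Continuously compounded return over one unit interval.
rC = log(p(2)/p(1));

% Simple periodic return over the same interval.
rP = p(2)/p(1) - 1;

% The gap is roughly rP^2/2, which is negligible when price changes are small.
gap = rP - rC;
```

Both values round to 0.0033. The gap grows with the size of the price change, so the two return types diverge for volatile series or long intervals.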
Specify Observation Times and Units
Create two stock price series from continuously compounded returns that have the following characteristics:
• Series 1 grows at a 10 percent rate at each observation time.
• Series 2 changes at a random uniform rate in the interval [-0.1, 0.1] at each observation time.
• Each series starts at price 100 and is 10 observations in length.
rng(1); % For reproducibility
numObs = 10;
p1 = 100;
r1 = 0.10;
r2 = [0; unifrnd(-0.10,0.10,numObs - 1,1)];
s1 = 100*exp(r1*(0:(numObs - 1))');
cr2 = cumsum(r2);
s2 = 100*exp(cr2);
S = [s1 s2];
Convert each price series to a return series, and return the observation intervals.
[R,intervals] = price2ret(S);
Pad the interval and return series with a leading NaN row so that the input and output elements are of the same length and correspond.
[[NaN; intervals] S [[NaN NaN]; R] r2]
ans = 10×6
       NaN  100.0000  100.0000       NaN       NaN         0
    1.0000  110.5171   98.3541    0.1000   -0.0166   -0.0166
    1.0000  122.1403  102.7850    0.1000    0.0441    0.0441
    1.0000  134.9859   93.0058    0.1000   -0.1000   -0.1000
    1.0000  149.1825   89.4007    0.1000   -0.0395   -0.0395
    1.0000  164.8721   83.3026    0.1000   -0.0706   -0.0706
    1.0000  182.2119   76.7803    0.1000   -0.0815   -0.0815
    1.0000  201.3753   72.1105    0.1000   -0.0627   -0.0627
    1.0000  222.5541   69.9172    0.1000   -0.0309   -0.0309
    1.0000  245.9603   68.4885    0.1000   -0.0206   -0.0206
price2ret returns rates matching the rates from the simulated series. price2ret assumes prices are recorded in a regular time base. Therefore, all durations between prices are 1.
Convert the prices to returns again, but associate the prices with years starting from August 1, 2010.
tau1 = datetime(2010,08,01);
dates = tau1 + years((0:(numObs-1))');
[Ry,intervalsy] = price2ret(S,Ticks=dates);
[[NaN; intervalsy] S [[NaN NaN]; Ry] r2]
ans = 10×6
       NaN  100.0000  100.0000       NaN       NaN         0
  365.2425  110.5171   98.3541    0.0003   -0.0000   -0.0166
  365.2425  122.1403  102.7850    0.0003    0.0001    0.0441
  365.2425  134.9859   93.0058    0.0003   -0.0003   -0.1000
  365.2425  149.1825   89.4007    0.0003   -0.0001   -0.0395
  365.2425  164.8721   83.3026    0.0003   -0.0002   -0.0706
  365.2425  182.2119   76.7803    0.0003   -0.0002   -0.0815
  365.2425  201.3753   72.1105    0.0003   -0.0002   -0.0627
  365.2425  222.5541   69.9172    0.0003   -0.0001   -0.0309
  365.2425  245.9603   68.4885    0.0003   -0.0001   -0.0206
price2ret assumes time units are days. Therefore, all durations are approximately 365 and the returns are normalized for that time unit.
Compute returns again, but specify that the observation times are years.
[Ryy,intervalsyy] = price2ret(S,Ticks=dates,Units="years");
[[NaN; intervalsyy] S [[NaN NaN]; Ryy] r2]
ans = 10×6
       NaN  100.0000  100.0000       NaN       NaN         0
    1.0000  110.5171   98.3541    0.1000   -0.0166   -0.0166
    1.0000  122.1403  102.7850    0.1000    0.0441    0.0441
    1.0000  134.9859   93.0058    0.1000   -0.1000   -0.1000
    1.0000  149.1825   89.4007    0.1000   -0.0395   -0.0395
    1.0000  164.8721   83.3026    0.1000   -0.0706   -0.0706
    1.0000  182.2119   76.7803    0.1000   -0.0815   -0.0815
    1.0000  201.3753   72.1105    0.1000   -0.0627   -0.0627
    1.0000  222.5541   69.9172    0.1000   -0.0309   -0.0309
    1.0000  245.9603   68.4885    0.1000   -0.0206   -0.0206
price2ret normalizes the returns relative to years, and now the returned rates match the simulated rates.
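The effect of the time unit can be reproduced by hand from the documented return formula. The following is a minimal sketch in base MATLAB, not the internals of price2ret. It assumes a price path that grows at a continuously compounded rate of 0.1 per year (chosen to match the first displayed price column), and the names p, tau, rDays, and rYears are illustrative.

```matlab
% Price path growing at a continuously compounded rate of 0.1 per year
% (an assumption chosen to match the first displayed price column).
numObs = 10;
p = 100*exp(0.1*(0:(numObs-1))');

% Observation times one year apart, starting August 1, 2010.
tau = datetime(2010,8,1) + years((0:(numObs-1))');

% Elapsed time between observations, expressed in two different units.
dtDays  = days(diff(tau));    % about 365.2425 for every interval
dtYears = years(diff(tau));   % about 1 for every interval

% Continuously compounded returns normalized by each unit.
rDays  = diff(log(p))./dtDays;    % per-day rate
rYears = diff(log(p))./dtYears;   % per-year rate

disp([dtDays rDays dtYears rYears])
```

Dividing by the interval in days shrinks each return by a factor of about 365, which is why the default display shows 0.0003 where the simulated annual rate is 0.1.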
Input Arguments

Prices — Time series of prices
numeric matrix

Time series of prices, specified as a numObs-by-numVars numeric matrix. Each row of Prices corresponds to an observation time specified by the optional Ticks name-value argument. Each column of Prices corresponds to an individual price series.

Data Types: double

PriceTbl — Time series of prices
table | timetable

Time series of prices, specified as a table or timetable with numObs rows. Each row of PriceTbl is an observation time. For a table, the optional Ticks name-value argument specifies observation times. For a timetable, PriceTbl.Time specifies observation times and it must be a datetime vector.

Specify numVars variables, from which to compute returns, by using the DataVariables argument. The selected variables must be numeric.

Name-Value Pair Arguments

Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter.

Before R2021a, use commas to separate each name and value, and enclose Name in quotes.

Example: price2ret(Tbl,Method="periodic",DataVariables=1:5) computes the simple periodic returns of the first five variables in the input table Tbl.
Ticks — Observation times τ
numeric vector | datetime vector

Observation times τ, specified as a length numObs numeric or datetime vector of increasing values.

When the input price series are in a matrix or table, the default is 1:numObs.

When the input price series are in a timetable, price2ret uses the row times in PriceTbl.Time and ignores Ticks. PriceTbl.Time must be a datetime vector.

Example: Ticks=datetime(1950:2020,12,31) specifies the end of each year from 1950 through 2020.

Example: Ticks=datetime(1950,03,31):calquarters(1):datetime(2020,12,31) specifies the end of each quarter during the years 1950 through 2020.

Data Types: double | datetime

Units — Time units
"days" (default) | "milliseconds" | "seconds" | "minutes" | "hours" | "years" | character vector

Time units to use when observation times Ticks are datetimes, specified as a value in this table.

    Value             Description
    "milliseconds"    Milliseconds
    "seconds"         Seconds
    "minutes"         Minutes
    "hours"           Hours
    "days"            Days
    "years"           Years
price2ret requires time units to convert duration intervals to numeric values for normalizing returns.

When the value of the Ticks name-value argument is a numeric vector, price2ret ignores the value of Units.

Example: Units="years"

Data Types: char | string

Method — Compounding method
"continuous" (default) | "periodic" | character vector

Compounding method, specified as a value in this table.

    Value           Description
    "continuous"    Compute continuously compounded returns
    "periodic"      Compute simple periodic returns

Example: Method="periodic"

Data Types: char | string
DataVariables — Variables in PriceTbl
all variables (default) | string vector | cell vector of character vectors | vector of integers | logical vector

Variables in PriceTbl, from which price2ret computes returns, specified as a string vector or cell vector of character vectors containing variable names in PriceTbl.Properties.VariableNames, or an integer or logical vector representing the indices of names. The selected variables must be numeric.

Example: DataVariables=["GDP" "CPI"]

Example: DataVariables=[true true false false] or DataVariables=[1 2] selects the first and second table variables.

Data Types: double | logical | char | cell | string
Output Arguments

Returns — Return series
numeric matrix

Return series, returned as a (numObs – 1)-by-numVars numeric matrix. price2ret returns Returns when you supply the input Prices.

Returns in row i, $r_i$, are associated with price interval $[p_i, p_{i+1}]$, i = 1:(numObs - 1), according to the compounding method Method:

• When Method is "continuous", $r_i = \dfrac{\log\left(p_{i+1}/p_i\right)}{\tau_{i+1} - \tau_i}$.
• When Method is "periodic", $r_i = \dfrac{p_{i+1}/p_i - 1}{\tau_{i+1} - \tau_i}$.

When observation times τ (see Ticks) are datetimes, the magnitude of the normalizing interval $\tau_{i+1} - \tau_i$ depends on the specified time units (see Units).

intervals — Time intervals between observations
numeric vector

Time intervals between observations $\tau_{i+1} - \tau_i$, returned as a length numObs – 1 numeric vector. price2ret returns intervals when you supply the input Prices.

When observation times (see Ticks) are datetimes, interval magnitudes depend on the specified time units (see Units).

ReturnTbl — Return series and time intervals
table | timetable

Return series and time intervals, returned as a table or timetable, the same data type as PriceTbl, with numObs – 1 rows. price2ret returns ReturnTbl when you supply the input PriceTbl.

ReturnTbl contains the outputs Returns and intervals.

ReturnTbl associates observation time $\tau_{i+1}$ with the end of the interval for the returns in row i, $r_i$.
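The two compounding formulas can be checked by hand on a small example. The sketch below uses base MATLAB only and hypothetical prices and irregularly spaced numeric observation times. It applies the documented formulas directly and is not the implementation of price2ret.

```matlab
p   = [100; 110; 121];   % hypothetical prices
tau = [0; 1; 3];         % hypothetical numeric observation times

dt    = diff(tau);                 % intervals between observations: [1; 2]
ratio = p(2:end)./p(1:end-1);      % gross price ratios p(i+1)/p(i)

rCon = log(ratio)./dt;             % Method "continuous"
rPer = (ratio - 1)./dt;            % Method "periodic"

disp(table(dt,rCon,rPer))
```

Both price ratios equal 1.1, but the second interval is twice as long, so the second return in each column is half the first. If Econometrics Toolbox is available, price2ret(p,Ticks=tau) and price2ret(p,Ticks=tau,Method="periodic") should agree with rCon and rPer, given the formulas above.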
Algorithms

Consider the following variables:

• p is a price series (Prices).
• r is the corresponding return series (Returns).
• τ is a vector of the observation times (Ticks).
• δ is the series of lengths between observation times (intervals).

The following figure shows how the inputs and outputs are associated.
Version History

Introduced before R2006a

price2ret supports name-value argument syntax for all optional inputs
Behavior changed in R2022a

price2ret accepts the observation times ticktimes and compounding method method as the name-value arguments Ticks and Method, respectively. However, the function will continue to accept the previous syntax.

The syntax before R2022a is

    price2ret(Prices,ticktimes,method)

The recommended syntax for R2022a and later releases is

    price2ret(Prices,Ticks=ticktimes,Method=method)

As with any set of name-value arguments, you can specify them in any order.
See Also ret2price | tick2ret Topics “Returns with Negative Prices”
print (To be removed) Display parameter estimation results for conditional variance models Note print will be removed in a future release. Use summarize instead.
Syntax print(Mdl,EstParamCov)
Description print(Mdl,EstParamCov) displays parameter estimates, standard errors, and t statistics for the fitted conditional variance model Mdl, with estimated parameter variance-covariance matrix EstParamCov. Mdl can be a garch, egarch, or gjr model.
Examples

Print GARCH Estimation Results

Print the results from estimating a GARCH model using simulated data.

Simulate data from a GARCH(1,1) model with known parameter values.

    Mdl0 = garch('Constant',0.01,'GARCH',0.8,'ARCH',0.14)

    Mdl0 =
      garch with properties:

         Description: "GARCH(1,1) Conditional Variance Model (Gaussian Distribution)"
        Distribution: Name = "Gaussian"
                   P: 1
                   Q: 1
            Constant: 0.01
               GARCH: {0.8} at lag [1]
                ARCH: {0.14} at lag [1]
              Offset: 0

    rng 'default';
    [V,Y] = simulate(Mdl0,100);
Fit a GARCH(1,1) model to the simulated data, turning off the print display.

    Mdl = garch(1,1);
    [EstMdl,EstParamCov] = estimate(Mdl,Y,'Display','off');

Print the estimation results.

    print(EstMdl,EstParamCov)

    Warning: PRINT will be removed in a future release; use SUMMARIZE instead.

        GARCH(1,1) Conditional Variance Model:
        ----------------------------------------
        Conditional Probability Distribution: Gaussian

                                      Standard          t
         Parameter       Value          Error       Statistic
        -----------   -----------   ------------   -----------
         Constant      0.0167004     0.0165077       1.01167
         GARCH{1}        0.77263     0.0776905       9.94498
          ARCH{1}       0.191686     0.0750675       2.55351
Print EGARCH Estimation Results

Print the results from estimating an EGARCH model using simulated data.

Simulate data from an EGARCH(1,1) model with known parameter values.

    Mdl0 = egarch('Constant',0.01,'GARCH',0.8,'ARCH',0.14,...
        'Leverage',-0.1);
    rng 'default';
    [V,Y] = simulate(Mdl0,100);

Fit an EGARCH(1,1) model to the simulated data, turning off the print display.

    Mdl = egarch(1,1);
    [EstMdl,EstParamCov] = estimate(Mdl,Y,'Display','off');

Print the estimation results.

    print(EstMdl,EstParamCov)

    Warning: PRINT will be removed in a future release; use SUMMARIZE instead.

        EGARCH(1,1) Conditional Variance Model:
        --------------------------------------
        Conditional Probability Distribution: Gaussian

                                      Standard          t
         Parameter       Value          Error       Statistic
        -----------   -----------   ------------   -----------
         Constant      0.0654887     0.0746316      0.877494
         GARCH{1}       0.858069      0.154361       5.55886
          ARCH{1}        0.27702      0.171036       1.61966
         Leverage{1}   -0.179034      0.125057      -1.43162
Print GJR Estimation Results

Print the results from estimating a GJR model using simulated data.

Simulate data from a GJR(1,1) model with known parameter values.

    Mdl0 = gjr('Constant',0.01,'GARCH',0.8,'ARCH',0.14,...
        'Leverage',0.1);
    rng 'default';
    [V,Y] = simulate(Mdl0,100);

Fit a GJR(1,1) model to the simulated data, turning off the print display.

    Mdl = gjr(1,1);
    [EstMdl,EstParamCov] = estimate(Mdl,Y,'Display','off');

Print the estimation results.

    print(EstMdl,EstParamCov)

    Warning: PRINT will be removed in a future release; use SUMMARIZE instead.

        GJR(1,1) Conditional Variance Model:
        --------------------------------------
        Conditional Probability Distribution: Gaussian

                                      Standard          t
         Parameter       Value          Error       Statistic
        -----------   -----------   ------------   -----------
         Constant       0.194785      0.254198      0.766271
         GARCH{1}        0.69954       0.11266       6.20928
          ARCH{1}       0.192965     0.0931335       2.07192
         Leverage{1}    0.214988      0.223923        0.9601
Input Arguments

Mdl — Conditional variance model
garch model object | egarch model object | gjr model object

Conditional variance model without any unknown parameters, specified as a garch, egarch, or gjr model object.

Mdl is usually the estimated conditional variance model returned by estimate.

EstParamCov — Estimated parameter variance-covariance matrix
numeric matrix

Estimated parameter variance-covariance matrix, specified as a numeric matrix.

EstParamCov is usually the estimated parameter variance-covariance matrix returned by estimate.

The rows and columns associated with estimated parameters contain the covariances. The standard errors of the parameter estimates are the square roots of the entries along the main diagonal. The rows and columns associated with any parameters held fixed as equality constraints during estimation contain 0s.

The order of the parameters in EstParamCov must be:

• Constant
• Nonzero GARCH coefficients at positive lags
• Nonzero ARCH coefficients at positive lags
• For EGARCH and GJR models, nonzero leverage coefficients at positive lags
• Degrees of freedom (t innovation distribution only)
• Offset (models with nonzero offset only)

Data Types: double
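The relationship between the covariance matrix, the standard errors, and the t statistics can be sketched in base MATLAB. The estimates and standard errors below are copied from the GARCH(1,1) display in the first example. The covariance matrix covDemo is an assumption built for illustration with zero off-diagonal entries. The actual EstParamCov returned by estimate generally has nonzero covariances.

```matlab
% Estimates and standard errors copied from the GARCH(1,1) display.
names = ["Constant"; "GARCH{1}"; "ARCH{1}"];
value = [0.0167004; 0.77263; 0.191686];
se    = [0.0165077; 0.0776905; 0.0750675];

% Illustrative covariance matrix (assumption: zero off-diagonal entries).
covDemo = diag(se.^2);

% Standard errors are the square roots of the main diagonal.
stdErr = sqrt(diag(covDemo));

% Each t statistic is the estimate divided by its standard error.
tStat = value./stdErr;

disp(table(names,value,stdErr,tStat))
```

A parameter held fixed during estimation has a zero row and column, so its standard error is 0 and the ratio is not a meaningful t statistic.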
Version History Introduced in R2012a
See Also Objects garch | egarch | gjr Functions estimate | summarize
print (To be removed) Display parameter estimation results for ARIMA or ARIMAX models Note print will be removed in a future release. Use summarize instead.
Syntax print(EstMdl,EstParamCov)
Description print(EstMdl,EstParamCov) displays parameter estimates, standard errors, and t statistics for a fitted ARIMA or ARIMAX model.
Input Arguments

EstMdl
arima model estimated using estimate.

EstParamCov
Estimation error variance-covariance matrix, as output by estimate. EstParamCov is a square matrix with a row and column for each parameter known to the optimizer when Mdl was fit by estimate. Known parameters include all parameters estimate estimated. If you specified a parameter as fixed during estimation, then it is also a known parameter and the rows and columns associated with it contain 0s.

The parameters in EstParamCov are ordered as follows:

• Constant
• Nonzero AR coefficients at positive lags
• Nonzero SAR coefficients at positive lags
• Nonzero MA coefficients at positive lags
• Nonzero SMA coefficients at positive lags
• Regression coefficients (when EstMdl contains them)
• Variance parameters (scalar for constant-variance models, or a vector of parameters for a conditional variance model)
• Degrees of freedom (t innovation distribution only)

Examples

Print ARIMA Estimation Results

Print the results from estimating an ARIMA model using simulated data.
Simulate data from an ARMA(1,1) model using known parameter values.

    MdlSim = arima('Constant',0.01,'AR',0.8,'MA',0.14,...
        'Variance',0.1);
    rng 'default';
    Y = simulate(MdlSim,100);

Fit an ARMA(1,1) model to the simulated data, turning off the print display.

    Mdl = arima(1,0,1);
    [EstMdl,EstParamCov] = estimate(Mdl,Y,'Display','off');

Print the estimation results.

    print(EstMdl,EstParamCov)

    Warning: PRINT will be removed in a future release; use SUMMARIZE instead.

        ARIMA(1,0,1) Model:
        --------------------
        Conditional Probability Distribution: Gaussian

                                      Standard          t
         Parameter       Value          Error       Statistic
        -----------   -----------   ------------   -----------
         Constant      0.0445373     0.0460376      0.967412
            AR{1}       0.822892     0.0711631       11.5635
            MA{1}        0.12032      0.101817       1.18173
         Variance       0.133727     0.0178793        7.4794
Print ARIMAX Estimation Results

Print the results of estimating an ARIMAX model.

Load the Credit Defaults data set, assign the response IGD to y and the predictors AGE, CPF, and SPR to the matrix X, and obtain the sample size T. To avoid distraction from the purpose of this example, assume that all predictor series are stationary.

    load Data_CreditDefaults
    X = Data(:,[1 3:4]);
    T = size(X,1);
    y = Data(:,5);

Separate the initial values from the main response and predictor series.

    y0 = y(1);
    yEst = y(2:T);
    XEst = X(2:end,:);

Set the ARIMAX(1,0,0) model $y_t = c + \phi_1 y_{t-1} + \varepsilon_t$ to MdlY to fit to the data.

    MdlY = arima(1,0,0);

Fit the model to the data and specify the initial values.

    [EstMdl,EstParamCov] = estimate(MdlY,yEst,'X',XEst,...
        'Y0',y0,'Display','off');

Print the estimation results.

    print(EstMdl,EstParamCov)

    Warning: PRINT will be removed in a future release; use SUMMARIZE instead.

        ARIMAX(1,0,0) Model:
        ---------------------
        Conditional Probability Distribution: Gaussian

                                      Standard          t
         Parameter       Value          Error       Statistic
        -----------   -----------   ------------   -----------
         Constant      -0.204768      0.266078     -0.769578
            AR{1}     -0.0173091      0.565618     -0.030602
          Beta(1)      0.0239329     0.0218417       1.09574
          Beta(2)     -0.0124602    0.00749917      -1.66154
          Beta(3)      0.0680871     0.0745041      0.913871
         Variance     0.00539463    0.00224393        2.4041

See Also

Objects
arima

Functions
estimate | summarize
print Class: regARIMA (To be removed) Display estimation results for regression models with ARIMA errors Note print will be removed in a future release. Use summarize instead.
Syntax print(Mdl,ParamCov)
Description print(Mdl,ParamCov) displays parameter estimates, standard errors, and t statistics for the fitted regression model with ARIMA time series errors Mdl.
Input Arguments

Mdl — Regression model with ARIMA errors
regARIMA model

Regression model with ARIMA errors, specified as a regARIMA model returned by regARIMA or estimate.

ParamCov — Estimation error variance-covariance
numeric matrix

Estimation error variance-covariance, specified as a numeric matrix.

ParamCov is a square matrix with a row and column for each parameter known to the optimizer that estimate uses to fit Mdl. Known parameters include all parameters estimate estimates. If you specify a parameter as fixed during estimation, then it is also a known parameter and the rows and columns associated with it contain 0s.

print omits coefficients of lag operator polynomials at lags excluded from Mdl.

print orders the parameters in ParamCov as follows:

• Intercept
• Nonzero AR coefficients at positive lags
• Nonzero SAR coefficients at positive lags
• Nonzero MA coefficients at positive lags
• Nonzero SMA coefficients at positive lags
• Regression coefficients (when Mdl contains them)
• Variance
• Degrees of freedom for the t-distribution

Data Types: double
Examples

Print Estimation Results of a Regression Model with ARIMA Errors

Regress GDP onto CPI using a regression model with ARMA(1,1) errors, and print the results.

Load the US Macroeconomic data set and preprocess the data.

    load Data_USEconModel;
    logGDP = log(DataTable.GDP);
    dlogGDP = diff(logGDP);
    dCPI = diff(DataTable.CPIAUCSL);

Fit the model to the data.

    Mdl = regARIMA('ARLags',1,'MALags',1);
    [EstMdl,EstParamCov] = estimate(Mdl,dlogGDP,'X',...
        dCPI,'Display','off');

Print the estimates.

    print(EstMdl,EstParamCov)

    Warning: PRINT will be removed in a future release; use SUMMARIZE instead.

        Regression with ARIMA(1,0,1) Error Model:
        ------------------------------------------
        Conditional Probability Distribution: Gaussian

                                      Standard          t
         Parameter       Value          Error       Statistic
        -----------   -----------   ------------   -----------
         Intercept      0.014776    0.00146271       10.1018
            AR{1}       0.605274     0.0892902       6.77872
            MA{1}      -0.161651       0.10956      -1.47546
            Beta1     0.00204403   0.000706163       2.89456
         Variance    9.35782e-05   6.03135e-06       15.5153

See Also

Objects
regARIMA

Functions
estimate | summarize
recessionplot Overlay recession bands on time series plot
Syntax

    recessionplot
    recessionplot(Name,Value)
    hBands = recessionplot( ___ )
Description recessionplot overlays shaded US recession bands, as reported by the National Bureau of Economic Research (NBER) [1], on a time series plot in the current axes. Abscissa data must represent dates created by datenum or datetime. recessionplot(Name,Value) uses additional options specified by one or more name-value arguments. For example, recessionplot('recessions',recessionPeriods) specifies overlaying shaded bands for the recession periods in recessionPeriods. hBands = recessionplot( ___ ) returns a vector of handles to the recession bands, using any of the input-argument combinations in the previous syntaxes.
Examples Overlay Recession Bands on Time Series Plot Load the Data_Unemployment.mat data set, which contains a monthly US unemployment rate series measured from 1954 through 1998. load Data_Unemployment
The variables Data and dates, among others, appear in the workspace. For more details on the data, enter Description.

Data is a 45-by-12 matrix of the unemployment rates. The rows of Data correspond to successive years and its columns correspond to successive months; Data(j,k) is the unemployment rate in month k of year j. Represent Data as a vector of regular time series data by transposing the matrix, and then vertically concatenating the columns of the result.

    Data = Data';
    un = Data(:);

dates is a numeric vector of the 45 consecutive sampling years. Create a datetime vector that expands dates by including all months within each year.

    Y = repmat(dates',12,1);
    Y = Y(:);
    M = repmat((1:12)',length(dates),1);
    D = ones(length(un),1);
    t = datetime(Y,M,D);
Alternatively, you can use the calmonths function to efficiently include all months within each year. tspan = datetime([dates(1); dates(end)],[1; 12],[1; 1]); t = (tspan(1):calmonths(1):tspan(2))';
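As a quick check that the two constructions produce the same monthly time base, the following base MATLAB sketch builds both vectors and compares them. It assumes dates is the column vector of years 1954 through 1998 instead of loading the data set, and the names tRepmat and tCal are illustrative.

```matlab
dates = (1954:1998)';      % assumed sampling years (stand-in for the data set)

% Construction 1: replicate years and months explicitly.
Y = repmat(dates',12,1);
Y = Y(:);                                  % each year repeated 12 times
M = repmat((1:12)',length(dates),1);       % months 1..12 for every year
D = ones(numel(Y),1);                      % first day of each month
tRepmat = datetime(Y,M,D);

% Construction 2: step through calendar months between the end points.
tspan = datetime([dates(1); dates(end)],[1; 12],[1; 1]);
tCal = (tspan(1):calmonths(1):tspan(2))';

sameDates = isequal(tRepmat,tCal);
disp(sameDates)
```

The calmonths version avoids building the intermediate Y, M, and D vectors and does not depend on the length of the data vector.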
Plot the unemployment rate series. Overlay bands for recession periods reported by NBER.

    figure
    plot(t,un)
    recessionplot
    ylabel('Rate (%)')
    title("Unemployment Rate")
Periods of recession appear to occur with sudden, relatively large increases in the unemployment rate.
Change Color and Transparency of Recession Bands

Overlay recession bands on a time series plot, then return the handles of the recession bands to change the color and opacity of the bands.

Load the Data_CreditDefaults.mat data set, which contains a credit default rate series and several predictor series measured annually from 1984 through 2004.

    load Data_CreditDefaults

The variables Data and dates, among others, appear in the workspace. For more details on the data, enter Description.

Data is a 21-by-5 numeric matrix containing the series. Extract the predictor series, which comprise the first four columns.

    X = Data(:,1:4);

dates is a numeric vector containing the 21 sampling years. Convert dates to a datetime vector of years. Assume the series are measured at the end of the year.

    T = numel(dates);
    dates = [dates [12 31].*ones(T,2)];
    dates = datetime(dates);

Plot the predictor series. Overlay recession bands and return the handles to the bands. Change the band color to red and reduce the opacity to 0.1.

    figure;
    plot(dates,X,'LineWidth',2);
    xlabel("Year");
    ylabel("Level");
    hBands = recessionplot;
    set(hBands,'FaceColor',"r",'FaceAlpha',0.1);
Input Arguments

Name-Value Pair Arguments

Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter.

Before R2021a, use commas to separate each name and value, and enclose Name in quotes.

Example: 'recessions',recessionPeriods specifies overlaying shaded bands for the recession periods in recessionPeriods.

axes — Axes on which to plot
Axes object

Axes on which to overlay recession bands, specified as an Axes object. The target axes must contain a time series plot with serial dates on the horizontal axis.

By default, recessionplot plots to the current axes (gca).

recessions — Recession periods
Data_Recessions.mat (default) | matrix of serial date numbers | datetime matrix

Recession periods, or data indicating the beginning and end of historical recessions, specified as a numRecessions-by-2 matrix of serial date numbers or datetime entries. Each row is a period of recession, with the first column indicating the beginning of the recession and the second column indicating the end of the recession.

The default is the US recession data in Data_Recessions.mat, reported by NBER [1].
Output Arguments hBands — Handles to plotted graphics objects graphics vector Handles to plotted graphics objects, returned as a graphics vector. hBands contains unique plot identifiers, which you can use to query or modify properties of the recession bands.
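The following sketch shows one way to pass custom recession periods and then restyle the returned bands. The series y and the time base t are hypothetical, and the day-of-month choices in recessionPeriods are assumptions around the NBER peak and trough months for the 2001 and 2007 to 2009 US recessions. The plotting call is guarded so that the rest of the sketch also runs where Econometrics Toolbox is not installed.

```matlab
% Hypothetical monthly series for illustration.
t = (datetime(2000,1,1):calmonths(1):datetime(2010,12,1))';
y = cumsum(sin((1:numel(t))'/6));

% Each row is one recession period: [start end].
% Day-of-month values are assumptions made for this sketch.
recessionPeriods = [datetime(2001,3,1)  datetime(2001,11,30); ...
                    datetime(2007,12,1) datetime(2009,6,30)];

% Call recessionplot only if the function is on the path.
if ~isempty(which('recessionplot'))
    figure
    plot(t,y)
    hBands = recessionplot('recessions',recessionPeriods);
    set(hBands,'FaceColor',"r",'FaceAlpha',0.1);   % restyle the bands
end
```

Supplying the periods explicitly is useful for data from another country or for alternative recession dating, because the default bands are the US recessions in Data_Recessions.mat.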
Tips • recessionplot requires datetime values or serial date numbers on the horizontal axis of a time series plot. To convert other date information to this format before plotting, use datetime or datenum. • To achieve satisfactory displays on certain monitors and projectors, change the color and opacity of the recession bands by setting the FaceColor and FaceAlpha properties of the output handles.
Version History

Introduced in R2012a

References

[1] National Bureau of Economic Research (NBER), Business Cycle Expansions and Contractions, https://www.nber.org/research/data/us-business-cycle-expansions-and-contractions.
See Also datenum | datetime Topics “Dates and Time” “Represent Dates and Times in MATLAB” “Convert Between Text and datetime or duration Values”
recreg Recursive linear regression
Syntax

    [Coeff,SE] = recreg(X,y)
    [CoeffTbl,SETbl] = recreg(Tbl)
    ___ = recreg( ___ ,Name=Value)
    recreg( ___ )
    ___ = recreg(ax, ___ )
    [ ___ ,coeffPlots] = recreg( ___ )
Description

recreg recursively estimates coefficients (β) and their standard errors in a multiple linear regression model of the form y = Xβ + ε by performing successive regressions using nested or rolling windows. recreg has options for OLS, HAC, and FGLS estimates, and for iterative plots of the estimates.

[Coeff,SE] = recreg(X,y) returns a matrix of regression coefficient estimates Coeff and a corresponding matrix of standard error estimates SE from recursive regressions of the multiple linear regression model y = Xβ + ε.

[CoeffTbl,SETbl] = recreg(Tbl) returns regression coefficient estimates in the table CoeffTbl, and standard error estimates in the table SETbl, from a recursive regression on the linear model of the variables in the table or timetable Tbl. The response variable in the regression is the last table variable, and all other variables are the predictor variables. To select a different response variable for the regression, use the ResponseVariable name-value argument. To select different predictor variables, use the PredictorNames name-value argument.

___ = recreg( ___ ,Name=Value) specifies options using one or more name-value arguments in addition to any of the input argument combinations in previous syntaxes. recreg returns the output argument combination for the corresponding input arguments. For example, recreg(Tbl,ResponseVariable="GDP",Intercept=false,Estimator="fgls") excludes an intercept term from the regression model, in which the response variable is the variable GDP in the table Tbl, and uses FGLS to estimate coefficients and standard errors.

recreg( ___ ) plots iterative coefficient estimates with ±2 standard error bands for each coefficient in the multiple linear regression model.

___ = recreg(ax, ___ ) plots on the axes specified in ax instead of the axes of new figures. The option ax can precede any of the input argument combinations in the previous syntaxes.

[ ___ ,coeffPlots] = recreg( ___ ) additionally returns handles to plotted graphics objects. Use elements of coeffPlots to modify properties of the plots after you create them.
Examples
Inspect Consumption Model Coefficients for Structural Change

Check coefficient estimates for instability in a model of food demand around World War II. Implement forward and backward recursive regressions in a rolling window.

Load the US food consumption data set, which contains annual measurements from 1927 through 1962 with missing data due to WWII.

    load Data_Consumption

For more details on the data, enter Description at the command prompt.

Plot the series.

    P = Data(:,1); % Food price index
    I = Data(:,2); % Disposable income index
    Q = Data(:,3); % Food consumption index

    figure
    plot(dates,[P I Q])
    axis tight
    grid on
    xlabel("Year")
    ylabel("Index")
    title("\bf Time Series Plot of All Series")
    legend("Price","Income","Consumption",Location="southeast")

Measurements are missing from 1942 through 1947, which correspond to WWII.
To examine elasticities, apply the log transformation to each series.

    LP = log(P);
    LI = log(I);
    LQ = log(Q);

Consider a model in which log consumption is a linear function of the logs of food price and income. In other words,

$$LQ_t = \beta_0 + \beta_1 LI_t + \beta_2 LP_t + \varepsilon_t.$$

$\varepsilon_t$ is a Gaussian random variable with mean 0 and variance $\sigma^2$.

Identify the breakpoint index at the end of WWII, 1945. Ignore years with missing data.

    numCoeff = 4; % Three predictors and an intercept
    T = numel(dates(~isnan(P))); % Sample size
    bpIdx = find(dates(~isnan(P)) >= 1945,1) - numCoeff

    bpIdx = 12

The 12th iteration corresponds to the end of the war.

Plot forward recursive-regression coefficient estimates using a rolling window 1/4 of the sample size. Plot only the coefficients of LP and LI, in the same figure.

    X = [LP LI];
    y = LQ;
    varnames = ["Log-price" "Log-income"];
    plotvars = [false true true];
    window = ceil(T*1/4);
    recreg(X,y,Window=window,Plot="combined",PlotVars=plotvars, ...
        VarNames=varnames);
Plot forward recursive-regression coefficient estimates using a rolling window 1/3 of the sample size.

    window = ceil(T*1/3);
    recreg(X,y,Window=window,Plot="combined",PlotVars=plotvars, ...
        VarNames=varnames);

Plot forward recursive-regression coefficient estimates using a rolling window 1/2 of the sample size.

    window = ceil(T*1/2);
    recreg(X,y,Window=window,Plot="combined",PlotVars=plotvars, ...
        VarNames=varnames);
As the window size increases, the lines show less volatility, but the coefficients do exhibit instability.
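The rolling-window idea behind these plots can be sketched in base MATLAB. The loop below is an illustration of forward rolling-window OLS on hypothetical, noise-free data. It is not the algorithm that recreg implements, and the names betaTrue, numIter, and Coeff are illustrative.

```matlab
% Hypothetical data: y = 1 + 2*x1 - 0.5*x2 with no noise, so every
% window recovers the same coefficients.
T  = 30;
x1 = (1:T)';
x2 = sin((1:T)');
X  = [ones(T,1) x1 x2];          % design matrix with an intercept column
betaTrue = [1; 2; -0.5];
y  = X*betaTrue;

window  = ceil(T*1/4);           % window length, as in the example above
numIter = T - window + 1;        % number of window positions in this sketch

Coeff = zeros(size(X,2),numIter);
for k = 1:numIter
    idx = k:(k + window - 1);            % rows in the current window
    Coeff(:,k) = X(idx,:) \ y(idx);      % OLS fit on the window
end

plot(Coeff')
legend("Intercept","x1","x2")
```

With stable coefficients, each row of Coeff is flat across iterations. A structural change in the data would instead appear as a drift or jump in the plotted lines as the window passes over the breakpoint.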
Inspect Real US GNP Model for Instability Apply recursive regressions using nested windows to look for instability in an explanatory model of real GNP for a period spanning World War II. Load the Nelson-Plosser data set. load Data_NelsonPlosser
The time series in the data set contain annual, macroeconomic measurements from 1860 to 1970. For more details, a list of variables, and descriptions, enter Description in the command line. Several series have missing data. Focus the sample to measurements from 1915 to 1970. Identify the index corresponding to 1945, the end of WWII, to use as a breakpoint for the test. span = (1915 = 1) (abs(theta(2)) >= 1) ... (theta(3) < 0) (theta(4) < 0)]; if(sum(paramconstraints)) logprior = -Inf; else
        mu0 = 0.5*ones(numel(theta),1);
        sigma0 = 1;
        p = normpdf(theta,mu0,sigma0);
        logprior = sum(log(p));
    end
end
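Simulate Parameters from Posterior of Time-Varying Model

Consider the following time-varying, state-space model for a DGP:

• From periods 1 through 250, the state equation includes stationary AR(2) and MA(1) models, respectively, and the observation model is the weighted sum of the two states.
• From periods 251 through 500, the state model includes only the first AR(2) model.
• $\mu_0 = \begin{bmatrix} 0.5 & 0.5 & 0 & 0 \end{bmatrix}'$ and $\Sigma_0$ is the identity matrix.

Symbolically, the DGP is

$$\begin{bmatrix} x_{1t} \\ x_{2t} \\ x_{3t} \\ x_{4t} \end{bmatrix} = \begin{bmatrix} \phi_1 & \phi_2 & 0 & 0 \\ 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & \theta \\ 0 & 0 & 0 & 0 \end{bmatrix} \begin{bmatrix} x_{1,t-1} \\ x_{2,t-1} \\ x_{3,t-1} \\ x_{4,t-1} \end{bmatrix} + \begin{bmatrix} \sigma_1 & 0 \\ 0 & 0 \\ 0 & 1 \\ 0 & 1 \end{bmatrix} \begin{bmatrix} u_{1t} \\ u_{2t} \end{bmatrix} \quad \text{for } t = 1,\ldots,250,$$

$$y_t = c_1\left(x_{1t} + x_{3t}\right) + \sigma_2 \varepsilon_t.$$

$$\begin{bmatrix} x_{1t} \\ x_{2t} \end{bmatrix} = \begin{bmatrix} \phi_1 & \phi_2 & 0 & 0 \\ 1 & 0 & 0 & 0 \end{bmatrix} \begin{bmatrix} x_{1,t-1} \\ x_{2,t-1} \\ x_{3,t-1} \\ x_{4,t-1} \end{bmatrix} + \begin{bmatrix} \sigma_1 \\ 0 \end{bmatrix} u_{1t} \quad \text{for } t = 251,$$

$$y_t = c_2 x_{1t} + \sigma_3 \varepsilon_t.$$

$$\begin{bmatrix} x_{1t} \\ x_{2t} \end{bmatrix} = \begin{bmatrix} \phi_1 & \phi_2 \\ 1 & 0 \end{bmatrix} \begin{bmatrix} x_{1,t-1} \\ x_{2,t-1} \end{bmatrix} + \begin{bmatrix} \sigma_1 \\ 0 \end{bmatrix} u_{1t} \quad \text{for } t = 252,\ldots,500,$$

$$y_t = c_2 x_{1t} + \sigma_3 \varepsilon_t,$$

where:

• The AR(2) parameters $\{\phi_1, \phi_2\} = \{0.5, -0.2\}$ and $\sigma_1 = 0.4$.
• The MA(1) parameter $\theta = 0.3$.
• The observation equation parameters $\{c_1, c_2\} = \{2, 3\}$ and $\{\sigma_2, \sigma_3\} = \{0.1, 0.2\}$.

Simulate a response path of length 500 from the model. The function timeVariantParamMapBayes.m, stored in matlabroot/examples/econ/main, specifies the state-space model structure.

    params = [0.5; -0.2; 0.4; 0.3; 2; 0.1; 3; 0.2];
    numObs = 500;
    numParams = numel(params);
    [A,B,C,D,mean0,Cov0,stateType] = timeVariantParamMapBayes(params,numObs);
    DGP = ssm(A,B,C,D,Mean0=mean0,Cov0=Cov0,StateType=stateType);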
    rng(1) % For reproducibility
    y = simulate(DGP,numObs);
    plot(y)
    ylabel("y")
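To make the regime change concrete, the DGP equations can also be simulated directly in base MATLAB, without ssm. This is a minimal sketch under stated assumptions: the initial state is set to its mean μ0 rather than drawn from its distribution, the dropped MA(1) states are held at zero after period 250, and the variable names are illustrative. It does not reproduce the random draws of simulate.

```matlab
rng(1)                                  % for reproducibility of this sketch
phi = [0.5 -0.2];  sigma1 = 0.4;        % AR(2) parameters
theta = 0.3;                            % MA(1) parameter
c = [2 3];  sigmaObs = [0.1 0.2];       % observation parameters per regime
numObs = 500;  tBreak = 250;

% State-transition and disturbance-loading matrices for periods 1..250.
A = [phi 0 0; 1 0 0 0; 0 0 0 theta; 0 0 0 0];
B = [sigma1 0; 0 0; 0 1; 0 1];

x = [0.5; 0.5; 0; 0];                   % start at mu0 (assumption)
y = zeros(numObs,1);
for t = 1:numObs
    if t <= tBreak
        x = A*x + B*randn(2,1);                          % AR(2) and MA(1) states
        y(t) = c(1)*(x(1) + x(3)) + sigmaObs(1)*randn;
    else
        x = [A(1:2,:)*x + [sigma1; 0]*randn; 0; 0];      % AR(2) states only
        y(t) = c(2)*x(1) + sigmaObs(2)*randn;
    end
end

plot(y)
ylabel("y")
```

In period 251 the first two rows of A act on the full four-dimensional lagged state, which matches the 2-by-4 transition matrix in the equations. From period 252 onward only the AR(2) block matters because the remaining states are zero.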
Create a time-varying, Bayesian state-space model that uses the structure of the DGP, but all parameters are unknown and the prior density is flat (the function flatPriorBSSM.m in matlabroot/examples/econ/main is the prior density). Create a bssm object representing the Bayesian state-space model.

    Mdl = bssm(@(params)timeVariantParamMapBayes(params,numObs),@flatPriorBSSM);

Draw a sample from the posterior distribution. Initialize the parameter values to a random set of positive values in [0,0.5]. Set the proposal distribution to multivariate t25 with a scale matrix proportional to the proposal covariance that tune returns. Set the proportionality constant to 0.005.

    params0 = 0.5*rand(numParams,1);
    options = optimoptions("fminunc",Display="off");
    [params0,Proposal] = tune(Mdl,y,params0,Options=options,Display=false);
    [PostParams,accept] = simulate(Mdl,y,params0,Proposal, ...
        DoF=25,Proportion=0.005);
    accept

    accept = 0.7460

PostParams is an 8-by-1000 matrix of 1000 random draws from the posterior distribution. The Metropolis-Hastings sampler accepted about 75% of the proposed draws.
Draw Posterior Sample From Model with Deflated Response

When you work with a state-space model that contains a deflated response variable, you must have data for the predictors.

Consider a regression of the US unemployment rate onto the real gross national product (rGNP) rate, and suppose the resulting innovations are an ARMA(1,1) process. The state-space form of the relationship is

$$\begin{bmatrix} x_{1,t} \\ x_{2,t} \end{bmatrix} = \begin{bmatrix} \phi & \theta \\ 0 & 0 \end{bmatrix} \begin{bmatrix} x_{1,t-1} \\ x_{2,t-1} \end{bmatrix} + \begin{bmatrix} \sigma \\ 1 \end{bmatrix} u_t$$

$$y_t - \beta Z_t = x_{1,t},$$

where:

• $x_{1,t}$ is the ARMA process.
• $x_{2,t}$ is a dummy state for the MA(1) effect.
• $y_t$ is the observed unemployment rate deflated by a constant and the rGNP rate ($Z_t$).
• $u_t$ is an iid Gaussian series with mean 0 and standard deviation 1.

Load the Nelson-Plosser data set, which contains a table DataTable that has the unemployment rate and rGNP series, among other series.

    load Data_NelsonPlosser
Create a variable in DataTable that represents the returns of the raw rGNP series. Because price-toreturns conversion reduces the sample size by one, prepad the series with NaN. DataTable.RGNPRate = [NaN; price2ret(DataTable.GNPR)]; T = height(DataTable);
Create variables for the regression. Represent the unemployment rate as the observation series and the constant and rGNP rate series as the deflation data Zt.
Z = [ones(T,1) DataTable.RGNPRate];
y = DataTable.UR;
The functions armaDeflateYBayes.m and flatPriorDeflateY.m in matlabroot/examples/econ/main specify the state-space model structure and likelihood, and the flat prior distribution. Create a bssm object representing the Bayesian state-space model. Specify the parameter-to-matrix mapping function as a handle to a function solely of the parameters.
numParams = 5;
Mdl = bssm(@(params)armaDeflateYBayes(params,y,Z),@flatPriorDeflateY)
Mdl =
Mapping that defines a state-space model:
    @(params)armaDeflateYBayes(params,y,Z)
Log density of parameter prior distribution:
    @flatPriorDeflateY
Draw a sample from the posterior distribution. Initialize the MCMC algorithm with a random set of positive values in [0,0.5]. Obtain an optimal set of proposal distribution moments for the sampler by using tune. Set the proportionality constant to 0.01. Set a burn-in period of 2000 draws, set a thinning factor of 50, and specify retaining 1000 draws.
rng(1) % For reproducibility
params0 = 0.5*rand(numParams,1);
options = optimoptions("fminunc",Display="off");
[params0,Proposal] = tune(Mdl,y,params0,Options=options, ...
    Display=false);
[PostParams,accept] = simulate(Mdl,y,params0,Proposal,Proportion=0.01, ...
    BurnIn=2000,NumDraws=1000,Thin=50);
accept
accept = 0.9200
PostParams is a 5-by-1000 matrix of 1000 draws from the posterior distribution. The Metropolis-Hastings sampler accepts 92% of the proposed draws. Plot a trace of the draws of each parameter.
paramNames = ["\phi" "\theta" "\sigma" "\beta_0" "\beta_1"];
figure
h = tiledlayout(numParams,1);
for j = 1:numParams
    nexttile
    plot(PostParams(j,:))
    hold on
    ylabel(paramNames(j))
end
title(h,"Posterior Trace Plots")
All samples appear to suffer from autocorrelation. To improve the Markov chain, experiment with the Thin option.
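The effect of thinning on serial correlation can be illustrated without the toolbox. The following is a minimal sketch, assuming a toy AR(1) chain as a stand-in for one row of PostParams; the chain, the value of rho, and the helper lag1 are hypothetical and are not produced by simulate.

```matlab
% Toy AR(1) chain standing in for one row of PostParams (an assumption;
% it is not output from simulate).
rng(1)                      % For reproducibility
numDraws = 20000;
rho = 0.9;                  % Hypothetical serial correlation of the chain
chain = zeros(1,numDraws);
for t = 2:numDraws
    chain(t) = rho*chain(t-1) + randn;
end

% Lag-1 sample autocorrelation, computed with base MATLAB only.
lag1 = @(x) sum((x(1:end-1)-mean(x)).*(x(2:end)-mean(x)))/sum((x-mean(x)).^2);

thin = 10;                  % Keep every 10th draw
thinned = chain(thin:thin:end);

acFull = lag1(chain);
acThinned = lag1(thinned);
fprintf("Lag-1 autocorrelation: full = %.2f, thinned = %.2f\n",acFull,acThinned)
```

The thinned chain should show a markedly smaller lag-1 autocorrelation than the full chain, at the cost of generating thin times as many draws for the same retained sample size.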
Access Degrees of Freedom Posterior Draws Using Output Function
The estimate and simulate functions do not return posterior draws of the degrees of freedom parameters of multivariate t-distributed state disturbances and observation innovations. However, you can write an output function that stores the degrees of freedom draws at each iteration of the MCMC algorithm, and pass it to simulate to obtain the draws. Consider the following DGP.
$$\begin{bmatrix} x_{t,1} \\ x_{t,2} \end{bmatrix} = \begin{bmatrix} \phi_1 & 0 \\ 0 & \phi_2 \end{bmatrix} \begin{bmatrix} x_{t-1,1} \\ x_{t-1,2} \end{bmatrix} + \begin{bmatrix} \sigma_1 & 0 \\ 0 & \sigma_2 \end{bmatrix} \begin{bmatrix} u_{t,1} \\ u_{t,2} \end{bmatrix}$$
$$y_t = \begin{bmatrix} 1 & 3 \end{bmatrix} \begin{bmatrix} x_{t,1} \\ x_{t,2} \end{bmatrix}.$$
• The true value of the state-space parameter set is Θ = {ϕ1, ϕ2, σ1, σ2} = {0.5, −0.75, 1, 0.5}.
• The state disturbances ut,1 and ut,2 are jointly a multivariate Student's t random series with νu = 5 degrees of freedom.
Create a vector autoregression (VAR) model that represents the state equation of the DGP.
trueTheta = [0.5; -0.75; 1; 0.5];
trueDoF = 5;
phi = [trueTheta(1) 0; 0 trueTheta(2)];
Sigma = [trueTheta(3)^2 0; 0 trueTheta(4)^2];
DGP = varm(AR={phi},Covariance=Sigma,Constant=[0; 0]);
Filter a random 2-D multivariate central t series of length 500 through the VAR model to obtain state values. Set the degrees of freedom to 5.
rng(10) % For reproducibility
T = 500;
trueU = mvtrnd(eye(DGP.NumSeries),trueDoF,T);
X = filter(DGP,trueU);
Obtain a series of observations from the DGP by the linear combination yt = xt,1 + 3xt,2.
C = [1 3];
y = X*C';
Consider a Bayesian state-space model representing the model with parameters Θ and νu treated as unknown. Arbitrarily assume that the parameters in Θ have independent Gaussian prior distributions, each with mean 0.5 and variance 1. Assume that the prior on the degrees of freedom νu is flat. The functions in Local Functions on page 12-1981 specify the state-space structure and prior distributions. Create the Bayesian state-space model by passing function handles to the paramMap and priorDistribution functions to bssm. Specify that the state disturbance distribution is multivariate Student's t with unknown degrees of freedom.
PriorMdl = bssm(@paramMap,@priorDistribution,StateDistribution="t");
PriorMdl is a bssm object representing the Bayesian state-space model with unknown parameters. In Local Functions on page 12-1981, write a function called posteriorDrawsStateDoF that accepts the input structure and returns the posterior draws of νu.
function StateDoF = posteriorDrawsStateDoF(inputstruct)
    StateDoF = inputstruct.StateDoF;
end
Simulate draws from the posterior distribution of Θ, νu | y by using simulate. Specify a random set of positive values in [0,1] to initialize the parameter values Θ. Set the burn-in period of the MCMC algorithm to 1000 draws, draw a sample of 10000 from the posterior, specify the univariate treatment of multivariate series for faster computations, and set the proposal scale matrix to a diagonal matrix with small values along the diagonal to increase the proposal acceptance rate.
numParamsTheta = 4;
theta0 = rand(numParamsTheta,1);
[ThetaPostDraws,accept,StateDoFDraws] = simulate(PriorMdl,y,theta0, ...
    eye(numParamsTheta)/1000,BurnIn=1e3,NumDraws=1e4, ...
    Univariate=true,OutputFunction=@posteriorDrawsStateDoF);
[numThetaParams,numDraws] = size(ThetaPostDraws)
numThetaParams = 4
numDraws = 10000
accept
accept = 0.4066
size(StateDoFDraws)
ans = 1×2
     1       10000
ThetaPostDraws is a 4-by-10000 matrix containing the posterior draws of the parameters Θ. accept is the Metropolis-Hastings sampler acceptance rate, in this case 41%, which means that simulate rejected 59% of the proposed draws. StateDoFDraws is a 1-by-10000 vector containing the posterior draws of νu. To diagnose the Markov chain induced by the Metropolis-within-Gibbs sampler, create a trace plot of the posterior draws of νu.
figure
plot(StateDoFDraws)
title("Posterior Trace Plot of \nu_u")
The posterior sample shows significant serial correlation. You can remedy this behavior by adjusting the Thin name-value argument. In general, simulate has name-value arguments that enable you to control several aspects of the MCMC sampler, which can improve the quality of the posterior sample.
Local Functions
This example uses the following functions. paramMap is the parameter-to-matrix mapping function, priorDistribution is the log prior distribution of the parameters, and posteriorDrawsStateDoF is the output function that returns the current draw of νu.
function [A,B,C,D,Mean0,Cov0,StateType] = paramMap(theta)
    A = [theta(1) 0; 0 theta(2)];
    B = [theta(3) 0; 0 theta(4)];
    C = [1 3];
    D = 0;
    Mean0 = [];         % MATLAB uses default initial state mean
    Cov0 = [];          % MATLAB uses default initial state covariance
    StateType = [0; 0]; % Two stationary states
end
function logprior = priorDistribution(theta)
    paramconstraints = [(abs(theta(1)) >= 1) (abs(theta(2)) >= 1) ...
        (theta(3) < 0) (theta(4) < 0)];
    if(sum(paramconstraints))
        logprior = -Inf;
    else
        mu0 = 0.5*ones(numel(theta),1);
        sigma0 = 1;
        p = normpdf(theta,mu0,sigma0);
        logprior = sum(log(p));
    end
end
function StateDoF = posteriorDrawsStateDoF(inputstruct)
    StateDoF = inputstruct.StateDoF;
end
Input Arguments
PriorMdl — Prior Bayesian state-space model
bssm model object
Prior Bayesian state-space model, specified as a bssm model object returned by bssm or ssm2bssm. The function handles of the properties PriorMdl.ParamDistribution and PriorMdl.ParamMap determine the prior and the data likelihood, respectively.
Y — Observed response data
numeric matrix | cell vector of numeric vectors
Observed response data, from which simulate forms the posterior distribution, specified as a numeric matrix or a cell vector of numeric vectors.
• If PriorMdl is time invariant with respect to the observation equation, Y is a T-by-n matrix. Each row of the matrix corresponds to a period and each column corresponds to a particular observation in the model. T is the sample size and n is the number of observations per period. The last row of Y contains the latest observations.
• If PriorMdl is time varying with respect to the observation equation, Y is a T-by-1 cell vector. Y{t} contains an nt-dimensional vector of observations for period t, where t = 1, ..., T. The dimensions of the corresponding coefficient matrices C{t} and D{t}, which are outputs of PriorMdl.ParamMap, must be consistent with the vector in Y{t} for all periods. The last cell of Y contains the latest observations.
NaN elements indicate missing observations. For details on how the Kalman filter accommodates missing observations, see “Algorithms” on page 12-620.
Data Types: double | cell
params0 — Initial values for parameters Θ
numeric vector
Initial parameter values for the parameters Θ, specified as a numParams-by-1 numeric vector. Elements of params0 must correspond to the elements of the first input arguments of PriorMdl.ParamMap and PriorMdl.ParamDistribution.
Data Types: double
Proposal — Metropolis-Hastings sampler proposal distribution covariance/scale matrix for parameters Θ
positive definite numeric matrix
Metropolis-Hastings sampler proposal distribution covariance/scale matrix for the parameters Θ, up to the proportionality constant Proportion, specified as a numParams-by-numParams, positive-definite numeric matrix. Elements of Proposal must correspond to elements in params0. The proposal distribution is multivariate normal or Student's t with DoF degrees of freedom (for details, see DoF).
Data Types: double
Name-Value Arguments
Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter. Before R2021a, use commas to separate each name and value, and enclose Name in quotes.
Example: simulate(Mdl,Y,params0,Proposal,NumDraws=1e6,Thin=3,DoF=10) uses the multivariate t10 distribution for the Metropolis-Hastings proposal, draws 3e6 random vectors of parameters after the burn-in period, and thins the sample to reduce serial correlation by repeatedly discarding two draws and retaining the third until it retains 1e6 draws.
Kalman Filter Options
Univariate — Univariate treatment of multivariate series flag
false (default) | true
Univariate treatment of a multivariate series flag, specified as a value in this table.
| Value | Description |
| --- | --- |
| true | Applies the univariate treatment of a multivariate series, also known as sequential filtering |
| false | Does not apply sequential filtering |
The univariate treatment can accelerate and improve numerical stability of the Kalman filter. However, all observation innovations must be uncorrelated. That is, DtDt' must be diagonal, where Dt (t = 1, ..., T) is the output coefficient matrix D of PriorMdl.ParamMap and PriorMdl.ParamDistribution.
Example: Univariate=true
Data Types: logical
SquareRoot — Square root filter method flag
false (default) | true
Square root filter method flag, specified as a value in this table.
| Value | Description |
| --- | --- |
| true | Applies the square root filter method for the Kalman filter |
| false | Does not apply the square root filter method |
If you suspect that the eigenvalues of the filtered state or forecasted observation covariance matrices are close to zero, then specify SquareRoot=true. The square root filter is robust to numerical issues arising from the finite precision of calculations, but requires more computational resources. Example: SquareRoot=true Data Types: logical Posterior Sampling Options
NumDraws — Number of Metropolis-Hastings sampler draws
1000 (default) | positive integer
Number of Metropolis-Hastings sampler draws from the posterior distribution Π(θ|Y), specified as a positive integer.
Example: NumDraws=1e7
Data Types: double
BurnIn — Number of draws to remove from beginning of sample
100 (default) | nonnegative scalar
Number of draws to remove from the beginning of the sample to reduce transient effects, specified as a nonnegative scalar. For details on how simulate reduces the full sample, see “Algorithms” on page 12-1988.
Tip To help you specify the appropriate burn-in period size:
1. Determine the extent of the transient behavior in the sample by setting the BurnIn name-value argument to 0.
2. Simulate a few thousand observations by using simulate.
3. Create trace plots.
Example: BurnIn=1000
Data Types: double
Thin — Adjusted sample size multiplier
1 (no thinning) (default) | positive integer
Adjusted sample size multiplier, specified as a positive integer. The actual sample size is BurnIn + NumDraws*Thin. After discarding the burn-in, simulate discards every Thin - 1 draws, and then retains the next draw. For more details on how simulate reduces the full sample, see “Algorithms” on page 12-1988.
Tip To reduce potential large serial correlation in the posterior sample, or to reduce the memory consumption of the output sample, specify a large value for Thin.
Example: Thin=5
Data Types: double
DoF — Proposal distribution degrees of freedom
Inf (default) | positive scalar
Proposal distribution degrees of freedom for parameter updates using the Metropolis-Hastings sampler, specified as a value in this table.
| Value | Metropolis-Hastings Proposal Distribution |
| --- | --- |
| Positive scalar | Multivariate t with DoF degrees of freedom |
| Inf | Multivariate normal |
The following options specify other aspects of the proposal distribution:
• Proposal — Required covariance/scale matrix
• Proportion — Optional proportionality constant that scales Proposal
• Center — Optional expected value
• StdDoF — Optional proposal standard deviation of the degrees of freedom parameter of the t-distributed state disturbance or observation innovation process
Example: DoF=10
Data Types: double
Proportion — Proposal scale matrix proportionality constant
1 (default) | positive scalar
Proposal scale matrix proportionality constant, specified as a positive scalar.
Tip For higher proposal acceptance rates, experiment with relatively small values for Proportion.
Example: Proportion=1
Data Types: double
Center — Proposal distribution center
[] (default) | numeric vector
Proposal distribution center, specified as a value in this table.
| Value | Description |
| --- | --- |
| numParams-by-1 numeric vector | simulate uses the independence Metropolis-Hastings sampler. Center is the center of the proposal distribution. |
| [] (empty array) | simulate uses the random-walk Metropolis-Hastings sampler. The center of the proposal density is the current state of the Markov chain. |
Example: Center=ones(10,1)
Data Types: double
StdDoF — Proposal standard deviation for degrees of freedom parameter
0.1 (default) | positive numeric scalar
Proposal standard deviation for the degrees of freedom parameter of the t-distributed state disturbance or observation innovation process, specified as a positive numeric scalar. StdDoF applies to the corresponding process when PriorMdl.StateDistribution.Name is "t" or PriorMdl.ObservationDistribution.Name is "t", and their associated degrees of freedom are estimable (the DoF field is NaN or a function handle). For example, if the following conditions hold, StdDoF applies only to the t-distribution degrees of freedom of the state disturbance process:
• PriorMdl.StateDistribution.Name is "t".
• PriorMdl.StateDistribution.DoF is NaN.
• The limiting distribution of the observation innovations is Gaussian (PriorMdl.ObservationDistribution.Name is "Gaussian" or PriorMdl.ObservationDistribution is struct("Name","t","DoF",Inf)).
Example: StdDoF=0.5
Data Types: double
OutputFunction — Function to call after each MCMC iteration
[] (default) | function handle
Function that simulate calls after each MCMC iteration, specified as a function handle. simulate saves the outputs of the specified function after each iteration and returns them in the Output output argument. You write the function, which can perform arbitrary calculations, such as computing or plotting custom statistics at each iteration. Suppose outputfcn is the name of the MATLAB function associated with the specified function handle. outputfcn must have this form.
function Output = outputfcn(Input)
    ...
end
At each iteration, simulate passes Input as a structure array with fields in this table. In other words, values in this table are available for operations in the function.
| Field | Description | Value |
| --- | --- | --- |
| Iteration | Current iteration | Numeric scalar |
| Parameters | Current values of the parameters θ | Numeric vector |
| LogPosteriorDensity | Log posterior density evaluated at current parameters and conditioned on latent variables | Numeric scalar |
| StateDoF | Current degree of freedom of the t-distributed state disturbances | Numeric scalar |
| ObservationDoF | Current degree of freedom of the t-distributed observation innovation | Numeric scalar |
| StateVariance | Current latent variance variable on page 12-1987 associated with the state disturbance | T-by-1 numeric vector |
| ObservationVariance | Current latent variance variable on page 12-1987 associated with the observation innovation | T-by-1 numeric vector |
| A | Coefficient A evaluated at current parameters | Numeric matrix for a dimension-invariant coefficient or a T-by-1 cell vector of numeric matrices for a dimension-varying coefficient |
| B | Coefficient B evaluated at current parameters | Numeric matrix for a dimension-invariant coefficient or a T-by-1 cell vector of numeric matrices for a dimension-varying coefficient |
| C | Coefficient C evaluated at current parameters | Numeric matrix for a dimension-invariant coefficient or a T-by-1 cell vector of numeric matrices for a dimension-varying coefficient |
| D | Coefficient D evaluated at current parameters | Numeric matrix for a dimension-invariant coefficient or a T-by-1 cell vector of numeric matrices for a dimension-varying coefficient |
| States | Current state draws obtained by simulation smoother | Numeric matrix for a dimension-invariant coefficient or a T-by-1 cell vector of numeric matrices for a dimension-varying coefficient |
| Y | Observations | The value of the input Y |
| PreviousOutput | Output of the previous iteration | Structure array with fields in this table |
Output depends on the computations in outputfcn. However, if Output is a k-by-1 homogeneous column vector, simulate concatenates the outputs of all iterations and returns a k-by-NumDraws matrix of the same type. If simulate cannot horizontally concatenate the outputs of all iterations, simulate returns a 1-by-NumDraws cell array instead, where successive cells contain the output of corresponding iterations. Example: OutputFunction=@outputfcn Data Types: function_handle
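As a minimal sketch of the k-by-1 case, the following output function returns a 2-by-1 vector built from two of the documented Input fields. The mock Input structures and their values are made up for illustration; in practice, simulate supplies the structure at each iteration.

```matlab
% Output function that returns a 2-by-1 numeric vector at each iteration.
% The field names StateDoF and LogPosteriorDensity come from the Input table.
outputfcn = @(Input) [Input.StateDoF; Input.LogPosteriorDensity];

% Mock Input structures with made-up values, for illustration only;
% simulate supplies the real structure at each MCMC iteration.
numIter = 3;
Output = zeros(2,numIter);
for iter = 1:numIter
    Input = struct("Iteration",iter,"StateDoF",4 + iter, ...
        "LogPosteriorDensity",-100 - iter);
    Output(:,iter) = outputfcn(Input);   % Mimics horizontal concatenation
end
size(Output)
```

With a handle of this form passed as OutputFunction=outputfcn, the description above implies that the Output argument would be a 2-by-NumDraws matrix, one column per retained iteration.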
Output Arguments
Params — Posterior draws
numeric matrix
Posterior draws of the state-space model parameters Θ, returned as a numParams-by-NumDraws numeric matrix. Each column is a separate draw from the distribution. Each row corresponds to the element of params0 in the same position. Because the Metropolis-Hastings sampler is a Markov chain, successive draws are correlated.
accept — Proposal acceptance rate
positive scalar in [0,1]
Proposal acceptance rate, returned as a positive scalar in [0,1].
Output — Custom function output
array
Custom function OutputFunction output concatenated over all MCMC iterations, returned as a matrix or cell vector. Output has NumDraws columns, and the number of rows depends on the size of the output at each iteration. The data type of the entries of Output depends on the operations of the output function.
More About
Latent Variance Variables of t-Distributed Errors
To facilitate posterior sampling, multivariate Student's t-distributed state disturbances and observation innovations are each represented as an inverse-gamma scale mixture, where the inverse-gamma random variable is the latent variance variable. Explicitly, suppose the m-dimensional state disturbances ut are iid multivariate t distributed with location 0, scale Im, and degrees of freedom νu. As an inverse-gamma scale mixture,
$$u_t = \sqrt{\zeta_t}\,\tilde{u}_t,$$
where:
• The latent variable ζt is inverse-gamma with shape and scale νu/2.
• $\tilde{u}_t$ is an m-dimensional multivariate standard Gaussian random variable.
Multivariate t-distributed observation innovations can be similarly decomposed.
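The scale-mixture representation can be checked numerically. The following is a minimal sketch that uses only base MATLAB functions; it assumes an integer number of degrees of freedom so that the inverse-gamma draws can be built from sums of squared standard normals. The variable names are illustrative and do not correspond to anything inside simulate.

```matlab
rng(1)                          % For reproducibility
nu = 5;                         % Integer degrees of freedom (assumption of this sketch)
m = 2;                          % Dimension of the disturbance vector
T = 1e5;                        % Number of simulated periods

% A chi-square(nu) draw is a sum of nu squared standard normals, and
% nu/chi-square(nu) is inverse-gamma with shape and scale nu/2.
zeta = nu./sum(randn(nu,T).^2,1);       % 1-by-T latent variance variables

uTilde = randn(m,T);                    % Standard Gaussian draws
u = sqrt(zeta).*uTilde;                 % m-by-T multivariate t draws

% For nu > 2, each component of u has variance nu/(nu - 2).
sampleVar = var(u,0,2)
theoreticalVar = nu/(nu - 2)
```

The sample variances should be close to nu/(nu - 2), which is the variance of each component of a multivariate t vector with identity scale.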
Tips
• Acceptance rates from accept that are relatively close to 0 or 1 indicate that the Markov chain does not sufficiently explore the posterior distribution. To obtain an appropriate acceptance rate for your data, tune the sampler by implementing one of the following procedures.
  • Use the tune function to search for the posterior mode by numerical optimization. tune returns optimized parameters and the proposal scale matrix, which is proportional to the negative Hessian matrix evaluated at the posterior mode. Pass the optimized parameters and the proposal scale to simulate, and tune the proportionality constant by using the Proportion name-value argument. Small values of Proportion tend to increase the proposal acceptance rates of the Metropolis-Hastings sampler.
  • Perform a multistage simulation:
    1. Choose an initial value for the input Proposal and simulate draws from the posterior.
    2. Compute the sample covariance matrix, and pass the result to simulate as an updated value for Proposal.
    3. Repeat steps 1 and 2 until accept is reasonable for your data.
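The multistage procedure can be sketched on a toy problem. The code below is an illustration under stated assumptions: it replaces simulate with a hand-written random-walk Metropolis-Hastings loop, and it targets a hypothetical bivariate Gaussian log posterior (logPost and SigmaTrue are made up). Only the pattern of updating Proposal with the sample covariance between stages reflects the steps above.

```matlab
rng(1)                                   % For reproducibility
% Toy log posterior: bivariate Gaussian with correlated components
% (a stand-in for the posterior that simulate targets).
SigmaTrue = [1 0.8; 0.8 2];
logPost = @(theta) -0.5*(theta'/SigmaTrue)*theta;

numParams = 2;
numDraws = 5000;
Proposal = eye(numParams)/1000;          % Step 1: initial proposal scale
theta = zeros(numParams,1);

for stage = 1:3
    L = chol(Proposal,'lower');
    Draws = zeros(numParams,numDraws);
    numAccepted = 0;
    for k = 1:numDraws
        candidate = theta + L*randn(numParams,1);   % Random-walk proposal
        if log(rand) < logPost(candidate) - logPost(theta)
            theta = candidate;
            numAccepted = numAccepted + 1;
        end
        Draws(:,k) = theta;
    end
    accept = numAccepted/numDraws;
    fprintf("Stage %d: acceptance rate = %.2f\n",stage,accept)
    Proposal = cov(Draws');              % Step 2: update the proposal
end
```

In the first stage the tiny proposal scale typically gives an acceptance rate near 1, which signals slow exploration; later stages use a proposal shaped like the sampled posterior, so the acceptance rate moves away from the extremes.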
Algorithms
• simulate performs posterior simulation of model parameters, states, and latent variables as follows:
  1. simulate generates parameter draws by the Metropolis-Hastings sampler. The posterior density is proportional to the product of the prior density PriorMdl.ParamDistribution and the likelihood obtained by the Kalman filter of the state-space model PriorMdl.ParamMap.
  2. simulate updates states by the simulation smoother of the state-space model.
  3. simulate samples the latent variance variables on page 12-1987 of state disturbance and observation innovation distributions from posterior conditional distributions. For t-distributed state disturbances or observation innovations, latent variables are inverse-gamma distributed.
• If the input prior model PriorMdl is a Gaussian linear state-space model and you do not specify the custom output function OutputFunction, simulate marginalizes the states, and the Metropolis-Hastings sampler simulates only parameters. Otherwise, simulate simulates parameters Θ, states xt, and latent variables ζt by the Metropolis-within-Gibbs sampler, which is more computationally intensive than the standard Metropolis-Hastings sampler.
• This figure shows how simulate reduces the sample by using the values of NumDraws, Thin, and BurnIn. Rectangles represent successive draws from the distribution. simulate removes the white rectangles from the sample. The remaining NumDraws black rectangles compose the sample.
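The sample reduction can also be expressed as index arithmetic. The following sketch is an interpretation of the documented rule (discard BurnIn draws, then repeatedly discard Thin - 1 draws and retain the next one); it uses small made-up values and does not reproduce the internal code of simulate.

```matlab
BurnIn = 4;
Thin = 3;
NumDraws = 5;

fullSampleSize = BurnIn + NumDraws*Thin;        % Total draws generated
allDraws = 1:fullSampleSize;                    % Placeholder draw labels

% Drop the burn-in, then skip Thin - 1 draws and keep the next one.
retained = allDraws(BurnIn + Thin:Thin:fullSampleSize);
disp(retained)
```

Here 19 draws are generated and the retained draws are numbers 7, 10, 13, 16, and 19, which gives exactly NumDraws retained draws.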
Version History Introduced in R2022a
References
[1] Hastings, Wilfred K. "Monte Carlo Sampling Methods Using Markov Chains and Their Applications." Biometrika 57 (April 1970): 97–109. https://doi.org/10.1093/biomet/57.1.97.
[2] Metropolis, Nicholas, Arianna W. Rosenbluth, Marshall N. Rosenbluth, Augusta H. Teller, and Edward Teller. "Equation of State Calculations by Fast Computing Machines." The Journal of Chemical Physics 21 (June 1953): 1087–92. https://doi.org/10.1063/1.1699114.
See Also Objects bssm Functions ssm2bssm | estimate | tune Topics “Analyze Linearized DSGE Models” on page 11-178 “Perform Outlier Detection Using Bayesian Non-Gaussian State-Space Models” on page 11-199
simulate Simulate coefficients and innovations covariance matrix of Bayesian vector autoregression (VAR) model
Syntax
[Coeff,Sigma] = simulate(PriorMdl)
[Coeff,Sigma] = simulate(PriorMdl,Y)
[Coeff,Sigma] = simulate( ___ ,Name,Value)
Description
[Coeff,Sigma] = simulate(PriorMdl) returns a random vector of coefficients Coeff and a random innovations covariance matrix Sigma drawn from the prior Bayesian VAR(p) model on page 12-2000 PriorMdl.
[Coeff,Sigma] = simulate(PriorMdl,Y) draws from the posterior distributions produced or updated by incorporating the response data Y. NaNs in the data indicate missing values, which simulate removes by using list-wise deletion.
[Coeff,Sigma] = simulate( ___ ,Name,Value) specifies options using one or more name-value pair arguments in addition to any of the input argument combinations in the previous syntaxes. For example, you can set the number of random draws from the distribution or specify the presample response data.
Examples
Draw Coefficients and Innovations Covariance Matrix from Prior Distribution
Consider the 3-D VAR(4) model for the US inflation (INFL), unemployment (UNRATE), and federal funds (FEDFUNDS) rates.
$$\begin{bmatrix} \text{INFL}_t \\ \text{UNRATE}_t \\ \text{FEDFUNDS}_t \end{bmatrix} = c + \sum_{j=1}^{4} \Phi_j \begin{bmatrix} \text{INFL}_{t-j} \\ \text{UNRATE}_{t-j} \\ \text{FEDFUNDS}_{t-j} \end{bmatrix} + \begin{bmatrix} \varepsilon_{1,t} \\ \varepsilon_{2,t} \\ \varepsilon_{3,t} \end{bmatrix}.$$
For all t, εt is a series of independent 3-D normal innovations with a mean of 0 and covariance Σ. Assume that a conjugate prior distribution $\pi\left([\Phi_1,\ldots,\Phi_4,c]',\Sigma\right)$ governs the behavior of the parameters.
Create a conjugate prior model. Specify the response series names. Obtain a summary of the prior distribution.
seriesnames = ["INFL" "UNRATE" "FEDFUNDS"];
numseries = numel(seriesnames);
numlags = 4;
PriorMdl = bayesvarm(numseries,numlags,'ModelType','conjugate',...
    'SeriesNames',seriesnames);
Summary = summarize(PriorMdl,'off');
Draw a set of coefficients and an innovations covariance matrix from the prior distribution.
rng(1) % For reproducibility
[Coeff,Sigma] = simulate(PriorMdl);
Display the selected coefficients with corresponding names and the innovations covariance matrix.
table(Coeff,'RowNames',Summary.CoeffMap)
ans=39×1 table
                     Coeff
                   __________
    AR{1}(1,1)        0.44999
    AR{1}(1,2)       0.047463
    AR{1}(1,3)      -0.042106
    AR{2}(1,1)     -0.0086264
    AR{2}(1,2)       0.034049
    AR{2}(1,3)      -0.058092
    AR{3}(1,1)      -0.015698
    AR{3}(1,2)      -0.053203
    AR{3}(1,3)      -0.031138
    AR{4}(1,1)       0.036431
    AR{4}(1,2)      -0.058279
    AR{4}(1,3)       -0.02195
    Constant(1)        -1.001
    AR{1}(2,1)      -0.068182
    AR{1}(2,2)        0.51029
    AR{1}(2,3)      -0.094367
        ⋮
AR{r}(j,k) is the AR coefficient of response variable k (lagged r units) in response equation j.
Sigma
Sigma = 3×3
    0.1238   -0.0053   -0.0369
   -0.0053    0.0456    0.0160
   -0.0369    0.0160    0.1039
Rows and columns of Sigma correspond to the innovations in the response equations ordered by PriorMdl.SeriesNames.
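To pick a particular coefficient out of Coeff, it can help to compute its row index. The following sketch infers the layout from the row names displayed above (the AR coefficients of equation j ordered by lag, followed by the constant of equation j) and assumes a model with no exogenous predictors or trend terms. The handles arIndex and constIndex are hypothetical helpers, not toolbox functions.

```matlab
numseries = 3;      % Number of response series
numlags = 4;        % VAR order p
numPerEq = numseries*numlags + 1;   % AR coefficients plus the constant

% Linear index of AR{r}(j,k) and Constant(j) in Coeff, inferred from the
% displayed row names (assumes no exogenous predictors or trend terms).
arIndex = @(r,j,k) (j-1)*numPerEq + (r-1)*numseries + k;
constIndex = @(j) j*numPerEq;

arIndex(1,2,1)      % Row of AR{1}(2,1)
constIndex(1)       % Row of Constant(1)
```

For the displayed table, AR{1}(2,1) is row 14 and Constant(1) is row 13, so Coeff(arIndex(1,2,1)) would return the value listed next to AR{1}(2,1).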
Simulate Parameters from Analytically Tractable Posterior Distribution
Consider the 3-D VAR(4) model of “Draw Coefficients and Innovations Covariance Matrix from Prior Distribution” on page 12-1990. In this case, assume that the prior distribution is diffuse.
Load and Preprocess Data
Load the US macroeconomic data set. Compute the inflation rate, stabilize the unemployment and federal funds rates, and remove missing values.
load Data_USEconModel
seriesnames = ["INFL" "UNRATE" "FEDFUNDS"];
DataTimeTable.INFL = 100*[NaN; price2ret(DataTimeTable.CPIAUCSL)];
DataTimeTable.DUNRATE = [NaN; diff(DataTimeTable.UNRATE)];
DataTimeTable.DFEDFUNDS = [NaN; diff(DataTimeTable.FEDFUNDS)];
seriesnames(2:3) = "D" + seriesnames(2:3);
rmDataTimeTable = rmmissing(DataTimeTable);
Create Prior Model
Create a diffuse Bayesian VAR(4) prior model for the three response series. Specify the response series names.
numseries = numel(seriesnames);
numlags = 4;
PriorMdl = bayesvarm(numseries,numlags,'SeriesNames',seriesnames);
Estimate Posterior Distribution
Estimate the posterior distribution. Return the estimation summary.
[PosteriorMdl,Summary] = estimate(PriorMdl,rmDataTimeTable{:,seriesnames});
Bayesian VAR under diffuse priors
Effective Sample Size:          197
Number of equations:            3
Number of estimated Parameters: 39
             |   Mean     Std
-------------------------------
 Constant(1) |  0.1007  0.0832
 Constant(2) | -0.0499  0.0450
 Constant(3) | -0.4221  0.1781
 AR{1}(1,1)  |  0.1241  0.0762
 AR{1}(2,1)  | -0.0219  0.0413
 AR{1}(3,1)  | -0.1586  0.1632
 AR{1}(1,2)  | -0.4809  0.1536
 AR{1}(2,2)  |  0.4716  0.0831
 AR{1}(3,2)  | -1.4368  0.3287
 AR{1}(1,3)  |  0.1005  0.0390
 AR{1}(2,3)  |  0.0391  0.0211
 AR{1}(3,3)  | -0.2905  0.0835
 AR{2}(1,1)  |  0.3236  0.0868
 AR{2}(2,1)  |  0.0913  0.0469
 AR{2}(3,1)  |  0.3403  0.1857
 AR{2}(1,2)  | -0.0503  0.1647
 AR{2}(2,2)  |  0.2414  0.0891
 AR{2}(3,2)  | -0.2968  0.3526
 AR{2}(1,3)  |  0.0450  0.0413
 AR{2}(2,3)  |  0.0536  0.0223
 AR{2}(3,3)  | -0.3117  0.0883
 AR{3}(1,1)  |  0.4272  0.0860
 AR{3}(2,1)  | -0.0389  0.0465
 AR{3}(3,1)  |  0.2848  0.1841
 AR{3}(1,2)  |  0.2738  0.1620
 AR{3}(2,2)  |  0.0552  0.0876
 AR{3}(3,2)  | -0.7401  0.3466
 AR{3}(1,3)  |  0.0523  0.0428
 AR{3}(2,3)  |  0.0008  0.0232
 AR{3}(3,3)  |  0.0028  0.0917
 AR{4}(1,1)  |  0.0167  0.0901
 AR{4}(2,1)  |  0.0285  0.0488
 AR{4}(3,1)  | -0.0690  0.1928
 AR{4}(1,2)  | -0.1830  0.1520
 AR{4}(2,2)  | -0.1795  0.0822
 AR{4}(3,2)  |  0.1494  0.3253
 AR{4}(1,3)  |  0.0067  0.0395
 AR{4}(2,3)  |  0.0088  0.0214
 AR{4}(3,3)  | -0.1372  0.0845
       Innovations Covariance Matrix
           |   INFL     DUNRATE  DFEDFUNDS
-------------------------------------------
 INFL      |  0.3028   -0.0217    0.1579
           | (0.0321)  (0.0124)  (0.0499)
 DUNRATE   | -0.0217    0.0887   -0.1435
           | (0.0124)  (0.0094)  (0.0283)
 DFEDFUNDS |  0.1579   -0.1435    1.3872
           | (0.0499)  (0.0283)  (0.1470)
PosteriorMdl is a conjugatebvarm model, which is analytically tractable.
Simulate Parameters from Posterior
Draw 1000 samples from the posterior distribution.
rng(1)
[Coeff,Sigma] = simulate(PosteriorMdl,'NumDraws',1000);
Coeff is a 39-by-1000 matrix of randomly drawn coefficients. Each column is an individual draw, and each row is an individual coefficient. Sigma is a 3-by-3-by-1000 array of randomly drawn innovations covariance matrices. Each page is an individual draw.
Display the first set of coefficients drawn from the distribution with corresponding parameter names, and display the first drawn innovations covariance matrix.
Coeffs = table(Coeff(:,1),'RowNames',Summary.CoeffMap)
Coeffs=39×1 table
                     Var1
                   _________
    AR{1}(1,1)       0.14994
    AR{1}(1,2)      -0.46927
    AR{1}(1,3)      0.088388
    AR{2}(1,1)       0.28139
    AR{2}(1,2)      -0.19597
    AR{2}(1,3)      0.049222
    AR{3}(1,1)        0.3946
    AR{3}(1,2)      0.081871
    AR{3}(1,3)      0.002117
    AR{4}(1,1)       0.13514
    AR{4}(1,2)      -0.23661
    AR{4}(1,3)      -0.01869
    Constant(1)     0.035787
    AR{1}(2,1)     0.0027895
    AR{1}(2,2)       0.62382
    AR{1}(2,3)      0.053232
        ⋮
Sigma(:,:,1)
ans = 3×3
    0.2653   -0.0075    0.1483
   -0.0075    0.1015   -0.1435
    0.1483   -0.1435    1.5042
Simulate Parameters from Analytically Intractable Posterior Distribution
Consider the 3-D VAR(4) model of “Draw Coefficients and Innovations Covariance Matrix from Prior Distribution” on page 12-1990. In this case, assume that the prior distribution is semiconjugate.
Load and Preprocess Data
Load the US macroeconomic data set. Compute the inflation rate, stabilize the unemployment and federal funds rates, and remove missing values.
load Data_USEconModel
seriesnames = ["INFL" "UNRATE" "FEDFUNDS"];
DataTimeTable.INFL = 100*[NaN; price2ret(DataTimeTable.CPIAUCSL)];
DataTimeTable.DUNRATE = [NaN; diff(DataTimeTable.UNRATE)];
DataTimeTable.DFEDFUNDS = [NaN; diff(DataTimeTable.FEDFUNDS)];
seriesnames(2:3) = "D" + seriesnames(2:3);
rmDataTimeTable = rmmissing(DataTimeTable);
Create Prior Model
Create a semiconjugate Bayesian VAR(4) prior model for the three response series. Specify the response variable names.
numseries = numel(seriesnames);
numlags = 4;
PriorMdl = bayesvarm(numseries,numlags,'ModelType','semiconjugate',...
    'SeriesNames',seriesnames);
Simulate Parameters from Posterior
Because the joint posterior distribution of a semiconjugate prior model is analytically intractable, simulate sequentially draws from the full conditional distributions.
Draw 1000 samples from the posterior distribution. Specify a burn-in period of 10,000, and a thinning factor of 5. Start the Gibbs sampler by assuming the posterior mean of Σ is the 3-by-3 identity matrix.
rng(1)
[Coeff,Sigma] = simulate(PriorMdl,rmDataTimeTable{:,seriesnames},...
    'NumDraws',1000,'BurnIn',1e4,'Thin',5,'Sigma0',eye(3));
Coeff is a 39-by-1000 matrix of randomly drawn coefficients. Each column is an individual draw, and each row is an individual coefficient. Sigma is a 3-by-3-by-1000 array of randomly drawn innovations covariance matrices. Each page is an individual draw.
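Simulate Coefficients from VARX Model
Consider the 2-D VARX(1) model for the US real GDP (RGDP) and investment (GCE) rates that treats the personal consumption (PCEC) rate as exogenous:
$$\begin{bmatrix} \mathrm{RGDP}_t \\ \mathrm{GCE}_t \end{bmatrix} = c + \Phi \begin{bmatrix} \mathrm{RGDP}_{t-1} \\ \mathrm{GCE}_{t-1} \end{bmatrix} + \mathrm{PCEC}_t\,\beta + \varepsilon_t .$$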
For all t, εt is a series of independent 2-D normal innovations with a mean of 0 and covariance Σ. Assume the following prior distributions:
• $\begin{bmatrix}\Phi & c & \beta\end{bmatrix}' \mid \Sigma \sim N_{4\times 2}(M, V, \Sigma)$, where M is a 4-by-2 matrix of means and V is the 4-by-4 among-coefficient scale matrix. Equivalently, $\mathrm{vec}\left(\begin{bmatrix}\Phi & c & \beta\end{bmatrix}'\right) \mid \Sigma \sim N_{8}(\mathrm{vec}(M), \Sigma \otimes V)$.
• $\Sigma \sim \text{Inverse Wishart}(\Omega, \nu)$, where Ω is the 2-by-2 scale matrix and ν is the degrees of freedom.
Load the US macroeconomic data set. Compute the real GDP, investment, and personal consumption rate series. Remove all missing values from the resulting series.
load Data_USEconModel
DataTimeTable.RGDP = DataTimeTable.GDP./DataTimeTable.GDPDEF;
seriesnames = ["PCEC"; "RGDP"; "GCE"];
rates = varfun(@price2ret,DataTimeTable,'InputVariables',seriesnames);
rates = rmmissing(rates);
rates.Properties.VariableNames = seriesnames;
Create a conjugate prior model for the 2-D VARX(1) model parameters. numseries = 2; numlags = 1; numpredictors = 1; PriorMdl = conjugatebvarm(numseries,numlags,'NumPredictors',numpredictors,... 'SeriesNames',seriesnames(2:end));
Simulate directly from the posterior distribution. Specify the exogenous predictor data. [Coeff,Sigma] = simulate(PriorMdl,rates{:,PriorMdl.SeriesNames},... 'X',rates{:,seriesnames(1)});
By default, simulate uses the first p = 1 observations of the response data to initialize the dynamic component of the model, and removes the corresponding observations from the predictor data.
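The default presample handling described above can be pictured with plain array indexing. The following is a minimal sketch rather than the toolbox's internal code; the variable names (Ypre, Yest, Xest) and the random data are assumptions made only for illustration.

```matlab
% Illustrative sketch (assumed names, synthetic data): how a default
% presample split partitions the response and predictor data.
rng(1);                      % for reproducibility
numobs = 10;                 % hypothetical sample size
p = 1;                       % number of lags, as in the VARX(1) example
Y = randn(numobs,2);         % stand-in for the two response series
X = randn(numobs,1);         % stand-in for the exogenous predictor

Ypre = Y(1:p,:);             % first p rows initialize the dynamic component
Yest = Y((p + 1):end,:);     % remaining rows form the effective sample
% Keep only the latest rows of X so that they align with Yest.
Xest = X((end - size(Yest,1) + 1):end,:);
```

With p = 1, the effective sample has numobs – 1 rows, which is why supplying 'Y0' separately preserves the full sample.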
Input Arguments
PriorMdl — Prior Bayesian VAR model
conjugatebvarm model object | semiconjugatebvarm model object | diffusebvarm model object | normalbvarm model object
Prior Bayesian VAR model, specified as a model object in this table.
Model Object | Description
conjugatebvarm | Dependent, matrix-normal-inverse-Wishart conjugate model returned by bayesvarm or conjugatebvarm
semiconjugatebvarm | Independent, normal-inverse-Wishart semiconjugate prior model returned by bayesvarm or semiconjugatebvarm
diffusebvarm | Diffuse prior model returned by bayesvarm or diffusebvarm
normalbvarm | Normal conjugate model with a fixed innovations covariance matrix, returned by bayesvarm or normalbvarm
Note If PriorMdl is a diffusebvarm model, then you must also supply Y because simulate cannot draw from an improper prior distribution. Consequently, Coeff and Sigma represent draws from the posterior distribution.
Y — Observed multivariate response series
numeric matrix
Observed multivariate response series to which simulate fits the model, specified as a numobs-by-numseries numeric matrix. numobs is the sample size. numseries is the number of response variables (PriorMdl.NumSeries). Rows correspond to observations, and the last row contains the latest observation. Columns correspond to individual response variables. Y represents the continuation of the presample response series in Y0.
Data Types: double
Name-Value Pair Arguments
Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter. Before R2021a, use commas to separate each name and value, and enclose Name in quotes.
Example: 'Y0',Y0,'X',X specifies the presample response data Y0 to initialize the VAR model for posterior estimation, and the predictor data X for the exogenous regression component.
Options for All Prior Distributions
NumDraws — Number of random draws 1 (default) | positive integer
Number of random draws from the distributions, specified as the comma-separated pair consisting of 'NumDraws' and a positive integer.
Example: 'NumDraws',1e7
Data Types: double
Y0 — Presample response data
numeric matrix
Presample response data to initialize the VAR model for estimation, specified as the comma-separated pair consisting of 'Y0' and a numpreobs-by-numseries numeric matrix. numpreobs is the number of presample observations. Rows correspond to presample observations, and the last row contains the latest observation. Y0 must have at least PriorMdl.P rows. If you supply more rows than necessary, simulate uses the latest PriorMdl.P observations only. Columns must correspond to the response series in Y. By default, simulate uses Y(1:PriorMdl.P,:) as presample observations, and then estimates the posterior using Y((PriorMdl.P + 1):end,:). This action reduces the effective sample size.
Data Types: double
X — Predictor data
numeric matrix
Predictor data for the exogenous regression component in the model, specified as the comma-separated pair consisting of 'X' and a numobs-by-PriorMdl.NumPredictors numeric matrix. Rows correspond to observations, and the last row contains the latest observation. simulate does not use the regression component in the presample period. X must have at least as many observations as the observations used after the presample period.
• If you specify Y0, then X must have at least numobs rows (see Y).
• Otherwise, X must have at least numobs – PriorMdl.P observations to account for the presample removal.
In either case, if you supply more rows than necessary, simulate uses the latest observations only. Columns correspond to individual predictor variables. All predictor variables are present in the regression component of each response equation.
Data Types: double
Options for Semiconjugate Prior Distributions
BurnIn — Number of draws to remove from beginning of sample
0 (default) | nonnegative scalar
Number of draws to remove from the beginning of the sample to reduce transient effects, specified as the comma-separated pair consisting of 'BurnIn' and a nonnegative scalar. For details on how simulate reduces the full sample, see “Algorithms” on page 12-2001.
Tip To help you specify the appropriate burn-in period size:
1. Determine the extent of the transient behavior in the sample by setting the BurnIn name-value argument to 0.
2. Simulate a few thousand observations by using simulate.
3. Create trace plots.
Example: 'BurnIn',0
Data Types: double
Thin — Adjusted sample size multiplier
1 (default) | positive integer
Adjusted sample size multiplier, specified as the comma-separated pair consisting of 'Thin' and a positive integer. The actual sample size is BurnIn + NumDraws*Thin. After discarding the burn-in, simulate discards every Thin – 1 draws, and then retains the next draw. For more details on how simulate reduces the full sample, see “Algorithms” on page 12-2001.
Tip To reduce potentially large serial correlation in the sample, or to reduce the memory consumption of the draws stored in Coeff and Sigma, specify a large value for Thin.
Example: 'Thin',5
Data Types: double
Coeff0 — Starting value of VAR model coefficients for Gibbs sampler
numeric column vector
Starting value of the VAR model coefficients for the Gibbs sampler, specified as the comma-separated pair consisting of 'Coeff0' and a numeric column vector with PriorMdl.NumSeries*k elements, where k = PriorMdl.NumSeries*PriorMdl.P + PriorMdl.IncludeConstant + PriorMdl.IncludeTrend + PriorMdl.NumPredictors, which is the number of coefficients in a response equation. For details on the structure of Coeff0, see the output Coeff. By default, Coeff0 is the multivariate least-squares estimate.
Tip
• To specify Coeff0:
1. Set separate variables for the initial values of each coefficient matrix and vector.
2. Horizontally concatenate all initial values in this order: Coeff = [Φ1 Φ2 ⋯ Φp c δ Β].
3. Vectorize the transpose of the concatenated coefficient matrix: Coeff0 = Coeff.'; Coeff0 = Coeff0(:);
• A good practice is to run simulate multiple times with different parameter starting values. Verify that the estimates from each run converge to similar values.
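The three steps for building Coeff0 can be sketched as follows for a 2-D VARX(1) model with a constant, no time trend, and one predictor. All numeric values and the names Phi1, c, and Beta are hypothetical and chosen only to make the ordering visible.

```matlab
% Illustrative starting values (hypothetical numbers) for a 2-D VARX(1)
% model with a constant and one exogenous predictor.
Phi1 = [0.5 0.1; 0.2 0.3];   % lag 1 AR coefficient matrix
c    = [1; 2];               % model constants
Beta = [0.7; 0.9];           % regression coefficients of the predictor

Coeff  = [Phi1 c Beta];      % rows correspond to response equations
Coeff0 = Coeff.';            % transpose so that each column is one equation
Coeff0 = Coeff0(:);          % stack the equations into one column vector
```

The first four elements of Coeff0 are the coefficients of the first response equation, and the last four are those of the second, which matches the layout described for the output Coeff.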
Data Types: double
Sigma0 — Starting value of innovations covariance matrix for Gibbs sampler
positive definite numeric matrix
Starting value of the innovations covariance matrix for the Gibbs sampler, specified as the comma-separated pair consisting of 'Sigma0' and a PriorMdl.NumSeries-by-PriorMdl.NumSeries positive definite numeric matrix. By default, Sigma0 is the residual mean squared error from multivariate least-squares. Rows and columns correspond to innovations in the equations of the response variables ordered by PriorMdl.SeriesNames.
Tip A good practice is to run simulate multiple times with different parameter starting values. Verify that the estimates from each run converge to similar values.
Data Types: double
Output Arguments
Coeff — Simulated VAR model coefficients
numeric matrix
Simulated VAR model coefficients, returned as a (PriorMdl.NumSeries*k)-by-NumDraws numeric matrix, where k = PriorMdl.NumSeries*PriorMdl.P + PriorMdl.IncludeConstant + PriorMdl.IncludeTrend + PriorMdl.NumPredictors, which is the number of coefficients in a response equation. Each column is a separate draw from the distribution.
For draw j, Coeff(1:k,j) corresponds to all coefficients in the equation of response variable PriorMdl.SeriesNames(1), Coeff((k + 1):(2*k),j) corresponds to all coefficients in the equation of response variable PriorMdl.SeriesNames(2), and so on. For a set of indices corresponding to an equation:
• Elements 1 through PriorMdl.NumSeries correspond to the lag 1 AR coefficients of the response variables ordered by PriorMdl.SeriesNames.
• Elements PriorMdl.NumSeries + 1 through 2*PriorMdl.NumSeries correspond to the lag 2 AR coefficients of the response variables ordered by PriorMdl.SeriesNames.
• In general, elements (q – 1)*PriorMdl.NumSeries + 1 through q*PriorMdl.NumSeries correspond to the lag q AR coefficients of the response variables ordered by PriorMdl.SeriesNames.
• If PriorMdl.IncludeConstant is true, element PriorMdl.NumSeries*PriorMdl.P + 1 is the model constant.
• If PriorMdl.IncludeTrend is true, element PriorMdl.NumSeries*PriorMdl.P + 2 is the linear time trend coefficient.
• If PriorMdl.NumPredictors > 0, elements PriorMdl.NumSeries*PriorMdl.P + 3 through k compose the vector of regression coefficients of the exogenous variables.
This figure shows the structure of Coeff(L,j) for a 2-D VAR(3) model that contains a constant vector and four exogenous predictors.
$$\Big[\ \underbrace{\phi_{1,11}\ \phi_{1,12}\ \phi_{2,11}\ \phi_{2,12}\ \phi_{3,11}\ \phi_{3,12}\ c_1\ \beta_{11}\ \beta_{12}\ \beta_{13}\ \beta_{14}}_{y_{1,t}}\ \ \underbrace{\phi_{1,21}\ \phi_{1,22}\ \phi_{2,21}\ \phi_{2,22}\ \phi_{3,21}\ \phi_{3,22}\ c_2\ \beta_{21}\ \beta_{22}\ \beta_{23}\ \beta_{24}}_{y_{2,t}}\ \Big],$$
where
• ϕq,jk is element (j,k) of the lag q AR coefficient matrix.
• cj is the model constant in the equation of response variable j.
• βju is the regression coefficient of exogenous variable u in the equation of response variable j.
Sigma — Simulated innovations covariance matrices
array of positive definite numeric matrices
Simulated innovations covariance matrices, returned as a PriorMdl.NumSeries-by-PriorMdl.NumSeries-by-NumDraws array of positive definite numeric matrices. Each page is a separate draw (covariance) from the distribution. Rows and columns correspond to innovations in the equations of the response variables ordered by PriorMdl.SeriesNames. If PriorMdl is a normalbvarm object, all covariances in Sigma are equal to PriorMdl.Covariance.
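To make the layout of a single draw concrete, the following sketch unpacks one column of Coeff into AR matrices, constants, and regression coefficients for the 2-D VAR(3) case with a constant and four predictors. The draw is a synthetic stand-in (consecutive integers), and the variable names are assumptions; the index arithmetic follows the element ordering described above.

```matlab
% Illustrative unpacking of one coefficient draw for a 2-D VAR(3) model with
% a constant, no time trend, and four predictors (synthetic values).
m = 2; p = 3; r = 4;                   % series, lags, predictors
hasConstant = true; hasTrend = false;
k = m*p + hasConstant + hasTrend + r;  % coefficients per equation

draw = (1:m*k)';                       % stand-in for Coeff(:,j)
Lambda = reshape(draw,k,m);            % column i holds the equation of series i

Phi = cell(p,1);
for q = 1:p
    rows = ((q - 1)*m + 1):(q*m);      % lag q block within each equation
    Phi{q} = Lambda(rows,:).';         % m-by-m lag q AR coefficient matrix
end
c = Lambda(m*p + 1,:).';               % model constants
Beta = Lambda((k - r + 1):k,:).';      % m-by-r regression coefficients
```

Because the draw stacks equations one after another, reshaping to k-by-m and transposing each block recovers matrices whose row j belongs to the equation of response variable j.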
Limitations • simulate cannot draw values from an improper distribution, which is a distribution whose density does not integrate to 1.
More About
Bayesian Vector Autoregression (VAR) Model
A Bayesian VAR model treats all coefficients and the innovations covariance matrix as random variables in the m-dimensional, stationary VARX(p) model. The model has one of the three forms described in this table.
Model | Equation
Reduced-form VAR(p) in difference-equation notation | $y_t = \Phi_1 y_{t-1} + \dots + \Phi_p y_{t-p} + c + \delta t + \mathrm{B} x_t + \varepsilon_t$
Multivariate regression | $y_t = Z_t \lambda + \varepsilon_t$
Matrix regression | $y_t = \Lambda' z_t' + \varepsilon_t$
For each time t = 1,...,T:
• yt is the m-dimensional observed response vector, where m = numseries.
• Φ1,…,Φp are the m-by-m AR coefficient matrices of lags 1 through p, where p = numlags.
• c is the m-by-1 vector of model constants if IncludeConstant is true.
• δ is the m-by-1 vector of linear time trend coefficients if IncludeTrend is true.
• Β is the m-by-r matrix of regression coefficients of the r-by-1 vector of observed exogenous predictors xt, where r = NumPredictors. All predictor variables appear in each equation.
• $z_t = \begin{bmatrix} y_{t-1}' & y_{t-2}' & \cdots & y_{t-p}' & 1 & t & x_t' \end{bmatrix}$, which is a 1-by-(mp + r + 2) vector, and Zt is the m-by-m(mp + r + 2) block diagonal matrix
$$Z_t = \begin{bmatrix} z_t & 0_z & \cdots & 0_z \\ 0_z & z_t & \cdots & 0_z \\ \vdots & \vdots & \ddots & \vdots \\ 0_z & 0_z & \cdots & z_t \end{bmatrix},$$
where 0z is a 1-by-(mp + r + 2) vector of zeros.
• $\Lambda = \begin{bmatrix} \Phi_1 & \Phi_2 & \cdots & \Phi_p & c & \delta & \mathrm{B} \end{bmatrix}'$, which is an (mp + r + 2)-by-m random matrix of the coefficients, and the m(mp + r + 2)-by-1 vector λ = vec(Λ).
• εt is an m-by-1 vector of random, serially uncorrelated, multivariate normal innovations with the zero vector for the mean and the m-by-m matrix Σ for the covariance. This assumption implies that the data likelihood is
$$\ell(\Lambda,\Sigma \mid y,x) = \prod_{t=1}^{T} f(y_t;\Lambda,\Sigma,z_t),$$
where f is the m-dimensional multivariate normal density with mean ztΛ and covariance Σ, evaluated at yt.
Before considering the data, you impose a joint prior distribution assumption on (Λ,Σ), which is governed by the distribution π(Λ,Σ). In a Bayesian analysis, the distribution of the parameters is updated with information about the parameters obtained from the data likelihood. The result is the joint posterior distribution π(Λ,Σ|Y,X,Y0), where:
• Y is a T-by-m matrix containing the entire response series {yt}, t = 1,…,T.
• X is a T-by-r matrix containing the entire exogenous series {xt}, t = 1,…,T.
• Y0 is a p-by-m matrix of presample data used to initialize the VAR model for estimation.
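In symbols, the update from prior to posterior is Bayes' rule applied to the parameters. The proportionality below is the standard identity, written in the notation of this section; the omitted normalizing constant does not depend on (Λ,Σ).

```latex
\pi(\Lambda,\Sigma \mid Y,X,Y_0) \;\propto\; \pi(\Lambda,\Sigma)\,\ell(\Lambda,\Sigma \mid y,x)
```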
Tip • Monte Carlo simulation is subject to variation. If simulate uses Monte Carlo simulation, then estimates and inferences might vary when you call simulate multiple times under seemingly equivalent conditions. To reproduce estimation results, set a random number seed by using rng before calling simulate.
Algorithms
• If simulate estimates a posterior distribution (when you supply Y) and the posterior is analytically tractable, simulate simulates directly from the posterior. Otherwise, simulate uses the Gibbs sampler to estimate the posterior.
• This figure shows how simulate reduces the sample by using the values of NumDraws, Thin, and BurnIn. Rectangles represent successive draws from the distribution. simulate removes the white rectangles from the sample. The remaining NumDraws black rectangles compose the sample.
• If PriorMdl is a semiconjugatebvarm object and you do not specify starting values (Coeff0 and Sigma0), simulate samples from the posterior distribution by applying the Gibbs sampler.
1. simulate uses the default value of Sigma0 for Σ and draws a value of Λ from π(Λ|Σ,Y,X), the full conditional distribution of the VAR model coefficients.
2. simulate draws a value of Σ from π(Σ|Λ,Y,X), the full conditional distribution of the innovations covariance matrix, by using the previously generated value of Λ.
3. The function repeats steps 1 and 2 until convergence. To assess convergence, draw a trace plot of the sample.
If you specify Coeff0, simulate draws a value of Σ from π(Σ|Λ,Y,X) to start the Gibbs sampler.
• simulate does not return default starting values that it generates.
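The sample reduction governed by BurnIn, Thin, and NumDraws amounts to simple index bookkeeping. The sketch below uses the values from the semiconjugate example; the exact positions of the retained draws are an interpretation of the description "discards every Thin – 1 draws, and then retains the next draw", not documented internals.

```matlab
% Illustrative bookkeeping only; values match the semiconjugate example.
BurnIn = 1e4; NumDraws = 1000; Thin = 5;
totalDraws = BurnIn + NumDraws*Thin;     % actual number of draws generated
% After the burn-in, skip Thin - 1 draws and retain the next one.
retained = BurnIn + Thin*(1:NumDraws);   % positions of the retained draws
```

Under this reading, the sampler generates 15,000 draws in total to return 1000, so increasing Thin trades computation time for lower serial correlation.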
Version History Introduced in R2020a
See Also Objects normalbvarm | conjugatebvarm | semiconjugatebvarm | diffusebvarm | empiricalbvarm Functions simsmooth
simulate Monte Carlo simulation of conditional variance models
Syntax V = simulate(Mdl,numObs) V = simulate(Mdl,numObs,Name,Value) [V,Y] = simulate( ___ )
Description V = simulate(Mdl,numObs) simulates a numObs-period conditional variance path from the fully specified conditional variance model Mdl. Mdl can be a garch, egarch, or gjr model. V = simulate(Mdl,numObs,Name,Value) simulates conditional variance paths with additional options specified by one or more Name,Value pair arguments. For example, you can generate multiple sample paths or specify presample innovation paths. [V,Y] = simulate( ___ ) additionally simulates response paths using any of the input arguments in the previous syntaxes.
Examples Simulate GARCH Model Conditional Variances and Responses Simulate conditional variance and response paths from a GARCH(1,1) model. Specify a GARCH(1,1) model with known parameters. Mdl = garch('Constant',0.01,'GARCH',0.7,'ARCH',0.2);
Simulate 500 sample paths, each with 100 observations. rng default; % For reproducibility [V,Y] = simulate(Mdl,100,'NumPaths',500); figure subplot(2,1,1) plot(V) title('Simulated Conditional Variances') subplot(2,1,2) plot(Y) title('Simulated Responses')
The simulated responses look like draws from a stationary stochastic process. Plot the 2.5th, 50th (median), and 97.5th percentiles of the simulated conditional variances. lower = prctile(V,2.5,2); middle = median(V,2); upper = prctile(V,97.5,2); figure plot(1:100,lower,'r:',1:100,middle,'k',... 1:100,upper,'r:','LineWidth',2) legend('95% Interval','Median') title('Approximate 95% Intervals')
The intervals are asymmetric due to positivity constraints on the conditional variance.
Simulate EGARCH Model Conditional Variances and Responses Simulate conditional variance and response paths from an EGARCH(1,1) model. Specify an EGARCH(1,1) model with known parameters. Mdl = egarch('Constant',0.001,'GARCH',0.7,'ARCH',0.2,... 'Leverage',-0.3);
Simulate 500 sample paths, each with 100 observations. rng default; % For reproducibility [V,Y] = simulate(Mdl,100,'NumPaths',500); figure subplot(2,1,1) plot(V) title('Simulated Conditional Variances') subplot(2,1,2) plot(Y) title('Simulated Responses (Innovations)')
The simulated responses look like draws from a stationary stochastic process. Plot the 2.5th, 50th (median), and 97.5th percentiles of the simulated conditional variances. lower = prctile(V,2.5,2); middle = median(V,2); upper = prctile(V,97.5,2); figure plot(1:100,lower,'r:',1:100,middle,'k',... 1:100, upper,'r:','LineWidth',2) legend('95% Interval','Median') title('Approximate 95% Intervals')
The intervals are asymmetric due to positivity constraints on the conditional variance.
Simulate GJR Model Conditional Variances and Responses Simulate conditional variance and response paths from a GJR(1,1) model. Specify a GJR(1,1) model with known parameters. Mdl = gjr('Constant',0.001,'GARCH',0.7,'ARCH',0.2,... 'Leverage',0.1);
Simulate 500 sample paths, each with 100 observations. rng default; % For reproducibility [V,Y] = simulate(Mdl,100,'NumPaths',500); figure subplot(2,1,1) plot(V) title('Simulated Conditional Variances') subplot(2,1,2) plot(Y) title('Simulated Responses (Innovations)')
The simulated responses look like draws from a stationary stochastic process. Plot the 2.5th, 50th (median), and 97.5th percentiles of the simulated conditional variances. lower = prctile(V,2.5,2); middle = median(V,2); upper = prctile(V,97.5,2); figure plot(1:100,lower,'r:',1:100,middle,'k',... 1:100, upper,'r:','LineWidth',2) legend('95% Interval','Median') title('Approximate 95% Intervals')
The intervals are asymmetric due to positivity constraints on the conditional variance.
Forecast Conditional Variances by Monte-Carlo Simulation Simulate conditional variances of the daily NASDAQ Composite Index returns for 500 days. Use the simulations to make forecasts and approximate 95% forecast intervals. Compare the forecasts among GARCH(1,1), EGARCH(1,1), and GJR(1,1) fits. Load the NASDAQ data included with the toolbox. Convert the index to returns. load Data_EquityIdx nasdaq = DataTable.NASDAQ; r = price2ret(nasdaq); T = length(r);
Fit GARCH(1,1), EGARCH(1,1), and GJR(1,1) models to the entire data set. Infer conditional variances to use as presample conditional variances for the forecast simulation. Mdl = cell(3,1); % Preallocation Mdl{1} = garch(1,1); Mdl{2} = egarch(1,1); Mdl{3} = gjr(1,1); EstMdl = cellfun(@(x)estimate(x,r,'Display','off'),Mdl,...
'UniformOutput',false); v0 = cellfun(@(x)infer(x,r),EstMdl,'UniformOutput',false);
EstMdl is a 3-by-1 cell vector. Each cell is a different type of estimated conditional variance model, e.g., EstMdl{1} is an estimated GARCH(1,1) model. v0 is a 3-by-1 cell vector, and each cell contains the inferred conditional variances from the corresponding estimated model.
Simulate 1000 sample paths with 500 observations each. Use the observed returns and inferred conditional variances as presample data.
vSim = cell(3,1); % Preallocation
for j = 1:3
    rng default; % For reproducibility
    vSim{j} = simulate(EstMdl{j},500,'NumPaths',1000,'E0',r,'V0',v0{j});
end
vSim is a 3-by-1 cell vector, and each cell contains a 500-by-1000 matrix of simulated conditional variances generated from the corresponding, estimated model. Plot the simulation mean forecasts and approximate 95% forecast intervals, along with the conditional variances inferred from the data. lower = cellfun(@(x)prctile(x,2.5,2),vSim,'UniformOutput',false); upper = cellfun(@(x)prctile(x,97.5,2),vSim,'UniformOutput',false); mn = cellfun(@(x)mean(x,2),vSim,'UniformOutput',false); datesPlot = dates(end - 250:end); datesFH = dates(end) + (1:500)'; h = zeros(3,4); figure for j = 1:3 col = zeros(1,3); col(j) = 1; h(j,1) = plot(datesPlot,v0{j}(end-250:end),'Color',col); hold on h(j,2) = plot(datesFH,mn{j},'Color',col,'LineWidth',3); h(j,3:4) = plot([datesFH datesFH],[lower{j} upper{j}],':',... 'Color',col,'LineWidth',2); end hGCA = gca; plot(datesFH(1)*[1 1],hGCA.YLim,'k--'); datetick; axis tight; h = h(:,1:3); legend(h(:),'GARCH - Inferred','EGARCH - Inferred','GJR - Inferred',... 'GARCH - Sim. Mean','EGARCH - Sim. Mean','GJR - Sim. Mean',... 'GARCH - 95% Fore. Int.','EGARCH - 95% Fore. Int.',... 'GJR - 95% Fore. Int.','Location','NorthEast') title('Simulated Conditional Variance Forecasts') hold off
Input Arguments
Mdl — Conditional variance model
garch model object | egarch model object | gjr model object
Conditional variance model without any unknown parameters, specified as a garch, egarch, or gjr model object. Mdl cannot contain any properties that have a NaN value.
numObs — Sample path length
positive integer
Sample path length, specified as a positive integer. That is, the number of random observations to generate per output path. V and Y have numObs rows.
Name-Value Pair Arguments
Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter. Before R2021a, use commas to separate each name and value, and enclose Name in quotes.
Example: 'NumPaths',1000,'E0',[0.5; 0.5] specifies to generate 1000 sample paths and to use [0.5; 0.5] as presample innovations per path.
NumPaths — Number of sample paths to generate 1 (default) | positive integer Number of sample paths to generate, specified as the comma-separated pair consisting of 'NumPaths' and a positive integer. V and Y have NumPaths columns. Example: 'NumPaths',1000 Data Types: double E0 — Presample innovations numeric column vector | numeric matrix Presample innovations, specified as the comma-separated pair consisting of 'E0' and a numeric column vector or matrix. The presample innovations provide initial values for the innovations process of the conditional variance model Mdl. The presample innovations derive from a distribution with mean 0. E0 must contain at least Mdl.Q elements or rows. If E0 contains extra rows, simulate uses the latest Mdl.Q only. The last element or row contains the latest presample innovation. • If E0 is a column vector, it represents a single path of the underlying innovation series. simulate applies E0 to each simulated path. • If E0 is a matrix, then each column represents a presample path of the underlying innovation series. E0 must have at least NumPaths columns. If E0 has more columns than necessary, simulate uses the first NumPaths columns only. The defaults are: • For GARCH(P,Q) and GJR(P,Q) models, simulate sets any necessary presample innovations to an independent sequence of disturbances with mean zero and standard deviation equal to the unconditional standard deviation of the conditional variance process. • For EGARCH(P,Q) models, simulate sets any necessary presample innovations to an independent sequence of disturbances with mean zero and variance equal to the exponentiated unconditional mean of the logarithm of the EGARCH variance process. Example: 'E0',[0.5; 0.5] V0 — Positive presample conditional variance paths numeric column vector | numeric matrix Positive presample conditional variance paths, specified as a numeric vector or matrix. V0 provides initial values for the conditional variances in the model. 
• If V0 is a column vector, then simulate applies it to each output path.
• If V0 is a matrix, then it must have at least NumPaths columns. If V0 has more columns than necessary, simulate uses the first NumPaths columns only.
• For GARCH(P,Q) and GJR(P,Q) models:
  • V0 must have at least Mdl.P rows to initialize the variance equation.
  • By default, simulate sets any necessary presample variances to the unconditional variance of the conditional variance process.
• For EGARCH(P,Q) models:
  • V0 must have at least max(Mdl.P,Mdl.Q) rows to initialize the variance equation.
  • By default, simulate sets any necessary presample variances to the exponentiated unconditional mean of the logarithm of the EGARCH variance process.
If the number of rows in V0 exceeds the number necessary, then simulate uses the latest, required number of observations only. The last element or row contains the latest observation.
Example: 'V0',[1; 0.5]
Data Types: double
Notes
• If E0 and V0 are column vectors, simulate applies them to every column of the outputs V and Y. This application allows simulated paths to share a common starting point for Monte Carlo simulation of forecasts and forecast error distributions.
• NaNs indicate missing values. simulate removes missing values. The software merges the presample data (E0 and V0), and then uses list-wise deletion to remove any rows containing at least one NaN. Removing NaNs in the data reduces the sample size. Removing NaNs can also create irregular time series.
• simulate assumes that you synchronize presample data such that the latest observation of each presample series occurs simultaneously.
Output Arguments V — Simulated conditional variance paths numeric column vector | numeric matrix Simulated conditional variance paths of the mean-zero innovations associated with Y, returned as a numeric column vector or matrix. V is a numObs-by-NumPaths matrix, in which each column corresponds to a simulated conditional variance path. Rows of V are periods corresponding to the periodicity of Mdl. Y — Simulated response paths numeric column vector | numeric matrix Simulated response paths, returned as a numeric column vector or matrix. Y usually represents a mean-zero, heteroscedastic time series of innovations with conditional variances given in V (a continuation of the presample innovation series E0). Y can also represent a time series of mean-zero, heteroscedastic innovations plus an offset. If Mdl includes an offset, then simulate adds the offset to the underlying mean-zero, heteroscedastic innovations so that Y represents a time series of offset-adjusted innovations. Y is a numObs-by-NumPaths matrix, in which each column corresponds to a simulated response path. Rows of Y are periods corresponding to the periodicity of Mdl.
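To show how V and Y relate, the following hand-rolled GARCH(1,1) recursion produces outputs of the same shape. It is a sketch under stated assumptions (Gaussian innovations, no offset, presample values set as described for the GARCH defaults) and is not the toolbox implementation; the variable names are hypothetical.

```matlab
% Hand-rolled GARCH(1,1) simulation, for illustration only. Assumes Gaussian
% innovations and no offset; it is not the toolbox implementation.
rng(1);                                      % for reproducibility
kappa = 0.01; garchCoef = 0.7; archCoef = 0.2;
numObs = 100; numPaths = 500;

uncondVar = kappa/(1 - garchCoef - archCoef);   % unconditional variance
vPrev = uncondVar*ones(1,numPaths);             % presample conditional variances
ePrev = sqrt(uncondVar)*randn(1,numPaths);      % presample innovations

V = zeros(numObs,numPaths);
Y = zeros(numObs,numPaths);
for t = 1:numObs
    % Variance recursion: constant + GARCH term + ARCH term.
    V(t,:) = kappa + garchCoef*vPrev + archCoef*ePrev.^2;
    % Mean-zero innovation with conditional variance V(t,:).
    Y(t,:) = sqrt(V(t,:)).*randn(1,numPaths);
    vPrev = V(t,:);
    ePrev = Y(t,:);
end
```

Each column of V and Y is one path, and row t of Y is drawn with the conditional variance in row t of V, which mirrors the numObs-by-NumPaths layout described above.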
Version History Introduced in R2012a
References [1] Bollerslev, T. “Generalized Autoregressive Conditional Heteroskedasticity.” Journal of Econometrics. Vol. 31, 1986, pp. 307–327. [2] Bollerslev, T. “A Conditionally Heteroskedastic Time Series Model for Speculative Prices and Rates of Return.” The Review of Economics and Statistics. Vol. 69, 1987, pp. 542–547. [3] Box, G. E. P., G. M. Jenkins, and G. C. Reinsel. Time Series Analysis: Forecasting and Control. 3rd ed. Englewood Cliffs, NJ: Prentice Hall, 1994. [4] Enders, W. Applied Econometric Time Series. Hoboken, NJ: John Wiley & Sons, 1995. [5] Engle, R. F. “Autoregressive Conditional Heteroskedasticity with Estimates of the Variance of United Kingdom Inflation.” Econometrica. Vol. 50, 1982, pp. 987–1007. [6] Glosten, L. R., R. Jagannathan, and D. E. Runkle. “On the Relation between the Expected Value and the Volatility of the Nominal Excess Return on Stocks.” The Journal of Finance. Vol. 48, No. 5, 1993, pp. 1779–1801. [7] Hamilton, J. D. Time Series Analysis. Princeton, NJ: Princeton University Press, 1994. [8] Nelson, D. B. “Conditional Heteroskedasticity in Asset Returns: A New Approach.” Econometrica. Vol. 59, 1991, pp. 347–370.
See Also Objects garch | egarch | gjr Functions estimate | forecast | filter Topics “Simulate GARCH Models” on page 8-76 “Simulate Conditional Variance Model” on page 8-86 “Assess EGARCH Forecast Bias Using Simulations” on page 8-81 “Monte Carlo Simulation of Conditional Variance Models” on page 8-72 “Presample Data for Conditional Variance Model Simulation” on page 8-75 “Monte Carlo Forecasting of Conditional Variance Models” on page 8-89
simulate Simulate Markov chain state walks
Syntax X = simulate(mc,numSteps) X = simulate(mc,numSteps,'X0',x0)
Description X = simulate(mc,numSteps) returns data X on random walks of length numSteps through sequences of states in the discrete-time Markov chain mc. X = simulate(mc,numSteps,'X0',x0) optionally specifies the initial state of simulations x0.
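Examples
Simulate Random Walk Through Markov Chain
Consider this theoretical, right-stochastic transition matrix of a stochastic process.
$$P = \begin{bmatrix} 0 & 0 & 1/2 & 1/4 & 1/4 & 0 & 0 \\ 0 & 0 & 1/3 & 0 & 2/3 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 1/3 & 2/3 \\ 0 & 0 & 0 & 0 & 0 & 1/2 & 1/2 \\ 0 & 0 & 0 & 0 & 0 & 3/4 & 1/4 \\ 1/2 & 1/2 & 0 & 0 & 0 & 0 & 0 \\ 1/4 & 3/4 & 0 & 0 & 0 & 0 & 0 \end{bmatrix}.$$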
Create the Markov chain that is characterized by the transition matrix P. P = [ 0 0 1/2 1/4 1/4 0 0 ; 0 0 1/3 0 2/3 0 0 ; 0 0 0 0 0 1/3 2/3; 0 0 0 0 0 1/2 1/2; 0 0 0 0 0 3/4 1/4; 1/2 1/2 0 0 0 0 0 ; 1/4 3/4 0 0 0 0 0 ]; mc = dtmc(P);
Plot a directed graph of the Markov chain. Indicate the probability of transition by using edge colors. figure; graphplot(mc,'ColorEdges',true);
Simulate a 20-step random walk that starts from a random state. rng(1); % For reproducibility numSteps = 20; X = simulate(mc,numSteps) X = 21×1 3 7 1 3 6 1 3 7 2 5
⋮
X is a 21-by-1 matrix. Rows correspond to steps in the random walk. Because X(1) is 3, the random walk begins at state 3. Visualize the random walk. figure; simplot(mc,X);
Specify Starting States for Multiple Simulations Create a four-state Markov chain from a randomly generated transition matrix containing eight infeasible transitions. rng('default'); % For reproducibility mc = mcmix(4,'Zeros',8);
mc is a dtmc object. Plot a digraph of the Markov chain. figure; graphplot(mc);
State 4 is an absorbing state. Run three 10-step simulations for each state. x0 = 3*ones(1,mc.NumStates); numSteps = 10; X = simulate(mc,numSteps,'X0',x0);
X is an 11-by-12 matrix. Rows correspond to steps in the random walk. Columns 1–3 are the simulations that start at state 1; columns 4–6 are the simulations that start at state 2; columns 7–9 are the simulations that start at state 3; and columns 10–12 are the simulations that start at state 4.
For each time, plot the proportions of states that are visited over all simulations.
figure;
simplot(mc,X)
Input Arguments
mc — Discrete-time Markov chain
dtmc object
Discrete-time Markov chain with NumStates states and transition matrix P, specified as a dtmc object. P must be fully specified (no NaN entries).
numSteps — Number of discrete time steps
positive integer
Number of discrete time steps in each simulation, specified as a positive integer.
Data Types: double
Name-Value Pair Arguments
Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter. Before R2021a, use commas to separate each name and value, and enclose Name in quotes.
Example: 'X0',[1 0 2] specifies three simulations: the first starts in state 1, and the final two start in state 3.
X0 — Initial states of simulations vector of nonnegative integers Initial states of simulations, specified as the comma-separated pair consisting of 'X0' and a vector of nonnegative integers of NumStates length. X0 provides counts for the number of simulations to begin in each state. The total number of simulations (numSims) is sum(X0). The default is a single simulation beginning from a random initial state. Example: 'X0',[10 10 0 5] Data Types: double
Output Arguments X — Indices of states numeric matrix of positive integers Indices of states visited during the simulations, returned as a (1 + numSteps)-by-numSims numeric matrix of positive integers. The first row contains the initial states. Columns, in order, are all simulations beginning in the first state, then all simulations beginning in the second state, and so on.
Tips • To start n simulations from state k, use: X0 = zeros(1,NumStates); X0(k) = n;
• To visualize the data created by simulate, use simplot.
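As a rough illustration of the first tip, the sketch below starts n simulations from a single state k and tabulates the proportion of simulations in each state at every step, which is the quantity that simplot displays by default. The transition matrix and the names states and props are assumptions made for this sketch.

```matlab
P = [0.5 0.5 0; 0.25 0.5 0.25; 0 0.5 0.5];   % illustrative three-state chain
mc = dtmc(P);

k = 2;                          % starting state
n = 200;                        % number of simulations
X0 = zeros(1,mc.NumStates);
X0(k) = n;                      % all n simulations begin in state k

rng(1);                         % for reproducibility
X = simulate(mc,15,'X0',X0);    % 16-by-200 matrix of state indices

% Proportion of simulations in each state at each step.
% X == states expands to a 16-by-200-by-3 logical array.
states = reshape(1:mc.NumStates,1,1,[]);
props = squeeze(mean(X == states,2));   % 16-by-3; each row sums to 1
```

The first row of props is [0 1 0] because every simulation starts in state 2.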
Version History Introduced in R2017b
See Also Objects dtmc Functions redistribute | simplot Topics “Markov Chain Modeling” on page 10-8 “Create and Modify Markov Chain Model Objects” on page 10-17 “Visualize Markov Chain Structure and Evolution” on page 10-27 “Simulate Random Walks Through Markov Chain” on page 10-59
simulate Simulate sample paths of Markov-switching dynamic regression model
Syntax
Y = simulate(Mdl,numObs)
Y = simulate(Mdl,numObs,Name,Value)
[Y,E,StatePaths] = simulate( ___ )
Description
Y = simulate(Mdl,numObs) returns a random numObs-period path of response series Y from simulating the fully specified Markov-switching dynamic regression model Mdl.
Y = simulate(Mdl,numObs,Name,Value) uses additional options specified by one or more name-value arguments. For example, 'NumPaths',1000,'Y0',Y0 simulates 1000 sample paths and initializes the dynamic component of each submodel by using the presample response data Y0.
[Y,E,StatePaths] = simulate( ___ ) also returns the simulated innovation paths E and the simulated state paths StatePaths, using any of the input argument combinations in the previous syntaxes.
Examples
Simulate Response Path
Simulate a response path from a two-state Markov-switching dynamic regression model for a 1-D response process. This example uses arbitrary parameter values.
Create Fully Specified Model
Create a two-state discrete-time Markov chain model that describes the regime switching mechanism. Label the regimes.
P = [0.9 0.1; 0.3 0.7];
mc = dtmc(P,'StateNames',["Expansion" "Recession"]);
mc is a fully specified dtmc object. For each regime, use arima to create an AR model that describes the response process within the regime. Store the submodels in a vector.
mdl1 = arima('Constant',5,'AR',[0.3 0.2],...
    'Variance',2);
mdl2 = arima('Constant',-5,'AR',0.1,...
    'Variance',1);
mdl = [mdl1; mdl2];
mdl1 and mdl2 are fully specified arima objects.
Create a Markov-switching dynamic regression model from the switching mechanism mc and the vector of submodels mdl.
Mdl = msVAR(mc,mdl);
Mdl is a fully specified msVAR object.
Simulate Response Path
Generate one random response path of length 50 from the model.
rng(1); % For reproducibility
y = simulate(Mdl,50);
y is a 50-by-1 vector of one response path. Plot the response path.
figure
plot(y)
xlabel("Time")
ylabel("Response")
Simulate Multiple Paths
Consider the model in “Simulate Response Path” on page 12-2021.
Create the Markov-switching dynamic regression model.
P = [0.9 0.1; 0.3 0.7];
mc = dtmc(P,'StateNames',["Expansion" "Recession"]);
mdl1 = arima('Constant',5,'AR',[0.3 0.2],...
    'Variance',2);
mdl2 = arima('Constant',-5,'AR',0.1,...
    'Variance',1);
mdl = [mdl1; mdl2];
Mdl = msVAR(mc,mdl);
Simulate 3 response, innovations, and state-index paths of 5 observations from the model.
rng('default') % For reproducibility
[Y,E,SP] = simulate(Mdl,5,'NumPaths',3)
Y = 5×3
   -5.7605   10.9496   11.4633
   -5.7002    8.5772   -3.1268
    4.2446   10.7774   -5.6161
   -3.1665   -2.2920   -5.2677
   -3.8995   -4.7403   -6.3141
E = 5×3
   -0.2050    0.9496    1.4633
   -0.1241   -1.7076    0.7269
    2.1068    1.0143   -0.3034
    1.4090    1.6302    0.2939
    1.4172    0.4889   -0.7873
SP = 5×3
     2     1     1
     2     1     2
     1     1     2
     2     2     2
     2     2     2
Y, E, and SP are 5-by-3 matrices. Columns represent separate, independent paths. Simulate a single path of responses, innovations, and states into a simulation horizon of length 50. Then plot each path separately.
[y,e,sp] = simulate(Mdl,50);
figure
subplot(3,1,1)
plot(y)
ylabel('Response')
grid on
subplot(3,1,2)
plot(e)
ylabel('Innovation')
grid on
subplot(3,1,3)
plot(sp,'m')
ylabel('State')
yticks([1 2])
yticklabels(Mdl.StateNames)
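The first simulated response in each path reflects the default presample. For a stationary AR submodel without regression terms, the default presample is the unconditional mean c/(1 − Σ AR coefficients), so the first response minus the first innovation should equal that mean. In the printed three-path output of this example, 10.9496 − 0.9496 = 10 for a path that starts in state 1. The sketch below repeats the comparison; mu and condMean1 are names introduced here, and it assumes that SP(1,:) is the regime that determines the default presample.

```matlab
P = [0.9 0.1; 0.3 0.7];
mc = dtmc(P,'StateNames',["Expansion" "Recession"]);
mdl1 = arima('Constant',5,'AR',[0.3 0.2],'Variance',2);
mdl2 = arima('Constant',-5,'AR',0.1,'Variance',1);
Mdl = msVAR(mc,[mdl1; mdl2]);

% Unconditional mean of each AR submodel: constant/(1 - sum of AR terms)
mu = [5/(1 - 0.3 - 0.2); -5/(1 - 0.1)];   % [10; -5.5556]

rng('default')                             % for reproducibility
[Y,E,SP] = simulate(Mdl,5,'NumPaths',3);

% First response minus first innovation = conditional mean at time 1
condMean1 = Y(1,:) - E(1,:);

% Row 1: conditional means; row 2: unconditional mean of the first regime.
% The rows are expected to agree under the assumption stated above.
disp([condMean1; mu(SP(1,:))'])
```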
Simulate US GDP Rates and Economic States
Consider a two-state Markov-switching dynamic regression model of the postwar US real GDP growth rate. The model has the parameter estimates presented in [1]. Create a discrete-time Markov chain model that describes the regime switching mechanism. Label the regimes.
P = [0.92 0.08; 0.26 0.74];
mc = dtmc(P,'StateNames',["Expansion" "Recession"]);
Create separate AR(0) models (constant only) for the two regimes.
sigma = 3.34; % Homoscedastic models across states
mdl1 = arima('Constant',4.62,'Variance',sigma^2);
mdl2 = arima('Constant',-0.48,'Variance',sigma^2);
mdl = [mdl1 mdl2];
Create the Markov-switching dynamic regression model that describes the behavior of the US GDP growth rate.
Mdl = msVAR(mc,mdl);
Mdl is a fully specified msVAR object. Generate one random path of 100 responses, corresponding innovations, and states from the model.
rng(1) % For reproducibility
[y,e,sp] = simulate(Mdl,100);
y is a 100-by-1 vector of GDP rates, and e is a 100-by-1 vector of corresponding innovations. sp is a 100-by-1 vector of state indices.
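One way to sanity-check the simulated state path is to compare the share of periods spent in each regime with the long-run regime probabilities implied by P. The sketch below is illustrative only; occupancy, piStat, and piByHand are names introduced here. For a two-state chain the stationary probability of state 1 is p21/(p12 + p21), which is 0.26/0.34 ≈ 0.76 for this transition matrix.

```matlab
P = [0.92 0.08; 0.26 0.74];
mc = dtmc(P,'StateNames',["Expansion" "Recession"]);
sigma = 3.34;
mdl1 = arima('Constant',4.62,'Variance',sigma^2);
mdl2 = arima('Constant',-0.48,'Variance',sigma^2);
Mdl = msVAR(mc,[mdl1 mdl2]);

rng(1)                                   % for reproducibility
[~,~,sp] = simulate(Mdl,100);

% Share of simulated periods spent in each regime
occupancy = accumarray(sp,1,[mc.NumStates 1])/numel(sp);

% Long-run regime probabilities implied by P
piStat = asymptotics(mc);                % 1-by-2 stationary distribution
piByHand = [0.26 0.08]/(0.26 + 0.08);    % two-state closed form

% Columns: simulated share, stationary (toolbox), stationary (by hand)
disp([occupancy piStat' piByHand'])
```

With only 100 observations the simulated shares can differ noticeably from the stationary probabilities; longer paths should bring them closer.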
Specify Presample Data
Consider the Markov-switching model in “Simulate US GDP Rates and Economic States” on page 12-2024, but assume that the submodels are AR(1) instead. Consider fitting the model to observations in the period 1960:Q1–2004:Q2. Create the model template for estimation. Specify AR(1) submodels.
mc = dtmc(NaN(2),'StateNames',["Expansion" "Recession"]);
ar1 = arima(1,0,0);
Mdl = msVAR(mc,[ar1; ar1]);
Because the submodels are AR(1), each requires one presample observation to initialize its dynamic component for estimation. Create the model containing initial parameter values for the estimation procedure.
mc0 = dtmc(0.5*ones(2),'StateNames',["Expansion" "Recession"]);
submdl01 = arima('Constant',1,'Variance',1,'AR',0.001);
submdl02 = arima('Constant',-1,'Variance',1,'AR',0.001);
Mdl0 = msVAR(mc0,[submdl01; submdl02]);
Load the data. Transform the entire set to an annualized rate series.
load Data_GDP
qrate = diff(Data)./Data(1:(end - 1));
arate = 100*((1 + qrate).^4 - 1);
Identify the presample and estimation sample periods using the dates associated with the annualized rate series. Because the transformation applies the first difference, you must drop the first observation date from the original sample.
dates = datetime(dates(2:end),'ConvertFrom','datenum',...
    'Format','yyyy:QQQ','Locale','en_US');
estPrd = datetime(["1960:Q2" "2004:Q2"],'InputFormat','yyyy:QQQ',...
    'Format','yyyy:QQQ','Locale','en_US');
idxEst = isbetween(dates,estPrd(1),estPrd(2));
idxPre = dates < estPrd(1);
Fit the model to the estimation sample data. Specify the presample observation.
y0 = arate(idxPre);
EstMdl = estimate(Mdl,Mdl0,arate(idxEst),'Y0',y0);
Simulate a response path from the fitted model over the estimation period. Specify the presample observation.
rng(1) % For reproducibility
numObs = sum(idxEst);
aratesim = simulate(EstMdl,numObs,'Y0',y0);
Plot the observations and simulated path of annualized rates, and identify periods of recession by using recessionplot.
figure;
plot(dates(idxEst),[arate(idxEst) aratesim])
recessionplot
xlabel("Time")
ylabel("Annualized GDP Rate")
legend("Observed","Simulated");
Fit Model to Simulated Data
Assess estimation accuracy using simulated data from a known data-generating process (DGP). This example uses arbitrary parameter values.
Create Model for DGP
Create a fully specified, two-state discrete-time Markov chain model for the switching mechanism.
P = [0.7 0.3; 0.1 0.9];
mc = dtmc(P);
For each state, create a fully specified AR(1) model for the response process.
% Constants
C1 = 5;
C2 = -2;
% Autoregression coefficients
AR1 = 0.4;
AR2 = 0.2;
% Variances
V1 = 4;
V2 = 2;
% AR Submodels
dgp1 = arima('Constant',C1,'AR',AR1,'Variance',V1);
dgp2 = arima('Constant',C2,'AR',AR2,'Variance',V2);
Create a fully specified Markov-switching dynamic regression model for the DGP.
DGP = msVAR(mc,[dgp1,dgp2]);
Simulate Response Paths from DGP
Generate 10 random response paths of length 1000 from the DGP.
rng(1); % For reproducibility
N = 10;
n = 1000;
Data = simulate(DGP,n,'NumPaths',N);
Data is a 1000-by-10 matrix of simulated responses.
Create Model for Estimation
Create a partially specified Markov-switching dynamic regression model that has the same structure as the data-generating process, but specify an unknown transition matrix and unknown submodel coefficients.
PEst = NaN(2);
mcEst = dtmc(PEst);
mdl = arima(1,0,0);
Mdl = msVAR(mcEst,[mdl; mdl]);
Create Model Containing Initial Values
Create a fully specified Markov-switching dynamic regression model that has the same structure as Mdl, but set all estimable parameters to initial values.
P0 = 0.5*ones(2);
mc0 = dtmc(P0);
mdl01 = arima('Constant',1,'AR',0.5,'Variance',2);
mdl02 = arima('Constant',-1,'AR',0.5,'Variance',1);
Mdl0 = msVAR(mc0,[mdl01,mdl02]);
Estimate Models
Fit the model to each simulated path. For each path, plot the loglikelihood at each iteration of the EM algorithm.
c1 = zeros(N,1);
c2 = zeros(N,1);
v1 = zeros(N,1);
v2 = zeros(N,1);
ar1 = zeros(N,1);
ar2 = zeros(N,1);
PStack = zeros(2,2,N);
figure
hold on
for i = 1:N
    EstModel = estimate(Mdl,Mdl0,Data(:,i),'IterationPlot',true);
    c1(i) = EstModel.Submodels(1).Constant;
    c2(i) = EstModel.Submodels(2).Constant;
    v1(i) = EstModel.Submodels(1).Covariance;
    v2(i) = EstModel.Submodels(2).Covariance;
    ar1(i) = EstModel.Submodels(1).AR{1};
    ar2(i) = EstModel.Submodels(2).AR{1};
    PStack(:,:,i) = EstModel.Switch.P;
end
hold off
Assess Accuracy
Compute the Monte Carlo mean of each estimated parameter.
c1Mean = mean(c1);
c2Mean = mean(c2);
v1Mean = mean(v1);
v2Mean = mean(v2);
ar1Mean = mean(ar1);
ar2Mean = mean(ar2);
PMean = mean(PStack,3);
Compare population parameters to the corresponding Monte Carlo estimates.
DGPvsEstimate = [...
    C1 c1Mean
    C2 c2Mean
    V1 v1Mean
    V2 v2Mean
    AR1 ar1Mean
    AR2 ar2Mean]
DGPvsEstimate = 6×2
    5.0000    5.0260
   -2.0000   -1.9615
    4.0000    3.9710
    2.0000    1.9903
    0.4000    0.4061
    0.2000    0.2017
P
P = 2×2
    0.7000    0.3000
    0.1000    0.9000
PEstimate = PMean
PEstimate = 2×2
    0.7065    0.2935
    0.1023    0.8977
Simulate Paths from Model with VARX Submodels
Generate random paths from a three-state Markov-switching dynamic regression model for a 2-D VARX response process. This example uses arbitrary parameter values for the DGP.
Create Fully Specified Model for DGP
Create a three-state discrete-time Markov chain model for the switching mechanism.
P = [10 1 1; 1 10 1; 1 1 10];
mc = dtmc(P);
mc is a fully specified dtmc object. dtmc normalizes the rows of P so that they sum to 1. For each regime, use varm to create a VAR model that describes the response process within the regime. Specify all parameter values.
% Constants
C1 = [1;-1];
C2 = [2;-2];
C3 = [3;-3];
% Autoregression coefficients
AR1 = {};
AR2 = {[0.5 0.1; 0.5 0.5]};
AR3 = {[0.25 0; 0 0] [0 0; 0.25 0]};
% Regression coefficients
Beta1 = [1;-1];
Beta2 = [2 2;-2 -2];
Beta3 = [3 3 3;-3 -3 -3];
% Innovations covariances
Sigma1 = [1 -0.1; -0.1 1];
Sigma2 = [2 -0.2; -0.2 2];
Sigma3 = [3 -0.3; -0.3 3];
% VARX submodels
mdl1 = varm('Constant',C1,'AR',AR1,'Beta',Beta1,'Covariance',Sigma1);
mdl2 = varm('Constant',C2,'AR',AR2,'Beta',Beta2,'Covariance',Sigma2);
mdl3 = varm('Constant',C3,'AR',AR3,'Beta',Beta3,'Covariance',Sigma3);
mdl = [mdl1; mdl2; mdl3];
mdl contains three fully specified varm model objects. For the DGP, create a fully specified Markov-switching dynamic regression model from the switching mechanism mc and the submodels mdl.
Mdl = msVAR(mc,mdl);
Mdl is a fully specified msVAR model.
Simulate Data Ignoring Regression Component
If you do not supply exogenous data, simulate ignores the regression components in the submodels. Simulate 3 separate, independent paths of responses, innovations, and state indices of length 5 from the model.
rng(1); % For reproducibility
[Y,E,SP] = simulate(Mdl,5,'NumPaths',3)
Y =
Y(:,:,1) =
    5.2387    1.5297
    4.4290    4.2738
    1.1668   -1.2905
   -0.9654   -0.2028
   -0.2701    0.8993
Y(:,:,2) =
    2.7737   -2.5383
   -0.8651   -1.1046
   -0.0511    0.3696
    0.5826   -0.8926
    2.4022   -0.6912
Y(:,:,3) =
    3.5443    0.8768
    4.9748   -0.7956
    5.7213    0.8073
    4.2473    0.5805
    2.7972   -1.3340
E =
E(:,:,1) =
    1.2387    1.5297
   -0.3434    2.8896
    0.1668   -0.2905
   -1.9654    0.7972
   -1.2701    1.8993
E(:,:,2) =
    1.7737   -1.5383
   -1.8651   -0.1046
   -1.0511    1.3696
   -0.4174    0.1074
    1.4022    0.3088
E(:,:,3) =
   -0.4557    0.8768
    1.1150   -1.0061
    1.3134    0.7176
   -0.6941   -0.6838
    1.7972   -0.3340
SP = 5×3
     2     1     2
     2     1     2
     1     1     2
     1     1     2
     1     1     1
Y and E are 5-by-2-by-3 arrays of simulated responses and innovations, respectively. Rows correspond to time points, columns correspond to variables in the system, and pages correspond to paths. SP is a 5-by-3 matrix whose columns correspond to paths. Simulate a single path of responses, innovations, and states into a simulation horizon of length 50. Then plot each path separately.
rng0 = rng; % Store settings to reproduce state sequence.
[Y,E,SP] = simulate(Mdl,50);
figure
subplot(3,1,1)
plot(Y)
ylabel("Response")
grid on
legend(["y_1" "y_2"])
subplot(3,1,2)
plot(E)
ylabel("Innovation")
grid on
legend(["e_1" "e_2"])
subplot(3,1,3)
plot(SP,'m')
ylabel("State")
yticks([1 2 3])
Simulate Data Including Regression Component
Simulate exogenous data for the three regressors by generating 50 random observations from the 3-D standard Gaussian distribution.
X = randn(50,3);
Generate one random response, innovation, and state path of length 50. Specify the simulated exogenous data for the submodel regression components. Plot the results.
rng(rng0); % Reproduce state sequence in previous simulation.
[Y,E,SP] = simulate(Mdl,50,'X',X);
figure
subplot(3,1,1)
plot(Y)
ylabel("Response")
grid on
legend(["y_1" "y_2"])
subplot(3,1,2)
plot(E)
ylabel("Innovation")
grid on
legend(["e_1" "e_2"])
subplot(3,1,3)
plot(SP,'m')
ylabel("State")
yticks([1 2 3])
Perform Monte Carlo Estimation
Consider the model in “Simulate Paths from Model with VARX Submodels” on page 12-2030.
Create Fully Specified Model
Create the Markov-switching model excluding the regression component.
P = [10 1 1; 1 10 1; 1 1 10];
mc = dtmc(P);
C1 = [1;-1];
C2 = [2;-2];
C3 = [3;-3];
AR1 = {};
AR2 = {[0.5 0.1; 0.5 0.5]};
AR3 = {[0.25 0; 0 0] [0 0; 0.25 0]};
Sigma1 = [1 -0.1; -0.1 1];
Sigma2 = [2 -0.2; -0.2 2];
Sigma3 = [3 -0.3; -0.3 3];
mdl1 = varm('Constant',C1,'AR',AR1,'Covariance',Sigma1);
mdl2 = varm('Constant',C2,'AR',AR2,'Covariance',Sigma2);
mdl3 = varm('Constant',C3,'AR',AR3,'Covariance',Sigma3);
mdl = [mdl1; mdl2; mdl3];
Mdl = msVAR(mc,mdl);
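The matrix P in this model holds unnormalized weights rather than probabilities. As a quick check of the row normalization that dtmc applies, the sketch below compares the stored transition matrix with P divided by its row sums (12 in every row); rowSums is a name introduced here.

```matlab
P = [10 1 1; 1 10 1; 1 1 10];   % unnormalized weights, as in the example
mc = dtmc(P);

% Stored transition matrix: each row of P divided by its row sum
disp(mc.P)

% Every row of the stored matrix should sum to 1
rowSums = sum(mc.P,2)
```

Each state therefore keeps its current regime with probability 10/12 and moves to either other regime with probability 1/12.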
Simulate Multiple Paths
Generate 1000 random paths of responses for 50 time steps. Start all simulations at the first state.
rng(10); % For reproducibility
S0 = [1 0 0];
Y = simulate(Mdl,50,'S0',S0,'NumPaths',1000);
Y is a 50-by-2-by-1000 array of simulated response paths.
Compute Monte Carlo Distribution
For each variable and path, compute the process mean.
mus = mean(Y,1);
For each variable, plot the Monte Carlo distribution of the process mean.
figure
h1 = histogram(mus(1,1,:),'Normalization',"probability",...
    'BinWidth',0.1);
hold on
h2 = histogram(mus(1,2,:),'Normalization',"probability",...
    'BinWidth',0.1);
legend(["y_1" "y_2"])
title('Process Means')
hold off
For each variable and path, compute the process standard deviation.
sigmas = std(Y,0,1);
For each variable, plot the Monte Carlo distribution of the process standard deviation.
figure
h1 = histogram(sigmas(1,1,:),'Normalization',"probability",...
    'BinWidth',0.05);
hold on
h2 = histogram(sigmas(1,2,:),'Normalization',"probability",...
    'BinWidth',0.05);
legend(["y_1" "y_2"])
title('Process Standard Deviations')
hold off
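Because mean(Y,1) and std(Y,0,1) return 1-by-numSeries-by-numPaths arrays, it can be convenient to collapse the singleton dimension before summarizing. The sketch below uses a smaller two-state model with arbitrary parameter values (not the three-state model above) and illustrative names mcMean and mcBand to produce numeric summaries alongside the histograms.

```matlab
% Small two-state bivariate model; parameter values are arbitrary
mc = dtmc([0.9 0.1; 0.2 0.8]);
mdl1 = varm('Constant',[1;-1],'AR',{[0.5 0.1; 0.5 0.5]},'Covariance',eye(2));
mdl2 = varm('Constant',[3;-3],'AR',{[0.25 0; 0 0]},'Covariance',2*eye(2));
Mdl = msVAR(mc,[mdl1; mdl2]);

rng(10)                                     % for reproducibility
numPaths = 200;
Y = simulate(Mdl,50,'NumPaths',numPaths);   % 50-by-2-by-200

% One row per path, one column per response variable
mus = squeeze(mean(Y,1))';                  % 200-by-2
sigmas = squeeze(std(Y,0,1))';              % 200-by-2

% Monte Carlo summaries of the per-path means
mcMean = mean(mus,1);                       % 1-by-2
mcBand = prctile(mus,[2.5 97.5],1);         % 2-by-2; rows are percentiles
```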
Input Arguments
Mdl — Fully specified Markov-switching dynamic regression model
msVAR model object
Fully specified Markov-switching dynamic regression model, specified as an msVAR model object returned by msVAR or estimate. Properties of a fully specified model object do not contain NaN values.
numObs — Number of observations to generate
positive integer
Number of observations to generate for each sample path, specified as a positive integer.
Data Types: double
Name-Value Pair Arguments
Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter. Before R2021a, use commas to separate each name and value, and enclose Name in quotes.
Example: 'NumPaths',1000,'Y0',Y0 simulates 1000 sample paths and initializes the dynamic component of each submodel by using the presample response data Y0.
NumPaths — Number of sample paths to generate
1 (default) | positive integer
Number of sample paths to generate, specified as the comma-separated pair consisting of 'NumPaths' and a positive integer.
Example: 'NumPaths',1000
Data Types: double
Y0 — Presample response data
numeric matrix | numeric array
Presample response data, specified as the comma-separated pair consisting of 'Y0' and a numeric matrix or array.
To use the same presample data for each of the numPaths paths, specify a numPreSampleObs-by-numSeries matrix, where numPaths is the value of NumPaths, numPreSampleObs is the number of presample observations, and numSeries is the number of response variables.
To use different presample data for each path:
• For univariate ARX submodels, specify a numPreSampleObs-by-numPaths matrix.
• For multivariate VARX submodels, specify a numPreSampleObs-by-numSeries-by-numPaths array.
The number of presample observations numPreSampleObs must be sufficient to initialize the AR terms of all submodels. If numPreSampleObs exceeds the AR order of any state, simulate uses the latest observations. Each time simulate switches states, it updates Y0 using the latest simulated observations.
By default, simulate determines Y0 by the submodel of the initial state (see S0):
• If the initial submodel is a stationary AR process without regression components, simulate sets presample observations to the unconditional mean.
• Otherwise, simulate sets presample observations to zero.
Data Types: double
S0 — Initial state probabilities
nonnegative numeric vector
Initial state probabilities, specified as the comma-separated pair consisting of 'S0' and a nonnegative numeric vector of length numStates. simulate normalizes S0 to produce a distribution. simulate selects the initial state of each path from S0 at random. To start from a specific initial state, specify a distribution with a probability mass of 1 in that state. By default, simulate sets S0 to a steady-state distribution computed by asymptotics.
Example: 'S0',[0.2 0.2 0.6]
Example: 'S0',[0 1] specifies state 2 as the initial state.
Data Types: double
X — Predictor data
numeric matrix | cell vector of numeric matrices
Predictor data used to evaluate regression components in all submodels of Mdl, specified as the comma-separated pair consisting of 'X' and a numeric matrix or a cell vector of numeric matrices.
To use a subset of the same predictors in each state, specify X as a matrix with numPreds columns and at least numObs rows. Columns correspond to distinct predictor variables. Submodels use initial columns of the associated matrix, in order, up to the number of submodel predictors. The number of columns in the Beta property of Mdl.Submodels(j) determines the number of exogenous variables in the regression component of submodel j. If the number of rows exceeds numObs, then simulate uses the latest observations.
To use different predictors in each state, specify a cell vector of such matrices with length numStates.
By default, simulate ignores regression components in Mdl.
Data Types: double
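The shapes described for Y0 and S0 can be sketched for a univariate model as follows. The parameter values and presample numbers are arbitrary and chosen for illustration only. The larger AR order is 2, so Y0 has two rows, and it has one column per path so that each path receives its own presample.

```matlab
mc = dtmc([0.9 0.1; 0.3 0.7]);
mdl1 = arima('Constant',5,'AR',[0.3 0.2],'Variance',2);   % AR(2) submodel
mdl2 = arima('Constant',-5,'AR',0.1,'Variance',1);        % AR(1) submodel
Mdl = msVAR(mc,[mdl1; mdl2]);

numPaths = 4;

% numPreSampleObs-by-numPaths presample; the last row is the latest value
% (illustrative numbers)
Y0 = [ 9 10 11 12
      10 10 10 10];

% All probability mass on state 2
S0 = [0 1];

rng(1)                                       % for reproducibility
[Y,E,SP] = simulate(Mdl,20,'NumPaths',numPaths,'Y0',Y0,'S0',S0);

size(Y)    % numObs-by-numPaths
size(SP)   % numObs-by-numPaths
```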
Output Arguments
Y — Simulated response paths
numeric matrix | numeric array
Simulated response paths, returned as a numeric matrix or array. Y represents the continuation of the presample responses in Y0. For univariate ARX submodels, Y is a numObs-by-numPaths matrix. For multivariate VARX submodels, Y is a numObs-by-numSeries-by-numPaths array.
E — Simulated innovation paths
numeric matrix | numeric array
Simulated innovation paths, returned as a numeric matrix or array. For univariate ARX submodels, E is a numObs-by-numPaths matrix. For multivariate VARX submodels, E is a numObs-by-numSeries-by-numPaths array.
StatePaths — Simulated state paths
numeric matrix
Simulated state paths, returned as a numObs-by-numPaths numeric matrix.
Version History Introduced in R2019b
References [1] Chauvet, M., and J. D. Hamilton. "Dating Business Cycle Turning Points." In Nonlinear Analysis of Business Cycles (Contributions to Economic Analysis, Volume 276). (C. Milas, P. Rothman, and D. van Dijk, eds.). Amsterdam: Emerald Group Publishing Limited, 2006. [2] Hamilton, J. D. "Analysis of Time Series Subject to Changes in Regime." Journal of Econometrics. Vol. 45, 1990, pp. 39–70. [3] Hamilton, James D. Time Series Analysis. Princeton, NJ: Princeton University Press, 1994.
See Also Objects msVAR Functions estimate | forecast
simulate Class: regARIMA Monte Carlo simulation of regression model with ARIMA errors
Syntax [Y,E] = simulate(Mdl,numObs) [Y,E,U] = simulate(Mdl,numObs) [Y,E,U] = simulate(Mdl,numObs,Name,Value)
Description [Y,E] = simulate(Mdl,numObs) simulates one sample path of observations (Y) and innovations (E) from the regression model with ARIMA time series errors, Mdl. The software simulates numObs observations and innovations per sample path. [Y,E,U] = simulate(Mdl,numObs) additionally simulates unconditional disturbances, U. [Y,E,U] = simulate(Mdl,numObs,Name,Value) simulates sample paths with additional options specified by one or more Name,Value pair arguments.
Input Arguments
Mdl
Regression model with ARIMA errors, specified as a regARIMA model returned by regARIMA or estimate. The properties of Mdl cannot contain NaNs.
numObs
Number of observations (rows) to generate for each path of Y, E, and U, specified as a positive integer.
Name-Value Pair Arguments
Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter. Before R2021a, use commas to separate each name and value, and enclose Name in quotes.
E0
Presample innovations that have mean 0 and provide initial values for the ARIMA error model, specified as the comma-separated pair consisting of 'E0' and a column vector or matrix.
• If E0 is a column vector, then it is applied to each simulated path.
• If E0 is a matrix, then it requires at least NumPaths columns. If E0 contains more columns than required, then simulate uses the first NumPaths columns.
• E0 must contain at least Mdl.Q rows. If E0 contains more rows than required, then simulate uses the latest presample innovations. The last row contains the latest presample innovation.
Default: simulate sets the necessary presample innovations to 0.
NumPaths
Number of sample paths (columns) to generate for Y, E, and U, specified as the comma-separated pair consisting of 'NumPaths' and a positive integer.
Default: 1
U0
Presample unconditional disturbances that provide initial values for the ARIMA error model, specified as the comma-separated pair consisting of 'U0' and a column vector or matrix.
• If U0 is a column vector, then it is applied to each simulated path.
• If U0 is a matrix, then it requires at least NumPaths columns. If U0 contains more columns than required, then simulate uses the first NumPaths columns.
• U0 must contain at least Mdl.P rows. If U0 contains more rows than required, then simulate uses the latest presample unconditional disturbances. The last row contains the latest presample unconditional disturbance.
Default: simulate sets the necessary presample unconditional disturbances to 0.
X
Predictor data in the regression model, specified as the comma-separated pair consisting of 'X' and a matrix. The columns of X are separate, synchronized time series, with the last row containing the latest observations. X must have at least numObs rows. If the number of rows of X exceeds the number required, then simulate uses the latest observations.
Default: simulate does not use a regression component regardless of its presence in Mdl.
Notes
• NaNs in E0, U0, and X indicate missing values and simulate removes them. The software merges the presample data sets (E0 and U0), then uses list-wise deletion to remove any NaNs. simulate similarly removes NaNs from X. Removing NaNs in the data reduces the sample size, and can also create irregular time series.
• simulate assumes that you synchronize presample data such that the latest observation of each presample series occurs simultaneously.
• All predictors (i.e., columns in X) are associated with each response path in Y.
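The roles of E0, U0, and X can be illustrated on a small model with ARMA(1,1) errors, for which Mdl.P = 1 and Mdl.Q = 1. The sketch below uses arbitrary parameter values; gapY and gapU are names introduced here. It passes one shared presample value for each series and then checks the regression identity y_t = c + x_t β + u_t. The last line evaluates the first step of the error recursion u_1 = 0.5 u_0 + ε_1 + 0.3 ε_0, which is expected to hold if the presample values enter as described above.

```matlab
Mdl = regARIMA('Intercept',1,'Beta',2,'AR',0.5,'MA',0.3,'Variance',0.1);

numObs = 20;
numPaths = 4;
rng(1)                      % for reproducibility
X = randn(numObs,1);        % one predictor, shared by all paths
U0 = 0.2;                   % one presample disturbance (Mdl.P = 1)
E0 = -0.1;                  % one presample innovation (Mdl.Q = 1)

[Y,E,U] = simulate(Mdl,numObs,'NumPaths',numPaths,...
    'E0',E0,'U0',U0,'X',X);

% Regression identity, checked for every path at once
gapY = max(max(abs(Y - (Mdl.Intercept + X*Mdl.Beta + U))));

% First step of the ARMA(1,1) error recursion, using the presample values
gapU = max(abs(U(1,:) - (0.5*U0 + E(1,:) + 0.3*E0)));
```

Both gaps should be zero up to floating-point rounding.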
Output Arguments Y Simulated responses, returned as a numObs-by-NumPaths matrix. E Simulated, mean 0 innovations, returned as a numObs-by-NumPaths matrix. U Simulated unconditional disturbances, returned as a numObs-by-NumPaths matrix.
Examples
Simulate Responses, Innovations, and Unconditional Disturbances
Simulate paths of responses, innovations, and unconditional disturbances from a regression model with $\text{SARIMA}(2,1,1)_{12}$ errors. Specify the model:
$$y_t = X_t \begin{bmatrix} 1.5 \\ -2 \end{bmatrix} + u_t$$
$$(1 - 0.2L - 0.1L^2)(1 - L)(1 - 0.01L^{12})(1 - L^{12})\,u_t = (1 + 0.5L)(1 + 0.02L^{12})\,\varepsilon_t,$$
where $\varepsilon_t$ follows a t-distribution with 15 degrees of freedom.
Distribution = struct('Name','t','DoF',15);
Mdl = regARIMA('AR',{0.2, 0.1},'MA',{0.5},'SAR',0.01,...
    'SARLags',12,'SMA',0.02,'SMALags',12,'D',1,...
    'Seasonality',12,'Beta',[1.5; -2],'Intercept',0,...
    'Variance',0.1,'Distribution',Distribution)
Mdl =
  regARIMA with properties:
     Description: "Regression with ARIMA(2,1,1) Error Model Seasonally Integrated with Seasonal A
    Distribution: Name = "t", DoF = 15
       Intercept: 0
            Beta: [1.5 -2]
               P: 27
               D: 1
               Q: 13
              AR: {0.2 0.1} at lags [1 2]
             SAR: {0.01} at lag [12]
              MA: {0.5} at lag [1]
             SMA: {0.02} at lag [12]
     Seasonality: 12
        Variance: 0.1
Simulate and plot 500 paths with 25 observations each.
T = 25;
rng(1)
X = randn(T,2);
[Y,E,U] = simulate(Mdl,T,'NumPaths',500,'X',X);
figure
subplot(2,1,1);
plot(Y)
axis tight
title('{\bf Simulated Response Paths}')
subplot(2,2,3);
plot(E)
axis tight
title('{\bf Simulated Innovations Paths}')
subplot(2,2,4);
plot(U)
axis tight
title('{\bf Simulated Unconditional Disturbances Paths}')
Plot the 2.5th, 50th (median), and 97.5th percentiles of the simulated response paths.
lower = prctile(Y,2.5,2);
middle = median(Y,2);
upper = prctile(Y,97.5,2);
figure
plot(1:25,lower,'r:',1:25,middle,'k',...
    1:25,upper,'r:')
title('\bf{95% Percentile Confidence Interval for the Response}')
legend('95% Interval','Median','Location','Best')
Compute statistics across the second dimension (across paths) to summarize the sample paths. Plot a histogram of the simulated paths at time 20.
figure
histogram(Y(20,:),10)
title('Response Distribution at Time 20')
Forecast Stationary Process Using Monte Carlo Simulations
Regress the stationary, quarterly log GDP onto the CPI using a regression model with ARMA(1,1) errors, and forecast log GDP using Monte Carlo simulation. Load the US Macroeconomic data set and preprocess the data.
load Data_USEconModel;
logGDP = log(DataTimeTable.GDP);
dlogGDP = diff(logGDP); % For stationarity
dCPI = diff(DataTimeTable.CPIAUCSL); % For stationarity
numObs = length(dlogGDP);
gdp = dlogGDP(1:end-15); % Estimation sample
cpi = dCPI(1:end-15);
T = length(gdp); % Effective sample size
frstHzn = T+1:numObs; % Forecast horizon
hoCPI = dCPI(frstHzn); % Holdout sample
dts = DataTimeTable.Time(2:end);
Fit a regression model with ARMA(1,1) errors.
Mdl = regARIMA('ARLags',1,'MALags',1);
EstMdl = estimate(Mdl,gdp,'X',cpi);
Regression with ARMA(1,1) Error Model (Gaussian Distribution):
                   Value       StandardError    TStatistic      PValue
                 __________    _____________    __________    __________
    Intercept      0.014793      0.0016289        9.0818      1.0684e-19
    AR{1}           0.57601        0.10009        5.7548      8.6755e-09
    MA{1}          -0.15258        0.11978       -1.2738         0.20272
    Beta(1)       0.0028972      0.0013989         2.071        0.038355
    Variance     9.5734e-05     6.5562e-06        14.602       2.723e-48
Infer unconditional disturbances.
[~,u0] = infer(EstMdl,gdp,'X',cpi);
Simulate 1000 paths with 15 observations each. Use the inferred unconditional disturbances as presample data.
rng(1); % For reproducibility
gdpF = simulate(EstMdl,15,'NumPaths',1000,...
    'U0',u0,'X',hoCPI);
Plot the simulation mean forecast and approximate 95% forecast intervals.
lower = prctile(gdpF,2.5,2);
upper = prctile(gdpF,97.5,2);
mn = mean(gdpF,2);
figure
plot(dts(end-65:end),dlogGDP(end-65:end),'Color',[.7,.7,.7])
hold on
h1 = plot(dts(frstHzn),lower,'r:','LineWidth',2);
plot(dts(frstHzn),upper,'r:','LineWidth',2)
h2 = plot(dts(frstHzn),mn,'k','LineWidth',2);
h = gca;
ph = patch([repmat(dts(frstHzn(1)),1,2) repmat(dts(frstHzn(end)),1,2)],...
    [h.YLim fliplr(h.YLim)],...
    [0 0 0 0],'b');
ph.FaceAlpha = 0.1;
legend([h1 h2],{'95% Interval','Simulation Mean'},'Location','NorthWest',...
    'AutoUpdate','off')
axis tight
title('{\bf log GDP Forecast - 15 Quarter Horizon}')
hold off
Forecast Unit Root Nonstationary Process Using Monte Carlo Simulations
Regress the unit root nonstationary, quarterly log GDP onto the CPI using a regression model with ARIMA(1,1,1) errors with known intercept. Forecast log GDP using Monte Carlo simulation. Load the US Macroeconomic data set and preprocess the data.
load Data_USEconModel
numObs = length(DataTimeTable.GDP);
logGDP = log(DataTimeTable.GDP(1:end-15));
cpi = DataTimeTable.CPIAUCSL(1:end-15);
T = length(logGDP);
frstHzn = T+1:numObs; % Forecast horizon
hoCPI = DataTimeTable.CPIAUCSL(frstHzn); % Holdout sample
Fit a regression model with ARIMA(1,1,1) errors. The intercept is not identifiable in a model with integrated errors, so fix its value before estimation.
intercept = 5.867;
Mdl = regARIMA('ARLags',1,'MALags',1,'D',1,'Intercept',intercept);
EstMdl = estimate(Mdl,logGDP,'X',cpi);
Regression with ARIMA(1,1,1) Error Model (Gaussian Distribution):
                   Value       StandardError    TStatistic      PValue
                 __________    _____________    __________    ___________
    Intercept         5.867              0           Inf                0
    AR{1}           0.92271       0.030978        29.786      5.8737e-195
    MA{1}          -0.38785       0.060354       -6.4263       1.3076e-10
    Beta(1)       0.0039668      0.0016498        2.4044         0.016199
    Variance     0.00010894      7.272e-06        14.981       9.7345e-51
Infer unconditional disturbances.
[~,u0] = infer(EstMdl,logGDP,'X',cpi);
Simulate 1000 paths with 15 observations each. Use the inferred unconditional disturbances as presample data.
rng(1); % For reproducibility
GDPF = simulate(EstMdl,15,'NumPaths',1000,...
    'U0',u0,'X',hoCPI);
Plot the simulation mean forecast and approximate 95% forecast intervals.
lower = prctile(GDPF,2.5,2);
upper = prctile(GDPF,97.5,2);
mn = mean(GDPF,2);
dt = DataTimeTable.Time;
figure
plot(dt(end-65:end),log(DataTimeTable.GDP(end-65:end)),'Color',...
    [.7,.7,.7])
hold on
h1 = plot(dt(frstHzn),lower,'r:','LineWidth',2);
plot(dt(frstHzn),upper,'r:','LineWidth',2)
h2 = plot(dt(frstHzn),mn,'k','LineWidth',2);
h = gca;
ph = patch([repmat(dt(frstHzn(1)),1,2) repmat(dt(frstHzn(end)),1,2)],...
    [h.YLim fliplr(h.YLim)],...
    [0 0 0 0],'b');
ph.FaceAlpha = 0.1;
legend([h1 h2],{'95% Interval','Simulation Mean'},'Location','NorthWest',...
    'AutoUpdate','off')
axis tight
title('{\bf log GDP Forecast - 15 Quarter Horizon}')
hold off
The unconditional disturbances, $u_t$, are nonstationary; therefore, the widths of the forecast intervals grow with time.
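The growth of the interval widths can be illustrated with base MATLAB alone. The following is a minimal sketch, not part of the regARIMA workflow above: it integrates standard Gaussian innovations into a random walk and measures the cross-path standard deviation at each horizon. The names numPaths, horizon, sd, and approxWidth are chosen for this illustration only.

```matlab
rng(1);                            % For reproducibility
numPaths = 1000;                   % Number of simulated paths (illustrative)
horizon  = 15;                     % Forecast horizon in periods
e  = randn(horizon,numPaths);      % Stationary Gaussian innovations
u  = cumsum(e,1);                  % Integrated (unit root) disturbances
sd = std(u,0,2);                   % Cross-path standard deviation per period
approxWidth = 2*1.96*sd;           % Approximate 95% interval width per period
disp([sd(1) sd(end)])              % Dispersion at the first and last horizon
```

For a pure random walk with unit-variance innovations, the standard deviation at horizon k is the square root of k, so the last value should be roughly sqrt(15) times the first.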
References
[1] Box, G. E. P., G. M. Jenkins, and G. C. Reinsel. Time Series Analysis: Forecasting and Control. 3rd ed. Englewood Cliffs, NJ: Prentice Hall, 1994.
[2] Davidson, R., and J. G. MacKinnon. Econometric Theory and Methods. Oxford, UK: Oxford University Press, 2004.
[3] Enders, W. Applied Econometric Time Series. Hoboken, NJ: John Wiley & Sons, Inc., 1995.
[4] Hamilton, J. D. Time Series Analysis. Princeton, NJ: Princeton University Press, 1994.
[5] Pankratz, A. Forecasting with Dynamic Regression Models. John Wiley & Sons, Inc., 1991.
[6] Tsay, R. S. Analysis of Financial Time Series. 2nd ed. Hoboken, NJ: John Wiley & Sons, Inc., 2005.
See Also
regARIMA | estimate | filter | forecast | infer

Topics
“Alternative ARIMA Model Representations” on page 5-110
“Simulate Stationary Processes” on page 7-152
“Simulate Trend-Stationary and Difference-Stationary Processes” on page 7-160
“Monte Carlo Simulation of Conditional Mean Models” on page 7-148
“Presample Data for Conditional Mean Model Simulation” on page 7-150
“Transient Effects in Conditional Mean Model Simulations” on page 7-151
“Monte Carlo Forecasting of Conditional Mean Models” on page 7-171
simulate
Class: ssm

Monte Carlo simulation of state-space models

Syntax
[Y,X] = simulate(Mdl,numObs)
[Y,X] = simulate(Mdl,numObs,Name,Value)
[Y,X,U,E] = simulate( ___ )
Description
[Y,X] = simulate(Mdl,numObs) simulates one sample path of observations (Y) and states (X) from a fully specified, state-space model on page 11-3 (Mdl). The software simulates numObs observations and states per sample path.

[Y,X] = simulate(Mdl,numObs,Name,Value) returns simulated responses and states with additional options specified by one or more Name,Value pair arguments. For example, specify the number of paths or model parameter values.

[Y,X,U,E] = simulate( ___ ) additionally simulates state disturbances (U) and observation innovations (E) using any of the input arguments in the previous syntaxes.
Input Arguments

Mdl — Standard state-space model
ssm model object

Standard state-space model, specified as an ssm model object returned by ssm or estimate. A standard state-space model has finite initial state covariance matrix elements. That is, Mdl cannot be a dssm model object.

If Mdl is not fully specified (that is, Mdl contains unknown parameters), then specify values for the unknown parameters using the 'Params' Name,Value pair argument. Otherwise, the software throws an error.

numObs — Number of periods per path to simulate
positive integer

Number of periods per path to generate variates, specified as a positive integer.

If Mdl is a time-varying model on page 11-5, then the length of the cell vector corresponding to the coefficient matrices must be at least numObs.

If numObs is less than the number of periods that Mdl can support, then the software uses only the matrices in the first numObs cells of the cell vectors corresponding to the coefficient matrices.

Data Types: double
Name-Value Pair Arguments

Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter. Before R2021a, use commas to separate each name and value, and enclose Name in quotes.

NumPaths — Number of sample paths to generate variates
1 (default) | positive integer

Number of sample paths to generate variates, specified as the comma-separated pair consisting of 'NumPaths' and a positive integer.

Example: 'NumPaths',1000
Data Types: double

Params — Values for unknown parameters
numeric vector

Values for unknown parameters in the state-space model, specified as the comma-separated pair consisting of 'Params' and a numeric vector. The elements of Params correspond to the unknown parameters in the state-space model matrices A, B, C, and D, and, optionally, the initial state mean Mean0 and covariance matrix Cov0.

• If you created Mdl explicitly (that is, by specifying the matrices without a parameter-to-matrix mapping function), then the software maps the elements of Params to NaNs in the state-space model matrices and initial state values. The software searches for NaNs column-wise following the order A, B, C, D, Mean0, and Cov0.
• If you created Mdl implicitly (that is, by specifying the matrices with a parameter-to-matrix mapping function), then you must set initial parameter values for the state-space model matrices, initial state values, and state types within the parameter-to-matrix mapping function.

If Mdl contains unknown parameters, then you must specify their values. Otherwise, the software ignores the value of Params.

Data Types: double
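The column-wise search order for an explicitly created model can be mimicked with base MATLAB logical indexing, which also visits elements in column-major order. The following sketch only illustrates the ordering; it is not the toolbox's internal implementation, and the matrices and parameter values are hypothetical.

```matlab
% Hypothetical coefficient matrices with unknown (NaN) entries
A = [NaN 0; NaN NaN];
B = [1; NaN];
params = [0.5 0.2 0.3 0.8];      % Illustrative values only

% Fill NaNs column-wise, first in A and then in B
nA = nnz(isnan(A));              % Number of unknowns in A
A(isnan(A)) = params(1:nA);      % A(1,1), A(2,1), A(2,2) in that order
B(isnan(B)) = params(nA+1:end);  % Remaining values go to B
disp(A)
disp(B)
```

Under this ordering, the first two elements of params land in the first column of A before any element of the second column is filled.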
Output Arguments

Y — Simulated observations
matrix | cell matrix of numeric vectors

Simulated observations, returned as a matrix or cell matrix of numeric vectors.

If Mdl is a time-invariant model on page 11-4 with respect to the observations, then Y is a numObs-by-n-by-numPaths array. That is, each row corresponds to a period, each column corresponds to an observation in the model, and each page corresponds to a sample path. The last row corresponds to the latest simulated observations.

If Mdl is a time-varying model on page 11-5 with respect to the observations, then Y is a numObs-by-numPaths cell matrix of vectors. Y{t,j} contains a vector of length $n_t$ of simulated observations for period t of sample path j. The last row of Y contains the latest set of simulated observations.

Data Types: cell | double

X — Simulated states
numeric matrix | cell matrix of numeric vectors

Simulated states, returned as a numeric matrix or cell matrix of vectors.

If Mdl is a time-invariant model with respect to the states, then X is a numObs-by-m-by-numPaths array. That is, each row corresponds to a period, each column corresponds to a state in the model, and each page corresponds to a sample path. The last row corresponds to the latest simulated states.

If Mdl is a time-varying model with respect to the states, then X is a numObs-by-numPaths cell matrix of vectors. X{t,j} contains a vector of length $m_t$ of simulated states for period t of sample path j. The last row of X contains the latest set of simulated states.

U — Simulated state disturbances
matrix | cell matrix of numeric vectors

Simulated state disturbances, returned as a matrix or cell matrix of vectors.

If Mdl is a time-invariant model with respect to the state disturbances, then U is a numObs-by-h-by-numPaths array. That is, each row corresponds to a period, each column corresponds to a state disturbance in the model, and each page corresponds to a sample path. The last row corresponds to the latest simulated state disturbances.

If Mdl is a time-varying model with respect to the state disturbances, then U is a numObs-by-numPaths cell matrix of vectors. U{t,j} contains a vector of length $h_t$ of simulated state disturbances for period t of sample path j. The last row of U contains the latest set of simulated state disturbances.

Data Types: cell | double

E — Simulated observation innovations
matrix | cell matrix of numeric vectors

Simulated observation innovations, returned as a matrix or cell matrix of numeric vectors.

If Mdl is a time-invariant model with respect to the observation innovations, then E is a numObs-by-h-by-numPaths array. That is, each row corresponds to a period, each column corresponds to an observation innovation in the model, and each page corresponds to a sample path. The last row corresponds to the latest simulated observation innovations.

If Mdl is a time-varying model with respect to the observation innovations, then E is a numObs-by-numPaths cell matrix of vectors. E{t,j} contains a vector of length $h_t$ of simulated observation innovations for period t of sample path j. The last row of E contains the latest set of simulated observation innovations.

Data Types: cell | double
Examples

Simulate States and Observations of Time-Invariant State-Space Model

Suppose that a latent process is an AR(1) model. The state equation is

$x_t = 0.5x_{t-1} + u_t,$

where $u_t$ is Gaussian with mean 0 and standard deviation 1.

Generate a random series of 100 observations from $x_t$, assuming that the series starts at 1.5.

T = 100;
ARMdl = arima('AR',0.5,'Constant',0,'Variance',1);
x0 = 1.5;
rng(1); % For reproducibility
x = simulate(ARMdl,T,'Y0',x0);
Suppose further that the latent process is subject to additive measurement error. The observation equation is

$y_t = x_t + \varepsilon_t,$

where $\varepsilon_t$ is Gaussian with mean 0 and standard deviation 0.75. Together, the latent process and observation equations compose a state-space model.

Use the random latent state process (x) and the observation equation to generate observations.

y = x + 0.75*randn(T,1);

Specify the four coefficient matrices.

A = 0.5;
B = 1;
C = 1;
D = 0.75;
Specify the state-space model using the coefficient matrices.

Mdl = ssm(A,B,C,D)

Mdl =
State-space model type: ssm

State vector length: 1
Observation vector length: 1
State disturbance vector length: 1
Observation innovation vector length: 1
Sample size supported by model: Unlimited

State variables: x1, x2,...
State disturbances: u1, u2,...
Observation series: y1, y2,...
Observation innovations: e1, e2,...

State equation:
x1(t) = (0.50)x1(t-1) + u1(t)

Observation equation:
y1(t) = x1(t) + (0.75)e1(t)

Initial state distribution:

Initial state means
 x1
  0

Initial state covariance matrix
      x1
 x1  1.33

State types
     x1
 Stationary
Mdl is an ssm model. Verify that the model is correctly specified using the display in the Command Window. The software infers that the state process is stationary. Subsequently, the software sets the initial state mean and covariance to the mean and variance of the stationary distribution of an AR(1) model.

Simulate one path each of states and observations. Specify that the paths span 100 periods.

[simY,simX] = simulate(Mdl,100);
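The initial state values reported in the display can be checked by hand, assuming the standard moments of a stationary AR(1) process with coefficient $\phi = 0.5$ and disturbance variance $\sigma_u^2 = 1$ (the symbols $\phi$ and $\sigma_u^2$ are notation introduced here for the check):

```latex
\mathrm{E}(x_t) = 0, \qquad
\mathrm{Var}(x_t) = \frac{\sigma_u^2}{1-\phi^2}
                  = \frac{1}{1-0.5^2}
                  = \frac{4}{3} \approx 1.33 .
```

These values agree with the initial state mean of 0 and the initial state covariance of 1.33 shown for x1.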
simY is a 100-by-1 vector of simulated responses. simX is a 100-by-1 vector of simulated states.

Plot the true state values with the simulated states. Also, plot the observed responses with the simulated responses.

figure
subplot(2,1,1)
plot(1:T,x,'-k',1:T,simX,':r','LineWidth',2)
title({'True State Values and Simulated States'})
xlabel('Period')
ylabel('State')
legend({'True state values','Simulated state values'})
subplot(2,1,2)
plot(1:T,y,'-k',1:T,simY,':r','LineWidth',2)
title({'Observed Responses and Simulated Responses'})
xlabel('Period')
ylabel('Response')
legend({'Observed responses','Simulated responses'})
By default, simulate simulates one path for each state and observation in the state-space model. To conduct a Monte Carlo study, simulate a large number of paths by using the 'NumPaths' name-value pair argument.
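As a minimal sketch of such a Monte Carlo study, the following code simulates many paths from the same model and summarizes them across the third (path) dimension. It requires Econometrics Toolbox; the number of paths and the summary variables meanY and stdY are illustrative choices, not part of the example above.

```matlab
% Requires Econometrics Toolbox
Mdl = ssm(0.5,1,1,0.75);               % Same A, B, C, and D as in the example
rng(1);                                % For reproducibility
numPaths = 1000;                       % Illustrative number of paths
[simY,simX] = simulate(Mdl,100,'NumPaths',numPaths);
meanY = mean(simY,3);                  % Monte Carlo mean across paths (pages)
stdY  = std(simY,0,3);                 % Monte Carlo standard deviation across paths
size(simY)                             % Periods-by-observations-by-paths
```

Because the model is time invariant, simY and simX are numeric arrays whose pages correspond to paths, so statistics along dimension 3 summarize the simulated distribution period by period.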
Simulate State-Space Models Containing Unknown Parameters

To generate variates from a state-space model, specify values for all unknown parameters.

Explicitly create this state-space model.

$x_t = \phi x_{t-1} + \sigma_1 u_t$
$y_t = x_t + \sigma_2 \varepsilon_t,$

where $u_t$ and $\varepsilon_t$ are independent Gaussian random variables with mean 0 and variance 1. Suppose that the initial state mean and variance are 1, and that the state is a stationary process.

A = NaN;
B = NaN;
C = 1;
D = NaN;
mean0 = 1;
cov0 = 1;
stateType = 0;
Mdl = ssm(A,B,C,D,'Mean0',mean0,'Cov0',cov0,'StateType',stateType);
Simulate 100 responses from Mdl. Specify that the autoregressive coefficient is 0.75, the state disturbance standard deviation is 0.5, and the observation innovation standard deviation is 0.25.

params = [0.75 0.5 0.25];
y = simulate(Mdl,100,'Params',params);

figure;
plot(y);
title 'Simulated Responses';
xlabel 'Period';
The software searches for NaN values column-wise following the order A, B, C, D, Mean0, and Cov0. The order of the elements in params should correspond to this search.
Estimate Monte-Carlo Forecasts of State-Space Model

Suppose that the relationship between the change in the unemployment rate ($x_{1,t}$) and the nominal gross national product (nGNP) growth rate ($x_{3,t}$) can be expressed in the following state-space model form.

$$\begin{bmatrix} x_{1,t} \\ x_{2,t} \\ x_{3,t} \\ x_{4,t} \end{bmatrix} = \begin{bmatrix} \phi_1 & \theta_1 & \gamma_1 & 0 \\ 0 & 0 & 0 & 0 \\ \gamma_2 & 0 & \phi_2 & \theta_2 \\ 0 & 0 & 0 & 0 \end{bmatrix} \begin{bmatrix} x_{1,t-1} \\ x_{2,t-1} \\ x_{3,t-1} \\ x_{4,t-1} \end{bmatrix} + \begin{bmatrix} 1 & 0 \\ 1 & 0 \\ 0 & 1 \\ 0 & 1 \end{bmatrix} \begin{bmatrix} u_{1,t} \\ u_{2,t} \end{bmatrix}$$

$$\begin{bmatrix} y_{1,t} \\ y_{2,t} \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 \end{bmatrix} \begin{bmatrix} x_{1,t} \\ x_{2,t} \\ x_{3,t} \\ x_{4,t} \end{bmatrix} + \begin{bmatrix} \sigma_1 & 0 \\ 0 & \sigma_2 \end{bmatrix} \begin{bmatrix} \varepsilon_{1,t} \\ \varepsilon_{2,t} \end{bmatrix},$$

where:

• $x_{1,t}$ is the change in the unemployment rate at time t.
• $x_{2,t}$ is a dummy state for the MA(1) effect on $x_{1,t}$.
• $x_{3,t}$ is the nGNP growth rate at time t.
• $x_{4,t}$ is a dummy state for the MA(1) effect on $x_{3,t}$.
• $y_{1,t}$ is the observed change in the unemployment rate.
• $y_{2,t}$ is the observed nGNP growth rate.
• $u_{1,t}$ and $u_{2,t}$ are Gaussian series of state disturbances having mean 0 and standard deviation 1.
• $\varepsilon_{1,t}$ is the Gaussian series of observation innovations having mean 0 and standard deviation $\sigma_1$.
• $\varepsilon_{2,t}$ is the Gaussian series of observation innovations having mean 0 and standard deviation $\sigma_2$.

Load the Nelson-Plosser data set, which contains the unemployment rate and nGNP series, among other things.

load Data_NelsonPlosser
Preprocess the data by taking the natural logarithm of the nGNP series, and the first difference of each. Also, remove the starting NaN values from each series.

isNaN = any(ismissing(DataTable),2);       % Flag periods containing NaNs
gnpn = DataTable.GNPN(~isNaN);
u = DataTable.UR(~isNaN);
T = size(gnpn,1);                          % Sample size
y = zeros(T-1,2);                          % Preallocate
y(:,1) = diff(u);
y(:,2) = diff(log(gnpn));
This example proceeds using series without NaN values. However, using the Kalman filter framework, the software can accommodate series containing missing values.

To determine how well the model forecasts observations, remove the last 10 observations for comparison.

numPeriods = 10;                           % Forecast horizon
isY = y(1:end-numPeriods,:);               % In-sample observations
oosY = y(end-numPeriods+1:end,:);          % Out-of-sample observations

Specify the coefficient matrices.

A = [NaN NaN NaN 0; 0 0 0 0; NaN 0 NaN NaN; 0 0 0 0];
B = [1 0; 1 0; 0 1; 0 1];
C = [1 0 0 0; 0 0 1 0];
D = [NaN 0; 0 NaN];
Specify the state-space model using ssm. Verify that the model specification is consistent with the state-space model.

Mdl = ssm(A,B,C,D)

Mdl =
State-space model type: ssm

State vector length: 4
Observation vector length: 2
State disturbance vector length: 2
Observation innovation vector length: 2
Sample size supported by model: Unlimited
Unknown parameters for estimation: 8

State variables: x1, x2,...
State disturbances: u1, u2,...
Observation series: y1, y2,...
Observation innovations: e1, e2,...
Unknown parameters: c1, c2,...

State equations:
x1(t) = (c1)x1(t-1) + (c3)x2(t-1) + (c4)x3(t-1) + u1(t)
x2(t) = u1(t)
x3(t) = (c2)x1(t-1) + (c5)x3(t-1) + (c6)x4(t-1) + u2(t)
x4(t) = u2(t)

Observation equations:
y1(t) = x1(t) + (c7)e1(t)
y2(t) = x3(t) + (c8)e2(t)

Initial state distribution:

Initial state means are not specified.
Initial state covariance matrix is not specified.
State types are not specified.
Estimate the model parameters, and use a random set of initial parameter values for optimization. Restrict the estimates of $\sigma_1$ and $\sigma_2$ to positive real numbers using the 'lb' name-value pair argument. For numerical stability, specify the Hessian when the software computes the parameter covariance matrix, using the 'CovMethod' name-value pair argument.

rng(1);
params0 = rand(8,1);
[EstMdl,estParams] = estimate(Mdl,isY,params0,...
    'lb',[-Inf -Inf -Inf -Inf -Inf -Inf 0 0],'CovMethod','hessian');

Method: Maximum likelihood (fmincon)
Sample size: 51
Logarithmic likelihood: -170.92
Akaike info criterion: 357.84
Bayesian info criterion: 373.295

      |     Coeff      Std Err     t Stat      Prob
----------------------------------------------------
 c(1) |   0.06750      0.16548     0.40791   0.68334
 c(2) |  -0.01372      0.05887    -0.23302   0.81575
 c(3) |   2.71201      0.27039    10.03006   0
 c(4) |   0.83816      2.84586     0.29452   0.76836
 c(5) |   0.06273      2.83474     0.02213   0.98235
 c(6) |   0.05197      2.56875     0.02023   0.98386
 c(7) |   0.00272      2.40777     0.00113   0.99910
 c(8) |   0.00016      0.13942     0.00113   0.99910
      |
      |  Final State   Std Dev     t Stat      Prob
 x(1) |  -0.00000      0.00272    -0.00033   0.99973
 x(2) |   0.12237      0.92954     0.13164   0.89527
 x(3) |   0.04049      0.00016   256.74352   0
 x(4) |   0.01183      0.00016    72.50925   0
EstMdl is an ssm model, and you can access its properties using dot notation.

Filter the estimated state-space model, and extract the filtered states and their variances from the final period.

[~,~,Output] = filter(EstMdl,isY);

Modify the estimated state-space model so that the initial state means and covariances are the filtered states and their covariances of the final period. This sets up simulation over the forecast horizon.

EstMdl1 = EstMdl;
EstMdl1.Mean0 = Output(end).FilteredStates;
EstMdl1.Cov0 = Output(end).FilteredStatesCov;

Simulate 5e5 paths of observations from the modified state-space model EstMdl1. Specify to simulate observations for each period of the forecast horizon.

numPaths = 5e5;
SimY = simulate(EstMdl1,10,'NumPaths',numPaths);
SimY is a 10-by-2-by-numPaths array containing the simulated observations. The rows of SimY correspond to periods, the columns correspond to an observation in the model, and the pages correspond to paths.

Estimate the forecasted observations and their 95% confidence intervals in the forecast horizon.

MCFY = mean(SimY,3);
CIFY = quantile(SimY,[0.025 0.975],3);

Estimate the theoretical forecast bands.

[Y,YMSE] = forecast(EstMdl,10,isY);
Lb = Y - sqrt(YMSE)*1.96;
Ub = Y + sqrt(YMSE)*1.96;
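Written as a formula, the theoretical bands computed by Lb and Ub are the usual Gaussian 95% intervals. In the notation below, which is introduced here only for illustration, $\hat{y}_{i,T+k}$ is the k-step-ahead forecast of observation series i (an element of Y) and $\mathrm{MSE}_{i,k}$ is the corresponding forecast mean squared error (an element of YMSE):

```latex
\hat{y}_{i,T+k} \;\pm\; 1.96\,\sqrt{\mathrm{MSE}_{i,k}},
\qquad i = 1,2, \quad k = 1,\dots,10 .
```

The multiplier 1.96 is the 0.975 quantile of the standard normal distribution, so these bands are directly comparable to the 2.5% and 97.5% Monte Carlo quantiles in CIFY.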
Plot the forecasted observations with their true values and the forecast intervals.

figure
h = plot(dates(end-numPeriods-9:end),[isY(end-9:end,1);oosY(:,1)],'-k',...
    dates(end-numPeriods+1:end),MCFY(end-numPeriods+1:end,1),'.-r',...
    dates(end-numPeriods+1:end),CIFY(end-numPeriods+1:end,1,1),'-b',...
    dates(end-numPeriods+1:end),CIFY(end-numPeriods+1:end,1,2),'-b',...
    dates(end-numPeriods+1:end),Y(:,1),':c',...
    dates(end-numPeriods+1:end),Lb(:,1),':m',...
    dates(end-numPeriods+1:end),Ub(:,1),':m',...
    'LineWidth',3);
xlabel('Period')
ylabel('Change in the unemployment rate')
legend(h([1,2,4:6]),{'Observations','MC forecasts',...
    '95% forecast intervals','Theoretical forecasts',...
    '95% theoretical intervals'},'Location','Best')
title('Observed and Forecasted Changes in the Unemployment Rate')

figure
h = plot(dates(end-numPeriods-9:end),[isY(end-9:end,2);oosY(:,2)],'-k',...
    dates(end-numPeriods+1:end),MCFY(end-numPeriods+1:end,2),'.-r',...
    dates(end-numPeriods+1:end),CIFY(end-numPeriods+1:end,2,1),'-b',...
    dates(end-numPeriods+1:end),CIFY(end-numPeriods+1:end,2,2),'-b',...
    dates(end-numPeriods+1:end),Y(:,2),':c',...
    dates(end-numPeriods+1:end),Lb(:,2),':m',...
    dates(end-numPeriods+1:end),Ub(:,2),':m',...
    'LineWidth',3);
xlabel('Period')
ylabel('nGNP growth rate')
legend(h([1,2,4:6]),{'Observations','MC forecasts',...
    '95% MC intervals','Theoretical forecasts','95% theoretical intervals'},...
    'Location','Best')
title('Observed and Forecasted nGNP Growth Rates')
Tip
Simulate states from their joint conditional posterior distribution given the responses by using simsmooth.

References
[1] Durbin, J., and S. J. Koopman. Time Series Analysis by State Space Methods. 2nd ed. Oxford: Oxford University Press, 2012.

See Also
ssm | estimate | filter | forecast | smooth | simsmooth

Topics
“Simulate States and Observations of Time-Invariant State-Space Model” on page 11-87
“Simulate Time-Varying State-Space Model” on page 11-90
“Forecast State-Space Model Using Monte-Carlo Methods” on page 11-104
“Estimate Random Parameter of State-Space Model” on page 11-97
“What Are State-Space Models?” on page 11-3
simulate
Simulate sample paths of threshold-switching dynamic regression model

Syntax
Y = simulate(Mdl,numObs)
Y = simulate(Mdl,numObs,Name,Value)
[Y,E,StatePaths] = simulate( ___ )

Description
Y = simulate(Mdl,numObs) returns a random numObs-period path of response series Y from simulating the fully specified threshold-switching dynamic regression model Mdl.

Y = simulate(Mdl,numObs,Name,Value) uses additional options specified by one or more name-value arguments. For example, simulate(Mdl,10,NumPaths=1000,Y0=Y0) simulates 1000 sample paths of length 10, and initializes the dynamic component of each submodel by using the presample response data Y0.

[Y,E,StatePaths] = simulate( ___ ) also returns the simulated innovation paths E and the simulated state paths StatePaths, using any of the input argument combinations in the previous syntaxes.
Examples

Simulate Response Path from SETAR Model

Suppose a data-generating process (DGP) is a two-state, self-exciting threshold autoregressive (SETAR) model for a 1-D response variable. Specify all parameter values (this example uses arbitrary values).

Create Fully Specified Model for DGP

Create a discrete threshold transition at level 0. Label the regimes to reflect the state of the economy:

• When the threshold variable (currently unknown) is in $(-\infty, 0)$, the economy is in a recession.
• When the threshold variable is in $[0, \infty)$, the economy is expanding.

t = 0;
tt = threshold(t,StateNames=["Recession" "Expansion"])

tt =
  threshold with properties:

          Type: 'discrete'
        Levels: 0
         Rates: []
    StateNames: ["Recession"    "Expansion"]
     NumStates: 2
tt is a fully specified threshold object that describes the switching mechanism of the threshold-switching model.

Assume the following univariate models describe the response process of the system:

• Recession: $y_t = -1 + 0.1y_{t-1} + \varepsilon_{1,t}$, where $\varepsilon_{1,t} \sim N(0, 1)$.
• Expansion: $y_t = 1 + 0.3y_{t-1} + 0.2y_{t-2} + \varepsilon_{2,t}$, where $\varepsilon_{2,t} \sim N(0, 2^2)$.

For each regime, use arima to create an AR model that describes the response process within the regime.

c1 = -1;
c2 = 1;
ar1 = 0.1;
ar2 = [0.3 0.2];
v1 = 1;
v2 = 4;
mdl1 = arima(Constant=c1,AR=ar1,Variance=v1,...
    Description="Recession State Model")

mdl1 =
  arima with properties:

     Description: "Recession State Model"
    Distribution: Name = "Gaussian"
               P: 1
               D: 0
               Q: 0
        Constant: -1
              AR: {0.1} at lag [1]
             SAR: {}
              MA: {}
             SMA: {}
     Seasonality: 0
            Beta: [1×0]
        Variance: 1

   ARIMA(1,0,0) Model (Gaussian Distribution)

mdl2 = arima(Constant=c2,AR=ar2,Variance=v2,...
    Description="Expansion State Model")

mdl2 =
  arima with properties:

     Description: "Expansion State Model"
    Distribution: Name = "Gaussian"
               P: 2
               D: 0
               Q: 0
        Constant: 1
              AR: {0.3 0.2} at lags [1 2]
             SAR: {}
              MA: {}
             SMA: {}
     Seasonality: 0
            Beta: [1×0]
        Variance: 4

   ARIMA(2,0,0) Model (Gaussian Distribution)
mdl1 and mdl2 are fully specified arima objects.

Store the submodels in a vector with order corresponding to the regimes in tt.StateNames.

mdl = [mdl1; mdl2];

Use tsVAR to create a TAR model from the switching mechanism tt and the state-specific submodels mdl.

Mdl = tsVAR(tt,mdl)

Mdl =
  tsVAR with properties:

         Switch: [1x1 threshold]
      Submodels: [2x1 varm]
      NumStates: 2
      NumSeries: 1
     StateNames: ["Recession"    "Expansion"]
    SeriesNames: "1"
     Covariance: []

Mdl.Submodels(2)

ans =
  varm with properties:

     Description: "AR-Stationary 1-Dimensional VAR(2) Model"
     SeriesNames: "Y1"
       NumSeries: 1
               P: 2
        Constant: 1
              AR: {0.3 0.2} at lags [1 2]
           Trend: 0
            Beta: [1×0 matrix]
      Covariance: 4
Mdl is a fully specified tsVAR object representing a univariate two-state TAR model. tsVAR stores specified arima submodels as varm objects.

Simulate Response Data from DGP

Generate one random response path of length 50 from the model. simulate assumes that the threshold variable is $y_{t-1}$, which implies that the model is self-exciting.

rng(1); % For reproducibility
y = simulate(Mdl,50);

y is a 50-by-1 vector of one response path.

Plot the response path with the threshold by using ttplot.

figure
ttplot(Mdl.Switch,Data=y)
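The self-exciting mechanism in this example can also be written out by hand in base MATLAB. The following loop is a sketch of the data-generating process described above, not the algorithm that simulate uses, and it will not reproduce the toolbox's random draws. The zero presample values and the variable name yHand are assumptions made for this illustration.

```matlab
rng(1);                              % For reproducibility
numObs = 50;
yHand = zeros(numObs+2,1);           % Two zero presample values (assumption)
for t = 3:numObs+2
    if yHand(t-1) < 0                % Threshold variable y(t-1) in (-Inf,0): recession
        yHand(t) = -1 + 0.1*yHand(t-1) + randn;
    else                             % Threshold variable in [0,Inf): expansion
        yHand(t) = 1 + 0.3*yHand(t-1) + 0.2*yHand(t-2) + 2*randn;  % Std 2, variance 4
    end
end
yHand = yHand(3:end);                % Drop the presample
inRecession = [0; yHand(1:end-1)] < 0;   % Regime that generated each observation
```

Each observation is drawn from the submodel selected by the sign of the previous response, which is what makes the model self-exciting.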
Simulate Multiple Paths

Consider the following logistic TAR (LSTAR) model for the annual, CPI-based, Canadian inflation rate series $y_t$.

• State 1: $y_t = -5 + \varepsilon_{1,t}$, where $\varepsilon_{1,t} \sim N(0, 0.1^2)$.
• State 2: $y_t = \varepsilon_{2,t}$, where $\varepsilon_{2,t} \sim N(0, 0.2^2)$.
• State 3: $y_t = 5 + \varepsilon_{3,t}$, where $\varepsilon_{3,t} \sim N(0, 0.3^2)$.
• The system is in state 1 when $y_t < 2$, the system is in state 2 when $2 \le y_t < 8$, and the system is in state 3 otherwise.
• The transition function rate between states 1 and 2 is 3.5, and the transition function rate between states 2 and 3 is 1.5.

Create an LSTAR model representing $y_t$.

t = [2 8];
tt = threshold(t,Type="logistic",Rates=[3.5 1.5]);
mdl1 = arima(Constant=-5,Variance=0.1);
mdl2 = arima(Constant=0,Variance=0.2);
mdl3 = arima(Constant=5,Variance=0.3);
Mdl = tsVAR(tt,[mdl1; mdl2; mdl3]);
Load the Canadian inflation and interest rate data set.

load Data_Canada

Extract the CPI-based inflation rate series.

INF_C = DataTable.INF_C;
numObs = length(INF_C);

Simulate ten paths from the model. Specify the threshold variable type and its data.

Y = simulate(Mdl,numObs,NumPaths=10,Type="exogenous",Z=INF_C);

Y is a numObs-by-10 matrix of simulated paths. Each column represents an independently simulated path.

In a tiled layout, plot the threshold transitions with the data by using ttplot, and plot the simulated paths on a second tile.

tiledlayout(2,1)
nexttile
ttplot(tt,Data=INF_C)
colorbar('off')
xticklabels(dates(xticks))
nexttile
plot(dates,Y)
grid on
axis tight
title("Simulations")
Y switches between submodels according to the value of the threshold variable INF_C. Mixing is evident for observations near thresholds, such as at the inflation rates of 1964 and 1978.
Return Innovations and States

Consider the model for the annual, CPI-based, Canadian inflation rate series in “Simulate Multiple Paths” on page 12-2066.

Create the LSTAR model for the series.

t = [2 8];
tt = threshold(t,Type="logistic",Rates=[3.5 1.5]);
mdl1 = arima(Constant=-5,Variance=0.1);
mdl2 = arima(Constant=0,Variance=0.2);
mdl3 = arima(Constant=5,Variance=0.3);
Mdl = tsVAR(tt,[mdl1; mdl2; mdl3]);

Load the Canadian inflation and interest rate data set and extract the inflation rate series.

load Data_Canada
INF_C = DataTable.INF_C;
numObs = length(INF_C);

Simulate ten paths of length numObs from the model. Specify the threshold variable type and its data. Return the innovations and states.

[y,e,s] = simulate(Mdl,numObs,NumPaths=10,Type="exogenous",Z=INF_C);

tiledlayout(3,1)
nexttile
plot(y)
ylabel("Simulated Response")
grid on
nexttile
plot(e)
ylabel("Innovation")
grid on
nexttile
stem(s)
ylabel("State")
yticks([1 2 3])
yticklabels(Mdl.StateNames)
Initialize Multivariate Model Simulation from Multiple Starting Conditions

This example shows how to initialize simulated paths from presample responses and initial states. The example uses arbitrary parameter values.

Fully Specify LSETAR Model

Consider the following 2-D LSETAR model.

• State 1, "Low": $y_t = \begin{bmatrix} 1 \\ -1 \end{bmatrix} + \varepsilon_{1,t}$, where $\varepsilon_{1,t} \sim N\left(\begin{bmatrix} 0 \\ 0 \end{bmatrix}, \begin{bmatrix} 1 & -0.1 \\ -0.1 & 1 \end{bmatrix}\right)$.
• State 2, "Med": $y_t = \begin{bmatrix} 2 \\ -2 \end{bmatrix} + \begin{bmatrix} 0.5 & 0.1 \\ 0.5 & 0.5 \end{bmatrix} y_{t-1} + \varepsilon_{2,t}$, where $\varepsilon_{2,t} \sim N\left(\begin{bmatrix} 0 \\ 0 \end{bmatrix}, \begin{bmatrix} 2 & -0.2 \\ -0.2 & 2 \end{bmatrix}\right)$.
• State 3, "High": $y_t = \begin{bmatrix} 3 \\ -3 \end{bmatrix} + \begin{bmatrix} 0.25 & 0 \\ 0 & 0 \end{bmatrix} y_{t-1} + \begin{bmatrix} 0 & 0 \\ 0.25 & 0 \end{bmatrix} y_{t-2} + \varepsilon_{3,t}$, where $\varepsilon_{3,t} \sim N\left(\begin{bmatrix} 0 \\ 0 \end{bmatrix}, \begin{bmatrix} 3 & -0.3 \\ -0.3 & 3 \end{bmatrix}\right)$.
• The system is in state 1 when $y_{2,t-4} < -1$, the system is in state 2 when $-1 \le y_{2,t-4} < 1$, and the system is in state 3 otherwise.
• The transition function is logistic. The transition rate from state 1 to 2 is 3.5, and the transition rate from state 2 to 3 is 1.5.

Create logistic threshold transitions at mid-levels -1 and 1 with rates 3.5 and 1.5, respectively. Label the states.

t = [-1 1];
r = [3.5 1.5];
stateNames = ["Low" "Med" "High"];
tt = threshold(t,Type="logistic",Rates=r,StateNames=stateNames);
Create the VAR submodels by using varm. Store the submodels in a vector with order corresponding to the regimes in tt.StateNames.

% Constants (numSeries x 1 vectors)
C1 = [1; -1];
C2 = [2; -2];
C3 = [3; -3];

% Autoregression coefficients (numSeries x numSeries matrices)
AR1 = {};                                  % 0 lags
AR2 = {[0.5 0.1; 0.5 0.5]};                % 1 lag
AR3 = {[0.25 0; 0 0] [0 0; 0.25 0]};       % 2 lags

% Innovations covariances (numSeries x numSeries matrices)
Sigma1 = [1 -0.1; -0.1 1];
Sigma2 = [2 -0.2; -0.2 2];
Sigma3 = [3 -0.3; -0.3 3];

% VAR Submodels
mdl1 = varm('Constant',C1,'AR',AR1,'Covariance',Sigma1);
mdl2 = varm('Constant',C2,'AR',AR2,'Covariance',Sigma2);
mdl3 = varm('Constant',C3,'AR',AR3,'Covariance',Sigma3);
mdl = [mdl1; mdl2; mdl3];
Create an LSETAR model from the switching mechanism tt and the state-specific submodels mdl. Label the series Y1 and Y2.

Mdl = tsVAR(tt,mdl,SeriesNames=["Y1" "Y2"])

Mdl =
  tsVAR with properties:

         Switch: [1x1 threshold]
      Submodels: [3x1 varm]
      NumStates: 3
      NumSeries: 2
     StateNames: ["Low"    "Med"    "High"]
    SeriesNames: ["Y1"    "Y2"]
     Covariance: []
Mdl is a fully specified tsVAR object representing a multivariate three-state LSETAR model. tsVAR object functions enable you to specify threshold variable characteristics and data.

Initialize Simulation from Presample Responses

Consider simulating 5 paths initialized from presample responses. Specify a numPreObs-by-numSeries-by-numPaths array of presample responses, where:

• numPreObs is the number of presample responses per series and path. You must specify enough presample observations to initialize all AR components in the VAR models and the endogenous threshold variable. The largest AR component order is 2 and the threshold variable delay is 4; therefore, simulate requires numPreObs=4 presample observations per series and path.
• numSeries=2, the number of response series in the system.
• numPaths=5, the number of independent paths to simulate.

delay = 4;
numPaths = 5;
Y0 = zeros(delay,Mdl.NumSeries,numPaths);
for j = 2:numPaths
    Y0(:,:,j) = 10*j*ones(delay,Mdl.NumSeries);
end
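The presample size stated above can be computed rather than hard-coded. The following base MATLAB sketch assumes, as the text describes, that the requirement is the larger of the maximum AR order and the threshold variable delay; the variable arOrders is introduced here only for illustration.

```matlab
arOrders  = [0 1 2];                    % AR orders of the Low, Med, and High submodels
delay     = 4;                          % Delay of the endogenous threshold variable
numPreObs = max([arOrders delay]);      % Presample observations per series and path
numSeries = 2;                          % Number of response series
numPaths  = 5;                          % Number of paths to initialize
Y0 = zeros(numPreObs,numSeries,numPaths);   % numPreObs-by-numSeries-by-numPaths
size(Y0)
```

Computing numPreObs this way keeps the presample array consistent if the submodel orders or the delay change.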
Simulate 5 paths of length 100 from the LSETAR model, starting from the presample. Specify the endogenous threshold variable and its delay, $y_{2,t-4}$.

numObs = 100;
rng(1);
Y = simulate(Mdl,numObs,NumPaths=numPaths,Y0=Y0,Index=2,Delay=4);
Y is a 100-by-2-by-5 array of simulated response paths. For example, Y(50,2,3) is the simulated response of path 3, of series Y2, at time point 50.

Plot the simulated paths for each variable on separate plots.

tiledlayout(2,1)
nexttile
plot(squeeze(Y(:,1,:)))
title("Y1")
nexttile
plot(squeeze(Y(:,2,:)))
title("Y2")
The system quickly settles regardless of the presample.

Initialize Simulation from States

Simulate three paths of length 100, where each of the three states initializes a path. Specify state indices for initialization, and specify the endogenous threshold variable and its delay.

S0 = 1:Mdl.NumStates;
numPaths = numel(S0);
Y = simulate(Mdl,numObs,NumPaths=numPaths,S0=S0,Index=2,Delay=4);

tiledlayout(2,1)
nexttile
plot(squeeze(Y(:,1,:)))
title("Y1")
nexttile
plot(squeeze(Y(:,2,:)))
title("Y2")
Simulate Model Containing Exogenous Regression Component

Consider including regression components for exogenous variables in each submodel of the threshold-switching dynamic regression model in “Initialize Multivariate Model Simulation from Multiple Starting Conditions” on page 12-2069.

Fully Specify LSETAR Model

Create logistic threshold transitions at mid-levels -1 and 1 with rates 3.5 and 1.5, respectively. Label the states.

t = [-1 1];
r = [3.5 1.5];
stateNames = ["Low" "Med" "High"];
tt = threshold(t,Type="logistic",Rates=r,StateNames=stateNames)

tt =
  threshold with properties:

          Type: 'logistic'
        Levels: [-1 1]
         Rates: [3.5000 1.5000]
    StateNames: ["Low"    "Med"    "High"]
     NumStates: 3
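Assume the following VARX models describe the response processes of the system:

• State 1: $y_t = \begin{bmatrix} 1 \\ -1 \end{bmatrix} + \begin{bmatrix} 1 \\ -1 \end{bmatrix} x_{1,t} + \varepsilon_{1,t}$, where $\varepsilon_{1,t} \sim N\left(\begin{bmatrix} 0 \\ 0 \end{bmatrix}, \begin{bmatrix} 1 & -0.1 \\ -0.1 & 1 \end{bmatrix}\right)$.
• State 2: $y_t = \begin{bmatrix} 2 \\ -2 \end{bmatrix} + \begin{bmatrix} 2 & 2 \\ -2 & -2 \end{bmatrix} x_{2,t} + \begin{bmatrix} 0.5 & 0.1 \\ 0.5 & 0.5 \end{bmatrix} y_{t-1} + \varepsilon_{2,t}$, where $\varepsilon_{2,t} \sim N\left(\begin{bmatrix} 0 \\ 0 \end{bmatrix}, \begin{bmatrix} 2 & -0.2 \\ -0.2 & 2 \end{bmatrix}\right)$.
• State 3: $y_t = \begin{bmatrix} 3 \\ -3 \end{bmatrix} + \begin{bmatrix} 3 & 3 & 3 \\ -3 & -3 & -3 \end{bmatrix} x_{3,t} + \begin{bmatrix} 0.25 & 0 \\ 0 & 0 \end{bmatrix} y_{t-1} + \begin{bmatrix} 0 & 0 \\ 0.25 & 0 \end{bmatrix} y_{t-2} + \varepsilon_{3,t}$, where $\varepsilon_{3,t} \sim N\left(\begin{bmatrix} 0 \\ 0 \end{bmatrix}, \begin{bmatrix} 3 & -0.3 \\ -0.3 & 3 \end{bmatrix}\right)$.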
$x_{1,t}$ represents a single exogenous variable, $x_{2,t}$ represents two exogenous variables, and $x_{3,t}$ represents three exogenous variables. Store the submodels in a vector.

% Constants (numSeries x 1 vectors)
C1 = [1; -1];
C2 = [2; -2];
C3 = [3; -3];

% Regression coefficients (numSeries x numRegressors matrices)
Beta1 = [1; -1];              % 1 regressor
Beta2 = [2 2; -2 -2];         % 2 regressors
Beta3 = [3 3 3; -3 -3 -3];    % 3 regressors

% Autoregression coefficients (numSeries x numSeries matrices)
AR1 = {};
AR2 = {[0.5 0.1; 0.5 0.5]};
AR3 = {[0.25 0; 0 0] [0 0; 0.25 0]};

% Innovations covariances (numSeries x numSeries matrices)
Sigma1 = [1 -0.1; -0.1 1];
Sigma2 = [2 -0.2; -0.2 2];
Sigma3 = [3 -0.3; -0.3 3];

% VARX submodels
mdl1 = varm(Constant=C1,AR=AR1,Beta=Beta1,Covariance=Sigma1);
mdl2 = varm(Constant=C2,AR=AR2,Beta=Beta2,Covariance=Sigma2);
mdl3 = varm(Constant=C3,AR=AR3,Beta=Beta3,Covariance=Sigma3);
mdl = [mdl1; mdl2; mdl3];
Create an LSETAR model from the switching mechanism tt and the state-specific submodels mdl. Label the series Y1 and Y2.
Mdl = tsVAR(tt,mdl,SeriesNames=["Y1" "Y2"])
Mdl =
  tsVAR with properties:
         Switch: [1x1 threshold]
      Submodels: [3x1 varm]
      NumStates: 3
      NumSeries: 2
     StateNames: ["Low"    "Med"    "High"]
    SeriesNames: ["Y1"    "Y2"]
     Covariance: []
Simulate Data Ignoring Regression Component
If you do not supply exogenous data, simulate ignores the regression components in the submodels. Simulate a single path of responses, innovations, and states into a simulation horizon of length 50. Then plot each path separately.
rng(1); % For reproducibility
numObs = 50;
[Y,E,SP] = simulate(Mdl,numObs);
figure
tiledlayout(3,1)
nexttile
plot(Y)
ylabel("Response")
grid on
legend(["y_1" "y_2"])
nexttile
plot(E)
ylabel("Innovation")
grid on
legend(["e_1" "e_2"])
nexttile
stem(SP)
ylabel("State")
yticks([1 2 3])
Simulate Data Including Regression Component
To include the regression components in the simulation, simulate requires exogenous data. Simulate exogenous data for the three regressors by generating 50 random observations from the 3-D standard Gaussian distribution.
X = randn(50,3);
Generate one random response, innovation, and state path of length 50. Specify the simulated exogenous data for the submodel regression components. Plot the results.
rng(1); % Reset seed for comparison
[Y,E,SP] = simulate(Mdl,numObs,X=X);
figure
tiledlayout(3,1)
nexttile
plot(Y)
ylabel("Response")
grid on
legend(["y_1" "y_2"])
nexttile
plot(E)
ylabel("Innovation")
grid on
legend(["e_1" "e_2"])
nexttile
stem(SP)
ylabel("State")
yticks([1 2 3])
Perform Monte Carlo Estimation
This example shows how to use Monte Carlo estimation to obtain an interval estimate of the threshold mid-level. Consider a SETAR model for the real US GDP growth rate yt with AR(4) submodels. Suppose the threshold variable is yt (self-exciting with 0 delay). Create a discrete threshold transition at unknown mid-level t1. Label the states "Recession" and "Expansion".
tt = threshold(NaN,StateNames=["Recession" "Expansion"]);
For each state, create a partially specified AR(4) model with one coefficient at lag 4. Store the state submodels in a vector.
submdl = arima(ARLags=4);
mdl = [submdl; submdl];
Each submodel has an unknown, estimable lag 4 coefficient, model constant, and innovations variance.
Create a partially specified TAR model from the threshold transition and submodel vector. Mdl = tsVAR(tt,mdl);
Create a fully specified threshold transition that has the same structure as tt, but set the mid-level to 0. tt0 = threshold(0);
Load the US macroeconomic data set. Compute the real GDP growth rate as a percent.
load Data_USEconModel
rGDP = DataTimeTable.GDP./DataTimeTable.GDPDEF;
pRGDP = 100*price2ret(rGDP);
T = numel(pRGDP);
Fit the TAR model to the real GDP rate series.
EstMdl = estimate(Mdl,tt0,pRGDP,Z=pRGDP,Type="exogenous");
Simulate 100 response paths from the estimated model.
rng(100) % For reproducibility
numPaths = 100;
Y = simulate(EstMdl,T,NumPaths=numPaths,Z=pRGDP,Type="exogenous");
Fit the TAR model to each simulated response path. Specify the estimated threshold transition EstMdl.Switch to initialize the estimation procedure. For each path, store the estimated threshold transition mid-level.
tMC = nan(numPaths,1); % One estimated mid-level per simulated path
for j = 1:numPaths
    EstMdlSim = estimate(Mdl,EstMdl.Switch,Y(:,j),Z=Y(:,j),Type="exogenous");
    tMC(j) = EstMdlSim.Switch.Levels;
end
tMC is a 100-by-1 vector representing a Monte Carlo sample of threshold transitions. Obtain a 95% confidence interval on the true threshold transition by computing the 0.025 and 0.975 quantiles of the Monte Carlo sample.
tCI = quantile(tMC,[0.025 0.975])
tCI = 1×2
    0.5158    0.9810
A 95% confidence interval on the true threshold transition is (0.52%, 0.98%).
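The percentile interval can be illustrated on a small synthetic sample. The following is a minimal sketch, not output of the model above: tDemo is a made-up stand-in for tMC, and the names tDemo, ciDemo, and ciPadded are hypothetical. It also shows that quantile treats NaN values as missing, so unfilled entries of a preallocated vector do not change the interval.

```matlab
% Hypothetical Monte Carlo sample standing in for tMC (not model output)
tDemo = linspace(0.4,1.1,100)';

% 95% percentile interval: the 0.025 and 0.975 quantiles
alpha = 0.05;
ciDemo = quantile(tDemo,[alpha/2 1-alpha/2]);

% quantile treats NaN values as missing, so NaN padding leaves the result unchanged
ciPadded = quantile([tDemo; nan(20,1)],[alpha/2 1-alpha/2]);
```

For a different confidence level, change alpha; for example, alpha = 0.10 yields the 0.05 and 0.95 quantiles.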
Input Arguments
Mdl — Fully specified threshold-switching dynamic regression model
tsVAR model object
Fully specified threshold-switching dynamic regression model, specified as a tsVAR model object returned by tsVAR or estimate. Properties of a fully specified model object do not contain NaN values.
numObs — Number of observations to generate
positive integer
Number of observations to generate for each sample path, specified as a positive integer.
Data Types: double
Name-Value Arguments
Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter. Before R2021a, use commas to separate each name and value, and enclose Name in quotes.
Example: NumPaths=1000,Y0=Y0 simulates 1000 sample paths and initializes the dynamic component of each submodel by using the presample response data Y0.
NumPaths — Number of sample paths to generate
1 (default) | positive integer
Number of sample paths to generate, specified as a positive integer.
Example: NumPaths=1000
Data Types: double
Type — Type of threshold variable data
"endogenous" (default) | "exogenous"
Type of threshold variable data, specified as a value in this table.
Value           Description
"endogenous"    The model is self-exciting with threshold variable data zt = yj,t−d, generated by response j, where:
                • The name-value argument Delay specifies the delay d.
                • The name-value argument Index specifies the component j of the multivariate response variable.
"exogenous"     The threshold variable is exogenous to the system. The name-value argument Z specifies the threshold variable data and is required.
Example: Type="exogenous",Z=z specifies the data z for the exogenous threshold variable.
Example: Type="endogenous",Index=2,Delay=4 specifies the endogenous threshold variable as y2,t−4, whose data is Y(:,2).
Data Types: char | string | cell
Y0 — Presample response data
numeric matrix | numeric array
Presample response data, specified as a numeric matrix or array.
To use the same presample data for each of the numPaths paths, specify a numPreSampleObs-by-numSeries matrix, where numPaths is the value of NumPaths, numPreSampleObs is the number of presample observations, and numSeries is the number of response variables.
To use different presample data for each path:
• For univariate ARX submodels, specify a numPreSampleObs-by-numPaths matrix.
• For multivariate VARX submodels, specify a numPreSampleObs-by-numSeries-by-numPaths array.
The number of presample observations numPreSampleObs must be sufficient to initialize the AR terms of all submodels. For models of type "endogenous", the number of presample observations must also be sufficient to initialize the delayed response. If numPreSampleObs exceeds the number necessary to initialize the model, simulate uses only the latest observations. The last row contains the latest observations.
simulate updates Y0 using the latest simulated observations each time it switches states.
By default, simulate determines Y0 by the submodel of the initial state:
• If the initial submodel is a stationary AR process without regression components, simulate sets presample observations to the unconditional mean.
• Otherwise, simulate sets presample observations to zero.
Data Types: double
Z — Threshold variable data zt
empty array ([]) (default) | numeric vector | numeric matrix
Threshold variable data for simulations of type "exogenous", specified as a numeric vector of length numObsZ or a numObsZ-by-numPaths numeric matrix.
For a numeric vector, simulate applies the same data to all simulated paths. For a matrix, simulate applies columns of Z to corresponding simulated paths.
If numObsZ exceeds numObs, simulate uses only the latest observations. The last row contains the latest observation.
simulate determines the initial state of simulations by values in the first row Z(1,:).
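The presample shapes described for Y0 can be sketched with plain arrays. This is a minimal illustration under assumed dimensions; the variable names (Y0demo, Y0latest, Y0paths) and the values are hypothetical, and the sketch only builds the arrays rather than calling simulate.

```matlab
% Hypothetical dimensions: 2 response series, 3 paths, 2 required presample rows
numSeries = 2;
numPaths = 3;
p = 2;

% Hypothetical presample with more rows than needed (5 > p); last row is the latest
Y0demo = [1 10; 2 20; 3 30; 4 40; 5 50];

% Only the latest p rows are needed to initialize the model
Y0latest = Y0demo(end-p+1:end,:);

% Path-specific presample for multivariate submodels:
% a numPreSampleObs-by-numSeries-by-numPaths array (here, the same page per path)
Y0paths = repmat(Y0latest,1,1,numPaths);
```

To give each path its own starting values, overwrite individual pages, for example Y0paths(:,:,2).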
Data Types: double
Delay — Threshold variable delay d in yj,t−d
1 (default) | positive integer
Threshold variable delay d in yj,t−d for simulations of type "endogenous", specified as a positive integer.
Example: Delay=4 specifies that the threshold variable is yj,t−4, where j is the value of Index.
Data Types: double
Index — Threshold variable index j in yj,t−d
1 (default) | scalar in 1:Mdl.NumSeries
Threshold variable index j in yj,t−d for simulations of type "endogenous", specified as a scalar in 1:Mdl.NumSeries.
simulate ignores Index for univariate AR models.
Example: Index=2 specifies that the threshold variable is y2,t−d, where d is the value of Delay.
Data Types: double
S0 — Initial states
1 (default) | numeric scalar | numeric vector
Initial states of simulations, for simulations of type "endogenous", specified as a numeric scalar or vector of length numPaths. Entries of S0 must be in 1:Mdl.NumStates.
A scalar S0 applies the same initial state to all paths. A vector S0 applies initial state S0(j) to path j.
If you specify Y0, simulate ignores S0 and determines initial states by the specified presample data.
Example: S0=2 applies state 2 to initialize all paths.
Example: S0=[2 3] specifies state 2 as the initial state of the first path and state 3 as the initial state of the second path.
Data Types: double
X — Predictor data
numeric matrix | cell vector of numeric matrices
Predictor data used to evaluate regression components in all submodels of Mdl, specified as a numeric matrix or a cell vector of numeric matrices.
To use a subset of the same predictors in each state, specify X as a matrix with numPreds columns and at least numObs rows. Columns correspond to distinct predictor variables. Submodels use initial columns of the associated matrix, in order, up to the number of submodel predictors. The number of columns in the Beta property of Mdl.Submodels(j) determines the number of exogenous variables in the regression component of submodel j. If the number of rows exceeds numObs, then simulate uses the latest observations.
To use different predictors in each state, specify a cell vector of such matrices with length numStates.
By default, simulate ignores regression components in Mdl.
Data Types: double
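The rule that each submodel uses the initial columns of a shared predictor matrix can be sketched as follows. This is an illustration only, assuming the regression component enters each response as Beta times the predictor vector; the coefficient values mirror the VARX example earlier in this page, and the names BetaDemo, Xj, and contrib are hypothetical.

```matlab
rng(0)                          % For reproducibility of the illustration
numObs = 50;
X = randn(numObs,3);            % Shared predictor matrix with 3 candidate regressors

% Hypothetical numSeries-by-numRegressors coefficients with 1, 2, and 3 regressors
BetaDemo = {[1; -1], [2 2; -2 -2], [3 3 3; -3 -3 -3]};

contrib = cell(1,3);
for j = 1:3
    k = size(BetaDemo{j},2);    % Number of regressors in submodel j
    Xj = X(:,1:k);              % Submodel j uses the initial k columns of X
    contrib{j} = Xj*BetaDemo{j}'; % numObs-by-numSeries regression contribution
end
```

Supplying a cell vector instead, one matrix per state, removes the shared-column restriction.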
Output Arguments
Y — Simulated response paths
numeric matrix | numeric array
Simulated response paths, returned as a numeric matrix or array. Y represents the continuation of the presample responses in Y0.
For univariate ARX submodels, Y is a numObs-by-numPaths matrix. For multivariate VARX submodels, Y is a numObs-by-numSeries-by-numPaths array.
E — Simulated innovation paths
numeric matrix | numeric array
Simulated innovation paths, returned as a numeric matrix or array.
For univariate ARX submodels, E is a numObs-by-numPaths matrix. For multivariate VARX submodels, E is a numObs-by-numSeries-by-numPaths array.
simulate generates innovations using the covariance specification in Mdl. For more details, see tsVAR.
StatePaths — Simulated state paths
numeric matrix
Simulated state paths, returned as a numObs-by-numPaths numeric matrix.
If threshold levels in Mdl.Switch.Levels are t1, t2, …, tn, simulate labels states of the threshold variable (-∞,t1), [t1,t2), …, [tn,∞) as 1, 2, 3, …, n + 1, respectively.
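The state-labeling convention above can be sketched in a few lines. This is a minimal illustration of the interval rule only; the levels and threshold values are hypothetical, and the sketch does not reproduce the internals of simulate.

```matlab
levels = [-1 1];            % Hypothetical threshold levels t1 < t2
z = [-2; -1; 0; 1; 3];      % Hypothetical threshold variable values

% Intervals are closed on the left, so count the levels at or below each value
states = 1 + sum(z >= levels,2);
```

Here a value equal to a level, such as z = -1, falls in the interval that starts at that level, so it receives label 2 rather than 1.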
Version History Introduced in R2021b
References
[1] Teräsvirta, Timo. "Modelling Economic Relationships with Smooth Transition Regressions." In A. Ullah and D.E.A. Giles (eds.), Handbook of Applied Economic Statistics, 507–552. New York: Marcel Dekker, 1998.
[2] van Dijk, Dick. Smooth Transition Models: Extensions and Outlier Robust Inference. Rotterdam, Netherlands: Tinbergen Institute Research Series, 1999.
See Also Objects tsVAR Functions estimate | forecast Topics “Estimate Threshold-Switching Dynamic Regression Models” on page 10-93 “Simulate Paths of Threshold-Switching Dynamic Regression Models” on page 10-110 “Forecast Threshold-Switching Dynamic Regression Models” on page 10-117
simulate Monte Carlo simulation of vector autoregression (VAR) model
Syntax
Y = simulate(Mdl,numobs)
Y = simulate(Mdl,numobs,Name=Value)
[Y,E] = simulate( ___ )
Tbl = simulate(Mdl,numobs,Presample=Presample)
Tbl = simulate(Mdl,numobs,Presample=Presample,Name=Value)
Tbl = simulate(Mdl,numobs,InSample=InSample,ResponseVariables=ResponseVariables)
Tbl = simulate(Mdl,numobs,InSample=InSample,ResponseVariables=ResponseVariables,Presample=Presample)
Tbl = simulate( ___ ,Name=Value)
Description Conditional and Unconditional Simulation for Numeric Arrays
Y = simulate(Mdl,numobs) returns the numeric array Y containing a random numobs-period path of multivariate response series from performing an unconditional simulation of the fully specified VAR(p) model Mdl.
Y = simulate(Mdl,numobs,Name=Value) uses additional options specified by one or more name-value arguments. simulate returns numeric arrays when all optional input data are numeric arrays. For example, simulate(Mdl,100,NumPaths=1000,Y0=PS) returns a numeric array of 1000 simulated response paths, each 100 periods long, from Mdl and specifies the numeric array of presample response data PS. To produce a conditional simulation, specify response data in the simulation horizon by using the YF name-value argument.
[Y,E] = simulate( ___ ) also returns the numeric array containing the simulated multivariate model innovations series E corresponding to the simulated responses Y, using any input argument combination in the previous syntaxes.
Unconditional Simulation for Tables and Timetables
Tbl = simulate(Mdl,numobs,Presample=Presample) returns the table or timetable Tbl containing the random multivariate response and innovations variables, which results from the unconditional simulation of the response series in the model Mdl. simulate uses the table or timetable of presample data Presample to initialize the response series. simulate selects the variables in Mdl.SeriesNames to simulate, or it selects all variables in Presample. To select different response variables in Presample to simulate, use the PresampleResponseVariables name-value argument.
Tbl = simulate(Mdl,numobs,Presample=Presample,Name=Value) uses additional options specified by one or more name-value arguments. For example, simulate(Mdl,100,Presample=PSTbl,PresampleResponseVariables=["GDP" "CPI"]) returns a timetable of variables containing 100-period simulated response and innovations series from Mdl, initialized by the data in the GDP and CPI variables of the timetable of presample data in PSTbl.
Conditional Simulation for Tables and Timetables
Tbl = simulate(Mdl,numobs,InSample=InSample,ResponseVariables=ResponseVariables) returns the table or timetable Tbl containing the random multivariate response and innovations variables, which results from the conditional simulation of the response series in the model Mdl. InSample is a table or timetable of response or predictor data in the simulation horizon that simulate uses to perform the conditional simulation, and ResponseVariables specifies the response variables in InSample.
Tbl = simulate(Mdl,numobs,InSample=InSample,ResponseVariables=ResponseVariables,Presample=Presample) uses the presample data in the table or timetable Presample to initialize the model.
Tbl = simulate( ___ ,Name=Value) uses additional options specified by one or more name-value arguments, using any input argument combination in the previous two syntaxes.
Examples Return Response Series in Matrix from Unconditional Simulation Fit a VAR(4) model to the consumer price index (CPI) and unemployment rate data. Then, simulate a vector of responses from the estimated model. Load the Data_USEconModel data set. load Data_USEconModel
Plot the two series on separate plots. figure plot(DataTimeTable.Time,DataTimeTable.CPIAUCSL); title("Consumer Price Index") ylabel("Index") xlabel("Date")
figure plot(DataTimeTable.Time,DataTimeTable.UNRATE); title("Unemployment Rate") ylabel("Percent") xlabel("Date")
Stabilize the CPI by converting it to a series of growth rates. Synchronize the two series by removing the first observation from the unemployment rate series. Create a new data set containing the transformed variables, and do not include any rows containing at least one missing observation.
rcpi = price2ret(DataTimeTable.CPIAUCSL);
unrate = DataTimeTable.UNRATE(2:end);
dates = DataTimeTable.Time(2:end);
Data = array2timetable([rcpi unrate],RowTimes=dates, ...
    VariableNames=["RCPI" "UNRATE"]);
Data = rmmissing(Data);
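The growth-rate transformation and the synchronization step can be sketched without toolbox functions. This sketch assumes that price2ret is used with its default continuously compounded method, in which case the returns are differences of log prices; the price and rate values below are hypothetical.

```matlab
priceDemo = [23.5; 24.15; 24.36; 24.05];   % Hypothetical price levels
rateDemo = [3.7; 3.7; 3.8; 3.8];           % Hypothetical companion series

% Continuously compounded growth rates: one fewer observation than the levels
growthDemo = diff(log(priceDemo));

% Drop the first observation of the companion series so that rows align
rateSync = rateDemo(2:end);
```

After this step, growthDemo and rateSync have the same length and refer to the same periods.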
Create a default VAR(4) model using the shorthand syntax. Mdl = varm(2,4); Mdl.SeriesNames = Data.Properties.VariableNames;
Estimate the model using the entire data set. EstMdl = estimate(Mdl,Data.Variables);
EstMdl is a fully specified, estimated varm model object. Simulate a response series path from the estimated model with length equal to the path in the data.
rng(1); % For reproducibility
numobs = height(Data);
Y = simulate(EstMdl,numobs);
Y is a 245-by-2 matrix of simulated responses. The first and second columns contain the simulated CPI growth rate and unemployment rate, respectively. Plot the simulated and true responses. figure plot(Data.Time,Y(:,1)); hold on plot(Data.Time,Data.RCPI) title("CPI Growth Rate"); ylabel("Growth Rate") xlabel("Date") legend("Simulation","Observed") hold off
figure plot(Data.Time,Y(:,2)); hold on plot(Data.Time,Data.UNRATE) ylabel("Percent") xlabel("Date") title("Unemployment Rate") legend("Simulation","Observed") hold off
Simulate Responses Using filter Illustrate the relationship between simulate and filter by estimating a 4-dimensional VAR(2) model of the four response series in Johansen's Danish data set. Simulate a single path of responses using the fitted model and the historical data as initial values, and then filter a random set of Gaussian disturbances through the estimated model using the same presample responses. Load Johansen's Danish economic data. load Data_JDanish
For details on the variables, enter Description. Create a default 4-D VAR(2) model. Mdl = varm(4,2); Mdl.SeriesNames = DataTimeTable.Properties.VariableNames;
Estimate the VAR(2) model using the entire data set. EstMdl = estimate(Mdl,DataTimeTable.Variables);
When reproducing the results of simulate and filter, it is important to take these actions.
• Set the same random number seed using rng.
• Specify the same presample response data using the Y0 name-value argument.
Set the default random seed. Simulate 100 observations by passing the estimated model to simulate. Specify the entire data set as the presample.
rng("default")
YSim = simulate(EstMdl,100,Y0=DataTimeTable.Variables);
YSim is a 100-by-4 matrix of simulated responses. Columns correspond to the variables in DataTimeTable. Set the default random seed. Simulate 4 series of 100 observations from the standard Gaussian distribution.
rng("default")
Z = randn(100,4);
Filter the Gaussian values through the estimated model. Specify the entire data set as the presample.
YFilter = filter(EstMdl,Z,Y0=DataTimeTable.Variables);
YFilter is a 100-by-4 matrix of simulated responses. Columns correspond to the variables in DataTimeTable. Before filtering the disturbances, filter scales Z by the lower triangular Cholesky factor of the model covariance in EstMdl.Covariance. Compare the resulting responses between filter and simulate.
(YSim - YFilter)'*(YSim - YFilter)
ans = 4×4
     0     0     0     0
     0     0     0     0
     0     0     0     0
     0     0     0     0
The results are identical.
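The two ingredients of this comparison, a shared seed and Cholesky scaling of standard Gaussian draws, can be sketched with base MATLAB functions. This is an illustration under assumptions: the covariance SigmaDemo is hypothetical, and the row-wise scaling Z*L' is one common convention rather than a statement about the internals of filter.

```matlab
SigmaDemo = [2 -0.2; -0.2 2];     % Hypothetical innovations covariance
L = chol(SigmaDemo,'lower');      % Lower triangular factor with L*L' = SigmaDemo

rng("default")
Z1 = randn(1000,2);               % Standard Gaussian disturbances (rows are periods)
E1 = Z1*L';                       % Scaled disturbances with covariance near SigmaDemo

rng("default")                    % Resetting the seed reproduces the same draws
Z2 = randn(1000,2);
E2 = Z2*L';
```

Because both runs start from the same seed, E1 and E2 are identical, which is why the difference between the simulate and filter results above is exactly zero.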
Simulate Arrays of Multiple Response and Innovations Paths Load Johansen's Danish economic data. Remove all missing observations. load Data_JDanish Data = rmmissing(Data); T = height(Data);
For details on the variables, enter Description. Create a default 4-D VAR(2) model. Mdl = varm(4,2);
Estimate the VAR(2) model using the entire data set.
EstMdl = estimate(Mdl,Data);
When reproducing the results of simulate and filter, it is important to take these actions.
• Set the same random number seed using rng.
• Specify the same presample response data using the Y0 name-value argument.
Simulate 100 paths of responses and corresponding innovations, each of length T - EstMdl.P (the effective sample size), by passing the estimated model to simulate. Specify the same presample matrix as the presample used in estimation (the earliest Mdl.P observations, by default).
rng("default")
p = Mdl.P;
numobs = T - p;
PS = Data(1:p,:);
[YSim,ESim] = simulate(EstMdl,numobs,NumPaths=100,Y0=PS);
size(YSim)
ans = 1×3
    53     4   100
YSim and ESim are 53-by-4-by-100 numeric arrays of simulated responses and innovations, respectively. Each row corresponds to a period in the simulation horizon, each column corresponds to the variable in EstMdl.SeriesNames, and pages are separate, independently simulated paths.
Plot each simulated response and innovations variable with their observations.
figure
InSample = Data((p+1):end,:);
tiledlayout(2,2)
for j = 1:numel(EstMdl.SeriesNames)
    nexttile
    h1 = plot(squeeze(YSim(:,j,:)),Color=[0.8 0.8 0.8]);
    hold on
    h2 = plot(InSample(:,j),Color="k",LineWidth=2);
    hold off
    title(series(j))
    legend([h1(1) h2],["Simulated" "Observed"])
end
E = infer(EstMdl,InSample,Y0=PS);
figure
tiledlayout(2,2)
for j = 1:numel(EstMdl.SeriesNames)
    nexttile
    h1 = plot(squeeze(ESim(:,j,:)),Color=[0.8 0.8 0.8]);
    hold on
    h2 = plot(E(:,j),Color="k",LineWidth=2);
    hold off
    title("Innovations: " + EstMdl.SeriesNames{j})
    legend([h1(1) h2],["Simulated" "Observed"])
end
Return Timetable of Responses and Innovations from Unconditional Simulation Fit a VAR(4) model to the consumer price index (CPI) and unemployment rate data. Then, perform an unconditional simulation of the estimated model and return the simulated responses and corresponding innovations in a timetable. This example is based on “Return Response Series in Matrix from Unconditional Simulation” on page 12-2084. Load and Preprocess Data Load the Data_USEconModel data set. Compute the CPI growth rate. Because the growth rate calculation consumes the earliest observation, include the rate variable in the timetable by prepending the series with NaN. load Data_USEconModel DataTimeTable.RCPI = [NaN; price2ret(DataTimeTable.CPIAUCSL)]; T = height(DataTimeTable) T = 249
Prepare Timetable for Estimation
When you plan to supply a timetable directly to estimate, you must ensure it has all the following characteristics:
• All selected response variables are numeric and do not contain any missing values.
• The timestamps in the Time variable are regular, and they are ascending or descending.
Remove all missing values from the table, relative to the CPI rate (RCPI) and unemployment rate (UNRATE) series.
varnames = ["RCPI" "UNRATE"];
DTT = rmmissing(DataTimeTable,DataVariables=varnames);
T = height(DTT)
T = 245
rmmissing removes the four initial missing observations from the DataTimeTable to create a subtable DTT. The variables RCPI and UNRATE of DTT do not have any missing observations.
Determine whether the sampling timestamps have a regular frequency and are sorted.
areTimestampsRegular = isregular(DTT,"quarters")
areTimestampsRegular = logical
   0
areTimestampsSorted = issorted(DTT.Time)
areTimestampsSorted = logical
   1
areTimestampsRegular = 0 indicates that the timestamps of DTT are irregular. areTimestampsSorted = 1 indicates that the timestamps are sorted. Macroeconomic series in this example are timestamped at the end of the month. This quality induces an irregularly measured series.
Remedy the time irregularity by shifting all dates to the first day of the quarter.
dt = DTT.Time;
dt = dateshift(dt,"start","quarter");
DTT.Time = dt;
areTimestampsRegular = isregular(DTT,"quarters")
areTimestampsRegular = logical
   1
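The effect of the date shift can be seen on a tiny timetable. This is a minimal sketch with hypothetical data: four quarterly observations stamped on the last day of each quarter's final month, then shifted to the first day of the quarter. The names tDemo, TTDemo, and isReg are illustrative only.

```matlab
% Hypothetical quarterly observations stamped at the end of March, June,
% September, and December
tDemo = datetime(2000,[3 6 9 12],[31 30 30 31])';
TTDemo = timetable(tDemo,(1:4)');

% Shift every timestamp to the first day of its quarter
TTDemo.Properties.RowTimes = dateshift(TTDemo.Properties.RowTimes,"start","quarter");

% The shifted timestamps are one calendar quarter apart
isReg = isregular(TTDemo,"quarters");
```

Shifting to the start of the quarter changes only the timestamps, not the observations, so the data passed to estimate are otherwise unchanged.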
DTT is regular with respect to time. Create Model Template for Estimation Create a default VAR(4) model using the shorthand syntax. Specify the response variable names. Mdl = varm(2,4); Mdl.SeriesNames = varnames;
Fit Model to Data
Estimate the model. Pass the entire timetable DTT. By default, estimate selects the response variables in Mdl.SeriesNames to fit to the model. Alternatively, you can use the ResponseVariables name-value argument.
EstMdl = estimate(Mdl,DTT);
p = EstMdl.P
p = 4
Perform Unconditional Simulation of Estimated Model
Simulate a response and innovations path from the estimated model and return the simulated series as variables in a timetable. simulate requires information for the output timetable, such as variable names, sampling times for the simulation horizon, and sampling frequency. Therefore, supply a presample of the earliest p = 4 observations of the data DTT, from which simulate infers the required timetable information. Specify a simulation horizon of length T.
rng(1) % For reproducibility
PSTbl = DTT(1:p,:);
numobs = T - p;
Tbl = simulate(EstMdl,T,Presample=PSTbl);
size(Tbl)
ans = 1×2
   245     4
PSTbl
PSTbl=4×15 timetable
    Time      COE     CPIAUCSL    FEDFUNDS    GCE      GDP     GDPDEF    GPDI    GS10    HOANBS
    _____    _____    ________    ________    ____    _____    ______    ____    ____    ______
    Q1-48    137.9      23.5        NaN       37.6    260.4    16.111     45     NaN     55.036
    Q2-48    139.6     24.15        NaN       39.7    267.3    16.254    48.1    NaN     55.007
    Q3-48    144.5     24.36        NaN       41.4    273.9    16.556    50.2    NaN     55.398
    Q4-48    145.9     24.05        NaN       43.5    275.2    16.597    49.1    NaN     54.885
(The display is truncated; the remaining variables are not shown.)
head(Tbl)
    Time     RCPI_Responses    UNRATE_Responses    RCPI_Innovations    UNRATE_Innovations
    _____    ______________    ________________    ________________    __________________
    Q1-49       0.0037294           4.6036            -0.0038547             0.25039
    Q2-49       0.0064827           5.0083             0.0070154            0.027504
    Q3-49      -0.0073358           5.4981            -0.0045047             0.25199
    Q4-49      -0.0057328           5.7007            -0.0065904             0.10593
    Q1-50      -0.0060454           5.8687             -0.005022             0.13824
    Q2-50      -0.0084475           5.5758            -0.0034013            -0.26192
    Q3-50      -0.0067066           5.4129            -0.0033182             0.13055
    Q4-50      -0.0020759           5.2191             0.0010595             0.11135
Tbl is a 245-by-4 timetable of simulated responses and innovations. RCPI_Responses is the simulated path of the CPI growth rate and RCPI_Innovations is the corresponding innovations series, and the variables associated with the unemployment rate are similar. The timestamps of Tbl follow directly from the timestamps of PSTbl, and they have the same sampling frequency.
Simulate Responses From Model Containing Regression Component Estimate a VAR(4) model of the consumer price index (CPI), the unemployment rate, and the gross domestic product (GDP). Include a linear regression component containing the current and the last 4 quarters of government consumption expenditures and investment. Simulate multiple paths from the estimated model. Load the Data_USEconModel data set. Compute the real GDP. load Data_USEconModel DataTimeTable.RGDP = DataTimeTable.GDP./DataTimeTable.GDPDEF*100;
Plot all variables on separate plots. figure tiledlayout(2,2) nexttile plot(DataTimeTable.Time,DataTimeTable.CPIAUCSL); ylabel("Index") title("Consumer Price Index") nexttile plot(DataTimeTable.Time,DataTimeTable.UNRATE); ylabel("Percent") title("Unemployment Rate") nexttile plot(DataTimeTable.Time,DataTimeTable.RGDP); ylabel("Output") title("Real Gross Domestic Product") nexttile plot(DataTimeTable.Time,DataTimeTable.GCE); ylabel("Billions of $") title("Government Expenditures")
Stabilize the CPI, GDP, and GCE by converting each to a series of growth rates. Synchronize the unemployment rate series with the others by removing its first observation. varnames = ["CPIAUCSL" "RGDP" "GCE"]; DTT = varfun(@price2ret,DataTimeTable,InputVariables=varnames); DTT.Properties.VariableNames = varnames; DTT.UNRATE = DataTimeTable.UNRATE(2:end);
Make the time base regular. dt = DTT.Time; dt = dateshift(dt,"start","quarter"); DTT.Time = dt;
Expand the GCE rate series to a matrix that includes the first lagged series through the fourth lag series. RGCELags = lagmatrix(DTT,1:4,DataVariables="GCE"); DTT = [DTT RGCELags]; DTT = rmmissing(DTT);
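The lag expansion can be sketched without lagmatrix. This is a minimal illustration on a hypothetical series: each lagged column is the series delayed by k periods, padded at the start with NaN, which is why the rows with missing values are removed afterward. The names xDemo, XLag, and complete are illustrative only.

```matlab
xDemo = (1:6)';                          % Hypothetical series
lags = 1:2;

% Presample positions have no lagged value, so they stay NaN
XLag = nan(numel(xDemo),numel(lags));
for k = lags
    XLag(k+1:end,k) = xDemo(1:end-k);    % Column k holds the series delayed by k periods
end

% Rows without missing values are the ones that survive removal of missing rows
complete = ~any(isnan(XLag),2);
```

With a maximum lag of 4, as in this example, the first four rows of the expanded data contain at least one NaN and are dropped.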
Create separate presample and estimation sample data sets. The presample contains the earliest p = 4 observations, and the estimation sample contains the rest of the data.
p = 4;
PS = DTT(1:p,:);
InSample = DTT((p+1):end,:);
respnames = ["CPIAUCSL" "UNRATE" "RGDP"];
idx = endsWith(InSample.Properties.VariableNames,"GCE");
prednames = InSample.Properties.VariableNames(idx);
Create a default VAR(4) model using the shorthand syntax. Specify the response variable names. Mdl = varm(3,p); Mdl.SeriesNames = respnames;
Estimate the model using the entire sample. Specify the GCE and its lags as exogenous predictor data for the regression component. EstMdl = estimate(Mdl,InSample,Presample=PS,PredictorVariables=prednames);
Generate 100 random response and innovations paths from the estimated model by performing an unconditional simulation. Specify that the length of the paths is the same as the length of the estimation sample period. Supply the presample and estimation sample data.
rng(1) % For reproducibility
numpaths = 100;
numobs = height(InSample);
Tbl = simulate(EstMdl,numobs,NumPaths=numpaths, ...
    Presample=PS,InSample=InSample,PredictorVariables=prednames);
size(Tbl)
ans = 1×2
   240    14
head(Tbl)
    Time      CPIAUCSL        RGDP          GCE        UNRATE     Lag1GCE       Lag2GCE
    _____    __________    __________    __________    ______    __________    __________
    Q1-49    0.00041815    -0.0031645      0.036603     6.2        0.047147       0.04948
    Q2-49    -0.0071324      0.011385    -0.0021164     6.6        0.036603      0.047147
    Q3-49    -0.0059122     -0.010366     -0.012793     6.6      -0.0021164      0.036603
    Q4-49     0.0012698      0.040091     -0.021693     6.3       -0.012793    -0.0021164
    Q1-50      0.010101      0.029649      0.010905     5.4       -0.021693     -0.012793
    Q2-50       0.01908       0.03844    -0.0043478     4.4        0.010905     -0.021693
    Q3-50      0.025954      0.017994      0.075508     4.3      -0.0043478      0.010905
    Q4-50      0.035395       0.01197       0.14807     3.4        0.075508    -0.0043478
(The display is truncated; the remaining variables are not shown.)
Tbl is a 240-by-14 timetable of estimation sample data, simulated responses (denoted responseName_Responses) and corresponding innovations (denoted responseName_Innovations). The simulated response and innovations variables are 240-by-100 matrices, where each row is a period in the estimation sample and each column is a separate, independently generated path. For each time in the estimation sample, compute the mean vector of the simulated responses among all paths. idx = endsWith(Tbl.Properties.VariableNames,"_Responses"); simrespnames = Tbl.Properties.VariableNames(idx); MeanSim = varfun(@(x)mean(x,2),Tbl,InputVariables=simrespnames);
MeanSim is a 240-by-3 timetable containing the average of the simulated responses at each time point.
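The averaging step can be sketched on plain arrays. This is an illustration with hypothetical random data, not the simulated paths above: averaging along the second dimension collapses the paths of one variable, and averaging along the third dimension does the same for a numobs-by-numSeries-by-numPaths array.

```matlab
rng(1)                                   % For reproducibility of the illustration
numobsDemo = 5;
numpathsDemo = 100;

% Hypothetical simulated paths of one variable: rows are periods, columns are paths
simPaths = randn(numobsDemo,numpathsDemo);
meanByTime = mean(simPaths,2);           % numobsDemo-by-1 average across paths

% Hypothetical 3-D array of paths: periods by series by paths
Y3 = randn(numobsDemo,2,numpathsDemo);
meanArr = mean(Y3,3);                    % numobsDemo-by-2 average across pages
```

The same pattern applies to other pathwise summaries, such as std(simPaths,0,2) for the spread at each time point.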
Plot the simulated responses, their averages, and the data.
figure
tiledlayout(2,2)
for j = 1:Mdl.NumSeries
    nexttile
    plot(Tbl.Time,Tbl{:,simrespnames(j)},Color=[0.8,0.8,0.8])
    title(Mdl.SeriesNames{j});
    hold on
    h1 = plot(Tbl.Time,Tbl{:,respnames(j)});
    h2 = plot(Tbl.Time,MeanSim{:,"Fun_"+simrespnames(j)});
    hold off
end
hl = legend([h1 h2],"Data","Mean");
hl.Position = [0.6 0.25 hl.Position(3:4)];
Return Timetable of Responses and Innovations from Conditional Simulation Perform a conditional simulation of the VAR model in “Return Timetable of Responses and Innovations from Unconditional Simulation” on page 12-2092, in which economists hypothesize that the unemployment rate is 6% for 15 quarters after the end of the sampling period (from Q2 of 2009 through Q4 of 2012).
Load and Preprocess Data
Load the Data_USEconModel data set. Compute the CPI growth rate. Because the growth rate calculation consumes the earliest observation, include the rate variable in the timetable by prepending the series with NaN.
load Data_USEconModel
DataTimeTable.RCPI = [NaN; price2ret(DataTimeTable.CPIAUCSL)];
Prepare Timetable for Estimation
Remove all missing values from the table, relative to the CPI rate (RCPI) and unemployment rate (UNRATE) series.
varnames = ["RCPI" "UNRATE"];
DTT = rmmissing(DataTimeTable,DataVariables=varnames);
Remedy the time irregularity by shifting all dates to the first day of the quarter.
dt = DTT.Time;
dt = dateshift(dt,"start","quarter");
DTT.Time = dt;
Create Model Template for Estimation
Create a default VAR(4) model using the shorthand syntax. Specify the response variable names.
p = 4;
Mdl = varm(2,p);
Mdl.SeriesNames = varnames;
Fit Model to Data
Estimate the model. Pass the entire timetable DTT. By default, estimate selects the response variables in Mdl.SeriesNames to fit to the model. Alternatively, you can use the ResponseVariables name-value argument.
EstMdl = estimate(Mdl,DTT);
Prepare for Conditional Simulation of Estimated Model
Suppose economists hypothesize that the unemployment rate will be at 6% for the next 15 quarters. Create a timetable with the following qualities:
1. The timestamps are regular with respect to the estimation sample timestamps and they are ordered from Q2 of 2009 through Q4 of 2012.
2. The variable RCPI (and, consequently, all other variables in DTT) is a 15-by-1 vector of NaN values.
3. The variable UNRATE is a 15-by-1 vector, where each element is 6.
numobs = 15;
shdt = DTT.Time(end) + calquarters(1:numobs);
DTTCondSim = retime(DTT,shdt,"fillwithmissing");
DTTCondSim.UNRATE = ones(numobs,1)*6;
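The same construction can be tried on synthetic data without loading the data set. The following is a minimal sketch that assumes a small placeholder timetable (the values and the five-quarter sample are invented for illustration); it uses only base MATLAB timetable functions and shows that retime with "fillwithmissing" produces NaN for every variable at the new timestamps before the conditioning variable is overwritten.

```matlab
% Synthetic quarterly "estimation sample" (values are placeholders).
Time = datetime(2008,1,1) + calquarters(0:4)';
DTT = timetable(Time,(1:5)',(5:-1:1)',VariableNames=["RCPI" "UNRATE"]);

% Extend the time base by numobs quarters beyond the last observation.
numobs = 3;
shdt = DTT.Time(end) + calquarters(1:numobs)';

% No new time matches an existing row, so every variable is filled with NaN.
DTTCondSim = retime(DTT,shdt,"fillwithmissing");

% Overwrite the conditioning variable with the hypothesized value.
DTTCondSim.UNRATE = 6*ones(numobs,1);
```

After this runs, RCPI in DTTCondSim is entirely NaN (to be simulated) and UNRATE is entirely 6 (known in advance), which is the pattern the conditional simulation expects.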
DTTCondSim is a 15-by-15 timetable that follows directly, in time, from DTT, and both timetables have the same variables. All variables in DTTCondSim contain NaN values, except for UNRATE, which is a vector composed of the value 6.
Perform Conditional Simulation of Estimated Model
Simulate the CPI growth rate given the hypothesis by supplying the conditioning data DTTCondSim and specifying the response variable names. Generate 1000 paths. Because the simulation horizon is beyond the estimation sample data, supply the estimation sample as a presample to initialize the model.
rng(1) % For reproducibility
Tbl = simulate(EstMdl,numobs,NumPaths=1000, ...
    InSample=DTTCondSim,ResponseVariables=EstMdl.SeriesNames, ...
    Presample=DTT,PresampleResponseVariables=EstMdl.SeriesNames);
size(Tbl)
ans = 1×2
    15    19
idx = endsWith(Tbl.Properties.VariableNames,["_Responses" "_Innovations"]);
head(Tbl(:,idx))

    Time     RCPI_Responses    UNRATE_Responses    RCPI_Innovations    UNRATE_Innovations
    _____    ______________    ________________    ________________    __________________
    Q2-09    1x1000 double     1x1000 double       1x1000 double       1x1000 double
    Q3-09    1x1000 double     1x1000 double       1x1000 double       1x1000 double
    Q4-09    1x1000 double     1x1000 double       1x1000 double       1x1000 double
    Q1-10    1x1000 double     1x1000 double       1x1000 double       1x1000 double
    Q2-10    1x1000 double     1x1000 double       1x1000 double       1x1000 double
    Q3-10    1x1000 double     1x1000 double       1x1000 double       1x1000 double
    Q4-10    1x1000 double     1x1000 double       1x1000 double       1x1000 double
    Q1-11    1x1000 double     1x1000 double       1x1000 double       1x1000 double
Tbl is a 15-by-19 timetable of simulated responses and innovations of RCPI given that UNRATE is 6% for the next 15 quarters. RCPI_Responses contains the simulated paths of the CPI growth rate and RCPI_Innovations contains the corresponding innovations series. UNRATE_Responses is a 15-by-1000 matrix composed of the value 6. All other variables in Tbl are the variables and their values in DTTCondSim.
Plot the simulated values of the CPI growth rate and their mean with the final few values of the estimation sample data.
MeanRCPISim = mean(Tbl.RCPI_Responses,2);
figure
h1 = plot(DTT.Time((end-30):end),DTT.RCPI((end-30):end));
hold on
h2 = plot(Tbl.Time,Tbl.RCPI_Responses,Color=[0.8 0.8 0.8]);
h3 = plot(Tbl.Time,MeanRCPISim,Color="k",LineWidth=2);
xline(Tbl.Time(1),"r--",LineWidth=2)
hold off
title(EstMdl.SeriesNames)
legend([h1 h2(1) h3],["Estimation data" "Simulated paths" "Simulation Mean"], ...
    Location="best")
Input Arguments
Mdl — VAR model
varm model object
VAR model, specified as a varm model object created by varm or estimate. Mdl must be fully specified.
numobs — Number of random observations to generate
positive integer
Number of random observations to generate per output path, specified as a positive integer. The output arguments Y and E, or Tbl, have numobs rows.
Data Types: double
Presample — Presample data
table | timetable
Presample data that provides initial values for the model Mdl, specified as a table or timetable with numprevars variables and numpreobs rows.
The following situations describe when to use Presample:
• Presample is required when simulate performs an unconditional simulation, which occurs under one of the following conditions:
  • You do not supply data in the simulation horizon (that is, you do not use the InSample name-value argument).
  • You specify only predictor data for the model regression component in the simulation horizon using the InSample and PredictorVariables name-value arguments, but you do not select any response variables from InSample.
• Presample is optional when simulate performs a conditional simulation, that is, when you supply response data in the simulation horizon, on which to condition the simulated responses, by using the InSample and ResponseVariables name-value arguments. By default, simulate sets any necessary presample observations.
  • For stationary VAR processes without regression components, simulate sets presample observations to the unconditional mean $\mu = \Phi^{-1}(L)c$.
  • For nonstationary processes or models that contain a regression component, simulate sets presample observations to zero.
Regardless of the situation, simulate returns the simulated variables in the output table or timetable Tbl, which is commensurate with Presample.
Each row is a presample observation, and measurements in each row, among all paths, occur simultaneously. numpreobs must be at least Mdl.P. If you supply more rows than necessary, simulate uses the latest Mdl.P observations only.
Each variable is a numpreobs-by-numprepaths numeric matrix. Variables are associated with response series in Mdl.SeriesNames. To control presample variable selection, see the optional PresampleResponseVariables name-value argument.
For each variable, columns are separate, independent paths.
• If variables are vectors, simulate applies them to each respective path to initialize the model for the simulation. Therefore, all respective response paths derive from common initial conditions.
• Otherwise, for each variable ResponseK and each path j, simulate applies Presample.ResponseK(:,j) to produce Tbl.ResponseK(:,j).
  Variables must have at least numpaths columns, and simulate uses only the first numpaths columns.
If Presample is a timetable, all the following conditions must be true:
• Presample must represent a sample with a regular datetime time step (see isregular).
• The inputs InSample and Presample must be consistent in time such that Presample immediately precedes InSample with respect to the sampling frequency and order.
• The datetime vector of sample timestamps Presample.Time must be ascending or descending.
If Presample is a table, the following conditions hold:
• The last row contains the latest presample observation.
• Presample.Properties.RowNames must be empty.
InSample — Future time series response or predictor data
table | timetable
Future time series response or predictor data, specified as a table or timetable. InSample contains numvars variables, including numseries response variables yt or numpreds predictor variables xt
for the model regression component. You can specify InSample only when other data inputs are tables or timetables.
Use InSample in the following situations:
• Perform conditional simulation. You must also supply the response variable names in InSample by using the ResponseVariables name-value argument.
• Supply future predictor data for either unconditional or conditional simulation. To supply predictor data, you must specify predictor variable names in InSample by using the PredictorVariables name-value argument. Otherwise, simulate ignores the model regression component.
simulate returns the simulated variables in the output table or timetable Tbl, which is commensurate with InSample.
Each row corresponds to an observation in the simulation horizon, the first row is the earliest observation, and measurements in each row, among all paths, occur simultaneously. InSample must have at least numobs rows to cover the simulation horizon. If you supply more rows than necessary, simulate uses only the first numobs rows.
Each response variable is a numeric matrix with numpaths columns. For each response variable K, columns are separate, independent paths. Specifically, path j of response variable ResponseK captures the state, or knowledge, of ResponseK as it evolves from the presample past (for example, Presample.ResponseK) into the future. For each selected response variable ResponseK:
• If InSample.ResponseK is a vector, simulate applies it to each of the numpaths output paths (see NumPaths).
• Otherwise, InSample.ResponseK must have at least numpaths columns. If you supply more columns than necessary, simulate uses only the first numpaths columns.
Each predictor variable is a numeric vector. All predictor variables are present in the regression component of each response equation and apply to all response paths.
If InSample is a timetable, the following conditions apply:
• InSample must represent a sample with a regular datetime time step (see isregular).
• The datetime vector InSample.Time must be strictly ascending or descending.
• Presample must immediately precede InSample, with respect to the sampling frequency.
If InSample is a table, the following conditions hold:
• The last row contains the latest observation.
• InSample.Properties.RowNames must be empty.
Elements of the response variables of InSample can be numeric scalars or missing values (indicated by NaN values). simulate treats numeric scalars as deterministic future responses that are known in advance, for example, set by policy. simulate simulates responses for corresponding NaN values conditional on the known values. Elements of selected predictor variables must be numeric scalars.
By default, simulate performs an unconditional simulation without a regression component in the model (each selected response variable is a numobs-by-numpaths matrix composed of NaN values, indicating a complete lack of knowledge of the future state of all simulated responses). Therefore, variables in Tbl result from a conventional, unconditional Monte Carlo simulation.
For more details, see “Algorithms” on page 12-2109.
Example: Consider simulating one path from a model composed of two response series, GDP and CPI, three periods into the future. Suppose that you have prior knowledge about some of the future values of the responses, and you want to simulate the unknown responses conditional on your knowledge. Specify InSample as a table containing the values that you know, and use NaN for values you do not know but want to simulate. For example, InSample=array2table([2 NaN; 0.1 NaN; NaN NaN],VariableNames=["GDP" "CPI"]) specifies that you have no knowledge of the future values of CPI, but you know that GDP is 2, 0.1, and unknown in periods 1, 2, and 3, respectively, in the simulation horizon.
ResponseVariables — Variables to select from InSample to treat as response variables yt
string vector | cell vector of character vectors | vector of integers | logical vector
Variables to select from InSample to treat as response variables yt, specified as one of the following data types:
• String vector or cell vector of character vectors containing numseries variable names in InSample.Properties.VariableNames
• A length numseries vector of unique indices (integers) of variables to select from InSample.Properties.VariableNames
• A length numvars logical vector, where ResponseVariables(j) = true selects variable j from InSample.Properties.VariableNames, and sum(ResponseVariables) is numseries
To perform conditional simulation, you must specify ResponseVariables to select the response variables in InSample for the conditioning data. ResponseVariables applies only when you specify InSample.
The selected variables must be numeric vectors (single path) or matrices (columns represent multiple independent paths) of the same width.
Example: ResponseVariables=["GDP" "CPI"]
Example: ResponseVariables=[true false true false] or ResponseVariables=[1 3] selects the first and third table variables as the response variables.
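The three selection styles described for ResponseVariables can be compared directly on a small table. The following is a minimal sketch using only base MATLAB table indexing; the four-variable layout and the names M1SL and TB3MS are assumptions chosen for illustration, and the code mimics the selection rather than calling simulate.

```matlab
% Hypothetical 3-period conditioning table with four variables.
% NaN marks a future value to simulate; numbers are known in advance.
InSample = array2table([2 1 NaN 0.5; 0.1 1 NaN 0.5; NaN 1 NaN 0.5], ...
    VariableNames=["GDP" "M1SL" "CPI" "TB3MS"]);

% Three equivalent ways to pick GDP and CPI as the response variables.
byName  = ["GDP" "CPI"];
byIndex = [1 3];
byMask  = [true false true false];

% Resolve the index and logical selections back to variable names.
namesFromIndex = string(InSample.Properties.VariableNames(byIndex));
namesFromMask  = string(InSample.Properties.VariableNames(byMask));

% Pattern of values that a conditional simulation would have to fill in.
toSimulate = isnan(InSample{:,byName});
disp(toSimulate)
```

Here toSimulate is true wherever the conditioning data holds NaN: all three periods of CPI and the third period of GDP.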
Data Types: double | logical | char | cell | string
Name-Value Pair Arguments
Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter.
Before R2021a, use commas to separate each name and value, and enclose Name in quotes.
Example: simulate(Mdl,100,NumPaths=1000,Y0=PS) returns a numeric array of 1000 100-period simulated response paths from Mdl and specifies the numeric array of presample response data PS.
NumPaths — Number of sample paths to generate
1 (default) | positive integer
Number of sample paths to generate, specified as a positive integer. The outputs Y and E have NumPaths pages, and each simulated response and innovation variable in the output Tbl is a numobs-by-NumPaths matrix.
Example: NumPaths=1000
Data Types: double
Y0 — Presample responses
numeric matrix | numeric array
Presample responses that provide initial values for the model Mdl, specified as a numpreobs-by-numseries numeric matrix or a numpreobs-by-numseries-by-numprepaths numeric array. Use Y0 only when you supply optional data inputs as numeric arrays.
numpreobs is the number of presample observations. numprepaths is the number of presample response paths.
Each row is a presample observation, and measurements in each row, among all pages, occur simultaneously. The last row contains the latest presample observation. Y0 must have at least Mdl.P rows. If you supply more rows than necessary, simulate uses the latest Mdl.P observations only.
Each column corresponds to the response series name in Mdl.SeriesNames.
Pages correspond to separate, independent paths.
• If Y0 is a matrix, simulate applies it to simulate each sample path (page). Therefore, all paths in the output argument Y derive from common initial conditions.
• Otherwise, simulate applies Y0(:,:,j) to initialize simulating path j. Y0 must have at least numpaths pages, and simulate uses only the first numpaths pages.
By default, simulate sets any necessary presample observations.
• For stationary VAR processes without regression components, simulate sets presample observations to the unconditional mean $\mu = \Phi^{-1}(L)c$.
• For nonstationary processes or models that contain a regression component, simulate sets presample observations to zero.
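The presample rules above can be illustrated with plain matrix operations. The following is a minimal sketch, not the toolbox implementation: the VAR(2) coefficients, the constant, and the sample values are invented, and the variable P stands in for Mdl.P. It computes the unconditional mean of a stationary VAR as (I − Φ1 − Φ2)⁻¹c, keeps only the latest P rows of a longer presample, and replicates a matrix presample across paths.

```matlab
% Illustrative VAR(2) in two series; all numbers are made up.
c    = [0.5; 1];                 % model constant
Phi1 = [0.5 0.1; 0.0 0.3];       % lag-1 AR coefficient matrix
Phi2 = [0.1 0.0; 0.1 0.1];       % lag-2 AR coefficient matrix
P    = 2;                        % plays the role of Mdl.P
numpaths = 4;

% Default presample for a stationary model without a regression component:
% every presample row equals the unconditional mean (I - Phi1 - Phi2)\c.
mu = (eye(2) - Phi1 - Phi2)\c;
Y0default = repmat(mu',P,1);

% User-supplied presample with more rows than needed: keep the latest P rows.
Y0 = [1 2; 3 4; 5 6; 7 8; 9 10]; % numpreobs = 5, numseries = 2
Y0used = Y0(end-P+1:end,:);

% A matrix Y0 initializes every path, which matches supplying identical pages.
Y0pages = repmat(Y0used,1,1,numpaths);
```

Supplying distinct pages instead of identical ones is what gives each simulated path its own initial conditions.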
Data Types: double
PresampleResponseVariables — Variables to select from Presample to use for presample response data
string vector | cell vector of character vectors | vector of integers | logical vector
Variables to select from Presample to use for presample data, specified as one of the following data types:
• String vector or cell vector of character vectors containing numseries variable names in Presample.Properties.VariableNames
• A length numseries vector of unique indices (integers) of variables to select from Presample.Properties.VariableNames
• A length numprevars logical vector, where PresampleResponseVariables(j) = true selects variable j from Presample.Properties.VariableNames, and sum(PresampleResponseVariables) is numseries
PresampleResponseVariables applies only when you specify Presample.
The selected variables must be numeric vectors and cannot contain missing values (NaN).
PresampleResponseVariables does not need to contain the same names as in Mdl.SeriesNames; simulate uses the data in selected variable PresampleResponseVariables(j) as a presample for Mdl.SeriesNames(j).
If the number of variables in Presample matches Mdl.NumSeries, the default specifies all variables in Presample. If the number of variables in Presample exceeds Mdl.NumSeries, the default matches variables in Presample to names in Mdl.SeriesNames.
Example: PresampleResponseVariables=["GDP" "CPI"]
Example: PresampleResponseVariables=[true false true false] or PresampleResponseVariables=[1 3] selects the first and third table variables for presample data.
Data Types: double | logical | char | cell | string
X — Predictor data
numeric matrix
Predictor data for the regression component in the model, specified as a numeric matrix containing numpreds columns. Use X only when you supply optional data inputs as numeric arrays.
numpreds is the number of predictor variables (size(Mdl.Beta,2)).
Each row corresponds to an observation, and measurements in each row occur simultaneously. The last row contains the latest observation. X must have at least numobs rows. If you supply more rows than necessary, simulate uses only the latest numobs observations. simulate does not use the regression component in the presample period.
Each column is an individual predictor variable. All predictor variables are present in the regression component of each response equation.
simulate applies X to each path (page); that is, X represents one path of observed predictors.
By default, simulate excludes the regression component, regardless of its presence in Mdl.
Data Types: double
PredictorVariables — Variables to select from InSample to treat as exogenous predictor variables xt
string vector | cell vector of character vectors | vector of integers | logical vector
Variables to select from InSample to treat as exogenous predictor variables xt, specified as one of the following data types:
• String vector or cell vector of character vectors containing numpreds variable names in InSample.Properties.VariableNames
• A length numpreds vector of unique indices (integers) of variables to select from InSample.Properties.VariableNames
• A length numvars logical vector, where PredictorVariables(j) = true selects variable j from InSample.Properties.VariableNames, and sum(PredictorVariables) is numpreds
Regardless, selected predictor variable j corresponds to the coefficients Mdl.Beta(:,j).
PredictorVariables applies only when you specify InSample.
The selected variables must be numeric vectors and cannot contain missing values (NaN).
By default, simulate excludes the regression component, regardless of its presence in Mdl.
Example: PredictorVariables=["M1SL" "TB3MS" "UNRATE"]
Example: PredictorVariables=[true false true false] or PredictorVariables=[1 3] selects the first and third table variables as the predictor variables.
Data Types: double | logical | char | cell | string
YF — Future multivariate response series
numeric matrix | numeric array
Future multivariate response series for conditional simulation, specified as a numeric matrix or array containing numseries columns. Use YF only when you supply optional data inputs as numeric arrays.
Each row corresponds to observations in the simulation horizon, and the first row is the earliest observation. Specifically, row j in sample path k (YF(j,:,k)) contains the responses j periods into the future. YF must have at least numobs rows to cover the simulation horizon. If you supply more rows than necessary, simulate uses only the first numobs rows.
Each column corresponds to the response variable name in Mdl.SeriesNames.
Each page corresponds to a separate sample path. Specifically, path k (YF(:,:,k)) captures the state, or knowledge, of the response series as they evolve from the presample past (Y0) into the future.
• If YF is a matrix, simulate applies YF to each of the numpaths output paths (see NumPaths).
• Otherwise, YF must have at least numpaths pages. If you supply more pages than necessary, simulate uses only the first numpaths pages.
Elements of YF can be numeric scalars or missing values (indicated by NaN values). simulate treats numeric scalars as deterministic future responses that are known in advance, for example, set by policy. simulate simulates responses for corresponding NaN values conditional on the known values.
By default, YF is an array composed of NaN values indicating a complete lack of knowledge of the future state of all simulated responses.
Therefore, simulate obtains the output responses Y from a conventional, unconditional Monte Carlo simulation.
For more details, see “Algorithms” on page 12-2109.
Example: Consider simulating one path from a model composed of four response series three periods into the future. Suppose that you have prior knowledge about some of the future values of the responses, and you want to simulate the unknown responses conditional on your knowledge. Specify YF as a matrix containing the values that you know, and use NaN for values you do not know but want to simulate. For example, YF=[NaN 2 5 NaN; NaN NaN 0.1 NaN; NaN NaN NaN NaN] specifies that you have no knowledge of the future values of the first and fourth response series; you know the value for period 1 in the second response series, but no other value; and you know the values for periods 1 and 2 in the third response series, but not the value for period 3.
Data Types: double
Note
• NaN values in Y0 and X indicate missing values. simulate removes missing values from the data by list-wise deletion. If Y0 is a 3-D array, then simulate performs these steps.
  1. Horizontally concatenate pages to form a numpreobs-by-numpaths*numseries matrix.
  2. Remove any row that contains at least one NaN from the concatenated data.
  In the case of missing observations, the results obtained from multiple paths of Y0 can differ from the results obtained from each path individually.
  For conditional simulation (see YF), if X contains any missing values in the latest numobs observations, then simulate throws an error.
• simulate issues an error when selected response variables from Presample and selected predictor variables from InSample contain any missing values.
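The two list-wise deletion steps for a 3-D Y0 can be sketched with base MATLAB array operations. This is an illustration of the described procedure under assumed sizes and placeholder values, not the toolbox source; it also shows why a single NaN in one path removes that observation from every path.

```matlab
% Hypothetical presample: 4 observations, 2 series, 3 paths (pages).
numpreobs = 4; numseries = 2; numpaths = 3;
Y0 = reshape(1:numpreobs*numseries*numpaths,numpreobs,numseries,numpaths);
Y0(2,1,3) = NaN;   % one missing value, in row 2 of path 3 only

% Step 1: concatenate pages horizontally into a
% numpreobs-by-(numpaths*numseries) matrix.
Y0wide = reshape(Y0,numpreobs,[]);

% Step 2: drop every row that contains at least one NaN.
keep = ~any(isnan(Y0wide),2);
Y0clean = Y0(keep,:,:);
```

Row 2 is complete in paths 1 and 2, yet it is removed from all three pages, which is why results from a multi-path Y0 can differ from those obtained path by path.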
Output Arguments
Y — Simulated multivariate response series
numeric matrix | numeric array
Simulated multivariate response series, returned as a numobs-by-numseries numeric matrix or a numobs-by-numseries-by-numpaths numeric array. simulate returns Y only when you supply optional data sets as numeric matrices or arrays, for example, you use the Y0 name-value argument. Y represents the continuation of the presample responses in Y0.
Each row is a time point in the simulation horizon. Values in a row, among all pages, occur simultaneously. The last row contains the latest simulated values.
Each column corresponds to the response series name in Mdl.SeriesNames.
Pages correspond to separate, independently simulated paths.
If you specify future responses for conditional simulation using the YF name-value argument, the known values in YF appear in the same positions in Y. However, Y contains simulated values for the missing observations in YF.
E — Simulated multivariate model innovations series
numeric matrix | numeric array
Simulated multivariate model innovations series, returned as a numobs-by-numseries numeric matrix or a numobs-by-numseries-by-numpaths numeric array. simulate returns E only when you supply optional data sets as numeric matrices or arrays, for example, you use the Y0 name-value argument.
Elements of E and Y correspond.
If you specify future responses for conditional simulation (see the YF name-value argument), simulate infers the innovations from the known values in YF and places the inferred innovations in the corresponding positions in E. For the missing observations in YF, simulate draws from the Gaussian distribution conditional on any known values, and places the draws in the corresponding positions in E.
Tbl — Simulated multivariate response, model innovations, and other variables
table | timetable
Simulated multivariate response, model innovations, and other variables, returned as a table or timetable, the same data type as Presample or InSample. simulate returns Tbl only when you supply the inputs Presample or InSample.
Tbl contains the following variables:
• The simulated paths within the simulation horizon of the selected response series yt. Each simulated response variable in Tbl is a numobs-by-numpaths numeric matrix, where numobs is the value of the numobs input and numpaths is the value of NumPaths. Each row corresponds to a time in the simulation horizon and each column corresponds to a separate path. simulate names the simulated response variable for ResponseK as ResponseK_Responses. For example, if Mdl.SeriesNames(K) is GDP, Tbl contains a variable for the corresponding simulated response with the name GDP_Responses. If you specify ResponseVariables, ResponseK is ResponseVariables(K). Otherwise, ResponseK is PresampleResponseVariables(K).
• The simulated paths within the simulation horizon of the innovations εt corresponding to yt. Each simulated innovations variable in Tbl is a numobs-by-numpaths numeric matrix. Each row corresponds to a time in the simulation horizon and each column corresponds to a separate path. simulate names the simulated innovations variable of response ResponseK as ResponseK_Innovations. For example, if Mdl.SeriesNames(K) is GDP, Tbl contains a variable for the corresponding innovations with the name GDP_Innovations.
If Tbl is a timetable, the following conditions hold:
• The row order of Tbl, either ascending or descending, matches the row order of InSample, when you specify it. If you do not specify InSample and you specify Presample, the row order of Tbl is the same as the row order of Presample.
• If you specify InSample, row times Tbl.Time are InSample.Time(1:numobs). Otherwise, Tbl.Time(1) is the next time after Presample.Time(end) relative to the sampling frequency, and Tbl.Time(2:numobs) are the following times relative to the sampling frequency.
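The naming and timestamp conventions for Tbl can be anticipated before running a simulation. The following is a minimal sketch using base MATLAB only; the quarterly presample timetable and its values are placeholders, and the code reconstructs the expected names and row times rather than calling simulate.

```matlab
% Hypothetical quarterly presample timetable with two response variables.
Time = datetime(2008,1,1) + calquarters(0:3)';
Presample = timetable(Time,[1;2;3;4],[5;6;7;8],VariableNames=["GDP" "CPI"]);
numobs = 3;

% Names that the output table would carry for the simulated variables.
seriesNames = string(Presample.Properties.VariableNames);
respNames  = seriesNames + "_Responses";
innovNames = seriesNames + "_Innovations";

% Without InSample, output times continue the presample at its frequency.
outTimes = Presample.Time(end) + calquarters(1:numobs)';
```

Names built this way are convenient for indexing the output, for example to select every simulated response variable at once.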
Algorithms
Suppose Y0 and YF are the presample and future response data specified by the numeric data inputs in Y0 and YF or the selected variables from the input tables or timetables Presample and InSample. Similarly, suppose E contains the simulated model innovations as returned in the numeric array E or the table or timetable Tbl.
• simulate performs conditional simulation using this process for all pages k = 1,...,numpaths and for each time t = 1,...,numobs.
  1. simulate infers (or inverse filters) the model innovations for all response variables (E(t,:,k)) from the known future responses (YF(t,:,k)). In E, simulate mimics the pattern of NaN values that appears in YF.
  2. For the missing elements of E at time t, simulate performs these steps.
     a. Draw Z1, the random, standard Gaussian distribution disturbances conditional on the known elements of E.
     b. Scale Z1 by the lower triangular Cholesky factor of the conditional covariance matrix. That is, Z2 = L*Z1, where L = chol(C,"lower") and C is the covariance of the conditional Gaussian distribution.
     c. Impute Z2 in place of the corresponding missing values in E.
  3. For the missing values in YF, simulate filters the corresponding random innovations through the model Mdl.
• simulate uses this process to determine the time origin t0 of models that include linear time trends.
  • If you do not specify Y0, then t0 = 0.
  • Otherwise, simulate sets t0 to size(Y0,1) – Mdl.P. Therefore, the times in the trend component are t = t0 + 1, t0 + 2,..., t0 + numobs. This convention is consistent with the default behavior of model estimation in which estimate removes the first Mdl.P responses, reducing the effective sample size. Although simulate explicitly uses the first Mdl.P presample responses in Y0 to initialize the model, the total number of observations in Y0 (excluding any missing values) determines t0.
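Steps 2a through 2c of the conditional simulation can be sketched for a single time point. The following is an illustration under stated assumptions, not the toolbox implementation: the covariance matrix Sigma and the known innovation are invented, and shifting the draw by the conditional mean follows the standard conditional Gaussian formulas, which the steps above do not spell out.

```matlab
rng(1)                               % for reproducibility
Sigma = [1.0 0.5 0.2;                % hypothetical innovations covariance
         0.5 2.0 0.3;
         0.2 0.3 1.5];
e = [NaN; 0.4; NaN];                 % innovation inferred for series 2 only

known   = ~isnan(e);
unknown = ~known;

% Moments of the unknown innovations given the known ones
% (standard conditional Gaussian formulas; the conditional-mean shift
% is an assumption of this sketch).
S_uk = Sigma(unknown,known);
S_kk = Sigma(known,known);
condMean = S_uk*(S_kk\e(known));
C = Sigma(unknown,unknown) - S_uk*(S_kk\S_uk');

% Steps a-c: draw standard Gaussian disturbances, scale them by the lower
% Cholesky factor of C, and impute the result into the missing slots.
Z1 = randn(nnz(unknown),1);
L  = chol(C,"lower");
Z2 = L*Z1;
e(unknown) = condMean + Z2;
```

The known element of e is left untouched, so the pattern of known and simulated values mirrors the pattern of numbers and NaN values in the conditioning data.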
Version History
Introduced in R2017a
References
[1] Hamilton, James D. Time Series Analysis. Princeton, NJ: Princeton University Press, 1994.
[2] Johansen, S. Likelihood-Based Inference in Cointegrated Vector Autoregressive Models. Oxford: Oxford University Press, 1995.
[3] Juselius, K. The Cointegrated VAR Model. Oxford: Oxford University Press, 2006.
[4] Lütkepohl, H. New Introduction to Multiple Time Series Analysis. Berlin: Springer, 2005.
See Also
Objects
varm
Functions
estimate | filter | infer | forecast
Topics
“VAR Model Forecasting, Simulation, and Analysis” on page 9-44
“Simulate Responses Using filter” on page 9-89
“Simulate Responses of Estimated VARX Model” on page 9-78
“Simulate VAR Model Conditional Responses” on page 9-85
“VAR Model Case Study” on page 9-91
simulate
Monte Carlo simulation of vector error-correction (VEC) model
Syntax
Y = simulate(Mdl,numobs)
Y = simulate(Mdl,numobs,Name=Value)
[Y,E] = simulate( ___ )
Tbl = simulate(Mdl,numobs,Presample=Presample)
Tbl = simulate(Mdl,numobs,Presample=Presample,Name=Value)
Tbl = simulate(Mdl,numobs,InSample=InSample,ResponseVariables=ResponseVariables)
Tbl = simulate(Mdl,numobs,InSample=InSample,ResponseVariables=ResponseVariables,Presample=Presample)
Tbl = simulate( ___ ,Name=Value)
Description
Conditional and Unconditional Simulation for Numeric Arrays
Y = simulate(Mdl,numobs) returns the numeric array Y containing a random numobs-period path of multivariate response series from performing an unconditional simulation of the fully specified VEC(p – 1) model Mdl.
Y = simulate(Mdl,numobs,Name=Value) uses additional options specified by one or more name-value arguments. simulate returns numeric arrays when all optional input data are numeric arrays. For example, simulate(Mdl,100,NumPaths=1000,Y0=PS) returns a numeric array of 1000 100-period simulated response paths from Mdl and specifies the numeric array of presample response data PS.
To perform a conditional simulation, specify response data in the simulation horizon by using the YF name-value argument.
[Y,E] = simulate( ___ ) also returns the numeric array containing the simulated multivariate model innovations series E corresponding to the simulated responses Y, using any input argument combination in the previous syntaxes.
Unconditional Simulation for Tables and Timetables
Tbl = simulate(Mdl,numobs,Presample=Presample) returns the table or timetable Tbl containing the random multivariate response and innovations variables, which results from the unconditional simulation of the response series in the model Mdl. simulate uses the table or timetable of presample data Presample to initialize the response series.
simulate selects the variables in Mdl.SeriesNames to simulate or all variables in Presample. To select different response variables in Presample to simulate, use the PresampleResponseVariables name-value argument.
Tbl = simulate(Mdl,numobs,Presample=Presample,Name=Value) uses additional options specified by one or more name-value arguments. For example,
simulate(Mdl,100,Presample=PSTbl,PresampleResponseVariables=["GDP" "CPI"]) returns a timetable of variables containing 100-period simulated response and innovations series from Mdl, initialized by the data in the GDP and CPI variables of the timetable of presample data in PSTbl.
Conditional Simulation for Tables and Timetables
Tbl = simulate(Mdl,numobs,InSample=InSample,ResponseVariables=ResponseVariables) returns the table or timetable Tbl containing the random multivariate response and innovations variables, which results from the conditional simulation of the response series in the model Mdl. InSample is a table or timetable of response or predictor data in the simulation horizon that simulate uses to perform the conditional simulation, and ResponseVariables specifies the response variables in InSample.
Tbl = simulate(Mdl,numobs,InSample=InSample,ResponseVariables=ResponseVariables,Presample=Presample) uses the presample data in the table or timetable Presample to initialize the model.
Tbl = simulate( ___ ,Name=Value) uses additional options specified by one or more name-value arguments, using any input argument combination in the previous two syntaxes.
Examples
Return Response Series in Matrix from Unconditional Simulation
Consider a VEC model for the following seven macroeconomic series, fit the model to the data, and then perform unconditional simulation by generating a random path of the response variables from the estimated model.
• Gross domestic product (GDP)
• GDP implicit price deflator
• Paid compensation of employees
• Nonfarm business sector hours of all persons
• Effective federal funds rate
• Personal consumption expenditures
• Gross private domestic investment
Suppose that a cointegrating rank of 4 and one short-run term are appropriate, that is, consider a VEC(1) model.
Load the Data_USEconVECModel data set.
load Data_USEconVECModel
For more information on the data set and variables, enter Description at the command line. Determine whether the data needs to be preprocessed by plotting the series on separate plots. figure tiledlayout(2,2) nexttile plot(FRED.Time,FRED.GDP)
title("Gross Domestic Product") ylabel("Index") xlabel("Date") nexttile plot(FRED.Time,FRED.GDPDEF) title("GDP Deflator") ylabel("Index") xlabel("Date") nexttile plot(FRED.Time,FRED.COE) title("Paid Compensation of Employees") ylabel("Billions of $") xlabel("Date") nexttile plot(FRED.Time,FRED.HOANBS) title("Nonfarm Business Sector Hours") ylabel("Index") xlabel("Date")
figure tiledlayout(2,2) nexttile plot(FRED.Time,FRED.FEDFUNDS) title("Federal Funds Rate") ylabel("Percent") xlabel("Date")
nexttile plot(FRED.Time,FRED.PCEC) title("Consumption Expenditures") ylabel("Billions of $") xlabel("Date") nexttile plot(FRED.Time,FRED.GPDI) title("Gross Private Domestic Investment") ylabel("Billions of $") xlabel("Date")
Stabilize all series, except the federal funds rate, by applying the log transform. Scale the resulting series by 100 so that all series are on the same scale. FRED.GDP = 100*log(FRED.GDP); FRED.GDPDEF = 100*log(FRED.GDPDEF); FRED.COE = 100*log(FRED.COE); FRED.HOANBS = 100*log(FRED.HOANBS); FRED.PCEC = 100*log(FRED.PCEC); FRED.GPDI = 100*log(FRED.GPDI);
Create a VEC(1) model using the shorthand syntax. Specify the variable names.
Mdl = vecm(7,4,1);
Mdl.SeriesNames = FRED.Properties.VariableNames
Mdl = vecm with properties:
Description: "7-Dimensional Rank = 4 VEC(1) Model with Linear Time Trend"
SeriesNames: "GDP" "GDPDEF" "COE" ... and 4 more
NumSeries: 7
Rank: 4
P: 2
Constant: [7×1 vector of NaNs]
Adjustment: [7×4 matrix of NaNs]
Cointegration: [7×4 matrix of NaNs]
Impact: [7×7 matrix of NaNs]
CointegrationConstant: [4×1 vector of NaNs]
CointegrationTrend: [4×1 vector of NaNs]
ShortRun: {7×7 matrix of NaNs} at lag [1]
Trend: [7×1 vector of NaNs]
Beta: [7×0 matrix]
Covariance: [7×7 matrix of NaNs]
Mdl is a vecm model object. All properties containing NaN values correspond to parameters to be estimated given data. Estimate the model using the entire data set and the default options.
EstMdl = estimate(Mdl,FRED.Variables)
EstMdl = vecm with properties:
Description: "7-Dimensional Rank = 4 VEC(1) Model"
SeriesNames: "GDP" "GDPDEF" "COE" ... and 4 more
NumSeries: 7
Rank: 4
P: 2
Constant: [14.1329 8.77841 -7.20359 ... and 4 more]'
Adjustment: [7×4 matrix]
Cointegration: [7×4 matrix]
Impact: [7×7 matrix]
CointegrationConstant: [-28.6082 109.555 -77.0912 ... and 1 more]'
CointegrationTrend: [4×1 vector of zeros]
ShortRun: {7×7 matrix} at lag [1]
Trend: [7×1 vector of zeros]
Beta: [7×0 matrix]
Covariance: [7×7 matrix]
EstMdl is an estimated vecm model object. It is fully specified because all parameters have known values. By default, estimate imposes the constraints of the H1 Johansen VEC model form by removing the cointegrating trend and linear trend terms from the model. Parameter exclusion from estimation is equivalent to imposing equality constraints to zero. Simulate a response series path from the estimated model with length equal to the path in the data. rng(1); % For reproducibility numobs = size(FRED,1); Y = simulate(EstMdl,numobs);
Y is a 240-by-7 matrix of simulated responses. Columns correspond to the variable names in EstMdl.SeriesNames.
Simulate Responses Using filter Illustrate the relationship between simulate and filter by estimating a 4-D VEC(1) model of the four response series in Johansen's Danish data set. Simulate a single path of responses using the fitted model and the historical data as initial values, and then filter a random set of Gaussian disturbances through the estimated model using the same presample responses. Load Johansen's Danish economic data. load Data_JDanish
For details on the variables, enter Description. Create a default 4-D VEC(1) model. Assume that a cointegrating rank of 1 is appropriate.
Mdl = vecm(4,1,1);
Mdl.SeriesNames = DataTable.Properties.VariableNames
Mdl = vecm with properties:
Description: "4-Dimensional Rank = 1 VEC(1) Model with Linear Time Trend"
SeriesNames: "M2" "Y" "IB" ... and 1 more
NumSeries: 4
Rank: 1
P: 2
Constant: [4×1 vector of NaNs]
Adjustment: [4×1 matrix of NaNs]
Cointegration: [4×1 matrix of NaNs]
Impact: [4×4 matrix of NaNs]
CointegrationConstant: NaN
CointegrationTrend: NaN
ShortRun: {4×4 matrix of NaNs} at lag [1]
Trend: [4×1 vector of NaNs]
Beta: [4×0 matrix]
Covariance: [4×4 matrix of NaNs]
Estimate the VEC(1) model using the entire data set. Specify the H1* Johansen model form. EstMdl = estimate(Mdl,Data,Model="H1*");
When reproducing the results of simulate and filter, it is important to take these actions. • Set the same random number seed using rng. • Specify the same presample response data using the Y0 name-value argument. Simulate 100 observations by passing the estimated model to simulate. Specify the entire data set as the presample. rng("default") YSim = simulate(EstMdl,100,Y0=Data);
YSim is a 100-by-4 matrix of simulated responses. Columns correspond to the variables in EstMdl.SeriesNames.
Set the default random seed. Simulate 4 series of 100 observations from the standard Gaussian distribution. rng("default") Z = randn(100,4);
Filter the Gaussian values through the estimated model. Specify the entire data set as the presample. YFilter = filter(EstMdl,Z,Y0=Data);
YFilter is a 100-by-4 matrix of simulated responses. Columns correspond to the variables in EstMdl.SeriesNames. Before filtering the disturbances, filter scales Z by the lower triangular Cholesky factor of the model covariance in EstMdl.Covariance. Compare the resulting responses between filter and simulate.
(YSim - YFilter)'*(YSim - YFilter)
ans = 4×4
     0     0     0     0
     0     0     0     0
     0     0     0     0
     0     0     0     0
The results are identical.
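The scaling step that filter applies to Z can be sketched with base MATLAB functions. This is a minimal illustration, not the toolbox's internal code: Sigma is an illustrative covariance matrix standing in for EstMdl.Covariance, and the variable names are hypothetical.

```matlab
% Illustrative covariance matrix (assumption; stands in for EstMdl.Covariance)
Sigma = [1.3 0.4 1.6; 0.4 0.6 0.7; 1.6 0.7 5];

% Lower triangular Cholesky factor, so that Sigma = L*L'
L = chol(Sigma,'lower');

% Standard Gaussian disturbances, one row per time point
Z = randn(100,3);

% Scaled disturbances: row t of E is (L*z_t)', which has covariance Sigma
E = Z*L';

% The factor reproduces Sigma up to rounding error
disp(norm(L*L' - Sigma))
```

Because simulate draws its own standard Gaussian values and scales them in the same way, seeding the random number generator identically before both calls is what makes the two outputs match.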
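Simulate Arrays of Multiple Response and Innovations Paths
Consider this VEC(1) model for three hypothetical response series.
$$\Delta y_t = c + AB' y_{t-1} + \Phi_1 \Delta y_{t-1} + \varepsilon_t = \begin{bmatrix} -1 \\ -3 \\ -30 \end{bmatrix} + \begin{bmatrix} -0.3 & 0.3 \\ -0.2 & 0.1 \\ -1 & 0 \end{bmatrix} \begin{bmatrix} 0.1 & -0.2 & 0.2 \\ -0.7 & 0.5 & 0.2 \end{bmatrix} y_{t-1} + \begin{bmatrix} 0 & 0.1 & 0.2 \\ 0.2 & -0.2 & 0 \\ 0.7 & -0.2 & 0.3 \end{bmatrix} \Delta y_{t-1} + \varepsilon_t.$$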
The innovations are multivariate Gaussian with a mean of 0 and the covariance matrix
$$\Sigma = \begin{bmatrix} 1.3 & 0.4 & 1.6 \\ 0.4 & 0.6 & 0.7 \\ 1.6 & 0.7 & 5 \end{bmatrix}.$$
Create variables for the parameter values.
A = [-0.3 0.3; -0.2 0.1; -1 0];                  % Adjustment
B = [0.1 -0.7; -0.2 0.5; 0.2 0.2];               % Cointegration
Phi = {[0 0.1 0.2; 0.2 -0.2 0; 0.7 -0.2 0.3]};   % ShortRun
c = [-1; -3; -30];                               % Constant
tau = [0; 0; 0];                                 % Trend
Sigma = [1.3 0.4 1.6; 0.4 0.6 0.7; 1.6 0.7 5];   % Covariance
Create a vecm model object representing the VEC(1) model using the appropriate name-value pair arguments. Mdl = vecm(Adjustment=A,Cointegration=B, ... Constant=c,ShortRun=Phi,Trend=tau,Covariance=Sigma);
Mdl is effectively a fully specified vecm model object. That is, the cointegration constant and linear trend are unknown, but are not needed for simulating observations or forecasting given that the overall constant and trend parameters are known. Simulate 1000 paths of 100 observations. Return the innovations (scaled disturbances). rng(1); % For reproducibility numpaths = 1000; numobs = 100; [Y,E] = simulate(Mdl,numobs,NumPaths=numpaths);
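The VEC(1) recursion behind a single simulated path can be written out with base MATLAB functions. This is a sketch under stated assumptions rather than the simulate implementation: it starts from a zero presample (an assumption of this sketch), generates one path only, and uses hypothetical variable names such as Impact, yPrev, and dyPrev.

```matlab
% Parameters of the hypothetical VEC(1) model from the text
A = [-0.3 0.3; -0.2 0.1; -1 0];                 % Adjustment
B = [0.1 -0.7; -0.2 0.5; 0.2 0.2];              % Cointegration
Phi1 = [0 0.1 0.2; 0.2 -0.2 0; 0.7 -0.2 0.3];   % ShortRun at lag 1
c = [-1; -3; -30];                              % Constant
Sigma = [1.3 0.4 1.6; 0.4 0.6 0.7; 1.6 0.7 5];  % Covariance

Impact = A*B';             % 3-by-3 impact matrix with rank 2
L = chol(Sigma,'lower');   % scales standard Gaussian draws to covariance Sigma
numobs = 100;

% Assumption of this sketch: presample levels and differences are zero
yPrev = zeros(3,1);
dyPrev = zeros(3,1);
Y = zeros(numobs,3);

for t = 1:numobs
    e = L*randn(3,1);                           % innovation at time t
    dy = c + Impact*yPrev + Phi1*dyPrev + e;    % VEC(1) equation for the difference
    y = yPrev + dy;                             % accumulate the difference into the level
    Y(t,:) = y';
    yPrev = y;
    dyPrev = dy;
end

disp(rank(Impact))
disp(size(Y))
```

Repeating the loop with independent draws gives additional paths, which is the role of the NumPaths argument in the call above.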
Y is a 100-by-3-by-1000 matrix of simulated responses. E is a matrix whose dimensions correspond to the dimensions of Y, but represents the simulated, scaled disturbances. Columns correspond to the response variable names Mdl.SeriesNames. For each time point, compute the mean vector of the simulated responses among all paths. MeanSim = mean(Y,3);
MeanSim is a 100-by-3 matrix containing the average of the simulated responses at each time point. Plot the simulated responses and their averages, and plot the simulated innovations.
figure
tiledlayout(2,2)
for j = 1:numel(Mdl.SeriesNames)
    nexttile
    h1 = plot(squeeze(Y(:,j,:)),Color=[0.8 0.8 0.8]);
    hold on
    h2 = plot(MeanSim(:,j),Color="k",LineWidth=2);
    hold off
    title(Mdl.SeriesNames{j});
    legend([h1(1) h2],["Simulated" "Mean"])
end
figure tiledlayout(2,2) for j = 1:numel(Mdl.SeriesNames) nexttile h1 = plot(squeeze(E(:,j,:)),Color=[0.8 0.8 0.8]); hold on yline(0,"r--") title("Innovations: " + Mdl.SeriesNames{j}) end
Return Timetable of Responses and Innovations from Unconditional Simulation Consider a VEC model for the following seven macroeconomic series, and then fit the model to a timetable of response data. This example is based on “Return Response Series in Matrix from Unconditional Simulation” on page 12-2112. Load and Preprocess Data Load the Data_USEconVECModel data set. load Data_USEconVECModel DTT = FRED; DTT.GDP = 100*log(DTT.GDP); DTT.GDPDEF = 100*log(DTT.GDPDEF); DTT.COE = 100*log(DTT.COE); DTT.HOANBS = 100*log(DTT.HOANBS); DTT.PCEC = 100*log(DTT.PCEC); DTT.GPDI = 100*log(DTT.GPDI);
Prepare Timetable for Estimation
When you plan to supply a timetable directly to estimate, you must ensure it has all the following characteristics:
• All selected response variables are numeric and do not contain any missing values. • The timestamps in the Time variable are regular, and they are ascending or descending. Remove all missing values from the table. DTT = rmmissing(DTT); T = height(DTT) T = 240
DTT does not contain any missing values. Determine whether the sampling timestamps have a regular frequency and are sorted. areTimestampsRegular = isregular(DTT,"quarters") areTimestampsRegular = logical 0 areTimestampsSorted = issorted(DTT.Time) areTimestampsSorted = logical 1
areTimestampsRegular = 0 indicates that the timestamps of DTT are irregular. areTimestampsSorted = 1 indicates that the timestamps are sorted. Macroeconomic series in this example are timestamped at the end of the month. This quality induces an irregularly measured series. Remedy the time irregularity by shifting all dates to the first day of the quarter. dt = DTT.Time; dt = dateshift(dt,"start","quarter"); DTT.Time = dt;
DTT is regular with respect to time. Create Model Template for Estimation Create a VEC(1) model using the shorthand syntax. Specify the variable names. Mdl = vecm(7,4,1); Mdl.SeriesNames = DTT.Properties.VariableNames;
Mdl is a vecm model object. All properties containing NaN values correspond to parameters to be estimated given data. Fit Model to Data Estimate the model by supplying the timetable of data DTT. By default, because the number of variables in Mdl.SeriesNames is the number of variables in DTT, estimate fits the model to all the variables in DTT. EstMdl = estimate(Mdl,DTT); p = EstMdl.P p = 2
EstMdl is an estimated vecm model object.
Perform Unconditional Simulation of Estimated Model
Simulate a response and innovations path from the estimated model and return the simulated series as variables in a timetable. simulate requires information for the output timetable, such as variable names, sampling times for the simulation horizon, and sampling frequency. Therefore, supply a presample of the earliest p = 2 observations of the data DTT, from which simulate infers the required timetable information. Specify a simulation horizon of T – p observations.
rng(1) % For reproducibility
PSTbl = DTT(1:p,:);
T = T - p;
Tbl = simulate(EstMdl,T,Presample=PSTbl);
size(Tbl)
ans = 1×2
   238    14
PSTbl
PSTbl=2×7 timetable
Time           GDP       GDPDEF    COE       HOANBS    FEDFUNDS    PCEC      GPDI
___________    ______    ______    ______    ______    ________    ______    ______
01-Jan-1957    615.4     280.25    556.3     400.29    2.96        564.3     435.29
01-Apr-1957    615.87    280.95    557.03    400.07    3           565.11    435.54
head(Tbl)
Time           GDP_Responses    GDPDEF_Responses    COE_Responses    HOANBS_Responses
___________    _____________    ________________    _____________    ________________
01-Jul-1957    616.84           281.66              557.71           400.25
01-Oct-1957    619.27           282.31              559.72           400.14
01-Jan-1958    620.08           282.64              561.48           400.26
01-Apr-1958    620.73           282.94              562.02           400
01-Jul-1958    621.25           283.36              562.07           400.21
01-Oct-1958    621.9            284.06              562.91           399.89
01-Jan-1959    622.57           284.44              564.48           399.68
01-Apr-1959    624.12           285                 566.1            399.1
Tbl is a 238-by-14 timetable of simulated responses (denoted responseVariable_Responses) and corresponding innovations (denoted responseVariable_Innovations). The timestamps of Tbl follow directly from the timestamps of PSTbl, and they have the same sampling frequency.
Simulate Responses From Model Containing Regression Component Consider the model and data in “Return Response Series in Matrix from Unconditional Simulation” on page 12-2112. Load the Data_USEconVECModel data set. load Data_USEconVECModel
The Data_Recessions data set contains the beginning and ending serial dates of recessions. Load this data set. Convert the matrix of date serial numbers to a datetime array. load Data_Recessions dtrec = datetime(Recessions,ConvertFrom="datenum");
Remove the exponential trend from the series, and then scale them by a factor of 100. DTT = FRED; DTT.GDP = 100*log(DTT.GDP); DTT.GDPDEF = 100*log(DTT.GDPDEF); DTT.COE = 100*log(DTT.COE); DTT.HOANBS = 100*log(DTT.HOANBS); DTT.PCEC = 100*log(DTT.PCEC); DTT.GPDI = 100*log(DTT.GPDI);
Create a dummy variable that identifies periods in which the U.S. was in a recession or worse. Specifically, the variable should be 1 if FRED.Time occurs during a recession, and 0 otherwise. Include the variable with the FRED data.
isin = @(x)(any(dtrec(:,1) [...]
[...] = 1) (abs(theta(2)) >= 1) ...
    (theta(3) < 0) (theta(4) < 0) (theta(5) < 0)];
if(sum(paramconstraints))
    logprior = -Inf;
else
    logprior = 0;
end
end
Input Arguments
MdlSSM — Standard, linear state-space model
ssm object
Standard, linear state-space model, specified as an ssm object returned by ssm.
Note The ssm2bssm converter is best suited for converting explicitly created, simple state-space models. For moderate through complex models, particularly implicitly created models where a
parameter-to-matrix mapping function specifies the state-space, create a Bayesian model directly by using the bssm function. ParamDistribution — Log of joint probability density function of the state-space model parameters Π(θ) @(x)0 (default) | function handle Log of joint probability density function of the state-space model parameters Π(θ), specified as a function handle in the form @fcnName, where fcnName is the function name. ParamDistribution sets the ParamDistribution property of MdlBSSM. Suppose logPrior is the name of the MATLAB function defining the joint prior distribution of θ. Then, logPrior must have this form. function logpdf = logPrior(theta,...otherInputs...) ... end
where: • theta is a numParams-by-1 numeric vector of the linear state-space model parameters θ. Elements of theta must correspond to the unknown parameters of MdlSSM (see “Tips” on page 12-2201). The function can accept other inputs in subsequent positions. • logpdf is a numeric scalar representing the log of the joint probability density of θ at the input theta. If ParamDistribution requires the input parameter vector argument only, you can create the bssm object by calling: MdlBSSM = ssm2bssm(MdlSSM,@logPrior)
In general, create the bssm object by calling: MdlBSSM = ssm2bssm(MdlSSM,@(theta)logPrior(theta,...otherInputArgs...))
The default @(x)0 indicates a joint prior density that is proportional to 1 everywhere. Tip • The default joint prior is not necessarily a proper density. Consider specifying a proper prior instead. • Because out-of-bounds prior density evaluation is 0, set the log prior density of out-of-bounds parameter arguments to -Inf.
Data Types: function_handle
Output Arguments
MdlBSSM — Bayesian state-space model
bssm object
Bayesian state-space model, returned as a bssm object. MdlBSSM completely specifies the state-space model structure (likelihood) and joint prior distribution.
Tips • To determine the order of the parameters for the first input argument theta of the log joint prior density function ParamDistribution, display the standard state-space model MdlSSM at the command line. MATLAB labels the parameters cj under the State equations and Observation equations headings, where j is the index of the parameter in the vector theta. For example, consider the following display of the standard state-space model MdlSSM. MdlSSM = State-space model type: ssm [ ... ] State equations: x1(t) = (c1)x1(t-1) + (c2)x2(t-1) + (c3)u1(t) x2(t) = x1(t-1) Observation equation: y1(t) = x1(t) + (c4)e1(t) [...]
In this case, theta is a 4-by-1 vector, where: • theta(1) is c1, the lag 1 AR coefficient of state variable x1,t. • theta(2) is c2, the lag 2 AR coefficient of state variable x1,t. • theta(3) is c3, the standard deviation of state disturbance u1,t. • theta(4) is c4, the standard deviation of observation innovation ε1,t.
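A log prior for the four-parameter model shown in the display above can be sketched as an anonymous function. This is a minimal illustration under stated assumptions: the bounds (positive standard deviations only) and the names inBounds and logPrior are hypothetical, and the prior is flat inside the bounds.

```matlab
% Hypothetical bounds for the four-parameter model in the display:
% theta(3) and theta(4) are standard deviations, so require them to be positive.
inBounds = @(theta) all(theta(3:4) > 0);

% Flat prior inside the bounds: log(1) = 0. Outside the bounds: log(0) = -Inf.
logPrior = @(theta) log(double(inBounds(theta)));

disp(logPrior([0.5; 0.2; 1; 1]))    % in bounds
disp(logPrior([0.5; 0.2; -1; 1]))   % negative standard deviation

% With a standard model MdlSSM in the workspace, the conversion call would be:
% MdlBSSM = ssm2bssm(MdlSSM,logPrior);
```

Returning -Inf outside the bounds keeps the density at 0 there. A flat prior over an unbounded region is not a proper density, so a proper prior may still be preferable in practice.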
Version History Introduced in R2022a
See Also ssm | bssm
subchain Extract Markov subchain
Syntax sc = subchain(mc,states)
Description sc = subchain(mc,states) returns the subchain sc extracted from the discrete-time Markov chain mc. The subchain contains the states states and all states that are reachable from states.
Examples
Extract Recurrent Subchain from Markov Chain
Consider this theoretical, right-stochastic transition matrix of a stochastic process.
$$P = \begin{bmatrix} 0 & 1 & 0 & 0 \\ 0.5 & 0 & 0.5 & 0 \\ 0 & 0 & 0.5 & 0.5 \\ 0 & 0 & 0.5 & 0.5 \end{bmatrix}.$$
Create the Markov chain that is characterized by the transition matrix P. P = [0 1 0 0; 0.5 0 0.5 0; 0 0 0.5 0.5; 0 0 0.5 0.5]; mc = dtmc(P);
Plot a directed graph of the Markov chain. Visually identify the communicating class to which each state belongs by using node colors. figure; graphplot(mc,'ColorNodes',true);
Determine the stationary distribution of the Markov chain.
x = asymptotics(mc)
x = 1×4
    0.0000    0.0000    0.5000    0.5000
The Markov chain eventually gets absorbed into states 3 and 4, and subsequent transitions are stochastic. Extract the recurrent subchain of the Markov chain by passing mc to subchain and specifying one of the states in the recurrent, aperiodic communicating class. sc = subchain(mc,3);
sc is a dtmc object. Plot a directed graph of the subchain. figure; graphplot(sc,'ColorNodes',true)
Extract Unichain from Markov Chain
Consider this theoretical, right-stochastic transition matrix of a stochastic process.
$$P = \begin{bmatrix} 0.5 & 0.5 & 0 & 0 \\ 0 & 0.5 & 0.5 & 0 \\ 0 & 0 & 0.5 & 0.5 \\ 0 & 0 & 0.5 & 0.5 \end{bmatrix}.$$
Create the Markov chain that is characterized by the transition matrix P. Name the states Regime 1 through Regime 4. P = [0.5 0.5 0 0; 0 0.5 0.5 0; 0 0 0.5 0.5; 0 0 0.5 0.5]; mc = dtmc(P,'StateNames',["Regime 1" "Regime 2" "Regime 3" "Regime 4"]);
Plot a digraph of the chain. Visually identify the communicating class to which each state belongs by using node colors. figure; graphplot(mc,'ColorNodes',true);
Regimes 1 and 2 are each in their own communicating class because Regime 2 does not transition to Regime 1. Extract the subchain containing Regime 2, a transient state. Display the transition matrix of the subchain.
sc = subchain(mc,"Regime 2");
sc.P
ans = 3×3
    0.5000    0.5000         0
         0    0.5000    0.5000
         0    0.5000    0.5000
Regime 1 is not in the subchain. Plot a digraph of the subchain. figure; graphplot(sc,'ColorNodes',true);
The plot shows a unichain: a Markov chain containing one recurrent communicating class and the selected transient class.
Input Arguments
mc — Discrete-time Markov chain
dtmc object
Discrete-time Markov chain with NumStates states and transition matrix P, specified as a dtmc object. P must be fully specified (no NaN entries).
states — States to include in subchain
numeric vector of positive integers | string vector | cell vector of character vectors
States to include in the subchain, specified as a numeric vector of positive integers, string vector, or cell vector of character vectors.
• For a numeric vector, elements of states correspond to rows of the transition matrix mc.P.
• For a string vector or cell vector of character vectors, elements of states must be state names in mc.StateNames.
Example: ["Regime 1" "Regime 2"]
Data Types: double | string | cell
Output Arguments sc — Discrete-time Markov chain dtmc object Discrete-time Markov chain, returned as a dtmc object. sc is a subchain of mc containing the states states and all states reachable from states. The state names of the subchain sc.StateNames are inherited from mc.
Algorithms • State j is reachable from state i if there is a nonzero probability of moving from i to j in a finite number of steps. subchain determines reachability by forming the transitive closure of the associated digraph, then enumerating one-step transitions. • Subchains are closed under reachability to ensure that the transition matrix of sc remains stochastic (that is, rows sum to 1), with transition probabilities identical to the transition probabilities in mc.P. • If you specify a state in a recurrent communicating class, then subchain extracts the entire communicating class. If you specify a state in a transient communicating class, then subchain extracts the transient class and all classes reachable from the transient class. To extract a unichain, specify a state in each component transient class. See classify.
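The reachability rule described above can be sketched with base matrix operations. This is a minimal illustration, not the subchain implementation: it uses the transition matrix from the unichain example, and the variable names (Adj, Reach, keep, Psub) are hypothetical.

```matlab
% Transition matrix from the unichain example
P = [0.5 0.5 0 0; 0 0.5 0.5 0; 0 0 0.5 0.5; 0 0 0.5 0.5];
n = size(P,1);

% Adjacency matrix of the associated digraph: edge i -> j when P(i,j) > 0
Adj = double(P > 0);

% Reach(i,j) is true when j equals i or j is reachable from i.
% Any reachable state is reachable in at most n-1 steps.
Reach = (eye(n) + Adj)^(n-1) > 0;

% States to keep when starting from state 2 (Regime 2 in the example)
startStates = 2;
keep = find(any(Reach(startStates,:),1));

% The submatrix stays right-stochastic because keep is closed under reachability
Psub = P(keep,keep);
disp(keep)
disp(Psub)
disp(sum(Psub,2)')
```

The retained states are 2, 3, and 4, and the rows of Psub still sum to 1, which agrees with the transition matrix that the unichain example displays for sc.P.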
Version History Introduced in R2017b
References [1] Gallager, R.G. Stochastic Processes: Theory for Applications. Cambridge, UK: Cambridge University Press, 2013. [2] Horn, R., and C. R. Johnson. Matrix Analysis. Cambridge, UK: Cambridge University Press, 1985.
See Also Objects digraph Functions classify Topics “Markov Chain Modeling” on page 10-8 “Create and Modify Markov Chain Model Objects” on page 10-17 “Visualize Markov Chain Structure and Evolution” on page 10-27 “Identify Classes in Markov Chain” on page 10-47
summarize Display ARIMA model estimation results
Syntax summarize(Mdl) results = summarize(Mdl)
Description summarize(Mdl) displays a summary of the ARIMA model Mdl. • If Mdl is an estimated model returned by estimate, then summarize prints estimation results to the MATLAB Command Window. The display includes an estimation summary and a table of parameter estimates with corresponding standard errors, t statistics, and p-values. The estimation summary includes fit statistics, such as the Akaike Information Criterion (AIC), and the estimated innovations variance. • If Mdl is an unestimated model returned by arima, then summarize prints the standard object display (the same display that arima prints during model creation). results = summarize(Mdl) returns one of the following variables and does not print to the Command Window. • If Mdl is an estimated model, then results is a structure containing estimation results. • If Mdl is an unestimated model, then results is an arima model object that is equal to Mdl.
Input Arguments Mdl — ARIMA model arima model object ARIMA model, specified as an arima model object returned by estimate or arima.
Output Arguments results — Model summary structure array | arima model object Model summary, returned as a structure array or an arima model object. • If Mdl is an estimated model, then results is a structure array containing the fields in this table.
Field | Description
Description | Model summary description (string)
SampleSize | Effective sample size (numeric scalar)
NumEstimatedParameters | Number of estimated parameters (numeric scalar)
LogLikelihood | Optimized loglikelihood value (numeric scalar)
AIC | Akaike Information Criterion (numeric scalar)
BIC | Bayesian Information Criterion (numeric scalar)
Table | Maximum likelihood estimates of the model parameters with corresponding standard errors, t statistics (estimate divided by standard error), and p-values (assuming normality); a table with rows corresponding to model parameters
VarianceTable | Maximum likelihood estimate of the model variance with corresponding standard errors, t statistics (estimate divided by standard error), and p-values (assuming normality). If Mdl.Variance is constant, then VarianceTable is a table containing one row. If Mdl.Variance is an estimated conditional variance model (for example, a garch model), then VarianceTable is a table whose rows correspond to estimated variance model parameters.
• If Mdl is an unestimated model, then results is an arima model object that is equal to Mdl.
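The t statistics and p-values in the Table field follow from the estimates and standard errors. The sketch below recomputes them with base MATLAB functions, using the values reported in the "Display Estimation Results" example on this page. The variable names are hypothetical, and the p-value formula assumes a two-sided test against a standard normal distribution.

```matlab
% Estimates and standard errors copied from the ARMA(1,1) example output
% (rows: Constant, AR{1}, MA{1}, Variance)
value = [0.044537; 0.82289; 0.12032; 0.13373];
se = [0.046038; 0.071163; 0.10182; 0.017879];

% t statistic: estimate divided by standard error
tStat = value ./ se;

% Two-sided p-value under normality: 2*(1 - Phi(|t|)) = erfc(|t|/sqrt(2))
pValue = erfc(abs(tStat)/sqrt(2));

disp([tStat pValue])
```

Small differences from the displayed values in the last digit come from the rounding of the copied inputs.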
Examples Display Estimation Results Print the results from estimating an ARMA model using simulated data. Simulate data from an ARMA(1,1) model using known parameter values. MdlSim = arima('Constant',0.01,'AR',0.8,'MA',0.14,... 'Variance',0.1); rng 'default'; Y = simulate(MdlSim,100);
Fit an ARMA(1,1) model to the simulated data, turning off the print display. Mdl = arima(1,0,1); EstMdl = estimate(Mdl,Y,'Display','off');
Print the estimation results. summarize(EstMdl) ARIMA(1,0,1) Model (Gaussian Distribution)
Effective Sample Size: 100
Number of Estimated Parameters: 4
LogLikelihood: -41.296
AIC: 90.592
BIC: 101.013

              Value       StandardError    TStatistic      PValue
             ________    _____________    __________    __________
Constant     0.044537      0.046038         0.96741        0.33334
AR{1}         0.82289      0.071163          11.563     6.3104e-31
MA{1}         0.12032       0.10182          1.1817        0.23731
Variance      0.13373      0.017879          7.4794      7.466e-14
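The fit statistics in the display are related by the standard information criterion formulas, AIC = -2 logL + 2k and BIC = -2 logL + k log(n). The following sketch recomputes them from the displayed loglikelihood. The variable names are hypothetical, and the inputs are the rounded values shown in the output.

```matlab
logL = -41.296;      % LogLikelihood from the display
numParams = 4;       % Number of Estimated Parameters
sampleSize = 100;    % Effective Sample Size

% Standard definitions of the information criteria
aic = -2*logL + 2*numParams;
bic = -2*logL + numParams*log(sampleSize);

fprintf('AIC = %.3f, BIC = %.3f\n',aic,bic)
```

Both values agree with the display to three decimal places.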
Extract Estimation Results from Fitting Composite Conditional Mean and Variance Model
Load the NASDAQ data included with Econometrics Toolbox™. Convert the daily close composite index series to a return series. For numerical stability, convert the returns to percentage returns. Specify an AR(1) and GARCH(1,1) composite model. This is a model of the form
$$r_t = c + \phi_1 r_{t-1} + \varepsilon_t,$$
where $\varepsilon_t = \sigma_t z_t$,
$$\sigma_t^2 = \kappa + \gamma_1 \sigma_{t-1}^2 + \alpha_1 \varepsilon_{t-1}^2,$$
and $z_t$ is an independent and identically distributed standardized Gaussian process.
load Data_EquityIdx
nasdaq = DataTable.NASDAQ;
r = 100*price2ret(nasdaq);
T = length(r);
Mdl = arima('ARLags',1,'Variance',garch(1,1));
Fit the model Mdl to the return series r by using estimate. Use the presample observations that estimate chooses by default.
EstMdl = estimate(Mdl,r,'Display','params');

ARIMA(1,0,0) Model (Gaussian Distribution):

              Value       StandardError    TStatistic      PValue
             ________    _____________    __________    __________
Constant     0.072632      0.018047         4.0245      5.7086e-05
AR{1}         0.13816      0.019893          6.945      3.7846e-12

GARCH(1,1) Conditional Variance Model (Gaussian Distribution):

              Value       StandardError    TStatistic      PValue
             ________    _____________    __________    __________
Constant     0.022377     0.0033201         6.7399      1.5852e-11
GARCH{1}      0.87312     0.0091019         95.927               0
ARCH{1}       0.11865      0.008717         13.611       3.434e-42
Create a variable named results that contains the estimation results by using summarize.
results = summarize(EstMdl)
results = struct with fields:
Description: "ARIMA(1,0,0) Model (Gaussian Distribution)"
SampleSize: 3027
NumEstimatedParameters: 5
LogLikelihood: -4.7414e+03
AIC: 9.4929e+03
BIC: 9.5230e+03
Table: [2x4 table]
VarianceTable: [3x4 table]
Extract the parameter estimate summary tables from the estimation results structure array by using dot notation. The Table field contains the conditional mean model parameter estimates and inferences. The VarianceTable field contains the conditional variance model parameter estimates and inferences.
meanEstTbl = results.Table
meanEstTbl=2×4 table
              Value       StandardError    TStatistic      PValue
             ________    _____________    __________    __________
Constant     0.072632      0.018047         4.0245      5.7086e-05
AR{1}         0.13816      0.019893          6.945      3.7846e-12

varianceEstTbl = results.VarianceTable
varianceEstTbl=3×4 table
              Value       StandardError    TStatistic      PValue
             ________    _____________    __________    __________
Constant     0.022377     0.0033201         6.7399      1.5852e-11
GARCH{1}      0.87312     0.0091019         95.927               0
ARCH{1}       0.11865      0.008717         13.611       3.434e-42
Version History Introduced in R2018a
See Also Objects arima | garch | egarch | gjr Functions estimate
summarize Distribution summary statistics of Bayesian vector autoregression (VAR) model
Syntax summarize(Mdl) summarize(Mdl,display) Summary = summarize(Mdl)
Description summarize(Mdl) displays, at the command line, a tabular summary of the coefficients of the Bayesian VAR(p) model on page 12-2226 Mdl, and the innovations covariance matrix. The summary includes the means and standard deviations of the distribution Mdl represents. summarize(Mdl,display) prints the summary using the display style display. Summary = summarize(Mdl) returns distribution summary statistics Summary.
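Examples
Inspect Minnesota Prior Assumptions Among Models
Consider the 3-D VAR(4) model for the US inflation (INFL), unemployment (UNRATE), and federal funds (FEDFUNDS) rates.
$$\begin{bmatrix} \text{INFL}_t \\ \text{UNRATE}_t \\ \text{FEDFUNDS}_t \end{bmatrix} = c + \sum_{j=1}^{4} \Phi_j \begin{bmatrix} \text{INFL}_{t-j} \\ \text{UNRATE}_{t-j} \\ \text{FEDFUNDS}_{t-j} \end{bmatrix} + \begin{bmatrix} \varepsilon_{1,t} \\ \varepsilon_{2,t} \\ \varepsilon_{3,t} \end{bmatrix}.$$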
For all t, $\varepsilon_t$ is a series of independent 3-D normal innovations with a mean of 0 and covariance $\Sigma$. Assume that a prior distribution $\pi\left([\Phi_1,\ldots,\Phi_4,c]',\Sigma\right)$ governs the behavior of the parameters. Consider using Minnesota regularization to obtain a parsimonious representation of the coefficient posterior distribution. For each supported prior assumption, create the corresponding Bayesian VAR(4) model object for the three response variables by using bayesvarm. For each model that supports the option, specify all the following.
• The response variable names.
• Prior self-lag coefficients have variance 100. This large-variance setting allows the data to influence the posterior more than the prior.
• Prior cross-lag coefficients have variance 1. This small-variance setting shrinks the cross-lag coefficients toward zero during estimation.
• Prior coefficient covariances decay with increasing lag at a rate of 2 (that is, lower lags are more important than higher lags).
• For the normal conjugate prior model, assume that the innovations covariance is the 3-D identity matrix. seriesnames = ["INFL" "UNRATE" "FEDFUNDS"]; numseries = numel(seriesnames); numlags = 4; DiffusePriorMdl = bayesvarm(numseries,numlags,'SeriesNames',seriesnames); ConjugatePriorMdl = bayesvarm(numseries,numlags,'ModelType','conjugate',... 'SeriesNames',seriesnames,'Center',0.75,'SelfLag',100,'Decay',2); SemiConjugatePriorMdl = bayesvarm(numseries,numlags,'ModelType','semiconjugate',... 'SeriesNames',seriesnames,'Center',0.75,'SelfLag',100,'CrossLag',1,'Decay',2); NormalPriorMdl = bayesvarm(numseries,numlags,'ModelType','normal',... 'SeriesNames',seriesnames,'Center',0.75,'SelfLag',100,'CrossLag',1,'Decay',2,... 'Sigma',eye(numseries));
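The Minnesota settings above imply a simple pattern of prior standard deviations across lags. The sketch below assumes that the prior variance of a lag-r coefficient is the self-lag or cross-lag variance divided by r raised to the decay rate. That scaling rule is an assumption of this sketch, and the variable names are hypothetical.

```matlab
selfLag = 100;    % prior variance of self-lag coefficients at lag 1
crossLag = 1;     % prior variance of cross-lag coefficients at lag 1
decay = 2;        % decay rate of the prior variance with increasing lag
numlags = 4;

lags = 1:numlags;

% Assumed scaling rule: variance at lag r is (lag-1 variance)/r^decay
selfStd = sqrt(selfLag ./ lags.^decay);
crossStd = sqrt(crossLag ./ lags.^decay);

% Columns: lag, self-lag standard deviation, cross-lag standard deviation
disp([lags' selfStd' crossStd'])
```

Under this rule the self-lag standard deviations are 10, 5, 3.3333, and 2.5 and the cross-lag standard deviations are 1, 0.5, 0.3333, and 0.25, which can be compared against the Std column that summarize displays for the semiconjugate and normal prior models.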
For each model, display summary of the prior distribution. summarize(DiffusePriorMdl) | Mean Std ------------------------Constant(1) | 0 Inf Constant(2) | 0 Inf Constant(3) | 0 Inf AR{1}(1,1) | 0 Inf AR{1}(2,1) | 0 Inf AR{1}(3,1) | 0 Inf AR{1}(1,2) | 0 Inf AR{1}(2,2) | 0 Inf AR{1}(3,2) | 0 Inf AR{1}(1,3) | 0 Inf AR{1}(2,3) | 0 Inf AR{1}(3,3) | 0 Inf AR{2}(1,1) | 0 Inf AR{2}(2,1) | 0 Inf AR{2}(3,1) | 0 Inf AR{2}(1,2) | 0 Inf AR{2}(2,2) | 0 Inf AR{2}(3,2) | 0 Inf AR{2}(1,3) | 0 Inf AR{2}(2,3) | 0 Inf AR{2}(3,3) | 0 Inf AR{3}(1,1) | 0 Inf AR{3}(2,1) | 0 Inf AR{3}(3,1) | 0 Inf AR{3}(1,2) | 0 Inf AR{3}(2,2) | 0 Inf AR{3}(3,2) | 0 Inf AR{3}(1,3) | 0 Inf AR{3}(2,3) | 0 Inf AR{3}(3,3) | 0 Inf AR{4}(1,1) | 0 Inf AR{4}(2,1) | 0 Inf AR{4}(3,1) | 0 Inf AR{4}(1,2) | 0 Inf AR{4}(2,2) | 0 Inf AR{4}(3,2) | 0 Inf
AR{4}(1,3) | 0 Inf AR{4}(2,3) | 0 Inf AR{4}(3,3) | 0 Inf Innovations Covariance Matrix | INFL UNRATE FEDFUNDS -----------------------------------INFL | NaN NaN NaN | (NaN) (NaN) (NaN) UNRATE | NaN NaN NaN | (NaN) (NaN) (NaN) FEDFUNDS | NaN NaN NaN | (NaN) (NaN) (NaN)
Diffuse prior models put equal weight on all model coefficients. This specification allows the data to determine the posterior distribution. summarize(ConjugatePriorMdl) | Mean Std ------------------------------Constant(1) | 0 33.3333 Constant(2) | 0 33.3333 Constant(3) | 0 33.3333 AR{1}(1,1) | 0.7500 3.3333 AR{1}(2,1) | 0 3.3333 AR{1}(3,1) | 0 3.3333 AR{1}(1,2) | 0 3.3333 AR{1}(2,2) | 0.7500 3.3333 AR{1}(3,2) | 0 3.3333 AR{1}(1,3) | 0 3.3333 AR{1}(2,3) | 0 3.3333 AR{1}(3,3) | 0.7500 3.3333 AR{2}(1,1) | 0 1.6667 AR{2}(2,1) | 0 1.6667 AR{2}(3,1) | 0 1.6667 AR{2}(1,2) | 0 1.6667 AR{2}(2,2) | 0 1.6667 AR{2}(3,2) | 0 1.6667 AR{2}(1,3) | 0 1.6667 AR{2}(2,3) | 0 1.6667 AR{2}(3,3) | 0 1.6667 AR{3}(1,1) | 0 1.1111 AR{3}(2,1) | 0 1.1111 AR{3}(3,1) | 0 1.1111 AR{3}(1,2) | 0 1.1111 AR{3}(2,2) | 0 1.1111 AR{3}(3,2) | 0 1.1111 AR{3}(1,3) | 0 1.1111 AR{3}(2,3) | 0 1.1111 AR{3}(3,3) | 0 1.1111 AR{4}(1,1) | 0 0.8333 AR{4}(2,1) | 0 0.8333 AR{4}(3,1) | 0 0.8333 AR{4}(1,2) | 0 0.8333 AR{4}(2,2) | 0 0.8333 AR{4}(3,2) | 0 0.8333 AR{4}(1,3) | 0 0.8333 AR{4}(2,3) | 0 0.8333
 AR{4}(3,3)  |  0        0.8333

        Innovations Covariance Matrix
          |   INFL     UNRATE   FEDFUNDS
 ----------------------------------------
 INFL     |  0.1111    0         0
          | (0.0594)  (0.0398)  (0.0398)
 UNRATE   |  0         0.1111    0
          | (0.0398)  (0.0594)  (0.0398)
 FEDFUNDS |  0         0         0.1111
          | (0.0398)  (0.0398)  (0.0594)
With a tighter prior variance around 0 for larger lags, the posterior of the conjugate model is likely to be sparser than the posterior of the diffuse model.

summarize(SemiConjugatePriorMdl)

             |  Mean     Std
 -----------------------------
 Constant(1) |  0       100
 Constant(2) |  0       100
 Constant(3) |  0       100
 AR{1}(1,1)  |  0.7500   10
 AR{1}(2,1)  |  0         1
 AR{1}(3,1)  |  0         1
 AR{1}(1,2)  |  0         1
 AR{1}(2,2)  |  0.7500   10
 AR{1}(3,2)  |  0         1
 AR{1}(1,3)  |  0         1
 AR{1}(2,3)  |  0         1
 AR{1}(3,3)  |  0.7500   10
 AR{2}(1,1)  |  0         5
 AR{2}(2,1)  |  0         0.5000
 AR{2}(3,1)  |  0         0.5000
 AR{2}(1,2)  |  0         0.5000
 AR{2}(2,2)  |  0         5
 AR{2}(3,2)  |  0         0.5000
 AR{2}(1,3)  |  0         0.5000
 AR{2}(2,3)  |  0         0.5000
 AR{2}(3,3)  |  0         5
 AR{3}(1,1)  |  0         3.3333
 AR{3}(2,1)  |  0         0.3333
 AR{3}(3,1)  |  0         0.3333
 AR{3}(1,2)  |  0         0.3333
 AR{3}(2,2)  |  0         3.3333
 AR{3}(3,2)  |  0         0.3333
 AR{3}(1,3)  |  0         0.3333
 AR{3}(2,3)  |  0         0.3333
 AR{3}(3,3)  |  0         3.3333
 AR{4}(1,1)  |  0         2.5000
 AR{4}(2,1)  |  0         0.2500
 AR{4}(3,1)  |  0         0.2500
 AR{4}(1,2)  |  0         0.2500
 AR{4}(2,2)  |  0         2.5000
 AR{4}(3,2)  |  0         0.2500
 AR{4}(1,3)  |  0         0.2500
 AR{4}(2,3)  |  0         0.2500
 AR{4}(3,3)  |  0         2.5000

        Innovations Covariance Matrix
          |   INFL     UNRATE   FEDFUNDS
 ----------------------------------------
 INFL     |  0.1111    0         0
          | (0.0594)  (0.0398)  (0.0398)
 UNRATE   |  0         0.1111    0
          | (0.0398)  (0.0594)  (0.0398)
 FEDFUNDS |  0         0         0.1111
          | (0.0398)  (0.0398)  (0.0594)

summarize(NormalPriorMdl)

             |  Mean     Std
 -----------------------------
 Constant(1) |  0       100
 Constant(2) |  0       100
 Constant(3) |  0       100
 AR{1}(1,1)  |  0.7500   10
 AR{1}(2,1)  |  0         1
 AR{1}(3,1)  |  0         1
 AR{1}(1,2)  |  0         1
 AR{1}(2,2)  |  0.7500   10
 AR{1}(3,2)  |  0         1
 AR{1}(1,3)  |  0         1
 AR{1}(2,3)  |  0         1
 AR{1}(3,3)  |  0.7500   10
 AR{2}(1,1)  |  0         5
 AR{2}(2,1)  |  0         0.5000
 AR{2}(3,1)  |  0         0.5000
 AR{2}(1,2)  |  0         0.5000
 AR{2}(2,2)  |  0         5
 AR{2}(3,2)  |  0         0.5000
 AR{2}(1,3)  |  0         0.5000
 AR{2}(2,3)  |  0         0.5000
 AR{2}(3,3)  |  0         5
 AR{3}(1,1)  |  0         3.3333
 AR{3}(2,1)  |  0         0.3333
 AR{3}(3,1)  |  0         0.3333
 AR{3}(1,2)  |  0         0.3333
 AR{3}(2,2)  |  0         3.3333
 AR{3}(3,2)  |  0         0.3333
 AR{3}(1,3)  |  0         0.3333
 AR{3}(2,3)  |  0         0.3333
 AR{3}(3,3)  |  0         3.3333
 AR{4}(1,1)  |  0         2.5000
 AR{4}(2,1)  |  0         0.2500
 AR{4}(3,1)  |  0         0.2500
 AR{4}(1,2)  |  0         0.2500
 AR{4}(2,2)  |  0         2.5000
 AR{4}(3,2)  |  0         0.2500
 AR{4}(1,3)  |  0         0.2500
 AR{4}(2,3)  |  0         0.2500
 AR{4}(3,3)  |  0         2.5000

     Innovations Covariance Matrix
          | INFL  UNRATE  FEDFUNDS
 ----------------------------------
 INFL     |  1      0       0
          | (0)    (0)     (0)
 UNRATE   |  0      1       0
          | (0)    (0)     (0)
 FEDFUNDS |  0      0       1
          | (0)    (0)     (0)
Semiconjugate and normal conjugate prior models yield a richer prior specification than the conjugate and diffuse models.
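The prior standard deviations in the displays above shrink as the lag grows. As a rough check, the following base-MATLAB sketch reproduces the tabulated values under the assumption that the standard deviation is a lag-1 value divided by the lag. The lag-1 values (10 for self lags, 1 for cross lags, and 10/3 for the conjugate model) are read off the displays; the variable names are illustrative and are not toolbox properties.

```matlab
% Lags of the VAR(4) model
lags = (1:4)';

% Lag-1 standard deviations read from the summarize displays (assumed
% here to scale as 1/lag; this is an observation about the tables, not
% a statement of how the toolbox computes them)
selfStd  = 10 ./ lags;       % semiconjugate/normal, self lags: 10, 5, 3.3333, 2.5
crossStd = 1 ./ lags;        % semiconjugate/normal, cross lags: 1, 0.5, 0.3333, 0.25
conjStd  = (10/3) ./ lags;   % conjugate, all AR coefficients: 3.3333, ..., 0.8333

disp(table(lags, selfStd, crossStd, conjStd))
```

The tighter standard deviations at higher lags express the prior belief that distant lags matter less, which is what pulls the corresponding posterior coefficients toward zero.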
Adjust Distribution Summary Displays
Consider the 3-D VAR(4) model of “Inspect Minnesota Prior Assumptions Among Models” on page 12-2213. Assume that the prior distribution is diffuse.
Load the US macroeconomic data set. Compute the inflation rate, stabilize the unemployment and federal funds rates, and remove missing values.

load Data_USEconModel
seriesnames = ["INFL" "UNRATE" "FEDFUNDS"];
DataTimeTable.INFL = 100*[NaN; price2ret(DataTimeTable.CPIAUCSL)];
DataTimeTable.DUNRATE = [NaN; diff(DataTimeTable.UNRATE)];
DataTimeTable.DFEDFUNDS = [NaN; diff(DataTimeTable.FEDFUNDS)];
seriesnames(2:3) = "D" + seriesnames(2:3);
rmDataTimeTable = rmmissing(DataTimeTable);
Create a diffuse Bayesian VAR(4) prior model for the three response series. Specify the response variable names.

numseries = numel(seriesnames);
numlags = 4;
PriorMdl = bayesvarm(numseries,numlags,'SeriesNames',seriesnames);
Estimate the posterior distribution.

PosteriorMdl = estimate(PriorMdl,rmDataTimeTable{:,seriesnames});

Bayesian VAR under diffuse priors
Effective Sample Size:          197
Number of equations:            3
Number of estimated Parameters: 39

             |   Mean     Std
 ------------------------------
 Constant(1) |  0.1007  0.0832
 Constant(2) | -0.0499  0.0450
 Constant(3) | -0.4221  0.1781
 AR{1}(1,1)  |  0.1241  0.0762
 AR{1}(2,1)  | -0.0219  0.0413
 AR{1}(3,1)  | -0.1586  0.1632
 AR{1}(1,2)  | -0.4809  0.1536
 AR{1}(2,2)  |  0.4716  0.0831
 AR{1}(3,2)  | -1.4368  0.3287
 AR{1}(1,3)  |  0.1005  0.0390
 AR{1}(2,3)  |  0.0391  0.0211
 AR{1}(3,3)  | -0.2905  0.0835
 AR{2}(1,1)  |  0.3236  0.0868
 AR{2}(2,1)  |  0.0913  0.0469
 AR{2}(3,1)  |  0.3403  0.1857
 AR{2}(1,2)  | -0.0503  0.1647
 AR{2}(2,2)  |  0.2414  0.0891
 AR{2}(3,2)  | -0.2968  0.3526
 AR{2}(1,3)  |  0.0450  0.0413
 AR{2}(2,3)  |  0.0536  0.0223
 AR{2}(3,3)  | -0.3117  0.0883
 AR{3}(1,1)  |  0.4272  0.0860
 AR{3}(2,1)  | -0.0389  0.0465
 AR{3}(3,1)  |  0.2848  0.1841
 AR{3}(1,2)  |  0.2738  0.1620
 AR{3}(2,2)  |  0.0552  0.0876
 AR{3}(3,2)  | -0.7401  0.3466
 AR{3}(1,3)  |  0.0523  0.0428
 AR{3}(2,3)  |  0.0008  0.0232
 AR{3}(3,3)  |  0.0028  0.0917
 AR{4}(1,1)  |  0.0167  0.0901
 AR{4}(2,1)  |  0.0285  0.0488
 AR{4}(3,1)  | -0.0690  0.1928
 AR{4}(1,2)  | -0.1830  0.1520
 AR{4}(2,2)  | -0.1795  0.0822
 AR{4}(3,2)  |  0.1494  0.3253
 AR{4}(1,3)  |  0.0067  0.0395
 AR{4}(2,3)  |  0.0088  0.0214
 AR{4}(3,3)  | -0.1372  0.0845

        Innovations Covariance Matrix
           |   INFL    DUNRATE  DFEDFUNDS
 ------------------------------------------
 INFL      |  0.3028  -0.0217    0.1579
           | (0.0321) (0.0124)  (0.0499)
 DUNRATE   | -0.0217   0.0887   -0.1435
           | (0.0124) (0.0094)  (0.0283)
 DFEDFUNDS |  0.1579  -0.1435    1.3872
           | (0.0499) (0.0283)  (0.1470)
Summarize the posterior distribution; compare each estimation display type.

summarize(PosteriorMdl); % The default is 'table'.

             |   Mean     Std
 ------------------------------
 Constant(1) |  0.1007  0.0832
 Constant(2) | -0.0499  0.0450
 Constant(3) | -0.4221  0.1781
 AR{1}(1,1)  |  0.1241  0.0762
 AR{1}(2,1)  | -0.0219  0.0413
 AR{1}(3,1)  | -0.1586  0.1632
 AR{1}(1,2)  | -0.4809  0.1536
 AR{1}(2,2)  |  0.4716  0.0831
 AR{1}(3,2)  | -1.4368  0.3287
 AR{1}(1,3)  |  0.1005  0.0390
 AR{1}(2,3)  |  0.0391  0.0211
 AR{1}(3,3)  | -0.2905  0.0835
 AR{2}(1,1)  |  0.3236  0.0868
 AR{2}(2,1)  |  0.0913  0.0469
 AR{2}(3,1)  |  0.3403  0.1857
 AR{2}(1,2)  | -0.0503  0.1647
 AR{2}(2,2)  |  0.2414  0.0891
 AR{2}(3,2)  | -0.2968  0.3526
 AR{2}(1,3)  |  0.0450  0.0413
 AR{2}(2,3)  |  0.0536  0.0223
 AR{2}(3,3)  | -0.3117  0.0883
 AR{3}(1,1)  |  0.4272  0.0860
 AR{3}(2,1)  | -0.0389  0.0465
 AR{3}(3,1)  |  0.2848  0.1841
 AR{3}(1,2)  |  0.2738  0.1620
 AR{3}(2,2)  |  0.0552  0.0876
 AR{3}(3,2)  | -0.7401  0.3466
 AR{3}(1,3)  |  0.0523  0.0428
 AR{3}(2,3)  |  0.0008  0.0232
 AR{3}(3,3)  |  0.0028  0.0917
 AR{4}(1,1)  |  0.0167  0.0901
 AR{4}(2,1)  |  0.0285  0.0488
 AR{4}(3,1)  | -0.0690  0.1928
 AR{4}(1,2)  | -0.1830  0.1520
 AR{4}(2,2)  | -0.1795  0.0822
 AR{4}(3,2)  |  0.1494  0.3253
 AR{4}(1,3)  |  0.0067  0.0395
 AR{4}(2,3)  |  0.0088  0.0214
 AR{4}(3,3)  | -0.1372  0.0845

        Innovations Covariance Matrix
           |   INFL    DUNRATE  DFEDFUNDS
 ------------------------------------------
 INFL      |  0.3028  -0.0217    0.1579
           | (0.0321) (0.0124)  (0.0499)
 DUNRATE   | -0.0217   0.0887   -0.1435
           | (0.0124) (0.0094)  (0.0283)
 DFEDFUNDS |  0.1579  -0.1435    1.3872
           | (0.0499) (0.0283)  (0.1470)
The default display is the same tabular display that estimate prints.

summarize(PosteriorMdl,'equation');
                                       VAR Equations
           | INFL(-1)  DUNRATE(-1)  DFEDFUNDS(-1)  INFL(-2)  DUNRATE(-2)  DFEDFUNDS(-2)  INFL(-3)
 ------------------------------------------------------------------------------------------------
 INFL      |  0.1241    -0.4809        0.1005       0.3236    -0.0503        0.0450       0.4272
           | (0.0762)   (0.1536)      (0.0390)     (0.0868)   (0.1647)      (0.0413)     (0.0860)
 DUNRATE   | -0.0219     0.4716        0.0391       0.0913     0.2414        0.0536      -0.0389
           | (0.0413)   (0.0831)      (0.0211)     (0.0469)   (0.0891)      (0.0223)     (0.0465)
 DFEDFUNDS | -0.1586    -1.4368       -0.2905       0.3403    -0.2968       -0.3117       0.2848
           | (0.1632)   (0.3287)      (0.0835)     (0.1857)   (0.3526)      (0.0883)     (0.1841)

        Innovations Covariance Matrix
           |   INFL    DUNRATE  DFEDFUNDS
 ------------------------------------------
 INFL      |  0.3028  -0.0217    0.1579
           | (0.0321) (0.0124)  (0.0499)
 DUNRATE   | -0.0217   0.0887   -0.1435
           | (0.0124) (0.0094)  (0.0283)
 DFEDFUNDS |  0.1579  -0.1435    1.3872
           | (0.0499) (0.0283)  (0.1470)
In the 'equation' display, rows correspond to response equations in the VAR system, and columns correspond to lagged response variables within equations. Each element of the table is the posterior mean of the corresponding coefficient; under each mean, in parentheses, is the posterior standard deviation.

summarize(PosteriorMdl,'matrix');

          VAR Coefficient Matrix of Lag 1
           | INFL(-1)  DUNRATE(-1)  DFEDFUNDS(-1)
 -------------------------------------------------
 INFL      |  0.1241    -0.4809        0.1005
           | (0.0762)   (0.1536)      (0.0390)
 DUNRATE   | -0.0219     0.4716        0.0391
           | (0.0413)   (0.0831)      (0.0211)
 DFEDFUNDS | -0.1586    -1.4368       -0.2905
           | (0.1632)   (0.3287)      (0.0835)

          VAR Coefficient Matrix of Lag 2
           | INFL(-2)  DUNRATE(-2)  DFEDFUNDS(-2)
 -------------------------------------------------
 INFL      |  0.3236    -0.0503        0.0450
           | (0.0868)   (0.1647)      (0.0413)
 DUNRATE   |  0.0913     0.2414        0.0536
           | (0.0469)   (0.0891)      (0.0223)
 DFEDFUNDS |  0.3403    -0.2968       -0.3117
           | (0.1857)   (0.3526)      (0.0883)

          VAR Coefficient Matrix of Lag 3
           | INFL(-3)  DUNRATE(-3)  DFEDFUNDS(-3)
 -------------------------------------------------
 INFL      |  0.4272     0.2738        0.0523
           | (0.0860)   (0.1620)      (0.0428)
 DUNRATE   | -0.0389     0.0552        0.0008
           | (0.0465)   (0.0876)      (0.0232)
 DFEDFUNDS |  0.2848    -0.7401        0.0028
           | (0.1841)   (0.3466)      (0.0917)

          VAR Coefficient Matrix of Lag 4
           | INFL(-4)  DUNRATE(-4)  DFEDFUNDS(-4)
 -------------------------------------------------
 INFL      |  0.0167    -0.1830        0.0067
           | (0.0901)   (0.1520)      (0.0395)
 DUNRATE   |  0.0285    -0.1795        0.0088
           | (0.0488)   (0.0822)      (0.0214)
 DFEDFUNDS | -0.0690     0.1494       -0.1372
           | (0.1928)   (0.3253)      (0.0845)

     Constant Term
 INFL      |  0.1007
           | (0.0832)
 DUNRATE   | -0.0499
           |  0.0450
 DFEDFUNDS | -0.4221
           |  0.1781

        Innovations Covariance Matrix
           |   INFL    DUNRATE  DFEDFUNDS
 ------------------------------------------
 INFL      |  0.3028  -0.0217    0.1579
           | (0.0321) (0.0124)  (0.0499)
 DUNRATE   | -0.0217   0.0887   -0.1435
           | (0.0124) (0.0094)  (0.0283)
 DFEDFUNDS |  0.1579  -0.1435    1.3872
           | (0.0499) (0.0283)  (0.1470)
In the 'matrix' display, each table contains the posterior means of the corresponding coefficient matrix. Under each mean, in parentheses, is the posterior standard deviation.
Return and Inspect Estimation Summary Information
Consider the 3-D VAR(4) model of “Inspect Minnesota Prior Assumptions Among Models” on page 12-2213. Assume that the parameters follow a semiconjugate prior model.
Load the US macroeconomic data set. Compute the inflation rate, stabilize the unemployment and federal funds rates, and remove missing values.

load Data_USEconModel
seriesnames = ["INFL" "UNRATE" "FEDFUNDS"];
DataTimeTable.INFL = 100*[NaN; price2ret(DataTimeTable.CPIAUCSL)];
DataTimeTable.DUNRATE = [NaN; diff(DataTimeTable.UNRATE)];
DataTimeTable.DFEDFUNDS = [NaN; diff(DataTimeTable.FEDFUNDS)];
seriesnames(2:3) = "D" + seriesnames(2:3);
rmDataTimeTable = rmmissing(DataTimeTable);
Create a semiconjugate Bayesian VAR(4) prior model for the three response series. Specify the response variable names.

numseries = numel(seriesnames);
numlags = 4;
PriorMdl = bayesvarm(numseries,numlags,'Model','semiconjugate',...
    'SeriesNames',seriesnames);
Estimate the posterior distribution. Suppress the estimation display.

PosteriorMdl = estimate(PriorMdl,rmDataTimeTable{:,seriesnames},'Display','off');
Because the posterior of a semiconjugate model is analytically intractable, PosteriorMdl is an empiricalbvarm model object storing the draws from the Gibbs sampler.
Summarize the posterior distribution; return the estimation summary.

Summary = summarize(PosteriorMdl);

             |   Mean     Std
 ------------------------------
 Constant(1) |  0.1830  0.0718
 Constant(2) | -0.0808  0.0413
 Constant(3) | -0.0161  0.1309
 AR{1}(1,1)  |  0.2246  0.0650
 AR{1}(2,1)  | -0.0263  0.0340
 AR{1}(3,1)  | -0.0263  0.0775
 AR{1}(1,2)  | -0.0837  0.0824
 AR{1}(2,2)  |  0.3665  0.0740
 AR{1}(3,2)  | -0.1283  0.0948
 AR{1}(1,3)  |  0.1362  0.0323
 AR{1}(2,3)  |  0.0154  0.0198
 AR{1}(3,3)  | -0.0538  0.0685
 AR{2}(1,1)  |  0.2518  0.0700
 AR{2}(2,1)  |  0.0928  0.0352
 AR{2}(3,1)  |  0.0373  0.0628
 AR{2}(1,2)  | -0.0097  0.0632
 AR{2}(2,2)  |  0.1657  0.0709
 AR{2}(3,2)  | -0.0254  0.0688
 AR{2}(1,3)  |  0.0329  0.0308
 AR{2}(2,3)  |  0.0341  0.0199
 AR{2}(3,3)  | -0.1451  0.0637
 AR{3}(1,1)  |  0.2895  0.0665
 AR{3}(2,1)  |  0.0013  0.0332
 AR{3}(3,1)  | -0.0036  0.0530
 AR{3}(1,2)  |  0.0322  0.0538
 AR{3}(2,2)  | -0.0150  0.0667
 AR{3}(3,2)  | -0.0369  0.0568
 AR{3}(1,3)  |  0.0368  0.0298
 AR{3}(2,3)  | -0.0083  0.0194
 AR{3}(3,3)  |  0.1516  0.0603
 AR{4}(1,1)  |  0.0452  0.0644
 AR{4}(2,1)  |  0.0225  0.0325
 AR{4}(3,1)  | -0.0097  0.0470
 AR{4}(1,2)  | -0.0218  0.0468
 AR{4}(2,2)  | -0.1125  0.0611
 AR{4}(3,2)  |  0.0013  0.0491
 AR{4}(1,3)  |  0.0180  0.0273
 AR{4}(2,3)  |  0.0084  0.0179
 AR{4}(3,3)  | -0.0815  0.0594

        Innovations Covariance Matrix
           |   INFL    DUNRATE  DFEDFUNDS
 ------------------------------------------
 INFL      |  0.2983  -0.0219    0.1750
           | (0.0307) (0.0121)  (0.0500)
 DUNRATE   | -0.0219   0.0890   -0.1495
           | (0.0121) (0.0093)  (0.0290)
 DFEDFUNDS |  0.1750  -0.1495    1.4730
           | (0.0500) (0.0290)  (0.1514)

Summary

Summary = struct with fields:
               Description: "3-Dimensional VAR(4) Model"
    NumEstimatedParameters: 39
                     Table: [39x2 table]
                  CoeffMap: [39x1 string]
                 CoeffMean: [39x1 double]
                  CoeffStd: [39x1 double]
                 SigmaMean: [3x3 double]
                  SigmaStd: [3x3 double]
Summary is a structure array of fields containing posterior estimation information. For example, the CoeffMap field contains a list of the coefficient names. The order of the names corresponds to the order of all coefficient vector inputs and outputs. Display CoeffMap.

Summary.CoeffMap

ans = 39x1 string
    "AR{1}(1,1)"
    "AR{1}(1,2)"
    "AR{1}(1,3)"
    "AR{2}(1,1)"
    "AR{2}(1,2)"
    "AR{2}(1,3)"
    "AR{3}(1,1)"
    "AR{3}(1,2)"
    "AR{3}(1,3)"
    "AR{4}(1,1)"
    "AR{4}(1,2)"
    "AR{4}(1,3)"
    "Constant(1)"
    "AR{1}(2,1)"
    "AR{1}(2,2)"
    "AR{1}(2,3)"
    "AR{2}(2,1)"
    "AR{2}(2,2)"
    "AR{2}(2,3)"
    "AR{3}(2,1)"
    "AR{3}(2,2)"
    "AR{3}(2,3)"
    "AR{4}(2,1)"
    "AR{4}(2,2)"
    "AR{4}(2,3)"
    "Constant(2)"
    "AR{1}(3,1)"
    "AR{1}(3,2)"
    "AR{1}(3,3)"
    "AR{2}(3,1)"
    ⋮
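Because CoeffMap, CoeffMean, and CoeffStd share one ordering, a coefficient can be looked up by name with logical indexing. The sketch below is self-contained: it builds a small stand-in structure S holding only the first three entries, with values copied from the semiconjugate posterior display above. S and the helper variable names are illustrative; with the real output, the same indexing would apply to Summary.

```matlab
% Stand-in for the first three rows of the summary structure
% (values hand-copied from the posterior display; S is hypothetical)
S.CoeffMap  = ["AR{1}(1,1)"; "AR{1}(1,2)"; "AR{1}(1,3)"];
S.CoeffMean = [0.2246; -0.0837; 0.1362];
S.CoeffStd  = [0.0650;  0.0824; 0.0323];

% Find one coefficient by name; the same index works for every
% vector that follows the CoeffMap ordering
target = "AR{1}(1,2)";
idx = S.CoeffMap == target;

coeffMean = S.CoeffMean(idx);
coeffStd  = S.CoeffStd(idx);
fprintf('%s: mean = %.4f, std = %.4f\n', target, coeffMean, coeffStd);
```

Looking up by name avoids hard-coding positions, which matters here because the CoeffMap order (equation by equation) differs from the row order of the tabular display.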
Input Arguments

Mdl — Prior or posterior Bayesian VAR model
conjugatebvarm model object | semiconjugatebvarm model object | diffusebvarm model object | normalbvarm model object | empiricalbvarm model object

Prior or posterior Bayesian VAR model, specified as a model object in this table.

Model Object          Description
conjugatebvarm        Dependent, matrix-normal-inverse-Wishart conjugate model returned by bayesvarm, conjugatebvarm, or estimate
semiconjugatebvarm    Independent, normal-inverse-Wishart semiconjugate prior model returned by bayesvarm or semiconjugatebvarm
diffusebvarm          Diffuse prior model returned by bayesvarm or diffusebvarm
empiricalbvarm        Prior or posterior model characterized by random draws from respective distributions, returned by empiricalbvarm or estimate
display — Distribution summary display style
'table' (default) | 'off' | 'equation' | 'matrix'

Distribution summary display style, specified as a value in this table.

Value         Description
'off'         summarize does not print to the command line.
'table'       summarize prints the following:
              • Estimation information
              • Tabular summary of coefficient posterior means and standard deviations; each row corresponds to a coefficient, and each column corresponds to an estimate type
              • Posterior mean of the innovations covariance matrix with standard deviations in parentheses
'equation'    summarize prints the following:
              • Estimation information
              • Tabular summary of posterior means and standard deviations; each row corresponds to a response variable in the system, and each column corresponds to a coefficient in the equation (for example, the column labeled Y1(-1) contains the estimates of the lag 1 coefficient of the first response variable in each equation)
              • Posterior mean of the innovations covariance matrix with standard deviations in parentheses
'matrix'      summarize prints the following:
              • Estimation information
              • Separate tabular displays of posterior means and standard deviations (in parentheses) for each parameter in the model Φ1,…, Φp, c, δ, Β, and Σ

Data Types: char | string
Output Arguments

Summary — Distribution summary statistics
structure array

Distribution summary statistics, returned as a structure array containing these fields:

Field                     Description                                                      Data type
Description               Model description                                                string scalar
NumEstimatedParameters    Number of coefficients                                           numeric scalar
Table                     Table of coefficient distribution means and standard             table
                          deviations; each row corresponds to a coefficient and each
                          column corresponds to a statistic
CoeffMap                  Coefficient names                                                string vector
CoeffMean                 Coefficient distribution means                                   numeric vector, rows correspond to CoeffMap
CoeffStd                  Coefficient distribution standard deviations                     numeric vector, rows correspond to CoeffMap
SigmaMean                 Innovations covariance distribution mean matrix                  numeric matrix, rows and columns correspond to response equations
SigmaStd                  Innovations covariance distribution standard deviation matrix    numeric matrix, rows and columns correspond to response equations
More About

Bayesian Vector Autoregression (VAR) Model
A Bayesian VAR model treats all coefficients and the innovations covariance matrix as random variables in the m-dimensional, stationary VARX(p) model. The model has one of the three forms described in this table.

Model                                                  Equation
Reduced-form VAR(p) in difference-equation notation    $y_t = \Phi_1 y_{t-1} + \dots + \Phi_p y_{t-p} + c + \delta t + \mathrm{B} x_t + \varepsilon_t$
Multivariate regression                                $y_t = Z_t \lambda + \varepsilon_t$
Matrix regression                                      $y_t = \Lambda' z_t' + \varepsilon_t$
For each time t = 1,...,T:
• yt is the m-dimensional observed response vector, where m = numseries.
• Φ1,…,Φp are the m-by-m AR coefficient matrices of lags 1 through p, where p = numlags.
• c is the m-by-1 vector of model constants if IncludeConstant is true.
• δ is the m-by-1 vector of linear time trend coefficients if IncludeTrend is true.
• Β is the m-by-r matrix of regression coefficients of the r-by-1 vector of observed exogenous predictors xt, where r = NumPredictors. All predictor variables appear in each equation.
• $z_t = \begin{bmatrix} y_{t-1}' & y_{t-2}' & \cdots & y_{t-p}' & 1 & t & x_t' \end{bmatrix}$, which is a 1-by-(mp + r + 2) vector, and Zt is the m-by-m(mp + r + 2) block diagonal matrix

$$Z_t = \begin{bmatrix} z_t & 0_z & \cdots & 0_z \\ 0_z & z_t & \cdots & 0_z \\ \vdots & \vdots & \ddots & \vdots \\ 0_z & 0_z & \cdots & z_t \end{bmatrix},$$

where $0_z$ is a 1-by-(mp + r + 2) vector of zeros.
• $\Lambda = \begin{bmatrix} \Phi_1 & \Phi_2 & \cdots & \Phi_p & c & \delta & \mathrm{B} \end{bmatrix}'$, which is an (mp + r + 2)-by-m random matrix of the coefficients, and the m(mp + r + 2)-by-1 vector λ = vec(Λ).
• εt is an m-by-1 vector of random, serially uncorrelated, multivariate normal innovations with the zero vector for the mean and the m-by-m matrix Σ for the covariance. This assumption implies that the data likelihood is

$$\ell(\Lambda, \Sigma \mid y, x) = \prod_{t=1}^{T} f(y_t; \Lambda, \Sigma, z_t),$$

where f is the m-dimensional multivariate normal density with mean ztΛ and covariance Σ, evaluated at yt.

Before considering the data, you impose a joint prior distribution assumption on (Λ,Σ), which is governed by the distribution π(Λ,Σ). In a Bayesian analysis, the distribution of the parameters is updated with information about the parameters obtained from the data likelihood. The result is the joint posterior distribution π(Λ,Σ|Y,X,Y0), where:
• Y is a T-by-m matrix containing the entire response series {yt}, t = 1,…,T.
• X is a T-by-r matrix containing the entire exogenous series {xt}, t = 1,…,T.
• Y0 is a p-by-m matrix of presample data used to initialize the VAR model for estimation.
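The three model forms in this section describe the same conditional mean. The following base-MATLAB sketch checks that numerically for one time point, using small random matrices as stand-ins for the coefficients and data. All variable names and sizes are illustrative assumptions; nothing here calls the toolbox.

```matlab
% Illustrative sizes: m responses, p lags, r exogenous predictors
m = 3; p = 2; r = 1;
rng(0)                                    % for reproducibility

% Stand-in coefficients (random, for illustration only)
Phi   = {0.1*randn(m), 0.05*randn(m)};    % AR matrices Phi_1, Phi_2
c     = randn(m,1);                       % constants
delta = randn(m,1);                       % time trend coefficients
Beta  = randn(m,r);                       % regression coefficients

% Stand-in data at one time point
ylags = randn(m,p);                       % column j holds y_{t-j}
t  = 5;
xt = randn(r,1);

% z_t = [y'_{t-1} ... y'_{t-p} 1 t x'_t], a 1-by-(m*p + r + 2) row
zt = [ylags(:)' 1 t xt'];

% Lambda = [Phi_1 ... Phi_p c delta Beta]' and lambda = vec(Lambda)
Lambda = [Phi{:} c delta Beta]';
lambda = Lambda(:);

% Z_t is block diagonal with z_t repeated m times
Zt = kron(eye(m), zt);

% Conditional mean in each of the three forms
meanVAR    = Phi{1}*ylags(:,1) + Phi{2}*ylags(:,2) + c + delta*t + Beta*xt;
meanMulti  = Zt*lambda;
meanMatrix = Lambda'*zt';

disp(max(abs([meanVAR - meanMulti, meanVAR - meanMatrix]), [], 'all'))
```

The displayed discrepancy should be at the level of floating-point rounding, which confirms that vec(Λ) stacks the coefficients equation by equation, in the same order that the block diagonal Zt consumes them.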
Version History Introduced in R2020a
See Also Functions estimate | bayesvarm Objects normalbvarm | conjugatebvarm | semiconjugatebvarm | diffusebvarm | empiricalbvarm
summarize Display estimation results of conditional variance model
Syntax
summarize(Mdl)
results = summarize(Mdl)

Description
summarize(Mdl) displays a summary of the conditional variance model Mdl.
• If Mdl is an estimated model returned by estimate, then summarize prints estimation results to the MATLAB Command Window. The display includes an estimation summary and a table of parameter estimates with corresponding standard errors, t statistics, and p-values. The estimation summary includes fit statistics, such as the Akaike Information Criterion (AIC).
• If Mdl is an unestimated model returned by garch, egarch, or gjr, then summarize prints the standard object display (the same display printed during model creation).

results = summarize(Mdl) returns one of the following variables and does not print to the Command Window.
• If Mdl is an estimated model, then results is a structure containing estimation results.
• If Mdl is an unestimated model, then results is a garch, egarch, or gjr model object that is equal to Mdl.
Examples

Display Estimation Results
Print the results from estimating a GARCH model using simulated data.
Simulate data from a GARCH(1,1) model with known parameter values.

Mdl0 = garch('Constant',0.01,'GARCH',0.8,'ARCH',0.14);
rng 'default'; % For reproducibility
[V,Y] = simulate(Mdl0,100);

Fit a GARCH(1,1) model to the simulated data. Suppress the estimation display.

Mdl = garch(1,1);
EstMdl = estimate(Mdl,Y,'Display','off');
Display an estimation summary.

summarize(EstMdl)

   GARCH(1,1) Conditional Variance Model (Gaussian Distribution)

    Effective Sample Size: 100
    Number of Estimated Parameters: 3
    LogLikelihood: -96.5255
    AIC: 199.051
    BIC: 206.866

                 Value     StandardError    TStatistic      PValue
                _______    _____________    __________    __________
    Constant     0.0167      0.016508         1.0117         0.31169
    GARCH{1}    0.77263       0.07769          9.945      2.6523e-23
    ARCH{1}     0.19169      0.075068         2.5535        0.010664
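The fit statistics in the GARCH(1,1) summary can be cross-checked by hand. The sketch below assumes the conventional definitions AIC = -2 logL + 2k and BIC = -2 logL + k log(n), where k is the number of estimated parameters and n is the effective sample size; the inputs are copied from the display, and the variable names are illustrative.

```matlab
% Values copied from the GARCH(1,1) estimation summary
logL = -96.5255;   % optimized loglikelihood
k    = 3;          % number of estimated parameters
n    = 100;        % effective sample size

% Conventional information criteria (definitions assumed for this check)
aic = -2*logL + 2*k;
bic = -2*logL + k*log(n);

fprintf('AIC = %.3f, BIC = %.3f\n', aic, bic);
```

Both values agree with the displayed AIC and BIC up to the rounding of the loglikelihood. Because BIC charges log(n) per parameter instead of 2, it penalizes extra lags more heavily than AIC whenever n exceeds about 7.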
Extract Estimation Results from Estimated Model
Estimate several models by passing an EGARCH model template and data to estimate. Vary the number of ARCH and GARCH lags among the models. Extract the AIC from the estimation results, and choose the model that minimizes the fit statistic.
Simulate data from an EGARCH(0,1) model with known parameter values.

Mdl0 = egarch('Constant',0.01,'ARCH',0.75,'Leverage',-0.1);
rng(2); % For reproducibility
[~,Y] = simulate(Mdl0,100);

To determine the number of ARCH and GARCH lags, create and estimate multiple EGARCH models. Vary the number of GARCH and ARCH lags (p and q, respectively) among the models from 0 to 1 lag. Exclude the case where p = 1 and q = 0 because the presence of GARCH lags requires the presence of ARCH lags. Suppress all estimation displays. Extract the AIC from the estimation results structure. The field AIC stores the AIC.

pq = [0 0; 0 1; 1 1];
AIC = zeros(size(pq,1),1); % Preallocation
for j = 1:size(pq,1)
    Mdl = egarch(pq(j,1),pq(j,2));
    EstMdl = estimate(Mdl,Y,'Display','off');
    results = summarize(EstMdl);
    AIC(j) = results.AIC;
end

Compare the AIC values among the models.

[minAIC,bestidx] = min(AIC,[],1);
bestPQ = pq(bestidx,:)

bestPQ = 1×2

     0     1

The best fitting model is the EGARCH(0,1) model because its corresponding AIC is the lowest. This model also has the structure of the model used to simulate the data.
Input Arguments

Mdl — Conditional variance model
garch model object | egarch model object | gjr model object

Conditional variance model, specified as a garch, egarch, or gjr model object returned by estimate, garch, egarch, or gjr.

Output Arguments

results — Model summary
structure array | garch model object | egarch model object | gjr model object

Model summary, returned as a structure array or a garch, egarch, or gjr model object.
• If Mdl is an estimated model, then results is a structure array containing the fields in this table.

Field                     Description
Description               Model summary description (string)
SampleSize                Effective sample size (numeric scalar)
NumEstimatedParameters    Number of estimated parameters (numeric scalar)
LogLikelihood             Optimized loglikelihood value (numeric scalar)
AIC                       Akaike Information Criterion (numeric scalar)
BIC                       Bayesian Information Criterion (numeric scalar)
Table                     Maximum likelihood estimates of the model parameters with corresponding standard errors, t statistics (estimate divided by standard error), and p-values (assuming normality); a table with rows corresponding to model parameters

• If Mdl is an unestimated model, then results is a conditional variance model object that is equal to Mdl.
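The inference columns of the Table field follow directly from the estimates and standard errors. The sketch below recomputes them in base MATLAB for the GARCH(1,1) example, with inputs copied from that display. It assumes the p-values are two-sided under a standard normal reference distribution, which is consistent with the displayed numbers but is an inference from them rather than a quoted formula.

```matlab
% Estimates and standard errors copied from the GARCH(1,1) display
est = [0.0167; 0.77263; 0.19169];
se  = [0.016508; 0.07769; 0.075068];

% t statistic: estimate divided by standard error
tStat = est ./ se;

% Two-sided normal p-value, 2*(1 - Phi(|t|)), written with erfc so that
% no toolbox function is needed (two-sidedness is an assumption here)
pValue = erfc(abs(tStat)/sqrt(2));

disp(table(est, se, tStat, pValue, ...
    'RowNames', {'Constant','GARCH{1}','ARCH{1}'}))
```

Small differences from the displayed values come from the rounding of the printed estimates.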
Version History Introduced in R2012a
See Also
Objects garch | egarch | gjr
Functions estimate
summarize Summarize Markov-switching dynamic regression model estimation results
Syntax
summarize(Mdl)
summarize(Mdl,state)
results = summarize( ___ )

Description
summarize(Mdl) displays a summary of the Markov-switching dynamic regression model Mdl.
• If Mdl is an estimated model returned by estimate, then summarize displays estimation results to the MATLAB Command Window. The display includes:
  • A model description
  • Estimated transition probabilities
  • Fit statistics, which include the effective sample size, number of estimated submodel parameters and constraints, loglikelihood, and information criteria (AIC and BIC)
  • A table of submodel estimates and inferences, which includes coefficient estimates with standard errors, t-statistics, and p-values.
• If Mdl is an unestimated Markov-switching model returned by msVAR, summarize prints the standard object display (the same display that msVAR prints during model creation).

summarize(Mdl,state) displays only summary information for the submodel with name state.

results = summarize( ___ ) returns one of the following variables and does not print to the Command Window.
• If Mdl is an estimated Markov-switching model, results is a table containing the submodel estimates and inferences.
• If Mdl is an unestimated model, results is an msVAR object that is equal to Mdl.
Examples

Estimate Markov-Switching Dynamic Regression Model
Consider a two-state Markov-switching dynamic regression model of the postwar US real GDP growth rate, as estimated in [1].

Create Partially Specified Model for Estimation
Create a Markov-switching dynamic regression model for the naive estimator by specifying a two-state discrete-time Markov chain with an unknown transition matrix and AR(0) (constant only) submodels for both regimes. Label the regimes.

P = NaN(2);
mc = dtmc(P,'StateNames',["Expansion" "Recession"]);
mdl = arima(0,0,0);
Mdl = msVAR(mc,[mdl; mdl]);
Mdl is a partially specified msVAR object. NaN-valued elements of the Switch and Submodels properties indicate estimable parameters.

Create Fully Specified Model Containing Initial Values
The estimation procedure requires initial values for all estimable parameters. Create a fully specified Markov-switching dynamic regression model that has the same structure as Mdl, but set all estimable parameters to initial values. This example uses arbitrary initial values.

P0 = 0.5*ones(2);
mc0 = dtmc(P0,'StateNames',Mdl.StateNames);
mdl01 = arima('Constant',1,'Variance',1);
mdl02 = arima('Constant',-1,'Variance',1);
Mdl0 = msVAR(mc0,[mdl01; mdl02]);
Mdl0 is a fully specified msVAR object.

Load and Preprocess Data
Load the US GDP data set.

load Data_GDP

Data contains quarterly measurements of the US real GDP in the period 1947:Q1–2005:Q2. The estimation period in [1] is 1947:Q2–2004:Q2. For more details on the data set, enter Description at the command line.
Transform the data to an annualized rate series:
1 Convert the data to a quarterly rate within the estimation period.
2 Annualize the quarterly rates.

qrate = diff(Data(2:230))./Data(2:229); % Quarterly rate
arate = 100*((1 + qrate).^4 - 1);       % Annualized rate
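As a quick illustration of the annualization step, the following base-MATLAB sketch applies the same compounding formula to a single hypothetical quarterly rate of 1% (the value is made up for illustration and does not come from Data_GDP) and compares it with simple multiplication by 4.

```matlab
% Hypothetical quarterly growth rate of 1% (illustrative value only)
q = 0.01;

% Compounded annualization, the same formula as applied to qrate above
annualized = 100*((1 + q)^4 - 1);

% Simple (non-compounded) scaling, shown only for comparison
simple = 100*4*q;

fprintf('Compounded: %.6f%%, simple: %.6f%%\n', annualized, simple);
```

Compounding gives a slightly larger figure than multiplying by 4, and the gap widens as the quarterly rate grows in magnitude.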
Estimate Model
Fit the model Mdl to the annualized rate series arate. Specify Mdl0 as the model containing the initial estimable parameter values.

EstMdl = estimate(Mdl,Mdl0,arate);
EstMdl is an estimated (fully specified) Markov-switching dynamic regression model. EstMdl.Switch is an estimated discrete-time Markov chain model (dtmc object), and EstMdl.Submodels is a vector of estimated univariate VAR(0) models (varm objects).
Display the estimated state-specific dynamic models.

EstMdlExp = EstMdl.Submodels(1)

EstMdlExp =
  varm with properties:

     Description: "1-Dimensional VAR(0) Model"
     SeriesNames: "Y1"
       NumSeries: 1
               P: 0
        Constant: 4.90146
              AR: {}
           Trend: 0
            Beta: [1×0 matrix]
      Covariance: 12.087

EstMdlRec = EstMdl.Submodels(2)

EstMdlRec =
  varm with properties:

     Description: "1-Dimensional VAR(0) Model"
     SeriesNames: "Y1"
       NumSeries: 1
               P: 0
        Constant: 0.0084884
              AR: {}
           Trend: 0
            Beta: [1×0 matrix]
      Covariance: 12.6876

Display the estimated state transition matrix.

EstP = EstMdl.Switch.P

EstP = 2×2

    0.9088    0.0912
    0.2303    0.7697
Display an estimation summary containing parameter estimates and inferences.

summarize(EstMdl)

Description
1-Dimensional msVAR Model with 2 Submodels

Switch
Estimated Transition Matrix:
    0.909  0.091
    0.230  0.770

Fit
Effective Sample Size: 228
Number of Estimated Parameters: 2
Number of Constrained Parameters: 0
LogLikelihood: -639.496
AIC: 1282.992
BIC: 1289.851

Submodels
                           Estimate     StandardError    TStatistic      PValue
                           _________    _____________    __________    ___________
    State 1 Constant(1)       4.9015       0.23023         21.289      1.4301e-100
    State 2 Constant(1)    0.0084884        0.2359       0.035983           0.9713
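Display Estimation Summary Separately for Each State
Create the following fully specified Markov-switching model as the data-generating process (DGP).
• State transition matrix:
$$P = \begin{bmatrix} 0.5 & 0.2 & 0.3 \\ 0.2 & 0.6 & 0.2 \\ 0.2 & 0.1 & 0.7 \end{bmatrix}.$$
• State 1:
$$\begin{bmatrix} y_{1,t} \\ y_{2,t} \end{bmatrix} = \begin{bmatrix} -1 \\ -1 \end{bmatrix} + \begin{bmatrix} -0.5 & 0.1 \\ 0.2 & -0.75 \end{bmatrix} \begin{bmatrix} y_{1,t-1} \\ y_{2,t-1} \end{bmatrix} + \varepsilon_{1,t}, \quad \text{where } \varepsilon_{1,t} \sim N_2\!\left( \begin{bmatrix} 0 \\ 0 \end{bmatrix}, \begin{bmatrix} 0.5 & 0 \\ 0 & 1 \end{bmatrix} \right).$$
• State 2:
$$\begin{bmatrix} y_{1,t} \\ y_{2,t} \end{bmatrix} = \begin{bmatrix} -1 \\ 2 \end{bmatrix} + \varepsilon_{2,t}, \quad \text{where } \varepsilon_{2,t} \sim N_2\!\left( \begin{bmatrix} 0 \\ 0 \end{bmatrix}, \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} \right).$$
• State 3:
$$\begin{bmatrix} y_{1,t} \\ y_{2,t} \end{bmatrix} = \begin{bmatrix} 1 \\ 2 \end{bmatrix} + \begin{bmatrix} 0.5 & 0.1 \\ 0.2 & 0.75 \end{bmatrix} \begin{bmatrix} y_{1,t-1} \\ y_{2,t-1} \end{bmatrix} + \varepsilon_{3,t}, \quad \text{where } \varepsilon_{3,t} \sim N_2\!\left( \begin{bmatrix} 0 \\ 0 \end{bmatrix}, \begin{bmatrix} 1 & -0.1 \\ -0.1 & 2 \end{bmatrix} \right).$$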
PDGP = [0.5 0.2 0.3; 0.2 0.6 0.2; 0.2 0.1 0.7];
mcDGP = dtmc(PDGP);
constant1 = [-1; -1];
constant2 = [-1; 2];
constant3 = [1; 2];
AR1 = [-0.5 0.1; 0.2 -0.75];
AR3 = [0.5 0.1; 0.2 0.75];
Sigma1 = [0.5 0; 0 1];
Sigma2 = eye(2);
Sigma3 = [1 -0.1; -0.1 2];
mdl1DGP = varm(Constant=constant1,AR={AR1},Covariance=Sigma1);
mdl2DGP = varm(Constant=constant2,Covariance=Sigma2);
mdl3DGP = varm(Constant=constant3,AR={AR3},Covariance=Sigma3);
mdlDGP = [mdl1DGP; mdl2DGP; mdl3DGP];
MdlDGP = msVAR(mcDGP,mdlDGP);

Generate a random response path of length 1000 from the DGP.

rng(1) % For reproducibility
Y = simulate(MdlDGP,1000);

Create a partially specified Markov-switching model that has the same structure as the DGP, but in which the transition matrix and all submodel coefficients and innovations covariance matrices are unknown and estimable.

mc = dtmc(nan(3));
mdlar = varm(2,1);
mdlc = varm(2,0);
Mdl = msVAR(mc,[mdlar; mdlc; mdlar]);
Initialize the estimation procedure by fully specifying a Markov-switching model that has the same structure as Mdl, but has the following parameter values:
• A randomly drawn transition matrix
• Randomly drawn constant vectors for each model
• AR self lags of 0.1 and cross lags of 0
• The identity matrix for the innovations covariance

P0 = randi(10,3,3);
mc0 = dtmc(P0);
constant01 = randn(2,1);
constant02 = randn(2,1);
constant03 = randn(2,1);
AR0 = 0.1*eye(2);
Sigma0 = eye(2);
mdl01 = varm(Constant=constant01,AR={AR0},Covariance=Sigma0);
mdl02 = varm(Constant=constant02,Covariance=Sigma0);
mdl03 = varm(Constant=constant03,AR={AR0},Covariance=Sigma0);
submdl0 = [mdl01; mdl02; mdl03];
Mdl0 = msVAR(mc0,submdl0);
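The matrix drawn by randi(10,3,3) contains positive integers, so its rows do not sum to 1. The example relies on dtmc rescaling each row into a probability vector. The following base-MATLAB sketch shows the equivalent row normalization explicitly; treat the claim that it matches dtmc's internal handling as an assumption to verify against the dtmc reference page, and note that P0norm is an illustrative name.

```matlab
rng(1)                       % seed chosen only for this illustration
P0 = randi(10,3,3);          % positive integer weights, not probabilities

% Divide each row by its sum so that every row sums to 1
P0norm = P0 ./ sum(P0,2);

disp(P0norm)
disp(sum(P0norm,2)')         % each entry is 1 up to floating-point rounding
```

Normalizing row by row keeps the relative weights within each row, so a larger integer still translates into a larger initial transition probability.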
Fit the Markov-switching model to the simulated series. Plot the loglikelihood after each iteration of the EM algorithm.

EstMdl = estimate(Mdl,Mdl0,Y,IterationPlot=true);

The plot displays the evolution of the loglikelihood with increasing iterations of the EM algorithm. The procedure terminates when one of the stopping criteria is satisfied.
Display an estimation summary of the model.
summarize(EstMdl)

Description
2-Dimensional msVAR Model with 3 Submodels

Switch
Estimated Transition Matrix:
    0.501  0.245  0.254
    0.204  0.549  0.247
    0.188  0.102  0.710

Fit
Effective Sample Size: 999
Number of Estimated Parameters: 14
Number of Constrained Parameters: 0
LogLikelihood: -3634.005
AIC: 7296.010
BIC: 7364.704

Submodels
                           Estimate    StandardError    TStatistic      PValue
                           ________    _____________    __________    ___________
    State 1 Constant(1)    -0.98929      0.023779        -41.603                0
    State 1 Constant(2)     -1.0884      0.030164        -36.083      4.1957e-285
    State 1 AR{1}(1,1)     -0.48446       0.01547        -31.316      2.8121e-215
    State 1 AR{1}(2,1)       0.1835      0.019624         9.3509       8.6868e-21
    State 1 AR{1}(1,2)     0.083953     0.0070162         11.966       5.3839e-33
    State 1 AR{1}(2,2)     -0.72972     0.0089002        -81.989                0
    State 2 Constant(1)     -0.9082      0.030103         -30.17      5.9064e-200
    State 2 Constant(2)      1.9514      0.030483         64.016                0
    State 3 Constant(1)      1.1212      0.044427         25.237      1.5818e-140
    State 3 Constant(2)      1.9561        0.0593         32.986      1.2831e-238
    State 3 AR{1}(1,1)      0.48965      0.023149         21.152       2.6484e-99
    State 3 AR{1}(2,1)      0.22688      0.030899         7.3427       2.0936e-13
    State 3 AR{1}(1,2)     0.095847      0.012005         7.9838       1.4188e-15
    State 3 AR{1}(2,2)      0.72766      0.016024          45.41                0
Display an estimation summary separately for each state.
summarize(EstMdl,1)

Description
2-Dimensional VAR Submodel, State 1

Submodel
                           Estimate    StandardError    TStatistic      PValue
                           ________    _____________    __________    ___________
    State 1 Constant(1)    -0.98929      0.023779        -41.603                0
    State 1 Constant(2)     -1.0884      0.030164        -36.083      4.1957e-285
    State 1 AR{1}(1,1)     -0.48446       0.01547        -31.316      2.8121e-215
    State 1 AR{1}(2,1)       0.1835      0.019624         9.3509       8.6868e-21
    State 1 AR{1}(1,2)     0.083953     0.0070162         11.966       5.3839e-33
    State 1 AR{1}(2,2)     -0.72972     0.0089002        -81.989                0

summarize(EstMdl,2)

Description
2-Dimensional VAR Submodel, State 2

Submodel
                           Estimate    StandardError    TStatistic      PValue
                           ________    _____________    __________    ___________
    State 2 Constant(1)     -0.9082      0.030103         -30.17      5.9064e-200
    State 2 Constant(2)      1.9514      0.030483         64.016                0

summarize(EstMdl,3)

Description
2-Dimensional VAR Submodel, State 3

Submodel
                           Estimate    StandardError    TStatistic      PValue
                           ________    _____________    __________    ___________
    State 3 Constant(1)      1.1212      0.044427         25.237      1.5818e-140
    State 3 Constant(2)      1.9561        0.0593         32.986      1.2831e-238
    State 3 AR{1}(1,1)      0.48965      0.023149         21.152       2.6484e-99
    State 3 AR{1}(2,1)      0.22688      0.030899         7.3427       2.0936e-13
    State 3 AR{1}(1,2)     0.095847      0.012005         7.9838       1.4188e-15
    State 3 AR{1}(2,2)      0.72766      0.016024          45.41                0
Return Estimation Summary Table
Consider the model for the US GDP growth rate in “Estimate Markov-Switching Dynamic Regression Model” on page 12-2231.
Create a Markov-switching dynamic regression model for the naive estimator.
P = NaN(2);
mc = dtmc(P,'StateNames',["Expansion" "Recession"]);
mdl = arima(0,0,0);
Mdl = msVAR(mc,[mdl; mdl]);
Create a fully specified Markov-switching dynamic regression model that has the same structure as Mdl, but set all estimable parameters to initial values.
P0 = 0.5*ones(2);
mc0 = dtmc(P0,'StateNames',Mdl.StateNames);
mdl01 = arima('Constant',1,'Variance',1);
mdl02 = arima('Constant',-1,'Variance',1);
Mdl0 = msVAR(mc0,[mdl01; mdl02]);
Load the US GDP data set. Preprocess the data.
load Data_GDP
qrate = diff(Data(2:230))./Data(2:229); % Quarterly rate
arate = 100*((1 + qrate).^4 - 1);       % Annualized rate
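To see what the preprocessing does, the following sketch applies the same two transformations to a short, made-up series of levels. The values in levels are hypothetical and are not taken from Data_GDP. A quarterly growth rate of 1% compounds to an annualized rate of about 4.06%.

```matlab
levels = [100; 101; 102.01];               % Hypothetical quarterly levels growing 1% per quarter
qrate = diff(levels)./levels(1:end-1);     % Quarterly growth rate: (x(t) - x(t-1))/x(t-1)
arate = 100*((1 + qrate).^4 - 1);          % Compound over four quarters, in percent
disp(arate)
```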
Fit the model Mdl to the annualized rate series arate. Specify Mdl0 as the model containing the initial estimable parameter values.
EstMdl = estimate(Mdl,Mdl0,arate);
Return an estimation summary table.
results = summarize(EstMdl)
results=2×4 table
                           Estimate     StandardError    TStatistic      PValue
                           _________    _____________    __________    ___________
    State 1 Constant(1)       4.9015       0.23023          21.289     1.4301e-100
    State 2 Constant(1)    0.0084884        0.2359        0.035983          0.9713
results is a table containing estimates and inferences for all submodel coefficients.
Identify significant coefficient estimates.
results.Properties.RowNames(results.PValue < 0.05)
ans = 1x1 cell array
    {'State 1 Constant(1)'}
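The TStatistic and PValue columns can be reproduced from Estimate and StandardError. The sketch below assumes that each t-statistic is the estimate divided by its standard error and that the p-values are two-sided and based on the standard normal distribution. That assumption agrees with the displayed values, but it is inferred from the output rather than stated on this page. The numbers are copied from the summary table for the GDP model.

```matlab
est = [4.9015; 0.0084884];            % Estimate column
se  = [0.23023; 0.2359];              % StandardError column
tStat = est./se;                      % t-statistic: estimate divided by standard error
pValue = erfc(abs(tStat)/sqrt(2));    % Two-sided normal p-value, 2*(1 - Phi(|t|))
isSignificant = pValue < 0.05;        % Same 5% rule as in the example
disp(table(est,se,tStat,pValue,isSignificant))
```

Only the first coefficient passes the 5% rule, which matches the single row name returned in the example.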
Input Arguments
Mdl — Markov-switching dynamic regression model
msVAR object
Markov-switching dynamic regression model, specified as an msVAR object returned by estimate or msVAR.
state — State to summarize
integer in 1:Mdl.NumStates (default) | state name in Mdl.StateNames
State to summarize, specified as an integer in 1:Mdl.NumStates or a state name in Mdl.StateNames. The default summarizes all states.
Example: summarize(Mdl,3) summarizes the third state in Mdl.
Example: summarize(Mdl,"Recession") summarizes the state labeled "Recession" in Mdl.
Data Types: double | char | string
Output Arguments
results — Model summary
table | msVAR object
Model summary, returned as a table or an msVAR object.
• If Mdl is an estimated Markov-switching model returned by estimate, results is a table of summary information for the submodel parameter estimates. Each row corresponds to a submodel coefficient. Columns correspond to the estimate (Estimate), standard error (StandardError), t-statistic (TStatistic), and the p-value (PValue). When the summary includes all states (the default), results.Properties stores the following fit statistics:

Field                     Description
Description               Model summary description (character vector)
EffectiveSampleSize       Effective sample size (numeric scalar)
NumEstimatedParameters    Number of estimated parameters (numeric scalar)
NumConstraints            Number of equality constraints (numeric scalar)
LogLikelihood             Optimized loglikelihood value (numeric scalar)
AIC                       Akaike information criterion (numeric scalar)
BIC                       Bayesian information criterion (numeric scalar)

• If Mdl is an unestimated model, results is an msVAR object that is equal to Mdl.
Note When results is a table, it contains only submodel parameter estimates:
• Mdl.Switch contains estimated transition probabilities.
• Mdl.Submodels(j).Covariance contains the estimated residual covariance matrix of state j.
For details, see msVAR.
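The information criteria follow from the loglikelihood, the number of estimated parameters, and the effective sample size. As a check, this sketch recomputes AIC and BIC from the fit statistics that summarize(EstMdl) displays for the three-state model in the first example, assuming the usual definitions AIC = −2 logL + 2k and BIC = −2 logL + k log(n). The variable names are illustrative.

```matlab
logL = -3634.005;            % LogLikelihood from the display
k = 14;                      % Number of Estimated Parameters
n = 999;                     % Effective Sample Size
aic = -2*logL + 2*k;         % Akaike information criterion
bic = -2*logL + k*log(n);    % Bayesian information criterion
fprintf('AIC = %.3f, BIC = %.3f\n',aic,bic)
```

The results agree with the displayed AIC and BIC up to rounding of the displayed loglikelihood.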
Algorithms
estimate implements a version of Hamilton's Expectation-Maximization (EM) algorithm, as described in [3]. The standard errors, loglikelihood, and information criteria are conditional on optimal parameter values in the estimated transition matrix Mdl.Switch. In particular, standard errors do not account for variation in estimated transition probabilities.
Version History Introduced in R2021b
References
[1] Chauvet, M., and J. D. Hamilton. "Dating Business Cycle Turning Points." In Nonlinear Analysis of Business Cycles (Contributions to Economic Analysis, Volume 276). (C. Milas, P. Rothman, and D. van Dijk, eds.). Amsterdam: Emerald Group Publishing Limited, 2006.
[2] Hamilton, J. D. "Analysis of Time Series Subject to Changes in Regime." Journal of Econometrics. Vol. 45, 1990, pp. 39–70.
[3] Hamilton, J. D. Time Series Analysis. Princeton, NJ: Princeton University Press, 1994.
[4] Hamilton, J. D. "Macroeconomic Regimes and Regime Shifts." In Handbook of Macroeconomics. (H. Uhlig and J. Taylor, eds.). Amsterdam: Elsevier, 2016.
See Also Objects msVAR | dtmc Functions estimate
summarize Class: regARIMA Display estimation results of regression model with ARIMA errors
Syntax summarize(Mdl) results = summarize(Mdl)
Description summarize(Mdl) displays a summary of the regression model with ARIMA errors Mdl. • If Mdl is an estimated model returned by estimate, then summarize prints estimation results to the MATLAB Command Window. The display includes an estimation summary and a table of parameter estimates with corresponding standard errors, t statistics, and p-values. The estimation summary includes fit statistics, such as the Akaike Information Criterion (AIC), and the estimated innovations variance. • If Mdl is an unestimated model returned by regARIMA, then summarize prints the standard object display (the same display that regARIMA prints during model creation). results = summarize(Mdl) returns one of the following variables and does not print to the Command Window. • If Mdl is an estimated model, then results is a structure containing estimation results. • If Mdl is an unestimated model, then results is a regARIMA model object that is equal to Mdl.
Input Arguments Mdl — Regression model with ARIMA errors regARIMA model object Regression model with ARIMA errors, specified as a regARIMA model object returned by estimate or regARIMA.
Output Arguments
results — Model summary
structure array | regARIMA model object
Model summary, returned as a structure array or a regARIMA model object.
• If Mdl is an estimated model, then results is a structure array containing the fields in this table.

Field                     Description
Description               Model summary description (string)
SampleSize                Effective sample size (numeric scalar)
NumEstimatedParameters    Number of estimated parameters (numeric scalar)
LogLikelihood             Optimized loglikelihood value (numeric scalar)
AIC                       Akaike Information Criterion (numeric scalar)
BIC                       Bayesian Information Criterion (numeric scalar)
Table                     Maximum likelihood estimates of the model parameters with corresponding standard errors, t statistics (estimate divided by standard error), and p-values (assuming normality); a table with rows corresponding to model parameters

• If Mdl is an unestimated model, then results is a regARIMA model object that is equal to Mdl.
Examples
Display Estimation Results
Regress the US gross domestic product (GDP) onto the US consumer price index (CPI) using a regression model with ARMA(1,1) errors, and summarize the results.
Load the US Macroeconomic data set and preprocess the data.
load Data_USEconModel;
logGDP = log(DataTimeTable.GDP);
dlogGDP = diff(logGDP);
dCPI = diff(DataTimeTable.CPIAUCSL);
Fit the model to the data.
Mdl = regARIMA('ARLags',1,'MALags',1);
EstMdl = estimate(Mdl,dlogGDP,'X',dCPI,'Display','off');
Display the estimates.
summarize(EstMdl)

ARMA(1,1) Error Model (Gaussian Distribution)
Effective Sample Size: 248
Number of Estimated Parameters: 5
LogLikelihood: 798.406
AIC: -1586.81
BIC: -1569.24

                   Value       StandardError    TStatistic      PValue
                 __________    _____________    __________    __________
    Intercept      0.014776      0.0014627        10.102      5.4238e-24
    AR{1}           0.60527        0.08929        6.7787      1.2124e-11
    MA{1}          -0.16165        0.10956       -1.4755         0.14009
    Beta(1)        0.002044     0.00070616        2.8946       0.0037969
    Variance     9.3578e-05     6.0314e-06        15.515      2.7338e-54
Extract Estimation Results from Estimated Model
Estimate several models by passing the data to estimate. Vary the autoregressive and moving average degrees p and q, respectively. Estimation results contain the AIC, which you can extract and then compare among the models.
Simulate response and predictor data for the regression model with ARMA errors:
$$y_t = 2 + X_t \begin{bmatrix} -2 \\ 1.5 \end{bmatrix} + u_t$$
$$u_t = 0.75u_{t-1} - 0.5u_{t-2} + \varepsilon_t + 0.7\varepsilon_{t-1},$$
where $\varepsilon_t$ is Gaussian with mean 0 and variance 1.
Mdl0 = regARIMA('Intercept',2,'Beta',[-2; 1.5],...
    'AR',{0.75, -0.5},'MA',0.7,'Variance',1);
rng(2); % For reproducibility
X = randn(1000,2); % Predictors
y = simulate(Mdl0,1000,'X',X);
To determine the number of AR and MA lags, create and estimate multiple regression models with ARMA(p, q) errors. Vary p = 1,...,3 and q = 1,...,3 among the models. Suppress all estimation displays. Extract the AIC from the estimation results structure. The field AIC stores the AIC.
pMax = 3;
qMax = 3;
AIC = zeros(pMax,qMax); % Preallocation
for p = 1:pMax
    for q = 1:qMax
        Mdl = regARIMA(p,0,q);
        EstMdl = estimate(Mdl,y,'X',X,'Display','off');
        results = summarize(EstMdl);
        AIC(p,q) = results.AIC;
    end
end
Compare the AIC values among the models.
minAIC = min(min(AIC))
minAIC = 2.9280e+03
[bestP,bestQ] = find(AIC == minAIC)
bestP = 2
bestQ = 1
The best fitting model is the regression model with ARMA(2,1) errors because its corresponding AIC is the lowest. This model also has the structure of the model used to simulate the data.
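An alternative way to locate the minimum, shown here as a sketch, is to take the linear index returned by min and convert it to row and column subscripts by using ind2sub. This approach avoids the exact floating-point comparison AIC == minAIC. The matrix AICdemo contains made-up values for illustration; it does not hold the results of the estimation in this example.

```matlab
AICdemo = [2950 2941 2943;      % Hypothetical AIC values; rows index p, columns index q
           2928 2930 2932;
           2930 2931 2934];
[minAIC,idx] = min(AICdemo(:));                 % Smallest value and its linear index
[bestP,bestQ] = ind2sub(size(AICdemo),idx);     % Convert the index to (p,q) subscripts
fprintf('Minimum AIC %g at p = %d, q = %d\n',minAIC,bestP,bestQ)
```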
Version History Introduced in R2018a
See Also Objects regARIMA Functions estimate
summarize Distribution summary statistics of standard Bayesian linear regression model
Syntax summarize(Mdl) SummaryStatistics = summarize(Mdl)
Description
To obtain a summary of a Bayesian linear regression model for predictor selection, see summarize.
summarize(Mdl) displays, at the command line, a tabular summary of the random regression coefficients and disturbance variance of the standard Bayesian linear regression model Mdl (see “Bayesian Linear Regression Model” on page 12-2248). For each parameter, the summary includes the:
• Standard deviation (square root of the variance)
• 95% equitailed credible intervals
• Probability that the parameter is greater than 0
• Description of the distributions, if known
SummaryStatistics = summarize(Mdl) returns a structure array that stores a:
• Table containing the summary of the regression coefficients and disturbance variance
• Table containing the covariances between variables
• Description of the joint distribution of the parameters
Examples
Summarize Posterior Distribution
Consider the multiple linear regression model that predicts the US real gross national product (GNPR) using a linear combination of industrial production index (IPI), total employment (E), and real wages (WR).
$$\text{GNPR}_t = \beta_0 + \beta_1 \text{IPI}_t + \beta_2 \text{E}_t + \beta_3 \text{WR}_t + \varepsilon_t.$$
For all t time points, $\varepsilon_t$ is a series of independent Gaussian disturbances with a mean of 0 and variance $\sigma^2$. Assume these prior distributions:
• $\beta \mid \sigma^2 \sim N_4(M, \sigma^2 V)$. M is a 4-by-1 vector of means, and V is a scaled 4-by-4 positive definite covariance matrix.
• $\sigma^2 \sim IG(A, B)$. A and B are the shape and scale, respectively, of an inverse gamma distribution.
These assumptions and the data likelihood imply a normal-inverse-gamma conjugate model.
Create a normal-inverse-gamma conjugate prior model for the linear regression parameters. Specify the number of predictors p and the variable names.
p = 3;
VarNames = ["IPI" "E" "WR"];
PriorMdl = bayeslm(p,'ModelType','conjugate','VarNames',VarNames);
PriorMdl is a conjugateblm Bayesian linear regression model object representing the prior distribution of the regression coefficients and disturbance variance.
Summarize the prior distribution.
summarize(PriorMdl)

           |  Mean     Std            CI95          Positive      Distribution
-----------------------------------------------------------------------------------
 Intercept |  0      70.7107  [-141.273, 141.273]    0.500    t (0.00, 57.74^2, 6)
 IPI       |  0      70.7107  [-141.273, 141.273]    0.500    t (0.00, 57.74^2, 6)
 E         |  0      70.7107  [-141.273, 141.273]    0.500    t (0.00, 57.74^2, 6)
 WR        |  0      70.7107  [-141.273, 141.273]    0.500    t (0.00, 57.74^2, 6)
 Sigma2    | 0.5000   0.5000  [ 0.138,  1.616]       1.000    IG(3.00, 1)
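The Std column for the coefficients can be related to the Distribution column. Assuming that t (0.00, 57.74^2, 6) denotes a Student's t distribution with location 0, scale 57.74, and 6 degrees of freedom, its standard deviation is the scale multiplied by sqrt(DoF/(DoF − 2)). The sketch below checks this reading against the displayed value. The scale is rounded in the display, so the agreement is approximate.

```matlab
scale = 57.74;                      % Scale shown in the Distribution column (rounded)
dof = 6;                            % Degrees of freedom shown in the Distribution column
sd = scale*sqrt(dof/(dof - 2));     % Standard deviation of a scaled Student's t
fprintf('Std = %.4f\n',sd)          % Compare with 70.7107 in the Std column
```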
The function displays a table of summary statistics and other information about the prior distribution at the command line.
Load the Nelson-Plosser data set and create variables for the predictor and response data.
load Data_NelsonPlosser
X = DataTable{:,PriorMdl.VarNames(2:end)};
y = DataTable.GNPR;
Estimate the posterior distributions. Suppress the estimation display.
PosteriorMdl = estimate(PriorMdl,X,y,'Display',false);
PosteriorMdl is a conjugateblm model object that contains the posterior distributions of β and σ2.
Obtain summary statistics from the posterior distribution.
summary = summarize(PosteriorMdl);
summary is a structure array containing three fields: MarginalDistributions, Covariances, and JointDistribution.
Display the marginal distribution summary and covariances by using dot notation.
summary.MarginalDistributions
ans=5×5 table
                   Mean          Std                 CI95               Positive        Distr
                 _________    __________    ________________________    _________    ____________
    Intercept      -24.249        8.7821       -41.514       -6.9847    0.0032977    {'t (-24.25,
    IPI             4.3913        0.1414        4.1134        4.6693            1    {'t (4.39, 0
    E            0.0011202    0.00032931    0.00047284     0.0017676      0.99952    {'t (0.00, 0
    WR              2.4683       0.34895        1.7822        3.1543            1    {'t (2.47, 0
    Sigma2          44.135         7.802        31.427        61.855            1    {'IG(34.00,
summary.Covariances
ans=5×5 table
                 Intercept         IPI             E              WR          Sigma2
                 __________    ___________    ___________    ___________    ______
    Intercept        77.125        0.77133     -0.0023655         0.5311         0
    IPI             0.77133       0.019994    -6.5001e-06       -0.02948         0
    E            -0.0023655    -6.5001e-06     1.0844e-07    -8.0013e-05         0
    WR               0.5311       -0.02948    -8.0013e-05        0.12177         0
    Sigma2                0              0              0              0    60.871
The MarginalDistributions field is a table of summary statistics and other information about the posterior distribution. Covariances is a table containing the covariance matrix of the parameters.
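Because Covariances holds the covariance matrix of the parameters, the Std column of MarginalDistributions should equal the square roots of its diagonal. The following sketch verifies this relationship with the displayed values and also converts the covariances to correlations. The matrix is typed in from the display, so the results match only up to the displayed precision; the names C, sd, and R are illustrative.

```matlab
% Covariance matrix copied from summary.Covariances
% (row and column order: Intercept, IPI, E, WR, Sigma2)
C = [ 77.125      0.77133    -0.0023655   0.5311      0
       0.77133    0.019994   -6.5001e-06 -0.02948     0
      -0.0023655 -6.5001e-06  1.0844e-07 -8.0013e-05  0
       0.5311    -0.02948    -8.0013e-05  0.12177     0
       0          0           0           0          60.871];
sd = sqrt(diag(C));      % Standard deviations; compare with the Std column
R = C./(sd*sd');         % Correlation matrix implied by the covariances
disp(sd')
disp(R)
```

The zero covariances between Sigma2 and the coefficients carry over to zero correlations in the last row and column of R.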
Input Arguments
Mdl — Standard Bayesian linear regression model
conjugateblm model object | semiconjugateblm model object | diffuseblm model object | empiricalblm model object | customblm model object
Standard Bayesian linear regression model, specified as a model object in this table.

Model Object        Description
conjugateblm        Dependent, normal-inverse-gamma conjugate prior or posterior model returned by bayeslm or estimate
semiconjugateblm    Independent, normal-inverse-gamma semiconjugate prior model returned by bayeslm
diffuseblm          Diffuse prior model returned by bayeslm
empiricalblm        Prior or posterior model characterized by random draws from respective distributions, returned by bayeslm or estimate
customblm           Prior distribution function that you declare, returned by bayeslm
Output Arguments
SummaryStatistics — Parameter distribution summary
structure array
Parameter distribution summary, returned as a structure array containing the information in this table.

Structure Field          Description
MarginalDistributions    Table containing a summary of the parameter distributions. Rows correspond to parameters. Columns correspond to the:
                         • Estimated posterior mean (Mean)
                         • Standard deviation (Std)
                         • 95% equitailed credible interval (CI95)
                         • Posterior probability that the parameter is greater than 0 (Positive)
                         • Description of the marginal or conditional posterior distribution of the parameter (Distribution)
                         Row names are the names in Mdl.VarNames, and the name of the last row is Sigma2.
Covariances              Table containing covariances between parameters. Rows and columns correspond to the intercept (if one exists), the regression coefficients, and disturbance variance. Row and column names are the same as the row names in MarginalDistributions.
JointDistribution        A string scalar that describes the distributions of the regression coefficients (Beta) and the disturbance variance (Sigma2) when known.

For distribution descriptions:
• N(Mu,V) denotes the normal distribution with mean Mu and variance matrix V. This distribution can be multivariate.
• IG(A,B) denotes the inverse gamma distribution with shape A and scale B.
• t(Mu,V,DoF) denotes the Student’s t distribution with mean Mu, variance V, and degrees of freedom DoF.
More About
Bayesian Linear Regression Model
A Bayesian linear regression model treats the parameters β and σ2 in the multiple linear regression (MLR) model $y_t = x_t\beta + \varepsilon_t$ as random variables.
For times t = 1,...,T:
• $y_t$ is the observed response.
• $x_t$ is a 1-by-(p + 1) row vector of observed values of p predictors. To accommodate a model intercept, $x_{1t} = 1$ for all t.
• β is a (p + 1)-by-1 column vector of regression coefficients corresponding to the variables that compose the columns of $x_t$.
• $\varepsilon_t$ is the random disturbance with a mean of zero and $\mathrm{Cov}(\varepsilon) = \sigma^2 I_{T \times T}$, while ε is a T-by-1 vector containing all disturbances.
These assumptions imply that the data likelihood is
$$\ell(\beta, \sigma^2 \mid y, x) = \prod_{t=1}^{T} \phi(y_t; x_t\beta, \sigma^2).$$
$\phi(y_t; x_t\beta, \sigma^2)$ is the Gaussian probability density with mean $x_t\beta$ and variance $\sigma^2$ evaluated at $y_t$.
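As an illustration of the likelihood above, this sketch evaluates its logarithm for given values of β and σ2 by using base MATLAB. The data, coefficient values, and variable names are made up for the example. The first column of X is a column of ones for the intercept.

```matlab
rng(0);                                 % For reproducibility of this sketch
T = 50;                                 % Number of observations (hypothetical)
X = [ones(T,1) randn(T,2)];             % Row t is x_t: intercept plus p = 2 predictors
beta = [1; 0.5; -0.25];                 % (p + 1)-by-1 coefficient vector (hypothetical)
sigma2 = 2;                             % Disturbance variance (hypothetical)
y = X*beta + sqrt(sigma2)*randn(T,1);   % Simulated responses

resid = y - X*beta;                                      % y_t - x_t*beta
logPhi = -0.5*log(2*pi*sigma2) - resid.^2/(2*sigma2);    % Log of each Gaussian density term
logL = sum(logPhi);                                      % Log of the product over t = 1,...,T
fprintf('Loglikelihood: %.4f\n',logL)
```

Working with the sum of log densities rather than the product of densities avoids numerical underflow when T is large.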
Before considering the data, you impose a joint prior distribution assumption on (β,σ2). In a Bayesian analysis, you update the distribution of the parameters by using information about the parameters obtained from the likelihood of the data. The result is the joint posterior distribution of (β,σ2) or the conditional posterior distributions of the parameters.
Version History Introduced in R2017a
See Also Objects conjugateblm | semiconjugateblm | diffuseblm | empiricalblm | customblm Functions estimate | forecast Topics “Bayesian Linear Regression” on page 6-2 “Implement Bayesian Linear Regression” on page 6-10
summarize Distribution summary statistics of Bayesian linear regression model for predictor variable selection
Syntax summarize(Mdl) SummaryStatistics = summarize(Mdl)
Description
To obtain a summary of a standard Bayesian linear regression model, see summarize.
summarize(Mdl) displays, at the command line, a tabular summary of the random regression coefficients and disturbance variance of the Bayesian linear regression model Mdl (see “Bayesian Linear Regression Model” on page 12-2253). For each parameter, the summary includes the:
• Standard deviation (square root of the variance)
• 95% equitailed credible intervals
• Probability that the parameter is greater than 0
• Description of the distributions, if known
• Marginal probability that a coefficient should be included in the model, for stochastic search variable selection (SSVS) predictor-variable-selection models
SummaryStatistics = summarize(Mdl) returns a structure array with a table summarizing the regression coefficients and disturbance variance, and a description of the joint distribution of the parameters.
Examples
Summarize Prior and Posterior Distributions
Consider the multiple linear regression model that predicts the US real gross national product (GNPR) using a linear combination of industrial production index (IPI), total employment (E), and real wages (WR).
$$\text{GNPR}_t = \beta_0 + \beta_1 \text{IPI}_t + \beta_2 \text{E}_t + \beta_3 \text{WR}_t + \varepsilon_t.$$
For all t, $\varepsilon_t$ is a series of independent Gaussian disturbances with a mean of 0 and variance $\sigma^2$.
Assume these prior distributions for k = 0,...,3:
• $\beta_k \mid \sigma^2, \gamma_k = \gamma_k \sigma \sqrt{V_{k1}} Z_1 + (1 - \gamma_k) \sigma \sqrt{V_{k2}} Z_2$, where $Z_1$ and $Z_2$ are independent, standard normal random variables. Therefore, the coefficients have a Gaussian mixture distribution. Assume all coefficients are conditionally independent, a priori, but they are dependent on the disturbance variance.
• $\sigma^2 \sim IG(A, B)$. A and B are the shape and scale, respectively, of an inverse gamma distribution.
• $\gamma_k \in \{0, 1\}$, and it represents the random variable-inclusion regime variable with a discrete uniform distribution.
Create a prior model for SSVS. Specify the number of predictors p.
p = 3;
VarNames = ["IPI" "E" "WR"];
PriorMdl = bayeslm(p,'ModelType','mixconjugate','VarNames',VarNames);
PriorMdl is a mixconjugateblm Bayesian linear regression model object for SSVS predictor selection representing the prior distribution of the regression coefficients and disturbance variance.
Summarize the prior distribution.
summarize(PriorMdl)

           |  Mean     Std          CI95        Positive      Distribution
------------------------------------------------------------------------------
 Intercept |  0      1.5890  [-3.547, 3.547]    0.500    Mixture distribution
 IPI       |  0      1.5890  [-3.547, 3.547]    0.500    Mixture distribution
 E         |  0      1.5890  [-3.547, 3.547]    0.500    Mixture distribution
 WR        |  0      1.5890  [-3.547, 3.547]    0.500    Mixture distribution
 Sigma2    | 0.5000  0.5000  [ 0.138, 1.616]    1.000    IG(3.00, 1)
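The Std value of 1.5890 for each coefficient is consistent with the Gaussian mixture prior described in this example. The sketch below assumes mixture variance factors of 10 and 0.1 for the two regimes; these two values are assumptions chosen for illustration and are not stated on this page. The inclusion probability of 0.5 follows from the discrete uniform distribution of the regime variable, and the prior mean of σ2 is taken from the Sigma2 row of the display. Because the prior mean of each coefficient is 0 and exactly one regime is active at a time, the variance is the mean of σ2 times the probability-weighted average of the two variance factors.

```matlab
V1 = 10;                 % Assumed variance factor when gamma = 1 (predictor included)
V2 = 0.1;                % Assumed variance factor when gamma = 0 (predictor excluded)
probInclude = 0.5;       % Prior probability that gamma = 1 (discrete uniform)
meanSigma2 = 0.5;        % Prior mean of sigma2, from the Sigma2 row of the display
varBeta = meanSigma2*(probInclude*V1 + (1 - probInclude)*V2);   % Prior variance of beta_k
fprintf('Std = %.4f\n',sqrt(varBeta))    % Compare with 1.5890 in the Std column
```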
The function displays a table of summary statistics and other information about the prior distribution at the command line.
Load the Nelson-Plosser data set, and create variables for the predictor and response data.
load Data_NelsonPlosser
X = DataTable{:,PriorMdl.VarNames(2:end)};
y = DataTable.GNPR;
Estimate the posterior distributions. Suppress the estimation display.
PosteriorMdl = estimate(PriorMdl,X,y,'Display',false);
PosteriorMdl is an empiricalblm model object that contains the posterior distributions of β and σ2.
Obtain summary statistics from the posterior distribution.
summary = summarize(PosteriorMdl);
summary is a structure array containing two fields: MarginalDistributions and JointDistribution.
Display the marginal distribution summary by using dot notation.
summary.MarginalDistributions
ans=5×5 table
                    Mean          Std                 CI95              Positive    Distribution
                 __________    _________    ________________________    ________    _____________
    Intercept        -18.66       10.348       -37.006        0.8406      0.0412    {'Empirical'}
    IPI              4.4555      0.15287        4.1561        4.7561           1    {'Empirical'}
    E            0.00096765    0.0003759    0.00021479     0.0016644      0.9968    {'Empirical'}
    WR               2.4739      0.36337        1.7607        3.1882           1    {'Empirical'}
    Sigma2           47.773       8.6863        33.574        67.585           1    {'Empirical'}
The MarginalDistributions field is a table of summary statistics and other information about the posterior distribution.
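For an empirical model, the summary statistics describe a set of random draws. The following sketch shows one way such statistics could be computed from draws of a single parameter. The draws are simulated here, the variable names are hypothetical, and the method that summarize actually uses may differ.

```matlab
rng(0);                                   % For reproducibility of this sketch
draws = 2.5 + 0.35*randn(10000,1);        % Hypothetical posterior draws of one coefficient
n = numel(draws);
sorted = sort(draws);                     % Order the draws to read off empirical quantiles
postMean = mean(draws);                   % Analogous to the Mean column
postStd = std(draws);                     % Analogous to the Std column
ci95 = [sorted(ceil(0.025*n)) sorted(ceil(0.975*n))];   % Approximate equitailed 95% interval
probPositive = mean(draws > 0);           % Share of draws greater than 0 (Positive column)
fprintf('Mean %.4f, Std %.4f, CI95 [%.4f, %.4f], Positive %.4f\n', ...
    postMean,postStd,ci95(1),ci95(2),probPositive)
```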
Input Arguments
Mdl — Bayesian linear regression model for predictor variable selection
mixconjugateblm model object | mixsemiconjugateblm model object | lassoblm model object
Bayesian linear regression model for predictor variable selection, specified as a model object in this table.

Model Object           Description
mixconjugateblm        Dependent, Gaussian-mixture-inverse-gamma conjugate model for SSVS predictor variable selection, returned by bayeslm
mixsemiconjugateblm    Independent, Gaussian-mixture-inverse-gamma semiconjugate model for SSVS predictor variable selection, returned by bayeslm
lassoblm               Bayesian lasso regression model returned by bayeslm
Output Arguments
SummaryStatistics — Parameter distribution summary
structure array
Parameter distribution summary, returned as a structure array containing the information in this table.

Structure Field          Description
MarginalDistributions    Table containing a summary of the parameter distributions. Rows correspond to parameters. Columns correspond to the:
                         • Estimated posterior mean (Mean)
                         • Standard deviation (Std)
                         • 95% equitailed credible interval (CI95)
                         • Posterior probability that the parameter is greater than 0 (Positive)
                         • Description of the marginal or conditional posterior distribution of the parameter (Distribution)
                         Row names are the names in Mdl.VarNames. The name of the last row is Sigma2.
JointDistribution        A string scalar that describes the distributions of the regression coefficients (Beta) and the disturbance variance (Sigma2) when known.

For distribution descriptions:
• N(Mu,V) denotes the normal distribution with mean Mu and variance matrix V. This distribution can be multivariate.
• IG(A,B) denotes the inverse gamma distribution with shape A and scale B.
• Mixture distribution denotes a Student’s t mixture distribution.
Note If Mdl is a lassoblm model and Mdl.Probability is a function handle representing the regime probability distribution, then summarize cannot estimate prior distribution statistics for the coefficients. Therefore, entries corresponding to coefficient statistics are NaN values.
More About
Bayesian Linear Regression Model
A Bayesian linear regression model treats the parameters β and σ2 in the multiple linear regression (MLR) model $y_t = x_t\beta + \varepsilon_t$ as random variables.
For times t = 1,...,T:
• $y_t$ is the observed response.
• $x_t$ is a 1-by-(p + 1) row vector of observed values of p predictors. To accommodate a model intercept, $x_{1t} = 1$ for all t.
• β is a (p + 1)-by-1 column vector of regression coefficients corresponding to the variables that compose the columns of $x_t$.
• $\varepsilon_t$ is the random disturbance with a mean of zero and $\mathrm{Cov}(\varepsilon) = \sigma^2 I_{T \times T}$, while ε is a T-by-1 vector containing all disturbances.
These assumptions imply that the data likelihood is
$$\ell(\beta, \sigma^2 \mid y, x) = \prod_{t=1}^{T} \phi(y_t; x_t\beta, \sigma^2).$$
$\phi(y_t; x_t\beta, \sigma^2)$ is the Gaussian probability density with mean $x_t\beta$ and variance $\sigma^2$ evaluated at $y_t$.
Before considering the data, you impose a joint prior distribution assumption on (β,σ2). In a Bayesian analysis, you update the distribution of the parameters by using information about the parameters obtained from the likelihood of the data. The result is the joint posterior distribution of (β,σ2) or the conditional posterior distributions of the parameters.
Algorithms • If Mdl is a lassoblm model object and Mdl.Probability is a numeric vector, then the 95% credible intervals on the regression coefficients are Mean + [–2 2]*Std, where Mean and Std are variables in the summary table. • If Mdl is a mixconjugateblm or mixsemiconjugateblm model object, then the 95% credible intervals on the regression coefficients are estimated from the mixture cdf. If the estimation fails, then summarize returns NaN values instead.
Version History Introduced in R2018b
See Also Objects mixconjugateblm | mixsemiconjugateblm | lassoblm Functions estimate Topics “Bayesian Linear Regression” on page 6-2 “Implement Bayesian Linear Regression” on page 6-10
summarize Summarize threshold-switching dynamic regression model estimation results
Syntax summarize(Mdl) summarize(Mdl,state) results = summarize( ___ )
Description
summarize(Mdl) displays a summary of the threshold-switching dynamic regression model Mdl.
• If Mdl is an estimated model returned by estimate, then summarize displays estimation results in the MATLAB Command Window. The display includes:
  • A model description
  • Estimated threshold transitions
  • Fit statistics, which include the effective sample size, number of estimated submodel parameters and constraints, loglikelihood, and information criteria (AIC and BIC)
  • A table of submodel estimates and inferences, which includes coefficient estimates with standard errors, t-statistics, and p-values
• If Mdl is an unestimated threshold-switching model returned by tsVAR, summarize prints the standard object display (the same display that tsVAR prints during model creation).
summarize(Mdl,state) displays only summary information for the submodel with name state.
results = summarize( ___ ) returns one of the following variables using any of the input argument combinations in the previous syntaxes. summarize does not print to the Command Window.
• If Mdl is an estimated threshold-switching model, results is a table containing the submodel estimates and inferences.
• If Mdl is an unestimated model, results is a tsVAR object that is equal to Mdl.
Examples
Fit SETAR Model to Simulated Data
Assess estimation accuracy using simulated data from a known data-generating process (DGP). This example uses arbitrary parameter values.
Create Model for DGP
Create a discrete threshold transition at mid-level 1.
ttDGP = threshold(1)
ttDGP =
  threshold with properties:

          Type: 'discrete'
        Levels: 1
         Rates: []
    StateNames: ["1"    "2"]
     NumStates: 2
ttDGP is a threshold object representing the state-switching mechanism of the DGP.
Create the following fully specified self-exciting TAR (SETAR) model for the DGP.
• State 1: $y_t = \varepsilon_t$.
• State 2: $y_t = 2 + \varepsilon_t$.
• $\varepsilon_t \sim N(0, 1)$.
Specify the submodels by using arima.
mdl1DGP = arima(Constant=0);
mdl2DGP = arima(Constant=2);
mdlDGP = [mdl1DGP mdl2DGP];
Because the innovations distribution is invariant across states, the tsVAR software ignores the value of the submodel innovations variance (Variance property).
Create a threshold-switching model for the DGP. Specify the model-wide innovations variance.
MdlDGP = tsVAR(ttDGP,mdlDGP,Covariance=1);
MdlDGP is a tsVAR object representing the DGP.
Simulate Response Paths from DGP
Generate a random response path of length 100 from the DGP. By default, simulate assumes a SETAR model with delay d = 1. In other words, the threshold variable is $y_{t-1}$.
rng(1) % For reproducibility
y = simulate(MdlDGP,100);
y is a 100-by-1 vector representing the simulated response path.
Create Model for Estimation
Create a partially specified threshold-switching model that has the same structure as the data-generating process, but specify the transition mid-level, submodel coefficients, and model-wide covariance as unknown for estimation.
tt = threshold(NaN);
mdl1 = arima('Constant',NaN);
mdl2 = arima('Constant',NaN);
Mdl = tsVAR(tt,[mdl1,mdl2],'Covariance',NaN);
Mdl is a partially specified tsVAR object representing a template for estimation. NaN-valued elements of the Switch and Submodels properties indicate estimable parameters.
Mdl is agnostic of the threshold variable; tsVAR object functions enable you to specify threshold variable characteristics or data.
Create Threshold Transitions Containing Initial Values
The estimation procedure requires initial values for all estimable threshold transition parameters. Fully specify a threshold transition that has the same structure as tt, but set the mid-level to 0.
tt0 = threshold(0);
tt0 is a fully specified threshold object.
Estimate Model
Fit the model to the simulated path. By default, the model is self-exciting and the delay of the threshold variable is d = 1.
EstMdl = estimate(Mdl,tt0,y)
EstMdl =
  tsVAR with properties:

         Switch: [1x1 threshold]
      Submodels: [2x1 varm]
      NumStates: 2
      NumSeries: 1
     StateNames: ["1"    "2"]
    SeriesNames: "1"
     Covariance: 1.0225
EstMdl is a fully specified tsVAR object representing the estimated SETAR model. Display an estimation summary of the submodels.
summarize(EstMdl)
Description
1-Dimensional tsVAR Model with 2 Submodels

Switch
Transition Type: discrete
Estimated Levels: 1.128

Fit
Effective Sample Size: 99
Number of Estimated Parameters: 2
Number of Constrained Parameters: 0
LogLikelihood: -141.574
AIC: 287.149
BIC: 292.339

Submodels
                           Estimate    StandardError    TStatistic      PValue  
                           ________    _____________    __________    __________
    State 1 Constant(1)    -0.12774       0.13241        -0.96474        0.33467
    State 2 Constant(1)      2.1774       0.16829          12.939     2.7264e-38
The estimates are close to their true values. Plot the estimated switching mechanism with the threshold data, which is the response data. figure ttplot(EstMdl.Switch,'Data',y)
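Display Estimation Summary for One State
Create the following fully specified SETAR model for the DGP.
• State 1: $\begin{bmatrix} y_{1,t} \\ y_{2,t} \end{bmatrix} = \begin{bmatrix} -1 \\ -4 \end{bmatrix} + \begin{bmatrix} -0.5 & 0.1 \\ 0.2 & -0.75 \end{bmatrix} \begin{bmatrix} y_{1,t-1} \\ y_{2,t-1} \end{bmatrix} + \begin{bmatrix} \varepsilon_{1,t} \\ \varepsilon_{2,t} \end{bmatrix}$.
• State 2: $\begin{bmatrix} y_{1,t} \\ y_{2,t} \end{bmatrix} = \begin{bmatrix} 1 \\ 4 \end{bmatrix} + \begin{bmatrix} \varepsilon_{1,t} \\ \varepsilon_{2,t} \end{bmatrix}$.
• State 3: $\begin{bmatrix} y_{1,t} \\ y_{2,t} \end{bmatrix} = \begin{bmatrix} 1 \\ 4 \end{bmatrix} + \begin{bmatrix} 0.5 & 0.1 \\ 0.2 & 0.75 \end{bmatrix} \begin{bmatrix} y_{1,t-1} \\ y_{2,t-1} \end{bmatrix} + \begin{bmatrix} \varepsilon_{1,t} \\ \varepsilon_{2,t} \end{bmatrix}$.
• $\begin{bmatrix} \varepsilon_{1,t} \\ \varepsilon_{2,t} \end{bmatrix} \sim N_2\left( \begin{bmatrix} 0 \\ 0 \end{bmatrix}, \begin{bmatrix} 2 & -1 \\ -1 & 1 \end{bmatrix} \right)$.
• The system is in state 1 when $y_{2,t-4} < -3$, the system is in state 2 when $-3 \le y_{2,t-4} < 3$, and the system is in state 3 otherwise.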
t = [-3 3];
ttDGP = threshold(t);
constant1 = [-1; -4];
constant2 = [1; 4];
constant3 = [1; 4];
AR1 = [-0.5 0.1; 0.2 -0.75];
AR3 = [0.5 0.1; 0.2 0.75];
Sigma = [2 -1; -1 1];
mdl1DGP = varm(Constant=constant1,AR={AR1});
mdl2DGP = varm(Constant=constant2);
mdl3DGP = varm(Constant=constant3,AR={AR3});
mdlDGP = [mdl1DGP; mdl2DGP; mdl3DGP];
MdlDGP = tsVAR(ttDGP,mdlDGP,Covariance=Sigma);
Display a summary of the unestimated DGP.
summarize(MdlDGP)
Mdl = 
  tsVAR with properties:

         Switch: [1x1 threshold]
      Submodels: [3x1 varm]
      NumStates: 3
      NumSeries: 2
     StateNames: ["1"    "2"    "3"]
    SeriesNames: ["1"    "2"]
     Covariance: [2x2 double]
summarize prints an object display.
Generate a random response path of length 500 from the DGP. Specify the second response variable with a delay of 4 as the threshold variable.
rng(10) % For reproducibility
y = simulate(MdlDGP,500,Index=2,Delay=4);
Create a partially specified threshold-switching model that has the same structure as the DGP, but specify the transition mid-levels, submodel coefficients, and model-wide covariance as unknown for estimation.
tt = threshold([NaN; NaN]);
mdlar = varm(2,1);
mdlc = varm(2,0);
Mdl = tsVAR(tt,[mdlar; mdlc; mdlar],Covariance=nan(2));
Fully specify a threshold transition that has the same structure as tt, but set the mid-levels to -1 and 1. t0 = [-1 1]; tt0 = threshold(t0);
Fit the threshold-switching model to the simulated series. Specify the threshold variable y2, t − 4. Plot the loglikelihood after each iteration of the threshold search algorithm. EstMdl = estimate(Mdl,tt0,y,IterationPlot=true,Index=2,Delay=4);
The plot displays the evolution of the loglikelihood as the estimation procedure searches for optimal levels. The procedure terminates when one of the stopping criteria is satisfied.
Display an estimation summary for state 3 only.
summarize(EstMdl,3)
Description
2-Dimensional VAR Submodel, State 3

Submodel
                           Estimate    StandardError    TStatistic      PValue  
                           ________    _____________    __________    __________
    State 3 Constant(1)     1.0621       0.095701         11.098      1.2802e-28
    State 3 Constant(2)     3.8707       0.068772         56.284               0
    State 3 AR{1}(1,1)     0.47396       0.058016         8.1694      3.0997e-16
    State 3 AR{1}(2,1)     0.23013       0.041691         5.5199      3.3927e-08
    State 3 AR{1}(1,2)     0.10561       0.018233         5.7924      6.9371e-09
    State 3 AR{1}(2,2)      0.7568       0.013102         57.761               0

Return Estimation Summary Table
Create a discrete threshold transition at mid-level 1.
ttDGP = threshold(1)
ttDGP = 
  threshold with properties:

          Type: 'discrete'
        Levels: 1
         Rates: []
    StateNames: ["1"    "2"]
     NumStates: 2
Create the following fully specified self-exciting TAR (SETAR) model for the DGP.
• State 1: yt = εt.
• State 2: yt = 2 + εt.
• εt ∼ N(0, 1).
Specify the submodels by using arima.
mdl1DGP = arima(Constant=0);
mdl2DGP = arima(Constant=2);
mdlDGP = [mdl1DGP mdl2DGP];
Create a threshold-switching model for the DGP. Specify the model-wide innovations variance. MdlDGP = tsVAR(ttDGP,mdlDGP,Covariance=1);
Generate a random response path of length 100 from the DGP. By default, simulate assumes a SETAR model with delay d = 1. In other words, the threshold variable is y_{t−1}.
rng(1) % For reproducibility
y = simulate(MdlDGP,100);
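To make the role of the delay concrete, the following is a minimal base-MATLAB sketch of the recursion that a two-state SETAR model with d = 1 implies. It is not the toolbox's simulate implementation and does not reproduce the path in y. The presample value yprev = 0 and the variable names ysketch and state are assumptions made for illustration.

```matlab
% Sketch of a two-state SETAR recursion with delay d = 1 (base MATLAB only).
rng(1)                      % for reproducibility of this sketch
T = 100;                    % path length
level = 1;                  % threshold mid-level of the DGP
c = [0 2];                  % constants of state 1 and state 2
yprev = 0;                  % assumed presample value (hypothetical)
ysketch = zeros(T,1);
state = zeros(T,1);
for t = 1:T
    state(t) = 1 + (yprev >= level);    % state 2 when y(t-1) >= level
    ysketch(t) = c(state(t)) + randn;   % add an N(0,1) innovation
    yprev = ysketch(t);                 % y(t) becomes the next threshold value
end
```

Because the threshold variable is the lagged response itself, the state at time t is known as soon as y(t−1) is observed.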
Create a partially specified threshold-switching model that has the same structure as the data-generating process, but specify the transition mid-level, submodel coefficients, and model-wide innovations variance as unknown for estimation.
tt = threshold(NaN);
mdl1 = arima('Constant',NaN);
mdl2 = arima('Constant',NaN);
Mdl = tsVAR(tt,[mdl1,mdl2],'Covariance',NaN);
Fully specify a threshold transition that has the same structure as tt, but set the mid-level to 0. tt0 = threshold(0);
Fit the model to the simulated path. By default, the model is self-exciting and the delay of the threshold variable is d = 1. EstMdl = estimate(Mdl,tt0,y);
Return an estimation summary table.
results = summarize(EstMdl)
results=2×4 table
                           Estimate    StandardError    TStatistic      PValue  
                           ________    _____________    __________    __________
    State 1 Constant(1)    -0.12774       0.13241        -0.96474        0.33467
    State 2 Constant(1)      2.1774       0.16829          12.939     2.7264e-38
results is a table containing estimates and inferences for all submodel coefficients. Identify significant coefficient estimates. results.Properties.RowNames(results.PValue < 0.05) ans = 1x1 cell array {'State 2 Constant(1)'}
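The columns of the summary table are related in a simple way. As a rough check in base MATLAB, the sketch below recomputes the t-statistics as estimate divided by standard error and the p-values as two-sided tail probabilities of a standard normal distribution. The normal reference distribution is an assumption here; it appears consistent with the displayed values but is not stated on this page.

```matlab
% Recompute TStatistic and PValue from the displayed Estimate and
% StandardError columns (values copied from the table above).
est = [-0.12774; 2.1774];
se  = [0.13241; 0.16829];
tstat = est./se;                    % t-statistic = estimate / standard error
% Two-sided standard normal p-value: 2*(1 - Phi(|t|)) = erfc(|t|/sqrt(2)).
% Assumption: a normal reference distribution.
pval = erfc(abs(tstat)/sqrt(2));
isSignificant = pval < 0.05;        % same rule as results.PValue < 0.05
```

Using erfc avoids a dependency on normcdf and stays accurate for very small tail probabilities.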
Input Arguments Mdl — Threshold-switching dynamic regression model tsVAR object Threshold-switching dynamic regression model, specified as a tsVAR object returned by estimate or tsVAR. state — State to summarize integer in 1:Mdl.NumStates (default) | state name in Mdl.StateNames State to summarize, specified as an integer in 1:Mdl.NumStates or a state name in Mdl.StateNames. The default summarizes all states. Example: summarize(Mdl,3) summarizes the third state in Mdl. Example: summarize(Mdl,"Recession") summarizes the state labeled "Recession" in Mdl. Data Types: double | char | string
Output Arguments
results — Model summary
table | tsVAR object
Model summary, returned as a table or tsVAR object.
• If Mdl is an estimated threshold-switching model returned by estimate, results is a table of summary information for the submodel parameter estimates. Each row corresponds to a submodel coefficient. Columns correspond to the estimate (Estimate), standard error (StandardError), t-statistic (TStatistic), and the p-value (PValue). When the summary includes all states (the default), results.Properties stores the following fit statistics:
Field                     Description
Description               Model summary description (character vector)
EffectiveSampleSize       Effective sample size (numeric scalar)
NumEstimatedParameters    Number of estimated parameters (numeric scalar)
NumConstraints            Number of equality constraints (numeric scalar)
LogLikelihood             Optimized loglikelihood value (numeric scalar)
AIC                       Akaike information criterion (numeric scalar)
BIC                       Bayesian information criterion (numeric scalar)
• If Mdl is an unestimated model, results is a tsVAR object that is equal to Mdl. Note When results is a table, it contains only submodel parameter estimates: • Mdl.Switch contains estimates of threshold transitions. • Threshold-switching models can have one or more residual covariance matrices. When Mdl has a model-wide covariance, Mdl.Covariance contains the estimated residual covariance. Otherwise, Mdl.Submodels(j).Covariance contains the estimated residual covariance of state j. For details, see tsVAR.
Algorithms estimate searches over levels and rates for estimated threshold transitions while solving a conditional least-squares problem for submodel parameters, as described in [2]. The standard errors, loglikelihood, and information criteria are conditional on optimal parameter values in the estimated threshold transitions Mdl.Switch. In particular, standard errors do not account for variation in estimated levels and rates.
Version History Introduced in R2021b
References
[1] Teräsvirta, Timo. "Modelling Economic Relationships with Smooth Transition Regressions." In A. Ullah and D.E.A. Giles (eds.), Handbook of Applied Economic Statistics, 507–552. New York: Marcel Dekker, 1998.
[2] van Dijk, Dick. Smooth Transition Models: Extensions and Outlier Robust Inference. Rotterdam, Netherlands: Tinbergen Institute Research Series, 1999.
See Also Objects tsVAR | threshold
Functions estimate Topics “Estimate Threshold-Switching Dynamic Regression Models” on page 10-93
summarize Display estimation results of vector autoregression (VAR) model
Syntax summarize(Mdl) results = summarize(Mdl)
Description summarize(Mdl) displays a summary of the VAR(p) model Mdl. • If Mdl is an estimated VAR model returned by estimate, then summarize prints estimation results to the MATLAB Command Window. The display includes a table of parameter estimates with corresponding standard errors, t statistics, and p-values. The summary also includes the loglikelihood, Akaike Information Criterion (AIC), and Bayesian Information Criterion (BIC) model fit statistics, as well as the estimated innovations covariance and correlation matrices. • If Mdl is an unestimated VAR model returned by varm, then summarize prints the standard object display (the same display that varm prints during model creation). results = summarize(Mdl) returns one of the following variables and does not print to the Command Window. • If Mdl is an estimated VAR model, then results is a structure containing estimation results. • If Mdl is an unestimated VAR model, then results is a varm model object that is equal to Mdl.
Examples Fit VAR(4) Model to Matrix of Response Data Fit a VAR(4) model to consumer price index (CPI) and unemployment rate series. Supply the response series as a numeric matrix. Load the Data_USEconModel data set. load Data_USEconModel
Plot the two series on separate plots. figure; plot(DataTimeTable.Time,DataTimeTable.CPIAUCSL); title('Consumer Price Index') ylabel('Index') xlabel('Date')
figure; plot(DataTimeTable.Time,DataTimeTable.UNRATE); title('Unemployment Rate'); ylabel('Percent'); xlabel('Date');
Stabilize the CPI by converting it to a series of growth rates. Synchronize the two series by removing the first observation from the unemployment rate series. rcpi = price2ret(DataTimeTable.CPIAUCSL); unrate = DataTimeTable.UNRATE(2:end);
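The synchronization step is needed because a series of n levels yields only n − 1 growth rates. The following base-MATLAB sketch illustrates this with made-up price levels, assuming the default continuously compounded (log-difference) convention of price2ret. The variable names prices and ret are hypothetical.

```matlab
% Growth rates as log differences: one fewer observation than the levels.
prices = [100; 110; 121];      % hypothetical price levels
ret = diff(log(prices));       % ret(k) = log(prices(k+1)/prices(k))
numLevels = numel(prices);     % 3
numReturns = numel(ret);       % 2, so drop the first row of any companion series
```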
Create a default VAR(4) model using the shorthand syntax.
Mdl = varm(2,4)
Mdl = 
  varm with properties:

     Description: "2-Dimensional VAR(4) Model"
     SeriesNames: "Y1"  "Y2" 
       NumSeries: 2
               P: 4
        Constant: [2×1 vector of NaNs]
              AR: {2×2 matrices of NaNs} at lags [1 2 3 ... and 1 more]
           Trend: [2×1 vector of zeros]
            Beta: [2×0 matrix]
      Covariance: [2×2 matrix of NaNs]
Mdl is a varm model object. All properties containing NaN values correspond to parameters to be estimated given data. Estimate the model using the entire data set.
EstMdl = estimate(Mdl,[rcpi unrate])
EstMdl = 
  varm with properties:

     Description: "AR-Stationary 2-Dimensional VAR(4) Model"
     SeriesNames: "Y1"  "Y2" 
       NumSeries: 2
               P: 4
        Constant: [0.00171639 0.316255]'
              AR: {2×2 matrices} at lags [1 2 3 ... and 1 more]
           Trend: [2×1 vector of zeros]
            Beta: [2×0 matrix]
      Covariance: [2×2 matrix]
EstMdl is an estimated varm model object. It is fully specified because all parameters have known values. The description indicates that the autoregressive polynomial is stationary.
Display summary statistics from the estimation.
summarize(EstMdl)
   AR-Stationary 2-Dimensional VAR(4) Model

    Effective Sample Size: 241
    Number of Estimated Parameters: 18
    LogLikelihood: 811.361
    AIC: -1586.72
    BIC: -1524

                      Value        StandardError    TStatistic      PValue  
                   ___________    _____________    __________    __________
    Constant(1)      0.0017164      0.0015988         1.0735        0.28303
    Constant(2)        0.31626       0.091961          3.439      0.0005838
    AR{1}(1,1)         0.30899       0.063356          4.877     1.0772e-06
    AR{1}(2,1)         -4.4834         3.6441        -1.2303        0.21857
    AR{1}(1,2)      -0.0031796      0.0011306        -2.8122       0.004921
    AR{1}(2,2)          1.3433       0.065032         20.656      8.546e-95
    AR{2}(1,1)         0.22433       0.069631         3.2217      0.0012741
    AR{2}(2,1)          7.1896          4.005         1.7951       0.072631
    AR{2}(1,2)       0.0012375      0.0018631         0.6642        0.50656
    AR{2}(2,2)        -0.26817        0.10716        -2.5025       0.012331
    AR{3}(1,1)         0.35333       0.068287         5.1742     2.2887e-07
    AR{3}(2,1)           1.487         3.9277        0.37858          0.705
    AR{3}(1,2)       0.0028594      0.0018621         1.5355        0.12465
    AR{3}(2,2)        -0.22709         0.1071        -2.1202       0.033986
    AR{4}(1,1)       -0.047563       0.069026       -0.68906        0.49079
    AR{4}(2,1)          8.6379         3.9702         2.1757       0.029579
    AR{4}(1,2)     -0.00096323      0.0011142       -0.86448        0.38733
    AR{4}(2,2)        0.076725       0.064088         1.1972        0.23123

   Innovations Covariance Matrix:
    0.0000   -0.0002
   -0.0002    0.1167

   Innovations Correlation Matrix:
    1.0000   -0.0925
   -0.0925    1.0000
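The fit statistics in the display can be cross-checked by hand. The sketch below applies the standard definitions AIC = −2 logL + 2k and BIC = −2 logL + k log(n) to the displayed loglikelihood, number of estimated parameters, and effective sample size. That summarize uses exactly these definitions is an assumption, although the results agree with the display to rounding.

```matlab
% Recompute AIC and BIC from the displayed fit statistics.
logL = 811.361;        % LogLikelihood
numParams = 18;        % Number of Estimated Parameters
n = 241;               % Effective Sample Size
aic = -2*logL + 2*numParams;        % approximately -1586.72
bic = -2*logL + numParams*log(n);   % approximately -1524
```

BIC penalizes each additional parameter by log(n), which exceeds the AIC penalty of 2 whenever n is greater than about 7, so BIC favors smaller models in samples of this size.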
Compare Several VAR Model Fits Consider these four VAR models of consumer price index (CPI) and unemployment rate: VAR(0), VAR(1), VAR(4), and VAR(8). Using historical data, estimate each, and then compare the model fits using the resulting BIC. Load the Data_USEconModel data set. Declare variables for the consumer price index (CPI) and unemployment rate (UNRATE) series. Remove any missing values from the beginning of the series. load Data_USEconModel cpi = DataTimeTable.CPIAUCSL; unrate = DataTimeTable.UNRATE; idx = all(~isnan([cpi unrate]),2); cpi = cpi(idx); unrate = unrate(idx);
Stabilize CPI by converting it to a series of growth rates. Synchronize the two series by removing the first observation from the unemployment rate series. rcpi = price2ret(cpi); unrate = unrate(2:end);
Within a loop:
• Create a VAR model using the shorthand syntax.
• Estimate the VAR model. Reserve the maximum value of p as presample observations.
• Store the estimation results.
numseries = 2;
p = [0 1 4 8];
estMdlResults = cell(numel(p),1); % Preallocation
Y0 = [rcpi(1:max(p)) unrate(1:max(p))];
Y = [rcpi((max(p) + 1):end) unrate((max(p) + 1):end)];
for j = 1:numel(p)
    Mdl = varm(numseries,p(j));
    EstMdl = estimate(Mdl,Y,'Y0',Y0);
    estMdlResults{j} = summarize(EstMdl);
end
estMdlResults is a 4-by-1 cell array of structure arrays containing the estimation results of each model.
Extract the BIC from each set of results.
BIC = cellfun(@(x)x.BIC,estMdlResults)
BIC = 4×1
10^3 ×
   -0.7153
   -1.3678
   -1.4378
   -1.3853
The model corresponding to the lowest BIC has the best fit among the models considered. Therefore, the VAR(4) is the best fitting model.
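The selection can also be made programmatically rather than by inspection. This short sketch copies the displayed BIC values and picks the lag order with the smallest one. The variable names idx and bestLag are illustrative.

```matlab
% Select the lag order with the smallest BIC.
p = [0 1 4 8];                                   % candidate lag orders
BIC = 1e3*[-0.7153; -1.3678; -1.4378; -1.3853];  % values from the display
[~,idx] = min(BIC);                              % position of the minimum
bestLag = p(idx);                                % lag order of the selected model
```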
Input Arguments
Mdl — VAR model
varm model object
VAR model, specified as a varm model object returned by estimate, by the varm constructor, or by the varm object function of a vecm model.
Output Arguments
results — Model summary
structure array | varm model object
Model summary, returned as a structure array or a varm model object.
• If Mdl is an estimated VAR model, then results is a structure array containing the fields in this table.
Field                     Description
Description               Model summary description (string)
SampleSize                Effective sample size (numeric scalar)
NumEstimatedParameters    Number of estimated parameters (numeric scalar)
LogLikelihood             Optimized loglikelihood value (numeric scalar)
AIC                       Akaike information criterion (numeric scalar)
BIC                       Bayesian information criterion (numeric scalar)
Table                     Parameter estimates with corresponding standard errors, t statistics (estimate divided by standard error), and p-values (assuming normality); a table with rows corresponding to model parameters
Covariance                Estimated residual covariance matrix (the maximum likelihood estimate), a Mdl.NumSeries-by-Mdl.NumSeries numeric matrix with rows and columns corresponding to the innovations in the response equations ordered by the columns of the data Y
Correlation               Estimated residual correlation matrix; its dimensions correspond to the dimensions of Covariance
summarize uses mvregress to implement multivariate normal, maximum likelihood estimation. For more details on estimates and standard errors, see “Estimation of Multivariate Regression Models”. • If Mdl is an unestimated VAR model, then results is a varm model object that is equal to Mdl.
Version History Introduced in R2017a
See Also Objects varm Functions estimate | varm | mvregress Topics “VAR Model Estimation” on page 9-34 “Fit VAR Model of CPI and Unemployment Rate” on page 9-38 “VAR Model Case Study” on page 9-91 “Estimation of Multivariate Regression Models”
summarize Display estimation results of vector error-correction (VEC) model
Syntax summarize(Mdl) results = summarize(Mdl)
Description summarize(Mdl) displays a summary of the VEC(p – 1) model Mdl. • If Mdl is an estimated VEC model returned by estimate, then summarize prints estimation results to the MATLAB Command Window. The display includes an estimation summary and a table of parameter estimates with corresponding standard errors, t statistics, and p-values. The estimation summary includes fit statistics, such as the Akaike Information Criterion (AIC), and the estimated innovations covariance and correlation matrices. • If Mdl is an unestimated VEC model returned by vecm, then summarize prints the standard object display (the same display that vecm prints during model creation). results = summarize(Mdl) returns one of the following variables and does not print to the Command Window. • If Mdl is an estimated VEC model, then results is a structure containing estimation results. • If Mdl is an unestimated VEC model, then results is a vecm model object that is equal to Mdl.
Examples
Fit VEC(1) Model to Matrix of Response Data
Fit a VEC(1) model to seven macroeconomic series. Supply the response data as a numeric matrix.
Consider a VEC model for the following macroeconomic series:
• Gross domestic product (GDP)
• GDP implicit price deflator
• Paid compensation of employees
• Nonfarm business sector hours of all persons
• Effective federal funds rate
• Personal consumption expenditures
• Gross private domestic investment
Suppose that a cointegrating rank of 4 and one short-run term are appropriate, that is, consider a VEC(1) model.
Load the Data_USEconVECModel data set.
load Data_USEconVECModel
For more information on the data set and variables, enter Description at the command line. Determine whether the data needs to be preprocessed by plotting the series on separate plots. figure tiledlayout(2,2) nexttile plot(FRED.Time,FRED.GDP); title("Gross Domestic Product"); ylabel("Index"); xlabel("Date"); nexttile plot(FRED.Time,FRED.GDPDEF); title("GDP Deflator"); ylabel("Index"); xlabel("Date"); nexttile plot(FRED.Time,FRED.COE); title("Paid Compensation of Employees"); ylabel("Billions of $"); xlabel("Date"); nexttile plot(FRED.Time,FRED.HOANBS); title("Nonfarm Business Sector Hours"); ylabel("Index"); xlabel("Date");
figure tiledlayout(2,2) nexttile plot(FRED.Time,FRED.FEDFUNDS) title("Federal Funds Rate") ylabel("Percent") xlabel("Date") nexttile plot(FRED.Time,FRED.PCEC) title("Consumption Expenditures") ylabel("Billions of $") xlabel("Date") nexttile plot(FRED.Time,FRED.GPDI) title("Gross Private Domestic Investment") ylabel("Billions of $") xlabel("Date")
Stabilize all series, except the federal funds rate, by applying the log transform. Scale the resulting series by 100 so that all series are on the same scale. FRED.GDP = 100*log(FRED.GDP); FRED.GDPDEF = 100*log(FRED.GDPDEF); FRED.COE = 100*log(FRED.COE); FRED.HOANBS = 100*log(FRED.HOANBS); FRED.PCEC = 100*log(FRED.PCEC); FRED.GPDI = 100*log(FRED.GPDI);
Create a VEC(1) model using the shorthand syntax. Specify the variable names.
Mdl = vecm(7,4,1);
Mdl.SeriesNames = FRED.Properties.VariableNames
Mdl = 
  vecm with properties:

             Description: "7-Dimensional Rank = 4 VEC(1) Model with Linear Time Trend"
             SeriesNames: "GDP"  "GDPDEF"  "COE"  ... and 4 more
               NumSeries: 7
                    Rank: 4
                       P: 2
                Constant: [7×1 vector of NaNs]
              Adjustment: [7×4 matrix of NaNs]
           Cointegration: [7×4 matrix of NaNs]
                  Impact: [7×7 matrix of NaNs]
   CointegrationConstant: [4×1 vector of NaNs]
      CointegrationTrend: [4×1 vector of NaNs]
                ShortRun: {7×7 matrix of NaNs} at lag [1]
                   Trend: [7×1 vector of NaNs]
                    Beta: [7×0 matrix]
              Covariance: [7×7 matrix of NaNs]
Mdl is a vecm model object. All properties containing NaN values correspond to parameters to be estimated given data.
Estimate the model using the entire data set and the default options.
EstMdl = estimate(Mdl,FRED.Variables)
EstMdl = 
  vecm with properties:

             Description: "7-Dimensional Rank = 4 VEC(1) Model"
             SeriesNames: "GDP"  "GDPDEF"  "COE"  ... and 4 more
               NumSeries: 7
                    Rank: 4
                       P: 2
                Constant: [14.1329 8.77841 -7.20359 ... and 4 more]'
              Adjustment: [7×4 matrix]
           Cointegration: [7×4 matrix]
                  Impact: [7×7 matrix]
   CointegrationConstant: [-28.6082 109.555 -77.0912 ... and 1 more]'
      CointegrationTrend: [4×1 vector of zeros]
                ShortRun: {7×7 matrix} at lag [1]
                   Trend: [7×1 vector of zeros]
                    Beta: [7×0 matrix]
              Covariance: [7×7 matrix]
EstMdl is an estimated vecm model object. It is fully specified because all parameters have known values. By default, estimate imposes the constraints of the H1 Johansen VEC model form by removing the cointegrating trend and linear trend terms from the model. Parameter exclusion from estimation is equivalent to imposing equality constraints to zero. Display a short summary from the estimation. results = summarize(EstMdl)
results = struct with fields:
               Description: "7-Dimensional Rank = 4 VEC(1) Model"
                     Model: "H1"
                SampleSize: 238
    NumEstimatedParameters: 112
             LogLikelihood: -1.4939e+03
                       AIC: 3.2118e+03
                       BIC: 3.6007e+03
                     Table: [133x4 table]
                Covariance: [7x7 double]
               Correlation: [7x7 double]
The Table field of results is a table of parameter estimates and corresponding statistics.
Compare Several VEC Model Fits Consider the model and data in “Fit VEC(1) Model to Matrix of Response Data” on page 12-2272 and these four alternative VEC models: VEC(0), VEC(1), VEC(3), and VEC(7). Using historical data, estimate each of the four models, and then compare the model fits using the resulting Bayesian Information Criterion (BIC). Load the Data_USEconVECModel data set and preprocess the data. load Data_USEconVECModel FRED.GDP = 100*log(FRED.GDP); FRED.GDPDEF = 100*log(FRED.GDPDEF); FRED.COE = 100*log(FRED.COE); FRED.HOANBS = 100*log(FRED.HOANBS); FRED.PCEC = 100*log(FRED.PCEC); FRED.GPDI = 100*log(FRED.GPDI);
Within a loop:
• Create a VEC model using the shorthand syntax.
• Estimate the VEC model. Reserve the maximum value of p as presample observations.
• Store the estimation results.
numlags = [0 1 3 7];
p = numlags + 1;
Y0 = FRED{1:max(p),:};
Y = FRED{((max(p) + 1):end),:};
for j = 1:numel(p)
    Mdl = vecm(7,4,numlags(j));
    EstMdl = estimate(Mdl,Y,'Y0',Y0);
    results(j) = summarize(EstMdl);
end
results is a 1-by-4 structure array containing the estimation results of each model.
Extract the BIC from each set of results.
BIC = [results.BIC]
BIC = 1×4
10^3 ×
    5.3948    5.4372    5.8254    6.5536
The model corresponding to the lowest BIC has the best fit among the models considered. Therefore, the VEC(0) model is the best fitting model.
Input Arguments Mdl — VEC model vecm model object VEC model, specified as a vecm model object returned by estimate or vecm.
Output Arguments
results — Model summary
structure array | vecm model object
Model summary, returned as a structure array or a vecm model object.
• If Mdl is an estimated VEC model, then results is a structure array containing the fields in this table.
Field                     Description
Description               Model summary description (string)
Model                     Johansen model of deterministic terms ("H2", "H1*", "H1", "H*", "H") [1]
SampleSize                Effective sample size (numeric scalar)
NumEstimatedParameters    Number of estimated parameters (numeric scalar)
LogLikelihood             Optimized loglikelihood value (numeric scalar)
AIC                       Akaike Information Criterion (numeric scalar)
BIC                       Bayesian Information Criterion (numeric scalar)
Table                     Parameter estimates with corresponding standard errors, t statistics (estimate divided by standard error), and p-values (assuming normality); a table with rows corresponding to model parameters
Covariance                Estimated residual covariance matrix (the maximum likelihood estimate), an Mdl.NumSeries-by-Mdl.NumSeries numeric matrix with rows and columns corresponding to the innovations in the response equations ordered by the columns of Y
Correlation               Estimated residual correlation matrix whose dimensions correspond to the dimensions of Covariance
summarize uses mvregress to implement multivariate normal, maximum likelihood estimation. For more details on estimates and standard errors, see “Estimation of Multivariate Regression Models”. • If Mdl is an unestimated VEC model, then results is a vecm model object that is equal to Mdl.
Version History Introduced in R2017b
References [1] Johansen, S. Likelihood-Based Inference in Cointegrated Vector Autoregressive Models. Oxford: Oxford University Press, 1995.
See Also Objects vecm Functions estimate | vecm | mvregress Topics “Model the United States Economy” on page 9-151 “Estimation of Multivariate Regression Models”
threshold Create threshold transitions
Description threshold creates threshold transitions from the specified levels and transition type, either discrete or smooth. Use a threshold object to specify the switching mechanism of a threshold-switching dynamic regression model (tsVAR). To study a threshold transitions model, pass a fully specified threshold object to an object function on page 12-2282. You can specify transition levels and rates as unknown parameters (NaN values), which you can estimate when you fit a tsVAR model to data by using estimate. Alternatively, to create a random switching mechanism, governed by a discrete-time Markov chain, for a Markov-switching dynamic regression model, see dtmc and msVAR.
Creation
Syntax
tt = threshold(levels)
tt = threshold(levels,Name,Value)
Description
tt = threshold(levels) creates the threshold transitions object tt for discrete state transitions on page 12-2287 specified by the transition mid-levels levels.
tt = threshold(levels,Name,Value) sets properties on page 12-2280 using name-value argument syntax. For example, threshold([0 1],Type="exponential",Rates=[0.5 1.5]) specifies smooth, exponential transitions at mid-levels 0 and 1 with rates 0.5 and 1.5, respectively.
Input Arguments
levels — Transition mid-levels t1, t2,… tn
increasing numeric vector
Transition mid-levels t1, t2,… tn, specified as an increasing numeric vector. Levels tj separate threshold variable data into n + 1 states represented by the intervals (−∞,t1), [t1,t2), …, [tn,∞).
NaN entries indicate estimable transition levels. The estimate function of tsVAR treats the known elements of levels as equality constraints during optimization.
threshold stores levels in the Levels property.
Data Types: double
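The interval rule can be illustrated without the toolbox. The sketch below maps threshold variable values to state indices for known levels by counting how many levels each value has reached. It mirrors the half-open intervals (−∞,t1), [t1,t2), …, [tn,∞) but is not the toolbox's ttstates implementation. The names z and stateIdx and the data values are hypothetical.

```matlab
% Map threshold variable data to state indices 1..n+1 for known levels.
levels = [-3 3];               % increasing mid-levels t1, t2
z = [-5; -3; 0; 3; 7];         % hypothetical threshold variable data (column)
% Each value belongs to state 1 + (number of levels it is at or above).
% The comparison uses implicit expansion (column vs. row), R2016b or later.
stateIdx = 1 + sum(z >= levels, 2);
```

For these inputs, values below −3 fall in state 1, values from −3 up to but excluding 3 fall in state 2, and values of 3 or more fall in state 3.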
Properties You can set most properties when you create a model by using name-value argument syntax. You can modify only StateNames by using dot notation. For example, create a two-state, logistic transition at 0, and then label the first and second states Depression and Recession, respectively. tt = threshold(0,Type="logistic"); tt.StateNames = ["Depression" "Recession"];
Type — Type of transitions
"discrete" (default) | "normal" | "logistic" | "exponential" | "custom"
This property is read-only.
Type of transitions, specified as a character vector or string scalar. The transition function F(zt,tj,rj) associates a transition type with each threshold level tj, where zt is a threshold variable and rj is a level-specific transition rate. Each function F is bounded between 0 and 1.
This table contains the supported types of transitions:
Value                   Description
"discrete" (default)    Discrete transitions: $F(z_t,t_j) = \begin{cases} 0, & z_t < t_j \\ 1, & z_t \ge t_j \end{cases}$. Discrete transitions do not have transition rates.
"normal"                Cumulative normal transitions: F(zt,tj,rj) = normcdf(zt,tj,1/rj).
"logistic"              Logistic transitions: $F(z_t,t_j,r_j) = \dfrac{1}{1 + e^{-r_j(z_t - t_j)}}$.
"exponential"           Exponential transitions: $F(z_t,t_j,r_j) = 1 - e^{-r_j(z_t - t_j)^2}$.
"custom"                Custom transition function specified by the function handle of the TransitionFunction property.
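For intuition about how the built-in transition types behave around a level, the four formulas can be written as anonymous functions in base MATLAB. This is a sketch of the formulas only, not the toolbox's ttdata implementation. The handle names are hypothetical, and the normal case is written with erfc so that the sketch does not require normcdf.

```matlab
% Transition functions F(z,t,r) as anonymous functions (illustrative names).
Fdiscrete    = @(z,t)   double(z >= t);                  % step at the level t
Fnormal      = @(z,t,r) 0.5*erfc(-r.*(z - t)/sqrt(2));   % equals normcdf(z,t,1/r)
Flogistic    = @(z,t,r) 1./(1 + exp(-r.*(z - t)));
Fexponential = @(z,t,r) 1 - exp(-r.*(z - t).^2);

z = linspace(-2,4,7)';         % hypothetical threshold variable grid
t = 1; r = 1.5;                % one level and its rate
F = [Fdiscrete(z,t) Fnormal(z,t,r) Flogistic(z,t,r) Fexponential(z,t,r)];
```

At z = t, the normal and logistic functions equal 0.5, which is why t is called a mid-level. The exponential function instead equals 0 at z = t and approaches 1 symmetrically on both sides.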
Example: "normal"
Data Types: char | string
Levels — Transition mid-levels t1, t2,… tn
increasing numeric vector
This property is read-only.
Transition mid-levels t1, t2,… tn, specified as an increasing numeric vector. The levels input argument sets Levels.
Data Types: double
Rates — Transition rates r1, r2,… rn
ones(n,1) (default) | positive numeric vector
This property is read-only.
Transition rates r1, r2,… rn, specified as a positive numeric vector of length n, the number of levels. Each rate corresponds to a level in Levels. NaN values indicate estimable rates. The estimate function of tsVAR treats the known elements of Rates as equality constraints during optimization. threshold ignores rates for discrete transitions.
Example: [0.5 1.5]
Data Types: double
NumStates — Number of states
positive scalar
This property is read-only.
Number of states, specified as a positive scalar and derived from Levels.
Data Types: double
StateNames — Unique state labels
string(1:(n + 1)) (default) | string vector | cell vector of character vectors | numeric vector
Unique state labels, specified as a string vector, cell vector of character vectors, or numeric vector of length NumStates. StateNames(1) names the state corresponding to (−∞,t1), StateNames(2) names the state corresponding to [t1,t2), …, and StateNames(n + 1) names the state corresponding to [tn,∞).
Example: ["Depression" "Recession" "Stagnant" "Boom"]
Data Types: string
TransitionFunction — Custom transition function
[] (default) | function handle
This property is read-only.
Custom transition function F(zt,tj,rj), specified as a function handle. The handle must specify a function with the following syntax:
function f = transitionfcn(z,tj,rj)
where:
• You can replace the name transitionfcn.
• z is a numeric vector of threshold variable data.
• tj is a numeric scalar threshold level.
• rj is a numeric scalar rate.
• f is a numeric vector of values in the interval [0,1].
When Type is not "custom", threshold ignores TransitionFunction.
Example: @transitionfcn
Data Types: function_handle
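As an illustration of a function that satisfies this interface, the sketch below defines a hypothetical arctangent-based transition with the required (z,tj,rj) signature and checks that its values stay in [0,1]. The functional form is an assumption chosen for illustration and is not one of the toolbox's built-in types.

```matlab
% A hypothetical custom transition: rescaled arctangent, bounded in (0,1).
transitionfcn = @(z,tj,rj) 0.5 + atan(rj.*(z - tj))/pi;

z = linspace(-5,5,11)';        % hypothetical threshold variable data
f = transitionfcn(z,0,2);      % level tj = 0, rate rj = 2
inRange = all(f >= 0 & f <= 1);   % true: values respect the [0,1] bound
```

A handle such as this one is the kind of value that the TransitionFunction property expects when Type is "custom".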
Object Functions
ttplot      Plot threshold transitions
ttdata      Transition function data
ttstates    Threshold variable data state path
Examples
Create Discrete Threshold Transition
Create a threshold transition at mid-level t1 = 0.
t1 = 0;
tt = threshold(t1)
tt = 
  threshold with properties:

          Type: 'discrete'
        Levels: 0
         Rates: []
    StateNames: ["1"    "2"]
     NumStates: 2
tt is a threshold object representing a discrete threshold transition at mid-level 0. tt is fully specified because its properties do not contain NaN values. Therefore, you can pass tt to any threshold object function. Given a univariate threshold variable, tt divides the range of the variable into two distinct states, which tt labels "1" and "2".
tt also specifies the switching mechanism of a threshold-switching autoregressive (TAR) model, represented by a tsVAR object. Given values of the observed univariate transition variable:
• The TAR model is in state "1" when the transition variable is in the interval (−∞, 0).
• The TAR model is in state "2" when the transition variable is in the interval [0, ∞).
Plot Smooth Threshold Transitions This example shows how to create two logistic threshold transitions with different transition rates, and then display a gradient plot of the transitions. Load the yearly Canadian inflation and interest rates data set. Extract the inflation rate based on consumer price index (INF_C) from the table, and plot the series. load Data_Canada INF_C = DataTable.INF_C; plot(dates,INF_C); axis tight
Assume the following characteristics of the inflation rate series:
• Rates below 2% are low.
• Rates at least 2% and below 8% are medium.
• Rates at least 8% are high.
• A logistic transition function describes the transition between states well.
• Transitions between low and medium rates are faster than transitions between medium and high rates.
Create threshold transitions to describe the Canadian inflation rates.
t = [2 8]; % Thresholds
r = [3.5 1.5]; % Transition rates
statenames = ["Low" "Med" "High"];
tt = threshold(t,Type="logistic",Rates=r,StateNames=statenames)
tt = 
  threshold with properties:

          Type: 'logistic'
        Levels: [2 8]
         Rates: [3.5000 1.5000]
    StateNames: ["Low"    "Med"    "High"]
     NumStates: 3
Plot the threshold transitions; show the gradient of the transition function between the states, and overlay the data.

    figure
    ttplot(tt,Data=INF_C)
Prepare Switching Mechanism of Threshold-Switching Model for Estimation

A threshold-switching dynamic regression (tsVAR) model has two main components:

• Threshold transitions, which represent the switching mechanism between states. Mid-levels and transition rates are estimable.
• A collection of autoregressive models describing the dynamic system among states. Submodel coefficients and covariances are estimable.

Before you create a threshold-switching model, you must specify its threshold transitions by using threshold. If you plan to fit a threshold-switching model to data, you can fully specify its threshold transitions if you know all mid-levels and transition rates. If you need to estimate some or all mid-levels and rates, you can enter NaN values as placeholders for unknown parameters. estimate treats all specified parameters as equality constraints during estimation. Regardless, to fit threshold transition parameters to data, you must specify a partially specified threshold object as the switching mechanism of a threshold-switching model.
Prepare All Estimable Parameters for Estimation

Consider a smooth transition autoregressive (STAR) model that switches between three states (two thresholds) with an exponential transition function. Create the switching mechanism. Specify that all estimable parameters are unknown.

    t1 = [NaN NaN]; % Two unknown mid-levels
    r1 = [NaN NaN]; % Two unknown transition rates
    tt1 = threshold(t1,Type="exponential",Rates=r1)

    tt1 =
      threshold with properties:
              Type: 'exponential'
            Levels: [NaN NaN]
             Rates: [NaN NaN]
        StateNames: ["1"    "2"    "3"]
         NumStates: 3
tt1 is a partially specified threshold object to pass to tsVAR as the switching mechanism. The estimate function of tsVAR fits the two mid-levels and the two transition rates to the data, along with any unknown submodel parameters in the threshold-switching model.

Specify Equality Constraints

Consider a STAR model, which has a switching mechanism with the following qualities:

• The thresholds are at -1 and 1.
• The transition function is exponential.
• The transition rate between the first and second states is 0.5, but the rate between the second and third states is unknown.

Create the switching mechanism.

    t2 = [-1 1];
    r2 = [0.5 NaN];
    tt2 = threshold(t2,Type="exponential",Rates=r2)

    tt2 =
      threshold with properties:
              Type: 'exponential'
            Levels: [-1 1]
             Rates: [0.5000 NaN]
        StateNames: ["1"    "2"    "3"]
         NumStates: 3
tt2 is a partially specified threshold object to pass to tsVAR as the switching mechanism. The estimate function of tsVAR does the following:

• Treats the mid-levels tt2.Levels and the first transition rate tt2.Rates(1) as equality constraints
• Fits the second transition rate tt2.Rates(2) to the data, along with any unknown submodel parameters in the threshold-switching model
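As a quick check before estimation, you can list which switching parameters are NaN placeholders and which act as equality constraints. This sketch relies only on the Levels and Rates properties shown in the displays above; the variable names estimableLevels and estimableRates are illustrative.

```matlab
% Sketch: requires Econometrics Toolbox (threshold object).
t2 = [-1 1];
r2 = [0.5 NaN];
tt2 = threshold(t2,Type="exponential",Rates=r2);

% NaN entries mark parameters left free for estimation;
% non-NaN entries are held fixed as equality constraints.
estimableLevels = isnan(tt2.Levels)
estimableRates = isnan(tt2.Rates)
```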
Specify Custom Transition Functions

Create smooth threshold transitions with the following qualities:

• Mid-levels are at -1, 1, and 2.
• The transition function is the Student's t cdf, which allows for a more gradual mixing than the normal cdf.
• The transition rates, which are the degrees of freedom of the distribution, are 3, 10, and 100.

    t = [-1 1 2];
    r = [3 10 100];
    ttransfcn = @(z,ti,ri)tcdf(z,ri);
    tt = threshold(t,Type="custom",TransitionFunction=ttransfcn,Rates=r)

    tt =
      threshold with properties:
              Type: 'custom'
            Levels: [-1 1 2]
             Rates: [3 10 100]
        StateNames: ["1"    "2"    "3"    "4"]
         NumStates: 4

Plot graphs of each transition function.

    figure
    ttplot(tt,Type="graph")
    legend(string(tt.Levels))
More About

State Transition

In threshold-switching dynamic regression models (tsVAR), a state transition occurs when a threshold variable zt crosses a transition mid-level.

Discrete transitions result in an abrupt change in the submodel computing the response. Smooth transitions create weighted combinations of submodel responses that change continuously with the value of zt, and state changes indicate a shift in the dominant submodel.

The smooth transition weights are determined by a transition function F(zt,tj,rj), where tj is threshold j and rj is transition rate j (see Type). Discrete, normal, and logistic transition functions separate states at small and large values of zt. Exponential transitions separate states at small and large values of |zt|. Exponential transitions model economic variables with "inner" and "outer" states, such as deviations from purchasing power parity.
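To illustrate the difference between logistic and exponential transitions, the following base-MATLAB sketch evaluates both on a small grid. The functional forms are the standard LSTAR and ESTAR transition functions and are an assumption here, not a quotation of the toolbox definitions; see the Type property for those. The names logisticF and exponentialF are hypothetical.

```matlab
% Minimal sketch in base MATLAB (no toolbox calls).
% Assumed forms: logistic F = 1/(1 + exp(-r*(z - t))),
%                exponential F = 1 - exp(-r*(z - t)^2).
logisticF = @(z,t,r) 1./(1 + exp(-r.*(z - t)));
exponentialF = @(z,t,r) 1 - exp(-r.*(z - t).^2);

t = 0;                 % mid-level
r = 2;                 % transition rate
z = [-3 -1 0 1 3];     % hypothetical threshold variable values

FL = logisticF(z,t,r);     % rises monotonically from near 0 to near 1
FE = exponentialF(z,t,r);  % symmetric about t, depends only on |z - t|

disp([z; FL; FE])
```

Under these assumed forms, the logistic weight distinguishes low values of z from high values, while the exponential weight is the same for z = -3 and z = 3, which is the "inner" versus "outer" behavior described above.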
Tips • To widen a smooth transition band to show a more gradual mixing of states, decrease the transition rate by specifying the Rates name-value argument when you create threshold transitions.
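The effect of the rate on the width of the mixing band can be worked out explicitly in the logistic case. The sketch below assumes the standard logistic form F(z,t,r) = 1/(1 + exp(-r(z - t))), which is an assumption rather than the toolbox definition; the rate values and the 5%-95% band convention are likewise illustrative.

```matlab
% Sketch in base MATLAB under the assumed logistic form.
% Solving F = p for z gives z = t + log(p/(1 - p))/r, so the band
% where 0.05 < F < 0.95 has width 2*log(19)/r.
rates = [3.5 1.5 0.5];           % hypothetical transition rates
bandWidth = 2*log(19)./rates;    % width of the 5%-95% mixing band

for k = 1:numel(rates)
    fprintf('rate = %.1f  ->  band width = %.2f\n',rates(k),bandWidth(k));
end
```

Because the width is inversely proportional to the rate in this form, halving the rate doubles the range of threshold variable values over which both neighboring submodels carry appreciable weight.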
Version History Introduced in R2021b
References

[1] Enders, Walter. Applied Econometric Time Series. New York: John Wiley & Sons, Inc., 2009.
[2] Teräsvirta, Timo. "Modelling Economic Relationships with Smooth Transition Regressions." In A. Ullah and D.E.A. Giles (eds.), Handbook of Applied Economic Statistics, 507–552. New York: Marcel Dekker, 1998.
[3] van Dijk, Dick. Smooth Transition Models: Extensions and Outlier Robust Inference. Rotterdam, Netherlands: Tinbergen Institute Research Series, 1999.
See Also

Objects
tsVAR

Functions
ttplot | ttdata | ttstates | estimate

Topics
“Create Threshold Transitions” on page 10-73
“Visualize Threshold Transitions” on page 10-76
“Create Threshold-Switching Dynamic Regression Models” on page 10-88
toCellArray Convert lag operator polynomial object to cell array
Syntax [coefficients, lags] = toCellArray(A)
Description [coefficients, lags] = toCellArray(A) converts a lag operator polynomial object A(L) to an equivalent cell array. coefficients is the cell array equivalent to the lag operator polynomial A(L). lags is a vector of unique integer lags associated with the polynomial coefficients. Elements of lags are in ascending order. The first element of lags is the smaller of the smallest nonzero coefficient lag of the object and zero; the last element of lags is the degree of the polynomial. That is, lags = [min(A.Lags,0), 1, 2, ... A.Degree].
Examples

Convert Lag Operator to a Cell Array

Create a LagOp polynomial and convert it to a cell array:

    A = LagOp({0.8 1 0 .6});
    B = toCellArray(A);
    class(B)

    ans =
        'cell'
Algorithms

LagOp objects implicitly store polynomial lags and corresponding coefficient matrices of zero-valued coefficients via lag-based indexing. However, cell arrays conform to traditional element indexing rules, and must explicitly store zero coefficient matrices.

The output cell array is equivalent to the input lag operator polynomial in the sense that the same lag operator is created when the output coefficients and lags are used to create a new LagOp object. That is, the following two statements produce the same polynomial A(L):

    [coefficients,lags] = toCellArray(A);
    A = LagOp(coefficients,'Lags',lags);
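The round trip can be checked directly. This sketch reuses the polynomial from the example and compares the rebuilt object with the original by using isEqLagOp, a LagOp object function; the variable names B and tf are illustrative.

```matlab
% Round-trip sketch; requires Econometrics Toolbox.
A = LagOp({0.8 1 0 0.6});               % coefficients at lags 0 through 3
[coefficients,lags] = toCellArray(A);   % cells store the zero at lag 2 explicitly
B = LagOp(coefficients,'Lags',lags);    % rebuild the polynomial from the outputs

tf = isEqLagOp(A,B)                     % true if the two polynomials match
```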
tsVAR Create threshold-switching dynamic regression model
Description

The tsVAR function returns a tsVAR object that specifies the functional form of a threshold-switching dynamic regression model on page 12-2306 for the univariate or multivariate response process yt. The tsVAR object also stores the parameter values of the model.

A tsVAR object has two key components:

• Switching mechanism among states, represented by threshold transitions (threshold object)
• State-specific submodels, either autoregressive (ARX) or vector autoregression (VARX) models (arima or varm objects), which can contain exogenous regression components

The components completely specify the model structure. The threshold transition levels, smooth transition function rates, and submodel parameters, such as the AR coefficients and innovation-distribution variance, are unknown and estimable unless you specify their values. To estimate a model containing unknown parameter values, pass the model and data to estimate. To work with an estimated or fully specified tsVAR object, pass it to an object function on page 12-2293.

Alternatively, to create a Markov-switching dynamic regression model, which has a switching mechanism governed by a discrete-time Markov chain, see dtmc and msVAR.
Creation

Syntax

    Mdl = tsVAR(tt,mdl)
    Mdl = tsVAR(tt,mdl,Name,Value)

Description

Mdl = tsVAR(tt,mdl) creates a threshold-switching dynamic regression model Mdl (a tsVAR object) that has the threshold transitions switching mechanism among states tt and the state-specific, stable dynamic regression submodels mdl.

Mdl = tsVAR(tt,mdl,Name,Value) sets some properties on page 12-2291 using name-value argument syntax. For example, tsVAR(tt,mdl,Covariance=Sigma,SeriesNames=["GDP" "CPI"]) generates all innovations from the covariance matrix Sigma and labels the submodel response series "GDP" and "CPI", respectively.

Input Arguments

tt — Threshold transitions
threshold object

Threshold transitions, with NumStates states, specified as a threshold object. The states represented in tt.StateNames correspond to the states represented in the submodel vector mdl. tsVAR stores tt in the Switch property.

mdl — State-specific dynamic regression submodels
vector of arima objects | vector of varm objects

State-specific dynamic regression submodels, specified as a length tt.NumStates vector of model objects individually constructed by arima or varm. All submodels must be of the same type (arima or varm) and have the same number of series.

Unlike other model estimation tools, estimate does not infer the size of submodel regression coefficient arrays during estimation. Therefore, you must specify the Beta property of each submodel appropriately. For example, to include and estimate three predictors of the regression component of univariate submodel j, set mdl(j).Beta = NaN(3,1).

tsVAR processes and stores mdl in the property Submodels.
Properties

You can set the Covariance and SeriesNames properties when you create a model by using name-value argument syntax. MATLAB derives the values of all other properties from inputs tt and mdl. You can modify only SeriesNames by using dot notation.

For example, create a threshold-switching model for a 2-D response series in which all submodels share the same unknown covariance matrix, and then label the first and second series "GDP" and "CPI", respectively.

    Mdl = tsVAR(tt,mdl,Covariance=nan(2,2));
    Mdl.SeriesNames = ["GDP" "CPI"];
Switch — Threshold transitions for switching mechanism among states
threshold object

This property is read-only.

Threshold transitions for the switching mechanism among states, specified as a threshold object.

Submodels — State-specific vector autoregression submodels
vector of varm objects

This property is read-only.

State-specific vector autoregression submodels, specified as a vector of varm objects of length NumStates. tsVAR removes unsupported submodel components.

• For arima submodels, tsVAR does not support the moving average (MA), differencing, and seasonal components. If any submodel is a composite conditional mean and variance model (for example, its Variance property is a garch object), tsVAR issues an error.
• For varm submodels, tsVAR does not support the trend component.

tsVAR converts submodels specified as arima objects to 1-D varm objects.

NumStates — Number of states
positive scalar

This property is read-only.

Number of states, specified as a positive scalar.

Data Types: double

NumSeries — Number of time series
positive integer

This property is read-only.

Number of time series, specified as a positive integer. NumSeries specifies the dimensionality of the response variable and innovation in all submodels.

Data Types: double

StateNames — State labels
string vector

This property is read-only.

State labels, specified as a string vector of length NumStates.

Data Types: string

SeriesNames — Unique series labels
string(1:numSeries) (default) | string vector | cell array of character vectors | numeric vector

Unique series labels, specified as a string vector, cell array of character vectors, or a numeric vector of length numSeries. tsVAR stores the series names as a string vector.

Data Types: string

Covariance — Model-wide innovations covariance
[] (default) | positive numeric scalar | positive semidefinite matrix

Model-wide innovations covariance, specified as a positive numeric scalar for univariate models or a numSeries-by-numSeries positive semidefinite matrix for multivariate models.

If Covariance is not an empty array ([]), object functions of tsVAR generate all innovations from Covariance and ignore submodel covariances. If Covariance is [] (the default), object functions of tsVAR generate innovations from submodel covariances.

estimate does not support equality constraints on the innovations covariance. estimate ignores specified entries in Covariance or in submodel innovations covariances, and estimates all covariances instead.

Data Types: double
Notes • NaN-valued elements in either the properties of Switch or the submodels of Submodels indicate unknown, estimable parameters. Specified elements, except submodel innovation variances, indicate equality constraints on parameters in model estimation. • All unknown submodel parameters are state dependent.
Object Functions

estimate     Fit threshold-switching dynamic regression model to data
forecast     Forecast sample paths from threshold-switching dynamic regression model
simulate     Simulate sample paths of threshold-switching dynamic regression model
summarize    Summarize threshold-switching dynamic regression model estimation results
Examples

Create Fully Specified Univariate TAR Model

Create a two-state TAR model for a 1-D response process. Specify all parameter values (this example uses arbitrary values).

Create Switching Mechanism

Create a discrete threshold transition at level 0. Label the regimes to reflect the state of the economy:

• When the threshold variable (currently unknown) is in (−∞, 0), the economy is in a recession.
• When the threshold variable is in [0, ∞), the economy is expanding.

    t = 0;
    tt = threshold(t,StateNames=["Recession" "Expansion"])

    tt =
      threshold with properties:
              Type: 'discrete'
            Levels: 0
             Rates: []
        StateNames: ["Recession"    "Expansion"]
         NumStates: 2

tt is a fully specified threshold object that describes the switching mechanism of the threshold-switching model.

Create State-Specific Models for Response Series

Assume the following univariate models describe the response process of the system:

• Recession: $y_t = -1 + 0.1y_{t-1} + \varepsilon_{1,t}$, where $\varepsilon_{1,t} \sim N(0,1)$.
• Expansion: $y_t = 1 + 0.3y_{t-1} + 0.2y_{t-2} + \varepsilon_{2,t}$, where $\varepsilon_{2,t} \sim N(0,4)$.
For each regime, use arima to create an AR model that describes the response process within the regime.

    c1 = -1;
    c2 = 1;
    ar1 = 0.1;
    ar2 = [0.3 0.2];
    v1 = 1;
    v2 = 4;
    mdl1 = arima(Constant=c1,AR=ar1,Variance=v1, ...
        Description="Recession State Model")

    mdl1 =
      arima with properties:
         Description: "Recession State Model"
        Distribution: Name = "Gaussian"
                   P: 1
                   D: 0
                   Q: 0
            Constant: -1
                  AR: {0.1} at lag [1]
                 SAR: {}
                  MA: {}
                 SMA: {}
         Seasonality: 0
                Beta: [1×0]
            Variance: 1

       ARIMA(1,0,0) Model (Gaussian Distribution)

    mdl2 = arima(Constant=c2,AR=ar2,Variance=v2, ...
        Description="Expansion State Model")

    mdl2 =
      arima with properties:
         Description: "Expansion State Model"
        Distribution: Name = "Gaussian"
                   P: 2
                   D: 0
                   Q: 0
            Constant: 1
                  AR: {0.3 0.2} at lags [1 2]
                 SAR: {}
                  MA: {}
                 SMA: {}
         Seasonality: 0
                Beta: [1×0]
            Variance: 4

       ARIMA(2,0,0) Model (Gaussian Distribution)
mdl1 and mdl2 are fully specified arima objects. Store the submodels in a vector with order corresponding to the regimes in tt.StateNames. mdl = [mdl1; mdl2];
Create Threshold-Switching Model

Use tsVAR to create a TAR model from the switching mechanism tt and the state-specific submodels mdl.

    Mdl = tsVAR(tt,mdl)

    Mdl =
      tsVAR with properties:
             Switch: [1x1 threshold]
          Submodels: [2x1 varm]
          NumStates: 2
          NumSeries: 1
         StateNames: ["Recession"    "Expansion"]
        SeriesNames: "1"
         Covariance: []

    Mdl.Submodels(2)

    ans =
      varm with properties:
         Description: "AR-Stationary 1-Dimensional VAR(2) Model"
         SeriesNames: "Y1"
           NumSeries: 1
                   P: 2
            Constant: 1
                  AR: {0.3 0.2} at lags [1 2]
               Trend: 0
                Beta: [1×0 matrix]
          Covariance: 4
Mdl is a fully specified tsVAR object representing a univariate two-state TAR model. tsVAR stores specified arima submodels as varm objects. Because Mdl is fully specified, you can pass it to any tsVAR object function for further analysis (see “Object Functions” on page 12-2293). Alternatively, you can specify the threshold model parameters in Mdl.Switch as initial values for the estimation procedure (see estimate).
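To make the switching rule concrete, the following base-MATLAB sketch generates one path from the two submodels by hand. It treats the model as self-exciting, with threshold variable z_t = y_{t-1}. That choice, the zero presample values, and all variable names are assumptions for illustration only; the simulate object function is the supported way to generate paths from Mdl.

```matlab
% Minimal sketch in base MATLAB (no toolbox calls).
% Assumption: self-exciting threshold variable z_t = y_{t-1}.
rng(1);                      % for reproducibility
T = 200;                     % number of simulated observations
y = zeros(T+2,1);            % two presample values, set to 0
state = zeros(T+2,1);

for t = 3:T+2
    z = y(t-1);              % threshold variable
    if z < 0                 % Recession: z in (-Inf,0)
        state(t) = 1;
        y(t) = -1 + 0.1*y(t-1) + sqrt(1)*randn;
    else                     % Expansion: z in [0,Inf)
        state(t) = 2;
        y(t) = 1 + 0.3*y(t-1) + 0.2*y(t-2) + sqrt(4)*randn;
    end
end

y = y(3:end);                % drop presample values
state = state(3:end);
fprintf('Fraction of time in Recession: %.2f\n',mean(state == 1));
```

Each observation is produced by exactly one submodel, selected by comparing the threshold variable with the mid-level 0, which is the abrupt switching that distinguishes a TAR model from a smooth transition model.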
Create Fully Specified Univariate STAR Model

Create a three-state STAR model with logistic transitions (LSTAR) for a 1-D response process. Specify all parameter values (this example uses arbitrary values).

Create smooth, logistic threshold transitions at levels 2 and 8. Specify the following transition rates:

• 3.5, when the system transitions from state 1 to state 2.
• 1.5, when the system transitions from state 2 to state 3.

    t = [2 8];
    r = [3.5 1.5];
    tt = threshold(t,Type="logistic",Rates=r)

    tt =
      threshold with properties:
              Type: 'logistic'
            Levels: [2 8]
             Rates: [3.5000 1.5000]
        StateNames: ["1"    "2"    "3"]
         NumStates: 3
tt is a fully specified threshold object.

Assume the following univariate models describe the response process of the system:

• State 1: $y_t = -5 + \varepsilon_1$, where $\varepsilon_1 \sim N(0, 0.1^2)$.
• State 2: $y_t = \varepsilon_2$, where $\varepsilon_2 \sim N(0, 0.2^2)$.
• State 3: $y_t = 5 + \varepsilon_3$, where $\varepsilon_3 \sim N(0, 0.3^2)$.

    mdl1 = arima(Constant=-5,Variance=0.1);
    mdl2 = arima(Constant=0,Variance=0.2);
    mdl3 = arima(Constant=5,Variance=0.3);
    mdl = [mdl1,mdl2,mdl3];
Create a STAR model from the switching mechanism tt and the state-specific submodels mdl.

    Mdl = tsVAR(tt,mdl)

    Mdl =
      tsVAR with properties:
             Switch: [1x1 threshold]
          Submodels: [3x1 varm]
          NumStates: 3
          NumSeries: 1
         StateNames: ["1"    "2"    "3"]
        SeriesNames: "1"
         Covariance: []
Mdl is a fully specified tsVAR object representing the STAR model.
Create Partially Specified Univariate Model for Estimation

Consider fitting to data a two-state exponential STAR model for a 1-D response process. Assume all parameters are unknown (including the transition mid-level t, the rate r, and all dynamic model coefficients and variances θ).

Create an exponential threshold transition. Specify unknown elements using NaN.

    tt = threshold(NaN,Type="exponential",Rates=NaN)

    tt =
      threshold with properties:
              Type: 'exponential'
            Levels: NaN
             Rates: NaN
        StateNames: ["1"    "2"]
         NumStates: 2
tt is a partially specified threshold object. The mid-level tt.Levels and transition rate tt.Rates are unknown and estimable. Create AR(1) and AR(2) models by using the shorthand syntax of arima. mdl1 = arima(1,0,0); mdl2 = arima(2,0,0);
mdl1 and mdl2 are partially specified arima objects. NaN-valued properties correspond to unknown, estimable parameters. Store the submodels in a vector. mdl = [mdl1; mdl2];
Create a STAR model template from the switching mechanism tt and the state-specific submodels mdl.

    Mdl = tsVAR(tt,mdl)

    Mdl =
      tsVAR with properties:
             Switch: [1x1 threshold]
          Submodels: [2x1 varm]
          NumStates: 2
          NumSeries: 1
         StateNames: ["1"    "2"]
        SeriesNames: "1"
         Covariance: []

Mdl is a partially specified tsVAR object representing a univariate two-state STAR model.

    Mdl.Submodels(1)

    ans =
      varm with properties:
         Description: "1-Dimensional VAR(1) Model"
         SeriesNames: "Y1"
           NumSeries: 1
                   P: 1
            Constant: NaN
                  AR: {NaN} at lag [1]
               Trend: 0
                Beta: [1×0 matrix]
          Covariance: NaN

    Mdl.Submodels(2)

    ans =
      varm with properties:
         Description: "1-Dimensional VAR(2) Model"
         SeriesNames: "Y1"
           NumSeries: 1
                   P: 2
            Constant: NaN
                  AR: {NaN NaN} at lags [1 2]
               Trend: 0
                Beta: [1×0 matrix]
          Covariance: NaN
tsVAR converts the arima object submodels to 1-D varm object equivalents. Mdl is prepared for estimation. You can pass Mdl to estimate, along with data and a fully specified threshold transition (threshold object) containing initial values for optimization.
Create Fully Specified Multivariate Model

Create the following two three-state threshold-switching dynamic regression models for a 2-D response process:

1. A model with state-specific innovations distributions
2. A model with a model-wide innovation covariance
Specify all parameter values (this example uses arbitrary values).

Create Threshold Transitions

Create logistic threshold transitions at mid-levels 2 and 8 with rates 3.5 and 1.5, respectively. Label the corresponding states "Low", "Med", and "High".

    t = [2 8];
    r = [3.5 1.5];
    stateNames = ["Low" "Med" "High"];
    tt = threshold(t,Type="logistic",Rates=[3.5 1.5],StateNames=stateNames)

    tt =
      threshold with properties:
              Type: 'logistic'
            Levels: [2 8]
             Rates: [3.5000 1.5000]
        StateNames: ["Low"    "Med"    "High"]
         NumStates: 3
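tt is a fully specified threshold object.

Specify State-Specific Innovations Covariance Matrices

Assume the following VAR models describe the response processes of the system:

• State 1: $y_t = \begin{bmatrix} 1 \\ -1 \end{bmatrix} + \varepsilon_{1,t}$, where $\varepsilon_{1,t} \sim N\left(\begin{bmatrix} 0 \\ 0 \end{bmatrix}, \begin{bmatrix} 1 & -0.1 \\ -0.1 & 1 \end{bmatrix}\right)$.
• State 2: $y_t = \begin{bmatrix} 2 \\ -2 \end{bmatrix} + \begin{bmatrix} 0.5 & 0.1 \\ 0.5 & 0.5 \end{bmatrix} y_{t-1} + \varepsilon_{2,t}$, where $\varepsilon_{2,t} \sim N\left(\begin{bmatrix} 0 \\ 0 \end{bmatrix}, \begin{bmatrix} 2 & -0.2 \\ -0.2 & 2 \end{bmatrix}\right)$.
• State 3: $y_t = \begin{bmatrix} 3 \\ -3 \end{bmatrix} + \begin{bmatrix} 0.25 & 0 \\ 0 & 0 \end{bmatrix} y_{t-1} + \begin{bmatrix} 0 & 0 \\ 0.25 & 0 \end{bmatrix} y_{t-2} + \varepsilon_{3,t}$, where $\varepsilon_{3,t} \sim N\left(\begin{bmatrix} 0 \\ 0 \end{bmatrix}, \begin{bmatrix} 3 & -0.3 \\ -0.3 & 3 \end{bmatrix}\right)$.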
    % Constants (numSeries x 1 vectors)
    C1 = [1; -1];
    C2 = [2; -2];
    C3 = [3; -3];

    % Autoregression coefficients (numSeries x numSeries matrices)
    AR1 = {};                            % 0 lags
    AR2 = {[0.5 0.1; 0.5 0.5]};          % 1 lag
    AR3 = {[0.25 0; 0 0] [0 0; 0.25 0]}; % 2 lags

    % Innovations covariances (numSeries x numSeries matrices)
    Sigma1 = [1 -0.1; -0.1 1];
    Sigma2 = [2 -0.2; -0.2 2];
    Sigma3 = [3 -0.3; -0.3 3];

    % VAR submodels
    mdl1 = varm('Constant',C1,'AR',AR1,'Covariance',Sigma1);
    mdl2 = varm('Constant',C2,'AR',AR2,'Covariance',Sigma2);
    mdl3 = varm('Constant',C3,'AR',AR3,'Covariance',Sigma3);
mdl1, mdl2, and mdl3 are fully specified varm objects. Store the submodels in a vector with order corresponding to the regimes in tt.StateNames. mdl = [mdl1; mdl2; mdl3];
Create an LSTAR model from the switching mechanism tt and the state-specific submodels mdl. Label the series Y1 and Y2.

    Mdl = tsVAR(tt,mdl,SeriesNames=["Y1" "Y2"])

    Mdl =
      tsVAR with properties:
             Switch: [1x1 threshold]
          Submodels: [3x1 varm]
          NumStates: 3
          NumSeries: 2
         StateNames: ["Low"    "Med"    "High"]
        SeriesNames: ["Y1"    "Y2"]
         Covariance: []

Mdl is a fully specified tsVAR object representing a multivariate three-state LSTAR model. Because the Covariance property is empty ([]), the submodels have their own innovations covariance matrix.

Specify Model-Wide Innovations Covariance Matrix

Suppose that the innovations covariance matrix is invariant across states and has value $\begin{bmatrix} 2 & -0.2 \\ -0.2 & 2 \end{bmatrix}$.
Create an LSTAR model like Mdl that has the model-wide innovations covariance matrix.

    MdlCov = tsVAR(tt,mdl,SeriesNames=["Y1" "Y2"],Covariance=Sigma2)

    MdlCov =
      tsVAR with properties:
             Switch: [1x1 threshold]
          Submodels: [3x1 varm]
          NumStates: 3
          NumSeries: 2
         StateNames: ["Low"    "Med"    "High"]
        SeriesNames: ["Y1"    "Y2"]
         Covariance: [2x2 double]

    MdlCov.Covariance

    ans = 2×2
        2.0000   -0.2000
       -0.2000    2.0000

The Covariance property of MdlCov is nonempty, which means that all submodels share the same innovations distribution.
Create Fully Specified Model Containing Regression Component

Consider including regression components for exogenous variables in each submodel of the threshold-switching dynamic regression model in “Create Fully Specified Multivariate Model” on page 12-2298.

Create logistic threshold transitions at mid-levels 2 and 8 with rates 3.5 and 1.5, respectively. Label the corresponding states "Low", "Med", and "High".

    t = [2 8];
    r = [3.5 1.5];
    stateNames = ["Low" "Med" "High"];
    tt = threshold(t,Type="logistic",Rates=[3.5 1.5],StateNames=stateNames)

    tt =
      threshold with properties:
              Type: 'logistic'
            Levels: [2 8]
             Rates: [3.5000 1.5000]
        StateNames: ["Low"    "Med"    "High"]
         NumStates: 3
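Assume the following VARX models describe the response processes of the system:

• State 1: $y_t = \begin{bmatrix} 1 \\ -1 \end{bmatrix} + \begin{bmatrix} 1 \\ -1 \end{bmatrix} x_{1,t} + \varepsilon_{1,t}$, where $\varepsilon_{1,t} \sim N\left(\begin{bmatrix} 0 \\ 0 \end{bmatrix}, \begin{bmatrix} 1 & -0.1 \\ -0.1 & 1 \end{bmatrix}\right)$.
• State 2: $y_t = \begin{bmatrix} 2 \\ -2 \end{bmatrix} + \begin{bmatrix} 2 & 2 \\ -2 & -2 \end{bmatrix} x_{2,t} + \begin{bmatrix} 0.5 & 0.1 \\ 0.5 & 0.5 \end{bmatrix} y_{t-1} + \varepsilon_{2,t}$, where $\varepsilon_{2,t} \sim N\left(\begin{bmatrix} 0 \\ 0 \end{bmatrix}, \begin{bmatrix} 2 & -0.2 \\ -0.2 & 2 \end{bmatrix}\right)$.
• State 3: $y_t = \begin{bmatrix} 3 \\ -3 \end{bmatrix} + \begin{bmatrix} 3 & 3 & 3 \\ -3 & -3 & -3 \end{bmatrix} x_{3,t} + \begin{bmatrix} 0.25 & 0 \\ 0 & 0 \end{bmatrix} y_{t-1} + \begin{bmatrix} 0 & 0 \\ 0.25 & 0 \end{bmatrix} y_{t-2} + \varepsilon_{3,t}$, where $\varepsilon_{3,t} \sim N\left(\begin{bmatrix} 0 \\ 0 \end{bmatrix}, \begin{bmatrix} 3 & -0.3 \\ -0.3 & 3 \end{bmatrix}\right)$.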
$x_{1,t}$ represents a single exogenous variable, $x_{2,t}$ represents two exogenous variables, and $x_{3,t}$ represents three exogenous variables. Store the submodels in a vector.

    % Constants (numSeries x 1 vectors)
    C1 = [1; -1];
    C2 = [2; -2];
    C3 = [3; -3];

    % Regression coefficients (numSeries x numRegressors matrices)
    Beta1 = [1; -1];           % 1 regressor
    Beta2 = [2 2; -2 -2];      % 2 regressors
    Beta3 = [3 3 3; -3 -3 -3]; % 3 regressors

    % Autoregression coefficients (numSeries x numSeries matrices)
    AR1 = {};
    AR2 = {[0.5 0.1; 0.5 0.5]};
    AR3 = {[0.25 0; 0 0] [0 0; 0.25 0]};

    % Innovations covariances (numSeries x numSeries matrices)
    Sigma1 = [1 -0.1; -0.1 1];
    Sigma2 = [2 -0.2; -0.2 2];
    Sigma3 = [3 -0.3; -0.3 3];

    % VARX submodels
    mdl1 = varm(Constant=C1,AR=AR1,Beta=Beta1,Covariance=Sigma1);
    mdl2 = varm(Constant=C2,AR=AR2,Beta=Beta2,Covariance=Sigma2);
    mdl3 = varm(Constant=C3,AR=AR3,Beta=Beta3,Covariance=Sigma3);

    mdl = [mdl1; mdl2; mdl3];
Create an LSTAR model from the switching mechanism tt and the state-specific submodels mdl. Label the series Y1 and Y2.

    Mdl = tsVAR(tt,mdl,SeriesNames=["Y1" "Y2"])

    Mdl =
      tsVAR with properties:
             Switch: [1x1 threshold]
          Submodels: [3x1 varm]
          NumStates: 3
          NumSeries: 2
         StateNames: ["Low"    "Med"    "High"]
        SeriesNames: ["Y1"    "Y2"]
         Covariance: []

    Mdl.Submodels(1)

    ans =
      varm with properties:
         Description: "2-Dimensional VARX(0) Model with 1 Predictor"
         SeriesNames: "Y1"  "Y2"
           NumSeries: 2
                   P: 0
            Constant: [1 -1]'
                  AR: {}
               Trend: [2×1 vector of zeros]
                Beta: [2×1 matrix]
          Covariance: [2×2 matrix]

    Mdl.Submodels(2)

    ans =
      varm with properties:
         Description: "AR-Stationary 2-Dimensional VARX(1) Model with 2 Predictors"
         SeriesNames: "Y1"  "Y2"
           NumSeries: 2
                   P: 1
            Constant: [2 -2]'
                  AR: {2×2 matrix} at lag [1]
               Trend: [2×1 vector of zeros]
                Beta: [2×2 matrix]
          Covariance: [2×2 matrix]

    Mdl.Submodels(3)

    ans =
      varm with properties:
         Description: "AR-Stationary 2-Dimensional VARX(2) Model with 3 Predictors"
         SeriesNames: "Y1"  "Y2"
           NumSeries: 2
                   P: 2
            Constant: [3 -3]'
                  AR: {2×2 matrices} at lags [1 2]
               Trend: [2×1 vector of zeros]
                Beta: [2×3 matrix]
          Covariance: [2×2 matrix]
Create Partially Specified Multivariate Model for Estimation

Consider fitting to data a three-state TAR model for a 2-D response process. Assume all parameters are unknown (including the two transition mid-levels t and all dynamic model coefficients and variances θ).

Create discrete threshold transitions at two unknown levels. This specification implies a three-state model.

    t = [NaN NaN];
    tt = threshold(t);
tt is a partially specified threshold object. The transition mid-levels tt.Levels are completely unknown and estimable.

Create 2-D VAR(0), VAR(1), and VAR(2) models by using the shorthand syntax of varm. Store the models in a vector.

    mdl1 = varm(2,0);
    mdl2 = varm(2,1);
    mdl3 = varm(2,2);
    mdl = [mdl1 mdl2 mdl3];
    mdl(2)

    ans =
      varm with properties:
         Description: "2-Dimensional VAR(1) Model"
         SeriesNames: "Y1"  "Y2"
           NumSeries: 2
                   P: 1
            Constant: [2×1 vector of NaNs]
                  AR: {2×2 matrix of NaNs} at lag [1]
               Trend: [2×1 vector of zeros]
                Beta: [2×0 matrix]
          Covariance: [2×2 matrix of NaNs]
mdl contains three state-specific varm model templates for estimation. NaN values in the properties indicate estimable parameters.

Create a threshold-switching model template from the switching mechanism tt and the state-specific submodels mdl.

    Mdl = tsVAR(tt,mdl)

    Mdl =
      tsVAR with properties:
             Switch: [1x1 threshold]
          Submodels: [3x1 varm]
          NumStates: 3
          NumSeries: 2
         StateNames: ["1"    "2"    "3"]
        SeriesNames: ["1"    "2"]
         Covariance: []

    Mdl.Submodels(2)

    ans =
      varm with properties:
         Description: "2-Dimensional VAR(1) Model"
         SeriesNames: "Y1"  "Y2"
           NumSeries: 2
                   P: 1
            Constant: [2×1 vector of NaNs]
                  AR: {2×2 matrix of NaNs} at lag [1]
               Trend: [2×1 vector of zeros]
                Beta: [2×0 matrix]
          Covariance: [2×2 matrix of NaNs]
Mdl is a partially specified tsVAR model for estimation.
Specify Model Regression Component for Estimation

Consider estimating all submodel coefficients and innovations covariances, and the threshold levels, of the LSTAR model in “Create Fully Specified Model Containing Regression Component” on page 12-2300.

Create logistic threshold transitions at two unknown (NaN) mid-levels and with two unknown rates. Label the corresponding states "Low", "Med", and "High".

    stateNames = ["Low" "Med" "High"];
    tt = threshold([NaN NaN],Type="logistic",Rates=[NaN NaN],StateNames=stateNames)

    tt =
      threshold with properties:
              Type: 'logistic'
            Levels: [NaN NaN]
             Rates: [NaN NaN]
        StateNames: ["Low"    "Med"    "High"]
         NumStates: 3
Specify the appropriate 2-D VAR model for each state by using the shorthand syntax of varm, then use dot notation to specify a numSeries-by-numRegressors unknown, estimable exogenous regression coefficient matrix.

• State 1: VARX(0) model with one regressor
• State 2: VARX(1) model with two regressors
• State 3: VARX(2) model with three regressors

Store the submodels in a vector.

    mdl1 = varm(2,0);
    mdl1.Beta = nan(2,1); % numSeries-by-numRegressors
    mdl2 = varm(2,1);
    mdl2.Beta = nan(2,2);
    mdl3 = varm(2,2);
    mdl3.Beta = nan(2,3);
    mdl = [mdl1; mdl2; mdl3];
Create an estimable LSTAR model from the switching mechanism tt and the state-specific submodels mdl. Label the series Y1 and Y2.

    Mdl = tsVAR(tt,mdl,SeriesNames=["Y1" "Y2"])

    Mdl =
      tsVAR with properties:
             Switch: [1x1 threshold]
          Submodels: [3x1 varm]
          NumStates: 3
          NumSeries: 2
         StateNames: ["Low"    "Med"    "High"]
        SeriesNames: ["Y1"    "Y2"]
         Covariance: []

    Mdl.Submodels(2)

    ans =
      varm with properties:
         Description: "2-Dimensional VARX(1) Model with 2 Predictors"
         SeriesNames: "Y1"  "Y2"
           NumSeries: 2
                   P: 1
            Constant: [2×1 vector of NaNs]
                  AR: {2×2 matrix of NaNs} at lag [1]
               Trend: [2×1 vector of zeros]
                Beta: [2×2 matrix of NaNs]
          Covariance: [2×2 matrix of NaNs]
Create Model Specifying Equality Constraints for Estimation

The estimate function generally supports constraining any parameter to a known constant. Also, you can specify a model-wide innovations covariance by setting the Covariance property of tsVAR.

Consider estimating a univariate threshold-switching model with the following characteristics:

• A threshold transition is known to occur at 0.
• The transition function is the normal cdf with unknown rate.
• States 1 and 2 are constant models. The constants are unknown.
• The innovations process is invariant among states, but the variance is unknown.

Create the described threshold transition.

    t = 0;
    tt = threshold(t,Type="normal",Rates=NaN)

    tt =
      threshold with properties:
              Type: 'normal'
            Levels: 0
             Rates: NaN
        StateNames: ["1"    "2"]
         NumStates: 2

tt is a partially specified threshold object. Only the transition function rate tt.Rates is unknown and estimable.
Create the described submodels by using the shorthand syntax of arima. Store the submodels in a vector.

    mdl1 = arima(0,0,0);
    mdl2 = arima(0,0,0)

    mdl2 =
      arima with properties:
         Description: "ARIMA(0,0,0) Model (Gaussian Distribution)"
        Distribution: Name = "Gaussian"
                   P: 0
                   D: 0
                   Q: 0
            Constant: NaN
                  AR: {}
                 SAR: {}
                  MA: {}
                 SMA: {}
         Seasonality: 0
                Beta: [1×0]
            Variance: NaN

    mdl = [mdl1 mdl2];
mdl1 and mdl2 are partially specified arima objects representing constant-only linear models. Each model constant is unknown and estimable.

Create a threshold-switching model from the specified threshold transitions and submodels. Specify an unknown, estimable model-wide innovations covariance matrix.

    Mdl = tsVAR(tt,mdl,Covariance=NaN)

    Mdl =
      tsVAR with properties:
             Switch: [1x1 threshold]
          Submodels: [2x1 varm]
          NumStates: 2
          NumSeries: 1
         StateNames: ["1"    "2"]
        SeriesNames: "1"
         Covariance: NaN
Mdl is a partially specified tsVAR object configured for estimation. Because Mdl.Covariance is nonempty, MATLAB ignores any specified submodel innovations variances.
More About
Threshold-Switching Dynamic Regression Model
A threshold-switching dynamic regression model of a univariate or multivariate response series $y_t$ is a nonlinear time series model that describes the dynamic behavior of the series in the presence of structural breaks or regime changes. A collection of state-specific dynamic regression submodels describes the dynamic behavior of $y_t$ within the regimes.

$$y_t \sim \begin{cases} f_1(y_t; x_t, \theta_1), & s_t(z_t) = 1 \\ f_2(y_t; x_t, \theta_2), & s_t(z_t) = 2 \\ \quad\vdots & \quad\vdots \\ f_{n+1}(y_t; x_t, \theta_{n+1}), & s_t(z_t) = n + 1 \end{cases}$$

where:
• $n + 1$ is the number of regimes (NumStates).
• $f_i(y_t; x_t, \theta_i)$ is the regime $i$ dynamic regression model of $y_t$ (Submodels(i)). Submodels are either univariate (ARX) or multivariate (VARX). Such a collection of models yields a threshold autoregressive model (TAR).
• $z_t$ is the univariate threshold variable. $z_t$ can be exogenous to the system or endogenous and delayed. If $z_t = y_{k,t-d}$, the system is a self-exciting threshold autoregressive model (SETAR) with unobserved delay $d$.
• $x_t$ is a vector of observed exogenous variables at time $t$.
• $\theta_i$ is the regime $i$ collection of parameters of the dynamic regression model, such as AR coefficients and the innovation variances.
The switching mechanism (Switch) is governed by threshold transitions and $z_t$. The state index variable $s_t(z_t)$ is not random: observed values of the threshold variable $z_t$ determine the state of the system:

$$s_t = \begin{cases} 1, & z_t < t_1 \\ 2, & t_1 \le z_t < t_2 \\ \quad\vdots & \quad\vdots \\ n + 1, & z_t \ge t_n \end{cases}$$
where $t_j$ is the unobserved threshold mid-level $j$ (Switch.Levels(j)). Threshold levels must be inferred from the data.
A state transition occurs when $z_t$ crosses a transition mid-level. Transitions can be discrete or smooth. Transitions of TAR models are discrete, which results in an abrupt change in the submodel computing the response. An extension of the TAR model is to allow for smooth transitions. Smooth transition autoregressive models (STAR) create weighted combinations of submodel responses that change continuously with the value of $z_t$, and state changes indicate a shift in the dominant submodel. The smooth transition weights are determined by a transition function $F(z_t, t_j, r_j)$ (Switch.Type) and transition rate parameter $r_j$ (Switch.Rates), where $0 \le F(z_t, t_j, r_j) \le 1$. Supported transition functions include the normal cdf, logistic (LSTAR), bounded exponential (ESTAR), and custom functions.
As a result, the general form of the threshold-switching autoregressive model is

$$y_t = f_1(y_t; x_t, \theta_1) + \sum_{j=1}^{n} \left[ f_{j+1}(y_t; x_t, \theta_{j+1}) - f_j(y_t; x_t, \theta_j) \right] F(z_t, t_j, r_j).$$

• In this general case, innovation covariances can switch with the submodel.
• For STAR models, the formula assigns weights to all submodel means, regardless of the current state.
• For TAR models, the formula assigns unit weight to the current state/submodel only.
• With observations, $F(z_t, t_j, r_j)$ = ttdata(tt,z), where z is a vector of threshold variable data.
When Covariance is a nonempty array $\Sigma$, it is used to generate all innovations, independent of the submodels. The model reduces to ([3], Eqn. 3.6)

$$y_t = \mu_1 + \sum_{j=1}^{n} (\mu_{j+1} - \mu_j) F(z_t, t_j, r_j) + \varepsilon_t,$$

where:
• $\mu_j = E(y_t \mid y_{t-1}, x_t)$, which is the conditional mean of the response series $y_t$ in state $j$.
• $\varepsilon_t$ is an iid Gaussian innovations series with covariance $\Sigma$.
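The reduced form can be checked numerically. The following is a minimal sketch in base MATLAB (no toolbox calls). It assumes the logistic transition function has the form F = 1/(1 + exp(-r(z - t))) and uses illustrative values for the state means, levels, and rates; none of these values come from this guide.

```matlab
% Illustrative values (assumptions): three state means and two thresholds
mu     = [1 3 6];       % conditional means mu_1, mu_2, mu_3
levels = [0 5];         % threshold mid-levels t_1, t_2
rates  = [1 1];         % transition rates r_1, r_2
z      = [-50; 0; 50];  % threshold variable observations (column vector)

% Assumed logistic transition function, one column per level (smooth case)
Fsmooth = 1./(1 + exp(-rates.*(z - levels)));
% Discrete (TAR) transition function: a unit step at each level
Fdisc = double(z >= levels);

% Conditional mean: mu_1 + sum_j (mu_{j+1} - mu_j)*F(z,t_j,r_j)
meanSmooth = mu(1) + Fsmooth*diff(mu)';
meanDisc   = mu(1) + Fdisc*diff(mu)';
```

With discrete transitions, meanDisc equals the mean of the current state only. With smooth transitions, meanSmooth blends the means of adjacent states when z is near a level and approaches a single state mean far from all levels.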
Algorithms A threshold variable zt, which triggers transitions between states, is not required to create Mdl. Specify exogenous or endogenous threshold variable data, and its characteristics, when you pass Mdl to an object function on page 12-2308.
Version History Introduced in R2021b
References
[1] Enders, Walter. Applied Econometric Time Series. New York: John Wiley & Sons, Inc., 2009.
[2] Teräsvirta, Timo. "Modelling Economic Relationships with Smooth Transition Regressions." In A. Ullah and D.E.A. Giles (eds.), Handbook of Applied Economic Statistics, 507–552. New York: Marcel Dekker, 1998.
[3] van Dijk, Dick. Smooth Transition Models: Extensions and Outlier Robust Inference. Rotterdam, Netherlands: Tinbergen Institute Research Series, 1999.
See Also
threshold | arima | varm
Topics
“Create Threshold Transitions” on page 10-73
“Create Threshold-Switching Dynamic Regression Models” on page 10-88
“Simulate Paths of Threshold-Switching Dynamic Regression Models” on page 10-110
ttdata Transition function data
Syntax
T = ttdata(tt,z)
T = ttdata(tt,z,UseZeroLevels=tf)
Description
ttdata evaluates the transition function for observations of the threshold variable. To plot transition functions of threshold transitions, use ttplot.
T = ttdata(tt,z) returns transition function data T for the threshold transitions in tt at values of the threshold variable z.
T = ttdata(tt,z,UseZeroLevels=tf) returns transition function data with all levels set to 0 when tf is true.
Examples
Evaluate Transition Function
Create logistic threshold transitions at levels 0 and 5.
t = [0 5];
tt = threshold(t,Type="logistic");
tt is a threshold object. By default, the rate of each logistic transition function is 1.
Evaluate the transition function at a sequence of transition variable data from -10 to 10.
z = -10:0.01:10;
F = ttdata(tt,z);
size(F)
ans = 1×2

        2001           2

F is a 2001-by-2 matrix of transition function data. Each column is the transition function data for the corresponding threshold in tt.Levels.
Plot Transition Functions at Respective Levels
To facilitate comparisons among transition rates, the Type="graph" option of ttplot graphs all transition functions at the same level. This example shows how to graph each transition function at its respective level.
Create normal threshold transitions at levels 0 and 5 with rates 0.5 and 1.5, respectively.
tt = threshold([0 5],Type="normal",Rates=[0.5 1.5]);
Evaluate the transition functions at their respective levels (the default), and then evaluate each of them relative to level 0. Specify a sequence of transition variable data from -10 to 10.
z = -10:0.01:10;
n = numel(z);
T0 = ttdata(tt,z);
T1 = ttdata(tt,z,UseZeroLevels=true);
T0 is an n-by-2 matrix of raw transition function data evaluated at the grid of transition variable data, with one column per level. T1 is an n-by-2 matrix of transition function data translated to be centered at level 0.
Plot both sets of transition functions separately.
% Raw transition functions
figure
plot(z,T0,LineWidth=2)
xline(tt.Levels,'--')
grid on
xlabel("Level")
legend(["Level 0, Rate 0.5" "Level 5, Rate 1.5"],Location="northwest")

% Shifted transition functions
figure
plot(z,T1,LineWidth=2)
grid on
xlabel("Distance from Level")
legend(["Level 0, Rate 0.5" "Level 5, Rate 1.5"],Location="northwest")
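The difference between raw and zero-level transition function data can be reproduced by hand. The sketch below uses only base MATLAB and assumes that the normal transition function is F(z,t,r) = Φ(r(z - t)), where Φ is the standard normal cdf written in terms of erfc. This functional form is an assumption made for illustration; see threshold for the definition that the toolbox uses.

```matlab
% Standard normal cdf expressed with the base function erfc
Phi = @(x) 0.5*erfc(-x/sqrt(2));

levels = [0 5];
rates  = [0.5 1.5];
z = (-10:0.01:10)';                % column vector of threshold variable data

% Assumed form F(z,t,r) = Phi(r*(z - t)); one column per level
Fraw  = Phi(rates.*(z - levels));  % each column centered at its own level
Fzero = Phi(rates.*z);             % all levels set to 0

% Grid points closest to 0 and to 5
[~,i0] = min(abs(z - 0));
[~,i5] = min(abs(z - 5));
```

Under this assumption, each column of Fraw passes through 0.5 at its own level, whereas both columns of Fzero pass through 0.5 at z = 0, so that only the rates distinguish them.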
Input Arguments
tt — Threshold transitions
threshold object
Threshold transitions, with NumStates states, specified as a threshold object. tt must be fully specified (no NaN entries).
z — Threshold variable data
numeric vector
Threshold variable data, specified as a numeric vector.
Data Types: double
tf — Flag indicating whether to compute data with all levels set to zero
false (default) | true
Flag indicating whether to compute data with all levels set to 0, specified as a value in this table:

Value    Description
true     Computes transition function data with all levels set to 0. This setting is useful for comparing transition rates.
false    Computes raw transition function data.

Example: UseZeroLevels=true
Data Types: logical
Output Arguments
T — Transition function data
numeric matrix
Transition function data F(z,tt.Levels,tt.Rates), returned as a numeric matrix. F is specified by tt.TransitionFunctionData. Data in T gives transition function values relative to each level in tt.Levels. The number of rows of T is equal to the length of z and the number of columns is equal to the number of levels. For details, see threshold.
Version History Introduced in R2021b
See Also
Objects
threshold
Functions
ttplot | ttstates
Topics
“Create Threshold Transitions” on page 10-73
“Visualize Threshold Transitions” on page 10-76
ttplot Plot threshold transitions
Syntax
ttplot(tt)
ttplot(tt,Name,Value)
ttplot(ax, ___ )
h = ttplot( ___ )
Description
ttplot plots transition functions of threshold transitions. To evaluate the transition function for observations of the threshold variable, use ttdata.
ttplot(tt) plots transition bands between states of the threshold transitions tt on the y-axis. The plot shows gradient shading of the mixing level on page 12-2323 in the transition bands.
ttplot(tt,Name,Value) uses additional options specified by one or more name-value arguments. For example, ttplot(tt,Type="graph") specifies plotting a line plot of the transition function at each threshold level on the same axes.
ttplot(ax, ___ ) plots on the axes specified by ax instead of the current axes (gca) using any of the input argument combinations in the previous syntaxes.
h = ttplot( ___ ) returns a handle h to the threshold transitions plot. Use h to modify properties of the plot after you create it.
Examples
Plot Discrete Threshold Transitions
Create discrete threshold transitions at 0 and 2.
t = [0 2];
tt = threshold(t)
tt = 
  threshold with properties:

          Type: 'discrete'
        Levels: [0 2]
         Rates: []
    StateNames: ["1"    "2"    "3"]
     NumStates: 3

tt is a threshold object. The specified thresholds split the space into three states.
Plot the threshold transitions.
ttplot(tt);
ttplot graphs a gradient plot by default. The y-axis represents the value of the threshold variable zt (currently undefined) and the state-space:
• The system is in state 1 when zt < 0.
• The system is in state 2 when 0 ≤ zt < 2.
• The system is in state 3 when zt ≥ 2.
Because the transitions are discrete, ttplot graphs the levels as lines; the regime switches abruptly when zt crosses a threshold level.
Because zt is undefined, the x-axis is arbitrary. When you specify threshold variable data by using the Data name-value argument, the x-axis is the sample index.
Plot Smooth Threshold Transitions
This example shows how to create two logistic threshold transitions with different transition rates, and then display a gradient plot of the transitions.
Load the yearly Canadian inflation and interest rates data set. Extract the inflation rate based on consumer price index (INF_C) from the table, and plot the series.
load Data_Canada
INF_C = DataTable.INF_C;
plot(dates,INF_C);
axis tight
Assume the following characteristics of the inflation rate series:
• Rates below 2% are low.
• Rates at least 2% and below 8% are medium.
• Rates at least 8% are high.
• A logistic transition function describes the transition between states well.
• Transitions between low and medium rates are faster than transitions between medium and high rates.
Create threshold transitions to describe the Canadian inflation rates.
t = [2 8];       % Thresholds
r = [3.5 1.5];   % Transition rates
statenames = ["Low" "Med" "High"];
tt = threshold(t,Type="logistic",Rates=r,StateNames=statenames)
tt = 
  threshold with properties:

          Type: 'logistic'
        Levels: [2 8]
         Rates: [3.5000 1.5000]
    StateNames: ["Low"    "Med"    "High"]
     NumStates: 3

Plot the threshold transitions; show the gradient of the transition function between the states, and overlay the data.
figure
ttplot(tt,Data=INF_C)
Visually Compare Transition Functions
Create normal cdf threshold transitions at levels 0 and 5, with rates 0.5 and 1.5.
t = [0 5];
r = [0.5 1.5];
tt = threshold(t,Type="normal",Rates=r)
tt = 
  threshold with properties:

          Type: 'normal'
        Levels: [0 5]
         Rates: [0.5000 1.5000]
    StateNames: ["1"    "2"    "3"]
     NumStates: 3

To compare the behavior of the transition functions, plot their graphs at the same level.
figure
ttplot(tt,Type="graph",Width=20)
Plot the transition functions at their levels. Evaluate the transition function over a 1-D grid of values by using ttdata, and then plot the results.
lower = tt.Levels(1) - 3/min(tt.Rates);
upper = tt.Levels(end) + 3/min(tt.Rates);
z = lower:0.1:upper;
F = ttdata(tt,z,UseZeroLevels=false);
figure
plot(z,F,LineWidth=2)
grid on
xlabel("Level")
legend(["Level 0, Rate 0.5" "Level 5, Rate 1.5"],Location="NorthWest")
Plot Exponential Threshold Transitions
Create smooth threshold transitions for the Australian to US dollar exchange rate to model price parity.
Load the Australia/US purchasing power and interest rates data set. Extract the log of the exchange rate EXCH from the table.
load Data_JAustralian
EXCH = DataTable.EXCH;
Consider a two-state system where:
• State 1 occurs when the Australian dollar buys more than the US dollar (EXCH ≥ 0).
• State 2 occurs when the US dollar buys more than the Australian dollar (EXCH < 0).
• States are weighted more highly as the system deviates from parity (EXCH = 0).
Create threshold transitions representing the system. To attribute a greater amount of mixing away from the threshold, specify an exponential transition function. Set the transition rate to 2.5.
tt = threshold(0,Type="exponential",Rates=2.5)
tt = 
  threshold with properties:

          Type: 'exponential'
        Levels: 0
         Rates: 2.5000
    StateNames: ["1"    "2"]
     NumStates: 2

Plot the threshold transitions with the threshold data.
figure
ttplot(tt,Data=EXCH);
Try improving the display by experimenting with the transition band width (Width name-value argument).
figure
ttplot(tt,Data=EXCH,Width=2);
Plot the transition function.
figure
ttplot(tt,Type="graph");
Input Arguments
tt — Threshold transitions
threshold object
Threshold transitions, with NumStates states, specified as a threshold object. tt must be fully specified (no NaN entries).
ax — Axes on which to plot
Axes object
Axes on which to plot, specified as an Axes object. By default, ttplot plots to the current axes (gca).
Name-Value Arguments
Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter.
Before R2021a, use commas to separate each name and value, and enclose Name in quotes.
Example: Type="graph" specifies plotting a line plot of the transition function at each threshold level on the same axes.
Type — Plot type
"gradient" (default) | "graph"
Plot type, specified as a value in this table.

Value         Description
"gradient"    Gradient shading of the mixing level in each transition band.
"graph"       Graphs of transition functions at each level. ttplot plots graphs with levels set to zero.

Example: Type="graph"
Data Types: char | string
Data — Data on threshold variable zt to include in plot
[] (empty array) (default) | numeric vector
Data on a threshold variable zt to include in the plot, specified as a numeric vector. ttplot plots Data with gradient shading of transition bands (Type="gradient"). If Type="graph", ttplot ignores Data.
Data Types: double
Width — Width of transition bands
positive numeric scalar
Width of transition bands, specified as a positive numeric scalar.
• For gradient plots (Type="gradient"), ttplot truncates transition function data outside of the bands.
• For transition function graphs (Type="graph"), ttplot sets the x-axis limits to [-Width/2 Width/2].
By default, ttplot selects the band width automatically.
Example: Width=10
Data Types: double
Output Arguments
h — Plot handle
graphics object
Plot handle, returned as a graphics object. h contains a unique plot identifier, which you can use to query or modify properties of the plot.
More About
Mixing Level
The mixing level is the degree to which adjacent states contribute to a response.
Transition functions F vary between 0 and 1; adjacent states are assigned weights F and 1 – F. The mixing level between adjacent states is the minimum weight min(F, 1 – F).
The following characteristics define the mixing behavior of each transition type:
• Discrete transitions have no mixing.
• Normal and logistic transitions achieve maximum mixing at threshold levels.
• Exponential transitions achieve maximum mixing on either side of threshold levels.
For more details, see threshold.
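The mixing behavior listed above can be illustrated with a short calculation. The sketch below uses base MATLAB only and assumes the logistic form F = 1/(1 + exp(-r(z - t))) and the exponential form F = 1 - exp(-r(z - t)^2). Both functional forms, and the values of t and r, are assumptions chosen for illustration.

```matlab
% Assumed functional forms (not taken from this section):
%   logistic:    F = 1/(1 + exp(-r*(z - t)))
%   exponential: F = 1 - exp(-r*(z - t)^2)
t = 0;                 % threshold level
r = 2.5;               % transition rate
z = (-3:0.001:3)';     % grid of threshold variable values

Flogistic = 1./(1 + exp(-r*(z - t)));
Fexp      = 1 - exp(-r*(z - t).^2);

% Mixing level between adjacent states: min(F, 1 - F)
mixLogistic = min(Flogistic, 1 - Flogistic);
mixExp      = min(Fexp, 1 - Fexp);

[~,iMaxLogistic] = max(mixLogistic);  % grid location of maximum logistic mixing
[~,iThresh] = min(abs(z - t));        % grid point at the threshold level
zHalf = sqrt(log(2)/r);               % distance from t at which the exponential F equals 0.5
```

Under these assumptions, logistic mixing peaks at the threshold itself, whereas exponential mixing is zero at the threshold and peaks at a distance zHalf on either side of it.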
Tips • Use the Width name-value argument to adjust the display of transition function graph (Type="graph") plots with varying rates. In multilevel gradient plots (Type="gradient"), a large enough width results in overlapping transition bands that can misrepresent data. By default, ttplot chooses an appropriate width for displaying all transitions.
Version History Introduced in R2021b
See Also
Objects
threshold
Functions
ttdata
Topics
“Create Threshold Transitions” on page 10-73
“Visualize Threshold Transitions” on page 10-76
“Simulate Paths of Threshold-Switching Dynamic Regression Models” on page 10-110
ttstates Threshold variable data state path
Syntax
states = ttstates(tt,z)
Description
states = ttstates(tt,z) returns the state path for values of the threshold variable z relative to the levels in the Levels property of tt.
Examples
Compute State Path
Load the yearly Canadian inflation and interest rates data set. Extract the inflation rate based on consumer price index (INF_C) from the table.
load Data_Canada
INF_C = DataTable.INF_C;
Assume the following characteristics of the inflation rate series:
• Rates below 2% are low.
• Rates at least 2% and below 8% are medium.
• Rates at least 8% are high.
• States transition abruptly.
Create threshold transitions to describe the Canadian inflation rates.
statenames = ["Low" "Med" "High"];
tt = threshold([2 8],StateNames=statenames);
Infer the state path by passing the inflation rate series through the threshold transitions.
n = numel(INF_C);
states = ttstates(tt,INF_C);
snpath = tt.StateNames(states);
states is an n-by-1 vector of inferred state indices. snpath is the state path using state names instead of indices.
Separately plot the inflation rate series and inferred state path.
figure
tiledlayout(2,1)
nexttile
h = ttplot(tt,Data=INF_C);
legend(h([1 3]),["State threshold" "Inflation rate"])
nexttile
plot(states,'go',LineWidth=2)
ylabel('State')
yticks(1:3)
yticklabels(tt.StateNames)
axis tight
Input Arguments
tt — Threshold transitions
threshold object
Threshold transitions, with NumStates states, specified as a threshold object. tt must be fully specified (no NaN entries).
z — Threshold variable data
numeric vector
Threshold variable data, specified as a numeric vector.
Data Types: double
Output Arguments
states — Threshold data states
numeric vector
Threshold data states, returned as a numeric vector with the same length as z. If the transition mid-levels tt.Levels are t1, t2, …, tn, ttstates labels states (−∞,t1), [t1,t2), …, [tn,∞) as 1, 2, …, n+1, respectively. States are independent of threshold rates.
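The labeling rule for states can be written in one line of base MATLAB: the state index is one plus the number of levels at or below the observation. The following sketch uses hypothetical data values and does not call ttstates.

```matlab
levels = [2 8];                 % transition mid-levels t_1 and t_2
z = [1.2; 2; 5.5; 8; 11.3];     % hypothetical threshold variable data (column vector)

% Intervals (-Inf,t_1), [t_1,t_2), [t_2,Inf) map to states 1, 2, 3
states = 1 + sum(z >= levels, 2);
```

An observation equal to a level belongs to the upper state, which matches the half-open intervals [t1,t2) in the definition.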
Algorithms In threshold-switching dynamic regression models (tsVAR), state transitions occur when a threshold variable crosses a transition mid-level. Discrete transitions result in an abrupt change in the submodel computing the response. Smooth transitions create weighted combinations of submodel responses that change continuously with the value of the threshold variable, and state changes indicate a shift in the dominant submodel. For more details, see tsVAR.
Version History Introduced in R2021b
See Also
Objects
threshold | tsVAR
Functions
ttdata
Topics
“Create Threshold Transitions” on page 10-73
“Simulate Paths of Threshold-Switching Dynamic Regression Models” on page 10-110
tune Tune Bayesian state-space model posterior sampler
Syntax
[params,Proposal] = tune(PriorMdl,Y,params0)
[params,Proposal] = tune(PriorMdl,Y,params0,Name=Value)
Description
The tune function searches for the posterior mode to form proposal distribution moments of the Metropolis-Hastings sampler [1][2]. To improve the proposal distribution, for example, to increase the acceptance rate of proposed posterior draws, pass the outputs of tune to simulate.
[params,Proposal] = tune(PriorMdl,Y,params0) returns proposal distribution parameter mean vector params and scale matrix Proposal to improve the Metropolis-Hastings sampler. PriorMdl is the Bayesian state-space model that specifies the state-space model structure (likelihood) and prior distribution, Y is the data for the likelihood, and params0 is the vector of initial values for the unknown state-space model parameters θ in PriorMdl.
[params,Proposal] = tune(PriorMdl,Y,params0,Name=Value) specifies additional options using one or more name-value arguments. For example, tune(Mdl,Y,params0,Hessian="opg",Display=false) uses the outer-product of gradients method to compute the Hessian matrix and suppresses the display of the optimized values.
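The idea of forming proposal moments from a posterior mode can be sketched on a toy problem. The code below is not the implementation of tune: it replaces the state-space posterior with a hypothetical bivariate Gaussian kernel, uses the base function fminsearch instead of an Optimization Toolbox solver, and takes the inverse of a finite-difference Hessian of the negative log posterior as the scale matrix. Each of those choices is an assumption made so that the sketch runs without any toolbox.

```matlab
% Hypothetical stand-in for a log posterior: a bivariate Gaussian kernel
m = [1; -2];                    % true mode
S = [0.5 0.1; 0.1 0.2];         % true covariance
logpost = @(th) -0.5*(th(:) - m)'*(S\(th(:) - m));

% Step 1: find the mode by minimizing the negative log posterior
opts = optimset('TolX',1e-10,'TolFun',1e-12,'MaxIter',5000,'MaxFunEvals',5000);
thetaHat = fminsearch(@(th) -logpost(th),[0; 0],opts);

% Step 2: central finite-difference Hessian of the negative log posterior
k = numel(thetaHat);
h = 1e-3;
E = h*eye(k);
H = zeros(k);
for i = 1:k
    for j = 1:k
        H(i,j) = (-logpost(thetaHat + E(:,i) + E(:,j)) ...
                  + logpost(thetaHat + E(:,i) - E(:,j)) ...
                  + logpost(thetaHat - E(:,i) + E(:,j)) ...
                  - logpost(thetaHat - E(:,i) - E(:,j)))/(4*h^2);
    end
end

% Step 3: use the inverse Hessian as a proposal scale matrix
proposalScale = inv(H);
```

For this Gaussian toy problem, thetaHat recovers m and proposalScale recovers S, which is why a mode and a curvature estimate together make a reasonable starting point for a Metropolis-Hastings proposal.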
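Examples
Tune Proposal Distribution for Metropolis Sampler of Bayesian State-Space Model
Simulate observed responses from a known state-space model, then treat the model as Bayesian and draw parameters from the posterior distribution. Tune the proposal distribution of the Metropolis-Hastings sampler by using tune.
Suppose the following state-space model is a data-generating process (DGP).

$$\begin{bmatrix} x_{t,1} \\ x_{t,2} \end{bmatrix} = \begin{bmatrix} 0.5 & 0 \\ 0 & -0.75 \end{bmatrix} \begin{bmatrix} x_{t-1,1} \\ x_{t-1,2} \end{bmatrix} + \begin{bmatrix} 1 & 0 \\ 0 & 0.5 \end{bmatrix} \begin{bmatrix} u_{t,1} \\ u_{t,2} \end{bmatrix}$$

$$y_t = \begin{bmatrix} 1 & 1 \end{bmatrix} \begin{bmatrix} x_{t,1} \\ x_{t,2} \end{bmatrix}.$$

Create a standard state-space model object ssm that represents the DGP.
trueTheta = [0.5; -0.75; 1; 0.5];
A = [trueTheta(1) 0; 0 trueTheta(2)];
B = [trueTheta(3) 0; 0 trueTheta(4)];
C = [1 1];
DGP = ssm(A,B,C);
Simulate a response path from the DGP.
rng(1); % For reproducibility
y = simulate(DGP,200);
Suppose the structure of the DGP is known, but the state parameters trueTheta are unknown, explicitly

$$\begin{bmatrix} x_{t,1} \\ x_{t,2} \end{bmatrix} = \begin{bmatrix} \phi_1 & 0 \\ 0 & \phi_2 \end{bmatrix} \begin{bmatrix} x_{t-1,1} \\ x_{t-1,2} \end{bmatrix} + \begin{bmatrix} \sigma_1 & 0 \\ 0 & \sigma_2 \end{bmatrix} \begin{bmatrix} u_{t,1} \\ u_{t,2} \end{bmatrix}$$

$$y_t = \begin{bmatrix} 1 & 1 \end{bmatrix} \begin{bmatrix} x_{t,1} \\ x_{t,2} \end{bmatrix}.$$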
Consider a Bayesian state-space model representing the model with unknown parameters. Arbitrarily assume that the prior distributions of $\phi_1$, $\phi_2$, $\sigma_1^2$, and $\sigma_2^2$ are independent Gaussian with mean 0.5 and variance 1.
The Local Functions on page 12-2330 section contains two functions required to specify the Bayesian state-space model. You can use the functions only within this script.
The paramMap function accepts a vector of the unknown state-space model parameters and returns all the following quantities:
• $A = \begin{bmatrix} \phi_1 & 0 \\ 0 & \phi_2 \end{bmatrix}$.
• $B = \begin{bmatrix} \sigma_1 & 0 \\ 0 & \sigma_2 \end{bmatrix}$.
• $C = \begin{bmatrix} 1 & 1 \end{bmatrix}$.
• $D = 0$.
• Mean0 and Cov0 are empty arrays [], which specify the defaults.
• StateType = [0; 0], indicating that each state is stationary.
The priorDistribution function accepts the same vector of unknown parameters as does paramMap, but it returns the log prior density of the parameters at their current values. Specify that parameter values outside the parameter space have log prior density of -Inf.
Create the Bayesian state-space model by passing function handles to paramMap and priorDistribution to bssm.
Mdl = bssm(@paramMap,@priorDistribution)
Mdl = 
  Mapping that defines a state-space model:
    @paramMap

  Log density of parameter prior distribution:
    @priorDistribution
The simulate function requires a proposal distribution scale matrix. You can obtain a data-driven proposal scale matrix by using the tune function. Alternatively, you can supply your own scale matrix.
Obtain a data-driven scale matrix by using the tune function. Supply a random set of initial parameter values.
numParams = 4;
theta0 = rand(numParams,1);
[theta0,Proposal] = tune(Mdl,y,theta0);
Local minimum found.

Optimization completed because the size of the gradient is less than
the value of the optimality tolerance.

         Optimization and Tuning
      | Params0  Optimized  ProposalStd
----------------------------------------
 c(1) |  0.6968    0.4459      0.0798
 c(2) |  0.7662   -0.8781      0.0483
 c(3) |  0.3425    0.9633      0.0694
 c(4) |  0.8459    0.3978      0.0726

theta0 is a 4-by-1 estimate of the posterior mode and Proposal is the Hessian matrix. Both outputs are the optimized moments of the proposal distribution, the latter of which is up to a proportionality constant. tune displays convergence information and an estimation table, which you can suppress by using the Display options of the optimizer and tune.
Draw 1000 random parameter vectors from the posterior distribution. Specify the simulated response path as observed responses and the optimized values returned by tune for the initial parameter values and the proposal distribution.
[Theta,accept] = simulate(Mdl,y,theta0,Proposal);
accept
accept = 0.4010

Theta is a 4-by-1000 matrix of randomly drawn parameters from the posterior distribution. Rows correspond to the elements of the input argument theta of the functions paramMap and priorDistribution.
accept is the proposal acceptance probability. In this case, simulate accepts 40% of the proposal draws.
Local Functions
This example uses the following functions. paramMap is the parameter-to-matrix mapping function and priorDistribution is the log prior distribution of the parameters.
function [A,B,C,D,Mean0,Cov0,StateType] = paramMap(theta)
    A = [theta(1) 0; 0 theta(2)];
    B = [theta(3) 0; 0 theta(4)];
    C = [1 1];
    D = 0;
    Mean0 = [];         % MATLAB uses default initial state mean
    Cov0 = [];          % MATLAB uses default initial state covariance
    StateType = [0; 0]; % Two stationary states
end

function logprior = priorDistribution(theta)
    paramconstraints = [(abs(theta(1)) >= 1) (abs(theta(2)) >= 1) ...
        (theta(3) < 0) (theta(4) < 0)];
    if(sum(paramconstraints))
        logprior = -Inf;
    else
        mu0 = 0.5*ones(numel(theta),1);
        sigma0 = 1;
        p = normpdf(theta,mu0,sigma0);
        logprior = sum(log(p));
    end
end
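To see what the log prior returns at specific points, the same rule can be evaluated without normpdf. The sketch below writes the N(0.5,1) log density by hand and applies the support constraints from priorDistribution. The names lognorm and inSupport and both test vectors are hypothetical and used only for illustration.

```matlab
% Log density of N(0.5,1), written out so that no toolbox function is needed
lognorm = @(x) -0.5*log(2*pi) - 0.5*(x - 0.5).^2;
% Support used by priorDistribution: |theta(1)|,|theta(2)| < 1; theta(3),theta(4) >= 0
inSupport = @(th) all(abs(th(1:2)) < 1) && all(th(3:4) >= 0);

thetaGood = [0.5; -0.75; 1; 0.5];   % inside the support
thetaBad  = [1.2; -0.75; 1; 0.5];   % theta(1) violates the stationarity constraint

logpGood = sum(lognorm(thetaGood)); % finite log prior density
if ~inSupport(thetaBad)
    logpBad = -Inf;                 % zero prior density outside the support
else
    logpBad = sum(lognorm(thetaBad));
end
```

Returning -Inf outside the support means that a sampler never accepts such a draw, because the corresponding posterior density is zero.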
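Suppress Tuning Displays and Specify Hessian Method
Consider the following time-varying, state-space model for a DGP:
• From periods 1 through 250, the state equation includes stationary AR(2) and MA(1) models, respectively, and the observation model is the weighted sum of the two states.
• From periods 251 through 500, the state model includes only the first AR(2) model.
• $\mu_0 = (0.5, 0.5, 0, 0)$ and $\Sigma_0$ is the identity matrix.
Symbolically, the DGP is

$$\begin{bmatrix} x_{1t} \\ x_{2t} \\ x_{3t} \\ x_{4t} \end{bmatrix} = \begin{bmatrix} \phi_1 & \phi_2 & 0 & 0 \\ 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & \theta \\ 0 & 0 & 0 & 0 \end{bmatrix} \begin{bmatrix} x_{1,t-1} \\ x_{2,t-1} \\ x_{3,t-1} \\ x_{4,t-1} \end{bmatrix} + \begin{bmatrix} \sigma_1 & 0 \\ 0 & 0 \\ 0 & 1 \\ 0 & 1 \end{bmatrix} \begin{bmatrix} u_{1t} \\ u_{2t} \end{bmatrix}, \quad y_t = c_1 (x_{1t} + x_{3t}) + \sigma_2 \varepsilon_t, \quad \text{for } t = 1, \ldots, 250,$$

$$\begin{bmatrix} x_{1t} \\ x_{2t} \end{bmatrix} = \begin{bmatrix} \phi_1 & \phi_2 & 0 & 0 \\ 1 & 0 & 0 & 0 \end{bmatrix} \begin{bmatrix} x_{1,t-1} \\ x_{2,t-1} \\ x_{3,t-1} \\ x_{4,t-1} \end{bmatrix} + \begin{bmatrix} \sigma_1 \\ 0 \end{bmatrix} u_{1t}, \quad y_t = c_2 x_{1t} + \sigma_3 \varepsilon_t, \quad \text{for } t = 251,$$

$$\begin{bmatrix} x_{1t} \\ x_{2t} \end{bmatrix} = \begin{bmatrix} \phi_1 & \phi_2 \\ 1 & 0 \end{bmatrix} \begin{bmatrix} x_{1,t-1} \\ x_{2,t-1} \end{bmatrix} + \begin{bmatrix} \sigma_1 \\ 0 \end{bmatrix} u_{1t}, \quad y_t = c_2 x_{1t} + \sigma_3 \varepsilon_t, \quad \text{for } t = 252, \ldots, 500,$$

where:
• The AR(2) parameters $(\phi_1, \phi_2) = (0.5, -0.2)$ and $\sigma_1 = 0.4$.
• The MA(1) parameter $\theta = 0.3$.
• The observation equation parameters $(c_1, c_2) = (2, 3)$ and $(\sigma_2, \sigma_3) = (0.1, 0.2)$.
Simulate a response path of length 500 from the model. The function timeVariantParamMapBayes.m, stored in matlabroot/examples/econ/main, specifies the state-space model structure.
params = [0.5; -0.2; 0.4; 0.3; 2; 0.1; 3; 0.2];
numObs = 500;
numParams = numel(params);
[A,B,C,D,mean0,Cov0,stateType] = timeVariantParamMapBayes(params,numObs);
DGP = ssm(A,B,C,D,Mean0=mean0,Cov0=Cov0,StateType=stateType);
rng(1) % For reproducibility
y = simulate(DGP,numObs);
plot(y)
ylabel("y")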
Create a time-varying, Bayesian state-space model that uses the structure of the DGP, but all parameters are unknown and the prior density is flat (the function flatPriorBSSM.m in matlabroot/examples/econ/main is the prior density). Create a bssm object representing the Bayesian state-space model.
Mdl = bssm(@(params)timeVariantParamMapBayes(params,numObs),@flatPriorBSSM);
Obtain optimized values for the proposal distribution moments by using tune. Initialize the parameter values to a random set of positive values in [0,0.1]. Suppress all tuning displays. Use the Hessian matrix returned by the optimizer of the posterior mode.
params0 = 0.1*rand(numParams,1);
options = optimoptions("fminunc",Display="off");
[params0,Proposal] = tune(Mdl,y,params0,Options=options,Display=false, ...
    Hessian="optimizer");
Draw a sample from the posterior distribution. Supply the optimized parameter estimates. Set the proposal distribution to multivariate t25 with a scale matrix proportional to Proposal. Set the proportionality constant to 0.005.
[PostParams,accept] = simulate(Mdl,y,params0,Proposal, ...
    Dof=25,Proportion=0.1);
accept
accept = 0.7950

PostParams is an 8-by-1000 matrix of 1000 random draws from the posterior distribution. The Metropolis-Hastings sampler accepted about 80% of the proposed draws.
Apply Constraints for Sampler Tuning
The log joint prior distribution function specifies parameter constraints by attributing a log density of -Inf to arguments outside the support of the distribution. Because posterior sampling does not occur during proposal distribution tuning, it is good practice to additionally specify constraints when you call tune.
Consider a regression of the US unemployment rate onto the real gross national product (rGNP) rate, and suppose the resulting innovations are an ARMA(1,1) process. The state-space form of the relationship is

$$\begin{bmatrix} x_{1,t} \\ x_{2,t} \end{bmatrix} = \begin{bmatrix} \phi & \theta \\ 0 & 0 \end{bmatrix} \begin{bmatrix} x_{1,t-1} \\ x_{2,t-1} \end{bmatrix} + \begin{bmatrix} \sigma \\ 1 \end{bmatrix} u_t$$

$$y_t - \beta Z_t = x_{1,t},$$

where:
• $x_{1,t}$ is the ARMA process.
• $x_{2,t}$ is a dummy state for the MA(1) effect.
• $y_t$ is the observed unemployment rate deflated by a constant and the rGNP rate ($Z_t$).
• $u_t$ is an iid Gaussian series with mean 0 and standard deviation 1.
Load the Nelson-Plosser data set, which contains a table DataTable that has the unemployment rate and rGNP series, among other series.
load Data_NelsonPlosser
Create a variable in DataTable that represents the returns of the raw rGNP series. Because price-to-returns conversion reduces the sample size by one, prepad the series with NaN.
DataTable.RGNPRate = [NaN; price2ret(DataTable.GNPR)];
T = height(DataTable);
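The prepadding step can be illustrated on a short, hypothetical price series. The sketch below computes continuously compounded returns with diff(log(prices)) in base MATLAB. Treating this as equivalent to the default behavior of price2ret is an assumption.

```matlab
prices = [100; 102; 101; 105];   % hypothetical price series with 4 observations

% Continuously compounded returns: one fewer observation than prices
returns = diff(log(prices));

% Prepad with NaN so that the rate series aligns row by row with the prices
rate = [NaN; returns];
```

The leading NaN marks the first period as missing, so the rate series can be stored in the same table as the original series.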
Create variables for the regression. Represent the unemployment rate as the observation series and the constant and rGNP rate series as the deflation data Zt.
Z = [ones(T,1) DataTable.RGNPRate];
y = DataTable.UR;
The functions armaDeflateYBayes.m and flatPriorDeflateY.m in matlabroot/examples/econ/main specify the state-space model structure and likelihood, and the flat prior distribution. Create a bssm object representing the Bayesian state-space model. Specify the parameter-to-matrix mapping function as a handle to a function solely of the parameters.
numParams = 5;
Mdl = bssm(@(params)armaDeflateYBayes(params,y,Z),@flatPriorDeflateY)
Mdl = 
  Mapping that defines a state-space model:
    @(params)armaDeflateYBayes(params,y,Z)

  Log density of parameter prior distribution:
    @flatPriorDeflateY

Tune the proposal distribution. Initialize the Kalman filter with a random set of positive values in [0,0.5]. Suppress the optimization displays. The log prior joint density function flatPriorDeflateY.m specifies that the AR coefficient theta(1) must be within the unit circle and that the observation error standard deviation theta(3) must be positive. Specify these constraints for proposal distribution optimization. Use the Hessian matrix returned by fmincon.
rng(1) % For reproducibility
params0 = 0.5*rand(numParams,1);
options = optimoptions("fmincon",Display="off"); % Constrained optimization requires FMINCON
lb = -Inf*ones(numParams,1); % Preallocation
ub = Inf*ones(numParams,1);  % Preallocation
lb([1; 3]) = [-1; 0];
ub(1) = 1;
[params0,Proposal] = tune(Mdl,y,params0,Options=options, ...
    Display=false,Hessian="optimizer",Lower=lb,Upper=ub);
Draw a sample from the posterior distribution. Supply the proposal moments returned by tune. Set the proportionality constant to 0.1. Set a burn-in period of 2000 draws, set a thinning factor of 50, and specify retaining 1000 draws. [PostParams,accept] = simulate(Mdl,y,params0,Proposal,Proportion=0.1, ... BurnIn=2000,NumDraws=1000,Thin=50); accept accept = 0.7506
PostParams is a 5-by-1000 matrix of 1000 draws from the posterior distribution. The MetropolisHastings sampler accepts 75% of the proposed draws. paramNames = ["\phi" "\theta" "\sigma" "\beta_0" "\beta_1"]; figure h = tiledlayout(numParams,1);
for j = 1:numParams
    nexttile
    plot(PostParams(j,:))
    hold on
    ylabel(paramNames(j))
end
title(h,"Posterior Trace Plots")
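To make the roles of the burn-in, thinning, and retained-draw settings concrete, the following is a minimal sketch of a random-walk Metropolis-Hastings sampler on a toy target. It uses only base MATLAB. The target logPost, the proposal scale, and all variable names are illustrative assumptions; this is not the algorithm that simulate implements internally, and the way the sketch counts burn-in and thinned draws is one common convention rather than a documented one.

```matlab
% Toy random-walk Metropolis-Hastings sampler (illustrative sketch only).
rng(1)                                   % For reproducibility
logPost = @(th) -0.5*sum(th.^2);         % Assumed toy target: standard normal log density (up to a constant)
numParams = 2;
burnIn = 200;                            % Draws discarded at the start
thin = 5;                                % Keep every 5th draw after burn-in
numDraws = 500;                          % Draws retained
proposalChol = chol(0.5*eye(numParams),'lower');   % Assumed proposal scale

theta = zeros(numParams,1);
draws = zeros(numParams,numDraws);
numAccepted = 0;
total = burnIn + thin*numDraws;
for k = 1:total
    candidate = theta + proposalChol*randn(numParams,1);
    % Accept with probability min(1, posterior ratio); the proposal is symmetric
    if log(rand) < logPost(candidate) - logPost(theta)
        theta = candidate;
        numAccepted = numAccepted + 1;
    end
    if k > burnIn && mod(k - burnIn,thin) == 0
        draws(:,(k - burnIn)/thin) = theta;
    end
end
acceptRate = numAccepted/total;
```

In this sketch, a larger proposal scale typically lowers acceptRate, which is the same trade-off that the Proportion argument controls in the example above.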
Input Arguments
PriorMdl — Prior Bayesian state-space model
bssm model object
Prior Bayesian state-space model, specified as a bssm model object returned by bssm or ssm2bssm. The function handles of the properties PriorMdl.ParamDistribution and PriorMdl.ParamMap determine the prior and the data likelihood, respectively.
Y — Observed response data
numeric matrix | cell vector of numeric vectors
Observed response data, from which tune forms the posterior distribution, specified as a numeric matrix or a cell vector of numeric vectors.
• If PriorMdl is time invariant with respect to the observation equation, Y is a T-by-n matrix. Each row of the matrix corresponds to a period and each column corresponds to a particular observation in the model. T is the sample size and n is the number of observations per period. The last row of Y contains the latest observations.
• If PriorMdl is time varying with respect to the observation equation, Y is a T-by-1 cell vector. Y{t} contains an nt-dimensional vector of observations for period t, where t = 1, ..., T. The corresponding dimensions of the coefficient matrices, outputs of PriorMdl.ParamMap, C{t}, and D{t} must be consistent with the matrix in Y{t} for all periods. The last cell of Y contains the latest observations.
NaN elements indicate missing observations. For details on how the Kalman filter accommodates missing observations, see “Algorithms” on page 12-620.
Data Types: double | cell
params0 — Initial values for parameters Θ
numeric vector
Initial parameter values for the parameters Θ, specified as a numParams-by-1 numeric vector. Elements of params0 must correspond to the elements of the first input arguments of PriorMdl.ParamMap and PriorMdl.ParamDistribution.
Data Types: double
Name-Value Arguments
Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter. Before R2021a, use commas to separate each name and value, and enclose Name in quotes.
Example: tune(Mdl,Y,params0,Hessian="opg",Display=false) uses the outer-product of gradients method to compute the Hessian matrix and suppresses the display of the optimized values.
Kalman Filter Options
Univariate — Univariate treatment of multivariate series flag
false (default) | true
Univariate treatment of a multivariate series flag, specified as a value in this table.
Value    Description
true     Applies the univariate treatment of a multivariate series, also known as sequential filtering
false    Does not apply sequential filtering
The univariate treatment can accelerate and improve numerical stability of the Kalman filter. However, all observation innovations must be uncorrelated. That is, DtDt' must be diagonal, where Dt (t = 1, ..., T) is the output coefficient matrix D of PriorMdl.ParamMap and PriorMdl.ParamDistribution.
Example: Univariate=true
Data Types: logical
SquareRoot — Square root filter method flag
false (default) | true
Square root filter method flag, specified as a value in this table.
Value    Description
true     Applies the square root filter method for the Kalman filter
false    Does not apply the square root filter method
If you suspect that the eigenvalues of the filtered state or forecasted observation covariance matrices are close to zero, then specify SquareRoot=true. The square root filter is robust to numerical issues arising from the finite precision of calculations, but requires more computational resources. Example: SquareRoot=true Data Types: logical Proposal Tuning Options
Hessian — Hessian approximation method
"difference" (default) | "diagonal" | "opg" | "optimizer" | character vector
Hessian approximation method for the Metropolis-Hastings proposal distribution scale matrix, specified as a value in this table.
Value           Description
"difference"    Finite differencing
"diagonal"      Diagonalized result of finite differencing
"opg"           Outer product of gradients, ignoring the prior distribution
"optimizer"     Posterior distribution optimized by fmincon or fminunc. Specify optimization options by using the Options name-value argument.
Tip The Hessian="difference" setting can be computationally intensive and inaccurate, and the resulting scale matrix can be nonnegative definite. Try one of the other options for better results.
Example: Hessian="opg"
Data Types: char | string
Lower — Parameter lower bounds
[] (default) | numeric vector
Parameter lower bounds when computing the Hessian matrix (see Hessian), specified as a numParams-by-1 numeric vector. Lower(j) specifies the lower bound of parameter theta(j), where theta is the first input argument of PriorMdl.ParamMap and PriorMdl.ParamDistribution. The default value [] specifies no lower bounds.
Note Lower does not apply to posterior simulation. To apply parameter constraints on the posterior, code them in the log prior distribution function PriorMdl.ParamDistribution by setting the log prior of values outside the distribution support to -Inf.
Example: Lower=[0 -5 -1e7]
Data Types: double
Upper — Parameter upper bounds
[] (default) | numeric vector
Parameter upper bounds when computing the Hessian matrix (see Hessian), specified as a numParams-by-1 numeric vector. Upper(j) specifies the upper bound of parameter theta(j), where theta is the first input argument of PriorMdl.ParamMap and PriorMdl.ParamDistribution. The default value [] specifies no upper bounds.
Note Upper does not apply to posterior simulation. To apply parameter constraints on the posterior, code them in the log prior distribution function PriorMdl.ParamDistribution by setting the log prior of values outside the distribution support to -Inf.
Example: Upper=[5 100 1e7]
Data Types: double
Options — Optimization options
optimoptions optimization controller
Optimization options for the setting Hessian="optimizer", specified as an optimoptions optimization controller. Options replaces default optimization options of the optimizer. For details on altering default values of the optimizer, see the optimization controller optimoptions, the constrained optimization function fmincon, or the unconstrained optimization function fminunc in Optimization Toolbox.
For example, to change the constraint tolerance to 1e-6, set options = optimoptions(@fmincon,ConstraintTolerance=1e-6,Algorithm="sqp"). Then, pass Options by using Options=options.
By default, tune uses the default options of the optimizer.
Simplex — Simplex search flag
true (default) | false
Simplex search flag to improve initial parameter values, specified as a value in this table.
Value    Description
true     Applies the simplex search method to improve initial parameter values for proposal optimization. For more details, see “fminsearch Algorithm”.
false    Does not apply the simplex search method.
tune applies simplex search when the numerical optimization exit flag is not positive.
Example: Simplex=false
Data Types: logical
Display — Proposal tuning results display flag
true (default) | false
Proposal tuning results display flag, specified as a value in this table.
Value    Description
true     Displays tuning results
false    Suppresses tuning results display
Example: Display=false Data Types: logical
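The notes for Lower and Upper say that constraints on the posterior belong in the log prior, which returns -Inf outside the support. The following is a minimal sketch of such a function for a flat prior. The name logPriorSketch and the specific constraints (first parameter inside the unit circle, third parameter positive) are assumptions chosen to mirror the earlier example; the shipped flatPriorDeflateY.m may be written differently.

```matlab
% Hypothetical flat log prior with support constraints (illustrative sketch).
% log(1) = 0 inside the support and log(0) = -Inf outside it.
logPriorSketch = @(theta) log(double(abs(theta(1)) < 1 && theta(3) > 0));

insideSupport  = logPriorSketch([0.5; 0.2; 1.0; 0; 0]);    % 0
outsideAR      = logPriorSketch([1.5; 0.2; 1.0; 0; 0]);    % -Inf, AR coefficient outside unit circle
outsideSigma   = logPriorSketch([0.5; 0.2; -1.0; 0; 0]);   % -Inf, negative standard deviation
```

A function handle of this form could be supplied wherever a log prior is expected, because a sampler rejects any proposed draw whose log prior is -Inf.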
Output Arguments
params — Optimized parameter values
numeric vector
Optimized parameter values for the Metropolis-Hastings sampler, returned as a numParams-by-1 numeric vector. params(j) contains the optimized value of parameter theta(j), where theta is the first input argument of PriorMdl.ParamMap and PriorMdl.ParamDistribution.
When you call simulate, pass params as the initial parameter values input params0.
Proposal — Proposal distribution covariance/scale matrix
numeric matrix
Proposal distribution covariance/scale matrix for the Metropolis-Hastings sampler, returned as a numParams-by-numParams numeric matrix. Rows and columns of Proposal correspond to elements in params.
Proposal is the scale matrix up to a proportionality constant, which is specified by the Proportion name-value argument of estimate and simulate. The proposal distribution is multivariate normal or Student's t.
When you call simulate, pass Proposal as the proposal distribution scale matrix input Proposal.
Data Types: double
Algorithms
• The Metropolis-Hastings sampler requires a carefully specified proposal distribution. Under the assumption of a Gaussian linear state-space model, tune tunes the sampler by performing numerical optimization to search for the posterior mode. A reasonable proposal for the multivariate normal or t distribution is the inverse of the negative Hessian matrix, which tune evaluates at the resulting posterior mode.
• When tune tunes the proposal distribution, the optimizer that tune uses to search for the posterior mode before computing the Hessian matrix depends on your specifications.
  • tune uses fminunc by default.
  • tune uses fmincon when you perform constrained optimization by specifying the Lower or Upper name-value argument.
  • tune further refines the solution returned by fminunc or fmincon by using fminsearch when Simplex=true, which is the default.
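The idea of using the inverse of the negative Hessian at the posterior mode as the proposal scale can be sketched in base MATLAB on a toy problem. Everything below is an assumption made for illustration: the Gaussian log posterior logPost, the use of fminsearch alone to find the mode, and the central-difference Hessian. It is not how tune is implemented.

```matlab
% Toy log posterior: Gaussian with known mean and covariance (assumed for illustration)
mu = [1; -2];
Sigma = [2 0.5; 0.5 1];
logPost = @(th) -0.5*(th - mu)'*(Sigma\(th - mu));

% Search for the posterior mode by minimizing the negative log posterior
opts = optimset('TolX',1e-10,'TolFun',1e-12,'MaxFunEvals',5000,'MaxIter',5000);
postMode = fminsearch(@(th) -logPost(th),[0; 0],opts);

% Central-difference approximation of the Hessian at the mode
n = numel(postMode);
h = 1e-4;
I = eye(n);
H = zeros(n);
for i = 1:n
    for j = 1:n
        ei = h*I(:,i);
        ej = h*I(:,j);
        H(i,j) = (logPost(postMode + ei + ej) - logPost(postMode + ei - ej) ...
                - logPost(postMode - ei + ej) + logPost(postMode - ei - ej))/(4*h^2);
    end
end

% Proposal scale matrix: inverse of the negative Hessian
Proposal = inv(-H);
```

For this Gaussian toy posterior, Proposal recovers Sigma up to numerical error, which is why the inverse negative Hessian is a natural scale for a normal or t proposal.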
Version History Introduced in R2022a
References
[1] Hastings, Wilfred K. "Monte Carlo Sampling Methods Using Markov Chains and Their Applications." Biometrika 57 (April 1970): 97–109. https://doi.org/10.1093/biomet/57.1.97.
[2] Metropolis, Nicholas, Arianna W. Rosenbluth, Marshall N. Rosenbluth, Augusta H. Teller, and Edward Teller. "Equation of State Calculations by Fast Computing Machines." The Journal of Chemical Physics 21 (June 1953): 1087–92. https://doi.org/10.1063/1.1699114.
See Also Objects bssm Functions ssm2bssm | estimate | simulate | fminunc | fmincon | fminsearch Topics “Analyze Linearized DSGE Models” on page 11-178 “Perform Outlier Detection Using Bayesian Non-Gaussian State-Space Models” on page 11-199
update Real-time state update by state-space model Kalman filtering
Syntax
[nextState,NextStateCov] = update(Mdl,Y)
[nextState,NextStateCov] = update(Mdl,Y,currentState,CurrentStateCov)
[nextState,NextStateCov] = update( ___ ,Name,Value)
[nextState,NextStateCov,logL] = update( ___ )
Description update efficiently updates the state distribution in real time by applying one recursion of the Kalman filter on page 11-7 to compute state-distribution moments for the final period of the specified response data. To compute state-distribution moments by recursive application of the Kalman filter for each period in the specified response data, use filter instead. [nextState,NextStateCov] = update(Mdl,Y) returns the updated state-distribution moments at the final time T, conditioned on the current state distribution, by applying one recursion of the Kalman filter on page 11-7 to the fully specified, standard state-space model on page 11-3 Mdl given T observed responses Y. nextState and NextStateCov are the mean and covariance, respectively, of the updated state distribution. [nextState,NextStateCov] = update(Mdl,Y,currentState,CurrentStateCov) initializes the Kalman filter at the current state distribution with mean currentState and covariance matrix CurrentStateCov. [nextState,NextStateCov] = update( ___ ,Name,Value) uses additional options specified by one or more name-value arguments, and uses any of the input-argument combinations in the previous syntaxes. For example, update(Mdl,Y,Params=params,SquareRoot=true) sets unknown parameters in the partially specified model Mdl to the values in params, and specifies use of the square-root Kalman filter variant for numerical stability. [nextState,NextStateCov,logL] = update( ___ ) also returns the loglikelihoods computed for each observation in Y.
Examples
Compute Only Final State Distribution from Kalman Filter
Suppose that a latent process is an AR(1). The state equation is
$$x_t = 0.5x_{t-1} + u_t,$$
where $u_t$ is Gaussian with mean 0 and standard deviation 1.
Generate a random series of 100 observations from $x_t$, assuming that the series starts at 1.5.
T = 100; ARMdl = arima(AR=0.5,Constant=0,Variance=1); x0 = 1.5; rng(1); % For reproducibility x = simulate(ARMdl,T,Y0=x0);
Suppose further that the latent process is subject to additive measurement error. The observation equation is yt = xt + εt, where εt is Gaussian with mean 0 and standard deviation 0.75. Together, the latent process and observation equations compose a state-space model. Use the random latent state process (x) and the observation equation to generate observations. y = x + 0.75*randn(T,1);
Specify the four coefficient matrices. A B C D
= = = =
0.5; 1; 1; 0.75;
Specify the state-space model using the coefficient matrices.
Mdl = ssm(A,B,C,D)
Mdl =
State-space model type: ssm

State vector length: 1
Observation vector length: 1
State disturbance vector length: 1
Observation innovation vector length: 1
Sample size supported by model: Unlimited

State variables: x1, x2,...
State disturbances: u1, u2,...
Observation series: y1, y2,...
Observation innovations: e1, e2,...

State equation:
x1(t) = (0.50)x1(t-1) + u1(t)

Observation equation:
y1(t) = x1(t) + (0.75)e1(t)

Initial state distribution:

Initial state means
 x1
  0

Initial state covariance matrix
     x1
 x1  1.33

State types
     x1
 Stationary
Mdl is an ssm model. Verify that the model is correctly specified using the display in the Command Window. The software infers that the state process is stationary. Subsequently, the software sets the initial state mean and covariance to the mean and variance of the stationary distribution of an AR(1) model. Filter the observations through the state-space model, in real time, to obtain the state distribution for period 100. [rtfX100,rtfXVar100] = update(Mdl,y) rtfX100 = 1.2073 rtfXVar100 = 0.3714
update applies the Kalman filter to all observations in y, and returns the state estimate of only period 100. Compare the result to the results of filter.
[fX,~,output] = filter(Mdl,y);
size(fX)
ans = 1×2
   100     1
fX100 = fX(100)
fX100 = 1.2073
fXVar100 = output(end).FilteredStatesCov
fXVar100 = 0.3714
tol = 1e-10;
discrepancyMeans = fX100 - rtfX100;
discrepancyVars = fXVar100 - rtfXVar100;
areMeansEqual = norm(discrepancyMeans) < tol
areMeansEqual = logical
   1
areVarsEqual = norm(discrepancyVars) < tol
areVarsEqual = logical
   1
Like update, the filter function filters the observations through the model, but it returns all intermediate state estimates. Because update returns only the final state estimate, it is more suited to real-time calculations than filter.
Filter States in Real Time
Consider the simulated data and state-space model in “Compute Only Final State Distribution from Kalman Filter” on page 12-2341.
T = 100;
ARMdl = arima(AR=0.5,Constant=0,Variance=1);
x0 = 1.5;
rng(1); % For reproducibility
x = simulate(ARMdl,T,Y0=x0);
y = x + 0.75*randn(T,1);
A = 0.5;
B = 1;
C = 1;
D = 0.75;
Mdl = ssm(A,B,C,D);
Suppose observations are available sequentially, and consider obtaining the updated state distribution by filtering each new observation as it is available. Simulate the following procedure using a loop.
1. Create variables that store the initial state distribution moments.
2. Filter the incoming observation through the model specifying the current initial state distribution moments.
3. Overwrite the current state distribution moments with the new state distribution moments.
4. Repeat steps 2 and 3 as new observations are available.
currentState = Mdl.Mean0;
currentStateCov = Mdl.Cov0;
newState = zeros(T,1);
newStateCov = zeros(T,1);
for j = 1:T
    [newState(j),newStateCov(j)] = update(Mdl,y(j),currentState,currentStateCov);
    currentState = newState(j);
    currentStateCov = newStateCov(j);
end
Plot the observations, true state values, and new state means of each period. figure plot(1:T,x,'-k',1:T,y,'*g',1:T,newState,':r','LineWidth',2) xlabel("Period") legend(["True state values" "Observations" "New state values"])
Compare the results to the results of filter.
tol = 1e-10;
[fX,~,output] = filter(Mdl,y);
discrepancyMeans = fX - newState;
discrepancyVars = [output.FilteredStatesCov]' - newStateCov;
areMeansEqual = norm(discrepancyMeans) < tol
areMeansEqual = logical
   1
areVarsEqual = norm(discrepancyVars) < tol
areVarsEqual = logical
   1
The real-time filter update, applied to the entire data set sequentially, returns the same state distributions as filter.
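To show what a single filtering recursion involves, the following sketch writes out the textbook Kalman predict-and-correct step for the scalar model of this example, using only base MATLAB. The variable names, the manually simulated data, and the treatment of the initial moments as the stationary distribution are assumptions for illustration; the sketch does not claim to reproduce the internals of update, and because its random data differ from the example data, only the state variance is comparable.

```matlab
% Scalar model: x(t) = A*x(t-1) + B*u(t),  y(t) = C*x(t) + D*e(t)
A = 0.5; B = 1; C = 1; D = 0.75;
T = 100;
rng(1)                                   % For reproducibility
x = zeros(T,1);
xPrev = 1.5;                             % Starting value of the latent series
for t = 1:T
    x(t) = A*xPrev + B*randn;
    xPrev = x(t);
end
y = C*x + D*randn(T,1);

stateMean = 0;                           % Stationary mean of the AR(1) state
stateCov = B^2/(1 - A^2);                % Stationary variance, about 1.33
for t = 1:T
    % Predict the state one period ahead
    predMean = A*stateMean;
    predCov = A*stateCov*A' + B*B';
    % Correct the prediction with the new observation
    innovation = y(t) - C*predMean;
    innovationVar = C*predCov*C' + D*D';
    gain = predCov*C'/innovationVar;
    stateMean = predMean + gain*innovation;
    stateCov = predCov - gain*C*predCov;
end
```

In a linear Gaussian model the state variance recursion does not depend on the observed values, so stateCov settles near 0.3714, the same value displayed for the period-100 state variance in the first example.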
Nowcast State-Space Model Containing Regression Component
Consider that the linear relationship between the change in the unemployment rate and the nominal gross national product (nGNP) growth rate is of interest. Suppose the innovations of a mismeasured regression of the first difference of the unemployment rate onto the nGNP growth rate is an ARMA(1,1) series with Gaussian disturbances (that is, a regression model with ARMA(1,1) errors and measurement error). Symbolically, and in state-space form, the model is
$$\begin{bmatrix} x_{1,t} \\ x_{2,t} \end{bmatrix} = \begin{bmatrix} \phi & \theta \\ 0 & 0 \end{bmatrix} \begin{bmatrix} x_{1,t-1} \\ x_{2,t-1} \end{bmatrix} + \begin{bmatrix} 1 \\ 1 \end{bmatrix} u_t$$
$$y_t - \beta Z_t = x_{1,t} + \sigma\varepsilon_t,$$
where:
• $x_{1,t}$ is the ARMA error series in the regression model.
• $x_{2,t}$ is a dummy state for the MA(1) effect.
• $y_t$ is the observed change in the unemployment rate being deflated by the growth rate of nGNP ($Z_t$).
• $u_t$ is a Gaussian series of disturbances having mean 0 and standard deviation 1.
• $\varepsilon_t$ is a Gaussian series of measurement errors with scale $\sigma$.
Load the Nelson-Plosser data set, which contains the unemployment rate and nGNP series, among other measurements.
load Data_NelsonPlosser
Preprocess the data by following this procedure:
1. Remove the leading missing observations.
2. Convert the nGNP series to a return series by using price2ret.
3. Apply the first difference to the unemployment rate series.
vars = ["GNPN" "UR"];
DT = rmmissing(DataTable(:,vars));
T = size(DT,1) - 1; % Sample size after differencing
Z = [ones(T,1) price2ret(DT.GNPN)];
y = diff(DT.UR);
Though this example removes missing values, the Kalman filter accommodates series containing missing values.
Specify the coefficient matrices.
A = [NaN NaN; 0 0];
B = [1; 1];
C = [1 0];
D = NaN;
Specify the state-space model using ssm. Mdl = ssm(A,B,C,D);
Fit the model to all observations except for the final 10 observations (a holdout sample). Use a random set of initial parameter values for optimization. Specify the regression component and its initial value for optimization using the 'Predictors' and 'Beta0' name-value arguments, respectively. Restrict the estimate of σ to all positive, real numbers.
fh = 10;
params0 = [0.3 0.2 0.2];
[EstMdl,estParams] = estimate(Mdl,y(1:T-fh),params0,'Predictors',Z(1:T-fh,:), ...
    'Beta0',[0.1 0.2],'lb',[-Inf,-Inf,0,-Inf,-Inf]);
Method: Maximum likelihood (fmincon)
Sample size: 51
Logarithmic likelihood: -87.2409
Akaike info criterion: 184.482
Bayesian info criterion: 194.141
      |   Coeff     Std Err    t Stat     Prob
------------------------------------------------
 c(1) | -0.31780    0.37357   -0.85071   0.39494
 c(2) |  1.21242    0.82223    1.47455   0.14034
 c(3) |  0.45583    1.32970    0.34281   0.73174
y 1).
• VEC{j} must contain Bj, the coefficient matrix of the lag term Δyt–j.
• vec2var assumes that the coefficient of Δyt (B0) is the n-by-n identity.
• For a LagOp lag operator polynomial specification:
  • VEC.Degree must be q.
  • VEC.Coefficients{0} is B0, the coefficient of Δyt. All other elements correspond to the coefficients of the subsequent lagged, differenced terms. For example, VEC.Coefficients{j} is the coefficient matrix of Δyt–j. VEC.Lags stores all nonzero lags.
  • To construct a model in reduced form, set VEC.Coefficients{0} to eye(VEC.Dimension).
For example, consider converting
$$\begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} \Delta y_t = \begin{bmatrix} 0.1 & 0.2 \\ 1 & 0.1 \end{bmatrix} \Delta y_{t-1} + \begin{bmatrix} -0.1 & 0.01 \\ 0.2 & -0.3 \end{bmatrix} \Delta y_{t-2} + \begin{bmatrix} 0.5 & 0 \\ -0.1 & 1 \end{bmatrix} y_{t-1} + \varepsilon_t$$
to a VAR(3) model. The model is in difference-equation notation on page 12-2394. You can convert the model by entering
VAR = vec2var({[0.1 0.2; 1 0.1],...
    [-0.1 0.01; 0.2 -0.3]},[0.5 0; -0.1 1]);
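The VEC(2) model in lag operator notation on page 12-2395 is
$$\left( \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} - \begin{bmatrix} 0.1 & 0.2 \\ 1 & 0.1 \end{bmatrix} L - \begin{bmatrix} -0.1 & 0.01 \\ 0.2 & -0.3 \end{bmatrix} L^2 \right) \Delta y_t = \begin{bmatrix} 0.5 & 0 \\ -0.1 & 1 \end{bmatrix} y_{t-1} + \varepsilon_t.$$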
The AR coefficient matrices of the lagged responses appear negated compared to the corresponding coefficients in difference-equation notation. To obtain the same result using LagOp lag operator polynomials, enter
VEC = LagOp({eye(2), -[0.1 0.2; 1 0.1], -[-0.1 0.01; 0.2 -0.3]});
C = [0.5 0; -0.1 1];
VAR = vec2var(VEC,C);
C — Error-correction coefficient numeric matrix Error-correction coefficient, specified as an n-by-n numeric matrix. n is the number of time series in the VEC model. The dimensions of C and the matrices composing VEC must be equivalent. Data Types: double
Output Arguments
VAR — VAR(p) model coefficients
cell vector of square, numeric matrices | LagOp lag operator polynomial object | numeric vector
VAR(p) model coefficients, returned as a numeric vector, cell vector of n-by-n numeric matrices, or a LagOp lag operator polynomial object. n is the number of time series in the VEC model.
VEC and VAR share the same data type and orientation.
vec2var converts VEC(q) models to VAR(q + 1) models. That is:
• If VEC is a cell or numeric vector, then numel(VAR) is numel(VEC) + 1.
• If VEC is a LagOp lag operator polynomial, then VAR.Degree is VEC.Degree + 1.
More About
Difference-Equation Notation
A VAR(p) or VEC(q) model written in difference-equation notation isolates the present value of the response vector and its structural coefficient matrix on the left side of the equation. The right side of the equation contains the sum of the lagged responses, their coefficient matrices, the present innovation vector, and, for VEC models, the error-correction term.
That is, a VAR(p) model written in difference-equation notation is
$$A_0 y_t = a + A_1 y_{t-1} + A_2 y_{t-2} + \dots + A_p y_{t-p} + \varepsilon_t.$$
A VEC(q) model written in difference-equation notation is
$$B_0 \Delta y_t = b + B_1 \Delta y_{t-1} + B_2 \Delta y_{t-2} + \dots + B_q \Delta y_{t-q} + C y_{t-1} + \varepsilon_t.$$
For the variable and parameter definitions, see “VAR(p) Model” on page 12-2364 and “VEC(q) Model” on page 12-2364.
Lag Operator Notation
A VAR(p) or VEC(q) model written in lag-operator notation positions all response terms to the left side of the equation. The right side of the equation contains the model constant offset vector, the present innovation, and, for VEC models, the error-correction term.
That is, a VAR(p) model written in lag-operator notation is
$$A(L) y_t = a + \varepsilon_t,$$
where $A(L) = A_0 - A_1 L - A_2 L^2 - \dots - A_p L^p$ and $L^j y_t = y_{t-j}$.
A VEC(q) model written in lag-operator notation is
$$B(L) \Delta y_t = b + C y_{t-1} + \varepsilon_t,$$
where $B(L) = B_0 - B_1 L - B_2 L^2 - \dots - B_q L^q$.
For the variable and parameter definitions, see “VAR(p) Model” on page 12-2364 and “VEC(q) Model” on page 12-2364.
When comparing lag operator notation to difference-equation notation on page 12-2363, the signs of the lag terms are opposites. For more details, see “Lag Operator Notation” on page 1-21.
VAR(p) Model
A VAR(p) model is a multivariate, autoregressive time series model that has this general form:
$$A_0 y_t = a + A_1 y_{t-1} + A_2 y_{t-2} + \dots + A_p y_{t-p} + \varepsilon_t.$$
• $y_t$ is an n-dimensional time series.
• $A_0$ is the n-by-n invertible structural coefficient matrix. For models in reduced form, $A_0 = I_n$, which is the n-dimensional identity matrix.
• $a$ is an n-dimensional vector of constant offsets.
• $A_j$ is the n-by-n coefficient matrix of $y_{t-j}$, j = 1,...,p.
• $\varepsilon_t$ is an n-dimensional innovations series. The innovations are serially uncorrelated, and have a multivariate normal distribution with mean 0 and n-by-n covariance matrix Σ.
VEC(q) Model
A VEC(q) model is a multivariate, autoregressive time series model that has this general form:
$$B_0 \Delta y_t = b + B_1 \Delta y_{t-1} + B_2 \Delta y_{t-2} + \dots + B_q \Delta y_{t-q} + C y_{t-1} + \varepsilon_t.$$
• $y_t$ is an n-dimensional time series.
• Δ is the first difference operator, that is, $\Delta y_t = y_t - y_{t-1}$.
• $B_0$ is the n-by-n invertible structural coefficient matrix. For models in reduced form, $B_0 = I_n$, which is the n-dimensional identity matrix.
• $b$ is an n-dimensional vector of constant offsets.
• $B_j$ is the n-by-n coefficient matrix of $\Delta y_{t-j}$, j = 1,...,q.
• $\varepsilon_t$ is an n-dimensional innovations series. The innovations are serially uncorrelated, and have a multivariate normal distribution with mean 0 and n-by-n covariance matrix Σ.
• $C$ is the n-by-n error-correction or impact coefficient matrix.
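The relationship between the two model forms can be worked out by substituting Δyt = yt – yt–1 into the reduced-form VEC(2) equation, which gives A1 = I + C + B1, A2 = B2 – B1, and A3 = –B2. The following base-MATLAB sketch applies that algebra to the coefficient values of the earlier conversion example and checks it by running both recursions on the same innovations. The variable names are illustrative assumptions, and the sketch does not call vec2var, so it makes no claim about the format or sign convention of that function's output.

```matlab
% Reduced-form VEC(2) coefficients in difference-equation notation
B1 = [0.1 0.2; 1 0.1];
B2 = [-0.1 0.01; 0.2 -0.3];
C  = [0.5 0; -0.1 1];
n  = size(C,1);

% Equivalent VAR(3) coefficients in difference-equation notation
A1 = eye(n) + C + B1;
A2 = B2 - B1;
A3 = -B2;

% Check: both recursions generate the same path from the same innovations
rng(1)                                   % For reproducibility
T = 10;
e = randn(n,T);
yVEC = zeros(n,T + 3);
yVEC(:,1:3) = randn(n,3);                % Presample values
yVAR = yVEC;
for t = 4:T + 3
    dy1 = yVEC(:,t-1) - yVEC(:,t-2);
    dy2 = yVEC(:,t-2) - yVEC(:,t-3);
    yVEC(:,t) = yVEC(:,t-1) + B1*dy1 + B2*dy2 + C*yVEC(:,t-1) + e(:,t-3);
    yVAR(:,t) = A1*yVAR(:,t-1) + A2*yVAR(:,t-2) + A3*yVAR(:,t-3) + e(:,t-3);
end
relDiff = max(abs(yVEC(:) - yVAR(:)))/max(abs(yVEC(:)));
```

The relative difference relDiff is at the level of floating-point rounding, which confirms that a VEC(q) model corresponds to a VAR(q + 1) model.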
Tips
• To accommodate structural VEC models, specify the input argument VEC as a LagOp lag operator polynomial.
• To access the cell vector of the lag operator polynomial coefficients of the output argument VAR, enter toCellArray(VAR).
• To convert the model coefficients of the output argument from lag operator notation on page 12-2395 to the model coefficients in difference-equation notation on page 12-2394, enter
VARDEN = toCellArray(reflect(VAR));
VARDEN is a cell vector containing q + 1 coefficients corresponding to the response terms in VAR.Lags in difference-equation notation. The first element is the coefficient of yt, the second element is the coefficient of yt–1, and so on.
• The constant offset of the converted VAR model is the same as the constant offset of the VEC model.
Algorithms
• vec2var does not impose stability requirements on the coefficients. To check for stability, use isStable.
isStable requires a LagOp lag operator polynomial as input. For example, to check whether VAR, the cell array of n-by-n numeric matrices, composes a stable time series, enter
varLagOp = LagOp([eye(n) VAR]);
isStable(varLagOp)
A 0 indicates that the polynomial is not stable. If VAR is a LagOp lag operator polynomial, then pass it to isStable.
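For intuition about what a stability check tests, the following base-MATLAB sketch builds the companion matrix of a reduced-form VAR(p) model from its difference-equation coefficients and verifies that every eigenvalue lies strictly inside the unit circle. The coefficient values and variable names are made up for illustration, and this is a standard textbook criterion rather than a description of how isStable is implemented.

```matlab
% Hypothetical VAR(2) coefficients A1 and A2 in difference-equation notation
A = {[0.5 0.1; 0 0.3], [0.2 0; 0 0.1]};
n = size(A{1},1);                        % Number of series
p = numel(A);                            % VAR order

% Companion matrix: first block row holds [A1 ... Ap], the rest shifts the lags
companion = [cell2mat(A); eye(n*(p-1)) zeros(n*(p-1),n)];
isStableVAR = all(abs(eig(companion)) < 1);

% A random walk in each series has unit roots, so it fails the check
unitRoot = {eye(2)};
companionUR = [cell2mat(unitRoot); eye(0) zeros(0,2)];
isStableUR = all(abs(eig(companionUR)) < 1);
```

A VAR model obtained from a VEC model with cointegrated series contains unit roots by construction, so a check of this kind is expected to report that such a model is not stable.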
Version History Introduced in R2015b
References
[1] Hamilton, J. D. Time Series Analysis. Princeton, NJ: Princeton University Press, 1994.
[2] Lütkepohl, H. New Introduction to Multiple Time Series Analysis. Springer-Verlag, 2007.
See Also Objects LagOp Functions arma2ar | arma2ma | isStable | estimate | toCellArray | var2vec | armairf
Topics “Generate VEC Model Impulse Responses” on page 9-133 “Lag Operator Notation” on page 1-21 “Vector Autoregression (VAR) Models” on page 9-3 “Cointegration and Error Correction Analysis” on page 9-108
vecm Create vector error-correction (VEC) model
Description
The vecm function returns a vecm object specifying the functional form and storing the parameter values of a (p – 1)-order, cointegrated, multivariate vector error-correction model on page 12-2414 (VEC(p – 1) model).
The key components of a vecm object include the number of time series (response-variable dimensionality), the number of cointegrating relations among the response variables (cointegrating rank), and the degree of the multivariate autoregressive polynomial composed of first differences of the response series (short-run polynomial), which is p – 1. That is, p – 1 is the maximum lag with a nonzero coefficient matrix, and p is the order of the vector autoregression (VAR) model representation of the VEC model. Other model components include a regression component to associate the same exogenous predictor variables to each response series, and constant and time trend terms.
Another important component of a VEC model is its Johansen form because it dictates how MATLAB includes deterministic terms in the model. This specification has implications on the estimation procedure and allowable equality constraints. For more details, see “Johansen Form” on page 12-2415 and [2].
Given the response-variable dimensionality, cointegrating rank, and short-run polynomial degree, all coefficient matrices and innovation-distribution parameters are unknown and estimable unless you specify their values by using name-value pair argument syntax. To choose the Johansen form that is suitable for your data and then estimate a model containing all or partially unknown parameter values given the data, use estimate. To work with an estimated or fully specified vecm model object, pass it to an object function on page 12-2404.
Alternatively, you can create and work with vecm model objects interactively by using Econometric Modeler.
Creation
Syntax
Mdl = vecm(numseries,rank,numlags)
Mdl = vecm(Name,Value)
Description
Mdl = vecm(numseries,rank,numlags) creates a VEC(numlags) model composed of numseries time series containing rank cointegrating relations. The maximum nonzero lag in the short-run polynomial is numlags. All lags and the error-correction term have numseries-by-numseries coefficient matrices composed of NaN values.
This shorthand syntax allows for easy model template creation in which you specify the model dimensions explicitly. The model template is suited for unrestricted parameter estimation, that is, estimation without any parameter equality constraints. After you create a model, you can alter property values using dot notation.
Mdl = vecm(Name,Value) sets properties on page 12-2400 or additional options using name-value pair arguments. Enclose each name in quotes. For example, 'Lags',[1 4],'ShortRun',ShortRun specifies the two short-run coefficient matrices in ShortRun at lags 1 and 4.
This longhand syntax allows for creating more flexible models. However, vecm must be able to infer the number of series (NumSeries) and cointegrating rank (Rank) from the specified name-value pair arguments. Name-value pair arguments and property values that correspond to the number of time series and cointegrating rank must be consistent with each other.
Input Arguments
The shorthand syntax provides an easy way for you to create model templates that are suitable for unrestricted parameter estimation. For example, to create a VEC(2) model composed of three response series containing one cointegrating relation and unknown parameter values, enter:
Mdl = vecm(3,1,2);
To impose equality constraints on parameter values during estimation, set the appropriate property on page 12-2400 values using dot notation.
numseries — Number of time series m
1 (default) | positive integer
Number of time series m, specified as a positive integer. numseries specifies the dimensionality of the multivariate response variable yt and innovation εt.
numseries sets the NumSeries property.
Data Types: double
rank — Number of cointegrating relations
nonnegative integer
Number of cointegrating relations, specified as a nonnegative integer. The adjustment and cointegration matrices in the model have rank linearly independent columns, and are numseries-by-rank matrices composed of NaN values.
Data Types: double
numlags — Number of first differences of responses
nonnegative integer
Number of first differences of responses to include in the short-run polynomial of the VEC(p – 1) model, specified as a nonnegative integer. That is, numlags = p – 1. Consequently, numlags specifies the number of short-run terms associated with the corresponding VAR(p) model.
All lags have numseries-by-numseries short-run coefficient matrices composed of NaN values.
Data Types: double
Name-Value Arguments
Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter. Before R2021a, use commas to separate each name and value, and enclose Name in quotes. The longhand syntax enables you to create models in which some or all coefficients are known. During estimation, estimate imposes equality constraints on any known parameters. Specify enough information for vecm to infer the number of response series and the cointegrating rank. Example: 'Adjustment',nan(3,2),'Lags',[4 8] specifies a three-dimensional VEC(8) model with two cointegrating relations and nonzero short-run coefficient matrices at lags 4 and 8. Lags — Short-run polynomial lags 1:(P-1) (default) | numeric vector of unique positive integers Short-run polynomial lags, specified as the comma-separated pair consisting of 'Lags' and a numeric vector containing at most P – 1 elements of unique positive integers. The matrix ShortRun{j} is the coefficient of lag Lags(j). Example: 'Lags',[1 4] Data Types: double
Properties

You can set writable property values when you create the model object by using name-value pair argument syntax, or after you create the model object by using dot notation. For example, to create a VEC(1) model in the H1 Johansen form suitable for simulation, and composed of two response series of cointegrating rank one and no overall time trend term, enter:

    Mdl = vecm('Constant',[0; 0.01],'Adjustment',[-0.1; 0.15],...
        'Cointegration',[1; -4],'ShortRun',{[0.3 -0.15 ; 0.1 -0.3]},...
        'Covariance',eye(2));
    Mdl.Trend = 0;
NumSeries — Number of time series m
positive integer

This property is read-only.

Number of time series m, specified as a positive integer. NumSeries specifies the dimensionality of the multivariate response variable yt and innovation εt.

Data Types: double

Rank — Number of cointegrating relations
nonnegative integer

This property is read-only.

Number of cointegrating relations, specified as a nonnegative integer. The adjustment and cointegration matrices in the model have Rank linearly independent columns and are NumSeries-by-Rank matrices.

Data Types: double

P — Corresponding VAR model order
nonnegative integer

This property is read-only.

Corresponding VAR model order, specified as a nonnegative integer. P – 1 is the maximum lag in the short-run polynomial that has a nonzero coefficient matrix. Lags in the short-run polynomial that have degree less than P – 1 can have coefficient matrices composed entirely of zeros.

P specifies the number of presample observations required to initialize the model.

Data Types: double

Description — Model description
string scalar | character vector

Model description, specified as a string scalar or character vector. vecm stores the value as a string scalar. The default value describes the parametric form of the model, for example "2-Dimensional Rank = 1 VEC(1) Model".

Example: 'Description','Model 1'

Data Types: string | char

SeriesNames — Response series names
string vector | cell array of character vectors

Response series names, specified as a NumSeries length string vector. The default is ['Y1' 'Y2' ... 'YNumSeries'].

Example: 'SeriesNames',{'CPI' 'Unemployment'}

Data Types: string

Constant — Overall model constant
NaN(NumSeries,1) (default) | numeric vector

Overall model constant (c), specified as a NumSeries-by-1 numeric vector. The value of Constant, and whether estimate supports equality constraints on it during estimation, depend on the Johansen form on page 12-2415 of the VEC model on page 12-2414.

Example: 'Constant',[1; 2]

Data Types: double

Trend — Overall linear time trend
NaN(NumSeries,1) (default) | numeric vector

Overall linear time trend (d), specified as a NumSeries-by-1 numeric vector. The value of Trend, and whether estimate supports equality constraints on it during estimation, depend on the Johansen form on page 12-2415 of the VEC model on page 12-2414.

Example: 'Trend',[0.1; 0.2]

Data Types: double
Adjustment — Cointegration adjustment speeds
NaN(NumSeries,Rank) (default) | numeric matrix

Cointegration adjustment speeds (A), specified as a NumSeries-by-Rank numeric matrix.

If you specify a matrix of known values, then all columns must be linearly independent (that is, Adjustment must be a matrix of full column rank).

For estimation, you can impose equality constraints on the cointegration adjustment speeds by supplying a matrix composed entirely of numeric values or a mixture of numeric and missing (NaN) values.

If Rank = 0, then Adjustment is an empty NumSeries-by-0 vector.

For more details on specifying Adjustment, see “Algorithms” on page 12-2417.

Example: 'Adjustment',NaN(2,1)

Data Types: double

Cointegration — Cointegration matrix
NaN(NumSeries,Rank) (default) | numeric matrix

Cointegration matrix (B), specified as a NumSeries-by-Rank numeric matrix.

If you specify a matrix of known values, then all columns must be linearly independent (that is, Cointegration must be a matrix of full column rank).

Cointegration cannot contain a mixture of missing (NaN) values and numeric values. Supported equality constraints on the cointegration matrix during estimation depend on the Johansen form on page 12-2415 of the VEC model on page 12-2414.

If Rank = 0, then Cointegration is an empty NumSeries-by-0 vector.

For more details on specifying Cointegration, see “Algorithms” on page 12-2417.

Example: 'Cointegration',NaN(2,1)

Data Types: double

Impact — Impact matrix
numeric matrix

Impact, or long-run level, matrix (Π), specified as a NumSeries-by-NumSeries numeric matrix. The rank of Impact must be Rank.

For estimation of full-rank models (Rank = NumSeries), you can impose equality constraints on the impact matrix by supplying a matrix containing a mixture of numeric and missing values (NaN).

If 1 ≤ Rank ≤ NumSeries – 1, then the default value is Adjustment*Cointegration'.

If Rank = 0, then Impact is a matrix of zeros. Consequently, the model does not have an error-correction term.

For more details on specifying Impact, see “Algorithms” on page 12-2417.

Example: 'Impact',[0.5 0.25 0; 0.3 0.15 0; 0 0 0.9]

Data Types: double

CointegrationConstant — Constant in cointegrating relations
NaN(Rank,1) (default) | numeric vector

Constant (intercept) in the cointegrating relations (c0), specified as a Rank-by-1 numeric vector. You can set CointegrationConstant only by using dot notation after you create the model.

CointegrationConstant cannot contain a mixture of missing (NaN) values and numeric values. Supported equality constraints on the cointegration constant vector during estimation depend on the Johansen form on page 12-2415 of the VEC model on page 12-2414.

If Rank = 0, then CointegrationConstant is a 0-by-1 vector of zeros.

Example: Mdl.CointegrationConstant = [1; 0]

Data Types: double

CointegrationTrend — Time trend in cointegrating relations
NaN(Rank,1) (default) | numeric vector

Time trend in the cointegrating relations (d0), specified as a Rank-by-1 numeric vector. You can set CointegrationTrend only by using dot notation after you create the model.

CointegrationTrend cannot contain a mixture of missing (NaN) values and numeric values. Supported equality constraints on the cointegration linear trend vector during estimation depend on the Johansen form on page 12-2415 of the VEC model on page 12-2414.

If Rank = 0, then CointegrationTrend is a 0-by-1 vector of zeros.

Example: Mdl.CointegrationTrend = [0; 0.5]

Data Types: double

ShortRun — Short-run coefficient matrices
cell vector of numeric matrices

Short-run coefficient matrices associated with the lagged response differences, specified as a cell vector of NumSeries-by-NumSeries numeric matrices.

Specify coefficient signs corresponding to those coefficients in the VEC model on page 12-2414 expressed in difference-equation notation. The property P is numel(ShortRun) + 1.

• If you set the 'Lags' name-value pair argument to Lags, the following conditions apply.
  • The lengths of ShortRun and Lags must be equal.
  • ShortRun{j} is the coefficient matrix of lag Lags(j).
  • By default, ShortRun is a numel(Lags)-by-1 cell vector of matrices composed of NaN values.
• Otherwise, the following conditions apply.
  • ShortRun{j} is the coefficient matrix of lag j.
  • By default, ShortRun is a (P – 1)-by-1 cell vector of matrices composed of NaN values.

MATLAB assumes that the coefficient of the current, differenced response (Δyt) is the identity matrix. Therefore, exclude this coefficient from ShortRun.
Example: 'ShortRun',{[0.5 -0.1; 0.1 0.2]}

Data Types: cell

Beta — Regression coefficient matrix
NumSeries-by-0 empty matrix (default) | numeric matrix

Regression coefficient matrix associated with the predictor variables, specified as a NumSeries-by-NumPreds numeric matrix. NumPreds is the number of predictor variables, that is, the number of columns in the predictor data.

Beta(j,:) contains the regression coefficients for each predictor in the equation of response yj,t. Beta(:,k) contains the regression coefficient in each response equation for predictor xk. By default, all predictor variables are in the regression component of all response equations. You can exclude certain predictors from certain equations by specifying equality constraints to 0.

Example: In a model that includes 3 responses and 4 predictor variables, to exclude the second predictor from the third equation and leave the others unrestricted, specify [NaN NaN NaN NaN; NaN NaN NaN NaN; NaN 0 NaN NaN].

The default value specifies no regression coefficient in the model. However, if you specify predictor data when you estimate the model using estimate, then MATLAB sets Beta to an appropriately sized matrix of NaN values.

Example: 'Beta',[2 3 -1 2; 0.5 -1 -6 0.1]

Data Types: double

Covariance — Innovations covariance matrix
NaN(NumSeries) (default) | numeric, positive definite matrix

Innovations covariance matrix of the NumSeries innovations at each time t = 1,...,T, specified as a NumSeries-by-NumSeries numeric, positive definite matrix.

Example: 'Covariance',eye(2)

Data Types: double

Note NaN-valued elements in properties indicate unknown, estimable parameters. Specified elements indicate equality constraints on parameters in model estimation. The innovations covariance matrix Covariance cannot contain a mix of NaN values and real numbers; you must fully specify the covariance or it must be completely unknown (NaN(NumSeries)).
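The exclusion constraint described for Beta can be built programmatically instead of typing the matrix by hand. This is a small sketch in base MATLAB. The variable names numSeries, numPreds, and isConstrained are illustrative and are not part of the vecm interface.

```matlab
numSeries = 3;                    % number of response equations
numPreds  = 4;                    % number of predictor variables

Beta = NaN(numSeries,numPreds);   % NaN = unknown, estimable coefficient
Beta(3,2) = 0;                    % exclude predictor 2 from equation 3

isConstrained = ~isnan(Beta);     % logical mask of equality constraints
```

The resulting matrix equals [NaN NaN NaN NaN; NaN NaN NaN NaN; NaN 0 NaN NaN] and can be supplied as the value of the Beta property.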
Object Functions

estimate     Fit vector error-correction (VEC) model to data
fevd         Generate vector error-correction (VEC) model forecast error variance decomposition (FEVD)
filter       Filter disturbances through vector error-correction (VEC) model
forecast     Forecast vector error-correction (VEC) model responses
infer        Infer vector error-correction (VEC) model innovations
irf          Generate vector error-correction (VEC) model impulse responses
simulate     Monte Carlo simulation of vector error-correction (VEC) model
summarize    Display estimation results of vector error-correction (VEC) model
varm         Convert vector error-correction (VEC) model to vector autoregression (VAR) model
Examples

Prepare VEC Model Template for Parameter Estimation

Suppose that a VEC model with cointegrating rank of 4 and a short-run polynomial of degree 2 is appropriate for modeling the behavior of seven hypothetical macroeconometric time series.

Create a VEC(7,4,2) model using the shorthand syntax.

    Mdl = vecm(7,4,2)

    Mdl =
      vecm with properties:

              Description: "7-Dimensional Rank = 4 VEC(2) Model with Linear Time Trend"
              SeriesNames: "Y1"  "Y2"  "Y3"  ... and 4 more
                NumSeries: 7
                     Rank: 4
                        P: 3
                 Constant: [7×1 vector of NaNs]
               Adjustment: [7×4 matrix of NaNs]
            Cointegration: [7×4 matrix of NaNs]
                   Impact: [7×7 matrix of NaNs]
    CointegrationConstant: [4×1 vector of NaNs]
       CointegrationTrend: [4×1 vector of NaNs]
                 ShortRun: {7×7 matrices of NaNs} at lags [1 2]
                    Trend: [7×1 vector of NaNs]
                     Beta: [7×0 matrix]
               Covariance: [7×7 matrix of NaNs]

Mdl is a vecm model object that serves as a template for parameter estimation. MATLAB® considers the NaN values as unknown parameter values to be estimated. For example, the Adjustment property is a 7-by-4 matrix of NaN values. Therefore, the adjustment speeds are active model parameters to be estimated.

By default, MATLAB® includes overall and cointegrating linear time trend terms in the model. You can create a VEC model in H1 Johansen form by removing the time trend terms, that is, by setting the Trend property to 0 using dot notation.

    Mdl.Trend = 0

    Mdl =
      vecm with properties:

              Description: "7-Dimensional Rank = 4 VEC(2) Model"
              SeriesNames: "Y1"  "Y2"  "Y3"  ... and 4 more
                NumSeries: 7
                     Rank: 4
                        P: 3
                 Constant: [7×1 vector of NaNs]
               Adjustment: [7×4 matrix of NaNs]
            Cointegration: [7×4 matrix of NaNs]
                   Impact: [7×7 matrix of NaNs]
    CointegrationConstant: [4×1 vector of NaNs]
       CointegrationTrend: [4×1 vector of NaNs]
                 ShortRun: {7×7 matrices of NaNs} at lags [1 2]
                    Trend: [7×1 vector of zeros]
                     Beta: [7×0 matrix]
               Covariance: [7×7 matrix of NaNs]
MATLAB® expands Trend to the appropriate length, a 7-by-1 vector of zeros.
Specify All Parameter Values of VEC Model

Consider this VEC(1) model for three hypothetical response series.

$$
\begin{aligned}
\Delta y_t &= c + AB' y_{t-1} + \Phi_1 \Delta y_{t-1} + \varepsilon_t \\
&= \begin{bmatrix} -1 \\ -3 \\ -30 \end{bmatrix}
+ \begin{bmatrix} -0.3 & 0.3 \\ -0.2 & 0.1 \\ -1 & 0 \end{bmatrix}
\begin{bmatrix} 0.1 & -0.2 & 0.2 \\ -0.7 & 0.5 & 0.2 \end{bmatrix} y_{t-1}
+ \begin{bmatrix} 0 & 0.1 & 0.2 \\ 0.2 & -0.2 & 0 \\ 0.7 & -0.2 & 0.3 \end{bmatrix} \Delta y_{t-1}
+ \varepsilon_t .
\end{aligned}
$$

The innovations are multivariate Gaussian with a mean of 0 and the covariance matrix

$$
\Sigma = \begin{bmatrix} 1.3 & 0.4 & 1.6 \\ 0.4 & 0.6 & 0.7 \\ 1.6 & 0.7 & 5 \end{bmatrix} .
$$

Create variables for the parameter values.

    Adjustment = [-0.3 0.3; -0.2 0.1; -1 0];
    Cointegration = [0.1 -0.7; -0.2 0.5; 0.2 0.2];
    ShortRun = {[0 0.1 0.2; 0.2 -0.2 0; 0.7 -0.2 0.3]};
    Constant = [-1; -3; -30];
    Trend = [0; 0; 0];
    Covariance = [1.3 0.4 1.6; 0.4 0.6 0.7; 1.6 0.7 5];

Create a vecm model object representing the VEC(1) model using the appropriate name-value pair arguments.

    Mdl = vecm('Adjustment',Adjustment,'Cointegration',Cointegration,...
        'Constant',Constant,'ShortRun',ShortRun,'Trend',Trend,...
        'Covariance',Covariance)

    Mdl =
      vecm with properties:

              Description: "3-Dimensional Rank = 2 VEC(1) Model"
              SeriesNames: "Y1"  "Y2"  "Y3"
                NumSeries: 3
                     Rank: 2
                        P: 2
                 Constant: [-1 -3 -30]'
               Adjustment: [3×2 matrix]
            Cointegration: [3×2 matrix]
                   Impact: [3×3 matrix]
    CointegrationConstant: [2×1 vector of NaNs]
       CointegrationTrend: [2×1 vector of NaNs]
                 ShortRun: {3×3 matrix} at lag [1]
                    Trend: [3×1 vector of zeros]
                     Beta: [3×0 matrix]
               Covariance: [3×3 matrix]

Mdl is, effectively, a fully specified vecm model object. Although the cointegration constant and linear trend are unknown (NaN), they are not needed for simulating observations or forecasting because the overall constant and trend parameters are known.

By default, vecm attributes the short-run coefficient to the first lag in the short-run polynomial. Consider another VEC model that attributes the short-run coefficient matrix ShortRun to the fourth lag term, specifies a matrix of zeros for the first lag coefficient, and treats all else as being equal to Mdl. Create this VEC(4) model.

    Mdl.ShortRun(4) = ShortRun;
    Mdl.ShortRun(1) = {0}

    Mdl =
      vecm with properties:

              Description: "3-Dimensional Rank = 2 VEC(4) Model"
              SeriesNames: "Y1"  "Y2"  "Y3"
                NumSeries: 3
                     Rank: 2
                        P: 5
                 Constant: [-1 -3 -30]'
               Adjustment: [3×2 matrix]
            Cointegration: [3×2 matrix]
                   Impact: [3×3 matrix]
    CointegrationConstant: [2×1 vector of NaNs]
       CointegrationTrend: [2×1 vector of NaNs]
                 ShortRun: {3×3 matrix} at lag [4]
                    Trend: [3×1 vector of zeros]
                     Beta: [3×0 matrix]
               Covariance: [3×3 matrix]
Alternatively, you can create another model object using vecm and the same syntax as for Mdl, but additionally specify 'Lags',4.
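The alternative just mentioned might look like the following sketch. It repeats the parameter values from this example so that it runs on its own, and it adds only the 'Lags' name-value argument. The name Mdl4 is arbitrary, and the code requires Econometrics Toolbox.

```matlab
% Parameter values from the VEC(1) example above.
Adjustment    = [-0.3 0.3; -0.2 0.1; -1 0];
Cointegration = [0.1 -0.7; -0.2 0.5; 0.2 0.2];
ShortRun      = {[0 0.1 0.2; 0.2 -0.2 0; 0.7 -0.2 0.3]};
Constant      = [-1; -3; -30];
Trend         = [0; 0; 0];
Covariance    = [1.3 0.4 1.6; 0.4 0.6 0.7; 1.6 0.7 5];

% Attribute the single short-run coefficient matrix to lag 4.
% ShortRun and Lags have the same length (one element each).
Mdl4 = vecm('Adjustment',Adjustment,'Cointegration',Cointegration,...
    'Constant',Constant,'ShortRun',ShortRun,'Lags',4,'Trend',Trend,...
    'Covariance',Covariance);
```

As with the dot-notation approach, the result is a VEC(4) model, so the corresponding VAR order P is 5.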
Fit VEC(1) Model to Matrix of Response Data

Fit a VEC(1) model to seven macroeconomic series. Supply the response data as a numeric matrix.

Consider a VEC model for the following macroeconomic series:

• Gross domestic product (GDP)
• GDP implicit price deflator
• Paid compensation of employees
• Nonfarm business sector hours of all persons
• Effective federal funds rate
• Personal consumption expenditures
• Gross private domestic investment
Suppose that a cointegrating rank of 4 and one short-run term are appropriate, that is, consider a VEC(1) model.

Load the Data_USEconVECModel data set.

    load Data_USEconVECModel

For more information on the data set and variables, enter Description at the command line.

Determine whether the data needs to be preprocessed by plotting the series on separate plots.

    figure
    tiledlayout(2,2)
    nexttile
    plot(FRED.Time,FRED.GDP);
    title("Gross Domestic Product");
    ylabel("Index");
    xlabel("Date");
    nexttile
    plot(FRED.Time,FRED.GDPDEF);
    title("GDP Deflator");
    ylabel("Index");
    xlabel("Date");
    nexttile
    plot(FRED.Time,FRED.COE);
    title("Paid Compensation of Employees");
    ylabel("Billions of $");
    xlabel("Date");
    nexttile
    plot(FRED.Time,FRED.HOANBS);
    title("Nonfarm Business Sector Hours");
    ylabel("Index");
    xlabel("Date");

    figure
    tiledlayout(2,2)
    nexttile
    plot(FRED.Time,FRED.FEDFUNDS)
    title("Federal Funds Rate")
    ylabel("Percent")
    xlabel("Date")
    nexttile
    plot(FRED.Time,FRED.PCEC)
    title("Consumption Expenditures")
    ylabel("Billions of $")
    xlabel("Date")
    nexttile
    plot(FRED.Time,FRED.GPDI)
    title("Gross Private Domestic Investment")
    ylabel("Billions of $")
    xlabel("Date")

Stabilize all series, except the federal funds rate, by applying the log transform. Scale the resulting series by 100 so that all series are on the same scale.

    FRED.GDP = 100*log(FRED.GDP);
    FRED.GDPDEF = 100*log(FRED.GDPDEF);
    FRED.COE = 100*log(FRED.COE);
    FRED.HOANBS = 100*log(FRED.HOANBS);
    FRED.PCEC = 100*log(FRED.PCEC);
    FRED.GPDI = 100*log(FRED.GPDI);
Create a VEC(1) model using the shorthand syntax. Specify the variable names.

    Mdl = vecm(7,4,1);
    Mdl.SeriesNames = FRED.Properties.VariableNames

    Mdl =
      vecm with properties:

              Description: "7-Dimensional Rank = 4 VEC(1) Model with Linear Time Trend"
              SeriesNames: "GDP"  "GDPDEF"  "COE"  ... and 4 more
                NumSeries: 7
                     Rank: 4
                        P: 2
                 Constant: [7×1 vector of NaNs]
               Adjustment: [7×4 matrix of NaNs]
            Cointegration: [7×4 matrix of NaNs]
                   Impact: [7×7 matrix of NaNs]
    CointegrationConstant: [4×1 vector of NaNs]
       CointegrationTrend: [4×1 vector of NaNs]
                 ShortRun: {7×7 matrix of NaNs} at lag [1]
                    Trend: [7×1 vector of NaNs]
                     Beta: [7×0 matrix]
               Covariance: [7×7 matrix of NaNs]

Mdl is a vecm model object. All properties containing NaN values correspond to parameters to be estimated given data.

Estimate the model using the entire data set and the default options.

    EstMdl = estimate(Mdl,FRED.Variables)

    EstMdl =
      vecm with properties:

              Description: "7-Dimensional Rank = 4 VEC(1) Model"
              SeriesNames: "GDP"  "GDPDEF"  "COE"  ... and 4 more
                NumSeries: 7
                     Rank: 4
                        P: 2
                 Constant: [14.1329 8.77841 -7.20359 ... and 4 more]'
               Adjustment: [7×4 matrix]
            Cointegration: [7×4 matrix]
                   Impact: [7×7 matrix]
    CointegrationConstant: [-28.6082 109.555 -77.0912 ... and 1 more]'
       CointegrationTrend: [4×1 vector of zeros]
                 ShortRun: {7×7 matrix} at lag [1]
                    Trend: [7×1 vector of zeros]
                     Beta: [7×0 matrix]
               Covariance: [7×7 matrix]

EstMdl is an estimated vecm model object. It is fully specified because all parameters have known values. By default, estimate imposes the constraints of the H1 Johansen VEC model form by removing the cointegrating trend and linear trend terms from the model. Excluding a parameter from estimation is equivalent to constraining it to zero.

Display a short summary from the estimation.

    results = summarize(EstMdl)

    results = struct with fields:

               Description: "7-Dimensional Rank = 4 VEC(1) Model"
                     Model: "H1"
                SampleSize: 238
    NumEstimatedParameters: 112
             LogLikelihood: -1.4939e+03
                       AIC: 3.2118e+03
                       BIC: 3.6007e+03
                     Table: [133x4 table]
                Covariance: [7x7 double]
               Correlation: [7x7 double]
The Table field of results is a table of parameter estimates and corresponding statistics.
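As a quick consistency check on the summary, the information criteria can be recomputed from the other displayed fields. This sketch assumes the usual definitions AIC = 2k – 2logL and BIC = k·log(n) – 2logL, and it uses the rounded values shown in the display rather than the values stored in results.

```matlab
logL = -1.4939e+03;   % LogLikelihood, as displayed (rounded)
k    = 112;           % NumEstimatedParameters
n    = 238;           % SampleSize

aic = 2*k - 2*logL;        % Akaike information criterion
bic = k*log(n) - 2*logL;   % Bayesian (Schwarz) information criterion

fprintf('AIC = %.1f, BIC = %.1f\n',aic,bic)
```

Under these assumptions, the recomputed values agree with the displayed AIC and BIC to the precision shown.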
Forecast Responses from VEC Model

This example follows from “Fit VEC(1) Model to Matrix of Response Data” on page 12-2407.

Create and estimate the VEC(1) model. Treat the last ten periods as the forecast horizon.

    load Data_USEconVECModel
    FRED.GDP = 100*log(FRED.GDP);
    FRED.GDPDEF = 100*log(FRED.GDPDEF);
    FRED.COE = 100*log(FRED.COE);
    FRED.HOANBS = 100*log(FRED.HOANBS);
    FRED.PCEC = 100*log(FRED.PCEC);
    FRED.GPDI = 100*log(FRED.GPDI);
    Mdl = vecm(7,4,1)

    Mdl =
      vecm with properties:

              Description: "7-Dimensional Rank = 4 VEC(1) Model with Linear Time Trend"
              SeriesNames: "Y1"  "Y2"  "Y3"  ... and 4 more
                NumSeries: 7
                     Rank: 4
                        P: 2
                 Constant: [7×1 vector of NaNs]
               Adjustment: [7×4 matrix of NaNs]
            Cointegration: [7×4 matrix of NaNs]
                   Impact: [7×7 matrix of NaNs]
    CointegrationConstant: [4×1 vector of NaNs]
       CointegrationTrend: [4×1 vector of NaNs]
                 ShortRun: {7×7 matrix of NaNs} at lag [1]
                    Trend: [7×1 vector of NaNs]
                     Beta: [7×0 matrix]
               Covariance: [7×7 matrix of NaNs]

    Y = FRED{1:(end - 10),:};
    EstMdl = estimate(Mdl,Y)

    EstMdl =
      vecm with properties:

              Description: "7-Dimensional Rank = 4 VEC(1) Model"
              SeriesNames: "Y1"  "Y2"  "Y3"  ... and 4 more
                NumSeries: 7
                     Rank: 4
                        P: 2
                 Constant: [14.5023 8.46791 -7.08266 ... and 4 more]'
               Adjustment: [7×4 matrix]
            Cointegration: [7×4 matrix]
                   Impact: [7×7 matrix]
    CointegrationConstant: [-32.8433 -101.126 -84.2373 ... and 1 more]'
       CointegrationTrend: [4×1 vector of zeros]
                 ShortRun: {7×7 matrix} at lag [1]
                    Trend: [7×1 vector of zeros]
                     Beta: [7×0 matrix]
               Covariance: [7×7 matrix]

Forecast 10 responses using the estimated model and in-sample data as presample observations.

    YF = forecast(EstMdl,10,Y);

On separate plots, plot part of the GDP and GPDI series with their forecasted values.

    figure;
    plot(FRED.Time(end - 50:end),FRED.GDP(end - 50:end));
    hold on
    plot(FRED.Time((end - 9):end),YF(:,1))
    h = gca;
    fill(FRED.Time([end - 9 end end end - 9]),h.YLim([1,1,2,2]),'k',...
        'FaceAlpha',0.1,'EdgeColor','none');
    legend('True','Forecasted','Location','NW')
    title('Quarterly Scaled GDP: 2004 - 2016');
    ylabel('Billions of $ (scaled)');
    xlabel('Year');
    hold off

    figure;
    plot(FRED.Time(end - 50:end),FRED.GPDI(end - 50:end));
    hold on
    plot(FRED.Time((end - 9):end),YF(:,7))
    h = gca;
    fill(FRED.Time([end - 9 end end end - 9]),h.YLim([1,1,2,2]),'k',...
        'FaceAlpha',0.1,'EdgeColor','none');
    legend('True','Forecasted','Location','NW')
    title('Quarterly Scaled GPDI: 2004 - 2016');
    ylabel('Billions of $ (scaled)');
    xlabel('Year');
    hold off
More About

Vector Error-Correction Model

A vector error-correction (VEC) model is a multivariate, stochastic time series model consisting of a system of m = numseries equations of m distinct, differenced response variables. Equations in the system can include an error-correction term, which is a linear function of the responses in levels used to stabilize the system. The cointegrating rank r is the number of cointegrating relations that exist in the system.

Each response equation can include an autoregressive polynomial composed of first differences of the response series (short-run polynomial of degree p – 1), a constant, a time trend, exogenous predictor variables, and a constant and time trend in the error-correction term.

A VEC(p – 1) model in difference-equation notation and in reduced form can be expressed in two ways:

• This equation is the component form of a VEC model, where the cointegration adjustment speeds and cointegration matrix are explicit, whereas the impact matrix is implied.

$$
\begin{aligned}
\Delta y_t &= A\left(B' y_{t-1} + c_0 + d_0 t\right) + c_1 + d_1 t + \Phi_1 \Delta y_{t-1} + \dots + \Phi_{p-1} \Delta y_{t-(p-1)} + \beta x_t + \varepsilon_t \\
&= c + d t + A B' y_{t-1} + \Phi_1 \Delta y_{t-1} + \dots + \Phi_{p-1} \Delta y_{t-(p-1)} + \beta x_t + \varepsilon_t .
\end{aligned}
$$

The cointegrating relations are B'yt – 1 + c0 + d0t and the error-correction term is A(B'yt – 1 + c0 + d0t).

• This equation is the impact form of a VEC model, where the impact matrix is explicit, whereas the cointegration adjustment speeds and cointegration matrix are implied.

$$
\begin{aligned}
\Delta y_t &= \Pi y_{t-1} + A\left(c_0 + d_0 t\right) + c_1 + d_1 t + \Phi_1 \Delta y_{t-1} + \dots + \Phi_{p-1} \Delta y_{t-(p-1)} + \beta x_t + \varepsilon_t \\
&= c + d t + \Pi y_{t-1} + \Phi_1 \Delta y_{t-1} + \dots + \Phi_{p-1} \Delta y_{t-(p-1)} + \beta x_t + \varepsilon_t .
\end{aligned}
$$

In the equations:

• yt is an m-by-1 vector of values corresponding to m response variables at time t, where t = 1,...,T.
• Δyt = yt – yt – 1. The structural coefficient is the identity matrix.
• r is the number of cointegrating relations and, in general, 0 < r < m.
• A is an m-by-r matrix of adjustment speeds.
• B is an m-by-r cointegration matrix.
• Π is an m-by-m impact matrix with a rank of r.
• c0 is an r-by-1 vector of constants (intercepts) in the cointegrating relations.
• d0 is an r-by-1 vector of linear time trends in the cointegrating relations.
• c1 is an m-by-1 vector of constants (deterministic linear trends in yt).
• d1 is an m-by-1 vector of linear time-trend values (deterministic quadratic trends in yt).
• c = Ac0 + c1 and is the overall constant.
• d = Ad0 + d1 and is the overall time-trend coefficient.
• Φj is an m-by-m matrix of short-run coefficients, where j = 1,...,p – 1 and Φp – 1 is not a matrix containing only zeros.
• xt is a k-by-1 vector of values corresponding to k exogenous predictor variables.
• β is an m-by-k matrix of regression coefficients.
• εt is an m-by-1 vector of random Gaussian innovations, each with a mean of 0 and collectively an m-by-m covariance matrix Σ. For t ≠ s, εt and εs are independent.

Condensed and in lag operator notation, the system is

$$
\begin{aligned}
\Phi(L)(1 - L) y_t &= A\left(B' y_{t-1} + c_0 + d_0 t\right) + c_1 + d_1 t + \beta x_t + \varepsilon_t \\
&= c + d t + A B' y_{t-1} + \beta x_t + \varepsilon_t ,
\end{aligned}
$$

where $\Phi(L) = I - \Phi_1 L - \Phi_2 L^2 - \dots - \Phi_{p-1} L^{p-1}$, I is the m-by-m identity matrix, and Lyt = yt – 1.
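To make the two forms concrete, the following base-MATLAB sketch iterates the difference equation for a VEC(1) model without predictors or time trends, using the parameter values from the example “Specify All Parameter Values of VEC Model”. It is an illustration of the equations only, not the algorithm that simulate uses. The horizon T, the zero presample values, and the variable names are assumptions made for the sketch.

```matlab
rng(0);                                   % reproducible innovations
A      = [-0.3 0.3; -0.2 0.1; -1 0];      % adjustment speeds (m-by-r)
B      = [0.1 -0.7; -0.2 0.5; 0.2 0.2];   % cointegration matrix (m-by-r)
Phi1   = [0 0.1 0.2; 0.2 -0.2 0; 0.7 -0.2 0.3];
c      = [-1; -3; -30];                   % overall constant
Sigma  = [1.3 0.4 1.6; 0.4 0.6 0.7; 1.6 0.7 5];

Impact = A*B';              % impact matrix implied by the component form
m = size(A,1);
T = 20;                     % number of simulated periods (arbitrary)
C = chol(Sigma,'lower');    % C*C' = Sigma, used to draw innovations

Y = zeros(m,T+2);           % columns 1:2 hold presample values (P = 2)
relGap = 0;                 % largest relative gap between the two forms
for t = 3:T+2
    dyLag = Y(:,t-1) - Y(:,t-2);
    e     = C*randn(m,1);
    dyComponent = c + A*(B'*Y(:,t-1)) + Phi1*dyLag + e;   % component form
    dyImpact    = c + Impact*Y(:,t-1) + Phi1*dyLag + e;   % impact form
    gap    = norm(dyComponent - dyImpact)/max(1,norm(dyComponent));
    relGap = max(relGap,gap);
    Y(:,t) = Y(:,t-1) + dyComponent;      % update the levels
end
```

Both forms produce the same increments up to floating-point rounding, so relGap stays near machine precision.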
If m = r, then the VEC model is a stable VAR(p) model in the levels of the responses. If r = 0, then the error-correction term is a matrix of zeros, and the VEC(p – 1) model is a stable VAR(p – 1) model in the first differences of the responses.

Johansen Form

The Johansen forms of a VEC Model on page 12-2414 differ with respect to the presence of deterministic terms. As detailed in [2], the estimation procedure differs among the forms. Consequently, allowable equality constraints on the deterministic terms during estimation differ among forms. For more details, see “The Role of Deterministic Terms” on page 9-109.

This table describes the five Johansen forms and supported equality constraints.
| Form | Error-Correction Term | Deterministic Coefficients | Equality Constraints |
| --- | --- | --- | --- |
| H2 | AB′yt−1 | c = 0 (Constant). d = 0 (Trend). c0 = 0 (CointegrationConstant). d0 = 0 (CointegrationTrend). | You can fully specify B. All deterministic coefficients are zero. |
| H1* | A(B′yt−1 + c0) | c = Ac0. d = 0. d0 = 0. | If you fully specify either B or c0, then you must fully specify the other. MATLAB derives the value of c from c0 and A. All deterministic trends are zero. |
| H1 | A(B′yt−1 + c0) + c1 | c = Ac0 + c1. d = 0. d0 = 0. | You can fully specify B. You can specify a mixture of NaN and numeric values for c. MATLAB derives the value of c0 from c and A. All deterministic trends are zero. |
| H* | A(B′yt−1 + c0 + d0t) + c1 | c = Ac0 + c1. d = Ad0. | If you fully specify either B or d0, then you must fully specify the other. You can specify a mixture of NaN and numeric values for c. MATLAB derives the value of c0 from c and A. MATLAB derives the value of d from A and d0. |
| H | A(B′yt−1 + c0 + d0t) + c1 + d1t | c = Ac0 + c1. d = Ad0 + d1. | You can fully specify B. You can specify a mixture of NaN and numeric values for c and d. MATLAB derives the values of c0 and d0 from c, d, and A. |
Algorithms

VEC Model Forms

VEC models can take one of two forms: component form or impact form.

Component Form

• If you create a vecm model object using the shorthand syntax, and then assign a value to at least one of these properties (Adjustment, Cointegration, CointegrationConstant, or CointegrationTrend) before assigning a value to the Impact property, then the model takes the component form.
• If you create a vecm model object using the longhand syntax by assigning a value to either the Adjustment or Cointegration property (or both), but leave the Impact property unspecified, then the model takes the component form.
• If 1 ≤ Rank ≤ NumSeries – 1, as is the case for most VEC model analyses, then it can be more convenient to work with the model in component form rather than impact form.
• The Impact property is read-only and depends on the values of the Cointegration and Adjustment properties. Specifically, Impact = Adjustment*Cointegration'.
• During estimation, MATLAB fits the VEC model to the data in two steps. First, it estimates the cointegrating relations. Second, it constructs a VARX model from the differenced responses and short-run polynomial, includes an exogenous term for the estimated cointegrating relations, and then fits the VARX model to the differenced response data.

Impact Form

• If you create a vecm model object using the shorthand syntax, and then assign a value to the Impact property before assigning a value to any of these properties (Adjustment, Cointegration, CointegrationConstant, or CointegrationTrend), then the model takes the impact form.
• If you create a vecm model object using the longhand syntax by assigning a value to the Impact property, but leave either the Adjustment or Cointegration property (or both) unspecified, then the model takes the impact form.
• If the value of the Impact property is a full-rank matrix (Rank = NumSeries), then it can be more convenient to work with the model in impact form rather than component form. The corresponding VEC model has no unit roots. Therefore, the model is a stable VAR(P) model. Because each individual response variable is stable, any linear combination of response variables is also stable.
• Because matrix factorizations are not necessarily unique, the values of the Cointegration and Adjustment properties are unknown. Therefore, after you pass the vecm model object to estimate, MATLAB sets the Cointegration property to the NumSeries-by-NumSeries identity matrix and the Adjustment property to the value of Impact in the estimated vecm model object.
• The Cointegration and Adjustment properties are read-only and depend on the value of Impact.
• During estimation, MATLAB fits the model in one step: it fits the equivalent VARX(P) model to the differenced responses and includes yt – 1 as exogenous predictors in the regression component.

VEC Model Form Notes

• When using the longhand syntax to create a vecm model object, if you assign a value to the Impact property and either the Adjustment or Cointegration property (or both) simultaneously, then MATLAB uses the cointegration rank Rank to determine the form of the vecm model object. Specifically, if 1 ≤ Rank ≤ NumSeries – 1, then the model object is in component form. Otherwise, the model object is in impact form.
• If the model is fully specified, then the form of the vecm model object is irrelevant.

Deterministic Parameters

MATLAB does not store c1 and d1, but you can compute them using Adjustment, Constant, CointegrationConstant, Trend, CointegrationTrend, and the corresponding Johansen form on page 12-2415 of the VEC model.
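For the H form, where c = Ac0 + c1 and d = Ad0 + d1, the computation reduces to two subtractions. The following sketch uses hypothetical parameter values for a three-series, rank-2 model. With a fully specified model object Mdl, the same quantities would come from the Constant, Trend, Adjustment, CointegrationConstant, and CointegrationTrend properties.

```matlab
% Hypothetical values for a 3-series, rank-2 model in H form.
A  = [-0.3 0.3; -0.2 0.1; -1 0];   % Adjustment
c  = [-1; -3; -30];                % Constant (overall constant)
d  = [0.1; 0; -0.2];               % Trend (overall time trend)
c0 = [2; -1];                      % CointegrationConstant
d0 = [0.05; 0];                    % CointegrationTrend

c1 = c - A*c0;   % rearranged from c = A*c0 + c1
d1 = d - A*d0;   % rearranged from d = A*d0 + d1
```

In the more restrictive Johansen forms, some of these terms are zero by definition (for example, d1 = 0 in the H* form), so no computation is needed for them.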
Version History Introduced in R2017b
References

[1] Hamilton, James D. Time Series Analysis. Princeton, NJ: Princeton University Press, 1994.
[2] Johansen, S. Likelihood-Based Inference in Cointegrated Vector Autoregressive Models. Oxford: Oxford University Press, 1995.
[3] Juselius, K. The Cointegrated VAR Model. Oxford: Oxford University Press, 2006.
[4] Lütkepohl, H. New Introduction to Multiple Time Series Analysis. Berlin: Springer, 2005.

See Also

Apps
Econometric Modeler

Objects
varm

Functions
vec2var | var2vec | egcitest | jcitest | jcontest

Topics
“Cointegration and Error Correction Analysis” on page 9-108
“Determine Cointegration Rank of VEC Model” on page 9-112
“Conduct Cointegration Test Using Econometric Modeler” on page 4-170
“Estimate Vector Autoregression Model Using Econometric Modeler” on page 4-155
vecm Convert vector autoregression (VAR) model to vector error-correction (VEC) model
Syntax VECMdl = vecm(Mdl)
Description VECMdl = vecm(Mdl) converts the VAR(p) model Mdl to its equivalent VEC(p – 1) model representation VECMdl.
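The algebra behind the conversion can be checked in base MATLAB. For a VAR(2) model yt = A1yt–1 + A2yt–2 + εt, subtracting yt–1 from both sides gives the VEC(1) coefficients Π = A1 + A2 – I and Φ1 = –A2. The sketch below uses hypothetical coefficient values and omits deterministic terms. It illustrates the identity and is not a description of how vecm implements the conversion.

```matlab
% Hypothetical VAR(2) coefficients for two series:
%   y_t = A1*y_{t-1} + A2*y_{t-2} + e_t
A1 = [0.5 0.1; 0.2 0.3];
A2 = [0.2 -0.1; 0.0 0.4];
m  = size(A1,1);

% Equivalent VEC(1) coefficients:
%   dy_t = Impact*y_{t-1} + Phi1*dy_{t-1} + e_t
Impact = A1 + A2 - eye(m);
Phi1   = -A2;

% Compare the two representations on arbitrary lagged values
% (innovations omitted because they are identical in both forms).
y1 = [1; 2];       % y_{t-1}
y2 = [0.5; -1];    % y_{t-2}
yVAR = A1*y1 + A2*y2;
yVEC = y1 + Impact*y1 + Phi1*(y1 - y2);
gap  = norm(yVAR - yVEC);
```

For a general VAR(p) model, the same rearrangement gives Π = A1 + ... + Ap – I and Φj = –(Aj+1 + ... + Ap) for j = 1,...,p – 1.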
Examples
Convert VAR Model to VEC Model
Consider a VAR(2) model for the following seven macroeconomic series.
• Gross domestic product (GDP)
• GDP implicit price deflator
• Paid compensation of employees
• Nonfarm business sector hours of all persons
• Effective federal funds rate
• Personal consumption expenditures
• Gross private domestic investment
Load the Data_USEconVECModel data set.
load Data_USEconVECModel

For more information on the data set and variables, enter Description at the command line.
Determine whether the data needs to be preprocessed by plotting the series on separate plots.
figure;
subplot(2,2,1)
plot(FRED.Time,FRED.GDP);
title('Gross Domestic Product');
ylabel('Index');
xlabel('Date');
subplot(2,2,2)
plot(FRED.Time,FRED.GDPDEF);
title('GDP Deflator');
ylabel('Index');
xlabel('Date');
subplot(2,2,3)
plot(FRED.Time,FRED.COE);
title('Paid Compensation of Employees');
ylabel('Billions of $');
xlabel('Date');
subplot(2,2,4)
plot(FRED.Time,FRED.HOANBS);
title('Nonfarm Business Sector Hours');
ylabel('Index');
xlabel('Date');
figure;
subplot(2,2,1)
plot(FRED.Time,FRED.FEDFUNDS);
title('Federal Funds Rate');
ylabel('Percent');
xlabel('Date');
subplot(2,2,2)
plot(FRED.Time,FRED.PCEC);
title('Consumption Expenditures');
ylabel('Billions of $');
xlabel('Date');
subplot(2,2,3)
plot(FRED.Time,FRED.GPDI);
title('Gross Private Domestic Investment');
ylabel('Billions of $');
xlabel('Date');
Stabilize all series, except the federal funds rate, by applying the log transform. Scale the resulting series by 100 so that all series are on the same scale.
FRED.GDP = 100*log(FRED.GDP);
FRED.GDPDEF = 100*log(FRED.GDPDEF);
FRED.COE = 100*log(FRED.COE);
FRED.HOANBS = 100*log(FRED.HOANBS);
FRED.PCEC = 100*log(FRED.PCEC);
FRED.GPDI = 100*log(FRED.GPDI);

Create a VAR(2) model using the shorthand syntax. Specify the variable names.
Mdl = varm(7,2);
Mdl.SeriesNames = FRED.Properties.VariableNames;
Mdl is a varm model object. All properties containing NaN values correspond to parameters to be estimated given data.
Estimate the model using the entire data set and the default options.
EstMdl = estimate(Mdl,FRED.Variables)

EstMdl =
  varm with properties:

     Description: "AR-Stationary 7-Dimensional VAR(2) Model"
     SeriesNames: "GDP"  "GDPDEF"  "COE"  ... and 4 more
       NumSeries: 7
               P: 2
        Constant: [15.835 9.91375 -14.0917 ... and 4 more]'
              AR: {7×7 matrices} at lags [1 2]
           Trend: [7×1 vector of zeros]
            Beta: [7×0 matrix]
      Covariance: [7×7 matrix]
EstMdl is an estimated varm model object. It is fully specified because all parameters have known values.
Convert the estimated VAR(2) model to its equivalent VEC(1) model representation.
VECMdl = vecm(EstMdl)

VECMdl =
  vecm with properties:

             Description: "7-Dimensional Rank = 7 VEC(1) Model"
             SeriesNames: "GDP"  "GDPDEF"  "COE"  ... and 4 more
               NumSeries: 7
                    Rank: 7
                       P: 2
                Constant: [15.835 9.91375 -14.0917 ... and 4 more]'
              Adjustment: [7×7 matrix]
           Cointegration: [7×7 diagonal matrix]
                  Impact: [7×7 matrix]
   CointegrationConstant: [7×1 vector of NaNs]
      CointegrationTrend: [7×1 vector of NaNs]
                ShortRun: {7×7 matrix} at lag [1]
                   Trend: [7×1 vector of zeros]
                    Beta: [7×0 matrix]
              Covariance: [7×7 matrix]
VECMdl is a vecm model object.
Input Arguments Mdl — VAR model varm model object VAR model, specified as a varm model object created by varm or estimate. Mdl must be fully specified.
Output Arguments VECMdl — VEC model equivalent vecm model object VEC model equivalent, returned as a vecm model object.
Algorithms
Consider the m-dimensional VAR(p) model in difference equation notation.

$$y_t = c + dt + \sum_{j=1}^{p} \Gamma_j y_{t-j} + \beta x_t + \varepsilon_t.$$

• yt is an m-by-1 vector of values corresponding to m response variables at time t, where t = 1,...,T.
• c is the overall constant.
• d is the overall time trend coefficient.
• xt is a k-by-1 vector of values corresponding to k exogenous predictor variables.
• β is an m-by-k matrix of regression coefficients.
• εt is an m-by-1 vector of random Gaussian innovations, each with a mean of 0 and collectively an m-by-m covariance matrix Σ. For t ≠ s, εt and εs are independent.
• Γj is an m-by-m matrix of autoregressive coefficients.

The equivalent VEC(p – 1) model using lag operator notation is

$$(1 - L)y_t = c + dt + \Pi y_{t-1} + \sum_{j=1}^{p-1} \Phi_j (1 - L) y_{t-j} + \beta x_t + \varepsilon_t.$$

• Lyt = yt – 1.
• Π is an m-by-m impact matrix with a rank of r.
• Φj is an m-by-m matrix of short-run coefficients.
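As a rough illustration of the algebra that links the two representations, the following sketch maps VAR coefficients to VEC coefficients using the standard identities Π = Γ1 + … + Γp − I and Φj = −(Γj+1 + … + Γp). The coefficient values and the history vectors are made up for illustration, the constant, trend, regression, and innovation terms are omitted, and the code uses only base MATLAB. It is not presented as the toolbox's internal implementation.

```matlab
% Hypothetical VAR(2) coefficient matrices (illustrative values only)
Gamma = {[0.5 0.1; 0.0 0.4], [0.2 0.0; 0.1 0.1]};
p = numel(Gamma);
m = size(Gamma{1},1);

% Impact matrix: Pi = Gamma_1 + ... + Gamma_p - I
Pi = -eye(m);
for j = 1:p
    Pi = Pi + Gamma{j};
end

% Short-run coefficients: Phi_j = -(Gamma_{j+1} + ... + Gamma_p)
Phi = cell(1,p-1);
for j = 1:p-1
    Phi{j} = zeros(m);
    for i = j+1:p
        Phi{j} = Phi{j} - Gamma{i};
    end
end

% Check that both forms give the same one-step value for an arbitrary history
y1 = [1; 2];                                 % y_{t-1}
y2 = [0.5; -1];                              % y_{t-2}
varStep = Gamma{1}*y1 + Gamma{2}*y2;         % y_t from the VAR form
vecStep = y1 + Pi*y1 + Phi{1}*(y1 - y2);     % y_t from the VEC form
maxDiff = max(abs(varStep - vecStep));
```

If the identities hold, maxDiff is zero up to rounding error.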
Version History Introduced in R2017b
See Also Objects vecm | varm Functions var2vec
vratiotest Variance ratio test for random walk
Syntax h = vratiotest(y) [h,pValue,stat,cValue] = vratiotest(y) StatTbl = vratiotest(Tbl) [ ___ ] = vratiotest( ___ ,Name=Value) [ ___ ,ratio] = vratiotest( ___ )
Description h = vratiotest(y) returns the rejection decision h from conducting the variance ratio test on page 12-2432 for assessing whether the univariate time series y is a random walk. [h,pValue,stat,cValue] = vratiotest(y) also returns the p-value pValue, test statistic stat, and critical value cValue of the test. StatTbl = vratiotest(Tbl) returns the table StatTbl containing variables for the test results, statistics, and settings from conducting the variance ratio test on the last variable of the input table or timetable Tbl. To select a different variable in Tbl to test, use the DataVariable name-value argument. [ ___ ] = vratiotest( ___ ,Name=Value) specifies options using one or more name-value arguments in addition to any of the input argument combinations in previous syntaxes. vratiotest returns the output argument combination for the corresponding input arguments. Some options control the number of tests to conduct. The following conditions apply when vratiotest conducts multiple tests: • vratiotest treats each test as separate from all other tests. • If you specify y, all outputs are vectors. • If you specify Tbl, each row of StatTbl contains the results of the corresponding test. For example, vratiotest(Tbl,DataVariable="GDP",Alpha=0.025,IID=[false true]) conducts two tests, at a level of significance of 0.025, on the variable GDP of the table Tbl. The first test does not assume that the innovations series is iid and the second test assumes that the innovations series is iid. [ ___ ,ratio] = vratiotest( ___ ) additionally returns the variance ratios of each test ratio.
Examples
Conduct Variance Ratio Test on Vector of Data
Test whether a time series is a random walk using the default options of vratiotest. Input the time series data as a numeric vector.
Load the global large-cap equity indices data set, which contains daily closing prices during 1993–2003. Extract the closing prices of the S&P 500 series.
load Data_GlobalIdx1
dt = datetime(dates,ConvertFrom="datenum");
sp = DataTable.SP;
plot(dt,sp)
title("S&P 500 Price Series")
The first half of the series exhibits exponential growth. Scale the series by applying the log transformation.
logsp = log(DataTable.SP);

The logsp series is in levels. Assess the null hypothesis that the log series is a random walk by applying the variance ratio test. Use default options.
h = vratiotest(logsp)

h = logical
   0

h = 0 indicates that, at a 5% level of significance, the test fails to reject the null hypothesis that the series is a random walk.
Return Test p-Value and Decision Statistics
Load the global large-cap equity indices data set, extract the closing prices of the S&P 500 price series, and apply the log transform to the series.
load Data_GlobalIdx1
logsp = log(DataTable.SP);

Assess the null hypothesis that the log series is a random walk by applying the variance ratio test. Return the test decision, p-value, test statistic, and critical value.
[h,pValue,stat,cValue] = vratiotest(logsp)

h = logical
   0

pValue = 0.7004
stat = 0.3847
cValue = 1.9600
Conduct Variance Ratio Test on Table Variable
Test whether a time series, which is one variable in a table, is a random walk using the default options.
Load the global large-cap equity indices data set. Convert the table DataTable to a timetable.
load Data_GlobalIdx1
dates = datetime(dates,ConvertFrom="datenum");
TT = table2timetable(DataTable,RowTimes=dates);

Apply the log transform to all series.
LTT = varfun(@log,TT(:,2:end));
LTT.Properties.VariableNames{end}

ans =
'log_SP'
The last variable in the table is the log of the S&P 500 price series log_SP.
Assess the null hypothesis of the variance ratio test that the log of the S&P 500 price series is a random walk.
StatTbl = vratiotest(LTT)

StatTbl=1×7 table
                h      pValue      stat      cValue    Alpha    Period     IID
              _____    _______    _______    ______    _____    ______    _____
    Test 1    false    0.70045    0.38471     1.96     0.05       2       false
vratiotest returns test results and settings in the table StatTbl, where variables correspond to test results (h, pValue, stat, and cValue) and settings (Alpha, Period, and IID), and rows correspond to individual tests (in this case, vratiotest conducts one test). By default, vratiotest tests the last variable in the table. To select a variable from an input table to test, set the DataVariable option.
Adjust Number of Overlapping Periods for Test
Test whether the S&P 500 series is a random walk using various step sizes, with and without the iid innovations assumption.
Load the global large-cap equity indices data set. Convert the table DataTable to a timetable and apply the log transform to all timetable variables.
load Data_GlobalIdx1
dates = datetime(dates,ConvertFrom="datenum");
TT = table2timetable(DataTable,RowTimes=dates);
LTT = varfun(@log,TT(:,2:end));

Plot the S&P 500 returns.
plot(diff(LTT.log_SP))
axis tight
The plot indicates that the returns have possible conditional heteroscedasticity.
Conduct separate tests for whether the series is a random walk using periods 2, 4, and 8. For each period, conduct separate tests assuming the innovations are iid and without the assumption. Return the variance ratios of each test, and compute the first-order autocorrelation of the returns from the first ratio.
q = [2 4 8 2 4 8];
iid = logical([1 1 1 0 0 0]);
[StatTbl,ratio] = vratiotest(LTT,Period=q,IID=iid,DataVariable="log_SP")

StatTbl=6×7 table
                h       pValue       stat       cValue    Alpha    Period     IID
              _____    ________    ________    ______    _____    ______    _____
    Test 1    false     0.56704     0.57242     1.96     0.05       2       true
    Test 2    false     0.33073    -0.97265     1.96     0.05       4       true
    Test 3    true     0.030933     -2.1579     1.96     0.05       8       true
    Test 4    false     0.70045     0.38471     1.96     0.05       2       false
    Test 5    false     0.50788    -0.66214     1.96     0.05       4       false
    Test 6    false     0.13034     -1.5128     1.96     0.05       8       false

ratio = 6×1
    1.0111
    0.9647
    0.8763
    1.0111
    0.9647
    0.8763

rho1 = ratio(1) - 1 % First-order autocorrelation of returns

rho1 = 0.0111
StatTbl.h indicates that the test fails to reject that the series is a random walk at 5% level, except in the case where period = 8 and IID = true. This rejection is likely due to the test not accounting for the heteroscedasticity.
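The relationship rho1 = ratio(1) − 1 used above can be checked on simulated data. The following base MATLAB sketch is only an illustration under stated assumptions: the returns are a hypothetical AR(1) series with coefficient 0.3, and the period-2 variance ratio is computed from plain sample variances rather than from the estimators that vratiotest uses, so the two quantities agree only approximately.

```matlab
rng(1);                                  % for reproducibility
T = 2000;
r = filter(1,[1 -0.3],randn(T,1));       % hypothetical AR(1) returns, coefficient 0.3
y = cumsum([0; r]);                      % convert returns to levels

% Simplified period-2 variance ratio from overlapping 2-period returns
ratio2 = var(y(3:end) - y(1:end-2)) / (2*var(diff(y)));

% Sample first-order autocorrelation of the returns
rc = r - mean(r);
rho1 = sum(rc(2:end).*rc(1:end-1)) / sum(rc.^2);

% The two quantities differ only by end effects, which shrink as T grows
gap = abs((ratio2 - 1) - rho1);
```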
Input Arguments
y — Univariate time series data in levels
numeric vector
Univariate time series data in levels, specified as a numeric vector. Each element of y represents an observation.
Data Types: double

Tbl — Time series data
table | timetable
Time series data, specified as a table or timetable. Each row of Tbl is an observation. Specify a single series (variable) to test by using the DataVariable argument. The selected variable must be numeric.

Note
• vratiotest removes missing observations, represented by NaN values, from the input series.
• vratiotest assumes that the specified input data is in levels. To convert a return series r to levels, define y(1) appropriately and let y = cumsum([y(1) r]).
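The conversion from returns to levels described in the note can be sketched as follows. The return values and the starting level y0 (which plays the role of y(1)) are hypothetical, and r is assumed to be a row vector; for a column vector, concatenate vertically with [y0; r] instead.

```matlab
r  = [0.01 -0.02 0.015 0.005];   % hypothetical return series (row vector)
y0 = log(100);                   % assumed starting level, plays the role of y(1)

y = cumsum([y0 r]);              % levels: y(1) = y0 and y(k+1) = y(k) + r(k)

% Differencing the levels recovers the original returns
recovered = diff(y);
```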
Name-Value Arguments
Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter.
Before R2021a, use commas to separate each name and value, and enclose Name in quotes.
Example: vratiotest(Tbl,DataVariable="GDP",Alpha=0.025,IID=[false true]) conducts two tests, at a level of significance of 0.025, on the variable GDP of the table Tbl. The first test does not assume that the innovations series is iid and the second test assumes that the innovations series is iid.

Period — Period q used to create overlapping return horizons
2 (default) | integer | vector of integers
Period q used to create overlapping return horizons for the variance ratio, specified as an integer that is greater than 1 and less than T/2 or a vector of such integers, where T is the effective sample size of yt.
When Period = 2 (the default), the first-order autocorrelation of the returns is asymptotically equal to ratio − 1.
vratiotest conducts a separate test for each value in Period.
Example: Period=2:4 runs three tests; the first test sets Period to 2, the second test sets Period to 3, and the third test sets Period to 4.
Data Types: double

IID — Flag for whether to assume innovations are iid (independent and identically distributed)
false (default) | true | logical vector
Flag for whether to assume the innovations are iid, specified as a value in this table or a vector of such values.

Value    Description
false    Innovations are not iid; the alternative hypothesis is that the innovations series εt is a correlated series. Set false when the input series is a long-term macroeconomic or financial price series because, for such series, the iid assumption is often unreasonable and rejection of the random-walk null hypothesis due to heteroscedasticity is uninteresting.
true     Innovations are iid; the alternative hypothesis is that εt is a dependent or not identically distributed series (for example, heteroscedastic). Set true to strengthen the random-walk null hypothesis.
vratiotest conducts a separate test for each value in IID. Example: IID=true Data Types: double Alpha — Nominal significance level 0.05 (default) | numeric scalar | numeric vector Nominal significance level for the hypothesis test, specified as a numeric scalar in the interval (0,1) or a numeric vector of such values. vratiotest conducts a separate test for each value in Alpha. Example: Alpha=[0.01 0.05] uses a level of significance of 0.01 for the first test, and then uses a level of significance of 0.05 for the second test. Data Types: double DataVariable — Variable in Tbl to test last variable (default) | string scalar | character vector | integer | logical vector
Variable in Tbl to test, specified as a string scalar or character vector containing a variable name in Tbl.Properties.VariableNames, or an integer or logical vector representing the index of a name. The selected variable must be numeric. Example: DataVariable="GDP" Example: DataVariable=[false true false false] or DataVariable=2 tests the second table variable. Data Types: double | logical | char | string Note • If you perform sequential testing using multiple values of q (Period), small-sample size distortions, beyond those that result from the asymptotic approximation of critical values, can result [2]. • When vratiotest conducts multiple tests, the function applies all single settings (scalars or character vectors) to each test. • All vector-valued specifications that control the number of tests must have equal length. • If you specify the vector y and any value is a row vector, all outputs are row vectors.
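Because all vector-valued settings must have equal length, testing every combination of several periods and both IID settings requires expanding the settings into matched vectors first. The following base MATLAB sketch shows one way to do that with ndgrid; the variable names are illustrative, and the resulting vectors would then be passed as the Period and IID arguments.

```matlab
q   = [2 4 8];        % candidate periods
iid = [1 0];          % 1 = assume iid innovations, 0 = do not

[Q,I] = ndgrid(q,iid);        % every (period, iid) pair
Period = Q(:)';               % periods, repeated once per iid setting
IID    = logical(I(:)');      % matching iid flags, same length as Period
numTests = numel(Period);     % number of tests the settings describe
```

For these inputs, Period and IID have six elements each, ordered the same way as the q and iid vectors in the earlier example.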
Output Arguments
h — Test rejection decisions
logical scalar | logical vector
Test rejection decisions, returned as a logical scalar or vector with length equal to the number of tests. vratiotest returns h when you supply the input y.
• Values of 1 indicate rejection of the random-walk null hypothesis in favor of the alternative.
• Values of 0 indicate failure to reject the random-walk null hypothesis.

pValue — Test statistic p-values
numeric scalar | numeric vector
Test statistic p-values, returned as a numeric scalar or vector with length equal to the number of tests. vratiotest returns pValue when you supply the input y.
p-values are standard normal probabilities.

stat — Test statistics
numeric scalar | numeric vector
Test statistics, returned as a numeric scalar or vector with length equal to the number of tests. vratiotest returns stat when you supply the input y.
Test statistics are asymptotically standard normal.

cValue — Critical values
numeric scalar | numeric vector
Critical values, returned as a numeric scalar or vector with length equal to the number of tests. vratiotest returns cValue when you supply the input y.
Critical values are for two-tail probabilities. StatTbl — Test summary table Test summary, returned as a table with variables for the outputs h, pValue, stat, and cValue, and with a row for each test. vratiotest returns StatTbl when you supply the input Tbl. StatTbl contains variables for the test settings specified by Period, Alpha, and IID. ratio — Variance ratios numeric scalar | numeric vector Variance ratios, returned as a numeric vector with length equal to the number of tests. Each ratio is the ratio of the following quantities: • The variance of the q-fold overlapping return horizon • q times the variance of the return series For a random walk, the ratios are asymptotically equal to one. For a mean-reverting series, the ratios are less than one. For a mean-averting series, the ratios are greater than one.
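To make the interpretation of the ratio concrete, the following base MATLAB sketch computes a simplified variance ratio for two simulated series. It is an illustration only: it uses plain sample variances of overlapping q-period returns, not the estimators that vratiotest applies, and the helper name vr and both series are hypothetical.

```matlab
% Simplified variance ratio: y is a series in levels, q is the period
vr = @(y,q) var(y(q+1:end) - y(1:end-q)) / (q*var(diff(y)));

rng(0);                              % for reproducibility
yRW = cumsum(randn(5000,1));         % simulated random walk
ratioRW = vr(yRW,2);                 % expected to be close to 1

rAlt = repmat([1; -1],500,1);        % returns that reverse at every step
yMR  = cumsum([0; rAlt]);            % strongly mean-reverting levels
ratioMR = vr(yMR,2);                 % every 2-period return is 0, so the ratio is 0
```

The second series is an extreme case chosen so that the result is exact: a ratio well below one signals mean reversion.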
More About
Variance Ratio Test
The variance ratio test assesses the null hypothesis that a univariate time series yt is a random walk. The null model is
yt = c + yt–1 + εt,
where c is a drift constant and εt is an uncorrelated innovations series with zero mean.
• When the innovations are not iid (the IID argument is false), the alternative is that εt is a correlated series.
• When the innovations are iid (IID is true), the alternative is that εt is either a dependent or not identically distributed series (for example, heteroscedastic).

The test statistic is based on a ratio of variance estimates of returns rt = yt − yt−1 and period q (Period argument) return horizons rt + … + rt−q+1. The variance ratio (ratio output) for a test is

$$\frac{\operatorname{Var}(r_t + \cdots + r_{t-q+1})}{q\,\operatorname{Var}(r_t)}.$$

The horizon overlaps to increase the efficiency of the estimator and power of the test.
Under either null hypothesis, an uncorrelated εt series implies that the period q variance is asymptotically equal to q times the period 1 variance. However, the variance of the ratio depends on the degree of heteroscedasticity, and, therefore, the variance of the ratio depends on the null hypothesis.
Rejection of the null hypothesis due to dependence of the innovations does not imply that the εt are correlated. If the innovations are dependent, nonlinear functions of εt can be correlated, regardless of whether εt are correlated. For example, if Cov(εt,εt−k) = 0 for all k ≠ 0, a k ≠ 0 can exist such that Cov(εt²,εt−k²) ≠ 0.
The test is two-tailed; therefore, the test rejects the random-walk null hypothesis when the test statistic is outside of the critical interval [−cValue,cValue]. Each tail outside of the critical interval has probability Alpha/2.
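The two-tailed decision rule can be sketched with base MATLAB functions alone, using the fact that the test statistic is asymptotically standard normal. The value of stat below is the one printed in the earlier example output; the use of erfc and erfcinv in place of toolbox distribution functions is a choice made for this sketch, not a description of how vratiotest computes its outputs.

```matlab
stat  = 0.3847;    % test statistic printed in the example output
alpha = 0.05;      % default significance level

% Two-tailed standard normal p-value: 2*(1 - Phi(|stat|)) = erfc(|stat|/sqrt(2))
pValue = erfc(abs(stat)/sqrt(2));

% Critical value that leaves probability alpha/2 in each tail
cValue = sqrt(2)*erfcinv(alpha);

% Reject when the statistic falls outside [-cValue, cValue]
h = abs(stat) > cValue;
```

Under these assumptions, pValue and cValue should agree with the printed 0.7004 and 1.9600 to the displayed precision.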
Tips
• The test finds the largest integer n such that nq ≤ T – 1, where q is the value of the Period argument and T is the sample size. Then, the test discards the final (T–1) – nq observations. To include these final observations, remove the initial (T–1) – nq observations from the input series before you run the test.
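The trimming arithmetic in this tip can be sketched as follows. The series y and the period q are placeholders; only the counting logic is the point.

```matlab
y = (1:100)';      % hypothetical series in levels, T = 100
q = 8;             % value of the Period argument

T = numel(y);
n = floor((T - 1)/q);            % largest n such that n*q <= T - 1
numDropped = (T - 1) - n*q;      % observations the test would discard at the end

% Remove the same number of initial observations instead,
% so that the final observations enter the test
yTrim = y(numDropped+1:end);
```

With T = 100 and q = 8, n is 12 and three observations are affected, so yTrim keeps the last 97 values.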
Version History Introduced in R2009b
References
[1] Campbell, J. Y., A. W. Lo, and A. C. MacKinlay. “Nonlinearities in Financial Data.” Chapter 12 in The Econometrics of Financial Markets. Princeton, NJ: Princeton University Press, 1997.
[2] Cecchetti, S. G., and P. S. Lam. “Variance-Ratio Tests: Small-Sample Properties with an Application to International Output Data.” Journal of Business and Economic Statistics. Vol. 12, 1994, pp. 177–186.
[3] Cochrane, J. “How Big is the Random Walk in GNP?” Journal of Political Economy. Vol. 96, 1988, pp. 893–920.
[4] Faust, J. “When Are Variance Ratio Tests for Serial Dependence Optimal?” Econometrica. Vol. 60, 1992, pp. 1215–1226.
[5] Lo, A. W., and A. C. MacKinlay. “Stock Market Prices Do Not Follow Random Walks: Evidence from a Simple Specification Test.” Review of Financial Studies. Vol. 1, 1988, pp. 41–66.
[6] Lo, A. W., and A. C. MacKinlay. “The Size and Power of the Variance Ratio Test.” Journal of Econometrics. Vol. 40, 1989, pp. 203–238.
[7] Lo, A. W., and A. C. MacKinlay. A Non-Random Walk Down Wall St. Princeton, NJ: Princeton University Press, 2001.
See Also adftest | pptest | kpsstest | lmctest Topics “Unit Root Nonstationarity” on page 3-33
waldtest Wald test of model specification
Syntax h = waldtest(r,R,EstCov) h = waldtest(r,R,EstCov,alpha) [h,pValue] = waldtest( ___ ) [h,pValue,stat,cValue] = waldtest( ___ )
Description h = waldtest(r,R,EstCov) returns a logical value (h) with the rejection decision from conducting a Wald test on page 12-2443 of model specification. waldtest constructs the test statistic using the restriction function and its Jacobian, and the value of the unrestricted model covariance estimator, all evaluated at the unrestricted parameter estimates (r, R, and EstCov, respectively). • If any input argument is a cell vector of length k > 1, then the other input arguments must be cell arrays of length k. waldtest(r,R,EstCov) treats each cell as a separate, independent test, and returns a vector of rejection decisions. • If any input argument is a row vector, then the software returns output arguments as row vectors. h = waldtest(r,R,EstCov,alpha) returns the rejection decision of the Wald test conducted at significance level alpha. [h,pValue] = waldtest( ___ ) returns the rejection decision and p-value (pValue) for the hypothesis test, using any of the input arguments in the previous syntaxes. [h,pValue,stat,cValue] = waldtest( ___ ) additionally returns the test statistic (stat) and critical value (cValue) for the hypothesis test.
Examples
Assess Model Specifications Using the Wald Test
Check for significant lag effects in a time series regression model.
Load the U.S. GDP data set.
load Data_GDP

Plot the GDP against time.
plot(dates,Data)
datetick
The series seems to increase exponentially. Transform the data using the natural logarithm.
logGDP = log(Data);

logGDP is increasing in time, so assume that there is a significant lag 1 effect. To use the Wald test to check if there is a significant lag 2 effect, you need the:
• Estimated coefficients of the unrestricted model
• Restriction function evaluated at the unrestricted model coefficient values
• Jacobian of the restriction function evaluated at the unrestricted model coefficient values
• Estimated, unrestricted parameter covariance matrix

The unrestricted model is

$$y_t = \beta_0 + \beta_1 y_{t-1} + \beta_2 y_{t-2} + \varepsilon_t.$$

Estimate the coefficients of the unrestricted model.
LagLGDP = lagmatrix(logGDP,1:2);
UMdl = fitlm(table(LagLGDP(:,1),LagLGDP(:,2),logGDP));

UMdl is a fitted LinearModel model. It contains, among other things, the fitted coefficients of the unrestricted model.
The restriction is β2 = 0. Therefore, the restriction function (r) and Jacobian (R) are:
• r = β2
• R = [0 0 1]
Specify r, R, and the estimated, unrestricted parameter covariance matrix.
r = UMdl.Coefficients.Estimate(3);
R = [0 0 1];
EstParamCov = UMdl.CoefficientCovariance;

Test for a significant lag 2 effect using the Wald test.
[h,pValue] = waldtest(r,R,EstParamCov)

h = logical
   1

pValue = 1.2521e-07
h = 1 indicates that the null, restricted hypothesis (β2 = 0) should be rejected in favor of the alternative, unrestricted hypothesis. pValue is quite small, which suggests that there is strong evidence for this result.
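For a single restriction of this kind, the Wald statistic reduces to the squared t-statistic of the restricted coefficient, because R picks out one diagonal element of the covariance matrix. The following base MATLAB sketch illustrates that identity with hypothetical coefficient estimates and a hypothetical covariance matrix; the numbers are not taken from the GDP example.

```matlab
% Hypothetical estimates and covariance matrix (illustrative values only)
beta  = [0.02; 1.30; -0.31];
Sigma = [ 0.0004  0.0001 -0.0001;
          0.0001  0.0036 -0.0030;
         -0.0001 -0.0030  0.0034];

r = beta(3);           % restriction function value for beta2 = 0
R = [0 0 1];           % Jacobian of the restriction function

W = r'*((R*Sigma*R')\r);           % Wald statistic
t = beta(3)/sqrt(Sigma(3,3));      % t-statistic of the restricted coefficient
```

Here W and t^2 coincide up to rounding error.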
Assess Conditional Heteroscedasticity Using the Wald Test
Test whether there are significant ARCH effects in a simulated response series using waldtest.
Suppose that the model for the simulated data is AR(1) with an ARCH(1) variance. Symbolically, the model is

$$y_t = 0.9y_{t-1} + \varepsilon_t,$$

where
• $\varepsilon_t = w_t\sqrt{h_t}$
• $h_t = 1 + 0.5\varepsilon_{t-1}^2$
• wt is Gaussian with mean 0 and variance 1.

Specify the model for the simulated data.
VarMdl = garch('ARCH',0.5,'Constant',1);
Mdl = arima('Constant',0,'Variance',VarMdl,'AR',0.9);

Mdl is a fully specified AR(1) model with an ARCH(1) variance.
Simulate presample and effective sample responses from Mdl.
T = 100;
rng(1); % For reproducibility
n = 2; % Number of presample observations required for the Jacobian
[y,epsilon,condVariance] = simulate(Mdl,T + n);
psI = 1:n; % Presample indices
esI = (n + 1):(T + n); % Estimation sample indices
epsilon is the random path of innovations from VarMdl. The software filters epsilon through Mdl to yield the random response path y.
Specify the unrestricted model assuming that the conditional mean model is

$$y_t = c + \phi_1 y_{t-1} + \varepsilon_t,$$

where $h_t = \alpha_0 + \alpha_1\varepsilon_{t-1}^2$. Fit the simulated data (y) to the unrestricted model using the presample observations.
UVarMdl = garch(0,1);
UMdl = arima('ARLags',1,'Variance',UVarMdl);
[UEstMdl,UEstParamCov] = estimate(UMdl,y(esI),'Y0',y(psI),...
    'E0',epsilon(psI),'V0',condVariance(psI),'Display','off');

UEstMdl is the fitted, unrestricted model, and UEstParamCov is the estimated parameter covariance of the unrestricted model parameters.
The null hypothesis is that α1 = 0, i.e., the restricted model is AR(1) with Gaussian innovations that have mean 0 and constant variance. Therefore, the restriction function is r(θ) = α1, where θ = [c, ϕ1, α0, α1]′. The components of the Wald test are:
• The restriction function evaluated at the unrestricted parameter estimates is r = α1.
• The Jacobian of r evaluated at the unrestricted model parameters is R = [0 0 0 1].
• The unrestricted model estimated parameter covariance matrix is UEstParamCov.
Specify r and R.
r = UEstMdl.Variance.ARCH{1};
R = [0, 0, 0, 1];

Test the null hypothesis that α1 = 0 at the 1% significance level using waldtest.
[h,pValue,stat,cValue] = waldtest(r,R,UEstParamCov,0.01)

h = logical
   0

pValue = 0.0549
stat = 3.6846
cValue = 6.6349
h = 0 indicates that the null, restricted model should not be rejected in favor of the alternative, unrestricted model. This result is consistent with the model for the simulated data.
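The printed p-value and critical value can be cross-checked against the chi-square distribution with one degree of freedom using base MATLAB only. The sketch below relies on the fact that a chi-square(1) variable is the square of a standard normal variable; it starts from the printed statistic and is not a description of the internals of waldtest.

```matlab
stat  = 3.6846;   % Wald statistic printed in the example output
alpha = 0.01;     % significance level used in the example

% With one restriction, P(chi2_1 > W) = P(|Z| > sqrt(W)) for standard normal Z
pValue = erfc(sqrt(stat/2));

% Critical value: square of the two-tailed standard normal critical value
cValue = 2*erfcinv(alpha)^2;

% Reject only if the statistic exceeds the critical value
h = stat > cValue;
```

Under these assumptions, the results should match the printed 0.0549 and 6.6349 to the displayed precision.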
Test Among Multiple Nested Model Specifications
Assess model specifications by testing down among multiple restricted models using simulated data. The true model is the ARMA(2,1)

$$y_t = 3 + 0.9y_{t-1} - 0.5y_{t-2} + \varepsilon_t + 0.7\varepsilon_{t-1},$$

where εt is Gaussian with mean 0 and variance 1.
Specify the true ARMA(2,1) model, and simulate 100 response values.
TrueMdl = arima('AR',{0.9,-0.5},'MA',0.7,...
    'Constant',3,'Variance',1);
T = 100;
rng(1); % For reproducibility
y = simulate(TrueMdl,T);

Specify the unrestricted model and the names of the candidate models for testing down.
UMdl = arima(2,0,2);
RMdlNames = {'ARMA(2,1)','AR(2)','ARMA(1,2)','ARMA(1,1)',...
    'AR(1)','MA(2)','MA(1)'};

UMdl is the unrestricted, ARMA(2,2) model. RMdlNames is a cell array of strings containing the names of the restricted models.
Fit the unrestricted model to the simulated data.
[UEstMdl,UEstParamCov] = estimate(UMdl,y,'Display','off');
UEstMdl is the fitted, unrestricted model, and UEstParamCov is the estimated parameter covariance matrix.
The unrestricted model has six parameters. To construct the restriction function and its Jacobian, you must know the order of the parameters in UEstParamCov. For this arima model, the order is [c, ϕ1, ϕ2, θ1, θ2, σ²].
Each candidate model corresponds to a restriction function. Put the restriction function vectors into separate cells of a cell vector.
rf1 = UEstMdl.MA{2};                              % ARMA(2,1)
rf2 = cell2mat(UEstMdl.MA)';                      % AR(2)
rf3 = UEstMdl.AR{2};                              % ARMA(1,2)
rf4 = [UEstMdl.AR{2};UEstMdl.MA{2}]';             % ARMA(1,1)
rf5 = [UEstMdl.AR{2};cell2mat(UEstMdl.MA)'];      % AR(1)
rf6 = cell2mat(UEstMdl.AR)';                      % MA(2)
rf7 = [cell2mat(UEstMdl.AR)';UEstMdl.MA{2}];      % MA(1)
r = {rf1;rf2;rf3;rf4;rf5;rf6;rf7};
r is a 7-by-1 cell vector of vectors corresponding to the restriction function for the candidate models.
Put the Jacobian of each restriction function into separate, corresponding cells of a cell vector. The order of the elements in the Jacobian must correspond to the order of the elements in UEstParamCov.
J1 = [0 0 0 0 1 0];                               % ARMA(2,1)
J2 = [0 0 0 1 0 0; 0 0 0 0 1 0];                  % AR(2)
J3 = [0 1 0 0 0 0];                               % ARMA(1,2)
J4 = [0 1 0 0 0 0; 0 0 0 0 1 0];                  % ARMA(1,1)
J5 = [0 1 0 0 0 0; 0 0 0 1 0 0; 0 0 0 0 1 0];     % AR(1)
J6 = [1 0 0 0 0 0; 0 1 0 0 0 0];                  % MA(2)
J7 = [1 0 0 0 0 0; 0 1 0 0 0 0; 0 0 0 0 1 0];     % MA(1)
R = {J1;J2;J3;J4;J5;J6;J7};
R is a 7-by-1 cell vector of matrices corresponding to the restriction function Jacobians for the candidate models.
Put the estimated parameter covariance matrix in each cell of a 7-by-1 cell vector.
EstCov = cell(7,1); % Preallocate
for j = 1:length(EstCov)
    EstCov{j} = UEstParamCov;
end
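As a side note on the replication step, the same cell vector can be built without a loop by using repmat on a one-element cell. The sketch below compares the two approaches on a placeholder matrix C, which stands in for UEstParamCov; it is an alternative formulation, not the approach the example itself uses.

```matlab
C = eye(6);                 % hypothetical stand-in for UEstParamCov
k = 7;                      % number of tests

% Loop-based replication, as in the example
EstCovLoop = cell(k,1);
for j = 1:k
    EstCovLoop{j} = C;
end

% Equivalent one-line replication of a one-element cell
EstCovRep = repmat({C},k,1);

same = isequal(EstCovLoop,EstCovRep);
```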
Apply the Wald test at a 1% significance level to find the appropriate, restricted model specifications.
alpha = .01;
h = waldtest(r,R,EstCov,alpha);
RestrictedModels = RMdlNames(~h)

RestrictedModels = 1x5 cell
    {'ARMA(2,1)'}    {'ARMA(1,2)'}    {'ARMA(1,1)'}    {'MA(2)'}    {'MA(1)'}
RestrictedModels lists the most appropriate restricted models. You can test down again, but use ARMA(2,1) as the unrestricted model. In this case, you must remove MA(2) from the possible restricted models.
Conduct a Wald Test Using Nonlinear Restriction Functions
Test whether the parameters of a nested model have a nonlinear relationship.
Load the Deutschmark/British Pound bilateral spot exchange rate data set.
load Data_MarkPound

The data set (Data) contains a time series of prices.
Convert the prices to returns, and plot the return series.
returns = price2ret(Data);
figure
plot(returns)
axis tight
ylabel('Returns')
xlabel('Days, 02Jan1984 - 31Dec1991')
title('{\bf Deutschmark/British Pound Bilateral Spot Exchange Rate}')
The returns series shows signs of heteroscedasticity. Suppose that a GARCH(1,1) model is an appropriate model for the data.
Fit a GARCH(1,1) model to the data including a constant.
Mdl = garch(1,1);
[EstMdl,EstParamCov] = estimate(Mdl,returns);

    GARCH(1,1) Conditional Variance Model (Gaussian Distribution):

                  Value       StandardError    TStatistic      PValue
                __________    _____________    __________    __________
    Constant    1.0538e-06     3.5053e-07        3.0063       0.0026443
    GARCH{1}       0.80653       0.012913        62.458               0
    ARCH{1}        0.15439       0.011578        13.335      1.4463e-40

g1 = EstMdl.GARCH{1};
a1 = EstMdl.ARCH{1};
g1 is the estimated GARCH effect, and a1 is the estimated ARCH effect.
The following might represent relationships between the GARCH and ARCH coefficients:
• γ1α1 = 1
• γ1 + α1 = 1
where γ1 is the GARCH effect and α1 is the ARCH effect. Specify these relationships as the restriction function r(θ) = 0, evaluated at the unrestricted model parameter estimates. This specification defines a nested, restricted model. r = [g1*a1; g1+a1] - 1;
Specify the Jacobian of the restriction function vector. R = [0, a1, g1;0, 1, 1];
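One way to gain confidence in a hand-derived Jacobian is to compare it with a finite-difference approximation. The following base MATLAB sketch does this for the two restrictions above, using the estimates printed in the model display and assuming the parameter order [Constant, GARCH{1}, ARCH{1}] shown there. The helper rfun and the step size are choices made for this sketch.

```matlab
g1 = 0.80653;   % GARCH{1} estimate from the display above
a1 = 0.15439;   % ARCH{1} estimate from the display above
theta = [1.0538e-06; g1; a1];                  % assumed order: Constant, GARCH{1}, ARCH{1}

rfun = @(th) [th(2)*th(3); th(2)+th(3)] - 1;   % restriction function r(theta)
R = [0, a1, g1; 0, 1, 1];                      % analytic Jacobian from the example

% Central finite differences, one parameter at a time
step = 1e-6;
Rnum = zeros(2,3);
for j = 1:3
    e = zeros(3,1);
    e(j) = step;
    Rnum(:,j) = (rfun(theta + e) - rfun(theta - e))/(2*step);
end

maxErr = max(abs(Rnum(:) - R(:)));             % should be near zero
```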
Conduct a Wald test to assess whether there is sufficient evidence to reject the restricted model.
[h,pValue,stat,cValue] = waldtest(r,R,EstParamCov)

h = logical
   1

pValue = 0
stat = 1.4589e+04
cValue = 5.9915
h = 1 indicates that there is sufficient evidence to reject the restricted model in favor of the unrestricted model. pValue = 0 indicates that the evidence for rejecting the restricted model is strong.
Input Arguments
r — Restriction functions
scalar | vector | cell vector of scalars or vectors
Restriction functions corresponding to restricted models, specified as a scalar, vector, or cell vector of scalars or vectors.
• If r is a q-vector or a singleton cell array containing a q-vector, then the software conducts one Wald test. q must be less than the number of unrestricted model parameters.
• If r is a cell vector of length k > 1, and cell j contains a qj-vector, j = 1,...,k, then the software conducts k independent Wald tests. Each qj must be less than the number of unrestricted model parameters.
Data Types: double | cell

R — Restriction function Jacobians
row vector | matrix | cell vector of row vectors or matrices
Restriction function Jacobians, specified as a row vector, matrix, or cell vector of row vectors or matrices.
• Suppose r1,...,rq are the q restriction functions, and the unrestricted model parameters are θ1,...,θp. Then, the restriction function Jacobian is

$$R = \begin{pmatrix} \dfrac{\partial r_1}{\partial \theta_1} & \cdots & \dfrac{\partial r_1}{\partial \theta_p} \\ \vdots & \ddots & \vdots \\ \dfrac{\partial r_q}{\partial \theta_1} & \cdots & \dfrac{\partial r_q}{\partial \theta_p} \end{pmatrix}.$$
• If R is a q-by-p matrix or a singleton cell array containing a q-by-p matrix, then the software conducts one Wald test. q must be less than p, which is the number of unrestricted model parameters. • If R is a cell vector of length k > 1, and cell j contains a qj-by-pj matrix, j = 1,...,k, then the software conducts k independent Wald tests. Each qj must be less than pj, which is the number of unrestricted parameters in model j. Data Types: double | cell EstCov — Unrestricted model parameter covariance estimate matrix | cell vector of matrices Unrestricted model parameter covariance estimates, specified as a matrix or cell vector of matrices. • If EstCov is a p-by-p matrix or a singleton cell array containing a p-by-p matrix, then the software conducts one Wald test. p is the number of unrestricted model parameters. • If EstCov is a cell vector of length k > 1, and cell j contains a pj-by-pj matrix, j = 1,...,k, then the software conducts k independent Wald tests. Each pj is the number of unrestricted parameters in model j. Data Types: double | cell alpha — Nominal significance levels 0.05 (default) | scalar | vector Nominal significance levels for the hypothesis tests, specified as a scalar or vector. Each element of alpha must be greater than 0 and less than 1. When conducting k > 1 tests, • If alpha is a scalar, then the software expands it to a k-by-1 vector. • If alpha is a vector, then it must have length k. Data Types: double
Output Arguments
h — Test rejection decisions
logical | vector of logicals
Test rejection decisions, returned as a logical value or vector of logical values with a length equal to the number of tests that the software conducts.
• h = 1 indicates rejection of the null, restricted model in favor of the alternative, unrestricted model.
• h = 0 indicates failure to reject the null, restricted model.
pValue — Test statistic p-values
scalar | vector
Test statistic p-values, returned as a scalar or vector with a length equal to the number of tests that the software conducts.
stat — Test statistics
scalar | vector
Test statistics, returned as a scalar or vector with a length equal to the number of tests that the software conducts.
cValue — Critical values
scalar | vector
Critical values determined by alpha, returned as a scalar or vector with a length equal to the number of tests that the software conducts.
More About
Wald Test
The Wald test compares specifications of nested models by assessing the significance of q parameter restrictions to an extended model with p unrestricted parameters. The test statistic is
$$
W = r' \left( R \, \Sigma_{\hat{\theta}} \, R' \right)^{-1} r,
$$
where
• r is the restriction function that specifies restrictions of the form r(θ) = 0 on parameters θ in the unrestricted model, evaluated at the unrestricted model parameter estimates. In other words, r maps the p-dimensional parameter space to the q-dimensional restriction space. In practice, r is a q-by-1 vector, where q < p. Usually, r = θ̂ − θ0, where θ̂ holds the unrestricted model estimates of the restricted parameters and θ0 holds the values of those parameters under the null hypothesis.
• R is the restriction function Jacobian evaluated at the unrestricted model parameter estimates.
• Σ_θ̂ is the unrestricted model parameter covariance estimator evaluated at the unrestricted model parameter estimates.
• W has an asymptotic chi-square distribution with q degrees of freedom.
When W exceeds a critical value in its asymptotic distribution, the test rejects the null, restricted hypothesis in favor of the alternative, unrestricted hypothesis. The nominal significance level (α) determines the critical value.
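The definition above can be sketched directly in code. The following is an illustrative hand computation of W, its critical value, and its p-value, not the internal implementation of waldtest. The estimates, covariance matrix, and null values are hypothetical, and chi2inv and chi2cdf assume that Statistics and Machine Learning Toolbox is available.

```matlab
% Hypothetical inputs: q = 2 restrictions on p = 3 parameters.
thetaHat = [0.5; 1.2; -0.3];
EstCov   = [0.010 0.002 0.000;
            0.002 0.040 0.001;
            0.000 0.001 0.020];
theta0 = [1; 0];                    % null values for theta2 and theta3

r = thetaHat(2:3) - theta0;         % r = thetaHat - theta0 (q-by-1)
R = [0 1 0; 0 0 1];                 % Jacobian selects theta2 and theta3 (q-by-p)
q = numel(r);
alpha = 0.05;

W      = r' * ((R*EstCov*R') \ r);  % W = r'(R*Sigma*R')^(-1)*r
cValue = chi2inv(1 - alpha, q);     % critical value from the chi-square(q) distribution
pValue = 1 - chi2cdf(W, q);         % upper-tail probability of W
h      = W > cValue;                % reject the restrictions when W exceeds cValue
```

Using the backslash operator instead of inv avoids forming the inverse of R*EstCov*R' explicitly.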
Note Wald tests depend on the algebraic form of the restrictions. For example, you can express the restriction ab = 1 as a – 1/b = 0, or b – 1/a = 0, or ab – 1 = 0. Each formulation leads to different test statistics.
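To illustrate the note, the following sketch evaluates the Wald statistic for the three formulations of ab = 1 at the same hypothetical estimates. The values of a, b, and EstCov are invented for illustration, and the helper wald is a local anonymous function, not a toolbox function.

```matlab
% Hypothetical estimates of (a, b) and their covariance estimate.
a = 2.0;
b = 0.6;
EstCov = [0.04 0.00;
          0.00 0.01];

% Three algebraically equivalent forms of the restriction a*b = 1,
% each with its Jacobian with respect to [a b].
r1 = a - 1/b;    R1 = [1, 1/b^2];
r2 = b - 1/a;    R2 = [1/a^2, 1];
r3 = a*b - 1;    R3 = [b, a];

% Wald statistic for a given restriction value and Jacobian.
wald = @(r,R) r' * ((R*EstCov*R') \ r);

W = [wald(r1,R1), wald(r2,R2), wald(r3,R3)];
disp(W)
```

The three entries of W differ even though the restrictions describe the same hypothesis, so a rejection decision near the critical value can depend on which form you choose.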
Tips
• Estimate unrestricted univariate linear time series models, such as arima or garch, or time series regression models (regARIMA) using estimate. Estimate unrestricted multivariate linear time series models, such as varm or vecm, using estimate. estimate returns parameter estimates and their covariance estimates, which you can process and use as inputs to waldtest.
• If you cannot easily compute restricted parameter estimates, then use waldtest. By comparison:
  • lratiotest requires both restricted and unrestricted parameter estimates.
  • lmtest requires restricted parameter estimates.
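The workflow in the first tip might look like the following sketch, which fits an unrestricted AR(2) model to simulated data and tests whether the second autoregressive coefficient is zero. The data-generating model and seed are arbitrary choices for illustration. The sketch assumes that the rows and columns of the covariance output follow the order constant, AR{1}, AR{2}, variance; check the size and ordering of the covariance matrix for your own model before building R.

```matlab
rng(1);                                   % for reproducibility

% Simulate 500 observations from a hypothetical AR(1) process.
TrueMdl = arima('Constant',0.1,'AR',{0.5},'Variance',1);
y = simulate(TrueMdl,500);

% Estimate the unrestricted AR(2) model and its parameter covariance.
Mdl = arima(2,0,0);
[EstMdl,EstParamCov] = estimate(Mdl,y,'Display','off');

% Restriction AR{2} = 0.
% Assumed parameter order in EstParamCov: constant, AR{1}, AR{2}, variance.
r = EstMdl.AR{2};
R = [0 0 1 0];

[h,pValue] = waldtest(r,R,EstParamCov);
```

Only the unrestricted model is estimated here, which is the situation in which the second tip recommends waldtest over lratiotest and lmtest.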
Algorithms
• waldtest performs multiple, independent tests when the restriction function vector, its Jacobian, and the unrestricted model parameter covariance matrix (r, R, and EstCov, respectively) are equal-length cell vectors.
  • If EstCov is the same for all tests, but r varies, then waldtest “tests down” against multiple restricted models.
  • If EstCov varies among tests, but r does not, then waldtest “tests up” against multiple unrestricted models.
  • Otherwise, waldtest compares model specifications pair-wise.
• alpha is nominal in that it specifies a rejection probability in the asymptotic distribution. The actual rejection probability is generally greater than the nominal significance.
• The Wald test rejection error is generally greater than the likelihood ratio and Lagrange multiplier test rejection errors.
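As a sketch of the "tests down" case described above, the following call passes equal-length cell vectors so that one unrestricted model is compared against two restricted models. The estimates and covariance matrix are hypothetical values chosen for illustration.

```matlab
% Hypothetical unrestricted estimates (p = 3) and covariance estimate.
thetaHat = [0.5; 1.2; -0.3];
EstCov   = [0.010 0.002 0.000;
            0.002 0.040 0.001;
            0.000 0.001 0.020];

% Restricted model 1: theta3 = 0 (q = 1).
r1 = thetaHat(3);
R1 = [0 0 1];

% Restricted model 2: theta2 = 0 and theta3 = 0 (q = 2).
r2 = thetaHat(2:3);
R2 = [0 1 0; 0 0 1];

% Same EstCov in every cell, different r: test down against both models.
[h,pValue,stat,cValue] = waldtest({r1,r2},{R1,R2},{EstCov,EstCov},0.05);
```

Each output has one element per test, and the scalar alpha applies to both tests. Because the tests are independent, no adjustment for multiple comparisons is applied.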
Version History Introduced in R2009a
References
[1] Davidson, R. and J. G. MacKinnon. Econometric Theory and Methods. Oxford, UK: Oxford University Press, 2004.
[2] Godfrey, L. G. Misspecification Tests in Econometrics. Cambridge, UK: Cambridge University Press, 1997.
[3] Greene, W. H. Econometric Analysis. 6th ed. Upper Saddle River, NJ: Pearson Prentice Hall, 2008.
[4] Hamilton, J. D. Time Series Analysis. Princeton, NJ: Princeton University Press, 1994.
See Also
Objects
arima | varm | garch | regARIMA | vecm
Functions
lmtest | estimate
Topics
“Conduct Wald Test” on page 3-65
“Classical Model Misspecification Tests” on page 3-70
“Time Series Regression IX: Lag Order Selection” on page 5-261
A
Appendices
Data Sets and Examples
Econometrics Toolbox includes historical data sets and featured examples. Data sets include a diverse collection of macroeconomic time series for use with model estimation and testing. Featured examples use the data sets to demonstrate common workflows in econometric analysis and explore connections among related toolbox functions.
Generally, each data set is a MAT file containing the following variables:
• Data — A matrix of data. Each column is a variable (time series). Each row contains associated observations of the variables.
• DataTable — A table of data. DataTable contains the same observations and has the same dimensionality as Data.
• DataTimeTable — A timetable of data. DataTimeTable contains the same observations and has the same dimensionality as Data.
• Description — Textual data set description, including data set variable definitions and references.
• series — Vector of descriptive variable names.
To load variables of a data set into the workspace, enter the following command at the command line, where DataSetName is one of the MAT files in the table.
load DataSetName
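As a small sketch of the loading step, the following code reads one of the listed data sets into a structure and lists the variables it contains. Data_Canada is used only as an example, and the isfield guards reflect that the variable list above is described as what data sets generally contain, so a particular MAT file might differ.

```matlab
% Load the data set into a structure instead of the base workspace.
S = load('Data_Canada');

% List the variables stored in the MAT file.
disp(fieldnames(S))

% Show the size of the numeric data, if present.
if isfield(S,'Data')
    fprintf('Data has %d observations of %d series.\n', size(S.Data,1), size(S.Data,2));
end

% Display the textual description, if present.
if isfield(S,'Description')
    disp(S.Description)
end
```

Loading into a structure keeps the data set's variable names from overwriting variables that already exist in the workspace.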
| Data Set Name | Description |
| --- | --- |
| Data_Accidental | Monthly number of accidental deaths in the U.S., 1973–1978 |
| Data_Airline | Monthly number of international airline passengers, 1949–1960 |
| Data_Canada | Canadian inflation and interest rates, 1954–1994 |
| Data_Consumption | US food consumption, 1927–1962 |
| Data_CreditDefaults | Investment-grade corporate bond defaults and four predictors, 1984–2004 |
| Data_Danish | Danish stock returns, bond yields, 1922–1999 |
| Data_DieboldLi | U.S. Treasury unsmoothed Fama-Bliss zero-coupon yields and macroeconomic factors, 1972–2000 |
| Data_ElectricityPrices | Simulated daily electricity spot prices, 2010–2013 |
| Data_EquityIdx | U.S. equity indices, 1990–2001 |
| Data_FXRates | Currency exchange rates, 1979–1998 |
| Data_GDP | U.S. Gross Domestic Product, 1947–2005 |
| Data_GlobalIdx1 | Global large-cap equity indices, 1993–2003 |
| Data_GNP | U.S. Gross National Product, 1947–2005 |
| Data_Income1 | Simulated data on income and education |
| Data_Income2 | Average annual earnings by educational attainment in eight workforce age categories |
| Data_JAustralian | Johansen's Australian data, 1972–1991 |
| Data_JDanish | Johansen's Danish data, 1974–1987 |
| Data_MarkPound | Deutschmark/British Pound foreign-exchange rate, 1984–1991 |
| Data_NelsonPlosser | Macroeconomic series of Nelson and Plosser, 1860–1970 |
| Data_Overshort | Daily overshorts from an underground gasoline tank in Colorado, 57 consecutive days |
| Data_PowerConsumption | Canadian electrical power consumption and GDP, 1960–2009 |
| Data_Recessions | U.S. recession start and end dates, 1857–2022 |
| Data_SchwertMacro | Macroeconomic series of Schwert, 1947–1985 |
| Data_SchwertStock | Indices of U.S. stock prices, 1871–2008 |
| Data_TBill | Three-month U.S. treasury bill secondary market rates, 1947–2005 |
| Data_USEconModel | U.S. macroeconomic series, 1947–2009 |
| Data_USEconVECModel | U.S. macroeconomic series 1957–2016 and projections for the following 10 years from the Congressional Budget Office |
To open the script of an Econometrics Toolbox featured example, enter the following command at the command line, where ExampleName is an example name in the table.
openExample('econ/ExampleName')
| Example Name | Title | Description |
| --- | --- | --- |
| AnalyzeLinearizedDSGEModelsExample | “Analyze Linearized DSGE Models” on page 11-178 | Analyze the dynamic stochastic general equilibrium (DSGE) model in [69] by using Bayesian state-space model tools. |
| Demo_ClassicalTests | “Classical Model Misspecification Tests” on page 3-70 | Perform classical model misspecification tests. |
| Demo_DieboldLiModel | “Apply State-Space Methodology to Analyze Diebold-Li Yield Curve Model” on page 11-148 | Analyze the popular Diebold-Li yields-only and yields-macro models of monthly yield-curve time series derived from U.S. Treasury bills and bonds by using state-space models and the Kalman filter. |
| Demo_HPFilter | “Use Hodrick-Prescott Filter to Reproduce Original Result” on page 2-30 | Use the Hodrick-Prescott filter to reproduce their original result. |
| Demo_RiskFHS | “Using Bootstrapping and Filtered Historical Simulation to Evaluate Market Risk” on page 8-102 | Use bootstrapping and filtered historical simulation to evaluate market risk. |
| Demo_RiskEVT | “Using Extreme Value Theory and Copulas to Evaluate Market Risk” on page 8-114 | Use extreme value theory and copulas to evaluate market risk. |
| Demo_TSReg1 | “Time Series Regression I: Linear Models” on page 5-173 | Introduce basic assumptions behind multiple linear regression models. |
| Demo_TSReg2 | “Time Series Regression II: Collinearity and Estimator Variance” on page 5-180 | Detect correlation among predictors and accommodate problems of large estimator variance. |
| Demo_TSReg3 | “Time Series Regression III: Influential Observations” on page 5-190 | Detect influential observations in time series data and accommodate their effect on multiple linear regression models. |
| Demo_TSReg4 | “Time Series Regression IV: Spurious Regression” on page 5-197 | Investigate trending variables, spurious regression, and methods of accommodation in multiple linear regression models. |
| Demo_TSReg5 | “Time Series Regression V: Predictor Selection” on page 5-209 | Select a parsimonious set of predictors with high statistical significance for multiple linear regression models. |
| Demo_TSReg6 | “Time Series Regression VI: Residual Diagnostics” on page 5-220 | Evaluate model assumptions and investigate respecification opportunities by examining the series of residuals. |
| Demo_TSReg7 | “Time Series Regression VII: Forecasting” on page 5-231 | Present the basic setup for producing conditional and unconditional forecasts from multiple linear regression models. |
| Demo_TSReg8 | “Time Series Regression VIII: Lagged Variables and Estimator Bias” on page 5-240 | Examine how lagged predictors affect least-squares estimation of multiple linear regression models. |
| Demo_TSReg9 | “Time Series Regression IX: Lag Order Selection” on page 5-261 | Illustrate predictor history selection for multiple linear regression models. |
| Demo_TSReg10 | “Time Series Regression X: Generalized Least Squares and HAC Estimators” on page 5-279 | Estimate multiple linear regression models of time series data in the presence of heteroscedastic or autocorrelated innovations. |
| Demo_USEconModel | “Model the United States Economy” on page 9-151 | Model the U.S. economy using a VEC model as a linear alternative to the Smets-Wouters DSGE macroeconomic model. |
| ModelAndSimulateElectricitySpotPricesUsingSkewNormalExample | “Model and Simulate Electricity Spot Prices Using the Skew-Normal Distribution” on page 7-189 | Simulate the future behavior of electricity spot prices from a time series model fitted to historical data, and use the skew normal distribution to model the innovations process. |
See Also
More About
• “Create Tables and Assign Data to Them”
• “Resample and Aggregate Data in Timetable”
• “Combine Timetables and Synchronize Their Data”
• “Clean Timetable with Missing, Duplicate, or Nonuniform Times”