148 75 27MB
English Pages 4522 Year 2023
Econometrics Toolbox™ User's Guide
R2023b
How to Contact MathWorks Latest news:
www.mathworks.com
Sales and services:
www.mathworks.com/sales_and_services
User community:
www.mathworks.com/matlabcentral
Technical support:
www.mathworks.com/support/contact_us
Phone:
508-647-7000
The MathWorks, Inc. 1 Apple Hill Drive Natick, MA 01760-2098 Econometrics Toolbox™ User's Guide © COPYRIGHT 1999–2023 by The MathWorks, Inc. The software described in this document is furnished under a license agreement. The software may be used or copied only under the terms of the license agreement. No part of this manual may be photocopied or reproduced in any form without prior written consent from The MathWorks, Inc. FEDERAL ACQUISITION: This provision applies to all acquisitions of the Program and Documentation by, for, or through the federal government of the United States. By accepting delivery of the Program or Documentation, the government hereby agrees that this software or documentation qualifies as commercial computer software or commercial computer software documentation as such terms are used or defined in FAR 12.212, DFARS Part 227.72, and DFARS 252.227-7014. Accordingly, the terms and conditions of this Agreement and only those rights specified in this Agreement, shall pertain to and govern the use, modification, reproduction, release, performance, display, and disclosure of the Program and Documentation by the federal government (or other entity acquiring for or through the federal government) and shall supersede any conflicting contractual terms or conditions. If this License fails to meet the government's needs or is inconsistent in any respect with federal procurement law, the government agrees to return the Program and Documentation, unused, to The MathWorks, Inc.
Trademarks
MATLAB and Simulink are registered trademarks of The MathWorks, Inc. See www.mathworks.com/trademarks for a list of additional trademarks. Other product or brand names may be trademarks or registered trademarks of their respective holders. Patents
MathWorks products are protected by one or more U.S. patents. Please see www.mathworks.com/patents for more information.
Revision History
October 2008 March 2009 September 2009 March 2010 September 2010 April 2011 September 2011 March 2012 September 2012 March 2013 September 2013 March 2014 October 2014 March 2015 September 2015 March 2016 September 2016 March 2017 September 2017 March 2018 September 2018 March 2019 September 2019 March 2020 September 2020 March 2021 September 2021 March 2022 September 2022 March 2023 September 2023
Online only Online only Online only Online only Online only Online only Online only Online only Online only Online only Online only Online Only Online Only Online Only Online Only Online Only Online Only Online Only Online Only Online Only Online Only Online Only Online Only Online Only Online Only Online Only Online Only Online Only Online Only Online Only Online Only
Version 1.0 (Release 2008b) Revised for Version 1.1 (Release 2009a) Revised for Version 1.2 (Release 2009b) Revised for Version 1.3 (Release 2010a) Revised for Version 1.4 (Release 2010b) Revised for Version 2.0 (Release 2011a) Revised for Version 2.0.1 (Release 2011b) Revised for Version 2.1 (Release 2012a) Revised for Version 2.2 (Release 2012b) Revised for Version 2.3 (Release 2013a) Revised for Version 2.4 (Release 2013b) Revised for Version 3.0 (Release 2014a) Revised for Version 3.1 (Release 2014b) Revised for Version 3.2 (Release 2015a) Revised for Version 3.3 (Release 2015b) Revised for Version 3.4 (Release 2016a) Revised for Version 3.5 (Release 2016b) Revised for Version 4.0 (Release 2017a) Revised for Version 4.1 (Release 2017b) Revised for Version 5.0 (Release 2018a) Revised for Version 5.1 (Release 2018b) Revised for Version 5.2 (Release 2019a) Revised for Version 5.3 (Release 2019b) Revised for Version 5.4 (Release 2020a) Revised for Version 5.5 (Release 2020b) Revised for Version 5.6 (Release 2021a) Revised for Version 5.7 (Release 2021b) Revised for Version 6.0 (Release 2022a) Revised for Version 6.1 (Release 2022b) Revised for Version 6.2 (Release 2023a) Revised for Version 23.2 (R2023b)
Contents
1
2
Getting Started Econometrics Toolbox Product Description . . . . . . . . . . . . . . . . . . . . . . . .
1-2
Econometric Modeling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Model Selection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Econometrics Toolbox Features . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
1-3 1-3 1-3
Represent Time Series Models Using Econometrics Toolbox Objects . . . . Model Objects . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Model Properties . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Create Model Object . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Retrieve Model Properties . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Modify Model Properties . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Object Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
1-7 1-7 1-10 1-11 1-15 1-16 1-17
Stochastic Process Characteristics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . What Is a Stochastic Process? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Stationary Processes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Linear Time Series Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Unit Root Process . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Lag Operator Notation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Characteristic Equation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
1-18 1-18 1-19 1-19 1-20 1-21 1-22
Bibliography . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
1-24
Data Preprocessing Data Transformations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Why Transform? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Common Data Transformations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
2-2 2-2 2-2
Trend-Stationary vs. Difference-Stationary Processes . . . . . . . . . . . . . . . . Nonstationary Processes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Trend Stationary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Difference Stationary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
2-6 2-6 2-7 2-7
Specify Lag Operator Polynomials . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Lag Operator Polynomial of Coefficients . . . . . . . . . . . . . . . . . . . . . . . . . . Difference Lag Operator Polynomials . . . . . . . . . . . . . . . . . . . . . . . . . . .
2-9 2-9 2-11
Nonseasonal Differencing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
2-13
v
3
Nonseasonal and Seasonal Differencing . . . . . . . . . . . . . . . . . . . . . . . . . .
2-16
Time Series Decomposition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
2-19
Moving Average Filter . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
2-21
Moving Average Trend Estimation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
2-22
Parametric Trend Estimation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
2-24
Use Hodrick-Prescott Filter to Reproduce Original Result . . . . . . . . . . .
2-29
Compare One-Sided and Two-Sided Hodrick-Prescott Filter Results . . .
2-34
Choose Time Series Filter for Business Cycle Analysis . . . . . . . . . . . . . .
2-40
Seasonal Filters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . What Is a Seasonal Filter? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Stable Seasonal Filter . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Sn × m seasonal filter . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
2-62 2-62 2-62 2-63
Seasonal Adjustment . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . What Is Seasonal Adjustment? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Deseasonalized Series . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Seasonal Adjustment Process . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
2-65 2-65 2-65 2-65
Seasonal Adjustment Using a Stable Seasonal Filter . . . . . . . . . . . . . . . .
2-67
Seasonal Adjustment Using S(n,m) Seasonal Filters . . . . . . . . . . . . . . . .
2-72
Model Selection Select ARIMA Model for Time Series Using Box-Jenkins Methodology . .
vi
Contents
3-2
Autocorrelation and Partial Autocorrelation . . . . . . . . . . . . . . . . . . . . . . What Are Autocorrelation and Partial Autocorrelation? . . . . . . . . . . . . . . Theoretical ACF and PACF . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Sample ACF and PACF . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Compute Sample ACF and PACF in MATLAB® . . . . . . . . . . . . . . . . . . . .
3-10 3-10 3-10 3-10 3-11
Ljung-Box Q-Test . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
3-17
Detect Autocorrelation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Compute Sample ACF and PACF . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Conduct the Ljung-Box Q-Test . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
3-19 3-19 3-21
Engle’s ARCH Test . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
3-25
Detect ARCH Effects . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Test Autocorrelation of Squared Residuals . . . . . . . . . . . . . . . . . . . . . . .
3-27 3-27
Conduct Engle's ARCH Test . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
3-29
Unit Root Nonstationarity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . What Is a Unit Root Test? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Modeling Unit Root Processes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Available Tests . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Testing for Unit Roots . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
3-32 3-32 3-32 3-36 3-37
Unit Root Tests . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Test Simulated Data for a Unit Root . . . . . . . . . . . . . . . . . . . . . . . . . . . . Test Time Series Data for Unit Root . . . . . . . . . . . . . . . . . . . . . . . . . . . . Test Stock Data for Random Walk . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
3-40 3-40 3-44 3-47
Assess Stationarity of a Time Series . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
3-50
Information Criteria for Model Selection . . . . . . . . . . . . . . . . . . . . . . . . . Compute Information Criteria Using aicbic . . . . . . . . . . . . . . . . . . . . . . .
3-53 3-53
Model Comparison Tests . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Available Tests . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Likelihood Ratio Test . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Lagrange Multiplier Test . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Wald Test . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Covariance Matrix Estimation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
3-57 3-57 3-59 3-59 3-59 3-60
Conduct Lagrange Multiplier Test . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
3-61
Conduct Wald Test . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
3-64
Compare GARCH Models Using Likelihood Ratio Test . . . . . . . . . . . . . . .
3-66
Classical Model Misspecification Tests . . . . . . . . . . . . . . . . . . . . . . . . . . .
3-69
Check Fit of Multiplicative ARIMA Model . . . . . . . . . . . . . . . . . . . . . . . . .
3-80
Goodness of Fit . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
3-85
Residual Diagnostics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Check Residuals for Normality . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Check Residuals for Autocorrelation . . . . . . . . . . . . . . . . . . . . . . . . . . . . Check Residuals for Conditional Heteroscedasticity . . . . . . . . . . . . . . . .
3-86 3-86 3-86 3-86
Assess Predictive Performance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
3-88
Nonspherical Models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . What Are Nonspherical Models? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
3-89 3-89
Plot a Confidence Band Using HAC Estimates . . . . . . . . . . . . . . . . . . . . .
3-90
Change the Bandwidth of a HAC Estimator . . . . . . . . . . . . . . . . . . . . . . .
3-97
Check Model Assumptions for Chow Test . . . . . . . . . . . . . . . . . . . . . . . .
3-103
Power of the Chow Test . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
3-109
vii
4
viii
Contents
Econometric Modeler Analyze Time Series Data Using Econometric Modeler . . . . . . . . . . . . . . . Prepare Data for Econometric Modeler App . . . . . . . . . . . . . . . . . . . . . . . Import Time Series Variables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Perform Exploratory Data Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Fitting Models to Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Conducting Goodness-of-Fit Checks . . . . . . . . . . . . . . . . . . . . . . . . . . . . Finding Model with Best In-Sample Fit . . . . . . . . . . . . . . . . . . . . . . . . . . Export Session Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
4-2 4-3 4-4 4-6 4-15 4-30 4-36 4-38
Specifying Univariate Lag Operator Polynomials Interactively . . . . . . . . Specify Lag Structure Using Lag Order Tab . . . . . . . . . . . . . . . . . . . . . . Specify Lag Structure Using Lag Vector Tab . . . . . . . . . . . . . . . . . . . . . .
4-44 4-45 4-47
Specifying Multivariate Lag Operator Polynomials and Coefficient Constraints Interactively . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Specify Lag Structure Using Lag Order Tab . . . . . . . . . . . . . . . . . . . . . . Specify Lag Structure Using Lag Vector Tab . . . . . . . . . . . . . . . . . . . . . . Specify Coefficient Matrix Equality Constraints for Estimation . . . . . . . .
4-50 4-51 4-52 4-54
Prepare Time Series Data for Econometric Modeler App . . . . . . . . . . . . Prepare Table of Multivariate Data for Import . . . . . . . . . . . . . . . . . . . . . Prepare Numeric Vector for Import . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
4-59 4-59 4-60
Import Time Series Data into Econometric Modeler App . . . . . . . . . . . . Import Data from MATLAB Workspace . . . . . . . . . . . . . . . . . . . . . . . . . . Import Data from MAT-File . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
4-62 4-62 4-63
Plot Time Series Data Using Econometric Modeler App . . . . . . . . . . . . . Plot Univariate Time Series Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Plot Multivariate Time Series and Correlations . . . . . . . . . . . . . . . . . . . .
4-66 4-66 4-67
Detect Serial Correlation Using Econometric Modeler App . . . . . . . . . . Plot ACF and PACF . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Conduct Ljung-Box Q-Test for Significant Autocorrelation . . . . . . . . . . . .
4-71 4-71 4-73
Detect ARCH Effects Using Econometric Modeler App . . . . . . . . . . . . . . Inspect Correlograms of Squared Residuals for ARCH Effects . . . . . . . . . Conduct Ljung-Box Q-Test on Squared Residuals . . . . . . . . . . . . . . . . . . Conduct Engle's ARCH Test . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
4-77 4-77 4-80 4-82
Assess Stationarity of Time Series Using Econometric Modeler . . . . . . . Test Assuming Unit Root Null Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . Test Assuming Stationary Null Model . . . . . . . . . . . . . . . . . . . . . . . . . . . Test Assuming Random Walk Null Model . . . . . . . . . . . . . . . . . . . . . . . .
4-84 4-84 4-87 4-90
Assess Collinearity Among Multiple Series Using Econometric Modeler App . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
4-94
Transform Time Series Using Econometric Modeler App . . . . . . . . . . . . Apply Log Transformation to Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Stabilize Time Series Using Nonseasonal Differencing . . . . . . . . . . . . .
4-97 4-97 4-101
Convert Prices to Returns . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Remove Seasonal Trend from Time Series Using Seasonal Difference . . Remove Deterministic Trend from Time Series . . . . . . . . . . . . . . . . . . .
4-104 4-107 4-109
Implement Box-Jenkins Model Selection and Estimation Using Econometric Modeler App . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
4-112
Select ARCH Lags for GARCH Model Using Econometric Modeler App ........................................................
4-122
Estimate Multiplicative ARIMA Model Using Econometric Modeler App ........................................................
4-131
Perform ARIMA Model Residual Diagnostics Using Econometric Modeler App . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-141 Specify t Innovation Distribution Using Econometric Modeler App . . .
4-150
Estimate Vector Autoregression Model Using Econometric Modeler . .
4-155
Conduct Cointegration Test Using Econometric Modeler . . . . . . . . . . .
4-170
Estimate Vector Error-Correction Model Using Econometric Modeler
4-180
Compare Predictive Performance After Creating Models Using Econometric Modeler . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
4-193
Estimate ARIMAX Model Using Econometric Modeler App . . . . . . . . . .
4-200
Estimate Regression Model with ARMA Errors Using Econometric Modeler App . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
4-208
Compare Conditional Variance Model Fit Statistics Using Econometric Modeler App . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
4-221
Perform GARCH Model Residual Diagnostics Using Econometric Modeler App . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-230 Share Results of Econometric Modeler App Session . . . . . . . . . . . . . . .
5
4-237
Time Series Regression Models Time Series Regression Models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
5-3
Regression Models with Time Series Errors . . . . . . . . . . . . . . . . . . . . . . . . What Are Regression Models with Time Series Errors? . . . . . . . . . . . . . . . Conventions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
5-5 5-5 5-5
Create Regression Models with ARIMA Errors . . . . . . . . . . . . . . . . . . . . . . Default Regression Model with ARIMA Errors Specifications . . . . . . . . . .
5-8 5-8
ix
Specify regARIMA Models Using Name-Value Pair Arguments . . . . . . . . . . 5-9 Specify Linear Regression Models Using Econometric Modeler App . . . . 5-15
x
Contents
Specify Default Regression Model with ARIMA Errors . . . . . . . . . . . . . .
5-19
Modify regARIMA Model Properties . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Modify Properties Using Dot Notation . . . . . . . . . . . . . . . . . . . . . . . . . . Nonmodifiable Properties . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
5-21 5-21 5-23
Create Regression Models with AR Errors . . . . . . . . . . . . . . . . . . . . . . . . Default Regression Model with AR Errors . . . . . . . . . . . . . . . . . . . . . . . . AR Error Model Without an Intercept . . . . . . . . . . . . . . . . . . . . . . . . . . . AR Error Model with Nonconsecutive Lags . . . . . . . . . . . . . . . . . . . . . . . Known Parameter Values for a Regression Model with AR Errors . . . . . . Regression Model with AR Errors and t Innovations . . . . . . . . . . . . . . . .
5-26 5-26 5-27 5-27 5-28 5-29
Create Regression Models with MA Errors . . . . . . . . . . . . . . . . . . . . . . . . Default Regression Model with MA Errors . . . . . . . . . . . . . . . . . . . . . . . MA Error Model Without an Intercept . . . . . . . . . . . . . . . . . . . . . . . . . . . MA Error Model with Nonconsecutive Lags . . . . . . . . . . . . . . . . . . . . . . Known Parameter Values for a Regression Model with MA Errors . . . . . . Regression Model with MA Errors and t Innovations . . . . . . . . . . . . . . . .
5-31 5-31 5-32 5-32 5-33 5-34
Create Regression Models with ARMA Errors . . . . . . . . . . . . . . . . . . . . . . Default Regression Model with ARMA Errors . . . . . . . . . . . . . . . . . . . . . ARMA Error Model Without an Intercept . . . . . . . . . . . . . . . . . . . . . . . . ARMA Error Model with Nonconsecutive Lags . . . . . . . . . . . . . . . . . . . . Known Parameter Values for a Regression Model with ARMA Errors . . . . Regression Model with ARMA Errors and t Innovations . . . . . . . . . . . . . Specify Regression Model with ARMA Errors Using Econometric Modeler App . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
5-37 5-37 5-38 5-38 5-39 5-40
Create Regression Models with ARIMA Errors . . . . . . . . . . . . . . . . . . . . . Default Regression Model with ARIMA Errors . . . . . . . . . . . . . . . . . . . . . ARIMA Error Model Without an Intercept . . . . . . . . . . . . . . . . . . . . . . . . ARIMA Error Model with Nonconsecutive Lags . . . . . . . . . . . . . . . . . . . . Known Parameter Values for a Regression Model with ARIMA Errors . . . Regression Model with ARIMA Errors and t Innovations . . . . . . . . . . . . .
5-46 5-46 5-47 5-47 5-48 5-49
Create Regression Models with SARIMA Errors . . . . . . . . . . . . . . . . . . . . SARMA Error Model Without an Intercept . . . . . . . . . . . . . . . . . . . . . . . Known Parameter Values for a Regression Model with SARIMA Errors . . Regression Model with SARIMA Errors and t Innovations . . . . . . . . . . . .
5-51 5-51 5-52 5-52
Specify Regression Model with SARIMA Errors . . . . . . . . . . . . . . . . . . . .
5-55
Specify ARIMA Error Model Innovation Distribution . . . . . . . . . . . . . . . . About the Innovation Process . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Innovation Distribution Options . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Specify Innovation Distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
5-61 5-61 5-62 5-62
Impulse Response of Regression Models with ARIMA Errors . . . . . . . . .
5-66
Plot Impulse Response of Regression Model with ARIMA Errors . . . . . . Regression Model with AR Errors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
5-67 5-67
5-41
Regression Model with MA Errors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Regression Model with ARMA Errors . . . . . . . . . . . . . . . . . . . . . . . . . . . Regression Model with ARIMA Errors . . . . . . . . . . . . . . . . . . . . . . . . . . .
5-68 5-69 5-71
Maximum Likelihood Estimation of regARIMA Models . . . . . . . . . . . . . . Innovation Distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Loglikelihood Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
5-74 5-74 5-74
regARIMA Model Estimation Using Equality Constraints . . . . . . . . . . . .
5-76
Presample Values for regARIMA Model Estimation . . . . . . . . . . . . . . . . .
5-80
Initial Values for regARIMA Model Estimation . . . . . . . . . . . . . . . . . . . . .
5-82
Optimization Settings for regARIMA Model Estimation . . . . . . . . . . . . . Optimization Options . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Constraints on Regression Models with ARIMA Errors . . . . . . . . . . . . . .
5-84 5-84 5-86
Estimate Regression Model with ARIMA Errors . . . . . . . . . . . . . . . . . . . .
5-88
Estimate a Regression Model with Multiplicative ARIMA Errors . . . . . .
5-95
Select Regression Model with ARIMA Errors . . . . . . . . . . . . . . . . . . . . .
5-103
Choose Lags for ARMA Error Model . . . . . . . . . . . . . . . . . . . . . . . . . . . .
5-105
Intercept Identifiability in Regression Models with ARIMA Errors . . . Intercept Identifiability . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Intercept Identifiability Illustration . . . . . . . . . . . . . . . . . . . . . . . . . . . .
5-109 5-109 5-110
Alternative ARIMA Model Representations . . . . . . . . . . . . . . . . . . . . . . . Mathematical Development of regARIMA to ARIMAX Model Conversion .................................................... Show Conversion in MATLAB® . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
5-113
Simulate Regression Models with ARMA Errors . . . . . . . . . . . . . . . . . . . Simulate an AR Error Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Simulate an MA Error Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Simulate an ARMA Error Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
5-119 5-119 5-125 5-131
Simulate Regression Models with Nonstationary Errors . . . . . . . . . . . . Simulate a Regression Model with Nonstationary Errors . . . . . . . . . . . Simulate a Regression Model with Nonstationary Exponential Errors . .
5-138 5-138 5-141
5-113 5-115
Simulate Regression Models with Multiplicative Seasonal Errors . . . . 5-146 Simulate a Regression Model with Stationary Multiplicative Seasonal Errors .................................................... 5-146 Untitled . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-148 Monte Carlo Simulation of Regression Models with ARIMA Errors . . . What Is Monte Carlo Simulation? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Generate Monte Carlo Sample Paths . . . . . . . . . . . . . . . . . . . . . . . . . . . Monte Carlo Error . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
5-151 5-151 5-151 5-152
Presample Data for regARIMA Model Simulation . . . . . . . . . . . . . . . . .
5-154
xi
Transient Effects in regARIMA Model Simulations . . . . . . . . . . . . . . . . What Are Transient Effects? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Illustration of Transient Effects on Regression . . . . . . . . . . . . . . . . . . .
5-155 5-155 5-155
Forecast a Regression Model with ARIMA Errors . . . . . . . . . . . . . . . . . .
5-163
Forecast a Regression Model with Multiplicative Seasonal ARIMA Errors ........................................................ 5-166
6
Verify Predictive Ability Robustness of a regARIMA Model . . . . . . . . . .
5-170
MMSE Forecasting Regression Models with ARIMA Errors . . . . . . . . . What Are MMSE Forecasts? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . How forecast Generates MMSE Forecasts . . . . . . . . . . . . . . . . . . . . . . Forecast Error . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
5-172 5-172 5-172 5-174
Monte Carlo Forecasting of regARIMA Models . . . . . . . . . . . . . . . . . . . . Monte Carlo Forecasts . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Advantage of Monte Carlo Forecasts . . . . . . . . . . . . . . . . . . . . . . . . . . .
5-175 5-175 5-175
Time Series Regression I: Linear Models . . . . . . . . . . . . . . . . . . . . . . . .
5-176
Time Series Regression II: Collinearity and Estimator Variance . . . . .
5-183
Time Series Regression III: Influential Observations . . . . . . . . . . . . . . .
5-193
Time Series Regression IV: Spurious Regression . . . . . . . . . . . . . . . . . .
5-200
Time Series Regression V: Predictor Selection . . . . . . . . . . . . . . . . . . . .
5-212
Time Series Regression VI: Residual Diagnostics . . . . . . . . . . . . . . . . .
5-223
Time Series Regression VII: Forecasting . . . . . . . . . . . . . . . . . . . . . . . . .
5-234
Time Series Regression VIII: Lagged Variables and Estimator Bias . . .
5-243
Time Series Regression IX: Lag Order Selection . . . . . . . . . . . . . . . . . .
5-264
Time Series Regression X: Generalized Least Squares and HAC Estimators . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
5-282
Bayesian Linear Regression Bayesian Linear Regression . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Classical Versus Bayesian Analyses . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Main Bayesian Analysis Components . . . . . . . . . . . . . . . . . . . . . . . . . . . . Posterior Estimation and Inference . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Implement Bayesian Linear Regression . . . . . . . . . . . . . . . . . . . . . . . . . . Workflow for Standard Bayesian Linear Regression Models . . . . . . . . . .
xii
Contents
6-2 6-2 6-3 6-4 6-10 6-10
7
Workflow for Bayesian Predictor Selection . . . . . . . . . . . . . . . . . . . . . . .
6-13
Specify Gradient for HMC Sampler . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
6-18
Posterior Estimation and Simulation Diagnostics . . . . . . . . . . . . . . . . . . Diagnose MCMC Samples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Perform Sensitivity Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
6-27 6-27 6-34
Tune Slice Sampler for Posterior Estimation . . . . . . . . . . . . . . . . . . . . . .
6-36
Compare Robust Regression Techniques . . . . . . . . . . . . . . . . . . . . . . . . . .
6-43
Bayesian Lasso Regression . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
6-52
Bayesian Stochastic Search Variable Selection . . . . . . . . . . . . . . . . . . . .
6-63
Replacing Removed Syntaxes of estimate . . . . . . . . . . . . . . . . . . . . . . . . . Replace Removed Syntax When Estimating Analytical Marginal Posterior ..................................................... Replace Removed Syntax When Estimating Numerical Marginal Posterior ..................................................... Replace Removed Syntax When Estimating Conditional Posterior . . . . . .
6-73 6-74 6-75 6-77
Conditional Mean Models Creating Univariate Conditional Mean Models . . . . . . . . . . . . . . . . . . . . . . 7-3 Default ARIMA Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-3 Specify Nonseasonal Models Using Name-Value Arguments . . . . . . . . . . . 7-5 Specify Multiplicative Models Using Name-Value Arguments . . . . . . . . . . . 7-9 Specify Conditional Mean Model Using Econometric Modeler App . . . . . 7-12 What Are Conditional Mean Models? . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-13 Create Autoregressive Models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Default AR Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . AR Model with No Constant Term . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . AR Model with Nonconsecutive Lags . . . . . . . . . . . . . . . . . . . . . . . . . . . ARMA Model with Known Parameter Values . . . . . . . . . . . . . . . . . . . . . . AR Model with t Innovation Distribution . . . . . . . . . . . . . . . . . . . . . . . . . Specify AR Model Using Econometric Modeler App . . . . . . . . . . . . . . . . What Are Autoregressive Models? . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
7-16 7-16 7-17 7-17 7-18 7-19 7-19 7-21
Create Moving Average Models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Default MA Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . MA Model with No Constant Term . . . . . . . . . . . . . . . . . . . . . . . . . . . . . MA Model with Nonconsecutive Lags . . . . . . . . . . . . . . . . . . . . . . . . . . . MA Model with Known Parameter Values . . . . . . . . . . . . . . . . . . . . . . . . MA Model with t Innovation Distribution . . . . . . . . . . . . . . . . . . . . . . . . Specify MA Model Using Econometric Modeler App . . . . . . . . . . . . . . . . What Are Moving Average Models? . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
7-24 7-24 7-25 7-25 7-26 7-27 7-27 7-29
xiii
xiv
Contents
Create Autoregressive Moving Average Models . . . . . . . . . . . . . . . . . . . . Default ARMA Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ARMA Model with No Constant Term . . . . . . . . . . . . . . . . . . . . . . . . . . . ARMA Model with Known Parameter Values . . . . . . . . . . . . . . . . . . . . . . Specify ARMA Model Using Econometric Modeler App . . . . . . . . . . . . . . What Are Autoregressive Moving Average Models? . . . . . . . . . . . . . . . . .
7-31 7-31 7-31 7-32 7-33 7-35
Create Autoregressive Integrated Moving Average Models . . . . . . . . . . . Default ARIMA Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ARIMA Model with Known Parameter Values . . . . . . . . . . . . . . . . . . . . . Specify ARIMA Model Using Econometric Modeler App . . . . . . . . . . . . . What Are ARIMA Models? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
7-38 7-38 7-39 7-39 7-41
Create Multiplicative ARIMA Models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Seasonal ARIMA Model with No Constant Term . . . . . . . . . . . . . . . . . . . Seasonal ARIMA Model with Known Parameter Values . . . . . . . . . . . . . . Specify Multiplicative ARIMA Model Using Econometric Modeler App . . What Are Multiplicative ARIMA Models? . . . . . . . . . . . . . . . . . . . . . . . .
7-44 7-44 7-45 7-46 7-49
Create Multiplicative Seasonal ARIMA Model for Time Series Data . . .
7-51
Create ARIMA Models That Include Exogenous Covariates . . . . . . . . . . . Create ARIMAX Model Using Longhand Syntax . . . . . . . . . . . . . . . . . . . Specify ARMAX Model Using Dot Notation . . . . . . . . . . . . . . . . . . . . . . . Specify ARIMAX or SARIMAX Model Using Econometric Modeler App . . What Are ARIMA Models That Include Exogenous Covariates? . . . . . . . .
7-55 7-55 7-56 7-57 7-61
Modify Properties of Conditional Mean Model Objects . . . . . . . . . . . . . . Dot Notation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Nonmodifiable Properties . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
7-63 7-63 7-66
Specify Conditional Mean Model Innovation Distribution . . . . . . . . . . . . About the Innovation Process . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Choices for the Variance Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Choices for the Innovation Distribution . . . . . . . . . . . . . . . . . . . . . . . . . . Specify the Innovation Distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Modify the Innovation Distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
7-69 7-69 7-69 7-69 7-70 7-72
Specify Conditional Mean and Variance Models . . . . . . . . . . . . . . . . . . . .
7-75
Plot the Impulse Response Function of Conditional Mean Model . . . . . IRF of Moving Average Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . IRF of Autoregressive Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . IRF of ARMA Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . IRF of Seasonal AR Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . More About the Impulse Response Function . . . . . . . . . . . . . . . . . . . . . .
7-80 7-80 7-84 7-89 7-91 7-95
Time Base Partitions for ARIMA Model Estimation . . . . . . . . . . . . . . . . . Partition Time Series Data for Estimation . . . . . . . . . . . . . . . . . . . . . . . .
7-97 7-99
Box-Jenkins Differencing vs. ARIMA Estimation . . . . . . . . . . . . . . . . . .
7-103
Maximum Likelihood Estimation for Conditional Mean Models . . . . . . Innovation Distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Loglikelihood Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
7-106 7-106 7-106
Conditional Mean Model Estimation with Equality Constraints . . . . . .
7-108
Presample Data for Conditional Mean Model Estimation . . . . . . . . . . .
7-109
Initial Values for Conditional Mean Model Estimation . . . . . . . . . . . . .
7-111
Optimization Settings for Conditional Mean Model Estimation . . . . . . Optimization Options . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Conditional Mean Model Constraints . . . . . . . . . . . . . . . . . . . . . . . . . .
7-113 7-113 7-115
Estimate Multiplicative ARIMA Model . . . . . . . . . . . . . . . . . . . . . . . . . .
7-117
Model Seasonal Lag Effects Using Indicator Variables . . . . . . . . . . . . .
7-120
Forecast IGD Rate from ARX Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
7-124
Estimate Conditional Mean and Variance Model . . . . . . . . . . . . . . . . . .
7-130
Choose ARMA Lags Using BIC . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
7-135
Infer Residuals for Diagnostic Checking . . . . . . . . . . . . . . . . . . . . . . . . .
7-138
Monte Carlo Simulation of Conditional Mean Models . . . . . . . . . . . . . . What Is Monte Carlo Simulation? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Generate Monte Carlo Sample Paths . . . . . . . . . . . . . . . . . . . . . . . . . . . Monte Carlo Error . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
7-143 7-143 7-143 7-144
Presample Data for Conditional Mean Model Simulation . . . . . . . . . . .
7-145
Transient Effects in Conditional Mean Model Simulations . . . . . . . . . .
7-146
Simulate Stationary Processes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Simulate AR Process . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Simulate MA Process . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
7-147 7-147 7-151
Simulate Trend-Stationary and Difference-Stationary Processes . . . . .
7-155
Simulate Multiplicative ARIMA Models . . . . . . . . . . . . . . . . . . . . . . . . .
7-159
Simulate Conditional Mean and Variance Models . . . . . . . . . . . . . . . . .
7-162
Monte Carlo Forecasting of Conditional Mean Models . . . . . . . . . . . . . Monte Carlo Forecasts . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Advantage of Monte Carlo Forecasting . . . . . . . . . . . . . . . . . . . . . . . . .
7-166 7-166 7-166
MMSE Forecasting of Conditional Mean Models . . . . . . . . . . . . . . . . . . What Are MMSE Forecasts? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . How forecast Generates MMSE Forecasts . . . . . . . . . . . . . . . . . . . . . . Forecast Error . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
7-167 7-167 7-167 7-168
Convergence of AR Forecasts . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
7-170
Forecast Multiplicative ARIMA Model . . . . . . . . . . . . . . . . . . . . . . . . . . .
7-174
xv
Specify Presample and Forecast Period Data to Forecast ARIMAX Model ........................................................ 7-177
8
xvi
Contents
Forecast Conditional Mean and Variance Model . . . . . . . . . . . . . . . . . . .
7-181
Model and Simulate Electricity Spot Prices Using the Skew-Normal Distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
7-184
Conditional Variance Models Conditional Variance Models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . General Conditional Variance Model Definition . . . . . . . . . . . . . . . . . . . . . GARCH Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . EGARCH Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . GJR Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
8-2 8-2 8-3 8-3 8-4
Specify GARCH Models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Default GARCH Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Specify Default GARCH Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Using Name-Value Arguments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Specify GARCH Model Using Econometric Modeler App . . . . . . . . . . . . . Specify GARCH Model with Mean Offset . . . . . . . . . . . . . . . . . . . . . . . . . Specify GARCH Model with Known Parameter Values . . . . . . . . . . . . . . . Specify GARCH Model with t Innovation Distribution . . . . . . . . . . . . . . . Specify GARCH Model with Nonconsecutive Lags . . . . . . . . . . . . . . . . . .
8-6 8-6 8-7 8-8 8-11 8-13 8-14 8-14 8-15
Specify EGARCH Models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Default EGARCH Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Specify Default EGARCH Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Using Name-Value Arguments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Specify EGARCH Model Using Econometric Modeler App . . . . . . . . . . . . Specify EGARCH Model with Mean Offset . . . . . . . . . . . . . . . . . . . . . . . . Specify EGARCH Model with Nonconsecutive Lags . . . . . . . . . . . . . . . . . Specify EGARCH Model with Known Parameter Values . . . . . . . . . . . . . . Specify EGARCH Model with t Innovation Distribution . . . . . . . . . . . . . .
8-17 8-17 8-19 8-19 8-22 8-24 8-24 8-25 8-26
Specify GJR Models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Default GJR Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Specify Default GJR Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Using Name-Value Arguments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Specify GJR Model Using Econometric Modeler App . . . . . . . . . . . . . . . . Specify GJR Model with Mean Offset . . . . . . . . . . . . . . . . . . . . . . . . . . . . Specify GJR Model with Nonconsecutive Lags . . . . . . . . . . . . . . . . . . . . . Specify GJR Model with Known Parameter Values . . . . . . . . . . . . . . . . . . Specify GJR Model with t Innovation Distribution . . . . . . . . . . . . . . . . . .
8-28 8-28 8-29 8-30 8-33 8-35 8-35 8-36 8-37
Modify Properties of Conditional Variance Models . . . . . . . . . . . . . . . . . Dot Notation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Nonmodifiable Properties . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
8-39 8-39 8-41
Specify the Conditional Variance Model Innovation Distribution . . . . . .
8-44
Specify Conditional Variance Model for Exchange Rates . . . . . . . . . . . . .
8-47
Maximum Likelihood Estimation for Conditional Variance Models . . . . Innovation Distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Loglikelihood Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
8-52 8-52 8-52
Conditional Variance Model Estimation with Equality Constraints . . . .
8-54
Presample Data for Conditional Variance Model Estimation . . . . . . . . . .
8-55
Initial Values for Conditional Variance Model Estimation . . . . . . . . . . . .
8-57
Optimization Settings for Conditional Variance Model Estimation . . . . Optimization Options . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Conditional Variance Model Constraints . . . . . . . . . . . . . . . . . . . . . . . . .
8-58 8-58 8-60
Infer Conditional Variances and Residuals . . . . . . . . . . . . . . . . . . . . . . . .
8-62
Likelihood Ratio Test for Conditional Variance Models . . . . . . . . . . . . . .
8-66
Compare Conditional Variance Models Using Information Criteria . . . .
8-69
Monte Carlo Simulation of Conditional Variance Models . . . . . . . . . . . . What Is Monte Carlo Simulation? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Generate Monte Carlo Sample Paths . . . . . . . . . . . . . . . . . . . . . . . . . . . . Monte Carlo Error . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
8-72 8-72 8-72 8-73
Presample Data for Conditional Variance Model Simulation . . . . . . . . . .
8-75
Simulate GARCH Models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
8-76
Assess EGARCH Forecast Bias Using Simulations . . . . . . . . . . . . . . . . . .
8-81
Simulate Conditional Variance Model . . . . . . . . . . . . . . . . . . . . . . . . . . . .
8-86
Monte Carlo Forecasting of Conditional Variance Models . . . . . . . . . . . . Monte Carlo Forecasts . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Advantage of Monte Carlo Forecasting . . . . . . . . . . . . . . . . . . . . . . . . . .
8-89 8-89 8-89
MMSE Forecasting of Conditional Variance Models . . . . . . . . . . . . . . . . . What Are MMSE Forecasts? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . EGARCH MMSE Forecasts . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . How forecast Generates MMSE Forecasts . . . . . . . . . . . . . . . . . . . . . . .
8-90 8-90 8-90 8-90
Forecast GJR Models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
8-94
Forecast a Conditional Variance Model . . . . . . . . . . . . . . . . . . . . . . . . . . .
8-97
Converting from GARCH Functions to Model Objects . . . . . . . . . . . . . . .
8-99
Using Bootstrapping and Filtered Historical Simulation to Evaluate Market Risk . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
8-102
Using Extreme Value Theory and Copulas to Evaluate Market Risk . . .
8-114
xvii
9
Multivariate Time Series Models Vector Autoregression (VAR) Models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Types of Stationary Multivariate Time Series Models . . . . . . . . . . . . . . . . Lag Operator Representation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Stable and Invertible Models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Models with Regression Component . . . . . . . . . . . . . . . . . . . . . . . . . . . . . VAR Model Workflow . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
xviii
Contents
9-3 9-3 9-5 9-6 9-6 9-7
Multivariate Time Series Data Formats . . . . . . . . . . . . . . . . . . . . . . . . . . . Multivariate Time Series Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Load Multivariate Economic Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Multivariate Data Format . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Preprocess Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Time Base Partitions for Estimation . . . . . . . . . . . . . . . . . . . . . . . . . . . . Partition Multivariate Time Series Data for Estimation . . . . . . . . . . . . . .
9-10 9-10 9-10 9-12 9-15 9-15 9-18
Vector Autoregression (VAR) Model Creation . . . . . . . . . . . . . . . . . . . . . . Create VAR Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Fully Specified Model Object . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Model Template for Unrestricted Estimation . . . . . . . . . . . . . . . . . . . . . . Partially Specified Model Object for Restricted Estimation . . . . . . . . . . . Display and Change Model Objects . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Select Appropriate Lag Order . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
9-20 9-20 9-21 9-23 9-24 9-24 9-27
Create and Adjust VAR Model Using Shorthand Syntax . . . . . . . . . . . . . .
9-30
Create and Adjust VAR Model Using Longhand Syntax . . . . . . . . . . . . . .
9-32
VAR Model Estimation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Preparing VAR Models for Fitting . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Fitting Models to Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Examining the Stability of a Fitted Model . . . . . . . . . . . . . . . . . . . . . . . .
9-34 9-34 9-34 9-35
Convert VARMA Model to VAR Model . . . . . . . . . . . . . . . . . . . . . . . . . . . .
9-37
Fit VAR Model of CPI and Unemployment Rate . . . . . . . . . . . . . . . . . . . .
9-38
Fit VAR Model to Simulated Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
9-42
VAR Model Forecasting, Simulation, and Analysis . . . . . . . . . . . . . . . . . . VAR Model Forecasting . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Data Scaling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Calculating Impulse Responses . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
9-44 9-44 9-46 9-46
Generate VAR Model Impulse Responses . . . . . . . . . . . . . . . . . . . . . . . . .
9-48
Compare Generalized and Orthogonalized Impulse Response Functions .........................................................
9-52
Forecast VAR Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
9-60
Forecast VAR Model Using Monte Carlo Simulation . . . . . . . . . . . . . . . .
9-63
Forecast VAR Model Conditional Responses . . . . . . . . . . . . . . . . . . . . . . .
9-66
Implement Seemingly Unrelated Regression . . . . . . . . . . . . . . . . . . . . . .
9-70
Estimate Capital Asset Pricing Model Using SUR . . . . . . . . . . . . . . . . . .
9-75
Simulate Responses of Estimated VARX Model . . . . . . . . . . . . . . . . . . . .
9-78
Simulate VAR Model Conditional Responses . . . . . . . . . . . . . . . . . . . . . .
9-84
Simulate Responses Using filter . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
9-88
VAR Model Case Study . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
9-90
Convert from vgx Functions to Model Objects . . . . . . . . . . . . . . . . . . . .
9-104
Cointegration and Error Correction Analysis . . . . . . . . . . . . . . . . . . . . . Integration and Cointegration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Cointegration and Error Correction . . . . . . . . . . . . . . . . . . . . . . . . . . . The Role of Deterministic Terms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Cointegration Modeling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
9-107 9-107 9-107 9-108 9-109
Determine Cointegration Rank of VEC Model . . . . . . . . . . . . . . . . . . . .
9-111
Identifying Single Cointegrating Relations . . . . . . . . . . . . . . . . . . . . . . . The Engle-Granger Test for Cointegration . . . . . . . . . . . . . . . . . . . . . . . Limitations of the Engle-Granger Test . . . . . . . . . . . . . . . . . . . . . . . . . .
9-113 9-113 9-113
Test for Cointegration Using the Engle-Granger Test . . . . . . . . . . . . . .
9-117
Estimate VEC Model Parameters Using egcitest . . . . . . . . . . . . . . . . . .
9-121
VEC Model Monte Carlo Forecasts . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
9-124
Generate VEC Model Impulse Responses . . . . . . . . . . . . . . . . . . . . . . . .
9-132
Identifying Multiple Cointegrating Relations . . . . . . . . . . . . . . . . . . . . .
9-136
Test for Cointegration Using the Johansen Test . . . . . . . . . . . . . . . . . . .
9-137
Estimate VEC Model Parameters Using jcitest . . . . . . . . . . . . . . . . . . . .
9-139
Compare Approaches to Cointegration Analysis . . . . . . . . . . . . . . . . . . .
9-142
Testing Cointegrating Vectors and Adjustment Speeds . . . . . . . . . . . . .
9-145
Test Cointegrating Vectors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
9-146
Test Adjustment Speeds . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
9-148
Model the United States Economy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
9-150
Incorporate Macroeconomic Scenario Projections in Loan Portfolio ECL Calculations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9-176
xix
10
xx
Contents
Structural Change Models Discrete-Time Markov Chains . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . What Are Discrete-Time Markov Chains? . . . . . . . . . . . . . . . . . . . . . . . . Discrete-Time Markov Chain Theory . . . . . . . . . . . . . . . . . . . . . . . . . . . .
10-2 10-2 10-3
Markov Chain Modeling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Discrete-Time Markov Chain Object Framework Overview . . . . . . . . . . . Markov Chain Analysis Workflow . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
10-8 10-8 10-11
Create and Modify Markov Chain Model Objects . . . . . . . . . . . . . . . . . . Create Markov Chain from Stochastic Transition Matrix . . . . . . . . . . . . Create Markov Chain from Random Transition Matrix . . . . . . . . . . . . . Specify Structure for Random Markov Chain . . . . . . . . . . . . . . . . . . . .
10-17 10-17 10-18 10-20
Work with State Transitions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
10-23
Visualize Markov Chain Structure and Evolution . . . . . . . . . . . . . . . . . .
10-27
Determine Asymptotic Behavior of Markov Chain . . . . . . . . . . . . . . . . .
10-39
Identify Classes in Markov Chain . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
10-47
Compare Markov Chain Mixing Times . . . . . . . . . . . . . . . . . . . . . . . . . . .
10-50
Simulate Random Walks Through Markov Chain . . . . . . . . . . . . . . . . . .
10-59
Compute State Distribution of Markov Chain at Each Time Step . . . . .
10-66
Create Threshold Transitions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
10-73
Visualize Threshold Transitions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
10-76
Evaluate Threshold Transitions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
10-82
Create Threshold-Switching Dynamic Regression Models . . . . . . . . . . .
10-88
Estimate Threshold-Switching Dynamic Regression Models . . . . . . . . .
10-94
Simulate Paths of Threshold-Switching Dynamic Regression Models
10-111
Forecast Threshold-Switching Dynamic Regression Models . . . . . . . .
10-118
Analyze US Unemployment Rate Using Threshold-Switching Model .
10-124
Creating Markov-Switching Dynamic Regression Models . . . . . . . . . . What Is a Markov-Switching Dynamic Regression Model? . . . . . . . . . . Markov-Switching Model Functionality of Econometrics Toolbox . . . . Represent Markov-Switching Model Using msVAR . . . . . . . . . . . . . . .
10-139 10-139 10-140 10-141
Create Univariate Markov-Switching Dynamic Regression Models . . Create Fully Specified Univariate Model . . . . . . . . . . . . . . . . . . . . . . .
10-146 10-146
10-149
Create Partially Specified Univariate Model for Estimation . . . . . . . . . Create Partially Specified Univariate Model Containing Regression Components . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
10-152
Modify msVAR Model Specifications . . . . . . . . . . . . . . . . . . . . . . . . . . .
10-154
Create Multivariate Markov-Switching Dynamic Regression Models Create Fully Specified Multivariate Model . . . . . . . . . . . . . . . . . . . . . Create Fully Specified Multivariate Model Containing Regression Components . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Create Partially Specified Multivariate Model Containing Regression Components for Estimation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
10-156 10-156 10-159 10-161
Simulate Univariate Markov-Switching Dynamic Regression Model .
10-165
Simulate Multivariate Markov-Switching Dynamic Regression Model .......................................................
10-170
Monte Carlo Simulation of Markov-Switching Dynamic Regression Model Response Variables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10-175 Analyze US Unemployment Rate Using Markov-Switching Model . . .
11
10-179
State-Space Models What Are State-Space Models? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Definitions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . State-Space Model Creation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
11-3 11-3 11-5
What Is the Kalman Filter? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Standard Kalman Filter . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . State Forecasts . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Filtered States . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Smoothed States . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Smoothed State Disturbances . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Forecasted Observations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Smoothed Observation Innovations . . . . . . . . . . . . . . . . . . . . . . . . . . . . Kalman Gain . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Backward Recursion of the Kalman Filter . . . . . . . . . . . . . . . . . . . . . . . Diffuse Kalman Filter . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
11-7 11-7 11-8 11-8 11-9 11-9 11-10 11-10 11-11 11-11 11-12
Explicitly Create State-Space Model Containing Known Parameter Values ........................................................ 11-14 Create State-Space Model with Unknown Parameters . . . . . . . . . . . . . . Explicitly Create State-Space Model Containing Unknown Parameters .................................................... Implicitly Create Time-Invariant State-Space Model . . . . . . . . . . . . . . .
11-16
Create State-Space Model Containing ARMA State . . . . . . . . . . . . . . . .
11-19
11-16 11-17
xxi
Implicitly Create State-Space Model Containing Regression Component ........................................................ 11-22 Implicitly Create Diffuse State-Space Model Containing Regression Component . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
11-24
Implicitly Create Time-Varying State-Space Model . . . . . . . . . . . . . . . .
11-26
Implicitly Create Time-Varying Diffuse State-Space Model . . . . . . . . . .
11-28
Create State-Space Model with Random State Coefficient . . . . . . . . . .
11-31
Estimate Time-Invariant State-Space Model . . . . . . . . . . . . . . . . . . . . . .
11-33
Estimate Time-Varying State-Space Model . . . . . . . . . . . . . . . . . . . . . . .
11-36
Estimate Time-Varying Diffuse State-Space Model . . . . . . . . . . . . . . . . .
11-40
Estimate State-Space Model Containing Regression Component . . . . .
11-44
Filter States of State-Space Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
11-46
Filter Time-Varying State-Space Model . . . . . . . . . . . . . . . . . . . . . . . . . .
11-49
Filter Data Through State-Space Model in Real Time . . . . . . . . . . . . . .
11-54
Filter Time-Varying Diffuse State-Space Model . . . . . . . . . . . . . . . . . . .
11-60
Filter States of State-Space Model Containing Regression Component ........................................................
11-66
Smooth States of State-Space Model . . . . . . . . . . . . . . . . . . . . . . . . . . . .
11-69
Smooth Time-Varying State-Space Model . . . . . . . . . . . . . . . . . . . . . . . .
11-72
Smooth Time-Varying Diffuse State-Space Model . . . . . . . . . . . . . . . . . .
11-78
Smooth States of State-Space Model Containing Regression Component ........................................................ 11-84
xxii
Contents
Simulate States and Observations of Time-Invariant State-Space Model ........................................................
11-87
Simulate Time-Varying State-Space Model . . . . . . . . . . . . . . . . . . . . . . .
11-90
Simulate States of Time-Varying State-Space Model Using Simulation Smoother . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
11-94
Estimate Random Parameter of State-Space Model . . . . . . . . . . . . . . . .
11-97
Forecast State-Space Model Using Monte-Carlo Methods . . . . . . . . . .
11-104
Forecast State-Space Model Observations . . . . . . . . . . . . . . . . . . . . . .
11-110
Forecast Observations of State-Space Model Containing Regression Component . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
11-113
Forecast Time-Varying State-Space Model . . . . . . . . . . . . . . . . . . . . . .
11-117
Forecast State-Space Model Containing Regime Change in the Forecast Horizon . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11-121 Forecast Time-Varying Diffuse State-Space Model . . . . . . . . . . . . . . . .
11-126
Compare Simulation Smoother to Smoothed States . . . . . . . . . . . . . .
11-130
Rolling-Window Analysis of Time-Series Models . . . . . . . . . . . . . . . . . Rolling-Window Analysis for Parameter Stability . . . . . . . . . . . . . . . . . Rolling Window Analysis for Predictive Performance . . . . . . . . . . . . . .
11-135 11-135 11-135
Assess State-Space Model Stability Using Rolling Window Analysis . Assess Model Stability Using Rolling Window Analysis . . . . . . . . . . . . Assess Stability of Implicitly Created State-Space Model . . . . . . . . . .
11-138 11-138 11-141
Choose State-Space Model Specification Using Backtesting . . . . . . . .
11-145
Fit Bayesian Stochastic Volatility Model to S&P 500 Volatility . . . . . .
11-148
Apply State-Space Methodology to Analyze Diebold-Li Yield Curve Model ....................................................... 11-160
12
A
Analyze Linearized DSGE Models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
11-190
Perform Outlier Detection Using Bayesian Non-Gaussian State-Space Models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
11-211
Functions
Appendices Data Sets and Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Data Sets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Featured Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
A-2 A-2 A-3
xxiii
1 Getting Started • “Econometrics Toolbox Product Description” on page 1-2 • “Econometric Modeling” on page 1-3 • “Represent Time Series Models Using Econometrics Toolbox Objects” on page 1-7 • “Stochastic Process Characteristics” on page 1-18 • “Bibliography” on page 1-24
1
Getting Started
Econometrics Toolbox Product Description Model and analyze financial and economic systems using statistical time series methods Econometrics Toolbox provides functions and interactive workflows for analyzing and modeling time series data. It offers a wide range of visualizations and diagnostics for model selection, including tests for autocorrelation and heteroscedasticity, unit roots and stationarity, cointegration, causality, and structural change. You can estimate, simulate, and forecast economic systems using a variety of modeling frameworks that can be used either interactively, using the Econometric Modeler app, or programmatically, using functions provided in the toolbox. These frameworks include regression, ARIMA, state-space, GARCH, multivariate VAR and VEC, and switching models. The toolbox also provides Bayesian tools for developing time-varying models that learn from new data.
1-2
Econometric Modeling
Econometric Modeling In this section... “Model Selection” on page 1-3 “Econometrics Toolbox Features” on page 1-3
Model Selection A probabilistic time series model is necessary for a wide variety of analysis goals, including regression inference, forecasting, and Monte Carlo simulation. When selecting a model, aim to find the most parsimonious model that adequately describes your data. A simple model is easier to estimate, forecast, and interpret. • Specification tests help you identify one or more model families that could plausibly describe the data generating process. • Model comparisons help you compare the fit of competing models, with penalties for complexity. • Goodness-of-fit checks help you assess the in-sample adequacy of your model, verify that all model assumptions hold, and evaluate out-of-sample forecast performance. Model selection is an iterative process. When goodness-of-fit checks suggest model assumptions are not satisfied—or the predictive performance of the model is not satisfactory—consider making model adjustments. Additional specification tests, model comparisons, and goodness-of-fit checks help guide this process.
Econometrics Toolbox Features Modeling Questions
Features
What is the dimension of my response variable?
• The conditional mean and variance models, regression • arima models with ARIMA errors, and Bayesian linear regression • bayeslm models in this toolbox are for modeling univariate, • egarch discrete-time data. • egcitest • Separate models are available for multivariate, discretetime data, such as VAR and VEC models. • State-space models support univariate or multivariate response variables.
Related Functions
• dssm • garch • gjr • jcontest • regARIMA • ssm • varm
Is my time series stationary?
• Stationarity tests are available. If your data is not • arima stationary, consider transforming your data. Stationarity is • i10test the foundation of many time series models. • kpsstest • Or, consider using a nonstationary ARIMA model if there • lmctest is evidence of a unit root in your data.
1-3
1
Getting Started
Modeling Questions
Features
Related Functions
Does my time series have a unit root?
• Unit root tests are available. Evidence in favor of a unit root suggests your data is difference stationary.
• adftest
• You can difference a series with a unit root until it is stationary, or model it using a nonstationary ARIMA model.
• i10test
• arima • pptest • vratiotest
How can I handle seasonal effects?
• You can deseasonalize (seasonally adjust) your data. Use seasonal filters or regression models to estimate the seasonal component.
• arima • regARIMA
• Seasonal ARIMA models use seasonal differencing to remove seasonal effects. You can also include seasonal lags to model seasonal autocorrelation (both additively and multiplicatively). Is my data autocorrelated?
• Sample autocorrelation and partial autocorrelation functions help identify autocorrelation.
• arima
• Conduct a Ljung-Box Q-test to test autocorrelations at several lags jointly.
• fgls
• If autocorrelation is present, consider using a conditional mean model.
• lbqtest
• For regression models with autocorrelated errors, consider using FGLS or HAC estimators. If the error model structure is an ARIMA model, consider using a regression model with ARIMA errors. What if my data is heteroscedastic (exhibits volatility clustering)?
• autocorr • hac • parcorr • regARIMA
• Looking for autocorrelation in the squared residual series • archtest is one way to detect conditional heteroscedasticity. • egarch • Engle’s ARCH test evaluates evidence against the null of • fgls independent innovations in favor of an ARCH model • garch alternative. • To model conditional heteroscedasticity, consider using a • gjr conditional variance model.
• hac
• For regression models that exhibit heteroscedastic errors, consider using FGLS or HAC estimators. Is there an alternative to a Gaussian innovation distribution for leptokurtic data?
• You can use a Student’s t distribution to model fatter tails • arima than a Gaussian distribution (excess kurtosis). • egarch • You can specify a t innovation distribution for all • garch conditional mean and variance models, and ARIMA error • gjr models in Econometrics Toolbox. • regARIMA • You can estimate the degrees of freedom of the t distribution along with other model parameters.
1-4
Econometric Modeling
Modeling Questions
Features
Related Functions
How do I decide between several model fits?
• You can compare nested models using misspecification tests, such as the likelihood ratio test, Wald’s test, or Lagrange multiplier test.
• aicbic
• Information criteria, such as AIC or BIC, compare model fit with a penalty for complexity. Do I have two or • The Johansen and Engle-Granger cointegration tests more time series that assess evidence of cointegration. are cointegrated? • Consider using the VEC model for modeling multivariate, cointegrated series.
• lmtest • lratiotest • waldtest • egcitest • jcitest • jcontest
• Also consider cointegration when regressing time series. If present, it can introduce spurious regression effects. What if I want to include predictor variables?
• ARIMAX, VARX, regression models with ARIMA errors, and Bayesian linear regression models are available in this toolbox. • State-space models support predictor data.
• arima • bayeslm • dssm • ssm • regARIMA • varm
What if I want to implement regression, but the classical linear model assumptions might not apply?
• Regression models with ARIMA errors are available in this • bayeslm toolbox. • fgls • Regress robustly using FGLS or HAC estimators. • hac • Use Bayesian linear regression.
• mvregress
• For a series of examples on time series regression techniques that illustrate common principles and tasks in time series regression modeling, see Econometrics Toolbox Examples.
• regARIMA
• For more regression options, see Statistics and Machine Learning Toolbox™ documentation. What if observations Standard, linear state-space modeling is available in this of a dynamic process toolbox. include measurement error?
• dssm • ssm
See Also Related Examples •
“Select ARIMA Model for Time Series Using Box-Jenkins Methodology” on page 3-2
•
“Detect Autocorrelation” on page 3-19
•
“Detect ARCH Effects” on page 3-27
•
“Unit Root Tests” on page 3-40
•
“Time Series Regression I: Linear Models” on page 5-176
1-5
1
Getting Started
•
“Time Series Regression II: Collinearity and Estimator Variance” on page 5-183
•
“Time Series Regression III: Influential Observations” on page 5-193
•
“Time Series Regression IV: Spurious Regression” on page 5-200
•
“Time Series Regression V: Predictor Selection” on page 5-212
•
“Time Series Regression VI: Residual Diagnostics” on page 5-223
•
“Time Series Regression VII: Forecasting” on page 5-234
•
“Time Series Regression VIII: Lagged Variables and Estimator Bias” on page 5-243
•
“Time Series Regression IX: Lag Order Selection” on page 5-264
•
“Time Series Regression X: Generalized Least Squares and HAC Estimators” on page 5-282
More About
1-6
•
“Trend-Stationary vs. Difference-Stationary Processes” on page 2-6
•
“Select ARIMA Model for Time Series Using Box-Jenkins Methodology” on page 3-2
•
“Goodness of Fit” on page 3-85
•
“Regression Models with Time Series Errors” on page 5-5
•
“Nonspherical Models” on page 3-89
•
“What Are Conditional Mean Models?” on page 7-13
•
“Conditional Variance Models” on page 8-2
•
“Vector Autoregression (VAR) Models” on page 9-3
•
“Cointegration and Error Correction Analysis” on page 9-107
Represent Time Series Models Using Econometrics Toolbox Objects
Represent Time Series Models Using Econometrics Toolbox Objects In this section... “Model Objects” on page 1-7 “Model Properties” on page 1-10 “Create Model Object” on page 1-11 “Retrieve Model Properties” on page 1-15 “Modify Model Properties” on page 1-16 “Object Functions” on page 1-17
Model Objects Econometrics Toolbox includes a number of model objects used to represent a variety of discretetime, time series models. The supported models are univariate or multivariate, linear or nonlinear, and standard or Bayesian. Model specification tests (see “Specification Testing”), economic theory, or your analysis goals can suggest a model, or set of models, for your data. After preprocessing your data, running specification tests, and selecting a set of candidate models, create the objects that best represent the models in MATLAB® to proceed with your analysis. How you create an object depends on the object type. In general, you create a model object on page 1-11 by calling the object using its name and providing values for the corresponding model parameters. Models contain two main types of parameters: model infrastructure parameters, such as model dimensionality or number of lags, and estimable parameters, such as coefficients and an error variance. Objects store the parameter values, and other information, in model properties on page 110. You operate on models by passing them and possibly other inputs, such as data, to object functions on page 1-17. The following tables contain the objects available with Econometrics Toolbox. Univariate Linear Model Objects This table contains the available objects that represent univariate linear models. You can create some models by using the Econometric Modeler app. Model
Object
Econometric Modeler Support?
Integrated, autoregressive, moving average (ARIMA) model optionally containing exogenous predictor variables (ARIMAX) or seasonal components (SARIMA)
arima
Yes
Regression model with ARIMA errors
regARIMA
Yes
Generalized autoregressive conditional heteroscedasticity garch model (GARCH)
Yes
Exponential GARCH model
egarch
Yes
Glosten-Jagannathan-Runkle model
gjr
Yes
1-7
1
Getting Started
Multivariate Linear Model Objects This table contains the available objects that represent multivariate linear models. Description
Object
Econometric Modeler Support?
Vector autoregression model (VAR) optionally containing exogenous predictor variables (VARX)
varm
Yes
Vector error-correction (VEC), or cointegrated VAR, model vecm optionally containing exogenous predictor variables (VECX)
Yes
Nonlinear Model Objects Nonlinear models included with Econometrics Toolbox are nonlinear because at least one model parameter or coefficient is time-varying. Regime-switching and time-varying state-space models have this characteristic. This table contains the available objects that represent multivariate nonlinear models. Description
Object
Notes
Econometric Modeler Support?
Discrete-state threshold-switching dynamic regression model
tsVAR
A tsVAR object is the No composition of arima or varm objects, specifying the dynamic structure in each state, and a threshold object, specifying the switching mechanism ( see “Other Models” on page 110).
Discrete-state Markov-switching dynamic regression model
msVAR
An msVAR object is the No composition of arima or varm objects, specifying the dynamic structure in each state, and a dtmc object, specifying the switching mechanism ( see “Other Models” on page 110).
Standard, continuous state-space model ssm optionally containing exogenous predictor variables
You can specify coefficient No matrices explicitly or implicitly by supplying a custom function
Continuous state-space model with diffuse initial states optionally containing exogenous predictor variables
You can specify coefficient No matrices explicitly or implicitly by supplying a custom function
dssm
Bayesian Model Objects Econometrics Toolbox includes objects that represent a Bayesian view of some of the available models. A Bayesian model object specifies the parametric form of the model and the prior distributions on the parameters. 1-8
Represent Time Series Models Using Econometrics Toolbox Objects
Bayesian Linear Regression Model Objects
Bayesian linear regression model objects specify a linear regression model for a univariate response variable and the joint prior distribution of the regression coefficients and disturbance variance. In addition to standard Bayesian linear regression, several objects implement Bayesian predictor selection. This table contains the available objects that represent Bayesian linear regression models. To create a Bayesian linear regression model object, you can call the object by name or use the bayeslm function. Description
Object
Econometric Modeler Support?
Normal-inverse-gamma conjugate prior model. The regression coefficients and disturbance variance are dependent random variables.
conjugateblm No
Normal-inverse-gamma semiconjugate prior model. The regression coefficients and disturbance variance are independent random variables.
semiconjugat No eblm
Joint prior distribution is proportional to the inverse of the disturbance variance.
diffuseblm
Joint prior distribution is specified by a random sample from the respective distributions.
empiricalblm No
No
Joint prior distribution is specified in a custom function that customblm you write.
No
Bayesian lasso regression
No
lassoblm
Stochastic search variable selection (SSVS). The regression mixconjugate No coefficients and disturbance variance are dependent blm random variables (the prior and posterior distributions are conjugate). SSVS. The regression coefficients and disturbance variance mixsemiconju No are independent random variables (the prior and posterior gateblm distributions are semiconjugate). Bayesian VAR Model
Bayesian VAR model objects specify a VAR model for the multivariate response variable and the joint prior distribution of the linear coefficient matrices and innovations covariance matrix. This table contains the available objects that represent Bayesian VAR models. To create a Bayesian VAR model object, you can call the object by name or use the bayesvarm function. Description
Object
Econometric Modeler Support?
Normal conjugate prior on the coefficients and fixed covariance
normalbvarm
No
Matrix-normal-inverse-Wishart conjugate prior model. The conjugatebva No VAR coefficients and innovations covariance are dependent rm random variables.
1-9
1
Getting Started
Description
Object
Econometric Modeler Support?
Matrix-normal-inverse-Wishart semiconjugate prior model. The VAR coefficients and innovations covariance are independent random variables.
semiconjugat No ebvarm
Joint prior distribution is proportional to the inverse of the determinant of the innovations covariance.
diffusebvarm No
Joint prior distribution is specified by a random sample from the respective distributions.
empiricalbva No rm
Bayesian State-Space Model
Bayesian state-space model objects specify a linear Gaussian state-space model the multivariate response variable and the joint prior distribution of the parameters. To create a Bayesian state-space model object, call the bssm function. Custom functions you write determine the structure of the statespace model and the joint prior distribution of the parameters. Other Models Econometrics Toolbox includes several objects that you cannot directly fit to data, but are useful for experimenting, characterizing, and visualizing dynamic systems. This table contains the available objects. Description
Object
Estimation
Econometric Modeler Support?
Threshold transitions characterized by transition mid-levels and a transition type
thresho Estimate threshold transitions No ld of a threshold-switching dynamic regression model tsVAR.
Discrete-time Markov chain characterized by a transition matrix
dtmc
Estimate the transition matrix No of a Markov-switching dynamic regression model msVAR.
Lag operator polynomial
LagOp
Not directly estimable
No
Model Properties A model object holds all the information necessary for characterizing the model and performing operations, such as estimation and forecasting. This information is model dependent, but it can include the following quantities: • Parametric form of the model • Number of model parameters (e.g., the degree of the model) • Innovation distribution (Gaussian or Student’s t) • Amount of presample data needed to initialize the model Such pieces of information are properties of the model, which are stored as fields within the model object. In this way, a model object resembles a MATLAB data structure (struct array). 1-10
Represent Time Series Models Using Econometrics Toolbox Objects
All model objects have properties according to the econometric models they represent. Each property has a predefined name, which you cannot change. For example, arima supports conditional mean models (multiplicative and additive AR, MA, ARMA, ARIMA, and ARIMAX processes). Every arima model object has these properties, shown with their corresponding names. Property Name
Property Description
Constant
Model constant
AR
Nonseasonal AR coefficients
MA
Nonseasonal MA coefficients
SAR
Seasonal AR coefficients (in a multiplicative model)
SMA
Seasonal MA coefficients (in a multiplicative model)
D
Degree of nonseasonal differencing
Seasonality
Degree of seasonal differencing
Variance
Variance of the innovation distribution
Distribution
Parametric family of the innovation distribution
P
Amount of presample data needed to initialize the AR component of the model
Q
Amount of presample data needed to initialize the MA component of the model
Create Model Object Create a model object by using its creation function and assigning values to model properties. Objects require values for model infrastructure parameters, either specified directly or inferred by other inputs. Estimable parameters can be specified or unspecified. The creation function assigns default values to any properties you do not, or cannot, specify. Tip It is good practice to be aware of the default property values for any model you create. You can fully specify a model by specifying all parameter values, or partially specify a model by providing only values of the required, model infrastructure parameters and optionally some estimable parameters. In most cases, an estimable parameter is configured for estimation when its value is NaN, which is the default value for estimable parameters for most models. Some objects accept a custom function specifying the model form. Most objects support parameter estimation. For example, to create a model object representing a particular ARIMA model, use the arima function and specify at least the autoregressive and moving average polynomial degrees and the degree of nonseasonal integration. The function creates the model object of the corresponding type (arima) in the MATLAB workspace, as shown in the figure.
1-11
1
Getting Started
You can work with model objects as you would with any other variable in MATLAB. For example, you can assign the object variable a name, view it in the MATLAB Workspace, and display its value in the Command Window by typing its name. When a model object exists in the workspace, double-click its name in the Workspace window to open the Variable Editor. The Variable Editor shows all model properties and their names. This image shows a workspace containing an arima model named Mdl. Each property name is assigned a value. You can access or reassign writable properties by using dot notation, for example, Mdl.Constant = NaN; In addition to having a predefined name, each model property has a predefined data type. When assigning or modifying a property’s value, the assignment must be consistent with the property data type. For example, the arima properties have these data types. Property Name
Property Data Type
Constant
Scalar
AR
Cell array
MA
Cell array
SAR
Cell array
SMA
Cell array
D
Nonnegative integer
Seasonality
Nonnegative integer
Variance
Positive scalar
Distribution
struct array
P
Nonnegative integer (you cannot specify)
Q
Nonnegative integer (you cannot specify)
Specify an AR(2) Model To illustrate assigning property values, consider specifying the AR(2) model yt = 0 . 8yt − 1 − 0 . 2yt − 2 + εt, where the innovations are independent and identically distributed normal random variables with mean 0 and variance 0.2. Because the equation is a conditional mean model, use arima to create an 1-12
Represent Time Series Models Using Econometrics Toolbox Objects
object that represents the model. Assign values to model properties by using name-value pair arguments. This model has two AR coefficients, 0.8 and -0.2. Assign these values to the property AR as a cell array, {0.8,-0.2}. Assign the value 0.2 to Variance, and 0 to Constant. You do not need to assign a value to Distribution because the default innovation distribution is 'Gaussian'. There are no MA terms, seasonal terms, or degrees of integration, so do not assign values to these properties. You cannot specify values for the properties P and Q. In summary, specify the model as follows: Mdl = arima('AR',{0.8,-0.2},'Variance',0.2,'Constant',0) Mdl = arima with properties: Description: SeriesName: Distribution: P: D: Q: Constant: AR: SAR: MA: SMA: Seasonality: Beta: Variance:
"ARIMA(2,0,0) Model (Gaussian Distribution)" "Y" Name = "Gaussian" 2 0 0 0 {0.8 -0.2} at lags [1 2] {} {} {} 0 [1×0] 0.2
The output displays the value of the created model, Mdl. Notice that the property Seasonality is not in the output. Seasonality only displays for models with seasonal integration. The property is still present, however, as seen in the Variable Editor.
1-13
1
Getting Started
Mdl has values for every arima property, even though the specification included only three. arima assigns default values for the unspecified properties. The values of SAR, MA, and SMA are empty cell arrays because the model has no seasonal or MA terms. The values of D and Seasonality are 0 because there is no nonseasonal or seasonal differencing. arima sets: • P equal to 2, the number of presample observations needed to initialize an AR(2) model. • Q equal to 0 because there is no MA component to the model (i.e., no presample innovations are needed). Specify a GARCH(1,1) Model As another illustration, consider specifying the GARCH(1,1) model yt = εt, where εt = σtzt σt2 = κ + γ1σt2− 1 + α1εt2− 1 Assume zt follows a standard normal distribution. 1-14
Represent Time Series Models Using Econometrics Toolbox Objects
This model has one GARCH coefficient (corresponding to the lagged variance term) and one ARCH coefficient (corresponding to the lagged squared innovation term), both with unknown values. To specify this model, enter: Mdl = garch('GARCH',NaN,'ARCH',NaN) Mdl = garch with properties: Description: SeriesName: Distribution: P: Q: Constant: GARCH: ARCH: Offset:
"GARCH(1,1) Conditional Variance Model (Gaussian Distribution)" "Y" Name = "Gaussian" 1 1 NaN {NaN} at lag [1] {NaN} at lag [1] 0
The default value for the constant term is also NaN. Parameters with NaN values need to be estimated or otherwise specified before you can forecast or simulate the model. There is also a shorthand syntax to create a default GARCH(1,1) model: Mdl = garch(1,1) Mdl = garch with properties: Description: SeriesName: Distribution: P: Q: Constant: GARCH: ARCH: Offset:
"GARCH(1,1) Conditional Variance Model (Gaussian Distribution)" "Y" Name = "Gaussian" 1 1 NaN {NaN} at lag [1] {NaN} at lag [1] 0
The shorthand syntax returns a GARCH model with one GARCH coefficient and one ARCH coefficient, with default NaN values.
Retrieve Model Properties The property values in an existing model are retrievable. Working with models resembles working with struct arrays because you can access model properties using dot notation. That is, type the model name, then the property name, separated by '.' (a period). For example, consider the arima model with this AR(2) specification: Mdl = arima('AR',{0.8,-0.2},'Variance',0.2,'Constant',0);
To display the value of the property AR for the created model, enter: arCoefficients = Mdl.AR
1-15
1
Getting Started
arCoefficients=1×2 cell array {[0.8000]} {[-0.2000]}
AR is a cell array, so you must use cell-array syntax. The coefficient cell arrays are lag-indexed, so entering secondARCoefficient = Mdl.AR{2} secondARCoefficient = -0.2000
returns the coefficient at lag 2. You can also assign any property value to a new variable: ar = Mdl.AR ar=1×2 cell array {[0.8000]} {[-0.2000]}
Modify Model Properties You can also modify model properties using dot notation. For example, consider this AR(2) specification: Mdl = arima('AR',{0.8,-0.2},'Variance',0.2,'Constant',0) Mdl = arima with properties: Description: SeriesName: Distribution: P: D: Q: Constant: AR: SAR: MA: SMA: Seasonality: Beta: Variance:
"ARIMA(2,0,0) Model (Gaussian Distribution)" "Y" Name = "Gaussian" 2 0 0 0 {0.8 -0.2} at lags [1 2] {} {} {} 0 [1×0] 0.2
The created model has the default Gaussian innovation distribution. Change the innovation distribution to a Student's t distribution with eight degrees of freedom. The data type for Distribution is a struct array. Mdl.Distribution = struct('Name','t','DoF',8) Mdl = arima with properties: Description: "ARIMA(2,0,0) Model (t Distribution)" SeriesName: "Y" Distribution: Name = "t", DoF = 8
1-16
Represent Time Series Models Using Econometrics Toolbox Objects
P: D: Q: Constant: AR: SAR: MA: SMA: Seasonality: Beta: Variance:
2 0 0 0 {0.8 -0.2} at lags [1 2] {} {} {} 0 [1×0] 0.2
The variable Mdl is updated accordingly.
Object Functions Object functions are functions that accept model objects as inputs and perform an operation on the model and other inputs. In Econometrics Toolbox, these functions, which represent steps in an econometrics analysis workflow, accept most objects included in the toolbox: • estimate • forecast • simulate Models that you can fit to data have these three methods in common, but the model objects in the toolbox can have other object functions. Object functions can distinguish between model objects (e.g., an arima model vs. a garch model). That is, some object functions accept different optional inputs and return different outputs depending on the type of model that is input. Find object function reference pages for a specific model by entering, for example, doc arima/ estimate.
See Also Related Examples •
“Econometric Modeling” on page 1-3
1-17
1
Getting Started
Stochastic Process Characteristics In this section... “What Is a Stochastic Process?” on page 1-18 “Stationary Processes” on page 1-19 “Linear Time Series Model” on page 1-19 “Unit Root Process” on page 1-20 “Lag Operator Notation” on page 1-21 “Characteristic Equation” on page 1-22
What Is a Stochastic Process? A time series yt is a collection of observations on a variable indexed sequentially over several time points t = 1, 2,...,T. Time series observations y1, y2,...,yT are inherently dependent. From a statistical modeling perspective, this means it is inappropriate to treat a time series as a random sample of independent observations. The goal of statistical modeling is finding a compact representation of the data-generating process for your data. The statistical building block of econometric time series modeling is the stochastic process. Heuristically, a stochastic process is a joint probability distribution for a collection of random variables. By modeling the observed time series yt as a realization from a stochastic process y = yt; t = 1, ..., T , it is possible to accommodate the high-dimensional and dependent nature of the data. The set of observation times T can be discrete or continuous. “Figure 1-1, Monthly Average CO2” on page 1-19 displays the monthly average CO2 concentration (ppm) recorded by the Mauna Loa Observatory in Hawaii from 1980 to 2012 [3].
1-18
Stochastic Process Characteristics
Figure 1-1, Monthly Average CO2
Stationary Processes Stochastic processes are weakly stationary or covariance stationary (or simply, stationary) if their first two moments are finite and constant over time. Specifically, if yt is a stationary stochastic process, then for all t: • E(yt) = μ < ∞. • V(yt) = σ2 < ∞. • Cov(yt, yt–h) = γh for all lags h ≠ 0. Does a plot of your stochastic process seem to increase or decrease without bound? The answer to this question indicates whether the stochastic process is stationary. “Yes” indicates that the stochastic process might be nonstationary. In “Figure 1-1, Monthly Average CO2” on page 1-19, the concentration of CO2 is increasing without bound which indicates a nonstationary stochastic process.
Linear Time Series Model Wold’s theorem [2] states that you can write all weakly stationary stochastic processes in the general linear form yt = μ +
∞
∑
i=1
ψiεt − i + εt .
1-19
1
Getting Started
Here, εt denotes a sequence of uncorrelated (but not necessarily independent) random variables from a well-defined probability distribution with mean zero. It is often called the innovation process because it captures all new information in the system at time t.
Unit Root Process A linear time series model is a unit root process if the solution set to its characteristic equation on page 1-22 contains a root that is on the unit circle (i.e., has an absolute value of one). Subsequently, the expected value, variance, or covariance of the elements of the stochastic process grows with time, and therefore is nonstationary. If your series has a unit root, then differencing it might make it stationary. For example, consider the linear time series model yt = yt − 1 + εt, where εt is a white noise sequence of innovations with variance σ2 (this is called the random walk). The characteristic equation of this model is z − 1 = 0, which has a root of one. If the initial observation y0 is fixed, then you can write the model as yt = y0 +
t
∑
i=1
εi . Its expected value is y0, which is independent of time. However, the
variance of the series is tσ2, which grows with time making the series unstable. Take the first difference to transform the series and the model becomes dt = yt − yt − 1 = εt. The characteristic equation for this series is z = 0, so it does not have a unit root. Note that • E(dt) = 0, which is independent of time, • V(dt) = σ2, which is independent of time, and • Cov(dt, dt − s) = 0, which is independent of time for all integers 0 < s < t. “Figure 1-1, Monthly Average CO2” on page 1-19 appears nonstationary. What happens if you plot the first difference dt = yt – yt–1 of this series? “Figure 1-2, Monthly Difference in CO2” on page 1-21 displays the dt. Ignoring the fluctuations, the stochastic process does not seem to increase or decrease in general. You can conclude that dt is stationary, and that yt is unit root nonstationary. For details, see “Differencing” on page 2-3.
1-20
Stochastic Process Characteristics
Figure 1-2, Monthly Difference in CO2
Lag Operator Notation The lag operator L operates on a time series yt such that Li yt = yt − i. An mth-degree lag polynomial of coefficients b1, b2,...,bm is defined as B(L) = (1 + b1L + b2L2 + … + bmLm) . In lag operator notation, you can write the general linear model using an infinite-degree polynomial ψ(L) = (1 + ψ1L + ψ2L2 + …), yt = μ + ψ(L)εt . You cannot estimate a model that has an infinite-degree polynomial of coefficients with a finite amount of data. However, if ψ(L) is a rational polynomial (or approximately rational), you can write it (at least approximately) as the quotient of two finite-degree polynomials. Define the q-degree polynomial θ(L) = (1 + θ1L + θ2L2 + … + θqLq) and the p-degree polynomial ϕ(L) = (1 + ϕ1L + ϕ2L2 + … + ϕpLp). If ψ(L) is rational, then ψ(L) =
θ(L) . ϕ(L) 1-21
1
Getting Started
Thus, by Wold’s theorem, you can model (or closely approximate) every stationary stochastic process as yt = μ +
θ(L) ε, ϕ(L) t
which has p + q coefficients (a finite number).
Characteristic Equation A degree p characteristic polynomial of the linear time series model yt = ϕ1 yt − 1 + ϕ2 yt − 2 + ... + ϕp yt − p + εt is ϕ(a) = ap − ϕ1ap − 1 − ϕ2ap − 2 − ... − ϕp . It is another way to assess that a series is a stationary process. For example, the characteristic equation of yt = 0.5yt − 1 − 0.02yt − 2 + εt is ϕ(a) = a2 − 0.5a + 0.02. The roots of the homogeneous characteristic equation ϕ(a) = 0 (called the characteristic roots) determine whether the linear time series is stationary. If every root in ϕ(a) lies inside the unit circle, then the process is stationary. Roots lie within the unit circle if they have an absolute value less than one. This is a unit root process if one or more roots lie inside the unit circle (i.e., have absolute value of one). Continuing the example, the characteristic roots of ϕ(a) = 0 are a = 0.4562, 0.0438 . Since the absolute values of these roots are less than one, the linear time series model is stationary.
References [1] Box, G. E. P., G. M. Jenkins, and G. C. Reinsel. Time Series Analysis: Forecasting and Control. 3rd ed. Englewood Cliffs, NJ: Prentice Hall, 1994. [2] Wold, H. A Study in the Analysis of Stationary Time Series. Uppsala, Sweden: Almqvist & Wiksell, 1938. [3] Tans, P., and R. Keeling. (2012, August). “Trends in Atmospheric Carbon Dioxide.” NOAA Research. Retrieved October 5, 2012 from https://gml.noaa.gov/ccgg/trends/ mlo.html.
See Also Related Examples
1-22
•
“Creating Univariate Conditional Mean Models” on page 7-3
•
“Specify GARCH Models” on page 8-6
•
“Specify EGARCH Models” on page 8-17
•
“Specify GJR Models” on page 8-28
•
“Simulate Stationary Processes” on page 7-147
•
“Assess Stationarity of a Time Series” on page 3-50
Stochastic Process Characteristics
More About •
“Econometric Modeling” on page 1-3
•
“What Are Conditional Mean Models?” on page 7-13
•
“Conditional Variance Models” on page 8-2
1-23
1
Getting Started
Bibliography [1] Ait-Sahalia, Y. “Testing Continuous-Time Models of the Spot Interest Rate.” The Review of Financial Studies. Spring 1996, Vol. 9, No. 2, pp. 385–426. [2] Ait-Sahalia, Y. "Transition Densities for Interest Rate and Other Nonlinear Diffusions." The Journal of Finance. Vol. 54, No. 4, August 1999. [3] Akaike, Hirotugu. "Information Theory and an Extension of the Maximum Likelihood Principle.” In Selected Papers of Hirotugu Akaike, edited by Emanuel Parzen, Kunio Tanabe, and Genshiro Kitagawa, 199–213. New York: Springer, 1998. https://doi.org/10.1007/978-1-4612-1694-0_15. [4] Akaike, Hirotugu. “A New Look at the Statistical Model Identification.” IEEE Transactions on Automatic Control 19, no. 6 (December 1974): 716–23. https://doi.org/10.1109/ TAC.1974.1100705. [5] Almon, S. "The Distributed Lag Between Capital Appropriations and Expenditures." Econometrica. Vol. 33, 1965, pp. 178–196. [6] Amano, R. A., and S. van Norden. "Unit Root Tests and the Burden of Proof." Bank of Canada. Working paper 92–7, 1992. [7] Andrews, D. W. K. "Heteroskedasticity and Autocorrelation Consistent Covariance Matrix Estimation." Econometrica. Vol. 59, 1991, pp. 817–858. [8] Andrews, D. W. K., and J. C. Monohan. "An Improved Heteroskedasticity and Autocorrelation Consistent Covariance Matrix Estimator." Econometrica. Vol. 60, 1992, pp. 953–966. [9] Baillie, R. T., and T. Bollerslev. “Prediction in Dynamic Models with Time-Dependent Conditional Variances.” Journal of Econometrics. Vol. 52, 1992, pp. 91–113. [10] Banerjee, A. N., and J. R. Magnus. "On the Sensitivity of the Usual t- and F-Tests to Covariance Misspecification." Journal of Econometrics. Vol. 95, 2000, pp. 157–176. [11] Barone-Adesi, G., K. Giannopoulos, and L. Vosper. "VaR without Correlations for Non-Linear Portfolios." Journal of Futures Markets. Vol. 19, 1999, pp. 583–602. [12] Baxter, Marianne, and Robert G. King. "Measuring Business Cycles: Approximate Band-Pass Filters for Economic Time Series." Review of Economics and Statistics 81, no. 4 (November 1999): 575–93. https://doi.org/10.1162/003465399558454. [13] Belsley, D. A., E. Kuh, and R. E. Welsh. Regression Diagnostics. New York, NY: John Wiley & Sons, Inc., 1980. [14] Bera, A. K., and H. L. Higgins. “A Survey of ARCH Models: Properties, Estimation and Testing.” Journal of Economic Surveys. Vol. 7, No. 4, 1993. [15] Beveridge, Stephen, and Charles R. Nelson. "A New Approach to Decomposition of Economic Time Series into Permanent and Transitory Components with Particular Attention to Measurement of the 'Business Cycle.'" Journal of Monetary Economics 7 (January 1981): 151– 74. https://doi.org/10.1016/0304-3932(81)90040-4. [16] Bohrnstedt, G. W., and T. M. Carter. "Robustness in Regression Analysis." In Sociological Methodology, H. L. Costner, editor, pp. 118–146. San Francisco: Jossey-Bass, 1971. 1-24
Bibliography
[17] Bollerslev, T. “A Conditionally Heteroskedastic Time Series Model for Speculative Prices and Rates of Return.” Review of Economics and Statistics. Vol. 69, 1987, pp. 542–547. [18] Bollerslev, T. “Generalized Autoregressive Conditional Heteroskedasticity.” Journal of Econometrics. Vol. 31, 1986, pp. 307–327. [19] Bollerslev, T., R. Y. Chou, and K. F. Kroner. “ARCH Modeling in Finance: A Review of the Theory and Empirical Evidence.” Journal of Econometrics. Vol. 52, 1992, pp. 5–59. [20] Bollerslev, T., R. F. Engle, and D. B. Nelson. “ARCH Models.” Handbook of Econometrics. Vol. 4, Chapter 49, Amsterdam: Elsevier Science B.V., 1994, pp. 2959–3038. [21] Bollerslev, T., and E. Ghysels. “Periodic Autoregressive Conditional Heteroscedasticity.” Journal of Business and Economic Statistics. Vol. 14, 1996, pp. 139–151. [22] Bouye, E., V. Durrleman, A. Nikeghbali, G. Riboulet, and Roncalli, T. "Copulas for Finance: A Reading Guide and Some Applications." Groupe de Rech. Oper., Credit Lyonnais, Paris, 2000. [23] Box, G. E. P., and D. R. Cox. "An Analysis of Transformations". Journal of the Royal Statistical Society. Series B, Vol. 26, 1964, pp. 211–252. [24] Box, G. E. P. and D. Pierce. "Distribution of Residual Autocorrelations in AutoregressiveIntegrated Moving Average Time Series Models." Journal of the American Statistical Association. Vol. 65, 1970, pp. 1509–1526. [25] Box, George E. P., Gwilym M. Jenkins, and Gregory C. Reinsel. Time Series Analysis: Forecasting and Control. 3rd ed. Englewood Cliffs, NJ: Prentice Hall, 1994. [26] Brandolini, D., M. Pallotta, and R. Zenti. "Risk Management in an Asset Management Company: A Practical Case." Presented at EMFA 2001, Lugano, Switzerland. 2000. [27] Breusch, T.S., and L. G. Godfrey. "A Review of Recent Work on Testing for Autocorrelation in Dynamic Simultaneous Models." In Currie, D., R. Nobay, and D. Peel (Eds.), Macroeconomic Analysis: Essays in Macroeconomics and Econometrics. London: Croom Helm, 1981. [28] Breusch, T.S., and Pagan, A.R. "Simple test for heteroscedasticity and random coefficient variation". Econometrica. v. 47, 1979, pp. 1287–1294. [29] Brieman, L., J. H. Friedman, R. A. Olshen, and C. J. Stone. Classification and Regression Trees. Boca Raton, FL: Chapman & Hall/CRC, 1984. [30] Brockwell, P. J. and R. A. Davis. Introduction to Time Series and Forecasting. 2nd ed. New York, NY: Springer, 2002. [31] Brooks, C., S. P. Burke, and G. Persand. “Benchmarks and the Accuracy of GARCH Model Estimation.” International Journal of Forecasting. Vol. 17, 2001, pp. 45–56. [32] Brown, Bryan W., and Roberto S. Mariano. "Predictors in Dynamic Nonlinear Models: LargeSample Behavior." Econometric Theory, 5 (December 1989): 430–52. https://doi.org/10.1017/ S0266466600012603. [33] Brown, M. B. and Forsythe, A. B. "Robust Tests for Equality of Variances." Journal of the American Statistical Association. 69, 1974, pp. 364–367. [34] Burke, S. P. "Confirmatory Data Analysis: The Joint Application of Stationarity and Unit Root Tests." University of Reading, UK. Discussion paper 20, 1994. 1-25
1
Getting Started
[35] Burnham, Kenneth P., and David R. Anderson. Model Selection and Multimodel Inference: A Practical Information-Theoretic Approach. 2nd ed, New York: Springer, 2002. [36] Burns, Arthur F., and Wesley C. Mitchell. Measuring Business Cycles. Cambridge, MA: National Bureau of Economic Research, 1946. [37] Campbell, J. Y., A. W. Lo, and A. C. MacKinlay. Chapter 12. “The Econometrics of Financial Markets.” Nonlinearities in Financial Data. Princeton, NJ: Princeton University Press, 1997. [38] Caner, M., and L. Kilian. “Size Distortions of Tests of the Null Hypothesis of Stationarity: Evidence and Implications for the PPP Debate.” Journal of International Money and Finance. Vol. 20, 2001, pp. 639–657. [39] Cecchetti, S. G., and P. S. Lam. “Variance-Ratio Tests: Small-Sample Properties with an Application to International Output Data.” Journal of Business and Economic Statistics. Vol. 12, 1994, pp. 177–186. [40] Chambers, M. J. "Jackknife Estimation of Stationary Autoregressive Models." University of Essex Discussion Paper No. 684, 2011. [41] Chauvet, M., and J. D. Hamilton. "Dating Business Cycle Turning Points." In Nonlinear Analysis of Business Cycles (Contributions to Economic Analysis, Volume 276). (C. Milas, P. Rothman, and D. van Dijk, eds.). Amsterdam: Emerald Group Publishing Limited, 2006. [42] Chow, G. C. "Tests of Equality Between Sets of Coefficients in Two Linear Regressions." Econometrica. Vol. 28, 1960, pp. 591–605. [43] Christiano, Lawrence J., and Terry J. Fitzgerald. "The Band Pass Filter." International Economic Review 44 (May 2003): 435–65. https://doi.org/10.1111/1468-2354.t01-1-00076. [44] Christoffersen, P.F. Elements of Financial Risk Management. Waltham, MA: Academic Press, 2002. [45] Clarke, K. A. "The Phantom Menace: Omitted Variable Bias in Econometric Research." Conflict Management and Peace Science. Vol. 22, 2005, pp. 341–352. [46] Clark, Peter K. "The Cyclical Component of U. S. Economic Activity." The Quarterly Journal of Economics 102, no. 4 (November 1987): 797–814. https://doi.org/10.2307/1884282. [47] Clements, Michael P., and Jeremy Smith. "The Performance of Alternative Forecasting Methods for SETAR Models." International Journal of Forecasting, 13 (December 1997): 463–75. https://doi.org/10.1016/S0169-2070(97)00017-4. [48] Cochrane, J. "How Big is the Random Walk in GNP?" Journal of Political Economy. Vol. 96, 1988, pp. 893–920. [49] Cogley, Timothy, and James M. Nason. "Effects of the Hodrick-Prescott Filter on Trend and Difference Stationary Time Series Implications for Business Cycle Research." Journal of Economic Dynamics and Control 19, no. 1 (January1995): 253–78. https://doi.org/ 10.1016/0165-1889(93)00781-X. [50] Congressional Budget Office, Budget and Economic Data, 10-Year Economic Projections, https://www.cbo.gov/data/budget-economic-data. 1-26
Bibliography
[51] Cribari-Neto, F. "Asymptotic Inference Under Heteroskedasticity of Unknown Form." Computational Statistics & Data Analysis. Vol. 45, 2004, pp. 215-233. [52] Cramér, H. Mathematical Methods of Statistics. Princeton, NJ: Princeton University Press, 1946. [53] Dagum, E. B. The X-11-ARIMA Seasonal Adjustment Method. Number 12–564E. Statistics Canada, Ottawa, 1980. [54] Davidson, R., and J. G. MacKinnon. Econometric Theory and Methods. Oxford, UK: Oxford University Press, 2004. [55] Davidson, R., and E. Flachaire. "The Wild Bootstrap, Tamed at Last." Journal of Econometrics. Vol. 146, 2008, pp. 162–169. [56] de Jong, Robert M., and Neslihan Sakarya. "The Econometrics of the Hodrick-Prescott Filter." Review of Economics and Statistics 98, no. 2 (May 2016): 310–17. https://doi.org/10.1162/ REST_a_00523. [57] Del Negro, M., Schorfheide, F., Smets, F. and Wouters, R. "On the Fit of New Keynesian Models." Journal of Business & Economic Statistics. Vol. 25, No. 2, 2007, pp. 123–162. [58] Diebold, F. X. Elements of Forecasting. Mason, OH: Thomson Higher Education, 2007. [59] Diebold, F.X., and C. Li. "Forecasting the Term Structure of Government Bond Yields." Journal of Econometrics. Vol. 130, No. 2, 2006, pp. 337–364. [60] Diebold, F. X., G. D. Rudebusch, and B. Aruoba (2006), "The Macroeconomy and the Yield Curve: A Dynamic Latent Factor Approach." Journal of Econometrics. Vol. 131, 2006, pp. 309–338. [61] Diebold, F.X., and G.D. Rudebusch. Business Cycles: Durations, Dynamics, and Forecasting. Princeton, NJ: Princeton University Press, 1999. [62] den Haan, W. J., and A. Levin. "A Practitioner's Guide to Robust Covariance Matrix Estimation." In Handbook of Statistics. Edited by G. S. Maddala and C. R. Rao. Amsterdam: Elsevier, 1997. [63] Dickey, D. A., and W. A. Fuller. “Distribution of the Estimators for Autoregressive Time Series with a Unit Root.” Journal of the American Statistical Association. Vol. 74, 1979, pp. 427–431. [64] Dickey, D. A., and W. A. Fuller. “Likelihood Ratio Statistics for Autoregressive Time Series with a Unit Root.” Econometrica. Vol. 49, 1981, pp. 1057–1072. [65] Dowd, K. Measuring Market Risk. West Sussex: John Wiley & Sons, 2005. [66] Durbin J., and S. J. Koopman. “A Simple and Efficient Simulation Smoother for State Space Time Series Analysis.” Biometrika. Vol 89., No. 3, 2002, pp. 603–615. [67] Durbin J., and S. J. Koopman. Time Series Analysis by State Space Methods. 2nd ed. Oxford: Oxford University Press, 2012. [68] Durbin, J. and G.S. Watson. "Testing for Serial Correlation in Least Squares Regression." Biometrika. Vol. 37, 1950, pp. 409–428. [69] Elder, J., and P. E. Kennedy. “Testing for Unit Roots: What Should Students Be Taught?” Journal of Economic Education. Vol. 32, 2001, pp. 137–146. 1-27
1
Getting Started
[70] Embrechts, P., A. McNeil, and D. Straumann. "Correlation and Dependence in Risk Management: Properties and Pitfalls". Risk Management: Value At Risk and Beyond. Cambridge: Cambridge University Press, 1999, pp. 176–223. [71] Enders, Walter. Applied Econometric Time Series. Hoboken, NJ: John Wiley & Sons, Inc., 1995. [72] Engle, Robert. F. “Autoregressive Conditional Heteroscedasticity with Estimates of the Variance of United Kingdom Inflation.” Econometrica 50 (July 1982): 987–1007. https://doi.org/ 10.2307/1912773. [73] Engle, R. F. and C. W. J. Granger. “Co-Integration and Error-Correction: Representation, Estimation, and Testing.” Econometrica. v. 55, 1987, pp. 251–276. [74] Engle, R. F., D. M. Lilien, and R. P. Robins. “Estimating Time Varying Risk Premia in the Term Structure: The ARCH-M Model.” Econometrica. Vol. 59, 1987, pp. 391–407. [75] Faust, J. “When Are Variance Ratio Tests for Serial Dependence Optimal?” Econometrica. Vol. 60, 1992, pp. 1215–1226. [76] Fernández-Villaverde, Jesús, Rubio-Ramírez, Juan F., and Schorfheide, Frank. "Solution and Estimation Methods for DSGE Models." Handbook of Macroeconomics 2 (November 2016) 527–724. https://doi.org/10.1016/bs.hesmac.2016.03.006. [77] Findley, D. F., B. C. Monsell, W. R. Bell, M. C. Otto, and B.-C. Chen. "New Capabilities and Methods of the X-12-ARIMA Seasonal-Adjustment Program." Journal of Business & Economic Statistics. Vol. 16, Number 2, 1998, pp. 127–152. [78] Fisher, F. M. "Tests of Equality Between Sets of Coefficients in Two Linear Regressions: An Expository Note." Econometrica. Vol. 38, 1970, pp. 361–66. [79] Fisher, R. A.. "Frequency Distribution of the Values of the Correlation Coefficient in Samples from an Indefinitely Large Population." Biometrika. Vol. 10, 1915, pp. 507–521. [80] Fisher, R. A. "On the "Probable Error" of a Coefficient of Correlation Deduced from a Small Sample." Metron. Vol. 1, 1921, pp. 3–32. [81] Fisher, R. A. "The Distribution of the Partial Correlation Coefficient." Metron. Vol. 3, 1924, pp. 329–332. [82] Gallager, R.G. Stochastic Processes: Theory for Applications. Cambridge, UK: Cambridge University Press, 2013. [83] Gallant, A. R. Nonlinear Statistical Models. Hoboken, NJ: John Wiley & Sons, Inc., 1987. [84] Gilks, W. R., S. Richardson, and D.J. Spiegelhalter. Markov Chain Monte Carlo in Practice. Boca Raton: Chapman & Hall/CRC, 1996. [85] Glasserman, P. Monte Carlo Methods in Financial Engineering. New York: Springer-Verlag, 2004. [86] Glosten, L. R., R. Jagannathan, and D. E. Runkle. “On the Relation between the Expected Value and the Volatility of the Nominal Excess Return on Stocks.” The Journal of Finance. Vol. 48, No. 5, 1993, pp. 1779–1801. [87] Godfrey, L. G. Misspecification Tests in Econometrics. Cambridge, UK: Cambridge University Press, 1997. 1-28
Bibliography
[88] Gourieroux, C. ARCH Models and Financial Applications. New York: Springer-Verlag, 1997. [89] Goutte, C. "Note on Free Lunches and Cross-Validation." Neural Computation. Vol. 9, 1997, pp. 1211–1215. [90] Granger, Clive W. J. "The Typical Spectral Shape of an Economic Variable." Econometrica 34, no. 1 (January 1966): 150–61. https://doi.org/10.2307/1909859. [91] Granger, C., and P. Newbold. "Forecasting Transformed Series." Journal of the Royal Statistical Society. Series B, Vol. 38, 1976, pp. 189–203. [92] Granger, C. W. J., and P. Newbold. "Spurious Regressions in Econometrics." Journal of Econometrics. Vol. 2, 1974, pp. 111–120. [93] Greene, William. H. Econometric Analysis. 6th ed. Upper Saddle River, NJ: Prentice Hall, 2008. [94] Goldberger, A. T. A Course in Econometrics. Cambridge, MA: Harvard University Press, 1991. [95] Goldfeld, S. M., and Quandt, R. E. "Some Tests for Homoscedasticity". Journal of the American Statistical Association. v. 60, 1965, pp. 539–547. [96] Hagerud, G. E. “Modeling Nordic Stock Returns with Asymmetric GARCH.” Working Paper Series in Economics and Finance. No. 164, Stockholm: Department of Finance, Stockholm School of Economics, 1997. [97] Hagerud, G. E. “Specification Tests for Asymmetric GARCH.” Working Paper Series in Economics and Finance. No. 163, Stockholm: Department of Finance, Stockholm School of Economics, 1997. [98] Haggstrom, O. Finite Markov Chains and Algorithmic Applications. Cambridge, UK: Cambridge University Press, 2002. [99] Hamilton, J. D. "A New Approach to the Economic Analysis of Nonstationary Time Series and the Business Cycle." Econometrica. Vol. 57, 1989, pp. 357–384. [100] Hamilton, J. D. "Analysis of Time Series Subject to Changes in Regime." Journal of Econometrics. Vol. 45, 1990, pp. 39–70. [101] Hamilton, J. D. "Macroeconomic Regimes and Regime Shifts." In Handbook of Macroeconomics. (H. Uhlig and J. Taylor, eds.). Amsterdam: Elsevier, 2016. [102] Hamilton, James D. Time Series Analysis. Princeton, NJ: Princeton University Press, 1994. [103] Hamilton, James D. "Why You Should Never Use the Hodrick-Prescott Filter." The Review of Economics and Statistics 100 (December 2018): 831–43. https://doi.org/10.1162/ rest_a_00706. [104] Hannan, Edward J., and Barry G. Quinn. “The Determination of the Order of an Autoregression.” Journal of the Royal Statistical Society: Series B (Methodological) 41, no. 2 (January 1979): 190–95. https://doi.org/10.1111/j.2517-6161.1979.tb01072.x. [105] Hart, J. D. "Kernel Regression Estimation With Time Series Errors." Journal of the Royal Statistical Society. Series B, Vol. 53, 1991, pp. 173–187. [106] Hastie, T., R. Tibshirani, and J. Friedman. The Elements of Statistical Learning. New York: Springer, 2008. 1-29
1
Getting Started
[107] Haug, A. “Testing Linear Restrictions on Cointegrating Vectors: Sizes and Powers of Wald Tests in Finite Samples.” Econometric Theory. Vol. 18, 2002, pp. 505–524. [108] Helwege, J., and P. Kleiman. "Understanding Aggregate Default Rates of High Yield Bonds." Federal Reserve Bank of New York Current Issues in Economics and Finance. Vol. 2, No. 6, 1996, pp. 1–6. [109] Hendry, D. F. Econometrics: Alchemy or Science? Oxford: Oxford University Press, 2001. [110] Hentschel, L. "All in the Family: Nesting Symmetric and Asymmetric GARCH Models." Journal of Financial Economics. Vol. 39, 1995, pp. 71–104. [111] Hibbs, D. "Problems of Statistical Estimation and Causal Inference in Dynamic Time Series Models." In Costner, H. (Ed.) Sociological Methodology. San Francisco: Jossey-Bass, 1974. [112] Hoerl, A. E., and R. W. Kennard. "Ridge Regression: Applications to Nonorthogonal Problems." Technometrics. Vol. 12, No. 1, 1970, pp. 69–82. [113] Hull, J. C. Options, Futures, and Other Derivatives. 5th ed. Englewood Cliffs, NJ: Prentice Hall, 2002. [114] Hodrick, Robert J. "An Exploration of Trend-Cycle Decomposition Methodologies in Simulated Data." National Bureau of Economic Research Working Paper No. w26750. Social Science Research Network (February 2020). https://papers.ssrn.com/abstract=3539317. [115] Hodrick, Robert J., and Edward C. Prescott. "Postwar U.S. Business Cycles: An Empirical Investigation." Journal of Money, Credit and Banking 29, no. 1 (February 1997): 1–16. https:// doi.org/10.2307/2953682. [116] Horn, R., and C. R. Johnson. Matrix Analysis. Cambridge, UK: Cambridge University Press, 1985. [117] Hyndman, Rob J. "Highest-Density Forecast Regions for Nonlinear and Non-Normal Time Series Models." Journal of Forecasting, 14 (September 1995): 431–41. https://doi.org/10.1002/ for.3980140503. [118] Inder, B. A. "Finite Sample Power of Tests for Autocorrelation in Models Containing Lagged Dependent Variables." Economics Letters. Vol. 14, 1984, pp.179–185. [119] Jarrow, A. Finance Theory. Englewood Cliffs, NJ: Prentice-Hall, 1988. [120] Jarvis, J. P., and D. R. Shier. "Graph-Theoretic Analysis of Finite Markov Chains." In Applied Mathematical Modeling: A Multidisciplinary Approach. Boca Raton: CRC Press, 2000. [121] Johansen, S. Likelihood-Based Inference in Cointegrated Vector Autoregressive Models. Oxford: Oxford University Press, 1995. [122] Johnson, N. L., S. Kotz, and N. Balakrishnan. Continuous Univariate Distributions. Vol. 2, 2nd ed. New York: John Wiley & Sons, 1995. [123] Johnston, J. Econometric Methods. New York: McGraw-Hill, 1972. [124] Jonsson, J. G., and M. Fridson. "Forecasting Default Rates on High Yield Bonds." Journal of Fixed Income. Vol. 6, No. 1, 1996, pp. 69–77. 1-30
Bibliography
[125] Judge, G. G., W. E. Griffiths, R. C. Hill, H. Lϋtkepohl, and T. C. Lee. The Theory and Practice of Econometrics. New York, NY: John Wiley & Sons, Inc., 1985. [126] Juselius, K. The Cointegrated VAR Model. Oxford: Oxford University Press, 2006. [127] Kennedy, P. A Guide to Econometrics. 6th ed. New York: John Wiley & Sons, 2008. [128] Keuzenkamp, H. A., and M. McAleer. "Simplicity, Scientific Inference and Economic Modeling." Economic Journal. Vol. 105, 1995, pp. 1–21. [129] Kiefer, N. M., T. J. Vogelsang, and H. Bunzel. "Simple Robust Testing of Regression Hypotheses." Econometrica. Vol. 68, 2000, pp. 695–714. [130] Kim, C.-J. "Dynamic Linear Models with Markov Switching." Journal of Econometrics. Vol. 60, 1994, pp. 1–22. [131] Kimball, M. "The Quantitative Analytics of the Basic Neomonetarist Model." Journal of Money, Credit, and Banking, Part 2: Liquidity, Monetary Policy, and Financial Intermediation. Vol. 27, No. 4, 1995, pp. 1241–1277. [132] King, M. L. "Robust Tests for Spherical Symmetry and Their Application to Least Squares Regression." Annals of Statistics. Vol. 8, 1980, pp. 1265–1271. [133] Kole, E. "Regime Switching Models: An Example for a Stock Market Index." Rotterdam, NL: Econometric Institute, Erasmus School of Economics, 2010. [134] Koyck, L. M. Distributed Lags Models and Investment Analysis. Amsterdam: North-Holland, 1954. [135] Krolzig, H.-M. Markov-Switching Vector Autoregressions. Berlin: Springer, 1997. [136] Krolzig, H. -M., and Hendry, D.F. "Computer Automation of General-To-Specific Model Selection Procedures." Journal of Economic Dynamics & Control. Vol. 25, 2001, pp. 831–866. [137] Kutner, M. H., C. J. Nachtsheim, J. Neter, and W. Li. Applied Linear Statistical Models. 5th Ed. New York: McGraw-Hill/Irwin, 2005. [138] Kwiatkowski, D., P. C. B. Phillips, P. Schmidt and Y. Shin. “Testing the Null Hypothesis of Stationarity against the Alternative of a Unit Root.” Journal of Econometrics. Vol. 54, 1992, pp. 159–178. [139] Leybourne, S. J. and B. P. M. McCabe. “A Consistent Test for a Unit Root.” Journal of Business and Economic Statistics. Vol. 12, 1994, pp. 157–166. [140] Leybourne, S. J. and B. P. M. McCabe. “Modified Stationarity Tests with Data-Dependent ModelSelection Rules.” Journal of Business and Economic Statistics. Vol. 17, 1999, pp. 264–270. [141] Lin, Jin-Lung, and Clive W. J. Granger. "Forecasting from Non-Linear Models in Practice." Journal of Forecasting, 3 (January 1994): 1–9. https://doi.org/10.1002/for.3980130102. [142] Litterman, Robert B. "Forecasting with Bayesian Vector Autoregressions: Five Years of Experience." Journal of Business and Economic Statistics 4, no. 1 (January 1986): 25–38. https://doi.org/10.2307/1391384. [143] Ljung, G. and G. E. P. Box. "On a Measure of Lack of Fit in Time Series Models." Biometrika. Vol. 66, 1978, pp. 67–72. 1-31
1
Getting Started
[144] Lo, A. W., and A. C. MacKinlay. “Stock Market Prices Do Not Follow Random Walks: Evidence from a Simple Specification Test.” Review of Financial Studies. Vol. 1, 1988, pp. 41–66. [145] Lo, A. W., and A. C. MacKinlay. “The Size and Power of the Variance Ratio Test.” Journal of Econometrics. Vol. 40, 1989, pp. 203–238. [146] Lo, A. W., and A. C. MacKinlay. A Non-Random Walk Down Wall St. Princeton, NJ: Princeton University Press, 2001. [147] Loeffler, G., and P. N. Posch. Credit Risk Modeling Using Excel and VBA. West Sussex, England: Wiley Finance, 2007. [148] Long, J. S., and L. H. Ervin. "Using Heteroscedasticity-Consistent Standard Errors in the Linear Regression Model." The American Statistician. v. 54, 2000, pp. 217-224. [149] Longstaff, F. A., and E. S. Schwartz. “Valuing American Options by Simulation: A Simple LeastSquares Approach.” The Review of Financial Studies. Spring 2001, Vol. 14, No. 1, pp. 113– 147. [150] Lütkepohl, H. New Introduction to Multiple Time Series Analysis. Berlin: Springer, 2005. [151] Lütkepohl, Helmut, and Markus Krätzig, editors. Applied Time Series Econometrics. 1st ed. Cambridge University Press, 2004. https://doi.org/10.1017/CBO9780511606885. [152] MacKinnon, J. G. "Numerical Distribution Functions for Unit Root and Cointegration Tests." Journal of Applied Econometrics. Vol. 11, 1996, pp. 601–618. [153] MacKinnon, J. G., and H. White. "Some Heteroskedasticity-Consistent Covariance Matrix Estimators with Improved Finite Sample Properties." Journal of Econometrics. Vol. 29, 1985, pp. 305–325. [154] Maddala, G. S., and I. M. Kim. Unit Roots, Cointegration, and Structural Change. Cambridge, UK: Cambridge University Press, 1998. [155] Maeshiro, A. "Teaching Regressions with a Lagged Dependent Variable and Autocorrelated Disturbances." Journal of Economic Education. Vol. 27, 1996, pp. 72–84. [156] Maeshiro, A. "An Illustration of the Bias of OLS for Yt = λYt–1+Ut." Journal of Economic Education. Vol. 31, 2000, pp. 76–80. [157] Malinvaud, E. Statistical Methods of Econometrics. Amsterdam: North-Holland, 1970. [158] Marriott, F. and J. Pope. "Bias in the Estimation of Autocorrelations." Biometrika. Vol. 41, 1954, pp. 390–402. [159] Mashal, R. and A. Zeevi. "Beyond Correlation: Extreme Co-movements between Financial Assets." Columbia University, New York, 2002. [160] McCullough, B. D., and C. G. Renfro. “Benchmarks and Software Standards: A Case Study of GARCH Procedures.” Journal of Economic and Social Measurement. Vol. 25, 1998, pp. 59–71. [161] McLeod, A.I. and W.K. Li. “Diagnostic Checking ARMA Time Series Models Using SquaredResidual Autocorrelations.”Journal of Time Series Analysis. Vol. 4, 1983, pp. 269–273. 1-32
Bibliography
[162] McNeil, A. and R. Frey. "Estimation of Tail Related Risk Measure for Heteroscedastic Financial Time Series: An Extreme Value Approach." Journal of Empirical Finance. Vol. 7, 2000, pp. 271–300. [163] Moler, C. Numerical Computing with MATLAB. Philadelphia, PA: Society for Industrial and Applied Mathematics, 2004. [164] Montgomery, J. Mathematical Models of Social Systems. Unpublished manuscript. Department of Sociology, University of Wisconsin-Madison, 2016. [165] Morin, N. "Likelihood Ratio Tests on Cointegrating Vectors, Disequilibrium Adjustment Vectors, and their Orthogonal Complements." European Journal of Pure and Applied Mathematics. v. 3, 2010, pp. 541–571. [166] Morley, James C., Charles R. Nelson, and Eric Zivot. "Why Are the Beveridge-Nelson and Unobserved-Components Decompositions of GDP So Different?" Review of Economics and Statistics 85, no. 2 (May 2003): 235–43. https://doi.org/10.1162/003465303765299765. [167] National Bureau of Economic Research (NBER), Business Cycle Expansions and Contractions, https://www.nber.org/research/data/us-business-cycle-expansions-andcontractions. [168] Nelson, D. B. “Conditional Heteroskedasticity in Asset Returns: A New Approach.” Econometrica. Vol. 59, 1991, pp. 347–370. [169] Nelson, D. B. "Conditional Heteroskedasticity in Asset Returns: A New Approach." Econometrica.. Vol. 59, No. 2, 1991, pp. 347–370. [170] Nelson, C., and C. Plosser. "Trends and Random Walks in Macroeconomic Time Series: Some Evidence and Implications." Journal of Monetary Economics. Vol. 10, 1982, pp. 130–162. [171] Nelson, R. C., and A. F. Siegel. "Parsimonious Modeling of Yield Curves." Journal of Business. Vol. 60, No. 4, 1987, pp. 473–489. [172] Newey, W. K., and K. D. West. “A Simple Positive Semidefinite, Heteroskedasticity and Autocorrelation Consistent Covariance Matrix.” Econometrica. Vol. 55, 1987, pp. 703–708. [173] Newey, W. K, and K. D. West. “Automatic Lag Selection in Covariance Matrix Estimation.” The Review of Economic Studies. Vol. 61, No. 4, 1994, pp. 631–653. [174] Norris, J. R. Markov Chains. Cambridge, UK: Cambridge University Press, 1997. [175] Nystrom, K. and J. Skoglund. "Univariate Extreme Value Theory, GARCH and Measures of Risk." Preprint, submitted 2002. [176] Nystrom, K. and J. Skoglund. "A Framework for Scenario-Based Risk Management." Preprint, submitted 2002. [177] Pankratz, A. Forecasting with Dynamic Regression Models. John Wiley & Sons, 1991˙. [178] Park, T. and G. Casella. "The Bayesian Lasso." Journal of American Statistical Association. Vol. 103, 2008, pp. 681–686. [179] Ng, S., and P. Perron. “Unit Root Tests in ARMA Models with Data-Dependent Methods for the Selection of the Truncation Lag.” Journal of the American Statistical Association. Vol. 90, 1995, pp. 268–281. 1-33
1
Getting Started
[180] Park, R. E. "Estimation with Heteroscedastic Error Terms". Econometrica. 34, 1966 p. 888. [181] Perron, P. “Trends and Random Walks in Macroeconomic Time Series: Further Evidence from a New Approach.” Journal of Economic Dynamics and Control. Vol. 12, 1988, pp. 297–332. [182] Pesaran, H. H., and Y. Shin. "Generalized Impulse Response Analysis in Linear Multivariate Models." Economic Letters. Vol. 58, 1998, pp. 17–29. [183] Peters, J. P. “Estimating and Forecasting Volatility of Stock Indices Using Asymmetric GARCH Models and Skewed Student-t Densities.” Working Paper. Belgium: École d'Administration des Affaires, University of Liège, March 20, 2001. [184] Phillips, P. “Time Series Regression with a Unit Root.” Econometrica. Vol. 55, 1987, pp. 277– 301. [185] Phillips, P., and P. Perron. “Testing for a Unit Root in Time Series Regression." Biometrika. Vol. 75, 1988, pp. 335–346. [186] Qin, H., and A. T. K. Wan. "On the Properties of the t- and F-Ratios in Linear Regressions with Nonnormal Errors." Econometric Theory. Vol. 20, No. 4, 2004, pp. 690–700. [187] Ravn, Morton O., and Harald Uhlig. "On Adjusting the Hodrick-Prescott Filter for the Frequency of Observations." The Review of Economics and Statistics 84 , no. 2 (May 2002): 371–76. https://doi.org/10.1162/003465302317411604. [188] Rea, J. D. "Indeterminacy of the Chow Test When the Number of Observations is Insufficient." Econometrica. Vol. 46, 1978, p. 229. [189] Roncalli, T., A. Durrleman, and A. Nikeghbali. "Which Copula Is the Right One?" Groupe de Rech. Oper., Credit Lyonnais, Paris, 2000. [190] Schwert, W. "Effects of Model Specification on Tests for Unit Roots in Macroeconomic Data." Journal of Monetary Economics. Vol. 20, 1987, pp. 73–103. [191] Schwarz, Gideon. “Estimating the Dimension of a Model.” The Annals of Statistics 6, no. 2 (March 1978): 461–64. https://doi.org/10.1214/aos/1176344136. [192] Schwert, W. "Tests for Unit Roots: A Monte Carlo Investigation." Journal of Business and Economic Statistics. Vol. 7, 1989, pp. 147–159. [193] Shackleton, Robert. "Estimating and Projecting Potential Output Using CBO's Forecasting Growth Model." Congressional Budget Office Working Paper No. 2018-03 (February 2018). https://www.cbo.gov/publication/53558. [194] Shao, J. "An Asymptotic Theory for Linear Model Selection." Statistica Sinica. Vol. 7, 1997, pp. 221–264. [195] Sharpe, W. F. "Capital Asset Prices: A Theory of Market Equilibrium under Conditions of Risk." Journal of Finance. Vol. 19, 1964, pp. 425–442. [196] Shreve, S. E. Stochastic Calculus for Finance II: Continuous-Time Models. New York: SpringerVerlag, 2004. [197] Sims, Christopher A. "Solving Linear Rational Expectations Models." Computational Economics 20 (October 2002) 1–20. https://doi.org/10.1023/A:1020517101123. 1-34
Bibliography
[198] Sims, C., Stock, J., and Watson, M. "Inference in Linear Time Series Models with Some Unit Roots." Econometrica. Vol. 58, 1990, pp. 113–144. [199] Smets, F. and Wouters, R. "An Estimated Stochastic Dynamic General Equilibrium Model of the Euro Area." European Central Bank, Working Paper Series. No. 171, 2002. [200] Smets, F. and Wouters, R. "Comparing Shocks and Frictions in US and Euro Area Business Cycles: A Bayesian DSGE Approach." European Central Bank, Working Paper Series. No. 391, 2004. [201] Smets, F. and Wouters, R. "Shocks and Frictions in US Business Cycles: A Bayesian DSGE Approach." European Central Bank, Working Paper Series. No. 722, 2007. [202] Strang, G. Linear Algebra and Its Applications. 4th ed. Pacific Grove, CA: Brooks Cole, 2005. [203] Stock, James H., and Mark W. Watson. "Forecasting Inflation." Journal of Monetary Economics 44, no. 2 (October 1999): 293–335. https://doi.org/10.1016/S0304-3932(99)00027-6. [204] Stone, M. "An Asymptotic Equivalence of Choice of Model by Cross-Validation and Akaike's Criterion." Journal of the Royal Statistical Society. Series B, Vol. 39, 1977, pp. 44–47. [205] Stone, R. "The Analysis of Market Demand." Journal of the Royal Statistical Society. Vol. 108, 1945, pp. 1–98. [206] Teräsvirta, Tima. "Modelling Economic Relationships with Smooth Transition Regressions." In A. Ullahand and D.E.A. Giles (eds.), Handbook of Applied Economic Statistics, 507–552. New York: Marcel Dekker, 1998. [207] Tibshirani, R. "Regression Shrinkage and Selection via the Lasso." Journal of Royal Statistical Society. Vol. 58, 1996, pp. 267–288. [208] Tsay,R.S. Analysis of Financial Time Series. Hoboken, NJ: John Wiley & Sons, Inc., 2005. [209] Turner, P. M. "Testing for Cointegration Using the Johansen Approach: Are We Using the Correct Critical Values?" Journal of Applied Econometrics. v. 24, 2009, pp. 825–831. [210] U.S. Federal Reserve Economic Data (FRED), Federal Reserve Bank of St. Louis, https:// fred.stlouisfed.org/. [211] van Dijk, Dick. Smooth Transition Models: Extensions and Outlier Robust Inference. Rotterdam, Netherlands: Tinbergen Institute Research Series, 1999. [212] Weisberg, S. Applied Linear Regression. Hoboken, NJ: John Wiley & Sons, Inc., 2005. [213] Wielandt, H. Topics in the Analytic Theory of Matrices. Lecture notes prepared by R. Mayer. Department of Mathematics, University of Wisconsin-Madison, 1967. [214] White, H. "A Heteroskedasticity-Consistent Covariance Matrix and a Direct Test for Heteroskedasticity." Econometrica. v. 48, 1980, pp. 817–838. [215] White, J. S. "Asymptotic Expansions for the Mean and Variance of the Serial Correlation Coefficient." Biometrika. Vol 48, 1961, pp. 85–94. [216] White, H. Asymptotic Theory for Econometricians. New York: Academic Press, 1984. 1-35
1
Getting Started
[217] White, H., and I. Domowitz. “Nonlinear Regression with Dependent Observations.” Econometrica. Vol. 52, 1984, pp. 143–162. [218] Wilson, A. L. "When is the Chow Test UMP?" The American Statistician. Vol. 32, 1978, pp. 66– 68. [219] Wold, Herman. "A Study in the Analysis of Stationary Time Series." Journal of the Institute of Actuaries 70 (March 1939): 113–115. https://doi.org/10.1017/S0020268100011574. [220] Wooldridge, J. M. Introductory Econometrics. Cincinnati, OH: South-Western, 2009.
1-36
2 Data Preprocessing • “Data Transformations” on page 2-2 • “Trend-Stationary vs. Difference-Stationary Processes” on page 2-6 • “Specify Lag Operator Polynomials” on page 2-9 • “Nonseasonal Differencing” on page 2-13 • “Nonseasonal and Seasonal Differencing” on page 2-16 • “Time Series Decomposition” on page 2-19 • “Moving Average Filter” on page 2-21 • “Moving Average Trend Estimation” on page 2-22 • “Parametric Trend Estimation” on page 2-24 • “Use Hodrick-Prescott Filter to Reproduce Original Result” on page 2-29 • “Compare One-Sided and Two-Sided Hodrick-Prescott Filter Results” on page 2-34 • “Choose Time Series Filter for Business Cycle Analysis” on page 2-40 • “Seasonal Filters” on page 2-62 • “Seasonal Adjustment” on page 2-65 • “Seasonal Adjustment Using a Stable Seasonal Filter” on page 2-67 • “Seasonal Adjustment Using S(n,m) Seasonal Filters” on page 2-72
2
Data Preprocessing
Data Transformations In this section... “Why Transform?” on page 2-2 “Common Data Transformations” on page 2-2
Why Transform? You can transform time series to: • Isolate temporal components of interest. • Remove the effect of nuisance components (like seasonality). • Make a series stationary. • Reduce spurious regression effects. • Stabilize variability that grows with the level of the series. • Make two or more time series more directly comparable. You can choose among many data transformation to address these (and other) aims. For example, you can use decomposition methods to describe and estimate time series components. Seasonal adjustment is a decomposition method you can use to remove a nuisance seasonal component. Detrending and differencing are transformations you can use to address nonstationarity due to a trending mean. Differencing can also help remove spurious regression effects due to cointegration. In general, if you apply a data transformation before modeling your data, you then need to backtransform model forecasts to return to the original scale. This is not necessary in Econometrics Toolbox if you are modeling difference-stationary data. Use arima to model integrated series that are not a priori differenced. A key advantage of this is that arima also returns forecasts on the original scale automatically.
Common Data Transformations • “Detrending” on page 2-2 • “Differencing” on page 2-3 • “Log Transformations” on page 2-4 • “Prices, Returns, and Compounding” on page 2-4 Detrending Some nonstationary series can be modeled as the sum of a deterministic trend and a stationary stochastic process. That is, you can write the series yt as yt = μt + εt, where εt is a stationary stochastic process with mean zero. 2-2
Data Transformations
The deterministic trend, μt, can have multiple components, such as nonseasonal and seasonal components. You can detrend (or decompose) the data to identify and estimate its various components. The detrending process proceeds as follows: 1
Estimate the deterministic trend component.
2
Remove the trend from the original data.
3
(Optional) Model the remaining residual series with an appropriate stationary stochastic process.
Several techniques are available for estimating the trend component. You can estimate it parametrically using least squares, nonparametrically using filters (moving averages), or a combination of both. Detrending yields estimates of all trend and stochastic components, which might be desirable. However, estimating trend components can require making additional assumptions, performing extra steps, and estimating additional parameters. Differencing Differencing is an alternative transformation for removing a mean trend from a nonstationary series. This approach is advocated in the Box-Jenkins approach to model specification [1]. According to this methodology, the first step to build models is differencing your data until it looks stationary. Differencing is appropriate for removing stochastic trends (e.g., random walks). Define the first difference as Δyt = yt − yt − 1, where Δ is called the differencing operator. In lag operator notation, where Li yt = yt − i, Δyt = (1 − L)yt . You can create lag operator polynomial objects using LagOp. Similarly, define the second difference as 2
Δ2 yt = (1 − L) yt = (yt − yt − 1) − (yt − 1 − yt − 2) = yt − 2yt − 1 + yt − 2 . Like taking derivatives, taking a first difference makes a linear trend constant, taking a second difference makes a quadratic trend constant, and so on for higher-degree polynomials. Many complex stochastic trends can also be eliminated by taking relatively low-order differences. Taking D differences makes a process with D unit roots stationary. For series with seasonal periodicity, seasonal differencing can address seasonal unit roots. For data with periodicity s (e.g., quarterly data have s = 4 and monthly data have s = 12), the seasonal differencing operator is defined as Δs yt = (1 − Ls)yt = yt − yt − s . Using a differencing transformation eliminates the intermediate estimation steps required for detrending. However, this means you can’t obtain separate estimates of the trend and stochastic components. 2-3
2
Data Preprocessing
Log Transformations For a series with exponential growth and variance that grows with the level of the series, a log transformation can help linearize and stabilize the series. If you have negative values in your time series, you should add a constant large enough to make all observations greater than zero before taking the log transformation. In some application areas, working with differenced, logged series is the norm. For example, the first differences of a logged time series, Δlogyt = logyt − logyt − 1, are approximately the rates of change of the series. Prices, Returns, and Compounding The rates of change of a price series are called returns. Whereas price series do not typically fluctuate around a constant level, the returns series often looks stationary. Thus, returns series are typically used instead of price series in many applications. Denote successive price observations made at times t and t + 1 as yt and yt+1, respectively. The continuously compounded returns series is the transformed series rt = log
yt + 1 = logyt + 1 − logyt . yt
This is the first difference of the log price series, and is sometimes called the log return. An alternative transformation for price series is simple returns, rt =
yt + 1 − yt yt + 1 = − 1. yt yt
For series with relatively high frequency (e.g., daily or weekly observations), the difference between the two transformations is small. Econometrics Toolbox has price2ret for converting price series to returns series (with either continuous or simple compounding), and ret2price for the inverse operation.
References [1] Box, G. E. P., G. M. Jenkins, and G. C. Reinsel. Time Series Analysis: Forecasting and Control. 3rd ed. Englewood Cliffs, NJ: Prentice Hall, 1994.
See Also LagOp | price2ret | ret2price
Related Examples
2-4
•
“Transform Time Series Using Econometric Modeler App” on page 4-97
•
“Moving Average Trend Estimation” on page 2-22
•
“Nonseasonal Differencing” on page 2-13
Data Transformations
•
“Nonseasonal and Seasonal Differencing” on page 2-16
•
“Parametric Trend Estimation” on page 2-24
•
“Specify Lag Operator Polynomials” on page 2-9
More About •
“Trend-Stationary vs. Difference-Stationary Processes” on page 2-6
•
“Moving Average Filter” on page 2-21
•
“Seasonal Adjustment” on page 2-65
•
“Time Series Decomposition” on page 2-19
2-5
2
Data Preprocessing
Trend-Stationary vs. Difference-Stationary Processes In this section... “Nonstationary Processes” on page 2-6 “Trend Stationary” on page 2-7 “Difference Stationary” on page 2-7
Nonstationary Processes The stationary stochastic process is a building block of many econometric time series models. Many observed time series, however, have empirical features that are inconsistent with the assumptions of stationarity. For example, the following plot shows quarterly U.S. GDP measured from 1947 to 2005. There is a very obvious upward trend in this series that one should incorporate into any model for the process. load Data_GDP plot(Data) xlim([0,234]) title('Quarterly U.S. GDP, 1947-2005')
A trending mean is a common violation of stationarity. There are two popular models for nonstationary series with a trending mean. 2-6
Trend-Stationary vs. Difference-Stationary Processes
• Trend stationary: The mean trend is deterministic. Once the trend is estimated and removed from the data, the residual series is a stationary stochastic process. • Difference stationary: The mean trend is stochastic. Differencing the series D times yields a stationary stochastic process. The distinction between a deterministic and stochastic trend has important implications for the longterm behavior of a process: • Time series with a deterministic trend always revert to the trend in the long run (the effects of shocks are eventually eliminated). Forecast intervals have constant width. • Time series with a stochastic trend never recover from shocks to the system (the effects of shocks are permanent). Forecast intervals grow over time. Unfortunately, for any finite amount of data there is a deterministic and stochastic trend that fits the data equally well (Hamilton, 1994). Unit root tests are a tool for assessing the presence of a stochastic trend in an observed series.
Trend Stationary You can write a trend-stationary process, yt, as yt = μt + εt, where: • μt is a deterministic mean trend. • εt is a stationary stochastic process with mean zero. In some applications, the trend is of primary interest. Time series decomposition methods focus on decomposing μt into different trend sources (e.g., secular trend component and seasonal component). You can decompose series nonparametrically using filters (moving averages), or parametrically using regression methods. Given an estimate μ t, you can explore the residual seriesyt − μ t for autocorrelation, and optionally model it using a stationary stochastic process model.
Difference Stationary In the Box-Jenkins modeling approach [2], nonstationary time series are differenced until stationarity is achieved. You can write a difference-stationary process, yt, as ΔD yt = μ + ψ(L)εt, where: • ΔD = (1 − L)D is a Dth-degree differencing operator. • ψ(L) = (1 + ψ L + ψ L2 + …) is an infinite-degree lag operator polynomial with absolutely 1 2 summable coefficients and all roots lying outside the unit circle. • εt is an uncorrelated innovation process with mean zero. 2-7
2
Data Preprocessing
Time series that can be made stationary by differencing are called integrated processes. Specifically, when D differences are required to make a series stationary, that series is said to be integrated of order D, denoted I(D). Processes with D ≥ 1 are often said to have a unit root.
References [1] Hamilton, J. D. Time Series Analysis. Princeton, NJ: Princeton University Press, 1994. [2] Box, G. E. P., G. M. Jenkins, and G. C. Reinsel. Time Series Analysis: Forecasting and Control. 3rd ed. Englewood Cliffs, NJ: Prentice Hall, 1994.
See Also Related Examples •
“Transform Time Series Using Econometric Modeler App” on page 4-97
•
“Nonseasonal Differencing” on page 2-13
•
“Moving Average Trend Estimation” on page 2-22
•
“Specify Lag Operator Polynomials” on page 2-9
More About
2-8
•
“Moving Average Filter” on page 2-21
•
“Time Series Decomposition” on page 2-19
•
“What Are ARIMA Models?” on page 7-41
Specify Lag Operator Polynomials
Specify Lag Operator Polynomials In this section... “Lag Operator Polynomial of Coefficients” on page 2-9 “Difference Lag Operator Polynomials” on page 2-11
Lag Operator Polynomial of Coefficients Define the lag operator L such that Liyt = yt–i. An m-degree polynomial of coefficients A in the lag operator L is given by A(L) = (A0 + A1L1 + … + AmLm) . Here, the coefficient A0 corresponds to lag 0, A1 corresponds to lag 1, and so on, to Am, which corresponds to lag m. To specify a coefficient lag operator polynomial in Econometrics Toolbox, use LagOp. Specify the (nonzero) coefficients A0,...,Am as a cell array, and the lags of the nonzero coefficients as a vector. The coefficients of lag operator polynomial objects are designed to look and feel like traditional MATLAB cell arrays. There is, however, an important difference: elements of cell arrays are accessible by positive integer sequential indexing, i.e., 1, 2, 3,.... The coefficients of lag operator polynomial objects are accessible by lag-based indexing. That is, you can specify any nonnegative integer lags, including lag 0. For example, consider specifying the polynomial A(L) = (1 − 0.3L + 0.6L4) . This polynomial has coefficient 1 at lag 0, coefficient –0.3 at lag 1, and coefficient 0.6 at lag 4. Enter: A = LagOp({1,-0.3,0.6},'Lags',[0,1,4]) A = 1-D Lag Operator Polynomial: ----------------------------Coefficients: [1 -0.3 0.6] Lags: [0 1 4] Degree: 4 Dimension: 1
The created lag operator object A corresponds to a lag operator polynomial of degree 4. A LagOp object has a number of properties describing it: • Coefficients, a cell array of coefficients. • Lags, a vector indicating the lags of nonzero coefficients. • Degree, the degree of the polynomial. • Dimension, the dimension of the polynomial (relevant for multivariate time series). To access properties of the model, use dot notation. That is, enter the variable name and then the property name, separated by a period. To access specific coefficients, use dot notation along with cell array syntax (consistent with the Coefficients data type). To illustrate, returns the coefficient at lag 4: 2-9
2
Data Preprocessing
A.Coefficients{4} ans = 0.6000
Return the coefficient at lag 0: A.Coefficients{0} ans = 1
This last command illustrates lag indexing. The index 0 is valid, and corresponds to the lag 0 coefficient. Notice what happens if you index a lag larger than the degree of the polynomial: A.Coefficients{6} ans = 0
This does not return an error. Rather, it returns O, the coefficient at lag 6 (and all other lags with coefficient zero). Use similar syntax to add new nonzero coefficients. For example, to add the coefficient 0.4 at lag 6, A.Coefficients{6} = 0.4 A = 1-D Lag Operator Polynomial: ----------------------------Coefficients: [1 -0.3 0.6 0.4] Lags: [0 1 4 6] Degree: 6 Dimension: 1
The lag operator polynomial object A now has nonzero coefficients at lags 0, 1, 4, and 6, and is degree 6. When lag indices are placed inside of parentheses the result is another lag-based cell array that represent a subset of the original polynomial. A0 = A.Coefficients(0) A0 = 1-D Lag-Indexed Cell Array Created at Lags [0] with Non-Zero Coefficients at Lags [0].
A0 is a new object that preserves lag-based indexing and is suitable for assignment to and from lag operator polynomial. class(A0) ans = 'internal.econ.LagIndexedArray'
In contrast, when lag indices are placed inside curly braces, the result is the same data type as the indices themselves: class(A.Coefficients{0})
2-10
Specify Lag Operator Polynomials
ans = 'double'
Difference Lag Operator Polynomials You can express the differencing operator, Δ, in lag operator polynomial notation as Δ = (1 − L) . More generally, D
ΔD = (1 − L) . To specify a first differencing operator polynomial using LagOp, specify coefficients 1 and –1 at lags 0 and 1: D1 = LagOp({1,-1},'Lags',[0,1]) D1 = 1-D Lag Operator Polynomial: ----------------------------Coefficients: [1 -1] Lags: [0 1] Degree: 1 Dimension: 1
Similarly, the seasonal differencing operator in lag polynomial notation is Δs = (1 − Ls) . This has coefficients 1 and –1 at lags 0 and s, where s is the periodicity of the seasonality. For example, for monthly data with periodicity s = 12, D12 = LagOp({1,-1},'Lags',[0,12]) D12 = 1-D Lag Operator Polynomial: ----------------------------Coefficients: [1 -1] Lags: [0 12] Degree: 12 Dimension: 1
This results in a polynomial object with degree 12. D
When a difference lag operator polynomial is applied to a time series yt, (1 − L) yt, this is equivalent to filtering the time series. Note that filtering a time series using a polynomial of degree D results in the loss of the first D observations. 2
Consider taking second differences of a time series yt, (1 − L) yt . You can write this differencing 2
polynomial as (1 − L) = (1 − L)(1 − L) . Create the second differencing polynomial by multiplying the polynomial D1 to itself to get the second-degree differencing polynomial: D2 = D1*D1
2-11
2
Data Preprocessing
D2 = 1-D Lag Operator Polynomial: ----------------------------Coefficients: [1 -2 1] Lags: [0 1 2] Degree: 2 Dimension: 1
The coefficients in the second-degree differencing polynomial correspond to the coefficients in the difference equation 2
(1 − L) yt = yt − 2yt − 1 + yt − 2 . To see the effect of filtering (differencing) on the length of a time series, simulate a data set with 10 observations to filter: rng('default') Y = randn(10,1);
Filter the time series Y using D2: Yf = filter(D2,Y); length(Yf) ans = 8
The filtered series has two observations less than the original series. The time indices for the new series can be optionally returned: Note that the time indices are given relative to time 0. That is, the original series corresponds to times 0,...,9. The filtered series loses the observations at the first two times (times 0 and 1), resulting in a series corresponding to times 2,...,9. You can also filter a time series, say Y, with a lag operator polynomial, say D2, using this shorthand syntax: Yf = D2(Y);
See Also LagOp | filter
Related Examples •
“Specifying Univariate Lag Operator Polynomials Interactively” on page 4-44
•
“Nonseasonal Differencing” on page 2-13
•
“Nonseasonal and Seasonal Differencing” on page 2-16
•
“Plot the Impulse Response Function of Conditional Mean Model” on page 7-80
More About •
2-12
“Moving Average Filter” on page 2-21
Nonseasonal Differencing
Nonseasonal Differencing This example shows how to take a nonseasonal difference of a time series. The time series is quarterly U.S. GDP measured from 1947 to 2005. Load the GDP data set included with the toolbox. load Data_GDP Y = Data; N = length(Y); figure plot(Y) xlim([0,N]) title('U.S. GDP')
The time series has a clear upward trend. Take a first difference of the series to remove the trend, Δyt = (1 − L)yt = yt − yt − 1 . First create a differencing lag operator polynomial object, and then use it to filter the observed series. 2-13
2
Data Preprocessing
D1 = LagOp({1,-1},'Lags',[0,1]); dY = filter(D1,Y); figure plot(2:N,dY) xlim([0,N]) title('First Differenced GDP Series')
The series still has some remaining upward trend after taking first differences. Take a second difference of the series, 2
Δ2 yt = (1 − L) yt = yt − 2yt − 1 + yt − 2 . D2 = D1*D1; ddY = filter(D2,Y); figure plot(3:N,ddY) xlim([0,N]) title('Second Differenced GDP Series')
2-14
Nonseasonal Differencing
The second-differenced series appears more stationary.
See Also LagOp | filter
Related Examples •
“Transform Time Series Using Econometric Modeler App” on page 4-97
•
“Nonseasonal and Seasonal Differencing” on page 2-16
•
“Specify Lag Operator Polynomials” on page 2-9
More About •
“Data Transformations” on page 2-2
•
“Trend-Stationary vs. Difference-Stationary Processes” on page 2-6
2-15
2
Data Preprocessing
Nonseasonal and Seasonal Differencing This example shows how to apply both nonseasonal and seasonal differencing using lag operator polynomial objects. The time series is monthly international airline passenger counts from 1949 to 1960. Load the airline data set (Data_Airline.mat). load Data_Airline y = log(DataTimeTable.PSSG); T = length(y); figure plot(DataTimeTable.Time,y) title('Log Airline Passenger Counts')
The data shows a linear trend and a seasonal component with periodicity 12. Take the first difference to address the linear trend, and the 12th difference to address the periodicity. If yt is the series to be transformed, the transformation is ΔΔ12 yt = (1 − L)(1 − L12)yt, where Δ denotes the difference operator, and L denotes the lag operator. 2-16
Nonseasonal and Seasonal Differencing
Create the lag operator polynomials 1 − L and 1 − L12. Then, multiply them to get the desired lag operator polynomial. D1 = LagOp({1 -1},'Lags',[0,1]); D12 = LagOp({1 -1},'Lags',[0,12]); D = D1*D12 D = 1-D Lag Operator Polynomial: ----------------------------Coefficients: [1 -1 -1 1] Lags: [0 1 12 13] Degree: 13 Dimension: 1
The first polynomial, 1 − L, has coefficient 1 at lag 0 and coefficient -1 at lag 1. The seasonal differencing polynomial, 1 − L12, has coefficient 1 at lag 0, and -1 at lag 12. The product of these polynomials is (1 − L)(1 − L12) = 1 − L − L12 + L13, which has coefficient 1 at lags 0 and 13, and coefficient -1 at lags 1 and 12. Filter the data with differencing polynomial D to get the nonseasonally and seasonally differenced series. dY = filter(D,y); length(y) - length(dY) ans = 13
The filtered series is 13 observations shorter than the original series. This is due to applying a degree 13 polynomial filter. figure plot(DataTimeTable.Time(14:end),dY) title('Differenced Log Airline Passenger Counts')
2-17
2
Data Preprocessing
The differenced series has neither the trend nor seasonal component exhibited by the original series.
See Also LagOp | filter
Related Examples •
“Transform Time Series Using Econometric Modeler App” on page 4-97
•
“Nonseasonal Differencing” on page 2-13
•
“Specify Lag Operator Polynomials” on page 2-9
More About
2-18
•
“Data Transformations” on page 2-2
•
“Trend-Stationary vs. Difference-Stationary Processes” on page 2-6
Time Series Decomposition
Time Series Decomposition Time series decomposition involves separating a time series into several distinct components. There are three components that are typically of interest: • Tt, a deterministic, nonseasonal secular trend component. This component is sometimes restricted to being a linear trend, though higher-degree polynomials are also used. • St, a deterministic seasonal component with known periodicity. This component captures level shifts that repeat systematically within the same period (e.g., month or quarter) between successive years. It is often considered to be a nuisance component, and seasonal adjustment is a process for eliminating it. • It, a stochastic irregular component. This component is not necessarily a white noise process. It can exhibit autocorrelation and cycles of unpredictable duration. For this reason, it is often thought to contain information about the business cycle, and is usually the most interesting component. There are three functional forms that are most often used for representing a time series yt as a function of its trend, seasonal, and irregular components: • Additive decomposition, where yt = Tt + St + It . This is the classical decomposition. It is appropriate when there is no exponential growth in the series, and the amplitude of the seasonal component remains constant over time. For identifiability from the trend component, the seasonal and irregular components are assumed to fluctuate around zero. • Multiplicative decomposition, where yt = TtStIt . This decomposition is appropriate when there is exponential growth in the series, and the amplitude of the seasonal component grows with the level of the series. For identifiability from the trend component, the seasonal and irregular components are assumed to fluctuate around one. • Log-additive decomposition, where logyt = Tt + St + It . This is an alternative to the multiplicative decomposition. If the original series has a multiplicative decomposition, then the logged series has an additive decomposition. Using the logs can be preferable when the time series contains many small observations. For identifiability from the trend component, the seasonal and irregular components are assumed to fluctuate around zero. You can estimate the trend and seasonal components by using filters (moving averages) or parametric regression models. Given estimates T t and S t, the irregular component is estimated as I t = yt − T t − S t using the additive decomposition, and It=
yt T tS t
2-19
2
Data Preprocessing
using the multiplicative decomposition. The series yt − T t (or yt /T t using the multiplicative decomposition) is called a detrended series. Similarly, the series yt − S t (or yt /S t) is called a deseasonalized series.
See Also hpfilter | bkfilter | cffilter | hfilter
Related Examples
2-20
•
“Data Transformations” on page 2-2
•
“Use Hodrick-Prescott Filter to Reproduce Original Result” on page 2-29
•
“Choose Time Series Filter for Business Cycle Analysis” on page 2-40
•
“Moving Average Filter” on page 2-21
•
“Moving Average Trend Estimation” on page 2-22
•
“Seasonal Adjustment Using a Stable Seasonal Filter” on page 2-67
•
“Seasonal Adjustment Using S(n,m) Seasonal Filters” on page 2-72
•
“Parametric Trend Estimation” on page 2-24
•
“Seasonal Adjustment” on page 2-65
Moving Average Filter
Moving Average Filter Some time series are decomposable into various trend components. To estimate a trend component without making parametric assumptions, you can consider using a filter. Filters are functions that turn one time series into another. By appropriate filter selection, certain patterns in the original time series can be clarified or eliminated in the new series. For example, a low-pass filter removes high frequency components, yielding an estimate of the slow-moving trend. A specific example of a linear filter is the moving average. Consider a time series yt, t = 1,...,N. A symmetric (centered) moving average filter of window length 2q + 1 is given by mt =
q
∑
j= −q
b j yt + j,
q < t < N − q.
You can choose any weights bj that sum to one. To estimate a slow-moving trend, typically q = 2 is a good choice for quarterly data (a 5-term moving average), or q = 6 for monthly data (a 13-term moving average). Because symmetric moving averages have an odd number of terms, a reasonable choice for the weights is b j = 1/4q for j = ±q, and b j = 1/2q otherwise. Implement a moving average by convolving a time series with a vector of weights using conv. You cannot apply a symmetric moving average to the q observations at the beginning and end of the series. This results in observation loss. One option is to use an asymmetric moving average at the ends of the series to preserve all observations.
See Also conv
Related Examples •
“Moving Average Trend Estimation” on page 2-22
•
“Parametric Trend Estimation” on page 2-24
•
“Data Transformations” on page 2-2
•
“Time Series Decomposition” on page 2-19
•
“Seasonal Filters” on page 2-62
2-21
2
Data Preprocessing
Moving Average Trend Estimation This example shows how to estimate long-term trend using a symmetric moving average function. This is a convolution that you can implement using conv. The time series is monthly international airline passenger counts from 1949 to 1960. Load the airline data set (Data_Airline). load Data_Airline y = log(DataTimeTable.PSSG); T = length(y); figure plot(DataTable.Time,y) title 'Log Airline Passenger Counts'; hold on
The data shows a linear trend and a seasonal component with periodicity 12. The periodicity of the data is monthly, so a 13-term moving average is a reasonable choice for estimating the long-term trend. Use weight 1/24 for the first and last terms, and weight 1/12 for the interior terms. Add the moving average trend estimate to the observed time series plot. wts = [1/24; repmat(1/12,11,1); 1/24]; yS = conv(y,wts,'valid');
2-22
Moving Average Trend Estimation
h = plot(DataTimeTable.Time(7:end-6),yS,'r','LineWidth',2); legend(h,'13-Term Moving Average') hold off
When you use the shape parameter 'valid' in the call to conv, observations at the beginning and end of the series are lost. Here, the moving average has window length 13, so the first and last 6 observations do not have smoothed values.
See Also conv
Related Examples •
“Seasonal Adjustment Using a Stable Seasonal Filter” on page 2-67
•
“Seasonal Adjustment Using S(n,m) Seasonal Filters” on page 2-72
•
“Parametric Trend Estimation” on page 2-24
More About •
“Time Series Decomposition” on page 2-19
•
“Moving Average Filter” on page 2-21 2-23
2
Data Preprocessing
Parametric Trend Estimation This example shows how to estimate nonseasonal and seasonal trend components using parametric models. The time series is monthly accidental deaths in the U.S. from 1973 to 1978 (Brockwell and Davis, 2002). Load Data Load the accidental deaths data set. load Data_Accidental y = DataTimeTable.NUMD; T = length(y); figure plot(DataTimeTable.Time,y/1000) title('Monthly Accidental Deaths') ylabel('Number of Deaths (thousands)') hold on
The data shows a potential quadratic trend and a strong seasonal component with periodicity 12. Fit Quadratic Trend Fit the polynomial 2-24
Parametric Trend Estimation
Tt = β0 + β1t + β2t2 to the observed series. t = (1:T)'; X = [ones(T,1) t t.^2]; b = X\y; tH = X*b; h2 = plot(DataTimeTable.Time,tH/1000,'r','LineWidth',2); legend(h2,'Quadratic Trend Estimate') hold off
Detrend Original Series Subtract the fitted quadratic line from the original data. xt = y - tH;
Estimate Seasonal Indicator Variables Create indicator (dummy) variables for each month. The first indicator is equal to one for January observations, and zero otherwise. The second indicator is equal to one for February observations, and zero otherwise. A total of 12 indicator variables are created for the 12 months. Regress the detrended series against the seasonal indicators. 2-25
2
Data Preprocessing
mo = repmat((1:12)',6,1); sX = dummyvar(mo); bS = sX\xt; st = sX*bS; figure plot(DataTimeTable.Time,st/1000) ylabel 'Number of Deaths (thousands)'; title('Parametric Estimate of Seasonal Component (Indicators)')
In this regression, all 12 seasonal indicators are included in the design matrix. To prevent collinearity, an intercept term is not included (alternatively, you can include 11 indicators and an intercept term). Deseasonalize Original Series Subtract the estimated seasonal component from the original series. dt = y - st; figure plot(DataTimeTable.Time,dt/1000) title('Monthly Accidental Deaths (Deseasonalized)') ylabel('Number of Deaths (thousands)')
2-26
Parametric Trend Estimation
The quadratic trend is much clearer with the seasonal component removed. Estimate Irregular Component Subtract the trend and seasonal estimates from the original series. The remainder is an estimate of the irregular component. bt = y - tH - st; figure plot(DataTimeTable.Time,bt/1000) title('Irregular Component') ylabel('Number of Deaths (thousands)')
2-27
2
Data Preprocessing
You can optionally model the irregular component using a stochastic process model. References: Box, G. E. P., G. M. Jenkins, and G. C. Reinsel. Time Series Analysis: Forecasting and Control. 3rd ed. Englewood Cliffs, NJ: Prentice Hall, 1994.
See Also dummyvar
Related Examples
2-28
•
“Moving Average Trend Estimation” on page 2-22
•
“Seasonal Adjustment Using a Stable Seasonal Filter” on page 2-67
•
“Seasonal Adjustment Using S(n,m) Seasonal Filters” on page 2-72
•
“Time Series Decomposition” on page 2-19
•
“Seasonal Adjustment” on page 2-65
Use Hodrick-Prescott Filter to Reproduce Original Result
Use Hodrick-Prescott Filter to Reproduce Original Result This example shows how to use the Hodrick-Prescott filter to decompose a time series. The Hodrick-Prescott filter specialized filter for trend and business cycle estimation. The filter separates a time series into trend and cyclical components (no seasonal components), the latter of which is often of interest to business cycle analysts. Suppose a time series yt can be additively decomposed into a trend τt and business cycle component ct, such that, for t = 1, . . . , T, yt = τt + ct . The objective function for the filter is T
∑
t=1
ct2 + λ
T−1
∑
t=2
2
((τt + 1 − τt) − (τt − τt − 1)) .
The programming problem is to minimize the objective over all trends components τ1, . . . , τT . The hyperparameter λ is a nonnegative smoothing parameter that penalizes the objective for large secodorder differences of the trend component. The conceptual basis for this programming problem is that the first sum minimizes the difference between the data and its trend component (which is the cyclical component) and the second sum minimizes the second-order difference of the trend component, which is analogous to minimization of the second derivative of the trend component. The programmuing problem is equivalent to that of a cubic spline smoother. Use Hodrick-Prescott Filter to Analyze GNP Cyclicality Using data similar to the data found in Hodrick and Prescott [1], plot the cyclical component of GNP. This result should coincide with the results in the paper. However, since the GNP data here and in the paper are both adjusted for seasonal variations with conversion from nominal to real values, differences can be expected due to differences in the sources for the pair of adjustments. Note that our data comes from the St. Louis Federal Reserve FRED database [2], which was downloaded with the Datafeed Toolbox™. load Data_GNP % Change the following two lines to try alternative periods startdate = datetime(1950,1,1); enddate = datetime(1979,4,1); DTT = DataTimeTable(startdate:enddate,:); DTT.GNPRLog = log(DTT.GNPR); figure plot(DTT.Time,DTT.GNPRLog) title("Gross National Product (Log)")
2-29
2
Data Preprocessing
Filter the series for each smoothing parameter value λ = 400, 1600, 6400, and ∞. The infinite smoothing parameter detrends the data. [TTbl4,CTbl4] = hpfilter(DTT,Smoothing=400,DataVariables="GNPRLog"); [TTbl16,CTbl16] = hpfilter(DTT,Smoothing=1600,DataVariables="GNPRLog"); [TTbl64,CTbl64] = hpfilter(DTT,Smoothing=6400,DataVariables="GNPRLog"); [TTblInf,CTblInf] = hpfilter(DTT,Smoothing=Inf,DataVariables="GNPRLog");
Plot Cyclical GNP and Its Relationship With Long-Term Trend Generate Figure 1 from Hodrick and Prescott [1] by setting the numeric slider interactive control for λ to 1600. lambda = ; [~,CTbl] = hpfilter(DTT,lambda,DataVariables="GNPRLog"); plot(DTT.Time,CTbl.GNPRLog,"b"); hold all plot(DTT.Time,CTblInf.GNPRLog - CTbl.GNPRLog,"r"); title("Figure 1 from Hodrick and Prescott"); ylabel("GNP Trend"); legend(["Cyclical GNP" "Difference"]); hold off
2-30
Use Hodrick-Prescott Filter to Reproduce Original Result
The blue line is the cyclical component with smoothing parameter 1600 and the red line is the difference with respect to the detrended cyclical component. The difference is smooth enough to suggest that the choice of smoothing parameter is appropriate. You can use the numeric slider control to tune the filter interactively. Statistical Tests on Cyclical GNP Reconstruct Table 1 from Hodrick and Prescott [1]. With the cyclical components, compute standard deviations, autocorrelations for lags 1 to 10, and perform a Dickey-Fuller unit root test to assess nonstationarity. ACFTbl4 = autocorr(CTbl4,NumLags=10,DataVariable="GNPRLog"); ACFTbl16 = autocorr(CTbl16,NumLags=10,DataVariable="GNPRLog"); ACFTbl64 = autocorr(CTbl64,NumLags=10,DataVariable="GNPRLog"); ACFTblInf = autocorr(CTblInf,NumLags=10,DataVariable="GNPRLog"); [StatTbl4] = adftest(CTbl4,Model="ARD"); [StatTbl16] = adftest(CTbl16,Model="ARD"); [StatTbl64] = adftest(CTbl64,Model="ARD"); [StatTblInf] = adftest(CTblInf,Model="ARD"); displayResults(CTbl4,CTbl16,CTbl64,CTblInf, ... ACFTbl4,ACFTbl16,ACFTbl64,ACFTblInf, ... StatTbl4,StatTbl16,StatTbl64,StatTblInf); Table 1 from Hodrick and Prescott Reference Smoothing Parameter
2-31
2
Data Preprocessing
Std. Dev. Autocorrelations 1 2 3 4 5 6 7 8 9 10 Unit Root Reject H0
400 1.52
1600 1.75
6400 2.06
Infinity 3.11
0.74 0.38 0.05 -0.21 -0.36 -0.39 -0.35 -0.28 -0.22 -0.19 -4.35 1
0.78 0.47 0.17 -0.07 -0.24 -0.30 -0.31 -0.29 -0.26 -0.25 -4.13 1
0.82 0.57 0.33 0.12 -0.03 -0.10 -0.13 -0.15 -0.15 -0.17 -3.79 1
0.92 0.81 0.70 0.59 0.50 0.44 0.39 0.35 0.31 0.26 -2.28 0
As shown in [1], as λ increases, standard deviations increase, autocorrelations increase over longer lags, and the unit root hypothesis is rejected for all but the detrended case. These results imply that any of the cyclical series with finite smoothing is effectively stationary. Local Function function displayResults(gnpcycle4,gnpcycle16,gnpcycle64,gnpcycleinf,... gnpacf4,gnpacf16,gnpacf64,gnpacfinf,... gnptest4,gnptest16,gnptest64,gnptestinf) % DISPLAYRESULTS Display cyclical GNP test results in tabular form fprintf(1,'Table 1 from Hodrick and Prescott Reference\n'); fprintf(1,' %10s %s\n',' ','Smoothing Parameter'); fprintf(1,' %10s %10s %10s %10s %10s\n',' ','400','1600','6400','Infinity'); fprintf(1,' %-10s %10.2f %10.2f %10.2f %10.2f\n','Std. Dev.', ... 100*std(gnpcycle4.GNPRLog),100*std(gnpcycle16.GNPRLog), ... 100*std(gnpcycle64.GNPRLog),100*std(gnpcycleinf.GNPRLog)); fprintf(1,' Autocorrelations\n'); for i = 2:11 fprintf(1,' %10g %10.2f %10.2f %10.2f %10.2f\n',(i-1), ... gnpacf4.ACF(i),gnpacf16.ACF(i),gnpacf64.ACF(i),gnpacfinf.ACF(i)) end fprintf(1,' %-10s %10.2f %10.2f %10.2f %10.2f\n','Unit Root', ... gnptest4.stat,gnptest16.stat,gnptest64.stat,gnptestinf.stat); fprintf(1,' %-10s %10d %10d %10d %10d\n','Reject H0',... gnptest4.h,gnptest16.h,gnptest64.h,gnptestinf.h); end
References [1] Hodrick, Robert J., and Edward C. Prescott. "Postwar U.S. Business Cycles: An Empirical Investigation." Journal of Money, Credit and Banking 29, no. 1 (February 1997): 1–16. https:// doi.org/10.2307/2953682. [2] U.S. Federal Reserve Economic Data (FRED), Federal Reserve Bank of St. Louis, https:// fred.stlouisfed.org/.
See Also hpfilter
2-32
Use Hodrick-Prescott Filter to Reproduce Original Result
More About •
When to Use the Hodrick-Prescott Filter
•
“Compare One-Sided and Two-Sided Hodrick-Prescott Filter Results” on page 2-34
•
“Choose Time Series Filter for Business Cycle Analysis” on page 2-40
•
“Moving Average Filter” on page 2-21
•
“Seasonal Filters” on page 2-62
•
“Time Series Decomposition” on page 2-19
2-33
2
Data Preprocessing
Compare One-Sided and Two-Sided Hodrick-Prescott Filter Results Smooth the quarterly U.S. Gross Domestic Product (GDP) series by applying the Hodrick-Prescott filter. Compare the smoothed trend components returned from the one-sided and two-sided methods. Use interactive controls to experiment with parameter settings interactively. The standard, two-sided Hodrick-Prescott filter [1] computes a centered difference to estimate a second derivative at time t, t = 1, . . . , T, using both past and future values of the input series. Although past and future values, relative to t, are available for input historical series, in real time, future values are not known leading to anomalous end effects. As a result, the filtered data can be unsuitable for forecasting [2]. The one-sided Hodrick-Prescott filter [2] uses, at each time t, only current and previous values of the input series; it does not revise outputs when new data is available. Load the U.S. GDP data set Data_GDP.mat, which contains quarterly measurements from 1947 through 2005. load Data_GDP
The data set contains a timetable DataTimeTable, among other variables, containing the series. Plot the raw series. Indicate periods of recession in the plot by drawing vertical bands. figure plot(DataTimeTable.Time,DataTimeTable.GDP) recessionplot ylabel("Billions of Dollars") title("Gross Domestic Product")
2-34
Compare One-Sided and Two-Sided Hodrick-Prescott Filter Results
The GDP tends to increase with time also displaying a cycle with the upward trend. Decompose the series into trend and cyclical components using the standard, two-sided HodrickPrescott filter. Because the series is quarterly, use the default smoothing parameter value. Return the third output to plot the data series and the smoothed trend. figure [TTbl2S,CTbl2S,h] = hpfilter(DataTimeTable); legend(h,AutoUpdate="off",Location="best") recessionplot
2-35
2
Data Preprocessing
TTbl2S and CTbl2S are timetables containing variables for the additive smoothed trend and cyclical components, respectively, of the GDP. The trend appears quite smooth with mild oscillations. Decompose the series again using the one-sided Hodrick-Prescott filter by setting the FilterType name-value argument. figure [TTbl1S,CTbl1S,h] = hpfilter(DataTimeTable,FilterType="one-sided"); legend(h,AutoUpdate="off",Location="best") recessionplot
2-36
Compare One-Sided and Two-Sided Hodrick-Prescott Filter Results
The trend of the one-sided filter appears to level sharply close to recessions, as compared to the smooth trend of the two-sided filter. Plot the cyclical components computed from both filter methods on the same plot. figure hold on plot(CTbl2S.Time,CTbl2S.GDP,"r") plot(CTbl1S.Time,CTbl1S.GDP,"r:",LineWidth=2) recessionplot hold off ylabel("Cyclical Component") title("Gross Domestic Product") legend(["Two-Sided HP Cycle","One-Sided HP Cycle"])
2-37
2
Data Preprocessing
The cyclical components appear to be distinct calibrations of forecasting models. Interactive Controls enable you to experiment with the smoothing parameter and the filter type interactively. Choose the filter type by using the drop-down list and adjust the smoothing parameter by using the numeric slider. ft = lambda =
; % FilterType ; % Smoothing
figure [TTbl1S,CTbl1S,h] = hpfilter(DataTimeTable,FilterType=ft,Smoothing=lambda); legend(h,AutoUpdate="off",Location="best") recessionplot
2-38
Compare One-Sided and Two-Sided Hodrick-Prescott Filter Results
References [1] Hodrick, Robert J., and Edward C. Prescott. "Postwar U.S. Business Cycles: An Empirical Investigation." Journal of Money, Credit and Banking 29, no. 1 (February 1997): 1–16. https:// doi.org/10.2307/2953682. [2] Stock, James H., and Mark W. Watson. "Forecasting Inflation." Journal of Monetary Economics 44, no. 2 (October 1999): 293–335. https://doi.org/10.1016/S0304-3932(99)00027-6. [3] U.S. Federal Reserve Economic Data (FRED), Federal Reserve Bank of St. Louis, https:// fred.stlouisfed.org/.
See Also hpfilter | bkfilter | cffilter | hfilter
More About •
When to Use the Hodrick-Prescott Filter
•
“Use Hodrick-Prescott Filter to Reproduce Original Result” on page 2-29
•
“Choose Time Series Filter for Business Cycle Analysis” on page 2-40
•
“Time Series Decomposition” on page 2-19 2-39
2
Data Preprocessing
Choose Time Series Filter for Business Cycle Analysis This example compares the performance of the Hodrick-Prescott filter (hpfilter) for business cycle analysis with the performance of alternatives in the context of various economic data-generating processes. The alternative business cycle filters, available with Econometrics Toolbox™, are the: • Baxter-King filter [1], as computed by the bkfilter function • Christiano-Fitzgerald filter [4], as computed by the cffilter function • One-sided Hodrick-Prescott filter [15], as computed by the hpfilter function • Hamilton filter [9], as computed by the hfilter function This example uses historical data sets available with Econometrics Toolbox. For current data, download series from the Federal Reserve Economic Data (FRED) database on the Federal Reserve Bank of St. Louis website at https://fred.stlouisfed.org/. What Is a Business Cycle? The forces that shape a macroeconomy cause both long-term trends and temporary fluctuations in econometric data. Long-term secular influences include population growth, capital accumulation, productivity enhancements, and market development. Short-term influences include seasonality, regulatory intervention, central bank policies, technology shocks, and investor outlook. When observed in aggregate over multiple indicators of growth, medium-term variations in the economy are often described as recessions and expansions, or business cycles. Despite the suggestion of regularity, these empirical cycles are nondeterministic, aperiodic, and a mixture of frequencies. Business cycles are evident in many macroeconomic time series. For example, the monthly US unemployment rate from 1954 through 1998 shows a distinctive pattern of peaks and troughs. To see the business cycle, load the monthly US unemployment data Data_Unemployment.mat. load Data_Unemployment
The variable Data is a 45-by-12 data matrix with months January through December as columns and years 1954 through 1998 as rows. Arrange the data as one column vector of increasing observation times. Create a datetime vector of observation times. Data = Data'; UN = Data(:); tstart = datetime(dates(1),1,1); tend = datetime(dates(end),12,1); tUN = tstart:calmonths(1):tend;
Plot the series and overlay recession bands determined by the National Bureau of Economic Research (NBER). figure plot(tUN,UN) recessionplot ylabel("Rate (%)") title("Unemployment Rate")
2-40
Choose Time Series Filter for Business Cycle Analysis
In such series, you can attempt to distinguish a trend component, accounting for long-term growth, and a cyclical component, capturing shorter-term deviations from the trend. What constitutes a trend and cycle, however, is a matter of problem formulation, analytic objectives, and the data available. Methods for achieving trend-cycle decompositions of economic time series have a long history. Burns and Mitchell [3] were the first to describe the stylized facts of business cycles in the modern era. They defined US business cycles as cyclical components with a duration of no less than 6 quarters (18 months) and no more than 32 quarters (8 years). Their method requires the investigator's judgement to find peaks and troughs in empirical data, then compare them to reference cycles. The process is illsuited to computer-driven analytics. Hodrick-Prescott Filter Hodrick and Prescott [11] sought to develop an analogue to the Burns and Mitchell approach with a clear computational basis. The Hodrick-Prescott filter removes a low-frequency trend τt from a time series yt and assigns the remaining high-frequency components to a cycle ct, so that yt = τt + ct. The filter identifies the components τt and ct by minimizing the objective function −1 ∑Tt = 1 ct2 + λ∑Tt = 2 τt + 1 − τt − τt − τt − 1
2
,
where T is the sample size of yt and λ is a tunable smoothing parameter. The first term minimizes the deviation of the cyclical component ct = yt − τt from the overall series data. The second term is a numerical second derivative of τt, so that minimization penalizes rapid changes in the slope of the 2-41
2
Data Preprocessing
trend to a degree determined by λ. The derivative is centered at time t, incorporating past and future values, making the filter two-sided and noncausal. The Hodrick-Prescott filter identifies the trend component through smoothing. The frequencies assigned to the cyclical component are highpass, and do not correspond to any specific definition of business cycle. Hodrick and Prescott say of the cyclical component that "the fluctuations studied are those that are too rapid to be accounted for by slowly changing demographic and technological factors and changes in stocks of capital that produce secular growth in output per capita." The hpfilter function implements the Hodrick-Prescott filter. For monthly data such as the unemployment rate series, Hodrick and Prescott recommend a smoothing parameter of λ = 14,400. [UNHPTrend,UNHPCycle] = hpfilter(UN,Smoothing=14400);
Ravn and Uhlig [13] reassess the Hodrick-Prescott filter and recommend a more aggressive smoothing parameter for monthly data of λ = 129,600. [UNRUTrend,UNRUCycle] = hpfilter(UN,Smoothing=129600);
Visually compare the resulting trend components. figure hold on plot(tUN,UN) plot(tUN,UNHPTrend,"r") plot(tUN,UNRUTrend,"m:",LineWidth=2) recessionplot hold off ylabel("Rate (%)") title("Unemployment Rate") legend(["Data" "HP trend" "RU trend"])
2-42
Choose Time Series Filter for Business Cycle Analysis
Visually compare the resulting cyclical components, which are the difference between the data and the trends. figure hold on plot(tUN,UNHPCycle,"r") plot(tUN,UNRUCycle,"m:",LineWidth=2) recessionplot hold off ylabel("Cyclical Component") title("Unemployment Rate") legend(["HP cycle" "RU cycle"])
2-43
2
Data Preprocessing
Greater smoothing of the Ravn-Uhlig trend leads to larger absolute deviations in the cycle. Baxter-King Filter Granger [8] notes that the “typical spectral shape” of macroeconomic time series exhibits substantial power in a range of low frequencies, a high-frequency noise component from aggregated variables, and a business cycle in between. The approach of Baxter and King [1] is focused on the specific definition of business cycle adopted by Burns and Mitchell: "Technically, we develop approximate band-pass filters that are constrained to produce stationary outcomes when applied to growing time series. For the empirical applications... we adopt the definition of business cycles suggested by the procedures and findings of NBER researchers like Burns and Mitchell." The band-pass methodology of Baxter and King formalizes Granger's insights and combines them with the conclusions of Burns and Mitchell, making rigorous a consensus perspective in macroeconomics. The bkfilter function implements the Baxter-King filter. Because it is a bandpass filter, the function requires upper and lower cutoff periods (UpperCutoff and LowerCutoff name-value arguments), in units of data periodicity, to delineate the extent of the business cycle. Instead of a smoothing parameter, an optional lag-length parameter (LagLength name-value argument) adjusts the size of a symmetric, time-invariant moving average that smooths the data and extracts a trend component. The fixed lag length results in data trimming on both ends of the data. Although this approach is different from that of Hodrick and Prescott, for many macroeconomic series, the Baxter-King filter produces similar results. To compare the results, load the US Gross 2-44
Choose Time Series Filter for Business Cycle Analysis
Domestic Product (GDP) data set Data_GDP, which contains quarterly measurements of the US GDP from 1947 through 2005. Plot the series. load Data_GDP GDP = Data; tGDP = datetime(dates,ConvertFrom="datenum"); figure plot(tGDP,GDP) recessionplot ylabel("Billions of Dollars") title("Gross Domestic Product")
The data drifts upward with a clear cyclical component around the trend. For quarterly data, Hodrick-Prescott and Ravn-Uhlig recommendations for the smoothing parameter coincide. Burns and Mitchell cutoffs for the bandpass filter are at 6 and 32 quarters. Baxter and King suggest a lag length of 12 quarters. Apply the Hodrick-Prescott and Baxter-King filters to the US GDP series. Visually compare the cyclical components. [~,GDPHPCycle] = hpfilter(GDP); [~,GDPBKCycle] = bkfilter(GDP); figure hold on
2-45
2
Data Preprocessing
plot(tGDP,GDPHPCycle,"r") plot(tGDP,GDPBKCycle,"g:",LineWidth=2) recessionplot hold off ylabel("Cyclical Component") title("Gross Domestic Product") legend(["HP cycle" "BK cycle"])
Baxter and King note that this close correspondence between the cyclical components does not hold true for all series because the Hodrick-Prescott filter allows unlimited high-frequency variation into the cyclical component. Some macroeconomic series, such as the inflation rate, have significant highfrequency variation. Load the Schwert macroeconomic series data set Data_SchwertMacro.mat. Extract the monthly Consumer Price Index (CPI) and transform it to an inflation rate series. load Data_SchwertMacro CPI = DataTableMth.CPI; INFRate = 100*diff(CPI)./CPI(1:end-1); tINF = datetime(datesMth(2:end),ConvertFrom="datenum");
Apply the Hodrick-Prescott and Baxter-King filters to the US inflation rate series. Use the parameter settings recommended for monthly data. Visually compare the cyclical components. [~,INFHPCycle] = hpfilter(INFRate,Smoothing=14400); [~,INFBKCycle] = bkfilter(INFRate,LowerCutoff=18,UpperCutoff=96, ... LagLength=36);
2-46
Choose Time Series Filter for Business Cycle Analysis
figure hold on plot(tINF,INFHPCycle,"r") plot(tINF,INFBKCycle,"g:",LineWidth=2) recessionplot hold off ylabel("Cyclical Component") title("Inflation Rate") legend(["HP Cycle" "BK Cycle"])
The absolute variation is significantly greater in the Hodrick-Prescott cycle. Christiano-Fitzgerald Filter Christiano and Fitzgerald [4] describe another bandpass filter to extract middle-frequency business cycles. The cffilter function implements the Christiano-Fitzgerald filter. Differences between the Baxter-King and Christiano-Fitzgerald bandpass filters are mostly technical (each optimizes a different objective function), not functional (both filters specify lower and upper cutoff periods for the cycle). The Christiano-Fitzgerald filter produces asymptotically optimal finitesample approximations to an ideal bandpass filter. A caveat is that it "is optimal under the (most likely, false) assumption that the data are generated by a pure random walk." To evaluate the performance of the filter under the pure random walk assumption, generate a path from a random walk model, with a drift component to introduce a trend. 2-47
2
Data Preprocessing
RW0 = 0; % Initial value drift = 0.2; % Drift component sigma = 0.2; % Innovations standard deviation rng(1); % For reproducibility RW = RW0*ones(100,1); for tRW = 2:100 RW(tRW) = RW(tRW-1) + drift + sigma*randn; % Random walk with unit root end plot(RW) title("Random Walk with Drift")
Compare the Baxter-King and Christiano-Fitzgerald bandpass filters under the random walk assumption, treating the data as quarterly. When a series is trending, cffilter must estimate an additional drift parameter. Therefore, specify that the series has drift by setting the Drift namevalue argument to true. The Christiano-Fitzgerald filter avoids end-trimming of data by using an asymmetric, time-varying moving average (the default FilterType="asymmetric") to smooth the data and extract the trend component, so it does not require a LagLength name-value argument. The asymmetric filter can cause some phase shifting in the cycle. [~,RWBKCycle] = bkfilter(RW); [~,RWCFCycle] = cffilter(RW,Drift=true); figure hold on plot(RWBKCycle,"g:",LineWidth=2)
2-48
Choose Time Series Filter for Business Cycle Analysis
plot(RWCFCycle,"c",LineWidth=2) hold off ylabel("Cyclical Component") title("Random Walk with Drift") legend(["BK cycle" "CF cycle"])
The presence of random walks in economic data, and more generally unit roots in autoregressive models, is a subject of continuing debate in econometrics. The assumption has significant consequences for long-term forecasting. Regarding filter performance, however, Christiano and Fitzgerald claim that their method "is nearly optimal for the type of time series representations that fit US data on interest rates, unemployment, inflation, and output" and consequently "though it is only optimal for one particular time series representation, nevertheless works well for standard macroeconomic time series." Compare the two bandpass filters on the GDP output data. Because the raw GDP data exhibits a drift, specify estimating the drift component. [~,GDPBKCycle] = bkfilter(GDP); [~,GDPCFCycle] = cffilter(GDP,Drift=true); figure hold on plot(tGDP,GDPBKCycle,"g:",LineWidth=2) plot(tGDP,GDPCFCycle,"c",LineWidth=2) recessionplot hold off ylabel("Cyclical Component")
2-49
2
Data Preprocessing
title("Gross Domestic Product") legend(["BK cycle" "CF cycle"])
Differences between the Baxter-King and Christiano-Fitzgerald cycles appear at data endpoints, where the Baxter-King method clips the cycle by lag and lead lengths. For the GDP data, you can also see differences in the amplitudes of the cyclical swings, which determine turning points for periods of recession and expansion. For a thorough analysis of economic data, you can use the two filters, with their different optimizations, to complement each other. One-Sided Hodrick-Prescott Filter The standard Hodrick-Prescott filter computes a two-sided, centered difference to estimate a second derivative at time t, using both past and future values of the input series. As such, the filter is often applied to historical data. However, this noncausality can lead to end effects that give the filtered data a retrospective, and artificial, predictive power. Stock and Watson [15] suggest that the noncausality in the Hodrick-Prescott filter can distort forecast results. To address this distortion, they consider a one-sided version of the filter that uses only current and previous values of the input series. The one-sided filter does not revise outputs when new data becomes available. The one-sided filter aims to produce robust forecast performance rather than extract a secular trend. This revised goal reframes the notion of business cycle. The hpfilter function, with name-value argument FilterType set to "one-sided", implements the one-sided Hodrick-Prescott filter. 2-50
Choose Time Series Filter for Business Cycle Analysis
Compare the two versions of the Hodrick-Prescott filter on the GDP output data. [~,GDPHP2Cycle] = hpfilter(GDP); [~,GDPHP1Cycle] = hpfilter(GDP,Smoothing=1600,FilterType="one-sided"); figure hold on plot(tGDP,GDPHP2Cycle,"r") plot(tGDP,GDPHP1Cycle,"r:",LineWidth=2) recessionplot hold off ylabel("Cyclical Component") title("Gross Domestic Product") legend(["Two-sided HP cycle" "One-sided HP cycle"])
The two cycles lead to distinct calibrations of forecasting models (see Evaluate Filter Performance on page 2-53). Hamilton Filter Hamilton [9] expresses concern about use of the Hodrick-Prescott filter, including ad hoc methods for setting the smoothing parameter and the potential for producing spurious end effects. These concerns have been addressed in, for example, [1], [4], [13], and [15]. Hamilton presents what he calls a "better alternative," claiming it "offers a robust approach to detrending that achieves all the objectives sought by users of the HP filter with none of its drawbacks." His method is "a regression of the variable at date t + h on the four most recent values 2-51
2
Data Preprocessing
as of date t," with the business cycle defined by the forecast error: "How different is the value at t + h from the value that we would have expected to see based on its behavior through date t?" The method is related to the time series decomposition in [2]. The hfilter function implements the Hamilton filter. Compare the Hamilton filter with the similarly forecast-oriented one-sided Hodrick-Prescott filter. Apply the filters to the GDP output data. Use the default values of the following optional name-value arguments of hfilter: • LeadLength —The horizon h of the response variable in the regression. The default is 8. • LagLength — The number of recent values to use as predictor variables in the regression. The default is 4. [~,GDPHCycle] = hfilter(GDP); [~,GDPHP1Cycle] = hpfilter(GDP,FilterType="one-sided"); figure hold on plot(tGDP,GDPHCycle,"k") plot(tGDP,GDPHP1Cycle,"r:",LineWidth=2) recessionplot hold off ylabel("Cyclical Component") title("Gross Domestic Product") legend(["H cycle" "HP1 cycle"])
2-52
Choose Time Series Filter for Business Cycle Analysis
Because hfilter requires lags of the input variable, the function trims observations at the beginning of the data. The Hamilton cycle shows more distinct departures during historical recessions and expansions, which identify turning points and calibrate forecasting models. Evaluate Filter Performance Despite the different approaches, the filters in this example often identify similar cycles in economic data, reinforcing the idea of a true business cycle that exists in the data-generating process (DGP). The plots show estimated cycles under the different assumptions and optimizations characterizing the individual filters. However, the plots do not show how well each filter meets its own objectives or captures the DGP cycle. To evaluate relative performance, you must measure outcomes on the same terms. For example, the Baxter-King and Christiano-Fitzgerald filters both approximate ideal bandpass filters for isolating a mid-range of frequencies. Accordingly, compare the performance of these two filters by seeing which frequencies appear in the extracted cycle. A periodogram, computed by the discrete “Fourier Transforms”, shows the power of different frequencies in the filtered series, conditional on filter parameter settings. Compute the power spectrum of the cyclical components of the US GDP returned by the Baxter-King and Christiano-Fitzgerald filters (GDPBKCycle and GDPCFCycle). Specify comparable settings. fs = 1; % Sample frequency (per quarter) GDPBKCycle = rmmissing(GDPBKCycle); nBK = length(GDPBKCycle); DFTBK = fft(GDPBKCycle); fBK = (0:nBK-1)*(fs/nBK); % Frequency range PBK = DFTBK.*conj(DFTBK)/nBK; % Power nCF = DFTCF fCF = PCF =
length(GDPCFCycle); = fft(GDPCFCycle); (0:nCF-1)*(fs/nCF); % Frequency range DFTCF.*conj(DFTCF)/nCF; % Power
Compare the periodograms of the two series. figure hold on plot(fBK,log(PBK),"g:",LineWidth=2) plot(fCF,log(PCF),"c",LineWidth=2) hold off xline(fs/6,"r") % Burns and Mitchell upper frequency xline(fs/32,"r") % Burns and Mitchell lower frequency xlim([0 fs/2]) xlabel("Frequency") ylabel("Log Power") title("Periodogram") legend(["BK cycle" "CF cycle" "Cutoffs"])
2-53
2
Data Preprocessing
Both filters concentrate power in the specified frequency band. On the GDP data, the ChristianoFitzgerald filter is slightly more successful at removing the low-frequency trend components, while the Baxter-King filter is more successful at removing high-frequency noise. Similarly, compare the one-sided Hodrick-Prescott filter with the Hamilton filter based on their forecasting goals. Use the following procedure: 1
Fit a two-state (recession/expansion) Markov-switching model to the extracted cycles in the first two-thirds of the GDP data.
2
Forecast each series into the final one-third of the data.
Partition the data into estimation and forecast samples. n = length(GDP); nEst = floor((2/3)*n); nFor = n-nEst; GDPEst = GDP(1:nEst); % Estimation sample tGDPEst = tGDP(1:nEst); GDPFor = GDP(nEst+1:end); % Forecast sample tGDPFor = tGDP(nEst+1:end);
Filter the cycles from the estimation sample using the one-sided Hodrick-Prescott and Hamilton filters. Preprocess the cycles by trimming any leading and trailing NaN values and scaling the series to improve estimation. [~,GDPEstHCycle] = hfilter(GDPEst); GDPEstHCycle = rmmissing(GDPEstHCycle);
2-54
Choose Time Series Filter for Business Cycle Analysis
GDPEstHCycleS = GDPEstHCycle/100; [~,GDPEstHP1Cycle] = hpfilter(GDPEst,FilterType="one-sided"); GDPEstHP1CycleS = GDPEstHP1Cycle/100;
Fit a two-state Markov-switching model with AR(1) submodels to each estimation sample cycle. For more details on Markov-switching models, see “Creating Markov-Switching Dynamic Regression Models” on page 10-139. % Partially specified models for estimation P = NaN(2); mc = dtmc(P); mdl = arima(1,0,0); Mdl = msVAR(mc,[mdl; mdl]); % Fully specified models of initial values P0 = 0.5*ones(2); mc0 = dtmc(P0); mdl01 = arima('Constant',1,'AR',0.5,'Variance',1); mdl02 = arima('Constant',-1,'AR',0.5,'Variance',1); Mdl0 = msVAR(mc0,[mdl01; mdl02]); % Fit models to scaled cycles. EstMdlH = estimate(Mdl,Mdl0,GDPEstHCycleS); EstMdlHP1 = estimate(Mdl,Mdl0,GDPEstHP1CycleS);
Extract cycles from the forecast sample using the one-sided Hodrick-Prescott and Hamilton filters. Preprocess the cycles by scaling the series to match the estimation sample scale. [~,GDPForHCycle] = hfilter(GDPFor); GDPForHCycleS = GDPForHCycle/100; [~,GDPForHP1Cycle] = hpfilter(GDPFor,FilterType="one-sided"); GDPForHP1CycleS = GDPForHP1Cycle/100;
Provide optimal point forecasts of the switching models into the forecast period. Supply the scaled estimation cycles as presample data to initialize the forecasts. fH = forecast(EstMdlH,GDPEstHCycleS,nFor); fHP1 = forecast(EstMdlHP1,GDPEstHP1CycleS,nFor);
Compare the forecasts with the filtered cycles. figure hold on h1 = plot(tGDPEst(end-length(GDPEstHCycle)+1:end),GDPEstHCycleS,"k"); h2 = plot(tGDPFor,GDPForHCycleS,"k--"); h3 = plot(tGDPEst,GDPEstHP1CycleS,"r:",LineWidth=2); h4 = plot(tGDPFor,GDPForHP1CycleS,"r--"); h5 = plot(tGDPFor,fH,"k",LineWidth=2); h6 = plot(tGDPFor,fHP1,"r",LineWidth=2); yfill = [ylim fliplr(ylim)]; xfill = [tGDPEst([end end])' tGDPFor([end end])']; patch(xfill,yfill,'k',FaceAlpha=0.05) hold off ylabel("Scaled Cyclical Component") title("GDP Forecasts") names = ["H estimation cycle" "H forecast cycle" ...
2-55
2
Data Preprocessing
"HP1 estimation cycle" "HP1 forecast cycle" ... "H forecast" "HP1 forecast"]; legend([h1 h2 h3 h4 h5 h6],names,Location="northwest")
The forecasts, though different, are comparable over a business cycle horizon. However, the forecasts depend as much on the model fit to the cycles and the forecasting method employed as they do on the cycles extracted by the filters. As a result, a clear measure of filter performance is difficult to determine from this analysis (see also [14]). The difficulty is essential. The extracted cycles cannot be compared to measurable data. The filters do not identify true cycles, but cycles built into their respective designs. Absolute filter performance becomes self-referential, evaluated on specific assumptions about the nature of the business cycle and the data-generating process. Hodrick [10] addresses concerns about the Hodrick-Prescott filter by comparing it to other filters in the context of a variety of data-generating processes relevant to economic time series. He considers simple processes, such as random walks and ARIMA models, with no distinction between trend and cyclical components, and more complex processes, such as unobserved components models, where trend and cyclical components are modeled explicitly. When built into the model, cycles serve as proxies for true cycles in the macroeconomy, and simulations provide measurable data for comparison. Hodrick claims that "as the time series become more complex, the performance of the HP and BK filters more closely characterize the underlying cyclical frameworks than the H filter." To illustrate Hodrick's investigation, simulate an unobserved components model of the GDP, as in [5], calibrated with parameter settings in [10]. 2-56
Choose Time Series Filter for Business Cycle Analysis
numSim = 100; rng(1); % For reproducibility % Preallocate coditionalMean = ones(numSim,1); simCycle = ones(numSim,1); simTrend = ones(numSim,1); % Coefficients from Hodrick [10] sigma1 = 0.021; sigma2 = 0.603; sigma3 = 0.545; ARCycle = [1.510,-0.565]; % Unobserved components model for t = 3:numSim coditionalMean(t) = coditionalMean(t-1) + sigma1*randn; % Random walk simCycle(t) = ARCycle*[simCycle(t-1);simCycle(t-2)] + sigma2*randn; % ARIMA(2,0,0) simTrend(t) = simTrend(t-1) + coditionalMean(t-1) + sigma3*randn; end simData = simCycle + simTrend; % DGP = Cycle + Trend figure plot(simData) title("Simulated GDP Data")
2-57
2
Data Preprocessing
The simulated data is comparable to empirical GDP data. Apply the two-sided Hodrick-Prescott, Baxter-King, and Hamilton filters to the simulated data. [~,simHPCycle] = hpfilter(simData); [~,simBKCycle] = bkfilter(simData); [~,simHCycle] = hfilter(simData); figure hold on h0 = plot(simCycle,LineWidth=2); h1 = plot(simHPCycle,"r"); h2 = plot(simBKCycle,"g:",LineWidth=2); h3 = plot(simHCycle,"k--"); hold off names = ["DGP" "HP" "BK" "H"]; legend([h0 h1 h2 h3],names,Location="northwest") title("Cycle Estimates")
The cycles identified by the Hodrick-Prescott and Baxter-King filters mostly follow one another, while the Hamilton cycle swings more widely. Because the DGP cycle, simCycle, is built into the model, it is available for comparison when evaluating filter performance. Plot pairwise correlations of the cycles. corrplot([simCycle simHPCycle simBKCycle simHCycle],VarNames=names)
2-58
Choose Time Series Filter for Business Cycle Analysis
In this simulation of the GDP, the HP and BK cycles are tightly correlated with each other, and they are more strongly correlated with the DGP than the Hamilton cycle. For an analysis with similar conclusions, see [12]. Hodrick summarizes by saying, "Consequently, the most desirable approach to decomposing a time series into growth and cyclical components and hence the advice that one would give to someone that wants to detrend a series to focus on cyclical fluctuations clearly depends on the underlying model that one has in mind. For GDP, if one thinks that growth is caused by slowly moving changes in demographics, like population growth and changes in rates of labor force participation, as well as slowly moving changes in the productivity of capital and labor, then the filtering methods of Hodrick and Prescott and Baxter and King seem like the superior way to model the cyclical component." Summary The Hodrick-Prescott filter focuses on the identification of a slowly moving, easily computable trend component in macroeconomic time series. The Baxter-King and Christiano-Fitzgerald filters focus on frequency domain specifications of the cycle. The one-sided Hodrick-Prescott and Hamilton filters are more concerned with forecast performance for real-time analysis. Each filter is based on a particular definition of the business cycle, and choosing among them depends on what you aim to capture in the cycle. Because you cannot measure true business cycles, you must evaluate each filter relative to its own goals. Simulation studies show that different options perform better or worse depending on the nature of the data-generating process. Researchers often have a specific model in mind, based on economic theory, and they should choose a filter that is best adapted to that setting. Otherwise, 2-59
2
Data Preprocessing
practitioners should look to the filters presented in this example for a variety of robust, well-regarded options, suitable for comparative analyses.
References [1] Baxter, Marianne, and Robert G. King. "Measuring Business Cycles: Approximate Band-Pass Filters for Economic Time Series." Review of Economics and Statistics 81, no. 4 (November 1999): 575–93. https://doi.org/10.1162/003465399558454. [2] Beveridge, Stephen, and Charles R. Nelson. "A New Approach to Decomposition of Economic Time Series into Permanent and Transitory Components with Particular Attention to Measurement of the 'Business Cycle.'" Journal of Monetary Economics 7 (January 1981): 151–74. https:// doi.org/10.1016/0304-3932(81)90040-4. [3] Burns, Arthur F., and Wesley C. Mitchell. Measuring Business Cycles. Cambridge, MA: National Bureau of Economic Research, 1946. [4] Christiano, Lawrence J., and Terry J. Fitzgerald. "The Band Pass Filter." International Economic Review 44 (May 2003): 435–65. https://doi.org/10.1111/1468-2354.t01-1-00076. [5] Clark, Peter K. "The Cyclical Component of U. S. Economic Activity." The Quarterly Journal of Economics 102, no. 4 (November 1987): 797–814. https://doi.org/10.2307/1884282. [6] Cogley, Timothy, and James M. Nason. "Effects of the Hodrick-Prescott Filter on Trend and Difference Stationary Time Series Implications for Business Cycle Research." Journal of Economic Dynamics and Control 19, no. 1 (January1995): 253–78. https://doi.org/ 10.1016/0165-1889(93)00781-X. [7] de Jong, Robert M., and Neslihan Sakarya. "The Econometrics of the Hodrick-Prescott Filter." Review of Economics and Statistics 98, no. 2 (May 2016): 310–17. https://doi.org/10.1162/ REST_a_00523. [8] Granger, Clive W. J. "The Typical Spectral Shape of an Economic Variable." Econometrica 34, no. 1 (January 1966): 150–61. https://doi.org/10.2307/1909859. [9] Hamilton, James D. "Why You Should Never Use the Hodrick-Prescott Filter." The Review of Economics and Statistics 100 (December 2018): 831–43. https://doi.org/10.1162/ rest_a_00706. [10] Hodrick, Robert J. "An Exploration of Trend-Cycle Decomposition Methodologies in Simulated Data." National Bureau of Economic Research Working Paper No. w26750. Social Science Research Network (February 2020). https://papers.ssrn.com/abstract=3539317. [11] Hodrick, Robert J., and Edward C. Prescott. "Postwar U.S. Business Cycles: An Empirical Investigation." Journal of Money, Credit and Banking 29, no. 1 (February 1997): 1–16. https:// doi.org/10.2307/2953682. [12] Morley, James C., Charles R. Nelson, and Eric Zivot. "Why Are the Beveridge-Nelson and Unobserved-Components Decompositions of GDP So Different?" Review of Economics and Statistics 85, no. 2 (May 2003): 235–43. https://doi.org/10.1162/003465303765299765. [13] Ravn, Morton O., and Harald Uhlig. "On Adjusting the Hodrick-Prescott Filter for the Frequency of Observations." The Review of Economics and Statistics 84 , no. 2 (May 2002): 371–76. https://doi.org/10.1162/003465302317411604. 2-60
Choose Time Series Filter for Business Cycle Analysis
[14] Shackleton, Robert. "Estimating and Projecting Potential Output Using CBO's Forecasting Growth Model." Congressional Budget Office Working Paper No. 2018-03 (February 2018). https://www.cbo.gov/publication/53558. [15] Stock, James H., and Mark W. Watson. "Forecasting Inflation." Journal of Monetary Economics 44, no. 2 (October 1999): 293–335. https://doi.org/10.1016/S0304-3932(99)00027-6.
See Also hpfilter | bkfilter | cffilter | hfilter
More About •
When to Use the Hodrick-Prescott Filter
•
“Use Hodrick-Prescott Filter to Reproduce Original Result” on page 2-29
•
“Compare One-Sided and Two-Sided Hodrick-Prescott Filter Results” on page 2-34
•
“Time Series Decomposition” on page 2-19
2-61
2
Data Preprocessing
Seasonal Filters In this section... “What Is a Seasonal Filter?” on page 2-62 “Stable Seasonal Filter” on page 2-62 “Sn × m seasonal filter” on page 2-63
What Is a Seasonal Filter? You can use a seasonal filter (moving average) to estimate the seasonal component of a time series. For example, seasonal moving averages play a large role in the X-11-ARIMA seasonal adjustment program of Statistics Canada [1] and the X-12-ARIMA seasonal adjustment program of the U.S. Census Bureau [2]. For observations made during period k, k = 1,...,s (where s is the known periodicity of the seasonality), a seasonal filter is a convolution of weights and observations made during past and future periods k. For example, given monthly data (s = 12), a smoothed January observation is a symmetric, weighted average of January data. In general, for a time series xt, t = 1,...,N, the seasonally smoothed observation at time k + js, j = 1, ...,N/s – 1, is sk +
js
r
=
∑
l= −r
alxk + ( j + l)s,
(2-1) r
with weights al such that ∑l =
− r al
= 1.
The two most commonly used seasonal filters are the stable seasonal filter and the Sn × m seasonal filter.
Stable Seasonal Filter Use a stable seasonal filter if the seasonal level does not change over time, or if you have a short time series (under 5 years). Let nk be the total number of observations made in period k. A stable seasonal filter is given by sk =
1 nk
(N/s) − 1
∑
j=1
xk + js,
for k = 1,...,s, and s k = s k − s for k > s. s
Defines = (1/s) ∑k = 1 s k . For identifiability from the trend component, • Use s k = s k − s to estimate the seasonal component for an additive decomposition model (that is, constrain the component to fluctuate around zero). • Use s k = s k /s to estimate the seasonal component for a multiplicative decomposition model (that is, constrain the component to fluctuate around one). 2-62
Seasonal Filters
Sn × m seasonal filter To apply an Sn × m seasonal filter, take a symmetric n-term moving average of m-term averages. This is equivalent to taking a symmetric, unequally weighted moving average with n + m – 1 terms (that is, use r = (n + m − 1)/2 in “Equation 2-1”). An S3×3 filter has five terms with weights 1/9, 2/9, 1/3, 2/9, 1/9 . To illustrate, suppose you have monthly data over 10 years. Let Janyy denote the value observed in January, 20yy. The S3×3-filtered value for January 2005 is J an05 =
1 1 1 1 Jan03 + Jan04 + Jan05 + Jan04 + Jan05 + Jan06 + Jan05 + Jan06 + Jan07 . 3 3 3 3
Similarly, an S3×5 filter has seven terms with weights 1/15, 2/15, 1/5, 1/5, 1/5, 2/15, 1/15 . When using a symmetric filter, observations are lost at the beginning and end of the series. You can apply asymmetric weights at the ends of the series to prevent observation loss. To center the seasonal estimate, define a moving average of the seasonally filtered series, q
s t = ∑ j = − q b js t + j . A reasonable choice for the weights areb j = 1/4q for j = ±q and b j = 1/2q otherwise. Here, q = 2 for quarterly data (a 5-term average), or q = 6 for monthly data (a 13-term average). For identifiability from the trend component, • Use s t = s t − s t to estimate the seasonal component of an additive model (that is, constrain the component to fluctuate approximately around zero). • Use s t = s t /s t to estimate the seasonal component of a multiplicative model (that is, constrain the component to fluctuate approximately around one).
References [1] Dagum, E. B. The X-11-ARIMA Seasonal Adjustment Method. Number 12–564E. Statistics Canada, Ottawa, 1980. [2] Findley, D. F., B. C. Monsell, W. R. Bell, M. C. Otto, and B.-C. Chen. “New Capabilities and Methods of the X-12-ARIMA Seasonal-Adjustment Program.” Journal of Business & Economic Statistics. Vol. 16, Number 2, 1998, pp. 127–152.
See Also Related Examples •
“Seasonal Adjustment Using a Stable Seasonal Filter” on page 2-67
•
“Seasonal Adjustment Using S(n,m) Seasonal Filters” on page 2-72 2-63
2
Data Preprocessing
More About
2-64
•
“Moving Average Filter” on page 2-21
•
“Seasonal Adjustment” on page 2-65
•
“Time Series Decomposition” on page 2-19
Seasonal Adjustment
Seasonal Adjustment In this section... “What Is Seasonal Adjustment?” on page 2-65 “Deseasonalized Series” on page 2-65 “Seasonal Adjustment Process” on page 2-65
What Is Seasonal Adjustment? Economists and other practitioners are sometimes interested in extracting the global trends and business cycles of a time series, free from the effect of known seasonality. Small movements in the trend can be masked by a seasonal component, a trend with fixed and known periodicity (e.g., monthly or quarterly). The presence of seasonality can make it difficult to compare relative changes in two or more series. Seasonal adjustment is the process of removing a nuisance periodic component. The result of a seasonal adjustment is a deseasonalized time series. Deseasonalized data is useful for exploring the trend and any remaining irregular component. Because information is lost during the seasonal adjustment process, you should retain the original data for future modeling purposes.
Deseasonalized Series Consider decomposing a time series, yt, into three components: • Trend component, Tt • Seasonal component, St with known periodicity s • Irregular (stationary) stochastic component, It The most common decompositions are additive, multiplicative, and log-additive. To seasonally adjust a time series, first obtain an estimate of the seasonal component, S t. The estimateS t should be constrained to fluctuate around zero (at least approximately) for additive models, and around one, approximately, for multiplicative models. These constraints allow the seasonal component to be identifiable from the trend component. Given S t, the deseasonalized series is calculated by subtracting (or dividing by) the estimated seasonal component, depending on the assumed decomposition. • For an additive decomposition, the deseasonalized series is given by d = y − S . t t t • For a multiplicative decomposition, the deseasonalized series is given by d = y /S . t t t
Seasonal Adjustment Process To best estimate the seasonal component of a series, you should first estimate and remove the trend component. Conversely, to best estimate the trend component, you should first estimate and remove the seasonal component. Thus, seasonal adjustment is typically performed as an iterative process. The following steps for seasonal adjustment resemble those used within the X-12-ARIMA seasonal adjustment program of the U.S. Census Bureau [1]. 2-65
2
Data Preprocessing
1
Obtain a first estimate of the trend component, T t, using a moving average or parametric trend estimate.
2
Detrend the original series. For an additive decomposition, calculate xt = yt − T t. For a multiplicative decomposition, calculate xt = yt /T t.
3
Apply a seasonal filter to the detrended series,xt, to obtain an estimate of the seasonal component, S t. Center the estimate to fluctuate around zero or one, depending on the chosen decomposition. Use an S3×3 seasonal filter if you have adequate data, or a stable seasonal filter otherwise.
4
Deseasonalize the original series. For an additive decomposition, calculate dt = yt − S t. For a multiplicative decomposition, calculate dt = yt /S t ..
5
Obtain a second estimate of the trend component, T t,, using the deseasonalized series dt . Consider using a Henderson filter [1], with asymmetric weights at the ends of the series.
6
Detrend the original series again. For an additive decomposition, calculate xt = yt − T t. For a multiplicative decomposition, calculate xt = yt /T t.
7
Apply a seasonal filter to the detrended series, xt, to obtain an estimate of the seasonal component, S t. Consider using an S3×5 seasonal filter if you have adequate data, or a stable seasonal filter otherwise.
8
Deseasonalize the original series. For an additive decomposition, calculate dt = yt − S t. For a multiplicative decomposition, calculate dt = yt /S t . This is the final deseasonalized series.
References [1] Findley, D. F., B. C. Monsell, W. R. Bell, M. C. Otto, and B.-C. Chen. “New Capabilities and Methods of the X-12-ARIMA Seasonal-Adjustment Program.” Journal of Business & Economic Statistics. Vol. 16, Number 2, 1998, pp. 127–152.
See Also Related Examples
2-66
•
“Moving Average Trend Estimation” on page 2-22
•
“Seasonal Adjustment Using a Stable Seasonal Filter” on page 2-67
•
“Seasonal Adjustment Using S(n,m) Seasonal Filters” on page 2-72
•
“Parametric Trend Estimation” on page 2-24
•
“Time Series Decomposition” on page 2-19
•
“Seasonal Filters” on page 2-62
•
“Moving Average Filter” on page 2-21
Seasonal Adjustment Using a Stable Seasonal Filter
Seasonal Adjustment Using a Stable Seasonal Filter This example shows how to use a stable seasonal filter to deseasonalize a time series (using an additive decomposition). The time series is monthly accidental deaths in the U.S. from 1973 to 1978 (Brockwell and Davis, 2002). Load Data Load the accidental deaths data set. load Data_Accidental y = DataTimeTable.NUMD; T = length(y); figure plot(DataTimeTable.Time,y/1000) title('Monthly Accidental Deaths') ylabel('Number of deaths (thousands)') hold on
The data exhibits a strong seasonal component with periodicity 12.
2-67
2
Data Preprocessing
Apply 13-term moving average Smooth the data using a 13-term moving average. To prevent observation loss, repeat the first and last smoothed values six times. Subtract the smoothed series from the original series to detrend the data. Add the moving average trend estimate to the observed time series plot. sW13 = [1/24; repmat(1/12,11,1); 1/24]; yS = conv(y,sW13,'same'); yS(1:6) = yS(7); yS(T-5:T) = yS(T-6); xt = y-yS; plot(DataTimeTable.Time,yS/1000,'r','LineWidth',2); legend('13-Term Moving Average') hold off
The detrended time series is xt. Using the shape parameter 'same' when calling conv returns a smoothed series the same length as the original series. Create Seasonal Indices Create a cell array, sidx, to store the indices corresponding to each period. The data is monthly, with periodicity 12, so the first element of sidx is a vector with elements 1, 13, 25,...,61 (corresponding to January observations). The second element of sidx is a vector with elements 2, 14, 16,...,62 (corresponding to February observations). This is repeated for all 12 months. 2-68
Seasonal Adjustment Using a Stable Seasonal Filter
s = 12; sidx = cell(s,1); for i = 1:s sidx{i,1} = i:s:T; end sidx{1:2} ans = 1×6 1
13
25
37
49
61
14
26
38
50
62
ans = 1×6 2
Using a cell array to store the indices allows for the possibility that each period does not occur the same number of times within the span of the observed series. Apply Stable Seasonal Filter Apply a stable seasonal filter to the detrended series, xt. Using the indices constructed in Step 3, average the detrended data corresponding to each period. That is, average all of the January values (at indices 1, 13, 25,...,61), and then average all of the February values (at indices 2, 14, 26,...,62), and so on for the remaining months. Put the smoothed values back into a single vector. Center the seasonal estimate to fluctuate around zero. sst = cellfun(@(x) mean(xt(x)),sidx); % Store smoothed values in a vector of length T nc = floor(T/s); % Num. complete years rm = mod(T,s); % Num. extra months sst = [repmat(sst,nc,1);sst(1:rm)]; % Center the seasonal estimate (additive) sBar = mean(sst); sst = sst-sBar; figure plot(DataTimeTable.Time,sst/1000) title('Stable Seasonal Component') ylabel('Number of deaths (thousands)')
2-69
2
Data Preprocessing
The stable seasonal component has constant amplitude across the series. The seasonal estimate is centered, and fluctuates around zero. Deseasonalize Series Subtract the estimated seasonal component from the original data. dt = y - sst; figure plot(DataTimeTable.Time,dt/1000) title('Deseasonalized Series') ylabel('Number of deaths (thousands)')
2-70
Seasonal Adjustment Using a Stable Seasonal Filter
The deseasonalized series consists of the long-term trend and irregular components. A large-scale quadratic trend in the number of accidental deaths is clear with the seasonal component removed. References: Brockwell, P. J. and R. A. Davis. Introduction to Time Series and Forecasting. 2nd ed. New York, NY: Springer, 2002.
See Also conv | cellfun
Related Examples •
“Moving Average Trend Estimation” on page 2-22
•
“Seasonal Adjustment Using S(n,m) Seasonal Filters” on page 2-72
•
“Parametric Trend Estimation” on page 2-24
•
“Time Series Decomposition” on page 2-19
•
“Moving Average Filter” on page 2-21
•
“Seasonal Filters” on page 2-62
•
“Seasonal Adjustment” on page 2-65
2-71
2
Data Preprocessing
Seasonal Adjustment Using S(n,m) Seasonal Filters This example shows how to apply Sn × m seasonal filters to deseasonalize a time series (using a multiplicative decomposition). The time series is monthly international airline passenger counts from 1949 to 1960. Load Data Load the airline data set. load Data_Airline y = DataTimeTable.PSSG; T = length(y); figure plot(DataTimeTable.Time,y) title('Airline Passenger Counts') hold on
The data shows an upward linear trend and a seasonal component with periodicity 12. Detrend Data Using 13-term Moving Average Before estimating the seasonal component, estimate and remove the linear trend. Apply a 13-term symmetric moving average, repeating the first and last observations six times to prevent data loss. 2-72
Seasonal Adjustment Using S(n,m) Seasonal Filters
Use weight 1/24 for the first and last terms in the moving average, and weight 1/12 for all interior terms. Divide the original series by the smoothed series to detrend the data. Add the moving average trend estimate to the observed time series plot. sW13 = [1/24;repmat(1/12,11,1);1/24]; yS = conv(y,sW13,'same'); yS(1:6) = yS(7); yS(T-5:T) = yS(T-6); xt = y./yS; plot(DataTable.Time,yS,'r','LineWidth',2) legend(["Passenger counts" "13-Term Moving Average"]) hold off
Create Seasonal Indices Create a cell array, sidx, to store the indices corresponding to each period. The data is monthly, with periodicity 12, so the first element of sidx is a vector with elements 1, 13, 25,...,133 (corresponding to January observations). The second element of sidx is a vector with elements 2, 14, 16,...,134 (corresponding to February observations). This is repeated for all 12 months. s = 12; sidx = cell(s,1); % Preallocation for i = 1:s
2-73
2
Data Preprocessing
sidx{i,1} = i:s:T; end sidx{1:2} ans = 1×12 1
13
25
37
49
61
73
85
97
109
121
133
26
38
50
62
74
86
98
110
122
134
ans = 1×12 2
14
Using a cell array to store the indices allows for the possibility that each period does not occur the same number of times within the span of the observed series. Apply S(3,3) Filter Apply a 5-term S3 × 3 seasonal moving average to the detrended series xt. That is, apply a moving average to the January values (at indices 1, 13, 25,...,133), and then apply a moving average to the February series (at indices 2, 14, 26,...,134), and so on for the remaining months. Use asymmetric weights at the ends of the moving average (using conv2). Put the smoothed values back into a single vector. To center the seasonal component around one, estimate, and then divide by, a 13-term moving average of the estimated seasonal component. % S3x3 seasonal filter % Symmetric weights sW3 = [1/9;2/9;1/3;2/9;1/9]; % Asymmetric weights for end of series aW3 = [.259 .407;.37 .407;.259 .185;.111 0]; % Apply filter to each month shat = NaN*y; for i = 1:s ns = length(sidx{i}); first = 1:4; last = ns - 3:ns; dat = xt(sidx{i}); sd = conv(dat,sW3,'same'); sd(1:2) = conv2(dat(first),1,rot90(aW3,2),'valid'); sd(ns -1:ns) = conv2(dat(last),1,aW3,'valid'); shat(sidx{i}) = sd; end % 13-term moving average of filtered series sW13 = [1/24;repmat(1/12,11,1);1/24]; sb = conv(shat,sW13,'same'); sb(1:6) = sb(s+1:s+6); sb(T-5:T) = sb(T-s-5:T-s); % Center to get final estimate s33 = shat./sb;
2-74
Seasonal Adjustment Using S(n,m) Seasonal Filters
figure plot(DataTimeTable.Time,s33) title('Estimated Seasonal Component')
Notice that the seasonal level changes over the range of the data. This illustrates the difference between an Sn × m seasonal filter and a stable seasonal filter. A stable seasonal filter assumes that the seasonal level is constant over the range of the data. Apply 13-term Henderson Filter To get an improved estimate of the trend component, apply a 13-term Henderson filter to the seasonally adjusted series. The necessary symmetric and asymmetric weights are provided in the following code. % Deseasonalize series dt = y./s33; % Henderson filter weights sWH = [-0.019;-0.028;0;.066;.147;.214; .24;.214;.147;.066;0;-0.028;-0.019]; % Asymmetric weights for end of series aWH = [-.034 -.017 .045 .148 .279 -.005 .051 .130 .215 .292 .061 .135 .201 .241 .254 .144 .205 .230 .216 .174 .211 .233 .208 .149 .080
.421; .353; .244; .120; .012;
2-75
2
Data Preprocessing
.238 .213 .147 .066 .001 -.026 -.016
.210 .146 .066 .003 -.022 -.011 0
.144 .066 .004 -.020 -.008 0 0
.068 .003 -.025 -.016 0 0 0
.002 -.039 -.042 0 0 0 0
-.058; -.092; 0 ; 0 ; 0 ; 0 ; 0 ];
% Apply 13-term Henderson filter first = 1:12; last = T-11:T; h13 = conv(dt,sWH,'same'); h13(T-5:end) = conv2(dt(last),1,aWH,'valid'); h13(1:6) = conv2(dt(first),1,rot90(aWH,2),'valid'); % New detrended series xt = y./h13; figure plot(DataTimeTable.Time,y) hold on plot(DataTimeTable.Time,h13,'r','LineWidth',2); legend(["Passenger counts" "13-Term Henderson Filter"]) title('Airline Passenger Counts') hold off
2-76
Seasonal Adjustment Using S(n,m) Seasonal Filters
Apply S(3,5) Seasonal Filter To get 6. an improved estimate of the seasonal component, apply a 7-term S3 × 5 seasonal moving average to the newly detrended series. The symmetric and asymmetric weights are provided in the following code. Center the seasonal estimate to fluctuate around 1. Deseasonalize the original series by dividing it by the centered seasonal estimate. % S3x5 seasonal filter % Symmetric weights sW5 = [1/15;2/15;repmat(1/5,3,1);2/15;1/15]; % Asymmetric weights for end of series aW5 = [.150 .250 .293; .217 .250 .283; .217 .250 .283; .217 .183 .150; .133 .067 0; .067 0 0]; % Apply filter to each month shat = NaN*y; for i = 1:s ns = length(sidx{i}); first = 1:6; last = ns-5:ns; dat = xt(sidx{i}); sd = conv(dat,sW5,'same'); sd(1:3) = conv2(dat(first),1,rot90(aW5,2),'valid'); sd(ns-2:ns) = conv2(dat(last),1,aW5,'valid'); shat(sidx{i}) = sd; end % 13-term moving average of filtered series sW13 = [1/24;repmat(1/12,11,1);1/24]; sb = conv(shat,sW13,'same'); sb(1:6) = sb(s+1:s+6); sb(T-5:T) = sb(T-s-5:T-s); % Center to get final estimate s35 = shat./sb; % Deseasonalized series dt = y./s35; figure plot(DataTimeTable.Time,dt) title('Deseasonalized Airline Passenger Counts')
2-77
2
Data Preprocessing
The deseasonalized series consists of the long-term trend and irregular components. With the seasonal component removed, it is easier to see turning points in the trend. Plot the components and the original series. Compare the original series to a series reconstructed using the component estimates. figure plot(DataTimeTable.Time,y,'Color',[.85,.85,.85],'LineWidth',4) hold on plot(DataTimeTable.Time,h13,'r','LineWidth',2) plot(DataTimeTable.Time,h13.*s35,'k--','LineWidth',1.5) legend(["Passenger counts" "13-Term Henderson Filter" ... "Trend and Seasonal Components"]) hold off title('Airline Passenger Counts')
2-78
Seasonal Adjustment Using S(n,m) Seasonal Filters
Estimate Irregular Component Detrend and deseasonalize the original series. Plot the remaining estimate of the irregular component. Irr = dt./h13; figure plot(DataTimeTable.Time,Irr) title('Airline Passenger Counts Irregular Component')
2-79
2
Data Preprocessing
You can optionally model the detrended and deseasonalized series using a stationary stochastic process model.
See Also conv | conv2 | cellfun
Related Examples •
“Moving Average Trend Estimation” on page 2-22
•
“Seasonal Adjustment Using a Stable Seasonal Filter” on page 2-67
•
“Parametric Trend Estimation” on page 2-24
More About
2-80
•
“Time Series Decomposition” on page 2-19
•
“Moving Average Filter” on page 2-21
•
“Seasonal Filters” on page 2-62
•
“Seasonal Adjustment” on page 2-65
3 Model Selection • “Select ARIMA Model for Time Series Using Box-Jenkins Methodology” on page 3-2 • “Autocorrelation and Partial Autocorrelation” on page 3-10 • “Ljung-Box Q-Test” on page 3-17 • “Detect Autocorrelation” on page 3-19 • “Engle’s ARCH Test” on page 3-25 • “Detect ARCH Effects” on page 3-27 • “Unit Root Nonstationarity” on page 3-32 • “Unit Root Tests” on page 3-40 • “Assess Stationarity of a Time Series” on page 3-50 • “Information Criteria for Model Selection” on page 3-53 • “Model Comparison Tests” on page 3-57 • “Conduct Lagrange Multiplier Test” on page 3-61 • “Conduct Wald Test” on page 3-64 • “Compare GARCH Models Using Likelihood Ratio Test” on page 3-66 • “Classical Model Misspecification Tests” on page 3-69 • “Check Fit of Multiplicative ARIMA Model” on page 3-80 • “Goodness of Fit” on page 3-85 • “Residual Diagnostics” on page 3-86 • “Assess Predictive Performance” on page 3-88 • “Nonspherical Models” on page 3-89 • “Plot a Confidence Band Using HAC Estimates” on page 3-90 • “Change the Bandwidth of a HAC Estimator” on page 3-97 • “Check Model Assumptions for Chow Test” on page 3-103 • “Power of the Chow Test” on page 3-109
3
Model Selection
Select ARIMA Model for Time Series Using Box-Jenkins Methodology This example shows how to use the Box-Jenkins methodology to select an ARIMA model. The time series is the log quarterly Australian Consumer Price Index (CPI) measured from 1972 and 1991. Box-Jenkins Methodology The Box-Jenkins methodology [1] is a five-step process for identifying, selecting, and assessing conditional mean models (for discrete, univariate time series data). 1
Determine whether the time series is stationary. If the series is not stationary, successively difference it to attain stationarity. The sample autocorrelation function (ACF) and partial autocorrelation function (PACF) of a stationary series decay exponentially (or cut off completely after a few lags).
2
Identify a stationary conditional mean model for the series. The sample ACF and PACF functions can help with this selection. For an autoregressive (AR) process, the sample ACF decays gradually, but the sample PACF cuts off after a few lags. Conversely, for a moving average (MA) process, the sample ACF cuts off after a few lags, but the sample PACF decays gradually. If both the ACF and PACF decay gradually, consider an ARMA model.
3
Create a model template for estimation, and then fit the model to the series. When fitting nonstationary models in Econometrics Toolbox™, you do not need to manually difference the series and fit a stationary model. Instead, you can use the series on the original scale, and create an arima model object with the desired degree of nonseasonal and seasonal differencing. Fitting an ARIMA model directly is advantageous for forecasting: forecasts are returned on the original scale (not differenced).
4
Conduct goodness-of-fit checks to ensure the model describes the series adequately. Residuals should be uncorrelated, homoscedastic, and normally distributed with constant mean and variance. If the residuals are not normally distributed, you can change the innovation distribution to a Student’s t.
5
After choosing a model—and checking its fit and forecasting ability—you can use the model to forecast or generate Monte Carlo simulations over a future time horizon.
Load the Data Load and plot the Australian CPI data. load Data_JAustralian y = DataTable.PAU; T = length(y); figure plot(y) h1 = gca; h1.XLim = [0,T]; h1.XTick = 1:10:T; h1.XTickLabel = datestr(dates(1:10:T),17); title('Log Quarterly Australian CPI')
3-2
Select ARIMA Model for Time Series Using Box-Jenkins Methodology
The series is nonstationary, with a clear upward trend. Plot the Sample ACF and PACF Plot the sample autocorrelation function (ACF) and partial autocorrelation function (PACF) for the CPI series. figure subplot(2,1,1) autocorr(y) subplot(2,1,2) parcorr(y)
3-3
3
Model Selection
The significant, linearly decaying sample ACF indicates a nonstationary process. Difference the Data Take a first difference of the data, and plot the differenced series. dY = diff(y); figure plot(dY) h2 = gca; h2.XLim = [0,T]; h2.XTick = 1:10:T; h2.XTickLabel = datestr(dates(2:10:T),17); title('Differenced Log Quarterly Australian CPI')
3-4
Select ARIMA Model for Time Series Using Box-Jenkins Methodology
Differencing removes the linear trend. The differenced series appears more stationary. Plot the Sample ACF and PACF of the Differenced Series Plot the sample ACF and PACF of the differenced series to look for behavior more consistent with a stationary process. figure subplot(2,1,1) autocorr(dY) subplot(2,1,2) parcorr(dY)
3-5
3
Model Selection
The sample ACF of the differenced series decays more quickly. The sample PACF cuts off after lag 2. This behavior is consistent with a second-degree autoregressive (AR(2)) model. Specify and Estimate an ARIMA(2,1,0) Model Specify, and then estimate, an ARIMA(2,1,0) model for the log quarterly Australian CPI. This model has one degree of nonseasonal differencing and two AR lags. By default, the innovation distribution is Gaussian with a constant variance. Mdl = arima(2,1,0); EstMdl = estimate(Mdl,y); ARIMA(2,1,0) Model (Gaussian Distribution): Value __________ Constant AR{1} AR{2} Variance
0.010072 0.21206 0.33728 9.2302e-05
StandardError _____________ 0.0032802 0.095428 0.10378 1.1112e-05
TStatistic __________
PValue __________
3.0707 2.2222 3.2499 8.3066
0.0021356 0.02627 0.0011543 9.8491e-17
Both AR coefficients are significant at the 0.05 significance level.
3-6
Select ARIMA Model for Time Series Using Box-Jenkins Methodology
Check Goodness of Fit Infer the residuals from the fitted model. Check that the residuals are normally distributed and uncorrelated. res = infer(EstMdl,y); figure subplot(2,2,1) plot(res./sqrt(EstMdl.Variance)) title('Standardized Residuals') subplot(2,2,2) qqplot(res) subplot(2,2,3) autocorr(res) subplot(2,2,4) parcorr(res) hvec = findall(gcf,'Type','axes'); set(hvec,'TitleFontSizeMultiplier',0.8,... 'LabelFontSizeMultiplier',0.8);
The residuals are reasonably normally distributed and uncorrelated. Generate Forecasts Generate forecasts and approximate 95% forecast intervals for the next 4 years (16 quarters). 3-7
3
Model Selection
[yF,yMSE] = forecast(EstMdl,16,y); UB = yF + 1.96*sqrt(yMSE); LB = yF - 1.96*sqrt(yMSE); figure h4 = plot(y,'Color',[.75,.75,.75]); hold on h5 = plot(78:93,yF,'r','LineWidth',2); h6 = plot(78:93,UB,'k--','LineWidth',1.5); plot(78:93,LB,'k--','LineWidth',1.5); fDates = [dates; dates(T) + cumsum(diff(dates(T-16:T)))]; h7 = gca; h7.XTick = 1:10:(T+16); h7.XTickLabel = datestr(fDates(1:10:end),17); legend([h4,h5,h6],'Log CPI','Forecast',... 'Forecast Interval','Location','Northwest') title('Log Australian CPI Forecast') hold off
References [1] Box, George E. P., Gwilym M. Jenkins, and Gregory C. Reinsel. Time Series Analysis: Forecasting and Control. 3rd ed. Englewood Cliffs, NJ: Prentice Hall, 1994.
3-8
Select ARIMA Model for Time Series Using Box-Jenkins Methodology
See Also Apps Econometric Modeler Objects arima Functions autocorr | parcorr | estimate | infer | forecast
Related Examples •
“Implement Box-Jenkins Model Selection and Estimation Using Econometric Modeler App” on page 4-112
•
“Detect Serial Correlation Using Econometric Modeler App” on page 4-71
•
“Box-Jenkins Differencing vs. ARIMA Estimation” on page 7-103
•
“Nonseasonal Differencing” on page 2-13
•
“Infer Residuals for Diagnostic Checking” on page 7-138
•
“Creating Univariate Conditional Mean Models” on page 7-3
•
“Trend-Stationary vs. Difference-Stationary Processes” on page 2-6
•
“Goodness of Fit” on page 3-85
•
“MMSE Forecasting of Conditional Mean Models” on page 7-167
3-9
3
Model Selection
Autocorrelation and Partial Autocorrelation In this section... “What Are Autocorrelation and Partial Autocorrelation?” on page 3-10 “Theoretical ACF and PACF” on page 3-10 “Sample ACF and PACF” on page 3-10 “Compute Sample ACF and PACF in MATLAB®” on page 3-11
What Are Autocorrelation and Partial Autocorrelation? Autocorrelation is the linear dependence of a variable with itself at two points in time. For stationary processes, autocorrelation between any two observations depends only on the time lag h between them. Define Cov(yt, yt–h) = γh. Lag-h autocorrelation is given by ρh = Corr(yt, yt − h) =
γh . γ0
The denominator γ0 is the lag 0 covariance, that is, the unconditional variance of the process. Correlation between two variables can result from a mutual linear dependence on other variables (confounding). Partial autocorrelation is the autocorrelation between yt and yt–h after the removal of any linear dependence on y1, y2, ..., yt–h+1. The partial lag-h autocorrelation is denoted ϕh, h .
Theoretical ACF and PACF The autocorrelation function (ACF) for a time series yt, t = 1,...,N, is the sequence ρh, h = 1, 2,...,N – 1. The partial autocorrelation function (PACF) is the sequence ϕh, h, h = 1, 2,...,N – 1. The theoretical ACF and PACF for the AR, MA, and ARMA conditional mean models are known, and are different for each model. These differences among models are important to keep in mind when you select models. Conditional Mean Model
ACF Behavior
PACF Behavior
AR(p)
Tails off gradually
Cuts off after p lags
MA(q)
Cuts off after q lags
Tails off gradually
ARMA(p,q)
Tails off gradually
Tails off gradually
Sample ACF and PACF Sample autocorrelation and sample partial autocorrelation are statistics that estimate the theoretical autocorrelation and partial autocorrelation. Using these qualitative model selection tools, you can compare the sample ACF and PACF of your data against known theoretical autocorrelation functions [1]. For an observed series y1, y2,...,yT, denote the sample mean y . The sample lag-h autocorrelation is given by 3-10
Autocorrelation and Partial Autocorrelation
ρh =
∑Tt = h + 1 (yt − y)(yt − h − y) . ∑Tt = 1 (yt − y)2
The standard error for testing the significance of a single lag-h autocorrelation, ρ h, is approximately h−1
2
SEρ = (1 + 2 ∑i = 1 ρ i )/N . When you use autocorr to plot the sample autocorrelation function (also known as the correlogram), approximate 95% confidence intervals are drawn at ±2SEρ by default. Optional input arguments let you modify the calculation of the confidence bounds. The sample lag-h partial autocorrelation is the estimated lag-h coefficient in an AR model containing h lags, ϕ h, h . The standard error for testing the significance of a single lag-h partial autocorrelation is approximately 1/ N . When you use parcorr to plot the sample partial autocorrelation function, approximate 95% confidence intervals are drawn at ±2/ N by default. Optional input arguments let you modify the calculation of the confidence bounds.
Compute Sample ACF and PACF in MATLAB® This example shows how to compute and plot the sample ACF and PACF of a time series by using the Econometrics Toolbox™ functions autocorr and parcorr, and the Econometric Modeler app. Generate Synthetic Time Series Simulate an MA(2) process yt by filtering a series of 1000 standard Gaussian deviates εt through the difference equation yt = εt − εt − 1 + εt − 2 . rng('default') % For reproducibility e = randn(1000,1); y = filter([1 -1 1],1,e);
Plot and Compute ACF Plot the sample ACF of yt by passing the simulated time series to autocorr. autocorr(y)
3-11
3
Model Selection
The sample autocorrelation of lags greater than 2 is insignificant. Compute the sample ACF by calling autocorr again. Return the first output argument. acf = autocorr(y) acf = 21×1 1.0000 -0.6682 0.3618 -0.0208 0.0146 -0.0311 0.0611 -0.0828 0.0772 -0.0493 ⋮
acf(j) is the sample autocorrelation of yt at lag j – 1. Plot and Compute PACF Plot the sample PACF of yt by passing the simulated time series to parcorr. parcorr(y)
3-12
Autocorrelation and Partial Autocorrelation
The sample PACF gradually decreases with increasing lag. Compute the sample PACF by calling parcorr again. Return the first output argument. pacf = parcorr(y) pacf = 21×1 1.0000 -0.6697 -0.1541 0.2929 0.3421 0.0314 -0.1483 -0.2290 -0.0394 0.1419 ⋮
pacf(j) is the sample partial autocorrelation of yt at lag j – 1. The sample ACF and PACF suggest that yt is an MA(2) process. Use Econometric Modeler Open the Econometric Modeler app by entering econometricModeler at the command prompt. 3-13
3
Model Selection
econometricModeler
Load the simulated time series y. 1
On the Econometric Modeler tab, in the Import section, select Import > Import From Workspace.
2
In the Import Data dialog box, in the Import? column, select the check box for the y variable.
3
Click Import.
The variable y1 appears in the Data Browser, and its time series plot appears in the Time Series Plot(y1) figure window. Plot the sample ACF by clicking ACF on the Plots tab. Plot the sample PACF by clicking PACF on the Plots tab. Position the PACF plot below the ACF plot by dragging the PACF(y1) tab to the lower half of the document.
3-14
Autocorrelation and Partial Autocorrelation
References [1] Box, George E. P., Gwilym M. Jenkins, and Gregory C. Reinsel. Time Series Analysis: Forecasting and Control. 3rd ed. Englewood Cliffs, NJ: Prentice Hall, 1994.
See Also Apps Econometric Modeler Functions autocorr | parcorr
More About •
“Detect Serial Correlation Using Econometric Modeler App” on page 4-71 3-15
3
Model Selection
3-16
•
“Detect Autocorrelation” on page 3-19
•
“Detect ARCH Effects Using Econometric Modeler App” on page 4-77
•
“Detect ARCH Effects” on page 3-27
•
“Select ARIMA Model for Time Series Using Box-Jenkins Methodology” on page 3-2
•
“Ljung-Box Q-Test” on page 3-17
•
“What Are Autoregressive Models?” on page 7-21
•
“What Are Moving Average Models?” on page 7-29
•
“What Are Autoregressive Moving Average Models?” on page 7-35
Ljung-Box Q-Test
Ljung-Box Q-Test The sample autocorrelation function (ACF) and partial autocorrelation function (PACF) are useful qualitative tools to assess the presence of autocorrelation at individual lags. The Ljung-Box Q-test is a more quantitative way to test for autocorrelation at multiple lags jointly [1]. The null hypothesis for this test is that the first m autocorrelations are jointly zero, H0 : ρ1 = ρ2 = … = ρm = 0. The choice of m affects test performance. If N is the length of your observed time series, choosing m ≈ ln(N) is recommended for power [2]. You can test at multiple values of m. If seasonal autocorrelation is possible, you might consider testing at larger values of m, such as 10 or 15. The Ljung-Box test statistic is given by m
Q(m) = N(N + 2) ∑h = 1
2
ρh . N−h
This is a modification of the Box-Pierce Portmanteau “Q” statistic [3]. Under the null hypothesis, Q(m) 2 distribution. follows a χm You can use the Ljung-Box Q-test to assess autocorrelation in any series with a constant mean. This includes residual series, which can be tested for autocorrelation during model diagnostic checks. If the residuals result from fitting a model with g parameters, you should compare the test statistic to a χ 2 distribution with m – g degrees of freedom. Optional input arguments to lbqtest let you modify the degrees of freedom of the null distribution. You can also test for conditional heteroscedasticity by conducting a Ljung-Box Q-test on a squared residual series. An alternative test for conditional heteroscedasticity is Engle’s ARCH test (archtest).
References [1] Ljung, G. and G. E. P. Box. “On a Measure of Lack of Fit in Time Series Models.” Biometrika. Vol. 66, 1978, pp. 67–72. [2] Tsay, R. S. Analysis of Financial Time Series. 3rd ed. Hoboken, NJ: John Wiley & Sons, Inc., 2010. [3] Box, G. E. P. and D. Pierce. “Distribution of Residual Autocorrelations in Autoregressive-Integrated Moving Average Time Series Models.” Journal of the American Statistical Association. Vol. 65, 1970, pp. 1509–1526.
See Also Apps Econometric Modeler Functions lbqtest | archtest
3-17
3
Model Selection
Related Examples •
“Detect Serial Correlation Using Econometric Modeler App” on page 4-71
•
“Detect ARCH Effects Using Econometric Modeler App” on page 4-77
•
“Detect Autocorrelation” on page 3-19
•
“Detect ARCH Effects” on page 3-27
More About
3-18
•
“Autocorrelation and Partial Autocorrelation” on page 3-10
•
“Engle’s ARCH Test” on page 3-25
•
“Residual Diagnostics” on page 3-86
•
“What Are Conditional Mean Models?” on page 7-13
Detect Autocorrelation
Detect Autocorrelation In this section... “Compute Sample ACF and PACF” on page 3-19 “Conduct the Ljung-Box Q-Test” on page 3-21
Compute Sample ACF and PACF This example shows how to compute the sample autocorrelation function (ACF) and partial autocorrelation function (PACF) to qualitatively assess autocorrelation. The time series is 57 consecutive days of overshorts from a gasoline tank in Colorado. Step 1. Load the data. Load the time series of overshorts. load('Data_Overshort.mat') Y = Data; N = length(Y); figure plot(Y) xlim([0,N]) title('Overshorts for 57 Consecutive Days')
3-19
3
Model Selection
The series appears to be stationary. Step 2. Plot the sample ACF and PACF. Plot the sample autocorrelation function (ACF) and partial autocorrelation function (PACF). figure subplot(2,1,1) autocorr(Y) subplot(2,1,2) parcorr(Y)
3-20
Detect Autocorrelation
The sample ACF and PACF exhibit significant autocorrelation. The sample ACF has significant autocorrelation at lag 1. The sample PACF has significant autocorrelation at lags 1, 3, and 4. The distinct cutoff of the ACF combined with the more gradual decay of the PACF suggests an MA(1) model might be appropriate for this data. Step 3. Store the sample ACF and PACF values. Store the sample ACF and PACF values up to lag 15. acf = autocorr(Y,'NumLags',15); pacf = parcorr(Y,'NumLags',15); [length(acf) length(pacf)] ans = 1×2 16
16
The outputs acf and pacf are vectors storing the sample autocorrelation and partial autocorrelation at lags 0, 1,...,15 (a total of 16 lags).
Conduct the Ljung-Box Q-Test This example shows how to conduct the Ljung-Box Q-test for autocorrelation. 3-21
3
Model Selection
The time series is 57 consecutive days of overshorts from a gasoline tank in Colorado. Step 1. Load the data. Load the time series of overshorts. load('Data_Overshort.mat') Y = Data; N = length(Y); figure plot(Y) xlim([0,N]) title('Overshorts for 57 Consecutive Days')
The data appears to fluctuate around a constant mean, so no data transformations are needed before conducting the Ljung-Box Q-test. Step 2. Conduct the Ljung-Box Q-test. Conduct the Ljung-Box Q-test for autocorrelation at lags 5, 10, and 15. [h,p,Qstat,crit] = lbqtest(Y,'Lags',[5,10,15]) h = 1x3 logical array 1
3-22
1
1
Detect Autocorrelation
p = 1×3 0.0016
0.0007
0.0013
30.5986
36.9639
18.3070
24.9958
Qstat = 1×3 19.3604 crit = 1×3 11.0705
All outputs are vectors with three elements, corresponding to tests at each of the three lags. The first element of each output corresponds to the test at lag 5, the second element corresponds to the test at lag 10, and the third element corresponds to the test at lag 15. The test decisions are stored in the vector h. The value h = 1 means reject the null hypothesis. Vector p contains the p-values for the three tests. At the α = 0 . 05 significance level, the null hypothesis of no autocorrelation is rejected at all three lags. The conclusion is that there is significant autocorrelation in the series. The test statistics and χ 2 critical values are given in outputs Qstat and crit, respectively.
References [1] Brockwell, P. J. and R. A. Davis. Introduction to Time Series and Forecasting. 2nd ed. New York, NY: Springer, 2002.
See Also Apps Econometric Modeler Functions autocorr | lbqtest | parcorr
Related Examples •
“Detect Serial Correlation Using Econometric Modeler App” on page 4-71
•
“Detect ARCH Effects” on page 3-27
•
“Choose ARMA Lags Using BIC” on page 7-135
•
“Create Multiplicative Seasonal ARIMA Model for Time Series Data” on page 7-51
•
“Specify Conditional Mean and Variance Models” on page 7-75
More About •
“Autocorrelation and Partial Autocorrelation” on page 3-10
•
“Ljung-Box Q-Test” on page 3-17 3-23
3
Model Selection
3-24
•
“What Are Moving Average Models?” on page 7-29
•
“Goodness of Fit” on page 3-85
Engle’s ARCH Test
Engle’s ARCH Test An uncorrelated time series can still be serially dependent due to a dynamic conditional variance process. A time series exhibiting conditional heteroscedasticity—or autocorrelation in the squared series—is said to have autoregressive conditional heteroscedastic (ARCH) effects. Engle’s ARCH test is a Lagrange multiplier test to assess the significance of ARCH effects [1]. Consider a time series yt = μt + εt, whereμt is the conditional mean of the process, andεt is an innovation process with mean zero. Suppose the innovations are generated as εt = σtzt, where zt is an independent and identically distributed process with mean 0 and variance 1. Thus, E(εtεt + h) = 0 for all lags h ≠ 0 and the innovations are uncorrelated. Let Ht denote the history of the process available at time t. The conditional variance of yt is Var(yt Ht − 1) = Var(εt Ht − 1) = E(εt2 Ht − 1) = σt2 . Thus, conditional heteroscedasticity in the variance process is equivalent to autocorrelation in the squared innovation process. Define the residual series et = yt − μ t . If all autocorrelation in the original series, yt, is accounted for in the conditional mean model, then the residuals are uncorrelated with mean zero. However, the residuals can still be serially dependent. The alternative hypothesis for Engle’s ARCH test is autocorrelation in the squared residuals, given by the regression Ha : et2 = α0 + α1et2− 1 + … + αmet2− m + ut, where ut is a white noise error process. The null hypothesis is H0 : α0 = α1 = … = αm = 0. To conduct Engle’s ARCH test using archtest, you need to specify the lag m in the alternative hypothesis. One way to choose m is to compare loglikelihood values for different choices of m. You can use the likelihood ratio test (lratiotest) or information criteria (aicbic) to compare loglikelihood values. To generalize to a GARCH alternative, note that a GARCH(P,Q) model is locally equivalent to an ARCH(P + Q) model. This suggests also considering values m = P + Q for reasonable choices of P and Q. 3-25
3
Model Selection
The test statistic for Engle’s ARCH test is the usual F statistic for the regression on the squared residuals. Under the null hypothesis, the F statistic follows a χ 2 distribution with m degrees of freedom. A large critical value indicates rejection of the null hypothesis in favor of the alternative. As an alternative to Engle’s ARCH test, you can check for serial dependence (ARCH effects) in a residual series by conducting a Ljung-Box Q-test on the first m lags of the squared residual series with lbqtest. Similarly, you can explore the sample autocorrelation and partial autocorrelation functions of the squared residual series for evidence of significant autocorrelation.
References [1] Engle, Robert F. “Autoregressive Conditional Heteroskedasticity with Estimates of the Variance of United Kingdom Inflation.” Econometrica. Vol. 50, 1982, pp. 987–1007.
See Also archtest | lbqtest | lratiotest | aicbic
Related Examples •
“Detect ARCH Effects Using Econometric Modeler App” on page 4-77
•
“Detect ARCH Effects” on page 3-27
•
“Specify Conditional Mean and Variance Models” on page 7-75
More About
3-26
•
“Ljung-Box Q-Test” on page 3-17
•
“Autocorrelation and Partial Autocorrelation” on page 3-10
•
“Model Comparison Tests” on page 3-57
•
“Information Criteria for Model Selection” on page 3-53
•
“Conditional Variance Models” on page 8-2
Detect ARCH Effects
Detect ARCH Effects In this section... “Test Autocorrelation of Squared Residuals” on page 3-27 “Conduct Engle's ARCH Test” on page 3-29
Test Autocorrelation of Squared Residuals This example shows how to inspect a squared residual series for autocorrelation by plotting the sample autocorrelation function (ACF) and partial autocorrelation function (PACF). Then, conduct a Ljung-Box Q-test to more formally assess autocorrelation. Load the Data. Load the NASDAQ data included with the toolbox. Convert the daily close composite index series to a percentage return series. load Data_EquityIdx; y = DataTable.NASDAQ; r = 100*price2ret(y); T = length(r); figure plot(r) xlim([0,T]) title('NASDAQ Daily Returns')
3-27
3
Model Selection
The returns appear to fluctuate around a constant level, but exhibit volatility clustering. Large changes in the returns tend to cluster together, and small changes tend to cluster together. That is, the series exhibits conditional heteroscedasticity. The returns are of relatively high frequency. Therefore, the daily changes can be small. For numerical stability, it is good practice to scale such data. Plot the Sample ACF and PACF. Plot the sample ACF and PACF for the squared residual series. e = r - mean(r); figure subplot(2,1,1) autocorr(e.^2) subplot(2,1,2) parcorr(e.^2)
3-28
Detect ARCH Effects
The sample ACF and PACF show significant autocorrelation in the squared residual series. This indicates that volatility clustering is present in the residual series. Conduct a Ljung-Box Q-test. Conduct a Ljung-Box Q-test on the squared residual series at lags 5 and 10. [h,p] = lbqtest(e.^2,'Lags',[5,10]) h = 1x2 logical array 1
1
p = 1×2 0
0
The null hypothesis is rejected for the two tests (h = 1). The p values for both tests is 0. Thus, not all of the autocorrelations up to lag 5 (or 10) are zero, indicating volatility clustering in the residual series.
Conduct Engle's ARCH Test This example shows how to conduct Engle's ARCH test for conditional heteroscedasticity. 3-29
3
Model Selection
Load and Preprocess Data Load the NASDAQ data included with the toolbox. Convert the daily close composite index series to a return series. load Data_EquityIdx ReturnsTbl = price2ret(DataTable); figure plot(ReturnsTbl.NASDAQ) title('NASDAQ Daily Returns') axis tight
The returns appear to fluctuate around a constant level, but exhibit volatility clustering. Large changes in the returns tend to cluster together, and small changes tend to cluster together. That is, the series exhibits conditional heteroscedasticity. The returns are of relatively high frequency. Therefore, the daily changes can be small. For numerical stability, it is good practice to scale such data. Conduct Engle's ARCH Test Conduct Engle's ARCH test for conditional heteroscedasticity on the residual series from a fit of the percent returns series to a constant-only model. Specify two lags in the alternative hypothesis. ReturnsTbl.Residuals_NASDAQ = 100*(ReturnsTbl.NASDAQ - mean(ReturnsTbl.NASDAQ)); StatTbl = archtest(ReturnsTbl,DataVariable="Residuals_NASDAQ",Lags=2)
3-30
Detect ARCH Effects
StatTbl=1×6 table h _____ Test 1
true
pValue ______ 0
stat ______
cValue ______
399.97
5.9915
Lags ____ 2
Alpha _____ 0.05
The null hypothesis is soundly rejected (h = 1, p = 0) in favor of the ARCH(2) alternative. The F statistic for the test is 399.97, much larger than the critical value from the χ 2 distribution with two degrees of freedom, 5.99. The test concludes there is significant volatility clustering in the residual series.
See Also archtest | autocorr | lbqtest | parcorr
Related Examples •
“Detect ARCH Effects Using Econometric Modeler App” on page 4-77
•
“Detect Autocorrelation” on page 3-19
•
“Specify Conditional Mean and Variance Models” on page 7-75
More About •
“Engle’s ARCH Test” on page 3-25
•
“Autocorrelation and Partial Autocorrelation” on page 3-10
•
“Conditional Variance Models” on page 8-2
3-31
3
Model Selection
Unit Root Nonstationarity In this section... “What Is a Unit Root Test?” on page 3-32 “Modeling Unit Root Processes” on page 3-32 “Available Tests” on page 3-36 “Testing for Unit Roots” on page 3-37
What Is a Unit Root Test? A unit root process is a data-generating process whose first difference is stationary. In other words, a unit root process yt has the form yt = yt–1 + stationary process. A unit root test attempts to determine whether a given time series is consistent with a unit root process. The next section gives more details of unit root processes, and suggests why it is important to detect them.
Modeling Unit Root Processes There are two basic models for economic data with linear growth characteristics: • Trend-stationary process (TSP): yt = c + δt + stationary process • Unit root process, also called a difference-stationary process (DSP): Δyt = δ + stationary process Here Δ is the differencing operator, Δyt = yt – yt–1 = (1 – L)yt, where L is the lag operator defined by Liyt = yt – i. The processes are indistinguishable for finite data. In other words, there are both a TSP and a DSP that fit a finite data set arbitrarily well. However, the processes are distinguishable when restricted to a particular subclass of data-generating processes, such as AR(p) processes. After fitting a model to data, a unit root test checks if the AR(1) coefficient is 1. There are two main reasons to distinguish between these types of processes: • “Forecasting” on page 3-32 • “Spurious Regression” on page 3-36 Forecasting A TSP and a DSP produce different forecasts. Basically, shocks to a TSP return to the trend line c + δt as time increases. In contrast, shocks to a DSP might be persistent over time. For example, consider the simple trend-stationary model y1, t = 0 . 9y1, t − 1 + 0 . 02t + ε1, t 3-32
Unit Root Nonstationarity
and the difference-stationary model y2, t = 0 . 2 + y2, t − 1 + ε2, t . In these models, ε1, t and ε2, t are independent innovation processes. For this example, the innovations are independent and distributed N(0,1). Both processes grow at rate 0.2. To calculate the growth rate for the TSP, which has a linear term 0 . 02t, set ε1, t = 0. Then, solve the model y1, t = c + δt for c and δ. c + δt = 0 . 9(c + δ(t − 1)) + 0 . 02t . The solution is c = − 1 . 8, δ = 0 . 2. A plot for t = 1:1000 shows the TSP stays very close to the trend line, while the DSP has persistent deviations away from the trend line. T = 1000; % Sample size t = (1:T)'; % Period vector rng(5); % For reproducibility randm = randn(T,2); % Innovations y = zeros(T,2); % Columns of y are data series % Build trend stationary series y(:,1) = .02*t + randm(:,1); for ii = 2:T y(ii,1) = y(ii,1) + y(ii-1,1)*.9; end % Build difference stationary series y(:,2) = .2 + randm(:,2); y(:,2) = cumsum(y(:,2)); figure plot(y(:,1),'b') hold on plot(y(:,2),'g') plot((1:T)*0.2,'k--') legend('Trend Stationary','Difference Stationary',... 'Trend Line','Location','NorthWest') hold off
3-33
3
Model Selection
Forecasts based on the two series are different. To see this difference, plot the predicted behavior of the two series using varm, estimate, and forecast. The following plot shows the last 100 data points in the two series and predictions of the next 100 points, including confidence bounds. AR = {[NaN 0; 0 NaN]}; % Independent response series trend = [NaN; 0]; % Linear trend in first series only Mdl = varm('AR',AR,'Trend',trend); EstMdl = estimate(Mdl,y); EstMdl.SeriesNames = ["Trend stationary" "Difference stationary"]; [ynew,ycov] = forecast(EstMdl,100,y); % This generates predictions for 100 time steps seY = sqrt(diag(EstMdl.Covariance))'; % Extract standard deviations of y CIY = zeros([size(y) 2]); % In-sample intervals CIY(:,:,1) = y - seY; CIY(:,:,2) = y + seY; extractFSE = cellfun(@(x)sqrt(diag(x))',ycov,'UniformOutput',false); seYNew = cell2mat(extractFSE); CIYNew = zeros([size(ynew) 2]); % Forecast intervals CIYNew(:,:,1) = ynew - seYNew; CIYNew(:,:,2) = ynew + seYNew; tx = (T-100:T+100); hs = 1:2;
3-34
Unit Root Nonstationarity
figure; for j = 1:Mdl.NumSeries hs(j) = subplot(2,1,j); hold on; h1 = plot(tx,tx*0.2,'k--'); axis tight; ha = gca; h2 = plot(tx,[y(end-100:end,j); ynew(:,j)]); h3 = plot(tx(1:101),squeeze(CIY(end-100:end,j,:)),'r:'); plot(tx(102:end),squeeze(CIYNew(:,j,:)),'r:'); h4 = fill([tx(102) ha.XLim([2 2]) tx(102)],ha.YLim([1 1 2 2]),[0.7 0.7 0.7],... 'FaceAlpha',0.1,'EdgeColor','none'); title(EstMdl.SeriesNames{j}); hold off; end legend(hs(1),[h1 h2 h3(1) h4],... {'Trend','Process','Interval estimate','Forecast horizon'},'Location','Best');
Examine the fitted parameters by passing the estimated model to summarize, and you find estimate did an excellent job. The TSP has confidence intervals that do not grow with time, whereas the DSP has confidence intervals that grow. Furthermore, the TSP goes to the trend line quickly, while the DSP does not tend towards the trend line y = 0 . 2t asymptotically.
3-35
3
Model Selection
Spurious Regression The presence of unit roots can lead to false inferences in regressions between time series. Suppose xt and yt are unit root processes with independent increments, such as random walks with drift xt = c1 + xt–1 + ε1(t) yt = c2 + yt–1 + ε2(t), where εi(t) are independent innovations processes. Regressing y on x results, in general, in a nonzero regression coefficient, and significant coefficient of determination R2. This result holds despite xt and yt being independent random walks. If both processes have trends (ci ≠ 0), there is a correlation between x and y because of their linear trends. However, even if the ci = 0, the presence of unit roots in the xt and yt processes yields correlation. For more information on spurious regression, see Granger and Newbold [1] and “Time Series Regression IV: Spurious Regression” on page 5-200.
Available Tests There are four Econometrics Toolbox tests for unit roots. These functions test for the existence of a single unit root. When there are two or more unit roots, the results of these tests might not be valid. • “Dickey-Fuller and Phillips-Perron Tests” on page 3-36 • “KPSS Test” on page 3-37 • “Variance Ratio Test” on page 3-37 Dickey-Fuller and Phillips-Perron Tests adftest performs the augmented Dickey-Fuller test. pptest performs the Phillips-Perron test. These two classes of tests have a null hypothesis of a unit root process of the form yt = yt–1 + c + δt + εt, which the functions test against an alternative model yt = γyt–1 + c + δt + εt, where γ < 1. The null and alternative models for a Dickey-Fuller test are like those for a PhillipsPerron test. The difference is adftest extends the model with extra parameters accounting for serial correlation among the innovations: yt = c + δt + γyt – 1 + ϕ1Δyt – 1 + ϕ2Δyt – 2 +...+ ϕpΔyt – p + εt, where • L is the lag operator: Lyt = yt–1. • Δ = 1 – L, so Δyt = yt – yt–1. • εt is the innovations process. Phillips-Perron adjusts the test statistics to account for serial correlation. 3-36
Unit Root Nonstationarity
There are three variants of both adftest and pptest, corresponding to the following values of the 'model' parameter: • 'AR' assumes c and δ, which appear in the preceding equations, are both 0; the 'AR' alternative has mean 0. • 'ARD' assumes δ is 0. The 'ARD' alternative has mean c/(1–γ). • 'TS' makes no assumption about c and δ. For information on how to choose the appropriate value of 'model', see “Choose Models to Test” on page 3-37. KPSS Test The KPSS test, kpsstest, is an inverse of the Phillips-Perron test: it reverses the null and alternative hypotheses. The KPSS test uses the model: yt = ct + δt + ut, with ct = ct–1 + vt. Here ut is a stationary process, and vt is an i.i.d. process with mean 0 and variance σ2. The null hypothesis is that σ2 = 0, so that the random walk term ct becomes a constant intercept. The alternative is σ2 > 0, which introduces the unit root in the random walk. Variance Ratio Test The variance ratio test, vratiotest, is based on the fact that the variance of a random walk increases linearly with time. vratiotest can also take into account heteroscedasticity, where the variance increases at a variable rate with time. The test has a null hypotheses of a random walk: Δyt = εt.
Testing for Unit Roots • “Transform Data” on page 3-37 • “Choose Models to Test” on page 3-37 • “Determine Appropriate Lags” on page 3-38 • “Conduct Unit Root Tests at Multiple Lags” on page 3-38 Transform Data Transform your time series to be approximately linear before testing for a unit root. If a series has exponential growth, take its logarithm. For example, GDP and consumer prices typically have exponential growth, so test their logarithms for unit roots. If you want to transform your data to be stationary instead of approximately linear, unit root tests can help you determine whether to difference your data, or to subtract a linear trend. For a discussion of this topic, see “What Is a Unit Root Test?” on page 3-32 Choose Models to Test • For adftest or pptest, choose model in as follows: 3-37
3
Model Selection
• If your data shows a linear trend, set model to 'TS'. • If your data shows no trend, but seem to have a nonzero mean, set model to 'ARD'. • If your data shows no trend and seem to have a zero mean, set model to 'AR' (the default). • For kpsstest, set trend to true (default) if the data shows a linear trend. Otherwise, set trend to false. • For vratiotest, set IID to true if you want to test for independent, identically distributed innovations (no heteroscedasticity). Otherwise, leave IID at the default value, false. Linear trends do not affect vratiotest. Determine Appropriate Lags Setting appropriate lags depends on the test you use: • adftest — One method is to begin with a maximum lag, such as the one recommended by Schwert [2]. Then, test down by assessing the significance of the coefficient of the term at lag pmax. Schwert recommends a maximum lag of pmax = maximum lag = 12 T /100
1/4
,
where x is the integer part of x. The usual t statistic is appropriate for testing the significance of coefficients, as reported in the reg output structure. Another method is to combine a measure of fit, such as SSR, with information criteria such as AIC, BIC, and HQC. These statistics also appear in the reg output structure. Ng and Perron [3] provide further guidelines. • kpsstest — One method is to begin with few lags, and then evaluate the sensitivity of the results by adding more lags. For consistency of the Newey-West estimator, the number of lags must go to infinity as the sample size increases. Kwiatkowski et al. [4] suggest using a number of lags on the order of T1/2, where T is the sample size. For an example of choosing lags for kpsstest, see “Test Time Series Data for Unit Root” on page 3-44. • pptest — One method is to begin with few lags, and then evaluate the sensitivity of the results by adding more lags. Another method is to look at sample autocorrelations of yt – yt–1; slow rates of decay require more lags. The Newey-West estimator is consistent if the number of lags is O(T1/4), where T is the effective sample size, adjusted for lag and missing values. White and Domowitz [5] and Perron [6] provide further guidelines. For an example of choosing lags for pptest, see “Test Time Series Data for Unit Root” on page 344. • vratiotest does not use lags. Conduct Unit Root Tests at Multiple Lags Run multiple tests simultaneously by entering a vector of parameters for lags, alpha, model, or test. All vector parameters must have the same length. The test expands any scalar parameter to the length of a vector parameter. For an example using this technique, see “Test Time Series Data for Unit Root” on page 3-44.
3-38
Unit Root Nonstationarity
References [1] Granger, C. W. J., and P. Newbold. “Spurious Regressions in Econometrics.” Journal of Econometrics. Vol 2, 1974, pp. 111–120. [2] Schwert, W. “Tests for Unit Roots: A Monte Carlo Investigation.” Journal of Business and Economic Statistics. Vol. 7, 1989, pp. 147–159. [3] Ng, S., and P. Perron. “Unit Root Tests in ARMA Models with Data-Dependent Methods for the Selection of the Truncation Lag.” Journal of the American Statistical Association. Vol. 90, 1995, pp. 268–281. [4] Kwiatkowski, D., P. C. B. Phillips, P. Schmidt, and Y. Shin. “Testing the Null Hypothesis of Stationarity against the Alternative of a Unit Root.” Journal of Econometrics. Vol. 54, 1992, pp. 159–178. [5] White, H., and I. Domowitz. “Nonlinear Regression with Dependent Observations.” Econometrica. Vol. 52, 1984, pp. 143–162. [6] Perron, P. “Trends and Random Walks in Macroeconomic Time Series: Further Evidence from a New Approach.” Journal of Economic Dynamics and Control. Vol. 12, 1988, pp. 297–332.
See Also adftest | kpsstest | pptest | vratiotest
Related Examples •
“Assess Stationarity of Time Series Using Econometric Modeler” on page 4-84
•
“Unit Root Tests” on page 3-40
•
“Assess Stationarity of a Time Series” on page 3-50
3-39
3
Model Selection
Unit Root Tests In this section... “Test Simulated Data for a Unit Root” on page 3-40 “Test Time Series Data for Unit Root” on page 3-44 “Test Stock Data for Random Walk” on page 3-47
Test Simulated Data for a Unit Root This example shows how to test univariate time series models for stationarity. It shows how to simulate data from four types of models: trend stationary, difference stationary, stationary (AR(1)), and a heteroscedastic, random walk model. It also shows that the tests yield expected results. Simulate four time series. T = 1e3; t = (1:T)';
% Sample size % Time multiple
rng(142857);
% For reproducibility
y1 = randn(T,1) + .2*t; % Trend stationary Mdl2 = arima(D=1,Constant=0.2,Variance=1); y2 = simulate(Mdl2,T,Y0=0); % Difference stationary Mdl3 = arima(AR=0.99,Constant=0.2,Variance=1); y3 = simulate(Mdl3,T,Y0=0); % AR(1) Mdl4 = arima(D=1,Constant=0.2,Variance=1); sigma = (sin(t/200) + 1.5)/2; % Std deviation e = randn(T,1).*sigma; % Innovations y4 = filter(Mdl4,e,Y0=0); % Heteroscedastic
Plot the first 100 points in each series. y = [y1 y2 y3 y4]; figure plot1 = plot(y(1:100,:),LineWidth=2); plot1(1).LineStyle = ":"; plot1(4).LineStyle = ":"; title("First 100 Periods of Each Series") legend("Trend Stationary","Difference Stationary","AR(1)", ... "Heteroscedastic",Location="northwest")
3-40
Unit Root Tests
All of the models appear nonstationary and behave similarly. Therefore, you might find it difficult to distinguish which series comes from which model simply by looking at their initial segments. Plot the entire data set. plot2 = plot(y,LineWidth=2); plot2(1).LineStyle = ":"; plot2(4).LineStyle = ":"; title("Each Entire Series"); legend("Trend Stationary","Difference Stationary","AR(1)", ... "Heteroscedastic",Location="NorthWest");
3-41
3
Model Selection
The differences between the series are clearer here: • The trend stationary series has little deviation from its mean trend. • The difference stationary and heteroscedastic series have persistent deviations away from the trend line. • The AR(1) series exhibits long-run stationary behavior; the others grow linearly. • The difference stationary and heteroscedastic series appear similar. However, that the heteroscedastic series has much more local variability near period 300, and much less near period 900. The model variance is maximal when sin(t/200) = 1, at time 100π ≈ 314. The model variance is minimal when sin(t/200) = − 1, at time 300π ≈ 942. Therefore, the visual variability matches the model. Use the Augmented Dicky-Fuller test on the three growing series (y1, y2, and y4) to assess whether the series have a unit root. Since the series are growing, specify that there is a trend. In this case, the null hypothesis is H0 : yt = yt − 1 + c + b1 Δyt − 1 + b2 Δyt − 2 + εt and the alternative hypothesis is H1 : yt = ayt − 1 + c + δt + b1 Δyt − 1 + b2 Δyt − 2 + εt. Set the number of lags to 2 for demonstration purposes. hY1 = adftest(y1,Model="ts",Lags=2) hY1 = logical 1 hY2 = adftest(y2,Model="ts",Lags=2)
3-42
Unit Root Tests
hY2 = logical 0 hY4 = adftest(y4,Model="ts",Lags=2) hY4 = logical 0
• hY1 = 1 indicates that there is sufficient evidence to suggest that y1 is trend stationary. This is the correct decision because y1 is trend stationary by construction. • hY2 = 0 indicates that there is not enough evidence to suggest that y2 is trend stationary. This is the correct decision since y2 is difference stationary by construction. • hY4 = 0 indicates that there is not enough evidence to suggest that y4 is trend stationary. This is the correct decision, however, the Dickey-Fuller test is not appropriate for a heteroscedastic series. Use the Augmented Dickey-Fuller test on the AR(1) series (y3) to assess whether the series has a unit root. Since the series is not growing, specify that the series is autoregressive with a drift term. In this case, the null hypothesis is H0 : yt = yt − 1 + b1 Δyt − 1 + b2 Δyt − 2 + εt and the alternative hypothesis is H1 : yt = ayt − 1 + b1 Δyt − 1 + b2 Δyt − 2 + εt. Set the number of lags to 2 for demonstration purposes. hY3 = adftest(y3,Model="ard",Lags=2) hY3 = logical 1
hY3 = 1 indicates that there is enough evidence to suggest that y3 is a stationary, autoregressive process with a drift term. This is the correct decision because y3 is an autoregressive process with a drift term by construction. Use the KPSS test to assess whether the series are unit root nonstationary. Specify that there is a trend in the growing series (y1, y2, and y4). The KPSS test assumes the following model: yy = ct + δt + ut ct = ct − 1 + εt, where ut is a stationary process and εt is an independent and identically distributed process with mean 0 and variance σ2. Whether there is a trend in the model, the null hypothesis is H0 : σ2 = 0 (the series is trend stationary) and the alternative hypothesis is H1 : σ2 > 0 (not trend stationary). Set the number of lags to 2 for demonstration purposes. hY1 = kpsstest(y1,Lags=2,Trend=true) hY1 = logical 0 hY2 = kpsstest(y2,Lags=2,Trend=true) hY2 = logical 1
3-43
3
Model Selection
hY3 = kpsstest(y3,Lags=2) hY3 = logical 1 hY4 = kpsstest(y4,Lags=2,Trend=true) hY4 = logical 1
All is tests result in the correct decision. Use the variance ratio test on al four series to assess whether the series are random walks. The null hypothesis is H0: Var(Δyt) is constant, and the alternative hypothesis is H1: Var(Δyt) is not constant. Specify that the innovations are independent and identically distributed for all but y1. Test y4 both ways. hY1 = vratiotest(y1) hY1 = logical 1 hY2 = vratiotest(y2,IID=true) hY2 = logical 0 hY3 = vratiotest(y3,IID=true) hY3 = logical 0 hY4NotIID = vratiotest(y4) hY4NotIID = logical 0 hY4IID = vratiotest(y4,IID=true) hY4IID = logical 0
All tests result in the correct decisions, except for hY4_2 = 0. This test does not reject the hypothesis that the heteroscedastic process is an IID random walk. This inconsistency might be associated with the random seed. Alternatively, you can assess stationarity using pptest
Test Time Series Data for Unit Root
3-44
Unit Root Tests
This example shows how to test a univariate time series for a unit root. It uses wages data (1900-1970) in the manufacturing sector. The series is in the Nelson-Plosser data set. Load the Nelson-Plosser data. Extract the nominal wages data. load Data_NelsonPlosser wages = DataTable.WN;
Trim the NaN values from the series and the corresponding dates (this step is optional because the test ignores NaN values). wDates = dates(isfinite(wages)); wages = wages(isfinite(wages));
Plot the data to look for trends. plot(wDates,wages) title('Wages')
The plot suggests exponential growth. Transform the data using the log function to linearize the series. logWages = log(wages); plot(wDates,logWages) title('Log Wages')
3-45
3
Model Selection
The plot suggests that time series has a linear trend. Test the null hypothesis that there is no unit root (trend stationary) against the alternative hypothesis that the series is a unit root process with a trend (difference stationary). Set 'Lags',7:2:11, as suggested in Kwiatkowski et al., 1992. [h1,pValue1] = kpsstest(logWages,'Lags',7:2:11) h1 = 1x3 logical array 0
0
0
pValue1 = 1×3 0.1000
0.1000
0.1000
kpsstest fails to reject the null hypothesis that the wage series is trend stationary. Test the null hypothesis that the series is a unit root process (difference stationary) against the alternative hypothesis that the series is trend stationary. [h2,pValue2] = adftest(logWages,'Model','ts') h2 = logical 0
3-46
Unit Root Tests
pValue2 = 0.8327
adftest fails to reject the null hypothesis that the wage series is a unit root process. Because the results of the two tests are inconsistent, it is unclear that the wage series has a unit root. This is a typical result of tests on many macroeconomic series. kpsstest has a limited set of calculated critical values. When it calculates a test statistic that is outside this range, the test reports the p-value at the appropriate endpoint. So, in this case, pValue reflects the closest tabulated value. When a test statistic lies inside the span of tabulated values, kpsstest linearly interpolates the p-value.
Test Stock Data for Random Walk This example shows how to assess whether a time series is a random walk. It uses market data for daily returns of stocks and cash (money market) from the period January 1, 2000 to November 7, 2005. Load the data, which is availbale only with the Financial Toolbox™ documentation. load CAPMuniverse
The timetable AssetsTimeTable contains the data. The first column of data in the timetable is the daily return of a technology stock. The last (14th) column is the daily return for cash (the daily money market rate). The returns are the logs of the ratios of values at the end of a day over the values at the beginning of the day. Because vratiotest takes prices as inputs, as opposed to returns, Convert the data to prices (values) instead of returns. DTTCS = varfun(@cumsum,AssetsTimeTable,InputVariables=[1 14]); vnames = AssetsTimeTable.Properties.VariableNames([1 14]); DTTCS.Properties.VariableNames = vnames;
Plot the data to see whether the series appear to be stationary. figure tiledlayout(2,1) nexttile; plot(DTTCS.Time,DTTCS.AAPL); title("Log of Relative Stock Value") nexttile; plot(DTTCS.Time,DTTCS.CASH) title("Log of Accumulated Cash")
3-47
3
Model Selection
Cash has small variability, and appears to have long-term trends. The stock series has larger variability, and possibly an upwards trend in the second half of the data. Test whether the stock series matches a random walk. [VRTestAAPL1,ratioAAPL1] = vratiotest(DTTCS,DataVariable="AAPL") VRTestAAPL1=1×7 table h _____ Test 1
false
pValue _______
stat _______
cValue ______
0.16457
-1.3899
1.96
Alpha _____ 0.05
Period ______ 2
IID _____ false
ratioAAPL1 = 0.9436
vratiotest does not reject the hypothesis that a random walk is a reasonable model for the stock series. Test whether an iid random walk is a reasonable model for the stock series. [VRTestAAPL2,ratioAAPL2] = vratiotest(DTTCS,DataVariable="AAPL",IID=true) VRTestAAPL2=1×7 table h _____
3-48
pValue ________
stat _______
cValue ______
Alpha _____
Period ______
IID _____
Unit Root Tests
Test 1
true
0.030449
-2.1642
1.96
0.05
2
true
ratioAAPL2 = 0.9436
vratiotest rejects the hypothesis that an iid random walk is a reasonable model for the technology stock series at the 5% level. Thus, vratiotest indicates that the most appropriate model of the technology stock series is a heteroscedastic random walk. Test whether the cash series matches a random walk. [VRTestCASH,ratioCASH] = vratiotest(DTTCS,DataVariable="CASH") VRTestCASH=1×7 table h _____ Test 1
true
pValue ___________
stat ______
cValue ______
4.6093e-145
25.647
1.96
Alpha _____ 0.05
Period ______ 2
IID _____ false
ratioCASH = 2.0006
vratiotest rejects the hypothesis that a random walk is a reasonable model for the cash series (pValue = 4.6093e-145). The removal of a trend from the series does not affect the resulting statistics.
References [1] Kwiatkowski, D., P. C. B. Phillips, P. Schmidt and Y. Shin. “Testing the Null Hypothesis of Stationarity against the Alternative of a Unit Root.” Journal of Econometrics. Vol. 54, 1992, pp. 159–178.
See Also adftest | kpsstest | pptest | vratiotest
More About •
“Assess Stationarity of Time Series Using Econometric Modeler” on page 4-84
•
“Unit Root Nonstationarity” on page 3-32
3-49
3
Model Selection
Assess Stationarity of a Time Series This example shows how to check whether a linear time series is a unit root process in several ways. You can assess unit root nonstationarity statistically, visually, and algebraically. Simulate Data Suppose that the true model for a linear time series is
where the innovation series is iid with mean 0 and variance 1.5. Simulate data from this model. This model is a unit root process because the lag polynomial of the right side has characteristic root 1. Mdl = arima('AR',0.2,'MA',-0.5,'D',1,'Constant',0,... 'Variance',1.5); T = 30; rng(5); Y = simulate(Mdl,T);
Assess Stationarity Statistically Econometrics Toolbox™ has four formal tests to choose from to check if a time series is nonstationary: adftest, kpsstest, pptest, and vratiotest. Use adftest to perform the DickeyFuller test on the data that you simulated in the previous steps. adftest(Y) ans = logical 0
The test result indicates that you should not reject the null hypothesis that the series is a unit root process. Assess Stationarity Visually Suppose you don't have the time series model, but you have the data. Inspect a plot of the data. Also, inspect the plots of the sample autocorrelation function (ACF) and sample partial autocorrelation function (PACF). plot(Y); title('Simulated Time Series') xlabel('t') ylabel('Y') subplot(2,1,1) autocorr(Y) subplot(2,1,2) parcorr(Y)
3-50
Assess Stationarity of a Time Series
The downward sloping of the plot indicates a unit root process. The lengths of the line segments on the ACF plot gradually decay, and continue this pattern for increasing lags. This behavior indicates a nonstationary series. Assess Stationarity Algebraically Suppose you have the model in standard form:
Write the equation in lag operator notation and solve for
to get
Use LagOp to convert the rational polynomial to a polynomial. Also, use isStable to inspect the characteristic roots of the denominator. num = LagOp([1 -0.5]); denom = LagOp([1 -1.2 0.2]); quot = mrdivide(num,denom); [r1,r2] = isStable(denom) Warning: Termination window not currently open and coefficients are not below tolerance.
3-51
3
Model Selection
r1 = logical 0 r2 = 1.0000 0.2000
This warning indicates that the resulting quotient has a degree larger than 1001, e.g., there might not be a terminating degree. This indicates instability. r1 = 0 indicates that the denominator is unstable. r2 is a vector of characteristics roots, one of the roots is 1. Therefore, this is a unit root process. isStable is a numerical routine that calculates the characteristic values of a polynomial. If you use quot as an argument to isStable, then the output might indicate that the polynomial is stable (i.e., all characteristic values are slightly less than 1). You might need to adjust the tolerance options of isStable to get more accurate results.
See Also More About •
3-52
“Assess Stationarity of Time Series Using Econometric Modeler” on page 4-84
Information Criteria for Model Selection
Information Criteria for Model Selection Misspecification tests, such as the likelihood ratio (lratiotest), Lagrange multiplier (lmtest), and Wald (waldtest) tests, are appropriate only for comparing nested models. In contrast, information criteria are model selection tools to compare any models fit to the same data—the models being compared do not need to be nested. Information criteria are likelihood-based measures of model fit that include a penalty for complexity (specifically, the number of parameters). Different information criteria are distinguished by the form of the penalty, and can favor different models. Let logL θ denote the value of the maximized loglikelihood objective function for a model with k parameters fit to T data points. The aicbic function returns these information criteria: • Akaike information criterion (AIC). — The AIC compares models from the perspective of information entropy, as measured by Kullback-Leibler divergence. The AIC for a given model is −2logL θ + 2k . • Bayesian (Schwarz) information criterion (BIC) — The BIC compares models from the perspective of decision theory, as measured by expected loss. The BIC for a given model is −2logL θ + klog T . • Corrected AIC (AICc) — In small samples, AIC tends to overfit. The AICc adds a second-order bias-correction term to the AIC for better performance in small samples. The AICc for a given model is AIC+
2k k + 1 . T−k−1
The bias-correction term increases the penalty on the number of parameters relative to the AIC. Because the term approaches 0 with increasing sample size, AICc approaches AIC asymptotically. The analysis in [3] suggests using AICc when numObs/numParam < 40. • Consistent AIC (CAIC) — The CAIC imposes an additional penalty for complex models, as compared to the BIC. The CAIC for a given model is −2logL θ + k log T + 1 = BIC + k . • Hannan-Quinn criterion (HQC) — The HQC imposes a smaller penalty on complex models than the BIC in large samples. The HQC for a given model is −2logL θ + 2klog log T . Regardless of the information criterion, when you compare values for multiple models, smaller values of the criterion indicate a better, more parsimonious fit. Some experts scale information criteria values by T. aicbic scales results when you set the 'Normalize' name-value pair argument to true.
Compute Information Criteria Using aicbic
3-53
3
Model Selection
This example shows how to use aicbic to compute information criteria for several competing GARCH models fit to simulated data. Although this example uses aicbic, some Statistics and Machine Learning Toolbox™ and Econometrics Toolbox™ model fitting functions also return information criteria in their estimation summaries. Simulate Data Simulate a random path of length 50 from the ARCH(1) data generating process (DGP) yt = εt εt2 = 0 . 5 + 0 . 1εt2− 1, where εt is a random Gaussian series of innovations. rng(1) % For reproducibility DGP = garch('ARCH',{0.1},'Constant',0.5); T = 50; y = simulate(DGP,T); plot(y) ylabel('Innovation') xlabel('Time')
Create Competing Models Assume that the DGP is unknown, and that the ARCH(1), GARCH(1,1), ARCH(2), and GARCH(1,2) models are appropriate for describing the DGP. 3-54
Information Criteria for Model Selection
For each competing model, create a garch model template for estimation. Mdl(1) Mdl(2) Mdl(3) Mdl(4)
= = = =
garch(0,1); garch(1,1); garch(0,2); garch(1,2);
Estimate Models Fit each model to the simulated data y, compute the loglikelihood, and suppress the estimation display. numMdl = numel(Mdl); logL = zeros(numMdl,1); % Preallocate numParam = zeros(numMdl,1); for j = 1:numMdl [EstMdl,~,logL(j)] = estimate(Mdl(j),y,'Display','off'); results = summarize(EstMdl); numParam(j) = results.NumEstimatedParameters; end
Compute and Compare Information Criteria For each model, compute all available information criteria. Normalize the results by the sample size T. [~,~,ic] = aicbic(logL,numParam,T,'Normalize',true) ic = struct with fields: aic: [1.7619 1.8016 bic: [1.8384 1.9163 aicc: [1.7670 1.8121 caic: [1.8784 1.9763 hqc: [1.7911 1.8453
1.8019 1.9167 1.8124 1.9767 1.8456
1.8416] 1.9946] 1.8594] 2.0746] 1.8999]
ic is a 1-D structure array with a field for each information criterion. Each field contains a vector of measurements; element j corresponds to the model yielding loglikelihood logL(j). For each criterion, determine the model that yields the minimum value. [~,minIdx] = structfun(@min,ic); [Mdl(minIdx).Description]' ans = 5x1 string "GARCH(0,1) Conditional "GARCH(0,1) Conditional "GARCH(0,1) Conditional "GARCH(0,1) Conditional "GARCH(0,1) Conditional
Variance Variance Variance Variance Variance
Model Model Model Model Model
(Gaussian (Gaussian (Gaussian (Gaussian (Gaussian
Distribution)" Distribution)" Distribution)" Distribution)" Distribution)"
3-55
3
Model Selection
The model that minimizes all criteria is the ARCH(1) model, which has the same structure as the DGP.
References [1] Akaike, Hirotugu. "Information Theory and an Extension of the Maximum Likelihood Principle.” In Selected Papers of Hirotugu Akaike, edited by Emanuel Parzen, Kunio Tanabe, and Genshiro Kitagawa, 199–213. New York: Springer, 1998. https://doi.org/10.1007/978-1-4612-1694-0_15. [2] Akaike, Hirotugu. “A New Look at the Statistical Model Identification.” IEEE Transactions on Automatic Control 19, no. 6 (December 1974): 716–23. https://doi.org/10.1109/ TAC.1974.1100705. [3] Burnham, Kenneth P., and David R. Anderson. Model Selection and Multimodel Inference: A Practical Information-Theoretic Approach. 2nd ed, New York: Springer, 2002. [4] Hannan, Edward J., and Barry G. Quinn. “The Determination of the Order of an Autoregression.” Journal of the Royal Statistical Society: Series B (Methodological) 41, no. 2 (January 1979): 190–95. https://doi.org/10.1111/j.2517-6161.1979.tb01072.x. [5] Lütkepohl, Helmut, and Markus Krätzig, editors. Applied Time Series Econometrics. 1st ed. Cambridge University Press, 2004. https://doi.org/10.1017/CBO9780511606885. [6] Schwarz, Gideon. “Estimating the Dimension of a Model.” The Annals of Statistics 6, no. 2 (March 1978): 461–64. https://doi.org/10.1214/aos/1176344136.
See Also aicbic | lratiotest | lmtest | waldtest
More About
3-56
•
“Choose ARMA Lags Using BIC” on page 7-135
•
“Compare Conditional Variance Models Using Information Criteria” on page 8-69
•
“Model Comparison Tests” on page 3-57
•
“Goodness of Fit” on page 3-85
Model Comparison Tests
Model Comparison Tests In this section... “Available Tests” on page 3-57 “Likelihood Ratio Test” on page 3-59 “Lagrange Multiplier Test” on page 3-59 “Wald Test” on page 3-59 “Covariance Matrix Estimation” on page 3-60
Available Tests The primary goal of model selection is choosing the most parsimonious model that adequately fits your data. Three asymptotically equivalent tests compare a restricted model (the null model) against an unrestricted model (the alternative model), fit to the same data: • Likelihood ratio (LR) test • Lagrange multiplier (LM) test • Wald (W) test For a model with parameters θ, consider the restriction r(θ) = 0, which is satisfied by the null model. For example, consider testing the null hypothesis θ = θ0 . The restriction function for this test is r(θ) = θ − θ0 . The LR, LM, and Wald tests approach the problem of comparing the fit of a restricted model against MLE
an unrestricted model differently. For a given data set, let l(θ0
) denote the loglikelihood function MLE
evaluated at the maximum likelihood estimate (MLE) of the restricted (null) model. Let l(θA ) denote the loglikelihood function evaluated at the MLE of the unrestricted (alternative) model. The following figure illustrates the rationale behind each test.
3-57
3
Model Selection
• Likelihood ratio test. If the restricted model is adequate, then the difference between the MLE
maximized objective functions, l(θA
MLE
) − l(θ0
), should not significantly differ from zero.
• Lagrange multiplier test. If the restricted model is adequate, then the slope of the tangent of the loglikelihood function at the restricted MLE (indicated by T0 in the figure) should not significantly differ from zero (which is the slope of the tangent of the loglikelihood function at the unrestricted MLE, indicated by T). • Wald test. If the restricted model is adequate, then the restriction function evaluated at the unrestricted MLE should not significantly differ from zero (which is the value of the restriction function at the restricted MLE). The three tests are asymptotically equivalent. Under the null, the LR, LM, and Wald test statistics are all distributed as χ 2 with degrees of freedom equal to the number of restrictions. If the test statistic exceeds the test critical value (equivalently, the p-value is less than or equal to the significance level), the null hypothesis is rejected. That is, the restricted model is rejected in favor of the unrestricted model. Choosing among the LR, LM, and Wald test is largely determined by computational cost: • To conduct a likelihood ratio test, you need to estimate both the restricted and unrestricted models. • To conduct a Lagrange multiplier test, you only need to estimate the restricted model (but the test requires an estimate of the variance-covariance matrix). 3-58
Model Comparison Tests
• To conduct a Wald test, you only need to estimate the unrestricted model (but the test requires an estimate of the variance-covariance matrix). All things being equal, the LR test is often the preferred choice for comparing nested models. Econometrics Toolbox has functionality for all three tests.
Likelihood Ratio Test You can conduct a likelihood ratio test using lratiotest. The required inputs are: • Value of the maximized unrestricted loglikelihood, l(θMLE) A • Value of the maximized restricted loglikelihood, l(θMLE) 0 • Number of restrictions (degrees of freedom) Given these inputs, the likelihood ratio test statistic is 2
MLE
G = 2 × l(θA
MLE
) − l(θ0
) .
When estimating conditional mean and variance models (using arima, garch, egarch, or gjr), you can return the value of the loglikelihood objective function as an optional output argument of estimate or infer. For multivariate time series models, you can get the value of the loglikelihood objective function using estimate.
Lagrange Multiplier Test The required inputs for conducting a Lagrange multiplier test are: • Gradient of the unrestricted likelihood evaluated at the restricted MLEs (the score), S • Variance-covariance matrix for the unrestricted parameters evaluated at the restricted MLEs, V Given these inputs, the LM test statistic is LM = S′VS . You can conduct an LM test using lmtest. A specific example of an LM test is Engle’s ARCH test, which you can conduct using archtest.
Wald Test The required inputs for conducting a Wald test are: • Restriction function evaluated at the unrestricted MLE, r • Jacobian of the restriction function evaluated at the unrestricted MLEs, R • Variance-covariance matrix for the unrestricted parameters evaluated at the unrestricted MLEs, V Given these inputs, the test statistic for the Wald test is −1
W = r′(RVR′)
r.
You can conduct a Wald test using waldtest. 3-59
3
Model Selection
Tip You can often compute the Jacobian of the restriction function analytically. Or, if you have Symbolic Math Toolbox™, you can use the function jacobian.
Covariance Matrix Estimation For estimating a variance-covariance matrix, there are several common methods, including: • Outer product of gradients (OPG). Let G be the matrix of gradients of the loglikelihood function. If your data set has N observations, and there are m parameters in the unrestricted likelihood, then G is an N × m matrix. −1
The matrix (G′G)
is the OPG estimate of the variance-covariance matrix.
For arima, garch, egarch, and gjr models, the estimate method returns the OPG estimate of the variance-covariance matrix. • Inverse negative Hessian (INH). Given the loglikelihood function l(θ), the INH covariance estimate has elements cov(i, j) = −
∂2 l(θ) ∂θi ∂θ j
−1
.
The estimation function for multivariate models, estimate, returns the expected Hessian variance-covariance matrix. Tip If you have Symbolic Math Toolbox, you can use jacobian twice to calculate the Hessian matrix for your loglikelihood function.
See Also Objects arima | garch | egarch | gjr Functions lmtest | waldtest | lratiotest
Related Examples •
“Conduct Lagrange Multiplier Test” on page 3-61
•
“Conduct Wald Test” on page 3-64
•
“Compare GARCH Models Using Likelihood Ratio Test” on page 3-66
More About
3-60
•
“Goodness of Fit” on page 3-85
•
“Information Criteria for Model Selection” on page 3-53
•
“Maximum Likelihood Estimation for Conditional Mean Models” on page 7-106
•
“Maximum Likelihood Estimation for Conditional Variance Models” on page 8-52
Conduct Lagrange Multiplier Test
Conduct Lagrange Multiplier Test This example shows how to calculate the required inputs for conducting a Lagrange multiplier (LM) test with lmtest. The LM test compares the fit of a restricted model against an unrestricted model by testing whether the gradient of the loglikelihood function of the unrestricted model, evaluated at the restricted maximum likelihood estimates (MLEs), is significantly different from zero. The required inputs for lmtest are the score function and an estimate of the unrestricted variancecovariance matrix evaluated at the restricted MLEs. This example compares the fit of an AR(1) model against an AR(2) model. Compute Restricted MLE Obtain the restricted MLE by fitting an AR(1) model (with a Gaussian innovation distribution) to the given data. Assume you have presample observations ( y−1, y0) = (9.6249,9.6396). Y = [10.1591; 10.5965; 10.0357; 9.6318; Y0 = [9.6249;
10.1675; 10.1957; 10.6558; 10.2243; 10.4429; 10.3848; 10.3972; 9.9478; 9.6402; 9.7761; 10.8202; 10.3668; 10.3980; 10.2892; 9.6310; 9.1378; 9.6318; 9.1378]; 9.6396];
Mdl = arima(1,0,0); EstMdl = estimate(Mdl,Y,'Y0',Y0); ARIMA(1,0,0) Model (Gaussian Distribution): Value _______ Constant AR{1} Variance
3.2999 0.67097 0.12506
StandardError _____________ 2.4606 0.24635 0.043015
TStatistic __________
PValue _________
1.3411 2.7237 2.9074
0.17988 0.0064564 0.0036441
When conducting an LM test, only the restricted model needs to be fit. Compute Gradient Matrix Estimate the variance-covariance matrix for the unrestricted AR(2) model using the outer product of gradients (OPG) method. For an AR(2) model with Gaussian innovations, the contribution to the loglikelihood function at time t is given by logLt = − 0 . 5log(2πσε2) −
2
(yt − c − ϕ1 yt − 1 − ϕ2 yt − 2) 2σε2
where σε2 is the variance of the innovation distribution. The contribution to the gradient at time t is ∂logLt ∂logLt ∂logLt ∂logLt , ∂c ∂ϕ1 ∂ϕ2 ∂σε2 3-61
3
Model Selection
where ∂logLt = ∂c
yt − c − ϕ1 yt − 1 − ϕ2 yt − 2
∂logLt = ∂ϕ1
yt − 1(yt − c − ϕ1 yt − 1 − ϕ2 yt − 2)
∂logLt = ∂ϕ2
yt − 2(yt − c − ϕ1 yt − 1 − ϕ2 yt − 2)
∂logLt
(yt − c − ϕ1 yt − 1 − ϕ2 yt − 2) 1 + 2 2σε 2σε4
∂σε2
σε2 σε2 σε2 2
=−
Evaluate the gradient matrix, G, at the restricted MLEs (using ϕ2 = 0 ). c = EstMdl.Constant; phi1 = EstMdl.AR{1}; phi2 = 0; sig2 = EstMdl.Variance; Yt = Y; Yt1 = [9.6396; Y(1:end-1)]; Yt2 = [9.6249; Yt1(1:end-1)]; N = length(Y); G = zeros(N,4); G(:,1) = (Yt-c-phi1*Yt1-phi2*Yt2)/sig2; G(:,2) = Yt1.*(Yt-c-phi1*Yt1-phi2*Yt2)/sig2; G(:,3) = Yt2.*(Yt-c-phi1*Yt1-phi2*Yt2)/sig2; G(:,4) = -0.5/sig2 + 0.5*(Yt-c-phi1*Yt1-phi2*Yt2).^2/sig2^2;
Estimate Variance-Covariance Matrix Compute the OPG variance-covariance matrix estimate. V = inv(G'*G) V = 4×4 6.1431 -0.6966 0.0827 0.0367
-0.6966 0.1535 -0.0846 -0.0061
0.0827 -0.0846 0.0771 0.0024
0.0367 -0.0061 0.0024 0.0019
Numerical inaccuracies can occur due to computer precision. To make the variance-covariance matrix symmetric, combine half of its value with half of its transpose. V = V/2 + V'/2;
Calculate Score Function Evaluate the score function (the sum of the individual contributions to the gradient). score = sum(G);
3-62
Conduct Lagrange Multiplier Test
Conduct Lagrange Multiplier Test Conduct the Lagrange multiplier test to compare the restricted AR(1) model against the unrestricted AR(2) model. The number of restrictions (the degree of freedom) is one. [h,p,LMstat,crit] = lmtest(score,V,1) h = logical 0 p = 0.5787 LMstat = 0.3084 crit = 3.8415
The restricted AR(1) model is not rejected in favor of the AR(2) model (h = 0).
See Also Objects arima Functions estimate | lmtest
Related Examples •
“Conduct Wald Test” on page 3-64
•
“Compare GARCH Models Using Likelihood Ratio Test” on page 3-66
More About •
“Model Comparison Tests” on page 3-57
•
“Goodness of Fit” on page 3-85
•
“What Are Autoregressive Models?” on page 7-21
3-63
3
Model Selection
Conduct Wald Test This example shows how to calculate the required inputs for conducting a Wald test with waldtest. The Wald test compares the fit of a restricted model against an unrestricted model by testing whether the restriction function, evaluated at the unrestricted maximum likelihood estimates (MLEs), is significantly different from zero. The required inputs for waldtest are a restriction function, the Jacobian of the restriction function evaluated at the unrestricted MLEs, and an estimate of the variance-covariance matrix evaluated at the unrestricted MLEs. This example compares the fit of an AR(1) model against an AR(2) model. Compute Unrestricted MLE Obtain the unrestricted MLEs by fitting an AR(2) model (with a Gaussian innovation distribution) to the given data. Assume you have presample observations (y−1, y0) = (9.6249,9.6396) Y = [10.1591; 10.5965; 10.0357; 9.6318; Y0 = [9.6249;
10.1675; 10.1957; 10.6558; 10.2243; 10.4429; 10.3848; 10.3972; 9.9478; 9.6402; 9.7761; 10.8202; 10.3668; 10.3980; 10.2892; 9.6310; 9.1378; 9.6318; 9.1378]; 9.6396];
Mdl = arima(2,0,0); [EstMdl,V] = estimate(Mdl,Y,'Y0',Y0); ARIMA(2,0,0) Model (Gaussian Distribution): Value _______ Constant AR{1} AR{2} Variance
StandardError _____________
2.8802 0.60623 0.10631 0.12386
2.5239 0.40372 0.29283 0.042598
TStatistic __________ 1.1412 1.5016 0.36303 2.9076
PValue _________ 0.25379 0.1332 0.71658 0.0036425
When conducting a Wald test, only the unrestricted model needs to be fit. estimate returns the estimated variance-covariance matrix as an optional output. Compute Jacobian Matrix Define the restriction function, and calculate its Jacobian matrix. For comparing an AR(1) model to an AR(2) model, the restriction function is r(c, ϕ1, ϕ2, σε2) = ϕ2 − 0 = 0 . The Jacobian of the restriction function is ∂r ∂r ∂r ∂r ∂c ∂ϕ1 ∂ϕ2 ∂σε2 = 0 0 1 0 Evaluate the restriction function and Jacobian at the unrestricted MLEs. 3-64
Conduct Wald Test
r = EstMdl.AR{2}; R = [0 0 1 0];
Conduct Wald Test Conduct a Wald test to compare the restricted AR(1) model against the unrestricted AR(2) model. [h,p,Wstat,crit] = waldtest(r,R,V) h = logical 0 p = 0.7166 Wstat = 0.1318 crit = 3.8415
The restricted AR(1) model is not rejected in favor of the AR(2) model (h = 0).
See Also arima | estimate | waldtest
Related Examples •
“Conduct Lagrange Multiplier Test” on page 3-61
•
“Compare GARCH Models Using Likelihood Ratio Test” on page 3-66
More About •
“Model Comparison Tests” on page 3-57
•
“Goodness of Fit” on page 3-85
•
“What Are Autoregressive Models?” on page 7-21
3-65
3
Model Selection
Compare GARCH Models Using Likelihood Ratio Test This example shows how to conduct a likelihood ratio test to choose the number of lags in a GARCH model. Load Data Load the Deutschmark/British pound foreign-exchange rate data included with the toolbox. Convert the daily rates to returns. load Data_MarkPound Y = Data; r = price2ret(Y); N = length(r); figure plot(r) xlim([0,N]) title('Mark-Pound Exchange Rate Returns')
The daily returns exhibit volatility clustering. Large changes in the returns tend to cluster together, and small changes tend to cluster together. That is, the series exhibits conditional heteroscedasticity. The returns are of relatively high frequency. Therefore, the daily changes can be small. For numerical stability, scale the data to percentage returns. 3-66
Compare GARCH Models Using Likelihood Ratio Test
r = 100*r;
Fit GARCH(1,1) Model Create and fit a GARCH(1,1) model (with a mean offset) to the returns series. Return the value of the loglikelihood objective function. Mdl1 = garch('Offset',NaN,'GARCHLags',1,'ARCHLags',1); [EstMdl1,~,logL1] = estimate(Mdl1,r); GARCH(1,1) Conditional Variance Model with Offset (Gaussian Distribution):
Constant GARCH{1} ARCH{1} Offset
Value __________
StandardError _____________
TStatistic __________
PValue __________
0.010761 0.80597 0.15313 -0.0061904
0.001323 0.01656 0.013974 0.0084336
8.1342 48.669 10.959 -0.73402
4.1454e-16 0 6.038e-28 0.46294
Fit GARCH(2,1) Model Create and fit a GARCH(2,1) model with a mean offset. Mdl2 = garch(2,1); Mdl2.Offset = NaN; [EstMdl2,~,logL2] = estimate(Mdl2,r); GARCH(2,1) Conditional Variance Model with Offset (Gaussian Distribution):
Constant GARCH{1} GARCH{2} ARCH{1} Offset
Value __________
StandardError _____________
TStatistic __________
PValue __________
0.011226 0.48964 0.29769 0.16842 -0.0049837
0.001538 0.11159 0.10218 0.016583 0.0084764
7.2992 4.3878 2.9133 10.156 -0.58795
2.8947e-13 1.1453e-05 0.003576 3.1158e-24 0.55657
Conduct Likelihood Ratio Test Conduct a likelihood ratio test to compare the restricted GARCH(1,1) model fit to the unrestricted GARCH(2,1) model fit. The degree of freedom for this test is one (the number of restrictions). [h,p] = lratiotest(logL2,logL1,1) h = logical 1 p = 0.0218
3-67
3
Model Selection
At the 0.05 significance level, the null GARCH(1,1) model is rejected (h = 1) in favor of the unrestricted GARCH(2,1) alternative.
See Also Objects garch Functions estimate | lratiotest
Related Examples •
“Conduct Lagrange Multiplier Test” on page 3-61
•
“Conduct Wald Test” on page 3-64
•
“Compare Conditional Variance Models Using Information Criteria” on page 8-69
More About
3-68
•
“Model Comparison Tests” on page 3-57
•
“Goodness of Fit” on page 3-85
•
“GARCH Model” on page 8-3
Classical Model Misspecification Tests
Classical Model Misspecification Tests This example shows the use of the likelihood ratio, Wald, and Lagrange multiplier tests. These tests are useful in the evaluation and assessment of model restrictions and, ultimately, the selection of a model that balances the often competitive goals of adequacy and simplicity. Introduction Econometric models are a balance. On the one hand, they must be sufficiently detailed to account for relevant economic factors and their influence on observed data patterns. On the other hand, they must avoid unnecessary complexities that lead to computational challenges, over-fitting, or problems with interpretation. Working models are often developed by considering a sequence of nested specifications, in which larger, theoretical models are examined for simplifying restrictions on the parameters. If the parameters are estimated by maximum likelihood, three classical tests are typically used to assess the adequacy of the restricted models. They are the likelihood ratio test, the Wald test, and the Lagrange multiplier test. The loglikelihood of model parameters θ, given data d, is denoted L(θ | d). With no restrictions on the model, L is optimized at the maximum likelihood estimate (MLE) θ. With restrictions of the form r (θ) ∼ = 0, L is optimized at θ with a generally reduced loglikelihood of describing the data. The classical tests evaluate the statistical significance of model restrictions using information obtained from these optimizations. The framework is very general; it encompasses both linear and nonlinear models, and both linear and nonlinear restrictions. In particular, it extends the familiar framework of t and F tests for linear models. Each test uses the geometry of the loglikelihood surface to evaluate the significance of model restrictions in a different way: • The likelihood ratio test considers the difference in loglikelihoods at θ and ∼ θ . If the restrictions are insignificant, this difference should be near zero. • The Wald test considers the value of r at θ. If the restrictions are insignificant, this value should be ∼ near the value of r at θ , which is zero. • The Lagrange multiplier test considers the gradient, or score, of L at ∼ θ . If the restrictions are insignificant, this vector should be near the score at θ, which is zero. The likelihood ratio test evaluates the difference in loglikelihoods directly. The Wald and Lagrange multiplier tests do so indirectly, with the idea that insignificant changes in the evaluated quantities can be identified with insignificant changes in the parameters. This identification depends on the curvature of the loglikelihood surface in the neighborhood of the MLE. As a result, the Wald and Lagrange multiplier tests include an estimate of parameter covariance in the formulation of the test statistic. Econometrics Toolbox™ software implements the likelihood ratio, Wald, and Lagrange multiplier tests in the functions lratiotest, waldtest, and lmtest, respectively. Data and Models Consider the following data from the U.S. Census Bureau, giving average annual earnings by educational attainment level: 3-69
3
Model Selection
load Data_Income2 numLevels = 8; X = 100*repmat(1:numLevels,numLevels,1); % Levels: 100, 200, ..., 800 x = X(:); % Education y = Data(:); % Income n = length(y); % Sample size levelNames = DataTable.Properties.VariableNames; boxplot(Data,'labels',levelNames) grid on xlabel('Educational Attainment') ylabel('Average Annual Income (1999 Dollars)') title('{\bf Income and Education}')
The income distributions in the data are conditional on the educational attainment level x. This pattern is also evident in a histogram of the data, which shows the small sample size: figure edges = [0:0.2:2]*1e5; centers =[0.1:0.2:1.9]*1e5; BinCounts = zeros(length(edges)-1,numLevels); for j = 1:numLevels BinCounts(:,j) = histcounts(Data(:,j),edges); end;
3-70
Classical Model Misspecification Tests
h = bar(centers,BinCounts); axis tight grid on legend(h,levelNames) xlabel('Average Annual Income (1999 Dollars)') ylabel('Number of Observations') title('{\bf Income and Education}')
A common model for this type of data is the gamma distribution, with conditional density ρ
f (yi | xi, β, ρ) =
βi ρ − 1 −yiβi y e , Γ(ρ)
where βi = 1/(β + xi) and i = 1, . . . , n . Gamma distributions are sums of ρ exponential distributions, and so admit natural restrictions on the value of ρ. The exponential distribution, with ρ equal to 1, is monotonically decreasing and too simple to describe the unimodal distributions in the data. For the purposes of illustration, we will maintain a restricted model that is the sum of two exponential distributions, obtained by imposing the restriction 3-71
3
Model Selection
r(β, ρ) = ρ − 2 = 0 . This null model will be tested against the unrestricted alternative represented by the general gamma distribution. The loglikelihood function of the conditional gamma density, and its derivatives, are found analytically: n
∑
L(β, ρ | x) = ρ
i=1
n
lnβi − nlnΓ(ρ) + (ρ − 1)
n
∑
i=1
n
lnyi −
∑
i=1
yi βi
n
∂L 2 = − ρ ∑ βi + ∑ yiβi ∂β i=1 i=1 ∂L = ∂ρ ∂2 L 2
∂β
n
∑
n
i=1
lnβi − nΨ(ρ) +
n
=ρ
∑
i=1
2
βi − 2
n
∑
i=1
∑
i=1
lnyi
3
yi βi
∂2 L = − nΨ′(ρ) ∂ρ2 n
∂2 L = − ∑ βi ∂β ∂ρ i=1 where Ψ is the digamma function, the derivative of lnΓ. The loglikelihood function is used to find MLEs for the restricted and unrestricted models. The derivatives are used to construct gradients and parameter covariance estimates for the Wald and Lagrange multiplier tests. Maximum Likelihood Estimation Since optimizers in MATLAB® and Optimization Toolbox™ software find minima, we maximize the loglikelihood by minimizing the negative loglikelihood function. Using the L found above, we code the negative loglikelihood function with parameter vector p = [beta;rho]: nLLF = @(p)sum(p(2)*(log(p(1)+x))+gammaln(p(2))-(p(2)-1)*log(y)+y./(p(1)+x));
We use the function fmincon to compute the restricted parameter estimates at ρ = 2. The lower bound on β assures that the logarithm in nLLF is evaluated at positive arguments: options = optimoptions(@fmincon,'TolFun',1e-10,'Display','off'); rp0 = [1 1]; % Initial values rlb = [-min(x) 2]; % Lower bounds rub = [Inf 2]; % Upper bounds [rmle,rnLL] = fmincon(nLLF,rp0,[],[],[],[],rlb,rub,[],options); rbeta = rmle(1); % Restricted beta estimate rrho = rmle(2); % Restricted rho estimate rLL = -rnLL; % Restricted loglikelihood
3-72
Classical Model Misspecification Tests
Unrestricted parameter estimates are computed in a similar manner, starting from initial values given by the restricted estimates: up0 = [rbeta rrho]; % Initial values ulb = [-min(x) 0]; % Lower bounds uub = [Inf Inf]; % Upper bounds [umle,unLL] = fmincon(nLLF,up0,[],[],[],[],ulb,uub,[],options); ubeta = umle(1); % Unrestricted beta estimate urho = umle(2); % Unrestricted rho estimate uLL = -unLL; % Unrestricted loglikelihood
We display the MLEs on a logarithmic contour plot of the negative loglikelihood surface: betas = 1e3:1e2:4e4; rhos = 0:0.1:10; [BETAS,RHOS] = meshgrid(betas,rhos); NLL = zeros(size(BETAS)); for i = 1:numel(NLL) NLL(i) = nLLF([BETAS(i),RHOS(i)]); end L = log10(unLL); v = logspace(L-0.1,L+0.1,100); contour(BETAS,RHOS,NLL,v) % Negative loglikelihood surface colorbar hold on plot(ubeta,urho,'bo','MarkerFaceColor','b') % Unrestricted MLE line([1e3 4e4],[2 2],'Color','k','LineWidth',2) % Restriction plot(rbeta,rrho,'bs','MarkerFaceColor','b') % Restricted MLE legend('nllf','umle','restriction','rmle') xlabel('\beta') ylabel('\rho') title('{\bf Unrestricted and Restricted MLEs}')
3-73
3
Model Selection
Covariance Estimators The intuitive relationship between the curvature of the loglikelihood surface and the variance/ covariance of the parameter estimates is formalized by the information matrix equality, which identifies the negative expected value of the Hessian with the Fisher information matrix. The second derivatives in the Hessian express loglikelihood concavities. The Fisher information matrix expresses parameter variance; its inverse is the asymptotic covariance matrix. Covariance estimators required by the Wald and Lagrange multiplier tests are computed in a variety of ways. One approach is to use the outer product of gradients (OPG), which requires only first derivatives of the loglikelihood. While popular for its relative simplicity, the OPG estimator can be unreliable, especially with small samples. Another, often preferable, estimator is the inverse of the negative expected Hessian. By the information matrix equality, this estimator is the asymptotic covariance, appropriate for large samples. If analytic expectations are difficult to compute, the expected Hessian can be replaced by the Hessian evaluated at the parameter estimates, the so-called "observed" Fisher information. We compute each of the three estimators, using the derivatives of L found earlier. Conditional expectations in the Hessian are found using E[g(X)Y | X] = g(X)E[Y | X] . We evaluate the estimators at the unrestricted parameter estimates, for the Wald test, and then at the restricted parameter estimates, for the Lagrange multiplier test.
3-74
Classical Model Misspecification Tests
Different scales for the β and ρ parameters are reflected in the relative sizes of the variances. The small sample size is reflected in the differences among the estimators. We increase the precision of the displays to show these differences: format long
Evaluated at the unrestricted parameter estimates, the estimators are: % OPG estimator: UG = [-urho./(ubeta+x)+y.*(ubeta+x).^(-2),-log(ubeta+x)-psi(urho)+log(y)]; Uscore = sum(UG)'; UEstCov1 = inv(UG'*UG) %#ok UEstCov1 = 2×2 106 × 6.163694782854278 -0.002335589407070
-0.002335589407070 0.000000949847037
% Hessian estimator (observed information): UDPsi = (psi(urho+0.0001)-psi(urho-0.0001))/(0.0002); % Digamma derivative UH = [sum(urho./(ubeta+x).^2)-2*sum(y./(ubeta+x).^3),-sum(1./(ubeta+x)); ... -sum(1./(ubeta+x)),-n*UDPsi]; UEstCov2 = -inv(UH) %#ok UEstCov2 = 2×2 106 × 5.914336186238142 -0.001864364370431
-0.001864364370431 0.000000648730014
% Expected Hessian estimator (expected information): UEH = [-sum(urho./((ubeta+x).^2)), -sum(1./(ubeta+x)); ... -sum(1./(ubeta+x)),-n*UDPsi]; UEstCov3 = -inv(UEH) %#ok UEstCov3 = 2×2 106 × 4.993544524752526 -0.001574105056079
-0.001574105056079 0.000000557232151
Evaluated at the restricted parameter estimates, the estimators are: % OPG estimator: RG = [-rrho./(rbeta+x)+y.*(rbeta+x).^(-2),-log(rbeta+x)-psi(rrho)+log(y)]; Rscore = sum(RG)'; REstCov1 = inv(RG'*RG) %#ok REstCov1 = 2×2 107 ×
3-75
3
Model Selection
6.614327250443059 -0.000476569931624
-0.000476569931624 0.000000040110448
% Hessian estimator (observed information): RDPsi = (psi(rrho+0.0001)-psi(rrho-0.0001))/(0.0002); % Digamma derivative RH = [sum(rrho./(rbeta+x).^2)-2*sum(y./(rbeta+x).^3),-sum(1./(rbeta+x)); ... -sum(1./(rbeta+x)),-n*RDPsi]; REstCov2 = -inv(RH) %#ok REstCov2 = 2×2 107 × 2.708411988833666 -0.000153135043006
-0.000153135043006 0.000000011081064
% Expected Hessian estimator (expected information): REH = [-sum(rrho./((rbeta+x).^2)),-sum(1./(rbeta+x)); ... -sum(1./(rbeta+x)),-n*RDPsi]; REstCov3 = -inv(REH) %#ok REstCov3 = 2×2 107 × 2.613663217708711 -0.000147777897490
-0.000147777897490 0.000000010778169
Return to short numerical displays: format short
The Likelihood Ratio Test The likelihood ratio test, which evaluates the statistical significance of the difference in loglikelihoods at the unrestricted and restricted parameter estimates, is generally considered to be the most reliable of the three classical tests. Its main disadvantage is that it requires estimation of both models. This may be an issue if either the unrestricted model or the restrictions are nonlinear, making significant demands on the necessary optimizations. Once the required loglikelihoods have been obtained through maximum likelihood estimation, use lratiotest to run the likelihood ratio test: dof = 1; % Number of restrictions [LRh,LRp,LRstat,cV] = lratiotest(uLL,rLL,dof) %#ok LRh = logical 1 LRp = 7.9882e-05 LRstat = 15.5611 cV = 3.8415
3-76
Classical Model Misspecification Tests
The test rejects the restricted model (LRh = 1), with a p-value (LRp = 7.9882e-005) well below the default significance level (alpha = 0.05), and a test statistic (LRstat = 15.5611) well above the critical value (cV = 3.8415). Like the Wald and Lagrange multiplier tests, the likelihood ratio test is asymptotic; the test statistic is evaluated with a limiting distribution obtained by letting the sample size tend to infinity. The same chi-square distribution, with degree of freedom dof, is used to evaluate the individual test statistics of each of the three tests, with the same critical value cV. Consequences for drawing inferences from small samples should be apparent, and this is one of the reasons why the three tests are often used together, as checks against one another. The Wald Test The Wald test is appropriate in situations where the restrictions impose significant demands on parameter estimation, as in the case of multiple nonlinear constraints. The Wald test has the advantage that it requires only the unrestricted parameter estimate. Its main disadvantage is that, unlike the likelihood ratio test, it also requires a reasonably accurate estimate of the parameter covariance. To perform a Wald test, restrictions must be formulated as functions from p-dimensional parameter space to q-dimensional restriction space: r1(θ1, …, θp) r=
⋮ rq(θ1, …, θp)
with Jacobian ∂r1 ∂r1 … ∂θ1 ∂θp R=
⋮ ⋱ ⋮ . ∂rq ∂rq … ∂θ1 ∂θp
For the gamma distribution under consideration, the single restriction r(β, ρ) = ρ − 2 maps 2-dimensional parameter space to 1-dimensional restriction space with Jacobian [0 1]. Use waldtest to run the Wald test with each of the unrestricted covariance estimates computed previously. The number of restrictions dof is the length of the input vector r, so it does not have to be input explicitly as for lratiotest or lmtest: r = urho-2; % Restriction vector R = [0 1]; % Jacobian restrictions = {r,r,r}; Jacobians = {R,R,R}; UEstCov = {UEstCov1,UEstCov2,UEstCov3}; [Wh,Wp,Wstat,cV] = waldtest(restrictions,Jacobians,UEstCov) %#ok Wh = 1x3 logical array
3-77
3
Model Selection
1
1
1
Wp = 1×3 0.0144
0.0031
0.0014
8.7671
10.2066
3.8415
3.8415
Wstat = 1×3 5.9878 cV = 1×3 3.8415
The test rejects the restricted model with each of the covariance estimates. Hypothesis tests in Econometrics Toolbox and Statistics Toolbox™ software operate at a default 5% significance level. The significance level can be changed with an optional input: alpha = 0.01; % 1% significance level [Wh2,Wp2,Wstat2,cV2] = waldtest(restrictions,Jacobians,UEstCov,alpha) %#ok Wh2 = 1x3 logical array 0
1
1
Wp2 = 1×3 0.0144
0.0031
0.0014
8.7671
10.2066
6.6349
6.6349
Wstat2 = 1×3 5.9878 cV2 = 1×3 6.6349
The OPG estimator fails to reject the restricted model at the new significance level. The Lagrange Multiplier Test The Lagrange multiplier test is appropriate in situations where the unrestricted model imposes significant demands on parameter estimation, as in the case where the restricted model is linear but the unrestricted model is not. The Lagrange multiplier test has the advantage that it requires only the restricted parameter estimate. Its main disadvantage is that, like the Wald test, it also requires a reasonably accurate estimate of the parameter covariance. Use lmtest to run the Lagrange multiplier test with each of the restricted covariance estimates computed previously: 3-78
Classical Model Misspecification Tests
scores = {Rscore,Rscore,Rscore}; REstCov = {REstCov1,REstCov2,REstCov3}; [LMh,LMp,LMstat,cV] = lmtest(scores,REstCov,dof) %#ok LMh = 1x3 logical array 1
1
1
LMp = 1×3 0.0000
0.0024
0.0027
9.2442
8.9916
3.8415
3.8415
LMstat = 1×3 33.4617 cV = 1×3 3.8415
The test again rejects the restricted model with each of the covariance estimates at the default significance level. The reliability of the OPG estimator is called into question by the anomalously large value of the first test statistic. Summary The three classical model misspecification tests form a natural toolkit for econometricians. In the context of maximum likelihood estimation, they all attempt to make the same distinction, between an unrestricted and a restricted model in some hierarchy of progressively simpler descriptions of the data. Each test, however, comes with different requirements, and so may be useful in different modeling situations, depending on the computational demands. When used together, inferences can vary among the tests, especially with small samples. Users should consider the tests as only one component of a wider statistical and economic analysis.
References [1] Davidson, R., and J. G. MacKinnon. Econometric Theory and Methods. Oxford, UK: Oxford University Press, 2004. [2] Godfrey, L. G. Misspecification Tests in Econometrics. Cambridge, UK: Cambridge University Press, 1997. [3] Greene, William. H. Econometric Analysis. 6th ed. Upper Saddle River, NJ: Prentice Hall, 2008. [4] Hamilton, James D. Time Series Analysis. Princeton, NJ: Princeton University Press, 1994.
3-79
3
Model Selection
Check Fit of Multiplicative ARIMA Model This example shows how to do goodness of fit checks. Residual diagnostic plots help verify model assumptions, and cross-validation prediction checks help assess predictive performance. The time series is monthly international airline passenger numbers from 1949 to 1960. Load the data and estimate a model. Load the airline data set. load Data_Airline y = log(Data); T = length(y); Mdl = arima('Constant',0,'D',1,'Seasonality',12,... 'MALags',1,'SMALags',12); EstMdl = estimate(Mdl,y); ARIMA(0,1,1) Model Seasonally Integrated with Seasonal MA(12) (Gaussian Distribution): Value _________ Constant MA{1} SMA{12} Variance
0 -0.37716 -0.57238 0.0012634
StandardError _____________ 0 0.066794 0.085439 0.00012395
TStatistic __________ NaN -5.6466 -6.6992 10.193
PValue __________ NaN 1.6364e-08 2.0952e-11 2.1406e-24
Check the residuals for normality. One assumption of the fitted model is that the innovations follow a Gaussian distribution. Infer the residuals, and check them for normality. res = infer(EstMdl,y); stres = res/sqrt(EstMdl.Variance); figure subplot(1,2,1) qqplot(stres) x = -4:.05:4; [f,xi] = ksdensity(stres); subplot(1,2,2) plot(xi,f,'k','LineWidth',2); hold on plot(x,normpdf(x),'r--','LineWidth',2) legend('Standardized Residuals','Standard Normal') hold off
3-80
Check Fit of Multiplicative ARIMA Model
The quantile-quantile plot (QQ-plot) and kernel density estimate show no obvious violations of the normality assumption. Check the residuals for autocorrelation. Confirm that the residuals are uncorrelated. Look at the sample autocorrelation function (ACF) and partial autocorrelation function (PACF) plots for the standardized residuals. figure subplot(2,1,1) autocorr(stres) subplot(2,1,2) parcorr(stres)
3-81
3
Model Selection
[h,p] = lbqtest(stres,'lags',[5,10,15],'dof',[3,8,13]) h = 1x3 logical array 0
0
0
p = 1×3 0.1842
0.3835
0.7321
The sample ACF and PACF plots show no significant autocorrelation. More formally, conduct a LjungBox Q-test at lags 5, 10, and 15, with degrees of freedom 3, 8, and 13, respectively. The degrees of freedom account for the two estimated moving average coefficients. The Ljung-Box Q-test confirms the sample ACF and PACF results. The null hypothesis that all autocorrelations are jointly equal to zero up to the tested lag is not rejected (h = 0) for any of the three lags. Check predictive performance. Use a holdout sample to compute the predictive MSE of the model. Use the first 100 observations to estimate the model, and then forecast the next 44 periods. y1 = y(1:100); y2 = y(101:end);
3-82
Check Fit of Multiplicative ARIMA Model
Mdl1 = estimate(Mdl,y1); ARIMA(0,1,1) Model Seasonally Integrated with Seasonal MA(12) (Gaussian Distribution): Value _________ Constant MA{1} SMA{12} Variance
0 -0.35674 -0.63319 0.0013285
StandardError _____________ 0 0.089461 0.098744 0.00015882
TStatistic __________ NaN -3.9876 -6.4124 8.365
PValue __________ NaN 6.6739e-05 1.4326e-10 6.013e-17
yF1 = forecast(Mdl1,44,'Y0',y1); pmse = mean((y2-yF1).^2) pmse = 0.0069 figure plot(y2,'r','LineWidth',2) hold on plot(yF1,'k--','LineWidth',1.5) xlim([0,44]) title('Prediction Error') legend('Observed','Forecast','Location','northwest') hold off
3-83
3
Model Selection
The predictive ability of the model is quite good. You can optionally compare the PMSE for this model with the PMSE for a competing model to help with model selection.
See Also Objects arima Functions autocorr | parcorr | lbqtest | estimate | infer | forecast
More About
3-84
•
“Create Multiplicative Seasonal ARIMA Model for Time Series Data” on page 7-51
•
“Estimate Multiplicative ARIMA Model” on page 7-117
•
“Simulate Multiplicative ARIMA Models” on page 7-159
•
“Forecast Multiplicative ARIMA Model” on page 7-174
•
“Detect Autocorrelation” on page 3-19
•
“Goodness of Fit” on page 3-85
•
“Residual Diagnostics” on page 3-86
•
“Assess Predictive Performance” on page 3-88
•
“MMSE Forecasting of Conditional Mean Models” on page 7-167
•
“Autocorrelation and Partial Autocorrelation” on page 3-10
•
“Ljung-Box Q-Test” on page 3-17
Goodness of Fit
Goodness of Fit After specifying a model and estimating its parameters, it is good practice to perform goodness-of-fit checks to diagnose the adequacy of your fitted model. When assessing model adequacy, areas of primary concern are: • Violations of model assumptions, potentially resulting in bias and inaccurate standard errors • Poor predictive performance • Missing explanatory variables Goodness-of-fit checks can help you identify areas of model inadequacy. They can also suggest ways to improve your model. For example, if you conduct a test for residual autocorrelation and get a significant result, you might be able to improve your model fit by adding additional autoregressive or moving average terms. Some strategies for evaluating goodness of fit are: • Compare your model against an augmented alternative. Make comparisons, for example, by conducting a likelihood ratio test. Testing your model against a more elaborate alternative model is a way to assess evidence of inadequacy. Give careful thought when choosing an alternative model. • Making residual diagnostic plots is an informal—but useful—way to assess violation of model assumptions. You can plot residuals to check for normality, residual autocorrelation, residual heteroscedasticity, and missing predictors. Formal tests for autocorrelation and heteroscedasticity can also help quantify possible model violations. • Predictive performance checks. Divide your data into two parts: a training set and a validation set. Fit your model using only the training data, and then forecast the fitted model over the validation period. By comparing model forecasts against the true, holdout observations, you can assess the predictive performance of your model. Prediction mean square error (PMSE) can be calculated as a numerical summary of the predictive performance. When choosing among competing models, you can look at their respective PMSE values to compare predictive performance.
See Also Related Examples •
“Select ARIMA Model for Time Series Using Box-Jenkins Methodology” on page 3-2
•
“Check Fit of Multiplicative ARIMA Model” on page 3-80
•
“Compare GARCH Models Using Likelihood Ratio Test” on page 3-66
More About •
“Residual Diagnostics” on page 3-86
•
“Model Comparison Tests” on page 3-57
•
“Assess Predictive Performance” on page 3-88
3-85
3
Model Selection
Residual Diagnostics In this section... “Check Residuals for Normality” on page 3-86 “Check Residuals for Autocorrelation” on page 3-86 “Check Residuals for Conditional Heteroscedasticity” on page 3-86
Check Residuals for Normality A common assumption of time series models is a Gaussian innovation distribution. After fitting a model, you can infer residuals and check them for normality. If the Gaussian innovation assumption holds, the residuals should look approximately normally distributed. Some plots for assessing normality are: • Histogram • Box plot • Quantile-quantile plot • Kernel density estimate The last three plots are in Statistics and Machine Learning Toolbox. If you see that your standardized residuals have excess kurtosis (fatter tails) compared to a standard normal distribution, you can consider using a Student’s t innovation distribution.
Check Residuals for Autocorrelation In time series models, the innovation process is assumed to be uncorrelated. After fitting a model, you can infer residuals and check them for any unmodeled autocorrelation. As an informal check, you can plot the sample autocorrelation function (ACF) and partial autocorrelation function (PACF). If either plot shows significant autocorrelation in the residuals, you can consider modifying your model to include additional autoregression or moving average terms. More formally, you can conduct a Ljung-Box Q-test on the residual series. This tests the null hypothesis of jointly zero autocorrelations up to lag m, against the alternative of at least one nonzero autocorrelation. You can conduct the test at several values of m. The degrees of freedom for the Qtest are usually m. However, for testing a residual series, you should use degrees of freedom m – p – q, where p and q are the number of AR and MA coefficients in the fitted model, respectively.
Check Residuals for Conditional Heteroscedasticity A white noise innovation process has constant variance. After fitting a model, you can infer residuals and check them for heteroscedasticity (nonconstant variance). As an informal check, you can plot the sample ACF and PACF of the squared residual series. If either plot shows significant autocorrelation, you can consider modifying your model to include a conditional variance process. 3-86
Residual Diagnostics
More formally, you can conduct an Engle’s ARCH test on the residual series. This tests the null hypothesis of no ARCH effects against the alternative ARCH model with k lags.
See Also Apps Econometric Modeler Functions histogram | boxplot | qqplot | ksdensity | autocorr | parcorr | lbqtest | archtest
Related Examples •
“Implement Box-Jenkins Model Selection and Estimation Using Econometric Modeler App” on page 4-112
•
“Select ARIMA Model for Time Series Using Box-Jenkins Methodology” on page 3-2
•
“Detect Autocorrelation” on page 3-19
•
“Detect ARCH Effects” on page 3-27
•
“Check Fit of Multiplicative ARIMA Model” on page 3-80
More About •
“Goodness of Fit” on page 3-85
•
“Assess Predictive Performance” on page 3-88
•
“Ljung-Box Q-Test” on page 3-17
•
“Engle’s ARCH Test” on page 3-25
•
“Autocorrelation and Partial Autocorrelation” on page 3-10
3-87
3
Model Selection
Assess Predictive Performance If you plan to use a fitted model for forecasting, a good practice is to assess the predictive ability of the model. Models that fit well in-sample are not guaranteed to forecast well. For example, overfitting can lead to good in-sample fit, but poor predictive performance. When checking predictive performance, it is important to not use your data twice. That is, the data you use to fit your model should be different than the data you use to assess forecasts. You can use cross validation to evaluate out-of-sample forecasting ability: 1
Divide your time series into two parts: a training set and a validation set.
2
Fit a model to your training data.
3
Forecast the fitted model over the validation period.
4
Compare the forecasts to the holdout validation observations using plots and numerical summaries (such as predictive mean square error).
Prediction mean square error (PMSE) measures the discrepancy between model forecasts and observed data. Suppose you have a time series of length N, and you set aside M validation points, v .. After fitting your model to the first N – M data points (the training set), denoted y1v, y2v, …, yM v
v
v
generate forecasts y 1, y 2, …, y M . The model PMSE is calculated as PMSE =
M
2
1 v yiv − y i . Mi∑ =1
You can calculate PMSE for various choices of M to verify the robustness of your results.
See Also Related Examples •
“Check Fit of Multiplicative ARIMA Model” on page 3-80
More About
3-88
•
“Goodness of Fit” on page 3-85
•
“Residual Diagnostics” on page 3-86
•
“MMSE Forecasting of Conditional Mean Models” on page 7-167
•
“MMSE Forecasting of Conditional Variance Models” on page 8-90
Nonspherical Models
Nonspherical Models What Are Nonspherical Models? Consider the linear time series model yt = Xt β + εt, where yt is the response, xt is a vector of values for the r predictors, β is the vector of regression coefficients, and εt is the random innovation at time t. Ordinary least squares (OLS) estimation and inference techniques for this framework depend on certain assumptions, e.g., homoscedastic and uncorrelated innovations. For more details on the classical linear model, see “Time Series Regression I: Linear Models” on page 5-176. If your data exhibits signs of assumption violations, then OLS estimates or inferences based on them might not be valid. In particular, if the data is generated with an innovations process that exhibits autocorrelation or heteroscedasticity, then the model (or the residuals) are nonspherical. These characteristics are often detected through testing of model residuals (for details, see “Time Series Regression VI: Residual Diagnostics” on page 5-223). Nonspherical residuals are often considered a sign of model misspecification, and models are revised to whiten the residuals and improve the reliability of standard estimation techniques. In some cases, however, nonspherical models must be accepted as they are, and estimated as accurately as possible using revised techniques. Cases include: • Models presented by theory • Models with predictors that are dictated by policy • Models without available data sources, for which predictor proxies must be found A variety of alternative estimation techniques have been developed to deal with these situations.
See Also Related Examples •
“Plot a Confidence Band Using HAC Estimates” on page 3-90
•
“Change the Bandwidth of a HAC Estimator” on page 3-97
•
“Time Series Regression I: Linear Models” on page 5-176
•
“Time Series Regression VI: Residual Diagnostics” on page 5-223
•
“Time Series Regression X: Generalized Least Squares and HAC Estimators” on page 5-282
3-89
3
Model Selection
Plot a Confidence Band Using HAC Estimates This example shows how to plot heteroscedastic-and-autocorrelation consistent (HAC) corrected confidence bands using Newey-West robust standard errors. One way to estimate the coefficients of a linear model is by OLS. However, time series models tend to have innovations that are autocorrelated and heteroscedastic (i.e., the errors are nonspherical). If a time series model has nonspherical errors, then usual formulae for standard errors of OLS coefficients are biased and inconsistent. Inference based on these inefficient standard errors tends to inflate the Type I error rate. One way to account for nonspherical errors is to use HAC standard errors. In particular, the Newey-West estimator of the OLS coefficient covariance is relatively robust against nonspherical errors. Load Data Load the Canadian electric power consumption data set from the World Bank. The response is Canada's electrical energy consumption in kWh (DataTimeTable.consump), the predictor is Canada's GDP in year 2000 USD (DataTimeTable.gdp), and the data set also contains the GDP deflator (DataTimeTable.gdp_deflator). Because DataTimeTable is a timetable, DataTimeTable.Time is the sample year. load Data_PowerConsumption
Define Model Model the behavior of the annual difference in electrical energy consumption with respect to real GDP as a linear model: consumpDiff t = β0 + β1rGDPt + εt . consumpDiff = diff(DataTimeTable.consump); consumpDiff = consumpDiff/1.0e+10; T = size(consumpDiff,1);
% Annual difference in consumption % Scale for numerical stability
rGDP = DataTimeTable.gdp./(DataTimeTable.gdp_deflator); % Deflate GDP rGDP = rGDP(2:end)/1.0e+10; Mdl = fitlm(rGDP,consumpDiff); coeff = Mdl.Coefficients(:,1); EstParamCov = Mdl.CoefficientCovariance; resid = Mdl.Residuals.Raw;
Plot Data Plot the difference in energy consumption, consumpDiff versus the real GDP, to check for possible heteroscedasticity. figure plot(rGDP,consumpDiff,".") title("Annual Difference in Energy Consumption vs real GDP - Canada") xlabel("Real GDP (year 2000 USD)") ylabel("Annual Difference in Energy Consumption (kWh)")
3-90
Plot a Confidence Band Using HAC Estimates
The figure indicates that heteroscedasticity might be present in the annual difference in energy consumption. As real GDP increases, the annual difference in energy consumption seems to be less variable. Plot Residuals Plot the residuals from Mdl against the fitted values and year to assess heteroscedasticity and autocorrelation. figure tiledlayout("flow") nexttile([1 2]) plot(Mdl.Fitted,resid,".") hold on plot([min(Mdl.Fitted) max(Mdl.Fitted)],[0 0],"k-") title("Residual Plots") xlabel("Fitted Consumption") ylabel("Residuals") axis tight hold off nexttile autocorr(resid) h1 = gca; h1.FontSize = 8; nexttile parcorr(resid)
3-91
3
Model Selection
h2 = gca; h2.FontSize = 8;
The residual plot reveals decreasing residual variance with increasing fitted consumption. The autocorrelation function shows that autocorrelation might be present in the first few lagged residuals. Test for Heteroscedasticity and Autocorrelation Test for conditional heteroscedasticity using Engle's ARCH test. Test for autocorrelation using the Ljung-Box Q test. Test for overall correlation using the Durbin-Watson test. [~,englePValue] = archtest(resid); englePValue englePValue = 0.1463 [~,lbqPValue] = lbqtest(resid,Lag=1:3); % Significance of first three lags lbqPValue lbqPValue = 1×3 0.0905
0.1966
0.0522
[dwPValue] = dwtest(Mdl); dwPValue dwPValue = 0.0024
3-92
Plot a Confidence Band Using HAC Estimates
The p value of Engle's ARCH test suggests significant conditional heteroscedasticity at 15% significance level. The p value for the Ljung-Box Q test suggests significant autocorrelation with the first and third lagged residuals at 10% significance level. The p value for the Durbin-Watson test suggests that there is strong evidence for overall residual autocorrelation. The results of the tests suggest that the standard linear model conditions of homoscedasticity and uncorrelated errors are violated, and inferences based on the OLS coefficient covariance matrix are suspect. One way to proceed with inference (such as constructing a confidence band) is to correct the OLS coefficient covariance matrix by estimating the Newey-West coefficient covariance. Estimate Newey-West Coefficient Covariance Correct the OLS coefficient covariance matrix by estimating the Newey-West coefficient covariance using hac. Compute the maximum lag to be weighted for the standard Newey-West estimate, maxLag (Newey and West, 1994). Use hac to estimate the standard Newey-West coefficient covariance. maxLag = floor(4*(T/100)^(2/9)); [NWEstParamCov,~,NWCoeff] = hac(Mdl,Type="HAC", ... Bandwidth=maxLag+1); Estimator type: HAC Estimation method: BT Bandwidth: 4.0000 Whitening order: 0 Effective sample size: 49 Small sample correction: on Coefficient Covariances: | Const x1 -------------------------Const | 0.3720 -0.2990 x1 | -0.2990 0.2454
The Newey-West standard error for the coefficient of rGDP, labeled x1 in the table, is less than the usual OLS standard error. This suggests that, in this data set, correcting for residual heteroscedasticity and autocorrelation increases the precision in measuring the linear effect of real GDP on energy consumption. Calculate Working-Hotelling Confidence Bands Compute the 95% Working-Hotelling confidence band for each covariance estimate using nlpredci (Kutner et al., 2005). rGDPdes = [ones(T,1) rGDP]; modelfun = @(b,x)(b(1)*x(:,1)+b(2)*x(:,2));
% Design matrix % Define the linear model
[beta,nlresid,~,EstParamCov] = nlinfit(rGDPdes, ... consumpDiff,modelfun,[1,1]); % Estimate the model [fity,fitcb] = nlpredci(modelfun,rGDPdes,beta,nlresid, ... Covar=EstParamCov,SimOpt="on"); % Margin of errors conbandnl = [fity - fitcb fity + fitcb]; % Confidence bands [fity,NWfitcb] = nlpredci(modelfun,rGDPdes, ... beta,nlresid,Covar=NWEstParamCov,SimOpt="on"); % Corrected margin of error NWconbandnl = [fity - NWfitcb fity + NWfitcb]; % Corrected confidence bands
3-93
3
Model Selection
Plot Working-Hotelling Confidence Bands Plot the Working-Hotelling confidence bands on the same axes twice: one plot displaying electrical energy consumption with respect to real GDP, and the other displaying the electrical energy consumption time series. figure l1 = plot(rGDP,consumpDiff,"k."); hold on l2 = plot(rGDP,fity,"b-",LineWidth=2); l3 = plot(rGDP,conbandnl,"r-"); l4 = plot(rGDP,NWconbandnl,"g--"); title("Data with 95% Working-Hotelling Conf. Bands") xlabel("real GDP (year 2000 USD)") ylabel("Consumption (kWh)") axis([0.7 1.4 -2 2.5]) legend([l1 l2 l3(1) l4(1)],"Data","Fitted","95% conf. band", ... "Newey-West 95% conf. band",Location="southeast") hold off
figure year = DataTimeTable.Time(2:end); l1 = plot(year,consumpDiff); hold on l2 = plot(year,fity,"k-",LineWidth=2); l3 = plot(year,conbandnl,"r-"); l4 = plot(year,NWconbandnl,"g--"); title("Consumption with 95% Working-Hotelling Conf. Bands")
3-94
Plot a Confidence Band Using HAC Estimates
xlabel("Year") ylabel("Consumption (kWh)") legend([l1 l2 l3(1) l4(1)],"Consumption","Fitted", ... "95% conf. band","Newey-West 95% conf. band", ... Location="southwest") hold off
The plots show that the Newey-West estimator accounts for the heteroscedasticity in that the confidence band is wide in areas of high volatility, and thin in areas of low volatility. The OLS coefficient covariance estimator ignores this pattern of volatility. References: 1
Kutner, M. H., C. J. Nachtsheim, J. Neter, and W. Li. Applied Linear Statistical Models. 5th Ed. New York: McGraw-Hill/Irwin, 2005.
2
Newey, W. K., and K. D. West. "A Simple Positive Semidefinite, Heteroskedasticity and Autocorrelation Consistent Covariance Matrix." Econometrica. Vol. 55, 1987, pp. 703-708.
3-95
3
Model Selection
3
Newey, W. K, and K. D. West. "Automatic Lag Selection in Covariance Matrix Estimation." The Review of Economic Studies. Vol. 61 No. 4, 1994, pp. 631-653.
See Also Related Examples •
“Change the Bandwidth of a HAC Estimator” on page 3-97
•
“Time Series Regression VI: Residual Diagnostics” on page 5-223
•
“Time Series Regression X: Generalized Least Squares and HAC Estimators” on page 5-282
More About •
3-96
“Nonspherical Models” on page 3-89
Change the Bandwidth of a HAC Estimator
Change the Bandwidth of a HAC Estimator This example shows how to change the bandwidth when estimating a HAC coefficient covariance, and compare estimates over varying bandwidths and kernels. How does the bandwidth affect HAC estimators? If you change it, are there large differences in the estimates, and, if so, are the differences practically significant? Explore bandwidth effects by estimating HAC coefficient covariances over a grid of bandwidths. Load and Plot Data Determine how the cost of living affects the behavior of nominal wages. Load the Nelson Plosser data set to explore their statistical relationship. load Data_NelsonPlosser DTT = rmmissing(DataTimeTable); cpi = DTT.CPI; % Cost of living wm = DTT.WN; % Nominal wages figure plot(DTT.CPI,DTT.WN,"o") hFit = lsline; % Regression line xlabel("Consumer Price Index (1967 = 100)") ylabel("Nominal Wages (current $)") legend(hFit,"OLS Line",Location="southeast") title("{\bf Cost of Living}") grid on
3-97
3
Model Selection
The plot suggests that a linear model might capture the relationship between the two variables. Define Model Model the behavior of nominal wages with respect to CPI as this linear model. wmt = β0 + β1cpit + εt Mdl = fitlm(DTT.CPI,DTT.WN) Mdl = Linear regression model: y ~ 1 + x1 Estimated Coefficients: Estimate ________ (Intercept) x1
-2541.5 88.041
SE ______
tStat _______
pValue _________
174.64 2.6784
-14.553 32.871
2.407e-21 4.507e-40
Number of observations: 62, Error degrees of freedom: 60 Root Mean Squared Error: 494 R-squared: 0.947, Adjusted R-Squared: 0.947 F-statistic vs. constant model: 1.08e+03, p-value = 4.51e-40
3-98
Change the Bandwidth of a HAC Estimator
coeffCPI = Mdl.Coefficients.Estimate(2); seCPI = Mdl.Coefficients.SE(2);
Plot Residuals Plot the residuals from Mdl against the fitted values to assess heteroscedasticity and autocorrelation. figure stem(Mdl.Residuals.Raw) xlabel("Observation") ylabel("Residual") title("{\bf Linear Model Residuals}") axis tight grid on
The residual plot shows varying levels of dispersion, which indicates heteroscedasticity. Neighboring residuals (with respect to observation) tend to have the same sign and magnitude, which indicates the presence of autocorrelation. Estimate HAC standard errors. Obtain HAC standard errors over varying bandwidths using the Bartlett (for the Newey-West estimate) and quadratic spectral kernels. numEstimates = 10; stdErrBT = zeros(numEstimates,1); stdErrQS = zeros(numEstimates,1);
3-99
3
Model Selection
for bw = 1:numEstimates [~,CoeffTbl] = hac(DTT,ResponseVariable="WN",PredictorVariables="CPI", ... Bandwidth=bw,Display="off"); % Newey-West [~,CoeffTblQS] = hac(DTT,ResponseVariable="WN",PredictorVariables="CPI", ... Weights="QS",Bandwidth=bw,Display="off"); % HAC using quadratic spectral kernel stdErrBT(bw) = CoeffTbl.SE(2); stdErrQS(bw) = CoeffTblQS.SE(2); end
You can increase numEstimates to discover how increasing bandwidths affect the HAC estimates. Plot Standard Errors Visually compare the Newey-West standard errors of β1 to those using the quadratic spectral kernel over the bandwidth grid. figure hold on hCoeff = plot(1:numEstimates,repmat(coeffCPI,numEstimates,1), ... LineWidth=2); hOLS = plot(1:numEstimates,repmat(coeffCPI+seCPI,numEstimates,1), ... "g--"); plot(1:numEstimates,repmat(coeffCPI-seCPI,numEstimates,1),"g--") hBT = plot(1:numEstimates,coeffCPI+stdErrBT,"ro--"); plot(1:numEstimates,coeffCPI-stdErrBT,"ro--") hQS = plot(1:numEstimates,coeffCPI+stdErrQS,"kp--", ... LineWidth=2); plot(1:numEstimates,coeffCPI-stdErrQS,"kp--",LineWidth=2) hold off xlabel("Bandwidth") ylabel("CPI Coefficient") legend([hCoeff,hOLS,hBT,hQS],["OLS estimate" "OLS SE" ... "Newey-West SE" "Quadratic spectral SE"],Location="east") title("{\bf CPI Coefficient Standard Errors}") grid on
3-100
Change the Bandwidth of a HAC Estimator
The plot suggests that, for this data set, accounting for heteroscedasticity and autocorrelation using either HAC estimate results in more conservative intervals than the usual OLS standard error. The precision of the HAC estimates decreases as the bandwidth increases along the defined grid. For this data set, the Newey-West estimates are slightly more precise than those using the quadratic spectral kernel. This might be because the latter captures heteroscedasticity and autocorrelation better than the former. References: 1
Andrews, D. W. K. "Heteroskedasticity and Autocorrelation Consistent Covariance Matrix Estimation." Econometrica. Vol. 59, 1991, pp. 817-858.
2
Newey, W. K., and K. D. West. "A Simple, Positive Semi-definite, Heteroskedasticity and Autocorrelation Consistent Covariance Matrix." Econometrica. Vol. 55, No. 3, 1987, pp. 703-708.\
3
Newey, W. K., and K. D. West. "Automatic Lag Selection in Covariance Matrix Estimation." The Review of Economic Studies. Vol. 61, No. 4, 1994, pp. 631-653.
See Also Related Examples •
“Plot a Confidence Band Using HAC Estimates” on page 3-90
•
“Time Series Regression VI: Residual Diagnostics” on page 5-223 3-101
3
Model Selection
•
“Time Series Regression X: Generalized Least Squares and HAC Estimators” on page 5-282
More About •
3-102
“Nonspherical Models” on page 3-89
Check Model Assumptions for Chow Test
Check Model Assumptions for Chow Test This example shows how to check the model assumptions for a Chow test. The model is of U.S. gross domestic product (GDP), with consumer price index (CPI) and paid compensation of employees (COE) as predictors. The forecast horizon is 2007 - 2009, just before and after the 2008 U.S. recession began. Load and Inspect Data Load the U.S. macroeconomic data set. load Data_USEconModel
The time series in the data set contain quarterly, macroeconomic measurements from 1947 to 2009. For more details, a list of variables, and descriptions, enter Description at the command line. Extract the predictors and the response from the table. Focus the sample on observations taken from 1960 - 2009. idx = year(DataTimeTable.Time) >= 1960; dates = DataTimeTable.Time(idx); y = DataTimeTable.GDP(idx); X = DataTimeTable{idx,["CPIAUCSL" "COE"]}; varNames = ["CPIAUCSL" "COE" "GDP"];
Identify forecast horizon indices. fHIdx = year(dates) >= 2007;
Plot all series individually. Identify the periods of recession. figure tiledlayout(2,2) nexttile plot(dates,y) title(varNames{end}); xlabel("Year"); axis tight; datetick; recessionplot; for j = 1:size(X,2) nexttile plot(dates,X(:,j)) title(varNames{j}); xlabel("Year"); axis tight; datetick; recessionplot; end
3-103
3
Model Selection
All variables appear to grow exponentially. Also, around the last recession, a decline appears. Suppose that a linear regression model of GDP onto CPI and COE is appropriate, and you want to test whether there is a structural change in the model in 2007. Check Chow Test Assumptions Chow tests rely on: • Independent, Gaussian-distributed innovations • Constancy of the innovations variance within subsamples • Constancy of the innovations across any structural breaks If a model violates these assumptions, then the Chow test result might not be correct, or the Chow test might lack power. Investigate whether the assumptions hold. If any do not, preprocess the data further. Fit the linear model to the entire series. Include an intercept. Mdl = fitlm(X,y);
Mdl is a LinearModel model object. Draw two histogram plots using the residuals: one with respect to fitted values in case order, and the other with respect to the previous residual. figure tiledlayout(2,1)
3-104
Check Model Assumptions for Chow Test
nexttile plotResiduals(Mdl,"lagged"); nexttile plotResiduals(Mdl,"caseorder");
Because the scatter plot of residual vs. lagged residual forms a trend, autocorrelation exists in the residuals. Also, residuals on the extremes seem to flare out, which suggests the presence of heteroscedasticity. Conduct Engle's ARCH test at 5% level of significance to assess whether the innovations have conditional heteroscedasticity with ARCH(1) effects. Supply the table of residals and specify the raw residuals. StatTbl = archtest(Mdl.Residuals,DataVariable="Raw") StatTbl=1×6 table h _____ Test 1
true
pValue ______ 0
stat ______
cValue ______
109.37
3.8415
Lags ____ 1
Alpha _____ 0.05
h = 1 suggests to reject the null hypothesis that the entire residual series has no conditional heteroscedasticity. Apply the log transformation to all series that appear to grow exponentially to reduce the effects of heteroscedasticity. 3-105
3
Model Selection
y = log(y); X = log(X);
To account for autocorrelation, create predictor variables for all exponential series by lagging them by one period. LagMat = lagmatrix([X y],1); X = [X(2:end,:) LagMat(2:end,:)]; % Concatenate data and remove first row fHIdx = fHIdx(2:end); y = y(2:end);
Based on the residual diagnostics, choose this linear model for GDP GDPt = β0 + β1CPIAUCSLt + β2COEt + β3CPIAUCSLt − 1 + β4COEt − 1 + β5GDPt − 1 + εt .
εt should be a Gaussian series of innovations with mean zero and constant variance σ2. Diagnose the residuals again. Mdl = fitlm(X,y); figure tiledlayout(2,1) nexttile plotResiduals(Mdl,"lagged"); nexttile plotResiduals(Mdl,"caseorder");
3-106
Check Model Assumptions for Chow Test
StatTbl = archtest(Mdl.Residuals,DataVariable="Raw") StatTbl=1×6 table h _____ Test 1
false
pValue _______
stat ______
cValue ______
0.28133
1.1607
3.8415
Lags ____ 1
Alpha _____ 0.05
SubMdl = {fitlm(X(~fHIdx,:),y(~fHIdx)) fitlm(X(fHIdx,:),y(fHIdx))}; subRes = {SubMdl{1}.Residuals.Raw SubMdl{2}.Residuals.Raw}; [hVT2,pValueVT2] = vartest2(subRes{1},subRes{2}) hVT2 = 0 pValueVT2 = 0.1645
The residual plots and tests suggest that the innovations are homoscedastic and uncorrelated. Conduct a Kolmogorov-Smirnov test to assess whether the innovations are Gaussian. [hKS,pValueKS] = kstest(Mdl.Residuals.Raw/std(Mdl.Residuals.Raw)) hKS = logical 0 pValueKS = 0.2347
hKS = 0 suggests to not reject the null hypothesis that the innovations are Gaussian. For the distributed lag model, the Chow test assumptions appear valid. Conduct Chow Test Treating 2007 and beyond as a post-recession regime, test whether the linear model is stable. Specify that the break point is the last quarter of 2006. Because the complementary subsample size is greater than the number of coefficients, conduct a break point test. bp = find(~fHIdx,1,'last'); chowtest(X,y,bp,'Display','summary'); RESULTS SUMMARY *************** Test 1 Sample size: 196 Breakpoint: 187 Test type: breakpoint Coefficients tested: All Statistic: 1.3741 Critical value: 2.1481 P value: 0.2272 Significance level: 0.0500 Decision: Fail to reject coefficient stability
3-107
3
Model Selection
The test fails to reject the stability of the linear model. Evidence is inefficient to infer a structural change between Q4-2006 and Q1-2007.
See Also chowtest | archtest | vartest2 | fitlm | LinearModel
Related Examples •
3-108
“Power of the Chow Test” on page 3-109
Power of the Chow Test
Power of the Chow Test This example shows how to estimate the power of a Chow test using a Monte Carlo simulation. Introduction Statistical power is the probability of rejecting the null hypothesis given that it is actually false. To estimate the power of a test: 1
Simulate many data sets from a model that typifies the alternative hypothesis.
2
Test each data set.
3
Estimate the power, which is the proportion of times the test rejects the null hypothesis.
The following can compromise the power of the Chow test: • Linear model assumption departures • Relatively large innovation variance • Using the forecast test when the sample size of the complementary subsample is greater than the number of coefficients in the test [42]. Departures from model assumptions allow for an examination of the factors that most affect the power of the Chow test. Consider the model y=
X1 0 beta1 + innov 0 X2 beta2
• innov is a vector of random Gaussian variates with mean zero and standard deviation sigma. • X1 and X2 are the sets of predictor data for initial and complementary subsamples, respectively. • beta1 and beta2 are the regression coefficient vectors for the initial and complementary subsamples, respectively. Simulate Predictor Data Specify four predictors, 50 observations, and a break point at period 44 for the simulated linear model. numPreds = 4; numObs = 50; bp = 44; rng(1); % For reproducibility
Form the predictor data by specifying means for the predictors, and then adding random, standard Gaussian noise to each of the means. mu = [0 1 2 3]; X = repmat(mu,numObs,1) + randn(numObs,numPreds);
To indicate an intercept, add a column of ones to the predictor data. 3-109
3
Model Selection
X = [ones(numObs,1) X]; X1 = X(1:bp,:); % Initial subsample predictors X2 = X(bp+1:end,:); % Complementary subsample predictors
Specify the true values of the regression coefficients. beta1 = [1 2 3 4 5]'; % Initial subsample coefficients
Estimate Power for Small and Large Jump Compare the power between the break point and forecast tests for jumps of different sizes small in the intercept and second regression coefficient. In this example, a small jump is a 10% increase in the current value, and a large jump is a 15% increase. Complementary subsample coefficients beta2Small = beta1 + [beta1(1)*0.1 0 beta1(3)*0.1 0 0 ]'; beta2Large = beta1 + [beta1(1)*0.15 0 beta1(3)*0.15 0 0 ]';
Simulate 1000 response paths of the linear model for each of the small and large coefficient jumps. Specify that sigma is 0.2. Choose to test the intercept and the second regression coefficient. M = 1000; sigma = 0.2; Coeffs = [true false true false false]; h1BP = nan(M,2); % Preallocation h1F = nan(M,2); for j = 1:M innovSmall = sigma*randn(numObs,1); innovLarge = sigma*randn(numObs,1); ySmall = [X1 zeros(bp,size(X2,2)); ... zeros(numObs - bp,size(X1,2)) X2]*[beta1; beta2Small] + innovSmall; yLarge = [X1 zeros(bp,size(X2,2)); ... zeros(numObs - bp,size(X1,2)) X2]*[beta1; beta2Large] + innovLarge; h1BP(j,1) = chowtest(X,ySmall,bp,'Intercept',false,'Coeffs',Coeffs,... 'Display','off')'; h1BP(j,2) = chowtest(X,yLarge,bp,'Intercept',false,'Coeffs',Coeffs,... 'Display','off')'; h1F(j,1) = chowtest(X,ySmall,bp,'Intercept',false,'Coeffs',Coeffs,... 'Test','forecast','Display','off')'; h1F(j,2) = chowtest(X,yLarge,bp,'Intercept',false,'Coeffs',Coeffs,... 'Test','forecast','Display','off')'; end
Estimate the power by computing the proportion of times chowtest correctly rejected the null hypothesis of coefficient stability. power1BP = mean(h1BP); power1F = mean(h1F); table(power1BP',power1F','RowNames',{'Small_Jump','Large_Jump'},... 'VariableNames',{'Breakpoint','Forecast'}) ans=2×2 table Breakpoint __________ Small_Jump Large_Jump
3-110
0.717 0.966
Forecast ________ 0.645 0.94
Power of the Chow Test
In this scenario, the Chow test can detect a change in the coefficient with more power when the jump is larger. The break point test has greater power to detect the jump than the forecast test. Estimate Power for Large Innovations Variance Simulate 1000 response paths of the linear model for a large coefficient jump. Specify that sigma is 0.4. Choose to test the intercept and the second regression coefficient. sigma = 0.4; h2BP = nan(M,1); h2F = nan(M,1); for j = 1:M innov = sigma*randn(numObs,1); y = [X1 zeros(bp,size(X2,2)); ... zeros(numObs - bp,size(X1,2)) X2]*[beta1; beta2Large] + innov; h2BP(j) = chowtest(X,y,bp,'Intercept',false,'Coeffs',Coeffs,... 'Display','off')'; h2F(j) = chowtest(X,y,bp,'Intercept',false,'Coeffs',Coeffs,... 'Test','forecast','Display','off')'; end power2BP = mean(h2BP); power2F = mean(h2F); table([power1BP(2); power2BP],[power1F(2); power2F],... 'RowNames',{'Small_sigma','Large_Sigma'},... 'VariableNames',{'Breakpoint','Forecast'}) ans=2×2 table Breakpoint __________ Small_sigma Large_Sigma
0.966 0.418
Forecast ________ 0.94 0.352
For larger innovation variance, both Chow tests have difficulty detecting the large structural breaks in the intercept and second regression coefficient.
See Also chowtest
Related Examples •
“Check Model Assumptions for Chow Test” on page 3-103
3-111
4 Econometric Modeler • “Analyze Time Series Data Using Econometric Modeler” on page 4-2 • “Specifying Univariate Lag Operator Polynomials Interactively” on page 4-44 • “Specifying Multivariate Lag Operator Polynomials and Coefficient Constraints Interactively” on page 4-50 • “Prepare Time Series Data for Econometric Modeler App” on page 4-59 • “Import Time Series Data into Econometric Modeler App” on page 4-62 • “Plot Time Series Data Using Econometric Modeler App” on page 4-66 • “Detect Serial Correlation Using Econometric Modeler App” on page 4-71 • “Detect ARCH Effects Using Econometric Modeler App” on page 4-77 • “Assess Stationarity of Time Series Using Econometric Modeler” on page 4-84 • “Assess Collinearity Among Multiple Series Using Econometric Modeler App” on page 4-94 • “Transform Time Series Using Econometric Modeler App” on page 4-97 • “Implement Box-Jenkins Model Selection and Estimation Using Econometric Modeler App” on page 4-112 • “Select ARCH Lags for GARCH Model Using Econometric Modeler App” on page 4-122 • “Estimate Multiplicative ARIMA Model Using Econometric Modeler App” on page 4-131 • “Perform ARIMA Model Residual Diagnostics Using Econometric Modeler App” on page 4-141 • “Specify t Innovation Distribution Using Econometric Modeler App” on page 4-150 • “Estimate Vector Autoregression Model Using Econometric Modeler” on page 4-155 • “Conduct Cointegration Test Using Econometric Modeler” on page 4-170 • “Estimate Vector Error-Correction Model Using Econometric Modeler” on page 4-180 • “Compare Predictive Performance After Creating Models Using Econometric Modeler” on page 4-193 • “Estimate ARIMAX Model Using Econometric Modeler App” on page 4-200 • “Estimate Regression Model with ARMA Errors Using Econometric Modeler App” on page 4-208 • “Compare Conditional Variance Model Fit Statistics Using Econometric Modeler App” on page 4-221 • “Perform GARCH Model Residual Diagnostics Using Econometric Modeler App” on page 4-230 • “Share Results of Econometric Modeler App Session” on page 4-237
4
Econometric Modeler
Analyze Time Series Data Using Econometric Modeler The Econometric Modeler app is an interactive tool for analyzing univariate or multivariate time series data. The app is well suited for visualizing and transforming data, performing statistical specification and model identification tests, fitting models to data, and iterating among these actions. When you are satisfied with a model, you can export it to the MATLAB Workspace to forecast future responses or for further analysis. You can also generate code or a report from a session. Start Econometric Modeler by entering econometricModeler at the MATLAB command line, or by clicking Econometric Modeler under Computational Finance in the apps gallery (Apps tab on the MATLAB Toolstrip). The following workflow describes how to find a model with the best in-sample fit to time series data using Econometric Modeler. The workflow is not a strict prescription—the steps you implement depend on your goals and the model type. You can easily skip steps and iterate several steps as needed. The app is well suited to the Box-Jenkins approach to time series model building [1]. 1
Prepare data for Econometric Modeler on page 4-3 — For a response variable, or select multiple response variables for a multivariate analysis, to analyze and from which to build a predictive model. Optionally, select explanatory variables to include in the model. Note You can import only one variable from the MATLAB Workspace into Econometric Modeler. Therefore, at the command line, you must synchronize and concatenate multiple series into one variable.
2
Import time series variables on page 4-4 — Import Data into Econometric Modeler from the MATLAB Workspace or a MAT-file. After importing data, you can adjust variable properties or the presence of variables.
3
Perform exploratory data analysis on page 4-6 — View the series in various ways, stabilize the series by transforming them, and detect time series properties by performing statistical tests. • Visualize time series data on page 4-6 — Supported plots include time series plots and correlograms (for example, the autocorrelation function (ACF)). • Perform specification and model identification hypothesis tests on page 4-9 — Test series for stationarity, heteroscedasticity, autocorrelation, and collinearity or cointegration among multiple series. For ARIMA and GARCH models, this step can include determining the appropriate number of lags to include in the model. Supported tests include the augmented Dickey-Fuller test, Engle's ARCH test, the Ljung-Box Q-test, Belsley collinearity diagnostics, and the Johansen cointegration test. • Transform time series on page 4-14 — Supported transformations include the log transformation and seasonal and nonseasonal differencing.
4-2
4
Fit candidate models to the data on page 4-15 — Choose model parametric forms for univariate or multivariate response series based on the exploratory data analysis or dictated by economic theory. Then, estimate the model. Supported univariate models include seasonal and nonseasonal conditional mean (for example, ARIMA), conditional variance (for example, GARCH), and multiple linear regression models (optionally containing ARMA errors). Supported multivariate models include vector autoregression (VAR) and vector error-correction (VEC) models.
5
Conduct goodness-of-fit checks on page 4-30 — Ensure that the model adequately describes the data by performing residual diagnostics.
Analyze Time Series Data Using Econometric Modeler
• Visualize the residuals to check whether they are centered on zero, normally distributed, homoscedastic, and serially uncorrelated. Supported plots include quantile-quantile and ACF plots. • Test the residuals for homoscedasticity and autocorrelation. Supported tests include the Ljung-Box Q-test and Engle's ARCH test on the squared residuals. 6
Find the model with the best in-sample fit on page 4-36 — Estimate multiple models within the same family, and then choose the model that yields the minimal fit statistic, for example, Akaike information criterion (AIC).
7
Export session results on page 4-38 — After you find a model or models that perform adequately, summarize the results of the session. The method you choose depends on your goals. Supported methods include: • Export variables on page 4-38 — Econometric Modeler exports selected variables to the MATLAB Workspace. If a session in the app does not complete your analysis goal, such as forecasting responses, then you can export variables (including estimated models) for further analysis at the command line. • Generate a function on page 4-39 — Econometric Modeler generates a MATLAB plain text or live function that returns a selected model given the imported data. This method helps you understand the command-line functions that the app uses to create predictive models. You can modify the generated function to accomplish your analysis goals. • Generate a report on page 4-40 — Econometric Modeler produces a document, such as, a PDF, describing your activities on selected variables or models. This method provides a clear and convenient summary of your analysis when you complete your goal in the app.
Prepare Data for Econometric Modeler App You can import only one variable from the MATLAB Workspace into Econometric Modeler. Therefore, before importing data, concatenate the response series and any predictor series into one variable. Econometric Modeler supports these variable data types. • MATLAB timetable — Variables must be double-precision numeric vectors. A best practice is to import your data in a timetable because Econometric Modeler: • Names variables by using the names stored in the VariableNames field of the Properties property. • Uses the time variable values as tick labels for any axis that represents time. Otherwise, tick labels representing time are indices. • Enables you to overlay recession bands on time series plots (see recessionplot) For more details on timetables, see “Create Timetables”. • MATLAB table — Variables must be double-precision numeric vectors. Variable names are the names in the VariableNames field of the Properties property. • Numeric vector or matrix — For a matrix, each column is a separate variable named variableNamej, where j is the corresponding column. Regardless of variable type, Econometric Modeler assumes that rows correspond to time points (observations).
4-3
4
Econometric Modeler
Import Time Series Variables The data set can exist in the MATLAB Workspace or in a MAT-file that you can access from your machine. • To import a data set from the Workspace, on the Econometric Modeler tab, in the Import section, click . In the Import Data dialog box, click the check box in the Import? column for the variable containing the data, and then click Import. All variables in the Workspace of the supported data type appear in the dialog box, but you can choose one only. • To import data from a MAT-file, in the Import section, click Import, then select Import From MAT-file. In the Select a MAT-file dialog box, browse to the folder containing the data set, then double-click the MAT-file. After you import data, Econometric Modeler performs all the following actions. • The name of each variable (column) in the data set appears in the Time Series pane. • The value of the variable selected in the Time Series pane appears in the Preview pane. • A time series plot including all variables appears in the Time Series Plot(VariableName) figure window, where VariableName is the name of one of the variables in the Time Series pane. You can interact with the variables in the Time Series pane in several ways. • To select a variable to perform a statistical test or create a plot, for example, click the variable in the Time Series pane. If you double-click the variable instead, then the app also plots it in a separate time series plot. • To open, delete, or export a variable, right-click it in the Time Series pane. Then, from the context menu, choose the desired action. • To select or operate on multiple time series simultaneously, press Ctrl and click each variable you want to use. Consider importing the data in the Data_USEconModel MAT-file. 1
At the command line, load the data into the Workspace. load Data_USEconModel
2
3
4-4
In Econometric Modeler, in the Import section of the Econometric Modeler tab, click Import Data dialog box appears.
. The
Data_USEconModel stores several variables. Data, DataTable, and DataTimeTable contain the same data, but DataTable is a table, and DataTimeTable is a timetable. Import DataTimeTable by selecting the corresponding Import? check box, then clicking Import.
Analyze Time Series Data Using Econometric Modeler
All variables in DataTimeTable appear in the Time Series pane. Suppose that you want to retain COE, FEDFUNDS, and GDP only. Select all other variables, right-click one of them, and select Delete.
After working in the app, you can import another data set. After you click Import, Econometric Modeler displays the following dialog box.
If you click OK, then Econometric Modeler deletes all variables from the Time Series and Models panes, and closes all documents in the right pane. 4-5
4
Econometric Modeler
Perform Exploratory Data Analysis An exploratory data analysis includes determining characteristics of your variables and relationships between them, with the formation of a predictive model in mind. For time series data, identify series that grow exponentially, contain trends, or are nonstationary, and then transform them appropriately. For ARIMA models, to identify the model form and significant lags in the serial correlation structure of the response series, use the Box-Jenkins methodology [1]. If you plan to create GARCH models, then assess whether the series contain volatility clustering and significant lags. For multiple regression models, identify collinear predictors and those predictors that are linearly related to the response. For multivariate models, in addition to univariate analyses, you can test whether series are cointegrated. For time series data analysis, an exploratory analysis usually includes iterating among visualizing the data, performing statistical specification and model identification tests, and transforming the data. Visualizing Time Series Data After you import a data set, Econometric Modeler selects all variables in the imported data and displays a time series plot of them in the right pane by default. For example, after you import DataTimeTable in the Data_USEconModel data set, the app displays this time series plot.
To create your own time series plot: 4-6
Analyze Time Series Data Using Econometric Modeler
1
In the Time Series pane, select the appropriate number of series for the plot.
2
Click the Plots tab in the toolstrip.
3
Click the button for the type of plot you want.
Econometric Modeler supports the following time series plots. Plot
Goals • Identify missing values or outliers.
Time series
or
• Identify series that grow exponentially or that contain a trend. • Identify nonstationary series. • Identify series that contain volatility clustering. • Compare two series with different scales in the same plot (Y-Y Axis). • Compare multiple series with similar scales in the same plot.
Autocorrelation function
• Identify series with serial correlation. • Determine whether an AR model is appropriate.
(ACF)
• Identify significant MA lags for model identification.
Partial ACF (PACF)
• Identify series with serial correlation. • Determine whether an MA model is appropriate. • Identify significant AR lags for model identification. • Inspect variable distributions. • Identify variables with linear relationships pairwise.
Correlations
You can interact with an existing plot by: • Right-clicking it • Using the plot buttons that appear when you pause on the plot • Using the options on the figure window Supported interactions vary by plot type. • Save a figure — Right-click the figure, then select Export. Save the figure that appears. • Add or remove time series in a plot — Right-click the figure, point to Show Time Series menu, then select the time series to add or remove. • Plot recession bands — Right-click a time series plot, then select Show Recessions. • Show grid lines — Pause on the figure, then click • Toggle legend — Pause on the figure, then click • Pan — Pause on the figure, then click Data”.
. .
. For more details on panning, see “Zoom, Pan, and Rotate
• Zoom — Pause on the figure. To zoom in, click “Zoom, Pan, and Rotate Data”.
. To zoom out, click
. For more details, see
• Restore view — To return the plot to its original view after you pan or zoom, pause on the figure, then click
. 4-7
4
Econometric Modeler
For serial correlation function plots, additional options exist in the ACF or PACF tabs. You can specify the: • Number of lags to display • Number of standard deviations for the confidence bands • MA or AR order in which the theoretical ACF or PACF, respectively, is effectively zero Econometric Modeler updates the plot in real time as you adjust the parameters. As you explore your data, plots and computation results accumulate in the right pane under tabs. You can customize the display of the documents in the right pane to, for example, view multiple plots simultaneously, by performing any of the following actions: • Orient the plot tabs by dragging them into different sections of the right pane. As you drag a plot, the app highlights possible sections in which to place it. To undo the last document or figure window positioning, pause on the dot located in the middle of the partition, then click when it appears. • Click the Document Actions button include:
on the upper-right corner of the document. Options
• Tile All — Choose a layout for multiple plots. • Tab Position — Select where to display the figure tab. Consider an ARIMA model for the effective federal funds rate (FEDFUNDS). To identify the model characteristics (for example, the number of AR or MA lags), plot the time series, ACF, and PACF sideby-side.
4-8
1
In the Time Series pane, double-click FEDFUNDS.
2
Add recession bands to the plot by right-clicking the plot in the Time Series Plot(FEDFUNDS) figure window, then selecting Show Recessions.
3
On the Plots tab, click ACF.
4
Click PACF.
5
Click the Time Series Plot(FEDFUNDS) figure window and drag it to the left side of the right pane. Click the PACF(FEDFUNDS) figure window and drag it to the bottom right of the pane.
Analyze Time Series Data Using Econometric Modeler
The ACF dies out slowly and the PACF cuts off after the first lag. The behavior of the ACF suggests that the time series must be transformed before you choose on the form of the ARIMA model. In the right pane, observe the dot in the middle of the horizontal partition between the correlograms (below the Lag x axis label of the ACF). To undo this correlogram positioning, that is, separate the correlograms by tabs, pause on the dot and click when it appears. Performing Specification and Model Identification Hypothesis Tests You can perform hypothesis tests to confirm time series properties that you obtain visually or test for properties that are difficult to see. Econometric Modeler enables you to run tests multiple times with parameter settings. Econometric Modeler supports these tests for univariate series. Test
Hypotheses
Augmented Dickey-
H0: Series has a unit root.
Fuller
H1: Series is stationary. For details on the supported parameters, see adftest.
Kwiatkowski, Phillips, Schmidt, Shin (KPSS)
H0: Series is trend stationary. H1: Series has a unit root. For details on the supported parameters, see kpsstest.
4-9
4
Econometric Modeler
Test
Hypotheses
Leybourne-McCabe
H0: Series is a trend stationary AR(p) process. H1: Series is an ARIMA(p,1,1) process. To specify p, adjust the Number of Lags parameter. For details on the supported parameters, see lmctest.
Phillips-Peron
H0: Series has a unit root. H1: Series is stationary. For details on the supported parameters, see pptest.
Variance ratio
H0: Series is a random walk. H1: Series is not a random walk. For details on the supported parameters, see vratiotest.
Engle's ARCH
H0: Series exhibits no conditional heteroscedasticity (ARCH effects). H1: Series is an ARCH(p) model, with p > 0. To specify p, adjust the Number of Lags parameter. For details on the supported parameters, see archtest.
Ljung-Box Q-test
H0: Series exhibits no autocorrelation in the first m lags, that is, corresponding coefficients are jointly zero. H1: Series has at least one nonzero autocorrelation coefficient ρj, j ∈ {1, …,m}. To specify m, adjust the Number of Lags parameter. For details on the supported parameters, see lbqtest.
Note Before conducting tests, Econometric Modeler removes leading and trailing missing values (NaN values) in the series. Engle's ARCH test does not support missing values within the series, that is, NaN values preceded and succeeded by observations. The stationarity test results suggest whether you should transform a series to stabilize it, and which transformation is appropriate. For ARIMA models, stationarity test results suggest whether to include degrees of integration. Engle's ARCH test results indicate whether the series exhibits volatility clustering and suggests lags to include in a GARCH model. Ljung-Box Q-test results suggest how many AR lags are required in an ARIMA model. To perform a univariate test in Econometric Modeler:
4-10
1
Select a variable in the Time Series pane.
2
On the Econometric Modeler tab, in the Tests section, click New Test.
3
In the test gallery, click the test you want to conduct. A new tab for the test type appears in the toolstrip, and a new document for the test results appears in the right pane.
4
On the test type tab, in the Parameters section, adjust parameters for the test. For example, consider performing an Engle's ARCH test. On the ARCH tab, in the Parameters section, select
Analyze Time Series Data Using Econometric Modeler
the number of lags in the test statistic using the Number of Lags spin box, or the significance level (that is, the value of α) using the Significance Level spin box.
5
On the test-type tab, in the Tests section, click Run Test. The test results, including whether to reject the null hypothesis, the p-value, and the parameter settings, appear in a new row in the Results table of the test results document. If the null hypothesis has been rejected, then the app highlights the row in yellow.
If you run multiple tests on a particular series, the results of each test appear as a new row in the Results table. To remove a row from the Results table, select the corresponding check box in the Select column, then click Clear Tests in the test-type tab. Note Multiple testing inflates the false discovery rate. One conservative way to maintain an overall false discovery rate of α is to apply the Bonferroni correction to the significance level of each test. That is, for a total of t tests, set Significance Level value to α/t. Econometric Modeler supports these tests and diagnostics for multiple series. Tests and Diagnostics
Description
Belsley Collinearity
For details on the supported parameters and results, see collintest.
Diagnostics Engle-Granger
H0: The series do not exhibit cointegration. H1: The series exhibit cointegration. For details on the supported parameters, see egcitest.
Johansen
For a specified cointegration rank r: • H0: The series exhibit at most rank r cointegration. • H1: The series exhibit cointegration with rank greater than r. For details on the supported parameters, see jcitest.
To diagnose multiple series in Econometric Modeler: 1
Select at least two variables in the Time Series pane.
2
On the Econometric Modeler tab, in the Tests section, click New Test.
3
In the tests gallery, select the diagnostics you want to run. A new tab for the diagnostic appears in the toolstrip, and a new document for the results appears in the right pane.
4
On the tab for the diagnostic, you can adjust parameters for the diagnostic in the appropriate section. For example, consider conducting an Engle-Granger cointegration test on the selected series. In the tests gallery, select Engle-Granger Test. On the EGCI tab, in the Parameters section, select the cointegration regression form by using the Cointegration Regression Form 4-11
4
Econometric Modeler
list and select the test to conduct on the regression residuals by using the Residual Regression Form list.
For example, consider a predictive model containing Canadian inflation and interest rates as predictor variables. Determine whether the variables are collinear. The Data_Canada data set contains the time series. 1
Import the DataTimeTable variable in the Data_Canada data set into Econometric Modeler (see “Import Time Series Variables” on page 4-4). The time series plot appears in the right pane.
All series appear to contain autocorrelation. Although you should remove autocorrelation from predictor variables before creating a predictive model, this example proceeds without removing autocorrelation.
4-12
Analyze Time Series Data Using Econometric Modeler
2
In the Tests section, click New Test. In the Collinearity section, click Belsley Collinearity Diagnostics. Econometric Modeler creates a new tab for the Belsley collinearity diagnostics in the toolstrip, and a it creates a new document for the results in the right pane. The results contains a table of the singular values, condition indices, and the variance-decomposition proportions for each series. Rows Econometric Modeler highlights in yellow have a condition index greater than the tolerance specified by the Condition Index parameter value (default is 30) in the Tolerances section of the Collinearity tab. The columns of the table labeled with series names form the matrix of variance-decomposition proportions. Those series with variance-decomposition proportion greater than the tolerance specified by the Variance-Decompoosition Proportion parameter value (default is 0.5) in the highlighted row exhibit multicollinearity.
Because their variance-decomposition proportions are above the tolerance (default tolerance is 0.5) for the condition index, the collinear predictors are INT_L, INT_M, and INT_S. 3
You can add or remove time series from the diagnostics. For example, remove the inflation rates from the diagnostics by performing the following procedure.
4-13
4
Econometric Modeler
a
In the test-results document, right-click the Results table or plot.
b
Point to Show Time Series. A list of all variables appears.
c
Remove the inflation rate series INF_C and INF_G from the diagnostics by deselecting the corresponding check boxes. As you deselect series, Econometric Modeler recomputes the results based on the selected series.
For more details on the Belsley collinearity diagnostics results and multicollinearity, see collintest and “Time Series Regression II: Collinearity and Estimator Variance” on page 5-183. Transforming Time Series The Box-Jenkins methodology [1] for ARIMA model selection assumes that the response series is stationary, and spurious regression models can result from a model containing nonstationary predictors and response variables (for more details, see “Time Series Regression IV: Spurious Regression” on page 5-200). To stabilize your series, Econometric Modeler supports these transformations in the Transforms section of the Econometric Modeler tab. Transformation
Use When Series ...
Notes
Log
Has an exponential trend or variance that grows with its levels
All values in the series must be positive.
Linear detrend
Has a linear deterministic trend that can be identified using least squares
When Econometric Modeler detrends the series, it ignores leading or trailing missing (NaN) values. If any missing values occur between observed values, then the app returns a vector of NaN values with length equal to the series.
First-order difference
4-14
Has a stochastic trend
Econometric Modeler prepends the differenced series with a NaN value. This action ensures that the differenced series has the same length and time base as the original series.
Analyze Time Series Data Using Econometric Modeler
Transformation
Use When Series ...
Notes
Seasonal difference
Has a seasonal, stochastic trend
You can specify the period in a season using the spin box. For example, 12 indicates a monthly seasonal transformation. Econometric Modeler prepends the differenced series with nan(period,1), where period is the specified period in a season. This action ensures that the differenced series has the same length and time base as the original series.
For more details, see “Data Transformations” on page 2-2. To transform a variable, select the variable in the Time Series pane, then click a transformation. After you transform a series, a new variable representing the transformed series appears in the Time Series pane. Also, Econometric Modeler plots and selects the new variable. To create the variable name, the app appends the transformation name to the end of the variable name. You can rename the transformed variable by clicking it twice in the Time Series to select the text of the variable name, and then entering the new name. You can select multiple series by pressing Ctrl and clicking each series, and then apply the same transformation to the selected series simultaneously. The app creates new variables for each series, appends the transformation name to the end of each transformed variable name, and plots the transformed variables in the same figure. For example, suppose that the GDP series in Data_USEconModel has an exponential trend and a stochastic trend. Stabilize the GDP by applying the log transformation and then applying the second difference. 1
Import the DataTimeTable variable in the Data_USEconModel data set into the Econometric Modeler (see “Import Time Series Variables” on page 4-4).
2
In the Time Series pane, select GDP.
3
On the Econometric Modeler tab, in the Transforms section, click Log. The app creates a variable named GDPLog, which appears in the Time Series pane, and displays a plot for the time series.
4
In the Transforms section, click Difference. The app creates a variable named GDPLogDiff and displays a plot for the time series.
5
In the Transforms section, click Difference. The app creates a variable called GDPLogDiffDiff and displays a plot for the time series.
GDPLogDiffDiff is the stabilized GDP.
Fitting Models to Data The results of an exploratory data analysis can suggest several candidate models. To choose a model, in the Time Series pane, select the time series for the response. On the Econometric Modeler tab, in the Models section, click a model or click one in the models gallery. Econometric Modeler allows you to select only those models that are appropriate for the number of the selected response series. After you select a model, you configure it for estimation. Econometric Modeler supports the following models. 4-15
4
Econometric Modeler
Model
Response
Conditional Univariate mean: ARMA/ ARIMA Models section
Type
Stationary autoregressive (AR) For details, see “What Are Autoregressive Models?” on page 721, arima, and estimate.
Univariate Stationary moving average (MA) For details, see “What Are Moving Average Models?” on page 729, arima, and estimate. Univariate Stationary ARMA For details, see “What Are Autoregressive Moving Average Models?” on page 7-35, arima, and estimate. Univariate Nonstationary, integrated ARMA (ARIMA) For details, see “What Are ARIMA Models?” on page 7-41, arima, and estimate. Univariate Seasonal (Multiplicative) ARIMA (SARIMA) For details, see “What Are Multiplicative ARIMA Models?” on page 7-49, arima, and estimate. Univariate ARIMA including exogenous predictors (ARIMAX) For details, see “What Are ARIMA Models That Include Exogenous Covariates?” on page 7-61, arima, and estimate. Univariate Seasonal ARIMAX For details, see arima and estimate. Conditional Univariate variance: GARCH Models section
Generalized autoregressive conditional heteroscedastic (GARCH) For details, see “GARCH Model” on page 8-3, garch, and estimate.
4-16
Analyze Time Series Data Using Econometric Modeler
Model
Response
Type
Univariate Exponential GARCH (EGARCH) For details, see “EGARCH Model” on page 8-3, egarch, and estimate. Univariate Glosten, Jagannathan, and Runkle (GJR) For details, see “GJR Model” on page 8-4, gjr, and estimate. Multiple linear regression: Regression Models section
Univariate Multiple linear regression For details, see “Time Series Regression I: Linear Models” on page 5-176, LinearModel, and fitlm. Univariate Regression model with ARMA errors For details, see “Regression Models with Time Series Errors” on page 5-5, regARIMA, and estimate.
Vector Autoregression Models: Multivariate Models section
Multivariate Stationary vector autoregression (VAR) For details, see “Vector Autoregression (VAR) Models” on page 9-3 and varm. Multivariate VAR including exogenous variable (VARX) For details, see “Vector Autoregression (VAR) Models” on page 9-3 and varm. Multivariate Vector error-correction (VEC) or cointegrated VAR For details, see vecm.
For univariate conditional mean model estimation, SARIMA and SARIMAX are the most flexible models. You can create any conditional mean model that excludes exogenous predictors by clicking SARIMA, or you can create any conditional mean model that includes at least one exogenous predictor by clicking SARIMAX. For multivariate model estimation, the model you choose depends on whether the selected time series are stationary or cointegrated. For stationary series, create a VAR model by clicking VAR. To include exogenous predictors, click VARX instead. For cointegrated series, click VEC.
4-17
4
Econometric Modeler
After you select a model, the app displays the Type Model Parameters dialog box, where Type is the model type. For example, this figure shows the SARIMAX Model Parameters dialog box.
Adjustable parameters in the Type Model Parameters window depend on Type. In general, adjustable parameters include: • Deterministic terms and linear regression coefficients corresponding to predictor variables (see “Adjusting Deterministic Terms and Regression Component Parameters” on page 4-19) • Time series component parameters for univariate models, which include seasonal and nonseasonal lags and degrees of integration (see “Adjusting Time Series Component Parameters for Univariate Models” on page 4-20) • Time series component parameters for multivariate models, which include AR lags and AR coefficient elements to specify equality constraints during estimation (see “Adjusting Time Series Component Parameters for Multivariate Models” on page 4-21) • For univariate models, the innovation distribution (see “Adjusting Innovation Distribution Parameters for Univariate Models” on page 4-21) As you adjust parameter values, the equation in the Model Equation section changes to match your specifications. Adjustable parameters correspond to input and name-value pair arguments described in corresponding model creation reference pages. For details, see the function reference page for a specific model. Regardless of the model you choose, all unspecified coefficients in the model are unknown and estimable, including the t-distribution degrees of freedom parameter (when you specify a t innovation distribution). Note Econometric Modeler does not support: • Optimization option adjustments for estimation. 4-18
Analyze Time Series Data Using Econometric Modeler
• Composite conditional mean and variance models. For details, see “Specify Conditional Mean and Variance Models” on page 7-75. • Equality constraints on univariate model parameters during estimation (except for holding some parameters fixed at zero during estimation). To adjust optimization options, estimate composite conditional mean and variance models, or apply equality constraints, use the MATLAB command line. Adjusting Deterministic Terms and Regression Component Parameters Supported deterministic terms depend on the selected model and include a model constant (offset or intercept) and linear time trend. To include a model constant (offset or intercept) term, select the Include Constant Term or Include Offset Term check box. Similar for a linear time trend, select the Include Trend check box. To remove a deterministic term (that is, constrain it to zero during estimation), clear the check box. The location and type of the check box in the Type Model Parameters dialog box depends on the model type. By default, Econometric Modeler includes a model constant in all model types except conditional variance models. To select predictors for the regression component, in the Predictors list, select the check box in the Include? column corresponding to the predictors you want to include in the model. By default, the app does not include a regression component in any model type. • If you select ARIMAX, SARIMAX, or RegARMA, then you must choose at least one predictor. • If you select MLR, then you can specify one of the following: • An MLR model when you choose at least one predictor • A constant mean model (intercept-only model) when you clear all check boxes in the Include? column and select the Include Intercept check box • An error-only model when you clear all check boxes in the Include? column and clear the Include Intercept check box Consider a linear regression model of GDP onto CPI and the unemployment rate. To specify the regression: 1
Import the DataTimeTable variable in the Data_USEconModel data set into the Econometric Modeler (see “Import Time Series Variables” on page 4-4).
2
In the Time Series pane, select the response variable GDP.
3
On the Econometric Modeler tab, in the Models section, click the arrow to display the models gallery.
4
In the models gallery, in the Regression Models section, click MLR.
5
In the MLR Model Parameters dialog box, in the Include? column, select the CPIAUCSL and UNRATE check boxes.
4-19
4
Econometric Modeler
6
Click the Estimate button.
Adjusting Time Series Component Parameters for Univariate Models In general for univariate models, time series component parameters contain lags to include in the seasonal and nonseasonal lag operator polynomials, and seasonal and nonseasonal degrees of integration. 4-20
Analyze Time Series Data Using Econometric Modeler
• For conditional mean models, you can specify seasonal and nonseasonal autoregressive lags, and seasonal and nonseasonal moving average lags. You can also adjust seasonal and nonseasonal degrees of integration. • For conditional variance models, you can specify ARCH and GARCH lags. EGARCH and GJR models also support leverage lags. • For regression models with ARMA errors, you can specify nonseasonal autoregressive and moving average lags. For models containing seasonal lags or degrees of seasonal or nonseasonal integration, use the command line instead. Econometric Modeler supports two options to adjust the parameters. The adjustment options are on separate tabs of the Type Model Parameters dialog box: the Lag Order and Lag Vector tabs. On the Lag Order tab, you can specify the orders of lag operator polynomials. This feature enables you to include all lags efficiently, from 1 through the specified order, in a lag operator polynomial. On the Lag Vector tab, you can specify the individual lags that comprise a lag operator polynomial. This feature is well suited for creating flexible models. For more details, see “Specifying Univariate Lag Operator Polynomials Interactively” on page 4-44. Adjusting Innovation Distribution Parameters for Univariate Models For univariate models, you can specify that the distribution of the innovations is Gaussian. For all models, except multiple linear regression models, you can specify the Student's t instead to address leptokurtic innovation distributions (for more details, see “Maximum Likelihood Estimation for Conditional Mean Models” on page 7-106, “Maximum Likelihood Estimation for Conditional Variance Models” on page 8-52, or “Maximum Likelihood Estimation of regARIMA Models” on page 5-74). If you specify the t distribution, then Econometric Modeler estimates its degrees of freedom parameter using maximum likelihood. By default, Econometric Modeler uses the Gaussian distribution for the innovations. To change the innovation distribution, in the Type Model Parameters dialog box, from the Innovation Distribution button, select a distribution in the list.
Adjusting Time Series Component Parameters for Multivariate Models Supported time series component parameters for multivariate models depend on the model type. All types enable you to include nonseasonal AR lag coefficients. However, VEC models additionally support the following parameters specifications: • The Johansen model form, which specifies which deterministic terms (overall or within the cointegrating relation) to include in the model. For details, see “Johansen Form” on page 12-2657. • Cointegration rank r • Adjustment speed matrix A • Cointegration matrix B This figure is an example of the Type Model Parameters dialog box.
4-21
4
Econometric Modeler
Like univariate models, Econometric Modeler supports adjusting the AR or short-run lag operator polynomial efficiently by specifying the lag order (Lag Order tab) or, for flexibility, by specifying individual lags (Lag Vector tab). Unlike univariate models, Econometric Modeler supports estimation equality constraints on individual entries of the AR or short-run lag coefficient matrix, which correspond to self- or cross-variable lag coefficients in the model. Coefficient constraints enable you to test economic scenarios. Econometric Modeler holds the specified values fixed during estimation. For VEC models, you can specify equality constraints on the entire matrix A or B, except for Johansen forms H* and H1* which support equality constraints only for A. To specify such equality constraints: 1
4-22
In the Type Model Parameters dialog box, select the lag to constrain by using the AR Coefficients (ϕ) (VAR or VARX) or Short-Run Coefficients (Φ) (for VEC) list.
Analyze Time Series Data Using Econometric Modeler
2
Perform either one of the following alternatives: • Click the elements of the matrices to constrain, and then enter the value. Econometric Modeler estimates all NaN entries. • Import a matrix. a
At the command line, create an appropriately sized matrix of constraints or mix of constraints and NaN values for AR or short-run lags.
b
In Econometric Modeler, in the Type Model Parameters dialog box, click Import.
c
In the dialog box, select the variable to import for the coefficient matrix.
For details, see “Specifying Multivariate Lag Operator Polynomials and Coefficient Constraints Interactively” on page 4-50. Estimating a Univariate Model Econometric Modeler treats all parameters in the model as unknown and estimable. After you specify a model, fit it to the data by clicking Estimate in the Type Model Parameters dialog box. Note • Econometric Modeler requires initial values for estimable parameters and presample observations to initialize the model for estimation. Econometric Modeler always chooses the default initial and presample values as described in the estimate reference page of the model you want to estimate. • If Econometric Modeler issues an error during estimation, then: • The specified model poorly describes the data. Adjust model parameters, then estimate the new model. • At the command line, adjust optimization options and estimate the model. For details, see “Optimization Settings for Conditional Mean Model Estimation” on page 7-113, “Optimization Settings for Conditional Variance Model Estimation” on page 8-58, or “Optimization Settings for regARIMA Model Estimation” on page 5-84.
After you estimate a model: • A new variable that describes the estimated model appears in the Models pane with the name Type_response. Type is the model type and response is the response variable to which Econometric Modeler fit the model, for example, ARIMA_FEDFUNDS. You operate on an estimated model in the Models pane by right-clicking it. In addition to the options available for time series variables (see “Import Time Series Variables” on page 4-4), the context menu includes the Modify option, which enables you to modify and re-estimate a model. For example, right-click a model and select Modify. Then, in the Type Model Parameters dialog box, adjust parameters and click Estimate. • The object display of the model appears in the Preview pane. • The Model Summary(Type_response) document summarizing the estimation results appears in the right pane. Results shown depend on the model type. For conditional mean and regression models, results include: 4-23
4
Econometric Modeler
• Model Fit — A time series plot of the response series and the fitted values y • Parameters — An estimation summary table containing parameter estimates, standard errors, and t statistics and p-values for testing the null hypothesis that the corresponding parameter is 0 • Residual Plot — A time series plot of the residuals • Goodness of Fit — Akaike information criterion (AIC) and Bayesian information criterion (BIC) model fit statistics For conditional variance models, the results also include an estimation summary table and goodness-of-fit statistics, but Econometric Modeler plots: • Conditional Variances — A time series plot of the inferred conditional variances σ 2 t •
Standardized Residuals — A time series plot of the standardized residuals
yt − c 2
σt
, where c is
the estimated offset You can interact with individual plots by pausing on one and selecting an interaction (see “Visualizing Time Series Data” on page 4-6). You can also interact with the summary by rightclicking the document. Options include: • Export — Place plot in a separate figure window. • Show Model — Display the summary of another estimated model by pointing to Show Model, then selecting a model in the list. • Show Recessions — Plot recession bands in time series plots. Consider a SARIMA(0,1,1)×(0,1,1)12 for the monthly international airline passenger numbers from 1949 to 1960 in the Data_Airline data set. To estimate this model using the Econometric Modeler: 1
Import the DataTimeTable variable in the Data_Airline data set into Econometric Modeler (see “Import Time Series Variables” on page 4-4).
2
On the Econometric Modeler tab, in the Models section, click the arrow > SARIMA.
3
In the SARIMA Model Parameters dialog box, on the Lag Order tab: • Nonseasonal section a
Set Degrees of Integration to 1.
b
Set Moving Average Order to 1.
c
Clear the Include Constant Term check box.
• Seasonal section
4-24
a
Set Period to 12 to indicate monthly data.
b
Set Moving Average Order to 1.
c
Select the Include Seasonal Difference check box.
Analyze Time Series Data Using Econometric Modeler
4
Click Estimate.
As a result: • A variable named SARIMA_PSSG appears in the Models pane. • The value of SARIMA_PSSG appears in the Preview pane.
4-25
4
Econometric Modeler
• An estimation summary appears in the new Model Summary(SARIMA_PSSG) document.
4-26
Analyze Time Series Data Using Econometric Modeler
Estimating a Multivariate Model Econometric Modeler treats all parameters in the model as unknown and estimable by default. However, unlike univariate models, Econometric Modeler supports equality constraints on some parameters for estimation. You can specify coefficient values in the app or import a lag coefficient matrix from the workspace. After you configure the model, fit it to the data by clicking Estimate in the Type Model Parameters dialog box. Note • To initialize the model for estimation, Econometric Modeler removes the first p observations from the response data to use as a presample, and then the function fits the model to the remaining observations. • If Econometric Modeler issues an error during estimation, the specified model poorly describes the data. Adjust model parameters, and then estimate the new model.
4-27
4
Econometric Modeler
After you estimate a model, Econometric Modeler shows results similar to that of univariate estimation (see “Estimating a Univariate Model” on page 4-23), and you can interact with an estimated multivariate model in the same ways as with univariate models. Notable differences include: • A new variable that describes the estimated model appears in the Models pane with the name Typej, where Type is the model type and j is estimated model j of that type, for example, VAR2 is the second estimated VAR model during the session. • The Model Summary(Type_response) document summarizing the estimation results appears in the right pane. However, the plots shown depend on the model type and the selected time series in the Time Series list at the top of the document. For VAR models, results include: • Model Fit — A time series plot of the selected time series and the corresponding fitted values y • Residual Plot — A time series plot of the residuals corresponding to the selected time series For VEC models, the results additionally include a time series plot of the cointegrating relation, which is invariant to the selected time series. Consider a 3-D VAR(4) model of quarterly measurements of the US gross domestic product (GDP), M1 money supply, and the 3-month T-bill rate from 1947 through 2009. The file Data_USEconModel.mat contains the series, among other economic measurements. To estimate this model using the Econometric Modeler:
4-28
1
Import the DataTimeTable variable in the Data_USEconModel data set into Econometric Modeler (see “Import Time Series Variables” on page 4-4).
2
On the Econometric Modeler tab, in the Time Series pane, click GDP, and then press Ctrl and click M1SL.
3
Because GDP and M1SL exhibit exponential growth, use their growth rates in the model. On the Econometric Modeler tab, in the Transforms section, click Log, and then click Difference. Change the name of the transformed variables to GDPRate and M1SLRate, respectively.
4
In the Time Series pane, click TB3MS, and then stabilize the series by clicking, in the Transforms section, Difference. Change the name of the transformed series to TB3MSRate.
5
With TB3MSRate selected, select the three rate series for the VAR model by pressing Ctrl and clicking GDPRate and M1SLRate.
6
On the Econometric Modeler tab, in the Models section, click VAR.
7
In the VAR Model Parameters dialog box, on the Lag Order tab, set Autoregressive Order to 4.
Analyze Time Series Data Using Econometric Modeler
8
Click Estimate.
As a result: • A variable named VAR appears in the Models pane. • The value of VAR appears in the Preview pane.
4-29
4
Econometric Modeler
• An estimation summary appears in the new Model Summary(VAR) document. The plots are relative to the GDPRate series. The indices of the parameter estimates in the Parameters section correspond to the order of the series in the Time Series list (for example AR{1}(2,3) is the lag 1 AR coefficient of the TB3MSRate series in the equation of the M1SLRate series.
Conducting Goodness-of-Fit Checks After you estimate a model, a good practice is to determine the adequacy of the fitted model (see “Goodness of Fit” on page 3-85). Econometric Modeler is well suited for visually assessing the insample fit (for all models except conditional variance models) and performing residual diagnostics. Residual diagnostics include evaluating the model assumptions and investigating whether you must respecify the model to address other properties of the data. Model assumptions to assess include checking whether the residuals are centered on zero, normally distributed, homoscedastic, and serially uncorrelated. If the residuals do not demonstrate all these properties, then you must determine the severity of the departure, whether to transform the data, and whether to specify a different model. For more details on residual diagnostics, see “Time Series Regression VI: Residual Diagnostics” on page 5-223 and “Residual Diagnostics” on page 3-86. To perform goodness-of-fit checks using Econometric Modeler, in the Models pane, select an estimated model. Then, complete the following steps: 4-30
Analyze Time Series Data Using Econometric Modeler
• To visually assess the in-sample fit for all models (except conditional variance models), inspect the Model Fit plot in the Model Summary document. For multivariate models, Econometric Modeler displays fitted values of one series. You can select a different series in the model to plot by clicking the series in the Time Series list. • To visually assess whether the residuals are centered on zero, autocorrelated, and heteroscedastic, inspect the Residual Plot in the Model Summary document. For multivariate models, Econometric Modeler displays residuals of one series. You can select a different residual series to plot by clicking the corresponding series in the Time Series list. • On the Econometric Modeler tab, in the Diagnostics section, click Residual Diagnostics. The diagnostics gallery provides these residual plots and tests. Method Residual histogram Residual quantile-quantile
Diagnostic Visually assess normality Visually assess normality and skewness
plot ACF Ljung-Box Q-test
Visually assess whether residuals are autocorrelated Test residuals for significant autocorrelation
ACF of squared residuals
Visually assess whether residuals have conditional heteroscedasticity
Engle's ARCH test
Test residuals for conditional heteroscedasticity (significant ARCH effects)
Alternatively, to plot a histogram, quantile-quantile plot, or ACF of the residuals of an estimated model: 1
Select a model in the Models pane.
2
Click the Plots tab.
3
In the Plots section, click the arrow and then click one of the plots in the Model Plots section of the gallery.
For multivariate models: • Econometric Modeler plots residual diagnostics for all model series within the same document. • Econometric Modeler runs residual diagnostic tests simultaneously for all series, but it shows results of each residual series separately. You can choose which results to show by clicking, in the test tab, the series in the Time Series list. Note Another important goodness-of-fit check is predictive-performance assessment. To assess the predictive performance of several models: 1
Fit a set of models to the data using Econometric Modeler.
2
Perform residual diagnostics on all models.
3
Choose a subset of models with desirable residual properties and minimal fit statistics (see “Finding Model with Best In-Sample Fit” on page 4-36). 4-31
4
Econometric Modeler
4
Export the chosen models to the MATLAB Workspace (see “Export Session Results” on page 438).
5
Perform a predictive performance assessment at the command line (see “Assess Predictive Performance” on page 3-88).
For an example, see “Compare Predictive Performance After Creating Models Using Econometric Modeler” on page 4-193. Consider performing goodness-of-fit checks on the estimated SARIMA(0,1,1)×(0,1,1)12 model for the airline counts data in “Estimating a Univariate Model” on page 4-23. 1
4-32
In the right pane, on the Model Summary(SARIMA_PSSG) document: a
Model Fit suggests that the model fits to the data fairly well.
b
Residual Plot suggests that the residuals have a mean of zero. However, the residuals appear heteroscedastic and serially correlated.
Analyze Time Series Data Using Econometric Modeler
2
On the Econometric Modeler tab, in the Diagnostics section, click Residual Diagnostics. In the diagnostics gallery: a
Click Residual Q-Q Plot. The right pane display a figure window named QQPlot(SARIMA_PSSG) containing a quantile-quantile plot of the residuals.
4-33
4
Econometric Modeler
The plot suggests that the residuals are approximately normal, but with slightly heavier tails. b
4-34
Click Autocorrelation Function. In the toolstrip, the ACF tab appears and contains plot options. The right pane displays a figure window named ACF(SARIMA_PSSG) containing the ACF of the residuals.
Analyze Time Series Data Using Econometric Modeler
Because almost all the sample autocorrelation values are below the confidence bounds, the residuals are likely not serially correlated. c
Click Engle's ARCH Test. On the ARCH tab, in the Tests section, click Run Test to run the test using default options. The right pane displays the ARCH(SARIMA_PSSG) document, which shows the test results in the Results table.
4-35
4
Econometric Modeler
The results suggest rejection of the null hypothesis that the residuals exhibit no ARCH effects at a 5% level of significance. You can try removing heteroscedasticity by applying the log transformation to the series.
Finding Model with Best In-Sample Fit Econometric Modeler enables you to fit multiple related models to a data set efficiently. After you estimate a model, you can estimate other models by iterating the methods in “Perform Exploratory Data Analysis” on page 4-6, “Fitting Models to Data” on page 4-15, and “Conducting Goodness-of-Fit Checks” on page 4-30. After each iteration, a new model variable appears in the Models pane. For models in the same parametric family that you fit to the same response series, you can determine the model with the best parsimonious, in-sample fit among the estimated models by comparing their fit statistics. From a subset of candidate models, to determine the model of best fit using Econometric Modeler: 1
In the Models pane, double-click an estimated model. In the right pane, estimation results of the model appear in the Model Summary(Model) document, where Model is the name of the selected model.
2
On the Model Summary(Model) document, in the Goodness of Fit table, choose a fit statistic (AIC or BIC) and record its value.
3
Iterate the previous steps for all candidate models.
4
Choose the model that yields the minimal fit statistic.
For more details on goodness-of-fit statistics, see “Information Criteria for Model Selection” on page 3-53. Consider finding the best-fitting SARIMA model, with a period of 12, for the log of the airline passenger counts in the Data_Airline data set. Fit a subset of SARIMA models, considering all combinations of models that include up to two seasonal and nonseasonal MA lags.
4-36
1
Import the DataTimeTable variable in the Data_Airline data set into Econometric Modeler (see “Import Time Series Variables” on page 4-4).
2
Apply the log transformation to PSSG (see “Transforming Time Series” on page 4-14).
3
Fit a SARIMA(0,1,q)×(0,1,q12)12 to PSSGLog, where all unknown orders are 0 (see “Estimating a Univariate Model” on page 4-23).
4
In the right pane, on the Model Summary(SARIMA_PSSGLog) document, in the Goodness of Fit table, record the AIC value.
Analyze Time Series Data Using Econometric Modeler
5
In the Models pane, select PSSGLog.
6
Iterate steps 4 and 5, but adjust q and q12 to cover the nine permutations of q ∈ {0,1,2} and q12 ∈ {0,1,2}. Econometric Modeler distinguishes subsequent models of the same type by appending consecutive digits to the end of the variable name.
The resulting AIC values are in this table. Model
Variable Name
AIC
SARIMA(0,1,0)× (0,1,0)12
SARIMA_PSSGLog1
-491.8042
SARIMA(0,1,0)× (0,1,1)12
SARIMA_PSSGLog2
-530.5327
SARIMA(0,1,0)× (0,1,2)12
SARIMA_PSSGLog3
-528.5330
SARIMA(0,1,1)× (0,1,0)12
SARIMA_PSSGLog4
-508.6853
SARIMA(0,1,1)× (0,1,1)12
SARIMA_PSSGLog5
-546.3970
SARIMA(0,1,1)× (0,1,2)12
SARIMA_PSSGLog6
-544.6444
SARIMA(0,1,2)× (0,1,0)12
SARIMA_PSSGLog7
-506.8027
SARIMA(0,1,2)× (0,1,1)12
SARIMA_PSSGLog8
-544.4789
SARIMA(0,1,2)× (0,1,2)12
SARIMA_PSSGLog9
-542.7171
Because it yields the minimal AIC, the SARIMA(0,1,1)×(0,1,1)12 model is the model with the best parsimonious, in-sample fit. 4-37
4
Econometric Modeler
Export Session Results Econometric Modeler offers several options for you to share your session results. The option you choose depends on your analysis goals. The options for sharing your results are in the Export section of the Econometric Modeler tab. This table describes the available options. Option Export Variables
Description Export time series and model variables to the MATLAB Workspace. Choose this option to perform further analysis at the MATLAB command line. For example, you can generate forecasts from an estimated model or check the predictive performance of several models.
Generate Function
Generate a MATLAB function to use outside the app. The function accepts the data loaded into the app as input, and outputs a model estimated in the app session. Choose this option to: • Understand the functions used by Econometric Modeler to create and estimate the model. • Modify the generated function in the MATLAB Editor for further use.
Generate Live Function
Generate a MATLAB live function to use outside the app. The function accepts the data loaded into the app as input, and outputs a model estimated in the app session. Choose this option to: • Understand the functions used by Econometric Modeler to create and estimate the model. • Modify the generated function in the Live Editor for further use.
Generate Report
Generate a report that summarizes the session. Choose this option when you achieve your analysis goals in Econometric Modeler, and you want to share a summary of the results.
Exporting Variables To export time series and estimated model variables from the Time Series or Models pane to the MATLAB Workspace: 1
2
4-38
On the Econometric Modeler tab, in the Export section, click Variables.
or Export > Export
In the Export Variables dialog box, all time series variables appear in the left pane and all model variables appear in the right pane. Choose time series and model variables to export by selecting the corresponding check boxes in the Select column. The app selects the check box of all time series or model variables that are selected in the Time Series and Models panes. Clear any check boxes for variables to you do not want to export. For example, this figure shows how to select the PSSGLog time series and the SARIMA_PSSGLog SARIMA model.
Analyze Time Series Data Using Econometric Modeler
3
Click Export.
The selected variables appear in the MATLAB Workspace. Time series variables are double-precision column vectors. Estimated models are objects of type depending on the model (for example, an exported ARIMA model is an arima object). Alternatively, you can export variables by selecting at least one variable, right-clicking a selected variable, and selecting Export. Generating a Function The app can generate a plain text function or a live function. The main difference between the two functions is the editor used to modify the generated function: you edit plain text functions in the MATLAB editor and live functions in the Live Editor. For more details on the differences between the two function types, see “What Is a Live Script or Function?”. Regardless of the function type you choose, the generated function accepts the data loaded into the app as input, and outputs a model estimated in the app session. To export a MATLAB function or live function that creates a model estimated in an app session: 1
In the Models pane, select an estimated model.
2
On the Econometric Modeler tab, in the Export section, click Export. In the Export menu, choose Generate Function or Generate Live Function.
The MATLAB Editor or Live Editor displays an untitled, unsaved function containing the code that estimates the model. • By default, the function name is modelTimeSeries. • The function accepts the originally imported data set as input. • Before the function estimates the model, it extracts the variables from the input data set used in estimation, and applies the same transformations to the variables that you applied in Econometric Modeler. • The function returns the selected estimated model.
4-39
4
Econometric Modeler
Consider generating a live function that returns SARIMA_PSSGLog, the SARIMA(0,1,1)×(0,1,1)12 model fit to the log of airline passenger data (see “Estimating a Univariate Model” on page 4-23). This figure shows the generated live function.
Generating a Report Econometric Modeler can produce a report describing your activities on selected time series and model variables. The app organizes the report into chapters corresponding to selected time series and model variables. Chapters describe session activities that you perform on the corresponding variable. Chapters on time series variables describe transformations, plots, and tests that you perform on the selected variable in the session. Estimated model chapters contain an estimation summary, that is, elements of the Model Summary document (see “Estimating a Univariate Model” on page 4-23), and residual diagnostics plots and tests. You can export the report as one of the following document types: • Hypertext Markup Language (HTML) • Microsoft® Word XML Format Document (DOCX) • Portable Document Format (PDF) To export a report: 1
4-40
On the Econometric Modeler tab, in the Export section, click Export > Generate Report.
Analyze Time Series Data Using Econometric Modeler
2
In the Select Variables for Report dialog box, all time series variables in the Time Series pane appear in the left pane and all model variables in the Models pane appear in the right pane. Choose variables to include the report by selecting their check boxes in the Select column.
3
Select a document type by clicking Report Format and selecting the format you want.
4
Click OK.
5
In the Select File to Write window: a
Browse to the folder in which you want to save the report.
b
In the File name box, type a name for the report.
c
Click Save.
Consider generating an HTML report for the analysis of the airline passenger data (see “Conducting Goodness-of-Fit Checks” on page 4-30). This figure shows how to select all variables and the HTML format.
This figure shows a sample of the generated report.
4-41
4
Econometric Modeler
4-42
Analyze Time Series Data Using Econometric Modeler
References [1] Box, George E. P., Gwilym M. Jenkins, and Gregory C. Reinsel. Time Series Analysis: Forecasting and Control. 3rd ed. Englewood Cliffs, NJ: Prentice Hall, 1994.
See Also Apps Econometric Modeler
More About •
“Prepare Time Series Data for Econometric Modeler App” on page 4-59
•
“Import Time Series Data into Econometric Modeler App” on page 4-62
•
“Plot Time Series Data Using Econometric Modeler App” on page 4-66
•
“Detect Serial Correlation Using Econometric Modeler App” on page 4-71
•
“Detect ARCH Effects Using Econometric Modeler App” on page 4-77
•
“Assess Stationarity of Time Series Using Econometric Modeler” on page 4-84
•
“Assess Collinearity Among Multiple Series Using Econometric Modeler App” on page 4-94
•
“Conduct Cointegration Test Using Econometric Modeler” on page 4-170
•
“Transform Time Series Using Econometric Modeler App” on page 4-97
•
“Implement Box-Jenkins Model Selection and Estimation Using Econometric Modeler App” on page 4-112
•
“Estimate Multiplicative ARIMA Model Using Econometric Modeler App” on page 4-131
•
“Specify t Innovation Distribution Using Econometric Modeler App” on page 4-150
•
“Perform ARIMA Model Residual Diagnostics Using Econometric Modeler App” on page 4-141
•
“Estimate ARIMAX Model Using Econometric Modeler App” on page 4-200
•
“Estimate Vector Autoregression Model Using Econometric Modeler” on page 4-155
•
“Estimate Vector Error-Correction Model Using Econometric Modeler” on page 4-180
•
“Compare Predictive Performance After Creating Models Using Econometric Modeler” on page 4-193
•
“Estimate Regression Model with ARMA Errors Using Econometric Modeler App” on page 4-208
•
“Select ARCH Lags for GARCH Model Using Econometric Modeler App” on page 4-122
•
“Compare Conditional Variance Model Fit Statistics Using Econometric Modeler App” on page 4221
•
“Share Results of Econometric Modeler App Session” on page 4-237
•
Creating ARIMA Models Using Econometric Modeler App
4-43
4
Econometric Modeler
Specifying Univariate Lag Operator Polynomials Interactively Consider building a predictive, univariate time series model (conditional mean, conditional variance, or regression model with ARMA errors) by using the Econometric Modeler app. After you choose candidate models for estimation (see Perform Exploratory Data Analysis on page 4-6), you can specify the model structure of each. To do so, on the Econometric Modeler tab, in the Models section, click a model or display the gallery of supported models and click the model you want.
After you select a time series model, the Type Model Parameters dialog box appears, where Type is the model type. For example, if you select SARIMAX, then the SARIMAX Model Parameters dialog box appears.
4-44
Specifying Univariate Lag Operator Polynomials Interactively
• For all dynamic models, Econometric Modeler supports two options to specify the lag operator polynomials. The adjustment options are on separate tabs: the Lag Order and Lag Vector tab. The Lag Order tab options offer a straight forward way to include consecutive lags from lag 1 and degrees of integration (see “Specify Lag Structure Using Lag Order Tab” on page 4-45). The Lag Vector tab options allow you to create flexible models (see “Specify Lag Structure Using Lag Vector Tab” on page 4-47). • The Type Model Parameters dialog box contains a Nonseasonal or Seasonal section. The Seasonal section is absent in strictly nonseasonal model dialog boxes. To specify the nonseasonal lag operator polynomial structure, use the parameters in the Nonseasonal section. To adjust the seasonal lag operator polynomial structure, including seasonality, use the parameters in the Seasonal section. • To specify the degrees of nonseasonal integration, in the Nonseasonal section, in the Degree of Integration box, type the degree, or click the appropriate arrow on
.
• To specify the seasonal periodicity for seasonal models, in the Seasonal section, in the Period box, type the periodicity, or click the appropriate arrow on . When the periodicity is greater than 0, you can specify the seasonal autoregressive or moving average polynomial order similarly, and you can specify one degree of seasonal integration by clicking the Include Seasonal Difference check box. • For verification, the model form appears in the Model Equation section. The model form updates to your specifications in real time.
Specify Lag Structure Using Lag Order Tab On the Lag Order tab, in the Nonseasonal section, you can specify the orders of each lag operator polynomial in the nonseasonal component. In the appropriate lag polynomial order box (for example, the Autoregressive Order box), type the nonnegative integer order or click the appropriate arrow 4-45
4
Econometric Modeler
on . The app includes all consecutive lags from 1 through L in the polynomial, where L is the specified order. For seasonal models, on the Lag Order tab, in the Seasonal section: 1
Specify the period in the season by entering the nonnegative integer period in the Period box or by clicking
2
.
Specify the seasonal lag operator polynomial order. In the appropriate lag polynomial order box (for example, the Autoregressive Order box), type the nonnegative integer order ignoring seasonality or click . The lag operator exponents in the resulting polynomial are multiples of the specified period.
For example, if Period is 12 and Autoregressive Order in the Seasonal section is 3, then the seasonal autoregressive polynomial is 1 − Φ4L4 − Φ8L8 − Φ12L12 . To specify seasonal integration, select the Include Seasonal Difference check box. A seasonal difference polynomial appears in the Model Equation section, and its lag operator exponent is equal to the specified period. Consider a SARIMA(0,1,1)×(0,1,2)4 model, a seasonal multiplicative ARIMA model with four periods in a season. To specify this model using the parameters in the Lag Order tab: 1
Select a time series variable in the Time Series pane.
2
On the Econometric Modeler tab, in the Models section, click the arrow > SARIMA.
3
In the SARIMA Model Parameters dialog box, on the Lag Order tab, enter these values for the corresponding parameters. • In the Nonseasonal section, in the Degree of Integration box, type 1. • In the Nonseasonal section, in the Moving Average Order box, type 1. • In the Seasonal section, in the Period box, type 4. This value indicates a quarterly season. • In the Seasonal section, in the Moving Average Order box, type 2. This action includes seasonal MA lags 4 and 8 in the equation. • In the Seasonal section, select the Include Seasonal Difference check box.
4-46
Specifying Univariate Lag Operator Polynomials Interactively
Specify Lag Structure Using Lag Vector Tab On the Lag Vector tab, you specify the lags in the corresponding seasonal or nonseasonal lag operator polynomial. This figure shows the Lag Vector tab in the SARIMA Model Parameters dialog box.
4-47
4
Econometric Modeler
To specify the lags that comprise each lag operator polynomial, type a list of nonnegative, unique integers in the corresponding box. Separate values by commas or spaces, or use the colon operator (for example, 4:4:12). Specify the seasonal-difference degree by typing a nonnegative integer in the Seasonality box or by clicking
.
Consider a SARIMA(0,1,1)×(0,1,2)4 model, a seasonal multiplicative ARIMA model with four periods in a season. To specify this model using the parameters in the Lag Vector tab: 1
Select a time series variable in the Time Series pane.
2
On the Econometric Modeler tab, in the Models section, click the arrow > SARIMA.
3
In the SARIMA Model Parameters dialog box, click the Lag Vector tab, then enter these values for the corresponding parameters. • In the Nonseasonal section, in the Degree of Integration box, type 1. • In the Nonseasonal section, in the Moving Average Lags box, type 1. • In the Seasonal section, in the Seasonality box, type 4. Therefore, a seasonal-difference polynomial of degree 4 appears in the equation in the Model Equation section. • In the Seasonal section, in the Moving Average Lags box, type 4 8. This action includes seasonal MA lags 4 and 8 in the equation.
4-48
Specifying Univariate Lag Operator Polynomials Interactively
See Also Apps Econometric Modeler Objects arima | regARIMA | garch | gjr | egarch
More About •
“Analyze Time Series Data Using Econometric Modeler” on page 4-2
•
“Implement Box-Jenkins Model Selection and Estimation Using Econometric Modeler App” on page 4-112
•
“Select ARCH Lags for GARCH Model Using Econometric Modeler App” on page 4-122
4-49
4
Econometric Modeler
Specifying Multivariate Lag Operator Polynomials and Coefficient Constraints Interactively Consider building a predictive, multivariate time series model (vector autoregression (VAR) or vector error-correction (VEC)) by using the Econometric Modeler app. After you choose candidate models for estimation (see Perform Exploratory Data Analysis on page 4-6), you can specify the model structure of each. To do so, on the Econometric Modeler tab, in the Time Series pane, select all series in the model, and then, in the Models section, click the model you want.
After you select a time series model, the Type Model Parameters dialog box appears, where Type is the model type. For example, if you select VAR, then the VAR Model Parameters dialog box appears.
• Econometric Modeler supports two options to specify the autoregressive (AR) or short-run lag operator polynomials. The adjustment options are on separate tabs: the Lag Order and Lag Vector tab. The Lag Order tab options offer a straight forward way to include consecutive lags from lag 1 (see “Specify Lag Structure Using Lag Order Tab” on page 4-51). The Lag Vector tab
4-50
Specifying Multivariate Lag Operator Polynomials and Coefficient Constraints Interactively
options allow you to create flexible models (see “Specify Lag Structure Using Lag Vector Tab” on page 4-52). • Regardless of which lag adjustment option you choose, you can specify equality constraints for estimation on individual lag coefficients (entries) within the AR (for VAR models) or short-run (for VEC models) matrix. Similarly, for VEC models, you can specify equality constraints on entire adjustment-speed and cointegration matrices, except for Johansen forms H* and H1* which support equality constraints only for the entire adjustment speed matrix. Econometric modeler enables you to enter equality constraints in the matrices provided in the dialog box or import entire matrices containing constraints from the workspace. For more details, see “Specify Coefficient Matrix Equality Constraints for Estimation” on page 4-54. • For verification, the model form appears in the Model Equation section. The model form updates to your specifications in real time.
Specify Lag Structure Using Lag Order Tab On the Lag Order tab, you can specify the orders of the lag operator polynomial by using the parameter appropriate for the model. • For VAR or VARX models, use the Autoregressive Order box to specify the order of the autoregressive polynomial. • For VEC models, use the Number of Lags box to specify the order of the short-run polynomial. Type a nonnegative integer or click the appropriate arrow on . The app includes all consecutive lags from 1 through L in the polynomial, where L is the specified order. When you specify an order, you can verify the lag operator polynomial in the Model Equation section. Also, Econometric Modeler includes all consecutive lags through L in the AR Coefficients (ϕ) (for VAR models) or Short-Run Coefficients (Φ) (for VEC models) list. The matrix below the list is the coefficient matrix of the selected lag, which you can use to specify equality constraints (see “Specify Coefficient Matrix Equality Constraints for Estimation” on page 4-54). Consider a 3-D VAR(4) model. To specify this model using the parameters in the Lag Order tab: 1
Select the three time series of the model in the Time Series pane.
2
On the Econometric Modeler tab, in the Models section, click VAR.
3
In the VAR Model Parameters dialog box, on the Lag Order tab, in the Autoregressive Order box, type 4.
4-51
4
Econometric Modeler
4
Verify that the model in the Model Equation and the available lags in the AR Coefficients (ϕ) list contain lags 1 through 4.
Specify Lag Structure Using Lag Vector Tab On the Lag Vector tab, you specify a list of individual lags in the corresponding lag operator polynomial. This figure shows the Lag Vector tab in the VAR Model Parameters dialog box.
4-52
Specifying Multivariate Lag Operator Polynomials and Coefficient Constraints Interactively
To specify the lags that comprise each lag operator polynomial, type a list of nonnegative, unique integers in the corresponding box. Separate values by commas or spaces, or use the colon operator (for example, 4:4:12). Consider a VAR(8) model that includes only lags 1, 4, and 8. To specify this model using the parameters in the Lag Vector tab: 1
Select the three time series of the model in the Time Series pane.
2
On the Econometric Modeler tab, in the Models section, click VAR.
3
In the VAR Model Parameters dialog box, on the Lag Vector tab, in the Autoregressive Lags box, type 1 4 8.
4-53
4
Econometric Modeler
Specify Coefficient Matrix Equality Constraints for Estimation Econometric Modeler supports equality constraints when estimating the following parameters: • For VAR and VARX models, you can specify equality constraints on individual entries of the AR coefficient matrices. In other words, you can specify known values for some matrix elements and estimate the others. • For VEC models, you can specify equality constraints on: • Individual entries of the short-run coefficient matrices, like the AR coefficients of VAR models. • For all Johansen forms, the entire adjustment-speed matrix A. That is, Econometric Modeler does not support constraints on individual matrix elements. • For all Johansen forms except for H* and H1*, the entire cointegration matrix B. To specify equality constraints, enter the constraints in the appropriate matrix provided in the Type Model Parameters dialog box below the name of the matrix. By default, Econometric Modeler populates all matrices with NaN values, which indicate unknown, estimable parameters. For all matrices, each row corresponds to a response equation in the model. For AR and short-run coefficient matrices, each column is the lag coefficient of the specified variable within the equation. For example, in the figure, entry (2,3) is the lag 1 coefficient of TB3MSRate in the equation of M1SLRate.
4-54
Specifying Multivariate Lag Operator Polynomials and Coefficient Constraints Interactively
For VEC models, adjustment-speed A and cointegration matrices B have the r linearly independent columns, where r is the rank of the cointegrating relation specified in the Rank box. Each column corresponds to a cointegrating relation.
Econometric Modeler enables you to specify constraints in two ways: • You can click the elements you want to constrain in the matrices provided, and enter the constraint. • You can use the Import button above the matrix to import an appropriately sized and configured matrix of constraints from the workspace. For example, consider a VEC(1) model for short-, medium-, and long-term annual Canadian interest rate series from 1954 through 1994. Suppose economic theory suggests the following characteristics of the model: • Each series is a unit root process with a cointegration rank of 2. • The deterministic terms in the model are a vector of intercepts in the cointegrating relations and a deterministic linear trend vector in the levels of the data (H1 Johansen form). 4-55
4
Econometric Modeler
• The linear effect, on the current long-run interest rate, of the short-run interest rate in the previous year is 0.5. • The cointegrating relations that produce stationary processes are 2.1INT_L - 2.1INT_M + 0.1INT_S and 1.7INT_L - 3.7INT_M + 1.8INT_S. To specify this model and its characteristics, follow this procedure. 1
Load the Canadian inflation and interest rate data Data_Canada.mat. load Data_Canada
2
At the command line, open the Econometric Modeler app. econometricModeler
Alternatively, open the app from the apps gallery (see Econometric Modeler). 3
Import the DataTimeTable variable in the Data_Canada data set into Econometric Modeler (see “Import Time Series Variables” on page 4-4).
4
Select the series INT_L, INT_M, and INT_S in the Time Series pane.
5
On the Econometric Modeler tab, in the Models section, click VEC.
6
In the VEC Model Parameters dialog box, in the Rank box, type 2.
7
In the matrix below the Short-Run Coefficients (Φ) list, in element (1,3), type 0.5.
8
Below Cointegration Matrix (B), observe that the order of the variables in the matrices is INT_L, INT_M, and INT_S. At the command line, create the cointegration matrix with the corresponding variable order. B = [2.1 1.7; -2.1 -3.7; 0.1 1.7];
In the VEC Model Parameters dialog box, at Cointegration Matrix (B), click Import. In the dialog box, select B. The figure shows the parameter configurations in the VEC Model Parameters dialog box.
4-56
Specifying Multivariate Lag Operator Polynomials and Coefficient Constraints Interactively
See Also Apps Econometric Modeler Objects vecm | varm
More About •
“Analyze Time Series Data Using Econometric Modeler” on page 4-2
•
“Implement Box-Jenkins Model Selection and Estimation Using Econometric Modeler App” on page 4-112 4-57
4
Econometric Modeler
4-58
•
“Conduct Cointegration Test Using Econometric Modeler” on page 4-170
•
“Estimate Vector Autoregression Model Using Econometric Modeler” on page 4-155
•
“Estimate Vector Error-Correction Model Using Econometric Modeler” on page 4-180
Prepare Time Series Data for Econometric Modeler App
Prepare Time Series Data for Econometric Modeler App These examples show how to prepare time series data at the MATLAB command line for use in the Econometric Modeler app. You can import only one variable into Econometric Modeler. The variable can exist in the MATLAB Workspace or a MAT-file. A row in a MATLAB timetable contains simultaneously sampled observations. When you import a timetable, the app plots associated times on the x axis of time series plots and enables you to overlay recession bands on the plots. Therefore, these examples show how to create timetables for univariate and multivariate time series data. For other supported data types and variable orientation, see “Prepare Data for Econometric Modeler App” on page 4-3.
Prepare Table of Multivariate Data for Import This example shows how to create a MATLAB timetable from synchronized data stored in a MATLAB table. The data set contains annual Canadian inflation and interest rates from 1954 through 1994. At the command line, clear the Workspace, then load the Data_Canada.mat data set. Display all variables in the workspace. clear all load Data_Canada whos Name Data DataTable DataTimeTable Description dates series
Size
Bytes
41x5 41x5 41x5 34x55 41x1 1x5
1640 8107 3827 3740 328 878
Class
Attributes
double table timetable char double cell
Data, DataTable, and DataTimeTable contain the time series, and dates contains the sampling years as a numeric vector. The row names of DataTable are the sampling years. For more details about the data set, enter Description at the command line. Time series plots in Econometric Modeler label the x-axis with sampling times when they are attached to the data. Although you can import DataTimeTable to use this feature, consider preparing DataTable as a timetable. Clear the row names of DataTable. DataTable.Properties.RowNames = {};
Convert the sampling years to a datetime vector. Specify the years, and assume that measurements were taken at the end of December. Specify that the time format is the sampling year. dates = datetime(dates,12,31,'Format','yyyy');
Convert the table DataTable to a timetable by associating the rows with the sampling times in dates. DTT = table2timetable(DataTable,'RowTimes',dates);
4-59
4
Econometric Modeler
DTT is a timetable containing the five time series and a variable named Time representing the time base. DTT is prepared for importing into Econometric Modeler. If your time series are not synchronized (that is, do not share a common time base), then you must synchronize them before you import them into the app. For more details, see synchronize and “Combine Timetables and Synchronize Their Data”.
Prepare Numeric Vector for Import This example shows how to create a timetable from a univariate time series stored as a numeric column vector. The data set contains the quarterly US gross domestic product (GDP) prices from 1947 through 2005. At the command line, clear the workspace, then load the Data_GDP.mat data set. Display all variables in the workspace. clear all load Data_GDP whos Name Data DataTable DataTimeTable Description dates
Size
Bytes
Class
234x1 234x1 234x1 22x59 234x1
1872 32337 4727 2596 1872
double table timetable char double
Attributes
Data contains the time series, and dates contains the sampling times as serial date numbers. For more details about the data set, enter Description at the command line. Convert the sampling times to a datetime vector. By default, MATLAB stores the hours, minutes, and seconds when converting from serial date numbers. Remove these clock times from the data. dates = datetime(dates,'ConvertFrom','datenum','Format','ddMMMyyyy',... 'Locale','en_US');
Create a timetable containing the data, and associate each row with the corresponding sampling time in dates. Name the variable GDP. DTT = timetable(Data,'RowTimes',dates,'VariableNames',{'GDP'});
DTT is a timetable, and is prepared for importing into Econometric Modeler.
See Also Apps Econometric Modeler Objects timetable Functions table2timetable | synchronize 4-60
Prepare Time Series Data for Econometric Modeler App
More About •
“Analyze Time Series Data Using Econometric Modeler” on page 4-2
•
“Timetables”
•
“Prepare Data for Econometric Modeler App” on page 4-3
4-61
4
Econometric Modeler
Import Time Series Data into Econometric Modeler App These examples show how to import time series data into the Econometric Modeler app. Before you import the data, you must prepare the data at the MATLAB command line (see “Prepare Time Series Data for Econometric Modeler App” on page 4-59).
Import Data from MATLAB Workspace This example shows how to import data from the MATLAB Workspace into Econometric Modeler. The data set Data_Airline.mat contains monthly counts of airline passengers. At the command line, load the Data_Airline.mat data set. load Data_Airline
At the command line, open the Econometric Modeler app. econometricModeler
Alternatively, open the app from the apps gallery (see Econometric Modeler). Import the MATLAB timetable containing the data: 1
On the Econometric Modeler tab, in the Import section, click
.
2
In the Import Data dialog box, in the Import? column, select the check box for the DataTimeTable variable.
3
Click Import.
The variable PSSG appears in the Time Series pane, and its time series plot appears in the Time Series Plot(PSSG) figure window. 4-62
Import Time Series Data into Econometric Modeler App
The series exhibits a seasonal trend and serial correlation, but an exponential growth is not apparent. For an interactive analysis of serial correlation, see “Detect Serial Correlation Using Econometric Modeler App” on page 4-71.
Import Data from MAT-File This example shows how to import data from a MAT-file, stored on your machine, into Econometric Modeler. Suppose that the data set is named Data_Airline.mat and is stored in the MyData folder of your C drive. At the command line, open the Econometric Modeler app. econometricModeler
Alternatively, open the app from the apps gallery (see Econometric Modeler). Import the MAT-file containing the data: 1
On the Econometric Modeler tab, in the Import section, click Import > Import From MATfile. 4-63
4
Econometric Modeler
4-64
2
In the Select a MAT-file window, browse to the C:\MyData folder. Select Data_Airline.mat, then click Open.
3
In the Import Data dialog box, in the Import? column, select the check box for the DataTimeTable variable.
4
Click Import.
Import Time Series Data into Econometric Modeler App
The variable PSSG appears in the Time Series pane, and its time series plot appears in the Time Series Plot(PSSG) figure window.
See Also Apps Econometric Modeler Objects timetable
More About •
“Analyze Time Series Data Using Econometric Modeler” on page 4-2
•
“Prepare Time Series Data for Econometric Modeler App” on page 4-59
•
“Plot Time Series Data Using Econometric Modeler App” on page 4-66
•
Creating ARIMA Models Using Econometric Modeler App
4-65
4
Econometric Modeler
Plot Time Series Data Using Econometric Modeler App These examples show how to plot univariate and multivariate time series data by using the Econometric Modeler app. After plotting time series, you can interact with the plots.
Plot Univariate Time Series Data This example shows how to plot univariate time series data, then overlay recession bands in the plot. The data set contains the quarterly US gross domestic product (GDP) prices from 1947 through 2005. At the command line, load the Data_GDP.mat data set. load Data_GDP
DataTimeTable is a timetable containing time series data. At the command line, open the Econometric Modeler app. econometricModeler
Alternatively, open the app from the apps gallery (see Econometric Modeler). Import DataTimeTable into the app: 1
On the Econometric Modeler tab, in the Import section, click the Import button
2
In the Import Data dialog box, in the Import? column, select the check box for the DataTimeTable variable.
3
Click Import.
.
The variable GDP appears in the Time Series pane, and its time series plot appears in the Time Series Plot(GDP) figure window. Overlay recession bands by right-clicking the plot and figure window, then selecting Show Recessions. Overlay a grid by pausing on the plot and clicking
.
Focus on the GDP from 1970 to the end of the sampling period:
4-66
1
Pause on the plot, then click
2
Position the cross hair at (1970,12000), then drag the cross hair to (2005,3500).
.
Plot Time Series Data Using Econometric Modeler App
The GDP appears flat or decreasing before and during periods of recession.
Plot Multivariate Time Series and Correlations This example shows how to plot multiple series on the same time series plot, interact with the resulting plot, and plot the correlations among the variables. The data set, stored in Data_Canada, contains annual Canadian inflation and interest rates from 1954 through 1994. At the command line, load the Data_Canada.mat data set. load Data_Canada
At the command line, open the Econometric Modeler app. econometricModeler
Alternatively, open the app from the apps gallery (see Econometric Modeler). Import DataTimeTable into the app: 1
On the Econometric Modeler tab, in the Import section, click the Import button
. 4-67
4
Econometric Modeler
2
In the Import Data dialog box, in the Import? column, select the check box for the DataTimeTable variable.
3
Click Import.
The Canadian interest and inflation rate variables appear in the Time Series pane, and a time series plot containing all the series appears in the Time Series Plot(INF_C) figure window. Overlay recession bands by right-clicking the plot and selecting Show Recessions. Overlay a grid by pausing on the plot and clicking
.
Remove the inflation rates (INF_C and INF_G) from the time series plot:
4-68
1
Right-click the plot.
2
Point to Show Time Series, then clear INF_C.
3
Repeat steps 1 and 2, but clear INF_G instead.
Plot Time Series Data Using Econometric Modeler App
Generate a correlation plot for all variables: 1
Select all variables in the Time Series pane.
2
Click the Plots tab, then click Correlations.
A correlations plot appears in the Correlations(INF_C) figure window. Remove the inflation rate based on GDP (INF_G) from the correlations plot: 1
Right-click the plot.
2
Point to Show Time Series, then clear INF_G.
All variables appear to be skewed to the right. According to the Pearson correlation coefficients (topleft of the off-diagonal plots): • The inflation rate explains at least 70% of the variability in the interest rates (when used as a predictor in a linear regression).
4-69
4
Econometric Modeler
• The interest rates are highly correlated; each explains at least 94% of the variability in another series.
See Also Apps Econometric Modeler Functions corrplot | recessionplot
More About
4-70
•
“Analyze Time Series Data Using Econometric Modeler” on page 4-2
•
“Prepare Time Series Data for Econometric Modeler App” on page 4-59
•
“Import Time Series Data into Econometric Modeler App” on page 4-62
•
“Detect Serial Correlation Using Econometric Modeler App” on page 4-71
•
“Transform Time Series Using Econometric Modeler App” on page 4-97
•
Creating ARIMA Models Using Econometric Modeler App
Detect Serial Correlation Using Econometric Modeler App
Detect Serial Correlation Using Econometric Modeler App These examples show how to assess serial correlation by using the Econometric Modeler app. Methods include plotting the autocorrelation function (ACF) and partial autocorrelation function (PACF), and testing for significant lag coefficients using the Ljung-Box Q-test. The data set Data_Overshort.mat contains 57 consecutive days of overshorts from a gasoline tank in Colorado.
Plot ACF and PACF This example shows how to plot the ACF and PACF of a time series. At the command line, load the Data_Overshort.mat data set. load Data_Overshort
At the command line, open the Econometric Modeler app. econometricModeler
Alternatively, open the app from the apps gallery (see Econometric Modeler). Import DataTimeTable into the app: 1
On the Econometric Modeler tab, in the Import section, click the Import button
2
In the Import Data dialog box, in the Import? column, select the check box for the DataTimeTable variable.
3
Click Import.
.
The variable OSHORT appears in the Time Series pane, and its time series plot appears in the Time Series Plot(OSHORT) figure window.
4-71
4
Econometric Modeler
The series appears to be stationary. Close the Time Series Plot(OSHORT) figure window. Plot the ACF of OSHORT by clicking the Plots tab. The ACF appears in the ACF(OSHORT) figure window then clicking ACF. Plot the PACF of OSHORT by clicking the Plots tab then clicking PACF. The PACF appears in the PACF(OSHORT) figure window. Position the correlograms so that you can view them at the same time by dragging the PACF(OSHORT) figure window to the bottom of the right pane.
4-72
Detect Serial Correlation Using Econometric Modeler App
The sample ACF and PACF exhibit significant autocorrelation (that is, both contain lags that are more than two standard deviations away from 0). The sample ACF shows that the autocorrelation at lag 1 is significant. The sample PACF shows that the autocorrelations at lags 1, 3, and 4 are significant. The distinct cutoff of the ACF and the more gradual decay of the PACF suggest an MA(1) model might be appropriate for this data.
Conduct Ljung-Box Q-Test for Significant Autocorrelation This example shows how to conduct the Ljung-Box Q-test for significant autocorrelation lags. At the command line, load the Data_Overshort.mat data set. load Data_Overshort
At the command line, open the Econometric Modeler app. econometricModeler
Alternatively, open the app from the apps gallery (see Econometric Modeler). 4-73
4
Econometric Modeler
Import DataTimeTable into the app: 1
On the Econometric Modeler tab, in the Import section, click the Import button
2
In the Import Data dialog box, in the Import? column, select the check box for the DataTimeTable variable.
3
Click Import.
.
The variable OSHORT appears in the Time Series pane, and its time series plot appears in the Time Series Plot(OSHORT) figure window.
The series appears to be stationary, and it fluctuates around a constant mean. Therefore, you do not need to transform the data before conducting the test. Conduct three Ljung-Box Q-Tests for testing the null hypothesis that the first 10, 5, and 1 autocorrelations are jointly zero:
4-74
1
On the Econometric Modeler tab, in the Tests section, click New Test > Ljung-Box Q-Test.
2
On the LBQ tab, in the Parameters section:
Detect Serial Correlation Using Econometric Modeler App
a
Set Number of Lags to 10.
b
Set DOF to 10.
c
To achieve a false positive rate below 0.05, use the Bonferroni correction to set Significance Level to 0.05/3 = 0.0167.
3
In the Tests section, click Run Test.
4
Repeat steps 2 and 3 twice, with these changes: a
Set Number of Lags to 5 and the DOF to 5.
b
Set Number of Lags to 1 and the DOF to 1.
The test results appear in the Results table of the LBQ(OSHORT) document.
The results show that not every autocorrelation up to lag 5 (or 10) is zero, indicating volatility clustering in the residual series.
See Also Apps Econometric Modeler Functions parcorr | autocorr | lbqtest
More About •
“Detect Autocorrelation” on page 3-19
•
“Analyze Time Series Data Using Econometric Modeler” on page 4-2
•
“Prepare Time Series Data for Econometric Modeler App” on page 4-59
•
“Plot Time Series Data Using Econometric Modeler App” on page 4-66
•
“Detect ARCH Effects Using Econometric Modeler App” on page 4-77
•
“Select ARIMA Model for Time Series Using Box-Jenkins Methodology” on page 3-2
•
“Autocorrelation and Partial Autocorrelation” on page 3-10
•
“Ljung-Box Q-Test” on page 3-17
•
“Implement Box-Jenkins Model Selection and Estimation Using Econometric Modeler App” on page 4-112 4-75
4
Econometric Modeler
•
4-76
Creating ARIMA Models Using Econometric Modeler App
Detect ARCH Effects Using Econometric Modeler App
Detect ARCH Effects Using Econometric Modeler App These examples show how to assess whether a series has volatility clustering by using the Econometric Modeler app. Methods include inspecting correlograms of squared residuals and testing for significant ARCH lags. The data set, stored in Data_EquityIdx.mat, contains a series of daily NASDAQ closing prices from 1990 through 2001.
Inspect Correlograms of Squared Residuals for ARCH Effects This example shows how to visually determine whether a series has significant ARCH effects by plotting the autocorrelation function (ACF) and partial autocorrelation function (PACF) of a series of squared residuals. At the command line, load the Data_EquityIdx.mat data set. load Data_EquityIdx
The data set contains a table of NASDAQ and NYSE closing prices, among other variables. For more details about the data set, enter Description at the command line. At the command line, open the Econometric Modeler app. econometricModeler
Alternatively, open the app from the apps gallery (see Econometric Modeler). Import DataTimeTable into the app: 1
On the Econometric Modeler tab, in the Import section, click the Import button
2
In the Import Data dialog box, in the Import? column, select the check box for the DataTimeTable variable.
3
Click Import.
.
The variables appear in the Time Series pane, and a time series plot of all the series appears in the Time Series Plot(NASDAQ) figure window. Convert the daily close NASDAQ index series to a percentage return series by taking the log of the series, then taking the first difference of the logged series: 1
In the Time Series pane, select NASDAQ.
2
On the Econometric Modeler tab, in the Transforms section, click Log.
3
With NASDAQLog selected, in the Transforms section, click Difference.
4
In the Time Series pane, rename the NASDAQLogDiff variable by clicking it twice to select its name and entering NASDAQReturns.
The time series plot of the NASDAQ returns appears in the Time Series Plot(NASDAQReturns) figure window.
4-77
4
Econometric Modeler
The returns appear to fluctuate around a constant level, but exhibit volatility clustering. Large changes in the returns tend to cluster together, and small changes tend to cluster together. That is, the series exhibits conditional heteroscedasticity. Compute squared residuals: 1
Export NASDAQReturns to the MATLAB Workspace: a
In the Time Series pane, right-click NASDAQReturns.
b
In the context menu, select Export.
NASDAQReturns appears in the MATLAB Workspace. 2
4-78
At the command line: a
For numerical stability, scale the returns by a factor of 100.
b
Create a residual series by removing the mean from the scaled returns series. Because you took the first difference of the NASDAQ prices to create the returns, the first element of the returns is missing. Therefore, to estimate the sample mean of the series, call mean(NASDAQReturns,'omitnan').
Detect ARCH Effects Using Econometric Modeler App
c
Square the residuals.
d
Add the squared residuals as a new variable to the DataTimeTable timetable.
NASDAQReturns = 100*NASDAQReturns; NASDAQResiduals = NASDAQReturns - mean(NASDAQReturns,'omitnan'); NASDAQResiduals2 = NASDAQResiduals.^2; DataTimeTable.NASDAQResiduals2 = NASDAQResiduals2;
In Econometric Modeler, import DataTimeTable: 1
On the Econometric Modeler tab, in the Import section, click
.
2
In the Econometric Modeler dialog box, click OK to clear all variables and documents in the app.
3
In the Import Data dialog box, in the Import? column, select the check box for DataTimeTable.
4
Click Import.
Plot the ACF and PACF: 1
In the Time Series pane, select the NASDAQResiduals2 time series.
2
Click the Plots tab, then click ACF.
3
Click the Plots tab, then click PACF.
4
Close the Time Series Plot(NASDAQ) figure window. Then, position the ACF(NASDAQResiduals2) figure window above the PACF(NASDAQResiduals2) figure window.
4-79
4
Econometric Modeler
The sample ACF and PACF show significant autocorrelation in the squared residuals. This result indicates that volatility clustering is present.
Conduct Ljung-Box Q-Test on Squared Residuals This example shows how to test squared residuals for significant ARCH effects using the Ljung-Box Qtest. At the command line:
4-80
1
Load the Data_EquityIdx.mat data set.
2
Convert the NASDAQ prices to returns. To maintain the correct time base, prepend the resulting returns with a NaN value.
3
Scale the NASDAQ returns.
4
Compute residuals by removing the mean from the scaled returns.
5
Square the residuals.
Detect ARCH Effects Using Econometric Modeler App
6
Add the vector of squared residuals as a variable to DataTimeTable.
For more details on the steps, see “Inspect Correlograms of Squared Residuals for ARCH Effects” on page 4-77. load Data_EquityIdx NASDAQReturns = 100*price2ret(DataTimeTable.NASDAQ); NASDAQReturns = [NaN; NASDAQReturns]; NASDAQResiduals2 = (NASDAQReturns - mean(NASDAQReturns,'omitnan')).^2; DataTimeTable.NASDAQResiduals2 = NASDAQResiduals2;
At the command line, open the Econometric Modeler app. econometricModeler
Alternatively, open the app from the apps gallery (see Econometric Modeler). Import DataTimeTable into the app: 1
On the Econometric Modeler tab, in the Import section, click the Import button
2
In the Import Data dialog box, in the Import? column, select the check box for the DataTimeTable variable.
3
Click Import.
.
The variables appear in the Time Series pane, and a time series plot of all the series appears in the Time Series Plot(NASDAQ) figure window. Test the null hypothesis that the first m = 5 autocorrelation lags of the squared residuals are jointly zero by using the Ljung-Box Q-test. Then, test the null hypothesis that the first m = 10 autocorrelation lags of the squared residuals are jointly zero. 1
In the Time Series pane, select the NASDAQResiduals2 time series.
2
On the Econometric Modeler tab, in the Tests section, click New Test > Ljung-Box Q-Test.
3
On the LBQ tab, in the Parameters section, set both the Number of Lags and DOF to 5. To maintain a significance level of 0.05 for the two tests, set Significance Level to 0.025.
4
In the Tests section, click Run Test.
5
Repeat steps 3 and 4, but set both the Number of Lags and DOF to 10 instead.
The test results appear in the Results table of the LBQ(NASDAQResiduals2) document.
4-81
4
Econometric Modeler
The null hypothesis is rejected for the two tests. The p-value for each test is 0. The results show that not every autocorrelation up to lag 5 (or 10) is zero, indicating volatility clustering in the squared residuals.
Conduct Engle's ARCH Test This example shows how to test residuals for significant ARCH effects using the Engle's ARCH Test. At the command line: 1
Load the Data_EquityIdx.mat data set.
2
Convert the NASDAQ prices to returns. To maintain the correct time base, prepend the resulting returns with a NaN value.
3
Scale the NASDAQ returns.
4
Compute residuals by removing the mean from the scaled returns.
5
Add the vector of residuals as a variable to DataTimeTable.
For more details on the steps, see “Inspect Correlograms of Squared Residuals for ARCH Effects” on page 4-77. load Data_EquityIdx NASDAQReturns = 100*price2ret(DataTimeTable.NASDAQ); NASDAQReturns = [NaN; NASDAQReturns]; NASDAQResiduals = NASDAQReturns - mean(NASDAQReturns,'omitnan'); DataTimeTable.NASDAQResiduals = NASDAQResiduals;
At the command line, open the Econometric Modeler app. econometricModeler
Alternatively, open the app from the apps gallery (see Econometric Modeler). Import DataTimeTable into the app: 1
On the Econometric Modeler tab, in the Import section, click the Import button
2
In the Import Data dialog box, in the Import? column, select the check box for the DataTimeTable variable.
3
Click Import.
.
The variables appear in the Time Series pane, and a time series plot of the all the series appears in the Time Series Plot(NASDAQ) figure window. Test the null hypothesis that the NASDAQ residuals series exhibits no ARCH effects by using Engle's ARCH test. Specify that the residuals series is an ARCH(2) model. 1
In the Time Series pane, select the NASDAQResiduals time series.
2
On the Econometric Modeler tab, in the Tests section, click New Test > Engle's ARCH Test.
3
On the ARCH tab, in the Parameters section, set Number of Lags to 2.
4
In the Tests section, click Run Test.
The test results appear in the Results table of the ARCH(NASDAQResiduals) document. 4-82
Detect ARCH Effects Using Econometric Modeler App
The null hypothesis is rejected in favor of the ARCH(2) alternative. The test result indicates significant volatility clustering in the residuals.
See Also Apps Econometric Modeler Functions parcorr | autocorr | lbqtest | archtest
More About •
“Detect ARCH Effects” on page 3-27
•
“Analyze Time Series Data Using Econometric Modeler” on page 4-2
•
“Prepare Time Series Data for Econometric Modeler App” on page 4-59
•
“Assess Stationarity of Time Series Using Econometric Modeler” on page 4-84
•
“Autocorrelation and Partial Autocorrelation” on page 3-10
•
“Ljung-Box Q-Test” on page 3-17
•
“Engle’s ARCH Test” on page 3-25
4-83
4
Econometric Modeler
Assess Stationarity of Time Series Using Econometric Modeler These examples show how to conduct statistical hypothesis tests for assessing whether a time series is a unit root process by using the Econometric Modeler app. The test you use depends on your assumptions about the nature of the nonstationarity of an underlying model.
Test Assuming Unit Root Null Model This example uses the Augmented Dickey-Fuller and Phillips-Perron tests to assess whether a time series is a unit root process. The null hypothesis for both tests is that the time series is a unit root process. The data set, stored in Data_USEconModel.mat, contains the US gross domestic product (GDP) measured quarterly, among other series. At the command line, load the Data_USEconModel.mat data set. load Data_USEconModel
At the command line, open the Econometric Modeler app. econometricModeler
Alternatively, open the app from the apps gallery (see Econometric Modeler). Import DataTimeTable into the app: 1
On the Econometric Modeler tab, in the Import section, click the Import button
2
In the Import Data dialog box, in the Import? column, select the check box for the DataTimeTable variable.
3
Click Import.
.
The variables, among GDP, appear in the Time Series pane, and a time series plot of all the series appears in the Time Series Plot(COE) figure window. In the Time Series pane, double-click GDP. A time series plot of GDP appears in the Time Series Plot(GDP) figure window.
4-84
Assess Stationarity of Time Series Using Econometric Modeler
The series appears to grow without bound. Apply the log transformation to GDP. On the Econometric Modeler tab, in the Transforms section, click Log. In the Time Series pane, a variable representing the logged GDP (GDPLog) appears. A time series plot of the logged GDP appears in the Time Series Plot(GDPLog) figure window.
4-85
4
Econometric Modeler
The logged GDP series appears to have a time trend or drift term. Using the Augmented Dickey-Fuller test, test the null hypothesis that the logged GDP series has a unit root against a trend stationary AR(1) model alternative. Conduct a separate test for an AR(1) model with drift alternative. For the null hypothesis of both tests, include the restriction that the trend and drift terms, respectively, are zero by conducting F tests. 1
With GDPLog selected in the Time Series pane, on the Econometric Modeler tab, in the Tests section, click New Test > Augmented Dickey-Fuller Test.
2
On the ADF tab, in the Parameters section: a
Set Number of Lags to 1.
b
Select Model > Trend Stationary.
c
Select Test Statistic > F statistic.
3
In the Tests section, click Run Test.
4
Repeat steps 2 and 3, but select Model > Autoregressive with Drift instead.
The test results appear in the Results table of the ADF(GDPLog) document. 4-86
Assess Stationarity of Time Series Using Econometric Modeler
For the test supposing a trend stationary AR(1) model alternative, the null hypothesis is not rejected. For the test assuming an AR(1) model with drift, the null hypothesis is rejected. Apply the Phillips-Perron test using the same assumptions as in the Augmented Dickey-Fuller tests, except the trend and drift terms in the null model cannot be zero. 1
With GDPLog selected in the Time Series pane, click the Econometric Modeler tab. Then, in the Tests section, click New Test > Phillips-Perron Test.
2
On the PP tab, in the Parameters section: a
Set Number of Lags to 1.
b
Select Model > Trend Stationary.
3
In the Tests section, click Run Test.
4
Repeat steps 2 and 3, but select Model > Autoregressive with Drift instead.
The test results appear in the Results table of the PP(GDPLog) document.
The null is not rejected for both tests. These results suggest that the logged GDP possibly has a unit root. The difference in the null models can account for the differences between the Augmented DickeyFuller and Phillips-Perron test results.
Test Assuming Stationary Null Model This example uses the Kwiatkowski, Phillips, Schmidt, and Shin (KPSS) test to assess whether a time series is a unit root process. The null hypothesis is that the time series is stationary. The data set, stored in Data_NelsonPlosser.mat, contains annual nominal wages, among other US macroeconomic series. 4-87
4
Econometric Modeler
At the command line, load the Data_NelsonPlosser.mat data set. load Data_NelsonPlosser
At the command line, open the Econometric Modeler app. econometricModeler
Alternatively, open the app from the apps gallery (see Econometric Modeler). Import DataTimeTable into the app: 1
On the Econometric Modeler tab, in the Import section, click the Import button
2
In the Import Data dialog box, in the Import? column, select the check box for the DataTimeTable variable.
3
Click Import.
.
The variables, including the nominal wages WN, appear in the Time Series pane, and a time series plot of all the series appears in the Time Series Plot(BY)figure window. In the Time Series pane, double-click WN. A time series plot of WN appears in the Time Series Plot(WN) figure window.
4-88
Assess Stationarity of Time Series Using Econometric Modeler
The series appears to grow without bound, and wage measurements are missing before 1900. To zoom into values occurring after 1900, pause on the plot, click box produced by dragging the cross hair.
, and enclose the time series in the
Apply the log transformation to WN. On the Econometric Modeler tab, in the Transforms section, click Log. In the Time Series pane, a variable representing the logged wages (WNLog) appears. The logged series appears in the Time Series Plot(WNLog) figure window.
The logged wages appear to have a linear trend. Using the KPSS test, test the null hypothesis that the logged wages are trend stationary against the unit root alternative. As suggested in [1], conduct three separate tests by specifying 7, 9, and 11 lags in the autoregressive model. 1
With WNLog selected in the Time Series pane, on the Econometric Modeler tab, in the Tests section, click New Test > KPSS Test.
2
On the KPSS tab, in the Parameters section, set Number of Lags to 7.
3
In the Tests section, click Run Test. 4-89
4
Econometric Modeler
4
Repeat steps 2 and 3, but set Number of Lags to 9 instead.
5
Repeat steps 2 and 3, but set Number of Lags to 11 instead.
The test results appear in the Results table of the KPSS(WNLog) document.
All tests fail to reject the null hypothesis that the logged wages are trend stationary.
Test Assuming Random Walk Null Model This example uses the variance ratio test to assess the null hypothesis that a time series is a random walk. The data set, stored in CAPMuniverse.mat and available with the Financial Toolbox™ documentation, contains market data for daily returns of stocks and cash (money market) from the period January 1, 2000 to November 7, 2005. At the command line, load the CAPMuniverse.mat data set. load CAPMuniverse
The series are in the timetable AssetsTimeTable. The first column of data (AAPL) is the daily return of a technology stock. The last column is the daily return for cash (the daily money market rate, CASH). Accumulate the daily technology stock and cash returns. AssetsTimeTable.AAPLcumsum = cumsum(AssetsTimeTable.AAPL); AssetsTimeTable.CASHcumsum = cumsum(AssetsTimeTable.CASH);
At the command line, open the Econometric Modeler app. econometricModeler
Alternatively, open the app from the apps gallery (see Econometric Modeler). Import AssetsTimeTable into the app: 1
4-90
On the Econometric Modeler tab, in the Import section, click
.
2
In the Import Data dialog box, in the Import? column, select the check box for the AssetsTimeTable variable.
3
Click Import.
Assess Stationarity of Time Series Using Econometric Modeler
The variables, including stock and cash prices (AAPLcumsum and CASHcumsum), appear in the Time Series pane, and a time series plot of all the series appears in the Time Series Plot(AAPL) figure window. In the Time Series pane, double-click AAPLcumsum. A time series plot of AAPLcumsum appears in the Time Series Plot(AAPLcumsum) figure window.
The accumulated returns of the stock appear to wander at first, with high variability, and then grow without bound after 2004. Using the variance ratio test, test the null hypothesis that the series of accumulated stock returns is a random walk. First, test without assuming IID innovations for the alternative model, then test assuming IID innovations. 1
With AAPLcumsum selected in the Time Series pane, on the Econometric Modeler tab, in the Tests section, click New Test > Variance Ratio Test.
2
On the VRatio tab, in the Tests section, click Run Test. 4-91
4
Econometric Modeler
3
On the VRatio tab, in the Parameters section, select the IID Innovations check box.
4
In the Tests section, click Run Test.
The test results appear in the Results table of the VRatio(AAPLcumsum) document.
Without assuming IID innovations for the alternative model, the test fails to reject the random walk null model. However, assuming IID innovations, the test rejects the null hypothesis. This result might be due to heteroscedasticity in the series, that is, the series might be a heteroscedastic random walk. In the Time Series pane, double-click CASHcumsum. A time series plot of CASHcumsum appears in the Time Series Plot(CASHcumsum) figure window.
4-92
Assess Stationarity of Time Series Using Econometric Modeler
The series of accumulated cash returns exhibits low variability and appears to have long-term trends. Test the null hypothesis that the series of accumulated cash returns is a random walk: 1
With CASHcumsum selected in the Time Series pane, on the Econometric Modeler tab, in the Tests section, click New Test > Variance Ratio Test.
2
On the VRatio tab, in the Parameters section, clear the IID Innovations box.
3
In the Tests section, click Run Test.
The test results appear in the Results tab of the VRatio(CASHcumsum) document.
The test rejects the null hypothesis that the series of accumulated cash returns is a random walk.
References [1] Kwiatkowski, D., P. C. B. Phillips, P. Schmidt, and Y. Shin. “Testing the Null Hypothesis of Stationarity against the Alternative of a Unit Root.” Journal of Econometrics. Vol. 54, 1992, pp. 159–178.
See Also adftest | kpsstest | lmctest | vratiotest
More About •
“Analyze Time Series Data Using Econometric Modeler” on page 4-2
•
“Prepare Time Series Data for Econometric Modeler App” on page 4-59
•
“Assess Stationarity of a Time Series” on page 3-50
•
“Unit Root Nonstationarity” on page 3-32
•
“Unit Root Tests” on page 3-40
4-93
4
Econometric Modeler
Assess Collinearity Among Multiple Series Using Econometric Modeler App This example shows how to assess the strengths and sources of collinearity among multiple series by using Belsley collinearity diagnostics in the Econometric Modeler app. The data set, stored in Data_Canada, contains annual Canadian inflation and interest rates from 1954 through 1994. At the command line, load the Data_Canada.mat data set. load Data_Canada
At the command line, open the Econometric Modeler app. econometricModeler
Alternatively, open the app from the apps gallery (see Econometric Modeler). Import DataTimeTable into the app: 1
On the Econometric Modeler tab, in the Import section, click the Import button
2
In the Import Data dialog box, in the Import? column, select the check box for the DataTimeTable variable.
3
Click Import.
.
The Canadian interest and inflation rate variables appear in the Time Series pane, and a time series plot of all the series appears in the Time Series Plot(INF_C) figure window.
4-94
Assess Collinearity Among Multiple Series Using Econometric Modeler App
Perform Belsley collinearity diagnostics on all series. On the Econometric Modeler tab, in the Tests section, click New Test > Belsley Collinearity Diagnostics. The Collinearity(INF_C) document appear with the following results: • A table of singular values, corresponding condition indices, and corresponding variable variancedecomposition proportions • A plot of the variable variance-decomposition proportions corresponding to the condition index that is above the threshold, and a horizontal line indicating the variance-decomposition threshold
4-95
4
Econometric Modeler
The interest rates have variance-decomposition proportions exceeding the default tolerance, 0.5, indicated by red markers in the plot. This result suggests that the interest rates exhibit multicollinearity. If you use the three interest rates as predictors in a linear regression model, then the predictor data matrix can be ill conditioned.
See Also collintest
More About
4-96
•
“Analyze Time Series Data Using Econometric Modeler” on page 4-2
•
“Prepare Time Series Data for Econometric Modeler App” on page 4-59
•
“Time Series Regression II: Collinearity and Estimator Variance” on page 5-183
Transform Time Series Using Econometric Modeler App
Transform Time Series Using Econometric Modeler App The Econometric Modeler app enables you to transform time series data based on deterministic or stochastic trends you see in plots or hypothesis test conclusions. Available transformations in the app are log, seasonal and nonseasonal difference, and linear detrend. These examples show how to apply each transformation to time series data.
Apply Log Transformation to Data This example shows how to stabilize a time series, whose variability grows with the level of the series, by applying the log transformation. The data set Data_Airline.mat contains monthly counts of airline passengers. At the command line, load the Data_Airline.mat data set. load Data_Airline
At the command line, open the Econometric Modeler app. econometricModeler
Alternatively, open the app from the apps gallery (see Econometric Modeler). Import DataTimeTable into the app: 1
On the Econometric Modeler tab, in the Import section, click the Import button
2
In the Import Data dialog box, in the Import? column, select the check box for the DataTimeTable variable.
3
Click Import.
.
The variable PSSG appears in the Time Series pane, and its time series plot is in the Time Series Plot(PSSG) figure window. Fit a SARIMA(0,1,1)×(0,1,1)12 model to the data in levels: 1
On the Econometric Modeler tab, in the Models section, click the arrow to display the model gallery.
2
In the models gallery, in the ARMA/ARIMA Models section, click SARIMA.
3
In the SARIMA Model Parameters dialog box, on the Lag Order tab: • Nonseasonal section a
Set Degrees of Integration to 1.
b
Set Moving Average Order to 1.
c
Clear the Include Constant Term check box.
• Seasonal section a
Set Period to 12 to indicate monthly data.
b
Set Moving Average Order to 1.
c
Select the Include Seasonal Difference check box. 4-97
4
Econometric Modeler
4
Click Estimate.
The model variable SARIMA_PSSG appears in the Models pane, its value appears in the Preview pane, and its estimation summary appears in the Model Summary(SARIMA_PSSG) document.
4-98
Transform Time Series Using Econometric Modeler App
The spread of the residuals increases with the level of the data, which is indicative of heteroscedasticity. Apply the log transform to PSSG: 1
In the Time Series pane, select PSSG.
2
On the Econometric Modeler tab, in the Transforms section, click Log.
The transformed variable PSSGLog appears in the Time Series pane, and its time series plot appears in the Time Series Plot(PSSGLog) figure window.
4-99
4
Econometric Modeler
The exponential growth appears removed from the series. With PSSGLog selected in the Time Series pane, fit the SARIMA(0,1,1)×(0,1,1)12 model to the logged series using the same dialog box settings that you used for PSSG. The estimation summary appears in the Model Summary(SARIMA_PSSGLog) document.
4-100
Transform Time Series Using Econometric Modeler App
The spread of the residuals does not appear to change systematically with the levels of the data.
Stabilize Time Series Using Nonseasonal Differencing This example shows how to stabilize a time series by applying multiple nonseasonal difference operations. The data set, which is stored in Data_USEconModel.mat, contains the US gross domestic product (GDP) measured quarterly, among other series. At the command line, load the Data_USEconModel.mat data set. load Data_USEconModel
At the command line, open the Econometric Modeler app. econometricModeler
Alternatively, open the app from the apps gallery (see Econometric Modeler). Import DataTimeTable into the app: 4-101
4
Econometric Modeler
1
On the Econometric Modeler tab, in the Import section, click the Import button
2
In the Import Data dialog box, in the Import? column, select the check box for the DataTimeTable variable.
3
Click Import.
.
The variables, including GDP, appear in the Time Series pane, and a time series plot of all the series appears in the Time Series Plot(COE) figure window. In the Time Series pane, double-click GDP. A time series plot of GDP appears in the Time Series Plot(GDP) figure window.
The series appears to grow without bound. Apply the first difference to GDP. On the Econometric Modeler tab, in the Transforms section, click Difference. In the Time Series pane, a variable representing the differenced GDP (GDPDiff) appears. A time series plot of the differenced GDP appears in the Time Series Plot(GDPDiff) figure window. 4-102
Transform Time Series Using Econometric Modeler App
The differenced GDP series appears to grow without bound after 1970. Apply the second difference to the GDP by differencing the differenced GDP. With GDPDiff selected in the Time Series pane, on the Econometric Modeler tab, in the Transforms section, click Difference. In the Time Series pane, a variable representing the transformed differenced GDP (GDPDiffDiff) appears. A time series plot of the differenced GDP appears in the Time Series Plot(GDPDiffDiff) figure window.
4-103
4
Econometric Modeler
The transformed differenced GDP series appears stationary, although heteroscedastic.
Convert Prices to Returns This example shows how to convert multiple series of prices to returns. The data set, which is stored in Data_USEconModel.mat, contains the US GDP and personal consumption expenditures measured quarterly, among other series. At the command line, load the Data_USEconModel.mat data set. load Data_USEconModel
At the command line, open the Econometric Modeler app. econometricModeler
Alternatively, open the app from the apps gallery (see Econometric Modeler). Import DataTimeTable into the app: 1
4-104
On the Econometric Modeler tab, in the Import section, click the Import button
.
Transform Time Series Using Econometric Modeler App
2
In the Import Data dialog box, in the Import? column, select the check box for the DataTimeTable variable.
3
Click Import.
GDP and PCEC, among other series, appear in the Time Series pane, and a time series plot containing all series appears in the figure window. In the Time Series pane, click GDP, then press Ctrl and click PCEC. Both series are selected. Click the Plots tab, then click Time Series. A time series plot of GDP and PCEC appears in the Time Series Plot(GDP) figure window.
Both series, as prices, appear to grow without bound. Convert the GDP and personal consumption expenditure prices to returns: 4-105
4
Econometric Modeler
1
Click the Econometric Modeler tab. Ensure that GDP and PCEC are selected in the Time Series pane.
2
In the Transforms section, click Log. The Time Series pane displays variables representing the logged GDP series (GDPLog) and the logged personal consumption expenditure series (PCECLog).
3
With GDPLog and PCECLog selected in the Time Series pane, in the Transforms section, click Difference.
The Time Series pane displays variables representing the GDP returns (GDPLogDiff) and personal consumption expenditure returns (PCECLogDiff). A time series plot of the GDP and personal consumption expenditure returns appears in the Time Series Plot(GDPLogDiff) figure window. In the Time Series pane, rename the GDPLogDiff and PCECLogDiff variables. Click GDPLogDiff twice to select its name and enter GDPReturns. Click PCECLogDiff twice to select its name and enter PCECReturns. The app updates the names of all documents associated with both returns.
4-106
Transform Time Series Using Econometric Modeler App
The series of GDP and personal consumption expenditure returns appear stationary, but observations within each series appear serially correlated.
Remove Seasonal Trend from Time Series Using Seasonal Difference This example shows how to stabilize a time series exhibiting seasonal integration by applying a seasonal difference. The data set Data_Airline.mat contains monthly counts of airline passengers. At the command line, load the Data_Airline.mat data set. load Data_Airline
At the command line, open the Econometric Modeler app. econometricModeler
4-107
4
Econometric Modeler
Alternatively, open the app from the apps gallery (see Econometric Modeler). Import DataTimeTable into the app: 1
On the Econometric Modeler tab, in the Import section, click the Import button
2
In the Import Data dialog box, in the Import? column, select the check box for the DataTimeTable variable.
3
Click Import.
.
The variable PSSG appears in the Time Series pane, and its time series plot appears in the Time Series Plot(PSSG) figure window. Address the seasonal trend by applying the 12th order seasonal difference. On the Econometric Modeler tab, in the Transforms section, set Seasonal to 12. Then, click Seasonal. The transformed variable PSSGSeasonalDiff appears in the Time Series pane, and its time series plot appears in the Time Series Plot(PSSGSeasonalDiff) figure window.
The transformed series appears to have a nonseasonal trend. 4-108
Transform Time Series Using Econometric Modeler App
Address the nonseasonal trend by applying the first difference. With PSSGSeasonalDiff selected in the Time Series pane, on the Econometric Modeler tab, in the Transforms section, click Difference. The transformed variable PSSGSeasonalDiffDiff appears in the Time Series pane, and its time series plot appears in the Time Series Plot(PSSGSeasonalDiffDiff) figure window.
The transformed series appears stationary, but observations appear serially correlated. In the Time Series pane, rename the PSSGSeasonalDiffDiff variable by clicking it twice to select its name and entering PSSGStable. The app updates the names of all documents associated with the transformed series.
Remove Deterministic Trend from Time Series This example shows how to remove a least-squares-derived deterministic trend from a nonstationary time series. The data set Data_Airline.mat contains monthly counts of airline passengers. At the command line, load the Data_Airline.mat data set. 4-109
4
Econometric Modeler
load Data_Airline
At the command line, open the Econometric Modeler app. econometricModeler
Alternatively, open the app from the apps gallery (see Econometric Modeler). Import DataTimeTable into the app: 1
On the Econometric Modeler tab, in the Import section, click the Import button
2
In the Import Data dialog box, in the Import? column, select the check box for the DataTimeTable variable.
3
Click Import.
.
The variable PSSG appears in the Time Series pane, and its time series plot appears in the Time Series Plot(PSSG) figure window. Apply the log transformation to the series. On the Econometric Modeler tab, in the Transforms section, click Log. The transformed variable PSSGLog appears in the Time Series pane, and its time series plot appears in the Time Series Plot(PSSGLog) figure window. Identify the deterministic trend by using least squares. Then, detrend the series by removing the identified deterministic trend. On the Econometric Modeler tab, in the Transforms section, click Detrend. The transformed variable PSSGLogDetrend appears in the Time Series pane, and its time series plot appears in the Time Series Plot(PSSGLogDetrend) figure window.
4-110
Transform Time Series Using Econometric Modeler App
PSSGLogDetrend does not appear to have a deterministic trend, although it has a marked cyclic trend.
See Also Econometric Modeler
More About •
“Analyze Time Series Data Using Econometric Modeler” on page 4-2
•
“Data Transformations” on page 2-2
•
“Detect Serial Correlation Using Econometric Modeler App” on page 4-71
•
“Plot Time Series Data Using Econometric Modeler App” on page 4-66
•
“Detect Serial Correlation Using Econometric Modeler App” on page 4-71
•
Creating ARIMA Models Using Econometric Modeler App
4-111
4
Econometric Modeler
Implement Box-Jenkins Model Selection and Estimation Using Econometric Modeler App This example shows how to use the Box-Jenkins methodology to select and estimate an ARIMA model by using the Econometric Modeler app. Then, it shows how to export the estimated model to generate forecasts. The data set, which is stored in Data_JAustralian.mat, contains the log quarterly Australian Consumer Price Index (CPI) measured from 1972 and 1991, among other time series. Prepare Data for Econometric Modeler At the command line, load the Data_JAustralian.mat data set. load Data_JAustralian
Import Data into Econometric Modeler At the command line, open the Econometric Modeler app. econometricModeler
Alternatively, open the app from the apps gallery (see Econometric Modeler). Import DataTimeTable into the app: 1
On the Econometric Modeler tab, in the Import section, click the Import button
2
In the Import Data dialog box, in the Import? column, select the check box for the DataTimeTable variable.
3
Click Import.
.
The variables, including PAU, appear in the Time Series pane, and a time series plot of all the series appears in the Time Series Plot(EXCH) figure window. Create a time series plot of PAU by double-clicking PAU in the Time Series pane.
4-112
Implement Box-Jenkins Model Selection and Estimation Using Econometric Modeler App
The series appears nonstationary because it has a clear upward trend. Plot Sample ACF and PACF of Series Plot the sample autocorrelation function (ACF) and partial autocorrelation function (PACF). 1
In the Time Series pane, select the PAU time series.
2
Click the Plots tab, then click ACF.
3
Click the Plots tab, then click PACF.
4
Close all figure windows except for the correlograms. Then, drag the ACF(PAU) figure window above the PACF(PAU) figure window.
4-113
4
Econometric Modeler
The significant, linearly decaying sample ACF indicates a nonstationary process. Close the ACF(PAU) and PACF(PAU) figure windows. Difference the Series Take a first difference of the data. With PAU selected in the Time Series pane, on the Econometric Modeler tab, in the Transforms section, click Difference. The transformed variable PAUDiff appears in the Time Series pane, and its time series plot appears in the Time Series Plot(PAUDiff) figure window.
4-114
Implement Box-Jenkins Model Selection and Estimation Using Econometric Modeler App
Differencing removes the linear trend. The differenced series appears more stationary. Plot Sample ACF and PACF of Differenced Series Plot the sample ACF and PACF of PAUDiff. With PAUDiff selected in the Time Series pane: 1
Click the Plots tab, then click ACF.
2
Click the Plots tab, then click PACF.
3
Close the Time Series Plot(PAUDiff) figure window. Then, drag the ACF(PAUDiff) figure window above the PACF(PAUDiff) figure window.
4-115
4
Econometric Modeler
The sample ACF of the differenced series decays more quickly. The sample PACF cuts off after lag 2. This behavior is consistent with a second-degree autoregressive (AR(2)) model for the differenced series. Close the ACF(PAUDiff) and PACF(PAUDiff) figure windows. Specify and Estimate ARIMA Model Estimate an ARIMA(2,1,0) model for the log quarterly Australian CPI. This model has one degree of nonseasonal differencing and two AR lags.
4-116
1
In the Time Series pane, select the PAU time series.
2
On the Econometric Modeler tab, in the Models section, click ARIMA.
3
In the ARIMA Model Parameters dialog box, on the Lag Order tab: a
Set Degree of Integration to 1.
b
Set Autoregressive Order to 2.
Implement Box-Jenkins Model Selection and Estimation Using Econometric Modeler App
4
Click Estimate.
The model variable ARIMA_PAU appears in the Models pane, its value appears in the Preview pane, and its estimation summary appears in the Model Summary(ARIMA_PAU) document.
4-117
4
Econometric Modeler
Both AR coefficients are significant at a 5% significance level. Check Goodness of Fit Check that the residuals are normally distributed and uncorrelated by plotting a histogram, quantilequantile plot, and ACF of the residuals.
4-118
1
Close the Model Summary(ARIMA_PAU) document.
2
With ARIMA_PAU selected in the Models pane, on the Econometric Modeler tab, in the Diagnostics section, click Residual Diagnostics > Residual Histogram.
3
Click Residual Diagnostics > Residual Q-Q Plot.
4
Click Residual Diagnostics > Autocorrelation Function.
5
In the right pane, drag the Histogram(ARIMA_PAU) and QQPlot(ARIMA_PAU) figure windows so that they occupy the upper two quadrants, and drag the ACF so that it occupies the lower two quadrants.
Implement Box-Jenkins Model Selection and Estimation Using Econometric Modeler App
The residual plots suggest that the residuals are approximately normally distributed and uncorrelated. However, there is some indication of an excess of large residuals. This behavior suggests that a t innovation distribution might be appropriate. Export Model to Workspace Export the model to the MATLAB Workspace. 1
In the Time Series pane, select the PAU time series.
2
On the Econometric Modeler tab, in the Export section, click Export > Export Variables.
3
In the Export Variables dialog box, select the Select check box for the ARIMA_PAU model.
4
Click Export. The check box for the PAU time series is already selected.
The variables PAU and ARIMA_PAU appear in the workspace. Generate Forecasts at Command Line Generate forecasts and approximate 95% forecast intervals from the estimated ARIMA(2,1,0) model for the next four years (16 quarters). Use the entire series as a presample for the forecasts. 4-119
4
Econometric Modeler
[PAUF,PAUMSE] = forecast(ARIMA_PAU,16,'Y0',PAU); UB = PAUF + 1.96*sqrt(PAUMSE); LB = PAUF - 1.96*sqrt(PAUMSE); datesF = dates(end) + calquarters(1:16); figure h4 = plot(dates,PAU,'Color',[.75,.75,.75]); hold on h5 = plot(datesF,PAUF,'r','LineWidth',2); h6 = plot(datesF,UB,'k--','LineWidth',1.5); plot(datesF,LB,'k--','LineWidth',1.5); legend([h4,h5,h6],'Log CPI','Forecast',... 'Forecast Interval','Location','Northwest') title('Log Australian CPI Forecast') hold off
References [1] Box, George E. P., Gwilym M. Jenkins, and Gregory C. Reinsel. Time Series Analysis: Forecasting and Control. 3rd ed. Englewood Cliffs, NJ: Prentice Hall, 1994.
See Also Apps Econometric Modeler 4-120
Implement Box-Jenkins Model Selection and Estimation Using Econometric Modeler App
Objects arima Functions estimate | forecast
More About •
“Analyze Time Series Data Using Econometric Modeler” on page 4-2
•
“Perform ARIMA Model Residual Diagnostics Using Econometric Modeler App” on page 4-141
•
“Select ARIMA Model for Time Series Using Box-Jenkins Methodology” on page 3-2
•
“Detect Serial Correlation Using Econometric Modeler App” on page 4-71
•
“Share Results of Econometric Modeler App Session” on page 4-237
•
Creating ARIMA Models Using Econometric Modeler App
4-121
4
Econometric Modeler
Select ARCH Lags for GARCH Model Using Econometric Modeler App This example shows how to select the appropriate number of ARCH and GARCH lags for a GARCH model by using the Econometric Modeler app. The data set, stored in Data_MarkPound, contains daily Deutschmark/British pound bilateral spot exchange rates from 1984 through 1991. Import Data into Econometric Modeler At the command line, load the Data_MarkPound.mat data set. load Data_MarkPound
At the command line, open the Econometric Modeler app. econometricModeler
Alternatively, open the app from the apps gallery (see Econometric Modeler). Import Data to the app: 1
On the Econometric Modeler tab, in the Import section, click
.
2
In the Import Data dialog box, in the Import? column, select the check box for the Data variable.
3
Click Import.
The variable Data1 appears in the Time Series pane, and its time series plot appears in the Time Series Plot(Data1) figure window.
4-122
Select ARCH Lags for GARCH Model Using Econometric Modeler App
The exchange rate looks nonstationary (it does not appear to fluctuate around a fixed level). Transform Data Convert the exchange rates to returns. 1
With Data1 selected in the Time Series pane, on the Econometric Modeler tab, in the Transforms section, click Log. In the Time Series pane, a variable representing the logged exchange rates (Data1Log) appears , and its time series plot appears in the Time Series Plot(Data1Log) figure window.
2
In the Time Series pane, select Data1Log.
3
On the Econometric Modeler tab, in the Transforms section, click Difference.
In the Time Series pane, a variable representing the returns (Data1LogDiff) appears. A time series plot of the differenced series appears in the Time Series Plot(Data1LogDiff) figure window. Check for Autocorrelation In the Time Series pane, rename the Data1LogDiff variable by clicking it twice to select its name and entering Returns. 4-123
4
Econometric Modeler
The app updates the names of all documents associated with the returns.
The returns series fluctuates around a common level, but exhibits volatility clustering. Large changes in the returns tend to cluster together, and small changes tend to cluster together. That is, the series exhibits conditional heteroscedasticity. Visually assess whether the returns have serial correlation by plotting the sample ACF and PACF:
4-124
1
Close all figure windows in the right pane.
2
In the Time Series pane, select the Returns time series.
3
Click the Plots tab, then click ACF.
4
Click the Plots tab, then click PACF.
5
Drag the PACF(Returns) figure window below the ACF(Returns) figure window so that you can view them simultaneously.
Select ARCH Lags for GARCH Model Using Econometric Modeler App
The sample ACF and PACF show virtually no significant autocorrelation. Conduct the Ljung-Box Q-test to assess whether there is significant serial correlation in the returns for at most 5, 10, and 15 lags. To maintain a false-discovery rate of approximately 0.05, specify a significance level of 0.05/3 = 0.0167 for each test. 1
Close the ACF(Returns) and PACF(Returns) figure windows.
2
With Returns selected in the Time Series pane, on the Econometric Modeler tab, in the Tests section, click New Test > Ljung-Box Q-Test.
3
On the LBQ tab, in the Parameters section, set Number of Lags to 5.
4
Set Significance Level to 0.0167.
5
In the Tests section, click Run Test.
6
Repeat steps 3 through 5 twice, with these changes. a
Set Number of Lags to 10 and the DOF to 10.
b
Set Number of Lags to 15 and the DOF to 15.
The test results appear in the Results table of the LBQ(Returns) document. 4-125
4
Econometric Modeler
The Ljung-Box Q-test null hypothesis that all autocorrelations up to the tested lags are zero is not rejected for tests at lags 5, 10, and 15. These results, and the ACF and PACF, suggest that a conditional mean model is not needed for this returns series. Check for Conditional Heteroscedasticity To check the returns for conditional heteroscedasticity, Econometric Modeler requires a series of squared residuals. After importing the squared residuals into the app, visually assess whether there is conditional heteroscedasticity by plotting the ACF and PACF of the squared residuals. Then, determine the appropriate number of lags for a GARCH model of the returns by conducting Engle's ARCH test. Compute the series of squared residuals at the command line by demeaning the returns, then squaring each element of the result. Export Returns to the command line: 1
In the Time Series pane, right-click Returns.
2
In the context menu, select Export.
Returns appears in the MATLAB Workspace. Remove the mean from the returns, then square each element of the result. To ensure all series in the Time Series pane are synchronized, Econometric Modeler prepends first-differenced series with a NaN value. Therefore, to estimate the sample mean, use mean(Returns,'omitnan'). Residuals = Returns - mean(Returns,'omitnan'); Residuals2 = Residuals.^2;
Create a table containing the Returns and Residuals2 variables. Tbl = table(Returns,Residuals,Residuals2);
Import Tbl into Econometric Modeler: 1
4-126
On the Econometric Modeler tab, in the Import section, click
.
2
The app must clear the right pane and all documents before importing new data. Therefore, after clicking Import, in the Econometric Modeler dialog box, click OK.
3
In the Import Data dialog box, in the Import? column, select the check box for the Tbl variable.
Select ARCH Lags for GARCH Model Using Econometric Modeler App
4
Click Import.
The variables appear in the Time Series pane, and a time series plot of all the series appears in the Time Series Plot(Residuals) figure window. Plot the ACF and PACF of the squared residuals. 1
Close the Time Series Plot(Residuals) figure window.
2
In the Time Series pane, select the Residuals2 time series.
3
Click the Plots tab, then click ACF.
4
Click the Plots tab, then click PACF.
5
Drag the PACF(Residuals2) figure window below the ACF(Residuals2) figure window so that you can view them simultaneously.
The sample ACF and PACF of the squared returns show significant autocorrelation. This result suggests that a GARCH model with lagged variances and lagged squared innovations might be appropriate for modeling the returns. 4-127
4
Econometric Modeler
Conduct Engle's ARCH test on the residuals series. Specify a two-lag ARCH model alternative hypothesis. 1
Close all figure windows.
2
In the Time Series pane, select the Residuals time series.
3
On the Econometric Modeler tab, in the Tests section, click New Test > Engle's ARCH Test.
4
On the ARCH tab, in the Parameters section, set Number of Lags to 2.
5
In the Tests section, click Run Test.
The test results appear in the Results table of the ARCH(Residuals) document.
Engle's ARCH test rejects the null hypothesis of no ARCH effects in favor of the alternative ARCH model with two lagged squared innovations. An ARCH model with two lagged innovations is locally equivalent to a GARCH(1,1) model. Create and Fit GARCH Model Fit a GARCH(1,1) model to the returns series.
4-128
1
In the Time Series pane, select the Returns time series.
2
Click the Econometric Modeler tab. Then, in the Models section, click the arrow to display the models gallery.
3
In the models gallery, in the GARCH Models section, click GARCH.
4
In the GARCH Model Parameters dialog box, on the Lag Order tab: a
Set GARCH Degree to 1.
b
Set ARCH Degree to 1.
c
Because the returns required demeaning, include an offset by selecting the Include Offset check box.
Select ARCH Lags for GARCH Model Using Econometric Modeler App
5
Click Estimate.
The model variable GARCH_Returns appears in the Models pane, its value appears in the Preview pane, and its estimation summary appears in the Model Summary(GARCH_Returns) document.
4-129
4
Econometric Modeler
An alternative way to select lags for a GARCH model is by fitting several models containing different lag polynomial degrees. Then, choose the model yielding the minimal AIC.
See Also Apps Econometric Modeler Objects garch Functions estimate
More About
4-130
•
“Specify Conditional Variance Model for Exchange Rates” on page 8-47
•
“Analyze Time Series Data Using Econometric Modeler” on page 4-2
Estimate Multiplicative ARIMA Model Using Econometric Modeler App
Estimate Multiplicative ARIMA Model Using Econometric Modeler App This example uses the Box-Jenkins method [1] to determine a multiplicative seasonal ARIMA (SARIMA) model for a univariate series by using the Econometric Modeler app. Specifically, the example shows this procedure: 1
Visually determine whether the series has an exponential trend, and then remove the trend if one is present. The remaining steps determine the structure of the conditional mean model to fit to the series without the exponential trend.
2
Visually determine whether the transformed series has seasonal integration, and then remove the seasonal integration, if it is present, by taking the seasonal difference. Record the degrees of seasonal integration removed.
3
Determine whether the transformed series has a unit root by conducting a statistical test, and then apply the first difference to the series to stabilize it, if a unit root is present. Record whether a unit root exists.
4
Determine the nonseasonal AR and MA degrees of the conditional mean model for the stationary series by inspecting correlograms of the stabilized series. Record the polynomial degrees.
5
Use the results of steps 2 through 4 to create and estimate a conditional mean model for the nonstationary series with the exponential trend removed.
The data set Data_Airline.mat contains monthly counts of airline passengers. Import Data into Econometric Modeler At the command line, load the Data_Airline.mat data set. load Data_Airline
At the command line, open the Econometric Modeler app. econometricModeler
Alternatively, open the app from the apps gallery (see Econometric Modeler). Import DataTimeTable into the app: 1
On the Econometric Modeler tab, in the Import section, click the Import button
2
In the Import Data dialog box, in the Import? column, select the check box for the DataTimeTable variable.
3
Click Import.
.
The variable PSSG appears in the Time Series pane, its value appears in the Preview pane, and its time series plot appears in the Time Series Plot(PSSG) figure window.
4-131
4
Econometric Modeler
The series exhibits a seasonal trend, serial correlation, and possible exponential growth. For an interactive analysis of serial correlation, see “Detect Serial Correlation Using Econometric Modeler App” on page 4-71. Stabilize Series To properly determine the lag operator polynomial orders for the conditional mean model, correlograms require a stationary series. Address the exponential trend by applying the log transform to PSSG. 1
In the Time Series pane, select PSSG.
2
On the Econometric Modeler tab, in the Transforms section, click Log.
The transformed variable PSSGLog appears in the Time Series pane, its value appears in the Preview pane, and its time series plot appears in the Time Series Plot(PSSGLog) figure window.
4-132
Estimate Multiplicative ARIMA Model Using Econometric Modeler App
The exponential growth appears to be removed from the series. The rest of the example determines the parameters of conditional mean model for the series PSSGLog by several visual and statistical diagnoses and transformations. Address the seasonal trend by applying the 12th order seasonal difference. With PSSGLog selected in the Time Series pane, on the Econometric Modeler tab, in the Transforms section, set Seasonal to 12. Then, click Seasonal. The transformed variable PSSGLogSeasonalDiff appears in the Time Series pane, and its time series plot appears in the Time Series Plot(PSSGLogSeasonalDiff) figure window.
4-133
4
Econometric Modeler
The transformed series appears to have a unit root. Test the null hypothesis that PSSGLogSeasonalDiff has a unit root by using the Augmented DickeyFuller test. Specify that the alternative is an AR(0) model, then test again specifying an AR(1) model. Adjust the significance level to 0.025 to maintain a total significance level of 0.05. 1
With PSSGLogSeasonalDiff selected in the Time Series pane, on the Econometric Modeler tab, in the Tests section, click New Test > Augmented Dickey-Fuller Test.
2
On the ADF tab, in the Parameters section, set Significance Level to 0.025.
3
In the Tests section, click Run Test.
4
In the Parameters section, set Number of Lags to 1.
5
In the Tests section, click Run Test.
The test results appear in the Results table of the ADF(PSSGLogSeasonalDiff) document.
4-134
Estimate Multiplicative ARIMA Model Using Econometric Modeler App
Both tests fail to reject the null hypothesis that the series is a unit root process. Address the unit root by applying the first difference to PSSGLogSeasonalDiff. With PSSGLogSeasonalDiff selected in the Time Series pane, click the Econometric Modeler tab. Then, in the Transforms section, click Difference. The transformed variable PSSGLogSeasonalDiffDiff appears in the Time Series pane, and its time series plot appears in the Time Series Plot(PSSGLogSeasonalDiffDiff) figure window. In the Time Series pane, rename the PSSGLogSeasonalDiffDiff variable by clicking it twice to select its name and entering PSSGStable. The app updates the names of all documents associated with the transformed series.
4-135
4
Econometric Modeler
Identify Model for Series Determine the lag structure for a conditional mean model of the series by plotting the sample autocorrelation function (ACF) and partial autocorrelation function (PACF) of the stabilized series.
4-136
1
With PSSGStable selected in the Time Series pane, click the Plots tab, then click ACF.
2
Show the first 50 lags of the ACF. On the ACF tab, set Number of Lags to 50.
3
Click the Plots tab, then click PACF.
4
Show the first 50 lags of the PACF. On the PACF tab, set Number of Lags to 50.
5
Drag the ACF(PSSGStable) figure window above the PACF(PSSGStable) figure window.
Estimate Multiplicative ARIMA Model Using Econometric Modeler App
According to [1], the autocorrelations in the ACF and PACF, the results in previous steps, and the balance of model parsimony with complexity, suggest that the following SARIMA(0,1,1)×(0,1,1)12 model is appropriate for the nonstationary series with the exponential trend removed PSSGLog. (1 − L) 1 − L12 yt = 1 + θ1L 1 + Θ12L12 εt . Close all figure windows. Specify and Estimate SARIMA Model Specify the SARIMA(0,1,1)×(0,1,1)12 model. 1
In the Time Series pane, select the PSSGLog time series.
2
On the Econometric Modeler tab, in the Models section, click the arrow > SARIMA.
3
In the SARIMA Model Parameters dialog box, on the Lag Order tab: • Nonseasonal section a
Set Degrees of Integration to 1. 4-137
4
Econometric Modeler
b
Set Moving Average Order to 1.
c
Clear the Include Constant Term check box.
• Seasonal section
4
a
Set Period to 12 to indicate monthly data.
b
Set Moving Average Order to 1.
c
Select the Include Seasonal Difference check box.
Click Estimate.
The model variable SARIMA_PSSGLog appears in the Models pane, its value appears in the Preview pane, and its estimation summary appears in the Model Summary(SARIMA_PSSGLog) document.
4-138
Estimate Multiplicative ARIMA Model Using Econometric Modeler App
The results include: • Model Fit — A time series plot of PSSGLog and the fitted values from SARIMA_PSSGLog. • Residual Plot — A time series plot of the residuals of SARIMA_PSSGLog. • Parameters — A table of estimated parameters of SARIMA_PSSGLog. Because the constant term was held fixed to 0 during estimation, its value and standard error are 0. • Goodness of Fit — The AIC and BIC fit statistics of SARIMA_PSSGLog.
References [1] Box, George E. P., Gwilym M. Jenkins, and Gregory C. Reinsel. Time Series Analysis: Forecasting and Control. 3rd ed. Englewood Cliffs, NJ: Prentice Hall, 1994.
See Also Apps Econometric Modeler 4-139
4
Econometric Modeler
Objects arima Functions estimate
More About
4-140
•
“Create Multiplicative Seasonal ARIMA Model for Time Series Data” on page 7-51
•
“Estimate Multiplicative ARIMA Model” on page 7-117
•
“Analyze Time Series Data Using Econometric Modeler” on page 4-2
•
“Share Results of Econometric Modeler App Session” on page 4-237
•
Creating ARIMA Models Using Econometric Modeler App
Perform ARIMA Model Residual Diagnostics Using Econometric Modeler App
Perform ARIMA Model Residual Diagnostics Using Econometric Modeler App This example shows how to evaluate ARIMA model assumptions by performing residual diagnostics in the Econometric Modeler app. The data set, which is stored in Data_JAustralian.mat, contains the log quarterly Australian Consumer Price Index (CPI) measured from 1972 and 1991, among other time series. Import Data into Econometric Modeler At the command line, load the Data_JAustralian.mat data set. load Data_JAustralian
At the command line, open the Econometric Modeler app. econometricModeler
Alternatively, open the app from the apps gallery (see Econometric Modeler). Import DataTimeTable into the app: 1
On the Econometric Modeler tab, in the Import section, click the Import button
2
In the Import Data dialog box, in the Import? column, select the check box for the DataTimeTable variable.
3
Click Import.
.
The variables, including PAU, appear in the Time Series pane, and a time series plot containing all the series appears in the Time Series Plot(EXCH) figure window. Create a time series plot of PAU by double-clicking PAU in the Time Series pane.
4-141
4
Econometric Modeler
Specify and Estimate ARIMA Model Estimate an ARIMA(2,1,0) model for the log quarterly Australian CPI (for details, see “Implement Box-Jenkins Model Selection and Estimation Using Econometric Modeler App” on page 4-112).
4-142
1
In the Time Series pane, select the PAU time series.
2
On the Econometric Modeler tab, in the Models section, click ARIMA.
3
In the ARIMA Model Parameters dialog box, on the Lag Order tab: a
Set the Degree of Integration to 1.
b
Set the Autoregressive Order to 2.
Perform ARIMA Model Residual Diagnostics Using Econometric Modeler App
4
Click Estimate.
The model variable ARIMA_PAU appears in the Models pane, its value appears in the Preview pane, and its estimation summary appears in the Model Summary(ARIMA_PAU) document.
4-143
4
Econometric Modeler
In the Model Summary(ARIMA_PAU) document, the Residual Plot figure is a time series plot of the residuals. The plot suggests that the residuals are centered at y = 0 and they exhibit volatility clustering. Perform Residual Diagnostics Visually assess whether the residuals are normally distributed by plotting their histogram and a quantile-quantile plot: 1
Close the Model Summary(ARIMA_PAU) document.
2
With ARIMA_PAU selected in the Models pane, on the Econometric Modeler tab, in the Diagnostics section, click Residual Diagnostics > Residual Histogram.
3
Click Residual Diagnostics > Residual Q-Q Plot.
Inspect the histogram by clicking the Histogram(ARIMA_PAU) figure window.
4-144
Perform ARIMA Model Residual Diagnostics Using Econometric Modeler App
Inspect the quantile-quantile plot by clicking the QQPlot(ARIMA_PAU) figure window.
4-145
4
Econometric Modeler
The residuals appear approximately normally distributed. However, there is an excess of large residuals, which indicates that a t innovation distribution might be a reasonable model modification. Visually assess whether the residuals are serially correlated by plotting their autocorrelations. With ARIMA_PAU selected in the Models pane, in the Diagnostics section, click Residual Diagnostics > Autocorrelation Function.
4-146
Perform ARIMA Model Residual Diagnostics Using Econometric Modeler App
All lags that are greater than 0 correspond to insignificant autocorrelations. Therefore, the residuals are uncorrelated in time. Visually assess whether the residuals exhibit heteroscedasticity by plotting the ACF of the squared residuals. With ARIMA_PAU selected in the Models pane, click the Econometric Modeler tab. Then, click the Diagnostics section, click Residual Diagnostics > Squared Residual Autocorrelation.
4-147
4
Econometric Modeler
Significant autocorrelations occur at lags 4 and 5, which suggests a composite conditional mean and variance model for PAU.
See Also Apps Econometric Modeler Objects arima Functions estimate | infer
More About
4-148
•
“Infer Residuals for Diagnostic Checking” on page 7-138
•
“Perform GARCH Model Residual Diagnostics Using Econometric Modeler App” on page 4-230
Perform ARIMA Model Residual Diagnostics Using Econometric Modeler App
•
“Implement Box-Jenkins Model Selection and Estimation Using Econometric Modeler App” on page 4-112
•
“Specify t Innovation Distribution Using Econometric Modeler App” on page 4-150
•
Creating ARIMA Models Using Econometric Modeler App
4-149
4
Econometric Modeler
Specify t Innovation Distribution Using Econometric Modeler App This example shows how to specify a t innovation distribution for an ARIMA model by using the Econometric Modeler app. The example also shows how to fit the model to data. The data set, which is stored in Data_JAustralian.mat, contains the log quarterly Australian Consumer Price Index (CPI) measured from 1972 and 1991, among other time series. Import Data into Econometric Modeler At the command line, load the Data_JAustralian.mat data set. load Data_JAustralian
At the command line, open the Econometric Modeler app. econometricModeler
Alternatively, open the app from the apps gallery (see Econometric Modeler). Import DataTimeTable into the app: 1
On the Econometric Modeler tab, in the Import section, click the Import button
2
In the Import Data dialog box, in the Import? column, select the check box for the DataTimeTable variable.
3
Click Import.
.
The variables, including PAU, appear in the Time Series pane, and a time series plot containing all the series appears in the Time Series Plot(EXCH) figure window. Create a time series plot of PAU by double-clicking PAU in the Time Series pane.
4-150
Specify t Innovation Distribution Using Econometric Modeler App
Specify and Estimate ARIMA Model Estimate an ARIMA(2,1,0) model for the log quarterly Australian CPI. Specify a t innovation distribution. (For details, see “Implement Box-Jenkins Model Selection and Estimation Using Econometric Modeler App” on page 4-112 and “Perform ARIMA Model Residual Diagnostics Using Econometric Modeler App” on page 4-141.) 1
In the Time Series pane, select the PAU time series.
2
On the Econometric Modeler tab, in the Models section, click ARIMA.
3
In the ARIMA Model Parameters dialog box, on the Lag Order tab: a
Set the Degree of Integration to 1.
b
Set the Autoregressive Order to 2.
c
Click the Innovation Distribution button, then select t.
4-151
4
Econometric Modeler
4
Click Estimate.
The model variable ARIMA_PAU appears in the Models pane, its value appears in the Preview pane, and its estimation summary appears in the Model Summary(ARIMA_PAU) document.
4-152
Specify t Innovation Distribution Using Econometric Modeler App
The app estimates the t innovation degrees of freedom (DoF) along with the model coefficients and variance.
See Also Apps Econometric Modeler Objects arima Functions estimate
More About •
“Specify Conditional Mean Model Innovation Distribution” on page 7-69
•
“Implement Box-Jenkins Model Selection and Estimation Using Econometric Modeler App” on page 4-112 4-153
4
Econometric Modeler
•
4-154
“Perform ARIMA Model Residual Diagnostics Using Econometric Modeler App” on page 4-141
Estimate Vector Autoregression Model Using Econometric Modeler
Estimate Vector Autoregression Model Using Econometric Modeler This example models the quarterly US GDP growth rate, M1 money supply rate, and the 3-month Tbill rate series by using the Econometric Modeler app. The example shows how to perform the following actions in the app: 1
Stabilize the raw nonstationary series.
2
Fit several competing vector autoregression (VAR) models, and choose the one with the best, parsimonious fit.
3
Diagnose each residual series.
4
Export the chosen model to the command line.
At the command line, the example conducts Granger-causality tests on the series given the estimated model, and it uses the model to generate forecasts. The data set, which is stored in Data_USEconModel.mat, contains the raw, quarterly US GDP, M1 money supply, and 3-month T-bill rate, among other series, from 1947 through 2009. Load and Import Data into Econometric Modeler At the command line, load the Data_USEconModel.mat data set. load Data_USEconModel
At the command line, open the Econometric Modeler app. econometricModeler
Alternatively, open the app from the apps gallery (see Econometric Modeler). Import DataTimeTable into the app: 1
On the Econometric Modeler tab, in the Import section, click the Import button
2
In the Import Data dialog box, in the Import? column, select the check box for the DataTimeTable variable.
3
Click Import.
.
GDP, M1SL, and TB3MS, among other series, appear in the Time Series pane, and a time series plot containing all series appears in the figure window. Create separate time series plots by double-clicking each of GDP, M1SL, and TB3MS in the Time Series pane. Position the Time Series Plot(series) tabs by clicking and dragging each to see all plots simultaneously.
4-155
4
Econometric Modeler
The US GDP and M1 money supply series exhibit exponential growth, and the 3-month T-Bill series appears like a random walk. Diagnose and Transform Series Remove the exponential trend from the GDP and M1 money supply series by applying the log transform to each series. Click the Econometric Modeler tab, and then, in the Time Series pane, click GDP and Ctrl click M1SL. In the Transforms section, click Log. The transformed series GDPLog and M1SLLog appear in the Time Series pane, and their time series plot appears in the Time Series Plot(GDPLog) figure window.
4-156
Estimate Vector Autoregression Model Using Econometric Modeler
Test the null hypothesis that each series is a unit root process against a stationary AR(p) with drift alternative, where p = 4 through 1. For each series GDPLog, M1SLLog, and TB3MS: 1
On the Econometric Modeler tab, in the Time Series pane, click the series.
2
On the Econometric Modeler tab, in the Time Series pane, click New Test > Augmented Dickey-Fuller Test.
3
On the ADF tab, in the Parameters section, in the Number of Lags box, type 4.
4
In the Tests section, click Run.
5
Repeat steps 3 and 4 for lags 3, 2, and 1.
The following figures show the results. The tests fail to reject the null hypotheses of a unit root series in all cases, which suggests that each series is difference stationary. 4-157
4
Econometric Modeler
Stabilize the series by applying the first difference to each series. 1
Click the Econometric Modeler tab, and then, in the Time Series pane, click GDPLog and Ctrl +click M1SLLog and TB3MS.
2
In the Transforms section, click Difference. The transformed series GDPLogDiff, M1SLLogDiff, and TB3MSDiff appear in the Time Series pane, and their time series plot appears in the Time Series Plot(GDPLogDiff) figure window.
3
Rename GDPLogDiff and M1SLLogDiff to GDPRate and M1SLRate by clicking their names twice in the Time Series pane, and typing their new names.
Estimate VAR Models Estimate 3-D VAR(p) models of the US quarterly GDP growth rate series GDPRate, M1 money supply growth rate series M1SLRate, and change in the 3-month treasury bill rate series TB3MSDiff, where p = 1 through 4.
4-158
1
In the Time Series pane, click GDPRate and Ctrl+click M1SLRate and TB3MSDiff.
2
In the Models section, click VAR.
3
Fit a VAR(1) model (the default) by clicking Estimate in the VAR Model Parameters dialog box.
Estimate Vector Autoregression Model Using Econometric Modeler
The model variable VAR appears in the Models pane, its value appears in the Preview pane, and its estimation summary appears in the Model Summary(VAR) document. 4
Repeat steps 1 and 2 for each AR order p = 2 through 4. In the Var Model Parameters dialog box, set the AR order by using the Autoregressive Order box. Similar to the VAR(1) estimation, a variable for each model (VAR2, VAR3, and VAR4) appears in the Models pane, and their estimation summaries appear in their respective Model Summary(ModelName) document. You can view properties of an estimated model in the Preview pane by clicking the model in the Models pane. For example, click VAR4.
4-159
4
Econometric Modeler
Select Model with Best In-Sample Fit The estimation summary in each Model Summary(VARp) tab contains a plot of fitted values and residuals, with respect to the time series in the Time Series list, a standard statistical table of estimates and inferences, and a table of information criteria. Compare the information criteria of each estimated model simultaneously by positioning the estimation summary documents so that they occupy the four quadrants of the right pane. The model with the lowest value has the best, parsimonious fit.
4-160
Estimate Vector Autoregression Model Using Econometric Modeler
The VAR(2) model VAR2 produces the lowest AIC and BIC values. Choose this model for further analysis. Check Goodness of Fit Inspect the following VAR(2) plots of each residual series: • Histograms, for center, normality, and outliers
4-161
4
Econometric Modeler
• Quantile-quantile plots, for normality, skewness, and tails • Autocorrelation function (ACF), for serial correlation • ACF of squared residual series, for heteroscedasticity This example diagnoses the residuals visually. Alternatively, you can conduct statistical tests to diagnose the residuals. In the Model Summary(VAR2) document, click Document Actions > Tile All, and then click the Single radio button.
On the Models pane, click VAR2. Plot separate residual histograms. On the Econometric Modeler tab, in the Diagnostics section, click Residual Diagnostics > Residual Histogram. Histograms of the each residual series appear in the Histogram(VAR2) document.
4-162
Estimate Vector Autoregression Model Using Econometric Modeler
Each residual series appears centered around 0, with varying degrees of slight skewness and possible outliers. This example proceeds without addressing possible skewness and outliers. Plot separate residual quantile-quantile plots. With VAR2 selected in the Time Series pane, on the Econometric Modeler tab, in the Diagnostics section, click Residual Diagnostics > Residual QQ Plot. Quantile-quantile plots of each residual series appear in the QQPlot(VAR2) document.
4-163
4
Econometric Modeler
Each series has slightly fatter tails than what is expected by the normal distribution. This example proceeds without addressing possible kurtosis. Plot separate ACF plots of each residual series. With VAR2 selected in the Time Series pane, on the Econometric Modeler tab, in the Diagnostics section, click Residual Diagnostics > Autocorrelation Function. ACF plots of each residual series appear in the ACF(VAR2) document.
4-164
Estimate Vector Autoregression Model Using Econometric Modeler
Each series has several significant, albeit small, autocorrelations. For example, the GDPRate residual series has significant autocorrelations at lags 4 and 16, and the M1SLRate and TB3MSDiff quarterly difference series both have significant autocorrelations at lags 7. To address the autocorrelations, you can include higher AR lags in the VAR model for estimation, but this example proceeds without addressing the autocorrelations this way. Plot separate ACF plots of each squared residual series. With VAR2 selected in the Time Series pane, on the Econometric Modeler tab, in the Diagnostics section, click Residual Diagnostics > Squared Residual Autocorrelation. ACF plots of the each squared residual series appear in the ACF(VAR2)2 document.
4-165
4
Econometric Modeler
The M1SLRate and TB3MSDiff series have significant autocorrelations, which suggests that heteroscedasticity is present. This example proceeds without addressing possible heteroscedasticity. Export Model to Workspace Export the model to the workspace. 1
With the VAR2 model selected in the Models pane, on the Econometric Modeler tab, in the Export section, click Export > Export Variables.
2
In the Export Variables dialog box, select the Select check box for GDPRate, M1SLRate, and TB3MSDiff.
3
Click Export.
The variables GDPRate, M1SLRate, TB3MSDiff, and VAR2 appear in the workspace.
4-166
Estimate Vector Autoregression Model Using Econometric Modeler
Conduct Causality Analysis at Command Line Determine whether series in the system Granger-cause the other series by conducting 1-step, leaveone-out Granger-causality tests. Pass the estimated VAR(2) model to gctest. gctest(VAR2) H0 _______________________________________________
Decision __________________
"Exclude "Exclude "Exclude "Exclude "Exclude "Exclude
"Reject "Reject "Cannot "Reject "Reject "Cannot
lagged lagged lagged lagged lagged lagged
M1SLRate in GDPRate equation" TB3MSDiff in GDPRate equation" GDPRate in M1SLRate equation" TB3MSDiff in M1SLRate equation" GDPRate in TB3MSDiff equation" M1SLRate in TB3MSDiff equation"
H0" H0" reject H0" H0" H0" reject H0"
Distribution ____________ "Chi2(2)" "Chi2(2)" "Chi2(2)" "Chi2(2)" "Chi2(2)" "Chi2(2)"
The null hypothesis of the 1-step leave-one-out Granger-causality test is that a series does not 1-step Granger-cause another series, conditioned on all other series being present in the system. The results suggest: • Given that the 3-month T-bill change is in the system, enough evidence exists to suggest that M1 money supply rate 1-step Granger-causes the GDP rate. • Given that the M1 money supply rate is in the system, enough evidence exists to suggest that the 3-month T-bill change 1-step Granger-causes the GDP rate. • Given that the 3-month T-bill change is in the system, not enough evidence exists to suggest that the GDP rate 1-step Granger-causes the M1 money supply rate. • Given that the GDP rate is in the system, enough evidence exists to suggest that the 3-month T-bill change 1-step Granger-causes the M1 money supply rate. • Given that the M1 money supply rate is in the system, enough evidence exists to suggest that the GDP rate 1-step Granger-causes the 3-month T-bill change. • Given that the GDP rate is in the system, not enough evidence exists to suggest that the 1-step M1 money supply rate Granger-causes the 3-month T-bill change. Generate Forecasts at Command Line Generate forecasts and approximate 95% forecast intervals from the estimated VAR(2) model for the next four years (16 quarters). For convenience, use the entire series as a presample for the forecasts. The forecast function discards all specified presample observations except for the required final two observations. Y = [GDPRate M1SLRate TB3MSDiff]; [YF,YFMSE] = forecast(VAR2,16,Y); YFSE = cell2mat(cellfun(@(x)sqrt(diag(x)'),YFMSE, ... UniformOutput=false)); UB = YF + 1.96*YFSE; LB = YF - 1.96*YFSE; datesF = DataTimeTable.Time(end) + calquarters(1:16); figure tiledlayout(3,1) for j = 1:VAR2.NumSeries nexttile h1 = plot(DataTimeTable.Time(end-30:end),Y(end-30:end,j), ... Color=[.75,.75,.75]);
4-167
Stat ____
8. 16 3. 20 20 0.4
4
Econometric Modeler
hold on h2 = plot(datesF,YF(:,j),"r",LineWidth=2); h3 = plot(datesF,UB(:,j),"k--",LineWidth=1.5); plot(datesF,LB(:,j),"k--",LineWidth=1.5); ct = [Y(end,j) YF(1,j); Y(end,j) LB(1,j); Y(end,j) UB(1,j);]; plot([DataTimeTable.Time(end); datesF(1)],ct,Color=[.75,.75,.75]) legend([h1 h2 h3],VAR2.SeriesNames(j),"Forecast", ... "Forecast interval",Location="northwest") hold off end
See Also Apps Econometric Modeler Objects varm Functions gctest | forecast
More About • 4-168
“Analyze Time Series Data Using Econometric Modeler” on page 4-2
Estimate Vector Autoregression Model Using Econometric Modeler
•
“Specifying Multivariate Lag Operator Polynomials and Coefficient Constraints Interactively” on page 4-50
•
“Forecast VAR Model Conditional Responses” on page 9-66
4-169
4
Econometric Modeler
Conduct Cointegration Test Using Econometric Modeler This example models the annual Canadian inflation and interest rate series by using the Econometric Modeler app. The example performs the following actions in the app: 1
Test each raw series for stationarity.
2
Test for cointegration using the Engle-Granger cointegration test.
3
Test for cointegration among all possible cointegration ranks using the Johansen cointegration test.
The data set, which is stored in Data_Canada, contains annual Canadian inflation and interest rates from 1954 through 1994. Load and Import Data into Econometric Modeler At the command line, load the Data_Canada.mat data set. load Data_Canada
At the command line, open the Econometric Modeler app. econometricModeler
Alternatively, open the app from the apps gallery (see Econometric Modeler). Import DataTimeTable into the app: 1
On the Econometric Modeler tab, in the Import section, click the Import button
2
In the Import Data dialog box, in the Import? column, select the check box for the DataTimeTable variable.
3
Click Import.
.
The Canadian interest and inflation rate variables appear in the Time Series pane, and a time series plot of all the series appears in the Time Series Plot(INF_C) figure window.
4-170
Conduct Cointegration Test Using Econometric Modeler
Plot only the three interest rate series INT_L, INT_M, and INT_S together. In the Time Series pane, click INT_L and Ctrl click INT_M and INT_S. Then, on the Plots tab, in the Plots section, click Time Series. The time series plot appears in the Time Series(INT_L) document.
4-171
4
Econometric Modeler
The interest rate series each appear nonstationary, and they appear to move together with meanreverting spread. In other words, they exhibit cointegration. To establish these properties, this example conducts statistical tests. Conduct Stationarity Test Assess whether each interest rate series is stationary by conducting Phillips-Perron unit-root tests. For each series, assume a stationary AR(1) process with drift for the alternative hypothesis. You can confirm this property by viewing the autocorrelation and partial autocorrelation function plots. Close all plots in the right pane, and perform the following procedure for each series INT_L, INT_M, and INT_S.
4-172
1
In the Time Series pane, click a series.
2
On the Econometric Modeler tab, in the Tests section, click New Test > Phillips-Perron Test.
3
On the PP tab, in the Parameters section, in the Number of Lags box, type 1, and in the Model list select Autoregressive with Drift.
Conduct Cointegration Test Using Econometric Modeler
4
In the Tests section, click Run Test. Test results for the selected series appear in the PP(series) tab.
Position the test results to view them simultaneously.
All tests fail to reject the null hypothesis that the series contains a unit root process. Conduct Cointegration Tests Econometric Modeler supports the Engle-Granger and Johansen cointegration tests. Before conducting a cointegration, determine whether there is visual evidence of cointegration by regressing each interest rate on the other interest rates. 1
In the cointegrating relation, assign INT_L as the dependent variable and INT_M and INT_S as the independent variables. In the Time Series pane, click INT_L. 4-173
4
Econometric Modeler
4-174
2
On the Econometric Modeler tab, in the Models section, click the arrow > MLR.
3
In the MLR Model Parameters dialog box, select the Include? check boxes of the time series INT_M and INT_S.
4
Click Estimate. The model variable MLR_INT_L appears in the Models pane, its value appears in the Preview pane, and its estimation summary appears in the Model Summary(MLR_INT_L) document.
5
Iterate steps 1 through 4 twice. For the first iteration, assign INT_M as the dependent variable and INT_L and INT_S as the independent variables. For the second iteration, assign INT_S as the dependent variable and INT_M and INT_L as the independent variables.
6
On each Model Summary(MLR_seriesName) tab, determine whether the residual plot appears stationary.
Conduct Cointegration Test Using Econometric Modeler
4-175
4
Econometric Modeler
The residual series of the regression on INT_L appears trending, and the other residual series appear stationary. For the Engle-Granger test, choose INT_S as the dependent variable. Conduct Engle-Granger Test The Engle-Granger tests for one cointegrating relation by performing two univariate regressions: the cointegrating regression and the subsequent residual regression (for more details, see “Identifying Single Cointegrating Relations” on page 9-113). Therefore, to perform the cointegrating regression, the test requires: • The response series that takes the role of the dependent variable in the first regression. • Deterministic terms to include in the cointegrating regression. Then, the test assesses whether the residuals resulting from the cointegrating regression are a unit root process. Available tests are the Augmented Dickey-Fuller (adftest) or Phillips-Perron (pptest) test. Both perform a residual regression to form the test statistic. Therefore, the test also requires: • The unit root test to conduct, either the adftest or pptest function. The residual regression model for both tests is an AR model without deterministic terms. For more flexible models, such as AR models with drift, call the functions at the command line. • Number of lagged residuals to include in the AR model. Conduct the Engle-Granger test.
4-176
1
In the Time Series pane, click INT_L and Ctrl+click INT_M and INT_S.
2
In the Tests section, click New Test > Engle-Granger Test.
3
On the EGCI tab, in the Parameters section: a
In the Dependent Variable list, select INT_S.
b
In the Residual Regression Form list, select PP for the Phillips-Perron unit root test.
c
In the Number of Lags box, type 1.
Conduct Cointegration Test Using Econometric Modeler
4
In the Tests section, click Run Test. Test results appear in the EGCI document.
5
In the Tests section, click Run Test again. Test results and a plot of the cointegration relation for the largest rank appear in the EGCI tab.
The test rejects the null hypothesis that the series does not exhibit cointegration. Although the test is suited to determine whether series are cointegrated, the results are limited to that. For example, the test does not give insight into the cointegrating rank, which is required to form a VEC model of the series. For more details, see “Identifying Single Cointegrating Relations” on page 9-113. Johansen Test The Johansen cointegration test runs a separate test for each possible cointegration rank 0 through m – 1, where m is the number of series. This characteristic makes the Johansen test better suited than the Engle-Granger test to determine the cointegrating rank for a VEC model of the series. Also, the Johansen test framework is multivariate, and, therefore, results are not relative to an arbitrary choice for a univariate regression response variable. For more details, see “Identifying Multiple Cointegrating Relations” on page 9-136. 4-177
4
Econometric Modeler
The Johansen test requires: • Deterministic terms to include in the cointegrating relation and the model in levels. • Number of short-run lags in the VEC model of the series. Because the raw series do not contain a linear trend, assume that the only deterministic term in the model is an intercept in the cointegrating relation (H1* Johansen form), and include 1 lagged difference term in the model.
4-178
1
With INT_L, INT_M, and INT_S selected in the Time Series pane, click Tests section, click New Test > Johansen Test.
2
On the JCI tab, in the Parameters section, in the Number of Lags box, type 1, and in the Model list select H1*.
3
In the Tests section, click Run Test. Test results and a plot of the cointegration relation for the largest rank appear in the JCI tab.
Conduct Cointegration Test Using Econometric Modeler
Econometric Modeler conducts a separate test for each cointegration rank 0 through 2 (the number of series – 1). The test rejects the null hypothesis of no cointegration (Cointegration rank = 0), but fails to reject the null hypotheses of Cointegration rank ≤ 1 and Cointegration rank ≤ 2. The results suggest that the cointegration rank is 1.
See Also Apps Econometric Modeler Objects vecm Functions forecast
More About •
“Analyze Time Series Data Using Econometric Modeler” on page 4-2
•
“Estimate Vector Error-Correction Model Using Econometric Modeler” on page 4-180
4-179
4
Econometric Modeler
Estimate Vector Error-Correction Model Using Econometric Modeler This example models the annual Canadian inflation and interest rate series by using the Econometric Modeler app. The example shows how to perform the following actions in the app: 1
Test each raw series for stationarity.
2
Test for cointegration and determine a Johansen cointegration form if cointegration is present.
3
Fit several completing vector error-correction (VEC) models, and choose the one with the best, parsimonious fit.
4
Diagnose each residual series.
5
Export the chosen model to the command line.
At the command line, the example uses the model to generate forecasts. The data set, which is stored in Data_Canada, contains annual Canadian inflation and interest rates from 1954 through 1994. Load and Import Data into Econometric Modeler At the command line, load the Data_Canada.mat data set. load Data_Canada
At the command line, open the Econometric Modeler app. econometricModeler
Alternatively, open the app from the apps gallery (see Econometric Modeler). Import DataTimeTable into the app: 1
On the Econometric Modeler tab, in the Import section, click the Import button
2
In the Import Data dialog box, in the Import? column, select the check box for the DataTimeTable variable.
3
Click Import.
.
The Canadian interest and inflation rate variables appear in the Time Series pane, and a time series plot of all the series appears in the Time Series Plot(INF_C) figure window.
4-180
Estimate Vector Error-Correction Model Using Econometric Modeler
Plot only the three interest rate series INT_L, INT_M, and INT_S together. In the Time Series pane, click INT_L and Ctrl+click INT_M and INT_S. Then, on the Plots tab, in the Plots section, click Time Series. The time series plot appears in the Time Series(INT_L) document.
4-181
4
Econometric Modeler
The interest rate series each appear nonstationary, and they appear to move together with meanreverting spread. In other words, they exhibit cointegration. To establish these properties, this example conducts statistical tests. Conduct Stationarity Test Assess whether each interest rate series is stationary by conducting Phillips-Perron unit root tests. For each series, assume a stationary AR(1) process with drift for the alternative hypothesis. You can confirm this property by viewing the autocorrelation and partial autocorrelation function plots. Close all plots in the right pane, and perform the following procedure for each series INT_L, INT_M, and INT_S.
4-182
1
In the Time Series pane, click a series.
2
On the Econometric Modeler tab, in the Tests section, click New Test > Phillips-Perron Test.
3
On the PP tab, in the Parameters section, in the Number of Lags box, type 1, and in the Model list select Autoregressive with Drift.
Estimate Vector Error-Correction Model Using Econometric Modeler
4
In the Tests section, click Run Test. Test results for the selected series appear in the PP(series) tab.
Position the test results to view them simultaneously.
All tests fail to reject the null hypothesis that the series contains a unit root process. Conduct Cointegration Test To create and estimate a VEC model of unit root series, the series need to exhibit cointegration. Conduct a Johansen test. Because the raw series do not contain a linear trend, assume that the only deterministic term in the model is an intercept in the cointegrating relation (H1* Johansen form), and include 1 lagged difference term in the model. 1
In the Time Series pane, click INT_L and Ctrl+click INT_M and INT_S. 4-183
4
Econometric Modeler
2
In the Tests section, click New Test > Johansen Test.
3
On the JCI tab, in the Parameters section, in the Number of Lags box, type 1. In the Model list, select H1*.
4
In the Tests section, click Run Test. Test results and a plot of the cointegration relation for the largest rank appear in the JCI tab.
Econometric Modeler conducts a separate test for each cointegration rank 0 through 2 (the number of series – 1). The test rejects the null hypothesis of no cointegration (Cointegration rank = 0), but fails to reject the null hypothesis of Cointegration rank ≤ 1. The conclusion is to set the cointegration rank of the VEC model to 1. Estimate VEC Models Estimate 3-D VEC(p) models of the interest rate series, with a cointegration rank of 1 and p = 1 and 2.
4-184
Estimate Vector Error-Correction Model Using Econometric Modeler
1
With INT_L, INT_M, and INT_S selected in the Time Series pane, in the Models section, click VEC.
2
In the VEC Model Parameters dialog box, in the Johansen Form, select H1*. Fit the VEC(1) model by clicking Estimate.
The model variable VEC appears in the Models pane, its value appears in the Preview pane, and its estimation summary appears in the Model Summary(VEC) document. 3
Repeat steps 1 and 2 for the short-run polynomial order p = 2. Similar to the VEC(1) estimation, the variable VEC2 appears in the Models pane, and its estimation summary appears in the Model Summary(VEC2) document. You can view properties of an estimated model in the Preview pane by clicking the model in the Models pane. For example, click VEC. 4-185
4
Econometric Modeler
Select Model with Best In-Sample Fit The estimation summary in each Model Summary(VECp) tab contains a plot of fitted values and residuals, with respect to the time series in the Time Series list, a standard statistical table of estimates and inferences, and a table of information criteria. Compare the information criteria of each estimated model simultaneously by positioning the estimation summary documents so that they occupy the left and right sections of the right pane. The model with the lowest value has the best, parsimonious fit.
The VEC(1) model VEC produces the lowest AIC and BIC values. Choose this model for further analysis. 4-186
Estimate Vector Error-Correction Model Using Econometric Modeler
Check Goodness of Fit Inspect the following VEC(1) plots of each residual series: • Histograms, for center, normality, and outliers • Quantile-quantile plots, for normality, skewness, and tails • Autocorrelation function (ACF), for serial correlation • ACF of squared residual series, for heteroscedasticity This example diagnoses the residuals visually. Alternatively, you can conduct statistical tests to diagnose the residuals. Dismiss the Model Summary(VEC2) document by clicking
on its tab.
On the Models pane, click VEC. Plot separate residual histograms. On the Econometric Modeler tab, in the Diagnostics section, click Residual Diagnostics > Residual Histogram. Histograms of the each residual series appear in the Histogram(VEC) document.
4-187
4
Econometric Modeler
Each residual series appears approximately centered around 0 and approximately normal. Plot separate residual quantile-quantile plots. With VEC selected in the Time Series pane, on the Econometric Modeler tab, in the Diagnostics section, click Residual Diagnostics > Residual QQ Plot. Quantile-quantile plots of each residual series appear in the QQPlot(VEC) document.
The residual series are slightly skewed left and have lighter tails than the normal distribution. This example proceeds without addressing possible skewness and light tails. Plot separate ACF plots of each residual series. With VEC selected in the Time Series pane, on the Econometric Modeler tab, in the Diagnostics section, click Residual Diagnostics > Autocorrelation Function. ACF plots of each residual series appear in the ACF(VEC) document.
4-188
Estimate Vector Error-Correction Model Using Econometric Modeler
The residuals do not exhibit significant autocorrelation. Plot separate ACF plots of each squared residual series. With VEC selected in the Time Series pane, on the Econometric Modeler tab, in the Diagnostics section, click Residual Diagnostics > Squared Residual Autocorrelation. ACF plots of each squared residual series appear in the ACF(VEC)2 document.
4-189
4
Econometric Modeler
Each squared residual series has significant autocorrelations at early lags, which suggests that heteroscedasticity is present in all series. This example proceeds without addressing possible heteroscedasticity. Export Model to Workspace Export the model to the workspace. 1
With the VEC model selected in the Models pane, on the Econometric Modeler tab, in the Export section, click Export > Export Variables.
2
In the Export Variables dialog box, select the Select check box for INT_L, INT_M, and INT_S.
3
Click Export.
The variables INT_L, INT_M, INT_S, and VEC appear in the workspace. Generate Forecasts at Command Line Generate forecasts and approximate 95% forecast intervals from the estimated VEC(1) model for the next five years. For convenience, use the entire series as a presample for the forecasts. The 4-190
Estimate Vector Error-Correction Model Using Econometric Modeler
forecast function discards all specified presample observations except for the required final observation. Y0 = [INT_L INT_M INT_S]; [YF,YFMSE] = forecast(VEC,5,Y0); YFSE = cell2mat(cellfun(@(x)sqrt(diag(x)'),YFMSE,UniformOutput=false)); UB = YF + 1.96*YFSE; LB = YF - 1.96*YFSE; datesF = DataTimeTable.Time(end) + calyears(1:5); figure tiledlayout(3,1) for j = 1:VEC.NumSeries nexttile h1 = plot(DataTimeTable.Time,Y0(:,j),Color=[.75,.75,.75]); hold on h2 = plot(datesF,YF(:,j),"r",LineWidth=2); h3 = plot(datesF,UB(:,j),"k--",LineWidth=1.5); plot(datesF,LB(:,j),"k--",LineWidth=1.5); ct = [Y0(end,j) YF(1,j); Y0(end,j) LB(1,j); Y0(end,j) UB(1,j);]; plot([DataTimeTable.Time(end); datesF(1)],ct,Color=[.75,.75,.75]) legend([h1 h2 h3],VEC.SeriesNames(j),"Forecast", ... "Forecast interval",Location="northwest") hold off end
4-191
4
Econometric Modeler
See Also Apps Econometric Modeler Objects vecm Functions forecast
More About
4-192
•
“Analyze Time Series Data Using Econometric Modeler” on page 4-2
•
“Conduct Cointegration Test Using Econometric Modeler” on page 4-170
•
“Specifying Multivariate Lag Operator Polynomials and Coefficient Constraints Interactively” on page 4-50
Compare Predictive Performance After Creating Models Using Econometric Modeler
Compare Predictive Performance After Creating Models Using Econometric Modeler This example shows how to choose lags for an ARIMA model by comparing the AIC values of estimated models using the Econometric Modeler app. The example also shows how to compare the predictive performance of several models that have the best in-sample fits at the command line. The data set Data_Airline.mat contains monthly counts of airline passengers. Import Data into Econometric Modeler At the command line, load the Data_Airline.mat data set. load Data_Airline
To compare predictive performance later, reserve the last two years of data as a holdout sample. fHorizon = 24; HoldoutTimeTable = DataTimeTable((end - fHorizon + 1):end,:); DataTimeTable((end - fHorizon + 1):end,:) = [];
At the command line, open the Econometric Modeler app. econometricModeler
Alternatively, open the app from the apps gallery (see Econometric Modeler). Import DataTimeTable into the app: 1
On the Econometric Modeler tab, in the Import section, click the Import button
2
In the Import Data dialog box, in the Import? column, select the check box for the DataTimeTable variable.
3
Click Import.
.
The variable PSSG appears in the Time Series pane, its value appears in the Preview pane, and its time series plot appears in the Time Series Plot(PSSG) figure window.
4-193
4
Econometric Modeler
The series exhibits a seasonal trend, serial correlation, and possible exponential growth. For an interactive analysis of serial correlation, see “Detect Serial Correlation Using Econometric Modeler App” on page 4-71. Remove Exponential Trend Address the exponential trend by applying the log transform to PSSG. 1
In the Time Series pane, select PSSG.
2
On the Econometric Modeler tab, in the Transforms section, click Log.
The transformed variable PSSGLog appears in the Time Series pane, its value appears in the Preview pane, and its time series plot appears in the Time Series Plot(PSSGLog) figure window.
4-194
Compare Predictive Performance After Creating Models Using Econometric Modeler
The exponential growth appears to be removed from the series. Compare In-Sample Model Fits Box, Jenkins, and Reinsel suggest a SARIMA(0,1,1)×(0,1,1)12 model without a constant for PSSGLog [1] (for more details, see “Estimate Multiplicative ARIMA Model Using Econometric Modeler App” on page 4-131). However, consider all combinations of monthly SARIMA models that include up to two seasonal and nonseasonal MA lags. Specifically, iterate the following steps for each of the nine models of the form SARIMA(0,1,q)×(0,1,q12)12, where q ∈ {0,1,2} and q12 ∈ {0,1,2}. 1
For the first iteration: a
Let q = q12 = 0.
b
With PSSGLog selected in the Time Series pane, click the Econometric Modeler tab. In the Models section, click the arrow to display the models gallery.
c
In the models gallery, in the ARMA/ARIMA Models section, click SARIMA.
d
In the SARIMA Model Parameters dialog box, on the Lag Order tab:
4-195
4
Econometric Modeler
• Nonseasonal section i
Set Degrees of Integration to 1.
ii
Set Moving Average Order to 0.
iii
Clear the Include Constant Term check box.
• Seasonal section
e 2
i
Set Period to 12 to indicate monthly data.
ii
Set Moving Average Order to 0.
iii
Select the Include Seasonal Difference check box.
Click Estimate.
Rename the new model variable. a
In the Models pane, click the new model variable twice to select its name.
b
Enter SARIMA01qx01q12. For example, when q = q12 = 0, rename the variable to SARIMA010x010.
3
In the Model Summary(SARIMA01qx01q12) document, in the Goodness of Fit table, note the AIC value. For example, for the model variable SARIMA010x010, the AIC is in this figure.
4
For the next iteration, choose values of q and q12. For example, q = 0 and q12 = 1 for the second iteration.
5
In the Models pane, right-click SARIMA01qx01q12. In the context menu, select Modify to open the SARIMA Model Parameters dialog box with the current settings for the selected model.
6
In the SARIMA Model Parameters dialog box: a
In the Nonseasonal section, set Moving Average Order to q.
b
In the Seasonal section, set Moving Average Order to q12.
c
Click Estimate.
After you complete the steps, the Models pane contains nine estimated models named SARIMA010x010 through SARIMA012x012. The resulting AIC values are in this table.
4-196
Compare Predictive Performance After Creating Models Using Econometric Modeler
Model
Variable Name
AIC
SARIMA(0,1,0)× (0,1,0)12
SARIMA010x010
-410.3520
SARIMA(0,1,0)× (0,1,1)12
SARIMA010x011
-443.0009
SARIMA(0,1,0)× (0,1,2)12
SARIMA010x012
-441.0010
SARIMA(0,1,1)× (0,1,0)12
SARIMA011x010
-422.8680
SARIMA(0,1,1)× (0,1,1)12
SARIMA011x011
-452.0039
SARIMA(0,1,1)× (0,1,2)12
SARIMA011x012
-450.0605
SARIMA(0,1,2)× (0,1,0)12
SARIMA012x010
-420.9760
SARIMA(0,1,2)× (0,1,1)12
SARIMA012x011
-450.0087
SARIMA(0,1,2)× (0,1,2)12
SARIMA012x012
-448.0650
The three models yielding the lowest three AIC values are SARIMA(0,1,1)×(0,1,1)12, SARIMA(0,1,1)× (0,1,2)12, and SARIMA(0,1,2)×(0,1,1)12. These models have the best parsimonious in-sample fit. Export Best Models to Workspace Export the models with the best in-sample fits. 1 2
On the Econometric Modeler tab, in the Export section, click
.
In the Export Variables dialog box, in the Models column, click the Select check box for the SARIMA011x011, SARIMA011x012, and SARIMA012x011. Clear the check box for any other selected models.
4-197
4
Econometric Modeler
3
Click Export.
The arima model objects SARIMA011x011, SARIMA011x012, and SARIMA012x011 appear in the MATLAB Workspace. Estimate Forecasts At the command line, estimate two-year-ahead forecasts for each model. f5 = forecast(SARIMA_PSSGLog5,fHorizon); f6 = forecast(SARIMA_PSSGLog6,fHorizon); f8 = forecast(SARIMA_PSSGLog8,fHorizon);
f5, f6, and f8 are 24-by-1 vectors containing the forecasts. Compare Prediction Mean Square Errors Estimate the prediction mean square error (PMSE) for each of the forecast vectors. logPSSGHO = log(HoldoutTimeTable.Variables); pmse5 = mean((logPSSGHO - f5).^2); pmse6 = mean((logPSSGHO - f6).^2); pmse8 = mean((logPSSGHO - f8).^2);
Identify the model yielding the lowest PMSE. [~,bestIdx] = min([pmse5 pmse6 pmse8],[],2)
The SARIMA(0,1,1)×(0,1,1)12 model performs the best in-sample and out-of-sample.
References [1] Box, George E. P., Gwilym M. Jenkins, and Gregory C. Reinsel. Time Series Analysis: Forecasting and Control. 3rd ed. Englewood Cliffs, NJ: Prentice Hall, 1994.
4-198
Compare Predictive Performance After Creating Models Using Econometric Modeler
See Also Apps Econometric Modeler Objects arima Functions estimate | infer
More About •
“Assess Predictive Performance” on page 3-88
•
“Estimate Multiplicative ARIMA Model Using Econometric Modeler App” on page 4-131
•
“Share Results of Econometric Modeler App Session” on page 4-237
4-199
4
Econometric Modeler
Estimate ARIMAX Model Using Econometric Modeler App This example shows how to specify and estimate an ARIMAX model using the Econometric Modeler app. The data set, which is stored in Data_CreditDefaults.mat, contains annual investmentgrade corporate bond default rates, among other predictors, from 1984 through 2004. Consider modeling corporate bond default rates as a linear, dynamic function of the other time series in the data set. Import Data into Econometric Modeler At the command line, load the Data_CreditDefaults.mat data set. load Data_CreditDefaults
For more details on the data set, enter Description at the command line. At the command line, open the Econometric Modeler app. econometricModeler
Alternatively, open the app from the apps gallery (see Econometric Modeler). Import DataTimeTable into the app: 1
On the Econometric Modeler tab, in the Import section, click the Import button
2
In the Import Data dialog box, in the Import? column, select the check box for the DataTimeTable variable.
3
Click Import.
.
The variables, including IGD, appear in the Time Series pane, and a time series plot containing all the series appears in the Time Series Plot(AGE) figure window. Assess Stationarity of Dependent Variable In the Time Series pane, double-click IGD. The value of IGD appears in the Preview pane, and a time series plot for IGD appears in the Time Series Plot(IGD) figure window.
4-200
Estimate ARIMAX Model Using Econometric Modeler App
IGD appears to be stationary. Assess whether IGD has a unit root by conducting a Phillips-Perron test: 1
On the Econometric Modeler tab, in the Tests section, click New Test > Phillips-Perron Test.
2
On the PP tab, in the Parameters section, set Number of Lags to 1.
3
In the Tests section, click Run Test.
The test results in the Results table of the PP(IGD) document.
4-201
4
Econometric Modeler
The test rejects the null hypothesis that IGD contains a unit root. Inspect Correlation and Collinearity Among Variables Plot the pairwise correlations between variables. 1
Select all variables in the Time Series pane by clicking AGE, then press Shift and click SPR.
2
Click the Plots tab, then click Correlations.
A correlations plot appears in the Correlations(AGE) figure window.
All predictors appear weakly associated with IGD. You can test whether the correlation coefficients are significant by using corrplot at the command line. Assess whether any variables are collinear by performing Belsley collinearity diagnostics:
4-202
1
In the Time Series pane, select all variables.
2
Click the Econometric Modeler tab. Then, in the Tests section, click New Test > Belsley Collinearity Diagnostics.
Estimate ARIMAX Model Using Econometric Modeler App
Tabular results appear in the Collinearity(AGE) document.
None of the condition indices are greater than the condition-index tolerance (30). Therefore, the variables do not exhibit multicollinearity. Specify and Estimate ARIMAX Model Consider an ARIMAX(0,0,1) model for IGD containing all predictors. Specify and estimate the model. 1
In the Time Series pane, click IGD.
2
Click the Econometric Modeler tab. Then, in the Models section, click the arrow to display the models gallery.
3
In the models gallery, in the ARMA/ARIMA Models section, click ARIMAX.
4
In the ARIMAX Model Parameters dialog box, on the Lag Order tab, set Moving Average Order to 1.
5
In the Predictors section, select the Include? check box for each time series.
4-203
4
Econometric Modeler
6
4-204
Click Estimate. The model variable ARIMAX_IGD appears in the Models pane, its value appears in the Preview pane, and its estimation summary appears in the Model Summary(ARIMAX_IGD) document.
Estimate ARIMAX Model Using Econometric Modeler App
At a 0.10 significance level, all predictors and the MA coefficient are significant. Close all figure windows and documents. Check Goodness of Fit Check that the residuals are normally distributed and uncorrelated by plotting a histogram, quantilequantile plot, and ACF of the residuals. 1
In the Models pane, select ARIMAX_IGD.
2
On the Econometric Modeler tab, in the Diagnostics section, click Residual Diagnostics > Residual Histogram.
3
Click Residual Diagnostics > Residual Q-Q Plot.
4
Click Residual Diagnostics > Autocorrelation Function.
5
In the right pane, drag the Histogram(ARIMAX_IGD) and QQPlot(ARIMAX_IGD) figure windows so that they occupy the upper two quadrants, and drag the ACF so that it occupies the lower two quadrants.
4-205
4
Econometric Modeler
The residual histogram and quantile-quantile plots suggest that the residuals might not be normally distributed. According to the ACF plot, the residuals do not exhibit serial correlation. Standard inferences rely on the normality of the residuals. To remedy nonnormality, you can try transforming the response, then estimating the model using the transformed response.
See Also Apps Econometric Modeler Objects arima Functions estimate | pptest | corrplot
4-206
Estimate ARIMAX Model Using Econometric Modeler App
More About •
“Forecast IGD Rate from ARX Model” on page 7-124
•
“Analyze Time Series Data Using Econometric Modeler” on page 4-2
4-207
4
Econometric Modeler
Estimate Regression Model with ARMA Errors Using Econometric Modeler App This example shows how to specify and estimate a regression model with ARMA errors using the Econometric Modeler app. The data set, which is stored in Data_USEconModel.mat, contains the US personal consumption expenditures measured quarterly, among other series. Consider modeling the US personal consumption expenditures (PCEC, in $ billions) as a linear function of the effective federal funds rate (FEDFUNDS), unemployment rate (UNRATE), and real gross domestic product (GDP, in $ billions with respect to the year 2000). Import Data into Econometric Modeler At the command line, load the Data_USEconModel.mat data set. load Data_USEconModel
Convert the federal funds and unemployment rates from percents to decimals. DataTimeTable.UNRATE = 0.01*DataTimeTable.UNRATE; DataTimeTable.FEDFUNDS = 0.01*DataTimeTable.FEDFUNDS;
Convert the nominal GDP to real GDP by dividing all values by the GDP deflator (GDPDEF) and scaling the result by 100. Create a column in DataTimeTable for the real GDP series. DataTimeTable.RealGDP = 100*DataTimeTable.GDP./DataTimeTable.GDPDEF;
At the command line, open the Econometric Modeler app. econometricModeler
Alternatively, open the app from the apps gallery (see Econometric Modeler). Import DataTimeTable into the app: 1
On the Econometric Modeler tab, in the Import section, click the Import button
2
In the Import Data dialog box, in the Import? column, select the check box for the DataTimeTable variable.
3
Click Import.
.
All time series variables in DataTimeTable appear in the Time Series pane, and a time series plot of the series appears in the Time Series Plot(COE) figure window. Plot the Series Plot the PCEC, RealGDP, FEDFUNDS, and UNRATE series on separate plots.
4-208
1
In the Time Series pane, double-click PCEC.
2
Repeat step 1 for RealGDP, FEDFUNDS, and UNRATE.
3
In the right pane, drag the Time Series Plot(PCEC) figure window to the top so that it occupies the first two quadrants.
4
Drag the Time Series Plot(RealGDP) figure window to the first quadrant.
Estimate Regression Model with ARMA Errors Using Econometric Modeler App
5
Drag the Time Series Plot(UNRATE) figure window to the third quadrant.
The PCEC and RealGDP series appear to have an exponential trend. The UNRATE and FEDFUNDS series appear to have a stochastic trend. Right-click the tab for any figure window, then select Close All to close all the figure windows. Assess Collinearity Among Series Check whether the series are collinear by performing Belsley collinearity diagnostics. 1
In the Time Series pane, select PCEC. Then, press Ctrland click to select RealGDP, FEDFUNDS, and UNRATE.
2
On the Econometric Modeler tab, in the Tests section, click New Test > Belsley Collinearity Diagnostics.
The Belsley collinearity diagnostics results appear in the Collinearity(FEDFUNDS) document.
4-209
4
Econometric Modeler
All condition indices are below the default condition-index tolerance, which is 30. The time series do not appear to be collinear. Specify and Estimate Linear Model Specify a linear model in which PCEC is the response and RealGDP, FEDFUNDS, and UNRATE are predictors.
4-210
1
In the Time Series pane, select PCEC.
2
Click the Econometric Modeler tab. Then, in the Models section, click the arrow to display the models gallery.
3
In the models gallery, in the Regression Models section, click MLR.
4
In the MLR Model Parameters dialog box, in the Predictors section, select the Include? check box for the FEDFUNDS, RealGDP, and UNRATE time series.
Estimate Regression Model with ARMA Errors Using Econometric Modeler App
5
Click Estimate.
The model variable MLR_PCEC appears in the Models pane, its value appears in the Preview pane, and its estimation summary appears in the Model Summary(MLR_PCEC) document.
4-211
4
Econometric Modeler
In the Model Summary(MLR_PCEC) figure window, the residual plot suggests that the standard linear model assumption of uncorrelated errors is violated. The residuals appear autocorrelated, nonstationary, and possibly heteroscedastic. Stabilize Variables To stabilize the residuals, stabilize the response and predictor series by converting the PCEC and RealGDP prices to returns, and by applying the first difference to FEDFUNDS and UNRATE. Convert PCEC and RealGDP prices to returns: 1
In the Time Series pane, select the PCEC time series, then press Ctrl and select the RealGDP time series.
2
On the Econometric Modeler tab, in the Transforms section, click Log, then click Diff. In the Time Series pane, variables representing the logged and differenced time series appear.
3
In the Time Series pane, rename the PCECLogDiff and RealGDPLogDiff. Click the PCECLogDiff variable twice to select its name and enter PCECReturns. Click the RealGDPLogDiff variable twice to select its name and enter RealGDPReturns.
Apply the first difference to FEDFUNDS and UNRATE: 4-212
Estimate Regression Model with ARMA Errors Using Econometric Modeler App
1
In the Time Series pane, select the FEDFUNDS time series, then press Ctrl and select the UNRATE time series.
2
On the Econometric Modeler tab, in the Transforms section, click Difference. In the Time Series pane, variables representing the first difference of the time series appear.
3
Close all figure windows and documents.
Respecify and Estimate Linear Model Respecify the linear model, but use the stabilized series instead. 1
In the Time Series pane, select PCECReturns.
2
On the Econometric Modeler tab, in the Models section, click the arrow to display the models gallery.
3
In the models gallery, in the Regression Models section, click MLR.
4
In the MLR Model Parameters dialog box, in the Predictors section, select the Include? check box for the FEDFUNDSDiff, RealGDPReturns, and UNRATEDiff time series.
5
Click Estimate.
The model variable MLR_PCECReturns appears in the Models pane, its value appears in the Preview pane, and its estimation summary appears in the Model Summary(MLR_PCECReturns) document.
4-213
4
Econometric Modeler
The residual plot suggests that the residuals are autocorrelated. Check Goodness of Fit of Linear Model Assess whether the residuals are normally distributed and autocorrelated by generating quantilequantile and ACF plots. Create a quantile-quantile plot of the MLR_PCECReturns model residuals: 1
In the Time Series pane, select the MLR_PCECReturns model.
2
On the Econometric Modeler tab, in the Diagnostics section, click Residual Diagnostics > Residual Q-Q Plot.
The residuals are skewed to the right. Plot the ACF of the residuals:
4-214
1
In the Time Series pane, select the MLR_PCECReturns model.
2
On the Econometric Modeler tab, in the Diagnostics section, click Residual Diagnostics > Autocorrelation Function.
Estimate Regression Model with ARMA Errors Using Econometric Modeler App
3
On the ACF tab, set Number of Lags to 40.
The plot shows autocorrelation in the first 34 lags. Specify and Estimate Regression Model with ARMA Errors Attempt to remedy the autocorrelation in the residuals by specifying a regression model with ARMA(1,1) errors for PCECReturns. 1
In the Time Series pane, select PCECReturns.
2
Click the Econometric Modeler tab. Then, in the Models section, click the arrow to display the models gallery.
3
In the models gallery, in the Regression Models section, click RegARMA.
4
In the regARMA Model Parameters dialog box: a
In the Lag Order tab: i
Set Autoregressive Order to 1. 4-215
4
Econometric Modeler
ii
Set Moving Average Order to 1.
b
In the Predictors section, select the Include? check box for the FEDFUNDSDiff, RealGDPReturns, and UNRATEDiff time series.
c
Click Estimate.
The model variable RegARMA_PCECReturns appears in the Models pane, its value appears in the Preview pane, and its estimation summary appears in the Model Summary(RegARMA_PCECReturns) document.
4-216
Estimate Regression Model with ARMA Errors Using Econometric Modeler App
The t statistics suggest that all coefficients are significant, except for the coefficient of UNRATEDiff. The residuals appear to fluctuate around y = 0 without autocorrelation. Check Goodness of Fit of ARMA Error Model Assess whether the residuals of the RegARMA_PCECReturns model are normally distributed and autocorrelated by generating quantile-quantile and ACF plots. Create a quantile-quantile plot of the RegARMA_PCECReturns model residuals: 1
In the Models pane, select the RegARMA_PCECReturns model.
2
On the Econometric Modeler tab, in the Diagnostics section, click Residual Diagnostics > Residual Q-Q Plot.
4-217
4
Econometric Modeler
The residuals appear approximately normally distributed. Plot the ACF of the residuals:
4-218
1
In the Models pane, select the RegARMA_PCECReturns model.
2
On the Econometric Modeler tab, in the Diagnostics section, click Residual Diagnostics > Autocorrelation Function.
Estimate Regression Model with ARMA Errors Using Econometric Modeler App
The first autocorrelation lag is significant. From here, you can estimate multiple models that differ by the number of autoregressive and moving average polynomial orders in the ARMA error model. Then, choose the model with the lowest fit statistic. Or, you can check the predictive performance of the models by comparing forecasts to outof-sample data.
See Also Apps Econometric Modeler Objects regARIMA | LinearModel Functions estimate | fitlm | autocorr | collintest 4-219
4
Econometric Modeler
More About
4-220
•
“Estimate Regression Model with ARIMA Errors” on page 5-88
•
“Analyze Time Series Data Using Econometric Modeler” on page 4-2
•
“Compare Predictive Performance After Creating Models Using Econometric Modeler” on page 4-193
Compare Conditional Variance Model Fit Statistics Using Econometric Modeler App
Compare Conditional Variance Model Fit Statistics Using Econometric Modeler App This example shows how to specify and fit GARCH, EGARCH, and GJR models to data using the Econometric Modeler app. Then, the example determines the model that fits to the data the best by comparing fit statistics. The data set, which is stored in Data_FXRates.mat, contains foreign exchange rates measured daily from 1979–1998. Consider creating a predictive model for the Swiss franc to US dollar exchange rate (CHF). Import Data into Econometric Modeler At the command line, load the Data_FXRates.mat data set. load Data_FXRates
At the command line, open the Econometric Modeler app. econometricModeler
Alternatively, open the app from the apps gallery (see Econometric Modeler). Import DataTimeTable into the app: 1
On the Econometric Modeler tab, in the Import section, click the Import button
2
In the Import Data dialog box, in the Import? column, select the check box for the DataTimeTable variable.
3
Click Import.
.
All time series variables in DataTimeTable appear in the Time Series pane, and a time series plot of all the series appears in the Time Series Plot(AUD) figure window. Plot the Series Plot the Swiss franc exchange rates by double-clicking the CHF time series in the Time Series pane. Highlight periods of recession: 1
In the Time Series Plot(CHF) figure window, right-click the plot.
2
In the context menu, select Show Recessions.
4-221
4
Econometric Modeler
The CHF series appears to have a stochastic trend. Stabilize the Series Stabilize the Swiss franc exchange rates by applying the first difference to CHF. 1
In the Time Series pane, select CHF.
2
On the Econometric Modeler tab, in the Transforms section, click Difference.
3
Highlight periods of recession: a
In the Time Series Plot(CHFDiff) figure window, right-click the plot.
b
In the context menu, select Show Recessions.
A variable named CHFDiff, representing the differenced series, appears in the Time Series pane, its value appears in the Preview pane, and its time series plot appears in the Time Series Plot(CHFDiff) figure window.
4-222
Compare Conditional Variance Model Fit Statistics Using Econometric Modeler App
The series appears to be stable, but it exhibits volatility clustering. Assess Presence of Conditional Heteroscedasticity Test the stable Swiss franc exchange rate series for conditional heteroscedasticity by conducting Engle's ARCH test. Run the test assuming an ARCH(1) alternative model, then run the test again assuming an ARCH(2) alternative model. Maintain an overall significance level of 0.05 by decreasing the significance level of each test to 0.05/2 = 0.025. 1
In the Time Series pane, select CHFDiff.
2
On the Econometric Modeler tab, in the Tests section, click New Test > Engle's ARCH Test.
3
On the ARCH tab, in the Parameters section, set Number of Lags of 1.
4
Set Significance Level to 0.025.
5
In the Tests section, click Run Test.
6
Repeat steps 3 through 5, but set Number of Lags to 2 instead.
The test results appear in the Results table of the ARCH(CHFDiff) document.
4-223
4
Econometric Modeler
The tests reject the null hypothesis of no ARCH effects against the alternative models. This result suggests specifying a conditional variance model for CHFDiff containing at least two ARCH lags. Conditional variance models with two ARCH lags are locally equivalent to models with one ARCH and one GARCH lag. Consider GARCH(1,1), EGARCH(1,1), and GJR(1,1) models for CHFDiff. Estimate GARCH Model Specify a GARCH(1,1) model and fit it to the CHFDiff series.
4-224
1
In the Time Series pane, select the CHFDiff time series.
2
Click the Econometric Modeler tab. Then, in the Models section, click the arrow to display the models gallery.
3
In the models gallery, in the GARCH Models section, click GARCH.
4
In the GARCH Model Parameters dialog box, on the Lag Order tab: a
Set GARCH Degree to 1.
b
Set ARCH Degree to 1.
Compare Conditional Variance Model Fit Statistics Using Econometric Modeler App
c
Click Estimate.
The model variable GARCH_CHFDiff appears in the Models pane, its value appears in the Preview pane, and its estimation summary appears in the Model Summary(GARCH_CHFDiff) document.
4-225
4
Econometric Modeler
Specify and Estimate EGARCH Model Specify an EGARCH(1,1) model containing a leverage term at the first lag, and fit the model to the CHFDiff series. 1
In the Time Series pane, select the CHFDiff time series.
2
On the Econometric Modeler tab, in the Models section, click the arrow to display the models gallery.
3
In the models gallery, in the GARCH Models section, click EGARCH.
4
In the EGARCH Model Parameters dialog box, on the Lag Order tab:
5
a
Set GARCH Degree to 1.
b
Set ARCH Degree to 1. Consequently, the app includes a corresponding leverage lag. You can remove or adjust leverage lags on the Lag Vector tab.
Click Estimate.
The model variable EGARCH_CHFDiff appears in the Models pane, its value appears in the Preview pane, and its estimation summary appears in the Model Summary(EGARCH_CHFDiff) document.
4-226
Compare Conditional Variance Model Fit Statistics Using Econometric Modeler App
Specify and Estimate GJR Model Specify a GJR(1,1) model containing a leverage term at the first lag, and fit the model to the CHFDiff series. 1
In the Time Series pane, select the CHFDiff time series.
2
On the Econometric Modeler tab, in the Models section, click the arrow to display the models gallery.
3
In the models gallery, in the GARCH Models section, click GJR.
4
In the GJR Model Parameters dialog box, on the Lag Order tab: a
Set GARCH Degree to 1.
b
Set ARCH Degree to 1. Consequently, the app includes a corresponding leverage lag. You can remove or adjust leverage lags on the Lag Vector tab.
c
Click Estimate.
The model variable GJR_CHFDiff appears in the Models pane, its value appears in the Preview pane, and its estimation summary appears in the Model Summary(GJR_CHFDiff) document.
4-227
4
Econometric Modeler
Choose Model Choose the model with the best parsimonious in-sample fit. Base your decision on the model yielding the minimal Akaike information criterion (AIC). The table shows the AIC fit statistics of the estimated models, as given in the Goodness of Fit section of the estimation summary of each model. Model
AIC
GARCH(1,1)
-28730
EGARCH(1,1)
-28726
GJR(1,1)
-28737
The GJR(1,1) model yields the minimal BIC value. Therefore, it has the best parsimonious in-sample fit of all the estimated models.
See Also Apps Econometric Modeler Objects egarch | gjr | garch 4-228
Compare Conditional Variance Model Fit Statistics Using Econometric Modeler App
Functions estimate
More About •
“Compare Conditional Variance Models Using Information Criteria” on page 8-69
•
“Compare Predictive Performance After Creating Models Using Econometric Modeler” on page 4-193
4-229
4
Econometric Modeler
Perform GARCH Model Residual Diagnostics Using Econometric Modeler App This example shows how to evaluate GARCH model assumptions by performing residual diagnostics using the Econometric Modeler app. The data set, stored in CAPMuniverse.mat available with the Financial Toolbox documentation, contains market data for daily returns of stocks and cash (money market) from the period January 1, 2000 to November 7, 2005. Consider modeling the market index returns (MARKET). Import Data into Econometric Modeler At the command line, load the CAPMuniverse.mat data set. load CAPMuniverse
The series are in the timetable AssetsTimeTable. At the command line, open the Econometric Modeler app. econometricModeler
Alternatively, open the app from the apps gallery (see Econometric Modeler). Import AssetsTimeTable into the app: 1
On the Econometric Modeler tab, in the Import section, click
.
2
In the Import Data dialog box, in the Import? column, select the check box for the AssetsTimeTable variable.
3
Click Import.
The market index variables, including MARKET, appear in the Time Series pane, and a time series plot containing all the series appears in the Time Series Plot(APPL) figure window. Plot the Series Plot the market index series by double-clicking the MARKET time series in the Time Series pane.
4-230
Perform GARCH Model Residual Diagnostics Using Econometric Modeler App
The series appears to fluctuate around y = 0 and exhibits volatility clustering. Consider a GARCH(1,1) model without a mean offset for the series. Specify and Estimate GARCH Model Specify a GARCH(1,1) model without a mean offset. 1
In the Time Series pane, select MARKET.
2
On the Econometric Modeler tab, in the Models section, click the arrow to display the models gallery.
3
In the models gallery, in the GARCH Models section, click GARCH.
4
In the GARCH Model Parameters dialog box, on the Lag Order tab: a
Set GARCH Degree to 1.
b
Set ARCH Degree to 1.
4-231
4
Econometric Modeler
5
Click Estimate.
The model variable GARCH_MARKET appears in the Models pane, its value appears in the Preview pane, and its estimation summary appears in the Model Summary(GARCH_MARKET) document.
4-232
Perform GARCH Model Residual Diagnostics Using Econometric Modeler App
The p values of the coefficient estimates are close to zero, which indicates that the estimates are significant. The inferred conditional variances show high volatility through 2003, then small volatility through 2005. The standardized residuals appear to fluctuate around y = 0, and there are several large (in magnitude) residuals. Check Goodness of Fit Assess whether the standardized residuals are normally distributed and uncorrelated. Then, assess whether the residual series has lingering conditional heteroscedasticity. Assess whether the standardized residuals are normally distributed by plotting their histogram and a quantile-quantile plot: 1
In the Models pane, select GARCH_MARKET.
2
On the Econometric Modeler tab, in the Diagnostics section, click Residual Diagnostics > Residual Histogram.
3
In the Diagnostics section, click Residual Diagnostics > Residual Q-Q Plot.
The histogram and quantile-quantile plot appear in the Histogram(GARCH_MARKET) and QQPlot(GARCH_MARKET) figure windows, respectively. Assess whether the standardized residuals are autocorrelated by plotting their autocorrelation function (ACF). 4-233
4
Econometric Modeler
1
In the Models pane, select GARCH_MARKET.
2
On the Econometric Modeler tab, in the Diagnostics section, click Residual Diagnostics > Autocorrelation Function.
The ACF plot appears in the ACF(GARCH_MARKET) figure window. Assess whether the residual series has lingering conditional heteroscedasticity by plotting the ACF of the squared standardized residuals: 1
In the Models pane, select GARCH_MARKET.
2
Click the Econometric Modeler tab. Then, in the Diagnostics section, click Residual Diagnostics > Squared Residual Autocorrelation.
The ACF of the squared standardized residuals appears in the ACF(GARCH_MARKET)2 figure window. Arrange the histogram, quantile-quantile plot, ACF, and the ACF of the squared standardized residual series so that they occupy the four quadrants of the right pane. On the Documents pane, click the Document Actions button squares.
4-234
, select Tile All, place the pointer in the (2,2) position of the matrix of
Perform GARCH Model Residual Diagnostics Using Econometric Modeler App
Although the results show a few large standardized residuals, they appear to be approximately normally distributed. The ACF plots of the standardized and squared standardized residuals do not contain any significant autocorrelations. Therefore, it is reasonable to conclude that the standardized residuals are uncorrelated and homoscedastic.
See Also Apps Econometric Modeler Objects garch Functions estimate | infer | autocorr
More About •
“Infer Conditional Variances and Residuals” on page 8-62
•
“Perform ARIMA Model Residual Diagnostics Using Econometric Modeler App” on page 4-141 4-235
4
Econometric Modeler
•
4-236
“Compare Predictive Performance After Creating Models Using Econometric Modeler” on page 4-193
Share Results of Econometric Modeler App Session
Share Results of Econometric Modeler App Session This example shows how to share the results of an Econometric Modeler app session by: • Exporting time series and model variables to the MATLAB Workspace • Generating MATLAB plain text and live functions to use outside the app • Generating a report of your activities on time series and estimated models During the session, the example transforms and plots data, runs statistical tests, and estimates a multiplicative seasonal ARIMA model. The data set Data_Airline.mat contains monthly counts of airline passengers. Import Data into Econometric Modeler At the command line, load the Data_Airline.mat data set. load Data_Airline
At the command line, open the Econometric Modeler app. econometricModeler
Alternatively, open the app from the apps gallery (see Econometric Modeler). Import DataTimeTable into the app: 1
On the Econometric Modeler tab, in the Import section, click the Import button
2
In the Import Data dialog box, in the Import? column, select the check box for the DataTimeTable variable.
3
Click Import.
.
The variable PSSG appears in the Time Series pane, its value appears in the Preview pane, and its time series plot appears in the Time Series Plot(PSSG) figure window.
4-237
4
Econometric Modeler
The series exhibits a seasonal trend, serial correlation, and possible exponential growth. For an interactive analysis of serial correlation, see “Detect Serial Correlation Using Econometric Modeler App” on page 4-71. Stabilize Series Address the exponential trend by applying the log transform to PSSG. 1
In the Time Series pane, select PSSG.
2
On the Econometric Modeler tab, in the Transforms section, click Log.
The transformed variable PSSGLog appears in the Time Series pane, and its time series plot appears in the Time Series Plot(PSSGLog) figure window.
4-238
Share Results of Econometric Modeler App Session
The exponential growth appears to be removed from the series. Address the seasonal trend by applying the 12th order seasonal difference. With PSSGLog selected in the Time Series pane, on the Econometric Modeler tab, in the Transforms section, set Seasonal to 12. Then, click Seasonal. The transformed variable PSSGLogSeasonalDiff appears in the Time Series pane, and its time series plot appears in the Time Series Plot(PSSGLogSeasonalDiff) figure window.
4-239
4
Econometric Modeler
The transformed series appears to have a unit root. Test the null hypothesis that PSSGLogSeasonalDiff has a unit root by using the Augmented DickeyFuller test. Specify that the alternative is an AR(0) model, then test again specifying an AR(1) model. Adjust the significance level to 0.025 to maintain a total significance level of 0.05. 1
With PSSGLogSeasonalDiff selected in the Time Series pane, on the Econometric Modeler tab, in the Tests section, click New Test > Augmented Dickey-Fuller Test.
2
On the ADF tab, in the Parameters section, set Significance Level to 0.025.
3
In the Tests section, click Run Test.
4
In the Parameters section, set Number of Lags to 1.
5
In the Tests section, click Run Test.
The test results appear in the Results table of the ADF(PSSGLogSeasonalDiff) document.
4-240
Share Results of Econometric Modeler App Session
Both tests fail to reject the null hypothesis that the series is a unit root process. Address the unit root by applying the first difference to PSSGLogSeasonalDiff. With PSSGLogSeasonalDiff selected in the Time Series pane, click the Econometric Modeler tab. Then, in the Transforms section, click Difference. The transformed variable PSSGLogSeasonalDiffDiff appears in the Time Series pane, and its time series plot appears in the Time Series Plot(PSSGLogSeasonalDiffDiff) figure window. In the Time Series pane, rename the PSSGLogSeasonalDiffDiff variable by clicking it twice to select its name and PSSGStable. The app updates the names of all documents associated with the transformed series.
4-241
4
Econometric Modeler
Identify Model for Series Determine the lag structure for a conditional mean model of the data by plotting the sample autocorrelation function (ACF) and partial autocorrelation function (PACF).
4-242
1
With PSSGStable selected in the Time Series pane, click the Plots tab, then click ACF.
2
Show the first 50 lags of the ACF. On the ACF tab, set Number of Lags to 50.
3
Click the Plots tab, then click PACF.
4
Show the first 50 lags of the PACF. On the PACF tab, set Number of Lags to 50.
5
Drag the ACF(PSSGStable) figure window above the PACF(PSSGStable) figure window.
Share Results of Econometric Modeler App Session
According to [1], the autocorrelations in the ACF and PACF suggest that the following SARIMA(0,1,1) ×(0,1,1)12 model is appropriate for PSSGLog. (1 − L) 1 − L12 yt = 1 + θ1L 1 + Θ12L12 εt . Close all figure windows. Specify and Estimate SARIMA Model Specify the SARIMA(0,1,1)×(0,1,1)12 model. 1
In the Time Series pane, select the PSSGLog time series.
2
On the Econometric Modeler tab, in the Models section, click the arrow to display the models gallery.
3
In the models gallery, in the ARMA/ARIMA Models section, click SARIMA.
4
In the SARIMA Model Parameters dialog box, on the Lag Order tab: • Nonseasonal section 4-243
4
Econometric Modeler
a
Set Degrees of Integration to 1.
b
Set Moving Average Order to 1.
c
Clear the Include Constant Term check box.
• Seasonal section
5
a
Set Period to 12 to indicate monthly data.
b
Set Moving Average Order to 1.
c
Select the Include Seasonal Difference check box.
Click Estimate.
The model variable SARIMA_PSSGLog appears in the Models pane, its value appears in the Preview pane, and its estimation summary appears in the Model Summary(SARIMA_PSSGLog) document.
4-244
Share Results of Econometric Modeler App Session
Export Variables to Workspace Export PSSGLog, PSSGStable, and SARIMA_PSSGLog to the MATLAB Workspace. 1 2
On the Econometric Modeler tab, in the Export section, click
.
In the Export Variables dialog box, select the Select check boxes for the PSSGLog and PSSGStable time series, and the SARIMA_PSSGLog model (if necessary). The app automatically selects the check boxes for all variables that are highlighted in the Time Series and Models panes.
4-245
4
Econometric Modeler
3
Click Export.
At the command line, list all variables in the workspace. whos Name Data DataTable DataTimeTable Description PSSGLog PSSGStable SARIMA_PSSGLog dates series
Size 144x1 144x2 144x1 22x54 144x1 144x1 1x1 144x1 1x1
Bytes 1152 3525 3311 2376 1152 1152 7963 1152 162
Class
Attributes
double table timetable char double double arima double cell
The contents of Data_Airline.mat, the numeric vectors PSSGLog and PSSGStable, and the estimated arima model object SARIMA_PSSGLog are variables in the workspace. Forecast the next three years (36 months) of log airline passenger counts using SARIMA_PSSGLog. Specify the PSSGLog as presample data. numObs = 36; fPSSG = forecast(SARIMA_PSSGLog,numObs,'Y0',PSSGLog);
Plot the passenger counts and the forecasts. fh = DataTimeTable.Time(end) + calmonths(1:numObs); figure; plot(DataTimeTable.Time,exp(PSSGLog)); hold on plot(fh,exp(fPSSG)); legend('Airline Passenger Counts','Forecasted Counts',... 'Location','best') title('Monthly Airline Passenger Counts, 1949-1963')
4-246
Share Results of Econometric Modeler App Session
ylabel('Passenger counts') hold off
Generate Plain Text Function from App Session Generate a MATLAB function for use outside the app. The function returns the estimated model SARIMA_PSSGLog given DataTimeTable. 1
In the Models pane of the app, select the SARIMA_PSSGLog model.
2
On the Econometric Modeler tab, in the Export section, click Export > Generate Function. The MATLAB Editor opens and contains a function named modelTimeSeries. The function accepts DataTimeTable (the variable you imported in this session), transforms data, and returns the estimated SARIMA(0,1,1)×(0,1,1)12 model SARIMA_PSSGLog.
4-247
4
Econometric Modeler
3
On the Editor tab, click Save > Save.
4
Save the function to your current folder by clicking Save in the Select File for Save As dialog box.
At the command line, estimate the SARIMA(0,1,1)×(0,1,1)12 model by passing DataTimeTable to modelTimeSeries. Name the model SARIMA_PSSGLog2. Compare the estimated model to SARIMA_PSSGLog. SARIMA_PSSGLog2 = modelTimeSeries(DataTimeTable); summarize(SARIMA_PSSGLog) summarize(SARIMA_PSSGLog2) ARIMA(0,1,1) Model Seasonally Integrated with Seasonal MA(12) (Gaussian Distribution) Effective Sample Size: 144 Number of Estimated Parameters: 3 LogLikelihood: 276.198 AIC: -546.397 BIC: -537.488 Value _________ Constant MA{1} SMA{12} Variance
0 -0.37716 -0.57238 0.0012634
StandardError _____________ 0 0.066794 0.085439 0.00012395
TStatistic __________ NaN -5.6466 -6.6992 10.193
PValue __________ NaN 1.6364e-08 2.0952e-11 2.1406e-24
ARIMA(0,1,1) Model Seasonally Integrated with Seasonal MA(12) (Gaussian Distribution) Effective Sample Size: 144 Number of Estimated Parameters: 3 LogLikelihood: 276.198 AIC: -546.397 BIC: -537.488
4-248
Share Results of Econometric Modeler App Session
Value _________ Constant MA{1} SMA{12} Variance
0 -0.37716 -0.57238 0.0012634
StandardError _____________ 0 0.066794 0.085439 0.00012395
TStatistic __________ NaN -5.6466 -6.6992 10.193
PValue __________ NaN 1.6364e-08 2.0952e-11 2.1406e-24
As expected, the models are identical. Generate Live Function from App Session Unlike a plain text function, a live function contains formatted text and equations that you can modify by using the Live Editor. Generate a live function for use outside the app. The function returns the estimated model SARIMA_PSSGLog given DataTimeTable. 1
In the Models pane of the app, select the SARIMA_PSSGLog model.
2
On the Econometric Modeler tab, in the Export section, click Export > Generate Live Function. The Live Editor opens and contains a function named modelTimeSeries. The function accepts DataTimeTable (the variable you imported in this session), transforms data, and returns the estimated SARIMA(0,1,1)×(0,1,1)12 model SARIMA_PSSGLog.
3
To ensure the function does not shadow the M-file function, change the name of the function to modelTimeSeriesMLX.
4-249
4
Econometric Modeler
4
On the Live Editor tab, in the File section, click Save > Save.
5
Save the function to your current folder by clicking Save in the Select File for Save As dialog box.
At the command line, estimate the SARIMA(0,1,1)×(0,1,1)12 model by passing DataTimeTable to modelTimeSeriesMLX. Name the model SARIMA_PSSGLog2. Compare the estimated model to SARIMA_PSSGLog. SARIMA_PSSGLog2 = modelTimeSeriesMLX(DataTimeTable); summarize(SARIMA_PSSGLog) summarize(SARIMA_PSSGLog2) ARIMA(0,1,1) Model Seasonally Integrated with Seasonal MA(12) (Gaussian Distribution) Effective Sample Size: 144 Number of Estimated Parameters: 3 LogLikelihood: 276.198 AIC: -546.397 BIC: -537.488 Value _________ Constant MA{1} SMA{12} Variance
0 -0.37716 -0.57238 0.0012634
StandardError _____________ 0 0.066794 0.085439 0.00012395
TStatistic __________ -5.6466 -6.6992 10.193
PValue __________ NaN 1.6364e-08 2.0952e-11 2.1406e-24
ARIMA(0,1,1) Model Seasonally Integrated with Seasonal MA(12) (Gaussian Distribution) Effective Sample Size: 144 Number of Estimated Parameters: 3 LogLikelihood: 276.198 AIC: -546.397 BIC: -537.488 Value _________ Constant MA{1} SMA{12} Variance
0 -0.37716 -0.57238 0.0012634
StandardError _____________ 0 0.066794 0.085439 0.00012395
TStatistic __________ NaN -5.6466 -6.6992 10.193
PValue __________ NaN 1.6364e-08 2.0952e-11 2.1406e-24
As expected, the models are identical. Generate Report Generate a PDF report of all your actions on the PSSGLog and PSSGStable time series, and the SARIMA_PSSGLog model. 1
4-250
On the Econometric Modeler tab, in the Export section, click Export > Generate Report.
Share Results of Econometric Modeler App Session
2
In the Select Variables for Report dialog box, select the Select check boxes for the PSSGLog and PSSGStable time series, and the SARIMA_PSSGLog model (if necessary). The app automatically selects the check boxes for all variables that are highlighted in the Time Series and Models panes.
3
Click OK.
4
In the Select File to Write dialog box, navigate to the C:\MyData folder.
5
In the File name box, type SARIMAReport.
6
Click Save.
The app publishes the code required to create PSSGLog, PSSGStable, and SARIMA_PSSGLog in the PDF C:\MyData\SARIMAReport.pdf. The report includes: • A title page and table of contents • Plots that include the selected time series • Descriptions of transformations applied to the selected time series • Results of statistical tests conducted on the selected time series • Estimation summaries of the selected models
4-251
4
Econometric Modeler
References [1] Box, George E. P., Gwilym M. Jenkins, and Gregory C. Reinsel. Time Series Analysis: Forecasting and Control. 3rd ed. Englewood Cliffs, NJ: Prentice Hall, 1994.
See Also Apps Econometric Modeler Objects arima Functions summarize | estimate
More About
4-252
•
“Create Multiplicative Seasonal ARIMA Model for Time Series Data” on page 7-51
•
“Estimate Multiplicative ARIMA Model” on page 7-117
•
“Analyze Time Series Data Using Econometric Modeler” on page 4-2
•
Creating ARIMA Models Using Econometric Modeler App
5 Time Series Regression Models • “Time Series Regression Models” on page 5-3 • “Regression Models with Time Series Errors” on page 5-5 • “Create Regression Models with ARIMA Errors” on page 5-8 • “Specify Default Regression Model with ARIMA Errors” on page 5-19 • “Modify regARIMA Model Properties” on page 5-21 • “Create Regression Models with AR Errors” on page 5-26 • “Create Regression Models with MA Errors” on page 5-31 • “Create Regression Models with ARMA Errors” on page 5-37 • “Create Regression Models with ARIMA Errors” on page 5-46 • “Create Regression Models with SARIMA Errors” on page 5-51 • “Specify Regression Model with SARIMA Errors” on page 5-55 • “Specify ARIMA Error Model Innovation Distribution” on page 5-61 • “Impulse Response of Regression Models with ARIMA Errors” on page 5-66 • “Plot Impulse Response of Regression Model with ARIMA Errors” on page 5-67 • “Maximum Likelihood Estimation of regARIMA Models” on page 5-74 • “regARIMA Model Estimation Using Equality Constraints” on page 5-76 • “Presample Values for regARIMA Model Estimation” on page 5-80 • “Initial Values for regARIMA Model Estimation” on page 5-82 • “Optimization Settings for regARIMA Model Estimation” on page 5-84 • “Estimate Regression Model with ARIMA Errors” on page 5-88 • “Estimate a Regression Model with Multiplicative ARIMA Errors” on page 5-95 • “Select Regression Model with ARIMA Errors” on page 5-103 • “Choose Lags for ARMA Error Model” on page 5-105 • “Intercept Identifiability in Regression Models with ARIMA Errors” on page 5-109 • “Alternative ARIMA Model Representations” on page 5-113 • “Simulate Regression Models with ARMA Errors” on page 5-119 • “Simulate Regression Models with Nonstationary Errors” on page 5-138 • “Simulate Regression Models with Multiplicative Seasonal Errors” on page 5-146 • “Monte Carlo Simulation of Regression Models with ARIMA Errors” on page 5-151 • “Presample Data for regARIMA Model Simulation” on page 5-154 • “Transient Effects in regARIMA Model Simulations” on page 5-155 • “Forecast a Regression Model with ARIMA Errors” on page 5-163 • “Forecast a Regression Model with Multiplicative Seasonal ARIMA Errors” on page 5-166 • “Verify Predictive Ability Robustness of a regARIMA Model” on page 5-170 • “MMSE Forecasting Regression Models with ARIMA Errors” on page 5-172
5
Time Series Regression Models
• “Monte Carlo Forecasting of regARIMA Models” on page 5-175 • “Time Series Regression I: Linear Models” on page 5-176 • “Time Series Regression II: Collinearity and Estimator Variance” on page 5-183 • “Time Series Regression III: Influential Observations” on page 5-193 • “Time Series Regression IV: Spurious Regression” on page 5-200 • “Time Series Regression V: Predictor Selection” on page 5-212 • “Time Series Regression VI: Residual Diagnostics” on page 5-223 • “Time Series Regression VII: Forecasting” on page 5-234 • “Time Series Regression VIII: Lagged Variables and Estimator Bias” on page 5-243 • “Time Series Regression IX: Lag Order Selection” on page 5-264 • “Time Series Regression X: Generalized Least Squares and HAC Estimators” on page 5-282
5-2
Time Series Regression Models
Time Series Regression Models Time series regression models attempt to explain the current response using the response history (autoregressive dynamics) and the transfer of dynamics from relevant predictors (or otherwise). Theoretical frameworks for potential relationships among variables often permit different representations of the system. Use time series regression models to analyze time series data, which are measurements that you take at successive time points. For example, use time series regression modeling to: • Examine the linear effects of the current and past unemployment rates and past inflation rates on the current inflation rate. • Forecast GDP growth rates by using an ARIMA model and include the CPI growth rate as a predictor. • Determine how a unit increase in rainfall, amount of fertilizer, and labor affect crop yield. You can start a time series analysis by building a design matrix (Xt), which can include current and past observations of predictors. You can also complement the regression component with an autoregressive (AR) component to account for the possibility of response (yt) dynamics. For example, include past measurements of inflation rate in the regression component to explain the current inflation rate. AR terms account for dynamics unexplained by the regression component, which is necessarily underspecified in econometric applications. Also, the AR terms absorb residual autocorrelations, simplify innovation models, and generally improve forecast performance. Then, apply ordinary least squares (OLS) to the multiple linear regression (MLR) model: yt = Xt β + ut . If a residual analysis suggests classical linear model assumption departures such as that heteroscedasticity or autocorrelation (i.e., nonspherical errors), then: • You can estimate robust HAC (heteroscedasticity and autocorrelation consistent) standard errors (for details, see hac). • If you know the innovation covariance matrix (at least up to a scaling factor), then you can apply generalized least squares (GLS). Given that the innovation covariance matrix is correct, GLS effectively reduces the problem to a linear regression where the residuals have covariance I. • If you do not know the structure of the innovation covariance matrix, but know the nature of the heteroscedasticity and autocorrelation, then you can apply feasible generalized least squares (FGLS). FGLS applies GLS iteratively, but uses the estimated residual covariance matrix. FGLS estimators are efficient under certain conditions. For details, see [1], Chapter 11. There are time series models that model the dynamics more explicitly than MLR models. These models can account for AR and predictor effects as with MLR models, but have the added benefits of: • Accounting for moving average (MA) effects. Include MA terms to reduce the number of AR lags, effectively reducing the number of observation required to initialize the model. • Easily modeling seasonal effects. In order to model seasonal effects with an MLR model, you have to build an indicator design matrix. • Modeling nonseasonal and seasonal integration for unit root nonstationary processes. These models also differ from MLR in that they rely on distribution assumptions (i.e., they use maximum likelihood for estimation). Popular types of time series regression models include: 5-3
5
Time Series Regression Models
• Autoregressive integrated moving average with exogenous predictors (ARIMAX). This is an ARIMA model that linearly includes predictors (exogenous or otherwise). For details, see arima or “ARIMAX(p,D,q) Model” on page 7-61. • Regression model with ARIMA time series errors. This is an MLR model where the unconditional disturbance process (ut) is an ARIMA time series. In other words, you explicitly model ut as a linear time series. For details, see regARIMA. • Distributed lag model (DLM). This is an MLR model that includes the effects of predictors that persist over time. In other words, the regression component contains coefficients for contemporaneous and lagged values of predictors. Econometrics Toolbox does not contain functions that model DLMs explicitly, but you can use regARIMA or fitlm with an appropriately constructed predictor (design) matrix to analyze a DLM. • Transfer function (autoregressive distributed lag) model. This model extends the distributed lag framework in that it includes autoregressive terms (lagged responses). Econometrics Toolbox does not contain functions that model DLMs explicitly, but you can use the arima functionality with an appropriately constructed predictor matrix to analyze an autoregressive DLM. The choice you make on which model to use depends on your goals for the analysis, and the properties of the data.
References [1] Greene, W. H. Econometric Analysis. 6th ed. Englewood Cliffs, NJ: Prentice Hall, 2008.
See Also arima | regARIMA | hac | fitlm
More About
5-4
•
“ARIMAX(p,D,q) Model” on page 7-61
•
“Regression Models with Time Series Errors” on page 5-5
Regression Models with Time Series Errors
Regression Models with Time Series Errors In this section... “What Are Regression Models with Time Series Errors?” on page 5-5 “Conventions” on page 5-5
What Are Regression Models with Time Series Errors? Regression models with time series errors attempt to explain the mean behavior of a response series (yt, t = 1,...,T) by accounting for linear effects of predictors (Xt) using a multiple linear regression (MLR). However, the errors (ut), called unconditional disturbances, are time series rather than white noise, which is a departure from the linear model assumptions. Unlike the ARIMA model that includes exogenous predictors, regression models with time series errors preserve the sensitivity interpretation of the regression coefficients (β) [2]. These models are particularly useful for econometric data. Use these models to: • Analyze the effects of a new policy on a market indicator (an intervention model). • Forecast population size adjusting for predictor effects, such as expected prevalence of a disease. • Study the behavior of a process adjusting for calendar effects. For example, you can analyze traffic volume by adjusting for the effects of major holidays. For details, see [3]. • Estimate the trend by including time (t) in the model. • Forecast total energy consumption accounting for current and past prices of oil and electricity (distributed lag model). Use these tools in Econometrics Toolbox to: • Specify a regression model with ARIMA errors (see regARIMA). • Estimate parameters using a specified model, and response and predictor data (see estimate). • Simulate responses using a model and predictor data (see simulate). • Forecast responses using a model and future predictor data (see forecast). • Infer residuals and estimated unconditional disturbances from a model using the model and predictor data (see infer). • filter innovations through a model using the model and predictor data • Generate impulse responses (see impulse). • Compare a regression model with ARIMA errors to an ARIMAX model (see arima).
Conventions A regression model with time series errors has the following form (in lag operator notation): yt = c + Xt β + ut a L A L 1−L
D
1 − Ls ut = b L B L εt,
(5-1)
where 5-5
5
Time Series Regression Models
• t = 1,...,T. • yt is the response series. • Xt is row t of X, which is the matrix of concatenated predictor data vectors. That is, Xt is observation t of each predictor series. • c is the regression model intercept. • β is the regression coefficient. • ut is the disturbance series. • εt is the innovations series. • L jy = y t− j. t • a L = 1 − a L − ... − a Lp , which is the degree p, nonseasonal autoregressive polynomial. 1 p •
A L = 1 − A1L − ... − ApsL
•
1 − L , which is the degree D, nonseasonal integration polynomial.
•
1 − Ls , which is the degree s, seasonal integration polynomial.
ps
, which is the degree ps, seasonal autoregressive polynomial.
D
• b L = 1 + b L + ... + b Lq , which is the degree q, nonseasonal moving average polynomial. 1 q • B L = 1 + B L + ... + B Lqs , which is the degree q , seasonal moving average polynomial. s 1 qs Following Box and Jenkins methodology, ut is a stationary or unit root nonstationary, regular, linear time series. However, if ut is unit root nonstationary, then you do not have to explicitly difference the series as they recommend in [1]. You can simply specify the seasonal and nonseasonal integration degree using the software. For details, see “Create Regression Models with ARIMA Errors” on page 5-8. Another deviation from the Box and Jenkins methodology is that ut does not have a constant term (conditional mean), and therefore its unconditional mean is 0. However, the regression model contains an intercept term, c. Note If the unconditional disturbance process is nonstationary (i.e., the nonseasonal or seasonal integration degree is greater than 0), then the regression intercept, c, is not identifiable. For details, see “Intercept Identifiability in Regression Models with ARIMA Errors” on page 5-109. The software enforces stability and invertibility of the ARMA process. That is, ψ(L) =
b(L)B(L) = 1 + ψ1L + ψ2L2 + ..., a(L)A(L)
where the series {ψt} must be absolutely summable. The conditions for {ψt} to be absolutely summable are: • a(L) and A(L) are stable (i.e., the eigenvalues of a(L) = 0 and A(L) = 0 lie inside the unit circle). • b(L) and B(L) are invertible (i.e., their eigenvalues lie of b(L) = 0 and B(L) = 0 inside the unit circle). The software uses maximum likelihood for parameter estimation. You can choose either a Gaussian or Student’s t distribution for the innovations, εt. 5-6
Regression Models with Time Series Errors
The software treats predictors as nonstochastic variables for estimation and inference.
References [1] Box, G. E. P., G. M. Jenkins, and G. C. Reinsel. Time Series Analysis: Forecasting and Control. 3rd ed. Englewood Cliffs, NJ: Prentice Hall, 1994. [2] Hyndman, R. J. (2010, October). “The ARIMAX Model Muddle.” Rob J. Hyndman. Retrieved May 4, 2017 from https://robjhyndman.com/hyndsight/arimax/. [3] Ruey, T. S. “Regression Models with Time Series Errors.” Journal of the American Statistical Association. Vol. 79, Number 385, March 1984, pp. 118–124.
See Also regARIMA | arima | estimate | filter | forecast | impulse | infer | simulate
Related Examples •
“Alternative ARIMA Model Representations” on page 5-113
•
“Intercept Identifiability in Regression Models with ARIMA Errors” on page 5-109
More About •
“What Are ARIMA Models That Include Exogenous Covariates?” on page 7-61
•
“Create Regression Models with ARIMA Errors” on page 5-8
5-7
5
Time Series Regression Models
Create Regression Models with ARIMA Errors In this section... “Default Regression Model with ARIMA Errors Specifications” on page 5-8 “Specify regARIMA Models Using Name-Value Pair Arguments” on page 5-9 “Specify Linear Regression Models Using Econometric Modeler App” on page 5-15
Default Regression Model with ARIMA Errors Specifications Regression models with ARIMA errors have the following form (in lag operator notation on page 121): yt = c + Xt β + ut a L A L 1−L
D
1 − Ls ut = b L B L εt,
where • t = 1,...,T. • yt is the response series. • Xt is row t of X, which is the matrix of concatenated predictor data vectors. That is, Xt is observation t of each predictor series. • c is the regression model intercept. • β is the regression coefficient. • ut is the disturbance series. • εt is the innovations series. • L jy = y t− j. t • a L = 1 − a L − ... − a Lp , which is the degree p, nonseasonal autoregressive polynomial. 1 p •
A L = 1 − A1L − ... − ApsL
•
1 − L , which is the degree D, nonseasonal integration polynomial.
•
1 − Ls , which is the degree s, seasonal integration polynomial.
ps
, which is the degree ps, seasonal autoregressive polynomial.
D
• b L = 1 + b L + ... + b Lq , which is the degree q, nonseasonal moving average polynomial. 1 q • B L = 1 + B L + ... + B Lqs , which is the degree q , seasonal moving average polynomial. s 1 qs For simplicity, use the shorthand notation Mdl = regARIMA(p,D,q) to specify a regression model with ARIMA(p,D,q) errors, where p, D, and q are nonnegative integers. Mdl has the following default properties.
5-8
Property Name
Property Data Type
AR
Length p cell vector of NaNs
Beta
Empty vector [] of regression coefficients, corresponding to the predictor series
Create Regression Models with ARIMA Errors
Property Name
Property Data Type
D
Nonnegative scalar, corresponding to D
Distribution
"Gaussian", corresponding to the distribution of εt
Intercept
NaN, corresponding to c
MA
Length q cell vector of NaNs
P
Number of AR terms plus degree of integration, p +D
Q
Number of MA terms, q
SAR
Empty cell vector
SMA
Empty cell vector
Variance
NaN, corresponding to the variance of εt
Seasonality
0, corresponding to s
If you specify nonseasonal ARIMA errors, then • The properties D and Q are the inputs D and q, respectively. • Property P = p + D, which is the degree of the compound, nonseasonal autoregressive polynomial. In other words, P is the degree of the product of the nonseasonal autoregressive polynomial, a(L) and the nonseasonal integration polynomial, (1 – L)D. The values of properties P and Q indicate how many presample observations the software requires to initialize the time series. You can modify the properties of Mdl using dot notation. For example, Mdl.Variance = 0.5 sets the innovation variance to 0.5. For maximum flexibility in specifying a regression model with ARIMA errors, use name-value pair arguments to, for example, set each of the autoregressive parameters to a value, or specify multiplicative seasonal terms. For example, Mdl = regARIMA('AR',{0.2 0.1}) defines a regression model with AR(2) errors, and the coefficients are a1 = 0.2 and a2 = 0.1.
Specify regARIMA Models Using Name-Value Pair Arguments You can only specify the nonseasonal autoregressive and moving average polynomial degrees, and nonseasonal integration degree using the shorthand notation regARIMA(p,D,q). Some tasks, such as forecasting and simulation, require you to specify values for parameters. You cannot specify parameter values using shorthand notation. For maximum flexibility, use name-value pair arguments to specify regression models with ARIMA errors. The nonseasonal ARIMA error model might contain the following polynomials: • The degree p autoregressive polynomial a(L) = 1 – a1L – a2L2 –...– apLp. The eigenvalues of a(L) must lie within the unit circle (i.e., a(L) must be a stable polynomial). • The degree q moving average polynomial b(L) = 1 + b1L + b2L2 +...+ bqLq. The eigenvalues of b(L) must lie within the unit circle (i.e., b(L) must be an invertible polynomial). • The degree D nonseasonal integration polynomial is (1 – L)D. 5-9
5
Time Series Regression Models
The following table contains the name-value pair arguments that you use to specify the ARIMA error model (i.e., a regression model with ARIMA errors, but without a regression component and intercept): yt = ut 1
a(L)( = b(L)εt .
5-10
(5-2)
Create Regression Models with ARIMA Errors
Name-Value Pair Arguments for Nonseasonal ARIMA Error Models Name
Corresponding Model Term(s) in “Equation 5-2”
When to Specify
AR
Nonseasonal AR coefficients: a1, a2,...,ap
• To set equality constraints for the AR coefficients. For example, to specify the AR coefficients in the ARIMA error model ut = 0.8ut − 1 − 0.2ut − 2 + εt, specify 'AR',{0.8,-0.2}. • You only need to specify the nonzero elements of AR. If the nonzero coefficients are at nonconsecutive lags, specify the corresponding lags using ARLags. • The coefficients must correspond to a stable AR polynomial.
ARLags
Lags corresponding • ARLags is not a model property. to nonzero, Use this argument as a shortcut for specifying AR nonseasonal AR when the nonzero AR coefficients correspond to coefficients nonconsecutive lags. For example, to specify nonzero AR coefficients at lags 1 and 12, e.g., ut = a1ut − 1 + a2ut − 12 + εt, specify 'ARLags',[1,12]. • Use AR and ARLags together to specify known nonzero AR coefficients at nonconsecutive lags. For example, if in the given AR(12) error model with a1 = 0.6 and a12 = –0.3, then specify 'AR', {0.6,-0.3},'ARLags',[1,12].
D
Degree of nonseasonal differencing, D
• To specify a degree of nonseasonal differencing greater than zero. For example, to specify one degree of differencing, specify 'D',1. • By default, D has value 0 (meaning no nonseasonal integration).
Distribution
Distribution of the • Use this argument to specify a Student’s t innovation process, distribution. By default, the innovation distribution εt is "Gaussian". For example, to specify a t distribution with unknown degrees of freedom, specify 'Distribution','t'. • To specify a t innovation distribution with known degrees of freedom, assign Distribution a structure with fields Name and DoF. For example, for a t distribution with nine degrees of freedom, specify 'Distribution',struct('Name','t','DoF' ,9).
5-11
5
Time Series Regression Models
Name
Corresponding Model Term(s) in “Equation 5-2”
When to Specify
MA
Nonseasonal MA coefficients: b1, b2,...,bq
• To set equality constraints for the MA coefficients. For example, to specify the MA coefficients in the ARIMA error model ut = εt + 0.5εt − 1 + 0.2εt − 2, specify 'MA',{0.5,0.2}. • You only need to specify the nonzero elements of MA. If the nonzero coefficients are at nonconsecutive lags, specify the corresponding lags using MALags. • The coefficients must correspond to an invertible MA polynomial.
MALags
Lags corresponding • MALags is not a model property. to nonzero, • Use this argument as a shortcut for specifying MA nonseasonal MA when the nonzero MA coefficients correspond to coefficients nonconsecutive lags. For example, to specify nonzero MA coefficients at lags 1 and 4, e.g., ut = εt + b1εt − 1 + b4εt − 4, specify 'MALags',[1,4]. • Use MA and MALags together to specify known nonzero MA coefficients at nonconsecutive lags. For example, if in the given MA(4) error model b1 = 0.5 and b4 = 0.2, specify 'MA', {0.4,0.2},'MALags',[1,4].
Variance
Scalar variance, σ2, To set equality constraints for σ2. For example, for an of the innovation ARIMA error model with known innovation variance process, εt 0.1, specify 'Variance',0.1. By default, Variance has value NaN.
Use the name-value pair arguments in the following table in conjunction with those in Name-Value Pair Arguments for Nonseasonal ARIMA Error Models to specify the regression components of the regression model with ARIMA errors: yt = c + Xt β + ut 1
a(L)( = b(L)εt .
5-12
(5-3)
Create Regression Models with ARIMA Errors
Name-Value Pair Arguments for the Regression Component of the regARIMA Model Name
Corresponding Model Term(s) in “Equation 5-3”
When to Specify
Beta
Regression • Use this argument to specify the values of the coefficient values coefficients of the predictor series. For example, corresponding to use 'Beta',[0.5 7 -2] to specify the predictor series, β = 0.5 7 −2 ′ . β • By default, Beta is an empty vector, [].
Intercept
Intercept term for the regression model, c
• To set equality constraints for c. For example, for a model with no intercept term, specify 'Intercept',0. • By default, Intercept has value NaN.
If the time series has seasonality s, then • The degree ps seasonal autoregressive polynomial is A(L) = 1 – A 1L – A2L2 –...– ApsLps. • The degree qs seasonal moving average polynomial is B(L) 1 + B 1L + B2L2 +...+ BqsLqs. • The degree s seasonal integration polynomial is (1 – Ls). Use the name-value pair arguments in the following table in conjunction with those in tables NameValue Pair Arguments for Nonseasonal ARIMA Error Models and Name-Value Pair Arguments for the Regression Component of the regARIMA Model to specify the regression model with multiplicative seasonal ARIMA errors: yt = c + Xt β + ut 1
a(L)( A(L)(1 − Ls)ut = b(L)B(L)εt .
(5-4)
5-13
5
Time Series Regression Models
Name-Value Pair Arguments for Seasonal ARIMA Models Argument
Corresponding Model When to Specify Term(s) in “Equation 5-4”
SAR
Seasonal AR coefficients: A1, • To set equality constraints for the seasonal AR A2,...,Aps coefficients. • Use SARLags to specify the lags of the nonzero seasonal AR coefficients. Specify the lags associated with the seasonal polynomials in the periodicity of the observed data (e.g., 4, 8,... for quarterly data, or 12, 24,... for monthly data), and not as multiples of the seasonality (e.g., 1, 2,...). For example, to specify the ARIMA error model (1 − 0.8L)(1 − 0.2L12)ut = εt, specify 'AR',0.8,'SAR',0.2,'SARLags',12. • The coefficients must correspond to a stable seasonal AR polynomial.
SARLags
Lags corresponding to nonzero seasonal AR coefficients, in the periodicity of the responses
• SARLags is not a model property. • Use this argument when specifying SAR to indicate the lags of the nonzero seasonal AR coefficients. For example, to specify the ARIMA error model (1 − a1L)(1 − A12L12)ut = εt, specify 'ARLags',1,'SARLags',12.
SMA
Seasonal MA coefficients: B1, • To set equality constraints for the seasonal MA B2,...,Bqs coefficients. • Use SMALags to specify the lags of the nonzero seasonal MA coefficients. Specify the lags associated with the seasonal polynomials in the periodicity of the observed data (e.g., 4, 8,... for quarterly data, or 12, 24,... for monthly data), and not as multiples of the seasonality (e.g., 1, 2,...). For example, to specify the ARIMA error model ut = (1 + 0.6L)(1 + 0.2L4)εt, specify 'MA',0.6,'SMA',0.2,'SMALags',4. • The coefficients must correspond to an invertible seasonal MA polynomial.
SMALags
Lags corresponding to the nonzero seasonal MA coefficients, in the periodicity of the responses
• SMALags is not a model property. • Use this argument when specifying SMA to indicate the lags of the nonzero seasonal MA coefficients. For example, to specify the model ut = (1 + b1L)(1 + B4L4)εt, specify 'MALags',1,'SMALags',4.
5-14
Create Regression Models with ARIMA Errors
Argument
Corresponding Model When to Specify Term(s) in “Equation 5-4”
Seasonality
Seasonal periodicity, s
• To specify the degree of seasonal integration s in the seasonal differencing polynomial Δs = 1 – Ls. For example, to specify the periodicity for seasonal integration of quarterly data, specify 'Seasonality',4. • By default, Seasonality has value 0 (meaning no periodicity nor seasonal integration).
Note You cannot assign values to the properties P and Q. For multiplicative ARIMA error models, • regARIMA sets P equal to p + D + ps + s. • regARIMA sets Q equal to q + qs
Specify Linear Regression Models Using Econometric Modeler App You can specify the predictor variables in the regression component, and the error model lag structure and innovation distribution, using the Econometric Modeler app. The app treats all coefficients as unknown and estimable. At the command line, open the Econometric Modeler app. econometricModeler
Alternatively, open the app from the apps gallery (see Econometric Modeler). In the app, you can see all supported models by selecting a time series variable for the response in the Time Series pane. Then, on the Econometric Modeler tab, in the Models section, click the arrow to display the models gallery.
5-15
5
Time Series Regression Models
The Regression Models section contains supported regression models. To specify a multiple linear regression (MLR) model, select MLR. To specify regression models with ARMA errors, select RegARMA. After you select a model, the app displays the Type Model Parameters dialog box, where Type is the model type. This figure shows the RegARMA Model Parameters dialog box.
5-16
Create Regression Models with ARIMA Errors
Adjustable parameters depend on the model Type. In general, adjustable parameters include: • Predictor variables for the linear regression component, listed in the Predictors section. • For regression models with ARMA errors, you must include at least one predictor in the model. To include a predictor, select the corresponding check box in the Include? column. • For MLR models, you can clear all check boxes in the Include? column. In this case, you can specify a constant mean model (intercept-only model) by selecting the Include Intercept check box. Or, you can specify an error-only model by clearing the Include Intercept check box. • The innovation distribution and nonseasonal lags for the error model, for regression models with ARMA errors. As you adjust parameter values, the equation in the Model Equation section changes to match your specifications. Adjustable parameters correspond to input and name-value pair arguments described in the previous sections and in the regARIMA reference page. For more details on specifying models using the app, see “Fitting Models to Data” on page 4-15 and “Specifying Univariate Lag Operator Polynomials Interactively” on page 4-44.
See Also Apps Econometric Modeler 5-17
5
Time Series Regression Models
Objects regARIMA
Related Examples •
“Analyze Time Series Data Using Econometric Modeler” on page 4-2
•
“Specifying Univariate Lag Operator Polynomials Interactively” on page 4-44
•
“Specify Default Regression Model with ARIMA Errors” on page 5-19
•
“Modify regARIMA Model Properties” on page 5-21
•
“Create Regression Models with AR Errors” on page 5-26
•
“Create Regression Models with MA Errors” on page 5-31
•
“Create Regression Models with ARMA Errors” on page 5-37
•
“Create Regression Models with SARIMA Errors” on page 5-51
•
“Specify ARIMA Error Model Innovation Distribution” on page 5-61
More About •
5-18
“Regression Models with Time Series Errors” on page 5-5
Specify Default Regression Model with ARIMA Errors
Specify Default Regression Model with ARIMA Errors This example shows how to specify the default regression model with ARIMA errors using the shorthand ARIMA(p, D, q) notation corresponding to the following equation: yt = c + ut D
1 − ϕ1L − ϕ2L2 − ϕ3L3 1 − L ut = 1 + θ1L + θ2L2 εt . Specify a regression model with ARIMA(3,1,2) errors. Mdl = regARIMA(3,1,2) Mdl = regARIMA with properties: Description: SeriesName: Distribution: Intercept: Beta: P: D: Q: AR: SAR: MA: SMA: Variance:
"ARIMA(3,1,2) Error Model (Gaussian Distribution)" "Y" Name = "Gaussian" NaN [1×0] 4 1 2 {NaN NaN NaN} at lags [1 2 3] {} {NaN NaN} at lags [1 2] {} NaN
The model specification for Mdl appears in the Command Window. By default, regARIMA sets: • The autoregressive (AR) parameter values to NaN at lags [1 2 3] • The moving average (MA) parameter values to NaN at lags [1 2] • The variance (Variance) of the innovation process, εt, to NaN • The distribution (Distribution) of εt to Gaussian • The regression model intercept to NaN There is no regression component (Beta) by default. The property:
• P = p + D, which represents the number of presample observations that the software requires to initialize the autoregressive component of the model to perform, for example, estimation. • D represents the level of nonseasonal integration. • Q represents the number of presample observations that the software requires to initialize the moving average component of the model to perform, for example, estimation. Fit Mdl to data by passing it and the data into estimate. If you pass the predictor series into estimate, then estimate estimates Beta by default. You can modify the properties of Mdl using dot notation. 5-19
5
Time Series Regression Models
References: Box, G. E. P., G. M. Jenkins, and G. C. Reinsel. Time Series Analysis: Forecasting and Control. 3rd ed. Englewood Cliffs, NJ: Prentice Hall, 1994.
See Also regARIMA | estimate | simulate | forecast
Related Examples •
“Create Regression Models with ARIMA Errors” on page 5-8
•
“Modify regARIMA Model Properties” on page 5-21
•
“Create Regression Models with AR Errors” on page 5-26
•
“Create Regression Models with MA Errors” on page 5-31
•
“Create Regression Models with ARMA Errors” on page 5-37
•
“Create Regression Models with SARIMA Errors” on page 5-51
•
“Specify ARIMA Error Model Innovation Distribution” on page 5-61
More About •
5-20
“Regression Models with Time Series Errors” on page 5-5
Modify regARIMA Model Properties
Modify regARIMA Model Properties In this section... “Modify Properties Using Dot Notation” on page 5-21 “Nonmodifiable Properties” on page 5-23
Modify Properties Using Dot Notation If you create a regression model with ARIMA errors using regARIMA, then the software assigns values to all of its properties. To change any of these property values, you do not need to reconstruct the entire model. You can modify property values of an existing model using dot notation. To access the property, type the model name, then the property name, separated by '|.|' (a period). Specify the regression model with ARIMA(3,1,2) errors yt = c + ut D
1 − ϕ1L − ϕ2L2 − ϕ3L3 1 − L ut = 1 + θ1L + θ2L2 εt . Mdl = regARIMA(3,1,2);
Use cell array notation to set the autoregressive and moving average parameters to values. Mdl.AR = {0.2 0.1 0.05}; Mdl.MA = {0.1 -0.05} Mdl = regARIMA with properties: Description: SeriesName: Distribution: Intercept: Beta: P: D: Q: AR: SAR: MA: SMA: Variance:
"ARIMA(3,1,2) Error Model (Gaussian Distribution)" "Y" Name = "Gaussian" NaN [1×0] 4 1 2 {0.2 0.1 0.05} at lags [1 2 3] {} {0.1 -0.05} at lags [1 2] {} NaN
Use dot notation to display the autoregressive coefficients of Mdl in the Command Window. ARCoeff = Mdl.AR ARCoeff=1×3 cell array {[0.2000]} {[0.1000]}
{[0.0500]}
ARCoeff is a 1-by-3 cell array. Each, successive cell contains the next autoregressive lags. You can also add more lag coefficients. 5-21
5
Time Series Regression Models
Mdl.MA = {0.1 -0.05 0.01} Mdl = regARIMA with properties: Description: SeriesName: Distribution: Intercept: Beta: P: D: Q: AR: SAR: MA: SMA: Variance:
"ARIMA(3,1,3) Error Model (Gaussian Distribution)" "Y" Name = "Gaussian" NaN [1×0] 4 1 3 {0.2 0.1 0.05} at lags [1 2 3] {} {0.1 -0.05 0.01} at lags [1 2 3] {} NaN
By default, the specification sets the new coefficient to the next, consecutive lag. The addition of the new coefficient increases Q by 1. You can specify a lag coefficient to a specific lag term by using cell indexing. Mdl.AR{12} = 0.01 Mdl = regARIMA with properties: Description: SeriesName: Distribution: Intercept: Beta: P: D: Q: AR: SAR: MA: SMA: Variance:
"ARIMA(12,1,3) Error Model (Gaussian Distribution)" "Y" Name = "Gaussian" NaN [1×0] 13 1 3 {0.2 0.1 0.05 0.01} at lags [1 2 3 12] {} {0.1 -0.05 0.01} at lags [1 2 3] {} NaN
The autoregressive coefficient 0.01 is located at the 12th lag. Property P increases to 13 with the new specification. Set the innovation distribution to the t distribution with NaN degrees of freedom. Distribution = struct('Name','t','DoF',NaN); Mdl.Distribution = Distribution Mdl = regARIMA with properties: Description: SeriesName: Distribution: Intercept:
5-22
"ARIMA(12,1,3) Error Model (t Distribution)" "Y" Name = "t", DoF = NaN NaN
Modify regARIMA Model Properties
Beta: P: D: Q: AR: SAR: MA: SMA: Variance:
[1×0] 13 1 3 {0.2 0.1 0.05 0.01} at lags [1 2 3 12] {} {0.1 -0.05 0.01} at lags [1 2 3] {} NaN
If DoF is NaN, then estimate estimates the degrees of freedom. For other tasks, such as simulating or forecasting a model, you must specify a value for DoF. To specify a regression coefficient, assign a vector to the property Beta. Mdl.Beta = [1; 3; -5] Mdl = regARIMA with properties: Description: SeriesName: Distribution: Intercept: Beta: P: D: Q: AR: SAR: MA: SMA: Variance:
"Regression with ARIMA(12,1,3) Error Model (t Distribution)" "Y" Name = "t", DoF = NaN NaN [1 3 -5] 13 1 3 {0.2 0.1 0.05 0.01} at lags [1 2 3 12] {} {0.1 -0.05 0.01} at lags [1 2 3] {} NaN
If you pass Mdl into estimate with the response data and three predictor series, then the software fixes the non-|NaN| parameters at their values, and estimate Intercept, Variance, and DoF. For example, if you want to simulate data from this model, then you must specify Variance and DoF.
Nonmodifiable Properties Not all properties of a regARIMA model are modifiable. To change them directly, you must redefine the model using regARIMA. Nonmodifiable properties include: • P, which is the compound autoregressive polynomial degree. The software determines P from p, d, ps, and s. For details on notation, see “Regression Model with ARIMA Time Series Errors” on page 12-2045. • Q, which is the compound moving average degree. The software determines Q from q and qs • DoF, which is the degrees of freedom for models having a t-distributed innovation process Though they are not explicitly properties, you cannot reassign or print the lag structure using ARLags, MALags, SARLags, or SMALags. Pass these and the lag structure into regARIMA as namevalue pair arguments when you specify the model. For example, specify a regression model with ARIMA(4,1) errors using regARIMA, where the autoregressive coefficients occur at lags 1 and 4. 5-23
5
Time Series Regression Models
Mdl = regARIMA('ARLags',[1 4],'MALags',1) Mdl = regARIMA with properties: Description: SeriesName: Distribution: Intercept: Beta: P: Q: AR: SAR: MA: SMA: Variance:
"ARMA(4,1) Error Model (Gaussian Distribution)" "Y" Name = "Gaussian" NaN [1×0] 4 1 {NaN NaN} at lags [1 4] {} {NaN} at lag [1] {} NaN
You can produce the same results by specifying a regression model with ARMA(1,1) errors, then adding an autoregressive coefficient at the fourth lag. Mdl = regARIMA(1,0,1); Mdl.AR{4} = NaN Mdl = regARIMA with properties: Description: SeriesName: Distribution: Intercept: Beta: P: Q: AR: SAR: MA: SMA: Variance:
"ARMA(4,1) Error Model (Gaussian Distribution)" "Y" Name = "Gaussian" NaN [1×0] 4 1 {NaN NaN} at lags [1 4] {} {NaN} at lag [1] {} NaN
To change the value of DoF, you must define a new structure for the distribution, and use dot notation to pass it into the model. For example, specify a regression model with AR(1) errors having tdistributed innovations. Mdl = regARIMA('AR',0.5,'Distribution','t') Mdl = regARIMA with properties: Description: SeriesName: Distribution: Intercept: Beta: P: Q: AR: SAR:
5-24
"ARMA(1,0) Error Model (t Distribution)" "Y" Name = "t", DoF = NaN NaN [1×0] 1 0 {0.5} at lag [1] {}
Modify regARIMA Model Properties
MA: {} SMA: {} Variance: NaN
The value of DoF is NaN by default. Specify that the t distribution has 10 degrees of freedom. Distribution = struct('Name','t','DoF',10); Mdl.Distribution = Distribution Mdl = regARIMA with properties: Description: SeriesName: Distribution: Intercept: Beta: P: Q: AR: SAR: MA: SMA: Variance:
"ARMA(1,0) Error Model (t Distribution)" "Y" Name = "t", DoF = 10 NaN [1×0] 1 0 {0.5} at lag [1] {} {} {} NaN
See Also regARIMA | estimate | simulate | forecast
Related Examples •
“Create Regression Models with ARIMA Errors” on page 5-8
•
“Specify Default Regression Model with ARIMA Errors” on page 5-19
•
“Create Regression Models with AR Errors” on page 5-26
•
“Create Regression Models with MA Errors” on page 5-31
•
“Create Regression Models with ARMA Errors” on page 5-37
•
“Create Regression Models with SARIMA Errors” on page 5-51
•
“Specify ARIMA Error Model Innovation Distribution” on page 5-61
More About •
“Regression Models with Time Series Errors” on page 5-5
5-25
5
Time Series Regression Models
Create Regression Models with AR Errors In this section... “Default Regression Model with AR Errors” on page 5-26 “AR Error Model Without an Intercept” on page 5-27 “AR Error Model with Nonconsecutive Lags” on page 5-27 “Known Parameter Values for a Regression Model with AR Errors” on page 5-28 “Regression Model with AR Errors and t Innovations” on page 5-29 These examples show how to create regression models with AR errors using regARIMA. For details on specifying regression models with AR errors using the Econometric Modeler app, see “Specify Regression Model with ARMA Errors Using Econometric Modeler App” on page 5-41.
Default Regression Model with AR Errors This example shows how to apply the shorthand regARIMA(p,D,q) syntax to specify a regression model with AR errors. Specify the default regression model with AR(3) errors: yt = c + Xt β + ut ut = a1ut − 1 + a2ut − 2 + a3ut − 3 + εt . Mdl = regARIMA(3,0,0) Mdl = regARIMA with properties: Description: SeriesName: Distribution: Intercept: Beta: P: Q: AR: SAR: MA: SMA: Variance:
"ARMA(3,0) Error Model (Gaussian Distribution)" "Y" Name = "Gaussian" NaN [1×0] 3 0 {NaN NaN NaN} at lags [1 2 3] {} {} {} NaN
The software sets the innovation distribution to Gaussian, and each parameter to NaN. The AR coefficients are at lags 1 through 3. Pass Mdl into estimate with data to estimate the parameters set to NaN. Though Beta is not in the display, if you pass a matrix of predictors ( Xt) into estimate, then estimate estimates Beta. The estimate function infers the number of regression coefficients in Beta from the number of columns in Xt. Tasks such as simulation and forecasting using simulate and forecast do not accept models with at least one NaN for a parameter value. Use dot notation to modify parameter values. 5-26
Create Regression Models with AR Errors
AR Error Model Without an Intercept This example shows how to specify a regression model with AR errors without a regression intercept. Specify the default regression model with AR(3) errors: yt = Xt β + ut ut = a1ut − 1 + a2ut − 2 + a3ut − 3 + εt . Mdl = regARIMA('ARLags',1:3,'Intercept',0) Mdl = regARIMA with properties: Description: SeriesName: Distribution: Intercept: Beta: P: Q: AR: SAR: MA: SMA: Variance:
"ARMA(3,0) Error Model (Gaussian Distribution)" "Y" Name = "Gaussian" 0 [1×0] 3 0 {NaN NaN NaN} at lags [1 2 3] {} {} {} NaN
The software sets Intercept to 0, but all other estimable parameters in Mdl are NaN values by default. Since Intercept is not a NaN, it is an equality constraint during estimation. In other words, if you pass Mdl and data into estimate, then estimate sets Intercept to 0 during estimation. You can modify the properties of Mdl using dot notation.
AR Error Model with Nonconsecutive Lags This example shows how to specify a regression model with AR errors, where the nonzero AR terms are at nonconsecutive lags. Specify the regression model with AR(4) errors: yt = c + Xt β + ut ut = a1ut − 1 + a4ut − 4 + εt . Mdl = regARIMA('ARLags',[1,4]) Mdl = regARIMA with properties: Description: "ARMA(4,0) Error Model (Gaussian Distribution)"
5-27
5
Time Series Regression Models
SeriesName: Distribution: Intercept: Beta: P: Q: AR: SAR: MA: SMA: Variance:
"Y" Name = "Gaussian" NaN [1×0] 4 0 {NaN NaN} at lags [1 4] {} {} {} NaN
The AR coefficients are at lags 1 and 4. Verify that the AR coefficients at lags 2 and 3 are 0. Mdl.AR ans=1×4 cell array {[NaN]} {[0]}
{[0]}
{[NaN]}
The software displays a 1-by-4 cell array. Each consecutive cell contains the corresponding AR coefficient value. Pass Mdl and data into estimate. The software estimates all parameters that have the value NaN. Then, estimate holds a2 = 0 and a3 = 0 during estimation.
Known Parameter Values for a Regression Model with AR Errors This example shows how to specify values for all parameters of a regression model with AR errors. Specify the regression model with AR(4) errors: −2 + ut 0.5 ut = 0 . 2ut − 1 + 0 . 1ut − 4 + εt, yt = Xt
where εt is Gaussian with unit variance. Mdl = regARIMA('AR',{0.2,0.1},'ARLags',[1,4], ... 'Intercept',0,'Beta',[-2;0.5],'Variance',1) Mdl = regARIMA with properties: Description: SeriesName: Distribution: Intercept: Beta: P: Q: AR:
5-28
"Regression with ARMA(4,0) Error Model (Gaussian Distribution)" "Y" Name = "Gaussian" 0 [-2 0.5] 4 0 {0.2 0.1} at lags [1 4]
Create Regression Models with AR Errors
SAR: MA: SMA: Variance:
{} {} {} 1
There are no NaN values in any Mdl properties, and therefore there is no need to estimate Mdl using estimate. However, you can simulate or forecast responses from Mdl using simulate or forecast.
Regression Model with AR Errors and t Innovations This example shows how to set the innovation distribution of a regression model with AR errors to a t distribution. Specify the regression model with AR(4) errors: −2 + ut 0.5 ut = 0 . 2ut − 1 + 0 . 1ut − 4 + εt, yt = Xt
where εt has a t distribution with the default degrees of freedom and unit variance. Mdl = regARIMA('AR',{0.2,0.1},'ARLags',[1,4],... 'Intercept',0,'Beta',[-2;0.5],'Variance',1,... 'Distribution','t') Mdl = regARIMA with properties: Description: SeriesName: Distribution: Intercept: Beta: P: Q: AR: SAR: MA: SMA: Variance:
"Regression with ARMA(4,0) Error Model (t Distribution)" "Y" Name = "t", DoF = NaN 0 [-2 0.5] 4 0 {0.2 0.1} at lags [1 4] {} {} {} 1
The default degrees of freedom is NaN. If you don't know the degrees of freedom, then you can estimate it by passing Mdl and the data to estimate. Specify a t10 distribution. Mdl.Distribution = struct('Name','t','DoF',10) Mdl = regARIMA with properties: Description: "Regression with ARMA(4,0) Error Model (t Distribution)" SeriesName: "Y"
5-29
5
Time Series Regression Models
Distribution: Intercept: Beta: P: Q: AR: SAR: MA: SMA: Variance:
Name = "t", DoF = 10 0 [-2 0.5] 4 0 {0.2 0.1} at lags [1 4] {} {} {} 1
You can simulate or forecast responses using simulate or forecast because Mdl is completely specified. In applications, such as simulation, the software normalizes the random t innovations. In other words, Variance overrides the theoretical variance of the t random variable (which is DoF/(DoF - 2)), but preserves the kurtosis of the distribution.
See Also Apps Econometric Modeler Objects regARIMA Functions estimate | simulate | forecast
Related Examples •
“Analyze Time Series Data Using Econometric Modeler” on page 4-2
•
“Specifying Univariate Lag Operator Polynomials Interactively” on page 4-44
•
“Create Regression Models with ARIMA Errors” on page 5-8
•
“Specify Default Regression Model with ARIMA Errors” on page 5-19
•
“Create Regression Models with MA Errors” on page 5-31
•
“Create Regression Models with ARMA Errors” on page 5-37
•
“Create Regression Models with SARIMA Errors” on page 5-51
•
“Specify ARIMA Error Model Innovation Distribution” on page 5-61
More About •
5-30
“Regression Models with Time Series Errors” on page 5-5
Create Regression Models with MA Errors
Create Regression Models with MA Errors In this section... “Default Regression Model with MA Errors” on page 5-31 “MA Error Model Without an Intercept” on page 5-32 “MA Error Model with Nonconsecutive Lags” on page 5-32 “Known Parameter Values for a Regression Model with MA Errors” on page 5-33 “Regression Model with MA Errors and t Innovations” on page 5-34 These examples show how to create regression models with MA errors using regARIMA. For details on specifying regression models with MA errors using the Econometric Modeler app, see “Specify Regression Model with ARMA Errors Using Econometric Modeler App” on page 5-41.
Default Regression Model with MA Errors This example shows how to apply the shorthand regARIMA(p,D,q) syntax to specify the regression model with MA errors. Specify the default regression model with MA(2) errors: yt = c + Xt β + ut ut = εt + b1εt − 1 + b2εt − 2 . Mdl = regARIMA(0,0,2) Mdl = regARIMA with properties: Description: SeriesName: Distribution: Intercept: Beta: P: Q: AR: SAR: MA: SMA: Variance:
"ARMA(0,2) Error Model (Gaussian Distribution)" "Y" Name = "Gaussian" NaN [1×0] 0 2 {} {} {NaN NaN} at lags [1 2] {} NaN
The software sets each parameter to NaN, and the innovation distribution to Gaussian. The MA coefficients are at lags 1 and 2. Pass Mdl into estimate with data to estimate the parameters set to NaN. Though Beta is not in the display, if you pass a matrix of predictors ( Xt) into estimate, then estimate estimates Beta. The estimate function infers the number of regression coefficients in Beta from the number of columns in Xt. Tasks such as simulation and forecasting using simulate and forecast do not accept models with at least one NaN for a parameter value. Use dot notation to modify parameter values. 5-31
5
Time Series Regression Models
MA Error Model Without an Intercept This example shows how to specify a regression model with MA errors without a regression intercept. Specify the default regression model with MA(2) errors: yt = Xt β + ut ut = εt + b1εt − 1 + b2εt − 2 . Mdl = regARIMA('MALags',1:2,'Intercept',0) Mdl = regARIMA with properties: Description: SeriesName: Distribution: Intercept: Beta: P: Q: AR: SAR: MA: SMA: Variance:
"ARMA(0,2) Error Model (Gaussian Distribution)" "Y" Name = "Gaussian" 0 [1×0] 0 2 {} {} {NaN NaN} at lags [1 2] {} NaN
The software sets Intercept to 0, but all other parameters in Mdl are NaN values by default. Since Intercept is not a NaN, it is an equality constraint during estimation. In other words, if you pass Mdl and data into estimate, then estimate sets Intercept to 0 during estimation. You can modify the properties of Mdl using dot notation.
MA Error Model with Nonconsecutive Lags This example shows how to specify a regression model with MA errors, where the nonzero MA terms are at nonconsecutive lags. Specify the regression model with MA(12) errors: yt = c + Xt β + ut ut = εt + b1εt − 1 + b12εt − 12 . Mdl = regARIMA('MALags',[1, 12]) Mdl = regARIMA with properties: Description: "ARMA(0,12) Error Model (Gaussian Distribution)" SeriesName: "Y"
5-32
Create Regression Models with MA Errors
Distribution: Intercept: Beta: P: Q: AR: SAR: MA: SMA: Variance:
Name = "Gaussian" NaN [1×0] 0 12 {} {} {NaN NaN} at lags [1 12] {} NaN
The MA coefficients are at lags 1 and 12. Verify that the MA coefficients at lags 2 through 11 are 0. Mdl.MA' ans=12×1 cell array {[NaN]} {[ 0]} {[ 0]} {[ 0]} {[ 0]} {[ 0]} {[ 0]} {[ 0]} {[ 0]} {[ 0]} {[ 0]} {[NaN]}
After applying the transpose, the software displays a 12-by-1 cell array. Each consecutive cell contains the corresponding MA coefficient value. Pass Mdl and data into estimate. The software estimates all parameters that have the value NaN. Then estimate holds b2 = b3 =...= b11 = 0 during estimation.
Known Parameter Values for a Regression Model with MA Errors This example shows how to specify values for all parameters of a regression model with MA errors. Specify the regression model with MA(2) errors: 0.5 yt = Xt −3 + ut 1.2 ut = εt + 0 . 5εt − 1 − 0 . 1εt − 2, where εt is Gaussian with unit variance. Mdl = regARIMA('Intercept',0,'Beta',[0.5; -3; 1.2],... 'MA',{0.5, -0.1},'Variance',1)
5-33
5
Time Series Regression Models
Mdl = regARIMA with properties: Description: SeriesName: Distribution: Intercept: Beta: P: Q: AR: SAR: MA: SMA: Variance:
"Regression with ARMA(0,2) Error Model (Gaussian Distribution)" "Y" Name = "Gaussian" 0 [0.5 -3 1.2] 0 2 {} {} {0.5 -0.1} at lags [1 2] {} 1
The parameters in Mdl do not contain NaN values, and therefore there is no need to estimate Mdl using estimate. However, you can simulate or forecast responses from Mdl using simulate or forecast.
Regression Model with MA Errors and t Innovations This example shows how to set the innovation distribution of a regression model with MA errors to a t distribution. Specify the regression model with MA(2) errors: 0.5 yt = Xt −3 + ut 1.2 ut = εt + 0 . 5εt − 1 − 0 . 1εt − 2, where εt has a t distribution with the default degrees of freedom and unit variance. Mdl = regARIMA('Intercept',0,'Beta',[0.5; -3; 1.2],... 'MA',{0.5, -0.1},'Variance',1,'Distribution','t') Mdl = regARIMA with properties: Description: SeriesName: Distribution: Intercept: Beta: P: Q: AR: SAR: MA: SMA: Variance:
5-34
"Regression with ARMA(0,2) Error Model (t Distribution)" "Y" Name = "t", DoF = NaN 0 [0.5 -3 1.2] 0 2 {} {} {0.5 -0.1} at lags [1 2] {} 1
Create Regression Models with MA Errors
The default degrees of freedom is NaN. If you don't know the degrees of freedom, then you can estimate it by passing Mdl and the data to estimate. Specify a t15 distribution. Mdl.Distribution = struct('Name','t','DoF',15) Mdl = regARIMA with properties: Description: SeriesName: Distribution: Intercept: Beta: P: Q: AR: SAR: MA: SMA: Variance:
"Regression with ARMA(0,2) Error Model (t Distribution)" "Y" Name = "t", DoF = 15 0 [0.5 -3 1.2] 0 2 {} {} {0.5 -0.1} at lags [1 2] {} 1
You can simulate and forecast responses from by passing Mdl to simulate or forecast because Mdl is completely specified. In applications, such as simulation, the software normalizes the random t innovations. In other words, Variance overrides the theoretical variance of the t random variable (which is DoF/(DoF - 2)), but preserves the kurtosis of the distribution.
See Also Apps Econometric Modeler Objects regARIMA Functions estimate | simulate | forecast
Related Examples •
“Analyze Time Series Data Using Econometric Modeler” on page 4-2
•
“Specifying Univariate Lag Operator Polynomials Interactively” on page 4-44
•
“Create Regression Models with ARIMA Errors” on page 5-8
•
“Specify Default Regression Model with ARIMA Errors” on page 5-19
•
“Create Regression Models with AR Errors” on page 5-26
•
“Create Regression Models with ARMA Errors” on page 5-37
•
“Create Regression Models with SARIMA Errors” on page 5-51
•
“Specify ARIMA Error Model Innovation Distribution” on page 5-61 5-35
5
Time Series Regression Models
More About •
5-36
“Regression Models with Time Series Errors” on page 5-5
Create Regression Models with ARMA Errors
Create Regression Models with ARMA Errors In this section... “Default Regression Model with ARMA Errors” on page 5-37 “ARMA Error Model Without an Intercept” on page 5-38 “ARMA Error Model with Nonconsecutive Lags” on page 5-38 “Known Parameter Values for a Regression Model with ARMA Errors” on page 5-39 “Regression Model with ARMA Errors and t Innovations” on page 5-40 “Specify Regression Model with ARMA Errors Using Econometric Modeler App” on page 5-41
Default Regression Model with ARMA Errors This example shows how to apply the shorthand regARIMA(p,D,q) syntax to specify the regression model with ARMA errors. Specify the default regression model with ARMA(3,2) errors: yt = c + Xt β + ut ut = a1ut − 1 + a2ut − 2 + a3ut − 3 + εt + b1εt − 1 + b2εt − 2 . Mdl = regARIMA(3,0,2) Mdl = regARIMA with properties: Description: SeriesName: Distribution: Intercept: Beta: P: Q: AR: SAR: MA: SMA: Variance:
"ARMA(3,2) Error Model (Gaussian Distribution)" "Y" Name = "Gaussian" NaN [1×0] 3 2 {NaN NaN NaN} at lags [1 2 3] {} {NaN NaN} at lags [1 2] {} NaN
The software sets each parameter to NaN, and the innovation distribution to Gaussian. The AR coefficients are at lags 1 through 3, and the MA coefficients are at lags 1 and 2. Pass Mdl into estimate with data to estimate the parameters set to NaN. The regARIMA model sets Beta to [] and does not display it. If you pass a matrix of predictors ( Xt) into estimate, then estimate estimates Beta. The estimate function infers the number of regression coefficients in Beta from the number of columns in Xt. Tasks such as simulation and forecasting using simulate and forecast do not accept models with at least one NaN for a parameter value. Use dot notation to modify parameter values.
5-37
5
Time Series Regression Models
ARMA Error Model Without an Intercept This example shows how to specify a regression model with ARMA errors without a regression intercept. Specify the default regression model with ARMA(3,2) errors: yt = Xt β + ut ut = a1ut − 1 + a2ut − 2 + a3ut − 3 + εt + b1εt − 1 + b2εt − 2 . Mdl = regARIMA('ARLags',1:3,'MALags',1:2,'Intercept',0) Mdl = regARIMA with properties: Description: SeriesName: Distribution: Intercept: Beta: P: Q: AR: SAR: MA: SMA: Variance:
"ARMA(3,2) Error Model (Gaussian Distribution)" "Y" Name = "Gaussian" 0 [1×0] 3 2 {NaN NaN NaN} at lags [1 2 3] {} {NaN NaN} at lags [1 2] {} NaN
The software sets Intercept to 0, but all other parameters in Mdl are NaN values by default. Since Intercept is not a NaN, it is an equality constraint during estimation. In other words, if you pass Mdl and data into estimate, then estimate sets Intercept to 0 during estimation. You can modify the properties of Mdl using dot notation.
ARMA Error Model with Nonconsecutive Lags This example shows how to specify a regression model with ARMA errors, where the nonzero ARMA terms are at nonconsecutive lags. Specify the regression model with ARMA(8,4) errors: yt = c + Xt β + ut ut = a1u1 + a4u4 + a8u8 + εt + b1εt − 1 + b4εt − 4 . Mdl = regARIMA('ARLags',[1,4,8],'MALags',[1,4]) Mdl = regARIMA with properties: Description: "ARMA(8,4) Error Model (Gaussian Distribution)" SeriesName: "Y"
5-38
Create Regression Models with ARMA Errors
Distribution: Intercept: Beta: P: Q: AR: SAR: MA: SMA: Variance:
Name = "Gaussian" NaN [1×0] 8 4 {NaN NaN NaN} at lags [1 4 8] {} {NaN NaN} at lags [1 4] {} NaN
The AR coefficients are at lags 1, 4, and 8, and the MA coefficients are at lags 1 and 4. The software sets the interim lags to 0. Pass Mdl and data into estimate. The software estimates all parameters that have the value NaN. Then estimate holds all interim lag coefficients to 0 during estimation.
Known Parameter Values for a Regression Model with ARMA Errors This example shows how to specify values for all parameters of a regression model with ARMA errors. Specify the regression model with ARMA(3,2) errors: 2.5 + ut −0 . 6 ut = 0 . 7ut − 1 − 0 . 3ut − 2 + 0 . 1ut − 3 + εt + 0 . 5εt − 1 + 0 . 2εt − 2, yt = Xt
where εt is Gaussian with unit variance. Mdl = regARIMA('Intercept',0,'Beta',[2.5; -0.6],... 'AR',{0.7, -0.3, 0.1},'MA',{0.5, 0.2},'Variance',1) Mdl = regARIMA with properties: Description: SeriesName: Distribution: Intercept: Beta: P: Q: AR: SAR: MA: SMA: Variance:
"Regression with ARMA(3,2) Error Model (Gaussian Distribution)" "Y" Name = "Gaussian" 0 [2.5 -0.6] 3 2 {0.7 -0.3 0.1} at lags [1 2 3] {} {0.5 0.2} at lags [1 2] {} 1
The parameters in Mdl do not contain NaN values, and therefore there is no need to estimate Mdl using estimate. However, you can simulate or forecast responses from Mdl using simulate or forecast.
5-39
5
Time Series Regression Models
Regression Model with ARMA Errors and t Innovations This example shows how to set the innovation distribution of a regression model with ARMA errors to a t distribution. Specify the regression model with ARMA(3,2) errors: 2.5 + ut −0 . 6 ut = 0 . 7ut − 1 − 0 . 3ut − 2 + 0 . 1ut − 3 + εt + 0 . 5εt − 1 + 0 . 2εt − 2, yt = Xt
where εt has a t distribution with the default degrees of freedom and unit variance. Mdl = regARIMA('Intercept',0,'Beta',[2.5; -0.6],... 'AR',{0.7, -0.3, 0.1},'MA',{0.5, 0.2},'Variance',1,... 'Distribution','t') Mdl = regARIMA with properties: Description: SeriesName: Distribution: Intercept: Beta: P: Q: AR: SAR: MA: SMA: Variance:
"Regression with ARMA(3,2) Error Model (t Distribution)" "Y" Name = "t", DoF = NaN 0 [2.5 -0.6] 3 2 {0.7 -0.3 0.1} at lags [1 2 3] {} {0.5 0.2} at lags [1 2] {} 1
The default degrees of freedom is NaN. If you don't know the degrees of freedom, then you can estimate it by passing Mdl and the data to estimate. Specify a t5 distribution. Mdl.Distribution = struct('Name','t','DoF',5) Mdl = regARIMA with properties: Description: SeriesName: Distribution: Intercept: Beta: P: Q: AR: SAR: MA: SMA: Variance:
5-40
"Regression with ARMA(3,2) Error Model (t Distribution)" "Y" Name = "t", DoF = 5 0 [2.5 -0.6] 3 2 {0.7 -0.3 0.1} at lags [1 2 3] {} {0.5 0.2} at lags [1 2] {} 1
Create Regression Models with ARMA Errors
You can simulate or forecast responses from Mdl using simulate or forecast because Mdl is completely specified. In applications, such as simulation, the software normalizes the random t innovations. In other words, Variance overrides the theoretical variance of the t random variable (which is DoF/(DoF - 2)), but preserves the kurtosis of the distribution.
Specify Regression Model with ARMA Errors Using Econometric Modeler App In the Econometric Modeler app, you can specify the predictor variables in the regression component, and the error model lag structure and innovation distribution of a regression model with ARMA(p,q) errors, by following these steps. All specified coefficients are unknown but estimable parameters. 1
At the command line, open the Econometric Modeler app. econometricModeler
Alternatively, open the app from the apps gallery (see Econometric Modeler). 2
In the Time Series pane, select the response time series to which the model will be fit.
3
On the Econometric Modeler tab, in the Models section, click the arrow to display the models gallery.
4
In the models gallery, in the Regression Models section, click RegARMA. The RegARMA Model Parameters dialog box appears.
5-41
5
Time Series Regression Models
5
Choose the error model lag structure. To specify a regression model with ARMA(p,q) errors that includes all AR lags from 1 through p and all MA lags from 1 through q, use the Lag Order tab. For the flexibility to specify the inclusion of particular lags, use the Lag Vector tab. For more details, see “Specifying Univariate Lag Operator Polynomials Interactively” on page 4-44. Regardless of the tab you use, you can verify the model form by inspecting the equation in the Model Equation section.
6
In the Predictors section, choose at least one predictor variable by selecting the Include? check box for the time series.
For example, suppose you are working with the Data_USEconModel.mat data set and its variables are listed in the Time Series pane. • To specify a regression model with AR(3) errors for the unemployment rate containing all consecutive AR lags from 1 through its order, Gaussian-distributed innovations, and the predictor variables COE, CPIAUCSL, FEDFUNDS, and GDP: 1
In the Time Series pane, select the UNRATE time series.
2
On the Econometric Modeler tab, in the Models section, click the arrow to display the models gallery.
3
In the models gallery, in the Regression Models section, click RegARMA. .
4
5-42
In the regARMA Model Parameters dialog box, on the Lag Order tab, set Autoregressive Order to 3.
Create Regression Models with ARMA Errors
5
In the Predictors section, select the Include? check box for the COE, CPIAUCSL, FEDFUNDS, and GDP time series.
• To specify a regression model with MA(2) errors for the unemployment rate containing all MA lags from 1 through its order, Gaussian-distributed innovations, and the predictor variables COE and CPIAUCSL. 1
In the Time Series pane, select the UNRATE time series.
2
On the Econometric Modeler tab, in the Models section, click the arrow to display the models gallery.
3
In the models gallery, in the Regression Models section, click RegARMA.
4
In the regARMA Model Parameters dialog box, on the Lag Order tab, set Moving Average Order to 2.
5
In the Predictors section, select the Include? check box for the COE and CPIAUCSL time series.
• To specify the regression model with ARMA(8,4) errors for the unemployment rate containing nonconsecutive lags yt = c + β1COEt + β2CPIAUCSLt + ut 1 − α1L − α4L4 − α8L8 ut = 1 + b1L + b4L4 εt
,
where εt is a series of IID Gaussian innovations: 1
In the Time Series pane, select the UNRATE time series.
2
On the Econometric Modeler tab, in the Models section, click the arrow to display the models gallery.
3
In the models gallery, in the Regression Models section, click RegARMA.
4
In the regARMA Model Parameters dialog box, click the Lag Vector tab:
5
a
In the Autoregressive Lags box, type 1 4 8.
b
In the Moving Average Lags box, type 1 4.
In the Predictors section, select the Include? check box for the COE and CPIAUCSL time series.
5-43
5
Time Series Regression Models
• To specify a regression model with ARMA(3,2) errors for the unemployment rate containing all consecutive AR and MA lags through their respective orders, the predictor variables COE and CPIAUCSL, and t-distributed innovations: 1
In the Time Series pane, select the UNRATE time series.
2
On the Econometric Modeler tab, in the Models section, click the arrow to display the models gallery.
3
In the models gallery, in the Regression Models section, click RegARMA.
4
In the regARMA Model Parameters dialog box, click the Lag Order tab: a
Set Autoregressive Order to 3.
b
Set Moving Average Order to 2.
5
Click the Innovation Distribution button, then select t.
6
In the Predictors section, select the Include? check box for the COE and CPIAUCSL time series.
The degrees of freedom parameter of the t distribution is an unknown but estimable parameter. After you specify a model, click Estimate to estimate all unknown parameters in the model.
See Also Apps Econometric Modeler Objects regARIMA 5-44
Create Regression Models with ARMA Errors
Functions estimate | simulate | forecast
Related Examples •
“Analyze Time Series Data Using Econometric Modeler” on page 4-2
•
“Specifying Univariate Lag Operator Polynomials Interactively” on page 4-44
•
“Create Regression Models with ARIMA Errors” on page 5-8
•
“Specify Default Regression Model with ARIMA Errors” on page 5-19
•
“Create Regression Models with AR Errors” on page 5-26
•
“Create Regression Models with MA Errors” on page 5-31
•
“Create Regression Models with ARIMA Errors” on page 5-46
•
“Create Regression Models with SARIMA Errors” on page 5-51
•
“Specify ARIMA Error Model Innovation Distribution” on page 5-61
More About •
“Regression Models with Time Series Errors” on page 5-5
5-45
5
Time Series Regression Models
Create Regression Models with ARIMA Errors In this section... “Default Regression Model with ARIMA Errors” on page 5-46 “ARIMA Error Model Without an Intercept” on page 5-47 “ARIMA Error Model with Nonconsecutive Lags” on page 5-47 “Known Parameter Values for a Regression Model with ARIMA Errors” on page 5-48 “Regression Model with ARIMA Errors and t Innovations” on page 5-49
Default Regression Model with ARIMA Errors This example shows how to apply the shorthand regARIMA(p,D,q) syntax to specify the regression model with ARIMA errors. Specify the default regression model with ARIMA(3,1,2) errors: yt = c + Xt β + ut 1 − a1L − a2L2 − a3L3 1 − L ut = 1 + b1L + b2L2 εt . Mdl = regARIMA(3,1,2) Mdl = regARIMA with properties: Description: SeriesName: Distribution: Intercept: Beta: P: D: Q: AR: SAR: MA: SMA: Variance:
"ARIMA(3,1,2) Error Model (Gaussian Distribution)" "Y" Name = "Gaussian" NaN [1×0] 4 1 2 {NaN NaN NaN} at lags [1 2 3] {} {NaN NaN} at lags [1 2] {} NaN
The software sets each parameter to NaN, and the innovation distribution to Gaussian. The AR coefficients are at lags 1 through 3, and the MA coefficients are at lags 1 and 2. The property P = p + D = 3 + 1 = 4. Therefore, the software requires at least four presample values to initialize the time series. Pass Mdl into estimate with data to estimate the parameters set to NaN. The regARIMA model sets Beta to [] and does not display it. If you pass a matrix of predictors ( Xt) into estimate, then estimate estimates Beta. The estimate function infers the number of regression coefficients in Beta from the number of columns in Xt. Tasks such as simulation and forecasting using simulate and forecast do not accept models with at least one NaN for a parameter value. Use dot notation to modify parameter values. 5-46
Create Regression Models with ARIMA Errors
Be aware that the regression model intercept (Intercept) is not identifiable in regression models with ARIMA errors. If you want to estimate Mdl, then you must set Intercept to a value using, for example, dot notation. Otherwise, estimate might return a spurious estimate of Intercept.
ARIMA Error Model Without an Intercept This example shows how to specify a regression model with ARIMA errors without a regression intercept. Specify the default regression model with ARIMA(3,1,2) errors: yt = Xt β + ut 1 − a1L − a2L2 − a3L3 1 − L ut = 1 + b1L + b2L2 εt . Mdl = regARIMA('ARLags',1:3,'MALags',1:2,'D',1,'Intercept',0) Mdl = regARIMA with properties: Description: SeriesName: Distribution: Intercept: Beta: P: D: Q: AR: SAR: MA: SMA: Variance:
"ARIMA(3,1,2) Error Model (Gaussian Distribution)" "Y" Name = "Gaussian" 0 [1×0] 4 1 2 {NaN NaN NaN} at lags [1 2 3] {} {NaN NaN} at lags [1 2] {} NaN
The software sets Intercept to 0, but all other parameters in Mdl are NaN values by default. Since Intercept is not a NaN, it is an equality constraint during estimation. In other words, if you pass Mdl and data into estimate, then estimate sets Intercept to 0 during estimation. In general, if you want to use estimate to estimate a regression models with ARIMA errors where D > 0 or s > 0, then you must set Intercept to a value before estimation. You can modify the properties of Mdl using dot notation.
ARIMA Error Model with Nonconsecutive Lags This example shows how to specify a regression model with ARIMA errors, where the nonzero AR and MA terms are at nonconsecutive lags. Specify the regression model with ARIMA(8,1,4) errors: 5-47
5
Time Series Regression Models
yt = Xt β + ut (1 − a1L − a4L4 − a8L8)(1 − L)ut = (1 + b1L + b4L4)εt . Mdl = regARIMA('ARLags',[1,4,8],'D',1,'MALags',[1,4],... 'Intercept',0) Mdl = regARIMA with properties: Description: SeriesName: Distribution: Intercept: Beta: P: D: Q: AR: SAR: MA: SMA: Variance:
"ARIMA(8,1,4) Error Model (Gaussian Distribution)" "Y" Name = "Gaussian" 0 [1×0] 9 1 4 {NaN NaN NaN} at lags [1 4 8] {} {NaN NaN} at lags [1 4] {} NaN
The AR coefficients are at lags 1, 4, and 8, and the MA coefficients are at lags 1 and 4. The software sets the interim lags to 0. Pass Mdl and data into estimate. The software estimates all parameters that have the value NaN. Then estimate holds all interim lag coefficients to 0 during estimation.
Known Parameter Values for a Regression Model with ARIMA Errors This example shows how to specify values for all parameters of a regression model with ARIMA errors. Specify the regression model with ARIMA(3,1,2) errors: yt = Xt
2.5 + ut −0 . 6
1 − 0 . 7L + 0 . 3L2 − 0 . 1L3 1 − L ut = 1 + 0 . 5L + 0 . 2L2 εt, where εt is Gaussian with unit variance. Mdl = regARIMA('Intercept',0,'Beta',[2.5; -0.6],... 'AR',{0.7, -0.3, 0.1},'MA',{0.5, 0.2},... 'Variance',1,'D',1) Mdl = regARIMA with properties: Description: SeriesName: Distribution: Intercept:
5-48
"Regression with ARIMA(3,1,2) Error Model (Gaussian Distribution)" "Y" Name = "Gaussian" 0
Create Regression Models with ARIMA Errors
Beta: P: D: Q: AR: SAR: MA: SMA: Variance:
[2.5 -0.6] 4 1 2 {0.7 -0.3 0.1} at lags [1 2 3] {} {0.5 0.2} at lags [1 2] {} 1
The parameters in Mdl do not contain NaN values, and therefore there is no need to estimate it. However, you can simulate or forecast responses by passing Mdl to simulate or forecast.
Regression Model with ARIMA Errors and t Innovations This example shows how to set the innovation distribution of a regression model with ARIMA errors to a t distribution. Specify the regression model with ARIMA(3,1,2) errors: yt = Xt
2.5 + ut −0 . 6
1 − 0 . 7L + 0 . 3L2 − 0 . 1L3 1 − L ut = 1 + 0 . 5L + 0 . 2L2 εt, where εt has a t distribution with the default degrees of freedom and unit variance. Mdl = regARIMA('Intercept',0,'Beta',[2.5; -0.6],... 'AR',{0.7, -0.3, 0.1},'MA',{0.5, 0.2},'Variance',1,... 'Distribution','t','D',1) Mdl = regARIMA with properties: Description: SeriesName: Distribution: Intercept: Beta: P: D: Q: AR: SAR: MA: SMA: Variance:
"Regression with ARIMA(3,1,2) Error Model (t Distribution)" "Y" Name = "t", DoF = NaN 0 [2.5 -0.6] 4 1 2 {0.7 -0.3 0.1} at lags [1 2 3] {} {0.5 0.2} at lags [1 2] {} 1
The default degrees of freedom is NaN. If you don't know the degrees of freedom, then you can estimate it by passing Mdl and the data to estimate. Specify a t10 distribution. Mdl.Distribution = struct('Name','t','DoF',10)
5-49
5
Time Series Regression Models
Mdl = regARIMA with properties: Description: SeriesName: Distribution: Intercept: Beta: P: D: Q: AR: SAR: MA: SMA: Variance:
"Regression with ARIMA(3,1,2) Error Model (t Distribution)" "Y" Name = "t", DoF = 10 0 [2.5 -0.6] 4 1 2 {0.7 -0.3 0.1} at lags [1 2 3] {} {0.5 0.2} at lags [1 2] {} 1
You can simulate or forecast responses by passing Mdl to simulate or forecast because Mdl is completely specified. In applications, such as simulation, the software normalizes the random t innovations. In other words, Variance overrides the theoretical variance of the t random variable (which is DoF/(DoF - 2)), but preserves the kurtosis of the distribution.
See Also regARIMA | estimate | simulate | forecast
Related Examples •
“Create Regression Models with ARIMA Errors” on page 5-8
•
“Specify Default Regression Model with ARIMA Errors” on page 5-19
•
“Create Regression Models with AR Errors” on page 5-26
•
“Create Regression Models with MA Errors” on page 5-31
•
“Create Regression Models with ARMA Errors” on page 5-37
•
“Create Regression Models with SARIMA Errors” on page 5-51
•
“Specify ARIMA Error Model Innovation Distribution” on page 5-61
More About •
5-50
“Regression Models with Time Series Errors” on page 5-5
Create Regression Models with SARIMA Errors
Create Regression Models with SARIMA Errors In this section... “SARMA Error Model Without an Intercept” on page 5-51 “Known Parameter Values for a Regression Model with SARIMA Errors” on page 5-52 “Regression Model with SARIMA Errors and t Innovations” on page 5-52
SARMA Error Model Without an Intercept This example shows how to specify a regression model with SARMA errors without a regression intercept. Specify the default regression model with SARMA(1, 1) × (2, 1, 1)4 errors: yt = Xt β + ut 4
1 − a1L 1 − A4L − A8L8 1 − L4 ut = 1 + b1L 1 + B4L4 εt . Mdl = regARIMA('ARLags',1,'SARLags',[4, 8],... 'Seasonality',4,'MALags',1,'SMALags',4,'Intercept',0) Mdl = regARIMA with properties: Description: SeriesName: Distribution: Intercept: Beta: P: Q: AR: SAR: MA: SMA: Seasonality: Variance:
"ARMA(1,1) Error Model Seasonally Integrated with Seasonal AR(8) and MA(4) (Gau "Y" Name = "Gaussian" 0 [1×0] 13 5 {NaN} at lag [1] {NaN NaN} at lags [4 8] {NaN} at lag [1] {NaN} at lag [4] 4 NaN
The name-value pair argument: • 'ARLags',1 specifies which lags have nonzero coefficients in the nonseasonal autoregressive polynomial, so a(L) = (1 − a1L). • 'SARLags',[4 8] specifies which lags have nonzero coefficients in the seasonal autoregressive polynomial, so A(L) = (1 − A4L4 − A8L8). • 'MALags',1 specifies which lags have nonzero coefficients in the nonseasonal moving average polynomial, so b(L) = (1 + b1L). • 'SMALags',4 specifies which lags have nonzero coefficients in the seasonal moving average polynomial, so B(L) = (1 + B4L4). • 'Seasonality',4 specifies the degree of seasonal integration and corresponds to (1 − L4). 5-51
5
Time Series Regression Models
The software sets Intercept to 0, but all other parameters in Mdl are NaN values by default. Property P = p + D + ps + s = 1 + 0 + 8 + 4 = 13, and property Q = q + qs = 1 + 4 = 5. Therefore, the software requires at least 13 presample observation to initialize Mdl. Since Intercept is not a NaN, it is an equality constraint during estimation. In other words, if you pass Mdl and data into estimate, then estimate sets Intercept to 0 during estimation. You can modify the properties of Mdl using dot notation. Be aware that the regression model intercept (Intercept) is not identifiable in regression models with ARIMA errors. If you want to estimate Mdl, then you must set Intercept to a value using, for example, dot notation. Otherwise, estimate might return a spurious estimate of Intercept.
Known Parameter Values for a Regression Model with SARIMA Errors This example shows how to specify values for all parameters of a regression model with SARIMA errors. Specify the regression model with SARIMA(1, 1, 1) × (1, 1, 0)12 errors: yt = Xt β + ut 1 − 0 . 2L 1 − L 1 − 0 . 25L12 − 0 . 1L24 1 − L12 ut = 1 + 0 . 15L εt, where εt is Gaussian with unit variance. Mdl = regARIMA('AR',0.2,'SAR',{0.25, 0.1},'SARLags',[12 24],... 'D',1,'Seasonality',12,'MA',0.15,'Intercept',0,'Variance',1) Mdl = regARIMA with properties: Description: SeriesName: Distribution: Intercept: Beta: P: D: Q: AR: SAR: MA: SMA: Seasonality: Variance:
"ARIMA(1,1,1) Error Model Seasonally Integrated with Seasonal AR(24) (Gaussian "Y" Name = "Gaussian" 0 [1×0] 38 1 1 {0.2} at lag [1] {0.25 0.1} at lags [12 24] {0.15} at lag [1] {} 12 1
The parameters in Mdl do not contain NaN values, and therefore there is no need to estimate Mdl. However, you can simulate or forecast responses by passing Mdl to simulate or forecast.
Regression Model with SARIMA Errors and t Innovations
5-52
Create Regression Models with SARIMA Errors
This example shows how to set the innovation distribution of a regression model with SARIMA errors to a t distribution. Specify the regression model with SARIMA(1, 1, 1) × (1, 1, 0)12 errors: yt = Xt β + ut 1 − 0 . 2L 1 − L 1 − 0 . 25L12 − 0 . 1L24 1 − L12 ut = 1 + 0 . 15L εt, where εt has a t distribution with the default degrees of freedom and unit variance. Mdl = regARIMA('AR',0.2,'SAR',{0.25, 0.1},'SARLags',[12 24],... 'D',1,'Seasonality',12,'MA',0.15,'Intercept',0,... 'Variance',1,'Distribution','t') Mdl = regARIMA with properties: Description: SeriesName: Distribution: Intercept: Beta: P: D: Q: AR: SAR: MA: SMA: Seasonality: Variance:
"ARIMA(1,1,1) Error Model Seasonally Integrated with Seasonal AR(24) (t Distrib "Y" Name = "t", DoF = NaN 0 [1×0] 38 1 1 {0.2} at lag [1] {0.25 0.1} at lags [12 24] {0.15} at lag [1] {} 12 1
The default degrees of freedom is NaN. If you don't know the degrees of freedom, then you can estimate it by passing Mdl and the data to estimate. Specify a t10 distribution. Mdl.Distribution = struct('Name','t','DoF',10) Mdl = regARIMA with properties: Description: SeriesName: Distribution: Intercept: Beta: P: D: Q: AR: SAR: MA: SMA: Seasonality: Variance:
"ARIMA(1,1,1) Error Model Seasonally Integrated with Seasonal AR(24) (t Distrib "Y" Name = "t", DoF = 10 0 [1×0] 38 1 1 {0.2} at lag [1] {0.25 0.1} at lags [12 24] {0.15} at lag [1] {} 12 1
5-53
5
Time Series Regression Models
You can simulate or forecast responses by passing Mdl to simulate or forecast because Mdl is completely specified. In applications, such as simulation, the software normalizes the random t innovations. In other words, Variance overrides the theoretical variance of the t random variable (which is DoF/(DoF - 2)), but preserves the kurtosis of the distribution.
See Also regARIMA | estimate | simulate | forecast
Related Examples •
“Create Regression Models with ARIMA Errors” on page 5-8
•
“Specify Default Regression Model with ARIMA Errors” on page 5-19
•
“Create Regression Models with AR Errors” on page 5-26
•
“Create Regression Models with MA Errors” on page 5-31
•
“Create Regression Models with ARMA Errors” on page 5-37
•
“Specify Regression Model with SARIMA Errors” on page 5-55
•
“Specify ARIMA Error Model Innovation Distribution” on page 5-61
More About •
5-54
“Regression Models with Time Series Errors” on page 5-5
Specify Regression Model with SARIMA Errors
Specify Regression Model with SARIMA Errors This example shows how to specify a regression model with multiplicative seasonal ARIMA errors. Load the Airline data set from the MATLAB® root folder, and load the recession data set. Plot the monthly passenger totals and log-totals. load Data_Airline.mat load Data_Recessions y = DataTimeTable.PSSG; logY = log(y); figure tiledlayout(2,1) nexttile plot(DataTimeTable.Time,y) title('{\bf Monthly Passenger Totals (Jan1949 - Dec1960)}') nexttile plot(DataTimeTable.Time,log(y)) title('{\bf Monthly Passenger Log-Totals (Jan1949 - Dec1960)}')
The log transformation seems to linearize the time series. Construct this predictor, which is whether the country was in a recession during the sampled period. 0 means the country was not in a recession, and 1 means that it was in a recession. 5-55
5
Time Series Regression Models
X = zeros(numel(dates),1); % Preallocation for j = 1:size(Recessions,1) X(dates >= Recessions(j,1) & dates 0 or s > 0, and you want to estimate the intercept, c, then c is not identifiable. You can show that this is true. • Consider “Equation 5-8”. Solve for ut in the second equation and substitute it into the first. 5-109
5
Time Series Regression Models
yt = c + Xt β + Η−1(L)Ν(L)εt, where • Η(L) = a(L)(1 − L)D A(L)(1 − Ls) . • Ν(L) = b(L)B(L) . • The likelihood function is based on the distribution of εt. Solve for εt. εt = Ν−1(L)Η(L)yt + Ν−1(L)Η(L)c + Ν−1(L)Η(L)Xt β . • Note that Ljc = c. The constant term contributes to the likelihood as follows. 1
Ν−1(L)Η(L)c = Ν−1(L)a(L)A(L)( (1 − Ls)c 1
= Ν−1(L)a(L)A(L)( (c − c) =0 or 1
Ν−1(L)Η(L)c = Ν−1(L)a(L)A(L)(1 − Ls)( c 1
= Ν−1(L)a(L)A(L)(1 − Ls)( (1 − L)c 1
= Ν−1(L)a(L)A(L)(1 − Ls)( (c − c) = 0. Therefore, when the ARIMA error model is integrated, the likelihood objective function based on the distribution of εt is invariant to the value of c. In general, the effective constant in the equivalent ARIMAX representation of a regression model with ARIMA errors is a function of the compound autoregressive coefficients and the original intercept c, and incorporates a nonlinear constraint. This constraint is seamlessly incorporated for applications such as Monte Carlo simulation of integrated models with nonzero intercepts. However, for estimation, the ARIMAX model is unable to identify the constant in the presence of an integrated polynomial, and this results in spurious or unusual parameter estimates. You should exclude an intercept from integrated models in most applications.
Intercept Identifiability Illustration As an illustration, consider the regression model with ARIMA(2,1,1) errors without predictors yt = 0.5 + ut 1 − 0.8L + 0.4L2 (1 − L)ut = (1 + 0.3L)εt,
(5-9)
or yt = 0.5 + ut 1 − 1.8L + 1.2L2 − 0.4L3 ut = (1 + 0.3L)εt . You can rewrite “Equation 5-10” using substitution and some manipulation 5-110
(5-10)
Intercept Identifiability in Regression Models with ARIMA Errors
yt = 1 − 1.8 + 1.2 − 0.4 0.5 + 1.8yt − 1 − 1.2yt − 2 + 0.4yt − 3 + εt + 0.3εt − 1 . Note that 1 − 1.8 + 1.2 − 0.4 0.5 = 0(0.5) = 0. Therefore, the regression model with ARIMA(2,1,1) errors in “Equation 5-10” has an ARIMA(2,1,1) model representation yt = 1.8yt − 1 − 1.2yt − 2 + 0.4yt − 3 + εt + 0.3εt − 1 . You can see that the constant is not present in the model (which implies its value is 0), even though the value of the regression model with ARIMA errors intercept is 0.5. You can also simulate this behavior. Start by specifying the regression model with ARIMA(2,1,1) errors in “Equation 5-10”. Mdl0 = regARIMA('D',1,'AR',{0.8 -0.4},'MA',0.3,... 'Intercept',0.5,'Variance', 0.2);
Simulate 1000 observations. rng(1); T = 1000; y = simulate(Mdl0, T);
Fit Mdl to the data. Mdl = regARIMA('ARLags',1:2,'MALags',1,'D',1);... % "Empty" model to pass into estimate [EstMdl,EstParamCov] = estimate(Mdl,y,'Display','params');
Warning: When ARIMA error model is integrated, the intercept is unidentifiable and cannot be esti ARIMA(2,1,1) Error Model (Gaussian Distribution):
Intercept AR{1} AR{2} MA{1} Variance
Value ________
StandardError _____________
TStatistic __________
PValue ___________
NaN 0.89647 -0.45102 0.18804 0.19789
NaN 0.048507 0.038916 0.054505 0.0083512
NaN 18.481 -11.59 3.45 23.696
NaN 2.9207e-76 4.6573e-31 0.00056069 3.9373e-124
estimate displays a warning to inform you that the intercept is not identifiable, and sets its estimate, standard error, and t-statistic to NaN. Plot the profile likelihood for the intercept. c = linspace(Mdl0.Intercept - 50,... Mdl0.Intercept + 50,100); % Grid of intercepts logL = nan(numel(c),1); % For preallocation for i = 1:numel(logL) EstMdl.Intercept = c(i); [~,~,~,logL(i)] = infer(EstMdl,y);
5-111
5
Time Series Regression Models
end figure plot(c,logL) title('Profile Log-Likelihood with Respect to the Intercept') xlabel('Intercept') ylabel('Loglikelihood')
The loglikelihood does not change over the grid of intercept values. The slight oscillation is a result of the numerical routine used by infer.
See Also Related Examples •
5-112
“Estimate Regression Model with ARIMA Errors” on page 5-88
Alternative ARIMA Model Representations
Alternative ARIMA Model Representations In this section... “Mathematical Development of regARIMA to ARIMAX Model Conversion” on page 5-113 “Show Conversion in MATLAB®” on page 5-115
Mathematical Development of regARIMA to ARIMAX Model Conversion ARIMAX models and regression models with ARIMA errors are closely related, and the choice of which to use is generally dictated by your goals for the analysis. If your objective is to fit a parsimonious model to data and forecast responses, then there is very little difference between the two models. If you are more interested in preserving the usual interpretation of a regression coefficient as a measure of sensitivity, i.e., the effect of a unit change in a predictor variable on the response, then use a regression model with ARIMA errors. Regression coefficients in ARIMAX models do not possess that interpretation because of the dynamic dependence on the response [1]. Suppose that you have the parameter estimates from a regression model with ARIMA errors, and you want to see how the model structure compares to ARIMAX model. Or, suppose you want some insight as to the underlying relationship between the two models. The ARIMAX model is (t = 1,...,T): Η(L)yt = c + Xt β + Ν(L)εt,
(5-11)
where • yt is the univariate response series. • Xt is row t of X, which is the matrix of concatenated predictor series. That is, Xt is observation t of each predictor series. • β is the regression coefficient. • c is the regression model intercept. • Η(L) = ϕ(L)(1 − L)DΦ(L)(1 − Ls) = 1 − η L − η L2 − ... − η LP, which is the degree P lag operator 1 2 P polynomial that captures the combined effect of the seasonal and nonseasonal autoregressive polynomials, and the seasonal and nonseasonal integration polynomials. For more details on notation, see “What Are Multiplicative ARIMA Models?” on page 7-49. • Ν(L) = θ(L)Θ(L) = 1 + ν L + ν L2 + ... + ν LQ, which is the degree Q lag operator polynomial that 1 2 Q captures the combined effect of the seasonal and nonseasonal moving average polynomials. • εt is a white noise innovation process. The regression model with ARIMA errors is (t = 1,...,T) yt = c + Xt β + ut A(L)ut = B(L)εt,
(5-12)
where 5-113
5
Time Series Regression Models
• ut is the unconditional disturbances process. •
D
A(L) = ϕ(L)(1 − L) Φ(L)(1 − Ls) = 1 − a1L − a2L2 − ... − aPLP, which is the degree P lag operator polynomial that captures the combined effect of the seasonal and nonseasonal autoregressive polynomials, and the seasonal and nonseasonal integration polynomials.
• B(L) = θ(L)Θ(L) = 1 + b L + b L2 + ... + b LQ, which is the degree Q lag operator polynomial that 1 2 Q captures the combined effect of the seasonal and nonseasonal moving average polynomials. The values of the variables defined in “Equation 5-12” are not necessarily equivalent to the values of the variables in “Equation 5-11”, even though the notation might be similar. Consider “Equation 5-12”, the regression model with ARIMA errors. Use the following operations to convert the regression model with ARIMA errors to its corresponding ARIMAX model. 1
Solve for ut. yt = c + Xt β + ut ut =
2
B(L) ε . A(L) t
Substitute ut into the regression equation. yt = c + Xt β +
B(L) ε A(L) t
A(L)yt = A(L)c + A(L)Xt β + B(L)εt . 3
Solve for yt. yt = A(L)c + A(L)Xt β + = A(L)c + ZtΓ +
P
k=1
P
∑
∑
k=1
ak yt − k + B(L)εt (5-13)
ak yt − k + B(L)εt .
In “Equation 5-13”, • A(L)c = (1 – a1 – a2 –...– aP)c. That is, the constant in the ARIMAX model is the intercept in the regression model with ARIMA errors with a nonlinear constraint. Though applications, such as simulate, handle this constraint, estimate cannot incorporate such a constraint. In the latter case, the models are equivalent when you fix the intercept and constant to 0. • In the term A(L)Xtβ, the lag operator polynomial A(L) filters the T-by-1 vector Xtβ, which is the linear combination of the predictors weighted by the regression coefficients. This filtering process requires P presample observations of the predictor series. • arima constructs the matrix Zt as follows: • Each column of Zt corresponds to each term in A(L). • The first column of Zt is the vector Xtβ. • The second column of Zt is a sequence of d2 NaNs (d2 is the degree of the second term in d
A(L)), followed by the product L j Xt β. That is, the software attaches d2 NaNs at the beginning of the T-by-1 column, attaches Xtβ after the NaNs, but truncates the end of that product by d2 observations. 5-114
Alternative ARIMA Model Representations
• The jth column of Zt is a sequence of dj NaNs (dj is the degree of the jth term in A(L)), d
followed by the product L j Xt β. That is, the software attaches dj NaNs at the beginning of the T-by-1 column, attaches Xtβ after the NaNs, but truncates the end of that product by dj observations. . • Γ = [1 –a1 –a2 ... –aP]'. The arima converter removes all zero-valued autoregressive coefficients of the difference equation. Subsequently, the arima converter does not associate zero-valued autoregressive coefficients with columns in Zt, nor does it include corresponding, zero-valued coefficients in Γ. 4
Rewrite “Equation 5-13”, yt = (1 −
P
∑
k=1
ak)c + Xt β −
P
∑
k=1
ak Xt − kβ +
P
∑
k=1
ak yt − k + εt +
Q
∑
k=1
εt − k .
For example, consider the following regression model whose errors are ARMA(2,1): yt = 0.2 + 0.5Xt + ut 1 − 0.8L + 0.4L2 ut = 1 + 0.3L εt .
(5-14)
The equivalent ARMAX model is: yt = 0.12 + 0.5 − 0.4L + 0.2L2 Xt + 0.8yt − 1 − 0.4yt − 2 + (1 + 0.3L)εt = 0.12 + ZtΓ + 0.8yt − 1 − 0.4yt − 2 + (1 + 0.3L)εt, or (1 − 0.8L + 0.4L2)yt = 0.12 + ZtΓ + (1 + 0.3L)εt, where Γ = [1 –0.8 0.4]' and x1 NaN NaN x2
x1
NaN
Zt = 0.5 x3
x2
x1
.
⋮ ⋮ ⋮ xT xT − 1 xT − 2 This model is not integrated because all of the eigenvalues associated with the AR polynomial are within the unit circle, but the predictors might affect the otherwise stable process. Also, you need presample predictor data going back at least 2 periods to, for example, fit the model to data.
Show Conversion in MATLAB® Illustrate the conversion in MATLAB® by model simulation and estimation. Specify the regression model with ARIMA errors in “Equation 5-14”. 5-115
5
Time Series Regression Models
MdlregARIMA0 = regARIMA('Intercept',0.2,'AR',{0.8 -0.4}, ... 'MA',0.3,'Beta',[0.3 -0.2],'Variance',0.2);
Generate presample observations and predictor data. rng(1); % For reproducibility T = 100; maxPQ = max(MdlregARIMA0.P,MdlregARIMA0.Q); numObs = T + maxPQ; % Adjust number of observations to account for presample XregARIMA = randn(numObs,2); % Simulate predictor data u0 = randn(maxPQ,1); % Presample unconditional disturbances u(t) e0 = randn(maxPQ,1); % Presample innovations e(t)
Simulate data from the regression model with ARIMA errors MdlregARIMA0. rng(100) % For consistent seed with later call [y1,e1,u1] = simulate(MdlregARIMA0,T,'U0',u0, ... 'E0',e0,'X',XregARIMA);
Convert the regression model with ARIMA errors to an ARIMAX model. [MdlARIMAX0,XARIMAX] = arima(MdlregARIMA0,'X',XregARIMA); MdlARIMAX0 MdlARIMAX0 = arima with properties: Description: SeriesName: Distribution: P: D: Q: Constant: AR: SAR: MA: SMA: Seasonality: Beta: Variance:
"ARIMAX(2,0,1) Model (Gaussian Distribution)" "Y" Name = "Gaussian" 2 0 1 0.12 {0.8 -0.4} at lags [1 2] {} {0.3} at lag [1] {} 0 [1 -0.8 0.4] 0.2
Generate presample responses for the ARIMAX model to ensure consistency with the regression model with ARIMA errors. Simulate data from the ARIMAX model. y0 = MdlregARIMA0.Intercept + XregARIMA(1:maxPQ,:)*MdlregARIMA0.Beta' + u0; rng(100) % For consistent seed with earlier call y2 = simulate(MdlARIMAX0,T,'Y0',y0,'E0',e0,'X',XARIMAX); figure plot(y1,'LineWidth',3) hold on plot(y2,'r:','LineWidth',2.5) hold off title("\bf Simulated Paths") legend("regARIMA Model","ARIMAX Model",'Location','best')
5-116
Alternative ARIMA Model Representations
The simulated paths are equal because the arima converter enforces the nonlinear constraint when it converts the regression model intercept to the ARIMAX model constant. Fit a regression model with ARIMA errors to the simulated data. MdlregARIMA0 = regARIMA('ARLags',[1 2],'MALags',1); EstMdlregARIMA = estimate(MdlregARIMA0,y1,'E0',e0,'U0',u0,'X',XregARIMA); Regression with ARMA(2,1) Error Model (Gaussian Distribution): Value ________ Intercept AR{1} AR{2} MA{1} Beta(1) Beta(2) Variance
0.14074 0.83061 -0.45402 0.42803 0.29552 -0.17601 0.18231
StandardError _____________ 0.1014 0.1375 0.1164 0.15145 0.022938 0.030607 0.027765
TStatistic __________ 1.3879 6.0407 -3.9007 2.8262 12.883 -5.7506 6.5663
PValue __________ 0.16518 1.5349e-09 9.5927e-05 0.0047109 5.597e-38 8.8941e-09 5.1569e-11
Fit an ARIMAX model to the simulated data. MdlARIMAX = arima('ARLags',[1 2],'MALags',1); EstMdlARIMAX = estimate(MdlARIMAX,y2,'E0',e0,'Y0',... y0,'X',XARIMAX);
5-117
5
Time Series Regression Models
ARIMAX(2,0,1) Model (Gaussian Distribution): Value ________ Constant AR{1} AR{2} MA{1} Beta(1) Beta(2) Beta(3) Variance
0.084996 0.83136 -0.45599 0.426 1.053 -0.6904 0.45399 0.18112
StandardError _____________ 0.064217 0.13634 0.11788 0.15753 0.13685 0.19262 0.15352 0.028836
TStatistic __________ 1.3236 6.0975 -3.8683 2.7043 7.6949 -3.5843 2.9572 6.281
PValue __________ 0.18564 1.0775e-09 0.0001096 0.0068446 1.4166e-14 0.00033796 0.0031047 3.3634e-10
Convert the estimated regression model with ARIMA errors EstMdlregARIMA to an ARIMAX model. ConvertedMdlARIMAX = arima(EstMdlregARIMA,'X',XregARIMA) ConvertedMdlARIMAX = arima with properties: Description: SeriesName: Distribution: P: D: Q: Constant: AR: SAR: MA: SMA: Seasonality: Beta: Variance:
"ARIMAX(2,0,1) Model (Gaussian Distribution)" "Y" Name = "Gaussian" 2 0 1 0.087737 {0.830611 -0.454025} at lags [1 2] {} {0.428031} at lag [1] {} 0 [1 -0.830611 0.454025] 0.182313
The estimated ARIMAX model constant is not equal to the ARIMAX model constant converted from the regression model with ARIMA errors. In other words, EstMdlARIMAX.Constant is 0.084996 and ConvertedMdlARIMAX.Constant = 0.087737. The reason for the discrepancy is estimate does not enforce the nonlinear constraint that the arima converter enforces. As a result, the other estimates are close, but not equal.
References [1] Hyndman, R. J. (2010, October). "The ARIMAX Model Muddle." Rob J. Hyndman. Retrieved May 4, 2017 from https://robjhyndman.com/hyndsight/arimax/.
See Also estimate | estimate | arima
Related Examples • 5-118
“Estimate Regression Model with ARIMA Errors” on page 5-88
Simulate Regression Models with ARMA Errors
Simulate Regression Models with ARMA Errors In this section... “Simulate an AR Error Model” on page 5-119 “Simulate an MA Error Model” on page 5-125 “Simulate an ARMA Error Model” on page 5-131
Simulate an AR Error Model This example shows how to simulate sample paths from a regression model with AR errors without specifying presample disturbances. Specify the regression model with AR(2) errors: −2 + ut 1.5 ut = 0 . 75ut − 1 − 0 . 5ut − 2 + εt, yt = 2 + Xt
where εt is Gaussian with mean 0 and variance 1. Beta = [-2; 1.5]; Intercept = 2; a1 = 0.75; a2 = -0.5; Variance = 1; Mdl = regARIMA('AR',{a1, a2},'Intercept',Intercept,... 'Beta',Beta,'Variance',Variance);
Generate two length T = 50 predictor series by random selection from the standard Gaussian distribution. T = 50; rng(1); % For reproducibility X = randn(T,2);
The software treats the predictors as nonstochastic series. Generate and plot one sample path of responses from Mdl. rng(2); ySim = simulate(Mdl,T,'X',X); figure plot(ySim) title('{\bf Simulated Response Series}')
5-119
5
Time Series Regression Models
simulate requires P = 2 presample unconditional disturbances (ut) to initialize the error series. Without them, as in this case, simulate sets the necessary presample unconditional disturbances to 0. Alternatively, filter a random innovation series through Mdl using filter. rng(2); e = randn(T,1); yFilter = filter(Mdl,e,'X',X); figure plot(yFilter) title('{\bf Simulated Response Series Using Filtered Innovations}')
5-120
Simulate Regression Models with ARMA Errors
The plots suggest that the simulated responses and the responses generated from the filtered innovations are equivalent. Simulate 1000 response paths from Mdl. Assess transient effects by plotting the unconditional disturbance (U) variances across the simulated paths at each period. numPaths = 1000; [Y,~,U] = simulate(Mdl,T,'NumPaths',numPaths,'X',X); figure h1 = plot(Y,'Color',[.85,.85,.85]); title('{\bf 1000 Simulated Response Paths}') hold on h2 = plot(1:T,Intercept+X*Beta,'k--','LineWidth',2); legend([h1(1),h2],'Simulated Path','Mean') hold off
5-121
5
Time Series Regression Models
figure h1 = plot(var(U,0,2),'r','LineWidth',2); hold on theoVarFix = ((1-a2)*Variance)/((1+a2)*((1-a2)^2-a1^2)); h2 = plot([1 T],[theoVarFix theoVarFix],'k--','LineWidth',2); title('{\bf Unconditional Disturbance Variance}') legend([h1,h2],'Simulation Variance','Theoretical Variance') hold off
5-122
Simulate Regression Models with ARMA Errors
The simulated response paths follow their theoretical mean, c + Xβ, which is not constant over time (and might look nonstationary). The variance of the process is not constant, but levels off at the theoretical variance by the 10th period. The theoretical variance of the AR(2) error model is 1 − a2 σε2 1 + a2
1 − a2
2
− a12
=
(1 + 0 . 5) 2
2
(1 − 0 . 5) (1 + 0 . 5) − 0 . 75
= 1 . 78
You can reduce transient effects is by partitioning the simulated data into a burn-in portion and a portion for analysis. Do not use the burn-in portion for analysis. Include enough periods in the burn-in portion to overcome the transient effects. burnIn = 1:10; notBurnIn = burnIn(end)+1:T; Y = Y(notBurnIn,:); X = X(notBurnIn,:); U = U(notBurnIn,:); figure h1 = plot(notBurnIn,Y,'Color',[.85,.85,.85]); hold on h2 = plot(notBurnIn,Intercept+X*Beta,'k--','LineWidth',2); title('{\bf 1000 Simulated Response Paths for Analysis}') legend([h1(1),h2],'Simulated Path','Mean') hold off
5-123
5
Time Series Regression Models
figure h1 = plot(notBurnIn,var(U,0,2),'r','LineWidth',2); hold on h2 = plot([notBurnIn(1) notBurnIn(end)],... [theoVarFix theoVarFix],'k--','LineWidth',2); title('{\bf Converged Unconditional Disturbance Variance}') legend([h1,h2],'Simulation Variance','Theoretical Variance') hold off
5-124
Simulate Regression Models with ARMA Errors
Unconditional disturbance simulation variances fluctuate around the theoretical variance due to Monte Carlo sampling error. Be aware that the exclusion of the burn-in sample from analysis reduces the effective sample size.
Simulate an MA Error Model This example shows how to simulate responses from a regression model with MA errors without specifying a presample. Specify the regression model with MA(8) errors: −2 + ut 1.5 ut = εt + 0 . 4εt − 1 − 0 . 3εt − 4 + 0 . 2εt − 8, yt = 2 + Xt
where εt is Gaussian with mean 0 and variance 0.5. Beta = [-2; 1.5]; Intercept = 2; b1 = 0.4; b4 = -0.3; b8 = 0.2; Variance = 0.5;
5-125
5
Time Series Regression Models
Mdl = regARIMA('MA',{b1, b4, b8},'MALags',[1 4 8],... 'Intercept',Intercept,'Beta',Beta,'Variance',Variance);
Generate two length T = 100 predictor series by random selection from the standard Gaussian distribution. T = 100; rng(4); % For reproducibility X = randn(T,2);
The software treats the predictors as nonstochastic series. Generate and plot one sample path of responses from Mdl. rng(5); ySim = simulate(Mdl,T,'X',X); figure plot(ySim) title('{\bf Simulated Response Series}')
simulate requires Q = 8 presample innovations (εt) to initialize the error series. Without them, as in this case, simulate sets the necessary presample innovations to 0. Alternatively, use filter to filter a random innovation series through Mdl. rng(5); e = randn(T,1);
5-126
Simulate Regression Models with ARMA Errors
yFilter = filter(Mdl,e,'X',X); figure plot(yFilter) title('{\bf Simulated Response Series Using Filtered Innovations}')
The plots suggest that the simulated responses and the responses generated from the filtered innovations are equivalent. Simulate 1000 response paths from Mdl. Assess transient effects by plotting the unconditional disturbance (U) variances across the simulated paths at each period. numPaths = 1000; [Y,~,U] = simulate(Mdl,T,'NumPaths',numPaths,'X',X); figure h1 = plot(Y,'Color',[.85,.85,.85]); title('{\bf 1000 Simulated Response Paths}') hold on h2 = plot(1:T,Intercept+X*Beta,'k--','LineWidth',2); legend([h1(1),h2],'Simulated Path','Mean') hold off
5-127
5
Time Series Regression Models
figure h1 = plot(var(U,0,2),'r','LineWidth',2); hold on theoVarFix = (1+b1^2+b4^2+b8^2)*Variance; h2 = plot([1 T],[theoVarFix theoVarFix],'k--','LineWidth',2); title('{\bf Unconditional Disturbance Variance}') legend([h1,h2],'Simulation Variance','Theoretical Variance') hold off
5-128
Simulate Regression Models with ARMA Errors
The simulated paths follow their theoretical mean, c + Xβ, which is not constant over time (and might look nonstationary). The variance of the process is not constant, but levels off at the theoretical variance by the 15th period. The theoretical variance of the MA(8) error model is 2
2
2
2
2
2
(1 + b1 + b4 + b8 )σε2 = 1 + 0 . 4 + ( − 0 . 3) + 0 . 2 0 . 5 = 0 . 645 . You can reduce transient effects is by partitioning the simulated data into a burn-in portion and a portion for analysis. Do not use the burn-in portion for analysis. Include enough periods in the burn-in portion to overcome the transient effects. burnIn = 1:15; notBurnIn = burnIn(end)+1:T; Y = Y(notBurnIn,:); X = X(notBurnIn,:); U = U(notBurnIn,:); figure h1 = plot(notBurnIn,Y,'Color',[.85,.85,.85]); hold on h2 = plot(notBurnIn,Intercept+X*Beta,'k--','LineWidth',2); title('{\bf 1000 Simulated Response Paths for Analysis}') legend([h1(1),h2],'Simulated Path','Mean') axis tight hold off
5-129
5
Time Series Regression Models
figure h1 = plot(notBurnIn,var(U,0,2),'r','LineWidth',2); hold on h2 = plot([notBurnIn(1) notBurnIn(end)],... [theoVarFix theoVarFix],'k--','LineWidth',2); title('{\bf Converged Unconditional Disturbance Variance}') legend([h1,h2],'Simulation Variance','Theoretical Variance') axis tight hold off
5-130
Simulate Regression Models with ARMA Errors
Unconditional disturbance simulation variances fluctuate around the theoretical variance due to Monte Carlo sampling error. Be aware that the exclusion of the burn-in sample from analysis reduces the effective sample size.
Simulate an ARMA Error Model This example shows how to simulate responses from a regression model with ARMA errors without specifying a presample. Specify the regression model with ARMA(2,1) errors: −2 + ut 1.5 ut = 0 . 9ut − 1 − 0 . 1ut − 2 + εt + 0 . 5εt − 1, yt = 2 + Xt
where εt is distributed with 15 degrees of freedom and variance 1. Beta = [-2; 1.5]; Intercept = 2; a1 = 0.9; a2 = -0.1; b1 = 0.5; Variance = 1;
5-131
5
Time Series Regression Models
Distribution = struct('Name','t','DoF',15); Mdl = regARIMA('AR',{a1, a2},'MA',b1,... 'Distribution',Distribution,'Intercept',Intercept,... 'Beta',Beta,'Variance',Variance);
Generate two length T = 50 predictor series by random selection from the standard Gaussian distribution. T = 50; rng(6); % For reproducibility X = randn(T,2);
The software treats the predictors as nonstochastic series. Generate and plot one sample path of responses from Mdl. rng(7); ySim = simulate(Mdl,T,'X',X); figure plot(ySim) title('{\bf Simulated Response Series}')
simulate requires: • P = 2 presample unconditional disturbances to initialize the autoregressive component of the error series. 5-132
Simulate Regression Models with ARMA Errors
• Q = 1 presample innovations to initialize the moving average component of the error series. Without them, as in this case, simulate sets the necessary presample errors to 0. Alternatively, use filter to filter a random innovation series through Mdl. rng(7); e = randn(T,1); yFilter = filter(Mdl,e,'X',X); figure plot(yFilter) title('{\bf Simulated Response Series Using Filtered Innovations}')
The plots suggest that the simulated responses and the responses generated from the filtered innovations are equivalent. Simulate 1000 response paths from Mdl. Assess transient effects by plotting the unconditional disturbance (U) variances across the simulated paths at each period. numPaths = 1000; [Y,~,U] = simulate(Mdl,T,'NumPaths',numPaths,'X',X); figure h1 = plot(Y,'Color',[.85,.85,.85]); title('{\bf 1000 Simulated Response Paths}') hold on
5-133
5
Time Series Regression Models
h2 = plot(1:T,Intercept+X*Beta,'k--','LineWidth',2); legend([h1(1),h2],'Simulated Path','Mean') hold off
figure h1 = plot(var(U,0,2),'r','LineWidth',2); hold on theoVarFix = Variance*(a1*b1*(1+a2)+(1-a2)*(1+a1*b1+b1^2))/... ((1+a2)*((1-a2)^2-a1^2)); h2 = plot([1 T],[theoVarFix theoVarFix],'k--','LineWidth',2); title('{\bf Unconditional Disturbance Variance}') legend([h1,h2],'Simulation Variance','Theoretical Variance',... 'Location','Best') hold off
5-134
Simulate Regression Models with ARMA Errors
The simulated paths follow their theoretical mean, c + Xβ, which is not constant over time (and might look nonstationary). The variance of the process is not constant, but levels off at the theoretical variance by the 10th period. The theoretical variance of the ARMA(2,1) error model is: 2
σε2 a1b1 1 + a2 + 1 − a2 1 + a1b1 + b1 1 + a2
2
1 − a2
2
− a12 2
=
0 . 9(0 . 5) 1 − 0 . 1 + 1 + 0 . 1 1 + 0 . 9(0 . 5) + 0 . 5 1 − 0.1
2
1 + 0.1
2
2
− 0.9
= 6 . 32 .
You can reduce transient effects by partitioning the simulated data into a burn-in portion and a portion for analysis. Do not use the burn-in portion for analysis. Include enough periods in the burn-in portion to overcome the transient effects. burnIn = 1:10; notBurnIn = burnIn(end)+1:T; Y = Y(notBurnIn,:); X = X(notBurnIn,:); U = U(notBurnIn,:); figure h1 = plot(notBurnIn,Y,'Color',[.85,.85,.85]); hold on h2 = plot(notBurnIn,Intercept+X*Beta,'k--','LineWidth',2);
5-135
5
Time Series Regression Models
title('{\bf 1000 Simulated Response Paths for Analysis}') legend([h1(1),h2],'Simulated Path','Mean') axis tight hold off
figure h1 = plot(notBurnIn,var(U,0,2),'r','LineWidth',2); hold on h2 = plot([notBurnIn(1) notBurnIn(end)],... [theoVarFix theoVarFix],'k--','LineWidth',2); title('{\bf Converged Unconditional Disturbance Variance}') legend([h1,h2],'Simulation Variance','Theoretical Variance') axis tight hold off
5-136
Simulate Regression Models with ARMA Errors
Unconditional disturbance simulation variances fluctuate around the theoretical variance due to Monte Carlo sampling error. Be aware that the exclusion of the burn-in sample from analysis reduces the effective sample size.
5-137
5
Time Series Regression Models
Simulate Regression Models with Nonstationary Errors In this section... “Simulate a Regression Model with Nonstationary Errors” on page 5-138 “Simulate a Regression Model with Nonstationary Exponential Errors” on page 5-141
Simulate a Regression Model with Nonstationary Errors This example shows how to simulate responses from a regression model with ARIMA unconditional disturbances, assuming that the predictors are white noise sequences. Specify the regression model with ARIMA errors: yt = 3 + Xt
2 + ut −1 . 5
Δut = 0 . 5Δut − 1 + εt + 1 . 4εt − 1 + 0 . 8εt − 2, where the innovations are Gaussian with variance 1. T = 150; % Sample size Mdl = regARIMA('MA',{1.4,0.8},'AR',0.5,'Intercept',3,... 'Variance',1,'Beta',[2;-1.5],'D',1);
Simulate two Gaussian predictor series with mean 0 and variance 1. rng(1); % For reproducibility X = randn(T,2);
Simulate and plot the response series. y = simulate(Mdl,T,'X',X); figure; plot(y); title 'Simulated Responses'; axis tight;
5-138
Simulate Regression Models with Nonstationary Errors
Regress y onto X. Plot the residuals, and test them for a unit root. RegMdl = fitlm(X,y); figure; subplot(2,1,1); plotResiduals(RegMdl,'caseorder'); subplot(2,1,2); plotResiduals(RegMdl,'lagged');
5-139
5
Time Series Regression Models
h = adftest(RegMdl.Residuals.Raw) h = logical 0
The residual plots indicate that they are autocorrelated and possibly nonstationary (as constructed). h = 0 indicates that there is insufficient evidence to suggest that the residual series is not a unit root process. Treat the nonstationary unconditional disturbances by transforming the data appropriately. In this case, difference the responses and predictors. Reestimate the regression model using the transformed responses, and plot the residuals. dY = diff(y); dX = diff(X); dRegMdl = fitlm(dX,dY); figure; subplot(2,1,1); plotResiduals(dRegMdl,'caseorder','LineStyle','-'); subplot(2,1,2); plotResiduals(dRegMdl,'lagged');
5-140
Simulate Regression Models with Nonstationary Errors
h = adftest(dRegMdl.Residuals.Raw) h = logical 1
The residual plots indicate that they are still autocorrelated, but stationary. h = 1 indicates that there is enough evidence to suggest that the residual series is not a unit root process. Once the residuals appear stationary, you can determine the appropriate number of lags for the error model using Box and Jenkins methodology. Then, use regARIMA to completely model the regression model with ARIMA errors.
Simulate a Regression Model with Nonstationary Exponential Errors This example shows how to simulate responses from a regression model with nonstationary, exponential, unconditional disturbances. Assume that the predictors are white noise sequences. Specify the following ARIMA error model: Δut = 0 . 9Δut − 1 + εt, where the innovations are Gaussian with mean 0 and variance 0.05. 5-141
5
Time Series Regression Models
T = 50; % Sample size MdlU = arima('AR',0.9,'Variance',0.05,'D',1,'Constant',0);
Simulate unconditional disturbances. Exponentiate the simulated errors. rng(10); % For reproducibility u = simulate(MdlU,T,'Y0',[0.5:1.5]'); expU = exp(u);
Simulate two Gaussian predictor series with mean 0 and variance 1. X = randn(T,2);
Generate responses from the regression model with time series errors: yt = 3 + Xt
2 u + e t. −1 . 5
Beta = [2;-1.5]; Intercept = 3; y = Intercept + X*Beta + expU;
Plot the responses. figure plot(y) title('Simulated Responses') axis tight
5-142
Simulate Regression Models with Nonstationary Errors
The response series seems to grow exponentially (as constructed). Regress y onto X. Plot the residuals. RegMdl1 = fitlm(X,y); figure subplot(2,1,1) plotResiduals(RegMdl1,'caseorder','LineStyle','-') subplot(2,1,2) plotResiduals(RegMdl1,'lagged')
The residuals seem to grow exponentially, and seem autocorrelated (as constructed). Treat the nonstationary unconditional disturbances by transforming the data appropriately. In this case, take the log of the response series. Difference the logged responses. It is recommended to transform the predictors the same way as the responses to maintain the original interpretation of their relationship. However, do not transform the predictors in this case because they contain negative values. Reestimate the regression model using the transformed responses, and plot the residuals. dLogY = diff(log(y)); RegMdl2 = fitlm(X(2:end,:),dLogY); figure subplot(2,1,1) plotResiduals(RegMdl2,'caseorder','LineStyle','-')
5-143
5
Time Series Regression Models
subplot(2,1,2) plotResiduals(RegMdl2,'lagged')
h = adftest(RegMdl2.Residuals.Raw) h = logical 1
The residual plots indicate that they are still autocorrelated, but stationary. h = 1 indicates that there is enough evidence to suggest that the residual series is not a unit root process. Once the residuals appear stationary, you can determine the appropriate number of lags for the error model using Box and Jenkins methodology. Then, use regARIMA to completely model the regression model with ARIMA errors.
References [1] Box, G. E. P., G. M. Jenkins, and G. C. Reinsel. Time Series Analysis: Forecasting and Control. 3rd ed. Englewood Cliffs, NJ: Prentice Hall, 1994.
See Also regARIMA 5-144
Simulate Regression Models with Nonstationary Errors
More About •
“Select ARIMA Model for Time Series Using Box-Jenkins Methodology” on page 3-2
5-145
5
Time Series Regression Models
Simulate Regression Models with Multiplicative Seasonal Errors In this section... “Simulate a Regression Model with Stationary Multiplicative Seasonal Errors” on page 5-146 “Untitled” on page 5-148
Simulate a Regression Model with Stationary Multiplicative Seasonal Errors This example shows how to simulate sample paths from a regression model with multiplicative seasonal ARIMA errors using simulate. The time series is monthly international airline passenger numbers from 1949 to 1960. Load the airline and recessions data sets. load Data_Airline load Data_Recessions
Transform the airline data by applying the logarithm, and the 1st and 12th differences. y = DataTimeTable.PSSG; logY = log(y); DiffPoly = LagOp([1 -1]); SDiffPoly = LagOp([1 -1],'Lags',[0, 12]); dLogY = filter(DiffPoly*SDiffPoly,logY);
Construct the predictor (X), which determines whether the country was in a recession during the sampled period. A 0 in row t means the country was not in a recession in month t, and a 1 in row t means that it was in a recession in month t. X = zeros(numel(dates),1); % Preallocation for j = 1:size(Recessions,1) X(dates >= Recessions(j,1) & dates = 0) && (rhoPhi < 1); densityRhoLambda = (rhoLambda >= 0) && (rhoLambda < 1); densityRhoZ = (rhoZ >= 0) && (rhoZ < 1); % sigma follows the inverse gamma distribution. densitySigmaPhi = igpdf(100*sigmaPhi,3/2,1/8)/100; densitySigmaLambda = igpdf(100*sigmaLambda,3/2,2)/100; densitySigmaZ = igpdf(100*sigmaZ,3/2,1/8)/100; densitySigmaR = igpdf(100*sigmaR,3/2,2)/100;
% Joint prior distribution logPrior = log(densityBeta) + log(densityPi) + log(densityGamma) ... + log(densityLambda) + log(densityZeta) + log(densityV) ... + log(double(densityRhoPhi)) + log(double(densityRhoLambda)) + log(double(densityRhoZ)) . + log(densitySigmaPhi) + log(densitySigmaLambda) + log(densitySigmaZ) + log(densitySigmaR if isnan(logPrior) || ~isreal(logPrior) logPrior = -Inf; end % Inverse gamma density function density = igpdf(x,a,b) logDensity = -gammaln(a)-a.*log(b)-(a+1).*log(x)-1./(b.*x); density = exp(logDensity); end end
11-198
Analyze Linearized DSGE Models
The posterior does not belong to a common distribution family, but the posterior density can be evaluated up to a normalizing constant as the product of the prior and the likelihood. The random walk Metropolis-Hastings (RWMH) sampler is a popular MCMC method that generates asymptotic samples from the high dimensional posterior. The RWMH sampler first generates a parameter draw from a proposal distribution centered at the current state of the Markov chain. If the posterior density evaluated at the proposed parameter is greater than the current density value, the proposal is accepted. Otherwise, it is accepted with a certain probability to ensure reversibility of the Markov chain. To ensure the RWMH sampler works well, you must carefully tune the proposal. The key tuning parameter of the algorithm is the covariance matrix of the proposal distribution in the form of multivariate normal or t distributions. The inverse of the negative Hessian evaluated at the posterior mode (that is, the maximum of the posterior density) is a natural choice for the covariance matrix, despite the fact that the numeric Hessian is notoriously inaccurate and computationally intensive. A practical Bayesian parameter estimation workflow is: 1
Create a Bayesian SSM object.
2
Tune the RWMH sampler.
3
Run the tuned RWMH sampler.
Create Bayesian SSM Object Create a Bayesian SSM object by passing function handles to the SSM parameter mapping function Example_dsge2ssm.m and the log prior specification Example_dsgePrior.m to the bssm function. MdlB = bssm(@Example_dsge2ssm,@Example_dsgePrior);
MdlB is a bssm object representing the Bayesian SSM. Tune RWMH Sampler Tune the RWMH sampler using numerical optimization of the posterior density by passing the Bayesian SSM object, response data, and initial parameter values to the tune function. Specify the univariate treatment of the multivariate series. [params0,Proposal] = tune(MdlB,Y,params0,Univariate=true); Local minimum found. Optimization completed because the size of the gradient is less than the value of the optimality tolerance. Optimization and Tuning | Params0 Optimized ProposalStd ----------------------------------------c(1) | 0.9900 0.9905 0.0016 c(2) | 1.0100 1.0052 0.0011 c(3) | 0.1000 0.1366 0.0042 c(4) | 1.0100 1.0063 0.0033 c(5) | 0.7000 0.6664 0.0472 c(6) | 0 0.0385 0.0857 c(7) | 0.9000 0.9125 0.0260 c(8) | 0.7000 0.7516 0.0813 c(9) | 0.1000 0.1227 0.0583 c(10) | 0.0100 0.0108 0.0010
11-199
11
State-Space Models
c(11) | c(12) | c(13) |
0.0100 0.0100 0.0100
0.0137 0.0101 0.0109
0.0046 0.0010 0.0008
params0 is the posterior mode and Proposal is an approximated inverse Hessian to use as the proposal covariance matrix for the RWMH sampler. Run Tuned RWMH Sampler Update the posterior distribution by running the RWMH sampler with the Hessian-based proposal (reduced by 75%). Specify a total of 5000 retained draws, a burn-in period of 1000, and the univariate treatment of a multivariate series [Params,accept] = simulate(MdlB,Y,params0,Proposal/4, ... NumDraws=5000,BurnIn=1000,Univariate=true); estParams = mean(Params,2); EstParamsCov = cov(Params'); fprintf("RWMH sampler proposal acceptance rate = %4.3f.\n",accept); RWMH sampler proposal acceptance rate = 0.335.
As a rule of thumb, an ideal acceptance rate of the proposal is approximately 25% to 50%. Run the RWMH sampler again with the sample-covariance-based proposal, and then compare the MCMC statistics. ParamsNew = simulate(MdlB,Y,params0,EstParamsCov/4, ... NumDraws=5000,BurnIn=1000,Univariate=true); estParamsNew = mean(ParamsNew,2); disp(array2table([estParamsMLE,estParams,estParamsNew], ... VariableNames=["MLE" "MCMC1" "MCMC2"],RowNames="c(" + (1:13) + ")"))
c(1) c(2) c(3) c(4) c(5) c(6) c(7) c(8) c(9) c(10) c(11) c(12) c(13)
MLE _________
MCMC1 ________
MCMC2 ________
0.98995 1.0049 0.13863 1.0028 0.68027 0.014694 0.91422 0.71871 0.14771 0.010367 0.015903 0.0095232 0.010964
0.99065 1.0054 0.13542 1.0082 0.67017 0.039982 0.91587 0.7536 0.12357 0.011134 0.014645 0.010276 0.01102
0.99049 1.0052 0.13608 1.007 0.68001 0.025718 0.91685 0.75425 0.12161 0.011086 0.015406 0.010368 0.011035
Plot the prior and posterior densities of the Calvo parameter ζp. zetaGrid = linspace(0.3,1,numPoints); priorDensity = betapdf(zetaGrid,5.833,2.5); posteriorDensity = ksdensity(Params(5,:),zetaGrid); figure plot(zetaGrid,posteriorDensity,"-b",zetaGrid,priorDensity,"--r");
11-200
Analyze Linearized DSGE Models
legend("Posterior","Prior") title("Prior and Posterior Distributions of Calvo Parameter") xlabel("\zeta_p") ylabel("Density")
Model-Implied Macroeconomic Dynamics The model-implied correlations between variables, which are the impulse response functions (IRFs) and forecast error variance decomposition (FEVD), can explain the dynamic interactions between macroeconomic variables. Temporal Correlations Autocorrelations and cross-correlations characterize dynamic relationships between variables over time. In a model with stationary variables, the model-implied temporal moments take the form corr yt, yt − h , corr xt, xt − h , and corr yt, xt − h , where the index h ≥ 0 is the number of lags. In a stationary system, leads and lags satisfy the identity corr yi, t, y j, t + h = corr y j, t, yi, t − h , ∀ i, j. Compute the temporal correlations in the fitted SSM. Specify 12 lags. numLags = 12; CYY = corr(Mdl,Params=estParams,NumLags=numLags);
The output CYY is a three-dimensional array of temporal correlations corr yi, t, y j, t − h , ∀h, i, j. Plot the autocorrelations of the output growth (top panel) and the cross-correlations between the output growth and the labor share, the inflation, and interest rate (bottom panel). 11-201
11
State-Space Models
CYYlag12 = CYY(1:7,1,2); CYYlag13 = CYY(1:7,1,3); CYYlag14 = CYY(1:7,1,4); CYYlead12 = CYY(7:-1:2,2,1); CYYlead13 = CYY(7:-1:2,3,1); CYYlead14 = CYY(7:-1:2,4,1); seq1 = 0:numLags; seq2 = -(numLags/2):(numLags/2); figure tiledlayout(2,1) nexttile plot(seq1,CYY(:,1,1)) legend(responses(1)) title("Model-Implied Correlations") xlabel("Lag") ylabel("Correlation") nexttile plot(seq2,[CYYlead12; CYYlag12],"-b", ... seq2,[CYYlead13;CYYlag13],":r", ... seq2,[CYYlead14;CYYlag14],"--c"); legend(responses(2:end)) title("Model-Implied Correlations") xlabel("Lag") ylabel("Correlation") axis tight
11-202
Analyze Linearized DSGE Models
Because the response variables Y t are observable and the sample analogues are available, the temporal correlation corr yt, yt − h is a key quantity to understanding the dynamics. In a correctly specified model with reasonable parameter estimates, the model-implied moments are close to the empirical moments obtained from the data. You can obtain the empirical moments from the sample analogues or a fitted parametric model, such as a vector autoregression (VAR) model. Compute and plot the sample correlations of the response variables for 12 lags. SampleCorr = [crosscorr(Y(:,1),Y(:,1),NumLags=numLags) ... crosscorr(Y(:,2),Y(:,1),NumLags=numLags) ... crosscorr(Y(:,3),Y(:,1),NumLags=numLags) ... crosscorr(Y(:,4),Y(:,1),NumLags=numLags)]'; seq3 = -numLags:numLags; figure tiledlayout(2,1) nexttile plot(seq3,SampleCorr(1,:)) legend(responses(1)) title("Empirical Correlations") xlabel("Lag") ylabel("Correlation") axis tight nexttile plot(seq3,SampleCorr(2,:),"-b",seq3,SampleCorr(3,:),":r", ... seq3,SampleCorr(4,:),"--c") legend(responses(2:end)) title("Empirical Correlations") xlabel("Lag") ylabel("Correlation") axis tight
11-203
11
State-Space Models
Impulse Response Functions Another approach to model-dynamics characterization is impulse response analysis, which studies how the dynamic system responds to a unit impulse of the structural shock, all other conditions being equal. You can obtain the IRF from the moving-average representation of an SSM yt = Ψ0 + Ψ1
∞
∑
s=0
Φ1s Φϵϵt − s.
The coefficient Ψ1Φ1s Φϵ demonstrates the s-step-ahead response after an impulse of a state shock. An alternative to the period-by-period IRF is the sum of the IRF over time, which gives the cumulative effect. In the stylized new Keynesian DSGE model, the first element of Y t is the single-period output growth. The interpretation of its cumulative IRF is the multiple-period output growth due to additivity of the log-returns. Compute the IRFs by passing the fitted SSM to the irf function. Specify the estimated parameters and their estimated covariance matrix, and a 21-period cumulative IRF. [Response,~,Lower,Upper] = irf(Mdl,Params=estParams, ... EstParamCov=EstParamsCov,NumPeriods=21,Cumulative=true);
Response is a three-dimensional array in which the element s, i, j characterizes the s − 1 periodahead response of variable j to a unit of shock to variable i.
11-204
Analyze Linearized DSGE Models
Plot the model-implied cumulative IRF of the output growth (in percentage) in response to the preference shock, the price mark-up shock, the technology growth shock, and the monetary policy shock. Response = 100*Response; Lower = 100*Lower; Upper = 100*Upper; titles = ["Preference" "Mark-Up" "Technology" "Monetary"] ... + " Shock"; seq4 = 0:20; ylims = [-0.8 0; -0.8 0.1; 0.5 1.5; -0.8, 0.2]; figure tiledlayout(2,2) for j = 1:4 nexttile plot(seq4,Response(:,j,1),"-b",seq4,Lower(:,j,1),"--r", ... seq4,Upper(:,j,1),"--r"); title(titles(j)) xlabel("Period") ylabel("Response") ylim(ylims(j,:)) end
11-205
11
State-Space Models
Forecast Error Variance Decomposition In addition to temporal correlations and IRFs, variance decompositions are informative about the model dynamics. The FEVD attributes the volatility of observations to component-wise state shocks. It shows the relative importance of each shock in affecting the forecast error variance. You can obtain the FEVD from the moving-average representation of an SSM yt = Ψ0 + Ψ1
∞
∑
s=0
Φ1s Φϵϵt − s. th
′s −1 s ′ , to which the j The total variances over h periods are Ψ1 ∑hs = 0 Φ1ΦϵΦϵ′ Φ1 Ψ1
th
′s −1 s Ψ1 ∑hs = ′ , where Φ j, ϵ is the j 0 Φ1Φ j, ϵΦ′j, ϵΦ1 Ψ1
shock contributes
column of the matrix.
Compute the variance decompositions by passing the fitted SSM to the fevd function. Plot the FEVDs in bar graphs. Decomposition = fevd(Mdl,Params=estParams); figure tiledlayout(2,2) for j = 1:4 nexttile bar(Decomposition(:,:,j),"stacked") title(titles(j)) xlabel("Period") ylabel("Portion") end
11-206
Analyze Linearized DSGE Models
The stacked series represent the structural shocks in the state equation, specifically from bottom to top bar, the preference shock, the price mark-up shock, the technology growth shock, and the monetary policy shock. Because the measurement equation is free from additive disturbances, the portions of the four shocks sum to one. Simulation-Based Forecast Bayesian forecasts derive from the posterior predictive distribution p Y f | Y = ∫p Y f | Y, θ p θ | Y dθ . In the equation, the model parameters are integrated out. Therefore, a Bayesian forecast accounts for parameter uncertainty and the volatility of future data. With the posterior samples obtained by MCMC, the simulation-based forecast computes a collection of predicted variables conditional on the model parameters, and then takes the average of the prediction results. In particular: • The posterior predictive mean is E Y f | Y = E E Y f | Y, θ | Y . • The posterior predictive variance is Var Y f | Y = E Var Y f | Y, θ | Y + Var E Y f | Y, θ | Y . To reduce the computational burden of Monte-Carlo integration, thin the posterior samples by selecting every other 50th draw (column) of Params. thin = 50; ParamsThin = Params(:,1:thin:end);
Forecast 12 periods out-of-sample using the state-space framework, conditional on the model parameters. 11-207
11
State-Space Models
numPeriods = 12; numDraws = size(ParamsThin,2); % Preallocate YfSim = zeros(numPeriods,4,numDraws); YfVarSim = zeros(numPeriods,4,numDraws); for n = 1:numDraws [A,B,C,D] = Example_dsge2ssm(ParamsThin(:,n)); MdlF = ssm(A,B,C,D); [YfSim(:,:,n),YfVarSim(:,:,n)] = forecast(MdlF,numPeriods,Y); end
YfSim is E Y f | Y, θ and YfVarSim is Var Y f | Y, θ . Marginalize over the parameters for the posterior predictive mean and variance. Yf = mean(YfSim,3); YfVar = mean(YfVarSim,3) + var(YfSim,0,3);
Plot the forecasted values. Use the slider to choose a different response series to plot. YfLB = Yf - 2*sqrt(YfVar); YfUB = Yf + 2*sqrt(YfVar); ind = ; fh = (numObs+1):(numObs+numPeriods); figure plot(1:numObs,Y(:,ind),":b",fh,Yf(:,ind),"-b", ... fh,YfLB(:,ind),"--r",fh,YfUB(:,ind),"--r") title("Out-of-Sample Forecast of " + responses(ind)) xlabel("Period") ylabel("Value")
11-208
Analyze Linearized DSGE Models
Conclusion Quantitative evaluation of the DSGE model has evolved considerably over the years. Formal parameter estimation is typically implemented in the state-space framework, in which the Kalman filter delivers the likelihood function for the Gaussian linear SSM. It is possible to estimate DSGE models by the numeric maximum likelihood, but Bayesian methods are popular in that a tight prior distribution can regularize the likelihood function. Bayesian inference of the DSGE model is facilitated by MCMC methods such as the Metropolis-Hasting sampler. Appropriate specification of the proposal distribution is crucial for an efficient MCMC. A popular specification is the random walk Gaussian proposal with the covariance matrix proportional to the inverse Hessian evaluated at the posterior maximum. In addition to parameter estimation, it is of interest to estimate other unknown variables such as the state variables, missing data, and the variables in the forecast horizon. In the Bayesian framework, the posterior predictive distribution characterizes the state of knowledge of those unknown variables, conditional on the observations. The Econometrics Toolbox provides ssm and bssm functionalities for general-purpose state-space modeling. You can cast many time series models, including the linearized DSGE model, in the statespace form for a variety of inference tasks such as maximum likelihood estimation, Bayesian posterior
11-209
11
State-Space Models
simulation, state filtering and smoothing, impulse response analysis, variance decomposition, out-ofsample forecast, and so on.
References [1] Del Negro, Marco, Eusepi, Stefano, Giannoni, Marc P., Sbordone, Argia M., Tambalotti, Andrea, Cocci, Matthew, Hasegawa, Raiden, and Linder, M. H. "The FRBNY DSGE Model." FRB of New York Staff Report 647 (October 2013). https://doi.org/10.2139/ssrn.2344312. [2] Fernández-Villaverde, Jesús, Rubio-Ramírez, Juan F., and Schorfheide, Frank. "Solution and Estimation Methods for DSGE Models." Handbook of Macroeconomics 2 (November 2016) 527–724. https://doi.org/10.1016/bs.hesmac.2016.03.006. [3] Sims, Christopher A. "Solving Linear Rational Expectations Models." Computational Economics 20 (October 2002) 1–20. https://doi.org/10.1023/A:1020517101123. [4] Smets, Frank, and Raf Wouters. "An Estimated Dynamic Stochastic General Equilibrium Model of the Euro Area." Journal of the European Economic Association 1 (September 2003): 1123– 1175. https://doi.org/10.1162/154247603770383415.
See Also Objects ssm | bssm Functions estimate | forecast | irf | fevd | simulate | tune
Related Examples
11-210
•
“What Are State-Space Models?” on page 11-3
•
“Apply State-Space Methodology to Analyze Diebold-Li Yield Curve Model” on page 11-160
Perform Outlier Detection Using Bayesian Non-Gaussian State-Space Models
Perform Outlier Detection Using Bayesian Non-Gaussian StateSpace Models This example shows how to use non-Gaussian error distributions in a Bayesian state-space model to detect outliers in a time series. The example follows the state-space framework for outlier detection in [1], Chapter 14. Robust regressions that employ distributions with excess kurtosis accommodate more extreme data compared to the Gaussian regressions. After a non-Gaussian regression (representing, for example, a linear time series decomposition model), an analysis of residuals (representing the irregular component of the decomposition) facilitates outlier detection. Consider using a state-space model as a linear filter for a simulated, quarterly series of coal consumption, in millions of short tons, from 1994 through 2020. Load and Plot Data Load the simulated coal consumption series Data_SimCoalConsumption.mat. The data set contains the timetable of data DataTimeTable. load Data_SimCoalConsumption
Plot the coal consumption series. figure plot(DataTimeTable.Time,DataTimeTable.CoalConsumption) ylabel("Consumption (Millions of Short Tons)") title("Quarterly Coal Consumption, 1994-2020")
11-211
11
State-Space Models
The series has a clear downward trend and pronounced seasonality, which suggests a linear decomposition model. The series does not explicitly exhibit unusually large or small observations. Specify State-Space Model Structure Because the series appears decomposable, consider the linear decomposition for the observed series yt = τt + γt + σ3εt, where, at time t: •
yt is the observed coal consumption.
• τt is the unobserved local linear trend. • γt is the unobserved seasonal component. • εt is the observation innovation with mean 0. Its standard deviation depends on its distribution. • σ3 is the observation innovation coefficient, an unknown parameter, which scales the observation innovation. The unobserved components are the model states, explicitly: Δ τt = Δ τt − 1 + σ1u1, t γt = − γt − 1 − γt − 2 − γt − 3 + σ2u2, t At time t:
11-212
Perform Outlier Detection Using Bayesian Non-Gaussian State-Space Models
• u1, t and u2 . t are state disturbance series with mean 0. Their standard deviations depend on its distribution. • σ1 and σ2 are the state-disturbance loading scalars, which are unknown parameters. • Rearrange the linear trend equation to obtain τt = 2τt − 1 − τt − 2 + σ1u1, t. The structure can be viewed as a state-space model xt = Axt − 1 + But yt = Cxt + Dεt, where: • xt = τt x2, t γt x4, t x5, t ′, where x j, t are dummy variables for higher order lagged terms of τt and γt. •
•
2 1 A= 0 0 0
−1 0 0 0 0
0 0 −1 1 0
0 0 −1 0 1
0 0 −1 . 0 0
σ1 0 0 0 B = 0 σ2 . 0 0 0 0
• C= 1 0 1 0 0. • D = σ3. Write a function called linearDecomposeMap in the Local Functions on page 11-217 section that maps a vector of parameters Θ to the state-space coefficient matrices. function [A,B,C,D,Mean0,Cov0,StateType] = linearDecomposeMap(theta) A = [2 -1 0 0 0; 1 0 0 0 0; 0 0 -1 -1 -1; 0 0 1 0 0; 0 0 0 1 0]; B = [theta(1) 0; 0 0; 0 theta(2); 0 0; 0 0]; C = [1 0 1 0 0]; D = theta(3); Mean0 = []; % MATLAB uses default initial state mean Cov0 = []; % MATLAB uses initial state covariances StateType = [2; 2; 2; 2; 2]; % All states are nonstationary end
Specify Prior Distribution 2 . Write a Assume the prior distribution of each of the disturbance and innovation scalars is χ10 function called priorDistribution in the Local Functions on page 11-217 section that returns the log prior of a value of Θ.
11-213
11
State-Space Models
function logprior = priorDistribution(theta) p = chi2pdf(theta,10); logprior = sum(log(p)); end
Create Bayesian State-Space Model Create a Bayesian state-space model representing the system by passing the state-space model structure and prior distribution functions as function handles to bssm. For comparison, create a model that sets the distribution of εt to a standard Gaussian distribution and a different model that sets the distribution of εt to a Student's t distribution with unknown degrees of freedom. y = DataTimeTable.CoalConsumption; MdlGaussian = bssm(@linearDecomposeMap,@priorDistribution); MdlT = bssm(@linearDecomposeMap,@priorDistribution,ObservationDistribution="t");
Prepare for Posterior Sampling Posterior sampling requires a good set of initial parameter values and tuned proposal scale matrix. Also, to access Bayesian estimates of state values and residuals, which compose the components of the decomposition, you must write a function that stores these quantities at each iteration of the MCMC sampler. For each prior model, use the tune function to obtain a data-driven set of initial values and a tuned proposal scale matrix. Specify a random set of positive values in [0,1] to initialize the Kalman filter. rng(10) % For reproducibility numParams = 3; theta0 = rand(numParams,1); [theta0Gaussian,ProposalGaussian] = tune(MdlGaussian,y,theta0,Display=false); Local minimum possible. fminunc stopped because it cannot decrease the objective function along the current search direction. [theta0T,ProposalT] = tune(MdlT,y,theta0,Display=false); Local minimum possible. fminunc stopped because it cannot decrease the objective function along the current search direction.
Write a function in the Local Functions on page 11-217 section called outputfcn that returns state estimates and observation residuals at each iteration of the MCMC sampler. function out = outputfcn(inputstruct) out.States = inputstruct.States; out.ObsResiduals = inputstruct.Y - inputstruct(end).States*inputstruct(end).C'; end
Decompose Series Using Gaussian Model Decompose the series by following this procedure:
11-214
1
Draw a large sample from the posterior distribution by using simulate.
2
Plot the linear, seasonal, and irregular components.
Perform Outlier Detection Using Bayesian Non-Gaussian State-Space Models
Use simulate to draw a posterior sample of 10000 from the model that assumes the observation innovation distribution is Gaussian. Specify the data-driven initial values and proposal scale matrix, specify the output function outputfcn, apply the univariate treatment to speed up calculations, and apply a burn-in period of 1000. Return the draws, acceptance probability, and output function output. simulate uses the Metropolis-Hastings sampler to draw the sample. NumDraws = 10000; BurnIn = 1000; [ThetaPostGaussian,acceptGaussian,outGaussian] = simulate(MdlGaussian, ... y,theta0Gaussian,ProposalGaussian,NumDraws=NumDraws,BurnIn=BurnIn, ... OutputFunction=@outputfcn,Univariate=true); acceptGaussian acceptGaussian = 0.4620
ThetaPostGaussian is a 3-by-1000 matrix of draws from the posterior distribution of Θ | y. simulate accepted about half of the proposed draws. outGaussian is a structure array with 1000 records, the last of which contains the final estimates of the components of the decomposition. Plot the components of the decomposition by calling the plotComponents function in the Local Functions on page 11-217 section. plotComponents(outGaussian,dates,"Gaussian")
The plot of the irregular component (residuals) does not contain any residuals of unusual magnitude. Because the Gaussian model structure applies very small probabilities to unusual observations, it is difficult for this type of model to discover outliers. 11-215
11
State-Space Models
Decompose Series Using Model with t-Distributed Innovations The t-distribution applies larger probabilities to extreme observations. This characteristic enables models to discover outliers more readily than a Gaussian model. Decompose the series by following this procedure: 1
Estimate the posterior distribution Θ, νε | y by using estimate. Inspect the estimate of νε.
2
Draw a large sample from the posterior distribution by using simulate.
3
Plot the linear, seasonal, and irregular components.
Estimate the posterior distribution of the model that assumes the observation innovations are t distributed. seed = 10; % To reproduce samples across estimate and simulate rng(seed) estimate(MdlT,y,theta0T,NumDraws=NumDraws,BurnIn=BurnIn,Univariate=true); Local minimum possible. fminunc stopped because the size of the current step is less than the value of the step size tolerance. Optimization and Tuning | Params0 Optimized ProposalStd ---------------------------------------c(1) | 0.0037 0.0035 0.0011 c(2) | 0.0623 0.0627 0.0080 c(3) | 0.0370 0.0375 0.0094 Posterior Distributions | Mean Std Quantile05 Quantile95 -------------------------------------------------c(1) | 0.0042 0.0013 0.0025 0.0066 c(2) | 0.0522 0.0079 0.0407 0.0665 c(3) | 0.0314 0.0097 0.0169 0.0490 x(1) | 14.6299 0.0308 14.5780 14.6794 x(2) | 14.6538 0.0247 14.6117 14.6935 x(3) | 0.1188 0.0491 0.0498 0.2078 x(4) | -0.6852 0.0387 -0.7538 -0.6284 x(5) | -0.1136 0.0321 -0.1647 -0.0608 ObsDoF | 4.7300 3.3612 1.5845 11.6515 Proposal acceptance rate = 35.30%
The Posterior Distributions table in the display shows posterior estimates of the state-space model parameters (labeled c(j)), final state values (labeled x(j)), and t-distribution degrees of freedom (labeled ObsDoF). The posterior mean of the degrees of freedom is about 5, which is low. This result suggests that the observation innovations have excess kurtosis. Use simulate to draw a posterior sample of 10000 from the model that assumes the observation innovations are t distributed. Specify the data-driven initial values and proposal scale matrix, specify the output function outputfcn, apply the univariate treatment to speed up calculations, and apply a burn-in period of 1000. Return the draws, acceptance probability, and output function output. simulate uses the Metropolis-within-Gibbs sampler to draw the sample. rng(seed) [ThetaPostT,acceptT,outT] = simulate(MdlT,y,theta0T,ProposalT, ...
11-216
Perform Outlier Detection Using Bayesian Non-Gaussian State-Space Models
NumDraws=NumDraws,BurnIn=BurnIn,OutputFunction=@outputfcn,Univariate=true); acceptT acceptT = 0.3154
ThetaPostT is a 3-by-1000 matrix of draws from the posterior distribution of Θ | y. simulate accepted about 30% of the proposed draws. outT is a structure array with 1000 records, the last of which contains the final estimates of the components of the decomposition. Plot the components of the decomposition. plotComponents(outT,dates,"$$t$$") h = gca; h.FontSize = 8;
The plot of the irregular component shows two clear outliers around 2005. Local Functions This example uses the following functions. linearDecomposeMap is the parameter-to-matrix mapping function and priorDistribution is the log prior distribution of the parameters Θ. function [A,B,C,D,Mean0,Cov0,StateType] = linearDecomposeMap(theta) A = [2 -1 0 0 0; 1 0 0 0 0; 0 0 -1 -1 -1; 0 0 1 0 0; 0 0 0 1 0];
11-217
11
State-Space Models
B = [theta(1) 0; 0 0; 0 theta(2); 0 0; 0 0]; C = [1 0 1 0 0]; D = theta(3); Mean0 = []; % MATLAB uses default initial state mean Cov0 = []; % MATLAB uses initial state covariances StateType = [2; 2; 2; 2; 2]; % All states are nonstationary end function logprior = priorDistribution(theta) p = chi2pdf(theta,10); logprior = sum(log(p)); end function out = outputfcn(inputstruct) out.States = inputstruct.States; out.ObsResiduals = inputstruct.Y - inputstruct(end).States*inputstruct(end).C'; end function plotComponents(output,dt,tcont) figure tiledlayout(2,2) nexttile plot(dt,output(end).States(:,1)) grid on title("Linear Trend: " + tcont,Interpreter="latex") h = gca; h.FontSize = 8; nexttile plot(dt,output(end).States(:,3)) grid on title("Seasonal Component: " + tcont,Interpreter="latex") h = gca; h.FontSize = 8; nexttile plot(dt,output(end).ObsResiduals) grid on title("Irregular Component: " + tcont,Interpreter="latex") end
References [1] Durbin, J, and Siem Jan Koopman. Time Series Analysis by State Space Methods. 2nd ed. Oxford: Oxford University Press, 2012.
See Also Objects bssm Functions estimate | tune | simulate
Related Examples • 11-218
“What Are State-Space Models?” on page 11-3
Perform Outlier Detection Using Bayesian Non-Gaussian State-Space Models
•
“What Is the Kalman Filter?” on page 11-7
•
“Analyze Linearized DSGE Models” on page 11-190
11-219
12 Functions
12
Functions
addBusinessCalendar Add business calendar awareness to timetables
Syntax TT = addBusinessCalendar(TT) TT = addBusinessCalendar( ___ ,Name=Value)
Description TT = addBusinessCalendar(TT) adds business calendar awareness to an input timetable TT by setting a custom property for the output timetable TT. TT = addBusinessCalendar( ___ ,Name=Value) sets additional options specified by one or more name-value arguments, using any of the input arguments in the previous syntax. For example, TT = addBusinessCalendar(TT,Holidays=H) replaces the default holidays stored in Data_NYSE_Closures.mat with the list of holidays H.
Examples Add Business Calendar Awareness to Timetable This example shows how to add a business calendar when you aggregate daily prices to weekly simulated prices. Simulate daily prices for three assets from January 1, 2014, through December 31, 2018. t = (datetime(2014,1,1):caldays:datetime(2018,12,31))'; rng(200,"twister") Price = 100 + 0.1*(0:numel(t) - 1)'.*cumsum(randn(numel(t),1)/100); Price = round(Price*100)/100; Price2 = round(Price*94)/100; Price3 = round(Price*88)/100; TT = timetable(Price,Price2,Price3,RowTimes=t); head(TT,15)
12-2
Time ___________
Price ______
Price2 ______
Price3 ______
01-Jan-2014 02-Jan-2014 03-Jan-2014 04-Jan-2014 05-Jan-2014 06-Jan-2014 07-Jan-2014 08-Jan-2014 09-Jan-2014 10-Jan-2014 11-Jan-2014
100 100 100 100 100.01 100.01 100.02 100.02 100.04 100.06 100.08
94 94 94 94 94.01 94.01 94.02 94.02 94.04 94.06 94.08
88 88 88 88 88.01 88.01 88.02 88.02 88.04 88.05 88.07
addBusinessCalendar
12-Jan-2014 13-Jan-2014 14-Jan-2014 15-Jan-2014
100.11 100.11 100.12 100.12
94.1 94.1 94.11 94.11
88.1 88.1 88.11 88.11
TT.Properties ans = TimetableProperties with properties: Description: '' UserData: [] DimensionNames: {'Time' 'Variables'} VariableNames: {'Price' 'Price2' 'Price3'} VariableDescriptions: {} VariableUnits: {} VariableContinuity: [] RowTimes: [1826x1 datetime] StartTime: 01-Jan-2014 SampleRate: NaN TimeStep: 1d Events: [] CustomProperties: No custom properties are set. Use addprop and rmprop to modify CustomProperties.
Add business calendar awareness to the timetable. TTBCA = addBusinessCalendar(TT); TTBCA.Properties ans = TimetableProperties with properties: Description: UserData: DimensionNames: VariableNames: VariableDescriptions: VariableUnits: VariableContinuity: RowTimes: StartTime: SampleRate: TimeStep: Events:
'' [] {'Time' 'Variables'} {'Price' 'Price2' 'Price3'} {} {} [] [1826x1 datetime] 01-Jan-2014 NaN 1d []
Custom Properties (access using t.Properties.CustomProperties.): BusinessCalendar: [1x1 struct]
TTBCA is a timetable containing the same data as TT, but the Time variable respects business days only when you operate on the timetable with business calendar aware functions. Aggregate the variables in each timetable to weekly mean series. TTW = convert2weekly(TT,Aggregation="mean"); TTBCAW = convert2weekly(TTBCA,Aggregation="mean"); head(TTW)
12-3
12
Functions
Time ___________
Price ______
Price2 ______
Price3 ______
03-Jan-2014 10-Jan-2014 17-Jan-2014 24-Jan-2014 31-Jan-2014 07-Feb-2014 14-Feb-2014 21-Feb-2014
100 100.02 100.12 100.22 100.37 100.54 100.64 100.83
94 94.023 94.109 94.203 94.343 94.51 94.599 94.784
88 88.021 88.104 88.191 88.321 88.479 88.561 88.734
Time ___________
Price ______
Price2 ______
Price3 ______
03-Jan-2014 10-Jan-2014 17-Jan-2014 24-Jan-2014 31-Jan-2014 07-Feb-2014 14-Feb-2014 21-Feb-2014
100 100.03 100.13 100.26 100.39 100.56 100.66 100.88
94 94.03 94.116 94.24 94.364 94.522 94.616 94.833
88 88.028 88.112 88.228 88.342 88.49 88.576 88.778
head(TTBCAW)
TTW contains the weekly means for all days in the data, but TTBCAW contains weekly means for only business days during each week.
Specify Custom Holidays Simulate daily prices for three assets from January 1, 2014, through December 31, 2018. t = (datetime(2014,1,1):caldays:datetime(2018,12,31))'; rng(200,"twister") Price = 100 + 0.1*(0:numel(t) - 1)'.*cumsum(randn(numel(t),1)/100); Price = round(Price*100)/100; Price2 = round(Price*94)/100; Price3 = round(Price*88)/100; TT = timetable(Price,Price2,Price3,RowTimes=t);
Suppose that the assets are owned by a company that observes all NYSE holidays and a holiday on August 10. Create a datetime vector containing the custom holiday list. load Data_NYSE_Closures % Data containing closures excluding Sundays NYSE_Holidays = NYSE(day(NYSE.Date,"dayofweek") ~= 7,:); % Exclude Saturdays aug10 = datetime(2014:2018,08,10)'; holidays = [aug10; NYSE.Date(isbetween(NYSE.Date,t(1),t(end),"closed"))];
Add business calendar awareness to the timetable. Specify the custom holiday list. TTCH = addBusinessCalendar(TT,Holidays=holidays);
12-4
addBusinessCalendar
TTCH is a timetable containing the same data as TT, but the Time variable respects the custom business days only when you operate on the timetable with business calendar aware functions. Create a timetable respecting the default business days. TTBCA = addBusinessCalendar(TT);
Aggregate the timetables by computing the annual means of the prices. TTY = convert2annual(TT,Aggregation="mean") TTY=5×3 timetable Time ___________ 31-Dec-2014 31-Dec-2015 31-Dec-2016 31-Dec-2017 31-Dec-2018
Price ______
Price2 ______
Price3 ______
103.38 120.53 135.12 151.4 196.83
97.177 113.3 127.01 142.31 185.02
90.974 106.07 118.9 133.23 173.21
TTBCAY = convert2annual(TTBCA,Aggregation="mean") TTBCAY=5×3 timetable Time Price ___________ ______ 31-Dec-2014 31-Dec-2015 30-Dec-2016 29-Dec-2017 31-Dec-2018
103.38 120.66 135.19 151.41 196.55
Price2 ______
Price3 ______
97.175 113.42 127.08 142.33 184.75
90.973 106.18 118.97 133.24 172.96
TTCHY = convert2annual(TTCH,Aggregation="mean") TTCHY=5×3 timetable Time Price ___________ ______ 31-Dec-2014 31-Dec-2015 30-Dec-2016 29-Dec-2017 31-Dec-2018
103.38 120.64 135.21 151.33 196.53
Price2 ______
Price3 ______
97.175 113.4 127.09 142.25 184.74
90.973 106.16 118.98 133.17 172.94
Input Arguments TT — Input timetable to update with business calendar awareness timetable Input timetable to update with business calendar awareness, specified as a timetable. Data Types: timetable 12-5
12
Functions
Name-Value Arguments Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter. Before R2021a, use commas to separate each name and value, and enclose Name in quotes. Example: TT = addBusinessCalendar(TT,Holidays=H) replaces the default holidays stored in Data_NYSE_Closures.mat with the list of holidays H Holidays — Alternate holidays and market closure dates datetime vector Alternate holidays and market closure dates, specified as a datetime vector. The dates in Holidays must be whole dates without HH:MM:SS components. No business is conducted on the dates in Holidays. By default, Holidays is the New York Stock Exchange (NYSE) holidays and market closure dates. For more details, load the default holidays in Data_NYSE_Closures.mat and inspect the NYSE variable, or, if you have a Financial Toolbox license, see holidays and isbusday. Tip If you have a Financial Toolbox license, you can generate alternate holiday schedules by using the createholidays function and performing this procedure: 1
Generate a new holidays function using createholidays.
2
Call the new holidays function to get the list of holidays.
3
Specify the alternate holidays to addBusinessCalendar by using the Holidays name-value argument.
Data Types: datetime Weekends — Alternate weekend days on which no business is conducted [1 0 0 0 0 0 1] or ["Sunday" "Saturday"] (default) | logical vector | string vector Alternate weekend days on which no business is conducted, specified as a length 7 logical vector or a string vector. For a logical vector, true (1) entries indicate a weekend day and false (0) entries indicate a weekday, where entries correspond to Sunday, Monday, Tuesday, Wednesday, Thursday, Friday, and Saturday. Example: Weekends=[1 0 0 0 0 1 1] specifies that business is not conducted on Fridays through Sundays. For a string vector, entries explicitly list the weekend days. Example: Weekends=["Friday" "Saturday" "Sunday"] Tip If business is conducted seven days per week, set Weekends to [0 0 0 0 0 0 0]. Data Types: logical 12-6
addBusinessCalendar
Output Arguments TT — Updated timetable TT with added business calendar awareness by a custom property timetable Updated timetable TT with added business calendar awareness by the custom property BusinessCalendar, returned as a timetable. The custom property BusinessCalendar contains a data structure that contains a field IsBusinessDay that stores a callable function (F). The function F accepts a datetime matrix (D) and returns a logical indicator matrix (I) of the same size: I = F(D). true (1) elements of I indicate that the corresponding element of D occurs on a business day; false (0) elements of I indicate otherwise. Access the callable function F by using F = TT.Properties.CustomProperties.BusinessCalendar.IsBusinessDay.
Version History Introduced in R2020b
See Also timetable
12-7
12
Functions
adftest Augmented Dickey-Fuller test
Syntax h = adftest(y) [h,pValue,stat,cValue] = adftest(y) StatTbl = adftest(Tbl) [ ___ ] = adftest( ___ ,Name=Value) [ ___ ,reg] = adftest( ___ )
Description h = adftest(y) returns the rejection decision h from conducting an augmented Dickey-Fuller test on page 12-17 for a unit root in a univariate time series y. [h,pValue,stat,cValue] = adftest(y) also returns the p-value pValue, test statistic stat, and critical value cValue of the test. StatTbl = adftest(Tbl) returns the table StatTbl containing variables for the test results, statistics, and settings from conducting an augmented Dickey-Fuller test for a unit root in the last variable of the input table or timetable Tbl. To select a different variable in Tbl to test, use the DataVariable name-value argument. [ ___ ] = adftest( ___ ,Name=Value) specifies options using one or more name-value arguments in addition to any of the input argument combinations in previous syntaxes. adftest returns the output argument combination for the corresponding input arguments. Some options control the number of tests to conduct. The following conditions apply when adftest conducts multiple tests: • adftest treats each test as separate from all other tests. • If you specify y, all outputs are vectors. • If you specify Tbl, each row of StatTbl contains the results of the corresponding test. For example, adftest(Tbl,DataVariable="GDP",Alpha=0.025,Lags=[0 1]) conducts two tests, at a level of significance of 0.025, for the presence of a unit root in the variable GDP of the table Tbl. The first test includes 0 lagged difference terms in the AR model, and the second test includes 1 lagged difference term in the AR model. [ ___ ,reg] = adftest( ___ ) additionally returns a structure of regression statistics for the hypothesis test reg.
Examples Conduct Dickey-Fuller Test Without Augmentation on Vector of Data Test a time series for a unit root using the default autoregressive model without augmented difference terms. Input the time series data as a numeric vector. 12-8
adftest
Load the Canadian inflation rate data and extract the CPI-based inflation rate INF_C. load Data_Canada y = DataTable.INF_C;
Test the time series for a unit root. h = adftest(y) h = logical 0
The result h = 0 indicates that this test fails to reject the null hypothesis of a unit root against the autoregressive alternative.
Return Test p-Value and Decision Statistics Load Canadian inflation rate data and extract the CPI-based inflation rate INF_C. load Data_Canada y = DataTable.INF_C;
Test the time series for a unit root. Return the test decision, p-value, test statistic, and critical value. [h,pValue,stat,cValue] = adftest(y) h = logical 0 pValue = 0.3255 stat = -0.8769 cValue = -1.9476
Conduct Dickey-Fuller Test Without Augmentation on Table Variable Test a time series, which is one variable in a table, for a unit root using the default autoregressive model without augmented difference terms. Load Canadian inflation rate data, which contains yearly measurements on five time series variables in the table DataTable. load Data_Canada
Test the long-term bond rate series INT_L, the last variable in the table, for a unit root. StatTbl = adftest(DataTable) StatTbl=1×8 table h _____
pValue ______
stat _______
cValue _______
Lags ____
Alpha _____
Model ______
Test ______
12-9
12
Functions
Test 1
false
0.7358
0.24601
-1.9476
0
0.05
{'AR'}
{'T1'}
adftest returns test results and settings in the table StatTbl, where variables correspond to test results (h, pValue, stat, and cValue) and settings (Lags, Alpha, Model, and Test), and rows correspond to individual tests (in this case, adftest conducts one test). By default, adftest tests the last variable in the table. To select a variable from an input table to test, set the DataVariable option.
Conduct Augmented Dickey-Fuller Test Against Trend-Stationary Alternative Test a time series for a unit root against a trend-stationary alternative augmented with lagged difference terms. Load a GDP data set. Compute the log of the series. load Data_GDP; Y = log(Data);
Test for a unit root against a trend-stationary alternative, augmenting the model with 0, 1, and 2 lagged difference terms. h = adftest(Y,Model="TS",Lags=0:2) h = 1x3 logical array 0
0
0
adftest treats the three lag choices as three separate tests, and returns a vector with rejection decisions for each test. The values h = 0 indicate that all three tests fail to reject the null hypothesis of a unit root against the trend-stationary alternative.
Choose Number of Lags for Test by Inspecting OLS Statistics Test a time series for a unit root against trend-stationary alternatives augmented with different numbers of lagged difference terms. Look at the regression statistics corresponding to each of the alternative models to choose how many lagged difference terms to include in the augmented model. Load a US macroeconomic data set Data_USEconModel.mat. Compute the log of the GDP and include the result as a new variable called LogGDP in the data set. load Data_USEconModel DataTimeTable.LogGDP = log(DataTimeTable.GDP);
Test for a unit root in the logged GDP series using three different choices for the number of lagged difference terms. Return the regression statistics for each alternative model. [StatTbl,reg] = adftest(DataTimeTable,DataVariable="LogGDP",Model="TS",Lags=0:2); StatTbl
12-10
adftest
StatTbl=3×8 table h _____ Test 1 Test 2 Test 3
false false false
pValue _______
stat __________
cValue _______
0.999 0.99565 0.99214
1.0247 -0.0020747 -0.21274
-3.4302 -3.4303 -3.4304
Lags ____ 0 1 2
Alpha _____
Model ______
Test ______
0.05 0.05 0.05
{'TS'} {'TS'} {'TS'}
{'T1'} {'T1'} {'T1'}
adftest treats each of the three lag choices as separate tests, and returns results and settings for each test along the rows of the table StatTbl. reg is a 3-by-1 structure array containing regression statistics corresponding to each of the three alternative models. Display the names of the coefficients, their t-statistics and corresponding p-values, and the BIC resulting from the regression of the three alternative models. model1 = array2table([reg(1).tStats.t reg(1).tStats.pVal], ... RowNames=reg(1).names,VariableNames=["tStat" "pValue"]) model1=3×2 table tStat ________ c d a
-0.43299 -1.2195 167.66
pValue ___________ 0.6654 0.22383 8.5908e-255
model2 = array2table([reg(2).tStats.t reg(2).tStats.pVal], ... RowNames=reg(2).names,VariableNames=["tStat" "pValue"]) model2=4×2 table tStat ________ c d a b1
0.35537 -0.13077 185.44 8.1646
pValue ___________ 0.72262 0.89607 1.0349e-263 1.7553e-14
model3 = array2table([reg(3).tStats.t reg(3).tStats.pVal], ... RowNames=reg(3).names,VariableNames=["tStat" "pValue"]) model3=5×2 table tStat ________ c d a b1 b2
0.52121 0.089107 184.56 6.4983 1.8871
pValue ___________ 0.6027 0.92907 1.7276e-261 4.6217e-10 0.060353
The first model has no added difference terms, the second model has one difference term (b1), and the third model has two difference terms (b1 and b2). These results indicate that the coefficient of the first difference term is significantly different from zero in both the second and third models, but 12-11
12
Functions
the coefficient of the second term in the third model is not at a 0.05 significance level. This result suggests augmenting the model with one lagged difference term is adequate. Compare the BIC for each of the three alternatives. reg.BIC ans = -1.5114e+03 ans = -1.5589e+03 ans = -1.5496e+03
Of the three alternative models, the model augmented with one lagged difference term is the best because it yields the lowest BIC.
Input Arguments y — Univariate time series data numeric vector Univariate time series data, specified as a numeric vector. Each element of y represents an observation. Data Types: double Tbl — Time series data table | timetable Time series data, specified as a table or timetable. Each row of Tbl is an observation. Specify a single series (variable) to test by using the DataVariable argument. The selected variable must be numeric. Note adftest removes missing observations, represented by NaN values, from the input series. Name-Value Arguments Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter. Before R2021a, use commas to separate each name and value, and enclose Name in quotes. Example: adftest(Tbl,DataVariable="GDP",Alpha=0.025,Lags=[0 1]) conducts two separate tests at a level of significance of 0.025 on the variable GGDP of the input table Tbl. The first test includes 0 lagged difference terms in the AR model, and the second test includes 1 lagged difference term in the AR model. Lags — Number of lagged difference terms 0 (default) | nonnegative integer | vector of nonnegative integers Number p of lagged difference terms to include in the AR model, specified as a nonnegative integer or vector of nonnegative integers. 12-12
adftest
adftest conducts a separate test for each element in Lags. Example: Lags=[0 1] includes no lags in the AR model for the first test, and then includes Δyt – 1 in the AR model for the second test. Data Types: double Model — Model variant "AR" (default) | "ARD" | "TS" | character vector | string vector | cell vector of character vectors Model variant, specified as a model variant name, or a string vector or cell vector of model names. This table contains the supported model variant names. Model Variant Name
Description
"AR"
Autoregressive model variant, which specifies a test of the null model yt = yt − 1 + β1 Δyt − 1 + β2 Δyt − 2 + … + βpΔyt − p + εt against the alternative model yt = ϕyt − 1 + β1 Δyt − 1 + β2 Δyt − 2 + … + βp Δyt − p + εt, with AR(1) coefficient ϕ < 1.
"ARD"
Autoregressive model with drift variant, which specifies a test of the null model yt = yt − 1 + β1 Δyt − 1 + β2 Δyt − 2 + … + βpΔyt − p + εt against the alternative model yt = c + ϕyt − 1 + β1 Δyt − 1 + β2 Δyt − 2 + … + βpΔyt − p + εt, with drift coefficient c and AR(1) coefficient ϕ < 1.
"TS"
Trend-stationary model variant, which specifies a test of the null model yt = c + yt − 1 + β1 Δyt − 1 + β2 Δyt − 2 + … + βp Δyt − p + εt against the alternative model yt = c + δt + ϕyt − 1 + β1 Δyt − 1 + β2 Δyt − 2 + … + βp Δyt − p + εt, with drift coefficient c, deterministic trend coefficient δ, and AR(1) coefficient ϕ < 1. adftest conducts a separate test for each model variant name in Model. Example: Model=["AR" "ARD"] uses the stationary AR model as the alternative hypothesis for the first test, and then uses the stationary AR model with drift as the alternative hypothesis for the second test. Data Types: char | cell | string Test — Test statistic "t1" (default) | "t2" | "F" | character vector | string vector | cell vector of character vectors
12-13
12
Functions
Test statistic, specified as a test name, or a string vector or cell vector of test names. This table contains the supported test names. Test Name
Description
"t1"
Standard t statistic t1 =
ϕ −1 , SE ϕ
computed using the OLS estimate of the AR(1) coefficient ϕ and its standard error SE(ϕ ), in the alternative model. The test assesses the significance of the restriction, ϕ − 1 = 0. "t2"
Lag-adjusted, unstudentized t statistic t2 =
T(ϕ − 1) , 1−β1−…−βp
computed using the OLS estimates of the AR(1) coefficient and stationary coefficients in the alternative model. T is the effective sample size, adjusted for lags and missing values. The test assesses the significance of the restriction, ϕ − 1 = 0. "F"
F statistic for assessing the significance of a joint restriction on the alternative model. • For model variant (Model argument) "ARD", the restrictions areϕ − 1 = 0 and c = 0. • For model variant "TS", the restrictions areϕ − 1 = 0 and δ = 0. An F statistic is invalid for model variant "AR". adftest conducts a separate test for each test name in Test. Example: Test="F" computes the F test statistic for all tests. Data Types: char | cell | string Alpha — Nominal significance level 0.05 (default) | numeric scalar | numeric vector Nominal significance level for the hypothesis test, specified as a numeric scalar between 0.001 and 0.999 or a numeric vector of such values. adftest conducts a separate test for each value in Alpha. Example: Alpha=[0.01 0.05] uses a level of significance of 0.01 for the first test, and then uses a level of significance of 0.05 for the second test. Data Types: double DataVariable — Variable in Tbl to test last variable (default) | string scalar | character vector | integer | logical vector Variable in Tbl to test, specified as a string scalar or character vector containing a variable name in Tbl.Properties.VariableNames, or an integer or logical vector representing the index of a name. The selected variable must be numeric.
12-14
adftest
Example: DataVariable="GDP" Example: DataVariable=[false true false false] or DataVariable=2 tests the second table variable. Data Types: double | logical | char | string Note • When adftest conducts multiple tests, the function applies all single settings (scalars or character vectors) to each test. • All vector-valued specifications that control the number of tests must have equal length. • If you specify the vector y and any value is a row vector, all outputs are row vectors. • A lagged and differenced time series has a reduced sample size. Absent presample values, if the test series yt is defined for t = 1,…,T, the lagged series yt– k is defined for t = k+1,…,T. The first difference applied to the lagged series yt– k further reduces the time base to k+2,…,T. With p lagged differences, the common time base is p+2,…,T and the effective sample size is T–(p+1).
Output Arguments h — Test rejection decisions logical scalar | logical vector Test rejection decisions, returned as a logical scalar or vector with length equal to the number of tests. adftest returns h when you supply the input y. • Values of 1 indicate rejection of the unit-root null model in favor of the alternative model. • Values of 0 indicate failure to reject the unit-root null model. pValue — Test statistic p-values numeric scalar | numeric vector Test statistic p-values, returned as a numeric scalar or vector with length equal to the number of tests. adftest returns pValue when you supply the input y. • The p-value of test statistic (Test) "t1" or "t2" is a left-tail probability. • The p-value of test statistic "F" is a right-tail probability. When test statistics are outside tabulated critical values, adftest returns maximum (0.999) or minimum (0.001) p-values. stat — Test statistics numeric scalar | numeric vector Test statistics, returned as a numeric scalar or vector with length equal to the number of tests. adftest returns stat when you supply the input y. adftest computes test statistics using ordinary least squares (OLS) estimates of the coefficients in the alternative model. cValue — Critical values numeric scalar | numeric vector 12-15
12
Functions
Critical values, returned as a numeric scalar or vector with length equal to the number of tests. adftest returns cValue when you supply the input y. • The critical value of test statistic (Test) "t1" or "t2" is for a left-tail probability. • The critical value of test statistic "F" is for a right-tail probability. StatTbl — Test summary table Test summary, returned as a table with variables for the outputs h, pValue, stat, and cValue, and with a row for each test. adftest returns StatTbl when you supply the input Tbl. StatTbl contains variables for the test settings specified by Lags, Alpha, Model, and Test. reg — Regression statistics structure array Regression statistics from the OLS estimation of coefficients in the alternative model, returned as a structure array with number of records equal to the number of tests. Each element of reg has the fields in this table. You can access a field using dot notation, for example, reg(1).coeff contains the coefficient estimates of the first test. Field
Description
num
Length of input series with NaNs removed
size
Effective sample size T, num adjusted for lags
names
Regression coefficient names
coeff
Estimated coefficient values
se
Estimated coefficient standard errors
Cov
Estimated coefficient covariance matrix
tStats
t statistics of coefficients and p-values
FStat
F statistic and p-value
yMu
Mean of the lag-adjusted input series
ySigma
Standard deviation of the lag-adjusted input series
yHat
Fitted values of the lag-adjusted input series
res
Regression residuals
DWStat
Durbin-Watson statistic
SSR
Regression sum of squares
SSE
Error sum of squares
SST
Total sum of squares
MSE
Mean square error
RMSE
Standard error of the regression
RSq
R2 statistic
aRSq
Adjusted R2 statistic
LL
Loglikelihood of data under Gaussian innovations
12-16
adftest
Field
Description
AIC
Akaike information criterion
BIC
Bayesian (Schwarz) information criterion
HQC
Hannan-Quinn information criterion
More About Augmented Dickey-Fuller Test for Unit Root The augmented Dickey-Fuller test for a unit root assesses the null hypothesis of a unit root in the time series yt, where yt = c + δt + ϕyt − 1 + β1 Δyt − 1 + … + βp Δyt − p + εt, and • Δ is the differencing operator such that Δyt = yt − yt − 1 . • p is the number of lagged difference terms (see Lags). • c is the drift coefficient (see Model). • δ is the deterministic trend coefficient (see Model). • εt is a mean zero innovation process. The null hypothesis of a unit root is H0 : ϕ = 1. Under the alternative hypothesis, ϕ < 1. Variants of the model allow for different growth characteristics (see Model). The model with δ = 0 has no trend component, and the model with c = 0 and δ = 0 has no drift or trend. A test that fails to reject the null hypothesis, fails to reject the possibility of a unit root.
Tips • To draw valid inferences from the test, determine a suitable value for Lags. One method is to begin with a maximum lag, such as the one recommended in [7], and then test down by assessing the significance of β p, the coefficient of the largest lagged change in yt. The usual t statistic is appropriate, as returned in the reg output structure. Another method is to combine a measure of fit, such as the SSR, with information criteria, such as AIC, BIC, and HQC. These statistics are also returned in the reg output structure. For more details, see [6]. • With a specific testing strategy in mind, determine the value of Model by the growth characteristics of yt. If you include too many regressors (see Lags), the test loses power; if you include too few regressors, the test is biased towards favoring the null model [4]. In general, if a series grows, the "TS" model (see Model) provides a reasonable trend-stationary alternative to a unit-root process with drift. If a series is does not grow, the "AR" and "ARD" models provide 12-17
12
Functions
reasonable stationary alternatives to a unit-root process without drift. The "ARD" alternative model has a mean of c/(1 – a); the "AR" alternative model has mean 0.
Algorithms Dickey-Fuller statistics follow nonstandard distributions under the null hypothesis (even asymptotically). adftest uses tabulated critical values, generated by Monte Carlo simulations, for a range of sample sizes and significance levels of the null model with Gaussian innovations and five million replications per sample size. adftest interpolates critical values cValue and p-values pValue from the tables. Tables for tests of Test types "t1" and "t2" are identical to those for pptest. For small samples, tabulated values are valid only for Gaussian innovations. For large samples, values are also valid for non-Gaussian innovations.
Version History Introduced in R2009b
References [1] Davidson, R., and J. G. MacKinnon. Econometric Theory and Methods. Oxford, UK: Oxford University Press, 2004. [2] Dickey, D. A., and W. A. Fuller. "Distribution of the Estimators for Autoregressive Time Series with a Unit Root." Journal of the American Statistical Association. Vol. 74, 1979, pp. 427–431. [3] Dickey, D. A., and W. A. Fuller. "Likelihood Ratio Statistics for Autoregressive Time Series with a Unit Root." Econometrica. Vol. 49, 1981, pp. 1057–1072. [4] Elder, J., and P. E. Kennedy. "Testing for Unit Roots: What Should Students Be Taught?" Journal of Economic Education. Vol. 32, 2001, pp. 137–146. [5] Hamilton, James D. Time Series Analysis. Princeton, NJ: Princeton University Press, 1994. [6] Ng, S., and P. Perron. "Unit Root Tests in ARMA Models with Data-Dependent Methods for the Selection of the Truncation Lag." Journal of the American Statistical Association. Vol. 90, 1995, pp. 268–281. [7] Schwert, W. "Tests for Unit Roots: A Monte Carlo Investigation." Journal of Business and Economic Statistics. Vol. 7, 1989, pp. 147–159.
See Also kpsstest | lmctest | pptest | vratiotest | i10test Topics “Unit Root Tests” on page 3-40 “Unit Root Nonstationarity” on page 3-32
12-18
aicbic
aicbic Information criteria
Syntax aic = aicbic(logL,numParam) [aic,bic] = aicbic(logL,numParam,numObs) [aic,bic] = aicbic(logL,numParam,numObs,Normalize=true) [aic,bic,ic] = aicbic(logL,numParam,numObs) [aic,bic,ic] = aicbic(logL,numParam,numObs,Normalize=true)
Description To assess model adequacy, aicbic computes information criteria on page 12-24 given loglikelihood values obtained by fitting competing models to data. aic = aicbic(logL,numParam) returns the Akaike information criteria (AIC) given loglikelihood values logL derived from fitting different models to data, and given the corresponding number of estimated model parameters numParam. [aic,bic] = aicbic(logL,numParam,numObs) also returns the Bayesian (Schwarz) information criteria (BIC) given corresponding sample sizes used in estimation numObs. [aic,bic] = aicbic(logL,numParam,numObs,Normalize=true) normalizes results by dividing all output arguments by the sample sizes numObs. By default, aicbic does not normalize results (Normalize=false). [aic,bic,ic] = aicbic(logL,numParam,numObs) also returns the structure ic containing the AIC, BIC, and other information criteria on page 12-24. [aic,bic,ic] = aicbic(logL,numParam,numObs,Normalize=true) normalizes all returned information criteria by the sample sizes numObs.
Examples Compare Models Using AIC and BIC Compare the in-sample fits of three competing models using the AIC and BIC. Their loglikelihood values logL and corresponding number of estimated parameters numParam are in the following table. Suppose the effective sample size is 1500. logL = [-681.4724; -663.4615; -632.3158]; numParam = [12; 18; 27]; numObs = 1500; Tbl = table(logL,numParam,RowNames="Model"+string(1:3)) Tbl=3×2 table logL _______
numParam ________
12-19
12
Functions
Model1 Model2 Model3
-681.47 -663.46 -632.32
12 18 27
Compute AIC Calculate the AIC of each estimated model. aic = aicbic(logL,numParam) aic = 3×1 103 × 1.3869 1.3629 1.3186
The model with the lowest AIC has the best in-sample fit. Identify the model with the lowest AIC. [~,idxmin] = min(aic); bestFitAIC = Tbl.Properties.RowNames{idxmin} bestFitAIC = 'Model3'
The AIC suggests that Model3 has the best, most parsimonious fit, despite being the most complex of the three models. Compute BIC Calculate the BIC of each estimated model. Specify the sample size numObs, which is required for computing the BIC. [~,bic] = aicbic(logL,numParam,numObs) bic = 3×1 103 × 1.4507 1.4586 1.4621
As is the case with the AIC, the model with the lowest BIC has the best in-sample fit. Identify the model with the lowest BIC. [~,idxmin] = min(bic); bestFitBIC = Tbl.Properties.RowNames{idxmin} bestFitBIC = 'Model1'
The BIC suggests Model1, the simplest of the three models. The results show that when the sample size is large, the BIC imposes a greater penalty on complex models than the AIC.
12-20
aicbic
Compute All Information Criteria Fit several models to simulated data, and then compare the model fits using all available information criteria. Simulate a random path of length 100 from the data generating process (DGP) yt = 1 + 0 . 2yt − 1 − 0 . 4yt − 2 + εt, where εt is a random Gaussian series with mean 0 and variance 1. rng(1) % For reproducibility T = 100; DGP = arima(Constant=1,AR=[0.2 -0.4],Variance=1); y = simulate(DGP,T);
Assume that the DGP is unknown, and that the AR(1), AR(2), and AR(3) models are appropriate for describing the DGP. For each competing model, create an arima model template for estimation. Mdl(1) = arima(1,0,0); Mdl(2) = arima(2,0,0); Mdl(3) = arima(3,0,0);
Fit each model to the simulated data y, compute the loglikelihood, and suppress the estimation display. numMdl = numel(Mdl); logL = zeros(numMdl,1); numParam = zeros(numMdl,1);
% Preallocate
for j = 1:numMdl [EstMdl,~,logL(j)] = estimate(Mdl(j),y,Display="off"); results = summarize(EstMdl); numParam(j) = results.NumEstimatedParameters; end
For each model, compute all available information criteria. [~,~,ic] = aicbic(logL,numParam,T) ic = struct with fields: aic: [310.9968 285.5082 bic: [318.8123 295.9289 aicc: [311.2468 285.9292 caic: [321.8123 299.9289 hqc: [314.1599 289.7256
287.0309] 300.0567] 287.6692] 305.0567] 292.3027]
ic is a 1-D structure array with a field for each information criterion. Each field contains a vector of measurements; element j corresponds to the model yielding loglikelihood logL(j). For each criterion, determine the model that yields the minimum value. [~,minIdx] = structfun(@min,ic); [Mdl(minIdx).Description]'
12-21
12
Functions
ans = 5x1 string "ARIMA(2,0,0) "ARIMA(2,0,0) "ARIMA(2,0,0) "ARIMA(2,0,0) "ARIMA(2,0,0)
Model Model Model Model Model
(Gaussian (Gaussian (Gaussian (Gaussian (Gaussian
Distribution)" Distribution)" Distribution)" Distribution)" Distribution)"
The minimum of each criterion corresponds to the AR(2) model, which has the structure of the DGP.
Normalize Information Criteria Fit several models to simulated data, specify a presample for estimation, and then compare the model fits using normalized AIC. Simulate a random path of length 50 from the DGP yt = 1 + 0 . 2yt − 1 − 0 . 4yt − 2 + εt, where εt is a random Gaussian series with mean 0 and variance 1. rng(1) % For reproducibility T = 50; DGP = arima(Constant=1,AR=[0.2 -0.4],Variance=1); y = simulate(DGP,T);
Create an arima model template for each competing model. Mdl(1) = arima(1,0,0); Mdl(2) = arima(2,0,0); Mdl(3) = arima(3,0,0);
Fit each model to the simulated data y, and specify the required number of presample observations for each fit. Compute the loglikelihood, and suppress the estimation display. numMdl = numel(Mdl); logL = zeros(numMdl,1); numParam = zeros(numMdl,1); numObs = zeros(numMdl,1);
% Preallocate
for j = 1:numMdl y0 = y(1:Mdl(j).P); % Presample yest = y((Mdl(j).P+1):end); % Estimation sample [EstMdl,~,logL(j)] = estimate(Mdl(j),yest,Y0=y0, ... Display="off"); results = summarize(EstMdl); numParam(j) = results.NumEstimatedParameters; numObs(j) = results.SampleSize; end
For each model, compute the normalized AIC. aic = aicbic(logL,numParam,numObs,Normalize=true)
12-22
aicbic
aic = 3×1 3.2972 2.9880 3.0361
Determine the model that yields the minimum AIC. [~,minIdx] = min(aic); Mdl(minIdx).Description ans = "ARIMA(2,0,0) Model (Gaussian Distribution)"
Input Arguments logL — Loglikelihoods numeric vector Loglikelihoods associated with parameter estimates of different models, specified as a numeric vector. Data Types: double numParam — Number of estimated parameters positive integer | vector of positive integers Number of estimated parameters in the models, specified as a positive integer applied to all elements of logL, or a vector of positive integers with the same length as logL. Data Types: double numObs — Sample sizes positive integer | vector of positive integers Sample sizes used in estimation, specified as a positive integer applied to all elements of logL, or a vector of positive integers with the same length as logL. aicbic requires numObs for all criteria except the AIC. aicbic also requires numObs if 'Normalize' is true. Data Types: double
Output Arguments aic — AIC numeric vector AIC corresponding to elements of logL, returned as a numeric vector. bic — BIC numeric vector BIC corresponding to elements of logL, returned as a numeric vector. 12-23
12
Functions
ic — Information criteria structure array Information criteria, returned as a 1-D structure array containing the fields described in this table. Field values are numeric vectors with elements corresponding to elements of logL. Field
Description
aic
AIC
bic
BIC
aicc
Corrected AIC (AICc)
caic
Consistent AIC (CAIC)
hqc
Hannan-Quinn criteria (HQC)
ic.aic and ic.bic are the same values returned in aic and bic, respectively.
More About Information Criteria Information criteria rank models using measures that balance goodness of fit with parameter parsimony. For a particular criterion, models with lower values are preferred. This table describes how aicbic computes unnormalized criteria. Information Criterion
Formula
AIC
aic = -2*logL + 2*numParam
BIC
bic = -2*logL + log(numObs)*numParam
AICc
aicc = aic + [2*numParam*(numParam + 1)]/(numObs – numParam – 1)
CAIC
caic = -2*logL + (log(numObs) + 1)*numParam
HQC
hqc = -2*logL + 2*log(log(numObs))*numParam
Misspecification tests, such as the Lagrange multiplier (lmtest), likelihood ratio (lratiotest), and Wald (waldtest) tests, compare the loglikelihoods of two competing nested models. By contrast, information criteria based on loglikelihoods of individual model fits are approximate measures of information loss with respect to the DGP. Information criteria provide relative rankings of any number of competing models, including nonnested models.
Tips • In small samples, AIC tends to overfit. To address overfitting, AICc adds a size-dependent correction term that increases the penalty on the number of parameters. AICc approaches AIC asymptotically. The analysis in [3] suggests using AICc when numObs/numParam < 40. • When econometricians compare models with different numbers of autoregressive lags or different orders of differencing, they often scale information criteria by the number of observations [5]. To scale information criteria, set numObs to the effective sample size of each estimate, and set 'Normalize' to true. 12-24
aicbic
Version History Introduced before R2006a
References [1] Akaike, Hirotugu. "Information Theory and an Extension of the Maximum Likelihood Principle.” In Selected Papers of Hirotugu Akaike, edited by Emanuel Parzen, Kunio Tanabe, and Genshiro Kitagawa, 199–213. New York: Springer, 1998. https://doi.org/10.1007/978-1-4612-1694-0_15. [2] Akaike, Hirotugu. “A New Look at the Statistical Model Identification.” IEEE Transactions on Automatic Control 19, no. 6 (December 1974): 716–23. https://doi.org/10.1109/ TAC.1974.1100705. [3] Burnham, Kenneth P., and David R. Anderson. Model Selection and Multimodel Inference: A Practical Information-Theoretic Approach. 2nd ed, New York: Springer, 2002. [4] Hannan, Edward J., and Barry G. Quinn. “The Determination of the Order of an Autoregression.” Journal of the Royal Statistical Society: Series B (Methodological) 41, no. 2 (January 1979): 190–95. https://doi.org/10.1111/j.2517-6161.1979.tb01072.x. [5] Lütkepohl, Helmut, and Markus Krätzig, editors. Applied Time Series Econometrics. 1st ed. Cambridge University Press, 2004. https://doi.org/10.1017/CBO9780511606885. [6] Schwarz, Gideon. “Estimating the Dimension of a Model.” The Annals of Statistics 6, no. 2 (March 1978): 461–64. https://doi.org/10.1214/aos/1176344136.
See Also Functions lmtest | lratiotest | waldtest Topics “Information Criteria for Model Selection” on page 3-53 “Time Series Regression V: Predictor Selection” on page 5-212 “Determine Minimal Number of Lags Using Information Criterion” on page 9-28 “Choose ARMA Lags Using BIC” on page 7-135 “Compare Conditional Variance Models Using Information Criteria” on page 8-69 “VAR Model Case Study” on page 9-90
12-25
12
Functions
archtest Engle test for residual heteroscedasticity
Syntax h = archtest(res) [h,pValue,stat,cValue] = archtest(res) StatTbl = archtest(Tbl) [ ___ ] = archtest( ___ ,Name=Value)
Description h = archtest(res) returns the rejection decision from conducting Engle’s ARCH test on page 1236 for residual heteroscedasticity in the univariate residual series res. [h,pValue,stat,cValue] = archtest(res) also returns the p-value pValue, test statistic stat, and critical value cValue of the test. StatTbl = archtest(Tbl) returns the table StatTbl containing variables for the test results, statistics, and settings from conducting Engle's ARCH test for residual heteroscedasticity in the last variable of the input table or timetable Tbl. To select a different variable in Tbl to test, use the DataVariable name-value argument. [ ___ ] = archtest( ___ ,Name=Value) specifies options using one or more name-value arguments in addition to any of the input argument combinations in previous syntaxes. archtest returns the output argument combination for the corresponding input arguments. Some options control the number of tests to conduct. The following conditions apply when archtest conducts multiple tests: • archtest treats each test as separate from all other tests. • If you specify res, all outputs are vectors. • If you specify Tbl, each row of StatTbl contains the results of the corresponding test. For example, archtest(Tbl,DataVariable="ResidualGDP",Alpha=0.025,Lags=[1 4]) conducts two tests, at a level of significance of 0.025, for the presence of heteroscedasticity in the variable ResidualGDP of the table Tbl. The first test includes 1 lag in the AR model of the squared residuals, and the second test includes 4 lags.
Examples Conduct Engle's ARCH Test on Vector of Data Test a time series for ARCH effects using default options of archtest. Input the time series data as a numeric vector. Load the Deutschmark/British pound foreign-exchange rate data set. load Data_MarkPound
12-26
archtest
Data is a time series vector of daily Deutschmark/British pound bilateral spot exchange rates. Plot the series. plot(Data) title("\bf Deutschmark/British Pound Bilateral Spot Exchange Rate") ylabel("Spot Exchange Rate") xlabel("Business Days Since January 2, 1984")
The series appears nonstationary. To stabilize the series, convert the spot exchange rates to returns. returns = price2ret(Data); plot(returns) title("\bf Deutschmark/British Pound Bilateral Spot Exchange Rate") ylabel("Return") xlabel("Business Days Since January 3, 1984")
12-27
12
Functions
Compute the deviations of the return series from the mean. residuals = returns - mean(returns);
At 0.05 level of significance, test the residual series of the returns for lag 1 ARCH effects. h = archtest(residuals) h = logical 1
The result h = 1 indicates rejection of the null hypothesis of no conditional heteroscedasticity in favor of a significant lag 1 ARCH effect in the return series.
Return Test p-Value and Decision Statistics Load the Deutschmark/British pound foreign-exchange rate data set. load Data_MarkPound
Preprocess the data by following this procedure: 1
12-28
Stabilize the series by computing daily returns.
archtest
2
Compute the deviations from the mean return.
returns = price2ret(Data); residuals = returns - mean(returns);
Test the residual series for a significant lag 1 ARCH effect. Return the test decision, p-value, test statistic, and critical value. [h,pValue,stat,cValue] = archtest(residuals) h = logical 1 pValue = 0 stat = 96.2379 cValue = 3.8415
Conduct Engle's ARCH Test on Table Variable Test a time series, which is one variable in a table, for ARCH effects using default options of archtest. Load the equity index data set Data_EquityIdx. Preprocess the daily NASDAQ closing prices by performing the following actions: 1
Convert the price series to a percentage return series by using price2ret.
2
Represent the series as residuals that fluctuate around a constant level by centering the returns series.
Store the residual series in the table with the rest of the data. Because the price-to-return conversion reduces the sample size from the head of the series, impute the missing residual with the first residual. load Data_EquityIdx ret = 100*price2ret(DataTable.NASDAQ); res = ret - mean(ret); DataTable.Residuals_NASDAQ = [res(1); res]; DataTable.Properties.VariableNames{end} ans = 'Residuals_NASDAQ'
The residual series is the last variable in the table. Conduct the Engle's ARCH test on the residuals series at a 5% significance level by supplying the entire data set archtest. StatTbl = archtest(DataTable) StatTbl=1×6 table h _____
pValue ______
stat _____
cValue ______
Lags ____
Alpha _____
12-29
12
Functions
Test 1
true
0
208.1
3.8415
1
0.05
archtest returns test results and settings in the table StatTbl, where variables correspond to test results (h, pValue, stat, and cValue) and settings (Lags and Alpha), and rows correspond to individual tests (in this case, archtest conducts one test). h = 1 and pValue = 0 rejects the null hypothesis and suggests that the evidence for ARCH(1) conditional heteroscedasticity in the NASDAQ returns residual series is strong. By default, archtest tests the last variable in the table. To select a variable from an input table to test, set the DataVariable option.
Conduct Multiple ARCH Tests Conduct several, separate ARCH tests that use different significant levels. Consider the first 1000 days of the daily NYSE closing prices in the equity index data set from “Conduct Engle's ARCH Test on Table Variable” on page 12-29. Test a time series, which is one variable in a table, for ARCH effects using default options of archtest. Load the time series data and consider the first 1000 observations. Preprocess and compute the residuals of the NYSE series from a constant only model. load Data_EquityIdx T = 1000; DataTable = DataTable(1:T,:); ret = 100*price2ret(DataTable.NYSE); res = ret - mean(ret); DataTable.Residuals_NYSE = [res(1); res];
Plot the residuals of the NYSE percent returns series. plot(1:T,DataTable.Residuals_NYSE) title("Residuals of Constant NYSE Returns Model")
12-30
archtest
The first half of the series appears to have a larger variance than the latter half, which can indicate the presence of volatility clustering. Conduct the Engle's ARCH test on the residuals series at a 10%, 5%, 1%, and 0.1% significance levels. Specify the table variable name of the residuals. StatTbl = archtest(DataTable,Alpha=[0.1 0.05 0.01 0.001],DataVariable="Residuals_NYSE") StatTbl=4×6 table h _____ Test Test Test Test
1 2 3 4
true true true false
pValue _________
stat ______
cValue ______
0.0058387 0.0058387 0.0058387 0.0058387
7.5994 7.5994 7.5994 7.5994
2.7055 3.8415 6.6349 10.828
Lags ____ 1 1 1 1
Alpha _____ 0.1 0.05 0.01 0.001
The output table StatTbl contains a row for each test. The test rejects the null hypothesis for each significance level except 0.1% (pValue is the lowest significance level you can use to reject the null hypothesis).
12-31
12
Functions
Determine and Specify Lags for Test Statistic To draw valid inferences from Engle's ARCH test, determine a suitable number of lags for the model by following this procedure: 1
Fit the model over a range of plausible lags.
2
Compare the information criteria of the fitted models.
3
Choose the number of lags that yields the best fitting model for the ARCH test.
Load and Process Data Load the equity index data set Data_EquityIdx. Convert the table of data DataTable to a timetable. load Data_EquityIdx dates = datetime(dates,"ConvertFrom",'datenum'); TT = table2timetable(DataTable,RowTimes=dates); TT.Dates = [];
TT is a timetable containing the same data variable as DataTable, but observations (rows) are associated with the closing times in dates. Preprocess the daily NASDAQ closing prices by performing the following actions: 1
Convert the price series to a return series by using price2ret.
2
The sampling rate has a relatively high frequency. Therefore, the daily changes can be small. For numerical stability, scale the data by 100.
Store the percent returns series in the table with the rest of the data. Because the price-to-return conversion reduces the sample size from the head of the series, prepend the series with the first percent return. ret = 100*price2ret(TT.NASDAQ); TT.Returns_NASDAQ = [ret(1); ret]; TT.Properties.VariableNames{end} ans = 'Returns_NASDAQ'
Plot the percent returns series. figure plot(TT.Time,TT.Returns_NASDAQ) title("NASDAQ Daily Returns (%)")
12-32
archtest
The series appears to fluctuate at a constant level. The last quarter of the residual series seems to have higher variance than the first three quarters. This volatile behavior indicates conditional heteroscedasticity. Fit ARCH Models Over Grid of Lags Fit an ARCH(k) model to the NASDAQ percent returns for each k = 0, . . . , 4. Store the loglikelihood of each fit. numLags = 4; logL = zeros(numLags,1); % Preallocation for k = 1:numLags Mdl = garch(0,k); [~,~,logL(k)] = estimate(Mdl,TT.Returns_NASDAQ,Display="off"); end
Determine Suitable Number of Lags for Test Determine the best fitting model by computing and comparing each AIC. Choose the number of lags for the test that corresponds to the best fitting model. aic = aicbic(logL,1:numLags); [~,lags] = min(aic) lags = 4
The best fitting model, according to AIC, has four ARCH lags. 12-33
12
Functions
Conduct ARCH Test Represent the NASDAQ percent returns as residuals that fluctuate around a constant level by centering the returns. Store the returns in the timetable. TT.Residuals_NASDAQ = TT.Returns_NASDAQ - mean(TT.Returns_NASDAQ);
Conduct Engle's ARCH test at a 1% significance level on the residual series Residuals_NASDAQ. Specify four lags for the test statistic. StatsTbl = archtest(TT,DataVariable="Residuals_NASDAQ",Lags=lags,Alpha=0.01) StatsTbl=1×6 table h _____ Test 1
true
pValue ______ 0
stat ______
cValue ______
460.82
13.277
Lags ____ 4
Alpha _____ 0.01
h = 1 and pValue = 0 rejects the null hypothesis and suggests that the evidence for ARCH(4) conditional heteroscedasticity in the NASDAQ percent returns residual series is strong.
Input Arguments res — Residual series vector Residual series, specified as a numeric vector. Each element of res corresponds to an observation. Typically, res contains the (standardized) residuals from a model fit to observed time series. Data Types: double Tbl — Time series data table | timetable Time series data, specified as a table or timetable. Each row of Tbl is an observation. Specify a single residual series (variable) to test by using the DataVariable argument. The selected variable must be numeric. Note archtest does not support residual series with missing (NaN-valued) observations. Name-Value Arguments Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter. Before R2021a, use commas to separate each name and value, and enclose Name in quotes. Example: Alpha=0.025,Lags=[1 2] conducts two separate tests at a level of significance of 0.025. The first test includes 1 lag in the AR model of the squared residuals, and the second test includes 2 lags. 12-34
archtest
Lags — Number of lags 1 (default) | positive integer | vector of positive integers Number of lags L to include in the AR model for computing the test statistic, specified as a positive integer less than length(res) – 2 or a vector of such positive integers. archtest conducts a separate test for each element in Lags. Example: Lags=[1 4] conducts two tests. The first test includes only the first lag in the AR model of the squared residuals, and the second test includes the first through fourth lags. Data Types: double Alpha — Nominal significance level 0.05 (default) | numeric scalar | numeric vector Nominal significance level for the hypothesis test, specified as a numeric scalar in the interval (0,1) or a numeric vector of such values. archtest conducts a separate test for each value in Alpha. Example: Alpha=[0.01 0.05] uses a level of significance of 0.01 for the first test, and then uses a level of significance of 0.05 for the second test. Data Types: double DataVariable — Variable in Tbl to test last variable (default) | string scalar | character vector | integer | logical vector Variable in Tbl to test, specified a string scalar or character vector containing a variable name in Tbl.Properties.VariableNames, or an integer or logical vector representing the index of a name. Example: DataVariable="ResidualGDP" Example: DataVariable=[false true false false] or DataVariable=2 tests the second table variable. Data Types: double | logical | char | string Note • When archtest conducts multiple tests, the function applies all single settings (scalars or character vectors) to each test. • All vector-valued specifications that control the number of tests must have equal length. • If you specify the vector res and any value is a row vector, all outputs are row vectors.
Output Arguments h — Test rejection decisions logical scalar | logical vector Test rejection decisions, returned as a logical scalar or vector with length equal to the number of tests. archtest returns h when you supply the input res. 12-35
12
Functions
• Values of 1 indicate rejection of the no ARCH effects null hypothesis in favor of the alternative. • Values of 0 indicate failure to reject the no ARCH effects null hypothesis. pValue — Test statistic p-values numeric scalar | numeric vector Test statistic p-values, returned as a numeric scalar or vector with length equal to the number of tests. archtest returns pValue when you supply the input res. stat — Test statistics numeric scalar | numeric vector Test statistics, returned as a numeric scalar or vector with length equal to the number of tests. archtest returns stat when you supply the input res. cValue — Test critical values numeric scalar | numeric vector Test critical values, determined by Alpha, returned as a numeric scalar or vector with length equal to the number of tests. archtest returns cValue when you supply the input res. StatTbl — Test summary table Test summary, returned as a table with variables for the outputs h, pValue, stat, and cValue, and with a row for each test. archtest returns StatTbl when you supply the input Tbl. StatTbl contains variables for the test settings specified by Lags and Alpha.
More About Engle’s ARCH Test Engle’s ARCH test assesses the null hypothesis that a series of residuals (rt) exhibits no conditional heteroscedasticity (ARCH effects), against the alternative that an ARCH(L) model describes the series. The ARCH(L) model has the following form: rt2 = α0 + α1rt2− 1 + …+αLrt2− L + et, where there is at least one αj ≠ 0, j = 0,…,L. The test statistic is the Lagrange multiplier statistic TR2, where: • T is the sample size. • R2 is the coefficient of determination from fitting the ARCH(L) model for a number of lags (L) via regression. Under the null hypothesis, the asymptotic distribution of the test statistic is chi-square with L degrees of freedom. 12-36
archtest
Tips • To draw valid inferences from the test, determine a suitable number of lags by following this procedure: 1
Fit a sequence of ARCH(L) models by using arima, garch, egarch, or gjr models and its corresponding estimate function. Restrict each model by specifying progressively smaller ARCH lags (i.e., ARCH effects corresponding to increasingly smaller lag polynomial terms).
2
Obtain loglikelihoods from the estimated models.
3
Evaluate the significance of each restriction by using lratiotest. Alternatively, compute information criteria using aicbic and combine them with measures of fit.
• Residuals in an ARCH process are dependent, but not correlated. Therefore, archtest tests for heteroscedasticity without autocorrelation. To test for residual autocorrelation, use lbqtest. • GARCH(P,Q) processes are locally equivalent to ARCH(P + Q) processes. If archtest(res,Lags=L) shows evidence of conditional heteroscedasticity in residuals from a mean model, consider using a GARCH(P,Q) model with P + Q = L.
Version History Introduced before R2006a
References [1] Box, George E. P., Gwilym M. Jenkins, and Gregory C. Reinsel. Time Series Analysis: Forecasting and Control. 3rd ed. Englewood Cliffs, NJ: Prentice Hall, 1994. [2] Engle, Robert. F. “Autoregressive Conditional Heteroscedasticity with Estimates of the Variance of United Kingdom Inflation.” Econometrica 50 (July 1982): 987–1007. https://doi.org/ 10.2307/1912773.
See Also Objects arima | garch | egarch | gjr Functions lbqtest | lratiotest | aicbic Topics “Time Series Regression VI: Residual Diagnostics” on page 5-223 “Detect ARCH Effects” on page 3-27 “Engle’s ARCH Test” on page 3-25
12-37
12
Functions
arima Create univariate autoregressive integrated moving average (ARIMA) model
Description The arima function returns an arima object specifying the functional form and storing the parameter values of an ARIMA(p,D,q) linear time series model on page 12-60 for a univariate response process yt. arima enables you to create variations of the ARIMA model, including: • An autoregressive (AR(p)), moving average (MA(q)), or ARMA(p,q) model. • A model containing multiplicative seasonal components (SARIMA(p,D,q)⨉(ps,Ds,qs)s). • A model containing a linear regression component for exogenous covariates (ARIMAX). • A composite conditional mean and conditional variance model. For example, you can create an ARMA conditional mean model containing a GARCH conditional variance model (garch). The key components of an arima object are the polynomial degrees (for example, the AR polynomial degree p and the degree of integration D) because they completely specify the model structure. Given polynomial degrees, all other parameters, such as coefficients and innovation-distribution parameters, are unknown and estimable unless you specify their values. To estimate a model containing unknown parameter values, pass the model and data to estimate. To work with an estimated or fully specified arima object, pass it to an object function on page 12-46. Alternatively, you can: • Create and work with arima model objects interactively by using Econometric Modeler. • Model serial correlation in a disturbance series of a regression model by creating a regression model with ARIMA errors. For more details, see regARIMA and “Alternative ARIMA Model Representations” on page 5-113.
Creation Syntax Mdl = arima Mdl = arima(p,D,q) Mdl = arima(Name,Value) Description Mdl = arima creates an ARIMA(0,0,0) model containing only an unknown constant and a series of iid Gaussian innovations with mean 0 and an unknown variance. 12-38
arima
Mdl = arima(p,D,q) creates an ARIMA(p,D,q) model containing nonseasonal AR polynomial lags from 1 through p, the degree D nonseasonal integration polynomial, and nonseasonal MA polynomial lags from 1 through q. This shorthand syntax provides an easy way to create a model template in which you specify the degrees of the nonseasonal polynomials explicitly. The model template is suited for unrestricted parameter estimation. After you create a model, you can alter property on page 12-41 values using dot notation. Mdl = arima(Name,Value) sets properties on page 12-41 and polynomial lags using name-value pair arguments. Enclose each name in quotes. For example, 'ARLags',[1 4],'AR',{0.5 –0.1} specifies the values –0.5 and 0.1 for the nonseasonal AR polynomial coefficients at lags 1 and 4, respectively. This longhand syntax allows you to create more flexible models. arima infers all polynomial degrees from the properties that you set. Therefore, property values that correspond to polynomial degrees must be consistent with each other. Input Arguments The shorthand syntax provides an easy way for you to create nonseasonal ARIMA model templates that are suitable for unrestricted parameter estimation. For example, to create an ARMA(2,1) model containing unknown coefficients and innovations variance, enter: Mdl = arima(2,0,1);
To impose equality constraints on parameter values during estimation, or include seasonal components, set the appropriate property on page 12-41 values using dot notation. p — Nonseasonal autoregressive polynomial degree nonnegative integer Nonseasonal autoregressive polynomial degree, specified as a nonnegative integer. Data Types: double D — Degree of nonseasonal integration nonnegative integer Degree of nonseasonal integration (the degree of the nonseasonal differencing polynomial), specified as a nonnegative integer. D sets the property D. Data Types: double q — Nonseasonal moving average polynomial degree nonnegative integer Nonseasonal moving average polynomial degree, specified as a nonnegative integer. Data Types: double Name-Value Pair Arguments
Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter. 12-39
12
Functions
Before R2021a, use commas to separate each name and value, and enclose Name in quotes. The longhand syntax enables you to create seasonal models or models in which some or all coefficients are known. During estimation, estimate imposes equality constraints on any known parameters. Example: 'ARLags',[1 4],'AR',{0.5 –0.1} specifies the nonseasonal AR polynomial 1 − 0.5L1 + 0.1L4. ARLags — Lags associated with nonseasonal AR polynomial coefficients 1:numel(AR) (default) | numeric vector of unique positive integers Lags associated with the nonseasonal AR polynomial coefficients, specified as the comma-separated pair consisting of 'ARLags' and a numeric vector of unique positive integers. The maximum lag is p. AR{j} is the coefficient of lag ARLags(j), where AR is the value of the property AR. Example: ARLags=4 specifies the nonseasonal AR polynomial1 − ϕ4L4. Example: ARLags=1:4 specifies the nonseasonal AR polynomial1 − ϕ1L1 − ϕ2L2 − ϕ3L3 − ϕ4L4. Example: ARLags=[1 4] specifies the nonseasonal AR polynomial 1 − ϕ1L1 − ϕ4L4 . Data Types: double MALags — Lags associated with nonseasonal MA polynomial coefficients 1:numel(MA) (default) | numeric vector of unique positive integers Lags associated with the nonseasonal MA polynomial coefficients, specified as the comma-separated pair consisting of 'MALags' and a numeric vector of unique positive integers. The maximum lag is q. MA{j} is the coefficient of lag MALags(j), where MA is the value of the property MA. Example: MALags=3 specifies the nonseasonal MA polynomial 1 + θ3L3. Example: MALags=1:3 specifies the nonseasonal MA polynomial 1 + θ1L1 + θ2L2 + θ3L3 . Example: MALags=[1 3] specifies the nonseasonal MA polynomial 1 + θ1L1 + θ3L3. Data Types: double SARLags — Lags associated with seasonal AR polynomial coefficients 1:numel(SAR) (default) | numeric vector of unique positive integers Lags associated with the seasonal AR polynomial coefficients, specified as the comma-separated pair consisting of 'SARLags' and a numeric vector of unique positive integers. The maximum lag is ps. SAR{j} is the coefficient of lag SARLags(j), where SAR is the value of the property SAR. Specify SARLags as the periodicity of the observed data, and not as multiples of the Seasonality property. This convention does not conform to standard Box and Jenkins [1] notation, but it is more flexible for incorporating multiplicative seasonality. Example: 'SARLags',[4 8] specifies the seasonal AR polynomial 1 − Φ4L4 − Φ8L8 . Data Types: double 12-40
arima
SMALags — Lags associated with seasonal MA polynomial coefficients 1:numel(SMA) (default) | numeric vector of unique positive integers Lags associated with the seasonal MA polynomial coefficients, specified as the comma-separated pair consisting of 'SMALags' and a numeric vector of unique positive integers. The maximum lag is qs. SMA{j} is the coefficient of lag SMALags(j), where SMA is the value of the property SMA. Specify SMALags as the periodicity of the observed data, and not as multiples of the Seasonality property. This convention does not conform to standard Box and Jenkins [1] notation, but it is more flexible for incorporating multiplicative seasonality. Example: 'SMALags',4 specifies the seasonal MA polynomial 1 + Θ4L4 . Data Types: double Note Polynomial degrees are not estimable. If you do not specify a polynomial degree, or arima cannot infer it from other specifications, arima does not include the polynomial in the model.
Properties You can set writable property values when you create the model object by using name-value argument syntax, or after you create the model object by using dot notation. For example, to create a fully specified ARMA(2,1) model, enter: Mdl = arima('Constant',1,'AR',{0.3 -0.15},'MA',0.2); Mdl.Variance = 1;
Note • NaN-valued properties indicate estimable parameters. Numeric properties indicate equality constraints on parameters during model estimation. Coefficient vectors can contain both numeric and NaN-valued elements. • You can specify polynomial coefficients as vectors in any orientation, but arima stores them as row vectors.
P — Compound AR polynomial degree nonnegative integer This property is read-only. Compound AR polynomial degree, specified as a nonnegative integer. P does not necessarily conform to standard Box and Jenkins notation [1] because P captures the degrees of the nonseasonal and seasonal AR polynomials (properties AR and SAR, respectively), nonseasonal integration (property D), and seasonality (property Seasonality). Explicitly, P = p + D + ps + s. P conforms to Box and Jenkins notation for models without integration or a seasonal AR component. P specifies the number of lagged observations required to initialize the AR components of the model. 12-41
12
Functions
Data Types: double Q — Compound MA polynomial degree nonnegative integer This property is read-only. Compound MA polynomial degree, specified as a nonnegative integer. Q does not necessarily conform to standard Box and Jenkins notation [1] because Q captures the degrees of the nonseasonal and seasonal MA polynomials (properties MA and SMA, respectively). Explicitly, Q = q + qs. Q conforms to Box and Jenkins notation for models without a seasonal MA component. Q specifies the number of lagged innovations required to initialize the MA components of the model. Data Types: double Description — Model description string scalar | character vector Model description, specified as a string scalar or character vector. arima stores the value as a string scalar. The default value describes the parametric form of the model, for example "ARIMAX(1,1,1) Model (Gaussian Distribution)". Example: "Model 1" Data Types: string | char Distribution — Conditional probability distribution of innovation process εt "Gaussian" (default) | "t" | structure array Conditional probability distribution of the innovation process εt, specified as a string or structure array. arima stores the value as a structure array. Distribution
String
Structure Array
Gaussian
"Gaussian"
struct('Name',"Gaussian")
Student’s t
"t"
struct('Name',"t",'DoF',DoF)
The 'DoF' field specifies the t distribution degrees of freedom parameter. • DoF > 2 or DoF = NaN. • DoF is estimable. • If you specify "t", DoF is NaN by default. You can change its value by using dot notation after you create the model. For example, Mdl.Distribution.DoF = 3. • If you supply a structure array to specify the Student's t distribution, then you must specify both the 'Name' and the 'DoF' fields. Example: Distribution=struct('Name',"t",'DoF',10) Constant — Model constant NaN (default) | numeric scalar Model constant, specified as a numeric scalar. 12-42
arima
Example: 1 Data Types: double AR — Nonseasonal AR polynomial coefficients cell vector Nonseasonal AR polynomial coefficients, specified as a cell vector. Cells contain numeric scalars or NaN values. A fully specified nonseasonal AR polynomial must be stable. Coefficient signs correspond to the model expressed in difference-equation notation. For example, for the nonseasonal AR polynomial ϕ L = 1 − 0.5L + 0.1L2, specify 'AR',{0.5 –0.1}. If you do not set the 'ARLags' name-value pair argument, AR{j} is the coefficient of lag j, j = 1, …,p, where p = numel(AR). Otherwise, p = max(ARLags) and the following conditions apply: • The lengths of AR and ARLags must be equal. • AR{j} is the coefficient of lag ARLags(j), for each j. • arima stores AR as a length p cell vector. All cells that do not correspond to lags in ARLags contain 0. The default value of AR depends on other specifications: • If you use the shorthand syntax to specify p > 0, AR is a length p cell vector, where each cell contains a NaN value. • If you specify ARLags, AR is a length p cell vector. AR{j} = NaN for each lag ARLags(j). All other cells contain 0. • Otherwise, AR is an empty cell vector {}, meaning the model does not contain a nonseasonal AR polynomial. The coefficients in AR correspond to coefficients in an underlying LagOp lag operator polynomial, and are subject to a near-zero tolerance exclusion test. If a coefficient is 1e–12 or below, arima excludes that coefficient and its corresponding lag in ARLags from the model. Example: {0.8} Example: {NaN –0.1} Data Types: cell SAR — Seasonal AR polynomial coefficients cell vector Seasonal AR polynomial coefficients, specified as a cell vector. Cells contain numeric scalars or NaN values. A fully specified seasonal AR polynomial must be stable. Coefficient signs correspond to the model expressed in difference-equation notation. For example, for the seasonal AR polynomial Φ L = 1 − 0.5L4 + 0.1L8, specify 'SAR',{0.5 –0.1}. If you do not set the 'SARLags' name-value pair argument, SAR{j} is the coefficient of lag j, j = 1, …,ps, where ps = numel(SAR). Otherwise, ps = max(SARLags) and the following conditions apply: 12-43
12
Functions
• The lengths of SAR and SARLags must be equal. • SAR{j} is the coefficient of lag SARLags(j), for each j. • arima stores SAR as a length ps cell vector. All cells that do not correspond to lags in SARLags contain 0. The default value of SAR depends on the value SARLags: • If you specify SARLags, SAR is a length ps cell vector. SAR{j} = NaN for each lag SARLags(j). All other cells contain 0. • Otherwise, SAR is an empty cell vector {}, meaning the model does not contain a seasonal AR polynomial. The coefficients in SAR correspond to coefficients in an underlying LagOp lag operator polynomial, and are subject to a near-zero tolerance exclusion test. If a coefficient is 1e–12 or below, arima excludes that coefficient and its corresponding lag in SARLags from the model. Example: {0.2 0.1} Example: {NaN 0 0 NaN} Data Types: cell MA — Nonseasonal MA polynomial coefficients cell vector Nonseasonal MA polynomial coefficients, specified as a cell vector. Cells contain numeric scalars or NaN values. A fully specified nonseasonal MA polynomial must be invertible. If you do not set the 'MALags' name-value pair argument, MA{j} is the coefficient of lag j, j = 1, …,q, where q = numel(MA). Otherwise, q = max(MALags) and the following conditions apply: • The lengths of MA and MALags must be equal. • MA{j} is the coefficient of lag MALags(j), for each j. • arima stores MA as a length q cell vector. All cells that do not correspond to lags in MALags contain 0. The default value of MA depends on other specifications: • If you use the shorthand syntax to specify q > 0, MA is a length q cell vector, where each cell contains a NaN value. • If you specify MALags, MA is a length q cell vector. MA{j} = NaN for each lag MALags(j). All other cells contain 0. • Otherwise, MA is an empty cell vector {}, meaning the model does not contain a nonseasonal MA polynomial. The coefficients in SMA correspond to coefficients in an underlying LagOp lag operator polynomial, and are subject to a near-zero tolerance exclusion test. If a coefficient is 1e–12 or below, arima excludes that coefficient and its corresponding lag in SMALags from the model. Example: 0.8 Example: {NaN –0.1} 12-44
arima
Data Types: cell SMA — Seasonal MA polynomial coefficients cell vector Seasonal MA polynomial coefficients, specified as a cell vector. Cells contain numeric scalars or NaN values. A fully specified seasonal MA polynomial must be invertible. If you do not set the 'SMALags' name-value pair argument, SMA{j} is the coefficient of lag j, j = 1, …,qs, where qs = numel(SMA). Otherwise, qs = max(SMALags) and the following conditions apply: • The lengths of SMA and SMALags must be equal. • SMA{j} is the coefficient of lag SMALags(j), for each j. • arima stores SMA as a length qs cell vector. All cells that do not correspond to lags in SMALags contain 0. The default value of SMA depends on other specifications: • If you specify SMALags, SMA is a length qs cell vector. SMA{j} = NaN for each lag SMALags(j). All other cells contain 0. • Otherwise, SMA is an empty cell vector {}, meaning the model does not contain a seasonal MA polynomial. The coefficients in SMA correspond to coefficients in an underlying LagOp lag operator polynomial, and are subject to a near-zero tolerance exclusion test. If a coefficient is 1e–12 or below, arima excludes that coefficient and its corresponding lag in SMALags from the model. Example: {0.2 0.1} Example: {NaN 0 0 NaN} Data Types: cell D — Degree of nonseasonal integration 0 (default) | nonnegative integer Degree of nonseasonal integration, or the degree of the nonseasonal differencing polynomial, specified as a nonnegative integer. Example: 1 Data Types: double Seasonality — Degree of seasonal differencing polynomial 0 (default) | nonnegative integer Degree of the seasonal differencing polynomial s, specified as a nonnegative integer. Example: 12 specifies monthly periodicity. Data Types: double Beta — Regression component coefficients empty row vector (default) | numeric vector Regression component coefficients of the conditional mean, specified as a numeric vector. 12-45
12
Functions
If you plan to estimate all elements of Beta, you do not need to specify it. During estimation, estimate infers the size of Beta from the number of columns of the specified exogenous data X. Example: [0.5 NaN 3] Data Types: double Variance — Model innovations variance NaN (default) | positive scalar | supported conditional variance model object Model innovations variance, specified as a positive scalar or a supported conditional variance model object (for example, garch). For all supported conditional variance models, see “Conditional Variance Models”. A positive scalar or NaN specifies a homoscedastic model. A conditional variance model object specifies a composite conditional mean and variance model. estimate fits all unknown, estimable parameters in the composition. Example: 1 Example: garch(1,0) Data Types: double SeriesName — Response series name string scalar | character vector | "Y" Response series name, specified as a string scalar or character vector. arima stores the value as a string scalar. Example: "StockReturn" Data Types: string | char
Object Functions estimate summarize infer filter impulse simulate forecast
Fit univariate ARIMA or ARIMAX model to data Display univariate ARIMA or ARIMAX model estimation results Infer univariate ARIMA or ARIMAX model residuals or conditional variances Filter disturbances using univariate ARIMA or ARIMAX model Generate univariate ARIMA model impulse response function (IRF) Monte Carlo simulation of univariate ARIMA or ARIMAX models Forecast univariate ARIMA or ARIMAX model responses or conditional variances
Examples Create Default Model Create a default regression model with ARIMA errors by using regARIMA. Mdl = regARIMA Mdl = regARIMA with properties: Description: "ARMA(0,0) Error Model (Gaussian Distribution)" SeriesName: "Y"
12-46
arima
Distribution: Intercept: Beta: P: Q: AR: SAR: MA: SMA: Variance:
Name = "Gaussian" NaN [1×0] 0 0 {} {} {} {} NaN
Mdl is an regARIMA object. Properties of the model appear at the command line. The default model is yt = c + ut ut = εt, where c is an unknown constant and εt is a series of iid Gaussian random variables with mean 0 and variance σ2. Mdl is a model template for estimation. You can modify property values by using dot notation or fit the model to data by using estimate, but you cannot pass Mdl to any other object function.
Create Default Model Create a default ARIMA model by using arima. Mdl = arima Mdl = arima with properties: Description: SeriesName: Distribution: P: D: Q: Constant: AR: SAR: MA: SMA: Seasonality: Beta: Variance:
"ARIMA(0,0,0) Model (Gaussian Distribution)" "Y" Name = "Gaussian" 0 0 0 NaN {} {} {} {} 0 [1×0] NaN
Mdl is an arima object. Properties of the model appear at the command line. The default model is yt = c + εt, 12-47
12
Functions
where c is an unknown constant and εt is a series of iid Gaussian random variables with mean 0 and variance σ2. Mdl is a model template for estimation. You can modify property values by using dot notation or fit the model to data by using estimate, but you cannot pass Mdl to any other object function.
Create Fully Specified Model Create the ARIMA(2,1,1) model represented by this equation: (1 + 0 . 5L2)(1 − L)yt = 3 . 1 + (1 − 0 . 2L)εt, where εt is a series of iid Gaussian random variables. Use the longhand syntax to specify parameter values in the equation written in difference-equation notation: Δyt = 3 . 1 − 0 . 5Δyt − 2 + εt − 0 . 2εt − 1 . Mdl = arima('ARLags',2,'AR',-0.5,'D',1,'MA',-0.2,... 'Constant',3.1) Mdl = arima with properties: Description: SeriesName: Distribution: P: D: Q: Constant: AR: SAR: MA: SMA: Seasonality: Beta: Variance:
"ARIMA(2,1,1) Model (Gaussian Distribution)" "Y" Name = "Gaussian" 3 1 1 3.1 {-0.5} at lag [2] {} {-0.2} at lag [1] {} 0 [1×0] NaN
Mdl is a fully specified arima object because all its parameters are known. You can pass Mdl to any arima object function except estimate. For example, plot the impulse response function of the model for 24 periods by using impulse. impulse(Mdl,24)
12-48
arima
Create Partially Specified Model Create the AR(1) model represented by this equation: yt = 1 + ϕyt − 1 + εt, where εt is a series of iid Gaussian random variables with mean 0 and variance 0.5. Use the shorthand syntax to specify an AR(1) model template, then use dot notation to set the Constant and Variance properties. Mdl = arima(1,0,0); Mdl.Constant = 1; Mdl.Variance = 0.5; Mdl Mdl = arima with properties: Description: SeriesName: Distribution: P: D: Q:
"ARIMA(1,0,0) Model (Gaussian Distribution)" "Y" Name = "Gaussian" 1 0 0
12-49
12
Functions
Constant: AR: SAR: MA: SMA: Seasonality: Beta: Variance:
1 {NaN} at lag [1] {} {} {} 0 [1×0] 0.5
Mdl is a partially specified arima object. You can modify property values by using dot notation or fit the unknown coefficient ϕ to data by using estimate, but you cannot pass Mdl to any other object function.
Create Nonseasonal ARIMA Model Template Create the ARIMA(3,1,2) model represented by this equation: (1 − ϕ1L − ϕ2L2 − ϕ3L3)(1 − L)yt = (1 + θ1L + θ2L2)εt, where εt is a series of iid Gaussian random variables with mean 0 and variance σ2. Because the model contains only nonseasonal polynomials, use the shorthand syntax. Mdl = arima(3,1,2) Mdl = arima with properties: Description: SeriesName: Distribution: P: D: Q: Constant: AR: SAR: MA: SMA: Seasonality: Beta: Variance:
"ARIMA(3,1,2) Model (Gaussian Distribution)" "Y" Name = "Gaussian" 4 1 2 NaN {NaN NaN NaN} at lags [1 2 3] {} {NaN NaN} at lags [1 2] {} 0 [1×0] NaN
The property P is equal to p + D = 4. NaN-valued elements indicate estimable parameters.
Specify Nonconsecutive Lags To include additive seasonal lags, specify the lags matching the appropriate periodicity. For example, create the additive monthly MA(12) model represented in this equation: yt = εt + θ1εt − 1 + θ12εt − 12, 12-50
arima
where εt is a series of iid Gaussian random variables with mean 0 and variance σ2. Mdl = arima('Constant',0,'MALags',[1 12]) Mdl = arima with properties: Description: SeriesName: Distribution: P: D: Q: Constant: AR: SAR: MA: SMA: Seasonality: Beta: Variance:
"ARIMA(0,0,12) Model (Gaussian Distribution)" "Y" Name = "Gaussian" 0 0 12 0 {} {} {NaN NaN} at lags [1 12] {} 0 [1×0] NaN
Create SARIMA Model Template Create the SARIMA 0, 1, 1 × 0, 1, 1 12 model (multiplicative, monthly MA model template with one degree of seasonal and nonseasonal integration) represented by this equation: (1 − L)(1 − L12)yt = (1 + θ1L)(1 + θ12L12)εt, where εt is a series of iid Gaussian random variables with mean 0 and variance σ2. Mdl = arima('Constant',0,'D',1,'Seasonality',12,... 'MALags',1,'SMALags',12) Mdl = arima with properties: Description: SeriesName: Distribution: P: D: Q: Constant: AR: SAR: MA: SMA: Seasonality: Beta: Variance:
"ARIMA(0,1,1) Model Seasonally Integrated with Seasonal MA(12) (Gaussian Distri "Y" Name = "Gaussian" 13 1 13 0 {} {} {NaN} at lag [1] {NaN} at lag [12] 12 [1×0] NaN
12-51
12
Functions
Modify Model Object Create the AR(3) model represented by this equation: yt = 0 . 05 + 0 . 6yt − 1 + 0 . 2yt − 2 − 0 . 1yt − 3 + εt, where εt is a series of iid Gaussian random variables with mean 0 and variance 0.01. Mdl = arima('Constant',0.05,'AR',{0.6,0.2,-0.1},'Variance',0.01) Mdl = arima with properties: Description: SeriesName: Distribution: P: D: Q: Constant: AR: SAR: MA: SMA: Seasonality: Beta: Variance:
"ARIMA(3,0,0) Model (Gaussian Distribution)" "Y" Name = "Gaussian" 3 0 0 0.05 {0.6 0.2 -0.1} at lags [1 2 3] {} {} {} 0 [1×0] 0.01
Add a nonseasonal MA term at lag 2 with coefficient 0.2. Then, display the MA property. Mdl.MA = {0 0.2} Mdl = arima with properties: Description: SeriesName: Distribution: P: D: Q: Constant: AR: SAR: MA: SMA: Seasonality: Beta: Variance:
"ARIMA(3,0,2) Model (Gaussian Distribution)" "Y" Name = "Gaussian" 3 0 2 0.05 {0.6 0.2 -0.1} at lags [1 2 3] {} {0.2} at lag [2] {} 0 [1×0] 0.01
Mdl.MA ans=1×2 cell array {[0]} {[0.2000]}
In the model display, lags indicates the lags to which the corresponding coefficients are associated. Although MATLAB® removes zero-valued coefficients from the display, the properties storing coefficients preserve them. 12-52
arima
Change the model constant to 1. Mdl.Constant = 1 Mdl = arima with properties: Description: SeriesName: Distribution: P: D: Q: Constant: AR: SAR: MA: SMA: Seasonality: Beta: Variance:
"ARIMA(3,0,2) Model (Gaussian Distribution)" "Y" Name = "Gaussian" 3 0 2 1 {0.6 0.2 -0.1} at lags [1 2 3] {} {0.2} at lag [2] {} 0 [1×0] 0.01
Specify t Distribution for Innovations Create an AR(1) model template and specify iid t-distributed innovations with unknown degrees of freedom. Use the longhand syntax. Mdl = arima('ARLags',1,'Distribution',"t") Mdl = arima with properties: Description: SeriesName: Distribution: P: D: Q: Constant: AR: SAR: MA: SMA: Seasonality: Beta: Variance:
"ARIMA(1,0,0) Model (t Distribution)" "Y" Name = "t", DoF = NaN 1 0 0 NaN {NaN} at lag [1] {} {} {} 0 [1×0] NaN
The degrees of freedom DoF is NaN, which indicates that the degrees of freedom is estimable. Create the fully specified AR(1) model represented by this equation: yt = 0 . 6yt − 1 + εt, where εt is an iid series of t-distributed random variables with 10 degrees of freedom. Use the longhand syntax. 12-53
12
Functions
innovdist = struct('Name',"t",'DoF',10); Mdl = arima('Constant',0,'AR',{0.6},... 'Distribution',innovdist) Mdl = arima with properties: Description: SeriesName: Distribution: P: D: Q: Constant: AR: SAR: MA: SMA: Seasonality: Beta: Variance:
"ARIMA(1,0,0) Model (t Distribution)" "Y" Name = "t", DoF = 10 1 0 0 0 {0.6} at lag [1] {} {} {} 0 [1×0] NaN
Create Composite Conditional Mean and Variance Model Template Create the ARMA(1,1) conditional mean model containing an ARCH(1) conditional variance model represented by these equations: yt = c + ϕyt − 1 + εt + θεt − 1 . εt = σtzt . σt2 = κ + γσt2− 1 . zt ∼ N(0, 1) . Create the ARMA(1,1) conditional mean model template by using the shorthand syntax. Mdl = arima(1,0,1) Mdl = arima with properties: Description: SeriesName: Distribution: P: D: Q: Constant: AR: SAR: MA: SMA: Seasonality: Beta: Variance:
12-54
"ARIMA(1,0,1) Model (Gaussian Distribution)" "Y" Name = "Gaussian" 1 0 1 NaN {NaN} at lag [1] {} {NaN} at lag [1] {} 0 [1×0] NaN
arima
The Variance property of Mdl is NaN, which means that the model variance is an unknown constant. Create the ARCH(1) conditional variance model template by using the shorthand syntax of garch. CondVarMdl = garch(0,1) CondVarMdl = garch with properties: Description: SeriesName: Distribution: P: Q: Constant: GARCH: ARCH: Offset:
"GARCH(0,1) Conditional Variance Model (Gaussian Distribution)" "Y" Name = "Gaussian" 0 1 NaN {} {NaN} at lag [1] 0
Create the composite conditional mean and variance model template by setting the Variance property of Mdl to CondVarMdl using dot notation. Mdl.Variance = CondVarMdl Mdl = arima with properties: Description: SeriesName: Distribution: P: D: Q: Constant: AR: SAR: MA: SMA: Seasonality: Beta: Variance:
"ARIMA(1,0,1) Model (Gaussian Distribution)" "Y" Name = "Gaussian" 1 0 1 NaN {NaN} at lag [1] {} {NaN} at lag [1] {} 0 [1×0] [GARCH(0,1) Model]
All NaN-valued properties of the conditional mean and variance models are estimable.
Estimate ARIMAX Model Create an ARMAX(1,2) model for predicting changes in the US personal consumption expenditure based on changes in paid compensation of employees. Load the US macroeconomic data set. load Data_USEconModel
DataTimeTable is a MATLAB® timetable containing quarterly macroeconomic measurements from 1947:Q1 through 2009:Q1. PCEC is the personal consumption expenditure series, and COE is the paid 12-55
12
Functions
compensation of employees series. Both variables are in levels. For more details on the data, enter Description at the command line. The series are nonstationary. To avoid spurious regression, stabilize the variables by converting the levels to returns using price2ret. Compute the sample size. pcecret = price2ret(DataTimeTable.PCEC); coeret = price2ret(DataTimeTable.COE); T = numel(pcecret);
Because conversion from levels to returns involves applying the first difference, the transformation reduces the total sample size by one observation. Create an ARMA(1,2) model template using the shorthand syntax. Mdl = arima(1,0,2);
The exogenous component enters the model during estimation. Therefore, you do not need to set the Beta property of Mdl to a NaN so that estimate fits the model to the data with the other parameters. ARMA(1,2) process initialization requires Mdl.P = 1 observation. Therefore, the presample period is the first time point in the data (first row) and the estimation sample is the rest of the data. Specify variables identifying the presample and estimation periods. idxpre = Mdl.P; idxest = (Mdl.P + 1):T;
Fit the model to the data. Specify the presample by using the 'Y0' name-value pair argument, and specify the exogenous data by using the 'X' name-value pair argument. EstMdl = estimate(Mdl,pcecret(idxest),'Y0',pcecret(idxpre),... 'X',coeret(idxest)); ARIMAX(1,0,2) Model (Gaussian Distribution): Value _________ Constant AR{1} MA{1} MA{2} Beta(1) Variance
0.0091866 -0.13506 -0.090445 0.29671 0.5831 5.305e-05
StandardError _____________ 0.001269 0.081986 0.082052 0.064589 0.048884 3.1387e-06
TStatistic __________ 7.239 -1.6474 -1.1023 4.5939 11.928 16.902
PValue __________ 4.5203e-13 0.099478 0.27034 4.3505e-06 8.4535e-33 4.3581e-64
All estimates, except the lag 1 MA coefficient, are significant at 0.1 level. Display EstMdl. EstMdl EstMdl = arima with properties: Description: "ARIMAX(1,0,2) Model (Gaussian Distribution)" SeriesName: "Y"
12-56
arima
Distribution: P: D: Q: Constant: AR: SAR: MA: SMA: Seasonality: Beta: Variance:
Name = "Gaussian" 1 0 2 0.00918662 {-0.135063} at lag [1] {} {-0.0904452 0.296714} at lags [1 2] {} 0 [0.583095] 5.30503e-05
Like Mdl, EstMdl is an arima model object representing an ARMA(1,2) process. Unlike Mdl, EstMdl is fully specified because it is fit to the data, and EstMdl contains an exogenous component, so it is an ARMAX(1,2) model.
Simulate ARIMA Model Create an arima model object for the random walk represented in this equation: yt = yt − 1 + εt, where εt is a series of iid Gaussian random variables with mean 0 and variance 1. Mdl = arima(0,1,0); Mdl.Constant = 0; Mdl.Variance = 1; Mdl Mdl = arima with properties: Description: SeriesName: Distribution: P: D: Q: Constant: AR: SAR: MA: SMA: Seasonality: Beta: Variance:
"ARIMA(0,1,0) Model (Gaussian Distribution)" "Y" Name = "Gaussian" 1 1 0 0 {} {} {} {} 0 [1×0] 1
Mdl is a fully specified arima model object. Simulate and plot 1000 paths of length 100 from the random walk. rng(1) % For reproducibility Y = simulate(Mdl,100,'NumPaths',1000);
12-57
12
Functions
plot(Y) title('Simulated Paths from Random Walk Process')
Forecast ARIMA Model Forecast NASDAQ daily closing prices over a 500-day horizon. Load the US equity indices data set. load Data_EquityIdx
The data set contains daily NASDAQ closing prices from 1990 through 2001. For more details, enter Description at the command line. Assume that an ARIMA(1,1,1) model is appropriate for describing the first 1500 NASDAQ closing prices. Create an ARIMA(1,1,1) model template. Mdl = arima(1,1,1);
estimate requires a presample of size Mdl.P = 2. Fit the model to the data. Specify the first two observations as a presample. idxpre = 1:Mdl.P; idxest = (Mdl.P + 1):1500;
12-58
arima
EstMdl = estimate(Mdl,DataTable.NASDAQ(idxest),... 'Y0',DataTable.NASDAQ(idxpre)); ARIMA(1,1,1) Model (Gaussian Distribution): Value _________ Constant AR{1} MA{1} Variance
0.43291 -0.076323 0.31312 27.86
StandardError _____________ 0.18607 0.082045 0.077284 0.63785
TStatistic __________
PValue __________
2.3266 -0.93026 4.0516 43.678
0.019989 0.35223 5.0876e-05 0
Forecast the closing values into a 500-day horizon by passing the estimated model to forecast. To initialize the model for forecasting, specify the last two observations in the estimation data as a presample. yf0 = DataTable.NASDAQ(idxest(end - 1:end)); yf = forecast(EstMdl,500,yf0);
Plot the first 2000 observations and the forecasts. dates = datetime(dates,'ConvertFrom',"datenum",... 'Format',"yyyy-MM-dd"); figure h1 = plot(dates(1:2000),DataTable.NASDAQ(1:2000)); hold on h2 = plot(dates(1501:2000),yf,'r'); legend([h1 h2],"Observed","Forecasted",... 'Location',"NorthWest") title("NASDAQ Composite Index: 1990-01-02 – 1997-11-25") xlabel("Time (days)") ylabel("Closing Price") hold off
12-59
12
Functions
After the start of 1995, the model forecasts almost always underestimate the true closing prices.
More About Lag Operator The lag operator L is defined as Li yt = yt − i . Lag operators condense polynomial notation. Linear Time Series Model A linear time series model for response process yt and random innovations εt is a stochastic process on page 1-18 in which the current response is a linear function of previous responses, the current and previous innovations, and exogenous covariates xt. In difference-equation notation, the general form of a linear time series model is: yt = c + xt β + a1 yt − 1 + … + aw yt − w + εt + b1εt − 1 + … + bvεt − v . Given w and v, all coefficients are estimable. Expressed in lag operator on page 12-60 notation, the general model form is: a(L)yt = c + xt β + b(L)εt . The lag operator polynomials in the model are often expressed as products of polynomials for nonseasonal and multiplicative seasonal effects and integration: 12-60
arima
Ds
D
ϕ(L)(1 − L) Φ(L)(1 − Ls) yt = c + xt β + θ(L)Θ(L)εt . Model Component
Description
ϕ(L)
ϕ(L) = 1 − ϕL − ϕ2L2 − ... − ϕpLp, AR stores the coefficients; indices correspond to lag exponents. a p-degree stable nonseasonal AR polynomial.
D
Degree of nonseasonal integration D
Φ(L)
Φ(L) = 1 − Φp1L p
arima Property
p1 p − Φp2L 2 − ...
a
− ΦpsL s,
SAR stores the coefficients; indices correspond to lag exponents.
ps-degree stable, multiplicative seasonal AR polynomial. s
Seasonality, or the degree of the seasonal differencing polynomial
Seasonality
Ds
Degree of seasonal integration
No corresponding property, but: • If Seasonality > 0, Ds = 1. • Otherwise, Ds = 0.
c
Model constant
Constant
β
Regression coefficient of exogenous covariates
Beta
θ(L)
θ(L) = 1 + θL + θ2L2 + ... + θqLq, a MA stores the coefficients; indices correspond to lag exponents. q-degree invertible nonseasonal MA polynomial.
Θ(L)
Θ(L) = 1 + Θq1L q
q1 q + Θq2L 2 + ...
a
+ ΘqsL s,
SMA stores the coefficients; indices correspond to lag exponents.
qs-degree invertible, multiplicative seasonal MA polynomial. εt
Series of random iid innovations
Distribution stores the distribution name and any parameters.
• The model property P is equal to p + D + ps + s. • The model property Q is equal to q + qs. Note The degrees of the lag operators in the seasonal polynomials Φ(L) and Θ(L) do not conform to the degrees defined by Box and Jenkins [1]. In other words, Econometrics Toolbox does not treat p1 = s, p2 = 2s,...,ps = rps and q1 = s, q2 = 2s,...,qs = rqs where rp and rq are positive integers. The software is flexible, letting you specify the lag operator degrees. See “Create Multiplicative ARIMA Models” on page 7-44.
12-61
12
Functions
Stationarity A stochastic process yt is stationary if its expected value, variance, and covariance between elements of the series are independent of time. For example, the MA(q) model, with c = 0, is stationary for any q < ∞ because each of the following are free of t for all time points [1]. • E(yt) = θ(L)0 = 0. • •
Var(yt) = σ2
q
∑
i=1
Cov(yt, yt − s) =
2
θi . σ2(θs + θ1θs − 1 + θ2θs − 2 + ... + θqθs − q) if s ≥ q 0 otherwise .
Unit Root The time series yt; t = 1, ..., T is a unit root process if its expected value, variance, or covariance grows with time. Consequently, the time series is nonstationary.
Version History Introduced in R2012a R2023b: Name an ARIMA model response series Name the response series of an ARIMA model by setting the SeriesName property to a string scalar. When you supply input response data to model object functions in a table or timetable, the functions choose the variable with name SeriesName as the response variable by default. R2018a: Describe an ARIMA model Describe an ARIMA model by setting the Description property to a string scalar. R2018a: Use indices that are consistent with MATLAB cell array indexing The indices of cell arrays of lag operator polynomial coefficients follow MATLAB cell array indexing rules. Affected model properties are the AR, MA, SAR, and SMA properties. • You cannot access any lag-zero coefficients by using an index of 0. For example, Mdl.AR{0} issues an error. Remove any instances of such indices of zero from your code. The value of all lag-zero coefficients is 1, except for the lag operator polynomial corresponding to the ARCH property, which has the value 0. • You cannot index beyond the maximal lag in the polynomial. For example, if Mdl.P is 4, then Mdl.AR{p} issues an error when p is greater than 4. For details on the maximal lags of the lag operator polynomials, see the corresponding property descriptions. 12-62
arima
Remove any instances of such indices beyond the maximal lag from your code. All coefficients beyond the maximal lag are 0. R2018a: Models store innovation distribution name as a string scalar Behavior changed in R2018a The Name field of the Distribution property of arima model objects stores the innovation distribution name as a string scalar, for example, "Gaussian" for Gaussian innovations. Before R2018a, MATLAB stored the innovation distribution name as a character vector, for example 'Gaussian' for Gaussian innovations. Although most text-data operations accept character vectors and string scalars for text-data input, the two data types have some differences. For details, see “Text in String and Character Arrays”.
References [1] Box, George E. P., Gwilym M. Jenkins, and Gregory C. Reinsel. Time Series Analysis: Forecasting and Control. 3rd ed. Englewood Cliffs, NJ: Prentice Hall, 1994. [2] Hamilton, James D. Time Series Analysis. Princeton, NJ: Princeton University Press, 1994.
See Also Apps Econometric Modeler Objects gjr | egarch | garch Topics “Analyze Time Series Data Using Econometric Modeler” on page 4-2 “Creating Univariate Conditional Mean Models” on page 7-3 “Modify Properties of Conditional Mean Model Objects” on page 7-63 “Specify Conditional Mean Model Innovation Distribution” on page 7-69 “Create Autoregressive Models” on page 7-16 “Create Moving Average Models” on page 7-24 “Create Autoregressive Moving Average Models” on page 7-31 “Create Autoregressive Integrated Moving Average Models” on page 7-38 “Create ARIMA Models That Include Exogenous Covariates” on page 7-55 “Create Multiplicative ARIMA Models” on page 7-44 “Create Multiplicative Seasonal ARIMA Model for Time Series Data” on page 7-51 “Specify Conditional Mean and Variance Models” on page 7-75
12-63
12
Functions
arima Convert regression model with ARIMA errors to ARIMAX model
Syntax ARIMAXMdl = arima(Mdl) [ARIMAXMdl,XNew] = arima(Mdl,X=X) [ARIMAXMdl,Tbl2] = arima(Mdl,PredictorTbl=Tbl1) [ARIMAXMdl,Tbl2] = arima(Mdl,PredictorTbl=Tbl1,PredictorVariables= PredictorVariables)
Description The arima object function converts a specified regression model with ARIMA errors (regARIMA model object) to the equivalent ARIMAX model (arima model object). To create an ARIMAX model directly, see the arima function. ARIMAXMdl = arima(Mdl) returns ARIMAXMdl, the fully specified ARIMAX model representation of the fully specified regression model with ARIMA time series errors Mdl. [ARIMAXMdl,XNew] = arima(Mdl,X=X) returns the matrix of predictor data XNew for the output ARIMAX model, transformed from the specified matrix of predictor data X associated with the input regression model with ARIMA errors. [ARIMAXMdl,Tbl2] = arima(Mdl,PredictorTbl=Tbl1) returns the table or timetable of predictor data Tbl2 for the output ARIMAX model, transformed from the specified predictor data in the table or timetable Tbl1 associated with the input regression model with ARIMA errors. arima selects all variables in Tbl1 as predictor variables for the regression component of Mdl. [ARIMAXMdl,Tbl2] = arima(Mdl,PredictorTbl=Tbl1,PredictorVariables= PredictorVariables) selects the variable names in PredictorVariables from Tbl1 for the regression component in Mdl.
Examples Convert Regression Model with ARMA Errors to ARIMAX Model Convert a regression model with ARMA(4,1) errors to an ARIMAX model using the arima converter. Provide predictor data in a numeric array. Specify the regression model with ARMA(4,1) errors: yt = 1 + 0 . 5Xt + ut ut = 0 . 8ut − 1 − 0 . 4ut − 4 + εt + 0 . 3εt − 1, where εt is Gaussian with mean 0 and variance 1. 12-64
arima
Mdl = regARIMA(AR={0.8 -0.4},ARLags=[1 4],MA=0.3, ... Intercept=1,Beta=0.5,Variance=1) Mdl = regARIMA with properties: Description: SeriesName: Distribution: Intercept: Beta: P: Q: AR: SAR: MA: SMA: Variance:
"Regression with ARMA(4,1) Error Model (Gaussian Distribution)" "Y" Name = "Gaussian" 1 [0.5] 4 1 {0.8 -0.4} at lags [1 4] {} {0.3} at lag [1] {} 1
You can verify that the lags of the autoregressive terms are 1 and 4 in the AR row. Generate random predictor data. rng(1,"twister"); % For reproducibility T = 20; Pred = randn(T,1);
Convert Mdl to an ARIMAX model. Supply the random set of predictor data Pred for Mdl and return the predictor data for the converted model. [ARIMAXMdl,XNew] = arima(Mdl,X=Pred); ARIMAXMdl ARIMAXMdl = arima with properties: Description: SeriesName: Distribution: P: D: Q: Constant: AR: SAR: MA: SMA: Seasonality: Beta: Variance:
"ARIMAX(4,0,1) Model (Gaussian Distribution)" "Y" Name = "Gaussian" 4 0 1 0.6 {0.8 -0.4} at lags [1 4] {} {0.3} at lag [1] {} 0 [1 -0.8 0.4] 1
The output arima model ARIMAXMdl is yt = 0 . 6 + ZtΓ + 0 . 8yt − 1 − 0 . 4yt − 4 + εt + 0 . 3εt − 1, where
12-65
12
Functions
0 . 5x1
NaN
NaN
0 . 5x2
0 . 5x1
NaN
0 . 5x3
0 . 5x2
NaN
ZtΓ = 0 . 5x4
0 . 5x3
NaN
0 . 5x5
0 . 5x4
0 . 5x1
1 −0 . 8 0.4
⋮ ⋮ ⋮ 0 . 5T 0 . 5xT − 1 0 . 5xT − 4 and x j is row j of Pred. Because the product of the autoregressive and integration polynomials is ϕ(L) = (1 − 0 . 8L + 0 . 4L4), ARIMAX.Beta is [1; -0.8; 0.4]. Note that the software carries over the autoregressive and moving average coefficients from Mdl to ARIMAX. Also, Mdl.Intercept = 1 and ARIMAX.Constant = (1 - 0.8 + 0.4)(1) = 0.6, i.e., the regARIMA model intercept and arima model constant are generally unequal.
Convert Regression Model with ARIMA Errors to ARIMAX Model Convert a regression model with seasonal ARIMA errors to an ARIMAX model using the arima converter. Specify the regression model with ARIMA(2, 1, 1) × (1, 1, 0)2 errors: yt = Xt
−2 + ut 1
(1 − 0 . 3L + 0 . 15L2)(1 − L)(1 − 0 . 2L2)(1 − L2)ut = (1 + 0 . 1L)εt, where εt is Gaussian with mean 0 and variance 1. Mdl = regARIMA(AR={0.3, -0.15},MA=0.1,ARLags=[1 2], ... SAR=0.2,SARLags=2,Seasonality=2,D=1, ... Intercept=0,Beta=[-2; 1],Variance=1) Mdl = regARIMA with properties: Description: SeriesName: Distribution: Intercept: Beta: P: D: Q: AR: SAR: MA: SMA: Seasonality: Variance:
"Regression with ARIMA(2,1,1) Error Model Seasonally Integrated with Seasonal A "Y" Name = "Gaussian" 0 [-2 1] 7 1 1 {0.3 -0.15} at lags [1 2] {0.2} at lag [2] {0.1} at lag [1] {} 2 1
Generate predictor data. 12-66
arima
rng(1,"twister"); % For reproducibility T = 20; Pred = randn(T,2);
Convert Mdl to an ARIMAX model. Supply the random set of predictor data Pred for Mdl and return the predictor data for the converted model. [ARIMAX,XNew] = arima(Mdl,X=Pred); ARIMAX ARIMAX = arima with properties: Description: SeriesName: Distribution: P: D: Q: Constant: AR: SAR: MA: SMA: Seasonality: Beta: Variance:
"ARIMAX(2,1,1) Model Seasonally Integrated with Seasonal AR(2) (Gaussian Distri "Y" Name = "Gaussian" 7 1 1 0 {0.3 -0.15} at lags [1 2] {0.2} at lag [2] {0.1} at lag [1] {} 2 [1 -1.3 -0.75 1.41 -0.34 -0.08 0.09 -0.03] 1
Mdl.Beta has length 2, but ARIMAX.Beta has length 8. This is because the product of the autoregressive and integration polynomials, ϕ(L)(1 − L)Φ(L)(1 − Ls), is 1 − 1 . 3L − 0 . 75L2 + 1 . 41L3 − 0 . 34L4 − 0 . 08L5 + 0 . 09L6 − 0 . 03L7 . You can see that when you add seasonality, seasonal lag terms, and integration to a model, the size of XNew can grow quite large. A conversion such as this might not be ideal for analyses involving small sample sizes.
Convert Fitted Model to ARIMAX Model Fit a regression model with ARMA(1,1) errors by regressing the US consumer price index (CPI) quarterly changes onto the US gross domestic product (GDP) growth rate. Convert the fitted model to an ARIMAX model. Supply a timetable of data and specify the series for the fit. Load and Transform Data Load the US macroeconomic data set. Compute the series of GDP quarterly growth rates and CPI quarterly changes. load Data_USEconModel DTT = price2ret(DataTimeTable,DataVariables="GDP"); DTT.GDPRate = 100*DTT.GDP; DTT.CPIDel = diff(DataTimeTable.CPIAUCSL); T = height(DTT) T = 248
12-67
12
Functions
figure tiledlayout(2,1) nexttile plot(DTT.Time,DTT.GDPRate) title("GDP Rate") ylabel("Percent Growth") nexttile plot(DTT.Time,DTT.CPIDel) title("Index")
The series appear stationary, albeit heteroscedastic. Prepare Timetable for Estimation When you plan to supply a timetable, you must ensure it has all the following characteristics: • The selected response variable is numeric and does not contain any missing values. • The timestamps in the Time variable are regular, and they are ascending or descending. Remove all missing values from the timetable. DTT = rmmissing(DTT); T_DTT = height(DTT) T_DTT = 248
Because each sample time has an observation for all variables, rmmissing does not remove any observations. 12-68
arima
Determine whether the sampling timestamps have a regular frequency and are sorted. areTimestampsRegular = isregular(DTT,"quarters") areTimestampsRegular = logical 0 areTimestampsSorted = issorted(DTT.Time) areTimestampsSorted = logical 1
areTimestampsRegular = 0 indicates that the timestamps of DTT are irregular. areTimestampsSorted = 1 indicates that the timestamps are sorted. Macroeconomic series in this example are timestamped at the end of the month. This quality induces an irregularly measured series. Remedy the time irregularity by shifting all dates to the first day of the quarter. dt = DTT.Time; dt = dateshift(dt,"start","quarter"); DTT.Time = dt; areTimestampsRegular = isregular(DTT,"quarters") areTimestampsRegular = logical 1
DTT is regular. Create Model Template for Estimation Suppose that a regression model of CPI quarterly changes onto the GDP rate, with ARMA(1,1) errors, is appropriate. Create a model template for a regression model with ARMA(1,1) errors template. Mdl = regARIMA(1,0,1) Mdl = regARIMA with properties: Description: SeriesName: Distribution: Intercept: Beta: P: Q: AR: SAR: MA: SMA: Variance:
"ARMA(1,1) Error Model (Gaussian Distribution)" "Y" Name = "Gaussian" NaN [1×0] 1 1 {NaN} at lag [1] {} {NaN} at lag [1] {} NaN
Mdl is a partially specified regARIMA object. 12-69
12
Functions
Fit Model to Data Fit a regression model with ARMA(1,1) errors to the data. Specify the entire series GDP rate and CPI quarterly changes series, and specify the response and predictor variable names. EstMdl = estimate(Mdl,DTT,ResponseVariable="GDPRate", ... PredictorVariables="CPIDel"); Regression with ARMA(1,1) Error Model (Gaussian Distribution): Value ________ Intercept AR{1} MA{1} Beta(1) Variance
0.0162 0.60515 -0.16221 0.002221 0.000113
StandardError _____________ 0.0016077 0.089912 0.11051 0.00077691 7.2753e-06
TStatistic __________ 10.077 6.7305 -1.4678 2.8587 15.533
PValue __________ 6.9995e-24 1.6906e-11 0.14216 0.0042532 2.0838e-54
EstMdl is a fully specified, estimated regARIMA object. By default, estimate backcasts for the required Mdl.P = 1 presample regression model residual and sets the required Mdl.Q = 1 presample error model residual to 0. Convert Fitted Model Convert the fitted model to an ARIMAX model. Supply DTT and select the predictor variables from it. Return the timetable of predictor data for the converted model. [ARIMAXMdl,Tbl2] = arima(EstMdl,PredictorTbl=DTT,PredictorVariables="CPIDel"); ARIMAXMdl ARIMAXMdl = arima with properties: Description: SeriesName: Distribution: P: D: Q: Constant: AR: SAR: MA: SMA: Seasonality: Beta: Variance:
"ARIMAX(1,0,1) Model (Gaussian Distribution)" "Y" Name = "Gaussian" 1 0 1 0.00639649 {0.605153} at lag [1] {} {-0.162208} at lag [1] {} 0 [1 -0.605153] 0.000113005
tail(Tbl2)
12-70
Time _____
Interval ________
Q2-07 Q3-07 Q4-07
91 91 94
GDP ___________
GDPRate __________
CPIDel ______
Lag0XBeta _________
Lag1XBeta _________
0.00018278 0.00016916 6.1286e-05
0.018278 0.016916 0.0061286
1.675 1.359 3.355
0.0037202 0.0030183 0.0074515
0.0045486 0.0037202 0.0030183
arima
Q1-08 Q2-08 Q3-08 Q4-08 Q1-09
91 91 92 92 90
9.3272e-05 0.00011103 8.9585e-05 -0.00016145 -8.6878e-05
0.0093272 0.011103 0.0089585 -0.016145 -0.0086878
1.93 3.367 1.641 -7.098 1.137
0.0042865 0.0074781 0.0036447 -0.015765 0.0025253
0.0074515 0.0042865 0.0074781 0.0036447 -0.015765
ARIMAXMdl is an arima object representing the converted model. Tbl2 is a timetable containing the same variables as DTT and predictor variables for the exogenous regression component of ARIMAXMdl, Lag0XBeta and Lag1XBeta.
Input Arguments Mdl — Fully specified regression model with ARIMA errors regARIMA model object Fully specified regression model with ARIMA errors, specified as a regARIMA model object created by regARIMA or estimate. The properties of Mdl cannot contain NaN values. X — Predictor data xt [] (default) | numeric matrix Predictor data xt for the regression component of the input regression model with ARIMA errors Mdl, specified as a numobs-by-numpredsMdl numeric matrix, where numpredsMdl is numel(Mdl.Beta). The last row of X contains the latest observation. Each column of X is a separate predictor variable. Data Types: double Tbl1 — Time series data containing predictor variables xt [] (default) | table | timetable Time series data containing predictor variables xt associated with the regression component of the input regression model with ARIMA errors Mdl, specified as a table or timetable with numvars1 variables and numobs rows. Each selected predictor variable is a numeric vectors representing a single path of numobs observations. You can optionally select numpredsMdl predictor variables from Tbl1 by using the PredictorVariables name-value argument. Each row is an observation, and measurements in each row occur simultaneously. If Tbl1 is a timetable, it must represent a sample with a regular datetime time step (see isregular), and the datetime vector Tbl1.Time must be strictly ascending or descending. If Tbl1 is a table, the last row contains the latest observation. PredictorVariables — Variables to select from Tbl1 to treat as predictor variables xt Tbl1.Properties.VariableNames (default) | string vector | cell vector of character vectors | vector of integers | logical vector Variables to select from Tbl1 to treat as the predictor variables xt in the input regression model with ARIMA errors Mdl, specified as one of the following data types: 12-71
12
Functions
• String vector or cell vector of character vectors containing numpredsMdl variable names in Tbl1.Properties.VariableNames • A length numpredsMdl vector of unique indices (positive integers) of variables to select from Tbl1.Properties.VariableNames • A length numvars logical vector, where PredictorVariables(j) = true selects variable j from Tbl1.Properties.VariableNames The selected variables must be numeric vectors and cannot contain missing values (NaN). Example: PredictorVariables=["M1SL" "TB3MS" "UNRATE"] Example: PredictorVariables=[true false true false] or PredictorVariable=[1 3] selects the first and third table variables to supply the predictor data. Data Types: double | logical | char | cell | string Note • NaN values in X indicate missing values. The arima function accommodates NaN values such that observations in XNew corresponding to missing values in X are NaNs. • arima issues an error when any table or timetable input contains missing values.
Output Arguments ARIMAXMdl — ARIMAX model arima model object ARIMAX model equivalent of the input regression model with ARIMA time series errors Mdl, returned as a fully specified arima model object. XNew — Converted predictor data numeric matrix Converted predictor data matrix for the exogenous regression component of the output ARIMAX model ARIMAXMdl, returned as a numobs-by-numpredsARIMAXMdl numeric matrix. numpredsARIMAXMdl is one plus the number of nonzero autoregressive coefficients in the difference equation of Mdl (see “Algorithms” on page 12-73). The arima returns XNew only when you supply the numeric matrix input X The last row of XNew contains the latest observation of each series. Each column of XNew is a separate predictor variable. Data Types: double Tbl2 — Converted predictor series table | timetable Converted predictor series, associated with the exogenous regression component of the output ARIMAX model ARIMAXMdl, returned as a table or timetable, the same data type as Tbl1. arima returns Tbl2 only when you supply the input Tbl1. Tbl2 contains the following variables: 12-72
arima
• The converted predictor variables, which are in a numobs-by-1 numeric vectors. arima names the converted predictor variables in Tbl2 LagNumXBeta, where Num is the lag to which the predictor variable applies. The first converted predictor variable has name Lag0XBeta and applies to lag 0. The last predictor variable applies to lag Mdl.P. The arima function includes intermediate lags only when they are associated with non-zero autoregressive coefficients (see “Algorithms” on page 12-73). • All variables Tbl1. Each row is an observation, and measurements in each row occur simultaneously. If Tbl1 is a timetable, row times of Tbl1 and Tbl2 are equal.
Algorithms Let X denote the matrix of concatenated predictor data vectors (or design matrix) and β denote the regression component for the regression model with ARIMA errors, Mdl. • If you specify X or Tbl1, arima returns converted predictor data in XNew or Tbl2 using a certain format. Suppose that the nonzero autoregressive lag term degrees of Mdl are 0 < a1 < a2 < ...< P, which is the largest lag term degree. The software obtains these lag term degrees by expanding and reducing the product of the seasonal and nonseasonal autoregressive lag polynomials, and the seasonal and nonseasonal integration lag polynomials D
ϕ(L)(1 − L) Φ(L)(1 − Ls) . • The first converted predictor variable is Xβ. • The second converted predictor variable is a sequence of a1 NaNs, and then the product Xa1 β, a
where Xa1 β = L 1 Xβ . • Converted Predictor variable j is a sequence of aj NaNs, and then the product Xa j β, where a
Xa j β = L j Xβ . • The last converted predictor variable is a sequence of ap NaNs, and then the product Xp β, where Xpβ = Lp Xβ . Suppose that Mdl is a regression model with ARIMA(3,1,0) errors, and ϕ1 = 0.2 and ϕ3 = 0.05. Then the product of the autoregressive and integration lag polynomials is (1 − 0.2L − 0.05L3)(1 − L) = 1 − 1.2L + 0.02L2 − 0.05L3 + 0.05L4 . This implies that ARIMAXMdl.Beta is [1 -1.2 0.02 -0.05 0.05] and XNew is x1 β NaN
NaN
NaN
NaN
x2 β
x1 β
NaN
NaN
NaN
x3 β
x2 β
x1 β
NaN
NaN
x4 β
x3 β
x2 β
x1 β
x5 β
x4 β
x3 β
x2 β
NaN , x1 β
⋮ ⋮ ⋮ ⋮ ⋮ xT β xT − 1 β xT − 2 β xT − 3 β xT − 4 β 12-73
12
Functions
where xj is row j of X. • If you do not specify X or Tbl1, arima returns converted predictor data in XNew as an empty matrix without rows and a number of columns equal to one plus the number of nonzero autoregressive coefficients in the difference equation of Mdl.
Version History Introduced in R2013b R2023b: arima accepts input data in tables and timetables In addition to accepting input predictor data in a numeric matrix, arima accepts input data in a table or regular timetable. To supply a table or timetable containing predictor data, use the PredictorTbl name-value argument. When you supply data in a table or timetable, arima chooses default predictor variables, but you can use the PredictorVariables name-value argument to select a different series from PredictorTbl. When you supply a table or timetable of data, arima returns results in a table or timetable.
See Also regARIMA | arima | estimate Topics “Time Series Regression Models” on page 5-3 “Alternative ARIMA Model Representations” on page 5-113
12-74
arma2ar
arma2ar Convert ARMA model to AR model
Syntax ar = arma2ar(ar0,ma0) ar = arma2ar(ar0,ma0,numLags)
Description ar = arma2ar(ar0,ma0) returns the coefficients of the truncated, infinite-order AR model approximation to an ARMA model having AR and MA coefficients specified by ar0 and ma0, respectively. arma2ar: • Accepts: • Vectors or cell vectors of matrices in difference-equation notation on page 12-82. • LagOp lag operator polynomials corresponding to the AR and MA polynomials in lag operator notation on page 12-83. • Accommodates time series models that are univariate or multivariate (i.e., numVars variables compose the model), stationary or integrated, structural or in reduced form, and invertible. • Assumes that the model constant c is 0. ar = arma2ar(ar0,ma0,numLags) returns the first nonzero numLags lag-term coefficients of the infinite-order AR model approximation of an ARMA model having AR coefficients ar0 and MA coefficients ma0.
Examples Convert an ARMA model to an AR Model Find the lag coefficients of the truncated, AR approximation of this univariate, stationary, and invertible ARMA model yt = 0 . 2yt − 1 − 0 . 1yt − 2 + εt + 0 . 5εt − 1 . The ARMA model is in difference-equation notation because the left side contains only yt and its coefficient 1. Create a vector containing the AR lag term coefficients in order starting from t - 1. ar0 = [0.2 -0.1];
Alternatively, you can create a cell vector of the scalar coefficients. Create a vector containing the MA lag term coefficient. ma0 = 0.5;
12-75
12
Functions
Convert the ARMA model to an AR model by obtaining the coefficients of the truncated approximation of the infinite-lag polynomial. ar = arma2ar(ar0,ma0) ar = 1×7 0.7000
-0.4500
0.2250
-0.1125
0.0562
-0.0281
0.0141
ar is a numeric vector because ar0 and ma0 are numeric vectors. The approximate AR model truncated at 7 lags is yt = 0 . 7yt − 1 − 0 . 45yt − 2 + 0 . 225yt − 3 − 0 . 1125yt − 4 + 0 . 0562yt − 5 + . . . −0 . 0281yt − 6 + 0 . 0141yt − 7 + εt
Convert an MA(3) Model to an AR(5) Model Find the first five lag coefficients of the AR approximation of this univariate and invertible MA(3) model yt = εt − 0 . 2εt − 1 + 0 . 5εt − 3 . The MA model is in difference-equation notation because the left side contains only yt and its coefficient of 1. Create a cell vector containing the MA lag term coefficient in order starting from t 1. Because the second lag term of the MA model is missing, specify a 0 for its coefficient. ma0 = {-0.2 0 0.5};
Convert the MA model to an AR model with at most five lag coefficients of the truncated approximation of the infinite-lag polynomial. Because there is no AR contribution, specify an empty cell ({}) for the AR coefficients. numLags = 5; ar0 = {}; ar = arma2ar(ar0,ma0,numLags) ar=1×5 cell array {[-0.2000]}
{[-0.0400]}
{[0.4920]}
{[0.1984]}
{[0.0597]}
ar is a cell vector of scalars because at least one of ar0 and ma0 is a cell vector. The approximate AR(5) model is yt = − 0 . 2yt − 1 − 0 . 04yt − 2 + 0 . 492yt − 3 + 0 . 1984yt − 4 + 0 . 0597yt − 5 + εt
12-76
arma2ar
Convert a Structural VARMA model to a Structural VAR model Find the coefficients of the truncated, structural VAR equivalent of the structural, stationary, and invertible VARMA model 1 0 . 2 −0 . 1 0 . 03 1 −0 . 15 − 0 . 9 −0 . 25 1 10 0 −0 . 02 0 . 03 0 1 0 + 0 . 003 0 . 001 00 1 0 . 3 0 . 01
−0 . 5 0 . 2 0 . 1 −0 . 05 0 . 02 0 . 01 4 0 . 3 0 . 1 −0 . 1 L − 0 . 1 0 . 01 0 . 001 L8 yt = −0 . 4 0 . 2 0 . 05 −0 . 04 0 . 02 0 . 005 0.3 0 . 01 L4 εt 0 . 01
where yt = y1t y2t y3t ′ and εt = ε1t ε2t ε3t ′. The VARMA model is in lag operator notation because the response and innovation vectors are on opposite sides of the equation. Create a cell vector containing the VAR matrix coefficients. Because this model is a structural model, start with the coefficient of yt and enter the rest in order by lag. Because the equation is in lag operator notation, include the sign in front of each matrix. Construct a vector that indicates the degree of the lag term for the corresponding coefficients. var0 = {[1 0.2 -0.1; 0.03 1 -0.15; 0.9 -0.25 1],... -[-0.5 0.2 0.1; 0.3 0.1 -0.1; -0.4 0.2 0.05],... -[-0.05 0.02 0.01; 0.1 0.01 0.001; -0.04 0.02 0.005]}; var0Lags = [0 4 8];
Create a cell vector containing the VMA matrix coefficients. Because this model is a structural model, start with the coefficient of εt and enter the rest in order by lag. Construct a vector that indicates the degree of the lag term for the corresponding coefficients. vma0 = {eye(3),... [-0.02 0.03 0.3; 0.003 0.001 0.01; 0.3 0.01 0.01]}; vma0Lags = [0 4];
arma2ar requires LagOp lag operator polynomials for input arguments that comprise structural VAR or VMA models. Construct separate LagOp polynomials that describe the VAR and VMA components of the VARMA model. VARLag = LagOp(var0,'Lags',var0Lags); VMALag = LagOp(vma0,'Lags',vma0Lags);
VARLags and VMALags are LagOp lag operator polynomials that describe the VAR and VMA components of the VARMA model. Convert the VARMA model to a VAR model by obtaining the coefficients of the truncated approximation of the infinite-lag polynomial. VAR = arma2ar(VARLag,VMALag) VAR = 3-D Lag Operator Polynomial: ----------------------------Coefficients: [Lag-Indexed Cell Array with 4 Non-Zero Coefficients] Lags: [0 4 8 12]
12-77
12
Functions
Degree: 12 Dimension: 3
VAR is a LagOP lag operator polynomial. All coefficients except those corresponding to lags 0, 4, 8, and 12 are 3-by-3 matrices of zeros. Convert the coefficients to difference-equation notation by reflecting the VAR lag operator polynomial around lag zero. VARDiffEqn = reflect(VAR);
Display the nonzero coefficients of the resulting VAR models. lag2Idx = VAR.Lags + 1; % Lags start at 0.
Add 1 to convert to indices.
varCoeff = toCellArray(VAR); varDiffEqnCoeff = toCellArray(VARDiffEqn); fprintf
('
Lag Operator
Lag Operator
|
|
Difference Equation\n')
Difference Equation
for j = 1:numel(lag2Idx) fprintf('_________________________Lag %d_________________________\n',... lag2Idx(j) - 1) fprintf('%8.3f %8.3f %8.3f | %8.3f %8.3f %8.3f\n',... [varCoeff{lag2Idx(j)} varDiffEqnCoeff{lag2Idx(j)}]') fprintf('_______________________________________________________\n') end _________________________Lag 0_________________________ 1.000 0.030 0.900
0.200 1.000 -0.250
-0.100 | -0.150 | 1.000 |
1.000 0.030 0.900
0.200 1.000 -0.250
-0.100 -0.150 1.000
_______________________________________________________ _________________________Lag 4_________________________ 0.249 -0.312 0.091
-0.151 -0.099 -0.268
-0.397 | 0.090 | -0.029 |
-0.249 0.312 -0.091
0.151 0.099 0.268
0.397 -0.090 0.029
_______________________________________________________ _________________________Lag 8_________________________ 0.037 -0.101 -0.033
0.060 -0.007 0.029
-0.012 | 0.000 | 0.114 |
-0.037 0.101 0.033
-0.060 0.007 -0.029
0.012 -0.000 -0.114
_______________________________________________________ _________________________Lag 12_________________________ 0.014 0.000 -0.010
-0.007 -0.000 -0.018
-0.034 | -0.001 | 0.002 |
-0.014 -0.000 0.010
0.007 0.000 0.018
0.034 0.001 -0.002
_______________________________________________________
12-78
arma2ar
The coefficients of lags 4, 8, and 12 are opposites between VAR and VARDiffEqn.
Convert ARMA Model That Includes Constant to AR Model Find the lag coefficients and constant of the truncated AR approximation of this univariate, stationary, and invertible ARMA model. yt = 1 . 5 + 0 . 2yt − 1 − 0 . 1yt − 2 + εt + 0 . 5εt − 1 . The ARMA model is in difference-equation notation because the left side contains only yt and its coefficient of 1. Create separate vectors for the AR and MA lag term coefficients in order starting from t - 1. ar0 = [0.2 -0.1]; ma0 = 0.5;
Convert the ARMA model to an AR model by obtaining the first five coefficients of the truncated approximation of the infinite-lag polynomial. numLags = 5; ar = arma2ar(ar0,ma0,numLags) ar = 1×5 0.7000
-0.4500
0.2250
-0.1125
0.0562
To compute the constant of the AR model, consider the ARMA model in lag operator notation. 1 − 0 . 2L + 0 . 1L2 yt = 1 . 5 + (1 + 0 . 5L)εt or Φ(L)yt = 1 . 5 + Θ(L)εt Part of the conversion involves premultiplying both sides of the equation by the inverse of the MA lag operator polynomial, as in this equation. −1
Θ
−1
(L)Φ(L)yt = Θ
(L)1 . 5 + εt
To compute the inverse of MA lag operator polynomial, use the lag operator left-division object function mldivide. Theta = LagOp([1 0.5]); ThetaInv = mldivide(Theta,1,'RelTol',1e-5);
ThetaInv is a LagOp lag operator polynomial. The application of lag operator polynomials to constants results in the product of the constant with the sum of the coefficients. Apply ThetaInv to the ARMA model constant to obtain the AR model constant. arConstant = 1.5*sum(cell2mat(toCellArray(ThetaInv)))
12-79
12
Functions
arConstant = 1.0000
The approximate AR model is yt = 1 + 0 . 7yt − 1 − 0 . 45yt − 2 + 0 . 225yt − 3 − 0 . 1125yt − 4 + 0 . 0562yt − 5 + εt .
Input Arguments ar0 — Autoregressive coefficients numeric vector | cell vector of square, numeric matrices | LagOp lag operator polynomial object Autoregressive coefficients of the ARMA(p,q) model, specified as a numeric vector, cell vector of square, numeric matrices, or a LagOp lag operator polynomial object. If ar0 is a vector (numeric or cell), then the coefficient of yt is the identity. To specify a structural AR polynomial (i.e., the coefficient of yt is not the identity), use LagOp lag operator polynomials. • For univariate time series models, ar0 is a numeric vector, cell vector of scalars, or a onedimensional LagOp lag operator polynomial. For vectors, ar0 has length p and the elements correspond to lagged responses composing the AR polynomial in difference-equation notation on page 12-82. That is, ar0(j) or ar0{j} is the coefficient of yt-j. • For numVars-dimensional time series models, ar0 is a cell vector of numVars-by-numVars numeric matrices or a numVars-dimensional LagOp lag operator polynomial. For cell vectors: • ar0 has length p. • ar0 and ma0 must contain numVars-by-numVars matrices. • The elements of ar0 correspond to the lagged responses composing the AR polynomial in difference equation notation. That is, ar0{j} is the coefficient matrix of yt-j. • Row k of an AR coefficient matrix contains the AR coefficients in the equation of the variable yk. Subsequently, column k must correspond to variable yk, and the column and row order of all autoregressive and moving average coefficients must be consistent. • For LagOp lag operator polynomials: • The first element of the Coefficients property corresponds to the coefficient of yt (to accommodate structural models). All other elements correspond to the coefficients of the subsequent lags in the Lags property. • To construct a univariate model in reduced form, specify 1 for the first coefficient. For numVars-dimensional multivariate models, specify eye(numVars) for the first coefficient. • When you work from a model in difference-equation notation, negate the AR coefficients of the lagged responses to construct the lag-operator polynomial equivalent. For example, consider yt = 0.5yt − 1 − 0.8yt − 2 + εt − 0.6εt − 1 + 0.08εt − 2. The model is in difference-equation form. To convert to an AR model, enter the following into the command window. ar = arma2ar([0.5 -0.8], [-0.6 0.08]);
The ARMA model written in lag-operator notation on page 12-83 is 1 − 0.5L + 0.8L2 yt = 1 − 0.6L + 0.08L2 εt . The AR coefficients of the lagged responses are negated compared to the corresponding coefficients in difference-equation format. In this form, to obtain the same result, enter the following into the command window. 12-80
arma2ar
ar0 = LagOp({1 -0.5 0.8}); ma0 = LagOp({1 -0.6 0.08}); ar = arma2ar(ar0, ma0);
It is a best practice for ar0 to constitute a stationary or unit-root stationary (integrated) time series model. If the ARMA model is strictly an MA model, then specify [] or {} for ar0. ma0 — Moving average coefficients numeric vector | cell vector of square, numeric matrices | LagOp lag operator polynomial object Moving average coefficients of the ARMA(p,q) model, specified as a numeric vector, cell vector of square, numeric matrices, or a LagOp lag operator polynomial object. If ma0 is a vector (numeric or cell), then the coefficient of εt is the identity. To specify a structural MA polynomial (i.e., the coefficient of εt is not the identity), use LagOp lag operator polynomials. • For univariate time series models, ma0 is a numeric vector, cell vector of scalars, or a onedimensional LagOp lag operator polynomial. For vectors, ma0 has length q and the elements correspond to lagged innovations composing the AR polynomial in difference-equation notation. That is, ma0(j) or ma0{j} is the coefficient of εt-j. • For numVars-dimensional time series models, ma0 is a cell vector of numeric numVars-bynumVars numeric matrices or a numVars-dimensional LagOp lag operator polynomial. For cell vectors: • ma0 has length q. • ar0 and ma0 must both contain numVars-by-numVars matrices. • The elements of ma0 correspond to the lagged responses composing the AR polynomial in difference equation notation. That is, ma0{j} is the coefficient matrix of yt-j. • For LagOp lag operator polynomials: • The first element of the Coefficients property corresponds to the coefficient of εt (to accommodate structural models). All other elements correspond to the coefficients of the subsequent lags in the Lags property. • To construct a univariate model in reduced form, specify 1 for the first coefficient. For numVars-dimensional multivariate models, specify eye(numVars) for the first coefficient. It is a best practice for ma0 to constitute an invertible time series model. numLags — Maximum number of lag-term coefficients to return positive integer Maximum number of lag-term coefficients to return, specified as a positive integer. If you specify 'numLags', then arma2ar truncates the output polynomial at a maximum of numLags lag terms, and then returns the remaining coefficients. As a result, the output vector has numLags elements or is at most a degree numLags LagOp lag operator polynomial. By default, arma2ar determines the number of lag coefficients to return by the stopping criteria of mldivide. Data Types: double 12-81
12
Functions
Output Arguments ar — Coefficients of the truncated AR model numeric vector | cell vector of square, numeric matrices | LagOp lag operator polynomial object Coefficients of the truncated AR model approximation of the ARMA model, returned as a numeric vector, cell vector of square, numeric matrices, or a LagOp lag operator polynomial object. ar has numLags elements, or is at most a degree numLags LagOp lag operator polynomial. The data types and orientations of ar0 and ma0 determine the data type and orientation of ar. If ar0 or ma0 are of the same data type or have the same orientation, then ar shares the common data type or orientation. If at least one of ar0 or ma0 is a LagOp lag operator polynomial, then ar is a LagOp lag operator polynomial. Otherwise, if at least one of ar0 or ma0 is a cell vector, then ar is a cell vector. If ar0 and ma0 are cell or numeric vectors and at least one is a row vector, then ar is a row vector. If ar is a cell or numeric vector, then the order of the elements of ar corresponds to the order of the coefficients of the lagged responses in difference-equation notation on page 12-82 starting with the coefficient of yt-1. The resulting AR model is in reduced form. If ar is a LagOp lag operator polynomial, then the order of the coefficients of ar corresponds to the order of the coefficients of the lagged responses in lag operator notation on page 12-83 starting with the coefficient of yt. If Φ0 ≠ InumVars, then the resulting AR model is structural. To view the coefficients in difference-equation notation, pass ar to reflect.
More About Difference-Equation Notation A linear time series model written in difference-equation notation positions the present value of the response and its structural coefficient on the left side of the equation. The right side of the equation contains the sum of the lagged responses, present innovation, and lagged innovations with corresponding coefficients. In other words, a linear time series written in difference-equation notation is Φ0 yt = c + Φ1 yt − 1 + ... + Φp yt − p + Θ0εt + Θ1εt − 1 + ... + Θqεt − q, where • yt is a numVars-dimensional vector representing the responses of numVars variables at time t, for all t and for numVars ≥ 1. • εt is a numVars-dimensional vector representing the innovations at time t. • Φj is the numVars-by-numVars matrix of AR coefficients of the response yt-j, for j = 0,...,p. • Θk is the numVars-by-numVars matrix of MA coefficients of the innovation εt-k., k = 0,...,q. • c is the n-dimensional model constant. • Φ0 = Θ0 = InumVars, which is the numVars-dimensional identity matrix, for models in reduced form.
12-82
arma2ar
Lag Operator Notation A time series model written in lag operator notation positions a p-degree lag operator polynomial on the present response on the left side of the equation. The right side of the equation contains the model constant and a q-degree lag operator polynomial on the present innovation. In other words, a linear time series model written in lag operator notation is Φ(L)yt = c + Θ(L)εt, where • yt is a numVars-dimensional vector representing the responses of numVars variables at time t, for all t and for numVars ≥ 1. • Φ(L) = Φ − Φ L − Φ L2 − ... − Φ Lp, which is the autoregressive, lag operator polynomial. 0 1 2 p • L is the back-shift operator, in other words, L j y = y t − j. t • Φj is the numVars-by-numVars matrix of AR coefficients of the response yt-j, for j = 0,...,p. • εt is a numVars-dimensional vector representing the innovations at time t. • Θ(L) = Θ + Θ L + Θ L2 + ... + Θ Lq, which is the moving average, lag operator polynomial. 0 1 2 q • Θk is the numVars-by-numVars matrix of MA coefficients of the innovation εt-k., k = 0,...,q. • c is the numVars-dimensional model constant. • Φ0 = Θ0 = InumVars, which is the numVars-dimensional identity matrix, for models in reduced form. When comparing lag operator notation to difference-equation notation, the signs of the lagged AR coefficients appear negated relative to the corresponding terms in difference-equation notation. The signs of the moving average coefficients are the same and appear on the same side. For more details on lag operator notation, see “Lag Operator Notation” on page 1-21.
Tips • To accommodate structural ARMA models, specify the input arguments ar0 and ma0 as LagOp lag operator polynomials. • To access the cell vector of the lag operator polynomial coefficients of the output argument ar, enter toCellArray(ar). • To convert the model coefficients of the output argument from lag operator notation on page 1283 to the model coefficients in difference-equation notation on page 12-82, enter arDEN = toCellArray(reflect(ar));
arDEN is a cell vector containing at most numLags + 1 coefficients corresponding to the lag terms in ar.Lags of the AR model equivalent of the input ARMA model in difference-equation notation. The first element is the coefficient of yt, the second element is the coefficient of yt–1, and so on.
Algorithms • The software computes the infinite-lag polynomial of the resulting AR model according to this equation in lag operator notation: 12-83
12
Functions
−1
Θ
(L)Φ(L)yt = εt,
where Φ(L) =
p
∑
j=0
Φ jL j and Θ(L) =
q
∑
k=0
ΘkLk .
• arma2ar approximates the AR model coefficients whether ar0 and ma0 compose a stable polynomial (a polynomial that is stationary or invertible). To check for stability, use isStable. isStable requires a LagOp lag operator polynomial as input. For example, if ar0 is a vector, enter the following code to check ar0 for stationarity. ar0LagOp = LagOp([1 -ar0]); isStable(ar0LagOp)
A 0 indicates that the polynomial is not stable. You can similarly check whether the AR approximation to the ARMA model (ar) is stationary.
Version History Introduced in R2015a
References [1] Box, G. E. P., G. M. Jenkins, and G. C. Reinsel. Time Series Analysis: Forecasting and Control 3rd ed. Englewood Cliffs, NJ: Prentice Hall, 1994. [2] Hamilton, J. D. Time Series Analysis. Princeton, NJ: Princeton University Press, 1994. [3] Lutkepohl, H. New Introduction to Multiple Time Series Analysis. Springer-Verlag, 2007.
See Also Objects LagOp | varm Functions arma2ma | isStable | reflect | estimate | toCellArray | vec2var | var2vec Topics “Lag Operator Notation” on page 1-21
12-84
arma2ma
arma2ma Convert ARMA model to MA model
Syntax ma = arma2ma(ar0,ma0) ma = arma2ma(ar0,ma0,numLags)
Description ma = arma2ma(ar0,ma0) returns the coefficients of the truncated, infinite-order MA model approximation to an ARMA model having AR and MA coefficients specified by ar0 and ma0, respectively. arma2ma: • Accepts: • Vectors or cell vectors of matrices in difference-equation notation on page 12-82. • LagOp lag operator polynomials corresponding to the AR and MA polynomials in lag operator notation on page 12-83. • Accommodates time series models that are univariate or multivariate (i.e., numVars variables compose the model), stationary or integrated, structural or in reduced form, and invertible. • Assumes that the model constant c is 0. ma = arma2ma(ar0,ma0,numLags) returns the first nonzero numLags lag-term coefficients of the infinite-order MA model approximation of an ARMA model having AR coefficients ar0 and MA coefficients ma0.
Examples Convert an ARMA model to an MA Model Find the lag coefficients of the truncated, MA approximation of this univariate, stationary, and invertible ARMA model yt = 0 . 2yt − 1 − 0 . 1yt − 2 + εt + 0 . 5εt − 1 . The ARMA model is in difference-equation notation because the left side contains only yt and its coefficient 1. Create a vector containing the AR lag term coefficients in order starting from t - 1. ar0 = [0.2 -0.1];
Alternatively, you can create a cell vector of the scalar coefficients. Create a vector containing the MA lag term coefficient. ma0 = 0.5;
12-85
12
Functions
Convert the ARMA model to an MA model by obtaining the coefficients of the truncated approximation of the infinite-lag polynomial. ma = arma2ma(ar0,ma0) ma = 1×4 0.7000
0.0400
-0.0620
-0.0164
ma is a numeric vector because ar0 and ma0 are numeric vectors. The approximate MA model truncated at 4 lags is yt = εt + 0 . 7εt − 1 + 0 . 04εt − 2 − 0 . 062εt − 3 − 0 . 0164εt − 4 .
Convert an AR(3) Model to an MA(5) Model Find the first five lag coefficients of the MA approximation of this univariate and stationary AR(3) model yt = − 0 . 2yt − 1 + 0 . 5yt − 3 + εt . The AR model is in difference-equation notation because the left side contains only yt and its coefficient of 1. Create a cell vector containing the AR lag term coefficient in order starting from t - 1. Because the second lag term of the MA model is missing, specify a 0 for its coefficient. ar0 = {-0.2 0 0.5};
Convert the AR model to an MA model with at most five lag coefficients of the truncated approximation of the infinite-lag polynomial. Because there is no MA contribution, specify an empty cell ({}) for the MA coefficients. numLags = 5; ma0 = {}; ma = arma2ma(ar0,ma0,numLags) ma=1×5 cell array {[-0.2000]}
{[0.0400]}
{[0.4920]}
{[-0.1984]}
{[0.0597]}
ma is a cell vector of scalars because at least one of ar0 and ma0 is a cell vector. The approximate MA(5) model is yt = εt − 0 . 2εt − 1 + 0 . 04εt − 2 + 0 . 492εt − 3 − 0 . 1984εt − 4 + 0 . 0597εt − 5
Convert a Structural VARMA model to a Structural VMA model Find the coefficients of the truncated, structural VMA equivalent of the structural, stationary, and invertible VARMA model 12-86
arma2ma
1 0 . 2 −0 . 1 0 . 03 1 −0 . 15 + 0 . 9 −0 . 25 1 1 0 0 −0 . 02 0 . 03 0 1 0 + 0 . 003 0 . 001 0 0 1 0 . 3 0 . 01
0 . 5 −0 . 2 −0 . 1 0 . 05 −0 . 02 −0 . 01 4 −0 . 3 −0 . 1 0 . 1 L + −0 . 1 −0 . 01 −0 . 001 L8 yt = 0 . 4 −0 . 2 −0 . 05 0 . 04 −0 . 02 −0 . 005 0.3 0 . 01 L4 εt 0 . 01
where yt = y1t y2t y3t ′ and εt = ε1t ε2t ε3t ′. The VARMA model is in lag operator notation because the response and innovation vectors are on opposite sides of the equation. Create a cell vector containing the VAR matrix coefficients. Because this model is a structural model, start with the coefficient of yt and enter the rest in order by lag. Construct a vector that indicates the degree of the lag term for the corresponding coefficients. var0 = {[1 0.2 -0.1; 0.03 1 -0.15; 0.9 -0.25 1],... [0.5 -0.2 -0.1; -0.3 -0.1 0.1; 0.4 -0.2 -0.05],... [0.05 -0.02 -0.01; -0.1 -0.01 -0.001; 0.04 -0.02 -0.005]}; var0Lags = [0 4 8];
Create a cell vector containing the VMA matrix coefficients. Because this model is a structural model, start with the coefficient of εt and enter the rest in order by lag. Construct a vector that indicates the degree of the lag term for the corresponding coefficients. vma0 = {eye(3),... [-0.02 0.03 0.3; 0.003 0.001 0.01; 0.3 0.01 0.01]}; vma0Lags = [0 4];
arma2ma requires LagOp lag operator polynomials for input arguments that comprise structural VAR or VMA models. Construct separate LagOp polynomials that describe the VAR and VMA components of the VARMA model. VARLag = LagOp(var0,'Lags',var0Lags); VMALag = LagOp(vma0,'Lags',vma0Lags);
VARLags and VMALags are LagOp lag operator polynomials that describe the VAR and VMA components of the VARMA model. Convert the VARMA model to a VMA model by obtaining the coefficients of the truncated approximation of the infinite-lag polynomial. Specify to return at most 12 lagged terms. numLags = 12; VMA = arma2ma(VARLag,VMALag,numLags) VMA = 3-D Lag Operator Polynomial: ----------------------------Coefficients: [Lag-Indexed Cell Array with 4 Non-Zero Coefficients] Lags: [0 4 8 12] Degree: 12 Dimension: 3
VMA is a LagOP lag operator polynomial. All coefficients except those corresponding to lags 0, 4, 8, and 12 are 3-by-3 matrices of zeros. 12-87
12
Functions
Display the nonzero coefficients of the resulting VMA model. lag2Idx = VMA.Lags + 1; % Lags start at 0. vmaCoeff = toCellArray(VMA);
Add 1 to convert to indices.
for j = 1:numel(lag2Idx) fprintf('___________Lag %d__________\n',lag2Idx(j) - 1) fprintf('%8.3f %8.3f %8.3f \n',vmaCoeff{lag2Idx(j)}) fprintf ('__________________________\n') end ___________Lag 0__________ 0.943 -0.172 0.069
-0.162 1.068 0.144
-0.889 0.421 0.974
__________________________ ___________Lag 4__________ -0.650 0.370 0.383
0.460 0.000 -0.111
0.546 -0.019 -0.312
__________________________ ___________Lag 8__________ 0.431 -0.170 -0.260
-0.138 0.122 0.165
-0.089 0.065 0.089
__________________________ ___________Lag 12__________ -0.216 0.099 0.153
0.078 -0.013 -0.042
0.047 -0.011 -0.026
__________________________
Unconditional Mean of ARMA Model Find the lag coefficients and constant of the truncated MA approximation of this univariate, stationary, and invertible ARMA model yt = 1 . 5 + 0 . 2yt − 1 − 0 . 1yt − 2 + εt + 0 . 5εt − 1 . The ARMA model is in difference-equation notation because the left side contains only yt and its coefficient of 1. Create separate vectors for the AR and MA lag term coefficients in order starting from t - 1. ar0 = [0.2 -0.1]; ma0 = 0.5;
12-88
arma2ma
Convert the ARMA model to an MA model by obtaining the first five coefficients of the truncated approximation of the infinite-lag polynomial. numLags = 5; ar = arma2ma(ar0,ma0,numLags) ar = 1×5 0.7000
0.0400
-0.0620
-0.0164
0.0029
To compute the constant of the MA model, consider the ARMA model in lag operator notation. 1 − 0 . 2L + 0 . 1L2 yt = 1 . 5 + (1 + 0 . 5L)εt or Φ(L)yt = 1 . 5 + Θ(L)εt Part of the conversion involves premultiplying both sides of the equation by the inverse of the AR lag operator polynomial, as in this equation. yt = Φ−1(L)1 . 5 + Φ−1(L)Θ(L)εt To compute the inverse of AR lag operator polynomial, use the lag operator left-division object function mldivide. Phi = LagOp([1 -0.2 0.1]); PhiInv = mldivide(Phi,1,'RelTol',1e-5);
PhiInv is a LagOp lag operator polynomial. The application of lag operator polynomials to constants results in the product of the constant with the sum of the coefficients. Apply PhiInv to the ARMA model constant to obtain the MA model constant. maConstant = 1.5*sum(cell2mat(toCellArray(PhiInv))) maConstant = 1.6667
The approximate MA model is yt = 1 . 667 + 0 . 7εt − 1 + 0 . 04εt − 2 − 0 . 062εt − 3 − 0 . 0164εt − 4 + 0 . 0029εt − 5 + εt . Since the unconditional expected value of all innovations is 0, the unconditional expected value (or mean) of the response series is E yt = 1 . 667 .
Input Arguments ar0 — Autoregressive coefficients numeric vector | cell vector of square, numeric matrices | LagOp lag operator polynomial object 12-89
12
Functions
Autoregressive coefficients of the ARMA(p,q) model, specified as a numeric vector, cell vector of square, numeric matrices, or a LagOp lag operator polynomial object. If ar0 is a vector (numeric or cell), then the coefficient of yt is the identity. To specify a structural AR polynomial (i.e., the coefficient of yt is not the identity), use LagOp lag operator polynomials. • For univariate time series models, ar0 is a numeric vector, cell vector of scalars, or a onedimensional LagOp lag operator polynomial. For vectors, ar0 has length p and the elements correspond to lagged responses composing the AR polynomial in difference-equation notation on page 12-92. That is, ar0(j) or ar0{j} is the coefficient of yt-j. • For numVars-dimensional time series models, ar0 is a cell vector of numVars-by-numVars numeric matrices or a numVars-dimensional LagOp lag operator polynomial. For cell vectors: • ar0 has length p. • ar0 and ma0 must contain numVars-by-numVars matrices. • The elements of ar0 correspond to the lagged responses composing the AR polynomial in difference equation notation. That is, ar0{j} is the coefficient matrix of yt-j. • Row k of an AR coefficient matrix contains the AR coefficients in the equation of the variable yk. Subsequently, column k must correspond to variable yk, and the column and row order of all autoregressive and moving average coefficients must be consistent. • For LagOp lag operator polynomials: • The first element of the Coefficients property corresponds to the coefficient of yt (to accommodate structural models). All other elements correspond to the coefficients of the subsequent lags in the Lags property. • To construct a univariate model in reduced form, specify 1 for the first coefficient. For numVars-dimensional multivariate models, specify eye(numVars) for the first coefficient. • When you work from a model in difference-equation notation, negate the AR coefficient of the lagged terms to construct the lag-operator polynomial equivalent. For example, consider yt = 0.5yt − 1 − 0.8yt − 2 + εt − 0.6εt − 1 + 0.08εt − 2. The model is in difference-equation notation. To convert to an MA model, enter the following into the command window. ma = arma2ma([0.5 -0.8], [-0.6 0.08]);
The ARMA model in lag operator notation is 1 − 0.5L + 0.8L2 yt = 1 − 0.6L + 0.08L2 εt . The AR coefficients of the lagged responses are negated compared to the corresponding coefficients in difference-equation format. In this form, to obtain the same result, enter the following into the command window. ar0 = LagOp({1 -0.5 0.8}); ma0 = LagOp({1 -0.6 0.08}); ma = arma2ma(ar0, ma0);
It is a best practice for ar0 to constitute a stationary or unit-root stationary (integrated) time series model. ma0 — Moving average coefficients numeric vector | cell vector of square, numeric matrices | LagOp lag operator polynomial object Moving average coefficients of the ARMA(p,q) model, specified as a numeric vector, cell vector of square, numeric matrices, or a LagOp lag operator polynomial object. If ma0 is a vector (numeric or cell), then the coefficient of εt is the identity. To specify a structural MA polynomial (i.e., the coefficient of εt is not the identity), use LagOp lag operator polynomials. 12-90
arma2ma
• For univariate time series models, ma0 is a numeric vector, cell vector of scalars, or a onedimensional LagOp lag operator polynomial. For vectors, ma0 has length q and the elements correspond to lagged innovations composing the AR polynomial in difference-equation notation. That is, ma0(j) or ma0{j} is the coefficient of εt-j. • For numVars-dimensional time series models, ma0 is a cell vector of numeric numVars-bynumVars numeric matrices or a numVars-dimensional LagOp lag operator polynomial. For cell vectors: • ma0 has length q. • ar0 and ma0 must both contain numVars-by-numVars matrices. • The elements of ma0 correspond to the lagged responses composing the AR polynomial in difference equation notation. That is, ma0{j} is the coefficient matrix of yt-j. • For LagOp lag operator polynomials: • The first element of the Coefficients property corresponds to the coefficient of εt (to accommodate structural models). All other elements correspond to the coefficients of the subsequent lags in the Lags property. • To construct a univariate model in reduced form, specify 1 for the first coefficient. For numVars-dimensional multivariate models, specify eye(numVars) for the first coefficient. If the ARMA model is strictly an AR model, then specify [] or {}. It is a best practice for ma0 to constitute an invertible time series model. numLags — Maximum number of lag-term coefficients to return positive integer Maximum number of lag-term coefficients to return, specified as a positive integer. If you specify 'numLags', then arma2ma truncates the output polynomial at a maximum of numLags lag terms, and then returns the remaining coefficients. As a result, the output vector has numLags elements or is at most a degree numLags LagOp lag operator polynomial. By default, arma2ma determines the number of lag coefficients to return by the stopping criteria of mldivide. Data Types: double
Output Arguments ma — Lag-term coefficients of the truncated MA model numeric vector | cell vector of square, numeric matrices | LagOp lag operator polynomial object Lag-term coefficients of the truncated MA model approximation of the ARMA model, returned as a numeric vector, cell vector of square, numeric matrices, or a LagOp lag operator polynomial object. ma has numLags elements, or is at most a degree numLags LagOp lag operator polynomial. The data types and orientations of ar0 and ma0 determine the data type and orientation of ma. If ar0 or ma0 are of the same data type or have the same orientation, then ma shares the common data type or orientation. If at least one of ar0 or ma0 is a LagOp lag operator polynomial, then ma is a LagOp lag operator polynomial. Otherwise, if at least one of ar0 or ma0 is a cell vector, then ma is a cell vector. If ar0 and ma0 are cell or numeric vectors and at least one is a row vector, then ma is a row vector. 12-91
12
Functions
If ma is a cell or numeric vector, then the order of the elements of ma corresponds to the order of the coefficients of the lagged innovations in difference-equation notation on page 12-92 starting with the coefficient of εt-1. The resulting MA model is in reduced form. If ma is a LagOp lag operator polynomial, then the order of the coefficients of ma corresponds to the order of the coefficients of the lagged innovations in lag operator notation on page 12-92 starting with the coefficient of εt. If Θ0 ≠ InumVars, then the resulting MA model is structural.
More About Difference-Equation Notation A linear time series model written in difference-equation notation positions the present value of the response and its structural coefficient on the left side of the equation. The right side of the equation contains the sum of the lagged responses, present innovation, and lagged innovations with corresponding coefficients. In other words, a linear time series written in difference-equation notation is Φ0 yt = c + Φ1 yt − 1 + ... + Φp yt − p + Θ0εt + Θ1εt − 1 + ... + Θqεt − q, where • yt is a numVars-dimensional vector representing the responses of numVars variables at time t, for all t and for numVars ≥ 1. • εt is a numVars-dimensional vector representing the innovations at time t. • Φj is the numVars-by-numVars matrix of AR coefficients of the response yt-j, for j = 0,...,p. • Θk is the numVars-by-numVars matrix of MA coefficients of the innovation εt-k., k = 0,...,q. • c is the n-dimensional model constant. • Φ0 = Θ0 = InumVars, which is the numVars-dimensional identity matrix, for models in reduced form. Lag Operator Notation A time series model written in lag operator notation positions a p-degree lag operator polynomial on the present response on the left side of the equation. The right side of the equation contains the model constant and a q-degree lag operator polynomial on the present innovation. In other words, a linear time series model written in lag operator notation is Φ(L)yt = c + Θ(L)εt, where • yt is a numVars-dimensional vector representing the responses of numVars variables at time t, for all t and for numVars ≥ 1. • Φ(L) = Φ − Φ L − Φ L2 − ... − Φ Lp, which is the autoregressive, lag operator polynomial. 0 1 2 p • L is the back-shift operator, in other words, L j y = y t − j. t • Φj is the numVars-by-numVars matrix of AR coefficients of the response yt-j, for j = 0,...,p. • εt is a numVars-dimensional vector representing the innovations at time t. 12-92
arma2ma
• Θ(L) = Θ + Θ L + Θ L2 + ... + Θ Lq, which is the moving average, lag operator polynomial. 0 1 2 q • Θk is the numVars-by-numVars matrix of MA coefficients of the innovation εt-k., k = 0,...,q. • c is the numVars-dimensional model constant. • Φ0 = Θ0 = InumVars, which is the numVars-dimensional identity matrix, for models in reduced form. When comparing lag operator notation to difference-equation notation, the signs of the lagged AR coefficients appear negated relative to the corresponding terms in difference-equation notation. The signs of the moving average coefficients are the same and appear on the same side. For more details on lag operator notation, see “Lag Operator Notation” on page 1-21.
Tips • To accommodate structural ARMA models, specify the input arguments ar0 and ma0 as LagOp lag operator polynomials. • To access the cell vector of the lag operator polynomial coefficients of the output argument ma, enter toCellArray(ma).
Algorithms • The software computes the infinite-lag polynomial of the resulting MA model according to this equation in lag operator notation: yt = Φ−1(L)Θ(L)εt where Φ(L) =
p
∑
j=0
Φ jL j and Θ(L) =
q
∑
k=0
ΘkLk .
• arma2ma approximates the MA model coefficients whether ar0 and ma0 compose a stable polynomial (a polynomial that is stationary or invertible). To check for stability, use isStable. isStable requires a LagOp lag operator polynomial as input. For example, if ar0 is a vector, enter the following code to check ar0 for stationarity. ar0LagOp = LagOp([1 -ar0]); isStable(ar0LagOp)
A 0 indicates that the polynomial is not stable. You can similarly check whether the MA approximation to the ARMA model (ma) is invertible.
Version History Introduced in R2015a
References [1] Box, G. E. P., G. M. Jenkins, and G. C. Reinsel. Time Series Analysis: Forecasting and Control 3rd ed. Englewood Cliffs, NJ: Prentice Hall, 1994. [2] Hamilton, J. D. Time Series Analysis. Princeton, NJ: Princeton University Press, 1994. 12-93
12
Functions
[3] Lutkepohl, H. New Introduction to Multiple Time Series Analysis. Springer-Verlag, 2007.
See Also Objects LagOp | varm Functions estimate | arma2ar | LagOp | isStable | toCellArray | vec2var | var2vec | armairf Topics “Lag Operator Notation” on page 1-21
12-94
armafevd
armafevd Generate or plot ARMA model forecast error variance decomposition (FEVD)
Syntax armafevd(ar0,ma0) armafevd(ar0,ma0,Name=Value) Y = armafevd( ___ ) armafevd(ax, ___ ) [Y,h] = armafevd( ___ )
Description The armafevd function returns or plots the forecast error variance decomposition on page 12-110 of the variables in a univariate or vector (multivariate) autoregressive moving average (ARMA or VARMA) model specified by arrays of coefficients or lag operator polynomials. Alternatively, you can return an FEVD from a fully specified (for example, estimated) model object by using a function in this table. Model Object
IRF function
varm
fevd
vecm
fevd
The FEVD provides information about the relative importance of each innovation in affecting the forecast error variance of all variables in the system. In contrast, the impulse response function (IRF) traces the effects of an innovation shock to one variable on the response of all variables in the system. To estimate IRFs of univariate or multivariate ARMA models, see armairf. armafevd(ar0,ma0) plots, in separate figures, the FEVD of the numVars time series variables that compose an ARMA(p,q) model, with autoregressive (AR) and moving average (MA) coefficients ar0 and ma0, respectively. Each figure corresponds to a variable and contains numVars line plots. The line plots are the FEVDs of that variable, over the forecast horizon, resulting from a one-standarddeviation innovation shock applied to all variables in the system at time 0. The armafevd function: • Accepts vectors or cell vectors of matrices in difference-equation notation on page 12-110 • Accepts LagOp lag operator polynomials corresponding to the AR and MA polynomials in lag operator notation on page 12-111 • Accommodates time series models that are univariate or multivariate, stationary or integrated, structural or in reduced form, and invertible or noninvertible • Assumes that the model constant c is 0 armafevd(ar0,ma0,Name=Value) plots the numVars FEVDs with additional options specified by one or more name-value arguments. For example, NumObs=10,Method="generalized" specifies a 10-period forecast horizon and the estimation of the generalized FEVD. 12-95
12
Functions
Y = armafevd( ___ ) returns the numVars FEVDs using any of the input argument combinations in the previous syntaxes. armafevd(ax, ___ ) plots to the axes specified in ax instead of the axes in new figures. The option ax can precede any of the input argument combinations in the previous syntaxes. [Y,h] = armafevd( ___ ) additionally returns handles to plotted graphics objects. Use elements of h to modify properties of the returned plots.
Examples Plot Orthogonalized FEVD of Univariate ARMA Model Plot the FEVD of the univariate ARMA(2,1) model yt = 0 . 3yt − 1 − 0 . 1yt − 2 + εt + 0 . 05εt − 1 . Create vectors for the autoregressive and moving average coefficients as you encounter them in the model, which is expressed in difference-equation notation. AR0 = [0.3 -0.1]; MA0 = 0.05;
Plot the orthogonalized FEVD of yt. armafevd(AR0,MA0);
12-96
armafevd
Because yt is univariate, the FEVD is trivial.
Plot Orthogonalized FEVDs of VARMA Model Plot the FEVD of the VARMA(3,1) model −0 . 5 0 . 2 0 . 1 −0 . 05 0 . 02 0 . 01 −0 . 02 0 . 03 0 . 3 yt = 0 . 3 0 . 1 −0 . 1 yt − 1 + 0 . 1 0 . 01 0 . 001 yt − 3 + εt + 0 . 003 0 . 001 0 . 01 εt − 1 −0 . 4 0 . 2 0 . 05 −0 . 04 0 . 02 0 . 005 0 . 3 0 . 01 0 . 01 where yt = y1t y2t y3t ′ and εt = ε1t ε2t ε3t ′. The VARMA model is in difference-equation notation because the current response is isolated from all other terms in the equation. Create a cell vector containing the VAR matrix coefficients. The position of the coefficient matrix in the cell vector determines its lag. Therefore, specify a 3-by-3 matrix of zeros as the second element of the vector. var0 = {[-0.5 0.2 0.1; 0.3 0.1 -0.1; -0.4 0.2 0.05],... zeros(3),... [-0.05 0.02 0.01; 0.1 0.01 0.001; -0.04 0.02 0.005]};
Create a cell vector containing the VMA matrix coefficients. vma0 = {[-0.02 0.03 0.3; 0.003 0.001 0.01; 0.3 0.01 0.01]};
Plot the orthogonalized FEVDs of the VARMA model. armafevd(var0,vma0);
12-97
12
Functions
12-98
armafevd
12-99
12
Functions
armafevd returns three figures. Figure k contains the generalized FEVD of variable k to a shock applied to all other variables at time 0. • You can attribute most of the forecast error variance of variable 1 to a shock to variable 1. A shock to variable 2 does not contribute much to the forecast error variance of variable 1. • You can attribute most of the forecast error variance of variable 2 to a shock to variable 2. A shock to variable 3 does not contribute much to the forecast error variance of variable 2. • You can attribute most of the forecast error variance of variable 3 to a shock to variable 3. A shock to variable 2 does not contribute much to the forecast error variance of variable 3.
Plot Generalized FEVDs of Structural VARMA Model in Lag Operator Notation Plot the entire FEVD of the structural VARMA(8,4) model 1
0.2
−0 . 1 0 . 03 1 −0 . 15 − 0 . 9 −0 . 25 1 10 0 −0 . 02 0 . 03 0 1 0 + 0 . 003 0 . 001 00 1 0 . 3 0 . 01
−0 . 5 0 . 2 0 . 1 −0 . 05 0 . 02 0 . 01 4 0 . 3 0 . 1 −0 . 1 L − 0 . 1 0 . 01 0 . 001 L8 yt = −0 . 4 0 . 2 0 . 05 −0 . 04 0 . 02 0 . 005 0.3 0 . 01 L4 εt 0 . 01
where yt = y1t y2t y3t ′ and εt = ε1t ε2t ε3t ′. 12-100
armafevd
The VARMA model is in lag operator notation because the response and innovation vectors are on opposite sides of the equation. Create a cell vector containing the VAR matrix coefficients. Because this model is a structural model in lag operator notation, start with the coefficient of yt and enter the rest in order by lag. Construct a vector that indicates the degree of the lag term for the corresponding coefficients (the structuralcoefficient lag is 0). var0 = {[1 0.2 -0.1; 0.03 1 -0.15; 0.9 -0.25 1],... -[-0.5 0.2 0.1; 0.3 0.1 -0.1; -0.4 0.2 0.05],... -[-0.05 0.02 0.01; 0.1 0.01 0.001; -0.04 0.02 0.005]}; var0Lags = [0 4 8];
Create a cell vector containing the VMA matrix coefficients. Because this model is in lag operator notation, start with the coefficient of εt and enter the rest in order by lag. Construct a vector that indicates the degree of the lag term for the corresponding coefficients. vma0 = {eye(3),... [-0.02 0.03 0.3; 0.003 0.001 0.01; 0.3 0.01 0.01]}; vma0Lags = [0 4];
Construct separate lag operator polynomials that describe the VAR and VMA components of the VARMA model. VARLag = LagOp(var0,Lags=var0Lags); VMALag = LagOp(vma0,Lags=vma0Lags);
Plot the generalized FEVDs of the VARMA model. armafevd(VARLag,VMALag,Method="generalized");
12-101
12
Functions
12-102
armafevd
12-103
12
Functions
armafevd returns three figures. Figure k contains the generalized FEVD of variable k to a shock applied to all other variables at time 0. • You can attribute most of the forecast error variance of variable 1 to a shock to variable 1. Shocks to variables 2 and 3 contribute similarly to the forecast error variance of variable 1. • You can attribute most of the forecast error variance of variable 2 to a shock to variable 2. A shock to variable 3 does not contribute much to the forecast error variance of variable 2. • You can attribute most of the forecast error variance of variable 3 to shocks to variables 1 and 3, each contributing similar amounts. A shock to variable 2 does not contribute much to the forecast error variance of variable 3.
Return VAR Model FEVDs Compute the generalized FEVDs of the two-dimensional VAR(3) model yt =
1 −0 . 2 0 . 75 −0 . 1 0 . 55 −0 . 02 yt − 1 − yt − 2 + yt − 3 + εt . −0 . 1 0 . 3 −0 . 05 0 . 15 −0 . 01 0 . 03
In the equation, yt = [y1, t y2, t]′, εt = [ε1, t ε2, t]′, and, for all t, εt is Gaussian with mean zero and covariance matrix Σ=
12-104
0 . 5 −0 . 1 . −0 . 1 0 . 25
armafevd
Create a cell vector of matrices for the autoregressive coefficients as you encounter them in the model as expressed in difference-equation notation. Specify the innovation covariance matrix. AR1 AR2 AR3 ar0
= = = =
[1 -0.2; -0.1 0.3]; -[0.75 -0.1; -0.05 0.15]; [0.55 -0.02; -0.01 0.03]; {AR1 AR2 AR3};
InnovCov = [0.5 -0.1; -0.1 0.25];
Compute the generalized FEVDs of yt. Because no MA terms exist, specify an empty array ([]) for the second input argument. Y = armafevd(ar0,[],Method="generalized",InnovCov=InnovCov); size(Y) ans = 1×3 31
2
2
Y(10,1,2) ans = 0.1302
Y is a 31-by-2-by-2 array of FEVDs. Rows correspond to times 1 through 31 in the forecast horizon, columns correspond to the variables that armafevd shocks at time 0, and pages correspond to the FEVD of the variables in the system. For example, the contribution to the forecast error variance of variable 2 at time 10 in the forecast horizon, attributable to a shock to variable 1, is Y(10,1,2) = 0.1302. armafevd satisfies the stopping criterion after 31 periods. You can specify to stop sooner using the NumObs name-value argument. This practice is beneficial when the system has many variables. Compute and display the generalized FEVDs for the first 10 periods. Y10 = armafevd(ar0,[],Method="generalized",InnovCov=InnovCov, ... NumObs=10) Y10 = Y10(:,:,1) = 1.0000 0.9912 0.9863 0.9863 0.9873 0.9874 0.9864 0.9864 0.9866 0.9867
0.0800 0.1238 0.1343 0.1341 0.1294 0.1313 0.1342 0.1343 0.1336 0.1336
Y10(:,:,2) = 0.0800 0.1157
1.0000 0.9838
12-105
12
Functions
0.1235 0.1236 0.1237 0.1264 0.1296 0.1298 0.1298 0.1302
0.9737 0.9737 0.9736 0.9709 0.9679 0.9677 0.9677 0.9673
Y10 is a 10-by-2-by-2 array of FEVDs. Rows correspond to times 1 through 10 in the forecast horizon. In all FEVDs, the contributions appear to stabilize before 10 periods elapse. For each variable (page), compute the row sums. sum(Y10,2) ans = ans(:,:,1) = 1.0800 1.1150 1.1206 1.1204 1.1167 1.1187 1.1206 1.1207 1.1202 1.1203 ans(:,:,2) = 1.0800 1.0995 1.0972 1.0973 1.0973 1.0973 1.0975 1.0975 1.0975 1.0975
For generalized FEVDs, forecast error variance contributions at each period in the forecast horizon do not necessarily sum to one. This characteristic is in contrast to orthogonalized FEVDs, in which all rows sum to one.
Input Arguments ar0 — Autoregressive coefficients numeric vector | cell vector of square numeric matrices | LagOp lag operator polynomial object
12-106
armafevd
Autoregressive coefficients of the ARMA(p,q) model, specified as a numeric vector, cell vector of square numeric matrices, or LagOp lag operator polynomial object. If ar0 is a vector (numeric or cell), then the coefficient of yt is the identity (eye(numVars)). For an MA model, specify an empty array or cell ([] or {}). • For univariate time series models, ar0 is a numeric vector, cell vector of scalars, or onedimensional LagOp lag operator polynomial. For vectors, ar0 has length p, and the elements correspond to lagged responses that compose the AR polynomial in difference-equation notation on page 12-110. In other words, ar0(j) or ar0{j} is the coefficient of yt-j, j = 1,…,p. Variance decompositions of univariate models are trivial; see Y. • For numVars-dimensional time series models, ar0 is a cell vector of numVars-by-numVars numeric matrices or a numVars-dimensional LagOp lag operator polynomial. For cell vectors: • ar0 has length p. • ar0 and ma0 each must contain numVars-by-numVars matrices. For each matrix, row k and column k correspond to variable k in the system k = 1,…,numVars. • The elements of ar0 correspond to the lagged responses that compose the AR polynomial in difference equation notation. In other words, ar0{j} is the coefficient matrix of vector yt-j, j = 1,…,p. For all AR coefficient matrices, row k contains the AR coefficients in the equation of the variable ykt, and column k contains the coefficients of variable ykt within the equations. The row and column order of all autoregressive and moving average coefficients must be consistent. • For LagOp lag operator polynomials: • Coefficients in the Coefficients property correspond to the lags of yt in the Lags property. • Specify a model in reduced form by supplying the identity for the first coefficient (eye(numVars)). • armafevd composes the model using lag operator notation on page 12-111. In other words, when you work from a model in difference-equation notation, negate the AR coefficients of the lagged responses to construct the lag operator polynomial equivalent. For example, consider yt = 0.5yt − 1 − 0.8yt − 2 + εt − 0.6εt − 1 + 0.08εt − 2. The model is in difference-equation form. To compute the FEVD, enter the following at the command line. y = armafevd([0.5 -0.8], [-0.6 0.08]);
The ARMA model written in lag operator notation is 1 − 0.5L + 0.8L2 yt = 1 − 0.6L + 0.08L2 εt . The AR coefficients of the lagged responses are negated compared to the corresponding coefficients in difference-equation format. To obtain the same result using lag operator notation, enter the following at the command line. ar0 = LagOp({1 -0.5 0.8}); ma0 = LagOp({1 -0.6 0.08}); y = armafevd(ar0, ma0);
ma0 — Moving average coefficients numeric vector | cell vector of square numeric matrices | LagOp lag operator polynomial object Moving average coefficients of the ARMA(p,q) model, specified as a numeric vector, cell vector of square numeric matrices, or LagOp lag operator polynomial object. If ma0 is a vector (numeric or cell), then the coefficient of εt is the identity (eye(numVars)). For an AR model, specify an empty array or cell ([] or {}). 12-107
12
Functions
• For univariate time series models, ma0 is a numeric vector, cell vector of scalars, or onedimensional LagOp lag operator polynomial. For vectors, ma0 has length q, and the elements correspond to lagged innovations that compose the AR polynomial in difference-equation notation on page 12-110. In other words, ma0(j) or ma0{j} is the coefficient of εt-j, j = 1,…,q. Variance decompositions of univariate models are trivial; see Y. • For numVars-dimensional time series models, ma0 is a cell vector of numeric numVars-bynumVars numeric matrices or a numVars-dimensional LagOp lag operator polynomial. For cell vectors: • ma0 has length q. • ar0 and ma0 each must contain numVars-by-numVars matrices. For each matrix, row k and column k correspond to variable k in the system k = 1,…,numVars. • The elements of ma0 correspond to the lagged responses that compose the MA polynomial in difference-equation notation. In other words, ma0{j} is the coefficient matrix of εt-j, j = 1,…,q. For all MA coefficient matrices, row k contains the MA coefficients in the equation of the variable εkt, and column k contains the coefficients of εkt within the equations. The row and column order of all autoregressive and moving average coefficient matrices must be consistent. • For LagOp lag operator polynomials, coefficients in the Coefficients property correspond to the lags of εt in the Lags property. To specify a model in reduced form, supply the identity (eye(numVars)) for the coefficient that corresponds to lag 0. ax — Axes on which to plot FEVD of each variable vector of Axes objects Axes on which to plot the FEVD of each variable, specified as a vector of Axes objects with length equal to numVars. By default, armafevd plots variance decompositions on axes in separate figures. Name-Value Arguments Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter. Before R2021a, use commas to separate each name and value, and enclose Name in quotes. Example: Method="generalized",NumObs=10 specifies to compute the generalized FEVD of each variable for 10 periods. InnovCov — Covariance matrix numeric scalar | numeric matrix Covariance matrix of the ARMA(p,q) model innovations εt, specified as a numeric scalar or a numVars-by-numVars numeric matrix. InnovCov must be a positive scalar or a positive definite matrix. The default value is eye(numVars). Example: InnovCov=0.2 Data Types: double 12-108
armafevd
NumObs — Forecast horizon positive integer Forecast horizon, or the number of periods for which armafevd computes the FEVD, specified as a positive integer. In other words, NumObs specifies the number of observations to include in the FEVD (the number of rows in Y). By default, armafevd determines NumObs by the stopping criteria of mldivide. Example: NumObs=10 Data Types: double Method — FEVD computation method "orthogonalized" (default) | "generalized" FEVD computation method, specified as a value in this table. Value
Description
"orthogonalized"
Compute variance decompositions using orthogonalized, one-standard-deviation innovation shocks. armafevd uses the Cholesky factorization of InnovCov for orthogonalization.
"generalized"
Compute variance decompositions using onestandard-deviation innovation shocks.
Example: Method="generalized" Data Types: char | string
Output Arguments Y — FEVD ones(numObs,1) | numeric array FEVD of each variable, returned as a column vector of ones or a numeric array. Y(t,j,k) is the contribution to the variance decomposition of variable k attributable to an innovation shock to variable j at time t, for t = 1,2,…,numObs, j = 1,2,...,numVars, and k = 1,2,...,numVars. The columns and pages of Y correspond to the variable order in ar0 and ma0. For univariate models, Y is ones(numObs,1) because the variance decomposition is one for each period in the forecast horizon. h — Handles to plotted graphics objects matrix of graphics objects Handles to plotted graphics objects, returned as a numVars-by-numVars matrix of graphics objects. h(j,k) corresponds to the FEVD of k attributable to an innovation shock to variable j at time 0. h contains unique plot identifiers, which you can use to query or modify properties of the plot.
12-109
12
Functions
More About Difference-Equation Notation A linear time series model written in difference-equation notation positions the present value of the response and its structural coefficient on the left side of the equation. The right side of the equation contains the sum of the lagged responses, present innovation, and lagged innovations with corresponding coefficients. In other words, a linear time series written in difference-equation notation is Φ0 yt = c + Φ1 yt − 1 + ... + Φp yt − p + Θ0εt + Θ1εt − 1 + ... + Θqεt − q, where • yt is a numVars-dimensional vector representing the responses of numVars variables at time t, for all t and for numVars ≥ 1. • εt is a numVars-dimensional vector representing the innovations at time t. • Φj is the numVars-by-numVars matrix of AR coefficients of the response yt-j, for j = 0,...,p. • Θk is the numVars-by-numVars matrix of MA coefficients of the innovation εt-k., k = 0,...,q. • c is the n-dimensional model constant. • Φ0 = Θ0 = InumVars, which is the numVars-dimensional identity matrix, for models in reduced form. Forecast Error Variance Decomposition The forecast error variance decomposition (FEVD) of a multivariate, dynamic system shows the relative importance of a shock to each innovation in affecting the forecast error variance of all variables in the system. Suppose yt is the ARMA(p,q) model containing numVars response variables Φ(L)yt = Θ(L)εt . • Φ(L) is the lag operator polynomial of the autoregressive coefficients, in other words, Φ(L) = Φ0 − Φ1L − Φ2L2 − ... − ΦpLp . • Θ(L) is the lag operator polynomial of the moving average coefficients, in other words, Θ(L) = Θ0 + Θ1L + Θ2L2 + ... + ΘqLq . • εt is the vector of numVars-D series of innovations. Assume that the innovations have zero mean and the constant, positive-definite covariance matrix Σ for all t. The infinite-lag MA representation of yt is yt = Φ−1(L)Θ(L)εt = Ω(L)εt . The general form of the FEVD of ykt (variable k) m periods into the future, attributable to a onestandard-deviation innovation shock to yjt, is m−1
∑
γm jk =
t=0 m−1
∑
t=0
12-110
ek′Cte j
2
. ek′ Ωt ΣΩt′ek
armafevd
• ej is a selection vector of length numVars containing a one in element j and zeros elsewhere. • For orthogonalized FEVDs, Cm = ΩmP, where P is the lower triangular factor in the Cholesky factorization of Σ. • For generalized FEVDs, Cm = σ−1ΩmΣ, where σj is the standard deviation of innovation j. j • The numerator is the contribution of an innovation shock to variable j to the forecast error variance of the m-step ahead forecast of variable k. The denominator is the mean squared error (MSE) of the m-step ahead forecast of variable k [3]. Lag Operator Notation A time series model written in lag operator notation positions a p-degree lag operator polynomial on the present response on the left side of the equation. The right side of the equation contains the model constant and a q-degree lag operator polynomial on the present innovation. In other words, a linear time series model written in lag operator notation is Φ(L)yt = c + Θ(L)εt, where • yt is a numVars-dimensional vector representing the responses of numVars variables at time t, for all t and for numVars ≥ 1. • Φ(L) = Φ − Φ L − Φ L2 − ... − Φ Lp, which is the autoregressive, lag operator polynomial. 0 1 2 p • L is the back-shift operator, in other words, L j y = y t − j. t • Φj is the numVars-by-numVars matrix of AR coefficients of the response yt-j, for j = 0,...,p. • εt is a numVars-dimensional vector representing the innovations at time t. • Θ(L) = Θ + Θ L + Θ L2 + ... + Θ Lq, which is the moving average, lag operator polynomial. 0 1 2 q • Θk is the numVars-by-numVars matrix of MA coefficients of the innovation εt-k., k = 0,...,q. • c is the numVars-dimensional model constant. • Φ0 = Θ0 = InumVars, which is the numVars-dimensional identity matrix, for models in reduced form. When comparing lag operator notation to difference-equation notation, the signs of the lagged AR coefficients appear negated relative to the corresponding terms in difference-equation notation. The signs of the moving average coefficients are the same and appear on the same side. For more details on lag operator notation, see “Lag Operator Notation” on page 1-21.
Tips • To accommodate structural ARMA(p,q) models, supply LagOp lag operator polynomials for the input arguments ar0 and ma0. To specify a structural coefficient when you call LagOp, set the corresponding lag to 0 by using the Lags name-value argument. • For orthogonalized multivariate FEVDs, arrange the variables according to Wold causal ordering [3]: • The first variable (corresponding to the first row and column of both ar0 and ma0) is most likely to have an immediate impact (t = 0) on all other variables. 12-111
12
Functions
• The second variable (corresponding to the second row and column of both ar0 and ma0) is most likely to have an immediate impact on the remaining variables, but not the first variable. • In general, variable j (corresponding to row j and column j of both ar0 and ma0) is the most likely to have an immediate impact on the last numVars – j variables, but not the previous j – 1 variables.
Algorithms • armafevd plots FEVDs only when it returns no output arguments or h. • If Method is "orthogonalized", then armafevd orthogonalizes the innovation shocks by applying the Cholesky factorization of the innovations covariance matrix InnovCov. The covariance of the orthogonalized innovation shocks is the identity matrix, and the FEVD of each variable sums to one, that is, the sum along any row of Y is one. Therefore, the orthogonalized FEVD represents the proportion of forecast error variance attributable to various shocks in the system. However, the orthogonalized FEVD generally depends on the order of the variables. If Method is "generalized", then: • The resulting FEVD is invariant to the order of the variables. • The resulting FEVD is not based on an orthogonal transformation. • The resulting FEVD of a variable sums to one only when InnovCov is diagonal [4]. Therefore, the generalized FEVD represents the contribution to the forecast error variance of equation-wise shocks to the variables in the system. • If InnovCov is a diagonal matrix, then the resulting generalized and orthogonalized FEVDs are identical. Otherwise, the resulting generalized and orthogonalized FEVDs are identical only when the first variable shocks all variables (in other words, all else being the same, both methods yield the same value of Y(:,1,:)).
Version History Introduced in R2018b
References [1] Hamilton, James D. Time Series Analysis. Princeton, NJ: Princeton University Press, 1994. [2] Lütkepohl, H. "Asymptotic Distributions of Impulse Response Functions and Forecast Error Variance Decompositions of Vector Autoregressive Models." Review of Economics and Statistics. Vol. 72, 1990, pp. 116–125. [3] Lütkepohl, Helmut. New Introduction to Multiple Time Series Analysis. New York, NY: SpringerVerlag, 2007. [4] Pesaran, H. H., and Y. Shin. "Generalized Impulse Response Analysis in Linear Multivariate Models." Economic Letters. Vol. 58, 1998, pp. 17–29.
See Also armairf | vec2var | LagOp | mldivide 12-112
armairf
armairf Generate or plot ARMA model impulse responses
Syntax armairf(ar0,ma0) armairf(ar0,ma0,Name=Value) Y = armairf( ___ ) armairf(ax, ___ ) [Y,h] = armairf( ___ )
Description The armairf function returns or plots the impulse response functions on page 12-127 (IRFs) of the variables in a univariate or vector (multivariate) autoregressive moving average (ARMA) model specified by arrays of coefficients or lag operator polynomials. Alternatively, you can return an IRF from a fully specified (for example, estimated) model object by using the function in this table. Model Object
IRF Function
arima
impulse
regARIMA
impulse
varm
irf
vecm
irf
IRFs trace the effects of an innovation shock to one variable on the response of all variables in the system. In contrast, the forecast error variance decomposition (FEVD) provides information about the relative importance of each innovation in affecting all variables in the system. To estimate FEVDs of univariate or multivariate ARMA models, see armafevd. armairf(ar0,ma0) plots, in separate figures, the impulse response function of the numVars time series variables that compose an ARMA(p,q) model. The autoregressive (AR) and moving average (MA) coefficients of the model are ar0 and ma0, respectively. Each figure contains numVars line plots representing the responses of a variable from applying a one-standard-deviation shock, at time 0, to all variables in the system over the forecast horizon. The armairf function: • Accepts vectors or cell vectors of matrices in difference-equation notation on page 12-127 • Accepts LagOp lag operator polynomials corresponding to the AR and MA polynomials in lag operator notation on page 12-128 • Accommodates time series models that are univariate or multivariate, stationary or integrated, structural or in reduced form, and invertible or noninvertible • Assumes that the model constant c is 0 12-113
12
Functions
armairf(ar0,ma0,Name=Value) plots the numVars IRFs with additional options specified by one or more name-value arguments. For example, NumObs=10,Method="generalized" specifies a 10period forecast horizon and the estimation of the generalized IRF. Y = armairf( ___ ) returns the numVars IRFs using any of the input argument combinations in the previous syntaxes. armairf(ax, ___ ) plots to the axes specified in ax instead of the axes in new figures. The option ax can precede any of the input argument combinations in the previous syntaxes. [Y,h] = armairf( ___ ) additionally returns handles to plotted graphics objects. Use elements of h to modify properties of the returned plots.
Examples Plot Orthogonalized IRF of Univariate ARMA Model Plot the entire IRF of the univariate ARMA(2,1) model yt = 0 . 3yt − 1 − 0 . 1yt − 2 + εt + 0 . 05εt − 1 . Create vectors for the autoregressive and moving average coefficients as you encounter them in the model as expressed in difference-equation notation. AR0 = [0.3 -0.1]; MA0 = 0.05;
Plot the orthogonalized IRF of yt. armairf(AR0,MA0);
12-114
armairf
The impulse response fades after four periods. Alternatively, create an ARMA model that represents yt. Specify 1 for the variance of the innovations, and no model constant. Mdl = arima(AR=AR0,MA=MA0,Variance=1,Constant=0);
Mdl is an arima model object. Plot the IRF using Mdl. impulse(Mdl);
12-115
12
Functions
impulse uses a stem plot, whereas armairf uses a line plot. However, the IRFs in the two implementations are equal because the variance of the ARMA model is 1.
Plot Generalized IRF of Univariate ARMA Model Plot the entire generalized IRF of the univariate ARMA(2,1) model (1 − 0 . 3L + 0 . 1L2)yt = (1 + 0 . 05L)εt. Because the model is in lag operator form, create the polynomials using the coefficients as you encounter them in the model. AR0Lag = LagOp([1 -0.3 0.1]) AR0Lag = 1-D Lag Operator Polynomial: ----------------------------Coefficients: [1 -0.3 0.1] Lags: [0 1 2] Degree: 2 Dimension: 1 MA0Lag = LagOp([1 0.05])
12-116
armairf
MA0Lag = 1-D Lag Operator Polynomial: ----------------------------Coefficients: [1 0.05] Lags: [0 1] Degree: 1 Dimension: 1
AR0Lag and MA0Lag are LagOp lag operator polynomials representing the autoregressive and moving average lag operator polynomials, respectively. Plot the generalized IRF by passing in the lag operator polynomials. armairf(AR0Lag,MA0Lag,Method="generalized");
The IRF is equivalent to the IRF in “Plot Orthogonalized IRF of Univariate ARMA Model” on page 12114.
Plot Generalized IRF of VARMA Model Plot the entire IRF of the structural vector autoregression moving average model (VARMA(8,4))
12-117
12
Functions
1 0 . 2 −0 . 1 0 . 03 1 −0 . 15 − 0 . 9 −0 . 25 1 10 0 −0 . 02 0 . 03 0 1 0 + 0 . 003 0 . 001 00 1 0 . 3 0 . 01
−0 . 5 0 . 2 0 . 1 −0 . 05 0 . 02 0 . 01 4 0 . 3 0 . 1 −0 . 1 L − 0 . 1 0 . 01 0 . 001 L8 yt = −0 . 4 0 . 2 0 . 05 −0 . 04 0 . 02 0 . 005 0.3 0 . 01 L4 εt 0 . 01
where yt = y1t y2t y3t ′ and εt = ε1t ε2t ε3t ′. The VARMA model is in lag operator notation because the response and innovation vectors are on opposite sides of the equation. Create a cell vector containing the VAR matrix coefficients. Because this model is a structural model in lag operator notation, start with the coefficient of yt and enter the rest in order by lag. Construct a vector that indicates the degree of the lag term for the corresponding coefficients (the structuralcoefficient lag is 0). var0 = {[1 0.2 -0.1; 0.03 1 -0.15; 0.9 -0.25 1],... -[-0.5 0.2 0.1; 0.3 0.1 -0.1; -0.4 0.2 0.05],... -[-0.05 0.02 0.01; 0.1 0.01 0.001; -0.04 0.02 0.005]}; var0Lags = [0 4 8];
Create a cell vector containing the VMA matrix coefficients. Because this model is in lag operator notation, start with the coefficient of εt and enter the rest in order by lag. Construct a vector that indicates the degree of the lag term for the corresponding coefficients. vma0 = {eye(3),... [-0.02 0.03 0.3; 0.003 0.001 0.01; 0.3 0.01 0.01]}; vma0Lags = [0 4];
Construct separate lag operator polynomials that describe the VAR and VMA components of the VARMA model. VARLag = LagOp(var0,Lags=var0Lags); VMALag = LagOp(vma0,Lags=vma0Lags);
Plot the generalized IRF of the VARMA model. figure; armairf(VARLag,VMALag,Method="generalized");
12-118
armairf
12-119
12
Functions
12-120
armairf
armairf returns three figures. Figure k contains the generalized IRF of variable k to a shock applied to all other variables at time 0. Because all IRFs fade after a finite number of periods, the VARMA model is stable.
Return IRF of ARMA Model Compute the entire orthogonalized IRF of the univariate ARMA(2,1) model yt = 0 . 3yt − 1 − 0 . 1yt − 2 + εt + 0 . 05εt − 1. Create vectors for the autoregressive and moving average coefficients as you encounter them in the model, which is expressed in difference-equation notation. AR0 = [0.3 -0.1]; MA0 = 0.05;
Plot the orthogonalized IRF of yt. y = armairf(AR0,MA0) y = 5×1 1.0000 0.3500
12-121
12
Functions
0.0050 -0.0335 -0.0105
y is a 5-by-1 vector of impulse responses. y(1) is the impulse response for time t = 0, y(2) is the impulse response for time t = 1, and so on. The IRF fades after period t = 4. Alternatively, create an ARMA model that represents yt. Specify 1 for the variance of the innovations, and no model constant. Mdl = arima(AR=AR0,MA=MA0,Variance=1,Constant=0);
Mdl is an arima model object. Plot the IRF of the ARIMA model Mdl. y = impulse(Mdl) y = 5×1 1.0000 0.3500 0.0050 -0.0335 -0.0105
The IRFs in the two implementations are equivalent.
Return IRFs of VAR Model Compute the generalized IRF of the 2-D VAR(3) model yt =
1 −0 . 2 0 . 75 −0 . 1 0 . 55 −0 . 02 yt − 1 − yt − 2 + yt − 3 + εt . −0 . 1 0 . 3 −0 . 05 0 . 15 −0 . 01 0 . 03
In the equation, yt = [y1, t y2, t]′, εt = [ε1, t ε2, t]′, and, for all t, εt is Gaussian with mean zero and covariance matrix Σ=
0 . 5 −0 . 1 . −0 . 1 0 . 25
Create a cell vector of matrices for the autoregressive coefficients as you encounter them in the model as expressed in difference-equation notation. Specify the innovation covariance matrix. AR1 AR2 AR3 ar0
= = = =
[1 -0.2; -0.1 0.3]; -[0.75 -0.1; -0.05 0.15]; [0.55 -0.02; -0.01 0.03]; {AR1 AR2 AR3};
InnovCov = [0.5 -0.1; -0.1 0.25];
12-122
armairf
Compute the entire generalized IRF of yt. Because no MA terms exist, specify an empty array ([]) for the second input argument. Y = armairf(ar0,[],Method="generalized",InnovCov=InnovCov); size(Y) ans = 1×3 31
2
2
Y(10,1,2) ans = -0.0116
Y is a 31-by-2-by-2 array of impulse responses. Rows correspond to times 0 through 30 in the forecast horizon, columns correspond to the variables that armairf shocks at time 0, and pages correspond to the impulse response of the variables in the system. For example, the generalized impulse response of variable 2 at time 10 in the forecast horizon, when variable 1 is shocked at time 0, is Y(11,1,2) = -0.0116. armairf satisfies the stopping criterion after 31 periods. You can specify to stop sooner using the NumObs name-value argument. This practice is beneficial when the system has many variables. Compute and display the generalized impulse responses for the first 10 periods. Y10 = armairf(ar0,[],Method="generalized",InnovCov=InnovCov, ... NumObs=10) Y10 = Y10(:,:,1) = 0.7071 0.7354 0.2135 0.0526 0.2929 0.3717 0.1872 0.0730 0.1360 0.1841
-0.2000 -0.3000 -0.1340 -0.0112 -0.0772 -0.1435 -0.0936 -0.0301 -0.0388 -0.0674
Y10(:,:,2) = -0.1414 -0.1131 -0.0509 0.0058 0.0040 -0.0300 -0.0325 -0.0082 -0.0001 -0.0116
0.5000 0.1700 -0.0040 -0.0113 -0.0003 0.0100 0.0133 0.0054 -0.0003 0.0028
12-123
12
Functions
Y10 is a 10-by-2-by-2 array of impulse responses. Rows correspond to times 0 through 9 in the forecast horizon. The impulse responses appear to fade with increasing time, which suggests a stable system. Copyright 2018 The MathWorks, Inc.
Input Arguments ar0 — Autoregressive coefficients numeric vector | cell vector of square numeric matrices | LagOp lag operator polynomial object Autoregressive coefficients of the ARMA(p,q) model, specified as a numeric vector, cell vector of square numeric matrices, or LagOp lag operator polynomial object. If ar0 is a vector (numeric or cell), then the coefficient of yt is the identity (eye(numVars)). For an MA model, specify an empty array or cell ([] or {}). • For univariate time series models, ar0 is a numeric vector, cell vector of scalars, or onedimensional LagOp lag operator polynomial. For vectors, ar0 has length p, and the elements correspond to lagged responses that compose the AR polynomial in difference-equation notation on page 12-127. In other words, ar0(j) or ar0{j} is the coefficient of yt-j, j = 1,…,p. • For numVars-dimensional time series models, ar0 is a cell vector of numVars-by-numVars numeric matrices or a numVars-dimensional LagOp lag operator polynomial. For cell vectors: • ar0 has length p. • ar0 and ma0 each must contain numVars-by-numVars matrices. For each matrix, row k and column k correspond to variable k in the system k = 1,…,numVars. • The elements of ar0 correspond to the lagged responses that compose the AR polynomial in difference equation notation. In other words, ar0{j} is the coefficient matrix of vector yt-j, j = 1,…,p. For all AR coefficient matrices, row k contains the AR coefficients in the equation of the variable ykt, and column k contains the coefficients of variable ykt within the equations. The row and column order of all autoregressive and moving average coefficients must be consistent. • For LagOp lag operator polynomials: • Coefficients in the Coefficients property correspond to the lags of yt in the Lags property. • Specify a model in reduced form by supplying the identity for the first coefficient (eye(numVars)). • armairf composes the model using lag operator notation on page 12-128. In other words, when you work from a model in difference-equation notation, negate the AR coefficients of the lagged responses to construct the lag operator polynomial equivalent. For example, consider yt = 0.5yt − 1 − 0.8yt − 2 + εt − 0.6εt − 1 + 0.08εt − 2. The model is in differenceequation form. To compute the impulse responses, enter the following in the command line. ar0 = [0.5 -0.8]; ma0 = [-0.6 0.08]; y = armairf(ar0,ma0);
The ARMA model written in lag-operator notation on page 12-128 is 1 − 0.5L + 0.8L2 yt = 1 − 0.6L + 0.08L2 εt . The AR coefficients of the lagged responses are negated
12-124
armairf
compared to the corresponding coefficients in difference-equation format. To obtain the same result using lag operator notation, enter the following in the command line. ar0 = LagOp({1 -0.5 0.8}); ma0 = LagOp({1 -0.6 0.08}); y = armairf(ar0, ma0);
ma0 — Moving average coefficients numeric vector | cell vector of square numeric matrices | LagOp lag operator polynomial object Moving average coefficients of the ARMA(p,q) model, specified as a numeric vector, cell vector of square numeric matrices, or LagOp lag operator polynomial object. If ma0 is a vector (numeric or cell), then the coefficient of εt is the identity (eye(numVars)). For an AR model, specify an empty array or cell ([] or {}). • For univariate time series models, ma0 is a numeric vector, cell vector of scalars, or onedimensional LagOp lag operator polynomial. For vectors, ma0 has length q, and the elements correspond to lagged innovations that compose the AR polynomial in difference-equation notation on page 12-127. In other words, ma0(j) or ma0{j} is the coefficient of εt-j, j = 1,…,q. • For numVars-dimensional time series models, ma0 is a cell vector of numeric numVars-bynumVars numeric matrices or a numVars-dimensional LagOp lag operator polynomial. For cell vectors: • ma0 has length q. • ar0 and ma0 each must contain numVars-by-numVars matrices. For each matrix, row k and column k correspond to variable k in the system k = 1,…,numVars. • The elements of ma0 correspond to the lagged responses that compose the MA polynomial in difference-equation notation. In other words, ma0{j} is the coefficient matrix of εt-j, j = 1,…,q. For all MA coefficient matrices, row k contains the MA coefficients in the equation of the variable εkt, and column k contains the coefficients of εkt within the equations. The row and column order of all autoregressive and moving average coefficient matrices must be consistent. • For LagOp lag operator polynomials, coefficients in the Coefficients property correspond to the lags of εt in the Lags property. To specify a model in reduced form, supply the identity (eye(numVars)) for the coefficient that corresponds to lag 0. ax — Axes on which to plot IRF of each variable vector of Axes objects Axes on which to plot the IRF of each variable, specified as a vector of Axes objects with length equal to numVars. By default, armairf plots impulse responses on axes in separate figures. Name-Value Arguments Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter. Before R2021a, use commas to separate each name and value, and enclose Name in quotes. 12-125
12
Functions
Example: Method="generalized",NumObs=10 specifies to compute the generalized IRF for 10 periods. InnovCov — Covariance matrix numeric scalar | numeric matrix Covariance matrix of the ARMA(p,q) model innovations εt, specified as a numeric scalar or a numVars-by-numVars numeric matrix. InnovCov must be a positive scalar or a positive definite matrix. The default value is eye(numVars). Example: InnovCov=0.2 Data Types: double NumObs — Forecast horizon positive integer Forecast horizon, or the number of periods for which armairf computes the IRF, specified as a positive integer. NumObs specifies the number of observations to include in the IRF (the number of rows in Y). By default, armairf determines NumObs by the stopping criteria of mldivide. Example: NumObs=10 Data Types: double Method — IRF computation method "orthogonalized" (default) | "generalized" IRF computation method, specified as a value in this table. Value
Description
"orthogonalized"
Compute impulse responses using orthogonalized, one-standard-deviation innovation shocks. armairf uses the Cholesky factorization of InnovCov for orthogonalization.
"generalized"
Compute impulse responses using one-standarddeviation innovation shocks.
Example: 'Method',"generalized" Data Types: string
Output Arguments Y — Impulse responses numeric column vector | numeric array Impulse responses, returned as a numeric column vector or numeric array. Y(t + 1,j,k) is the impulse response of variable k to a one-standard-deviation innovation shock to variable j at time 0, for t = 0, 1, ..., numObs – 1, j = 1,2,...,numVars, and k = 1,2,...,numVars. The columns and pages of Y correspond to the variable order in ar0 and ma0. 12-126
armairf
h — Handles to plotted graphics objects matrix of graphics objects Handles to plotted graphics objects, returned as a numVars-by-numVars matrix of graphics objects. h(j,k) corresponds to the IRF of variable k attributable to an innovation shock to variable j at time 0. h contains unique plot identifiers, which you can use to query or modify properties of the plot.
More About Difference-Equation Notation A linear time series model written in difference-equation notation positions the present value of the response and its structural coefficient on the left side of the equation. The right side of the equation contains the sum of the lagged responses, present innovation, and lagged innovations with corresponding coefficients. In other words, a linear time series written in difference-equation notation is Φ0 yt = c + Φ1 yt − 1 + ... + Φp yt − p + Θ0εt + Θ1εt − 1 + ... + Θqεt − q, where • yt is a numVars-dimensional vector representing the responses of numVars variables at time t, for all t and for numVars ≥ 1. • εt is a numVars-dimensional vector representing the innovations at time t. • Φj is the numVars-by-numVars matrix of AR coefficients of the response yt-j, for j = 0,...,p. • Θk is the numVars-by-numVars matrix of MA coefficients of the innovation εt-k., k = 0,...,q. • c is the n-dimensional model constant. • Φ0 = Θ0 = InumVars, which is the numVars-dimensional identity matrix, for models in reduced form. Impulse Response Function An impulse response function (IRF) of a time series model (or dynamic response of the system) measures the changes in the future responses of all variables in the system when a variable is shocked by an impulse. Suppose yt is the ARMA(p,q) model containing numVars response variables Φ(L)yt = Θ(L)εt . • Φ(L) is the lag operator polynomial of the autoregressive coefficients, in other words, Φ(L) = Φ0 − Φ1L − Φ2L2 − ... − ΦpLp . • Θ(L) is the lag operator polynomial of the moving average coefficients, in other words, Θ(L) = Θ0 + Θ1L + Θ2L2 + ... + ΘqLq . • εt is the vector of numVars innovations at time t. Assume that the innovations have zero mean and the constant, positive-definite covariance matrix Σ for all t. The infinite-lag MA representation of yt is 12-127
12
Functions
yt = Φ−1(L)Θ(L)εt = Ω(L)εt . The general form of the IRF of yt shocked by an impulse to variable j by one standard deviation of its innovation m periods into the future is ψ j(m) = Cme j . • ej is a selection vector of length numVars containing a one in element j and zeros elsewhere. • For the orthogonalized IRF, Cm = ΩmP, where P is the lower triangular factor in the Cholesky factorization of Σ. • For the generalized IRF, Cm = σ−1ΩmΣ, where σj is the standard deviation of innovation j. j Lag Operator Notation A time series model written in lag operator notation positions a p-degree lag operator polynomial on the present response on the left side of the equation. The right side of the equation contains the model constant and a q-degree lag operator polynomial on the present innovation. In other words, a linear time series model written in lag operator notation is Φ(L)yt = c + Θ(L)εt, where • yt is a numVars-dimensional vector representing the responses of numVars variables at time t, for all t and for numVars ≥ 1. • Φ(L) = Φ − Φ L − Φ L2 − ... − Φ Lp, which is the autoregressive, lag operator polynomial. 0 1 2 p • L is the back-shift operator, in other words, L j y = y t − j. t • Φj is the numVars-by-numVars matrix of AR coefficients of the response yt-j, for j = 0,...,p. • εt is a numVars-dimensional vector representing the innovations at time t. • Θ(L) = Θ + Θ L + Θ L2 + ... + Θ Lq, which is the moving average, lag operator polynomial. 0 1 2 q • Θk is the numVars-by-numVars matrix of MA coefficients of the innovation εt-k., k = 0,...,q. • c is the numVars-dimensional model constant. • Φ0 = Θ0 = InumVars, which is the numVars-dimensional identity matrix, for models in reduced form. When comparing lag operator notation to difference-equation notation, the signs of the lagged AR coefficients appear negated relative to the corresponding terms in difference-equation notation. The signs of the moving average coefficients are the same and appear on the same side. For more details on lag operator notation, see “Lag Operator Notation” on page 1-21.
Tips • To compute forecast error impulse responses, use the default value of InnovCov, which is a numVars-by-numVars identity matrix. In this case, all available computation methods (see Method) result in equivalent IRFs. • To accommodate structural ARMA(p,q) models, supply LagOp lag operator polynomials for the input arguments ar0 and ma0. To specify a structural coefficient when you call LagOp, set the corresponding lag to 0 by using the Lags name-value argument. 12-128
armairf
• For multivariate orthogonalized IRFs, arrange the variables according to Wold causal ordering [2]: • The first variable (corresponding to the first row and column of both ar0 and ma0) is most likely to have an immediate impact (t = 0) on all other variables. • The second variable (corresponding to the second row and column of both ar0 and ma0) is most likely to have an immediate impact on the remaining variables, but not the first variable. • In general, variable j (corresponding to row j and column j of both ar0 and ma0) is the most likely to have an immediate impact on the last numVars – j variables, but not the previous j – 1 variables.
Algorithms • If Method is "orthogonalized", then the resulting IRF depends on the order of the variables in the time series model. If Method is "generalized", then the resulting IRF is invariant to the order of the variables. Therefore, the two methods generally produce different results. • If InnovCov is a diagonal matrix, then the resulting generalized and orthogonal IRFs are identical. Otherwise, the resulting generalized and orthogonal IRFs are identical only when the first variable shocks all variables (that is, all else being the same, both methods yield the same Y(:,1,:)).
Version History Introduced in R2015b R2018b: armairf returns separate figures for each variable in the system Behavior changed in R2018b armairf now plots, in separate figures, the impulse response function (IRF) of the numVars variables in a system. Each figure contains numVars line plots representing the responses of a variable to a shock to all variables in the system at time 0. In previous releases, armairf returned one figure containing separate subplots for each variable. R2018b: armairf permutes the second and third dimensions of the impulse response array output Behavior changed in R2018b When you estimate and return a multivariate impulse response function (IRF) by using armairf: • The numVars columns of the 3-D array output now correspond to the numVars variables in the system receiving a shock at time 0. • The numVars pages of the 3-D array output now correspond to the IRF of the numVars variables in the system. In other words, element t,j,k of the returned 3-D array is the IRF of variable k to a shock to variable j at time 0, for t = 1,…,numObs. numObs is the number of observations in the IRF representing times 0, …,numObs – 1, respectively, and j,k = 1,…,numVars. In previous releases, the numVars columns of the 3-D array output of armairf corresponded to the IRF of the numVars variables in the system. Whereas, pages corresponded to the numVars variables in the system receiving a shock at time 0. 12-129
12
Functions
If you index the columns or pages of the 3-D array output, and you want to update your code so that you obtain results in the same format, permute the column and page indices. For example, suppose YNew is the IRF returned by armairf in R2018b or a later release. To reproduce the formatted results in previous releases, rearrange the columns and pages of YNew by using permute. For example:
numVars = 4; % 4-D system ar0 = {randn(numVars)}; % Random-valued, lag 1 coefficient YNew = armairf(ar0,[],NumObs=10); % New behavior - Columns correspond to shocked variables in IRF YOld = permute(YNew,[1 3 2]); % Previous behavior - Pages correspond to shocked variables i
References [1] Hamilton, James D. Time Series Analysis. Princeton, NJ: Princeton University Press, 1994. [2] Lütkepohl, Helmut. New Introduction to Multiple Time Series Analysis. New York, NY: SpringerVerlag, 2007. [3] Pesaran, H. H., and Y. Shin. "Generalized Impulse Response Analysis in Linear Multivariate Models." Economic Letters. Vol. 58, 1998, pp. 17–29.
See Also armafevd | impulse | vec2var | LagOp | mldivide | arima Topics “Generate VAR Model Impulse Responses” on page 9-48 “Compare Generalized and Orthogonalized Impulse Response Functions” on page 9-52 “Generate VEC Model Impulse Responses” on page 9-132
12-130
asymptotics
asymptotics Determine Markov chain asymptotics
Syntax xFix = asymptotics(mc) [xFix,tMix] = asymptotics(mc)
Description xFix = asymptotics(mc) returns the stationary distribution xFix of the discrete-time Markov chain mc. [xFix,tMix] = asymptotics(mc) additionally returns an estimate of the mixing time tMix.
Examples Determine Markov Chain Stationary Distribution Consider this theoretical, right-stochastic transition matrix of a stochastic process. 0 0 0 P= 0 0 1/2 1/4
0 0 0 0 0 1/2 3/4
1/2 1/3 0 0 0 0 0
1/4 0 0 0 0 0 0
1/4 2/3 0 0 0 0 0
0 0 1/3 1/2 3/4 0 0
0 0 2/3 1/2 . 1/4 0 0
Create the Markov chain that is characterized by the transition matrix P. P = [ 0 0 1/2 1/4 1/4 0 0 ; 0 0 1/3 0 2/3 0 0 ; 0 0 0 0 0 1/3 2/3; 0 0 0 0 0 1/2 1/2; 0 0 0 0 0 3/4 1/4; 1/2 1/2 0 0 0 0 0 ; 1/4 3/4 0 0 0 0 0 ]; mc = dtmc(P);
Plot a directed graph of the Markov chain. Indicate the probability of transition by using edge colors. figure; graphplot(mc,'ColorEdges',true);
12-131
12
Functions
Determine the stationary distribution of the Markov chain. xFix = asymptotics(mc) xFix = 1×7 0.1300
0.2034
0.1328
0.0325
0.1681
0.1866
0.1468
Because xFix is a row vector, it is the unique stationary distribution of mc.
Estimate Markov Chain Mixing Time Create a five-state transition matrix of empirical counts by generating a block diagonal matrix composed of discrete uniform draws. m = 100; % Maximal count rng(1); % For reproducibility P = blkdiag(randi(100,2) + 1,randi(100,3) + 1) P = 5×5 43 74 0
12-132
2 32 0
0 0 16
0 0 36
0 0 43
asymptotics
0 0
0 0
11 20
41 55
70 22
Create and plot a digraph of the Markov chain that is characterized by the transition matrix P. mc = dtmc(P); figure; graphplot(mc)
Determine the stationary distribution and mixing time of the Markov chain. [xFix,tMix] = asymptotics(mc) xFix = 2×5 0.9401 0
0.0599 0
0 0.1497
0 0.4378
0 0.4125
tMix = 0.8558
Rows of xFix correspond to the stationary distributions of the two independent recurrent classes of mc. Create separate Markov chains representing the recurrent subchains of mc. 12-133
12
Functions
mc1 = subchain(mc,1); mc2 = subchain(mc,3);
mc1 and mc2 are dtmc objects. mc1 is the recurrent class containing state 1, and mc2 is the recurrent class containing state 3. Compare the mixing times of the subchains. [x1,t1] = asymptotics(mc1) x1 = 1×2 0.9401
0.0599
t1 = 0.7369 [x2,t2] = asymptotics(mc2) x2 = 1×3 0.1497
0.4378
0.4125
t2 = 0.8558
mc1 approaches its stationary distribution more quickly than mc2. Determine Dumbbell Chain Mixing Time Create a "dumbbell" Markov chain containing 10 states in each "weight" and three states in the "bar." • Specify random transition probabilities between states within each weight. • If the Markov chain reaches the state in a weight that is closest to the bar, then specify a high probability of transitioning to the bar. • Specify uniform transitions between states in the bar. rng(1,"twister"); % For reproducibility w = 10; % Dumbbell weights DBar = [0 1 0; 1 0 1; 0 1 0]; % Dumbbell bar DB = blkdiag(rand(w),DBar,rand(w)); % Transition matrix % Connect dumbbell weights and bar DB(w,w+1) = 1; DB(w+1,w) = 1; DB(w+3,w+4) = 1; DB(w+4,w+3) = 1; mc = dtmc(DB);
Visualize the transition matrix using a heatmap. figure imagesc(mc.P) axis square colorbar
12-134
asymptotics
Plot a directed graph of the Markov chain. Suppress node labels. figure h = graphplot(mc); h.NodeLabel = {};
12-135
12
Functions
Plot the eigenvalues of the dumbbell chain. figure eigplot(mc)
12-136
asymptotics
The thin, red disc in the plot shows the spectral gap (the difference between the two largest eigenvalue moduli). The spectral gap determines the mixing time of the Markov chain. Large gaps indicate faster mixing, whereas thin gaps indicate slower mixing. In this case, the spectral gap is thin, indicating a long mixing time. Estimate the mixing time of the dumbbell chain and determine whether the chain is ergodic. [~,tMix] = asymptotics(mc) tMix = 85.3258 tf = isergodic(mc) tf = logical 1
On average, the time it takes for the total variation distance between any initial distribution and the stationary distribution to decay by a factor of e1 is about 85 steps.
Input Arguments mc — Discrete-time Markov chain dtmc object 12-137
12
Functions
Discrete-time Markov chain with NumStates states and transition matrix P, specified as a dtmc object. P must be fully specified (no NaN entries).
Output Arguments xFix — Stationary distribution nonnegative numeric matrix Stationary distribution, with xFix*P = xFix, returned as a nonnegative numeric matrix with NumStates columns. The number of rows of xFix is the number of independent recurrent classes in mc. • For unichains, the distribution is unique, and xFix is a 1-by-NumStates vector. • Otherwise, each row of xFix represents a distinct stationary distribution in mc. tMix — Mixing time positive numeric scalar Mixing time, returned as a positive numeric scalar. If μ, the second largest eigenvalue modulus (SLEM) of P, exists and is nonzero, then the estimated mixing time is −1/log μ . Note • If P is a nonnegative stochastic matrix, then the Markov chain mc it characterizes has a left eigenvector xFix with eigenvalue 1. The Perron-Frobenius Theorem [2] implies that if mc is a unichain (a chain with a single recurrent communicating class), then xFix is unique. For reducible chains with multiple recurrent classes, eigenvalue 1 has higher multiplicity, and xFix is nonunique. If a chain is periodic, xFix is stationary but not limiting because arbitrary initial distributions do not converge to it. xFix is both unique and limiting for ergodic chains only. See classify. • For ergodic chains, tMix is a characteristic time for any initial distribution to converge to xFix. Specifically, it is the time for the total variation distance between an initial distribution and xFix to decay by a factor of e = exp(1). Mixing times are a measure of the relative connectivity of transition structures in different chains.
Version History Introduced in R2017b
References [1] Gallager, R.G. Stochastic Processes: Theory for Applications. Cambridge, UK: Cambridge University Press, 2013. [2] Horn, R., and C. R. Johnson. Matrix Analysis. Cambridge, UK: Cambridge University Press, 1985. [3] Seneta, E. Non-negative Matrices and Markov Chains. New York, NY: Springer-Verlag, 1981. 12-138
asymptotics
See Also Objects dtmc Functions isreducible | isergodic | classify | eigplot Topics “Markov Chain Modeling” on page 10-8 “Create and Modify Markov Chain Model Objects” on page 10-17 “Visualize Markov Chain Structure and Evolution” on page 10-27 “Determine Asymptotic Behavior of Markov Chain” on page 10-39
12-139
12
Functions
autocorr Sample autocorrelation
Syntax [acf,lags] = autocorr(y) ACFTbl = autocorr(Tbl) [ ___ ,bounds] = autocorr( ___ ) [ ___ ] = autocorr( ___ ,Name=Value) autocorr( ___ ) autocorr(ax, ___ ) [ ___ ,h] = autocorr( ___ )
Description [acf,lags] = autocorr(y) returns the sample autocorrelation function on page 12-151 (ACF) acf and associated lags lags of the univariate time series y. ACFTbl = autocorr(Tbl) returns the table ACFTbl containing variables for the sample ACF and associated lags of the last variable in the input table or timetable Tbl. To select a different variable in Tbl, for which to compute the ACF, use the DataVariable name-value argument. [ ___ ,bounds] = autocorr( ___ ) uses any input-argument combination in the previous syntaxes, and returns the output-argument combination for the corresponding input arguments and the approximate upper and lower confidence bounds bounds on the ACF. [ ___ ] = autocorr( ___ ,Name=Value) uses additional options specified by one or more namevalue arguments. For example, autocorr(Tbl,DataVariable="RGDP",NumLags=10,NumSTD=1.96) returns 10 lags of the sample ACF of the table variable "RGDP" in Tbl and 95% confidence bounds. autocorr( ___ ) plots the sample ACF of the input series with confidence bounds. autocorr(ax, ___ ) plots on the axes specified by ax instead of the current axes (gca). ax can precede any of the input argument combinations in the previous syntaxes. [ ___ ,h] = autocorr( ___ ) plots the sample ACF of the input series and additionally returns handles to plotted graphics objects. Use elements of h to modify properties of the plot after you create it.
Examples Return ACF on Vector of Time Series Data Compute the ACF of a univariate time series. Input the time series data as a numeric vector. Load the quarterly real GDP series in Data_GDP.mat. Plot the series, which is stored in the numeric vector Data. 12-140
autocorr
load Data_GDP plot(Data)
The series exhibits exponential growth. Compute the returns of the series. ret = price2ret(Data);
ret is a series of real GDP returns; it has one less observation than the real GDP series. Compute the ACF of the real GDP returns, and return the associated lags. [acf,lags] = autocorr(ret); [acf lags] ans = 21×2 1.0000 0.3329 0.1836 -0.0216 -0.1172 -0.1632 -0.0870 -0.0707 -0.0380 0.0554
0 1.0000 2.0000 3.0000 4.0000 5.0000 6.0000 7.0000 8.0000 9.0000
12-141
12
Functions
⋮
Let yt be the real GDP return at time t. In general, acf(j) = Corr( yt,yt − lags j ). Therefore, acf(1) = Corr( yt, yt) = 1.0000, acf(2) = Corr( yt,yt − 1) = 0.3329, and so on.
Compute ACF of Table Variable Compute the ACF of a time series, which is one variable in a table. Load the electricity spot price data set Data_ElectricityPrices.mat, which contains the daily spot prices in the timetable DataTimeTable. load Data_ElectricityPrices.mat DataTimeTable.Properties.VariableNames ans = 1x1 cell array {'SpotPrice'}
Plot the series. plot(DataTimeTable.SpotPrice)
The time series plot does not clearly indicate an exponential trend or unit root. 12-142
autocorr
Compute the ACF of the raw spot price series. ACFTbl = autocorr(DataTimeTable) ACFTbl=21×2 table Lags ACF ____ _______ 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 ⋮
1 0.55405 0.38251 0.31713 0.25107 0.21436 0.21275 0.19396 0.18292 0.18826 0.19476 0.19043 0.19963 0.19397 0.19957 0.25495
autocorr returns the results in the table ACFTbl, where variables correspond to the ACF (ACF) and associated lags (Lags). By default, autocorr computes the ACF of the last variable in the table. To select a variable from an input table, set the DataVariable option.
Return ACF Confidence Bounds Consider the electricity spot prices in “Compute ACF of Table Variable” on page 12-142. Load the electricity spot price data set Data_ElectricityPrices.mat. Compute the ACF and return the ACF confidence bounds. load Data_ElectricityPrices [ACFTbl,bounds] = autocorr(DataTimeTable) ACFTbl=21×2 table Lags ACF ____ _______ 0 1 2 3 4 5 6 7
1 0.55405 0.38251 0.31713 0.25107 0.21436 0.21275 0.19396
12-143
12
Functions
8 9 10 11 12 13 14 15 ⋮
0.18292 0.18826 0.19476 0.19043 0.19963 0.19397 0.19957 0.25495
bounds = 2×1 0.0532 -0.0532
Assuming the spot prices follow a Gaussian white noise series, an approximate 95.4% confidence interval on the ACF is (-0.0532, 0.0532).
Compare the ACF for Normalized and Unnormalized Series Although various estimates of the sample autocorrelation function exist, autocorr uses the form in Box, Jenkins, and Reinsel, 1994. In their estimate, they scale the correlation at each lag by the sample variance (var(y,1)) so that the autocorrelation at lag 0 is unity. However, certain applications require rescaling the normalized ACF by another factor. Simulate 1000 observations from the standard Gaussian distribution. rng(1); % For reproducibility y = randn(1000,1);
Compute the normalized and unnormalized sample ACF. [normalizedACF, lags] = autocorr(y,NumLags=10); unnormalizedACF = normalizedACF*var(y,1);
Compare the first 10 lags of the sample ACF with and without normalization. [lags normalizedACF unnormalizedACF] ans = 11×3 0 1.0000 2.0000 3.0000 4.0000 5.0000 6.0000 7.0000 8.0000 9.0000 ⋮
12-144
1.0000 -0.0180 0.0536 -0.0206 -0.0300 -0.0086 -0.0108 -0.0116 0.0309 0.0341
0.9960 -0.0180 0.0534 -0.0205 -0.0299 -0.0086 -0.0107 -0.0116 0.0307 0.0340
autocorr
Plot ACF of Simulated Time Series Specify the MA(2) model: yt = εt − 0 . 5εt − 1 + 0 . 4εt − 2, where εt is Gaussian with mean 0 and variance 1. rng(1); % For reproducibility Mdl = arima(MA={-0.5 0.4},Constant=0,Variance=1) Mdl = arima with properties: Description: SeriesName: Distribution: P: D: Q: Constant: AR: SAR: MA: SMA: Seasonality: Beta: Variance:
"ARIMA(0,0,2) Model (Gaussian Distribution)" "Y" Name = "Gaussian" 0 0 2 0 {} {} {-0.5 0.4} at lags [1 2] {} 0 [1×0] 1
Simulate 1000 observations from Mdl. y = simulate(Mdl,1000);
Plot the ACF of the simulated series. Specify that the series is an MA(2) process. autocorr(y,NumMA=2)
12-145
12
Functions
The ACF cuts off after the second lag. This behavior is indicative of an MA(2) process.
Specify Additional Lags in ACF Plot Specify the multiplicative seasonal ARMA (2, 0, 1) × (3, 0, 0)12 model: (1 − 0 . 75L − 0 . 15L2)(1 − 0 . 9L12 + 0 . 5L24 − 0 . 5L36)yt = 2 + εt − 0 . 5εt − 1, where εt is Gaussian with mean 0 and variance 1. Mdl = arima(AR={0.75,0.15},SAR={0.9,-0.5,0.5}, ... SARLags=[12 24 36],MA=-0.5,Constant=2, ... Variance=1);
Simulate data from Mdl. rng(1); % For reproducibility y = simulate(Mdl,1000);
Plot the default autocorrelation function (ACF). figure autocorr(y)
12-146
autocorr
The default correlogram does not display the dependence structure for higher lags. Plot the ACF for 40 lags. figure autocorr(y,NumLags=40)
12-147
12
Functions
The correlogram shows the larger correlations at lags 12, 24, and 36.
Input Arguments y — Observed univariate time series numeric vector Observed univariate time series for which autocorr computes or plots the ACF, specified as a numeric vector. Data Types: double Tbl — Time series data table | timetable Time series data, specified as a table or timetable. Each row of Tbl contains contemporaneous observations of all variables. Specify a single series (variable) by using the DataVariable argument. The selected variable must be numeric. ax — Axes on which to plot Axes object Axes on which to plot, specified as an Axes object. 12-148
autocorr
By default, autocorr plots to the current axes (gca). Note Specify missing observations using NaN. The autocorr function treats missing values as missing completely at random on page 12-151. Name-Value Arguments Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter. Before R2021a, use commas to separate each name and value, and enclose Name in quotes. Example: autocorr(Tbl,DataVariable="RGDP",NumLags=10,NumSTD=3) plots 10 lags of the sample ACF of the variable "RGDP" in Tbl, and displays confidence bounds consisting of 3 standard errors away from 0. NumLags — Number of lags positive integer Number of lags in the sample ACF, specified as a positive integer. autocorr uses lags 0:NumLags to estimate the ACF. The default is min([20,T – 1]), where T is the effective sample size of the input time series. Example: autocorr(y,NumLags=10) plots the sample ACF of y for lags 0 through 10. Data Types: double NumMA — Number of lags in theoretical MA model 0 (default) | nonnegative integer Number of lags in a theoretical MA model of the input time series, specified as a nonnegative integer less than NumLags. autocorr uses NumMA to estimate confidence bounds. • For lags > NumMA, autocorr uses Bartlett’s approximation [1] to estimate the standard errors under the model assumption. • If NumMA = 0, then autocorr assumes that the input time series is a Gaussian white noise process with a standard error of approximately 1/ T, where T is the effective sample size of the input time series. Example: autocorr(y,NumMA=10) specifies that y is an MA(10) process and plots confidence bounds for all lags greater than 10. Data Types: double NumSTD — Number of standard errors in confidence bounds 2 (default) | nonnegative scalar Number of standard errors in the confidence bounds, specified as a nonnegative scalar. For all lags greater than NumMA, the confidence bounds are 0 ±NumSTD*σ , where σ is the estimated standard error of the sample autocorrelation. 12-149
12
Functions
The default yields the approximate 95% confidence bounds. Example: autocorr(y,NumSTD=1.5) plots the ACF of y with confidence bounds 1.5 standard errors away from 0. Data Types: double DataVariable — Variable in Tbl last variable (default) | string scalar | character vector | integer | logical vector Variable in Tbl for which autocorr computes the ACF, specified as a string scalar or character vector containing a variable name in Tbl.Properties.VariableNames, or an integer or logical vector representing the index of a name. The selected variable must be numeric. Example: DataVariable="GDP" Example: DataVariable=[false true false false] or DataVariable=2 selects the second table variable. Data Types: double | logical | char | string
Output Arguments acf — Sample ACF numeric vector Sample ACF, returned as a numeric vector of length NumLags + 1. autocorr returns acf only when you supply the input y. The elements of acf correspond to lags 0,1,2,..., NumLags (that is, elements of lags). For all time series, the lag 0 autocorrelation acf(1) = 1. lags — ACF lags numeric vector ACF lags, returned as a numeric vector with elements 0:NumLags. autocorr returns lags only when you supply the input y. ACFTbl — Sample ACF table Sample ACF, returned as a table with variables for the outputs acf and lags. autocorr returns ACFTbl when you supply the input Tbl. bounds — Approximate upper and lower confidence bounds numeric vector Approximate upper and lower confidence bounds assuming the input series is an MA(NumMA) process, returned as a two-element numeric vector. The NumSTD option specifies the number of standard errors in the confidence bounds. h — Handles to plotted graphics objects graphics array Handles to plotted graphics objects, returned as a graphics array. h contains unique plot identifiers, which you can use to query or modify properties of the plot. 12-150
autocorr
More About Autocorrelation Function The autocorrelation function measures the correlation between the univariate time series yt and yt + k, where k = 0,...,K and yt is a stochastic process. According to [1], the autocorrelation for lag k is rk =
ck , c0
where •
ck =
T−k
1 (yt − y)(yt + k − y) . T t∑ =1
• c0 is the sample variance of the time series. Suppose that q is the lag beyond which the theoretical ACF is effectively 0. Then, the estimated standard error of the autocorrelation at lag k > q is SE(rk) =
q
1 1 + 2 ∑ r 2j . T j=1
If the series is completely random, then the standard error reduces to 1/ T. Missing Completely at Random Observations of a random variable are missing completely at random if the tendency of an observation to be missing is independent of both the random variable and the tendency of all other observations to be missing.
Tips • To plot the ACF without confidence bounds, set NumSTD=0.
Algorithms • If the input series is a fully observed series (that is, it does not contain any NaN values), autocorr uses a Fourier transform to compute the ACF in the frequency domain, then converts back to the time domain using an inverse Fourier transform. • If the input series is not fully observed (that is, it contains at least one NaN value), autocorr computes the ACF at lag k in the time domain, and includes in the sample average only those terms for which the cross product ytyt+k exists. Consequently, the effective sample size is a random variable. • autocorr plots the ACF when you do not return any output or when you return the fourth output h.
Version History Introduced before R2006a 12-151
12
Functions
References [1] Box, George E. P., Gwilym M. Jenkins, and Gregory C. Reinsel. Time Series Analysis: Forecasting and Control. 3rd ed. Englewood Cliffs, NJ: Prentice Hall, 1994. [2] Hamilton, James D. Time Series Analysis. Princeton, NJ: Princeton University Press, 1994.
See Also Apps Econometric Modeler Functions parcorr | crosscorr | filter Topics “Autocorrelation and Partial Autocorrelation” on page 3-10 “Time Series Regression VI: Residual Diagnostics” on page 5-223 “Detect Autocorrelation” on page 3-19 “Select ARIMA Model for Time Series Using Box-Jenkins Methodology” on page 3-2
12-152
bayeslm
bayeslm Create Bayesian linear regression model object
Syntax PriorMdl = bayeslm(NumPredictors) PriorMdl = bayeslm(NumPredictors,ModelType=modelType) PriorMdl = bayeslm(NumPredictors,ModelType=modelType,Name=Value)
Description To create a Bayesian vector autoregression (VARX) model for multivariate time series analysis, see bayesvarm. PriorMdl = bayeslm(NumPredictors) creates a Bayesian linear regression model on page 12170 object (PriorMdl) composed of NumPredictors predictors, an intercept, and a diffuse, joint prior distribution for β and σ2. PriorMdl is a template that defines the prior distributions and dimensionality of β. PriorMdl = bayeslm(NumPredictors,ModelType=modelType) specifies the joint prior distribution modelType for β and σ2. For this syntax, modelType can be: • 'conjugate', 'semiconjugate', or 'diffuse' to create a standard Bayesian linear regression prior model • 'mixconjugate', 'mixsemiconjugate', or 'lasso' to create a Bayesian linear regression prior model for predictor variable selection For example, ModelType="conjugate" specifies conjugate priors for the Gaussian likelihood, that is, β|σ2 as Gaussian, σ2 as inverse gamma. PriorMdl = bayeslm(NumPredictors,ModelType=modelType,Name=Value) uses additional options specified by one or more name-value arguments. For example, you can specify whether to include a regression intercept or specify additional options for the joint prior distribution modelType. • If you specify ModelType="empirical", you must also specify the BetaDraws and Sigma2Draws name-value arguments. BetaDraws and Sigma2Draws characterize the respective prior distributions. • If you specify ModelType="custom", you must also specify the LogPDF name-value argument. LogPDF completely characterizes the joint prior distribution.
Examples Default Diffuse Prior Model Consider the multiple linear regression model that predicts the US real gross national product (GNPR) using a linear combination of industrial production index (IPI), total employment (E), and real wages (WR). 12-153
12
Functions
GNPRt = β0 + β1IPIt + β2Et + β3WRt + εt . For all t, εt is a series of independent Gaussian disturbances with a mean of 0 and variance σ2. Suppose that the regression coefficients β = [β0, . . . , β3]′ and the disturbance variance σ2 are random variables, and their prior values and distribution are unknown. In this case, use the noninformative Jefferys prior: the joint prior distribution is proportional to 1/σ2. These assumptions and the data likelihood imply an analytically tractable posterior distribution. Create a diffuse prior model for the linear regression parameters, which is the default model type. Specify the number of predictors p. p = 3; Mdl = bayeslm(p) Mdl = diffuseblm with properties: NumPredictors: 3 Intercept: 1 VarNames: {4x1 cell} | Mean Std CI95 Positive Distribution ----------------------------------------------------------------------------Intercept | 0 Inf [ NaN, NaN] 0.500 Proportional to one Beta(1) | 0 Inf [ NaN, NaN] 0.500 Proportional to one Beta(2) | 0 Inf [ NaN, NaN] 0.500 Proportional to one Beta(3) | 0 Inf [ NaN, NaN] 0.500 Proportional to one Sigma2 | Inf Inf [ NaN, NaN] 1.000 Proportional to 1/Sigma2
Mdl is a diffuseblm Bayesian linear regression model object representing the prior distribution of the regression coefficients and disturbance variance. bayeslm displays a summary of the prior distributions at the command line. Because the prior is noninformative and the model does not contain data, the summary is trivial. If you have data, then you can estimate characteristics of the posterior distribution by passing the prior model Mdl and data to estimate.
Normal-Inverse-Gamma Semiconjugate Prior Model Consider the linear regression model in “Default Diffuse Prior Model” on page 12-153. Assume these prior distributions: •
β | σ2 ∼ N4 M, V . M is a 4-by-1 vector of means, and V is a scaled 4-by-4 positive definite covariance matrix.
• σ2 ∼ IG(A, B). A and B are the shape and scale, respectively, of an inverse gamma distribution.
12-154
bayeslm
These assumptions and the data likelihood imply a normal-inverse-gamma semiconjugate model. The conditional posteriors are conjugate to the prior with respect to the data likelihood, but the marginal posterior is analytically intractable. Create a normal-inverse-gamma semiconjugate prior model for the linear regression parameters. Specify the number of predictors p. p = 3; Mdl = bayeslm(p,ModelType="semiconjugate") Mdl = semiconjugateblm with properties: NumPredictors: Intercept: VarNames: Mu: V: A: B:
3 1 {4x1 cell} [4x1 double] [4x4 double] 3 1
| Mean Std CI95 Positive Distribution ------------------------------------------------------------------------------Intercept | 0 100 [-195.996, 195.996] 0.500 N (0.00, 100.00^2) Beta(1) | 0 100 [-195.996, 195.996] 0.500 N (0.00, 100.00^2) Beta(2) | 0 100 [-195.996, 195.996] 0.500 N (0.00, 100.00^2) Beta(3) | 0 100 [-195.996, 195.996] 0.500 N (0.00, 100.00^2) Sigma2 | 0.5000 0.5000 [ 0.138, 1.616] 1.000 IG(3.00, 1)
Mdl is a semiconjugateblm Bayesian linear regression model object representing the prior distribution of the regression coefficients and disturbance variance. bayeslm displays a summary of the prior distributions at the command line. For example, the elements of Positive represent the prior probability that the corresponding parameter is positive. If you have data, then you can estimate characteristics of the marginal or conditional posterior distribution by passing the prior model Mdl and data to estimate.
Set Hyperparameters of Normal-Inverse-Gamma Conjugate Prior Model Consider the linear regression model in “Default Diffuse Prior Model” on page 12-153. Assume these prior distributions: •
β | σ2 ∼ N4 M, σ2V . M is a 4-by-1 vector of means, and V is a scaled 4-by-4 positive definite covariance matrix. Suppose you have prior knowledge that M = −20 4 0 . 1 2 ′ and V is the identity matrix.
• σ2 ∼ IG(A, B). A and B are the shape and scale, respectively, of an inverse gamma distribution. These assumptions and the data likelihood imply a normal-inverse-gamma conjugate model.
12-155
12
Functions
Create a normal-inverse-gamma conjugate prior model for the linear regression parameters. Specify the number of predictors p and set the regression coefficient names to the corresponding variable names. p = 3; Mdl = bayeslm(p,ModelType="conjugate",Mu=[-20; 4; 0.1; 2],V=eye(4), ... VarNames=["IPI" "E" "WR"]) Mdl = conjugateblm with properties: NumPredictors: Intercept: VarNames: Mu: V: A: B:
3 1 {4x1 cell} [4x1 double] [4x4 double] 3 1
| Mean Std CI95 Positive Distribution ---------------------------------------------------------------------------------Intercept | -20 0.7071 [-21.413, -18.587] 0.000 t (-20.00, 0.58^2, 6) IPI | 4 0.7071 [ 2.587, 5.413] 1.000 t (4.00, 0.58^2, 6) E | 0.1000 0.7071 [-1.313, 1.513] 0.566 t (0.10, 0.58^2, 6) WR | 2 0.7071 [ 0.587, 3.413] 0.993 t (2.00, 0.58^2, 6) Sigma2 | 0.5000 0.5000 [ 0.138, 1.616] 1.000 IG(3.00, 1)
Mdl is a conjugateblm Bayesian linear regression model object representing the prior distribution of the regression coefficients and disturbance variance. bayeslm displays a summary of the prior distributions at the command line. Although bayeslm assigns names to the intercept and disturbance variance, all other coefficients have the specified names. By default, bayeslm sets the shape and scale to 3 and 1, respectively. Suppose you have prior knowledge that the shape and scale are 5 and 2. Set the prior shape and scale of σ2 to their assumed values. Mdl.A = 5; Mdl.B = 2 Mdl = conjugateblm with properties: NumPredictors: Intercept: VarNames: Mu: V: A: B:
3 1 {4x1 cell} [4x1 double] [4x4 double] 5 2
| Mean Std CI95 Positive Distribution ---------------------------------------------------------------------------------Intercept | -20 0.3536 [-20.705, -19.295] 0.000 t (-20.00, 0.32^2, 10) IPI | 4 0.3536 [ 3.295, 4.705] 1.000 t (4.00, 0.32^2, 10)
12-156
bayeslm
E WR Sigma2
| 0.1000 | 2 | 0.1250
0.3536 0.3536 0.0722
[-0.605, [ 1.295, [ 0.049,
0.805] 2.705] 0.308]
0.621 1.000 1.000
t (0.10, 0.32^2, 10) t (2.00, 0.32^2, 10) IG(5.00, 2)
bayeslm updates the prior distribution summary based on the changes in the shape and scale.
Custom Multivariate t Prior Model For Coefficients Consider the linear regression model in “Default Diffuse Prior Model” on page 12-153. Assume these prior distributions: •
•
is 4-D t distribution with 50 degrees of freedom for each component and the identity matrix for the correlation matrix. Also, the distribution is centered at
and each
component is scaled by the corresponding elements of the vector
.
.
bayeslm treats these assumptions and the data likelihood as if the corresponding posterior is analytically intractable. Declare a MATLAB® function that: • Accepts values of hyperparameters •
and
together in a column vector, and accepts values of the
Returns the value of the joint prior distribution,
, given the values of
and
function logPDF = priorMVTIG(params,ct,st,dof,C,a,b) %priorMVTIG Log density of multivariate t times inverse gamma % priorMVTIG passes params(1:end-1) to the multivariate t density % function with dof degrees of freedom for each component and positive % definite correlation matrix C. priorMVTIG returns the log of the product of % the two evaluated densities. % % params: Parameter values at which the densities are evaluated, an % m-by-1 numeric vector. % % ct: Multivariate t distribution component centers, an (m-1)-by-1 % numeric vector. Elements correspond to the first m-1 elements % of params. % % st: Multivariate t distribution component scales, an (m-1)-by-1 % numeric (m-1)-by-1 numeric vector. Elements correspond to the % first m-1 elements of params. % % dof: Degrees of freedom for the multivariate t distribution, a % numeric scalar or (m-1)-by-1 numeric vector. priorMVTIG expands % scalars such that dof = dof*ones(m-1,1). Elements of dof % correspond to the elements of params(1:end-1). %
12-157
12
Functions
% C: Correlation matrix for the multivariate t distribution, an % (m-1)-by-(m-1) symmetric, positive definite matrix. Rows and % columns correspond to the elements of params(1:end-1). % % a: Inverse gamma shape parameter, a positive numeric scalar. % % b: Inverse gamma scale parameter, a positive scalar. % beta = params(1:(end-1)); sigma2 = params(end); tVal = (beta - ct)./st; mvtDensity = mvtpdf(tVal,C,dof); igDensity = sigma2^(-a-1)*exp(-1/(sigma2*b))/(gamma(a)*b^a); logPDF = log(mvtDensity*igDensity); end
Create an anonymous function that operates like priorMVTIG, but accepts the parameter values only and holds the hyperparameter values fixed. dof = 50; C = eye(4); ct = [-25; 4; 0; 3]; st = [10; 1; 1; 1]; a = 3; b = 1; prior = @(params)priorMVTIG(params,ct,st,dof,C,a,b);
Create a custom joint prior model for the linear regression parameters. Specify the number of predictors p. Also, specify the function handle for priorMVTIG, and pass the hyperparameter values. p = 3; Mdl = bayeslm(p,ModelType="custom",LogPDF=prior) Mdl = customblm with properties: NumPredictors: Intercept: VarNames: LogPDF:
3 1 {4x1 cell} @(params)priorMVTIG(params,ct,st,dof,C,a,b)
The priors are defined by the function: @(params)priorMVTIG(params,ct,st,dof,C,a,b)
Mdl is a customblm Bayesian linear regression model object representing the prior distribution of the regression coefficients and disturbance variance. In this case, bayeslm does not display a summary of the prior distributions at the command line.
12-158
bayeslm
Perform Bayesian Lasso Regression Consider the linear regression model in “Default Diffuse Prior Model” on page 12-153. Assume these prior distributions: • For k = 0,...,3, βk | σ2 has a Laplace distribution with a mean of 0 and a scale of σ2 /λ, where λ is the shrinkage parameter. The coefficients are conditionally independent. • σ2 ∼ IG(A, B). A and B are the shape and scale, respectively, of an inverse gamma distribution. Create a prior model for Bayesian linear regression by using bayeslm. Specify the number of predictors p and the variable names. p = 3; PriorMdl = bayeslm(p,ModelType="lasso", ... VarNames=["IPI" "E" "WR"]);
PriorMdl is a lassoblm Bayesian linear regression model object representing the prior distribution of the regression coefficients and disturbance variance. By default, bayeslm attributes a shrinkage of 0.01 to the intercept and 1 to the other coefficients in the model. Using dot notation, change the default shrinkages for all coefficients, except the intercept, by specifying a 3-by-1 vector containing the new values for the Lambda property of PriorMdl. • Attribute a shrinkage of 10 to IPI and WR. • Because E has a scale that is several orders of magnitude larger than the other variables, attribute a shrinkage of 1e5 to it. Lambda(2:end) contains the shrinkages of the coefficients corresponding to the specified variables in the VarNames property of PriorMdl. PriorMdl.Lambda = [10; 1e5; 10];
Load the Nelson-Plosser data set. Create variables for the response and predictor series. load Data_NelsonPlosser X = DataTable{:,PriorMdl.VarNames(2:end)}; y = DataTable{:,"GNPR"};
Perform Bayesian lasso regression by passing the prior model and data to estimate, that is, by estimating the posterior distribution of β and σ2. Bayesian lasso regression uses Markov chain Monte Carlo (MCMC) to sample from the posterior. For reproducibility, set a random seed. rng(1); PosteriorMdl = estimate(PriorMdl,X,y); Method: lasso MCMC sampling with 10000 draws Number of observations: 62 Number of predictors: 4 | Mean Std CI95 Positive Distribution ------------------------------------------------------------------------Intercept | -1.3472 6.8160 [-15.169, 11.590] 0.427 Empirical IPI | 4.4755 0.1646 [ 4.157, 4.799] 1.000 Empirical E | 0.0001 0.0002 [-0.000, 0.000] 0.796 Empirical WR | 3.1610 0.3136 [ 2.538, 3.760] 1.000 Empirical
12-159
12
Functions
Sigma2
| 60.1452
11.1180
[42.319, 85.085]
1.000
Empirical
Plot the posterior distributions. plot(PosteriorMdl)
Given a shrinkage of 10, the distribution of E is fairly dense around 0. Therefore, E might not be an important predictor.
Input Arguments NumPredictors — Number of predictor variables nonnegative integer Number of predictor variables in the Bayesian multiple linear regression model, specified as a nonnegative integer. NumPredictors must be the same as the number of columns in your predictor data, which you specify during model estimation or simulation. When counting the number of predictors in the model, exclude the intercept term specified by Intercept. If you include a column of ones in the predictor data for an intercept term, then count it as a predictor variable and specify Intercept=false. Data Types: double 12-160
bayeslm
Name-Value Arguments Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter. Before R2021a, use commas to separate each name and value, and enclose Name in quotes. Example: ModelType="conjugate",Mu=1:3,V=1000*eye(3),A=1,B=0.5 specifies that the prior distribution of Beta given Sigma2 is Gaussian with mean vector 1:3 and covariance matrix Sigma2*1000*eye(3), and the distribution of Sigma2 is inverse gamma with shape 1 and scale 0.5. Options for All Prior Distributions
ModelType — Joint prior distribution of (β,σ2) 'diffuse' (default) | 'conjugate' | 'semiconjugate' | 'empirical' | 'custom' | 'lasso' | 'mixconjugate' | 'mixsemiconjugate' Joint prior distribution of (β,σ2), specified as a value in the following tables. For a standard Bayesian regression model, choose a value in this table. Value
Description
'conjugate'
Normal-inverse-gamma conjugate model • The prior distributions are β σ2 Np + 1 μ, σ2V . σ2 IG(A, B) . β and σ2 are dependent. • The corresponding marginal and conditional posterior distributions have closed forms (see “Analytically Tractable Posteriors” on page 6-5). You can adjust corresponding hyperparameters using the Mu, V, A, and B namevalue arguments.
'semiconjugate Normal-inverse-gamma semiconjugate model ' • The prior distributions are β σ2 Np + 1 μ, V . σ2 IG(A, B) . β and σ2 are independent. • The corresponding marginal posterior distribution does not have a closed form, but conditional posterior distributions do (see “Analytically Tractable Posteriors” on page 6-5). You can adjust corresponding hyperparameters using the Mu, V, A, and B namevalue arguments.
12-161
12
Functions
Value
Description
'diffuse'
Diffuse prior distributions • The joint prior pdf is f β, σ2 β, σ2 ∝
1 . σ2
• The corresponding marginal and conditional posterior distributions have closed forms (see “Analytically Tractable Posteriors” on page 6-5). 'empirical'
Custom prior distributions • You must also specify the BetaDraws and Sigma2Draws name-value arguments. • The corresponding marginal and conditional posterior distributions do not have closed forms. • Empirical models are better suited for updating a posterior distribution based on new data.
'custom'
Custom prior distributions • You must also specify the LogPDF name-value argument. • The corresponding marginal and conditional posterior distributions do not have closed forms.
For a Bayesian regression model that performs predictor variable selection, choose a value in this table. Value
Description
'mixconjugate'
Stochastic search variable selection (SSVS) [1] conjugate prior distributions • The data likelihood, prior distribution, and posterior distributions compose a conjugate Gaussian mixture model. • β and σ2 are dependent random variables. For more details, see mixconjugateblm.
'mixsemiconjugate'
SSVS [1] semiconjugate prior distributions • The data likelihood, prior distribution, and posterior distributions compose a semiconjugate Gaussian mixture model. • β and σ2 are independent random variables. For more details, see mixsemiconjugateblm.
12-162
bayeslm
Value
Description
'lasso'
Bayesian lasso regression prior distributions [3] • Conditioned on σ2, the prior distribution of each regression coefficient is double exponential with a mean of 0 and scale σ/λ, where λ is the lasso shrinkage parameter. As λ increases, the coefficients tend towards 0. • β and σ2 are dependent random variables. Regression coefficients are independent, a priori.
The prior model type that you choose depends on your assumptions on the joint distribution of the parameters. Your choice can affect posterior estimates and inferences. For more details, see “Implement Bayesian Linear Regression” on page 6-10. Example: ModelType="conjugate" Data Types: char Intercept — Flag for including regression model intercept true (default) | false Flag for including a regression model intercept, specified as a value in this table. Value
Description
false
Exclude an intercept from the regression model. Therefore, β is a p-dimensional vector, where p is the value of NumPredictors.
true
Include an intercept in the regression model. Therefore, β is a (p + 1)-dimensional vector. This specification causes a T-by-1 vector of ones to be prepended to the predictor data during estimation and simulation.
If you include a column of ones in the predictor data for an intercept term, then specify Intercept=false. Example: Intercept=false VarNames — Predictor variable names string vector | cell vector of character vectors Predictor variable names for displays, specified as a string vector or cell vector of character vectors. VarNames must contain NumPredictors elements. VarNames(j) is the name of the variable in column j of the predictor data set, which you specify during estimation, simulation, or forecasting. The default is {'Beta(1)','Beta(2)',...,'Beta(p)'}, where p is the value of NumPredictors. Note You cannot set the name of the intercept or disturbance variance. In displays, bayeslm gives the intercept the name Intercept and the disturbance variance the name Sigma2. Therefore, you cannot use "Intercept" and "Sigma2" as predictor names.
12-163
12
Functions
Example: VarNames=["UnemploymentRate"; "CPI"] Data Types: string | cell | char Options for Conjugate and Semiconjugate Joint Prior Distribution of β
Mu — Mean hyperparameter of Gaussian prior on β zeros(Intercept + NumPredictors,1) (default) | numeric vector Mean hyperparameter of the Gaussian prior on β, specified as a numeric vector. If Mu is a vector, then it must have NumPredictors or NumPredictors + 1 elements. • For NumPredictors elements, bayeslm sets the prior mean of the NumPredictors predictors only. Predictors correspond to the columns in the predictor data (specified during estimation, simulation, or forecasting). bayeslm ignores the intercept in the model, that is, bayeslm specifies the default prior mean to any intercept. • For NumPredictors + 1 elements, the first element corresponds to the prior mean of the intercept, and all other elements correspond to the predictors. Example: Mu=[1; 0.08; 2] Data Types: double V — Conditional covariance matrix hyperparameter of Gaussian prior on β 1e5*eye(Intercept + NumPredictors) (default) | symmetric, positive definite matrix Conditional covariance matrix hyperparameter of the Gaussian prior on β, specified as a c-by-c symmetric, positive definite matrix. c can be NumPredictors or NumPredictors + 1. • If c is NumPredictors, then bayeslm sets the prior covariance matrix to 1e5 0 ⋯ 0 0 ⋮
V
.
0 bayeslm attributes the default prior covariances to the intercept, and attributes V to the coefficients of the predictor variables in the data. Rows and columns of V correspond to columns (variables) in the predictor data. • If c is NumPredictors + 1, then bayeslm sets the entire prior covariance to V. The first row and column correspond to the intercept. All other rows and columns correspond to the columns in the predictor data. The default value is a flat prior. For an adaptive prior, specify diag(Inf(Intercept + NumPredictors,1)). Adaptive priors indicate zero precision in order for the prior distribution to have as little influence as possible on the posterior distribution. For ModelType=conjugate, V is the prior covariance of β up to a factor of σ2. Example: V=diag(Inf(3,1)) Data Types: double
12-164
bayeslm
Options for Bayesian Lasso Regression
Lambda — Lasso regularization parameter 1 (default) | positive numeric scalar | positive numeric vector Lasso regularization parameter for all regression coefficients, specified as a positive numeric scalar or (Intercept + NumPredictors)-by-1 positive numeric vector. Larger values of Lambda cause corresponding coefficients to shrink closer to zero. Suppose X is a T-by-NumPredictors matrix of predictor data, which you specify during estimation, simulation, or forecasting. • If Lambda is a vector and Intercept is true, Lambda(1) is the shrinkage for the intercept, Lambda(2) is the shrinkage for the coefficient of the first predictor X(:,1), Lambda(3) is the shrinkage for the coefficient of the second predictor X(:,2),…, and Lambda(NumPredictors + 1) is the shrinkage for the coefficient of the last predictor X(:,NumPredictors). • If Lambda is a vector and Intercept is false, Lambda(1) is the shrinkage for the coefficient of the first predictor X(:,1),…, and Lambda(NumPredictors) is the shrinkage for the coefficient of the last predictor X(:,NumPredictors). • If you supply the scalar s for Lambda, then all coefficients of the predictors in X have a shrinkage of s. • If Intercept is true, the intercept has a shrinkage of 0.01, and lassoblm stores [0.01; s*ones(NumPredictors,1)] in Lambda. • Otherwise, lassoblm stores s*ones(NumPredictors,1) in Lambda. Example: Lambda=6 Data Types: double Options for Prior Distribution of β and γ for SSVS Predictor Variable Selection
Mu — Component-wise mean hyperparameter of Gaussian mixture prior on β zeros(Intercept + NumPredictors,2) (default) | numeric matrix Component-wise mean hyperparameter of the Gaussian mixture prior on β, specified as an (Intercept + NumPredictors)-by-2 numeric matrix. The first column contains the prior means for component 1 (the variable-inclusion regime, that is, γ = 1). The second column contains the prior means for component 2 (the variable-exclusion regime, that is, γ = 0). • If Intercept is false, then Mu has NumPredictors rows. bayeslm sets the prior mean of the NumPredictors coefficients corresponding to the columns in the predictor data set, which you specify during estimation, simulation, or forecasting. • Otherwise, Mu has NumPredictors + 1 elements. The first element corresponds to the prior means of the intercept, and all other elements correspond to the predictor variables. Tip To perform SSVS, use the default value of Mu. Data Types: double V — Component-wise variance factor or variance hyperparameter of Gaussian mixture prior on β repmat([10 0.1],Intercept + NumPredictors,1) (default) | positive numeric matrix 12-165
12
Functions
Component-wise variance factor or variance hyperparameter of the Gaussian mixture prior on β, specified an (Intercept + NumPredictors)-by-2 positive numeric matrix. The first column contains the prior variance factors for component 1 (the variable-inclusion regime, that is, γ = 1). The second column contains the prior variance factors for component 2 (the variable-exclusion regime, that is, γ = 0). For conjugate models (ModelType="mixconjugate"), V contains variance factors, and for semiconjugate models (ModelType="mixsemiconjugate"), V contains variances. • If Intercept is false, then V has NumPredictors rows. bayeslm sets the prior variance factor of the NumPredictors coefficients corresponding to the columns in the predictor data set, which you specify during estimation, simulation, or forecasting. • Otherwise, V has NumPredictors + 1 elements. The first element corresponds to the prior variance factor of the intercept, and all other elements correspond to the predictor variables. Tip • To perform SSVS, specify a larger variance factor for regime 1 than for regime 2. That is, for all j, specify V(j,1) > V(j,2). • For details on what value to specify for V, see [1].
Data Types: double Probability — Prior probability distribution for variable inclusion and exclusion regimes 0.5*ones(Intercept + NumPredictors,1) (default) | numeric vector of values in [0,1] | function handle Prior probability distribution for the variable inclusion and exclusion regimes, specified an (Intercept + NumPredictors)-by-1 numeric vector of values in [0,1], or a function handle in the form @fcnName, where fcnName is the function name. Probability represents the prior probability distribution of γ = {γ1,…,γK}, where: • K = Intercept + NumPredictors, which is the number of coefficients in the regression model. • γk ∈ {0,1} for k = 1,…,K. Therefore, the sample space has a cardinality of 2K. • γk = 1 indicates variable VarNames(k) is included in the model, and γk = 0 indicates that the variable is excluded from the model. If Probability is a numeric vector: • Rows correspond to the variable names in VarNames. For models containing an intercept, the prior probability for intercept inclusion is Probability(1). • For k = 1,…,K, the prior probability for excluding variable k is 1 – Probability(k). • Prior probabilities of the variable-inclusion regime, among all variables and the intercept, are independent. If Probability is a function handle, then it represents a custom prior distribution of the variableinclusion regime probabilities. The corresponding function must have this declaration statement (the argument and function names can vary): logprob = regimeprior(varinc)
• logprob is a numeric scalar representing the log of the prior distribution. You can write the prior distribution up to a proportionality constant. 12-166
bayeslm
• varinc is a K-by-1 logical vector. Elements correspond to the variable names in VarNames and indicate the regime in which the corresponding variable exists. varinc(k) = true indicates VarName(k) is included in the model, and varinc(k) = false indicates it is excluded from the model. You can include more input arguments, but they must be known when you call bayeslm. For details on what value to specify for Probability, see [1]. Data Types: double | function_handle Correlation — Prior correlation matrix of β eye(Intercept + NumPredictors) (default) | numeric, positive definite matrix Prior correlation matrix of β for both components in the mixture model, specified as an (Intercept + NumPredictors)-by-(Intercept + NumPredictors) numeric, positive definite matrix. Consequently, the prior covariance matrix for component j in the mixture model is: • For conjugate (ModelType="mixconjugate"), sigma2*diag(sqrt(V(:,j)))*Correlation*diag(sqrt(V(:,j))) • For semiconjugate (ModelType="mixsemiconjugate"), diag(sqrt(V(:,j)))*Correlation*diag(sqrt(V(:,j))) where sigma2 is σ2 and V is the matrix of coefficient variance factors or variances. Rows and columns correspond to the variable names in VarNames. By default, regression coefficients are uncorrelated, conditional on the regime. Note You can supply any appropriately sized numeric matrix. However, if your specification is not positive definite, bayeslm issues a warning and replaces your specification with CorrelationPD, where: CorrelationPD = 0.5*(Correlation + Correlation.');
Tip For details on what value to specify for Correlation, see [1]. Data Types: double Options for Prior Distribution of σ2
A — Shape hyperparameter of inverse gamma prior on σ2 3 (default) | numeric scalar Shape hyperparameter of the inverse gamma prior on σ2, specified a numeric scalar. A must be at least –(Intercept + NumPredictors)/2. With B held fixed, the inverse gamma distribution becomes taller and more concentrated as A increases. This characteristic weighs the prior model of σ2 more heavily than the likelihood during posterior estimation. 12-167
12
Functions
For the functional form of the inverse gamma distribution, see “Analytically Tractable Posteriors” on page 6-5. This option does not apply to empirical or custom prior distributions. Example: A=0.1 Data Types: double B — Scale hyperparameter of inverse gamma prior on σ2 1 (default) | positive scalar | Inf Scale hyperparameter of the inverse gamma prior on σ2, specified as a positive scalar or Inf. With A held fixed, the inverse gamma distribution becomes taller and more concentrated as B increases. This characteristic weighs the prior model of σ2 more heavily than the likelihood during posterior estimation. This option does not apply to empirical or custom prior distributions. Example: B=5 Data Types: double Required Options for Empirical Joint Prior Distributions
BetaDraws — Random sample from prior distribution of β numeric matrix Random sample from the prior distribution of β, specified as an (Intercept + NumPredictors)-byNumDraws numeric matrix. Rows correspond to regression coefficients: the first row corresponds to the intercept, and the subsequent rows correspond to columns in the predictor data. Columns correspond to successive draws from the prior distribution. BetaDraws and Sigma2Draws must have the same number of columns. For best results, draw a large number of samples. Data Types: double Sigma2Draws — Random sample from prior distribution of σ2 numeric row vector Random sample from the prior distribution of σ2, specified as a 1-by-NumDraws numeric row vector. Columns correspond to successive draws from the prior distribution. BetaDraws and Sigma2Draws must have the same number of columns. For best results, draw a large number of samples. Data Types: double Required Options for Custom Prior Distributions
LogPDF — Log of joint probability density function of (β,σ2) function handle Log of the joint probability density function of (β,σ2), specified as a function handle. 12-168
bayeslm
Suppose logprior is the name of the MATLAB function defining the joint prior distribution of (β,σ2). Then, logprior must have this form. function [logpdf,glpdf] = logprior(params) ... end
where: • logpdf is a numeric scalar representing the log of the joint probability density of (β,σ2). • glpdf is an (Intercept + NumPredictors + 1)-by-1 numeric vector representing the gradient of logpdf. Elements correspond to the elements of params. glpdf is an optional output argument, and only the Hamiltonian Monte Carlo sampler (see hmcSampler) applies it. If you know the analytical partial derivative with respect to some parameters, but not others, then set the elements of glpdf corresponding to the unknown partial derivatives to NaN. MATLAB computes the numerical gradient for missing partial derivatives, which is convenient, but slows sampling. • params is an (Intercept + NumPredictors + 1)-by-1 numeric vector. The first Intercept + NumPredictors elements must correspond to values of β, and the last element must correspond to the value of σ2. The first element of β is the intercept, if one exists. All other elements correspond to predictor variables in the predictor data, which you specify during estimation, simulation, or forecasting. Example: LogPDF=@logprior
Output Arguments PriorMdl — Bayesian linear regression model storing prior model assumptions conjugateblm model object | semiconjugateblm model object | diffuseblm model object | mixconjugateblm model object | mixsemiconjugateblm model object | lassoblm model object | ... Bayesian linear regression model storing prior model assumptions, returned as an object in this table. Value of ModelType
Returned Bayesian Linear Regression Model Object
'conjugate'
conjugateblm
'semiconjugate' semiconjugateblm 'diffuse'
diffuseblm
'empirical'
empiricalblm
'custom'
customblm
'mixconjugate'
mixconjugateblm
'mixsemiconjuga mixsemiconjugateblm te' 'lasso'
lassoblm
PriorMdl specifies the joint prior distribution and characteristics of the linear regression model only. The model object is a template intended for further use. To incorporate data into the model for 12-169
12
Functions
posterior distribution analysis, pass the model object and data to the appropriate object function, for example, estimate or simulate.
More About Bayesian Linear Regression Model A Bayesian linear regression model treats the parameters β and σ2 in the multiple linear regression (MLR) model yt = xtβ + εt as random variables. For times t = 1,...,T: • yt is the observed response. • xt is a 1-by-(p + 1) row vector of observed values of p predictors. To accommodate a model intercept, x1t = 1 for all t. • β is a (p + 1)-by-1 column vector of regression coefficients corresponding to the variables that compose the columns of xt. • εt is the random disturbance with a mean of zero and Cov(ε) = σ2IT×T, while ε is a T-by-1 vector containing all disturbances. These assumptions imply that the data likelihood is ℓ β, σ2 y, x =
T
∏
t=1
ϕ yt; xt β, σ2 .
ϕ(yt;xtβ,σ2) is the Gaussian probability density with mean xtβ and variance σ2 evaluated at yt;. Before considering the data, you impose a joint prior distribution assumption on (β,σ2). In a Bayesian analysis, you update the distribution of the parameters by using information about the parameters obtained from the likelihood of the data. The result is the joint posterior distribution of (β,σ2) or the conditional posterior distributions of the parameters.
Version History Introduced in R2017a
References [1] George, E. I., and R. E. McCulloch. "Variable Selection Via Gibbs Sampling." Journal of the American Statistical Association. Vol. 88, No. 423, 1993, pp. 881–889. [2] Koop, G., D. J. Poirier, and J. L. Tobias. Bayesian Econometric Methods. New York, NY: Cambridge University Press, 2007. [3] Park, T., and G. Casella. "The Bayesian Lasso." Journal of the American Statistical Association. Vol. 103, No. 482, 2008, pp. 681–686.
See Also Objects conjugateblm | semiconjugateblm | diffuseblm | customblm | empiricalblm | lassoblm | mixconjugateblm | mixsemiconjugateblm 12-170
bayeslm
Functions gampdf | sampleroptions | bayesvarm Topics “Bayesian Linear Regression” on page 6-2 “Implement Bayesian Linear Regression” on page 6-10 “Specify Gradient for HMC Sampler” on page 6-18
12-171
12
Functions
bayesvarm Create prior Bayesian vector autoregression (VAR) model object
Syntax PriorMdl = bayesvarm(numseries,numlags) PriorMdl = bayesvarm(numseries,numlags,ModelType=modelType) PriorMdl = bayesvarm(numseries,numlags,ModelType=modelType,Name=Value)
Description To create a Bayesian linear regression model for univariate regression analysis, or to perform Bayesian predictor selection, see bayeslm. To create a non-Bayesian VAR model, see varm. PriorMdl = bayesvarm(numseries,numlags) creates the Bayesian VAR(p) model on page 12187 object PriorMdl, which specifies dimensionalities and prior assumptions for all model coefficients Λ = Φ1 Φ2 ⋯ Φp c δ Β ′ and the innovations covariance Σ, where: • numseries is the number of response time series variables. • p = numlags is the AR polynomial order. • The joint prior distribution of (Λ,Σ) is diffuse. PriorMdl = bayesvarm(numseries,numlags,ModelType=modelType) specifies the joint prior distribution modelType for Λ and Σ. For this syntax, modelType can be 'conjugate', 'semiconjugate', 'diffuse', or 'normal'. For example, ModelType="semiconjugate" specifies semiconjugate priors for the multivariate normal likelihood—specifically, vec(Λ)|Σ is multivariate normal, Σ is inverse Wishart, and Λ and Σ are independent. PriorMdl = bayesvarm(numseries,numlags,ModelType=modelType,Name=Value) uses additional options specified by one or more name-value arguments. For example, for non-diffuse models, you can specify Minnesota prior regularization options on page 12-0 to regularize the coefficients using the Minnesota prior on page 12-188 parameter structure.
Examples Default Diffuse Prior Model Consider the 3-D VAR(4) model for the US inflation (INFL), unemployment (UNRATE), and federal funds (FEDFUNDS) rates. INFLt UNRATEt FEDFUNDSt
4
=c+
∑
j=1
INFLt −
ε1, t
j
Φ j UNRATEt −
+ ε2, t .
j
FEDFUNDSt −
j
ε3, t
For all t, εt is a series of independent 3-D normal innovations with a mean of 0 and variance Σ. 12-172
bayesvarm
Suppose that the AR coefficient matrices Φ1, . . . , Φ4, model constant c, and innovations covariance matrix Σ are random variables, and their prior distributions are unknown. In this case, use the noninformative diffuse prior: the joint prior distribution Φ1, . . . , Φ4, c, Σ is proportional to Σ
−2
.
Create a diffuse prior model for the 3-D VAR(4) model parameters, which is the default prior model type. numseries = 3; numlags = 4; PriorMdl = bayesvarm(numseries,numlags) PriorMdl = diffusebvarm with properties: Description: NumSeries: P: SeriesNames: IncludeConstant: IncludeTrend: NumPredictors: AR: Constant: Trend: Beta: Covariance:
"3-Dimensional VAR(4) Model" 3 4 ["Y1" "Y2" "Y3"] 1 0 0 {[3x3 double] [3x3 double] [3x3 double] [3x1 double] [3x0 double] [3x0 double] [3x3 double]
[3x3 double]}
PriorMdl is a diffusevarm Bayesian VAR model object representing the prior distribution of the AR coefficient matrices, model constant vector, and innovations covariance matrix. bayesvarm displays a summary of the prior distributions at the command line. • AR — Prior means of the AR coefficient matrices. • Constant — Prior means of the model constant vector. • Trend and Beta — Prior means of the linear time trend vector and exogenous regression coefficient matrix, respectively. Because the values are empty arrays, the corresponding parameters are not in the model. • Covariance — Prior mean of the innovations covariance matrix. If you have data, then you can estimate characteristics of the posterior distribution by passing PriorMdl and the data to estimate.
Matrix-Normal-Inverse-Wishart Conjugate Prior Model Consider the 3-D VAR(4) model in “Default Diffuse Prior Model” on page 12-172. Assume the following: • [Φ1, . . . , Φ4, c]′ | Σ ∼ N13 × 3 M, V, Σ . M is a 13-by-3 matrix of prior coefficient means (M 1: 3, 1: 3 is the prior mean matrix of Φ1′ , M 4: 6, 1: 3 is the prior mean matrix of Φ2′ ,..., and M 13, 1: 3 is the prior mean vector of c). V is a 13-by-13 matrix representing the among-coefficient prior covariance matrix within an equation. Σ is the 3-by-3 random innovations covariance matrix. 12-173
12
Functions
• Σ ∼ Inverse Wishart(Ω, ν). Ω is the 3-by-3 scale matrix, and ν is the degrees of freedom of the inverse Wishart distribution. • The coefficients and the innovations covariance matrix are dependent. • Prior coefficient variances among the equations are proportional. These assumptions and the data likelihood imply a matrix-normal-inverse-Wishart conjugate model. Create a matrix-normal-inverse-Wishart conjugate prior model for the VAR model parameters. numseries = 3; numlags = 4; PriorMdl = bayesvarm(numseries,numlags,ModelType="conjugate") PriorMdl = conjugatebvarm with properties: Description: NumSeries: P: SeriesNames: IncludeConstant: IncludeTrend: NumPredictors: Mu: V: Omega: DoF: AR: Constant: Trend: Beta: Covariance:
"3-Dimensional VAR(4) Model" 3 4 ["Y1" "Y2" "Y3"] 1 0 0 [39x1 double] [13x13 double] [3x3 double] 13 {[3x3 double] [3x3 double] [3x3 double] [3x1 double] [3x0 double] [3x0 double] [3x3 double]
[3x3 double]}
PriorMdl is a conjugatebvarm Bayesian VAR model object representing the prior distribution of the coefficients and innovations covariance matrix. bayesvarm displays a summary of the prior distributions at the command line; it returns the prior mean matrix in vectorized form. The model contains many estimable parameters. To achieve a parsimonious model, bayesvarm applies the Minnesota prior regularization method to the AR coefficients, by default. Inspect the default prior means (centers of shrinkage) of the AR coefficient matrices. AR1 = PriorMdl.AR{1} AR1 = 3×3 0.5000 0 0
0 0.5000 0
AR2 = PriorMdl.AR{2} AR2 = 3×3 0 0
12-174
0 0
0 0
0 0 0.5000
bayesvarm
0
0
0
AR3 = PriorMdl.AR{3} AR3 = 3×3 0 0 0
0 0 0
0 0 0
AR4 = PriorMdl.AR{4} AR4 = 3×3 0 0 0
0 0 0
0 0 0
Each series is an AR(1) model with AR coefficient 0.5, a priori. The tightness on shrinkage of the coefficients is proportional among the equations. Inspect the default tightness values by displaying a heatmap chart of the property V of PriorMdl, which contains a matrix of the scaled tightness on shrinkage of the coefficients for one equation (the unscaled shrinkage is Σ ⊗ V = kron(PriorMdl.Covariance,PriorMdl.V)). Omit the final row and column, which correspond to the model constant.
% Create labels for the chart. numARCoeffMats = PriorMdl.NumSeries*PriorMdl.P; arcoeffnames = strings(numARCoeffMats,1); for r = numlags:-1:1 arcoeffnames(((r-1)*numseries+1):(numseries*r)) = ["\phi_{11,"+r+"}" "\phi_{12,"+r+"}" "\phi_ end heatmap(arcoeffnames,arcoeffnames,PriorMdl.V(1:end-1,1:end-1));
12-175
12
Functions
The tightness values decrease with lag, which suggests (a priori) that the means of the corresponding greater-lagged coefficients are more tightly locked around their center of 0. Display the tightness of the model constant vector. PriorMdl.V(end,end) ans = 10000
The center of the model constant vector is 0 but has a large variance, which allows the estimation procedure to defer more to the data than the prior for the posterior mean of the constant vector. You can specify alternative values after you create a model by using dot notation. For example, increase the tightness of all coefficients by a factor of 100. PriorMdl.V = 100*PriorMdl.V;
Normal Conjugate Prior Model for Coefficients Consider the 3-D VAR(4) model in “Default Diffuse Prior Model” on page 12-172. Assume these prior distributions, as presented in [1]: • vec( Φ1, . . . , Φ4, c ′) | Σ ∼ N39 μ, V . μ is a 39-by-1 vector of prior coefficient means (the model has 39 individual coefficients), and V is a 39-by-39 prior coefficient covariance matrix. 12-176
bayesvarm
• The innovations covariance Σ is a fixed matrix. Suppose econometric theory dictates that −5
10 Σ= 0
−4
10
−4
0 10 0 . 1 −0 . 2 . −0 . 2 1 . 6
Create a normal conjugate prior model for the VAR model coefficients. Specify the value of Σ by using the Sigma name-value argument. numseries = 3; numlags = 4; Sigma = [10e-5 0 10e-4; 0 0.1 -0.2; 10e-4 -0.2 1.6]; PriorMdl = bayesvarm(numseries,numlags,ModelType="normal", ... Sigma=Sigma) PriorMdl = normalbvarm with properties: Description: NumSeries: P: SeriesNames: IncludeConstant: IncludeTrend: NumPredictors: Mu: V: Sigma: AR: Constant: Trend: Beta: Covariance:
"3-Dimensional VAR(4) Model" 3 4 ["Y1" "Y2" "Y3"] 1 0 0 [39x1 double] [39x39 double] [3x3 double] {[3x3 double] [3x3 double] [3x3 double] [3x1 double] [3x0 double] [3x0 double] [3x3 double]
[3x3 double]}
PriorMdl is a normalbvarm Bayesian VAR model object representing the prior distribution of the coefficients. Because Σ is fixed for normalbvarm prior models, PriorMdl.Sigma and PriorMdl.Covariance are equal. PriorMdl.Sigma ans = 3×3 0.0001 0 0.0010
0 0.1000 -0.2000
0.0010 -0.2000 1.6000
PriorMdl.Covariance ans = 3×3 0.0001 0
0 0.1000
0.0010 -0.2000
12-177
12
Functions
0.0010
-0.2000
1.6000
Set Minnesota Prior Parameters of Normal-Inverse-Wishart Semiconjugate Prior Model Consider the 3-D VAR(4) model in “Default Diffuse Prior Model” on page 12-172. Assume the following: • vec Φ1, . . . , Φ4, c ′ Σ ∼ N39 μ, V . μ is a 39-by-1 vector of prior coefficient means (the model has 39 individual coefficients), and V is a 39-by-39 prior coefficient covariance matrix. • Σ ∼ InverseWishart Ω, ν . Ω is the 3-by-3 scale matrix, and ν is the degrees of freedom of the inverse Wishart distribution. • The coefficients and the innovations covariance matrix are independent. These assumptions and the data likelihood imply a normal-inverse-Wishart semiconjugate model. The model contains many estimable parameters. To achieve a parsimonious model, bayesvarm enables you to regularize the coefficients by using the Minnesota prior regularization method, rather than specifying each prior mean and variance. Create a normal-inverse-Wishart semiconjugate prior model for the VAR model parameters. Specify the following: • All series are AR(1) models, a priori, with AR coefficient 0.9. Set the Center name-value argument to a 3-by-1 vector composed of 0.9. • The tightness around self lags in Φ1 is 1. Set the SelfLag name-value argument to 1. • The tightness around cross lags in Φ1 is 0.5. Set the CrossLag name-value argument to 0.5. • All tightness values decay by a factor of the lag degree squared. Set the Decay name-value argument to 2. numseries = 3; numlags = 4; center = 0.9*ones(numseries,1); PriorMdl = bayesvarm(numseries,numlags,ModelType="semiconjugate", ... Center=center,SelfLag=1,CrossLag=0.5,Decay=2) PriorMdl = semiconjugatebvarm with properties: Description: NumSeries: P: SeriesNames: IncludeConstant: IncludeTrend: NumPredictors: Mu: V: Omega: DoF: AR: Constant:
12-178
"3-Dimensional VAR(4) Model" 3 4 ["Y1" "Y2" "Y3"] 1 0 0 [39x1 double] [39x39 double] [3x3 double] 13 {[3x3 double] [3x3 double] [3x3 double] [3x1 double]
[3x3 double]}
bayesvarm
Trend: [3x0 double] Beta: [3x0 double] Covariance: [3x3 double]
PriorMdl is a semiconjugatebvarm Bayesian VAR model object representing the prior distribution of the coefficients and innovations covariance matrix. bayesvarm displays a summary of the prior distributions at the command line; it returns the prior mean matrix in vectorized form. Display the prior means of the AR coefficient matrices. AR1 = PriorMdl.AR{1} AR1 = 3×3 0.9000 0 0
0 0.9000 0
0 0 0.9000
AR2 = PriorMdl.AR{2} AR2 = 3×3 0 0 0
0 0 0
0 0 0
AR3 = PriorMdl.AR{3} AR3 = 3×3 0 0 0
0 0 0
0 0 0
AR4 = PriorMdl.AR{4} AR4 = 3×3 0 0 0
0 0 0
0 0 0
Each series is an AR(1) model, a priori. The property V of PriorMdl contains a matrix of the tightness on shrinkage of the coefficients. The rows and columns of V correspond to the elements of the Mu property of PriorMdl. • Elements 1 through 3 correspond to the lag 1 AR coefficients in the first equation ordered by response variable, that is, ϕ1, 11, ϕ1, 12, and ϕ1, 13. • Elements 4 through 6 correspond to the lag 2 AR coefficients in the first equation. • Elements 7 through 9 correspond to the lag 3 AR coefficients in the first equation. • Elements 10 through 12 correspond to the lag 4 AR coefficients in the first equation. 12-179
12
Functions
• Element 13 is the model constant in the first equation. MATLAB® repeats the pattern for each equation. In this example, the tightness of shrinkage is the same for all equations. Display a heatmap chart of the property V of PriorMdl for the tightness values of the AR coefficients in the first equation.
% Create labels for the chart. numARCoeffMats = PriorMdl.NumSeries*PriorMdl.P; arcoeffnames = strings(numARCoeffMats,1); for r = numlags:-1:1 arcoeffnames(((r-1)*numseries+1):(numseries*r)) = ["\phi_{"+r+",11}" "\phi_{"+r+",12}" "\phi_ end heatmap(arcoeffnames,arcoeffnames,PriorMdl.V(1:numARCoeffMats,1:numARCoeffMats));
The tightness values decrease with lag, which suggests (a priori) that the means of the corresponding greater-lagged coefficients are more tightly locked around their center of 0. By default, AR coefficients are uncorrelated. Display the tightness of the model constant vector. PriorMdl.V(numARCoeffMats + 1,numARCoeffMats + 1) ans = 10000
The center of the model constant vector is 0 but has a large variance, which allows the estimation procedure to defer more to the data than the prior for the posterior mean of the constant vector. 12-180
bayesvarm
You can specify alternative values after you create a model by using dot notation. For example, increase the tightness of all coefficients by a factor of 100. PriorMdl.V = 100*PriorMdl.V;
Input Arguments numseries — Number of time series m 1 (default) | positive integer Number of time series m, specified as a positive integer. numseries specifies the dimensionality of the multivariate response variable yt and innovation εt. Data Types: double numlags — Number of lagged responses p 0 (default) | nonnegative integer Number of lagged responses p to include in the VAR model, specified as a nonnegative integer. bayesvarm includes lags 1 through numlags. Data Types: double Name-Value Arguments Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter. Before R2021a, use commas to separate each name and value, and enclose Name in quotes. Example: IncludeTrend=true,NumPredictors=3 specifies a linear trend term and a linear regression term for three exogenous variables in all response equations. Model Options
ModelType — Joint prior distribution of (Λ,Σ) 'diffuse' (default) | 'conjugate' | 'semiconjugate' | 'normal' Joint prior distribution of (Λ,Σ), specified as a value in the following table. In the table: • λ = vec(Λ). • d = IncludeConstant + IncludeTrend + NumPredictors. • The inverse Wishart hyperparameters Ω and ν correspond to the name-value arguments and output model properties Omega and DoF, respectively. You can adjust their values by specifying the name-value arguments or by using dot notation after bayesvarm returns PriorMdl.
12-181
12
Functions
Value
Description
'conjugate'
Matrix-normal-inverse-Wishart conjugate model. The priors are Λ Σ N mp + d
×m
Μ, V, Σ
Nm mp + d μ, Σ ⊗ V Σ Inverse Wishart Ω, ν , where Λ and Σ are dependent. 'semiconjugate Normal-inverse-Wishart semiconjugate model. The priors are ' λ Σ Nm mp + d μ, V Σ Inverse Wishart Ω, ν , where Λ and Σ are independent. 'diffuse'
Diffuse prior distributions. The joint prior pdf is f Λ, Σ Λ, Σ ∝ Σ
− m + 1 /2
.
Regularization options do not apply to diffuse priors. 'normal'
Normal conjugate prior model. The prior is λ Nm mp + d μ, V . Σ is known and fixed, and it corresponds to the property Sigma of PriorMdl. After bayesvarm returns PriorMdl, you can adjust the value of Σ by using dot notation.
Note • The multivariate normal hyperparameters μ and V correspond to the Mu and V properties of PriorMdl, respectively. The Minnesota prior regularization options on page 12-0 [1] enable you to specify μ and V for coefficient shrinkage and tightness completely and easily. You can also display or adjust their values directly by using dot notation after bayesvarm returns PriorMdl. • The prior model type that you choose depends on your assumptions about the joint distribution of the parameters. Your choice can affect posterior estimates and inferences. For more details, see “Implement Bayesian Linear Regression” on page 6-10.
Example: ModelType="conjugate" Data Types: char | string SeriesNames — Response series names string vector | cell vector of character vectors Response series names for display, specified as a length m string vector or cell vector of character vectors. The default is ["Y1" "Y2" ... "Ym"]. Example: SeriesNames=["CPI" "Unemployment"] Data Types: string | char 12-182
bayesvarm
IncludeConstant — Flag for including model constant c true (default) | false Flag for including a model constant c, specified as a value in this table. Value
Description
false
Response equations do not include a model constant.
true
All response equations contain a model constant.
Example: IncludeConstant=false Data Types: logical IncludeTrend — Flag for including linear time trend term δ false (default) | true Flag for including a linear time trend term δ, specified as a value in this table. Value
Description
false
Response equations do not include a linear time trend term.
true
All response equations contain a linear time trend term.
Example: IncludeTrend=true Data Types: logical NumPredictors — Number of exogenous predictor variables in model regression component 0 (default) | nonnegative integer Number of exogenous predictor variables in the model regression component, specified as a nonnegative integer. bayesvarm includes all predictor variables symmetrically in each response equation. Example: NumPredictors=3 Description — Model description string scalar | character vector Model description, specified as a string scalar or character vector. The default value describes the parametric form of the model, for example "2-Dimensional VAR(3) Model". Example: Description="Model 1" Data Types: string | char Minnesota Prior Regularization Options for Nondiffuse Priors
Center — Center of shrinkage 0.5*ones(numseries,1) (default) | numeric vector Center of shrinkage for lag 1 self lags, or the prior expectation on the diagonal elements of Φ1, specified as a numseries-by-1 numeric vector. Center(j) is the prior mean of ϕ1, j j. 12-183
12
Functions
Each element can be any real number, but typical values are in the interval [0,1]. This table describes the prior model of an individual response series for the specified value. Value
Prior Model
0
White noise process
In the interval (0,1)
Stationary AR(1)
1
Random walk
bayesvarm sets the prior means of the following variables to 0: • Off-diagonal elements of Φ1 • All elements of Φq, q > 1 • Model constants c • Linear time trend coefficients δ • Exogenous predictor coefficients Β For more details, see “Minnesota Prior” on page 12-188. Example: Center=0.01*ones(3,1) Data Types: double SelfLag — Tightness of shrinkage on all self lags of Φ1 0.05 (default) | positive numeric scalar Tightness of shrinkage on all self lags of Φ1, specified as a positive numeric scalar. SelfLag contributes to the prior variances of all self-lag coefficients in the model (property V of the output model PriorMdl). Tip Relatively small tightness values indicate strong belief in prior assumptions during estimation (that is, relatively small values tightly lock self lags around their prior mean). Relatively large values place more weight on information in the data during estimation. For more details, see “Minnesota Prior” on page 12-188. Example: SelfLag=0.5 Data Types: double CrossLag — Tightness on all cross-variable lag coefficients of Φ1 0.01 (default) | positive numeric scalar Tightness on all cross-variable lag coefficients of Φ1, specified as a positive numeric scalar. For conjugate prior models, bayesvarm sets 'CrossLag' to the value of the SelfLag name-value argument. CrossLag contributes to the prior variances of all cross-variable lag coefficients in the model (property V of the output model PriorMdl).
12-184
bayesvarm
Tip Relatively small tightness values indicate strong belief in prior assumptions during estimation (that is, relatively small values tightly lock cross lags around their prior mean). Relatively large values place more weight on information in the data during estimation. For more details, see “Minnesota Prior” on page 12-188. Example: CrossLag=0.05 Data Types: double Decay — Speed of prior variance decay 1 (default) | positive numeric scalar Speed of the prior variance decay with increasing lag, specified as a positive numeric scalar. Decay contributes to the prior variance of all lag coefficient matrices greater than lag 1 (property V of the output model PriorMdl). Tip Relatively large values cause lag coefficient variances to decay more quickly, which tightly locks higher-order lag coefficients to their prior means. Example: Decay=2 Data Types: double Scale — Response variable variances ones(numseries,1) (default) | positive numeric vector Response variable variances for the cross-variable lag coefficient tightness CrossLag, specified as a numseries-by-1 positive numeric vector. Elements correspond to the response variables. For conjugate prior models, bayesvarm ignores Scale. Scale contributes to the prior variances of all cross-variable lag coefficients in the model (property V of the output model PriorMdl), but does not directly contribute to the innovations covariance matrix stored in the property Sigma. Tip Specify 'Scale' when response variable scales are unbalanced. Example: Scale=[2 1] Data Types: double VarianceX — Prior variance of exogenous coefficients 1e4 (default) | positive numeric scalar Prior variance of exogenous coefficients, specified as a positive numeric scalar. VarianceX sets the prior variances of all exogenous variables, including the model constant c, linear time trend term δ, and exogenous predictor coefficients Β. VarianceX contributes to the value of the prior coefficient variance (property V of the output model PriorMdl). 12-185
12
Functions
Tip Relatively small tightness values indicate strong belief in prior assumptions during estimation (that is, relatively small values tightly lock coefficients of exogenous variables to their prior means). Relatively large values place more weight on information in the data during estimation. Example: VarianceX=100 Data Types: double Innovations Covariance Hyperparameter Options
Sigma — Fixed innovations covariance matrix for normal prior model positive definite numeric matrix Fixed innovations covariance matrix for the normal prior model, specified as a numseries-bynumseries positive definite numeric matrix. If you specify ModelType="normal", you must specify Sigma. For other prior models, Σ is a random variable, so Sigma does not apply. Example: Sigma=eye(2) Data Types: double Omega — Inverse Wishart scale matrix diag(Scale) (default) | positive definite numeric matrix Inverse Wishart scale matrix, specified as a numseries-by-numseries positive definite numeric matrix. Example: Omega=eye(numseries) Data Types: double DoF — Inverse Wishart degrees of freedom numseries + 10 (default) | positive numeric scalar Inverse Wishart degrees of freedom, specified as a positive numeric scalar. For a proper distribution, specify a value that is greater than numseries – 1. For a distribution with a finite mean, specify a value that is greater than numseries + 1. Example: DoF=8 Data Types: double
Output Arguments PriorMdl — Bayesian VAR model storing prior model assumptions conjugatebvarm model object | semiconjugatebvarm model object | diffusebvarm model object | normalbvarm model object Bayesian VAR model storing prior model assumptions, returned as one of the model objects listed in this table.
12-186
bayesvarm
Value of ModelType
Returned Bayesian VAR Model Object
'conjugate'
conjugatebvarm
'semiconjugate' semiconjugatebvarm 'diffuse'
diffusebvarm
'normal'
normalbvarm
PriorMdl specifies the joint prior distribution and characteristics of the VAR model only. The model object is a template intended for further use. To incorporate data into the model for posterior distribution analysis, pass the model object and data to the appropriate object function, for example, estimate or simulate.
More About Bayesian Vector Autoregression (VAR) Model A Bayesian VAR model treats all coefficients and the innovations covariance matrix as random variables in the m-dimensional, stationary VARX(p) model. The model has one of the three forms described in this table. Model
Equation
Reduced-form VAR(p) in difference-equation notation
yt = Φ1 yt − 1 + ... + Φp yt − p + c + δt + Βxt + εt .
Multivariate regression
yt = Zt λ + εt .
Matrix regression
yt = Λ′zt′ + εt .
For each time t = 1,...,T: • yt is the m-dimensional observed response vector, where m = numseries. • Φ1,…,Φp are the m-by-m AR coefficient matrices of lags 1 through p, where p = numlags. • c is the m-by-1 vector of model constants if IncludeConstant is true. • δ is the m-by-1 vector of linear time trend coefficients if IncludeTrend is true. • Β is the m-by-r matrix of regression coefficients of the r-by-1 vector of observed exogenous predictors xt, where r = NumPredictors. All predictor variables appear in each equation. • zt = yt′ − 1 yt′ − 2 ⋯ yt′ − p 1 t xt′ , which is a 1-by-(mp + r + 2) vector, and Zt is the m-by-m(mp + r + 2) block diagonal matrix zt 0z ⋯ 0z 0z zt ⋯ 0z ⋮ ⋮ ⋱ ⋮ 0z 0z 0z zt
,
where 0z is a 1-by-(mp + r + 2) vector of zeros. •
Λ = Φ1 Φ2 ⋯ Φp c δ Β ′, which is an (mp + r + 2)-by-m random matrix of the coefficients, and the m(mp + r + 2)-by-1 vector λ = vec(Λ). 12-187
12
Functions
• εt is an m-by-1 vector of random, serially uncorrelated, multivariate normal innovations with the zero vector for the mean and the m-by-m matrix Σ for the covariance. This assumption implies that the data likelihood is T
∏
ℓ Λ, Σ y, x =
t=1
f yt; Λ, Σ, zt ,
where f is the m-dimensional multivariate normal density with mean ztΛ and covariance Σ, evaluated at yt. Before considering the data, you impose a joint prior distribution assumption on (Λ,Σ) (see the ModelType name-value argument). bayesvarm enables you to adjust hyperparameters by using the Minnesota prior on page 12-188 assumptions and parameter structure [1]; the structure regularizes the coefficients. In a Bayesian analysis, the distribution of the parameters is updated with information about the parameters obtained from the data likelihood. The result is the joint posterior distribution of (Λ,Σ). Minnesota Prior The Minnesota prior, introduced in [1], is a hyperparameter structure for the joint prior distribution of (Λ,Σ) used to obtain a parsimonious model by regularizing the endogenous coefficient matrices of a Bayesian VAR(p) model on page 12-187. Minnesota regularization considers a tuning parameter for the center of shrinkage and several tuning parameters for the tightness of shrinkage. The center of shrinkage is specified by the prior mean of the coefficients (see the Center name-value argument). The Minnesota regularization method sets the prior mean of all coefficients to 0 except the lag 1 self lags (diagonal elements of the AR coefficient matrix Φ1). The prior mean of each lag 1 self lag is a real number, typically in the interval [0,1], where (a priori) response series j is one of the following: • White noise process if prior mean j is 0 • AR(1) model if prior mean j is in the interval (0,1) • Random walk if prior mean j is 1 For example, suppose numseries is 2, numlags is 2, NumPredictors is 1, and all other model options have default values. If you specify Center=0.01*ones(2,1), the vectorized prior mean of Λ|Σ is Β ⨉ Φ1 ⨉Φ2 ⨉c ⨉ 0.01 0 0 0 0 0 μ = vec ′ 0 0.01 0 0 0 0 ⨉ϕ1, 1: 0.01 0
=[
,
where ϕq,j: is row j of Φq. MATLAB stores μ in the Mu property of PriorMdl. You can adjust Mu by using dot notation. The tightness of shrinkage is specified by the prior variance of the coefficients ϕr,jk. For all prior models except conjugate,
12-188
bayesvarm
v0 Var ϕq, jk Σ =
;
j=k
v× σ2j
,
qd
; j≠k 2
qd σk where:
• v0 is the tightness on the prior means of all self lags of Φ1 (SelfLag). • d is the speed of tightness decay (Decay). • ν× is the tightness on the prior means of all cross-variable lag coefficients of Φ1 (CrossLag). • σ2 is the prior response variance (element j of Scale). j For conjugate prior models, Var ϕq, jk Σ =
v0 qd
∀ j, k .
Tips • Because MATLAB does not adjust input data for variable scales, a best practice is to adjust all series to have a similar magnitude. Consequently, the scales of the coefficients are similar. • By default, bayesvarm creates Bayesian VAR models by using the Minnesota prior on page 12188 assumptions and parameter structure [1]. After you create a model, you can inspect the effect of coefficient shrinkage by calling summarize(PriorMdl). You can change the prior mean and variance by setting PriorMdl.Mu and PriorMdl.V, respectively.
Version History Introduced in R2020a
References [1] Litterman, Robert B. "Forecasting with Bayesian Vector Autoregressions: Five Years of Experience." Journal of Business and Economic Statistics 4, no. 1 (January 1986): 25–38. https://doi.org/10.2307/1391384.
See Also Objects normalbvarm | conjugatebvarm | semiconjugatebvarm | diffusebvarm Functions bayeslm
12-189
12
Functions
bkfilter Baxter-King filter for trend and cyclical components
Syntax [Trend,Cyclical] = bkfilter(Y) [TTbl,CTbl] = bkfilter(Tbl) [ ___ ] = bkfilter( ___ ,Name=Value) bkfilter( ___ ) bkfilter(ax, ___ ) [ ___ ,h] = bkfilter( ___ )
Description Separate one or more time series into additive trend and cyclical components by applying the BaxterKing filter on page 12-199 [1]. bkfilter optionally plots the series and smoothed trend component, with cycles removed. In addition to the Baxter-King filter, Econometrics Toolbox supports the Christiano-Fitzgerald (cffilter), Hamilton (hfilter), and Hodrick-Prescott (hpfilter) filters. [Trend,Cyclical] = bkfilter(Y) returns the additive trend Trend and cyclical Cycilcal components from applying the Baxter-King filter on page 12-199 to each variable (column) of the input matrix of time series data Y, using the definition of a business cycle in [2] for quarterly data. [TTbl,CTbl] = bkfilter(Tbl) returns the tables or timetables TTbl and CTbl containing variables for the trend and cyclical components, respectively, from applying the Baxter-King filter to each variable in the input table or timetable Tbl. To select different variables in Tbl to filter, use the DataVariables name-value argument. [ ___ ] = bkfilter( ___ ,Name=Value) specifies options using one or more name-value arguments in addition to any of the input argument combinations in previous syntaxes. bkfilter returns the output argument combination for the corresponding input arguments. For example, bkfilter(Tbl,Stationarity=true,DataVariables=1:5) applies the Baxter-King filter to the first five variables in the input table Tbl and specifies at all input series are stationary. bkfilter( ___ ) plots time series variables in the input data and their respective smoothed trend components (cycles removed), computed by the Baxter-King filter, on the same axes. bkfilter(ax, ___ ) plots on the axes specified by ax instead of the current axes (gca). ax can precede any of the input argument combinations in the previous syntaxes. [ ___ ,h] = bkfilter( ___ ) plots the specified series and their trend components, and additionally returns handles to plotted graphics objects. Use elements of h to modify properties of the plot after you create it.
Examples
12-190
bkfilter
Apply Baxter-King Filter to Matrix of Data Plot the cyclical component of the US post-WWII, seasonally adjusted, quarterly, real gross national product (GNPR). load Data_GNP GNPR = Data(:,2); [trend,cyclical] = bkfilter(GNPR); T = numel(trend) T = 235
trend and cyclical are 235-by-1 vectors containing the trend and cyclical components, respectively, resulting from applying the Baxter-King filter to the series with default upper and lower cutoffs, and lag length for the filter moving average. The first and last 12 values are NaNs. plot(dates,cyclical) axis tight ylabel("Real GNP Cyclical Component")
Apply Baxter-King Filter to Table Variables Apply the Baxter-King filter to all variables in input table variables.
12-191
12
Functions
Load the Schwert stock data set Data_SchwertStock.mat, which contains monthly returns of the NYSE index from 1871 through 2008 in DataTimeTableMth, among three other variables (for details, enter Description). Remove all missing observations from all series. load Data_SchwertStock TTM = rmmissing(DataTimeTableMth);
Aggregate the monthly data in the timetable to quarterly measurements. TTQ = convert2quarterly(TTM);
Apply the Baxter-King filter to all variables in the quarterly timetable. Use the default cutoffs and lag length for the moving average. [TQTT,CQTT] = bkfilter(TTQ); size(TQTT) ans = 1×2 220
4
TQTT and CQTT are 220-by-4 timetables containing the trend and cyclical components, respectively, of the series in TTQ. Variables in the input and output timetables correspond. By default, bkfilter filters all variables in the input table or timetable. To select a subset of variables, set the DataVariables option. The default lag length is 12. Consequently, the first and last 12 rows in the output timetable are NaNvalued. Remove the leading and lagging NaNs from the trends and display what remains. TQTTCut = rmmissing(TQTT); CQTTCut = rmmissing(CQTT); TQTTCut TQTTCut=196×4 timetable Time Return ___________ __________ 31-Mar-1874 30-Jun-1874 30-Sep-1874 31-Dec-1874 31-Mar-1875 30-Jun-1875 30-Sep-1875 31-Dec-1875 31-Mar-1876 30-Jun-1876 30-Sep-1876 31-Dec-1876 31-Mar-1877 30-Jun-1877 30-Sep-1877 31-Dec-1877 ⋮
12-192
-0.039822 -0.017105 0.0039487 -0.0078419 0.020326 -0.0020712 -0.0085514 -0.006103 -0.0055681 0.013271 -0.033603 0.04053 0.0023469 -0.061762 0.066959 -0.017554
DivYld _________
CapGain __________
CapGainA __________
0.0032538 0.0044321 0.0010179 0.0050448 0.0024432 0.0038703 0.0044146 0.0036185 0.0031237 0.0044001 0.0042907 0.0045942 0.0032843 0.004893 0.0047892 0.0029106
-0.028711 -0.023919 0.002954 -0.0063275 0.015128 -0.017158 -0.01171 -0.0058395 -0.0059341 0.0034571 -0.03317 0.031644 -0.014032 -0.049214 0.058975 -0.026179
-0.043076 -0.021537 0.0029307 -0.012887 0.017883 -0.0059416 -0.012966 -0.0097214 -0.0086918 0.0088705 -0.037894 0.035936 -0.0009374 -0.066655 0.06217 -0.020464
bkfilter
CQTTCut CQTTCut=196×4 timetable Time Return ___________ __________ 31-Mar-1874 30-Jun-1874 30-Sep-1874 31-Dec-1874 31-Mar-1875 30-Jun-1875 30-Sep-1875 31-Dec-1875 31-Mar-1876 30-Jun-1876 30-Sep-1876 31-Dec-1876 31-Mar-1877 30-Jun-1877 30-Sep-1877 31-Dec-1877 ⋮
0.01699 0.020025 0.016201 0.0064867 -0.002434 -0.0061591 -0.0033857 0.0015781 0.0017405 -0.0062558 -0.021212 -0.034671 -0.039779 -0.028847 -0.003445 0.023547
DivYld ___________
CapGain __________
CapGainA __________
-0.00117 -0.0013329 -0.00079002 1.9907e-06 0.00036989 3.1258e-05 -0.00076308 -0.0013475 -0.0010938 1.9708e-05 0.0013923 0.0023024 0.0023122 0.0017301 0.0011069 0.00090623
0.015758 0.018379 0.013713 0.0036027 -0.0041687 -0.0050638 0.00044226 0.0058395 0.0031869 -0.0094095 -0.026956 -0.03854 -0.038012 -0.022516 0.0022493 0.022362
0.01816 0.021358 0.016991 0.0064847 -0.0028039 -0.0061903 -0.0026226 0.0029255 0.0028343 -0.0062755 -0.022604 -0.036974 -0.042091 -0.030577 -0.0045519 0.022641
To compare outputs between different tabular inputs, apply the Baxter-King filter to all variables in the table of monthly data DataTableMth and the timetable of monthly data TTM. % Table input of monthly data DTM = rmmissing(DataTableMth); [TMDT,CMDT] = bkfilter(DataTableMth); TMDT = rmmissing(TMDT); CMDT = rmmissing(CMDT); size(TMDT) ans = 1×2 632
4
tail(TMDT)
May1924 Jun1924 Jul1924 Aug1924 Sep1924 Oct1924 Nov1924 Dec1924
Return __________
DivYld _________
CapGain __________
CapGainA __________
-0.0016302 0.047692 0.044844 0.010929 -0.0086959 -0.0014852 0.062927 0.045108
0.002973 0.0065778 0.0060522 0.0019358 0.006971 0.0049456 0.0020931 0.0070319
-0.0046032 0.041115 0.038792 0.0089936 -0.015667 -0.0064308 0.060834 0.038076
-0.0046032 0.041115 0.038792 0.0089936 -0.015667 -0.0064308 0.060834 0.038076
tail(CMDT) Return _________
DivYld ___________
CapGain _________
CapGainA _________
12-193
12
Functions
May1924 Jun1924 Jul1924 Aug1924 Sep1924 Oct1924 Nov1924 Dec1924
0.0074662 0.017044 0.016657 0.0096193 0.0035508 0.0064063 0.013666 0.015515
-4.9042e-05 0.00019971 0.00028124 0.00019693 0.00014389 7.9476e-05 7.2083e-05 0.00010861
0.0075152 0.016844 0.016376 0.0094224 0.0034069 0.0063268 0.013594 0.015407
0.0075152 0.016844 0.016376 0.0094224 0.0034069 0.0063268 0.013594 0.015407
% Timetable input of monthly data [TMTT,CMTT] = bkfilter(TTM); TMTT = rmmissing(TMTT); CMTT = rmmissing(CMTT); size(TMTT) ans = 1×2 632
4
tail(TMTT) Time ___________
Return __________
DivYld _________
CapGain __________
CapGainA __________
01-May-1924 01-Jun-1924 01-Jul-1924 01-Aug-1924 01-Sep-1924 01-Oct-1924 01-Nov-1924 01-Dec-1924
-0.0016302 0.047692 0.044844 0.010929 -0.0086959 -0.0014852 0.062927 0.045108
0.002973 0.0065778 0.0060522 0.0019358 0.006971 0.0049456 0.0020931 0.0070319
-0.0046032 0.041115 0.038792 0.0089936 -0.015667 -0.0064308 0.060834 0.038076
-0.0046032 0.041115 0.038792 0.0089936 -0.015667 -0.0064308 0.060834 0.038076
tail(CMTT) Time ___________
Return _________
DivYld ___________
CapGain _________
CapGainA _________
01-May-1924 01-Jun-1924 01-Jul-1924 01-Aug-1924 01-Sep-1924 01-Oct-1924 01-Nov-1924 01-Dec-1924
0.0074662 0.017044 0.016657 0.0096193 0.0035508 0.0064063 0.013666 0.015515
-4.9042e-05 0.00019971 0.00028124 0.00019693 0.00014389 7.9476e-05 7.2083e-05 0.00010861
0.0075152 0.016844 0.016376 0.0094224 0.0034069 0.0063268 0.013594 0.015407
0.0075152 0.016844 0.016376 0.0094224 0.0034069 0.0063268 0.013594 0.015407
Because the data is disaggregated, the outputs of the daily data have more rows than from the quarterly data. The filter results of the daily inputs are equal among the corresponding outputs, but bkfilter returns tables of results, instead of timetables, when you supply data in a table.
Set Filter Parameters for Data Periodicity Load the Nelson-Plosser macroeconomic data set Data_NelsonPlosser.mat, which contains series measured yearly in the timetable DataTimeTable. 12-194
bkfilter
load Data_NelsonPlosser
Apply the Baxter-King filter to the real and nominal GNP series, GNPR and GNPN, respectively. Filter out cyclical component frequencies outside the interval [2,8], and set the lag length of the moving average to 3 years. Plot the trend component with each series. bkfilter(DataTimeTable,DataVariables=["GNPR" "GNPN"], ... LowerCutoff=2,UpperCutoff=8,LagLength=3);
Experiment with filter parameter values by adjusting the interactive controls. varnames = string(DataTimeTable.Properties.VariableNames); lc =
; % LowerCutoff
uc =
; % UpperCutoff
q = tfs = vn =
; % LagLength ; % Stationarity ; % DataVariables
figure [TTbl,CTbl,h] = bkfilter(DataTimeTable,DataVariables=vn, ... LowerCutoff=lc,UpperCutoff=uc,LagLength=q,Stationarity=tfs);
12-195
12
Functions
Input Arguments Y — Time series data numeric matrix Time series data, specified as a numObs-by-numVars numeric matrix. Each column of Y corresponds to a variable, and each row corresponds to an observation. Data Types: double Tbl — Time series data table | timetable Time series data, specified as a table or timetable with numObs rows. Each row of Tbl is an observation. Specify numVars variables to filter by using the DataVariables argument. The selected variables must be numeric. ax — Axes on which to plot Axes object Axes on which to plot, specified as an Axes object. By default, bkfilter plots to the current axes (gca). 12-196
bkfilter
Note bkfilter removes, from the specified data, all rows containing at least one missing observation, represented by a NaN value. Name-Value Pair Arguments Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter. Before R2021a, use commas to separate each name and value, and enclose Name in quotes. Example: bkfilter(Tbl,Stationarity=true,DataVariables=1:5) applies the Baxter-King filter to the first five variables in the input table Tbl and specifies at all input series are stationary. LowerCutoff — Lower cutoff period for cyclical component 6 (default) | numeric scalar | numeric vector Lower cutoff period for the cyclical component, in units of data periodicity, specified as a numeric scalar or a length numVars vector with elements greater than or equal to 2. For a scalar, bkfilter uses LowerCutoff for all input series. For a vector, bkfilter applies LowerCutoff(j) to selected j in the input data. The default is 6, meaning 6 quarters (18 months), which is based on the definition of business cycle in [2]. For more details, see “Tips” on page 12-199 Example: LowerCutoff=[4 6] applies a lower cutoff of 4 to the first series in the input data and a lower cutoff of 6 to the second series. Data Types: double UpperCutoff — Upper cutoff period for cyclical component 32 (default) | numeric scalar | numeric vector Upper cutoff period for the cyclical component, in units of data periodicity, specified as a numeric scalar or a length numVars vector with elements greater than or equal to the corresponding LowerCutoff for each series. For a scalar, bkfilter uses UpperCutoff for all input series. For a vector, bkfilter applies UpperCutoff(j) to series j in the input data. The default is 32, meaning 32 quarters (8 years), which is based on the definition of business cycle in [2]. For more details, see “Tips” on page 12-199 Example: UpperCutoff=[32 36] applies an upper cutoff of 32 to the first series in the input data and an upper cutoff of 36 to the second series. Data Types: double 12-197
12
Functions
LagLength — Number of consecutive lags 12 (default) | positive integer | vector of positive integers Number of lags of the symmetric moving average, specified as a positive integer less than (numObs −1)/2 or vector of such positive integers. For a scalar, bkfilter applies LagLength to all input series. For a vector, bkfilter applies LagLength(j) to series j in the input data. The default is 12, suggested in [1] for quarterly data. For more details, see “Tips” on page 12-199 Example: LagLength=[4 8] specifies 4 lags for the symmetric moving average of the first series in the input data and 8 for the symmetric moving average of the second series. Data Types: double Stationarity — Flag indicating whether input series is stationary false (default) | true | logical vector Flag indicating whether input series is stationary, specified as a value or a vector of values in this table. Value
Description
false
Input series is nonstationary
true
Input series is stationary
For a scalar, bkfilter applies Stationarity to all input series. For a vector, bkfilter applies Stationarity(j) to series j in the input data. Example: Stationarity=[true false] specifies that the first input series is stationary and the second input series is nonstationary. Data Types: logical DataVariables — Variables in Tbl all variables (default) | string vector | cell vector of character vectors | vector of integers | logical vector Variables in Tbl that bkfilter filters, specified as a string vector or cell vector of character vectors containing variable names in Tbl.Properties.VariableNames, or an integer or logical vector representing the indices of names. The selected variables must be numeric. Example: DataVariables=["GDP" "CPI"] Example: DataVariables=[true true false false] or DataVariables=[1 2] selects the first and second table variables. Data Types: double | logical | char | cell | string
Output Arguments Trend — Smoothed trend component τt numeric vector | numeric matrix 12-198
bkfilter
Smoothed trend component τt of each series in the data, returned as a numObs-by-numVars numeric matrix. bkfilter returns Trend when you supply the input Y. The first and last LagLength values are NaN. Cyclical — Cyclical component ct table | timetable Cyclical component ct of each series in the data, returned as a numObs-by-numVars numeric matrix. bkfilter returns Cyclical when you supply the input Y. The first and last LagLength values are NaN. TTbl — Smoothed trend component τt table | timetable Smoothed trend component τt of each specified series, returned as a numObs-by-numVars table or timetable, the same data type as Tbl. bkfilter returns TTbl when you supply the input Tbl. For each selected series and corresponding LagLength, the first and last LagLength values are NaN. CTbl — Cyclical component ct numeric vector | numeric matrix Cyclical component ct of each specified series, returned as a numObs-by-numVars table or timetable, the same data type as Tbl. bkfilter returns CTbl when you supply the input Tbl. For each selected series and corresponding LagLength, the first and last LagLength values are NaN. h — Handles to plotted graphics objects vector of graphics objects Handles to plotted graphics objects, returned as a vector of graphics objects. bkfilter plots the data and trend only when you return no outputs or you return h. h contains unique plot identifiers, which you can use to query or modify properties of the plot.
More About Baxter-King Filter The Baxter-King filter separates a time series yt into a trend component τt (Trend and TTbl) and cyclical component ct (Cyclical and CTbl) such that yt = τt + ct. The method implements a symmetric, fixed-length, time-invariant moving average for the cycle as a finite-sample approximation of an ideal bandpass filter [1].
Tips Baxter and King [1] suggest values in the table for the cutoff period name-value arguments LowerCutoff and UpperCutoff, and lag length name-value argument LagLength that depend on the periodicity of the data. 12-199
12
Functions
Periodicity
LowerCutoff
UpperCutoff
LagLength
Yearly
2
8
3
Quarterly
6
32
12
Monthly
18
96
36
In practice, use vectors of cutoff periods and lag lengths to test alternatives. Use the plot produced by bkfilter to compare results among settings.
Version History Introduced in R2023a
References [1] Baxter, Marianne, and Robert G. King. "Measuring Business Cycles: Approximate Band-Pass Filters for Economic Time Series." Review of Economics and Statistics 81, no. 4 (November 1999): 575–93. https://doi.org/10.1162/003465399558454. [2] Burns, Arthur F., and Wesley C. Mitchell. Measuring Business Cycles. Cambridge, MA: National Bureau of Economic Research, 1946.
See Also hpfilter | cffilter | hfilter Topics “Choose Time Series Filter for Business Cycle Analysis” on page 2-40 When to Use the Hodrick-Prescott Filter
12-200
bnlssm
bnlssm Create Bayesian nonlinear non-Gaussian state-space model
Description bnlssm creates a bnlssm object, representing a Bayesian nonlinear non-Gaussian state-space model on page 12-212, from a specified nonlinear mapping function, which defines the state-space model structure, and the log prior distribution function of the parameters. The state-space model can be time-invariant on page 11-4 or time-varying on page 11-5, and the state or observation variables, xt or yt, respectively, can be multivariate series. State disturbances are Gaussian random variables, and the observation innovations have a custom distribution. In general, a bnlssm object specifies the joint prior distribution and characteristics of the state-space model only. That is, the model object is a template intended for further use. Alternative state-space models include: • The ssm model object — Standard linear Gaussian state-space model • The dssm model object — Standard linear Gaussian state-space model with diffuse initial state distribution • The bssm model object — Bayesian linear state-space model
Creation Syntax PriorMdl = bnlssm(ParamMap,ParamDistribution) PriorMdl = bnlssm(ParamMap,ParamDistribution,Name=Value) Description PriorMdl = bnlssm(ParamMap,ParamDistribution) creates the Bayesian nonlinear nonGaussian state-space model on page 12-212 object PriorMdl. ParamMap is a function of the collection of state-space model parameters Θ that characterizes the nonlinear state dynamics (transition of states xt from time t – 1 to t) and nonlinear state measurements (observations) yt. ParamDistribution is the log prior density of Θ. The state model has additive linear Gaussian disturbances ut; the observation model can have additive linear Gaussian innovations εt or yt can have a custom density. PriorMdl is a template that specifies the joint prior distribution of Θ and the structure of the nonlinear state-space model. PriorMdl = bnlssm(ParamMap,ParamDistribution,Name=Value) sets properties on page 12207 using name-value arguments. For example, bnlssm(ParamMap,ParamDistribution,ObservationForm="distribution") specifies that the observation log density log(p(yt|xt)) is custom and defined in ParamMap. 12-201
12
Functions
Input Arguments ParamMap — Parameter Θ mapping characterizing state-space model structure function handle Parameter Θ mapping characterizing the state-space model structure and determining the data likelihood, specified as a function handle in the form @fcnName, where fcnName is the function name. ParamMap sets the ParamMap property. This table contains the supported forms for ParamMap and their required signatures (For more details on the terms in the equations, see Bayesian nonlinear non-Gaussian state-space model on page 12-212). The distribution of the observations yt determines the form and paramMap is the MATLAB function name for ParamMap, but you can use a different name. Observation Form
Observation Distribution
Equation
εt is an additive linear Gaussian random series.
[A,B,C,D,Mean0,Cov0] = xt = At xt − 1; θAt + Bfunction t θBt ut
Custom
[A,B,LogY,Mean0,Cov0] = xt = At xt − 1; θAt + Bfunction t θBt ut
Distribution You must set ObservationForm="d istribution"
State-Space Model
Required ParamMap Signature
yt = Ct xt; θCt + Dt θDt εt .
yt logp yt xt; θyt .
paramMap can accept additional inputs, such as predictor data for a regression component in the observation equation, and it can return additional outputs. This signature shows the reserved, but optional, outputs. function [A,B,LogY,Mean0,Cov0,StateType,DeflatedData] = paramMap(theta)
• theta is a numParams-by-1 numeric vector of the state-space model parameters Θ as the first input argument. The function accepts inputs in subsequent positions. • ParamMap returns the state-space model parameters in this table.
12-202
bnlssm
Equation Quantity
Output Position Output
State-transition 1 mapping on page 12213, At
Required output A is one of the following quantities: • For a linear time-invariant model, A is an m-by-m coefficient matrix. • For a linear time-varying model, A is a T-by-1 cell vector, where cell t contains the mt-by-mt–1 coefficient matrix. • For a nonlinear mapping, A is a function handle. The corresponding function must have this signature: function xt = at(lagxt)
• lagxt is a column vector or matrix of states at time t – 1, with mt–1 rows. • xt is a column vector or matrix at time t, with mt rows. The function can accept additional inputs and return additional outputs. For details on input and output matrices, see Multipoint. State-disturbanceloading coefficient matrix on page 12213, Bt
2
Required output B is one of the following mappings for the additive linear Gaussian state-disturbance series ut: • For a linear time-invariant model, B is an m-by-k coefficient matrix. • For a linear time-varying model, B is a T-by-1 cell vector, where cell t contains the mt-by-kt coefficient matrix.
12-203
12
Functions
Equation Quantity
Output Position Output
Measurementsensitivity mapping on page 12-213, Ct
3, for equation form only
Required output C, for equation-form models, is one of the following quantities: • For a linear time-invariant model, C is an n-by-m coefficient matrix. • For a linear time-varying model, C is a T-by-1 cell vector, where cell t contains the nt-by-m coefficient matrix. • For a nonlinear mapping, C is a function handle. The corresponding function must have this signature: function yt = ct(xt)
• xt is a column vector or matrix of states at time t, with mt rows. • yt is a column vector or matrix at time t, with mt rows. The function can accept additional inputs and return additional outputs. For details on input and output matrices, see Multipoint. Observation4, for equation innovation coefficient form only matrix on page 12214, Dt
Required output D, for equation-form models, is one of the following mappings for the additive linear Gaussian observation-innovation series εt: • For a linear time-invariant model, D is an n-by-h coefficient matrix. • For a linear time-varying model, D is a T-by-1 cell vector, where cell t contains the nt-by-ht coefficient matrix.
log p(yt|xt;Θyt)
3, for distribution Required output LogY, for distribution-form models, form only is a function handle to the custom log observation density. The corresponding function must have this signature: function p = logyt(yt,xt)
• yt is a column vector of responses at time t yt with nt rows. • xt is a column vector or matrix of states at time t, with mt rows. • p is a scalar or row vector of corresponding log densities. The function can accept additional inputs and return additional outputs. For details on input and output matrices, see Multipoint.
12-204
bnlssm
Equation Quantity
Output Position Output
Initial state mean vector, μ0
• 5, for equation Output Mean0 is an m0-by-1 vector. form • For linear state transition A, the default is the • 4, for mean of the stationary distribution of the states. distribution • For nonlinear state transitions, you must specify form Mean0. bnlssm assumes x0 ~ N(μ0,Σ0) regardless of equation form. For more details, see “State Characteristics” on page 12-214.
Initial state 6 covariance matrix, Σ0
Output Cov0 is an m0-by-m0 matrix. • For linear state transition A, the default is the covariance of the stationary distribution of the states. • For nonlinear state transitions, you must specify Cov0. For more details, see “State Characteristics” on page 12-214.
State classification vector, StateType
7
Optional output vector of flags specifying the corresponding state type, either stationary (0), the constant 1 (1), or diffuse, static on page 12-215, or nonstationary (2). For more details, see “State Characteristics” on page 12-214.
DeflatedData
8
Optional output array of response data deflated by predictor data, which accommodates a regression component in the observation equation
The subscript t of functions and parameters indicate that the parameters can be time-varying. Ignore the subscript for time-invariant functions or parameters. To skip specifying an optional output argument, set the argument to [] in the function body. For example, to skip specifying StateType, set StateType = []; in the function. Specify parameters to include in the posterior distribution by setting their value to an entry in the first input argument theta and set known entries of the coefficients to their values. For example, the following lines define the 1-D time-invariant state-space model xt = axt − 1 + but yt = xt + dεt . A B C D
= = = =
theta(1); theta(2); 1; theta(3);
If paramMap requires the input parameter vector argument only, you can create the bnlssm object by calling: Mdl = bnlssm(@paramMap,...)
12-205
12
Functions
In general, create the bnlssm object by calling: Mdl = bnlssm(@(theta)paramMap(theta,...otherInputArgs...),...)
Example: bnlssm(@(params)paramFun(theta,y,z),@ParamDistribution) specifies the function paramFun that accepts the state-space model parameters theta, observed responses y, and predictor data z. Tip • Because out-of-bounds prior density evaluation is 0, set the log prior density of out-of-bounds parameter arguments to -Inf. • A best practice is to set StateType of each state within ParamMap for both of the following reasons: • By default, the software generates StateType, but the default choice might not be accurate. For example, the software cannot distinguish between a constant 1 state and a static state. • The software cannot infer StateType from data because the data theoretically comes from the observation equation. The realizations of the state equation are unobservable.
Data Types: function_handle ParamDistribution — Log of joint probability density function of the state-space model parameters Π(Θ) function handle Log of joint probability density function of the state-space model parameters Π(Θ), specified as a function handle in the form @fcnName, where fcnName is the function name. ParamDistribution sets the ParamDistribution property. Suppose logPrior is the name of the MATLAB function defining the joint prior distribution of Θ. Then, logPrior must have this form. function logpdf = logPrior(theta,...otherInputs...) ... end
where: • theta is a numparams-by-1 numeric vector of the linear state-space model parameters Θ. Elements of theta must correspond to those of ParamMap. The function can accept other inputs in subsequent positions. • logpdf is a numeric scalar representing the log of the joint probability density of Θ at the input theta. If ParamDistribution requires the input parameter vector argument only, you can create the bnlssm object by calling: Mdl = bnlssm(...,@logPrior)
In general, create the bnlssm object by calling: Mdl = bnlssm(...,@(theta)logPrior(theta,...otherInputArgs...))
12-206
bnlssm
Tip Because out-of-bounds prior density evaluation is 0, set the log prior density of out-of-bounds parameter arguments to -Inf. Data Types: function_handle
Properties ObservationForm — Observation model form "equation" (default) | "distribution" Observation model form, specified as a value in this table. Value
Observation Distribution
"equation"
εt is an additive linear Gaussian random series.
xt = At xt − 1; θAt + Bt θBt ut
Custom
xt = At xt − 1; θAt + Bt θBt ut
"distribution"
State-Space Model
yt = Ct xt; θCt + Dt θDt εt .
yt logp yt xt; θyt . Example: ObservationForm="distribution" Data Types: char | string Multipoint — Multipoint evaluation of nonlinear functions empty array ([]) (default) | "A" | "C" | "LogY" | string vector Multipoint evaluation of nonlinear functions of ParamMap A, C and LogY, specified as a "A", "C", "LogY", or a string vector of such values. Specify Multipoint to speed up particle filtering routines. For the specified nonlinear functions, bnlssm object functions can evaluate multiple points simultaneously. For example, suppose x1 and x2 are two points (state particles) to be evaluated by At(xt) and Multipoint="A". The function At evaluates the concatenated points At([x1 x2]) = [Z1 Z2]. You must write the functions so that they can evaluate numpaticles points, or states, simultaneously. To write nonlinear functions to support multipoint evaluation: • For A = At, the function must accept an mt–1-by-numparticles matrix of state particles at time t, and return a 1-by-numparticles row vector of corresponding evaluations. For example, A = x(1,:)./x(2,:). • For C = Ct, the function must accept an nt-by-1 column vector responses at time t and an mt-bynumparticles matrix of state particles at time t, and return a 1-by-numparticles row vector of corresponding evaluations. At time t, the software applies each observation to all state particles. For example, C = theta(2)*x(1,:).*x(2,:). • For LogY = log p(yt|xt;Θyt), the function must accept an nt-by-1 column vector responses at time t and an mt-by-numparticles matrix of state particles at time t, and return a 1-by-numparticles row vector of corresponding log density evaluations. At time t, the software applies each observation to all state particles. If you disable multipoint evaluation (the default), functions process points sequentially, for example, functions evaluate At(x1) = Z1, and then they evaluate At(x2) = Z2. 12-207
12
Functions
When At or Ct are coefficient matrices, functions always apply multipoint evaluation because, for example, At[x1 x2] is well-defined. Example: Multipoint=["A" "LogY"] indicates that bnlssm object functions can evaluate multiple points of the nonlinear functions A and LogY simultaneously. Data Types: char | string ParamMap — Parameter Θ mapping characterizing state-space model structure function handle Parameter Θ mapping characterizing state-space model structure, stored as a function handle and set by the ParamMap input argument. ParamMap completely specifies the structure of the state-space model. Data Types: function_handle ParamDistribution — Parameter distribution representation function handle | numeric matrix Parameter distribution representation, stored as a function handle or a numparams-by-numdraws numeric matrix. • ParamDistribution is a function handle for the log prior distribution of the parameters ParamDistribution when you create PriorMdl directly by using bnlssm. • ParamDistribution is a numparams-by-numdraws numeric matrix containing random draws from the posterior distribution of the parameters when you sample from the posterior using an object function. Rows correspond to the elements of theta and columns correspond to subsequent draws of the pseudo-marginal and particle-marginal Metropolis-Hastings samplers [1] [2][3]. Data Types: function_handle
Object Functions filter
Forward recursion of Bayesian nonlinear non-Gaussian state-space model
Examples Create Bayesian Nonlinear Model in Equation Form This example shows how to create the following Bayesian nonlinear state-space model in equation form by using bnlssm. The state-space model contains two independent, stationary, autoregressive states each with a model constant. The observations are a nonlinear function of the states with Gaussian noise. The prior distribution of the parameters is flat. Symbolically, the system of equations is xt, 1 xt, 2 xt, 3 xt, 4
=
θ1 θ2 0 0 xt − 1, 1 0 1 0 0 xt − 1, 2 0 0 θ3 θ4 xt − 1, 3 0 0 0 1 xt − 1, 4
θ5 0 +
0 0 ut, 1 0 θ6 ut, 3 0 0
yt = log(exp(xt, 1 − μ1) + exp(xt, 3 − μ3)) + θ7εt . 12-208
bnlssm
μ1 and μ3 are the unconditional means of the corresponding states. The initial distribution moments of each state are their unconditional mean and covariance. Create a Bayesian nonlinear state-space model characterized by the system. The observation equation is in equation form, that is, the function composing the states is nonlinear and the innovation series εt is additive, linear, and Gaussian. The Local Functions on page 12-209 section contains two functions required to specify the Bayesian nonlinear state-space model: the state-space model parameter mapping function and the prior distribution of the parameters. You can use the functions only within this script. Mdl = bnlssm(@paramMap,@priorDistribution) Mdl = bnlssm with properties: ParamMap: ParamDistribution: ObservationForm: Multipoint:
@paramMap @priorDistribution "equation" [1x0 string]
Mdl is a bnlssm model specifying the state-space model structure and prior distribution of the statespace model parameters. Because Mdl contains unknown values, it serves as a template for posterior analysis with observations. Local Functions These functions specify the state-space model parameter mappings, in equation form, and log prior distribution of the parameters. function [A,B,C,D,Mean0,Cov0,StateType] = paramMap(theta) A = @(x)blkdiag([theta(1) theta(2); 0 1],[theta(3) theta(4); 0 1])*x; B = [theta(5) 0; 0 0; 0 theta(6); 0 0]; C = @(x)log(exp(x(1)-theta(2)/(1-theta(1))) + ... exp(x(3)-theta(4)/(1-theta(3)))); D = theta(7); Mean0 = [theta(2)/(1-theta(1)); 1; theta(4)/(1-theta(3)); 1]; Cov0 = diag([theta(5)^2/(1-theta(1)^2) 0 theta(6)^2/(1-theta(3)^2) 0]); StateType = [0; 1; 0; 1]; % Stationary state and constant 1 processes end function logprior = priorDistribution(theta) paramconstraints = [(abs(theta([1 3])) >= 1) (theta(5:7) = 1) (abs(theta(2)) >= 1) ... (theta(3) < 0) (theta(4) < 0)]; if(sum(paramconstraints)) logprior = -Inf; else mu0 = 0.5*ones(numel(theta),1); sigma0 = 1; p = normpdf(theta,mu0,sigma0); logprior = sum(log(p)); end end
Specify Log Chi-Squared Distributed Observation Innovations To model volatility clustering, you can specify log χ12 distributed observation innovations by setting appropriate mixture weights, and regime means and variances of a 7-regime Gaussian mixture distribution. Consider the Bayesian state-space model in “Create Time-Invariant Bayesian State-Space Model with Known and Unknown Parameters” on page 12-222, but assume that the observation-innovations process is distributed as a log χ12 random variable. Create a structure array with the following fields and values. • Field Name with value "mixture" • Field Weight with value [0.0089 0.0541 0.1338 0.2761 0.2923 0.1494 0.0854] • Field Mean with value [-9.3202 -5.3145 -3.4147 -1.7097 -0.4531 0.3975 1.1925] • Field Variance with value [3.2793 2.4574 1.8874 1.3121 0.8843 0.5898 0.4995].^2 weight = [0.0089 0.0541 0.1338 0.2761 0.2923 0.1494 0.0854]; mu = [-9.3202 -5.3145 -3.4147 -1.7097 -0.4531 0.3975 1.1925]; sigma2 = [3.2793 2.4574 1.8874 1.3121 0.8843 0.5898 0.4995].^2; ObsInnovDist = struct("Name","mixture","Weight",weight, ... "Mean",mu,"Variance",sigma2);
Create the model by passing function handles to the local functions on page 12-230 that represent the state-space model structure and prior distribution of the model parameters Θ. Use the structure array ObsInnovDist to specify that the distribution of the observation innovations is finite Gaussian mixture with hyperparameters that define a log χ12 distribution. 12-228
bssm
Mdl = bssm(@paramMap,@priorDistribution,ObservationDistribution=ObsInnovDist); Mdl.ObservationDistribution ans = struct with fields: Name: "mixture" Weight: [0.0089 0.0541 0.1338 0.2761 0.2923 0.1494 0.0854] Mean: [-9.3202 -5.3145 -3.4147 -1.7097 -0.4531 0.3975 1.1925] Variance: [10.7538 6.0388 3.5623 1.7216 0.7820 0.3479 0.2495]
Mdl is a bssm model. The property Mdl.ObservationDistribution is a structure array specifying the distribution of the observation-innovations process. All distribution hyperparameters are fully specified. Plot the distribution of the observation innovations. Compare the Gaussian mixture representation of the log χ12 and the true log χ12 distributions. r = numel(weight); LogChi2GMMdl = gmdistribution(mu',reshape(sigma2,1,1,r),weight); gmPDF = @(x)arrayfun(@(x0)pdf(LogChi2GMMdl,x0),x); logchi2PDF = @(x)((1/sqrt(2*pi))*exp((x-exp(x))/2)); figure fplot(gmPDF,[-10,5]) hold on fplot(logchi2PDF,"--r") title("Log Chi-Squared Distribution: Gaussian Mixture Versus True") legend("Gaussian Mixture","True",Location="best")
12-229
12
Functions
The distributions appear nearly identical. Local Functions function [A,B,C,D,Mean0,Cov0,StateType] = paramMap(theta) A = [theta(1) 0; 0 theta(2)]; B = [theta(3) 0; 0 theta(4)]; C = [1 1]; D = 0; Mean0 = []; % MATLAB uses default initial state mean Cov0 = []; % MATLAB uses initial state covariances StateType = [0; 0]; % Two stationary states end function logprior = priorDistribution(theta) paramconstraints = [(abs(theta(1)) >= 1) (abs(theta(2)) >= 1) ... (theta(3) < 0) (theta(4) < 0)]; if(sum(paramconstraints)) logprior = -Inf; else mu0 = 0.5*ones(numel(theta),1); sigma0 = 1; p = normpdf(theta,mu0,sigma0); logprior = sum(log(p)); end end
Create Model Containing Regression Component to Deflate Observations Consider a regression of the US unemployment rate onto and real gross national product (RGNP) rate, and suppose the resulting innovations are an ARMA(1,1) process. The state-space form of the relationship is x1, t x2, t
=
σ ϕ θ x1, t − 1 + ut 1 0 0 x2, t − 1
yt − βZt = x1, t, where: • x1, t is the ARMA process. • x2, t is a dummy state for the MA(1) effect. •
yt is the observed unemployment rate deflated by a constant and the RGNP rate (Zt).
• ut is an iid Gaussian series with mean 0 and standard deviation 1. Load the Nelson-Plosser data set, which contains a table DataTable that has the unemployment rate and RGNP series, among other series. load Data_NelsonPlosser
Create a variable in DataTable that represents the returns of the raw RGNP series. Because priceto-returns conversion reduces the sample size by one, prepad the series with NaN. 12-230
bssm
DataTable.RGNPRate = [NaN; price2ret(DataTable.GNPR)]; T = height(DataTable);
Create variables for the regression. Represent the unemployment rate as the observation series and the constant and RGNP rate series as the deflation data Zt. Z = [ones(T,1) DataTable.RGNPRate]; y = DataTable.UR;
Write a function that specifies how the parameters theta map to the state-space model matrices, defers the initial state moments to the default, specifies the state types, and specifies the regression. Save this code as a file named armaDeflateYBayes.m on your MATLAB® path. Alternatively, open this example to access the function. function [A,B,C,D,Mean0,Cov0,StateType,DeflatedY] = armaDeflateYBayes(theta,y,Z) % Time-invariant, Bayesian state-space model parameter mapping function % example. This function maps the vector parameters to the state-space % matrices (A, B, C, and D), the default initial state value and the % default initial state variance (Mean0 and Cov0), the type of state % (StateType), and the deflated observations (DeflatedY). The log prior % distribution enforces parameter constraints (see flatPriorDeflateY.m). A = [theta(1) theta(2); 0 0]; B = [1; 1]; C = [theta(3) 0]; D = 0; Mean0 = []; Cov0 = []; StateType = [0 0]; DeflatedY = y - Z*[theta(4); theta(5)]; end
Write a function that specifies a joint flat prior and parameter constraints. Save this code as a file named flatPriorDeflateY.m on your MATLAB path. Alternatively, open this example to access the function. % Copyright 2022 The MathWorks, Inc. function logprior = flatPriorDeflateY(theta) % flatPriorDeflateY computes the log of the flat prior density for the five % variables in theta (see armaDeflateYBayes.m). Log probabilities % for parameters outside the parameter space are -Inf. % theta(1) and theta(2) are the AR and MA terms in a stationary % ARMA(1,1) model. The AR term must be within the unit circle. AROutUC = abs(theta(1)) >= 1; % The standard deviation of innovations (theta(3)) must be positive. nonnegsig1 = theta(3) 0 logprior = -Inf; else logprior = 0; % Prior density is proportional to 1 for all values % in the parameter space. end end
12-231
12
Functions
Create a bssm object representing the Bayesian state-space model. Specify the parameter-to-matrix mapping function as a handle to a function solely of the parameters theta. Mdl = bssm(@(theta)armaDeflateYBayes(theta,y,Z),@flatPriorDeflateY) Mdl = Mapping that defines a state-space model: @(theta)armaDeflateYBayes(theta,y,Z) Log density of parameter prior distribution: @flatPriorDeflateY
More About Bayesian Linear State-Space Model A Bayesian state-space model is a Bayesian view of a standard linear state-space model on page 11-3, in which the vector of model parameters Θ are treated as random variables with a joint prior distribution Π(Θ) and a posterior distribution composed of the joint prior and data likelihood computed by the standard Kalman filter on page 11-7 Π(Θ|yt). In general, a linear, multivariate, time-varying on page 11-5, Gaussian state-space model is the system of equations xt = Atxt − 1 + Btut yt − Zt β ′ = Ctxt + Dtεt, for t = 1, ..., T and where: • xt = xt1, ..., xtm ′ is an mt-dimensional state vector describing the dynamics of some, possibly t unobservable, phenomenon at period t. The initial state distribution (x0) has mean μ0 (Mean0) and covariance matrix Σ0 (Cov0). •
yt = yt1, ..., ytnt ′ is an nt-dimensional observation vector describing how the states are measured by observers at period t.
• ut = ut1, ..., utk ′ is a kt-dimensional white-noise vector of state disturbances at period t. All t disturbances are either multivariate Gaussian distributed or multivariate Student's t distributed, with νu degrees of freedom. • εt = εt1, ..., εth ′ is an ht-dimensional white-noise vector of observation innovations at period t. All t innovations are either multivariate Gaussian distributed or multivariate Student's t distributed, with νε degrees of freedom. • εt and ut are uncorrelated. • For time-invariant state-space models on page 11-4, • Zt = zt1 zt2 ⋯ ztd is row t of a T-by-d matrix of predictors Z. Each column of Z corresponds to a predictor, and each successive row to a successive period. If the observations are multivariate, then all predictors deflate each observation. • β is a d-by-n matrix of regression coefficients for Zt. 12-232
bssm
• At, Bt, Ct, Dt, and β (when present) are model parameters arbitrarily collected in the vector Θ. The joint prior distribution of Θ is Π(Θ) and the joint posterior distribution of Θ is Π(Θ|yt,Zt). The following definitions describe each of the model parameters and state characteristics, and how to configure them as outputs of ParamMap. State-Transition Coefficient Matrix At The state-transition coefficient matrix At is a matrix or cell vector of matrices that specifies how the states xt are expected to transition from period t – 1 to t, for all t = 1,...,T. In other words, the expected state-transition equation at period t is E(xt|xt–1) = Atxt–1. For time-invariant state-space models, the output A is an m-by-m matrix, where m is the number of state variables. For time-varying state-space models, the output A is a series of matrices represented by a Tdimensional cell array, where A{t} contains an mt-by-mt – 1 state-transition coefficient matrix. If the number of state variables changes from period t – 1 to t, mt ≠ mt – 1. State-Disturbance-Loading Coefficient Matrix Bt The state-disturbance-loading coefficient matrix Bt is a matrix or cell vector of matrices that specifies the additive error structure of the state disturbances ut in the state-transition equation from period t – 1 to t, for all t = 1,...,T. In other words, the state-transition equation at period t is xt = Atxt–1 + Btut. For time-invariant state-space models, the output B is an m-by-k matrix, where m is the number of state variables and k is the number of state disturbances. The quantity B*(B') is the statedisturbance covariance matrix for all periods. For time-varying state-space models, B is a T-dimensional cell array, where B{t} contains an mt-by-kt state-disturbance-loading coefficient matrix. If the number of state variables or state disturbances changes at period t, the matrix dimensions between B{t-1} and B{t} vary. The quantity B{t}*(B{t}') is the state-disturbance covariance matrix for period t. Measurement-Sensitivity Coefficient Matrix Ct The measurement-sensitivity coefficient matrix Ct is a matrix or cell vector of matrices that specifies how the states xt are expected to linearly combine at period t to form the observations, yt, for all t = 1,...,T. In other words, the expected observation equation at period t is E(yt|xt) = Ctxt. For time-invariant state-space models, the output C is an n-by-m matrix, where n is the number of observation variables and m is the number of state variables. For time-varying state-space models, the output C is a T-dimensional cell array, where C{t} contains an nt-by-mt measurement-sensitivity coefficient matrix. If the number of state or observation variables changes at period t, the matrix dimensions between C{t-1} and C{t} vary. Observation-Innovation Coefficient Matrix Dt The observation-innovation coefficient matrix Dt is a matrix or cell vector of matrices that specifies the additive error structure of the observation innovations εt in the observation equation at period t, for all t = 1,...,T. In other words, the observation equation at period t is yt = Ctxt + Dtεt.
12-233
12
Functions
For time-invariant state-space models, the output D is an n-by-h matrix, where n is the number of observation variables and h is the number of observation innovations. The quantity D*(D') is the observation-innovation covariance matrix for all periods. For time-varying state-space models, the output D is a T-dimensional cell array, where D{t} contains an nt-by-ht matrix. If the number of observation variables or observation innovations changes at period t, then the matrix dimensions between D{t-1} and D{t} vary. The quantity D{t}*(D{t}') is the observation-innovation covariance matrix for period t. State Characteristics Other state characteristics include initial state moments and a description of the dynamic behavior of each state. You can optionally specify the state characteristics by including extra output arguments for ParamMap after the required coefficient matrices. • Mean0 — Initial state mean μ0, an m-by-1 numeric vector, where m is the number of states in x1. • Cov0 — Initial state covariance matrix Σ0, an m-by-m positive semidefinite matrix. • StateType — State dynamic behavior indicator, an m-by-1 numeric vector. This table summarizes the available types of initial state distributions. Value
State Dynamic Behavior Indicator
0
Stationary (for example, ARMA models)
1
The constant 1 (that is, the state is 1 with probability 1)
2
Diffuse, nonstationary (for example, random walk model, seasonal linear time series), or static state on page 12-234
For example, suppose that the state equation has two state variables: The first state variable is an AR(1) process, and the second state variable is a random walk. Specify the initial distribution types by setting StateType=[0; 2]; within the ParamMap function. Static State A static state does not change in value throughout the sample, that is, P xt + 1 = xt = 1 for all t = 1,...,T. Latent Variance Variables of t-Distributed Errors To facilitate posterior sampling, multivariate Student's t-distributed state disturbances and observation innovations are each represented as a inverse-gamma scale mixture, where the inversegamma random variable is the latent variance variable. Explicitly, suppose the m-dimensional state disturbances ut are iid multivariate t distributed with location 0, scale Im, and degrees of freedom νu. As an inverse-gamma scale mixture ut = ζtut, where: 12-234
bssm
• The latent variable ζt is inverse-gamma with shape and scale νu/2. • ut is an m-dimensional multivariate standard Gaussian random variable. Multivariate t-distributed observation innovations can be similarly decomposed. You can access ζt by writing a custom output function that returns the field for the specified error type, either StateVariance or ObservationVariance. For more details, see the OutputFunction name-value argument and Output output argument.
Tips • Load the data to the MATLAB workspace before creating the model. • Create the parameter-to-matrix mapping function and log prior distribution function each as their own file. • To specify a log χ21 distribution for the observation innovation process εt, set the ObservationDistribution to the structure array struct("Weight",weight,"Mean",mu,"Variance",sigma2), where: • weight is [0.0089 0.0541 0.1338 0.2761 0.2923 0.1494 0.0854]. • mu is [-9.3202 -5.3145 -3.4147 -1.7097 -0.4531 0.3975 1.1925]. • sigma2 is [3.2793 2.4574 1.8874 1.3121 0.8843 0.5898 0.4995].^2.
Algorithms Distribution Hyperparameters This table describes the supported distribution hyperparameters, and their values and defaults. Distribution Error Support Student's t
Hyperparameter Field Name
Multivariate Degrees of "DoF" ut (state) and freedom parameter εt (observation)
Value
Default
Positive numeric scalar, NaN, or a function handle. You must specify the value.
When you specify a structure array, you must specify "DoF". Otherwise, the default is NaN.
12-235
12
Functions
Distribution Error Support Finite Gaussian mixture
Hyperparameter Field Name
Univariate εt Weights (probability distribution) for r regimes
"Weight"
Value
Default
Length r nonnegative vector.
1
bssm normalizes the vector so that its elements sum to 1. bssm determines the number of regimes r by the number of elements of the value of "Weight".
Means for r regimes
"Mean"
Length r finite numeric row vector
zeros(1,r)
Variances for r regimes
"Variance"
Length r finite numeric row vector
ones(1,r)
"Delta"
Numeric scalar or NaN
NaN
Skew normal Univariate εt Distribution scale 2
is δ + 1 and shape is δ • bssm fixes hyperparameters to specified numeric values.
• For a NaN or a function handle, where the values are supported, bssm treats the hyperparameter as unknown. Consequently, bssm object functions estimate them by computing their posterior distribution with all other unknown parameters in Θ. The value of the hyperparameter determines its prior distribution. • For NaN, the prior is flat. • For a function handle (supported for "DoF"), the associated function represents the log prior distribution. The function has the form, where x is a numeric scalar. function logpdf = logPrior(x,...otherInputs...) ... end
For example, a valid function handle is @(x)log(normpdf(x,7,1)). State-Space Model Behaviors • For each state variable j, default values of Mean0 and Cov0 depend on StateType(j): • If StateType(j) = 0 (stationary state), bssm generates the initial value using the stationary distribution. If you provide all values in the coefficient matrices (that is, your model has no unknown parameters), bssm generates the initial values. Otherwise, the software generates the initial values during estimation. 12-236
bssm
• If StateType(j) = 1 (constant state), Mean0(j) is 1 and Cov0(j) is 0. • If StateType(j) = 2 (nonstationary or diffuse state), Mean0(j) is 0 and Cov0(j) is 1e7. • For static states on page 12-234 that do not equal 1 throughout the sample, the software cannot assign a value to the degenerate, initial state distribution. Therefore, set the StateType of static states to 2. Consequently, the software treats static states as nonstationary and assigns the static state a diffuse initial distribution. • bssm models do not store observed responses or predictor data. Supply the data wherever necessary using the appropriate input or name-value pair arguments. • DeflateY is the deflated-observation data, which accommodates a regression component in the observation equation. For example, in this function, which has a linear regression component, Y is the vector of observed responses and Z is the vector of predictor data. function [A,B,C,D,Mean0,Cov0,StateType,DeflateY] = ParamFun(theta,Y,Z) ... DeflateY = Y - params(9) - params(10)*Z; ... end
Version History Introduced in R2022a R2022b: Assume non-Gaussian, fat-tailed distributions on the state disturbance and observation innovation processes bssm enables you to assume the Student's t distribution for the conditional distribution of the state disturbance or observation innovations process. These error settings are suited when the process or measurement errors have excess kurtosis (or is fat-tailed or leptokuric). Specify the distribution of either error process by using the following name-value argument when you create a bssm object: • StateDistribution — Distribution of the state disturbance process • ObservationDistribution — Distribution of the observation innovation process
References [1] Hastings, Wilfred K. "Monte Carlo Sampling Methods Using Markov Chains and Their Applications." Biometrika 57 (April 1970): 97–109. https://doi.org/10.1093/biomet/57.1.97. [2] Metropolis, Nicholas, Rosenbluth, Arianna. W., Rosenbluth, Marshall. N., Teller, Augusta. H., and Teller, Edward. "Equation of State Calculations by Fast Computing Machines." The Journal of Chemical Physics 21 (June 1953): 1087–92. https://doi.org/10.1063/1.1699114.
See Also Objects ssm | dssm | bnlssm Functions ssm2bssm 12-237
12
Functions
Topics “What Are State-Space Models?” on page 11-3 “What Is the Kalman Filter?” on page 11-7 “Analyze Linearized DSGE Models” on page 11-190 “Perform Outlier Detection Using Bayesian Non-Gaussian State-Space Models” on page 11-211 “Fit Bayesian Stochastic Volatility Model to S&P 500 Volatility” on page 11-148
12-238
chowtest
chowtest Chow test for structural change
Syntax h = chowtest(X,y,bp) [h,pValue,stat,cValue] = chowtest(X,y,bp) StatTbl = chowtest(Tbl,bp) ___ = chowtest( ___ ,Name=Value)
Description Chow tests assess the stability of coefficients β in a multiple linear regression model of the form y = Xβ + ε. chowtest splits the data at specified break points. Coefficients are estimated in initial subsamples, then tested for compatibility with data in complementary subsamples. h = chowtest(X,y,bp) returns a vector of test decisions h from conducting Chow tests on page 12-259 on the multiple linear regression model y = Xβ + ε at the break points in bp. y is a vector of response data and X is a matrix of predictor data. Each element of bp results in a separate test. [h,pValue,stat,cValue] = chowtest(X,y,bp) additionally returns vectors of p-values pValue, test statistics stat, and critical values cValue for the tests. StatTbl = chowtest(Tbl,bp) returns the table StatTbl containing variables for the test results, statistics, and settings from conducting Chow tests on the variables of the table or timetable Tbl. Each row of StatTbl contains the results of the corresponding test. The response variable in the regression is the last table variable, and all other variables are the predictor variables. To select a different response variable for the regression, use the ResponseVariable name-value argument. To select different predictor variables, use the PredictorNames name-value argument. ___ = chowtest( ___ ,Name=Value) specifies options using one or more name-value arguments in addition to any of the input argument combinations in previous syntaxes. chowtest returns the output argument combination for the corresponding input arguments. In addition to bp, some options control the number of tests to conduct. For example, chowtest(Tbl,ResponseVariable="GDP",Test=["breakpoint" "forecast"],Intercept=false) conducts two tests for the presence of a structural break in the coefficients of the regression model of GDP on all other variables of the table Tbl without an intercept term. The first test assesses coefficient equality constraints directly, and the second test assesses forecast performance.
Examples Conduct Chow Test for Structural Change Conduct Chow tests to assess whether there are structural changes in the equation for food demand around World War II. Input the predictor series as a matrix and input the response series as a vector. 12-239
12
Functions
Load the US food consumption data set Data_Consumption.mat, which contains annual measurements from 1927 through 1962 with missing data due to the war in the matrix Data. load Data_Consumption
Suppose that you want to develop a model for consumption as determined by food prices and disposable income, and assess its stability through the economic shock through the war. Plot the series. P = Data(:,1); % Food price index I = Data(:,2); % Disposable income index Q = Data(:,3); % Food consumption index figure; plot(dates,[P I Q]) axis tight grid on xlabel("Year") ylabel("Index") legend(["Price" "Income" "Consumption"],Location="southeast")
Measurements are missing from 1942 through 1947, which correspond to World War II. Stabilize each series by applying the log transformation.
12-240
chowtest
LP = log(P); LI = log(I); LQ = log(Q);
Assume that log consumption is a linear function of the logs of food price and income. LQt = β0 + β1LIt + β2LP + εt .
εt is a Gaussian random variable with mean 0 and standard deviation σ2. Identify the indices before World War II. Plot log consumption with respect to the logs of food price and income. preWarIdx = (dates = datetime(1948,12,31),1); bp = [bp1941 bp1948]; StatTbl = chowtest(LogTT,bp) RESULTS SUMMARY *************** Test 1 Sample size: 30 Breakpoint: 15 Test type: breakpoint Coefficients tested: All
12-244
chowtest
Statistic: 5.5400 Critical value: 3.0088 P value: 0.0049 Significance level: 0.0500 Decision: Reject coefficient stability *************** Test 2 Sample size: 30 Breakpoint: 16 Test type: breakpoint Coefficients tested: All Statistic: 1.2942 Critical value: 3.0088 P value: 0.2992 Significance level: 0.0500 Decision: Fail to reject coefficient stability StatTbl=2×8 table h _____ Test 1 Test 2
true false
pValue _________
stat ______
cValue ______
0.0049125 0.29918
5.54 1.2942
3.0088 3.0088
Break Point ___________ 15 16
Alpha _____
Intercept _________
0.05 0.05
true true
StatTbl contains the decision statistics and options for each test (row). By default, chowtest selects the last table variable as the response, and selects all other variables as predictors. You can select a different variable by using the ResponseVariable name-value argument. You can choose a different set of predictor variables by using the PredictorVariables name-value argument.
Test Model of Real U.S. GNP for Structural Change Apply the Chow test to assess the stability of an explanatory model of US real gross national product (RGNP) using the end of World War II as a break point. Load the Nelson-Plosser data set Data_NelsonPlosser.mat, which contains the table of data DataTable. load Data_NelsonPlosser
The time series in the data set contain annual, macroeconomic measurements from 1860 to 1970. For more details, a list of variables, and descriptions, enter Description in the command line. 12-245
____
{'br {'br
12
Functions
Convert the table to a timetable. Focus the sample to measurements from the end of 1915 through the end of 1970. dates = datetime(dates,12,31); span = isbetween(dates,datetime(1915,12,31),datetime(1970,12,31),"closed"); TT = table2timetable(DataTable,RowTimes=dates); TT.Dates = []; TT = TT(span,:);
Consider a predictive model of the US RGNP GNPR given measurements of the industrial production index IPI, total employment E, and real wages WR. Plot the series in the model. prednames = ["IPI" "E" "WR"]; tiledlayout(2,2) for j = ["GNPR" prednames] nexttile plot(TT.Time,TT{:,j}) ylabel(j) end
To address exponential growth, apply the log transform to the series. LogTT = varfun(@log,TT);
12-246
chowtest
LogTT is a timetable containing the transformed variables in TT, but with names prepended with log_. Select the index corresponding to the end of World War II, September 2, 1945. bp = find(LogTT.Time > datetime(1945,9,2),1);
Assume that an appropriate multiple regression model to describe real GNP is log(GNPRt) = β0 + β1log(IPIt) + β2log(Et) + β3log(WRt) . Conduct a break point test to assess whether all regression coefficients are stable. Use the end of WWII as a break point. Print a test summary to the command line. lprednames = "log_" + prednames; StatTbl = chowtest(LogTT,bp,ResponseVariable="log_GNPR", ... PredictorVariables=lprednames,Display="summary") RESULTS SUMMARY *************** Test 1 Sample size: 56 Breakpoint: 31 Test type: breakpoint Coefficients tested: All Statistic: 4.0978 Critical value: 2.5652 P value: 0.0062 Significance level: 0.0500 Decision: Reject coefficient stability StatTbl=1×8 table h _____ Test 1
true
pValue _________
stat ______
cValue ______
0.0061633
4.0978
2.5652
Break Point ___________ 31
Alpha _____ 0.05
Intercept _________ true
StatTbl contains decision statistics and test options for the test. StatTbl.h = 1 and StatTbl.pValue < 0.01 indicate string evidence to reject the null hypothesis that the regression coefficients before and after WWII are equivalent.
Assess Stability of Subsets of Regression Coefficients Conduct a Chow test to assess the stability of a subset of regression coefficients. This example expands on “Conduct Chow Test for Structural Change” on page 12-239. Load the US food consumption data set. Convert the table to a timetable, and remove rows containing missing values. 12-247
____
{'br
12
Functions
load Data_Consumption.mat dates = datetime(dates,12,31); TT = table2timetable(DataTable,RowTimes=dates); TT.Row = []; TT = rmmissing(TT);
Apply the log transformation to each series. LogTT = varfun(@log,DataTable);
Identify the indices before World War II. preWarIdx = dates 1 displays results in the command window.
The value of Display applies to all tests. Example: Display="off" Data Types: char | string ResponseVariable — Variable in Tbl to use for response first variable in Tbl (default) | string vector | cell vector of character vectors | vector of integers | logical vector Variable in Tbl to use for response, specified as a string vector or cell vector of character vectors containing variable names in Tbl.Properties.VariableNames, or an integer or logical vector representing the indices of names. The selected variables must be numeric. chowtest uses the same specified response variable for all tests. Example: ResponseVariable="GDP" Example: ResponseVariable=[true false false false] or ResponseVariable=1 selects the first table variable as the response. Data Types: double | logical | char | cell | string PredictorVariables — Variables in Tbl to use for the predictors string vector | cell vector of character vectors | vector of integers | logical vector Variables in Tbl to use for the predictors, specified as a string vector or cell vector of character vectors containing variable names in Tbl.Properties.VariableNames, or an integer or logical vector representing the indices of names. The selected variables must be numeric. chowtest uses the same specified predictors for all tests. By default, chowtest uses all variables in Tbl that are not specified by the ResponseVariable name-value argument. Example: PredictorVariables=["UN" "CPI"] Example: PredictorVariables=[false true true false] or DataVariables=[2 3] selects the second and third table variables. Data Types: double | logical | char | cell | string 12-257
12
Functions
Note • When chowtest conducts multiple tests, the function applies all single settings (scalars or character vectors) to each test. • All vector-valued specifications that control the number of tests must have equal length. Vector values and Coeffs arrays must share a common dimension, equal to numTests. • If you specify X and y, and bp, Intercept, Test, or Alpha are row vectors, chowtest returns row vectors.
Output Arguments h — Test rejection decisions logical scalar | logical vector Test rejection decisions, returned as a logical scalar or vector with length equal to the number of tests numTests. chowtest returns h when you supply the inputs X and y. The Chow test on page 12-259 has the following hypotheses are: • H0: Regression coefficients β, selected by the Coeffs name-value argument, are identical across subsamples. • H1: At least one regression coefficient in β, selected by the Coeffs name-value argument, exhibits significant change across subsamples. Elements of h have the following values and meanings. • Values of 1 indicate rejection of the null hypothesis that the regression coefficients β, selected by Coeffs, are identical across subsamples model, in favor of the alternative hypothesis. • Values of 0 indicate failure to reject the null hypothesis. pValue — Test statistic p-values numeric scalar | numeric vector Test statistic p-values, returned as a numeric scalar or vector with length equal to the number of tests numTests. chowtest returns pValue when you supply the inputs X and y. stat — Test statistics numeric scalar | numeric vector Test statistics, returned as a numeric scalar or vector with length equal to the number of tests numTests. chowtest returns stat when you supply the inputs X and y. For details, see “Chow Tests” on page 12-259. cValue — Critical values numeric scalar | numeric vector Critical values, returned as a numeric scalar or vector with length equal to the number of tests numTests. chowtest returns cValue when you supply the inputs X and y. Alpha determines the critical values. 12-258
chowtest
StatTbl — Test summary table Test summary, returned as a table with variables for the outputs h, pValue, stat, and cValue, and with a row for each test. chowtest returns StatTbl when you supply the input Tbl. StatTbl contains variables for the break point Break Point and test settings specified by Alpha, Intercept, and Test.
More About Chow Tests Chow tests assess the stability of the coefficients β in a multiple linear regression model of the form y = Xβ + ε. chowtest supports the two variations of the Chow test introduced in [1]: the break point and forecast tests. The break point test is a standard F test from the analysis of covariance. The forecast test makes use of the standard theory of prediction intervals. Chow’s contribution is to place both tests within the general linear hypothesis framework, and then to develop appropriate test statistics for testing subsets of coefficients (see Coeffs). For test-statistic formulae, see [1].
Tips • Chow tests assume continuity of the innovations variance across structural changes. Heteroscedasticity can distort the size and power of the test. Therefore, verify the innovationsvariance-continuity assumption holds before using the test results for inference. • If both subsamples contain more than numCoeffs observations, you can conduct a forecast test instead of a break point test (see Test). However, the forecast test might have lower power relative to the break point test [1]. Nevertheless, Wilson (1978) suggests conducting the forecast test in the presence of unknown specification errors [4]. • You can apply the forecast test to cases where both subsamples have size greater than numCoeffs, where you would typically apply a breakpoint test. In such cases, the forecast test might have significantly reduced power relative to a break point test [1]. Nevertheless, Wilson (1978) suggests use of the forecast test in the presence of unknown specification errors [4]. • The forecast test is based on the unbiased predictions, with zero mean error, that result from stable coefficients. However, zero mean forecast error does not, in general, guarantee coefficient stability. Therefore, forecast tests are most effective in checking for structural breaks, rather than model continuity [3]. • To obtain diagnostic statistics for each subsample, such as regression coefficient estimates, their standard errors, error sums of squares, and so on, pass the appropriate data to fitlm. For details on working with LinearModel model objects, see “Multiple Linear Regression”.
Version History Introduced in R2015b R2022a: chowtest returns a results table when you supply a table of data
12-259
12
Functions
If you supply a table of time series data Tbl, chowtest returns a table containing variables for the test rejection decisions h, p-values pValue, tests statistics stat, and critical values cValue, with rows corresponding to separate tests. Before R2022a, chowtest returned the numeric outputs in separate positions of the output when you supplied a table of input data. Starting in R2022a, if you supply a table of input data, update your code to return all outputs in the first output position. StatTbl = chowtest(Tbl,bp,Name=Value)
If you request more outputs, chowtest issues an error. Also, access results by using table indexing. For more details, see “Access Data in Tables”.
References [1] Chow, G. C. "Tests of Equality Between Sets of Coefficients in Two Linear Regressions." Econometrica. Vol. 28, 1960, pp. 591–605. [2] Fisher, F. M. "Tests of Equality Between Sets of Coefficients in Two Linear Regressions: An Expository Note." Econometrica. Vol. 38, 1970, pp. 361–66. [3] Rea, J. D. "Indeterminacy of the Chow Test When the Number of Observations is Insufficient." Econometrica. Vol. 46, 1978, p. 229. [4] Wilson, A. L. "When is the Chow Test UMP?" The American Statistician. Vol. 32, 1978, pp. 66–68.
See Also fitlm | LinearModel | cusumtest | recreg Topics “Check Model Assumptions for Chow Test” on page 3-103 “Power of the Chow Test” on page 3-109
12-260
cffilter
cffilter Christiano-Fitzgerald filter for trend and cyclical components
Syntax [Trend,Cyclical] = cffilter(Y) [TTbl,CTbl] = cffilter(Tbl) [ ___ ] = cffilter( ___ ,Name=Value) cffilter( ___ ) cffilter(ax, ___ ) [ ___ ,h] = cffilter( ___ )
Description Separate one or more time series into additive trend and cyclical components by applying the Christiano-Fitzgerald filter on page 12-271 [2]. cffilter optionally plots the series and trend component, with cycles removed. In addition to the Christiano-Fitzgerald filter, Econometrics Toolbox supports the Baxter-King (bkfilter), Hamilton (hfilter), and Hodrick-Prescott (hpfilter) filters. [Trend,Cyclical] = cffilter(Y) returns the additive trend Trend and cyclical Cycilcal components from applying the Christiano-Fitzgerald filter on page 12-271 to each variable (column) of the input matrix of time series data Y, using the definition of a business cycle in [1] for quarterly data. [TTbl,CTbl] = cffilter(Tbl) returns the tables or timetables TTbl and CTbl containing variables for the trend and cyclical components, respectively, from applying the Christiano-Fitzgerald filter to each variable in the input table or timetable Tbl. To select different variables in Tbl to filter, use the DataVariables name-value argument. [ ___ ] = cffilter( ___ ,Name=Value) specifies options using one or more name-value arguments in addition to any of the input argument combinations in previous syntaxes. cffilter returns the output argument combination for the corresponding input arguments. For example, cffilter(Tbl,Symmetric=true,Drift=[false false true],DataVariables=1:3) applies the symmetric Christiano-Fitzgerald filter to the first three variables in the input table Tbl, and removes the linear drift term from the third variable before applying the filter. cffilter( ___ ) plots time series variables in the input data and their respective smoothed trend components (cycles removed), computed by the Christiano-Fitzgerald filter, on the same axes. cffilter(ax, ___ ) plots on the axes specified by ax instead of the current axes (gca). ax can precede any of the input argument combinations in the previous syntaxes. [ ___ ,h] = cffilter( ___ ) plots the specified series and their trend components, and additionally returns handles to plotted graphics objects. Use elements of h to modify properties of the plot after you create it. 12-261
12
Functions
Examples Apply Christiano-Fitzgerald Filter to Matrix of Data Plot the cyclical component of the US post-WWII, seasonally adjusted, quarterly, real gross national product (GNPR). load Data_GNP GNPR = Data(:,2); [trend,cyclical] = cffilter(GNPR); T = numel(trend) T = 235
trend and cyclical are 235-by-1 vectors containing the trend and cyclical components, respectively, resulting from applying the asymmetric Christiano-Fitzgerald filter to the series with default upper and lower cutoffs. plot(dates,cyclical) axis tight ylabel("Real GNP Cyclical Component")
12-262
cffilter
Apply Christiano-Fitzgerald Filter to Table Variables Apply the Christiano-Fitzgerald filter to all variables in input table variables. Load the Schwert stock data set Data_SchwertStock.mat, which contains monthly returns of the NYSE index from 1871 through 2008 in DataTimeTableMth, among three other variables (for details, enter Description). Remove all missing observations from all series. load Data_SchwertStock TTM = rmmissing(DataTimeTableMth);
Aggregate the monthly data in the timetable to quarterly measurements. TTQ = convert2quarterly(TTM);
Apply the asymmetric Christiano-Fitzgerald filter to all variables in the quarterly timetable. Use the default cutoffs. [TQTT,CQTT] = cffilter(TTQ); size(TQTT) ans = 1×2 220
4
TQTT and CQTT are 220-by-4 timetables containing the trend and cyclical components, respectively, of the series in TTQ. Variables in the input and output timetables correspond. By default, cffilter filters all variables in the input table or timetable. To select a subset of variables, set the DataVariables option. To compare outputs between different tabular inputs, apply the Christiano-Fitzgerald filter to all variables in the table of monthly data DataTableMth and the timetable of monthly data TTM. % Table input of daily data DTM = rmmissing(DataTableMth); [TMDT,CMDT] = cffilter(DTM); size(TMDT) ans = 1×2 656
4
tail(TMDT)
May1925 Jun1925 Jul1925 Aug1925 Sep1925 Oct1925 Nov1925 Dec1925
Return __________
DivYld _________
CapGain __________
CapGainA __________
0.082825 0.005878 0.014371 0.046424 0.022868 0.079477 0.00055416 0.050338
0.0030391 0.0057145 0.0054548 0.0033349 0.0064024 0.0058413 0.003497 0.0068429
0.079846 0.00023547 0.0089194 0.043012 0.016375 0.073612 -0.0028871 0.043564
0.079786 0.00016356 0.0089158 0.04309 0.016465 0.073636 -0.0029429 0.043495
tail(CMDT)
12-263
12
Functions
DivYld ___________
CapGain __________
CapGainA ___________
-0.016639 -0.0026377 0.0044847 0.0017835 -0.0060897 -0.011281 -0.0090187 -0.000747
-0.00055599 -0.00025619 -0.00024249 -0.00062296 -0.0011003 -0.0012017 -0.00068252 0.00024995
-0.016143 -0.0024535 0.0047236 0.0024837 -0.0048985 -0.010055 -0.0083919 -0.0010665
-0.016083 -0.0023816 0.0047272 0.0024064 -0.0049894 -0.010079 -0.0083361 -0.00099695
May1925 Jun1925 Jul1925 Aug1925 Sep1925 Oct1925 Nov1925 Dec1925
Return __________
% Timetable input of daily data [TMTT,CMTT] = cffilter(TTM); size(TMTT) ans = 1×2 656
4
tail(TMTT) Time ___________
Return __________
DivYld _________
CapGain __________
CapGainA __________
01-May-1925 01-Jun-1925 01-Jul-1925 01-Aug-1925 01-Sep-1925 01-Oct-1925 01-Nov-1925 01-Dec-1925
0.082825 0.005878 0.014371 0.046424 0.022868 0.079477 0.00055416 0.050338
0.0030391 0.0057145 0.0054548 0.0033349 0.0064024 0.0058413 0.003497 0.0068429
0.079846 0.00023547 0.0089194 0.043012 0.016375 0.073612 -0.0028871 0.043564
0.079786 0.00016356 0.0089158 0.04309 0.016465 0.073636 -0.0029429 0.043495
Time ___________
Return __________
DivYld ___________
CapGain __________
CapGainA ___________
01-May-1925 01-Jun-1925 01-Jul-1925 01-Aug-1925 01-Sep-1925 01-Oct-1925 01-Nov-1925 01-Dec-1925
-0.016639 -0.0026377 0.0044847 0.0017835 -0.0060897 -0.011281 -0.0090187 -0.000747
-0.00055599 -0.00025619 -0.00024249 -0.00062296 -0.0011003 -0.0012017 -0.00068252 0.00024995
-0.016143 -0.0024535 0.0047236 0.0024837 -0.0048985 -0.010055 -0.0083919 -0.0010665
-0.016083 -0.0023816 0.0047272 0.0024064 -0.0049894 -0.010079 -0.0083361 -0.00099695
tail(CMTT)
Because the data is disaggregated, the outputs of the daily data have more rows than from the quarterly data. The filter results of the daily inputs are equal among the corresponding outputs, but cffilter returns tables of results, instead of timetables, when you supply data in a table.
12-264
cffilter
Set Filter Parameters for Data Periodicity Load the Nelson-Plosser macroeconomic data set Data_NelsonPlosser.mat, which contains series measured yearly in the timetable DataTimeTable. load Data_NelsonPlosser
Apply the asymmetric Christiano-Fitzgerald filter to the real and nominal GNP series, GNPR and GNPN, respectively. Filter out cyclical component frequencies outside the interval [2,8]. Plot the trend component with each series. figure cffilter(DataTimeTable,DataVariables=["GNPR" "GNPN"], ... LowerCutoff=2,UpperCutoff=8);
Compare the results of the asymmetric filter with the symmetric filter. in addition to cyclical component cutoffs of 2 and 8, and set the lag length of the symmetric moving average to 3 years. figure cffilter(DataTimeTable,DataVariables=["GNPR" "GNPN"], ... LowerCutoff=2,UpperCutoff=8,FilterType="symmetric",LagLength=3);
12-265
12
Functions
Unlike the asymmetric filter, the first and last LagLength=3 values of the returned components of the symmetric filter are NaN-valued. Experiment with filter parameter values by adjusting the interactive controls. Because this setup always sets the lag length of the symmetric moving average, cffilter implements the symmetric method only. varnames = string(DataTimeTable.Properties.VariableNames); lc =
; % LowerCutoff
uc =
; % UpperCutoff ; % LagLength
q = tfs =
; % Stationarity
tfd =
; % Drift
vn =
; % DataVariables
figure [TTbl,CTbl,h] = cffilter(DataTimeTable,DataVariables=vn, ... LowerCutoff=lc,UpperCutoff=uc,LagLength=q,Stationarity=tfs, ... Drift=tfd);
12-266
cffilter
Input Arguments Y — Time series data numeric matrix Time series data, specified as a numObs-by-numVars numeric matrix. Each column of Y corresponds to a variable, and each row corresponds to an observation. Data Types: double Tbl — Time series data table | timetable Time series data, specified as a table or timetable with numObs rows. Each row of Tbl is an observation. Specify numVars variables to filter by using the DataVariables argument. The selected variables must be numeric. ax — Axes on which to plot Axes object Axes on which to plot, specified as an Axes object. By default, cffilter plots to the current axes (gca). 12-267
12
Functions
Note cffilter removes, from the specified data, all rows containing at least one missing observation, represented by a NaN value. Name-Value Pair Arguments Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter. Before R2021a, use commas to separate each name and value, and enclose Name in quotes. Example: cffilter(Tbl,Symmetric=true,Drift=[false false true],DataVariables=1:3) applies the symmetric Christiano-Fitzgerald filter to the first three variables in the input table Tbl, and removes the linear drift term from the third variable before applying the filter. LowerCutoff — Lower cutoff period for cyclical component 6 (default) | numeric scalar | numeric vector Lower cutoff period for the cyclical component, in units of data periodicity, specified as a numeric scalar or a length numVars vector with elements greater than or equal to 2. For a scalar, cffilter uses LowerCutoff for all input series. For a vector, cffilter applies LowerCutoff(j) to selected j in the input data. The default is 6, meaning 6 quarters (18 months), which is based on the definition of business cycle in [1]. For more details, see “Tips” on page 12-271 Example: LowerCutoff=[4 6] applies a lower cutoff of 4 to the first series in the input data and a lower cutoff of 6 to the second series. Data Types: double UpperCutoff — Upper cutoff period for cyclical component 32 (default) | numeric scalar | numeric vector Upper cutoff period for the cyclical component, in units of data periodicity, specified as a numeric scalar or a length numVars vector with elements greater than or equal to the corresponding LowerCutoff for each series. For a scalar, cffilter uses UpperCutoff for all input series. For a vector, cffilter applies UpperCutoff(j) to series j in the input data. The default is 32, meaning 32 quarters (8 years), which is based on the definition of business cycle in [1]. For more details, see “Tips” on page 12-271 Example: UpperCutoff=[32 36] applies an upper cutoff of 32 to the first series in the input data and an upper cutoff of 36 to the second series. 12-268
cffilter
Data Types: double FilterType — Finite-sample approximation of ideal bandpass filter "asymmetric" (default) | "symmetric" | character vector Finite-sample approximation of ideal bandpass filter, specified as a value in this table. Value
Description
"asymmetric"
Asymmetric, time-varying moving average
"symmetric"
Symmetric, time-invariant moving average
If you set LagLength, cffilter sets FilterType to "symmetric". Example: FilterType="symmetric" Data Types: char | string LagLength — Number of consecutive lags of moving average for the symmetric filter 12 (default) | positive integer | vector of positive integers Number of consecutive lags of the moving average for the symmetric filter (see FilterType), specified as a positive integer less than (numObs−1)/2 or vector of such positive integers. For a scalar, cffilter applies LagLength to all input series. For a vector, cffilter applies LagLength(j) to series j in the input data. If you specify LagLength, cffilter sets FilterType to "symmetric". The default is 12, suggested in [2] for quarterly data. For more details, see “Tips” on page 12-271 Example: LagLength=[4 8] sets FilterType to "symmetric" and specifies 4 lags for the symmetric moving average of the first series in the input data and 8 for the symmetric moving average of the second series. Data Types: double Stationarity — Flag indicating whether input series is stationary false (default) | true | logical vector Flag indicating whether input series is stationary, specified as a value or a vector of values in this table. Value
Description
false
Input series is a nonstationary random walk.
true
Input series is stationary; cffilter adjusts the filter to use ideal weights at series endpoints.
For more details on the alternatives, see [2]. For a scalar, cffilter applies Stationarity to all input series. For a vector, cffilter applies Stationarity(j) to series j in the input data. 12-269
12
Functions
Example: Stationarity=[true false] specifies that the first input series is stationary and the second input series is nonstationary. Data Types: logical Drift — Flag indicating drift in input series false (default) | true Flag indicating drift in the input series, specified as a value in this table. Value
Description
false
cffilter does not linear drift from the input series.
true
cffilter removes linear drift from the input series before it applies the filter.
For a scalar, cffilter applies Drift to all input series. For a vector, cffilter applies Drift(j) to series j in the input data. Example: Drift=[false true] removes the linear drift from only the second input series. Data Types: logical DataVariables — Variables in Tbl all variables (default) | string vector | cell vector of character vectors | vector of integers | logical vector Variables in Tbl that cffilter filters, specified as a string vector or cell vector of character vectors containing variable names in Tbl.Properties.VariableNames, or an integer or logical vector representing the indices of names. The selected variables must be numeric. Example: DataVariables=["GDP" "CPI"] Example: DataVariables=[true true false false] or DataVariables=[1 2] selects the first and second table variables. Data Types: double | logical | char | cell | string
Output Arguments Trend — Smoothed trend component τt numeric vector | numeric matrix Smoothed trend component τt of each series in the data, returned as a numObs-by-numVars numeric matrix. cffilter returns Trend when you supply the input Y. When cffilter implements the symmetric filter (see FilterType) for a series, the first and last LagLength values are NaN for the corresponding column in Trend. Cyclical — Cyclical component ct numeric vector | numeric matrix Cyclical component ct of each series in the data, returned as a numObs-by-numVars numeric matrix. cffilter returns Cyclical when you supply the input Y. 12-270
cffilter
When cffilter implements the symmetric filter (see FilterType) for a series, the first and last LagLength values are NaN for the corresponding column in Cyclical. TTbl — Smoothed trend component τt table | timetable Smoothed trend component τt of each specified series, returned as a numObs-by-numVars table or timetable, the same data type as Tbl. cffilter returns TTbl when you supply the input Tbl. For each selected series and corresponding LagLength, when cffilter implements the symmetric filter (see FilterType) for a series, the first and last LagLength values are NaN in the corresponding variable in TTbl. CTbl — Cyclical component ct table | timetable Cyclical component ct of each specified series, returned as a numObs-by-numVars table or timetable, the same data type as Tbl. cffilter returns CTbl when you supply the input Tbl. For each selected series and corresponding LagLength, when cffilter implements the symmetric filter (see FilterType) for a series, the first and last LagLength values are NaN in the corresponding variable in CTbl. h — Handles to plotted graphics objects vector of graphics objects Handles to plotted graphics objects, returned as a vector of graphics objects. cffilter plots the data and trend only when you return no outputs or you return h. h contains unique plot identifiers, which you can use to query or modify properties of the plot.
More About Christiano-Fitzgerald Filter The Christiano-Fitzgerald filter separates a time series yt into a trend component τt (Trend and TTbl) and cyclical component ct (Cyclical and CTbl) such that yt = τt + ct. Depending on the value of the value of the FilterType name-value argument, cffilter implements either an asymmetric, time-varying, or a symmetric, time-invariant moving average as a finite-sample approximation of an ideal bandpass filter [2].
Tips The definition of a business cycle in [1] suggests values in the table for the cutoff periods LowerCutoff and UpperCutoff, and lag length LagLength that depend on the periodicity of the data. Periodicity
LowerCutoff
UpperCutoff
LagLength
Yearly
2
8
3
Quarterly
6
32
12
12-271
12
Functions
Periodicity
LowerCutoff
UpperCutoff
LagLength
Monthly
18
96
36
In practice, use vectors of cutoff periods and lag lengths to test alternatives. Use the plot produced by cffilter to compare results among settings.
Version History Introduced in R2023a
References [1] Burns, Arthur F., and Wesley C. Mitchell. Measuring Business Cycles. Cambridge, MA: National Bureau of Economic Research, 1946. [2] Christiano, Lawrence J., and Terry J. Fitzgerald. "The Band Pass Filter." International Economic Review 44 (May 2003): 435–65. https://doi.org/10.1111/1468-2354.t01-1-00076.
See Also hpfilter | bkfilter | hfilter Topics “Choose Time Series Filter for Business Cycle Analysis” on page 2-40 When to Use the Hodrick-Prescott Filter
12-272
classify
classify Classify Markov chain states
Syntax bins = classify(mc) [bins,ClassStates,ClassRecurrence,ClassPeriod] = classify(mc)
Description bins = classify(mc) partitions states of the discrete-time Markov chain mc into disjoint communicating classes on page 12-278 and returns the class labels bins identifying the communicating class to which each state belongs. [bins,ClassStates,ClassRecurrence,ClassPeriod] = classify(mc) additionally returns the states in each class (ClassStates), whether the classes are recurrent (ClassRecurrence), and class periods (ClassPeriod).
Examples Identify Communicating Classes of Markov Chain Consider this theoretical, right-stochastic transition matrix of a stochastic process. 0.5 0.5 P= 0 0
0.5 0.4 0 0
0 0.1 0 1
0 0 . 1 0
Create the Markov chain that is characterized by the transition matrix P. P = [0.5 0.5 0 0; 0.5 0.4 0.1 0; 0 0 0 1; 0 0 1 0]; mc = dtmc(P);
Plot a directed graph of the Markov chain. Visually identify the communicating class to which each state belongs by using node colors. figure; graphplot(mc,'ColorNodes',true);
12-273
12
Functions
States 3 and 4 belong to a communicating class with period 2. States 1 and 2 are transient. Programmatically identify to which communicating classes the states belong. bins = classify(mc) bins = 1×4 1
1
2
2
bins is a 1-by-4 vector of communicating class labels. Elements of bins correspond to the states in mc.StateNames. For example, bins(3) = 2 indicates that state 3 is in communicating class 2.
Determine Class Structure of Markov Chain Identify the communicating classes of a Markov chain. Then, determine whether the classes are recurrent and their periodicity. Generate a random seven-state Markov chain. Specify that 40 random elements in the transition matrix should be zero. rng(1); % For reproducibility mc = mcmix(7,'Zeros',40);
12-274
classify
Plot a directed graph of the Markov chain. Visually identify the communicating class to which each state belongs by using node colors. figure; graphplot(mc,'ColorNodes',true)
Identify the communicating classes in mc, and then determine: • The communicating class to which each state belongs • Whether each communicating class is recurrent • The period of each class [bins,ClassStates,ClassRecurrence,ClassPeriod] = classify(mc) bins = 1×7 6
4
6
3
2
ClassStates=1×6 cell array {["7"]} {["5"]} {["4"]}
5
1
{["2"]}
{["6"]}
{["1"
"3"]}
ClassRecurrence = 1x6 logical array 0
0
0
0
0
1
12-275
12
Functions
ClassPeriod = 1×6 1
1
1
1
1
2
mc has seven classes. Each state is its own communicating class, except states 1 and 3, which together compose class 6. Class 6 is the only recurrent class; classes 1 through 5 are transient. Class 6 has period 2; all other classes are aperiodic. Extract Recurrent Subchain of Markov Chain Identify the communicating classes of a Markov chain. Then, extract any recurrent subchains from the Markov chain. Generate a random seven-state Markov chain. Specify that 40 random elements in the transition matrix should be zero. rng(1); % For reproducibility mc = mcmix(7,'Zeros',40);
Identify all communicating classes in the Markov chain and whether the classes are recurrent. [bins,~,ClassRecurrence] = classify(mc); recurrentClass = find(ClassRecurrence,1) recurrentClass = 6 recurrentState = find((bins == recurrentClass)) recurrentState = 1×2 1
3
Class 6, composed of states 1 and 3, is the only recurrent class in mc. Create a subchain from class 6 by specifying at least one state in the subchain. Plot a digraph of the subchain. sc = subchain(mc,recurrentState); figure; graphplot(sc,'ColorNodes',true);
12-276
classify
Input Arguments mc — Discrete-time Markov chain dtmc object Discrete-time Markov chain with NumStates states and transition matrix P, specified as a dtmc object. P must be fully specified (no NaN entries).
Output Arguments bins — Communicating class membership labels numeric vector Communicating class on page 12-278 membership labels for each state, returned as a numeric vector of NumStates length. bins(j) is the label for the communicating class to which state j belongs. Bin values range from 1 through NumClasses. ClassStates — State names in each class cell vector of string vectors State names in each class, returned as a cell vector of length NumClasses containing string vectors. ClassNames{j} is the list of state names in class j. State names are specified in mc.StateNames.
12-277
12
Functions
ClassRecurrence — Class recurrence flags logical vector Class recurrence flags, returned as a logical vector of NumClasses length. ClassRecurrence(j) indicates whether class j is recurrent (true) or transient (false). ClassPeriod — Class periods numeric vector Class periods, returned as a numeric vector of length NumClasses. ClassPeriod(j) is the period of class j. Aperiodic classes have period 1. Note The order of classes in ClassStates, ClassRecurrence, and ClassPeriod corresponds to the class numbers assigned in bins.
More About Communicating Class Communicating classes of a Markov chain are the equivalence classes formed under the relation of mutual reachability. That is, two states are in the same class if and only if each is reachable from the other with nonzero probability in a finite number of steps. Communicating classes are equivalent to strongly connected components in the associated digraph [2]. See graphplot. Irreducible Chain An irreducible chain is a Markov chain consisting of a single communicating class on page 12-278. Unichain A unichain is a Markov chain consisting of a single recurrent class and any transient classes that transition to the recurrent class.
Algorithms • classify determines recurrence and transience from the outdegree of the supernode associated with each communicating class in the condensed digraph [1]. An outdegree of 0 corresponds to recurrence; an outdegree that is greater than 0 corresponds to transience. See graphplot. • classify determines periodicity using a breadth-first search of cycles in the associated digraph, as in [3]. Class period is the greatest common divisor of the lengths of all cycles originating at any state in the class.
Version History Introduced in R2017b
12-278
classify
References [1] Gallager, R.G. Stochastic Processes: Theory for Applications. Cambridge, UK: Cambridge University Press, 2013. [2] Horn, R., and C. R. Johnson. Matrix Analysis. Cambridge, UK: Cambridge University Press, 1985. [3] Jarvis, J. P., and D. R. Shier. "Graph-Theoretic Analysis of Finite Markov Chains." In Applied Mathematical Modeling: A Multidisciplinary Approach. Boca Raton: CRC Press, 2000.
See Also Objects digraph Functions graphplot | subchain | isreducible | isergodic Topics “Markov Chain Modeling” on page 10-8 “Create and Modify Markov Chain Model Objects” on page 10-17 “Visualize Markov Chain Structure and Evolution” on page 10-27 “Determine Asymptotic Behavior of Markov Chain” on page 10-39 “Identify Classes in Markov Chain” on page 10-47
12-279
12
Functions
collintest Belsley collinearity diagnostics
Syntax [sValue,condIdx,VarDecomp] = collintest(X) VarDecompTbl = collintest(Tbl) [ ___ ] = collintest( ___ ,Name=Value) collintest(ax,Plot="on", ___ ) [ ___ ,h] = collintest( ___ ,Plot="on")
Description [sValue,condIdx,VarDecomp] = collintest(X) displays, at the command window, Belsley collinearity diagnostics on page 12-290 for assessing the strength and sources of collinearity among variables in the matrix of time series data X. The function also returns the singular values on page 12290 in decreasing order sValue, condition indices on page 12-290 condIdx, and variance decomposition proportions on page 12-290 VarDecomp. VarDecompTbl = collintest(Tbl) displays the Belsley collinearity diagnostics on all the variables of the table or timetable Tbl. The function also returns the table VarDecompTbl containing variables for the singular values and condition indices, and variables for the variance-decomposition proportions associated with each time series. To select a subset of variables in Tbl, for which to compute collinearity diagnostics, use the DataVariables name-value argument. [ ___ ] = collintest( ___ ,Name=Value) specifies options using one or more name-value arguments in addition to any of the input argument combinations in previous syntaxes. collintest returns the output argument combination for the corresponding input arguments. For example, collintest(Tbl,Plot="on",Display="off",DataVariables=1:5) plots the Belslely collinearity diagnostics for the first 5 variables of the table Tbl to a figure instead of the command window. collintest(ax,Plot="on", ___ ) plots on the axes specified by ax instead of the current axes (gca). ax can precede any of the input argument combinations in the previous syntaxes. [ ___ ,h] = collintest( ___ ,Plot="on") plots the diagnostics of the input series and additionally returns handles to plotted graphics objects h. Use elements of h to modify properties of the plot after you create it.
Examples Compute Belsley Collinearity Diagnostics on Matrix of Data Display collinearity diagnostics for multiple time series using the default options of collintest. Input the time series data as a numeric matrix. 12-280
collintest
Load data of Canadian inflation and interest rates Data_Canada.mat, which contains the series in the matrix Data. load Data_Canada
Display the Belsley collinearity diagnostics at the command window. Return the singular values, condition indices, and variance decomposition proportions. series' ans = 5x1 cell {'(INF_C) Inflation rate (CPI-based)' } {'(INF_G) Inflation rate (GDP deflator-based)'} {'(INT_S) Interest rate (short-term)' } {'(INT_M) Interest rate (medium-term)' } {'(INT_L) Interest rate (long-term)' } [sValue,condIdx,VarDecomp] = collintest(Data); Variance Decomposition sValue condIdx Var1 Var2 Var3 Var4 Var5 --------------------------------------------------------2.1748 1 0.0012 0.0018 0.0003 0.0000 0.0001 0.4789 4.5413 0.0261 0.0806 0.0035 0.0006 0.0012 0.1602 13.5795 0.3386 0.3802 0.0811 0.0011 0.0137 0.1211 17.9617 0.6138 0.5276 0.1918 0.0004 0.0193 0.0248 87.8245 0.0202 0.0099 0.7233 0.9979 0.9658
Only the last row in the display has a condition index larger than the default tolerance, 30. In this row, the last three variables (in the last three columns) have variance-decomposition proportions exceeding the default tolerance, 0.5. These results suggest that the short-, medium-, and long-term interest rates exhibit multicollinearity. collintest organizes the outputs in the display table. sValue sValue = 5×1 2.1748 0.4789 0.1602 0.1211 0.0248 condIdx condIdx = 5×1 1.0000 4.5413 13.5795 17.9617 87.8245
12-281
12
Functions
VarDecomp VarDecomp = 5×5 0.0012 0.0261 0.3386 0.6138 0.0202
0.0018 0.0806 0.3802 0.5276 0.0099
0.0003 0.0035 0.0811 0.1918 0.7233
0.0000 0.0006 0.0011 0.0004 0.9979
0.0001 0.0012 0.0137 0.0193 0.9658
Compute Belsley Collinearity Diagnostics on Table Variables Display and return collinearity diagnostics for multiple time series, which are variables in a table, using default options. Load data of Canadian inflation and interest rates Data_Canada.mat. Convert the table DataTable to a timetable. load Data_Canada dates = datetime(dates,ConvertFrom="datenum"); TT = table2timetable(DataTable,RowTimes=dates); TT.Observations = [];
Display the Belsley collinearity diagnostics, using all default options. VarDecompTbl = collintest(TT) Variance Decomposition sValue condIdx INF_C INF_G INT_S INT_M INT_L --------------------------------------------------------2.1748 1 0.0012 0.0018 0.0003 0.0000 0.0001 0.4789 4.5413 0.0261 0.0806 0.0035 0.0006 0.0012 0.1602 13.5795 0.3386 0.3802 0.0811 0.0011 0.0137 0.1211 17.9617 0.6138 0.5276 0.1918 0.0004 0.0193 0.0248 87.8245 0.0202 0.0099 0.7233 0.9979 0.9658 VarDecompTbl=5×7 table sValue condIdx ________ _______ 2.1748 0.47889 0.16015 0.12108 0.024763
1 4.5413 13.579 17.962 87.825
INF_C _________
INF_G _________
INT_S __________
INT_M __________
INT_L __________
0.0012446 0.0261 0.33864 0.61384 0.020173
0.0017784 0.080594 0.38021 0.52756 0.0098575
0.00033202 0.0034869 0.081126 0.19176 0.72329
4.2326e-05 0.00057749 0.0011166 0.00035545 0.99791
8.0328e-05 0.001159 0.013662 0.019308 0.96579
collintest returns collinearity diagnostics in the table VarDecompTbl, where variables correspond to the singular values, condition indices, and variance-decomposition proportions of each variable in the data (sValue, condIdx, and VarDecomp). The command window display and output table have a similar form. By default, collintest computes collinearity diagnostics for all variables in the input table. To select a subset of variables from an input table, set the DataVariables option. 12-282
collintest
Extract the variance-decomposition proportions from the output table. varnames = DataTable.Properties.VariableNames; VarDecomp = VarDecompTbl(:,varnames) VarDecomp=5×5 table INF_C INF_G _________ _________ 0.0012446 0.0261 0.33864 0.61384 0.020173
0.0017784 0.080594 0.38021 0.52756 0.0098575
INT_S __________
INT_M __________
INT_L __________
0.00033202 0.0034869 0.081126 0.19176 0.72329
4.2326e-05 0.00057749 0.0011166 0.00035545 0.99791
8.0328e-05 0.001159 0.013662 0.019308 0.96579
Plot Belsley Collinearity Diagnostics Plot collinearity diagnostics for all time series in a table. Load data of Canadian inflation and interest rates Data_Canada.mat. load Data_Canada
Plot the Belsley collinearity diagnostics for all series. collintest(DataTable,Plot="on"); Variance Decomposition sValue condIdx INF_C INF_G INT_S INT_M INT_L --------------------------------------------------------2.1748 1 0.0012 0.0018 0.0003 0.0000 0.0001 0.4789 4.5413 0.0261 0.0806 0.0035 0.0006 0.0012 0.1602 13.5795 0.3386 0.3802 0.0811 0.0011 0.0137 0.1211 17.9617 0.6138 0.5276 0.1918 0.0004 0.0193 0.0248 87.8245 0.0202 0.0099 0.7233 0.9979 0.9658
12-283
12
Functions
The plot corresponds to the values in the last row of the variance-decomposition proportions, which are the only proportions with a condition index larger than the default tolerance of 30. The interest rate series have variance-decomposition proportions exceeding the default tolerance of 0.5 (red markers in the plot).
Plot Belsley Collinearity Diagnostics for Selected Variables and Intercept Compute collinearity diagnostics for selected time series and an intercept. Load the credit default data set Data_CreditDefaults.mat. The table DataTable contains the default rate of investment-grade corporate bonds series (IGD, the response variable) and several predictor variables. load Data_CreditDefaults
Consider a multiple regression model for the default rate that includes an intercept term. Include a variable in the table of data that represents the intercept in the design matrix (that is, a column of ones). Place the intercept variable at the beginning of the table. Const = ones(height(DataTable),1); DataTable = addvars(DataTable,Const,Before=1);
Create a variable that contains all predictor variable names. 12-284
collintest
varnames = DataTable.Properties.VariableNames; prednames = varnames(varnames ~= "IGD");
Graph a correlation plot of all predictor variables except for the intercept dummy variable. figure corrplot(DataTable,DataVariables=prednames(2:end), ... TestR="on");
The predictor BBB is moderately linearly associated with the other predictors, while all other predictors appear unassociated with each other. Plot the Belsley collinearity diagnostics of the predictor variables. Adjust the following options for the collinearity diagnostics: • Set the condition index tolerance to 10. • Set the variance-decomposition proportion tolerance to 0.5. figure collintest(DataTable,Plot="on",DataVariables=prednames, ... TolIdx=10,TolProp=0.5); Variance Decomposition sValue condIdx Const AGE BBB CPF SPR --------------------------------------------------------2.0605 1 0.0015 0.0024 0.0020 0.0140 0.0025
12-285
12
Functions
0.8008 0.2563 0.1710 0.1343
2.5730 8.0400 12.0464 15.3405
0.0016 0.0037 0.2596 0.7335
0.0025 0.3208 0.0950 0.5793
0.0004 0.0105 0.8287 0.1585
0.8220 0.0004 0.1463 0.0173
0.0023 0.3781 0.0001 0.6170
The row associated with condition index 12 (row 4) has one predictor (BBB) with a proportion above the tolerance 0.5, but collinearity requires two or more predictors for a dependency. The row associated with condition index 15.3 (row 5) shows a weak dependence involving AGE, SPR, and the intercept, which the correlation plot does not expose.
Input Arguments X — Time series data numeric matrix Time series data, specified as a numObs-by-numVars numeric matrix. Each column of X corresponds to a variable, and each row corresponds to an observation. Data Types: double Tbl — Time series data table | timetable Time series data, specified as a table or timetable with numObs rows. Each row of Tbl is an observation. 12-286
collintest
Specify numVars variables to include in the diagnostics computations by using the DataVariables argument. The selected variables must be numeric. ax — Axes on which to plot Axes object Axes on which to plot, specified as an Axes object. By default, collintest plots to the current axes (gca). Note • To specify a model containing an intercept, include a variable (column) of ones in the time series data. • collintest scales all variables to unit length before computing diagnostics; do not center the variables in the data. • Impute or remove all missing observations (indicated by NaN entries) in the input data before passing the set to collintest.
Name-Value Arguments Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter. Before R2021a, use commas to separate each name and value, and enclose Name in quotes. Example: collintest(Tbl,Plot="on",Display="off",DataVariables=1:5) plots the Belslely collinearity diagnostics for the first 5 variables of the table Tbl to a figure instead of the command window. VarNames — Unique variable names used in displays and plots of results string vector | character vector | cell vector of strings | cell vector of character vectors Unique variable names used in displays and plots of the results, specified as a string vector or cell vector of strings of a length numVars. VarNames(j) specifies the name to use for variable X(:,j) or DataVariables(j). If an intercept term is present, VarNames must include the intercept term (e.g., include the name "Const"). The software truncates all variable names to the first five characters. • If the input time series data is a matrix X, the default is {'var1','var2',...}. • If the input time series data is a table or timetable Tbl, the default is Tbl.Properties.VariableNames. Example: VarNames=["Const" "AGE" "BBD"] Data Types: char | cell | string Display — Flag for command window display of results "on" (default) | "off" | character vector 12-287
12
Functions
Flag for a command window display of results, specified as a value in this table. Value
Description
"on"
collintest displays all outputs in tabular form to the command window.
"off"
collintest does not display the results to the command window.
Example: Display="off" Data Types: char | string Plot — Flag for plotting results "off" (default) | "on" | character vector Flag for plotting results to a figure, specified as a value in this table. Value
Description
"on"
collintest plots critical rows of the output VarDecomp, specifically, rows with condition indices on page 12-290 above the input tolerance TolIdx. If a group of at least two variables in a critical row have variance-decomposition proportions on page 12-290 above the input tolerance TolProp, the group is identified with red markers.
"off"
collintest does not plot results to a figure.
Example: Plot="on" Data Types: char | string TolIdx — Condition index tolerance 30 (default) | numeric scalar of at least 1 Condition index tolerance, specified as a scalar value of at least 1. collintest uses TolIdx to decide which indices are large enough to infer a near dependency in the data. TolIdx is used only when the Plot argument is "on". Example: TolIdx=25 Data Types: double TolProp — Variance-decomposition proportion tolerance 0.5 (default) | numeric scalar in [0,1] Variance-decomposition proportion tolerance, specified as a numeric scalar in the interval [0,1]. collintest uses TolProp to decide which variables are involved in any near dependency. TolProp is used only when the Plot argument is "on". Example: TolProp=0.4 Data Types: double DataVariables — Variables in Tbl all variables (default) | string vector | cell vector of character vectors | vector of integers | logical vector 12-288
collintest
Variables in Tbl for which collintest computes Belsley collinearity diagnostics, specified as a string vector or cell vector of character vectors containing variable names in Tbl.Properties.VariableNames, or an integer or logical vector representing the indices of names. The selected variables must be numeric. Example: DataVariables=["GDP" "CPI"] Example: DataVariables=[true true false false] or DataVariables=[1 2] selects the first and second table variables. Data Types: double | logical | char | cell | string
Output Arguments sValue — Singular values numeric vector Singular values on page 12-290 of the scaled design matrix composed of the specified time series variables, returned as a numeric vector with elements in descending order. collintest returns sValue when you supply the input X. condIdx — Condition indices numeric vector Condition indices on page 12-290, returned as a numeric vector with elements in ascending order. All condition indices have value between 1 and the condition number on page 12-290 of the scaled design matrix of the specified time series variables. collintest returns condIdx when you supply the input X. Large indices identify near dependencies among the specified variables. The size of the indices is a measure of how near dependencies are to collinearity. VarDecomp — Variance-decomposition proportions numeric matrix Variance-decomposition proportions on page 12-290, returned as a numVars-by-numVars numeric matrix. Large proportions, combined with a large condition index, identify groups of variables involved in near dependencies. collintest returns VarDecomp when you supply the input X. The size of the proportions is a measure of how badly the regression is degraded by the dependency. VarDecompTbl — Collinearity diagnostics summary table Collinearity diagnostics summary, returned as a table with variables for the outputs sValue, condIdx, and VarDecomp. collintest returns Tbl when you supply the input Tbl. The value of the VarNames argument determines the variable names of the columns of VarDecomp. h — Handles to plotted graphics objects graphics array Handles to plotted graphics objects, returned as a graphics array. h contains unique plot identifiers, which you can use to query or modify properties of the plot. 12-289
12
Functions
collintest plots only when you set Plot="on".
More About Belsley Collinearity Diagnostics Belsley collinearity diagnostics assess the strength and sources of collinearity among variables in a multiple linear regression model on page 12-290. To assess collinearity, the software computes singular values on page 12-290 of the scaled variable matrix, X, and then converts them to condition indices on page 12-290. The conditional indices identify the number and strength of any near dependencies between variables in the variable matrix. The software decomposes the variance of the ordinary least squares (OLS) estimates of the regression coefficients in terms of the singular values to identify variables involved in each near dependency, and the extent to which the dependencies degrade the regression. Condition Indices The condition indices (condIdx) for a scaled matrix X identify the number and strength of any near dependencies in X. For scaled matrix X with p columns and singular values (sValue) S1 ≥ S2 ≥ … ≥ Sp, the condition indices of the columns of X are S1 /S j (sValue(1)/sValue(j)), where j = 1,...,p. All condition indices are bounded between one and the condition number on page 12-290. Condition Number The condition number of a scaled matrix X is an overall diagnostic for detecting collinearity. For scaled matrix X with p columns and singular values (sValue) S1 ≥ S2 ≥ … ≥ Sp, the condition number is S1 /Sp (sValue(1)/sValue(end)). The condition number achieves its lower bound of one when the columns of scaled X are orthonormal. The condition number rises as variates exhibit greater dependency. A limitation of the condition number as a diagnostic is that it fails to provide specifics on the strength and sources of any near dependencies. Multiple Linear Regression Model A multiple linear regression model is a model of the form Y = Xβ + ε . X is a design matrix of regression variables, and β is a vector of regression coefficients. Singular Values The singular values (sValue) of a scaled matrix X are the diagonal elements of the matrix S in the singular value decomposition USV′ . In descending order, the singular values of the scaled matrix X with p columns are S1 ≥ S2 ≥ … ≥ Sp. Variance-Decomposition Proportions Variance-decomposition proportions identify groups of variates involved in near dependencies, and the extent to which the dependencies degrade the regression. 12-290
collintest
From the singular value decomposition USV′ of scaled design matrix X (with p columns), define the following quantities: • V is the matrix of orthonormal eigenvectors of X′X. • The singular values (sValue) S1 ≥ S2 ≥ … ≥ Sp are the ordered diagonal elements of the matrix S. The variance of the OLS estimate of multiple linear regression coefficient i, βi, is proportional to the sum 2
2
2
2
2
2
V(i, 1) /S1 + V(i, 2) /S2 + … + V(i, p) /Sp , where V(i, j) denotes element (i,j) of V. Variance-decomposition proportion (i,j) (VarDecomp) is the proportion of term j in the sum relative to the entire sum, j = 1,...,p. 2
The terms S j are the eigenvalues of scaled X′X. Thus, large variance-decomposition proportions correspond to small eigenvalues of X′X, a common diagnostic for collinearity. The singular value decomposition provides a more direct, numerically stable view of the eigensystem of scaled X′X.
Tips • For purposes of collinearity diagnostics, Belsley [1] shows that column scaling of the design matrix composed of the input time series data is always desirable. However, he also shows that centering the data in X is undesirable. For models with an intercept, if you center the data in X, the role of the constant term in any near dependency is hidden, and yields misleading diagnostics. • Tolerances for identifying large condition indices and variance-decomposition proportions are comparable to critical values in standard hypothesis tests. Experience determines the most useful tolerance, but experiments suggest the collintest defaults are good starting points [1].
Version History Introduced in R2012a R2022a: collintest returns a results table when you supply a table of data If you supply a table of time series data Tbl, collintest returns a table containing variables for the singular values sValue and condition indices condIdx, and variables for the variance-decomposition proportions VarDecomp associated with each time series, from which collinearity is diagnosed. Before R2022a, collintest returned sValue, condIdx, and VarDecomp in separate positions of the output when you supplied a table of input data. Starting in R2022a, if you supply a table of input data, update your code to return all collinearity diagnostic outputs in the first output position. The second optional output is the graphics object h. [VarDecompTbl,h] = collintest(Tbl,Name=Value)
collintest issues an error if you request more outputs. Also, access results by using table indexing. For more details, see “Access Data in Tables”. 12-291
12
Functions
References [1] Belsley, D. A., E. Kuh, and R. E. Welsh. Regression Diagnostics. New York, NY: John Wiley & Sons, Inc., 1980. [2] Judge, G. G., W. E. Griffiths, R. C. Hill, H. Lϋtkepohl, and T. C. Lee. The Theory and Practice of Econometrics. New York, NY: John Wiley & Sons, Inc., 1985.
See Also Apps Econometric Modeler Functions corrplot Topics “Time Series Regression II: Collinearity and Estimator Variance” on page 5-183 “Assess Collinearity Among Multiple Series Using Econometric Modeler App” on page 4-94
12-292
conjugateblm
conjugateblm Bayesian linear regression model with conjugate prior for data likelihood
Description The Bayesian linear regression model on page 12-304 object conjugateblm specifies that the joint prior distribution of the regression coefficients and the disturbance variance, that is, (β, σ2) is the dependent, normal-inverse-gamma conjugate model. The conditional prior distribution of β|σ2 is multivariate Gaussian with mean μ and variance σ2V. The prior distribution of σ2 is inverse gamma with shape A and scale B. The data likelihood is
T
∏
t=1
ϕ yt; xt β, σ2 , where ϕ(yt;xtβ,σ2) is the Gaussian probability density
evaluated at yt with mean xtβ and variance σ2. The specified priors are conjugate for the likelihood, and the resulting marginal and conditional posterior distributions are analytically tractable. For details on the posterior distribution, see “Analytically Tractable Posteriors” on page 6-5. In general, when you create a Bayesian linear regression model object, it specifies the joint prior distribution and characteristics of the linear regression model only. That is, the model object is a template intended for further use. Specifically, to incorporate data into the model for posterior distribution analysis, pass the model object and data to the appropriate object function on page 12296.
Creation Syntax PriorMdl = conjugateblm(NumPredictors) PriorMdl = conjugateblm(NumPredictors,Name,Value) Description PriorMdl = conjugateblm(NumPredictors) creates a Bayesian linear regression model on page 12-304 object (PriorMdl) composed of NumPredictors predictors and an intercept, and sets the NumPredictors property. The joint prior distribution of (β, σ2) is the dependent normal-inversegamma conjugate model. PriorMdl is a template that defines the prior distributions and the dimensionality of β. PriorMdl = conjugateblm(NumPredictors,Name,Value) sets properties on page 12-293 (except NumPredictors) using name-value pair arguments. Enclose each property name in quotes. For example, conjugateblm(2,'VarNames',["UnemploymentRate"; "CPI"]) specifies the names of the two predictor variables in the model.
Properties You can set writable property values when you create the model object by using name-value argument syntax, or after you create the model object by using dot notation. For example, to set a 12-293
12
Functions
more diffuse prior covariance matrix for PriorMdl than the default value, a Bayesian linear regression model containing three model coefficients, enter PriorMdl.V = 100*eye(3);
NumPredictors — Number of predictor variables nonnegative integer Number of predictor variables in the Bayesian multiple linear regression model, specified as a nonnegative integer. NumPredictors must be the same as the number of columns in your predictor data, which you specify during model estimation or simulation. When specifying NumPredictors, exclude any intercept term for the value. After creating a model, if you change the of value NumPredictors using dot notation, then these parameters revert to the default values: • Variable names (VarNames) • Prior mean of β (Mu) • Prior covariance matrix of β (V) Data Types: double Intercept — Flag for including regression model intercept true (default) | false Flag for including a regression model intercept, specified as a value in this table. Value
Description
false
Exclude an intercept from the regression model. Therefore, β is a p-dimensional vector, where p is the value of NumPredictors.
true
Include an intercept in the regression model. Therefore, β is a (p + 1)-dimensional vector. This specification causes a T-by-1 vector of ones to be prepended to the predictor data during estimation and simulation.
If you include a column of ones in the predictor data for an intercept term, then set Intercept to false. Example: 'Intercept',false Data Types: logical VarNames — Predictor variable names string vector | cell vector of character vectors Predictor variable names for displays, specified as a string vector or cell vector of character vectors. VarNames must contain NumPredictors elements. VarNames(j) is the name of the variable in column j of the predictor data set, which you specify during estimation, simulation, or forecasting. The default is {'Beta(1)','Beta(2),...,Beta(p)}, where p is the value of NumPredictors. 12-294
conjugateblm
Example: 'VarNames',["UnemploymentRate"; "CPI"] Data Types: string | cell | char Mu — Mean hyperparameter of Gaussian prior on β zeros(Intercept + NumPredictors,1) (default) | numeric scalar | numeric vector Mean parameter of the Gaussian prior on β, specified as a numeric scalar or vector. If Mu is a vector, then it must have NumPredictors or NumPredictors + 1 elements. • For NumPredictors elements, conjugateblm sets the prior mean of the NumPredictors predictors only. Predictors correspond to the columns in the predictor data (specified during estimation, simulation, or forecasting). conjugateblm ignores the intercept in the model, that is, conjugateblm specifies the default prior mean to any intercept. • For NumPredictors + 1 elements, the first element corresponds to the prior mean of the intercept, and all other elements correspond to the predictors. Example: 'Mu',[1; 0.08; 2] Data Types: double V — Conditional covariance matrix hyperparameter of Gaussian prior on β 10000*eye(Intercept + NumPredictors) (default) | symmetric, positive-definite matrix | diag(Inf(Intercept + NumPredictors,1)) Conditional covariance matrix of Gaussian prior on β, specified as a c-by-c symmetric, positive definite matrix. c can be NumPredictors or NumPredictors + 1. • If c is NumPredictors, then conjugateblm sets the prior covariance matrix to 1e5 0 ⋯ 0 0 ⋮ 0
V
.
conjugateblm attributes the default prior covariances to the intercept, and attributes V to the coefficients of the predictor variables in the data. Rows and columns of V correspond to columns (variables) in the predictor data. • If c is NumPredictors + 1, then conjugateblm sets the entire prior covariance to V. The first row and column correspond to the intercept. All other rows and columns correspond to the columns in the predictor data. The default value is a flat prior. For an adaptive prior, specify diag(Inf(Intercept + NumPredictors,1)). Adaptive priors indicate zero precision in order for the prior distribution to have as little influence as possible on the posterior distribution. V is the prior covariance of β up to a factor of σ2. Example: 'V',diag(Inf(3,1)) Data Types: double A — Shape hyperparameter of inverse gamma prior on σ2 3 (default) | numeric scalar 12-295
12
Functions
Shape hyperparameter of the inverse gamma prior on σ2, specified as a numeric scalar. A must be at least –(Intercept + NumPredictors)/2. With B held fixed, the inverse gamma distribution becomes taller and more concentrated as A increases. This characteristic weighs the prior model of σ2 more heavily than the likelihood during posterior estimation. For the functional form of the inverse gamma distribution, see “Analytically Tractable Posteriors” on page 6-5. Example: 'A',0.1 Data Types: double B — Scale hyperparameter of inverse gamma prior on σ2 1 (default) | positive scalar | Inf Scale parameter of inverse gamma prior on σ2, specified as a positive scalar or Inf. With A held fixed, the inverse gamma distribution becomes taller and more concentrated as B increases. This characteristic weighs the prior model of σ2 more heavily than the likelihood during posterior estimation. Example: 'B',5 Data Types: double
Object Functions estimate simulate forecast plot summarize
Estimate posterior distribution of Bayesian linear regression model parameters Simulate regression coefficients and disturbance variance of Bayesian linear regression model Forecast responses of Bayesian linear regression model Visualize prior and posterior densities of Bayesian linear regression model parameters Distribution summary statistics of standard Bayesian linear regression model
Examples Create Normal-Inverse-Gamma Conjugate Prior Model Consider the multiple linear regression model that predicts U.S. real gross national product (GNPR) using a linear combination of industrial production index (IPI), total employment (E), and real wages (WR). GNPRt = β0 + β1IPIt + β2Et + β3WRt + εt . For all t time points, εt is a series of independent Gaussian disturbances with a mean of 0 and variance σ2. Assume that the prior distributions are: •
12-296
β | σ2 ∼ N4 M, σ2V . M is a 4-by-1 vector of means, and V is a scaled 4-by-4 positive definite covariance matrix.
conjugateblm
• σ2 ∼ IG(A, B). A and B are the shape and scale, respectively, of an inverse gamma distribution. These assumptions and the data likelihood imply a normal-inverse-gamma conjugate model. Create a normal-inverse-gamma conjugate prior model for the linear regression parameters. Specify the number of predictors p. p = 3; Mdl = bayeslm(p,'ModelType','conjugate') Mdl = conjugateblm with properties: NumPredictors: Intercept: VarNames: Mu: V: A: B:
3 1 {4x1 cell} [4x1 double] [4x4 double] 3 1
| Mean Std CI95 Positive Distribution ----------------------------------------------------------------------------------Intercept | 0 70.7107 [-141.273, 141.273] 0.500 t (0.00, 57.74^2, 6) Beta(1) | 0 70.7107 [-141.273, 141.273] 0.500 t (0.00, 57.74^2, 6) Beta(2) | 0 70.7107 [-141.273, 141.273] 0.500 t (0.00, 57.74^2, 6) Beta(3) | 0 70.7107 [-141.273, 141.273] 0.500 t (0.00, 57.74^2, 6) Sigma2 | 0.5000 0.5000 [ 0.138, 1.616] 1.000 IG(3.00, 1)
Mdl is a conjugateblm Bayesian linear regression model object representing the prior distribution of the regression coefficients and disturbance variance. At the command window, bayeslm displays a summary of the prior distributions. You can set writable property values of created models using dot notation. Set the regression coefficient names to the corresponding variable names. Mdl.VarNames = ["IPI" "E" "WR"] Mdl = conjugateblm with properties: NumPredictors: Intercept: VarNames: Mu: V: A: B:
3 1 {4x1 cell} [4x1 double] [4x4 double] 3 1
| Mean Std CI95 Positive Distribution ----------------------------------------------------------------------------------Intercept | 0 70.7107 [-141.273, 141.273] 0.500 t (0.00, 57.74^2, 6) IPI | 0 70.7107 [-141.273, 141.273] 0.500 t (0.00, 57.74^2, 6) E | 0 70.7107 [-141.273, 141.273] 0.500 t (0.00, 57.74^2, 6) WR | 0 70.7107 [-141.273, 141.273] 0.500 t (0.00, 57.74^2, 6)
12-297
12
Functions
Sigma2
| 0.5000
0.5000
[ 0.138,
1.616]
1.000
IG(3.00,
1)
Estimate Marginal Posterior Distributions Consider the linear regression model in “Create Normal-Inverse-Gamma Conjugate Prior Model” on page 12-296. Create a normal-inverse-gamma conjugate prior model for the linear regression parameters. Specify the number of predictors p and the names of the regression coefficients. p = 3; PriorMdl = bayeslm(p,'ModelType','conjugate','VarNames',["IPI" "E" "WR"]);
Load the Nelson-Plosser data set. Create variables for the response and predictor series. load Data_NelsonPlosser X = DataTable{:,PriorMdl.VarNames(2:end)}; y = DataTable{:,'GNPR'};
Estimate the marginal posterior distributions of β and σ2. PosteriorMdl = estimate(PriorMdl,X,y); Method: Analytic posterior distributions Number of observations: 62 Number of predictors: 4 Log marginal likelihood: -259.348 | Mean Std CI95 Positive Distribution ----------------------------------------------------------------------------------Intercept | -24.2494 8.7821 [-41.514, -6.985] 0.003 t (-24.25, 8.65^2, 68) IPI | 4.3913 0.1414 [ 4.113, 4.669] 1.000 t (4.39, 0.14^2, 68) E | 0.0011 0.0003 [ 0.000, 0.002] 1.000 t (0.00, 0.00^2, 68) WR | 2.4683 0.3490 [ 1.782, 3.154] 1.000 t (2.47, 0.34^2, 68) Sigma2 | 44.1347 7.8020 [31.427, 61.855] 1.000 IG(34.00, 0.00069)
PosteriorMdl is a conjugateblm model object storing the joint marginal posterior distribution of β and σ2 given the data. estimate displays a summary of the marginal posterior distributions to the command window. Rows of the summary correspond to regression coefficients and the disturbance variance, and columns to characteristics of the posterior distribution. The characteristics include: • CI95, which contains the 95% Bayesian equitailed credible intervals for the parameters. For example, the posterior probability that the regression coefficient of WR is in [1.782, 3.154] is 0.95. • Positive, which contains the posterior probability that the parameter is greater than 0. For example, the probability that the intercept is greater than 0 is 0.003. • Distribution, which contains descriptions of the posterior distributions of the parameters. For example, the marginal posterior distribution of IPI is t with a mean of 4.39, a standard deviation of 0.14, and 68 degrees of freedom. Access properties of the posterior distribution using dot notation. For example, display the marginal posterior means by accessing the Mu property. 12-298
conjugateblm
PosteriorMdl.Mu ans = 4×1 -24.2494 4.3913 0.0011 2.4683
Estimate Conditional Posterior Distribution Consider the linear regression model in “Create Normal-Inverse-Gamma Conjugate Prior Model” on page 12-296. Create a normal-inverse-gamma conjugate prior model for the linear regression parameters. Specify the number of predictors p, and the names of the regression coefficients. p = 3; PriorMdl = bayeslm(p,'ModelType','conjugate','VarNames',["IPI" "E" "WR"]) PriorMdl = conjugateblm with properties: NumPredictors: Intercept: VarNames: Mu: V: A: B:
3 1 {4x1 cell} [4x1 double] [4x4 double] 3 1
| Mean Std CI95 Positive Distribution ----------------------------------------------------------------------------------Intercept | 0 70.7107 [-141.273, 141.273] 0.500 t (0.00, 57.74^2, 6) IPI | 0 70.7107 [-141.273, 141.273] 0.500 t (0.00, 57.74^2, 6) E | 0 70.7107 [-141.273, 141.273] 0.500 t (0.00, 57.74^2, 6) WR | 0 70.7107 [-141.273, 141.273] 0.500 t (0.00, 57.74^2, 6) Sigma2 | 0.5000 0.5000 [ 0.138, 1.616] 1.000 IG(3.00, 1)
Load the Nelson-Plosser data set. Create variables for the response and predictor series. load Data_NelsonPlosser X = DataTable{:,PriorMdl.VarNames(2:end)}; y = DataTable{:,'GNPR'};
Estimate the conditional posterior distribution of β given the data and σ2 = 2, and return the estimation summary table to access the estimates. [Mdl,Summary] = estimate(PriorMdl,X,y,'Sigma2',2); Method: Analytic posterior distributions Conditional variable: Sigma2 fixed at 2
12-299
12
Functions
Number of observations: 62 Number of predictors: 4 | Mean Std CI95 Positive Distribution -------------------------------------------------------------------------------Intercept | -24.2494 1.8695 [-27.914, -20.585] 0.000 N (-24.25, 1.87^2) IPI | 4.3913 0.0301 [ 4.332, 4.450] 1.000 N (4.39, 0.03^2) E | 0.0011 0.0001 [ 0.001, 0.001] 1.000 N (0.00, 0.00^2) WR | 2.4683 0.0743 [ 2.323, 2.614] 1.000 N (2.47, 0.07^2) Sigma2 | 2 0 [ 2.000, 2.000] 1.000 Fixed value
estimate displays a summary of the conditional posterior distribution of β. Because σ2 is fixed at 2 during estimation, inferences on it are trivial. Extract the mean vector and covariance matrix of the conditional posterior of β from the estimation summary table. condPostMeanBeta = Summary.Mean(1:(end - 1)) condPostMeanBeta = 4×1 -24.2494 4.3913 0.0011 2.4683 CondPostCovBeta = Summary.Covariances(1:(end - 1),1:(end - 1)) CondPostCovBeta = 4×4 3.4950 0.0350 -0.0001 0.0241
0.0350 0.0009 -0.0000 -0.0013
-0.0001 -0.0000 0.0000 -0.0000
0.0241 -0.0013 -0.0000 0.0055
Display Mdl. Mdl Mdl = conjugateblm with properties: NumPredictors: Intercept: VarNames: Mu: V: A: B:
3 1 {4x1 cell} [4x1 double] [4x4 double] 3 1
| Mean Std CI95 Positive Distribution ----------------------------------------------------------------------------------Intercept | 0 70.7107 [-141.273, 141.273] 0.500 t (0.00, 57.74^2, 6) IPI | 0 70.7107 [-141.273, 141.273] 0.500 t (0.00, 57.74^2, 6)
12-300
conjugateblm
E WR Sigma2
| 0 | 0 | 0.5000
70.7107 70.7107 0.5000
[-141.273, 141.273] [-141.273, 141.273] [ 0.138, 1.616]
0.500 0.500 1.000
t (0.00, 57.74^2, t (0.00, 57.74^2, IG(3.00, 1)
6) 6)
Because estimate computes the conditional posterior distribution, it returns the original prior model, not the posterior, in the first position of the output argument list.
Estimate Posterior Probability Using Monte Carlo Simulation Consider the linear regression model in “Estimate Marginal Posterior Distributions” on page 12-298. Create a prior model for the regression coefficients and disturbance variance, then estimate the marginal posterior distributions. p = 3; PriorMdl = bayeslm(p,'ModelType','conjugate','VarNames',["IPI" "E" "WR"]); load Data_NelsonPlosser X = DataTable{:,PriorMdl.VarNames(2:end)}; y = DataTable{:,'GNPR'}; PosteriorMdl = estimate(PriorMdl,X,y); Method: Analytic posterior distributions Number of observations: 62 Number of predictors: 4 Log marginal likelihood: -259.348 | Mean Std CI95 Positive Distribution ----------------------------------------------------------------------------------Intercept | -24.2494 8.7821 [-41.514, -6.985] 0.003 t (-24.25, 8.65^2, 68) IPI | 4.3913 0.1414 [ 4.113, 4.669] 1.000 t (4.39, 0.14^2, 68) E | 0.0011 0.0003 [ 0.000, 0.002] 1.000 t (0.00, 0.00^2, 68) WR | 2.4683 0.3490 [ 1.782, 3.154] 1.000 t (2.47, 0.34^2, 68) Sigma2 | 44.1347 7.8020 [31.427, 61.855] 1.000 IG(34.00, 0.00069)
Extract the posterior mean of β from the posterior model, and the posterior covariance of β from the estimation summary returned by summarize. estBeta = PosteriorMdl.Mu; Summary = summarize(PosteriorMdl); estBetaCov = Summary.Covariances{1:(end - 1),1:(end - 1)};
Suppose that if the coefficient of real wages (WR) is below 2.5, then a policy is enacted. Although the posterior distribution of WR is known, and so you can calculate probabilities directly, you can estimate the probability using Monte Carlo simulation instead. Draw 1e6 samples from the marginal posterior distribution of β. NumDraws = 1e6; rng(1); BetaSim = simulate(PosteriorMdl,'NumDraws',NumDraws);
12-301
12
Functions
BetaSim is a 4-by- 1e6 matrix containing the draws. Rows correspond to the regression coefficient and columns to successive draws. Isolate the draws corresponding to the coefficient of WR, and then identify which draws are less than 2.5. isWR = PosteriorMdl.VarNames == "WR"; wrSim = BetaSim(isWR,:); isWRLT2p5 = wrSim < 2.5;
Find the marginal posterior probability that the regression coefficient of WR is below 2.5 by computing the proportion of draws that are less than 2.5. probWRLT2p5 = mean(isWRLT2p5) probWRLT2p5 = 0.5362
The posterior probability that the coefficient of real wages is less than 2.5 is about 0.54. The marginal posterior distribution of the coefficient of WR is a t68, but centered at 2.47 and scaled by 0.34. Directly compute the posterior probability that the coefficient of WR is less than 2.5. center = estBeta(isWR); stdBeta = sqrt(diag(estBetaCov)); scale = stdBeta(isWR); t = (2.5 - center)/scale; dof = 68; directProb = tcdf(t,dof) directProb = 0.5361
The posterior probabilities are nearly identical. Copyright 2018 The MathWorks, Inc.
Forecast Responses Using Posterior Predictive Distribution Consider the linear regression model in “Estimate Marginal Posterior Distributions” on page 12-298. Create a prior model for the regression coefficients and disturbance variance, then estimate the marginal posterior distributions. Hold out the last 10 periods of data from estimation so you can use them to forecast real GNP. Turn the estimation display off. p = 3; PriorMdl = bayeslm(p,'ModelType','conjugate','VarNames',["IPI" "E" "WR"]); load Data_NelsonPlosser fhs = 10; % Forecast horizon size X = DataTable{1:(end - fhs),PriorMdl.VarNames(2:end)}; y = DataTable{1:(end - fhs),'GNPR'}; XF = DataTable{(end - fhs + 1):end,PriorMdl.VarNames(2:end)}; % Future predictor data yFT = DataTable{(end - fhs + 1):end,'GNPR'}; % True future responses PosteriorMdl = estimate(PriorMdl,X,y,'Display',false);
12-302
conjugateblm
Forecast responses using the posterior predictive distribution and using the future predictor data XF. Plot the true values of the response and the forecasted values. yF = forecast(PosteriorMdl,XF); figure; plot(dates,DataTable.GNPR); hold on plot(dates((end - fhs + 1):end),yF) h = gca; hp = patch([dates(end - fhs + 1) dates(end) dates(end) dates(end - fhs + 1)],... h.YLim([1,1,2,2]),[0.8 0.8 0.8]); uistack(hp,'bottom'); legend('Forecast Horizon','True GNPR','Forecasted GNPR','Location','NW') title('Real Gross National Product'); ylabel('rGNP'); xlabel('Year'); hold off
yF is a 10-by-1 vector of future values of real GNP corresponding to the future predictor data. Estimate the forecast root mean squared error (RMSE). frmse = sqrt(mean((yF - yFT).^2)) frmse = 25.5397
12-303
12
Functions
The forecast RMSE is a relative measure of forecast accuracy. Specifically, you estimate several models using different assumptions. The model with the lowest forecast RMSE is the best-performing model of the ones being compared. Copyright 2018 The MathWorks, Inc.
More About Bayesian Linear Regression Model A Bayesian linear regression model treats the parameters β and σ2 in the multiple linear regression (MLR) model yt = xtβ + εt as random variables. For times t = 1,...,T: • yt is the observed response. • xt is a 1-by-(p + 1) row vector of observed values of p predictors. To accommodate a model intercept, x1t = 1 for all t. • β is a (p + 1)-by-1 column vector of regression coefficients corresponding to the variables that compose the columns of xt. • εt is the random disturbance with a mean of zero and Cov(ε) = σ2IT×T, while ε is a T-by-1 vector containing all disturbances. These assumptions imply that the data likelihood is ℓ β, σ2 y, x =
T
∏
t=1
ϕ yt; xt β, σ2 .
ϕ(yt;xtβ,σ2) is the Gaussian probability density with mean xtβ and variance σ2 evaluated at yt;. Before considering the data, you impose a joint prior distribution assumption on (β,σ2). In a Bayesian analysis, you update the distribution of the parameters by using information about the parameters obtained from the likelihood of the data. The result is the joint posterior distribution of (β,σ2) or the conditional posterior distributions of the parameters.
Algorithms You can reset all model properties using dot notation, for example, PriorMdl.V = diag(Inf(3,1)). For property resets, conjugateblm does minimal error checking of values. Minimizing error checking has the advantage of reducing overhead costs for Markov chain Monte Carlo simulations, which results in efficient execution of the algorithm.
Alternatives The bayeslm function can create any supported prior model object for Bayesian linear regression.
Version History Introduced in R2017a 12-304
conjugateblm
See Also Objects semiconjugateblm | diffuseblm | customblm | empiricalblm Functions bayeslm Topics “Bayesian Linear Regression” on page 6-2 “Implement Bayesian Linear Regression” on page 6-10
12-305
12
Functions
conjugatebvarm Bayesian vector autoregression (VAR) model with conjugate prior for data likelihood
Description The Bayesian VAR model on page 12-325 object conjugatebvarm specifies the joint prior or posterior distribution of the array of model coefficients Λ and the innovations covariance matrix Σ of an m-D VAR(p) model. The joint prior distribution (Λ,Σ) is the dependent, matrix-normal-inverseWishart conjugate model on page 12-326. In general, when you create a Bayesian VAR model object, it specifies the joint prior distribution and characteristics of the VARX model only. That is, the model object is a template intended for further use. Specifically, to incorporate data into the model for posterior distribution analysis, pass the model object and data to the appropriate object function on page 12-312.
Creation Syntax PriorMdl = conjugatebvarm(numseries,numlags) PriorMdl = conjugatebvarm(numseries,numlags,Name,Value) Description To create a conjugatebvarm object, use either the conjugatebvarm function (described here) or the bayesvarm function. The syntaxes for each function are similar, but the options differ. bayesvarm enables you to set prior hyperparameter values for Minnesota prior[1] regularization easily, whereas conjugatebvarm requires the entire specification of prior distribution hyperparameters. PriorMdl = conjugatebvarm(numseries,numlags) creates a numseries-D Bayesian VAR(numlags) model object PriorMdl, which specifies dimensionalities and prior assumptions for all model coefficients Λ = Φ1 Φ2 ⋯ Φp c δ Β ′ and the innovations covariance Σ, where: • numseries = m, the number of response time series variables. • numlags = p, the AR polynomial order. • The joint prior distribution of (Λ,Σ) is the dependent, matrix-normal-inverse-Wishart conjugate model on page 12-326. PriorMdl = conjugatebvarm(numseries,numlags,Name,Value) sets writable properties on page 12-307 (except NumSeries and P) using name-value pair arguments. Enclose each property name in quotes. For example, conjugatebvarm(3,2,'SeriesNames',["UnemploymentRate" "CPI" "FEDFUNDS"]) specifies the names of the three response variables in the Bayesian VAR(2) model.
12-306
conjugatebvarm
Input Arguments numseries — Number of time series m 1 (default) | positive integer Number of time series m, specified as a positive integer. numseries specifies the dimensionality of the multivariate response variable yt and innovation εt. numseries sets the NumSeries property. Data Types: double numlags — Number of lagged responses nonnegative integer Number of lagged responses in each equation of yt, specified as a nonnegative integer. The resulting model is a VAR(numlags) model; each lag has a numseries-by-numseries coefficient matrix. numlags sets the P property. Data Types: double
Properties You can set writable property values when you create the model object by using name-value argument syntax, or after you create the model object by using dot notation. For example, to create a 3-D Bayesian VAR(1) model and label the first through third response variables, and then include a linear time trend term, enter: PriorMdl = conjugatebvarm(3,1,'SeriesNames',["UnemploymentRate" "CPI" "FEDFUNDS"]); PriorMdl.IncludeTrend = true; Model Characteristics and Dimensionality
Description — Model description string scalar | character vector Model description, specified as a string scalar or character vector. The default value describes the model dimensionality, for example '2-Dimensional VAR(3) Model'. Example: "Model 1" Data Types: string | char NumSeries — Number of time series m positive integer This property is read-only. Number of time series m, specified as a positive integer. NumSeries specifies the dimensionality of the multivariate response variable yt and innovation εt. Data Types: double P — Multivariate autoregressive polynomial order nonnegative integer This property is read-only. 12-307
12
Functions
Multivariate autoregressive polynomial order, specified as a nonnegative integer. P is the maximum lag that has a nonzero coefficient matrix. P specifies the number of presample observations required to initialize the model. Data Types: double SeriesNames — Response series names string vector | cell array of character vectors Response series names, specified as a NumSeries length string vector. The default is ['Y1' 'Y2' ... 'YNumSeries']. conjugatebvarm stores SeriesNames as a string vector. Example: ["UnemploymentRate" "CPI" "FEDFUNDS"] Data Types: string IncludeConstant — Flag for including model constant c true (default) | false Flag for including a model constant c, specified as a value in this table. Value
Description
false
Response equations do not include a model constant.
true
All response equations contain a model constant.
Data Types: logical IncludeTrend — Flag for including linear time trend term δt false (default) | true Flag for including a linear time trend term δt, specified as a value in this table. Value
Description
false
Response equations do not include a linear time trend term.
true
All response equations contain a linear time trend term.
Data Types: logical NumPredictors — Number of exogenous predictor variables in model regression component 0 (default) | nonnegative integer Number of exogenous predictor variables in the model regression component, specified as a nonnegative integer. conjugatebvarm includes all predictor variables symmetrically in each response equation. Distribution Hyperparameters
Mu — Mean of vectorized matrix normal prior on Λ zeros(NumSeries*(NumSeries*P + IncludeIntercept + IncludeTrend + NumPredictors),1) (default) | numeric vector 12-308
conjugatebvarm
Mean of the vectorized matrix normal prior on Λ, specified as a NumSeries*k-by-1 numeric vector, where k = NumSeries*P + IncludeIntercept + IncludeTrend + NumPredictors (the number of coefficients in a response equation). Mu(1:k) corresponds to all coefficients in the equation of response variable SeriesNames(1), Mu((k + 1):(2*k)) corresponds to all coefficients in the equation of response variable SeriesNames(2), and so on. For a set of indices corresponding to an equation: • Elements 1 through NumSeries correspond to the lag 1 AR coefficients of the response variables ordered by SeriesNames. • Elements NumSeries + 1 through 2*NumSeries correspond to the lag 2 AR coefficients of the response variables ordered by SeriesNames. • In general, elements (q – 1)*NumSeries + 1 through q*NumSeries corresponds to the lag q AR coefficients of the response variables ordered by SeriesNames. • If IncludeConstant is true, element NumSeries*P + 1 is the model constant. • If IncludeTrend is true, element NumSeries*P + 2 is the linear time trend coefficient. • If NumPredictors > 0, elements NumSeries*P + 3 through k constitute the vector of regression coefficients of the exogenous variables. This figure shows the structure of the transpose of Mu for a 2-D VAR(3) model that contains a constant vector and four exogenous predictors: y1, t y2, t ⨉ ⨉ ϕ ϕ ϕ ϕ ϕ ϕ c β β β β ϕ ϕ ϕ ϕ ϕ [ 1, 11 1, 12 2, 11 2, 12 3, 11 3, 12 1 11 12 13 14 1, 21 1, 22 2, 21 2, 22 3, 21 ϕ3, 22 c2 β21 β22 β23 β24
], where • ϕq,jk is element (j,k) of the lag q AR coefficient matrix. • cj is the model constant in the equation of response variable j. • Bju is the regression coefficient of the exogenous variable u in the equation of response variable j. Tip bayesvarm enables you to specify Mu easily by using the Minnesota regularization method. To specify Mu directly: 1
Set separate variables for the prior mean of each coefficient matrix and vector.
2
Horizontally concatenate all coefficient means in this order: Coef f = Φ1 Φ2 ⋯ Φp c δ Β .
3
Vectorize the transpose of the coefficient mean matrix. Mu = Coeff.'; Mu = Mu(:);
Data Types: double V — Scaled conditional covariance matrix of vectorized matrix normal prior on Λ eye(NumSeries*P + IncludeIntercept + IncludeTrend + NumPredictors) (default) | positive definite numeric matrix 12-309
12
Functions
Scaled conditional covariance matrix of vectorized matrix normal prior on Λ, specified as a k-by-k symmetric, positive definite matrix, where k = NumSeries*P + IncludeIntercept + IncludeTrend + NumPredictors (the number of coefficients in a response equation). Row and column indices correspond to all model coefficients relative to the coefficients in the equation of the first response variable y1,t (for more details, see “Algorithms” on page 12-327). • Elements 1 through NumSeries correspond to the lag 1 AR coefficients of the response variables ordered by SeriesNames. • Elements NumSeries + 1 through 2*NumSeries correspond to the lag 2 AR coefficients of the response variables ordered by SeriesNames. • In general, elements (q – 1)*NumSeries + 1 through q*NumSeries correspond to the lag q AR coefficients of the response variables ordered by SeriesNames. • Element NumSeries*P + 1 is the model constant. • Element NumSeries*P + 2 is the linear time trend coefficient. • Element NumSeries*P + 3 through k constitute the vector of regression coefficients of the exogenous variables. For example, consider a 3-D VAR(2) model containing a constant and four exogenous variables. • V(1,1) is Var(ϕ1,11), Var(ϕ1,21), and Var(ϕ1,31). • V(1,4) is Cov(ϕ1,11,ϕ2,11), Cov(ϕ1,21,ϕ2,21), and Cov(ϕ1,31,ϕ2,31). • V(8,9) is Cov(β11,β12), Cov(β21,β22), and Cov(β31,β32), which are the covariances of the regression coefficients of the first and second exogenous variables for all equations. Tip bayesvarm enables you to create any Bayesian VAR prior model and specify V easily by using the Minnesota regularization method. Data Types: double Omega — Inverse Wishart scale matrix eye(numseries) (default) | positive definite numeric matrix Inverse Wishart scale matrix, specified as a NumSeries-by-NumSeries positive definite numeric matrix. Data Types: double DoF — Inverse Wishart degrees of freedom numseries + 10 (default) | positive numeric scalar Inverse Wishart degrees of freedom, specified as a positive numeric scalar. For a proper distribution, specify a value that is greater than numseries – 1. For a distribution with a finite mean, specify a value that is greater than numseries + 1. Data Types: double VAR Model Parameters Derived from Distribution Hyperparameters
AR — Distribution mean of autoregressive coefficient matrices Φ1,…,Φp cell vector of numeric matrices 12-310
conjugatebvarm
This property is read-only. Distribution mean of the autoregressive coefficient matrices Φ1,…,Φp associated with the lagged responses, specified as a P-D cell vector of NumSeries-by-NumSeries numeric matrices. AR{j} is Φj, the coefficient matrix of lag j. Rows correspond to equations and columns correspond to lagged response variables; SeriesNames determines the order of response variables and equations. Coefficient signs are those of the VAR model expressed in difference-equation notation. If P = 0, AR is an empty cell. Otherwise, AR is the collection of AR coefficient means extracted from Mu. Data Types: cell Constant — Distribution mean of model constant c numeric vector This property is read-only. Distribution mean of the model constant c (or intercept), specified as a NumSeries-by-1 numeric vector. Constant(j) is the constant in equation j; SeriesNames determines the order of equations. If IncludeConstant = false, Constant is an empty array. Otherwise, Constant is the model constant vector mean extracted from Mu. Data Types: double Trend — Distribution mean of linear time trend δ numeric vector This property is read-only. Distribution mean of the linear time trend δ, specified as a NumSeries-by-1 numeric vector. Trend(j) is the linear time trend in equation j; SeriesNames determines the order of equations. If IncludeTrend = false (the default), Trend is an empty array. Otherwise, Trend is the linear time trend coefficient mean extracted from Mu. Data Types: double Beta — Distribution mean of regression coefficient matrix Β numeric matrix This property is read-only. Distribution mean of the regression coefficient matrix B associated with the exogenous predictor variables, specified as a NumSeries-by-NumPredictors numeric matrix. Beta(j,:) contains the regression coefficients of each predictor in the equation of response variable j yj,t. Beta(:,k) contains the regression coefficient in each equation of predictor xk. By default, all predictor variables are in the regression component of all response equations. You can down-weight a predictor from an equation by specifying, for the corresponding coefficient, a prior mean of 0 in Mu and a small variance in V. When you create a model, the predictor variables are hypothetical. You specify predictor data when you operate on the model (for example, when you estimate the posterior by using estimate). Columns of the predictor data determine the order of the columns of Beta. 12-311
12
Functions
Data Types: double Covariance — Distribution mean of innovations covariance matrix Σ positive definite numeric matrix This property is read-only. Distribution mean of the innovations covariance matrix Σ of the NumSeries innovations at each time t = 1,...,T, specified as a NumSeries-by-NumSeries positive definite numeric matrix. Rows and columns correspond to innovations in the equations of the response variables ordered by SeriesNames. Data Types: double
Object Functions estimate forecast simsmooth simulate summarize
Estimate posterior distribution of Bayesian vector autoregression (VAR) model parameters Forecast responses from Bayesian vector autoregression (VAR) model Simulation smoother of Bayesian vector autoregression (VAR) model Simulate coefficients and innovations covariance matrix of Bayesian vector autoregression (VAR) model Distribution summary statistics of Bayesian vector autoregression (VAR) model
Examples Create Matrix-Normal-Inverse-Wishart Conjugate Prior Model Consider the 3-D VAR(4) model for the US inflation (INFL), unemployment (UNRATE), and federal funds (FEDFUNDS) rates. INFLt UNRATEt FEDFUNDSt
4
=c+
∑
j=1
INFLt −
ε1, t
j
Φ j UNRATEt −
+ ε2, t .
j
FEDFUNDSt −
j
ε3, t
For all t, εt is a series of independent 3-D normal innovations with a mean of 0 and covariance Σ. Assume the following prior distributions: •
Φ1, . . . , Φ4, c ′ Σ ∼ Ν13 × 3 Μ, V, Σ , where M is a 13-by-3 matrix of means and V is the 13-by-13 among-coefficient scale matrix. Equivalently, vec Φ1, . . . , Φ4, c ′ Σ ∼ Ν39 vec Μ , Σ ⊗ V .
• Σ ∼ Inverse Wishart Ω, ν , where Ω is the 3-by-3 scale matrix and ν is the degrees of freedom. Create a conjugate prior model for the 3-D VAR(4) model parameters. numseries = 3; numlags = 4; PriorMdl = conjugatebvarm(numseries,numlags) PriorMdl = conjugatebvarm with properties: Description: "3-Dimensional VAR(4) Model" NumSeries: 3
12-312
conjugatebvarm
P: SeriesNames: IncludeConstant: IncludeTrend: NumPredictors: Mu: V: Omega: DoF: AR: Constant: Trend: Beta: Covariance:
4 ["Y1" "Y2" "Y3"] 1 0 0 [39x1 double] [13x13 double] [3x3 double] 13 {[3x3 double] [3x3 double] [3x1 double] [3x0 double] [3x0 double] [3x3 double]
[3x3 double]
[3x3 double]}
PriorMdl is a conjugatebvarm Bayesian VAR model object representing the prior distribution of the coefficients and innovations covariance of the 3-D VAR(4) model. The command line display shows properties of the model. You can display properties by using dot notation. Display the prior mean matrices of the four AR coefficients by setting each matrix in the cell to a variable. AR1 = PriorMdl.AR{1} AR1 = 3×3 0 0 0
0 0 0
0 0 0
AR2 = PriorMdl.AR{2} AR2 = 3×3 0 0 0
0 0 0
0 0 0
AR3 = PriorMdl.AR{3} AR3 = 3×3 0 0 0
0 0 0
0 0 0
AR4 = PriorMdl.AR{4} AR4 = 3×3 0 0 0
0 0 0
0 0 0
12-313
12
Functions
conjugatebvarm centers all AR coefficients at 0 by default. The AR property is read-only, but it is derived from the writeable property Mu.
Create Conjugate Bayesian AR(2) Model Consider a 1-D Bayesian AR(2) model for the daily NASDAQ returns from January 2, 1990 through December 31, 2001. yt = c + ϕ1 yt − 1 + ϕ2 yt − 1 + εt . The priors are: • [ϕ1 ϕ2 c]′ | σ2 ∼ N3 μ, σ2V , where μ is a 3-by-1 vector of coefficient means and V is a 3-by-3 scaled covariance matrix. • σ2 ∼ IG α, β , where α =
ν 2
is the degrees of freedom and β =
Ω 2
is the scale.
Create a conjugate prior model for the AR(2) model parameters. numseries = 1; numlags = 2; PriorMdl = conjugatebvarm(numseries,numlags) PriorMdl = conjugatebvarm with properties: Description: NumSeries: P: SeriesNames: IncludeConstant: IncludeTrend: NumPredictors: Mu: V: Omega: DoF: AR: Constant: Trend: Beta: Covariance:
"1-Dimensional VAR(2) Model" 1 2 "Y1" 1 0 0 [3x1 double] [3x3 double] 1 11 {[0] [0]} 0 [1x0 double] [1x0 double] 0.1111
conjugatebvarm interprets the innovations covariance matrix as an inverse Wishart random variable. Because the scales and degrees of freedom hyperparameters among the inverse Wishart and inverse gamma distributions are not equal, you can adjust them by using dot notation. For example, to achieve 10 degrees of freedom for the inverse gamma interpretation, set the inverse Wishart degrees of freedom to 20. PriorMdl.DoF = 20 PriorMdl = conjugatebvarm with properties:
12-314
conjugatebvarm
Description: NumSeries: P: SeriesNames: IncludeConstant: IncludeTrend: NumPredictors: Mu: V: Omega: DoF: AR: Constant: Trend: Beta: Covariance:
"1-Dimensional VAR(2) Model" 1 2 "Y1" 1 0 0 [3x1 double] [3x3 double] 1 20 {[0] [0]} 0 [1x0 double] [1x0 double] 0.0556
Specify High Lag Coefficient Tightness and Response Names In the 3-D VAR(4) model of “Create Matrix-Normal-Inverse-Wishart Conjugate Prior Model” on page 12-312, consider excluding lags 2 and 3 from the model. You cannot exclude coefficient matrices from models, but you can specify high prior tightness on zero for coefficients that you want to exclude. Create a conjugate prior model for the 3-D VAR(4) model parameters. Specify response variable names. By default, AR coefficient prior means are zero. Specify high tightness values for lags 2 and 3 by setting their prior variances to 1e-6. Leave all other coefficient tightness values at their defaults: • 1 for AR coefficient variances • 1e3 for constant vector variances • 0 for all coefficient covariances Also, for conjugate Bayesian VAR models only, MATLAB® assumes that coefficient variances are proportional across response equations. Therefore, specify variances relative to the first equation. numseries = 3; numlags = 4; seriesnames = ["INFL"; "UNRATE"; "FEDFUNDS"]; vPhi1 = ones(1,numseries); vPhi2 = 1e-6*ones(1,numseries); vPhi3 = 1e-6*ones(1,numseries); vPhi4 = ones(1,numseries); vc = 1e3; V = diag([vPhi1 vPhi2 vPhi3 vPhi4 vc]); PriorMdl = conjugatebvarm(numseries,numlags,'SeriesNames',seriesnames,... 'V',V) PriorMdl = conjugatebvarm with properties:
12-315
12
Functions
Description: NumSeries: P: SeriesNames: IncludeConstant: IncludeTrend: NumPredictors: Mu: V: Omega: DoF: AR: Constant: Trend: Beta: Covariance:
"3-Dimensional VAR(4) Model" 3 4 ["INFL" "UNRATE" "FEDFUNDS"] 1 0 0 [39x1 double] [13x13 double] [3x3 double] 13 {[3x3 double] [3x3 double] [3x3 double] [3x1 double] [3x0 double] [3x0 double] [3x3 double]
[3x3 double]}
Prepare Prior for Exogenous Predictor Variables Consider the 2-D VARX(1) model for the US real GDP (RGDP) and investment (GCE) rates that treats the personal consumption (PCEC) rate as exogenous: RGDPt GCEt
=c+Φ
RGDPt − 1 GCEt − 1
+ PCECt β + εt .
For all t, εt is a series of independent 2-D normal innovations with a mean of 0 and covariance Σ. Assume the following prior distributions: •
Φ c β ′ Σ ∼ Ν4 × 2 Μ, V, Σ , where M is a 4-by-2 matrix of means and V is the 4-by-4 amongcoefficient scale matrix. Equivalently, vec Φ c β ′ Σ ∼ Ν8 vec Μ , Σ ⊗ V .
• Σ ∼ Inverse Wishart Ω, ν , where Ω is the 2-by-2 scale matrix and ν is the degrees of freedom. Create a conjugate prior model for the 2-D VARX(1) model parameters. numseries = 2; numlags = 1; numpredictors = 1; PriorMdl = conjugatebvarm(numseries,numlags,'NumPredictors',numpredictors) PriorMdl = conjugatebvarm with properties: Description: NumSeries: P: SeriesNames: IncludeConstant: IncludeTrend: NumPredictors: Mu: V:
12-316
"2-Dimensional VAR(1) Model" 2 1 ["Y1" "Y2"] 1 0 1 [8x1 double] [4x4 double]
conjugatebvarm
Omega: DoF: AR: Constant: Trend: Beta: Covariance:
[2x2 double] 12 {[2x2 double]} [2x1 double] [2x0 double] [2x1 double] [2x2 double]
Display the prior mean of the coefficients Mu with the corresponding coefficients. coeffnames = ["phi(11)"; "phi(12)"; "c(1)"; "beta(1)"; "phi(21)"; "phi(22)"; "c(2)"; "beta(2)"]; array2table(PriorMdl.Mu,'VariableNames',{'PriorMean'},'RowNames',coeffnames) ans=8×1 table PriorMean _________ phi(11) phi(12) c(1) beta(1) phi(21) phi(22) c(2) beta(2)
0 0 0 0 0 0 0 0
Set Prior Hyperparameters for Minnesota Regularization conjugatebvarm options enable you to specify prior hyperparameter values directly, but bayesvarm options are well suited for tuning hyperparameters following the Minnesota regularization method. Consider the 3-D VAR(4) model of “Create Matrix-Normal-Inverse-Wishart Conjugate Prior Model” on page 12-312. The model contains 39 coefficients. For coefficient sparsity, create a conjugate Bayesian VAR model by using bayesvarm. Specify the following, a priori: • Each response is an AR(1) model, on average, with lag 1 coefficient 0.75. • Prior scaled coefficient covariances decay with increasing lag at a rate of 2 (that is, lower lags are more important than higher lags). numseries = 3; numlags = 4; PriorMdl = bayesvarm(numseries,numlags,'ModelType','conjugate',... 'Center',0.75,'Decay',2) PriorMdl = conjugatebvarm with properties: Description: NumSeries: P: SeriesNames: IncludeConstant:
"3-Dimensional VAR(4) Model" 3 4 ["Y1" "Y2" "Y3"] 1
12-317
12
Functions
IncludeTrend: NumPredictors: Mu: V: Omega: DoF: AR: Constant: Trend: Beta: Covariance:
0 0 [39x1 double] [13x13 double] [3x3 double] 13 {[3x3 double] [3x3 double] [3x1 double] [3x0 double] [3x0 double] [3x3 double]
[3x3 double]
[3x3 double]}
Display the prior coefficient means in the equation of the first response. Phi1 = PriorMdl.AR{1} Phi1 = 3×3 0.7500 0 0
0 0.7500 0
0 0 0.7500
Phi2 = PriorMdl.AR{2} Phi2 = 3×3 0 0 0
0 0 0
0 0 0
Phi3 = PriorMdl.AR{3} Phi3 = 3×3 0 0 0
0 0 0
0 0 0
Phi4 = PriorMdl.AR{4} Phi4 = 3×3 0 0 0
0 0 0
0 0 0
Display a heatmap of the prior scaled covariances of the coefficients in the first response equation.
% Create labels for the chart. numARCoeffMats = PriorMdl.NumSeries*PriorMdl.P; arcoeffnames = strings(numARCoeffMats,1); for r = numlags:-1:1 arcoeffnames(((r-1)*numseries+1):(numseries*r)) = ["\phi_{"+r+",11}" "\phi_{"+r+",12}" "\phi_ end
12-318
conjugatebvarm
heatmap(arcoeffnames,arcoeffnames,PriorMdl.V(1:end-1,1:end-1));
For conjugate Bayesian VAR models, scaled covariances are proportional among equations.
Work with Prior and Posterior Distributions Consider the 3-D VAR(4) model of “Create Matrix-Normal-Inverse-Wishart Conjugate Prior Model” on page 12-312. Estimate the posterior distribution, and generate forecasts from the corresponding posterior predictive distribution. Load and Preprocess Data Load the US macroeconomic data set. Compute the inflation rate. Plot all response series. load Data_USEconModel seriesnames = ["INFL" "UNRATE" "FEDFUNDS"]; DataTimeTable.INFL = 100*[NaN; price2ret(DataTimeTable.CPIAUCSL)]; figure plot(DataTimeTable.Time,DataTimeTable{:,seriesnames}) legend(seriesnames)
12-319
12
Functions
Stabilize the unemployment and federal funds rates by applying the first difference to each series. DataTimeTable.DUNRATE = [NaN; diff(DataTimeTable.UNRATE)]; DataTimeTable.DFEDFUNDS = [NaN; diff(DataTimeTable.FEDFUNDS)]; seriesnames(2:3) = "D" + seriesnames(2:3);
Remove all missing values from the data. rmDataTimeTable = rmmissing(DataTimeTable);
Create Prior Model Create a conjugate Bayesian VAR(4) prior model for the three response series. Specify the response variable names. numseries = numel(seriesnames); numlags = 4; PriorMdl = conjugatebvarm(numseries,numlags,'SeriesNames',seriesnames);
Estimate Posterior Distribution Estimate the posterior distribution by passing the prior model and entire data series to estimate. PosteriorMdl = estimate(PriorMdl,rmDataTimeTable{:,seriesnames},'Display','equation'); Bayesian VAR under conjugate priors Effective Sample Size: 197
12-320
conjugatebvarm
Number of equations: 3 Number of estimated Parameters: 39
VAR Equations | INFL(-1) DUNRATE(-1) DFEDFUNDS(-1) INFL(-2) DUNRATE(-2) DFEDFUNDS(-2) INFL(-3) ------------------------------------------------------------------------------------------------INFL | 0.1260 -0.4400 0.1049 0.3176 -0.0545 0.0440 0.4173 | (0.0713) (0.1395) (0.0366) (0.0810) (0.1490) (0.0386) (0.0802) DUNRATE | -0.0236 0.4440 0.0350 0.0900 0.2295 0.0520 -0.0330 | (0.0396) (0.0774) (0.0203) (0.0449) (0.0827) (0.0214) (0.0445) DFEDFUNDS | -0.1514 -1.3408 -0.2762 0.3275 -0.2971 -0.3041 0.2609 | (0.1517) (0.2967) (0.0777) (0.1722) (0.3168) (0.0820) (0.1705) Innovations Covariance Matrix | INFL DUNRATE DFEDFUNDS ------------------------------------------INFL | 0.2725 -0.0197 0.1407 | (0.0270) (0.0106) (0.0417) DUNRATE | -0.0197 0.0839 -0.1290 | (0.0106) (0.0083) (0.0242) DFEDFUNDS | 0.1407 -0.1290 1.2322 | (0.0417) (0.0242) (0.1220)
Because the prior is conjugate for the data likelihood, the posterior is a conjugatebvarm object. By default, estimate uses the first four observations as a presample to initialize the model. Generate Forecasts from Posterior Predictive Distribution From the posterior predictive distribution, generate forecasts over a two-year horizon. Because sampling from the posterior predictive distribution requires the entire data set, specify the prior model in forecast instead of the posterior. fh = 8; FY = forecast(PriorMdl,fh,rmDataTimeTable{:,seriesnames});
FY is an 8-by-3 matrix of forecasts. Plot the end of the data set and the forecasts. fp = rmDataTimeTable.Time(end) + calquarters(1:fh); figure plotdata = [rmDataTimeTable{end - 10:end,seriesnames}; FY]; plot([rmDataTimeTable.Time(end - 10:end); fp'],plotdata) hold on plot([fp(1) fp(1)],ylim,'k-.') legend(seriesnames) title('Data and Forecasts') hold off
12-321
12
Functions
Compute Impulse Responses Plot impulse response functions by passing posterior estimations to armairf. armairf(PosteriorMdl.AR,[],'InnovCov',PosteriorMdl.Covariance)
12-322
conjugatebvarm
12-323
12
Functions
12-324
conjugatebvarm
More About Bayesian Vector Autoregression (VAR) Model A Bayesian VAR model treats all coefficients and the innovations covariance matrix as random variables in the m-dimensional, stationary VARX(p) model. The model has one of the three forms described in this table. Model
Equation
Reduced-form VAR(p) in difference-equation notation
yt = Φ1 yt − 1 + ... + Φp yt − p + c + δt + Βxt + εt .
Multivariate regression
yt = Zt λ + εt .
Matrix regression
yt = Λ′zt′ + εt .
For each time t = 1,...,T: • yt is the m-dimensional observed response vector, where m = numseries. • Φ1,…,Φp are the m-by-m AR coefficient matrices of lags 1 through p, where p = numlags. • c is the m-by-1 vector of model constants if IncludeConstant is true. • δ is the m-by-1 vector of linear time trend coefficients if IncludeTrend is true. • Β is the m-by-r matrix of regression coefficients of the r-by-1 vector of observed exogenous predictors xt, where r = NumPredictors. All predictor variables appear in each equation. 12-325
12
Functions
• zt = yt′ − 1 yt′ − 2 ⋯ yt′ − p 1 t xt′ , which is a 1-by-(mp + r + 2) vector, and Zt is the m-by-m(mp + r + 2) block diagonal matrix zt 0z ⋯ 0z 0z zt ⋯ 0z ⋮ ⋮ ⋱ ⋮ 0z 0z 0z zt
,
where 0z is a 1-by-(mp + r + 2) vector of zeros. •
Λ = Φ1 Φ2 ⋯ Φp c δ Β ′, which is an (mp + r + 2)-by-m random matrix of the coefficients, and the m(mp + r + 2)-by-1 vector λ = vec(Λ).
• εt is an m-by-1 vector of random, serially uncorrelated, multivariate normal innovations with the zero vector for the mean and the m-by-m matrix Σ for the covariance. This assumption implies that the data likelihood is ℓ Λ, Σ y, x =
T
∏
t=1
f yt; Λ, Σ, zt ,
where f is the m-dimensional multivariate normal density with mean ztΛ and covariance Σ, evaluated at yt. Before considering the data, you impose a joint prior distribution assumption on (Λ,Σ), which is governed by the distribution π(Λ,Σ). In a Bayesian analysis, the distribution of the parameters is updated with information about the parameters obtained from the data likelihood. The result is the joint posterior distribution π(Λ,Σ|Y,X,Y0), where: • Y is a T-by-m matrix containing the entire response series {yt}, t = 1,…,T. • X is a T-by-m matrix containing the entire exogenous series {xt}, t = 1,…,T. • Y0 is a p-by-m matrix of presample data used to initialize the VAR model for estimation. Dependent, Matrix-Normal-Inverse-Wishart Conjugate Model The dependent, matrix-normal-inverse-Wishart conjugate model is an m-D Bayesian VAR(p) model on page 12-325 in which the conditional prior distribution of Λ|Σ is matrix normal with mean matrix Μ and scale matrices Σ and V. The prior distribution of Σ is inverse Wishart with scale matrix Ω and degrees of freedom ν. Symbolically: Λ Σ N(mp + r + 1c + 1δ) × m Μ, V, Σ Σ Inverse Wishart Ω, ν , which implies, for λ = vec(Λ), λ Σ Nm(mp + r + 1c + 1δ) μ, Σ ⊗ V , where • μ = vec(Μ) = Mu. • V = V. • r = NumPredictors. • 1c is 1 if IncludeConstant is true, and 0 otherwise. 12-326
conjugatebvarm
• 1δ is 1 if IncludeTrend is true, and 0 otherwise. To achieve posterior distributions that are conjugate for the data likelihood, the AR coefficient matrix covariances must be proportional among equations. And, for each equation, self- and cross-lag covariances must be equal. The posterior distributions are Λ Σ, yt, xt N(mp + r + 1c + 1δ) × m Μ, V, Σ Σ yt, xt Inverse Wishart Ω, ν , where: •
Μ= V
−1
−1
T
∑
+
t=1
•
T
V = V −1 +
∑
t=1
•
Ω=Ω+
T
∑
t=1
zt′zt
T
V −1Μ +
∑
t=1
zt′yt′ .
−1
zt′zt
.
yt yt′ + ΜV −1Μ − ΜV
−1
Μ.
• ν = T + ν.
Algorithms • If you pass either a conjugatebvarm or diffusebvarm object and data to estimate, MATLAB returns a conjugatebvarm object representing the posterior distribution. • The conditional covariance (unscaled) of the entire vectorized matrix normal prior is Σ⊗V. To achieve conjugacy, these conditions must be true: • Prior covariances are assumed to be proportional among all equations. Σ determines the proportionality, and scales V during posterior estimation. • For an equation, the covariances between all AR coefficients, self lag and cross lag, are equal. conjugatebvarm enforces the first condition, but not the second. Therefore, conjugatebvarm applies elements of V to all coefficients in the model relative to the coefficients in the equation of y1,t.
Version History Introduced in R2020a
References [1] Litterman, Robert B. "Forecasting with Bayesian Vector Autoregressions: Five Years of Experience." Journal of Business and Economic Statistics 4, no. 1 (January 1986): 25–38. https://doi.org/10.2307/1391384.
12-327
12
Functions
See Also Functions bayesvarm Objects semiconjugatebvarm | diffusebvarm | normalbvarm
12-328
convert2daily
convert2daily Aggregate timetable data to daily periodicity
Syntax TT2 = convert2daily(TT1) TT2 = convert2daily(TT1,Name,Value)
Description TT2 = convert2daily(TT1) aggregates data (for example, high-frequency and intra-day data) to a daily periodicity. TT2 = convert2daily(TT1,Name,Value) uses additional options specified by one or more namevalue arguments.
Examples Aggregate Timetable Data to Daily Periodicity Load the simulated stock price data and corresponding logarithmic returns in SimulatedStockSeries.mat. load SimulatedStockSeries
The timetable DataTimeTable contains measurements recorded at various, irregular times during trading hours (09:30 to 16:00) of the New York Stock Exchange (NYSE) from January 1, 2018, through December 31, 2020. For example, display the first few observations. head(DataTimeTable) Time ____________________
Price ______
Log_Return __________
01-Jan-2018 01-Jan-2018 01-Jan-2018 01-Jan-2018 02-Jan-2018 03-Jan-2018 03-Jan-2018 03-Jan-2018
100 101.14 101.5 100.15 99.72 100.11 103.96 107.05
-0.025375 0.011336 0.0035531 -0.01339 -0.0043028 0.0039033 0.037737 0.02929
11:52:48 13:23:13 14:45:09 15:30:30 10:43:37 10:02:21 11:22:37 13:42:27
DataTimeTable does not include business calendar awareness. If you want to account for nonbusiness days (weekends, holidays, and market closures) and you have a Financial Toolbox™ license, add business calendar awareness by using the addBusinessCalendar function. Aggregate the daily price series to a daily series by reporting the final price of each day. 12-329
12
Functions
DailyPrice = convert2daily(DataTimeTable(:,"Price")); tail(DailyPrice) Time ___________
Price ______
24-Dec-2020 25-Dec-2020 26-Dec-2020 27-Dec-2020 28-Dec-2020 29-Dec-2020 30-Dec-2020 31-Dec-2020
286.35 286.26 285.68 285.61 294.36 300.44 303.84 301.04
DailyPrice is a timetable containing the final prices for each reported day in DataTimeTable.
Specify Aggregation Method for Each Variable This example shows how to specify the appropriate aggregation method for the units of a variable. Load the simulated stock price data and corresponding logarithmic returns in SimulatedStockSeries.mat. load SimulatedStockSeries
The price series Price contains absolute measurements, whereas the log returns series Log_Return is the rate of change of the price series among successive observations. Because the series have different units, you must specify the appropriate method when you aggregate the series. Specifically, if you report the final price for a given periodicity, you must report the sum of the log returns within each period. Aggregate the data so that the result has a daily periodicity. For each series, specify the aggregation method that is appropriate for the unit. DailyTT = convert2daily(DataTimeTable,Aggregation=["lastvalue" "sum"]) DailyTT=1096×2 timetable Time Price ___________ ______ 01-Jan-2018 02-Jan-2018 03-Jan-2018 04-Jan-2018 05-Jan-2018 06-Jan-2018 07-Jan-2018 08-Jan-2018 09-Jan-2018 10-Jan-2018 11-Jan-2018 12-Jan-2018 13-Jan-2018 14-Jan-2018
12-330
100.15 99.72 105.57 109.01 110.69 110.48 113.83 116.41 118.54 120.46 120.87 119.91 117.38 116.04
Log_Return __________ -0.023876 -0.0043028 0.057008 0.032065 0.015294 -0.001899 0.029872 0.022412 0.018132 0.016067 0.0033978 -0.0079741 -0.021325 -0.011482
convert2daily
15-Jan-2018 16-Jan-2018 ⋮
114.72 115.28
-0.011441 0.0048696
DailyTT1 is a timetable containing the daily final prices and log returns. Verify the results for January 1, 2018, through January 3, 2018. jan42018 = datetime(2018,01,04); DataTimeTable(DataTimeTable.Time < jan42018,:) ans=9×2 timetable Time ____________________ 01-Jan-2018 01-Jan-2018 01-Jan-2018 01-Jan-2018 02-Jan-2018 03-Jan-2018 03-Jan-2018 03-Jan-2018 03-Jan-2018
11:52:48 13:23:13 14:45:09 15:30:30 10:43:37 10:02:21 11:22:37 13:42:27 14:45:20
Price ______
Log_Return __________
100 101.14 101.5 100.15 99.72 100.11 103.96 107.05 105.57
-0.025375 0.011336 0.0035531 -0.01339 -0.0043028 0.0039033 0.037737 0.02929 -0.013922
DailyTT(DailyTT.Time < jan42018,:) ans=3×2 timetable Time ___________ 01-Jan-2018 02-Jan-2018 03-Jan-2018
Price ______
Log_Return __________
100.15 99.72 105.57
-0.023876 -0.0043028 0.057008
By visual comparison, the daily final results match. Each computed daily log return is the sum of the log returns recorded during the corresponding day in the raw data. Cross-check the log returns of January 2 and 3 by computing the difference between the log final prices for each day. verify = diff(log(DailyTT.Price)); verify(1:2) ans = 2×1 -0.0043 0.0570
Input Arguments TT1 — Data to aggregate to daily periodicity timetable Data to aggregate to a daily periodicity, specified as a timetable. 12-331
12
Functions
Each variable can be a numeric vector (univariate series) or numeric matrix (multivariate series). Note • NaNs indicate missing values. • Timestamps must be in ascending or descending order.
By default, all days are business days. If your timetable does not account for nonbusiness days (weekends, holidays, and market closures), add business calendar awareness by using addBusinessCalendar first. For example, the following command adds business calendar logic to include only NYSE business days. TT = addBusinessCalendar(TT);
Data Types: timetable Name-Value Pair Arguments Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter. Example: TT2 = convert2daily(TT1,'Aggregation',["lastvalue" "sum"]) Aggregation — Intra-day aggregation method for data in TT1 "lastvalue" (default) | "sum" | "prod" | "mean" | "min" | "max" | "firstvalue" | character vector | function handle | string vector | cell vector of character vectors or function handles Intra-day aggregation method for TT1 defining how data is aggregated over business days, specified as one of the following methods, a string vector of methods, or a length numVariables cell vector of methods, where numVariables is the number of variables in TT1. • "sum" — Sum the values in each year or day. • "mean" — Calculate the mean of the values in each year or day. • "prod" — Calculate the product of the values in each year or day. • "min" — Calculate the minimum of the values in each year or day. • "max" — Calculate the maximum of the values in each year or day. • "firstvalue" — Use the first value in each year or day. • "lastvalue" — Use the last value in each year or day. • @customfcn — A custom aggregation method that accepts a timetable and returns a numeric scalar (for univariate series) or row vector (for multivariate series). The function must accept empty inputs []. If you specify a single method, convert2daily applies the specified method to all time series in TT1. If you specify a string vector or cell vector aggregation, convert2daily applies aggregation(j) to TT1(:,j); convert2daily applies each aggregation method one at a time (for more details, see retime). For example, consider a daily timetable representing TT1 with three variables. Time ____________________
12-332
AAA ______
BBB ______
CCC ________________
convert2daily
01-Jan-2018 01-Jan-2018 02-Jan-2018 02-Jan-2018 02-Jan-2018 02-Jan-2018 03-Jan-2018 03-Jan-2018 03-Jan-2018 03-Jan-2018 04-Jan-2018 04-Jan-2018 04-Jan-2018 05-Jan-2018 05-Jan-2018
09:45:47 12:48:09 10:27:32 12:46:09 14:14:13 15:52:31 09:47:11 11:24:23 14:41:17 16:00:00 09:55:51 10:07:12 14:26:23 13:13:12 14:57:53
100.00 100.03 100.07 100.08 100.25 100.19 100.54 100.59 101.40 101.94 102.53 103.35 103.40 103.91 103.89
200.00 200.06 200.14 200.16 200.50 200.38 201.08 201.18 202.80 203.88 205.06 206.70 206.80 207.82 207.78
300.00 300.09 300.21 300.24 300.75 300.57 301.62 301.77 304.20 305.82 307.59 310.05 310.20 311.73 311.67
400.00 400.12 400.28 400.32 401.00 400.76 402.16 402.36 405.60 407.76 410.12 413.40 413.60 415.64 415.56
The corresponding default daily results representing TT2 (where the 'lastvalue' is reported for each day) are as follows. Time ___________ 01-Jan-2018 02-Jan-2018 03-Jan-2018 04-Jan-2018 05-Jan-2018
AAA ______ 100.03 100.19 101.94 103.40 103.89
BBB ______ 200.06 200.38 203.88 206.80 207.78
CCC ________________ 300.09 400.12 300.57 400.76 305.82 407.76 310.20 413.60 311.67 415.56
All methods omit missing data (NaNs) in direct aggregation calculations on each variable. However, for situations in which missing values appear in the first row of TT1, missing values can also appear in the aggregated results TT2. To address missing data, write and specify a custom aggregation method (function handle) that supports missing data. Data Types: char | string | cell | function_handle
Output Arguments TT2 — Daily data timetable Daily data, returned as a timetable. The time arrangement of TT1 and TT2 are the same. If a variable of TT1 has no records for a business day within the sampling time span, convert2daily returns a NaN for that variable and business day in TT2. The first date in TT2 is the first business date on or after the first date in TT1. The last date in TT2 is the last business date on or before the last date in TT1.
Version History Introduced in R2021a
See Also convert2weekly | convert2monthly | convert2quarterly | convert2semiannual | convert2annual | timetable | addBusinessCalendar 12-333
12
Functions
Topics “Resample and Aggregate Data in Timetable” “Combine Timetables and Synchronize Their Data” “Retime and Synchronize Timetable Variables Using Different Methods”
12-334
convert2weekly
convert2weekly Aggregate timetable data to weekly periodicity
Syntax TT2 = convert2weekly(TT1) TT2 = convert2weekly( ___ ,Name,Value)
Description TT2 = convert2weekly(TT1) aggregates data (for example, data recorded daily) to a weekly periodicity. TT2 = convert2weekly( ___ ,Name,Value) uses additional options specified by one or more name-value arguments.
Examples Aggregate Timetable Data to Weekly Periodicity Load the simulated stock price data and corresponding logarithmic returns in SimulatedStockSeries.mat. load SimulatedStockSeries
The timetable DataTimeTable contains measurements recorded at various, irregular times during trading hours (09:30 to 16:00) of the New York Stock Exchange (NYSE) from January 1, 2018, through December 31, 2020. For example, display the first few observations. head(DataTimeTable) Time ____________________
Price ______
Log_Return __________
01-Jan-2018 01-Jan-2018 01-Jan-2018 01-Jan-2018 02-Jan-2018 03-Jan-2018 03-Jan-2018 03-Jan-2018
100 101.14 101.5 100.15 99.72 100.11 103.96 107.05
-0.025375 0.011336 0.0035531 -0.01339 -0.0043028 0.0039033 0.037737 0.02929
11:52:48 13:23:13 14:45:09 15:30:30 10:43:37 10:02:21 11:22:37 13:42:27
DataTimeTable does not include business calendar awareness. If you want to account for nonbusiness days (weekends, holidays, and market closures) and you have a Financial Toolbox™ license, add business calendar awareness by using the addBusinessCalendar function. Aggregate the price series to a weekly series by reporting the final price in each week. 12-335
12
Functions
WeeklyPrice = convert2weekly(DataTimeTable(:,"Price"));
WeeklyPrice is a timetable containing the final prices for each reported week in DataTimeTable.
Specify Aggregation Method for Each Variable This example shows how to specify the appropriate aggregation method for the units of a variable. It also shows how to use convert2weekly to aggregate both intra-day data and aggregated daily data, which result in equivalent weekly aggregates. Load the simulated stock price data and corresponding logarithmic returns in SimulatedStockSeries.mat. load SimulatedStockSeries
The price series Price contains absolute measurements, whereas the log returns series Log_Return is the rate of change of the price series among successive observations. Because the series have different units, you must specify the appropriate method when you aggregate the series. Specifically, if you report the final price for a given periodicity, you must report the sum of the log returns within each period. To understand how to maintain consistency among aggregation methods, use two approaches to aggregate DataTimeTable so that the result has a weekly periodicity. 1
Pass DataTimeTable directly to convert2weekly.
2
Aggregate DataTimeTable so that the result has a daily periodicity by using convert2daily, then pass the result to convert2weekly.
In both cases, specify reporting the last price and the sum of the log returns for each period. Directly aggregate the data so that the result has a weekly periodicity. For each series, specify the aggregation method that is appropriate for the unit. aggmethods = ["lastvalue" "sum"]; WeeklyTT1 = convert2weekly(DataTimeTable,Aggregation=aggmethods) WeeklyTT1=157×2 timetable Time Price ___________ ______ 05-Jan-2018 12-Jan-2018 19-Jan-2018 26-Jan-2018 02-Feb-2018 09-Feb-2018 16-Feb-2018 23-Feb-2018 02-Mar-2018 09-Mar-2018 16-Mar-2018 23-Mar-2018 30-Mar-2018 06-Apr-2018
12-336
110.69 119.91 116.6 118.51 120.03 117.07 117.06 116.72 109.98 110.27 107.35 112.78 110.27 105.27
Log_Return ___________ 0.076188 0.080008 -0.027992 0.016248 0.012744 -0.02497 -8.5423e-05 -0.0029087 -0.059479 0.0026334 -0.026837 0.049344 -0.022507 -0.046403
convert2weekly
13-Apr-2018 20-Apr-2018 ⋮
106.01 107.93
0.007005 0.017949
WeeklyTT1 is a timetable containing the weekly data. Price is a series of the final stock prices for each week, and Log_Return is the sum of the log returns for each week. Aggregate the data in two steps: aggregate the data so that the result has a daily periodicity, then aggregate the daily data to weekly data. For each series, specify the aggregation method that is appropriate for the unit. DailyTT = convert2daily(DataTimeTable,Aggregation=aggmethods); tail(DailyTT) Time ___________
Price ______
Log_Return ___________
24-Dec-2020 25-Dec-2020 26-Dec-2020 27-Dec-2020 28-Dec-2020 29-Dec-2020 30-Dec-2020 31-Dec-2020
286.35 286.26 285.68 285.61 294.36 300.44 303.84 301.04
-0.0067521 -0.00031435 -0.0020282 -0.00024506 0.030176 0.020445 0.011253 -0.0092581
WeeklyTT2 = convert2weekly(DailyTT,Aggregation=aggmethods) WeeklyTT2=157×2 timetable Time Price ___________ ______ 05-Jan-2018 12-Jan-2018 19-Jan-2018 26-Jan-2018 02-Feb-2018 09-Feb-2018 16-Feb-2018 23-Feb-2018 02-Mar-2018 09-Mar-2018 16-Mar-2018 23-Mar-2018 30-Mar-2018 06-Apr-2018 13-Apr-2018 20-Apr-2018 ⋮
110.69 119.91 116.6 118.51 120.03 117.07 117.06 116.72 109.98 110.27 107.35 112.78 110.27 105.27 106.01 107.93
Log_Return ___________ 0.076188 0.080008 -0.027992 0.016248 0.012744 -0.02497 -8.5423e-05 -0.0029087 -0.059479 0.0026334 -0.026837 0.049344 -0.022507 -0.046403 0.007005 0.017949
DailyTT is a timetable with daily periodicity. Price is a series of the final stock prices for each day, and Log_Return is the sum of the log returns for each day. WeeklyTT1 and WeeklyTT2 are equal. convert2weekly reports results on Fridays by default. For weeks during which Friday is not a trading day in the NYSE, the function reports results on the previous business day. You can use the 12-337
12
Functions
name-value argument EndOfWeekDay to specify a different day of the week that ends business weeks.
Input Arguments TT1 — Data to aggregate to weekly periodicity timetable Data to aggregate to a weekly periodicity, specified as a timetable. Each variable can be a numeric vector (univariate series) or numeric matrix (multivariate series). Note • NaNs indicate missing values. • Timestamps must be in ascending or descending order.
By default, all days are business days. If your timetable does not account for nonbusiness days (weekends, holidays, and market closures), add business calendar awareness by using addBusinessCalendar first. For example, the following command adds business calendar logic to include only NYSE business days. TT = addBusinessCalendar(TT);
Data Types: timetable Name-Value Pair Arguments Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter. Example: TT2 = convert2weekly(TT1,'Aggregation',["lastvalue" "sum"]) Aggregation — Aggregation method for TT1 data for intra-week or inter-day aggregation "lastvalue" (default) | "sum" | "prod" | "mean" | "min" | "max" | "firstvalue" | character vector | function handle | string vector | cell vector of character vectors or function handles Aggregation method for TT1 defining how data is aggregated over business days in an intra-week or inter-day periodicity, specified as one of the following methods, a string vector of methods, or a length numVariables cell vector of methods, where numVariables is the number of variables in TT1. • "sum" — Sum the values in each year or day. • "mean" — Calculate the mean of the values in each year or day. • "prod" — Calculate the product of the values in each year or day. • "min" — Calculate the minimum of the values in each year or day. • "max" — Calculate the maximum of the values in each year or day. • "firstvalue" — Use the first value in each year or day. • "lastvalue" — Use the last value in each year or day. 12-338
convert2weekly
• @customfcn — A custom aggregation method that accepts a table variable and returns a numeric scalar (for univariate series) or row vector (for multivariate series). The function must accept empty inputs []. If you specify a single method, convert2weekly applies the specified method to all time series in TT1. If you specify a string vector or cell vector aggregation, convert2weekly applies aggregation(j) to TT1(:,j); convert2weekly applies each aggregation method one at a time (for more details, see retime). For example, consider a daily timetable representing TT1 with three variables. Time ___________ 01-Jan-2018 02-Jan-2018 03-Jan-2018 04-Jan-2018 05-Jan-2018 06-Jan-2018 07-Jan-2018 08-Jan-2018 09-Jan-2018 10-Jan-2018 11-Jan-2018 12-Jan-2018 13-Jan-2018 14-Jan-2018 15-Jan-2018 16-Jan-2018 17-Jan-2018 18-Jan-2018 19-Jan-2018
AAA ______ 100.00 100.03 100.07 100.08 100.25 100.19 100.54 100.59 101.40 101.94 102.53 103.35 103.40 103.91 103.89 104.44 104.44 104.04 104.94
BBB ______ 200.00 200.06 200.14 200.16 200.50 200.38 201.08 201.18 202.80 203.88 205.06 206.70 206.80 207.82 207.78 208.88 208.88 208.08 209.88
CCC ________________ 300.00 400.00 300.09 400.12 300.21 400.28 300.24 400.32 300.75 401.00 300.57 400.76 301.62 402.16 301.77 402.36 304.20 405.60 305.82 407.76 307.59 410.12 310.05 413.40 310.20 413.60 311.73 415.64 311.67 415.56 313.32 417.76 313.32 417.76 312.12 416.16 314.82 419.76
The corresponding default weekly results representing TT2 (in which all days are business days and the 'lastvalue' is reported on Fridays) are as follows. Time ___________ 05-Jan-2018 12-Jan-2018 19-Jan-2018
AAA ______ 100.25 103.35 104.94
BBB ______ 200.50 206.70 209.88
CCC ________________ 300.75 401.00 310.05 413.40 314.82 419.76
The default 'lastvalue' returns the latest observed value in a given week for all variables in TT1. All methods omit missing data (NaNs) in direct aggregation calculations on each variable. However, for situations in which missing values appear in the first row of TT1, missing values can also appear in the aggregated results TT2. To address missing data, write and specify a custom aggregation method (function handle) that supports missing data. Data Types: char | string | cell | function_handle Daily — Intra-day aggregation method for TT1 "lastvalue" (default) | "sum" | "prod" | "mean" | "min" | "max" | "firstvalue" | character vector | function handle | string vector | cell vector of character vectors or function handles Intra-day aggregation method for TT1, specified as an aggregation method, a string vector of methods, or a length numVariables cell vector of methods. For more details on supported methods and behaviors, see the 'Aggregation' name-value argument. 12-339
12
Functions
Data Types: char | string | cell | function_handle EndOfWeekDay — Day of week that ends business weeks "Friday" (weeks end on Friday) (default) | scalar integer with value 1 through 7 | "Sunday" | "Monday" | "Tuesday" | "Wednesday" | "Thursday" | "Friday" | "Saturday" | character vector Day of the week that ends business weeks, specified as a value in the table. Value
Day Ending Each Week
"Sunday" or 1
Sunday
"Monday" or 2
Monday
"Tuesday" or 3
Tuesday
"Wednesday" or 4
Wednesday
"Thursday" or 5
Thursday
"Friday" or 6
Friday
"Saturday" or 7
Saturday
If the specified end-of-week day in a given week is not a business day, the preceding business day ends that week. Data Types: double | char | string
Output Arguments TT2 — Weekly data timetable Weekly data, returned as a timetable. The time arrangement of TT1 and TT2 are the same. If a variable of TT1 has no business-day records during an annual period within the sampling time span, convert2weekly returns a NaN for that variable and annual period in TT2. If the first week (week1) of TT1 contains at least one business day, the first date in TT2 is the last business date of week1. Otherwise, the first date in TT2 is the next end-of-week business date of TT1. If the last week (weekT) of TT1 contains at least one business day, the last date in TT2 is the last business date of weekT. Otherwise, the last date in TT2 is the previous end-of-week business date of TT1.
Version History Introduced in R2021a
See Also convert2daily | convert2monthly | convert2quarterly | convert2semiannual | convert2annual | timetable | addBusinessCalendar Topics “Resample and Aggregate Data in Timetable” “Combine Timetables and Synchronize Their Data” 12-340
convert2weekly
“Retime and Synchronize Timetable Variables Using Different Methods”
12-341
12
Functions
convert2monthly Aggregate timetable data to monthly periodicity
Syntax TT2 = convert2monthly(TT1) TT2 = convert2monthly(TT1,Name,Value)
Description TT2 = convert2monthly(TT1) aggregates data (for example, data recorded daily or weekly) to monthly periodicity. TT2 = convert2monthly(TT1,Name,Value) uses additional options specified by one or more name-value arguments.
Examples Aggregate Timetable Data to Monthly Periodicity Load the simulated stock price data and corresponding logarithmic returns in SimulatedStockSeries.mat. load SimulatedStockSeries
The timetable DataTimeTable contains measurements recorded at various, irregular times during trading hours (09:30 to 16:00) of the New York Stock Exchange (NYSE) from January 1, 2018, through December 31, 2020. For example, display the first few observations. head(DataTimeTable) Time ____________________
Price ______
Log_Return __________
01-Jan-2018 01-Jan-2018 01-Jan-2018 01-Jan-2018 02-Jan-2018 03-Jan-2018 03-Jan-2018 03-Jan-2018
100 101.14 101.5 100.15 99.72 100.11 103.96 107.05
-0.025375 0.011336 0.0035531 -0.01339 -0.0043028 0.0039033 0.037737 0.02929
11:52:48 13:23:13 14:45:09 15:30:30 10:43:37 10:02:21 11:22:37 13:42:27
DataTimeTable does not include business calendar awareness. If you want to account for nonbusiness days (weekends, holidays, and market closures) and you have a Financial Toolbox™ license, add business calendar awareness by using the addBusinessCalendar function. Aggregate the price series to a monthly series by reporting the final price in each month. 12-342
convert2monthly
MonthlyPrice = convert2monthly(DataTimeTable(:,"Price")); tail(MonthlyPrice) Time ___________
Price ______
31-May-2020 30-Jun-2020 31-Jul-2020 31-Aug-2020 30-Sep-2020 31-Oct-2020 30-Nov-2020 31-Dec-2020
227.22 224.29 236.4 227.5 246.77 275.07 298.87 301.04
MonthlyPrice is a timetable containing the final prices for each reported month in DataTimeTable.
Use Custom Aggregation Method to Convert Daily Data to Monthly Periodicity You can apply custom aggregation methods using function handles. Specify a function handle to aggregate related variables in a timetable while maintaining consistency between aggregated results when converting from a daily to a monthly periodicity. Load the simulated stock price data and corresponding logarithmic returns in SimulatedStockSeries.mat. load SimulatedStockSeries
Include another variable in the data called Simple_Return, which contains the simple (proportional) returns associated with the price series, and examine the first few rows. DataTimeTable.Simple_Return = exp(DataTimeTable.Log_Return) - 1; head(DataTimeTable) Time ____________________
Price ______
Log_Return __________
01-Jan-2018 01-Jan-2018 01-Jan-2018 01-Jan-2018 02-Jan-2018 03-Jan-2018 03-Jan-2018 03-Jan-2018
100 101.14 101.5 100.15 99.72 100.11 103.96 107.05
-0.025375 0.011336 0.0035531 -0.01339 -0.0043028 0.0039033 0.037737 0.02929
11:52:48 13:23:13 14:45:09 15:30:30 10:43:37 10:02:21 11:22:37 13:42:27
% Log returns to simple returns
Simple_Return _____________ -0.025056 0.0114 0.0035594 -0.0133 -0.0042936 0.003911 0.038458 0.029723
The price series Price contains absolute measurements, whereas the log and simple returns series, Log_Return and Simple_Return, are the rates of change of the price series among successive observations. Because the series have different units, you must specify the appropriate method when you aggregate the series. Specifically, if you report the final price for a given periodicity, you must report the sum of the log returns within each period and a custom transformation for simple returns. Create a function to aggregate simple returns. 12-343
12
Functions
f = @(x)(prod(1 + x,1,'omitnan') - 1);
Aggregate the data so that the result has an monthly periodicity. For each series, specify the aggregation method that is appropriate for the unit. TT = convert2monthly(DataTimeTable,Aggregation={'lastvalue' 'sum' f}); head(TT) Time ___________
Price ______
Log_Return __________
31-Jan-2018 28-Feb-2018 31-Mar-2018 30-Apr-2018 31-May-2018 30-Jun-2018 31-Jul-2018 31-Aug-2018
117.35 113.52 110.74 105.58 97.88 99.29 102.72 124.99
0.13462 -0.033182 -0.024794 -0.047716 -0.075727 0.014303 0.033962 0.19623
Simple_Return _____________ 0.1441 -0.032637 -0.024489 -0.046596 -0.07293 0.014405 0.034545 0.2168
The aggregation function for simple returns operates along the first dimension (row) and omits missing data (NaNs). For more information on custom aggregation functions, see timetable and retime.
Input Arguments TT1 — Data to aggregate to monthly periodicity timetable Data to aggregate to a monthly periodicity, specified as a timetable. Each variable can be a numeric vector (univariate series) or numeric matrix (multivariate series). Note • NaNs indicate missing values. • Timestamps must be in ascending or descending order.
By default, all days are business days. If your timetable does not account for nonbusiness days (weekends, holidays, and market closures), add business calendar awareness by using addBusinessCalendar first. For example, the following command adds business calendar logic to include only NYSE business days. TT = addBusinessCalendar(TT);
Data Types: timetable Name-Value Pair Arguments Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter. 12-344
convert2monthly
Example: TT2 = convert2monthly(TT1,'Aggregation',["lastvalue" "sum"]) Aggregation — Aggregation method for TT1 "lastvalue" (default) | "sum" | "prod" | "mean" | "min" | "max" | "firstvalue" | character vector | function handle | string vector | cell vector of character vectors or function handles Aggregation method for TT1 defining how to aggregate data over business days in an intra-month or inter-day periodicity, specified as one of the following methods, a string vector of methods, or a length numVariables cell vector of methods, where numVariables is the number of variables in TT1. • "sum" — Sum the values in each year or day. • "mean" — Calculate the mean of the values in each year or day. • "prod" — Calculate the product of the values in each year or day. • "min" — Calculate the minimum of the values in each year or day. • "max" — Calculate the maximum of the values in each year or day. • "firstvalue" — Use the first value in each year or day. • "lastvalue" — Use the last value in each year or day. • @customfcn — A custom aggregation method that accepts a table variable and returns a numeric scalar (for univariate series) or row vector (for multivariate series). The function must accept empty inputs []. If you specify a single method, convert2monthly applies the specified method to all time series in TT1. If you specify a string vector or cell vector aggregation, convert2monthly applies aggregation(j) to TT1(:,j); convert2monthly applies each aggregation method one at a time (for more details, see retime). For example, consider a daily timetable representing TT1 with three variables. Time ___________ 01-Jan-2018 02-Jan-2018 03-Jan-2018 . . . 31-Jan-2018 . . . 28-Feb-2018 . . . 31-Mar-2018 . . . 30-Apr-2018 . . . 31-May-2018 .
AAA ______ 100.00 100.03 100.07 . . . 114.65 . . . 129.19 . . . 162.93 . . . 171.72 . . . 201.24 .
BBB ______ 200.00 200.06 200.14 . . . 229.3 . . . 258.38 . . . 325.86 . . . 343.44 . . . 402.48 .
CCC ________________ 300.00 400.00 300.09 400.12 300.21 400.28 . . . . . . 343.95 458.60 . . . . . . 387.57 516.76 . . . . . . 488.79 651.72 . . . . . . 515.16 686.88 . . . . . . 603.72 804.96 . .
12-345
12
Functions
. . 30-Jun-2018
. . 223.22
. . 446.44
. . 669.66
. . 892.88
The corresponding default monthly results representing TT2 (in which all days are business days and the 'lastvalue' is reported on the last business day of each month) are as follows. Time ___________ 31-Jan-2018 28-Feb-2018 31-Mar-2018 30-Apr-2018 31-May-2018 30-Jun-2018
AAA ______ 114.65 129.19 162.93 171.72 201.24 223.22
BBB ______ 229.30 258.38 325.86 343.44 402.48 446.44
CCC ________________ 343.95 458.60 387.57 516.76 488.79 651.72 515.16 686.88 603.72 804.96 669.66 892.88
All methods omit missing data (NaNs) in direct aggregation calculations on each variable. However, for situations in which missing values appear in the first row of TT1, missing values can also appear in the aggregated results TT2. To address missing data, write and specify a custom aggregation method (function handle) that supports missing data. Data Types: char | string | cell | function_handle Daily — Intra-day aggregation method for TT1 "lastvalue" (default) | "sum" | "prod" | "mean" | "min" | "max" | "firstvalue" | character vector | function handle | string vector | cell vector of character vectors or function handles Intra-day aggregation method for TT1, specified as an aggregation method, a string vector of methods, or a length numVariables cell vector of methods. For more details on supported methods and behaviors, see the 'Aggregation' name-value argument. Data Types: char | string | cell | function_handle EndOfMonthDay — Day of the month that ends months last business day of month (default) | integer with value 1 to 31 Day of the month that ends months, specified as a scalar integer with value 1 to 31. For months with fewer days than EndOfMonthDay, convert2monthly reports aggregation results on the last business day of the month. Data Types: double
Output Arguments TT2 — Monthly data timetable Monthly data, returned as a timetable. The time arrangement of TT1 and TT2 are the same. If a variable of TT1 has no business-day records during a month within the sampling time span, convert2monthly returns a NaN for that variable and month in TT2. If the first month (month1) of TT1 contains at least one business day, the first date in TT2 is the last business date of month1. Otherwise, the first date in TT2 is the next end-of-month business date of TT1. 12-346
convert2monthly
If the last month (monthT) of TT1 contains at least one business day, the last date in TT2 is the last business date of monthT. Otherwise, the last date in TT2 is the previous end-of-month business date of TT1.
Version History Introduced in R2021a
See Also convert2daily | convert2weekly | convert2quarterly | convert2semiannual | convert2annual | addBusinessCalendar | timetable Topics “Resample and Aggregate Data in Timetable” “Combine Timetables and Synchronize Their Data” “Retime and Synchronize Timetable Variables Using Different Methods”
12-347
12
Functions
convert2quarterly Aggregate timetable data to quarterly periodicity
Syntax TT2 = convert2quarterly(TT1) TT2 = convert2quarterly(TT1,Name,Value)
Description TT2 = convert2quarterly(TT1) aggregates data (for example, data recorded daily or weekly) to a quarterly periodicity. TT2 = convert2quarterly(TT1,Name,Value) uses additional options specified by one or more name-value arguments.
Examples Aggregate Timetable Data to Quarterly Periodicity Load the simulated stock price data and corresponding logarithmic returns in SimulatedStockSeries.mat. load SimulatedStockSeries
The timetable DataTimeTable contains measurements recorded at various, irregular times during trading hours (09:30 to 16:00) of the New York Stock Exchange (NYSE) from January 1, 2018, through December 31, 2020. For example, display the first few observations. head(DataTimeTable) Time ____________________
Price ______
Log_Return __________
01-Jan-2018 01-Jan-2018 01-Jan-2018 01-Jan-2018 02-Jan-2018 03-Jan-2018 03-Jan-2018 03-Jan-2018
100 101.14 101.5 100.15 99.72 100.11 103.96 107.05
-0.025375 0.011336 0.0035531 -0.01339 -0.0043028 0.0039033 0.037737 0.02929
11:52:48 13:23:13 14:45:09 15:30:30 10:43:37 10:02:21 11:22:37 13:42:27
DataTimeTable does not include business calendar awareness. If you want to account for nonbusiness days (weekends, holidays, and market closures) and you have a Financial Toolbox™ license, add business calendar awareness by using the addBusinessCalendar function. Aggregate the price series to a quarterly series by reporting the final price in each quarter. 12-348
convert2quarterly
QuarterlyPrice = convert2quarterly(DataTimeTable(:,"Price"));
QuarterlyPrice is a timetable containing the final prices for each reported quarter in DataTimeTable.
Specify Aggregation Method for Each Variable This example shows how to specify the appropriate aggregation method for the units of a variable. It also shows how to use convert2quarterly to aggregate both intra-day data and aggregated monthly data, which result in equivalent quarterly aggregates. Load the simulated stock price data and corresponding logarithmic returns in SimulatedStockSeries.mat. load SimulatedStockSeries
The price series Price contains absolute measurements, whereas the log returns series Log_Return is the rate of change of the price series among successive observations. Because the series have different units, you must specify the appropriate method when you aggregate the series. Specifically, if you report the final price for a given periodicity, you must report the sum of the log returns within each period. To understand how to maintain consistency among aggregation methods, use two approaches to aggregate DataTimeTable so that the result has a quarterly periodicity. 1
Pass DataTimeTable directly to convert2quarterly.
2
Aggregate DataTimeTable so that the result has a monthly periodicity by using convert2monthly, and then pass the result to convert2quarterly.
In both cases, specify reporting the last price and the sum of the log returns for each period. Directly aggregate the data so that the result has a quarterly periodicity. For each series, specify the aggregation method that is appropriate for the unit. aggmethods = ["lastvalue" "sum"]; QuarterlyTT1 = convert2quarterly(DataTimeTable,Aggregation=aggmethods); tail(QuarterlyTT1) Time ___________
Price ______
Log_Return __________
31-Mar-2019 30-Jun-2019 30-Sep-2019 31-Dec-2019 31-Mar-2020 30-Jun-2020 30-Sep-2020 31-Dec-2020
112.93 169.77 148.97 153.22 229.88 224.29 246.77 301.04
0.29286 0.40768 -0.1307 0.02813 0.40568 -0.024618 0.095517 0.19879
QuarterlyTT1 is a timetable containing the annual data. Price is a series of the final stock prices for each year, and Log_Return is the sum of the log returns for each quarter. 12-349
12
Functions
Aggregate the data in two steps: aggregate the data so that the result has a monthly periodicity, then aggregate the monthly data to quarterly data. For each series, specify the aggregation method that is appropriate for the unit. MonthlyTT = convert2monthly(DataTimeTable,Aggregation=aggmethods); tail(MonthlyTT) Time ___________
Price ______
Log_Return __________
31-May-2020 30-Jun-2020 31-Jul-2020 31-Aug-2020 30-Sep-2020 31-Oct-2020 30-Nov-2020 31-Dec-2020
227.22 224.29 236.4 227.5 246.77 275.07 298.87 301.04
-0.029872 -0.012979 0.052585 -0.038375 0.081306 0.10857 0.082983 0.0072345
QuarterlyTT2 = convert2quarterly(MonthlyTT,Aggregation=aggmethods); tail(QuarterlyTT2) Time ___________
Price ______
Log_Return __________
31-Mar-2019 30-Jun-2019 30-Sep-2019 31-Dec-2019 31-Mar-2020 30-Jun-2020 30-Sep-2020 31-Dec-2020
112.93 169.77 148.97 153.22 229.88 224.29 246.77 301.04
0.29286 0.40768 -0.1307 0.02813 0.40568 -0.024618 0.095517 0.19879
MonthlyTT is a timetable with monthly periodicity. Price is a series of the final stock prices for each month, and Log_Return is the sum of the log returns for each month. QuarterlyTT1 and QuarterlyTT2 are equal.
Input Arguments TT1 — Data to aggregate to quarterly periodicity timetable Data to aggregate to a quarterly periodicity, specified as a timetable. Each variable can be a numeric vector (univariate series) or numeric matrix (multivariate series). Note • NaNs indicate missing values. • Timestamps must be in ascending or descending order.
12-350
convert2quarterly
By default, all days are business days. If your timetable does not account for nonbusiness days (weekends, holidays, and market closures), add business calendar awareness by using addBusinessCalendar first. For example, the following command adds business calendar logic to include only NYSE business days. TT = addBusinessCalendar(TT);
Data Types: timetable Name-Value Pair Arguments Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter. Example: TT2 = convert2quarterly(TT1,'Aggregation',["lastvalue" "sum"]) Aggregation — Aggregation method for TT1 data for intra-quarter or inter-day aggregation "lastvalue" (default) | "sum" | "prod" | "mean" | "min" | "max" | "firstvalue" | character vector | function handle | string vector | cell vector of character vectors or function handles Aggregation method for TT1 data defining how to aggregate data over business days in an intraquarter or inter-day periodicity, specified as one of the following methods, a string vector of methods, or a length numVariables cell vector of methods, where numVariables is the number of variables in TT1. • "sum" — Sum the values in each year or day. • "mean" — Calculate the mean of the values in each year or day. • "prod" — Calculate the product of the values in each year or day. • "min" — Calculate the minimum of the values in each year or day. • "max" — Calculate the maximum of the values in each year or day. • "firstvalue" — Use the first value in each year or day. • "lastvalue" — Use the last value in each year or day. • @customfcn — A custom aggregation method that accepts a table variable and returns a numeric scalar (for univariate series) or row vector (for multivariate series). The function must accept empty inputs []. If you specify a single method, convert2quarterly applies the specified method to all time series in TT1. If you specify a string vector or cell vector aggregation, convert2quarterly applies aggregation(j) to TT1(:,j); convert2quarterly applies each aggregation method one at a time (for more details, see retime). For example, consider a daily timetable representing TT1 with three variables. Time ___________ 01-Jan-2018 02-Jan-2018 03-Jan-2018 . . . 31-Mar-2018 .
AAA ______ 100.00 100.03 100.07 . . . 162.93 .
BBB ______ 200.00 200.06 200.14 . . . 325.86 .
CCC ________________ 300.00 400.00 300.09 400.12 300.21 400.28 . . . . . . 488.79 651.72 . .
12-351
12
Functions
. . 30-Jun-2018 . . . 30-Sep-2018 . . . 31-Dec-2018
. . 223.22 . . . 232.17 . . . 243.17
. . 446.44 . . . 464.34 . . . 486.34
. . 669.66 . . . 696.51 . . . 729.51
. . 892.88 . . . 928.68 . . . 972.68
The corresponding default quarterly results representing TT2 (in which all days are business days and the 'lastvalue' is reported on the last business day of each quarter) are as follows. Time ___________ 31-Mar-2018 30-Jun-2018 30-Sep-2018 31-Dec-2018
AAA ______ 162.93 223.22 232.17 243.17
BBB ______ 325.86 446.44 464.34 486.34
CCC ________________ 488.79 651.72 669.66 892.88 696.51 928.68 729.51 972.68
All methods omit missing data (NaNs) in direct aggregation calculations on each variable. However, for situations in which missing values appear in the first row of TT1, missing values can also appear in the aggregated results TT2. To address missing data, write and specify a custom aggregation method (function handle) that supports missing data. Data Types: char | string | cell | function_handle Daily — Intra-day aggregation method for TT1 "lastvalue" (default) | "sum" | "prod" | "mean" | "min" | "max" | "firstvalue" | character vector | function handle | string vector | cell vector of character vectors or function handles Intra-day aggregation method for TT1, specified as an aggregation method, a string vector of methods, or a length numVariables cell vector of methods. For more details on supported methods and behaviors, see the 'Aggregation' name-value argument. Data Types: char | string | cell | function_handle
Output Arguments TT2 — Quarterly data timetable Quarterly data, returned as a timetable. The time arrangement of TT1 and TT2 are the same. convert2quarterly reports quarterly aggregation results on the last business day of March, June, September, and December. If a variable of TT1 has no business-day records during a quarter within the sampling time span, convert2quarterly returns a NaN for that variable and quarter in TT2. If the first quarter (Q1) of TT1 contains at least one business day, the first date in TT2 is the last business date of Q1. Otherwise, the first date in TT2 is the next end-of-quarter business date of TT1.
12-352
convert2quarterly
If the last quarter (QT) of TT1 contains at least one business day, the last date in TT2 is the last business date of QT. Otherwise, the last date in TT2 is the previous end-of-quarter business date of TT1.
Version History Introduced in R2021a
See Also convert2daily | convert2weekly | convert2monthly | convert2semiannual | convert2annual | timetable | addBusinessCalendar Topics “Resample and Aggregate Data in Timetable” “Combine Timetables and Synchronize Their Data” “Retime and Synchronize Timetable Variables Using Different Methods”
12-353
12
Functions
convert2semiannual Aggregate timetable data to semiannual periodicity
Syntax TT2 = convert2semiannual(TT1) TT2 = convert2semiannual(TT1,Name,Value)
Description TT2 = convert2semiannual(TT1) aggregates data (for example, data recorded daily or weekly) to a semiannual periodicity. TT2 = convert2semiannual(TT1,Name,Value) uses additional options specified by one or more name-value arguments.
Examples Aggregate Timetable Data to Semiannual Periodicity Load the simulated stock price data and corresponding logarithmic returns in SimulatedStockSeries.mat. load SimulatedStockSeries
The timetable DataTimeTable contains measurements recorded at various, irregular times during trading hours (09:30 to 16:00) of the New York Stock Exchange (NYSE) from January 1, 2018, through December 31, 2020. For example, display the first few observations. head(DataTimeTable) Time ____________________
Price ______
Log_Return __________
01-Jan-2018 01-Jan-2018 01-Jan-2018 01-Jan-2018 02-Jan-2018 03-Jan-2018 03-Jan-2018 03-Jan-2018
100 101.14 101.5 100.15 99.72 100.11 103.96 107.05
-0.025375 0.011336 0.0035531 -0.01339 -0.0043028 0.0039033 0.037737 0.02929
11:52:48 13:23:13 14:45:09 15:30:30 10:43:37 10:02:21 11:22:37 13:42:27
DataTimeTable does not include business calendar awareness. If you want to account for nonbusiness days (weekends, holidays, and market closures) and you have a Financial Toolbox™ license, add business calendar awareness by using the addBusinessCalendar function. Aggregate the price series to a semiannual series by reporting the final price of each January-to-June period and July-to-December period. 12-354
convert2semiannual
SemiannualPrice = convert2semiannual(DataTimeTable(:,"Price")); tail(SemiannualPrice) Time ___________
Price ______
30-Jun-2018 31-Dec-2018 30-Jun-2019 31-Dec-2019 30-Jun-2020 31-Dec-2020
99.29 84.26 169.77 153.22 224.29 301.04
SemiannualPrice is a timetable containing the final prices for each reported semiannual period in DataTimeTable.
Specify Aggregation Method for Each Variable This example shows how to specify the appropriate aggregation method for the units of a variable. It also shows how to use convert2semiannual to aggregate both intra-day data and aggregated quarterly data, which result in equivalent semiannual aggregates. Load the simulated stock price data and corresponding logarithmic returns in SimulatedStockSeries.mat. load SimulatedStockSeries
The price series Price contains absolute measurements, whereas the log returns series Log_Return is the rate of change of the price series among successive observations. Because the series have different units, you must specify the appropriate method when you aggregate the series. Specifically, if you report the final price for a given periodicity, you must report the sum of the log returns within each period. To understand how to maintain consistency among aggregation methods, use two approaches to aggregate DataTimeTable so that the result has a semiannual periodicity. 1
Pass DataTimeTable directly to convert2semiannual.
2
Aggregate DataTimeTable so that the result has a quarterly periodicity by using convert2quarterly, and then pass the result to convert2semiannual.
In both cases, specify reporting the last price and the sum of the log returns for each period. Directly aggregate the data so that the result has a semiannual periodicity. For each series, specify the aggregation method that is appropriate for the unit. aggmethods = ["lastvalue" "sum"]; SemiannualTT1 = convert2semiannual(DataTimeTable,Aggregation=aggmethods); tail(SemiannualTT1) Time ___________
Price ______
30-Jun-2018 31-Dec-2018
99.29 84.26
Log_Return __________ -0.032501 -0.16414
12-355
12
Functions
30-Jun-2019 31-Dec-2019 30-Jun-2020 31-Dec-2020
169.77 153.22 224.29 301.04
0.70054 -0.10257 0.38107 0.2943
SemiannualTT1 is a timetable containing the semiannual data. Price is a series of the final stock prices for each January-to-June period and July-to-December period, and Log_Return is the sum of the log returns for each semiannual period. Aggregate the data in two steps: aggregate the data so that the result has a quarterly periodicity, then aggregate the quarterly data to semiannual data. For each series, specify the aggregation method that is appropriate for the unit. QuarterlyTT = convert2quarterly(DataTimeTable,Aggregation=aggmethods); tail(QuarterlyTT) Time ___________
Price ______
Log_Return __________
31-Mar-2019 30-Jun-2019 30-Sep-2019 31-Dec-2019 31-Mar-2020 30-Jun-2020 30-Sep-2020 31-Dec-2020
112.93 169.77 148.97 153.22 229.88 224.29 246.77 301.04
0.29286 0.40768 -0.1307 0.02813 0.40568 -0.024618 0.095517 0.19879
SemiannualTT2 = convert2semiannual(QuarterlyTT,Aggregation=aggmethods) SemiannualTT2=6×2 timetable Time Price ___________ ______ 30-Jun-2018 31-Dec-2018 30-Jun-2019 31-Dec-2019 30-Jun-2020 31-Dec-2020
99.29 84.26 169.77 153.22 224.29 301.04
Log_Return __________ -0.032501 -0.16414 0.70054 -0.10257 0.38107 0.2943
QuarterlyTT is a timetable with quarterly periodicity. Price is a series of the final stock prices for each quarter, and Log_Return is the sum of the log returns for each quarter. SemiannualTT1 and SemiannualTT2 are equal.
Input Arguments TT1 — Data to aggregate to semiannual periodicity timetable Data to aggregate to a semiannual periodicity, specified as a timetable. Each variable can be a numeric vector (univariate series) or numeric matrix (multivariate series). 12-356
convert2semiannual
Note • NaNs indicate missing values. • Timestamps must be in ascending or descending order.
By default, all days are business days. If your timetable does not account for nonbusiness days (weekends, holidays, and market closures), add business calendar awareness by using addBusinessCalendar first. For example, the following command adds business calendar logic to include only NYSE business days. TT = addBusinessCalendar(TT);
Data Types: timetable Name-Value Pair Arguments Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter. Example: TT2 = convert2semiannual(TT1,'Aggregation',["lastvalue" "sum"]) Aggregation — Aggregation method for semiannual period to semiannual periodicity (inter-day aggregation) "lastvalue" (default) | "sum" | "prod" | "mean" | "min" | "max" | "firstvalue" | character vector | function handle | string vector | cell vector of character vectors or function handles Aggregation method for TT1 defining how data is aggregated over business days in a semiannual period to semiannual periodicity (inter-day aggregation), specified as one of the following methods, a string vector of methods, or a length numVariables cell vector of methods, where numVariables is the number of variables in TT1. • "sum" — Sum the values in each year or day. • "mean" — Calculate the mean of the values in each year or day. • "prod" — Calculate the product of the values in each year or day. • "min" — Calculate the minimum of the values in each year or day. • "max" — Calculate the maximum of the values in each year or day. • "firstvalue" — Use the first value in each year or day. • "lastvalue" — Use the last value in each year or day. • @customfcn — A custom aggregation method that accepts a table variable and returns a numeric scalar (for univariate series) or row vector (for multivariate series). The function must accept empty inputs []. If you specify a single method, convert2semiannual applies the specified method to all time series in TT1. If you specify a string vector or cell vector aggregation, convert2semiannual applies aggregation(j) to TT1(:,j); convert2semiannual applies each aggregation method one at a time (for more details, see retime). For example, consider a daily timetable representing TT1 with three variables. Time ___________
AAA ______
BBB ______
CCC _________________
12-357
12
Functions
01-Jan-2018 02-Jan-2018 03-Jan-2018 . . . 28-Jun-2018 29-Jun-2018 30-Jun-2018 01-Jul-2018 02-Jul-2018 03-Jul-2018 . . . 29-Dec-2018 30-Dec-2018 31-Dec-2018
100.00 100.02 99.96 . . . 69.63 70.15 75.77 75.68 71.34 69.25 . . . 249.16 250.21 256.75
200.00 200.04 199.92 . . . 139.26 140.3 151.54 151.36 142.68 138.50 . . . 498.32 500.42 513.50
300.00 300.06 299.88 . . . 208.89 210.45 227.31 227.04 214.02 207.75 . . . 747.48 750.63 770.25
400.00 400.08 399.84 . . . 278.52 280.60 303.08 302.72 285.36 277.00 . . . 996.64 1000.84 1027.00
The corresponding default semiannual results representing TT2 (in which all days are business days and the 'lastvalue' is reported on the last business day of each semiannual period) are as follows. Time ___________ 30-Jun-2018 31-Dec-2018
AAA ______ 75.77 256.75
BBB ______ 151.54 513.50
CCC ________________ 227.31 303.08 770.25 1027.00
All methods omit missing data (NaNs) in direct aggregation calculations on each variable. However, for situations in which missing values appear in the first row of TT1, missing values can also appear in the aggregated results TT2. To address missing data, write and specify a custom aggregation method (function handle) that supports missing data. Data Types: char | string | cell | function_handle Daily — Intra-day aggregation method for TT1 "lastvalue" (default) | "sum" | "prod" | "mean" | "min" | "max" | "firstvalue" | character vector | function handle | string vector | cell vector of character vectors or function handles Intra-day aggregation method for TT1, specified as an aggregation method, a string vector of methods, or a length numVariables cell vector of methods. For more details on supported methods and behaviors, see the 'Aggregation' name-value argument. Data Types: char | string | cell | function_handle
Output Arguments TT2 — Semiannual data timetable Semiannual data, returned as a timetable. convert2semiannual reports semiannual aggregation results on the last business day of June and December. The time arrangement of TT1 and TT2 are the same. If a variable of TT1 has no business-day records during an annual period within the sampling time span, convert2semiannual returns a NaN for that variable and annual period in TT2. 12-358
convert2semiannual
The first date in TT2 is the last business date of the semiannual period in which the first date in TT1 occurs, provided TT1 has business dates in that semiannual period. Otherwise the first date in TT2 is the next end-of-semiannual-period business date. The last date in TT2 is the last business date of the semiannual period in which the last date in TT1 occurs, provided TT1 has business dates in that semiannual period. Otherwise the last date in TT2 is the previous end-of-semiannual-period business date.
Version History Introduced in R2021a
See Also convert2daily | convert2weekly | convert2monthly | convert2quarterly | convert2annual | timetable | addBusinessCalendar Topics “Resample and Aggregate Data in Timetable” “Combine Timetables and Synchronize Their Data” “Retime and Synchronize Timetable Variables Using Different Methods”
12-359
12
Functions
convert2annual Aggregate timetable data to annual periodicity
Syntax TT2 = convert2annual(TT1) TT2 = convert2annual(TT1,Name,Value)
Description TT2 = convert2annual(TT1) aggregates data (for example, recorded daily or weekly data) to annual periodicity. TT2 = convert2annual(TT1,Name,Value) uses additional options specified by one or more name-value arguments.
Examples Aggregate Timetable Data to Annual Periodicity Load the simulated stock price data and corresponding logarithmic returns in SimulatedStockSeries.mat. load SimulatedStockSeries
The timetable DataTimeTable contains measurements recorded at various, irregular times during trading hours (09:30 to 16:00) of the New York Stock Exchange (NYSE) from January 1, 2018, through December 31, 2020. For example, display the first few observations. head(DataTimeTable) Time ____________________
Price ______
Log_Return __________
01-Jan-2018 01-Jan-2018 01-Jan-2018 01-Jan-2018 02-Jan-2018 03-Jan-2018 03-Jan-2018 03-Jan-2018
100 101.14 101.5 100.15 99.72 100.11 103.96 107.05
-0.025375 0.011336 0.0035531 -0.01339 -0.0043028 0.0039033 0.037737 0.02929
11:52:48 13:23:13 14:45:09 15:30:30 10:43:37 10:02:21 11:22:37 13:42:27
DataTimeTable does not include business calendar awareness. If you want to account for nonbusiness days (weekends, holidays, and market closures) and you have a Financial Toolbox™ license, add business calendar awareness by using the addBusinessCalendar function. Aggregate the price series to an annual series by reporting the final price in each year. 12-360
convert2annual
AnnualPrice = convert2annual(DataTimeTable(:,"Price"));
AnnualPrice is a timetable containing the final prices for each reported year in DataTimeTable.
Specify Aggregation Method for Each Variable This example shows how to specify the appropriate aggregation method for the units of a variable. It also shows how to use convert2annual to aggregate both intra-day data and aggregated intra-dayto-monthly data, which result in equivalent annual aggregates. Load the simulated stock price data and corresponding logarithmic returns in SimulatedStockSeries.mat. load SimulatedStockSeries
The price series Price contains absolute measurements, whereas the log returns series Log_Return is the rate of change of the price series among successive observations. Because the series have different units, you must specify the appropriate method when you aggregate the series. Specifically, if you report the final price for a given periodicity, you must report the sum of the log returns within each period. To understand how convert2annual maintains consistency among aggregation methods, use two approaches to aggregate DataTimeTable so that the result has an annual periodicity. 1
Pass DataTimeTable directly to convert2annual.
2
Aggregate DataTimeTable so that the result has a monthly periodicity by using convert2monthly, and then pass the result to convert2annual.
In both cases, specify reporting the last price and the sum of the log returns for each period. Directly aggregate the data so that the result has an annual periodicity. For each series, specify the aggregation method that is appropriate for the unit. aggmethods = ["lastvalue" "sum"]; AnnualTT1 = convert2annual(DataTimeTable,Aggregation=aggmethods) AnnualTT1=3×2 timetable Time Price ___________ ______ 31-Dec-2018 31-Dec-2019 31-Dec-2020
84.26 153.22 301.04
Log_Return __________ -0.19664 0.59797 0.67537
AnnualTT1 is a timetable containing the annual data. Price is a series of the final stock prices for each year, and Log_Return is the sum of the log returns for each year. Aggregate the data in two steps: aggregate the data so that the result has a monthly periodicity, then aggregate the monthly data to annual data. For each series, specify the aggregation method that is appropriate for the unit. MonthlyTT = convert2monthly(DataTimeTable,Aggregation=aggmethods); tail(MonthlyTT)
12-361
12
Functions
Time ___________
Price ______
Log_Return __________
31-May-2020 30-Jun-2020 31-Jul-2020 31-Aug-2020 30-Sep-2020 31-Oct-2020 30-Nov-2020 31-Dec-2020
227.22 224.29 236.4 227.5 246.77 275.07 298.87 301.04
-0.029872 -0.012979 0.052585 -0.038375 0.081306 0.10857 0.082983 0.0072345
AnnualTT2 = convert2annual(MonthlyTT,Aggregation=aggmethods) AnnualTT2=3×2 timetable Time Price ___________ ______ 31-Dec-2018 31-Dec-2019 31-Dec-2020
84.26 153.22 301.04
Log_Return __________ -0.19664 0.59797 0.67537
MonthlyTT is a timetable with monthly periodicity. Price is a series of the final stock prices for each month, and Log_Return is the sum of the log returns for each month. AnnualTT1 and AnnualTT2 are equal.
Input Arguments TT1 — Data to aggregate to annual periodicity timetable Data to aggregate to an annual periodicity, specified as a timetable. Each variable can be a numeric vector (univariate series) or numeric matrix (multivariate series). Note • NaNs indicate missing values. • Timestamps must be in ascending or descending order.
By default, all days are business days. If your timetable does not account for nonbusiness days (weekends, holidays, and market closures), add business calendar awareness by using addBusinessCalendar first. For example, the following command adds business calendar logic to include only NYSE business days. TT = addBusinessCalendar(TT);
Data Types: timetable
12-362
convert2annual
Name-Value Pair Arguments Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter. Example: TT2 = convert2annual(TT1,'Aggregation',["lastvalue" "sum"]) Aggregation — Aggregation method for TT1 "lastvalue" (default) | "sum" | "prod" | "mean" | "min" | "max" | "firstvalue" | character vector | function handle | string vector | cell vector of character vectors or function handles Aggregation method for TT1 defining how to aggregate data over business days in a year to an annual periodicity, specified as one of the following methods, a string vector of methods, or a length numVariables cell vector of methods, where numVariables is the number of variables in TT1. • "sum" — Sum the values in each year or day. • "mean" — Calculate the mean of the values in each year or day. • "prod" — Calculate the product of the values in each year or day. • "min" — Calculate the minimum of the values in each year or day. • "max" — Calculate the maximum of the values in each year or day. • "firstvalue" — Use the first value in each year or day. • "lastvalue" — Use the last value in each year or day. • @customfcn — A custom aggregation method that accepts a table variable and returns a numeric scalar (for univariate series) or row vector (for multivariate series). The function must accept empty inputs []. If you specify a single method, convert2annual applies the specified method to all time series in TT1. If you specify a string vector or cell vector aggregation, convert2annual applies aggregation(j) to TT1(:,j); convert2annual applies each aggregation method one at a time (for more details, see retime). For example, consider an input daily timetable with three variables. Time ___________ 01-Jan-2018 02-Jan-2018 03-Jan-2018 . . . 29-Dec-2018 30-Dec-2018 31-Dec-2018
AAA ______ 100.00 100.03 100.07 . . . 249.16 250.21 256.75
BBB ______ 200.00 200.06 200.14 . . . 498.32 500.42 513.50
CCC ________________ 300.00 400.00 300.09 400.12 300.21 400.28 . . . . . . 747.48 996.64 750.63 1000.84 770.25 1027.00
By default, convert2annual applies the aggregation method "lastvalue", which reports for each variable the values of the last business day of each year. The aggregated annual results are as follows: TT2 = convert2annual(TT1) TT2 = 1×3 timetable
12-363
12
Functions
Time ___________ 31-Dec-2018
AAA ______ 256.75
BBB ______ 513.50
CCC ________________ 770.25 1027.00
All methods omit missing data (NaNs) in direct aggregation calculations on each variable. However, for situations in which missing values appear in the first row of TT1, missing values can also appear in the aggregated results TT2. To address missing data, write and specify a custom aggregation method (function handle) that supports missing data. Data Types: char | string | cell | function_handle Daily — Intra-day aggregation method for TT1 "lastvalue" (default) | "sum" | "prod" | "mean" | "min" | "max" | "firstvalue" | character vector | function handle | string vector | cell vector of character vectors or function handles Intra-day aggregation method for TT1, specified as an aggregation method, a string vector of methods, or a length numVariables cell vector of methods. For more details on supported methods and behaviors, see the 'Aggregation' name-value argument. Data Types: char | string | cell | function_handle EndOfYearMonth — Month that ends annual periods "December" (weeks end on Friday) (default) | integer with value 1 to 12 | "January" | "February" | "March" | "April" | "May" | "June" | "July" | "August" | "September" | "October" | "November" | "December" | character vector Month that ends annual periods, specified as a value in this table. Value
Month Ending Each Year
"January" or 1
January
"February" or 2
February
"March" or 3
March
"April" or 4
April
"May" or 5
May
"June" or 6
June
"July" or 7
July
"August" or 8
August
"September" or 9
September
"October" or 10
October
"November" or 11
November
"December" or 12
December
Data Types: double | char | string
Output Arguments TT2 — Annual data timetable Annual data, returned as a timetable. The time arrangement of TT1 and TT2 are the same. 12-364
convert2annual
If a variable of TT1 has no business-day records during an annual period within the sampling time span, convert2annual returns a NaN for that variable and annual period in TT2. If the first annual period (year1) of TT1 contains at least one business day, the first date in TT2 is the last business date of year1. Otherwise, the first date in TT2 is the next end-of-year-period business date of TT1. If the last annual period (yearT) of TT1 contains at least one business day, the last date in TT2 is the last business date of yearT. Otherwise, the last date in TT2 is the previous end-of-year-period business date of TT1.
Version History Introduced in R2021a
See Also convert2daily | convert2weekly | convert2monthly | convert2quarterly | convert2semiannual | timetable | addBusinessCalendar Topics “Resample and Aggregate Data in Timetable” “Combine Timetables and Synchronize Their Data” “Retime and Synchronize Timetable Variables Using Different Methods”
12-365
12
Functions
corr Model-implied temporal correlations of state-space model
Syntax Cyy = corr(Mdl) Cyy = corr(Mdl,Name,Value) [Cyy,Cxx,Cyx] = corr( ___ ) [Cyy,Cxx,Cyx] = corr( ___ ,'Params',estParams)
Description The corr function returns model-implied temporal correlations and covariances on page 12-379 of the state or measurement variables in a stationary, time-invariant state-space model. To determine whether the model captures characteristics present in the data, You can compare model-implied associations of present and lagged variables to sample analogues. Other state-space model tools to characterize the dynamics of a specified system include the following: • The impulse response function (IRF), computed by irf and plotted by irfplot, traces the effects of a shock to a state disturbance on the measurement variables in the system. • The forecast error variance decomposition (FEVD), computed by fevd, provides information about the relative importance of each state disturbance in affecting the forecast error variance of all measurement variables in the system. Fully Specified State-Space Model
Cyy = corr(Mdl) returns Corr(yt,yt – 1), the model-implied temporal correlation of each measurement variable of the fully specified, standard, stationary state-space model Mdl. Cyy = corr(Mdl,Name,Value) uses additional options specified by one or more name-value arguments. For example, 'Covariance',true,'NumLags',10 specifies returning temporal covariances Cov(yt,yt – h), h = 0 through 10. [Cyy,Cxx,Cyx] = corr( ___ ) also returns Corr(xt,xt – h), the correlations between the state variables and their self-lags Cxx, and Corr(yt,xt – h), the correlations between the state variables and their self lags Cxx and the cross-correlations between the measurement variables and lags of the state variables Cyx using any of the input argument combinations in the previous syntaxes. h is the value of the NumLags name-value argument. corr returns covariances when the value of the Covariance name-value argument is true. Partially Specified State-Space Model
[Cyy,Cxx,Cyx] = corr( ___ ,'Params',estParams) uses the partially specified, standard state-space model Mdl and substitutes the parameter estimates estParams for all unknown parameters in the model.
Examples
12-366
corr
Temporal Correlations of Measurement Variables Explicitly create the state-space model x1, t = 0 . 9x1, t − 1 + 0 . 2u1, t x2, t = 0 . 1x1, t − 1 + 0 . 3x2, t − 1 + u2, t y1, t = x1, t y2, t = x1, t + x2, t . A = B = C = Mdl
[0.9 0; 0.1 0.3]; [0.2 0; 0 1]; [1 0; 1 1]; = ssm(A,B,C,'StateType',[0 0])
Mdl = State-space model type: ssm State vector length: 2 Observation vector length: 2 State disturbance vector length: 2 Observation innovation vector length: 0 Sample size supported by model: Unlimited State variables: x1, x2,... State disturbances: u1, u2,... Observation series: y1, y2,... Observation innovations: e1, e2,... State equations: x1(t) = (0.90)x1(t-1) + (0.20)u1(t) x2(t) = (0.10)x1(t-1) + (0.30)x2(t-1) + u2(t) Observation equations: y1(t) = x1(t) y2(t) = x1(t) + x2(t) Initial state distribution: Initial state means x1 x2 0 0 Initial state covariance matrix x1 x2 x1 0.21 0.03 x2 0.03 1.10 State types x1 Stationary
x2 Stationary
Mdl is an ssm model object. Because all parameters have known values, the object is fully specified. Compute the temporal correlations of the measurement variables through lag 1. Cyy = corr(Mdl)
12-367
12
Functions
Cyy = Cyy(:,:,1) = 1.0000 0.9000
0.4411 0.4072
Cyy(:,:,2) = 0.4411 0.3970
1.0000 0.4212
Rows correspond to lags, columns correspond to the latest observation of the measurement variable in the correlation, and pages correspond to the lagged measurement variable. For example, Corr y1, t y2, t − 1 is 0.3970. Display a heatmap of the temporal correlations between the latest observation of each measurement variable and all lags of measurement variable 1. Corryy1 = Cyy(:,:,1); hm = heatmap(Corryy1); ylabel('h'); hm.YDisplayLabels = ["0" "1"]; xlabel('i') hm.XDisplayLabels = ["1" "2"]; title('Corr(y_{i,t},y_{1,t - h})')
12-368
corr
Display a heatmap of the temporal correlations between the current observation of measurement variable 1 and all lags of all measurement variables. Corry1y = squeeze(Cyy(:,1,:)); hm = heatmap(Corry1y); ylabel('h'); hm.YDisplayLabels = ["0" "1"]; xlabel('j') hm.XDisplayLabels = ["1" "2"]; title('Corr(y_{1,t},y_{j,t - h})')
Display a heatmap of the temporal correlations between the current observation of all measurement variables and the first lag of all measurement variables. Corryylag1 = squeeze(Cyy(2,:,:)); hm = heatmap(Corryylag1); ylabel('i'); hm.YDisplayLabels = ["1" "2"]; xlabel('j') hm.XDisplayLabels = ["1" "2"]; title('Corr(y_{i,t},y_{j,t - 1})')
12-369
12
Functions
Specify Number of Periods Explicitly create the state-space model x1, t = 0 . 9x1, t − 1 + 0 . 2u1, t x2, t = 0 . 1x1, t − 1 − 0 . 3x2, t − 1 + u2, t y1, t = x1, t + ε1, t y2, t = x1, t + x2, t + ε2, t . A = B = C = D = Mdl
[0.9 0; 0.1 -0.3]; [0.2 0; 0 1]; [1 0; 1 1]; eye(2); = ssm(A,B,C,D,'StateType',[0 0]);
Mdl is an ssm model object. Compute the temporal correlations of the measurement variables from lag 0 through 20. numlags = 20; Cyy = corr(Mdl,'NumLags',numlags);
12-370
corr
Cyy is a 21-by-2-by-2 array representing the 20-period temporal correlations of the measurement variables. Display Cyy(:,2,2), which is the model-implied autocorrelation of y2, t. acfy2 = Cyy(:,2,2) acfy2 = 21×1 1.0000 -0.0466 0.1267 0.0634 0.0723 0.0605 0.0558 0.0498 0.0449 0.0404 ⋮
Generate a random path of measurements of length 200 from the model. rng(1); % For reproducibility Y = simulate(Mdl,200);
Compute the sample autocorrelation function (ACF) of each variable for 2q0 lags. sacfy1 = autocorr(Y(:,1),'NumLags',numlags); sacfy2 = autocorr(Y(:,2),'NumLags',numlags);
Visually compare the model-implied and sample ACF of each measurement variable. acfy1 = Cyy(:,1,1); plot([acfy1 sacfy1]) xticklabels(0:numlags) ylabel("Autocorrelation") xlabel("Lags") legend(["ACF(y_{1,t})" "Sample ACF(y_{1,t})"]) title("ACF(y_{1,t})") axis tight
12-371
12
Functions
plot([acfy2 sacfy2]) xticklabels(0:numlags) ylabel("Autocorrelation") xlabel("Lags") legend(["ACF(y_{2,t})" "Sample ACF(y_{2,t})"]) title("ACF(y_{2,t})") axis tight
12-372
corr
State- and Cross-Variable Temporal Correlations Explicitly create the state-space model x1, t = 0 . 9x1, t − 1 + 0 . 2u1, t x2, t = 0 . 1x1, t − 1 − 0 . 3x2, t − 1 + u2, t y1, t = x1, t + ε1, t y2, t = x1, t + x2, t + ε2, t . A = B = C = D = Mdl
[0.9 0; 0.1 -0.3]; [0.2 0; 0 1]; [1 0; 1 1]; eye(2); = ssm(A,B,C,D,'StateType',[0 0]);
Mdl is an ssm model object. Compute the temporal correlations of the measurement and state variables, as well as their crosscorrelations. [Cyy,Cxx,Cyx] = corr(Mdl);
Each output variable is a 2-by-2-by-2 array containing temporal correlations from lag 0 to 1. Cyy contains the correlations between the measurement variables, Cxx contains the correlations among 12-373
12
Functions
the state variables, and Cyx contains cross-correlations between the current observation of the measurement variables and lagged state variables. Plot a heatmap of the correlations between x1, t and the lags of all state variables. Cx1x = squeeze(Cxx(:,1,:)); hm = heatmap(Cx1x); ylabel('h'); hm.YDisplayLabels = ["0" "1"]; xlabel('j') hm.XDisplayLabels = ["1" "2"]; title('Corr(x_{1,t},x_{j,t - h})')
Plot a heatmap of the cross-correlations between all measurement variables and the lags of x2, t. Cyx2 = Cyx(:,:,2); hm = heatmap(Cyx2); ylabel('h'); hm.YDisplayLabels = ["0" "1"]; xlabel('i') hm.XDisplayLabels = ["1" "2"]; title('Corr(y_{i,t},x_{2,t - h})')
12-374
corr
Temporal Covariances of Estimated Model Simulate data from a known model, fit a model to the data, and then compare sample and modelimplied covariances. Simulate Data Explicitly create the state-space model x1, t = 0 . 9x1, t − 1 + 0 . 2u1, t x2, t = 0 . 1x1, t − 1 − 0 . 3x2, t − 1 + u2, t y1, t = x1, t + ε1, t y2, t = x1, t + x2, t + ε2, t . ADGP = [0.9 0; 0.1 -0.3]; BDGP = [0.2 0; 0 1]; CDGP = [1 0; 1 1]; DDGP = eye(2); DGP = ssm(ADGP,BDGP,CDGP,DDGP,'StateType',[0 0]);
Generate a random path of measurements of length 500 from the model.
12-375
12
Functions
rng(1); % For reproducibility numobs = 500; Y = simulate(DGP,numobs);
Fit Model to Data Create a state-space model template to fit to the data by replacing each nonzero state parameter of the data-generating process with a NaN value. A = [NaN 0; NaN NaN]; B = [NaN 0; 0 NaN]; Mdl = ssm(A,B,CDGP,DDGP,'StateType',[0
0]);
Fit the model template to the data. Specify a random set of positive starting values. Return the vector of estimated parameters. [~,estParams] = estimate(Mdl,Y,abs(rand(5,1))); Method: Maximum likelihood (fminunc) Sample size: 500 Logarithmic likelihood: -1694.08 Akaike info criterion: 3398.15 Bayesian info criterion: 3419.23 | Coeff Std Err t Stat Prob --------------------------------------------------c(1) | 0.91506 0.04229 21.63951 0 c(2) | -0.25898 0.25406 -1.01934 0.30805 c(3) | -0.15383 0.08243 -1.86621 0.06201 c(4) | -0.16808 0.04926 -3.41221 0.00064 c(5) | 1.19275 0.06842 17.43153 0 | | Final State Std Dev t Stat Prob x(1) | -0.12293 0.30568 -0.40217 0.68756 x(2) | -0.80608 0.79263 -1.01697 0.30917
Compute Covariances Compute model-implied temporal covariances of the measurement variables by passing the statespace model template and estimated parameters to corr. Return the covariances instead of the correlations. Covyy = corr(Mdl,'Params',estParams,'Covariance',true) Covyy = Covyy(:,:,1) = 1.1737 0.1589
0.1376 0.1195
Covyy(:,:,2) = 0.1376 0.1259
2.5676 -0.1297
Covyy is a 2-by-2-by-2 array of temporal covariances of the measurement variables. Compute the sample covariances of the measurement variables and their first lags. 12-376
corr
AugData = lagmatrix(Y,[0 1]); SCovyy = cov(AugData(2:end,:));
Compare Covariances Compare the model-implied temporal covariances and the sample covariances. names = ["y_1t" "y_2t" "y_1t-1" "y_2t-1"]; Covy1y = squeeze(Covyy(:,1,:))'; Covy2y = squeeze(Covyy(:,2,:))'; CovyLag1 = [Covy1y(:,2) Covy2y(:,2) Covy1y(:,1) Covy2y(:,1)]'; ModelCovariances = array2table([Covy1y(:) Covy2y(:) CovyLag1],'RowNames',names,... 'VariableNames',names) ModelCovariances=4×4 table y_1t y_2t _______ ________ y_1t y_2t y_1t-1 y_2t-1
1.1737 0.13759 0.15891 0.1259
0.13759 2.5676 0.11949 -0.12972
y_1t-1 _______
y_2t-1 ________
0.15891 0.11949 1.1737 0.13759
0.1259 -0.12972 0.13759 2.5676
SampleCovariances = array2table(SCovyy,'RowNames',names,'VariableNames',names) SampleCovariances=4×4 table y_1t y_2t ________ ________ y_1t y_2t y_1t-1 y_2t-1
1.2459 0.22058 0.11689 0.070475
0.22058 2.5332 0.074687 -0.17466
y_1t-1 ________
y_2t-1 ________
0.11689 0.074687 1.2437 0.22158
0.070475 -0.17466 0.22158 2.5419
The model-implied and sample covariances appear to be similar in magnitude. Note that the covariances are invariant to the reference time; for example, Cov(y1, t, y2, t) = Cov(y1, t − 1, y2, t − 1).
Input Arguments Mdl — Standard, stationary state-space model ssm model object Standard, stationary state-space model, specified as an ssm model object returned by ssm or its estimate function. • Temporal moments are well defined for stationary states. Therefore, corr issues an error if one or more of the following conditions apply: • At least one state is nonstationary (Mdl.StateType contains at least one value of 2). • At least one coefficient is time varying. • Either the measurement or state variable is dimension varying.
12-377
12
Functions
• If Mdl is partially specified (that is, it contains unknown parameters), specify estimates of the unknown parameters by using the 'Params' name-value argument. Otherwise, corr issues an error. • The initial covariance matrix Mdl.Cov0 is implied by the transition equation. Therefore, corr ignores Mdl.Cov0 and values corresponding to Cov0 in the value of Params. Name-Value Pair Arguments Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter. Before R2021a, use commas to separate each name and value, and enclose Name in quotes. Example: 'Covariance',true,'NumLags',10 specifies returning Cov(yt,yt – 10), covariances of selfand cross-lags of the measurement variables from lags 0 through 10. NumLags — Maximum number of state or measurement variable lags 1 (default) | nonnegative integer Maximum number of state or measurement variable lags to include in the computation, specified as a nonnegative integer. corr returns associations from lags 0 through NumLags. Example: 'NumLags',10 Data Types: double Params — Estimates of unknown parameters numeric vector Estimates of the unknown parameters in the partially specified state-space model Mdl, specified as a numeric vector. If Mdl is partially specified (contains unknown parameters specified by NaNs), you must specify Params. The estimate function returns parameter estimates of Mdl in the appropriate form. However, you can supply custom estimates by arranging the elements of Params as follows: • If Mdl is an explicitly created model (Mdl.ParamMap is empty []), arrange the elements of Params to correspond to hits of a column-wise search of NaNs in the state-space model coefficient matrices, initial state mean vector, and covariance matrix. • If Mdl is time invariant, the order is A, B, C, D, Mean0, and Cov0. • If Mdl is time varying, the order is A{1} through A{end}, B{1} through B{end}, C{1} through C{end}, D{1} through D{end}, Mean0, and Cov0. • If Mdl is an implicitly created model (Mdl.ParamMap is a function handle), the first input argument of the parameter-to-matrix mapping function determines the order of the elements of Params. If Mdl is fully specified, corr ignores Params. Example: Consider the state-space model Mdl with A = B = [NaN 0; 0 NaN] , C = [1; 1], D = 0, and initial state means of 0 with covariance eye(2). Mdl is partially specified and explicitly created. Because the model parameters contain a total of four NaNs, Params must be a 4-by-1 vector, where Params(1) is the estimate of A(1,1), Params(2) is the estimate of A(2,2), Params(3) is the estimate of B(1,1), and Params(4) is the estimate of B(2,2). 12-378
corr
Data Types: double Covariance — Flag for returning temporal covariances false (default) | true Flag for returning the temporal covariances instead of the correlations, specified as a value in this table. Value
Description
false
Output arguments represent temporal correlations
true
Output arguments represent temporal covariances
Example: 'Covariance',true Data Types: logical
Output Arguments Cyy — Temporal associations of measurement variables numeric array Temporal associations of the measurement variables (correlations or covariances), returned as a (NumLags + 1)-by-n-by-n numeric array. Cyy(h + 1,i,j) is the temporal association between yi,t and yj,t – h, for h = 0,1,2,...,NumLags, i = 1,2,...,n (number of measurement variables), and j = 1,2,...,n. Cxx — Temporal associations of state variables numeric array Temporal associations of the state variables, returned as a (NumLags + 1)-by-m-by-m numeric array. Cxx(h + 1,i,j) is the temporal association between xi,t and xj,t – h, for h = 0,1,2,...,NumLags, i = 1,2,...,m (number of state variables), and j = 1,2,...,m. Cyx — Temporal cross-associations between measurement and state variables numeric array Temporal cross-associations between measurement and state variables, returned as a (NumLags + 1)by-n-by-m numeric array. Cyx(h + 1,i,j) is the temporal association between yi,t and xj,t – h, for h = 0,1,2,...,NumLags, i = 1,2,...,n, and j = 1,2,...,m.
More About Model-Implied Temporal Associations Model-implied temporal correlations and covariances measure self- and cross-lag associations between measurement and state variables in a state-space model, as prescribed by the model. To facilitate model specification, you can compare model-implied temporal correlations and covariances to sample analogues. 12-379
12
Functions
Consider the time-invariant state-space model on page 11-3 at time t xt = Axt − 1 + But yt = Cxt + Dεt . Consider a demeaned state-space model represented by state and measurement variables x 0, t and y 0, t: 1
Append the state-space model with an appropriately sized constant state vector representing an intercept. x0, t 1
=
A0 A1 x0, t − 1
0 1 1 x0, t yt = C0 C1 + Dεt . 1 2
+
B ut 0
Demean the variables. x 0, t = x0, t − E x0, t = x0, t − I − A0
−1
A1
y 0, t = y0, t − E y0, t = y0, t − C1 − C0 I − A0 3
−1
A1 .
Demean the state-space model and drop constant terms that do not affect the covariance. x 0, t = A0x 0, t − 1 + But y 0, t = C0x 0, t + Dεt .
Because the difference between the full state-space model and the demeaned model is the inclusion of constant states, Cov(xt, xt − h) = Cov x 0, t, x 0, t − h = Γ0, h, which implies Cxx(1, : , : ) = Γ0, 0 = A0Γ0, 0 A0′ + BB′ . Let Γ 0, 0 be the solution to the equation. Using the demeaned state equation, Cxx(h + 1, : , : ) = Γ0, h = AoΓh − 1; h = 1, 2, .... The preceding results imply the following: • Cyy(1, : , : ) = Cov yt, yt = C0Γ0, 0C0′ + DD′ . • Cyy(h + 1, : , : ) = Cov yt, yt − h = C0Γ0, hC0′ ; h = 1, 2, .... • Cyx(h + 1: , : ) = Cov(yt, xt − h) = C0Γ0, h; h = 0, 1, 2....
Tips • To obtain an association matrix of lead variables from an association matrix of lagged variables, use the identity C at, bt + h = C at, bt − h ′, where: • C is an association function, either Corr or Cov. 12-380
corr
• at and bt are yt or xt.
Version History Introduced in R2021a
See Also Objects ssm Functions irf | irfplot | estimate | filter | smooth | forecast | fevd Topics “What Are State-Space Models?” on page 11-3
12-381
12
Functions
customblm Bayesian linear regression model with custom joint prior distribution
Description The Bayesian linear regression model on page 12-394 object customblm contains a log of the pdf of the joint prior distribution of (β,σ2). The log pdf is a custom function that you declare. The data likelihood is
T
∏
t=1
ϕ yt; xt β, σ2 , where ϕ(yt;xtβ,σ2) is the Gaussian probability density
evaluated at yt with mean xtβ and variance σ2. MATLAB treats the prior distribution function as if it is unknown. Therefore, the resulting posterior distributions are not analytically tractable. To estimate or simulate from posterior distributions, MATLAB implements the slice sampler. In general, when you create a Bayesian linear regression model object, it specifies the joint prior distribution and characteristics of the linear regression model only. That is, the model object is a template intended for further use. Specifically, to incorporate data into the model for posterior distribution analysis, pass the model object and data to the appropriate object function on page 12384.
Creation Syntax PriorMdl = customblm(NumPredictors,'LogPDF',LogPDF) PriorMdl = customblm(NumPredictors,'LogPDF',LogPDF,Name,Value) Description PriorMdl = customblm(NumPredictors,'LogPDF',LogPDF) creates a Bayesian linear regression model on page 12-394 object (PriorMdl) composed of NumPredictors predictors and an intercept, and sets the NumPredictors property. LogPDF is a function representing the log of the joint prior distribution of (β,σ2). PriorMdl is a template that defines the prior distributions and the dimensionality of β. PriorMdl = customblm(NumPredictors,'LogPDF',LogPDF,Name,Value) sets properties on page 12-382 (except NumPredictors) using name-value pair arguments. Enclose each property name in quotes. For example, customblm(2,'LogPDF',@logprior,'Intercept',false) specifies the function that represents the log of the joint prior density of (β,σ2), and specifies a regression model with 2 regression coefficients, but no intercept.
Properties You can set writable property values when you create the model object by using name-value argument syntax, or after you create the model object by using dot notation. For example, to exclude an intercept from the model, enter 12-382
customblm
PriorMdl.Intercept = false;
NumPredictors — Number of predictor variables nonnegative integer Number of predictor variables in the Bayesian multiple linear regression model, specified as a nonnegative integer. NumPredictors must be the same as the number of columns in your predictor data, which you specify during model estimation or simulation. When specifying NumPredictors, exclude any intercept term from the value. After creating a model, if you change the value of NumPredictors using dot notation, then VarNames reverts to its default value. Data Types: double Intercept — Flag for including regression model intercept true (default) | false Flag for including a regression model intercept, specified as a value in this table. Value
Description
false
Exclude an intercept from the regression model. Therefore, β is a p-dimensional vector, where p is the value of NumPredictors.
true
Include an intercept in the regression model. Therefore, β is a (p + 1)-dimensional vector. This specification causes a T-by-1 vector of ones to be prepended to the predictor data during estimation and simulation.
If you include a column of ones in the predictor data for an intercept term, then set Intercept to false. Example: 'Intercept',false Data Types: logical VarNames — Predictor variable names string vector | cell vector of character vectors Predictor variable names for displays, specified as a string vector or cell vector of character vectors. VarNames must contain NumPredictors elements. VarNames(j) is the name of the variable in column j of the predictor data set, which you specify during estimation, simulation, or forecasting. The default is {'Beta(1)','Beta(2),...,Beta(p)}, where p is the value of NumPredictors. Example: 'VarNames',["UnemploymentRate"; "CPI"] Data Types: string | cell | char LogPDF — Log of joint probability density function of (β,σ2) function handle 12-383
12
Functions
Log of the joint probability density function of (β,σ2), specified as a function handle in the form @fcnName, where fcnName is the function name. Suppose logprior is the name of the MATLAB function defining the joint prior distribution of (β,σ2). Then, logprior must have this form. function [logpdf,glpdf] = logprior(params) ... end
where: • logpdf is a numeric scalar representing the log of the joint probability density of (β,σ2). • glpdf is an (Intercept + NumPredictors + 1)-by-1 numeric vector representing the gradient of logpdf. Elements correspond to the elements of params. glpdf is an optional output argument, and only the Hamiltonian Monte Carlo sampler (see hmcSampler) applies it. If you know the analytical partial derivative with respect to some parameters, but not others, then set the elements of glpdf corresponding to the unknown partial derivatives to NaN. MATLAB computes the numerical gradient for missing partial derivatives, which is convenient, but slows sampling. • params is an (Intercept + NumPredictors + 1)-by-1 numeric vector. The first Intercept + NumPredictors elements must correspond to values of β, and the last element must correspond to the value of σ2. The first element of β is the intercept, if one exists. All other elements correspond to predictor variables in the predictor data, which you specify during estimation, simulation, or forecasting. Example: 'LogPDF',@logprior
Object Functions estimate simulate forecast plot summarize
Estimate posterior distribution of Bayesian linear regression model parameters Simulate regression coefficients and disturbance variance of Bayesian linear regression model Forecast responses of Bayesian linear regression model Visualize prior and posterior densities of Bayesian linear regression model parameters Distribution summary statistics of standard Bayesian linear regression model
Examples Create Custom Multivariate t Prior Model for Coefficients Consider the multiple linear regression model that predicts the US real gross national product (GNPR) using a linear combination of industrial production index (IPI), total employment (E), and real wages (WR).
For all time points, is a series of independent Gaussian disturbances with a mean of 0 and variance . Assume these prior distributions:
12-384
customblm
•
•
is 4-D t distribution with 50 degrees of freedom for each component and the identity matrix for the correlation matrix. Also, the distribution is centered at
and each
component is scaled by the corresponding elements of the vector
.
.
bayeslm treats these assumptions and the data likelihood as if the corresponding posterior is analytically intractable. Declare a MATLAB® function that: • Accepts values of hyperparameters •
and
together in a column vector, and accepts values of the
Returns the value of the joint prior distribution,
, given the values of
and
function logPDF = priorMVTIG(params,ct,st,dof,C,a,b) %priorMVTIG Log density of multivariate t times inverse gamma % priorMVTIG passes params(1:end-1) to the multivariate t density % function with dof degrees of freedom for each component and positive % definite correlation matrix C. priorMVTIG returns the log of the product of % the two evaluated densities. % % params: Parameter values at which the densities are evaluated, an % m-by-1 numeric vector. % % ct: Multivariate t distribution component centers, an (m-1)-by-1 % numeric vector. Elements correspond to the first m-1 elements % of params. % % st: Multivariate t distribution component scales, an (m-1)-by-1 % numeric (m-1)-by-1 numeric vector. Elements correspond to the % first m-1 elements of params. % % dof: Degrees of freedom for the multivariate t distribution, a % numeric scalar or (m-1)-by-1 numeric vector. priorMVTIG expands % scalars such that dof = dof*ones(m-1,1). Elements of dof % correspond to the elements of params(1:end-1). % % C: Correlation matrix for the multivariate t distribution, an % (m-1)-by-(m-1) symmetric, positive definite matrix. Rows and % columns correspond to the elements of params(1:end-1). % % a: Inverse gamma shape parameter, a positive numeric scalar. % % b: Inverse gamma scale parameter, a positive scalar. % beta = params(1:(end-1)); sigma2 = params(end); tVal = (beta - ct)./st; mvtDensity = mvtpdf(tVal,C,dof); igDensity = sigma2^(-a-1)*exp(-1/(sigma2*b))/(gamma(a)*b^a);
12-385
12
Functions
logPDF = log(mvtDensity*igDensity); end
Create an anonymous function that operates like priorMVTIG, but accepts only the parameter values, and holds the hyperparameter values fixed. dof = 50; C = eye(4); ct = [-25; 4; 0; 3]; st = ones(4,1); a = 3; b = 1; logPDF = @(params)priorMVTIG(params,ct,st,dof,C,a,b);
Create a custom joint prior model for the linear regression parameters. Specify the number of predictors p. Also, specify the function handle for priorMVTIG and the variable names. p = 3; PriorMdl = bayeslm(p,'ModelType','custom','LogPDF',logPDF,... 'VarNames',["IPI" "E" "WR"]) PriorMdl = customblm with properties: NumPredictors: Intercept: VarNames: LogPDF:
3 1 {4x1 cell} @(params)priorMVTIG(params,ct,st,dof,C,a,b)
The priors are defined by the function: @(params)priorMVTIG(params,ct,st,dof,C,a,b)
PriorMdl is a customblm Bayesian linear regression model object representing the prior distribution of the regression coefficients and disturbance variance. In this case, bayeslm does not display a summary of the prior distributions at the command line.
Estimate Marginal Posterior Distributions Consider the linear regression model in “Create Custom Multivariate t Prior Model for Coefficients” on page 12-384. Create an anonymous function that operates like priorMVTIG, but accepts the parameter values only and holds the hyperparameter values fixed at their values. dof = 50; C = eye(4); ct = [-25; 4; 0; 3]; st = ones(4,1); a = 3; b = 1; logPDF = @(params)priorMVTIG(params,ct,st,dof,C,a,b);
12-386
customblm
Create a custom joint prior model for the linear regression parameters. Specify the number of predictors p. Also, specify the function handle for priorMVTIG and the variable names. p = 3; PriorMdl = bayeslm(p,'ModelType','custom','LogPDF',logPDF,... 'VarNames',["IPI" "E" "WR"]) PriorMdl = customblm with properties: NumPredictors: Intercept: VarNames: LogPDF:
3 1 {4x1 cell} @(params)priorMVTIG(params,ct,st,dof,C,a,b)
The priors are defined by the function: @(params)priorMVTIG(params,ct,st,dof,C,a,b)
Load the Nelson-Plosser data set. Create variables for the response and predictor series. load Data_NelsonPlosser X = DataTable{:,PriorMdl.VarNames(2:end)}; y = DataTable{:,'GNPR'};
Estimate the marginal posterior distributions of β and σ2. Specify a width for the slice sampler that is close to the posterior standard deviation of the parameters assuming a diffuse prior model. Reduce serial correlation by specifying a thinning factor of 10, and reduce the effective default number of draws by a factor of 10. width = [20,0.5,0.01,1,20]; thin = 10; numDraws = 1e5/thin; rng(1) % For reproducibility PosteriorMdl = estimate(PriorMdl,X,y,'Width',width,'Thin',thin,... 'NumDraws',numDraws); Method: MCMC sampling with 10000 draws Number of observations: 62 Number of predictors: 4 | Mean Std CI95 Positive Distribution -------------------------------------------------------------------------Intercept | -25.0069 0.9919 [-26.990, -23.065] 0.000 Empirical IPI | 4.3544 0.1083 [ 4.143, 4.562] 1.000 Empirical E | 0.0011 0.0002 [ 0.001, 0.001] 1.000 Empirical WR | 2.5613 0.3293 [ 1.939, 3.222] 1.000 Empirical Sigma2 | 47.0593 8.7570 [32.690, 67.115] 1.000 Empirical
PosteriorMdl is an empiricalblm model object storing draws from the posterior distributions of β and σ2 given the data. estimate displays a summary of the marginal posterior distributions to the command window. Rows of the summary correspond to regression coefficients and the disturbance variance, and columns to characteristics of the posterior distribution. The characteristics include: • CI95, which contains the 95% Bayesian equitailed credible intervals for the parameters. For example, the posterior probability that the regression coefficient of WR is in [1.939, 3.222] is 0.95. 12-387
12
Functions
• Positive, which contains the posterior probability that the parameter is greater than 0. For example, the probability that the intercept is greater than 0 is 0. estimate derives the posterior characteristics from draws from the posterior distributions, which MATLAB® stores as matrices in the properties BetaDraws and Sigma2Draws. To monitor mixing and convergence of the MCMC sample, construct trace plots. In the BetaDraws property, draws correspond to columns and parameters to rows. figure; for j = 1:4 subplot(2,2,j) plot(PosteriorMdl.BetaDraws(j,:)) title(sprintf('Trace Plot of %s',PosteriorMdl.VarNames{j})); end
figure; plot(PosteriorMdl.Sigma2Draws) title('Trace Plot of Sigma2');
12-388
customblm
The trace plots indicate adequate mixing and convergence, and there are no transient effects to remove.
Estimate Conditional Posterior Distribution Consider the linear regression model in “Create Custom Multivariate t Prior Model for Coefficients” on page 12-384. Create an anonymous function that operates like priorMVTIG, but accepts the parameter values only and holds the hyperparameter values fixed. dof = 50; C = eye(4); ct = [-25; 4; 0; 3]; st = ones(4,1); a = 3; b = 1; logPDF = @(params)priorMVTIG(params,ct,st,dof,C,a,b);
Create a custom joint prior model for the linear regression parameters. Specify the number of predictors p. Also, specify the function handle for priorMVTIG and the variable names. p = 3; PriorMdl = bayeslm(p,'ModelType','custom','LogPDF',logPDF,... 'VarNames',["IPI" "E" "WR"])
12-389
12
Functions
PriorMdl = customblm with properties: NumPredictors: Intercept: VarNames: LogPDF:
3 1 {4x1 cell} @(params)priorMVTIG(params,ct,st,dof,C,a,b)
The priors are defined by the function: @(params)priorMVTIG(params,ct,st,dof,C,a,b)
Load the Nelson-Plosser data set. Create variables for the response and predictor series. load Data_NelsonPlosser X = DataTable{:,PriorMdl.VarNames(2:end)}; y = DataTable{:,'GNPR'};
Estimate the conditional posterior distribution of β given the data and σ2 = 2, and return the estimation summary table to access the estimates. Specify a width for the slice sampler that is close to the posterior standard deviation of the parameters assuming a diffuse prior model. Reduce serial correlation by specifying a thinning factor of 10, and reduce the effective default number of draws by a factor of 10. width = [20,0.5,0.01,1]; thin = 10; numDraws = 1e5/thin; rng(1) % For reproducibility [Mdl,Summary] = estimate(PriorMdl,X,y,'Sigma2',2,... 'Width',width,'Thin',thin,'NumDraws',numDraws); Method: MCMC sampling with 10000 draws Conditional variable: Sigma2 fixed at Number of observations: 62 Number of predictors: 4
2
| Mean Std CI95 Positive Distribution -------------------------------------------------------------------------Intercept | -24.7820 0.8767 [-26.483, -23.054] 0.000 Empirical IPI | 4.3825 0.0254 [ 4.332, 4.431] 1.000 Empirical E | 0.0011 0.0000 [ 0.001, 0.001] 1.000 Empirical WR | 2.4752 0.0724 [ 2.337, 2.618] 1.000 Empirical Sigma2 | 2 0 [ 2.000, 2.000] 1.000 Empirical
estimate displays a summary of the conditional posterior distribution of β. Because σ2 is fixed at 2 during estimation, inferences on it are trivial. Extract the mean vector and covariance matrix of the conditional posterior of β from the estimation summary table. condPostMeanBeta = Summary.Mean(1:(end - 1)) condPostMeanBeta = 4×1 -24.7820 4.3825 0.0011
12-390
customblm
2.4752 CondPostCovBeta = Summary.Covariances(1:(end - 1),1:(end - 1)) CondPostCovBeta = 4×4 0.7686 0.0084 -0.0000 0.0019
0.0084 0.0006 0.0000 -0.0015
-0.0000 0.0000 0.0000 -0.0000
0.0019 -0.0015 -0.0000 0.0052
Display Mdl. Mdl Mdl = customblm with properties: NumPredictors: Intercept: VarNames: LogPDF:
3 1 {4x1 cell} @(params)priorMVTIG(params,ct,st,dof,C,a,b)
The priors are defined by the function: @(params)priorMVTIG(params,ct,st,dof,C,a,b)
Because estimate computes the conditional posterior distribution, it returns the original prior model, not the posterior, in the first position of the output argument list. Also, estimate does not return the MCMC sample. Therefore, to monitor convergence of the MCMC sample, use simulate instead and specify the same random number seed.
Estimate Posterior Probability Using Monte Carlo Simulation Consider the linear regression model in “Estimate Marginal Posterior Distributions” on page 12-386. Create a prior model for the regression coefficients and disturbance variance, then estimate the marginal posterior distributions. Turn the estimation display off. dof = 50; C = eye(4); ct = [-25; 4; 0; 3]; st = ones(4,1); a = 3; b = 1; logPDF = @(params)priorMVTIG(params,ct,st,dof,C,a,b); p = 3; PriorMdl = bayeslm(p,'ModelType','custom','LogPDF',logPDF,... 'VarNames',["IPI" "E" "WR"]); load Data_NelsonPlosser X = DataTable{:,PriorMdl.VarNames(2:end)};
12-391
12
Functions
y = DataTable{:,'GNPR'}; width = [20,0.5,0.01,1,20]; thin = 10; numDraws = 1e5/thin; rng(1) % For reproducibility PosteriorMdl = estimate(PriorMdl,X,y,'Width',width,'Thin',thin,... 'NumDraws',numDraws,'Display',false);
Estimate posterior distribution summary statistics for β by using the draws from the posterior distribution stored in posterior model. estBeta = mean(PosteriorMdl.BetaDraws,2); EstBetaCov = cov(PosteriorMdl.BetaDraws');
Suppose that if the coefficient of real wages (WR)is below 2.5, then a policy is enacted. Although the posterior distribution of WR is known, and you can calculate probabilities directly, you can estimate the probability using Monte Carlo simulation instead. Draw 1e6 samples from the marginal posterior distribution of β. NumDraws = 1e6; BetaSim = simulate(PosteriorMdl,'NumDraws',NumDraws);
BetaSim is a 4-by- 1e6 matrix containing the draws. Rows correspond to the regression coefficient and columns to successive draws. Isolate the draws corresponding to the coefficient of WR, and then identify which draws are less than 2.5. isWR = PosteriorMdl.VarNames == "WR"; wrSim = BetaSim(isWR,:); isWRLT2p5 = wrSim < 2.5;
Find the marginal posterior probability that the regression coefficient of WR is below 2.5 by computing the proportion of draws that are less than 2.5. probWRLT2p5 = mean(isWRLT2p5) probWRLT2p5 = 0.4430
The posterior probability that the coefficient of WR is less than 2.5 is about 0.4430.
Forecast Responses Using Posterior Predictive Distribution Consider the linear regression model in “Estimate Marginal Posterior Distributions” on page 12-386. Create a prior model for the regression coefficients and disturbance variance, then estimate the marginal posterior distributions. Hold out the last 10 periods of data from estimation so you can use them to forecast real GNP. Turn the estimation display off. load Data_NelsonPlosser VarNames = {'IPI'; 'E'; 'WR'}; fhs = 10; % Forecast horizon size
12-392
customblm
X = DataTable{1:(end - fhs),VarNames}; y = DataTable{1:(end - fhs),'GNPR'}; XF = DataTable{(end - fhs + 1):end,VarNames}; % Future predictor data yFT = DataTable{(end - fhs + 1):end,'GNPR'}; % True future responses dof = 50; C = eye(4); ct = [-25; 4; 0; 3]; st = ones(4,1); a = 3; b = 1; logPDF = @(params)priorMVTIG(params,ct,st,dof,C,a,b); p = 3; PriorMdl = bayeslm(p,'ModelType','custom','LogPDF',logPDF,... 'VarNames',VarNames); width = [20,0.5,0.01,1,20]; thin = 10; numDraws = 1e5/thin; rng(1) % For reproducibility PosteriorMdl = estimate(PriorMdl,X,y,'Width',width,'Thin',thin,... 'NumDraws',numDraws,'Display',false);
Forecast responses using the posterior predictive distribution and the future predictor data XF. Plot the true values of the response and the forecasted values. yF = forecast(PosteriorMdl,XF); figure; plot(dates,DataTable.GNPR); hold on plot(dates((end - fhs + 1):end),yF) h = gca; hp = patch([dates(end - fhs + 1) dates(end) dates(end) dates(end - fhs + 1)],... h.YLim([1,1,2,2]),[0.8 0.8 0.8]); uistack(hp,'bottom'); legend('Forecast Horizon','True GNPR','Forecasted GNPR','Location','NW') title('Real Gross National Product'); ylabel('rGNP'); xlabel('Year'); hold off
12-393
12
Functions
yF is a 10-by-1 vector of future values of real GNP corresponding to the future predictor data. Estimate the forecast root mean squared error (RMSE). frmse = sqrt(mean((yF - yFT).^2)) frmse = 12.8148
The forecast RMSE is a relative measure of forecast accuracy. Specifically, you estimate several models using different assumptions. The model with the lowest forecast RMSE is the best-performing model of the ones being compared.
More About Bayesian Linear Regression Model A Bayesian linear regression model treats the parameters β and σ2 in the multiple linear regression (MLR) model yt = xtβ + εt as random variables. For times t = 1,...,T: • yt is the observed response. • xt is a 1-by-(p + 1) row vector of observed values of p predictors. To accommodate a model intercept, x1t = 1 for all t. 12-394
customblm
• β is a (p + 1)-by-1 column vector of regression coefficients corresponding to the variables that compose the columns of xt. • εt is the random disturbance with a mean of zero and Cov(ε) = σ2IT×T, while ε is a T-by-1 vector containing all disturbances. These assumptions imply that the data likelihood is ℓ β, σ2 y, x =
T
∏
t=1
ϕ yt; xt β, σ2 .
ϕ(yt;xtβ,σ2) is the Gaussian probability density with mean xtβ and variance σ2 evaluated at yt;. Before considering the data, you impose a joint prior distribution assumption on (β,σ2). In a Bayesian analysis, you update the distribution of the parameters by using information about the parameters obtained from the likelihood of the data. The result is the joint posterior distribution of (β,σ2) or the conditional posterior distributions of the parameters.
Alternatives The bayeslm function can create any supported prior model object for Bayesian linear regression.
Version History Introduced in R2017a
See Also Objects conjugateblm | diffuseblm | empiricalblm | semiconjugateblm Functions bayeslm | sampleroptions Topics “Bayesian Linear Regression” on page 6-2 “Implement Bayesian Linear Regression” on page 6-10 “Specify Gradient for HMC Sampler” on page 6-18 “Bayesian Stochastic Search Variable Selection” on page 6-63
12-395
12
Functions
cusumtest Cusum test for structural change
Syntax h = cusumtest(X,y) h = cusumtest(Tbl) h = cusumtest( ___ ,Name=Value) [h,H,Stat,W,B] = cusumtest( ___ ) cusumtest( ___ ) cusumtest(ax, ___ ) [ ___ ,sumPlots] = cusumtest( ___ )
Description Cusum tests on page 12-415 assess the stability of coefficients β in a multiple linear regression model of the form y = Xβ + ε. Inference is based on a sequence of sums, or sums of squares, of recursive residuals (standardized one-step-ahead forecast errors) computed iteratively from nested subsamples of the data. Under the null hypothesis of coefficient constancy, values of the sequence outside an expected range suggest structural change in the model over time. h = cusumtest(X,y) returns test rejection decision h from conducting a cusum test on the multiple linear regression model y = Xβ + ε, where y is a vector of response data and X is a matrix of predictor data. h = cusumtest(Tbl) conducts a cusum test on the variables of the table or timetable Tbl. The response variable in the regression is the last table variable, and all other variables are the predictor variables. To select a different response variable for the regression, use the ResponseVariable name-value argument. To select different predictor variables, use the PredictorNames name-value argument. h = cusumtest( ___ ,Name=Value) uses additional options specified by one or more name-value arguments. Some options control the number of tests to conduct. The following conditions apply when cusumtest conducts multiple tests: • cusumtest treats each test as separate from all other tests. • All outputs expand their singleton dimension to contain results from each test. For example, cusumtest(Tbl,ResponseVariable="RGDP",Test=["cusum" cusumsq"]) conducts two cusum tests using GDP as the response variable in the regressions and all other variables in the table Tbl as predictors. The first test uses the cusum test statistic and the second test uses the cusum of squares test statistic. [h,H,Stat,W,B] = cusumtest( ___ ) also returns the following decision statistics from conducting a cusum test, using any input-argument combination in the previous syntaxes: • h, the test decision • H, the sequence of decisions for each iteration of the test 12-396
cusumtest
• Stat, the sequence of test statistics • W, the sequence of recursive residuals • B, the sequence of coefficient estimates cusumtest( ___ ) plots both the sequence of cusums and the critical lines resulting from the cusum tests. cusumtest(ax, ___ ) plots on the axes specified by ax instead of the current axes (gca). ax can precede any of the input argument combinations in the previous syntaxes. [ ___ ,sumPlots] = cusumtest( ___ ) additionally returns handles to plotted graphics objects. Use elements of sumPlots to modify properties of the plot after you create it.
Examples Conduct Cusum Test for Structural Change Conduct a cusum test to assess whether there is a structural break in the equation for food demand. Input the predictor series as a matrix and input the response series as a vector. Load the US food consumption data set Data_Consumption.mat, which contains annual measurements from 1927 through 1962 with missing data due to World War II in the matrix Data. load Data_Consumption
Suppose that you want to develop a model for consumption as determined by food prices and disposable income, and assess its stability through the economic shock through the war. Plot the series. P = Data(:,1); % Food price index I = Data(:,2); % Disposable income index Q = Data(:,3); % Food consumption index figure; plot(dates,[P I Q]) axis tight grid on xlabel("Year") ylabel("Index") legend(["Price" "Income" "Consumption"],Location="southeast")
12-397
12
Functions
Measurements are missing from 1942 through 1947, which correspond to WWII. Stabilize each series by applying the log transformation. LP = log(P); LI = log(I); LQ = log(Q);
Assume that log consumption is a linear function of the logs of food price and income. LQt = β0 + β1LIt + β2LP + εt .
εt is a Gaussian random variable with mean 0 and standard deviation σ2. Identify the indices before WWII. Plot log consumption with respect to the logs of food price and income. preWarIdx = (dates = datetime(1941,12,31),1); chowtest(LogTT,bp,Display="summary"); RESULTS SUMMARY *************** Test 1 Sample size: 30 Breakpoint: 15 Test type: breakpoint Coefficients tested: All Statistic: 5.5400 Critical value: 3.0088 P value: 0.0049 Significance level: 0.0500 Decision: Reject coefficient stability
12-406
cusumtest
The test results reject the null hypothesis that the coefficients are stable. The Chow and cusum test results are not consistent. For details on cusum test limitations, see “Limitations” on page 12-415.
Test for Structural Break in Volatility Check whether a cusum of squares test can detect a structural break in volatility in simulated data. Simulate a series of data from this regression model yt = 1 2 3 xt + ε1t; t = 1, . . . , 50 yt = 1 2 3 xt + ε2t; t = 51, . . . , 100 .
xt is a series of observations from three standard Gaussian predictor variables. ε1t and ε2t are series of Gaussian innovations both with mean 0 and standard deviation 0.1 and 0.2, respectively. rng(1); % For reproducibility T = 100; X = randn(T,3); sigma1 = 0.1; sigma2 = 0.2; e = [sigma1*randn(T/2,1); sigma2*randn(T/2,1)]; b = (1:3)'; y = X*b + e;
Conduct a cusum of squares test using a 5% level of significance. Plot the test statistics and critical region bands. Indicate that there is no model intercept. Request to return whether the test statistics cross into critical region at each iteration. [~,H] = cusumtest(X,y,Test="cusumsq",Plot="on", ... Direction=["forward" "backward"],Display="off", ... Intercept=false);
12-407
12
Functions
12-408
cusumtest
Because the test statistics cross the critical lines at least once for both tests, the tests reject the null hypothesis of constant volatility at 5% level. The test statistics change direction around iteration 50, which is consistent with the simulated break in volatility in the data. H is a 2-by-97 logical matrix containing the sequence of decisions for each iteration of each cusum of squares test. The first row corresponds to the forward cusum of squares test, and the second row corresponds to the backward cusum of squares test. For the forward test, determine the iterations that result in the test statistics crossing the critical line. bp = find(H(1,:) == 1) bp = 1×35 24
25
26
27
28
29
30
31
32
33
34
35
36
37
Input Arguments X — Predictor data X numeric matrix Predictor data X for the multiple linear regression model, specified as a numObs-by-numPreds numeric matrix. 12-409
38
39
12
Functions
Each row represents one of the numObs observations and each column represents one of the numPreds predictor variables. Data Types: double y — Response data y numeric vector Response data y for the multiple linear regression model, specified as a numObs-by-1 numeric vector. Rows of y and X correspond. Data Types: double Tbl — Combined predictor and response data table | timetable Combined predictor and response data for the multiple linear regression model, specified as a table or timetable with numObs rows. Each row of Tbl is an observation. The test regresses the response variable, which is the last variable in Tbl, on the predictor variables, which are all other variables in Tbl. To select a different response variable for the regression, use the ResponseVariable name-value argument. To select different predictor variables, use the PredictorNames name-value argument to select numPreds predictors. ax — Axes on which to plot vector of Axes objects Axes on which to plot, specified as a vector of Axes objects with length numTests. By default, cusumtest plots each test to a separate figure. Note NaNs in X, y, or Tbl indicate missing values, and cusumtest removes observations containing at least one NaN. That is, to remove NaNs in X or y, cusumtest merges the variables [X y], and then it uses list-wise deletion to remove any row that contains at least one NaN. cusumtest also removes any row of Tbl containing at least one NaN. Removing NaNs in the data reduces the sample size and can create irregular time series. Name-Value Arguments Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter. Before R2021a, use commas to separate each name and value, and enclose Name in quotes. Example: cusumtest(Tbl,ResponseVariable="RGDP",Test=["cusum" cusumsq"]) conducts two cusum tests using GDP as the response variable in the regressions and all other variables in the table Tbl as predictors. The first test uses the cusum test statistic and the second test uses the cusum of squares test statistic. Intercept — Flag to include intercept true (default) | false | logical vector 12-410
cusumtest
Flag to include an intercept when cusumtest fits the regression model, specified as a value in this table or a length numTests vector of such values. Value
Description
true
cusumtest includes an intercept when fitting the regression model. numCoeffs = numPreds + 1.
false
cusumtest does not include an intercept when fitting the regression model. numCoeffs = numPreds.
cusumtest conducts a separate test for each value in Intercept. Example: Intercept=false excludes an intercept from the model for each test. Data Types: logical Test — Type of cusum test "cusum" (default) | "cusumsq" | character vector | string vector of test names | cell vector of test names Type of cusum test, specified as a test name, or a string vector or cell vector of test names of length numTests. Test Name
Description
"cusum"
Cusum test statistic. See [1].
"cusumsq"
Cusum of squares test statistic. See [1].
cusumtest conducts a separate test for each test name in Test. Example: Test=["cusum" "cusumsq"] conducts two cusum tests. The first test uses the cusum test statistic and the second test uses the cusum of squares test statistic. Data Types: char | cell | string Direction — Iteration direction "forward" (default) | "backward" | character vector | string vector of direction names | cell vector of direction names Iteration direction, specified as a direction name, or a string vector or cell vector of direction names of length numTests. Direction Name
Description
"forward"
cusumtest computes recursive residuals beginning with the first numCoeffs + 1 observations. Then, cusumtest adds one at a time until it reaches numObs observations.
"backward"
cusumtest reverses the order of the observations, and then follows the same steps as in "forward".
cusumtest conducts a separate test for each value in Direction.
12-411
12
Functions
Example: Test=["cusum" "cusumsq"] conducts two cusum tests. The first test computes recursive residuals using the forward method and the second test computes recursive residuals using the backward method. Data Types: char | cell | string Alpha — Nominal significance levels 0.05 (default) | numeric scalar | numeric vector Nominal significance levels for the tests, specified as a numeric scalar or numeric vector of length numTests. • For cusum tests (Test="cusum"), all elements of Alpha must be in the interval (0,1). • For cusum of squares tests (Test="cusumsq"), all elements of Alpha must be in the interval [0.01,0.20]. cusumtest conducts a separate test for each value in Alpha. Example: Alpha=[0.01 0.05] uses a level of significance of 0.01 for the first test, and then uses a level of significance of 0.05 for the second test. Data Types: double Display — Flag for command window display of results "off" | "summary" Flag for a command window display of results, specified as a value in this table. Value
Description
Default Value When
"off"
cusumtest does not display numTests = 1 results in the command window.
"summary"
For each test, cusumtest numTests > 1 displays results in the command window.
The value of Display applies to all tests. Example: Display="off" Data Types: char | string Plot — Flag indicating whether to plot test results "off" | "on" Flag indicating whether to plot test results, specified as a value in this table. Value
Description
Default Value When
"off"
cusumtest does not produce any plots.
cusumtest returns any output argument.
"on"
cusumtest produces individual cusumtest does not return any plots for each test. output arguments.
Depending on the value of Test, the plots show the sequence of cusums or cusums of squares together with critical lines determined by the value of Alpha. 12-412
cusumtest
The value of Plot applies to all tests. Example: Plot="off" Data Types: char | string ResponseVariable — Variable in Tbl to use for response first variable in Tbl (default) | string vector | cell vector of character vectors | vector of integers | logical vector Variable in Tbl to use for response, specified as a string vector or cell vector of character vectors containing variable names in Tbl.Properties.VariableNames, or an integer or logical vector representing the indices of names. The selected variables must be numeric. cusumtest uses the same specified response variable for all tests. Example: ResponseVariable="GDP" Example: ResponseVariable=[true false false false] or ResponseVariable=1 selects the first table variable as the response. Data Types: double | logical | char | cell | string PredictorVariables — Variables in Tbl to use for the predictors string vector | cell vector of character vectors | vector of integers | logical vector Variables in Tbl to use for the predictors, specified as a string vector or cell vector of character vectors containing variable names in Tbl.Properties.VariableNames, or an integer or logical vector representing the indices of names. The selected variables must be numeric. cusumtest uses the same specified predictors for all tests. By default, cusumtest uses all variables in Tbl that are not specified by the ResponseVariable name-value argument. Example: PredictorVariables=["UN" "CPI"] Example: PredictorVariables=[false true true false] or DataVariables=[2 3] selects the second and third table variables. Data Types: double | logical | char | cell | string Note • When cusumtest conducts multiple tests, the function applies all single settings (scalars or character vectors) to each test. • All vector-valued specifications that control the number of tests must have equal length. • If the value of any option is a row vector, so is output h. Array and table outputs retain their specified dimensions.
Output Arguments h — Test rejection decisions logical scalar | logical vector 12-413
12
Functions
Test rejection decisions, returned as a logical scalar or vector with length equal to the number of tests numTests. cusumtest returns h when you supply the inputs X and y. Hypotheses are independent of the value of Test. • H0: Coefficients in β are equal in all sequential subsamples. • H1: Coefficients in β change during the period of the sample. Elements of h have the following values and meanings. • Values of 1 indicates rejection of H0 in favor of H1. • Value of 0 indicates failure to reject H0. H — Sequence of test rejection decisions logical matrix | table Sequence of test rejection decisions for each iteration of the cusum tests, returned as a numTestsby-(numObs – numPreds) logical matrix or table of logical variables. Rows correspond to separate cusum tests and columns or variables correspond to iterations. When H is a table, variable j has label Hj. • For tests in which Direction is "forward", columns or variables correspond to times numPreds + 1,...,numObs. • For tests in which Direction is "backward", columns or variables correspond to times numObs – (numPreds + 1),...,1. Rows corresponding to tests in which Intercept is true contain one less iteration, and the value in the first column of H defaults to false. For a particular test (row), if any test decision in the sequence is 1, then h is true; that is, h = any(H,2). Otherwise, h is false. Stat — Sequence of test statistics numeric matrix | table Sequence of test statistics for each iteration of the cusum tests, returned as a numTests-by-(numObs – numPreds) numeric matrix or table of numeric variables. Rows correspond to separate cusum tests and columns or variables correspond to iterations. When Stat is a table, variable j has label Statj. Values in any row depend on the value of Test. Array indices corresponds to the indexing in H. When W is a table, variable j has label Wj. Rows corresponding to tests in which Intercept is true contain one less iteration, and the value in the first column of Stat defaults to NaN. W — Sequence of standardized recursive residuals numeric matrix | table Sequence of standardized recursive residuals, returned as a numTests-by-(numObs – numPreds) numeric matrix or table of numeric variables. 12-414
cusumtest
Values in any row depend on the value of Test. Array indices correspond to the indexing in H. When W is a table, variable j has label Wj. Rows corresponding to tests in which Intercept is true contain one less iteration, and the value in the first column of W defaults to NaN. B — Sequence of recursive regression coefficient estimates numeric array Sequence of recursive regression coefficient estimates, returned as a (numPreds + 1)-by-(numObs – numPreds)-by-numTests numeric array. • B(i,j,k) corresponds to coefficient i at iteration j for test k. At iteration j of test k, cusumtest estimates the coefficients using B(:,j,k) = X(1:numPreds+j,inRegression)\y(1:numPreds+j);
inRegression is a logical vector indicating the predictors in the regression at iteration j of test k. • During forward iterations, initially constant predictors can cause multicollinearity. Therefore, cusumtest holds out constant predictors until their data changes. For iterations in which cusumtest excludes predictors from the regression, corresponding coefficient estimates default to NaN. Similarly, for backward regression, cusumtest holds out terminally constant predictors. For more details, see [1]. • Tests in which: • Intercept is true contain one less iteration, and all values in the first column of B default to NaN. • Intercept is false contain one less coefficient, and the value in the first row, which corresponds to the intercept, defaults to NaN. sumPlots — Handles to plotted graphics objects graphics array Handles to plotted graphics objects, returned as a 3-by-numTests graphics array. sumPlots contains unique plot identifiers, which you can use to query or modify properties of the plot.
Limitations Cusum tests have little power to detect structural changes in the following cases. • Late in the sample period • When multiple changes produce cancellations in the cusums
More About Cusum Tests Cusum tests provide useful diagnostics for various model misspecifications, including gradual structural change, multiple structural changes, missing predictors, and neglected nonlinearities. The tests, formulated in [1], are based on cumulative sums, or cusums, of residuals resulting from recursive regressions. 12-415
12
Functions
Tips • The cusum of squares test: • Is a “useful complement to the cusum test, particularly when the departure from constancy of the [recursive coefficients] is haphazard rather than systematic” [1] • Has greater power for cases in which multiple shifts are likely to cancel • Is often suggested for detecting structural breaks in volatility • Alpha specifies the nominal significance levels for the tests. The actual size of a test depends on various assumptions and approximations that cusumtest uses to compute the critical lines. Plots of the recursive residuals are the best indicator of structural change. Brown, et al. suggest that the tests “should be regarded as yardsticks for the interpretation of data rather than leading to hard and fast decisions” [1]. • To produce basic diagnostic plots of the recursive coefficient estimates having the same scale for test n, enter plot(B(:,:,n)')
recreg produces similar plots, optionally using robust standard error bands.
Algorithms • cusumtest handles initially constant predictor data using the method suggested in [1]. If a predictor's data is constant for the first numCoeffs observations and this results in multicollinearity with an intercept or another predictor, then cusumtest drops the predictor from regressions and the computation of recursive residuals until its data changes. Similarly, cusumtest temporarily holds out terminally constant predictors from backward regressions. Initially constant predictors in backward regressions, or terminally constant predictors in forward regressions, are not held out by cusumtest and can lead to rank deficiency in terminal iterations. • cusumtest computes critical lines for inference in essentially different ways for the two test statistics. For cusums, cusumtest solves the normal CDF equation in [1] dynamically for each value of Alpha. For the cusums of squares test, cusumtest interpolates parameter values from the table in [2], using the method suggested in [1]. Sample sizes with degrees of freedom less than 4 are below tabulated values, and cusumtest cannot compute critical lines. Sample sizes with degrees of freedom greater than 202 are above tabulated values, and cusumtest uses the critical value associated with the largest tabulated sample size.
Version History Introduced in R2016a R2022a: cusumtest returns some outputs in tables when you supply a table of data If you supply a table of time series data Tbl, cusumtest returns H, Stat, and W as tables containing variables for recursive decision statistics with rows corresponding to separate tests. Before R2022a, cusumtest returned the matrix outputs when you supplied a table of input data. Starting in R2022a, if you supply a table of input data, update your code by accessing results in H, Stat, and W using table indexing. For more details, see “Access Data in Tables”. 12-416
cusumtest
References [1] Brown, R. L., J. Durbin, and J. M. Evans. "Techniques for Testing the Constancy of Regression Relationships Over Time." Journal of the Royal Statistical Society, Series B. Vol. 37, 1975, pp. 149–192. [2] Durbin, J. "Tests for Serial Correlation in Regression Analysis Based on the Periodogram of Least Squares Residuals." Biometrika. Vol. 56, 1969, pp. 1–15.
See Also recreg | fitlm | LinearModel | chowtest
12-417
12
Functions
corrplot Plot variable correlations
Syntax [R,PValue] = corrplot(X) [R,PValue] = corrplot(Tbl) [ ___ ] = corrplot( ___ ,Name=Value) corrplot( ___ ) corrplot(ax, ___ ) [ ___ ,H] = corrplot( ___ )
Description [R,PValue] = corrplot(X) plots Pearson's correlation coefficients between all pairs of variables in the matrix of time series data X. The plot is a numVars-by-numVars grid, where numVars is the number of time series variables (columns) in X, including the following subplots: • Each off diagonal subplot contains a scatterplot of a pair of variables with a least-squares reference line, the slope of which is equal to the displayed correlation coefficient. • Each diagonal subplot contains the distribution of a variable as a histogram. Also, the function returns the correlation matrix in the plots R and a matrix of p-values PValue for testing the null hypothesis that each pair of coefficients is not correlated against the alternative hypothesis of a nonzero correlation. [R,PValue] = corrplot(Tbl) plots the Pearson's correlation coefficients between all pairs of variables in the table or timetable Tbl, and also returns tables for the correlation matrix R and matrix of p-values PValue. To select a subset of variables in Tbl, for which to plot the correlation matrix, use the DataVariables name-value argument. [ ___ ] = corrplot( ___ ,Name=Value) specifies options using one or more name-value arguments in addition to any of the input argument combinations in previous syntaxes. corrplot returns the output argument combination for the corresponding input arguments. For example, corrplot(Tbl,Type="Spearman",TestR="on",DataVariables=1:5) computes Spearman’s rank correlation coefficient for the first 5 variables of the table Tbl and tests for significant correlation coefficients. corrplot( ___ ) plots the correlation matrix. corrplot(ax, ___ ) plots on the axes specified by ax instead of the current axes (gca). ax can precede any of the input argument combinations in the previous syntaxes. [ ___ ,H] = corrplot( ___ ) plots the diagnostics of the input series and additionally returns handles to plotted graphics objects H. Use elements of H to modify properties of the plot after you create it. 12-418
corrplot
Examples Plot and Return Pearson's Correlation Coefficients Between Variables in Matrix of Data Plot and return Pearson's correlation coeffifients between pairs of time series using the default options of corrplot. Input the time series data as a numeric matrix. Load data of Canadian inflation and interest rates Data_Canada.mat, which contains the series in the matrix Data. load Data_Canada
Plot and return the correlation matrix between all pairs of variables in the data. R = corrplot(Data)
R = 5×5 1.0000 0.9266 0.7401 0.7287 0.7136
0.9266 1.0000 0.5908 0.5716 0.5556
0.7401 0.5908 1.0000 0.9758 0.9384
0.7287 0.5716 0.9758 1.0000 0.9861
0.7136 0.5556 0.9384 0.9861 1.0000
The correlation plot shows that the short-term, medium-term, and long-term interest rates are highly correlated. 12-419
12
Functions
Plot and Return Correlations and p-values Between Table Variables Plot correlations between time series, which are variables in a table, using default options. Return a table of pairwise correlations and a table of corresponding significance-test p-values. Load data of Canadian inflation and interest rates Data_Canada.mat. Convert the table DataTable to a timetable. load Data_Canada dates = datetime(dates,ConvertFrom="datenum"); TT = table2timetable(DataTable,RowTimes=dates); TT.Observations = [];
Plot and return the correlation matrix, with corresponding significance-test p-values, between all pairs of variables in the data [R,PValue] = corrplot(TT)
R=5×5 table
INF_C INF_G INT_S
12-420
INF_C _______
INF_G _______
INT_S _______
INT_M _______
INT_L _______
1 0.92665 0.74007
0.92665 1 0.59077
0.74007 0.59077 1
0.72867 0.57159 0.9758
0.7136 0.55557 0.93843
corrplot
INT_M INT_L
0.72867 0.7136
0.57159 0.55557
PValue=5×5 table INF_C __________ INF_C INF_G INT_S INT_M INT_L
1 3.6657e-18 3.2113e-08 6.6174e-08 1.6318e-07
0.9758 0.93843
1 0.98609
0.98609 1
INF_G __________
INT_S __________
INT_M __________
INT_L __________
3.6657e-18 1 4.7739e-05 9.4769e-05 0.00016278
3.2113e-08 4.7739e-05 1 2.3206e-27 1.3408e-19
6.6174e-08 9.4769e-05 2.3206e-27 1 5.1602e-32
1.6318e-07 0.00016278 1.3408e-19 5.1602e-32 1
corrplot returns the correlation matrix and corresponding matrix of p-values in tables R and PValue, respectively. By default, corrplot computes correlations between all pairs of variables in the input table. To select a subset of variables from an input table, set the DataVariables option.
Plot Correlations Between Selected Variables Plot the correlation matrix for selected time series. Load the credit default data set Data_CreditDefaults.mat. The table DataTable contains the default rate of investment-grade corporate bonds series (IGD, the response variable) and several predictor variables. load Data_CreditDefaults
Consider a multiple regression model for the default rate that includes an intercept term. Include a variable in the table of data that represents the intercept in the design matrix (that is, a column of ones). Place the intercept variable at the beginning of the table. Const = ones(height(DataTable),1); DataTable = addvars(DataTable,Const,Before=1);
Create a variable that contains all predictor variable names. varnames = DataTable.Properties.VariableNames; prednames = varnames(varnames ~= "IGD");
Graph a correlation plot of all predictor variables except for the intercept dummy variable. corrplot(DataTable,DataVariables=prednames(2:end));
12-421
12
Functions
The predictor BBB is moderately linearly associated with the other predictors, while all other predictors appear unassociated with each other.
Plot and Test Kendall's Rank Correlation Coefficients Plot Kendall's rank correlations between multiple time series. Conduct a hypothesis test to determine which correlations are significantly different from zero. Load data on Canadian inflation and interest rates. load Data_Canada
Plot the Kendall's rank correlation coefficients between all pairs of variables. Identify which correlations are significantly different from zero by conducting hypothesis tests. corrplot(DataTable,Type="Kendall",TestR="on")
12-422
corrplot
The correlation coefficients highlighted in red indicate which pairs of variables have correlations significantly different from zero. For these time series, all pairs of variables have correlations significantly different from zero.
Conduct Right-Tailed Correlation Tests Test for correlations greater than zero between multiple time series. Load data on Canadian inflation and interest rates Data_Canada.mat. load Data_Canada
Return the pairwise Pearson's correlations and corresponding p-values for testing the null hypothesis of no correlation against the right-tailed alternative that the correlations are greater than zero. [R,PValue] = corrplot(DataTable,Tail="right");
12-423
12
Functions
PValue PValue=5×5 table INF_C __________ INF_C INF_G INT_S INT_M INT_L
1 1.8329e-18 1.6056e-08 3.3087e-08 8.1592e-08
INF_G __________
INT_S __________
INT_M __________
INT_L __________
1.8329e-18 1 2.3869e-05 4.7384e-05 8.1392e-05
1.6056e-08 2.3869e-05 1 1.1603e-27 6.7041e-20
3.3087e-08 4.7384e-05 1.1603e-27 1 2.5801e-32
8.1592e-08 8.1392e-05 6.7041e-20 2.5801e-32 1
The output PValue has pairwise p-values all less than the default 0.05 significance level, indicating that all pairs of variables have correlation significantly greater than zero.
Input Arguments X — Time series data numeric matrix Time series data, specified as a numObs-by-numVars numeric matrix. Each column of X corresponds to a variable, and each row corresponds to an observation. Data Types: double 12-424
corrplot
Tbl — Time series data table | timetable Time series data, specified as a table or timetable with numObs rows. Each row of Tbl is an observation. Specify numVars variables to include in the diagnostics computations by using the DataVariables argument. The selected variables must be numeric. ax — Axes on which to plot Axes object Axes on which to plot, specified as an Axes object. By default, corrplot plots to the current axes (gca). corrplot does not support UIAxes targets. Name-Value Arguments Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter. Before R2021a, use commas to separate each name and value, and enclose Name in quotes. Example: corrplot(Tbl,Type="Spearman",TestR="on",DataVariables=1:5) computes Spearman’s rank correlation coefficient for the first 5 variables of the table Tbl and tests for significant correlation coefficients. Type — Correlation coefficient "Pearson" (default) | "Kendall" | "Spearman" | character vector Correlation coefficient to compute, specified as a value in this table. Value
Description
"Pearson"
Pearson’s linear correlation coefficient
"Kendall"
Kendall’s rank correlation coefficient (τ)
"Spearman"
Spearman’s rank correlation coefficient (ρ)
Example: Type="Kendall" Data Types: char | string Rows — Option for handling rows in input time series data that contain NaN values "pairwise" (default) | "all" | "complete" | character vector Option for handling rows in the input time series data that contain NaN values, specified as a value in this table. Value
Description
"all"
Use all rows, regardless of any NaN entries.
"complete"
Use only rows that do not contain NaN entries.
12-425
12
Functions
Value
Description
"pairwise"
Use rows that do not contain NaN entries in column (variable) i or j to compute R(i,j).
Example: Rows="complete" Data Types: char | string Tail — Alternative hypothesis "both" (default) | "right" | "left" | character vector Alternative hypothesis Ha used to compute the p-values PValue, specified as a value in this table. Value
Description
"both"
Ha: Correlation is not zero.
"right"
Ha: Correlation is greater than zero.
"left"
Ha: Correlation is less than zero.
Example: Tail="left" Data Types: char | string VarNames — Unique variable names to use in plots string vector | character vector | cell vector of strings | cell vector of character vectors Unique variable names used in the plots, specified as a string vector or cell vector of strings of a length numVars. VarNames(j) specifies the name to use for variable X(:,j) or DataVariables(j). • If the input time series data is the matrix X, the default is {'var1','var2',...}. • If the input time series data is the table or timetable Tbl, the default is Tbl.Properties.VariableNames. Example: VarNames=["Const" "AGE" "BBD"] Data Types: char | cell | string TestR — Flag for testing whether correlations are significant "off" (default) | "on" | character vector Flag for testing whether correlations are significant, specified as a value in this table. Value
Description
"on"
corrplot highlights significant correlations in the correlation matrix plot using red font.
"off"
All correlations in the correlation matrix plot have black font.
Example: TestR="on" Data Types: char | string Alpha — Significance level 0.05 (default) | scalar in [0,1] 12-426
corrplot
Significance level for correlation tests, specified as a scalar in the interval [0,1]. Example: Alpha=0.01 Data Types: double DataVariables — Variables in Tbl all variables (default) | string vector | cell vector of character vectors | vector of integers | logical vector Variables in Tbl for which corrplot includes in the correlation matrix plot, specified as a string vector or cell vector of character vectors containing variable names in Tbl.Properties.VariableNames, or an integer or logical vector representing the indices of names. The selected variables must be numeric. Example: DataVariables=["GDP" "CPI"] Example: DataVariables=[true true true false] or DataVariables=1:3 selects the first through third table variables. Data Types: double | logical | char | cell | string
Output Arguments R — Correlations numeric matrix | table Correlations between pairs of variables in the input time series data that are displayed in the plots, returned as one of the following quantities: • numVars-by-numVars numeric matrix when you supply the input X. • numVars-by-numVars table when you supply the input Tbl, where numVars is the selected number of variables in the DataVariables argument. PValue — p-values numeric matrix | table p-values corresponding to significance tests on the elements of R, returned as one of the following quantities: • numVars-by-numVars numeric matrix when you supply the input X. • numVars-by-numVars table when you supply the input Tbl, where the variables specified by the DataVariables argument determines numVars and the names of the rows and columns of the output table. The p-values are used to test the null hypothesis of no correlation against the alternative hypothesis of a nonzero correlation, with test tail specified by the TestR argument. H — Handles to plotted graphics objects graphics array Handles to plotted graphics objects, returned as one of the following quantities: • numVars-by-numVars matrix of graphics objects when you supply the input X 12-427
12
Functions
• numVars-by-numVars table of graphics objects when you supply the input Tbl, where the variables specified by the DataVariables argument determines numVars and the names of the rows and columns of the output table H contains unique plot identifiers, which you can use to query or modify properties of the plot.
Tips • The setting Rows="pairwise" (the default) can return a correlation matrix that is not positive definite. The setting Rows="complete" returns a positive-definite matrix, but, in general, the estimates are based on fewer observations.
Algorithms • corrplot computes p-values for Pearson’s correlation by transforming the correlation to create a t-statistic with numObs – 2 degrees of freedom. The transformation is exact when the input time series data is normal. • corrplot computes p-values for Kendall’s and Spearman’s rank correlations by using either the exact permutation distributions (for small sample sizes) or large-sample approximations. • corrplot computes p-values for two-tailed tests by doubling the more significant of the two onetailed p-values.
Version History Introduced in R2012a R2022a: corrplot returns results in tables when you supply a table of data If you supply a table of time series data Tbl, corrplot returns all outputs in separate tables. Rows and variables in the tables correspond to the variables specified by DataVariables. Before R2022a, corrplot returned each output as a matrix when you supplied a table of input data. Starting in R2022a, if you supply a table of input data and return any of the outputs, access results by using table indexing. For more details, see “Access Data in Tables”.
See Also Apps Econometric Modeler Functions collintest | corr Topics “Time Series Regression II: Collinearity and Estimator Variance” on page 5-183 “Plot Time Series Data Using Econometric Modeler App” on page 4-66
12-428
crosscorr
crosscorr Sample cross-correlation
Syntax [xcf,lags] = crosscorr(y1,y2) XCFTbl = crosscorr(Tbl) [ ___ ,bounds] = crosscorr( ___ ) [ ___ ] = crosscorr( ___ ,Name=Value) crosscorr( ___ ) crosscorr(ax, ___ ) [ ___ ,h] = crosscorr( ___ )
Description [xcf,lags] = crosscorr(y1,y2) returns the sample cross-correlation function on page 12-440 (XCF) xcf and associated lags lags between the univariate time series y1 and y2. XCFTbl = crosscorr(Tbl) returns the table XCFTbl containing variables for the sample XCF and associated lags of the last two variables in the input table or timetable Tbl. To select different variables in Tbl, for which to compute the XCF, use the DataVariables name-value argument. [ ___ ,bounds] = crosscorr( ___ ) uses any input-argument combination in the previous syntaxes, and returns the output-argument combination for the corresponding input arguments and the approximate upper and lower confidence bounds bounds on the XCF. [ ___ ] = crosscorr( ___ ,Name=Value) uses additional options specified by one or more namevalue arguments. For example, crosscorr(Tbl,DataVariables=["RGDP" "CPI"],NumLags=10,NumSTD=1.96) returns the sample XCF for lags -10 through 10 of the table variables "RGDP" and "CPI" in Tbl and 95% confidence bounds. crosscorr( ___ ) plots the sample XCF between the input series with confidence bounds. crosscorr(ax, ___ ) plots on the axes specified by ax instead of the current axes (gca). ax can precede any of the input argument combinations in the previous syntaxes. [ ___ ,h] = crosscorr( ___ ) plots the sample XCF between the input series and additionally returns handles to plotted graphics objects. Use elements of h to modify properties of the plot after you create it.
Examples Compute XCF Between Vectors of Time Series Data Compute the XCF between two univariate time series. Input the time series data as numeric vectors. Load the equity index data Data_EquityIdx.mat. The variable Data is a 3028-by-2 matrix of daily closing prices from the NASDAQ and NYSE composite indices. Plot the two series. 12-429
12
Functions
load Data_EquityIdx yyaxis left dt = datetime(dates,ConvertFrom="datenum"); plot(dt,Data(:,1)) ylabel("NASDAQ") yyaxis right plot(dt,Data(:,2)) ylabel("NYSE") title("Daily Closing Prices, 1990-2001")
The series exhibit exponential growth. Compute the returns of each series. Ret = price2ret(Data);
Ret is a 3027-by-2 series of returns; it has one less observation than Data. Compute the XCF between the NASDAQ and NYSE returns, and return the associated lags. rnasdaq = Ret(:,1); rnyse = Ret(:,2); [xcf,lags] = crosscorr(rnasdaq,rnyse);
xcf and lags are 41-by-1 vectors that describe the XCF. Display several values of the XCF. 12-430
crosscorr
XCF = [xcf lags]; XCF([1:3 20:22 end-2:end],:) ans = 9×2 -0.0108 0.0186 -0.0002 0.0345 0.7080 0.0651 -0.0461 0.0010 0.0015
-20.0000 -19.0000 -18.0000 -1.0000 0 1.0000 18.0000 19.0000 20.0000
The correlation between the current NASDAQ return and the NYSE return from 20 days before is xcf(1) = -0.0108. The correlation between the NASDAQ and NYSE returns is xcf(21) = 0.7080. The correlation between the NASDAQ return from 20 days ago and the current NYSE return is xcf(41) = 0.0015.
Compute XCF of Table Variable Compute the XCF between two univariate time series, which are two variables in a table. Load the equity index data Data_EquityIdx.mat. The variable DataTable is a 3028-by-2 table of daily closing prices from the NYSE and NASDAQ composite indices, which are stored in the variables NYSE and NASDAQ. load Data_EquityIdx DataTable.Properties.VariableNames ans = 1x2 cell {'NYSE'}
{'NASDAQ'}
Compute the returns of the series. Store the results in a new table. RetTbl = price2ret(DataTable); head(RetTbl) Tick ____ 2 3 4 5 6 7 8 9
Interval ________ 1 1 1 1 1 1 1 1
NYSE __________
NASDAQ __________
-0.0010106 -0.0076633 -0.0084415 0.0035387 -0.010188 -0.0063818 0.0034295 -0.023407
0.0034122 -0.0032816 -0.0025501 0.0010688 -0.0042382 -0.013378 -0.0040909 -0.020573
RetTbl is a 3027-by-4 table containing the returns of the indices, ticks (days by default), and time intervals between successive prices. 12-431
12
Functions
Compute the XCF between the NASDAQ and NYSE return series. XCFTbl = crosscorr(RetTbl) XCFTbl=41×2 table Lags XCF ____ ___________ -20 -19 -18 -17 -16 -15 -14 -13 -12 -11 -10 -9 -8 -7 -6 -5 ⋮
-0.010809 0.018571 -0.00016185 -0.020271 -0.029353 0.00023188 -0.0080616 0.041498 0.078821 -0.013793 0.0076655 0.01763 -0.0011033 -0.011457 -0.016523 -0.046749
crosscorr returns the results in the table XCFTbl, where variables correspond to the XCF (XCF) and associated lags (Lags). By default, crosscorr computes the XCF of the two variables in the table. To select variables from an input table, set the DataVariables option.
Return XCF Confidence Bounds Consider the equity index series in “Compute XCF of Table Variable” on page 12-431. Load the NYSE and NASDAQ closing price series in Data_EquityIdx.mat and preprocess the series. Compute the XCF and return the XCF confidence bounds. load Data_EquityIdx RetTbl = price2ret(DataTable); [XCFTbl,bounds] = crosscorr(RetTbl) XCFTbl=41×2 table Lags XCF ____ ___________ -20 -19 -18 -17 -16 -15 -14
12-432
-0.010809 0.018571 -0.00016185 -0.020271 -0.029353 0.00023188 -0.0080616
crosscorr
-13 -12 -11 -10 -9 -8 -7 -6 -5 ⋮
0.041498 0.078821 -0.013793 0.0076655 0.01763 -0.0011033 -0.011457 -0.016523 -0.046749
bounds = 2×1 0.0364 -0.0364
Assuming the NYSE and NASDAQ return series are uncorrelated, an approximate 95.4% confidence interval on the XCF is (-0.0364, 0.0364).
Plot the XCF Generate 100 random variates from a Gaussian distribution with mean 0 and variance 1. rng(3); % For reproducibility x = randn(100,1);
Create a 4-period delayed version of x. y = lagmatrix(x,4);
Plot the XCF between x and y. Because lagmatrix prepends lagged series with NaN values and crosscorr does not support NaN values, start the series at observation 5. crosscorr(x(5:end),y(5:end))
12-433
12
Functions
The upper and lower confidence bounds are the horizontal lines in the XCF plot. By design, the XCF peaks at lag 4.
Select Table Variables for XCF Plot Load the currency exchange rates data set Data_FXRates.mat. The table DataTable contains daily exchange rates of several countries, relative to the US dollar from 1980 through 1998 (with omissions). load Data_FXRates.mat dt = datetime(dates,ConvertFrom="datenum");
Plot the UK pound and French franc exchange rates. yyaxis left plot(dt,DataTable.GBP) ylabel("UK Pound/$") yyaxis right plot(dt,DataTable.FRF) ylabel("French Franc/$")
12-434
crosscorr
The series appear to be correlated. Stabilize all series in the table by computing the first difference. DiffDT = varfun(@diff,DataTable); DiffDT.Properties.VariableNames = DataTable.Properties.VariableNames;
Determine whether lags of one series are associated with the other series by computing the XCF between the daily changes in the UK pound and French franc exchange rates. figure crosscorr(DiffDT,DataVariables=["GBP" "FRF"]);
12-435
12
Functions
The series have a high contemporaneous correlation, but all other cross-correlations are either insignificant or below 0.1.
Specify Additional XCF Lags Specify the AR(1) model for the first series y1t = 2 + 0 . 3y1t − 1 + εt, where εt is Gaussian with mean 0 and variance 1. MdlY1 = arima(AR=0.3,Constant=2,Variance=1);
MdlY1 is a fully specified arima object representing the AR(1) model. Simulate data from the AR(1) model. rng(3); % For reproducibility T = 1000; y1 = simulate(MdlY1,T);
Simulate standard Gaussian variates for the second series; induce correlation at lag 36. y2 = [randn(36,1); y1(1:end-36) + randn(T-36,1)*0.1];
12-436
crosscorr
Plot the XCF by using the default settings. crosscorr(y1,y2)
All correlations in the plot are within the 2-standard-error confidence bounds. Therefore, none are significant. Plot the XCF for 60 lags on both sides of lag 0. Specify 3 standard errors for the confidence bounds. crosscorr(y1,y2,NumLags=60,NumSTD=3)
12-437
12
Functions
The plot shows significant correlations at and around lag 36.
Input Arguments y1 — Univariate time series data numeric vector Univariate time series data, specified as a numeric vector of length T1. Data Types: double y2 — Univariate time series data numeric vector Univariate time series data, specified as a numeric vector of length T2. Data Types: double Tbl — Time series data table | timetable Time series data, specified as a table or timetable with T rows. Each row of Tbl contains contemporaneous observations of all variables. Specify the two input series (variables) by using the DataVariables argument. The selected variables must be numeric. 12-438
crosscorr
ax — Axes on which to plot Axes object Axes on which to plot, specified as an Axes object. By default, crosscorr plots to the current axes (gca). Note Missing observations, specified by NaN entries in the input series, result in a NaN-valued XCF. Name-Value Arguments Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter. Before R2021a, use commas to separate each name and value, and enclose Name in quotes. Example: crosscorr(Tbl,DataVariables=["RGDP" "CPI"],NumLags=10,NumSTD=1.96) returns the sample XCF for lags -10 through 10 of the table variables "RGDP" and "CPI" in Tbl and 95% confidence bounds. NumLags — Number of lags positive integer Number of lags in the sample XCF, specified as a positive integer. crosscorr uses lags 0, ±1, ±2, …, ±NumLags to compute the sample XCF. If you supply y1 and y2, the default is min(20, min(T1,T2) – 1)). If you supply Tbl, the default is min(20, T – 1). Example: crosscorr(y1,y2,NumLags=10) plots the sample XCF between y1 and y2 for lags –10 through 10. Data Types: double NumSTD — Number of standard errors in confidence bounds 2 (default) | nonnegative scalar Number of standard errors in the confidence bounds, specified as a nonnegative scalar. The confidence bounds are 0 ± NumSTD*σ , where σ is the estimated standard error of the sample crosscorrelation between the input series assuming the series are uncorrelated. The default yields approximate 95% confidence bounds. Example: crosscorr(y1,y2,NumSTD=1.5) plots the XCF of y1 and y2 with confidence bounds 1.5 standard errors away from 0. Data Types: double DataVariables — Two variables in Tbl last two variables (default) | string vector | cell vector of character vectors | vector of integers | logical vector Two variables in Tbl for which crosscorr computes the XCF, specified as a string vector or cell vector of character vectors containing two variable names in Tbl.Properties.VariableNames, or 12-439
12
Functions
an integer or logical vector representing the indices of two names. The selected variables must be numeric. Example: DataVariables=["GDP" "CPI"] Example: DataVariables=[true true false false] or DataVariables=[1 2] selects the first and second table variables. Data Types: double | logical | char | string
Output Arguments xcf — Sample XCF numeric vector Sample XCF between the input time series, returned as a numeric vector of length 2*NumLags + 1. The elements of xcf correspond to the elements of lags. The center element is the lag 0 crosscorrelation. crosscorr returns xcf only when you supply the inputs y1 and y2. lags — XCF lags numeric vector XCF lags, returned as a numeric vector with elements (-NumLags):NumLags having the same orientation as y1. crosscorr returns lags only when you supply the inputs y1 and y2. XCFTbl — Sample XCF table Sample XCF, returned as a table with variables for the outputs xcf and lags. crosscorr returns XCFTbl only when you supply the input Tbl. bounds — Approximate upper and lower XCF confidence bounds numeric vector Approximate upper and lower XCF confidence bounds assuming the input series are uncorrelated, returned as a two-element numeric vector. The NumSTD option specifies the number of standard errors from 0 in the confidence bounds. h — Handles to plotted graphics objects graphics array Handles to plotted graphics objects, returned as a graphics array. h contains unique plot identifiers, which you can use to query or modify properties of the plot.
More About Cross-Correlation Function The cross-correlation function (XCF) measures the similarity between a time series and lagged versions of another time series as a function of the lag. Consider the time series y1,t and y2,t and lags k = 0, ±1, ±2, …. For data pairs (y1,1,y2,1), (y1,2,y2,2), …, (y1,T,y2,T), an estimate of the lag k cross-covariance is 12-440
crosscorr
T−k
1 T cy1y2(k) =
∑
y1, t − y 1 y2, t + k − y 2 ; k = 0, 1, 2, …
t=1
,
T+k
1 T
∑
y2, t − y 2 y1, t − k − y 1 ; k = 0, − 1, − 2, …
t=1
where y 1 and y 2 are the sample means of the series. The sample standard deviations of the series are: • sy = cy y (0), where cy y (0) = Var(y1) . 1 1 1 1 1 • sy = cy y (0), where cy y (0) = Var(y2) . 2 2 2 2 2 An estimate of the cross-correlation is r y1y2(k) =
cy1y2(k) sy1sy2
; k = 0, ± 1, ± 2, ….
Algorithms • If y1 and y2 have different lengths, crosscorr appends enough zeros to the end of the shorter vector to make both vectors the same size. • crosscorr uses a Fourier transform (fft) to compute the XCF in the frequency domain, and then crosscorr converts back to the time domain using an inverse Fourier transform (ifft). • NaN values in the input series result in NaN values in the output XCF. Unlike autocorr and parcorr, crosscorr does not treat NaN values as missing completely at random. Whereas autocorr and parcorr compute coefficients in the time domain, crosscorr uses fft and ifft to compute coefficients in the frequency domain. Therefore, missing data treatments follow fft and ifft defaults. • crosscorr plots the XCF when you do not request any output or when you request the fourth output.
Version History Introduced before R2006a
References [1] Box, George E. P., Gwilym M. Jenkins, and Gregory C. Reinsel. Time Series Analysis: Forecasting and Control. 3rd ed. Englewood Cliffs, NJ: Prentice Hall, 1994.
See Also autocorr | parcorr | filter
12-441
12
Functions
diffuseblm Bayesian linear regression model with diffuse conjugate prior for data likelihood
Description The Bayesian linear regression model on page 12-451 object diffuseblm specifies that the joint prior distribution of (β,σ2) is proportional to 1/σ2 (the diffuse prior model). The data likelihood is
T
∏
t=1
ϕ yt; xt β, σ2 , where ϕ(yt;xtβ,σ2) is the Gaussian probability density
evaluated at yt with mean xtβ and variance σ2. The resulting marginal and conditional posterior distributions are analytically tractable. For details on the posterior distribution, see “Analytically Tractable Posteriors” on page 6-5. In general, when you create a Bayesian linear regression model object, it specifies the joint prior distribution and characteristics of the linear regression model only. That is, the model object is a template intended for further use. Specifically, to incorporate data into the model for posterior distribution analysis, pass the model object and data to the appropriate object function on page 12443.
Creation Syntax PriorMdl = diffuseblm(NumPredictors) PriorMdl = diffuseblm(NumPredictors,Name,Value) Description PriorMdl = diffuseblm(NumPredictors) creates a Bayesian linear regression model on page 12-451 object (PriorMdl) composed of NumPredictors predictors and an intercept, and sets the NumPredictors property. The joint prior distribution of (β, σ2) is the diffuse model. PriorMdl is a template that defines the prior distributions and the dimensionality of β. PriorMdl = diffuseblm(NumPredictors,Name,Value) sets properties on page 12-442 (except NumPredictors) using name-value pair arguments. Enclose each property name in quotes. For example, diffuseblm(2,'VarNames',["UnemploymentRate"; "CPI"]) specifies the names of the two predictor variables in the model.
Properties You can set writable property values when you create the model object by using name-value argument syntax, or after you create the model object by using dot notation. For example, to exclude an intercept from the model, enter PriorMdl.Intercept = false;
12-442
diffuseblm
NumPredictors — Number of predictor variables nonnegative integer Number of predictor variables in the Bayesian multiple linear regression model, specified as a nonnegative integer. NumPredictors must be the same as the number of columns in your predictor data, which you specify during model estimation or simulation. When specifying NumPredictors, exclude any intercept term from the value. After creating a model, if you change the value of NumPredictors using dot notation, then VarNames reverts to its default value. Data Types: double Intercept — Flag for including regression model intercept true (default) | false Flag for including a regression model intercept, specified as a value in this table. Value
Description
false
Exclude an intercept from the regression model. Therefore, β is a p-dimensional vector, where p is the value of NumPredictors.
true
Include an intercept in the regression model. Therefore, β is a (p + 1)-dimensional vector. This specification causes a T-by-1 vector of ones to be prepended to the predictor data during estimation and simulation.
If you include a column of ones in the predictor data for an intercept term, then set Intercept to false. Example: 'Intercept',false Data Types: logical VarNames — Predictor variable names string vector | cell vector of character vectors Predictor variable names for displays, specified as a string vector or cell vector of character vectors. VarNames must contain NumPredictors elements. VarNames(j) is the name of the variable in column j of the predictor data set, which you specify during estimation, simulation, or forecasting. The default is {'Beta(1)','Beta(2),...,Beta(p)}, where p is the value of NumPredictors. Example: 'VarNames',["UnemploymentRate"; "CPI"] Data Types: string | cell | char
Object Functions estimate
Estimate posterior distribution of Bayesian linear regression model parameters
12-443
12
Functions
simulate forecast plot summarize
Simulate regression coefficients and disturbance variance of Bayesian linear regression model Forecast responses of Bayesian linear regression model Visualize prior and posterior densities of Bayesian linear regression model parameters Distribution summary statistics of standard Bayesian linear regression model
Examples Create Diffuse Prior Model Consider the multiple linear regression model that predicts U.S. real gross national product (GNPR) using a linear combination of industrial production index (IPI), total employment (E), and real wages (WR). GNPRt = β0 + β1IPIt + β2Et + β3WRt + εt . For all t time points, εt is a series of independent Gaussian disturbances with a mean of 0 and variance σ2. Suppose that the regression coefficients β = [β0, . . . , β3]′ and the disturbance variance σ2 are random variables, and you have no prior knowledge of their values or distribution. That is, you want to use the noninformative Jeffreys prior: the joint prior distribution is proportional to 1/σ2. These assumptions and the data likelihood imply an analytically tractable posterior distribution. Create a diffuse prior model for the linear regression parameters, which is the default prior model type. Specify the number of predictors p. p = 3; Mdl = bayeslm(p) Mdl = diffuseblm with properties: NumPredictors: 3 Intercept: 1 VarNames: {4x1 cell} | Mean Std CI95 Positive Distribution ----------------------------------------------------------------------------Intercept | 0 Inf [ NaN, NaN] 0.500 Proportional to one Beta(1) | 0 Inf [ NaN, NaN] 0.500 Proportional to one Beta(2) | 0 Inf [ NaN, NaN] 0.500 Proportional to one Beta(3) | 0 Inf [ NaN, NaN] 0.500 Proportional to one Sigma2 | Inf Inf [ NaN, NaN] 1.000 Proportional to 1/Sigma2
Mdl is a diffuseblm Bayesian linear regression model object representing the prior distribution of the regression coefficients and disturbance variance. At the command window, bayeslm displays a summary of the prior distributions. Because the prior is noninformative and the data have not been incorporated yet, the summary is trivial. 12-444
diffuseblm
You can set writable property values of created models using dot notation. Set the regression coefficient names to the corresponding variable names. Mdl.VarNames = ["IPI" "E" "WR"] Mdl = diffuseblm with properties: NumPredictors: 3 Intercept: 1 VarNames: {4x1 cell} | Mean Std CI95 Positive Distribution ----------------------------------------------------------------------------Intercept | 0 Inf [ NaN, NaN] 0.500 Proportional to one IPI | 0 Inf [ NaN, NaN] 0.500 Proportional to one E | 0 Inf [ NaN, NaN] 0.500 Proportional to one WR | 0 Inf [ NaN, NaN] 0.500 Proportional to one Sigma2 | Inf Inf [ NaN, NaN] 1.000 Proportional to 1/Sigma2
Estimate Marginal Posterior Distributions Consider the linear regression model in “Create Diffuse Prior Model” on page 12-444. Create a diffuse prior model for the linear regression parameters. Specify the number of predictors, p, and the names of the regression coefficients. p = 3; PriorMdl = bayeslm(p,'ModelType','diffuse','VarNames',["IPI" "E" "WR"]);
Load the Nelson-Plosser data set. Create variables for the response and predictor series. load Data_NelsonPlosser X = DataTable{:,PriorMdl.VarNames(2:end)}; y = DataTable{:,'GNPR'};
Estimate the marginal posterior distributions of β and σ2. PosteriorMdl = estimate(PriorMdl,X,y); Method: Analytic posterior distributions Number of observations: 62 Number of predictors: 4 | Mean Std CI95 Positive Distribution -----------------------------------------------------------------------------------Intercept | -24.2536 9.5314 [-43.001, -5.506] 0.006 t (-24.25, 9.37^2, 58) IPI | 4.3913 0.1535 [ 4.089, 4.693] 1.000 t (4.39, 0.15^2, 58) E | 0.0011 0.0004 [ 0.000, 0.002] 0.999 t (0.00, 0.00^2, 58) WR | 2.4682 0.3787 [ 1.723, 3.213] 1.000 t (2.47, 0.37^2, 58) Sigma2 | 51.9790 10.0034 [35.965, 74.937] 1.000 IG(29.00, 0.00069)
12-445
12
Functions
PosteriorMdl is a conjugateblm model object storing the joint marginal posterior distribution of β and σ2 given the data. estimate displays a summary of the marginal posterior distributions to the command window. Rows of the summary correspond to regression coefficients and the disturbance variance, and columns to characteristics of the posterior distribution. The characteristics include: • CI95, which contains the 95% Bayesian equitailed credible intervals for the parameters. For example, the posterior probability that the regression coefficient of WR is in [1.723, 3.213] is 0.95. • Positive, which contains the posterior probability that the parameter is greater than 0. For example, the probability that the intercept is greater than 0 is 0.006. • Distribution, which contains descriptions of the posterior distributions of the parameters. For example, the marginal posterior distribution of IPI is t with a mean of 4.39, a standard deviation of 0.15, and 58 degrees of freedom. Access properties of the posterior distribution using dot notation. For example, display the marginal posterior means by accessing the Mu property. PosteriorMdl.Mu ans = 4×1 -24.2536 4.3913 0.0011 2.4682
Estimate Conditional Posterior Distribution Consider the linear regression model in “Create Diffuse Prior Model” on page 12-444. Create a diffuse prior model for the linear regression parameters. Specify the number of predictors p, and the names of the regression coefficients. p = 3; PriorMdl = bayeslm(p,'ModelType','diffuse','VarNames',["IPI" "E" "WR"]) PriorMdl = diffuseblm with properties: NumPredictors: 3 Intercept: 1 VarNames: {4x1 cell} | Mean Std CI95 Positive Distribution ----------------------------------------------------------------------------Intercept | 0 Inf [ NaN, NaN] 0.500 Proportional to one IPI | 0 Inf [ NaN, NaN] 0.500 Proportional to one E | 0 Inf [ NaN, NaN] 0.500 Proportional to one WR | 0 Inf [ NaN, NaN] 0.500 Proportional to one Sigma2 | Inf Inf [ NaN, NaN] 1.000 Proportional to 1/Sigma2
Load the Nelson-Plosser data set. Create variables for the response and predictor series. 12-446
diffuseblm
load Data_NelsonPlosser X = DataTable{:,PriorMdl.VarNames(2:end)}; y = DataTable{:,'GNPR'};
Estimate the conditional posterior distribution of β given the data and σ2 = 2, and return the estimation summary table to access the estimates. [Mdl,Summary] = estimate(PriorMdl,X,y,'Sigma2',2); Method: Analytic posterior distributions Conditional variable: Sigma2 fixed at 2 Number of observations: 62 Number of predictors: 4 | Mean Std CI95 Positive Distribution -------------------------------------------------------------------------------Intercept | -24.2536 1.8696 [-27.918, -20.589] 0.000 N (-24.25, 1.87^2) IPI | 4.3913 0.0301 [ 4.332, 4.450] 1.000 N (4.39, 0.03^2) E | 0.0011 0.0001 [ 0.001, 0.001] 1.000 N (0.00, 0.00^2) WR | 2.4682 0.0743 [ 2.323, 2.614] 1.000 N (2.47, 0.07^2) Sigma2 | 2 0 [ 2.000, 2.000] 1.000 Fixed value
estimate displays a summary of the conditional posterior distribution of β. Because σ2 is fixed at 2 during estimation, inferences on it are trivial. Extract the mean vector and covariance matrix of the conditional posterior of β from the estimation summary table. condPostMeanBeta = Summary.Mean(1:(end - 1)) condPostMeanBeta = 4×1 -24.2536 4.3913 0.0011 2.4682 CondPostCovBeta = Summary.Covariances(1:(end - 1),1:(end - 1)) CondPostCovBeta = 4×4 3.4956 0.0350 -0.0001 0.0241
0.0350 0.0009 -0.0000 -0.0013
-0.0001 -0.0000 0.0000 -0.0000
0.0241 -0.0013 -0.0000 0.0055
Display Mdl. Mdl Mdl = diffuseblm with properties: NumPredictors: 3 Intercept: 1
12-447
12
Functions
VarNames: {4x1 cell} | Mean Std CI95 Positive Distribution ----------------------------------------------------------------------------Intercept | 0 Inf [ NaN, NaN] 0.500 Proportional to one IPI | 0 Inf [ NaN, NaN] 0.500 Proportional to one E | 0 Inf [ NaN, NaN] 0.500 Proportional to one WR | 0 Inf [ NaN, NaN] 0.500 Proportional to one Sigma2 | Inf Inf [ NaN, NaN] 1.000 Proportional to 1/Sigma2
Because estimate computes the conditional posterior distribution, it returns the original prior model, not the posterior, in the first position of the output argument list.
Estimate Posterior Probability Using Monte Carlo Simulation Consider the linear regression model in “Estimate Marginal Posterior Distributions” on page 12-445. Create a prior model for the regression coefficients and disturbance variance, then estimate the marginal posterior distributions. p = 3; PriorMdl = bayeslm(p,'ModelType','diffuse','VarNames',["IPI" "E" "WR"]); load Data_NelsonPlosser X = DataTable{:,PriorMdl.VarNames(2:end)}; y = DataTable{:,'GNPR'}; PosteriorMdl = estimate(PriorMdl,X,y); Method: Analytic posterior distributions Number of observations: 62 Number of predictors: 4 | Mean Std CI95 Positive Distribution -----------------------------------------------------------------------------------Intercept | -24.2536 9.5314 [-43.001, -5.506] 0.006 t (-24.25, 9.37^2, 58) IPI | 4.3913 0.1535 [ 4.089, 4.693] 1.000 t (4.39, 0.15^2, 58) E | 0.0011 0.0004 [ 0.000, 0.002] 0.999 t (0.00, 0.00^2, 58) WR | 2.4682 0.3787 [ 1.723, 3.213] 1.000 t (2.47, 0.37^2, 58) Sigma2 | 51.9790 10.0034 [35.965, 74.937] 1.000 IG(29.00, 0.00069)
Extract the posterior mean of β from the posterior model, and the posterior covariance of β from the estimation summary returned by summarize. estBeta = PosteriorMdl.Mu; Summary = summarize(PosteriorMdl); estBetaCov = Summary.Covariances{1:(end - 1),1:(end - 1)};
Suppose that if the coefficient of real wages is below 2.5, then a policy is enacted. Although the posterior distribution of WR is known, and you can calculate probabilities directly, you can estimate the probability using Monte Carlo simulation instead. 12-448
diffuseblm
Draw 1e6 samples from the marginal posterior distribution of β. NumDraws = 1e6; rng(1); BetaSim = simulate(PosteriorMdl,'NumDraws',NumDraws);
BetaSim is a 4-by- 1e6 matrix containing the draws. Rows correspond to the regression coefficient and columns to successive draws. Isolate the draws corresponding to the coefficient of real wages, and then identify which draws are less than 2.5. isWR = PosteriorMdl.VarNames == "WR"; wrSim = BetaSim(isWR,:); isWRLT2p5 = wrSim < 2.5;
Find the marginal posterior probability that the regression coefficient of WR is below 2.5 by computing the proportion of draws that are less than 2.5. probWRLT2p5 = mean(isWRLT2p5) probWRLT2p5 = 0.5341
The posterior probability that the coefficient of real wages is less than 2.5 is about 0.53. The marginal posterior distribution of the coefficient of WR is a t58, but centered at 2.47 and scaled by 0.37. Directly compute the posterior probability that the coefficient of WR is less than 2.5. center = estBeta(isWR); stdBeta = sqrt(diag(estBetaCov)); scale = stdBeta(isWR); t = (2.5 - center)/scale; dof = 68; directProb = tcdf(t,dof) directProb = 0.5333
The posterior probabilities are nearly identical.
Forecast Responses Using Posterior Predictive Distribution Consider the linear regression model in “Estimate Marginal Posterior Distributions” on page 12-445. Create a prior model for the regression coefficients and disturbance variance, then estimate the marginal posterior distributions. Hold out the last 10 periods of data from estimation so you can use them to forecast real GNP. Turn the estimation display off. p = 3; PriorMdl = bayeslm(p,'ModelType','diffuse','VarNames',["IPI" "E" "WR"]); load Data_NelsonPlosser fhs = 10; % Forecast horizon size X = DataTable{1:(end - fhs),PriorMdl.VarNames(2:end)}; y = DataTable{1:(end - fhs),'GNPR'}; XF = DataTable{(end - fhs + 1):end,PriorMdl.VarNames(2:end)}; % Future predictor data
12-449
12
Functions
yFT = DataTable{(end - fhs + 1):end,'GNPR'};
% True future responses
PosteriorMdl = estimate(PriorMdl,X,y,'Display',false);
Forecast responses using the posterior predictive distribution and the future predictor data XF. Plot the true values of the response and the forecasted values. yF = forecast(PosteriorMdl,XF); figure; plot(dates,DataTable.GNPR); hold on plot(dates((end - fhs + 1):end),yF) h = gca; hp = patch([dates(end - fhs + 1) dates(end) dates(end) dates(end - fhs + 1)],... h.YLim([1,1,2,2]),[0.8 0.8 0.8]); uistack(hp,'bottom'); legend('Forecast Horizon','True GNPR','Forecasted GNPR','Location','NW') title('Real Gross National Product'); ylabel('rGNP'); xlabel('Year'); hold off
yF is a 10-by-1 vector of future values of real GNP corresponding to the future predictor data. Estimate the forecast root mean squared error (RMSE). frmse = sqrt(mean((yF - yFT).^2))
12-450
diffuseblm
frmse = 25.5489
The forecast RMSE is a relative measure of forecast accuracy. Specifically, you estimate several models using different assumptions. The model with the lowest forecast RMSE is the best-performing model of the ones being compared. Copyright 2018 The MathWorks, Inc.
More About Bayesian Linear Regression Model A Bayesian linear regression model treats the parameters β and σ2 in the multiple linear regression (MLR) model yt = xtβ + εt as random variables. For times t = 1,...,T: • yt is the observed response. • xt is a 1-by-(p + 1) row vector of observed values of p predictors. To accommodate a model intercept, x1t = 1 for all t. • β is a (p + 1)-by-1 column vector of regression coefficients corresponding to the variables that compose the columns of xt. • εt is the random disturbance with a mean of zero and Cov(ε) = σ2IT×T, while ε is a T-by-1 vector containing all disturbances. These assumptions imply that the data likelihood is ℓ β, σ2 y, x =
T
∏
t=1
ϕ yt; xt β, σ2 .
ϕ(yt;xtβ,σ2) is the Gaussian probability density with mean xtβ and variance σ2 evaluated at yt;. Before considering the data, you impose a joint prior distribution assumption on (β,σ2). In a Bayesian analysis, you update the distribution of the parameters by using information about the parameters obtained from the likelihood of the data. The result is the joint posterior distribution of (β,σ2) or the conditional posterior distributions of the parameters.
Alternatives The bayeslm function can create any supported prior model object for Bayesian linear regression.
Version History Introduced in R2017a
See Also Objects semiconjugateblm | conjugateblm | customblm | empiricalblm Functions bayeslm 12-451
12
Functions
Topics “Bayesian Linear Regression” on page 6-2 “Implement Bayesian Linear Regression” on page 6-10
12-452
diffusebvarm
diffusebvarm Bayesian vector autoregression (VAR) model with diffuse prior for data likelihood
Description The Bayesian VAR model on page 12-466 object diffusebvarm specifies the joint prior distribution of the array of model coefficients Λ and the innovations covariance matrix Σ of an m-D VAR(p) model. The joint prior distribution (Λ,Σ) is the diffuse model on page 12-2101. A diffuse prior model does not enable you to specify hyperparameter values for coefficient sparsity; all AR lags in the model are weighted equally. To implement Minnesota regularization, create a conjugate, semiconjugate, or normal prior model by using bayesvarm. In general, when you create a Bayesian VAR model object, it specifies the joint prior distribution and characteristics of the VARX model only. That is, the model object is a template intended for further use. Specifically, to incorporate data into the model for posterior distribution analysis, pass the model object and data to the appropriate object function on page 12-457.
Creation Syntax PriorMdl = diffusebvarm(numseries,numlags) PriorMdl = diffusebvarm(numseries,numlags,Name,Value) Description To create a diffusebvarm object, use either the diffusebvarm function (described here) or the bayesvarm function. PriorMdl = diffusebvarm(numseries,numlags) creates a numseries-D Bayesian VAR(numlags) model object PriorMdl, which specifies dimensionalities and prior assumptions for all model coefficients λ = vec Λ = vec Φ1 Φ2 ⋯ Φp c δ Β ′ and the innovations covariance Σ, where: • numseries = m, the number of response time series variables. • numlags = p, the AR polynomial order. • The joint prior distribution of (λ,Σ) is the diffuse model on page 12-467. PriorMdl = diffusebvarm(numseries,numlags,Name,Value) sets writable properties on page 12-454 (except NumSeries and P) using name-value pair arguments. Enclose each property name in quotes. For example, diffusebvarm(3,2,'SeriesNames',["UnemploymentRate" "CPI" "FEDFUNDS"]) specifies the names of the three response variables in the Bayesian VAR(2) model.
12-453
12
Functions
Input Arguments numseries — Number of time series m 1 (default) | positive integer Number of time series m, specified as a positive integer. numseries specifies the dimensionality of the multivariate response variable yt and innovation εt. numseries sets the NumSeries property. Data Types: double numlags — Number of lagged responses nonnegative integer Number of lagged responses in each equation of yt, specified as a nonnegative integer. The resulting model is a VAR(numlags) model; each lag has a numseries-by-numseries coefficient matrix. numlags sets the P property. Data Types: double
Properties You can set writable property values when you create the model object by using name-value argument syntax, or after you create the model object by using dot notation. For example, to create a 3-D Bayesian VAR(1) model and label the first through third response variables, and then include a linear time trend term, enter: PriorMdl = diffusebvarm(3,1,'SeriesNames',["UnemploymentRate" "CPI" "FEDFUNDS"]); PriorMdl.IncludeTrend = true; Model Characteristics and Dimensionality
Description — Model description string scalar | character vector Model description, specified as a string scalar or character vector. The default value describes the model dimensionality, for example '2-Dimensional VAR(3) Model'. Example: "Model 1" Data Types: string | char NumSeries — Number of time series m positive integer This property is read-only. Number of time series m, specified as a positive integer. NumSeries specifies the dimensionality of the multivariate response variable yt and innovation εt. Data Types: double P — Multivariate autoregressive polynomial order nonnegative integer This property is read-only. 12-454
diffusebvarm
Multivariate autoregressive polynomial order, specified as a nonnegative integer. P is the maximum lag that has a nonzero coefficient matrix. P specifies the number of presample observations required to initialize the model. Data Types: double SeriesNames — Response series names string vector | cell array of character vectors Response series names, specified as a NumSeries length string vector. The default is ['Y1' 'Y2' ... 'YNumSeries']. diffusebvarm stores SeriesNames as a string vector. Example: ["UnemploymentRate" "CPI" "FEDFUNDS"] Data Types: string IncludeConstant — Flag for including model constant c true (default) | false Flag for including a model constant c, specified as a value in this table. Value
Description
false
Response equations do not include a model constant.
true
All response equations contain a model constant.
Data Types: logical IncludeTrend — Flag for including linear time trend term δt false (default) | true Flag for including a linear time trend term δt, specified as a value in this table. Value
Description
false
Response equations do not include a linear time trend term.
true
All response equations contain a linear time trend term.
Data Types: logical NumPredictors — Number of exogenous predictor variables in model regression component 0 (default) | nonnegative integer Number of exogenous predictor variables in the model regression component, specified as a nonnegative integer. diffusebvarm includes all predictor variables symmetrically in each response equation. VAR Model Parameters Derived from Distribution Hyperparameters
AR — Distribution mean of autoregressive coefficient matrices Φ1,…,Φp cell vector of numeric matrices 12-455
12
Functions
This property is read-only. Distribution mean of the autoregressive coefficient matrices Φ1,…,Φp associated with the lagged responses, specified as a P-D cell vector of NumSeries-by-NumSeries numeric matrices. AR{j} is Φj, the coefficient matrix of lag j. Rows correspond to equations and columns correspond to lagged response variables; SeriesNames determines the order of response variables and equations. Coefficient signs are those of the VAR model expressed in difference-equation notation. If P = 0, AR is an empty cell. Otherwise, AR is the collection of AR coefficient means extracted from Mu. Data Types: cell Constant — Distribution mean of model constant c numeric vector This property is read-only. Distribution mean of the model constant c (or intercept), specified as a NumSeries-by-1 numeric vector. Constant(j) is the constant in equation j; SeriesNames determines the order of equations. If IncludeConstant = false, Constant is an empty array. Otherwise, Constant is the model constant vector mean extracted from Mu. Data Types: double Trend — Distribution mean of linear time trend δ numeric vector This property is read-only. Distribution mean of the linear time trend δ, specified as a NumSeries-by-1 numeric vector. Trend(j) is the linear time trend in equation j; SeriesNames determines the order of equations. If IncludeTrend = false (the default), Trend is an empty array. Otherwise, Trend is the linear time trend coefficient mean extracted from Mu. Data Types: double Beta — Distribution mean of regression coefficient matrix Β numeric matrix This property is read-only. Distribution mean of the regression coefficient matrix B associated with the exogenous predictor variables, specified as a NumSeries-by-NumPredictors numeric matrix. Beta(j,:) contains the regression coefficients of each predictor in the equation of response variable j yj,t. Beta(:,k) contains the regression coefficient in each equation of predictor xk. By default, all predictor variables are in the regression component of all response equations. You can down-weight a predictor from an equation by specifying, for the corresponding coefficient, a prior mean of 0 in Mu and a small variance in V. When you create a model, the predictor variables are hypothetical. You specify predictor data when you operate on the model (for example, when you estimate the posterior by using estimate). Columns of the predictor data determine the order of the columns of Beta. 12-456
diffusebvarm
Data Types: double Covariance — Distribution mean of innovations covariance matrix Σ nan(NumSeries,NumSeries) Distribution mean of the innovations covariance matrix Σ of the NumSeries innovations at each time t = 1,...,T, specified as a NumSeries-by-NumSeries matrix of NaN. Because the prior model is diffuse, the mean of Σ is unknown, a priori.
Object Functions estimate forecast simsmooth simulate summarize
Estimate posterior distribution of Bayesian vector autoregression (VAR) model parameters Forecast responses from Bayesian vector autoregression (VAR) model Simulation smoother of Bayesian vector autoregression (VAR) model Simulate coefficients and innovations covariance matrix of Bayesian vector autoregression (VAR) model Distribution summary statistics of Bayesian vector autoregression (VAR) model
Examples Create Diffuse Prior Model Consider the 3-D VAR(4) model for the US inflation (INFL), unemployment (UNRATE), and federal funds (FEDFUNDS) rates. INFLt UNRATEt
4
=c+
FEDFUNDSt
∑
j=1
INFLt −
ε1, t
j
Φ j UNRATEt −
+ ε2, t .
j
FEDFUNDSt −
j
ε3, t
For all t, εt is a series of independent 3-D normal innovations with a mean of 0 and covariance Σ. Assume that the joint prior distribution of the VAR model parameters Φ1, . . . , Φ4, c ′, Σ is diffuse. Create a diffuse prior model for the 3-D VAR(4) model parameters. numseries = 3; numlags = 4; PriorMdl = diffusebvarm(numseries,numlags) PriorMdl = diffusebvarm with properties: Description: NumSeries: P: SeriesNames: IncludeConstant: IncludeTrend: NumPredictors: AR: Constant: Trend: Beta:
"3-Dimensional VAR(4) Model" 3 4 ["Y1" "Y2" "Y3"] 1 0 0 {[3x3 double] [3x3 double] [3x3 double] [3x1 double] [3x0 double] [3x0 double]
[3x3 double]}
12-457
12
Functions
Covariance: [3x3 double]
PriorMdl is a diffusebvarm Bayesian VAR model object representing the prior distribution of the coefficients and innovations covariance of the 3-D VAR(4) model. The command line display shows properties of the model. You can display properties by using dot notation. Display the prior covariance mean matrices of the four AR coefficients by setting each matrix in the cell to a variable. AR1 = PriorMdl.AR{1} AR1 = 3×3 0 0 0
0 0 0
0 0 0
AR2 = PriorMdl.AR{2} AR2 = 3×3 0 0 0
0 0 0
0 0 0
AR3 = PriorMdl.AR{3} AR3 = 3×3 0 0 0
0 0 0
0 0 0
AR4 = PriorMdl.AR{4} AR4 = 3×3 0 0 0
0 0 0
0 0 0
diffusebvarm centers all AR coefficients at 0 by default. Because the model is diffuse, the data informs the posterior distribution.
Create Diffuse Bayesian AR(2) Model Consider a 1-D Bayesian AR(2) model for the daily NASDAQ returns from January 2, 1990 through December 31, 2001. yt = c + ϕ1 yt − 1 + ϕ2 yt − 1 + εt . 12-458
diffusebvarm
The joint prior is diffuse. Create a diffuse prior model for the AR(2) model parameters. numseries = 1; numlags = 2; PriorMdl = diffusebvarm(numseries,numlags) PriorMdl = diffusebvarm with properties: Description: NumSeries: P: SeriesNames: IncludeConstant: IncludeTrend: NumPredictors: AR: Constant: Trend: Beta: Covariance:
"1-Dimensional VAR(2) Model" 1 2 "Y1" 1 0 0 {[0] [0]} 0 [1x0 double] [1x0 double] NaN
Specify Response Names and Include Linear Time Trend Consider adding a linear time trend term to the 3-D VAR(4) model of “Create Diffuse Prior Model” on page 12-457: INFLt UNRATEt
4
= c + δt +
FEDFUNDSt
∑
j=1
INFLt −
ε1, t
j
Φ j UNRATEt −
+ ε2, t .
j
FEDFUNDSt −
j
ε3, t
Create a diffuse prior model for the 3-D VAR(4) model parameters. Specify response variable names. numseries = 3; numlags = 4; seriesnames = ["INFL"; "UNRATE"; "FEDFUNDS"]; PriorMdl = diffusebvarm(numseries,numlags,'SeriesNames',seriesnames,... 'IncludeTrend',true) PriorMdl = diffusebvarm with properties: Description: NumSeries: P: SeriesNames: IncludeConstant: IncludeTrend: NumPredictors: AR: Constant:
"3-Dimensional VAR(4) Model" 3 4 ["INFL" "UNRATE" "FEDFUNDS"] 1 1 0 {[3x3 double] [3x3 double] [3x3 double] [3x1 double]
[3x3 double]}
12-459
12
Functions
Trend: [3x1 double] Beta: [3x0 double] Covariance: [3x3 double]
Prepare Prior for Exogenous Predictor Variables Consider the 2-D VARX(1) model for the US real GDP (RGDP) and investment (GCE) rates that treats the personal consumption (PCEC) rate as exogenous: RGDPt GCEt
=c+Φ
RGDPt − 1 GCEt − 1
+ PCECt β + εt .
For all t, εt is a series of independent 2-D normal innovations with a mean of 0 and covariance Σ. Assume that the joint prior distribution is diffuse. Create a diffuse prior model for the 2-D VARX(1) model parameters. numseries = 2; numlags = 1; numpredictors = 1; PriorMdl = diffusebvarm(numseries,numlags,'NumPredictors',numpredictors) PriorMdl = diffusebvarm with properties: Description: NumSeries: P: SeriesNames: IncludeConstant: IncludeTrend: NumPredictors: AR: Constant: Trend: Beta: Covariance:
"2-Dimensional VAR(1) Model" 2 1 ["Y1" "Y2"] 1 0 1 {[2x2 double]} [2x1 double] [2x0 double] [2x1 double] [2x2 double]
Work with Prior and Posterior Distributions Consider the 3-D VAR(4) model of “Create Diffuse Prior Model” on page 12-457. Estimate the posterior distribution, and generate forecasts from the corresponding posterior predictive distribution. Load and Preprocess Data Load the US macroeconomic data set. Compute the inflation rate. Plot all response series. load Data_USEconModel seriesnames = ["INFL" "UNRATE" "FEDFUNDS"];
12-460
diffusebvarm
DataTimeTable.INFL = 100*[NaN; price2ret(DataTimeTable.CPIAUCSL)]; figure plot(DataTimeTable.Time,DataTimeTable{:,seriesnames}) legend(seriesnames)
Stabilize the unemployment and federal funds rates by applying the first difference to each series. DataTimeTable.DUNRATE = [NaN; diff(DataTimeTable.UNRATE)]; DataTimeTable.DFEDFUNDS = [NaN; diff(DataTimeTable.FEDFUNDS)]; seriesnames(2:3) = "D" + seriesnames(2:3);
Remove all missing values from the data. rmDataTimeTable = rmmissing(DataTimeTable);
Create Prior Model Create a diffuse Bayesian VAR(4) prior model for the three response series. Specify the response variable names. numseries = numel(seriesnames); numlags = 4; PriorMdl = diffusebvarm(numseries,numlags,'SeriesNames',seriesnames);
Estimate Posterior Distribution Estimate the posterior distribution by passing the prior model and entire data series to estimate. 12-461
12
Functions
rng(1); % For reproducibility PosteriorMdl = estimate(PriorMdl,rmDataTimeTable{:,seriesnames},'Display','equation'); Bayesian VAR under diffuse priors Effective Sample Size: 197 Number of equations: 3 Number of estimated Parameters: 39
VAR Equations | INFL(-1) DUNRATE(-1) DFEDFUNDS(-1) INFL(-2) DUNRATE(-2) DFEDFUNDS(-2) INFL(-3) ------------------------------------------------------------------------------------------------INFL | 0.1241 -0.4809 0.1005 0.3236 -0.0503 0.0450 0.4272 | (0.0762) (0.1536) (0.0390) (0.0868) (0.1647) (0.0413) (0.0860) DUNRATE | -0.0219 0.4716 0.0391 0.0913 0.2414 0.0536 -0.0389 | (0.0413) (0.0831) (0.0211) (0.0469) (0.0891) (0.0223) (0.0465) DFEDFUNDS | -0.1586 -1.4368 -0.2905 0.3403 -0.2968 -0.3117 0.2848 | (0.1632) (0.3287) (0.0835) (0.1857) (0.3526) (0.0883) (0.1841) Innovations Covariance Matrix | INFL DUNRATE DFEDFUNDS ------------------------------------------INFL | 0.3028 -0.0217 0.1579 | (0.0321) (0.0124) (0.0499) DUNRATE | -0.0217 0.0887 -0.1435 | (0.0124) (0.0094) (0.0283) DFEDFUNDS | 0.1579 -0.1435 1.3872 | (0.0499) (0.0283) (0.1470)
PosteriorMdl is a conjugatebvarm model object; the posterior is analytically tractable. By default, estimate uses the first four observations as a presample to initialize the model. Generate Forecasts from Posterior Predictive Distribution From the posterior predictive distribution, generate forecasts over a two-year horizon. Because sampling from the posterior predictive distribution requires the entire data set, specify the prior model in forecast instead of the posterior. fh = 8; FY = forecast(PriorMdl,fh,rmDataTimeTable{:,seriesnames});
FY is an 8-by-3 matrix of forecasts. Plot the end of the data set and the forecasts. fp = rmDataTimeTable.Time(end) + calquarters(1:fh); figure plotdata = [rmDataTimeTable{end - 10:end,seriesnames}; FY]; plot([rmDataTimeTable.Time(end - 10:end); fp'],plotdata) hold on plot([fp(1) fp(1)],ylim,'k-.') legend(seriesnames) title('Data and Forecasts') hold off
12-462
diffusebvarm
Compute Impulse Responses Plot impulse response functions by passing posterior estimations to armairf. armairf(PosteriorMdl.AR,[],'InnovCov',PosteriorMdl.Covariance)
12-463
12
Functions
12-464
diffusebvarm
12-465
12
Functions
More About Bayesian Vector Autoregression (VAR) Model A Bayesian VAR model treats all coefficients and the innovations covariance matrix as random variables in the m-dimensional, stationary VARX(p) model. The model has one of the three forms described in this table. Model
Equation
Reduced-form VAR(p) in difference-equation notation
yt = Φ1 yt − 1 + ... + Φp yt − p + c + δt + Βxt + εt .
Multivariate regression
yt = Zt λ + εt .
Matrix regression
yt = Λ′zt′ + εt .
For each time t = 1,...,T: • yt is the m-dimensional observed response vector, where m = numseries. • Φ1,…,Φp are the m-by-m AR coefficient matrices of lags 1 through p, where p = numlags. • c is the m-by-1 vector of model constants if IncludeConstant is true. • δ is the m-by-1 vector of linear time trend coefficients if IncludeTrend is true. • Β is the m-by-r matrix of regression coefficients of the r-by-1 vector of observed exogenous predictors xt, where r = NumPredictors. All predictor variables appear in each equation. 12-466
diffusebvarm
• zt = yt′ − 1 yt′ − 2 ⋯ yt′ − p 1 t xt′ , which is a 1-by-(mp + r + 2) vector, and Zt is the m-by-m(mp + r + 2) block diagonal matrix zt 0z ⋯ 0z 0z zt ⋯ 0z ⋮ ⋮ ⋱ ⋮ 0z 0z 0z zt
,
where 0z is a 1-by-(mp + r + 2) vector of zeros. •
Λ = Φ1 Φ2 ⋯ Φp c δ Β ′, which is an (mp + r + 2)-by-m random matrix of the coefficients, and the m(mp + r + 2)-by-1 vector λ = vec(Λ).
• εt is an m-by-1 vector of random, serially uncorrelated, multivariate normal innovations with the zero vector for the mean and the m-by-m matrix Σ for the covariance. This assumption implies that the data likelihood is ℓ Λ, Σ y, x =
T
∏
t=1
f yt; Λ, Σ, zt ,
where f is the m-dimensional multivariate normal density with mean ztΛ and covariance Σ, evaluated at yt. Before considering the data, you impose a joint prior distribution assumption on (Λ,Σ), which is governed by the distribution π(Λ,Σ). In a Bayesian analysis, the distribution of the parameters is updated with information about the parameters obtained from the data likelihood. The result is the joint posterior distribution π(Λ,Σ|Y,X,Y0), where: • Y is a T-by-m matrix containing the entire response series {yt}, t = 1,…,T. • X is a T-by-m matrix containing the entire exogenous series {xt}, t = 1,…,T. • Y0 is a p-by-m matrix of presample data used to initialize the VAR model for estimation. Diffuse Model The diffuse model is an m-D Bayesian VAR model on page 12-466 that has the noninformative joint prior distribution Λ, Σ ∝ Σ
−
m+1 2
.
The diffuse model is the limiting case of the conjugate prior model (see conjugatebvarm) when Μ → 0, V-1 → 0, Ω → 0, and ν → -k, where: • k = mp + r + 1c + 1δ, the number of coefficients per response equation. • r = NumPredictors. • 1c is 1 if IncludeConstant is true, and 0 otherwise. • 1δ is 1 if IncludeTrend is true, and 0 otherwise. If the sample size is large enough to satisfy least-squares estimation, the posterior distributions are proper and analytically tractable. Λ Σ, yt, xt N(mp + r + 1c + 1δ) × m Μ, V, Σ Σ yt, xt Inverse Wishart Ω, ν , 12-467
12
Functions
where: •
Μ=
−1
T
∑
t=1
•
V=
T
∑
t=1
•
Ω=
T
∑
t=1
zt′zt
T
∑
t=1
zt′yt′ .
−1
zt′zt
.
yt − Μ′zt′ yt − Μ′zt′ ′ .
• ν = T + k.
Algorithms • If you pass a diffusebvarm object and data to estimate, MATLAB returns a conjugatebvarm object representing the posterior distribution.
Version History Introduced in R2020a
See Also Functions bayesvarm Objects conjugatebvarm | normalbvarm | semiconjugatebvarm
12-468
distplot
distplot Plot Markov chain redistributions
Syntax distplot(mc,X) distplot(mc,X,Name,Value) distplot(ax, ___ ) h = distplot( ___ )
Description distplot(mc,X) creates a heatmap from the data X showing the evolution of a distribution of states in the discrete-time Markov chain mc. distplot(mc,X,Name,Value) uses additional options specified by one or more name-value arguments. For example, specify the type of plot or the frame rate for animated plots. distplot(ax, ___ ) plots on the axes specified by ax instead of the current axes (gca) using any of the input argument combinations in the previous syntaxes. h = distplot( ___ ) returns a handle to the distribution plot. Use h to modify properties of the plot after you create it.
Examples Visualize Evolution of State Distribution Create a four-state Markov chain from a randomly generated transition matrix containing eight infeasible transitions. rng('default'); % For reproducibility mc = mcmix(4,'Zeros',8);
mc is a dtmc object. Plot a digraph of the Markov chain. figure; graphplot(mc);
12-469
12
Functions
State 4 is an absorbing state. Compute the state redistributions at each step for 10 discrete time steps. Assume an initial uniform distribution over the states. X = redistribute(mc,10) X = 11×4 0.2500 0.0869 0.1073 0.0533 0.0641 0.0379 0.0404 0.0266 0.0259 0.0183 ⋮
0.2500 0.2577 0.2990 0.2133 0.2010 0.1473 0.1316 0.0997 0.0864 0.0670
0.2500 0.3088 0.1536 0.1844 0.1092 0.1162 0.0765 0.0746 0.0526 0.0484
0.2500 0.3467 0.4402 0.5489 0.6257 0.6985 0.7515 0.7991 0.8351 0.8663
X is an 11-by-4 matrix. Rows correspond to time steps, and columns correspond to states. Visualize the state redistribution. figure; distplot(mc,X)
12-470
distplot
After 10 transitions, the distribution appears to settle with a majority of the probability mass in state 4.
Animate Evolution of Markov Chain Consider this theoretical, right-stochastic transition matrix of a stochastic process. 0 0 0 P= 0 0 1/2 1/4
0 0 0 0 0 1/2 3/4
1/2 1/3 0 0 0 0 0
1/4 0 0 0 0 0 0
1/4 2/3 0 0 0 0 0
0 0 1/3 1/2 3/4 0 0
0 0 2/3 1/2 . 1/4 0 0
Create the Markov chain that is characterized by the transition matrix P. P = [ 0 0 0 0 0
0 0 0 0 0
1/2 1/4 1/4 0 0 ; 1/3 0 2/3 0 0 ; 0 0 0 1/3 2/3; 0 0 0 1/2 1/2; 0 0 0 3/4 1/4;
12-471
12
Functions
1/2 1/2 0 1/4 3/4 0 mc = dtmc(P);
0 0
0 0
0 0
0 ; 0 ];
Compute the state redistributions at each step for 20 discrete time steps. X = redistribute(mc,20);
Animate the redistributions in a histogram. Specify a half-second frame rate. figure; distplot(mc,X,'Type','histogram','FrameRate',0.5);
Input Arguments mc — Discrete-time Markov chain dtmc object Discrete-time Markov chain with NumStates states and transition matrix P, specified as a dtmc object. P must be fully specified (no NaN entries). X — Evolution of state probabilities nonnegative numeric matrix Evolution of state probabilities, specified as a (1 + numSteps)-by-NumStates nonnegative numeric matrix returned by redistribute. The first row is the initial state distribution. Subsequent rows are 12-472
distplot
the redistributions at each step. distplot normalizes the rows by their respective sums before plotting. Data Types: double ax — Axes on which to plot Axes object Axes on which to plot, specified as an Axes object. By default, distplot plots to the current axes (gca). Name-Value Pair Arguments Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter. Before R2021a, use commas to separate each name and value, and enclose Name in quotes. Example: 'Type','graph','FrameRate',3 creates an animated plot of the redistributions using a frame rate of 3 seconds. Type — Plot type 'evolution' (default) | 'histogram' | 'graph' Plot type, specified as the comma-separated pair consisting of 'Type' and a value in this table. Value
Description
'evolution' Evolution of the initial distribution. The plot is a (1 + NumSteps)-by-NumStates heatmap. Row i displays the redistribution at step i. 'histogram' Animated histogram of the redistributions. The vertical axis displays probability mass, and the horizontal axis displays states. The 'FrameRate' name-value pair argument controls the animation progress. 'graph'
Animated graph of the redistributions. distplot colors the nodes by their probability mass at each step. The 'FrameRate' name-value pair argument controls the animation progress.
Example: 'Type','graph' Data Types: string | char FrameRate — Length of discrete time steps positive scalar Length of discrete time steps, in seconds, for animated plots, specified as the comma-separated pair consisting of 'FrameRate' and a positive scalar. The default is a pause at each time step. The animation proceeds when you press the space bar. Example: 'FrameRate',3 Data Types: double
12-473
12
Functions
Output Arguments h — Handle to distribution plot graphics object Handle to the distribution plot, returned as a graphics object. h contains a unique plot identifier, which you can use to query or modify properties of the plot.
Version History Introduced in R2017b
See Also Objects dtmc Functions redistribute | simplot Topics “Markov Chain Modeling” on page 10-8 “Create and Modify Markov Chain Model Objects” on page 10-17 “Visualize Markov Chain Structure and Evolution” on page 10-27 “Compute State Distribution of Markov Chain at Each Time Step” on page 10-66
12-474
disp
disp Display summary information for diffuse state-space model
Syntax disp(Mdl) disp(Mdl,params) disp( ___ ,Name,Value)
Description disp(Mdl) displays summary information for the diffuse state-space model on page 11-4 (dssm model object) Mdl. The display includes the state and observation equations as a system of scalar equations to facilitate model verification. The display also includes the coefficient dimensionalities, notation, and initial state distribution types. The software displays unknown parameter values using c1 for the first unknown parameter, c2 for the second unknown parameter, and so on. For time-varying models with more than 20 different sets of equations, the software displays the first and last 10 groups in terms of time (the last group is the latest). disp(Mdl,params) displays the dssm model Mdl and applies initial values to the model parameters (params). disp( ___ ,Name,Value) displays Mdl using additional options specified by one or more Name,Value pair arguments. For example, you can specify the number of digits to display after the decimal point for model coefficients, or the number of terms per row for state and observation equations. You can use any of the input arguments in the previous syntaxes.
Input Arguments Mdl — Diffuse state-space model dssm model object Diffuse state-space model, specified as a dssm model object returned by dssm or estimate. params — Initial values for unknown parameters [] (default) | numeric vector Initial values for unknown parameters, specified as a numeric vector. The elements of params correspond to the unknown parameters in the state-space model matrices A, B, C, and D, and, optionally, the initial state mean Mean0 and covariance matrix Cov0. • If you created Mdl explicitly (that is, by specifying the matrices without a parameter-to-matrix mapping function), then the software maps the elements of params to NaNs in the state-space model matrices and initial state values. The software searches for NaNs column-wise, following the order A, B, C, D, Mean0, Cov0. 12-475
12
Functions
• If you created Mdl implicitly (that is, by specifying the matrices with a parameter-to-matrix mapping function), then you must set initial parameter values for the state-space model matrices, initial state values, and state types within the parameter-to-matrices mapping function. To set the type of initial state distribution, see dssm. Data Types: double Name-Value Pair Arguments Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter. Before R2021a, use commas to separate each name and value, and enclose Name in quotes. MaxStateEq — Maximum number of equations to display 100 (default) | positive integer Maximum number of equations to display, specified as the comma-separated pair consisting of 'MaxStateEq' and a positive integer. If the maximum number of states among all periods is no larger than MaxStateEq, then the software displays the model equation by equation. Example: 'MaxStateEq',10 Data Types: double NumDigits — Number of digits to display after decimal point 2 (default) | nonnegative integer Number of digits to display after the decimal point for known or estimated model coefficients, specified as the comma-separated pair consisting of 'NumDigits' and a nonnegative integer. Example: 'NumDigits',0 Data Types: double Period — Period to display state and observation equations positive integer Period to display state and observation equations for time-varying state-space models, specified as the comma-separated pair consisting of 'Period' and a positive integer. By default, the software displays state and observation equations for all periods. If Period exceeds the maximum number of observations that the model supports, then the software displays state and observation equations for all periods. If the model has more than 20 different sets of equations, then the software displays the first and last 10 groups in terms of time (the last group is the latest). Example: 'Period',120 Data Types: double PredictorsPerRow — Number of equation terms to display per row 5 (default) | positive integer Number of equation terms to display per row, specified as the comma-separated pair consisting of 'PredictorsPerRow' and a positive integer. 12-476
disp
Example: 'PredictorsPerRow',3 Data Types: double
Examples Verify Explicitly Created Diffuse State-Space Model An important step in state-space model analysis is to ensure that the software interprets the state and observation equation matrices as you intend. Use disp to help you verify the diffuse state-space model. Define a diffuse state-space model, where the state equation is an AR(2) model, and the observation equation is the difference between the current and previous state plus the observation error. Symbolically, the state-space model is x1, t x2, t x3, t
0 . 6 0 . 2 0 . 5 x1, t − 1 0.3 = 1 0 0 x2, t − 1 + 0 u1, t 0 0 1 x3, t − 1 0 x1, t
yt = 1 −1 0 x2, t + 0 . 1εt . x3, t Assume the initial state distribution is diffuse. There are three states: x1, t is the AR(2) process, x2, t represents x1, t − 1, and x3, t is the AR(2) model constant. Define the state-transition matrix. A = [0.6 0.2 0.5; 1 0 0; 0 0 1];
Define the state-disturbance-loading matrix. B = [0.3; 0; 0];
Define the measurement-sensitivity matrix. C = [1 -1 0];
Define the observation-innovation matrix. D = 0.1;
Specify the state-space model using dssm. Identify the type of initial state distributions (StateType) by noting the following: • x1, t is an AR(2) process with diffuse initial distribution. • x2, t is the same AR(2) process as x1, t. • x3, t is the constant 1 for all periods. 12-477
12
Functions
StateType = [2 2 1]; Mdl = dssm(A,B,C,D,'StateType',StateType);
Mdl is a dssm model. Verify the diffuse state-space model using disp. disp(Mdl) State-space model type: dssm State vector length: 3 Observation vector length: 1 State disturbance vector length: 1 Observation innovation vector length: 1 Sample size supported by model: Unlimited State variables: x1, x2,... State disturbances: u1, u2,... Observation series: y1, y2,... Observation innovations: e1, e2,... State x1(t) x2(t) x3(t)
equations: = (0.60)x1(t-1) + (0.20)x2(t-1) + (0.50)x3(t-1) + (0.30)u1(t) = x1(t-1) = x3(t-1)
Observation equation: y1(t) = x1(t) - x2(t) + (0.10)e1(t) Initial state distribution: Initial state means x1 x2 x3 0 0 1 Initial state covariance matrix x1 x2 x3 x1 Inf 0 0 x2 0 Inf 0 x3 0 0 0 State types x1 x2 Diffuse Diffuse
x3 Constant
Cov0 has infinite variance for the AR(2) states.
Display Diffuse State-Space Model with Initial Values Create a diffuse state-space model containing two independent, autoregressive states, and where the observations are the deterministic sum of the two states. Symbolically, the system of equations is xt, 1 xt, 2 12-478
=
ϕ1 0 xt − 1, 1 0 ϕ2 xt − 1, 2
+
σ1 0 ut, 1 0 σ2 ut, 2
disp
yt = 1 1
xt, 1 xt, 2
.
Specify the state-transition matrix. A = [NaN 0; 0 NaN];
Specify the state-disturbance-loading matrix. B = [NaN 0; 0 NaN];
Specify the measurement-sensitivity matrix. C = [1 1];
Create the diffuse state-space model by using dssm. Specify that the first state is stationary and the second is diffuse. StateType = [0; 2]; Mdl = dssm(A,B,C,'StateType',StateType);
Mdl is a dssm model object. Display the state-space model. Specify initial values for the unknown parameters and the initial state means and covariance matrix as follows: • ϕ1, 0 = ϕ2, 0 = 0 . 1. • σ1, 0 = σ2, 0 = 0 . 2. params = [0.1; 0.1; 0.2; 0.2]; disp(Mdl,params) State-space model type: dssm State vector length: 2 Observation vector length: 1 State disturbance vector length: 2 Observation innovation vector length: 0 Sample size supported by model: Unlimited Unknown parameters for estimation: 4 State variables: x1, x2,... State disturbances: u1, u2,... Observation series: y1, y2,... Observation innovations: e1, e2,... Unknown parameters: c1, c2,... State equations: x1(t) = (c1)x1(t-1) + (c3)u1(t) x2(t) = (c2)x2(t-1) + (c4)u2(t) Observation equation: y1(t) = x1(t) + x2(t) Initial state distribution: Initial state means
12-479
12
Functions
x1 0
x2 0
Initial state covariance matrix x1 x2 x1 0.04 0 x2 0 Inf State types x1 Stationary
x2 Diffuse
The software computes the initial state mean and variance of the stationary state using its stationary distribution.
Explicitly Create and Display Time-Varying Diffuse State-Space Model From periods 1 through 50, the state model is a diffuse AR(2) and a stationary MA(1) model, and the observation model is the sum of the two states. From periods 51 through 100, the state model includes the first AR(2) model only. Symbolically, the state-space model is, for periods 1 through 50, x1t x2t x3t
=
x4t
ϕ1 ϕ2 0 0 x1, t − 1 1 0 0 0 x2, t − 1 0 0 0 θ x3, t − 1 0 0 0 0 x4, t − 1
+
σ1 0 0 0 u1t 0 0 0 0 u2t 0 0 1 0 u3t , 0 0 1 0 u4t
yt = a1 x1t + x3t + σ2εt for period 51, x1, t − 1 x1t x2t
=
ϕ1 ϕ2 0 0 x2, t − 1 1 0 0 0 x3, t − 1
u1t +
σ1 0 0 0 u2t 0 0 0 0 u3t
x4, t − 1
u4t
yt = a2x1t + σ3εt and for periods 52 through 100, x1t x2t
=
ϕ1 ϕ2 x1, t − 1 1 0 x2, t − 1
+
σ1 0 u1t 0 0 u2t
yt = a2x1t + σ3εt . Specify the state-transition coefficient matrix. A1 = {[NaN NaN 0 0; 1 0 0 0; 0 0 0 NaN; 0 0 0 0]}; A2 = {[NaN NaN 0 0; 1 0 0 0]}; A3 = {[NaN NaN; 1 0]}; A = [repmat(A1,50,1);A2;repmat(A3,49,1)];
Specify the state-disturbance-loading coefficient matrix. 12-480
disp
B1 = {[NaN 0 0 0; 0 0 0 0; 0 0 1 0; 0 0 1 0]}; B2 = {[NaN 0 0 0; 0 0 0 0]}; B3 = {[NaN 0; 0 0]}; B = [repmat(B1,50,1);B2;repmat(B3,49,1)];
Specify the measurement-sensitivity coefficient matrix. C1 = {[NaN 0 NaN 0]}; C3 = {[NaN 0]}; C = [repmat(C1,50,1);repmat(C3,50,1)];
Specify the observation-disturbance coefficient matrix. D1 = {NaN}; D3 = {NaN}; D = [repmat(D1,50,1);repmat(D3,50,1)];
Create the diffuse state-space model. Specify that the initial state distributions are diffuse for the states composing the AR model and stationary for those composing the MA model. StateType = [2; 2; 0; 0]; Mdl = dssm(A,B,C,D,'StateType',StateType);
Mdl is an dssm model object. The model is large and contains a different set of parameters for each period. The software displays state and observation equations for the first 10 and last 10 periods. You can choose which periods to display the equations for using the 'Period' name-value pair argument. Display the diffuse state-space model, and use 'Period' display the state and observation equations for the 50th, 51st, and 52nd periods. disp(Mdl,'Period',50) State-space model type: dssm State vector length: Time-varying Observation vector length: 1 State disturbance vector length: Time-varying Observation innovation vector length: 1 Sample size supported by model: 100 Unknown parameters for estimation: 600 State variables: x1, x2,... State disturbances: u1, u2,... Observation series: y1, y2,... Observation innovations: e1, e2,... Unknown parameters: c1, c2,... State equations (in period 50): x1(t) = (c148)x1(t-1) + (c149)x2(t-1) + (c300)u1(t) x2(t) = x1(t-1) x3(t) = (c150)x4(t-1) + u3(t) x4(t) = u3(t) Time-varying transition matrix A contains unknown parameters: c1 c2 c3 c4 c5 c6 c7 c8 c9 c10 c11 c12 c13 c14 c15 c16 c17 c18 c19 c20 c21 c22 c23 c24 c25 c26 c27 c28 c29 c30 c31 c32 c33 c34 c35 c36 c37 c38 c39 c40
12-481
12
Functions
c41 c42 c43 c44 c45 c46 c47 c48 c49 c50 c51 c52 c53 c54 c55 c56 c57 c58 c59 c60 c61 c62 c63 c64 c65 c66 c67 c68 c69 c70 c71 c72 c73 c74 c75 c76 c77 c78 c79 c80 c81 c82 c83 c84 c85 c86 c87 c88 c89 c90 c91 c92 c93 c94 c95 c96 c97 c98 c99 c100 c101 c102 c103 c104 c105 c106 c107 c108 c109 c110 c111 c112 c113 c114 c115 c116 c117 c121 c122 c123 c124 c125 c126 c127 c128 c129 c130 c131 c132 c133 c134 c135 c136 c137 c141 c142 c143 c144 c145 c146 c147 c148 c149 c150 c151 c152 c153 c154 c155 c156 c157 c161 c162 c163 c164 c165 c166 c167 c168 c169 c170 c171 c172 c173 c174 c175 c176 c177 c181 c182 c183 c184 c185 c186 c187 c188 c189 c190 c191 c192 c193 c194 c195 c196 c197 c201 c202 c203 c204 c205 c206 c207 c208 c209 c210 c211 c212 c213 c214 c215 c216 c217 c221 c222 c223 c224 c225 c226 c227 c228 c229 c230 c231 c232 c233 c234 c235 c236 c237 c241 c242 c243 c244 c245 c246 c247 c248 c249 c250 Time-varying state disturbance loading matrix B contains unknown parameters: c251 c252 c253 c254 c255 c256 c257 c258 c259 c260 c261 c262 c263 c264 c265 c266 c267 c271 c272 c273 c274 c275 c276 c277 c278 c279 c280 c281 c282 c283 c284 c285 c286 c287 c291 c292 c293 c294 c295 c296 c297 c298 c299 c300 c301 c302 c303 c304 c305 c306 c307 c311 c312 c313 c314 c315 c316 c317 c318 c319 c320 c321 c322 c323 c324 c325 c326 c327 c331 c332 c333 c334 c335 c336 c337 c338 c339 c340 c341 c342 c343 c344 c345 c346 c347 Observation equation (in period 50): y1(t) = (c449)x1(t) + (c450)x3(t) + (c550)e1(t) Time-varying measurement sensitivity matrix C contains unknown parameters: c351 c352 c353 c354 c355 c356 c357 c358 c359 c360 c361 c362 c363 c364 c365 c366 c367 c371 c372 c373 c374 c375 c376 c377 c378 c379 c380 c381 c382 c383 c384 c385 c386 c387 c391 c392 c393 c394 c395 c396 c397 c398 c399 c400 c401 c402 c403 c404 c405 c406 c407 c411 c412 c413 c414 c415 c416 c417 c418 c419 c420 c421 c422 c423 c424 c425 c426 c427 c431 c432 c433 c434 c435 c436 c437 c438 c439 c440 c441 c442 c443 c444 c445 c446 c447 c451 c452 c453 c454 c455 c456 c457 c458 c459 c460 c461 c462 c463 c464 c465 c466 c467 c471 c472 c473 c474 c475 c476 c477 c478 c479 c480 c481 c482 c483 c484 c485 c486 c487 c491 c492 c493 c494 c495 c496 c497 c498 c499 c500 Time-varying observation innovation loading matrix D contains unknown parameters: c501 c502 c503 c504 c505 c506 c507 c508 c509 c510 c511 c512 c513 c514 c515 c516 c517 c521 c522 c523 c524 c525 c526 c527 c528 c529 c530 c531 c532 c533 c534 c535 c536 c537 c541 c542 c543 c544 c545 c546 c547 c548 c549 c550 c551 c552 c553 c554 c555 c556 c557 c561 c562 c563 c564 c565 c566 c567 c568 c569 c570 c571 c572 c573 c574 c575 c576 c577 c581 c582 c583 c584 c585 c586 c587 c588 c589 c590 c591 c592 c593 c594 c595 c596 c597 Initial state distribution: Initial state means are not specified. Initial state covariance matrix is not specified. State types x1 x2 x3 x4 Diffuse Diffuse Stationary Stationary disp(Mdl,'Period',51) State-space model type: dssm State vector length: Time-varying Observation vector length: 1 State disturbance vector length: Time-varying Observation innovation vector length: 1 Sample size supported by model: 100 Unknown parameters for estimation: 600 State variables: x1, x2,... State disturbances: u1, u2,... Observation series: y1, y2,...
12-482
c118 c138 c158 c178 c198 c218 c238
c119 c139 c159 c179 c199 c219 c239
c1 c1 c1 c1 c2 c2 c2
c268 c288 c308 c328 c348
c269 c289 c309 c329 c349
c2 c2 c3 c3 c3
c368 c388 c408 c428 c448 c468 c488
c369 c389 c409 c429 c449 c469 c489
c3 c3 c4 c4 c4 c4 c4
c518 c538 c558 c578 c598
c519 c539 c559 c579 c599
c5 c5 c5 c5 c6
disp
Observation innovations: e1, e2,... Unknown parameters: c1, c2,... State equations (in period 51): x1(t) = (c151)x1(t-1) + (c152)x2(t-1) + (c301)u1(t) x2(t) = x1(t-1) Time-varying transition matrix A contains unknown parameters: c1 c2 c3 c4 c5 c6 c7 c8 c9 c10 c11 c12 c13 c14 c15 c16 c17 c18 c19 c20 c21 c22 c23 c24 c25 c26 c27 c28 c29 c30 c31 c32 c33 c34 c35 c36 c37 c38 c39 c40 c41 c42 c43 c44 c45 c46 c47 c48 c49 c50 c51 c52 c53 c54 c55 c56 c57 c58 c59 c60 c61 c62 c63 c64 c65 c66 c67 c68 c69 c70 c71 c72 c73 c74 c75 c76 c77 c78 c79 c80 c81 c82 c83 c84 c85 c86 c87 c88 c89 c90 c91 c92 c93 c94 c95 c96 c97 c98 c99 c100 c101 c102 c103 c104 c105 c106 c107 c108 c109 c110 c111 c112 c113 c114 c115 c116 c117 c121 c122 c123 c124 c125 c126 c127 c128 c129 c130 c131 c132 c133 c134 c135 c136 c137 c141 c142 c143 c144 c145 c146 c147 c148 c149 c150 c151 c152 c153 c154 c155 c156 c157 c161 c162 c163 c164 c165 c166 c167 c168 c169 c170 c171 c172 c173 c174 c175 c176 c177 c181 c182 c183 c184 c185 c186 c187 c188 c189 c190 c191 c192 c193 c194 c195 c196 c197 c201 c202 c203 c204 c205 c206 c207 c208 c209 c210 c211 c212 c213 c214 c215 c216 c217 c221 c222 c223 c224 c225 c226 c227 c228 c229 c230 c231 c232 c233 c234 c235 c236 c237 c241 c242 c243 c244 c245 c246 c247 c248 c249 c250 Time-varying state disturbance loading matrix B contains unknown parameters: c251 c252 c253 c254 c255 c256 c257 c258 c259 c260 c261 c262 c263 c264 c265 c266 c267 c271 c272 c273 c274 c275 c276 c277 c278 c279 c280 c281 c282 c283 c284 c285 c286 c287 c291 c292 c293 c294 c295 c296 c297 c298 c299 c300 c301 c302 c303 c304 c305 c306 c307 c311 c312 c313 c314 c315 c316 c317 c318 c319 c320 c321 c322 c323 c324 c325 c326 c327 c331 c332 c333 c334 c335 c336 c337 c338 c339 c340 c341 c342 c343 c344 c345 c346 c347 Observation equation (in period 51): y1(t) = (c451)x1(t) + (c551)e1(t) Time-varying measurement sensitivity matrix C contains unknown parameters: c351 c352 c353 c354 c355 c356 c357 c358 c359 c360 c361 c362 c363 c364 c365 c366 c367 c371 c372 c373 c374 c375 c376 c377 c378 c379 c380 c381 c382 c383 c384 c385 c386 c387 c391 c392 c393 c394 c395 c396 c397 c398 c399 c400 c401 c402 c403 c404 c405 c406 c407 c411 c412 c413 c414 c415 c416 c417 c418 c419 c420 c421 c422 c423 c424 c425 c426 c427 c431 c432 c433 c434 c435 c436 c437 c438 c439 c440 c441 c442 c443 c444 c445 c446 c447 c451 c452 c453 c454 c455 c456 c457 c458 c459 c460 c461 c462 c463 c464 c465 c466 c467 c471 c472 c473 c474 c475 c476 c477 c478 c479 c480 c481 c482 c483 c484 c485 c486 c487 c491 c492 c493 c494 c495 c496 c497 c498 c499 c500 Time-varying observation innovation loading matrix D contains unknown parameters: c501 c502 c503 c504 c505 c506 c507 c508 c509 c510 c511 c512 c513 c514 c515 c516 c517 c521 c522 c523 c524 c525 c526 c527 c528 c529 c530 c531 c532 c533 c534 c535 c536 c537 c541 c542 c543 c544 c545 c546 c547 c548 c549 c550 c551 c552 c553 c554 c555 c556 c557 c561 c562 c563 c564 c565 c566 c567 c568 c569 c570 c571 c572 c573 c574 c575 c576 c577 c581 c582 c583 c584 c585 c586 c587 c588 c589 c590 c591 c592 c593 c594 c595 c596 c597
c118 c138 c158 c178 c198 c218 c238
c119 c139 c159 c179 c199 c219 c239
c1 c1 c1 c1 c2 c2 c2
c268 c288 c308 c328 c348
c269 c289 c309 c329 c349
c2 c2 c3 c3 c3
c368 c388 c408 c428 c448 c468 c488
c369 c389 c409 c429 c449 c469 c489
c3 c3 c4 c4 c4 c4 c4
c518 c538 c558 c578 c598
c519 c539 c559 c579 c599
c5 c5 c5 c5 c6
Initial state distribution: Initial state means are not specified. Initial state covariance matrix is not specified. State types x1 x2 x3 x4 Diffuse Diffuse Stationary Stationary disp(Mdl,'Period',52) State-space model type: dssm State vector length: Time-varying
12-483
12
Functions
Observation vector length: 1 State disturbance vector length: Time-varying Observation innovation vector length: 1 Sample size supported by model: 100 Unknown parameters for estimation: 600 State variables: x1, x2,... State disturbances: u1, u2,... Observation series: y1, y2,... Observation innovations: e1, e2,... Unknown parameters: c1, c2,... State equations (in period 52): x1(t) = (c153)x1(t-1) + (c154)x2(t-1) + (c302)u1(t) x2(t) = x1(t-1) Time-varying transition matrix A contains unknown parameters: c1 c2 c3 c4 c5 c6 c7 c8 c9 c10 c11 c12 c13 c14 c15 c16 c17 c18 c19 c20 c21 c22 c23 c24 c25 c26 c27 c28 c29 c30 c31 c32 c33 c34 c35 c36 c37 c38 c39 c40 c41 c42 c43 c44 c45 c46 c47 c48 c49 c50 c51 c52 c53 c54 c55 c56 c57 c58 c59 c60 c61 c62 c63 c64 c65 c66 c67 c68 c69 c70 c71 c72 c73 c74 c75 c76 c77 c78 c79 c80 c81 c82 c83 c84 c85 c86 c87 c88 c89 c90 c91 c92 c93 c94 c95 c96 c97 c98 c99 c100 c101 c102 c103 c104 c105 c106 c107 c108 c109 c110 c111 c112 c113 c114 c115 c116 c117 c121 c122 c123 c124 c125 c126 c127 c128 c129 c130 c131 c132 c133 c134 c135 c136 c137 c141 c142 c143 c144 c145 c146 c147 c148 c149 c150 c151 c152 c153 c154 c155 c156 c157 c161 c162 c163 c164 c165 c166 c167 c168 c169 c170 c171 c172 c173 c174 c175 c176 c177 c181 c182 c183 c184 c185 c186 c187 c188 c189 c190 c191 c192 c193 c194 c195 c196 c197 c201 c202 c203 c204 c205 c206 c207 c208 c209 c210 c211 c212 c213 c214 c215 c216 c217 c221 c222 c223 c224 c225 c226 c227 c228 c229 c230 c231 c232 c233 c234 c235 c236 c237 c241 c242 c243 c244 c245 c246 c247 c248 c249 c250 Time-varying state disturbance loading matrix B contains unknown parameters: c251 c252 c253 c254 c255 c256 c257 c258 c259 c260 c261 c262 c263 c264 c265 c266 c267 c271 c272 c273 c274 c275 c276 c277 c278 c279 c280 c281 c282 c283 c284 c285 c286 c287 c291 c292 c293 c294 c295 c296 c297 c298 c299 c300 c301 c302 c303 c304 c305 c306 c307 c311 c312 c313 c314 c315 c316 c317 c318 c319 c320 c321 c322 c323 c324 c325 c326 c327 c331 c332 c333 c334 c335 c336 c337 c338 c339 c340 c341 c342 c343 c344 c345 c346 c347 Observation equation (in period 52): y1(t) = (c452)x1(t) + (c552)e1(t) Time-varying measurement sensitivity matrix C contains unknown parameters: c351 c352 c353 c354 c355 c356 c357 c358 c359 c360 c361 c362 c363 c364 c365 c366 c367 c371 c372 c373 c374 c375 c376 c377 c378 c379 c380 c381 c382 c383 c384 c385 c386 c387 c391 c392 c393 c394 c395 c396 c397 c398 c399 c400 c401 c402 c403 c404 c405 c406 c407 c411 c412 c413 c414 c415 c416 c417 c418 c419 c420 c421 c422 c423 c424 c425 c426 c427 c431 c432 c433 c434 c435 c436 c437 c438 c439 c440 c441 c442 c443 c444 c445 c446 c447 c451 c452 c453 c454 c455 c456 c457 c458 c459 c460 c461 c462 c463 c464 c465 c466 c467 c471 c472 c473 c474 c475 c476 c477 c478 c479 c480 c481 c482 c483 c484 c485 c486 c487 c491 c492 c493 c494 c495 c496 c497 c498 c499 c500 Time-varying observation innovation loading matrix D contains unknown parameters: c501 c502 c503 c504 c505 c506 c507 c508 c509 c510 c511 c512 c513 c514 c515 c516 c517 c521 c522 c523 c524 c525 c526 c527 c528 c529 c530 c531 c532 c533 c534 c535 c536 c537 c541 c542 c543 c544 c545 c546 c547 c548 c549 c550 c551 c552 c553 c554 c555 c556 c557 c561 c562 c563 c564 c565 c566 c567 c568 c569 c570 c571 c572 c573 c574 c575 c576 c577 c581 c582 c583 c584 c585 c586 c587 c588 c589 c590 c591 c592 c593 c594 c595 c596 c597 Initial state distribution: Initial state means are not specified. Initial state covariance matrix is not specified.
12-484
c118 c138 c158 c178 c198 c218 c238
c119 c139 c159 c179 c199 c219 c239
c1 c1 c1 c1 c2 c2 c2
c268 c288 c308 c328 c348
c269 c289 c309 c329 c349
c2 c2 c3 c3 c3
c368 c388 c408 c428 c448 c468 c488
c369 c389 c409 c429 c449 c469 c489
c3 c3 c4 c4 c4 c4 c4
c518 c538 c558 c578 c598
c519 c539 c559 c579 c599
c5 c5 c5 c5 c6
disp
State types x1 x2 Diffuse Diffuse
x3 Stationary
x4 Stationary
The software attributes a different set of coefficients for each period. You might experience numerical issues when you estimate such models. To reuse parameters among groups of periods, consider creating a parameter-to-matrix mapping function.
Tips • The software always displays explicitly specified state-space models (that is, models you create without using a parameter-to-matrix mapping function). Try explicitly specifying state-space models first so that you can verify them using disp. • A parameter-to-matrix function that you specify to create Mdl is a black box to the software. Therefore, the software might not display complex, implicitly defined state-space models.
Algorithms • If you implicitly create Mdl, and if the software cannot infer locations for unknown parameters from the parameter-to-matrix function, then the software evaluates these parameters using their initial values and displays them as numeric values. This evaluation can occur when the parameterto-matrix function has a random, unknown coefficient, which is a convenient form for a Monte Carlo study. • The software displays the initial state distributions as numeric values. This type of display occurs because, in many cases, the initial distribution depends on the values of the state equation matrices A and B. These values are often a complicated function of unknown parameters. In such situations, the software does not display the initial distribution symbolically. Additionally, if Mean0 and Cov0 contain unknown parameters, then the software evaluates and displays numeric values for the unknown parameters.
Version History Introduced in R2015b
References [1] Durbin J., and S. J. Koopman. Time Series Analysis by State Space Methods. 2nd ed. Oxford: Oxford University Press, 2012.
See Also dssm | ssm | estimate | filter | smooth | forecast Topics “What Are State-Space Models?” on page 11-3
12-485
12
Functions
disp Display summary information for state-space model
Syntax disp(Mdl) disp(Mdl,params) disp( ___ ,Name,Value)
Description disp(Mdl) displays summary information for the state-space model on page 11-3 (ssm model object) Mdl. The display includes the state and observation equations as a system of scalar equations to facilitate model verification. The display also includes the coefficient dimensionalities, notation, and initial state distribution types. The software displays unknown parameter values using c1 for the first unknown parameter, c2 for the second unknown parameter, and so on. For time-varying models with more than 20 different sets of equations, the software displays the first and last 10 groups in terms of time (the last group is the latest). disp(Mdl,params) displays the ssm model Mdl and applies initial values to the model parameters (params). disp( ___ ,Name,Value) displays the ssm model with additional options specified by one or more Name,Value pair arguments. For example, you can specify the number of digits to display after the decimal point for model coefficients, or the number of terms per row for state and observation equations. You can use any of the input arguments in the previous syntaxes.
Examples Verify Explicitly Created State-Space Model An important step in state-space model analysis is to ensure that the software interprets the state and observation equation matrices as you intend. Use disp to help you verify the state-space model. Define a state-space model, where the state equation is an AR(2) model, and the observation equation is the difference between the current and previous state plus the observation error. Symbolically, the state-space model is x1, t x2, t x3, t
0 . 6 0 . 2 0 . 5 x1, t − 1 0.3 = 1 0 0 x2, t − 1 + 0 u1, t 0 0 1 x3, t − 1 0 x1, t
yt = 1 −1 0 x2, t + 0 . 1εt . x3, t 12-486
disp
There are three states: x1, t is the AR(2) process, x2, t represents x1, t − 1, and x3, t is the AR(2) model constant. Define the state-transition matrix. A = [0.6 0.2 0.5; 1 0 0; 0 0 1];
Define the state-disturbance-loading matrix. B = [0.3; 0; 0];
Define the measurement-sensitivity matrix. C = [1 -1 0];
Define the observation-innovation matrix. D = 0.1;
Specify the state-space model using ssm. Set the initial-state mean (Mean0) and covariance matrix (Cov0). Identify the type of initial state distributions (StateType) by noting the following: • x1, t is a stationary, AR(2) process. • x2, t is also a stationary, AR(2) process. • x3, t is the constant 1 for all periods. Mean0 = [0; 0; 1]; % The mean of the AR(2) varAR2 = 0.3*(1 - 0.2)/((1 + 0.2)*((1 - 0.2)^2 - 0.6^2)); % The variance of the AR(2) Cov1AR2 = 0.6*0.3/((1 + 0.2)*((1 - 0.2)^2) - 0.6^2); % The covariance of the AR(2) Cov0 = zeros(3); Cov0(1:2,1:2) = varAR2*eye(2) + Cov1AR2*flip(eye(2)); StateType = [0; 0; 1]; Mdl = ssm(A,B,C,D,'Mean0',Mean0,'Cov0',Cov0,'StateType',StateType);
Mdl is an ssm model. Verify the state-space model using disp. disp(Mdl) State-space model type: ssm State vector length: 3 Observation vector length: 1 State disturbance vector length: 1 Observation innovation vector length: 1 Sample size supported by model: Unlimited State variables: x1, x2,... State disturbances: u1, u2,... Observation series: y1, y2,... Observation innovations: e1, e2,... State x1(t) x2(t) x3(t)
equations: = (0.60)x1(t-1) + (0.20)x2(t-1) + (0.50)x3(t-1) + (0.30)u1(t) = x1(t-1) = x3(t-1)
12-487
12
Functions
Observation equation: y1(t) = x1(t) - x2(t) + (0.10)e1(t) Initial state distribution: Initial state means x1 x2 x3 0 0 1 Initial state covariance matrix x1 x2 x3 x1 0.71 0.44 0 x2 0.44 0.71 0 x3 0 0 0 State types x1 Stationary
x2 Stationary
x3 Constant
Display State-Space Model and Initial Values Define a state-space model containing two independent, autoregressive states, and where the observations are the deterministic sum of the two states. Symbolically, the system of equations is xt, 1 xt, 2
=
ϕ1 0 xt − 1, 1
yt = 1 1
0 ϕ2 xt − 1, 2 xt, 1 xt, 2
+
σ1 0 ut, 1 0 σ2 ut, 2
.
Specify the state-transition matrix. A = [NaN 0; 0 NaN];
Specify the state-disturbance-loading matrix. B = [NaN 0; 0 NaN];
Specify the measurement-sensitivity matrix. C = [1 1];
Specify an empty matrix for the observation disturbance matrix. D = [];
Use ssm to define the state-space model. Specify the initial state means and covariance matrix to as unknown parameters. Specify that the states are stationary. Mean0 = nan(2,1); Cov0 = nan(2,2); StateType = zeros(2,1); Mdl = ssm(A,B,C,D,'Mean0',Mean0,'Cov0',Cov0,'StateType',StateType);
12-488
disp
Mdl is an ssm model containing unknown parameters. Use disp to display the state-space model. Specify initial values for the unknown parameters and the initial state means and covariance matrix as follows: • ϕ1, 0 = ϕ2, 0 = 0 . 1. • σ1, 0 = σ2, 0 = 0 . 2. • x1, 0 = 1 and x2, 0 = 0 . 5. • Σx1, 0, x2, 0 = I2. params = [0.1; 0.1; 0.2; 0.2; 1; 0.5; 1; 0; 0; 1]; disp(Mdl,params) State-space model type: ssm State vector length: 2 Observation vector length: 1 State disturbance vector length: 2 Observation innovation vector length: 0 Sample size supported by model: Unlimited Unknown parameters for estimation: 10 State variables: x1, x2,... State disturbances: u1, u2,... Observation series: y1, y2,... Observation innovations: e1, e2,... Unknown parameters: c1, c2,... State equations: x1(t) = (c1)x1(t-1) + (c3)u1(t) x2(t) = (c2)x2(t-1) + (c4)u2(t) Observation equation: y1(t) = x1(t) + x2(t) Initial state distribution: Initial state means x1 x2 1 0.50 Initial state covariance matrix x1 x2 x1 1 0 x2 0 1 State types x1 Stationary
x2 Stationary
12-489
12
Functions
Explicitly Create and Display Time-Varying State-Space Model From periods 1 through 50, the state model is an AR(2) and an MA(1) model, and the observation model is the sum of the two states. From periods 51 through 100, the state model includes the first AR(2) model only. Symbolically, the state-space model is, for periods 1 through 50, x1, t x2, t x3, t
=
ϕ1 ϕ2 0 0 x1, t − 1 1 0 0 0 x2, t − 1
σ1 0
0 0 0 θ x3, t − 1 0 0 0 0 x4, t − 1
x4, t
0 0 u1, t 0 1 u3, t , 0 1
+
yt = a1 x1, t + x3, t + σ2εt for period 51, x1, t − 1 x1, t x2, t
=
ϕ1 ϕ2 0 0 x2, t − 1
+
1 0 0 0 x3, t − 1
σ1 0
u1, t
x4, t − 1 yt = a2x1t + σ3εt and for periods 52 through 100, x1, t x2, t
=
ϕ1 ϕ2 x1, t − 1 1 0 x2, t − 1
+
σ1 0 0 0
u1, t
yt = a2x1, t + σ3εt . Specify the state-transition coefficient matrix. A1 = {[NaN NaN 0 0; 1 0 0 0; 0 0 0 NaN; 0 0 0 0]}; A2 = {[NaN NaN 0 0; 1 0 0 0]}; A3 = {[NaN NaN; 1 0]}; A = [repmat(A1,50,1);A2;repmat(A3,49,1)];
Specify the state-disturbance-loading coefficient matrix. B1 = {[NaN 0;0 0; 0 1; 0 1]}; B2 = {[NaN; 0]}; B3 = {[NaN; 0]}; B = [repmat(B1,50,1);B2;repmat(B3,49,1)];
Specify the measurement-sensitivity coefficient matrix. C1 = {[NaN 0 NaN 0]}; C3 = {[NaN 0]}; C = [repmat(C1,50,1);repmat(C3,50,1)];
Specify the observation-disturbance coefficient matrix. D1 = {NaN}; D3 = {NaN}; D = [repmat(D1,50,1);repmat(D3,50,1)];
12-490
disp
Specify the state-space model. Set the initial state means and covariance matrix to unknown parameters. Specify that the initial state distributions are stationary. Mean0 = nan(4,1); Cov0 = nan(4,4); StateType = [0; 0; 0; 0]; Mdl = ssm(A,B,C,D,'Mean0',Mean0,'Cov0',Cov0,'StateType',StateType);
Mdl is an ssm model. The model is large and contains a different set of parameters for each period. The software displays state and observation equations for the first 10 and last 10 periods. You can choose which periods to display the equations for using the 'Period' name-value pair argument. Display the state-space model, and use 'Period' to display the state and observation equations for the 50th, 51st, and 52nd periods. disp(Mdl,'Period',50) State-space model type: ssm State vector length: Time-varying Observation vector length: 1 State disturbance vector length: Time-varying Observation innovation vector length: 1 Sample size supported by model: 100 Unknown parameters for estimation: 620 State variables: x1, x2,... State disturbances: u1, u2,... Observation series: y1, y2,... Observation innovations: e1, e2,... Unknown parameters: c1, c2,... State equations (in period 50): x1(t) = (c148)x1(t-1) + (c149)x2(t-1) + (c300)u1(t) x2(t) = x1(t-1) x3(t) = (c150)x4(t-1) + u2(t) x4(t) = u2(t) Time-varying transition matrix A contains unknown parameters: c1 c2 c3 c4 c5 c6 c7 c8 c9 c10 c11 c12 c13 c14 c15 c16 c17 c18 c19 c20 c21 c22 c23 c24 c25 c26 c27 c28 c29 c30 c31 c32 c33 c34 c35 c36 c37 c38 c39 c40 c41 c42 c43 c44 c45 c46 c47 c48 c49 c50 c51 c52 c53 c54 c55 c56 c57 c58 c59 c60 c61 c62 c63 c64 c65 c66 c67 c68 c69 c70 c71 c72 c73 c74 c75 c76 c77 c78 c79 c80 c81 c82 c83 c84 c85 c86 c87 c88 c89 c90 c91 c92 c93 c94 c95 c96 c97 c98 c99 c100 c101 c102 c103 c104 c105 c106 c107 c108 c109 c110 c111 c112 c113 c114 c115 c116 c117 c121 c122 c123 c124 c125 c126 c127 c128 c129 c130 c131 c132 c133 c134 c135 c136 c137 c141 c142 c143 c144 c145 c146 c147 c148 c149 c150 c151 c152 c153 c154 c155 c156 c157 c161 c162 c163 c164 c165 c166 c167 c168 c169 c170 c171 c172 c173 c174 c175 c176 c177 c181 c182 c183 c184 c185 c186 c187 c188 c189 c190 c191 c192 c193 c194 c195 c196 c197 c201 c202 c203 c204 c205 c206 c207 c208 c209 c210 c211 c212 c213 c214 c215 c216 c217 c221 c222 c223 c224 c225 c226 c227 c228 c229 c230 c231 c232 c233 c234 c235 c236 c237 c241 c242 c243 c244 c245 c246 c247 c248 c249 c250 Time-varying state disturbance loading matrix B contains unknown parameters: c251 c252 c253 c254 c255 c256 c257 c258 c259 c260 c261 c262 c263 c264 c265 c266 c267 c271 c272 c273 c274 c275 c276 c277 c278 c279 c280 c281 c282 c283 c284 c285 c286 c287 c291 c292 c293 c294 c295 c296 c297 c298 c299 c300 c301 c302 c303 c304 c305 c306 c307
c118 c138 c158 c178 c198 c218 c238
c119 c139 c159 c179 c199 c219 c239
c1 c1 c1 c1 c2 c2 c2
c268 c269 c2 c288 c289 c2 c308 c309 c3
12-491
12
Functions
c311 c312 c313 c314 c315 c316 c317 c318 c319 c320 c321 c322 c323 c324 c325 c326 c327 c328 c329 c3 c331 c332 c333 c334 c335 c336 c337 c338 c339 c340 c341 c342 c343 c344 c345 c346 c347 c348 c349 c3 Observation equation (in period 50): y1(t) = (c449)x1(t) + (c450)x3(t) + (c550)e1(t) Time-varying measurement sensitivity matrix C contains unknown parameters: c351 c352 c353 c354 c355 c356 c357 c358 c359 c360 c361 c362 c363 c364 c365 c366 c367 c371 c372 c373 c374 c375 c376 c377 c378 c379 c380 c381 c382 c383 c384 c385 c386 c387 c391 c392 c393 c394 c395 c396 c397 c398 c399 c400 c401 c402 c403 c404 c405 c406 c407 c411 c412 c413 c414 c415 c416 c417 c418 c419 c420 c421 c422 c423 c424 c425 c426 c427 c431 c432 c433 c434 c435 c436 c437 c438 c439 c440 c441 c442 c443 c444 c445 c446 c447 c451 c452 c453 c454 c455 c456 c457 c458 c459 c460 c461 c462 c463 c464 c465 c466 c467 c471 c472 c473 c474 c475 c476 c477 c478 c479 c480 c481 c482 c483 c484 c485 c486 c487 c491 c492 c493 c494 c495 c496 c497 c498 c499 c500 Time-varying observation innovation loading matrix D contains unknown parameters: c501 c502 c503 c504 c505 c506 c507 c508 c509 c510 c511 c512 c513 c514 c515 c516 c517 c521 c522 c523 c524 c525 c526 c527 c528 c529 c530 c531 c532 c533 c534 c535 c536 c537 c541 c542 c543 c544 c545 c546 c547 c548 c549 c550 c551 c552 c553 c554 c555 c556 c557 c561 c562 c563 c564 c565 c566 c567 c568 c569 c570 c571 c572 c573 c574 c575 c576 c577 c581 c582 c583 c584 c585 c586 c587 c588 c589 c590 c591 c592 c593 c594 c595 c596 c597 Initial state distribution: Initial state means x1 x2 x3 x4 NaN NaN NaN NaN Initial state covariance matrix x1 x2 x3 x4 x1 NaN NaN NaN NaN x2 NaN NaN NaN NaN x3 NaN NaN NaN NaN x4 NaN NaN NaN NaN State types x1 Stationary
x2 Stationary
x3 Stationary
x4 Stationary
disp(Mdl,'Period',51) State-space model type: ssm State vector length: Time-varying Observation vector length: 1 State disturbance vector length: Time-varying Observation innovation vector length: 1 Sample size supported by model: 100 Unknown parameters for estimation: 620 State variables: x1, x2,... State disturbances: u1, u2,... Observation series: y1, y2,... Observation innovations: e1, e2,... Unknown parameters: c1, c2,... State equations (in period 51): x1(t) = (c151)x1(t-1) + (c152)x2(t-1) + (c301)u1(t) x2(t) = x1(t-1)
12-492
c368 c388 c408 c428 c448 c468 c488
c369 c389 c409 c429 c449 c469 c489
c3 c3 c4 c4 c4 c4 c4
c518 c538 c558 c578 c598
c519 c539 c559 c579 c599
c5 c5 c5 c5 c6
disp
Time-varying transition matrix A contains unknown parameters: c1 c2 c3 c4 c5 c6 c7 c8 c9 c10 c11 c12 c13 c14 c15 c16 c17 c18 c19 c20 c21 c22 c23 c24 c25 c26 c27 c28 c29 c30 c31 c32 c33 c34 c35 c36 c37 c38 c39 c40 c41 c42 c43 c44 c45 c46 c47 c48 c49 c50 c51 c52 c53 c54 c55 c56 c57 c58 c59 c60 c61 c62 c63 c64 c65 c66 c67 c68 c69 c70 c71 c72 c73 c74 c75 c76 c77 c78 c79 c80 c81 c82 c83 c84 c85 c86 c87 c88 c89 c90 c91 c92 c93 c94 c95 c96 c97 c98 c99 c100 c101 c102 c103 c104 c105 c106 c107 c108 c109 c110 c111 c112 c113 c114 c115 c116 c117 c121 c122 c123 c124 c125 c126 c127 c128 c129 c130 c131 c132 c133 c134 c135 c136 c137 c141 c142 c143 c144 c145 c146 c147 c148 c149 c150 c151 c152 c153 c154 c155 c156 c157 c161 c162 c163 c164 c165 c166 c167 c168 c169 c170 c171 c172 c173 c174 c175 c176 c177 c181 c182 c183 c184 c185 c186 c187 c188 c189 c190 c191 c192 c193 c194 c195 c196 c197 c201 c202 c203 c204 c205 c206 c207 c208 c209 c210 c211 c212 c213 c214 c215 c216 c217 c221 c222 c223 c224 c225 c226 c227 c228 c229 c230 c231 c232 c233 c234 c235 c236 c237 c241 c242 c243 c244 c245 c246 c247 c248 c249 c250 Time-varying state disturbance loading matrix B contains unknown parameters: c251 c252 c253 c254 c255 c256 c257 c258 c259 c260 c261 c262 c263 c264 c265 c266 c267 c271 c272 c273 c274 c275 c276 c277 c278 c279 c280 c281 c282 c283 c284 c285 c286 c287 c291 c292 c293 c294 c295 c296 c297 c298 c299 c300 c301 c302 c303 c304 c305 c306 c307 c311 c312 c313 c314 c315 c316 c317 c318 c319 c320 c321 c322 c323 c324 c325 c326 c327 c331 c332 c333 c334 c335 c336 c337 c338 c339 c340 c341 c342 c343 c344 c345 c346 c347 Observation equation (in period 51): y1(t) = (c451)x1(t) + (c551)e1(t) Time-varying measurement sensitivity matrix C contains unknown parameters: c351 c352 c353 c354 c355 c356 c357 c358 c359 c360 c361 c362 c363 c364 c365 c366 c367 c371 c372 c373 c374 c375 c376 c377 c378 c379 c380 c381 c382 c383 c384 c385 c386 c387 c391 c392 c393 c394 c395 c396 c397 c398 c399 c400 c401 c402 c403 c404 c405 c406 c407 c411 c412 c413 c414 c415 c416 c417 c418 c419 c420 c421 c422 c423 c424 c425 c426 c427 c431 c432 c433 c434 c435 c436 c437 c438 c439 c440 c441 c442 c443 c444 c445 c446 c447 c451 c452 c453 c454 c455 c456 c457 c458 c459 c460 c461 c462 c463 c464 c465 c466 c467 c471 c472 c473 c474 c475 c476 c477 c478 c479 c480 c481 c482 c483 c484 c485 c486 c487 c491 c492 c493 c494 c495 c496 c497 c498 c499 c500 Time-varying observation innovation loading matrix D contains unknown parameters: c501 c502 c503 c504 c505 c506 c507 c508 c509 c510 c511 c512 c513 c514 c515 c516 c517 c521 c522 c523 c524 c525 c526 c527 c528 c529 c530 c531 c532 c533 c534 c535 c536 c537 c541 c542 c543 c544 c545 c546 c547 c548 c549 c550 c551 c552 c553 c554 c555 c556 c557 c561 c562 c563 c564 c565 c566 c567 c568 c569 c570 c571 c572 c573 c574 c575 c576 c577 c581 c582 c583 c584 c585 c586 c587 c588 c589 c590 c591 c592 c593 c594 c595 c596 c597
c118 c138 c158 c178 c198 c218 c238
c119 c139 c159 c179 c199 c219 c239
c1 c1 c1 c1 c2 c2 c2
c268 c288 c308 c328 c348
c269 c289 c309 c329 c349
c2 c2 c3 c3 c3
c368 c388 c408 c428 c448 c468 c488
c369 c389 c409 c429 c449 c469 c489
c3 c3 c4 c4 c4 c4 c4
c518 c538 c558 c578 c598
c519 c539 c559 c579 c599
c5 c5 c5 c5 c6
Initial state distribution: Initial state means x1 x2 x3 x4 NaN NaN NaN NaN Initial state covariance matrix x1 x2 x3 x4 x1 NaN NaN NaN NaN x2 NaN NaN NaN NaN x3 NaN NaN NaN NaN x4 NaN NaN NaN NaN State types x1 Stationary
x2 Stationary
x3 Stationary
x4 Stationary
disp(Mdl,'Period',52)
12-493
12
Functions
State-space model type: ssm State vector length: Time-varying Observation vector length: 1 State disturbance vector length: Time-varying Observation innovation vector length: 1 Sample size supported by model: 100 Unknown parameters for estimation: 620 State variables: x1, x2,... State disturbances: u1, u2,... Observation series: y1, y2,... Observation innovations: e1, e2,... Unknown parameters: c1, c2,... State equations (in period 52): x1(t) = (c153)x1(t-1) + (c154)x2(t-1) + (c302)u1(t) x2(t) = x1(t-1) Time-varying transition matrix A contains unknown parameters: c1 c2 c3 c4 c5 c6 c7 c8 c9 c10 c11 c12 c13 c14 c15 c16 c17 c18 c19 c20 c21 c22 c23 c24 c25 c26 c27 c28 c29 c30 c31 c32 c33 c34 c35 c36 c37 c38 c39 c40 c41 c42 c43 c44 c45 c46 c47 c48 c49 c50 c51 c52 c53 c54 c55 c56 c57 c58 c59 c60 c61 c62 c63 c64 c65 c66 c67 c68 c69 c70 c71 c72 c73 c74 c75 c76 c77 c78 c79 c80 c81 c82 c83 c84 c85 c86 c87 c88 c89 c90 c91 c92 c93 c94 c95 c96 c97 c98 c99 c100 c101 c102 c103 c104 c105 c106 c107 c108 c109 c110 c111 c112 c113 c114 c115 c116 c117 c121 c122 c123 c124 c125 c126 c127 c128 c129 c130 c131 c132 c133 c134 c135 c136 c137 c141 c142 c143 c144 c145 c146 c147 c148 c149 c150 c151 c152 c153 c154 c155 c156 c157 c161 c162 c163 c164 c165 c166 c167 c168 c169 c170 c171 c172 c173 c174 c175 c176 c177 c181 c182 c183 c184 c185 c186 c187 c188 c189 c190 c191 c192 c193 c194 c195 c196 c197 c201 c202 c203 c204 c205 c206 c207 c208 c209 c210 c211 c212 c213 c214 c215 c216 c217 c221 c222 c223 c224 c225 c226 c227 c228 c229 c230 c231 c232 c233 c234 c235 c236 c237 c241 c242 c243 c244 c245 c246 c247 c248 c249 c250 Time-varying state disturbance loading matrix B contains unknown parameters: c251 c252 c253 c254 c255 c256 c257 c258 c259 c260 c261 c262 c263 c264 c265 c266 c267 c271 c272 c273 c274 c275 c276 c277 c278 c279 c280 c281 c282 c283 c284 c285 c286 c287 c291 c292 c293 c294 c295 c296 c297 c298 c299 c300 c301 c302 c303 c304 c305 c306 c307 c311 c312 c313 c314 c315 c316 c317 c318 c319 c320 c321 c322 c323 c324 c325 c326 c327 c331 c332 c333 c334 c335 c336 c337 c338 c339 c340 c341 c342 c343 c344 c345 c346 c347 Observation equation (in period 52): y1(t) = (c452)x1(t) + (c552)e1(t) Time-varying measurement sensitivity matrix C contains unknown parameters: c351 c352 c353 c354 c355 c356 c357 c358 c359 c360 c361 c362 c363 c364 c365 c366 c367 c371 c372 c373 c374 c375 c376 c377 c378 c379 c380 c381 c382 c383 c384 c385 c386 c387 c391 c392 c393 c394 c395 c396 c397 c398 c399 c400 c401 c402 c403 c404 c405 c406 c407 c411 c412 c413 c414 c415 c416 c417 c418 c419 c420 c421 c422 c423 c424 c425 c426 c427 c431 c432 c433 c434 c435 c436 c437 c438 c439 c440 c441 c442 c443 c444 c445 c446 c447 c451 c452 c453 c454 c455 c456 c457 c458 c459 c460 c461 c462 c463 c464 c465 c466 c467 c471 c472 c473 c474 c475 c476 c477 c478 c479 c480 c481 c482 c483 c484 c485 c486 c487 c491 c492 c493 c494 c495 c496 c497 c498 c499 c500 Time-varying observation innovation loading matrix D contains unknown parameters: c501 c502 c503 c504 c505 c506 c507 c508 c509 c510 c511 c512 c513 c514 c515 c516 c517 c521 c522 c523 c524 c525 c526 c527 c528 c529 c530 c531 c532 c533 c534 c535 c536 c537 c541 c542 c543 c544 c545 c546 c547 c548 c549 c550 c551 c552 c553 c554 c555 c556 c557 c561 c562 c563 c564 c565 c566 c567 c568 c569 c570 c571 c572 c573 c574 c575 c576 c577 c581 c582 c583 c584 c585 c586 c587 c588 c589 c590 c591 c592 c593 c594 c595 c596 c597 Initial state distribution:
12-494
c118 c138 c158 c178 c198 c218 c238
c119 c139 c159 c179 c199 c219 c239
c1 c1 c1 c1 c2 c2 c2
c268 c288 c308 c328 c348
c269 c289 c309 c329 c349
c2 c2 c3 c3 c3
c368 c388 c408 c428 c448 c468 c488
c369 c389 c409 c429 c449 c469 c489
c3 c3 c4 c4 c4 c4 c4
c518 c538 c558 c578 c598
c519 c539 c559 c579 c599
c5 c5 c5 c5 c6
disp
Initial state means x1 x2 x3 x4 NaN NaN NaN NaN Initial state covariance matrix x1 x2 x3 x4 x1 NaN NaN NaN NaN x2 NaN NaN NaN NaN x3 NaN NaN NaN NaN x4 NaN NaN NaN NaN State types x1 Stationary
x2 Stationary
x3 Stationary
x4 Stationary
The software attributes a different set of coefficients for each period. You might experience numerical issues when you estimate such models. To reuse parameters among groups of periods, consider creating a parameter-to-matrix mapping function.
Input Arguments Mdl — Standard state-space model ssm model object Standard state-space model, specified as an ssm model object returned by ssm or estimate. params — Initial values for unknown parameters [] (default) | numeric vector Initial values for unknown parameters, specified as a numeric vector. The elements of params correspond to the unknown parameters in the state-space model matrices A, B, C, and D, and, optionally, the initial state mean Mean0 and covariance matrix Cov0. • If you created Mdl explicitly (that is, by specifying the matrices without a parameter-to-matrix mapping function), then the software maps the elements of params to NaNs in the state-space model matrices and initial state values. The software searches for NaNs column-wise, following the order A, B, C, D, Mean0, Cov0. • If you created Mdl implicitly (that is, by specifying the matrices with a parameter-to-matrix mapping function), then you must set initial parameter values for the state-space model matrices, initial state values, and state types within the parameter-to-matrices mapping function. To set the type of initial state distribution, see ssm. Data Types: double Name-Value Pair Arguments Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter. Before R2021a, use commas to separate each name and value, and enclose Name in quotes. 12-495
12
Functions
Example: disp(Mdl,'MaxStateEq',50) MaxStateEq — Maximum number of equations to display 100 (default) | positive integer Maximum number of equations to display, specified as the comma-separated pair consisting of 'MaxStateEq' and a positive integer. If the maximum number of states among all periods is no larger than MaxStateEq, then the software displays the model equation by equation. Example: 'MaxStateEq',10 Data Types: double NumDigits — Number of digits to display after decimal point 2 (default) | nonnegative integer Number of digits to display after the decimal point for known or estimated model coefficients, specified as the comma-separated pair consisting of 'NumDigits' and a nonnegative integer. Example: 'NumDigits',0 Data Types: double Period — Period to display state and observation equations positive integer Period to display state and observation equations for time-varying state-space models, specified as the comma-separated pair consisting of 'Period' and a positive integer. By default, the software displays state and observation equations for all periods. If Period exceeds the maximum number of observations that the model supports, then the software displays state and observation equations for all periods. If the model has more than 20 different sets of equations, then the software displays the first and last 10 groups in terms of time (the last group is the latest). Example: 'Period',120 Data Types: double PredictorsPerRow — Number of equation terms to display per row 5 (default) | positive integer Number of equation terms to display per row, specified as the comma-separated pair consisting of 'PredictorsPerRow' and a positive integer. Example: 'PredictorsPerRow',3 Data Types: double
Tips • The software always displays explicitly specified state-space models (that is, models you create without using a parameter-to-matrix mapping function). Try explicitly specifying state-space models first so that you can verify them using disp. • A parameter-to-matrix function that you specify to create Mdl is a black box to the software. Therefore, the software might not display complex, implicitly defined state-space models. 12-496
disp
Algorithms • If you implicitly create Mdl, and if the software cannot infer locations for unknown parameters from the parameter-to-matrix function, then the software evaluates these parameters using their initial values and displays them as numeric values. This evaluation can occur when the parameterto-matrix function has a random, unknown coefficient, which is a convenient form for a Monte Carlo study. • The software displays the initial state distributions as numeric values. This type of display occurs because, in many cases, the initial distribution depends on the values of the state equation matrices A and B. These values are often a complicated function of unknown parameters. In such situations, the software does not display the initial distribution symbolically. Additionally, if Mean0 and Cov0 contain unknown parameters, then the software evaluates and displays numeric values for the unknown parameters.
Version History Introduced in R2014a
References [1] Durbin J., and S. J. Koopman. Time Series Analysis by State Space Methods. 2nd ed. Oxford: Oxford University Press, 2012.
See Also ssm | estimate | filter | smooth | forecast | simulate Topics “Create State-Space Model with Random State Coefficient” on page 11-31 “What Are State-Space Models?” on page 11-3
12-497
12
Functions
dssm Create diffuse linear Gaussian state-space model
Description The dssm function returns a dssm object specifying the functional form and storing the parameter values of a diffuse linear Gaussian state-space model on page 11-4 for a latent state process xt possibly imperfectly observed through the variable yt. The variables xt and yt can be univariate or multivariate and the model parameters can be time-invariant on page 11-4 or time-varying on page 11-5. A diffuse state-space model contains diffuse states, and variances of the initial distributions of diffuse states are Inf. All diffuse states are independent of each other and all other states. Object functions on page 12-504 of the dssm object implement the diffuse Kalman filter for filtering, smoothing, and parameter estimation. The key components of a dssm object are the state-transition A, state-disturbance-loading B, measurement-sensitivity C, and observation-innovation D coefficient matrices because they completely specify the model structure. You can explicitly specify each matrix or supply a custom function that implicitly specifies them. Regardless, given the model structure, all coefficients are unknown and estimable unless you specify their values. To estimate a model containing unknown parameter values, pass the model and data to estimate. To work with an estimated or fully specified dssm object, pass it to an object function on page 12-504. Alternative state-space models include: • The ssm model object — Standard linear Gaussian state-space model • The bssm model object — Bayesian linear state-space model • The bnlssm model object — Bayesian nonlinear non-Gaussian state-space model
Creation Syntax Mdl = dssm(A,B,C) Mdl = dssm(A,B,C,D) Mdl = dssm( ___ ,Name=Value) Mdl = dssm(ParamMap) Mdl = dssm(SSMMdl) Description Explicitly Specify Coefficient Matrices
Mdl = dssm(A,B,C) returns the diffuse linear Gaussian state-space model on page 11-4 Mdl with state-transition matrix A, state-disturbance-loading matrix B, and measurement-sensitivity matrix C. 12-498
dssm
At each time t, the state combination yt = Cxt is observed without error. dssm sets the model properties on page 12-499 A, B, and C from the corresponding inputs. Mdl = dssm(A,B,C,D) additionally specifies the observation-innovation matrix D and sets the property D. Mdl = dssm( ___ ,Name=Value) sets properties that describe the initial state distribution using name-value arguments, and using any input-argument combination in the previous syntaxes. For example, dssm(A,B,C,StateType=[0; 1; 2]) specifies that the first state variable is initially stationary, the second state variable is initially the constant 1, and the third state variable is initially nonstationary. Implicitly Specify Coefficient Matrices By Using Custom Function
Mdl = dssm(ParamMap) returns the diffuse state-space model Mdl whose structure is specified by the custom parameter-to-matrix mapping function ParamMap. The function maps a parameter vector θ to the matrices A, B, and C. Optionally, ParamMap can map parameters to D, Mean0, Cov0, or StateType. To accommodate a regression component in the observation equation, ParamMap can return deflated observation data. Convert from Diffuse to Standard State-Space Model
Mdl = dssm(SSMMdl) converts a standard state-space model object SSMMdl to a diffuse state-space model object Mdl. dssm sets all initial variances of diffuse states in Mdl.Cov0 to Inf. Because Mdl is a diffuse state-space model, dssm object functions apply the diffuse Kalman filter, instead of the standard Kalman filter, for filtering, smoothing, and parameter estimation. Input Arguments SSMMdl — Standard state-space model ssm model object Standard state-space model to convert to a diffuse state-space model, specified as an ssm model object. dssm sets all initial state variances of nonconstant states to Inf and all initial states variances of constant states to 0.
Properties A — State-transition coefficient matrix matrix | cell vector of matrices | empty array [] State-transition coefficient matrix for explicit state-space model creation, specified as a matrix or cell vector of matrices. When you implicitly create a model, dssm sets A to an empty array, and determines the state-transition coefficient matrix from ParamMap The state-transition coefficient matrix, At, specifies how the states, xt, are expected to transition from period t – 1 to t, for all t = 1,...,T. That is, the expected state-transition equation at period t is E(xt|xt–1) = Atxt–1. For time-invariant state-space models, specify A as an m-by-m matrix, where m is the number of states per period. 12-499
12
Functions
For time-varying state-space models, specify A as a T-dimensional cell array, where A{t} contains an mt-by-mt – 1 state-transition coefficient matrix. If the number of states changes from period t – 1 to t, then mt ≠ mt – 1. NaN values in any coefficient matrix indicate unique, unknown parameters in the state-space model. A contributes: • sum(isnan(A(:))) unknown parameters to time-invariant state-space models. In other words, if the state-space model is time invariant, then the software uses the same unknown parameters defined in A at each period. • numParamsA unknown parameters to time-varying state-space models, where numParamsA = sum(cell2mat(cellfun(@(x)sum(sum(isnan(x))),A,'UniformOutput',0))). In other words, if the state-space model is time varying, then the software assigns a new set of parameters for each matrix in A. You cannot specify A and ParamMap simultaneously. Data Types: double | cell B — State-disturbance-loading coefficient matrix matrix | cell vector of matrices | empty array [] State-disturbance-loading coefficient matrix for explicit state-space model creation, specified as a matrix or cell vector of matrices. When you implicitly create a model, dssm sets B to an empty array, and determines the state-disturbance-loading coefficient matrix from ParamMap. The state disturbances, ut, are independent Gaussian random variables with mean 0 and standard deviation 1. The state-disturbance-loading coefficient matrix, Bt, specifies the additive error structure in the state-transition equation from period t – 1 to t, for all t = 1,...,T. That is, the state-transition equation at period t is xt = Atxt–1 + Btut. For time-invariant state-space models, specify B as an m-by-k matrix, where m is the number of states and k is the number of state disturbances per period. B*B' is the state-disturbance covariance matrix for all periods. For time-varying state-space models, specify B as a T-dimensional cell array, where B{t} contains an mt-by-kt state-disturbance-loading coefficient matrix. If the number of states or state disturbances changes at period t, then the matrix dimensions between B{t-1} and B{t} vary. B{t}*B{t}' is the state-disturbance covariance matrix for period t. NaN values in any coefficient matrix indicate unique, unknown parameters in the state-space model. B contributes: • sum(isnan(B(:))) unknown parameters to time-invariant state-space models. In other words, if the state-space model is time invariant, then the software uses the same unknown parameters defined in B at each period. • numParamsB unknown parameters to time-varying state-space models, where numParamsB = sum(cell2mat(cellfun(@(x)sum(sum(isnan(x))),B,'UniformOutput',0))). In other words, if the state-space model is time varying, then the software assigns a new set of parameters for each matrix in B. You cannot specify B and ParamMap simultaneously. Data Types: double | cell 12-500
dssm
C — Measurement-sensitivity coefficient matrix matrix | cell vector of matrices | empty array [] Measurement-sensitivity coefficient matrix for explicit state-space model creation, specified as a matrix or cell vector of matrices. When you implicitly create a model, dssm sets C to an empty array, and determines the measurement-sensitivity coefficient matrix from ParamMap. The measurement-sensitivity coefficient matrix, Ct, specifies how the states are expected to linearly combine at period t to form the observations, yt, for all t = 1,...,T. That is, the expected observation equation at period t is E(yt|xt) = Ctxt. For time-invariant state-space models, specify C as an n-by-m matrix, where n is the number of observations and m is the number of states per period. For time-varying state-space models, specify C as a T-dimensional cell array, where C{t} contains an nt-by-mt measurement-sensitivity coefficient matrix. If the number of states or observations changes at period t, then the matrix dimensions between C{t-1} and C{t} vary. NaN values in any coefficient matrix indicate unique, unknown parameters in the state-space model. C contributes: • sum(isnan(C(:))) unknown parameters to time-invariant state-space models. In other words, if the state-space model is time invariant, then the software uses the same unknown parameters defined in C at each period. • numParamsC unknown parameters to time-varying state-space models, where numParamsC = sum(cell2mat(cellfun(@(x)sum(sum(isnan(x))),C,'UniformOutput',0))). In other words, if the state-space model is time varying, then the software assigns a new set of parameters for each matrix in C. You cannot specify C and ParamMap simultaneously. Data Types: double | cell D — Observation-innovation coefficient matrix empty array [] (default) | matrix | cell vector of matrices Observation-innovation coefficient matrix for explicit state-space model creation, specified as a matrix or cell vector of matrices. The observation innovations, εt, are independent Gaussian random variables with mean 0 and standard deviation 1. The observation-innovation coefficient matrix, Dt, specifies the additive error structure in the observation equation at period t, for all t = 1,...,T. That is, the observation equation at period t is yt = Ctxt + Dtεt. For time-invariant state-space models, specify D as an n-by-h matrix, where n is the number of observations and h is the number of observation innovations per period. D*D' is the observationinnovation covariance matrix for all periods. For time-varying state-space models, specify D as a T-dimensional cell array, where D{t} contains an nt-by-ht matrix. If the number of observations or observation innovations changes at period t, then the matrix dimensions between D{t-1} and D{t} vary. D{t}*D{t}' is the observation-innovation covariance matrix for period t. NaN values in any coefficient matrix indicate unique, unknown parameters in the state-space model. D contributes: 12-501
12
Functions
• sum(isnan(D(:))) unknown parameters to time-invariant state-space models. In other words, if the state-space model is time invariant, then the software uses the same unknown parameters defined in D at each period. • numParamsD unknown parameters to time-varying state-space models, where numParamsD = sum(cell2mat(cellfun(@(x)sum(sum(isnan(x))),D,'UniformOutput',0))). In other words, if the state-space model is time varying, then the software assigns a new set of parameters for each matrix in D. By default, D is an empty matrix indicating no observation innovations in the state-space model. However, when you implicitly create a model, dssm sets D to [], and determines the observationinnovation coefficient matrix from ParamMap. You cannot specify D and ParamMap simultaneously. Data Types: double | cell Mean0 — Initial state mean numeric vector | empty array [] Initial state mean for explicit state-space model creation, specified as a numeric vector or an empty array ([]). As a numeric vector, Mean0 has length equal to the number of initial states (size(A,1) or size(A{1},1)). Mean0 is the mean of the Gaussian distribution of the states at period 0. If you implicitly create a state-space model by specifying ParamMap, the following conditions apply: • You cannot specify the Mean0 property by using name-value argument syntax. Instead, specify the initial state mean in the parameter-to-matrix mapping function. • Before you estimate the model by using the estimate function, Mean0 is [] and read only. The estimate function specifies Mean0 after estimation. For the default values, see “Algorithms” on page 12-516. Data Types: double Cov0 — Initial state covariance matrix square matrix | empty array [] Initial state covariance matrix, specified as a square matrix or an empty array []. As a matrix, Cov0 has dimensions equal to the number of initial states (size(A,1) or size(A{1},1)). Cov0 is the covariance of the Gaussian distribution of the states at period 0. If you implicitly create a state-space model by specifying ParamMap, the following conditions apply: • You cannot specify the Cov0 property by using name-value argument syntax. Instead, specify the initial state covariance in the parameter-to-matrix mapping function. • Before you estimate the model by using the estimate function, Cov0 is [] and read only. The estimate function specifies Cov0 after estimation. Diagonal elements of Cov0 that have value Inf correspond to diffuse initial state distributions. This specification indicates complete ignorance or no prior knowledge of the initial state value. Subsequently, the software filters, smooths, and estimates parameters in the presence of diffuse initial state distributions using the diffuse Kalman filter. To use the standard Kalman filter for diffuse states instead, set each diagonal element of Cov0 to a large, positive value, for example, 1e7. This specification suggests relatively weak knowledge of the initial state value. 12-502
dssm
For the default values, see “Algorithms” on page 12-516. Data Types: double StateType — Initial state distribution type numeric vector | empty array [] Initial state distribution type, specified as a numeric vector or empty array []. As a numeric vector, StateType has length equal to the number of initial states (size(A,1) or size(A{1},1)). This table summarizes the available types of initial state distributions. Value
Initial State Distribution Type
0
Stationary (e.g., ARMA models)
1
The constant 1 (that is, the state is 1 with probability 1)
2
Nonstationary (e.g., random walk model, seasonal linear time series) or static state on page 12-515
Example: Suppose that the state equation has two state variables: The first state variable is an AR(1) process, and the second state variable is a random walk. Set StateType to [0; 2]. If you implicitly create a state-space model by specifying ParamMap, the following conditions apply: • You cannot specify the StateType property by using name-value argument syntax. Instead, specify the initial state covariance in the parameter-to-matrix mapping function. • Before you estimate the model by using the estimate function, StateType is [] and read only. The estimate function specifies StateType after estimation. For nonstationary states, dssm sets Cov0 to Inf by default. Subsequently, the software assumes that diffuse states are uncorrelated and implements the diffuse Kalman filter for filtering, smoothing, and parameter estimation. This specification imposes no prior knowledge on the initial state values of diffuse states. Data Types: double ParamMap — Parameter-to-matrix mapping function empty array [] (default) | function handle Parameter-to-matrix mapping function for implicit state-space model creation, specified as a function handle. The function, to which ParamMap is a function handle, must accept at least one input argument and return at least three output arguments. The requisite input argument is a vector of unknown statespace model parameters θ, and the requisite output arguments correspond to the coefficient matrices A, B, and C, respectively. If your parameter-to-mapping function requires the input θ only, then implicitly create a state-space model by entering Mdl = ssm(@ParamMap)
In general, you can write an intermediate function, for example, ParamFun, using the syntax function [A,B,C,D,Mean0,Cov0,StateType,DeflateY] = ... ParamFun(theta,...otherInputArgs...)
12-503
12
Functions
In this general case, create the state-space model by entering Mdl = ssm(@(theta)ParamMap(theta,...otherInputArgs...))
However, the following conditions apply: • Follow the order of the output arguments. • theta is a vector, and each element corresponds to an unknown state-space model parameter. • ParamFun must return A, B, and C, which correspond to the state-transition, state-disturbanceloading, and measurement-sensitivity coefficient matrices, respectively. • For the optional output arguments D, Mean0, Cov0, StateType, and DeflateY: • The optional output arguments correspond to the observation-innovation coefficient matrix D and the properties Mean0, Cov0, and StateType. • To skip specifying an optional output argument, set the argument to [] in the function body. For example, to skip specifying D, then set D = []; in the function. • DeflateY is the deflated-observation data, which accommodates a regression component in the observation equation. For example, in this function, which has a linear regression component, Y is the vector of observed responses and Z is the vector of predictor data. function [A,B,C,D,Mean0,Cov0,StateType,DeflateY] = ParamFun(theta,Y,Z) ... DeflateY = Y - theta(9) - theta(10)*Z; ... end
• For the default values of Mean0, Cov0, and StateType, see “Algorithms” on page 12-516. • It is best practice to: • Load the data to the MATLAB Workspace before specifying the model. • Create the parameter-to-matrix mapping function as its own file. If you specify ParamMap, you cannot specify any other property or input argument. If you explicitly create a state-space model, ParamMap is an empty array []. Data Types: function_handle
Object Functions Fit Model to Data estimate refine disp
Maximum likelihood parameter estimation of diffuse state-space models Refine initial parameters to aid diffuse state-space model estimation Display summary information for diffuse state-space model
Estimate State Variables filter smooth
12-504
Forward recursion of diffuse state-space models Backward recursion of diffuse state-space models
dssm
Impulse Response Function irf irfplot
Impulse response function (IRF) of state-space model Plot impulse response function (IRF) of state-space model
Generate Minimum Mean Square Error Forecasts forecast
Forecast states and observations of diffuse state-space models
Examples Explicitly Create Diffuse State-Space Model Containing Known and Unknown Parameters Create a diffuse state-space model containing two independent states, x1, t and x3, t, and an observation, yt, that is the deterministic sum of the two states at time t. x1 is an AR(1) model with a constant and x3 is a random walk. Symbolically, the state-space model is x1, t x2, t x3, t
σ1 0 ϕ1 c1 0 xt − 1, 1 ut, 1 = 0 1 0 xt − 1, 2 + 0 0 ut, 3 0 σ2 0 0 1 xt − 1, 3 xt, 1
yt = 1 0 1 xt, 2 . xt, 3 The state disturbances, u1, t and u3, t, are standard Gaussian random variables. Specify the state-transition matrix. A = [NaN NaN 0; 0 1 0; 0 0 1];
The NaN values indicate unknown parameters. Specify the state-disturbance-loading matrix. B = [NaN 0; 0 0; 0 NaN];
Specify the measurement-sensitivity matrix. C = [1 0 1];
Create a vector that specifies the state types. In this example: • x1, t is a stationary AR(1) model, so its state type is 0. • x2, t is a placeholder for the constant in the AR(1) model. Because the constant is unknown and is expressed in the first equation, x2, t is 1 for the entire series. Therefore, its state type is 1. • x3, t is a nonstationary, random walk with drift, so its state type is 2. StateType = [0 1 2];
Create the state-space model using dssm. 12-505
12
Functions
Mdl = dssm(A,B,C,StateType=StateType) Mdl = State-space model type: dssm State vector length: 3 Observation vector length: 1 State disturbance vector length: 2 Observation innovation vector length: 0 Sample size supported by model: Unlimited Unknown parameters for estimation: 4 State variables: x1, x2,... State disturbances: u1, u2,... Observation series: y1, y2,... Observation innovations: e1, e2,... Unknown parameters: c1, c2,... State x1(t) x2(t) x3(t)
equations: = (c1)x1(t-1) + (c2)x2(t-1) + (c3)u1(t) = x2(t-1) = x3(t-1) + (c4)u2(t)
Observation equation: y1(t) = x1(t) + x3(t) Initial state distribution: Initial state means are not specified. Initial state covariance matrix is not specified. State types x1 x2 x3 Stationary Constant Diffuse
Mdl is a dssm model object containing unknown parameters. A detailed summary of Mdl prints to the Command Window. If you do not specify the initial state covariance matrix, then the initial variance of x3, t is Inf. It is good practice to verify that the state and observation equations are correct. If the equations are not correct, then expand the state-space equation and verify it manually.
Explicitly Create Diffuse State-Space Model Containing Observation Error Create a diffuse state-space model containing two random walk states. The observations are the sum of the two states, plus Gaussian error. Symbolically, the equation is xt, 1 xt, 2
=
σ1 0 ut, 1 1 0 xt − 1, 1 + 0 1 xt − 1, 2 0 σ2 ut, 2
yt = 1 1
xt, 1 xt, 2
+ σ3εt .
Define the state-transition matrix. 12-506
dssm
A = [1 0; 0 1];
Define the state-disturbance-loading matrix. B = [NaN 0; 0 NaN];
Define the measurement-sensitivity matrix. C = [1 1];
Define the observation-innovation matrix. D = NaN;
Create a vector that specifies that both states are nonstationary. StateType = [2; 2];
Create the state-space model using dssm. Mdl = dssm(A,B,C,D,StateType=StateType) Mdl = State-space model type: dssm State vector length: 2 Observation vector length: 1 State disturbance vector length: 2 Observation innovation vector length: 1 Sample size supported by model: Unlimited Unknown parameters for estimation: 3 State variables: x1, x2,... State disturbances: u1, u2,... Observation series: y1, y2,... Observation innovations: e1, e2,... Unknown parameters: c1, c2,... State equations: x1(t) = x1(t-1) + (c1)u1(t) x2(t) = x2(t-1) + (c2)u2(t) Observation equation: y1(t) = x1(t) + x2(t) + (c3)e1(t) Initial state distribution: Initial state means are not specified. Initial state covariance matrix is not specified. State types x1 x2 Diffuse Diffuse
Mdl is an dssm model containing unknown parameters. A detailed summary of Mdl prints to the Command Window. Pass the data and Mdl to estimate to estimate the parameters. During estimation, the initial state variances are Inf, and estimate implements the diffuse Kalman filter. 12-507
12
Functions
Create Known Diffuse State-Space Model with Initial State Values Create a diffuse state-space model, where: • The state x1, t is a stationary AR(2) model with ϕ1 = 0 . 6, ϕ2 = 0 . 2, and a constant 0.5. The state disturbance is a mean zero Gaussian random variable with standard deviation 0.3. • The state x4, t is a random walk. The state disturbance is a mean zero Gaussian random variable with standard deviation 0.05. • The observation y1, t is the difference between the current and previous value in the AR(2) state, plus a mean 0 Gaussian observation innovation with standard deviation 0.1. • The observation y2, t is the random walk state plus a mean 0 Gaussian observation innovation with standard deviation 0.02. Symbolically, the state-space model is x1, t x2, t x3, t x4, t
0.6 1 = 0 0
0.2 0 0 0
0.5 0 1 0
0 0 0 1
x1, t − 1 x2, t − 1 x3, t − 1 x4, t − 1
0.3 0 u1, t 0 0 + 0 0 u4, t 0 0 . 05
x1, t y1, t y2, t
=
ε1, t 1 −1 0 0 x2, t 0.1 0 + . 0 0 0 1 x3, t 0 0 . 02 ε2, t x4, t
The model has four states: x1, t is the AR(2) process, x2, t represents x1, t − 1, x3, t is the AR(2) model constant, and x4, t is the random walk. Define the state-transition matrix. A = [0.6 0.2 0.5 0; 1 0 0 0; 0 0 1 0; 0 0 0 1];
Define the state-disturbance-loading matrix. B = [0.3 0; 0 0; 0 0; 0 0.05];
Define the measurement-sensitivity matrix. C = [1 -1 0 0; 0 0 0 1];
Define the observation-innovation matrix. D = [0.1; 0.02];
Use dssm to create the state-space model. Identify the type of initial state distributions (StateType) by noting the following: • x1, t is a stationary AR(2) process. 12-508
dssm
• x2, t is also a stationary AR(2) process. • x3, t is the constant 1 for all periods. • x4, t is nonstationary. Set the initial state means to 0. The initial state mean for constant states must be 1. mean0 = [0; 0; 1; 0]; stateType = [0; 0; 1; 2]; Mdl = dssm(A,B,C,D,Mean0=mean0,StateType=stateType) Mdl = State-space model type: dssm State vector length: 4 Observation vector length: 2 State disturbance vector length: 2 Observation innovation vector length: 1 Sample size supported by model: Unlimited State variables: x1, x2,... State disturbances: u1, u2,... Observation series: y1, y2,... Observation innovations: e1, e2,... State x1(t) x2(t) x3(t) x4(t)
equations: = (0.60)x1(t-1) + (0.20)x2(t-1) + (0.50)x3(t-1) + (0.30)u1(t) = x1(t-1) = x3(t-1) = x4(t-1) + (0.05)u2(t)
Observation equations: y1(t) = x1(t) - x2(t) + (0.10)e1(t) y2(t) = x4(t) + (0.02)e1(t) Initial state distribution: Initial state means x1 x2 x3 x4 0 0 1 0 Initial state covariance matrix x1 x2 x3 x4 x1 0.21 0.16 0 0 x2 0.16 0.21 0 0 x3 0 0 0 0 x4 0 0 0 Inf State types x1 Stationary
x2 Stationary
x3 Constant
x4 Diffuse
Mdl is a dssm model object. dssm sets the initial state: • Covariance matrix for the stationary states to the asymptotic covariance of the AR(2) model • Variance for constant states to 0 12-509
12
Functions
• Variance for diffuse states to Inf You can display or modify properties of Mdl using dot notation. For example, display the initial state covariance matrix. Mdl.Cov0 ans = 4×4 0.2143 0.1607 0 0
0.1607 0.2143 0 0
0 0 0 0
0 0 0 Inf
Reset the initial state means for the stationary states to their asymptotic values. Mdl.Mean0(1:2) = 0.5/(1-0.2-0.6); Mdl.Mean0 ans = 4×1 2.5000 2.5000 1.0000 0
Implicitly Create Time-Invariant State-Space Model Use a parameter mapping function to create a time-invariant state-space model, where the state model is AR(1) model. The states are observed with bias, but without random error. Set the initial state mean and variance, and specify that the state is stationary. Write a function that specifies how the parameters in params map to the state-space model matrices, the initial state values, and the type of state. Symbolically, the model is
% Copyright 2015 The MathWorks, Inc. function [A,B,C,D,Mean0,Cov0,StateType] = timeInvariantParamMap(params) % Time-invariant state-space model parameter mapping function example. This % function maps the vector params to the state-space matrices (A, B, C, and % D), the initial state value and the initial state variance (Mean0 and % Cov0), and the type of state (StateType). The state model is AR(1) % without observation error. varu1 = exp(params(2)); % Positive variance constraint A = params(1); B = sqrt(varu1); C = params(3); D = [];
12-510
dssm
Mean0 = 0.5; Cov0 = 100; StateType = 0; end
Save this code as a file named timeInvariantParamMap.m to a folder on your MATLAB® path. Create the state-space model by passing the function timeInvariantParamMap as a function handle to ssm. Mdl = ssm(@timeInvariantParamMap);
ssm implicitly creates the state-space model. Usually, you cannot verify implicitly defined state-space models.
Convert Standard to Diffuse State-Space Model By default, ssm assigns a large scalar (1e7) to the initial state variance of all diffuse states in a standard state-space model. Using this specification, the software subsequently estimates, filters, and smooths a standard state-space model using the standard Kalman filter. A standard state-space model treatment is an approximation to results from an analysis that treats diffuse states using infinite variance. To implement the diffuse Kalman filter instead, convert the standard state-space model to a diffuse state-space model. This conversion attributes infinite variance to all diffuse states. Explicitly create a two-dimensional standard state-space model. Specify that the first state equation is x1, t = x1, t − 1 + u1, t and that the second state equation is x2, t = 0 . 2x2, t − 1 + u2, t. Specify that the first observation equation is y1, t = x1, t + ε1, t and that the second observation equation is y2, t = x2, t + ε2, t. Specify that the states are diffuse and nonstationary, respectively. A = [1 0; 0 0.2]; B = [1 0; 0 1]; C = [1 0;0 1]; D = [1 0; 0 1]; stateType = [2 0]; MdlSSM = ssm(A,B,C,D,StateType=stateType) MdlSSM = State-space model type: ssm State vector length: 2 Observation vector length: 2 State disturbance vector length: 2 Observation innovation vector length: 2 Sample size supported by model: Unlimited State variables: x1, x2,... State disturbances: u1, u2,... Observation series: y1, y2,... Observation innovations: e1, e2,... State equations: x1(t) = x1(t-1) + u1(t) x2(t) = (0.20)x2(t-1) + u2(t)
12-511
12
Functions
Observation equations: y1(t) = x1(t) + e1(t) y2(t) = x2(t) + e2(t) Initial state distribution: Initial state means x1 x2 0 0 Initial state covariance matrix x1 x2 x1 1.00e+07 0 x2 0 1.04 State types x1 x2 Diffuse Stationary
MdlSSM is an ssm model object. In some cases, ssm can detect the state type, but it is good practice to specify whether the state is stationary, diffuse, or the constant 1. Because the model does not contain any unknown parameters, ssm infers the initial state distributions. Convert MdlSSM to a diffuse state-space model. Mdl = dssm(MdlSSM) Mdl = State-space model type: dssm State vector length: 2 Observation vector length: 2 State disturbance vector length: 2 Observation innovation vector length: 2 Sample size supported by model: Unlimited State variables: x1, x2,... State disturbances: u1, u2,... Observation series: y1, y2,... Observation innovations: e1, e2,... State equations: x1(t) = x1(t-1) + u1(t) x2(t) = (0.20)x2(t-1) + u2(t) Observation equations: y1(t) = x1(t) + e1(t) y2(t) = x2(t) + e2(t) Initial state distribution: Initial state means x1 x2 0 0 Initial state covariance matrix x1 x2
12-512
dssm
x1 x2
Inf 0
0 1.04
State types x1 x2 Diffuse Stationary
Mdl is a dssm model object. The structures of Mdl and MdlSSM are equivalent, except that the initial state variance of the state in Mdl is Inf rather than 1e7. To see the difference between the two models, simulate 10 periods of data from a state-space model that is similar to MdlSSM. Set the initial state covariance matrix to I2. Mdl0 = MdlSSM; Mdl0.Cov0 = eye(2); T = 10; rng(1); % For reproducibility y = simulate(Mdl0,T);
Obtain filtered and smoothed states from Mdl and MdlSSM using the simulated data. fY = filter(MdlSSM,y); fYD = filter(Mdl,y); sY = smooth(MdlSSM,y); sYD = smooth(Mdl,y);
Plot the filtered and smoothed states. figure tiledlayout(2,1) nexttile plot(1:T,y(:,1),"-o",1:T,fY(:,1),"-d",1:T,fYD(:,1),"-*") title("Filter Estimates of x_{1,t}") nexttile plot(1:T,y(:,1),"-o",1:T,sY(:,1),"-d",1:T,sYD(:,1),"-*") title("Smooth Esimates of x_{1,t}") legend("Simulated data","State estimates, MdlSSM","State estimates, Mdl", ... Location="best")
12-513
12
Functions
figure tiledlayout(2,1) nexttile plot(1:T,y(:,2),"-o",1:T,fY(:,2),"-d",1:T,fYD(:,2),"-*") title("Filtered States of x_{2,t}") nexttile plot(1:T,y(:,2),"-o",1:T,sY(:,2),"-d",1:T,sYD(:,2),"-*") title("Smoothed States of x_{2,t}") legend("Simulated data","State estimates, MdlSSM","State estimates, Mdl", ... Location="best")
12-514
dssm
In addition to apparent transient behavior in the random walk, the filtered and smoothed states between the standard and diffuse state-space models appear nearly equivalent. The slight difference occurs because filter and smooth set all diffuse state estimates in the diffuse state-space model to 0 while they implement the diffuse Kalman filter. Once the covariance matrices of the smoothed states attain full rank, filter and smooth switch to using the standard Kalman filter. In this case, the switching time occurs after the first period.
More About Static State A static state does not change in value throughout the sample, that is, P xt + 1 = xt = 1 for all t = 1,...,T.
Tip • Specify ParamMap in a more general or complex setting, where, for example: • The initial state values are parameters. • In time-varying models, you want to use the same parameters for more than one period. • You want to impose parameter constraints.
12-515
12
Functions
• You can create a dssm model object that does not contain any diffuse states. However, subsequent computations, for example, filtering and parameter estimation, can be inefficient. If all states have stationary distributions or are the constant 1, create an ssm model object instead.
Algorithms • Default values for Mean0 and Cov0: • If you explicitly specify the state-space model (that is, you provide the coefficient matrices A, B, C, and optionally D), then: • For stationary states, the software generates the initial value using the stationary distribution. If you provide all values in the coefficient matrices (that is, your model has no unknown parameters), then dssm generates the initial values. Otherwise, the software generates the initial values during estimation. • For states that are always the constant 1, dssm sets Mean0 to 1 and Cov0 to 0. • For diffuse states, the software sets Mean0 to 0 and Cov0 to Inf by default. • If you implicitly specify the state-space model (that is, you provide the parameter vector to the coefficient-matrices-mapping function ParamMap), then the software generates the initial values during estimation. • For static states that do not equal 1 throughout the sample, the software cannot assign a value to the degenerate, initial state distribution. Therefore, set static states to 2 using the name-value pair argument StateType. Subsequently, the software treats static states as nonstationary and assigns the static state a diffuse initial distribution. • It is best practice to set StateType for each state. By default, the software generates StateType, but this behavior might not be accurate. For example, the software cannot distinguish between a constant 1 state and a static state. • The software cannot infer StateType from data because the data theoretically comes from the observation equation. The realizations of the state equation are unobservable. • dssm models do not store observed responses or predictor data. Supply the data wherever necessary using the appropriate input or name-value pair arguments. • Suppose that you want to create a diffuse state-space model using a parameter-to-matrix mapping function with this signature: [A,B,C,D,Mean0,Cov0,StateType,DeflateY] = paramMap(params,Y,Z)
and you specify the model using an anonymous function Mdl = dssm(@(params)paramMap(params,Y,Z))
The observed responses Y and predictor data Z are not input arguments in the anonymous function. If Y and Z exist in the MATLAB Workspace before you create Mdl, then the software establishes a link to them. Otherwise, if you pass Mdl to estimate, the software throws an error. The link to the data established by the anonymous function overrides all other corresponding input argument values of dssm. This distinction is important particularly when conducting a rolling window analysis. For details, see “Rolling-Window Analysis of Time-Series Models” on page 11-135.
12-516
dssm
Alternatives Create an ssm model object instead of a dssm model object when: • The model does not contain any diffuse states. • The diffuse states are correlated with each other or to other states. • You want to implement the standard Kalman filter.
Version History Introduced in R2015b
References [1] Durbin, J, and Siem Jan Koopman. Time Series Analysis by State Space Methods. 2nd ed. Oxford: Oxford University Press, 2012.
See Also Objects bssm | ssm | bnlssm Topics “What Are State-Space Models?” on page 11-3 “Rolling-Window Analysis of Time-Series Models” on page 11-135 “Implicitly Create Time-Varying Diffuse State-Space Model” on page 11-28 “Implicitly Create Diffuse State-Space Model Containing Regression Component” on page 11-24
12-517
12
Functions
dtmc Create discrete-time Markov chain
Description dtmc creates a discrete-time, finite-state, time-homogeneous Markov chain from a specified state transition matrix. After creating a dtmc object, you can analyze the structure and evolution of the Markov chain, and visualize the Markov chain in various ways, by using the object functions on page 12-519. Also, you can use a dtmc object to specify the switching mechanism of a Markov-switching dynamic regression model (msVAR). To create a switching mechanism, governed by threshold transitions and threshold variable data, for a threshold-switching dynamic regression model, see threshold and tsVAR.
Creation Syntax mc = dtmc(P) mc = dtmc(P,'StateNames',stateNames) Description mc = dtmc(P) creates the discrete-time Markov chain object mc specified by the state transition matrix P. mc = dtmc(P,'StateNames',stateNames) optionally associates the names stateNames to the states. Input Arguments P — State transition matrix nonnegative numeric matrix State transition matrix, specified as a numStates-by-numStates nonnegative numeric matrix. P(i,j) is either the theoretical probability of a transition from state i to state j or an empirical count of observed transitions from state i to state j. P can be fully specified (all elements are nonnegative numbers), partially specified (elements are a mix of nonnegative numbers and NaN values), or unknown (completely composed of NaN values). dtmc normalizes each row of P without any NaN values to sum to 1, then stores the normalized matrix in the property P. Data Types: double 12-518
dtmc
Properties You can set writable property values when you create the model object by using name-value argument syntax, or after you create the model object by using dot notation. For example, for the two-state model mc, to label the first and second states Depression and Recession, respectively, enter: mc.StateNames = ["Depression" "Recession"];
P — Normalized transition matrix nonnegative numeric matrix This property is read-only. Normalized transition matrix, specified as a numStates-by-numStates nonnegative numeric matrix. If x is a row vector of length numStates specifying a distribution of states at time t (x sums to 1), then x*P is the distribution of states at time t + 1. NaN entries indicate estimable transition probabilities. The estimate function of msVAR treats the known elements of P as equality constraints during optimization. Data Types: double NumStates — Number of states positive scalar This property is read-only. Number of states, specified as a positive scalar. Data Types: double StateNames — Unique state labels string(1:numStates) (default) | string vector | cell vector of character vectors | numeric vector Unique state labels, specified as a string vector, cell vector of character vectors, or numeric vector of length numStates. Elements correspond to rows and columns of P. Example: ["Depression" "Recession" "Stagnant" "Boom"] Data Types: string
Object Functions dtmc objects require a fully specified transition matrix P.
Determine Markov Chain Structure asymptotics isergodic isreducible classify lazy subchain
Determine Markov chain asymptotics Check Markov chain for ergodicity Check Markov chain for reducibility Classify Markov chain states Adjust Markov chain state inertia Extract Markov subchain 12-519
12
Functions
Describe Markov Chain Evolution redistribute simulate
Compute Markov chain redistributions Simulate Markov chain state walks
Visualize Markov Chain distplot eigplot graphplot simplot
Plot Markov chain redistributions Plot Markov chain eigenvalues Plot Markov chain directed graph Plot Markov chain simulations
Examples Create Markov Chain Using Matrix of Transition Probabilities Consider this theoretical, right-stochastic transition matrix of a stochastic process. 0.5 0.5 P= 0 0
0.5 0 0 0
0 0.5 0 1
0 0 . 1 0
Element Pi j is the probability that the process transitions to state j at time t + 1 given that it is in state i at time t, for all t. Create the Markov chain that is characterized by the transition matrix P. P = [0.5 0.5 0 0; 0.5 0 0.5 0; 0 0 0 1; 0 0 1 0]; mc = dtmc(P);
mc is a dtmc object that represents the Markov chain. Display the number of states in the Markov chain. numstates = mc.NumStates numstates = 4
Plot a directed graph of the Markov chain. figure; graphplot(mc);
12-520
dtmc
Observe that states 3 and 4 form an absorbing class, while states 1 and 2 are transient.
Create Markov Chain Using Matrix of Observed Transition Counts Consider this transition matrix in which element (i, j) is the observed number of times state i transitions to state j. 16 5 P= 9 4
2 11 7 14
3 10 6 15
13 8 . 12 1
For example, P32 = 7 implies that state 3 transitions to state 2 seven times. P = [16 5 9 4
2 11 7 14
3 10 6 15
13; 8; 12; 1];
Create the Markov chain that is characterized by the transition matrix P. mc = dtmc(P);
12-521
12
Functions
Display the normalized transition matrix stored in mc. Verify that the elements within rows sum to 1 for all rows. mc.P ans = 4×4 0.4706 0.1471 0.2647 0.1176
0.0588 0.3235 0.2059 0.4118
0.0882 0.2941 0.1765 0.4412
0.3824 0.2353 0.3529 0.0294
sum(mc.P,2) ans = 4×1 1 1 1 1
Plot a directed graph of the Markov chain. figure; graphplot(mc);
12-522
dtmc
Label Markov Chain States Consider the two-state business cycle of the US real gross national product (GNP) in [3] p. 697. At time t, real GNP can be in a state of expansion or contraction. Suppose that the following statements are true during the sample period. • If real GNP is expanding at time t, then the probability that it will continue in an expansion state at time t + 1 is p11 = 0 . 90. • If real GNP is contracting at time t, then the probability that it will continue in a contraction state at time t + 1 is p22 = 0 . 75. Create the transition matrix for the model. p11 = 0.90; p22 = 0.75; P = [p11 (1 - p11); (1 - p22) p22];
Create the Markov chain that is characterized by the transition matrix P. Label the two states. mc = dtmc(P,'StateNames',["Expansion" "Contraction"]) mc = dtmc with properties: P: [2x2 double] StateNames: ["Expansion" NumStates: 2
"Contraction"]
Plot a directed graph of the Markov chain. Indicate the probability of transition by using edge colors. figure; graphplot(mc,'ColorEdges',true);
12-523
12
Functions
Create Markov Chain from Random Transition Matrix To help you explore the dtmc object functions, mcmix creates a Markov chain from a random transition matrix using only a specified number of states. Create a five-state Markov chain from a random transition matrix. rng(1); % For reproducibility mc = mcmix(5) mc = dtmc with properties: P: [5x5 double] StateNames: ["1" "2" NumStates: 5
"3"
"4"
"5"]
mc is a dtmc object. Plot the eigenvalues of the transition matrix on the complex plane. figure; eigplot(mc)
12-524
dtmc
This spectrum determines structural properties of the Markov chain, such as periodicity and mixing rate.
Create Markov Chain with Unknown Transition Matrix Entries Consider a Markov-switching autoregression (msVAR) model for the US GDP containing four economic regimes: depression, recession, stagnation, and expansion. To estimate the transition probabilities of the switching mechanism, you must supply a dtmc model with an unknown transition matrix entries to the msVAR framework. Create a 4-regime Markov chain with an unknown transition matrix (all NaN entries). Specify the regime names. P = nan(4); statenames = ["Depression" "Recession" ... "Stagnation" "Expansion"]; mcUnknown = dtmc(P,'StateNames',statenames) mcUnknown = dtmc with properties: P: [4x4 double] StateNames: ["Depression"
"Recession"
"Stagnation"
"Expansion"]
12-525
12
Functions
NumStates: 4 mcUnknown.P ans = 4×4 NaN NaN NaN NaN
NaN NaN NaN NaN
NaN NaN NaN NaN
NaN NaN NaN NaN
Suppose economic theory states that the US economy never transitions to an expansion from a recession or depression. Create a 4-regime Markov chain with a partially known transition matrix representing the situation. P(1,4) = 0; P(2,4) = 0; mcPartial = dtmc(P,'StateNames',statenames) mcPartial = dtmc with properties: P: [4x4 double] StateNames: ["Depression" NumStates: 4
"Recession"
"Stagnation"
"Expansion"]
mcPartial.P ans = 4×4 NaN NaN NaN NaN
NaN NaN NaN NaN
NaN NaN NaN NaN
0 0 NaN NaN
The estimate function of msVAR treats the known elements of mcPartial.P as equality constraints during optimization. For more details on Markov-switching dynamic regression models, see msVAR.
Alternatives You also can create a Markov chain object using mcmix.
Version History Introduced in R2017b
12-526
dtmc
References [1] Gallager, R.G. Stochastic Processes: Theory for Applications. Cambridge, UK: Cambridge University Press, 2013. [2] Haggstrom, O. Finite Markov Chains and Algorithmic Applications. Cambridge, UK: Cambridge University Press, 2002. [3] Hamilton, James D. Time Series Analysis. Princeton, NJ: Princeton University Press, 1994. [4] Norris, J. R. Markov Chains. Cambridge, UK: Cambridge University Press, 1997.
See Also mcmix | msVAR Topics “Discrete-Time Markov Chains” on page 10-2 “Markov Chain Modeling” on page 10-8
12-527
12
Functions
Econometric Modeler Analyze and model econometric time series
Description The Econometric Modeler app provides an interface for interactive exploratory data analysis. The flexible interface supports analysis of univariate and multivariate time series and conditional mean (for example, ARIMA), conditional variance (for example, GARCH), multivariate (for example, VAR and VEC), and time series regression model estimation. Using the app, you can: • Visualize and transform time series data. • Perform statistical specification and model identification tests. • Estimate candidate models and compare fits. • Perform post-fit assessments and residual diagnostics. • Automatically generate code or a report from a session.
12-528
Econometric Modeler
Open the Econometric Modeler App • MATLAB Toolstrip: On the Apps tab, under Computational Finance, click the app icon. • MATLAB command prompt: Enter econometricModeler.
Examples •
“Prepare Time Series Data for Econometric Modeler App” on page 4-59
•
“Import Time Series Data into Econometric Modeler App” on page 4-62
•
“Plot Time Series Data Using Econometric Modeler App” on page 4-66
•
“Detect Serial Correlation Using Econometric Modeler App” on page 4-71
•
“Detect ARCH Effects Using Econometric Modeler App” on page 4-77
•
“Assess Stationarity of Time Series Using Econometric Modeler” on page 4-84
•
“Assess Collinearity Among Multiple Series Using Econometric Modeler App” on page 4-94
•
“Conduct Cointegration Test Using Econometric Modeler” on page 4-170
•
“Transform Time Series Using Econometric Modeler App” on page 4-97
•
“Implement Box-Jenkins Model Selection and Estimation Using Econometric Modeler App” on page 4-112
•
“Estimate Multiplicative ARIMA Model Using Econometric Modeler App” on page 4-131
•
“Perform ARIMA Model Residual Diagnostics Using Econometric Modeler App” on page 4-141
•
“Specify t Innovation Distribution Using Econometric Modeler App” on page 4-150
•
“Estimate ARIMAX Model Using Econometric Modeler App” on page 4-200
•
“Estimate Regression Model with ARMA Errors Using Econometric Modeler App” on page 4-208
•
“Estimate Vector Autoregression Model Using Econometric Modeler” on page 4-155
•
“Estimate Vector Error-Correction Model Using Econometric Modeler” on page 4-180
•
“Compare Predictive Performance After Creating Models Using Econometric Modeler” on page 4-193
•
“Select ARCH Lags for GARCH Model Using Econometric Modeler App” on page 4-122
•
“Compare Conditional Variance Model Fit Statistics Using Econometric Modeler App” on page 4221
•
“Perform GARCH Model Residual Diagnostics Using Econometric Modeler App” on page 4-230
•
“Share Results of Econometric Modeler App Session” on page 4-237
Version History Introduced in R2018a
See Also Objects arima | regARIMA | garch | gjr | egarch | varm | vecm 12-529
12
Functions
Functions autocorr | parcorr | crosscorr | adftest | kpsstest | lmctest | lbqtest | archtest | vratiotest | pptest | collintest | egcitest | jcitest | aicbic Topics “Prepare Time Series Data for Econometric Modeler App” on page 4-59 “Import Time Series Data into Econometric Modeler App” on page 4-62 “Plot Time Series Data Using Econometric Modeler App” on page 4-66 “Detect Serial Correlation Using Econometric Modeler App” on page 4-71 “Detect ARCH Effects Using Econometric Modeler App” on page 4-77 “Assess Stationarity of Time Series Using Econometric Modeler” on page 4-84 “Assess Collinearity Among Multiple Series Using Econometric Modeler App” on page 4-94 “Conduct Cointegration Test Using Econometric Modeler” on page 4-170 “Transform Time Series Using Econometric Modeler App” on page 4-97 “Implement Box-Jenkins Model Selection and Estimation Using Econometric Modeler App” on page 4112 “Estimate Multiplicative ARIMA Model Using Econometric Modeler App” on page 4-131 “Perform ARIMA Model Residual Diagnostics Using Econometric Modeler App” on page 4-141 “Specify t Innovation Distribution Using Econometric Modeler App” on page 4-150 “Estimate ARIMAX Model Using Econometric Modeler App” on page 4-200 “Estimate Regression Model with ARMA Errors Using Econometric Modeler App” on page 4-208 “Estimate Vector Autoregression Model Using Econometric Modeler” on page 4-155 “Estimate Vector Error-Correction Model Using Econometric Modeler” on page 4-180 “Compare Predictive Performance After Creating Models Using Econometric Modeler” on page 4-193 “Select ARCH Lags for GARCH Model Using Econometric Modeler App” on page 4-122 “Compare Conditional Variance Model Fit Statistics Using Econometric Modeler App” on page 4-221 “Perform GARCH Model Residual Diagnostics Using Econometric Modeler App” on page 4-230 “Share Results of Econometric Modeler App Session” on page 4-237 “Analyze Time Series Data Using Econometric Modeler” on page 4-2 “Specifying Univariate Lag Operator Polynomials Interactively” on page 4-44 “Specifying Multivariate Lag Operator Polynomials and Coefficient Constraints Interactively” on page 4-50 Creating ARIMA Models Using Econometric Modeler App
12-530
egarch
egarch EGARCH conditional variance time series model
Description Use egarch to specify a univariate EGARCH (exponential generalized autoregressive conditional heteroscedastic) model. The egarch function returns an egarch object specifying the functional form of an EGARCH(P,Q) model on page 12-547, and stores its parameter values. The key components of an egarch model include the: • GARCH polynomial, which is composed of lagged, logged conditional variances. The degree is denoted by P. • ARCH polynomial, which is composed of the magnitudes of lagged standardized innovations. • Leverage polynomial, which is composed of lagged standardized innovations. • Maximum of the ARCH and leverage polynomial degrees, denoted by Q. P is the maximum nonzero lag in the GARCH polynomial, and Q is the maximum nonzero lag in the ARCH and leverage polynomials. Other model components include an innovation mean model offset, a conditional variance model constant, and the innovations distribution. All coefficients are unknown (NaN values) and estimable unless you specify their values using namevalue pair argument syntax. To estimate models containing all or partially unknown parameter values given data, use estimate. For completely specified models (models in which all parameter values are known), simulate or forecast responses using simulate or forecast, respectively.
Creation Syntax Mdl = egarch Mdl = egarch(P,Q) Mdl = egarch(Name,Value) Description Mdl = egarch creates a zero-degree conditional variance egarch object. Mdl = egarch(P,Q) creates an EGARCH conditional variance model object (Mdl) with a GARCH polynomial with a degree of P, and ARCH and leverage polynomials each with a degree of Q. All polynomials contain all consecutive lags from 1 through their degrees, and all coefficients are NaN values. This shorthand syntax enables you to create a template in which you specify the polynomial degrees explicitly. The model template is suited for unrestricted parameter estimation, that is, estimation without any parameter equality constraints. However, after you create a model, you can alter property values using dot notation. 12-531
12
Functions
Mdl = egarch(Name,Value) sets properties on page 12-533 or additional options using namevalue pair arguments. Enclose each name in quotes. For example, 'ARCHLags',[1 4],'ARCH', {0.2 0.3} specifies the two ARCH coefficients in ARCH at lags 1 and 4. This longhand syntax enables you to create more flexible models. Input Arguments The shorthand syntax provides an easy way for you to create model templates that are suitable for unrestricted parameter estimation. For example, to create an EGARCH(1,2) model containing unknown parameter values, enter: Mdl = egarch(1,2);
To impose equality constraints on parameter values during estimation, set the appropriate property on page 12-533 values using dot notation. P — GARCH polynomial degree nonnegative integer GARCH polynomial degree, specified as a nonnegative integer. In the GARCH polynomial and at time t, MATLAB includes all consecutive logged conditional variance terms from lag t – 1 through lag t – P. You can specify this argument using the egarch(P,Q) shorthand syntax only. If P > 0, then you must specify Q as a positive integer. Example: egarch(1,1) Data Types: double Q — ARCH polynomial degree nonnegative integer ARCH polynomial degree, specified as a nonnegative integer. In the ARCH polynomial and at time t, MATLAB includes all consecutive magnitudes of standardized innovation terms (for the ARCH polynomial) and all standardized innovation terms (for the leverage polynomial) from lag t – 1 through lag t – Q. You can specify this argument using the egarch(P,Q) shorthand syntax only. If P > 0, then you must specify Q as a positive integer. Example: egarch(1,1) Data Types: double Name-Value Pair Arguments
Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter. Before R2021a, use commas to separate each name and value, and enclose Name in quotes. The longhand syntax enables you to create models in which some or all coefficients are known. During estimation, estimate imposes equality constraints on any known parameters. 12-532
egarch
Example: 'ARCHLags',[1 4],'ARCH',{NaN NaN} specifies an EGARCH(0,4) model and unknown, but nonzero, ARCH coefficient matrices at lags 1 and 4. GARCHLags — GARCH polynomial lags 1:P (default) | numeric vector of unique positive integers GARCH polynomial lags, specified as the comma-separated pair consisting of 'GARCHLags' and a numeric vector of unique positive integers. GARCHLags(j) is the lag corresponding to the coefficient GARCH{j}. The lengths of GARCHLags and GARCH must be equal. Assuming all GARCH coefficients (specified by the GARCH property) are positive or NaN values, max(GARCHLags) determines the value of the P property. Example: 'GARCHLags',[1 4] Data Types: double ARCHLags — ARCH polynomial lags 1:Q (default) | numeric vector of unique positive integers ARCH polynomial lags, specified as the comma-separated pair consisting of 'ARCHLags' and a numeric vector of unique positive integers. ARCHLags(j) is the lag corresponding to the coefficient ARCH{j}. The lengths of ARCHLags and ARCH must be equal. Assuming all ARCH and leverage coefficients (specified by the ARCH and Leverage properties) are positive or NaN values, max([ARCHLags LeverageLags]) determines the value of the Q property. Example: 'ARCHLags',[1 4] Data Types: double LeverageLags — Leverage polynomial lags 1:Q (default) | numeric vector of unique positive integers Leverage polynomial lags, specified as the comma-separated pair consisting of 'LeverageLags' and a numeric vector of unique positive integers. LeverageLags(j) is the lag corresponding to the coefficient Leverage{j}. The lengths of LeverageLags and Leverage must be equal. Assuming all ARCH and leverage coefficients (specified by the ARCH and Leverage properties) are positive or NaN values, max([ARCHLags LeverageLags]) determines the value of the Q property. Example: 'LeverageLags',1:4 Data Types: double
Properties You can set writable property values when you create the model object by using name-value argument syntax, or after you create the model object by using dot notation. For example, to create an EGARCH(1,1) model with unknown coefficients, and then specify a t innovation distribution with unknown degrees of freedom, enter: 12-533
12
Functions
Mdl = egarch('GARCHLags',1,'ARCHLags',1); Mdl.Distribution = "t";
P — GARCH polynomial degree nonnegative integer This property is read-only. GARCH polynomial degree, specified as a nonnegative integer. P is the maximum lag in the GARCH polynomial with a coefficient that is positive or NaN. Lags that are less than P can have coefficients equal to 0. P specifies the minimum number of presample conditional variances required to initialize the model. If you use name-value pair arguments to create the model, then MATLAB implements one of these alternatives (assuming the coefficient of the largest lag is positive or NaN): • If you specify GARCHLags, then P is the largest specified lag. • If you specify GARCH, then P is the number of elements of the specified value. If you also specify GARCHLags, then egarch uses GARCHLags to determine P instead. • Otherwise, P is 0. Data Types: double Q — Maximum degree of ARCH and leverage polynomials nonnegative integer This property is read-only. Maximum degree of ARCH and leverage polynomials, specified as a nonnegative integer. Q is the maximum lag in the ARCH and leverage polynomials in the model. In either type of polynomial, lags that are less than Q can have coefficients equal to 0. Q specifies the minimum number of presample innovations required to initiate the model. If you use name-value pair arguments to create the model, then MATLAB implements one of these alternatives (assuming the coefficients of the largest lags in the ARCH and leverage polynomials are positive or NaN): • If you specify ARCHLags or LeverageLags, then Q is the maximum between the two specifications. • If you specify ARCH or Leverage, then Q is the maximum number of elements between the two specifications. If you also specify ARCHLags or LeverageLags, then egarch uses their values to determine Q instead. • Otherwise, Q is 0. Data Types: double Constant — Conditional variance model constant NaN (default) | numeric scalar Conditional variance model constant, specified as a numeric scalar or NaN value. Data Types: double 12-534
egarch
GARCH — GARCH polynomial coefficients cell vector of positive scalars or NaN values GARCH polynomial coefficients, specified as a cell vector of positive scalars or NaN values. • If you specify GARCHLags, then the following conditions apply. • The lengths of GARCH and GARCHLags are equal. • GARCH{j} is the coefficient of lag GARCHLags(j). • By default, GARCH is a numel(GARCHLags)-by-1 cell vector of NaN values. • Otherwise, the following conditions apply. • The length of GARCH is P. • GARCH{j} is the coefficient of lag j. • By default, GARCH is a P-by-1 cell vector of NaN values. The coefficients in GARCH correspond to coefficients in an underlying LagOp lag operator polynomial, and are subject to a near-zero tolerance exclusion test. If you set a coefficient to 1e–12 or below, egarch excludes that coefficient and its corresponding lag in GARCHLags from the model. Data Types: cell ARCH — ARCH polynomial coefficients cell vector of positive scalars or NaN values ARCH polynomial coefficients, specified as a cell vector of positive scalars or NaN values. • If you specify ARCHLags, then the following conditions apply. • The lengths of ARCH and ARCHLags are equal. • ARCH{j} is the coefficient of lag ARCHLags(j). • By default, ARCH is a Q-by-1 cell vector of NaN values. For more details, see the Q property. • Otherwise, the following conditions apply. • The length of ARCH is Q. • ARCH{j} is the coefficient of lag j. • By default, ARCH is a Q-by-1 cell vector of NaN values. The coefficients in ARCH correspond to coefficients in an underlying LagOp lag operator polynomial, and are subject to a near-zero tolerance exclusion test. If you set a coefficient to 1e–12 or below, egarch excludes that coefficient and its corresponding lag in ARCHLags from the model. Data Types: cell Leverage — Leverage polynomial coefficients cell vector of numeric scalars or NaN values Leverage polynomial coefficients, specified as a cell vector of numeric scalars or NaN values. • If you specify LeverageLags, then the following conditions apply. • The lengths of Leverage and LeverageLags are equal. 12-535
12
Functions
• Leverage{j} is the coefficient of lag LeverageLags(j). • By default, Leverage is a Q-by-1 cell vector of NaN values. For more details, see the Q property. • Otherwise, the following conditions apply. • The length of Leverage is Q. • Leverage{j} is the coefficient of lag j. • By default, Leverage is a Q-by-1 cell vector of NaN values. The coefficients in Leverage correspond to coefficients in an underlying LagOp lag operator polynomial, and are subject to a near-zero tolerance exclusion test. If you set a coefficient to 1e–12 or below, egarch excludes that coefficient and its corresponding lag in LeverageLags from the model. Data Types: cell UnconditionalVariance — Model unconditional variance positive scalar This property is read-only. The model unconditional variance, specified as a positive scalar. The unconditional variance is σε2 = exp
κ P
(1 − ∑i = 1 γi)
.
κ is the conditional variance model constant (Constant). Data Types: double Offset — Innovation mean model offset 0 (default) | numeric scalar | NaN Innovation mean model offset, or additive constant, specified as a numeric scalar or NaN value. Data Types: double Distribution — Conditional probability distribution of innovation process εt "Gaussian" (default) | "t" | structure array Conditional probability distribution of the innovation process εt, specified as a string or structure array. egarch stores the value as a structure array. Distribution
String
Structure Array
Gaussian
"Gaussian"
struct('Name',"Gaussian")
Student’s t
"t"
struct('Name',"t",'DoF',DoF)
The 'DoF' field specifies the t distribution degrees of freedom parameter. • DoF > 2 or DoF = NaN. • DoF is estimable. 12-536
egarch
• If you specify "t", DoF is NaN by default. You can change its value by using dot notation after you create the model. For example, Mdl.Distribution.DoF = 3. • If you supply a structure array to specify the Student's t distribution, then you must specify both the 'Name' and the 'DoF' fields. Example: Distribution=struct('Name',"t",'DoF',10) Description — Model description string scalar | character vector Model description, specified as a string scalar or character vector. egarch stores the value as a string scalar. The default value describes the parametric form of the model, for example "EGARCH(1,1) Conditional Variance Model (Gaussian Distribution)". Data Types: string | char SeriesName — Response series name string scalar | character vector | "Y" Response series name, specified as a string scalar or character vector. egarch stores the value as a string scalar. Example: "StockReturn" Data Types: string | char Note • All NaN-valued model parameters, which include coefficients and the t-innovation-distribution degrees of freedom (if present), are estimable. When you pass the resulting egarch object and data to estimate, MATLAB estimates all NaN-valued parameters. During estimation, estimate treats known parameters as equality constraints, that is,estimate holds any known parameters fixed at their values. • Typically, the lags in the ARCH and leverage polynomials are the same, but their equality is not a requirement. Differing polynomials occur when: • Either ARCH{Q} or Leverage{Q} meets the near-zero exclusion tolerance. In this case, MATLAB excludes the corresponding lag from the polynomial. • You specify polynomials of differing lengths by specifying ARCHLags or LeverageLags, or by setting the ARCH or Leverage property. In either case, Q is the maximum lag between the two polynomials.
Object Functions estimate filter forecast infer simulate summarize
Fit conditional variance model to data Filter disturbances through conditional variance model Forecast conditional variances from conditional variance models Infer conditional variances of conditional variance models Monte Carlo simulation of conditional variance models Display estimation results of conditional variance model
12-537
12
Functions
Examples Create Default EGARCH Model Create a default egarch model object and specify its parameter values using dot notation. Create an EGARCH(0,0) model. Mdl = egarch Mdl = egarch with properties: Description: SeriesName: Distribution: P: Q: Constant: GARCH: ARCH: Leverage: Offset:
"EGARCH(0,0) Conditional Variance Model (Gaussian Distribution)" "Y" Name = "Gaussian" 0 0 NaN {} {} {} 0
Mdl is an egarch model. It contains an unknown constant, its offset is 0, and the innovation distribution is 'Gaussian'. The model does not have GARCH, ARCH, or leverage polynomials. Specify two unknown ARCH and leverage coefficients for lags one and two using dot notation. Mdl.ARCH = {NaN NaN}; Mdl.Leverage = {NaN NaN}; Mdl Mdl = egarch with properties: Description: SeriesName: Distribution: P: Q: Constant: GARCH: ARCH: Leverage: Offset:
"EGARCH(0,2) Conditional Variance Model (Gaussian Distribution)" "Y" Name = "Gaussian" 0 2 NaN {} {NaN NaN} at lags [1 2] {NaN NaN} at lags [1 2] 0
The Q, ARCH, and Leverage properties update to 2, {NaN NaN}, {NaN NaN}, respectively. The two ARCH and leverage coefficients are associated with lags 1 and 2.
Create EGARCH Model Using Shorthand Syntax Create an egarch model object using the shorthand notation egarch(P,Q), where P is the degree of the GARCH polynomial and Q is the degree of the ARCH and leverage polynomial. 12-538
egarch
Create an EGARCH(3,2) model. Mdl = egarch(3,2) Mdl = egarch with properties: Description: SeriesName: Distribution: P: Q: Constant: GARCH: ARCH: Leverage: Offset:
"EGARCH(3,2) Conditional Variance Model (Gaussian Distribution)" "Y" Name = "Gaussian" 3 2 NaN {NaN NaN NaN} at lags [1 2 3] {NaN NaN} at lags [1 2] {NaN NaN} at lags [1 2] 0
Mdl is an egarch model object. All properties of Mdl, except P, Q, and Distribution, are NaN values. By default, the software: • Includes a conditional variance model constant • Excludes a conditional mean model offset (i.e., the offset is 0) • Includes all lag terms in the GARCH polynomial up to lag P • Includes all lag terms in the ARCH and leverage polynomials up to lag Q Mdl specifies only the functional form of an EGARCH model. Because it contains unknown parameter values, you can pass Mdl and time-series data to estimate to estimate the parameters.
Create EGARCH Model Using Longhand Syntax Create an egarch model object using name-value pair arguments. Specify an EGARCH(1,1) model. By default, the conditional mean model offset is zero. Specify that the offset is NaN. Include a leverage term. Mdl = egarch('GARCHLags',1,'ARCHLags',1,'LeverageLags',1,'Offset',NaN) Mdl = egarch with properties: Description: SeriesName: Distribution: P: Q: Constant: GARCH: ARCH: Leverage: Offset:
"EGARCH(1,1) Conditional Variance Model with Offset (Gaussian Distribution)" "Y" Name = "Gaussian" 1 1 NaN {NaN} at lag [1] {NaN} at lag [1] {NaN} at lag [1] NaN
Mdl is an egarch model object. The software sets all parameters to NaN, except P, Q, and Distribution. 12-539
12
Functions
Since Mdl contains NaN values, Mdl is appropriate for estimation only. Pass Mdl and time-series data to estimate.
Create EGARCH Model with Known Coefficients Create an EGARCH(1,1) model with mean offset, yt = 0 . 5 + εt, where εt = σtzt, σt2 = 0 . 0001 + 0 . 75logσt2− 1 + 0 . 1
|εt − 1| εt − 1 εt − 3 2 − − 0.3 + 0 . 01 , σt − 1 π σt − 1 σt − 3
and zt is an independent and identically distributed standard Gaussian process. Mdl = egarch('Constant',0.0001,'GARCH',0.75,... 'ARCH',0.1,'Offset',0.5,'Leverage',{-0.3 0 0.01}) Mdl = egarch with properties: Description: SeriesName: Distribution: P: Q: Constant: GARCH: ARCH: Leverage: Offset:
"EGARCH(1,3) Conditional Variance Model with Offset (Gaussian Distribution)" "Y" Name = "Gaussian" 1 3 0.0001 {0.75} at lag [1] {0.1} at lag [1] {-0.3 0.01} at lags [1 3] 0.5
egarch assigns default values to any properties you do not specify with name-value pair arguments. An alternative way to specify the leverage component is 'Leverage',{-0.3 0.01},'LeverageLags',[1 3].
Access EGARCH Model Properties Access the properties of a created egarch model object using dot notation. Create an egarch model object. Mdl = egarch(3,2) Mdl = egarch with properties: Description: "EGARCH(3,2) Conditional Variance Model (Gaussian Distribution)" SeriesName: "Y" Distribution: Name = "Gaussian"
12-540
egarch
P: Q: Constant: GARCH: ARCH: Leverage: Offset:
3 2 NaN {NaN NaN NaN} at lags [1 2 3] {NaN NaN} at lags [1 2] {NaN NaN} at lags [1 2] 0
Remove the second GARCH term from the model. That is, specify that the GARCH coefficient of the second lagged conditional variance is 0. Mdl.GARCH{2} = 0 Mdl = egarch with properties: Description: SeriesName: Distribution: P: Q: Constant: GARCH: ARCH: Leverage: Offset:
"EGARCH(3,2) Conditional Variance Model (Gaussian Distribution)" "Y" Name = "Gaussian" 3 2 NaN {NaN NaN} at lags [1 3] {NaN NaN} at lags [1 2] {NaN NaN} at lags [1 2] 0
The GARCH polynomial has two unknown parameters corresponding to lags 1 and 3. Display the distribution of the disturbances. Mdl.Distribution ans = struct with fields: Name: "Gaussian"
The disturbances are Gaussian with mean 0 and variance 1. Specify that the underlying disturbances have a t distribution with five degrees of freedom. Mdl.Distribution = struct('Name','t','DoF',5) Mdl = egarch with properties: Description: SeriesName: Distribution: P: Q: Constant: GARCH: ARCH: Leverage: Offset:
"EGARCH(3,2) Conditional Variance Model (t Distribution)" "Y" Name = "t", DoF = 5 3 2 NaN {NaN NaN} at lags [1 3] {NaN NaN} at lags [1 2] {NaN NaN} at lags [1 2] 0
Specify that the ARCH coefficients are 0.2 for the first lag and 0.1 for the second lag. 12-541
12
Functions
Mdl.ARCH = {0.2 0.1} Mdl = egarch with properties: Description: SeriesName: Distribution: P: Q: Constant: GARCH: ARCH: Leverage: Offset:
"EGARCH(3,2) Conditional Variance Model (t Distribution)" "Y" Name = "t", DoF = 5 3 2 NaN {NaN NaN} at lags [1 3] {0.2 0.1} at lags [1 2] {NaN NaN} at lags [1 2] 0
To estimate the remaining parameters, you can pass Mdl and your data to estimate and use the specified parameters as equality constraints. Or, you can specify the rest of the parameter values, and then simulate or forecast conditional variances from the GARCH model by passing the fully specified model to simulate or forecast, respectively.
Estimate EGARCH Model Fit an EGARCH model to an annual time series of Danish nominal stock returns from 1922-1999. Load the Data_Danish data set. Plot the nominal returns (RN). load Data_Danish; nr = DataTable.RN; figure; plot(dates,nr); hold on; plot([dates(1) dates(end)],[0 0],'r:'); % Plot y = 0 hold off; title('Danish Nominal Stock Returns'); ylabel('Nominal return (%)'); xlabel('Year');
12-542
egarch
The nominal return series seems to have a nonzero conditional mean offset and seems to exhibit volatility clustering. That is, the variability is smaller for earlier years than it is for later years. For this example, assume that an EGARCH(1,1) model is appropriate for this series. Create an EGARCH(1,1) model. The conditional mean offset is zero by default. To estimate the offset, specify that it is NaN. Include a leverage lag. Mdl = egarch('GARCHLags',1,'ARCHLags',1,'LeverageLags',1,'Offset',NaN);
Fit the EGARCH(1,1) model to the data. EstMdl = estimate(Mdl,nr); EGARCH(1,1) Conditional Variance Model with Offset (Gaussian Distribution): Value _________ Constant GARCH{1} ARCH{1} Leverage{1} Offset
-0.62723 0.77419 0.38636 -0.002499 0.10325
StandardError _____________ 0.74401 0.23628 0.37361 0.19222 0.037727
TStatistic __________
PValue _________
-0.84304 3.2766 1.0341 -0.013001 2.7368
0.3992 0.0010507 0.30107 0.98963 0.0062047
EstMdl is a fully specified egarch model object. That is, it does not contain NaN values. You can assess the adequacy of the model by generating residuals using infer, and then analyzing them. 12-543
12
Functions
To simulate conditional variances or responses, pass EstMdl to simulate. To forecast innovations, pass EstMdl to forecast.
Simulate EGARCH Model Observations and Conditional Variances Simulate conditional variance or response paths from a fully specified egarch model object. That is, simulate from an estimated egarch model or a known egarch model in which you specify all parameter values. Load the Data_Danish data set. load Data_Danish; rn = DataTable.RN;
Create an EGARCH(1,1) model with an unknown conditional mean offset. Fit the model to the annual, nominal return series. Include a leverage term. Mdl = egarch('GARCHLags',1,'ARCHLags',1,'LeverageLags',1,'Offset',NaN); EstMdl = estimate(Mdl,rn); EGARCH(1,1) Conditional Variance Model with Offset (Gaussian Distribution): Value _________ Constant GARCH{1} ARCH{1} Leverage{1} Offset
-0.62723 0.77419 0.38636 -0.002499 0.10325
StandardError _____________ 0.74401 0.23628 0.37361 0.19222 0.037727
TStatistic __________
PValue _________
-0.84304 3.2766 1.0341 -0.013001 2.7368
0.3992 0.0010507 0.30107 0.98963 0.0062047
Simulate 100 paths of conditional variances and responses from the estimated EGARCH model. numObs = numel(rn); % Sample size (T) numPaths = 100; % Number of paths to simulate rng(1); % For reproducibility [VSim,YSim] = simulate(EstMdl,numObs,'NumPaths',numPaths);
VSim and YSim are T-by- numPaths matrices. Rows correspond to a sample period, and columns correspond to a simulated path. Plot the average and the 97.5% and 2.5% percentiles of the simulate paths. Compare the simulation statistics to the original data. VSimBar = mean(VSim,2); VSimCI = quantile(VSim,[0.025 0.975],2); YSimBar = mean(YSim,2); YSimCI = quantile(YSim,[0.025 0.975],2); figure; subplot(2,1,1); h1 = plot(dates,VSim,'Color',0.8*ones(1,3)); hold on;
12-544
egarch
h2 = plot(dates,VSimBar,'k--','LineWidth',2); h3 = plot(dates,VSimCI,'r--','LineWidth',2); hold off; title('Simulated Conditional Variances'); ylabel('Cond. var.'); xlabel('Year'); subplot(2,1,2); h1 = plot(dates,YSim,'Color',0.8*ones(1,3)); hold on; h2 = plot(dates,YSimBar,'k--','LineWidth',2); h3 = plot(dates,YSimCI,'r--','LineWidth',2); hold off; title('Simulated Nominal Returns'); ylabel('Nominal return (%)'); xlabel('Year'); legend([h1(1) h2 h3(1)],{'Simulated path' 'Mean' 'Confidence bounds'},... 'FontSize',7,'Location','NorthWest');
Forecast EGARCH Model Conditional Variances Forecast conditional variances from a fully specified egarch model object. That is, forecast from an estimated egarch model or a known egarch model in which you specify all parameter values. The example follows from “Estimate EGARCH Model” on page 12-542. 12-545
12
Functions
Load the Data_Danish data set. load Data_Danish; nr = DataTable.RN;
Create an EGARCH(1,1) model with an unknown conditional mean offset and include a leverage term. Fit the model to the annual nominal return series. Mdl = egarch('GARCHLags',1,'ARCHLags',1,'LeverageLags',1,'Offset',NaN); EstMdl = estimate(Mdl,nr); EGARCH(1,1) Conditional Variance Model with Offset (Gaussian Distribution): Value _________ Constant GARCH{1} ARCH{1} Leverage{1} Offset
-0.62723 0.77419 0.38636 -0.002499 0.10325
StandardError _____________ 0.74401 0.23628 0.37361 0.19222 0.037727
TStatistic __________
PValue _________
-0.84304 3.2766 1.0341 -0.013001 2.7368
0.3992 0.0010507 0.30107 0.98963 0.0062047
Forecast the conditional variance of the nominal return series 10 years into the future using the estimated EGARCH model. Specify the entire returns series as presample observations. The software infers presample conditional variances using the presample observations and the model. numPeriods = 10; vF = forecast(EstMdl,numPeriods,nr);
Plot the forecasted conditional variances of the nominal returns. Compare the forecasts to the observed conditional variances. v = infer(EstMdl,nr); figure; plot(dates,v,'k:','LineWidth',2); hold on; plot(dates(end):dates(end) + 10,[v(end);vF],'r','LineWidth',2); title('Forecasted Conditional Variances of Nominal Returns'); ylabel('Conditional variances'); xlabel('Year'); legend({'Estimation sample cond. var.','Forecasted cond. var.'},... 'Location','Best');
12-546
egarch
More About EGARCH Model An EGARCH model is a dynamic model that addresses conditional heteroscedasticity, or volatility clustering, in an innovations process. Volatility clustering occurs when an innovations process does not exhibit significant autocorrelation, but the variance of the process changes with time. An EGARCH model posits that the current conditional variance is the sum of these linear processes: • Past logged conditional variances (the GARCH component or polynomial) • Magnitudes of past standardized innovations (the ARCH component or polynomial) • Past standardized innovations (the leverage component or polynomial) Consider the time series yt = μ + εt, where εt = σtzt . The EGARCH(P,Q) conditional variance process, σt2, has the form logσt2 = κ +
P
∑
i=1
γilogσt2− i +
Q
∑
j=1
αj
εt − j εt − j −E σt − j σt − j
+
Q
∑
j=1
ξj
εt − j . σt − j
The table shows how the variables correspond to the properties of the egarch object. 12-547
12
Functions
Variable
Description
Property
μ
Innovation mean model constant 'Offset' offset
κ
Conditional variance model constant
'Constant'
γj
GARCH component coefficients
'GARCH'
αj
ARCH component coefficients
'ARCH'
ξj
Leverage component coefficients
'Leverage'
zt
Series of independent random variables with mean 0 and variance 1
'Distribution'
If zt is Gaussian, then E
εt − j σt − j
= E zt −
j
=
2 . π
If zt is t distributed with ν > 2 degrees of freedom, then εt − j E σt − j
= E zt −
j
=
ν−1 2 ν 2
ν−2Γ π Γ
.
To ensure a stationary EGARCH model, all roots of the GARCH lag operator polynomial, (1 − γ1L − … − γPLP), must lie outside of the unit circle. The EGARCH model is unique from the GARCH and GJR models because it models the logarithm of the variance. By modeling the logarithm, positivity constraints on the model parameters are relaxed. However, forecasts of conditional variances from an EGARCH model are biased, because by Jensen’s inequality, E(σt2) ≥ exp E(logσt2) . EGARCH models are appropriate when positive and negative shocks of equal magnitude do not contribute equally to volatility [1].
Tips • You can specify an egarch model as part of a composition of conditional mean and variance models. For details, see arima. • An EGARCH(1,1) specification is complex enough for most applications. Typically in these models, the GARCH and ARCH coefficients are positive, and the leverage coefficients are negative. If you get these signs, then large unanticipated downward shocks increase the variance. If you get signs opposite to those signs that are expected, you can encounter difficulties inferring volatility sequences and forecasting. A negative ARCH coefficient is problematic. In this case, an EGARCH model might not be the best choice for your application.
12-548
egarch
Version History Introduced in R2012a R2023a: Name an EGARCH model response series You can name the response series of an EGARCH model by setting the SeriesName property of the associated model to a string scalar. When you supply input response data to model object functions in a table or timetable, the functions choose the variable with name SeriesName as the response variable by default. R2018a: Describe an EGARCH model Describe an EGARCH model by setting the Description property to a string scalar. R2018a: Use indices that are consistent with MATLAB cell array indexing The indices of cell arrays of lag operator polynomial coefficients follow MATLAB cell array indexing rules. Affected model properties are GARCH, ARCH, and Leverage properties. Update code
• You cannot access any lag-zero coefficients by using an index of 0. For example, Mdl.ARCH{0} issues an error. Remove any instances of such indices of zero from your code. The value of all lag-zero coefficients is 1 except for the lag operator polynomial corresponding to the ARCH property, which has the value 0. • You cannot index beyond the maximal lag in the polynomial. For example, if Mdl.P is 4, then Mdl.ARCH{p} issues an error when p is greater than 4. For details on the maximal lags of the lag operator polynomials, see the corresponding property descriptions. Remove any instances of such indices beyond the maximal lag from your code. All coefficients beyond the maximal lag are 0. R2018a: Models store innovation distribution name as a string scalar Behavior changed in R2018a The Name field of the Distribution property of egarch model objects stores the innovation distribution name as a string scalar, for example, "Gaussian" for Gaussian innovations. Before R2018a, MATLAB stored the innovation distribution name as a character vector, for example 'Gaussian' for Gaussian innovations. Although most text-data operations accept character vectors and string scalars for text-data input, the two data types have some differences. For details, see “Text in String and Character Arrays”.
References [1] Tsay, R. S. Analysis of Financial Time Series. 3rd ed. Hoboken, NJ: John Wiley & Sons, Inc., 2010. 12-549
12
Functions
See Also Objects garch | gjr | arima Topics “Conditional Variance Models” on page 8-2 “EGARCH Model” on page 8-3 “Specify EGARCH Models” on page 8-17 “Modify Properties of Conditional Variance Models” on page 8-39 “Specify Conditional Mean and Variance Models” on page 7-75 “Infer Conditional Variances and Residuals” on page 8-62 “Compare Conditional Variance Models Using Information Criteria” on page 8-69 “Assess EGARCH Forecast Bias Using Simulations” on page 8-81 “Forecast a Conditional Variance Model” on page 8-97
12-550
egcitest
egcitest Engle-Granger cointegration test
Syntax h = egcitest(Y) [h,pValue,stat,cValue] = egcitest(Y) StatTbl = egcitest(Tbl) [ ___ ] = egcitest( ___ ,Name=Value) [ ___ ,reg1,reg2] = egcitest( ___ )
Description h = egcitest(Y) returns the rejection decision h from conducting the Engle-Granger cointegration test for assessing the null hypothesis of no cointegration among the variables in the multivariate time series Y. egcitest forms test statistics by regressing the response data Y(:,1) onto the predictor data Y(:,2:end), and then tests the residuals for a unit root. [h,pValue,stat,cValue] = egcitest(Y) also returns the p-value pValue, test statistic stat, and critical value cValue of the test. StatTbl = egcitest(Tbl) returns the table StatTbl containing variables for the test results, statistics, and settings from conducting the Engle-Granger cointegration test on the variables of the table or timetable Tbl. The response variable in the regression is the first table variable, and all other variables are the predictor variables. To select a different response variable for the regression, use the ResponseVariable name-value argument. To select different predictor variables, use the PredictorNames name-value argument. [ ___ ] = egcitest( ___ ,Name=Value) specifies options using one or more name-value arguments in addition to any of the input argument combinations in previous syntaxes. egcitest returns the output argument combination for the corresponding input arguments. Some options control the number of tests to conduct. The following conditions apply when egcitest conducts multiple tests: • egcitest treats each test as separate from all other tests. • If you specify Y, all outputs are vectors. • If you specify Tbl, each row of StatTbl contains the results of the corresponding test. For example, egcitest(Tbl,ResponseVariable="GDP",Alpha=0.025,Lags=[0 1]) chooses GDP as the response variable from the table Tbl and conducts two tests at a level of significance of 0.025. The first test includes 0 lag in the residual regression, and the second test includes 1 lag in the residual regression. [ ___ ,reg1,reg2] = egcitest( ___ ) additionally returns the following structures of regression statistics, which are required to form the test statistic: 12-551
12
Functions
• reg1 – Statistics resulting from the cointegrating regression of the specified response variable ResponseVariable onto the specified predictor variables PredictorVariables • reg2 – Statistics resulting from the residual regression implemented by the specified unit root test RReg
Examples Conduct Engle-Granger Cointegration Test on Matrix of Data Test a multivariate time series for cointegration using the default values of the Engle-Granger cointegration test. Input the time series data as a numeric matrix. Load data of Canadian inflation and interest rates Data_Canada.mat, which contains the series in the matrix Data. load Data_Canada series' ans = 5x1 cell {'(INF_C) Inflation rate (CPI-based)' } {'(INF_G) Inflation rate (GDP deflator-based)'} {'(INT_S) Interest rate (short-term)' } {'(INT_M) Interest rate (medium-term)' } {'(INT_L) Interest rate (long-term)' }
Test the interest rate series for cointegration by using the Engle-Granger cointegration test. Use default options and return the rejection decision and p-value. h = egcitest(Data(:,3:end)) h = logical 0
egcitest uses the τ test by default, and it fails to reject the null hypothesis (h = 0) of no cointegration among the interest rate series.
Return Test p-Value and Decision Statistics Load data of Canadian inflation and interest rates Data_Canada.mat. load Data_Canada
Test the interest rate series for cointegration by using the Engle-Granger cointegration test. Use default options and return the rejection decision, p-value, τ-test statistic, and critical value. [h,pValue,stat,cValue] = egcitest(Data(:,3:end)) h = logical 0
12-552
egcitest
pValue = 0.0526 stat = -3.9321 cValue = -3.9563
Conduct Default Engle-Granger Cointegration Test on Table Variables Conduct the Engle-Granger cointegration test on a multivariate time series using default options, which use the first table variable as the response, all other table variables as predictors, and includes a constant term in the cointegrating regression. Return a table of test results. Load data of Canadian inflation and interest rates Data_Canada.mat. Convert the table DataTable to a timetable. load Data_Canada dates = datetime(dates,12,31); TT = table2timetable(DataTable,RowTimes=dates); TT.Observations = [];
Conduct the Engel-Granger cointegration test by passing the timetable to egcitest and using default options. For the cointegrating regression, egcitest uses the CPI-based inflation rate as the response variable and all other variables in the timetable as predictors. StatTbl = egcitest(TT) StatTbl=1×9 table h _____ Test 1
true
pValue _________
stat _______
cValue _______
0.0023851
-6.2491
-4.7673
Lags ____ 0
Alpha _____
Test ______
CReg _____
RR ___
0.05
{'t1'}
{'c'}
{'A
StatTbl is a table of test results. The rows correspond to variables in the input timetable TT, and the columns correspond to the rejection decision, and corresponding p-value, decision statistics, and specified test options. In this case, the test rejects the null hypothesis in favor of the alternative of cointegration among all the table variables. By default, egcitest includes all input table variables in the cointegration test. To select a response variable for the cointegrating regression, set the ResponseVariable option. To select predictor variables, set the PredictorVariables option.
Select Test Statistics and Plot Cointegrating Relation Load data of Canadian inflation and interest rates Data_Canada.mat. Convert the table DataTable to a timetable of the interest rate series only. load Data_Canada dates = datetime(dates,12,31); idxINT = contains(DataTable.Properties.VariableNames,"INT");
12-553
12
Functions
TT = table2timetable(DataTable(:,idxINT),RowTimes=dates); TT.Observations = [];
Plot the interest rate series. figure plot(TT.Time,TT.Variables) legend(series(idxINT),Location="northwest") grid on
Reproduce row 1 of Table II in [3] by testing for cointegration, specifying the default variable assignments for the cointegrating regression and deterministic terms (response variable y1 is INT_S, the other interest rates y2 and y3 are predictors, and the model has a constant c), and specifying the τ and z tests. Return the cointegrating regression statistics. [StatTbl,reg] = egcitest(TT,Test=["t1" "t2"]); StatTbl StatTbl=2×9 table h _____ Test 1 Test 2
false true
pValue ________
stat _______
cValue _______
0.052627 0.020157
-3.9321 -25.454
-3.9563 -22.115
Lags ____ 0 0
Alpha _____
Test ______
CReg _____
RRe ____
0.05 0.05
{'t1'} {'t2'}
{'c'} {'c'}
{'AD {'AD
The τ test (Test 1) fails to reject the null hypothesis, but the z test (Test 2) rejects the null hypothesis in favor of the presence of cointegration. 12-554
egcitest
Plot the estimated cointegrating relation using the regression statistics from the z test b1 y1 − y2 y3 − Xa, where Xa = c. b2 c = reg(2).coeff(1); b = reg(2).coeff(2:3); figure plot(TT.Time,TT.Variables*[1; -b] - c) grid on
Input Arguments Y — Data representing observations of multivariate time series yt numeric matrix Data representing observations of a multivariate time series yt, specified as a numObs-by-numDims numeric matrix. Each column of Y corresponds to a variable, and each row corresponds to an observation. The test regresses the response variable Y(:,1) on the predictor variables Y(:,2:end). Data Types: double Tbl — Data representing observations of multivariate time series yt table | timetable 12-555
12
Functions
Data representing observations of a multivariate time series yt, specified as a table or timetable with numObs rows. Each row of Tbl is an observation. The test regresses the response variable, which is the first variable in Tbl, on the predictor variables, which are all other variables in Tbl. To select a different response variable for the regression, use the ResponseVariable name-value argument. To select different predictor variables, use the PredictorNames name-value argument. The selected variables must be numeric. Note egcitest removes, from the specified data, all observations containing at least one missing observation, represented by a NaN value. Name-Value Arguments Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter. Before R2021a, use commas to separate each name and value, and enclose Name in quotes. Example: egcitest(Tbl,ResponseVariable="GDP",Alpha=0.025,Lags=[0 1]) chooses GDP as the response variable from the table Tbl and conducts two tests at a level of significance of 0.025. The first test includes 0 lag in the residual regression, and the second test includes 1 lag in the residual regression. CReg — Cointegrating regression form "c" (default) | "nc" | "ct" | "ctt" | character vector | string vector | cell vector of character vectors Cointegrating regression form, specified as the name of a form, or a string vector or cell vector of form names. In general, cointegrating regression is y1 = Xa + Y 2b + ε where y1 is the response variable, Y2 contains the predictor variables, and X is a design matrix for optional deterministic coefficients a, including a constant, linear time trend, and quadratic time trend. This table contains the supported forms and their names. Form Name
Description
"nc"
The regression does not include X; no constant or trends.
"c"
X contains a variable for the constant, but not for the trends.
"ct"
X contains variables for the constant and the linear time trend.
"ctt"
X contains variables for the constant, linear time trend, and quadratic time trend.
egcitest conducts a separate test for each form name in CReg. Example: CReg=["ct" "ctt"] includes a constant and linear time trend terms in the cointegrating regression for the first test, and then includes all three deterministic terms in the cointegrating regression for the second test. Data Types: char | string | cell 12-556
egcitest
CVec — Cointegrating-regression coefficient equality constraints numeric vector (default) | cell vector of numeric vectors Cointegrating-regression coefficient equality constraints, specified as the numeric vector [a; b] or cell vector of such numeric vectors. a contains the equality constraints of the deterministic terms in the cointegrating regression. The length of a depends on the corresponding value of the CReg name-value argument, one of 0, 1, 2, or 3. For coefficients in the regression, their order in a is constant, linear trend, and quadratic trend. b contains the numDims − 1 equality constraints for the coefficient of the corresponding predictor variable in Y2. Specify NaN entries to estimate the corresponding coefficient in the regression. When CVec is completely specified (does not contain any NaN values), egcitest does not perform the cointegrating regression. By default, CVec is a completely unspecified cointegrating vector (completely composed of NaN values). Consequently, egcitest estimates all coefficients. egcitest conducts a separate test for each set of equality constraints in CVec. Example: egcitest(Tbl,CVec=[2 NaN NaN]) fixes the constant in the cointegrating regression to 2 and estimates the coefficients of the two predictor variables in Tbl. Example: egcitest(Tbl,CVec={[2 NaN NaN]; nan(3,1)), for the first test, fixes the constant in the cointegrating regression to 2 and estimates the coefficients of the two predictor variables in Tbl, and for the second test, estimates all coefficients. Example: egcitest(Tbl,CReg="ctt",CVec=[2 0.5 0.25 NaN NaN]) fixes the constant to 2, the linear trend to 0.5, and the quadratic trend to 0.25, and estimates the coefficients of the two predictor variables in Tbl. Data Types: double | cell RReg — Residual regression form "pp" (default) | "adf" | character vector | string vector | cell vector of character vectors Residual regression form, specified as the name of a form, or a string vector or cell vector of form names. Form Name
Description
"adf"
Augmented Dickey-Fuller test (adftest) of residuals from the cointegrating regression
"pp"
Phillips-Perron test (pptest) of residuals from the cointegrating regression
egcitest computes test statistics by calling adftest and pptest with the setting Model="AR". This setting requires residuals from appropriately demeaned and detrended data, which is specified by the cointegrating-regression form CReg. egcitest conducts a separate test for each form name in RReg. Example: CReg=["adf" "pp"] performs the augmented Dickey-Fuller test for the residual regression of the first test, and then performs the Phillips-Perron test for the residual regression of the second test. 12-557
12
Functions
Data Types: char | string | cell Lags — Number of lags in residual regression 0 (default) | nonnegative integer | vector of nonnegative integers Number of lags in the residual regression, specified as a nonnegative integer or vector of nonnegative integers. The meaning of Lags depends on the value of the RReg name-value argument. For more details, see the Lags argument of the adftest and pptest functions. egcitest conducts a separate test for each element in Lags. Example: Lags=[0 1] includes no lags in the residual regression for the first test, and then includes one lag for the residual regression for the second test. Data Types: double Test — Test statistic type from residual regression "t1" (default) | "t2" | character vector | string vector | cell vector of character vectors Test statistic type from residual regression, specified as test name, or a string vector or cell vector of test names. This table contains the supported test names. Test Name
Description
"t1"
τ test
"t2"
z test For more details, see the Test argument of the adftest and pptest functions. egcitest conducts a separate test for each element in Test. Example: Test=["t1" "t2"] computes the τ test from the residual regression for the first test, and then computes the z test from the residual regression for the second test. Data Types: char | cell | string Alpha — Nominal significance level 0.05 (default) | numeric scalar | numeric vector Nominal significance level for the hypothesis test, specified as a numeric scalar between 0.001 and 0.999 or a numeric vector of such values. egcitest conducts a separate test for each value in Alpha. Example: Alpha=[0.01 0.05] uses a level of significance of 0.01 for the first test, and then uses a level of significance of 0.05 for the second test. Data Types: double ResponseVariable — Variable in Tbl to use for response first variable in Tbl (default) | string vector | cell vector of character vectors | vector of integers | logical vector Variable in Tbl to use for response in the cointegrating regression, specified as a string vector or cell vector of character vectors containing variable names in Tbl.Properties.VariableNames, or an integer or logical vector representing the indices of names. The selected variables must be numeric. egcitest uses the same specified response variable for all tests.
12-558
egcitest
Example: ResponseVariable="GDP" Data Types: double | logical | char | cell | string PredictorVariables — Variables in Tbl to use for the predictors string vector | cell vector of character vectors | vector of integers | logical vector Variables in Tbl to use for the predictors in the cointegrating regression, specified as a string vector or cell vector of character vectors containing variable names in Tbl.Properties.VariableNames, or an integer or logical vector representing the indices of names. The selected variables must be numeric. egcitest uses the same specified predictors for all tests. By default, egcitest uses all variables in Tbl that is not specified by the ResponseVariable name-value argument. Example: DataVariables=["UN" "CPI"] Example: DataVariables=[true true false false] or DataVariables=[1 2] selects the first and second table variables. Data Types: double | logical | char | cell | string Note • When egcitest conducts multiple tests, the function applies all single settings (scalars or character vectors) to each test. • All vector-valued specifications that control the number of tests must have equal length. • If you specify the matrix Y and any value is a row vector, all outputs are row vectors. • A lagged and differenced time series has a reduced sample size. Absent presample values, if the test series yt is defined for t = 1,…,T, the lagged series yt– k is defined for t = k+1,…,T. The first difference applied to the lagged series yt– k further reduces the time base to k+2,…,T. With p lagged differences, the common time base is p+2,…,T and the effective sample size is T–(p+1).
Output Arguments h — Test rejection decisions logical scalar | logical vector Test rejection decisions, returned as a logical scalar or vector with length equal to the number of tests. egcitest returns h when you supply the input Y. • Values of 1 indicate rejection of the null hypothesis in favor of the alternative of cointegration. • Values of 0 indicate failure to reject the null hypothesis. pValue — Test statistic p-values numeric scalar | numeric vector Test statistic p-values, returned as a numeric scalar or vector with length equal to the number of tests. egcitest returns pValue when you supply the input Y. The p-values are left-tailed probabilities. 12-559
12
Functions
stat — Test statistics numeric scalar | numeric vector Test statistics, returned as a numeric scalar or vector with length equal to the number of tests. egcitest returns stat when you supply the input Y. The RReg and Test settings of a particular test determine the test statistic. For more details, see adftest and pptest. cValue — Critical values numeric scalar | numeric vector Critical values, returned as a numeric scalar or vector with length equal to the number of tests. egcitest returns cValue when you supply the input Y. The critical values are for left-tailed probabilities. Because egcitest estimates the residuals (that is, residuals are unobserved), critical values are different from those used in adftest or pptest (unless the cointegrating vector is completely specified by the CVec setting). egcitest loads tables of critical values from the file Data_EGCITest.mat, and then linearly interpolates test critical values from the tables. Critical values in the tables derive from methods described in [3]. StatTbl — Test summary table Test summary, returned as a table with variables for the outputs h, pValue, stat, and cValue, and with a row for each test. egcitest returns StatTbl when you supply the input Tbl. StatTbl contains variables for the test settings specified by Lags, Alpha, Test, CReg, and RReg. reg1 — Cointegrating regression statistics structure array Cointegrating regression statistics, returned as a structure array. The number of records equal to the number of tests. egcitest regresses the response variable ResponseVariable onto the predictor variables PredictorVariables using the regression form CReg and specified equality constraints CVec. Each element of reg1 has the fields in this table. You can access a field using dot notation, for example, reg1(3).coeff contains the coefficient estimates of the third test.
12-560
num
Length of input series with NaNs removed
size
Effective sample size, adjusted for lags and difference
names
Regression coefficient names
coeff
Estimated coefficient values
se
Estimated coefficient standard errors
Cov
Estimated coefficient covariance matrix
tStats
t statistics of coefficients and p-values
FStat
F statistic and p-value
yMu
Mean of the lag-adjusted input series
egcitest
ySigma
Standard deviation of the lag-adjusted input series
yHat
Fitted values of the lag-adjusted input series
res
Regression residuals
DWStat
Durbin-Watson statistic
SSR
Regression sum of squares
SSE
Error sum of squares
SST
Total sum of squares
MSE
Mean square error
RMSE
Standard error of the regression
RSq
R2 statistic
aRSq
Adjusted R2 statistic
LL
Loglikelihood of data under Gaussian innovations
AIC
Akaike information criterion
BIC
Bayesian (Schwarz) information criterion
HQC
Hannan-Quinn information criterion
reg2 — Residual regression statistics structure array Residual regression statistics, returned as a structure array containing the same fields as reg1. The number of records equal to the number of tests. egcitest tests the residuals of the cointegrating regression for a unit root by passing the residuals, and the values of Lags and Test, to the test specified by RReg. The tests form the test statistic by a regression of the residuals using specified options. For more details on the test options and the fields of reg2, see adftest or pptest.
Tips • To draw valid inferences from the test, determine a suitable value for Lags. For more details, see the adftest “Tips” on page 12-17 and the pptest “Tips” on page 12-1963. • Samples with less than approximately 20 through 40 observations (depending on the dimension of the data numDims) can yield unreliable critical values, and therefore unreliable inferences. See [3]. • If a test result suggests that the time series are cointegrated, you can use the residuals as data for the error-correction term in a VEC representation of the variables. Follow this procedure: 1
Extract the residuals from the reg1 output (reg1.res).
2
Estimate autoregressive model components using the estimate function of varm, and treat the extracted residual series as exogenous for estimation.
Alternative Functionality App The Econometric Modeler app enables you to conduct the Engle-Granger cointegration test. 12-561
12
Functions
Version History Introduced in R2011a
References [1] Engle, R. F. and C. W. J. Granger. "Co-Integration and Error-Correction: Representation, Estimation, and Testing." Econometrica. Vol. 55, 1987, pp. 251–276. [2] Hamilton, James D. Time Series Analysis. Princeton, NJ: Princeton University Press, 1994. [3] MacKinnon, J. G. "Numerical Distribution Functions for Unit Root and Cointegration Tests." Journal of Applied Econometrics. Vol. 11, 1996, pp. 601–618.
See Also Objects vecm | varm Functions estimate | jcitest | adftest | pptest | vec2var Topics “Cointegration and Error Correction Analysis” on page 9-107 “Identifying Single Cointegrating Relations” on page 9-113 “Test for Cointegration Using the Engle-Granger Test” on page 9-117 “Estimate VEC Model Parameters Using egcitest” on page 9-121 “Specifying Multivariate Lag Operator Polynomials and Coefficient Constraints Interactively” on page 4-50
12-562
eigplot
eigplot Plot Markov chain eigenvalues
Syntax eigplot(mc) eVals = eigplot(mc) eigplot(ax,mc) [eVals,h] = eigplot( ___ )
Description eigplot(mc) creates a plot containing the eigenvalues of the transition matrix of the discrete-time Markov chain mc on the complex plane. The plot highlights the following: • Unit circle • Perron-Frobenius eigenvalue at (1,0) • Circle of second largest eigenvalue magnitude (SLEM) • Spectral gap between the two circles, which determines the mixing time eVals = eigplot(mc) additionally returns the eigenvalues eVals sorted by magnitude. eigplot(ax,mc) plots on the axes specified by ax instead of the current axes (gca). [eVals,h] = eigplot( ___ ) additionally returns the handle to the eigenvalue plot using input any of the input arguments in the previous syntaxes. Use h to modify properties of the plot after you create it.
Examples Plot Markov Chain Eigenvalues Create 10-state Markov chains from two random transition matrices, with one transition matrix being more sparse than the other. rng(1); % For reproducibility numstates = 10; mc1 = mcmix(numstates,'Zeros',20); mc2 = mcmix(numstates,'Zeros',80); % mc2.P is more sparse than mc1.P
Plot the eigenvalues of the transition matrices on the separate complex planes. figure; eigplot(mc1);
12-563
12
Functions
figure; eigplot(mc2);
12-564
eigplot
The pink disc in the plots show the spectral gap (the difference between the two largest eigenvalue moduli). The spectral gap determines the mixing time of the Markov chain. Large gaps indicate faster mixing, whereas thin gaps indicate slower mixing. Because the spectral gap of mc1 is thicker than the spectral gap of mc2, mc1 mixes faster than mc2.
Compute Transition Matrix Eigenvalues Consider this theoretical, right-stochastic transition matrix of a stochastic process. 0 0 0 P= 0 0 1/2 1/4
0 0 0 0 0 1/2 3/4
1/2 1/3 0 0 0 0 0
1/4 0 0 0 0 0 0
1/4 2/3 0 0 0 0 0
0 0 1/3 1/2 3/4 0 0
0 0 2/3 1/2 . 1/4 0 0
Create the Markov chain that is characterized by the transition matrix P. P = [ 0 0 0
0 0 0
1/2 1/4 1/4 0 0 ; 1/3 0 2/3 0 0 ; 0 0 0 1/3 2/3;
12-565
12
Functions
0 0 0 0 1/2 1/2 1/4 3/4 mc = dtmc(P);
0 0 0 0
0 0 0 0
0 0 0 0
1/2 1/2; 3/4 1/4; 0 0 ; 0 0 ];
Plot and return the eigenvalues of the transition matrix on the complex plane. figure; eVals = eigplot(mc)
eVals = 7×1 complex -0.5000 -0.5000 1.0000 -0.3207 0.1604 0.1604 -0.0000
+ + + + +
0.8660i 0.8660i 0.0000i 0.0000i 0.2777i 0.2777i 0.0000i
Three eigenvalues have modulus one, which indicates that the period of mc is three. Compute the mixing time of the Markov chain. [~,tMix] = asymptotics(mc) tMix = 0.8793
12-566
eigplot
Input Arguments mc — Discrete-time Markov chain dtmc object Discrete-time Markov chain with NumStates states and transition matrix P, specified as a dtmc object. P must be fully specified (no NaN entries). ax — Axes on which to plot Axes object Axes on which to plot, specified as an Axes object. By default, eigplot plots to the current axes (gca).
Output Arguments eVals — Transition matrix eigenvalues numeric vector Transition matrix eigenvalues sorted by magnitude, returned as a numeric vector. h — Handles to plotted graphics objects graphics array Handles to plotted graphics objects, returned as a graphics array. h contains unique plot identifiers, which you can use to query or modify properties of the plot. Note • By the Perron-Frobenius Theorem [2], a chain with a single recurrent communicating class (a unichain) has exactly one eigenvalue equal to 1 (the Perron-Frobenius eigenvalue), and an accompanying nonnegative left eigenvector that normalizes to a unique stationary distribution. All other eigenvalues have modulus less than or equal to 1. The inequality is strict unless the recurrent class is periodic. When there is periodicity of period k, there are k eigenvalues on the unit circle at the k roots of unity. • For an ergodic unichain, any initial distribution converges to the stationary distribution at a rate determined by the second largest eigenvalue modulus (SLEM), μ. The spectral gap, 1 – μ, provides a visual measure, with large gaps (smaller SLEM circles) producing faster convergence. Rates are exponential, with a characteristic time given by tMix = −
1 . log μ
See asymptotics.
Version History Introduced in R2017b 12-567
12
Functions
References [1] Gallager, R.G. Stochastic Processes: Theory for Applications. Cambridge, UK: Cambridge University Press, 2013. [2] Horn, R., and C. R. Johnson. Matrix Analysis. Cambridge, UK: Cambridge University Press, 1985. [3] Seneta, E. Non-negative Matrices and Markov Chains. New York, NY: Springer-Verlag, 1981.
See Also Objects dtmc Functions asymptotics Topics “Markov Chain Modeling” on page 10-8 “Create and Modify Markov Chain Model Objects” on page 10-17 “Visualize Markov Chain Structure and Evolution” on page 10-27 “Determine Asymptotic Behavior of Markov Chain” on page 10-39 “Compare Markov Chain Mixing Times” on page 10-50
12-568
empiricalblm
empiricalblm Bayesian linear regression model with samples from prior or posterior distributions
Description The Bayesian linear regression model on page 12-577 object empiricalblm contains samples from the prior distributions of β and σ2, which MATLAB uses to characterize the prior or posterior distributions. The data likelihood is
T
∏
t=1
ϕ yt; xt β, σ2 , where ϕ(yt;xtβ,σ2) is the Gaussian probability density
evaluated at yt with mean xtβ and variance σ2. Because the form of the prior distribution functions are unknown, the resulting posterior distributions are not analytically tractable. Hence, to estimate or simulate from posterior distributions, MATLAB implements sampling importance resampling on page 12-578. You can create a Bayesian linear regression model with an empirical prior directly using bayeslm or empiricalblm. However, for empirical priors, estimating the posterior distribution requires that the prior closely resemble the posterior. Hence, empirical models are better suited for updating posterior distributions estimated using Monte Carlo sampling (for example, semiconjugate and custom prior models) given new data.
Creation Either the estimate function returns an empiricalblm object or you directly create one by using empiricalblm. • Return empiricalblm object using estimate: For semiconjugate, empirical, custom, and variable-selection prior models, estimate estimates the posterior distribution using Monte Carlo sampling. Specifically, estimate characterizes the posterior distribution by a large number of draws from that distribution. estimate stores the draws in the BetaDraws and Sigma2Draws properties of the returned Bayesian linear regression model object. Hence, when you estimate semiconjugateblm, empiricalblm, customblm, lassoblm, mixconjugateblm, and mixconjugateblm model objects, estimate returns an empiricalblm object. • Create empiricalblm object directly: If you want to update an estimated posterior distribution using new data, and you have draws from the posterior distribution of β and σ2, you can create an empirical model using empiricalblm.
Syntax PriorMdl = empiricalblm(NumPredictors,'BetaDraws',BetaDraws,' Sigma2Draws',Sigma2Draws) PriorMdl = empiricalblm(NumPredictors,'BetaDraws',BetaDraws,' Sigma2Draws',Sigma2Draws,Name,Value)
12-569
12
Functions
Description PriorMdl = empiricalblm(NumPredictors,'BetaDraws',BetaDraws,' Sigma2Draws',Sigma2Draws) creates a Bayesian linear regression model on page 12-577 object (PriorMdl) composed of NumPredictors predictors and an intercept, and sets the NumPredictors property. The random samples from the prior distributions of β and σ2, BetaDraws and Sigma2Draws, respectively, characterize the prior distributions. PriorMdl is a template that defines the prior distributions and the dimensionality of β. PriorMdl = empiricalblm(NumPredictors,'BetaDraws',BetaDraws,' Sigma2Draws',Sigma2Draws,Name,Value) sets properties on page 12-570 (except NumPredictors) using name-value pair arguments. Enclose each property name in quotes. For example, empiricalblm(2,'BetaDraws',BetaDraws,'Sigma2Draws',Sigma2Draws,'Intercept', false) specifies the random samples from the prior distributions of β and σ2 and specifies a regression model with 2 regression coefficients, but no intercept.
Properties You can set writable property values when you create the model object by using name-value argument syntax, or after you create the model object by using dot notation. For example, to specify that there is no model intercept in PriorMdl, a Bayesian linear regression model containing three model coefficients, enter PriorMdl.Intercept = false;
NumPredictors — Number of predictor variables nonnegative integer Number of predictor variables in the Bayesian multiple linear regression model, specified as a nonnegative integer. NumPredictors must be the same as the number of columns in your predictor data, which you specify during model estimation or simulation. When specifying NumPredictors, exclude any intercept term from the value. After creating a model, if you change the value of NumPredictors using dot notation, then VarNames reverts to its default value. Data Types: double Intercept — Flag for including regression model intercept true (default) | false Flag for including a regression model intercept, specified as a value in this table.
12-570
Value
Description
false
Exclude an intercept from the regression model. Therefore, β is a p-dimensional vector, where p is the value of NumPredictors.
empiricalblm
Value
Description
true
Include an intercept in the regression model. Therefore, β is a (p + 1)-dimensional vector. This specification causes a T-by-1 vector of ones to be prepended to the predictor data during estimation and simulation.
If you include a column of ones in the predictor data for an intercept term, then set Intercept to false. Example: 'Intercept',false Data Types: logical VarNames — Predictor variable names string vector | cell vector of character vectors Predictor variable names for displays, specified as a string vector or cell vector of character vectors. VarNames must contain NumPredictors elements. VarNames(j) is the name of the variable in column j of the predictor data set, which you specify during estimation, simulation, or forecasting. The default is {'Beta(1)','Beta(2),...,Beta(p)}, where p is the value of NumPredictors. Example: 'VarNames',["UnemploymentRate"; "CPI"] Data Types: string | cell | char BetaDraws — Random sample from prior distribution of β numeric matrix Random sample from the prior distribution of β, specified as a (Intercept + NumPredictors)-byNumDraws numeric matrix. Rows correspond to regression coefficients; the first row corresponds to the intercept, and the subsequent rows correspond to columns in the predictor data. Columns correspond to successive draws from the prior distribution. BetaDraws and SigmaDraws must have the same number of columns. NumDraws should be reasonably large. Data Types: double Sigma2Draws — Random sample from prior distribution of σ2 numeric matrix Random sample from the prior distribution of σ2, specified as a 1-by-NumDraws numeric matrix. Columns correspond to successive draws from the prior distribution. BetaDraws and Sigma2Draws must have the same number of columns. NumDraws should be reasonably large. Data Types: double
Object Functions estimate
Estimate posterior distribution of Bayesian linear regression model parameters 12-571
12
Functions
simulate forecast plot summarize
Simulate regression coefficients and disturbance variance of Bayesian linear regression model Forecast responses of Bayesian linear regression model Visualize prior and posterior densities of Bayesian linear regression model parameters Distribution summary statistics of standard Bayesian linear regression model
Examples Create Empirical Prior Model Consider the multiple linear regression model that predicts the US real gross national product (GNPR) using a linear combination of industrial production index (IPI), total employment (E), and real wages (WR). GNPRt = β0 + β1IPIt + β2Et + β3WRt + εt . For all t time points, εt is a series of independent Gaussian disturbances with a mean of 0 and variance σ2. Assume that the prior distributions are: •
β | σ2 ∼ N4 M, V . M is a 4-by-1 vector of means, and V is a scaled 4-by-4 positive definite covariance matrix.
• σ2 ∼ IG(A, B). A and B are the shape and scale, respectively, of an inverse gamma distribution. These assumptions and the data likelihood imply a normal-inverse-gamma semiconjugate model. That is, the conditional posteriors are conjugate to the prior with respect to the data likelihood, but the marginal posterior is analytically intractable. Create a normal-inverse-gamma semiconjugate prior model for the linear regression parameters. Specify the number of predictors p. p = 3; PriorMdl = bayeslm(p,'ModelType','semiconjugate') PriorMdl = semiconjugateblm with properties: NumPredictors: Intercept: VarNames: Mu: V: A: B:
3 1 {4x1 cell} [4x1 double] [4x4 double] 3 1
| Mean Std CI95 Positive Distribution ------------------------------------------------------------------------------Intercept | 0 100 [-195.996, 195.996] 0.500 N (0.00, 100.00^2) Beta(1) | 0 100 [-195.996, 195.996] 0.500 N (0.00, 100.00^2) Beta(2) | 0 100 [-195.996, 195.996] 0.500 N (0.00, 100.00^2) Beta(3) | 0 100 [-195.996, 195.996] 0.500 N (0.00, 100.00^2)
12-572
empiricalblm
Sigma2
| 0.5000
0.5000
[ 0.138,
1.616]
1.000
IG(3.00,
1)
Mdl is a semiconjugateblm Bayesian linear regression model object representing the prior distribution of the regression coefficients and disturbance variance. At the command window, bayeslm displays a summary of the prior distributions. Load the Nelson-Plosser data set. Create variables for the response and predictor series. load Data_NelsonPlosser VarNames = {'IPI'; 'E'; 'WR'}; X = DataTable{:,VarNames}; y = DataTable{:,'GNPR'};
Estimate the marginal posterior distributions of β and σ2. rng(1); % For reproducibility PosteriorMdl = estimate(PriorMdl,X,y); Method: Gibbs sampling with 10000 draws Number of observations: 62 Number of predictors: 4 | Mean Std CI95 Positive Distribution ------------------------------------------------------------------------Intercept | -23.9922 9.0520 [-41.734, -6.198] 0.005 Empirical Beta(1) | 4.3929 0.1458 [ 4.101, 4.678] 1.000 Empirical Beta(2) | 0.0011 0.0003 [ 0.000, 0.002] 0.999 Empirical Beta(3) | 2.4711 0.3576 [ 1.762, 3.178] 1.000 Empirical Sigma2 | 46.7474 8.4550 [33.099, 66.126] 1.000 Empirical
PosteriorMdl is an empiricalblm model object storing draws from the posterior distributions of β and σ2 given the data. estimate displays a summary of the marginal posterior distributions to the command window. Rows of the summary correspond to regression coefficients and the disturbance variance, and columns to characteristics of the posterior distribution. The characteristics include: • CI95, which contains the 95% Bayesian equitailed credible intervals for the parameters. For example, the posterior probability that the regression coefficient of WR is in [1.762, 3.178] is 0.95. • Positive, which contains the posterior probability that the parameter is greater than 0. For example, the probability that the intercept is greater than 0 is 0.005. In this case, the marginal posterior is analytically intractable. Therefore, estimate uses Gibbs sampling to draw from the posterior and estimate the posterior characteristics.
Update Marginal Posterior Distribution for New Data Consider the linear regression model in “Create Empirical Prior Model” on page 12-572. Create a normal-inverse-gamma semiconjugate prior model for the linear regression parameters. Specify the number of predictors p and the names of the regression coefficients. p = 3; PriorMdl = bayeslm(p,'ModelType','semiconjugate','VarNames',["IPI" "E" "WR"]);
12-573
12
Functions
Load the Nelson-Plosser data set. Partition the data by reserving the last five periods in the series. load X0 = y0 = X1 = y1 =
Data_NelsonPlosser DataTable{1:(end - 5),PriorMdl.VarNames(2:end)}; DataTable{1:(end - 5),'GNPR'}; DataTable{(end - 4):end,PriorMdl.VarNames(2:end)}; DataTable{(end - 4):end,'GNPR'};
Estimate the marginal posterior distributions of β and σ2. rng(1); % For reproducibility PosteriorMdl0 = estimate(PriorMdl,X0,y0); Method: Gibbs sampling with 10000 draws Number of observations: 57 Number of predictors: 4 | Mean Std CI95 Positive Distribution --------------------------------------------------------------------------Intercept | -34.3887 10.5218 [-55.350, -13.615] 0.001 Empirical IPI | 3.9076 0.2786 [ 3.356, 4.459] 1.000 Empirical E | 0.0011 0.0003 [ 0.000, 0.002] 0.999 Empirical WR | 3.2146 0.4967 [ 2.228, 4.196] 1.000 Empirical Sigma2 | 45.3098 8.5597 [31.620, 64.972] 1.000 Empirical
PosteriorMdl0 is an empiricalblm model object storing the Gibbs-sampling draws from the posterior distribution. Update the posterior distribution based on the last 5 periods of data by passing those observations and the posterior distribution to estimate. PosteriorMdl1 = estimate(PosteriorMdl0,X1,y1); Method: Importance sampling/resampling with 10000 draws Number of observations: 5 Number of predictors: 4 | Mean Std CI95 Positive Distribution ------------------------------------------------------------------------Intercept | -24.3152 9.3408 [-41.163, -5.301] 0.008 Empirical IPI | 4.3893 0.1440 [ 4.107, 4.658] 1.000 Empirical E | 0.0011 0.0004 [ 0.000, 0.002] 0.998 Empirical WR | 2.4763 0.3694 [ 1.630, 3.170] 1.000 Empirical Sigma2 | 46.5211 8.2913 [33.646, 65.402] 1.000 Empirical
To update the posterior distributions based on draws, estimate uses sampling importance resampling.
Estimate Posterior Probability Using Monte Carlo Simulation Consider the linear regression model in “Estimate Marginal Posterior Distribution” on page 12-2070. Create a prior model for the regression coefficients and disturbance variance, then estimate the marginal posterior distributions. 12-574
empiricalblm
p = 3; PriorMdl = bayeslm(p,'ModelType','semiconjugate','VarNames',["IPI" "E" "WR"]); load Data_NelsonPlosser X = DataTable{:,PriorMdl.VarNames(2:end)}; y = DataTable{:,'GNPR'}; rng(1); % For reproducibility PosteriorMdl = estimate(PriorMdl,X,y); Method: Gibbs sampling with 10000 draws Number of observations: 62 Number of predictors: 4 | Mean Std CI95 Positive Distribution ------------------------------------------------------------------------Intercept | -23.9922 9.0520 [-41.734, -6.198] 0.005 Empirical IPI | 4.3929 0.1458 [ 4.101, 4.678] 1.000 Empirical E | 0.0011 0.0003 [ 0.000, 0.002] 0.999 Empirical WR | 2.4711 0.3576 [ 1.762, 3.178] 1.000 Empirical Sigma2 | 46.7474 8.4550 [33.099, 66.126] 1.000 Empirical
Estimate posterior distribution summary statistics for β by using the draws from the posterior distribution stored in posterior model. estBeta = mean(PosteriorMdl.BetaDraws,2); EstBetaCov = cov(PosteriorMdl.BetaDraws');
Suppose that if the coefficient of real wages is below 2.5, then a policy is enacted. Although the posterior distribution of WR is known, and so you can calculate probabilities directly, you can estimate the probability using Monte Carlo simulation instead. Draw 1e6 samples from the marginal posterior distribution of β. NumDraws = 1e6; rng(1); BetaSim = simulate(PosteriorMdl,'NumDraws',NumDraws);
BetaSim is a 4-by- 1e6 matrix containing the draws. Rows correspond to the regression coefficient and columns to successive draws. Isolate the draws corresponding to the coefficient of real wages, and then identify which draws are less than 2.5. isWR = PosteriorMdl.VarNames == "WR"; wrSim = BetaSim(isWR,:); isWRLT2p5 = wrSim < 2.5;
Find the marginal posterior probability that the regression coefficient of WR is below 2.5 by computing the proportion of draws that are less than 2.5. probWRLT2p5 = mean(isWRLT2p5) probWRLT2p5 = 0.5283
The posterior probability that the coefficient of real wages is less than 2.5 is about 0.53. Copyright 2018 The MathWorks, Inc. 12-575
12
Functions
Forecast Responses Using Posterior Predictive Distribution Consider the linear regression model in “Estimate Marginal Posterior Distribution” on page 12-2070. Create a prior model for the regression coefficients and disturbance variance, then estimate the marginal posterior distributions. Hold out the last 10 periods of data from estimation so you can use them to forecast real GNP. Turn the estimation display off. p = 3; PriorMdl = bayeslm(p,'ModelType','semiconjugate','VarNames',["IPI" "E" "WR"]); load Data_NelsonPlosser fhs = 10; % Forecast horizon size X = DataTable{1:(end - fhs),PriorMdl.VarNames(2:end)}; y = DataTable{1:(end - fhs),'GNPR'}; XF = DataTable{(end - fhs + 1):end,PriorMdl.VarNames(2:end)}; % Future predictor data yFT = DataTable{(end - fhs + 1):end,'GNPR'}; % True future responses rng(1); % For reproducibility PosteriorMdl = estimate(PriorMdl,X,y,'Display',false);
Forecast responses using the posterior predictive distribution and using the future predictor data XF. Plot the true values of the response and the forecasted values. yF = forecast(PosteriorMdl,XF); figure; plot(dates,DataTable.GNPR); hold on plot(dates((end - fhs + 1):end),yF) h = gca; hp = patch([dates(end - fhs + 1) dates(end) dates(end) dates(end - fhs + 1)],... h.YLim([1,1,2,2]),[0.8 0.8 0.8]); uistack(hp,'bottom'); legend('Forecast Horizon','True GNPR','Forecasted GNPR','Location','NW') title('Real Gross National Product: 1909 - 1970'); ylabel('rGNP'); xlabel('Year'); hold off
12-576
empiricalblm
yF is a 10-by-1 vector of future values of real GNP corresponding to the future predictor data. Estimate the forecast root mean squared error (RMSE). frmse = sqrt(mean((yF - yFT).^2)) frmse = 25.1938
The forecast RMSE is a relative measure of forecast accuracy. Specifically, you estimate several models using different assumptions. The model with the lowest forecast RMSE is the best-performing model of the ones being compared.
More About Bayesian Linear Regression Model A Bayesian linear regression model treats the parameters β and σ2 in the multiple linear regression (MLR) model yt = xtβ + εt as random variables. For times t = 1,...,T: • yt is the observed response. • xt is a 1-by-(p + 1) row vector of observed values of p predictors. To accommodate a model intercept, x1t = 1 for all t. 12-577
12
Functions
• β is a (p + 1)-by-1 column vector of regression coefficients corresponding to the variables that compose the columns of xt. • εt is the random disturbance with a mean of zero and Cov(ε) = σ2IT×T, while ε is a T-by-1 vector containing all disturbances. These assumptions imply that the data likelihood is ℓ β, σ2 y, x =
T
∏
t=1
ϕ yt; xt β, σ2 .
ϕ(yt;xtβ,σ2) is the Gaussian probability density with mean xtβ and variance σ2 evaluated at yt;. Before considering the data, you impose a joint prior distribution assumption on (β,σ2). In a Bayesian analysis, you update the distribution of the parameters by using information about the parameters obtained from the likelihood of the data. The result is the joint posterior distribution of (β,σ2) or the conditional posterior distributions of the parameters. Sampling Importance Resampling In the Bayesian statistics context, sampling importance resampling is a Monte Carlo algorithm for drawing samples from the posterior distribution. In general, the algorithm draws a large, weighted sample of parameter values with replacement from their respective prior distributions. This is the algorithm in the context of Bayesian linear regression. 1
Randomly draw a large number of samples from the joint prior distribution π(β,σk2). B is the (p + 1)-by-K matrix containing the values of β, and Σ is the 1-by-K vector containing the values of σ2.
2
For each draw k = 1,...,K: a
Estimate the residual vector rkt = yt − Xt βk, where yj is response j, Xj is the 1-by-p vector of predictor observations, and βk is draw k from B.
b
Evaluate the likelihood ℓk = ∑ j log ϕ rk j; 0, σk2 , where ϕ(rkj;0,σk2) is the pdf of the normal distribution with a mean of 0 and variance of σk2 (draw k from Σ).
c
3
Estimate the normalized importance weight wk =
exp ℓk
∑k exp ℓk
.
Randomly draw, with replacement, K parameters from B and Σ with respect to the normalized importance weights.
The resulting sample is approximately from the joint posterior distribution π(β,σk2|yt,Xt). Those prior draws that the algorithm is likely to choose to be in the posterior sample are those that yield higher data likelihood values. If the prior draws yield poor likelihood values, then the chosen posterior sample will poorly approximate the actual posterior distribution. For details on diagnostics, see “Algorithms” on page 12-578.
Algorithms • After implementing sampling importance resampling on page 12-578 to sample from the posterior distribution, estimate, simulate, and forecast compute the effective sample size (ESS), which is the number of samples required to yield reasonable posterior statistics and inferences. Its formula is 12-578
empiricalblm
ESS =
1 . ∑ j w2j
If ESS < 0.01*NumDraws, then MATLAB throws a warning. The warning implies that, given the sample from the prior distribution, the sample from the proposal distribution is too small to yield good quality posterior statistics and inferences. • If the effective sample size is too small, then: • Increase the sample size of the draws from the prior distributions. • Adjust the prior distribution hyperparameters, and then resample from them. • Specify BetaDraws and Sigma2Draws as samples from informative prior distributions. That is, if the proposal draws come from nearly flat distributions, then the algorithm can be inefficient.
Alternatives The bayeslm function can create any supported prior model object for Bayesian linear regression.
Version History Introduced in R2017a
See Also Objects customblm | semiconjugateblm | lassoblm | mixconjugateblm | mixconjugateblm Functions bayeslm Topics “Bayesian Linear Regression” on page 6-2 “Implement Bayesian Linear Regression” on page 6-10
12-579
12
Functions
empiricalbvarm Bayesian vector autoregression (VAR) model with samples from prior or posterior distribution
Description The Bayesian VAR model on page 12-589 object empiricalbvarm contains samples from the distributions of the coefficients Λ and innovations covariance matrix Σ of a VAR(p) model, which MATLAB uses to characterize the corresponding prior or posterior distributions. For Bayesian VAR model objects that have an intractable posterior, the estimate function returns an empiricalbvarm object representing the empirical posterior distribution. However, if you have random draws from the prior or posterior distributions of the coefficients and innovations covariance matrix, you can create a Bayesian VAR model with an empirical prior directly by using empiricalbvarm.
Creation Syntax Mdl = empiricalbvarm(numseries,numlags,'CoeffDraws',CoeffDraws,'SigmaDraws', SigmaDraws) Mdl = empiricalbvarm(numseries,numlags,'CoeffDraws',CoeffDraws,' SigmaDraws',SigmaDraws,Name,Value) Description Mdl = empiricalbvarm(numseries,numlags,'CoeffDraws',CoeffDraws,'SigmaDraws', SigmaDraws) creates a numseries-D Bayesian VAR(numlags) model object Mdl characterized by the random samples from the prior or posterior distributions of λ = vec Λ = vec Φ1 Φ2 ⋯ Φp c δ Β ′ and Σ, CoeffDraws and SigmaDraws, respectively. • numseries = m, a positive integer specifying the number of response time series variables. • numlags = p, a nonnegative integer specifying the AR polynomial order (that is, number of numseries-by-numseries AR coefficient matrices in the VAR model). Mdl = empiricalbvarm(numseries,numlags,'CoeffDraws',CoeffDraws,' SigmaDraws',SigmaDraws,Name,Value) sets writable properties on page 12-581 (except NumSeries and P) using name-value pair arguments. Enclose each property name in quotes. For example, empiricalbvarm(3,2,'CoeffDraws',CoeffDraws,'SigmaDraws',SigmaDraws,'SeriesNam es',["UnemploymentRate" "CPI" "FEDFUNDS"]) specifies the random samples from the distributions of λ and Σ and the names of the three response variables. Because the posterior distributions of a semiconjugate prior model (semiconjugatebvarm) are analytically intractable, estimate returns an empiricalbvarm object that characterizes the posteriors and contains the Gibbs sampler draws from the full conditionals. 12-580
empiricalbvarm
Input Arguments numseries — Number of time series m 1 (default) | positive integer Number of time series m, specified as a positive integer. numseries specifies the dimensionality of the multivariate response variable yt and innovation εt. numseries sets the NumSeries property. Data Types: double numlags — Number of lagged responses nonnegative integer Number of lagged responses in each equation of yt, specified as a nonnegative integer. The resulting model is a VAR(numlags) model; each lag has a numseries-by-numseries coefficient matrix. numlags sets the P property. Data Types: double
Properties You can set writable property values when you create the model object by using name-value argument syntax, or after you create the model object by using dot notation. For example, to create a 3-D Bayesian VAR(1) model from the coefficient and innovations covariance arrays of draws CoeffDraws and SigmaDraws, respectively, and then label the response variables, enter: Mdl = empiricalbvarm(3,1,'CoeffDraws',CoeffDraws,'SigmaDraws',SigmaDraws); Mdl.SeriesNames = ["UnemploymentRate" "CPI" "FEDFUNDS"]; Required Draws from Distribution
CoeffDraws — Random sample from prior or posterior distribution of λ numeric matrix Random sample from the prior or posterior distribution of λ, specified as a NumSeries*k-bynumdraws numeric matrix, where k = NumSeries*P + IncludeIntercept + IncludeTrend + NumPredictors (the number of coefficients in a response equation). CoeffDraws represents the empirical distribution of λ based on a size numdraws sample. Columns correspond to successive draws from the distribution. CoeffDraws(1:k,:) corresponds to all coefficients in the equation of response variable SeriesNames(1), CoeffDraws((k + 1): (2*k),:) corresponds to all coefficients in the equation of response variable SeriesNames(2), and so on. For a set of row indices corresponding to an equation: • Elements 1 through NumSeries correspond to the lag 1 AR coefficients of the response variables ordered by SeriesNames. • Elements NumSeries + 1 through 2*NumSeries correspond to the lag 2 AR coefficients of the response variables ordered by SeriesNames. • In general, elements (q – 1)*NumSeries + 1 through q*NumSeries correspond to the lag q AR coefficients of the response variables ordered by SeriesNames. • If IncludeConstant is true, element NumSeries*P + 1 is the model constant. 12-581
12
Functions
• If IncludeTrend is true, element NumSeries*P + 2 is the linear time trend coefficient. • If NumPredictors > 0, elements NumSeries*P + 3 through k constitute the vector of regression coefficients of the exogenous variables. This figure shows the row structure of CoeffDraws for a 2-D VAR(3) model that contains a constant vector and four exogenous predictors: y1, t y2, t ⨉ ⨉ [ϕ1, 11 ϕ1, 12 ϕ2, 11 ϕ2, 12 ϕ3, 11 ϕ3, 12 c1 β11 β12 β13 β14 ϕ1, 21 ϕ1, 22 ϕ2, 21 ϕ2, 22 ϕ3, 21 ϕ3, 22 c2 β21 β22 β23 β24
], where • ϕq,jk is element (j,k) of the lag q AR coefficient matrix. • cj is the model constant in the equation of response variable j. • Bju is the regression coefficient of the exogenous variable u in the equation of response variable j. CoeffDraws and SigmaDraws must be based on the same number of draws, and both must represent draws from either the prior or posterior distribution. numdraws should be reasonably large, for example, 1e6. Data Types: double SigmaDraws — Random sample from prior or posterior distribution of Σ array of positive definite numeric matrices Random sample from the prior or posterior distribution of Σ, specified as a NumSeries-byNumSeries-by-numdraws array of positive definite numeric matrices. SigmaDraws represents the empirical distribution of Σ based on a size numdraws sample. Rows and columns correspond to innovations in the equations of the response variables ordered by SeriesNames. Columns correspond to successive draws from the distribution. CoeffDraws and SigmaDraws must be based on the same number of draws, and both must represent draws from either the prior or posterior distribution. numdraws should be reasonably large, for example, 1e6. Data Types: double Model Characteristics and Dimensionality
Description — Model description string scalar | character vector Model description, specified as a string scalar or character vector. The default value describes the model dimensionality, for example '2-Dimensional VAR(3) Model'. Example: "Model 1" Data Types: string | char NumSeries — Number of time series m positive integer 12-582
empiricalbvarm
This property is read-only. Number of time series m, specified as a positive integer. NumSeries specifies the dimensionality of the multivariate response variable yt and innovation εt. Data Types: double P — Multivariate autoregressive polynomial order nonnegative integer This property is read-only. Multivariate autoregressive polynomial order, specified as a nonnegative integer. P is the maximum lag that has a nonzero coefficient matrix. P specifies the number of presample observations required to initialize the model. Data Types: double SeriesNames — Response series names string vector | cell array of character vectors Response series names, specified as a NumSeries length string vector. The default is ['Y1' 'Y2' ... 'YNumSeries']. empiricalbvarm stores SeriesNames as a string vector. Example: ["UnemploymentRate" "CPI" "FEDFUNDS"] Data Types: string IncludeConstant — Flag for including model constant c true (default) | false Flag for including a model constant c, specified as a value in this table. Value
Description
false
Response equations do not include a model constant.
true
All response equations contain a model constant.
Data Types: logical IncludeTrend — Flag for including linear time trend term δt false (default) | true Flag for including a linear time trend term δt, specified as a value in this table. Value
Description
false
Response equations do not include a linear time trend term.
true
All response equations contain a linear time trend term.
Data Types: logical 12-583
12
Functions
NumPredictors — Number of exogenous predictor variables in model regression component 0 (default) | nonnegative integer Number of exogenous predictor variables in the model regression component, specified as a nonnegative integer. empiricalbvarm includes all predictor variables symmetrically in each response equation. VAR Model Parameters Derived from Distribution Draws
AR — Distribution mean of autoregressive coefficient matrices Φ1,…,Φp cell vector of numeric matrices This property is read-only. Distribution mean of the autoregressive coefficient matrices Φ1,…,Φp associated with the lagged responses, specified as a P-D cell vector of NumSeries-by-NumSeries numeric matrices. AR{j} is Φj, the coefficient matrix of lag j. Rows correspond to equations and columns correspond to lagged response variables; SeriesNames determines the order of response variables and equations. Coefficient signs are those of the VAR model expressed in difference-equation notation. If P = 0, AR is an empty cell. Otherwise, AR is the collection of AR coefficient means extracted from Mu. Data Types: cell Constant — Distribution mean of model constant c numeric vector This property is read-only. Distribution mean of the model constant c (or intercept), specified as a NumSeries-by-1 numeric vector. Constant(j) is the constant in equation j; SeriesNames determines the order of equations. If IncludeConstant = false, Constant is an empty array. Otherwise, Constant is the model constant vector mean extracted from Mu. Data Types: double Trend — Distribution mean of linear time trend δ numeric vector This property is read-only. Distribution mean of the linear time trend δ, specified as a NumSeries-by-1 numeric vector. Trend(j) is the linear time trend in equation j; SeriesNames determines the order of equations. If IncludeTrend = false (the default), Trend is an empty array. Otherwise, Trend is the linear time trend coefficient mean extracted from Mu. Data Types: double Beta — Distribution mean of regression coefficient matrix Β numeric matrix This property is read-only. 12-584
empiricalbvarm
Distribution mean of the regression coefficient matrix B associated with the exogenous predictor variables, specified as a NumSeries-by-NumPredictors numeric matrix. Beta(j,:) contains the regression coefficients of each predictor in the equation of response variable j yj,t. Beta(:,k) contains the regression coefficient in each equation of predictor xk. By default, all predictor variables are in the regression component of all response equations. You can down-weight a predictor from an equation by specifying, for the corresponding coefficient, a prior mean of 0 in Mu and a small variance in V. When you create a model, the predictor variables are hypothetical. You specify predictor data when you operate on the model (for example, when you estimate the posterior by using estimate). Columns of the predictor data determine the order of the columns of Beta. Data Types: double Covariance — Distribution mean of innovations covariance matrix Σ positive definite numeric matrix This property is read-only. Distribution mean of the innovations covariance matrix Σ of the NumSeries innovations at each time t = 1,...,T, specified as a NumSeries-by-NumSeries positive definite numeric matrix. Rows and columns correspond to innovations in the equations of the response variables ordered by SeriesNames. Data Types: double
Object Functions summarize
Distribution summary statistics of Bayesian vector autoregression (VAR) model
Examples Create Empirical Model Consider the 3-D VAR(4) model for the US inflation (INFL), unemployment (UNRATE), and federal funds (FEDFUNDS) rates. INFLt UNRATEt FEDFUNDSt
4
=c+
∑
j=1
INFLt −
ε1, t
j
Φ j UNRATEt −
+ ε2, t .
j
FEDFUNDSt −
j
ε3, t
For all t, εt is a series of independent 3-D normal innovations with a mean of 0 and covariance Σ. You can create an empirical Bayesian VAR model for the coefficients Φ1, . . . , Φ4, c ′ and innovations covariance matrix Σ in two ways: 1
Indirectly create an empiricalbvarm model by estimating the posterior distribution of a semiconjugate prior model.
2
Directly create an empiricalbvarm model by supplying draws from the prior or posterior distribution of the parameters. 12-585
12
Functions
Indirect Creation Assume the following prior distributions: • vec Φ1, . . . , Φ4, c ′ Σ ∼ Ν39 μ, V , where μ is a 39-by-1 vector of means and V is the 39-by-39 covariance matrix. • Σ ∼ Inverse Wishart Ω, ν , where Ω is the 3-by-3 scale matrix and ν is the degrees of freedom. Create a semiconjugate prior model for the 3-D VAR(4) model parameters. numseries = 3; numlags = 4; PriorMdl = semiconjugatebvarm(numseries,numlags) PriorMdl = semiconjugatebvarm with properties: Description: NumSeries: P: SeriesNames: IncludeConstant: IncludeTrend: NumPredictors: Mu: V: Omega: DoF: AR: Constant: Trend: Beta: Covariance:
"3-Dimensional VAR(4) Model" 3 4 ["Y1" "Y2" "Y3"] 1 0 0 [39x1 double] [39x39 double] [3x3 double] 13 {[3x3 double] [3x3 double] [3x3 double] [3x1 double] [3x0 double] [3x0 double] [3x3 double]
[3x3 double]}
PriorMdl is a semiconjugatebvarm Bayesian VAR model object representing the prior distribution of the coefficients and innovations covariance of the 3-D VAR(4) model. Load the US macroeconomic data set. Compute the inflation rate. Plot all response series. load Data_USEconModel seriesnames = ["INFL" "UNRATE" "FEDFUNDS"]; DataTimeTable.INFL = 100*[NaN; price2ret(DataTimeTable.CPIAUCSL)]; figure plot(DataTimeTable.Time,DataTimeTable{:,seriesnames}) legend(seriesnames)
12-586
empiricalbvarm
Stabilize the unemployment and federal funds rates by applying the first difference to each series. DataTimeTable.DUNRATE = [NaN; diff(DataTimeTable.UNRATE)]; DataTimeTable.DFEDFUNDS = [NaN; diff(DataTimeTable.FEDFUNDS)]; seriesnames(2:3) = "D" + seriesnames(2:3);
Remove all missing values from the data. rmDataTimeTable = rmmissing(DataTimeTable);
Estimate the posterior distribution by passing the prior model and entire data series to estimate. rng(1); % For reproducibility PosteriorMdl = estimate(PriorMdl,rmDataTimeTable{:,seriesnames},'Display','off') PosteriorMdl = empiricalbvarm with properties: Description: NumSeries: P: SeriesNames: IncludeConstant: IncludeTrend: NumPredictors: CoeffDraws: SigmaDraws: AR:
"3-Dimensional VAR(4) Model" 3 4 ["Y1" "Y2" "Y3"] 1 0 0 [39x10000 double] [3x3x10000 double] {[3x3 double] [3x3 double] [3x3 double]
[3x3 double]}
12-587
12
Functions
Constant: Trend: Beta: Covariance:
[3x1 [3x0 [3x0 [3x3
double] double] double] double]
PosteriorMdl is an empiricalbvarm model representing the empirical posterior distribution of the coefficients and innovations covariance matrix. empiricalbvarm stores the draws from the posteriors of λ and Σ in the CoeffDraws and SigmaDraws properties, respectively. Direct Creation Draw a random sample of size 1000 from the prior distribution PriorMdl. numdraws = 1000; [CoeffDraws,SigmaDraws] = simulate(PriorMdl,'NumDraws',numdraws); size(CoeffDraws) ans = 1×2 39
1000
size(SigmaDraws) ans = 1×3 3
3
1000
Create a Bayesian VAR model characterizing the empirical prior distributions of the parameters. PriorMdlEmp = empiricalbvarm(numseries,numlags,'CoeffDraws',CoeffDraws,... 'SigmaDraws',SigmaDraws) PriorMdlEmp = empiricalbvarm with properties: Description: NumSeries: P: SeriesNames: IncludeConstant: IncludeTrend: NumPredictors: CoeffDraws: SigmaDraws: AR: Constant: Trend: Beta: Covariance:
"3-Dimensional VAR(4) Model" 3 4 ["Y1" "Y2" "Y3"] 1 0 0 [39x1000 double] [3x3x1000 double] {[3x3 double] [3x3 double] [3x3 double] [3x1 double] [3x0 double] [3x0 double] [3x3 double]
[3x3 double]}
Display the prior covariance mean matrices of the four AR coefficients by setting each matrix in the cell to a variable. AR1 = PriorMdlEmp.AR{1}
12-588
empiricalbvarm
AR1 = 3×3 -0.0198 -0.0207 -0.0009
0.0181 -0.0301 0.0638
-0.0273 -0.0070 0.0113
AR2 = PriorMdlEmp.AR{2} AR2 = 3×3 -0.0453 -0.0103 0.0277
0.0371 -0.0304 -0.0253
0.0110 -0.0011 0.0061
AR3 = PriorMdlEmp.AR{3} AR3 = 3×3 0.0368 -0.0306 -0.0314
-0.0059 -0.0106 -0.0276
0.0018 0.0179 0.0116
AR4 = PriorMdlEmp.AR{4} AR4 = 3×3 0.0159 -0.0178 0.0476
0.0406 0.0415 -0.0128
-0.0315 -0.0024 -0.0165
More About Bayesian Vector Autoregression (VAR) Model A Bayesian VAR model treats all coefficients and the innovations covariance matrix as random variables in the m-dimensional, stationary VARX(p) model. The model has one of the three forms described in this table. Model
Equation
Reduced-form VAR(p) in difference-equation notation
yt = Φ1 yt − 1 + ... + Φp yt − p + c + δt + Βxt + εt .
Multivariate regression
yt = Zt λ + εt .
Matrix regression
yt = Λ′zt′ + εt .
For each time t = 1,...,T: • yt is the m-dimensional observed response vector, where m = numseries. • Φ1,…,Φp are the m-by-m AR coefficient matrices of lags 1 through p, where p = numlags. • c is the m-by-1 vector of model constants if IncludeConstant is true. 12-589
12
Functions
• δ is the m-by-1 vector of linear time trend coefficients if IncludeTrend is true. • Β is the m-by-r matrix of regression coefficients of the r-by-1 vector of observed exogenous predictors xt, where r = NumPredictors. All predictor variables appear in each equation. • zt = yt′ − 1 yt′ − 2 ⋯ yt′ − p 1 t xt′ , which is a 1-by-(mp + r + 2) vector, and Zt is the m-by-m(mp + r + 2) block diagonal matrix zt 0z ⋯ 0z 0z zt ⋯ 0z ⋮ ⋮ ⋱ ⋮ 0z 0z 0z zt
,
where 0z is a 1-by-(mp + r + 2) vector of zeros. •
Λ = Φ1 Φ2 ⋯ Φp c δ Β ′, which is an (mp + r + 2)-by-m random matrix of the coefficients, and the m(mp + r + 2)-by-1 vector λ = vec(Λ).
• εt is an m-by-1 vector of random, serially uncorrelated, multivariate normal innovations with the zero vector for the mean and the m-by-m matrix Σ for the covariance. This assumption implies that the data likelihood is ℓ Λ, Σ y, x =
T
∏
t=1
f yt; Λ, Σ, zt ,
where f is the m-dimensional multivariate normal density with mean ztΛ and covariance Σ, evaluated at yt. Before considering the data, you impose a joint prior distribution assumption on (Λ,Σ), which is governed by the distribution π(Λ,Σ). In a Bayesian analysis, the distribution of the parameters is updated with information about the parameters obtained from the data likelihood. The result is the joint posterior distribution π(Λ,Σ|Y,X,Y0), where: • Y is a T-by-m matrix containing the entire response series {yt}, t = 1,…,T. • X is a T-by-m matrix containing the entire exogenous series {xt}, t = 1,…,T. • Y0 is a p-by-m matrix of presample data used to initialize the VAR model for estimation.
Version History Introduced in R2020a
See Also Functions bayesvarm Objects semiconjugatebvarm
12-590
estimate
estimate Fit univariate ARIMA or ARIMAX model to data
Syntax EstMdl = estimate(Mdl,y) [EstMdl,EstParamCov,logL,info] = estimate( ___ ) EstMdl = estimate(Mdl,Tbl1) [EstMdl,EstParamCov,logL,info] = estimate(Mdl,Tbl1) [ ___ ] = estimate( ___ ,Name,Value)
Description EstMdl = estimate(Mdl,y) returns the fully specified ARIMA model EstMdl. This model stores the estimated parameter values resulting from fitting the partially specified ARIMA model Mdl to the observed univariate time series y by using maximum likelihood. EstMdl and Mdl are the same model type and have the same structure. [EstMdl,EstParamCov,logL,info] = estimate( ___ ) also returns the estimated variancecovariance matrix associated with estimated parameters EstParamCov, the optimized loglikelihood objective function logL, and a data structure of summary information info. EstMdl = estimate(Mdl,Tbl1) fits the partially specified ARIMA model Mdl to response variable in the input table or timetable Tbl1, which contains time series data, and returns the fully specified, estimated ARIMA model EstMdl. estimate selects the response variable named in Mdl.SeriesName or the sole variable in Tbl1. To select a different response variable in Tbl1 to fit the model to, use the ResponseVariable name-value argument. [EstMdl,EstParamCov,logL,info] = estimate(Mdl,Tbl1) also returns the estimated variance-covariance matrix associated with estimated parameters EstParamCov, the optimized loglikelihood objective function logL, and a data structure of summary information info. [ ___ ] = estimate( ___ ,Name,Value) specifies options using one or more name-value arguments in addition to any of the input argument combinations in previous syntaxes. estimate returns the output argument combination for the corresponding input arguments. For example, estimate(Mdl,y,Y0=y0,X=Pred) fits the ARIMA model Mdl to the vector of response data y, specifies the vector of presample response data y0, and includes a linear regression term in the model for the exogenous predictor data Pred. Supply all input data using the same data type. Specifically: • If you specify the numeric vector y, optional data sets must be numeric arrays and you must use the appropriate name-value argument. For example, to specify a presample, set the Y0 name-value argument to a numeric matrix of presample data. • If you specify the table or timetable Tbl1, optional data sets must be tables or timetables, respectively, and you must use the appropriate name-value argument. For example, to specify a presample, set the Presample name-value argument to a table or timetable of presample data. 12-591
12
Functions
Examples Fit ARMA Model to Vector of Simulated Response Data Fit an ARMA(2,1) model to simulated data. Simulate Data from Known Model Suppose that the data generating process (DGP) is yt = 0 . 5yt − 1 − 0 . 3yt − 2 + εt + 0 . 2εt − 1, where εt is a series of iid Gaussian random variables with mean 0 and variance 0.1. Create the ARMA(2,1) model representing the DGP. DGP = arima(AR={0.5,-0.3},MA=0.2,Constant=0, ... Variance=0.1) DGP = arima with properties: Description: SeriesName: Distribution: P: D: Q: Constant: AR: SAR: MA: SMA: Seasonality: Beta: Variance:
"ARIMA(2,0,1) Model (Gaussian Distribution)" "Y" Name = "Gaussian" 2 0 1 0 {0.5 -0.3} at lags [1 2] {} {0.2} at lag [1] {} 0 [1×0] 0.1
DGP is a fully specified arima model object. Simulate a random 500 observation path from the ARMA(2,1) model. rng(5,"twister"); % For reproducibility T = 500; y = simulate(DGP,T);
y is a 500-by-1 column vector representing a simulated response path from the ARMA(2,1) model DGP. Estimate Model Create an ARMA(2,1) model template for estimation. Mdl = arima(2,0,1) Mdl = arima with properties:
12-592
estimate
Description: SeriesName: Distribution: P: D: Q: Constant: AR: SAR: MA: SMA: Seasonality: Beta: Variance:
"ARIMA(2,0,1) Model (Gaussian Distribution)" "Y" Name = "Gaussian" 2 0 1 NaN {NaN NaN} at lags [1 2] {} {NaN} at lag [1] {} 0 [1×0] NaN
Mdl is a partially specified arima model object. Only required, nonestimable parameters that determine the model structure are specified. NaN-valued properties, including ϕ1, ϕ2, θ1, c, and σ2, are unknown model parameters to be estimated. Fit the ARMA(2,1) model to y. EstMdl = estimate(Mdl,y) ARIMA(2,0,1) Model (Gaussian Distribution):
Constant AR{1} AR{2} MA{1} Variance
Value _________
StandardError _____________
0.0089018 0.49563 -0.25495 0.27737 0.10004
0.018417 0.10323 0.070155 0.10732 0.0066577
TStatistic __________ 0.48334 4.8013 -3.6341 2.5846 15.027
PValue __________ 0.62886 1.5767e-06 0.00027897 0.0097491 4.9017e-51
EstMdl = arima with properties: Description: SeriesName: Distribution: P: D: Q: Constant: AR: SAR: MA: SMA: Seasonality: Beta: Variance:
"ARIMA(2,0,1) Model (Gaussian Distribution)" "Y" Name = "Gaussian" 2 0 1 0.00890178 {0.495632 -0.254951} at lags [1 2] {} {0.27737} at lag [1] {} 0 [1×0] 0.100043
MATLAB® displays a table containing an estimation summary, which includes parameter estimates and inferences. For example, the Value column contains corresponding maximum-likelihood estimates, and the PValue column contains p-values for the asymptotic t-test of the null hypothesis that the corresponding parameter is 0. 12-593
12
Functions
EstMdl is a fully specified, estimated arima model object; its estimates resemble the parameter values of the DGP.
Apply Equality Constraints to Parameters During Estimation Fit an AR(2) model to simulated data while holding the model constant fixed during estimation. Simulate Data from Known Model Suppose the DGP is yt = 0 . 5yt − 1 − 0 . 3yt − 2 + εt, where εt is a series of iid Gaussian random variables with mean 0 and variance 0.1. Create the AR(2) model representing the DGP. DGP = arima(AR={0.5,-0.3},Constant=0,Variance=0.1);
Simulate a random 500 observation path from the model. rng(5,"twister"); % For reproducibility T = 500; y = simulate(DGP,T);
Create Model Object Specifying Constraint Assume that the mean of yt is 0, which implies that c is 0. Create an AR(2) model for estimation. Set c to 0. Mdl = arima(ARLags=1:2,Constant=0) Mdl = arima with properties: Description: SeriesName: Distribution: P: D: Q: Constant: AR: SAR: MA: SMA: Seasonality: Beta: Variance:
"ARIMA(2,0,0) Model (Gaussian Distribution)" "Y" Name = "Gaussian" 2 0 0 0 {NaN NaN} at lags [1 2] {} {} {} 0 [1×0] NaN
Mdl is a partially specified arima model object. Specified parameters include all required parameters and the model constant. NaN-valued properties, including ϕ1, ϕ2, and σ2, are unknown model parameters to be estimated. 12-594
estimate
Estimate Model Fit the AR(2) model template containing the constraint to y. EstMdl = estimate(Mdl,y) ARIMA(2,0,0) Model (Gaussian Distribution): Value ________ Constant AR{1} AR{2} Variance
0 0.56342 -0.29355 0.10022
StandardError _____________
TStatistic __________
0 0.044225 0.041786 0.006644
NaN 12.74 -7.0252 15.085
PValue __________ NaN 3.5474e-37 2.137e-12 2.0476e-51
EstMdl = arima with properties: Description: SeriesName: Distribution: P: D: Q: Constant: AR: SAR: MA: SMA: Seasonality: Beta: Variance:
"ARIMA(2,0,0) Model (Gaussian Distribution)" "Y" Name = "Gaussian" 2 0 0 0 {0.563425 -0.293554} at lags [1 2] {} {} {} 0 [1×0] 0.100222
EstMdl is a fully specified, estimated arima model object; its estimates resemble the parameter values of the AR(2) model DGP. The value of c in the estimation summary and object display is 0, and corresponding inferences are trivial or do not apply.
Compute Estimated Standard Errors Load the US equity index data set Data_EquityIdx. load Data_EquityIdx
The table DataTable includes the time series variable NYSE, which contains daily NYSE composite closing prices from January 1990 through December 2001. Convert the table to a timetable. dt = datetime(dates,'ConvertFrom','datenum','Format','yyyy-MM-dd'); TT = table2timetable(DataTable,'RowTimes',dt);
Suppose that an ARIMA(1,1,1) model is appropriate to model NYSE composite series during the sample period Fit an ARIMA(1,1,1) model to the data, and return the estimated parameter covariance matrix. 12-595
12
Functions
Mdl = arima(1,1,1); [EstMdl,EstParamCov] = estimate(Mdl,TT{:,"NYSE"}); ARIMA(1,1,1) Model (Gaussian Distribution):
Constant AR{1} MA{1} Variance
Value ________
StandardError _____________
0.15745 -0.21995 0.2854 17.159
0.09783 0.15642 0.15382 0.20038
TStatistic __________ 1.6094 -1.4062 1.8554 85.632
PValue ________ 0.10753 0.15967 0.063541 0
EstParamCov EstParamCov = 4×4 0.0096 -0.0002 0.0002 0.0023
-0.0002 0.0245 -0.0240 -0.0060
0.0002 -0.0240 0.0237 0.0057
0.0023 -0.0060 0.0057 0.0402
EstMdl is a fully specified, estimated arima model object. Rows and columns of EstParamCov correspond to the rows in the table of estimates and inferences; for example, Cov ϕ1, θ1 = − 0 . 024. Compute estimated parameter standard errors by taking the square root of the diagonal elements of the covariance matrix. estParamSE = sqrt(diag(EstParamCov)) estParamSE = 4×1 0.0978 0.1564 0.1538 0.2004
Compute a Wald-based 95% confidence interval on ϕ. T = size(TT,1); % Effective sample size phihat = EstMdl.AR{1}; sephihat = estParamSE(2); ciphi = phihat + tinv([0.025 0.975],T - 3)*sephihat ciphi = 1×2 -0.5266
0.0867
The interval contains 0, which suggests that ϕ is insignificant.
12-596
estimate
Fit ARIMA Model to Response Variable in Timetable Fit an ARIMA(1,1,1) model to the weekly average NYSE closing prices. Supply a timetable of data and specify the series for the fit. Load Data Load the US equity index data set Data_EquityIdx. load Data_EquityIdx T = height(DataTimeTable) T = 3028
The timetable DataTimeTable includes the time series variable NYSE, which contains daily NYSE composite closing prices from January 1990 through December 2001. Plot the daily NYSE price series. figure plot(DataTimeTable.Time,DataTimeTable.NYSE) title("NYSE Daily Closing Prices: 1990 - 2001")
Prepare Timetable for Estimation When you plan to supply a timetable, you must ensure it has all the following characteristics: 12-597
12
Functions
• The selected response variable is numeric and does not contain any missing values. • The timestamps in the Time variable are regular, and they are ascending or descending. Remove all missing values from the timetable, relative to the NYSE price series. DTT = rmmissing(DataTimeTable,DataVariables="NYSE"); T_DTT = height(DTT) T_DTT = 3028
Because all sample times have observed NYSE prices, rmmissing does not remove any observations. Determine whether the sampling timestamps have a regular frequency and are sorted. areTimestampsRegular = isregular(DTT,"days") areTimestampsRegular = logical 0 areTimestampsSorted = issorted(DTT.Time) areTimestampsSorted = logical 1
areTimestampsRegular = 0 indicates that the timestamps of DTT are irregular. areTimestampsSorted = 1 indicates that the timestamps are sorted. Business day rules make daily macroeconomic measurements irregular. Remedy the time irregularity by computing the weekly average closing price series of all timetable variables. DTTW = convert2weekly(DTT,Aggregation="mean"); areTimestampsRegular = isregular(DTTW,"weeks") areTimestampsRegular = logical 1 T_DTTW = height(DTTW) T_DTTW = 627
DTTW is regular. figure plot(DTTW.Time,DTTW.NYSE) title("NYSE Daily Closing Prices: 1990 - 2001")
12-598
estimate
Create Model Template for Estimation Suppose that an ARIMA(1,1,1) model is appropriate to model NYSE composite series during the sample period. Create an ARIMA(1,1,1) model template for estimation. Mdl = arima(1,1,1) Mdl = arima with properties: Description: SeriesName: Distribution: P: D: Q: Constant: AR: SAR: MA: SMA: Seasonality: Beta: Variance:
"ARIMA(1,1,1) Model (Gaussian Distribution)" "Y" Name = "Gaussian" 2 1 1 NaN {NaN} at lag [1] {} {NaN} at lag [1] {} 0 [1×0] NaN
Mdl is a partially specified arima model object. 12-599
12
Functions
Fit Model to Data Fit an ARIMA(1,1,1) model to weekly average NYSE closing prices. Specify the entire series and the response variable name. EstMdl = estimate(Mdl,DTTW,ResponseVariable="NYSE"); ARIMA(1,1,1) Model (Gaussian Distribution):
Constant AR{1} MA{1} Variance
Value ________
StandardError _____________
0.86385 -0.37582 0.47221 55.89
0.46496 0.22719 0.21741 1.832
TStatistic __________ 1.8579 -1.6542 2.172 30.507
PValue ___________ 0.063181 0.098091 0.029859 2.1201e-204
EstMdl is a fully specified, estimated arima model object. By default, estimate backcasts for the required Mdl.P = 2 presample responses.
Initialize Model Estimation Using Presample Response Data Because an ARIMA model is a function of previous values, estimate requires presample data to initialize the model early in the sampling period. Although, estimate backcasts for presample data by default, you can specify required presample data instead. The P property of an arima model object specifies the required number of presample observations. Fit an ARIMA(1,1,1) model to the weekly average NYSE closing prices. Supply timetables of presample and estimation data sets. Load Data Load the US equity index data set Data_EquityIdx. load Data_EquityIdx
Prepare Timetable for Estimation The daily price series are irregular because observations occur only on business days. Remedy the time irregularity by computing the weekly average closing price series of all timetable variables. DTTW = convert2weekly(DataTimeTable,Aggregation="mean");
Create Model Template for Estimation Suppose that an ARIMA(1,1,1) model is appropriate to model NYSE composite series during the sample period. Create an ARIMA(1,1,1) model template for estimation. Mdl = arima(1,1,1) Mdl = arima with properties:
12-600
estimate
Description: SeriesName: Distribution: P: D: Q: Constant: AR: SAR: MA: SMA: Seasonality: Beta: Variance:
"ARIMA(1,1,1) Model (Gaussian Distribution)" "Y" Name = "Gaussian" 2 1 1 NaN {NaN} at lag [1] {} {NaN} at lag [1] {} 0 [1×0] NaN
Mdl.P is 2. Therefore, estimate requires 2 presample observations to initialize the model for estimation. Partition Sample Partition the entire sample DTTW into presample and estimation sample timetables. The presample occurs first and contains 2 observations and the estimation sample contains the remaining observations in DTTW. PS = DTTW(1:Mdl.P,:); ES = DTTW((Mdl.P+1):end,:);
Estimate Model Fit an ARIMA(1,1,1) model to the estimation sample. Specify the presample sample and response variable names. EstMdl = estimate(Mdl,ES,ResponseVariable="NYSE", ... Presample=PS,PresampleResponseVariable="NYSE"); ARIMA(1,1,1) Model (Gaussian Distribution):
Constant AR{1} MA{1} Variance
Value ________
StandardError _____________
0.83623 -0.32862 0.42703 56.065
0.453 0.23526 0.22613 1.8433
TStatistic __________ 1.846 -1.3968 1.8884 30.416
PValue ___________ 0.064891 0.16247 0.058966 3.3795e-203
Specify Initial Parameter Values for Optimization Fit an ARIMA(1,1,1) model to the weekly average NYSE closing prices. Specify initial parameter values obtained from an analysis of a pilot sample. Load Data Load the US equity index data set Data_EquityIdx. load Data_EquityIdx
12-601
12
Functions
Prepare Timetable for Estimation The daily price series are irregular because observations occur only on business days. Remedy the time irregularity by computing the weekly average closing price series of all timetable variables. DTTW = convert2weekly(DataTimeTable,Aggregation="mean");
Create Model Template for Estimation Suppose that an ARIMA(1,1,1) model is appropriate to model NYSE composite series during the sample period. Create an ARIMA(1,1,1) model template for estimation. Specify the response series name as NYSE. Mdl = arima(ARLags=1,D=1,MALags=1,SeriesName="NYSE");
Fit Model to Pilot Sample Treat the first two years as a pilot sample for obtaining initial parameter values when fitting the model to the remaining three years of data. Fit the model to the pilot sample. By default, estimate uses the response data in the table variable that matches Mdl.SeriesName. endPilot = datetime(1991,12,31); DTTW0 = DTTW(DTTW.Time endPilot,:); c0 = EstMdl0.Constant; ar0 = EstMdl0.AR; ma0 = EstMdl0.MA; var0 = EstMdl0.Variance; EstMdl = estimate(Mdl,DTTWEst,Constant0=c0,AR0=ar0, ... MA0=ma0,Variance0=var0); ARIMA(1,1,1) Model (Gaussian Distribution):
Constant AR{1} MA{1} Variance
12-602
Value ________
StandardError _____________
TStatistic __________
PValue ___________
0.93922 -0.38996 0.48477 64.661
0.55503 0.26259 0.25108 2.4853
1.6922 -1.485 1.9307 26.018
0.09061 0.13753 0.053514 3.1308e-149
estimate
Estimate ARIMA Model Containing Exogenous Predictors (ARIMAX) Fit an ARIMAX model to simulated time series data. Simulate Predictor and Response Data Create the ARIMAX(2,1,0) model for the DGP, represented by yt in the equation 1
(1 − 0 . 5L + 0 . 3L2)(1 − L) yt = 2 + 1 . 5x1, t + 2 . 6x2, t − 0 . 3x3, t + εt, where εt is a series of iid Gaussian random variables with mean 0 and variance 0.1. DGP = arima(AR={0.5,-0.3},D=1,Constant=2, ... Variance=0.1,Beta=[1.5 2.6 -0.3]);
Assume that the exogenous variables x1, t, x2, t, and x3, t are represented by the AR(1) processes x1, t = 0 . 1x1, t − 1 + η1, t x2, t = 0 . 2x2, t − 1 + η2, t x3, t = 0 . 3x3, t − 1 + η3, t, where ηi, t follows a Gaussian distribution with mean 0 and variance 0.01 for i ∈ 1, 2, 3 . Create ARIMA models that represent the exogenous variables. MdlX1 = arima(AR=0.1,Constant=0,Variance=0.01); MdlX2 = arima(AR=0.2,Constant=0,Variance=0.01); MdlX3 = arima(AR=0.3,Constant=0,Variance=0.01);
Simulate length 1000 exogenous series from the AR models. Store the simulated data in a matrix. T = 1000; rng(10,"twister"); % For reproducibility x1 = simulate(MdlX1,T); x2 = simulate(MdlX2,T); x3 = simulate(MdlX3,T); X = [x1 x2 x3];
X is a 1000-by-3 matrix of simulated time series data. Each row corresponds to an observation in the time series, and each column corresponds to an exogenous variable. Simulate a length 1000 series from the DGP. Specify the simulated exogenous data. y = simulate(DGP,T,X=X);
y is a 1000-by-1 vector of response data. Estimate Model Create an ARIMA(2,1,0) model template for estimation. Mdl = arima(2,1,0) Mdl = arima with properties: Description: "ARIMA(2,1,0) Model (Gaussian Distribution)"
12-603
12
Functions
SeriesName: Distribution: P: D: Q: Constant: AR: SAR: MA: SMA: Seasonality: Beta: Variance:
"Y" Name = "Gaussian" 3 1 0 NaN {NaN NaN} at lags [1 2] {} {} {} 0 [1×0] NaN
The model description (Description property) and value of Beta suggest that the partially specified arima model object Mdl is agnostic of the exogenous predictors. Estimate the ARIMAX(2,1,0) model; specify the exogenous predictor data. Because estimate backcasts for presample responses (a process that requires presample predictor data for ARIMAX models), fit the model to the latest T – Mdl.P responses. (Alternatively, you can specify presample responses by using the Y0 name-value argument.) EstMdl = estimate(Mdl,y((Mdl.P + 1):T),X=X); ARIMAX(2,1,0) Model (Gaussian Distribution):
Constant AR{1} AR{2} Beta(1) Beta(2) Beta(3) Variance
Value ________
StandardError _____________
1.7519 0.56076 -0.26625 1.4764 2.5638 -0.34422 0.10673
0.021143 0.016511 0.015966 0.10157 0.10445 0.098623 0.0047273
TStatistic __________ 82.859 33.963 -16.676 14.536 24.547 -3.4903 22.577
PValue ___________ 0 7.9497e-253 1.9636e-62 7.1228e-48 4.6633e-133 0.00048249 7.3161e-113
EstMdl is a fully specified, estimated arima model object. When you estimate the model by using estimate and supply the exogenous data by specifying the X name-value argument, MATLAB® recognizes the model as an ARIMAX(2,1,0) model and includes a linear regression component for the exogenous variables. The estimated model is 1
1 − 0 . 56L + 0 . 27L2 1 − L yt = 1 . 75 + 1 . 48x1, t + 2 . 56x2, t − 0 . 34x3, t + εt, which resembles the DGP represented by Mdl0. Because MATLAB returns the AR coefficients of the model expressed in difference-equation notation, their signs are opposite in the equation.
Compute Fitted Response Values Fit an ARIMA(1,1,1) model to the weekly average NYSE closing prices. Compute estimated weekly averages closing price within the time range of the data. 12-604
estimate
Load the US equity index data set Data_EquityIdx. load Data_EquityIdx
The daily price series are irregular because observations occur only on business days. Remedy the time irregularity by computing the weekly average closing price series of all timetable variables. DTTW = convert2weekly(DataTimeTable,Aggregation="mean"); numobs = height(DTTW) numobs = 627
Suppose that an ARIMA(1,1,1) model is appropriate to model NYSE composite series during the sample period. Create an ARIMA(1,1,1) model template for estimation. Specify the response series name as NYSE. Mdl = arima(1,1,1); Mdl.SeriesName = "NYSE";
Fit an ARIMA(1,1,1) model to the entire sample. Suppress the estimation display. EstMdl = estimate(Mdl,DTTW,Display="off");
Infer residuals et from the estimated model, specify the required presample. ResidTT = infer(EstMdl,DTTW); tail(ResidTT) Time ___________
NYSE ______
NASDAQ ______
NYSE_Residual _____________
NYSE_Variance _____________
16-Nov-2001 23-Nov-2001 30-Nov-2001 07-Dec-2001 14-Dec-2001 21-Dec-2001 28-Dec-2001 04-Jan-2002
577.11 583 581.41 584.96 574.03 582.1 590.28 589.8
1886.9 1898.3 1925.8 1998.1 1981 1967.9 1967.2 1950.4
5.8562 5.4409 -2.8105 3.4212 -12.071 8.7933 6.2015 -1.2004
55.89 55.89 55.89 55.89 55.89 55.89 55.89 55.89
ResidTT is a 627-by-4 timetable containing the data passed to esimtate DTTW, and the residuals NYSE_Residual and estimated conditional variances NYSE_Variance from the fit. Because the model variance is a constant, the conditional variance variable contains a vector completely composed of 55.89, which is the model variance estimate. Compute the fitted values yt, and store them in ResidTT. ResidTT.NYSE_YHat = ResidTT.NYSE - ResidTT.NYSE_Residual; tail(ResidTT) Time ___________
NYSE ______
NASDAQ ______
NYSE_Residual _____________
NYSE_Variance _____________
16-Nov-2001 23-Nov-2001 30-Nov-2001 07-Dec-2001
577.11 583 581.41 584.96
1886.9 1898.3 1925.8 1998.1
5.8562 5.4409 -2.8105 3.4212
55.89 55.89 55.89 55.89
NYSE_YHat _________ 571.25 577.56 584.22 581.54
12-605
12
Functions
14-Dec-2001 21-Dec-2001 28-Dec-2001 04-Jan-2002
574.03 582.1 590.28 589.8
1981 1967.9 1967.2 1950.4
-12.071 8.7933 6.2015 -1.2004
55.89 55.89 55.89 55.89
586.1 573.3 584.08 591
Plot the last 200 observations with corresponding fitted values on the same graph. figure h = plot(ResidTT.Time((end-199):end),ResidTT{(end-199):end,["NYSE" "NYSE_YHat"]}); h(2).LineStyle = "--"; legend(["Observations" "Fitted values"]) title("Model of NYSE Weekly Average Closing Prices")
The fitted values closely track the observations. Plot the residuals versus the fitted values. figure plot(ResidTT.NYSE_YHat,ResidTT.NYSE_Residual,".",MarkerSize=15) ylabel("Residuals") xlabel("Fitted Values") title("Residual Plot")
12-606
estimate
Residual variance appears larger for larger fitted values. One remedy for this behavior is to apply the log transform to the data.
Input Arguments Mdl — Partially specified ARIMA model arima model object Partially specified ARIMA model used to indicate constrained and estimable model parameters, specified as an arima model object returned by arima. Properties of Mdl describe the model structure and can specify parameter values. estimate fits unspecified (NaN-valued) parameters to the data y. estimate treats specified parameters as equality constraints during estimation. y — Single path of observed response data yt numeric column vector Single path of observed response data yt, to which the model Mdl is fit, specified as a numobs-by-1 numeric column vector. The last observation of y is the latest observation. y is the continuation of the presample series Y0. Data Types: double 12-607
12
Functions
Tbl1 — Time series data table | timetable Time series data, to which estimate fits the model, specified as a table or timetable with numvars variables and numobs rows. The selected response variable is a numeric vector representing a single path of numobs observations. You can optionally select a response variable yt from Tbl1 by using the ResponseVariables name-value argument, and you can select numpreds predictor variables xt for the exogenous regression component by using the PredictorVariables name-value argument. Each row is an observation, and measurements in each row occur simultaneously. Variables in Tbl1 represent the continuation of corresponding variables in Presample. If Tbl1 is a timetable, it must represent a sample with a regular datetime time step (see isregular), and the datetime vector Tbl1.Time must be strictly ascending or descending. If Tbl1 is a table, the last row contains the latest observation. Name-Value Arguments Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter. Before R2021a, use commas to separate each name and value, and enclose Name in quotes. Example: esimtate(Mdl,y,Y0=y0,X=Pred) uses the vector y0 as presample responses for estimation and includes a linear regression component for the exogenous predictor data in the vector Pred. Estimation Options
ResponseVariable — Response variable yt to select from Tbl1 string scalar | character vector | integer | logical vector Response variable yt to select from Tbl1 containing the response data, specified as one of the following data types: • String scalar or character vector containing a variable name in Tbl1.Properties.VariableNames • Variable index (integer) to select from Tbl1.Properties.VariableNames • A length numvars logical vector, where ResponseVariable(j) = true selects variable j from Tbl1.Properties.VariableNames, and sum(ResponseVariable) is 1 The selected variable must be a numeric vector and cannot contain missing values (NaN). If Tbl1 has one variable, the default specifies that variable. Otherwise, the default matches the variable to name in Mdl.SeriesName. Example: ResponseVariable="StockRate2" Example: ResponseVariable=[false false true false] or ResponseVariable=3 selects the third table variable as the response variable. Data Types: double | logical | char | cell | string 12-608
estimate
X — Exogenous predictor data numeric matrix Exogenous predictor data for the linear regression component, specified as a numeric matrix containing numpreds columns. Use X only when you supply a vector of response data y. numpreds is the number of predictor variables. Rows correspond to observations, and the last row contains the latest observation. estimate does not use the regression component in the presample period. X must have at least as many observations as are used after the presample period: • If you specify Y0, X must have at least numobs rows. • Otherwise, X must have at least numobs + Mdl.P observations to account for the presample removal. In either case, if you supply more rows than necessary, estimate uses the latest observations only. estimate synchronizes X and y so that the latest observations (last rows) occur simultaneously. Columns correspond to individual predictor variables. By default, estimate excludes the regression component, regardless of its presence in Mdl. Data Types: double PredictorVariables — Exogenous predictor variables xt to select from Tbl1 string vector | cell vector of character vectors | vector of integers | logical vector Exogenous predictor variables xt to select from Tbl1 containing predictor data for the regression component, specified as one of the following data types: • String vector or cell vector of character vectors containing numpreds variable names in Tbl1.Properties.VariableNames • A length numpreds vector of unique indices (positive integers) of variables to select from Tbl1.Properties.VariableNames • A length numvars logical vector, where PredictorVariables(j) = true selects variable j from Tbl1.Properties.VariableNames, and sum(PredictorVariables) is numpreds The selected variables must be numeric vectors and cannot contain missing values (NaN). If you specify PredictorVariables, you must also specify presample response data to by using the Presample and PresampleResponseVariable name-value arguments. For more details, see “Algorithms” on page 12-617. By default, estimate excludes the regression component, regardless of its presence in Mdl. Example: PredictorVariables=["M1SL" "TB3MS" "UNRATE"] Example: PredictorVariables=[true false true false] or PredictorVariable=[1 3] selects the first and third table variables to supply the predictor data. Data Types: double | logical | char | cell | string Options — Optimization options optimoptions optimization controller 12-609
12
Functions
Optimization options, specified as an optimoptions optimization controller. For details on modifying the default values of the optimizer, see optimoptions or fmincon in Optimization Toolbox. For example, to change the constraint tolerance to 1e-6, set options = optimoptions(@fmincon,ConstraintTolerance=1e-6,Algorithm="sqp"). Then, pass Options into estimate using Options=options. By default, estimate uses the same default options as fmincon, except Algorithm is "sqp" and ConstraintTolerance is 1e-7. Display — Command Window display option "params" (default) | "diagnostics" | "full'" | "iter" | "off" | string vector | cell vector of character vectors Command Window display option, specified as one or more of the values in this table. Value
Information Displayed
"diagnostics"
Optimization diagnostics
"full"
Maximum likelihood parameter estimates, standard errors, t statistics, iterative optimization information, and optimization diagnostics
"iter"
Iterative optimization information
"off"
None
"params"
Maximum likelihood parameter estimates, standard errors, and t statistics and p-values of coefficient significance tests
Example: Display="off" is well suited for running a simulation that estimates many models. Example: Display=["params" "diagnostics"] displays all estimation results and the optimization diagnostics. Data Types: char | cell | string Presample Specifications
Y0 — Presample response data yt numeric column vector Presample response data yt to initialize the model, specified as a numpreobs-by-1 numeric column vector. Use Y0 only when you supply the vector of response data y. numpreobs is the number of presample observations. Each row is a presample observation. The last row contains the latest presample observation. numpreobs must be at least Mdl.P. If numpreobs > Mdl.P, estimate uses the latest required number of observations only. The last element or row contains the latest observation. By default, estimate backward forecasts (backcasts) for the necessary amount of presample responses. For details on partitioning data for estimation, see “Time Base Partitions for ARIMA Model Estimation” on page 7-97. Data Types: double 12-610
estimate
E0 — Presample residual data et numeric column vector Presample residual data et to initialize the model, specified as a numpreobs-by-1 numeric column vector. Use E0 only when you supply the vector of response data y. numpreobs is the number of presample observations. Each row is a presample observation. The last row contains the latest presample observation. numpreobs must be at least Mdl.Q. If numpreobs > Mdl.Q, estimate uses the latest required number of observations only. The last element or row contains the latest observation. If Mdl.Variance is a conditional variance model object, such as a garch model, estimate can require more than Mdl.Q presample innovations. By default, estimate sets all required presample residuals to 0, which is the expected value of the corresponding innovations series. Data Types: double V0 — Presample conditional variances σt2 numeric positive column vector Presample conditional variances σ2t to initialize any conditional variance model, numpreobs-by-1 positive column vector. If Mdl.Variance is a conditional variance model, V0 provides initial values for that model. Use V0 only when you supply the vector of response data y. Each row is a presample observation. numpreobs must be at least number of observations required to initialize the conditional variance model type in Mdl.Variance (see estimate). If V0 has extra rows, estimate uses only the latest observations. The last row contains the latest presample observation. If the variance is constant, estimate ignores V0. By default, estimate sets the necessary presample conditional variances to the average squared value of the inferred residuals. Data Types: double Presample — Presample data table | timetable Presample data containing the response yt, residual et, or conditional variance σt2 series to initialize the model for estimation, specified as a table or timetable, the same type as Tbl1, with numprevars variables and numpreobs rows. Use Presample only when you supply a table or timetable of data Tbl1. Each selected variable is a single path of numpreobs observations representing the presample of responses, residuals, or conditional variances for the selected response variable in Tbl1. Each row is a presample observation, and measurements in each row occur simultaneously. numpreobs must satisfy one of the following conditions: • numpreobs ≥ Mdl.P when Presample provides only presample responses • numpreobs ≥ Mdl.Q when Presample provides only presample residuals 12-611
12
Functions
• numpreobs ≥ max([Mdl.P Mdl.Q]) when Presample provides presample responses and residuals. • Mdl can require more presample observations then specified in the other conditions when Presample provides presample conditional variances. For more details, see estimate. If you supply more rows than necessary, estimate uses the latest required number of observations only. When If Presample is a timetable, all the following conditions must be true: • Presample must represent a sample with a regular datetime time step (see isregular). • The inputs Tbl1 and Presample must be consistent in time such that Presample immediately precedes Tbl1 with respect to the sampling frequency and order. • The datetime vector of sample timestamps Presample.Time must be ascending or descending. If Presample is a table, the last row contains the latest presample observation. By default: • When Mdl is an ARIMA model without an exogenous linear regression component, estimate backcasts for necessary presample responses, sets necessary presample residuals to 0, and sets necessary presample variances to the average squared value of inferred residuals. • When Mdl is an ARIMAX model (you specify the PredictorVariables name-value argument), you must specify presample response data because estimate cannot backcast for presample responses. estimate sets necessary presample residuals to 0 and necessary presample variances to the average squared value of inferred residuals. If you specify the Presample, you must specify the presample response, innovation, and conditional variance variable names by using the PresampleResponseVariable, PresampleInnovationVariable, or PresampleVarianceVariable name-value argument, respectively. PresampleResponseVariable — Response variable yt to select from Presample string scalar | character vector | integer | logical vector Response variable yt to select from Presample containing presample response data, specified as one of the following data types: • String scalar or character vector containing the variable name to select from Presample.Properties.VariableNames • Variable index (positive integer) to select from Presample.Properties.VariableNames • A logical vector, where PresampleResponseVariable(j) = true selects variable j from Presample.Properties.VariableNames The selected variable must be a numeric vector and cannot contain missing values (NaNs). If you specify presample response data by using the Presample name-value argument, you must specify PresampleResponseVariable. Example: PresampleResponseVariable="GDP" 12-612
estimate
Example: PresampleResponseVariable=[false false true false] or PresampleResponseVariable=3 selects the third table variable for presample response data. Data Types: double | logical | char | cell | string PresampleInnovationVariable — Residual variable et to select from Presample string scalar | character vector | integer | logical vector Residual variable et to select from Presample containing presample residual data, specified as one of the following data types: • String scalar or character vector containing the variable name to select from Presample.Properties.VariableNames • Variable index (positive integer) to select from Presample.Properties.VariableNames • A logical vector, where PresampleInnovationVariable(j) = true selects variable j from Presample.Properties.VariableNames The selected variable must be a numeric vector and cannot contain missing values (NaNs). If you specify presample residual data by using the Presample name-value argument, you must specify PresampleInnovationVariable. Example: PresampleInnovationVariable="GDPInnov" Example: PresampleInnovationVariable=[false false true false] or PresampleInnovationVariable=3 selects the third table variable for presample residual data. Data Types: double | logical | char | cell | string PresampleVarianceVariable — Conditional variance variable σt2 to select from of Presample string scalar | character vector | integer | logical vector Conditional variance variable σt2 to select from of Presample containing presample conditional variance data, specified as one of the following data types: • String scalar or character vector containing a variable name in Presample.Properties.VariableNames • Variable index (positive integer) to select from Presample.Properties.VariableNames • A logical vector, where PresampleVarianceVariable(j) = true selects variable j from Presample.Properties.VariableNames The selected variable must be a numeric vector and cannot contain missing values (NaNs). If you specify presample conditional variance data by using the Presample name-value argument, you must specify PresampleVarianceVariable. Example: PresampleVarianceVariable="StockRateVar0" Example: PresampleVarianceVariable=[false false true false] or PresampleVarianceVariable=3 selects the third table variable as the presample conditional variance variable. Data Types: double | logical | char | cell | string
12-613
12
Functions
Initial Parameter Value Specifications
Constant0 — Initial estimate of model constant numeric scalar Initial estimate of the model constant c, specified as a numeric scalar. By default, estimate derives initial estimates using standard time series techniques. Data Types: double AR0 — Initial estimates of nonseasonal AR polynomial coefficients numeric vector Initial estimates of the nonseasonal AR polynomial coefficients ϕ(L), specified as a numeric vector. Elements of AR0 correspond to nonzero cells of Mdl.AR. By default, estimate derives initial estimates using standard time series techniques. Data Types: double SAR0 — Initial estimates of seasonal autoregressive polynomial coefficients numeric vector Initial estimates of the seasonal autoregressive polynomial coefficients Φ(L), specified as a numeric vector. Elements of SAR0 correspond to nonzero cells of Mdl.SAR. By default, estimate derives initial estimates using standard time series techniques. Data Types: double MA0 — Initial estimates of nonseasonal moving average polynomial coefficients numeric vector Initial estimates of the nonseasonal moving average polynomial coefficients θ(L), specified as a numeric vector. Elements of MA0 correspond to elements of Mdl.MA. By default, estimate derives initial estimates using standard time series techniques. Data Types: double SMA0 — Initial estimates of seasonal moving average polynomial coefficients numeric vector Initial estimates of the seasonal moving average polynomial coefficients Θ(L), specified as a numeric vector. Elements of SMA0 correspond to nonzero cells of Mdl.SMA. By default, estimate derives initial estimates using standard time series techniques. Data Types: double 12-614
estimate
Beta0 — Initial estimates of regression coefficients numeric vector Initial estimates of the regression coefficients β, specified as a numeric vector. The length of Beta0 must equal the numpreds. Elements of Beta0 correspond to the predictor variables represented by the columns of X or PredictorVariables. By default, estimate derives initial estimates using standard time series techniques. Data Types: double DoF0 — Initial estimate of t-distribution degrees-of-freedom parameter 10 (default) | positive scalar Initial estimate of the t-distribution degrees-of-freedom parameter ν, specified as a positive scalar. DoF0 must exceed 2. Data Types: double Variance0 — Initial estimates of variances of innovations positive scalar | cell vector of name-value arguments Initial estimates of variances of innovations, specified as a positive scalar or a cell vector of namevalue arguments. Mdl.Variance Value
Description
'Variance0' Value
Numeric scalar or NaN
Constant variance
Positive scalar
garch, egarch, or gjr model object
Conditional variance model
Cell vector of name-value arguments for specifying initial estimates, see the estimate function of the conditional variance model objects. The cell vector must have the form {'Name1',value1,'Name2',value2,...}.
By default, estimate derives initial estimates using standard time series techniques. Example: For a model with a constant variance, set Variance0=2 to specify an initial variance estimate of 2. Example: For a composite conditional mean and variance model, set Variance0={'Constant0',2,'ARCH0',0.1} to specify an initial estimate of 2 for the conditional variance model constant, and an initial estimate of 0.1 for the lag 1 coefficient in the ARCH polynomial. Data Types: double | cell Note • NaN values in y, X, Y0, E0, and V0 indicate missing values. estimate removes missing values from specified data by listwise deletion. • For the presample, estimate horizontally concatenates Y0, E0, and V0, and then it removes any row of the concatenated matrix containing at least one NaN. 12-615
12
Functions
• For the estimation sample, estimate horizontally concatenates y and X, and then it removes any row of the concatenated matrix containing at least one NaN. • Regardless of sample, estimate synchronizes the specified, possibly jagged vectors with respect to the latest observation of the sample (last row). This type of data reduction reduces the effective sample size and can create an irregular time series. • estimate issues an error when any table or timetable input contains missing values.
Output Arguments EstMdl — Estimated ARIMA model arima model object Estimated ARIMA model, returned as an arima model object. EstMdl is a copy of Mdl that has NaN values replaced with parameter estimates. EstMdl is fully specified. EstParamCov — Estimated covariance matrix of maximum likelihood estimates positive semidefinite numeric matrix Estimated covariance matrix of maximum likelihood estimates known to the optimizer, returned as a positive semidefinite numeric matrix. The rows and columns contain the covariances of the parameter estimates. The standard error of each parameter estimate is the square root of the main diagonal entries. The rows and columns corresponding to any parameters held fixed as equality constraints are zero vectors. Parameters corresponding to the rows and columns of EstParamCov appear in the following order: • Constant • Nonzero AR coefficients at positive lags, from the smallest to largest lag • Nonzero SAR coefficients at positive lags, from the smallest to largest lag • Nonzero MA coefficients at positive lags, from the smallest to largest lag • Nonzero SMA coefficients at positive lags, from the smallest to largest lag • Regression coefficients (when you specify exogenous data), ordered by the columns of X or entries of PredictorVariables • Variance parameters, a scalar for constant variance models and vector for conditional variance models (see estimate for the order of parameters) • Degrees of freedom (t-innovation distribution only) Data Types: double logL — Optimized loglikelihood objective function value numeric scalar Optimized loglikelihood objective function value, returned as a numeric scalar. 12-616
estimate
Data Types: double info — Optimization summary structure array Optimization summary, returned as a structure array with the fields described in this table. Field
Description
exitflag
Optimization exit flag (see fmincon in Optimization Toolbox)
options
Optimization options controller (see optimoptions and fmincon in Optimization Toolbox)
X
Vector of final parameter estimates
X0
Vector of initial parameter estimates
For example, you can display the vector of final estimates by entering info.X in the Command Window. Data Types: struct
Tip • To access values of the estimation results, including the number of free parameters in the model, pass EstMdl to summarize.
Algorithms • estimate infers innovations and conditional variances (when present) of the underlying response series, and then uses constrained maximum likelihood to fit the model Mdl to the response data y. • Because you can specify numeric presample data inputs Y0, E0, and V0 of differing lengths, estimate assumes that all specified sets have these characteristics: • The final observation (row) in each set occurs simultaneously. • The first observation in the estimation sample immediately follows the last observation in the presample, with respect to the sampling frequency. • If you specify the Display name-value argument, the value overrides the Diagnostics and Display settings of the Options name-value argument. Otherwise, estimate displays optimization information using Options settings. • estimate uses the outer product of gradients (OPG) method to perform covariance matrix estimation on page 3-60. • If you supply data in the table or timetable Tbl1 to estimate an ARIMAX model, estimate cannot backcast for presample responses. Therefore, if you specify PredictorVariables, you must also specify presample response data by using the Presample and PresampleResponseVariable name-value arguments.
Version History Introduced in R2012a R2023b: estimate accepts input data in tables and timetables
12-617
12
Functions
In addition to accepting input data (in-sample and presample data) in numeric arrays, estimate accepts input data in tables or regular timetables. When you supply data in a table or timetable, estimate chooses the default series on which to operate, but you can use the specified optional name-value argument to select a different series. Name-value arguments to support tabular workflows include: • ResponseVariable specifies the variable name of the response series in the input data Tbl1, to which the model is fit. • Presample specifies the input table or timetable of presample response, residual, and conditional variance data. • PresampleResponseVariable specifies the variable name of the response series to select from Presample. • PresampleInnovationVariable specifies the variable name of the residual series to select from Presample. • PresampleVarianceVariable specifies the variable name of the conditional variance series to select from Presample. • PredictorVariables specifies the names of the predictor series to select from the input data for the exogenous regression component. R2019b: estimate includes the final lag in all estimated univariate time series model polynomials Behavior changed in R2019b estimate includes the final polynomial lag as specified in the input model template for estimation. In other words, the specified polynomial degrees of an input model template returned by an object creation function and the corresponding polynomial degrees of the estimated model returned by estimate are equal. Before R2019b, estimate removed trailing lags estimated below the tolerance of 1e-12. Update Code
Polynomial degrees require minimum presample observations for operations downstream of estimation, such as model forecasting and simulation. If a model template in your code does not describe the data generating process well, then the polynomials in the estimated model can have higher degrees than in previous releases. Consequently, you must supply additional presample responses for operations on the estimated model; otherwise, the function issues an error. For more details, see the Y0 name-value argument. R2018a: The Print name-value argument is removed Errors starting in R2018a Replace all instances of 'Print',true with 'Display','on', and 'Print',false with 'Display','off'.
References [1] Box, George E. P., Gwilym M. Jenkins, and Gregory C. Reinsel. Time Series Analysis: Forecasting and Control. 3rd ed. Englewood Cliffs, NJ: Prentice Hall, 1994. [2] Enders, Walter. Applied Econometric Time Series. Hoboken, NJ: John Wiley & Sons, Inc., 1995. 12-618
estimate
[3] Greene, William. H. Econometric Analysis. 6th ed. Upper Saddle River, NJ: Prentice Hall, 2008. [4] Hamilton, James D. Time Series Analysis. Princeton, NJ: Princeton University Press, 1994.
See Also Objects arima Functions infer | summarize | estimate Topics “Time Base Partitions for ARIMA Model Estimation” on page 7-97 “Estimate Multiplicative ARIMA Model” on page 7-117 “Estimate Conditional Mean and Variance Model” on page 7-130 “Model Seasonal Lag Effects Using Indicator Variables” on page 7-120 “Maximum Likelihood Estimation for Conditional Mean Models” on page 7-106 “Conditional Mean Model Estimation with Equality Constraints” on page 7-108 “Presample Data for Conditional Mean Model Estimation” on page 7-109 “Initial Values for Conditional Mean Model Estimation” on page 7-111 “Optimization Settings for Conditional Mean Model Estimation” on page 7-113
12-619
12
Functions
estimate Estimate posterior distribution of Bayesian linear regression model parameters
Syntax PosteriorMdl = estimate(PriorMdl,X,y) PosteriorMdl = estimate(PriorMdl,X,y,Name,Value) [PosteriorMdl,Summary] = estimate( ___ )
Description To perform predictor variable selection for a Bayesian linear regression model, see estimate. PosteriorMdl = estimate(PriorMdl,X,y) returns the Bayesian linear regression on page 12636 model PosteriorMdl that characterizes the joint posterior distributions of the coefficients β and the disturbance variance σ2. PriorMdl specifies the joint prior distribution of the parameters and the structure of the linear regression model. X is the predictor data and y is the response data. PriorMdl and PosteriorMdl might not be the same object type. To produce PosteriorMdl, the estimate function updates the prior distribution with information about the parameters that it obtains from the data. NaNs in the data indicate missing values, which estimate removes by using list-wise deletion. PosteriorMdl = estimate(PriorMdl,X,y,Name,Value) specifies additional options using one or more name-value pair arguments. For example, you can specify a value for either β or σ2 to estimate the conditional posterior distribution of one parameter given the specified value of the other parameter. If you specify the Beta or Sigma2 name-value pair argument, then PosteriorMdl and PriorMdl are equal. [PosteriorMdl,Summary] = estimate( ___ ) uses any of the input argument combinations in the previous syntaxes to return a table that contains the following for each parameter: the posterior mean and standard deviation, 95% credible interval, posterior probability that the parameter is greater than 0, and description of the posterior distribution (if one exists). Also, the table contains the posterior covariance matrix of β and σ2. If you specify the Beta or Sigma2 name-value pair argument, then estimate returns conditional posterior estimates.
Examples Compare Default Prior and Marginal Posterior Estimation to OLS Estimates Consider a model that predicts the fuel economy (in MPG) of a car given its engine displacement and weight. Load the carsmall data set. 12-620
estimate
load carsmall x = [Displacement Weight]; y = MPG;
Regress fuel economy onto engine displacement and weight, including an intercept to obtain ordinary least-squares (OLS) estimates. Mdl = fitlm(x,y) Mdl = Linear regression model: y ~ 1 + x1 + x2 Estimated Coefficients: Estimate __________ (Intercept) x1 x2
46.925 -0.014593 -0.0068422
SE _________
tStat _______
pValue __________
2.0858 0.0082695 0.0011337
22.497 -1.7647 -6.0353
6.0509e-39 0.080968 3.3838e-08
Number of observations: 94, Error degrees of freedom: 91 Root Mean Squared Error: 4.09 R-squared: 0.747, Adjusted R-Squared: 0.741 F-statistic vs. constant model: 134, p-value = 7.22e-28 Mdl.MSE ans = 16.7100
Create a default, diffuse prior distribution for one predictor. p = 2; PriorMdl = bayeslm(p);
PriorMdl is a diffuseblm model object. Use default options to estimate the posterior distribution. PosteriorMdl = estimate(PriorMdl,x,y); Method: Analytic posterior distributions Number of observations: 94 Number of predictors: 3 | Mean Std CI95 Positive Distribution -------------------------------------------------------------------------------Intercept | 46.9247 2.1091 [42.782, 51.068] 1.000 t (46.92, 2.09^2, 91) Beta(1) | -0.0146 0.0084 [-0.031, 0.002] 0.040 t (-0.01, 0.01^2, 91) Beta(2) | -0.0068 0.0011 [-0.009, -0.005] 0.000 t (-0.01, 0.00^2, 91) Sigma2 | 17.0855 2.5905 [12.748, 22.866] 1.000 IG(45.50, 0.0013)
PosteriorMdl is a conjugateblm model object. The posterior means and the OLS coefficient estimates are almost identical. Also, the posterior standard deviations and OLS standard errors are almost identical. The posterior mean of Sigma2 is close to the OLS mean squared error (MSE). 12-621
12
Functions
Estimate Posterior Using Hamiltonian Monte Carlo Sampler Consider the multiple linear regression model that predicts the US real gross national product (GNPR) using a linear combination of total employment (E) and real wages (WR).
For all , is a series of independent Gaussian disturbances with a mean of 0 and variance Assume these prior distributions: • •
.
is a 3-D t distribution with 10 degrees of freedom for each component, correlation matrix C, location ct, and scale st. , with shape
and scale
.
bayeslm treats these assumptions and the data likelihood as if the corresponding posterior is analytically intractable. Declare a MATLAB® function that: • Accepts values of hyperparameters •
and
together in a column vector, and accepts values of the
Returns the value of the joint prior distribution,
, given the values of
and
function logPDF = priorMVTIG(params,ct,st,dof,C,a,b) %priorMVTIG Log density of multivariate t times inverse gamma % priorMVTIG passes params(1:end-1) to the multivariate t density % function with dof degrees of freedom for each component and positive % definite correlation matrix C. priorMVTIG returns the log of the product of % the two evaluated densities. % % params: Parameter values at which the densities are evaluated, an % m-by-1 numeric vector. % % ct: Multivariate t distribution component centers, an (m-1)-by-1 % numeric vector. Elements correspond to the first m-1 elements % of params. % % st: Multivariate t distribution component scales, an (m-1)-by-1 % numeric (m-1)-by-1 numeric vector. Elements correspond to the % first m-1 elements of params. % % dof: Degrees of freedom for the multivariate t distribution, a % numeric scalar or (m-1)-by-1 numeric vector. priorMVTIG expands % scalars such that dof = dof*ones(m-1,1). Elements of dof % correspond to the elements of params(1:end-1). % % C: Correlation matrix for the multivariate t distribution, an % (m-1)-by-(m-1) symmetric, positive definite matrix. Rows and % columns correspond to the elements of params(1:end-1). %
12-622
estimate
% a: Inverse gamma shape parameter, a positive numeric scalar. % % b: Inverse gamma scale parameter, a positive scalar. % beta = params(1:(end-1)); sigma2 = params(end); tVal = (beta - ct)./st; mvtDensity = mvtpdf(tVal,C,dof); igDensity = sigma2^(-a-1)*exp(-1/(sigma2*b))/(gamma(a)*b^a); logPDF = log(mvtDensity*igDensity); end
Create an anonymous function that operates like priorMVTIG, but accepts the parameter values only, and holds the hyperparameter values fixed at arbitrarily chosen values. prednames = ["E" "WR"]; p = numel(prednames); numcoeff = p + 1; rng(1); % For reproducibility dof = 10; V = rand(numcoeff); Sigma = 0.5*(V + V') + numcoeff*eye(numcoeff); st = sqrt(diag(Sigma)); C = diag(1./st)*Sigma*diag(1./st); ct = rand(numcoeff,1); a = 10*rand; b = 10*rand; logPDF = @(params)priorMVTIG(params,ct,st,dof,C,a,b);
Create a custom joint prior model for the linear regression parameters. Specify the number of predictors p. Also, specify the function handle for priorMVTIG and the variable names. PriorMdl = bayeslm(p,'ModelType','custom','LogPDF',logPDF,... 'VarNames',prednames);
PriorMdl is a customblm Bayesian linear regression model object representing the prior distribution of the regression coefficients and disturbance variance. Load the Nelson-Plosser data set. Create variables for the response and predictor series. load Data_NelsonPlosser X = DataTable{:,PriorMdl.VarNames(2:end)}; y = DataTable{:,"GNPR"};
using the Hamiltonian Monte Carlo (HMC) Estimate the marginal posterior distributions of and sampler. Specify drawing 10,000 samples and a burn-in period of 1000 draws. PosteriorMdl = estimate(PriorMdl,X,y,'Sampler','hmc','NumDraws',1e4,... 'Burnin',1e3); Method: MCMC sampling with 10000 draws Number of observations: 62
12-623
12
Functions
Number of predictors:
3
| Mean Std CI95 Positive Distribution -----------------------------------------------------------------------------Intercept | -3.6344 5.6044 [-16.124, 6.211] 0.246 Empirical E | -0.0056 0.0006 [-0.007, -0.004] 0.000 Empirical WR | 15.2548 0.7768 [13.711, 16.775] 1.000 Empirical Sigma2 | 1285.5647 240.9600 [896.743, 1833.793] 1.000 Empirical
PosteriorMdl is an empiricalblm model object storing the draws from the posterior distributions. View a trace plot and an ACF plot of the draws from the posterior of disturbance variance. Do not plot the burn-in period. figure; subplot(2,1,1) plot(PosteriorMdl.BetaDraws(2,1001:end)); title(['Trace Plot ' char(8212) ' \beta_1']); xlabel('MCMC Draw') ylabel('Simulation Index') subplot(2,1,2) autocorr(PosteriorMdl.BetaDraws(2,1001:end)) figure; subplot(2,1,1) plot(PosteriorMdl.Sigma2Draws(1001:end)); title(['Trace Plot ' char(8212) ' Disturbance Variance']); xlabel('MCMC Draw') ylabel('Simulation Index') subplot(2,1,2) autocorr(PosteriorMdl.Sigma2Draws(1001:end))
12-624
(for example) and the
estimate
12-625
12
Functions
The MCMC sample of the disturbance variance appears to mix well.
Estimate Conditional Posterior Distributions Consider the regression model in “Estimate Posterior Using Hamiltonian Monte Carlo Sampler” on page 12-622. This example uses the same data and context, but assumes a diffuse prior model instead. Create a diffuse prior model for the linear regression parameters. Specify the number of predictors p and the names of the regression coefficients. p = 3; PriorMdl = bayeslm(p,'ModelType','diffuse','VarNames',["IPI" "E" "WR"]) PriorMdl = diffuseblm with properties: NumPredictors: 3 Intercept: 1 VarNames: {4x1 cell} | Mean Std CI95 Positive Distribution ----------------------------------------------------------------------------Intercept | 0 Inf [ NaN, NaN] 0.500 Proportional to one
12-626
estimate
IPI E WR Sigma2
| | | |
0 0 0 Inf
Inf Inf Inf Inf
[ [ [ [
NaN, NaN, NaN, NaN,
NaN] NaN] NaN] NaN]
0.500 0.500 0.500 1.000
Proportional Proportional Proportional Proportional
to to to to
one one one 1/Sigma2
PriorMdl is a diffuseblm model object. Load the Nelson-Plosser data set. Create variables for the response and predictor series. load Data_NelsonPlosser X = DataTable{:,PriorMdl.VarNames(2:end)}; y = DataTable{:,'GNPR'};
Estimate the conditional posterior distribution of β given the data and that σ2 = 2, and return the estimation summary table to access the estimates. [Mdl,SummaryBeta] = estimate(PriorMdl,X,y,'Sigma2',2); Method: Analytic posterior distributions Conditional variable: Sigma2 fixed at 2 Number of observations: 62 Number of predictors: 4 | Mean Std CI95 Positive Distribution -------------------------------------------------------------------------------Intercept | -24.2536 1.8696 [-27.918, -20.589] 0.000 N (-24.25, 1.87^2) IPI | 4.3913 0.0301 [ 4.332, 4.450] 1.000 N (4.39, 0.03^2) E | 0.0011 0.0001 [ 0.001, 0.001] 1.000 N (0.00, 0.00^2) WR | 2.4682 0.0743 [ 2.323, 2.614] 1.000 N (2.47, 0.07^2) Sigma2 | 2 0 [ 2.000, 2.000] 1.000 Fixed value
estimate displays a summary of the conditional posterior distribution of β. Because σ2 is fixed at 2 during estimation, inferences on it are trivial. Extract the mean vector and covariance matrix of the conditional posterior of β from the estimation summary table. condPostMeanBeta = SummaryBeta.Mean(1:(end - 1)) condPostMeanBeta = 4×1 -24.2536 4.3913 0.0011 2.4682 CondPostCovBeta = SummaryBeta.Covariances(1:(end - 1),1:(end - 1)) CondPostCovBeta = 4×4 3.4956 0.0350 -0.0001 0.0241
0.0350 0.0009 -0.0000 -0.0013
-0.0001 -0.0000 0.0000 -0.0000
0.0241 -0.0013 -0.0000 0.0055
12-627
12
Functions
Display Mdl. Mdl Mdl = diffuseblm with properties: NumPredictors: 3 Intercept: 1 VarNames: {4x1 cell} | Mean Std CI95 Positive Distribution ----------------------------------------------------------------------------Intercept | 0 Inf [ NaN, NaN] 0.500 Proportional to one IPI | 0 Inf [ NaN, NaN] 0.500 Proportional to one E | 0 Inf [ NaN, NaN] 0.500 Proportional to one WR | 0 Inf [ NaN, NaN] 0.500 Proportional to one Sigma2 | Inf Inf [ NaN, NaN] 1.000 Proportional to 1/Sigma2
Because estimate computes the conditional posterior distribution, it returns the original prior model, not the posterior, in the first position of the output argument list. Estimate the conditional posterior distributions of σ2 given that β is condPostMeanBeta. [~,SummarySigma2] = estimate(PriorMdl,X,y,'Beta',condPostMeanBeta); Method: Analytic posterior distributions Conditional variable: Beta fixed at -24.2536 Number of observations: 62 Number of predictors: 4
4.3913
0.00112035
2.46823
| Mean Std CI95 Positive Distribution -------------------------------------------------------------------------------Intercept | -24.2536 0 [-24.254, -24.254] 0.000 Fixed value IPI | 4.3913 0 [ 4.391, 4.391] 1.000 Fixed value E | 0.0011 0 [ 0.001, 0.001] 1.000 Fixed value WR | 2.4682 0 [ 2.468, 2.468] 1.000 Fixed value Sigma2 | 48.5138 9.0088 [33.984, 69.098] 1.000 IG(31.00, 0.00069)
estimate displays a summary of the conditional posterior distribution of σ2. Because β is fixed to condPostMeanBeta during estimation, inferences on it are trivial. Extract the mean and variance of the conditional posterior of σ2 from the estimation summary table. condPostMeanSigma2 = SummarySigma2.Mean(end) condPostMeanSigma2 = 48.5138 CondPostVarSigma2 = SummarySigma2.Covariances(end,end) CondPostVarSigma2 = 81.1581
12-628
estimate
Access Estimates in Estimation Display Consider the regression model in “Estimate Posterior Using Hamiltonian Monte Carlo Sampler” on page 12-622. This example uses the same data and context, but assumes a semiconjugate prior model instead. Create a semiconjugate prior model for the linear regression parameters. Specify the number of predictors p and the names of the regression coefficients. p = 3; PriorMdl = bayeslm(p,'ModelType','semiconjugate',... 'VarNames',["IPI" "E" "WR"]);
PriorMdl is a semiconjugateblm model object. Load the Nelson-Plosser data set. Create variables for the response and predictor series. load Data_NelsonPlosser X = DataTable{:,PriorMdl.VarNames(2:end)}; y = DataTable{:,'GNPR'};
Estimate the marginal posterior distributions of β and σ2. rng(1); % For reproducibility [PosteriorMdl,Summary] = estimate(PriorMdl,X,y); Method: Gibbs sampling with 10000 draws Number of observations: 62 Number of predictors: 4 | Mean Std CI95 Positive Distribution ------------------------------------------------------------------------Intercept | -23.9922 9.0520 [-41.734, -6.198] 0.005 Empirical IPI | 4.3929 0.1458 [ 4.101, 4.678] 1.000 Empirical E | 0.0011 0.0003 [ 0.000, 0.002] 0.999 Empirical WR | 2.4711 0.3576 [ 1.762, 3.178] 1.000 Empirical Sigma2 | 46.7474 8.4550 [33.099, 66.126] 1.000 Empirical
PosteriorMdl is an empiricalblm model object because marginal posterior distributions of semiconjugate models are analytically intractable, so estimate must implement a Gibbs sampler. Summary is a table containing the estimates and inferences that estimate displays at the command line. Display the summary table. Summary Summary=5×6 table
Intercept IPI E WR Sigma2
Mean _________
Std __________
CI95 ________________________
-23.992 4.3929 0.0011124 2.4711 46.747
9.052 0.14578 0.00033976 0.3576 8.455
-41.734 4.1011 0.00045128 1.7622 33.099
-6.1976 4.6782 0.0017883 3.1781 66.126
Positive ________
Distribution _____________
0.0053 1 0.9989 1 1
{'Empirical'} {'Empirical'} {'Empirical'} {'Empirical'} {'Empirical'}
12-629
12
Functions
Access the 95% equitailed credible interval of the regression coefficient of IPI. Summary.CI95(2,:) ans = 1×2 4.1011
4.6782
Input Arguments PriorMdl — Bayesian linear regression model conjugateblm model object | semiconjugateblm model object | diffuseblm model object | empiricalblm model object | customblm model object Bayesian linear regression model representing a prior model, specified as an object in this table. Model Object
Description
conjugateblm
Dependent, normal-inverse-gamma conjugate model returned by bayeslm or estimate
semiconjugatebl Independent, normal-inverse-gamma semiconjugate model returned by m bayeslm diffuseblm
Diffuse prior model returned by bayeslm
empiricalblm
Prior model characterized by samples from prior distributions, returned by bayeslm or estimate
customblm
Prior distribution function that you declare returned by bayeslm
PriorMdl can also represent a joint posterior model returned by estimate, either a conjugateblm or empiricalblm model object. In this case, estimate updates the joint posterior distribution using the new observations in X and y. X — Predictor data numeric matrix Predictor data for the multiple linear regression model, specified as a numObservations-byPriorMdl.NumPredictors numeric matrix. numObservations is the number of observations and must be equal to the length of y. Data Types: double y — Response data numeric vector Response data for the multiple linear regression model, specified as a numeric vector with numObservations elements. Data Types: double Name-Value Pair Arguments Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter. 12-630
estimate
Before R2021a, use commas to separate each name and value, and enclose Name in quotes. Example: 'Sigma2',2 specifies estimating the conditional posterior distribution of the regression coefficients given the data and that the specified disturbance variance is 2. Options for All Prior Distributions
Display — Flag to display Bayesian estimator summary at command line true (default) | false Flag to display Bayesian estimator summary at the command line, specified as the comma-separated pair consisting of 'Display' and a value in this table. Value
Description
true
estimate prints estimation information and a table summarizing the Bayesian estimators to t command line.
false
estimate does not print to the command line.
The estimation information includes the estimation method, fixed parameters, the number of observations, and the number of predictors. The summary table contains estimated posterior means and standard deviations (square root of the posterior variance), 95% equitailed credible intervals, the posterior probability that the parameter is greater than 0, and a description of the posterior distribution (if known). If you specify one of Beta or Sigma2, then estimate includes your specification in the display, and corresponding posterior estimates are trivial. Example: 'Display',false Data Types: logical Options for All Prior Distributions Except Empirical
Beta — Value of regression coefficients for conditional posterior distribution estimation of disturbance variance empty array ([]) (default) | numeric column vector Value of the regression coefficients for conditional posterior distribution estimation of the disturbance variance, specified as the comma-separated pair consisting of 'Beta' and a (PriorMdl.Intercept + PriorMdl.NumPredictors)-by-1 numeric vector. estimate estimates the characteristics of π(σ2| y,X,β = Beta), where y is y, X is X, and Beta is the value of 'Beta'. If PriorMdl.Intercept is true, then Beta(1) corresponds to the model intercept. All other values correspond to the predictor variables that compose the columns of X. Beta cannot contain any NaN values (that is, all coefficients must be known). You cannot specify Beta and Sigma2 simultaneously. By default, estimate does not compute characteristics of the conditional posterior of σ2. Example: 'Beta',1:3 Data Types: double Sigma2 — Value of disturbance variance for conditional posterior distribution estimation of regression coefficients empty array ([]) (default) | positive numeric scalar 12-631
12
Functions
Value of the disturbance variance for conditional posterior distribution estimation of the regression coefficients, specified as the comma-separated pair consisting of 'Sigma2' and a positive numeric scalar. estimate estimates characteristics of π(β|y,X,Sigma2), where y is y, X is X, and Sigma2 is the value of 'Sigma2'. You cannot specify Sigma2 and Beta simultaneously. By default, estimate does not compute characteristics of the conditional posterior of β. Example: 'Sigma2',1 Data Types: double Options for Semiconjugate, Empirical, and Custom Prior Distributions
NumDraws — Monte Carlo simulation adjusted sample size 1e5 (default) | positive integer Monte Carlo simulation adjusted sample size, specified as the comma-separated pair consisting of 'NumDraws' and a positive integer. estimate actually draws BurnIn + NumDraws*Thin samples, but it bases the estimates off NumDraws samples. For details on how estimate reduces the full Monte Carlo sample, see “Algorithms” on page 12-637. If PriorMdl is a semiconjugateblm model and you specify Beta or Sigma2, then MATLAB ignores NumDraws. Example: 'NumDraws',1e7 Data Types: double Options for Semiconjugate and Custom Prior Distributions
BurnIn — Number of draws to remove from beginning of Monte Carlo sample 5000 (default) | nonnegative scalar Number of draws to remove from the beginning of the Monte Carlo sample to reduce transient effects, specified as the comma-separated pair consisting of 'BurnIn' and a nonnegative scalar. For details on how estimate reduces the full Monte Carlo sample, see “Algorithms” on page 12-637. Tip To help you specify the appropriate burn-in period size, determine the extent of the transient behavior in the Monte Carlo sample by specifying 'BurnIn',0, simulating a few thousand observations using simulate, and then plotting the paths. Example: 'BurnIn',0 Data Types: double Thin — Monte Carlo adjusted sample size multiplier 1 (default) | positive integer Monte Carlo adjusted sample size multiplier, specified as the comma-separated pair consisting of 'Thin' and a positive integer. The actual Monte Carlo sample size is BurnIn + NumDraws*Thin. After discarding the burn-in, estimate discards every Thin – 1 draws, and then retains the next. For details on how estimate reduces the full Monte Carlo sample, see “Algorithms” on page 12-637. 12-632
estimate
Tip To reduce potential large serial correlation in the Monte Carlo sample, or to reduce the memory consumption of the draws stored in PosteriorMdl, specify a large value for Thin. Example: 'Thin',5 Data Types: double BetaStart — Starting values of regression coefficients for MCMC sample numeric column vector Starting values of the regression coefficients for the Markov chain Monte Carlo (MCMC) sample, specified as the comma-separated pair consisting of 'BetaStart' and a numeric column vector with (PriorMdl.Intercept + PriorMdl.NumPredictors) elements. By default, BetaStart is the ordinary least-squares (OLS) estimate. Tip A good practice is to run estimate multiple times using different parameter starting values. Verify that the solutions from each run converge to similar values. Example: 'BetaStart',[1; 2; 3] Data Types: double Sigma2Start — Starting values of disturbance variance for MCMC sample positive numeric scalar Starting values of the disturbance variance for the MCMC sample, specified as the comma-separated pair consisting of 'Sigma2Start' and a positive numeric scalar. By default, Sigma2Start is the OLS residual mean squared error. Tip A good practice is to run estimate multiple times using different parameter starting values. Verify that the solutions from each run converge to similar values. Example: 'Sigma2Start',4 Data Types: double Options for Custom Prior Distributions
Reparameterize — Reparameterization of σ2 as log(σ2) false (default) | true Reparameterization of σ2 as log(σ2) during posterior estimation and simulation, specified as the comma-separated pair consisting of 'Reparameterize' and a value in this table. Value
Description
false
estimate does not reparameterize σ2.
true
estimate reparameterizes σ2 as log(σ2). estimate converts results back to the original scale and does not change the functional form of PriorMdl.LogPDF.
12-633
12
Functions
Tip If you experience numeric instabilities during the posterior estimation or simulation of σ2, then specify 'Reparameterize',true. Example: 'Reparameterize',true Data Types: logical Sampler — MCMC sampler 'slice' (default) | 'metropolis' | 'hmc' MCMC sampler, specified as the comma-separated pair consisting of 'Sampler' and a value in this table. Value
Description
'slice'
Slice sampler
'metropolis'
Random walk Metropolis sampler
'hmc'
Hamiltonian Monte Carlo (HMC) sampler
Tip • To increase the quality of the MCMC draws, tune the sampler. 1
Before calling estimate, specify the tuning parameters and their values by using sampleroptions. For example, to specify the slice sampler width width, use: options = sampleroptions('Sampler',"slice",'Width',width);
2
Specify the object containing the tuning parameter specifications returned by sampleroptions by using the 'Options' name-value pair argument. For example, to use the tuning parameter specifications in options, specify: 'Options',options
• If you specify the HMC sampler, then a best practice is to provide the gradient for some variables, at least. estimate resorts the numerical computation of any missing partial derivatives (NaN values) in the gradient vector.
Example: 'Sampler',"hmc" Data Types: string Options — Sampler options [] (default) | structure array Sampler options, specified as the comma-separated pair consisting of 'Options' and a structure array returned by sampleroptions. Use 'Options' to specify the MCMC sampler and its tuningparameter values. Example: 'Options',sampleroptions('Sampler',"hmc") Data Types: struct Width — Typical sampling-interval width positive numeric scalar | numeric vector of positive values 12-634
estimate
Typical sampling-interval width around the current value in the marginal distributions for the slice sampler, specified as the comma-separated pair consisting of 'Width' and a positive numeric scalar or a (PriorMdl.Intercept + PriorMdl.NumPredictors + 1)-by-1 numeric vector of positive values. The first element corresponds to the model intercept, if one exists in the model. The next PriorMdl.NumPredictors elements correspond to the coefficients of the predictor variables ordered by the predictor data columns. The last element corresponds to the model variance. • If Width is a scalar, then estimate applies Width to all PriorMdl.NumPredictors + PriorMdl.Intercept + 1 marginal distributions. • If Width is a numeric vector, then estimate applies the first element to the intercept (if one exists), the next PriorMdl.NumPredictors elements to the regression coefficients corresponding to the predictor variables in X, and the last element to the disturbance variance. • If the sample size (size(X,1)) is less than 100, then Width is 10 by default. • If the sample size is at least 100, then estimate sets Width to the vector of corresponding posterior standard deviations by default, assuming a diffuse prior model (diffuseblm). The typical width of the slice sampler does not affect convergence of the MCMC sample. It does affect the number of required function evaluations, that is, the efficiency of the algorithm. If the width is too small, then the algorithm can implement an excessive number of function evaluations to determine the appropriate sampling width. If the width is too large, then the algorithm might have to decrease the width to an appropriate size, which requires function evaluations. estimate sends Width to the slicesample function. For more details, see slicesample. Tip • For maximum flexibility, specify the slice sampler width width by using the 'Options' namevalue pair argument. For example: 'Options',sampleroptions('Sampler',"slice",'Width',width)
Example: 'Width',[100*ones(3,1);10]
Output Arguments PosteriorMdl — Bayesian linear regression model storing distribution characteristics conjugateblm model object | semiconjugateblm model object | diffuseblm model object | empiricalblm model object | customblm model object Bayesian linear regression model storing distribution characteristics, returned as a conjugateblm, semiconjugateblm, diffuseblm, empiricalblm, or customblm model object. • If you do not specify either Beta or Sigma2 (their values are []), then estimate updates the prior model using the data likelihood to form the posterior distribution. PosteriorMdl characterizes the posterior distribution. Its object type depends on the prior model type (PriorMdl). Model Object
PriorMdl
conjugateblm
conjugateblm or diffuseblm
12-635
12
Functions
Model Object
PriorMdl
empiricalblm
semiconjugateblm, empiricalblm, or customblm
• If you specify either Beta or Sigma2, then PosteriorMdl equals PriorMdl (the two models are the same object storing the same property values). estimate does not update the prior model to form the posterior model. However, estBeta, EstBetaCov, estSigma2, estSigma2Var, and Summary store conditional posterior estimates. For more details on the display of PosteriorMdl, see Summary. For details on supported posterior distributions that are analytically tractable, see “Analytically Tractable Posteriors” on page 6-5. Summary — Summary of Bayesian estimators table Summary of Bayesian estimators, returned as a table. Summary contains the same information as the display of the estimation summary (Display). Rows correspond to parameters, and columns correspond to these posterior characteristics for each parameter: • Mean – Posterior mean • Std – Posterior standard deviation • CI95 – 95% equitailed credible interval • Positive – The posterior probability the parameter is greater than 0 • Distribution – Description of the marginal or conditional posterior distribution of the parameter, when known • Covariances – Estimated covariance matrix of the coefficients and disturbance variance Row names are the names in PriorMdl.VarNames. The name of the last row is Sigma2. Alternatively, pass PosteriorMdl to summarize to obtain a summary of Bayesian estimators.
Limitations If PriorMdl is an empiricalblm model object. You cannot specify Beta or Sigma2. You cannot estimate conditional posterior distributions by using an empirical prior distribution.
More About Bayesian Linear Regression Model A Bayesian linear regression model treats the parameters β and σ2 in the multiple linear regression (MLR) model yt = xtβ + εt as random variables. For times t = 1,...,T: • yt is the observed response. • xt is a 1-by-(p + 1) row vector of observed values of p predictors. To accommodate a model intercept, x1t = 1 for all t. • β is a (p + 1)-by-1 column vector of regression coefficients corresponding to the variables that compose the columns of xt. 12-636
estimate
• εt is the random disturbance with a mean of zero and Cov(ε) = σ2IT×T, while ε is a T-by-1 vector containing all disturbances. These assumptions imply that the data likelihood is ℓ β, σ2 y, x =
T
∏
t=1
ϕ yt; xt β, σ2 .
ϕ(yt;xtβ,σ2) is the Gaussian probability density with mean xtβ and variance σ2 evaluated at yt;. Before considering the data, you impose a joint prior distribution assumption on (β,σ2). In a Bayesian analysis, you update the distribution of the parameters by using information about the parameters obtained from the likelihood of the data. The result is the joint posterior distribution of (β,σ2) or the conditional posterior distributions of the parameters.
Tips • Monte Carlo simulation is subject to variation. If estimate uses Monte Carlo simulation, then estimates and inferences might vary when you call estimate multiple times under seemingly equivalent conditions. To reproduce estimation results, set a random number seed by using rng before calling estimate. • If estimate issues an error while estimating the posterior distribution using a custom prior model, then try adjusting initial parameter values by using BetaStart or Sigma2Start, or try adjusting the declared log prior function, and then reconstructing the model. The error might indicate that the log of the prior distribution is –Inf at the specified initial values.
Algorithms • Whenever the prior distribution (PriorMdl) and the data likelihood yield an analytically tractable posterior distribution, estimate evaluates the closed-form solutions to Bayes estimators. Otherwise, estimate resorts to Monte Carlo simulation to estimate parameters and draw inferences. For more details, see “Posterior Estimation and Inference” on page 6-4. • This figure illustrates how estimate reduces the Monte Carlo sample using the values of NumDraws, Thin, and BurnIn. Rectangles represent successive draws from the distribution. estimate removes the white rectangles from the Monte Carlo sample. The remaining NumDraws black rectangles compose the Monte Carlo sample.
Version History Introduced in R2017a 12-637
12
Functions
R2019b: estimate returns only an estimated model object and estimation summary Errors starting in R2019b For a simpler interface, estimate returns only an estimated model and an estimation summary table. Now, the supported syntaxes are: PosteriorMdl = estimate(...); [PosteriorMdl,Summary] = estimate(...);
You can obtain estimated posterior means and covariances, based on the marginal or conditional distributions, from the estimation summary table. In past releases, estimate returned these output arguments: [PosteriorMdl,estBeta,EstBetaCov,estSigma2,estSigma2Var,Summary] = estimate(...);
estBeta, EstBetaCov, estSigma2, and estSigma2Var are posterior means and covariances of β and σ2. Starting in R2019b, if you request any output argument in a position greater than the second position, estimate issues this error: Too many output arguments.
For details on how to update your code, see “Replacing Removed Syntaxes of estimate” on page 6-73.
See Also Objects conjugateblm | semiconjugateblm | diffuseblm | empiricalblm | customblm Functions bayeslm | forecast | simulate | summarize | plot | sampleroptions Topics “Replacing Removed Syntaxes of estimate” on page 6-73 “Bayesian Linear Regression” on page 6-2 “Implement Bayesian Linear Regression” on page 6-10 “Specify Gradient for HMC Sampler” on page 6-18 “Tune Slice Sampler for Posterior Estimation” on page 6-36
12-638
estimate
estimate Perform predictor variable selection for Bayesian linear regression models
Syntax PosteriorMdl = estimate(PriorMdl,X,y) PosteriorMdl = estimate(PriorMdl,X,y,Name,Value) [PosteriorMdl,Summary] = estimate( ___ )
Description To estimate the posterior distribution of a standard Bayesian linear regression model, see estimate. PosteriorMdl = estimate(PriorMdl,X,y) returns the model that characterizes the joint posterior distributions of β and σ2 of a Bayesian linear regression on page 12-653 model. estimate also performs predictor variable selection. PriorMdl specifies the joint prior distribution of the parameters, the structure of the linear regression model, and the variable selection algorithm. X is the predictor data and y is the response data. PriorMdl and PosteriorMdl are not the same object type. To produce PosteriorMdl, estimate updates the prior distribution with information about the parameters that it obtains from the data. NaNs in the data indicate missing values, which estimate removes using list-wise deletion. PosteriorMdl = estimate(PriorMdl,X,y,Name,Value) uses additional options specified by one or more name-value pair arguments. For example, 'Lambda',0.5 specifies that the shrinkage parameter value for Bayesian lasso regression is 0.5 for all coefficients except the intercept. If you specify Beta or Sigma2, then PosteriorMdl and PriorMdl are equal. [PosteriorMdl,Summary] = estimate( ___ ) uses any of the input argument combinations in the previous syntaxes and also returns a table that includes the following for each parameter: posterior estimates, standard errors, 95% credible intervals, and posterior probability that the parameter is greater than 0.
Examples Select Variables Using Bayesian Lasso Regression Consider the multiple linear regression model that predicts US real gross national product (GNPR) using a linear combination of industrial production index (IPI), total employment (E), and real wages (WR). GNPRt = β0 + β1IPIt + β2Et + β3WRt + εt . For all t, εt is a series of independent Gaussian disturbances with a mean of 0 and variance σ2. 12-639
12
Functions
Assume the prior distributions are: • For k = 0,...,3, βk | σ2 has a Laplace distribution with a mean of 0 and a scale of σ2 /λ, where λ is the shrinkage parameter. The coefficients are conditionally independent. • σ2 ∼ IG(A, B). A and B are the shape and scale, respectively, of an inverse gamma distribution. Create a prior model for Bayesian lasso regression. Specify the number of predictors, the prior model type, and variable names. Specify these shrinkages: • 0.01 for the intercept • 10 for IPI and WR • 1e5 for E because it has a scale that is several orders of magnitude larger than the other variables The order of the shrinkages follows the order of the specified variable names, but the first element is the shrinkage of the intercept. p = 3; PriorMdl = bayeslm(p,'ModelType','lasso','Lambda',[0.01; 10; 1e5; 10],... 'VarNames',["IPI" "E" "WR"]);
PriorMdl is a lassoblm Bayesian linear regression model object representing the prior distribution of the regression coefficients and disturbance variance. Load the Nelson-Plosser data set. Create variables for the response and predictor series. load Data_NelsonPlosser X = DataTable{:,PriorMdl.VarNames(2:end)}; y = DataTable{:,"GNPR"};
Perform Bayesian lasso regression by passing the prior model and data to estimate, that is, by estimating the posterior distribution of β and σ2. Bayesian lasso regression uses Markov chain Monte Carlo (MCMC) to sample from the posterior. For reproducibility, set a random seed. rng(1); PosteriorMdl = estimate(PriorMdl,X,y); Method: lasso MCMC sampling with 10000 draws Number of observations: 62 Number of predictors: 4 | Mean Std CI95 Positive Distribution ------------------------------------------------------------------------Intercept | -1.3472 6.8160 [-15.169, 11.590] 0.427 Empirical IPI | 4.4755 0.1646 [ 4.157, 4.799] 1.000 Empirical E | 0.0001 0.0002 [-0.000, 0.000] 0.796 Empirical WR | 3.1610 0.3136 [ 2.538, 3.760] 1.000 Empirical Sigma2 | 60.1452 11.1180 [42.319, 85.085] 1.000 Empirical
PosteriorMdl is an empiricalblm model object that stores draws from the posterior distributions of β and σ2 given the data. estimate displays a summary of the marginal posterior distributions in the MATLAB® command line. Rows of the summary correspond to regression coefficients and the disturbance variance, and columns correspond to characteristics of the posterior distribution. The characteristics include: 12-640
estimate
• CI95, which contains the 95% Bayesian equitailed credible intervals for the parameters. For example, the posterior probability that the regression coefficient of IPI is in [4.157, 4.799] is 0.95. • Positive, which contains the posterior probability that the parameter is greater than 0. For example, the probability that the intercept is greater than 0 is 0.427. Plot the posterior distributions. plot(PosteriorMdl)
Given the shrinkages, the distribution of E is fairly dense around 0. Therefore, E might not be an important predictor. By default, estimate draws and discards a burn-in sample of size 5000. However, a good practice is to inspect a trace plot of the draws for adequate mixing and lack of transience. Plot a trace plot of the draws for each parameter. You can access the draws that compose the distribution (the properties BetaDraws and Sigma2Draws) using dot notation. figure; for j = 1:(p + 1) subplot(2,2,j); plot(PosteriorMdl.BetaDraws(j,:)); title(sprintf('%s',PosteriorMdl.VarNames{j})); end
12-641
12
Functions
figure; plot(PosteriorMdl.Sigma2Draws); title('Sigma2');
12-642
estimate
The trace plots indicate that the draws seem to mix well. The plots show no detectable transience or serial correlation, and the draws do not jump between states.
Select Variables Using SSVS Consider the regression model in “Select Variables Using Bayesian Lasso Regression” on page 12639. Create a prior model for performing stochastic search variable selection (SSVS). Assume that β and σ2 are dependent (a conjugate mixture model). Specify the number of predictors p and the names of the regression coefficients. p = 3; PriorMdl = mixconjugateblm(p,'VarNames',["IPI" "E" "WR"]);
Load the Nelson-Plosser data set. Create variables for the response and predictor series. load Data_NelsonPlosser X = DataTable{:,PriorMdl.VarNames(2:end)}; y = DataTable{:,'GNPR'};
Implement SSVS by estimating the marginal posterior distributions of β and σ2. Because SSVS uses Markov chain Monte Carlo for estimation, set a random number seed to reproduce the results. 12-643
12
Functions
rng(1); PosteriorMdl = estimate(PriorMdl,X,y); Method: MCMC sampling with 10000 draws Number of observations: 62 Number of predictors: 4 | Mean Std CI95 Positive Distribution Regime ---------------------------------------------------------------------------------Intercept | -18.8333 10.1851 [-36.965, 0.716] 0.037 Empirical 0.8806 IPI | 4.4554 0.1543 [ 4.165, 4.764] 1.000 Empirical 0.4545 E | 0.0010 0.0004 [ 0.000, 0.002] 0.997 Empirical 0.0925 WR | 2.4686 0.3615 [ 1.766, 3.197] 1.000 Empirical 0.1734 Sigma2 | 47.7557 8.6551 [33.858, 66.875] 1.000 Empirical NaN
PosteriorMdl is an empiricalblm model object that stores draws from the posterior distributions of β and σ2 given the data. estimate displays a summary of the marginal posterior distributions in the command line. Rows of the summary correspond to regression coefficients and the disturbance variance, and columns correspond to characteristics of the posterior distribution. The characteristics include: • CI95, which contains the 95% Bayesian equitailed credible intervals for the parameters. For example, the posterior probability that the regression coefficient of E (standardized) is in [0.000, 0.0.002] is 0.95. • Regime, which contains the marginal posterior probability of variable inclusion (γ = 1 for a variable). For example, the posterior probability E that should be included in the model is 0.0925. Assuming that variables with Regime < 0.1 should be removed from the model, the results suggest that you can exclude the unemployment rate from the model. By default, estimate draws and discards a burn-in sample of size 5000. However, a good practice is to inspect a trace plot of the draws for adequate mixing and lack of transience. Plot a trace plot of the draws for each parameter. You can access the draws that compose the distribution (the properties BetaDraws and Sigma2Draws) using dot notation. figure; for j = 1:(p + 1) subplot(2,2,j); plot(PosteriorMdl.BetaDraws(j,:)); title(sprintf('%s',PosteriorMdl.VarNames{j})); end
12-644
estimate
figure; plot(PosteriorMdl.Sigma2Draws); title('Sigma2');
12-645
12
Functions
The trace plots indicate that the draws seem to mix well. The plots show no detectable transience or serial correlation, and the draws do not jump between states.
Estimate Conditional Posterior Distributions Consider the regression model and prior distribution in “Select Variables Using Bayesian Lasso Regression” on page 12-639. Create a Bayesian lasso regression prior model for 3 predictors and specify variable names. Specify the shrinkage values 0.01, 10, 1e5, and 10 for the intercept, and the coefficients of IPI, E, and WR. p = 3; PriorMdl = bayeslm(p,'ModelType','lasso','VarNames',["IPI" "E" "WR"],... 'Lambda',[0.01; 10; 1e5; 10]);
Load the Nelson-Plosser data set. Create variables for the response and predictor series. load Data_NelsonPlosser X = DataTable{:,PriorMdl.VarNames(2:end)}; y = DataTable{:,"GNPR"};
Estimate the conditional posterior distribution of β given the data and that σ2 = 10, and return the estimation summary table to access the estimates. 12-646
estimate
rng(1); % For reproducibility [Mdl,SummaryBeta] = estimate(PriorMdl,X,y,'Sigma2',10); Method: lasso MCMC sampling with 10000 draws Conditional variable: Sigma2 fixed at 10 Number of observations: 62 Number of predictors: 4 | Mean Std CI95 Positive Distribution -----------------------------------------------------------------------Intercept | -8.0643 4.1992 [-16.384, 0.018] 0.025 Empirical IPI | 4.4454 0.0679 [ 4.312, 4.578] 1.000 Empirical E | 0.0004 0.0002 [ 0.000, 0.001] 0.999 Empirical WR | 2.9792 0.1672 [ 2.651, 3.305] 1.000 Empirical Sigma2 | 10 0 [10.000, 10.000] 1.000 Empirical
estimate displays a summary of the conditional posterior distribution of β. Because σ2 is fixed at 10 during estimation, inferences on it are trivial. Display Mdl. Mdl Mdl = lassoblm with properties: NumPredictors: Intercept: VarNames: Lambda: A: B:
3 1 {4x1 cell} [4x1 double] 3 1
| Mean Std CI95 Positive Distribution --------------------------------------------------------------------------Intercept | 0 100 [-200.000, 200.000] 0.500 Scale mixture IPI | 0 0.1000 [-0.200, 0.200] 0.500 Scale mixture E | 0 0.0000 [-0.000, 0.000] 0.500 Scale mixture WR | 0 0.1000 [-0.200, 0.200] 0.500 Scale mixture Sigma2 | 0.5000 0.5000 [ 0.138, 1.616] 1.000 IG(3.00, 1)
Because estimate computes the conditional posterior distribution, it returns the model input PriorMdl, not the conditional posterior, in the first position of the output argument list. Display the estimation summary table. SummaryBeta SummaryBeta=5×6 table Mean __________ Intercept IPI E
-8.0643 4.4454 0.00039896
Std __________
CI95 ________________________
4.1992 0.067949 0.00015673
-16.384 4.312 9.4925e-05
0.01837 4.5783 0.00070697
Positive ________
Distribution ____________
0.0254 1 0.9987
{'Empirical' {'Empirical' {'Empirical'
12-647
12
Functions
WR Sigma2
2.9792 10
0.16716 0
2.6506 10
3.3046 10
1 1
{'Empirical' {'Empirical'
SummaryBeta contains the conditional posterior estimates. Estimate the conditional posterior distributions of σ2 given that β is the conditional posterior mean of β | σ2, X, y (stored in SummaryBeta.Mean(1:(end – 1))). Return the estimation summary table. condPostMeanBeta = SummaryBeta.Mean(1:(end - 1)); [~,SummarySigma2] = estimate(PriorMdl,X,y,'Beta',condPostMeanBeta); Method: lasso MCMC sampling with 10000 draws Conditional variable: Beta fixed at -8.0643 Number of observations: 62 Number of predictors: 4
4.4454
0.00039896
2.9792
| Mean Std CI95 Positive Distribution -----------------------------------------------------------------------Intercept | -8.0643 0.0000 [-8.064, -8.064] 0.000 Empirical IPI | 4.4454 0.0000 [ 4.445, 4.445] 1.000 Empirical E | 0.0004 0.0000 [ 0.000, 0.000] 1.000 Empirical WR | 2.9792 0.0000 [ 2.979, 2.979] 1.000 Empirical Sigma2 | 56.8314 10.2921 [39.947, 79.731] 1.000 Empirical
estimate displays an estimation summary of the conditional posterior distribution of σ2 given the data and that β is condPostMeanBeta. In the display, inferences on β are trivial.
Access Estimates in Estimation Summary Display Consider the regression model in “Select Variables Using Bayesian Lasso Regression” on page 12639. Create a prior model for performing SSVS. Assume that β and σ2 are dependent (a conjugate mixture model). Specify the number of predictors p and the names of the regression coefficients. p = 3; PriorMdl = mixconjugateblm(p,'VarNames',["IPI" "E" "WR"]);
Load the Nelson-Plosser data set. Create variables for the response and predictor series. load Data_NelsonPlosser X = DataTable{:,PriorMdl.VarNames(2:end)}; y = DataTable{:,'GNPR'};
Implement SSVS by estimating the marginal posterior distributions of β and σ2. Because SSVS uses Markov chain Monte Carlo for estimation, set a random number seed to reproduce the results. Suppress the estimation display, but return the estimation summary table. rng(1); [PosteriorMdl,Summary] = estimate(PriorMdl,X,y,'Display',false);
PosteriorMdl is an empiricalblm model object that stores draws from the posterior distributions of β and σ2 given the data. Summary is a table with columns corresponding to posterior 12-648
estimate
characteristics and rows corresponding to the coefficients (PosteriorMdl.VarNames) and disturbance variance (Sigma2). Display the estimated parameter covariance matrix (Covariances) and proportion of times the algorithm includes each predictor (Regime). Covariances = Summary(:,"Covariances") Covariances=5×1 table Covariances ______________________________________________________________________ Intercept IPI E WR Sigma2
103.74 1.0486 -0.0031629 0.6791 7.3916
1.0486 0.023815 -1.3637e-05 -0.030387 0.06611
-0.0031629 -1.3637e-05 1.3481e-07 -8.8792e-05 -0.00025044
0.6791 -0.030387 -8.8792e-05 0.13066 0.089039
7.3916 0.06611 -0.00025044 0.089039 74.911
Regime = Summary(:,"Regime") Regime=5×1 table Regime ______ Intercept IPI E WR Sigma2
0.8806 0.4545 0.0925 0.1734 NaN
Regime contains the marginal posterior probability of variable inclusion (γ = 1 for a variable). For example, the posterior probability that E should be included in the model is 0.0925. Assuming that variables with Regime < 0.1 should be removed from the model, the results suggest that you can exclude the unemployment rate from the model.
Input Arguments PriorMdl — Bayesian linear regression model for predictor variable selection mixconjugateblm model object | mixsemiconjugateblm model object | lassoblm model object Bayesian linear regression model for predictor variable selection, specified as a model object in this table. Model Object
Description
mixconjugateblm
Dependent, Gaussian-mixture-inverse-gamma conjugate model for SSVS predictor variable selection, returned by bayeslm
mixsemiconjugateblm
Independent, Gaussian-mixture-inverse-gamma semiconjugate model for SSVS predictor variable selection, returned by bayeslm
12-649
12
Functions
Model Object
Description
lassoblm
Bayesian lasso regression model returned by bayeslm
X — Predictor data numeric matrix Predictor data for the multiple linear regression model, specified as a numObservations-byPriorMdl.NumPredictors numeric matrix. numObservations is the number of observations and must be equal to the length of y. Data Types: double y — Response data numeric vector Response data for the multiple linear regression model, specified as a numeric vector with numObservations elements. Data Types: double Name-Value Pair Arguments Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter. Before R2021a, use commas to separate each name and value, and enclose Name in quotes. Example: 'Sigma2',2 specifies estimating the conditional posterior distribution of the regression coefficients given the data and that the specified disturbance variance is 2. Display — Flag to display Bayesian estimator summary to command line true (default) | false Flag to display Bayesian estimator summary to the command line, specified as the comma-separated pair consisting of 'Display' and a value in this table. Value
Description
true
estimate prints estimation information and a table summarizing the Bayesian estimators to the command line.
false
estimate does not print to the command line.
The estimation information includes the estimation method, fixed parameters, the number of observations, and the number of predictors. The summary table contains estimated posterior means, standard deviations (square root of the posterior variance), 95% equitailed credible intervals, the posterior probability that the parameter is greater than 0, and a description of the posterior distribution (if known). For models that perform SSVS, the display table includes a column for variable-inclusion probabilities. If you specify either Beta or Sigma2, then estimate includes your specification in the display. Corresponding posterior estimates are trivial. Example: 'Display',false 12-650
estimate
Data Types: logical Beta — Value of regression coefficients for conditional posterior distribution estimation of disturbance variance empty array ([]) (default) | numeric column vector Value of the regression coefficients for conditional posterior distribution estimation of the disturbance variance, specified as the comma-separated pair consisting of 'Beta' and a (PriorMdl.Intercept + PriorMdl.NumPredictors)-by-1 numeric vector. estimate estimates the characteristics of π(σ2| y,X,β = Beta), where y is y, X is X, and Beta is the value of 'Beta'. If PriorMdl.Intercept is true, then Beta(1) corresponds to the model intercept. All other values correspond to the predictor variables that compose the columns of X. Beta cannot contain any NaN values (that is, all coefficients must be known). You cannot specify Beta and Sigma2 simultaneously. By default, estimate does not compute characteristics of the conditional posterior of σ2. Example: 'Beta',1:3 Data Types: double Sigma2 — Value of disturbance variance for conditional posterior distribution estimation of regression coefficients empty array ([]) (default) | positive numeric scalar Value of the disturbance variance for conditional posterior distribution estimation of the regression coefficients, specified as the comma-separated pair consisting of 'Sigma2' and a positive numeric scalar. estimate estimates characteristics of π(β|y,X,Sigma2), where y is y, X is X, and Sigma2 is the value of 'Sigma2'. You cannot specify Sigma2 and Beta simultaneously. By default, estimate does not compute characteristics of the conditional posterior of β. Example: 'Sigma2',1 Data Types: double NumDraws — Monte Carlo simulation adjusted sample size 1e5 (default) | positive integer Monte Carlo simulation adjusted sample size, specified as the comma-separated pair consisting of 'NumDraws' and a positive integer. estimate actually draws BurnIn – NumDraws*Thin samples. Therefore, estimate bases the estimates off NumDraws samples. For details on how estimate reduces the full Monte Carlo sample, see “Algorithms” on page 12-637. Example: 'NumDraws',1e7 Data Types: double BurnIn — Number of draws to remove from beginning of Monte Carlo sample 5000 (default) | nonnegative scalar Number of draws to remove from the beginning of the Monte Carlo sample to reduce transient effects, specified as the comma-separated pair consisting of 'BurnIn' and a nonnegative scalar. For details on how estimate reduces the full Monte Carlo sample, see “Algorithms” on page 12-637. 12-651
12
Functions
Tip To help you specify the appropriate burn-in period size, determine the extent of the transient behavior in the Monte Carlo sample by specifying 'BurnIn',0, simulating a few thousand observations using simulate, and then plotting the paths. Example: 'BurnIn',0 Data Types: double Thin — Monte Carlo adjusted sample size multiplier 1 (default) | positive integer Monte Carlo adjusted sample size multiplier, specified as the comma-separated pair consisting of 'Thin' and a positive integer. The actual Monte Carlo sample size is BurnIn + NumDraws*Thin. After discarding the burn-in, estimate discards every Thin – 1 draws, and then retains the next. For details on how estimate reduces the full Monte Carlo sample, see “Algorithms” on page 12-637. Tip To reduce potential large serial correlation in the Monte Carlo sample, or to reduce the memory consumption of the draws stored in PosteriorMdl, specify a large value for Thin. Example: 'Thin',5 Data Types: double BetaStart — Starting values of regression coefficients for MCMC sample numeric column vector Starting values of the regression coefficients for the Markov chain Monte Carlo (MCMC) sample, specified as the comma-separated pair consisting of 'BetaStart' and a numeric column vector with (PriorMdl.Intercept + PriorMdl.NumPredictors) elements. By default, BetaStart is the ordinary least-squares (OLS) estimate. Tip A good practice is to run estimate multiple times using different parameter starting values. Verify that the solutions from each run converge to similar values. Example: 'BetaStart',[1; 2; 3] Data Types: double Sigma2Start — Starting values of disturbance variance for MCMC sample positive numeric scalar Starting values of the disturbance variance for the MCMC sample, specified as the comma-separated pair consisting of 'Sigma2Start' and a positive numeric scalar. By default, Sigma2Start is the OLS residual mean squared error. Tip A good practice is to run estimate multiple times using different parameter starting values. Verify that the solutions from each run converge to similar values.
12-652
estimate
Example: 'Sigma2Start',4 Data Types: double
Output Arguments PosteriorMdl — Bayesian linear regression model storing distribution characteristics mixconjugateblm model object | mixsemiconjugateblm model object | lassoblm model object | empiricalblm model object Bayesian linear regression model storing distribution characteristics, returned as a mixconjugateblm, mixsemiconjugateblm, lassoblm, or empiricalblm model object. • If you do not specify either Beta or Sigma2 (their values are []), then estimate updates the prior model using the data likelihood to form the posterior distribution. PosteriorMdl characterizes the posterior distribution and is an empiricalblm model object. Information PosteriorMdl stores or displays helps you decide whether predictor variables are important. • If you specify either Beta or Sigma2, then PosteriorMdl equals PriorMdl (the two models are the same object storing the same property values). estimate does not update the prior model to form the posterior model. However, Summary stores conditional posterior estimates. For more details on the display of PosteriorMdl, see Summary. Summary — Summary of Bayesian estimators table Summary of Bayesian estimators, returned as a table. Summary contains the same information as the display of the estimation summary (Display). Rows correspond to parameters, and columns correspond to these posterior characteristics: • Mean – Posterior mean • Std – Posterior standard deviation • CI95 – 95% equitailed credible interval • Positive – Posterior probability that the parameter is greater than 0 • Distribution – Description of the marginal or conditional posterior distribution of the parameter, when known • Covariances – Estimated covariance matrix of the coefficients and disturbance variance • Regime – Variable-inclusion probabilities for models that perform SSVS; low probabilities indicate that the variable should be excluded from the model Row names are the names in PriorMdl.VarNames. The name of the last row is Sigma2. Alternatively, pass PosteriorMdl to summarize to obtain a summary of Bayesian estimators.
More About Bayesian Linear Regression Model A Bayesian linear regression model treats the parameters β and σ2 in the multiple linear regression (MLR) model yt = xtβ + εt as random variables. For times t = 1,...,T: 12-653
12
Functions
• yt is the observed response. • xt is a 1-by-(p + 1) row vector of observed values of p predictors. To accommodate a model intercept, x1t = 1 for all t. • β is a (p + 1)-by-1 column vector of regression coefficients corresponding to the variables that compose the columns of xt. • εt is the random disturbance with a mean of zero and Cov(ε) = σ2IT×T, while ε is a T-by-1 vector containing all disturbances. These assumptions imply that the data likelihood is ℓ β, σ2 y, x =
T
∏
t=1
ϕ yt; xt β, σ2 .
ϕ(yt;xtβ,σ2) is the Gaussian probability density with mean xtβ and variance σ2 evaluated at yt;. Before considering the data, you impose a joint prior distribution assumption on (β,σ2). In a Bayesian analysis, you update the distribution of the parameters by using information about the parameters obtained from the likelihood of the data. The result is the joint posterior distribution of (β,σ2) or the conditional posterior distributions of the parameters.
Tip • Monte Carlo simulation is subject to variation. If estimate uses Monte Carlo simulation, then estimates and inferences might vary when you call estimate multiple times under seemingly equivalent conditions. To reproduce estimation results, before calling estimate, set a random number seed by using rng.
Algorithms This figure shows how estimate reduces the Monte Carlo sample using the values of NumDraws, Thin, and BurnIn.
Rectangles represent successive draws from the distribution. estimate removes the white rectangles from the Monte Carlo sample. The remaining NumDraws black rectangles compose the Monte Carlo sample.
Version History Introduced in R2018b 12-654
estimate
See Also Objects mixconjugateblm | mixsemiconjugateblm | lassoblm Functions summarize | forecast | simulate | plot Topics “Bayesian Linear Regression” on page 6-2 “Implement Bayesian Linear Regression” on page 6-10 “Bayesian Stochastic Search Variable Selection” on page 6-63 “Bayesian Lasso Regression” on page 6-52
12-655
12
Functions
estimate Estimate posterior distribution of Bayesian state-space model parameters
Syntax PosteriorMdl = estimate(PriorMdl,Y,params0) PosteriorMdl = estimate(PriorMdl,Y,params0,Name=Value) [PosteriorMdl,estParams,EstParamCov,Summary] = estimate( ___ )
Description PosteriorMdl = estimate(PriorMdl,Y,params0) returns the posterior Bayesian state-space model PosteriorMdl from combining the Bayesian state-space model prior distribution and likelihood PriorMdl with the response data Y. The input argument params0 is the vector of initial values for the unknown state-space model parameters Θ in PriorMdl. PosteriorMdl = estimate(PriorMdl,Y,params0,Name=Value) specifies additional options using one or more name-value arguments. For example, estimate(Mdl,Y,params0,NumDraws=1e6,Thin=3,DoF=10) uses the multivariate t10 distribution for the Metropolis-Hastings [1][2] proposal, draws 3e6 random vectors of parameters, and thins the sample to reduce serial correlation by discarding every 2 draws until it retains 1e6 draws. [PosteriorMdl,estParams,EstParamCov,Summary] = estimate( ___ ) additionally returns the following quantities using any of the input-argument combinations in the previous syntaxes: • estParams — A vector containing the estimated parameters Θ. • EstParamCov — The estimated variance-covariance matrix of the estimated parameters Θ. • Summary — Estimation summary of the posterior distribution of parameters Θ. If the distribution of the state disturbances or observation innovations is non-Gaussian, Summary also includes the estimation summary of the final state values, and any estimated disturbance or innovation distribution hyperparameters.
Examples Estimate Posterior Distribution of Time-Invariant Model Simulate observed responses from a known state-space model, then treat the model as Bayesian and estimate the posterior distribution of the parameters by treating the state-space model parameters as unknown. Suppose the following state-space model is a data-generating process (DGP). xt, 1 xt, 2
=
xt − 1, 1 0.5 0 1 0 ut, 1 + 0 −0 . 75 xt − 1, 2 0 0 . 5 ut, 2
yt = 1 1
12-656
xt, 1 xt, 2
.
estimate
Create a standard state-space model object ssm that represents the DGP. trueTheta = [0.5; -0.75; 1; 0.5]; A = [trueTheta(1) 0; 0 trueTheta(2)]; B = [trueTheta(3) 0; 0 trueTheta(4)]; C = [1 1]; DGP = ssm(A,B,C);
Simulate a response path from the DGP. rng(1); % For reproducibility y = simulate(DGP,200);
Suppose the structure of the DGP is known, but the state parameters trueTheta are unknown, explicitly xt, 1 xt, 2
=
ϕ1 0 xt − 1, 1
yt = 1 1
0 ϕ2 xt − 1, 2 xt, 1 xt, 2
+
σ1 0 ut, 1 0 σ2 ut, 2
.
Consider a Bayesian state-space model representing the model with unknown parameters. Arbitrarily assume that the prior distribution of ϕ1, ϕ2, σ12, and σ22 are independent Gaussian random variables with mean 0.5 and variance 1. The Local Functions on page 12-660 section contains two functions required to specify the Bayesian state-space model. You can use the functions only within this script. The paramMap function accepts a vector of the unknown state-space model parameters and returns all the following quantities: • •
A= B=
ϕ1 0 0 ϕ2 σ1 0 0 σ2
. .
• C= 1 1. • D = 0. • Mean0 and Cov0 are empty arrays [], which specify the defaults. • StateType = 0 0 , indicating that each state is stationary. The paramDistribution function accepts the same vector of unknown parameters as does paramMap, but it returns the log prior density of the parameters at their current values. Specify that parameter values outside the parameter space have log prior density of -Inf. Create the Bayesian state-space model by passing function handles directly to paramMap and paramDistribution to bssm. PriorMdl = bssm(@paramMap,@priorDistribution) PriorMdl = Mapping that defines a state-space model:
12-657
12
Functions
@paramMap Log density of parameter prior distribution: @priorDistribution
PriorMdl is a bssm object representing the Bayesian state-space model with unknown parameters. Estimate the posterior distribution using default options of estimate. Specify a random set of positive values in [0,1] to initialize the MCMC algorithm. numParams = 4; theta0 = rand(numParams,1); PosteriorMdl = estimate(PriorMdl,y,theta0); Local minimum found. Optimization completed because the size of the gradient is less than the value of the optimality tolerance. Optimization and Tuning | Params0 Optimized ProposalStd ---------------------------------------c(1) | 0.6968 0.4459 0.0798 c(2) | 0.7662 -0.8781 0.0483 c(3) | 0.3425 0.9633 0.0694 c(4) | 0.8459 0.3978 0.0726 Posterior Distributions | Mean Std Quantile05 Quantile95 -----------------------------------------------c(1) | 0.4491 0.0905 0.3031 0.6164 c(2) | -0.8577 0.0606 -0.9400 -0.7365 c(3) | 0.9589 0.0695 0.8458 1.0699 c(4) | 0.4316 0.0893 0.3045 0.6023 Proposal acceptance rate = 40.10% PosteriorMdl.ParamMap ans = function_handle with value: @paramMap ThetaPostDraws = PosteriorMdl.ParamDistribution; [numParams,numDraws] = size(ThetaPostDraws) numParams = 4 numDraws = 1000
estimate finds an optimal proposal distribution for the Metropolis-Hastings sampler by using the tune function. Therefore, by default, estimate prints convergence information from tune. Also, estimate displays a summary of the posterior distribution of the parameters. The true values of the parameters are close to their corresponding posterior means; all are within their corresponding 95% credible intervals. PosteriorMdl is a bssm object representing the posterior distribution. 12-658
estimate
• PosteriorMdl.ParamMap is the function handle to the function representing the state-space model structure; it is the same function as PrioirMdl.ParamMap. • ThetaPostDraws is a 4-by-1000 matrix of draws from the posterior distribution. By default, estimate treats the first 100 draws as a burn-in sample and removes them from the matrix. To diagnose the Markov chain induced by the Metropolis-Hastings sampler, create trace plots of the posterior parameter draws. paramNames = ["\phi_1" "\phi_2" "\sigma_1" "\sigma_2"]; figure h = tiledlayout(4,1); for j = 1:numParams nexttile plot(ThetaPostDraws(j,:)) hold on yline(trueTheta(j)) ylabel(paramNames(j)) end title(h,"Posterior Trace Plots")
The sampler eventually settles at near the true values of the parameters. In this case, the sample shows serial correlation and transient behavior. You can remedy serial correlation in the sample by adjusting the Thin name-value argument, and you can remedy transient effects by increasing the burn-in period using the BurnIn name-value argument.
12-659
12
Functions
Local Functions This example uses the following functions. paramMap is the parameter-to-matrix mapping function and priorDistribution is the log prior distribution of the parameters. function [A,B,C,D,Mean0,Cov0,StateType] = paramMap(theta) A = [theta(1) 0; 0 theta(2)]; B = [theta(3) 0; 0 theta(4)]; C = [1 1]; D = 0; Mean0 = []; % MATLAB uses default initial state mean Cov0 = []; % MATLAB uses initial state covariances StateType = [0; 0]; % Two stationary states end function logprior = priorDistribution(theta) paramconstraints = [(abs(theta(1)) >= 1) (abs(theta(2)) >= 1) ... (theta(3) < 0) (theta(4) < 0)]; if(sum(paramconstraints)) logprior = -Inf; else mu0 = 0.5*ones(numel(theta),1); sigma0 = 1; p = normpdf(theta,mu0,sigma0); logprior = sum(log(p)); end end
Improve Markov Chain Convergence Consider the model in the example “Estimate Posterior Distribution of Time-Invariant Model” on page 12-656. Improve the Markov chain convergence by adjusting sampler options. Create a standard state-space model object ssm that represents the DGP, and then simulate a response path. trueTheta = [0.5; -0.75; 1; 0.5]; A = [trueTheta(1) 0; 0 trueTheta(2)]; B = [trueTheta(3) 0; 0 trueTheta(4)]; C = [1 1]; DGP = ssm(A,B,C); rng(1); % For reproducibility y = simulate(DGP,200);
Create the Bayesian state-space model by passing function handles directly to paramMap and paramDistribution to bssm (the functions are in Local Functions on page 12-663). Mdl = bssm(@paramMap,@priorDistribution) Mdl = Mapping that defines a state-space model: @paramMap Log density of parameter prior distribution:
12-660
estimate
@priorDistribution
Estimate the posterior distribution. Specify the simulated response path as observed responses, specify a random set of positive values in [0,1] to initialize the MCMC algorithm, and shut off all optimization displays. The plots in “Estimate Posterior Distribution of Time-Invariant Model” on page 12-656 suggest that the Markov chain settles after 500 draws. Therefore, specify a burn-in period of 500 (BurnIn=500). Specify thinning the sample by keeping the first draw of each set of 30 successive draws (Thin=30). Retain 2000 random parameter vectors (NumDraws=2000). numParams = 4; theta0 = rand(numParams,1); options = optimoptions("fminunc",Display="off"); PosteriorMdl = estimate(Mdl,y,theta0,Options=options, ... NumDraws=2000,BurnIn=500,Thin=30); Optimization and Tuning | Params0 Optimized ProposalStd ---------------------------------------c(1) | 0.6968 0.4459 0.0798 c(2) | 0.7662 -0.8781 0.0483 c(3) | 0.3425 0.9633 0.0694 c(4) | 0.8459 0.3978 0.0726 Posterior Distributions | Mean Std Quantile05 Quantile95 -----------------------------------------------c(1) | 0.4495 0.0822 0.3135 0.5858 c(2) | -0.8561 0.0587 -0.9363 -0.7468 c(3) | 0.9645 0.0744 0.8448 1.0863 c(4) | 0.4333 0.0860 0.3086 0.5889 Proposal acceptance rate = 38.85% ThetaPostDraws = PosteriorMdl.ParamDistribution;
Plot trace plots and correlograms of the parameters. paramNames = ["\phi_1" "\phi_2" "\sigma_1" "\sigma_2"]; figure h = tiledlayout(4,1); for j = 1:numParams nexttile plot(ThetaPostDraws(j,:)) hold on yline(trueTheta(j)) ylabel(paramNames(j)) end title(h,"Posterior Trace Plots")
12-661
12
Functions
figure h = tiledlayout(4,1); for j = 1:numParams nexttile autocorr(ThetaPostDraws(j,:)); ylabel(paramNames(j)); title([]); end title(h,"Posterior Sample Correlograms")
12-662
estimate
The sampler quickly settles near the true values of the parameters. The sample shows little serial correlation and no transient behavior. Local Functions This example uses the following functions. paramMap is the parameter-to-matrix mapping function and priorDistribution is the log prior distribution of the parameters. function [A,B,C,D,Mean0,Cov0,StateType] = paramMap(theta) A = [theta(1) 0; 0 theta(2)]; B = [theta(3) 0; 0 theta(4)]; C = [1 1]; D = 0; Mean0 = []; % MATLAB uses default initial state mean Cov0 = []; % MATLAB uses initial state covariances StateType = [0; 0]; % Two stationary states end function logprior = priorDistribution(theta) paramconstraints = [(abs(theta(1)) >= 1) (abs(theta(2)) >= 1) ... (theta(3) < 0) (theta(4) < 0)]; if(sum(paramconstraints)) logprior = -Inf; else mu0 = 0.5*ones(numel(theta),1); sigma0 = 1; p = normpdf(theta,mu0,sigma0);
12-663
12
Functions
logprior = sum(log(p)); end end
Estimate Degrees of Freedom of t-Distributed State Disturbances Simulate observed responses from a DGP, then treat the model as Bayesian and estimate the posterior distribution of the model parameters Θ and the degrees of freedom νu of multivariate t-distributed state disturbances. Consider the following DGP. xt, 1 xt, 2
=
ϕ1 0 xt − 1, 1
yt = 1 3
0 ϕ2 xt − 1, 2 xt, 1 xt, 2
+
σ1 0 ut, 1 0 σ2 ut, 2
.
• The true value of the state-space parameter set Θ = ϕ1, ϕ2, σ1, σ2 = 0 . 5, − 0 . 75, 1, 0 . 5 . • The state disturbances u1, t and u2, t are jointly a multivariate Student's t random series with νu = 5 degrees of freedom. Create a vector autoregression (VAR) model that represents the state equation of the DGP. trueTheta = [0.5; -0.75; 1; 0.5]; trueDoF = 5; phi = [trueTheta(1) 0; 0 trueTheta(2)]; Sigma = [trueTheta(3)^2 0; 0 trueTheta(4)^2]; DGP = varm(AR={phi},Covariance=Sigma,Constant=[0; 0]);
Filter a random 2-D multivariate central t series of length 500 through the VAR model to obtain state values. Set the degrees of freedom to 5. rng(10) % For reproducibility T = 500; trueU = mvtrnd(eye(DGP.NumSeries),trueDoF,T); X = filter(DGP,trueU);
Obtain a series of observations from the DGP by the linear combination yt = x1, e + 3x2, t. C = [1 3]; y = X*C';
Consider a Bayesian state-space model representing the model with parameters Θ and νu treated as unknown. Arbitrarily assume that the prior distribution of the parameters in Θ are independent Gaussian random variables with mean 0.5 and variance 1. Assume that the prior on the degrees of freedom νu is flat. The functions in Local Functions on page 12-666 specify the state-space structure and prior distributions. Create the Bayesian state-space model by passing function handles to the paramMap and priorDistribution functions to bssm. Specify that the state disturbance distribution is multivariate Student's t with unknown degrees of freedom. 12-664
estimate
PriorMdl = bssm(@paramMap,@priorDistribution,StateDistribution="t");
PriorMdl is a bssm object representing the Bayesian state-space model with unknown parameters. Estimate the posterior distribution by using estimate. Specify a random set of positive values in [0,1] to initialize the MCMC algorithm. Set the burn-in period of the MCMC algorithm to 1000 draws, thin the entire MCMC sample by retaining every third draw, and set the proposal scale matrix proportionality constant to 0.25 to increase the proposal acceptance rate. numParamsTheta = 4; theta0 = rand(numParamsTheta,1); PosteriorMdl = estimate(PriorMdl,y,theta0,Thin=3,BurnIn=1000,Proportion=0.25); Local minimum found. Optimization completed because the size of the gradient is less than the value of the optimality tolerance. Optimization and Tuning | Params0 Optimized ProposalStd ---------------------------------------c(1) | 0.9219 0.3622 0.1151 c(2) | 0.9475 -0.7530 0.0454 c(3) | 0.2299 1.3465 0.1917 c(4) | 0.6759 0.5891 0.0545 Posterior Distributions | Mean Std Quantile05 Quantile95 ---------------------------------------------------c(1) | 0.4516 0.1036 0.2641 0.5972 c(2) | -0.7459 0.0376 -0.8085 -0.6863 c(3) | 0.9816 0.1526 0.7430 1.2494 c(4) | 0.4843 0.0402 0.4234 0.5520 x(1) | -1.1901 0.9271 -2.7359 0.2539 x(2) | 0.2133 0.3090 -0.2680 0.7286 StateDoF | 5.3980 0.7931 4.1574 6.7111 Proposal acceptance rate = 51.10% ThetaPostDraws = PosteriorMdl.ParamDistribution; [numParams,numDraws] = size(ThetaPostDraws) numParams = 4 numDraws = 1000
estimate displays a summary of the posterior distribution of the parameters Θ (c(1) through c(4)), the final values of the two states (x(1) and x(2)), and νu (StateDoF). The true values of the parameters are close to their corresponding posterior means; all are within their corresponding 95% credible intervals. PosteriorMdl is a bssm object representing the posterior distribution. • PosteriorMdl.ParamMap is the function handle to the function representing the state-space model structure. It is the same function as PrioirMdl.ParamMap. • ThetaPostDraws is a 4-by-1000 matrix of draws from the posterior distribution of Θ | y. To diagnose the Markov chain induced by the Metropolis-Hastings sampler, create trace plots of the posterior parameter draws of Θ. 12-665
12
Functions
paramNames = ["\phi_1" "\phi_2" "\sigma_1" "\sigma_2"]; figure h = tiledlayout(4,1); for j = 1:numParams nexttile plot(ThetaPostDraws(j,:)) hold on yline(trueTheta(j)) ylabel(paramNames(j)) end title(h,"Posterior Trace Plots")
The sample shows some serial correlation. You can increase Thin to remedy this behavior. Local Functions This example uses the following functions. paramMap is the parameter-to-matrix mapping function and priorDistribution is the log prior distribution of the parameters. function [A,B,C,D,Mean0,Cov0,StateType] = paramMap(theta) A = [theta(1) 0; 0 theta(2)]; B = [theta(3) 0; 0 theta(4)]; C = [1 3]; D = 0; Mean0 = []; % MATLAB uses default initial state mean Cov0 = []; % MATLAB uses initial state covariances StateType = [0; 0]; % Two stationary states end
12-666
estimate
function logprior = priorDistribution(theta) paramconstraints = [(abs(theta(1)) >= 1) (abs(theta(2)) >= 1) ... (theta(3) < 0) (theta(4) < 0)]; if(sum(paramconstraints)) logprior = -Inf; else mu0 = 0.5*ones(numel(theta),1); sigma0 = 1; p = normpdf(theta,mu0,sigma0); logprior = sum(log(p)); end end
Estimate Posterior of Time-Varying Model Consider the following time-varying, state-space model for a DGP: • From periods 1 through 250, the state equation includes stationary AR(2) and MA(1) models, respectively, and the observation model is the weighted sum of the two states. • From periods 251 through 500, the state model includes only the first AR(2) model. • μ0 = 0 . 5 0 . 5 0 0 and Σ0 is the identity matrix. Symbolically, the DGP is x1t x2t x3t
=
x4t
ϕ1 ϕ2 0 0 x1, t − 1 1 0 0 0 x2, t − 1 0 0 0 θ x3, t − 1 0 0 0 0 x4, t − 1
σ1 0 0 0 u1t 0 1 u2t for t = 1, . . . , 250, 0 1
+
yt = c1 x1t + x3t + σ2εt x1, t − 1 x1t x2t
=
ϕ1 ϕ2 0 0 x2, t − 1 1 0 0 0 x3, t − 1
σ1
+
0
u1t
for t = 251,
x4, t − 1 yt = c2x1t + σ3εt x1t x2t
=
ϕ1 ϕ2 x1, t − 1 1 0 x2, t − 1
+
σ1 0
u1t
for t = 252, . . . , 500,
yt = c2x1t + σ3εt where: • The AR(2) parameters ϕ1, ϕ2 = 0 . 5, − 0 . 2 and σ1 = 0 . 4. • The MA(1) parameter θ = 0 . 3. • The observation equation parameters c1, c2 = 2, 3 and σ2, σ3 = 0 . 1, 0 . 2 . Write a function that specifies how the parameters theta and sample size T map to the state-space model matrices, the initial state moments, and the state types. Save this code as a file named 12-667
12
Functions
timeVariantParamMapBayes.m on your MATLAB® path. Alternatively, open the example to access the function. type timeVariantParamMapBayes.m % Copyright 2022 The MathWorks, Inc. function [A,B,C,D,Mean0,Cov0,StateType] = timeVariantParamMapBayes(theta,T) % Time-variant, Bayesian state-space model parameter mapping function % example. This function maps the vector params to the state-space matrices % (A, B, C, and D), the initial state value and the initial state variance % (Mean0 and Cov0), and the type of state (StateType). From periods 1 % through T/2, the state model is a stationary AR(2) and an MA(1) model, % and the observation model is the weighted sum of the two states. From % periods T/2 + 1 through T, the state model is the AR(2) model only. The % log prior distribution enforces parameter constraints (see % flatPriorBSSM.m). T1 = floor(T/2); T2 = T - T1 - 1; A1 = {[theta(1) theta(2) 0 0; 1 0 0 0; 0 0 0 theta(4); 0 0 0 0]}; B1 = {[theta(3) 0; 0 0; 0 1; 0 1]}; C1 = {theta(5)*[1 0 1 0]}; D1 = {theta(6)}; Mean0 = [0.5 0.5 0 0]; Cov0 = eye(4); StateType = [0 0 0 0]; A2 = {[theta(1) theta(2) 0 0; 1 0 0 0]}; B2 = {[theta(3); 0]}; A3 = {[theta(1) theta(2); 1 0]}; B3 = {[theta(3); 0]}; C3 = {theta(7)*[1 0]}; D3 = {theta(8)}; A = [repmat(A1,T1,1); A2; repmat(A3,T2,1)]; B = [repmat(B1,T1,1); B2; repmat(B3,T2,1)]; C = [repmat(C1,T1,1); repmat(C3,T2+1,1)]; D = [repmat(D1,T1,1); repmat(D3,T2+1,1)]; end
Simulate a response path of length 500 from the model. params = [0.5; -0.2; 0.4; 0.3; 2; 0.1; 3; 0.2]; numParams = numel(params); numObs = 500; [A,B,C,D,mean0,Cov0,stateType] = timeVariantParamMapBayes(params,numObs); DGP = ssm(A,B,C,D,Mean0=mean0,Cov0=Cov0,StateType=stateType); rng(1) % For reproducibility y = simulate(DGP,numObs); plot(y) ylabel("y")
12-668
estimate
Write a function that specifies a flat prior distribution on the state-space model parameters theta. The function returns the scalar log prior for an input set of parameters. Save this code as a file named flatPriorBSSM.m on your MATLAB® path. Alternatively, open the example to access the function. type flatPriorBSSM.m % Copyright 2022 The MathWorks, Inc. function logprior = flatPriorBSSM(theta) % flatPriorBSSM computes the log of the flat prior density for the eight % variables in theta (see timeVariantParamMapBayes.m). Log probabilities % for parameters outside the parameter space are -Inf. % theta(1) and theta(2) are lag 1 and lag 2 terms in a stationary AR(2) % model. The eigenvalues of the AR(1) representation need to be within % the unit circle. evalsAR2 = eig([theta(1) theta(2); 1 0]); evalsOutUC = sum(abs(evalsAR2) >= 1) > 0; % Standard deviations of disturbances and errors (theta(3), theta(6), % and theta(8)) need to be positive. nonnegsig1 = theta(3) Mdl.P, estimate uses the latest required number of observations only. The last element or row contains the latest observation. By default, estimate backcasts the error model for the required presample unconditional disturbances. Data Types: double Presample — Presample data table | timetable Presample data containing the error model residual series, associated with the model innovations εt, or the regression residual series, associated with the unconditional disturbances ut, to initialize the model for estimation, specified as a table or timetable, the same type as Tbl1, with numprevars variables and numpreobs rows. Use Presample only when you supply a table or timetable of data Tbl1. Each selected variable is a single path of numpreobs observations representing the presample of error or regression model residuals associated the selected response variable in Tbl1. Each row is a presample observation, and measurements in each row occur simultaneously. numpreobs must satisfy one of the following conditions: • numpreobs ≥ Mdl.P when Presample provides only presample regression model residuals • numpreobs ≥ Mdl.Q when Presample provides only presample error model residuals • numpreobs ≥ max([Mdl.P Mdl.Q]) when Presample provides presample error and regression model residuals. If you supply more rows than necessary, estimate uses the latest required number of observations only. If Presample is a timetable, all the following conditions must be true: • Presample must represent a sample with a regular datetime time step (see isregular). • The inputs Tbl1 and Presample must be consistent in time such that Presample immediately precedes Tbl1 with respect to the sampling frequency and order. • The datetime vector of sample timestamps Presample.Time must be ascending or descending. If Presample is a table, the last row contains the latest presample observation. By default, estimate backcasts for necessary presample regression model residuals and it sets necessary presample error model residuals to zero. If you specify Presample, you must specify at least one of the presample regression or error model residual variable names by using the PresampleRegressionDisturbanceVariable or PresampleInnovationVariable name-value argument, respectively. PresampleInnovationVariable — Error model residual variable to select from Presample string scalar | character vector | integer | logical vector Error model residual variable to select from Presample containing presample error model residual data, associated with the model innovations εt, specified as one of the following data types: 12-777
12
Functions
• String scalar or character vector containing the variable name to select from Presample.Properties.VariableNames • Variable index (positive integer) to select from Presample.Properties.VariableNames • A logical vector, where PresampleInnovationVariable(j) = true selects variable j from Presample.Properties.VariableNames The selected variable must be a numeric vector and cannot contain missing values (NaNs). If you specify presample error model residual data by using the Presample name-value argument, you must specify PresampleInnovationVariable. Example: PresampleInnovationVariable="GDPInnov" Example: PresampleInnovationVariable=[false false true false] or PresampleInnovationVariable=3 selects the third table variable for presample error model residual data. Data Types: double | logical | char | cell | string PresampleRegressionDistrubanceVariable — Regression model residual variable to select from Presample string scalar | character vector | integer | logical vector Regression model residual variable to select from Presample containing presample data for the regression model residuals, associated with the unconditional disturbances ut, specified as one of the following data types: • String scalar or character vector containing a variable name in Presample.Properties.VariableNames • Variable index (positive integer) to select from Presample.Properties.VariableNames • A logical vector, where PresampleRegressionDistrubanceVariable(j) = true selects variable j from Presample.Properties.VariableNames The selected variable must be a numeric vector and cannot contain missing values (NaNs). If you specify presample regression residual data by using the Presample name-value argument, you must specify PresampleRegressionDistrubanceVariable. Example: PresampleRegressionDistrubanceVariable="StockRateU" Example: PresampleRegressionDistrubanceVariable=[false false true false] or PresampleRegressionDistrubanceVariable=3 selects the third table variable as the presample regression model residual data. Data Types: double | logical | char | cell | string Initial Parameter Value Specifications
Intercept0 — Initial estimate of regression model intercept c numeric scalar Initial estimate of the regression model intercept c, specified as a numeric scalar. By default, estimate derives initial estimates using standard time series techniques. Data Types: double 12-778
estimate
AR0 — Initial estimates of nonseasonal autoregressive (AR) polynomial coefficients ɑ(L) numeric vector Initial estimates of the nonseasonal AR polynomial coefficients ɑ(L), specified as a numeric vector. Elements of AR0 correspond to nonzero cells of Mdl.AR. By default, estimate derives initial estimates using standard time series techniques. Data Types: double SAR0 — Initial estimates of seasonal AR polynomial coefficients A(L) numeric vector Initial estimates of the seasonal AR polynomial coefficients A(L), specified as a numeric vector. Elements of SAR0 correspond to nonzero cells of Mdl.SAR. By default, estimate derives initial estimates using standard time series techniques. Data Types: double MA0 — Initial estimates of nonseasonal moving average (MA) polynomial coefficients b(L) numeric vector Initial estimates of the nonseasonal MA polynomial coefficients b(L), specified as a numeric vector. Elements of MA0 correspond to elements of Mdl.MA. By default, estimate derives initial estimates using standard time series techniques. Data Types: double SMA0 — Initial estimates of seasonal MA polynomial coefficients B(L) numeric vector Initial estimates of the seasonal moving average polynomial coefficients B(L), specified as a numeric vector. Elements of SMA0 correspond to nonzero cells of Mdl.SMA. By default, estimate derives initial estimates using standard time series techniques. Data Types: double Beta0 — Initial estimates of regression coefficients numeric vector Initial estimates of the regression coefficients β, specified as a numeric vector. The length of Beta0 must equal the numpreds. Elements of Beta0 correspond to the predictor variables represented by the columns of X or PredictorVariables. By default, estimate derives initial estimates using standard time series techniques. Data Types: double DoF0 — Initial estimate of t-distribution degrees-of-freedom parameter 10 (default) | positive scalar 12-779
12
Functions
Initial estimate of the t-distribution degrees-of-freedom parameter ν, specified as a positive scalar. DoF0 must exceed 2. Data Types: double Variance0 — Initial estimates of error model innovation variance σt2 positive scalar Initial estimate of the error model innovation variance σt2, specified as a positive scalar. By default, estimate derives initial estimates using standard time series techniques. Example: Variance0=2 Data Types: double Note • NaN values in y, X, E0, and U0 indicate missing values. estimate removes missing values from specified data by listwise deletion. • For the presample, estimate horizontally concatenates E0 and U0, and then it removes any row of the concatenated matrix containing at least one NaN. • For the estimation sample, estimate horizontally concatenates y and X, and then it removes any row of the concatenated matrix containing at least one NaN. • Regardless of sample, estimate synchronizes the specified, possibly jagged vectors with respect to the latest observation of the sample (last row). This type of data reduction reduces the effective sample size and can create an irregular time series. • estimate issues an error when any table or timetable input contains missing values. • The intercept c of a regression model with ARIMA errors having nonzero degrees of seasonal or nonseasonal integration, Mdl.Seasonality or Mdl.D, is not identifiable. In other words, estimate cannot estimate an intercept of a regression model with ARIMA errors that has nonzero degrees of seasonal or nonseasonal integration. If you pass in such a model for estimation, estimate displays a warning in the Command Window and sets EstMdl.Intercept to NaN. • If you specify the Display name-value argument, the value takes precedence over the specifications of the optimization options Diagnostics and Display. Otherwise, estimate honors all selections related to the display of optimization information in the optimization options.
Output Arguments EstMdl — Estimated regression model with ARIMA errors regARIMA model object Estimated regression model with ARIMA errors, returned as a regARIMA model object. estimate uses maximum likelihood to calculate all parameter estimates not constrained by Mdl (that is, it estimates all parameters in Mdl that you set to NaN). EstMdl is a copy of Mdl that has NaN values replaced with parameter estimates. EstMdl is fully specified. 12-780
estimate
EstParamCov — Estimated covariance matrix of maximum likelihood estimates positive semidefinite numeric matrix Estimated covariance matrix of maximum likelihood estimates known to the optimizer, returned as a positive semidefinite numeric matrix. The rows and columns contain the covariances of the parameter estimates. The standard error of each parameter estimate is the square root of the main diagonal entries. The rows and columns corresponding to any parameters held fixed as equality constraints are zero vectors. Parameters corresponding to the rows and columns of EstParamCov appear in the following order: • Intercept • Nonzero AR coefficients at positive lags, from the smallest to largest lag • Nonzero SAR coefficients at positive lags, from the smallest to largest lag • Nonzero MA coefficients at positive lags, from the smallest to largest lag • Nonzero SMA coefficients at positive lags, from the smallest to largest lag • Regression coefficients (when you specify exogenous data), ordered by the columns of X or entries of PredictorVariables • Innovations variance • Degrees of freedom (t-innovation distribution only) estimate uses the outer product of gradients (OPG) method to perform covariance matrix estimation on page 3-60. Data Types: double logL — Optimized loglikelihood objective function value numeric scalar Optimized loglikelihood objective function value, returned as a numeric scalar. Data Types: double info — Optimization summary structure array Optimization summary, returned as a structure array with the fields described in this table. Field
Description
exitflag
Optimization exit flag (see fmincon in Optimization Toolbox)
options
Optimization options controller (see optimoptions and fmincon in Optimization Toolbox)
X
Vector of final parameter estimates
X0
Vector of initial parameter estimates
For example, you can display the vector of final estimates by entering info.X in the Command Window. Data Types: struct 12-781
12
Functions
Tip • To access values of the estimation results, including the number of free parameters in the model, pass EstMdl to summarize.
Algorithms estimate estimates the parameters as follows: 1
Initialize the model by applying initial data and parameter values.
2
Infer the unconditional disturbances from the regression model.
3
Infer the residuals of the ARIMA error model.
4
Use the distribution of the innovations to build the likelihood function.
5
Maximize the loglikelihood function with respect to the parameters using fmincon.
Version History Introduced in R2013b R2023b: estimate accepts input data in tables and timetables In addition to accepting input data (in-sample and presample data) in numeric arrays, estimate accepts input data in tables or regular timetables. When you supply data in a table or timetable, estimate chooses the default series on which to operate, but you can use the specified optional name-value argument to select a different series. Name-value arguments to support tabular workflows include: • ResponseVariable specifies the variable name of the response series in the input data Tbl1, to which the model is fit. • PredictorVariables specifies the names of the predictor series to select from the input data for the model regression component. • Presample specifies the input table or timetable of presample response, regression model residual, and error model residual data. • PresampleResponseVariable specifies the variable name of the response series to select from Presample. • PresampleInnovationVariable specifies the variable name of the error model residual series to select from Presample. • PresampleRegressionDisturbanceVariable specifies the name of the regression residual series to select from Presample. R2019b: estimate includes the final lag in all estimated univariate time series model polynomials Behavior changed in R2019b estimate includes the final polynomial lag as specified in the input model template for estimation. In other words, the specified polynomial degrees of an input model template returned by an object creation function and the corresponding polynomial degrees of the estimated model returned by estimate are equal. 12-782
estimate
Before R2019b, estimate removed trailing lags estimated below the tolerance of 1e-12. Update Code
Polynomial degrees require minimum presample observations for operations downstream of estimation, such as model forecasting and simulation. If a model template in your code does not describe the data generating process well, then the polynomials in the estimated model can have higher degrees than in previous releases. Consequently, you must supply additional presample responses for operations on the estimated model; otherwise, the function issues an error. For more details, see the Y0 name-value argument.
References [1] Box, George E. P., Gwilym M. Jenkins, and Gregory C. Reinsel. Time Series Analysis: Forecasting and Control. 3rd ed. Englewood Cliffs, NJ: Prentice Hall, 1994. [2] Davidson, R., and J. G. MacKinnon. Econometric Theory and Methods. Oxford, UK: Oxford University Press, 2004. [3] Enders, Walter. Applied Econometric Time Series. Hoboken, NJ: John Wiley & Sons, Inc., 1995. [4] Hamilton, James D. Time Series Analysis. Princeton, NJ: Princeton University Press, 1994. [5] Pankratz, A. Forecasting with Dynamic Regression Models. John Wiley & Sons, Inc., 1991. [6] Tsay, R. S. Analysis of Financial Time Series. 2nd ed. Hoboken, NJ: John Wiley & Sons, Inc., 2005.
See Also Objects regARIMA Functions forecast | infer | simulate | summarize Topics “Estimate Regression Model with ARIMA Errors” on page 5-88 “Intercept Identifiability in Regression Models with ARIMA Errors” on page 5-109 “Alternative ARIMA Model Representations” on page 5-113 “Maximum Likelihood Estimation for Conditional Mean Models” on page 7-106 “Conditional Mean Model Estimation with Equality Constraints” on page 7-108 “Presample Data for Conditional Mean Model Estimation” on page 7-109 “Initial Values for Conditional Mean Model Estimation” on page 7-111 “Optimization Settings for Conditional Mean Model Estimation” on page 7-113
12-783
12
Functions
estimate Maximum likelihood parameter estimation of state-space models
Syntax EstMdl = estimate(Mdl,Y,params0) EstMdl = estimate(Mdl,Y,params0,Name,Value) [EstMdl,estParams,EstParamCov,logL,Output] = estimate( ___ )
Description EstMdl = estimate(Mdl,Y,params0) returns an estimated state-space model on page 11-3 from fitting the ssm model Mdl to the response data Y. params0 is the vector of initial values for the unknown parameters in Mdl. EstMdl = estimate(Mdl,Y,params0,Name,Value) estimates the state-space model with additional options specified by one or more Name,Value pair arguments. For example, you can specify to deflate the observations by a linear regression using predictor data, control how the results appear in the Command Window, and indicate which estimation method to use for the parameter covariance matrix. [EstMdl,estParams,EstParamCov,logL,Output] = estimate( ___ ) additionally returns: • estParams, a vector containing the estimated parameters • EstParamCov, the estimated variance-covariance matrix of the estimated parameters • logL, the optimized loglikelihood value • Output, optimization diagnostic information structure using any of the input arguments in the previous syntaxes.
Examples Fit Time-Invariant State-Space Model to Data Generate data from a known model, and then fit a state-space model to the data. Suppose that a latent process is this AR(1) process xt = 0 . 5xt − 1 + ut, where ut is Gaussian with mean 0 and standard deviation 1. Generate a random series of 100 observations from xt, assuming that the series starts at 1.5. T = 100; ARMdl = arima('AR',0.5,'Constant',0,'Variance',1); x0 = 1.5; rng(1); % For reproducibility x = simulate(ARMdl,T,'Y0',x0);
12-784
estimate
Suppose further that the latent process is subject to additive measurement error as indicated in the equation yt = xt + εt, where εt is Gaussian with mean 0 and standard deviation 0.1. Use the random latent state process (x) and the observation equation to generate observations. y = x + 0.1*randn(T,1);
Together, the latent process and observation equations compose a state-space model. Supposing that the coefficients and variances are unknown parameters, the state-space model is xt = ϕxt − 1 + σ1ut yt = xt + σ2εt . Specify the state-transition matrix. Use NaN values for unknown parameters. A = NaN;
Specify the state-disturbance-loading coefficient matrix. B = NaN;
Specify the measurement-sensitivity coefficient matrix. C = 1;
Specify the observation-innovation coefficient matrix D = NaN;
Specify the state-space model using the coefficient matrices. Also, specify the initial state mean, variance, and distribution (which is stationary). Mean0 = 0; Cov0 = 10; StateType = 0; Mdl = ssm(A,B,C,D,'Mean0',Mean0,'Cov0',Cov0,'StateType',StateType);
Mdl is an ssm model. Verify that the model is correctly specified using the display in the Command Window. Pass the observations to estimate to estimate the parameter. Set a starting value for the parameter to params0. σ1 and σ2 must be positive, so set the lower bound constraints using the 'lb' namevalue pair argument. Specify that the lower bound of ϕ is -Inf. params0 = [0.9; 0.5; 0.1]; EstMdl = estimate(Mdl,y,params0,'lb',[-Inf; 0; 0]) Method: Maximum likelihood (fmincon) Sample size: 100 Logarithmic likelihood: -140.532 Akaike info criterion: 287.064 Bayesian info criterion: 294.879 | Coeff Std Err t Stat
Prob
12-785
12
Functions
------------------------------------------------c(1) | 0.45425 0.19870 2.28611 0.02225 c(2) | 0.89013 0.30359 2.93205 0.00337 c(3) | 0.38750 0.57858 0.66975 0.50302 | | Final State Std Dev t Stat Prob x(1) | 1.52989 0.35621 4.29498 0.00002 EstMdl = State-space model type: ssm State vector length: 1 Observation vector length: 1 State disturbance vector length: 1 Observation innovation vector length: 1 Sample size supported by model: Unlimited State variables: x1, x2,... State disturbances: u1, u2,... Observation series: y1, y2,... Observation innovations: e1, e2,... State equation: x1(t) = (0.45)x1(t-1) + (0.89)u1(t) Observation equation: y1(t) = x1(t) + (0.39)e1(t) Initial state distribution: Initial state means x1 0 Initial state covariance matrix x1 x1 10 State types x1 Stationary
EstMdl is an ssm model. The results of the estimation appear in the Command Window, contain the fitted state-space equations, and contain a table of parameter estimates, their standard errors, t statistics, and p-values. Use dot notation to use or display the fitted state-transition matrix. EstMdl.A ans = 0.4543
Pass EstMdl to forecast to forecast observations, or to simulate to conduct a Monte Carlo study.
12-786
estimate
Estimate State-Space Model Containing Regression Component Suppose that the linear relationship between the change in the unemployment rate and the nominal gross national product (nGNP) growth rate is of interest. Suppose further that the first difference of the unemployment rate is an ARMA(1,1) series. Symbolically, and in state-space form, the model is x1, t x2, t
=
1 ϕ θ x1, t − 1 + u1, t 1 0 0 x2, t − 1
yt − βZt = x1, t + σεt, where: • x1, t is the change in the unemployment rate at time t. • x2, t is a dummy state for the MA(1) effect. •
y1, t is the observed change in the unemployment being deflated by the growth rate of nGNP (Zt).
• u1, t is the Gaussian series of state disturbances having mean 0 and standard deviation 1. • εt is the Gaussian series of observation innovations having mean 0 and standard deviation σ. Load the Nelson-Plosser data set, which contains the unemployment rate and nGNP series, among other things. load Data_NelsonPlosser
Preprocess the data by taking the natural logarithm of the nGNP series and the first difference of each. Also, remove the starting NaN values from each series. isNaN = any(ismissing(DataTable),2); gnpn = DataTable.GNPN(~isNaN); u = DataTable.UR(~isNaN); T = size(gnpn,1); Z = [ones(T-1,1) diff(log(gnpn))]; y = diff(u);
% Flag periods containing NaNs % Sample size
This example proceeds using series without NaN values. However, using the Kalman filter framework, the software can accommodate series containing missing values. Specify the state-transition coefficient matrix. A = [NaN NaN; 0 0];
Specify the state-disturbance-loading coefficient matrix. B = [1; 1];
Specify the measurement-sensitivity coefficient matrix. C = [1 0];
Specify the observation-innovation coefficient matrix. D = NaN;
Specify the state-space model using ssm. Mdl = ssm(A,B,C,D);
12-787
12
Functions
Estimate the model parameters. Specify the regression component and its initial value for optimization using the 'Predictors' and 'Beta0' name-value pair arguments, respectively. Display the estimates and all optimization diagnostic information. Restrict the estimate of σ to all positive, real numbers. params0 = [0.3 0.2 0.1]; % Chosen arbitrarily EstMdl = estimate(Mdl,y,params0,'Predictors',Z,'Display','full',... 'Beta0',[0.1 0.2],'lb',[-Inf,-Inf,0,-Inf,-Inf]); ____________________________________________________________ Diagnostic Information Number of variables: 5 Functions Objective: Gradient: Hessian:
@(c)-fML(c,Mdl,Y,Predictors,unitFlag,sqrtFlag,mexFlag,mexTv finite-differencing bfgs
Constraints Nonlinear constraints:
do not exist
Number Number Number Number
of of of of
linear inequality constraints: linear equality constraints: lower bound constraints: upper bound constraints:
0 0 1 0
Algorithm selected interior-point ____________________________________________________________ End diagnostic information First-order Norm of Iter F-count f(x) Feasibility optimality step 0 6 2.579611e+02 0.000e+00 4.601e+01 1 20 2.556482e+02 0.000e+00 3.652e+01 1.392e-01 2 27 2.503349e+02 0.000e+00 4.319e+01 1.908e-01 3 35 2.379655e+02 0.000e+00 1.290e+01 1.083e+01 4 41 1.947747e+02 0.000e+00 1.947e+01 7.154e+00 5 47 1.606422e+02 0.000e+00 2.138e+02 1.185e+01 6 53 1.257812e+02 0.000e+00 9.342e+01 1.593e+00 7 59 1.109632e+02 0.000e+00 1.083e+01 2.498e+00 8 65 1.047697e+02 0.000e+00 1.079e+01 1.601e+00 9 72 1.033231e+02 0.000e+00 9.500e+00 7.536e-01 10 79 1.022781e+02 0.000e+00 5.056e+00 1.905e+00 11 85 1.006434e+02 0.000e+00 3.648e+00 1.316e+00 12 91 1.001322e+02 0.000e+00 3.021e+00 7.938e-01 13 97 9.979543e+01 0.000e+00 7.406e-01 8.786e-01 14 103 9.974791e+01 0.000e+00 9.977e-01 3.450e-01 15 109 9.973777e+01 0.000e+00 8.530e-01 1.973e-01 16 115 9.973329e+01 0.000e+00 7.201e-01 5.951e-02 17 121 9.973242e+01 0.000e+00 6.040e-01 7.202e-03 18 127 9.973200e+01 0.000e+00 5.630e-01 1.062e-02 19 133 9.973055e+01 0.000e+00 4.245e-01 2.889e-02 20 139 9.972864e+01 0.000e+00 3.486e-01 2.696e-02 21 145 9.972653e+01 0.000e+00 1.990e-01 1.796e-02 22 151 9.972569e+01 0.000e+00 9.999e-02 2.488e-02
12-788
estimate
23 24 25 26 27 28 29 30
157 163 169 175 181 187 193 199
9.972463e+01 9.972456e+01 9.972454e+01 9.972454e+01 9.972454e+01 9.972454e+01 9.972454e+01 9.972454e+01
0.000e+00 0.000e+00 0.000e+00 0.000e+00 0.000e+00 0.000e+00 0.000e+00 0.000e+00
5.198e-02 2.000e-02 1.903e-03 2.000e-04 3.503e-06 2.000e-06 1.401e-06 1.816e-06
2.508e-02 3.687e-03 2.211e-03 2.510e-04 3.275e-05 3.264e-07 2.049e-07 8.482e-08
Iter F-count 31 211
f(x) 9.972454e+01
Feasibility 0.000e+00
First-order optimality 1.907e-06
Norm of step 5.733e-08
Local minimum possible. Constraints satisfied. fmincon stopped because the size of the current step is less than the value of the step size tolerance and constraints are satisfied to within the value of the constraint tolerance. Method: Maximum likelihood (fmincon) Sample size: 61 Logarithmic likelihood: -99.7245 Akaike info criterion: 209.449 Bayesian info criterion: 220.003 | Coeff Std Err t Stat Prob ---------------------------------------------------------c(1) | -0.34098 0.29608 -1.15164 0.24948 c(2) | 1.05003 0.41377 2.53771 0.01116 c(3) | 0.48592 0.36790 1.32079 0.18657 y eps) ans = ans(:,:,1) = 0 ans(:,:,2) = 0 ans(:,:,3) = 0 ans(:,:,4) = 0
Row sums among the pages are close to 1. Display the contributions to the forecast error variance of the bond rate when real income is shocked at time 0. Decomposition(:,2,3) ans = 20×1 0.0499 0.1389 0.1700 0.1807 0.1777 0.1694 0.1601 0.1516 0.1446 0.1390 ⋮
Plot the FEVDs of all series on separate plots by passing the estimated AR coefficient matrices and innovations covariance matrix of Mdl to armafevd. armafevd(EstMdl.AR,[],"InnovCov",EstMdl.Covariance);
12-889
12
Functions
12-890
fevd
12-891
12
Functions
12-892
fevd
Each plot shows the four FEVDs of a variable when all other variables are shocked at time 0. Mdl.SeriesNames specifies the variable order.
Estimate Generalized FEVD of VAR Model Consider the 4-D VAR(2) model in “Specify Data in Numeric Matrix When Plotting FEVD” on page 12888. Estimate the generalized FEVD of the system for 100 periods. Load the Danish money and income data set, and then estimate the VAR(2) model. load Data_JDanish Mdl = varm(4,2); Mdl.SeriesNames = DataTable.Properties.VariableNames; Mdl = estimate(Mdl,DataTable.Series);
Estimate the generalized FEVD from the estimated VAR(2) model over a forecast horizon with length 100. Decomposition = fevd(Mdl,Method="generalized",NumObs=100);
Decomposition is a 100-by-4-by-4 array representing the generalized FEVD of Mdl. Plot the generalized FEVD of the bond rate when real income is shocked at time 0. 12-893
12
Functions
figure; plot(1:100,Decomposition(:,2,3)) title("FEVD of IB When Y Is Shocked") xlabel("Forecast Horizon") ylabel("Variance Contribution") grid on
When real income is shocked, the contribution of the bond rate to the forecast error variance settles at approximately 0.061.
Specify Data in Timetables When Computing FEVD and Confidence Intervals Fit a 4-D VAR(2) model to Danish money and income rate series data in a timetable. Then, estimate and plot the orthogonalized FEVD and corresponding confidence intervals from the estimated model. Load the Danish money and income data set. load Data_JDanish
The data set includes four time series in the timetable DataTimeTable. For more details on the data set, enter Description at the command line. Assuming that the series are stationary, create a varm model object that represents a 4-D VAR(2) model. Specify the variable names. 12-894
fevd
Mdl = varm(4,2); Mdl.SeriesNames = DataTimeTable.Properties.VariableNames;
Mdl is a varm model object specifying the structure of a 4-D VAR(2) model; it is a template for estimation. Fit the VAR(2) model to the data set. EstMdl = estimate(Mdl,DataTimeTable);
Mdl is a fully specified varm model object representing an estimated 4-D VAR(2) model. Estimate the orthogonalized FEVD and corresponding 95% confidence intervals from the estimated VAR(2) model. To return confidence intervals, you must set a name-value argument that controls confidence intervals, for example, Confidence. Set Confidence to 0.95. rng(1); % For reproducibility Tbl = fevd(EstMdl,Confidence=0.95); size(Tbl) ans = 1×2 20
12
Tbl is a timetable with 20 rows, representing the periods in the FEVD, and 12 variables. Each variable is a 20-by-4 matrix of the FEVD or confidence bound associated with a variable in the model EstMdl. For example, Tbl.M2_FEVD(:,2) is the FEVD of M2 resulting from a one-standarddeviation shock on 01-Apr-1974 (period 0) to Mdl.SeriesNames(2), which is the variable Y. [Tbl.M2_FEVD_LowerBound(:,2),Tbl.M2_FEVD_UpperBound(:,2)] are the corresponding 95% confidence intervals. Plot the FEVD of M2 and its 95% confidence interval resulting from a one-standard-deviation shock on 01-Apr-1974 (period 0) to Mdl.SeriesNames(2), which is the variable Y. idxM2 = startsWith(Tbl.Properties.VariableNames,"M2"); M2FEVD = Tbl(:,idxM2); shockIdx = 2; figure hold on plot(M2FEVD.Time,M2FEVD.M2_FEVD(:,shockIdx),"-o") plot(M2FEVD.Time,[M2FEVD.M2_FEVD_LowerBound(:,shockIdx) ... M2FEVD.M2_FEVD_UpperBound(:,shockIdx)],"-o",Color="r") legend("FEVD","95% confidence interval") title('M2 FEVD, Shock to Y') hold off
12-895
12
Functions
Monte Carlo Confidence Intervals on True FEVD Consider the 4-D VAR(2) model in “Specify Data in Numeric Matrix When Plotting FEVD” on page 12888. Estimate and plot its orthogonalized FEVD and 95% Monte Carlo confidence intervals on the true FEVD. Load the Danish money and income data set, and then estimate the VAR(2) model. load Data_JDanish Mdl = varm(4,2); Mdl.SeriesNames = DataTable.Properties.VariableNames; Mdl = estimate(Mdl,DataTable.Series);
Estimate the FEVD and corresponding 95% Monte Carlo confidence intervals from the estimated VAR(2) model. rng(1); % For reproducibility [Decomposition,Lower,Upper] = fevd(Mdl);
Decomposition, Lower, and Upper are 20-by-4-by-4 arrays representing the orthogonalized FEVD of Mdl and corresponding lower and upper bounds of the confidence intervals. For all arrays, rows correspond to consecutive time points from time 1 to 20, columns correspond to variables receiving a one-standard-deviation innovation shock at time 0, and pages correspond to the variables whose forecast error variance fevd decomposes. Mdl.SeriesNames specifies the variable order. 12-896
fevd
Plot the orthogonalized FEVD with its confidence bounds of the bond rate when real income is shocked at time 0. fevdshock2resp3 = Decomposition(:,2,3); FEVDCIShock2Resp3 = [Lower(:,2,3) Upper(:,2,3)]; figure; h1 = plot(1:20,fevdshock2resp3); hold on h2 = plot(1:20,FEVDCIShock2Resp3,'r--'); legend([h1 h2(1)],["FEVD" "95% confidence interval"], ... 'Location',"best") xlabel("Forecast Horizon"); ylabel("Variance Contribution"); title("FEVD of IB When Y Is Shocked"); grid on hold off
In the long run, and when real income is shocked, the proportion of forecast error variance of the bond rate settles between approximately 0 and 0.5 with 95% confidence.
12-897
12
Functions
Bootstrap Confidence Intervals on True FEVD Consider the 4-D VAR(2) model in “Specify Data in Numeric Matrix When Plotting FEVD” on page 12888. Estimate and plot its orthogonalized FEVD and 90% bootstrap confidence intervals on the true FEVD. Load the Danish money and income data set, and then estimate the VAR(2) model. Return the residuals from model estimation. load Data_JDanish Mdl = varm(4,2); Mdl.SeriesNames = DataTable.Properties.VariableNames; [Mdl,~,~,Res] = estimate(Mdl,DataTable.Series); T = size(DataTable,1) % Total sample size T = 55 n = size(Res,1)
% Effective sample size
n = 53
Res is a 53-by-4 array of residuals. Columns correspond to the variables in Mdl.SeriesNames. The estimate function requires Mdl.P = 2 observations to initialize a VAR(2) model for estimation. Because presample data (Y0) is unspecified, estimate takes the first two observations in the specified response data to initialize the model. Therefore, the resulting effective sample size is T – Mdl.P = 53, and rows of Res correspond to the observation indices 3 through T. Estimate the orthogonalized FEVD and corresponding 90% bootstrap confidence intervals from the estimated VAR(2) model. Draw 500 paths of length n from the series of residuals. rng(1); % For reproducibility [Decomposition,Lower,Upper] = fevd(Mdl,E=Res,NumPaths=500, ... Confidence=0.9);
Plot the orthogonalized FEVD with its confidence bounds of the bond rate when real income is shocked at time 0. fevdshock2resp3 = Decomposition(:,2,3); FEVDCIShock2Resp3 = [Lower(:,2,3) Upper(:,2,3)]; figure; h1 = plot(0:19,fevdshock2resp3); hold on h2 = plot(0:19,FEVDCIShock2Resp3,"r--"); legend([h1 h2(1)],["FEVD" "90% confidence interval"], ... 'Location',"best") xlabel("Time Index"); ylabel("Response"); title("FEVD of IB When Y Is Shocked"); grid on hold off
12-898
fevd
In the long run, and when real income is shocked, the proportion of forecast error variance of the bond rate settles between approximately 0.05 and 0.4 with 90% confidence.
Input Arguments Mdl — VAR model varm model object VAR model, specified as a varm model object created by varm or estimate. Mdl must be fully specified. If Mdl is an estimated model (returned by estimate) , you must supply any optional data using the same data type as the input response data, to which the model is fit. If Mdl is a custom varm model object (an object not returned by estimate or modified after estimation), fevd can require a sample size for the simulation SampleSize or presample responses Y0. Name-Value Pair Arguments Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter. Before R2021a, use commas to separate each name and value, and enclose Name in quotes. 12-899
12
Functions
Example: fevd(Mdl,NumObs=10,Method="generalized",E=Res) specifies estimating a generalized FEVD for periods 1 through 10, and bootstraps the residuals in the numeric array Res to compute the 95% confidence bounds. Options for All FEVDs
NumObs — Number of periods 20 (default) | positive integer Number of periods for which fevd computes the FEVD (the forecast horizon), specified as a positive integer. NumObs specifies the number of observations to include in the FEVD (the number of rows in Decomposition). Example: NumObs=10 specifies estimation of the FEVD for times 1 through 10. Data Types: double Method — FEVD computation method "orthogonalized" (default) | "generalized" | character vector FEVD computation method, specified as a value in this table. Value
Description
"orthogonalized"
Compute variance decompositions using orthogonalized, one-standard-deviation innovation shocks. fevd uses the Cholesky factorization of Mdl.Covariance for orthogonalization.
"generalized"
Compute variance decompositions using onestandard-deviation innovation shocks.
Example: Method="generalized" Data Types: char | string Options for Confidence Bound Estimation
NumPaths — Number of sample paths 100 (default) | positive integer Number of sample paths (trials) to generate, specified as a positive integer. Example: NumPaths=1000 generates 1000 sample paths from which the software derives the confidence bounds. Data Types: double SampleSize — Number of observations for Monte Carlo simulation or bootstrap per sample path positive integer Number of observations for the Monte Carlo simulation or bootstrap per sample path, specified as a positive integer. • If Mdl is an estimated varm model object (an object returned by estimate and unmodified thereafter), the default is the sample size of the data to which the model is fit (see summarize). 12-900
fevd
• Otherwise: • If fevd estimates confidence bounds by conducting a Monte Carlo simulation, you must specify SampleSize. • If fevd estimates confidence bounds by bootstrapping residuals, the default is the length of the specified series of residuals (size(Res,1), where Res is the number of residuals in E or InSample). Example: If you specify SampleSize=100 and do not specify the E name-value argument, the software estimates confidence bounds from NumPaths random paths of length 100 from Mdl. Example: If you specify SampleSize=100,E=Res, the software resamples, with replacement, 100 observations (rows) from Res to form a sample path of innovations to filter through Mdl. The software forms NumPaths random sample paths from which it derives confidence bounds. Data Types: double Y0 — Presample response data numeric matrix Presample response data that provides initial values for model estimation during the simulation, specified as a numpreobs-by-numseries numeric matrix. Use Y0 only in the following situations: • You supply other optional data inputs as numeric matrices. • Mdl is an estimated varm model object (an object returned by estimate and unmodified thereafter) fit to a numeric matrix of response data. numpreobs is the number of presample observations. numseries is Mdl.NumSeries, the dimensionality of the input model. Each row is a presample observation, and measurements in each row occur simultaneously. The last row contains the latest presample observation. numpreobs is the number of specified presample responses and it must be at least Mdl.P. If you supply more rows than necessary, fevd uses the latest Mdl.P observations only. numseries is the dimensionality of the input VAR model Mdl.NumSeries. Columns must correspond to the response variables in Mdl.SeriesNames. The following situations determine the default or whether presample response data is required. • If Mdl is an unmodified estimated model, fevd sets Y0 to the presample response data used for estimation by default (see the Y0 name-value argument of estimate). • If Mdl is a custom model and you return confidence bounds Lower or Upper, you must specify Y0. Data Types: double Presample — Presample data table | timetable Presample data that provides initial values for the model Mdl, specified as a table or timetable with numprevars variables and numpreobs rows. Use Presample only in the following situations: • You supply other optional data inputs as tables or timetables. • Mdl is an estimated varm model object (an object returned by estimate and unmodified thereafter) fit to response data in a table or timetable. 12-901
12
Functions
Each row is a presample observation, and measurements in each row occur simultaneously. numpreobs must be at least Mdl.P. If you supply more rows than necessary, fevd uses the latest Mdl.P observations only. Each variable is a numpreobs numeric vector representing one path. To control presample variable selection, see the optional PresampleResponseVariables name-value argument. If Presample is a timetable, all the following conditions must be true: • Presample must represent a sample with a regular datetime time step (see isregular). • The datetime vector of sample timestamps Presample.Time must be ascending or descending. If Presample is a table, the last row contains the latest presample observation. The following situations determine the default or whether presample response data is required. • If Mdl is an unmodified estimated model, fevd sets Presample to the presample response data used for estimation by default (see the Presample name-value argument of estimate). • If Mdl is a custom model (for example, you modify a model after estimation by using dot notation) and you return confidence bounds in the table or timetable Tbl, you must specify Presample. PresampleResponseVariables — Variables to select from Presample to use for presample response data string vector | cell vector of character vectors | vector of integers | logical vector Variables to select from Presample to use for presample data, specified as one of the following data types: • String vector or cell vector of character vectors containing numseries variable names in Presample.Properties.VariableNames • A length numseries vector of unique indices (integers) of variables to select from Presample.Properties.VariableNames • A length numprevars logical vector, where PresampleResponseVariables(j) = true selects variable j from Presample.Properties.VariableNames, and sum(PresampleResponseVariables) is numseries PresampleResponseVariables applies only when you specify Presample. The selected variables must be numeric vectors and cannot contain missing values (NaN). PresampleResponseNames does not need to contain the same names as in Mdl.SeriesNames; fevd uses the data in selected variable PresampleResponseVariables(j) as a presample for Mdl.SeriesNames(j). If the number of variables in Presample matches Mdl.NumSeries, the default specifies all variables in Presample. If the number of variables in Presample exceeds Mdl.NumSeries, the default matches variables in Presample to names in Mdl.SeriesNames. Example: PresampleResponseVariables=["GDP" "CPI"] Example: PresampleResponseVariables=[true false true false] or PresampleResponseVariable=[1 3] selects the first and third table variables for presample data. Data Types: double | logical | char | cell | string 12-902
fevd
X — Predictor data numeric matrix Predictor data xt for estimating the model regression component during the simulation, specified as a numeric matrix containing numpreds columns. Use X only in the following situations: • You supply other optional data inputs as numeric matrices. • Mdl is an estimated varm model object (an object returned by estimate and unmodified thereafter) fit to a numeric matrix of response data. numpreds is the number of predictor variables (size(Mdl.Beta,2)). Each row corresponds to an observation, and measurements in each row occur simultaneously. The last row contains the latest observation. X must have at least SampleSize rows. If you supply more rows than necessary, fevd uses only the latest observations. fevd does not use the regression component in the presample period. Columns correspond to individual predictor variables. All predictor variables are present in the regression component of each response equation. To maintain model consistency when fevd estimates the confidence bounds, specify predictor data when Mdl has a regression component. If Mdl is an estimated model, specify the predictor data used during model estimation (see the X name-value argument of estimate). By default, fevd excludes the regression component from confidence bound estimation, regardless of its presence in Mdl. Data Types: double E — Series of residuals et from which to draw bootstrap samples numeric matrix Series of residuals from which to draw bootstrap samples, specified as a numperiods-by-numseries numeric matrix. fevd assumes that E is free of serial correlation. Use E only in the following situations: • You supply other optional data inputs as numeric matrices. • Mdl is an estimated varm model object (an object returned by estimate and unmodified thereafter) fit to a numeric matrix of response data. Each column is the residual series corresponding to the response series names in Mdl.SeriesNames. Each row corresponds to a period in the FEVD and the corresponding confidence bounds. If Mdl is an estimated varm model object (an object returned by estimate), you can specify E as the inferred residuals from estimation (see the E output argument of estimate or infer). By default, fevd derives confidence bounds by conducting a Monte Carlo simulation. Data Types: double InSample — Time series data table | timetable 12-903
12
Functions
Time series data containing numvars variables, including numseries variables of residuals et to bootstrap or numpreds predictor variables xt for the model regression component, specified as a table or timetable. Use InSample only in the following situations: • You supply other optional data inputs as tables or timetables. • Mdl is an estimated varm model object (an object returned by estimate and unmodified thereafter) fit to response data in a table or timetable. Each variable is a single path of observations, which fevd applies to all NumPaths sample paths. If you specify Presample, you must specify which variables are residuals and predictors. See the ResidualVariables and PredictorVariables name-value arguments. Each row is an observation, and measurements in each row occur simultaneously. InSample must have at least SampleSize rows. If you supply more rows than necessary, fevd uses only the latest observations. If InSample is a timetable, the following conditions apply: • InSample must represent a sample with a regular datetime time step (see isregular). • The datetime vector InSample.Time must be ascending or descending. • Presample must immediately precede InSample, with respect to the sampling frequency. If InSample is a table, the last row contains the latest observation. By default, fevd derives confidence bounds by conducting a Monte Carlo simulation and does not use the regression component, regardless of its presence in Mdl. ResidualVariables — Variables to select from InSample to treat as residuals et for bootstrapping string vector | cell vector of character vectors | vector of integers | logical vector Variables to select from InSample to treat as residuals for bootstrapping, specified as one of the following data types: • String vector or cell vector of character vectors containing numseries variable names in InSample.Properties.VariableNames • A length numseries vector of unique indices (integers) of variables to select from InSample.Properties.VariableNames • A length numvars logical vector, where ResidualVariables(j) = true selects variable j from InSample.Properties.VariableNames, and sum(ResidualVariables) is numseries Regardless, selected residual variable j is the residual series for Mdl.SeriesNames(j). The selected variables must be numeric vectors and cannot contain missing values (NaN). By default, fevd derives confidence bounds by conducting a Monte Carlo simulation. Example: ResidualVariables=["GDP_Residuals" "CPI_Residuals"] Example: ResidualVariables=[true false true false] or ResidualVariable=[1 3] selects the first and third table variables as the disturbance variables. Data Types: double | logical | char | cell | string 12-904
fevd
PredictorVariables — Variables to select from InSample to treat as exogenous predictor variables xt string vector | cell vector of character vectors | vector of integers | logical vector Variables to select from InSample to treat as exogenous predictor variables xt, specified as one of the following data types: • String vector or cell vector of character vectors containing numpreds variable names in InSample.Properties.VariableNames • A length numpreds vector of unique indices (integers) of variables to select from InSample.Properties.VariableNames • A length numvars logical vector, where PredictorVariables(j) = true selects variable j from InSample.Properties.VariableNames, and sum(PredictorVariables) is numpreds Regardless, selected predictor variable j corresponds to the coefficients Mdl.Beta(:,j). PredictorVariables applies only when you specify InSample. The selected variables must be numeric vectors and cannot contain missing values (NaN). By default, fevd excludes the regression component, regardless of its presence in Mdl. Example: PredictorVariables=["M1SL" "TB3MS" "UNRATE"] Example: PredictorVariables=[true false true false] or PredictorVariable=[1 3] selects the first and third table variables as the response variables. Data Types: double | logical | char | cell | string Confidence — Confidence level 0.95 (default) | numeric scalar in [0,1] Confidence level for the confidence bounds, specified as a numeric scalar in the interval [0,1]. For each period, randomly drawn confidence intervals cover the true response 100*Confidence% of the time. The default value is 0.95, which implies that the confidence bounds represent 95% confidence intervals. Example: Confidence=0.9 specifies 90% confidence intervals. Data Types: double Note • NaN values in Y0, X, and E indicate missing data. fevd removes missing data from these arguments by list-wise deletion. For each argument, if a row contains at least one NaN, fevd removes the entire row. List-wise deletion reduces the sample size, can create irregular time series, and can cause E and X to be unsynchronized. • fevd issues an error when any table or timetable input contains missing values.
12-905
12
Functions
Output Arguments Decomposition — FEVD numeric array FEVD of each response variable, returned as a numobs-by-numseries-by-numseries numeric array. numobs is the value of NumObs. Columns and pages correspond to the response variables in Mdl.SeriesNames. fevd returns Decomposition only in the following situations: • You supply optional data inputs as numeric matrices. • Mdl is an estimated model fit to a numeric matrix of response data. Decomposition(t,j,k) is the contribution to the variance decomposition of variable k attributable to a one-standard-deviation innovation shock to variable j at time t, for t = 1,2,…,numobs, j = 1,2,...,numseries, and k = 1,2,...,numseries. Lower — Lower confidence bounds numeric array Lower confidence bounds, returned as a numobs-by-numseries-by-numseries numeric array. Elements of Lower correspond to elements of Decomposition. fevd returns Lower only in the following situations: • You supply optional data inputs as numeric matrices. • Mdl is an estimated model fit to a numeric matrix of response data. Lower(t,j,k) is the lower bound of the 100*Confidence-th percentile interval on the true contribution to the variance decomposition of variable k attributable to a one-standard-deviation innovation shock to variable j at time 0. Upper — Upper confidence bounds numeric array Upper confidence bounds, returned as a numobs-by-numseries-by-numseries numeric array. Elements of Upper correspond to elements of Decomposition. fevd returns Upper only in the following situations: • You supply optional data inputs as numeric matrices. • Mdl is an estimated model fit to a numeric matrix of response data. Upper(t,j,k) is the upper bound of the 100*Confidence-th percentile interval on the true contribution to the variance decomposition of variable k attributable to a one-standard-deviation innovation shock to variable j at time 0. Tbl — FEVD and confidence bounds table | timetable FEVD and confidence bounds, returned as a table or timetable with numobs rows. fevd returns Tbl only in the following situations: 12-906
fevd
• You supply optional data inputs as tables or timetables. • Mdl is an estimated model object fit to response data in a table or timetable. Regardless, the data type of Tbl is the same as the data type of specified data. Tbl contains the following variables: • The FEVD of each series in yt. Each FEVD variable in Tbl is a numobs-by-numseries numeric matrix, where numobs is the value of NumObs and numseries is the value of Mdl.NumSeries. fevd names the FEVD of response variable ResponseJ in Mdl.SeriesNames ResponseJ_FEVD. For example, if Mdl.Series(j) is GDP, Tbl contains a variable for the corresponding FEVD with the name GDP_FEVD. ResponseJ_FEVD(t,k) is the contribution to the variance decomposition of response variable ResponseJ attributable to a one-standard-deviation innovation shock to variable k at time t, for t = 1,2,…,numobs, J = 1,2,...,numseries, and k = 1,2,...,numseries. • The lower and upper confidence bounds on the true FEVD of the response series, when you set at least one name-value argument that controls the confidence bounds. Each confidence bound variable in Tbl is a numobs-by-numseries numeric matrix. ResponseJ_FEVD_LowerBound and ResponseJ_FEVD_UpperBound are the names of the lower and upper bound variables, respectively, of the confidence interval on the FEVD of response variable Mdl.SeriesNames(J) = ResponseJ. For example, if Mdl.SeriesNames(j) is GDP, Tbl contains variables for the corresponding lower and upper bounds of the confidence interval with the name GDP_FEVD_LowerBound and GDP_FEVD_UpperBound. (ResponseJ_FEVD_LowerBound(t,k), ResponseJ_FEVD_UpperBound(t,k)) is the 95% confidence interval on the FEVD of response variable ResponseJ attributable to a one-standarddeviation innovation shock to variable k at time t, for t = 1,2,…,numobs, J = 1,2,...,numseries, and k = 1,2,...,numseries. If Tbl is a timetable, the row order of Tbl, either ascending or descending, matches the row order of InSample, when you specify it. If you do not specify InSample and you specify Presample, the row order of Tbl is the same as the row order Presample.
More About Forecast Error Variance Decomposition The forecast error variance decomposition (FEVD) of a multivariate, dynamic system shows the relative importance of a shock to each innovation in affecting the forecast error variance of all variables in the system. Consider a numseries-D VAR(p) model on page 12-908 for the multivariate response variable yt. In lag operator notation, the infinite lag MA representation of yt is yt = Φ−1(L) c + βxt + δt + Φ−1(L)εt = Ω(L) c + βxt + δt + Ω(L)εt . The general form of the FEVD of ykt (variable k) m periods into the future, attributable to a onestandard-deviation innovation shock to yjt, is
12-907
12
Functions
m−1
∑
γm jk =
t=0 m−1
∑
t=0
ek′Cte j
2
. ek′ Ωt ΣΩt′ek
• ej is a selection vector of length numseries containing a 1 in element j and zeros elsewhere. • For orthogonalized FEVDs, Cm = ΩmP, where P is the lower triangular factor in the Cholesky factorization of Σ. • For generalized FEVDs, Cm = σ−1ΩmΣ, where σj is the standard deviation of innovation j. j • The numerator is the contribution of an innovation shock to variable j to the forecast error variance of the m-step-ahead forecast of variable k. The denominator is the mean square error (MSE) of the m-step-ahead forecast of variable k [3]. Vector Autoregression Model A vector autoregression (VAR) model is a stationary multivariate time series model consisting of a system of m equations of m distinct response variables as linear functions of lagged responses and other terms. A VAR(p) model in difference-equation notation and in reduced form is yt = c + Φ1 yt − 1 + Φ2 yt − 2 + ... + Φp yt − p + βxt + δt + εt . • yt is a numseries-by-1 vector of values corresponding to numseries response variables at time t, where t = 1,...,T. The structural coefficient is the identity matrix. • c is a numseries-by-1 vector of constants. • Φj is a numseries-by-numseries matrix of autoregressive coefficients, where j = 1,...,p and Φp is not a matrix containing only zeros. • xt is a numpreds-by-1 vector of values corresponding to numpreds exogenous predictor variables. • β is a numseries-by-numpreds matrix of regression coefficients. • δ is a numseries-by-1 vector of linear time-trend values. • εt is a numseries-by-1 vector of random Gaussian innovations, each with a mean of 0 and collectively a numseries-by-numseries covariance matrix Σ. For t ≠ s, εt and εs are independent. Condensed and in lag operator notation, the system is Φ(L)yt = c + βxt + δt + εt, where Φ(L) = I − Φ1L − Φ2L2 − ... − ΦpLp, Φ(L)yt is the multivariate autoregressive polynomial, and I is the numseries-by-numseries identity matrix.
Algorithms • If Method is "orthogonalized", then fevd orthogonalizes the innovation shocks by applying the Cholesky factorization of the model covariance matrix Mdl.Covariance. The covariance of the orthogonalized innovation shocks is the identity matrix, and the FEVD of each variable sums to one (that is, the sum along any row of Decomposition or rows associated with FEVD variables in Tbl is one). Therefore, the orthogonalized FEVD represents the proportion of forecast error 12-908
fevd
variance attributable to various shocks in the system. However, the orthogonalized FEVD generally depends on the order of the variables. If Method is "generalized", then the resulting FEVD is invariant to the order of the variables, and is not based on an orthogonal transformation. Also, the resulting FEVD sums to one for a particular variable only when Mdl.Covariance is diagonal [4]. Therefore, the generalized FEVD represents the contribution to the forecast error variance of equation-wise shocks to the response variables in the model. • If Mdl.Covariance is a diagonal matrix, then the resulting generalized and orthogonalized FEVDs are identical. Otherwise, the resulting generalized and orthogonalized FEVDs are identical only when the first variable in Mdl.SeriesNames shocks all variables (for example, all else being the same, both methods yield the same value of Decomposition(:,1,:)). • The predictor data in X or InSample represents a single path of exogenous multivariate time series. If you specify X or InSample and the model Mdl has a regression component (Mdl.Beta is not an empty array), fevd applies the same exogenous data to all paths used for confidence interval estimation. • fevd conducts a simulation to estimate the confidence bounds Lower and Upper or associated variables in Tbl. • If you do not specify residuals by supplying E or using InSample, fevd conducts a Monte Carlo simulation by following this procedure: 1
Simulate NumPaths response paths of length SampleSize from Mdl.
2
Fit NumPaths models that have the same structure as Mdl to the simulated response paths. If Mdl contains a regression component and you specify predictor data by supplying X or using InSample, fevd fits the NumPaths models to the simulated response paths and the same predictor data (the same predictor data applies to all paths).
3
Estimate NumPaths FEVDs from the NumPaths estimated models.
4
For each time point t = 0,…,NumObs, estimate the confidence intervals by computing 1 – Confidence and Confidence quantiles (the upper and lower bounds, respectively).
• Otherwise, fevd conducts a nonparametric bootstrap by following this procedure: 1
Resample, with replacement, SampleSize residuals from E or InSample. Perform this step NumPaths times to obtain NumPaths paths.
2
Center each path of bootstrapped residuals.
3
Filter each path of centered, bootstrapped residuals through Mdl to obtain NumPaths bootstrapped response paths of length SampleSize.
4
Complete steps 2 through 4 of the Monte Carlo simulation, but replace the simulated response paths with the bootstrapped response paths.
Version History Introduced in R2019a R2022b: fevd accepts input data in tables and timetables, and return results in tables and timetables
12-909
12
Functions
In addition to accepting input data in numeric arrays, fevd accepts input data in tables and timetables. fevd chooses default series on which to operate, but you can use the following namevalue arguments to select variables. • Presample specifies the input table or regular timetable of presample response data. • PresampleResponseVariables specifies the response series names from Presample. • Insample specifies the table or regular timetable of residual and predictor data to compute bootstrap estimates. • ResidualVariables specifies the residual series names in InSample. • PredictorVariables specifies the predictor series in InSample for a model regression component.
References [1] Hamilton, James D. Time Series Analysis. Princeton, NJ: Princeton University Press, 1994. [2] Lütkepohl, H. "Asymptotic Distributions of Impulse Response Functions and Forecast Error Variance Decompositions of Vector Autoregressive Models." Review of Economics and Statistics. Vol. 72, 1990, pp. 116–125. [3] Lütkepohl, Helmut. New Introduction to Multiple Time Series Analysis. New York, NY: SpringerVerlag, 2007. [4] Pesaran, H. H., and Y. Shin. "Generalized Impulse Response Analysis in Linear Multivariate Models." Economic Letters. Vol. 58, 1998, pp. 17–29.
See Also Objects varm Functions estimate | simulate | filter | irf | armafevd
12-910
fevd
fevd Generate vector error-correction (VEC) model forecast error variance decomposition (FEVD)
Syntax Decomposition = fevd(Mdl) Decomposition = fevd(Mdl,Name=Value) [Decomposition,Lower,Upper] = fevd( ___ ) Tbl = fevd( ___ )
Description The fevd function returns the forecast error decomposition on page 12-932 (FEVD) of the variables in a VEC(p – 1) model on page 12-933 attributable to shocks to each response variable in the system. A fully specified vecm model object characterizes the VEC model. The FEVD provides information about the relative importance of each innovation in affecting the forecast error variance of all response variables in the system. In contrast, the impulse response function (IRF) traces the effects of an innovation shock to one variable on the response of all variables in the system. To estimate the IRF of a VEC model characterized by a vecm model object, see irf. You can supply optional data, such as a presample, as a numeric array, table, or timetable. However, all specified input data must be the same data type. When the input model is estimated (returned by estimate), supply the same data type as the data used to estimate the model. The data type of the outputs matches the data type of the specified input data. Decomposition = fevd(Mdl) returns a numeric array containing the orthogonalized FEVDs of the response variables that compose the VEC(p – 1) model Mdl characterized by a fully specified vecm model object. fevd shocks variables at time 0, and returns the FEVD for times 1 through 20. If Mdl is an estimated model (returned by estimate) fit to a numeric matrix of input response data, this syntax applies. Decomposition = fevd(Mdl,Name=Value) uses additional options specified by one or more name-value arguments. fevd returns numeric arrays when all optional input data are numeric arrays. For example, fevd(Mdl,NumObs=10,Method="generalized") specifies estimating a generalized FEVD for periods 1 through 10. If Mdl is an estimated model fit to a numeric matrix of input response data, this syntax applies. [Decomposition,Lower,Upper] = fevd( ___ ) returns numeric arrays of lower Lower and upper Upper 95% confidence bounds for confidence intervals on the true FEVD, for each period and variable in the FEVD, using any input argument combination in the previous syntaxes. By default, fevd estimates confidence bounds by conducting Monte Carlo simulation. If Mdl is an estimated model fit to a numeric matrix of input response data, this syntax applies. 12-911
12
Functions
If Mdl is a custom vecm model object (an object not returned by estimate or modified after estimation), fevd can require a sample size for the simulation SampleSize or presample responses Y0. Tbl = fevd( ___ ) returns the table or timetable Tbl containing the FEVDs and, optionally, corresponding 95% confidence bounds, of the response variables that compose the VEC(p – 1) model Mdl. The FEVD of the corresponding response is a variable in Tbl containing a matrix with columns corresponding to the variables in the system shocked at time 0. If you set at least one name-value argument that controls the 95% confidence bounds on the FEVD, Tbl also contains a variable for each of the lower and upper bounds. For example, Tbl contains confidence bounds when you set the NumPaths name-value argument. If Mdl is an estimated model fit to a table or timetable of input response data, this syntax applies.
Examples Specify Data in Numeric Matrix When Plotting FEVD Fit a 4-D VEC(2) model with two cointegrating relations to Danish money and income rate series data in a numeric matrix. Then, estimate and plot the orthogonalized FEVD from the estimated model. Load the Danish money and income data set. load Data_JDanish
For more details on the data set, enter Description at the command line. Create a vecm model object that represents a 4-D VEC(2) model with two cointegrating relations. Specify the variable names. Mdl = vecm(4,2,2); Mdl.SeriesNames = DataTable.Properties.VariableNames;
Mdl is a vecm model object specifying the structure of a 4-D VEC(2) model; it is a template for estimation. Fit the VEC(2) model to the numeric matrix of time series data Data. Mdl = estimate(Mdl,Data);
EstMdl is a fully specified vecm model object representing an estimated 4-D VEC(2) model. Estimate the orthogonalized FEVD from the estimated VEC(2) model. Decomposition = fevd(Mdl);
Decomposition is a 20-by-4-by-4 array representing the FEVD of Mdl. Rows correspond to consecutive time points from time 1 to 20, columns correspond to variables receiving a one-standarddeviation innovation shock at time 0, and pages correspond to variables whose forecast error variance fevd decomposes. Mdl.SeriesNames specifies the variable order. By default, fevd uses the H1 Johansen form, which is the same default form that estimate uses. 12-912
fevd
Because Decomposition represents an orthogonalized FEVD, rows should sum to 1. This characteristic illustrates that orthogonalized FEVDs represent proportions of variance contributions. Confirm that all rows of Decomposition sum to 1. rowsums = sum(Decomposition,2); sum((rowsums - 1).^2 > eps) ans = ans(:,:,1) = 0 ans(:,:,2) = 0 ans(:,:,3) = 0 ans(:,:,4) = 0
Row sums among the pages are close to 1. Display the contributions to the forecast error variance of the bond rate when real income is shocked at time 0. Decomposition(:,2,3) ans = 20×1 0.0694 0.1744 0.1981 0.2182 0.2329 0.2434 0.2490 0.2522 0.2541 0.2559 ⋮
The armafevd function plots the FEVD of VAR models characterized by AR coefficient matrices. Plot the FEVD of a VEC model by: 1
Expressing the VEC(2) model as a VAR(3) model by passing Mdl to varm
2
Passing the VAR model AR coefficients and innovations covariance matrix to armafevd
Plot the VEC(2) model FEVD for 40 periods. 12-913
12
Functions
VARMdl = varm(Mdl); armafevd(VARMdl.AR,[],InnovCov=VARMdl.Covariance, ... NumObs=40);
12-914
fevd
12-915
12
Functions
12-916
fevd
Each plot shows the four FEVDs of a variable when all other variables are shocked at time 0. Mdl.SeriesNames specifies the variable order.
Estimate Generalized FEVD of VEC Model Consider the 4-D VEC(2) model with two cointegrating relations in “Specify Data in Numeric Matrix When Plotting FEVD” on page 12-912. Estimate the generalized FEVD of the system for 100 periods. Load the Danish money and income data set, then estimate the VEC(2) model. load Data_JDanish Mdl = vecm(4,2,2); Mdl.SeriesNames = DataTable.Properties.VariableNames; Mdl = estimate(Mdl,DataTable.Series);
Estimate the generalized FEVD from the estimated VEC(2) model over a forecast horizon with length 100. Decomposition = fevd(Mdl,Method="generalized",NumObs=100);
Decomposition is a 100-by-4-by-4 array representing the generalized FEVD of Mdl. Plot the generalized FEVD of the bond rate when real income is shocked at time 0. 12-917
12
Functions
figure; plot(1:100,Decomposition(:,2,3)) title("FEVD of IB When Y Is Shocked") xlabel("Forecast Horizon") ylabel("Variance Contribution") grid on
When real income is shocked, the contribution of the bond rate to the forecast error variance settles at approximately 0.08.
Specify Data in Timetables When Computing FEVD and Confidence Intervals Fit a 4-D VEC(2) model with two cointegrating relations to Danish money and income rate series data in a timetable. Then, estimate and plot the orthogonalized FEVD and corresponding confidence intervals from the estimated model. Load the Danish money and income data set. load Data_JDanish
Create a vecm model object that represents a 4-D VEC(2) model with two cointegrating relations. Specify the variable names. Mdl = vecm(4,2,2); Mdl.SeriesNames = DataTable.Properties.VariableNames;
12-918
fevd
Mdl is a vecm model object specifying the structure of a 4-D VEC(2) model; it is a template for estimation. Fit the VEC(2) model to the data set. EstMdl = estimate(Mdl,DataTimeTable);
EstMdl is a fully specified vecm model object representing an estimated 4-D VEC(2) model. Estimate the orthogonalized FEVD and corresponding 95% confidence intervals from the estimated VEC(2) model. To return confidence intervals, you must set a name-value argument that controls confidence intervals, for example, Confidence. Set Confidence to 0.95. rng(1); % For reproducibility Tbl = fevd(EstMdl,Confidence=0.95); Tbl.Time(1) ans = datetime 01-Oct-1974 size(Tbl) ans = 1×2 20
12
Tbl is a timetable with 20 rows, representing the periods in the FEVD, and 12 variables. Each variable is a 20-by-4 matrix of the FEVD or confidence bound associated with a variable in the model EstMdl. For example, Tbl.M2_FEVD(:,2) is the FEVD of M2 resulting from a 1-standard-deviation shock on 01-Jul-1974 (period 0) to Mdl.SeriesNames(2), which is the variable Y. [Tbl.M2_FEVD_LowerBound(:,2),Tbl.M2_FEVD_UpperBound(:,2)] are the corresponding 95% confidence intervals. By default, fevd uses the H1 Johansen form, which is the same default form that estimate uses. Plot the FEVD of M2 and its 95% confidence interval resulting from a 1-standard-deviation shock on 01-Jul-1974 (period 0) to Mdl.SeriesNames(2), which is the variable Y. idxM2 = startsWith(Tbl.Properties.VariableNames,"M2"); M2FEVD = Tbl(:,idxM2); shockIdx = 2; figure hold on plot(M2FEVD.Time,M2FEVD.M2_FEVD(:,shockIdx),"-o") plot(M2FEVD.Time,[M2FEVD.M2_FEVD_LowerBound(:,shockIdx) ... M2FEVD.M2_FEVD_UpperBound(:,shockIdx)],"-o",Color="r") legend("FEVD","95% confidence interval") title('M2 FEVD, Shock to Y') hold off
12-919
12
Functions
Monte Carlo Confidence Intervals on True FEVD Consider the 4-D VEC(2) model with two cointegrating relations in “Specify Data in Numeric Matrix When Plotting FEVD” on page 12-912. Estimate and plot its orthogonalized FEVD and 95% Monte Carlo confidence intervals on the true FEVD. Load the Danish money and income data set, then estimate the VEC(2) model. load Data_JDanish Mdl = vecm(4,2,2); Mdl.SeriesNames = DataTable.Properties.VariableNames; Mdl = estimate(Mdl,DataTable.Series);
Estimate the FEVD and corresponding 95% Monte Carlo confidence intervals from the estimated VEC(2) model. rng(1); % For reproducibility [Decomposition,Lower,Upper] = fevd(Mdl);
Decomposition, Lower, and Upper are 20-by-4-by-4 arrays representing the orthogonalized FEVD of Mdl and corresponding lower and upper bounds of the confidence intervals. For all arrays, rows correspond to consecutive time points from time 1 to 20, columns correspond to variables receiving a one-standard-deviation innovation shock at time 0, and pages correspond to the variables whose forecast error variance fevd decomposes. Mdl.SeriesNames specifies the variable order. 12-920
fevd
Plot the orthogonalized FEVD with its confidence bounds of the bond rate when real income is shocked at time 0. fevdshock2resp3 = Decomposition(:,2,3); FEVDCIShock2Resp3 = [Lower(:,2,3) Upper(:,2,3)]; figure; h1 = plot(1:20,fevdshock2resp3); hold on h2 = plot(1:20,FEVDCIShock2Resp3,"r--"); legend([h1 h2(1)],["FEVD" "95% Confidence Interval"],... Location="best") xlabel("Forecast Horizon"); ylabel("Variance Contribution"); title("FEVD of IB When Y Is Shocked"); grid on hold off
In the long run, and when real income is shocked, the proportion of forecast error variance of the bond rate settles between approximately 0 and 0.7 with 95% confidence.
12-921
12
Functions
Bootstrap Confidence Intervals on True FEVD Consider the 4-D VEC(2) model with two cointegrating relations in “Specify Data in Numeric Matrix When Plotting FEVD” on page 12-912. Estimate and plot its orthogonalized FEVD and 90% bootstrap confidence intervals on the true FEVD. Load the Danish money and income data set, then estimate the VEC(2) model. Return the residuals from model estimation. load Data_JDanish Mdl = vecm(4,2,2); Mdl.SeriesNames = DataTable.Properties.VariableNames; [Mdl,~,~,Res] = estimate(Mdl,DataTable.Series); T = size(DataTable,1) % Total sample size T = 55 n = size(Res,1)
% Effective sample size
n = 52
Res is a 52-by-4 array of residuals. Columns correspond to the variables in Mdl.SeriesNames. The estimate function requires Mdl.P = 3 observations to initialize a VEC(2) model for estimation. Because presample data (Y0) is unspecified, estimate takes the first three observations in the specified response data to initialize the model. Therefore, the resulting effective sample size is T – Mdl.P = 52, and rows of Res correspond to the observation indices 4 through T. Estimate the orthogonalized FEVD and corresponding 90% bootstrap confidence intervals from the estimated VEC(2) model. Draw 500 paths of length n from the series of residuals. rng(1); % For reproducibility [Decomposition,Lower,Upper] = fevd(Mdl,E=Res,NumPaths=500, ... Confidence=0.9);
Plot the orthogonalized FEVD with its confidence bounds of the bond rate when real income is shocked at time 0. fevdshock2resp3 = Decomposition(:,2,3); FEVDCIShock2Resp3 = [Lower(:,2,3) Upper(:,2,3)]; figure; h1 = plot(0:19,fevdshock2resp3); hold on h2 = plot(0:19,FEVDCIShock2Resp3,'r--'); legend([h1 h2(1)],["FEVD" "90% Confidence Interval"], ... Location="best") xlabel("Time Index"); ylabel("Response"); title("FEVD of IB When Y Is Shocked"); grid on hold off
12-922
fevd
In the long run, and when real income is shocked, the proportion of forecast error variance of the bond rate settles between approximately 0 and 0.6 with 90% confidence.
Input Arguments Mdl — VEC model vecm model object VEC model, specified as a vecm model object created by vecm or estimate. Mdl must be fully specified. If Mdl is an estimated model (returned by estimate) , you must supply any optional data using the same data type as the input response data, to which the model is fit. If Mdl is a custom vecm model object (an object not returned by estimate or modified after estimation), fevd can require a sample size for the simulation SampleSize or presample responses Y0. Name-Value Pair Arguments Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter. Before R2021a, use commas to separate each name and value, and enclose Name in quotes. 12-923
12
Functions
Example: fevd(Mdl,NumObs=10,Method="generalized",E=Res) specifies estimating a generalized FEVD for periods 1 through 10, and bootstraps the residuals in the numeric array Res to compute the 95% confidence bounds. Options for All FEVDs
NumObs — Number of periods 20 (default) | positive integer Number of periods for which fevd computes the FEVD (the forecast horizon), specified as a positive integer. NumObs specifies the number of observations to include in the FEVD (the number of rows in Decomposition). Example: NumObs=10 specifies estimation of the FEVD for times 1 through 10. Data Types: double Method — FEVD computation method "orthogonalized" (default) | "generalized" | character vector FEVD computation method, specified as a value in this table. Value
Description
"orthogonalized"
Compute variance decompositions using orthogonalized, one-standard-deviation innovation shocks. fevd uses the Cholesky factorization of Mdl.Covariance for orthogonalization.
"generalized"
Compute variance decompositions using onestandard-deviation innovation shocks.
Example: Method="generalized" Data Types: char | string Model — Johansen form of VEC(p – 1) model deterministic terms "H1" (default) | "H2" | "H1*" | "H*" | "H" | character vector Johansen form of the VEC(p – 1) model deterministic terms [2], specified as a value in this table (for variable definitions, see “Vector Error-Correction Model” on page 12-1658). Value
Error-Correction Term
Description
"H2"
AB´yt − 1
No intercepts or trends are present in the cointegrating relations, and no deterministic trends are present in the levels of the data. Specify this model only when all response series have a mean of zero.
"H1*"
12-924
A(B´yt−1+c0)
Intercepts are present in the cointegrating relations, and no deterministic trends are present in the levels of the data.
fevd
Value
Error-Correction Term
Description
"H1"
A(B´yt−1+c0)+c1
Intercepts are present in the cointegrating relations, and deterministic linear trends are present in the levels of the data.
"H*"
A(B´yt−1+c0+d0t)+c1
Intercepts and linear trends are present in the cointegrating relations, and deterministic linear trends are present in the levels of the data.
"H"
A(B´yt−1+c0+d0t)+c1+d1t
Intercepts and linear trends are present in the cointegrating relations, and deterministic quadratic trends are present in the levels of the data. If quadratic trends are not present in the data, this model can produce good in-sample fits but poor out-of-sample forecasts.
For more details on the Johansen forms, see estimate. • If Mdl is an estimated vecm model object (an object returned by estimate and unmodified thereafter), the default is the Johansen form used for estimation (see Model). • Otherwise, the default is "H1". Tip A best practice is to maintain model consistency during the simulation that estimates confidence bounds. Therefore, if Mdl is an estimated vecm model object (an object returned by estimate and unmodified thereafter), incorporate any constraints imposed during estimation by deferring to the default value of Model. Example: Model="H1*" Data Types: string | char Options for Confidence Bound Estimation
NumPaths — Number of sample paths 100 (default) | positive integer Number of sample paths (trials) to generate, specified as a positive integer. Example: NumPaths=1000 generates 1000 sample paths from which the software derives the confidence bounds. Data Types: double SampleSize — Number of observations for Monte Carlo simulation or bootstrap per sample path positive integer Number of observations for the Monte Carlo simulation or bootstrap per sample path, specified as a positive integer. • If Mdl is an estimated vecm model object (an object returned by estimate and unmodified thereafter), the default is the sample size of the data to which the model is fit (see summarize). 12-925
12
Functions
• Otherwise: • If fevd estimates confidence bounds by conducting a Monte Carlo simulation, you must specify SampleSize. • If fevd estimates confidence bounds by bootstrapping residuals, the default is the length of the specified series of residuals (size(Res,1), where Res is the number of residuals in E or InSample). Example: If you specify SampleSize=100 and do not specify the E name-value argument, the software estimates confidence bounds from NumPaths random paths of length 100 from Mdl. Example: If you specify SampleSize=100,E=Res, the software resamples, with replacement, 100 observations (rows) from Res to form a sample path of innovations to filter through Mdl. The software forms NumPaths random sample paths from which it derives confidence bounds. Data Types: double Y0 — Presample response data numeric matrix Presample response data that provides initial values for model estimation during the simulation, specified as a numpreobs-by-numseries numeric matrix. Use Y0 only in the following situations: • You supply other optional data inputs as numeric matrices. • Mdl is an estimated vecm model object (an object returned by estimate and unmodified thereafter) fit to a numeric matrix of response data. numpreobs is the number of presample observations. numseries is Mdl.NumSeries, the dimensionality of the input model. Each row is a presample observation, and measurements in each row occur simultaneously. The last row contains the latest presample observation. numpreobs is the number of specified presample responses and it must be at least Mdl.P. If you supply more rows than necessary, fevd uses the latest Mdl.P observations only. numseries is the dimensionality of the input VEC model Mdl.NumSeries. Columns must correspond to the response variables in Mdl.SeriesNames. The following situations determine the default or whether presample response data is required. • If Mdl is an unmodified estimated model, fevd sets Y0 to the presample response data used for estimation by default (see the Y0 name-value argument of estimate). • If Mdl is a custom model and you return confidence bounds Lower or Upper, you must specify Y0. Data Types: double Presample — Presample data table | timetable Presample data that provide initial values for the model Mdl, specified as a table or timetable with numprevars variables and numpreobs rows. Use Presample only in the following situations: • You supply other optional data inputs as tables or timetables. • Mdl is an estimated vecm model object (an object returned by estimate and unmodified thereafter) fit to a numeric matrix of response data. 12-926
fevd
Each row is a presample observation, and measurements in each row occur simultaneously. The last row contains the latest presample observation. numpreobs is the number of specified presample responses and it must be at least Mdl.P. If you supply more rows than necessary, fevd uses the latest Mdl.P observations only. Each variable is a numpreobs numeric vector representing one path. To control presample variable selection, see the optional PresampleResponseVariables name-value argument. If Presample is a timetable, all the following conditions must be true: • Presample must represent a sample with a regular datetime time step (see isregular). • The datetime vector of sample timestamps Presample.Time must be ascending or descending. If Presample is a table, the last row contains the latest presample observation. The following situations determine the default or whether presample response data is required. • If Mdl is an unmodified estimated model, fevd sets Presample to the presample response data used for estimation by default (see the Presample name-value argument of estimate). • If Mdl is a custom model (for example, you modify a model after estimation by using dot notation) and you return confidence bounds in the table or timetable Tbl, you must specify Presample. PresampleResponseVariables — Variables to select from Presample to use for presample response data string vector | cell vector of character vectors | vector of integers | logical vector Variables to select from Presample to use for presample data, specified as one of the following data types: • String vector or cell vector of character vectors containing numseries variable names in Presample.Properties.VariableNames • A length numseries vector of unique indices (integers) of variables to select from Presample.Properties.VariableNames • A length numprevars logical vector, where PresampleResponseVariables(j) = true selects variable j from Presample.Properties.VariableNames, and sum(PresampleResponseVariables) is numseries PresampleResponseVariables applies only when you specify Presample. The selected variables must be numeric vectors and cannot contain missing values (NaN). PresampleResponseNames does not need to contain the same names as in Mdl.SeriesNames; fevd uses the data in selected variable PresampleResponseVariables(j) as a presample for Mdl.SeriesNames(j). If the number of variables in Presample matches Mdl.NumSeries, the default specifies all variables in Presample. If the number of variables in Presample exceeds Mdl.NumSeries, the default matches variables in Presample to names in Mdl.SeriesNames. Example: PresampleResponseVariables=["GDP" "CPI"] Example: PresampleResponseVariables=[true false true false] or PresampleResponseVariable=[1 3] selects the first and third table variables for presample data. Data Types: double | logical | char | cell | string 12-927
12
Functions
X — Predictor data numeric matrix Predictor data xt for estimating the model regression component during the simulation, specified as a numeric matrix containing numpreds columns. Use X only in the following situations: • You supply other optional data inputs as tables or timetables. • Mdl is an estimated vecm model object (an object returned by estimate and unmodified thereafter) fit to a numeric matrix of response data. numpreds is the number of predictor variables (size(Mdl.Beta,2)). Each row corresponds to an observation, and measurements in each row occur simultaneously. The last row contains the latest observation. X must have at least SampleSize rows. If you supply more rows than necessary, fevd uses only the latest observations. fevd does not use the regression component in the presample period. Columns correspond to individual predictor variables. All predictor variables are present in the regression component of each response equation. To maintain model consistency when fevd estimates the confidence bounds, a good practice is to specify predictor data when Mdl has a regression component. If Mdl is an estimated model, specify the predictor data used during model estimation (see the X name-value argument of estimate). By default, fevd excludes the regression component from confidence bound estimation, regardless of its presence in Mdl. Data Types: double E — Series of residuals from which to draw bootstrap samples numeric matrix Series of residuals from which to draw bootstrap samples, specified as a numperiods-by-numseries numeric matrix. fevd assumes that E is free of serial correlation. Use E only in the following situations: • You supply other optional data inputs as numeric matrices. • Mdl is an estimated vecm model object (an object returned by estimate and unmodified thereafter) fit to a numeric matrix of response data. Each column is the residual series corresponding to the response series names in Mdl.SeriesNames. Each row corresponds to a period in the FEVD and the corresponding confidence bounds. If Mdl is an estimated vecm model object (an object returned by estimate), you can specify E as the inferred residuals from estimation (see the E output argument of estimate or infer). By default, fevd derives confidence bounds by conducting a Monte Carlo simulation. Data Types: double InSample — Time series data table | timetable 12-928
fevd
Time series data containing numvars variables, including numseries variables of residuals et to bootstrap or numpreds predictor variables xt for the model regression component, specified as a table or timetable. Use InSample only in the following situations: • You supply other optional data inputs as tables or timetables. • Mdl is an estimated vecm model object (an object returned by estimate and unmodified thereafter) fit to response data in a table or timetable. Each variable is a single path of observations, which fevdapplies to all NumPaths sample paths. If you specify Presample you must specify which variables are residuals and predictors, see the ResidualVariables and PredictorVariables name-value arguments. Each row is an observation, and measurements in each row occur simultaneously. InSample must have at least SampleSize rows. If you supply more rows than necessary, fevd uses only the latest observations. If InSample is a timetable, the following conditions apply: • InSample must represent a sample with a regular datetime time step (see isregular). • The datetime vector InSample.Time must be strictly ascending or descending. • Presample must immediately precede InSample, with respect to the sampling frequency. If InSample is a table, the last row contains the latest observation. By default, fevd derives confidence bounds by conducting a Monte Carlo simulation and does not use model the regression component, regardless of its presence in Mdl. ResidualVariables — Variables to select from InSample to treat as residuals et for bootstrapping string vector | cell vector of character vectors | vector of integers | logical vector Variables to select from InSample to treat as residuals for bootstrapping, specified as one of the following data types: • String vector or cell vector of character vectors containing numseries variable names in InSample.Properties.VariableNames • A length numseries vector of unique indices (integers) of variables to select from InSample.Properties.VariableNames • A length numvars logical vector, where ResidualVariables(j) = true selects variable j from InSample.Properties.VariableNames, and sum(ResidualVariables) is numseries Regardless, selected residual variable j is the residual series for Mdl.SeriesNames(j). The selected variables must be numeric vectors and cannot contain missing values (NaN). By default, fevd derives confidence bounds by conducting a Monte Carlo simulation. Example: ResidualVariables=["GDP_Residuals" "CPI_Residuals"] Example: ResidualVariables=[true false true false] or ResidualVariable=[1 3] selects the first and third table variables as the disturbance variables. Data Types: double | logical | char | cell | string 12-929
12
Functions
PredictorVariables — Variables to select from InSample to treat as exogenous predictor variables xt string vector | cell vector of character vectors | vector of integers | logical vector Variables to select from InSample to treat as exogenous predictor variables xt, specified as one of the following data types: • String vector or cell vector of character vectors containing numpreds variable names in InSample.Properties.VariableNames • A length numpreds vector of unique indices (integers) of variables to select from InSample.Properties.VariableNames • A length numvars logical vector, where PredictorVariables(j) = true selects variable j from InSample.Properties.VariableNames, and sum(PredictorVariables) is numpreds Regardless, selected predictor variable j corresponds to the coefficients Mdl.Beta(:,j). PredictorVariables applies only when you specify InSample. The selected variables must be numeric vectors and cannot contain missing values (NaN). By default, fevd excludes the regression component, regardless of its presence in Mdl. Example: PredictorVariables=["M1SL" "TB3MS" "UNRATE"] Example: PredictorVariables=[true false true false] or PredictorVariable=[1 3] selects the first and third table variables as the response variables. Data Types: double | logical | char | cell | string Confidence — Confidence level 0.95 (default) | numeric scalar in [0,1] Confidence level for the confidence bounds, specified as a numeric scalar in the interval [0,1]. For each period, randomly drawn confidence intervals cover the true response 100*Confidence% of the time. The default value is 0.95, which implies that the confidence bounds represent 95% confidence intervals. Example: Confidence=0.9 specifies 90% confidence intervals. Data Types: double Note • NaN values in Y0, X, and E indicate missing data. fevd removes missing data from these arguments by list-wise deletion. For each argument, if a row contains at least one NaN, fevd removes the entire row. List-wise deletion reduces the sample size, can create irregular time series, and can cause E and X to be unsynchronized. • fevd issues an error when any table or timetable input contains missing values.
12-930
fevd
Output Arguments Decomposition — FEVD numeric array FEVD of each response variable, returned as a numobs-by-numseries-by-numseries numeric array. numobs is the value of NumObs. Columns and pages correspond to the response variables in Mdl.SeriesNames. fevd returns Decomposition only in the following situations: • You supply optional data inputs as numeric matrices. • Mdl is an estimated model fit to a numeric matrix of response data. Decomposition(t,j,k) is the contribution to the variance decomposition of variable k attributable to a one-standard-deviation innovation shock to variable j at time t, for t = 1,2,…,numobs, j = 1,2,...,numseries, and k = 1,2,...,numseries. Lower — Lower confidence bounds numeric array Lower confidence bounds, returned as a numobs-by-numseries-by-numseries numeric array. Elements of Lower correspond to elements of Decomposition. fevd returns Lower only in the following situations: • You supply optional data inputs as numeric matrices. • Mdl is an estimated model fit to a numeric matrix of response data. Lower(t,j,k) is the lower bound of the 100*Confidence-th percentile interval on the true contribution to the variance decomposition of variable k attributable to a one-standard-deviation innovation shock to variable j at time 0. Upper — Upper confidence bounds numeric array Upper confidence bounds, returned as a numobs-by-numseries-by-numseries numeric array. Elements of Upper correspond to elements of Decomposition. fevd returns Upper only in the following situations: • You supply optional data inputs as numeric matrices. • Mdl is an estimated model fit to a numeric matrix of response data. Upper(t,j,k) is the upper bound of the 100*Confidence-th percentile interval on the true contribution to the variance decomposition of variable k attributable to a one-standard-deviation innovation shock to variable j at time 0. Tbl — FEVD and confidence bounds table | timetable FEVD and confidence bounds, returned as a table or timetable with numobs rows. fevd returns Tbl only in the following situations: 12-931
12
Functions
• You supply optional data inputs as tables or timetables. • Mdl is an estimated model object fit to response data in a table or timetable. Regardless, the data type of Tbl is the same as the data type of specified data. Tbl contains the following variables: • The FEVD of each series in yt. Each FEVD variable in Tbl is a numobs-by-numseries numeric matrix, where numobs is the value of NumObs and numseries is the value of Mdl.NumSeries. fevd names the FEVD of response variable ResponseJ in Mdl.SeriesNames ResponseJ_FEVD. For example, if Mdl.Series(j) is GDP, Tbl contains a variable for the corresponding FEVD with the name GDP_FEVD. ResponseJ_FEVD(t,k) is the contribution to the variance decomposition of response variable ResponseJ attributable to a one-standard-deviation innovation shock to variable k at time t, for t = 1,2,…,numobs, J = 1,2,...,numseries, and k = 1,2,...,numseries. • The lower and upper confidence bounds on the true FEVD of the response series, when you set at least one name-value argument that controls the confidence bounds. Each confidence bound variable in Tbl is a numobs-by-numseries numeric matrix. ResponseJ_FEVD_LowerBound and ResponseJ_FEVD_UpperBound are the names of the lower and upper bound variables, respectively, of the confidence interval on the FEVD of response variable Mdl.SeriesNames(J) = ResponseJ. For example, if Mdl.SeriesNames(j) is GDP, Tbl contains variables for the corresponding lower and upper bounds of the confidence interval with the name GDP_FEVD_LowerBound and GDP_FEVD_UpperBound. (ResponseJ_FEVD_LowerBound(t,k), ResponseJ_FEVD_UpperBound(t,k)) is the 95% confidence interval on the FEVD of response variable ResponseJ attributable to a one-standarddeviation innovation shock to variable k at time t, for t = 1,2,…,numobs, J = 1,2,...,numseries, and k = 1,2,...,numseries. If Tbl is a timetable, the row order of Tbl, either ascending or descending, matches the row order of InSample, when you specify it. If you do not specify InSample and you specify Presample, the row order of Tbl is the same as the row order Presample.
More About Forecast Error Variance Decomposition The forecast error variance decomposition (FEVD) of a multivariate, dynamic system shows the relative importance of a shock to each innovation in affecting the forecast error variance of all variables in the system. Consider a numseries-D VEC(p – 1) model on page 12-933 for the multivariate response variable yt. In lag operator notation, the equivalent VAR(p) representation of a VEC(p – 1) model is: Γ(L)yt = c + dt + βxt + εt, where Γ(L) = I − Γ1L − Γ2L2 − ... − ΓpLp and I is the numseries-by-numseries identify matrix. In lag operator notation, the infinite lag MA representation of yt is: yt = Γ−1(L) c + βxt + dt + Γ−1(L)εt = Ω(L) c + βxt + dt + Ω(L)εt . 12-932
fevd
The general form of the FEVD of ykt (variable k) m periods into the future, attributable to a onestandard-deviation innovation shock to yjt, is m−1
γm jk =
2
∑
ek′Cte j
∑
ek′ Ωt ΣΩt′ek
t=0 m−1 t=0
.
• ej is a selection vector of length numseries containing a 1 in element j and zeros elsewhere. • For orthogonalized FEVDs, Cm = ΩmP, where P is the lower triangular factor in the Cholesky factorization of Σ. • For generalized FEVDs, Cm = σ−1ΩmΣ, where σj is the standard deviation of innovation j. j • The numerator is the contribution of an innovation shock to variable j to the forecast error variance of the m-step-ahead forecast of variable k. The denominator is the mean square error (MSE) of the m-step-ahead forecast of variable k [4]. Vector Error-Correction Model A vector error-correction (VEC) model is a multivariate, stochastic time series model consisting of a system of m = numseries equations of m distinct, differenced response variables. Equations in the system can include an error-correction term, which is a linear function of the responses in levels used to stabilize the system. The cointegrating rank r is the number of cointegrating relations that exist in the system. Each response equation can include an autoregressive polynomial composed of first differences of the response series (short-run polynomial of degree p – 1), a constant, a time trend, exogenous predictor variables, and a constant and time trend in the error-correction term. A VEC(p – 1) model in difference-equation notation and in reduced form can be expressed in two ways: • This equation is the component form of a VEC model, where the cointegration adjustment speeds and cointegration matrix are explicit, whereas the impact matrix is implied. Δyt = A B′yt − 1 + c0 + d0t + c1 + d1t + Φ1 Δyt − 1 + ... + Φp − 1 Δyt − (p − 1) + βxt + εt = c + dt + AB′yt − 1 + Φ1 Δyt − 1 + ... + Φp − 1 Δyt − (p − 1) + βxt + εt . The cointegrating relations are B'yt – 1 + c0 + d0t and the error-correction term is A(B'yt – 1 + c0 + d0t). • This equation is the impact form of a VEC model, where the impact matrix is explicit, whereas the cointegration adjustment speeds and cointegration matrix are implied. Δyt = Πyt − 1 + A c0 + d0t + c1 + d1t + Φ1 Δyt − 1 + ... + Φp − 1 Δyt − (p − 1) + βxt + εt = c + dt + Πyt − 1 + Φ1 Δyt − 1 + ... + Φp − 1 Δyt − (p − 1) + βxt + εt . In the equations: • yt is an m-by-1 vector of values corresponding to m response variables at time t, where t = 1,...,T. • Δyt = yt – yt – 1. The structural coefficient is the identity matrix. • r is the number of cointegrating relations and, in general, 0 < r < m. 12-933
12
Functions
• A is an m-by-r matrix of adjustment speeds. • B is an m-by-r cointegration matrix. • Π is an m-by-m impact matrix with a rank of r. • c0 is an r-by-1 vector of constants (intercepts) in the cointegrating relations. • d0 is an r-by-1 vector of linear time trends in the cointegrating relations. • c1 is an m-by-1 vector of constants (deterministic linear trends in yt). • d1 is an m-by-1 vector of linear time-trend values (deterministic quadratic trends in yt). • c = Ac0 + c1 and is the overall constant. • d = Ad0 + d1 and is the overall time-trend coefficient. • Φj is an m-by-m matrix of short-run coefficients, where j = 1,...,p – 1 and Φp – 1 is not a matrix containing only zeros. • xt is a k-by-1 vector of values corresponding to k exogenous predictor variables. • β is an m-by-k matrix of regression coefficients. • εt is an m-by-1 vector of random Gaussian innovations, each with a mean of 0 and collectively an m-by-m covariance matrix Σ. For t ≠ s, εt and εs are independent. Condensed and in lag operator notation, the system is Φ(L)(1 − L)yt = A B′yt − 1 + c0 + d0t + c1 + d1t + βxt + εt = c + dt + AB′yt − 1 + βxt + εt where Φ(L) = I − Φ1 − Φ2 − ... − Φp − 1, I is the m-by-m identity matrix, and Lyt = yt – 1. If m = r, then the VEC model is a stable VAR(p) model in the levels of the responses. If r = 0, then the error-correction term is a matrix of zeros, and the VEC(p – 1) model is a stable VAR(p – 1) model in the first differences of the responses.
Algorithms • If Method is "orthogonalized", then fevd orthogonalizes the innovation shocks by applying the Cholesky factorization of the model covariance matrix Mdl.Covariance. The covariance of the orthogonalized innovation shocks is the identity matrix, and the FEVD of each variable sums to one, that is, the sum along any row of Decomposition or rows associated with FEVD variables in Tbl is one. Therefore, the orthogonalized FEVD represents the proportion of forecast error variance attributable to various shocks in the system. However, the orthogonalized FEVD generally depends on the order of the variables. If Method is "generalized", then the resulting FEVD, then the resulting FEVD is invariant to the order of the variables, and is not based on an orthogonal transformation. Also, the resulting FEVD sums to one for a particular variable only when Mdl.Covariance is diagonal[5]. Therefore, the generalized FEVD represents the contribution to the forecast error variance of equation-wise shocks to the response variables in the model. • If Mdl.Covariance is a diagonal matrix, then the resulting generalized and orthogonalized FEVDs are identical. Otherwise, the resulting generalized and orthogonalized FEVDs are identical only when the first variable in Mdl.SeriesNames shocks all variables (for example, all else being the same, both methods yield the same value of Decomposition(:,1,:)). • The predictor data in X or InSample represents a single path of exogenous multivariate time series. If you specify X or InSample and the model Mdl has a regression component (Mdl.Beta is 12-934
fevd
not an empty array), fevd applies the same exogenous data to all paths used for confidence interval estimation. • fevd conducts a simulation to estimate the confidence bounds Lower and Upper or associated variables in Tbl. • If you do not specify residuals by supplying E or using InSample, fevd conducts a Monte Carlo simulation by following this procedure: 1
Simulate NumPaths response paths of length SampleSize from Mdl.
2
Fit NumPaths models that have the same structure as Mdl to the simulated response paths. If Mdl contains a regression component and you specify predictor data by supplying X or using InSample, fevd fits the NumPaths models to the simulated response paths and the same predictor data (the same predictor data applies to all paths).
3
Estimate NumPaths FEVDs from the NumPaths estimated models.
4
For each time point t = 0,…,NumObs, estimate the confidence intervals by computing 1 – Confidence and Confidence quantiles (the upper and lower bounds, respectively).
• Otherwise, fevd conducts a nonparametric bootstrap by following this procedure: 1
Resample, with replacement, SampleSize residuals from E or InSample. Perform this step NumPaths times to obtain NumPaths paths.
2
Center each path of bootstrapped residuals.
3
Filter each path of centered, bootstrapped residuals through Mdl to obtain NumPaths bootstrapped response paths of length SampleSize.
4
Complete steps 2 through 4 of the Monte Carlo simulation, but replace the simulated response paths with the bootstrapped response paths.
Version History Introduced in R2019a R2022b: fevd accepts input data in tables and timetables, and return results in tables and timetables In addition to accepting input data in numeric arrays, fevd accepts input data in tables and timetables. fevd chooses default series on which to operate, but you can use the following namevalue arguments to select variables. • Presample specifies the input table or regular timetable of presample response data. • PresampleResponseVariables specifies the response series names from Presample. • Insample specifies the table or regular timetable of residual and predictor data to compute bootstrap estimates. • ResidualVariables specifies the residual series names in InSample. • PredictorVariables specifies the predictor series in InSample for a model regression component.
References [1] Hamilton, James D. Time Series Analysis. Princeton, NJ: Princeton University Press, 1994. 12-935
12
Functions
[2] Johansen, S. Likelihood-Based Inference in Cointegrated Vector Autoregressive Models. Oxford: Oxford University Press, 1995. [3] Juselius, K. The Cointegrated VAR Model. Oxford: Oxford University Press, 2006. [4] Lütkepohl, Helmut. New Introduction to Multiple Time Series Analysis. New York, NY: SpringerVerlag, 2007. [5] Pesaran, H. H., and Y. Shin. "Generalized Impulse Response Analysis in Linear Multivariate Models." Economic Letters. Vol. 58, 1998, pp. 17–29.
See Also Objects vecm Functions estimate | simulate | filter | irf | varm
12-936
fgls
fgls Feasible generalized least squares
Syntax [coeff,se,EstCoeffCov] = fgls(X,y) [CoeffTbl,CovTbl] = fgls(Tbl) [ ___ ] = fgls( ___ ,Name=Value) [ ___ ] = fgls(ax, ___ ,Plot=plot) [ ___ ,iterPlots] = fgls( ___ ,Plot=plot)
Description [coeff,se,EstCoeffCov] = fgls(X,y) returns vectors of coefficient estimates coeff and corresponding standard errors se, and the estimated coefficient covariance matrix EstCoeffCov from applying feasible generalized least squares on page 12-957 (FGLS) to the multiple linear regression model y = Xβ + ε. y is a vector of response data and X is a matrix of predictor data. [CoeffTbl,CovTbl] = fgls(Tbl) applies FGLS to the variables in the table or timetable Tbl, and returns FGLS coefficient estimates and standard errors in the table CoeffTbl and FGLS estimated coefficient covariance matrix EstCoeffCov. The response variable in the regression is the last table variable, and all other variables are the predictor variables. To select a different response variable for the regression, use the ResponseVariable name-value argument. To select different predictor variables, use the PredictorNames name-value argument. [ ___ ] = fgls( ___ ,Name=Value) specifies options using one or more name-value arguments in addition to any of the input argument combinations in previous syntaxes. fgls returns the output argument combination for the corresponding input arguments. For example, fgls(Tbl,ResponseVariable="GDP",InnovMdl="H4",Plot="all") provides coefficient, standard error, and residual mean-squared error (MSE) plots of iterations of FGLS for a regression model with White’s robust innovations covariance, and the table variable GDP is the response while all other variables are predictors. [ ___ ] = fgls(ax, ___ ,Plot=plot) plots on the axes specified in ax instead of the axes of new figures when plot is not "off". ax can precede any of the input argument combinations in the previous syntaxes. [ ___ ,iterPlots] = fgls( ___ ,Plot=plot) returns handles to plotted graphics objects when plot is not "off". Use elements of iterPlots to modify properties of the plots after you create them.
Examples
12-937
12
Functions
Estimate FGLS Coefficients and Uncertainty Measures Suppose the sensitivity of the US consumer price index (CPI) to changes in the paid compensation of employees (COE) is of interest. Load the US macroeconomic data set, which contains the timetable of data DataTimeTable. Extract the COE and CPI series from the table. load Data_USEconModel.mat COE = DataTimeTable.COE; CPI = DataTimeTable.CPIAUCSL; dt = DataTimeTable.Time;
Plot the series. tiledlayout(2,1) nexttile plot(dt,CPI); title("\bf Consumer Price Index, Q1 in 1947 to Q1 in 2009"); axis tight nexttile plot(dt,COE); title("\bf Compensation Paid to Employees, Q1 in 1947 to Q1 in 2009"); axis tight
The series are nonstationary. Stabilize them by computing their returns.
12-938
fgls
rCPI = price2ret(CPI); rCOE = price2ret(COE);
Regress rCPI onto rCOE including an intercept to obtain ordinary least squares (OLS) estimates, standard errors, and the estimated coefficient covariance. Generate a lagged residual plot. Mdl = fitlm(rCOE,rCPI); clmCoeff = Mdl.Coefficients.Estimate clmCoeff = 2×1 0.0033 0.3513 clmSE = Mdl.Coefficients.SE clmSE = 2×1 0.0010 0.0490 CLMEstCoeffCov = Mdl.CoefficientCovariance CLMEstCoeffCov = 2×2 0.0000 -0.0000
-0.0000 0.0024
figure plotResiduals(Mdl,"lagged")
12-939
12
Functions
The residual plot exhibits an upward trend, which suggests that the innovations comprise an autoregressive process. This violates one of the classical linear model assumptions. Consequently, hypothesis tests based on the regression coefficients are incorrect, even asymptotically. Estimate the regression coefficients, standard errors, and coefficient covariances using FGLS. By default, fgls includes an intercept in the regression model and imposes an AR(1) model on the innovations. [coeff,se,EstCoeffCov] = fgls(rCPI,rCOE) coeff = 2×1 0.0148 0.1961 se = 2×1 0.0012 0.0685 EstCoeffCov = 2×2 0.0000 -0.0000
12-940
-0.0000 0.0047
fgls
Row 1 of the outputs corresponds to the intercept and row 2 corresponds to the coefficient of rCOE. If the COE series is exogenous with respect to the CPI, then the FGLS estimates coeff are consistent and asymptotically more efficient than the OLS estimates.
Estimate FGLS Coefficients and Uncertainty Measures on Table Data Load the US macroeconomic data set, which contains the timetable of data DataTimeTable. load Data_USEconModel
Stabilize all series by computing their returns. RDT = price2ret(DataTimeTable);
RDT is a timetable of returns of all variables in DataTimeTable. The price2ret function conserves variable names. Estimate the regression coefficients, standard errors, and the coefficient covariance matrix using FGLS. Specify the response and predictor variable names. [CoeffTbl,CoeffCovTbl] = fgls(RDT,ResponseVariable="CPIAUCSL",PredictorVariables="COE") CoeffTbl=2×2 table Coeff __________ Const COE
6.2416e-05 0.20562
CoeffCovTbl=2×2 table Const ___________ Const COE
1.7848e-10 -5.6329e-07
SE _________ 1.336e-05 0.055615
COE ___________ -5.6329e-07 0.003093
When you supply a table or timetable of data, fgls returns tables of estimates.
Specify AR Lags For FGLS Estimation Suppose the sensitivity of the US consumer price index (CPI) to changes in the paid compensation of employees (COE) is of interest. This example expands on the analysis outlined in the example “Estimate FGLS Coefficients and Uncertainty Measures” on page 12-937. Load the US macroeconomic data set. load Data_USEconModel
The series are nonstationary. Stabilize them by applying the log, and then the first difference. 12-941
12
Functions
LDT = price2ret(Data); rCOE = LDT(:,1); rCPI = LDT(:,2);
Regress rCPI onto rCOE, which includes an intercept to obtain OLS estimates. Plot correlograms for the residuals. Mdl = fitlm(rCOE,rCPI); u = Mdl.Residuals.Raw; figure; subplot(2,1,1) autocorr(u); subplot(2,1,2); parcorr(u);
The correlograms suggest that the innovations have significant AR effects. According to Box-Jenkins methodology, the innovations seem to comprise an AR(3) series. For details, see “Select ARIMA Model for Time Series Using Box-Jenkins Methodology” on page 3-2. Estimate the regression coefficients using FGLS. By default, fgls assumes that the innovations are autoregressive. Specify that the innovations are AR(3) by using the ARLags name-value argument, and print the final estimates to the command window by using the Display name-value argument. fgls(rCPI,rCOE,ARLags=3,Display="final"); OLS Estimates:
12-942
fgls
| Coeff SE -----------------------Const | 0.0122 0.0009 x1 | 0.4915 0.0686 FGLS Estimates: | Coeff SE -----------------------Const | 0.0148 0.0012 x1 | 0.1972 0.0684
If the COE rate series is exogenous with respect to the CPI rate, the FGLS estimates are consistent and asymptotically more efficient than the OLS estimates.
Account for Residual Heteroscedasticity Using FGLS Estimation Model the nominal GNP GNPN growth rate accounting for the effects of the growth rates of the consumer price index CPI, real wages WR, and the money stock MS. Account for classical linear model departures. Load the Nelson-Plosser data set, which contains the data in the table DataTable. Remove all observations containing at least one missing value. load Data_NelsonPlosser DT = rmmissing(DataTable); T = height(DT);
Plot the series. predNames = ["CPI" "WR" "MS"]; tiledlayout(2,2) for j = ["GNPN" predNames] nexttile plot(DT{:,j}); xticklabels(DT.Dates) title(j); axis tight end
12-943
12
Functions
All series appear nonstationary. For each series, compute the returns. RetDT = price2ret(DT);
RetTT is a timetable of the returns of the variables in TT. The variables names are conserved. Regress the GNPN rate onto the CPI, WR, and MS rates. Examine a scatter plot and correlograms of the residuals. Mdl = fitlm(RetDT,ResponseVar="GNPN",PredictorVar=predNames); figure plotResiduals(Mdl,"caseorder"); axis tight
12-944
fgls
figure tiledlayout(2,1) nexttile autocorr(Mdl.Residuals.Raw); nexttile parcorr(Mdl.Residuals.Raw);
12-945
12
Functions
The residuals appear to flare in, which is indicative of heteroscedasticity. The correlograms suggest that there is no autocorrelation. Estimate FGLS coefficients by accounting for the heteroscedasticity of the residuals. Specify that the estimated innovation covariance is diagonal with the squared residuals as weights (that is, White's robust estimator H0). fgls(RetDT,ResponseVariable="GNPN",PredictorVariables=predNames, ... InnovMdl="HC0",Display="final"); OLS Estimates: | Coeff SE ------------------------Const | -0.0076 0.0085 CPI | 0.9037 0.1544 WR | 0.9036 0.1906 MS | 0.4285 0.1379 FGLS Estimates: | Coeff SE ------------------------Const | -0.0102 0.0017 CPI | 0.8853 0.0169 WR | 0.8897 0.0294 MS | 0.4874 0.0291
12-946
fgls
Estimate FGLS Coefficients of Models Containing ARMA Errors Create this regression model with ARMA(1,2) errors, where εt is Gaussian with mean 0 and variance 1. 2 + ut 3 ut = 0 . 6ut − 1 + εt − 0 . 3εt − 1 + 0 . 1εt − 1 . yt = 1 + xt
beta = [2 3]; phi = 0.2; theta = [-0.3 0.1]; Mdl = regARIMA(AR=phi,MA=theta,Intercept=1, ... Beta=beta,Variance=1);
Mdl is a regARIMA model. You can access its properties using dot notation. Simulate 500 periods of 2-D standard Gaussian values for xt, and then simulate responses using Mdl. numObs = 500; rng(1); % For reproducibility X = randn(numObs,2); y = simulate(Mdl,numObs,X=X);
fgls supports AR(p) innovation models. You can convert an ARMA model polynomial to an infinite-lag AR model polynomial using arma2ar. By default, arma2ar returns the coefficients for the first 10 terms. After the conversion, determine how many lags of the resulting AR model are practically significant by checking the length of the returned vector of coefficients. Choose the number of terms that exceed 0.00001. format long arParams = arma2ar(phi,theta) arParams = 1×3 -0.100000000000000
0.070000000000000
0.031000000000000
arLags = sum(abs(arParams) > 0.00001); format short
Some of the parameters have small magnitude. You might want to reduce the number of lags to include in the innovations model for fgls. Estimate the coefficients and their standard errors using FGLS and the simulated data. Specify that the innovations comprise an AR(arLags) process. [coeff,~,EstCoeffCov] = fgls(X,y,InnovMdl="AR",ARLags=arLags) coeff = 3×1 1.0372 2.0366
12-947
12
Functions
2.9918 EstCoeffCov = 3×3 0.0026 -0.0000 0.0001
-0.0000 0.0022 0.0000
0.0001 0.0000 0.0024
The estimated coefficients are close to their true values.
Plot Iterations of FGLS Estimation This example expands on the analysis in “Estimate FGLS Coefficients of Models Containing ARMA Errors” on page 12-947. Create this regression model with ARMA(1,4) errors, where εt is Gaussian with mean 0 and variance 1. 1.5 + ut 2 ut = 0 . 9ut − 1 + εt − 0 . 4εt − 1 + 0 . 2εt − 4 . yt = 1 + xt
beta = [1.5 2]; phi = 0.9; theta = [-0.4 0.2]; Mdl = regARIMA(AR=phi,MA=theta,MALags=[1 4],Intercept=1,Beta=beta,Variance=1);
Suppose the distribution of the predictors is xt ∼ N
−1 0 . 25 0 , . 1 0 1
Simulate 30 periods from xt, and then simulate 30 corresponding responses from the regression model with ARMA errors Mdl. numObs = 30; rng(1); % For reproducibility muX = [-1 1]; sigX = [0.5 1]; X = randn(numObs,numel(beta)).*sigX + muX; y = simulate(Mdl,numObs,X=X);
Convert the ARMA model polynomial to an infinite-lag AR model polynomial using arma2ar. By default, arma2ar returns the coefficients for the first 10 terms. Find the number of terms that exceed 0.00001. arParams = arma2ar(phi,theta); arLags = sum(abs(arParams) > 1e-5);
Estimate the regression coefficients by using eight iterations of FGLS, and specify the number of lags in the AR innovation model (arLags). Also, specify to plot the coefficient estimates and their standard errors for each iteration, and to display the final estimates and the OLS estimates in tabular form. 12-948
fgls
[coeff,~,EstCoeffCov] = fgls(X,y,InnovMdl="AR",ARLags=arLags, ... NumIter=8,Plot=["coeff" "se"],Display="final"); OLS Estimates: | Coeff SE -----------------------Const | 1.7619 0.4514 x1 | 1.9637 0.3480 x2 | 1.7242 0.2152 FGLS Estimates: | Coeff SE -----------------------Const | 1.0845 0.6972 x1 | 1.7020 0.2919 x2 | 2.0825 0.1603
12-949
12
Functions
The algorithm seems to converge after the four iterations. The FGLS estimates are closer to the true values than the OLS estimates. Properties of iterative FGLS estimates in finite samples are difficult to establish. For asymptotic properties, one iteration of FGLS is sufficient, but fgls supports iterative FGLS for experimentation. If the estimates or standard errors show instability after successive iterations, then the estimated innovations covariance might be ill conditioned. Consider scaling the residuals by using the ResCond name-value argument to improve the conditioning of the estimated innovations covariance.
Input Arguments X — Predictor data X numeric matrix Predictor data X for the multiple linear regression model, specified as a numObs-by-numPreds numeric matrix. Each row represents one of the numObs observations and each column represents one of the numPreds predictor variables. Data Types: double y — Response data y numeric vector 12-950
fgls
Response data y for the multiple linear regression model, specified as a numObs-by-1 numeric vector. Rows of y and X correspond. Data Types: double Tbl — Combined predictor and response data table | timetable Combined predictor and response data for the multiple linear regression model, specified as a table or timetable with numObs rows. Each row of Tbl is an observation. The test regresses the response variable, which is the last variable in Tbl, on the predictor variables, which are all other variables in Tbl. To select a different response variable for the regression, use the ResponseVariable name-value argument. To select different predictor variables, use the PredictorNames name-value argument to select numPreds predictors. ax — Axes on which to plot vector of Axes objects Axes on which to plot, specified as a vector of Axes objects with length equal to the number of plots specified by the Plot name-value argument. By default, fgls creates a separate figure for each plot. Note NaNs in X, y, or Tbl indicate missing values, and fgls removes observations containing at least one NaN. That is, to remove NaNs in X or y, fgls merges the variables [X y], and then it uses list-wise deletion to remove any row that contains at least one NaN. fgls also removes any row of Tbl containing at least one NaN. Removing NaNs in the data reduces the sample size and can create irregular time series. Name-Value Arguments Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter. Before R2021a, use commas to separate each name and value, and enclose Name in quotes. Example: fgls(Tbl,ResponseVariable="GDP",InnovMdl="H4",Plot="all") provides coefficient, standard error, and residual mean squared error (RMSE) plots of iterations of FGLS for a regression model with White’s robust innovations covariance, and the table variable GDP is the response while all other variables are predictors. VarNames — Unique variable names to use in display string vector | character vector | cell vector of strings | cell vector of character vectors Unique variable names used in the display, specified as a string vector or cell vector of strings of a length numCoeffs: • If Intercept=true, VarNames(1) is the name of the intercept (for example 'Const') and VarNames(j + 1) specifies the name to use for variable X(:,j) or PredictorVariables(j). 12-951
12
Functions
• If Intercept=false, VarNames(j) specifies the name to use for variable X(:,j) or PredictorVariables(j). The defaults is one of the following alternatives prepended by 'Const' when an intercept is present in the model: • {'x1','x2',...} when you supply inputs X and y • Tbl.Properties.VariableNames when you supply input table or timetable Tbl Example: VarNames=["Const" "AGE" "BBD"] Data Types: char | cell | string Intercept — Flag to include intercept true (default) | false Flag to include a model intercept, specified as a value in this table. Value
Description
true
fgls includes an intercept term in the regression model. numCoeffs = numPreds + 1.
false
fgls does not include an intercept when fitting the regression model. numCoeffs = numPreds.
Example: Intercept=false Data Types: logical InnovMdl — Model for innovations covariance estimate "AR" (default) | "CLM" | "HC0" | "HC1" | "HC2" | "HC3" | "HC4" | character vector Model for the innovations covariance estimate, specified as a model name in the following table. Set InnovMdl to specify the structure of the innovations covariance estimator Ω . • For diagonal innovations covariance models (i.e., models with heteroscedasticity), Ω = diag(ω), where ω = {ωi; i = 1,...,T} is a vector of innovation variance estimates for the observations, and T = numObs. fgls estimates the data-driven vector ω using the corresponding model residuals (ε), their −1
leverages hi = xi(X′X) Model Name
xi′, and the degrees of freedom dfe. Weight
Reference
"CLM"
"HC0" "HC1"
12-952
[4]
T
∑
ωi =
1 df e
ωi =
εi2
[6]
ωi =
T 2 ε df e i
[5]
εi2
i=1
fgls
Model Name "HC2" "HC3"
"HC4"
Weight ωi = ωi = ωi =
Reference [5]
εi2 1 − hi
[5]
εi2 2
(1 − hi)
[1]
εi2 di
(1 − hi)
where di = min 4,
hi h
• For full innovation covariance models (in other words, models having heteroscedasticity and autocorrelation), specify "AR". fgls imposes an AR(p) model on the innovations, and constructs Ω using the number of lags, p, specified by the name-value argument arLags and the Yule-Walker equations. If the NumIter name-value argument is 1 and you specify the InnovCov0 name-value argument, fgls ignores InnovMdl. Example: InnovMdl=HC0 Data Types: char | string ARLags — Number of lags 1 (default) | positive integer Number of lags to include in the autoregressive (AR) innovations model, specified as a positive integer. If the InnovMdl name-value argument is not "AR" (that is, for diagonal models), fgls ignores ARLags. For general ARMA innovations models, convert the innovations model to the equivalent AR form by performing one of the following actions. • Construct the ARMA innovations model lag operator polynomial using LagOp. Then, divide the AR polynomial by the MA polynomial using, for example, mrdivide. The result is the infinite-order, AR representation of the ARMA model. • Use arma2ar, which returns the coefficients of the infinite-order, AR representation of the ARMA model. Example: ARLags=4 Data Types: double InnovCov0 — Initial innovations covariance [] (default) | positive vector | positive definite matrix | positive semidefinite matrix Initial innovations covariance, specified as a positive vector, positive semidefinite matrix, or a positive definite matrix.
12-953
12
Functions
InnovCov0 replaces the data-driven estimate of the innovations covariance (Ω ) in the first iteration of GLS. • For diagonal innovations covariance models (that is, models with heteroscedasticity), specify a numObs-by-1 vector. InnovCov0(j) is the variance of innovation j. • For full innovation covariance models (that is, models having heteroscedasticity and autocorrelation), specify a numObs-by-numObs matrix. InnovCov0(j,k) is the covariance of innovations j and k. By default, fgls uses a data-driven Ω (see the InnovMdl name-value argument). Data Types: double NumIter — Number of iterations 1 (default) | positive integer Number of iterations to implement for the FGLS algorithm, specified as a positive integer. fgls estimates the innovations covariance Ω at each iteration from the residual series according to the innovations covariance model InnovMdl. Then, the software computes the GLS estimates of the model coefficients. Example: NumIter=10 Data Types: double ResCond — Flag to scale residuals false (default) | true Flag to scale the residuals at each iteration of FGLS, specified as a value in this table. Value
Description
true
fgls scales the residuals at each iteration.
false
fgls does not scale the residuals at each iteration.
Tip The setting ResCond=true can improve the conditioning of the estimation of the innovations covariance Ω . Data Types: logical Display — Command window display control "off" (default) | "final" | "iter" | character vector Command window display control, specified as a value in this table.
12-954
Value
Description
"final"
fgls displays the final estimates.
"iter"
fgls displays the estimates after each iteration.
"off"
fgls suppresses command window display.
fgls
fgls shows estimation results in tabular form. Example: Display="iter" Data Types: char | string Plot — Control for plotting results "off" (default) | "all" | "coeff" | "mse" | "se" | character vector | string vector | cell array of character vectors Control for plotting results after each iteration, specified as a value in the following table, or a string vector or cell array of character vectors of such values. To examine the convergence of the FGLS algorithm, specify plotting the estimates for each iteration. Value
Description
"all"
fgls plots the estimated coefficients, their standard errors, and the residual mean-squared error (MSE) on separate plots.
"coeff"
fgls plots the estimated coefficients.
"mse"
fgls plots the MSEs.
"off"
fgls does not plot the results.
"se"
fgls plots the estimated coefficient standard errors.
Example: Plot="all" Example: Plot=["coeff" "se"] separately plots iterative coefficient estimates and their standard errors. Data Types: char | string | cell ResponseVariable — Variable in Tbl to use for response first variable in Tbl (default) | string vector | cell vector of character vectors | vector of integers | logical vector Variable in Tbl to use for response, specified as a string vector or cell vector of character vectors containing variable names in Tbl.Properties.VariableNames, or an integer or logical vector representing the indices of names. The selected variables must be numeric. fgls uses the same specified response variable for all tests. Example: ResponseVariable="GDP" Example: ResponseVariable=[true false false false] or ResponseVariable=1 selects the first table variable as the response. Data Types: double | logical | char | cell | string PredictorVariables — Variables in Tbl to use for the predictors string vector | cell vector of character vectors | vector of integers | logical vector Variables in Tbl to use for the predictors, specified as a string vector or cell vector of character vectors containing variable names in Tbl.Properties.VariableNames, or an integer or logical vector representing the indices of names. The selected variables must be numeric. 12-955
12
Functions
fgls uses the same specified predictors for all tests. By default, fgls uses all variables in Tbl that are not specified by the ResponseVariable namevalue argument. Example: PredictorVariables=["UN" "CPI"] Example: PredictorVariables=[false true true false] or DataVariables=[2 3] selects the second and third table variables. Data Types: double | logical | char | cell | string
Output Arguments coeff — FGLS coefficient estimates numeric vector FGLS coefficient estimates, returned as a numCoeffs-by-1 numeric vector. fgls returns coeff when you supply the inputs X and y. Rows of coeff correspond to the predictor matrix columns, with the first row corresponding to the intercept when Intercept=true. For example, in a model with an intercept, the value of β 1 (corresponding to the predictor x1) is in position 2 of coeff. se — Coefficient standard error estimates numeric vector Coefficient standard error estimates, returned as a numCoeffs-by-1 numeric. The elements of se are sqrt(diag(EstCoeffCov)). fgls returns se when you supply the inputs X and y. Rows of se correspond to the predictor matrix columns, with the first row corresponding to the intercept when Intercept=true. For example, in a model with an intercept, the estimated standard error of β 1 (corresponding to the predictor x1) is in position 2 of se, and is the square root of the value in position (2,2) of EstCoeffCov. EstCoeffCov — Coefficient covariance matrix estimate numeric matrix Coefficient covariance matrix estimate, returned as a numCoeffs-by-numCoeffs numeric matrix. fgls returns EstCoeffCov when you supply the inputs X and y. Rows and columns of EstCoeffCov correspond to the predictor matrix columns, with the first row and column corresponding to the intercept when Intercept=true. For example, in a model with an intercept, the estimated covariance of β 1 (corresponding to the predictor x1) and β 2 (corresponding to the predictor x2) are in positions (2,3) and (3,2) of EstCoeffCov, respectively. CoeffTbl — FGLS coefficient estimates and standard errors table FGLS coefficient estimates and standard errors, returned as a numCoeffs-by-2 table. fgls returns CoeffTbl when you supply the input Tbl. For j = 1,…,numCoeffs, row j of CoeffTbl contains estimates of coefficient j in the regression model and it has label VarNames(j). The first variable Coeff contains the coefficient estimates coeff and the second variable SE contains the standard errors se. 12-956
fgls
CovTbl — Coefficient covariance matrix estimate table Coefficient covariance matrix estimate, returned as a numCoeffs-by-numCoeffs table containing the coefficient covariance matrix estimate EstCoeffCov. fgls returns CovTbl when you supply the input Tbl. For each pair (i,j), CovTbl(i,j) contains the covariance estimate of coefficients i and j in the regression model. The label of row and variable j is VarNames(j), j = 1,…,numCoeffs. iterPlots — Handles to plotted graphics objects structure array of graphics objects Handles to plotted graphics objects, returned as a structure array of graphics objects. iterPlots contains unique plot identifiers, which you can use to query or modify properties of the plot. iterPlots is not available if the value of the Plot name-value argument is "off".
More About Feasible Generalized Least Squares Feasible generalized least squares (FGLS) estimates the coefficients of a multiple linear regression model and their covariance matrix in the presence of nonspherical innovations with an unknown covariance matrix. Let yt = Xtβ + εt be a multiple linear regression model, where the innovations process εt is Gaussian with mean 0, but with true, nonspherical covariance matrix Ω (for example, the innovations are heteroscedastic or autocorrelated). Also, suppose that the sample size is T and there are p predictors (including an intercept). Then, the FGLS estimator of β is β FGLS = X ⊤Ω
−1
X
−1 ⊤
X Ω
−1
y,
where Ω is an innovations covariance estimate based on a model (e.g., innovations process forms an AR(1) model). The estimated coefficient covariance matrix is 2
Σ FGLS = σ FGLS X ⊤Ω
−1
X
−1
,
where
2 σ FGLS
y⊤ Ω =
−1
−Ω
−1
X X ⊤Ω
−1
X
−1 ⊤
T−p
X Ω
−1
y .
FGLS estimates are computed as follows: 1
OLS is applied to the data, and then residuals ε t are computed.
2
Ω is estimated based on a model for the innovations covariance.
3
β FGLS is estimated, along with its covariance matrix Σ FGLS .
4
Optional: This process can be iterated by performing the following steps until β FGLS converges. 12-957
12
Functions
a
Compute the residuals of the fitted model using the FGLS estimates.
b
Apply steps 2–3.
If Ω is a consistent estimator of Ω and the predictors that comprise X are exogenous, then FGLS estimators are consistent and efficient. Asymptotic distributions of FGLS estimators are unchanged by repeated iteration. However, iterations might change finite sample distributions. Generalized Least Squares Generalized least squares (GLS) estimates the coefficients of a multiple linear regression model and their covariance matrix in the presence of nonspherical innovations with known covariance matrix. The setup and process for obtaining GLS estimates is the same as in FGLS on page 12-957, but replace Ω with the known innovations covariance matrix Ω. In the presence of nonspherical innovations, and with known innovations covariance, GLS estimators are unbiased, efficient, and consistent, and hypothesis tests based on the estimates are valid. Weighted Least Squares Weighted least squares (WLS) estimates the coefficients of a multiple linear regression model and their covariance matrix in the presence of uncorrelated but heteroscedastic innovations with known, diagonal covariance matrix. The setup and process to obtain WLS estimates is the same as in FGLS on page 12-957, but replace Ω with the known, diagonal matrix of weights. Typically, the diagonal elements are the inverse of the variances of the innovations. In the presence of heteroscedastic innovations, and when the variances of the innovations are known, WLS estimators are unbiased, efficient, and consistent, and hypothesis tests based on the estimates are valid.
Tips • To obtain standard generalized least squares on page 12-958 (GLS) estimates: • Set the InnovCov0 name-value argument to the known innovations covariance. • Set the NumIter name-value argument to 1. • To obtain weighted least squares on page 12-958 (WLS) estimates, set the InnovCov0 name-value argument to a vector of inverse weights (e.g., innovations variance estimates). • In specific models and with repeated iterations, scale differences in the residuals might produce a badly conditioned estimated innovations covariance and induce numerical instability. Conditioning improves when you set ResCond=true.
Algorithms • In the presence of nonspherical innovations, GLS produces efficient estimates relative to OLS and consistent coefficient covariances, conditional on the innovations covariance. The degree to which fgls maintains these properties depends on the accuracy of both the model and estimation of the innovations covariance. 12-958
fgls
• Rather than estimate FGLS estimates the usual way, fgls uses methods that are faster and more stable, and are applicable to rank-deficient cases. • Traditional FGLS methods, such as the Cochrane-Orcutt procedure, use low-order, autoregressive models. These methods, however, estimate parameters in the innovations covariance matrix using OLS, where fgls uses maximum likelihood estimation (MLE) [2].
Version History Introduced in R2014b R2022a: fgls returns estimates in tables when you supply a table of data If you supply a table of time series data Tbl, fgls returns the following outputs: • In the first position, fgls returns the table CoeffTbl containing variables for coefficient estimates Coeff and standard errors SE with rows corresponding to, and labeled as, VarNames. • In the second position, fgls returns a table containing the estimated coefficient covariances CovTbl with rows and variables corresponding to, and labeled as, VarNames. Before R2022a, fgls returned the numeric outputs in separate positions of the output when you supplied a table of input data. Starting in R2022a, if you supply a table of input data, update your code to return all outputs in the first through third output positions. [CoeffTbl,CovTbl,iterPlots] = fgls(Tbl,Name=Value)
If you request more outputs, fgls issues an error. Also, access results by using table indexing. For more details, see “Access Data in Tables”.
References [1] Cribari-Neto, F. "Asymptotic Inference Under Heteroskedasticity of Unknown Form." Computational Statistics & Data Analysis. Vol. 45, 2004, pp. 215–233. [2] Hamilton, James D. Time Series Analysis. Princeton, NJ: Princeton University Press, 1994. [3] Judge, G. G., W. E. Griffiths, R. C. Hill, H. Lϋtkepohl, and T. C. Lee. The Theory and Practice of Econometrics. New York, NY: John Wiley & Sons, Inc., 1985. [4] Kutner, M. H., C. J. Nachtsheim, J. Neter, and W. Li. Applied Linear Statistical Models. 5th ed. New York: McGraw-Hill/Irwin, 2005. [5] MacKinnon, J. G., and H. White. "Some Heteroskedasticity-Consistent Covariance Matrix Estimators with Improved Finite Sample Properties." Journal of Econometrics. Vol. 29, 1985, pp. 305–325. [6] White, H. "A Heteroskedasticity-Consistent Covariance Matrix and a Direct Test for Heteroskedasticity." Econometrica. Vol. 48, 1980, pp. 817–838. 12-959
12
Functions
See Also Functions fitlm | lscov | hac | arma2ar Objects regARIMA Topics “Classical Model Misspecification Tests” on page 3-69 “Time Series Regression I: Linear Models” on page 5-176 “Time Series Regression VI: Residual Diagnostics” on page 5-223 “Time Series Regression X: Generalized Least Squares and HAC Estimators” on page 5-282 “Autocorrelation and Partial Autocorrelation” on page 3-10 “Engle’s ARCH Test” on page 3-25 “Nonspherical Models” on page 3-89 “Time Series Regression Models” on page 5-3
12-960
filter
filter Filter disturbances using univariate ARIMA or ARIMAX model
Syntax Y = filter(Mdl,Z) [Y,E,V] = filter(Mdl,Z) Tbl2 = filter(Mdl,Tbl1) [ ___ ] = filter( ___ ,Name,Value)
Description Y = filter(Mdl,Z) returns the numeric array of one or more response series Y resulting from filtering the numeric array of one or more underlying disturbance series Z through the fully specified, univariate ARIMA model Mdl. Z is associated with the model innovations process that drives the specified ARIMA model. [Y,E,V] = filter(Mdl,Z) also returns numeric arrays of model innovations E and, when Mdl represents a composite conditional mean and variance model, conditional variances V, resulting from filtering the disturbance paths Z through the model Mdl. Tbl2 = filter(Mdl,Tbl1) returns the table or timetable Tbl2 containing the results from filtering the paths of disturbances in the input table or timetable Tbl1 through Mdl. The disturbance variable in Tbl1 is associated with the model innovations process through Mdl. filter selects the variable Mdl.SeriesName, or the sole variable in Tbl1, as the disturbance variable to filter through the model. To select a different variable in Tbl1 to filter through the model, use the DisturbanceVariable name-value argument. [ ___ ] = filter( ___ ,Name,Value) specifies options using one or more name-value arguments in addition to any of the input argument combinations in previous syntaxes. filter returns the output argument combination for the corresponding input arguments. For example, filter(Mdl,Z,Z0=PS,X=Pred) filters the numeric vector of disturbances Z through the ARIMAX Mdl, and specifies the numeric vector of presample disturbance data PS to initialize the model and the exogenous predictor data X for the regression component.
Examples Filter Vector of Disturbances Through Model Compute the impulse response function (IRF) of an ARMA model by filtering a vector of zeros, representing disturbances, through the model. Specify a mean zero ARMA(2,0,1) model. Mdl = arima(Constant=0,AR={0.5 -0.8},MA=-0.5, ... Variance=0.1);
12-961
12
Functions
Simulate the first 20 responses of the IRF. Generate a disturbance series with a one-time, unit impulse, and then filter it. z = [1; zeros(19,1)]; y = filter(Mdl,z);
y is a 20-by-1 response path resulting from filtering the disturbance path z through the model. y represents the IRF. The filter function requires one presample observation to initialize the model. By default, filter uses the unconditional mean of the process, which is 0. y = y/y(1);
Normalize the IRF such that the first element is 1. Plot the impulse response function. figure stem((0:numel(y)-1)',y,"filled"); title("Impulse Response")
The impulse response assesses the dynamic behavior of a system to a one-time, unit impulse. Alternatively, you can use the impulse function to plot the IRF for an ARIMA process.
12-962
filter
Simulate and Filter Multiple Paths Filter a matrix of disturbance paths. Return the paths of responses and innovations, which drive the data-generating processes. Create a mean zero ARIMA(2,0,1) model. Mdl = arima(Constant=0,AR={0.5,-0.8},MA=-0.5, ... Variance=0.1);
Generate 20 random, length 100 paths from the model. rng(1,"twister"); % For reproducibility [ySim,eSim,vSim] = simulate(Mdl,100,NumPaths=20);
ySim, eSim, and vSim are 100-by-20 matrices of 20 simulated response, innovation, and conditional variance paths of length 100, respectively. Because Mdl does not have a conditional variance model, vSim is a matrix completely composed of the value of Mdl.Variance. Obtain disturbance paths by standardizing the simulated innovations. zSim = eSim./sqrt(vSim);
Filter the disturbance paths through the model. [yFil,eFil] = filter(Mdl,zSim);
yFil and eFil are 100-by-20 matrices. The columns are independent paths generated from filtering corresponding disturbance paths in zSim through the model Mdl. Confirm that the outputs of simulate and filter are identical. sameE = norm(eSim - eFil) < eps sameE = logical 1 sameY = norm(ySim - yFil) < eps sameY = logical 1
The logical values 1 confirm the outputs are effectively identical.
Filter Disturbance Path in Timetable Fit an ARIMA(1,1,1) model to the weekly average NYSE closing prices. Supply a timetable of data and specify the series for the fit. Then, filter randomly generated Gaussian noise paths through the estimated model to simulate responses and innovations. Load Data Load the US equity index data set Data_EquityIdx. 12-963
12
Functions
load Data_EquityIdx T = height(DataTimeTable) T = 3028
The timetable DataTimeTable includes the time series variable NYSE, which contains daily NYSE composite closing prices from January 1990 through December 2001. Plot the daily NYSE price series. figure plot(DataTimeTable.Time,DataTimeTable.NYSE) title("NYSE Daily Closing Prices: 1990 - 2001")
Prepare Timetable for Estimation When you plan to supply a timetable, you must ensure it has all the following characteristics: • The selected response variable is numeric and does not contain any missing values. • The timestamps in the Time variable are regular, and they are ascending or descending. Remove all missing values from the timetable, relative to the NYSE price series. DTT = rmmissing(DataTimeTable,DataVariables="NYSE"); T_DTT = height(DTT) T_DTT = 3028
12-964
filter
Because all sample times have observed NYSE prices, rmmissing does not remove any observations. Determine whether the sampling timestamps have a regular frequency and are sorted. areTimestampsRegular = isregular(DTT,"days") areTimestampsRegular = logical 0 areTimestampsSorted = issorted(DTT.Time) areTimestampsSorted = logical 1
areTimestampsRegular = 0 indicates that the timestamps of DTT are irregular. areTimestampsSorted = 1 indicates that the timestamps are sorted. Business day rules make daily macroeconomic measurements irregular. Remedy the time irregularity by computing the weekly average closing price series of all timetable variables. DTTW = convert2weekly(DTT,Aggregation="mean"); areTimestampsRegular = isregular(DTTW,"weeks") areTimestampsRegular = logical 1 T_DTTW = height(DTTW) T_DTTW = 627
DTTW is regular. figure plot(DTTW.Time,DTTW.NYSE) title("NYSE Daily Closing Prices: 1990 - 2001")
12-965
12
Functions
Create Model Template for Estimation Suppose that an ARIMA(1,1,1) model is appropriate to model NYSE composite series during the sample period. Create an ARIMA(1,1,1) model template for estimation. Set the response series name to NYSE. Mdl = arima(1,1,1); Mdl.SeriesName = "NYSE";
Mdl is a partially specified arima model object. Fit Model to Data Fit an ARIMA(1,1,1) model to weekly average NYSE closing prices. Specify the entire series. EstMdl = estimate(Mdl,DTTW); ARIMA(1,1,1) Model (Gaussian Distribution):
Constant AR{1} MA{1} Variance
12-966
Value ________
StandardError _____________
0.86385 -0.37582 0.47221 55.89
0.46496 0.22719 0.21741 1.832
TStatistic __________ 1.8579 -1.6542 2.172 30.507
PValue ___________ 0.063181 0.098091 0.029859 2.1201e-204
filter
EstMdl is a fully specified, estimated arima model object. By default, estimate backcasts for the required Mdl.P = 2 presample responses. Filter Random Gaussian Disturbance Paths Generate 2 random, independent series of length T_DTTW from the standard Gaussian distribution. Store the matrix of series as one variable in DTTW. rng(1,"twister") % For reproducibility DTTW.Z = randn(T_DTTW,2);
DTTW contains a new variable called Z containing a T_DTTW-by-2 matrix of two disturbance paths. Filter the paths of disturbances through the estimated ARIMA model. Specify the table variable name containing the disturbance paths. Tbl2 = filter(EstMdl,DTTW,DisturbanceVariable="Z"); tail(Tbl2) Time ___________
NYSE ______
NASDAQ ______
Z _____________________
NYSE_Response ________________
NYSE_Innova _____________
16-Nov-2001 23-Nov-2001 30-Nov-2001 07-Dec-2001 14-Dec-2001 21-Dec-2001 28-Dec-2001 04-Jan-2002
577.11 583 581.41 584.96 574.03 582.1 590.28 589.8
1886.9 1898.3 1925.8 1998.1 1981 1967.9 1967.2 1950.4
-1.8948 1.3583 -0.9118 -0.14964 -0.40114 -0.57758 2.0039 -0.50964
358.78 367.95 363.35 361.61 359.6 355.48 370.83 369.19
-14.166 10.155 -6.8165 -1.1187 -2.9989 -4.318 14.981 -3.8101
0.41292 0.27051 1.1119 -2.418 0.98498 0.0039243 -0.92415 -0.43856
433.57 436.63 445.61 428.95 434.9 437.03 430.2 427.09
size(Tbl2) ans = 1×2 627
6
Tbl2 is a 627-by-6 timetable containing all variables in DTTW, and the two filtered response paths NYSE_Response, innovation paths NYSE_Innovation, and constant variance paths NYSE_Variance (Mdl.Variance = 55.89).
Supply Presample Responses Assess the dynamic behavior of a system to a persistent change in a variable by plotting a step response. Supply presample responses to initialize the model. Specify a mean zero ARIMA(2,0,1) process. Mdl = arima(Constant=0,AR={0.5 -0.8},MA=-0.5, ... Variance=0.1);
Simulate the first 20 responses to a sequence of unit disturbances. Generate a disturbance series of ones, and then filter it. Set all presample observations equal to zero. 12-967
-
0. -
12
Functions
Z = ones(20,1); Y = filter(Mdl,Z,Y0=zeros(Mdl.P,1)); Y = Y/Y(1);
The last step normalizes the step response function to ensure that the first element is 1. Plot the step response function. figure stem((0:numel(Y)-1)',Y,"filled"); title("Step Response")
Simulate Responses from ARIMAX Model Create models for the response and predictor series. Set an ARIMAX(2,1,3) model to the response MdlY, and an AR(1) model to the MdlX. MdlY = arima(AR={0.1 0.2},D=1,MA={-0.1 0.1 0.05}, ... Constant=1,Variance=0.5,Beta=2); MdlX = arima(AR=0.5,Constant=0,Variance=0.1);
Simulate a length 100 predictor series x and a series of iid normal disturbances z having mean zero and variance 1.
12-968
filter
rng(1,"twister") z = randn(100,1); x = simulate(MdlX,100);
Filter the disturbances z using MdlY to produce the response series y. Plot y. y = filter(MdlY,z,X=x); figure plot(y); xlabel("Time") ylabel("Response")
Filter Disturbances Through Composite Conditional Mean and Variance Model Create the composite AR(1)/GARCH(1,1) model yt = 1 + 0 . 5yt − 1 + εt εt = σtzt σt2 = 0 . 2 + 0 . 1σt2− 1 + 0 . 05εt2− 1 zt ∼ N(0, 1) . Create the composite model. 12-969
12
Functions
CVMdl = garch(Constant=0.2,GARCH=0.1,ARCH=0.05); Mdl = arima(Constant=1,AR=0.5,Variance=CVMdl) Mdl = arima with properties: Description: SeriesName: Distribution: P: D: Q: Constant: AR: SAR: MA: SMA: Seasonality: Beta: Variance:
"ARIMA(1,0,0) Model (Gaussian Distribution)" "Y" Name = "Gaussian" 1 0 0 1 {0.5} at lag [1] {} {} {} 0 [1×0] [GARCH(1,1) Model]
Mdl is an arima object. The property Mdl.Variance contains a garch object that represents the conditional variance model. Generate a random series of 100 standard Gaussian of disturbances. rng(1,"twister") % For reproducibility z = randn(100,1);
Filter the disturbances through the model. Return and plot the simulated conditional variances. [y,e,v] = filter(Mdl,z); plot(z)
12-970
filter
Input Arguments Mdl — Fully specified ARIMA model arima model object Fully specified ARIMA model, specified as an arima model object created by arima or estimate. The properties of Mdl cannot contain NaN values. Z — Disturbance series paths zt numeric column vector | numeric matrix Underlying disturbance paths zt, specified as a numobs-by-1 numeric column vector or numobs-bynumpaths numeric matrix. numObs is the length of the time series (sample size). numpaths is the number of separate, independent disturbance paths. zt drives the innovation process εt. For a variance process σt2, the innovation process is εt = σtzt . Each row corresponds to a sampling time. The last row contains the latest set of disturbances. Each column corresponds to a separate, independent path of disturbances. filter assumes that disturbances across any row occur simultaneously. 12-971
12
Functions
Z is the continuation of the presample disturbances Z0. Data Types: double Tbl1 — Time series data table | timetable Time series data containing the observed disturbance variable zt, associated with the model innovations process εt, and, optionally, predictor variables xt, specified as a table or timetable with numvars variables and numobs rows. You can optionally select the disturbance variable or numpreds predictor variables by using the DisturbanceVariable or PredictorVariables name-value arguments, respectively. For a variance process σt2, the innovation process is εt = σtzt . Each row is an observation, and measurements in each row occur simultaneously. The selected disturbance variable is a single path (numobs-by-1 vector) or multiple paths (numobs-by-numpaths matrix) of numobs observations of disturbance data. Each path (column) of the selected disturbance variable is independent of the other paths, but path j of all presample and in-sample variables correspond, for j = 1,…,numpaths. Each selected predictor variable is a numobs-by-1 numeric vector representing one path. The filter function includes all predictor variables in the model when it filters each disturbance path. Variables in Tbl1 represent the continuation of corresponding variables in Presample. If Tbl1 is a timetable, it must represent a sample with a regular datetime time step (see isregular), and the datetime vector Tbl1.Time must be strictly ascending or descending. If Tbl1 is a table, the last row contains the latest observation. Name-Value Arguments Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter. Before R2021a, use commas to separate each name and value, and enclose Name in quotes. Example: filter(Mdl,Z,Z0=PS,X=Pred) specifies the numeric vector of presample disturbance data PS to initialize the model and the exogenous predictor data X for the regression component. DisturbanceVariable — Disturbance variable zt to select from Tbl1 string scalar | character vector | integer | logical vector Disturbance variable zt to select from Tbl1 containing the disturbance data to filter through Mdl, specified as one of the following data types: • String scalar or character vector containing a variable name in Tbl1.Properties.VariableNames • Variable index (positive integer) to select from Tbl1.Properties.VariableNames • A logical vector, where DisturbanceVariable(j) = true selects variable j from Tbl1.Properties.VariableNames The selected variable must be a numeric vector and cannot contain missing values (NaNs). 12-972
filter
If Tbl1 has one variable, the default specifies that variable. Otherwise, the default matches the variable to names in Mdl.SeriesName. Example: DisturbanceVariable="StockRateDist" Example: DisturbanceVariable=[false false true false] or DisturbanceVariable=3 selects the third table variable as the disturbance variable. Data Types: double | logical | char | cell | string Y0 — Presample response data yt numeric column vector | numeric matrix Presample response data yt to initialize the model, specified as a numpreobs-by-1 numeric column vector or a numpreobs-by-numprepaths numeric matrix. Use Y0 only when you supply the numeric array of disturbance data Z. numpreobs is the number of presample observations. numprepaths is the number of presample response paths. Each row is a presample observation (sampling time), and measurements in each row occur simultaneously. The last row contains the latest presample observation. numpreobs must be at least Mdl.P to initialize the AR model component. If numpreobs > Mdl.P, filter uses the latest required observations only. Columns of Y0 are separate, independent presample paths. The following conditions apply: • If Y0 is a column vector, it represents a single response path. filter applies it to each output path. • If Y0 is a matrix, each column represents a presample response path. filter applies Y0(:,j) to initialize path j. numprepaths must be at least numpaths. If numprepaths > numpaths, filter uses the first size(Z,2) columns only. By default, filter sets any necessary presample responses to one of the following values: • The unconditional mean of the model when Mdl represents a stationary AR process without a regression component • Zero when Mdl represents a nonstationary process or when it contains a regression component Data Types: double Z0 — Presample disturbance data zt numeric column vector | numeric matrix Presample disturbance data zt providing initial values for the input disturbance series Z, specified as a numpreobs-by-1 numeric column vector or a numpreobs-by-numprepaths numeric matrix. Use Z0 only when you supply the numeric array of disturbance data Z. Each row is a presample observation (sampling time), and measurements in each row occur simultaneously. The last row contains the latest presample observation. numpreobs must be at least Mdl.Q to initialize the MA model component. If Mdl.Variance is a conditional variance model (for example, a garch model object), filter can require more rows than Mdl.Q. If numpreobs is larger than required, filter uses the latest required observations only. Columns of Z0 are separate, independent presample paths. The following conditions apply: 12-973
12
Functions
• If Z0 is a column vector, it represents a single disturbance path. filter applies it to each output path. • If Z0 is a matrix, each column represents a presample disturbance path. filter applies Z0(:,j) to initialize path j. numprepaths must be at least numpaths. If numprepaths > numpaths, filter uses the first size(Z,2) columns only. By default, filter sets the necessary presample disturbances to zero. Data Types: double V0 — Presample conditional variance data σt2 positive numeric column vector | positive numeric matrix Presample conditional variance data σt2 used to initialize the conditional variance model, specified as a numpreobs-by-1 positive numeric column vector or a numpreobs-by-numprepaths positive numeric matrix. If the conditional variance Mdl.Variance is constant, filter ignores V0. Use V0 only when you supply the numeric array of disturbance data Z. Each row is a presample observation (sampling time), and measurements in each row occur simultaneously. The last row contains the latest presample observation. numpreobs must be at least Mdl.Q to initialize the conditional variance model in Mdl.Variance. For details, see the filter function of conditional variance models. If numpreobs is larger than required, filter uses the latest required observations only. Columns of V0 are separate, independent presample paths. The following conditions apply: • If V0 is a column vector, it represents a single path of conditional variances. filter applies it to each output path. • If V0 is a matrix, each column represents a presample path of conditional variances. filter applies V0(:,j) to initialize path j. numprepaths must be at least numpaths. If numprepaths > numpaths, filter uses the first size(Z,2) columns only. By default, filter sets all necessary presample conditional variances to the unconditional variance of the conditional variance process. Data Types: double Presample — Presample data table | timetable Presample data containing paths of response yt, disturbance zt, or conditional variance σt2 series to initialize the model, specified as a table or timetable, the same type as Tbl1, with numprevars variables and numpreobs rows. Use Presample only when you supply a table or timetable of data Tbl1. Each selected variable is a single path (numpreobs-by-1 vector) or multiple paths (numpreobs-bynumprepaths matrix) of numpreobs observations representing the presample of the response, disturbance, or conditional variance series for DisturbanceVariable, the selected disturbance variable in Tbl1. Each row is a presample observation, and measurements in each row occur simultaneously. numpreobs must be one of the following values: • At least Mdl.P when Presample provides only presample responses 12-974
filter
• At least Mdl.Q when Presample provides only presample disturbances or conditional variances • At least max([Mdl.P Mdl.Q]) otherwise When Mdl.Variance is a conditional variance model, filter can require more than the minimum required number of presample values. If you supply more rows than necessary, filter uses the latest required number of observations only. If Presample is a timetable, all the following conditions must be true: • Presample must represent a sample with a regular datetime time step (see isregular). • The inputs Tbl1 and Presample must be consistent in time such that Presample immediately precedes Tbl1 with respect to the sampling frequency and order. • The datetime vector of sample timestamps Presample.Time must be ascending or descending. If Presample is a table, the last row contains the latest presample observation. By default, filter sets the following values: • For necessary presample responses: • The unconditional mean of the model when Mdl represents a stationary AR process without a regression component • Zero when Mdl represents a nonstationary process or when it contains a regression component. • For necessary presample disturbances, zero. • For necessary presample conditional variances, the unconditional variance of the conditional variance model n Mdl.Variance. If you specify the Presample, you must specify the presample response, disturbance, or conditional variance name by using the PresampleResponseVariable, PresampleDisturbanceVariable, or PresampleVarianceVariable name-value argument. PresampleResponseVariable — Response variable yt to select from Presample string scalar | character vector | integer | logical vector Response variable yt to select from Presample containing presample response data, specified as one of the following data types: • String scalar or character vector containing a variable name in Presample.Properties.VariableNames • Variable index (positive integer) to select from Presample.Properties.VariableNames • A logical vector, where PresampleResponseVariable(j) = true selects variable j from Presample.Properties.VariableNames The selected variable must be a numeric matrix and cannot contain missing values (NaNs). If you specify presample response data by using the Presample name-value argument, you must specify PresampleResponseVariable. Example: PresampleResponseVariable="Stock0" 12-975
12
Functions
Example: PresampleResponseVariable=[false false true false] or PresampleResponseVariable=3 selects the third table variable as the presample response variable. Data Types: double | logical | char | cell | string PresampleDisturbanceVariable — Disturbance variable zt to select from Presample string scalar | character vector | integer | logical vector Disturbance variable zt to select from Presample containing presample disturbance data, specified as one of the following data types: • String scalar or character vector containing a variable name in Presample.Properties.VariableNames • Variable index (positive integer) to select from Presample.Properties.VariableNames • A logical vector, where PresampleDisturbanceVariable(j) = true selects variable j from Presample.Properties.VariableNames The selected variable must be a numeric matrix and cannot contain missing values (NaNs). If you specify presample disturbance data by using the Presample name-value argument, you must specify PresampleDisturbanceVariable. Example: PresampleDisturbanceVariable="StockRateDist0" Example: PresampleDisturbanceVariable=[false false true false] or PresampleDisturbanceVariable=3 selects the third table variable as the presample disturbance variable. Data Types: double | logical | char | cell | string PresampleVarianceVariable — Conditional variance variable σt2 to select from Presample string scalar | character vector | integer | logical vector Conditional variance variable σt2 to select from Presample containing presample conditional variance data, specified as one of the following data types: • String scalar or character vector containing a variable name in Presample.Properties.VariableNames • Variable index (positive integer) to select from Presample.Properties.VariableNames • A logical vector, where PresampleVarianceVariable(j) = true selects variable j from Presample.Properties.VariableNames The selected variable must be a numeric vector and cannot contain missing values (NaNs). If you specify presample conditional variance data by using the Presample name-value argument, you must specify PresampleVarianceVariable. Example: PresampleVarianceVariable="StockRateVar0" Example: PresampleVarianceVariable=[false false true false] or PresampleVarianceVariable=3 selects the third table variable as the presample conditional variance variable. Data Types: double | logical | char | cell | string 12-976
filter
X — Exogenous predictor data numeric matrix Exogenous predictor data for the regression component in the model, specified as a numeric matrix with numpreds columns. numpreds is the number of predictor variables (numel(Mdl.Beta)). Use X only when you supply the numeric array of disturbance data Z. X must have at least numobs rows. The last row contains the latest predictor data. If X has more than numobs rows, filter uses only the latest numobs rows. Each row of X corresponds to each period in Z (period for which filter filters errors; the period after the presample). filter does not use the regression component in the presample period. Columns of X are separate predictor variables. filter applies X to each filtered path; that is, X represents one path of observed predictors. By default, filter excludes the regression component, regardless of its presence in Mdl. Data Types: double PredictorVariables — Exogenous predictor variables xt to select from Tbl1 string vector | cell vector of character vectors | vector of integers | logical vector Exogenous predictor variables xt to select from Tbl1 containing predictor data for the regression component, specified as one of the following data types: • String vector or cell vector of character vectors containing numpreds variable names in Tbl1.Properties.VariableNames • A vector of unique indices (positive integers) of variables to select from Tbl1.Properties.VariableNames • A logical vector, where PredictorVariables(j) = true selects variable j from Tbl1.Properties.VariableNames The selected variables must be numeric vectors and cannot contain missing values (NaNs). By default, filter excludes the regression component, regardless of its presence in Mdl. Example: PredictorVariables=["M1SL" "TB3MS" "UNRATE"] Example: PredictorVariables=[true false true false] or PredictorVariable=[1 3] selects the first and third table variables to supply the predictor data. Data Types: double | logical | char | cell | string Note • NaN values in Z, X, Y0, Z0, and V0 indicate missing values. filter removes missing values from specified data by list-wise deletion. • For the presample, filter horizontally concatenates the possibly jagged arrays Y0, Z0, and V0 with respect to the last rows, and then it removes any row of the concatenated matrix containing at least one NaN. • For in-sample data, filter horizontally concatenates the possibly jagged arrays Z and X, and then it removes any row of the concatenated matrix containing at least one NaN. 12-977
12
Functions
This type of data reduction reduces the effective sample size and can create an irregular time series. • For numeric data inputs, filter assumes that you synchronize the presample data such that the latest observations occur simultaneously. • filter issues an error when any table or timetable input contains missing values.
Output Arguments Y — Simulated response paths yt numeric column vector | numeric matrix Simulated response paths yt, returned as a length numobs column vector or a numobs-by-numpaths numeric matrix. filter returns Y only when you supply the input Z. For each t = 1, …, numobs, the simulated response at time t Y(t,:) corresponds to the filtered disturbance at time t Z(t,:) and response path j Y(:,j) corresponds to the filtered disturbance path j Z(:,j). Y represents the continuation of the presample response paths in Y0. E — Simulated paths of model innovations εt numeric column vector | numeric matrix Simulated paths of model innovations εt, returned as a length numobs column vector or a numobs-bynumpaths numeric matrix. filter returns E only when you supply the input Z. The dimensions of Y and E correspond. Columns of E are scaled disturbance paths (innovations) such that, for a particular path εt = σtzt . V — Conditional variance paths σt2 numeric column vector | numeric matrix Conditional variance paths σt2, returned as a length numobs column vector or numobs-by-numpaths numeric matrix. filter returns V only when you supply the input Z. The dimensions of Y and V correspond. If Z is a matrix, then the columns of V are the filtered conditional variance paths corresponding to the columns of Z. Columns of V are conditional variance paths of corresponding paths of innovations εt (E) such that, for a particular path εt = σtzt . V represents the continuation of the presample conditional variance paths in V0. Tbl2 — Simulated response yt, innovation εt, and conditional variance σt2 paths table | timetable Simulated response yt, innovation εt, and conditional variance σt2 paths, returned as a table or timetable, the same data type as Tbl1. filter returns Tbl2 only when you supply the input Tbl1. 12-978
filter
Tbl2 contains the following variables: • The simulated response paths, which are in a numobs-by-numpaths numeric matrix, with rows representing observations and columns representing independent paths, each corresponding to the input observations and paths of the disturbance variable in Tbl1. filter names the simulated response variable in Tbl2 responseName_Response, where responseName is Mdl.SeriesName. For example, if Mdl.SeriesName is StockReturns, Tbl2 contains a variable for the corresponding simulated response paths with the name StockReturns_Response. • The simulated innovation paths, which are in a numobs-by-numpaths numeric matrix, with rows representing observations and columns representing independent paths, each corresponding to the input observations and paths of the disturbance variable in Tbl1. filter names the simulated innovation variable in Tbl2 responseName_Innovation, where responseName is Mdl.SeriesName. For example, if Mdl.SeriesName is StockReturns, Tbl2 contains a variable for the corresponding simulated innovation paths with the name StockReturns_Innovation. • The simulated conditional variances paths, which are in a numobs-by-numpaths numeric matrix, with rows representing observations and columns representing independent paths, each corresponding to the input observations and paths of the disturbance variable in Tbl1. filter names the simulated conditional variance variable in Tbl2 responseName_Variance, where responseName is Mdl.SeriesName. For example, if Mdl.SeriesName is StockReturns, Tbl2 contains a variable for the corresponding simulated conditional variance paths with the name StockReturns_Variance. • All variables Tbl1. If Tbl1 is a timetable, row times of Tbl1 and Tbl2 are equal.
Alternative Functionality filter generalizes simulate; both functions filter a series of disturbances to produce output responses, innovations, and conditional variances. However, simulate autogenerates a series of mean zero, unit variance, independent and identically distributed (iid) disturbances according to the distribution in Mdl. In contrast, filter enables you to directly specify custom disturbances.
Version History Introduced in R2012b R2023b: filter accepts input data in tables and timetables, and returns results in tables and timetables In addition to accepting input data (in-sample and presample) in numeric arrays, filter accepts input data in tables or regular timetables. When you supply data in a table or timetable, the following conditions apply: • filter chooses the default in-sample disturbance series and predictor data on which to operate, but you can use the specified optional name-value argument to select a different series. • If you specify optional presample data to initialize the model, you must also specify the presample response, disturbance, or conditional variance series name. • filter returns results in a table or timetable. Name-value arguments to support tabular workflows include: 12-979
12
Functions
• DisturbanceVariable specifies the name of the disturbance series to select from the input data to filter through the model. • Presample specifies the input table or timetable of presample response, disturbance, and conditional variance data. • PresampleResponseVariable specifies the name of the response series to select from Presample. • PresampleDisturbanceVariable specifies the name of the disturbance series to select from Presample. • PresampleVarianceVariable specifies the name of the conditional variance series to select from Presample. • PredictorVariables specifies the names of the predictor series to select from the input data for a model regression component.
References [1] Box, George E. P., Gwilym M. Jenkins, and Gregory C. Reinsel. Time Series Analysis: Forecasting and Control. 3rd ed. Englewood Cliffs, NJ: Prentice Hall, 1994. [2] Enders, Walter. Applied Econometric Time Series. Hoboken, NJ: John Wiley & Sons, Inc., 1995. [3] Hamilton, James D. Time Series Analysis. Princeton, NJ: Princeton University Press, 1994.
See Also Objects arima Functions estimate | impulse | infer | simulate | forecast Topics “Simulate Conditional Mean and Variance Models” on page 7-162 “Plot the Impulse Response Function of Conditional Mean Model” on page 7-80 “Monte Carlo Simulation of Conditional Mean Models” on page 7-143 “Presample Data for Conditional Mean Model Simulation” on page 7-145
12-980
filter
filter Forward recursion of Bayesian nonlinear non-Gaussian state-space model
Syntax [X,logL] = filter(Mdl,Y,params) [X,logL] = filter(Mdl,Y,params,Name=Value) [X,logL,Output,RND] = filter( ___ )
Description filter estimates state-distribution moments of a Bayesian nonlinear non-Gaussian state-space model (bnlssm), conditioned on model parameters Θ, for each period of the specified response data by using importance sampling and resampling in the sequential Monte Carlo (SMC) framework. filter approximates the state filtering distribution and likelihood function by applying particles, or weighted random samples. [X,logL] = filter(Mdl,Y,params) returns approximate state-distribution means X for each sampling time in the input response data Y and the corresponding loglikelihood logL resulting from performing forward recursion of, or filtering, the Bayesian nonlinear state-space model Mdl. filter evaluates the parameter map Mdl.ParamMap by using the vector of parameter values params. filter filters Y and particles, weighted random samples representing state values, through the model by using SMC. [X,logL] = filter(Mdl,Y,params,Name=Value) specifies additional options using one or more name-value arguments. For example, filter(Mdl,Y,params,NumParticles=1e4,Resample="residual") specifies 1e4 particles for the SMC routine and to resample residuals. [X,logL,Output,RND] = filter( ___ ) additionally returns the following quantities using any of the input-argument combinations in the previous syntaxes: • Output — Filtering results by sampling period • Approximate loglikelihood values associated with the input data, input parameters, and particles • Filter estimate of state-distribution means • Filter estimate of state-distribution covariance • Custom statistics • Effective sample size • Flags indicating which data the software used to filter • Flags indicating resampling • RND — Normal random variables generated by filter used to reproduce results or reuse random variates generated by a previous call of filter
12-981
12
Functions
Examples Compute Filter State Estimates and Loglikelihood This example uses simulated data to compute filter estimates of state distribution means of the following Bayesian nonlinear state-space model in equation. The state-space model contains two independent, stationary, autoregressive states each with a model constant. The observations are a nonlinear function of the states with Gaussian noise. The prior distribution of the parameters is flat. Symbolically, the system of equations is xt, 1 xt, 2 xt, 3 xt, 4
=
θ1 θ2 0 0 xt − 1, 1 0 1 0 0 xt − 1, 2 0 0 θ3 θ4 xt − 1, 3
θ5 0 +
0 0 0 1 xt − 1, 4
0 0 ut, 1 0 θ6 ut, 3 0 0
yt = log(exp(xt, 1 − μ1) + exp(xt, 3 − μ3)) + θ7εt .
μ1 and μ3 are the unconditional means of the corresponding states. The initial distribution moments of each state are their unconditional mean and covariance. Create a Bayesian nonlinear state-space model characterized by the system. The observation equation is in equation form, that is, the function composing the states is nonlinear and the innovation series εt is additive, linear, and Gaussian. The Local Functions on page 12-984 section contains two functions required to specify the Bayesian nonlinear state-space model: the state-space model parameter mapping function and the prior distribution of the parameters. You can use the functions only within this script. Mdl = bnlssm(@paramMap,@priorDistribution) Mdl = bnlssm with properties: ParamMap: ParamDistribution: ObservationForm: Multipoint:
@paramMap @priorDistribution "equation" [1x0 string]
Mdl is a bnlssm model specifying the state-space model structure and prior distribution of the statespace model parameters. Because Mdl contains unknown values, it serves as a template for posterior analysis with observations. Simulate a series of 100 observations from the following stationary 2-D VAR process. xt, 1 = 1 + 0 . 9xt − 1, 1 + 0 . 3ut, 1 xt, 3 = − 1 + − 0 . 75xt − 1, 3 + 0 . 2ut, 3, where the disturbance series ut, j are standard Gaussian random variables. rng(1,"twister") % For reproducibility T = 100; thetatrue = [0.9; 1; -0.75; -1; 0.3; 0.2; 0.1];
12-982
filter
MdlSim = varm(AR={diag(thetatrue([1 3]))},Covariance=diag(thetatrue(5:6).^2), ... Constant=thetatrue([2 4])); XSim = simulate(MdlSim,T);
Compose simulated observations using the following equation. yt = log(exp(xt, 1 − x‾1) + exp(xt, 3 − x‾3)) + 0 . 1εt, where the innovation series εt is a standard Gaussian random variable. ysim = log(sum(exp(XSim - mean(XSim)),2)) + thetatrue(7)*randn(T,1);
To compute state estimates, the filter function requires response data and a model with known state-space model parameters. Choose a random set with the following constraints: • θ1 and θ3 are within the unit circle. Use U −1, 1 to generate values. • θ and θ are real numbers. Use the N 0, 32 distribution to generate values. 2 4 • θ5, θ6, and θ7 are positive real numbers. Use the χ 2 distribution to generate values. 1 theta13 = (-1+(1-(-1)).*rand(2,1)); theta24 = 3*randn(2,1); theta567 = chi2rnd(1,3,1); theta = [theta13(1); theta24(1); theta13(2); theta24(2); theta567];
Compute filtered state estimates and corresponding loglikelihood by passing the Bayesian nonlinear model, simulated data, and parameter values to filter. [FilterX,logL] = filter(Mdl,ysim,theta); size(FilterX) ans = 1×2 100
4
logL logL = -134.1053
FilterX is a 100-by-2 matrix of filter state estimates, with rows corresponding to periods in the sample and columns corresponding to the state variables. logL is the approximate loglikelihood function estimate evaluated at the data and parameter values. Compare the loglikelihood logL and the loglikelihood computed using θ from the data simulation. [FilterXSim,logLSim] = filter(Mdl,ysim,thetatrue); logLSim logLSim = -0.4078
logLSim > logL, suggesting that the model evaluated at thetaSim has the better fit. Plot the two sets of filter state estimates with the true state values. figure tiledlayout(2,1)
12-983
12
Functions
nexttile plot([FilterX(:,1) FilterXSim(:,1) XSim(:,1)]) title("x_{t,1}") legend("Filter State, random \theta","Filter State, true \theta","XSim") nexttile plot([FilterX(:,3) FilterXSim(:,3) XSim(:,2)]) title("x_{t,3}") legend("Filter State, random \theta","Filter State, true \theta","XSim")
The filter state estimates using the true value of θ and the simulated state paths are close. The filter state estimates are far from the simulated state paths. Local Functions These functions specify the state-space model parameter mappings, in equation form, and log prior distribution of the parameters. function [A,B,C,D,Mean0,Cov0,StateType] = paramMap(theta) A = @(x)blkdiag([theta(1) theta(2); 0 1],[theta(3) theta(4); 0 1])*x; B = [theta(5) 0; 0 0; 0 theta(6); 0 0]; C = @(x)log(exp(x(1)-theta(2)/(1-theta(1))) + ... exp(x(3)-theta(4)/(1-theta(3)))); D = theta(7); Mean0 = [theta(2)/(1-theta(1)); 1; theta(4)/(1-theta(3)); 1]; Cov0 = diag([theta(5)^2/(1-theta(1)^2) 0 theta(6)^2/(1-theta(3)^2) 0]); StateType = [0; 1; 0; 1]; % Stationary state and constant 1 processes end
12-984
filter
function logprior = priorDistribution(theta) paramconstraints = [(abs(theta([1 3])) >= 1) (theta(5:7) = 1; theta([2 4]) numpaths, filter uses the first size(Z,2) columns only. By default, filter sets any necessary presample disturbances to an independent sequence of standardized disturbances drawn from Mdl.Distribution. Data Types: double V0 — Positive presample conditional variance paths σt2 positive column vector | positive matrix Positive presample conditional variance paths σt2, specified as a numpreobs-by-1 positive column vector or numpreobs-by-numprepaths positive matrix. V0 provides initial values for the conditional variances in the model. Use V0 only when you supply the numeric array of disturbances Z. To initialize the conditional variance model, numpreobs must be at least max([Mdl.P Mdl.Q]). If numpreobs > max([Mdl.P Mdl.Q]), filter uses the latest required number of observations only. The last element or row contains the latest observation. • If V0 is a column vector, it represents a single path of the conditional variance series. filter applies it to each output path. • If V0 is a matrix, numprepaths must be at least numpaths. If numprepaths > numpaths, filter uses the first size(Z,2) columns only. By default, filter sets any necessary presample conditional variances to the unconditional variance of the process. Data Types: double Presample — Presample data table | timetable Presample data containing paths of innovation εt or conditional variance σt2 series to initialize the model, specified as a table or timetable, the same type as Tbl1, with numprevars variables and numpreobs rows. Use Presample only when you supply a table or timetable of data Tbl1.
12-1006
filter
Each selected variable is a single path (numpreobs-by-1 vector) or multiple paths (numpreobs-bynumprepaths matrix) of numpreobs observations representing the presample of the disturbance or conditional variance series for DisturbanceVariable, the selected disturbance variable in Tbl1. Each row is a presample observation, and measurements in each row occur simultaneously. numpreobs must be one of the following values: • Mdl.Q when Presample provides only presample disturbances • max([Mdl.P Mdl.Q]) when Presample provides presample conditional variances If you supply more rows than necessary, filter uses the latest required number of observations only. If Presample is a timetable, all the following conditions must be true: • Presample must represent a sample with a regular datetime time step (see isregular). • The inputs Tbl1 and Presample must be consistent in time such that Presample immediately precedes Tbl1 with respect to the sampling frequency and order. • The datetime vector of sample timestamps Presample.Time must be ascending or descending. If Presample is a table, the last row contains the latest presample observation. By default, filter sets any necessary presample disturbances to an independent sequence of standardized disturbances drawn from Mdl.Distribution, and it sets any necessary presample conditional variances to the unconditional variance of the process characterized by Mdl. If you specify the Presample, you must specify the presample disturbance or conditional variance names by using the PresampleDisturbanceVariable or PresampleVarianceVariable namevalue argument. PresampleDisturbanceVariable — Variable of Presample containing presample disturbance paths zt string scalar | character vector | integer | logical vector Variable of Presample containing presample disturbance paths zt, specified as one of the following data types: • String scalar or character vector containing a variable name in Presample.Properties.VariableNames • Variable index (positive integer) to select from Presample.Properties.VariableNames • A logical vector, where PresampleDisturbanceVariable(j) = true selects variable j from Presample.Properties.VariableNames The selected variable must be a numeric matrix and cannot contain missing values (NaNs). If you specify presample disturbance data by using the Presample name-value argument, you must specify PresampleDisturbanceVariable. Example: PresampleDisturbanceVariable="StockRateDist0" Example: PresampleDisturbanceVariable=[false false true false] or PresampleDisturbanceVariable=3 selects the third table variable as the presample disturbance variable. Data Types: double | logical | char | cell | string 12-1007
12
Functions
PresampleVarianceVariable — Variable of Presample containing data for the presample conditional variances σt2 string scalar | character vector | integer | logical vector Variable of Presample containing data for the presample conditional variances σt2, specified as one of the following data types: • String scalar or character vector containing a variable name in Presample.Properties.VariableNames • Variable index (positive integer) to select from Presample.Properties.VariableNames • A logical vector, where PresampleVarianceVariable(j) = true selects variable j from Presample.Properties.VariableNames The selected variable must be a numeric vector and cannot contain missing values (NaNs). If you specify presample conditional variance data by using the Presample name-value argument, you must specify PresampleVarianceVariable. Example: PresampleVarianceVariable="StockRateVar0" Example: PresampleVarianceVariable=[false false true false] or PresampleVarianceVariable=3 selects the third table variable as the presample conditional variance variable. Data Types: double | logical | char | cell | string Note • NaN values in Z, Z0, and V0 indicate missing values. filter removes missing values from specified data by list-wise deletion. • For the presample, filter horizontally concatenates Z0 and V0, and then it removes any row of the concatenated matrix containing at least one NaN. • For in-sample data Z, filter removes any row containing at least one NaN. This type of data reduction reduces the effective sample size and can create an irregular time series. • For numeric data inputs, filter assumes that you synchronize the presample data such that the latest observations occur simultaneously. • filter issues an error when any table or timetable input contains missing values.
Output Arguments V — Filtered conditional variance paths σt2 numeric column vector | numeric matrix Filtered conditional variance paths σt2, returned as a numobs-by-1 numeric column vector or numobsby-numpaths numeric matrix. V represents the conditional variances of the mean-zero, heteroscedastic innovations associated with Y. filter returns V only when you supply the input Z. The dimensions of V and Z are equivalent. If Z is a matrix, then the columns of V are the conditional variance paths corresponding to the columns of Z. 12-1008
filter
Rows of V are periods corresponding to the periodicity of Z. Y — Filtered response paths yt numeric column vector | numeric matrix Filtered response paths yt, returned as a numobs-by-1 numeric column vector or numobs-bynumpaths. Y usually represents a mean-zero, heteroscedastic time series of innovations with conditional variances given in V. filter returns Y only when you supply the input Z. Y can also represent a time series of mean-zero, heteroscedastic innovations plus an offset. If Mdl includes an offset, then filter adds the offset to the underlying mean-zero, heteroscedastic innovations. Therefore, Y represents a time series of offset-adjusted innovations. If Z is a matrix, then the columns of Y are the response paths corresponding to the columns of Z. Rows of Y are periods corresponding to the periodicity of Z. Tbl2 — Filtered conditional variance σt2 and response yt paths table | timetable Filtered conditional variance σt2 and response yt paths, returned as a table or timetable, the same data type as Tbl1. filter returns Tbl2 only when you supply the input Tbl1. Tbl2 contains the following variables: • The filtered conditional variances paths, which are in a numobs-by-numpaths numeric matrix, with rows representing observations and columns representing independent paths, each corresponding to the input observations and paths of the disturbance variable in Tbl1. filter names the filtered conditional variance variable in Tbl2 responseName_Variance, where responseName is Mdl.SeriesName. For example, if Mdl.SeriesName is StockReturns, Tbl2 contains a variable for the corresponding filtered response paths with the name StockReturns_Variance. • The filtered response paths, which are in a numobs-by-numpaths numeric matrix, with rows representing observations and columns representing independent paths, each corresponding to the input observations and paths of the disturbance variable in Tbl1. filter names the filtered response variable in Tbl2 responseName_Response, where responseName is Mdl.SeriesName. For example, if Mdl.SeriesName is StockReturns, Tbl2 contains a variable for the corresponding filtered conditional variance paths with the name StockReturns_Response. • All variables Tbl1. If Tbl1 is a timetable, row times of Tbl1 and Tbl2 are equal.
Alternatives filter generalizes simulate. Both function filter a series of disturbances to produce output responses and conditional variances. However, simulate autogenerates a series of mean-zero, unitvariance, independent and identically distributed (iid) disturbances according to the distribution in the conditional variance model object, Mdl. In contrast, filter lets you directly specify your own disturbances.
12-1009
12
Functions
Version History Introduced in R2012a R2023a: filter accepts input data in tables and timetables, and returns results in tables and timetables In addition to accepting input data (in-sample and presample) in numeric arrays, filter accepts input data in tables or regular timetables. When you supply data in a table or timetable, the following conditions apply: • filter chooses the default in-sample disturbance series on which to operate, but you can use the specified optional name-value argument to select a different series. • If you specify optional presample disturbance or conditional variance data to initialize the model, you must also specify the presample disturbance or conditional variance series name. • filter returns results in a table or timetable. Name-value arguments to support tabular workflows include: • DisturbanceVariable specifies the variable name of the disturbance paths in the input data Tbl1 to filter through the model. • Presample specifies the input table or timetable of presample disturbance and conditional variance data. • PresampleDisturbanceVariable specifies the variable name of the disturbance paths to select from Presample. • PresampleVarianceVariable specifies the variable name of the conditional variance paths to select from Presample.
References [1] Bollerslev, T. “Generalized Autoregressive Conditional Heteroskedasticity.” Journal of Econometrics. Vol. 31, 1986, pp. 307–327. [2] Bollerslev, T. “A Conditionally Heteroskedastic Time Series Model for Speculative Prices and Rates of Return.” The Review of Economics and Statistics. Vol. 69, 1987, pp. 542–547. [3] Box, G. E. P., G. M. Jenkins, and G. C. Reinsel. Time Series Analysis: Forecasting and Control. 3rd ed. Englewood Cliffs, NJ: Prentice Hall, 1994. [4] Enders, W. Applied Econometric Time Series. Hoboken, NJ: John Wiley & Sons, 1995. [5] Engle, R. F. “Autoregressive Conditional Heteroskedasticity with Estimates of the Variance of United Kingdom Inflation.” Econometrica. Vol. 50, 1982, pp. 987–1007. [6] Hamilton, J. D. Time Series Analysis. Princeton, NJ: Princeton University Press, 1994.
See Also Objects garch | egarch | gjr 12-1010
filter
Functions estimate | forecast | simulate Topics “Simulate Conditional Variance Model” on page 8-86 “Simulate GARCH Models” on page 8-76 “Monte Carlo Simulation of Conditional Variance Models” on page 8-72 “Presample Data for Conditional Variance Model Simulation” on page 8-75
12-1011
12
Functions
filter Forward recursion of diffuse state-space models
Syntax X = filter(Mdl,Y) X = filter(Mdl,Y,Name,Value) [X,logL,Output] = filter( ___ )
Description X = filter(Mdl,Y) returns filtered states on page 11-8 (X) by performing forward recursion of the fully specified diffuse state-space model on page 11-4 Mdl. That is, filter applies the diffuse Kalman filter on page 11-12 using Mdl and the observed responses Y. X = filter(Mdl,Y,Name,Value) uses additional options specified by one or more Name,Value pair arguments. For example, specify the regression coefficients and predictor data to deflate the observations, or specify to use the univariate treatment of a multivariate model. If Mdl is not fully specified, then you must specify the unknown parameters as known scalars using the 'Params' Name,Value pair argument. [X,logL,Output] = filter( ___ ) additionally returns the loglikelihood value (logL) and an output structure array (Output) using any of the input arguments in the previous syntaxes. Output contains: • Filtered and forecasted states on page 11-8 • Estimated covariance matrices of the filtered and forecasted states • Loglikelihood value • Forecasted observations on page 11-10 and its estimated covariance matrix • Adjusted Kalman gain on page 11-11 • Vector indicating which data the software used to filter
Input Arguments Mdl — Diffuse state-space model dssm model object Diffuse state-space model, specified as an dssm model object returned by dssm or estimate. If Mdl is not fully specified (that is, Mdl contains unknown parameters), then specify values for the unknown parameters using the 'Params' name-value pair argument. Otherwise, the software issues an error. estimate returns fully-specified state-space models. Mdl does not store observed responses or predictor data. Supply the data wherever necessary using the appropriate input or name-value pair arguments. Y — Observed response data numeric matrix | cell vector of numeric vectors 12-1012
filter
Observed response data, specified as a numeric matrix or a cell vector of numeric vectors. • If Mdl is time invariant with respect to the observation equation, then Y is a T-by-n matrix, where each row corresponds to a period and each column corresponds to a particular observation in the model. T is the sample size and m is the number of observations per period. The last row of Y contains the latest observations. • If Mdl is time varying with respect to the observation equation, then Y is a T-by-1 cell vector. Each element of the cell vector corresponds to a period and contains an nt-dimensional vector of observations for that period. The corresponding dimensions of the coefficient matrices in Mdl.C{t} and Mdl.D{t} must be consistent with the matrix in Y{t} for all periods. The last cell of Y contains the latest observations. NaN elements indicate missing observations. For details on how the Kalman filter accommodates missing observations, see “Algorithms” on page 12-1025. Name-Value Pair Arguments Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter. Before R2021a, use commas to separate each name and value, and enclose Name in quotes. Example: 'Beta',beta,'Predictors',Z specifies to deflate the observations by the regression component composed of the predictor data Z and the coefficient matrix beta. Beta — Regression coefficients [] (default) | numeric matrix Regression coefficients corresponding to predictor variables, specified as the comma-separated pair consisting of 'Beta' and a d-by-n numeric matrix. d is the number of predictor variables (see Predictors) and n is the number of observed response series (see Y). If Mdl is an estimated state-space model, then specify the estimated regression coefficients stored in estParams. Params — Values for unknown parameters numeric vector Values for unknown parameters in the state-space model, specified as the comma-separated pair consisting of 'Params' and a numeric vector. The elements of Params correspond to the unknown parameters in the state-space model matrices A, B, C, and D, and, optionally, the initial state mean Mean0 and covariance matrix Cov0. • If you created Mdl explicitly (that is, by specifying the matrices without a parameter-to-matrix mapping function), then the software maps the elements of Params to NaNs in the state-space model matrices and initial state values. The software searches for NaNs column-wise following the order A, B, C, D, Mean0, and Cov0. • If you created Mdl implicitly (that is, by specifying the matrices with a parameter-to-matrix mapping function), then you must set initial parameter values for the state-space model matrices, initial state values, and state types within the parameter-to-matrix mapping function. If Mdl contains unknown parameters, then you must specify their values. Otherwise, the software ignores the value of Params. 12-1013
12
Functions
Data Types: double Predictors — Predictor variables in state-space model observation equation [] (default) | numeric matrix Predictor variables in the state-space model observation equation, specified as the comma-separated pair consisting of 'Predictors' and a T-by-d numeric matrix. T is the number of periods and d is the number of predictor variables. Row t corresponds to the observed predictors at period t (Zt). The expanded observation equation is yt − Zt β = Cxt + Dut . That is, the software deflates the observations using the regression component. β is the time-invariant vector of regression coefficients that the software estimates with all other parameters. If there are n observations per period, then the software regresses all predictor series onto each observation. If you specify Predictors, then Mdl must be time invariant. Otherwise, the software returns an error. By default, the software excludes a regression component from the state-space model. Data Types: double SwitchTime — Final period for diffuse state initialization positive integer Final period for diffuse state initialization, specified as the comma-separated pair consisting of 'SwitchTime' and a positive integer. That is, estimate uses the observations from period 1 to period SwitchTime as a presample to implement the exact initial Kalman filter (see “Diffuse Kalman Filter” on page 11-12 and [1]). After initializing the diffuse states, estimate applies the standard Kalman filter on page 11-7 to the observations from periods SwitchTime + 1 to T. The default value for SwitchTime is the last period in which the estimated smoothed state precision matrix is singular (i.e., the inverse of the covariance matrix). This specification represents the fewest number of observations required to initialize the diffuse states. Therefore, it is a best practice to use the default value. If you set SwitchTime to a value greater than the default, then the effective sample size decreases. If you set SwitchTime to a value that is fewer than the default, then estimate might not have enough observations to initialize the diffuse states, which can result in an error or improper values. In general, estimating, filtering, and smoothing state-space models with at least one diffuse state requires SwitchTime to be at least one. The default estimation display contains the effective sample size. Data Types: double Tolerance — Forecast uncertainty threshold 0 (default) | nonnegative scalar Forecast uncertainty threshold, specified as the comma-separated pair consisting of 'Tolerance' and a nonnegative scalar. 12-1014
filter
If the forecast uncertainty for a particular observation is less than Tolerance during numerical estimation, then the software removes the uncertainty corresponding to the observation from the forecast covariance matrix before its inversion. It is best practice to set Tolerance to a small number, for example, le-15, to overcome numerical obstacles during estimation. Example: 'Tolerance',le-15 Data Types: double Univariate — Univariate treatment of multivariate series flag false (default) | true Univariate treatment of a multivariate series flag, specified as the comma-separated pair consisting of 'Univariate' and true or false. Univariate treatment of a multivariate series is also known as sequential filtering. The univariate treatment can accelerate and improve numerical stability of the Kalman filter. However, all observation innovations must be uncorrelated. That is, DtDt' must be diagonal, where Dt, t = 1,...,T, is one of the following: • The matrix D{t} in a time-varying state-space model • The matrix D in a time-invariant state-space model Example: 'Univariate',true Data Types: logical
Output Arguments X — Filtered states numeric matrix | cell vector of numeric vectors Filtered states on page 11-8, returned as a numeric matrix or a cell vector of numeric vectors. If Mdl is time invariant, then the number of rows of X is the sample size, T, and the number of columns of X is the number of states, m. The last row of X contains the latest filtered states. If Mdl is time varying, then X is a cell vector with length equal to the sample size. Cell t of X contains a vector of filtered states with length equal to the number of states in period t. The last cell of X contains the latest filtered states. filter pads the first SwitchTime periods of X with zeros or empty cells. The zeros or empty cells represent the periods required to initialize the diffuse states. logL — Loglikelihood function value scalar Loglikelihood function value, returned as a scalar. Missing observations and observations before SwitchTime do not contribute to the loglikelihood. Output — Filtering results by period structure array 12-1015
12
Functions
Filtering results by period, returned as a structure array. Output is a T-by-1 structure, where element t corresponds to the filtering result at time t. • If Univariate is false (it is by default), then the following table outlines the fields of Output. Field
Description
Estimate of
LogLikelihood
Scalar loglikelihood objective N/A function value
FilteredStates
mt-by-1 vector of filtered states on page 11-8
E xt y1, ..., yt
FilteredStatesCov
mt-by-mt variance-covariance matrix of filtered states
Var xt y1, ..., yt
ForecastedStates
mt-by-1 vector of state forecasts on page 11-8
E xt y1, ..., yt − 1
ForecastedStatesCov
mt-by-mt variance-covariance matrix of state forecasts
Var xt y1, ..., yt − 1
ForecastedObs
ht-by-1 forecasted observation E yt y1, ..., yt − 1 on page 11-10 vector
ForecastedObsCov
ht-by-ht variance-covariance matrix of forecasted observations
KalmanGain
mt-by-nt adjusted Kalman gain N/A on page 11-11 matrix
DataUsed
ht-by-1 logical vector N/A indicating whether the software filters using a particular observation. For example, if observation i at time t is a NaN, then element i in DataUsed at time t is 0.
Var yt y1, ..., tt − 1
• If Univarite is true, then the fields of Output are the same as in the previous table, except for the following amendments. Field
Changes
ForecastedObs
Same dimensions as if Univariate = 0, but only the first elements are equal
ForecastedObsCov
n-by-1 vector of forecasted observation variances. The first element of this vector is equivalent to ForecastedObsCov(1,1) when Univariate is false. The rest of the elements are not necessarily equivalent to their corresponding values in ForecastObsCov when Univariate.
12-1016
filter
Field
Changes
KalmanGain
Same dimensions as if Univariate is false, though KalmanGain might have different entries.
filter pads the first SwitchTime periods of the fields of Output with empty cells. These empty cells represent the periods required to initialize the diffuse states.
Examples Filter States of Time-Invariant Diffuse State-Space Model Suppose that a latent process is a random walk. The state equation is xt = xt − 1 + ut, where ut is Gaussian with mean 0 and standard deviation 1. Generate a random series of 100 observations from xt, assuming that the series starts at 1.5. T = 100; x0 = 1.5; rng(1); % For reproducibility u = randn(T,1); x = cumsum([x0;u]); x = x(2:end);
Suppose further that the latent process is subject to additive measurement error. The observation equation is yt = xt + εt, where εt is Gaussian with mean 0 and standard deviation 0.75. Together, the latent process and observation equations compose a state-space model. Use the random latent state process (x) and the observation equation to generate observations. y = x + 0.75*randn(T,1);
Specify the four coefficient matrices. A B C D
= = = =
1; 1; 1; 0.75;
Create the diffuse state-space model using the coefficient matrices. Specify that the initial state distribution is diffuse. Mdl = dssm(A,B,C,D,'StateType',2) Mdl = State-space model type: dssm
12-1017
12
Functions
State vector length: 1 Observation vector length: 1 State disturbance vector length: 1 Observation innovation vector length: 1 Sample size supported by model: Unlimited State variables: x1, x2,... State disturbances: u1, u2,... Observation series: y1, y2,... Observation innovations: e1, e2,... State equation: x1(t) = x1(t-1) + u1(t) Observation equation: y1(t) = x1(t) + (0.75)e1(t) Initial state distribution: Initial state means x1 0 Initial state covariance matrix x1 x1 Inf State types x1 Diffuse
Mdl is an dssm model. Verify that the model is correctly specified using the display in the Command Window. Filter states for periods 1 through 100. Plot the true state values and the filtered state estimates. filteredX = filter(Mdl,y); figure plot(1:T,x,'-k',1:T,filteredX,':r','LineWidth',2) title({'State Values'}) xlabel('Period') ylabel('State') legend({'True state values','Filtered state values'})
12-1018
filter
The true values and filter estimates are approximately the same, except for the first filtered state, which is zero.
Filter States of Diffuse State-Space Model Containing Regression Component Suppose that the linear relationship between unemployment rate and the nominal gross national product (nGNP) is of interest. Suppose further that unemployment rate is an AR(1) series. Symbolically, and in state-space form, the model is xt = ϕxt − 1 + σut yt − βZt = xt, where: • xt is the unemployment rate at time t. •
yt is the observed change in the unemployment rate being deflated by the return of nGNP (Zt).
• ut is the Gaussian series of state disturbances having mean 0 and unknown standard deviation σ. Load the Nelson-Plosser data set, which contains the unemployment rate and nGNP series, among other things. load Data_NelsonPlosser
12-1019
12
Functions
Preprocess the data by taking the natural logarithm of the nGNP series, and removing the starting NaN values from each series. isNaN = any(ismissing(DataTable),2); gnpn = DataTable.GNPN(~isNaN); y = diff(DataTable.UR(~isNaN)); T = size(gnpn,1); Z = price2ret(gnpn);
% Flag periods containing NaNs % The sample size
This example continues using the series without NaN values. However, using the Kalman filter framework, the software can accommodate series containing missing values. Specify the coefficient matrices. A = NaN; B = NaN; C = 1;
Create the state-space model using dssm by supplying the coefficient matrices and specifying that the state values come from a diffuse distribution. The diffuse specification indicates complete ignorance about the moments of the initial distribution. StateType = 2; Mdl = dssm(A,B,C,'StateType',StateType);
Estimate the parameters. Specify the regression component and its initial value for optimization using the 'Predictors' and 'Beta0' name-value pair arguments, respectively. Display the estimates and all optimization diagnostic information. Restrict the estimate of σ to all positive, real numbers. params0 = [0.3 0.2]; % Initial values chosen arbitrarily Beta0 = 0.1; [EstMdl,estParams] = estimate(Mdl,y,params0,'Predictors',Z,'Beta0',Beta0,... 'lb',[-Inf 0 -Inf]); Method: Maximum likelihood (fmincon) Effective Sample size: 60 Logarithmic likelihood: -110.477 Akaike info criterion: 226.954 Bayesian info criterion: 233.287 | Coeff Std Err t Stat Prob -------------------------------------------------------c(1) | 0.59436 0.09408 6.31738 0 c(2) | 1.52554 0.10758 14.17991 0 y 0 is a linear combination of X(t) for times t = t, t–1, t–2,...,t–p.
Examples Filter a Series Through a Lag Polynomial Create a LagOp polynomial and a random time series: rng('default')
% Make output reproducible
A = LagOp({1 -0.6 0.08 0.2}, 'Lags', [0 1 2 4]); X = randn(10, A.Dimension);
Filter the input time series with no explicit initial observations, allowing the filter method to automatically strip all required initial data from the beginning of the input time series X(t). [Y1,T1] = filter(A, X);
Manually strip all required presample observations directly from the beginning of X(t), then pass in the reduced-length X(t) and the stripped presample observations directly to the filter method. In this case, the first 4 observations of X(t) are stripped because the degree of the lag operator polynomial created below is 4. [Y2,T2] = filter(A, X((A.Degree + 1):end,:), ... 'Initial', X(1:A.Degree,:));
Manually strip part of the required presample observations from the beginning of X(t) and let the filter method automatically strip the remaining observations from X(t). [Y3,T3] = filter(A, X((A.Degree - 1):end,:), ... 'Initial', X(1:A.Degree - 2,:));
The filtered output series are all the same. However, the associated time vectors are not. disp([T1 T2 T3]) 4 5 6 7 8 9
0 1 2 3 4 5
2 3 4 5 6 7
Algorithms Filtering is limited to single paths, so matrix data are assumed to be a single path of a multidimensional process, and 3-D data (multiple paths of a multidimensional process) are not allowed.
See Also mldivide 12-1028
filter
filter Filtered inference of operative latent states in Markov-switching dynamic regression data
Syntax FS = filter(Mdl,Y) FS = filter(Mdl,Y,Name,Value) [FS,logL] = filter( ___ )
Description FS = filter(Mdl,Y) returns filtered state probabilities FS from conducting optimal conditional inference of the probabilities of the operative latent states in the regime-switching data Y. The Markov-switching dynamic regression model Mdl models the data. filter uses a recursive application of Bayes' rule, as in Hamilton [3]. FS = filter(Mdl,Y,Name,Value) uses additional options specified by one or more name-value arguments. For example, 'Y0',Y0 initializes the dynamic component of each submodel by using the presample response data Y0. [FS,logL] = filter( ___ ) also returns the estimated loglikelihood logL using any of the input argument combinations in the previous syntaxes.
Examples Compute Filtered State Probabilities Compute filtered state probabilities from a two-state Markov-switching dynamic regression model for a 1-D response process. This example uses arbitrary parameter values for the data-generating process (DGP). Create Fully Specified Model for DGP Create a two-state discrete-time Markov chain model for the switching mechanism. P = [0.9 0.1; 0.2 0.8]; mc = dtmc(P);
mc is a fully specified dtmc object. For each state, create an AR(0) (constant only) model for the response process. Store the models in a vector. mdl1 = arima('Constant',2,'Variance',3); mdl2 = arima('Constant',-2,'Variance',1); mdl = [mdl1; mdl2];
mdl1 and mdl2 are fully specified arima objects. 12-1029
12
Functions
Create a Markov-switching dynamic regression model from the switching mechanism mc and the vector of submodels mdl. Mdl = msVAR(mc,mdl);
Mdl is a fully specified msVAR object. Simulate Data from DGP filter requires responses to compute filtered state probabilities. Generate one random response and state path, both of length 30, from the DGP. rng(1000); % For reproducibility [y,~,sp] = simulate(Mdl,30);
Compute State Probabilities Compute filtered and smoothed state probabilities from the Markov-switching model given the simulated response data. fs = filter(Mdl,y); ss = smooth(Mdl,y);
fs and ss are 30-by-2 matrices of filtered and smoothed state probabilities, respectively, for each period in the simulation horizon. Although the filtered state probabilities at time t (fs(t,:)) are based on the response data through time t (y(1:t)), the smoothed state probabilities at time t (ss(t,:)) are based on all observations. Plot the simulated state path and the filtered and smoothed state probabilities on the same graph. figure plot(sp,'m') hold on plot(fs(:,2),'r') plot(ss(:,2),'g') yticks([0 1 2]) xlabel("Time") title("Observed States with Estimated State Probabilities") legend({'Simulated states','Filtered probability: state 2',... 'Smoothed probability: state 2'}) hold off
12-1030
filter
Compute Filtered Probabilities of Recession Consider a two-state Markov-switching dynamic regression model of the postwar US real GDP growth rate. The model has the parameter estimates presented in [1]. Create Markov-Switching Dynamic Regression Model Create a fully specified discrete-time Markov chain model that describes the regime switching mechanism. Label the regimes. P = [0.92 0.08; 0.26 0.74]; mc = dtmc(P,'StateNames',["Expansion" "Recession"]);
Create separate, fully specified AR(0) models for the two regimes. sigma = 3.34; % Homoscedastic models across states mdl1 = arima('Constant',4.62,'Variance',sigma^2); mdl2 = arima('Constant',-0.48,'Variance',sigma^2); mdl = [mdl1 mdl2];
Create the Markov-switching dynamic regression model from the switching mechanism mc and the state-specific submodels mdl. Mdl = msVAR(mc,mdl);
Mdl is a fully specified msVAR object. 12-1031
12
Functions
Load and Preprocess Data Load the US GDP data set. load Data_GDP
Data contains quarterly measurements of the US real GDP in the period 1947:Q1–2005:Q2. The period of interest in [1] is 1947:Q2–2004:Q2. For more details on the data set, enter Description at the command line. Transform the data to an annualized rate series by: 1
Converting the data to a quarterly rate within the estimation period
2
Annualizing the quarterly rates
qrate = diff(Data(2:230))./Data(2:229); % Quarterly rate arate = 100*((1 + qrate).^4 - 1); % Annualized rate
The transformation drops the first observation. Compute Filtered State Probabilities Compute filtered state probabilities for the data and model. FS = filter(Mdl,arate); FS(end,:) ans = 1×2 0.9396
0.0604
FS is a 228-by-2 matrix of filtered state probabilities. Rows correspond to periods in the data arate, and columns correspond to the regimes. Plot the filtered probabilities of recession, as in [1], Figure 6. figure; plot(dates(3:230),FS(:,2),'r') datetick('x') title('Current Filter Probabilities and NBER Recessions') recessionplot
12-1032
filter
Compute Smoothed State Probabilities Compute smoothed state probabilities, and then plot the smoothed probabilities of recession as in [1], Figure 6. SS = smooth(Mdl,arate); figure plot(dates(3:230),SS(:,2),'r') datetick('x') recessionplot title('Full-Sample Smoothed Probabilities and NBER Recessions')
12-1033
12
Functions
Compute Filtered State Probabilities from Model with VARX Submodels Compute filtered state probabilities from a three-state Markov-switching dynamic regression model for a 2-D VARX response process. This example uses arbitrary parameter values for the DGP. Create Fully Specified Model for DGP Create a three-state discrete-time Markov chain model for the switching mechanism. P = [5 1 1; 1 5 1; 1 1 5]; mc = dtmc(P);
mc is a fully specified dtmc object. dtmc normalizes the rows of P so that they sum to 1. For each state, create a fully specified VARX(0) model (constant and regression coefficient matrix only) for the response process. Specify different constant vectors across models. Specify the same regression coefficient for the two regressors, and specify the same covariance matrix. Store the VARX models in a vector. % Constants C1 = [1;-1]; C2 = [3;-3]; C3 = [5;-5]; % Regression coefficient
12-1034
filter
Beta = [0.2 0.1;0 -0.3]; % Covariance matrix Sigma = [1.8 -0.4; -0.4 1.8]; % VARX submodels mdl1 = varm('Constant',C1,'Beta',Beta,... 'Covariance',Sigma); mdl2 = varm('Constant',C2,'Beta',Beta,... 'Covariance',Sigma); mdl3 = varm('Constant',C3,'Beta',Beta,... 'Covariance',Sigma); mdl = [mdl1; mdl2; mdl3];
mdl contains three fully specified varm model objects. For the DGP, create a fully specified Markov-switching dynamic regression model from the switching mechanism mc and the submodels mdl. Mdl = msVAR(mc,mdl);
Mdl is a fully specified msVAR model. Simulate Data from DGP Simulate data for the two exogenous series by generating 30 observations from the standard 2-D Gaussian distribution. rng(1) % For reproducibility X = randn(30,2);
Generate one random response and state path, both of length 30, from the DGP. Specify the simulated exogenous data for the submodel regression components. [Y,~,SP] = simulate(Mdl,30,'X',X);
Y is a 30-by-2 matrix of one simulated response path. SP is a 30-by-1 vector of one simulated state path. Compute State Probabilities Compute filtered and smoothed state probabilities from the DGP given the simulated response data. FS = filter(Mdl,Y,'X',X); SS = smooth(Mdl,Y,'X',X);
FS and SS are 30-by-2 matrices of filtered and smoothed state probabilities, respectively, for each period in the simulation horizon. Plot the simulated state path and the filtered and smoothed state probabilities on subplots in the same figure. figure subplot(3,1,1) plot(SP,'m') yticks([1 2 3]) legend({'Simulated states'})
12-1035
12
Functions
subplot(3,1,2) plot(FS,'--') legend({'Filtered s1','Filtered s2','Filtered s3'}) subplot(3,1,3) plot(SS,'-') legend({'Smoothed s1','Smoothed s2','Smoothed s3'})
Specify Presample Data Consider the data in “Compute Filtered Probabilities of Recession” on page 12-1031, but assume that the period of interest is 1960:Q1–2004:Q2. Also, consider adding an autoregressive term to each submodel. Create Partially Specified Model for Estimation Create a partially specified Markov-switching dynamic regression model for estimation. Specify AR(1) submodels. P = NaN(2); mc = dtmc(P,'StateNames',["Expansion" "Recession"]); mdl = arima(1,0,0); Mdl = msVAR(mc,[mdl; mdl]);
Because the submodels are AR(1), each requires one presample observation to initialize its dynamic component for estimation. 12-1036
filter
Create Fully Specified Model Containing Initial Values Create the model containing initial parameter values for the estimation procedure. mc0 = dtmc(0.5*ones(2),'StateNames',["Expansion" "Recession"]); submdl01 = arima('Constant',1,'Variance',1,'AR',0.001); submdl02 = arima('Constant',-1,'Variance',1,'AR',0.001); Mdl0 = msVAR(mc0,[submdl01; submdl02]);
Load and Preprocess Data Load the data. Transform the entire set to an annualized rate series. load Data_GDP qrate = diff(Data)./Data(1:(end - 1)); arate = 100*((1 + qrate).^4 - 1);
Identify the presample and estimation sample periods using the dates associated with the annualized rate series. Because the transformation applies the first difference, you must drop the first observation date from the original sample. dates = datetime(dates(2:end),'ConvertFrom','datenum',... 'Format','yyyy:QQQ','Locale','en_US'); estPrd = datetime(["1960:Q2" "2004:Q2"],'InputFormat','yyyy:QQQ',... 'Format','yyyy:QQQ','Locale','en_US'); idxEst = isbetween(dates,estPrd(1),estPrd(2)); idxPre = dates < (estPrd(1));
Estimate Model Fit the model to the estimation sample data. Specify the presample observation. arate0 = arate(idxPre); arateEst = arate(idxEst); EstMdl = estimate(Mdl,Mdl0,arateEst,'Y0',arate0);
EstMdl is a fully specified msVAR object. Compute State Probabilities Compute filtered and smoothed state probabilities from the estimated model and data in the estimation period. Specify the presample observation. Plot the estimated probabilities of recession on subplots in the same figure. FS = filter(EstMdl,arateEst,'Y0',arate0); SS = smooth(EstMdl,arateEst,'Y0',arate0); figure; subplot(2,1,1) plot(dates(idxEst),FS(:,2),'r') title("Current Filter Probabilities and NBER Recessions") recessionplot subplot(2,1,2) plot(dates(idxEst),SS(:,2),'r') title("Full-Sample Smoothed Probabilities and NBER Recessions") recessionplot
12-1037
12
Functions
Return Loglikelihood for Data Consider the model and data in “Compute Filtered Probabilities of Recession” on page 12-1031. Create the fully specified Markov-switching model. P = [0.92 0.08; 0.26 0.74]; mc = dtmc(P,'StateNames',["Expansion" "Recession"]); sigma = 3.34; mdl1 = arima('Constant',4.62,'Variance',sigma^2); mdl2 = arima('Constant',-0.48,'Variance',sigma^2); mdl = [mdl1; mdl2]; Mdl = msVAR(mc,mdl);
Load and preprocess the data. load Data_GDP qrate = diff(Data(2:230))./Data(2:229); arate = 100*((1 + qrate).^4 - 1);
Compute filtered state probabilities and the loglikelihood for the data and model.
12-1038
filter
[FS,logL] = filter(Mdl,arate); logL logL = -640.3016
Input Arguments Mdl — Fully specified Markov-switching dynamic regression model msVAR model object Fully specified Markov-switching dynamic regression model, specified as an msVAR model object returned by msVAR or estimate. Properties of a fully specified model object do not contain NaN values. Y — Observed response data numeric matrix Observed response data, specified as a numObs-by-numSeries numeric matrix. numObs is the sample size. numSeries is the number of response variables (Mdl.NumSeries). Rows correspond to observations, and the last row contains the latest observation. Columns correspond to individual response variables. Y represents the continuation of the presample response series in Y0. Data Types: double Name-Value Pair Arguments Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter. Before R2021a, use commas to separate each name and value, and enclose Name in quotes. Example: 'Y0',Y0,'X',X initializes the dynamic component of each submodel in Mdl by using the presample response data Y0, and includes a linear regression component in each submodel composed of the predictor data in X and the specified regression coefficients. Y0 — Presample response data numeric matrix Presample response data, specified as the comma-separated pair consisting of 'Y0' and a numPreSampleObs-by-numSeries numeric matrix. The number of presample observations numPreSampleObs must be sufficient to initialize the AR terms of all submodels. If numPreSampleObs exceeds the AR order of any state, filter uses the latest observations. By default, Y0 is the initial segment of Y, which reduces the effective sample size. Data Types: double S0 — Initial state probabilities nonnegative numeric vector 12-1039
12
Functions
Initial state probabilities, specified as the comma-separated pair consisting of 'S0' and a nonnegative numeric vector of length numStates. filter normalizes S0 to produce a distribution. By default, S0 is a steady-state distribution computed by asymptotics. Example: 'S0',[0.2 0.2 0.6] Example: 'S0',[0 1] specifies state 2 as the initial state. Data Types: double X — Predictor data numeric matrix | cell vector of numeric matrices Predictor data used to evaluate regression components in all submodels of Mdl, specified as the comma-separated pair consisting of 'X' and a numeric matrix or a cell vector of numeric matrices. To use a subset of the same predictors in each state, specify X as a matrix with numPreds columns and at least numObs rows. Columns correspond to distinct predictor variables. Submodels use initial columns of the associated matrix, in order, up to the number of submodel predictors. The number of columns in the Beta property of Mdl.SubModels(j) determines the number of exogenous variables in the regression component of submodel j. If the number of rows exceeds numObs, then filter uses the latest observations. To use different predictors in each state, specify a cell vector of such matrices with length numStates. By default, filter ignores regression components in Mdl. Data Types: double
Output Arguments FS — Filtered state probabilities nonnegative numeric matrix Filtered state probabilities, returned as a numObs-by-numStates nonnegative numeric matrix. logL — Estimated loglikelihood numeric scalar Estimated loglikelihood of the response data Y, returned as a numeric scalar.
Algorithms filter proceeds iteratively from an initial estimate of the state distribution S0 to estimates in FS by using forecasts from the current data history at each time step. smooth refines current estimates of the state distribution that filter produces by iterating backward from the full sample history Y.
Version History Introduced in R2019b 12-1040
filter
References [1] Chauvet, M., and J. D. Hamilton. "Dating Business Cycle Turning Points." In Nonlinear Analysis of Business Cycles (Contributions to Economic Analysis, Volume 276). (C. Milas, P. Rothman, and D. van Dijk, eds.). Amsterdam: Emerald Group Publishing Limited, 2006. [2] Hamilton, J. D. "A New Approach to the Economic Analysis of Nonstationary Time Series and the Business Cycle." Econometrica. Vol. 57, 1989, pp. 357–384. [3] Hamilton, J. D. "Analysis of Time Series Subject to Changes in Regime." Journal of Econometrics. Vol. 45, 1990, pp. 39–70. [4] Hamilton, James D. Time Series Analysis. Princeton, NJ: Princeton University Press, 1994.
See Also Objects msVAR Functions estimate | smooth
12-1041
12
Functions
filter Filter disturbances through regression model with ARIMA errors
Syntax Y = filter(Mdl,Z) [Y,E,U] = filter(Mdl,Z) Tbl2 = filter(Mdl,Tbl1) [ ___ ] = filter( ___ ,Name,Value)
Description Y = filter(Mdl,Z) returns a numeric array of one or more response series Y resulting from filtering the numeric array of one or more underlying disturbance series Z through the fully specified, univariate regression model with ARIMA errors Mdl. Z is associated with the error model innovations process that drives the specified regression model with ARIMA errors. [Y,E,U] = filter(Mdl,Z) also returns numeric arrays of one or more series of error model innovations E and unconditional disturbances U, resulting from filtering the disturbance paths Z through the model Mdl. Tbl2 = filter(Mdl,Tbl1) returns the table or timetable Tbl2 containing the results from filtering the paths of disturbances in the input table or timetable Tbl1 through Mdl. The disturbance variable in Tbl1 is associated with the model innovations process that drives Mdl. filter selects the variable Mdl.SeriesName, or the sole variable in Tbl1, as the disturbance variable to filter through the model. To select a different variable in Tbl1 to filter through the model, use the DisturbanceVariable name-value argument. [ ___ ] = filter( ___ ,Name,Value) specifies options using one or more name-value arguments in addition to any of the input argument combinations in previous syntaxes. filter returns the output argument combination for the corresponding input arguments. For example, filter(Mdl,Z,X=Pred,Z0=PSZ) specifies the predictor data Pred for the model regression component and the observed errors in the presample period PSZ to initialize the model.
Examples Filter Disturbance Vector to Compute Impulse Response Function Compute the impulse response function (IRF) of an innovation shock to the regression model with ARMA(2,1) errors. Supply the innovation shock as a vector. The IRF assesses the dynamic behavior of a system to a one-time shock. Typically, the magnitude of the shock is 1. Alternatively, it might be more meaningful to examine an IRF of an innovation shock with a magnitude of one standard deviation. In regression models with ARIMA errors, 12-1042
filter
• The IRF is invariant to the behavior of the predictors and the intercept. • The IRF of the model is defined as the impulse response of the unconditional disturbances as governed by the ARIMA error component. Create the following regression model with ARMA(2,1) errors: yt = ut ut = 0 . 5ut − 1 − 0 . 8ut − 2 + εt − 0 . 5εt − 1, where εt is Gaussian with variance 0.1. Mdl = regARIMA(Intercept=0,AR={0.5 -0.8},MA=-0.5, ... Variance=0.1);
When you construct an impulse response function for a regression model with ARIMA errors, you must set Intercept to 0. Simulate the first 30 responses of the impulse response function by generating a error series with a one-time impulse with magnitude equal to one standard deviation, and then filter it. Also, use impulse to compute the IRF. z = [sqrt(Mdl.Variance); zeros(29,1)]; yFltr = filter(Mdl,z); yImpls = impulse(Mdl,30);
% Shock of 1 std
When you construct an IRF of a regression model with ARIMA errors containing a regression component, do not specify the predictor matrix, X, in filter. Plot the IRFs. figure tiledlayout(2,1) nexttile stem((0:numel(yFltr)-1)',yFltr,"filled") title("Impulse Response to Shock of One Standard Deviation") nexttile stem((0:numel(yImpls)-1)',yImpls,"filled") title("Impulse Response to Unit Shock")
12-1043
12
Functions
The IRF given a shock of one standard deviation is a scaled version of the IRF returned by impulse.
Compute Step Response Compute the step response function of a regression model with ARMA(2,1) errors. The step response assesses the dynamic behavior of a system to a persistent shock. Typically, the magnitude of the shock is 1. Alternatively, it might be more meaningful to examine a step response of a persistent innovation shock with a magnitude of one standard deviation. This example plots the step response of a persistent innovations shock in a model without an intercept and predictor matrix for regression. However, note that filter is flexible in that it accepts a persistent innovations or predictor shock that you construct using any magnitude, then filters it through the model. Specify the following regression model with ARMA(2,1) errors: yt = ut ut = 0 . 5ut − 1 − 0 . 8ut − 2 + εt − 0 . 5εt − 1, where εt is Gaussian with variance 0.1. Mdl = regARIMA(Intercept=0,AR={0.5 -0.8},MA=-0.5, ... Variance=0.1);
12-1044
filter
Compute the first 30 responses to a sequence of unit errors by generating an error series of one standard deviation, and then filtering it. z = sqrt(Mdl.Variance)*ones(30,1); % Persistent shock of one std y = filter(Mdl,z); y = y/y(1); % Normalize relative to y(1)
Plot the step response function. figure stem((0:numel(y)-1)',y,"filled") title("Step Response for Persistent Shock of One STD")
The step response settles around 0.4.
Filter Timetable of Disturbances Through Estimated Model Fit a regression model with ARMA(1,1) errors by regressing the US consumer price index (CPI) quarterly changes onto the US gross domestic product (GDP) growth rate. Supply a timetable of data and specify the series for the fit. Then, filter paths of disturbances in a timetable through the fitted model.
12-1045
12
Functions
Load and Transform Data Load the US macroeconomic data set. Compute the series of GDP quarterly growth rates and CPI quarterly changes. load Data_USEconModel DTT = price2ret(DataTimeTable,DataVariables="GDP"); DTT.GDPRate = 100*DTT.GDP; DTT.CPIDel = diff(DataTimeTable.CPIAUCSL); T = height(DTT) T = 248 figure tiledlayout(2,1) nexttile plot(DTT.Time,DTT.GDPRate) title("GDP Rate") ylabel("Percent Growth") nexttile plot(DTT.Time,DTT.CPIDel) title("Index")
The series appear stationary, albeit heteroscedastic. Prepare Timetable for Estimation When you plan to supply a timetable, you must ensure it has all the following characteristics: 12-1046
filter
• The selected response variable is numeric and does not contain any missing values. • The timestamps in the Time variable are regular, and they are ascending or descending. Remove all missing values from the timetable. DTT = rmmissing(DTT); T_DTT = height(DTT) T_DTT = 248
Because each sample time has an observation for all variables, rmmissing does not remove any observations. Determine whether the sampling timestamps have a regular frequency and are sorted. areTimestampsRegular = isregular(DTT,"quarters") areTimestampsRegular = logical 0 areTimestampsSorted = issorted(DTT.Time) areTimestampsSorted = logical 1
areTimestampsRegular = 0 indicates that the timestamps of DTT are irregular. areTimestampsSorted = 1 indicates that the timestamps are sorted. Macroeconomic series in this example are timestamped at the end of the month. This quality induces an irregularly measured series. Remedy the time irregularity by shifting all dates to the first day of the quarter. dt = DTT.Time; dt = dateshift(dt,"start","quarter"); DTT.Time = dt; areTimestampsRegular = isregular(DTT,"quarters") areTimestampsRegular = logical 1
DTT is regular. Create Model Template for Estimation Suppose that a regression model of CPI quarterly changes onto the GDP rate, with ARMA(1,1) errors, is appropriate. Create a model template for a regression model with ARMA(1,1) errors template. Mdl = regARIMA(1,0,1) Mdl = regARIMA with properties: Description: "ARMA(1,1) Error Model (Gaussian Distribution)"
12-1047
12
Functions
SeriesName: Distribution: Intercept: Beta: P: Q: AR: SAR: MA: SMA: Variance:
"Y" Name = "Gaussian" NaN [1×0] 1 1 {NaN} at lag [1] {} {NaN} at lag [1] {} NaN
Mdl is a partially specified regARIMA object. Fit Model to Data Fit a regression model with ARMA(1,1) errors to the data. Specify the entire series GDP rate and CPI quarterly changes series, and specify the response and predictor variable names. EstMdl = estimate(Mdl,DTT,ResponseVariable="GDPRate", ... PredictorVariables="CPIDel"); Regression with ARMA(1,1) Error Model (Gaussian Distribution): Value ________ Intercept AR{1} MA{1} Beta(1) Variance
0.0162 0.60515 -0.16221 0.002221 0.000113
StandardError _____________ 0.0016077 0.089912 0.11051 0.00077691 7.2753e-06
TStatistic __________ 10.077 6.7305 -1.4678 2.8587 15.533
PValue __________ 6.9995e-24 1.6906e-11 0.14216 0.0042532 2.0838e-54
EstMdl is a fully specified, estimated regARIMA object. Filter Random Gaussian Disturbance Paths Generate 2 random, independent series of length T_DTT from the standard Gaussian distribution. Store the matrix of series as one variable in DTT. rng(1,"twister") % For reproducibility DTT.Z = randn(T_DTT,2);
DTT contains a new variable called Z containing a T_DTT-by-2 matrix of two disturbance paths. Filter the paths of disturbances through the estimated model. Specify the table variable name containing the disturbance paths. Tbl2 = filter(EstMdl,DTT,DisturbanceVariable="Z"); tail(Tbl2)
12-1048
Time _____
Interval ________
Q2-07 Q3-07 Q4-07
91 91 94
GDP ___________
GDPRate __________
CPIDel ______
0.00018278 0.00016916 6.1286e-05
0.018278 0.016916 0.0061286
1.675 1.359 3.355
Z ______________________
Y _______
-0.36436 -0.093312 0.48981
0.01606 0.01575 0.02129
-0.7055 -0.3311 -1.5208
filter
Q1-08 Q2-08 Q3-08 Q4-08 Q1-09
91 91 92 92 90
9.3272e-05 0.00011103 8.9585e-05 -0.00016145 -8.6878e-05
0.0093272 0.011103 0.0089585 -0.016145 -0.0086878
1.93 3.367 1.641 -7.098 1.137
1.4014 -0.27422 0.67582 0.19058 0.67036
0.16528 -0.48787 0.58697 -0.90337 0.37101
0.03333 0.0212 0.02690 0.0235 0.02743
size(Tbl2) ans = 1×2 248
8
Tbl2 is a 248-by-8 timetable containing all variables in DTT, and the two filtered response paths Y_Response, error model innovation paths Y_ErrorInnovation, and unconditional disturbance paths Y_RegressionInnovation.
Filter Error Paths Through Regression Model with SARIMA Errors Simulate 100 independent paths of responses by filtering 100 independent paths of errors zt, where innovations εt = σ zt, through the following regression model with SARIMA 2, 1, 1 12 errors. yt = X
1.5 + ut −2
1 − 0 . 2L − 0 . 1L2 1 − L 1 − 0 . 01L12 1 − L12 ut = 1 + 0 . 5L 1 + 0 . 02L12 εt, where εt follows a t-distribution with 15 degrees of freedom. Distribution = struct("Name","t","DoF",15); Mdl = regARIMA(AR={0.2 0.1},SAR=0.01,SARLags=12, ... MA=0.5,SMA=0.02,SMALags=12,D=1,Seasonality=12, ... Beta=[1.5; -2],Intercept=0,Variance=0.1, ... Distribution=Distribution) Mdl = regARIMA with properties: Description: SeriesName: Distribution: Intercept: Beta: P: D: Q: AR: SAR: MA: SMA: Seasonality: Variance:
"Regression with ARIMA(2,1,1) Error Model Seasonally Integrated with Seasonal A "Y" Name = "t", DoF = 15 0 [1.5 -2] 27 1 13 {0.2 0.1} at lags [1 2] {0.01} at lag [12] {0.5} at lag [1] {0.02} at lag [12] 12 0.1
Simulate a length 25 path of data from the standard bivariate normal distribution for the predictor variables in the regression component. 12-1049
12
Functions
rng(1,"twister") % For reproducibility numObs = 25; Pred = randn(numObs,2);
Simulate 100 independent paths of errors of length 25 from the standard normal distribution. numPaths = 100; Z = randn(numObs,numPaths);
Simulate 100 independent response paths from model by filtering the paths of errors through the model. Supply the predictor data for the regression component. Y = filter(Mdl,Z,X=Pred); figure plot(Y) title("Simulated Response Paths")
Plot the 2.5th, 50th (median), and 97.5th percentiles of the simulated response paths. lower = prctile(Y,2.5,2); middle = median(Y,2); upper = prctile(Y,97.5,2); figure plot(1:25,lower,"r:",1:25,middle,"k", ... 1:25,upper,"r:")
12-1050
filter
title("Monte Carlo Summary of Responses") legend("95% Interval","Median",Location="best")
Compare Responses from filter and simulate Simulate responses using filter and simulate. Then compare the simulated responses. Both filter and simulate filter a series of errors to produce output responses y, innovations e, and unconditional disturbances u. The difference is that simulate generates errors from Mdl.Distribution, whereas filter accepts a random array of errors that you generate from any distribution. Specify the following regression model with ARMA(2,1) errors: 0.1 + ut −0 . 2 ut = 0 . 5ut − 1 − 0 . 8ut − 2 + εt − 0 . 5εt − 1, yt = Xt
where εt is Gaussian with variance 0.1. Mdl = regARIMA(Intercept=0,AR={0.5 -0.8},MA=-0.5, ... Beta=[0.1 -0.2],Variance=0.1);
Mdl is a fully specified regARIMA object. 12-1051
12
Functions
Simulate a one path of bivariate standard normal data for the predictor variables. Then, simulate a path of responses and innovations from the regression model with ARMA(2,1) errors. Supply the simulated predictor data to simulate for the regression component. rng(1,"twister") % For reproducibility Pred = randn(100,2); % Simulate predictor data [ySim,eSim] = simulate(Mdl,100,X=Pred);
ySim and eSIM are 100-by-1 vectors of simulated responses and innovations, respectively, from the model Mdl. Produce model errors by standardizing the simulated innovations. Filter the simulated errors through the model. Supply the predictor data to filter. z1 = eSim./sqrt(Mdl.Variance); yFlt1 = filter(Mdl,z1,X=Pred);
yFlt1 is a 100-by-1 vector of responses resulting from filtering the simulated errors z1 through the model Mdl. Confirm that the simulated responses from simulate and filter are identical by plotting the two series. figure h1 = plot(ySim); hold on h2 = plot(yFlt1,"."); title("Filtered and Simulated Responses") legend([h1 h2],["Simulate" "Filter"],Location="best") hold off
12-1052
filter
Alternatively, simulate responses by randomly generating your own errors and passing them into filter. rng(1,"twister") Pred = randn(100,2); z2 = randn(100,1); yFlt2 = filter(Mdl,z2,X=Pred); figure h1 = plot(ySim); hold on h2 = plot(yFlt2,"."); title("Filtered and Simulated Responses") legend([h1 h2],["Simulate" "Filter"],Location="best") hold off
12-1053
12
Functions
This plot is the same as the previous plot, confirming that both simulation methods are equivalent. filter multiplies the error, Z, by sqrt(Mdl.Variance) before filtering Z through the model. Therefore, if you want to specify a different distribution, set Mdl.Variance to 1, and then generate your own errors using, for example, random("unif",a,b) for the Uniform(a, b) distribution.
Input Arguments Mdl — Fully specified regression model with ARIMA errors regARIMA model object Fully specified regression model with ARIMA errors, specified as a regARIMA model object created by regARIMA or estimate. The properties of Mdl cannot contain NaN values. Z — Error model disturbance series zt that drives the innovations process εt numeric column vector | numeric matrix Error model disturbance series zt that drives the error model innovations process εt, specified as a numobs-by-1 numeric column vector or a numobs-by-numpaths numeric matrix. numobs is the length of the time series (sample size). numpaths is the number of separate, independent disturbance paths. The innovations process εt = σzt, where σ = sqrt(Mdl.Variance), the standard deviation of the innovations. 12-1054
filter
Each row corresponds to a sampling time. The last row contains the latest set of disturbances. Each column corresponds to a separate, independent path of error model disturbances. filter assumes that disturbances across any row occur simultaneously. Z is the continuation of the presample disturbances Z0. Data Types: double Tbl1 — Time series data table | timetable Time series data containing the error model disturbance series zt that drives the error model innovations process εt, and, optionally, predictor variables xt, specified as a table or timetable with numvars variables and numobs rows. You can optionally select the disturbance variable or numpreds predictor variables by using the DisturbanceVariable or PredictorVariables name-value arguments, respectively. The innovations process εt = σzt, where σ = sqrt(Mdl.Variance), the standard deviation of the innovations. Each row is an observation, and measurements in each row occur simultaneously. The selected disturbance variable is a single path (numobs-by-1 vector) or multiple paths (numobs-by-numpaths matrix) of numobs observations of disturbance data. Each path (column) of the selected disturbance variable is independent of the other paths, but path j of all presample and in-sample variables correspond, for j = 1,…,numpaths. Each selected predictor variable is a numobs-by-1 numeric vector representing one path. The filter function includes all predictor variables in the model when it filters each disturbance path. Variables in Tbl1 represent the continuation of corresponding variables in Presample. If Tbl1 is a timetable, it must represent a sample with a regular datetime time step (see isregular), and the datetime vector Tbl1.Time must be strictly ascending or descending. If Tbl1 is a table, the last row contains the latest observation. Name-Value Pair Arguments Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter. Before R2021a, use commas to separate each name and value, and enclose Name in quotes. Example: filter(Mdl,Z,X=Pred,Z0=PSZ) specifies the predictor data Pred for the model regression component and the observed errors in the presample period PSZ to initialize the model. DisturbanceVariable — Disturbance variable zt to select from Tbl1 string scalar | character vector | integer | logical vector Disturbance variable zt to select from Tbl1 containing the disturbance data to filter through Mdl, specified as one of the following data types: • String scalar or character vector containing a variable name in Tbl1.Properties.VariableNames • Variable index (positive integer) to select from Tbl1.Properties.VariableNames 12-1055
12
Functions
• A logical vector, where DisturbanceVariable(j) = true selects variable j from Tbl1.Properties.VariableNames The selected variable must be a numeric vector and cannot contain missing values (NaNs). If Tbl1 has one variable, the default specifies that variable. Otherwise, the default matches the variable to names in Mdl.SeriesName. Example: DisturbanceVariable="StockRateDist" Example: DisturbanceVariable=[false false true false] or DisturbanceVariable=3 selects the third table variable as the disturbance variable. Data Types: double | logical | char | cell | string X — Predictor data numeric matrix Predictor data for the model regression component, specified as a numobs-by-numpreds numeric matrix. numpreds is the number of predictor variables (numel(Mdl.Beta)). Use X only when you supply the numeric array of disturbance data Z. X must have at least numobs rows. The last row contains the latest predictor data. If X has more than numobs rows, filter uses only the latest numobs rows. Each row of X corresponds to each period in Z (period for which filter filters errors; the period after the presample). filter does not use the regression component in the presample period. Columns of X are separate predictor variables. filter applies X to each filtered path; that is, X represents one path of observed predictors. By default, filter excludes the regression component, regardless of its presence in Mdl. Data Types: double PredictorVariables — Predictor variables xt to select from Tbl1 string vector | cell vector of character vectors | vector of integers | logical vector Predictor variables xt to select from Tbl1 containing the predictor data for the model regression component, specified as one of the following data types: • String vector or cell vector of character vectors containing numpreds variable names in Tbl1.Properties.VariableNames • A vector of unique indices (positive integers) of variables to select from Tbl1.Properties.VariableNames • A logical vector, where PredictorVariables(j) = true selects variable j from Tbl1.Properties.VariableNames The selected variables must be numeric vectors and cannot contain missing values (NaNs). By default, filter excludes the regression component, regardless of its presence in Mdl. Example: PredictorVariables=["M1SL" "TB3MS" "UNRATE"] Example: PredictorVariables=[true false true false] or PredictorVariable=[1 3] selects the first and third table variables to supply the predictor data. 12-1056
filter
Data Types: double | logical | char | cell | string Z0 — Presample disturbance data zt numeric column vector | numeric matrix Presample disturbance data zt to initialize the error model, specified as a numpreobs-by-1 numeric column vector or a numpreobs-by-numprepaths numeric matrix. Use Z0 only when you supply the numeric array of disturbance data Z. Each row is a presample observation (sampling time), and measurements in each row occur simultaneously. The last row contains the latest presample observation. numpreobs must be at least Mdl.Q to initialize the error model moving average (MA) component. If numpreobs is larger than required, filter uses the latest required observations only. Columns of Z0 are separate, independent presample paths. The following conditions apply: • If Z0 is a column vector, it represents a single disturbance path. filter applies it to each output path. • If Z0 is a matrix, each column represents a presample disturbance path. filter applies Z0(:,j) to initialize path j. numprepaths must be at least numpaths. If numprepaths > numpaths, filter uses the first size(Z,2) columns only. By default, filter sets the necessary presample disturbances to zero. Data Types: double U0 — Presample regression innovation data (unconditional disturbances) ut numeric column vector | numeric matrix Presample regression innovation data (unconditional disturbances) ut to initialize the error model, specified as a numpreobs-by-1 numeric column vector or a numpreobs-by-numprepaths numeric matrix. Use U0 only when you supply the numeric array of disturbance data Z. Each row is a presample observation (sampling time), and measurements in each row occur simultaneously. The last row contains the latest presample observation. numpreobs must be at least Mdl.P to initialize the error model autoregressive (AR) component. If numpreobs is larger than required, filter uses the latest required observations only. Columns of U0 are separate, independent presample paths. The following conditions apply: • If U0 is a column vector, it represents a single path. filter applies it to each path. • If U0 is a matrix, each column represents a presample path. filter applies U0(:,j) to initialize path j. numprepaths must be at least numpaths. If numprepaths > numpaths, filter uses the first size(Z,2) columns only. By default, filter sets the necessary presample unconditional disturbances to 0. Data Types: double Presample — Presample data table | timetable Presample data containing paths of disturbance zt or regression innovation (unconditional disturbance) ut series to initialize the model, specified as a table or timetable, the same type as Tbl1, with numprevars variables and numpreobs rows. Use Presample only when you supply a table or timetable of data Tbl1. 12-1057
12
Functions
Each selected variable is a single path (numpreobs-by-1 vector) or multiple paths (numpreobs-bynumprepaths matrix) of numpreobs observations representing the presample of the error model disturbance or regression innovation series for DisturbanceVariable, the selected error model disturbance variable in Tbl1. Each row is a presample observation, and measurements in each row occur simultaneously. numpreobs must be one of the following values: • At least Mdl.P when Presample provides only presample regression innovations to initialize the error model AR component • At least Mdl.Q when Presample provides only presample error model disturbances to initialize the error model MA component • At least max([Mdl.P Mdl.Q]) otherwise If you supply more rows than necessary, filter uses the latest required number of observations only. If Presample is a timetable, all the following conditions must be true: • Presample must represent a sample with a regular datetime time step (see isregular). • The inputs Tbl1 and Presample must be consistent in time such that Presample immediately precedes Tbl1 with respect to the sampling frequency and order. • The datetime vector of sample timestamps Presample.Time must be ascending or descending. If Presample is a table, the last row contains the latest presample observation. By default, filter sets necessary presample error model disturbances and regression innovations to zero. If you specify the Presample, you must specify the presample error model disturbance or regression innovation variable name by using the PresampleDisturbanceVariable or PresampleRegressionDisturbanceVariable name-value argument. PresampleDisturbanceVariable — Error model disturbance variable zt to select from Presample string scalar | character vector | integer | logical vector Error model disturbance variable zt to select from Presample containing the presample error model disturbance data, specified as one of the following data types: • String scalar or character vector containing the variable name to select from Presample.Properties.VariableNames • Variable index (positive integer) to select from Presample.Properties.VariableNames • A logical vector, where PresampleDisturbanceVariable(j) = true selects variable j from Presample.Properties.VariableNames The selected variable must be a numeric vector and cannot contain missing values (NaNs). If you specify presample error model disturbance data by using the Presample name-value argument, you must specify PresampleDisturbanceVariable. Example: PresampleDisturbanceVariable="GDP_Z" 12-1058
filter
Example: PresampleDisturbanceVariable=[false false true false] or PresampleDisturbanceVariable=3 selects the third table variable for presample error model disturbance data. Data Types: double | logical | char | cell | string PresampleRegressionDisturbanceVariable — Regression model innovation variable to select from Presample string scalar | character vector | integer | logical vector Regression model innovation variable, associated with unconditional disturbances ut, to select from Presample containing data for the presample regression model innovations, specified as one of the following data types: • String scalar or character vector containing a variable name in Presample.Properties.VariableNames • Variable index (positive integer) to select from Presample.Properties.VariableNames • A logical vector, where PresampleRegressionDisturbanceVariable(j) = true selects variable j from Presample.Properties.VariableNames The selected variable must be a numeric vector and cannot contain missing values (NaNs). If you specify presample regression model innovation data by using the Presample name-value argument, you must specify PresampleRegressionDisturbanceVariable. Example: PresampleRegressionDisturbanceVariable="StockRateU" Example: PresampleRegressionDisturbanceVariable=[false false true false] or PresampleRegressionDisturbanceVariable=3 selects the third table variable as the presample regression model innovation data. Data Types: double | logical | char | cell | string Note • NaN values in Z, X, Z0 and U0 indicate missing values. filter removes missing values from specified data by listwise deletion. • For the presample, filter horizontally concatenates the possibly jagged arrays Z0 and U0 with respect to the last rows, and then it removes any row of the concatenated matrix containing at least one NaN. • For in-sample data, filter horizontally concatenates the possibly jagged arrays Z and X, and then it removes any row of the concatenated matrix containing at least one NaN. This type of data reduction reduces the effective sample size and can create an irregular time series. • For numeric data inputs, filter assumes that you synchronize the presample data such that the latest observations occur simultaneously. • filter issues an error when any table or timetable input contains missing values. • All predictor variables (columns) in X are associated with each input error series to produce numpaths output series.
12-1059
12
Functions
Output Arguments Y — Simulated response paths yt numeric column vector | numeric matrix Simulated response paths yt, returned as a numobs-by-1 column vector or a numobs-by-numpaths numeric matrix. filter returns Y only when you supply the input Z. For each t = 1, …, numobs, the simulated responses at time t Y(t,:) correspond to the filtered errors at time t Z(t,:) and response path j Y(:,j) corresponds to the filtered disturbance path j Z(:,j) when Z is a matrix. Y represents the continuation of presample inputs. E — Simulated, mean-zero innovations paths εt numeric column vector | numeric matrix Simulated, mean-zero innovations paths εt of the error model, returned as a numobs-by-1 column vector or a numobs-by-numpaths numeric matrix. filter returns E only when you supply the input Z. The dimensions of Y and E correspond. Columns of E are scaled disturbance paths (innovations) such that, for a particular path, εt = σzt. U — Simulated unconditional disturbance paths ut numeric column vector | numeric matrix Simulated unconditional disturbance paths ut, returned as a numobs-by-1 column vector or a numobsby-numpaths numeric matrix. filter returns U only when you supply the input Z. The dimensions of Y and U correspond. Tbl2 — Simulated response yt, error model innovation εt, and unconditional disturbance ut paths table | timetable Simulated response yt, error model innovation εt, and unconditional disturbance ut paths, returned as a table or timetable, the same data type as Tbl1. filter returns Tbl2 only when you supply the input Tbl1. Tbl2 contains the following variables: • The filtered response paths, which are in a numobs-by-numpaths numeric matrix, with rows representing observations and columns representing independent paths, each corresponding to the input observations and paths of the error model disturbance variable in Tbl1. filter names the simulated response variable in Tbl2 responseName_Response, where responseName is Mdl.SeriesName. For example, if Mdl.SeriesName is StockReturns, Tbl2 contains a variable for the corresponding simulated response paths with the name StockReturns_Response. • The simulated error model innovation paths, which are in a numobs-by-numpaths numeric matrix, with rows representing observations and columns representing independent paths, each corresponding to the input observations and paths of the error model disturbance variable in Tbl1. filter names the simulated error model innovation variable in Tbl2 responseName_ErrorInnovation, where responseName is Mdl.SeriesName. For example, if 12-1060
filter
Mdl.SeriesName is StockReturns, Tbl2 contains a variable for the corresponding simulated error model innovation paths with the name StockReturns_ErrorInnovation. • The simulated unconditional disturbance paths, which are in a numobs-by-numpaths numeric matrix, with rows representing observations and columns representing independent paths, each corresponding to the input observations and paths of the error model disturbance variable in Tbl1. filter names the simulated unconditional disturbance variable in Tbl2 responseName_RegressionInnovation, where responseName is Mdl.SeriesName. For example, if Mdl.SeriesName is StockReturns, Tbl2 contains a variable for the corresponding simulated unconditional disturbance paths with the name StockReturns_RegressionInnovation. • All variables Tbl1. If Tbl1 is a timetable, row times of Tbl1 and Tbl2 are equal.
Alternative Functionality filter generalizes simulate. Both filter a series of errors to produce responses Y, innovations E, and unconditional disturbances U. However, simulate autogenerates a series of mean zero, unit variance, independent and identically distributed (iid) errors according to the distribution in Mdl. In contrast, filter requires that you specify your own errors, which can come from any distribution.
Version History Introduced in R2013b R2023b: filter accepts input data in tables and timetables In addition to accepting input data (in-sample and presample data) in numeric arrays, filter accepts input data in tables or regular timetables. When you supply data in a table or timetable, the following conditions apply: • filter chooses the default in-sample error model disturbance series on which to operate, but you can use the specified optional name-value argument to select a different series. • If you specify optional presample error model disturbance or regression model innovation data to initialize the model, you must also specify the appropriate presample variable names. • filter returns results in a table or timetable. Name-value arguments to support tabular workflows include: • DisturbanceVariable specifies the name of the disturbance series in the input data to filter through the model. • PredictorVariables specifies the names of the predictor series to select from the input data for the model regression component. • Presample specifies the input table or timetable of presample error model disturbance or regression innovation data. • PresampleDisturbanceVariable specifies the name of the error model disturbance series to select from Presample. • PresampleRegressionDisturbanceVariable specifies the name of the regression model innovation series to select from Presample. 12-1061
12
Functions
References [1] Box, G. E. P., G. M. Jenkins, and G. C. Reinsel. Time Series Analysis: Forecasting and Control. 3rd ed. Englewood Cliffs, NJ: Prentice Hall, 1994. [2] Davidson, R., and J. G. MacKinnon. Econometric Theory and Methods. Oxford, UK: Oxford University Press, 2004. [3] Enders, Walter. Applied Econometric Time Series. Hoboken, NJ: John Wiley & Sons, Inc., 1995. [4] Hamilton, James D. Time Series Analysis. Princeton, NJ: Princeton University Press, 1994. [5] Pankratz, A. Forecasting with Dynamic Regression Models. John Wiley & Sons, Inc., 1991. [6] Tsay, R. S. Analysis of Financial Time Series. 2nd ed. Hoboken, NJ: John Wiley & Sons, Inc., 2005.
See Also Objects regARIMA Functions simulate Topics “Impulse Response of Regression Models with ARIMA Errors” on page 5-66 “Monte Carlo Simulation of Regression Models with ARIMA Errors” on page 5-151 “Presample Data for regARIMA Model Simulation” on page 5-154 “Simulate Regression Models with Multiplicative Seasonal Errors” on page 5-146 “Simulate Regression Models with Nonstationary Errors” on page 5-138
12-1062
filter
filter Forward recursion of state-space models
Syntax X = filter(Mdl,Y) X = filter(Mdl,Y,Name,Value) [X,logL,Output] = filter( ___ )
Description filter computes state-distribution moments for each period of the specified response data by recursively applying the Kalman filter. To compute updated state-distribution moments efficiently during only the final period of the specified response data by applying one recursion of the Kalman filter, use update instead. X = filter(Mdl,Y) returns filtered states on page 11-8 (X) from performing forward recursion of the fully specified state-space model on page 11-3 Mdl. That is, filter applies the standard Kalman filter on page 11-7 using Mdl and the observed responses Y. X = filter(Mdl,Y,Name,Value) uses additional options specified by one or more Name,Value arguments. For example, specify the regression coefficients and predictor data to deflate the observations, or specify to use the square-root filter. If Mdl is not fully specified, then you must specify the unknown parameters as known scalars using the 'Params' Name,Value argument. [X,logL,Output] = filter( ___ ) uses any of the input arguments in the previous syntaxes to additionally return the loglikelihood value (logL) and an output structure array (Output) using any of the input arguments in the previous syntaxes. Output contains: • Filtered and forecasted states on page 11-8 • Estimated covariance matrices of the filtered and forecasted states • Loglikelihood value • Forecasted observations on page 11-10 and its estimated covariance matrix • Adjusted Kalman gain on page 11-11 • Vector indicating which data the software used to filter
Examples Filter States of Time-Invariant State-Space Model Suppose that a latent process is an AR(1). The state equation is xt = 0 . 5xt − 1 + ut, 12-1063
12
Functions
where ut is Gaussian with mean 0 and standard deviation 1. Generate a random series of 100 observations from xt, assuming that the series starts at 1.5. T = 100; ARMdl = arima('AR',0.5,'Constant',0,'Variance',1); x0 = 1.5; rng(1); % For reproducibility x = simulate(ARMdl,T,'Y0',x0);
Suppose further that the latent process is subject to additive measurement error. The observation equation is yt = xt + εt, where εt is Gaussian with mean 0 and standard deviation 0.75. Together, the latent process and observation equations compose a state-space model. Use the random latent state process (x) and the observation equation to generate observations. y = x + 0.75*randn(T,1);
Specify the four coefficient matrices. A B C D
= = = =
0.5; 1; 1; 0.75;
Specify the state-space model using the coefficient matrices. Mdl = ssm(A,B,C,D) Mdl = State-space model type: ssm State vector length: 1 Observation vector length: 1 State disturbance vector length: 1 Observation innovation vector length: 1 Sample size supported by model: Unlimited State variables: x1, x2,... State disturbances: u1, u2,... Observation series: y1, y2,... Observation innovations: e1, e2,... State equation: x1(t) = (0.50)x1(t-1) + u1(t) Observation equation: y1(t) = x1(t) + (0.75)e1(t) Initial state distribution: Initial state means x1 0
12-1064
filter
Initial state covariance matrix x1 x1 1.33 State types x1 Stationary
Mdl is an ssm model. Verify that the model is correctly specified using the display in the Command Window. The software infers that the state process is stationary. Subsequently, the software sets the initial state mean and covariance to the mean and variance of the stationary distribution of an AR(1) model. Filter states for periods 1 through 100. Plot the true state values and the filtered state estimates. filteredX = filter(Mdl,y); figure plot(1:T,x,'-k',1:T,filteredX,':r','LineWidth',2) title({'State Values'}) xlabel('Period') ylabel('State') legend({'True state values','Filtered state values'})
The true values and filter estimates are approximately the same. 12-1065
12
Functions
Filter States of State-Space Model Containing Regression Component Suppose that the linear relationship between the change in the unemployment rate and the nominal gross national product (nGNP) growth rate is of interest. Suppose further that the first difference of the unemployment rate is an ARMA(1,1) series. Symbolically, and in state-space form, the model is x1, t x2, t
=
1 ϕ θ x1, t − 1 + u1, t 1 0 0 x2, t − 1
yt − βZt = x1, t + σεt, where: • x1, t is the change in the unemployment rate at time t. • x2, t is a dummy state for the MA(1) effect. •
y1, t is the observed change in the unemployment rate being deflated by the growth rate of nGNP (Zt).
• u1, t is the Gaussian series of state disturbances having mean 0 and standard deviation 1. • εt is the Gaussian series of observation innovations having mean 0 and standard deviation σ. Load the Nelson-Plosser data set, which contains the unemployment rate and nGNP series, among other things. load Data_NelsonPlosser
Preprocess the data by taking the natural logarithm of the nGNP series, and the first difference of each series. Also, remove the starting NaN values from each series. isNaN = any(ismissing(DataTable),2); gnpn = DataTable.GNPN(~isNaN); u = DataTable.UR(~isNaN); T = size(gnpn,1); Z = [ones(T-1,1) diff(log(gnpn))]; y = diff(u);
% Flag periods containing NaNs % Sample size
Though this example removes missing values, the software can accommodate series containing missing values in the Kalman filter framework. Specify the coefficient matrices. A B C D
= = = =
[NaN NaN; 0 0]; [1; 1]; [1 0]; NaN;
Specify the state-space model using ssm. Mdl = ssm(A,B,C,D);
Estimate the model parameters, and use a random set of initial parameter values for optimization. Specify the regression component and its initial value for optimization using the 'Predictors' and 12-1066
filter
'Beta0' name-value pair arguments, respectively. Restrict the estimate of σ to all positive, real numbers. params0 = [0.3 0.2 0.2]; [EstMdl,estParams] = estimate(Mdl,y,params0,'Predictors',Z,... 'Beta0',[0.1 0.2],'lb',[-Inf,-Inf,0,-Inf,-Inf]); Method: Maximum likelihood (fmincon) Sample size: 61 Logarithmic likelihood: -99.7245 Akaike info criterion: 209.449 Bayesian info criterion: 220.003 | Coeff Std Err t Stat Prob ---------------------------------------------------------c(1) | -0.34098 0.29608 -1.15164 0.24948 c(2) | 1.05003 0.41377 2.53771 0.01116 c(3) | 0.48592 0.36790 1.32079 0.18657 y 12-1530
infer
Mdl.Q, infer uses the latest required number of observations only. The last element or row contains the latest observation. • If E0 is a column vector, it represents a single path of the underlying innovation series. infer applies it to each output path. • If E0 is a matrix, each column represents a presample path of the underlying innovation series. numprepaths must be at least numpaths. If numprepaths > numpaths, infer uses the first size(Y,2) columns only. The defaults are: • For GARCH(P,Q) and GJR(P,Q) models, infer sets any necessary presample innovations to the square root of the average squared value of the offset-adjusted response series Y. • For EGARCH(P,Q) models, infer sets any necessary presample innovations to zero. Data Types: double V0 — Positive presample conditional variance paths σt2 positive column vector | positive matrix Positive presample conditional variance paths σt2, specified as a numpreobs-by-1 positive column vector or numpreobs-by-numprepaths positive matrix. V0 provides initial values for the conditional variances in the model. Use V0 only when you supply the numeric array of disturbances Z. Each row is a presample observation, and measurements in each row occur simultaneously. The last row contains the latest presample observation. • For GARCH(P,Q) and GJR(P,Q) models, numpreobs must be at least Mdl.P. • For EGARCH(P,Q) models,numpreobs must be at least max([Mdl.P Mdl.Q]). numpreobs must be at least max([Mdl.P Mdl.Q]). If numpreobs > max([Mdl.P Mdl.Q]), infer uses the latest required number of observations only. The last element or row contains the latest observation. • If V0 is a column vector, it represents a single path of the conditional variance series. infer applies it to each output path. • If V0 is a matrix, each column represents a presample path of the conditional variance series. numprepaths must be at least numpaths. If numprepaths > numpaths, infer uses the first size(Y,2) columns only. By default, infer sets any necessary presample conditional variances to the unconditional variance of the process. Data Types: double Presample — Presample data table | timetable Presample data containing paths of innovation εt or conditional variance σt2 series to initialize the model, specified as a table or timetable, the same type as Tbl1, with numprevars variables and numpreobs rows. Use Presample only when you supply a table or timetable of data Tbl1. Each selected variable is a single path (numpreobs-by-1 vector) or multiple paths (numpreobs-bynumprepaths matrix) of numpreobs observations representing the presample of numpreobs 12-1531
12
Functions
observations of the innovation or conditional variance series for ResponseVariable, the selected response variable in Tbl1. Each row is a presample observation, and measurements in each row occur simultaneously. numpreobs must be one of the following values: • Mdl.Q when Presample provides only presample innovations. • Mdl.P when Presample provides only presample conditional variances. • max([Mdl.P Mdl.Q]) when Presample provides both presample innovations and conditional variances If numpreobs exceeds the minimum number, infer uses the latest required number of observations only. If Presample is a timetable, all the following conditions must be true: • Presample must represent a sample with a regular datetime time step (see isregular). • The inputs Tbl1 and Presample must be consistent in time such that Presample immediately precedes Tbl1 with respect to the sampling frequency and order. • The datetime vector of sample timestamps Presample.Time must be ascending or descending. If Presample is a table, the last row contains the latest presample observation. The defaults are: • For GARCH(P,Q) and GJR(P,Q) models, infer sets any necessary presample innovations to the square root of the average squared value of the offset-adjusted response series Y. • For EGARCH(P,Q) models, infer sets any necessary presample innovations to zero. • infer sets any necessary presample conditional variances to the unconditional variance of the process. If you specify the Presample, you must specify the presample innovation or conditional variance variable names by using the PresampleInnovationVariable or PresampleVarianceVariable name-value argument. PresampleInnovationVariable — Variable of Presample containing presample innovation paths εt string scalar | character vector | integer | logical vector Variable of Presample containing presample innovation paths εt, specified as one of the following data types: • String scalar or character vector containing a variable name in Presample.Properties.VariableNames • Variable index (integer) to select from Presample.Properties.VariableNames • A length numprevars logical vector, where PresampleInnovationVariable(j) = true selects variable j from Presample.Properties.VariableNames, and sum(PresampleInnovationVariable) is 1 The selected variable must be a numeric matrix and cannot contain missing values (NaN). If you specify presample innovation data by using the Presample name-value argument, you must specify PresampleInnovationVariable. 12-1532
infer
Example: PresampleInnovationVariable="StockRateInnov0" Example: PresampleInnovationVariable=[false false true false] or PresampleInnovationVariable=3 selects the third table variable as the presample innovation variable. Data Types: double | logical | char | cell | string PresampleVarianceVariable — Variable of Presample containing data for the presample conditional variances σt2 string scalar | character vector | integer | logical vector Variable of Presample containing data for the presample conditional variances σt2, specified as one of the following data types: • String scalar or character vector containing a variable name in Presample.Properties.VariableNames • Variable index (positive integer) to select from Presample.Properties.VariableNames • A logical vector, where PresampleVarianceVariable(j) = true selects variable j from Presample.Properties.VariableNames The selected variable must be a numeric vector and cannot contain missing values (NaNs). If you specify presample conditional variance data by using the Presample name-value argument, you must specify PresampleVarianceVariable. Example: PresampleVarianceVariable="StockRateVar0" Example: PresampleVarianceVariable=[false false true false] or PresampleVarianceVariable=3 selects the third table variable as the presample conditional variance variable. Data Types: double | logical | char | cell | string Notes: • NaN values in Y, E0, and V0 indicate missing values. infer removes missing values from specified data by list-wise deletion. • For the presample, infer horizontally concatenates E0 and V0, and then it removes any row of the concatenated matrix containing at least one NaN. • For in-sample data Y, infer removes any row containing at least one NaN. This type of data reduction reduces the effective sample size and can create an irregular time series. • For numeric data inputs, infer assumes that you synchronize the presample data such that the latest observations occur simultaneously. • infer issues an error when any table or timetable input contains missing values.
Output Arguments V — Conditional variances numeric column vector | numeric matrix 12-1533
12
Functions
Conditional variances inferred from the response data Y, returned as a numeric column vector or matrix. infer returns V only when you supply the input Y. The dimensions of V and Y are equivalent. If Y is a matrix, then the columns of V are the inferred conditional variance paths corresponding to the columns of Y. Rows of V are periods corresponding to the periodicity of Y. logL — Loglikelihood objective function values numeric scalar | numeric vector Loglikelihood objective function values associated with the model Mdl, returned as a numeric scalar or vector of length numpaths. If Y is a vector, then logL is a scalar. Otherwise, logL is vector of length size(Y,2), and each element is the loglikelihood of the corresponding column (or path) in Y. Tbl2 — Inferred conditional variance σt2 and innovation εt paths table | timetable Inferred conditional variance σt2 and innovation εt paths, returned as a table or timetable, the same data type as Tbl1. infer returns Tbl2 only when you supply the input Tbl1. When Mdl is an estimated model returned by estimate, the returned, inferred innovations are residuals. Tbl2 contains the following variables: • The inferred conditional variance paths, which are in a numobs-by-numpaths numeric matrix, with rows representing observations and columns representing independent paths. Each path represents the continuation of the corresponding path of presample conditional variances in Presample. infer names the filtered conditional variance variable in Tbl2 responseName_Variance, where responseName is Mdl.SeriesName. For example, if Mdl.SeriesName is StockReturns, Tbl2 contains a variable for the corresponding inferred conditional variance paths with the name StockReturns_Variance. • The inferred innovation paths, which are in a numobs-by-numpaths numeric matrix, with rows representing observations and columns representing independent paths. Each path corresponds to the input response path in Tbl1 and represents the continuation of the corresponding presample innovations path in Presample. infer names the inferred innovations variable in Tbl2 responseName_Residual, where responseName is Mdl.SeriesName. For example, if Mdl.SeriesName is StockReturns, Tbl2 contains a variable for the corresponding inferred innovations paths with the name StockReturns_Residual. • All variables Tbl1. If Tbl1 is a timetable, row times of Tbl1 and Tbl2 are equal.
Algorithms If you do not specify presample data (E0 and V0, or Presample), infer derives the necessary presample observations from the unconditional, or long-run, variance of the offset-adjusted response process. • For all conditional variance model types, required presample conditional variances are the sample average of the squared disturbances of the offset-adjusted specified response data (Y or Tbl1). 12-1534
infer
• For GARCH(P,Q) and GJR(P,Q) models, the required presample innovations are the square root of the average squared value of the offset-adjusted response data. • For EGARCH(P,Q) models, the required presample innovaitons are 0. These specifications minimize initial transient effects.
Version History Introduced in R2012a R2023a: infer accepts input data in tables and timetables, and returns results in tables and timetables In addition to accepting input data (in-sample and presample) in numeric arrays, infer accepts input data in tables or regular timetables. When you supply data in a table or timetable, the following conditions apply: • infer chooses the default in-sample response series on which to operate, but you can use the specified optional name-value argument to select a different series. • If you specify optional presample innovation or conditional variance data to initialize the model, you must also specify the presample innovation or conditional variance series name. • infer returns results in a table or timetable. Name-value arguments to support tabular workflows include: • ResponseVariable specifies the variable name of the response paths in the input data, from which infer infers conditional variances and innovations. • Presample specifies the input table or timetable of presample innovation and conditional variance data. • PresampleInnovationVariable specifies the variable name of the innovation paths to select from Presample. • PresampleVarianceVariable specifies the variable name of the conditional variance paths to select from Presample.
References [1] Bollerslev, T. “Generalized Autoregressive Conditional Heteroskedasticity.” Journal of Econometrics. Vol. 31, 1986, pp. 307–327. [2] Bollerslev, T. “A Conditionally Heteroskedastic Time Series Model for Speculative Prices and Rates of Return.” The Review of Economics and Statistics. Vol. 69, 1987, pp. 542–547. [3] Box, G. E. P., G. M. Jenkins, and G. C. Reinsel. Time Series Analysis: Forecasting and Control. 3rd ed. Englewood Cliffs, NJ: Prentice Hall, 1994. [4] Enders, W. Applied Econometric Time Series. Hoboken, NJ: John Wiley & Sons, 1995. [5] Engle, R. F. “Autoregressive Conditional Heteroskedasticity with Estimates of the Variance of United Kingdom Inflation.” Econometrica. Vol. 50, 1982, pp. 987–1007. 12-1535
12
Functions
[6] Glosten, L. R., R. Jagannathan, and D. E. Runkle. “On the Relation between the Expected Value and the Volatility of the Nominal Excess Return on Stocks.” The Journal of Finance. Vol. 48, No. 5, 1993, pp. 1779–1801. [7] Hamilton, J. D. Time Series Analysis. Princeton, NJ: Princeton University Press, 1994.
See Also Objects gjr | egarch | garch Functions estimate Topics “Infer Conditional Variances and Residuals” on page 8-62 “Compare Conditional Variance Models Using Information Criteria” on page 8-69
12-1536
infer
infer Infer univariate ARIMA or ARIMAX model residuals or conditional variances
Syntax E = infer(Mdl,Y) [E,V] = infer(Mdl,Y) Tbl2 = infer(Mdl,Tbl1) [ ___ ] = infer( ___ ,Name=Value) [ ___ ,logL] = infer( ___ )
Description E = infer(Mdl,Y) returns the numeric array of one or more residual series E inferred from the fully specified, univariate ARIMA model Mdl and the numeric array of one or more response series Y. [E,V] = infer(Mdl,Y) also returns the numeric array of one or more conditional variance V series when Mdl represents a composite conditional mean and variance model. Tbl2 = infer(Mdl,Tbl1) returns the table or timetable Tbl2 containing paths of residuals and conditional variances inferred from the model Mdl and the response data in the input table or timetable Tbl1. infer selects the response variable named in Mdl.SeriesName or the sole variable in Tbl1. To select a different response variable in Tbl1 to infer residuals and conditional variances, use the ResponseVariable name-value argument. [ ___ ] = infer( ___ ,Name=Value) specifies options using one or more name-value arguments in addition to any of the input argument combinations in previous syntaxes. infer returns the output argument combination for the corresponding input arguments. For example, infer(Mdl,Y,Y0=PS,X=Pred) infers residuals from the numeric vector of responses Y with respect to the ARIMAX Mdl, and specifies the numeric vector of presample response data PS to initialize the model and the exogenous predictor data Pred for the regression component. [ ___ ,logL] = infer( ___ ) also returns a numeric vector containing the loglikelihood objective function values logL associated with each specified path of response data.
Examples Infer Residuals From Model and Vector of Response Data Infer residuals from an AR model by supplying a hypothetical response series in a vector. Specify an AR(2) model using known parameters. Mdl = arima(AR={0.5 -0.8},Constant=0.002, ... Variance=0.8);
12-1537
12
Functions
Simulate response data with 100 observations. rng(1,"twister"); Y = simulate(Mdl,100);
Y is a 100-by-1 vector containing a random response path drawn from Mdl. Infer residuals for all corresponding responses. E = infer(Mdl,Y);
E is a 100-by-1 vector containing a residuals corresponding to Y, with respect to Mdl. By default, infer backcasts for required presample observations. Plot the residuals. figure plot(E) title("Inferred Residuals")
Infer Conditional Variances Infer the conditional variances from an AR(1) and GARCH(1,1) composite model. Return the loglikelihood value. 12-1538
infer
Specify an AR(1) model using known parameters. Set the variance equal to a garch model. Mdl = arima(AR={0.8 -0.3},Constant=0); MdlVar = garch(Constant=0.0002,GARCH=0.6,ARCH=0.2); Mdl.Variance = MdlVar;
Simulate response data with 100 observations. rng(1,"twister") Y = simulate(Mdl,100);
Infer residuals and conditional variances for the entire response series. Compute the loglikelihood at the simulated data. [E,V,logL] = infer(Mdl,Y); logL logL = 209.6405
E and V are 100-by-1 vectors of inferred residuals and conditional variances, given the response data and model. Plot the conditional variances. figure plot(V) title("Inferred Conditional Variances")
12-1539
12
Functions
Supply Presample Responses Infer residuals from an AR model by supplying a hypothetical response series in a vector. Supply presample responses to initialize the model. Specify an AR(2) model using known parameters. Mdl = arima(AR={0.5 -0.8},Constant=0.002, ... Variance=0.8) Mdl = arima with properties: Description: SeriesName: Distribution: P: D: Q: Constant: AR: SAR: MA: SMA: Seasonality: Beta: Variance:
"ARIMA(2,0,0) Model (Gaussian Distribution)" "Y" Name = "Gaussian" 2 0 0 0.002 {0.5 -0.8} at lags [1 2] {} {} {} 0 [1×0] 0.8
Consider inferring residuals from a response series of length T = 100. Because the model requires Mdl.P responses to initialize the model, simulate T + Mdl.P = 102 responses from the model. rng(1,"twister"); T = 100; TSim = T + Mdl.P; y = simulate(Mdl,TSim);
Y is a 102-by-1 vector representing a random response path drawn from the model. Infer residuals from the last T response and use the first Mdl.P observations as a presample to initialize the model. E = infer(Mdl,y((Mdl.P+1):end),Y0=y(1:Mdl.P)); size(E) ans = 1×2 100
1
E is a 100-by-1 vector containing a residuals corresponding to the last 100 observations of y, with respect to Mdl. Plot the residuals. figure plot(E) title("Inferred Residuals")
12-1540
infer
Infer Residuals From Model and Response Data in Timetable Fit an ARIMA(1,1,1) model to the weekly average NYSE closing prices. Supply timetables of insample and presample data for the fit. Then, infer the residuals from the fit. Load Data Load the US equity index data set Data_EquityIdx. load Data_EquityIdx T = height(DataTimeTable) T = 3028
The timetable DataTimeTable includes the time series variable NYSE, which contains daily NYSE composite closing prices from January 1990 through December 2001. Plot the daily NYSE price series. figure plot(DataTimeTable.Time,DataTimeTable.NYSE) title("NYSE Daily Closing Prices: 1990 - 2001")
12-1541
12
Functions
Prepare Timetable for Estimation When you plan to supply a timetable, you must ensure it has all the following characteristics: • The selected response variable is numeric and does not contain any missing values. • The timestamps in the Time variable are regular, and they are ascending or descending. Remove all missing values from the timetable, relative to the NYSE price series. DTT = rmmissing(DataTimeTable,DataVariables="NYSE"); T_DTT = height(DTT) T_DTT = 3028
Because all sample times have observed NYSE prices, rmmissing does not remove any observations. Determine whether the sampling timestamps have a regular frequency and are sorted. areTimestampsRegular = isregular(DTT,"days") areTimestampsRegular = logical 0 areTimestampsSorted = issorted(DTT.Time)
12-1542
infer
areTimestampsSorted = logical 1
areTimestampsRegular = 0 indicates that the timestamps of DTT are irregular. areTimestampsSorted = 1 indicates that the timestamps are sorted. Business day rules make daily macroeconomic measurements irregular. Remedy the time irregularity by computing the weekly average closing price series of all timetable variables. DTTW = convert2weekly(DTT,Aggregation="mean"); areTimestampsRegular = isregular(DTTW,"weeks") areTimestampsRegular = logical 1 T_DTTW = height(DTTW) T_DTTW = 627
DTTW is regular. figure plot(DTTW.Time,DTTW.NYSE) title("NYSE Daily Closing Prices: 1990 - 2001")
12-1543
12
Functions
Create Model Template for Estimation Suppose that an ARIMA(1,1,1) model is appropriate to model NYSE composite series during the sample period. Create an ARIMA(1,1,1) model template for estimation. Mdl = arima(1,1,1);
Mdl is a partially specified arima model object. Fit Model to Data infer requires Mdl.P presample observations to initialize the model. infer backcasts for necessary presample responses, but you can provide a presample. Partition the data into presample and in-sample, or estimation sample, observations. T0 = Mdl.P; DTTW0 = DTTW(1:T0,:); DTTW1 = DTTW((T0+1):end,:);
Fit an ARIMA(1,1,1) model to the in-sample weekly average NYSE closing prices. Specify the response variable name, presample timetable, and the presample response variable name. EstMdl = estimate(Mdl,DTTW1,ResponseVariable="NYSE", ... Presample=DTTW0,PresampleResponseVariable="NYSE"); ARIMA(1,1,1) Model (Gaussian Distribution):
Constant AR{1} MA{1} Variance
Value ________
StandardError _____________
0.83623 -0.32862 0.42703 56.065
0.453 0.23526 0.22613 1.8433
TStatistic __________ 1.846 -1.3968 1.8884 30.416
PValue ___________ 0.064891 0.16247 0.058966 3.3795e-203
EstMdl is a fully specified, estimated arima model object. Infer Residuals Infer the residuals from the fitted model and in-sample observations. Specify the response variable name, presample timetable, and the presample response variable name. Tbl2 = infer(EstMdl,DTTW1,ResponseVariable="NYSE", ... Presample=DTTW0,PresampleResponseVariable="NYSE"); tail(Tbl2)
12-1544
Time ___________
NYSE ______
NASDAQ ______
16-Nov-2001 23-Nov-2001 30-Nov-2001 07-Dec-2001 14-Dec-2001
577.11 583 581.41 584.96 574.03
1886.9 1898.3 1925.8 1998.1 1981
Y_Residual __________ 5.8649 5.3303 -2.7678 3.3787 -12.038
Y_Variance __________ 56.065 56.065 56.065 56.065 56.065
infer
21-Dec-2001 28-Dec-2001 04-Jan-2002
582.1 590.28 589.8
1967.9 1967.2 1950.4
8.7774 6.2526 -1.3009
56.065 56.065 56.065
size(Tbl2) ans = 1×2 625
4
Tbl2 is a 625-by-4 timetable containing all variables in DTTW1, and the inferred residuals from the fit NYSE_Response and constant variance paths NYSE_Variance (Mdl.Variance = 56.065).
Compute Fitted Responses Fit an ARIMA(1,1,1) model to the weekly average NYSE closing prices. Supply a timetable of data and specify the series for the fit. Then, compute fitted responses. Load the US equity index data set Data_EquityIdx. load Data_EquityIdx T = height(DataTimeTable) T = 3028
Remedy the time irregularity by computing the weekly average closing price series of all timetable variables. DTTW = convert2weekly(DataTimeTable,Aggregation="mean"); T_DTTW = height(DTTW) T_DTTW = 627
Create an ARIMA(1,1,1) model template for estimation. Set the response series name to NYSE. Mdl = arima(1,1,1); Mdl.SeriesName = "NYSE";
Partition the data into presample and in-sample, or estimation sample, observations. T0 = Mdl.P; DTTW0 = DTTW(1:T0,:); DTTW1 = DTTW((T0+1):end,:);
Fit an ARIMA(1,1,1) model to the in-sample weekly average NYSE closing prices. Specify the presample timetable, and the presample response variable name. EstMdl = estimate(Mdl,DTTW1,Presample=DTTW0, ... PresampleResponseVariable="NYSE"); ARIMA(1,1,1) Model (Gaussian Distribution): Value ________
StandardError _____________
TStatistic __________
PValue ___________
12-1545
12
Functions
Constant AR{1} MA{1} Variance
0.83623 -0.32862 0.42703 56.065
0.453 0.23526 0.22613 1.8433
1.846 -1.3968 1.8884 30.416
0.064891 0.16247 0.058966 3.3795e-203
Infer the residuals from the fitted model and in-sample observations. Specify the presample timetable, and the presample response variable name. Tbl2 = infer(EstMdl,DTTW1,Presample=DTTW0, ... PresampleResponseVariable="NYSE");
Compute fitted response values by subtracting the residuals from the observed response series. Tbl2.YHat = Tbl2.NYSE - Tbl2.NYSE_Residual;
Plot the observed responses and the fitted values. figure plot(Tbl2.Time,[Tbl2.NYSE Tbl2.YHat]) legend("Observations","Fitted values") title("NYSE Weekly Average Price Series")
The fitted values closely track the observations. Plot the residuals versus the fitted values. figure plot(Tbl2.YHat,Tbl2.NYSE_Residual,".",MarkerSize=15)
12-1546
infer
ylabel("Residuals") xlabel("Fitted Values") title("Residual Plot")
Residual variance appears larger for larger fitted values. One remedy for this behavior is to apply the log transform to the data.
Infer Residuals from ARMAX Model Infer residuals from an ARMAX model. Specify an ARMA(1,2) model using known parameters for the response (MdlY) and an AR(1) model for the predictor data (MdlX). MdlY = arima(AR=0.2,MA={-0.1,0.6},Constant=1, ... Variance=2,Beta=3) MdlY = arima with properties: Description: SeriesName: Distribution: P: D:
"ARIMAX(1,0,2) Model (Gaussian Distribution)" "Y" Name = "Gaussian" 1 0
12-1547
12
Functions
Q: Constant: AR: SAR: MA: SMA: Seasonality: Beta: Variance:
2 1 {0.2} at lag [1] {} {-0.1 0.6} at lags [1 2] {} 0 [3] 2
MdlX = arima(AR=0.3,Constant=0,Variance=1);
If you do not specify presample responses, infer requires at least T + MdlY.P predictor observations to simulate a response series of length T. Consider simulating a response series of length 100. Simulate a predictor series of length 101, and then simulate the response series. Provide the predictor data to simulate for the exogenous regression component. rng(1,"twister") % For reproducibility T = 100; Pred = simulate(MdlX,T + MdlY.P); Y = simulate(MdlY,T,X=Pred);
Infer residuals using the entire series. E = infer(MdlY,Y,X=Pred); figure plot(E) title("Inferred Residuals")
12-1548
infer
Input Arguments Mdl — Fully specified ARIMA model arima model object Fully specified ARIMA model, specified as an arima model object created by arima or estimate. The properties of Mdl cannot contain NaN values. Y — Response data yt numeric column vector | numeric matrix Response data yt, specified as a numobs-by-1 numeric column vector or numobs-by-numpaths numeric matrix. numObs is the length of the time series (sample size). numpaths is the number of separate, independent paths of response series. infer infers the residuals and conditional variances of columns of Y, which are time series characterized by Mdl. Y is the continuation of the presample series Y0. Each row corresponds to a sampling time. The last row contains the latest set of observations. Each column corresponds to a separate, independent path of response data. infer assumes that responses across any row occur simultaneously. Data Types: double 12-1549
12
Functions
Tbl1 — Time series data table | timetable Time series data containing the observed response variable yt and, optionally, predictor variables xt for the exogenous regression component, specified as a table or timetable with numvars variables and numobs rows. You can optionally select the response variable or numpreds predictor variables by using the ResponseVariable or PredictorVariables name-value arguments, respectively. Each row is an observation, and measurements in each row occur simultaneously. The selected response variable is a single path (numobs-by-1 vector) or multiple paths (numobs-by-numpaths matrix) of numobs observations of response data. Each path (column) of the selected response variable is independent of the other paths, but path j of all presample and in-sample variables correspond, for j = 1,…,numpaths. Each selected predictor variable is a numobs-by-1 numeric vector representing one path. The infer function includes all predictor variables in the model when it infers residuals and conditional variances. Variables in Tbl1 represent the continuation of corresponding variables in Presample. If Tbl1 is a timetable, it must represent a sample with a regular datetime time step (see isregular), and the datetime vector Tbl1.Time must be strictly ascending or descending. If Tbl1 is a table, the last row contains the latest observation. Name-Value Arguments Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter. Before R2021a, use commas to separate each name and value, and enclose Name in quotes. Example: infer(Mdl,Y,Y0=PS,X=Pred) infers residuals from the numeric vector of responses Y through the ARIMAX Mdl, and specifies the numeric vector of presample response data PS to initialize the model and the exogenous predictor data Pred for the regression component. ResponseVariable — Response variable yt to select from Tbl1 string scalar | character vector | integer | logical vector Response variable yt to select from Tbl1 containing the response data, specified as one of the following data types: • String scalar or character vector containing a variable name in Tbl1.Properties.VariableNames • Variable index (positive integer) to select from Tbl1.Properties.VariableNames • A logical vector, where DisturbanceVariable(j) = true selects variable j from Tbl1.Properties.VariableNames The selected variable must be a numeric vector and cannot contain missing values (NaNs). If Tbl1 has one variable, the default specifies that variable. Otherwise, the default matches the variable to names in Mdl.SeriesName. Example: ResponseVariable="StockRate" Example: ResponseVariable=[false false true false] or ResponseVariable=3 selects the third table variable as the response variable. 12-1550
infer
Data Types: double | logical | char | cell | string Y0 — Presample response data yt numeric column vector | numeric matrix Presample response data yt to initialize the model, specified as a numpreobs-by-1 numeric column vector or a numpreobs-by-numprepaths numeric matrix. Use Y0 only when you supply the numeric array of response data Y. numpreobs is the number of presample observations. numprepaths is the number of presample response paths. Each row is a presample observation (sampling time), and measurements in each row occur simultaneously. The last row contains the latest presample observation. numpreobs must be at least Mdl.P to initialize the AR model component. If numpreobs > Mdl.P, infer uses the latest required number of observations only. Columns of Y0 are separate, independent presample paths. The following conditions apply: • If Y0 is a column vector, it represents a single response path. infer applies it to each output path. • If Y0 is a matrix, each column represents a presample response path. infer applies Y0(:,j) to initialize path j. numprepaths must be at least numpaths. If numprepaths > numpaths, infer uses the first size(Y,2) columns only. By default, infer backcasts to obtain the necessary observations. Data Types: double E0 — Presample residual data et numeric column vector | numeric matrix Presample residual data et to initialize the model, specified as a numpreobs-by-1 numeric column vector or a numpreobs-by-numprepaths numeric matrix. Use E0 only when you supply the numeric array of response data Y. Each row is a presample observation (sampling time), and measurements in each row occur simultaneously. The last row contains the latest presample observation. numpreobs must be at least Mdl.Q to initialize the MA model component. If Mdl.Variance is a conditional variance model (for example, a garch model object), infer can require more rows than Mdl.Q. If numpreobs is larger than required, infer uses the latest required number of observations only. Columns of E0 are separate, independent presample paths. The following conditions apply: • If E0 is a column vector, it represents a single residual path. infer applies it to each output path. • If E0 is a matrix, each column represents a presample residual path. infer applies E0(:,j) to initialize path j. numprepaths must be at least numpaths. If numprepaths > numpaths, infer uses the first size(Y,2) columns only. • infer assumes each column of E0 has a mean of zero. By default, infer sets the necessary presample disturbances to zero. Data Types: double V0 — Presample conditional variances σt2 positive numeric column vector | positive numeric matrix 12-1551
12
Functions
Presample conditional variances σt2 to initialize the conditional variance model, specified as a numpreobs-by-1 positive numeric column vector or a numpreobs-by-numprepaths positive numeric matrix. If the conditional variance Mdl.Variance is constant, infer ignores V0. Use V0 only when you supply the numeric array of response data Y. Each row is a presample observation (sampling time), and measurements in each row occur simultaneously. The last row contains the latest presample observation. numpreobs must be at least Mdl.Q to initialize the conditional variance model in Mdl.Variance. For details, see the infer function of conditional variance models. If numpreobs is larger than required, infer uses the latest required number of observations only. Columns of V0 are separate, independent presample paths. The following conditions apply: • If V0 is a column vector, it represents a single path of conditional variances. infer applies it to each output path. • If V0 is a matrix, each column represents a presample path of conditional variances. infer applies V0(:,j) to initialize path j. numprepaths must be at least numpaths. If numprepaths > numpaths, infer uses the first size(Y,2) columns only. By default, infer sets all necessary presample conditional variances to the average squared value of inferred residuals. Data Types: double Presample — Presample data table | timetable Presample data containing paths of response yt, residual et, or conditional variance σt2 series to initialize the model, specified as a table or timetable, the same type as Tbl1, with numprevars variables and numpreobs rows. Use Presample only when you supply a table or timetable of data Tbl1. Each selected variable is a single path (numpreobs-by-1 vector) or multiple paths (numpreobs-bynumprepaths matrix) of numpreobs observations representing the presample of the response, residual, or conditional variance series for ResponseVariable, the selected response variable in Tbl1. Each row is a presample observation, and measurements in each row occur simultaneously. numpreobs must be one of the following values: • At least Mdl.P when Presample provides only presample responses • At least Mdl.Q when Presample provides only presample disturbances or conditional variances • At least max([Mdl.P Mdl.Q]) otherwise When Mdl.Variance is a conditional variance model, infer can require more than the minimum required number of presample values. If you supply more rows than necessary, infer uses the latest required number of observations only. When Presample provides presample residuals, infer assumes each presample residual path has a mean of zero. If Presample is a timetable, all the following conditions must be true: 12-1552
infer
• Presample must represent a sample with a regular datetime time step (see isregular). • The inputs Tbl1 and Presample must be consistent in time such that Presample immediately precedes Tbl1 with respect to the sampling frequency and order. • The datetime vector of sample timestamps Presample.Time must be ascending or descending. If Presample is a table, the last row contains the latest presample observation. By default: • When Mdl is a model without a exogenous linear regression component (ARIMAX), infer backcasts for necessary presample responses, sets necessary presample residuals to 0, and sets necessary presample variances to the average squared value of inferred residuals. • When Mdl is an ARIMAX model (you specify the PredictorVariables name-value argument), you must specify presample response data, but infer sets necessary presample residuals to 0 and sets necessary presample variances to the average squared value of inferred residuals. If you specify the Presample, you must specify the presample response, residual, or conditional variance name by using the PresampleResponseVariable, PresampleInnovationVariable, or PresampleVarianceVariable name-value argument. PresampleResponseVariable — Response variable yt to select from Presample string scalar | character vector | integer | logical vector Response variable yt to select from Presample containing presample response data, specified as one of the following data types: • String scalar or character vector containing a variable name in Presample.Properties.VariableNames • Variable index (positive integer) to select from Presample.Properties.VariableNames • A logical vector, where PresampleResponseVariable(j) = true selects variable j from Presample.Properties.VariableNames The selected variable must be a numeric matrix and cannot contain missing values (NaNs). If you specify presample response data by using the Presample name-value argument, you must specify PresampleResponseVariable. Example: PresampleResponseVariable="Stock0" Example: PresampleResponseVariable=[false false true false] or PresampleResponseVariable=3 selects the third table variable as the presample response variable. Data Types: double | logical | char | cell | string PresampleInnovationVariable — Presample residual variable et to select from Presample string scalar | character vector | integer | logical vector Presample residual variable et to select from Presample containing presample residual data, specified as one of the following data types: • String scalar or character vector containing a variable name in Presample.Properties.VariableNames 12-1553
12
Functions
• Variable index (positive integer) to select from Presample.Properties.VariableNames • A logical vector, where PresampleInnovationVariable(j) = true selects variable j from Presample.Properties.VariableNames The selected variable must be a numeric matrix and cannot contain missing values (NaNs). If you specify presample residual data by using the Presample name-value argument, you must specify PresampleInnovationVariable. Example: PresampleInnovationVariable="StockRateDist0" Example: PresampleInnovationVariable=[false false true false] or PresampleInnovationVariable=3 selects the third table variable as the presample innovation variable. Data Types: double | logical | char | cell | string PresampleVarianceVariable — Conditional variance variable σt2 to select from Presample string scalar | character vector | integer | logical vector Conditional variance variable σt2 to select from Presample containing presample conditional variance data, specified as one of the following data types: • String scalar or character vector containing a variable name in Presample.Properties.VariableNames • Variable index (positive integer) to select from Presample.Properties.VariableNames • A logical vector, where PresampleVarianceVariable(j) = true selects variable j from Presample.Properties.VariableNames The selected variable must be a numeric vector and cannot contain missing values (NaNs). If you specify presample conditional variance data by using the Presample name-value argument, you must specify PresampleVarianceVariable. Example: PresampleVarianceVariable="StockRateVar0" Example: PresampleVarianceVariable=[false false true false] or PresampleVarianceVariable=3 selects the third table variable as the presample conditional variance variable. Data Types: double | logical | char | cell | string X — Exogenous predictor data numeric matrix Exogenous predictor data for the model regression component, specified as a numeric matrix with numpreds columns. numpreds is the number of predictor variables (numel(Mdl.Beta)). Use X only when you supply the numeric array of response data Y. If you do not specify Y0, the number of rows of X must be at least numObs + Mdl.P. Otherwise, the number of rows of X must be at least numObs. If the number of rows of X exceeds the number necessary, infer uses only the latest observations. infer does not use the regression component in the presample period. Columns of X are separate predictor variables. infer applies X to each path; that is, X represents one path of observed predictors. 12-1554
infer
By default, infer excludes the regression component, regardless of its presence in Mdl. Data Types: double PredictorVariables — Exogenous predictor variables xt to select from Tbl1 string vector | cell vector of character vectors | vector of integers | logical vector Exogenous predictor variables xt to select from Tbl1 containing the predictor data for the model regression component, specified as one of the following data types: • String vector or cell vector of character vectors containing numpreds variable names in Tbl1.Properties.VariableNames • A vector of unique indices (positive integers) of variables to select from Tbl1.Properties.VariableNames • A logical vector, where PredictorVariables(j) = true selects variable j from Tbl1.Properties.VariableNames The selected variables must be numeric vectors and cannot contain missing values (NaNs). If you specify PredictorVariables, you must also specify presample response data to by using the Presample and PresampleResponseVariable name-value arguments. For more details, see “Algorithms” on page 12-1556. By default, infer excludes the regression component, regardless of its presence in Mdl. Example: PredictorVariables=["M1SL" "TB3MS" "UNRATE"] Example: PredictorVariables=[true false true false] or PredictorVariable=[1 3] selects the first and third table variables to supply the predictor data. Data Types: double | logical | char | cell | string Note • NaN values in Y, X, Y0, E0, and V0 indicate missing values. infer removes missing values from specified data by list-wise deletion. • For the presample, infer horizontally concatenates the possibly jagged arrays Y0, E0, and V0 with respect to the last rows, and then it removes any row of the concatenated matrix containing at least one NaN. • For in-sample data, infer horizontally concatenates the possibly jagged arrays Y and X, and then it removes any row of the concatenated matrix containing at least one NaN. This type of data reduction reduces the effective sample size and can create an irregular time series. • For numeric data inputs, infer assumes that you synchronize the presample data such that the latest observations occur simultaneously. • infer issues an error when any table or timetable input contains missing values.
Output Arguments E — Inferred residual paths et numeric matrix 12-1555
12
Functions
Inferred residual paths et, returned as a numobs-by-numpaths numeric matrix. infer returns E only when you supply the input Y. E(j,k) is the path k residual of time j; it is the residual associated with response Y(j,k). V — Inferred conditional variance paths σt numeric matrix Inferred conditional variance paths σt, returned as a numobs-by-numpaths numeric matrix. infer returns V only when you supply the input Y. V(j,k) is the path k conditional variance of time j; it is the conditional variance associated with response Y(j,k). Tbl2 — Inferred residual et and conditional variance σt2 paths table | timetable Inferred residual et and conditional variance σt2 paths, returned as a table or timetable, the same data type as Tbl1. infer returns Tbl2 only when you supply the input Tbl1. Tbl2 contains the following variables: • The inferred residual paths, which are in a numobs-by-numpaths numeric matrix, with rows representing observations and columns representing independent paths. Each path corresponds to the input response path in Tbl1 and represents the continuation of the corresponding presample residual path in Presample. infer names the inferred residual variable in Tbl2 responseName_Residual, where responseName is Mdl.SeriesName. For example, if Mdl.SeriesName is StockReturns, Tbl2 contains a variable for the corresponding inferred innovations paths with the name StockReturns_Residual. • The inferred conditional variance paths, which are in a numobs-by-numpaths numeric matrix, with rows representing observations and columns representing independent paths. Each path represents the continuation of the corresponding path of presample conditional variances in Presample. infer names the inferred conditional variance variable in Tbl2 responseName_Variance, where responseName is Mdl.SeriesName. For example, if Mdl.SeriesName is StockReturns, Tbl2 contains a variable for the corresponding inferred conditional variance paths with the name StockReturns_Variance. • All variables Tbl1. If Tbl1 is a timetable, row times of Tbl1 and Tbl2 are equal. logL — Loglikelihood objective function values numeric scalar | numeric vector Loglikelihood objective function values associated with the model Mdl, returned as a numeric scalar or vector of length numpaths. If Y is a vector, then logL is a scalar. Otherwise, logL is vector of length size(Y,2), and each element is the loglikelihood of the corresponding column (or path) in Y.
Algorithms If you supply data in the table or timetable Tbl1 to estimate an ARIMAX model, infer cannot backcast for presample responses. Therefore, if you specify PredictorVariables, you must also 12-1556
infer
specify presample response data by using the Presample and PresampleResponseVariable name-value arguments.
Version History Introduced in R2012a R2023b: infer accepts input data in tables and timetables, and returns results in tables and timetables In addition to accepting input data (in-sample and presample) in numeric arrays, infer accepts input data in tables or regular timetables. When you supply data in a table or timetable, the following conditions apply: • infer chooses the default in-sample response series on which to operate, but you can use the specified optional name-value argument to select a different series. • If you specify optional presample response, residual, or conditional variance data to initialize the model, you must also specify the appropriate presample variable names. • infer returns results in a table or timetable. Name-value arguments to support tabular workflows include: • ResponseVariable specifies the variable name of the response paths in the input data, from which infer infers conditional variances and innovations. • Presample specifies the input table or timetable of presample innovation and conditional variance data. • PresampleResponseVariable specifies the variable name of the response paths to select from Presample. • PresampleInnovationVariable specifies the variable name of the residual paths to select from Presample. • PresampleVarianceVariable specifies the variable name of the conditional variance paths to select from Presample. • PredictorVariables specifies the names of the predictor series to select from the input data for a model regression component.
References [1] Box, G. E. P., G. M. Jenkins, and G. C. Reinsel. Time Series Analysis: Forecasting and Control 3rd ed. Englewood Cliffs, NJ: Prentice Hall, 1994. [2] Enders, W. Applied Econometric Time Series. Hoboken, NJ: John Wiley & Sons, 1995. [3] Hamilton, J. D. Time Series Analysis. Princeton, NJ: Princeton University Press, 1994.
See Also Objects arima 12-1557
12
Functions
Functions estimate | filter | impulse | simulate | forecast Topics “Infer Residuals for Diagnostic Checking” on page 7-138 “Residual Diagnostics” on page 3-86
12-1558
infer
infer Infer residuals of univariate regression model with ARIMA time series errors
Syntax E = infer(Mdl,Y) [E,U,V] = infer(Mdl,Y) Tbl2 = infer(Mdl,Tbl1) [ ___ ] = infer( ___ ,Name=Value) [ ___ ,logL] = infer( ___ )
Description E = infer(Mdl,Y) returns the numeric array of one or more residual series E inferred from the fully specified, univariate regression model with ARIMA time series errors Mdl and the numeric array of one or more response series Y. [E,U,V] = infer(Mdl,Y) also returns the numeric array of one or more unconditional disturbance U and innovation variance V series. Tbl2 = infer(Mdl,Tbl1) returns the table or timetable Tbl2 containing paths of residuals, unconditional disturbances, innovation variances inferred from the model Mdl and the response data in the input table or timetable Tbl1. infer selects the response variable named in Mdl.SeriesName or the sole variable in Tbl1. To select a different response variable in Tbl1 to infer residuals, unconditional disturbances, and innovation variances, use the ResponseVariable name-value argument. [ ___ ] = infer( ___ ,Name=Value) specifies options using one or more name-value arguments in addition to any of the input argument combinations in previous syntaxes. infer returns the output argument combination for the corresponding input arguments. For example, infer(Mdl,Y,U0=u0,X=Pred) infers residuals from the numeric vector of response data Y with respect to the regression model with ARIMA errors Mdl, and specifies the numeric vector of presample regression model residual data u0 to initialize the model and the predictor data Pred for the regression component. [ ___ ,logL] = infer( ___ ) also returns a numeric vector containing the loglikelihood objective function values logL associated with each specified path of response data.
Examples Infer Vector of Residuals from Regression Model with ARIMA Errors Infer error model residuals from a simulated path of responses from the following regression model with ARMA(2,1) errors: 12-1559
12
Functions
0.1 + ut −0 . 2 ut = 0 . 5ut − 1 − 0 . 8ut − 2 + εt − 0 . 5εt − 1, yt = Xt
where εt is Gaussian with variance 0.1. Assume the predictors are standard Gaussian random variables. Provide data as numeric arrays. Create the regression model with ARIMA errors. Simulate responses from the model and two predictor series. Mdl = regARIMA(Intercept=0,AR={0.5 -0.8},MA=-0.5, ... Beta=[0.1; -0.2],Variance=0.1); rng(1,"twister"); % For reproducibility Pred = randn(100,2); y = simulate(Mdl,100,X=Pred);
Infer and plot the error model residuals. By default, infer backcasts for the necessary presample unconditional disturbances and sets necessary presample error model residuals to zero. e = infer(Mdl,y,X=Pred); figure plot(e) title("Inferred Residuals")
e is a 100-by-1 vector of error model residuals, associated with error model innovations εt. 12-1560
infer
Examine Residuals of Estimated Model in Timetable Fit a regression model with ARMA(1,1) errors by regressing the US gross domestic product (GDP) growth rate onto consumer price index (CPI) quarterly changes. Examine the error model and regression residuals. Supply a timetable of data and specify the series for the fit. Load and Transform Data Load the US macroeconomic data set. Compute the series of GDP quarterly growth rates and CPI quarterly changes. load Data_USEconModel DTT = price2ret(DataTimeTable,DataVariables="GDP"); DTT.GDPRate = 100*DTT.GDP; DTT.CPIDel = diff(DataTimeTable.CPIAUCSL); T = height(DTT) T = 248 figure tiledlayout(2,1) nexttile plot(DTT.Time,DTT.GDPRate) title("GDP Rate") ylabel("Percent Growth") nexttile plot(DTT.Time,DTT.CPIDel) title("Index")
12-1561
12
Functions
The series appear stationary, albeit heteroscedastic. Prepare Timetable for Estimation When you plan to supply a timetable, you must ensure it has all the following characteristics: • The selected response variable is numeric and does not contain any missing values. • The timestamps in the Time variable are regular, and they are ascending or descending. Remove all missing values from the timetable. DTT = rmmissing(DTT); T_DTT = height(DTT) T_DTT = 248
Because each sample time has an observation for all variables, rmmissing does not remove any observations. Determine whether the sampling timestamps have a regular frequency and are sorted. areTimestampsRegular = isregular(DTT,"quarters") areTimestampsRegular = logical 0 areTimestampsSorted = issorted(DTT.Time)
12-1562
infer
areTimestampsSorted = logical 1
areTimestampsRegular = 0 indicates that the timestamps of DTT are irregular. areTimestampsSorted = 1 indicates that the timestamps are sorted. Macroeconomic series in this example are timestamped at the end of the month. This quality induces an irregularly measured series. Remedy the time irregularity by shifting all dates to the first day of the quarter. dt = DTT.Time; dt = dateshift(dt,"start","quarter"); DTT.Time = dt; areTimestampsRegular = isregular(DTT,"quarters") areTimestampsRegular = logical 1
DTT is regular. Create Model Template for Estimation Suppose that a regression model of CPI quarterly changes onto the GDP rate, with ARMA(1,1) errors, is appropriate. Create a model template for a regression model with ARMA(1,1) errors template. Specify the response variable name. Mdl = regARIMA(1,0,1); Mdl.SeriesName = "GDPRate";
Mdl is a partially specified regARIMA object. Fit Model to Data Fit a regression model with ARMA(1,1) errors to the data. Specify the entire series GDP rate and CPI quarterly changes series, and specify the predictor variable name. EstMdl = estimate(Mdl,DTT,PredictorVariables="CPIDel"); Regression with ARMA(1,1) Error Model (Gaussian Distribution): Value ________ Intercept AR{1} MA{1} Beta(1) Variance
0.0162 0.60515 -0.16221 0.002221 0.000113
StandardError _____________ 0.0016077 0.089912 0.11051 0.00077691 7.2753e-06
TStatistic __________ 10.077 6.7305 -1.4678 2.8587 15.533
PValue __________ 6.9995e-24 1.6906e-11 0.14216 0.0042532 2.0838e-54
EstMdl is a fully specified, estimated regARIMA object. By default, estimate backcasts for the required Mdl.P = 1 presample regression model residual and sets the required Mdl.Q = 1 presample error model residual to 0. 12-1563
12
Functions
Examine Residuals Infer a timetable of error model and regression residuals for all observations. Specify the predictor variable name. Tbl2 = infer(EstMdl,DTT,PredictorVariables="CPIDel") Tbl2=248×6 timetable Time Interval _____ ________ Q2-47 Q3-47 Q4-47 Q1-48 Q2-48 Q3-48 Q4-48 Q1-49 Q2-49 Q3-49 Q4-49 Q1-50 Q2-50 Q3-50 Q4-50 Q1-51 ⋮
91 92 92 91 91 92 92 90 91 92 91 91 91 91 91 91
GDP ___________
GDPRate _________
CPIDel ______
0.00015183 0.00018374 0.000427 0.00025617 0.00028739 0.00026512 5.1468e-05 -0.00021196 -0.00015576 6.1077e-05 -0.00010311 0.00040675 0.00036908 0.00065211 0.00040718 0.00053382
0.015183 0.018374 0.0427 0.025617 0.028739 0.026512 0.0051468 -0.021196 -0.015576 0.0061077 -0.010311 0.040675 0.036908 0.065211 0.040718 0.053382
0.08 0.76 0.57 0.09 0.65 0.21 -0.31 -0.14 0.01 -0.17 -0.14 0.03 0.24 0.46 0.64 0.9
GDPRate_ErrorResidual _____________________ -0.0007572 0.0010863 0.025116 -0.0019795 0.005197 0.0039745 -0.015678 -0.033356 -0.014767 0.0071327 -0.019164 0.037154 0.011432 0.037635 0.00016008 0.021232
Tbl2 is a 248-by-6 timetable containing the error model residuals GDPRate_ErrorResidual, regression residuals GDPRate_RegressionResidual, and all variables in DTT. Separately plot the inferred error model and regression residuals. Tbl2.GDPRate_Fitted = Tbl2.GDPRate - Tbl2.GDPRate_RegressionResidual; figure h = tiledlayout(2,2); title(h,"Error Model Residuals") nexttile plot(Tbl2.Time,Tbl2.GDPRate_ErrorResidual,'b',Tbl2.Time([1 end]),[0 0],'--r') title("Case Order") nexttile histogram(Tbl2.GDPRate_ErrorResidual) title("Histogram") nexttile plot(Tbl2.GDPRate_ErrorResidual(1:end-1),Tbl2.GDPRate_ErrorResidual(2:end),'o') title("e_{t-1} versus e_t") nexttile plot(Tbl2.GDPRate_Fitted,Tbl2.GDPRate_ErrorResidual,'o') title("Fitted versus e_t")
12-1564
GDPRate_R _________
0
-
infer
figure h = tiledlayout(2,2); title(h,"Regression Residuals") nexttile plot(Tbl2.Time,Tbl2.GDPRate_RegressionResidual,'b',Tbl2.Time([1 end]),[0 0],'--r') title("Case Order") nexttile histogram(Tbl2.GDPRate_RegressionResidual) title("Histogram") nexttile plot(Tbl2.GDPRate_RegressionResidual(1:end-1),Tbl2.GDPRate_RegressionResidual(2:end),'o') title("e_{t-1} versus e_t") nexttile plot(Tbl2.GDPRate_Fitted,Tbl2.GDPRate_RegressionResidual,'o') title("Fitted versus e_t")
12-1565
12
Functions
Compare Model Fits By Using Likelihood Ratio Test Fit this regression model with ARMA(2,1) errors to simulated data: 0.1 + ut −0 . 2 ut = 0 . 5ut − 1 − 0 . 8ut − 2 + εt − 0 . 5εt − 1, yt = 1 + Xt
where εt is Gaussian with variance 0.1. Compare the fit to an intercept-only regression model by conducting a likelihood ratio test. Provide response and predictor data in vectors. Simulate Data Specify the regression model ARMA(2,1) errors. Simulate responses from the model, and simulate two predictor series from the standard Gaussian distribution. Mdl0 = regARIMA(Intercept=1,AR={0.5 -0.8},MA=-0.5, ... Beta=[0.1; -0.2],Variance=0.1); rng(1,"twister") % For reproducibility Pred = randn(100,2); y = simulate(Mdl0,100,X=Pred);
y is a 100-by-1 random response path simulated from Mdl. 12-1566
infer
Fit Unrestricted Model Create an unrestricted model template of a regression model with ARMA(2,1) errors for estimation. Mdl = regARIMA(2,0,1) Mdl = regARIMA with properties: Description: SeriesName: Distribution: Intercept: Beta: P: Q: AR: SAR: MA: SMA: Variance:
"ARMA(2,1) Error Model (Gaussian Distribution)" "Y" Name = "Gaussian" NaN [1×0] 2 1 {NaN NaN} at lags [1 2] {} {NaN} at lag [1] {} NaN
The AR coefficients, MA coefficients, and the innovation variance are NaN values. estimate estimates those parameters. When Beta is an empty array, estimate determines the number of regression coefficients to estimate. Fit the unrestricted model to the data. Specify the predictor data. EstMdlUR = estimate(Mdl,y,X=Pred); Regression with ARMA(2,1) Error Model (Gaussian Distribution): Value ________ Intercept AR{1} AR{2} MA{1} Beta(1) Beta(2) Variance
StandardError _____________
1.0167 0.64995 -0.69174 -0.64508 0.10866 -0.20979 0.073117
0.010154 0.093794 0.082575 0.11055 0.020965 0.022824 0.008716
TStatistic __________ 100.13 6.9295 -8.3771 -5.835 5.183 -9.1917 8.3888
PValue __________ 0 4.2226e-12 5.4247e-17 5.3796e-09 2.1835e-07 3.8679e-20 4.9121e-17
EstMdlUR is a fully specified regARIMA object representing the estimated unrestricted regression model with ARIMA errors. Fit Restricted Model The restricted model contains the same error model, but the regression model contains only an intercept. That is, the restricted model imposes two restrictions on the unrestricted model: β1 = β2 = 0. Fit the restricted model to the data. EstMdlR = estimate(Mdl,y); ARMA(2,1) Error Model (Gaussian Distribution):
12-1567
12
Functions
Value ________ Intercept AR{1} AR{2} MA{1} Variance
1.0176 0.51541 -0.53359 -0.34923 0.1445
StandardError _____________
TStatistic __________
0.024905 0.18536 0.10949 0.19423 0.020214
40.859 2.7805 -4.8735 -1.798 7.1486
PValue __________ 0 0.0054271 1.0963e-06 0.07218 8.7671e-13
EstMdlR is a fully specified regARIMA object representing the estimated restricted regression model with ARIMA errors. Compute Residuals and Loglikelihoods Compute the residual series and loglikelihoods for the estimated models. [eUR,uUR,~,logLUR] = infer(EstMdlUR,y,X=Pred); [eR,uR,~,logLR] = infer(EstMdlR,y);
eUR and uUR are 100-by-1 vectors containing the error model and regression residuals from the unrestricted estimation. loglUR is the corresponding loglikelihood. eR and uR are 100-by-1 vectors containing the error model and regression residuals from the restricted estimation. loglR is the corresponding loglikelihood. Conduct Likelihood Ratio Test The likelihood ratio test requires the optimized loglikelihoods of the unrestricted and restricted models, and it requires the number of model restrictions (degrees of freedom). Conduct a likelihood ratio test to determine which model has the better fit to the data. dof = 2; [h,p] = lratiotest(logLUR,logLR,dof) h = logical 1 p = 1.6653e-15
The p-value is close to zero, which suggests that there is strong evidence to reject the null hypothesis that the data fits the restricted model better than the unrestricted model.
Input Arguments Mdl — Fully specified regression model with ARIMA errors regARIMA model object Fully specified regression model with ARIMA errors, specified as a regARIMA model object created by regARIMA or estimate. The properties of Mdl cannot contain NaN values. Y — Response data yt numeric column vector | numeric matrix 12-1568
infer
Response data yt, specified as a numobs-by-1 numeric column vector or numobs-by-numpaths numeric matrix. numObs is the length of the time series (sample size). numpaths is the number of separate, independent paths of response series. infer infers the residuals, unconditional disturbances, and innovation variances of columns of Y, which are time series characterized by Mdl. Each row corresponds to a sampling time. The last row contains the latest set of observations. Each column corresponds to a separate, independent path of response data. infer assumes that responses across any row occur simultaneously. Data Types: double Tbl1 — Time series data table | timetable Time series data containing the observed response variable yt and, optionally, predictor variables xt for the regression component, specified as a table or timetable with numvars variables and numobs rows. You can optionally select the response variable or numpreds predictor variables by using the ResponseVariable or PredictorVariables name-value arguments, respectively. Each row is an observation, and measurements in each row occur simultaneously. The selected response variable is a single path (numobs-by-1 vector) or multiple paths (numobs-by-numpaths matrix) of numobs observations of response data. Each path (column) of the selected response variable is independent of the other paths, but path j of all presample and in-sample variables correspond, for j = 1,…,numpaths. Each selected predictor variable is a numobs-by-1 numeric vector representing one path. The infer function includes all predictor variables in the model when it infers residuals. Variables in Tbl1 represent the continuation of corresponding variables in Presample. If Tbl1 is a timetable, it must represent a sample with a regular datetime time step (see isregular), and the datetime vector Tbl1.Time must be strictly ascending or descending. If Tbl1 is a table, the last row contains the latest observation. Name-Value Arguments Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter. Before R2021a, use commas to separate each name and value, and enclose Name in quotes. Example: infer(Mdl,Y,U0=u0,X=Pred) infers residuals from the numeric vector of response data Y with respect to the regression model with ARIMA errors Mdl, and specifies the numeric vector of presample regression model residual data u0 to initialize the model and the predictor data Pred for the regression component. ResponseVariable — Response variable yt to select from Tbl1 string scalar | character vector | integer | logical vector Response variable yt to select from Tbl1 containing the response data, specified as one of the following data types: 12-1569
12
Functions
• String scalar or character vector containing a variable name in Tbl1.Properties.VariableNames • Variable index (positive integer) to select from Tbl1.Properties.VariableNames • A logical vector, where DisturbanceVariable(j) = true selects variable j from Tbl1.Properties.VariableNames The selected variable must be a numeric vector and cannot contain missing values (NaNs). If Tbl1 has one variable, the default specifies that variable. Otherwise, the default matches the variable to names in Mdl.SeriesName. Example: ResponseVariable="StockRate" Example: ResponseVariable=[false false true false] or ResponseVariable=3 selects the third table variable as the response variable. Data Types: double | logical | char | cell | string X — Predictor data numeric matrix Predictor data for the model regression component, specified as a numeric matrix with numpreds columns. numpreds is the number of predictor variables (numel(Mdl.Beta)). Use X only when you supply the numeric array of response data Y. X must have at least numobs rows. If the number of rows of X exceeds numobs, infer uses only the latest observations. infer does not use the regression component in the presample period. Columns of X are separate predictor variables. infer applies X to each path; that is, X represents one path of observed predictors. By default, infer excludes the regression component, regardless of its presence in Mdl. Data Types: double PredictorVariables — Predictor variables xt to select from Tbl1 string vector | cell vector of character vectors | vector of integers | logical vector Predictor variables xt to select from Tbl1 containing the predictor data for the model regression component, specified as one of the following data types: • String vector or cell vector of character vectors containing numpreds variable names in Tbl1.Properties.VariableNames • A vector of unique indices (positive integers) of variables to select from Tbl1.Properties.VariableNames • A logical vector, where PredictorVariables(j) = true selects variable j from Tbl1.Properties.VariableNames The selected variables must be numeric vectors and cannot contain missing values (NaNs). By default, infer excludes the regression component, regardless of its presence in Mdl. Example: PredictorVariables=["M1SL" "TB3MS" "UNRATE"] Example: PredictorVariables=[true false true false] or PredictorVariable=[1 3] selects the first and third table variables to supply the predictor data. 12-1570
infer
Data Types: double | logical | char | cell | string E0 — Presample error model residual data et numeric column vector | numeric matrix Presample error model residual data et to initialize the error model, specified as a numpreobs-by-1 numeric column vector or a numpreobs-by-numprepaths numeric matrix. Use E0 only when you supply the numeric array of response data Y. Each row is a presample observation (sampling time), and measurements in each row occur simultaneously. The last row contains the latest presample observation. numpreobs must be at least Mdl.Q to initialize the moving average (MA) component of the error model. If numpreobs is larger than required, infer uses the latest required number of observations only. Columns of E0 are separate, independent presample paths. The following conditions apply: • If E0 is a column vector, it represents a single residual path. infer applies it to each output path. • If E0 is a matrix, each column represents a presample residual path. infer applies E0(:,j) to initialize path j. numprepaths must be at least numpaths. If numprepaths > numpaths, infer uses the first size(Y,2) columns only. • infer assumes each column of E0 has a mean of zero. By default, infer sets the necessary presample disturbances to zero. Data Types: double U0 — Presample regression residual data numeric column vector | numeric matrix Presample regression residual data, associated with the unconditional disturbances ut, to initialize the error model, specified as a numpreobs-by-1 numeric column vector or a numpreobs-bynumprepaths numeric matrix. Use U0 only when you supply the numeric array of response data Y. Each row is a presample observation (sampling time), and measurements in each row occur simultaneously. The last row contains the latest presample observation. numpreobs must be at least Mdl.P to initialize the error model autoregressive (AR) component. If numpreobs is larger than required, infer uses the latest required observations only. Columns of U0 are separate, independent presample paths. The following conditions apply: • If U0 is a column vector, it represents a single path. infer applies it to each path. • If U0 is a matrix, each column represents a presample path. infer applies U0(:,j) to initialize path j. numprepaths must be at least numpaths. If numprepaths > numpaths, infer uses the first size(Z,2) columns only. By default, infer backcasts for necessary presample unconditional disturbances. Data Types: double Presample — Presample data table | timetable Presample data containing paths of error model residual et or regression residual series to initialize the model, specified as a table or timetable, the same type as Tbl1, with numprevars variables and 12-1571
12
Functions
numpreobs rows. Regression residuals are associated with the unconditional disturbances ut. Use Presample only when you supply a table or timetable of data Tbl1. Each selected variable is a single path (numpreobs-by-1 vector) or multiple paths (numpreobs-bynumprepaths matrix) of numpreobs observations representing the presample of the error model or regression residual series for ResponseVariable, the selected response variable in Tbl1. Each row is a presample observation, and measurements in each row occur simultaneously. numpreobs must be one of the following values: • At least Mdl.P when Presample provides only presample regression residuals • At least Mdl.Q when Presample provides only presample error model residuals • At least max([Mdl.P Mdl.Q]) otherwise If you supply more rows than necessary, infer uses the latest required number of observations only. When Presample provides presample residuals, infer assumes each presample error model residual path has a mean of zero. If Presample is a timetable, all the following conditions must be true: • Presample must represent a sample with a regular datetime time step (see isregular). • The inputs Tbl1 and Presample must be consistent in time such that Presample immediately precedes Tbl1 with respect to the sampling frequency and order. • The datetime vector of sample timestamps Presample.Time must be ascending or descending. If Presample is a table, the last row contains the latest presample observation. By default, infer backcasts for necessary presample regression residuals and sets necessary presample error model residuals to zero. If you specify the Presample, you must specify the presample error model or regression residual name by using the PresampleInnovationVariable or PresampleRegressionDisturbanceVariable name-value argument. PresampleInnovationVariable — Error model residual et to select from Presample string scalar | character vector | integer | logical vector Error model residual variable et to select from Presample containing the presample error model residual data, specified as one of the following data types: • String scalar or character vector containing the variable name to select from Presample.Properties.VariableNames • Variable index (positive integer) to select from Presample.Properties.VariableNames • A logical vector, where PresampleInnovationVariable(j) = true selects variable j from Presample.Properties.VariableNames The selected variable must be a numeric vector and cannot contain missing values (NaNs). If you specify presample error model residual data by using the Presample name-value argument, you must specify PresampleInnovationVariable. Example: PresampleInnovationVariable="GDP_Z" 12-1572
infer
Example: PresampleInnovationVariable=[false false true false] or PresampleInnovationVariable=3 selects the third table variable for presample error model residual data. Data Types: double | logical | char | cell | string PresampleRegressionDistrubanceVariable — Regression model residual variable to select from Presample string scalar | character vector | integer | logical vector Regression model residual variable, associated with unconditional disturbances ut, to select from Presample containing data for the presample regression model residuals, specified as one of the following data types: • String scalar or character vector containing a variable name in Presample.Properties.VariableNames • Variable index (positive integer) to select from Presample.Properties.VariableNames • A logical vector, where PresampleRegressionDistrubanceVariable(j) = true selects variable j from Presample.Properties.VariableNames The selected variable must be a numeric vector and cannot contain missing values (NaNs). If you specify presample regression model residual data by using the Presample name-value argument, you must specify PresampleRegressionDistrubanceVariable. Example: PresampleRegressionDistrubanceVariable="StockRateU" Example: PresampleRegressionDistrubanceVariable=[false false true false] or PresampleRegressionDistrubanceVariable=3 selects the third table variable as the presample regression model residual data. Data Types: double | logical | char | cell | string Note • NaN values in Y, X, E0 and U0 indicate missing values. infer removes missing values from specified data by listwise deletion. • For the presample, infer horizontally concatenates the possibly jagged arrays E0 and U0 with respect to the last rows, and then it removes any row of the concatenated matrix containing at least one NaN. • For in-sample data, infer horizontally concatenates the possibly jagged arrays Y and X, and then it removes any row of the concatenated matrix containing at least one NaN. This type of data reduction reduces the effective sample size and can create an irregular time series. • For numeric data inputs, infer assumes that you synchronize the presample data such that the latest observations occur simultaneously. • infer issues an error when any table or timetable input contains missing values. • All predictor variables (columns) in X are associated with each input response series to produce numpaths output series.
12-1573
12
Functions
Output Arguments E — Inferred error model residuals et numeric matrix Inferred error model residuals et, returned as a numobs-by-numpaths numeric matrix. infer returns E only when you supply the input Y. E(j,k) is the path k error model residual of time j; it is the error model residual associated with response Y(j,k). Inferred residuals are et = u t − ϕ1u t − 1 − ... − ϕPu t − P − θ1et − 1 − ... − θQet − Q
u t is row t of the inferred unconditional disturbances U, ϕj is composite autoregressive coefficient j, and θk is composite moving average coefficient k. U — Inferred regression residuals numeric matrix Inferred regression residuals associated with the unconditional disturbances ut, returned as a numobs-by-numpaths numeric matrix. infer returns V only when you supply the input Y. U(j,k) is the path k regression model residual of time j; it is the regression model residual associated with response Y(j,k). Inferred unconditional disturbances are u t = yt − c − xt β . yt is row t of the response data Y, xt is row t of the predictor data X, c is the model intercept Mdl.Intercept, and β is the vector of regression coefficients Mdl.Beta. V — Inferred innovation variances numeric matrix Inferred innovation variances, returned as a numobs-by-numpaths numeric matrix. infer returns V only when you supply the input Y. All elements in V are equal to Mdl.Variance. Tbl2 — Inferred error model residual et and regression residual paths table | timetable Inferred error model residual et and regression residual paths, returned as a table or timetable, the same data type as Tbl1. infer returns Tbl2 only when you supply the input Tbl1. Regression residuals are associated with the unconditional disturbances ut. Tbl2 contains the following variables: • The inferred error model residual paths, which are in a numobs-by-numpaths numeric matrix, with rows representing observations and columns representing independent paths. Each path corresponds to the input response path in Tbl1 and represents the continuation of the corresponding presample error model residual path in Presample. infer names the inferred residual variable in Tbl2 responseName_ErrorResidual, where responseName is Mdl.SeriesName. For example, if Mdl.SeriesName is StockReturns, Tbl2 contains a variable 12-1574
infer
for the corresponding inferred error model residual paths with the name StockReturns_ErrorResidual. • The inferred regression residual paths, which are in a numobs-by-numpaths numeric matrix, with rows representing observations and columns representing independent paths. Each path represents the continuation of the corresponding path of presample regression residuals in Presample. infer names the inferred regression residual variable in Tbl2 responseName_RegressionResidual, where responseName is Mdl.SeriesName. For example, if Mdl.SeriesName is StockReturns, Tbl2 contains a variable for the corresponding inferred regression residual paths with the name StockReturns_RegressionResidual. • All variables Tbl1. If Tbl1 is a timetable, row times of Tbl1 and Tbl2 are equal. Tbl2 does not include a variable containing inferred paths of innovation variances. To create such a variable, enter Tbl2.responseName_Variance = Mdl.Variance*ones(size(Tbl2));. logL — Loglikelihood objective function values numeric scalar | numeric vector Loglikelihood objective function values associated with the model Mdl, returned as a numeric scalar or vector of length numpaths. If Y is a vector, then logL is a scalar. Otherwise, logL is vector of length size(Y,2), and each element is the loglikelihood of the corresponding column (or path) in Y.
Version History Introduced in R2013b R2023b: infer accepts input data in tables and timetables In addition to accepting input data (in-sample and presample data) in numeric arrays, infer accepts input data in tables or regular timetables. When you supply data in a table or timetable, the following conditions apply: • infer chooses the default in-sample response series on which to operate, but you can use the specified optional name-value argument to select a different series. • If you specify optional presample error model residual or regression model residual data to initialize the model, you must also specify the appropriate presample variable names. • infer returns results in a table or timetable. Name-value arguments to support tabular workflows include: • ResponseVariable specifies the name of the response series to select from the input data, from which residuals are inferred. • PredictorVariables specifies the names of the predictor series to select from the input data for a model regression component. • Presample specifies the input table or timetable of presample regression residual or error model residual data. • PresampleInnovationVariable specifies the name of the error model residual series to select from Presample. 12-1575
12
Functions
• PresampleRegressionDisturbanceVariable specifies the name of the regression residual series to select from Presample.
References [1] Box, George E. P., Gwilym M. Jenkins, and Gregory C. Reinsel. Time Series Analysis: Forecasting and Control. 3rd ed. Englewood Cliffs, NJ: Prentice Hall, 1994. [2] Davidson, R., and J. G. MacKinnon. Econometric Theory and Methods. Oxford, UK: Oxford University Press, 2004. [3] Enders, Walter. Applied Econometric Time Series. Hoboken, NJ: John Wiley & Sons, Inc., 1995. [4] Hamilton, James D. Time Series Analysis. Princeton, NJ: Princeton University Press, 1994. [5] Pankratz, A. Forecasting with Dynamic Regression Models. John Wiley & Sons, Inc., 1991. [6] Tsay, R. S. Analysis of Financial Time Series. 2nd ed. Hoboken, NJ: John Wiley & Sons, Inc., 2005.
See Also Objects regARIMA Functions estimate | forecast | simulate Topics “Infer Residuals for Diagnostic Checking” on page 7-138 “Forecast a Regression Model with Multiplicative Seasonal ARIMA Errors” on page 5-166 “Residual Diagnostics” on page 3-86 “Select Regression Model with ARIMA Errors” on page 5-103 “Intercept Identifiability Illustration” on page 5-110
12-1576
infer
infer Infer vector autoregression model (VAR) innovations
Syntax E = infer(Mdl,Y) Tbl2 = infer(Mdl,Tbl1) ___ = infer( ___ ,Name=Value) [ ___ ,logL] = infer( ___ )
Description E = infer(Mdl,Y) returns a numeric array E containing the series of multivariate inferred innovations from evaluating the fully specified VAR(p) model Mdl at the numeric array of response data Y. For example, if Mdl is a VAR model fit to the response data Y, E contains the residuals. Tbl2 = infer(Mdl,Tbl1) returns the table or timetable Tbl2 containing the multivariate residuals from evaluating the fully specified VAR(p) model Mdl at the response variables in the table or timetable of data Tbl1. infer selects the variables in Mdl.SeriesNames or all variables in Tbl1. To select different response variables in Tbl1 at which to evaluate the model, use the ResponseVariables name-value argument. ___ = infer( ___ ,Name=Value) specifies options using one or more name-value arguments in addition to any of the input argument combinations in previous syntaxes. infer returns the output argument combination for the corresponding input arguments. For example, infer(Mdl,Y,Y0=PS,X=Exo) computes the residuals of the VAR(p) model Mdl at the matrix of response data Y, and specifies the matrix of presample response data PS and the matrix of exogenous predictor data Exo. Supply all input data using the same data type. Specifically: • If you specify the numeric matrix Y, optional data sets must be numeric arrays and you must use the appropriate name-value argument. For example, to specify a presample, set the Y0 name-value argument to a numeric matrix of presample data. • If you specify the table or timetable Tbl1, optional data sets must be tables or timetables, respectively, and you must use the appropriate name-value argument. For example, to specify a presample, set the Presample name-value argument to a table or timetable of presample data. [ ___ ,logL] = infer( ___ ) returns the loglikelihood objective function value logL evaluated at the specified data.
Examples Infer VAR(4) Model Innovations From Matrix of Response Data Fit a VAR(4) model to the consumer price index (CPI) and unemployment rate data in a matrix. Then, infer the model innovations (residuals) from the estimated model. 12-1577
12
Functions
Load the Data_USEconModel data set. load Data_USEconModel
Plot the two series on separate plots. figure plot(DataTimeTable.Time,DataTimeTable.CPIAUCSL) title("Consumer Price Index") ylabel("Index") xlabel("Date")
figure plot(DataTimeTable.Time,DataTimeTable.UNRATE) title("Unemployment Rate") ylabel("Percent") xlabel("Date")
12-1578
infer
Stabilize the CPI by converting it to a series of growth rates. Synchronize the two series by removing the first observation from the unemployment rate series. rcpi = price2ret(DataTimeTable.CPIAUCSL); unrate = DataTimeTable.UNRATE(2:end);
Create a default VAR(4) model by using the shorthand syntax. Mdl = varm(2,4);
Estimate the model using the entire data set. EstMdl = estimate(Mdl,[rcpi unrate]);
EstMdl is a fully specified, estimated varm model object. Infer innovations from the estimated model. Supply the same response data that the model was fit to as a numeric matrix. E = infer(EstMdl,[rcpi unrate]);
E is a 241-by-2 matrix of inferred innovations. The first and second columns contain the residuals corresponding to the CPI growth rate and unemployment rate, respectively. Alternatively, you can return residuals when you call estimate by supplying an output variable in the fourth position. 12-1579
12
Functions
Plot the residuals on separate plots. Synchronize the residuals with the dates by removing any missing observations from the data and removing the first Mdl.P dates. idx = all(~isnan([rcpi unrate]),2); datesr = DataTimeTable.Time(idx); figure plot(datesr((Mdl.P + 1):end),E(:,1)); ylabel("Consumer Price Index") xlabel("Date") title("Residual Plot") hold on yline(0,"r--"); hold off
figure plot(datesr((Mdl.P + 1):end),E(:,2)) ylabel("Unemployment Rate") xlabel("Date") title("Residual Plot") hold on yline(0,"r--"); hold off
12-1580
infer
The residuals corresponding to the CPI growth rate exhibit heteroscedasticity because the series appears to cycle through periods of higher and lower variance.
Infer VAR(4) Model Innovations from Timetable of Response Data Fit a VAR(4) model to the consumer price index (CPI) and unemployment rate data in a timetable. Then, infer the model innovations (residuals) from the estimated model. Load and Preprocess Data Load the Data_USEconModel data set. Compute the CPI growth rate. Because the growth rate calculation consumes the earliest observation, include the rate variable in the timetable by prepending the series with NaN. load Data_USEconModel DataTimeTable.RCPI = [NaN; price2ret(DataTimeTable.CPIAUCSL)]; numobs = height(DataTimeTable) numobs = 249
Prepare Timetable for Estimation When you plan to supply a timetable directly to estimate, you must ensure it has all the following characteristics: 12-1581
12
Functions
• All selected response variables are numeric and do not contain any missing values. • The timestamps in the Time variable are regular, and they are ascending or descending. Remove all missing values from the table, relative to the CPI rate (RCPI) and unemployment rate (UNRATE) series. varnames = ["RCPI" "UNRATE"]; DTT = rmmissing(DataTimeTable,DataVariables=varnames); numobs = height(DTT) numobs = 245
rmmissing removes the four initial missing observations from the DataTimeTable to create a subtable DTT. The variables RCPI and UNRATE of DTT do not have any missing observations. Determine whether the sampling timestamps have a regular frequency and are sorted. areTimestampsRegular = isregular(DTT,"quarters") areTimestampsRegular = logical 0 areTimestampsSorted = issorted(DTT.Time) areTimestampsSorted = logical 1
areTimestampsRegular = 0 indicates that the timestamps of DTT are irregular. areTimestampsSorted = 1 indicates that the timestamps are sorted. Macroeconomic series in this example are timestamped at the end of the month. This quality induces an irregularly measured series. Remedy the time irregularity by shifting all dates to the first day of the quarter. dt = DTT.Time; dt = dateshift(dt,"start","quarter"); DTT.Time = dt; areTimestampsRegular = isregular(DTT,"quarters") areTimestampsRegular = logical 1
DTT is regular with respect to time. Create Model Template for Estimation Create a default VAR(4) model by using the shorthand syntax. Specify the response variable names. Mdl = varm(2,4); Mdl.SeriesNames = varnames;
Fit Model to Data Estimate the model. Pass the entire timetable DTT. By default, estimate selects the response variables in Mdl.SeriesNames to fit to the model. Alternatively, you can use the ResponseVariables name-value argument. 12-1582
infer
EstMdl = estimate(Mdl,DTT);
Compute Residuals Infer innovations from the estimated model. Supply the same response data that the model was fit to as a timetable. By default, infer selects the variables to use from EstMdl.SeriesNames. Tbl = infer(EstMdl,DTT); head(Tbl) Time _____
COE _____
Q1-49 Q2-49 Q3-49 Q4-49 Q1-50 Q2-50 Q3-50 Q4-50
144.1 141.9 141 140.5 144.6 150.6 159 166.9
CPIAUCSL ________ 23.91 23.92 23.75 23.61 23.64 23.88 24.34 24.98
FEDFUNDS ________ NaN NaN NaN NaN NaN NaN NaN NaN
GCE ____
GDP _____
GDPDEF ______
GPDI ____
GS10 ____
HOANBS ______
45.6 47.3 47.2 46.6 45.6 46.1 45.9 49.5
270 266.2 267.7 265.2 275.2 284.6 302 313.4
16.531 16.35 16.256 16.272 16.222 16.286 16.63 16.95
40.9 34 37.3 35.2 44.4 49.9 56.1 65.9
NaN NaN NaN NaN NaN NaN NaN NaN
53.961 53.058 52.501 52.291 52.696 53.997 55.7 56.213
size(Tbl) ans = 1×2 241
17
Tbl is a 241-by-17 timetable of variables in DTT and estimated model residuals, RCPI_Residuals and UNRATE_Residuals. Alternatively, you can return residuals when you call estimate by supplying an output variable in the fourth position.
Infer Innovations from Model Containing Regression Component Estimate a VAR(4) model of the consumer price index (CPI), the unemployment rate, and the gross domestic product (GDP). Include a linear regression component containing the current quarter and the last four quarters of government consumption expenditures and investment (GCE). Infer model innovations. Load the Data_USEconModel data set. Compute the real GDP. load Data_USEconModel DataTimeTable.RGDP = DataTimeTable.GDP./DataTimeTable.GDPDEF*100;
Plot all variables on separate plots. figure tiledlayout(2,2) nexttile plot(DataTimeTable.Time,DataTimeTable.CPIAUCSL); ylabel("Index") title("Consumer Price Index") nexttile
12-1583
12
Functions
plot(DataTimeTable.Time,DataTimeTable.UNRATE); ylabel("Percent") title("Unemployment Rate") nexttile plot(DataTimeTable.Time,DataTimeTable.RGDP); ylabel("Output") title("Real Gross Domestic Product") nexttile plot(DataTimeTable.Time,DataTimeTable.GCE); ylabel("Billions of $") title("Government Expenditures")
Stabilize the CPI, GDP, and GCE by converting each to a series of growth rates. Synchronize the unemployment rate series with the others by removing its first observation. varnames = ["CPIAUCSL" "RGDP" "GCE"]; DTT = varfun(@price2ret,DataTimeTable,InputVariables=varnames); DTT.Properties.VariableNames = varnames; DTT.UNRATE = DataTimeTable.UNRATE(2:end);
Make the time base regular. dt = DTT.Time; dt = dateshift(dt,"start","quarter"); DTT.Time = dt;
Expand the GCE rate series to a matrix that includes the first lagged series through the fourth lag series. 12-1584
infer
RGCELags = lagmatrix(DTT,1:4,DataVariables="GCE"); DTT = [DTT RGCELags]; DTT = rmmissing(DTT);
Create a default VAR(4) model by using the shorthand syntax. Specify the response variable names. Mdl = varm(3,4); Mdl.SeriesNames = ["CPIAUCSL" "UNRATE" "RGDP"];
Estimate the model using the entire sample. Specify the GCE and its lags as exogenous predictor data for the regression component. prednames = contains(DTT.Properties.VariableNames,"GCE"); EstMdl = estimate(Mdl,DTT,PredictorVariables=prednames);
Infer innovations from the estimated model. Supply the predictor data. Return the loglikelihood objective function value. [Tbl,logL] = infer(EstMdl,DTT,PredictorVariables=prednames); size(Tbl) ans = 1×2 240
11
head(Tbl) Time _____
CPIAUCSL __________
RGDP __________
GCE __________
Q1-49 Q2-49 Q3-49 Q4-49 Q1-50 Q2-50 Q3-50 Q4-50
0.00041815 -0.0071324 -0.0059122 0.0012698 0.010101 0.01908 0.025954 0.035395
-0.0031645 0.011385 -0.010366 0.040091 0.029649 0.03844 0.017994 0.01197
0.036603 -0.0021164 -0.012793 -0.021693 0.010905 -0.0043478 0.075508 0.14807
UNRATE ______ 6.2 6.6 6.6 6.3 5.4 4.4 4.3 3.4
Lag1GCE __________
Lag2GCE __________
Lag ____
0.047147 0.036603 -0.0021164 -0.012793 -0.021693 0.010905 -0.0043478 0.075508
0.04948 0.047147 0.036603 -0.0021164 -0.012793 -0.021693 0.010905 -0.0043478
0 0 0. 0. -0.0 -0. -0. 0.
logL logL = 1.7056e+03
Tbl is a 240-by-11 timetable of data and inferred innovations from the estimated model (residuals). Plot the residuals on separate plots. idx = endsWith(Tbl.Properties.VariableNames,"_Residuals"); resvars = Tbl.Properties.VariableNames(idx); titles = "Residuals: " + EstMdl.SeriesNames; figure tiledlayout(2,2) for j = 1:Mdl.NumSeries nexttile plot(Tbl.Time,Tbl{:,resvars(j)}); xlabel("Date"); title(titles(j));
12-1585
12
Functions
hold on yline(0,"r--"); hold off end
The residuals corresponding to the CPI and GDP growth rates exhibit heteroscedasticity because the CPI series appears to cycle through periods of higher and lower variance. Also, the first half of the GDP series seems to have higher variance than the latter half.
Input Arguments Mdl — VAR model varm model object VAR model, specified as a varm model object created by varm or estimate. Mdl must be fully specified. Y — Response data numeric matrix | numeric array Response data, specified as a numobs-by-numseries numeric matrix or a numobs-by-numseries-bynumpaths numeric array. numobs is the sample size. numseries is the number of response series (Mdl.NumSeries). numpaths is the number of response paths. 12-1586
infer
Rows correspond to observations, and the last row contains the latest observation. Y represents the continuation of the presample response series in Y0. Columns must correspond to the response variable names in Mdl.SeriesNames. Pages correspond to separate, independent numseries-dimensional paths. Among all pages, responses in a particular row occur at the same time. Data Types: double Tbl1 — Time series data table | timetable Time series data containing observed response variables yt and, optionally, predictor variables xt for a model with a regression component, specified as a table or timetable with numvars variables and numobs rows. Each selected response variable is a numobs-by-numpaths numeric matrix, and each selected predictor variable is a numeric vector. Each row is an observation, and measurements in each row occur simultaneously. You can optionally specify numseries response variables by using the ResponseVariables name-value argument, and you can specify numpreds predictor variables by using the PredictorVariables name-value argument. Paths (columns) within a particular response variable are independent, but path j of all variables correspond, for j = 1,…,numpaths. If Tbl1 is a timetable, it must represent a sample with a regular datetime time step (see isregular), and the datetime vector Tbl1.Time must be ascending or descending. If Tbl1 is a table, the last row contains the latest observation. Name-Value Pair Arguments Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter. Before R2021a, use commas to separate each name and value, and enclose Name in quotes. Example: infer(Mdl,Y,Y0=PS,X=Exo) computes the residuals of the VAR(p) model Mdl at the matrix of response data Y, and specifies the matrix of presample response data PS and the matrix of exogenous predictor data Exo. ResponseVariables — Variables to select from Tbl1 to treat as response variables yt string vector | cell vector of character vectors | vector of integers | logical vector Variables to select from Tbl1 to treat as response variables yt, specified as one of the following data types: • String vector or cell vector of character vectors containing numseries variable names in Tbl1.Properties.VariableNames • A length numseries vector of unique indices (integers) of variables to select from Tbl1.Properties.VariableNames • A length numvars logical vector, where ResponseVariables(j) = true selects variable j from Tbl1.Properties.VariableNames, and sum(ResponseVariables) is numseries 12-1587
12
Functions
The selected variables must be numeric vectors (single path) or matrices (columns represent multiple independent paths) of the same width, and cannot contain missing values (NaN). If the number of variables in Tbl1 matches Mdl.NumSeries, the default specifies all variables in Tbl1. If the number of variables in Tbl1 exceeds Mdl.NumSeries, the default matches variables in Tbl1 to names in Mdl.SeriesNames. Example: ResponseVariables=["GDP" "CPI"] Example: ResponseVariables=[true false true false] or ResponseVariable=[1 3] selects the first and third table variables as the response variables. Data Types: double | logical | char | cell | string Y0 — Presample responses numeric matrix | numeric array Presample responses that provide initial values for the model Mdl, specified as a numpreobs-bynumseries numeric matrix or a numpreobs-by-numseries-by-numprepaths numeric array. Use Y0 only when you supply a numeric array of response data Y. numpreobs is the number of presample observations. numprepaths is the number of presample response paths. Each row is a presample observation, and measurements in each row, among all pages, occur simultaneously. The last row contains the latest presample observation. Y0 must have at least Mdl.P rows. If you supply more rows than necessary, infer uses the latest Mdl.P observations only. Each column corresponds to the response series associated with the respective response series in Y. Pages correspond to separate, independent paths. • If Y0 is a matrix, infer applies it to each path (page) in Y. Therefore, all paths in Y derive from common initial conditions. • Otherwise, infer applies Y0(:,:,j) to Y(:,:,j). Y0 must have at least numpaths pages, and infer uses only the first numpaths pages. By default, infer uses the first Mdl.P observations, for example, Y(1:Mdl.P,:), as a presample. This action reduces the effective sample size. Data Types: double Presample — Presample data table | timetable Presample data that provides initial values for the model Mdl, specified as a table or timetable, the same type as Tbl1, with numprevars variables and numpreobs rows. Each row is a presample observation, and measurements in each row, among all paths, occur simultaneously. numpreobs must be at least Mdl.P. If you supply more rows than necessary, infer uses the latest Mdl.P observations only. Each variable is a numpreobs-by-numprepaths numeric matrix. Variables correspond to the response series associated with the respective response variable in Tbl1. To control presample variable selection, see the optional PresampleResponseVariables name-value argument. For each variable, columns are separate, independent paths. 12-1588
infer
• If variables are vectors, infer applies them to each path in Tbl1 to produce the corresponding residuals in Tbl2. Therefore, all response paths derive from common initial conditions. • Otherwise, for each variable ResponseK and each path j, infer applies Presample.ResponseK(:,j) to produce Tbl2.ResponseK(:,j). Variables must have at least numpaths columns, and infer uses only the first numpaths columns. If Presample is a timetable, all the following conditions must be true: • Presample must represent a sample with a regular datetime time step (see isregular). • The inputs Tbl1 and Presample must be consistent in time such that Presample immediately precedes Tbl1 with respect to the sampling frequency and order. • The datetime vector of sample timestamps Presample.Time must be ascending or descending. If Presample is a table, the last row contains the latest presample observation. By default, infer uses the first or earliest Mdl.P observations in Tbl1 as a presample, and then it fits the model to the remaining numobs – Mdl.P observations. This action reduces the effective sample size. PresampleResponseVariables — Variables to select from Presample to use for presample response data string vector | cell vector of character vectors | vector of integers | logical vector Variables to select from Presample to use for presample data, specified as one of the following data types: • String vector or cell vector of character vectors containing numseries variable names in Presample.Properties.VariableNames • A length numseries vector of unique indices (integers) of variables to select from Presample.Properties.VariableNames • A length numvars logical vector, where PresampleResponseVariables(j) = true selects variable j from Presample.Properties.VariableNames, and sum(PresampleResponseVariables) is numseries The selected variables must be numeric vectors (single path) or matrices (columns represent multiple independent paths) of the same width, and cannot contain missing values (NaN). PresampleResponseNames does not need to contain the same names as in Tbl1; infer uses the data in selected variable PresampleResponseVariables(j) as a presample for the response variable corresponding to ResponseVariables(j). The default specifies the same response variables as those selected from Tbl1 (see ResponseVariables). Example: PresampleResponseVariables=["GDP" "CPI"] Example: PresampleResponseVariables=[true false true false] or PresampleResponseVariable=[1 3] selects the first and third table variables for presample data. Data Types: double | logical | char | cell | string X — Predictor data xt numeric matrix 12-1589
12
Functions
Predictor data xt for the regression component in the model, specified as a numeric matrix containing numpreds columns. Use X only when you supply a numeric array of response data Y. numpreds is the number of predictor variables (size(Mdl.Beta,2)). Each row corresponds to an observation, and measurements in each row occur simultaneously. The last row contains the latest observation. X must have at least as many observations as Y. If you supply more rows than necessary, infer uses only the latest observations. infer does not use the regression component in the presample period. • If you specify a numeric array for a presample by using Y0, X must have at least numobs rows (see Y). • Otherwise, X must have at least numobs – Mdl.P observations to account for the default presample removal from Y. Each column is an individual predictor variable. All predictor variables are present in the regression component of each response equation. infer applies X to each path (page) in Y; that is, X represents one path of observed predictors. By default, infer excludes the regression component, regardless of its presence in Mdl. Data Types: double PredictorVariables — Variables to select from Tbl1 to treat as exogenous predictor variables xt string vector | cell vector of character vectors | vector of integers | logical vector Variables to select from Tbl1 to treat as exogenous predictor variables xt, specified as one of the following data types: • String vector or cell vector of character vectors containing numpreds variable names in Tbl1.Properties.VariableNames • A length numpreds vector of unique indices (integers) of variables to select from Tbl1.Properties.VariableNames • A length numvars logical vector, where PredictorVariables(j) = true selects variable j from Tbl1.Properties.VariableNames, and sum(PredictorVariables) is numpreds The selected variables must be numeric vectors and cannot contain missing values (NaN). By default, infer excludes the regression component, regardless of its presence in Mdl. Example: PredictorVariables=["M1SL" "TB3MS" "UNRATE"] Example: PredictorVariables=[true false true false] or PredictorVariable=[1 3] selects the first and third table variables to supply the predictor data. Data Types: double | logical | char | cell | string Note • NaN values in Y, Y0, and X indicate missing values. infer removes missing values from the data by list-wise deletion. 12-1590
infer
1
If Y is a 3-D array, then infer horizontally concatenates the pages of Y to form a numobs-by(numpaths*numseries + numpreds) matrix.
2
If a regression component is present, then infer horizontally concatenates X to Y to form a numobs-by-numpaths*numseries + 1 matrix. infer assumes that the last rows of each series occur at the same time.
3
infer removes any row that contains at least one NaN from the concatenated data.
4
infer applies steps 1 and 3 to the presample paths in Y0.
This process ensures that the inferred output innovations of each path are the same size and are based on the same observation times. In the case of missing observations, the results obtained from multiple paths of Y can differ from the results obtained from each path individually. This type of data reduction reduces the effective sample size. • infer issues an error when any table or timetable input contains missing values.
Output Arguments E — Inferred multivariate innovations series numeric matrix | numeric array Inferred multivariate innovations series, returned as either a numeric matrix, or as a numeric array that contains columns and pages corresponding to Y. infer returns E only when you supply a matrix of response data Y. • If you specify Y0, then E has numobs rows (see Y). • Otherwise, E has numobs – Mdl.P rows to account for the presample removal. Tbl2 — Inferred multivariate innovations series table | timetable Inferred multivariate innovations series and other variables, returned as a table or timetable, the same data type as Tbl1. infer returns Tbl2 only when you supply the input Tbl1. Tbl2 contains the inferred innovation paths E from evaluating the model Mdl at the paths of selected response variables Y, and it contains all variables in Tbl1. infer names the innovation variable corresponding to variable ResponseJ in Tbl1 ResponseJ_Residuals. For example, if one of the selected response variables for estimation in Tbl1 is GDP, Tbl2 contains a variable for the residuals in the response equation of GDP with the name GDP_Residuals. If you specify presample response data, Tbl2 and Tbl1 have the same number of rows, and their rows correspond. Otherwise, because infer removes initial observations from Tbl1 for the required presample by default, Tbl2 has numobs – Mdl.P rows to account for that removal. If Tbl1 is a timetable, Tbl1 and Tbl2 have the same row order, either ascending or descending. logL — Loglikelihood objective function value numeric scalar | numeric vector Loglikelihood objective function value, returned as a numeric scalar or a numpaths-element numeric vector. logL(j) corresponds to the response path in Y(:,:,j) or the path (column) j of the selected response variables of Tbl1. 12-1591
12
Functions
Algorithms Suppose Y, Y0, and X are the response, presample response, and predictor data specified by the numeric data inputs in Y, Y0, and X, or the selected variables from the input tables or timetables Tbl1 and Presample. • infer infers innovations by evaluating the VAR model Mdl, specifically, ε t = Φ (L)yt − c − β xt − δ t . • infer uses this process to determine the time origin t0 of models that include linear time trends. • If you do not specify Y0, then t0 = 0. • Otherwise, infer sets t0 to size(Y0,1) – Mdl.P. Therefore, the times in the trend component are t = t0 + 1, t0 + 2,..., t0 + numobs, where numobs is the effective sample size (size(Y,1) after infer removes missing values). This convention is consistent with the default behavior of model estimation in which estimate removes the first Mdl.P responses, reducing the effective sample size. Although infer explicitly uses the first Mdl.P presample responses in Y0 to initialize the model, the total number of observations in Y0 and Y (excluding missing values) determines t0.
Version History Introduced in R2017a R2022b: infer accepts input data in tables and timetables, and return results in tables and timetables In addition to accepting input data in numeric arrays, infer accepts input data in tables and timetables. infer chooses default series on which to operate, but you can use the following namevalue arguments to select variables. • ResponseVariables specifies the response series names in the input data from which residuals are inferred. • PredictorVariables specifies the predictor series names in the input data for a model regression component. • Presample specifies the input table or timetable of presample response data. • PresampleResponseVariables specifies the response series names from Presample.
References [1] Hamilton, James D. Time Series Analysis. Princeton, NJ: Princeton University Press, 1994. [2] Johansen, S. Likelihood-Based Inference in Cointegrated Vector Autoregressive Models. Oxford: Oxford University Press, 1995. [3] Juselius, K. The Cointegrated VAR Model. Oxford: Oxford University Press, 2006. [4] Lütkepohl, H. New Introduction to Multiple Time Series Analysis. Berlin: Springer, 2005. 12-1592
infer
See Also Objects varm Functions estimate | filter Topics “VAR Model Estimation” on page 9-34 “Fit VAR Model of CPI and Unemployment Rate” on page 9-38 “VAR Model Case Study” on page 9-90
12-1593
12
Functions
infer Infer vector error-correction (VEC) model innovations
Syntax E = infer(Mdl,Y) Tbl2 = infer(Mdl,Tbl1) ___ = infer( ___ ,Name,Value) [ ___ ,logL] = infer( ___ )
Description E = infer(Mdl,Y) returns a numeric array E containing the series of multivariate inferred innovations from evaluating the fully specified VEC(p – 1) model Mdl at the numeric array of response data Y. For example, if Mdl is a VEC model fit to the response data Y, E contains the residuals. Tbl2 = infer(Mdl,Tbl1) returns the table or timetable Tbl2 containing the multivariate residuals from evaluating the fully specified VEC(p – 1) model Mdl at the response variables in the table or timetable of data Tbl1. infer selects the variables in Mdl.SeriesNames or all variables in Tbl1. To select different response variables in Tbl1 at which to evaluate the model, use the ResponseVariables name-value argument. ___ = infer( ___ ,Name,Value) specifies options using one or more name-value arguments in addition to any of the input argument combinations in previous syntaxes. infer returns the output argument combination for the corresponding input arguments. For example, infer(Mdl,Y,Y0=PS,X=Exo) computes the residuals of the VEC(p – 1) model Mdl at the matrix of response data Y, and specifies the matrix of presample response data PS and the matrix of exogenous predictor data Exo. Supply all input data using the same data type. Specifically: • If you specify the numeric matrix Y, optional data sets must be numeric arrays and you must use the appropriate name-value argument. For example, to specify a presample, set the Y0 name-value argument to a numeric matrix of presample data. • If you specify the table or timetable Tbl1, optional data sets must be tables or timetables, respectively, and you must use the appropriate name-value argument. For example, to specify a presample, set the Presample name-value argument to a table or timetable of presample data. [ ___ ,logL] = infer( ___ ) returns the loglikelihood objective function value logL evaluated at the specified data.
Examples Infer VEC Model Innovations From Matrix of Response Data Consider a VEC model for the following seven macroeconomic series, and then fit the model to a matrix of response data. 12-1594
infer
• Gross domestic product (GDP) • GDP implicit price deflator • Paid compensation of employees • Nonfarm business sector hours of all persons • Effective federal funds rate • Personal consumption expenditures • Gross private domestic investment Suppose that a cointegrating rank of 4 and one short-run term are appropriate, that is, consider a VEC(1) model. Load the Data_USEconVECModel data set. load Data_USEconVECModel
For more information on the data set and variables, enter Description at the command line. Determine whether the data needs to be preprocessed by plotting the series on separate plots. figure tiledlayout(2,2) nexttile plot(FRED.Time,FRED.GDP) title("Gross Domestic Product") ylabel("Index") xlabel("Date") nexttile plot(FRED.Time,FRED.GDPDEF) title("GDP Deflator") ylabel("Index") xlabel("Date") nexttile plot(FRED.Time,FRED.COE) title("Paid Compensation of Employees") ylabel("Billions of $") xlabel("Date") nexttile plot(FRED.Time,FRED.HOANBS) title("Nonfarm Business Sector Hours") ylabel("Index") xlabel("Date")
12-1595
12
Functions
figure tiledlayout(2,2) nexttile plot(FRED.Time,FRED.FEDFUNDS) title("Federal Funds Rate") ylabel("Percent") xlabel("Date") nexttile plot(FRED.Time,FRED.PCEC) title("Consumption Expenditures") ylabel("Billions of $") xlabel("Date") nexttile plot(FRED.Time,FRED.GPDI) title("Gross Private Domestic Investment") ylabel("Billions of $") xlabel("Date")
12-1596
infer
Stabilize all series, except the federal funds rate, by applying the log transform. Scale the resulting series by 100 so that all series are on the same scale. FRED.GDP = 100*log(FRED.GDP); FRED.GDPDEF = 100*log(FRED.GDPDEF); FRED.COE = 100*log(FRED.COE); FRED.HOANBS = 100*log(FRED.HOANBS); FRED.PCEC = 100*log(FRED.PCEC); FRED.GPDI = 100*log(FRED.GPDI);
Create a VEC(1) model using the shorthand syntax. Specify the variable names. Mdl = vecm(7,4,1); Mdl.SeriesNames = FRED.Properties.VariableNames Mdl = vecm with properties: Description: SeriesNames: NumSeries: Rank: P: Constant: Adjustment: Cointegration: Impact: CointegrationConstant:
"7-Dimensional Rank = 4 VEC(1) Model with Linear Time Trend" "GDP" "GDPDEF" "COE" ... and 4 more 7 4 2 [7×1 vector of NaNs] [7×4 matrix of NaNs] [7×4 matrix of NaNs] [7×7 matrix of NaNs] [4×1 vector of NaNs]
12-1597
12
Functions
CointegrationTrend: ShortRun: Trend: Beta: Covariance:
[4×1 {7×7 [7×1 [7×0 [7×7
vector of matrix of vector of matrix] matrix of
NaNs] NaNs} at lag [1] NaNs] NaNs]
Mdl is a vecm model object. All properties containing NaN values correspond to parameters to be estimated given data. Estimate the model by supplying a matrix of data. Use default options. EstMdl = estimate(Mdl,FRED.Variables) EstMdl = vecm with properties: Description: SeriesNames: NumSeries: Rank: P: Constant: Adjustment: Cointegration: Impact: CointegrationConstant: CointegrationTrend: ShortRun: Trend: Beta: Covariance:
"7-Dimensional Rank = 4 VEC(1) Model" "GDP" "GDPDEF" "COE" ... and 4 more 7 4 2 [14.1329 8.77841 -7.20359 ... and 4 more]' [7×4 matrix] [7×4 matrix] [7×7 matrix] [-28.6082 109.555 -77.0912 ... and 1 more]' [4×1 vector of zeros] {7×7 matrix} at lag [1] [7×1 vector of zeros] [7×0 matrix] [7×7 matrix]
EstMdl is an estimated vecm model object. It is fully specified because all parameters have known values. By default, estimate imposes the constraints of the H1 Johansen VEC model form by removing the cointegrating trend and linear trend terms from the model. Parameter exclusion from estimation is equivalent to imposing equality constraints to zero. Infer innovations from the estimated model, the residuals from the model fit. Supply the matrix of insample data. E = infer(EstMdl,FRED.Variables);
E is a 238-by-7 matrix of inferred innovations. Columns correspond to the variable names in EstMdl.SeriesNames. Alternatively, you can return residuals when you call estimate by supplying an output variable in the fourth position. Plot the residuals on separate plots. Synchronize the residuals with the dates by removing the first EstMdl.P dates. idx = FRED.Time((EstMdl.P + 1):end); titles = "Residuals: " + EstMdl.SeriesNames; figure tiledlayout(2,2) for j = 1:4 nexttile
12-1598
infer
plot(idx,E(:,j)) hold on yline(0,"r--") hold off title(titles(j)) end
figure tiledlayout(2,2) for j = 5:7 nexttile plot(idx,E(:,j)) hold on yline(0,"r--") hold off title(titles(j)) end
12-1599
12
Functions
The residuals corresponding to the federal funds rate exhibit heteroscedasticity.
Infer VEC Model Innovations From Timetable of Response Data Consider a VEC model for the following seven macroeconomic series, and then fit the model to a timetable of response data. This example is based on “Infer VEC Model Innovations From Matrix of Response Data” on page 12-1594. Load and Preprocess Data Load the Data_USEconVECModel data set. load Data_USEconVECModel DTT = FRED; DTT.GDP = 100*log(DTT.GDP); DTT.GDPDEF = 100*log(DTT.GDPDEF); DTT.COE = 100*log(DTT.COE); DTT.HOANBS = 100*log(DTT.HOANBS); DTT.PCEC = 100*log(DTT.PCEC); DTT.GPDI = 100*log(DTT.GPDI);
12-1600
infer
Prepare Timetable for Estimation When you plan to supply a timetable directly to estimate, you must ensure it has all the following characteristics: • All selected response variables are numeric and do not contain any missing values. • The timestamps in the Time variable are regular, and they are ascending or descending. Remove all missing values from the table. DTT = rmmissing(DTT); numobs = height(DTT) numobs = 240
DTT does not contain any missing values. Determine whether the sampling timestamps have a regular frequency and are sorted. areTimestampsRegular = isregular(DTT,"quarters") areTimestampsRegular = logical 0 areTimestampsSorted = issorted(DTT.Time) areTimestampsSorted = logical 1
areTimestampsRegular = 0 indicates that the timestamps of DTT are irregular. areTimestampsSorted = 1 indicates that the timestamps are sorted. Macroeconomic series in this example are timestamped at the end of the month. This quality induces an irregularly measured series. Remedy the time irregularity by shifting all dates to the first day of the quarter. dt = DTT.Time; dt = dateshift(dt,"start","quarter"); DTT.Time = dt;
DTT is regular with respect to time. Create Model Template for Estimation Create a VEC(1) model using the shorthand syntax. Specify the variable names. Mdl = vecm(7,4,1); Mdl.SeriesNames = DTT.Properties.VariableNames;
Mdl is a vecm model object. All properties containing NaN values correspond to parameters to be estimated given data.
12-1601
12
Functions
Fit Model to Data Estimate the model by supplying the timetable of data DTT. By default, because the number of variables in Mdl.SeriesNames is the number of variables in DTT, estimate fits the model to all the variables in DTT. EstMdl = estimate(Mdl,DTT);
EstMdl is an estimated vecm model object. Compute Residuals Infer innovations from the estimated model, the residuals from the model fit. Supply the timetable of in-sample data DTT. By default, because the number of variables in Mdl.SeriesNames is the number of variables in DTT, infer selects all the variables in DTT, from which to compute residuals. Tbl = infer(EstMdl,DTT); head(Tbl) Time ___________
GDP ______
GDPDEF ______
COE ______
HOANBS ______
FEDFUNDS ________
PCEC ______
GPDI ______
GDP_Re ______
01-Jul-1957 01-Oct-1957 01-Jan-1958 01-Apr-1958 01-Jul-1958 01-Oct-1958 01-Jan-1959 01-Apr-1959
617.44 616.48 614.93 615.87 618.76 621.54 623.66 626.19
281.55 281.61 282.68 282.97 283.57 284.04 284.31 284.46
558.01 557.48 556.15 556.03 558.99 560.84 563.55 565.91
399.59 397.5 395.21 393.76 394.95 396.43 398.35 400.24
3.47 2.98 1.2 0.93 1.76 2.42 2.8 3.39
566.71 567.26 567.09 568.09 569.81 571.11 573.62 575.54
437.32 426.27 420.02 417.59 427.67 438.2 442.12 449.31
0.1 -2. -2. 0. 2. 0.6 0.3 0.2
size(Tbl) ans = 1×2 238
14
Tbl is a 238-by-14 timetable of in-sample data in DTT and estimated model residuals. Residual variables names are appended with _Residuals, for example, GDP_Residuals. Alternatively, you can return residuals when you call estimate by supplying an output variable in the fourth position.
Infer Innovations from Model Containing Regression Component Consider the model and data in “Infer VEC Model Innovations From Matrix of Response Data” on page 12-1594. Load Data Load the Data_USEconVECModel data set. load Data_USEconVECModel
12-1602
infer
The Data_Recessions data set contains the beginning and ending serial dates of recessions. Load this data set. Convert the matrix of date serial numbers to a datetime array. load Data_Recessions dtrec = datetime(Recessions,ConvertFrom="datenum");
Preprocess Data Remove the exponential trend from the series, and then scale them by a factor of 100. DTT = FRED; DTT.GDP = 100*log(DTT.GDP); DTT.GDPDEF = 100*log(DTT.GDPDEF); DTT.COE = 100*log(DTT.COE); DTT.HOANBS = 100*log(DTT.HOANBS); DTT.PCEC = 100*log(DTT.PCEC); DTT.GPDI = 100*log(DTT.GPDI);
Create a dummy variable that identifies periods in which the U.S. was in a recession or worse. Specifically, the variable should be 1 if FRED.Time occurs during a recession, and 0 otherwise. Include the variable with the FRED data. isin = @(x)(any(dtrec(:,1) Y1 is the IRF of yt, and the plot entitled U1 -> X1 is the IRF of xt. Both IRFs indicate that the effects of the shock on the system diminish after about 8 periods.
Specify Number of Periods and IRFs to Plot Plot the 10-period IRFs of only the measurement variables in a system. Explicitly create the multivariate diffuse state-space model x1, t = x1, t − 1 + 0 . 2u1, t x2, t = x1, t − 1 + 0 . 3x2, t − 1 + u2, t y1, t = x1, t + ε1, t y2, t = x1, t + x2, t + ε2, t . A = B = C = D = Mdl
[1 0; 1 0.3]; [0.2 0; 0 1]; [1 0; 1 1]; eye(2); = dssm(A,B,C,D)
Mdl = State-space model type: dssm
12-1679
12
Functions
State vector length: 2 Observation vector length: 2 State disturbance vector length: 2 Observation innovation vector length: 2 Sample size supported by model: Unlimited State variables: x1, x2,... State disturbances: u1, u2,... Observation series: y1, y2,... Observation innovations: e1, e2,... State equations: x1(t) = x1(t-1) + (0.20)u1(t) x2(t) = x1(t-1) + (0.30)x2(t-1) + u2(t) Observation equations: y1(t) = x1(t) + e1(t) y2(t) = x1(t) + x2(t) + e2(t) Initial state distribution: Initial state means x1 x2 0 0 Initial state covariance matrix x1 x2 x1 Inf 0 x2 0 Inf State types x1 x2 Diffuse Diffuse
Mdl is a fully specified ssm model object. Plot the two 10-period IRFs of y2, t, and suppress the state variable IRFs. irfplot(Mdl,'NumPeriods',10,'PlotY',2,'PlotX',[]);
12-1680
irfplot
The top subplot is the IRF of y2, t resulting from a shock to u1, t, which is persistent because the shock filters through the random walk state x1, t. The bottom subplot is the IRF of y2, t resulting from a shock to u2, t, which is transient and eventually diminishes as time elapses because the state x2, t exhibits autoregressive behavior.
Plot Cumulative IRFs of Estimated Model to Specified Axes Simulate data from a known model, fit the data to a state-space model, and then the plot cumulative IRFs of the estimated model to specified axes. Assume that the data generating process (DGP) is the AR(1) model xt = 1 + 0 . 75xt − 2 + ut, where ut is a series of independent and identically distributed Gaussian variables with mean 0 and variance 1. Simulate 500 observations from the model. rng(1); % For reproducibility DGP = arima('Constant',1,'AR',{0 0.75},'Variance',1); y = simulate(DGP,500);
12-1681
12
Functions
Explicitly create a state-space model template for estimation that represents the model xt = c + ϕxt − 2 + ηut yt = xt . A = B = C = D = Mdl
[0 NaN NaN; 0 1 0; 1 0 0]; [NaN; 0; 0]; [1 0 0]; 0; = ssm(A,B,C,D,'StateType',[0 1 0]);
Fit the model template to the data. Specify a set of positive, random standard Gaussian starting values for the three model parameters. EstMdl = estimate(Mdl,y,abs(randn(3,1))); Method: Maximum likelihood (fminunc) Sample size: 500 Logarithmic likelihood: -892.214 Akaike info criterion: 1790.43 Bayesian info criterion: 1803.07 | Coeff Std Err t Stat Prob -------------------------------------------------c(1) | 0.41320 0.12199 3.38730 0.00071 c(2) | 0.67319 0.02749 24.48749 0 c(3) | 1.11450 0.03623 30.76557 0 | | Final State Std Dev t Stat Prob x(1) | 3.69929 0 Inf 0 x(2) | 1 0 Inf 0 x(3) | 1.43378 0 Inf 0
EstMdl is a fully specified dssm model object. Plot the cumulative IRFs of the first and third state variables, and the measurement variable in EstMdl. Return the plot in the same figure, on three separate subplots. ax = gobjects(3,1); for j = 1:numel(ax) ax(j) = subplot(3,1,j); end irfplot(ax,EstMdl,'Cumulative',true,'PlotX',[1 3]);
12-1682
irfplot
Because yt = xt, the top two IRFs in the figure are equivalent. Because x1, t − 1 = x3, t, the IRF in the subplot at the bottom of the figure is shifted to the left, relative to the other two plots.
Plot Time-Varying IRF Simulate data from a time-varying state-space model, fit a model to the data, then plot the timevarying IRF of the estimated model. Consider the DGP represented by this system xt =
0 . 75xt − 1 + ut;
t < 11
−0 . 1xt − 1 + 3ut; t ≥ 11
yt = 1 . 5xt + 2εt . Write a function that specifies how the parameters params map to the state-space model matrices. Save this code as a file named timeVariantAR1ParamMap.m on your MATLAB® path. Alternatively, open the example to access the function. type timeVariantAR1ParamMap.m % Copyright 2020 The MathWorks, Inc. function [A,B,C,D] = timeVariantAR1ParamMap(params)
12-1683
12
Functions
% % % % %
Time-varying state-space model parameter mapping function example. This function maps the vector params to the state-space matrices (A, B, C, and D). From periods 1 through 10, the state model is an AR(1)model, and from periods 11 through 20, the state model is possibly a different AR(1) model. The measurement equation is the same throughout the time span. A1 = {params(1)}; A2 = {params(2)}; varu1 = exp(params(3)); % Positive variance constraints varu2 = exp(params(4)); B1 = {sqrt(varu1)}; B2 = {sqrt(varu2)}; C = params(5); vare1 = exp(params(6)); D = sqrt(vare1); A = [repmat(A1,10,1); repmat(A2,10,1)]; B = [repmat(B1,10,1); repmat(B2,10,1)]; end
Implicitly create a partially specified state-space model representing the DGP. For this example, fix the measurement-sensitivity coefficient C to 1.5. C = 1.5; fixCParamMap = @(x)timeVariantAR1ParamMap([x(1:4), C, x(5)]); DGP = ssm(fixCParamMap);
Simulate 20 observations from the DGP. Because DGP is partially specified, pass the true parameter values to simulate by using the 'Params' name-value pair argument. rng(10) % For reproducibility A1 = 0.75; A2 = -0.1; B1 = 1; B2 = 3; D = 2; trueParams = [A1 A2 2*log(B1) 2*log(B2) 2*log(D)]; % Transform variances for parameter map y = simulate(DGP,20,'Params',trueParams);
y is a 20-by-1 vector of simulated measurements yt from the DGP. Because DGP is a partially specified, implicit model object, its parameters are unknown. Therefore, it can serve as a model template for estimation. Fit the model to the simulated data. Specify random standard Gaussian draws for the initial parameter values. Return the parameter estimates. [~,estParams] = estimate(DGP,y,randn(1,5),'Display','off') estParams = 1×5 0.6164
-0.1665
0.0135
1.6803
-1.5855
estParams is a 1-by-5 vector of parameter estimates. The output argument list of the parameter mapping function determines the order of the estimates: A{1}, A{2}, B{1}, B{2}, and D. Plot the IRF of the measurement and state variables by supplying DGP (not the estimated model) and the estimated parameters by using the 'Params' name-value pair argument. 12-1684
irfplot
h = irfplot(DGP,'Params',estParams); xline(h(1,1).Parent,10.5,'--')
xline(h(1,2).Parent,10.5,'--')
12-1685
12
Functions
The figures show time-varying IRFs of the measurement and state variables. The first 10 periods correspond to the IRF of the first state equation. During period 11, what remains of the shock transfers to the second state equation, and filters through that system until it diminishes.
Plot IRF Confidence Bounds Plot the measurement variuable IRF and the 95% confidence intervals on the true IRFs. Assume that the data generating process (DGP) is the AR(1) model xt = 1 + 0 . 75xt − 2 + ut, where ut is a series of independent and identically distributed Gaussian variables with mean 0 and variance 1. Simulate 500 observations from the model. rng(1); % For reproducibility DGP = arima('Constant',1,'AR',{0 0.75},'Variance',1); y = simulate(DGP,500);
Explicitly create a diffuse state-space model template for estimation that represents the model. Fit the model to the data, and return parameter estimates and their corresponding estimated covariance matrix. 12-1686
irfplot
A = [0 NaN NaN; 0 1 0; 1 0 0]; B = [NaN; 0; 0]; C = [1 0 0]; D = 0; Mdl = dssm(A,B,C,D,'StateType',[0 1 0]); [~,estParams,EstParamCov] = estimate(Mdl,y,abs(randn(3,1))); Method: Maximum likelihood (fminunc) Effective Sample size: 500 Logarithmic likelihood: -892.214 Akaike info criterion: 1790.43 Bayesian info criterion: 1803.07 | Coeff Std Err t Stat Prob -------------------------------------------------c(1) | 0.41320 0.12199 3.38730 0.00071 c(2) | 0.67319 0.02749 24.48749 0 c(3) | 1.11450 0.03623 30.76557 0 | | Final State Std Dev t Stat Prob x(1) | 3.69929 0 Inf 0 x(2) | 1 0 Inf 0 x(3) | 1.43378 0 Inf 0
Mdl is a fully specified, estimated dssm model object. Plot the IRF, with its 95% confidence intervals, of the measurement variable. irfplot(Mdl,'Params',estParams,'EstParamCov',EstParamCov,... 'PlotX',[]);
12-1687
12
Functions
The blue line represents the estimated IRF of yt. The dashed red lines represent the upper and lower, pointwise 95% confidence bounds on the true IRF. The model has only one lag term (lag 2), as the shock filters through the system, it impacts the first state variable during odd periods only.
Input Arguments Mdl — State-space model ssm model object | dssm model object State-space model, specified as an ssm model object returned by ssm or its estimate function, or a dssm model object returned by dssm or its estimate function. If Mdl is partially specified (that is, it contains unknown parameters), specify estimates of the unknown parameters by using the 'Params' name-value argument. Otherwise, irfplot issues an error. irfplot issues an error when Mdl is a dimension-varying model, which is a time-varying on page 115 model containing at least one variable that changes dimension during the sampling period (for example, a state variable drops out of the model). Tip If Mdl is fully specified, you cannot estimate confidence bounds. To estimate confidence bounds: 1
12-1688
Create a partially specified state-space model template for estimation Mdl.
irfplot
2
Estimate the model by using the estimate function and data. Return the estimated parameters estParams and estimated parameter covariance matrix EstParamCov.
3
Pass the model template for estimation Mdl to irfplot, and specify the parameter estimates and covariance matrix by using the 'Params' and 'EstParamCov' name-value arguments.
4
For the irfplot function, return the appropriate output arguments for lower and upper confidence bounds.
ax — Axes on which to plot IRFs vector of Axes objects Axes on which to plot the IRFs, specified as a pU*(pY + pX)-by-1 vector of Axes objects, where pU, pY, and pX are the lengths of the values of the 'PlotU', 'PlotY', and 'PlotX' name-value pair arguments, respectively. irfplot plots IRFs to the axes in ax in this order. 1
IRFs of all measurement variables PlotY(:) resulting from a shock to the first state disturbance PlotU(1).
2
IRFs of all measurement variables PlotY(:) resulting from a shock to the second state disturbance PlotU(2).
3
Continue the procedure similarly until irfplot plots the IRF associated with the last state disturbance PlotU(end).
4
Repeat steps 1 through 3, but replace the measurement variables with the state variables PlotX.
By default, irfplot plots the measurement-variable IRFs to the axes of subplots in a new figure, and plots the state-variable IRFs to the axes of subplots in another new figure. Name-Value Pair Arguments Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter. Before R2021a, use commas to separate each name and value, and enclose Name in quotes. Example: 'PlotU',1:2,'PlotX',[] plots only the measurement-variable IRFs resulting from shocks applied to the first and second state-disturbance variables (the state-variable IRF plot is suppressed). IRF Options
NumPeriods — Number of periods 20 (default) | positive integer Number of periods for which irfplot computes the IRF, specified as a positive integer. Periods in the IRF start at time 1 and end at time NumPeriods. Example: 'NumPeriods',10 specifies the inclusion of 10 consecutive time points in the IRF starting at time 1, during which irfplot applies the shock, and ending at time 10. Data Types: double 12-1689
12
Functions
Params — Estimates of unknown parameters numeric vector Estimates of the unknown parameters in the partially specified state-space model Mdl, specified as a numeric vector. If Mdl is partially specified (contains unknown parameters specified by NaNs), you must specify Params. The estimate function returns parameter estimates of Mdl in the appropriate form. However, you can supply custom estimates by arranging the elements of Params as follows: • If Mdl is an explicitly created model (Mdl.ParamMap is empty []), arrange the elements of Params to correspond to hits of a column-wise search of NaNs in the state-space model coefficient matrices, initial state mean vector, and covariance matrix. • If Mdl is time invariant, the order is A, B, C, D, Mean0, and Cov0. • If Mdl is time varying, the order is A{1} through A{end}, B{1} through B{end}, C{1} through C{end}, D{1} through D{end}, Mean0, and Cov0. • If Mdl is an implicitly created model (Mdl.ParamMap is a function handle), the first input argument of the parameter-to-matrix mapping function determines the order of the elements of Params. If Mdl is fully specified, irfplot ignores Params. Example: Consider the state-space model Mdl with A = B = [NaN 0; 0 NaN] , C = [1; 1], D = 0, and initial state means of 0 with covariance eye(2). Mdl is partially specified and explicitly created. Because the model parameters contain a total of four NaNs, Params must be a 4-by-1 vector, where Params(1) is the estimate of A(1,1), Params(2) is the estimate of A(2,2), Params(3) is the estimate of B(1,1), and Params(4) is the estimate of B(2,2). Data Types: double PlotU — State-disturbance variables ut to shock vector of positive integers State-disturbance variables ut to shock for the IRF plots, specified as the comma-separated pair consisting of 'PlotU' and a vector of positive integers. Elements are the indices of the statedisturbance variables u1,t, u2,t, …, uk,t. By default, irfplot shocks all state-disturbance variables. Example: 'PlotU',[1 3] shocks u1,1 and u3,1, and irfplot plots the resulting IRFs. Data Types: double PlotY — Measurement-variable IRFs to plot vector of positive integers | [] Measurement-variable IRFs to plot, specified as the comma-separated pair consisting of 'PlotY' and a vector of positive integers. Elements are the indices of the measurement variables y1,t, y2,t, …, yn,t. If PlotY is empty [], irfplot does not plot any measurement-variable IRFs. By default, irfplot plots all measurement-variable IRFs. Example: 'PlotY',1 plots the IRF of y1,t. Data Types: double 12-1690
irfplot
PlotX — State-variable IRFs to plot vector of positive integers | [] State-variable IRFs to plot, specified as the comma-separated pair consisting of 'PlotX' and a vector of positive integers. Elements are the indices of the state variables x1,t, x2,t, …, xm,t. If PlotX is empty [], irfplot does not plot any state-variable IRFs. By default, irfplot plots all state-variable IRFs. Example: 'PlotX',[] does not plot any state-variable IRFs. Data Types: double Cumulative — Flag for computing cumulative IRF false (default) | true Flag for computing the cumulative IRF, specified as a value in this table. Value
Description
true
irfplot computes the cumulative IRF of all variables over the specified time range.
false
irfplot computes the standard, period-by-period IRF of all variables over the specified time range.
Example: 'Cumulative',true Data Types: logical Method — IRF estimation algorithm 'repeated-multiplication' (default) | 'eigendecomposition' IRF estimation algorithm, specified as 'repeated-multiplication' or 'eigendecomposition'. The IRF estimator of time m contains the factor Am. This table describes the supported algorithms to compute the matrix power. Value
Description
'repeated-multiplication'
irfplot uses recursive multiplication.
'eigendecomposition'
irfplot attempts to use the spectral decomposition of A to compute the matrix power. Specify this value only when you suspect that the recursive multiplication algorithm might experience numerical issues. For more details, see Algorithms on page 12-1693.
Data Types: string | char Confidence Bound Estimation Options
EstParamCov — Estimated covariance matrix of unknown parameters positive semidefinite numeric matrix Estimated covariance matrix of unknown parameters in the partially specified state-space model Mdl, specified as a positive semidefinite numeric matrix. 12-1691
12
Functions
estimate returns the estimated parameter covariance matrix of Mdl in the appropriate form. However, you can supply custom estimates by setting EstParamCov(i,j) to the estimated covariance of the estimated parameters Params(i) and Params(j), regardless of whether Mdl is time invariant or time varying. If Mdl is fully specified, irfplot ignores EstParamCov. By default, irfplot does not estimate confidence bounds. Data Types: double NumPaths — Number of Monte Carlo sample paths 1000 (default) | positive integer Number of Monte Carlo sample paths (trials) to generate to estimate confidence bounds, specified as a positive integer. Example: 'NumPaths',5000 Data Types: double Confidence — Confidence level 0.95 (default) | numeric scalar in [0,1] Confidence level for the confidence bounds, specified as a numeric scalar in the interval [0,1]. For each period, randomly drawn confidence intervals cover the true response 100*Confidence% of the time. The default value is 0.95, which implies that the confidence bounds represent 95% confidence intervals. Example: Confidence=0.9 specifies 90% confidence intervals. Data Types: double
Output Arguments h — Plot handles to IRFs and confidence bounds matrix of Line objects Plot handles to the IRFs and confidence bounds, returned as a 3-by-pU*(pY + pX) matrix of Line objects, where pU, pY, and pX are the lengths of the values of the 'PlotU', 'PlotY', and 'PlotX' name-value pair arguments, respectively. Each column corresponds to the IRF of a combination of a state disturbance and a measurement or state variable. For a particular column, row 1 contains the handle to the IRF, and rows 2 and 3 contain the handles to the lower and upper confidence bounds, respectively. The columns display information in this order:
12-1692
1
IRFs of all measurement variables PlotY(:) resulting from a shock to the first state disturbance PlotU(1).
2
IRFs of all measurement variables PlotY(:) resulting from a shock to the second state disturbance PlotU(2).
3
Continues the display similarly until irfplot reaches the IRF associated with the last state disturbance PlotU(end).
irfplot
4
Repeat steps 1 through 3, but replace the measurement variables with the state variables PlotX.
h contains unique plot identifiers, which you can use to query or modify properties of the plots.
More About Impulse Response Function An impulse response function (IRF) of a state-space model (or dynamic response of the system) measures contemporaneous and future changes in the state and measurement variables when each state-disturbance variable is shocked by a unit impulse at period 1. In other words, the IRF at time t is the derivative of each state and measurement variable at time t with respect to a state-disturbance variable at time 1, for each t ≥ 1. Consider the time-invariant state-space model on page 11-3 xt = Axt − 1 + But yt = Cxt + Dεt, and consider an unanticipated unit shock at period 1, applied to state-disturbance variable j uj,t. The r-step-ahead response of the state variables xt to the shock is ψx j(r) = Ar b j, where r > 0 and bj is column j of the state-disturbance-loading matrix B. The r-step-ahead response of the measurement variables yt to the shock is ψy j(r) = CAr b j . IRFs depend on the time interval over which they are computed. However, the IRF of a time-invariant state-space model is time homogeneous, which means that the IRF does not depend on the time at which the shock is applied. Time-varying IRFs, which are the IRFs of a time-varying but dimensioninvariant system, have the form ψx j(r) = Ar ⋯A2 A1b1, j ψy j(r) = Cr Ar ⋯A2 A1b1, j, where b1,j is column j of B1, the period 1 state-disturbance-loading matrix. Time-varying IRFs depend on the time at which the shock is applied. irfplot always applies the shock at period 1. IRFs are independent of the initial state distribution.
Algorithms • If you specify 'eigendecomposition' for the 'Method' name-value pair argument, irfplot attempts to diagonalize the state-transition matrix A by using the spectral decomposition. irfplot resorts to recursive multiplication instead under at least one of these circumstances: • An eigenvalue is complex. • The rank of the matrix of eigenvectors is less than the number of states 12-1693
12
Functions
• Mdl is time varying. • If you do not supply 'EstParamCov', confidence bounds of each period overlap. • irfplot uses Monte Carlo simulation to compute confidence intervals. 1
irfplot randomly draws NumPaths variates from the asymptotic sampling distribution of the unknown parameters in Mdl, which is Np(Params,EstParamCov), where p is the number of unknown parameters.
2
For each randomly drawn parameter set j, irfplot:
3
a
Creates a state-space model that is equal to Mdl, but substitutes in parameter set j
b
Computes the random IRF of the resulting model ψj(t), where t = 1 through NumPaths
For each time t, the lower bound of the confidence interval is the (1 – c)/2 quantile of the simulated IRF at period t ψ(t), where c = Confidence. Similarly, the upper bound of the confidence interval at time t is the (1 – c)/2 upper quantile of ψ(t).
Version History Introduced in R2020b
See Also Objects ssm | dssm Functions irf | estimate | filter | smooth | forecast Topics “What Are State-Space Models?” on page 11-3
12-1694
isEqLagOp
isEqLagOp Determine if two LagOp objects are same mathematical polynomial
Syntax indicator = isEqLagOp(A,B) indicator = isEqLagOp(A,B,Name,Value)
Description indicator = isEqLagOp(A,B) determines if two lag operator polynomials A and B are the same. indicator is a boolean indicator for the equality test. TRUE indicates the two polynomials are identical to within tolerance; FALSE indicates the two polynomials are not identical to within tolerance. indicator = isEqLagOp(A,B,Name,Value) determines if two lag operator polynomials are the same with additional options specified by one or more Name,Value pair arguments. If at least one of A or B is a lag operator polynomial object, the other can be a cell array of matrices (initial lag operator coefficients), or a single matrix (zero-degree lag operator).
Input Arguments A Lag operator polynomial object, as created by LagOp, against which the equality of B is tested. B Lag operator polynomial object, as created by LagOp, against which the equality of A is tested. Name-Value Pair Arguments Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter. Before R2021a, use commas to separate each name and value, and enclose Name in quotes. Tolerance Nonnegative scalar tolerance used for testing equality. The default is 1e-12. Specifying a tolerance greater than the default relaxes the comparison criterion. Two polynomials are deemed sufficiently close to indicate equality if the differences in magnitude of all elements of all coefficient matrices at all lags are less than or equal to the specified tolerance. Default: 1e-12 12-1695
12
Functions
Output Arguments indicator Boolean indicator for the equality test. true indicates the two polynomials are identical to within tolerance; false indicates the two polynomials are not identical to within tolerance.
Examples Determine the Equivalence of Two Lag Polynomials Create a lag operator polynomial and convert it to a cell array: A = LagOp({1 0.8 0.3 0.2}); B = toCellArray(A); isEqLagOp(A,B) ans = logical 1
The converted cell array is equivalent to the LagOp polynomial object.
See Also toCellArray Topics “Specify Lag Operator Polynomials” on page 2-9
12-1696
isergodic
isergodic Check Markov chain for ergodicity
Syntax tf = isergodic(mc)
Description tf = isergodic(mc) returns true if the discrete-time Markov chain mc is ergodic on page 121698 and false otherwise.
Examples Determine Whether Markov Chain Is Ergodic Consider this three-state transition matrix. 0 10 P= 0 0 1 . 1 00 Create the Markov chain that is characterized by the transition matrix P. P = [0 1 0; 0 0 1; 1 0 0]; mc = dtmc(P);
Determine whether the Markov chain is ergodic. isergodic(mc) ans = logical 0
0 indicates that the Markov chain is not ergodic. Visually confirm that the Markov chain is not ergodic by plotting its eigenvalues on the complex plane. figure; eigplot(mc);
12-1697
12
Functions
All three eigenvalues have modulus one. This result indicates that the period of the Markov chain is three. Periodic Markov chains are not ergodic.
Input Arguments mc — Discrete-time Markov chain dtmc object Discrete-time Markov chain with NumStates states and transition matrix P, specified as a dtmc object. P must be fully specified (no NaN entries).
Output Arguments tf — Ergodicity flag true | false Ergodicity flag, returned as true if mc is an ergodic Markov chain and false otherwise.
More About Ergodic Chain A Markov chain is ergodic if it is both irreducible and aperiodic. This condition is equivalent to the transition matrix being a primitive nonnegative matrix. 12-1698
isergodic
Algorithms • By Wielandt's theorem [3], the Markov chain mc is ergodic if and only if all elements of Pm are positive for m = (n – 1)2 + 1. P is the transition matrix (mc.P) and n is the number of states (mc.NumStates). To determine ergodicity, isergodic computes Pm. • By the Perron-Frobenius Theorem [2], ergodic Markov chains have unique limiting distributions. That is, they have unique stationary distributions to which every initial distribution converges. Ergodic unichains, which consist of a single ergodic class plus transient classes, also have unique limiting distributions (with zero probability mass in the transient classes).
Version History Introduced in R2017b
References [1] Gallager, R.G. Stochastic Processes: Theory for Applications. Cambridge, UK: Cambridge University Press, 2013. [2] Horn, R., and C. R. Johnson. Matrix Analysis. Cambridge, UK: Cambridge University Press, 1985. [3] Wielandt, H. "Unzerlegbare, Nicht Negativen Matrizen." Mathematische Zeitschrift. Vol. 52, 1950, pp. 642–648.
See Also Objects dtmc Functions asymptotics | isreducible Topics “Markov Chain Modeling” on page 10-8 “Create and Modify Markov Chain Model Objects” on page 10-17 “Determine Asymptotic Behavior of Markov Chain” on page 10-39
12-1699
12
Functions
isNonZero Find lags associated with nonzero coefficients of LagOp objects
Syntax indicator = isNonZero(A,testLags)
Description Given a vector of candidate lags to test, indicator = isNonZero(A,testLags), determines which lags are associated with nonzero coefficients of a lag operator polynomial A(L).
Examples Determine Which Lag Has a Nonzero Coefficient Create a Lag Operator polynomial object and add a term with the Coefficients property: A = LagOp({1 0.8 0.3 0.2}); A.Coefficients(7)={0.5}; isNonZero(A,7) ans = logical 1
12-1700
isreducible
isreducible Check Markov chain for reducibility
Syntax tf = isreducible(mc)
Description tf = isreducible(mc) returns true if the discrete-time Markov chain mc is reducible on page 121702 and false otherwise.
Examples Determine Whether Markov Chain Is Reducible Consider this three-state transition matrix. 0.5 0.5 0 P = 0.5 0.5 0 0 0 1 Create the Markov chain that is characterized by the transition matrix P. P = [0.5 0.5 0; 0.5 0.5 0; 0 0 1]; mc = dtmc(P);
Determine whether the Markov chain is reducible. isreducible(mc) ans = logical 1
1 indicates that mc is reducible. Visually confirm the reducibility of the Markov chain by plotting its digraph. figure; graphplot(mc);
12-1701
12
Functions
Two independent chains appear in the figure. This result indicates that you can analyze the two chains separately.
Input Arguments mc — Discrete-time Markov chain dtmc object Discrete-time Markov chain with NumStates states and transition matrix P, specified as a dtmc object. P must be fully specified (no NaN entries).
Output Arguments tf — Reducibility flag true | false Reducibility flag, returned as true if mc is a reducible Markov chain and false otherwise.
More About Reducible Chain A Markov chain is reducible if it consists of more than one communicating class. Asymptotic analysis is reduced to individual subclasses. See classify and asymptotics. 12-1702
isreducible
Algorithms • The Markov chain mc is irreducible if every state is reachable from every other state in at most n – 1 steps, where n is the number of states (mc.NumStates). This result is equivalent to Q = (I + Z)n –1 containing all positive elements. I is the n-by-n identity matrix. The zero-pattern matrix of the transition matrix P (mc.P) is Zij = I(Pij > 0), for all i,j [2]. To determine reducibility, isreducible computes Q. • By the Perron-Frobenius Theorem [2], irreducible Markov chains have unique stationary distributions. Unichains, which consist of a single recurrent class plus transient classes, also have unique stationary distributions (with zero probability mass in the transient classes). Reducible chains with multiple recurrent classes have stationary distributions that depend on the initial distribution.
Version History Introduced in R2017b
References [1] Gallager, R.G. Stochastic Processes: Theory for Applications. Cambridge, UK: Cambridge University Press, 2013. [2] Horn, R., and C. R. Johnson. Matrix Analysis. Cambridge, UK: Cambridge University Press, 1985.
See Also Objects dtmc Functions asymptotics | classify | isergodic Topics “Markov Chain Modeling” on page 10-8 “Create and Modify Markov Chain Model Objects” on page 10-17 “Determine Asymptotic Behavior of Markov Chain” on page 10-39 “Identify Classes in Markov Chain” on page 10-47
12-1703
12
Functions
isStable Determine stability of lag operator polynomial
Syntax [indicator,eigenvalues] = isStable(A)
Description [indicator,eigenvalues] = isStable(A) takes a lag operator polynomial object A and checks if it is stable. The stability condition requires that the magnitudes of all roots of the characteristic polynomial are less than 1 to within a small numerical tolerance.
Input Arguments A Lag operator polynomial object, as produced by LagOp.
Output Arguments indicator Boolean value for the stability test. true indicates that A(L) is stable and that the magnitude of all eigenvalues of its characteristic polynomial are less than one; false indicates that A(L) is unstable and that the magnitude of at least one of the eigenvalues of its characteristic polynomial is greater than or equal to one. eigenvalues Eigenvalues of the characteristic polynomial associated with A(L). The length of eigenvalues is the product of the degree and dimension of A(L).
Examples Check a Lag Operator Polynomial for Stability Divide two Lag Operator polynomial objects and check if the resulting polynomial is stable: A = LagOp({1 -0.6 0.08}); B = LagOp({1 -0.5}); [indicator,eigenvalues]=isStable(A\B) indicator = logical 1 eigenvalues = 4×1 complex
12-1704
isStable
0.3531 -0.0723 -0.0723 -0.3086
+ + +
0.0000i 0.3003i 0.3003i 0.0000i
Tips • Zero-degree polynomials are always stable. • For polynomials of degree greater than zero, the presence of NaN-valued coefficients returns a false stability indicator and vector of NaNs in eigenvalues. • When testing for stability, the comparison incorporates a small numerical tolerance. The indicator is true when the magnitudes of all eigenvalues are less than 1-10*eps, where eps is machine precision. Users who wish to incorporate their own tolerance (including 0) may simply ignore indicator and determine stability as follows: [~,eigenvalues] = isStable(A); indicator = all(abs(eigenvalues) < (1-tol));
for some small, nonnegative tolerance tol.
References [1] Hamilton, J. D. Time Series Analysis. Princeton, NJ: Princeton University Press, 1994.
See Also mldivide | mrdivide Topics “Specify Lag Operator Polynomials” on page 2-9 “Plot the Impulse Response Function of Conditional Mean Model” on page 7-80
12-1705
12
Functions
jcitest Johansen cointegration test
Syntax h = jcitest(Y) h = jcitest(Tbl) h = jcitest( ___ ,Name=Value) [h,pValue,stat,cValue] = jcitest( ___ ) [h,pValue,stat,cValue,mles] = jcitest( ___ )
Description h = jcitest(Y) returns the rejection decisions h from conducting the Johansen test, which assesses each null hypothesis H(r) of cointegration rank less than or equal to r among the numDimsdimensional multivariate time series Y against the alternatives H(numDims) (trace test) or H(r + 1) (maxeig test). The tests produce maximum likelihood estimates of the parameters in a vector errorcorrection (VEC on page 12-1717) model of the cointegrated series. h = jcitest(Tbl) returns rejection decisions from conducting the Johansen test on the variables of the table or timetable Tbl. To select a subset of variables in Tbl to test, use the DataVariables name-value argument. h = jcitest( ___ ,Name=Value) uses additional options specified by one or more name-value arguments, using any input-argument combination in the previous syntaxes. Some options control the number of tests to conduct. The following conditions apply when jcitest conducts multiple tests: • jcitest treats each test as separate from all other tests. • Each row of all outputs contains the results of the corresponding test. For example, jcitest(Tbl,Model="H2",DataVariables=1:5) tests the first 5 variables in the input table Tbl using the Johansen model that excludes all deterministic terms. [h,pValue,stat,cValue] = jcitest( ___ ) displays, at the command window, the results of the Johansen test and returns the p-values pValue, test statistics stat, and critical values cValue of the test. The results display includes the ranks r, corresponding rejection decisions, p-values, decision statistics, and specified options. [h,pValue,stat,cValue,mles] = jcitest( ___ ) also returns a structure of maximum likelihood estimates associated with the VEC(q) model of the multivariate time series yt.
Examples
12-1706
jcitest
Conduct Johansen Cointegration Test on Matrix of Data Test a multivariate time series for cointegration using the default values of the Johansen cointegration test. Input the time series data as a numeric matrix. Load data of Canadian inflation and interest rates Data_Canada.mat, which contains the series in the matrix Data. load Data_Canada series' ans = 5x1 cell {'(INF_C) Inflation rate (CPI-based)' } {'(INF_G) Inflation rate (GDP deflator-based)'} {'(INT_S) Interest rate (short-term)' } {'(INT_M) Interest rate (medium-term)' } {'(INT_L) Interest rate (long-term)' }
Test the interest rate series for cointegration by using the Johansen cointegration test. Use default options and return the rejection decision. h = jcitest(Data(:,3:end)) h=1×7 table
t1
r0 _____
r1 _____
r2 _____
Model ______
true
true
false
{'H1'}
Lags ____ 0
Test _________
Alpha _____
{'trace'}
0.05
By default, jcitest conducts the trace test and uses the H1 Johansen form by default. The test fails to reject the null hypothesis of rank 2 cointegration in the series.
Conduct Default Johansen Cointegration Test on Table Variables Conduct the Johansen cointegration test on a multivariate time series using default options, which tests all table variables. Load data of Canadian inflation and interest rates Data_Canada.mat. Convert the table DataTable to a timetable. load Data_Canada dates = datetime(dates,12,31); TT = table2timetable(DataTable,RowTimes=dates); TT.Observations = [];
Conduct the Johansen cointegration test by passing the timetable to jcitest and using default options. jcitest tests for cointegration among all table variables by default. h = jcitest(TT) h=1×9 table r0 _____
r1 _____
r2 _____
r3 _____
r4 _____
Model ______
Lags ____
Test _________
Alpha _____
12-1707
12
Functions
t1
true
true
false
false
true
{'H1'}
0
{'trace'}
The test fails to reject the null hypotheses of rank 2 and 3 cointegration among the series. By default, jcitest includes all input table variables in the cointegration test. To select a subset of variables to test, set the DataVariables option.
Conduct Johansen Test for Each Test Statistic jcitest supports two types Johansen tests. Conduct a test for each type. Load data of Canadian inflation and interest rates Data_Canada.mat. Convert the table DataTable to a timetable. Identify the interest rate series. load Data_Canada dates = datetime(dates,12,31); TT = table2timetable(DataTable,RowTimes=dates); TT.Observations = []; idxINT = contains(TT.Properties.VariableNames,"INT");
Conduct the Johansen cointegration test to assess cointegration among the interest rate series. Specify both test types trace and maxeig, and set the level of significance to 2.5%. h = jcitest(TT,DataVariables=idxINT,Test=["trace" "maxeig"],Alpha=0.025) h=2×7 table
t1 t2
r0 _____
r1 _____
r2 _____
Model ______
true false
false false
false false
{'H1'} {'H1'}
Lags ____ 0 0
Test __________
Alpha _____
{'trace' } {'maxeig'}
0.025 0.025
h is a 2-row table; rows contain results of separate tests. At 2.5% level of significance: • The trace test fails to reject the null hypotheses of ranks 1 and 2 cointegration among the series. • The maxeig test fails to reject the null hypotheses for each cointegration rank.
Return Test p-Values and Decision Statistics Load data of Canadian inflation and interest rates Data_Canada.mat. Convert the table DataTable to a timetable. Identify the interest rate series. load Data_Canada dates = datetime(dates,12,31); TT = table2timetable(DataTable,RowTimes=dates); TT.Observations = []; idxINT = contains(TT.Properties.VariableNames,"INT");
12-1708
0.05
jcitest
Conduct the Johansen cointegration test to assess cointegration among the interest rate series. Specify both test types trace and maxeig. [h,pValue,stat,cValue] = jcitest(TT,DataVariables=idxINT,Test=["trace" "maxeig"]) ************************ Results Summary (Test 1) Data: TT Effective sample size: 40 Model: H1 Lags: 0 Statistic: trace Significance level: 0.05 r h stat cValue pValue eigVal ---------------------------------------0 1 37.6886 29.7976 0.0050 0.4101 1 1 16.5770 15.4948 0.0343 0.2842 2 0 3.2003 3.8415 0.0737 0.0769 ************************ Results Summary (Test 2) Data: TT Effective sample size: 40 Model: H1 Lags: 0 Statistic: maxeig Significance level: 0.05 r h stat cValue pValue eigVal ---------------------------------------0 0 21.1116 21.1323 0.0503 0.4101 1 0 13.3767 14.2644 0.0687 0.2842 2 0 3.2003 3.8415 0.0737 0.0769 h=2×7 table
t1 t2
r0 _____
r1 _____
r2 _____
Model ______
true false
true false
false false
{'H1'} {'H1'}
pValue=2×7 table r0 _________ t1 t2
0.0050497 0.050346
stat=2×7 table r0 ______
Lags ____ 0 0
r1 ________
r2 ________
Model ______
0.034294 0.06874
0.073661 0.073661
{'H1'} {'H1'}
r1 ______
r2 ______
Model ______
Lags ____
Test __________
Alpha _____
{'trace' } {'maxeig'}
0.05 0.05
Lags ____ 0 0
Test __________
Alpha _____
{'trace' } {'maxeig'}
0.05 0.05
Test __________
Alpha _____
12-1709
12
Functions
t1 t2
37.689 21.112
16.577 13.377
3.2003 3.2003
{'H1'} {'H1'}
cValue=2×7 table r0 ______
r1 ______
r2 ______
Model ______
15.495 14.264
3.8415 3.8415
{'H1'} {'H1'}
t1 t2
29.798 21.132
0 0
Lags ____ 0 0
{'trace' } {'maxeig'}
0.05 0.05
Test __________
Alpha _____
{'trace' } {'maxeig'}
0.05 0.05
jcitest prints a results display for each test to the command window. All outputs are tables containing the corresponding statistics and test options.
Plot Estimated Cointegrating Relations Load data of Canadian inflation and interest rates Data_Canada.mat. Convert the table DataTable to a timetable. load Data_Canada dates = datetime(dates,12,31); TT = table2timetable(DataTable,RowTimes=dates); TT.Observations = []; idxINT = contains(TT.Properties.VariableNames,"INT");
Plot the interest series. plot(TT.Time,TT{:,idxINT}) legend(series(idxINT),Location="northwest") grid on
12-1710
jcitest
Test the interest rate series for cointegration; use the default Johansen form H1. Return all outputs. [h,pValue,stat,cValue,mles] = jcitest(TT,DataVariables=idxINT); ************************ Results Summary (Test 1) Data: TT Effective sample size: 40 Model: H1 Lags: 0 Statistic: trace Significance level: 0.05 r h stat cValue pValue eigVal ---------------------------------------0 1 37.6886 29.7976 0.0050 0.4101 1 1 16.5770 15.4948 0.0343 0.2842 2 0 3.2003 3.8415 0.0737 0.0769 h h=1×7 table r0 _____
r1 _____
r2 _____
Model ______
Lags ____
Test _________
Alpha _____
12-1711
12
Functions
t1
true
true
false
{'H1'}
0
{'trace'}
0.05
pValue pValue=1×7 table r0 _________ t1
0.0050497
r1 ________
r2 ________
Model ______
0.034294
0.073661
{'H1'}
Lags ____ 0
Test _________
Alpha _____
{'trace'}
0.05
The test fails to reject the null hypothesis of rank 2 cointegration in the series. Plot the estimated cointegrating relations B′yt − 1 + c0: TTLag = lagmatrix(TT,1); T = height(TTLag); B = mles.r2.paramVals.B; c0 = mles.r2.paramVals.c0; plot(TTLag.Time,TTLag{:,idxINT}*B+repmat(c0',T,1)) grid on
Input Arguments Y — Data representing observations of multivariate time series yt numeric matrix 12-1712
jcitest
Data representing observations of a multivariate time series yt, specified as a numObs-by-numDims numeric matrix. Each column of Y corresponds to a variable, and each row corresponds to an observation. Data Types: double Tbl — Data representing observations of multivariate time series yt table | timetable Data representing observations of a multivariate time series yt, specified as a table or timetable with numObs rows. Each row of Tbl is an observation. To select a subset of variables in Tbl to test, use the DataVariables name-value argument. Note jcitest removes the following observations from the specified data: • All rows containing at least one missing observation, represented by a NaN value • From the beginning of the data, initial values required to initialize lagged variables
Name-Value Arguments Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter. Before R2021a, use commas to separate each name and value, and enclose Name in quotes. Example: jcitest(Tbl,Model="H2",DataVariables=1:5) tests the first 5 variables in the input table Tbl using the Johansen model that excludes all deterministic terms. Model — Johansen form of VEC(q) model deterministic terms "H1" (default) | "H2" | "H1*" | "H*" | "H" | character vector | string vector | cell vector of character vectors Johansen form of the VEC(q) model deterministic terms [3], specified as a Johansen form name in the table, or a string vector or cell vector of character vectors of such values (for model parameter definitions, see “Vector Error-Correction (VEC) Model” on page 12-1717). Value
Error-Correction Term
Description
"H2"
AB´yt − 1
No intercepts or trends are present in the cointegrating relations, and no deterministic trends are present in the levels of the data. Specify this model only when all response series have a mean of zero.
"H1*"
A(B´yt−1+c0)
Intercepts are present in the cointegrating relations, and no deterministic trends are present in the levels of the data.
"H1"
A(B´yt−1+c0)+c1
Intercepts are present in the cointegrating relations, and deterministic linear trends are present in the levels of the data.
12-1713
12
Functions
Value
Error-Correction Term
Description
"H*"
A(B´yt−1+c0+d0t)+c1
Intercepts and linear trends are present in the cointegrating relations, and deterministic linear trends are present in the levels of the data.
"H"
A(B´yt−1+c0+d0t)+c1+d1t
Intercepts and linear trends are present in the cointegrating relations, and deterministic quadratic trends are present in the levels of the data. If quadratic trends are not present in the data, this model can produce good in-sample fits but poor out-of-sample forecasts.
jcitest conducts a separate test for each value in Model. Example: Model="H1*" uses the Johansen form H1* for all tests. Example: Model=["H1*" "H1"] uses Johansen form H1* for the first test, and then uses Johansen form H1 for the second test. Data Types: string | char | cell Lags — Number of lagged differences q 0 (default) | nonnegative integer | vector of nonnegative integers Number of lagged differences q in the VEC(q) model, specified as a nonnegative integer or vector of nonnegative integers. jcitest conducts a separate test for each value in Lags. Example: Lags=1 includes Δyt – 1 in the model for all tests. Example: Lags=[0 1] includes no lags in the model for the first test, and then includes Δyt – 1 in the model for the second test. Data Types: double Test — Test to perform "trace" (default) | "maxeig" | character vector | string vector | cell vector of character vectors Test to perform, specified as a value in this table, or a string vector or cell vector of character vectors of such values. Value
Description
"trace"
The alternative hypothesis is H(numDims), and the test statistics are −T log(1 − λr + 1) + … + log(1 − λm) .
"maxieig"
The alternative hypothesis is H(r + 1), and the test statistics are −Tlog(1 − λr + 1) .
Both tests assess the null hypothesis H(r) of cointegration rank less than or equal to r. jcitest computes statistics using the effective sample size T ≤ nnumObs and ordered estimates of the eigenvalues of C = AB′, λ1 > ... > λm, where m = numDims. 12-1714
jcitest
jcitest conducts a separate test for each value in Test. Example: Test="maxeig" conducts the maxeig test for all tests. Example: Test=["maxeig" "trace"] conducts the maxeig test for the first test, and then conducts the trace test for the second test. Data Types: char | string | cell Alpha — Nominal significance level 0.05 (default) | numeric scalar | numeric vector Nominal significance level for the hypothesis test, specified as a numeric scalar between 0.001 and 0.999 or a numeric vector of such values. jcitest conducts a separate test for each value in Alpha. Example: Alpha=[0.01 0.05] uses a level of significance of 0.01 for the first test, and then uses a level of significance of 0.05 for the second test. Data Types: double Display — Command window display control "off" | "summary" | "params" | "full" Command window display control, specified as a value in this table. Value
Description
"off"
jcitest does not display the results to the command window. If jcitest returns h or no outputs, this display is the default.
"summary"
jcitest displays a tabular summary of test results. The tabular display includes null ranks r = 0:(numDims − 1) in the first column of each summary. jcitest displays multiple test results in separate summaries. When jcitest returns any other output than h (for example, pValue), this display is the default. You cannot set this display when jcitest returns h or no outputs.
"params"
jcitest displays maximum likelihood estimates of the parameter values associated with the reduced-rank VEC(q) model of yt. You can set this display only when jcitest returns mles. jcitest returns the displayed parameter estimates in the field mles.rn(j).paramVals for null rank r = n and test j.
"full"
jcitest displays both "summary" and "params".
Example: Display="off" Data Types: char | string DataVariables — Variables in Tbl all variables (default) | string vector | cell vector of character vectors | vector of integers | logical vector Variables in Tbl for which jcitest conducts the test, specified as a string vector or cell vector of character vectors containing variable names in Tbl.Properties.VariableNames, or an integer or logical vector representing the indices of names. The selected variables must be numeric. Example: DataVariables=["GDP" "CPI"] 12-1715
12
Functions
Example: DataVariables=[true true false false] or DataVariables=[1 2] selects the first and second table variables. Data Types: double | logical | char | cell | string Note • When jcitest conducts multiple tests, the function applies all single settings (scalars or character vectors) to each test. • All vector-valued specifications that control the number of tests must have equal length. • A lagged and differenced time series has a reduced sample size. Absent presample values, if the test series yt is defined for t = 1,…,T, the lagged series yt– k is defined for t = k+1,…,T. The first difference applied to the lagged series yt– k further reduces the time base to k+2,…,T. With p lagged differences, the common time base is p+2,…,T and the effective sample size is T–(p+1).
Output Arguments h — Test rejection decisions table Test rejection decisions, returned as a numTests-by-(numDims + 3) table, where numTests is the number of tests, which is determined by specified options. Row j of h corresponds to test j with options. Rows of h correspond to tests specified by the values of the last three variables Model, Test, and Alpha. Row labels are t1, t2, …, tu, where u = numTests. Variables of h correspond to different, maintained cointegration ranks r = 0, 1, …, numDims – 1 and specified name-value arguments that control the number of tests. Variable labels are r0, r1, …, rR, where R = numDims – 1, and Model, Test, and Alpha. To access results, for example, the result for test j of null rank k, use h.rk(j). Variable k, labeled rk, is logical vector whose entries have the following interpretations: • 1 (true) indicates rejection of the null hypothesis of cointegration rank k in favor of the alternative hypothesis. • 0 (false) indicates failure to reject the null hypothesis of cointegration rank k. pValue — Test statistic p-values table Test statistic p-values, returned as a table with the same dimensions and labels as h. Variable k, labeled rk, is a numeric vector of p-values for the corresponding tests. The p-values are right-tailed probabilities. When test statistics are outside tabulated critical values, jcitest returns maximum (0.999) or minimum (0.001) p-values. stat — Test statistics table 12-1716
jcitest
Test statistics, returned as a table with the same dimensions and labels as h. The Test setting of a particular test determines the test statistic. cValue — Critical values table Critical values, returned as a table with the same dimensions and labels as h. Variable k, labeled rk, is a numeric vector of critical values for the corresponding tests. The critical values are for righttailed probabilities determined by Alpha. jcitest loads tables of critical values from the file Data_JCITest.mat, and then linearly interpolates test critical values from the tables. Critical values in the tables derive from methods described in [4]. mles — Maximum likelihood estimates (MLE) associated with VEC(q) model of yt structure array Maximum likelihood estimates associated with the VEC(q) model of yt, returned as a table with the same dimensions and labels as h. Variable k, labeled rk, is a structure array of MLEs with elements for the corresponding tests. Each element of mles.rk has the fields in this table. You can access a field using dot notation, for example, mles.r2(3).paramVals contains the parameter estimates of the third test corresponding to the null hypothesis of rank 2 cointegration. Field
Description
paramNames
Cell vector of parameter names, of the form: {A, B, B1, … Bq, c0, d0, c1, d1} Elements depend on the values of the Lags and Model namevalue arguments.
paramVals
Structure of parameter estimates with field names corresponding to the parameter names in paramNames.
res
T-by-numDims matrix of residuals, where T is the effective sample size, obtained by fitting the VEC(q) model of yt to the input data.
EstCov
Estimated covariance Q of the innovations process εt.
eigVal
Eigenvalue associated with H(r).
eigVec
Eigenvector associated with the eigenvalue in eigVal. Eigenvectors v are normalized so that v′S11v = 1, where S11 is defined as in [3].
rLL
Restricted loglikelihood of yt under the null.
uLL
Unrestricted loglikelihood of yt under the alternative.
More About Vector Error-Correction (VEC) Model A vector error-correction (VEC) model is a multivariate, stochastic time series model consisting of a system of m = numDims equations of m distinct, differenced response variables. Equations in the 12-1717
12
Functions
system can include an error-correction term, which is a linear function of the responses in levels used to stabilize the system. The cointegrating rank r is the number of cointegrating relations that exist in the system. Each response equation can include a degree q autoregressive polynomial composed of first differences of the response series, a constant, a time trend, and a constant and time trend in the error-correction term. Expressed in lag operator notation, a VEC(q) model for a multivariate time series yt is Φ(L)(1 − L)yt = A B′yt − 1 + c0 + d0t + c1 + d1t + εt = c + dt + Cyt − 1 + εt, where • yt is an m = numDims dimensional time series corresponding to m response variables at time t, t = 1,...,T. • Φ(L) = I − Φ1 − Φ2 − ... − Φq, I is the m-by-m identity matrix, and Lyt = yt – 1. • The cointegrating relations are B'yt – 1 + c0 + d0t and the error-correction term is A(B'yt – 1 + c0 + d0t). • r is the number of cointegrating relations and, in general, 0 ≤ r ≤ m. • A is an m-by-r matrix of adjustment speeds. • B is an m-by-r cointegration matrix. • C = AB′ is an m-by-m impact matrix with a rank of r. • c0 is an r-by-1 vector of constants (intercepts) in the cointegrating relations. • d0 is an r-by-1 vector of linear time trends in the cointegrating relations. • c1 is an m-by-1 vector of constants (deterministic linear trends in yt). • d1 is an m-by-1 vector of linear time-trend values (deterministic quadratic trends in yt). • c = Ac0 + c1 and is the overall constant. • d = Ad0 + d1 and is the overall time-trend coefficient. • Φj is an m-by-m matrix of short-run coefficients, where j = 1,...,q and Φq is not a matrix containing only zeros. • εt is an m-by-1 vector of random Gaussian innovations, each with a mean of 0 and collectively an m-by-m covariance matrix Σ. For t ≠ s, εt and εs are independent. If m = r, then the VEC model is a stable VAR(q + 1) model in the levels of the responses. If r = 0, then the error-correction term is a matrix of zeros, and the VEC(q) model is a stable VAR(q) model in the first differences of the responses.
Tips • To convert VEC(q) model parameters in the mles output to VAR(q + 1) model parameters, use vec2var. • To test linear constraints on the error-correction speeds A and the space of cointegrating relations spanned by B, use jcontest. 12-1718
jcitest
Algorithms • jcitest identifies deterministic terms that are outside of the cointegrating relations, c1 and d1, by projecting constant and linear regression coefficients, respectively, onto the orthogonal complement of A. • If jcitest fails to reject the null hypothesis of cointegration rank r = 0, the inference is that the error-correction coefficient C is zero, and the VEC(q) model reduces to a standard VAR(q) model in first differences. If jcitest rejects all cointegration ranks r less than numDims, the inference is that C has full rank, and yt is stationary in levels. • The parameters A and B in the reduced-rank VEC(q) model are not identifiable, but their product C = AB′ is identifiable. jcitest constructs B = V(:,1:r) using the orthonormal eigenvectors V returned by eig, and then renormalizes so that V'*S11*V = I [3]. • The time series in the specified input data can be stationary in levels or first differences (that is, I(0) or I(1)). Rather than pretesting series for unit roots (using, e.g., adftest, pptest, kpsstest, or lmctest), the Johansen procedure formulates the question within the model. An I(0) series is associated with a standard unit vector in the space of cointegrating relations, and the jcontest can test for its presence. • Deterministic cointegration, where cointegrating relations, perhaps with an intercept, produce stationary series, is the traditional sense of cointegration introduced by Engle and Granger [1] (see egcitest). Stochastic cointegration, where cointegrating relations produce trend-stationary series (that is, d0 is nonzero), extends the definition of cointegration to accommodate a greater variety of economic series. • Unless higher-order trends are present in the data, models with fewer restrictions can produce good in-sample fits, but poor out-of-sample forecasts.
Alternative Functionality App The Econometric Modeler app enables you to conduct the Johansen cointegration test.
Version History Introduced in R2011a
References [1] Engle, R. F. and C. W. J. Granger. "Co-Integration and Error-Correction: Representation, Estimation, and Testing." Econometrica. Vol. 55, 1987, pp. 251–276. [2] Hamilton, James D. Time Series Analysis. Princeton, NJ: Princeton University Press, 1994. [3] Johansen, S. Likelihood-Based Inference in Cointegrated Vector Autoregressive Models. Oxford: Oxford University Press, 1995. [4] MacKinnon, J. G. "Numerical Distribution Functions for Unit Root and Cointegration Tests." Journal of Applied Econometrics. Vol. 11, 1996, pp. 601–618. [5] Turner, P. M. "Testing for Cointegration Using the Johansen Approach: Are We Using the Correct Critical Values?" Journal of Applied Econometrics. v. 24, 2009, pp. 825–831. 12-1719
12
Functions
See Also Objects vecm | varm Functions estimate | vec2var | egcitest | jcontest Topics “Cointegration and Error Correction Analysis” on page 9-107 “Identifying Single Cointegrating Relations” on page 9-113 “Compare Approaches to Cointegration Analysis” on page 9-142 “Test for Cointegration Using the Johansen Test” on page 9-137 “Test Cointegrating Vectors” on page 9-146 “Estimate VEC Model Parameters Using jcitest” on page 9-139 “Testing Cointegrating Vectors and Adjustment Speeds” on page 9-145 “Specifying Multivariate Lag Operator Polynomials and Coefficient Constraints Interactively” on page 4-50 “Estimate Vector Error-Correction Model Using Econometric Modeler” on page 4-180
12-1720
jcontest
jcontest Johansen constraint test
Syntax h = jcontest(Y,r,test,Cons) [h,pValue,stat,cValue] = jcontest(Y,r,test,Cons) StatTbl = jcontest(Tbl,r,test,Cons) [ ___ ] = jcontest( ___ ,Name=Value) [ ___ ,mles] = jcontest( ___ )
Description h = jcontest(Y,r,test,Cons) returns the rejection decisions h from conducting the Johansen constraint test, which assesses linear constraints on either the error-correction (adjustment) speeds A or the cointegration space spanned by the cointegrating matrix B in the reduced-rank VEC(q) on page 12-1734 model of the multivariate time series yt, where: • Y is a matrix of observations of yt. • r is the common rank of matrices A and B. • test specifies the constraint types, including linear or equality constraints on A or B. • Cons specifies the test constraint values. For a particular test, the constraint type and values form the null hypotheses tested against the alternative hypothesis H(r) of cointegration rank less than or equal to r (an unconstrained VEC model). The tests also produce maximum likelihood estimates of the parameters in the VEC(q) model, subject to the constraints. Each element of test and Cons results in a separate test. [h,pValue,stat,cValue] = jcontest(Y,r,test,Cons) also returns the p-values pValue, test statistics stat, and critical values cValue of the test. StatTbl = jcontest(Tbl,r,test,Cons) returns the table StatTbl containing variables for the test results, statistics, and settings from conducting the Johansen constraint test on all variables of the input table or timetable Tbl. To select a subset of variables in Tbl to test, use the DataVariables name-value argument. [ ___ ] = jcontest( ___ ,Name=Value) specifies options using one or more name-value arguments in addition to any of the input argument combinations in previous syntaxes. jcontest returns the output argument combination for the corresponding input arguments. In addition to bp, some options control the number of tests to conduct. For example, jcontest(Tbl,r,test,Cons,Model="H2",DataVariables=1:5) tests the first 5 variables in the input table Tbl using the Johansen model that excludes all deterministic terms. [ ___ ,mles] = jcontest( ___ ) also returns a structure of maximum likelihood estimates associated with the constrained VEC(q) models of the multivariate time series yt. 12-1721
12
Functions
Examples Conduct Johansen Constraint Test on Matrix of Data Test for weak exogeneity of a time series, with respect to other series in the system, by using default options of jcontest. Input the time series data as a numeric matrix. Load data of Canadian inflation and interest rates Data_Canada.mat, which contains the series in the matrix Data. load Data_Canada series' ans = 5x1 cell {'(INF_C) Inflation rate (CPI-based)' } {'(INF_G) Inflation rate (GDP deflator-based)'} {'(INT_S) Interest rate (short-term)' } {'(INT_M) Interest rate (medium-term)' } {'(INT_L) Interest rate (long-term)' }
Use the Johansen constraint test to assess whether the CPI-based inflation rate y1, t is weakly exogenous with respect to the three interest rate series by testing the following constraint in a 4-D VEC model of the series: (1 − L)y1, t = c + εt . Specify a rank of 1 for the test, a linear constraint on the 4-by-1 adjustment speed vector A so that a1 = 0, and default options. Return the rejection decision. Cons = [1; 0; 0; 0]; Y = Data(:,[1 3:5]); h = jcontest(Y,1,"ACon",Cons) h = logical 0
Given default options and assumptions, h = 0 suggests that the test fails to reject the null hypothesis of the constrained model, which is that the inflation rate is weakly exogenous with respect to the interest rate series.
Return Test p-Values and Decision Statistics Load data of Canadian inflation and interest rates Data_Canada.mat, which contains the series in the matrix Data. load Data_Canada
Conduct the default Johansen constraint test to assess whether the CPI-based inflation rate is weakly exogenous with respect to the interest rate series. Return the test decisions and p-values. 12-1722
jcontest
Cons = [1; 0; 0; 0]; Y = Data(:,[1 3:5]); [h,pValue,stat,cValue] = jcontest(Y,1,"ACon",Cons) h = logical 0 pValue = 0.3206 stat = 0.9865 cValue = 3.8415
Conduct Johansen Constraint Test on Table Variables Test for weak exogeneity of a time series, which are variables in a table, with respect to the other time series in the table. Return a table of results. Load data of Canadian inflation and interest rates Data_Canada.mat. Convert the table DataTable to a timetable. load Data_Canada dates = datetime(dates,ConvertFrom="datenum"); TT = table2timetable(DataTable,RowTimes=dates); TT.Observations = [];
Use the Johansen constraint test to assess whether the CPI-based and GDP-deflator-based inflation rates ( y1, t and y2, t, respectively) are weakly exogenous with respect to the three interest rate series by testing the following constraint in a 5-D VEC model of the series: (1 − L)y1, t = c1, 1 + ε1, t (1 − L)y2, t = c2, 1 + ε2, t . Specify a rank of 2 for the test, a linear constraint on the 4-by-2 adjustment speed vector A so that a1 = 0 and a2 = 0, and default options. Cons = [1 0 0 0 0 StatTbl =
0; 1; 0; 0; 0]; jcontest(TT,2,"ACon",Cons)
StatTbl=1×8 table h _____ Test 1
true
pValue __________
stat ______
cValue ______
1.3026e-05
27.907
9.4877
Lags ____ 0
Alpha _____
Model ______
Test ________
0.05
{'H1'}
{'acon'}
StatTbl.h = 1 means that the test rejects the null hypothesis of the constrained model that the inflation rates are jointly weakly exogenous. StatTbl.pValue = 1.3026e-5 suggests that the evidence to reject is strong. 12-1723
12
Functions
By default, jcontest conducts the Johansen constraint test on all variables in the input table. To select a subset of variables from an input table, set the DataVariables option.
Test Purchasing Power Parity Using jcontest Use the Johansen framework to test multivariate time series with the following characteristics: 1
The log Australian CPI, log US CPI, and the exchange rate series of the countries are stationary.
2
The three series exhibit cointegration.
3
The Australian and the US dollars have the same purchasing power.
Load and Inspect Data Load the data on Australian and U.S. prices Data_JAustralian.mat, which contains the table DataTable. Convert the table to a timetable. load Data_JAustralian dates = datetime(dates,ConvertFrom="datenum"); TT = table2timetable(DataTable,RowTimes=dates); TT.Dates = [];
Plot the log Australian and log US CPI series (PAU and PUS, respectively), and the log AUD/USD exchange rate series EXCH. varnames = ["PAU" "PUS" "EXCH"]; plot(TT.Time,TT{:,varnames}) legend(varnames,Location="best") grid on
12-1724
jcontest
Pretest for Stationarity Use jcontest to test the null hypothesis that the individual series are stationary by specifying, for each variable j, the following constrained model a1, j (1 − L)yt = a2, j (yt − 1 + c0) + c1 + εt . a3, j Specify the variables to use in the test. Cons = num2cell(eye(3),1) Cons=1×3 cell array {3x1 double} {3x1 double}
{3x1 double}
StatTbl0 = jcontest(TT,1,"BVec",Cons,DataVariables=varnames) StatTbl0=3×8 table h _____ Test 1 Test 2 Test 3
true true false
pValue __________
stat ______
cValue ______
1.307e-05 1.0274e-05 0.06571
22.49 22.972 5.445
5.9915 5.9915 5.9915
Lags ____ 0 0 0
Alpha _____
Model ______
Test ________
0.05 0.05 0.05
{'H1'} {'H1'} {'H1'}
{'bvec'} {'bvec'} {'bvec'}
12-1725
12
Functions
jcontest returns a table of test results. Each row corresponds to a separate test and columns correspond to results or specified options for each test. StatTbl.h(j) = 1 rejects the null hypothesis of stationarity of variable j, and StatTbl.h(j) = 0 fails to reject stationarity. Test for Cointegration Test for cointegration by using jcitest. StatTbl1 = jcitest(TT,DataVariables=varnames) StatTbl1=1×7 table r0 r1 _____ _____ t1
true
false
r2 _____
Model ______
false
{'H1'}
Lags ____ 0
Test _________
Alpha _____
{'trace'}
0.05
StatTbl1.r1 = 0 and StatTbl1.r2 = 0 suggest that the series exhibit at least rank 1 cointegration. Test for Purchasing Power Parity Test for purchasing power parity PAU = PUS + EXCH. StatTbl2 = jcontest(TT,1,"BCon",[1 -1 -1]',DataVariables=varnames) StatTbl2=1×8 table h _____ Test 1
false
pValue ________
stat ______
cValue ______
0.053995
3.7128
3.8415
Lags ____ 0
Alpha _____
Model ______
Test ________
0.05
{'H1'}
{'bcon'}
StatTbl2.h = 0 means that the test fails to reject the null hypothesis of the constrained model, that is, purchasing power parity between the models should not be rejected.
Inspect Maximum Likelihood Estimates of Constrained VEC Models Compare the four types of supported constraints on the adjustment-speed and cointegrating matrices. Load the data on Australian and US prices Data_JAustralian.mat, which contains the table DataTable. Convert the table to a timetable. Consider a 3-D VEC model consisting of the log Australian and US CPI, and the log AUD/USD exchange rate series. load Data_JAustralian dates = datetime(dates,ConvertFrom="datenum"); TT = table2timetable(DataTable,RowTimes=dates); TT.Dates = []; varnames = ["PAU" "PUS" "EXCH"];
Conduct the four Johansen constraint tests; specify the arbitrary constraint value 1 −1 −1 ′. Return the test results and maximum likelihood estimates of the constrained model. 12-1726
jcontest
[StatTbl,mle] = jcontest(TT,1,["ACon" "AVec" "BCon" "BVec"], ... [1 -1 -1]',DataVariables=varnames); StatTbl StatTbl=4×8 table h _____ Test Test Test Test
1 2 3 4
false true false false
pValue __________
stat ______
cValue ______
0.11047 3.0486e-08 0.053995 0.074473
2.5475 34.612 3.7128 5.1946
3.8415 5.9915 3.8415 5.9915
Lags ____ 0 0 0 0
Alpha _____
Model ______
Test ________
0.05 0.05 0.05 0.05
{'H1'} {'H1'} {'H1'} {'H1'}
{'acon'} {'avec'} {'bcon'} {'bvec'}
mle is a 4-by-1 structure array with fields containing the maximum likelihood estimates of the constrained models for each test. For each test, display the estimates of A and B, and compute the MLE of the impact matrix C = A B′. The impactmat function is a local supporting function on page 12-1729 that computes the MLE of the impact matrix and displays the estimated matrices. [AACon,BACon,CACon] = impactmat(mle(1)) AACon = 3×1 0.0043 0.0055 -0.0012 BACon = 3×1 2.8496 -2.3341 -6.2670 CACon = 3×3 0.0121 0.0156 -0.0035
-0.0099 -0.0128 0.0028
-0.0267 -0.0343 0.0076
[1 -1 -1]*AACon ans = -1.7347e-18 [AAVec,BAVec,CAVec] = impactmat(mle(2)) AAVec = 3×1 1 -1 -1 BAVec = 3×1
12-1727
12
Functions
-0.0204 0.0158 0.0246 CAVec = 3×3 -0.0204 0.0204 0.0204
0.0158 -0.0158 -0.0158
0.0246 -0.0246 -0.0246
[ABCon,BBCon,CBCon] = impactmat(mle(3)) ABCon = 3×1 -0.0043 -0.0052 -0.0089 BBCon = 3×1 1.8001 -3.9210 5.7211 CBCon = 3×3 -0.0078 -0.0094 -0.0159
0.0170 0.0206 0.0347
-0.0248 -0.0300 -0.0507
[1 -1 -1]*BBCon ans = 0 [ABVec,BBVec,CBVec] = impactmat(mle(4)) ABVec = 3×1 0.0252 0.0422 0.0556 BBVec = 3×1 1 -1 -1 CBVec = 3×3 0.0252 0.0422
12-1728
-0.0252 -0.0422
-0.0252 -0.0422
jcontest
0.0556
-0.0556
-0.0556
Observe that the AVec and BVec constraints apply the constraint value directly to the coefficients, whereas the estimates of the ACon and BCon constraints fulfill the corresponding linear constraint. Supporting Function function [A,B,C] = impactmat(mlest) A = mlest.paramVals.A; B = mlest.paramVals.B; C = A*B'; end
Input Arguments Y — Data representing observations of multivariate time series yt numeric matrix Data representing observations of a multivariate time series yt, specified as a numObs-by-numDims numeric matrix. Each column of Y corresponds to a variable, and each row corresponds to an observation. Data Types: double Tbl — Data representing observations of multivariate time series yt table | timetable Data representing observations of a multivariate time series yt, specified as a table or timetable with numObs rows. Each row of Tbl is an observation. To select a subset of variables in Tbl to test, use the DataVariables name-value argument. r — Common rank of A and B positive integer in [1,numDims − 1] Common rank of A and B, specified by a positive integer in the interval [1,numDims − 1]. Tip Infer r by conducting a Johansen test using jcitest. Data Types: double test — Null hypothesis constraint types "ACon" | "AVec" | "BVec" | "BVec" | character vector | string vector | cell vector of character vectors Null hypothesis constraint type, specified as a constraint name in the table, or a string vector or cell vector of character vectors of such values. Constraint Name
Description
"ACon"
Test linear constraints on A.
"AVec"
Test specific vectors in A.
12-1729
12
Functions
Constraint Name
Description
"BVec"
Test linear constraints on B.
"BVec"
Test specific vectors in B.
jcontest conducts a separate test for each value in test. Data Types: string | char | cell Cons — Null hypothesis constraint values R numeric matrix | cell vector of numeric matrices Null hypothesis constraint values, specified as a value for the corresponding constraint type test, or a cell vector of such values, in this table. For constraints on B, the number of rows in each matrix numDims1 is one of the following, where numDims is the number of dimensions in the input data: • numDims + 1 when the Model name-value argument is "H*" or "H1*" and constraints include the restricted deterministic term in the model • numDims otherwise Constraint Type test
Constraint Value Cons
Description
"ACon"
R, a numDims-by-numCons Specifies numCons constraints on A given by R'A numeric matrix = 0, where numCons ≤ numDims − r.
"AVec"
numDims1-by-numCons numeric matrix
Specifies numCons equality constraints imposed on error-correction speed vectors in A, where numCons ≤ r.
"BCon"
R, numDims-by-numCons numeric matrix
Specifies numCons constraints on B given by R'B = 0, where numCons ≤ numDims − r.
"BVec"
numDims1-by-numCons numeric matrix
Specifies numCons equality constraints imposed on numCons of the cointegrating vectors in B, where numCons ≤ r.
Tip When you construct constraint values, use the following interpretations of the rows and columns of A and B. • Row i of A contains the adjustment speeds of variable yi,t to disequilibrium in each of the r cointegrating relations. • Column j of A contains the adjustment speeds of each of the numDims variables to disequillibrium in cointegrating relation j. • Row i of B contains the coefficients of variable yi,t in each of the r cointegrating relations. • Column j of B contains the coefficients of each of the numDims variables in cointegrating relation j.
jcontest conducts a separate test for each cell in Cons. Data Types: string | char | cell 12-1730
jcontest
Note jcontest removes the following observations from the specified data: • All rows containing at least one missing observation, represented by a NaN value • From the beginning of the data, initial values required to initialize lagged variables
Name-Value Pair Arguments Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter. Before R2021a, use commas to separate each name and value, and enclose Name in quotes. Example: jcontest(Tbl,r,test,Cons,Model="H2",DataVariables=1:5) tests the first 5 variables in the input table Tbl using the Johansen model that excludes all deterministic terms. Model — Johansen form of VEC(q) model deterministic terms "H1" (default) | "H2" | "H1*" | "H*" | "H" | character vector | string vector | cell vector of character vectors Johansen form of the VEC(q) model deterministic terms [3], specified as a Johansen form name in the table, or a string vector or cell vector of character vectors of such values (for model parameter definitions, see “Vector Error-Correction (VEC) Model” on page 12-1734). Value
Error-Correction Term
Description
"H2"
AB´yt − 1
No intercepts or trends are present in the cointegrating relations, and no deterministic trends are present in the levels of the data. Specify this model only when all response series have a mean of zero.
"H1*"
A(B´yt−1+c0)
Intercepts are present in the cointegrating relations, and no deterministic trends are present in the levels of the data.
"H1"
A(B´yt−1+c0)+c1
Intercepts are present in the cointegrating relations, and deterministic linear trends are present in the levels of the data.
"H*"
A(B´yt−1+c0+d0t)+c1
Intercepts and linear trends are present in the cointegrating relations, and deterministic linear trends are present in the levels of the data.
"H"
A(B´yt−1+c0+d0t)+c1+d1t
Intercepts and linear trends are present in the cointegrating relations, and deterministic quadratic trends are present in the levels of the data. If quadratic trends are not present in the data, this model can produce good in-sample fits but poor out-of-sample forecasts.
jcontest conducts a separate test for each value in Model. 12-1731
12
Functions
Example: Model="H1*" uses the Johansen form H1* for all tests. Example: Model=["H1*" "H1"] uses Johansen form H1* for the first test, and then uses Johansen form H1 for the second test. Data Types: string | char | cell Lags — Number of lagged differences q 0 (default) | nonnegative integer | vector of nonnegative integers Number of lagged differences q in the VEC(q) model, specified as a nonnegative integer or vector of nonnegative integers. jcontest conducts a separate test for each value in Lags. Example: Lags=1 includes Δyt – 1 in the model for all tests. Example: Lags=[0 1] includes no lags in the model for the first test, and then includes Δyt – 1 in the model for the second test. Data Types: double Alpha — Nominal significance level 0.05 (default) | numeric scalar | numeric vector Nominal significance level for the hypothesis test, specified as a numeric scalar between 0.001 and 0.999 or a numeric vector of such values. jcontest conducts a separate test for each value in Alpha. Example: Alpha=[0.01 0.05] uses a level of significance of 0.01 for the first test, and then uses a level of significance of 0.05 for the second test. Data Types: double DataVariables — Variables in Tbl all variables (default) | string vector | cell vector of character vectors | vector of integers | logical vector Variables in Tbl for which jcontest conducts the test, specified as a string vector or cell vector of character vectors containing variable names in Tbl.Properties.VariableNames, or an integer or logical vector representing the indices of names. The selected variables must be numeric. Example: DataVariables=["GDP" "CPI"] Example: DataVariables=[true true false false] or DataVariables=[1 2] selects the first and second table variables. Data Types: double | logical | char | cell | string Note • When jcontest conducts multiple tests, the function applies all single settings (scalars or character vectors) to each test. • All vector-valued specifications that control the number of tests must have equal length. • If you specify the vector y and any value is a row vector, all outputs are row vectors. • A lagged and differenced time series has a reduced sample size. Absent presample values, if the test series yt is defined for t = 1,…,T, the lagged series yt– k is defined for t = k+1,…,T. The first 12-1732
jcontest
difference applied to the lagged series yt– k further reduces the time base to k+2,…,T. With p lagged differences, the common time base is p+2,…,T and the effective sample size is T–(p+1).
Output Arguments h — Test rejection decisions logical scalar | logical vector Test rejection decisions, returned as a logical scalar or vector with length equal to the number of tests. jcontest returns h when you supply the input Y. • Values of 1 indicate rejection of the null hypothesis that the specified constraints test and Cons hold in favor of the alternative hypothesis that they do not hold. • Values of 0 indicate failure to reject the null hypothesis that the constraints hold. pValue — Test statistic p-values numeric scalar | numeric vector Test statistic p-values, returned as a numeric scalar or vector with length equal to the number of tests. jcontest returns pValue when you supply the input Y. The p-values are right-tail probabilities. stat — Test statistics numeric scalar | numeric vector Test statistics, returned as a numeric scalar or vector with length equal to the number of tests. jcontest returns stat when you supply the input Y. The test statistics are likelihood ratios determined by the test. cValue — Critical values numeric scalar | numeric vector Critical values, returned as a numeric scalar or vector with length equal to the number of tests. jcontest returns cValue when you supply the input Y. The asymptotic distributions of the test statistics are chi-square, with the degree-of-freedom parameter determined by the test. The critical value of the test statistics are for a right-tail probability. StatTbl — Test summary table Test summary, returned as a table with variables for the outputs h, pValue, stat, and cValue, and with a row for each test. jcontest returns StatTbl when you supply the input Tbl. StatTbl contains variables for the test settings specified by Lags, Alpha, Model, and Test. mles — Maximum likelihood estimates (MLE) associated with constrained VEC(q) model of yt structure array 12-1733
12
Functions
Maximum likelihood estimates associated with the constrained VEC(q) model of yt, return as a structure array with the number of records equal to the number of tests. Each element of mles has the fields in this table. You can access a field using dot notation, for example, mles(3).paramVals contains a structure of the parameter estimates. Field
Description
paramNames
Cell vector of parameter names, of the form: {A, B, B1, … Bq, c0, d0, c1, d1}. Elements depend on the values of Lags and Model.
paramVals
Structure of parameter estimates with field names corresponding to the parameter names in paramNames.
res
T-by-numDims matrix of residuals, where T is the effective sample size, obtained by fitting the VEC(q) model of yt to the input data.
EstCov
Estimated covariance Q of the innovations process εt.
rLL
Restricted loglikelihood of yt under the null hypothesis.
uLL
Unrestricted loglikelihood of yt under the alternative hypothesis.
dof
Degrees of freedom of the asymptotic chi-square distribution of the test statistic.
More About Vector Error-Correction (VEC) Model A vector error-correction (VEC) model is a multivariate, stochastic time series model consisting of a system of m = numDims equations of m distinct, differenced response variables. Equations in the system can include an error-correction term, which is a linear function of the responses in levels used to stabilize the system. The cointegrating rank r is the number of cointegrating relations that exist in the system. Each response equation can include a degree q autoregressive polynomial composed of first differences of the response series, a constant, a time trend, and a constant and time trend in the error-correction term. Expressed in lag operator notation, a VEC(q) model for a multivariate time series yt is Φ(L)(1 − L)yt = A B′yt − 1 + c0 + d0t + c1 + d1t + εt = c + dt + Cyt − 1 + εt, where • yt is an m = numDims dimensional time series corresponding to m response variables at time t, t = 1,...,T. • Φ(L) = I − Φ1 − Φ2 − ... − Φq, I is the m-by-m identity matrix, and Lyt = yt – 1. • The cointegrating relations are B'yt – 1 + c0 + d0t and the error-correction term is A(B'yt – 1 + c0 + d0t). • r is the number of cointegrating relations and, in general, 0 ≤ r ≤ m. 12-1734
jcontest
• A is an m-by-r matrix of adjustment speeds. • B is an m-by-r cointegration matrix. • C = AB′ is an m-by-m impact matrix with a rank of r. • c0 is an r-by-1 vector of constants (intercepts) in the cointegrating relations. • d0 is an r-by-1 vector of linear time trends in the cointegrating relations. • c1 is an m-by-1 vector of constants (deterministic linear trends in yt). • d1 is an m-by-1 vector of linear time-trend values (deterministic quadratic trends in yt). • c = Ac0 + c1 and is the overall constant. • d = Ad0 + d1 and is the overall time-trend coefficient. • Φj is an m-by-m matrix of short-run coefficients, where j = 1,...,q and Φq is not a matrix containing only zeros. • εt is an m-by-1 vector of random Gaussian innovations, each with a mean of 0 and collectively an m-by-m covariance matrix Σ. For t ≠ s, εt and εs are independent. If m = r, then the VEC model is a stable VAR(q + 1) model in the levels of the responses. If r = 0, then the error-correction term is a matrix of zeros, and the VEC(q) model is a stable VAR(q) model in the first differences of the responses.
Tips • jcontest compares finite-sample statistics to asymptotic critical values, and tests can show significant size distortions for small samples [2]. Larger samples lead to more reliable inferences. • To convert VEC(q) model parameters in the mles output to vector autoregressive (VAR) model parameters, use the vec2var function.
Algorithms • jcontest identifies deterministic terms that are outside of the cointegrating relations, c1 and d1, by projecting constant and linear regression coefficients, respectively, onto the orthogonal complement of A. • The parameters A and B in the reduced-rank VEC(q) model are not identifiable. jcontest identifies B using the methods in [3], depending on the test. • Tests on B answer questions about the space of cointegrating relations. Tests on A answer questions about common driving forces in the system. For example, an all-zero row in A indicates a variable that is weakly exogenous with respect to the coefficients in B. Such a variable can affect other variables, but it does not adjust to disequilibrium in the cointegrating relations. Similarly, a standard unit vector column in A indicates a variable that is exclusively adjusting to disequilibrium in a particular cointegrating relation. • Constraint matrices R satisfying R′A = 0 or R′B = 0 are equivalent to A = Hφ or B = Hφ, where H is the orthogonal complement of R (null(R')) and φ is a vector of free parameters.
Version History Introduced in R2011a
12-1735
12
Functions
References [1] Hamilton, James D. Time Series Analysis. Princeton, NJ: Princeton University Press, 1994. [2] Haug, A. “Testing Linear Restrictions on Cointegrating Vectors: Sizes and Powers of Wald Tests in Finite Samples.” Econometric Theory. v. 18, 2002, pp. 505–524. [3] Johansen, S. Likelihood-Based Inference in Cointegrated Vector Autoregressive Models. Oxford: Oxford University Press, 1995. [4] Juselius, K. The Cointegrated VAR Model. Oxford: Oxford University Press, 2006. [5] Morin, N. "Likelihood Ratio Tests on Cointegrating Vectors, Disequilibrium Adjustment Vectors, and their Orthogonal Complements." European Journal of Pure and Applied Mathematics. v. 3, 2010, pp. 541–571.
See Also Objects vecm | varm Functions jcitest | vec2var Topics “Cointegration and Error Correction Analysis” on page 9-107 “Identifying Single Cointegrating Relations” on page 9-113 “Compare Approaches to Cointegration Analysis” on page 9-142 “Test for Cointegration Using the Johansen Test” on page 9-137 “Test Cointegrating Vectors” on page 9-146 “Estimate VEC Model Parameters Using jcitest” on page 9-139 “Testing Cointegrating Vectors and Adjustment Speeds” on page 9-145
12-1736
kpsstest
kpsstest KPSS test for stationarity
Syntax h = kpsstest(y) [h,pValue,stat,cValue] = kpsstest(y) StatTbl = kpsstest(Tbl) [ ___ ] = kpsstest( ___ ,Name=Value) [ ___ ,reg] = kpsstest( ___ )
Description h = kpsstest(y) returns the rejection decision h from conducting the Kwiatkowski, Phillips, Schmidt, and Shin (KPSS) test on page 12-1747 for a unit root in the univariate time series y. [h,pValue,stat,cValue] = kpsstest(y) also returns the p-value pValue, test statistic stat, and critical value cValue of the test. StatTbl = kpsstest(Tbl) returns the table StatTbl containing variables for the test results, statistics, and settings from conducting the KPSS test for a unit root in the last variable of the input table or timetable Tbl. To select a different variable in Tbl to test, use the DataVariable namevalue argument. [ ___ ] = kpsstest( ___ ,Name=Value) specifies options using one or more name-value arguments in addition to any of the input argument combinations in previous syntaxes. kpsstest returns the output argument combination for the corresponding input arguments. Some options control the number of tests to conduct. The following conditions apply when kpsstest conducts multiple tests: • kpsstest treats each test as separate from all other tests. • If you specify y, all outputs are vectors. • If you specify Tbl, each row of StatTbl contains the results of the corresponding test. For example, kpsstest(Tbl,DataVariable="GDP",Alpha=0.025,Lags=[0 1]) conducts two tests, at a level of significance of 0.025, for the presence of a unit root in the variable GDP of the table Tbl. The first test includes 0 autocovariance lags in the Newey-West estimator of the long-run variance and the second test includes 1 autocovariance lag. [ ___ ,reg] = kpsstest( ___ ) additionally returns a structure of regression statistics for the hypothesis test reg.
Examples Conduct KPSS Test on Vector of Data Test a time series for a unit root using the default options of kpsstest. Input the time series data as a numeric vector. 12-1737
12
Functions
Load the Nelson-Plosser macroeconomic series data set. Plot the real gross national product (RGNP). load Data_NelsonPlosser rgnp = DataTable.GNPR; dt = datetime(dates,ConvertFrom="datenum"); plot(dt,rgnp) title("Real Gross National Product")
The series exhibits exponential growth. Linearize the RGNP series. linRGNP = log(rgnp);
Assess the null hypothesis of the KPSS test, which is that the series is trend stationary. Use default options. h = kpsstest(linRGNP) h = logical 1
h = 1 indicates that, at a 5% level of significance, the test rejects the null hypothesis that the linearized Real GNP series is trend stationary, which suggests that the series is unit root nonstationary.
12-1738
kpsstest
Return Test p-Value and Decision Statistics Load the Nelson-Plosser Macroeconomic series data set, and linearize the RGNP series. load Data_NelsonPlosser linRGNP = log(DataTable.GNPR);
Assess the null hypothesis that the series is trend stationary. Return the test decision, p-value, test statistic, and critical value. [h,pValue,stats,cValue] = kpsstest(linRGNP) h = logical 1 pValue = 0.0100 stats = 0.6299 cValue = 0.1460
Conduct KPSS Test on Table Variable Test whether a time series, which is one variable in a table, is trend stationary using the default options. Load the Nelson-Plosser macroeconomic series data set, which contains annual measurements of macroeconomic variables in the table DataTable. Linearize the RGNP series by applying the log transformation, and store the result in DataTable. load Data_NelsonPlosser DataTable.LinRGNP = log(DataTable.GNPR); DataTable.Properties.VariableNames{end} ans = 'LinRGNP'
Test the null hypothesis that the linearized RGNP series is trend stationary. StatTbl = kpsstest(DataTable) StatTbl=1×7 table h _____ Test 1
true
pValue ______
stat _______
cValue ______
0.01
0.62989
0.146
Lags ____ 0
Alpha _____
Trend _____
0.05
true
kpsstest returns test results and settings in the table StatTbl, where variables correspond to test results (h, pValue, stat, and cValue) and settings (Lags, Alpha, Trend), and rows correspond to individual tests (in this case, kpsstest conducts one test). 12-1739
12
Functions
By default, kpsstest tests the last variable in the table. To select a variable from an input table to test, set the DataVariable option.
Specify Lags for Newey-West Estimator by Testing Up Conduct multiple tests on the linearized RGNP series that reproduce the first row of the second half of Table 5 in [2]. Load the Nelson-Plosser macroeconomic series data set, which contains annual measurements of macroeconomic variables in the table DataTable. Apply the log transformation to all variables in the table. load Data_NelsonPlosser LogDT = varfun(@log,DataTable); LogDT.Properties.VariableNames{end} ans = 'log_SP'
varfun applies log to all variables in DataTable, prepends log_ to all transformed variable names, and stores the result in the table LogDT. The final variable is the log of the stock price index series (SP). Assess the null hypothesis that the linearized RGNP series is trend stationary over a range of lags. Specify the variable name of the linearized RGNP series log_GNPR. lags = (0:8); StatTbl = kpsstest(LogDT,DataVariable="log_GNPR",Lags=lags) StatTbl=9×7 table h _____ Test Test Test Test Test Test Test Test Test
1 2 3 4 5 6 7 8 9
true true true true true true true false false
pValue ________
stat _______
cValue ______
0.01 0.01 0.01 0.0169 0.027579 0.04015 0.048417 0.05886 0.066757
0.62989 0.33666 0.24209 0.1976 0.17291 0.15782 0.1479 0.14122 0.13695
0.146 0.146 0.146 0.146 0.146 0.146 0.146 0.146 0.146
Lags ____ 0 1 2 3 4 5 6 7 8
Alpha _____
Trend _____
0.05 0.05 0.05 0.05 0.05 0.05 0.05 0.05 0.05
true true true true true true true true true
The tests corresponding to 0 ≤ lags ≤ 2 produce p-values that are less than 0.01. For 2 < lags < 7, the tests indicate sufficient evidence to suggest that log RGNP is unit root nonstationary (as opposed to the series being trend stationary) at the default 5% level.
Select Newey-West Estimator Lags Using Sample Size Test whether the wage series in the manufacturing sector (1900–1970) has a unit root. Use the advice in [2] to select the number of lags in the Newey-West estimator of the coefficient standard errors. 12-1740
kpsstest
Load the Nelson-Plosser macroeconomic data set. Remove all missing values from the data relative to the wage series WN. load Data_NelsonPlosser [DataTable,idx] = rmmissing(DataTable,DataVariables="WN"); dt = dates(~idx);
Compute the effective sample size T and its square root, where the latter is approximately the number of lags recommended for the Newey-West estimator. T = height(DataTable); sqrtT = sqrt(T);
Plot the wage series. plot(dt,DataTable.WN) title("Wages")
The wage series appears to grow exponentially. Linearize the wages series by applying the log transformation to all variables in the table. LogDT = varfun(@log,DataTable); plot(dt,LogDT.log_WN) title("Log Wages")
12-1741
12
Functions
The log wage series appears to have a linear trend. Test the null hypothesis that the log wage series is trend stationary (no unit root) against the alternative hypothesis that the log wage series is difference stationary. Conduct the test by setting a range of lags for the Newey-West estimator around T . StatTbl = kpsstest(LogDT,DataVariable="log_WN",Lags=7:10) StatTbl=4×7 table h _____ Test Test Test Test
1 2 3 4
false false false false
pValue ______ 0.1 0.1 0.1 0.1
stat ________
cValue ______
0.10678 0.10074 0.096634 0.094058
0.146 0.146 0.146 0.146
Lags ____ 7 8 9 10
Alpha _____
Trend _____
0.05 0.05 0.05 0.05
true true true true
All tests fail to reject the null hypothesis that the log wages series is trend stationary. The p-values are larger than 0.1. The software compares the test statistic to critical values and computes p-values that it interpolates from tables in [2].
12-1742
kpsstest
Inspect Regression Statistics Load the Nelson-Plosser macroeconomic series data set. Apply the log transformation to all variables in the table. load Data_NelsonPlosser LogDT = varfun(@log,DataTable);
Assess the null hypothesis that the linearized RGNP series is trend stationary. Use the Trend option to conduct the test with (true) and without (false) a deterministic time trend term in the response model. Return the regression statistics. [~,reg] = kpsstest(LogDT,DataVariable="log_GNPR",Trend=[true false]);
reg is a structure array of length 2 with fields that store the OLS regression results. Each element corresponds to a test. Compare the coefficient estimates. withTrend = reg(1).coeff withTrend = 2×1 4.5834 0.0310 woTrend = reg(2).coeff woTrend = 5.5595
For the first test, the response model for the regression includes a trend term, so the regression coefficients withTrend include a model intercept (under the null hypothesis) 4.5834 and the coefficient of the time trend 0.0310. For the second test, the response model includes an intercept only for the regression, so the intercept woTrend is 5.5595. Display the coefficient standard errors for the first test. reg(1).se ans = 2×1 0.0344 0.0010
The Lags option includes autocovariance lags in the Newey-West estimator of the long-run variance. Therefore, the option does not affect the estimated OLS coefficients, standard errors, or MSE. Conduct a KPSS test for each lag from 0 through 4. Compare the standard OLS and the Newey-West estimates. lags = 0:4; [~,regLags] = kpsstest(LogDT,DataVariable="log_GNPR",Lags=lags); coeffs = table(regLags.coeff,VariableNames="Lags_"+lags, ... RowNames=["Intercept" "Trend"]); se = table(regLags.se,VariableNames="Lags_"+lags, ...
12-1743
12
Functions
RowNames=["SE_Intercept" "SE_Trend"]); mse = table(regLags.MSE,VariableNames="Lags_"+lags, ... RowNames="MSE"); nw = table(regLags.NWEst,VariableNames="Lags_"+lags, ... RowNames="NWVar"); [coeffs; se; mse; nw] ans=6×5 table
Intercept Trend SE_Intercept SE_Trend MSE NWVar
Lags_0 __________
Lags_1 __________
Lags_2 __________
Lags_3 __________
Lags_4 __________
4.5834 0.030988 0.03443 0.00095035 0.017933 0.017354
4.5834 0.030988 0.03443 0.00095035 0.017933 0.03247
4.5834 0.030988 0.03443 0.00095035 0.017933 0.045154
4.5834 0.030988 0.03443 0.00095035 0.017933 0.055321
4.5834 0.030988 0.03443 0.00095035 0.017933 0.063222
Input Arguments y — Univariate time series data numeric vector Univariate time series data, specified as a numeric vector. Each element of y represents an observation. Data Types: double Tbl — Time series data table | timetable Time series data, specified as a table or timetable. Each row of Tbl is an observation. Specify a single series (variable) to test by using the DataVariable argument. The selected variable must be numeric. Note kpsstest removes missing observations, represented by NaN values, from the input series. Name-Value Arguments Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter. Before R2021a, use commas to separate each name and value, and enclose Name in quotes. Example: kpsstest(Tbl,DataVariable="GDP",Alpha=0.025,Lags=[0 1]) conducts two tests, at a level of significance of 0.025, for the presence of a unit root in the variable GDP of the table Tbl. The first test includes 0 autocovariance lags in the Newey-West estimator of the long-run variance and the second test includes 1 autocovariance lag. Lags — Number of autocovariance lags 0 (default) | nonnegative integer | vector of nonnegative integers 12-1744
kpsstest
Number of autocovariance lags to include in the Newey-West estimator of the long-run variance, specified as a nonnegative integer or vector of nonnegative integers. If Lags(j) > 0, kpsstest includes lags 1 through Lags(j) in the estimator for test j. kpsstest conducts a separate test for each element in Lags. Example: Lags=0:2 includes zero lagged autocovariance terms in the Newey-West estimator for the first test, the lag 1 autocovariance term for the second test, and autocovariance lags 1 and 2 in the third test. Data Types: double Trend — Flag for including deterministic trend term δt true (default) | false | logical vector Flag for including deterministic trend δt in the model, specified as a logical scalar or vector. kpsstest conducts a separate test for each element in Trend. Example: Trend=false excludes δt from the response model for all tests. Data Types: logical Alpha — Significance level 0.05 (default) | numeric scalar | numeric vector Significance level for the hypothesis test, specified as a numeric scalar or vector with entries between 0.01 and 0.10. kpsstest conducts a separate test for each element in Alpha. Example: Alpha=[0.01 0.05] uses a level of significance of 0.01 for the first test, and then uses a level of significance of 0.05 for the second test. Data Types: double DataVariable — Variable in Tbl to test last variable (default) | string scalar | character vector | integer | logical vector Variable in Tbl to test, specified as a string scalar or character vector containing a variable name in Tbl.Properties.VariableNames, or an integer or logical vector representing the index of a name. The selected variable must be numeric. Example: DataVariable="GDP" Example: DataVariable=[false true false false] or DataVariable=2 tests the second table variable. Data Types: double | logical | char | string Note • When kpsstest conducts multiple tests, the function applies all single settings (scalars or character vectors) to each test. • All vector-valued specifications that control the number of tests must have equal length. • If you specify the vector y and any value is a row vector, all outputs are row vectors.
12-1745
12
Functions
Output Arguments h — Test rejection decisions logical scalar | logical vector Test rejection decisions, returned as a logical scalar or vector with length equal to the number of tests. kpsstest returns h when you supply the input y. • Values of 1 indicate rejection of the trend-stationary null hypothesis in favor of the unit root alternative. • Values of 0 indicate failure to reject the trend-stationary null hypothesis. pValue — Test statistic p-values numeric scalar | numeric vector Test statistic p-values, returned as a numeric scalar or vector with length equal to the number of tests. kpsstest returns pValue when you supply the input y. The p-values are right-tail probabilities. When test statistics are outside tabulated critical values, kpsstest returns maximum (0.10) or minimum (0.01) p-values. stat — Test statistics numeric scalar | numeric vector Test statistics, returned as a numeric scalar or vector with length equal to the number of tests. kpsstest returns stat when you supply the input y. kpsstest computes test statistics by using an ordinary least squares (OLS) regression (for more details, see KPSS Test on page 12-1747). • If you set Trend=false, kpsstest regresses y on an intercept. • Otherwise, kpsstest regresses y on an intercept and trend term. cValue — Critical values numeric scalar | numeric vector Critical values, returned as a numeric scalar or vector with length equal to the number of tests. kpsstest returns cValue when you supply the input y. Critical values are for right-tail probabilities. StatTbl — Test summary table Test summary, returned as a table with variables for the outputs h, pValue, stat, and cValue, and with a row for each test. kpsstest returns StatTbl when you supply the input Tbl. StatTbl contains variables for the test settings specified by Lags, Alpha, and Trend. reg — Regression statistics structure array 12-1746
kpsstest
Regression statistics for OLS estimation of the coefficients in the model, returned as a structure array with the number of records equal to the number of tests. Each element of reg has the fields in this table. You can access a field using dot notation, for example, reg(1).coeff contains the coefficient estimates of the first test. Field
Description
num
Length of input series with NaNs removed
size
Effective sample size T, adjusted for lags
names
Regression coefficient names
coeff
Estimated coefficient values
se
Estimated coefficient standard errors
Cov
Estimated coefficient covariance matrix
tStats
t statistics of coefficients and p-values
FStat
F statistic and p-value
yMu
Mean of the lag-adjusted input series
ySigma
Standard deviation of the lag-adjusted input series
yHat
Fitted values of the lag-adjusted input series
res
Regression residuals
autoCov
Estimated residual autocovariances
NWEst
Newey-West coefficient standard error estimates
DWStat
Durbin-Watson statistic
SSR
Regression sum of squares
SSE
Error sum of squares
SST
Total sum of squares
MSE
Mean square error
RMSE
Standard error of the regression
RSq
R2 statistic
aRSq
Adjusted R2 statistic
LL
Loglikelihood of data under Gaussian innovations
AIC
Akaike information criterion
BIC
Bayesian (Schwarz) information criterion
HQC
Hannan-Quinn information criterion
More About Kwiatkowski, Phillips, Schmidt, and Shin (KPSS) Test The KPSS test assesses the null hypothesis that a univariate time series is trend stationary against the alternative that it is a nonstationary unit root process. The test uses the structural model 12-1747
12
Functions
yt = ct + δt + u1t ct = ct − 1 + u2t, where • δ is the trend coefficient (see the Trend argument). • u1t is a stationary process. • u2t is an independent and identically distributed process with mean 0 and variance σ2. The null hypothesis is that σ2 = 0, which implies that the random walk term (ct) is constant and acts as the model intercept. The alternative hypothesis is that σ2 > 0, which introduces the unit root in the random walk. An OLS regression of yt onto Xt yields the residual series {et}, where Xt has one of the following forms: • Xt = 1 for all t when Trend is false. • Xt = [1 δt] when Trend is true. The test statistic is
∑
T
t=1 s2T 2
2
St
,
where • T is the effective sample size. • s2 is the Newey-West estimate of the long-run variance. • sT = e1 + e2 + … + eT.
Tips • To draw valid inferences from a KPSS test, you must determine a suitable value for the Lags argument. The following methods can determine a suitable number of lags: • Begin with a small number of lags, and then evaluate the sensitivity of the results by adding more lags. • Kwiatkowski et al. [2] suggest that a number of lags on the order of T , where T is the effective sample size, is often satisfactory under both the null and the alternative. For consistency of the Newey-West estimator, the number of lags must approach infinity as the sample size increases. • With a specific testing strategy in mind, determine the value of the Trend argument by the growth characteristics of the input time series. • If the input series grows, include a trend term by setting Trend to true (default). This setting provides a reasonable comparison of a trend stationary null and a unit root process with drift. • If a series does not exhibit long-term growth characteristics, exclude a trend term by setting Trend to false. 12-1748
kpsstest
Algorithms • Test statistics follow nonstandard distributions under the null, even asymptotically. Kwiatkowski et al. [2] use Monte Carlo simulations, for models with and without a trend, to tabulate asymptotic critical values for a standard set of significance levels between 0.01 and 0.1. kpsstest interpolates critical values and p-values from these tables.
Version History Introduced in R2009b
References [1] Hamilton, James D. Time Series Analysis. Princeton, NJ: Princeton University Press, 1994. [2] Kwiatkowski, D., P. C. B. Phillips, P. Schmidt, and Y. Shin. “Testing the Null Hypothesis of Stationarity against the Alternative of a Unit Root.” Journal of Econometrics. Vol. 54, 1992, pp. 159–178. [3] Newey, W. K., and K. D. West. "A Simple, Positive Semidefinite, Heteroskedasticity and Autocorrelation Consistent Covariance Matrix." Econometrica. Vol. 55, 1987, pp. 703–708.
See Also pptest | adftest | vratiotest | lmctest Topics “Unit Root Tests” on page 3-40 “Unit Root Nonstationarity” on page 3-32
12-1749
12
Functions
lagmatrix Create lagged time series data
Syntax YLag = lagmatrix(Y,lags) [YLag,TLag] = lagmatrix(Y,lags) LagTbl = lagmatrix(Tbl,lags) [ ___ ] = lagmatrix( ___ ,Name=Value)
Description YLag = lagmatrix(Y,lags) shifts the input regular series Y in time by the lags (positive) or leads (negative) in lags, and returns the matrix of shifted series YLag. [YLag,TLag] = lagmatrix(Y,lags) also returns a vector TLag representing the common time base for the shifted series relative to the original time base of 1, 2, 3, …, numObs. LagTbl = lagmatrix(Tbl,lags) shifts all variables in the input table or timetable Tbl, which represent regular time series, and returns the table or timetable of shifted series LagTbl. To select different variables in Tbl to shift, use the DataVariables name-value argument. [ ___ ] = lagmatrix( ___ ,Name=Value) specifies options using one or more name-value arguments in addition to any of the input argument combinations in previous syntaxes. lagmatrix returns the output argument combination for the corresponding input arguments. For example, lagmatrix(Tbl,1,Y0=zeros(1,5),DataVariables=1:5) lags, by one period, the first five variables in the input table Tbl and sets the presample of each series to 0.
Examples Shift Matrix of Data Create a bivariate time series matrix X with five observations each. Y = [1 -1; 2 -2 ;3 -3 ;4 -4 ;5 -5] Y = 5×2 1 2 3 4 5
-1 -2 -3 -4 -5
Create a shifted matrix, which is composed of the original X and its first two lags. lags = [0 1 2]; XLag = lagmatrix(Y,lags)
12-1750
lagmatrix
XLag = 5×6 1 2 3 4 5
-1 -2 -3 -4 -5
NaN 1 2 3 4
NaN -1 -2 -3 -4
NaN NaN 1 2 3
NaN NaN -1 -2 -3
XLAG is a 5-by-6 matrix: • The first two columns contain the original data (lag 0). • Columns 3 and 4 contain the data lagged by one unit. • Columns 5 and 6 contain the data lagged by two units. By default, lagmatrix returns only values corresponding to the time base of the original data, and the function fills unknown presample values using NaNs.
Return Time Base of Shifted Series Create a bivariate time series matrix X with five observations each. Y = [1 -1; 2 -2 ;3 -3 ;4 -4 ;5 -5] Y = 5×2 1 2 3 4 5
-1 -2 -3 -4 -5
Create a shifted matrix, which is composed of the original X and its first two lags. Return the time base of the shift series. lags = [0 1 2]; [XLag,TLag] = lagmatrix(Y,lags);
By default, lagmatrix returns the time base of the input data.
Shift Table Variables Shift multiple time series, which are variables in tables, using the default options of lagmatrix. Load data of yearly Canadian inflation and interest rates Data_Canada.mat, which contains five series in the table DataTable. load Data_Canada
Create a timetable from the table of data. 12-1751
12
Functions
dates = datetime(dates,12,31); TT = table2timetable(DataTable,RowTimes=dates); TT.Observations = []; tail(TT) Time ___________
INF_C _______
INF_G _______
INT_S ______
INT_M ______
INT_L ______
31-Dec-1987 31-Dec-1988 31-Dec-1989 31-Dec-1990 31-Dec-1991 31-Dec-1992 31-Dec-1993 31-Dec-1994
4.2723 3.9439 4.8743 4.6547 5.4633 1.4946 1.8246 0.18511
4.608 4.5256 4.7258 3.1015 2.8614 1.2281 1.0473 0.60929
8.1692 9.4158 12.016 12.805 8.8301 6.5088 4.9268 5.4168
9.4158 9.7717 10.203 11.193 9.1625 7.4317 6.4583 7.7867
9.9267 10.227 9.9217 10.812 9.8067 8.7717 7.8767 8.58
Create timetable containing all series lagged by one year, the series themselves, and the series led by a year. lags = [1 0 -1]; LagTT = lagmatrix(TT,lags); head(LagTT) Time ___________
Lag1INF_C _________
31-Dec-1954 31-Dec-1955 31-Dec-1956 31-Dec-1957 31-Dec-1958 31-Dec-1959 31-Dec-1960 31-Dec-1961
NaN 0.6606 0.077402 1.4218 3.1546 2.4828 1.183 1.2396
Lag1INF_G _________ NaN 1.4468 0.76162 3.0433 2.3148 1.3636 2.0722 1.2139
Lag1INT_S _________ NaN 1.4658 1.5533 2.9025 3.7775 2.2925 4.805 3.3242
Lag1INT_M _________ NaN 2.6683 2.7908 3.7575 4.565 3.4692 4.9383 4.5192
Lag1INT_L _________ NaN 3.255 3.1892 3.6058 4.125 4.115 5.0492 5.1892
Lag0INF_C _________ 0.6606 0.077402 1.4218 3.1546 2.4828 1.183 1.2396 1.0156
LagTT is a timetable containing the shifted series. lagmatrix appends each variable of the input timetable by Lagj or Leadj, depending on whether the series is a lag or lead, with j indicating the number of shifting units. By default, lagmatrix shifts all variables in the input table. You can choose a subset of variables to shift by using the DataVariables name-value argument. For example, shift only the inflation rate series. LagTTINF = lagmatrix(TT,lags,DataVariables=["INF_C" "INF_G"]); head(LagTTINF)
12-1752
Time ___________
Lag1INF_C _________
31-Dec-1954 31-Dec-1955 31-Dec-1956 31-Dec-1957 31-Dec-1958 31-Dec-1959 31-Dec-1960 31-Dec-1961
NaN 0.6606 0.077402 1.4218 3.1546 2.4828 1.183 1.2396
Lag1INF_G _________ NaN 1.4468 0.76162 3.0433 2.3148 1.3636 2.0722 1.2139
Lag0INF_C _________ 0.6606 0.077402 1.4218 3.1546 2.4828 1.183 1.2396 1.0156
Lag0INF_G _________
Lead1INF_C __________
1.4468 0.76162 3.0433 2.3148 1.3636 2.0722 1.2139 0.46074
0.077402 1.4218 3.1546 2.4828 1.183 1.2396 1.0156 1.1088
Lead1INF_G __________ 0.76162 3.0433 2.3148 1.3636 2.0722 1.2139 0.46074 1.3737
lagmatrix
Specify Presample and Postsample Data Create a vector of univariate time series data. y = [0.1 0.4 -0.2 0.1 0.2]';
Create vectors representing presample and postsample data. y0 = [0.50; 0.75]*y(1) y0 = 2×1 0.0500 0.0750 yF = [0.75; 0.50]*y(end) yF = 2×1 0.1500 0.1000
Shift the series by two units in both directions. Specify the presample and postsample data, and return a matrix containing shifted series for the entire time base. lags = [2 0 -2]; [YLag,TLag] = lagmatrix(y,lags,Y0=y0,YF=yF) YLag = 5×3 0.0500 0.0750 0.1000 0.4000 -0.2000
0.1000 0.4000 -0.2000 0.1000 0.2000
-0.2000 0.1000 0.2000 0.1500 0.1000
TLag = 5×1 1 2 3 4 5
Because the presample and postsample have enough observations to cover the time base of the input data, the shifted series YLag is completely specified (it does not contain NaN entries). Shift the series in the same way, but return a matrix containing shifted series for the entire time base by specifying "full" for the Shape name-value argument. [YLagFull,TLagFull] = lagmatrix(y,lags,Y0=y0,YF=yF,Shape="full")
12-1753
12
Functions
YLagFull = 9×3 NaN NaN 0.0500 0.0750 0.1000 0.4000 -0.2000 0.1000 0.2000
0.0500 0.0750 0.1000 0.4000 -0.2000 0.1000 0.2000 0.1500 0.1000
0.1000 0.4000 -0.2000 0.1000 0.2000 0.1500 0.1000 NaN NaN
TLagFull = 9×1 -1 0 1 2 3 4 5 6 7
Because the presample and postsample do not contain enough observations to cover the full time base, which includes presample through postsample times, lagmatrix fills unknown sample units using NaN values.
Input Arguments Y — Time series data numeric matrix Time series data, specified as a numObs-by-numVars numeric matrix. Each column of Y corresponds to a variable, and each row corresponds to an observation. Data Types: double lags — Data shifts integer | integer-valued vector Data shifts, specified as an integer or integer-valued vector of length numShifts. • Lags are positive integers, which shift the input series forward over the time base. • Leads are negative integers, which shift the input series backward over the time base. lagmatrix applies each specified shift in lags, in order, to each input series. Shifts of regular time series have units of one time step. Data Types: double Tbl — Time series data table | timetable 12-1754
lagmatrix
Time series data, specified as a table or timetable with numObs rows. Each row of Tbl is an observation. If Tbl is a timetable, it must represent a sample with a regular datetime time step (see isregular). Specify numVars variables to filter by using the DataVariables argument. The selected variables must be numeric. Name-Value Arguments Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter. Before R2021a, use commas to separate each name and value, and enclose Name in quotes. Example: lagmatrix(Tbl,1,Y0=zeros(1,5),DataVariables=1:5) lags, by one period, the first 5 variables in the input table Tbl and sets the presample of each series to 0. Y0 — Presample data NaN (default) | numeric matrix | table | timetable Presample data to backward fill lagged series, specified as a matrix with numVars columns, or a table or timetable. For a table or timetable, the DataVariables name-value argument selects the variables in Y0 to shift. Y0 must have the same data type as the input data. Timetables must have regular sample times preceding times in Tbl. lagmatrix fills required presample values from the end of Y0. Example: Y0=zeros(size(Y,2),2) YF — Postsample data to front fill led series NaN (default) | numeric matrix | table | timetable Postsample data to frontward fill led series, specified as a matrix with numVars columns, or a table or timetable. For a table or timetable, the DataVariables name-value argument selects the variables in YF to shift. The default for postsample data is NaN. YF must have the same data type as the input data. Timetables must have regular sample times following times in Tbl. lagmatrix fills required postsample values from the beginning of YF. Example: YF=ones(size(Y,2),3) DataVariables — Variables in Tbl, Y0, and YF all variables (default) | string vector | cell vector of character vectors | vector of integers | logical vector Variables in Tbl, Y0, and YF, from which lagmatrix creates shifted time series data, specified as a string vector or cell vector of character vectors containing variable names in Tbl.Properties.VariableNames, or an integer or logical vector representing the indices of names. The selected variables must be numeric. 12-1755
12
Functions
Example: DataVariables=["GDP" "CPI"] Example: DataVariables=[true true false false] or DataVariables=[1 2] selects the first and second table variables. Data Types: double | logical | char | cell | string Shape — Part of shifted series to appear in outputs "same" (default) | "full" | "valid" | character vector Part of the shifted series to appear in the outputs, specified as a value in this table. Value
Description
"full"
Outputs contain all values in the input time series data and all specified presample Y0 or postsample Yf values on an expanded time base.
"same"
Outputs contain only values on the original time base.
"valid"
Outputs contain values for times at which all series have specified (non-NaN) values.
To illustrate the shape of the output shifted time series for each value of Shape, suppose the input time series data is a 2-D series with numObs = T observations y1, t y2, t , and lags is [1 0 -1]. The output shifted series is one of the three T-by-6 matrix arrays in this figure.
Example: Shape="full" Data Types: char | string
Output Arguments YLag — Shifted time series variables numeric matrix Shifted time series variables in Y, returned as a numeric matrix. lagmatrix returns YLag when you supply the input Y. Columns are, in order, all series in Y shifted by the lags(1), all series in Y shifted by the lags(2), …, all series in Y shifted by lags(end). Rows depend on the value of the Shape name-value argument. 12-1756
lagmatrix
For example, suppose Y is the 2-D time series of numObs = T observations y1, t y2, t , lags is [1 0 -1], and Shape if "full". YLag is the T-by-6 matrix NaN
NaN
NaN
NaN
y1, 1 y2, 1
NaN
NaN
y1, 1
y2, 1
y1, 2 y2, 2
y1, 1
y2, 1
y1, 2
y2, 2
y1, 3 y2, 3
⋮
⋮
⋮
⋮
y1, T − 2 y2, T − 2 y1, T − 1 y2, T − 1
⋮ ⋮ . y1, T y2, T
y1, T − 1 y2, T − 1
NaN NaN
y1, T
y2, T
y1, T
y2, T
NaN
NaN NaN NaN
TLag — Common time base for the shifted series numeric matrix Common time base for the shifted series relative to the original time base of 1, 2, 3, …, numObs, returned as a vector of length equal to the number of observations in YLag. lagmatrix returns TLag when you supply the input Y. Series with lags (lags > 0) have higher indices; series with leads (lags < 0) have lower indices. For example, the value of TLag for the example in the YLag output description is the column vector with entries 0:(T+1). LagTbl — Shifted time series variables and common time base table | timetable Shifted time series variables and common time base, returned as a table or timetable, the same data type as Tbl. lagmatrix returns LagTbl when you supply the input Tbl. LagTbl contains the outputs YLag and TLag. The following conditions apply: • Each lagged variable of LagTbl has a label Lagjvarname, where varname is the corresponding variable name in DataVariables and j is lag j in lags. • Each lead variable has a label Leadjvarname, where j is lead j in lags. • If LagTbl is a table, the variable labeled TLag contains TLag. • If LagTbl is a timetable, the Time variable contains TLag.
Version History Introduced before R2006a R2022a: lagmatrix accepts input data in tables and timetables, and returns results in tables In addition to accepting input data in numeric arrays, lagmatrix accepts input data in tables and timetables. To choose which variables from the input table or timetable to lag, specify the DataVariables name-value argument. R2022a: Specify presample, postsample, or output shape, and return time base
12-1757
12
Functions
In addition to supporting input data in a table or timetable, lagmatrix enables you to optionally specify the following values by using name-value argument syntax: • Y0 — Presample data, specified as a matrix, table, or timetable • YF — Postsample data, specified as a matrix, table, or timetable • Shape — Output shape specifying which part of the shifted series to return. Also, lagmatrix returns the time base defined by the value of Shape, either in the second output position when you specify numeric data or as a variable in the output table when you specify data in a table or timetable.
See Also Functions filter Objects LinearModel Topics “Check Model Assumptions for Chow Test” on page 3-103 “Time Series Regression IX: Lag Order Selection” on page 5-264
12-1758
LagOp
LagOp Create lag operator polynomial
Description Create a p-degree, m-dimensional lag operator polynomial A(L) = A0 + A1L1 + A2L2 +...+ ApLp by specifying the coefficient matrices A0,…,Ap and, optionally, the corresponding lags. L is the lag (or backshift) operator such that Ljyt = yt–j. LagOp object functions on page 12-1761 enable you to work with specified polynomials. For example, you can filter time series data through a polynomial, determine whether one is stable, or combine multiple polynomials by performing polynomial algebra including addition, subtraction, multiplication, and division. To fit a dynamic model containing lag operator polynomials to data, create the appropriate model object, and then fit it to the data. For univariate models, see arima and estimate; for multivariate models, see varm and estimate. For further analysis, you can create a LagOp object from the resulting estimated coefficients.
Creation Syntax A = LagOp(coefficients) A = LagOp(coefficients,Name,Value) Description A = LagOp(coefficients) creates a lag operator polynomial A with coefficients coefficients, and sets the Coefficients property. A = LagOp(coefficients,Name,Value) specifies additional options using one or more namevalue pair arguments. For example, LagOp(coefficients,'Lags',[0 4 8],'Tolerance',1e-10) associates the coefficients to lags 0, 4, and 8, and sets the lag inclusion tolerance to 1e-10. Input Arguments coefficients — Lag operator polynomial coefficients numeric vector | square numeric matrix | cell vector of square numeric matrices Lag operator polynomial coefficients, specified as a numeric vector, m-by-m square numeric matrix, or cell vector of m-by-m square numeric matrices. Value
Polynomial Returned by LagOp
Numeric vector of length p + 1
A p-degree, 1-D lag operator polynomial, where coefficient(j) is the coefficient of lag j – 1
12-1759
12
Functions
Value
Polynomial Returned by LagOp
m-by-m square numeric matrix
A 0-degree, m-D lag operator polynomial, where coefficient is A0, the coefficient of lag 0
Length p + 1 cell vector of m-by-m square numeric matrices
A p-degree m-D lag operator polynomial, where coefficient(j) is the coefficient matrix of lag j – 1
Example: LagOp(1:3) creates the polynomial A(L) = 1 + 2L1 + 3L2. Name-Value Pair Arguments
Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter. Before R2021a, use commas to separate each name and value, and enclose Name in quotes. Example: 'Lags',[4 8],'Tolerance',1e-10 associates the coefficients to lags 4 and 8, and sets the coefficient magnitude tolerance to 1e-10. Lags — Lags associated with polynomial coefficients vector of unique, nonnegative integers Lags associated with the polynomial coefficients, specified as the comma-separated pair consisting of 'Lags' and a vector of unique, nonnegative integers. Lags must have numcoeff elements, where numcoeff is one of the following: • If coefficients is a numeric or cell vector, numcoeff = numel(coefficients). The coefficient of lag Lags(j) is coefficients(j). • If coefficients is a matrix, numcoeff = 1. Example: 'Lags',[0 4 8 12] Tolerance — Lag inclusion tolerance 1e–12 (default) | nonnegative numeric scalar Lag inclusion tolerance, specified as the comma-separated pair consisting of 'Tolerance' and a nonnegative numeric scalar. Let c be the element of coefficient j with the largest magnitude. If c ≤ Tolerance, LagOp removes coefficient j from the polynomial. Consequently, MATLAB performs the following actions: • Remove the corresponding lag from the vector in the Lags property. • Replace the corresponding coefficient of the array in the Coefficients property with zeros(m). • If a removed lag is the final lag in the polynomial, MATLAB reduces the degree of the polynomial to the degree of the next largest lag present in the polynomial. For example, if MATLAB removes lags 3 and 4 from a degree 4 polynomial because their coefficient magnitudes are below Tolerance, the resulting polynomial has a degree of 2. Example: 'Tolerance',1e-10
12-1760
LagOp
Properties Coefficients — Lag operator polynomial coefficients lag-indexed cell array of numeric scalars | lag-indexed cell array of numeric matrices Lag operator polynomial coefficients, specified as a lag-indexed cell array of numeric scalars or m-bym matrices. Coefficients{j} is the coefficient of lag j, where the integer j ≥ 0. For example, A.Coefficients{0} stores the lag 0 coefficient of A. The Lags property stores the indices of nonzero coefficients of Coefficients. Degree — Polynomial degree p nonnegative numeric scalar This property is read-only. Polynomial degree p, specified as a nonnegative numeric scalar. Degree = max(Lags), the largest lag associated with a nonzero coefficient. Data Types: double Dimension — Polynomial dimension m positive numeric scalar This property is read-only. Polynomial dimension m, specified as a positive numeric scalar. You can apply the polynomial A only to an m-D time series variable. Data Types: double Lags — Polynomial lags associated with nonzero coefficients lag-indexed vector of nonnegative integers This property is read-only. Polynomial lags associated with nonzero coefficients, specified as a lag-indexed vector of nonnegative integers. To work with the value of Lags, convert it to a standard MATLAB vector by entering the following code. lags = A.Lags;
Data Types: double
Object Functions filter isEqLagOp isNonZero isStable minus mldivide mrdivide mtimes
Apply lag operator polynomial to filter time series Determine if two LagOp objects are same mathematical polynomial Find lags associated with nonzero coefficients of LagOp objects Determine stability of lag operator polynomial Lag operator polynomial subtraction Lag operator polynomial left division Lag operator polynomial right division Lag operator polynomial multiplication 12-1761
12
Functions
plus reflect toCellArray
Lag operator polynomial addition Reflect lag operator polynomial coefficients around lag zero Convert lag operator polynomial object to cell array
Examples Create and Modify Univariate Lag Operator Polynomial Create a LagOp object that represents the lag operator polynomial A(L) = 1 − 0 . 6L + 0 . 08L2 . A = LagOp([1 -0.6 0.08]) A = 1-D Lag Operator Polynomial: ----------------------------Coefficients: [1 -0.6 0.08] Lags: [0 1 2] Degree: 2 Dimension: 1
Display the coefficient of lag 0. a0 = A.Coefficients{0} a0 = 1
Assign a nonzero coefficient to the third lag: A.Coefficients{3} = 0.5 A = 1-D Lag Operator Polynomial: ----------------------------Coefficients: [1 -0.6 0.08 0.5] Lags: [0 1 2 3] Degree: 3 Dimension: 1
The polynomial degree increases to 3.
Specify Polynomial Lags Create a LagOp object that represents the lag operator polynomial A(L) = 1 + 0 . 25L4 + 0 . 1L8 + 0 . 05L12 . nonzeroCoeffs = [1 0.25 0.1 0.05]; lags = [0 4 8 12]; A = LagOp(nonzeroCoeffs,'Lags',lags) A = 1-D Lag Operator Polynomial:
12-1762
LagOp
----------------------------Coefficients: [1 0.25 0.1 0.05] Lags: [0 4 8 12] Degree: 12 Dimension: 1
Extract the coefficients from the lag operator polynomial, and display all coefficients from lags 0 through 12. allCoeffs = toCellArray(A); allCoeffs = cell2mat(allCoeffs'); allLags = 0:A.Degree;
% Extract coefficients % Prepare lags for display
table(allCoeffs,'RowNames',"Lag " + string(allLags)) ans=13×1 table allCoeffs _________ Lag Lag Lag Lag Lag Lag Lag Lag Lag Lag Lag Lag Lag
0 1 2 3 4 5 6 7 8 9 10 11 12
1 0 0 0 0.25 0 0 0 0.1 0 0 0 0.05
Create Multivariate Lag Operator Polynomial Create a LagOp object that represents the lag operator polynomial 0.5 0 0 1 0 . 25 0 . 1 AL = 0 1 0 + −0 . 5 1 −0 . 5 L4 . 0 0 −0 . 5 0 . 15 −0 . 2 1 Phi0 = [0.5 0 0 1 0 0
0;... 0;... -0.5];
Phi4 = [1 -0.5 0.15
0.25 1 -0.2
0.1;... -0.5;... 1];
Phi = {Phi0 Phi4}; lags = [0 4]; A = LagOp(Phi,'Lags',lags)
12-1763
12
Functions
A = 3-D Lag Operator Polynomial: ----------------------------Coefficients: [Lag-Indexed Cell Array with 2 Non-Zero Coefficients] Lags: [0 4] Degree: 4 Dimension: 3
Work with Lag Operator Polynomials Create Multivariate Lag Operator Polynomial Create the 2-D, degree 3 lag operator polynomial A(L) =
1 0 0 . 5 0 . 25 0 . 05 0 . 025 3 + L+ L . 0 1 −0 . 1 0 . 4 −0 . 01 0 . 04
m = 2; A0 = eye(m); A1 = [0.5 0.25; -0.1 0.4]; A3 = 0.1*A1; Coeffs = {A0 A1 A3}; lags = [0 1 3]; A = LagOp(Coeffs,'Lags',lags) A = 2-D Lag Operator Polynomial: ----------------------------Coefficients: [Lag-Indexed Cell Array with 3 Non-Zero Coefficients] Lags: [0 1 3] Degree: 3 Dimension: 2
Determine Polynomial Stability A lag operator polynomial is stable if the magnitudes of all eigenvalues of its characteristic polynomial are less than 1. Determine whether the polynomial is stable, and return the eigenvalues of its characteristic polynomial. [tf,evals] = isStable(A) tf = logical 1 evals = 6×1 complex -0.5820 -0.5820 0.0821 0.0821 0.0499
12-1764
+ + +
0.1330i 0.1330i 0.2824i 0.2824i 0.2655i
LagOp
0.0499 - 0.2655i
Invert Polynomial Compute the inverse of A L by right dividing the 2-by-2 identity matrix by A L . Ainv = mrdivide(eye(A.Dimension),A) Ainv = 2-D Lag Operator Polynomial: ----------------------------Coefficients: [Lag-Indexed Cell Array with 10 Non-Zero Coefficients] Lags: [0 1 2 3 4 5 6 7 8 9] Degree: 9 Dimension: 2
Ainv is a LagOp object representing the inverse of A L , a degree 9 lag operator polynomial. In theory, the inverse of a lag operator polynomial is of infinite degree, but the mrdivide coefficientmagnitude tolerances truncate the polynomial. Multiply A L and its inverse. checkinv = Ainv*A checkinv = 2-D Lag Operator Polynomial: ----------------------------Coefficients: [Lag-Indexed Cell Array with 4 Non-Zero Coefficients] Lags: [0 10 11 12] Degree: 12 Dimension: 2
Because the inverse computation returns the truncated theoretical inverse, the product checkinv contains lags representing the remainder. You can decrease the mrdivide coefficient-magnitude tolerances to obtain an inverse polynomial that is more precise. Filter Time Series Produce the 2-D time series yt = A(L)et = A0et + A1et − 1 + A2et − 2 + A3et − 3 by filtering the 2-D series of 100 random standard Gaussian deviates et through the polynomial. T = 100; e = randn(T,m); y = filter(A,e); plot((A.Degree + 1):T,y) title('Filtered Series')
12-1765
12
Functions
y is a 97-by-2 matrix representing yt . y has p fewer observations than e because filter requires the first p observations of e to initialize the dynamic series when producing y(1:4,:).
Version History Introduced in R2010a
See Also arima | varm Topics “Specify Lag Operator Polynomials” on page 2-9 “Stochastic Process Characteristics” on page 1-18 “Create Autoregressive Models” on page 7-16 “Create Moving Average Models” on page 7-24 “Vector Autoregression (VAR) Model Creation” on page 9-20
12-1766
lassoblm
lassoblm Bayesian linear regression model with lasso regularization
Description The Bayesian linear regression model on page 12-1782 object lassoblm specifies the joint prior distribution of the regression coefficients and the disturbance variance (β, σ2) for implementing Bayesian lasso regression [1]. For j = 1,…,NumPredictors, the conditional prior distribution of βj|σ2 is the Laplace (double exponential) distribution with a mean of 0 and scale σ2/λ, where λ is the lasso regularization, or shrinkage, parameter. The prior distribution of σ2 is inverse gamma with shape A and scale B. The data likelihood is
T
∏
t=1
ϕ yt; xt β, σ2 , where ϕ(yt;xtβ,σ2) is the Gaussian probability density
evaluated at yt with mean xtβ and variance σ2. The resulting posterior distribution is not analytically tractable. For details on the posterior distribution, see “Analytically Tractable Posteriors” on page 6-5. In general, when you create a Bayesian linear regression model object, it specifies the joint prior distribution and characteristics of the linear regression model only. That is, the model object is a template intended for further use. Specifically, to incorporate data into the model for posterior distribution analysis and feature selection, pass the model object and data to the appropriate object function on page 12-1770.
Creation Syntax PriorMdl = lassoblm(NumPredictors) PriorMdl = lassoblm(NumPredictors,Name,Value) Description PriorMdl = lassoblm(NumPredictors) creates a Bayesian linear regression model on page 122077 object (PriorMdl) composed of NumPredictors predictors and an intercept, and sets the NumPredictors property. The joint prior distribution of (β, σ2) is appropriate for implementing Bayesian lasso regression [1]. PriorMdl is a template that defines the prior distributions and specifies the values of the lasso regularization parameter λ and the dimensionality of β. PriorMdl = lassoblm(NumPredictors,Name,Value) sets properties on page 12-1768 (except NumPredictors) using name-value pair arguments. Enclose each property name in quotes. For example, lassoblm(3,'Lambda',0.5) specifies a shrinkage of 0.5 for the three coefficients (not the intercept).
12-1767
12
Functions
Properties You can set writable property values when you create the model object by using name-value argument syntax, or after you create the model object by using dot notation. For example, to set the shrinkage for all coefficients, except the intercept, to 0.5, enter PriorMdl.Lambda = 0.5;
NumPredictors — Number of predictor variables nonnegative integer Number of predictor variables in the Bayesian multiple linear regression model, specified as a nonnegative integer. NumPredictors must be the same as the number of columns in your predictor data, which you specify during model estimation or simulation. When specifying NumPredictors, exclude any intercept term for the value. After creating a model, if you change the of value NumPredictors using dot notation, then these parameters revert to the default values: • Variable names (VarNames) • Shrinkage parameter (Lambda) Data Types: double Intercept — Flag for including regression model intercept true (default) | false Flag for including a regression model intercept, specified as a value in this table. Value
Description
false
Exclude an intercept from the regression model. Therefore, β is a p-dimensional vector, where p is the value of NumPredictors.
true
Include an intercept in the regression model. Therefore, β is a (p + 1)-dimensional vector. This specification causes a T-by-1 vector of ones to be prepended to the predictor data during estimation and simulation.
If you include a column of ones in the predictor data for an intercept term, then set Intercept to false. Example: 'Intercept',false Data Types: logical VarNames — Predictor variable names string vector | cell vector of character vectors Predictor variable names for displays, specified as a string vector or cell vector of character vectors. VarNames must contain NumPredictors elements. VarNames(j) is the name of the variable in column j of the predictor data set, which you specify during estimation, simulation, or forecasting. 12-1768
lassoblm
The default is {'Beta(1)','Beta(2),...,Beta(p)}, where p is the value of NumPredictors. Example: 'VarNames',["UnemploymentRate"; "CPI"] Data Types: string | cell | char Lambda — Lasso regularization parameter 1 (default) | positive numeric scalar | positive numeric vector Lasso regularization parameter for all regression coefficients, specified as a positive numeric scalar or (Intercept + NumPredictors)-by-1 positive numeric vector. Larger values of Lambda cause corresponding coefficients to shrink closer to zero. Suppose X is a T-by-NumPredictors matrix of predictor data, which you specify during estimation, simulation, or forecasting. • If Lambda is a vector and Intercept is true, Lambda(1) is the shrinkage for the intercept, Lambda(2) is the shrinkage for the coefficient of the first predictor X(:,1), Lambda(3) is the shrinkage for the coefficient of the second predictor X(:,2),…, and Lambda(NumPredictors + 1) is the shrinkage for the coefficient of the last predictor X(:,NumPredictors). • If Lambda is a vector and Intercept is false, Lambda(1) is the shrinkage for the coefficient of the first predictor X(:,1),…, and Lambda(NumPredictors) is the shrinkage for the coefficient of the last predictor X(:,NumPredictors). • If you supply the scalar s for Lambda, then all coefficients of the predictors in X have a shrinkage of s. • If Intercept is true, the intercept has a shrinkage of 0.01, and lassoblm stores [0.01; s*ones(NumPredictors,1)] in Lambda. • Otherwise, lassoblm stores s*ones(NumPredictors,1) in Lambda. Example: 'Lambda',6 Data Types: double A — Shape hyperparameter of inverse gamma prior on σ2 3 (default) | numeric scalar Shape hyperparameter of the inverse gamma prior on σ2, specified as a numeric scalar. A must be at least –(Intercept + NumPredictors)/2. With B held fixed, the inverse gamma distribution becomes taller and more concentrated as A increases. This characteristic weighs the prior model of σ2 more heavily than the likelihood during posterior estimation. For the functional form of the inverse gamma distribution, see “Analytically Tractable Posteriors” on page 6-5. Example: 'A',0.1 Data Types: double B — Scale hyperparameter of inverse gamma prior on σ2 1 (default) | positive scalar | Inf Scale parameter of inverse gamma prior on σ2, specified as a positive scalar or Inf. 12-1769
12
Functions
With A held fixed, the inverse gamma distribution becomes taller and more concentrated as B increases. This characteristic weighs the prior model of σ2 more heavily than the likelihood during posterior estimation. Example: 'B',5 Data Types: double
Object Functions estimate simulate forecast plot summarize
Perform predictor variable selection for Bayesian linear regression models Simulate regression coefficients and disturbance variance of Bayesian linear regression model Forecast responses of Bayesian linear regression model Visualize prior and posterior densities of Bayesian linear regression model parameters Distribution summary statistics of Bayesian linear regression model for predictor variable selection
Examples Create Prior Model for Bayesian Lasso Regression Consider the multiple linear regression model that predicts the US real gross national product (GNPR) using a linear combination of industrial production index (IPI), total employment (E), and real wages (WR). GNPRt = β0 + β1IPIt + β2Et + β3WRt + εt . For all t, εt is a series of independent Gaussian disturbances with a mean of 0 and variance σ2. Assume these prior distributions: • For j = 0,...,3, β j | σ2 has a Laplace distribution with a mean of 0 and a scale of σ2 /λ, where λ is the shrinkage parameter. The coefficients are conditionally independent. • σ2 ∼ IG(A, B). A and B are the shape and scale, respectively, of an inverse gamma distribution. Create a prior model for Bayesian linear regression. Specify the number of predictors p. p = 3; Mdl = lassoblm(p);
PriorMdl is a lassoblm Bayesian linear regression model object representing the prior distribution of the regression coefficients and disturbance variance. At the command window, lassoblm displays a summary of the prior distributions. Alternatively, you can create a prior model for Bayesian lasso regression by passing the number of predictors to bayeslm and setting the ModelType name-value pair argument to 'lasso'. MdlBayesLM = bayeslm(p,'ModelType','lasso') MdlBayesLM = lassoblm with properties: NumPredictors: 3
12-1770
lassoblm
Intercept: VarNames: Lambda: A: B:
1 {4x1 cell} [4x1 double] 3 1
| Mean Std CI95 Positive Distribution --------------------------------------------------------------------------Intercept | 0 100 [-200.000, 200.000] 0.500 Scale mixture Beta(1) | 0 1 [-2.000, 2.000] 0.500 Scale mixture Beta(2) | 0 1 [-2.000, 2.000] 0.500 Scale mixture Beta(3) | 0 1 [-2.000, 2.000] 0.500 Scale mixture Sigma2 | 0.5000 0.5000 [ 0.138, 1.616] 1.000 IG(3.00, 1)
Mdl and MdlBayesLM are equivalent model objects. You can set writable property values of created models using dot notation. Set the regression coefficient names to the corresponding variable names. Mdl.VarNames = ["IPI" "E" "WR"] Mdl = lassoblm with properties: NumPredictors: Intercept: VarNames: Lambda: A: B:
3 1 {4x1 cell} [4x1 double] 3 1
| Mean Std CI95 Positive Distribution --------------------------------------------------------------------------Intercept | 0 100 [-200.000, 200.000] 0.500 Scale mixture IPI | 0 1 [-2.000, 2.000] 0.500 Scale mixture E | 0 1 [-2.000, 2.000] 0.500 Scale mixture WR | 0 1 [-2.000, 2.000] 0.500 Scale mixture Sigma2 | 0.5000 0.5000 [ 0.138, 1.616] 1.000 IG(3.00, 1)
MATLAB® associates the variable names to the regression coefficients in displays.
Perform Variable Selection Using Default Lasso Shrinkage Consider the linear regression model in “Create Prior Model for Bayesian Lasso Regression” on page 12-1770. Create a prior model for performing Bayesian lasso regression. Specify the number of predictors p and the names of the regression coefficients. p = 3; PriorMdl = bayeslm(p,'ModelType','lasso','VarNames',["IPI" "E" "WR"]); shrinkage = PriorMdl.Lambda
12-1771
12
Functions
shrinkage = 4×1 0.0100 1.0000 1.0000 1.0000
PriorMdl stores the shrinkage values for all predictors in its Lambda property. shrinkage(1) is the shrinkage for the intercept, and the elements of shrinkage(2:end) correspond to the coefficients of the predictors in Mdl.VarNames. The default shrinkage for the intercept is 0.01, and the default is 1 for all other coefficients. Load the Nelson-Plosser data set. Create variables for the response and predictor series. Because lasso is sensitive to variable scales, standardize all variables. load Data_NelsonPlosser X = DataTable{:,PriorMdl.VarNames(2:end)}; y = DataTable{:,'GNPR'}; X = (X - mean(X,'omitnan'))./std(X,'omitnan'); y = (y - mean(y,'omitnan'))/std(y,'omitnan');
Although this example standardizes variables, you can specify different shrinkage values for each coefficient by setting the Lambda property of PriorMdl to a numeric vector of shrinkage values. Implement Bayesian lasso regression by estimating the marginal posterior distributions of β and σ2. Because Bayesian lasso regression uses Markov chain Monte Carlo (MCMC) for estimation, set a random number seed to reproduce the results. rng(1); PosteriorMdl = estimate(PriorMdl,X,y); Method: lasso MCMC sampling with 10000 draws Number of observations: 62 Number of predictors: 4 | Mean Std CI95 Positive Distribution ----------------------------------------------------------------------Intercept | -0.4490 0.0527 [-0.548, -0.344] 0.000 Empirical IPI | 0.6679 0.1063 [ 0.456, 0.878] 1.000 Empirical E | 0.1114 0.1223 [-0.110, 0.365] 0.827 Empirical WR | 0.2215 0.1367 [-0.024, 0.494] 0.956 Empirical Sigma2 | 0.0343 0.0062 [ 0.024, 0.048] 1.000 Empirical
PosteriorMdl is an empiricalblm model object that stores draws from the posterior distributions of β and σ2 given the data. estimate displays a summary of the marginal posterior distributions at the command line. Rows of the summary correspond to regression coefficients and the disturbance variance, and columns correspond to characteristics of the posterior distribution. The characteristics include: • CI95, which contains the 95% Bayesian equitailed credible intervals for the parameters. For example, the posterior probability that the regression coefficient of E (standardized) is in [-0.110, 0.365] is 0.95. • Positive, which contains the posterior probability that the parameter is greater than 0. For example, the probability that the intercept is greater than 0 is 0. 12-1772
lassoblm
By default, estimate draws and discards a burn-in sample of size 5000. However, a good practice is to inspect a trace plot of the draws for adequate mixing and lack of transience. Plot a trace plot of the draws for each parameter. You can access the draws that compose the distribution (the properties BetaDraws and Sigma2Draws) using dot notation. figure; for j = 1:(p + 1) subplot(2,2,j); plot(PosteriorMdl.BetaDraws(j,:)); title(sprintf('%s',PosteriorMdl.VarNames{j})); end
figure; plot(PosteriorMdl.Sigma2Draws); title('Sigma2');
12-1773
12
Functions
The trace plots indicate that the draws seem to mix well. The plot shows no detectable transience or serial correlation, and the draws do not jump between states. Plot the posterior distributions of the coefficients and disturbance variance. figure; plot(PosteriorMdl)
12-1774
lassoblm
E and WR might not be important predictors because 0 is within the region of high density in their posterior distributions.
Attribute Different Shrinkage Values to Coefficients Consider the linear regression model in “Create Prior Model for Bayesian Lasso Regression” on page 12-1770 and its implementation in “Perform Variable Selection Using Default Lasso Shrinkage” on page 12-1771. When you implement lasso regression, a common practice is to standardize variables. However, if you want to preserve the interpretation of the coefficients, but the variables have different scales, then you can perform differential shrinkage by specifying a different shrinkage for each coefficient. Create a prior model for performing Bayesian lasso regression. Specify the number of predictors p and the names of the regression coefficients. p = 3; PriorMdl = bayeslm(p,'ModelType','lasso','VarNames',["IPI" "E" "WR"]);
Load the Nelson-Plosser data set. Create variables for the response and predictor series. Determine whether the variables have exponential trends by plotting each in separate figures. load Data_NelsonPlosser X = DataTable{:,PriorMdl.VarNames(2:end)};
12-1775
12
Functions
y = DataTable{:,'GNPR'}; figure; plot(dates,y) title('GNPR')
for j = 1:3 figure; plot(dates,X(:,j)); title(PriorMdl.VarNames(j + 1)); end
12-1776
lassoblm
12-1777
12
Functions
12-1778
lassoblm
The variables GNPR, IPI, and WR appear to have an exponential trend. Remove the exponential trend from the variables GNPR, IPI, and WR. y = log(y); X(:,[1 3]) = log(X(:,[1 3]));
All predictor variables have different scales (for more details, enter Description at the command line). Display the mean of each predictor. Remove observations containing leading missing values from all predictors. predmeans = mean(X,'omitnan') predmeans = 1×3 104 × 0.0002
4.7700
0.0004
The values of the second predictor are much greater than those of the other two predictors and the response. Therefore, the regression coefficient of the second predictor can appear close to zero. Using dot notation, attribute a very low shrinkage to the intercept, a shrinkage of 0.1 to the first and third predictors, and a shrinkage of 1000 to the second predictor. PriorMdl.Lambda = [1e-5 0.1 1e4 0.1];
12-1779
12
Functions
Implement Bayesian lasso regression by estimating the marginal posterior distributions of β and σ2. Because Bayesian lasso regression uses MCMC for estimation, set a random number seed to reproduce the results. rng(1); PosteriorMdl = estimate(PriorMdl,X,y); Method: lasso MCMC sampling with 10000 draws Number of observations: 62 Number of predictors: 4 | Mean Std CI95 Positive Distribution ---------------------------------------------------------------------Intercept | 2.0281 0.6839 [ 0.679, 3.323] 0.999 Empirical IPI | 0.3534 0.2497 [-0.139, 0.839] 0.923 Empirical E | 0.0000 0.0000 [-0.000, 0.000] 0.762 Empirical WR | 0.5250 0.3482 [-0.126, 1.209] 0.937 Empirical Sigma2 | 0.0315 0.0055 [ 0.023, 0.044] 1.000 Empirical
Forecast Responses Using Posterior Predictive Distribution Consider the linear regression model in “Create Prior Model for Bayesian Lasso Regression” on page 12-1770. Perform Bayesian lasso regression: 1
Create a Bayesian lasso prior model for the regression coefficients and disturbance variance. Use the default shrinkage.
2
Hold out the last 10 periods of data from estimation.
3
Estimate the marginal posterior distributions.
p = 3; PriorMdl = bayeslm(p,'ModelType','lasso','VarNames',["IPI" "E" "WR"]); load Data_NelsonPlosser fhs = 10; % Forecast horizon size X = DataTable{1:(end - fhs),PriorMdl.VarNames(2:end)}; y = DataTable{1:(end - fhs),'GNPR'}; XF = DataTable{(end - fhs + 1):end,PriorMdl.VarNames(2:end)}; % Future predictor data yFT = DataTable{(end - fhs + 1):end,'GNPR'}; % True future responses rng(1); % For reproducibility PosteriorMdl = estimate(PriorMdl,X,y,'Display',false);
Forecast responses using the posterior predictive distribution and the future predictor data XF. Plot the true values of the response and the forecasted values. yF = forecast(PosteriorMdl,XF); figure; plot(dates,DataTable.GNPR); hold on plot(dates((end - fhs + 1):end),yF)
12-1780
lassoblm
h = gca; hp = patch([dates(end - fhs + 1) dates(end) dates(end) dates(end - fhs + 1)],... h.YLim([1,1,2,2]),[0.8 0.8 0.8]); uistack(hp,'bottom'); legend('Forecast Horizon','True GNPR','Forecasted GNPR','Location','NW') title('Real Gross National Product: 1909 - 1970'); ylabel('rGNP'); xlabel('Year'); hold off
yF is a 10-by-1 vector of future values of real GNP corresponding to the future predictor data. Estimate the forecast root mean squared error (RMSE). frmse = sqrt(mean((yF - yFT).^2)) frmse = 25.4831
The forecast RMSE is a relative measure of forecast accuracy. Specifically, you estimate several models using different assumptions. The model with the lowest forecast RMSE is the best-performing model of the ones being compared. When you perform Bayesian lasso regression, a best practice is to search for appropriate shrinkage values. One way to do so is to estimate the forecast RMSE over a grid of shrinkage values, and choose the shrinkage that minimizes the forecast RMSE.
12-1781
12
Functions
More About Bayesian Linear Regression Model A Bayesian linear regression model treats the parameters β and σ2 in the multiple linear regression (MLR) model yt = xtβ + εt as random variables. For times t = 1,...,T: • yt is the observed response. • xt is a 1-by-(p + 1) row vector of observed values of p predictors. To accommodate a model intercept, x1t = 1 for all t. • β is a (p + 1)-by-1 column vector of regression coefficients corresponding to the variables that compose the columns of xt. • εt is the random disturbance with a mean of zero and Cov(ε) = σ2IT×T, while ε is a T-by-1 vector containing all disturbances. These assumptions imply that the data likelihood is ℓ β, σ2 y, x =
T
∏
t=1
ϕ yt; xt β, σ2 .
ϕ(yt;xtβ,σ2) is the Gaussian probability density with mean xtβ and variance σ2 evaluated at yt;. Before considering the data, you impose a joint prior distribution assumption on (β,σ2). In a Bayesian analysis, you update the distribution of the parameters by using information about the parameters obtained from the likelihood of the data. The result is the joint posterior distribution of (β,σ2) or the conditional posterior distributions of the parameters.
Tips • Lambda is a tuning parameter. Therefore, perform Bayesian lasso regression using a grid of shrinkage values, and choose the model that best balances a fit criterion and model complexity. • For estimation, simulation, and forecasting, MATLAB does not standardize predictor data. If the variables in the predictor data have different scales, then specify a shrinkage parameter for each predictor by supplying a numeric vector for Lambda.
Alternative Functionality The bayeslm function can create any supported prior model object for Bayesian linear regression.
Version History Introduced in R2018b
References [1] Park, T., and G. Casella. "The Bayesian Lasso." Journal of the American Statistical Association. Vol. 103, No. 482, 2008, pp. 681–686.
12-1782
lassoblm
See Also Objects empiricalblm | mixconjugateblm | mixsemiconjugateblm Topics “Bayesian Linear Regression” on page 6-2 “Implement Bayesian Linear Regression” on page 6-10
12-1783
12
Functions
lazy Adjust Markov chain state inertia
Syntax lc = lazy(mc) lc = lazy(mc,w)
Description lc = lazy(mc) transforms the discrete-time Markov chain mc into the lazy chain on page 12-1792 lc with an adjusted state inertia. lc = lazy(mc,w) applies the inertial weights w for the transformation.
Examples Create Lazy Markov Chain Consider this three-state transition matrix. 0 10 P= 0 0 1 . 1 00 Create the irreducible and periodic Markov chain that is characterized by the transition matrix P. P = [0 1 0; 0 0 1; 1 0 0]; mc = dtmc(P);
At time t = 1,..., T, mc is forced to move to another state deterministically. Determine the stationary distribution of the Markov chain and whether it is ergodic. xFix = asymptotics(mc) xFix = 1×3 0.3333
0.3333
0.3333
isergodic(mc) ans = logical 0
mc is irreducible and not ergodic. As a result, mc has a stationary distribution, but it is not a limiting distribution for all initial distributions. Show why xFix is not a limiting distribution for all initial distributions. 12-1784
lazy
x0 = [1 0 0]; x1 = x0*P x1 = 1×3 0
1
0
0
1
0
0
x2 = x1*P x2 = 1×3 0 x3 = x2*P x3 = 1×3 1
sum(x3 == x0) == mc.NumStates ans = logical 1
The initial distribution is reached again after several steps, which implies that the subsequent state distributions cycle through the same sets of distributions indefinitely. Therefore, mc does not have a limiting distribution. Create a lazy version of the Markov chain mc. lc = lazy(mc) lc = dtmc with properties: P: [3x3 double] StateNames: ["1" "2" NumStates: 3
"3"]
lc.P ans = 3×3 0.5000 0 0.5000
0.5000 0.5000 0
0 0.5000 0.5000
lc is a dtmc object. At time t = 1,..., T, lc "flips a fair coin". It remains in its current state if the "coin shows heads" and transitions to another state if the "coin shows tails". Determine the stationary distribution of the lazy chain and whether it is ergodic. lcxFix = asymptotics(lc)
12-1785
12
Functions
lcxFix = 1×3 0.3333
0.3333
0.3333
isergodic(lc) ans = logical 1
lc and mc have the same stationary distributions, but only lc is ergodic. Therefore, the limiting distribution of lc exists and is equal to its stationary distribution.
Supply Inertial Weights for Lazy Chain Transformation Consider this theoretical, right-stochastic transition matrix of a stochastic process. 0 0 0 P= 0 0 1/2 1/4
0 0 0 0 0 1/2 3/4
1/2 1/3 0 0 0 0 0
1/4 0 0 0 0 0 0
1/4 2/3 0 0 0 0 0
0 0 1/3 1/2 3/4 0 0
0 0 2/3 1/2 . 1/4 0 0
Create the Markov chain that is characterized by the transition matrix P. P = [ 0 0 1/2 1/4 1/4 0 0 ; 0 0 1/3 0 2/3 0 0 ; 0 0 0 0 0 1/3 2/3; 0 0 0 0 0 1/2 1/2; 0 0 0 0 0 3/4 1/4; 1/2 1/2 0 0 0 0 0 ; 1/4 3/4 0 0 0 0 0 ]; mc = dtmc(P);
Plot the eigenvalues of the transition matrix on the complex plane. figure; eigplot(mc); title('Original Markov Chain')
12-1786
lazy
Three eigenvalues have modulus one, which indicates that the period of mc is three. Create lazy versions of the Markov chain mc using various inertial weights. Plot the eigenvalues of the lazy chains on separate complex planes. w2 = 0.1; % More active Markov chain w3 = 0.9; % Lazier Markov chain w4 = [0.9 0.1 0.25 0.5 0.25 0.001 0.999]; % Laziness differs between states lc1 lc2 lc3 lc4
= = = =
lazy(mc); lazy(mc,w2); lazy(mc,w3); lazy(mc,w4);
figure; eigplot(lc1); title('Default Laziness');
12-1787
12
Functions
figure; eigplot(lc2); title('More Active Chain');
12-1788
lazy
figure; eigplot(lc3); title('Lazier Chain');
12-1789
12
Functions
figure; eigplot(lc4); title('Differing Laziness Levels');
12-1790
lazy
All lazy chains have only one eigenvalue with modulus one. Therefore, they are aperiodic. The spectral gap (distance between inner and outer circle) determines the mixing time. Observe that all lazy chains take longer to mix than the original Markov chain. Chains with different inertial weights than the default take longer to mix than the default lazy chain.
Input Arguments mc — Discrete-time Markov chain dtmc object Discrete-time Markov chain with NumStates states and transition matrix P, specified as a dtmc object. P must be fully specified (no NaN entries). w — Inertial weights 0.5 (default) | numeric scalar | numeric vector Inertial weights, specified as a numeric scalar or vector of length NumStates. Values must be between 0 and 1. • If w is a scalar, lazy applies it to all states. That is, the transition matrix of the lazy chain (lc.P) is the result of the linear transformation Plazy = 1 − w P + wI . P is mc.P and I is the NumStates-by-NumStates identity matrix. 12-1791
12
Functions
• If w is a vector, lazy applies the weights state by state (row by row). Data Types: double
Output Arguments lc — Discrete-time Markov chain dtmc object Discrete-time Markov chain, returned as a dtmc object. lc is the lazy version of mc.
More About Lazy Chain A lazy version of a Markov chain has, for each state, a probability of staying in the same state equal to at least 0.5. In a directed graph of a Markov chain, the default lazy transformation ensures self-loops on all states, eliminating periodicity. If the Markov chain is irreducible, then its lazy version is ergodic. See graphplot.
Version History Introduced in R2017b
References [1] Gallager, R.G. Stochastic Processes: Theory for Applications. Cambridge, UK: Cambridge University Press, 2013.
See Also Objects dtmc Functions graphplot | asymptotics Topics “Markov Chain Modeling” on page 10-8 “Create and Modify Markov Chain Model Objects” on page 10-17 “Determine Asymptotic Behavior of Markov Chain” on page 10-39
12-1792
lbqtest
lbqtest Ljung-Box Q-test for residual autocorrelation
Syntax h = lbqtest(res) [h,pValue,stat,cValue] = lbqtest(res) StatTbl = lbqtest(Tbl) [ ___ ] = lbqtest( ___ ,Name=Value)
Description h = lbqtest(res) returns the rejection decision h from conducting a Ljung-Box Q-test on page 121804 for autocorrelation in the residual series res. [h,pValue,stat,cValue] = lbqtest(res) also returns the p-value pValue, test statistic stat, and critical value cValue of the test. StatTbl = lbqtest(Tbl) returns the table StatTbl containing variables for the test results, statistics, and settings from conducting a Ljung-Box Q-test for residual autocorrelation in the last variable of the input table or timetable Tbl. To select a different variable in Tbl to test, use the DataVariable name-value argument. [ ___ ] = lbqtest( ___ ,Name=Value) specifies options using one or more name-value arguments in addition to any of the input argument combinations in previous syntaxes. lbqtest returns the output argument combination for the corresponding input arguments. Some options control the number of tests to conduct. The following conditions apply when lbqtest conducts multiple tests: • lbqtest treats each test as separate from all other tests. • If you specify res, all outputs are vectors. • If you specify Tbl, each row of StatTbl contains the results of the corresponding test. For example, lbqtest(Tbl,DataVariable="ResidualGDP",Alpha=0.025,Lags=[1 4]) conducts two tests, at a level of significance of 0.025, for the presence of residual autocorrelation in the variable ResidualGDP of the table Tbl. The first test includes 1 lag in the test statistic, and the second test includes 4 lags.
Examples Conduct Ljung-Box Q-Test on Vector of Data Test a time series for residual autocorrelation using default options of lbqtest. Input the time series data as a numeric vector. Load the Deutschmark/British pound foreign-exchange rate data set. load Data_MarkPound
12-1793
12
Functions
Data is a time series vector of daily Deutschmark/British pound bilateral spot exchange rates. Plot the series. plot(Data) title("\bf Deutschmark/British Pound Bilateral Spot Exchange Rate") ylabel("Spot Exchange Rate") xlabel("Business Days Since January 2, 1984")
The series appears nonstationary. To stabilize the series, convert the spot exchange rates to returns. returns = price2ret(Data); plot(returns) title("\bf Deutschmark/British Pound Bilateral Spot Exchange Rate") ylabel("Return") xlabel("Business Days Since January 3, 1984")
12-1794
lbqtest
Compute the deviations of the return series from the mean. residuals = returns - mean(returns);
At 0.05 level of significance, test the residual series for autocorrelation using the default options of the Ljung-Box Q-test. h = lbqtest(residuals) h = logical 0
The result h = 0 indicates that insufficient evidence exists to reject the null hypothesis of no residual autocorrelation through 20 lags.
Return Test p-Value and Decision Statistics Load the Deutschmark/British pound foreign-exchange rate data set. load Data_MarkPound
Preprocess the data by following this procedure: 1
Stabilize the series by computing daily returns. 12-1795
12
Functions
2
Compute the deviations from the mean return.
returns = price2ret(Data); residuals = returns - mean(returns);
Test the residual series for a significant autocorrelation from 1 through 20 lags. Return the test decision, p-value, test statistic, and critical value. [h,pValue,stat,cValue] = lbqtest(residuals) h = logical 0 pValue = 0.1131 stat = 27.8445 cValue = 31.4104
Conduct Ljung-Box Q-Test on Table Variable Test a time series, which is one variable in a table, for residual autocorrelation using default options of lbqtest. Load the equity index data set Data_EquityIdx. Preprocess the daily NASDAQ closing prices by performing the following actions: 1
Convert the price series to a percentage return series by using price2ret.
2
Represent the series as residuals that fluctuate around a constant level by centering the returns series.
Store the residual series in the table with the rest of the data. Because the price-to-return conversion reduces the sample size from the head of the series, replace the missing residual with the a NaN. load Data_EquityIdx ret = 100*price2ret(DataTable.NASDAQ); res = ret - mean(ret); DataTable.Residuals_NASDAQ = [NaN; res]; DataTable.Properties.VariableNames{end} ans = 'Residuals_NASDAQ'
The residual series is the last variable in the table. Conduct Ljung-Box Q-test on the residual series at a 5% significance level by supplying the entire data set lbqtest. StatTbl = lbqtest(DataTable) StatTbl=1×7 table h _____
12-1796
pValue __________
stat ______
cValue ______
Lags ____
Alpha _____
DoF ___
lbqtest
Test 1
true
2.8182e-11
92.395
31.41
20
0.05
20
lbqtest returns test results and settings in the table StatTbl, where variables correspond to test results (h, pValue, stat, and cValue) and settings (Lags, Alpha, and DoF), and rows correspond to individual tests (in this case, lbqtest conducts one test). h = 1 and pValue = 2.82e-11 rejects the null hypothesis and suggests that the evidence for at least one significant autocorrelation in lags 1 through 20 in the NASDAQ returns residual series is strong. By default, lbqtest tests the last variable in the table. To select a variable from an input table to test, set the DataVariable option.
Test Time Series for Autocorrelation and ARCH Effects Load the Deutschmark/British pound foreign-exchange rate data set. load Data_MarkPound
Convert the prices to returns. returns = price2ret(Data);
Compute the deviations of the return series. res = returns - mean(returns);
Test the hypothesis that the residual series is not autocorrelated, using the default number of lags. h1 = lbqtest(res) h1 = logical 0
h1 = 0 indicates that there is insufficient evidence to reject the null hypothesis that the residuals of the returns are not autocorrelated. Test the hypothesis that there are significant ARCH effects, using the default number of lags [3]. h2 = lbqtest(res.^2) h2 = logical 1
h2 = 1 indicates that there are significant ARCH effects in the residuals of the returns. Test for residual heteroscedasticity using archtest. Specify an alternative ARCH(L) model, where L ≤ 20, for consistency with h2. h3 = archtest(res,Lags=20)
12-1797
12
Functions
h3 = logical 1
h3 = 1 indicates that the null hypothesis of no residual heteroscedasticity should be rejected in favor of an ARCH(L) model, where L ≤ 20. This result is consistent with h2.
Test Several Autocorrelation Lags Conduct multiple Ljung-Box Q-tests for autocorrelation by specifying several lags for the test statistic. The data set is a time series of 57 consecutive days of overshorts from an underground gasoline tank in Colorado [2]. That is, the current overshort ( yt) represents the accuracy in measuring the amount of fuel: • In the tank at the end of day t • In the tank at the end of day t − 1 • Delivered to the tank on day t • Sold on day t. Load the data set. load Data_Overshort T = height(DataTable); figure plot(DataTable.OSHORT) title('Daily Gasoline Overshorts')
12-1798
lbqtest
lbqtest is appropriate for a series with a constant mean. Because the series appears to fluctuate around a constant mean, you do not need to stabilize it. Compute deviations from the mean. DataTable.Residuals_OSHORT = DataTable.OSHORT - mean(DataTable.OSHORT);
Assess whether the residuals are autocorrelated. Include 5, 10, and 15 lags in the test statistic, and adjust the significance level of each test to 0.05/3. StatTbl = lbqtest(DataTable,DataVariable="Residuals_OSHORT", ... Lags=[5 10 15],Alpha=0.05/3) StatTbl=3×7 table h _____ Test 1 Test 2 Test 3
true true true
pValue __________
stat ______
cValue ______
Lags ____
Alpha ________
DoF ___
0.0016465 0.00068328 0.001281
19.36 30.599 36.964
13.839 21.707 28.88
5 10 15
0.016667 0.016667 0.016667
5 10 15
Rows of StatTbl contain results of separate tests conducted for each specified lag. Each test rejects the null hypothesis at 0.0167 level of significance.
12-1799
12
Functions
Assess Autocorrelation in Inferred Residuals Infer residuals from an estimated ARIMA model, and assess whether the residuals exhibit autocorrelation using lbqtest. Load the Australian Consumer Price Index (CPI) data set. The time series (cpi) is the log quarterly CPI from 1972 to 1991. Remove the trend in the series by taking the first difference. load Data_JAustralian cpi = DataTable.PAU; T = length(cpi); dCPI = diff(cpi); dt = datetime(dates,ConvertFrom="datenum"); figure plot(dt(2:T),dCPI) title("Differenced Australian CPI") xlabel("Year") ylabel("CPI Growth Rate") axis tight
The differenced series appears stationary. Fit an AR(1) model to the series, and then infer residuals from the estimated model. Mdl = arima(1,0,0); EstMdl = estimate(Mdl,dCPI);
12-1800
lbqtest
ARIMA(1,0,0) Model (Gaussian Distribution): Value _________ Constant AR{1} Variance
0.015564 0.29646 0.0001038
StandardError _____________
TStatistic __________
PValue __________
5.4106 2.6834 8.6994
6.2808e-08 0.0072876 3.3362e-18
0.0028766 0.11048 1.1932e-05
res = infer(EstMdl,dCPI); stdRes = res/sqrt(EstMdl.Variance); % Standardized residuals
Assess whether the residuals are autocorrelated by conducting a Ljung-Box Q-test. The standardized residuals originate from the estimated model (EstMdl) containing parameters. When using such residuals, perform the following actions: • Adjust the degrees of freedom (DoF) of the test statistic distribution to account for the estimated parameters. • Set the number of lags to include in the test statistic. • When you count the estimated parameters, skip the constant and variance parameters. lags = 10; dof = lags - 1; % One autoregressive parameter [h,pValue] = lbqtest(stdRes,Lags=lags,DoF=dof) h = logical 1 pValue = 0.0119
pValue = 0.0119 suggests that the residuals have significant autocorrelation in at least one lag of lags 1 through 5, at the 5% level.
Input Arguments res — Residual series vector Residual series, specified as a numeric vector. Each element of res corresponds to an observation. Typically, res contains the (standardized) residuals from a model fit to observed time series. Data Types: double Tbl — Time series data table | timetable Time series data, specified as a table or timetable. Each row of Tbl is an observation. Specify a single residual series (variable) to test by using the DataVariable argument. The selected variable must be numeric. 12-1801
12
Functions
Note Specify missing observations using NaN. The lbqtest function treats missing values as missing completely at random on page 12-1804. Name-Value Arguments Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter. Before R2021a, use commas to separate each name and value, and enclose Name in quotes. Example: lbqtest(Tbl,DataVariable="ResidualGDP",Alpha=0.025,Lags=[1 4]) conducts two tests, at a level of significance of 0.025, for the presence of residual autocorrelation in the variable ResidualGDP of the table Tbl. The first test includes 1 lag in the test statistic, and the second test includes 4 lags. Lags — Number of lags min([20,T-1]) (default) | positive integer | vector of positive integers Number of lags L to include in the test statistic, specified as a positive integer that is less than T or a vector of such positive integers, where T is the effective sample size (the number of nonmissing values in the input series). lbqtest conducts a separate test for each element in Lags. Example: Lags=[1 4] conducts two tests. The first test includes only the first lag in the AR model of the squared residuals, and the second test includes the first through fourth lags. Data Types: double Alpha — Significance level 0.05 (default) | numeric scalar | numeric vector Significance level for the hypothesis test, specified as a numeric scalar in the interval (0,1) or a numeric vector of such values. lbqtest conducts a separate test for each value in Alpha. Example: Alpha=[0.01 0.05] uses a level of significance of 0.01 for the first test, and then uses a level of significance of 0.05 for the second test. Data Types: double DoF — Degrees of freedom Lags (default) | positive integer | vector of positive integers Degrees of freedom for the asymptotic chi-square distribution of the test statistic under the null hypothesis, specified as a positive integer or vector of positive integers. lbqtest conducts a separate test for each value in DoF. If DoF is an integer, then it must be less than or equal to Lags. Otherwise, each element of DoF must be less than or equal to the corresponding element of Lags. Example: DoF=15 specifies 15 degrees of freedom for the distribution of the test statistic. Data Types: double 12-1802
lbqtest
DataVariable — Variable in Tbl to test last variable (default) | string scalar | character vector | integer | logical vector Variable in Tbl to test, specified a string scalar or character vector containing a variable name in Tbl.Properties.VariableNames, or an integer or logical vector representing the index of a name. Example: DataVariable="ResidualGDP" Example: DataVariable=[false true false false] or DataVariable=2 tests the second table variable. Data Types: double | logical | char | string
Output Arguments h — Test rejection decisions logical scalar | logical vector Test rejection decisions, returned as a logical scalar or vector with length equal to the number of tests. lbqtest returns h when you supply the input res. • Values of 1 indicate rejection of the no residual autocorrelation null hypothesis in favor of the alternative. • Values of 0 indicate failure to reject the no residual autocorrelation null hypothesis. pValue — Test statistic p-values numeric scalar | numeric vector Test statistic p-values, returned as a numeric scalar or vector with length equal to the number of tests. lbqtest returns pValue when you supply the input res. stat — Test statistics numeric scalar | numeric vector Test statistics, returned as a numeric scalar or vector with length equal to the number of tests. lbqtest returns stat when you supply the input res. cValue — Test critical values numeric scalar | numeric vector Test critical values, determined by Alpha, returned as a numeric scalar or vector with length equal to the number of tests. lbqtest returns cValue when you supply the input res. StatTbl — Test summary table Test summary, returned as a table with variables for the outputs h, pValue, stat, and cValue, and with a row for each test. lbqtest returns StatTbl when you supply the input Tbl. StatTbl contains variables for the test settings specified by Lags, Alpha, and DoF.
12-1803
12
Functions
More About Ljung-Box Q-Test The Ljung-Box Q-test is a "portmanteau" test that assesses the null hypothesis that a series of residuals exhibits no autocorrelation for a fixed number of lags L (see Lags), against the alternative that some autocorrelation coefficient ρ(k), k = 1, ..., L is nonzero. The test statistic is Q = T(T + 2)
L
∑
k=1
2
ρ(k) , (T − k)
where T is the sample size, L is the number of autocorrelation lags, and ρ(k) is the sample autocorrelation at lag k. Under the null hypothesis, the asymptotic distribution of Q is chi-square with L degrees of freedom. Missing Completely at Random Observations of a random variable are missing completely at random if the tendency of an observation to be missing is independent of both the random variable and the tendency of all other observations to be missing.
Tips If you obtain the input residual series by fitting a model to data, reduce the degrees of freedom DoF by the number of estimated coefficients, excluding constants. For example, if you obtain the input residuals by fitting an ARMA(p,q) model, set DoF=L−p−q, where L is the value of Lags.
Algorithms • The value of the Lags argument L affects the power of the test. • If L is too small, the test does not detect high-order autocorrelations. • If L is too large, the test loses power when a significant correlation at one lag is washed out by insignificant correlations at other lags. • Box, Jenkins, and Reinsel suggest the default setting Lags=min[20,T-1] [1]. • Tsay cites simulation evidence showing better test power performance when L is approximately log(T) [5]. • lbqtest does not directly test for serial dependencies other than autocorrelation. However, you can use it to identify conditional heteroscedasticity (ARCH effects) by testing squared residuals [4]. Engle's test assesses the significance of ARCH effects directly. For details, see archtest.
Version History Introduced before R2006a 12-1804
lbqtest
References [1] Box, George E. P., Gwilym M. Jenkins, and Gregory C. Reinsel. Time Series Analysis: Forecasting and Control. 3rd ed. Englewood Cliffs, NJ: Prentice Hall, 1994. [2] Brockwell, P. J. and R. A. Davis. Introduction to Time Series and Forecasting. 2nd ed. New York, NY: Springer, 2002. [3] Gourieroux, C. ARCH Models and Financial Applications. New York: Springer-Verlag, 1997. [4] McLeod, A. I. and W. K. Li. "Diagnostic Checking ARMA Time Series Models Using SquaredResidual Autocorrelations." Journal of Time Series Analysis. Vol. 4, 1983, pp. 269–273. [5] Tsay, R. S. Analysis of Financial Time Series. 2nd Ed. Hoboken, NJ: John Wiley & Sons, Inc., 2005.
See Also autocorr | archtest Topics “Detect Autocorrelation” on page 3-19 “Check Fit of Multiplicative ARIMA Model” on page 3-80 “Specify Conditional Mean and Variance Models” on page 7-75 “Time Series Regression VI: Residual Diagnostics” on page 5-223 “Ljung-Box Q-Test” on page 3-17
12-1805
12
Functions
lmctest Leybourne-McCabe stationarity test
Syntax h = lmctest(y) [h,pValue,stat,cValue] = lmctest(y) StatTbl = lmctest(Tbl) [ ___ ] = lmctest( ___ ,Name=Value) [ ___ ,reg1,reg2] = lmctest( ___ )
Description h = lmctest(y) returns the rejection decision h from conducting the Leybourne-McCabe stationarity test on page 12-1815 for assessing whether the univariate time series y is stationary. [h,pValue,stat,cValue] = lmctest(y) also returns the p-value pValue, test statistic stat, and critical value cValue of the test. StatTbl = lmctest(Tbl) returns the table StatTbl containing variables for the test results, statistics, and settings from conducting the Leybourne-McCabe stationarity test on the last variable of the input table or timetable Tbl. To select a different variable in Tbl to test, use the DataVariable name-value argument. [ ___ ] = lmctest( ___ ,Name=Value) specifies options using one or more name-value arguments in addition to any of the input argument combinations in previous syntaxes. lmctest returns the output argument combination for the corresponding input arguments. Some options control the number of tests to conduct. The following conditions apply when lmctest conducts multiple tests: • lmctest treats each test as separate from all other tests. • If you specify y, all outputs are vectors. • If you specify Tbl, each row of StatTbl contains the results of the corresponding test. For example, lmctest(Tbl,DataVariable="GDP",Alpha=0.025,Lags=[0 1]) conducts two tests, at a level of significance of 0.025, on the variable GDP of the table Tbl. The first test includes 0 lagged terms in the structural model, and the second test includes 1 lagged term in the structural model. [ ___ ,reg1,reg2] = lmctest( ___ ) additionally returns structures of regression statistics, which are required to form the test statistic on page 12-1816. • reg1 – Maximum likelihood estimation of the reduced-form model • reg2 – Deterministic local level model of filtered response data, with Gaussian noise and an optional linear trend
Examples
12-1806
lmctest
Conduct Leybourne-McCabe Stationarity Test on Vector of Data Test a time series for stationarity using the default options of lmctest. Input the time series data as a numeric vector. Load Schwert's macroeconomic data set. Extract the monthly unemployment rate UN. load Data_SchwertMacro un = DataTableMth.UN;
Represent the series as a growth rate by applying the first difference. unr = diff(un);
The first difference operation causes unr to have one less observation than un. The timebase of unr starts at observation 2. Plot the unemployment growth rate. dts = datetime(datesMth,ConvertFrom="datenum"); plot(dts(2:end),unr) title("Unemployment Growth Rate")
Assess the null hypothesis of the Leybourne-McCabe stationarity test that the unemployment growth rate series is a trend-stationary AR(0) model. Use default options. h = lmctest(unr)
12-1807
12
Functions
h = logical 0
h = 0 indicates that, at a 5% level of significance, the test fails to reject the null hypothesis that the unemployment growth rate series is a trend-stationary AR(0) model.
Return Test p-Value and Decision Statistics Load Schwert's macroeconomic data set Data_SchwertMacro.mat. Extract the monthly unemployment rate UN, and apply the first difference to the series. load Data_SchwertMacro unr = diff(DataTableMth.UN);
Assess the null hypothesis that the series is a trend-stationary AR(0) process. Return the test decision, p-value, test statistic, and critical value. [h,pValue,stats,cValue] = lmctest(unr) h = logical 0 pValue = 0.1000 stats = 0.0978 cValue = 0.1460
pValue = 0.1000 is the maximum tabulated value; its actual value can be larger than 0.1000.
Conduct Leybourne-McCabe Stationarity Test on Table Variable Test whether a time series, which is one variable in a table, for stationarity using the default options. Load Schwert's macroeconomic data set. Convert the table of monthly series to a timetable. load Data_SchwertMacro dates = datetime(datesMth,ConvertFrom="datenum"); TT = table2timetable(DataTableMth,RowTimes=dates);
Apply the first difference to all monthly series. DTT = varfun(@diff,TT(:,2:end)); DTT.Properties.VariableNames{end} ans = 'diff_SIG'
Assess the null hypothesis of the Leybourne-McCabe stationarity test that the rate of the volatility of returns to Standard & Poor's composite index series is a trend-stationary AR(0) model. StatTbl = lmctest(DTT)
12-1808
lmctest
StatTbl=1×8 table h _____ Test 1
false
pValue ______ 0.1
stat _________
cValue ______
0.0027953
0.146
Lags ____ 0
Alpha _____
Trend _____
Test ________
0.05
true
{'var2'}
lmctest returns test results and settings in the table StatTbl, where variables correspond to test results (h, pValue, stat, and cValue) and settings (Lags, Alpha, Trend, and Test), and rows correspond to individual tests (in this case, lmctest conducts one test). By default, lmctest tests the last variable in the table. To select a variable from an input table to test, set the DataVariable option.
Assess Whether Series Is Trend Stationary and AR(p) Test the growth of the US unemployment rate using the data in [5]. Load Schwert's macroeconomic data set. Convert the table of monthly series to a timetable. Apply the first difference to all variables in the timetable. load Data_SchwertMacro dates = datetime(datesMth,ConvertFrom="datenum"); TT = table2timetable(DataTableMth,RowTimes=dates); TT.Dates = []; DTT = varfun(@diff,TT);
Define the time range of the sample considered in [4]. Extract a subtable with the corresponding dates. trLM = timerange("1948-01-01","1985-12-01","closed"); DTT4 = DTT(trLM,:);
Assess the null hypothesis that the unemployment rate growth is a trend-stationary, AR(1) process. Conduct the same test twice. For the first test, specify the test statistic that uses the estimated variance from OLS regression (Test = "var1"). For the second test, specify the test statistic that uses estimated variance from the maximum likelihood of the reduced-form regression model (Test = "var2"). StatTbl = lmctest(DTT4,DataVariable="diff_UN", ... Lags=1,Test=["var1" "var2"]) StatTbl=2×8 table h _____ Test 1 Test 2
false true
pValue ________
stat ________
cValue ______
0.1 0.020721
0.099166 0.18741
0.146 0.146
Lags ____ 1 1
Alpha _____
Trend _____
Test ________
0.05 0.05
true true
{'var1'} {'var2'}
Row Test 1 of StatTbl contains the results of the first test StatTbl.h(1) = 0 indicates that, at a 5% level of significance, there is not enough evidence to reject that the unemployment growth rate is a trend-stationary, AR(1) process. Row Test 2 of StatTbl contains the results of the second test StatTbl.h(2) = 1 indicates that, at a 5% level of significance, there is enough evidence to reject 12-1809
12
Functions
that the unemployment growth rate is a trend-stationary, AR(1) process, which implies that the unemployment rate growth is nonstationary. Leybourne and McCabe report that the original LMC statistic fails to reject stationarity, while the modified LMC statistic does reject it [4].
Inspect Regression Statistics Load Schwert's macroeconomic data set. Convert the table of monthly series to a timetable. Apply the first difference to all variables in the timetable. load Data_SchwertMacro dates = datetime(datesMth,ConvertFrom="datenum"); TT = table2timetable(DataTableMth,RowTimes=dates); TT.Dates = []; DTT = varfun(@diff,TT);
Assess the null hypothesis that the unemployment rate growth is a trend-stationary, AR(1) process. Return the regression statistics from the maximum likelihood estimation of reduced-form model and from the OLS estimation of deterministic local level model of filtered response data. [StatTbl,reg1,reg2] = lmctest(DTT,DataVariable="diff_UN",Lags=1);
Display the coefficients of both regressions. rownames = ["c_0"; "delta"; "b_1"]; varnames = ["Coefficient"; "SE"; "PValue"]; coeff = reg1.coeff; se = reg1.se; pval = reg1.tStats.pVal; table(coeff,se,pval,RowNames=rownames,VariableNames=varnames) ans=3×3 table
c_0 delta b_1
Coefficient ___________
SE _________
PValue __________
-0.00051942 -0.25044 0.63665
0.0041996 0.051797 0.046451
0.90162 1.8305e-06 5.6215e-36
rownames = ["Intercept"; "Trend"]; coeff = reg2.coeff; se = reg2.se; pval = reg2.tStats.pVal; table(coeff,se,pval,RowNames=rownames,VariableNames=varnames) ans=2×3 table
Intercept Trend
12-1810
Coefficient ___________
SE __________
PValue _______
0.013904 -2.2372e-05
0.024521 9.3396e-05
0.57099 0.81079
lmctest
Input Arguments y — Univariate time series data numeric vector Univariate time series data, specified as a numeric vector. Each element of y represents an observation. Data Types: double Tbl — Time series data table | timetable Time series data, specified as a table or timetable. Each row of Tbl is an observation. Specify a single series (variable) to test by using the DataVariable argument. The selected variable must be numeric. Note lmctest removes missing observations, represented by NaN values, from the input series. Name-Value Arguments Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter. Before R2021a, use commas to separate each name and value, and enclose Name in quotes. Example: lmctest(Tbl,DataVariable="GDP",Alpha=0.025,Lags=[0 1]) conducts two tests, at a level of significance of 0.025, on the variable GDP of the table Tbl. The first test includes 0 lagged terms in the structural model, and the second test includes 1 lagged term in the structural model. Lags — Number p of lagged response terms 0 (default) | nonnegative integer | vector of nonnegative integers Number p of lagged values of yt to include in the structural model, specified as a nonnegative integer or vector of nonnegative integers. p is equal to the number of lagged differences of yt in the reducedform model. lmctest conducts a separate test for each element in Lags. Lags > 0 decreases the sample size (see “Algorithms” on page 12-1816). Tip To draw valid inferences from a Leybourne-McCabe stationarity test, you must determine a suitable value for the Lags argument. The test is robust when p is greater than its true value in the data-generating process, and simulation evidence shows that, under the null hypothesis, the marginal distribution of the MLE of bp is asymptotically normal [3]. Therefore, you can use a standard t-test to determine whether bp is significant. However, estimated coefficient standard errors are unreliable when the MA(1) coefficient a is near 1. For a model-order test, valid under both the null and alternative hypotheses, that relies only on the MLEs of bp and a, and not on their standard errors, see [4]. 12-1811
12
Functions
Example: Lags=[0 1] includes no lags in the structural model for the first test, and then includes yt 1 in the structural model for the second test. Data Types: double Trend — Flag for including deterministic trend term δt true (default) | false | logical vector Flag for including deterministic trend δt in the structural model, specified as a logical scalar or vector. Trend=true is equivalent to including the drift term δ in the reduced-form model. lmctest conducts a separate test for each element in Trend. Tip With a specific testing strategy in mind, determine the value of the Trend argument by the growth characteristics of the input time series. • If the input series grows, include a trend term by setting Trend to true (default). This setting provides a reasonable comparison of a trend stationary null and a unit root process with drift. • If a series does not exhibit long-term growth characteristics, exclude a trend term by setting Trend to false.
Example: Trend=false excludes δt from the structural model for all tests. Data Types: logical Test — Variance estimate "var1" (default) | "var2" | character vector | string vector | cell vector of character vectors 2
Variance estimate σ 1 to use for the test statistic, specified as an estimate name, or a string vector or cell vector of estimate names, in this table. Test Name
Description
"var1"
2
σ1 =
e2′e2 , T
where e2 is the residual vector from the regression of the deterministic local level model of the filtered responses. "var2"
2
σ 1 = aσ2, where a and σ2 are MLEs from the reduced-form model regression. For more details, see “Test Statistics” on page 12-1816. lmctest conducts a separate test for each estimate name in Test. Example: Test=["var1" "var2"] conducts two tests. The first test uses the variance estimate described by "var1" to compute the test statistic, and the second test uses the variance estimate "var2". Data Types: char | cell | string Alpha — Nominal significance level 0.05 (default) | numeric scalar | numeric vector
12-1812
lmctest
Nominal significance level for the hypothesis test, specified a numeric scalar between 0.01 and 0.10 or a numeric vector of such values. lmctest conducts a separate test for each value in Alpha. Example: Alpha=[0.01 0.05] uses a level of significance of 0.01 for the first test, and then uses a level of significance of 0.05 for the second test. Data Types: double DataVariable — Variable in Tbl to test last variable (default) | string scalar | character vector | integer | logical vector Variable in Tbl to test, specified as a string scalar or character vector containing a variable name in Tbl.Properties.VariableNames, or an integer or logical vector representing the index of a name. The selected variable must be numeric. Example: DataVariable="GDP" Example: DataVariable=[false true false false] or DataVariable=2 tests the second table variable. Data Types: double | logical | char | string
Output Arguments h — Test rejection decisions logical scalar | logical vector Test rejection decisions, returned as a logical scalar or vector with length equal to the number of tests. lmctest returns h when you supply the input y. • Values of 1 indicate rejection of the AR(p) null hypothesis in favor of the ARIMA(p,1,1) alternative. • Values of 0 indicate failure to reject the AR(p) null hypothesis. pValue — Test statistic p-values numeric scalar | numeric vector Test statistic p-values, returned as a numeric scalar or vector with length equal to the number of tests. lmctest returns pValue when you supply the input y. The p-values are right-tail probabilities. When test statistics are outside tabulated critical values, lmctest returns maximum (0.10) or minimum (0.01) p-values. stat — Test statistics numeric scalar | numeric vector Test statistics, returned as a numeric scalar or vector with length equal to the number of tests. lmctest returns stat when you supply the input y. For details, see “Test Statistics” on page 12-1816. cValue — Critical values numeric scalar | numeric vector 12-1813
12
Functions
Critical values, returned as a numeric scalar or vector with length equal to the number of tests. lmctest returns cValue when you supply the input y. Critical values are for right-tail probabilities. StatTbl — Test summary table Test summary, returned as a table with variables for the outputs h, pValue, stat, and cValue, and with a row for each test. lmctest returns StatTbl when you supply the input Tbl. StatTbl contains variables for the test settings specified by Lags, Alpha, Trend, and Test. reg1 — Regression statistics from maximum likelihood estimation of reduced-form model structure array Regression statistics from the maximum likelihood estimation of the reduced-form model, returned as a structure array with the number of records equal to the number of tests. Each element of reg1 has the fields in this table. You can access a field using dot notation, for example, reg(1).coeff contains the coefficient estimates of the first test.
12-1814
num
Length of input series with NaNs removed, T – 1
size
Effective sample size, adjusted for lags and difference, T – (p + 1)
names
Regression coefficient names
coeff
Estimated coefficient values
se
Estimated coefficient standard errors
Cov
Estimated coefficient covariance matrix
tStats
t statistics of coefficients and p-values
FStat
F statistic and p-value
yMu
Mean of the lag-adjusted input series
ySigma
Standard deviation of the lag-adjusted input series
yHat
Fitted values of the lag-adjusted input series
res
Regression residuals
DWStat
Durbin-Watson statistic
SSR
Regression sum of squares
SSE
Error sum of squares
SST
Total sum of squares
MSE
Mean square error
RMSE
Standard error of the regression
RSq
R2 statistic
aRSq
Adjusted R2 statistic
LL
Loglikelihood of data under Gaussian innovations
AIC
Akaike information criterion
lmctest
BIC
Bayesian (Schwarz) information criterion
HQC
Hannan-Quinn information criterion
reg2 — Regression statistics from estimation of deterministic local level model of filtered response data structure array Regression statistics from the estimation of the deterministic local level model of the filtered response data, returned as a structure array with the number of records equal to the number of tests. reg2 has the same fields as reg1, but with the differences described in the following table. num
Length of input series with NaNs removed, T – p
size
Effective sample size, adjusted for lags and difference, T – p
More About Leybourne-McCabe Stationarity Test The Leybourne-McCabe stationarity test assesses the null hypothesis that a response series yt is a trend-stationary, degree p autoregressive model (AR(p)) against the alternative hypothesis that yt is nonstationary. Structural Model for Hypotheses
The structural model for the response series is yt = ct + δt + b1 yt − 1 + ⋯ + bp yt − 1 + u1, t ct = ct − 1 + u2, t, where •
u1, t iid 0, σ12 u2, t iid 0, σ22 ,
• u1,t and u2,t are independent. • The number of lags p is the value of the Lags option. • The trend term δt is present when Trend=true. • T is the sample size without missing observations. The model is second-order equivalent in moments to the reduced-form ARIMA(p,1,1) model Δyt = δ + b1 Δyt − 1 + ... + bp Δyt − p + 1 − aL vt, where L is the lag operator Lyt = yt–1 and vt ~ iid(0,σ2). The null hypothesis is that σ22 = 0 in the structural model, which is equivalent to a = 1 in the reduced-form model. The alternative is that σ22 > 0 or a < 1. Under the null hypothesis, the structural model is AR(p) with intercept c0 and trend δt; the reduced-form model is an over-differenced ARIMA(p,1,1) representation of the same process. 12-1815
12
Functions
Test Statistics
lmctest computes test statistics using this two-stage method: 1
Perform OLS regression of the reduced-form model to obtain maximum likelihood estimates (MLEs) of the coefficients (results are stored in the output reg1). Specifically, regress Y = Δyt on p lagged differences of y. lmctest stores the regression results in reg1.
2
Regress filtered data zt onto xt where • zt = yt − b1 yt − 1 − ... − bp yt − p . • t = p+1,…,T • xt is one of the following: • 1, for an intercept-only model when Trend=false • [1 t], for a model with an intercept and deterministic tome trend when Trend=true lmctest stores the regression results in reg2.
The test statistic s* (output stat) is s∗ =
e1′Ve1 s2T 2
,
wherea • e1 is a vector of the residuals from the reduced-form model regression. • V(i,j) = min(i,j) • s2 is an estimate of σ2 that depends on the value of the variance-estimation algorithm option Test: 1 • "var1" — The estimate is 2
σ1 =
e2′e2 , T
where e2 is the residual vector from the regression zt onto xt (output reg2). This selection is the original Leybourne-McCabe test, as described in [3], and it has a rate of consistency O(T). • "var2" (default) — The estimate is aσ2, where a and σ2 are MLEs from the reduced-form model regression (output reg1). This selection is the modified Leybourne-McCabe test, as described in [4], and it has a rate of consistency O(T2).
Tips • The alternative hypothesis that σ22 > 0 implies 0 < a < 1. As a result, an alternative model with a = 0 and a random walk, reduced-form model with iid errors is not possible. The class of I(1) alternatives represented by such a model is appropriate for economic series with significant MA(1) components [3]. To test for a random walk, use vratiotest.
Algorithms • The value of the Lags option lags the response in the structural model, and the reduced-form model operates on the first difference of the response. In general, when a time series is lagged or 12-1816
lmctest
differenced, the sample size is reduced. Without a presample, if yt is defined for t = 1,…,T, the lagged series yt–k is defined for t = k+1,…,T. When yt–k is differenced, the time base reduces to k +2,…,T. p lagged differences reduce the common time base to p+2,…,T and the effective sample size is T – (p+1). • Test statistics follow nonstandard distributions under the null, even asymptotically. Asymptotic critical values for a standard set of significance levels between 0.01 and 0.1, for models with and without a trend, have been tabulated in [2] using Monte Carlo simulations. Critical values cValue and p-values pValue reported by lmctest are interpolated from the tables. The tabulated tables are identical to those for kpsstest. • Bootstrapped critical values, used by tests with a unit root null (such as adftest and pptest), are not possible for lmctest [1]. As a result, size distortions for small samples may be significant, especially for highly persistent processes.
Version History Introduced in R2010a
References [1] Caner, M., and L. Kilian. "Size Distortions of Tests of the Null Hypothesis of Stationarity: Evidence and Implications for the PPP Debate." Journal of International Money and Finance. Vol. 20, 2001, pp. 639–657. [2] Kwiatkowski, D., P. C. B. Phillips, P. Schmidt, and Y. Shin. “Testing the Null Hypothesis of Stationarity against the Alternative of a Unit Root.” Journal of Econometrics. Vol. 54, 1992, pp. 159–178. [3] Leybourne, S. J., and B. P. M. McCabe. "A Consistent Test for a Unit Root." Journal of Business and Economic Statistics. Vol. 12, 1994, pp. 157–166. [4] Leybourne, S. J., and B. P. M. McCabe. "Modified Stationarity Tests with Data-Dependent ModelSelection Rules." Journal of Business and Economic Statistics. Vol. 17, 1999, pp. 264–270. [5] Schwert, G. W. "Effects of Model Specification on Tests for Unit Roots in Macroeconomic Data." Journal of Monetary Economics. Vol. 20, 1987, pp. 73–103.
See Also pptest | adftest | vratiotest | kpsstest Topics “Unit Root Nonstationarity” on page 3-32
12-1817
12
Functions
lmtest Lagrange multiplier test of model specification
Syntax h = lmtest(score,ParamCov,dof) h = lmtest(score,ParamCov,dof,alpha) [h,pValue] = lmtest( ___ ) [h,pValue,stat,cValue] = lmtest( ___ )
Description h = lmtest(score,ParamCov,dof) returns a logical value (h) with the rejection decision from conducting a Lagrange multiplier test on page 12-1826 of model specification at the 5% significance level. lmtest constructs the test statistic using the score function (score), the estimated parameter covariance (ParamCov), and the degrees of freedom (dof). h = lmtest(score,ParamCov,dof,alpha) returns the rejection decision of the Lagrange multiplier test conducted at significance level alpha. • If score and ParamCov are length k cell arrays, then all other arguments must be length k vectors or scalars. lmtest treats each cell as a separate test, and returns a vector of rejection decisions. • If score is a row cell array, then lmtest returns a row vector. [h,pValue] = lmtest( ___ ) returns the rejection decision and p-value (pValue) for the hypothesis test, using any of the input arguments in the previous syntaxes. [h,pValue,stat,cValue] = lmtest( ___ ) additionally returns the test statistic (stat) and critical value (cValue) for the hypothesis test.
Examples Choose the Best AR Model Specification Compare AR model specifications for a simulated response series using lmtest. Consider the AR(3) model: yt = 1 + 0 . 9yt − 1 − 0 . 5yt − 2 + 0 . 4yt − 3 + εt, where εt is Gaussian with mean 0 and variance 1. Specify this model using arima. Mdl = arima('Constant',1,'Variance',1,'AR',{0.9,-0.5,0.4});
Mdl is a fully specified, AR(3) model. Simulate presample and effective sample responses from Mdl. 12-1818
lmtest
T = 100; rng(1); % For reproducibility n = max(Mdl.P,Mdl.Q); % Number of presample observations y = simulate(Mdl,T + n);
y is a a random path from Mdl that includes presample observations. Specify the restricted model: yt = c + ϕ1 yt − 1 + ϕ2 yt − 2 + εt, where εt is Gaussian with mean 0 and variance σ2. Mdl0 = arima(3,0,0); Mdl0.AR{3} = 0;
The structure of Mdl0 is the same as Mdl. However, every parameter is unknown, except that ϕ3 = 0. This is an equality constraint during estimation. Estimate the restricted model using the simulated data (y). [EstMdl0,EstParamCov] = estimate(Mdl0,y((n+1):end),... 'Y0',y(1:n),'display','off'); phi10 = EstMdl0.AR{1}; phi20 = EstMdl0.AR{2}; phi30 = 0; c0 = EstMdl0.Constant; phi0 = [c0;phi10;phi20;phi30]; v0 = EstMdl0.Variance;
EstMdl0 contains the parameter estimates of the restricted model. lmtest requires the unrestricted model score evaluated at the restricted model estimates. The unrestricted model gradient is ∂l(ϕ1, ϕ2, ϕ3, c, σ2; yt, . . . , yt − 3) 1 = 2 (yt − c − ϕ1 yt − 1 − ϕ2 yt − 2 − ϕ3 yt − 3) ∂c σ ∂l(ϕ1, ϕ2, ϕ3, c, σ2; yt, . . . , yt − 3) 1 = 2 (yt − c − ϕ1 yt − 1 − ϕ2 yt − 2 − ϕ3 yt − 3)yt − ∂ϕ j σ ∂l(ϕ1, ϕ2, ϕ3, c, σ2; yt, . . . , yt − 3) ∂σ2
= −
j
1 1 2 + (yt − c − ϕ1 yt − 1 − ϕ2 yt − 2 − ϕ3 yt − 3) . 2σ2 2σ4
MatY = lagmatrix(y,1:3); LagY = MatY(all(~isnan(MatY),2),:); cGrad = (y((n+1):end)-[ones(T,1),LagY]*phi0)/v0; phi1Grad = ((y((n+1):end)-[ones(T,1),LagY]*phi0).*LagY(:,1))/v0; phi2Grad = ((y((n+1):end)-[ones(T,1),LagY]*phi0).*LagY(:,2))/v0; phi3Grad = ((y((n+1):end)-[ones(T,1),LagY]*phi0).*LagY(:,3))/v0; vGrad = -1/(2*v0)+((y((n+1):end)-[ones(T,1),LagY]*phi0).^2)/(2*v0^2); Grad = [cGrad,phi1Grad,phi2Grad,phi3Grad,vGrad]; % Gradient matrix score = sum(Grad)'; % Score under the restricted model
12-1819
12
Functions
Evaluate the unrestricted parameter covariance estimator using the restricted MLEs and the outer product of gradients (OPG) method. EstParamCov0 = inv(Grad'*Grad); dof = 1; % Number of model restrictions
Test the null hypothesis that ϕ3 = 0 at a 1% significance level using lmtest. [h,pValue] = lmtest(score,EstParamCov0,dof,0.1) h = logical 1 pValue = 2.2524e-09
pValue is close to 0, which suggests that there is strong evidence to reject the restricted, AR(2) model in favor of the unrestricted, AR(3) model.
Assess Model Specifications Using the Lagrange Multiplier Test Compare two model specifications for simulated education and income data. The unrestricted model has the following loglikelihood: n
l(β, ρ) = − nlogΓ(ρ) + ρ
∑
k=1
n
n
logβk + (ρ − 1)
∑
k=1
log(yk) −
∑
k=1
ykβk,
where •
βk =
1 . β + xk
• xk is the number of grades that person k completed. •
yk is the income (in thousands of USD) of person k.
That is, the income of person k given the number of grades that person k completed is Gamma distributed with shape ρ and rate βi. The restricted model sets ρ = 1, which implies that the income of person k given the number of grades person k completed is exponentially distributed with mean β + xi. The restricted model is H0 : ρ = 1. In order to compare this model to the unrestricted model, you require: • The gradient vector of the unrestricted model • The maximum likelihood estimate (MLE) under the restricted model • The parameter covariance estimator evaluated under the MLEs of the restricted model Load the data. load Data_Income1 x = DataTable.EDU; y = DataTable.INC;
12-1820
lmtest
Estimate the restricted model parameters by maximizing l(ρ, β) with respect to β subject to the restriction ρ = 1. The gradient of l(ρ, β) is ∂l(ρ, β) = ∂β
T
∑
i=1
2
(yiβi − ρβi) T
∂l(ρ, β) = − TΨ(ρ) + ∑ (logβi yi), ∂ρ i=1 where Ψ(ρ) is the digamma function. rho0 = 1; % Restricted rho dof = 1; % Number of restrictions dLBeta = @(beta) sum(y./((beta + x).^2) - rho0./(beta + x));... % Anonymous gradient function [betaHat0,fVal,exitFlag] = fzero(dLBeta,0) betaHat0 = 15.6027 fVal = 2.7756e-17 exitFlag = 1 beta = [0:0.1:50]; plot(beta,arrayfun(dLBeta,beta)) hold on plot([beta(1);beta(end)],zeros(2,1),'k:') plot(betaHat0,fVal,'ro','MarkerSize',10) xlabel('{\beta}') ylabel('Loglikelihood Gradient') title('{\bf Loglikelihood Gradient with Respect to \beta}') hold off
12-1821
12
Functions
The gradient with respect to β (dLBeta) is decreasing, which suggests that there is a local maximum at its root. Therefore, betaHat0 is the MLE for the restricted model. fVal indicates that the value of the gradient is very close to 0 at betaHat0. The exit flag (exitFlag) is 1, which indicates that fzero found a root of the gradient without a problem. Estimate the parameter covariance under the restricted model using the outer product of gradients (OPG). rGradient = [-rho0./(betaHat0+x)+y.*(betaHat0+x).^(-2),... log(y./(betaHat0+x))-psi(rho0)]; % Gradient per unit rScore = sum(rGradient)'; % Score function rEstParamCov = inv(rGradient'*rGradient); % Parameter covariance estimate
Test the unrestricted model against the restricted model using the Lagrange multiplier test. [h,pValue] = lmtest(rScore,rEstParamCov,dof) h = logical 1 pValue = 7.4744e-05
pValue is close to 0, which indicates that there is strong evidence to suggest that the unrestricted model fits the data better than the restricted model.
12-1822
lmtest
Assess Conditional Heteroscedasticity Using the Lagrange Multiplier Test Test whether there are significant ARCH effects in a simulated response series using lmtest. The parameter values in this example are arbitrary. Specify the AR(1) model with an ARCH(1) variance: yt = 0 . 9yt − 1 + εt, where • εt = wt ht . • ht = 1 + 0 . 5ε2 . t−1 • wt is Gaussian with mean 0 and variance 1. VarMdl = garch('ARCH',0.5,'Constant',1); Mdl = arima('Constant',0,'Variance',VarMdl,'AR',0.9);
Mdl is a fully specified, AR(1) model with an ARCH(1) variance. Simulate presample and effective sample responses from Mdl. T = 100; rng(1); % For reproducibility n = 2; % Number of presample observations required for the gradient [y,ep,v] = simulate(Mdl,T + n);
ep is the random path of innovations from VarMdl. The software filters ep through Mdl to yield the random response path y. Specify the restricted model and assume that the AR model constant is 0: yt = c + ϕ1 yt − 1 + εt, where ht = α0 + α1εt2− 1. VarMdl0 = garch(0,1); VarMdl0.ARCH{1} = 0; Mdl0 = arima('ARLags',1,'Constant',0,'Variance',VarMdl0);
The structure of Mdl0 is the same as Mdl. However, every parameter is unknown, except for the restriction α1 = 0. These are equality constraints during estimation. You can interpret Mdl0 as an AR(1) model with the Gaussian innovations that have mean 0 and constant variance. Estimate the restricted model using the simulated data (y). psI = 1:n; % Presample indices esI = (n + 1):(T + n); % Estimation sample indices [EstMdl0,EstParamCov] = estimate(Mdl0,y(esI),... 'Y0',y(psI),'E0',ep(psI),'V0',v(psI),'display','off'); phi10 = EstMdl0.AR{1}; alpha00 = EstMdl0.Variance.Constant;
EstMdl0 contains the parameter estimates of the restricted model. 12-1823
12
Functions
lmtest requires the unrestricted model score evaluated at the restricted model estimates. The unrestricted model loglikelihood function is T
l(ϕ1, α0, α1) =
∑
t=2
− 0 . 5log(2π) − 0 . 5loght −
εt2 , 2ht
where εt = yt − ϕ1 yt − 1. The unrestricted gradient is ∂l(ϕ1, α0, α1) = ∂α
T
1 zt f t, 2h t t=2
∑
where zt = [1, εt2− 1] and f t =
I=
εt2 − 1. The information matrix is ht
T
1 2
2ht
∑
t=2
zt′zt .
Under the null, restricted model, ht = h0 = α0 for all t, where α0 is the estimate from the restricted model analysis. Evaluate the gradient and information matrix under the restricted model. Estimate the parameter covariance by inverting the information matrix. e = y - phi10*lagmatrix(y,1); eLag1Sq = lagmatrix(e,1).^2; h0 = alpha00; ft = (e(esI).^2/h0 - 1); zt = [ones(T,1),eLag1Sq(esI)]'; score0 = 1/(2*h0)*zt*ft; % Score function InfoMat0 = (1/(2*h0^2))*(zt*zt'); EstParamCov0 = inv(InfoMat0); % Estimated parameter covariance dof = 1; % Number of model restrictions
Test the null hypothesis that α1 = 0 at the 5% significance level using lmtest. [h,pValue] = lmtest(score0,EstParamCov0,dof) h = logical 1 pValue = 4.0443e-06
pValue is close to 0, which suggests that there is evidence to reject the restricted AR(1) model in favor of the unrestricted AR(1) model with an ARCH(1) variance.
Input Arguments score — Unrestricted model loglikelihood gradients vector | cell array of vectors 12-1824
lmtest
Unrestricted model loglikelihood gradients evaluated at the restricted model parameter estimates, specified as a vector or cell vector. • For a single test, score can be a p-vector or a singleton cell array containing a p-by-1 vector. p is the number of parameters in the unrestricted model. • For conducting k > 1 tests, score must be a length k cell array. Cell j must contain one pj-by-1 vector that corresponds to one independent test. pj is the number of parameters in the unrestricted model of test j. Data Types: double | cell ParamCov — Parameter covariance estimate matrix | cell array of matrices Parameter covariance estimate, specified as a symmetric matrix of cell array of symmetric matrices. ParamCov is the unrestricted model parameter covariance estimator evaluated at the restricted model parameter estimates. • For a single test, ParamCov can be a p-by-p matrix or singleton cell array containing a p-by-p matrix. p is the number of parameters in the unrestricted model. • For conducting k > 1 tests, ParamCov must be a length k cell array. Cell j must contain one pj-bypj matrix that corresponds to one independent test. pj is the number of parameters in the unrestricted model of test j. Data Types: double | cell dof — Degrees of freedom positive integer | vector of positive integers Degrees of freedom for the asymptotic, chi-square distribution of the test statistics, specified as a positive integer or vector of positive integers. For each corresponding test, the elements of dof: • Are the number of model restrictions • Should be less than the number of parameters in the unrestricted model When conducting k > 1 tests, • If dof is a scalar, then the software expands it to a k-by-1 vector. • If dof is a vector, then it must have length k. alpha — Nominal significance levels 0.05 (default) | scalar | vector Nominal significance levels for the hypothesis tests, specified as a scalar or vector. Each element of alpha must be greater than 0 and less than 1. When conducting k > 1 tests, • If alpha is a scalar, then the software expands it to a k-by-1 vector. • If alpha is a vector, then it must have length k. Data Types: double 12-1825
12
Functions
Output Arguments h — Test rejection decisions logical | vector of logicals Test rejection decisions, returned as a logical value or vector of logical values with a length equal to the number of tests that the software conducts. • h = 1 indicates rejection of the null, restricted model in favor of the alternative, unrestricted model. • h = 0 indicates failure to reject the null, restricted model. pValue — Test statistic p-values scalar | vector Test statistic p-values, returned as a scalar or vector with a length equal to the number of tests that the software conducts. stat — Test statistics scalar | vector Test statistics, returned as a scalar or vector with a length equal to the number of tests that the software conducts. cValue — Critical values scalar | vector Critical values determined by alpha, returned as a scalar or vector with a length equal to the number of tests that the software conducts.
More About Lagrange Multiplier Test This test compares specifications of nested models by assessing the significance of restrictions to an extended model with unrestricted parameters. The test statistic (LM) is LM = S′VS, where • S is the gradient of the unrestricted loglikelihood function, evaluated at the restricted parameter estimates (score), i.e., S=
∂l(θ) ∂θ
θ = θ 0, MLE
.
• V is the covariance estimator for the unrestricted model parameters, evaluated at the restricted parameter estimates. If LM exceeds a critical value in its asymptotic distribution, then the test rejects the null, restricted (nested) model in favor of the alternative, unrestricted model. 12-1826
lmtest
The asymptotic distribution of LM is chi-square. Its degrees of freedom (dof) is the number of restrictions in the corresponding model comparison. The nominal significance level of the test (alpha) determines the critical value (cValue).
Tips • lmtest requires the unrestricted model score and parameter covariance estimator evaluated at parameter estimates for the restricted model. For example, to compare competing, nested arima models: 1
Analytically compute the score and parameter covariance estimator based on the innovation distribution.
2
Use estimate to estimate the restricted model parameters.
3
Evaluate the score and covariance estimator at the restricted model estimates.
4
Pass the evaluated score, restricted covariance estimate, and the number of restrictions (i.e., the degrees of freedom) into lmtest.
• If you find estimating parameters in the unrestricted model difficult, then use lmtest. By comparison: • waldtest only requires unrestricted parameter estimates. • lratiotest requires both unrestricted and restricted parameter estimates.
Algorithms • lmtest performs multiple, independent tests when inputs are cell arrays. • If the gradients and covariance estimates are the same for all tests, but the restricted parameter estimates vary, then lmtest “tests down” against multiple restricted models. • If the gradients and covariance estimates vary, but the restricted parameter estimates do not, then lmtest “tests up” against multiple unrestricted models. • Otherwise, lmtest compares model specifications pair-wise. • alpha is nominal in that it specifies a rejection probability in the asymptotic distribution. The actual rejection probability can differ from the nominal significance. Lagrange multiplier tests tend to under-reject for small values of alpha, and over-reject for large values of alpha. Lagrange multiplier tests typically yield lower rejection errors than likelihood ratio and Wald tests.
Version History Introduced in R2009a
References [1] Davidson, R. and J. G. MacKinnon. Econometric Theory and Methods. Oxford, UK: Oxford University Press, 2004. [2] Godfrey, L. G. Misspecification Tests in Econometrics. Cambridge, UK: Cambridge University Press, 1997. [3] Greene, W. H. Econometric Analysis. 6th ed. Upper Saddle River, NJ: Pearson Prentice Hall, 2008. 12-1827
12
Functions
[4] Hamilton, J. D. Time Series Analysis. Princeton, NJ: Princeton University Press, 1994.
See Also Objects arima | varm | garch | regARIMA Functions lratiotest | estimate | estimate | waldtest | estimate | estimate Topics “Model Comparison Tests” on page 3-57 “Classical Model Misspecification Tests” on page 3-69 “Conduct Lagrange Multiplier Test” on page 3-61
12-1828
lratiotest
lratiotest Likelihood ratio test of model specification
Syntax h = lratiotest(uLogL,rLogL,dof) h = lratiotest(uLogL,rLogL,dof,alpha) [h,pValue] = lratiotest( ___ ) [h,pValue,stat,cValue] = lratiotest( ___ )
Description h = lratiotest(uLogL,rLogL,dof) returns a logical value (h) with the rejection decision from conducting a likelihood ratio test on page 12-1835 of model specification. lratiotest constructs the test statistic using the loglikelihood objective function evaluated at the unrestricted model parameter estimates (uLogL) and the restricted model parameter estimates (rLogL). The test statistic distribution has dof degrees of freedom. • If uLogL or rLogL is a vector, then the other must be a scalar or vector of equal length. lratiotest(uLogL,rLogL,dof) treats each element of a vector input as a separate test, and returns a vector of rejection decisions. • If uLogL or rLogL is a row vector, then lratiotest(uLogL,rLogL,dof) returns a row vector. h = lratiotest(uLogL,rLogL,dof,alpha) returns the rejection decision of the likelihood ratio test conducted at significance level alpha. [h,pValue] = lratiotest( ___ ) returns the rejection decision and p-value (pValue) for the hypothesis test, using any of the input arguments in the previous syntaxes. [h,pValue,stat,cValue] = lratiotest( ___ ) additionally returns the test statistic (stat) and critical value (cValue) for the hypothesis test.
Examples Assess Model Specifications Using the Likelihood Ratio Test Compare two model specifications for simulated education and income data. The unrestricted model has the following loglikelihood: n
l(β, ρ) = − nlogΓ(ρ) + ρ
∑
k=1
n
logβk + (ρ − 1)
∑
k=1
n
log(yk) −
∑
k=1
ykβk,
where •
βk =
1 . β + xk 12-1829
12
Functions
• xk is the number of grades that person k completed. •
yk is the income (in thousands of USD) of person k.
That is, the income of person k given the number of grades that person k completed is Gamma distributed with shape ρ and rate βk. The restricted model sets ρ = 1, which implies that the income of person k given the number of grades person k completed is exponentially distributed with mean β + xk. The restricted model is H0 : ρ = 1. Comparing this model to the unrestricted model using lratiotest requires the following: • The loglikelihood function • The maximum likelihood estimate (MLE) under the unrestricted model • The MLE under the restricted model Load the data. load Data_Income1 x = DataTable.EDU; y = DataTable.INC;
To estimate the unrestricted model parameters, maximize l(ρ, β) with respect to ρ and β. The gradient of l(ρ, β) is n
∂l(ρ, β) = − nψ(ρ) + ∑ log(ykβk) ∂ρ k=1 ∂l(ρ, β) = ∂β
n
∑
k=1
βk(βk yk − ρ),
where ψ(ρ) is the digamma function. nLogLGradFun = @(theta) deal(-sum(-gammaln(theta(1)) - ... theta(1)*log(theta(2) + x) + (theta(1)-1)*log(y) - ... y./(theta(2)+x)),... -[sum(-psi(theta(1))+log(y./(theta(2)+x)));... sum(1./(theta(2)+x).*(y./(theta(2)+x)-theta(1)))]);
nLogLGradFun is an anonymous function that returns the negative loglikelihood and the gradient given the input theta, which holds the parameters ρ and β, respectively. Numerically optimize the negative loglikelihood function using fmincon, which minimizes an objective function subject to constraints. theta0 = randn(2,1); % Initial value for optimization uLB = [0 -min(x)]; % Unrestricted model lower bound uUB = [Inf Inf]; % Unrestricted model upper bound options = optimoptions('fmincon','Algorithm','interior-point',... 'FunctionTolerance',1e-10,'Display','off',... 'SpecifyObjectiveGradient',true); % Optimization options [uMLE,uLogL] = fmincon(nLogLGradFun,theta0,[],[],[],[],uLB,uUB,[],options); uLogL = -uLogL;
12-1830
lratiotest
uMLE is the unrestricted maximum likelihood estimate, and uLogL is the loglikelihood maximum. Impose the restriction to the loglikelihood by setting the corresponding lower and upper bound constraints of ρ to 1. Minimize the negative, restricted loglikelihood. dof = 1; % Number of restrictions rLB = [1 -min(x)]; % Restricted model lower bound rUB = [1 Inf]; % Restricted model upper bound [rMLE,rLogL] = fmincon(nLogLGradFun,theta0,[],[],[],[],rLB,rUB,[],options); rLogL = -rLogL;
rMLE is the unrestricted maximum likelihood estimate, and rLogL is the loglikelihood maximum. Use the likelihood ratio test to assess whether the data provide enough evidence to favor the unrestricted model over the restricted model. [h,pValue,stat] = lratiotest(uLogL,rLogL,dof) h = logical 1 pValue = 8.9146e-04 stat = 11.0404
pValue is close to 0, which indicates that there is strong evidence suggesting that the unrestricted model fits the data better than the restricted model.
Test Among Multiple Nested Model Specifications Assess model specifications by testing down among multiple restricted models using simulated data. The true model is the ARMA(2,1) yt = 3 + 0 . 9yt − 1 − 0 . 5yt − 2 + εt + 0 . 7εt − 1, where εt is Gaussian with mean 0 and variance 1. Specify the true ARMA(2,1) model, and simulate 100 response values. TrueMdl = arima('AR',{0.9,-0.5},'MA',0.7,... 'Constant',3,'Variance',1); T = 100; rng(1); % For reproducibility y = simulate(TrueMdl,T);
Specify the unrestricted model and the candidate models for testing down. Mdl = {arima(2,0,2),arima(2,0,1),arima(2,0,0),arima(1,0,2),arima(1,0,1),... arima(1,0,0),arima(0,0,2),arima(0,0,1)}; rMdlNames = {'ARMA(2,1)','AR(2)','ARMA(1,2)','ARMA(1,1)',... 'AR(1)','MA(2)','MA(1)'};
Mdl is a 1-by-7 cell array. Mdl{1} is the unrestricted model, and all other cells contain a candidate model. 12-1831
12
Functions
Fit the candidate models to the simulated data. logL = zeros(size(Mdl,1),1); % Preallocate loglikelihoods dof = logL; % Preallocate degrees of freedom for k = 1:size(Mdl,2) [EstMdl,~,logL(k)] = estimate(Mdl{k},y,'Display','off'); dof(k) = 4 - (EstMdl.P + EstMdl.Q); % Number of restricted parameters end uLogL = logL(1); rLogL = logL(2:end); dof = dof(2:end);
uLogL and rLogL are the values of the unrestricted loglikelihood evaluated at the unrestricted and restricted model parameter estimates, respectively. Apply the likelihood ratio test at a 1% significance level to find the appropriate, restricted model specification(s). alpha = .01; h = lratiotest(uLogL,rLogL,dof,alpha); RestrictedModels = rMdlNames(~h) RestrictedModels = 1x4 cell {'ARMA(2,1)'} {'ARMA(1,2)'}
{'ARMA(1,1)'}
{'MA(2)'}
The most appropriate restricted models are ARMA(2,1), ARMA(1,2), ARMA(1,1), or MA(2). You can test down again, but use ARMA(2,1) as the unrestricted model. In this case, you must remove MA(2) from the possible restricted models.
Assess Conditional Heteroscedasticity Using the Likelihood Ratio Test Test whether there are significant ARCH effects in a simulated response series using lratiotest. The parameter values in this example are arbitrary. Specify the AR(1) model with an ARCH(1) variance: yt = 0 . 9yt − 1 + εt, where • εt = wt ht . • ht = 1 + 0 . 5ε2 . t−1 • wt is Gaussian with mean 0 and variance 1. VarMdl = garch('ARCH',0.5,'Constant',1); Mdl = arima('Constant',0,'Variance',VarMdl,'AR',0.9);
Mdl is a fully specified AR(1) model with an ARCH(1) variance. Simulate presample and effective sample responses from Mdl. 12-1832
lratiotest
T = 100; rng(1); % For reproducibility n = 2; % Number of presample observations required for the gradient [y,epsilon,condVariance] = simulate(Mdl,T + n); psI = 1:n; % Presample indices esI = (n + 1):(T + n); % Estimation sample indices
epsilon is the random path of innovations from VarMdl. The software filters epsilon through Mdl to yield the random response path y. Specify the unrestricted model assuming that the conditional mean model constant is 0: yt = ϕ1 yt − 1 + εt, where ht = α0 + α1εt2− 1. Fit the simulated data (y) to the unrestricted model using the presample observations. UVarMdl = garch(0,1); UMdl = arima('ARLags',1,'Constant',0,'Variance',UVarMdl); [~,~,uLogL] = estimate(UMdl,y(esI),'Y0',y(psI),'E0',epsilon(psI),... 'V0',condVariance(psI),'Display','off');
uLogL is the maximum value of the unrestricted loglikelihood function. Specify the restricted model assuming that the conditional mean model constant is 0: yt = ϕ1 yt − 1 + εt, where ht = α0. Fit the simulated data (y) to the restricted model using the presample observations. RVarMdl = garch(0,1); RVarMdl.ARCH{1} = 0; RMdl = arima('ARLags',1,'Constant',0,'Variance',RVarMdl); [~,~,rLogL] = estimate(RMdl,y(esI),'Y0',y(psI),'E0',epsilon(psI),... 'V0',condVariance(psI),'Display','off');
The structure of RMdl is the same as UMdl. However, every parameter is unknown, except for the restriction. These are equality constraints during estimation. You can interpret RMdl as an AR(1) model with the Gaussian innovations that have mean 0 and constant variance. Test the null hypothesis that α1 = 0 at the default 5% significance level using lratoitest. dof = (UMdl.P + UMdl.Q + UVarMdl.P + UVarMdl.Q) ... - (RMdl.P + RMdl.Q + RVarMdl.P + RVarMdl.Q); [h,pValue,stat,cValue] = lratiotest(uLogL,rLogL,dof) h = logical 1 pValue = 6.7505e-04 stat = 11.5567 cValue = 3.8415
12-1833
12
Functions
h = 1 indicates that the null, restricted model should be rejected in favor of the alternative, unrestricted model. pValue is close to 0, suggesting that there is strong evidence for the rejection. stat is the value of the chi-square test statistic, and cValue is the critical value for the test.
Input Arguments uLogL — Unrestricted model loglikelihood maxima scalar | vector Unrestricted model loglikelihood maxima, specified as a scalar or vector. If uLogL is a scalar, then the software expands it to the same length as rLogL. Data Types: double rLogL — Restricted model loglikelihood maxima scalar | vector Restricted model loglikelihood maxima, specified as a scalar or vector. If rLogL is a scalar, then the software expands it to the same length as uLogL. Elements of rLogL should not exceed the corresponding elements of uLogL. Data Types: double dof — Degrees of freedom positive integer | vector of positive integers Degrees of freedom for the asymptotic, chi-square distribution of the test statistics, specified as a positive integer or vector of positive integers. For each corresponding test, the elements of dof: • Are the number of model restrictions • Should be less than the number of parameters in the unrestricted model. When conducting k > 1 tests, • If dof is a scalar, then the software expands it to a k-by-1 vector. • If dof is a vector, then it must have length k. Data Types: double alpha — Nominal significance levels 0.05 (default) | scalar | vector Nominal significance levels for the hypothesis tests, specified as a scalar or vector. Each element of alpha must be greater than 0 and less than 1. When conducting k > 1 tests, • If alpha is a scalar, then the software expands it to a k-by-1 vector. • If alpha is a vector, then it must have length k. Data Types: double 12-1834
lratiotest
Output Arguments h — Test rejection decisions logical | vector of logicals Test rejection decisions, returned as a logical value or vector of logical values with a length equal to the number of tests that the software conducts. • h = 1 indicates rejection of the null, restricted model in favor of the alternative, unrestricted model. • h = 0 indicates failure to reject the null, restricted model. pValue — Test statistic p-values scalar | vector Test statistic p-values, returned as a scalar or vector with a length equal to the number of tests that the software conducts. stat — Test statistics scalar | vector Test statistics, returned as a scalar or vector with a length equal to the number of tests that the software conducts. cValue — Critical values scalar | vector Critical values determined by alpha, returned as a scalar or vector with a length equal to the number of tests that the software conducts.
More About Likelihood Ratio Test The likelihood ratio test compares specifications of nested models by assessing the significance of restrictions to an extended model with unrestricted parameters. The test uses the following algorithm: 1
Maximize the loglikelihood function [l(θ)] under the restricted and unrestricted model assumptions. Denote the MLEs for the restricted and unrestricted models θ 0 and θ , respectively.
2
Evaluate the loglikelihood objective function at the restricted and unrestricted MLEs, i.e., l
0
= l θ 0 and l = l θ .
3
Compute the likelihood ratio test statistic, LR = 2 l − l
4
If LR exceeds a critical value (Cα) relative to its asymptotic distribution, then reject the null, restricted model in favor of the alternative, unrestricted model.
0
.
• Under the null hypothesis, LR is χd2 distributed with d degrees of freedom. • The degrees of freedom for the test (d) is the number of restricted parameters. • The significance level of the test (α) determines the critical value (Cα). 12-1835
12
Functions
Tips • Estimate unrestricted and restricted univariate linear time series models, such as arima or garch, or time series regression models (regARIMA) using estimate. Estimate unrestricted and restricted VAR models (varm) using estimate. The estimate functions return loglikelihood maxima, which you can use as inputs to lratiotest. • If you can easily compute both restricted and unrestricted parameter estimates, then use lratiotest. By comparison: • waldtest only requires unrestricted parameter estimates. • lmtest requires restricted parameter estimates.
Algorithms • lratiotest performs multiple, independent tests when the unrestricted or restricted model loglikelihood maxima (uLogL and rLogL, respectively) is a vector. • If rLogL is a vector and uLogL is a scalar, then lratiotest “tests down” against multiple restricted models. • If uLogL is a vector and rLogL is a scalar, then lratiotest “tests up” against multiple unrestricted models. • Otherwise, lratiotest compares model specifications pair-wise. • alpha is nominal in that it specifies a rejection probability in the asymptotic distribution. The actual rejection probability is generally greater than the nominal significance.
Version History Introduced before R2006a
References [1] Davidson, R. and J. G. MacKinnon. Econometric Theory and Methods. Oxford, UK: Oxford University Press, 2004. [2] Godfrey, L. G. Misspecification Tests in Econometrics. Cambridge, UK: Cambridge University Press, 1997. [3] Greene, W. H. Econometric Analysis. 6th ed. Upper Saddle River, NJ: Pearson Prentice Hall, 2008. [4] Hamilton, J. D. Time Series Analysis. Princeton, NJ: Princeton University Press, 1994.
See Also Objects arima | varm | garch | regARIMA Functions lmtest | estimate | estimate | waldtest | estimate | estimate 12-1836
lratiotest
Topics “Compare GARCH Models Using Likelihood Ratio Test” on page 3-66 “Classical Model Misspecification Tests” on page 3-69 “Model Comparison Tests” on page 3-57
12-1837
12
Functions
mcmix Create random Markov chain with specified mixing structure
Syntax mc = mcmix(numStates) mc = mcmix(numStates,Name=Value)
Description mc = mcmix(numStates) returns the discrete-time Markov chain mc containing numStates states. mc is characterized by random transition probabilities. mc = mcmix(numStates,Name=Value) uses additional options specified by one or more namevalue arguments to structure mc to simulate different mixing times. For example, you can control the pattern of feasible transitions.
Examples Generate Markov Chain from Random Transition Matrix Generate a six-state Markov chain from a random transition matrix. rng(1); % For reproducibility mc = mcmix(6);
mc is a dtmc object. Display the transition matrix. mc.P ans = 6×6 0.2732 0.3050 0.0078 0.2480 0.2708 0.2791
0.1116 0.2885 0.0439 0.1481 0.2488 0.1095
0.1145 0.0475 0.0082 0.2245 0.0580 0.0991
0.1957 0.0195 0.2439 0.0485 0.1614 0.2611
0.0407 0.1513 0.2950 0.1369 0.0137 0.1999
0.2642 0.1882 0.4013 0.1939 0.2474 0.0513
Plot a digraph of the Markov chain. Specify coloring the edges according to the probability of transition. figure; graphplot(mc,ColorEdges=true);
12-1838
mcmix
Decrease Feasible Transitions Generate random transition matrices containing a specified number of zeros in random locations. A zero in location (i, j) indicates that state i does not transition to state j. Generate two 10-state Markov chains from random transition matrices. Specify the random placement of 10 zeros within one chain and 30 zeros within the other chain. rng(1); % For reproducibility numStates = 10; mc1 = mcmix(numStates,Zeros=10); mc2 = mcmix(numStates,Zeros=30);
mc1 and mc2 are dtmc objects. Estimate the mixing times for each Markov chain. [~,tMix1] = asymptotics(mc1) tMix1 = 0.7567 [~,tMix2] = asymptotics(mc2) tMix2 = 0.8137
mc1, the Markov chain with higher connectivity, mixes more quickly than mc2. 12-1839
12
Functions
Fix Specific Transition Probabilities Generate a Markov chain characterized by a partially random transition matrix. Also, decrease the number of feasible transitions. Generate a 4-by-4 matrix of missing (NaN) values, which represents the transition matrix. P = NaN(4);
Specify that state 1 transitions to state 2 with probability 0.5, and that state 2 transitions to state 1 with the same probability. P(1,2) = 0.5; P(2,1) = 0.5;
Create a Markov chain characterized by the partially known transition matrix. For the remaining unknown transition probabilities, specify that five transitions are infeasible for 5 random transitions. An infeasible transition is a transition whose probability of occurring is zero. rng(1); % For reproducibility mc = mcmix(4,Fix=P,Zeros=5);
mc is a dtmc object. With the exception of the fixed elements (1,2) and (2,1) of the transition matrix, mcmix places five zeros in random locations and generates random probabilities for the remaining nine locations. The probabilities in a particular row sum to 1. Display the transition matrix and plot a digraph of the Markov chain. In the plot, indicate transition probabilities by specifying edge colors. P = mc.P P = 4×4 0 0.5000 0.1632 0
0.5000 0 0 0.5672
0.1713 0.1829 0.8368 0.1676
figure; graphplot(mc,'ColorEdges',true);
12-1840
0.3287 0.3171 0 0.2652
mcmix
Input Arguments numStates — Number of states positive integer Number of states, specified as a positive integer. If you do not specify any name-value arguments, mcmix constructs a Markov chain with random transition probabilities. Data Types: double Name-Value Arguments Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter. Before R2021a, use commas to separate each name and value, and enclose Name in quotes. Example: Zeros=10 places 0 at 10 random locations in the transition matrix. Fix — Locations and values of fixed transition probabilities NaN(numStates) (default) | numeric matrix 12-1841
12
Functions
Locations and values of fixed transition probabilities, specified as a numStates-by-numStates numeric matrix. Probabilities in any row must have a sum less than or equal to 1. Rows that sum to 1 also fix 0 values in the rest of the row. mcmix assigns random probabilities to locations containing NaN values. Example: Fix=[0.5 NaN NaN; NaN 0.5 NaN; NaN NaN 0.5] Data Types: double Zeros — Number of zero-valued transition probabilities 0 (default) | positive integer Number of zero-valued transition probabilities to assign to random locations in the transition matrix, specified as a positive integer less than NumStates. The mcmix function assigns Zeros zeros to the locations containing a NaN in Fix. Example: Zeros=10 Data Types: double StateNames — Unique state labels string(1:numStates) (default) | string vector | cell vector of character vectors | numeric vector Unique state labels, specified as a string vector, cell vector of character vectors, or numeric vector of numStates length. Elements correspond to rows and columns of the transition matrix. Example: StateNames=["Depression" "Recession" "Stagnant" "Boom"] Data Types: double | string | cell
Output Arguments mc — Discrete-time Markov chain dtmc object Discrete-time Markov chain, returned as a dtmc object.
Version History Introduced in R2017b
References [1] Gallager, R.G. Stochastic Processes: Theory for Applications. Cambridge, UK: Cambridge University Press, 2013. [2] Horn, R., and C. R. Johnson. Matrix Analysis. Cambridge, UK: Cambridge University Press, 1985.
See Also dtmc | asymptotics 12-1842
mcmix
Topics “Discrete-Time Markov Chains” on page 10-2 “Markov Chain Modeling” on page 10-8
12-1843
12
Functions
minus Lag operator polynomial subtraction
Syntax C = minus(A, B, 'Tolerance', tolerance) C = A -B
Description Given two lag operator polynomials A(L) and B(L), C = minus(A, B, 'Tolerance', tolerance) performs a polynomial subtraction C(L) = A(L) – B(L) with tolerance tolerance. 'Tolerance' is the nonnegative scalar tolerance used to determine which coefficients are included in the result. The default tolerance is 1e–12. Specifying a tolerance greater than 0 allows the user to exclude polynomial lags with near-zero coefficients. A coefficient matrix of a given lag is excluded only if the magnitudes of all elements of the matrix are less than or equal to the specified tolerance. C = A -B performs a polynomial subtraction. If at least one of A or B is a lag operator polynomial object, the other can be a cell array of matrices (initial lag operator coefficients), or a single matrix (zero-degree lag operator).
Examples Subtract Two Lag Operator Polynomials Create two LagOp polynomials and subtract one from the other: A = LagOp({1 -0.6 0.08}); B = LagOp({1 -0.5}); A-B ans = 1-D Lag Operator Polynomial: ----------------------------Coefficients: [-0.1 0.08] Lags: [1 2] Degree: 2 Dimension: 1
Algorithms The subtraction operator (–) invokes minus, but the optional coefficient tolerance is available only by calling minus directly.
See Also plus 12-1844
mixconjugateblm
mixconjugateblm Bayesian linear regression model with conjugate priors for stochastic search variable selection (SSVS)
Description The Bayesian linear regression model on page 12-1858 object mixconjugateblm specifies the joint prior distribution of the regression coefficients and the disturbance variance (β, σ2) for implementing SSVS on page 12-1859 (see [1] and [2]) assuming β and σ2 are dependent random variables. In general, when you create a Bayesian linear regression model object, it specifies the joint prior distribution and characteristics of the linear regression model only. That is, the model object is a template intended for further use. Specifically, to incorporate data into the model for posterior distribution analysis and feature selection, pass the model object and data to the appropriate object function on page 12-1849.
Creation Syntax PriorMdl = mixconjugateblm(NumPredictors) PriorMdl = mixconjugateblm(NumPredictors,Name,Value) Description PriorMdl = mixconjugateblm(NumPredictors) creates a Bayesian linear regression model on page 12-1858 object (PriorMdl) composed of NumPredictors predictors and an intercept, and sets the NumPredictors property. The joint prior distribution of (β, σ2) is appropriate for implementing SSVS for predictor selection [2]. PriorMdl is a template that defines the prior distributions and the dimensionality of β. PriorMdl = mixconjugateblm(NumPredictors,Name,Value) sets properties on page 12-1845 (except NumPredictors) using name-value pair arguments. Enclose each property name in quotes. For example, mixconjugateblm(3,'Probability',abs(rand(4,1))) specifies random prior regime probabilities for all four coefficients in the model.
Properties You can set writable property values when you create the model object by using name-value argument syntax, or after you create the model object by using dot notation. For example, to exclude an intercept from the model, enter PriorMdl.Intercept = false;
NumPredictors — Number of predictor variables nonnegative integer 12-1845
12
Functions
Number of predictor variables in the Bayesian multiple linear regression model, specified as a nonnegative integer. NumPredictors must be the same as the number of columns in your predictor data, which you specify during model estimation or simulation. When specifying NumPredictors, exclude any intercept term for the value. After creating a model, if you change the of value NumPredictors using dot notation, then these parameters revert to the default values: • Variables names (VarNames) • Prior mean of β (Mu) • Prior variances of β for each regime (V) • Prior correlation matrix of β (Correlation) • Prior regime probabilities (Probability) Data Types: double Intercept — Flag for including regression model intercept true (default) | false Flag for including a regression model intercept, specified as a value in this table. Value
Description
false
Exclude an intercept from the regression model. Therefore, β is a p-dimensional vector, where p is the value of NumPredictors.
true
Include an intercept in the regression model. Therefore, β is a (p + 1)-dimensional vector. This specification causes a T-by-1 vector of ones to be prepended to the predictor data during estimation and simulation.
If you include a column of ones in the predictor data for an intercept term, then set Intercept to false. Example: 'Intercept',false Data Types: logical VarNames — Predictor variable names string vector | cell vector of character vectors Predictor variable names for displays, specified as a string vector or cell vector of character vectors. VarNames must contain NumPredictors elements. VarNames(j) is the name of the variable in column j of the predictor data set, which you specify during estimation, simulation, or forecasting. The default is {'Beta(1)','Beta(2),...,Beta(p)}, where p is the value of NumPredictors. Example: 'VarNames',["UnemploymentRate"; "CPI"] Data Types: string | cell | char 12-1846
mixconjugateblm
Mu — Component-wise mean hyperparameter of Gaussian mixture prior on β zeros(Intercept + NumPredictors,2) (default) | numeric matrix Component-wise mean hyperparameter of the Gaussian mixture prior on β, specified as an (Intercept + NumPredictors)-by-2 numeric matrix. The first column contains the prior means for component 1 (the variable-inclusion regime, that is, γ = 1). The second column contains the prior means for component 2 (the variable-exclusion regime, that is, γ = 0). • If Intercept is false, then Mu has NumPredictors rows. mixconjugateblm sets the prior mean of the NumPredictors coefficients corresponding to the columns in the predictor data set, which you specify during estimation, simulation, or forecasting. • Otherwise, Mu has NumPredictors + 1 elements. The first element corresponds to the prior means of the intercept, and all other elements correspond to the predictor variables. Tip To perform SSVS, use the default value of Mu. Example: In a 3-coefficient model, 'Mu',[0.5 0; 0.5 0; 0.5 0] sets the component 1 prior mean of all coefficients to 0.5 and sets the component 2 prior mean of all coefficients to 0. Data Types: double V — Component-wise variance factor hyperparameter of Gaussian mixture prior on β repmat([10 0.1],Intercept + NumPredictors,1) (default) | positive numeric matrix Component-wise variance factor hyperparameter of the Gaussian mixture prior on β, an (Intercept + NumPredictors)-by-2 positive numeric matrix. The first column contains the prior variance factors for component 1 (the variable-inclusion regime, that is, γ = 1). The second column contains the prior variance factors for component 2 (the variable-exclusion regime, that is, γ = 0). Regardless of regime or coefficient, the prior variance of a coefficient is the variance factor times σ2. • If Intercept is false, then V has NumPredictors rows. mixconjugateblm sets the prior variance factor of the NumPredictors coefficients corresponding to the columns in the predictor data set, which you specify during estimation, simulation, or forecasting. • Otherwise, V has NumPredictors + 1 elements. The first element corresponds to the prior variance factor of the intercept, and all other elements correspond to the predictor variables. Tip • To perform SSVS, specify a larger variance factor for regime 1 than for regime 2 (for all j, specify V(j,1) > V(j,2)). • For more details on what value to specify for V, see [1].
Example: In a 3-coefficient model, 'V',[100 1; 100 1; 100 1] sets the component 1 prior variance factor of all coefficients to 100 and sets the component 2 prior variance factor of all coefficients to 1. Data Types: double
12-1847
12
Functions
Probability — Prior probability distribution for variable inclusion and exclusion regimes 0.5*ones(Intercept + NumPredictors,1) (default) | numeric vector of values in [0,1] | function handle Prior probability distribution for the variable inclusion and exclusion regimes, specified as an (Intercept + NumPredictors)-by-1 numeric vector of values in [0,1], or a function handle in the form @fcnName, where fcnName is the function name. Probability represents the prior probability distribution of γ = {γ1,…,γK}, where: • K = Intercept + NumPredictors, which is the number of coefficients in the regression model. • γk ∈ {0,1} for k = 1,…,K. Therefore, the sample space has a cardinality of 2K. • γk = 1 indicates variable VarNames(k) is included in the model, and γk = 0 indicates that the variable is excluded from the model. If Probability is a numeric vector: • Rows correspond to the variable names in VarNames. For models containing an intercept, the prior probability for intercept inclusion is Probability(1). • For k = 1,…,K, the prior probability for excluding variable k is 1 – Probability(k). • Prior probabilities of the variable-inclusion regime, among all variables and the intercept, are independent. If Probability is a function handle, then it represents a custom prior distribution of the variableinclusion regime probabilities. The corresponding function must have this declaration statement (the argument and function names can vary): logprob = regimeprior(varinc)
• logprob is a numeric scalar representing the log of the prior distribution. You can write the prior distribution up to a proportionality constant. • varinc is a K-by-1 logical vector. Elements correspond to the variable names in VarNames and indicate the regime in which the corresponding variable exists. varinc(k) = true indicates VarName(k) is included in the model, and varinc(k) = false indicates it is excluded from the model. You can include more input arguments, but they must be known when you call mixconjugateblm. For details on what value to specify for Probability, see [1]. Example: In a 3-coefficient model, 'Probability',rand(3,1) assigns random prior variableinclusion probabilities to each coefficient. Data Types: double | function_handle Correlation — Prior correlation matrix of β eye(Intercept + NumPredictors) (default) | numeric, positive definite matrix Prior correlation matrix of β for both components in the mixture model, specified as an (Intercept + NumPredictors)-by-(Intercept + NumPredictors) numeric, positive definite matrix. Consequently, the prior covariance matrix for component j in the mixture model is sigma2*diag(sqrt(V(:,j)))*Correlation*diag(sqrt(V(:,j))), where sigma2 is σ2 and V is the matrix of coefficient variance factors. Rows and columns correspond to the variable names in VarNames. 12-1848
mixconjugateblm
By default, regression coefficients are uncorrelated, conditional on the regime. Note You can supply any appropriately sized numeric matrix. However, if your specification is not positive definite, mixconjugateblm issues a warning and replaces your specification with CorrelationPD, where: CorrelationPD = 0.5*(Correlation + Correlation.');
For details on what value to specify for Correlation, see [1]. Data Types: double A — Shape hyperparameter of inverse gamma prior on σ2 3 (default) | numeric scalar Shape hyperparameter of the inverse gamma prior on σ2, specified as a numeric scalar. A must be at least –(Intercept + NumPredictors)/2. With B held fixed, the inverse gamma distribution becomes taller and more concentrated as A increases. This characteristic weighs the prior model of σ2 more heavily than the likelihood during posterior estimation. For the functional form of the inverse gamma distribution, see “Analytically Tractable Posteriors” on page 6-5. Example: 'A',0.1 Data Types: double B — Scale hyperparameter of inverse gamma prior on σ2 1 (default) | positive scalar | Inf Scale parameter of inverse gamma prior on σ2, specified as a positive scalar or Inf. With A held fixed, the inverse gamma distribution becomes taller and more concentrated as B increases. This characteristic weighs the prior model of σ2 more heavily than the likelihood during posterior estimation. Example: 'B',5 Data Types: double
Object Functions estimate simulate forecast plot summarize
Perform predictor variable selection for Bayesian linear regression models Simulate regression coefficients and disturbance variance of Bayesian linear regression model Forecast responses of Bayesian linear regression model Visualize prior and posterior densities of Bayesian linear regression model parameters Distribution summary statistics of Bayesian linear regression model for predictor variable selection
Examples
12-1849
12
Functions
Create Prior Model for SSVS Consider the linear regression model that predicts the US real gross national product (GNPR) using a linear combination of industrial production index (IPI), total employment (E), and real wages (WR). GNPRt = β0 + β1IPIt + β2Et + β3WRt + εt . For all t, εt is a series of independent Gaussian disturbances with a mean of 0 and variance σ2. Assume these prior distributions for k = 0,...,3: •
βk | σ2, γk = γkσ V k1Z1 + (1 − γk)σ V k2Z2, where Z1 and Z2 are independent, standard normal random variables. Therefore, the coefficients have a Gaussian mixture distribution. Assume all coefficients are conditionally independent, a priori, but they are dependent on the disturbance variance.
• σ2 ∼ IG(A, B). A and B are the shape and scale, respectively, of an inverse gamma distribution. • γk ∈ 0, 1 and it represents the random variable-inclusion regime variable with a discrete uniform distribution. Create a prior model for SSVS. Specify the number of predictors p. p = 3; PriorMdl = mixconjugateblm(p);
PriorMdl is a mixconjugateblm Bayesian linear regression model object representing the prior distribution of the regression coefficients and disturbance variance. mixconjugateblm displays a summary of the prior distributions at the command line. Alternatively, you can create a prior model for SSVS by passing the number of predictors to bayeslm and setting the ModelType name-value pair argument to 'mixconjugate'. MdlBayesLM = bayeslm(p,'ModelType','mixconjugate') MdlBayesLM = mixconjugateblm with properties: NumPredictors: Intercept: VarNames: Mu: V: Probability: Correlation: A: B:
3 1 {4x1 [4x2 [4x2 [4x1 [4x4 3 1
cell} double] double] double] double]
| Mean Std CI95 Positive Distribution -----------------------------------------------------------------------------Intercept | 0 1.5890 [-3.547, 3.547] 0.500 Mixture distribution Beta(1) | 0 1.5890 [-3.547, 3.547] 0.500 Mixture distribution Beta(2) | 0 1.5890 [-3.547, 3.547] 0.500 Mixture distribution Beta(3) | 0 1.5890 [-3.547, 3.547] 0.500 Mixture distribution Sigma2 | 0.5000 0.5000 [ 0.138, 1.616] 1.000 IG(3.00, 1)
12-1850
mixconjugateblm
Mdl and MdlBayesLM are equivalent model objects. You can set writable property values of created models using dot notation. Set the regression coefficient names to the corresponding variable names. PriorMdl.VarNames = ["IPI" "E" "WR"] PriorMdl = mixconjugateblm with properties: NumPredictors: Intercept: VarNames: Mu: V: Probability: Correlation: A: B:
3 1 {4x1 [4x2 [4x2 [4x1 [4x4 3 1
cell} double] double] double] double]
| Mean Std CI95 Positive Distribution -----------------------------------------------------------------------------Intercept | 0 1.5890 [-3.547, 3.547] 0.500 Mixture distribution IPI | 0 1.5890 [-3.547, 3.547] 0.500 Mixture distribution E | 0 1.5890 [-3.547, 3.547] 0.500 Mixture distribution WR | 0 1.5890 [-3.547, 3.547] 0.500 Mixture distribution Sigma2 | 0.5000 0.5000 [ 0.138, 1.616] 1.000 IG(3.00, 1)
MATLAB® associates the variable names to the regression coefficients in displays. Plot the prior distributions. plot(PriorMdl);
12-1851
12
Functions
The prior distribution of each coefficient is a mixture of two Gaussians: both components have a mean of zero, but component 1 has a large variance relative to component 2. Therefore, their distributions are centered at zero and have the spike-and-slab appearance.
Perform Variable Selection Using SSVS and Default Options Consider the linear regression model in “Create Prior Model for SSVS” on page 12-1849. Create a prior model for performing SSVS. Assume that β and σ2 are dependent (a conjugate mixture model). Specify the number of predictors p and the names of the regression coefficients. p = 3; PriorMdl = mixconjugateblm(p,'VarNames',["IPI" "E" "WR"]);
Display the prior regime probabilities and Gaussian mixture variance factors of the prior β. priorProbabilities = table(PriorMdl.Probability,'RowNames',PriorMdl.VarNames,... 'VariableNames',"Probability") priorProbabilities=4×1 table Probability ___________ Intercept
12-1852
0.5
mixconjugateblm
IPI E WR
0.5 0.5 0.5
priorV = array2table(PriorMdl.V,'RowNames',PriorMdl.VarNames,... 'VariableNames',["gammaIs1" "gammaIs0"]) priorV=4×2 table gammaIs1 ________ Intercept IPI E WR
10 10 10 10
gammaIs0 ________ 0.1 0.1 0.1 0.1
PriorMdl stores prior regime probabilities in the Probability property and the regime variance factors in the V property. The default prior probability of variable inclusion is 0.5. The default variance factors for each coefficient are 10 for the variable-inclusion regime and 0.01 for the variableexclusion regime. Load the Nelson-Plosser data set. Create variables for the response and predictor series. load Data_NelsonPlosser X = DataTable{:,PriorMdl.VarNames(2:end)}; y = DataTable{:,'GNPR'};
Implement SSVS by estimating the marginal posterior distributions of β and σ2. Because SSVS uses Markov chain Monte Carlo (MCMC) for estimation, set a random number seed to reproduce the results. rng(1); PosteriorMdl = estimate(PriorMdl,X,y); Method: MCMC sampling with 10000 draws Number of observations: 62 Number of predictors: 4 | Mean Std CI95 Positive Distribution Regime ---------------------------------------------------------------------------------Intercept | -18.8333 10.1851 [-36.965, 0.716] 0.037 Empirical 0.8806 IPI | 4.4554 0.1543 [ 4.165, 4.764] 1.000 Empirical 0.4545 E | 0.0010 0.0004 [ 0.000, 0.002] 0.997 Empirical 0.0925 WR | 2.4686 0.3615 [ 1.766, 3.197] 1.000 Empirical 0.1734 Sigma2 | 47.7557 8.6551 [33.858, 66.875] 1.000 Empirical NaN
PosteriorMdl is an empiricalblm model object that stores draws from the posterior distributions of β and σ2 given the data. estimate displays a summary of the marginal posterior distributions at the command line. Rows of the summary correspond to regression coefficients and the disturbance variance, and columns correspond to characteristics of the posterior distribution. The characteristics include: • CI95, which contains the 95% Bayesian equitailed credible intervals for the parameters. For example, the posterior probability that the regression coefficient of E (standardized) is in [0.000, 0.002] is 0.95. 12-1853
12
Functions
• Regime, which contains the marginal posterior probability of variable inclusion (γ = 1 for a variable). For example, the posterior probability that E should be included in the model is 0.0925. Assuming, that variables with Regime < 0.1 should be removed from the model, the results suggest that you can exclude the unemployment rate from the model. By default, estimate draws and discards a burn-in sample of size 5000. However, a good practice is to inspect a trace plot of the draws for adequate mixing and lack of transience. Plot a trace plot of the draws for each parameter. You can access the draws that compose the distribution (the properties BetaDraws and Sigma2Draws) using dot notation. figure; for j = 1:(p + 1) subplot(2,2,j); plot(PosteriorMdl.BetaDraws(j,:)); title(sprintf('%s',PosteriorMdl.VarNames{j})); end
figure; plot(PosteriorMdl.Sigma2Draws); title('Sigma2');
12-1854
mixconjugateblm
The trace plots indicate that the draws seem to mix well. The plots show no detectable transience or serial correlation, and the draws do not jump between states.
Specify Custom Prior Regime Probability Distribution for SSVS Consider the linear regression model in “Create Prior Model for SSVS” on page 12-1849. Load the Nelson-Plosser data set. Create variables for the response and predictor series. load Data_NelsonPlosser VarNames = ["IPI" "E" "WR"]; X = DataTable{:,VarNames}; y = DataTable{:,"GNPR"};
Assume the following: • The intercept is in the model with probability 0.9. • IPI and E are in the model with probability 0.75. • If E is included in the model, then the probability that WR is included in the model is 0.9. • If E is excluded from the model, then the probability that WR is included is 0.25. Declare a function named priorssvsexample.m that: 12-1855
12
Functions
• Accepts a logical vector indicating whether the intercept and variables are in the model (true for model inclusion). Element 1 corresponds to the intercept, and the rest of the elements correspond to the variables in the data. • Returns a numeric scalar representing the log of the described prior regime probability distribution. function logprior = priorssvsexample(varinc) %PRIORSSVSEXAMPLE Log prior regime probability distribution for SSVS % PRIORSSVSEXAMPLE is an example of a custom log prior regime probability % distribution for SSVS with dependent random variables. varinc is % a 4-by-1 logical vector indicating whether 4 coefficients are in a model % and logPrior is a numeric scalar representing the log of the prior % distribution of the regime probabilities. % % Coefficients enter a model according to these rules: % * varinc(1) is included with probability 0.9. % * varinc(2) and varinc(3) are in the model with probability 0.75. % * If varinc(3) is included in the model, then the probability that % varinc(4) is included in the model is 0.9. % * If varinc(3) is excluded from the model, then the probability % that varinc(4) is included is 0.25. logprior = log(0.9) + 2*log(0.75) + log(varinc(3)*0.9 + (1-varinc(3))*0.25); end
are (a conjugate mixture model). Create a prior model for performing SSVS. Assume that and Specify the number of predictors p the names of the regression coefficients, and custom, prior probability distribution of the variable-inclusion regimes. p = 3; PriorMdl = mixconjugateblm(p,'VarNames',["IPI" "E" "WR"],... 'Probability',@priorssvsexample);
Implement SSVS by estimating the marginal posterior distributions of and MCMC for estimation, set a random number seed to reproduce the results.
. Because SSVS uses
rng(1); PosteriorMdl = estimate(PriorMdl,X,y); Method: MCMC sampling with 10000 draws Number of observations: 62 Number of predictors: 4 | Mean Std CI95 Positive Distribution Regime ---------------------------------------------------------------------------------Intercept | -18.7971 10.1644 [-37.002, 0.765] 0.039 Empirical 0.8797 IPI | 4.4559 0.1530 [ 4.166, 4.760] 1.000 Empirical 0.4623 E | 0.0010 0.0004 [ 0.000, 0.002] 0.997 Empirical 0.2665 WR | 2.4684 0.3618 [ 1.759, 3.196] 1.000 Empirical 0.1727 Sigma2 | 47.7391 8.6741 [33.823, 67.024] 1.000 Empirical NaN
12-1856
mixconjugateblm
Assuming, that variables with Regime < 0.1 should be removed from the model, the results suggest that you can include all variables in the model.
Forecast Responses Using Posterior Predictive Distribution Consider the regression model in “Create Prior Model for SSVS” on page 12-1849. Perform SSVS: 1
Create a Bayesian regression model for SSVS with a conjugate prior for the data likelihood. Use the default settings.
2
Hold out the last 10 periods of data from estimation.
3
Estimate the marginal posterior distributions.
p = 3; PriorMdl = bayeslm(p,'ModelType','mixconjugate','VarNames',["IPI" "E" "WR"]); load Data_NelsonPlosser fhs = 10; % Forecast horizon size X = DataTable{1:(end - fhs),PriorMdl.VarNames(2:end)}; y = DataTable{1:(end - fhs),'GNPR'}; XF = DataTable{(end - fhs + 1):end,PriorMdl.VarNames(2:end)}; % Future predictor data yFT = DataTable{(end - fhs + 1):end,'GNPR'}; % True future responses rng(1); % For reproducibility PosteriorMdl = estimate(PriorMdl,X,y,'Display',false);
Forecast responses using the posterior predictive distribution and the future predictor data XF. Plot the true values of the response and the forecasted values. yF = forecast(PosteriorMdl,XF); figure; plot(dates,DataTable.GNPR); hold on plot(dates((end - fhs + 1):end),yF) h = gca; hp = patch([dates(end - fhs + 1) dates(end) dates(end) dates(end - fhs + 1)],... h.YLim([1,1,2,2]),[0.8 0.8 0.8]); uistack(hp,'bottom'); legend('Forecast Horizon','True GNPR','Forecasted GNPR','Location','NW') title('Real Gross National Product: 1909 - 1970'); ylabel('rGNP'); xlabel('Year'); hold off
12-1857
12
Functions
yF is a 10-by-1 vector of future values of real GNP corresponding to the future predictor data. Estimate the forecast root mean squared error (RMSE). frmse = sqrt(mean((yF - yFT).^2)) frmse = 18.8470
The forecast RMSE is a relative measure of forecast accuracy. Specifically, you estimate several models using different assumptions. The model with the lowest forecast RMSE is the best-performing model of the ones being compared. When you perform Bayesian regression with SSVS, a best practice is to tune the hyperparameters. One way to do so is to estimate the forecast RMSE over a grid of hyperparameter values, and choose the value that minimizes the forecast RMSE.
More About Bayesian Linear Regression Model A Bayesian linear regression model treats the parameters β and σ2 in the multiple linear regression (MLR) model yt = xtβ + εt as random variables. For times t = 1,...,T: 12-1858
mixconjugateblm
• yt is the observed response. • xt is a 1-by-(p + 1) row vector of observed values of p predictors. To accommodate a model intercept, x1t = 1 for all t. • β is a (p + 1)-by-1 column vector of regression coefficients corresponding to the variables that compose the columns of xt. • εt is the random disturbance with a mean of zero and Cov(ε) = σ2IT×T, while ε is a T-by-1 vector containing all disturbances. These assumptions imply that the data likelihood is ℓ β, σ2 y, x =
T
∏
t=1
ϕ yt; xt β, σ2 .
ϕ(yt;xtβ,σ2) is the Gaussian probability density with mean xtβ and variance σ2 evaluated at yt;. Before considering the data, you impose a joint prior distribution assumption on (β,σ2). In a Bayesian analysis, you update the distribution of the parameters by using information about the parameters obtained from the likelihood of the data. The result is the joint posterior distribution of (β,σ2) or the conditional posterior distributions of the parameters. Stochastic Search Variable Selection Stochastic search variable selection (SSVS) is a predictor variable selection method for Bayesian linear regression that searches the space of potential models for models with high posterior probability, and averages the models it finds after it completes the search. SSVS assumes that the prior distribution of each regression coefficient is a mixture of two Gaussian distributions, and the prior distribution of σ2 is inverse gamma with shape A and scale B. Let γ = {γ1, …,γK} be a latent, random regime indicator for the regression coefficients β, where: • K is the number of coefficients in the model (Intercept + NumPredictors). γk = 1 means that βk|σ2,γk is Gaussian with mean 0 and variance c1. • γk = 0 means that a predictor is Gaussian with mean 0 and variance c2. • A probability mass function governs the distribution of γ, and the sample space of γ is composed of 2K elements. More specifically, given γk and σ2, βk = γkc1Z + (1 – γk)c2Z, where: • Z is a standard normal random variable. • For conjugate models (mixconjugateblm), cj = σ2Vj, j = 1,2. • For semiconjugate models (mixsemiconjugateblm), cj = Vj. c1 is relatively large, which implies that the corresponding predictor is more likely to be in the model. c2 is relatively small, which implies that the corresponding predictor is less likely to be in the model because distribution is dense around 0. In this framework, if the potential exists for a total of K coefficients in a model, then the space has 2K models through which to search. Because computing posterior probabilities of all 2K models can be computationally expensive, SSVS uses MCMC to sample γ = {γ1,…,γK} and estimate posterior probabilities of corresponding models. The models that the algorithm chooses often have higher posterior probabilities. The algorithm composes the estimated posterior distributions of β and σ2 by computing the weighted average of the sampled models. The algorithm attributes a larger weight to those models sampled more often. 12-1859
12
Functions
The resulting posterior distribution for conjugate mixture models is analytically tractable (see “Algorithms” on page 12-1860). For details on the posterior distribution, see “Analytically Tractable Posteriors” on page 6-5.
Algorithms A closed-form posterior exists for conjugate mixture priors in the SSVS framework with K coefficients. However, because the prior β|σ2,γ, marginalized by γ, is a 2K-component Gaussian mixture, MATLAB uses MCMC instead to sample from the posterior for numerical stability.
Alternative Functionality The bayeslm function can create any supported prior model object for Bayesian linear regression.
Version History Introduced in R2018b
References [1] George, E. I., and R. E. McCulloch. "Variable Selection Via Gibbs Sampling." Journal of the American Statistical Association. Vol. 88, No. 423, 1993, pp. 881–889. [2] Koop, G., D. J. Poirier, and J. L. Tobias. Bayesian Econometric Methods. New York, NY: Cambridge University Press, 2007.
See Also Objects empiricalblm | lassoblm | mixsemiconjugateblm Topics “Bayesian Linear Regression” on page 6-2 “Bayesian Lasso Regression” on page 6-52 “Bayesian Stochastic Search Variable Selection” on page 6-63
12-1860
mixsemiconjugateblm
mixsemiconjugateblm Bayesian linear regression model with semiconjugate priors for stochastic search variable selection (SSVS)
Description The Bayesian linear regression model on page 12-1874 object mixsemiconjugateblm specifies the joint prior distribution of the regression coefficients and the disturbance variance (β, σ2) for implementing SSVS on page 12-1875 (see [1] and [2]) assuming β and σ2 are dependent random variables. In general, when you create a Bayesian linear regression model object, it specifies the joint prior distribution and characteristics of the linear regression model only. That is, the model object is a template intended for further use. Specifically, to incorporate data into the model for posterior distribution analysis and feature selection, pass the model object and data to the appropriate object function on page 12-1865.
Creation Syntax PriorMdl = mixsemiconjugateblm(NumPredictors) PriorMdl = mixsemiconjugateblm(NumPredictors,Name,Value) Description PriorMdl = mixsemiconjugateblm(NumPredictors) creates a Bayesian linear regression model on page 12-1858 object (PriorMdl) composed of NumPredictors predictors and an intercept, and sets the NumPredictors property. The joint prior distribution of (β, σ2) is appropriate for implementing SSVS for predictor selection [2]. PriorMdl is a template that defines the prior distributions and the dimensionality of β. PriorMdl = mixsemiconjugateblm(NumPredictors,Name,Value) sets properties on page 121861 (except NumPredictors) using name-value pair arguments. Enclose each property name in quotes. For example, mixsemiconjugateblm(3,'Probability',abs(rand(4,1))) specifies random prior regime probabilities for all four coefficients in the model.
Properties You can set writable property values when you create the model object by using name-value argument syntax, or after you create the model object by using dot notation. For example, to exclude an intercept from the model, enter PriorMdl.Intercept = false;
NumPredictors — Number of predictor variables nonnegative integer 12-1861
12
Functions
Number of predictor variables in the Bayesian multiple linear regression model, specified as a nonnegative integer. NumPredictors must be the same as the number of columns in your predictor data, which you specify during model estimation or simulation. When specifying NumPredictors, exclude any intercept term for the value. After creating a model, if you change the of value NumPredictors using dot notation, then these parameters revert to the default values: • Variable names (VarNames) • Prior mean of β (Mu) • Prior variances of β for each regime (V) • Prior correlation matrix of β (Correlation) • Prior regime probabilities (Probability) Data Types: double Intercept — Flag for including regression model intercept true (default) | false Flag for including a regression model intercept, specified as a value in this table. Value
Description
false
Exclude an intercept from the regression model. Therefore, β is a p-dimensional vector, where p is the value of NumPredictors.
true
Include an intercept in the regression model. Therefore, β is a (p + 1)-dimensional vector. This specification causes a T-by-1 vector of ones to be prepended to the predictor data during estimation and simulation.
If you include a column of ones in the predictor data for an intercept term, then set Intercept to false. Example: 'Intercept',false Data Types: logical VarNames — Predictor variable names string vector | cell vector of character vectors Predictor variable names for displays, specified as a string vector or cell vector of character vectors. VarNames must contain NumPredictors elements. VarNames(j) is the name of the variable in column j of the predictor data set, which you specify during estimation, simulation, or forecasting. The default is {'Beta(1)','Beta(2),...,Beta(p)}, where p is the value of NumPredictors. Example: 'VarNames',["UnemploymentRate"; "CPI"] Data Types: string | cell | char 12-1862
mixsemiconjugateblm
Mu — Component-wise mean hyperparameter of Gaussian mixture prior on β zeros(Intercept + NumPredictors,2) (default) | numeric matrix Component-wise mean hyperparameter of the Gaussian mixture prior on β, specified as an (Intercept + NumPredictors)-by-2 numeric matrix. The first column contains the prior means for component 1 (the variable-inclusion regime, that is, γ = 1). The second column contains the prior means for component 2 (the variable-exclusion regime, that is, γ = 0). • If Intercept is false, then Mu has NumPredictors rows. mixsemiconjugateblm sets the prior mean of the NumPredictors coefficients corresponding to the columns in the predictor data set, which you specify during estimation, simulation, or forecasting. • Otherwise, Mu has NumPredictors + 1 elements. The first element corresponds to the prior means of the intercept, and all other elements correspond to the predictor variables. Tip To perform SSVS, use the default value of Mu. Example: In a 3-coefficient model, 'Mu',[0.5 0; 0.5 0; 0.5 0] sets the component 1 prior mean of all coefficients to 0.5 and sets the component 2 prior mean of all coefficients to 0. Data Types: double V — Component-wise variance hyperparameter of Gaussian mixture prior on β repmat([10 0.1],Intercept + NumPredictors,1) (default) | positive numeric matrix Component-wise variance hyperparameter of the Gaussian mixture prior on β, an (Intercept + NumPredictors)-by-2 positive numeric matrix. The first column contains the prior variance factors for component 1 (the variable-inclusion regime, that is, γ = 1). The second column contains the prior variance factors for component 2 (the variable-exclusion regime, that is, γ = 0). • If Intercept is false, then V has NumPredictors rows. mixsemiconjugateblm sets the prior variance factor of the NumPredictors coefficients corresponding to the columns in the predictor data set, which you specify during estimation, simulation, or forecasting. • Otherwise, V has NumPredictors + 1 elements. The first element corresponds to the prior variance factor of the intercept, and all other elements correspond to the predictor variables. Tip • To perform SSVS, specify a larger variance factor for regime 1 than for regime 2 (for all j, specify V(j,1) > V(j,2)). • For more details on what value to specify for V, see [1].
Example: In a 3-coefficient model, 'V',[100 1; 100 1; 100 1] sets the component 1 prior variance factor of all coefficients to 100 and sets the component 2 prior variance factor of all coefficients to 1. Data Types: double Probability — Prior probability distribution for variable inclusion and exclusion regimes 0.5*ones(Intercept + NumPredictors,1) (default) | numeric vector of values in [0,1] | function handle 12-1863
12
Functions
Prior probability distribution for the variable inclusion and exclusion regimes, specified as an (Intercept + NumPredictors)-by-1 numeric vector of values in [0,1], or a function handle in the form @fcnName, where fcnName is the function name. Probability represents the prior probability distribution of γ = {γ1,…,γK}, where: • K = Intercept + NumPredictors, which is the number of coefficients in the regression model. • γk ∈ {0,1} for k = 1,…,K. Therefore, the sample space has a cardinality of 2K. • γk = 1 indicates variable VarNames(k) is included in the model, and γk = 0 indicates that the variable is excluded from the model. If Probability is a numeric vector: • Rows correspond to the variable names in VarNames. For models containing an intercept, the prior probability for intercept inclusion is Probability(1). • For k = 1,…,K, the prior probability for excluding variable k is 1 – Probability(k). • Prior probabilities of the variable-inclusion regime, among all variables and the intercept, are independent. If Probability is a function handle, then it represents a custom prior distribution of the variableinclusion regime probabilities. The corresponding function must have this declaration statement (the argument and function names can vary): logprob = regimeprior(varinc)
• logprob is a numeric scalar representing the log of the prior distribution. You can write the prior distribution up to a proportionality constant. • varinc is a K-by-1 logical vector. Elements correspond to the variable names in VarNames and indicate the regime in which the corresponding variable exists. varinc(k) = true indicates VarName(k) is included in the model, and varinc(k) = false indicates it is excluded from the model. You can include more input arguments, but they must be known when you call mixsemiconjugateblm. For details on what value to specify for Probability, see [1]. Example: In a 3-coefficient model, 'Probability',rand(3,1) assigns random prior variableinclusion probabilities to each coefficient. Data Types: double | function_handle Correlation — Prior correlation matrix of β eye(Intercept + NumPredictors) (default) | numeric, positive definite matrix Prior correlation matrix of β for both components in the mixture model, specified as an (Intercept + NumPredictors)-by-(Intercept + NumPredictors) numeric, positive definite matrix. Consequently, the prior covariance matrix for component j in the mixture model is diag(sqrt(V(:,j)))*Correlation*diag(sqrt(V(:,j))), where V is the matrix of coefficient variances. Rows and columns correspond to the variable names in VarNames. By default, regression coefficients are uncorrelated, conditional on the regime. 12-1864
mixsemiconjugateblm
Note You can supply any appropriately sized numeric matrix. However, if your specification is not positive definite, mixsemiconjugateblm issues a warning and replaces your specification with CorrelationPD, where: CorrelationPD = 0.5*(Correlation + Correlation.');
Tip For details on what value to specify for Correlation, see [1]. Data Types: double A — Shape hyperparameter of inverse gamma prior on σ2 3 (default) | numeric scalar Shape hyperparameter of the inverse gamma prior on σ2, specified as a numeric scalar. A must be at least –(Intercept + NumPredictors)/2. With B held fixed, the inverse gamma distribution becomes taller and more concentrated as A increases. This characteristic weighs the prior model of σ2 more heavily than the likelihood during posterior estimation. For the functional form of the inverse gamma distribution, see “Analytically Tractable Posteriors” on page 6-5. Example: 'A',0.1 Data Types: double B — Scale hyperparameter of inverse gamma prior on σ2 1 (default) | positive scalar | Inf Scale parameter of inverse gamma prior on σ2, specified as a positive scalar or Inf. With A held fixed, the inverse gamma distribution becomes taller and more concentrated as B increases. This characteristic weighs the prior model of σ2 more heavily than the likelihood during posterior estimation. Example: 'B',5 Data Types: double
Object Functions estimate simulate forecast plot summarize
Perform predictor variable selection for Bayesian linear regression models Simulate regression coefficients and disturbance variance of Bayesian linear regression model Forecast responses of Bayesian linear regression model Visualize prior and posterior densities of Bayesian linear regression model parameters Distribution summary statistics of Bayesian linear regression model for predictor variable selection
Examples
12-1865
12
Functions
Create Prior Model for SSVS Consider the multiple linear regression model that predicts the US real gross national product (GNPR) using a linear combination of industrial production index (IPI), total employment (E), and real wages (WR). GNPRt = β0 + β1IPIt + β2Et + β3WRt + εt . For all t, εt is a series of independent Gaussian disturbances with a mean of 0 and variance σ2. Assume these prior distributions for k = 0,...,3: •
βk | σ2, γk = γk V k1Z1 + (1 − γk) V k2Z2, where Z1 and Z2 are independent, standard normal random variables. Therefore, the coefficients have a Gaussian mixture distribution. Assume all coefficients are conditionally independent, a priori.
• σ2 ∼ IG(A, B). A and B are the shape and scale, respectively, of an inverse gamma distribution. • γk ∈ 0, 1 and it represents the random variable-inclusion regime variable with a discrete uniform distribution. Create a prior model for SSVS. Specify the number of predictors p. p = 3; PriorMdl = mixsemiconjugateblm(p);
PriorMdl is a mixsemiconjugateblm Bayesian linear regression model object representing the prior distribution of the regression coefficients and disturbance variance. mixsemiconjugateblm displays a summary of the prior distributions at the command line. Alternatively, you can create a prior model for SSVS by passing the number of predictors to bayeslm and setting the ModelType name-value pair argument to 'mixsemiconjugate'. MdlBayesLM = bayeslm(p,'ModelType','mixsemiconjugate') MdlBayesLM = mixsemiconjugateblm with properties: NumPredictors: Intercept: VarNames: Mu: V: Probability: Correlation: A: B:
3 1 {4x1 [4x2 [4x2 [4x1 [4x4 3 1
cell} double] double] double] double]
| Mean Std CI95 Positive Distribution -----------------------------------------------------------------------------Intercept | 0 2.2472 [-5.201, 5.201] 0.500 Mixture distribution Beta(1) | 0 2.2472 [-5.201, 5.201] 0.500 Mixture distribution Beta(2) | 0 2.2472 [-5.201, 5.201] 0.500 Mixture distribution Beta(3) | 0 2.2472 [-5.201, 5.201] 0.500 Mixture distribution Sigma2 | 0.5000 0.5000 [ 0.138, 1.616] 1.000 IG(3.00, 1)
12-1866
mixsemiconjugateblm
Mdl and MdlBayesLM are equivalent model objects. You can set writable property values of created models using dot notation. Set the regression coefficient names to the corresponding variable names. PriorMdl.VarNames = ["IPI" "E" "WR"] PriorMdl = mixsemiconjugateblm with properties: NumPredictors: Intercept: VarNames: Mu: V: Probability: Correlation: A: B:
3 1 {4x1 [4x2 [4x2 [4x1 [4x4 3 1
cell} double] double] double] double]
| Mean Std CI95 Positive Distribution -----------------------------------------------------------------------------Intercept | 0 2.2472 [-5.201, 5.201] 0.500 Mixture distribution IPI | 0 2.2472 [-5.201, 5.201] 0.500 Mixture distribution E | 0 2.2472 [-5.201, 5.201] 0.500 Mixture distribution WR | 0 2.2472 [-5.201, 5.201] 0.500 Mixture distribution Sigma2 | 0.5000 0.5000 [ 0.138, 1.616] 1.000 IG(3.00, 1)
MATLAB® associates the variable names to the regression coefficients in displays. Plot the prior distributions. plot(PriorMdl);
12-1867
12
Functions
The prior distribution of each coefficient is a mixture of two Gaussians: both components have a mean of zero, but component 1 has a large variance relative to component 2. Therefore, their distributions are centered at zero and have the spike-and-slab appearance.
Perform Variable Selection Using SSVS and Default Options Consider the linear regression model in “Create Prior Model for SSVS” on page 12-1865. Create a prior model for performing SSVS. Assume that β and σ2 are independent (a semiconjugate mixture model). Specify the number of predictors p and the names of the regression coefficients. p = 3; PriorMdl = mixsemiconjugateblm(p,'VarNames',["IPI" "E" "WR"]);
Display the prior regime probabilities and Gaussian mixture variance factors of the prior β. priorProbabilities = table(PriorMdl.Probability,'RowNames',PriorMdl.VarNames,... 'VariableNames',"Probability") priorProbabilities=4×1 table Probability ___________ Intercept
12-1868
0.5
mixsemiconjugateblm
IPI E WR
0.5 0.5 0.5
priorV = array2table(PriorMdl.V,'RowNames',PriorMdl.VarNames,... 'VariableNames',["gammaIs1" "gammaIs0"]) priorV=4×2 table gammaIs1 ________ Intercept IPI E WR
10 10 10 10
gammaIs0 ________ 0.1 0.1 0.1 0.1
PriorMdl stores prior regime probabilities in the Probability property and the regime variance factors in the V property. The default prior probability of variable inclusion is 0.5. The default variance factors for each coefficient are 10 for the variable-inclusion regime and 0.01 for the variableexclusion regime. Load the Nelson-Plosser data set. Create variables for the response and predictor series. load Data_NelsonPlosser X = DataTable{:,PriorMdl.VarNames(2:end)}; y = DataTable{:,'GNPR'};
Implement SSVS by estimating the marginal posterior distributions of β and σ2. Because SSVS uses Markov chain Monte Carlo (MCMC) for estimation, set a random number seed to reproduce the results. rng(1); PosteriorMdl = estimate(PriorMdl,X,y); Method: MCMC sampling with 10000 draws Number of observations: 62 Number of predictors: 4 | Mean Std CI95 Positive Distribution Regime ------------------------------------------------------------------------------Intercept | -1.5629 2.6816 [-7.879, 2.703] 0.300 Empirical 0.5901 IPI | 4.6217 0.1222 [ 4.384, 4.865] 1.000 Empirical 1 E | 0.0004 0.0002 [ 0.000, 0.001] 0.976 Empirical 0.0918 WR | 2.6098 0.3691 [ 1.889, 3.347] 1.000 Empirical 1 Sigma2 | 50.9169 9.4955 [35.838, 72.707] 1.000 Empirical NaN
PosteriorMdl is an empiricalblm model object that stores draws from the posterior distributions of β and σ2 given the data. estimate displays a summary of the marginal posterior distributions at the command line. Rows of the summary correspond to regression coefficients and the disturbance variance, and columns correspond to characteristics of the posterior distribution. The characteristics include: • CI95, which contains the 95% Bayesian equitailed credible intervals for the parameters. For example, the posterior probability that the regression coefficient of E is in [0.000, 0.001] is 0.95. 12-1869
12
Functions
• Regime, which contains the marginal posterior probability of variable inclusion (γ = 1 for a variable). For example, the posterior probability that E should be included in the model is 0.0918. Assuming that variables with Regime < 0.1 should be removed from the model, the results suggest that you can exclude the unemployment rate from the model. By default, estimate draws and discards a burn-in sample of size 5000. However, a good practice is to inspect a trace plot of the draws for adequate mixing and lack of transience. Plot a trace plot of the draws for each parameter. You can access the draws that compose the distribution (the properties BetaDraws and Sigma2Draws) using dot notation. figure; for j = 1:(p + 1) subplot(2,2,j); plot(PosteriorMdl.BetaDraws(j,:)); title(sprintf('%s',PosteriorMdl.VarNames{j})); end
figure; plot(PosteriorMdl.Sigma2Draws); title('Sigma2');
12-1870
mixsemiconjugateblm
The trace plots indicate that the draws seem to mix well. The plots show no detectable transience or serial correlation, and the draws do not jump between states.
Specify Custom Prior Regime Probability Distribution for SSVS Consider the linear regression model in “Create Prior Model for SSVS” on page 12-1865. Load the Nelson-Plosser data set. Create variables for the response and predictor series. load Data_NelsonPlosser VarNames = ["IPI" "E" "WR"]; X = DataTable{:,VarNames}; y = DataTable{:,"GNPR"};
Assume the following: • The intercept is in the model with probability 0.9. • IPI and E are in the model with probability 0.75. • If E is included in the model, then the probability that WR is included in the model is 0.9. • If E is excluded from the model, then the probability that WR is included is 0.25. Declare a function named priorssvsexample.m that: 12-1871
12
Functions
• Accepts a logical vector indicating whether the intercept and variables are in the model (true for model inclusion). Element 1 corresponds to the intercept, and the rest of the elements correspond to the variables in the data. • Returns a numeric scalar representing the log of the described prior regime probability distribution. function logprior = priorssvsexample(varinc) %PRIORSSVSEXAMPLE Log prior regime probability distribution for SSVS % PRIORSSVSEXAMPLE is an example of a custom log prior regime probability % distribution for SSVS with dependent random variables. varinc is % a 4-by-1 logical vector indicating whether 4 coefficients are in a model % and logPrior is a numeric scalar representing the log of the prior % distribution of the regime probabilities. % % Coefficients enter a model according to these rules: % * varinc(1) is included with probability 0.9. % * varinc(2) and varinc(3) are in the model with probability 0.75. % * If varinc(3) is included in the model, then the probability that % varinc(4) is included in the model is 0.9. % * If varinc(3) is excluded from the model, then the probability % that varinc(4) is included is 0.25. logprior = log(0.9) + 2*log(0.75) + log(varinc(3)*0.9 + (1-varinc(3))*0.25); end
are independent (a semiconjugate Create a prior model for performing SSVS. Assume that and mixture model). Specify the number of predictors p the names of the regression coefficients, and custom, prior probability distribution of the variable-inclusion regimes. p = 3; PriorMdl = mixsemiconjugateblm(p,'VarNames',["IPI" "E" "WR"],... 'Probability',@priorssvsexample);
Implement SSVS by estimating the marginal posterior distributions of and MCMC for estimation, set a random number seed to reproduce the results.
. Because SSVS uses
rng(1); PosteriorMdl = estimate(PriorMdl,X,y); Method: MCMC sampling with 10000 draws Number of observations: 62 Number of predictors: 4 | Mean Std CI95 Positive Distribution Regime ------------------------------------------------------------------------------Intercept | -1.4658 2.6046 [-7.781, 2.546] 0.308 Empirical 0.5516 IPI | 4.6227 0.1222 [ 4.385, 4.866] 1.000 Empirical 1 E | 0.0004 0.0002 [ 0.000, 0.001] 0.976 Empirical 0.2557 WR | 2.6105 0.3692 [ 1.886, 3.346] 1.000 Empirical 1 Sigma2 | 50.9621 9.4999 [35.860, 72.596] 1.000 Empirical NaN
12-1872
mixsemiconjugateblm
Assuming, that variables with Regime < 0.1 should be removed from the model, the results suggest that you can include all variables in the model.
Forecast Responses Using Posterior Predictive Distribution Consider the regression model in “Create Prior Model for SSVS” on page 12-1865. Perform SSVS: 1
Create a Bayesian regression model for SSVS with a semiconjugate prior for the data likelihood. Use the default settings.
2
Hold out the last 10 periods of data from estimation.
3
Estimate the marginal posterior distributions.
p = 3; PriorMdl = bayeslm(p,'ModelType','mixsemiconjugate','VarNames',["IPI" "E" "WR"]); load Data_NelsonPlosser fhs = 10; % Forecast horizon size X = DataTable{1:(end - fhs),PriorMdl.VarNames(2:end)}; y = DataTable{1:(end - fhs),'GNPR'}; XF = DataTable{(end - fhs + 1):end,PriorMdl.VarNames(2:end)}; % Future predictor data yFT = DataTable{(end - fhs + 1):end,'GNPR'}; % True future responses rng(1); % For reproducibility PosteriorMdl = estimate(PriorMdl,X,y,'Display',false);
Forecast responses using the posterior predictive distribution and the future predictor data XF. Plot the true values of the response and the forecasted values. yF = forecast(PosteriorMdl,XF); figure; plot(dates,DataTable.GNPR); hold on plot(dates((end - fhs + 1):end),yF) h = gca; hp = patch([dates(end - fhs + 1) dates(end) dates(end) dates(end - fhs + 1)],... h.YLim([1,1,2,2]),[0.8 0.8 0.8]); uistack(hp,'bottom'); legend('Forecast Horizon','True GNPR','Forecasted GNPR','Location','NW') title('Real Gross National Product: 1909 - 1970'); ylabel('rGNP'); xlabel('Year'); hold off
12-1873
12
Functions
yF is a 10-by-1 vector of future values of real GNP corresponding to the future predictor data. Estimate the forecast root mean squared error (RMSE). frmse = sqrt(mean((yF - yFT).^2)) frmse = 4.5935
The forecast RMSE is a relative measure of forecast accuracy. Specifically, you estimate several models using different assumptions. The model with the lowest forecast RMSE is the best-performing model of the ones being compared. When you perform Bayesian regression with SSVS, a best practice is to tune the hyperparameters. One way to do so is to estimate the forecast RMSE over a grid of hyperparameter values, and choose the value that minimizes the forecast RMSE. Copyright 2018 The MathWorks, Inc.
More About Bayesian Linear Regression Model A Bayesian linear regression model treats the parameters β and σ2 in the multiple linear regression (MLR) model yt = xtβ + εt as random variables. For times t = 1,...,T: 12-1874
mixsemiconjugateblm
• yt is the observed response. • xt is a 1-by-(p + 1) row vector of observed values of p predictors. To accommodate a model intercept, x1t = 1 for all t. • β is a (p + 1)-by-1 column vector of regression coefficients corresponding to the variables that compose the columns of xt. • εt is the random disturbance with a mean of zero and Cov(ε) = σ2IT×T, while ε is a T-by-1 vector containing all disturbances. These assumptions imply that the data likelihood is ℓ β, σ2 y, x =
T
∏
t=1
ϕ yt; xt β, σ2 .
ϕ(yt;xtβ,σ2) is the Gaussian probability density with mean xtβ and variance σ2 evaluated at yt;. Before considering the data, you impose a joint prior distribution assumption on (β,σ2). In a Bayesian analysis, you update the distribution of the parameters by using information about the parameters obtained from the likelihood of the data. The result is the joint posterior distribution of (β,σ2) or the conditional posterior distributions of the parameters. Stochastic Search Variable Selection Stochastic search variable selection (SSVS) is a predictor variable selection method for Bayesian linear regression that searches the space of potential models for models with high posterior probability, and averages the models it finds after it completes the search. SSVS assumes that the prior distribution of each regression coefficient is a mixture of two Gaussian distributions, and the prior distribution of σ2 is inverse gamma with shape A and scale B. Let γ = {γ1, …,γK} be a latent, random regime indicator for the regression coefficients β, where: • K is the number of coefficients in the model (Intercept + NumPredictors). γk = 1 means that βk|σ2,γk is Gaussian with mean 0 and variance c1. • γk = 0 means that a predictor is Gaussian with mean 0 and variance c2. • A probability mass function governs the distribution of γ, and the sample space of γ is composed of 2K elements. More specifically, given γk and σ2, βk = γkc1Z + (1 – γk)c2Z, where: • Z is a standard normal random variable. • For conjugate models (mixconjugateblm), cj = σ2Vj, j = 1,2. • For semiconjugate models (mixsemiconjugateblm), cj = Vj. c1 is relatively large, which implies that the corresponding predictor is more likely to be in the model. c2 is relatively small, which implies that the corresponding predictor is less likely to be in the model because distribution is dense around 0. In this framework, if the potential exists for a total of K coefficients in a model, then the space has 2K models through which to search. Because computing posterior probabilities of all 2K models can be computationally expensive, SSVS uses MCMC to sample γ = {γ1,…,γK} and estimate posterior probabilities of corresponding models. The models that the algorithm chooses often have higher posterior probabilities. The algorithm composes the estimated posterior distributions of β and σ2 by computing the weighted average of the sampled models. The algorithm attributes a larger weight to those models sampled more often. 12-1875
12
Functions
The resulting posterior distribution for semiconjugate mixture models is analytically intractable. For details on the posterior distribution, see “Analytically Tractable Posteriors” on page 6-5.
Alternative Functionality The bayeslm function can create any supported prior model object for Bayesian linear regression.
Version History Introduced in R2018b
References [1] George, E. I., and R. E. McCulloch. "Variable Selection Via Gibbs Sampling." Journal of the American Statistical Association. Vol. 88, No. 423, 1993, pp. 881–889. [2] Koop, G., D. J. Poirier, and J. L. Tobias. Bayesian Econometric Methods. New York, NY: Cambridge University Press, 2007.
See Also Objects empiricalblm | lassoblm | mixconjugateblm Topics “Bayesian Linear Regression” on page 6-2 “Bayesian Lasso Regression” on page 6-52 “Bayesian Stochastic Search Variable Selection” on page 6-63
12-1876
mldivide
mldivide Lag operator polynomial left division
Syntax B = A\C B = mldivide(A, C'PropertyName',PropertyValue)
Description Given two lag operator polynomials, A(L) and C(L)B = A\C perform a left division so that C(L) = A(L)*B(L), or B(L) = A(L)\C(L). Left division requires invertibility of the coefficient matrix associated with lag 0 of the denominator polynomial A(L). B = mldivide(A, C'PropertyName',PropertyValue) accepts one or more comma-separated property name/value pairs.
Input Arguments A Denominator (divisor) lag operator polynomial object, as produced by LagOp, in the quotient A(L) \C(L). C Numerator (dividend) lag operator polynomial object, as produced by LagOp, in the quotient A(L) \C(L)). If at least one of A or C is a lag operator polynomial object, the other can be a cell array of matrices (initial lag operator coefficients), or a single matrix (zero-degree lag operator). 'AbsTol' Nonnegative scalar absolute tolerance used as part of the termination criterion of the calculation of the quotient coefficients and, subsequently, to determine which coefficients to include in the quotient. Specifying an absolute tolerance allows for customization of the termination criterion. Once the algorithm has terminated, 'AbsTol' is used to exclude polynomial lags with near-zero coefficients. A coefficient matrix for a given lag is excluded if the magnitudes of all elements of the matrix are less than or equal to the absolute tolerance. Default: 1e-12 'RelTol' Nonnegative scalar relative tolerance used as part of the termination criterion of the calculation of the quotient coefficients. At each lag, a coefficient matrix is calculated and its 2-norm compared to the largest coefficient 2-norm. If the ratio of the current norm to the largest norm is less than or equal to 'RelTol', then the relative termination criterion is satisfied. Default: 0.01 12-1877
12
Functions
'Window' Positive integer indicating the size of the window used to check termination tolerances. Window represents the number of consecutive lags for which coefficients must satisfy a tolerance-based termination criterion in order to terminate the calculation of the quotient coefficients. If coefficients remain below tolerance for the length of the specified tolerance window, they are assumed to have died out sufficiently to terminate the algorithm (see notes below). Default: 20 'Degree' Nonnegative integer indicating the maximum degree of the quotient polynomial. For stable denominators, the default is the power to which the magnitude of the largest eigenvalue of the denominator must be raised to equal the relative termination tolerance 'RelTol'; for unstable denominators, the default is the power to which the magnitude of the largest eigenvalue must be raised to equal the largest positive floating point number (see realmax). The default is 1000, regardless of the stability of the denominator. Default: 1000
Output Arguments B Quotient lag operator polynomial object, such that B(L) = A(L)\C(L).
Examples Divide Lag Operator Polynomials Create two LagOp polynomial objects: A = LagOp({1 -0.6 0.08}); B = LagOp({1 -0.5});
The ratios A/B and B\A are equal: isEqLagOp(A/B,B\A) ans = logical 1
Tips The right division operator (\) invokes mldivide, but the optional inputs are available only by calling mldivide directly. To right-invert a stable B(L), set C(L) = eye(B.Dimension). 12-1878
mldivide
Algorithms Lag operator polynomial division generally results in infinite-degree polynomials. mldivide imposes a termination criterion to truncate the degree of the quotient polynomial. If 'Degree' is unspecified, the maximum degree of the quotient is determined by the stability of the denominator. Stable denominator polynomials usually result in quotients whose coefficients exhibit geometric decay in absolute value. (When coefficients change sign, it is the coefficient envelope which decays geometrically.) Unstable denominators usually result in quotients whose coefficients exhibit geometric growth in absolute value. In either case, maximum degree will not exceed the value of 'Degree'. To control truncation error by terminating the coefficient sequence too early, the termination criterion involves three steps: 1
At each lag in the quotient polynomial, a coefficient matrix is calculated and tested against both a relative and an absolute tolerance (see 'RelTol' and 'AbsTol' inputs ).
2
If the current coefficient matrix is below either tolerance, then a tolerance window is opened to ensure that all subsequent coefficients remain below tolerance for a number of lags determined by 'Window'.
3
If any subsequent coefficient matrix within the window is above both tolerances, then the tolerance window is closed and additional coefficients are calculated, repeating steps (1) and (2) until a subsequent coefficient matrix is again below either tolerance, and a new window is opened.
Steps (1)-(3) are repeated until a coefficient is below tolerance and subsequent coefficients remains below tolerance for 'Window' lags, or until the maximum 'Degree' is encountered, or until a coefficient becomes numerically unstable (NaN or +/-Inf).
References [1] Box, G.E.P., G.M. Jenkins, and G.C. Reinsel. Time Series Analysis: Forecasting and Control. 3rd ed. Englewood Cliffs, NJ: Prentice Hall, 1994. [2] Hayashi, F. Econometrics. Princeton, NJ: Princeton University Press, 2000. [3] Hamilton, J. D. Time Series Analysis. Princeton, NJ: Princeton University Press, 1994.
See Also mrdivide Topics “Specify Lag Operator Polynomials” on page 2-9 “Plot the Impulse Response Function of Conditional Mean Model” on page 7-80
12-1879
12
Functions
mrdivide Lag operator polynomial right division
Syntax A = C/B A = mrdivide(C, B,'PropertyName', PropertyValue)
Description A = C/B returns the quotient lag operator polynomial (A), which is the result of C(L)/B(L). A = mrdivide(C, B,'PropertyName', PropertyValue) accepts one or more optional commaseparated property name/value pairs.
Input Arguments C Numerator (dividend) lag operator polynomial object, as produced by LagOp, in the quotient C(L)/ B(L). B Denominator (divisor) lag operator polynomial object, as produced by LagOp, in the quotient C(L)/ B(L). If at least one of C or B is a lag operator polynomial object, the other can be a cell array of matrices (initial lag operator coefficients), or a single matrix (zero-degree lag operator). 'AbsTol' Nonnegative scalar absolute tolerance used as part of the termination criterion of the calculation of the quotient coefficients and, subsequently, to determine which coefficients to include in the quotient. Specifying an absolute tolerance allows for customization of the termination criterion. Once the algorithm has terminated, 'AbsTol' is used to exclude polynomial lags with near-zero coefficients. A coefficient matrix for a given lag is excluded if the magnitudes of all elements of the matrix are less than or equal to the absolute tolerance. Default: 1e-12 'RelTol' Nonnegative scalar relative tolerance used as part of the termination criterion of the calculation of the quotient coefficients. At each lag, a coefficient matrix is calculated and its 2-norm compared to the largest coefficient 2-norm. If the ratio of the current norm to the largest norm is less than or equal to 'RelTol', then the relative termination criterion is satisfied. Default: 0.01 12-1880
mrdivide
'Window' Positive integer indicating the size of the window used to check termination tolerances. Window represents the number of consecutive lags for which coefficients must satisfy a tolerance-based termination criterion in order to terminate the calculation of the quotient coefficients. If coefficients remain below tolerance for the length of the specified tolerance window, they are assumed to have died out sufficiently to terminate the algorithm (see notes below). Default: 20 'Degree' Nonnegative integer indicating the maximum degree of the quotient polynomial. For stable denominators, the default is the power to which the magnitude of the largest eigenvalue of the denominator must be raised to equal the relative termination tolerance 'RelTol'; for unstable denominators, the default is the power to which the magnitude of the largest eigenvalue must be raised to equal the largest positive floating point number (see realmax). The default is 1000, regardless of the stability of the denominator. Default: 1000
Output Arguments A Quotient lag operator polynomial object, with A(L) = C(L)/B(L).
Examples Invert a Lag Operator Polynomial Create a LagOp polynomial object with a sequence of scalar coefficients specified as a cell array: A = LagOp({1 -0.5});
Invert the polynomial by using the short-hand slash ("/") operator: a = 1 / A a = 1-D Lag Operator Polynomial: ----------------------------Coefficients: [1 0.5 0.25 0.125 0.0625 0.03125 0.015625] Lags: [0 1 2 3 4 5 6] Degree: 6 Dimension: 1
Tips The right division operator (/) invokes mrdivide, but the optional inputs are available only by calling mrdivide directly. To right-invert a stable B(L), set C(L) = eye(B.Dimension). 12-1881
12
Functions
Algorithms Lag operator polynomial division generally results in infinite-degree polynomials. mrdivide imposes a termination criterion to truncate the degree of the quotient polynomial. If 'Degree' is unspecified, the maximum degree of the quotient is determined by the stability of the denominator. Stable denominator polynomials usually result in quotients whose coefficients exhibit geometric decay in absolute value. (When coefficients change sign, it is the coefficient envelope which decays geometrically.) Unstable denominators usually result in quotients whose coefficients exhibit geometric growth in absolute value. In either case, maximum degree will not exceed the value of 'Degree'. To control truncation error by terminating the coefficient sequence too early, the termination criterion involves three steps: 1
At each lag in the quotient polynomial, a coefficient matrix is calculated and tested against both a relative and an absolute tolerance (see 'RelTol' and 'AbsTol' inputs ).
2
If the current coefficient matrix is below either tolerance, then a tolerance window is opened to ensure that all subsequent coefficients remain below tolerance for a number of lags determined by 'Window'.
3
If any subsequent coefficient matrix within the window is above both tolerances, then the tolerance window is closed and additional coefficients are calculated, repeating steps (1) and (2) until a subsequent coefficient matrix is again below either tolerance, and a new window is opened.
The algorithm repeats steps 1–3 until a coefficient is below tolerance and subsequent coefficients remains below tolerance for 'Window' lags, or until the maximum 'Degree' is encountered, or until a coefficient becomes numerically unstable (NaN or +/-Inf).
References [1] Box, G.E.P., G.M. Jenkins, and G.C. Reinsel. Time Series Analysis: Forecasting and Control. 3rd ed. Englewood Cliffs, NJ: Prentice Hall, 1994. [2] Hayashi, F. Econometrics. Princeton, NJ: Princeton University Press, 2000. [3] Hamilton, J. D. Time Series Analysis. Princeton, NJ: Princeton University Press, 1994.
See Also mldivide
12-1882
msVAR
msVAR Create Markov-switching dynamic regression model
Description The msVAR function returns an msVAR object that specifies the functional form of a Markov-switching dynamic regression model on page 12-1898 for the univariate or multivariate response process yt. The msVAR object also stores the parameter values of the model. An msVAR object has two key components: • Switching mechanism among states, represented by a discrete-time Markov chain (dtmc object) • State-specific submodels, either autoregressive (ARX) or vector autoregression (VARX) models (arima or varm objects), which can contain exogenous regression components The components completely specify the model structure. The Markov chain transition matrix and submodel parameters, such as the AR coefficients and innovation-distribution variance, are unknown and estimable unless you specify their values. To estimate a model containing unknown parameter values, pass the model and data to estimate. To work with an estimated or fully specified msVAR object, pass it to an object function on page 12-1885. Alternatively, to create a threshold-switching dynamic regression model, which has a switching mechanism governed by threshold transitions and observations of a threshold variable, see threshold and tsVAR.
Creation Syntax Mdl = msVAR(mc,mdl) Mdl = msVAR(mc,mdl,'SeriesNames',seriesNames) Description Mdl = msVAR(mc,mdl) creates a Markov-switching dynamic regression model Mdl (an msVAR object) that has the discrete-time Markov chain, switching mechanism among states mc and the statespecific, stable dynamic regression submodels mdl. Mdl = msVAR(mc,mdl,'SeriesNames',seriesNames) optionally sets the SeriesNames property, which associates the names seriesNames to the time series of the model. Input Arguments mc — Discrete-time Markov chain for switching mechanism among states dtmc object Discrete-time Markov chain for the switching mechanism among states, specified as a dtmc object. 12-1883
12
Functions
The states represented in the rows and columns of the transition matrix mc.P correspond to the states represented in the submodel vector mdl. msVAR processes and stores mc in the Switch property. mdl — State-specific dynamic regression submodels vector of arima objects | vector of varm objects State-specific dynamic regression submodels, specified as a length mc.NumStates vector of model objects individually constructed by arima or varm. All submodels must be of the same type (arima or varm) and have the same number of series. Unlike other model estimation tools, estimate does not infer the size of submodel regression coefficient arrays during estimation. Therefore, you must specify the Beta property of each submodel appropriately. For example, to include and estimate three predictors of the regression component of univariate submodel j, set mdl(j).Beta = NaN(3,1). msVAR processes and stores mdl in the property Submodels.
Properties You can set only the SeriesNames property when you create a model by using name-value argument syntax or by using dot notation. MATLAB derives the values of all other properties from inputs mc and mdl. For example, create a Markov-switching model for a 2-D response series, and then label the first and second series "GDP" and "CPI", respectively. Mdl = msVAR(mc,mdl); Mdl.SeriesNames = ["GDP" "CPI"];
NumStates — Number of states positive scalar This property is read-only. Number of states, specified as a positive scalar. Data Types: double NumSeries — Number of time series positive integer This property is read-only. Number of time series, specified as a positive integer. NumSeries specifies the dimensionality of the response variable and innovation in all submodels. Data Types: double StateNames — State labels string vector This property is read-only. State labels, specified as a string vector of length NumStates. 12-1884
msVAR
Data Types: string SeriesNames — Unique Series labels string(1:numSeries) (default) | string vector | cell array of character vectors | numeric vector Unique series labels, specified as a string vector, cell array of character vectors, or a numeric vector of length numSeries. msVAR stores the series names as a string vector. Data Types: string Switch — Discrete-time Markov chain for switching mechanism among states dtmc object This property is read-only. Discrete-time Markov chain for the switching mechanism among states, specified as a dtmc object. Submodels — State-specific vector autoregression submodels vector of varm objects This property is read-only. State-specific vector autoregression submodels, specified as a vector of varm objects of length NumStates. msVAR removes unsupported submodel components. • For arima submodels, msVAR does not support the moving average (MA), differencing, and seasonal components. If any submodel is a composite conditional mean and variance model (for example, its Variance property is a garch object), msVAR issues an error. • For varm submodels, msVAR does not support the trend component. msVAR converts submodels specified as arima objects to 1-D varm objects. Notes • NaN-valued elements in either the properties of Switch or the submodels of Submodels indicate unknown, estimable parameters. Specified elements, except submodel innovation variances, indicate equality constraints on parameters in model estimation. • All unknown submodel parameters are state dependent.
Object Functions estimate filter forecast simulate smooth summarize
Fit Markov-switching dynamic regression model to data Filtered inference of operative latent states in Markov-switching dynamic regression data Forecast sample paths from Markov-switching dynamic regression model Simulate sample paths of Markov-switching dynamic regression model Smoothed inference of operative latent states in Markov-switching dynamic regression data Summarize Markov-switching dynamic regression model estimation results 12-1885
12
Functions
Examples Create Fully Specified Univariate Model Create a two-state Markov-switching dynamic regression model for a 1-D response process. Specify all parameter values (this example uses arbitrary values). Create a two-state discrete-time Markov chain model that describes the regime switching mechanism. Label the regimes. P = [0.9 0.1; 0.3 0.7]; mc = dtmc(P,'StateNames',["Expansion" "Recession"]) mc = dtmc with properties: P: [2x2 double] StateNames: ["Expansion" NumStates: 2
"Recession"]
mc is a dtmc object. For each regime, use arima to create an AR model that describes the response process within the regime. % Constants C1 = 5; C2 = -5; % AR coefficients AR1 = [0.3 0.2]; % 2 lags AR2 = 0.1; % 1 lag % Innovations variances v1 = 2; v2 = 1; % AR Submodels mdl1 = arima('Constant',C1,'AR',AR1,... 'Variance',v1,'Description','Expansion State') mdl1 = arima with properties: Description: SeriesName: Distribution: P: D: Q: Constant: AR: SAR: MA: SMA: Seasonality:
12-1886
"Expansion State" "Y" Name = "Gaussian" 2 0 0 5 {0.3 0.2} at lags [1 2] {} {} {} 0
msVAR
Beta: [1×0] Variance: 2 ARIMA(2,0,0) Model (Gaussian Distribution) mdl2 = arima('Constant',C2,'AR',AR2,... 'Variance',v2,'Description','Recession State') mdl2 = arima with properties: Description: SeriesName: Distribution: P: D: Q: Constant: AR: SAR: MA: SMA: Seasonality: Beta: Variance:
"Recession State" "Y" Name = "Gaussian" 1 0 0 -5 {0.1} at lag [1] {} {} {} 0 [1×0] 1
ARIMA(1,0,0) Model (Gaussian Distribution)
mdl1 and mdl2 are fully specified arima objects. Store the submodels in a vector with order corresponding to the regimes in mc.StateNames. mdl = [mdl1; mdl2];
Use msVAR to create a Markov-switching dynamic regression model from the switching mechanism mc and the state-specific submodels mdl. Mdl = msVAR(mc,mdl) Mdl = msVAR with properties: NumStates: NumSeries: StateNames: SeriesNames: Switch: Submodels:
2 1 ["Expansion" "1" [1x1 dtmc] [2x1 varm]
"Recession"]
Mdl.Submodels(1) ans = varm with properties: Description: SeriesNames: NumSeries: P:
"AR-Stationary 1-Dimensional VAR(2) Model" "Y1" 1 2
12-1887
12
Functions
Constant: AR: Trend: Beta: Covariance:
5 {0.3 0.2} at lags [1 2] 0 [1×0 matrix] 2
Mdl.Submodels(2) ans = varm with properties: Description: SeriesNames: NumSeries: P: Constant: AR: Trend: Beta: Covariance:
"AR-Stationary 1-Dimensional VAR(1) Model" "Y1" 1 1 -5 {0.1} at lag [1] 0 [1×0 matrix] 1
Mdl is a fully specified msVAR object representing a univariate two-state Markov-switching dynamic regression model. msVAR stores specified arima submodels as varm objects. Because Mdl is fully specified, you can pass it to any msVAR object function for further analysis (see “Object Functions” on page 12-1885). Or, you can specify that the parameters of Mdl are initial values for the estimation procedure (see estimate).
Create Fully Specified Model for US GDP Rate Consider a two-state Markov-switching dynamic regression model of the postwar US real GDP growth rate. The model has the parameter estimates presented in [1]. Create a discrete-time Markov chain model that describes the regime switching mechanism. Label the regimes. P = [0.92 0.08; 0.26 0.74]; mc = dtmc(P,'StateNames',["Expansion" "Recession"]);
mc is a fully specified dtmc object. Create separate AR(0) models (constant only) for the two regimes. sigma = 3.34; % Homoscedastic models across states mdl1 = arima('Constant',4.62,'Variance',sigma^2); mdl2 = arima('Constant',-0.48,'Variance',sigma^2); mdl = [mdl1 mdl2];
Create the Markov-switching dynamic regression model that describes the behavior of the US GDP growth rate. Mdl = msVAR(mc,mdl) Mdl = msVAR with properties:
12-1888
msVAR
NumStates: NumSeries: StateNames: SeriesNames: Switch: Submodels:
2 1 ["Expansion" "1" [1x1 dtmc] [2x1 varm]
"Recession"]
Mdl is a fully specified msVAR object.
Create Partially Specified Univariate Model for Estimation Consider fitting to data a two-state Markov-switching model for a 1-D response process. Create a discrete-time Markov chain model for the switching mechanism. Specify a 2-by-2 matrix of NaN values for the transition matrix. This setting indicates that you want to estimate all transition probabilities. Label the states. P = NaN(2); mc = dtmc(P,'StateNames',["Expansion" "Recession"]) mc = dtmc with properties: P: [2x2 double] StateNames: ["Expansion" NumStates: 2
"Recession"]
mc.P ans = 2×2 NaN NaN
NaN NaN
mc is a partially specified dtmc object. The transition matrix mc.P is completely unknown and estimable. Create AR(1) and AR(2) models by using the shorthand syntax of arima. After you create each model, specify the model description by using dot notation. mdl1 = arima(1,0,0); mdl1.Description = "Expansion State" mdl1 = arima with properties: Description: SeriesName: Distribution: P: D: Q:
"Expansion State" "Y" Name = "Gaussian" 1 0 0
12-1889
12
Functions
Constant: AR: SAR: MA: SMA: Seasonality: Beta: Variance:
NaN {NaN} at lag [1] {} {} {} 0 [1×0] NaN
ARIMA(1,0,0) Model (Gaussian Distribution) mdl2 = arima(2,0,0); mdl2.Description = "Recession State" mdl2 = arima with properties: Description: SeriesName: Distribution: P: D: Q: Constant: AR: SAR: MA: SMA: Seasonality: Beta: Variance:
"Recession State" "Y" Name = "Gaussian" 2 0 0 NaN {NaN NaN} at lags [1 2] {} {} {} 0 [1×0] NaN
ARIMA(2,0,0) Model (Gaussian Distribution)
mdl1 and mdl2 are partially specified arima objects. NaN-valued properties correspond to unknown, estimable parameters. Store the submodels in a vector with order corresponding to the regimes in mc.StateNames. mdl = [mdl1; mdl2];
Create a Markov-switching model template from the switching mechanism mc and the state-specific submodels mdl. Mdl = msVAR(mc,mdl) Mdl = msVAR with properties: NumStates: NumSeries: StateNames: SeriesNames: Switch: Submodels:
2 1 ["Expansion" "1" [1x1 dtmc] [2x1 varm]
"Recession"]
Mdl is a partially specified msVAR object representing a univariate two-state Markov-switching dynamic regression model. 12-1890
msVAR
Mdl.Submodels(1) ans = varm with properties: Description: SeriesNames: NumSeries: P: Constant: AR: Trend: Beta: Covariance:
"1-Dimensional VAR(1) Model" "Y1" 1 1 NaN {NaN} at lag [1] 0 [1×0 matrix] NaN
Mdl.Submodels(2) ans = varm with properties: Description: SeriesNames: NumSeries: P: Constant: AR: Trend: Beta: Covariance:
"1-Dimensional VAR(2) Model" "Y1" 1 2 NaN {NaN NaN} at lags [1 2] 0 [1×0 matrix] NaN
msVAR converts the arima object submodels to 1-D varm object equivalents. Mdl is prepared for estimation. You can pass Mdl, along with data and a fully specified model containing initial values for optimization, to estimate.
Create Fully Specified Multivariate Model Create a three-state Markov-switching dynamic regression model for a 2-D response process. Specify all parameter values (this example uses arbitrary values). Create a three-state discrete-time Markov chain model that describes the regime switching mechanism. P = [10 1 1; 1 10 1; 1 1 10]; mc = dtmc(P); mc.P ans = 3×3 0.8333 0.0833 0.0833
0.0833 0.8333 0.0833
0.0833 0.0833 0.8333
mc is a dtmc object. dtmc normalizes P so that each row sums to 1. 12-1891
12
Functions
For each regime, use varm to create a VAR model that describes the response process within the regime. Specify all parameter values. % Constants (numSeries x 1 vectors) C1 = [1;-1]; C2 = [2;-2]; C3 = [3;-3]; % Autoregression coefficients (numSeries AR1 = {}; % 0 AR2 = {[0.5 0.1; 0.5 0.5]}; % 1 AR3 = {[0.25 0; 0 0] [0 0; 0.25 0]}; % 2
x numSeries matrices) lags lag lags
% Innovations covariances (numSeries x numSeries matrices) Sigma1 = [1 -0.1; -0.1 1]; Sigma2 = [2 -0.2; -0.2 2]; Sigma3 = [3 -0.3; -0.3 3]; % VAR Submodels mdl1 = varm('Constant',C1,'AR',AR1,'Covariance',Sigma1); mdl2 = varm('Constant',C2,'AR',AR2,'Covariance',Sigma2); mdl3 = varm('Constant',C3,'AR',AR3,'Covariance',Sigma3);
mdl1, mdl2, and mdl3 are fully specified varm objects. Store the submodels in a vector with order corresponding to the regimes in mc.StateNames. mdl = [mdl1; mdl2; mdl3];
Use msVAR to create a Markov-switching dynamic regression model from the switching mechanism mc and the state-specific submodels mdl. Mdl = msVAR(mc,mdl) Mdl = msVAR with properties: NumStates: NumSeries: StateNames: SeriesNames: Switch: Submodels:
3 2 ["1" "2" ["1" "2"] [1x1 dtmc] [3x1 varm]
"3"]
Mdl.Submodels(1) ans = varm with properties: Description: SeriesNames: NumSeries: P: Constant: AR: Trend: Beta: Covariance:
12-1892
"2-Dimensional VAR(0) Model" "Y1" "Y2" 2 0 [1 -1]' {} [2×1 vector of zeros] [2×0 matrix] [2×2 matrix]
msVAR
Mdl.Submodels(2) ans = varm with properties: Description: SeriesNames: NumSeries: P: Constant: AR: Trend: Beta: Covariance:
"AR-Stationary 2-Dimensional VAR(1) Model" "Y1" "Y2" 2 1 [2 -2]' {2×2 matrix} at lag [1] [2×1 vector of zeros] [2×0 matrix] [2×2 matrix]
Mdl.Submodels(3) ans = varm with properties: Description: SeriesNames: NumSeries: P: Constant: AR: Trend: Beta: Covariance:
"AR-Stationary 2-Dimensional VAR(2) Model" "Y1" "Y2" 2 2 [3 -3]' {2×2 matrices} at lags [1 2] [2×1 vector of zeros] [2×0 matrix] [2×2 matrix]
Mdl is a fully specified msVAR object representing a multivariate three-state Markov-switching dynamic regression model.
Create Fully Specified Model Containing Regression Component Consider including regression components for exogenous variables in each submodel of the Markovswitching dynamic regression model in “Create Fully Specified Multivariate Model” on page 12-1891. Create a three-state discrete-time Markov chain model that describes the regime switching mechanism. P = [10 1 1; 1 10 1; 1 1 10]; mc = dtmc(P);
For each regime, use varm to create a VARX model that describes the response process within the regime. Specify all parameter values. % Constants (numSeries x 1 vectors) C1 = [1;-1]; C2 = [2;-2]; C3 = [3;-3]; % Autoregression coefficients (numSeries x numSeries matrices) AR1 = {}; % 0 lags AR2 = {[0.5 0.1; 0.5 0.5]}; % 1 lag
12-1893
12
Functions
AR3 = {[0.25 0; 0 0] [0 0; 0.25 0]}; % 2 lags % Regression coefficients Beta1 = [1;-1]; Beta2 = [2 2;-2 -2]; Beta3 = [3 3 3;-3 -3 -3];
(numSeries x numRegressors matrices) % 1 regressor % 2 regressors % 3 regressors
% Innovations covariances (numSeries x numSeries matrices) Sigma1 = [1 -0.1; -0.1 1]; Sigma2 = [2 -0.2; -0.2 2]; Sigma3 = [3 -0.3; -0.3 3]; % VARX Submodels mdl1 = varm('Constant',C1,'AR',AR1,'Beta',Beta1,... 'Covariance',Sigma1); mdl2 = varm('Constant',C2,'AR',AR2,'Beta',Beta2,... 'Covariance',Sigma2); mdl3 = varm('Constant',C3,'AR',AR3,'Beta',Beta3,... 'Covariance',Sigma3);
mdl1, mdl2, and mdl3 are fully specified varm objects representing the state-specified submodels. Store the submodels in a vector with order corresponding to the regimes in mc.StateNames. mdl = [mdl1; mdl2; mdl3];
Use msVAR to create a Markov-switching dynamic regression model from the switching mechanism mc and the state-specific submodels mdl. Mdl = msVAR(mc,mdl) Mdl = msVAR with properties: NumStates: NumSeries: StateNames: SeriesNames: Switch: Submodels:
3 2 ["1" "2" ["1" "2"] [1x1 dtmc] [3x1 varm]
"3"]
Mdl.Submodels(1) ans = varm with properties: Description: SeriesNames: NumSeries: P: Constant: AR: Trend: Beta: Covariance: Mdl.Submodels(2)
12-1894
"2-Dimensional VARX(0) Model with 1 Predictor" "Y1" "Y2" 2 0 [1 -1]' {} [2×1 vector of zeros] [2×1 matrix] [2×2 matrix]
msVAR
ans = varm with properties: Description: SeriesNames: NumSeries: P: Constant: AR: Trend: Beta: Covariance:
"AR-Stationary 2-Dimensional VARX(1) Model with 2 Predictors" "Y1" "Y2" 2 1 [2 -2]' {2×2 matrix} at lag [1] [2×1 vector of zeros] [2×2 matrix] [2×2 matrix]
Mdl.Submodels(3) ans = varm with properties: Description: SeriesNames: NumSeries: P: Constant: AR: Trend: Beta: Covariance:
"AR-Stationary 2-Dimensional VARX(2) Model with 3 Predictors" "Y1" "Y2" 2 2 [3 -3]' {2×2 matrices} at lags [1 2] [2×1 vector of zeros] [2×3 matrix] [2×2 matrix]
Create Partially Specified Multivariate Model for Estimation Consider fitting to data a three-state Markov-switching model for a 2-D response process. Create a discrete-time Markov chain model for the switching mechanism. Specify a 3-by-3 matrix of NaN values for the transition matrix. This setting indicates that you want to estimate all transition probabilities. P = nan(3); mc = dtmc(P);
mc is a partially specified dtmc object. The transition matrix mc.P is completely unknown and estimable. Create 2-D VAR(0), VAR(1), and VAR(2) models by using the shorthand syntax of varm. Store the models in a vector. mdl1 = varm(2,0); mdl2 = varm(2,1); mdl3 = varm(2,2); mdl = [mdl1 mdl2 mdl3]; mdl(1) ans = varm with properties:
12-1895
12
Functions
Description: SeriesNames: NumSeries: P: Constant: AR: Trend: Beta: Covariance:
"2-Dimensional "Y1" "Y2" 2 0 [2×1 vector of {} [2×1 vector of [2×0 matrix] [2×2 matrix of
VAR(0) Model"
NaNs] zeros] NaNs]
mdl contains three state-specific varm model templates for estimation. NaN values in the properties indicate estimable parameters. Create a Markov-switching model template from the switching mechanism mc and the state-specific submodels mdl. Mdl = msVAR(mc,mdl) Mdl = msVAR with properties: NumStates: NumSeries: StateNames: SeriesNames: Switch: Submodels:
3 2 ["1" "2" ["1" "2"] [1x1 dtmc] [3x1 varm]
"3"]
Mdl.Submodels(1) ans = varm with properties: Description: SeriesNames: NumSeries: P: Constant: AR: Trend: Beta: Covariance:
"2-Dimensional "Y1" "Y2" 2 0 [2×1 vector of {} [2×1 vector of [2×0 matrix] [2×2 matrix of
VAR(0) Model"
NaNs] zeros] NaNs]
Mdl.Submodels(2) ans = varm with properties: Description: SeriesNames: NumSeries: P: Constant: AR: Trend: Beta: Covariance:
12-1896
"2-Dimensional "Y1" "Y2" 2 1 [2×1 vector of {2×2 matrix of [2×1 vector of [2×0 matrix] [2×2 matrix of
VAR(1) Model"
NaNs] NaNs} at lag [1] zeros] NaNs]
msVAR
Mdl.Submodels(3) ans = varm with properties: Description: SeriesNames: NumSeries: P: Constant: AR: Trend: Beta: Covariance:
"2-Dimensional VAR(2) Model" "Y1" "Y2" 2 2 [2×1 vector of NaNs] {2×2 matrices of NaNs} at lags [1 2] [2×1 vector of zeros] [2×0 matrix] [2×2 matrix of NaNs]
Mdl is a partially specified msVAR model for estimation.
Specify Model Regression Component for Estimation Consider including regression components for exogenous variables in the submodels of the Markovswitching dynamic regression model in “Create Partially Specified Multivariate Model for Estimation” on page 12-1895. Assume that the VAR(0) model includes the regressor x1t, the VAR(1) model includes the regressors x1t and x2t, and the VAR(2) model includes the regressors x1t, x2t, and x3t. Create the discrete-time Markov chain. P = nan(3); mc = dtmc(P);
Create 2-D VARX(0), VARX(1), and VARX(2) models by using the shorthand syntax of varm. For each model, set the Beta property to a numSeries-by-numRegressors matrix of NaN values by using dot notation. Store all models in a vector. numSeries = 2; mdl1 = varm(numSeries,0); mdl1.Beta = NaN(numSeries,1); mdl2 = varm(numSeries,1); mdl2.Beta = NaN(numSeries,2); mdl3 = varm(numSeries,2); mdl3.Beta = nan(numSeries,3); mdl = [mdl1; mdl2; mdl3];
Create a Markov-switching dynamic regression model from the switching mechanism mc and the state-specific submodels mdl. Mdl = msVAR(mc,mdl); Mdl.Submodels(2) ans = varm with properties:
12-1897
12
Functions
Description: SeriesNames: NumSeries: P: Constant: AR: Trend: Beta: Covariance:
"2-Dimensional "Y1" "Y2" 2 1 [2×1 vector of {2×2 matrix of [2×1 vector of [2×2 matrix of [2×2 matrix of
VARX(1) Model with 2 Predictors"
NaNs] NaNs} at lag [1] zeros] NaNs] NaNs]
Create Model Specifying Equality Constraints for Estimation Consider the model in “Create Partially Specified Multivariate Model for Estimation” on page 121895. Suppose theory dictates that states do not persist. Create a discrete-time Markov chain model for the switching mechanism. Specify a 3-by-3 matrix of NaN values for the transition matrix. Indicate that states do not persist by setting the diagonal elements of the matrix to 0. P = nan(3); P(logical(eye(3))) = 0; mc = dtmc(P);
mc is a partially specified dtmc object. Create the submodels and store them in a vector. mdl1 = mdl2 = mdl3 = submdl
varm(2,0); varm(2,1); varm(2,2); = [mdl1; mdl2; mdl3];
Create a Markov-switching dynamic regression model from the switching mechanism mc and the state-specific submodels mdl. Mdl = msVAR(mc,submdl); Mdl.Switch.P ans = 3×3 0 NaN NaN
NaN 0 NaN
NaN NaN 0
estimate treats the known diagonal elements of the transition matrix as equality constraints during estimation. For more details, see estimate.
More About Markov-Switching Dynamic Regression Model A Markov-switching dynamic regression model of a univariate or multivariate response series yt describes the dynamic behavior of the series in the presence of structural breaks or regime changes. 12-1898
msVAR
A collection of state-specific dynamic regression submodels describes the dynamic behavior of yt within the regimes. f 1 yt; xt, θ1 , st = 1 yt
f 2 yt; xt, θ2 , st = 2
, ⋮ ⋮ f n yt; xt, θn , st = n
where: • st is a discrete-time Markov chain representing the switching mechanism among regimes (Switch). • n is the number of regimes (NumStates). •
f i yt; xt, θi is the regime i dynamic regression model of yt (Submodels(i)). Submodels are either univariate (ARX) or multivariate (VARX).
• xt is a vector of observed exogenous variables at time t. • θi is the regime i collection of parameters of the dynamic regression model, such as AR coefficients and the innovation variances. Hamilton [2] proposes a general model, known as Markov-switching autoregression (MSAR), allowing for lagged values of the switching state s. Hamilton [3] shows how to convert an MSAR model into a dynamic regression model with a higher-dimensional state space, supported by msVAR.
Version History Introduced in R2019b
References [1] Chauvet, M., and J. D. Hamilton. "Dating Business Cycle Turning Points." In Nonlinear Analysis of Business Cycles (Contributions to Economic Analysis, Volume 276). (C. Milas, P. Rothman, and D. van Dijk, eds.). Amsterdam: Emerald Group Publishing Limited, 2006. [2] Hamilton, J. D. "A New Approach to the Economic Analysis of Nonstationary Time Series and the Business Cycle." Econometrica. Vol. 57, 1989, pp. 357–384. [3] Hamilton, J. D. "Analysis of Time Series Subject to Changes in Regime." Journal of Econometrics. Vol. 45, 1990, pp. 39–70. [4] Hamilton, James D. Time Series Analysis. Princeton, NJ: Princeton University Press, 1994. [5] Krolzig, H.-M. Markov-Switching Vector Autoregressions. Berlin: Springer, 1997.
See Also dtmc | arima | varm | tsVAR
12-1899
12
Functions
mtimes Lag operator polynomial multiplication
Syntax C = mtimes(A, B, 'Tolerance',tolerance) C = A * B
Description Given two lag operator polynomials A(L) and B(L),C = mtimes(A, B, 'Tolerance',tolerance) performs a polynomial multiplication C(L) = A(L) * B(L). If at least one of A or B is a lag operator polynomial object, the other can be a cell array of matrices (initial lag operator coefficients), or a single matrix (zero-degree lag operator). 'Tolerance' is the nonnegative scalar tolerance used to determine which coefficients are included in the result. The default tolerance is 1e-12. Specifying a tolerance greater than 0 allows the user to exclude polynomial lags with near-zero coefficients. A coefficient matrix of a given lag is excluded only if the magnitudes of all elements of the matrix are less than or equal to the specified tolerance. C = A * B performs a polynomial multiplication C(L) = A(L) * B(L).
Examples Multiply Two Lag Operator Polynomials Create two LagOp polynomials and multiply them together: A = LagOp({1 -0.6 0.08}); B = LagOp({1 -0.5}); mtimes(A,B) ans = 1-D Lag Operator Polynomial: ----------------------------Coefficients: [1 -1.1 0.38 -0.04] Lags: [0 1 2 3] Degree: 3 Dimension: 1
Tips The multiplication operator (*) invokes mtimes, but the optional coefficient tolerance is available only by calling mtimes directly.
See Also mldivide | mrdivide 12-1900
normalbvarm
normalbvarm Bayesian vector autoregression (VAR) model with normal conjugate prior and fixed covariance for data likelihood
Description The Bayesian VAR model on page 12-1921 object normalbvarm specifies the prior distribution of the array of model coefficients Λ in an m-D VAR(p) model, where the innovations covariance matrix Σ is known and fixed. The prior distribution of Λ is the normal conjugate prior model on page 12-1922. In general, when you create a Bayesian VAR model object, it specifies the joint prior distribution and characteristics of the VARX model only. That is, the model object is a template intended for further use. Specifically, to incorporate data into the model for posterior distribution analysis, pass the model object and data to the appropriate object function on page 12-1906.
Creation Syntax PriorMdl = normalbvarm(numseries,numlags) PriorMdl = normalbvarm(numseries,numlags,Name,Value) Description To create a normalbvarm object, use either the normalbvarm function (described here) or the bayesvarm function. The syntaxes for each function are similar, but the options differ. bayesvarm enables you to set prior hyperparameter values for Minnesota prior[1] regularization easily, whereas normalbvarm requires the entire specification of prior distribution hyperparameters. PriorMdl = normalbvarm(numseries,numlags) creates a numseries-D Bayesian VAR(numlags) model object PriorMdl, which specifies dimensionalities and prior assumptions for all model coefficients λ = vec Λ = vec Φ1 Φ2 ⋯ Φp c δ Β ′ , where: • numseries = m, the number of response time series variables. • numlags = p, the AR polynomial order. • The prior distribution of λ is the normal conjugate prior model on page 12-1922. • The fixed innovations covariance Σ is the m-by-m identity matrix. PriorMdl = normalbvarm(numseries,numlags,Name,Value) sets writable properties on page 12-1902 (except NumSeries and P) using name-value pair arguments. Enclose each property name in quotes. For example, normalbvarm(3,2,'Sigma',4*eye(3),'SeriesNames', ["UnemploymentRate" "CPI" "FEDFUNDS"]) specifies the names of the three response variables in the Bayesian VAR(2) model, and fixes the innovations covariance matrix at 4*eye(3).
12-1901
12
Functions
Input Arguments numseries — Number of time series m 1 (default) | positive integer Number of time series m, specified as a positive integer. numseries specifies the dimensionality of the multivariate response variable yt and innovation εt. numseries sets the NumSeries property. Data Types: double numlags — Number of lagged responses nonnegative integer Number of lagged responses in each equation of yt, specified as a nonnegative integer. The resulting model is a VAR(numlags) model; each lag has a numseries-by-numseries coefficient matrix. numlags sets the P property. Data Types: double
Properties You can set writable property values when you create the model object by using name-value argument syntax, or after you create the model object by using dot notation. For example, to create a 3-D Bayesian VAR(1) model and label the first through third response variables, and then include a linear time trend term, enter: PriorMdl = normalbvarm(3,1,'SeriesNames',["UnemploymentRate" "CPI" "FEDFUNDS"]); PriorMdl.IncludeTrend = true; Model Characteristics and Dimensionality
Description — Model description string scalar | character vector Model description, specified as a string scalar or character vector. The default value describes the model dimensionality, for example '2-Dimensional VAR(3) Model'. Example: "Model 1" Data Types: string | char NumSeries — Number of time series m positive integer This property is read-only. Number of time series m, specified as a positive integer. NumSeries specifies the dimensionality of the multivariate response variable yt and innovation εt. Data Types: double P — Multivariate autoregressive polynomial order nonnegative integer This property is read-only. 12-1902
normalbvarm
Multivariate autoregressive polynomial order, specified as a nonnegative integer. P is the maximum lag that has a nonzero coefficient matrix. P specifies the number of presample observations required to initialize the model. Data Types: double SeriesNames — Response series names string vector | cell array of character vectors Response series names, specified as a NumSeries length string vector. The default is ['Y1' 'Y2' ... 'YNumSeries']. normalbvarm stores SeriesNames as a string vector. Example: ["UnemploymentRate" "CPI" "FEDFUNDS"] Data Types: string IncludeConstant — Flag for including model constant c true (default) | false Flag for including a model constant c, specified as a value in this table. Value
Description
false
Response equations do not include a model constant.
true
All response equations contain a model constant.
Data Types: logical IncludeTrend — Flag for including linear time trend term δt false (default) | true Flag for including a linear time trend term δt, specified as a value in this table. Value
Description
false
Response equations do not include a linear time trend term.
true
All response equations contain a linear time trend term.
Data Types: logical NumPredictors — Number of exogenous predictor variables in model regression component 0 (default) | nonnegative integer Number of exogenous predictor variables in the model regression component, specified as a nonnegative integer. normalbvarm includes all predictor variables symmetrically in each response equation. Distribution Hyperparameters
Mu — Mean of multivariate normal prior on λ zeros(NumSeries*(NumSeries*P + IncludeIntercept + IncludeTrend + NumPredictors),1) (default) | numeric vector 12-1903
12
Functions
Mean of the multivariate normal prior on λ, specified as a NumSeries*k-by-1 numeric vector, where k = NumSeries*P + IncludeIntercept + IncludeTrend + NumPredictors (the number of coefficients in a response equation). Mu(1:k) corresponds to all coefficients in the equation of response variable SeriesNames(1), Mu((k + 1):(2*k)) corresponds to all coefficients in the equation of response variable SeriesNames(2), and so on. For a set of indices corresponding to an equation: • Elements 1 through NumSeries correspond to the lag 1 AR coefficients of the response variables ordered by SeriesNames. • Elements NumSeries + 1 through 2*NumSeries correspond to the lag 2 AR coefficients of the response variables ordered by SeriesNames. • In general, elements (q – 1)*NumSeries + 1 through q*NumSeries corresponds to the lag q AR coefficients of the response variables ordered by SeriesNames. • If IncludeConstant is true, element NumSeries*P + 1 is the model constant. • If IncludeTrend is true, element NumSeries*P + 2 is the linear time trend coefficient. • If NumPredictors > 0, elements NumSeries*P + 3 through k constitute the vector of regression coefficients of the exogenous variables. This figure shows the structure of the transpose of Mu for a 2-D VAR(3) model that contains a constant vector and four exogenous predictors: y1, t y2, t ⨉ ⨉ ϕ ϕ ϕ ϕ ϕ ϕ c β β β β ϕ ϕ ϕ ϕ ϕ [ 1, 11 1, 12 2, 11 2, 12 3, 11 3, 12 1 11 12 13 14 1, 21 1, 22 2, 21 2, 22 3, 21 ϕ3, 22 c2 β21 β22 β23 β24
], where • ϕq,jk is element (j,k) of the lag q AR coefficient matrix. • cj is the model constant in the equation of response variable j. • Bju is the regression coefficient of the exogenous variable u in the equation of response variable j. Tip bayesvarm enables you to specify Mu easily by using the Minnesota regularization method. To specify Mu directly: 1
Set separate variables for the prior mean of each coefficient matrix and vector.
2
Horizontally concatenate all coefficient means in this order: Coef f = Φ1 Φ2 ⋯ Φp c δ Β .
3
Vectorize the transpose of the coefficient mean matrix. Mu = Coeff.'; Mu = Mu(:);
Data Types: double V — Conditional covariance matrix of multivariate normal prior on λ eye(NumSeries*(NumSeries*P + IncludeIntercept + IncludeTrend + NumPredictors)) (default) | symmetric, positive definite numeric matrix 12-1904
normalbvarm
Conditional covariance matrix of multivariate normal prior on λ, specified as a NumSeries*k-byNumSeries*k symmetric, positive definite matrix, where k = NumSeries*P + IncludeIntercept + IncludeTrend + NumPredictors (the number of coefficients in a response equation). Row and column indices correspond to the model coefficients in the same way as Mu. For example, consider a 3-D VAR(2) model containing a constant and four exogenous variables. • V(1,1) is Var(ϕ1,11). • V(5,6) is Cov(ϕ2,12,ϕ2,13). • V(8,9) is Cov(β11,β12). Tip bayesvarm enables you to create any Bayesian VAR prior model and specify V easily by using the Minnesota regularization method. Data Types: double Sigma — Fixed innovations covariance matrix Σ eye(NumSeries) (default) | positive definite numeric matrix Fixed innovations covariance matrix Σ of the NumSeries innovations at each time t = 1,...,T, specified as a NumSeries-by-NumSeries positive definite numeric matrix. Rows and columns correspond to innovations in the equations of the response variables ordered by SeriesNames. Sigma sets the Covariance property. Data Types: double VAR Model Parameters Derived from Distribution Hyperparameters
AR — Distribution mean of autoregressive coefficient matrices Φ1,…,Φp cell vector of numeric matrices This property is read-only. Distribution mean of the autoregressive coefficient matrices Φ1,…,Φp associated with the lagged responses, specified as a P-D cell vector of NumSeries-by-NumSeries numeric matrices. AR{j} is Φj, the coefficient matrix of lag j. Rows correspond to equations and columns correspond to lagged response variables; SeriesNames determines the order of response variables and equations. Coefficient signs are those of the VAR model expressed in difference-equation notation. If P = 0, AR is an empty cell. Otherwise, AR is the collection of AR coefficient means extracted from Mu. Data Types: cell Constant — Distribution mean of model constant c numeric vector This property is read-only. Distribution mean of the model constant c (or intercept), specified as a NumSeries-by-1 numeric vector. Constant(j) is the constant in equation j; SeriesNames determines the order of equations. 12-1905
12
Functions
If IncludeConstant = false, Constant is an empty array. Otherwise, Constant is the model constant vector mean extracted from Mu. Data Types: double Trend — Distribution mean of linear time trend δ numeric vector This property is read-only. Distribution mean of the linear time trend δ, specified as a NumSeries-by-1 numeric vector. Trend(j) is the linear time trend in equation j; SeriesNames determines the order of equations. If IncludeTrend = false (the default), Trend is an empty array. Otherwise, Trend is the linear time trend coefficient mean extracted from Mu. Data Types: double Beta — Distribution mean of regression coefficient matrix Β numeric matrix This property is read-only. Distribution mean of the regression coefficient matrix B associated with the exogenous predictor variables, specified as a NumSeries-by-NumPredictors numeric matrix. Beta(j,:) contains the regression coefficients of each predictor in the equation of response variable j yj,t. Beta(:,k) contains the regression coefficient in each equation of predictor xk. By default, all predictor variables are in the regression component of all response equations. You can down-weight a predictor from an equation by specifying, for the corresponding coefficient, a prior mean of 0 in Mu and a small variance in V. When you create a model, the predictor variables are hypothetical. You specify predictor data when you operate on the model (for example, when you estimate the posterior by using estimate). Columns of the predictor data determine the order of the columns of Beta. Data Types: double Covariance — Fixed innovations covariance matrix Σ positive definite numeric matrix This property is read-only. Fixed innovations covariance matrix Σ of the NumSeries innovations at each time t = 1,...,T, specified as a NumSeries-by-NumSeries symmetric, positive definite numeric matrix. Rows and columns correspond to innovations in the equations of the response variables ordered by SeriesNames. The Sigma property sets Covariance. Data Types: double
Object Functions estimate
12-1906
Estimate posterior distribution of Bayesian vector autoregression (VAR) model parameters
normalbvarm
forecast simsmooth simulate summarize
Forecast responses from Bayesian vector autoregression (VAR) model Simulation smoother of Bayesian vector autoregression (VAR) model Simulate coefficients and innovations covariance matrix of Bayesian vector autoregression (VAR) model Distribution summary statistics of Bayesian vector autoregression (VAR) model
Examples Create Normal Conjugate Prior Model Consider the 3-D VAR(4) model for the US inflation (INFL), unemployment (UNRATE), and federal funds (FEDFUNDS) rates. INFLt UNRATEt
4
=c+
FEDFUNDSt
∑
j=1
INFLt −
ε1, t
j
Φ j UNRATEt −
+ ε2, t .
j
FEDFUNDSt −
j
ε3, t
For all t, εt is a series of independent 3-D normal innovations with a mean of 0 and fixed covariance Σ = I, the 3-D identity matrix. Assume that the prior distribution vec Φ1, . . . , Φ4, c ′ ∼ Ν39 μ, V , where μ is a 39-by-1 vector of means and V is the 39-by-39 covariance matrix. Create a normal conjugate prior model for the 3-D VAR(4) model parameters. numseries = 3; numlags = 4; PriorMdl = normalbvarm(numseries,numlags) PriorMdl = normalbvarm with properties: Description: NumSeries: P: SeriesNames: IncludeConstant: IncludeTrend: NumPredictors: Mu: V: Sigma: AR: Constant: Trend: Beta: Covariance:
"3-Dimensional VAR(4) Model" 3 4 ["Y1" "Y2" "Y3"] 1 0 0 [39x1 double] [39x39 double] [3x3 double] {[3x3 double] [3x3 double] [3x3 double] [3x1 double] [3x0 double] [3x0 double] [3x3 double]
[3x3 double]}
PriorMdl is a normalbvarm Bayesian VAR model object representing the prior distribution of the coefficients of the 3-D VAR(4) model. The command line display shows properties of the model. You can display properties by using dot notation. Display the prior mean matrices of the four AR coefficients by setting each matrix in the cell to a variable. 12-1907
12
Functions
AR1 = PriorMdl.AR{1} AR1 = 3×3 0 0 0
0 0 0
0 0 0
AR2 = PriorMdl.AR{2} AR2 = 3×3 0 0 0
0 0 0
0 0 0
AR3 = PriorMdl.AR{3} AR3 = 3×3 0 0 0
0 0 0
0 0 0
AR4 = PriorMdl.AR{4} AR4 = 3×3 0 0 0
0 0 0
0 0 0
normalbvarm centers all AR coefficients at 0 by default. The AR property is read only, but it is derived from the writeable property Mu. Display the fixed innovations covariance Σ. PriorMdl.Covariance ans = 3×3 1 0 0
0 1 0
0 0 1
Covariance is a read-only property. To set the value Σ , use the 'Sigma' name-value pair argument or specify the Sigma property by using dot notation. For example: PriorMdl.Sigma = 4*eye(PriorMdl.NumSeries);
12-1908
normalbvarm
Specify Innovations Covariance Matrix Consider the 3-D VAR(4) model of “Create Normal Conjugate Prior Model” on page 12-1907. Suppose econometric theory dictates that −5
10 Σ= 0
−4
10
−4
0 10 0 . 1 −0 . 2 . −0 . 2 1 . 6
Create a normal conjugate prior model for the VAR model coefficients. Specify the value of Σ. numseries = 3; numlags = 4; Sigma = [10e-5 0 10e-4; 0 0.1 -0.2; 10e-4 -0.2 1.6]; PriorMdl = normalbvarm(numseries,numlags,'Sigma',Sigma) PriorMdl = normalbvarm with properties: Description: NumSeries: P: SeriesNames: IncludeConstant: IncludeTrend: NumPredictors: Mu: V: Sigma: AR: Constant: Trend: Beta: Covariance:
"3-Dimensional VAR(4) Model" 3 4 ["Y1" "Y2" "Y3"] 1 0 0 [39x1 double] [39x39 double] [3x3 double] {[3x3 double] [3x3 double] [3x3 double] [3x1 double] [3x0 double] [3x0 double] [3x3 double]
[3x3 double]}
Because Σ is fixed for normalbvarm prior models, PriorMdl.Sigma and PriorMdl.Covariance are equal. PriorMdl.Sigma ans = 3×3 0.0001 0 0.0010
0 0.1000 -0.2000
0.0010 -0.2000 1.6000
PriorMdl.Covariance ans = 3×3 0.0001 0 0.0010
0 0.1000 -0.2000
0.0010 -0.2000 1.6000
12-1909
12
Functions
Create Bayesian AR(2) Model with Normal Conjugate Coefficient Prior Consider a 1-D Bayesian AR(2) model for the daily NASDAQ returns from January 2, 1990 through December 31, 2001. yt = c + ϕ1 yt − 1 + ϕ2 yt − 1 + εt . The coefficient prior distribution is [ϕ1 ϕ2 c]′ | σ2 ∼ N3 μ, V , where μ is a 3-by-1 vector of coefficient means and V is a 3-by-3 covariance matrix. Assume Var(εt) is 2. Create a normal conjugate prior model for the AR(2) model parameters. numseries = 1; numlags = 2; PriorMdl = normalbvarm(numseries,numlags,'Sigma',2) PriorMdl = normalbvarm with properties: Description: NumSeries: P: SeriesNames: IncludeConstant: IncludeTrend: NumPredictors: Mu: V: Sigma: AR: Constant: Trend: Beta: Covariance:
"1-Dimensional VAR(2) Model" 1 2 "Y1" 1 0 0 [3x1 double] [3x3 double] 2 {[0] [0]} 0 [1x0 double] [1x0 double] 2
Specify High Tightness for Lags and Response Names In the 3-D VAR(4) model of “Create Normal Conjugate Prior Model” on page 12-1907, consider excluding lags 2 and 3 from the model. You cannot exclude coefficient matrices from models, but you can specify high prior tightness on zero for coefficients that you want to exclude. Create a normal conjugate prior model for the 3-D VAR(4) model parameters. Specify response variable names. By default, AR coefficient prior means are zero. Specify high tightness values for lags 2 and 3 by setting their prior variances to 1e-6. Leave all other coefficient tightness values at their defaults: 12-1910
normalbvarm
• 1 for AR coefficient variances • 1e3 for constant vector variances • 0 for all coefficient covariances numseries = 3; numlags = 4; seriesnames = ["INFL"; "UNRATE"; "FEDFUNDS"]; vPhi1 = ones(numseries,numseries); vPhi2 = 1e-6*ones(numseries,numseries); vPhi3 = 1e-6*ones(numseries,numseries); vPhi4 = ones(numseries,numseries); vc = 1e3*ones(3,1); Vmat = [vPhi1 vPhi2 vPhi3 vPhi4 vc]'; V = diag(Vmat(:)); PriorMdl = normalbvarm(numseries,numlags,'SeriesNames',seriesnames,... 'V',V) PriorMdl = normalbvarm with properties: Description: NumSeries: P: SeriesNames: IncludeConstant: IncludeTrend: NumPredictors: Mu: V: Sigma: AR: Constant: Trend: Beta: Covariance:
"3-Dimensional VAR(4) Model" 3 4 ["INFL" "UNRATE" "FEDFUNDS"] 1 0 0 [39x1 double] [39x39 double] [3x3 double] {[3x3 double] [3x3 double] [3x3 double] [3x1 double] [3x0 double] [3x0 double] [3x3 double]
[3x3 double]}
Set Prior Hyperparameters for Minnesota Regularization normalbvarm options enable you to specify coefficient prior hyperparameter values directly, but bayesvarm options are well suited for tuning hyperparameters following the Minnesota regularization method. Consider the 3-D VAR(4) model of “Create Normal Conjugate Prior Model” on page 12-1907. The model contains 39 coefficients. For coefficient sparsity, create a normal conjugate Bayesian VAR model by using bayesvarm. Specify the following, a priori: • Each response is an AR(1) model, on average, with lag 1 coefficient 0.75. • Prior self-lag coefficients have variance 100. This large variance setting allows the data to influence the posterior more than the prior. • Prior cross-lag coefficients have variance 1. This small variance setting tightens the cross-lag coefficients to zero during estimation. 12-1911
12
Functions
• Prior coefficient covariances decay with increasing lag at a rate of 2 (that is, lower lags are more important than higher lags). • The innovations covariance Σ = I. numseries = 3; numlags = 4; seriesnames = ["INFL"; "UNRATE"; "FEDFUNDS"]; Sigma = eye(numseries); PriorMdl = bayesvarm(numseries,numlags,'ModelType','normal','Sigma',Sigma,... 'Center',0.75,'SelfLag',100,'CrossLag',1,'Decay',2,'SeriesNames',seriesnames) PriorMdl = normalbvarm with properties: Description: NumSeries: P: SeriesNames: IncludeConstant: IncludeTrend: NumPredictors: Mu: V: Sigma: AR: Constant: Trend: Beta: Covariance:
"3-Dimensional VAR(4) Model" 3 4 ["INFL" "UNRATE" "FEDFUNDS"] 1 0 0 [39x1 double] [39x39 double] [3x3 double] {[3x3 double] [3x3 double] [3x3 double] [3x1 double] [3x0 double] [3x0 double] [3x3 double]
Display all prior coefficient means. Phi1 = PriorMdl.AR{1} Phi1 = 3×3 0.7500 0 0
0 0.7500 0
Phi2 = PriorMdl.AR{2} Phi2 = 3×3 0 0 0
0 0 0
0 0 0
Phi3 = PriorMdl.AR{3} Phi3 = 3×3 0 0 0
12-1912
0 0 0
0 0 0
0 0 0.7500
[3x3 double]}
normalbvarm
Phi4 = PriorMdl.AR{4} Phi4 = 3×3 0 0 0
0 0 0
0 0 0
Display a heatmap of the prior coefficient covariances for each response equation.
numexocoeffseqn = PriorMdl.IncludeConstant + ... PriorMdl.IncludeTrend + PriorMdl.NumPredictors; % Number of exogenous coefficient numcoeffseqn = PriorMdl.NumSeries*PriorMdl.P + numexocoeffseqn; % Total number of coefficients pe arcoeffnames = strings(numseries,numlags,numseries); for j = 1:numseries % Equations for r = 1:numlags for k = 1:numseries % Response Variables arcoeffnames(k,r,j) = "\phi_{"+r+","+j+k+"}"; end end arcoeffseqn = arcoeffnames(:,:,j); idx = ((j-1)*numcoeffseqn + 1):(numcoeffseqn*j) - numexocoeffseqn; Veqn = PriorMdl.V(idx,idx); figure heatmap(arcoeffseqn(:),arcoeffseqn(:),Veqn); title(sprintf('Equation of %s',seriesnames(j))) end
12-1913
12
Functions
12-1914
normalbvarm
Work with Prior and Posterior Distributions Consider the 3-D VAR(4) model of “Create Normal Conjugate Prior Model” on page 12-1907. Estimate the posterior distribution, and generate forecasts from the corresponding posterior predictive distribution. Load and Preprocess Data Load the US macroeconomic data set. Compute the inflation rate. Plot all response series. load Data_USEconModel seriesnames = ["INFL" "UNRATE" "FEDFUNDS"]; DataTimeTable.INFL = 100*[NaN; price2ret(DataTimeTable.CPIAUCSL)]; figure plot(DataTimeTable.Time,DataTimeTable{:,seriesnames}) legend(seriesnames)
12-1915
12
Functions
Stabilize the unemployment and federal funds rates by applying the first difference to each series. DataTimeTable.DUNRATE = [NaN; diff(DataTimeTable.UNRATE)]; DataTimeTable.DFEDFUNDS = [NaN; diff(DataTimeTable.FEDFUNDS)]; seriesnames(2:3) = "D" + seriesnames(2:3);
Remove all missing values from the data. rmDataTimeTable = rmmissing(DataTimeTable);
Create Prior Model Create a normal conjugate Bayesian VAR(4) prior model for the three response series. Specify the response variable names. Assume that the innovations covariance is the identity matrix. numseries = numel(seriesnames); numlags = 4; PriorMdl = normalbvarm(numseries,numlags,'SeriesNames',seriesnames);
Estimate Posterior Distribution Estimate the posterior distribution by passing the prior model and entire data series to estimate. PosteriorMdl = estimate(PriorMdl,rmDataTimeTable{:,seriesnames},'Display','equation'); Bayesian VAR under normal priors and fixed Sigma Effective Sample Size: 197
12-1916
normalbvarm
Number of equations: 3 Number of estimated Parameters: 39
VAR Equations | INFL(-1) DUNRATE(-1) DFEDFUNDS(-1) INFL(-2) DUNRATE(-2) DFEDFUNDS(-2) INFL(-3) ------------------------------------------------------------------------------------------------INFL | 0.1260 -0.4400 0.1049 0.3176 -0.0545 0.0440 0.4173 | (0.1367) (0.2673) (0.0700) (0.1551) (0.2854) (0.0739) (0.1536) DUNRATE | -0.0236 0.4440 0.0350 0.0900 0.2295 0.0520 -0.0330 | (0.1367) (0.2673) (0.0700) (0.1551) (0.2854) (0.0739) (0.1536) DFEDFUNDS | -0.1514 -1.3408 -0.2762 0.3275 -0.2971 -0.3041 0.2609 | (0.1367) (0.2673) (0.0700) (0.1551) (0.2854) (0.0739) (0.1536) Innovations Covariance Matrix | INFL DUNRATE DFEDFUNDS -------------------------------------INFL | 1 0 0 | (0) (0) (0) DUNRATE | 0 1 0 | (0) (0) (0) DFEDFUNDS | 0 0 1 | (0) (0) (0)
Because the prior is conjugate for the data likelihood, the posterior is analytically tractable. By default, estimate uses the first four observations as a presample to initialize the model. Generate Forecasts from Posterior Predictive Distribution From the posterior predictive distribution, generate forecasts over a two-year horizon. Because sampling from the posterior predictive distribution requires the entire data set, specify the prior model in forecast instead of the posterior. fh = 8; rng(1); % For reproducibility FY = forecast(PriorMdl,fh,rmDataTimeTable{:,seriesnames});
FY is an 8-by-3 matrix of forecasts. Plot the end of the data set and the forecasts. fp = rmDataTimeTable.Time(end) + calquarters(1:fh); figure plotdata = [rmDataTimeTable{end - 10:end,seriesnames}; FY]; plot([rmDataTimeTable.Time(end - 10:end); fp'],plotdata) hold on plot([fp(1) fp(1)],ylim,'k-.') legend(seriesnames) title('Data and Forecasts') hold off
12-1917
12
Functions
Compute Impulse Responses Plot impulse response functions by passing posterior estimations to armairf. armairf(PosteriorMdl.AR,[],'InnovCov',PosteriorMdl.Covariance)
12-1918
normalbvarm
12-1919
12
Functions
12-1920
normalbvarm
More About Bayesian Vector Autoregression (VAR) Model A Bayesian VAR model treats all coefficients and the innovations covariance matrix as random variables in the m-dimensional, stationary VARX(p) model. The model has one of the three forms described in this table. Model
Equation
Reduced-form VAR(p) in difference-equation notation
yt = Φ1 yt − 1 + ... + Φp yt − p + c + δt + Βxt + εt .
Multivariate regression
yt = Zt λ + εt .
Matrix regression
yt = Λ′zt′ + εt .
For each time t = 1,...,T: • yt is the m-dimensional observed response vector, where m = numseries. • Φ1,…,Φp are the m-by-m AR coefficient matrices of lags 1 through p, where p = numlags. • c is the m-by-1 vector of model constants if IncludeConstant is true. • δ is the m-by-1 vector of linear time trend coefficients if IncludeTrend is true. • Β is the m-by-r matrix of regression coefficients of the r-by-1 vector of observed exogenous predictors xt, where r = NumPredictors. All predictor variables appear in each equation. 12-1921
12
Functions
• zt = yt′ − 1 yt′ − 2 ⋯ yt′ − p 1 t xt′ , which is a 1-by-(mp + r + 2) vector, and Zt is the m-by-m(mp + r + 2) block diagonal matrix zt 0z ⋯ 0z 0z zt ⋯ 0z ⋮ ⋮ ⋱ ⋮ 0z 0z 0z zt
,
where 0z is a 1-by-(mp + r + 2) vector of zeros. •
Λ = Φ1 Φ2 ⋯ Φp c δ Β ′, which is an (mp + r + 2)-by-m random matrix of the coefficients, and the m(mp + r + 2)-by-1 vector λ = vec(Λ).
• εt is an m-by-1 vector of random, serially uncorrelated, multivariate normal innovations with the zero vector for the mean and the m-by-m matrix Σ for the covariance. This assumption implies that the data likelihood is ℓ Λ, Σ y, x =
T
∏
t=1
f yt; Λ, Σ, zt ,
where f is the m-dimensional multivariate normal density with mean ztΛ and covariance Σ, evaluated at yt. Before considering the data, you impose a joint prior distribution assumption on (Λ,Σ), which is governed by the distribution π(Λ,Σ). In a Bayesian analysis, the distribution of the parameters is updated with information about the parameters obtained from the data likelihood. The result is the joint posterior distribution π(Λ,Σ|Y,X,Y0), where: • Y is a T-by-m matrix containing the entire response series {yt}, t = 1,…,T. • X is a T-by-m matrix containing the entire exogenous series {xt}, t = 1,…,T. • Y0 is a p-by-m matrix of presample data used to initialize the VAR model for estimation. Normal Conjugate Prior Model The normal conjugate prior model, outlined in [1], is an m-D Bayesian VAR model on page 12-1921 in which the innovations covariance matrix Σ is known and fixed, while the coefficient vector λ = vec(Λ) has the prior distribution λ Nm(mp + r + 1c + 1δ) μ, V , where: • r = NumPredictors. • 1c is 1 if IncludeConstant is true, and 0 otherwise. • 1δ is 1 if IncludeTrend is true, and 0 otherwise. The posterior distribution is proper and analytically tractable. λ yt, xt Nm(mp) + r + 1c + 1δ μ, V , where: 12-1922
normalbvarm
•
μ = V −1 +
T
∑
t=1
•
T
V = V −1 +
∑
t=1
Zt′Σ−1Zt Zt′Σ−1Zt
−1
V −1μ +
T
∑
t=1
Zt′Σ−1 yt .
−1
.
Version History Introduced in R2020a
References [1] Litterman, Robert B. "Forecasting with Bayesian Vector Autoregressions: Five Years of Experience." Journal of Business and Economic Statistics 4, no. 1 (January 1986): 25–38. https://doi.org/10.2307/1391384.
See Also Functions bayesvarm Objects semiconjugatebvarm | diffusebvarm | conjugatebvarm
12-1923
12
Functions
parcorr Sample partial autocorrelation
Syntax [pacf,lags] = parcorr(y) PACFTbl = parcorr(Tbl) [ ___ ,bounds] = parcorr( ___ ) [ ___ ] = parcorr( ___ ,Name=Value) parcorr( ___ ) parcorr(ax, ___ ) [ ___ ,h] = parcorr( ___ )
Description [pacf,lags] = parcorr(y) returns the sample partial autocorrelation on page 12-1935 function (PACF) pacf and associated lags lags of the univariate time series y. PACFTbl = parcorr(Tbl) returns the table PACFTbl containing variables for the sample PACF and associated lags of the last variable in the input table or timetable Tbl. To select a different variable in Tbl, for which to compute the PACF, use the DataVariable name-value argument. [ ___ ,bounds] = parcorr( ___ ) uses any input-argument combination in the previous syntaxes, and returns the output-argument combination for the corresponding input arguments and the approximate upper and lower confidence bounds bounds on the PACF. [ ___ ] = parcorr( ___ ,Name=Value) uses additional options specified by one or more namevalue arguments. For example, parcorr(Tbl,DataVariable="RGDP",NumLags=10,NumSTD=1.96) returns 10 lags of the sample PACF of the table variable "RGDP" in Tbl and 95% confidence bounds. parcorr( ___ ) plots the sample PACF of the input series with confidence bounds. parcorr(ax, ___ ) plots on the axes specified by ax instead of the current axes (gca). ax can precede any of the input argument combinations in the previous syntaxes. [ ___ ,h] = parcorr( ___ ) plots the sample PACF of the input series and additionally returns handles to plotted graphics objects. Use elements of h to modify properties of the plot after you create it.
Examples Compute PACF from Vector of Time Series Data Compute the PACF of a univariate time series. Input the time series data as a numeric vector. Load the quarterly real GDP series in Data_GDP.mat. Plot the series, which is stored in the numeric vector Data. 12-1924
parcorr
load Data_GDP plot(Data)
The series exhibits exponential growth. Compute the returns of the series. ret = price2ret(Data);
ret is a series of real GDP returns; it has one less observation than the real GDP series. Compute the PACF of the real GDP returns, and return the associated lags. [pacf,lags] = parcorr(ret); [pacf lags] ans = 21×2 1.0000 0.3329 0.0828 -0.1205 -0.1080 -0.0869 0.0226 -0.0254 -0.0243 0.0699
0 1.0000 2.0000 3.0000 4.0000 5.0000 6.0000 7.0000 8.0000 9.0000
12-1925
12
Functions
⋮
Let yt be the real GDP return at time t. pacf(3) = 0.0828 means that the correlation between yt and yt − 2, after adjusting for the linear effects of yt − 1 on yt, is 0.0828.
Compute PACF of Table Variable Compute the PACF of a time series, which is one variable in a table. Load the electricity spot price data set Data_ElectricityPrices.mat, which contains the daily spot prices in the timetable DataTimeTable. load Data_ElectricityPrices.mat DataTimeTable.Properties.VariableNames ans = 1x1 cell array {'SpotPrice'}
Plot the series. plot(DataTimeTable.SpotPrice)
The time series plot does not clearly indicate an exponential trend or unit root. 12-1926
parcorr
Compute the PACF of the raw spot price series. PACFTbl = parcorr(DataTimeTable) PACFTbl=21×2 table Lags PACF ____ ________ 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 ⋮
1 0.5541 0.10938 0.099833 0.029511 0.038836 0.065892 0.029965 0.034951 0.050091 0.051031 0.033994 0.051877 0.028973 0.047456 0.11895
parcorr returns the results in the table PACFTbl, where variables correspond to the PACF (PACF) and associated lags (Lags). By default, parcorr computes the PACF of the last variable in the table. To select a variable from an input table, set the DataVariable option.
Return PACF Confidence Bounds Consider the electricity spot prices in “Compute PACF of Table Variable” on page 12-1926. Load the electricity spot price data set Data_ElectricityPrices.mat. Compute the PACF and return the PACF confidence bounds. load Data_ElectricityPrices [PACFTbl,bounds] = parcorr(DataTimeTable) PACFTbl=21×2 table Lags PACF ____ ________ 0 1 2 3 4 5 6 7
1 0.5541 0.10938 0.099833 0.029511 0.038836 0.065892 0.029965
12-1927
12
Functions
8 9 10 11 12 13 14 15 ⋮
0.034951 0.050091 0.051031 0.033994 0.051877 0.028973 0.047456 0.11895
bounds = 2×1 0.0532 -0.0532
Assuming the spot prices follow a Gaussian white noise series, an approximate 95.4% confidence interval on the PACF is (-0.0532, 0.0532).
Compare OLS and Yule-Walker PACFs Load the US quarterly macroeconomic series data Data_USEconModel.mat. Remove all missing values from the timetable of data DataTimeTable by using listwise deletion. load Data_USEconModel DataTimeTable = rmmissing(DataTimeTable);
Compute the PACF of the raw effective federal funds rate FEDFUNDS by using the OLS method (the default method when the data does not contain any missing values). Change the name of the PACF variable of the output table to ols. PACFTbl = parcorr(DataTimeTable,DataVariable="FEDFUNDS",Method="ols"); PACFTbl = renamevars(PACFTbl,"PACF","ols");
Compute the PACF of the raw effective federal funds rate FEDFUNDS by solving the Yule-Walker equations. Store the result as the variable yw in PACFTbl. PACFTbl.yw = parcorr(DataTimeTable.FEDFUNDS,Method="yule-walker");
Compare the PACFs between the methods. PACFTbl PACFTbl=21×3 table Lags ols ____ _________ 0 1 2 3 4 5 6
12-1928
1 0.92881 0.074551 0.14949 -0.23745 0.073346 -0.25854
yw _________ 1 0.91502 0.061337 0.15031 -0.20808 0.073911 -0.21128
parcorr
7 8 9 10 11 12 13 14 15 ⋮
0.021783 0.10817 0.16147 -0.081344 0.050378 0.075574 0.026074 -0.036989 0.074656
0.018625 0.092159 0.10987 -0.077452 0.034769 0.070003 0.02403 -0.025924 0.048657
Plot the PACFs in the same stem plot. stem(repmat(PACFTbl.Lags,1,2),PACFTbl{:,["ols" "yw"]},"filled") title("PACF of Effective Federal Funds Rate") legend(["OLS" "Yule-Walker"])
Plot PACF of Simulated Time Series Specify the AR(2) model: yt = 0 . 6yt − 1 − 0 . 5yt − 2 + εt, where εt is Gaussian with mean 0 and variance 1. 12-1929
12
Functions
rng(1); % For reproducibility Mdl = arima(AR={0.6 -0.5},Constant=0,Variance=1) Mdl = arima with properties: Description: SeriesName: Distribution: P: D: Q: Constant: AR: SAR: MA: SMA: Seasonality: Beta: Variance:
"ARIMA(2,0,0) Model (Gaussian Distribution)" "Y" Name = "Gaussian" 2 0 0 0 {0.6 -0.5} at lags [1 2] {} {} {} 0 [1×0] 1
Simulate 1000 observations from Mdl. y = simulate(Mdl,1000);
Plot the PACF. Specify that the series is an AR(2) process. parcorr(y,NumAR=2)
12-1930
parcorr
The PACF cuts off after the second lag. This behavior indicates an AR(2) process.
Specify Additional Lags for PACF Plot Specify the multiplicative seasonal ARMA (2, 0, 1) × (3, 0, 0)12 model: (1 − 0 . 75L − 0 . 15L2)(1 − 0 . 9L12 + 0 . 75L24 − 0 . 5L36)yt = 2 + εt − 0 . 5εt − 1, where εt is Gaussian with mean 0 and variance 1. Mdl = arima(AR={0.75 0.15},SAR={0.9 -0.75 0.5}, ... SARLags=[12 24 36],MA=-0.5,Constant=2, ... Variance=1);
Simulate data from Mdl. rng(1); y = simulate(Mdl,1000);
Plot the default partial autocorrelation function (PACF). figure parcorr(y)
The default correlogram does not display the dependence structure for higher lags. 12-1931
12
Functions
Plot the PACF for 40 lags. figure parcorr(y,NumLags=40)
The correlogram shows the larger correlations at lags 12, 24, and 36.
Input Arguments y — Observed univariate time series numeric vector Observed univariate time series for which parcorr computes or plots the PACF, specified as a numeric vector. Data Types: double Tbl — Time series data table | timetable Time series data, specified as a table or timetable. Each row of Tbl contains contemporaneous observations of all variables. Specify a single series (variable) by using the DataVariable argument. The selected variable must be numeric. 12-1932
parcorr
ax — Axes on which to plot Axes object Axes on which to plot, specified as an Axes object. By default, parcorr plots to the current axes (gca). Note Specify missing observations using NaN. The parcorr function treats missing values as missing completely at random on page 12-1935. Name-Value Arguments Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter. Before R2021a, use commas to separate each name and value, and enclose Name in quotes. Example: parcorr(Tbl,DataVariable="RGDP",NumLags=10,NumSTD=3) plots 10 lags of the sample PACF of the variable "RGDP" in Tbl, and displays confidence bounds consisting of 3 standard errors away from 0. NumLags — Number of lags positive integer Number of lags in the sample PACF, specified as a positive integer. parcorr uses lags 0:NumLags to estimate the PACF. The default is min([20,T – 1]), where T is the effective sample size of the input time series. Example: parcorr(y,Numlags=10) plots the sample PACF of y for lags 0 through 10. Data Types: double NumAR — Number of lags in theoretical AR model 0 (default) | nonnegative integer Number of lags in a theoretical AR model of the input time series, specified as a nonnegative integer less than NumLags. parcorr uses NumAR to estimate confidence bounds. For lags > NumAR, parcorr assumes that the input times series is a Gaussian white noise process. Consequently, the standard error is approximately 1/ T, where T is the effective sample size of the input time series. Example: parcorr(y,NumAR=10) specifies that y is an AR(10) process and plots confidence bounds for all lags greater than 10. Data Types: double NumSTD — Number of standard errors in confidence bounds 2 (default) | nonnegative scalar Number of standard errors in the confidence bounds, specified as a nonnegative scalar. For all lags greater than NumAR, the confidence bounds are 0 ± NumSTD*σ , where σ is the estimated standard error of the sample partial autocorrelation. 12-1933
12
Functions
The default yields approximate 95% confidence bounds. Example: parcorr(y,NumSTD=1.5) plots the PACF of y with confidence bounds 1.5 standard errors away from 0. Data Types: double Method — PACF estimation method "ols" | "yule-walker" | character vector PACF estimation method, specified as a value in this table. Value
Description
Restrictions
"ols"
Ordinary least squares (OLS)
The input times series must be fully observed (it cannot contain any NaN values).
"yule-walker"
Yule-Walker equations
None.
If the input time series is fully observed, the default is "ols". Otherwise, the default is "yulewalker". Example: parcorr(y,Method="yule-walker") computes the PACF of y using the Yule-Walker equations. Data Types: char | string DataVariable — Variable in Tbl last variable (default) | string scalar | character vector | integer | logical vector Variable in Tbl for which parcorr computes the PACF, specified as a string scalar or character vector containing a variable name in Tbl.Properties.VariableNames, or an integer or logical vector representing the index of a name. The selected variable must be numeric. Example: DataVariable="GDP" Example: DataVariable=[false true false false] or DataVariable=2 selects the second table variable. Data Types: double | logical | char | string
Output Arguments pacf — Sample PACF numeric vector Sample PACF, returned as a numeric vector of length NumLags + 1. parcorr returns pacf only when you supply the input y. The elements of pacf correspond to lags 0, 1, 2, ..., NumLags (that is, elements of lags). For all time series, the lag 0 partial autocorrelation pacf(1) = 1. lags — PACF lags numeric vector PACF lags, returned as a numeric vector with elements 0:NumLags. parcorr returns lags only when you supply the input y. 12-1934
parcorr
PACFTbl — Sample PACF table Sample PACF, returned as a table with variables for the outputs pacf and lags. parcorr returns PACFTbl only when you supply the input Tbl. bounds — Approximate upper and lower confidence bounds numeric vector Approximate upper and lower confidence bounds assuming the input series is an AR(NumAR) process, returned as a two-element numeric vector. The NumSTD option specifies the number of standard errors from 0 in the confidence bounds. h — Handles to plotted graphics objects graphics array Handles to plotted graphics objects, returned as a graphics array. h contains unique plot identifiers, which you can use to query or modify properties of the plot.
More About Partial Autocorrelation Function The partial autocorrelation function measures the correlation between yt and yt + k after adjusting for the linear effects of yt + 1,...,yt + k – 1. The estimation of the PACF involves solving the Yule-Walker equations with respect to the autocorrelations. However, if the time series is fully observed, then the PACF can be estimated by fitting successive autoregressive models of orders 1, 2, ... using ordinary least squares. For details, see [1], Chapter 3. Missing Completely at Random Observations of a random variable are missing completely at random if the tendency of an observation to be missing is independent of both the random variable and the tendency of all other observations to be missing.
Tips • To plot the PACF without confidence bounds, set NumSTD=0.
Algorithms parcorr plots the PACF when you do not request any output or when you request the fourth output h.
Version History Introduced before R2006a
12-1935
12
Functions
References [1] Box, George E. P., Gwilym M. Jenkins, and Gregory C. Reinsel. Time Series Analysis: Forecasting and Control. 3rd ed. Englewood Cliffs, NJ: Prentice Hall, 1994. [2] Hamilton, James D. Time Series Analysis. Princeton, NJ: Princeton University Press, 1994.
See Also Apps Econometric Modeler Functions autocorr | crosscorr | filter Topics “Autocorrelation and Partial Autocorrelation” on page 3-10 “Select ARIMA Model for Time Series Using Box-Jenkins Methodology” on page 3-2 “Detect Autocorrelation” on page 3-19
12-1936
plot
plot Visualize prior and posterior densities of Bayesian linear regression model parameters
Syntax plot(PosteriorMdl) plot(PriorMdl) plot(PosteriorMdl,PriorMdl) plot( ___ ,Name,Value) pointsUsed = plot( ___ ) [pointsUsed,posteriorDensity,priorDensity] = plot( ___ ) [pointsUsed,posteriorDensity,priorDensity,FigureHandle] = plot( ___ )
Description plot(PosteriorMdl) or plot(PriorMdl) plots the posterior or prior distributions of the parameters in the Bayesian linear regression model on page 12-1951 PosteriorMdl or PriorMdl, respectively. plot adds subplots for each parameter to one figure and overwrites the same figure when you call plot multiple times. plot(PosteriorMdl,PriorMdl) plots the posterior and prior distributions in the same subplot. plot uses solid blue lines for posterior densities and dashed red lines for prior densities. plot( ___ ,Name,Value) uses any of the input argument combinations in the previous syntaxes and additional options specified by one or more name-value pair arguments. For example, you can evaluate the posterior or prior density by supplying values of β and σ2, or choose which parameter distributions to include in the figure. pointsUsed = plot( ___ ) also returns the values of the parameters that plot uses to evaluate the densities in the subplots. [pointsUsed,posteriorDensity,priorDensity] = plot( ___ ) also returns the values of the evaluated densities. If you specify one model, then plot returns the density values in PosteriorDensity. Otherwise, plot returns the posterior density values in PosteriorDensity and the prior density values in PriorDensity. [pointsUsed,posteriorDensity,priorDensity,FigureHandle] = plot( ___ ) returns the figure handle of the figure containing the distributions.
Examples Plot Prior and Posterior Distributions Consider the multiple linear regression model that predicts the US real gross national product (GNPR) using a linear combination of industrial production index (IPI), total employment (E), and real wages (WR). 12-1937
12
Functions
GNPRt = β0 + β1IPIt + β2Et + β3WRt + εt . For all t, εt is a series of independent Gaussian disturbances with a mean of 0 and variance σ2. Assume these prior distributions: •
β | σ2 ∼ N4 M, σ2V . M is a 4-by-1 vector of means, and V is a scaled 4-by-4 positive definite covariance matrix.
• σ2 ∼ IG(A, B). A and B are the shape and scale, respectively, of an inverse gamma distribution. These assumptions and the data likelihood imply a normal-inverse-gamma conjugate model. Create a normal-inverse-gamma conjugate prior model for the linear regression parameters. Specify the number of predictors p and the variable names. p = 3; VarNames = ["IPI" "E" "WR"]; PriorMdl = bayeslm(p,'ModelType','conjugate','VarNames',VarNames);
PriorMdl is a conjugateblm Bayesian linear regression model object representing the prior distribution of the regression coefficients and disturbance variance. Plot the prior distributions. plot(PriorMdl);
12-1938
plot
plot plots the marginal prior distributions of the intercept, regression coefficients, and disturbance variance. Suppose that the mean of the regression coefficients is [ − 20 4 0 . 001 2]′ and their scaled covariance matrix is 1 0 0 0 0 . 001 0 0 0 1e − 8 0 0 0
0 0 . 0 0.1
Also, the prior scale of the disturbance variance is 0.01. Specify the prior information using dot notation. PriorMdl.Mu = [-20; 4; 0.001; 2]; PriorMdl.V = diag([1 0.001 1e-8 0.01]); PriorMdl.B = 0.01;
Request a new figure and plot the prior distribution. plot(PriorMdl);
plot replaces the current distribution figure with a plot of the prior distribution of the disturbance variance. Load the Nelson-Plosser data set, and create variables for the predictor and response data. 12-1939
12
Functions
load Data_NelsonPlosser X = DataTable{:,PriorMdl.VarNames(2:end)}; y = DataTable.GNPR;
Estimate the posterior distributions. PosteriorMdl = estimate(PriorMdl,X,y,'Display',false);
PosteriorMdl is a conjugateblm model object that contains the posterior distributions of β and σ2. Plot the posterior distributions. plot(PosteriorMdl);
Plot the prior and posterior distributions of the parameters on the same subplots. plot(PosteriorMdl,PriorMdl);
12-1940
plot
Plot Distributions to Separate Figures Consider the regression model in “Plot Prior and Posterior Distributions” on page 12-1937. Load the Nelson-Plosser data set, create a default conjugate prior model, and then estimate the posterior using the first 75% of the data. Turn off the estimation display. p = 3; VarNames = ["IPI" "E" "WR"]; PriorMdl = bayeslm(p,'ModelType','conjugate','VarNames',VarNames); load Data_NelsonPlosser X = DataTable{:,PriorMdl.VarNames(2:end)}; y = DataTable.GNPR; d = 0.75; PosteriorMdlFirst = estimate(PriorMdl,X(1:floor(d*end),:),y(1:floor(d*end)),... 'Display',false);
Plot the prior distribution and the posterior distribution of the disturbance variance. Return the figure handle. [~,~,~,h] = plot(PosteriorMdlFirst,PriorMdl,'VarNames','Sigma2');
12-1941
12
Functions
h is the figure handle for the distribution plot. If you change the tag name of the figure by changing the Tag property, then the next plot call places all new distribution plots on a different figure. Change the name of the figure handle to FirstHalfData using dot notation. h.Tag = 'FirstHalfData';
Estimate the posterior distribution using the rest of the data. Specify the posterior distribution based on the final 25% of the data as the prior distribution. PosteriorMdl = estimate(PosteriorMdlFirst,X(ceil(d*end):end,:),... y(ceil(d*end):end),'Display',false);
Plot the posterior of the disturbance variance based on half of the data and all the data to a new figure. plot(PosteriorMdl,PosteriorMdlFirst,'VarNames','Sigma2');
12-1942
plot
This type of plot shows the evolution of the posterior distribution when you incorporate new data.
Return Default Distribution and Evaluations Consider the regression model in “Plot Prior and Posterior Distributions” on page 12-1937. Load the Nelson-Plosser data set and create a default conjugate prior model. p = 3; VarNames = ["IPI" "E" "WR"]; PriorMdl = bayeslm(p,'ModelType','conjugate','VarNames',VarNames); load Data_NelsonPlosser X = DataTable{:,PriorMdl.VarNames(2:end)}; y = DataTable.GNPR;
Plot the prior distributions. Request the values of the parameters used to create the plots and their respective densities. [pointsUsedPrior,priorDensities1] = plot(PriorMdl);
12-1943
12
Functions
pointsUsedPrior is a 5-by-1 cell array of 1-by-1000 numeric vectors representing the values of the parameters that plot uses to plot the corresponding densities. The first element corresponds to the intercept, the next three elements correspond to the regression coefficients, and the last element corresponds to the disturbance variance. priorDensities1 has the same dimensions as pointsUsed and contains the corresponding density values. Estimate the posterior distribution. Turn off the estimation display. PosteriorMdl = estimate(PriorMdl,X,y,'Display',false);
Plot the posterior distributions. Request the values of the parameters used to create the plots and their respective densities. [pointsUsedPost,posteriorDensities1] = plot(PosteriorMdl);
12-1944
plot
pointsUsedPost and posteriorDensities1 have the same dimensions as pointsUsedPrior. The pointsUsedPost output can be different from pointsUsedPrior. posteriorDensities1 contains the posterior density values. Plot the prior and posterior distributions. Request the values of the parameters used to create the plots and their respective densities. [pointsUsedPP,posteriorDensities2,priorDensities2] = plot(PosteriorMdl,PriorMdl);
12-1945
12
Functions
All output values have the same dimensions as pointsUsedPrior. The posteriorDensities2 output contains the posterior density values. The priorDensities2 output contains the prior density values. Confirm that pointsUsedPP is equal to pointsUsedPost. compare = @(a,b)sum(a == b) == numel(a); cellfun(compare,pointsUsedPost,pointsUsedPP) ans = 5x1 logical array 1 1 1 1 1
The points used are equivalent. Confirm that the posterior densities are the same, but that the prior densities are not. cellfun(compare,posteriorDensities1,posteriorDensities2) ans = 5x1 logical array 1 1
12-1946
plot
1 1 1 cellfun(compare,priorDensities1,priorDensities2) ans = 5x1 logical array 0 0 0 0 0
When plotting only the prior distribution, plot evaluates the prior densities at points that produce a clear plot of the prior distribution. When plotting both a prior and posterior distribution, plot prefers to plot the posterior clearly. Therefore, plot can determine a different set of points to use.
Specify Values for Density Evaluation and Plotting Consider the regression model in “Plot Prior and Posterior Distributions” on page 12-1937. Load the Nelson-Plosser data set and create a default conjugate prior model for the regression coefficients and disturbance variance. Then, estimate the posterior distribution and obtain the estimation summary table from summarize. p = 3; VarNames = ["IPI" "E" "WR"]; PriorMdl = bayeslm(p,'ModelType','conjugate','VarNames',VarNames); load Data_NelsonPlosser X = DataTable{:,PriorMdl.VarNames(2:end)}; y = DataTable.GNPR; PosteriorMdl = estimate(PriorMdl,X,y); Method: Analytic posterior distributions Number of observations: 62 Number of predictors: 4 Log marginal likelihood: -259.348 | Mean Std CI95 Positive Distribution ----------------------------------------------------------------------------------Intercept | -24.2494 8.7821 [-41.514, -6.985] 0.003 t (-24.25, 8.65^2, 68) IPI | 4.3913 0.1414 [ 4.113, 4.669] 1.000 t (4.39, 0.14^2, 68) E | 0.0011 0.0003 [ 0.000, 0.002] 1.000 t (0.00, 0.00^2, 68) WR | 2.4683 0.3490 [ 1.782, 3.154] 1.000 t (2.47, 0.34^2, 68) Sigma2 | 44.1347 7.8020 [31.427, 61.855] 1.000 IG(34.00, 0.00069) summaryTbl = summarize(PosteriorMdl); summaryTbl = summaryTbl.MarginalDistributions;
summaryTbl is a table containing the statistics that estimate displays at the command line. 12-1947
12
Functions
For each parameter, determine a set of 50 evenly spaced values within three standard deviations of the mean. Put the values into the cells of a 5-by-1 cell vector following the order of the parameters that comprise the rows of the estimation summary table. Points = cell(numel(summaryTbl.Mean),1); % Preallocation for j = 1:numel(summaryTbl.Mean) Points{j} = linspace(summaryTbl.Mean(j) - 3*summaryTbl.Std(j),... summaryTbl.Mean(j) + 2*summaryTbl.Std(j),50); end
Plot the posterior distributions within their respective intervals. plot(PosteriorMdl,'Points',Points)
Input Arguments PosteriorMdl — Bayesian linear regression model storing posterior distribution characteristics conjugateblm model object | empiricalblm model object Bayesian linear regression model storing posterior distribution characteristics, specified as a conjugateblm or empiricalblm model object returned by estimate.
12-1948
plot
When you also specify PriorMdl, then PosteriorMdl is the posterior distribution composed of PriorMdl and data. If the NumPredictors and VarNames properties of the two models are not equal, plot issues an error. PriorMdl — Bayesian linear regression model storing prior distribution characteristics conjugateblm model object | semiconjugateblm model object | diffuseblm model object | mixconjugateblm model object | lassoblm model object Bayesian linear regression model storing prior distribution characteristics, specified as a conjugateblm, semiconjugateblm, diffuseblm, empiricalblm, customblm, mixconjugateblm, mixsemiconjugateblm, or lassoblm model object returned by bayeslm. When you also specify PosteriorMdl, then PriorMdl is the prior distribution that, when combined with the data likelihood, forms PosteriorMdl. If the NumPredictors and VarNames properties of the two models are not equal, plot issues an error. Name-Value Pair Arguments Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter. Before R2021a, use commas to separate each name and value, and enclose Name in quotes. Example: 'VarNames',["Beta1"; "Beta2"; "Sigma2"] plots the distributions of regression coefficients corresponding to the names Beta1 and Beta2 in the VarNames property of the model object and the disturbance variance Sigma2. VarNames — Parameter names cell vector of character vectors | string vector Parameter names indicating which densities to plot in the figure, specified as the comma-separated pair consisting of 'VarNames' and a cell vector of character vectors or string vector. VarNames must include "Intercept", any name in the VarNames property of PriorMdl or PosteriorMdl, or "Sigma2". By default, plot chooses "Intercept" (if an intercept exists in the model), all regression coefficients, and "Sigma2". If the model has more than 34 regression coefficients, then plot chooses the first through the 34th only. VarNames is case insensitive. Tip If your model contains many variables, then try plotting subsets of the parameters on separate plots for a better view of the distributions. Example: 'VarNames',["Beta(1)","Beta(2)"] Data Types: string | cell Points — Parameter values for density evaluation and plotting numeric vector | cell vector of numeric vectors Parameter values for density evaluation and plotting, specified as the comma-separated pair consisting of 'Points' and a numPoints-dimensional numeric vector or a numVarNames12-1949
12
Functions
dimensional cell vector of numeric vectors. numPoints is the number of parameters values that plot evaluates and plots the density. • If Points is a numeric vector, then plot evaluates and plots the densities of all specified distributions by using its elements (see VarNames). • If Points is a cell vector of numeric vectors, then: • numVarNames must be numel(VarNames), where VarNames is the value of VarNames. • Cells correspond to the elements of VarNames. • For j = 1,…,numVarNames, plot evaluates and plots the density of the parameter named VarNames{j} by using the vector of points in cell Points(j). By default, plot determines 1000 adequate values at which to evaluate and plot the density for each parameter. Example: 'Points',{1:0.1:10 10:0.2:25 1:0.01:2} Data Types: double | cell
Output Arguments pointsUsed — Parameter values used for density evaluation and plotting cell vector of numeric vectors Parameter values used for density evaluation and plotting, returned as a cell vector of numeric vectors. Suppose Points and VarNames are the values of Points and VarNames, respectively. If Points is a numeric vector, then PointsUsed is repmat({Points},numel(VarNames)). Otherwise, PointsUsed equals Points. Cells correspond to the names in VarNames. posteriorDensity — Evaluated and plotted posterior densities cell vector of numeric row vectors Evaluated and plotted posterior densities, returned as a numVarNames-by-1 cell vector of numeric row vectors. numVarNames is numel(VarNames), where VarNames is the value of VarNames. Cells correspond to the names in VarNames. posteriorDensity has the same dimensions as priorDensity. priorDensity — Evaluated and plotted prior densities cell vector of numeric row vectors Evaluated and plotted prior densities, returned as a numVarNames-by-1 cell vector of numeric row vectors. priorDensity has the same dimensions as posteriorDensity. FigureHandle — Figure window containing distributions figure object Figure window containing distributions, returned as a figure object. plot overwrites the figure window that it produces. If you rename FigureHandle for a new figure window, or call figure before calling plot, then plot continues to overwrite the current figure. To plot distributions to a different figure window, 12-1950
plot
change the figure identifier of the current figure window by renaming its Tag property. For example, to rename the current figure window called FigureHandle to newFigure, at the command line, enter: FigureHandle.Tag = newFigure;
Limitations Because improper distributions (distributions with densities that do not integrate to 1) are not well defined, plot cannot plot them very well.
More About Bayesian Linear Regression Model A Bayesian linear regression model treats the parameters β and σ2 in the multiple linear regression (MLR) model yt = xtβ + εt as random variables. For times t = 1,...,T: • yt is the observed response. • xt is a 1-by-(p + 1) row vector of observed values of p predictors. To accommodate a model intercept, x1t = 1 for all t. • β is a (p + 1)-by-1 column vector of regression coefficients corresponding to the variables that compose the columns of xt. • εt is the random disturbance with a mean of zero and Cov(ε) = σ2IT×T, while ε is a T-by-1 vector containing all disturbances. These assumptions imply that the data likelihood is ℓ β, σ2 y, x =
T
∏
t=1
ϕ yt; xt β, σ2 .
ϕ(yt;xtβ,σ2) is the Gaussian probability density with mean xtβ and variance σ2 evaluated at yt;. Before considering the data, you impose a joint prior distribution assumption on (β,σ2). In a Bayesian analysis, you update the distribution of the parameters by using information about the parameters obtained from the likelihood of the data. The result is the joint posterior distribution of (β,σ2) or the conditional posterior distributions of the parameters.
Version History Introduced in R2017a
See Also Objects conjugateblm | semiconjugateblm | diffuseblm | empiricalblm | customblm | mixconjugateblm | mixsemiconjugateblm | lassoblm Functions forecast | simulate | summarize | summarize 12-1951
12
Functions
Topics “Bayesian Linear Regression” on page 6-2 “Implement Bayesian Linear Regression” on page 6-10
12-1952
plus
plus Lag operator polynomial addition
Syntax C = plus(A, B, 'Tolerance', tolerance) C = A + B
Description Given two lag operator polynomials A(L) and B(L), C = plus(A, B, 'Tolerance', tolerance) performs a polynomial addition C(L) = A(L) + B(L)with tolerance tolerance. 'Tolerance' is the nonnegative scalar tolerance used to determine which coefficients are included in the result. The default tolerance is 1e-12. Specifying a tolerance greater than 0 allows the user to exclude polynomial lags with near-zero coefficients. A coefficient matrix of a given lag is excluded only if the magnitudes of all elements of the matrix are less than or equal to the specified tolerance. C = A + B performs a polynomial addition. If at least one of A or B is a lag operator polynomial object, the other can be a cell array of matrices (initial lag operator coefficients), or a single matrix (zero-degree lag operator).
Examples Add Two Lag Operator Polynomials Create two LagOp polynomials and add them: A = LagOp({1 -0.6 0.08}); B = LagOp({1 -0.5}); plus(A,B) ans = 1-D Lag Operator Polynomial: ----------------------------Coefficients: [2 -1.1 0.08] Lags: [0 1 2] Degree: 2 Dimension: 1
Algorithms The addition operator (+) invokes plus, but the optional coefficient tolerance is available only by calling plus directly.
See Also minus 12-1953
12
Functions
pptest Phillips-Perron test for one unit root
Syntax h = pptest(y) [h,pValue,stat,cValue] = pptest(y) StatTbl = pptest(Tbl) [ ___ ] = pptest( ___ ,Name=Value) [ ___ ,reg] = pptest( ___ )
Description h = pptest(y) returns the rejection decision h from conducting the Phillips-Perron test on page 121962 for a unit root in the univariate time series y. [h,pValue,stat,cValue] = pptest(y) also returns the p-value pValue, test statistic stat, and critical value cValue of the test. StatTbl = pptest(Tbl) returns the table StatTbl containing variables for the test results, statistics, and settings from conducting the Phillips-Perron test on the last variable of the input table or timetable Tbl. To select a different variable in Tbl to test, use the DataVariable name-value argument. [ ___ ] = pptest( ___ ,Name=Value) specifies options using one or more name-value arguments in addition to any of the input argument combinations in previous syntaxes. pptest returns the output argument combination for the corresponding input arguments. Some options control the number of tests to conduct. The following conditions apply when pptest conducts multiple tests: • pptest treats each test as separate from all other tests. • If you specify y, all outputs are vectors. • If you specify Tbl, each row of StatTbl contains the results of the corresponding test. For example, pptest(Tbl,DataVariable="GDP",Alpha=0.025,Lags=[0 1]) conducts two tests, at a level of significance of 0.025, on the variable GDP of the table Tbl. The first test includes 0 autocovariance lags in the Newey-West covariance estimator and the second test includes 1 autocovariance lags. [ ___ ,reg] = pptest( ___ ) additionally returns a structure of regression statistics for the hypothesis test reg.
Examples Conduct Phillips-Perron Test on Vector of Data Test a time series for a unit root using the default options of pptest. Input the time series data as a numeric vector. 12-1954
pptest
Load the Canadian inflation rate data and extract the CPI-based inflation rate INF_C. load Data_Canada y = DataTable.INF_C;
Test the time series for a unit root. h = pptest(y) h = logical 0
The result h = 0 indicates that this test fails to reject the null hypothesis of a unit root against the AR(1) alternative.
Return Test p-Value and Decision Statistics Load Canadian inflation rate data and extract the CPI-based inflation rate INF_C. load Data_Canada y = DataTable.INF_C;
Test the time series for a unit root. Return the test decision, p-value, test statistic, and critical value. [h,pValue,stat,cValue] = pptest(y) h = logical 0 pValue = 0.3255 stat = -0.8769 cValue = -1.9476
Conduct Phillips-Perron Test on Table Variable Test a time series, which is one variable in a table, for a unit root using default options. Load Canadian inflation rate data, which contains yearly measurements on five time series variables in the table DataTable. load Data_Canada
Test the long-term bond rate series INT_L, the last variable in the table, for a unit root. StatTbl = pptest(DataTable) StatTbl=1×8 table h _____
pValue ______
stat _______
cValue _______
Lags ____
Alpha _____
Model ______
Test ______
12-1955
12
Functions
Test 1
false
0.7358
0.24601
-1.9476
0
0.05
{'AR'}
{'T1'}
pptest returns test results and settings in the table StatTbl, where variables correspond to test results (h, pValue, stat, and cValue) and settings (Lags, Alpha, Model, and Test), and rows correspond to individual tests (in this case, pptest conducts one test). By default, pptest tests the last variable in the table. To select a variable from an input table to test, set the DataVariable option.
Assess Stationarity Using Phillips-Perron Test Test GDP data for a unit root using a trend-stationary alternative with 0, 1, and 2 lags for the NeweyWest estimator. Load the GDP data set. load Data_GDP logGDP = log(Data);
Perform the Phillips-Perron test including 0, 1, and 2 autocovariance lags in the Newey-West robust covariance estimator. h = pptest(logGDP,Model="TS",Lags=0:2) h = 1x3 logical array 0
0
0
Each test returns h = 0, which means the test fails to reject the unit-root null hypothesis for each set of lags. Therefore, there is not enough evidence to suggest that log GDP is trend stationary.
Inspect Regression Statistics Test a time series for a unit root against trend-stationary alternatives. Inspect the regression statistics corresponding to each of the tests. Load a US macroeconomic data set Data_USEconModel.mat. Compute the log of the GDP and include the result as a new variable called LogGDP in the data set. load Data_USEconModel DataTimeTable.LogGDP = log(DataTimeTable.GDP);
Test for a unit root in the logged GDP series using three different choices for the number of lagged difference terms. Return the regression statistics for each alternative model. [StatTbl,reg] = pptest(DataTimeTable,DataVariable="LogGDP",Model="TS",Lags=0:2); StatTbl StatTbl=3×8 table h
12-1956
pValue
stat
cValue
Lags
Alpha
Model
Test
pptest
Test 1 Test 2 Test 3
_____
_______
_______
_______
false false false
0.999 0.999 0.99829
1.0247 0.56702 0.31644
-3.4302 -3.4302 -3.4302
____ 0 1 2
_____
______
______
0.05 0.05 0.05
{'TS'} {'TS'} {'TS'}
{'T1'} {'T1'} {'T1'}
pptest treats each of the three lag choices as separate tests, and returns results and settings for each test along the rows of the table StatTbl. reg is a 3-by-1 structure array containing regression statistics corresponding to each of the three alternative models. Display the names of the coefficients, and their estimates and corresponding p-values resulting from the regressions of the three alternative models. test1 = array2table([reg(1).coeff reg(1).tStats.pVal], ... RowNames=reg(1).names,VariableNames=["Coeff" "pValue"]) test1=3×2 table Coeff ___________ c d a
-0.014026 -0.00013104 1.0061
pValue ___________ 0.6654 0.22383 8.5908e-255
test2 = array2table([reg(2).coeff reg(2).tStats.pVal], ... RowNames=reg(2).names,VariableNames=["Coeff" "pValue"]) test2=3×2 table Coeff ___________ c d a
-0.014026 -0.00013104 1.0061
pValue ___________ 0.6654 0.22383 8.5908e-255
test3 = array2table([reg(3).coeff reg(3).tStats.pVal], ... RowNames=reg(3).names,VariableNames=["Coeff" "pValue"]) test3=3×2 table Coeff ___________ c d a
-0.014026 -0.00013104 1.0061
pValue ___________ 0.6654 0.22383 8.5908e-255
The coefficients c and d are the drift and deterministic trend, respectively, in the alternative model. a is the lag 1 AR term in the null and alternative models. Although each test uses a different number of lags, all coefficient estimates are the same. This result occurs because Lags factors into the NeweyWest estimate of the long-run variance, not the model itself. Compare the Newey-West variance estimates among the tests. reg.NWEst
12-1957
12
Functions
ans = 1.2352e-04 ans = 1.8025e-04 ans = 2.2355e-04
Input Arguments y — Univariate time series data numeric vector Univariate time series data, specified as a numeric vector. Each element of y represents an observation. Data Types: double Tbl — Time series data table | timetable Time series data, specified as a table or timetable. Each row of Tbl is an observation. Specify a single series (variable) to test by using the DataVariable argument. The selected variable must be numeric. Note pptest removes missing observations, represented by NaN values, from the input series. Name-Value Arguments Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter. Before R2021a, use commas to separate each name and value, and enclose Name in quotes. Example: pptest(Tbl,DataVariable="GDP",Alpha=0.025,Lags=[0 1]) conducts two tests, at a level of significance of 0.025, for the presence of a unit root in the variable GDP of the table Tbl. The first test includes 0 autocovariance lags in the Newey-West estimator of the long-run variance and the second test includes 1 autocovariance lag. Lags — Number of autocovariance lags 0 (default) | nonnegative integer | vector of nonnegative integers Number of autocovariance lags to include in the Newey-West estimator of the long-run variance, specified as a nonnegative integer or vector of nonnegative integers. If Lags(j) > 0, pptest includes lags 1 through Lags(j) in the estimator for test j. pptest conducts a separate test for each element in Lags. Example: Lags=0:2 includes zero lagged autocovariance terms in the Newey-West estimator for the first test, the lag 1 autocovariance term for the second test, and autocovariance lags 1 and 2 in the third test. Data Types: double 12-1958
pptest
Model — Model variant "AR" (default) | "ARD" | "TS" | character vector | string vector | cell vector of character vectors Model variant, specified as a model variant name, or a string vector or cell vector of model names. This table contains the supported model variant names. Model Variant Name
Description
"AR"
Autoregressive model variant, which specifies a test of the null model yt = yt – 1 + εt against the alternative model yt = ϕyt – 1 + εt with AR(1) coefficient |ϕ| < 1.
"ARD"
Autoregressive model with drift variant, which specifies a test of the null model yt = yt – 1 + εt against the alternative model yt = c + ϕyt – 1 + εt with drift coefficient c and AR(1) coefficient |ϕ| < 1.
"TS"
Trend-stationary model variant, which specifies a test of the null model yt = yt – 1 + εt against the alternative model yt = c + + ϕyt – 1 + εt with drift coefficient c, deterministic trend coefficient δ, and AR(1) coefficient | ϕ| < 1. pptest conducts a separate test for each model variant name in Model. Example: Model=["AR" "ARD"] uses the stationary AR model as the alternative hypothesis for the first test, and then uses the stationary AR model with drift as the alternative hypothesis for the second test. Data Types: char | cell | string Test — Test statistic "t1" (default) | "t2" | character vector | string vector | cell vector of character vectors Test statistic, specified as a test name, or a string vector or cell vector of test names. This table contains the supported test names.
Test Name
Description
"t1"
Modification of the standard t statistic t1 =
ϕ −1 , SE ϕ
computed using the ordinary least squares (OLS) estimate of the AR(1) coefficient ϕ and its standard error SE(ϕ ), in the alternative model.
12-1959
12
Functions
Test Name
Description
"t2"
Modification of the unstudentized t statistic t2 = T(ϕ – 1) t2 =
T(ϕ − 1) , 1−β1−…−βp
computed using the OLS estimates of the AR(1) coefficient ϕ in the alternative model. T is the effective sample size, which is adjusted for lags and missing values. The test assesses the significance of the restriction ϕ − 1 = 0. pptest modifies the test statistics to account for serial correlations in the innovations process εt. pptest conducts a separate test for each test name in Test. Example: Test="t2" computes the F test statistic for all tests. Data Types: char | cell | string Alpha — Nominal significance level 0.05 (default) | numeric scalar | numeric vector Nominal significance level for the hypothesis test, specified as a numeric scalar between 0.001 and 0.999 or a numeric vector of such values. pptest conducts a separate test for each value in Alpha. Example: Alpha=[0.01 0.05] uses a level of significance of 0.01 for the first test, and then uses a level of significance of 0.05 for the second test. Data Types: double DataVariable — Variable in Tbl to test last variable (default) | string scalar | character vector | integer | logical vector Variable in Tbl to test, specified as a string scalar or character vector containing a variable name in Tbl.Properties.VariableNames, or an integer or logical vector representing the index of a name. The selected variable must be numeric. Example: DataVariable="GDP" Example: DataVariable=[false true false false] or DataVariable=2 tests the second table variable. Data Types: double | logical | char | string Note • When pptest conducts multiple tests, the function applies all single settings (scalars or character vectors) to each test. • All vector-valued specifications that control the number of tests must have equal length. • If you specify the vector y and any value is a row vector, all outputs are row vectors.
12-1960
pptest
Output Arguments h — Test rejection decisions logical scalar | logical vector Test rejection decisions, returned as a logical scalar or vector with length equal to the number of tests. pptest returns h when you supply the input y. • Values of 1 indicate rejection of the unit-root null hypothesis in favor of the alternative. • Values of 0 indicate failure to reject the unit-root null hypothesis. pValue — Test statistic p-values numeric scalar | numeric vector Test statistic p-values, returned as a numeric scalar or vector with length equal to the number of tests. pptest returns pValue when you supply the input y. The p-values are left-tail probabilities. When test statistics are outside tabulated critical values, pptest returns maximum (0.999) or minimum (0.001) p-values. stat — Test statistics numeric scalar | numeric vector Test statistics, returned as a numeric scalar or vector with length equal to the number of tests. pptest returns stat when you supply the input y. pptest computes test statistics using OLS estimates of the coefficients in the alternative model. cValue — Critical values numeric scalar | numeric vector Critical values, returned as a numeric scalar or vector with length equal to the number of tests. pptest returns cValue when you supply the input y. The critical values are for left-tail probabilities. StatTbl — Test summary table Test summary, returned as a table with variables for the outputs h, pValue, stat, and cValue, and with a row for each test. pptest returns StatTbl when you supply the input Tbl. StatTbl contains variables for the test settings specified by Lags, Alpha, Model, and Test. reg — Regression statistics structure array Regression statistics from the OLS estimation of coefficients in the alternative model, returned as a structure array with number of records equal to the number of tests. Each element of reg has the fields in this table. You can access a field using dot notation, for example, reg(1).coeff contains the coefficient estimates of the first test. 12-1961
12
Functions
num
Length of input series with NaNs removed
size
Effective sample size, adjusted for lags
names
Regression coefficient names
coeff
Estimated coefficient values
se
Estimated coefficient standard errors
Cov
Estimated coefficient covariance matrix
tStats
t statistics of coefficients and p-values
FStat
F statistic and p-value
yMu
Mean of the lag-adjusted input series
ySigma
Standard deviation of the lag-adjusted input series
yHat
Fitted values of the lag-adjusted input series
res
Regression residuals
autoCov
Estimated residual autocovariances
NWEst
Newey-West estimator
DWStat
Durbin-Watson statistic
SSR
Regression sum of squares
SSE
Error sum of squares
SST
Total sum of squares
MSE
Mean square error
RMSE
Standard error of the regression
RSq
R2 statistic
aRSq
Adjusted R2 statistic
LL
Loglikelihood of data under Gaussian innovations
AIC
Akaike information criterion
BIC
Bayesian (Schwarz) information criterion
HQC
Hannan-Quinn information criterion
More About Phillips-Perron Test The Phillips-Perron test assesses the null hypothesis of a unit root in a univariate time series yt, where yt = c + δt + ϕyt – 1 + εt and • c is the drift coefficient (see Model). • δ is the deterministic trend coefficient (see Model). • εt is a mean zero innovation process. The null hypothesis of a unit root restricts ϕ = 1. The alternative hypothesis is ϕ < 1. A test that fails to reject the null hypothesis, fails to reject the possibility of a unit root. 12-1962
pptest
Variants of the model allow for different growth characteristics (see Model). The model with δ = 0 has no trend component, and the model with c = 0 and δ = 0 has no drift or trend.
Tips • To draw valid inferences from a Phillips-Perron test, you must determine a suitable value for the Lags argument. The following methods help determine a suitable value: • Begin by setting a small value and then evaluate the sensitivity of the results by adding more lags. • Inspect sample autocorrelations of yt − yt−1; slow rates of decay require more lags. The Newey-West estimator is consistent when the number of lags is O(T1/4), where T is the effective sample size, adjusted for lag and missing values. For more details, see [9] and [5]. • With a specific testing strategy in mind, determine the value of Model by the growth characteristics of yt. If you include too many regressors (see Lags), the test loses power; if you include too few regressors, the test is biased towards favoring the null model [2]. In general, if a series grows, the "TS" model (see Model) provides a reasonable trend-stationary alternative to a unit-root process with drift. If a series is does not grow, the "AR" and "ARD" models provide reasonable stationary alternatives to a unit-root process without drift. The "ARD" alternative model has a mean of c/(1 – a); the "AR" alternative model has mean 0.
Algorithms • In general, when a time series is lagged, the sample size is reduced. Without a presample, if yt is defined for t = 1,…,T, the lag k series yt–k is defined for t = k+1,…,T. Consequently, the effective sample size of the common time base is T − k. • To account for serial correlations in the innovations process εt, pptest uses modified DickeyFuller statistics (see adftest). • Phillips-Perron statistics stat follow nonstandard distributions under the null, even asymptotically. pptest uses tabulated critical values, generated by Monte Carlo simulations, for a range of sample sizes and significance levels of the null model with Gaussian innovations and five million replications per sample size. pptest interpolates critical values cValue and p-values pValue from the tables. Tables for tests of Test types "t1" and "t2" are identical to those for adftest.
Version History Introduced in R2009b
References [1] Davidson, R., and J. G. MacKinnon. Econometric Theory and Methods. Oxford, UK: Oxford University Press, 2004. [2] Elder, J., and P. E. Kennedy. "Testing for Unit Roots: What Should Students Be Taught?" Journal of Economic Education. Vol. 32, 2001, pp. 137–146. [3] Hamilton, James D. Time Series Analysis. Princeton, NJ: Princeton University Press, 1994. 12-1963
12
Functions
[4] Newey, W. K., and K. D. West. "A Simple, Positive Semidefinite, Heteroskedasticity and Autocorrelation Consistent Covariance Matrix." Econometrica. Vol. 55, 1987, pp. 703–708. [5] Perron, P. "Trends and Random Walks in Macroeconomic Time Series: Further Evidence from a New Approach." Journal of Economic Dynamics and Control. Vol. 12, 1988, pp. 297–332. [6] Phillips, P. "Time Series Regression with a Unit Root." Econometrica. Vol. 55, 1987, pp. 277–301. [7] Phillips, P., and P. Perron. "Testing for a Unit Root in Time Series Regression." Biometrika. Vol. 75, 1988, pp. 335–346. [8] Schwert, W. "Tests for Unit Roots: A Monte Carlo Investigation." Journal of Business and Economic Statistics. Vol. 7, 1989, pp. 147–159. [9] White, H., and I. Domowitz. "Nonlinear Regression with Dependent Observations." Econometrica. Vol. 52, 1984, pp. 143–162.
See Also adftest | kpsstest | vratiotest | lmctest Topics “Unit Root Nonstationarity” on page 3-32
12-1964
price2ret
price2ret Convert prices to returns
Syntax [Returns,intervals] = price2ret(Prices) ReturnTbl = price2ret(PriceTbl) [ ___ ] = price2ret( ___ ,Name=Value)
Description [Returns,intervals] = price2ret(Prices) returns the matrix of numVars continuously compounded return series Returns, and corresponding time intervals intervals, from the matrix of numVars price series Prices. ReturnTbl = price2ret(PriceTbl) returns the table or timetable of continuously compounded return series ReturnTbl of each variable in the table or timetable of price series PriceTbl. To select different variables in Tbl from which to compute returns, use the DataVariables name-value argument. [ ___ ] = price2ret( ___ ,Name=Value) specifies options using one or more name-value arguments in addition to any of the input argument combinations in previous syntaxes. price2ret returns the output argument combination for the corresponding input arguments. For example, price2ret(Tbl,Method="periodic",DataVariables=1:5) computes the simple periodic returns of the first five variables in the input table Tbl.
Examples Compute Return Series from Price Series in Vector of Data Load the Schwert Stock data set Data_SchwertStock.mat, which contains daily prices of the S&P index from 1930 through 2008, among other variables (enter Description for more details). load Data_SchwertStock numObs = height(DataTableDly) numObs = 20838 dates = datetime(datesDly,ConvertFrom="datenum");
Convert the S&P price series to returns. prices = DataTableDly.SP; returns = price2ret(prices);
returns is a 20837-by-1 vector of daily S&P returns compounded continuously. r9 = returns(9) r9 = 0.0033
12-1965
12
Functions
p9_10 = [prices(9) prices(10)] p9_10 = 1×2 21.4500
21.5200
returns(9) = 0.0033 is the daily return of the prices in the interval [21.45, 21.52]. plot(dates,DataTableDly.SP) ylabel("Price") yyaxis right plot(dates(1:end-1),returns) ylabel("Return") title("S&P Index Prices and Returns")
Compute Simple Periodic Return Series from Table of Price Series Convert the price series in a table to simple periodic return series. Load the US equity indices data set, which contains the table DataTable of daily closing prices of the NYSE and NASDAQ composite indices from 1990 through 2011. load Data_EquityIdx
Create a timetable from the table. 12-1966
price2ret
dates = datetime(dates,ConvertFrom="datenum"); TT = table2timetable(DataTable,RowTimes=dates); numObs = height(TT);
Convert the NASDAQ and NYSE prices to simple periodic and continuously compounded returns. varnames = ["NASDAQ" "NYSE"]; TTRetC = price2ret(TT,DataVariables=varnames); TTRetP = price2ret(TT,DataVariables=varnames,Method="periodic");
Because TT is a timetable, TTRetC and TTRetP are timetables. Plot the return series with the corresponding prices for the last 50 observations. idx = ((numObs - 1) - 51):(numObs - 1); figure plot(dates(idx + 1),TT.NYSE(idx + 1)) title("NYSE Index Prices and Returns") ylabel("Price") yyaxis right h = plot(dates(idx),[TTRetC.NYSE(idx) TTRetP.NYSE(idx)]); h(2).Marker = 'o'; h(2).Color = 'k'; ylabel("Return") legend(["Price" "Continuous" "Periodic"],Location="northwest") axis tight
12-1967
12
Functions
figure plot(dates(idx + 1),TT.NASDAQ(idx + 1)) title("NASDAQ Index Prices and Returns") ylabel("Price") yyaxis right h = plot(dates(idx),[TTRetC.NASDAQ(idx) TTRetP.NASDAQ(idx)]); h(2).Marker = 'o'; h(2).Color = 'k'; ylabel("Return") legend(["Price" "Continuous" "Periodic"],Location="northwest") axis tight
In this case, the simple periodic and continuously compounded returns of each price series are similar.
Specify Observation Times and Units Create two stock price series from continuously compounded returns that have the following characteristics: • Series 1 grows at a 10 percent rate at each observation time. • Series 2 changes at a random uniform rate in the interval [-0.1, 0.1] at each observation time. • Each series starts at price 100 and is 10 observations in length. 12-1968
price2ret
rng(1); % For reproducibility numObs = 10; p1 = 100; r1 = 0.10; r2 = [0; unifrnd(-0.10,0.10,numObs - 1,1)]; s1 = 100*exp(r1*(0:(numObs - 1))'); cr2 = cumsum(r2); s2 = 100*exp(cr2); S = [s1 s2];
Convert each price series to a return series, and return the observation intervals. [R,intervals] = price2ret(S);
Prepend the return series so that the input and output elements are of the same length and correspond. [[NaN; intervals] S [[NaN NaN]; R] r2] ans = 10×6 NaN 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000
100.0000 110.5171 122.1403 134.9859 149.1825 164.8721 182.2119 201.3753 222.5541 245.9603
100.0000 98.3541 102.7850 93.0058 89.4007 83.3026 76.7803 72.1105 69.9172 68.4885
NaN 0.1000 0.1000 0.1000 0.1000 0.1000 0.1000 0.1000 0.1000 0.1000
NaN -0.0166 0.0441 -0.1000 -0.0395 -0.0706 -0.0815 -0.0627 -0.0309 -0.0206
0 -0.0166 0.0441 -0.1000 -0.0395 -0.0706 -0.0815 -0.0627 -0.0309 -0.0206
price2ret returns rates matching the rates from the simulated series. price2ret assumes prices are recorded in a regular time base. Therefore, all durations between prices are 1. Convert the prices to returns again, but associate the prices with years starting from August 1, 2010. tau1 = datetime(2010,08,01); dates = tau1 + years((0:(numObs-1))'); [Ry,intervalsy] = price2ret(S,Ticks=dates); [[NaN; intervalsy] S [[NaN NaN]; Ry] r2] ans = 10×6 NaN 365.2425 365.2425 365.2425 365.2425 365.2425 365.2425 365.2425 365.2425 365.2425
100.0000 110.5171 122.1403 134.9859 149.1825 164.8721 182.2119 201.3753 222.5541 245.9603
100.0000 98.3541 102.7850 93.0058 89.4007 83.3026 76.7803 72.1105 69.9172 68.4885
NaN 0.0003 0.0003 0.0003 0.0003 0.0003 0.0003 0.0003 0.0003 0.0003
NaN -0.0000 0.0001 -0.0003 -0.0001 -0.0002 -0.0002 -0.0002 -0.0001 -0.0001
0 -0.0166 0.0441 -0.1000 -0.0395 -0.0706 -0.0815 -0.0627 -0.0309 -0.0206
12-1969
12
Functions
price2ret assumes time units are days. Therefore, all durations are approximately 365 and the returns are normalized for that time unit. Compute returns again, but specify that the observation times are years. [Ryy,intervalsyy] = price2ret(S,Ticks=dates,Units="years"); [[NaN; intervalsyy] S [[NaN NaN]; Ryy] r2] ans = 10×6 NaN 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000
100.0000 110.5171 122.1403 134.9859 149.1825 164.8721 182.2119 201.3753 222.5541 245.9603
100.0000 98.3541 102.7850 93.0058 89.4007 83.3026 76.7803 72.1105 69.9172 68.4885
NaN 0.1000 0.1000 0.1000 0.1000 0.1000 0.1000 0.1000 0.1000 0.1000
NaN -0.0166 0.0441 -0.1000 -0.0395 -0.0706 -0.0815 -0.0627 -0.0309 -0.0206
0 -0.0166 0.0441 -0.1000 -0.0395 -0.0706 -0.0815 -0.0627 -0.0309 -0.0206
price2ret normalizes the returns relative to years, and now the returned rates match the simulated rates.
Input Arguments Prices — Time series of prices numeric matrix Time series of prices, specified as a numObs-by-numVars numeric matrix. Each row of Prices corresponds to an observation time specified by the optional Ticks name-value argument. Each column of Prices corresponds to an individual price series. Data Types: double PriceTbl — Time series of prices table | timetable Time series of prices, specified as a table or timetable with numObs rows. Each row of Tbl is an observation time. For a table, the optional Ticks name-value argument specifies observation times. For a timetable, PriceTbl.Time specifies observation times and it must be a datetime vector. Specify numVars variables, from which to compute returns, by using the DataVariables argument. The selected variables must be numeric. Name-Value Pair Arguments Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter. Before R2021a, use commas to separate each name and value, and enclose Name in quotes. Example: price2ret(Tbl,Method="periodic",DataVariables=1:5) computes the simple periodic returns of the first five variables in the input table Tbl. 12-1970
price2ret
Ticks — Observation times τ numeric vector | datetime vector Observation times τ, specified as a length numObs numeric or datetime vector of increasing values. When the input price series are in a matrix or table, the default is 1:numObs. When the input price series are in a timetable, price2ret uses the row times in PriceTbl.Time and ignores Ticks. PriceTbl.Time must be a datetime vector. Example: Ticks=datetime(1950:2020,12,31) specifies the end of each year from 1950 through 2020. Example: Ticks=datetime(1950,03,31):calquarters(1):datetime(2020,12,31) specifies the end of each quarter during the years 1950 through 2020. Data Types: double | datetime Units — Time units "days" (default) | "milliseconds" | "seconds" | "minutes" | "hours" | "years" | character vector Time units to use when observation times Ticks are datetimes, specified as a value in this table. Value
Description
"milliseconds"
Milliseconds
"seconds"
Seconds
"minutes"
Minutes
"hours"
Hours
"days"
Days
"years"
Years
price2ret requires time units to convert duration intervals to numeric values for normalizing returns. When the value of the Ticks name-value argument is a numeric vector, price2ret ignores the value of Units. Example: Units="years" Data Types: char | string Method — Compounding method "continuous" (default) | "periodic" | character vector Compounding method, specified as a value in this table. Value
Description
"continuous"
Compute continuously compounded returns
"periodic"
Compute simple periodic returns
Example: Method="periodic" Data Types: char | string 12-1971
12
Functions
DataVariables — Variables in PriceTbl all variables (default) | string vector | cell vector of character vectors | vector of integers | logical vector Variables in PriceTbl, from which price2ret computes returns, specified as a string vector or cell vector of character vectors containing variable names in PriceTbl.Properties.VariableNames, or an integer or logical vector representing the indices of names. The selected variables must be numeric. Example: DataVariables=["GDP" "CPI"] Example: DataVariables=[true true false false] or DataVariables=[1 2] selects the first and second table variables. Data Types: double | logical | char | cell | string
Output Arguments Returns — Return series numeric matrix Return series, returned as a (numObs – 1)-by-numVars numeric matrix. price2ret returns Returns when you supply the input Prices. Returns in row i ri are associated with price interval [pi,pi+1], i = 1:(numObs - 1), according to the compounding method Method: • When Method is "continuous", ri =
log pi + 1 /pi . τi + 1 − τi
• When Method is "periodic", ri =
pi + 1 /pi − 1 . τi + 1 − τi
When observation times τ (see Ticks) are datetimes, the magnitude of the normalizing interval τi+1 – τi depends on the specified time units (see Units). intervals — Time intervals between observations numeric vector Time intervals between observations τi+1 – τi, returned as a length numObs – 1 numeric vector. price2ret returns intervals when you supply the input Prices. When observation times (see Ticks) are datetimes, interval magnitudes depend on the specified time units (see Units). ReturnTbl — Return series and time intervals table | timetable Return series and time intervals, returned as a table or timetable, the same data type as PriceTbl, with numObs – 1 rows. price2ret returns ReturnTbl when you supply the input PriceTbl. ReturnTbl contains the outputs Returns and intervals. 12-1972
price2ret
ReturnTbl associates observation time τi+1 with the end of the interval for the returns in row i ri.
Algorithms Consider the following variables: • p is a price series (Prices). • r is the corresponding return series (Returns). • τ is a vector of the observation times (Ticks). • δ is the series of lengths between observation times (intervals). The following figure shows how the inputs and outputs are associated.
Version History Introduced before R2006a R2022a: price2ret supports name-value argument syntax for all optional inputs Behavior changed in R2022a price2ret accepts the observation times ticktimes and compounding method method as the name-value arguments Ticks and Method, respectively. However, the function will continue to accept the previous syntax. The syntax before R2022a is price2ret(Prices,ticktimes,method)
The recommended syntax for R2022a and later releases is price2ret(Prices,Ticks=ticktimes,Method=method)
As with any set of name-value arguments, you can specify them in any order.
See Also ret2price | tick2ret Topics “Returns with Negative Prices”
12-1973
12
Functions
print (To be removed) Display parameter estimation results for conditional variance models Note print will be removed in a future release. Use summarize instead.
Syntax print(Mdl,EstParamCov)
Description print(Mdl,EstParamCov) displays parameter estimates, standard errors, and t statistics for the fitted conditional variance model Mdl, with estimated parameter variance-covariance matrix EstParamCov. Mdl can be a garch, egarch, or gjr model.
Examples Print GARCH Estimation Results Print the results from estimating a GARCH model using simulated data. Simulate data from an GARCH(1,1) model with known parameter values. Mdl0 = garch('Constant',0.01,'GARCH',0.8,'ARCH',0.14) Mdl0 = garch with properties: Description: SeriesName: Distribution: P: Q: Constant: GARCH: ARCH: Offset:
"GARCH(1,1) Conditional Variance Model (Gaussian Distribution)" "Y" Name = "Gaussian" 1 1 0.01 {0.8} at lag [1] {0.14} at lag [1] 0
rng 'default'; [V,Y] = simulate(Mdl0,100);
Fit a GARCH(1,1) model to the simulated data, turning off the print display. Mdl = garch(1,1); [EstMdl,EstParamCov] = estimate(Mdl,Y,'Display','off');
Print the estimation results. print(EstMdl,EstParamCov) Warning: PRINT will be removed in a future release; use SUMMARIZE instead.
12-1974
print
GARCH(1,1) Conditional Variance Model: ---------------------------------------Conditional Probability Distribution: Gaussian Parameter ----------Constant GARCH{1} ARCH{1}
Value ----------0.0167004 0.77263 0.191686
Standard Error -----------0.0165077 0.0776905 0.0750675
t Statistic ----------1.01167 9.94498 2.55351
Print EGARCH Estimation Results Print the results from estimating an EGARCH model using simulated data. Simulate data from an EGARCH(1,1) model with known parameter values. Mdl0 = egarch('Constant',0.01,'GARCH',0.8,'ARCH',0.14,... 'Leverage',-0.1); rng 'default'; [V,Y] = simulate(Mdl0,100);
Fit an EGARCH(1,1) model to the simulated data, turning off the print display. Mdl = egarch(1,1); [EstMdl,EstParamCov] = estimate(Mdl,Y,'Display','off');
Print the estimation results. print(EstMdl,EstParamCov) Warning: PRINT will be removed in a future release; use SUMMARIZE instead. EGARCH(1,1) Conditional Variance Model: -------------------------------------Conditional Probability Distribution: Gaussian Parameter ----------Constant GARCH{1} ARCH{1} Leverage{1}
Value ----------0.0654887 0.858069 0.27702 -0.179034
Standard Error -----------0.0746316 0.154361 0.171036 0.125057
t Statistic ----------0.877494 5.55886 1.61966 -1.43162
Print GJR Estimation Results Print the results from estimating a GJR model using simulated data. Simulate data from a GJR(1,1) model with known parameter values. 12-1975
12
Functions
Mdl0 = gjr('Constant',0.01,'GARCH',0.8,'ARCH',0.14,... 'Leverage',0.1); rng 'default'; [V,Y] = simulate(Mdl0,100);
Fit a GJR(1,1) model to the simulated data, turning off the print display. Mdl = gjr(1,1); [EstMdl,EstParamCov] = estimate(Mdl,Y,'Display','off');
Print the estimation results. print(EstMdl,EstParamCov) Warning: PRINT will be removed in a future release; use SUMMARIZE instead. GJR(1,1) Conditional Variance Model: -------------------------------------Conditional Probability Distribution: Gaussian Parameter ----------Constant GARCH{1} ARCH{1} Leverage{1}
Value ----------0.194785 0.69954 0.192965 0.214988
Standard Error -----------0.254198 0.11266 0.0931335 0.223923
t Statistic ----------0.766271 6.20928 2.07192 0.9601
Input Arguments Mdl — Conditional variance model garch model object | egarch model object | gjr model object Conditional variance model without any unknown parameters, specified as a garch, egarch, or gjr model object. Mdl is usually the estimated conditional variance model returned by estimate. EstParamCov — Estimated parameter variance-covariance matrix numeric matrix Estimated parameter variance-covariance matrix, returned as a numeric matrix. EstParamCov is usually the estimated conditional variance model returned by estimate. The rows and columns associated with any parameters contain the covariances. The standard errors of the parameter estimates are the square root of the entries along the main diagonal. The rows and columns associated with any parameters held fixed as equality constraints during estimation contain 0s. The order of the parameters in EstParamCov must be: • Constant 12-1976
print
• Nonzero GARCH coefficients at positive lags • Nonzero ARCH coefficients at positive lags • For EGARCH and GJR models, nonzero leverage coefficients at positive lags • Degrees of freedom (t innovation distribution only) • Offset (models with nonzero offset only) Data Types: double
Version History Introduced in R2012a
See Also Objects garch | egarch | gjr Functions estimate | summarize
12-1977
12
Functions
print (To be removed) Display parameter estimation results for ARIMA or ARIMAX models
Syntax print(EstMdl,EstParamCov)
Description print(EstMdl,EstParamCov) displays parameter estimates, standard errors, and t statistics for a fitted ARIMA or ARIMAX model.
Examples Print ARIMA Estimation Results Print the results from estimating an ARIMA model using simulated data. Simulate data from an ARMA(1,1) model using known parameter values. MdlSim = arima(Constant=0.01,AR=0.8,MA=0.14,Variance=0.1); rng("default") Y = simulate(MdlSim,100);
Fit an ARMA(1,1) model to the simulated data, turning off the print display. Mdl = arima(1,0,1); [EstMdl,EstParamCov] = estimate(Mdl,Y,Display="off");
Print the estimation results. print(EstMdl,EstParamCov) Warning: PRINT will be removed in a future release; use SUMMARIZE instead. ARIMA(1,0,1) Model: -------------------Conditional Probability Distribution: Gaussian Parameter ----------Constant AR{1} MA{1} Variance
12-1978
Value ----------0.0445373 0.822892 0.12032 0.133727
Standard Error -----------0.0460376 0.0711631 0.101817 0.0178793
t Statistic ----------0.967412 11.5635 1.18173 7.4794
print
Print ARIMAX Estimation Results Print the results of estimating an ARIMAX model. Load the Credit Defaults data set, assign the response IGD to Y and the predictors AGE, CPF, and SPR to the matrix X, and obtain the sample size T. To avoid distraction from the purpose of this example, assume that all predictor series are stationary. load Data_CreditDefaults X = Data(:,[1 3:4]); T = size(X,1); y = Data(:,5);
Separate the initial values from the main response and predictor series. y0 = y(1); yEst = y(2:T); XEst = X(2:end,:);
Set the ARIMAX(1,0,0) model yt = c + ϕ1 yt − 1 + εt to MdlY to fit to the data. MdlY = arima(1,0,0);
Fit the model to the data and specify the initial values. [EstMdl,EstParamCov] = estimate(MdlY,yEst,X=XEst, ... Y0=y0,Display="off");
Print the estimation results. print(EstMdl,EstParamCov) Warning: PRINT will be removed in a future release; use SUMMARIZE instead. ARIMAX(1,0,0) Model: --------------------Conditional Probability Distribution: Gaussian Parameter ----------Constant AR{1} Beta(1) Beta(2) Beta(3) Variance
Value -----------0.204768 -0.0173091 0.0239329 -0.0124602 0.0680871 0.00539463
Standard Error -----------0.266078 0.565618 0.0218417 0.00749917 0.0745041 0.00224393
t Statistic -----------0.769578 -0.030602 1.09574 -1.66154 0.913871 2.4041
Input Arguments EstMdl — Estimated ARIMA or ARIMAX model arima model object Estimated ARIMA or ARIMAX model, specified as an arima model object returned by estimate. 12-1979
12
Functions
EstParamCov — Estimated error variance-covariance matrix square matrix Estimated error variance-covariance matrix as returned by estimate, specified as a square matrix with rows and columns corresponding to parameters known to the optimizer of estimate. Known parameters include all parameters estimate estimated. Rows and columns associated with parameters fixed during estimation contain 0s. The order of the parameters (that is, rows and columns) in EstParamCov is: • Constant • Nonzero AR coefficients at positive lags • Nonzero SAR coefficients at positive lags • Nonzero MA coefficients at positive lags • Nonzero SMA coefficients at positive lags • Regression coefficients (when EstMdl contains them) • Variance parameters (scalar for constant-variance models, or a vector of parameters for a conditional variance model) • Degrees of freedom (t innovation distribution only)
Version History Introduced in R2012a R2018a: Warns Warns starting in R2018a print will be removed in a future release. Use summarize instead. This list shows the differences between print and summarize: • For an unestimated (custom) arima model input, summarize returns the standard object display of the model. • For an estimated arima model input, as returned by estimate, summarize prints an estimation summary in a MATLAB table and lists other estimation statistics. summarize returns the estimation statistics in an output structure array.
See Also Objects arima Functions estimate | summarize
12-1980
print
print (To be removed) Display estimation results for regression models with ARIMA errors
Syntax print(EstMdl,EstParamCov)
Description print(EstMdl,EstParamCov) displays parameter estimates, standard errors, and t statistics of a fitted regression model with ARIMA model.
Examples Print Estimation Results of Regression Model with ARIMA Errors Regress GDP onto CPI using a regression model with ARMA(1,1) errors, and print the results. Load the US Macroeconomic data set and preprocess the data. load Data_USEconModel logGDP = log(DataTable.GDP); dlogGDP = diff(logGDP); dCPI = diff(DataTable.CPIAUCSL);
Fit the model to the data. Mdl = regARIMA(1,0,1); [EstMdl,EstParamCov] = estimate(Mdl,dlogGDP,X=dCPI,Display="off");
Print the estimates. print(EstMdl,EstParamCov) Warning: PRINT will be removed in a future release; use SUMMARIZE instead. Regression with ARIMA(1,0,1) Error Model: -----------------------------------------Conditional Probability Distribution: Gaussian Parameter ----------Intercept AR{1} MA{1} Beta1 Variance
Value ----------0.014776 0.605274 -0.161651 0.00204403 9.35782e-05
Standard Error -----------0.00146271 0.0892902 0.10956 0.000706163 6.03135e-06
t Statistic ----------10.1018 6.77872 -1.47546 2.89456 15.5153
12-1981
12
Functions
Input Arguments EstMdl — Estimated regression model with ARIMA errors regARIMA model object Estimated regression model with ARIMA errors, specified as a regARIMA model object returned by estimate. EstParamCov — Estimation error variance-covariance square numeric matrix Estimation error variance-covariance, specified as a square numeric matrix. EstParamCov is a square matrix with a row and column for each parameter known to the optimizer that estimate uses to fit EstMdl. Known parameters include all parameters estimate estimates. If you specify equality constraints on a parameter for estimation, the parameter is known and the rows and columns associated with it contain zeros. print omits coefficients of lag operator polynomials at lags excluded from EstMdl. print arranges the parameters in ParamCov as follows: • Intercept • Nonzero AR coefficients at positive lags • Nonzero SAR coefficients at positive lags • Nonzero MA coefficients at positive lags • Nonzero SMA coefficients at positive lags • Regression coefficients (when Mdl contains them) • Variance • Degrees of freedom for the t distribution Data Types: double
Version History Introduced in R2013b R2018a: Warns Warns starting in R2018a print will be removed in a future release. Use summarize instead. This list shows the differences between print and summarize: • For an unestimated (custom) regARIMA model input, summarize returns the standard object display of the model. • For an estimated regARIMA model input, as returned by estimate, summarize prints an estimation summary in a MATLAB table and lists other estimation statistics. summarize returns the estimation statistics in an output structure array. 12-1982
print
See Also Objects regARIMA Functions estimate | summarize
12-1983
12
Functions
recessionplot Overlay recession bands on time series plot
Syntax recessionplot recessionplot(Name,Value) hBands = recessionplot( ___ )
Description recessionplot overlays shaded US recession bands, as reported by the National Bureau of Economic Research (NBER) [1], on a time series plot in the current axes. Abscissa data must represent dates created by datenum or datetime. recessionplot(Name,Value) uses additional options specified by one or more name-value arguments. For example, recessionplot('recessions',recessionPeriods) specifies overlaying shaded bands for the recession periods in recessionPeriods. hBands = recessionplot( ___ ) returns a vector of handles to the recession bands, using any of the input-argument combinations in the previous syntaxes.
Examples Overlay Recession Bands on Time Series Plot Load the Data_Unemployment.mat data set, which contains a monthly US unemployment rate series measured from 1954 through 1998. load Data_Unemployment
The variables Data and dates, among others, appear in the workspace. For more details on the data, enter Description. Data is a 45-by-12 matrix of the unemployment rates. The rows of Data correspond to successive years and its columns correspond to successive months; Data(j,k) is the unemployment rate in month k of year j. Represent Data as a vector of regular time series data by transposing the matrix, and then vertically concatenating the columns of the result. Data = Data'; un = Data(:);
dates is a numeric vector of the 45 consecutive sampling years. Create a datetime vector that expands dates by including all months within each year. Y Y M D t
12-1984
= = = = =
repmat(dates',12,1); Y(:); repmat((1:12)',length(dates),1); ones(length(un),1); datetime(Y,M,D);
recessionplot
Alternatively, you can use the calmonths function to efficiently include all months within each year. tspan = datetime([dates(1); dates(end)],[1; 12],[1; 1]); t = (tspan(1):calmonths(1):tspan(2))';
Plot the unemployment rate series. Overlay bands for recession periods reported by NBER. figure plot(t,un) recessionplot ylabel('Rate (%)') title("Unemployment Rate")
Periods of recession appear to occur with sudden, relatively large increases in the unemployment rate.
Change Color and Transparency of Recession Bands Overlay recession bands on time series plot, then return the handles of the recession bands to change the color and opacity of the bands. Load the Data_CreditDefaults.mat data set, which contains a credit default rate series and several predictor series measured annually from 1984 through 2004. load Data_CreditDefaults
12-1985
12
Functions
The variables Data and dates, among others, appear in the workspace. For more details on the data, enter Description. Data is a 21-by-5 numeric matrix containing the series. Extract the predictor series, which comprise the first four columns. X = Data(:,1:4);
dates is a numeric vector containing the 21 sampling years. Convert dates to a datetime vector of years. Assume the series are measured at the end of the year. T = numel(dates); dates = [dates [12 31].*ones(T,2)]; dates = datetime(dates);
Plot the predictor series. Overlay recession bands and return the handles to the bands. Change the band color to red and reduce the opacity to 0.1. figure; plot(dates,X,'LineWidth',2); xlabel("Year"); ylabel("Level"); hBands = recessionplot; set(hBands,'FaceColor',"r",'FaceAlpha',0.1);
12-1986
recessionplot
Input Arguments Name-Value Pair Arguments Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter. Before R2021a, use commas to separate each name and value, and enclose Name in quotes. Example: 'recessions',recessionPeriods specifies overlaying shaded bands for the recession periods in recessionPeriods. axes — Axes on which to plot Axes object Axes on which to overlay recession bands, specified as an Axes object. The target axes must contain a time series plot with serial dates on the horizontal axis. By default, recessionplot plots to the current axes (gca). recessions — Recession periods Data_Recessions.mat (default) | matrix of serial date numbers | datetime matrix Recession periods, or data indicating the beginning and end of historical recessions, specified as a numRecessions-by-2 matrix of serial date numbers or datetime entries. Each row is a period of recession, with the first column indicating the beginning of the recession and the second column indicating the end of the recession. The default is the US recession data in Data_Recessions.mat, reported by NBER [1].
Output Arguments hBands — Handles to plotted graphics objects graphics vector Handles to plotted graphics objects, returned as a graphics vector. hBands contains unique plot identifiers, which you can use to query or modify properties of the recession bands.
Tips • recessionplot requires datetime values or serial date numbers on the horizontal axis of a time series plot. To convert other date information to this format before plotting, use datetime or datenum. • To achieve satisfactory displays on certain monitors and projectors, change the color and opacity of the recession bands by setting the FaceColor and FaceAlpha properties of the output handles.
Version History Introduced in R2012a 12-1987
12
Functions
R2022a: Plot band for new recession interval The National Bureau of Economic Research defined the period February, 2020, through April, 2020, in the US as a recession (https://www.nber.org/research/business-cycle-dating). The data set Data_Recessions.mat includes the new recession period. Consequently, recessionplot plots a recession band for that period in time series plots.
References [1] National Bureau of Economic Research (NBER), Business Cycle Expansions and Contractions, https://www.nber.org/research/data/us-business-cycle-expansions-andcontractions.
See Also datenum | datetime Topics “Dates and Time” “Represent Dates and Times in MATLAB” “Convert Between Text and datetime or duration Values”
12-1988
recreg
recreg Recursive linear regression
Syntax [Coeff,SE] = recreg(X,y) [CoeffTbl,SETbl] = recreg(Tbl) ___ = recreg( ___ ,Name=Value) recreg( ___ ) ___ = recreg(ax, ___ ) [ ___ ,coeffPlots] = recreg( ___ )
Description recreg recursively estimates coefficients (β) and their standard errors in a multiple linear regression model of the form y = Xβ + ε by performing successive regressions using nested or rolling windows. recreg has options for OLS, HAC, and FGLS estimates, and for iterative plots of the estimates. [Coeff,SE] = recreg(X,y) returns a matrix of regression coefficient estimates Coeff and a corresponding matrix of standard error estimates SE from recursive regressions of the multiple linear regression model y = Xβ + ε. [CoeffTbl,SETbl] = recreg(Tbl) returns regression coefficients estimates in the table CoeffTbl, and standard error estimates in the table SETbl from a recursive regression on the linear model of the variables in the table or timetable Tbl. The response variable in the regression is the last table variable, and all other variables are the predictor variables. To select a different response variable for the regression, use the ResponseVariable name-value argument. To select different predictor variables, use the PredictorNames name-value argument. ___ = recreg( ___ ,Name=Value) specifies options using one or more name-value arguments in addition to any of the input argument combinations in previous syntaxes. recreg returns the output argument combination for the corresponding input arguments. For example, recreg(Tbl,ResponseVariable="GDP",Intercept=false,Estimator="fgls") excludes an intercept term from the regression model, in which the response variable is the variable GDP in the table Tbl, and uses FGLS to estimate coefficients and standard errors. recreg( ___ ) plots iterative coefficient estimates with ±2 standard error bands for each coefficient in the multiple linear regression model. ___ = recreg(ax, ___ ) plots on the axes specified in ax instead of the axes of new figures. The option ax can precede any of the input argument combinations in the previous syntaxes. [ ___ ,coeffPlots] = recreg( ___ ) additionally returns handles to plotted graphics objects. Use elements of coeffPlots to modify properties of the plots after you create it.
Examples
12-1989
12
Functions
Inspect Consumption Model Coefficients for Structural Change Check coefficient estimates for instability in a model of food demand around World War II. Implement forward and backward recursive regressions in a rolling window. Load the US food consumption data set, which contains annual measurements from 1927 through 1962 with missing data due to WWII. load Data_Consumption
For more details on the data, enter Description at the command prompt. Plot the series. P = Data(:,1); % Food price index I = Data(:,2); % Disposable income index Q = Data(:,3); % Food consumption index figure plot(dates,[P I Q]) axis tight grid on xlabel("Year") ylabel("Index") title("\bf Time Series Plot of All Series") legend("Price","Income","Consumption",Location="southeast")
Measurements are missing from 1942 through 1947, which correspond to WWII. 12-1990
recreg
To examine elasticities, apply the log transformation to each series. LP = log(P); LI = log(I); LQ = log(Q);
Consider a model in which log consumption is a linear function of the logs of food price and income. In other words, LQt = β0 + β1LIt + β2LP + εt .
εt is a Gaussian random variable with mean 0 and standard deviation σ2. Identify the breakpoint index at the end of WWII, 1945. Ignore missing years with missing data. numCoeff = 4; % Three predictors and an intercept T = numel(dates(~isnan(P))); % Sample size bpIdx = find(dates(~isnan(P)) >= 1945,1) - numCoeff bpIdx = 12
The 12th iteration corresponds to the end of the war. Plot forward recursive-regression coefficient estimates using a rolling window 1/4 of the sample size. Indicate to plot the coefficients of LP and LI only in the same figure. X = [LP LI]; y = LQ; varnames = ["Log-price" "Log-income"]; plotvars = [false true true]; window = ceil(T*1/4); recreg(X,y,Window=window,Plot="combined",PlotVars=plotvars, ... VarNames=varnames);
12-1991
12
Functions
Plot forward recursive-regression coefficient estimates using a rolling window 1/3 of the sample size. window = ceil(T*1/3); recreg(X,y,Window=window,Plot="combined",PlotVars=plotvars, ... VarNames=varnames);
12-1992
recreg
Plot forward recursive-regression coefficient estimates using a rolling window of size 1/2 of the sample size. window = ceil(T*1/2); recreg(X,y,Window=window,Plot="combined",PlotVars=plotvars, ... VarNames=varnames);
12-1993
12
Functions
As the window size increases, the lines show less volatility, but the coefficients do exhibit instability.
Inspect Real US GNP Model for Instability Apply recursive regressions using nested windows to look for instability in an explanatory model of real GNP for a period spanning World War II. Load the Nelson-Plosser data set. load Data_NelsonPlosser
The time series in the data set contain annual, macroeconomic measurements from 1860 to 1970. For more details, a list of variables, and descriptions, enter Description in the command line. Several series have missing data. Focus the sample to measurements from 1915 to 1970. Identify the index corresponding to 1945, the end of WWII, to use as a breakpoint for the test. span = (1915 NumPaths, simulate uses only the first NumPaths columns. If Presample is a timetable, all the following conditions must be true: • Presample must represent a sample with a regular datetime time step (see isregular). • The datetime vector of sample timestamps Presample.Time must be ascending or descending. • If you specify InSample, Presample must immediately precede InSample, with respect to the sampling frequency. If Presample is a table, the last row contains the latest presample observation. By default, simulate sets the following values: • For necessary presample responses: • The unconditional mean of the model when Mdl represents a stationary AR process without a regression component • Zero when Mdl represents a nonstationary process or when it contains a regression component. 12-2158
simulate
• For necessary presample disturbances, zero. • For necessary presample conditional variances, the unconditional variance of the conditional variance model n Mdl.Variance. If you specify the Presample, you must specify the presample response, innovation, or conditional variance variable name by using the PresampleResponseVariable, PresampleInnovationVariable, or PresampleVarianceVariable name-value argument. PresampleResponseVariable — Response variable yt to select from Presample string scalar | character vector | integer | logical vector Response variable yt to select from Presample containing presample response data, specified as one of the following data types: • String scalar or character vector containing a variable name in Presample.Properties.VariableNames • Variable index (positive integer) to select from Presample.Properties.VariableNames • A logical vector, where PresampleResponseVariable(j) = true selects variable j from Presample.Properties.VariableNames The selected variable must be a numeric matrix and cannot contain missing values (NaNs). If you specify presample response data by using the Presample name-value argument, you must specify PresampleResponseVariable. Example: PresampleResponseVariable="Stock0" Example: PresampleResponseVariable=[false false true false] or PresampleResponseVariable=3 selects the third table variable as the presample response variable. Data Types: double | logical | char | cell | string InSample — In-sample predictor data table | timetable In-sample predictor data for the exogenous regression component of the model, specified as a table or timetable. InSample contains numvars variables, including numpreds predictor variables xt. simulate returns the simulated variables in the output table or timetable Tbl, which is commensurate with InSample. Each row corresponds to an observation in the simulation horizon, the first row is the earliest observation, and measurements in each row, among all paths, occur simultaneously. InSample must have at least numobs rows to cover the simulation horizon. If you supply more rows than necessary, simulate uses only the first numobs rows. Each selected predictor variable is a numeric vector without missing values (NaNs). All predictor variables are present in the regression component of each response equation and apply to all response paths. If InSample is a timetable, the following conditions apply: • InSample must represent a sample with a regular datetime time step (see isregular). 12-2159
12
Functions
• The datetime vector InSample.Time must be ascending or descending. • If you specify Presample, Presample must immediately precede InSample, with respect to the sampling frequency. If InSample is a table, the last row contains the latest observation. By default, simulate does not include the regression component in the model, regardless of the value of Mdl.Beta. PredictorVariables — Exogenous predictor variables xt to select from InSample string vector | cell vector of character vectors | vector of integers | logical vector Exogenous predictor variables xt to select from InSample containing predictor data for the regression component, specified as one of the following data types: • String vector or cell vector of character vectors containing numpreds variable names in InSample.Properties.VariableNames • A vector of unique indices (positive integers) of variables to select from InSample.Properties.VariableNames • A logical vector, where PredictorVariables(j) = true selects variable j from InSample.Properties.VariableNames The selected variables must be numeric vectors and cannot contain missing values (NaNs). By default, simulate excludes the regression component, regardless of its presence in Mdl. Example: PredictorVariables=["M1SL" "TB3MS" "UNRATE"] Example: PredictorVariables=[true false true false] or PredictorVariable=[1 3] selects the first and third table variables to supply the predictor data. Data Types: double | logical | char | cell | string Name-Value Arguments Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter. Before R2021a, use commas to separate each name and value, and enclose Name in quotes. Example: simulate(Mdl,10,NumPaths=1000,Y0=y0) simulates 1000 sample paths of length 10 from the ARIMA model Mdl, and uses the observations in y0 as a presample to initialize each generated path. NumPaths — Number of independent sample paths to generate 1 (default) | positive integer Number of independent sample paths to generate, specified as a positive integer. Example: NumPaths=1000 Data Types: double Y0 — Presample response data yt numeric column vector | numeric matrix 12-2160
simulate
Presample response data yt used as initial values for the model, specified as a numpreobs-by-1 numeric column vector or a numpreobs-by-numprepaths numeric matrix. Use Y0 only when you supply optional data inputs as numeric arrays. numpreobs is the number of presample observations. numprepaths is the number of presample response paths. Each row is a presample observation (sampling time), and measurements in each row occur simultaneously. The last row contains the latest presample observation. numpreobs must be at least Mdl.P to initialize the AR model component. If numpreobs > Mdl.P, simulate uses the latest required number of observations only. Columns of Y0 are separate, independent presample paths. The following conditions apply: • If Y0 is a column vector, it represents a single response path. simulate applies it to each output path. • If Y0 is a matrix, simulate applies Y0(:,j) to initialize path j. Y0 must have at least NumPaths columns; simulate uses only the first NumPaths columns of Y0. By default, simulate sets any necessary presample responses to one of the following values: • The unconditional mean of the model when Mdl represents a stationary AR process without a regression component • Zero when Mdl represents a nonstationary process or when it contains a regression component Data Types: double E0 — Presample innovation data εt numeric column vector | numeric matrix Presample innovation data εt used to initialize either the moving average (MA) component of the ARIMA model or the conditional variance model, specified as a numpreobs-by-1 numeric column vector or a numpreobs-by-numprepaths matrix. Use E0 only when you supply optional data inputs as numeric arrays. Each row is a presample observation (sampling time), and measurements in each row occur simultaneously. The last row contains the latest presample observation. numpreobs must be at least Mdl.Q to initialize the MA model component. If Mdl.Variance is a conditional variance model (for example, a garch model object), simulate can require more rows than Mdl.Q. If numpreobs is larger than required, simulate uses the latest required number of observations only. Columns of E0 are separate, independent presample paths. The following conditions apply: • If E0 is a column vector, it represents a single residual path. simulate applies it to each output path. • If E0 is a matrix, simulate applies E0(:,j) to initialize simulating path j. E0 must have at least NumPaths columns; simulate uses only the first NumPaths columns of E0. By default, simulate sets the necessary presample disturbances to zero. Data Types: double V0 — Presample conditional variance data σt2 positive numeric column vector | positive numeric matrix 12-2161
12
Functions
Presample conditional variance data σt2 used to initialize the conditional variance model, specified as a numpreobs-by-1 positive numeric column vector or a numpreobs-by-numprepaths positive numeric matrix. If the conditional variance Mdl.Variance is constant, simulate ignores V0. Use V0 only when you supply optional data inputs as numeric arrays. Each row is a presample observation (sampling time), and measurements in each row occur simultaneously. The last row contains the latest presample observation. numpreobs must be at least Mdl.Q to initialize the conditional variance model in Mdl.Variance. For details, see the simulate function of conditional variance models. If numpreobs is larger than required, simulate uses the latest required number of observations only. Columns of V0 are separate, independent presample paths. The following conditions apply: • If V0 is a column vector, it represents a single path of conditional variances. simulate applies it to each output path. • If V0 is a matrix, simulate applies V0(:,j) to initialize simulating path j. V0 must have at least NumPaths columns; simulate uses only the first NumPaths columns of V0. By default, simulate sets all necessary presample observations to the unconditional variance of the conditional variance process. Data Types: double PresampleInnovationVariable — Residual variable et to select from Presample string scalar | character vector | integer | logical vector Residual variable et to select from Presample containing the presample residual data, specified as one of the following data types: • String scalar or character vector containing a variable name in Presample.Properties.VariableNames • Variable index (positive integer) to select from Presample.Properties.VariableNames • A logical vector, where PresampleInnovationVariable(j) = true selects variable j from Presample.Properties.VariableNames The selected variable must be a numeric matrix and cannot contain missing values (NaNs). If you specify presample residual data by using the Presample name-value argument, you must specify PresampleInnovationVariable. Example: PresampleInnovationVariable="StockRateDist0" Example: PresampleInnovationVariable=[false false true false] or PresampleInnovationVariable=3 selects the third table variable as the presample innovation variable. Data Types: double | logical | char | cell | string PresampleVarianceVariable — Conditional variance variable σt2 to select from Presample string scalar | character vector | integer | logical vector Conditional variance variable σt2 to select from Presample containing presample conditional variance data, specified as one of the following data types: • String scalar or character vector containing a variable name in Presample.Properties.VariableNames 12-2162
simulate
• Variable index (positive integer) to select from Presample.Properties.VariableNames • A logical vector, where PresampleVarianceVariable(j) = true selects variable j from Presample.Properties.VariableNames The selected variable must be a numeric vector and cannot contain missing values (NaNs). If you specify presample conditional variance data by using the Presample name-value argument, you must specify PresampleVarianceVariable. Example: PresampleVarianceVariable="StockRateVar0" Example: PresampleVarianceVariable=[false false true false] or PresampleVarianceVariable=3 selects the third table variable as the presample conditional variance variable. Data Types: double | logical | char | cell | string X — Exogenous predictor data numeric matrix Exogenous predictor data for the regression component in the model, specified as a numeric matrix with numpreds columns. numpreds is the number of predictor variables (numel(Mdl.Beta)). Use X only when you supply optional data inputs as numeric arrays. Each row of X corresponds to a period in the length numobs simulation sample (period for which simulate simulates observations; the period after the presample). X must have at least numobs rows. The last row contains the latest predictor data. If X has more than numobs rows, simulate uses only the latest numobs rows. simulate does not use the regression component in the presample period. Columns of X are separate predictor variables. simulate applies X to each simulated path; that is, X represents one path of observed predictors. By default, simulate excludes the regression component, regardless of its presence in Mdl. Data Types: double Note • NaN values in X, Y0, E0, and V0 indicate missing values. simulate removes missing values from specified data by list-wise deletion. • For the presample, simulate horizontally concatenates the possibly jagged arrays Y0, E0, and V0 with respect to the last rows, and then it removes any row of the concatenated matrix containing at least one NaN. • For in-sample data, simulate removes any row of X containing at least one NaN. This type of data reduction reduces the effective sample size and can create an irregular time series. • For numeric data inputs, simulate assumes that you synchronize the presample data such that the latest observations occur simultaneously. • simulate issues an error when any table or timetable input contains missing values.
12-2163
12
Functions
Output Arguments Y — Simulated response paths yt numeric column vector | numeric matrix Simulated response paths yt, returned as a numobs-by-1 numeric column vector or a numobs-byNumPaths numeric matrix. simulate returns Y by default and when you supply optional data in numeric arrays. Y represents the continuation of the presample responses in Y0. Each row corresponds to a period in the simulated series; the simulated series has the periodicity of Mdl. Each column is a separate simulated path. E — Simulated model innovations paths εt numeric column vector | numeric matrix Simulated model innovations paths εt, returned as a numobs-by-1 numeric column vector or a numobs-by-NumPaths numeric matrix. simulate returns E by default and when you supply optional data in numeric arrays The dimensions of E correspond to the dimensions of Y. V — Simulated conditional variance paths σt2 numeric column vector | numeric matrix Simulated conditional variance paths σt2 of the mean-zero innovations associated with Y, returned as a numobs-by-1 numeric column vector or a numobs-by-NumPaths numeric matrix. simulate returns V by default and when you supply optional data in numeric arrays The dimensions of V correspond to the dimensions of Y. Tbl — Simulated response yt, innovation εt, and conditional variance σt2 paths table | timetable Simulated response yt, innovation εt, and conditional variance σt2 paths, returned as a table or timetable, the same data type as Presample or InSample. simulate returns Tbl only when you supply at least one of the inputs Presample and InSample. Tbl contains the following variables: • The simulated response paths, which are in a numobs-by-NumPaths numeric matrix, with rows representing observations and columns representing independent paths. Each path represents the continuation of the corresponding presample path in Presample, or each path corresponds, in time, with the rows of InSample. simulate names the simulated response variable in Tbl responseName_Response, where responseName is Mdl.SeriesName. For example, if Mdl.SeriesName is StockReturns, Tbl contains a variable for the corresponding simulated response paths with the name StockReturns_Response. • The simulated innovation paths, which are in a numobs-by-NumPaths numeric matrix, with rows representing observations and columns representing independent paths. Each path represents the continuation of the corresponding presample path in Presample, or each path corresponds, in time, with the rows of InSample. simulate names the simulated innovation variable in Tbl responseName_Innovation, where responseName is Mdl.SeriesName. For example, if 12-2164
simulate
Mdl.SeriesName is StockReturns, Tbl contains a variable for the corresponding simulated innovation paths with the name StockReturns_Innovation. • The simulated conditional variance paths, which are in a numobs-by-NumPaths numeric matrix, with rows representing observations and columns representing independent paths. Each path represents the continuation of the corresponding presample path in Presample, or each path corresponds, in time, with the rows of InSample. simulate names the simulated conditional variance variable in Tbl responseName_Variance, where responseName is Mdl.SeriesName. For example, if Mdl.SeriesName is StockReturns, Tbl contains a variable for the corresponding simulated conditional variance paths with the name StockReturns_Variance. • When you supply InSample, Tbl contains all variables in InSample. If Tbl is a timetable, the following conditions hold: • The row order of Tbl, either ascending or descending, matches the row order of Preample. • If you specify InSample, row times Tbl.Time are InSample.Time(1:numobs). Otherwise, Tbl.Time(1) is the next time after Presample(end) relative to the sampling frequency, and Tbl.Time(2:numobs) are the following times relative to the sampling frequency.
Version History Introduced in R2012a R2023a: simulate accepts input data in tables and timetables, and returns results in tables and timetables In addition to accepting presample and in-sample predictor data in numeric arrays, simulate accepts input data in tables or regular timetables. When you supply input data in a table or timetable, the following conditions apply: • If you specify optional presample response, innovation, or conditional variance data to initialize the model, you must also specify corresponding variable names containing the data to use. • If you specify optional in-sample predictor data for the exogenous regression component of the model, you must also specify corresponding predictor variable names containing the data to use. • simulate returns results in a table or timetable. Name-value arguments to support tabular workflows include: • Presample specifies the input table or regular timetable of presample innovations and conditional variance data. • PresampleResponseVariable specifies the variable name of the response paths to select from Presample. • PresampleInnovationVariable specifies the variable name of the innovation paths to select from Presample. • PresampleVarianceVariable specifies the variable name of the conditional variance paths to select from Presample. • InSample specifies the input table or regular timetable of in-sample predictor data. • PredictorVariables specifies the names of the predictor series to select from InSample for a model regression component. 12-2165
12
Functions
References [1] Box, George E. P., Gwilym M. Jenkins, and Gregory C. Reinsel. Time Series Analysis: Forecasting and Control. 3rd ed. Englewood Cliffs, NJ: Prentice Hall, 1994. [2] Enders, Walter. Applied Econometric Time Series. Hoboken, NJ: John Wiley & Sons, Inc., 1995. [3] Hamilton, James D. Time Series Analysis. Princeton, NJ: Princeton University Press, 1994.
See Also Objects arima Functions estimate | filter | impulse | forecast Topics “Simulate Stationary Processes” on page 7-147 “Simulate Trend-Stationary and Difference-Stationary Processes” on page 7-155 “Simulate Multiplicative ARIMA Models” on page 7-159 “Simulate Conditional Mean and Variance Models” on page 7-162 “Monte Carlo Simulation of Conditional Mean Models” on page 7-143 “Presample Data for Conditional Mean Model Simulation” on page 7-145 “Transient Effects in Conditional Mean Model Simulations” on page 7-146 “Monte Carlo Forecasting of Conditional Mean Models” on page 7-166
12-2166
simulate
simulate Simulate regression coefficients and disturbance variance of Bayesian linear regression model
Syntax [BetaSim,sigma2Sim] = simulate(Mdl) [BetaSim,sigma2Sim] = simulate(Mdl,X,y) [BetaSim,sigma2Sim] = simulate( ___ ,Name,Value) [BetaSim,sigma2Sim,RegimeSim] = simulate( ___ )
Description [BetaSim,sigma2Sim] = simulate(Mdl) returns a random vector of regression coefficients (BetaSim) and a random disturbance variance (sigma2Sim) drawn from the Bayesian linear regression model on page 12-2187 Mdl of β and σ2. • If Mdl is a joint prior model (returned by bayeslm), then simulate draws from the prior distributions. • If Mdl is a joint posterior model (returned by estimate), then simulate draws from the posterior distributions. [BetaSim,sigma2Sim] = simulate(Mdl,X,y) draws from the marginal posterior distributions produced or updated by incorporating the predictor data X and corresponding response data y. • If Mdl is a joint prior model, then simulate produces the marginal posterior distributions by updating the prior model with information about the parameters that it obtains from the data. • If Mdl is a marginal posterior model, then simulate updates the posteriors with information about the parameters that it obtains from the additional data. The complete data likelihood is composed of the additional data X and y, and the data that created Mdl. NaNs in the data indicate missing values, which simulate removes by using list-wise deletion. [BetaSim,sigma2Sim] = simulate( ___ ,Name,Value) uses any of the input argument combinations in the previous syntaxes and additional options specified by one or more name-value pair arguments. For example, you can specify a value for β or σ2 to simulate from the conditional posterior distribution of one parameter, given the specified value of the other parameter. [BetaSim,sigma2Sim,RegimeSim] = simulate( ___ ) also returns draws from the latent regime distribution if Mdl is a Bayesian linear regression model for stochastic search variable selection (SSVS), that is, if Mdl is a mixconjugateblm or mixsemiconjugateblm model object.
Examples Simulate Parameter Value from Prior and Posterior Distributions Consider the multiple linear regression model that predicts the US real gross national product (GNPR) by using a linear combination of industrial production index (IPI), total employment (E), and real wages (WR). 12-2167
12
Functions
GNPRt = β0 + β1IPIt + β2Et + β3WRt + εt . For all t, εt is a series of independent Gaussian disturbances with a mean of 0 and variance σ2. Assume these prior distributions: •
β | σ2 ∼ N4 M, σ2V . M is a 4-by-1 vector of means, and V is a scaled 4-by-4 positive definite covariance matrix.
• σ2 ∼ IG(A, B). A and B are the shape and scale, respectively, of an inverse gamma distribution. These assumptions and the data likelihood imply a normal-inverse-gamma conjugate model. Load the Nelson-Plosser data set. Create variables for the response and predictor series. load Data_NelsonPlosser varNames = {'IPI' 'E' 'WR'}; X = DataTable{:,varNames}; y = DataTable{:,'GNPR'};
Create a normal-inverse-gamma conjugate prior model for the linear regression parameters. Specify the number of predictors p and the variable names. p = 3; PriorMdl = bayeslm(p,'ModelType','conjugate','VarNames',varNames);
PriorMdl is a conjugateblm Bayesian linear regression model object representing the prior distribution of the regression coefficients and disturbance variance. Simulate a set of regression coefficients and a value of the disturbance variance from the prior distribution. rng(1); % For reproducibility [betaSimPrior,sigma2SimPrior] = simulate(PriorMdl) betaSimPrior = 4×1 -33.5917 -49.1445 -37.4492 -25.3632 sigma2SimPrior = 0.1962
betaSimPrior is the randomly drawn 4-by-1 vector of regression coefficients corresponding to the names in PriorMdl.VarNames. The sigma2SimPrior output is the randomly drawn scalar disturbance variance. Estimate the posterior distribution. PosteriorMdl = estimate(PriorMdl,X,y); Method: Analytic posterior distributions Number of observations: 62 Number of predictors: 4 Log marginal likelihood: -259.348
12-2168
simulate
| Mean Std CI95 Positive Distribution ----------------------------------------------------------------------------------Intercept | -24.2494 8.7821 [-41.514, -6.985] 0.003 t (-24.25, 8.65^2, 68) IPI | 4.3913 0.1414 [ 4.113, 4.669] 1.000 t (4.39, 0.14^2, 68) E | 0.0011 0.0003 [ 0.000, 0.002] 1.000 t (0.00, 0.00^2, 68) WR | 2.4683 0.3490 [ 1.782, 3.154] 1.000 t (2.47, 0.34^2, 68) Sigma2 | 44.1347 7.8020 [31.427, 61.855] 1.000 IG(34.00, 0.00069)
PosteriorMdl is a conjugateblm Bayesian linear regression model object representing the posterior distribution of the regression coefficients and disturbance variance. Simulate a set of regression coefficients and a value of the disturbance variance from the posterior distribution. [betaSimPost,sigma2SimPost] = simulate(PosteriorMdl) betaSimPost = 4×1 -25.9351 4.4379 0.0012 2.4072 sigma2SimPost = 41.9575
betaSimPost and sigma2SimPost have the same dimensions as betaSimPrior and sigma2SimPrior, respectively, but are drawn from the posterior.
Implement Gibbs Sampler for Posterior Estimation Consider the regression model in “Simulate Parameter Value from Prior and Posterior Distributions” on page 12-2167. Load the data and create a conjugate prior model for the regression coefficients and the disturbance variance. Then, estimate the posterior distribution and return the estimation summary table. load Data_NelsonPlosser varNames = {'IPI' 'E' 'WR'}; X = DataTable{:,varNames}; y = DataTable{:,'GNPR'}; p = 3; PriorMdl = bayeslm(p,'ModelType','conjugate','VarNames',varNames); [PosteriorMdl,Summary] = estimate(PriorMdl,X,y); Method: Analytic posterior distributions Number of observations: 62 Number of predictors: 4 Log marginal likelihood: -259.348 | Mean Std CI95 Positive Distribution ----------------------------------------------------------------------------------Intercept | -24.2494 8.7821 [-41.514, -6.985] 0.003 t (-24.25, 8.65^2, 68)
12-2169
12
Functions
IPI E WR Sigma2
| | | |
4.3913 0.0011 2.4683 44.1347
0.1414 0.0003 0.3490 7.8020
[ 4.113, 4.669] [ 0.000, 0.002] [ 1.782, 3.154] [31.427, 61.855]
1.000 1.000 1.000 1.000
t (4.39, 0.14^2, 68) t (0.00, 0.00^2, 68) t (2.47, 0.34^2, 68) IG(34.00, 0.00069)
Summary is a table containing the statistics that estimate displays at the command line. Although the marginal and conditional posterior distributions of β and σ2 are analytically tractable, this example focuses on how to implement the Gibbs sampler to reproduce known results. Estimate the model again, but use a Gibbs sampler. Alternate between sampling from the conditional posterior distributions of the parameters. Sample 10,000 times and create variables for preallocation. Start the sampler by drawing from the conditional posterior of β given σ2 = 2. m = 1e4; BetaDraws = zeros(p + 1,m); sigma2Draws = zeros(1,m + 1); sigma2Draws(1) = 2; rng(1); % For reproducibility for j = 1:m BetaDraws(:,j) = simulate(PriorMdl,X,y,'Sigma2',sigma2Draws(j)); [~,sigma2Draws(j + 1)] = simulate(PriorMdl,X,y,'Beta',BetaDraws(:,j)); end sigma2Draws = sigma2Draws(2:end); % Remove initial value from MCMC sample
Graph trace plots of the parameters. figure; for j = 1:(p + 1); subplot(2,2,j); plot(BetaDraws(j,:)) ylabel('MCMC Draw') xlabel('Simulation Index') title(sprintf('Trace Plot — %s',PriorMdl.VarNames{j})); end
12-2170
simulate
figure; plot(sigma2Draws) ylabel('MCMC Draw') xlabel('Simulation Index') title('Trace plot — Sigma2')
12-2171
12
Functions
The Markov chain Monte Carlo (MCMC) samples appear to converge and mix well. Apply a burn-in period of 1000 draws, and then compute the means and standard deviations of the MCMC samples. Compare them with the estimates from estimate. bp = 1000; postBetaMean = mean(BetaDraws(:,(bp + 1):end),2); postSigma2Mean = mean(sigma2Draws(:,(bp + 1):end)); postBetaStd = std(BetaDraws(:,(bp + 1):end),[],2); postSigma2Std = std(sigma2Draws((bp + 1):end)); [Summary(:,1:2),table([postBetaMean; postSigma2Mean],... [postBetaStd; postSigma2Std],'VariableNames',{'GibbsMean','GibbsStd'})] ans=5×4 table
Intercept IPI E WR Sigma2
Mean _________
Std __________
GibbsMean _________
GibbsStd __________
-24.249 4.3913 0.0011202 2.4683 44.135
8.7821 0.1414 0.00032931 0.34895 7.802
-24.293 4.3917 0.0011229 2.4654 44.011
8.748 0.13941 0.00032875 0.34364 7.7816
The estimates are very close. MCMC variations account for the differences.
12-2172
simulate
Simulate Regimes from SSVS Predictor Selection Consider the regression model in “Simulate Parameter Value from Prior and Posterior Distributions” on page 12-2167. Assume these prior distributions for k = 0,...,3: •
βk | σ2, γk = γkσ V k1Z1 + (1 − γk)σ V k2Z2, where Z1 and Z2 are independent, standard normal random variables. Therefore, the coefficients have a Gaussian mixture distribution. Assume all coefficients are conditionally independent, a priori, but they are dependent on the disturbance variance.
• σ2 ∼ IG(A, B). A and B are the shape and scale, respectively, of an inverse gamma distribution. • γk ∈ 0, 1 and it represents the random variable-inclusion regime variable with a discrete uniform distribution. Create a prior model for performing SSVS. Assume that β and σ2 are dependent (a conjugate mixture model). Specify the number of predictors p and the names of the regression coefficients. p = 3; PriorMdl = mixconjugateblm(p,'VarNames',["IPI" "E" "WR"]);
Load the Nelson-Plosser data set. Create variables for the response and predictor series. load Data_NelsonPlosser X = DataTable{:,PriorMdl.VarNames(2:end)}; y = DataTable{:,'GNPR'};
Compute the number of possible regimes, that is, number of combinations that result from including and excluding variables in the model. cardRegime = 2^(PriorMdl.Intercept + PriorMdl.NumPredictors) cardRegime = 16
Simulate 10,000 regimes from the posterior distribution. rng(1); [~,~,RegimeSim] = simulate(PriorMdl,X,y,'NumDraws',10000);
RegimeSim is a 4-by-1000 logical matrix. Rows correspond to the variables in Mdl.VarNames, and columns correspond to draws from the posterior distribution. Plot a histogram of the regimes visited. Recode the regimes so that they are readable. Specifically, for each regime, create a string that identifies the variables in the model, and separate the variables with dots. cRegime = num2cell(RegimeSim,1); cRegime = categorical(cellfun(@(c)join(PriorMdl.VarNames(c),"."),cRegime)); cRegime(ismissing(cRegime)) = "NoCoefficients"; histogram(cRegime); title('Variables Included in Models') ylabel('Frequency');
12-2173
12
Functions
Compute the marginal posterior probability of variable inclusion. table(mean(RegimeSim,2),'RowNames',PriorMdl.VarNames,... 'VariableNames',"Regime") ans=4×1 table Regime ______ Intercept IPI E WR
0.8829 0.4547 0.098 0.1692
Robust Regression Using Gibbs Sampler Consider a Bayesian linear regression model containing one predictor, and a t distributed disturbance variance with a profiled degrees of freedom parameter ν. •
λ j ∼ IG(ν/2, 2/ν).
• ε j | λ j ∼ N(0, λ jσ2) •
12-2174
f (β, σ2) ∝
1 σ2
simulate
These assumptions imply: • ε j ∼ t(0, σ2, ν) •
λ j | ε j ∼ IG
ν+1 2 , 2 ν + ε2 /σ2 j
λ is a vector of latent scale parameters that attributes low precision to observations far from the regression line. ν is a hyperparameter controlling the influence of λ on the observations. For this problem, the Gibbs sampler is well suited to estimate the coefficients because you can simulate the parameters of a Bayesian linear regression model conditioned on λ, and then simulate λ from its conditional distribution. Generate n = 100 responses from yt = 1 + 2xt + et, where x ∈ [0, 2] and et ∼ N(0, 0 . 52). rng('default'); n = 100; x = linspace(0,2,n)'; b0 = 1; b1 = 2; sigma = 0.5; e = randn(n,1); y = b0 + b1*x + sigma*e;
Introduce outlying responses by inflating all responses below x = 0 . 25 by a factor of 3. y(x < 0.25) = y(x < 0.25)*3;
Fit a linear model to the data. Plot the data and the fitted regression line. Mdl = fitlm(x,y) Mdl = Linear regression model: y ~ 1 + x1 Estimated Coefficients: Estimate ________ (Intercept) x1
2.6814 0.78974
SE _______
tStat ______
pValue __________
0.28433 0.24562
9.4304 3.2153
2.0859e-15 0.0017653
Number of observations: 100, Error degrees of freedom: 98 Root Mean Squared Error: 1.43 R-squared: 0.0954, Adjusted R-Squared: 0.0862 F-statistic vs. constant model: 10.3, p-value = 0.00177 figure; plot(Mdl); hl = legend; hold on;
12-2175
12
Functions
The simulated outliers appear to influence the fitted regression line. Implement this Gibbs sampler: 1
Draw parameters from the posterior distribution of β, σ2 | y, x, λ. Deflate the observations by λ, create a diffuse prior model with two regression coefficients, and draw a set of parameters from the posterior. The first regression coefficient corresponds to the intercept, so specify that bayeslm not include an intercept.
2
Compute residuals.
3
Draw values from the conditional posterior of λ.
Run the Gibbs sampler for 20,000 iterations and apply a burn-in period of 5,000. Specify ν = 1, preallocate for the posterior draws, and initialize λ to a vector of ones. m = 20000; nu = 1; burnin = 5000; lambda = ones(n,m + 1); estBeta = zeros(2,m + 1); estSigma2 = zeros(1,m + 1); for j = 1:m yDef = y./sqrt(lambda(:,j)); xDef = [ones(n,1) x]./sqrt(lambda(:,j)); PriorMdl = bayeslm(2,'Model','diffuse','Intercept',false); [estBeta(:,j + 1),estSigma2(1,j + 1)] = simulate(PriorMdl,xDef,yDef); ep = y - [ones(n,1) x]*estBeta(:,j + 1);
12-2176
simulate
sp = (nu + 1)/2; sc = 2./(nu + ep.^2/estSigma2(1,j + 1)); lambda(:,j + 1) = 1./gamrnd(sp,sc); end
A good practice is to diagnose the MCMC sampler by examining trace plots. For brevity, this example skips this task. Compute the mean of the draws from the posterior of the regression coefficients. Remove the burn-in period draws. postEstBeta = mean(estBeta(:,(burnin + 1):end),2) postEstBeta = 2×1 1.3971 1.7051
The estimate of the intercept is lower and the slope is higher than the estimates returned by fitlm. Plot the robust regression line with the regression line fitted by least squares. h = gca; xlim = h.XLim'; plotY = [ones(2,1) xlim]*postEstBeta; plot(xlim,plotY,'LineWidth',2); hl.String{4} = 'Robust Bayes';
12-2177
12
Functions
The regression line fit using robust Bayesian regression appears to be a better fit.
Estimate Maximum A Posteriori Probability Using Monte Carlo The maximum a posteriori probability (MAP) estimate is the posterior mode, that is, the parameter value that yields the maximum of the posterior pdf. If the posterior is analytically intractable, then you can use Monte Carlo sampling to estimate the MAP. Consider the linear regression model in “Simulate Parameter Value from Prior and Posterior Distributions” on page 12-2167. Load the Nelson-Plosser data set. Create variables for the response and predictor series. load Data_NelsonPlosser varNames = {'IPI' 'E' 'WR'}; X = DataTable{:,varNames}; y = DataTable{:,'GNPR'};
Create a normal-inverse-gamma conjugate prior model for the linear regression parameters. Specify the number of predictors p and the variable names. p = 3; PriorMdl = bayeslm(p,'ModelType','conjugate','VarNames',varNames) PriorMdl = conjugateblm with properties: NumPredictors: Intercept: VarNames: Mu: V: A: B:
3 1 {4x1 cell} [4x1 double] [4x4 double] 3 1
| Mean Std CI95 Positive Distribution ----------------------------------------------------------------------------------Intercept | 0 70.7107 [-141.273, 141.273] 0.500 t (0.00, 57.74^2, 6) IPI | 0 70.7107 [-141.273, 141.273] 0.500 t (0.00, 57.74^2, 6) E | 0 70.7107 [-141.273, 141.273] 0.500 t (0.00, 57.74^2, 6) WR | 0 70.7107 [-141.273, 141.273] 0.500 t (0.00, 57.74^2, 6) Sigma2 | 0.5000 0.5000 [ 0.138, 1.616] 1.000 IG(3.00, 1)
Estimate the marginal posterior distributions of β and σ2. rng(1); % For reproducibility PosteriorMdl = estimate(PriorMdl,X,y); Method: Analytic posterior distributions Number of observations: 62 Number of predictors: 4 Log marginal likelihood: -259.348
12-2178
simulate
| Mean Std CI95 Positive Distribution ----------------------------------------------------------------------------------Intercept | -24.2494 8.7821 [-41.514, -6.985] 0.003 t (-24.25, 8.65^2, 68) IPI | 4.3913 0.1414 [ 4.113, 4.669] 1.000 t (4.39, 0.14^2, 68) E | 0.0011 0.0003 [ 0.000, 0.002] 1.000 t (0.00, 0.00^2, 68) WR | 2.4683 0.3490 [ 1.782, 3.154] 1.000 t (2.47, 0.34^2, 68) Sigma2 | 44.1347 7.8020 [31.427, 61.855] 1.000 IG(34.00, 0.00069)
The display includes the marginal posterior distribution statistics. Extract the posterior mean of β from the posterior model, and extract the posterior covariance of β from the estimation summary returned by summarize. estBetaMean = PosteriorMdl.Mu; Summary = summarize(PosteriorMdl); EstBetaCov = Summary.Covariances{1:(end - 1),1:(end - 1)};
estBetaMean is a 4-by-1 vector representing the mean of the marginal posterior of β. EstBetaCov is a 4-by-4 matrix representing the covariance matrix of the posterior of β. Draw 10,000 parameter values from the posterior distribution. rng(1); % For reproducibility [BetaSim,sigma2Sim] = simulate(PosteriorMdl,'NumDraws',1e5);
BetaSim is a 4-by-10,000 matrix of randomly drawn regression coefficients. sigma2Sim is a 1by-10,000 vector of randomly drawn disturbance variances. Transpose and standardize the matrix of regression coefficients. Compute the correlation matrix of the regression coefficients. estBetaStd = sqrt(diag(EstBetaCov)'); BetaSim = BetaSim'; BetaSimStd = (BetaSim - estBetaMean')./estBetaStd; BetaCorr = corrcov(EstBetaCov); BetaCorr = (BetaCorr + BetaCorr')/2; % Enforce symmetry
Because the marginal posterior distributions are known, evaluate the posterior pdf at all simulated values. betaPDF = mvtpdf(BetaSimStd,BetaCorr,68); a = 34; b = 0.00069; igPDF = @(x,ap,bp)1./(gamma(ap).*bp.^ap).*x.^(-ap-1).*exp(-1./(x.*bp));... % Inverse gamma pdf sigma2PDF = igPDF(sigma2Sim,a,b);
Find the simulated values that maximize the respective pdfs, that is, the posterior modes. [~,idxMAPBeta] = max(betaPDF); [~,idxMAPSigma2] = max(sigma2PDF); betaMAP = BetaSim(idxMAPBeta,:); sigma2MAP = sigma2Sim(idxMAPSigma2);
betaMAP and sigma2MAP are the MAP estimates. 12-2179
12
Functions
Because the posterior of β is symmetric and unimodal, the posterior mean and MAP should be the same. Compare the MAP estimate of β with its posterior mean. table(betaMAP',PosteriorMdl.Mu,'VariableNames',{'MAP','Mean'},... 'RowNames',PriorMdl.VarNames) ans=4×2 table
Intercept IPI E WR
MAP _________
Mean _________
-24.559 4.3964 0.0011389 2.4473
-24.249 4.3913 0.0011202 2.4683
The estimates are fairly close to one another. Estimate the analytical mode of the posterior of σ2. Compare it to the estimated MAP of σ2. igMode = 1/(b*(a+1)) igMode = 41.4079 sigma2MAP sigma2MAP = 41.4075
These estimates are also fairly close.
Input Arguments Mdl — Bayesian linear regression model conjugateblm model object | semiconjugateblm model object | diffuseblm model object | mixconjugateblm model object | lassoblm model object Standard Bayesian linear regression model or model for predictor variable selection, specified as a model object in this table.
12-2180
Model Object
Description
conjugateblm
Dependent, normal-inverse-gamma conjugate model returned by bayeslm or estimate
semiconjugateblm
Independent, normal-inverse-gamma semiconjugate model returned by bayeslm
diffuseblm
Diffuse prior model returned by bayeslm
empiricalblm
Prior model characterized by samples from prior distributions, returned by bayeslm or estimate
customblm
Prior distribution function that you declare returned by bayeslm
mixconjugateblm
Dependent, Gaussian-mixture-inverse-gamma conjugate model for SSVS predictor variable selection, returned by bayeslm
simulate
Model Object
Description
mixsemiconjugateblm
Independent, Gaussian-mixture-inverse-gamma semiconjugate model for SSVS predictor variable selection, returned by bayeslm
lassoblm
Bayesian lasso regression model returned by bayeslm
Note • Typically, model objects returned by estimate represent marginal posterior distributions. When you estimate a posterior by using estimate, if you specify estimation of a conditional posterior, then estimate returns the prior model. • If Mdl is a diffuseblm model, then you must also supply X and y because simulate cannot draw from an improper prior distribution. • If you supply a lassoblm, mixconjugateblm, or mixsemiconjugateblm model object, supply the data X and y, and draw one value from the posterior, then a best practice is to initialize the Gibbs sampler by specifying the BetaStart and Sigma2Start name-value pair arguments.
X — Predictor data numeric matrix Predictor data for the multiple linear regression model, specified as a numObservations-byPriorMdl.NumPredictors numeric matrix. numObservations is the number of observations and must be equal to the length of y. If Mdl is a posterior distribution, then the columns of X must correspond to the columns of the predictor data used to estimate the posterior. Data Types: double y — Response data numeric vector Response data for the multiple linear regression model, specified as a numeric vector with numObservations elements. Data Types: double Name-Value Pair Arguments Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter. Before R2021a, use commas to separate each name and value, and enclose Name in quotes. Example: 'Sigma2',2 specifies simulating from the conditional posterior distribution of the regression coefficients given the data and the specified disturbance variance of 2.
12-2181
12
Functions
Options for All Models
NumDraws — Effective number of draws 1 (default) | positive integer Number of draws to sample from the distribution Mdl, specified as the comma-separated pair consisting of 'NumDraws' and a positive integer. Tip If Mdl is an empiricalblm or a customblm model object, then a good practice is to specify a burn-in period with BurnIn and a thinning multiplier with Thin. For details on the adjusted sample size, see “Algorithms” on page 12-2187. Example: 'NumDraws',1e7 Data Types: double Options for All Models Except Empirical
Beta — Value of regression coefficients for simulation from conditional distribution of disturbance variance empty array ([]) (default) | numeric column vector Value of the regression coefficients for simulation from conditional distribution of the disturbance variance, specified as the comma-separated pair consisting of 'Beta' and an (Mdl.Intercept + Mdl.NumPredictors)-by-1 numeric vector. When using a posterior distribution, simulate draws from π(σ2|y,X,β = Beta), where y is y, X is X, and Beta is the value of 'Beta'. If Mdl.Intercept is true, then Beta(1) corresponds to the model intercept. All other values correspond to the predictor variables that compose the columns of X. You cannot specify Beta and Sigma2 simultaneously. By default, simulate does not draw from the conditional posterior of σ2. Example: 'Beta',1:3 Data Types: double Sigma2 — Value of disturbance variance for simulation from conditional distribution of regression coefficients empty array ([]) (default) | positive numeric scalar Value of the disturbance variance for simulation from the conditional distribution of the regression coefficients, specified as the comma-separated pair consisting of 'Sigma2' and a positive numeric scalar. When using a posterior distribution, simulate draws from π(β|y,X,Sigma2), where y is y, X is X, and Sigma2 is the value of 'Sigma2'. You cannot specify Sigma2 and Beta simultaneously. By default, simulate does not draw from the conditional posterior of β. Example: 'Sigma2',1 Data Types: double
12-2182
simulate
Options for All Models Except Conjugate and Empirical
BurnIn — Number of draws to remove from beginning of sample 0 (default) | nonnegative scalar Number of draws to remove from the beginning of the sample to reduce transient effects, specified as the comma-separated pair consisting of 'BurnIn' and a nonnegative scalar. For details on how simulate reduces the full sample, see “Algorithms” on page 12-2187. Tip To help you specify the appropriate burn-in period size: 1
Determine the extent of the transient behavior in the sample by specifying 'BurnIn',0.
2
Simulate a few thousand observations by using simulate.
3
Draw trace plots.
Example: 'BurnIn',0 Data Types: double Thin — Adjusted sample size multiplier 1 (default) | positive integer Adjusted sample size multiplier, specified as the comma-separated pair consisting of 'Thin' and a positive integer. The actual sample size is BurnIn + NumDraws*Thin. After discarding the burn-in, simulate discards every Thin – 1 draws, and then retains the next draw. For more details on how simulate reduces the full sample, see “Algorithms” on page 12-2187. Tip To reduce potential large serial correlation in the sample, or to reduce the memory consumption of the draws stored in PosteriorMdl, specify a large value for Thin. Example: 'Thin',5 Data Types: double BetaStart — Starting values of regression coefficients for sampler numeric column vector Starting values of the regression coefficients for the sampler, specified as the comma-separated pair consisting of 'BetaStart' and a numeric column vector with (Mdl.Intercept + Mdl.NumPredictors) elements. By default, BetaStart is the ordinary least-squares (OLS) estimate. Tip A good practice is to run simulate multiple times with different parameter starting values. Verify that your estimates from each run converge to similar values. Example: 'BetaStart',[1; 2; 3] Data Types: double 12-2183
12
Functions
Sigma2Start — Starting values of disturbance variance for sampler positive numeric scalar Starting values of the disturbance variance for the sampler, specified as the comma-separated pair consisting of 'Sigma2Start' and a positive numeric scalar. By default, Sigma2Start is the OLS residual mean squared error. Tip A good practice is to run simulate multiple times with different parameter starting values. Verify that your estimates from each run converge to similar values. Example: 'Sigma2Start',4 Data Types: double Options for SSVS Models
RegimeStart — Starting values of latent regimes for sampler true(Mdl.Intercept + Mdl.NumPredictors) (default) | logical column vector Starting values of the latent regimes for the sampler, specified as the comma-separated pair consisting of 'RegimeStart' and a logical column vector with (Mdl.Intercept + Mdl.NumPredictors) elements. RegimeStart(k) = true indicates the inclusion of the variable Mdl.VarNames(k), and RegimeStart(k) = false indicates the exclusion of that variable. Tip A good practice is to run simulate multiple times using different parameter starting values. Verify that your estimates from each run converge to similar values. Example: 'RegimeStart',logical(randi([0 1],Mdl.Intercept + Mdl.NumPredictors,1)) Data Types: double Options for Custom Models
Reparameterize — Reparameterization of σ2 as log(σ2) false (default) | true Reparameterization of σ2 as log(σ2) during posterior estimation and simulation, specified as the comma-separated pair consisting of 'Reparameterize' and a value in this table. Value
Description
false
simulate does not reparameterize σ2.
true
simulate reparameterizes σ2 as log(σ2). simulate converts results back to the original scale and does not change the functional form of PriorMdl.LogPDF.
Tip If you experience numeric instabilities during the posterior estimation or simulation of σ2, then specify 'Reparameterize',true. Example: 'Reparameterize',true 12-2184
simulate
Data Types: logical Sampler — MCMC sampler 'slice' (default) | 'metropolis' | 'hmc' MCMC sampler, specified as the comma-separated pair consisting of 'Sampler' and a value in this table. Value
Description
'slice'
Slice sampler
'metropolis'
Random walk Metropolis sampler
'hmc'
Hamiltonian Monte Carlo (HMC) sampler
Tip • To increase the quality of the MCMC draws, tune the sampler. 1
Before calling simulate, specify the tuning parameters and their values by using sampleroptions. For example, to specify the slice sampler width width, use: options = sampleroptions('Sampler',"slice",'Width',width);
2
Specify the object containing the tuning parameter specifications returned by sampleroptions by using the 'Options' name-value pair argument. For example, to use the tuning parameter specifications in options, specify: 'Options',options
• If you specify the HMC sampler, then a best practice is to provide the gradient for some variables, at least. simulate resorts the numerical computation of any missing partial derivatives (NaN values) in the gradient vector.
Example: 'Sampler',"hmc" Data Types: string Options — Sampler options [] (default) | structure array Sampler options, specified as the comma-separated pair consisting of 'Options' and a structure array returned by sampleroptions. Use 'Options' to specify the MCMC sampler and its tuningparameter values. Example: 'Options',sampleroptions('Sampler',"hmc") Data Types: struct Width — Typical sampling-interval width positive numeric scalar | numeric vector of positive values Typical sampling-interval width around the current value in the marginal distributions for the slice sampler, specified as the comma-separated pair consisting of 'Width' and a positive numeric scalar or a (PriorMdl.Intercept + PriorMdl.NumPredictors + 1)-by-1 numeric vector of positive values. The first element corresponds to the model intercept, if one exists in the model. The next 12-2185
12
Functions
PriorMdl.NumPredictors elements correspond to the coefficients of the predictor variables ordered by the predictor data columns. The last element corresponds to the model variance. • If Width is a scalar, then simulate applies Width to all PriorMdl.NumPredictors + PriorMdl.Intercept + 1 marginal distributions. • If Width is a numeric vector, then simulate applies the first element to the intercept (if one exists), the next PriorMdl.NumPredictors elements to the regression coefficients corresponding to the predictor variables in X, and the last element to the disturbance variance. • If the sample size (size(X,1)) is less than 100, then Width is 10 by default. • If the sample size is at least 100, then simulate sets Width to the vector of corresponding posterior standard deviations by default, assuming a diffuse prior model (diffuseblm). The typical width of the slice sampler does not affect convergence of the MCMC sample. It does affect the number of required function evaluations, that is, the efficiency of the algorithm. If the width is too small, then the algorithm can implement an excessive number of function evaluations to determine the appropriate sampling width. If the width is too large, then the algorithm might have to decrease the width to an appropriate size, which requires function evaluations. simulate sends Width to the slicesample function. For more details, see slicesample. Tip • For maximum flexibility, specify the slice sampler width width by using the 'Options' namevalue pair argument. For example: 'Options',sampleroptions('Sampler',"slice",'Width',width)
Example: 'Width',[100*ones(3,1);10]
Output Arguments BetaSim — Simulated regression coefficients numeric matrix Simulated regression coefficients, returned as an (Mdl.Intercept + Mdl.NumPredictors)-byNumDraws numeric matrix. Rows correspond to the variables in Mdl.VarNames, and columns correspond to individual, successive, independent draws from the distribution. sigma2Sim — Simulated disturbance variance numeric vector of positive values Simulated disturbance variance, returned as a 1-by-NumDraws numeric vector of positive values. Columns correspond to individual, successive, independent draws from the distribution. RegimeSim — Simulated regimes logical matrix Simulated regimes indicating variable inclusion or exclusion from the model, returned as an (Mdl.Intercept + Mdl.NumPredictors)-by-NumDraws logical matrix. Rows correspond to the variables in Mdl.VarNames, and columns correspond to individual, successive, independent draws from the distribution. 12-2186
simulate
simulate returns Regime only if Mdl is a mixconjugateblm or mixsemiconjugateblm model object. RegimeSim(k,d) = true indicates the inclusion of the variable Mdl.VarNames(k) in the model of draw d, and RegimeSim(k,d) = false indicates the exclusion of that variable in the model of draw d.
Limitations • simulate cannot draw values from an improper distribution, that is, a distribution whose density does not integrate to 1. • If Mdl is an empiricalblm model object, then you cannot specify Beta or Sigma2. You cannot simulate from the conditional posterior distributions by using an empirical distribution.
More About Bayesian Linear Regression Model A Bayesian linear regression model treats the parameters β and σ2 in the multiple linear regression (MLR) model yt = xtβ + εt as random variables. For times t = 1,...,T: • yt is the observed response. • xt is a 1-by-(p + 1) row vector of observed values of p predictors. To accommodate a model intercept, x1t = 1 for all t. • β is a (p + 1)-by-1 column vector of regression coefficients corresponding to the variables that compose the columns of xt. • εt is the random disturbance with a mean of zero and Cov(ε) = σ2IT×T, while ε is a T-by-1 vector containing all disturbances. These assumptions imply that the data likelihood is ℓ β, σ2 y, x =
T
∏
t=1
ϕ yt; xt β, σ2 .
ϕ(yt;xtβ,σ2) is the Gaussian probability density with mean xtβ and variance σ2 evaluated at yt;. Before considering the data, you impose a joint prior distribution assumption on (β,σ2). In a Bayesian analysis, you update the distribution of the parameters by using information about the parameters obtained from the likelihood of the data. The result is the joint posterior distribution of (β,σ2) or the conditional posterior distributions of the parameters.
Algorithms • Whenever simulate must estimate a posterior distribution (for example, when Mdl represents a prior distribution and you supply X and y) and the posterior is analytically tractable, simulate simulates directly from the posterior. Otherwise, simulate resorts to Monte Carlo simulation to estimate the posterior. For more details, see “Posterior Estimation and Inference” on page 6-4. • If Mdl is a joint posterior model, then simulate simulates data from it differently compared to when Mdl is a joint prior model and you supply X and y. Therefore, if you set the same random seed and generate random values both ways, then you might not obtain the same values. However, 12-2187
12
Functions
corresponding empirical distributions based on a sufficient number of draws is effectively equivalent. • This figure shows how simulate reduces the sample by using the values of NumDraws, Thin, and BurnIn.
Rectangles represent successive draws from the distribution. simulate removes the white rectangles from the sample. The remaining NumDraws black rectangles compose the sample. • If Mdl is a semiconjugateblm model object, then simulate samples from the posterior distribution by applying the Gibbs sampler. 1
simulate uses the default value of Sigma2Start for σ2 and draws a value of β from π(β| σ2,X,y).
2
simulate draws a value of σ2 from π(σ2|β,X,y) by using the previously generated value of β.
3
The function repeats steps 1 and 2 until convergence. To assess convergence, draw a trace plot of the sample.
If you specify BetaStart, then simulate draws a value of σ2 from π(σ2|β,X,y) to start the Gibbs sampler. simulate does not return this generated value of σ2. • If Mdl is an empiricalblm model object and you do not supply X and y, then simulate draws from Mdl.BetaDraws and Mdl.Sigma2Draws. If NumDraws is less than or equal to numel(Mdl.Sigma2Draws), then simulate returns the first NumDraws elements of Mdl.BetaDraws and Mdl.Sigma2Draws as random draws for the corresponding parameter. Otherwise, simulate randomly resamples NumDraws elements from Mdl.BetaDraws and Mdl.Sigma2Draws. • If Mdl is a customblm model object, then simulate uses an MCMC sampler to draw from the posterior distribution. At each iteration, the software concatenates the current values of the regression coefficients and disturbance variance into an (Mdl.Intercept + Mdl.NumPredictors + 1)-by-1 vector, and passes it to Mdl.LogPDF. The value of the disturbance variance is the last element of this vector. • The HMC sampler requires both the log density and its gradient. The gradient should be a (NumPredictors+Intercept+1)-by-1 vector. If the derivatives of certain parameters are difficult to compute, then, in the corresponding locations of the gradient, supply NaN values instead. simulate replaces NaN values with numerical derivatives. • If Mdl is a lassoblm, mixconjugateblm, or mixsemiconjugateblm model object and you supply X and y, then simulate samples from the posterior distribution by applying the Gibbs sampler. If you do not supply the data, then simulate samples from the analytical, unconditional prior distributions. • simulate does not return default starting values that it generates. 12-2188
simulate
• If Mdl is a mixconjugateblm or mixsemiconjugateblm, then simulate draws from the regime distribution first, given the current state of the chain (the values of RegimeStart, BetaStart, and Sigma2Start). If you draw one sample and do not specify values for RegimeStart, BetaStart, and Sigma2Start, then simulate uses the default values and issues a warning.
Version History Introduced in R2017a
See Also Objects conjugateblm | customblm | empiricalblm | semiconjugateblm | diffuseblm | mixconjugateblm | mixsemiconjugateblm | lassoblm Functions forecast | estimate | sampleroptions Topics “Bayesian Linear Regression” on page 6-2 “Implement Bayesian Linear Regression” on page 6-10
12-2189
12
Functions
simulate Simulate posterior draws of Bayesian state-space model parameters
Syntax [Params,accept] = simulate(PriorMdl,Y,params0,Proposal) [Params,accept] = simulate(PriorMdl,Y,params0,Proposal,Name=Value) [Params,accept,Output] = simulate(PriorMdl,Y,params0,Proposal,Name=Value)
Description [Params,accept] = simulate(PriorMdl,Y,params0,Proposal) returns 1000 random vectors of state-space model parameters Params drawn from the posterior distribution Π(θ|Y), where PriorMdl specifies the prior distribution and data likelihood, and Y is the observed response data. params0 is the set of initial parameter values and Proposal is the covariance matrix of the proposal distribution of the Metropolis-Hastings sampler [1][2]. accept is the acceptance rate of the proposal draws. [Params,accept] = simulate(PriorMdl,Y,params0,Proposal,Name=Value) specifies options using one or more name-value arguments. For example, simulate(PriorMdl,Y,params0,Proposal,NumDraws=1e6,Thin=3,DoF=10) uses the multivariate t10 distribution for the Metropolis-Hastings proposal, draws 3e6 random vectors of parameters, and thins the sample to reduce serial correlation by discarding every 2 draws until it retains 1e6 draws. [Params,accept,Output] = simulate(PriorMdl,Y,params0,Proposal,Name=Value) also returns the output Output of the custom function that monitors the Markov-chain Monte Carlo (MCMC) algorithm at each iteration, specified by the OutputFunction name-value argument.
Examples Draw Random Parameters from Posterior Distribution of Time-Invariant Model Simulate observed responses from a known state-space model, then treat the model as Bayesian and draw parameters from the posterior distribution. Suppose the following state-space model is a data-generating process (DGP). xt, 1 xt, 2
=
xt − 1, 1 0.5 0 1 0 ut, 1 + 0 −0 . 75 xt − 1, 2 0 0 . 5 ut, 2
yt = 1 1
xt, 1 xt, 2
.
Create a standard state-space model object ssm that represents the DGP. trueTheta = [0.5; -0.75; 1; 0.5]; A = [trueTheta(1) 0; 0 trueTheta(2)];
12-2190
simulate
B = [trueTheta(3) 0; 0 trueTheta(4)]; C = [1 1]; DGP = ssm(A,B,C);
Simulate a response path from the DGP. rng(1); % For reproducibility y = simulate(DGP,200);
Suppose the structure of the DGP is known, but the state parameters trueTheta are unknown, explicitly xt, 1 xt, 2
=
ϕ1 0 xt − 1, 1
yt = 1 1
0 ϕ2 xt − 1, 2 xt, 1 xt, 2
+
σ1 0 ut, 1 0 σ2 ut, 2
.
Consider a Bayesian state-space model representing the model with unknown parameters. Arbitrarily assume that the prior distribution of ϕ1, ϕ2, σ12, and σ22 are independent Gaussian random variables with mean 0.5 and variance 1. The Local Functions on page 12-2193 section contains two functions required to specify the Bayesian state-space model. You can use the functions only within this script. The paramMap function accepts a vector of the unknown state-space model parameters and returns all the following quantities: • •
A= B=
ϕ1 0 0 ϕ2 σ1 0 0 σ2
. .
• C= 1 1. • D = 0. • Mean0 and Cov0 are empty arrays [], which specify the defaults. • StateType = 0 0 , indicating that each state is stationary. The paramDistribution function accepts the same vector of unknown parameters as does paramMap, but it returns the log prior density of the parameters at their current values. Specify that parameter values outside the parameter space have log prior density of -Inf. Create the Bayesian state-space model by passing function handles directly to paramMap and paramDistribution to bssm. Mdl = bssm(@paramMap,@priorDistribution) Mdl = Mapping that defines a state-space model: @paramMap Log density of parameter prior distribution:
12-2191
12
Functions
@priorDistribution
The simulate function requires a proposal distribution scale matrix. You can obtain a data-driven proposal scale matrix by using the tune function. Alternatively, you can supply your own scale matrix. Obtain a data-driven scale matrix by using the tune function. Supply a random set of initial parameter values, and shut off the estimation summary display. numParams = 4; theta0 = rand(numParams,1); [theta0,Proposal] = tune(Mdl,y,theta0,Display=false); Local minimum found. Optimization completed because the size of the gradient is less than the value of the optimality tolerance.
Draw 1000 random parameter vectors from the posterior distribution. Specify the simulated response path as observed responses and the optimized values returned by tune for the initial parameter values and the proposal distribution. [Theta,accept] = simulate(Mdl,y,theta0,Proposal); accept accept = 0.4010
Theta is a 4-by-1000 matrix of randomly drawn parameters from the posterior distribution. Rows correspond to the elements of the input argument theta of the functions paramMap and priorDistribution. accept is the proposal acceptance probability. In this case, simulate accepts 40% of the proposal draws. Create trace plots of the parameters. paramNames = ["\phi_1" "\phi_2" "\sigma_1" "\sigma_2"]; figure h = tiledlayout(4,1); for j = 1:numParams nexttile plot(Theta(j,:)) hold on yline(trueTheta(j)) ylabel(paramNames(j)) end title(h,"Posterior Trace Plots")
12-2192
simulate
The sampler eventually settles at near the true values of the parameters. In this case, the sample shows serial correlation and transient behavior. You can remedy serial correlation in the sample by adjusting the Thin name-value argument, and you can remedy transient effects by increasing the burn-in period using the BurnIn name-value argument. Local Functions This example uses the following functions. paramMap is the parameter-to-matrix mapping function and priorDistribution is the log prior distribution of the parameters. function [A,B,C,D,Mean0,Cov0,StateType] = paramMap(theta) A = [theta(1) 0; 0 theta(2)]; B = [theta(3) 0; 0 theta(4)]; C = [1 1]; D = 0; Mean0 = []; % MATLAB uses default initial state mean Cov0 = []; % MATLAB uses initial state covariances StateType = [0; 0]; % Two stationary states end function logprior = priorDistribution(theta) paramconstraints = [(abs(theta(1)) >= 1) (abs(theta(2)) >= 1) ... (theta(3) < 0) (theta(4) < 0)]; if(sum(paramconstraints)) logprior = -Inf; else mu0 = 0.5*ones(numel(theta),1);
12-2193
12
Functions
sigma0 = 1; p = normpdf(theta,mu0,sigma0); logprior = sum(log(p)); end end
Improve Markov Chain Convergence Consider the model in the example “Draw Random Parameters from Posterior Distribution of TimeInvariant Model” on page 12-2190. Improve the Markov chain convergence by adjusting sampler options. Create a standard state-space model object ssm that represents the DGP, and then simulate a response path. trueTheta = [0.5; -0.75; 1; 0.5]; A = [trueTheta(1) 0; 0 trueTheta(2)]; B = [trueTheta(3) 0; 0 trueTheta(4)]; C = [1 1]; DGP = ssm(A,B,C); rng(1); % For reproducibility y = simulate(DGP,200);
Create the Bayesian state-space model by passing function handles directly to paramMap and paramDistribution to bssm (the functions are in Local Functions on page 12-2196). Mdl = bssm(@paramMap,@priorDistribution) Mdl = Mapping that defines a state-space model: @paramMap Log density of parameter prior distribution: @priorDistribution
Simulate random parameter vectors from the posterior distribution. Specify the simulated response path as observed responses, and obtain an optimal proposal distribution by using tune and shut off all optimization displays. The plots in “Draw Random Parameters from Posterior Distribution of TimeInvariant Model” on page 12-2190 suggest that the Markov chain settles after 500 draws. Therefore, specify a burn-in period of 500 (BurnIn=500). Specify thinning the sample by keeping the first draw of each set of 30 successive draws (Thin=30). Retain 2000 random parameter vectors (NumDraws=2000). numParams = 4; theta0 = rand(numParams,1); options = optimoptions("fminunc",Display="off"); [theta0,Proposal] = tune(Mdl,y,theta0,Display=false,Options=options); [Theta,accept] = simulate(Mdl,y,theta0,Proposal, ... NumDraws=2000,BurnIn=500,Thin=30); accept accept = 0.3885
12-2194
simulate
Theta is a 4-by-2000 matrix of randomly drawn parameters from the posterior distribution. Rows correspond to the elements of the input argument theta of the functions paramMap and priorDistribution. accept is the proposal acceptance probability. In this case, simulate accepts 39% of the proposal draws. Create trace plots and correlograms of the parameters. paramNames = ["\phi_1" "\phi_2" "\sigma_1" "\sigma_2"]; figure h = tiledlayout(4,1); for j = 1:numParams nexttile plot(Theta(j,:)) hold on yline(trueTheta(j)) ylabel(paramNames(j)) end title(h,"Posterior Trace Plots")
figure h = tiledlayout(4,1); for j = 1:numParams nexttile autocorr(Theta(j,:)); ylabel(paramNames(j));
12-2195
12
Functions
title([]); end title(h,"Posterior Sample Correlograms")
The sampler quickly settles near the true values of the parameters. The sample shows little serial correlation and no transient behavior. Local Functions This example uses the following functions. paramMap is the parameter-to-matrix mapping function and priorDistribution is the log prior distribution of the parameters. function [A,B,C,D,Mean0,Cov0,StateType] = paramMap(theta) A = [theta(1) 0; 0 theta(2)]; B = [theta(3) 0; 0 theta(4)]; C = [1 1]; D = 0; Mean0 = []; % MATLAB uses default initial state mean Cov0 = []; % MATLAB uses initial state covariances StateType = [0; 0]; % Two stationary states end function logprior = priorDistribution(theta) paramconstraints = [(abs(theta(1)) >= 1) (abs(theta(2)) >= 1) ... (theta(3) < 0) (theta(4) < 0)]; if(sum(paramconstraints)) logprior = -Inf; else
12-2196
simulate
mu0 = 0.5*ones(numel(theta),1); sigma0 = 1; p = normpdf(theta,mu0,sigma0); logprior = sum(log(p)); end end
Simulate Parameters from Posterior of Time-Varying Model Consider the following time-varying, state-space model for a DGP: • From periods 1 through 250, the state equation includes stationary AR(2) and MA(1) models, respectively, and the observation model is the weighted sum of the two states. • From periods 251 through 500, the state model includes only the first AR(2) model. • μ0 = 0 . 5 0 . 5 0 0 and Σ0 is the identity matrix. Symbolically, the DGP is x1t x2t x3t
=
x4t
ϕ1 ϕ2 0 0 x1, t − 1 1 0 0 0 x2, t − 1
σ1 0 0 0 u1t 0 1 u2t for t = 1, . . . , 250, 0 1
+
0 0 0 θ x3, t − 1 0 0 0 0 x4, t − 1
yt = c1 x1t + x3t + σ2εt . x1, t − 1 x1t x2t
=
ϕ1 ϕ2 0 0 x2, t − 1
σ1
+
1 0 0 0 x3, t − 1
0
u1t
for t = 251,
x4, t − 1 yt = c2x1t + σ3εt . x1t x2t
=
ϕ1 ϕ2 x1, t − 1 1 0 x2, t − 1
+
σ1 0
u1t
for t = 252, . . . , 500 .
yt = c2x1t + σ3εt . where: • The AR(2) parameters ϕ1, ϕ2 = 0 . 5, − 0 . 2 and σ1 = 0 . 4. • The MA(1) parameter θ = 0 . 3. • The observation equation parameters c1, c2 = 2, 3 and σ2, σ3 = 0 . 1, 0 . 2 . Write a function that specifies how the parameters theta and sample size T map to the state-space model matrices, the initial state moments, and the state types. Save this code as a file named timeVariantParamMapBayes.m on your MATLAB® path. Alternatively, open the example to access the function. type timeVariantParamMapBayes.m % Copyright 2022 The MathWorks, Inc.
12-2197
12
Functions
function [A,B,C,D,Mean0,Cov0,StateType] = timeVariantParamMapBayes(theta,T) % Time-variant, Bayesian state-space model parameter mapping function % example. This function maps the vector params to the state-space matrices % (A, B, C, and D), the initial state value and the initial state variance % (Mean0 and Cov0), and the type of state (StateType). From periods 1 % through T/2, the state model is a stationary AR(2) and an MA(1) model, % and the observation model is the weighted sum of the two states. From % periods T/2 + 1 through T, the state model is the AR(2) model only. The % log prior distribution enforces parameter constraints (see % flatPriorBSSM.m). T1 = floor(T/2); T2 = T - T1 - 1; A1 = {[theta(1) theta(2) 0 0; 1 0 0 0; 0 0 0 theta(4); 0 0 0 0]}; B1 = {[theta(3) 0; 0 0; 0 1; 0 1]}; C1 = {theta(5)*[1 0 1 0]}; D1 = {theta(6)}; Mean0 = [0.5 0.5 0 0]; Cov0 = eye(4); StateType = [0 0 0 0]; A2 = {[theta(1) theta(2) 0 0; 1 0 0 0]}; B2 = {[theta(3); 0]}; A3 = {[theta(1) theta(2); 1 0]}; B3 = {[theta(3); 0]}; C3 = {theta(7)*[1 0]}; D3 = {theta(8)}; A = [repmat(A1,T1,1); A2; repmat(A3,T2,1)]; B = [repmat(B1,T1,1); B2; repmat(B3,T2,1)]; C = [repmat(C1,T1,1); repmat(C3,T2+1,1)]; D = [repmat(D1,T1,1); repmat(D3,T2+1,1)]; end
Simulate a response path of length 500 from the model. params = [0.5; -0.2; 0.4; 0.3; 2; 0.1; 3; 0.2]; numObs = 500; numParams = numel(params); [A,B,C,D,mean0,Cov0,stateType] = timeVariantParamMapBayes(params,numObs); DGP = ssm(A,B,C,D,Mean0=mean0,Cov0=Cov0,StateType=stateType); rng(1) % For reproducibility y = simulate(DGP,numObs); plot(y) ylabel("y")
12-2198
simulate
Write a function that specifies a flat prior distribution on the state-space model parameters theta. The function returns the scalar log prior for an input set of parameters. Save this code as a file named flatPriorBSSM.m on your MATLAB® path. Alternatively, open the example to access the function. type flatPriorBSSM.m % Copyright 2022 The MathWorks, Inc. function logprior = flatPriorBSSM(theta) % flatPriorBSSM computes the log of the flat prior density for the eight % variables in theta (see timeVariantParamMapBayes.m). Log probabilities % for parameters outside the parameter space are -Inf. % theta(1) and theta(2) are lag 1 and lag 2 terms in a stationary AR(2) % model. The eigenvalues of the AR(1) representation need to be within % the unit circle. evalsAR2 = eig([theta(1) theta(2); 1 0]); evalsOutUC = sum(abs(evalsAR2) >= 1) > 0; % Standard deviations of disturbances and errors (theta(3), theta(6), % and theta(8)) need to be positive. nonnegsig1 = theta(3) NumPaths, simulate uses only the first NumPaths columns. If Presample is a timetable, all the following conditions must be true: • Presample must represent a sample with a regular datetime time step (see isregular). • The datetime vector of sample timestamps Presample.Time must be ascending or descending. If Presample is a table, the last row contains the latest presample observation. The defaults are: • For GARCH(P,Q) and GJR(P,Q) models, simulate sets any necessary presample innovations to the square root of the average squared value of the offset-adjusted response series Y. • For EGARCH(P,Q) models, simulate sets any necessary presample innovations to zero. • simulate sets any necessary presample conditional variances to the unconditional variance of the process. If you specify the Presample, you must specify the presample innovation or conditional variance variable names by using the PresampleInnovationVariable or PresampleVarianceVariable name-value argument. Name-Value Pair Arguments Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter. Before R2021a, use commas to separate each name and value, and enclose Name in quotes. Example: simulate(Mdl,100,NumPaths=1000,E0=[0.5; 0.5]) specifies generating 1000 sample paths of length 100 from the model Mdl, and using [0.5; 0.5] as the presample of innovations per path. NumPaths — Number of sample paths to generate 1 (default) | positive integer 12-2240
simulate
Number of sample paths to generate, specified as a positive integer. Example: NumPaths=1000 Data Types: double E0 — Presample innovation paths εt numeric column vector | numeric matrix Presample innovation paths εt, specified as a numpreobs-by-1 numeric column vector or a numpreobs-by-numprepaths matrix. Use E0 only when you supply optional data inputs as numeric arrays. The presample innovations provide initial values for the innovations process of the conditional variance model Mdl. The presample innovations derive from a distribution with mean 0. numpreobs is the number of presample observations. numprepaths is the number of presample paths. Each row is a presample observation, and measurements in each row occur simultaneously. The last row contains the latest presample observation. numpreobs must be at least Mdl.Q. If numpreobs > Mdl.Q, simulate uses the latest required number of observations only. The last element or row contains the latest observation. • If E0 is a column vector, it represents a single path of the underlying innovation series. simulate applies it to each output path. • If E0 is a matrix, each column represents a presample path of the underlying innovation series. numprepaths must be at least NumPaths. If numprepaths > NumPaths, simulate uses the first NumPaths columns only. The defaults are: • For GARCH(P,Q) and GJR(P,Q) models, simulate sets any necessary presample innovations to an independent sequence of disturbances with mean zero and standard deviation equal to the unconditional standard deviation of the conditional variance process. • For EGARCH(P,Q) models, simulate sets any necessary presample innovations to an independent sequence of disturbances with mean zero and variance equal to the exponentiated unconditional mean of the logarithm of the EGARCH variance process. Example: E0=[0.5; 0.5] V0 — Positive presample conditional variance paths σt2 numeric column vector | numeric matrix Positive presample conditional variance paths, specified as a numpreobs-by-1 positive column vector or numpreobs-by-numprepaths positive matrix.. V0 provides initial values for the conditional variances in the model. Use V0 only when you supply optional data inputs as numeric arrays. Each row is a presample observation, and measurements in each row occur simultaneously. The last row contains the latest presample observation. • For GARCH(P,Q) and GJR(P,Q) models, numpreobs must be at least Mdl.P. • For EGARCH(P,Q) models,numpreobs must be at least max([Mdl.P Mdl.Q]). 12-2241
12
Functions
If numpreobs exceeds the minimum number, simulate uses only the latest observations. The last element or row contains the latest observation. • If V0 is a column vector, it represents a single path of the conditional variance series. simulate applies it to each output path. • If V0 is a matrix, each column represents a presample path of the conditional variance series. numprepaths must be at least NumPaths. If numprepaths > NumPaths, simulate uses the first NumPaths columns only. The defaults are: • For GARCH(P,Q) and GJR(P,Q) models, simulate sets any necessary presample variances to the unconditional variance of the conditional variance process. • For EGARCH(P,Q) models, simulate sets any necessary presample variances to the exponentiated unconditional mean of the logarithm of the EGARCH variance process. Example: V0=[1; 0.5] Data Types: double PresampleInnovationVariable — Variable of Presample containing presample innovation paths εt string scalar | character vector | integer | logical vector Variable of Presample containing presample innovation paths εt, specified as one of the following data types: • String scalar or character vector containing a variable name in Presample.Properties.VariableNames • Variable index (integer) to select from Presample.Properties.VariableNames • A length numprevars logical vector, where PresampleInnovationVariable(j) = true selects variable j from Presample.Properties.VariableNames, and sum(PresampleInnovationVariable) is 1 The selected variable must be a numeric matrix and cannot contain missing values (NaN). If you specify presample innovation data by using the Presample name-value argument, you must specify PresampleInnovationVariable. Example: PresampleInnovationVariable="StockRateInnov0" Example: PresampleInnovationVariable=[false false true false] or PresampleInnovationVariable=3 selects the third table variable as the presample innovation variable. Data Types: double | logical | char | cell | string PresampleVarianceVariable — Variable of Presample containing data for the presample conditional variances σt2 string scalar | character vector | integer | logical vector Variable of Presample containing data for the presample conditional variances σt2, specified as one of the following data types: • String scalar or character vector containing a variable name in Presample.Properties.VariableNames 12-2242
simulate
• Variable index (positive integer) to select from Presample.Properties.VariableNames • A logical vector, where PresampleVarianceVariable(j) = true selects variable j from Presample.Properties.VariableNames The selected variable must be a numeric vector and cannot contain missing values (NaNs). If you specify presample conditional variance data by using the Presample name-value argument, you must specify PresampleVarianceVariable. Example: PresampleVarianceVariable="StockRateVar0" Example: PresampleVarianceVariable=[false false true false] or PresampleVarianceVariable=3 selects the third table variable as the presample conditional variance variable. Data Types: double | logical | char | cell | string Notes • NaN values in E0, and V0 indicate missing values. simulate removes missing values from specified data by list-wise deletion. simulate horizontally concatenates E0 and V0, and then it removes any row of the concatenated matrix containing at least one NaN. This type of data reduction reduces the effective sample size and can create an irregular time series. • For numeric data inputs, simulate assumes that you synchronize the presample data such that the latest observations occur simultaneously. • simulate issues an error when any table or timetable input contains missing values. • If E0 and V0 are column vectors, simulate applies them to every column of the outputs V and Y. This application allows simulated paths to share a common starting point for Monte Carlo simulation of forecasts and forecast error distributions.
Output Arguments V — Simulated conditional variance paths σt2 numeric column vector | numeric matrix Simulated conditional variance paths σt2 of the mean-zero innovations associated with Y, returned as a numobs-by-1 numeric column vector or numobs-by-NumPaths matrix. simulate returns V when you do not specify the input table or timetable Presample. Each column of V corresponds to a simulated conditional variance path. Rows of V are periods corresponding to the periodicity of Mdl. Y — Simulated response paths yt numeric column vector | numeric matrix Simulated response paths yt, returned as a numobs-by-1 numeric column vector or numobs-byNumPaths matrix. simulate returns Y when you do not specify the input table or timetable Presample. Y usually represents a mean-zero, heteroscedastic time series of innovations with conditional variances given in V (a continuation of the presample innovation series E0). 12-2243
12
Functions
Y can also represent a time series of mean-zero, heteroscedastic innovations plus an offset. If Mdl includes an offset, then simulate adds the offset to the underlying mean-zero, heteroscedastic innovations so that Y represents a time series of offset-adjusted innovations. Each column of Y corresponds to a simulated response path. Rows of Y are periods corresponding to the periodicity of Mdl. Tbl — Simulated conditional variance σt2 and response yt paths table | timetable Simulated conditional variance σt2 and response yt paths, returned as a table or timetable, the same data type as Presample. simulate returns Tbl only when you supply the input Presample. Tbl contains the following variables: • The simulated conditional variance paths, which are in a numobs-by-NumPaths numeric matrix, with rows representing observations and columns representing independent paths. Each path represents the continuation of the corresponding path of presample conditional variances in Presample. simulate names the simulated conditional variance variable in Tbl responseName_Variance, where responseName is Mdl.SeriesName. For example, if Mdl.SeriesName is StockReturns, Tbl contains a variable for the corresponding simulated conditional variance paths with the name StockReturns_Variance. • The simulated response paths, which are in a numobs-by-NumPaths numeric matrix, with rows representing observations and columns representing independent paths. Each path represents the continuation of the corresponding presample innovations path in Presample. simulate names the simulated response variable in Tbl responseName_Response, where responseName is Mdl.SeriesName. For example, if Mdl.SeriesName is StockReturns, Tbl contains a variable for the corresponding simulated response paths with the name StockReturns_Response. If Tbl is a timetable, the following conditions hold: • The row order of Tbl, either ascending or descending, matches the row order of Preample. • Tbl.Time(1) is the next time after Presample(end) relative the sampling frequency, and Tbl.Time(2:numobs) are the following times relative to the sampling frequency.
Version History Introduced in R2012a R2023a: simulate accepts input data in tables and timetables, and returns results in tables and timetables In addition to accepting presample data in numeric arrays, simulate accepts presample data in tables or regular timetables. When you supply data in a table or timetable, the following conditions apply: • If you specify optional presample innovation or conditional variance data to initialize the model, you must also specify the presample innovation or conditional variance series name. • simulate returns results in a table or timetable. Name-value arguments to support tabular workflows include: 12-2244
simulate
• Presample specifies the input table or regular timetable of presample innovations and conditional variance data. • PresampleInnovationVariable specifies the variable name of the innovation paths to select from Presample. • PresampleVarianceVariable specifies the variable name of the conditional variance paths to select from Presample.
References [1] Bollerslev, T. “Generalized Autoregressive Conditional Heteroskedasticity.” Journal of Econometrics. Vol. 31, 1986, pp. 307–327. [2] Bollerslev, T. “A Conditionally Heteroskedastic Time Series Model for Speculative Prices and Rates of Return.” The Review of Economics and Statistics. Vol. 69, 1987, pp. 542–547. [3] Box, G. E. P., G. M. Jenkins, and G. C. Reinsel. Time Series Analysis: Forecasting and Control. 3rd ed. Englewood Cliffs, NJ: Prentice Hall, 1994. [4] Enders, W. Applied Econometric Time Series. Hoboken, NJ: John Wiley & Sons, 1995. [5] Engle, R. F. “Autoregressive Conditional Heteroskedasticity with Estimates of the Variance of United Kingdom Inflation.” Econometrica. Vol. 50, 1982, pp. 987–1007. [6] Glosten, L. R., R. Jagannathan, and D. E. Runkle. “On the Relation between the Expected Value and the Volatility of the Nominal Excess Return on Stocks.” The Journal of Finance. Vol. 48, No. 5, 1993, pp. 1779–1801. [7] Hamilton, J. D. Time Series Analysis. Princeton, NJ: Princeton University Press, 1994. [8] Nelson, D. B. “Conditional Heteroskedasticity in Asset Returns: A New Approach.” Econometrica. Vol. 59, 1991, pp. 347–370.
See Also Objects garch | egarch | gjr Functions estimate | forecast | filter Topics “Simulate GARCH Models” on page 8-76 “Simulate Conditional Variance Model” on page 8-86 “Assess EGARCH Forecast Bias Using Simulations” on page 8-81 “Monte Carlo Simulation of Conditional Variance Models” on page 8-72 “Presample Data for Conditional Variance Model Simulation” on page 8-75 “Monte Carlo Forecasting of Conditional Variance Models” on page 8-89
12-2245
12
Functions
simulate Simulate Markov chain state walks
Syntax X = simulate(mc,numSteps) X = simulate(mc,numSteps,'X0',x0)
Description X = simulate(mc,numSteps) returns data X on random walks of length numSteps through sequences of states in the discrete-time Markov chain mc. X = simulate(mc,numSteps,'X0',x0) optionally specifies the initial state of simulations x0.
Examples Simulate Random Walk Through Markov Chain Consider this theoretical, right-stochastic transition matrix of a stochastic process. 0 0 0 P= 0 0 1/2 1/4
0 0 0 0 0 1/2 3/4
1/2 1/3 0 0 0 0 0
1/4 0 0 0 0 0 0
1/4 2/3 0 0 0 0 0
0 0 1/3 1/2 3/4 0 0
0 0 2/3 1/2 . 1/4 0 0
Create the Markov chain that is characterized by the transition matrix P. P = [ 0 0 1/2 1/4 1/4 0 0 ; 0 0 1/3 0 2/3 0 0 ; 0 0 0 0 0 1/3 2/3; 0 0 0 0 0 1/2 1/2; 0 0 0 0 0 3/4 1/4; 1/2 1/2 0 0 0 0 0 ; 1/4 3/4 0 0 0 0 0 ]; mc = dtmc(P);
Plot a directed graph of the Markov chain. Indicate the probability of transition by using edge colors. figure; graphplot(mc,'ColorEdges',true);
12-2246
simulate
Simulate a 20-step random walk that starts from a random state. rng(1); % For reproducibility numSteps = 20; X = simulate(mc,numSteps) X = 21×1 3 7 1 3 6 1 3 7 2 5
⋮
X is a 21-by-1 matrix. Rows correspond to steps in the random walk. Because X(1) is 3, the random walk begins at state 3. Visualize the random walk. figure; simplot(mc,X);
12-2247
12
Functions
Specify Starting States for Multiple Simulations Create a four-state Markov chain from a randomly generated transition matrix containing eight infeasible transitions. rng('default'); % For reproducibility mc = mcmix(4,'Zeros',8);
mc is a dtmc object. Plot a digraph of the Markov chain. figure; graphplot(mc);
12-2248
simulate
State 4 is an absorbing state. Run three 10-step simulations for each state. x0 = 3*ones(1,mc.NumStates); numSteps = 10; X = simulate(mc,numSteps,'X0',x0);
X is an 11-by-12 matrix. Rows corresponds to steps in the random walk. Columns 1–3 are the simulations that start at state 1; column 4–6 are the simulations that start at state 2; columns 7–9 are the simulations that start at state 3; and columns 10–12 are the simulations that start at state 4. For each time, plot the proportions states that are visited over all simulations. figure; simplot(mc,X)
12-2249
12
Functions
Input Arguments mc — Discrete-time Markov chain dtmc object Discrete-time Markov chain with NumStates states and transition matrix P, specified as a dtmc object. P must be fully specified (no NaN entries). numSteps — Number of discrete time steps positive integer Number of discrete time steps in each simulation, specified as a positive integer. Data Types: double Name-Value Pair Arguments Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter. Before R2021a, use commas to separate each name and value, and enclose Name in quotes. Example: 'X0',[1 0 2] specifies simulating three times, the first simulation starts in state 1 and the final two simulations start in state 3. No simulations start in state 2. 12-2250
simulate
X0 — Initial states of simulations vector of nonnegative integers Initial states of simulations, specified as the comma-separated pair consisting of 'X0' and a vector of nonnegative integers of NumStates length. X0 provides counts for the number of simulations to begin in each state. The total number of simulations (numSims) is sum(X0). The default is a single simulation beginning from a random initial state. Example: 'X0',[10 10 0 5] specifies 10 simulations starting in state 1, 10 simulations starting in state 2, no simulations starting in state 3, and 5 simulations starting in state 4. simulate conducts sum(X0) = 25 simulations. Data Types: double
Output Arguments X — Indices of states numeric matrix of positive integers Indices of states visited during the simulations, returned as a (1 + numSteps)-by-numSims numeric matrix of positive integers. The first row contains the initial states. Columns, in order, are all simulations beginning in the first state, then all simulations beginning in the second state, and so on.
Tips • To start n simulations from state k, use: X0 = zeros(1,NumStates); X0(k) = n;
• To visualize the data created by simulate, use simplot.
Version History Introduced in R2017b
See Also Objects dtmc Functions redistribute | simplot Topics “Markov Chain Modeling” on page 10-8 “Create and Modify Markov Chain Model Objects” on page 10-17 “Visualize Markov Chain Structure and Evolution” on page 10-27 “Simulate Random Walks Through Markov Chain” on page 10-59
12-2251
12
Functions
simulate Simulate sample paths of Markov-switching dynamic regression model
Syntax Y = simulate(Mdl,numObs) Y = simulate(Mdl,numObs,Name,Value) [Y,E,StatePaths] = simulate( ___ )
Description Y = simulate(Mdl,numObs) returns a random numObs-period path of response series Y from simulating the fully specified Markov-switching dynamic regression model Mdl. Y = simulate(Mdl,numObs,Name,Value) uses additional options specified by one or more namevalue arguments. For example, 'NumPaths',1000,'Y0',Y0 simulates 1000 sample paths and initializes the dynamic component of each submodel by using the presample response data Y0. [Y,E,StatePaths] = simulate( ___ ) also returns the simulated innovation paths E and the simulated state paths StatePaths, using any of the input argument combinations in the previous syntaxes.
Examples Simulate Response Path Simulate a response path from a two-state Markov-switching dynamic regression model for a 1-D response process. This example uses arbitrary parameter values. Create Fully Specified Model Create a two-state discrete-time Markov chain model that describes the regime switching mechanism. Label the regimes. P = [0.9 0.1; 0.3 0.7]; mc = dtmc(P,'StateNames',["Expansion" "Recession"]);
mc is a fully specified dtmc object. For each regime, use arima to create an AR model that describes the response process within the regime. Store the submodels in a vector. mdl1 = arima('Constant',5,'AR',[0.3 0.2],... 'Variance',2); mdl2 = arima('Constant',-5,'AR',0.1,... 'Variance',1); mdl = [mdl1; mdl2];
mdl1 and mdl2 are fully specified arima objects. 12-2252
simulate
Create a Markov-switching dynamic regression model from the switching mechanism mc and the vector of submodels mdl. Mdl = msVAR(mc,mdl);
Mdl is a fully specified msVAR object. Simulate Response Path Generate one random response path of length 50 from the model. rng(1); % For reproducibility y = simulate(Mdl,50);
y is a 50-by-1 vector of one response path. Plot the response path. figure plot(y) xlabel("Time") ylabel("Response")
Simulate Multiple Paths Consider the model in “Simulate Response Path” on page 12-2252. 12-2253
12
Functions
Create the Markov-switching dynamic regression model. P = [0.9 0.1; 0.3 0.7]; mc = dtmc(P,'StateNames',["Expansion" "Recession"]); mdl1 = arima('Constant',5,'AR',[0.3 0.2],... 'Variance',2); mdl2 = arima('Constant',-5,'AR',0.1,... 'Variance',1); mdl = [mdl1; mdl2]; Mdl = msVAR(mc,mdl);
Simulate 3 response, innovations, and state-index paths of 5 observations from the model. rng('default') % For reproducibility [Y,E,SP] = simulate(Mdl,5,'NumPaths',3) Y = 5×3 -5.7605 -5.7002 4.2446 -3.1665 -3.8995
10.9496 8.5772 10.7774 -2.2920 -4.7403
11.4633 -3.1268 -5.6161 -5.2677 -6.3141
0.9496 -1.7076 1.0143 1.6302 0.4889
1.4633 0.7269 -0.3034 0.2939 -0.7873
E = 5×3 -0.2050 -0.1241 2.1068 1.4090 1.4172 SP = 5×3 2 2 1 2 2
1 1 1 2 2
1 2 2 2 2
Y, E, and SP are 5-by-3 matrices. Columns represent separate, independent paths. Simulate a single path of responses, innovations, and states into a simulation horizon of length 50. Then plot each path separately. [y,e,sp] = simulate(Mdl,50); figure subplot(3,1,1) plot(y) ylabel('Response') grid on subplot(3,1,2) plot(e) ylabel('Innovation') grid on
12-2254
simulate
subplot(3,1,3) plot(sp,'m') ylabel('State') yticks([1 2]) yticklabels(Mdl.StateNames)
Simulate US GDP Rates and Economic States Consider a two-state Markov-switching dynamic regression model of the postwar US real GDP growth rate. The model has the parameter estimates presented in [1]. Create a discrete-time Markov chain model that describes the regime switching mechanism. Label the regimes. P = [0.92 0.08; 0.26 0.74]; mc = dtmc(P,'StateNames',["Expansion" "Recession"]);
Create separate AR(0) models (constant only) for the two regimes. sigma = 3.34; % Homoscedastic models across states mdl1 = arima('Constant',4.62,'Variance',sigma^2); mdl2 = arima('Constant',-0.48,'Variance',sigma^2); mdl = [mdl1 mdl2];
12-2255
12
Functions
Create the Markov-switching dynamic regression model that describes the behavior of the US GDP growth rate. Mdl = msVAR(mc,mdl);
Mdl is a fully specified msVAR object. Generate one random path of 100 responses, corresponding innovations, and states from the model. rng(1) % For reproducibility [y,e,sp] = simulate(Mdl,100);
y is a 100-by-1 vector of GDP rates, and e is a 100-by-1 vector of corresponding innovations. sp is a 100-by-1 vector of state indices.
Specify Presample Data Consider the Markov-switching model in “Simulate US GDP Rates and Economic States” on page 122255, but assume that the submodels are AR(1) instead. Consider fitting the model to observations in the period 1960:Q1–2004:Q2. Create the model template for estimation. Specify AR(1) submodels. mc = dtmc(NaN(2),'StateNames',["Expansion" "Recession"]); ar1 = arima(1,0,0); Mdl = msVAR(mc,[ar1; ar1]);
Because the submodels are AR(1), each requires one presample observation to initialize its dynamic component for estimation. Create the model containing initial parameter values for the estimation procedure. mc0 = dtmc(0.5*ones(2),'StateNames',["Expansion" "Recession"]); submdl01 = arima('Constant',1,'Variance',1,'AR',0.001); submdl02 = arima('Constant',-1,'Variance',1,'AR',0.001); Mdl0 = msVAR(mc0,[submdl01; submdl02]);
Load the data. Transform the entire set to an annualized rate series. load Data_GDP qrate = diff(Data)./Data(1:(end - 1)); arate = 100*((1 + qrate).^4 - 1);
Identify the presample and estimation sample periods using the dates associated with the annualized rate series. Because the transformation applies the first difference, you must drop the first observation date from the original sample. dates = datetime(dates(2:end),'ConvertFrom','datenum',... 'Format','yyyy:QQQ','Locale','en_US'); estPrd = datetime(["1960:Q2" "2004:Q2"],'InputFormat','yyyy:QQQ',... 'Format','yyyy:QQQ','Locale','en_US'); idxEst = isbetween(dates,estPrd(1),estPrd(2)); idxPre = dates < estPrd(1);
Fit the model to the estimation sample data. Specify the presample observation. 12-2256
simulate
y0 = arate(idxPre); EstMdl = estimate(Mdl,Mdl0,arate(idxEst),'Y0',y0);
Simulate a response path from the fitted model over the estimation period. Specify the presample observation. rng(1) % For reproducibility numObs = sum(idxEst); aratesim = simulate(EstMdl,numObs,'Y0',y0);
Plot the observations and simulated path of annualized rates, and identify periods of recession by using recessionplot. figure; plot(dates(idxEst),[arate(idxEst) aratesim]) recessionplot xlabel("Time") ylabel("Annualized GDP Rate") legend("Observed","Simulated");
Fit Model to Simulated Data Assess estimation accuracy using simulated data from a known data-generating process (DGP). This example uses arbitrary parameter values. 12-2257
12
Functions
Create Model for DGP Create a fully specified, two-state discrete-time Markov chain model for the switching mechanism. P = [0.7 0.3; 0.1 0.9]; mc = dtmc(P);
For each state, create a fully specified AR(1) model for the response process. % Constants C1 = 5; C2 = -2; % Autoregression coefficients AR1 = 0.4; AR2 = 0.2; % Variances V1 = 4; V2 = 2; % AR Submodels dgp1 = arima('Constant',C1,'AR',AR1,'Variance',V1); dgp2 = arima('Constant',C2,'AR',AR2,'Variance',V2);
Create a fully specified Markov-switching dynamic regression model for the DGP. DGP = msVAR(mc,[dgp1,dgp2]);
Simulate Response Paths from DGP Generate 10 random response paths of length 1000 from the DGP. rng(1); % For reproducibility N = 10; n = 1000; Data = simulate(DGP,n,'Numpaths',N);
Data is a 1000-by-10 matrix of simulated responses. Create Model for Estimation Create a partially specified Markov-switching dynamic regression model that has the same structure as the data-generating process, but specify an unknown transition matrix and unknown submodel coefficients. PEst = NaN(2); mcEst = dtmc(PEst); mdl = arima(1,0,0); Mdl = msVAR(mcEst,[mdl; mdl]);
Create Model Containing Initial Values Create a fully specified Markov-switching dynamic regression model that has the same structure as Mdl, but set all estimable parameters to initial values. P0 = 0.5*ones(2); mc0 = dtmc(P0);
12-2258
simulate
mdl01 = arima('Constant',1,'AR',0.5,'Variance',2); mdl02 = arima('Constant',-1,'AR',0.5,'Variance',1); Mdl0 = msVAR(mc0,[mdl01,mdl02]);
Estimate Models Fit the model to each simulated path. For each path, plot the loglikelihood at each iteration of the EM algorithm. c1 = zeros(N,1); c2 = zeros(N,1); v1 = zeros(N,1); v2 = zeros(N,1); ar1 = zeros(N,1); ar2 = zeros(N,1); PStack = zeros(2,2,N); figure hold on for i = 1:N EstModel = estimate(Mdl,Mdl0,Data(:,i),'IterationPlot',true); c1(i) = EstModel.Submodels(1).Constant; c2(i) = EstModel.Submodels(2).Constant; v1(i) = EstModel.Submodels(1).Covariance; v2(i) = EstModel.Submodels(2).Covariance; ar1(i) = EstModel.Submodels(1).AR{1}; ar2(i) = EstModel.Submodels(2).AR{1}; PStack(:,:,i) = EstModel.Switch.P; end hold off
12-2259
12
Functions
Assess Accuracy Compute the Monte Carlo mean of each estimated parameter. c1Mean = mean(c1); c2Mean = mean(c2); v1Mean = mean(v1); v2Mean = mean(v2); ar1Mean = mean(ar1); ar2Mean = mean(ar2); PMean = mean(PStack,3);
Compare population parameters to the corresponding Monte Carlo estimates. DGPvsEstimate = [... C1 c1Mean C2 c2Mean V1 v1Mean V2 v2Mean AR1 ar1Mean AR2 ar2Mean] DGPvsEstimate = 6×2 5.0000 -2.0000 4.0000 2.0000
12-2260
5.0260 -1.9615 3.9710 1.9903
simulate
0.4000 0.2000
0.4061 0.2017
P P = 2×2 0.7000 0.1000
0.3000 0.9000
PEstimate = PMean PEstimate = 2×2 0.7065 0.1023
0.2935 0.8977
Simulate Paths from Model with VARX Submodels Generate random paths from a three-state Markov-switching dynamic regression model for a 2-D VARX response process. This example uses arbitrary parameter values for the DGP. Create Fully Specified Model for DGP Create a three-state discrete-time Markov chain model for the switching mechanism. P = [10 1 1; 1 10 1; 1 1 10]; mc = dtmc(P);
mc is a fully specified dtmc object. dtmc normalizes the rows of P so that they sum to 1. For each regime, use varm to create a VAR model that describes the response process within the regime. Specify all parameter values. % Constants C1 = [1;-1]; C2 = [2;-2]; C3 = [3;-3]; % Autoregression coefficients AR1 = {}; AR2 = {[0.5 0.1; 0.5 0.5]}; AR3 = {[0.25 0; 0 0] [0 0; 0.25 0]}; % Regression coefficients Beta1 = [1;-1]; Beta2 = [2 2;-2 -2]; Beta3 = [3 3 3;-3 -3 -3]; % Innovations covariances Sigma1 = [1 -0.1; -0.1 1]; Sigma2 = [2 -0.2; -0.2 2]; Sigma3 = [3 -0.3; -0.3 3];
12-2261
12
Functions
% VARX submodels mdl1 = varm('Constant',C1,'AR',AR1,'Beta',Beta1,'Covariance',Sigma1); mdl2 = varm('Constant',C2,'AR',AR2,'Beta',Beta2,'Covariance',Sigma2); mdl3 = varm('Constant',C3,'AR',AR3,'Beta',Beta3,'Covariance',Sigma3); mdl = [mdl1; mdl2; mdl3];
mdl contains three fully specified varm model objects. For the DGP, create a fully specified Markov-switching dynamic regression model from the switching mechanism mc and the submodels mdl. Mdl = msVAR(mc,mdl);
Mdl is a fully specified msVAR model. Simulate Data Ignoring Regression Component If you do not supply exogenous data, simulate ignores the regression components in the submodels. Simulate 3 separate, independent paths of responses, innovations, and state indices of length 5 from the model. rng(1); % For reproducibility [Y,E,SP] = simulate(Mdl,5,'NumPaths',3) Y = Y(:,:,1) = 5.2387 4.4290 1.1668 -0.9654 -0.2701
1.5297 4.2738 -1.2905 -0.2028 0.8993
Y(:,:,2) = 2.7737 -0.8651 -0.0511 0.5826 2.4022
-2.5383 -1.1046 0.3696 -0.8926 -0.6912
Y(:,:,3) = 3.5443 4.9748 5.7213 4.2473 2.7972
0.8768 -0.7956 0.8073 0.5805 -1.3340
E = E(:,:,1) = 1.2387 -0.3434 0.1668
12-2262
1.5297 2.8896 -0.2905
simulate
-1.9654 -1.2701
0.7972 1.8993
E(:,:,2) = 1.7737 -1.8651 -1.0511 -0.4174 1.4022
-1.5383 -0.1046 1.3696 0.1074 0.3088
E(:,:,3) = -0.4557 1.1150 1.3134 -0.6941 1.7972
0.8768 -1.0061 0.7176 -0.6838 -0.3340
SP = 5×3 2 2 1 1 1
1 1 1 1 1
2 2 2 2 1
Y and E are 5-by-2-by-3 arrays of simulated responses and innovations, respectively. Rows correspond to time points, columns correspond to variables in the system, and pages correspond to paths. SP is a 5-by-3 matrix whose columns correspond to paths. Simulate a single path of responses, innovations, and states into a simulation horizon of length 50. Then plot each path separately. rng0 = rng; % Store settings to reproduce state sequence. [Y,E,SP] = simulate(Mdl,50); figure subplot(3,1,1) plot(Y) ylabel("Response") grid on legend(["y_1" "y_2"]) subplot(3,1,2) plot(E) ylabel("Innovation") grid on legend(["e_1" "e_2"]) subplot(3,1,3) plot(SP,'m') ylabel("State") yticks([1 2 3])
12-2263
12
Functions
Simulate Data Including Regression Component Simulate exogenous data for the three regressors by generating 50 random observations from the 3-D standard Gaussian distribution. X = randn(50,3);
Generate one random response, innovation, and state path of length 50. Specify the simulated exogenous data for the submodel regression components. Plot the results. rng(rng0); % Reproduce state sequence in previous simulation. [Y,E,SP] = simulate(Mdl,50,'X',X); figure subplot(3,1,1) plot(Y) ylabel("Response") grid on legend(["y_1" "y_2"]) subplot(3,1,2) plot(E) ylabel("Innovation") grid on legend(["e_1" "e_2"]) subplot(3,1,3) plot(SP,'m') ylabel("State") yticks([1 2 3])
12-2264
simulate
Perform Monte Carlo Estimation Consider the model in “Simulate Paths from Model with VARX Submodels” on page 12-2261. Create Fully Specified Model Create the Markov-switching model excluding the regression component. P = [10 1 1; 1 10 1; 1 1 10]; mc = dtmc(P); C1 = [1;-1]; C2 = [2;-2]; C3 = [3;-3]; AR1 = {}; AR2 = {[0.5 0.1; 0.5 0.5]}; AR3 = {[0.25 0; 0 0] [0 0; 0.25 0]}; Sigma1 = [1 -0.1; -0.1 1]; Sigma2 = [2 -0.2; -0.2 2]; Sigma3 = [3 -0.3; -0.3 3]; mdl1 = varm('Constant',C1,'AR',AR1,'Covariance',Sigma1); mdl2 = varm('Constant',C2,'AR',AR2,'Covariance',Sigma2); mdl3 = varm('Constant',C3,'AR',AR3,'Covariance',Sigma3); mdl = [mdl1; mdl2; mdl3]; Mdl = msVAR(mc,mdl);
12-2265
12
Functions
Simulate Multiple Paths Generate 1000 random paths of responses for 50 time steps. Start all simulations at the first state. rng(10); % For reproducibility S0 = [1 0 0]; Y = simulate(Mdl,50,'S0',S0,'NumPaths',1000);
Y is a 50-by-2-by-1000 array of simulated response paths. Compute Monte Carlo Distribution For each variable and path, compute the process mean. mus = mean(Y,1);
For each variable, plot the Monte Carlo distribution of the process mean. figure h1 = histogram(mus(1,1,:),'Normalization',"probability",... 'BinWidth',0.1); hold on h2 = histogram(mus(1,2,:),'Normalization',"probability",... 'BinWidth',0.1); legend(["y_1" "y_2"]) title('Process Means') hold off
12-2266
simulate
For each variable and path, compute the process standard deviation. sigmas = std(Y,0,1);
For each variable, plot the Monte Carlo distribution of the process standard deviation. figure h1 = histogram(sigmas(1,1,:),'Normalization',"probability",... 'BinWidth',0.05); hold on h2 = histogram(sigmas(1,2,:),'Normalization',"probability",... 'BinWidth',0.05); legend(["y_1" "y_2"]) title('Process Standard Deviations') hold off
Input Arguments Mdl — Fully specified Markov-switching dynamic regression model msVAR model object Fully specified Markov-switching dynamic regression model, specified as an msVAR model object returned by msVAR or estimate. Properties of a fully specified model object do not contain NaN values. numObs — Number of observations to generate positive integer 12-2267
12
Functions
Number of observations to generate for each sample path, specified as a positive integer. Data Types: double Name-Value Pair Arguments Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter. Before R2021a, use commas to separate each name and value, and enclose Name in quotes. Example: 'NumPaths',1000,'Y0',Y0 simulates 1000 sample paths and initializes the dynamic component of each submodel by using the presample response data Y0. NumPaths — Number of sample paths to generate 1 (default) | positive integer Number of sample paths to generate, specified as the comma-separated pair consisting of 'NumPaths' and a positive integer. Example: 'NumPaths',1000 Data Types: double Y0 — Presample response data numeric matrix | numeric array Presample response data, specified as the comma-separated pair consisting of 'Y0' and a numeric matrix or array. To use the same presample data for each numPaths path, specify a numPreSampleObs-bynumSeries matrix, where numPaths is the value of NumPaths, numPreSampleObs is the number of presample observations, and numSeries is the number of response variables. To use different presample data for each path: • For univariate ARX submodels, specify a numPreSampleObs-by-numPaths matrix. • For multivariate VARX submodels, specify a numPreSampleObs-by-numSeries-by-numPaths array. The number of presample observations numPreSampleObs must be sufficient to initialize the AR terms of all submodels. If numPreSampleObs exceeds the AR order of any state, simulate uses the latest observations. Each time simulate switches states, it updates Y0 using the latest simulated observations. By default, simulate determines Y0 by the submodel of the initial state (see S0): • If the initial submodel is a stationary AR process without regression components, simulate sets presample observations to the unconditional mean. • Otherwise, simulate sets presample observations to zero. Data Types: double S0 — Initial state probabilities nonnegative numeric vector 12-2268
simulate
Initial state probabilities, specified as the comma-separated pair consisting of 'S0' and a nonnegative numeric vector of length numStates. simulate normalizes S0 to produce a distribution. simulate selects the initial state of each path from S0 at random. To start from a specific initial state, specify a distribution with a probability mass of 1 in that state. By default, simulate sets S0 to a steady-state distribution computed by asymptotics. Example: 'S0',[0.2 0.2 0.6] Example: 'S0',[0 1] specifies state 2 as the initial state. Data Types: double X — Predictor data numeric matrix | cell vector of numeric matrices Predictor data used to evaluate regression components in all submodels of Mdl, specified as the comma-separated pair consisting of 'X' and a numeric matrix or a cell vector of numeric matrices. To use a subset of the same predictors in each state, specify X as a matrix with numPreds columns and at least numObs rows. Columns correspond to distinct predictor variables. Submodels use initial columns of the associated matrix, in order, up to the number of submodel predictors. The number of columns in the Beta property of Mdl.SubModels(j) determines the number of exogenous variables in the regression component of submodel j. If the number of rows exceeds numObs, then simulate uses the latest observations. To use different predictors in each state, specify a cell vector of such matrices with length numStates. By default, simulate ignores regression components in Mdl. Data Types: double
Output Arguments Y — Simulated response paths numeric matrix | numeric array Simulated response paths, returned as a numeric matrix or array. Y represents the continuation of the presample responses in Y0. For univariate ARX submodels, Y is a numObs-by-numPaths matrix. For multivariate VARX submodels, Y is a numObs-by-numSeries-by-numPaths array. E — Simulated innovation paths numeric matrix | numeric array Simulated innovation paths, returned as a numeric matrix or array. For univariate ARX submodels, E is a numObs-by-numPaths matrix. For multivariate VARX submodels, E is a numObs-by-numSeries-by-numPaths array. StatePaths — Simulated state paths numeric matrix 12-2269
12
Functions
Simulated state paths, returned as a numObs-by-numPaths numeric matrix.
Version History Introduced in R2019b
References [1] Chauvet, M., and J. D. Hamilton. "Dating Business Cycle Turning Points." In Nonlinear Analysis of Business Cycles (Contributions to Economic Analysis, Volume 276). (C. Milas, P. Rothman, and D. van Dijk, eds.). Amsterdam: Emerald Group Publishing Limited, 2006. [2] Hamilton, J. D. "Analysis of Time Series Subject to Changes in Regime." Journal of Econometrics. Vol. 45, 1990, pp. 39–70. [3] Hamilton, James D. Time Series Analysis. Princeton, NJ: Princeton University Press, 1994.
See Also Objects msVAR Functions estimate | forecast
12-2270
simulate
simulate Monte Carlo simulation of univariate regression model with ARIMA time series errors
Syntax Y = simulate(Mdl,numobs) Y = simulate(Mdl,numobs,Name=Value) [Y,E,U] = simulate( ___ ) Tbl = simulate(Mdl,numobs,Presample=Presample,PresampleInnovationVariable= PresampleInnovationVariable) Tbl = simulate(Mdl,numobs,InSample=InSample,PredictorVariables= PredictorVariables) Tbl = simulate(Mdl,numobs,Presample=Presample,PresampleInnovationVariable= PresampleInnovationVariable,InSample=InSample,PredictorVariables= PredictorVariables) Tbl = simulate( ___ ,Name=Value)
Description Y = simulate(Mdl,numobs) returns the numeric vector Y containing a random numobs-period response path from simulating the fully specified regression model with ARIMA time series errors Mdl. Y = simulate(Mdl,numobs,Name=Value) uses additional options specified by one or more namevalue arguments. simulate returns numeric arrays when all optional input data are numeric arrays. For example, simulate(Mdl,10,NumPaths=1000,X=Pred) simulates 1000 sample paths of length 10 from the regression model with ARIMA errors Mdl, and uses the predictor data in Pred for the model regression component. [Y,E,U] = simulate( ___ ) uses any input-argument combination in the previous syntaxes to return numeric arrays of one or more independent series of error model innovations E and unconditional disturbances U, resulting from simulating the regression model with ARIMA errors. Tbl = simulate(Mdl,numobs,Presample=Presample,PresampleInnovationVariable= PresampleInnovationVariable) returns the table or timetable Tbl containing a variable for each of the random paths of response, error model innovation, and unconditional disturbance series resulting from simulating the regression model with ARIMA errors Mdl. simulate uses the error model variable PresampleInnovationVariable in the table or timetable of presample data Presample to initialize the model. To initialize the model using presample unconditional disturbance data, replace the PresampleInnovationVariable name-value argument with PresampleRegressionDisturbanceVariable name-value argument. Tbl = simulate(Mdl,numobs,InSample=InSample,PredictorVariables= PredictorVariables) specifies the variables PredictorVariables in the in-sample table or timetable of data InSample containing the predictor data for the model regression component. Tbl = simulate(Mdl,numobs,Presample=Presample,PresampleInnovationVariable= PresampleInnovationVariable,InSample=InSample,PredictorVariables= 12-2271
12
Functions
PredictorVariables) specifies presample error model innovation data to initialize the model and in-sample predictor data for the model regression component. Tbl = simulate( ___ ,Name=Value) uses additional options specified by one or more name-value arguments, using any input argument combination in the previous three syntaxes. For example, simulate(Mdl,100,NumPaths=1000,InSample=Tbl,PredictoreVariables="CPI") returns a timetable containing a variable for each of the response, error model innovation, and unconditional disturbance series. Each variable is a 100-by-1000 matrix representing 1000, 100-period paths simulated from the regression model with ARIMA errors. simulate applies the predictor data in the CPI variable of the timetable Tbl to the model regression component.
Examples Simulate Response Path Vector From Regression Model with ARMA Errors Create the following regression model with ARMA(2,1) errors: yt = 1 + ut ut = 0 . 5ut − 1 − 0 . 8ut − 2 + εt − 0 . 5εt − 1, where εt is Gaussian with variance 0.1. Mdl = regARIMA(Intercept=1,AR={0.5 -0.8},MA=-0.5, ... Variance=0.1);
Mdl is a fully specified regARIMA object. Simulate a path of responses of length 100. rng(1,"twister") % For reproducibility y = simulate(Mdl,100);
y is a 100-by-1 vector containing the response path simulated from Mdl. Plot the simulated path. plot(y)
12-2272
simulate
Simulate Matrix of Response Paths Simulate 1000 paths of responses from the following regression model with ARMA(2,1) errors: 0.1 + ut −0 . 2 ut = 0 . 5ut − 1 − 0 . 8ut − 2 + εt − 0 . 5εt − 1, yt = Xt
where εt is Gaussian with variance 0.1. Assume the predictors are standard Gaussian random variables. Provide data as numeric arrays. Create the regression model with ARIMA errors. Mdl = regARIMA(Intercept=0,AR={0.5 -0.8},MA=-0.5, ... Beta=[0.1; -0.2],Variance=0.1);
Simulate two series of predictor data for the regression component. rng(1,"twister") % For reproducibility Pred = randn(100,2);
Simulate 1000 paths of responses each of length 100. 12-2273
12
Functions
numobs = 100; numpaths = 1000; y = simulate(Mdl,100,X=Pred,NumPaths=1000);
y is a 1000-by-100 matrix containing the independent response paths simulated from Mdl. Plot the simulated paths. plot(y)
Simulate Responses, Innovations, and Unconditional Disturbances Simulate paths of responses, innovations, and unconditional disturbances from a regression model with SARIMA 2, 1, 1 12 errors. Specify the model: yt = X
1.5 + ut −2
1 − 0 . 2L − 0 . 1L2 1 − L 1 − 0 . 01L12 1 − L12 ut = 1 + 0 . 5L 1 + 0 . 02L12 εt, where εt follows a t-distribution with 15 degrees of freedom. 12-2274
simulate
dstr = struct("Name","t","DoF",15); Mdl = regARIMA(AR={0.2 0.1},MA=0.5,SAR=0.01,SARLags=12, ... SMA=0.02,SMALags=12,D=1,Seasonality=12,Beta=[1.5; -2], ... Intercept=0,Variance=0.1,Distribution=dstr) Mdl = regARIMA with properties: Description: SeriesName: Distribution: Intercept: Beta: P: D: Q: AR: SAR: MA: SMA: Seasonality: Variance:
"Regression with ARIMA(2,1,1) Error Model Seasonally Integrated with Seasonal A "Y" Name = "t", DoF = 15 0 [1.5 -2] 27 1 13 {0.2 0.1} at lags [1 2] {0.01} at lag [12] {0.5} at lag [1] {0.02} at lag [12] 12 0.1
Simulate and plot 500 paths with 25 observations each. T = 25; rng(1,"twister") % For reproducibility Pred = randn(T,2); [Y,E,U] = simulate(Mdl,T,NumPaths=500,X=Pred); figure tiledlayout(3,1) nexttile plot(Y) axis tight title("Simulated Response Paths") nexttile plot(E) axis tight title("Simulated Innovations Paths") nexttile plot(U) axis tight title("Simulated Unconditional Disturbances Paths")
12-2275
12
Functions
Plot the 2.5th, 50th (median), and 97.5th percentiles of the simulated response paths. lower = prctile(Y,2.5,2); middle = median(Y,2); upper = prctile(Y,97.5,2); figure plot(1:25,lower,"r:",1:25,middle,"k",1:25,upper,"r:") title("95% Percentile Confidence Interval for Response") legend("95% Interval","Median",Location="best")
12-2276
simulate
Compute statistics across the second dimension (across paths) to summarize the sample paths. Plot a histogram of the simulated paths at time 20. figure histogram(Y(20,:),10) title("Response Distribution at Time 20")
12-2277
12
Functions
Forecast Model With Stationary Errors Using Monte Carlo Simulations Fit a regression model with ARMA(1,1) errors by regressing the US consumer price index (CPI) quarterly changes onto the US gross domestic product (GDP) growth rate. Forecast log GDP using Monte Carlo simulation and the estimated model. Supply data in timetables. Load and Transform Data Load the US macroeconomic data set. Compute the series of GDP quarterly growth rates and CPI quarterly changes. load Data_USEconModel DTT = price2ret(DataTimeTable,DataVariables="GDP"); DTT.GDPRate = 100*DTT.GDP; DTT.CPIDel = diff(DataTimeTable.CPIAUCSL); T = height(DTT) T = 248 figure tiledlayout(2,1) nexttile plot(DTT.Time,DTT.GDPRate) title("GDP Rate") ylabel("Percent Growth")
12-2278
simulate
nexttile plot(DTT.Time,DTT.CPIDel) title("Index")
The series appear stationary, albeit heteroscedastic. Prepare Timetable for Estimation When you plan to supply a timetable, you must ensure it has all the following characteristics: • The selected response variable is numeric and does not contain any missing values. • The timestamps in the Time variable are regular, and they are ascending or descending. Remove all missing values from the timetable. DTT = rmmissing(DTT); T_DTT = height(DTT) T_DTT = 248
Because each sample time has an observation for all variables, rmmissing does not remove any observations. Determine whether the sampling timestamps have a regular frequency and are sorted. areTimestampsRegular = isregular(DTT,"quarters")
12-2279
12
Functions
areTimestampsRegular = logical 0 areTimestampsSorted = issorted(DTT.Time) areTimestampsSorted = logical 1
areTimestampsRegular = 0 indicates that the timestamps of DTT are irregular. areTimestampsSorted = 1 indicates that the timestamps are sorted. Macroeconomic series in this example are timestamped at the end of the month. This quality induces an irregularly measured series. Remedy the time irregularity by shifting all dates to the first day of the quarter. dt = DTT.Time; dt = dateshift(dt,"start","quarter"); DTT.Time = dt; areTimestampsRegular = isregular(DTT,"quarters") areTimestampsRegular = logical 1
DTT is regular. Create Model Template for Estimation Suppose that a regression model of the quarterly GDP rate on CPI changes, with ARMA(1,1) errors, is appropriate. Create a model template for a regression model with ARMA(1,1) errors template. Specify the response variable name. Mdl = regARIMA(1,0,1); Mdl.SeriesName = "GDPRate";
Mdl is a partially specified regARIMA object. Partition Data Reserve 2 years (8 quarters) of data at the end of the series to compare against the forecasts. numobs = 8; estidx = 1:(T_DTT-numobs); frstHzn = (T_DTT-numobs+1):T_DTT;
% Estimation sample % Forecast horizon
Fit Model to Data Fit a regression model with ARMA(1,1) errors to the estimation sample. Specify the predictor variable name. EstMdl = estimate(Mdl,DTT(estidx,:),PredictorVariables="CPIDel"); Regression with ARMA(1,1) Error Model (Gaussian Distribution):
12-2280
simulate
Value __________ Intercept AR{1} MA{1} Beta(1) Variance
0.016489 0.57835 -0.15125 0.0025095 0.00011319
StandardError _____________ 0.0017307 0.096952 0.11658 0.0014147 7.5405e-06
TStatistic __________ 9.5272 5.9653 -1.2974 1.7738 15.01
PValue __________ 1.6152e-21 2.4415e-09 0.19449 0.076089 6.2792e-51
EstMdl is a fully specified, estimated regARIMA object. By default, estimate backcasts for the required Mdl.P = 1 presample regression model residual and sets the required Mdl.Q = 1 presample error model residual to 0. Forecast Estimated Model Infer estimation sample unconditional disturbances to initialize the model for forecasting. Specify the predictor variable name. Tbl0 = infer(EstMdl,DTT(estidx,:),PredictorVariables="CPIDel");
Simulate 1000 paths with 8 observations each. Use the inferred unconditional disturbances as presample data. Specify the predictor and presample unconditional disturbance variable names. rng(1,"twister"); % For reproducibility numpaths = 1000; TblSim = simulate(EstMdl,numobs,NumPaths=numpaths,Presample=Tbl0, ... PresampleRegressionDisturbanceVariable="GDPRate_RegressionResidual", ... InSample=DTT(frstHzn,:),PredictorVariables="CPIDel");
Plot the simulation median forecast and approximate 95% forecast intervals. TblSim.FStats = quantile(TblSim.GDPRate_Response,[0.025 0.5 0.975],2); figure plot(DTT.Time(end-40:end),DTT.GDPRate(end-40:end),Color=[.7,.7,.7]) hold on h1 = plot(TblSim.Time,TblSim.FStats(:,[1 3]),"r:",LineWidth=2); h2 = plot(TblSim.Time,TblSim.FStats(:,2),"k",LineWidth=2); h = gca; ph = patch([repmat(TblSim.Time(1),1,2) repmat(TblSim.Time(end),1,2)], ... [h.YLim fliplr(h.YLim)], ... [0 0 0 0],"b"); ph.FaceAlpha = 0.1; legend([h1(1) h2],["95% percentile intervals" "Sim. median"],Location="northwest", ... AutoUpdate="off") axis tight title("GDP Rate Forecast Over 2-year Horizon") hold off
12-2281
12
Functions
Forecast Model with Nonstationary Errors Using Monte Carlo Simulation Fit a regression model with ARIMA(1,1,1) errors by regressing the quarterly log US GDP onto the log CPI. Forecast log GDP using Monte Carlo simulation and the estimated model. Supply data in timetables. Load the US macroeconomic data set. Compute the log GDP series. load Data_USEconModel DTT = DataTimeTable; DTT.LogGDP = log(DTT.GDP); T = height(DTT);
Remedy the time irregularity by shifting all dates to the first day of the quarter. dt = DTT.Time; dt = dateshift(dt,"start","quarter"); DTT.Time = dt;
Reserve 2 years (8 quarters) of data at the end of the series to compare against the forecasts. numobs = 8; estidx = 1:(T-numobs); frstHzn = (T-numobs+1):T;
12-2282
% Estimation sample % Forecast horizon
simulate
Suppose that a regression model of the quarterly log GDP on CPI, with ARMA(1,1) errors, is appropriate. Create a model template for a regression model with ARMA(1,1) errors template. Specify the response variable name. Mdl = regARIMA(1,1,1); Mdl.SeriesName = "LogGDP";
The intercept is not identifiable in a regression model with integrated errors. Fix its value before estimation. One way to do this is to estimate the intercept using simple linear regression. Use the estimation sample. coeff = [ones(T-numobs,1) DTT.CPIAUCSL(estidx)]\DTT.LogGDP(estidx); Mdl.Intercept = coeff(1);
Fit a regression model with ARMA(1,1,1) errors to the estimation sample. Specify the predictor variable name. EstMdl = estimate(Mdl,DTT(estidx,:),PredictorVariables="CPIAUCSL"); Regression with ARIMA(1,1,1) Error Model (Gaussian Distribution): Value __________ Intercept AR{1} MA{1} Beta(1) Variance
5.8303 0.92869 -0.39063 0.0029335 0.00010668
StandardError _____________ 0 0.028414 0.057599 0.0014645 6.9256e-06
TStatistic __________ Inf 32.684 -6.7819 2.0031 15.403
PValue __________ 0 2.612e-234 1.1858e-11 0.045166 1.5539e-53
EstMdl is a fully specified, estimated regARIMA object. By default, estimate backcasts for the required Mdl.P = 2 presample regression model residual and sets the required Mdl.Q = 1 presample error model residual to 0. Infer estimation sample unconditional disturbances to initialize the model for forecasting. Specify the predictor variable name. Tbl0 = infer(EstMdl,DTT(estidx,:),PredictorVariables="CPIAUCSL");
Simulate 1000 paths with 8 observations each. Use the inferred unconditional disturbances as presample data. Specify the predictor and presample unconditional disturbance variable names. rng(1,"twister"); % For reproducibility numpaths = 1000; TblSim = simulate(EstMdl,numobs,NumPaths=numpaths,Presample=Tbl0, ... PresampleRegressionDisturbanceVariable="LogGDP_RegressionResidual", ... InSample=DTT(frstHzn,:),PredictorVariables="CPIAUCSL");
Plot the simulation median forecast and approximate 95% forecast intervals. TblSim.FStats = quantile(TblSim.LogGDP_Response,[0.025 0.5 0.975],2); figure plot(DTT.Time(end-40:end),DTT.LogGDP(end-40:end),Color=[.7,.7,.7]) hold on h1 = plot(TblSim.Time,TblSim.FStats(:,[1 3]),"r:",LineWidth=2);
12-2283
12
Functions
h2 = plot(TblSim.Time,TblSim.FStats(:,2),"k",LineWidth=2); h = gca; ph = patch([repmat(TblSim.Time(1),1,2) repmat(TblSim.Time(end),1,2)], ... [h.YLim fliplr(h.YLim)],[0 0 0 0],"b"); ph.FaceAlpha = 0.1; legend([h1(1) h2],["95% percentile intervals" "Sim. median"],Location="northwest", ... AutoUpdate="off") axis tight title("Log GDP Forecast Over 2-year Horizon") hold off
Input Arguments Mdl — Fully specified regression model with ARIMA errors regARIMA model object Fully specified regression model with ARIMA errors, specified as a regARIMA model object created by regARIMA or estimate. The properties of Mdl cannot contain NaN values. numobs — Number of random observations to generate positive integer Sample path length, specified as a positive integer. numobs is the number of random observations to generate per output path. 12-2284
simulate
Data Types: double Presample — Presample data table | timetable Presample data containing paths of responses error model innovations εt or unconditional disturbances ut to initialize the model, specified as a table or timetable with numprevars variables and numpreobs rows. simulate returns the simulated variables in the output table or timetable Tbl, which is the same type as Presample. If Presample is a timetable, Tbl is a timetable that immediately follows Presample in time with respect to the sampling frequency. Each selected variable is a single path (numpreobs-by-1 vector) or multiple paths (numpreobs-bynumprepaths matrix) of numpreobs observations representing the presample of numpreobs observations of error model innovations or unconditional disturbances. Each row is a presample observation, and measurements in each row occur simultaneously. The last row contains the latest presample observation. numpreobs must be one of the following values: • At least Mdl.P when Presample provides only presample unconditional disturbances • At least Mdl.Q when Presample provides only presample error model innovations • At least max([Mdl.P Mdl.Q]) otherwise If numpreobs exceeds the minimum number, simulate uses the latest required number of observations only. If numprepaths > NumPaths, simulate uses only the first NumPaths columns. If Presample is a timetable, all the following conditions must be true: • Presample must represent a sample with a regular datetime time step (see isregular). • The datetime vector of sample timestamps Presample.Time must be ascending or descending. • If you specify InSample, Presample must immediately precede InSample, with respect to the sampling frequency. If Presample is a table, the last row contains the latest presample observation. By default, simulate sets necessary presample error model innovations and unconditional disturbances to zero. If you specify the Presample, you must specify the presample error model innovation or unconditional disturbance variable name by using the PresampleInnovationVariable or PresampleRegressionDisturbanceVariable name-value argument. PresampleInnovationVariable — Error model innovation εt to select from Presample string scalar | character vector | integer | logical vector Error model innovation εt to select from Presample containing the presample error model innovation data, specified as one of the following data types: • String scalar or character vector containing the variable name to select from Presample.Properties.VariableNames 12-2285
12
Functions
• Variable index (positive integer) to select from Presample.Properties.VariableNames • A logical vector, where PresampleInnovationVariable(j) = true selects variable j from Presample.Properties.VariableNames The selected variable must be a numeric vector and cannot contain missing values (NaNs). If you specify presample error model innovation data by using the Presample name-value argument, you must specify PresampleInnovationVariable. Example: PresampleInnovationVariable="GDP_E" Example: PresampleInnovationVariable=[false false true false] or PresampleInnovationVariable=3 selects the third table variable for presample error model innovation data. Data Types: double | logical | char | cell | string InSample — In-sample predictor data table | timetable In-sample predictor data for the model regression component, specified as a table or timetable. InSample contains numvars variables, including numpreds predictor variables xt. simulate returns the simulated variables in the output table or timetable Tbl, which is commensurate with InSample. Each row corresponds to an observation in the simulation horizon, the first row is the earliest observation, and measurements in each row, among all paths, occur simultaneously. InSample must have at least numobs rows to cover the simulation horizon. If you supply more rows than necessary, simulate uses only the first numobs rows. Each selected predictor variable is a numeric vector without missing values (NaNs). All predictor variables are present in the regression component of each response equation and apply to all response paths. If InSample is a timetable, the following conditions apply: • InSample must represent a sample with a regular datetime time step (see isregular). • The datetime vector InSample.Time must be ascending or descending. • If you specify Presample, Presample must immediately precede InSample, with respect to the sampling frequency. If InSample is a table, the last row contains the latest observation. By default, simulate does not include the regression component in the model, regardless of the value of Mdl.Beta. PredictorVariables — Predictor variables xt to select from InSample string vector | cell vector of character vectors | vector of integers | logical vector Predictor variables xt to select from InSample containing predictor data for the regression component, specified as one of the following data types: • String vector or cell vector of character vectors containing numpreds variable names in InSample.Properties.VariableNames 12-2286
simulate
• A vector of unique indices (positive integers) of variables to select from InSample.Properties.VariableNames • A logical vector, where PredictorVariables(j) = true selects variable j from InSample.Properties.VariableNames The selected variables must be numeric vectors and cannot contain missing values (NaNs). By default, simulate excludes the regression component, regardless of its presence in Mdl. Example: PredictorVariables=["M1SL" "TB3MS" "UNRATE"] Example: PredictorVariables=[true false true false] or PredictorVariable=[1 3] selects the first and third table variables to supply the predictor data. Data Types: double | logical | char | cell | string Name-Value Pair Arguments Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter. Before R2021a, use commas to separate each name and value, and enclose Name in quotes. Example: simulate(Mdl,100,NumPaths=1000,InSample=Tbl,PredictoreVariables="CPI") returns a timetable containing a variable for each of the response, error model innovation, and unconditional disturbance series. Each variable is a 100-by-1000 matrix representing 1000, 100period paths simulated from the regression model with ARIMA errors. simulate applies the predictor data in the CPI variable of the timetable Tbl to the model regression component. NumPaths — Number of independent sample paths to generate 1 (default) | positive integer Number of independent sample paths to generate, specified as a positive integer. Example: NumPaths=1000 Data Types: double X — Predictor data numeric matrix Predictor data for the model regression component, specified as a numeric matrix containing numpreds columns. numpreds is the number of predictor variables (numel(Mdl.Beta)). Use X only when you supply optional data inputs as numeric arrays. Each row of X corresponds to a period in the length numobs simulation sample (period for which simulate simulates observations; the period after the presample). X must have at least numobs rows. The last row contains the latest predictor data. If X has more than numobs rows, simulate uses only the latest numobs rows. simulate does not use the regression component in the presample period. Each column is an individual predictor variable. simulate applies X to each path; that is, X represents one path of observed predictors. By default, simulate excludes the regression component, regardless of its presence in Mdl. 12-2287
12
Functions
Data Types: double E0 — Presample error model innovations εt 0 (default) | numeric column vector | numeric matrix Presample error model innovations εt used to initialize the moving average (MA) component of the error model, specified as a numpreobs-by-1 numeric column vector or a numpreobs-bynumprepaths matrix. Use E0 only when you supply optional data inputs as numeric arrays. numpreobs is the number of presample observations. numprepaths is the number of presample response paths. Each row is a presample observation (sampling time), and measurements in each row occur simultaneously. The last row contains the latest presample observation. numpreobs must be at least Mdl.Q to initialize the MA component. If numpreobs is larger than required, simulate uses the latest required number of observations only. Columns of E0 are separate, independent presample paths. The following conditions apply: • If E0 is a column vector, it represents a single residual path. simulate applies it to each output path. • If E0 is a matrix, simulate applies E0(:,j) to initialize simulating path j. E0 must have at least NumPaths columns; simulate uses only the first NumPaths columns of E0. Data Types: double U0 — Presample unconditional disturbances ut 0 (default) | numeric column vector | numeric matrix Presample unconditional disturbances ut used to initialize the autoregressive (AR) component of the error model, specified as a numpreobs-by-1 numeric column vector or a numpreobs-bynumprepaths matrix. Use U0 only when you supply optional data inputs as numeric arrays. Each row is a presample observation (sampling time), and measurements in each row occur simultaneously. The last row contains the latest presample observation. numpreobs must be at least Mdl.P to initialize the AR component. If numpreobs is larger than required, simulate uses the latest required number of observations only. Columns of U0 are separate, independent presample paths. The following conditions apply: • If U0 is a column vector, it represents a single residual path. simulate applies it to each output path. • If U0 is a matrix, simulate applies U0(:,j) to initialize simulating path j. U0 must have at least NumPaths columns; simulate uses only the first NumPaths columns of U0. Data Types: double PresampleRegressionDistrubanceVariable — Unconditional disturbance variable ut to select from Presample string scalar | character vector | integer | logical vector Unconditional disturbance variable ut to select from Presample containing data for the presample unconditional disturbances, specified as one of the following data types: 12-2288
simulate
• String scalar or character vector containing a variable name in Presample.Properties.VariableNames • Variable index (positive integer) to select from Presample.Properties.VariableNames • A logical vector, where PresampleRegressionDistrubanceVariable(j) = true selects variable j from Presample.Properties.VariableNames The selected variable must be a numeric vector and cannot contain missing values (NaNs). If you specify presample unconditional disturbance data by using the Presample name-value argument, you must specify PresampleRegressionDistrubanceVariable. Example: PresampleRegressionDistrubanceVariable="StockRateU" Example: PresampleRegressionDistrubanceVariable=[false false true false] or PresampleRegressionDistrubanceVariable=3 selects the third table variable as the presample unconditional disturbance data. Data Types: double | logical | char | cell | string Note • NaN values in X, E0, and U0 indicate missing values. simulate removes missing values from specified data by list-wise deletion. • For the presample, simulate horizontally concatenates the possibly jagged arrays E0 and U0 with respect to the last rows, and then it removes any row of the concatenated matrix containing at least one NaN. • For in-sample data, simulate removes any row of X containing at least one NaN. This type of data reduction reduces the effective sample size and can create an irregular time series. • For numeric data inputs, simulate assumes that you synchronize the presample data such that the latest observations occur simultaneously. • simulate issues an error when any table or timetable input contains missing values.
Output Arguments Y — Simulated response paths yt numeric column vector | numeric matrix Simulated response paths yt, returned as a numobs-by-1 numeric column vector or a numobs-byNumPaths numeric matrix. simulate returns Y by default and when you supply optional data in numeric arrays. Y represents the continuation of the presample responses in Y0. Each row corresponds to a period in the simulated series; the simulated series has the periodicity of Mdl. Each column is a separate simulated path. E — Simulated error model innovation paths εt numeric column vector | numeric matrix 12-2289
12
Functions
Simulated error model innovations paths εt, returned as a numobs-by-1 numeric column vector or a numobs-by-NumPaths numeric matrix. Each column (path) of E has a mean of zero. simulate returns E by default and when you supply optional data in numeric arrays The dimensions of E correspond to the dimensions of Y. U — Simulated unconditional disturbance paths ut numeric column vector | numeric matrix Simulated unconditional disturbance paths ut, returned as a numobs-by-1 numeric column vector or a numobs-by-NumPaths numeric matrix. The dimensions of U correspond to the dimensions of Y. Tbl — Simulated response yt, error model innovation εt, and unconditional disturbance ut paths table | timetable Simulated response yt, error model innovation εt, and unconditional disturbance ut paths, returned as a table or timetable, the same data type as Presample or InSample. simulate returns Tbl only when you supply at least one of the inputs Presample and InSample. Tbl contains the following variables: • The simulated response paths, which are in a numobs-by-NumPaths numeric matrix, with rows representing observations and columns representing independent paths. Each path represents the continuation of the presample in Presample, or each path corresponds, in time, with the rows of InSample. simulate names the simulated response variable in Tbl responseName_Response, where responseName is Mdl.SeriesName. For example, if Mdl.SeriesName is GDP, Tbl contains a variable for the corresponding simulated response paths with the name GDP_Response. • The simulated error model innovation paths, which are in a numobs-by-NumPaths numeric matrix, with rows representing observations and columns representing independent paths. Each path has a mean of zero, and represents the continuation of the corresponding presample path in Presample, or each path corresponds, in time, with the rows of InSample. simulate names the simulated error model innovation variable in Tbl responseName_ErrorInnovation, where responseName is Mdl.SeriesName. For example, if Mdl.SeriesName is GDP, Tbl contains a variable for the corresponding simulated error model innovation paths with the name GDP_ErrorInnovation. • The simulated unconditional disturbance paths, which are in a numobs-by-NumPaths numeric matrix, with rows representing observations and columns representing independent paths. Each path represents the continuation of the corresponding presample path in Presample, or each path corresponds, in time, with the rows of InSample. simulate names the simulated unconditional disturbance variable in Tbl responseName_RegressionInnovation, where responseName is Mdl.SeriesName. For example, if Mdl.SeriesName is GDP, Tbl contains a variable for the corresponding simulated unconditional disturbance paths with the name GDP_RegressionInnovation. • When you supply InSample, Tbl contains all variables in InSample. If Tbl is a timetable, the following conditions hold: • The row order of Tbl, either ascending or descending, matches the row order of Preample. 12-2290
simulate
• If you specify InSample, row times Tbl.Time are InSample.Time(1:numobs). Otherwise, Tbl.Time(1) is the next time after Presample(end) relative to the sampling frequency, and Tbl.Time(2:numobs) are the following times relative to the sampling frequency.
Version History Introduced in R2013b R2023a: simulate accepts input data in tables and timetables, and returns results in tables and timetables In addition to accepting presample and in-sample predictor data in numeric arrays, simulate accepts input data in tables or regular timetables. When you supply input data in a table or timetable, the following conditions apply: • If you specify optional presample error model innovation or unconditional disturbance data to initialize the model, you must also specify corresponding variable names containing the data to use. • If you specify optional in-sample predictor data for the model regression component, you must also specify corresponding predictor variable names containing the data to use. • simulate returns results in a table or timetable. Name-value arguments to support tabular workflows include: • InSample specifies the table or regular timetable of predictor data for the model regression component. • PredictorVariables specifies the names of the predictor series to select from InSample for the model regression component. • Presample specifies the input table or timetable of presample regression innovation or error model innovation data. • PresampleInnovationVariable specifies the name of the error model innovation series to select from Presample. • PresampleRegressionDisturbanceVariable specifies the name of the unconditional disturbance series to select from Presample.
References [1] Box, George E. P., Gwilym M. Jenkins, and Gregory C. Reinsel. Time Series Analysis: Forecasting and Control. 3rd ed. Englewood Cliffs, NJ: Prentice Hall, 1994. [2] Davidson, R., and J. G. MacKinnon. Econometric Theory and Methods. Oxford, UK: Oxford University Press, 2004. [3] Enders, Walter. Applied Econometric Time Series. Hoboken, NJ: John Wiley & Sons, Inc., 1995. [4] Hamilton, James D. Time Series Analysis. Princeton, NJ: Princeton University Press, 1994. [5] Pankratz, A. Forecasting with Dynamic Regression Models. John Wiley & Sons, Inc., 1991. [6] Tsay, R. S. Analysis of Financial Time Series. 2nd ed. Hoboken, NJ: John Wiley & Sons, Inc., 2005. 12-2291
12
Functions
See Also Objects regARIMA Functions estimate | filter | forecast | infer Topics “Alternative ARIMA Model Representations” on page 5-113 “Simulate Stationary Processes” on page 7-147 “Simulate Trend-Stationary and Difference-Stationary Processes” on page 7-155 “Monte Carlo Simulation of Conditional Mean Models” on page 7-143 “Presample Data for Conditional Mean Model Simulation” on page 7-145 “Transient Effects in Conditional Mean Model Simulations” on page 7-146 “Monte Carlo Forecasting of Conditional Mean Models” on page 7-166
12-2292
simulate
simulate Monte Carlo simulation of state-space models
Syntax [Y,X] = simulate(Mdl,numObs) [Y,X] = simulate(Mdl,numObs,Name,Value) [Y,X,U,E] = simulate( ___ )
Description [Y,X] = simulate(Mdl,numObs) simulates one sample path of observations (Y) and states (X) from a fully specified, state-space model on page 11-3 (Mdl). The software simulates numObs observations and states per sample path. [Y,X] = simulate(Mdl,numObs,Name,Value) returns simulated responses and states with additional options specified by one or more Name,Value pair arguments. For example, specify the number of paths or model parameter values. [Y,X,U,E] = simulate( ___ ) additionally simulate state disturbances (U) and observation innovations (E) using any of the input arguments in the previous syntaxes.
Examples Simulate States and Observations of Time-Invariant State-Space Model Suppose that a latent process is an AR(1) model. The state equation is xt = 0 . 5xt − 1 + ut, where ut is Gaussian with mean 0 and standard deviation 1. Generate a random series of 100 observations from xt, assuming that the series starts at 1.5. T = 100; ARMdl = arima('AR',0.5,'Constant',0,'Variance',1); x0 = 1.5; rng(1); % For reproducibility x = simulate(ARMdl,T,'Y0',x0);
Suppose further that the latent process is subject to additive measurement error. The observation equation is yt = xt + εt, where εt is Gaussian with mean 0 and standard deviation 0.75. Together, the latent process and observation equations compose a state-space model. Use the random latent state process (x) and the observation equation to generate observations. 12-2293
12
Functions
y = x + 0.75*randn(T,1);
Specify the four coefficient matrices. A B C D
= = = =
0.5; 1; 1; 0.75;
Specify the state-space model using the coefficient matrices. Mdl = ssm(A,B,C,D) Mdl = State-space model type: ssm State vector length: 1 Observation vector length: 1 State disturbance vector length: 1 Observation innovation vector length: 1 Sample size supported by model: Unlimited State variables: x1, x2,... State disturbances: u1, u2,... Observation series: y1, y2,... Observation innovations: e1, e2,... State equation: x1(t) = (0.50)x1(t-1) + u1(t) Observation equation: y1(t) = x1(t) + (0.75)e1(t) Initial state distribution: Initial state means x1 0 Initial state covariance matrix x1 x1 1.33 State types x1 Stationary
Mdl is an ssm model. Verify that the model is correctly specified using the display in the Command Window. The software infers that the state process is stationary. Subsequently, the software sets the initial state mean and covariance to the mean and variance of the stationary distribution of an AR(1) model. Simulate one path each of states and observations. Specify that the paths span 100 periods. [simY,simX] = simulate(Mdl,100);
simY is a 100-by-1 vector of simulated responses. simX is a 100-by-1 vector of simulated states. 12-2294
simulate
Plot the true state values with the simulated states. Also, plot the observed responses with the simulated responses. figure subplot(2,1,1) plot(1:T,x,'-k',1:T,simX,':r','LineWidth',2) title({'True State Values and Simulated States'}) xlabel('Period') ylabel('State') legend({'True state values','Simulated state values'}) subplot(2,1,2) plot(1:T,y,'-k',1:T,simY,':r','LineWidth',2) title({'Observed Responses and Simulated responses'}) xlabel('Period') ylabel('Response') legend({'Observed responses','Simulated responses'})
By default, simulate simulates one path for each state and observation in the state-space model. To conduct a Monte Carlo study, specify to simulate a large number of paths.
Simulate State-Space Models Containing Unknown Parameters To generate variates from a state-space model, specify values for all unknown parameters. Explicitly create this state-space model. 12-2295
12
Functions
xt = ϕxt − 1 + σ1ut yt = xt + σ2εt where ut and εt are independent Gaussian random variables with mean 0 and variance 1. Suppose that the initial state mean and variance are 1, and that the state is a stationary process. A = NaN; B = NaN; C = 1; D = NaN; mean0 = 1; cov0 = 1; stateType = 0; Mdl = ssm(A,B,C,D,'Mean0',mean0,'Cov0',cov0,'StateType',stateType);
Simulate 100 responses from Mdl. Specify that the autoregressive coefficient is 0.75, the state disturbance standard deviation is 0.5, and the observation innovation standard deviation is 0.25. params = [0.75 0.5 0.25]; y = simulate(Mdl,100,'Params',params); figure; plot(y); title 'Simulated Responses'; xlabel 'Period';
12-2296
simulate
The software searches for NaN values column-wise following the order A, B, C, D, Mean0, and Cov0. The order of the elements in params should correspond to this search.
Estimate Monte-Carlo Forecasts of State-Space Model Suppose that the relationship between the change in the unemployment rate (x1, t) and the nominal gross national product (nGNP) growth rate (x3, t) can be expressed in the following, state-space model form. x1, t x2, t x3, t
ϕ1 θ1 γ1 0 x1, t − 1 =
x4, t
0 0 0 0 x2, t − 1 γ2 0 ϕ2 θ2 x3, t − 1 0 0 0 0 x4, t − 1
1 1 + 0 0
0 0 u1, t 1 u2, t 1
x1, t y1, t y2, t
=
σ1 0 ε1, t 1 0 0 0 x2, t + , 0 0 1 0 x3, t 0 σ2 ε2, t x4, t
where: • x1, t is the change in the unemployment rate at time t. • x2, t is a dummy state for the MA(1) effect on x1, t. • x3, t is the nGNP growth rate at time t. • x4, t is a dummy state for the MA(1) effect on x3, t. •
y1, t is the observed change in the unemployment rate.
•
y2, t is the observed nGNP growth rate.
• u1, t and u2, t are Gaussian series of state disturbances having mean 0 and standard deviation 1. • ε1, t is the Gaussian series of observation innovations having mean 0 and standard deviation σ1. • ε2, t is the Gaussian series of observation innovations having mean 0 and standard deviation σ2. Load the Nelson-Plosser data set, which contains the unemployment rate and nGNP series, among other things. load Data_NelsonPlosser
Preprocess the data by taking the natural logarithm of the nGNP series, and the first difference of each. Also, remove the starting NaN values from each series. isNaN = any(ismissing(DataTable),2); gnpn = DataTable.GNPN(~isNaN); u = DataTable.UR(~isNaN); T = size(gnpn,1); y = zeros(T-1,2); y(:,1) = diff(u); y(:,2) = diff(log(gnpn));
% Flag periods containing NaNs % Sample size % Preallocate
12-2297
12
Functions
This example proceeds using series without NaN values. However, using the Kalman filter framework, the software can accommodate series containing missing values. To determine how well the model forecasts observations, remove the last 10 observations for comparison. numPeriods = 10; isY = y(1:end-numPeriods,:); oosY = y(end-numPeriods+1:end,:);
% Forecast horizon % In-sample observations % Out-of-sample observations
Specify the coefficient matrices. A B C D
= = = =
[NaN NaN NaN 0; 0 0 0 0; NaN 0 NaN NaN; 0 0 0 0]; [1 0;1 0 ; 0 1; 0 1]; [1 0 0 0; 0 0 1 0]; [NaN 0; 0 NaN];
Specify the state-space model using ssm. Verify that the model specification is consistent with the state-space model. Mdl = ssm(A,B,C,D) Mdl = State-space model type: ssm State vector length: 4 Observation vector length: 2 State disturbance vector length: 2 Observation innovation vector length: 2 Sample size supported by model: Unlimited Unknown parameters for estimation: 8 State variables: x1, x2,... State disturbances: u1, u2,... Observation series: y1, y2,... Observation innovations: e1, e2,... Unknown parameters: c1, c2,... State x1(t) x2(t) x3(t) x4(t)
equations: = (c1)x1(t-1) + (c3)x2(t-1) + (c4)x3(t-1) + u1(t) = u1(t) = (c2)x1(t-1) + (c5)x3(t-1) + (c6)x4(t-1) + u2(t) = u2(t)
Observation equations: y1(t) = x1(t) + (c7)e1(t) y2(t) = x3(t) + (c8)e2(t) Initial state distribution: Initial state means are not specified. Initial state covariance matrix is not specified. State types are not specified.
Estimate the model parameters, and use a random set of initial parameter values for optimization. Restrict the estimate of σ1 and σ2 to all positive, real numbers using the 'lb' name-value pair 12-2298
simulate
argument. For numerical stability, specify the Hessian when the software computes the parameter covariance matrix, using the 'CovMethod' name-value pair argument. rng(1); params0 = rand(8,1); [EstMdl,estParams] = estimate(Mdl,isY,params0,... 'lb',[-Inf -Inf -Inf -Inf -Inf -Inf 0 0],'CovMethod','hessian'); Method: Maximum likelihood (fmincon) Sample size: 51 Logarithmic likelihood: -170.92 Akaike info criterion: 357.84 Bayesian info criterion: 373.295 | Coeff Std Err t Stat Prob ---------------------------------------------------c(1) | 0.06750 0.16548 0.40791 0.68334 c(2) | -0.01372 0.05887 -0.23302 0.81575 c(3) | 2.71201 0.27039 10.03006 0 c(4) | 0.83816 2.84586 0.29452 0.76836 c(5) | 0.06273 2.83474 0.02213 0.98235 c(6) | 0.05197 2.56875 0.02023 0.98386 c(7) | 0.00272 2.40777 0.00113 0.99910 c(8) | 0.00016 0.13942 0.00113 0.99910 | | Final State Std Dev t Stat Prob x(1) | -0.00000 0.00272 -0.00033 0.99973 x(2) | 0.12237 0.92954 0.13164 0.89527 x(3) | 0.04049 0.00016 256.74352 0 x(4) | 0.01183 0.00016 72.50925 0
EstMdl is an ssm model, and you can access its properties using dot notation. Filter the estimated, state-space model, and extract the filtered states and their variances from the final period. [~,~,Output] = filter(EstMdl,isY);
Modify the estimated, state-space model so that the initial state means and covariances are the filtered states and their covariances of the final period. This sets up simulation over the forecast horizon. EstMdl1 = EstMdl; EstMdl1.Mean0 = Output(end).FilteredStates; EstMdl1.Cov0 = Output(end).FilteredStatesCov;
Simulate 5e5 paths of observations from the fitted, state-space model EstMdl. Specify to simulate observations for each period. numPaths = 5e5; SimY = simulate(EstMdl1,10,'NumPaths',numPaths);
SimY is a 10-by- 2-by- numPaths array containing the simulated observations. The rows of SimY correspond to periods, the columns correspond to an observation in the model, and the pages correspond to paths. Estimate the forecasted observations and their 95% confidence intervals in the forecast horizon. 12-2299
12
Functions
MCFY = mean(SimY,3); CIFY = quantile(SimY,[0.025 0.975],3);
Estimate the theoretical forecast bands. [Y,YMSE] = forecast(EstMdl,10,isY); Lb = Y - sqrt(YMSE)*1.96; Ub = Y + sqrt(YMSE)*1.96;
Plot the forecasted observations with their true values and the forecast intervals. figure h = plot(dates(end-numPeriods-9:end),[isY(end-9:end,1);oosY(:,1)],'-k',... dates(end-numPeriods+1:end),MCFY(end-numPeriods+1:end,1),'.-r',... dates(end-numPeriods+1:end),CIFY(end-numPeriods+1:end,1,1),'-b',... dates(end-numPeriods+1:end),CIFY(end-numPeriods+1:end,1,2),'-b',... dates(end-numPeriods+1:end),Y(:,1),':c',... dates(end-numPeriods+1:end),Lb(:,1),':m',... dates(end-numPeriods+1:end),Ub(:,1),':m',... 'LineWidth',3); xlabel('Period') ylabel('Change in the unemployment rate') legend(h([1,2,4:6]),{'Observations','MC forecasts',... '95% forecast intervals','Theoretical forecasts',... '95% theoretical intervals'},'Location','Best') title('Observed and Forecasted Changes in the Unemployment Rate')
12-2300
simulate
figure h = plot(dates(end-numPeriods-9:end),[isY(end-9:end,2);oosY(:,2)],'-k',... dates(end-numPeriods+1:end),MCFY(end-numPeriods+1:end,2),'.-r',... dates(end-numPeriods+1:end),CIFY(end-numPeriods+1:end,2,1),'-b',... dates(end-numPeriods+1:end),CIFY(end-numPeriods+1:end,2,2),'-b',... dates(end-numPeriods+1:end),Y(:,2),':c',... dates(end-numPeriods+1:end),Lb(:,2),':m',... dates(end-numPeriods+1:end),Ub(:,2),':m',... 'LineWidth',3); xlabel('Period') ylabel('nGNP growth rate') legend(h([1,2,4:6]),{'Observations','MC forecasts',... '95% MC intervals','Theoretical forecasts','95% theoretical intervals'},... 'Location','Best') title('Observed and Forecasted nGNP Growth Rates')
Input Arguments Mdl — Standard state-space model ssm model object Standard state-space model, specified as anssm model object returned by ssm or estimate. A standard state-space model has finite initial state covariance matrix elements. That is, Mdl cannot be a dssm model object. 12-2301
12
Functions
If Mdl is not fully specified (that is, Mdl contains unknown parameters), then specify values for the unknown parameters using the 'Params' Name,Value pair argument. Otherwise, the software throws an error. numObs — Number of periods per path to simulate positive integer Number of periods per path to generate variants, specified as a positive integer. If Mdl is a time-varying model on page 11-5, then the length of the cell vector corresponding to the coefficient matrices must be at least numObs. If numObs is fewer than the number of periods that Mdl can support, then the software only uses the matrices in the first numObs cells of the cell vectors corresponding to the coefficient matrices. Data Types: double Name-Value Pair Arguments Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter. Before R2021a, use commas to separate each name and value, and enclose Name in quotes. Example: [Y,X] = simulate(Mdl,numObs,'NumPaths',100) NumPaths — Number of sample paths to generate variants 1 (default) | positive integer Number of sample paths to generate variants, specified as the comma-separated pair consisting of 'NumPaths' and a positive integer. Example: 'NumPaths',1000 Data Types: double Params — Values for unknown parameters numeric vector Values for unknown parameters in the state-space model, specified as the comma-separated pair consisting of 'Params' and a numeric vector. The elements of Params correspond to the unknown parameters in the state-space model matrices A, B, C, and D, and, optionally, the initial state mean Mean0 and covariance matrix Cov0. • If you created Mdl explicitly (that is, by specifying the matrices without a parameter-to-matrix mapping function), then the software maps the elements of Params to NaNs in the state-space model matrices and initial state values. The software searches for NaNs column-wise following the order A, B, C, D, Mean0, and Cov0. • If you created Mdl implicitly (that is, by specifying the matrices with a parameter-to-matrix mapping function), then you must set initial parameter values for the state-space model matrices, initial state values, and state types within the parameter-to-matrix mapping function. If Mdl contains unknown parameters, then you must specify their values. Otherwise, the software ignores the value of Params. 12-2302
simulate
Data Types: double
Output Arguments Y — Simulated observations matrix | cell matrix of numeric vectors Simulated observations, returned as a matrix or cell matrix of numeric vectors. If Mdl is a time-invariant model on page 11-4 with respect to the observations, then Y is a numObs-byn-by-numPaths array. That is, each row corresponds to a period, each column corresponds to an observation in the model, and each page corresponds to a sample path. The last row corresponds to the latest simulated observations. If Mdl is a time-varying model on page 11-5 with respect to the observations, then Y is a numObs-bynumPaths cell matrix of vectors. Y{t,j} contains a vector of length nt of simulated observations for period t of sample path j. The last row of Y contains the latest set of simulated observations. Data Types: cell | double X — Simulated states numeric matrix | cell matrix of numeric vectors Simulated states, returned as a numeric matrix or cell matrix of vectors. If Mdl is a time-invariant model with respect to the states, then X is a numObs-by-m-by-numPaths array. That is, each row corresponds to a period, each column corresponds to a state in the model, and each page corresponds to a sample path. The last row corresponds to the latest simulated states. If Mdl is a time-varying model with respect to the states, then X is a numObs-by-numPaths cell matrix of vectors. X{t,j} contains a vector of length mt of simulated states for period t of sample path j. The last row of X contains the latest set of simulated states. U — Simulated state disturbances matrix | cell matrix of numeric vectors Simulated state disturbances, returned as a matrix or cell matrix of vectors. If Mdl is a time-invariant model with respect to the state disturbances, then U is a numObs-by-h-bynumPaths array. That is, each row corresponds to a period, each column corresponds to a state disturbance in the model, and each page corresponds to a sample path. The last row corresponds to the latest simulated state disturbances. If Mdl is a time-varying model with respect to the state disturbances, then U is a numObs-bynumPaths cell matrix of vectors. U{t,j} contains a vector of length ht of simulated state disturbances for period t of sample path j. The last row of U contains the latest set of simulated state disturbances. Data Types: cell | double E — Simulated observation innovations matrix | cell matrix of numeric vectors Simulated observation innovations, returned as a matrix or cell matrix of numeric vectors. 12-2303
12
Functions
If Mdl is a time-invariant model with respect to the observation innovations, then E is a numObs-by-hby-numPaths array. That is, each row corresponds to a period, each column corresponds to an observation innovation in the model, and each page corresponds to a sample path. The last row corresponds to the latest simulated observation innovations. If Mdl is a time-varying model with respect to the observation innovations, then E is a numObs-bynumPaths cell matrix of vectors. E{t,j} contains a vector of length ht of simulated observation innovations for period t of sample path j. The last row of E contains the latest set of simulated observations. Data Types: cell | double
Tip Simulate states from their joint conditional posterior distribution given the responses by using simsmooth.
Version History Introduced in R2014a
References [1] Durbin J., and S. J. Koopman. Time Series Analysis by State Space Methods. 2nd ed. Oxford: Oxford University Press, 2012.
See Also ssm | estimate | filter | forecast | smooth | simsmooth Topics “Simulate States and Observations of Time-Invariant State-Space Model” on page 11-87 “Simulate Time-Varying State-Space Model” on page 11-90 “Forecast State-Space Model Using Monte-Carlo Methods” on page 11-104 “Estimate Random Parameter of State-Space Model” on page 11-97 “What Are State-Space Models?” on page 11-3
12-2304
simulate
simulate Simulate sample paths of threshold-switching dynamic regression model
Syntax Y = simulate(Mdl,numObs) Y = simulate(Mdl,numObs,Name,Value) [Y,E,StatePaths] = simulate( ___ )
Description Y = simulate(Mdl,numObs) returns a random numObs-period path of response series Y from simulating the fully specified threshold-switching dynamic regression model Mdl. Y = simulate(Mdl,numObs,Name,Value) uses additional options specified by one or more namevalue arguments. For example, simulate(Mdl,10,NumPaths=1000,Y0=Y0) simulates 1000 sample paths of length 10, and initializes the dynamic component of each submodel by using the presample response data Y0. [Y,E,StatePaths] = simulate( ___ ) also returns the simulated innovation paths E and the simulated state paths StatePaths, using any of the input argument combinations in the previous syntaxes.
Examples Simulate Response Path from SETAR Model Suppose a data-generating process (DGP) is a two-state, self-exciting threshold autoregressive (SETAR) model for a 1-D response variable. Specify all parameter values (this example uses arbitrary values). Create Fully Specified Model for DGP Create a discrete threshold transition at level 0. Label the regimes to reflect the state of the economy: When the threshold variable (currently unknown) is in − ∞ , 0 , the economy is in a recession. When the threshold variable is in [0, ∞ , the economy is expanding. t = 0; tt = threshold(t,StateNames=["Recession" "Expansion"]) tt = threshold with properties: Type: Levels: Rates: StateNames:
'discrete' 0 [] ["Recession"
"Expansion"]
12-2305
12
Functions
NumStates: 2
tt is a fully specified threshold object that describes the switching mechanism of the thresholdswitching model. Assume the following univariate models describe the response process of the system: • Recession: yt = − 1 + 0 . 1yt − 1 + ε1, t, where ε1, t ∼ Ν 0, 1 . 2 • Expansion: y = 1 + + 0 . 3y t t − 1 + 0 . 2yt − 2 + ε2, t, where ε2, t ∼ Ν 0, 2 .
For each regime, use arima to create an AR model that describes the response process within the regime. c1 = -1; c2 = 1; ar1 = 0.1; ar2 = [0.3 0.2]; v1 = 1; v2 = 4; mdl1 = arima(Constant=c1,AR=ar1,Variance=v1,... Description="Recession State Model") mdl1 = arima with properties: Description: SeriesName: Distribution: P: D: Q: Constant: AR: SAR: MA: SMA: Seasonality: Beta: Variance:
"Recession State Model" "Y" Name = "Gaussian" 1 0 0 -1 {0.1} at lag [1] {} {} {} 0 [1×0] 1
ARIMA(1,0,0) Model (Gaussian Distribution) mdl2 = arima(Constant=c2,AR=ar2,Variance=v2,... Description="Expansion State Model") mdl2 = arima with properties: Description: SeriesName: Distribution: P: D: Q: Constant:
12-2306
"Expansion State Model" "Y" Name = "Gaussian" 2 0 0 1
simulate
AR: SAR: MA: SMA: Seasonality: Beta: Variance:
{0.3 0.2} at lags [1 2] {} {} {} 0 [1×0] 4
ARIMA(2,0,0) Model (Gaussian Distribution)
mdl1 and mdl2 are fully specified arima objects. Store the submodels in a vector with order corresponding to the regimes in tt.StateNames. mdl = [mdl1; mdl2];
Use tsVAR to create a TAR model from the switching mechanism tt and the state-specific submodels mdl. Mdl = tsVAR(tt,mdl) Mdl = tsVAR with properties: Switch: Submodels: NumStates: NumSeries: StateNames: SeriesNames: Covariance:
[1x1 threshold] [2x1 varm] 2 1 ["Recession" "Expansion"] "1" []
Mdl.Submodels(2) ans = varm with properties: Description: SeriesNames: NumSeries: P: Constant: AR: Trend: Beta: Covariance:
"AR-Stationary 1-Dimensional VAR(2) Model" "Y1" 1 2 1 {0.3 0.2} at lags [1 2] 0 [1×0 matrix] 4
Mdl is a fully specified tsVAR object representing a univariate two-state TAR model. tsVAR stores specified arima submodels as varm objects. Simulate Response Data from DGP Generate one random response path of length 50 from the model. simulate assumes that the threshold variable is yt − 1, which implies that the model is self-exciting. rng(1); % For reproducibility y = simulate(Mdl,50);
12-2307
12
Functions
y is a 50-by-1 vector of one response path. Plot the response path with the threshold by using ttplot. figure ttplot(Mdl.Switch,Data=y)
Simulate Multiple Paths Consider the following logistic TAR (LSTAR) model for the annual, CPI-based, Canadian inflation rate series yt. • State 1: y = − 5 + ε , where ε ∼ Ν 0, 0 . 12 . 1, t 1, t t • State 2: y = ε , where ε ∼ Ν 0, 0 . 22 . 2, t 2, t t • State 3: y = 5 + ε , where ε ∼ Ν 0, 0 . 32 . 3, t 3, t t • The system is in state 1 when yt < 2, the system is in state 2 when 2 ≤ yt < 8, and the system is in state 3 otherwise. • The transition function rate between states 1 and 2 is 3.5, and the transition function rate between states 2 and 3 is 1.5. Create an LSTAR model representing yt. 12-2308
simulate
t = [2 8]; tt = threshold([2 8],Type="logistic",Rates=[3.5 1.5]); mdl1 = arima(Constant=-5,Variance=0.1); mdl2 = arima(Constant=0,Variance=0.2); mdl3 = arima(Constant=5,Variance=0.3); Mdl = tsVAR(tt,[mdl1; mdl2; mdl3]);
Load the Canadian inflation and interest rate data set. load Data_Canada
Extract the CPI-based inflation rate series. INF_C = DataTable.INF_C; numObs = length(INF_C);
Simulate ten paths from the model. Specify the threshold variable type and its data. Y = simulate(Mdl,numObs,NumPaths=10,Type="exogenous",Z=INF_C);
Y is a numObs-by-10 matrix of simulated paths. Each column represents an independently simulated path. In a tiled layout, plot the threshold transitions with the data by using ttplot,and plot the simulated paths to one tile. tiledlayout(2,1) nexttile ttplot(tt,Data=INF_C) colorbar('off') xticklabels(dates(xticks)) nexttile plot(dates,Y) grid on axis tight title("Simulations")
12-2309
12
Functions
Y switches between submodels according to the value of the threshold variable INF_C. Mixing is evident for observations near thresholds, such as at the inflation rates of 1964 and 1978.
Return Innovations and States Consider the model for the annual, CPI-based, Canadian inflation rate series in “Simulate Multiple Paths” on page 12-2308. Create the LSTAR model for the series. t = [2 8]; tt = threshold([2 8],Type="logistic",Rates=[3.5 1.5]); mdl1 = arima(Constant=-5,Variance=0.1); mdl2 = arima(Constant=0,Variance=0.2); mdl3 = arima(Constant=5,Variance=0.3); Mdl = tsVAR(tt,[mdl1; mdl2; mdl3]);
Load the Canadian inflation and interest rate data set and extract the inflation rate series. load Data_Canada INF_C = DataTable.INF_C; numObs = length(INF_C);
12-2310
simulate
Simulate a length numObs path from the model. Specify the threshold variable type and its data. Return the innovations and states. [y,e,s] = simulate(Mdl,numObs,NumPaths=10,Type="exogenous",Z=INF_C); tiledlayout(3,1) nexttile plot(y); ylabel("Simulated Response") grid on nexttile plot(e) ylabel('Innovation') grid on nexttile stem(s) ylabel('State') yticks([1 2 3]) yticklabels(Mdl.StateNames)
12-2311
12
Functions
Initialize Multivariate Model Simulation from Multiple Starting Conditions This example shows how to initialize simulated paths from presample responses and initial states. The example uses arbitrary parameter values. Fully Specify LSETAR Model Consider the following 2-D LSETAR model. • • •
State 1, "Low": yt =
1 0 1 −0 . 1 + ε1, t, where ε1, t ∼ N , . −1 0 −0 . 1 1
State 2 , "Med": yt =
2 0.5 0.1 0 2 −0 . 2 + yt − 1 + ε2, t, where ε2, t ∼ N , . −2 0.5 0.5 0 −0 . 2 2
3 0 . 25 0 0 0 + yt − 1 + yt − 2 + ε3, t, where −3 0 0 0 . 25 0 0 3 −0 . 3 ε3, t ∼ N , . 0 −0 . 3 3 State 3, "High": yt =
• The system is in state 1 when y2, t − 4 < − 1, the system is in state 2 when −1 ≤ y2, t − 4 < 1, and the system is in state 3 otherwise. • The transition function is logistic. The transition rate from state 1 to 2 is 3.5, and the transition rate from state 1 to 3 is 1.5. Create logistic threshold transitions at mid-levels -1 and 1 with rates 3.5 and 1.5, respectively. Label the states. t = [-1 1]; r = [3.5 1.5]; stateNames = ["Low" "Med" "High"]; tt = threshold(t,Type="logistic",Rates=[3.5 1.5],StateNames=stateNames);
Create the VAR submodels by using varm. Store the submodels in a vector with order corresponding to the regimes in tt.StateNames. % Constants (numSeries x 1 vectors) C1 = [1; -1]; C2 = [2; -2]; C3 = [3; -3]; % Autoregression coefficients (numSeries AR1 = {}; % 0 AR2 = {[0.5 0.1; 0.5 0.5]}; % 1 AR3 = {[0.25 0; 0 0] [0 0; 0.25 0]}; % 2
x numSeries matrices) lags lag lags
% Innovations covariances (numSeries x numSeries matrices) Sigma1 = [1 -0.1; -0.1 1]; Sigma2 = [2 -0.2; -0.2 2]; Sigma3 = [3 -0.3; -0.3 3]; % VAR Submodels mdl1 = varm('Constant',C1,'AR',AR1,'Covariance',Sigma1); mdl2 = varm('Constant',C2,'AR',AR2,'Covariance',Sigma2); mdl3 = varm('Constant',C3,'AR',AR3,'Covariance',Sigma3); mdl = [mdl1; mdl2; mdl3];
12-2312
simulate
Create an LSETAR model from the switching mechanism tt and the state-specific submodels mdl. Label the series Y1 and Y2. Mdl = tsVAR(tt,mdl,SeriesNames=["Y1" "Y2"]) Mdl = tsVAR with properties: Switch: Submodels: NumStates: NumSeries: StateNames: SeriesNames: Covariance:
[1x1 threshold] [3x1 varm] 3 2 ["Low" "Med" ["Y1" "Y2"] []
"High"]
Mdl is a fully specified tsVAR object representing a multivariate three-state LSETAR model. tsVAR object functions enable you to specify threshold variable characteristics and data. Initialize Simulation from Presample Responses Consider simulating 5 paths initialized from presample responses. Specify a numPreObs-bynumSeries-by-numPaths array of presample responses, where: • numPreObs is the number of presample responses per series and path. You must specify enough presample observations to initialize all AR components in the VAR models and the endogenous threshold variable. The largest AR component order is 2 and the threshold variable delay is 4, therefore simulate requires numPreObs=4 presample observations per series and path. • numSeries=2, the number of response series in the system. • numPaths=5, the number of independent paths to simulate. delay = 4; numPaths = 5; Y0 = zeros(delay,Mdl.NumSeries,numPaths); for j = 2:numPaths Y0(:,:,j) = 10*j*ones(delay,Mdl.NumSeries); end
Simulate 10 paths of length 100 from the LSETAR model from the presample. Specify the endogenous threshold variable and its delay, y2, t − 4. numObs = 100; rng(1); Y = simulate(Mdl,numObs,NumPaths=numPaths,Y0=Y0,Index=2,Delay=4);
Y is a 100-by-2-by-5 array of simulate response paths. For example, Y(50,2,3) is the simulated response of path 3, of series Y2, at time point 50. Plot the simulated paths for each variable on separate plots. tiledlayout(2,1) nexttile plot(squeeze(Y(:,1,:))) title("Y1") nexttile
12-2313
12
Functions
plot(squeeze(Y(:,2,:))) title("Y2")
The system quickly settles regardless of the presample. Initialize Simulation from States Simulate three paths of length 100, where each of the three states initialize a path. Specify state indices for initialization, and specify the endogenous threshold variable and its delay. S0 = 1:Mdl.NumStates; numPaths = numel(S0); Y = simulate(Mdl,numObs,NumPaths=numPaths,S0=S0,Index=2,Delay=4); tiledlayout(2,1) nexttile plot(squeeze(Y(:,1,:))) title("Y1") nexttile plot(squeeze(Y(:,2,:))) title("Y2")
12-2314
simulate
Simulate Model Containing Exogenous Regression Component Consider including regression components for exogenous variables in each submodel of the threshold-switching dynamic regression model in “Initialize Multivariate Model Simulation from Multiple Starting Conditions” on page 12-2311. Fully Specify LSETAR Model Create logistic threshold transitions at mid-levels -1 and 1 with rates 3.5 and 1.5, respectively. Label the states. t = [-1 1]; r = [3.5 1.5]; stateNames = ["Low" "Med" "High"]; tt = threshold(t,Type="logistic",Rates=[3.5 1.5],StateNames=stateNames) tt = threshold with properties: Type: Levels: Rates: StateNames: NumStates:
'logistic' [-1 1] [3.5000 1.5000] ["Low" "Med" 3
"High"]
12-2315
12
Functions
Assume the following VARX models describe the response processes of the system: • • •
State 1: yt =
1 1 0 1 −0 . 1 + x1, t + ε1, t, where ε1, t ∼ N , . −1 −1 0 −0 . 1 1
State 2: yt =
2 2 2 0.5 0.1 0 2 −0 . 2 + x2, t + yt − 1 + ε2, t, where ε2, t ∼ N , . −2 −2 −2 0.5 0.5 0 −0 . 2 2
3 3 3 3 0 . 25 0 0 0 + x3, t + yt − 1 + yt − 2 + ε3, t, where −3 −3 −3 −3 0 0 0 . 25 0 0 3 −0 . 3 ε3, t ∼ N , . 0 −0 . 3 3 State 3: yt =
x1, t represents a single exogenous variable, x2, t represents two exogenous variables, and x3, t represents three exogenous variables. Store the submodels in a vector. % Constants (numSeries x 1 vectors) C1 = [1; -1]; C2 = [2; -2]; C3 = [3; -3]; % Regression coefficients (numSeries x numRegressors matrices) Beta1 = [1; -1]; % 1 regressor Beta2 = [2 2; -2 -2]; % 2 regressors Beta3 = [3 3 3; -3 -3 -3]; % 3 regressors % Autoregression coefficients (numSeries x numSeries matrices) AR1 = {}; AR2 = {[0.5 0.1; 0.5 0.5]}; AR3 = {[0.25 0; 0 0] [0 0; 0.25 0]}; % Innovations covariances (numSeries x numSeries matrices) Sigma1 = [1 -0.1; -0.1 1]; Sigma2 = [2 -0.2; -0.2 2]; Sigma3 = [3 -0.3; -0.3 3]; %VARX submodels mdl1 = varm(Constant=C1,AR=AR1,Beta=Beta1,Covariance=Sigma1); mdl2 = varm(Constant=C2,AR=AR2,Beta=Beta2,Covariance=Sigma2); mdl3 = varm(Constant=C3,AR=AR3,Beta=Beta3,Covariance=Sigma3); mdl = [mdl1; mdl2; mdl3];
Create an LSETAR model from the switching mechanism tt and the state-specific submodels mdl. Label the series Y1 and Y2. Mdl = tsVAR(tt,mdl,SeriesNames=["Y1" "Y2"]) Mdl = tsVAR with properties: Switch: Submodels: NumStates: NumSeries: StateNames: SeriesNames:
12-2316
[1x1 threshold] [3x1 varm] 3 2 ["Low" "Med" ["Y1" "Y2"]
"High"]
simulate
Covariance: []
Simulate Data Ignoring Regression Component If you do not supply exogenous data, simulate ignores the regression components in the submodels. Simulate a single path of responses, innovations, and states into a simulation horizon of length 50. Then plot each path separately. rng(1); % For reproducibility numObs = 50; [Y,E,SP] = simulate(Mdl,numObs); figure tiledlayout(3,1) nexttile plot(Y) ylabel("Response") grid on legend(["y_1" "y_2"]) nexttile plot(E) ylabel("Innovation") grid on legend(["e_1" "e_2"]) nexttile stem(SP) ylabel("State") yticks([1 2 3])
12-2317
12
Functions
Simulate Data Including Regression Component simulate requires exogenous data in order to generate random paths from the model. Simulate exogenous data for the three regressors by generating 50 random observations from the 3-D standard Gaussian distribution. X = randn(50,3);
Generate one random response, innovation, and state path of length 50. Specify the simulated exogenous data for the submodel regression components. Plot the results. rng(1); % Reset seed for comparison [Y,E,SP] = simulate(Mdl,numObs,X=X); figure tiledlayout(3,1) nexttile plot(Y) ylabel("Response") grid on legend(["y_1" "y_2"]) nexttile plot(E) ylabel("Innovation") grid on legend(["e_1" "e_2"]) nexttile
12-2318
simulate
stem(SP) ylabel("State") yticks([1 2 3])
Perform Monte Carlo Estimation This example shows how to use Monte Carlo estimation to obtain an interval estimate of the threshold mid-level. Consider a SETAR model for the real US GDP growth rate yt with AR(4) submodels. Suppose the threshold variable is yt (self exciting with 0 delay). Create a discrete threshold transition at unknown mid-level t1. Label the states "Recession" and "Expansion". tt = threshold(NaN,StateNames=["Recession" "Expansion"]);
For each state, create a partially specified AR(4) model with one coefficient at lag 4. Store the state submodels in a vector. submdl = arima(ARLags=4); mdl = [submdl; submdl];
Each submodel has an unknown, estimable lag 4 coefficient, model constant, and innovations variance. 12-2319
12
Functions
Create a partially specified TAR model from the threshold transition and submodel vector. Mdl = tsVAR(tt,mdl);
Create a fully specified threshold transition that has the same structure as tt, but set the mid-level to 0. tt0 = threshold(0);
Load the US macroeconomic data set. Compute the real GDP growth rate as a percent. load Data_USEconModel rGDP = DataTimeTable.GDP./DataTimeTable.GDPDEF; pRGDP = 100*price2ret(rGDP); T = numel(pRGDP);
Fit the TAR to the real GDP rate series. EstMdl = estimate(Mdl,tt0,pRGDP,Z=pRGDP,Type="exogenous");
Simulate 100 response paths from the estimated model. rng(100) % For reproducibility numPaths= 100; Y = simulate(EstMdl,T,NumPaths=numPaths,Z=pRGDP,Type="exogenous");
Fit the TAR model to each simulated response path. Specify the estimated threshold transition EstMdl.Switch to initialize the estimation procedure. For each path, store the estimated threshold transition mid-level. tMC = nan(T,1); for j = 1:numPaths EstMdlSim = estimate(Mdl,EstMdl.Switch,Y(:,j),Z=Y(:,j),Type="exogenous"); tMC(j) = EstMdlSim.Switch.Levels; end
tMC is a 100-by-1 vector representing a Monte Carlo sample of threshold transitions. Obtain a 95% confidence interval on the true threshold transition by computing the 0.25 and .975 quantiles of the Monte Carlo sample. tCI = quantile(tMC,[0.025 0.975]) tCI = 1×2 0.5158
0.9810
A 95% confidence interval on the true threshold transition is (0.52%, 0.98%).
Input Arguments Mdl — Fully specified threshold-switching dynamic regression model tsVAR model object 12-2320
simulate
Fully specified threshold-switching dynamic regression model, specified as an tsVAR model object returned by tsVAR or estimate. Properties of a fully specified model object do not contain NaN values. numObs — Number of observations to generate positive integer Number of observations to generate for each sample path, specified as a positive integer. Data Types: double Name-Value Arguments Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter. Before R2021a, use commas to separate each name and value, and enclose Name in quotes. Example: NumPaths=1000,Y0=Y0 simulates 1000 sample paths and initializes the dynamic component of each submodel by using the presample response data Y0. NumPaths — Number of sample paths to generate 1 (default) | positive integer Number of sample paths to generate, specified as a positive integer. Example: NumPaths=1000 Data Types: double Type — Type of threshold variable data "endogenous" (default) | "exogenous" Type of threshold variable data, specified as a value in this table. Value
Description
"endogenous"
The model is self-exciting with threshold variable data zt = y j, (t − d), generated by response j, where • The name-value argument 'Delay' specifies the delay d. • The name-value argument 'Index' specifies the component j of the multivariate response variable.
"exogenous"
The threshold variable is exogenous to the system. The name-value argument 'Z' specifies the threshold variable data and is required.
Example: Type="exogenous",Z=z specifies the data z for the exogenous threshold variable. Example: Type="endogenous",Index=2,Delay=4 specifies the endogenous threshold variable as y2,t−4, whose data is Y(:,2). Data Types: char | string | cell 12-2321
12
Functions
Y0 — Presample response data numeric matrix | numeric array Presample response data, specified as a numeric matrix or array. To use the same presample data for each of the numPaths path, specify a numPreSampleObs-bynumSeries matrix, where numPaths is the value of NumPaths, numPreSampleObs is the number of presample observations, and numSeries is the number of response variables. To use different presample data for each path: • For univariate ARX submodels, specify a numPreSampleObs-by-numPaths matrix. • For multivariate VARX submodels, specify a numPreSampleObs-by-numSeries-by-numPaths array. The number of presample observations numPreSampleObs must be sufficient to initialize the AR terms of all submodels. For models of type "endogenous", the number of presample observations must also be sufficient to initialize the delayed response. If numPreSampleObs exceeds the number necessary to initial the model, simulate uses only the latest observations. The last row contains the latest observations. simulate updates Y0 using the latest simulated observations each time it switches states. By default, simulate determines Y0 by the submodel of the initial state: • If the initial submodel is a stationary AR process without regression components, simulate sets presample observations to the unconditional mean. • Otherwise, simulate sets presample observations to zero. Data Types: double Z — Threshold variable data zt empty array ([]) (default) | numeric vector | numeric matrix Threshold variable data for simulations of type "exogenous", specified as a numeric vector of length numObsZ or a numObsZ-by-numPaths numeric matrix. For a numeric vector, simulate applies the same data to all simulated paths. For a matrix, simulate applies columns of Z to corresponding simulated paths. If numObsZ exceeds numobs, simulate uses only the latest observations. The last row contains the latest observation. simulate determines the initial state of simulations by values in the first row Z(1,:). Data Types: double Delay — Threshold variable delay d in yj,t−d 1 (default) | positive integer Threshold variable delay d in yj,t−d for simulations of type "endogenous", specified as a positive integer. Example: Delay=4 specifies that the threshold variable is y2,t−d, where j is the value of Index. Data Types: double 12-2322
simulate
Index — Threshold variable index j in yj,t−d 1 (default) | scalar in 1:Mdl.NumSeries Threshold variable index j in yj,t−d for simulations of type "endogenous", specified as a scalar in 1:Mdl.NumSeries. simulate ignores Index for univariate AR models. Example: Index=2 specifies that the threshold variable is y2,t−d, where d is the value of Delay. Data Types: double S0 — Initial states 1 (default) | numeric scalar | numeric vector Initial states of simulations, for simulations of type "endogenous", specified as a numeric scalar or vector of length numPaths. Entries of S0 must be in 1:Mdl.NumStates. A scalar S0 applies the same initial state to all paths. A vector S0 applies initial state S0(j) to path j. If you specify Y0, simulate ignores S0 and determines initial states by the specified presample data. Example: 'S0',2 applies state 2 to initialize all paths. Example: 'S0',[2 3] specifies state 2 as the initial state. Data Types: double X — Predictor data numeric matrix | cell vector of numeric matrices Predictor data used to evaluate regression components in all submodels of Mdl, specified as a numeric matrix or a cell vector of numeric matrices. To use a subset of the same predictors in each state, specify X as a matrix with numPreds columns and at least numObs rows. Columns correspond to distinct predictor variables. Submodels use initial columns of the associated matrix, in order, up to the number of submodel predictors. The number of columns in the Beta property of Mdl.SubModels(j) determines the number of exogenous variables in the regression component of submodel j. If the number of rows exceeds numObs, then simulate uses the latest observations. To use different predictors in each state, specify a cell vector of such matrices with length numStates. By default, simulate ignores regression components in Mdl. Data Types: double
Output Arguments Y — Simulated response paths numeric matrix | numeric array Simulated response paths, returned as a numeric matrix or array. Y represents the continuation of the presample responses in Y0. 12-2323
12
Functions
For univariate ARX submodels, Y is a numObs-by-numPaths matrix. For multivariate VARX submodels, Y is a numObs-by-numSeries-by-numPaths array. E — Simulated innovation paths numeric matrix | numeric array Simulated innovation paths, returned as a numeric matrix or array. For univariate ARX submodels, E is a numObs-by-numPaths matrix. For multivariate VARX submodels, E is a numObs-by-numSeries-by-numPaths array. simulate generates innovations using the covariance specification in Mdl. For more details, see tsVAR. StatePaths — Simulated state paths numeric matrix Simulated state paths, returned as a numObs-by-numPaths numeric matrix. If threshold levels in Mdl.Switch.Levels are t1, t2,… ,tn, simulate labels states of the threshold variable (-∞,t1), [t1,t2), … [tn,∞) as 1, 2, 3,... n + 1, respectively.
Version History Introduced in R2021b
References [1] Teräsvirta, Tima. "Modelling Economic Relationships with Smooth Transition Regressions." In A. Ullahand and D.E.A. Giles (eds.), Handbook of Applied Economic Statistics, 507–552. New York: Marcel Dekker, 1998. [2] van Dijk, Dick. Smooth Transition Models: Extensions and Outlier Robust Inference. Rotterdam, Netherlands: Tinbergen Institute Research Series, 1999.
See Also Objects tsVAR Functions estimate | forecast Topics “Estimate Threshold-Switching Dynamic Regression Models” on page 10-94 “Simulate Paths of Threshold-Switching Dynamic Regression Models” on page 10-111 “Forecast Threshold-Switching Dynamic Regression Models” on page 10-118
12-2324
simulate
simulate Monte Carlo simulation of vector autoregression (VAR) model
Syntax Y = simulate(Mdl,numobs) Y = simulate(Mdl,numobs,Name=Value) [Y,E] = simulate( ___ ) Tbl = simulate(Mdl,numobs,Presample=Presample) Tbl = simulate(Mdl,numobs,Presample=Presample,Name=Value) Tbl = simulate(Mdl,numobs,InSample=InSample,ResponseVariables= ResponseVariables) Tbl = simulate(Mdl,numobs,InSample=InSample,ResponseVariables= ResponseVariables,Presample=Presample) Tbl = simulate( ___ ,Name=Value)
Description Conditional and Unconditional Simulation for Numeric Arrays
Y = simulate(Mdl,numobs) returns the numeric array Y containing a random numobs-period path of multivariate response series from performing an unconditional simulation of the fully specified VAR(p) model Mdl. Y = simulate(Mdl,numobs,Name=Value) uses additional options specified by one or more namevalue arguments. simulate returns numeric arrays when all optional input data are numeric arrays. For example, simulate(Mdl,100,NumPaths=1000,Y0=PS) returns a numeric array of 1000, 100period simulated response paths from Mdl and specifies the numeric array of presample response data PS. To produce a conditional simulation, specify response data in the simulation horizon by using the YF name-value argument. [Y,E] = simulate( ___ ) also returns the numeric array containing the simulated multivariate model innovations series E corresponding to the simulated responses Y, using any input argument combination in the previous syntaxes. Unconditional Simulation for Tables and Timetables
Tbl = simulate(Mdl,numobs,Presample=Presample) returns the table or timetable Tbl containing the random multivariate response and innovations variables, which results from the unconditional simulation of the response series in the model Mdl. simulate uses the table or timetable of presample data Presample to initialize the response series. simulate selects the variables in Mdl.SeriesNames to simulate, or it selects all variables in Presample. To select different response variables in Tbl to simulate, use the PresampleResponseVariables name-value argument. Tbl = simulate(Mdl,numobs,Presample=Presample,Name=Value) uses additional options specified by one or more name-value arguments. For example, 12-2325
12
Functions
simulate(Mdl,100,Presample=PSTbl,PresampleResponseVariables=["GDP" "CPI"]) returns a timetable of variables containing 100-period simulated response and innovations series from Mdl, initialized by the data in the GDP and CPI variables of the timetable of presample data in PSTbl. Conditional Simulation for Tables and Timetables
Tbl = simulate(Mdl,numobs,InSample=InSample,ResponseVariables= ResponseVariables) returns the table or timetable Tbl containing the random multivariate response and innovations variables, which results from the conditional simulation of the response series in the model Mdl. InSample is a table or timetable of response or predictor data in the simulation horizon that simulate uses to perform the conditional simulation and ResponseVariables specifies the response variables in InSample. Tbl = simulate(Mdl,numobs,InSample=InSample,ResponseVariables= ResponseVariables,Presample=Presample) uses the presample data in the table or timetable Presample to initialize the model. Tbl = simulate( ___ ,Name=Value) uses additional options specified by one or more name-value arguments, using any input argument combination in the previous two syntaxes.
Examples Return Response Series in Matrix from Unconditional Simulation Fit a VAR(4) model to the consumer price index (CPI) and unemployment rate data. Then, simulate a vector of responses from the estimated model. Load the Data_USEconModel data set. load Data_USEconModel
Plot the two series on separate plots. figure plot(DataTimeTable.Time,DataTimeTable.CPIAUCSL); title("Consumer Price Index") ylabel("Index") xlabel("Date")
12-2326
simulate
figure plot(DataTimeTable.Time,DataTimeTable.UNRATE); title("Unemployment Rate") ylabel("Percent") xlabel("Date")
12-2327
12
Functions
Stabilize the CPI by converting it to a series of growth rates. Synchronize the two series by removing the first observation from the unemployment rate series. Create a new data set containing the transformed variables, and do not include any rows containing at least one missing observation. rcpi = price2ret(DataTimeTable.CPIAUCSL); unrate = DataTimeTable.UNRATE(2:end); dates = DataTimeTable.Time(2:end); Data = array2timetable([rcpi unrate],RowTimes=dates, ... VariableNames=["RCPI" "UNRATE"]); Data = rmmissing(Data);
Create a default VAR(4) model by using the shorthand syntax. Mdl = varm(2,4); Mdl.SeriesNames = Data.Properties.VariableNames;
Estimate the model using the entire data set. EstMdl = estimate(Mdl,Data.Variables);
EstMdl is a fully specified, estimated varm model object. Simulate a response series path from the estimated model with length equal to the path in the data. rng(1); % For reproducibility numobs = height(Data); Y = simulate(EstMdl,numobs);
12-2328
simulate
Y is a 245-by-2 matrix of simulated responses. The first and second columns contain the simulated CPI growth rate and unemployment rate, respectively. Plot the simulated and true responses. figure plot(Data.Time,Y(:,1)); hold on plot(Data.Time,Data.RCPI) title("CPI Growth Rate"); ylabel("Growth Rate") xlabel("Date") legend("Simulation","Observed") hold off
figure plot(Data.Time,Y(:,2)); hold on plot(Data.Time,Data.UNRATE) ylabel("Percent") xlabel("Date") title("Unemployment Rate") legend("Simulation","Observed") hold off
12-2329
12
Functions
Simulate Responses Using filter Illustrate the relationship between simulate and filter by estimating a 4-D VAR(2) model of the four response series in Johansen's Danish data set. Simulate a single path of responses using the fitted model and the historical data as initial values, and then filter a random set of Gaussian disturbances through the estimated model using the same presample responses. Load Johansen's Danish economic data. load Data_JDanish
For details on the variables, enter Description. Create a default 4-D VAR(2) model. Mdl = varm(4,2); Mdl.SeriesNames = DataTimeTable.Properties.VariableNames;
Estimate the VAR(2) model using the entire data set. EstMdl = estimate(Mdl,DataTimeTable.Variables);
When reproducing the results of simulate and filter: 12-2330
simulate
• Set the same random number seed using rng. • Specify the same presample response data using the Y0 name-value argument. Set the default random seed. Simulate 100 observations by passing the estimated model to simulate. Specify the entire data set as the presample. rng("default") YSim = simulate(EstMdl,100,Y0=DataTimeTable.Variables);
YSim is a 100-by-4 matrix of simulated responses. Columns correspond to the columns of the variables in Data. Set the default random seed. Simulate 4 series of 100 observations from the standard Gaussian distribution. rng("default") Z = randn(100,4);
Filter the Gaussian values through the estimated model. Specify the entire data set as the presample. YFilter = filter(EstMdl,Z,Y0=DataTimeTable.Variables);
YFilter is a 100-by-4 matrix of simulated responses. Columns correspond to the columns of the variables in the data Data. Before filtering the disturbances, filter scales Z by the lower triangular Cholesky factor of the model covariance in EstMdl.Covariance. Compare the resulting responses between filter and simulate. (YSim - YFilter)'*(YSim - YFilter) ans = 4×4 0 0 0 0
0 0 0 0
0 0 0 0
0 0 0 0
The results are identical.
Simulate Arrays of Multiple Response and Innovations Paths Load Johansen's Danish economic data. Remove all missing observations. load Data_JDanish Data = rmmissing(Data); T = height(Data);
For details on the variables, enter Description. Create a default 4-D VAR(2) model. Mdl = varm(4,2);
Estimate the VAR(2) model using the entire data set. 12-2331
12
Functions
EstMdl = estimate(Mdl,Data);
When reproducing the results of simulate and filter: • Set the same random number seed using rng. • Specify the same presample response data using the Y0 name-value argument. Simulate 100 paths of T – EstMdl.P, the effective sample size, responses, and corresponding innovations by passing the estimated model to simulate. Specify the same matrix of presample as the presample used in estimation (the earliest Mdl.P observations, by default). rng("default") p = Mdl.P; numobs = T - p; PS = Data(1:p,:); [YSim,ESim] = simulate(EstMdl,numobs,NumPaths=100,Y0=PS); size(YSim) ans = 1×3 53
4
100
YSim and ESim are 53-by-4-by-1000 numeric arrays of simulated responses and innovations, respectively. Each row corresponds to a period in the simulation horizon, each column corresponds to the variable in EstMdl.SeriesNames, and pages are separate, independently simulated paths. Plot each simulated response and innovations variable with their observations. figure InSample = Data((p+1):end,:); tiledlayout(2,2) for j = 1:numel(EstMdl.SeriesNames) nexttile h1 = plot(squeeze(YSim(:,j,:)),Color=[0.8 0.8 0.8]); hold on h2 = plot(InSample(:,j),Color="k",LineWidth=2); hold off title(series(j)) legend([h1(1) h2],["Simulated" "Observed"]) end
12-2332
simulate
E = infer(EstMdl,InSample,Y0=PS); figure tiledlayout(2,2) for j = 1:numel(EstMdl.SeriesNames) nexttile h1 = plot(squeeze(ESim(:,j,:)),Color=[0.8 0.8 0.8]); hold on h2 = plot(E(:,j),Color="k",LineWidth=2); hold off title("Innovations: " + EstMdl.SeriesNames{j}) legend([h1(1) h2],["Simulated" "Observed"]) end
12-2333
12
Functions
Return Timetable of Responses and Innovations from Unconditional Simulation Fit a VAR(4) model to the consumer price index (CPI) and unemployment rate data. Then, perform an unconditional simulation of the estimated model and return the simulated responses and corresponding innovations in a timetable. This example is based on “Return Response Series in Matrix from Unconditional Simulation” on page 12-2326. Load and Preprocess Data Load the Data_USEconModel data set. Compute the CPI growth rate. Because the growth rate calculation consumes the earliest observation, include the rate variable in the timetable by prepending the series with NaN. load Data_USEconModel DataTimeTable.RCPI = [NaN; price2ret(DataTimeTable.CPIAUCSL)]; T = height(DataTimeTable) T = 249
Prepare Timetable for Estimation When you plan to supply a timetable directly to estimate, you must ensure it has all the following characteristics: 12-2334
simulate
• All selected response variables are numeric and do not contain any missing values. • The timestamps in the Time variable are regular, and they are ascending or descending. Remove all missing values from the table, relative to the CPI rate (RCPI) and unemployment rate (UNRATE) series. varnames = ["RCPI" "UNRATE"]; DTT = rmmissing(DataTimeTable,DataVariables=varnames); T = height(DTT) T = 245
rmmissing removes the four initial missing observations from the DataTimeTable to create a subtable DTT. The variables RCPI and UNRATE of DTT do not have any missing observations. Determine whether the sampling timestamps have a regular frequency and are sorted. areTimestampsRegular = isregular(DTT,"quarters") areTimestampsRegular = logical 0 areTimestampsSorted = issorted(DTT.Time) areTimestampsSorted = logical 1
areTimestampsRegular = 0 indicates that the timestamps of DTT are irregular. areTimestampsSorted = 1 indicates that the timestamps are sorted. Macroeconomic series in this example are timestamped at the end of the month. This quality induces an irregularly measured series. Remedy the time irregularity by shifting all dates to the first day of the quarter. dt = DTT.Time; dt = dateshift(dt,"start","quarter"); DTT.Time = dt; areTimestampsRegular = isregular(DTT,"quarters") areTimestampsRegular = logical 1
DTT is regular with respect to time. Create Model Template for Estimation Create a default VAR(4) model by using the shorthand syntax. Specify the response variable names. Mdl = varm(2,4); Mdl.SeriesNames = varnames;
Fit Model to Data Estimate the model. Pass the entire timetable DTT. By default, estimate selects the response variables in Mdl.SeriesNames to fit to the model. Alternatively, you can use the ResponseVariables name-value argument. 12-2335
12
Functions
EstMdl = estimate(Mdl,DTT); p = EstMdl.P p = 4
Perform Unconditional Simulation of Estimated Model Simulate a response and innovations path from the estimated model and return the simulated series as variables in a timetable. simulate requires information for the output timetable, such as variable names, sampling times for the simulation horizon, and sampling frequency. Therefore, supply a presample of the earliest p = 4 observations of the data DTT, from which simulate infers the required timetable information. Specify a simulation horizon of numobs – p. rng(1) % For reproducibility PSTbl = DTT(1:p,:); numobs = T - p; Tbl = simulate(EstMdl,T,Presample=PSTbl); size(Tbl) ans = 1×2 245
4
PSTbl PSTbl=4×15 timetable Time COE _____ _____ Q1-48 Q2-48 Q3-48 Q4-48
137.9 139.6 144.5 145.9
CPIAUCSL ________ 23.5 24.15 24.36 24.05
FEDFUNDS ________ NaN NaN NaN NaN
GCE ____
GDP _____
GDPDEF ______
GPDI ____
GS10 ____
HOANBS ______
37.6 39.7 41.4 43.5
260.4 267.3 273.9 275.2
16.111 16.254 16.556 16.597
45 48.1 50.2 49.1
NaN NaN NaN NaN
55.036 55.007 55.398 54.885
head(Tbl) Time _____
RCPI_Responses ______________
UNRATE_Responses ________________
RCPI_Innovations ________________
UNRATE_Innovations __________________
Q1-49 Q2-49 Q3-49 Q4-49 Q1-50 Q2-50 Q3-50 Q4-50
0.0037294 0.0064827 -0.0073358 -0.0057328 -0.0060454 -0.0084475 -0.0067066 -0.0020759
4.6036 5.0083 5.4981 5.7007 5.8687 5.5758 5.4129 5.2191
-0.0038547 0.0070154 -0.0045047 -0.0065904 -0.005022 -0.0034013 -0.0033182 0.0010595
0.25039 0.027504 0.25199 0.10593 0.13824 -0.26192 0.13055 0.11135
Tbl is a 241-by-4 matrix of simulated responses and innovations. RCPI_Responses is the simulated path of the CPI growth rate and RCPI_Innovations is the corresponding innovations series, and the variables associated with the unemployment rate are similar. The timestamps of Tbl follow directly from the timestamps of PSTbl, and they have the same sampling frequency.
12-2336
simulate
Simulate Responses from Model Containing Regression Component Estimate a VAR(4) model of the consumer price index (CPI), the unemployment rate, and the gross domestic product (GDP). Include a linear regression component containing the current and the last four quarters of government consumption expenditures and investment. Simulate multiple paths from the estimated model. Load the Data_USEconModel data set. Compute the real GDP. load Data_USEconModel DataTimeTable.RGDP = DataTimeTable.GDP./DataTimeTable.GDPDEF*100;
Plot all variables on separate plots. figure tiledlayout(2,2) nexttile plot(DataTimeTable.Time,DataTimeTable.CPIAUCSL); ylabel("Index") title("Consumer Price Index") nexttile plot(DataTimeTable.Time,DataTimeTable.UNRATE); ylabel("Percent") title("Unemployment Rate") nexttile plot(DataTimeTable.Time,DataTimeTable.RGDP); ylabel("Output") title("Real Gross Domestic Product") nexttile plot(DataTimeTable.Time,DataTimeTable.GCE); ylabel("Billions of $") title("Government Expenditures")
12-2337
12
Functions
Stabilize the CPI, GDP, and GCE by converting each to a series of growth rates. Synchronize the unemployment rate series with the others by removing its first observation. varnames = ["CPIAUCSL" "RGDP" "GCE"]; DTT = varfun(@price2ret,DataTimeTable,InputVariables=varnames); DTT.Properties.VariableNames = varnames; DTT.UNRATE = DataTimeTable.UNRATE(2:end);
Make the time base regular. dt = DTT.Time; dt = dateshift(dt,"start","quarter"); DTT.Time = dt;
Expand the GCE rate series to a matrix that includes the first lagged series through the fourth lag series. RGCELags = lagmatrix(DTT,1:4,DataVariables="GCE"); DTT = [DTT RGCELags]; DTT = rmmissing(DTT);
Create separate presample and estimation sample data sets. The presample contains the earliest p = 4 observations, and the estimation sample contains the rest of the data. p = 4; PS = DTT(1:p,:); InSample = DTT((p+1):end,:); respnames = ["CPIAUCSL" "UNRATE" "RGDP"];
12-2338
simulate
idx = endsWith(InSample.Properties.VariableNames,"GCE"); prednames = InSample.Properties.VariableNames(idx);
Create a default VAR(4) model by using the shorthand syntax. Specify the response variable names. Mdl = varm(3,p); Mdl.SeriesNames = respnames;
Estimate the model using the entire sample. Specify the GCE and its lags as exogenous predictor data for the regression component. EstMdl = estimate(Mdl,InSample,Presample=PS,PredictorVariables=prednames);
Generate 100 random response and innovations paths from the estimated model by performing an unconditional simulation. Specify that the length of the paths is the same as the length of the estimation sample period. Supply the presample and estimation sample data. rng(1) % For reproducibility numpaths = 100; numobs = height(InSample); Tbl = simulate(EstMdl,numobs,NumPaths=numpaths, ... Presample=PS,InSample=InSample,PredictorVariables=prednames); size(Tbl) ans = 1×2 240
14
head(Tbl) Time _____
CPIAUCSL __________
RGDP __________
GCE __________
Q1-49 Q2-49 Q3-49 Q4-49 Q1-50 Q2-50 Q3-50 Q4-50
0.00041815 -0.0071324 -0.0059122 0.0012698 0.010101 0.01908 0.025954 0.035395
-0.0031645 0.011385 -0.010366 0.040091 0.029649 0.03844 0.017994 0.01197
0.036603 -0.0021164 -0.012793 -0.021693 0.010905 -0.0043478 0.075508 0.14807
UNRATE ______ 6.2 6.6 6.6 6.3 5.4 4.4 4.3 3.4
Lag1GCE __________
Lag2GCE __________
Lag ____
0.047147 0.036603 -0.0021164 -0.012793 -0.021693 0.010905 -0.0043478 0.075508
0.04948 0.047147 0.036603 -0.0021164 -0.012793 -0.021693 0.010905 -0.0043478
0 0 0. 0. -0.0 -0. -0. 0.
Tbl is a 240-by-14 timetable of estimation sample data, simulated responses (denoted responseName_Responses), and corresponding innovations (denoted responseName_Innovations). The simulated response and innovations variables are 240-by-100 matrices, where each row is a period in the estimation sample and each column is a separate, independently generated path. For each time in the estimation sample, compute the mean vector of the simulated responses among all paths. idx = endsWith(Tbl.Properties.VariableNames,"_Responses"); simrespnames = Tbl.Properties.VariableNames(idx); MeanSim = varfun(@(x)mean(x,2),Tbl,InputVariables=simrespnames);
MeanSim is a 240-by-3 timetable containing the average of the simulated responses at each time point. 12-2339
12
Functions
Plot the simulated responses, their averages, and the data. figure tiledlayout(2,2) for j = 1:Mdl.NumSeries nexttile plot(Tbl.Time,Tbl{:,simrespnames(j)},Color=[0.8,0.8,0.8]) title(Mdl.SeriesNames{j}); hold on h1 = plot(Tbl.Time,Tbl{:,respnames(j)}); h2 = plot(Tbl.Time,MeanSim{:,"Fun_"+simrespnames(j)}); hold off end hl = legend([h1 h2],"Data","Mean"); hl.Position = [0.6 0.25 hl.Position(3:4)];
Return Timetable of Responses and Innovations from Conditional Simulation Perform a conditional simulation of the VAR model in “Return Timetable of Responses and Innovations from Unconditional Simulation” on page 12-2334, in which economists hypothesize that the unemployment rate is 6% for 15 quarters after the end of the sampling period (from Q2 of 2009 through Q4 of 2012).
12-2340
simulate
Load and Preprocess Data Load the Data_USEconModel data set. Compute the CPI growth rate. Because the growth rate calculation consumes the earliest observation, include the rate variable in the timetable by prepending the series with NaN. load Data_USEconModel DataTimeTable.RCPI = [NaN; price2ret(DataTimeTable.CPIAUCSL)];
Prepare Timetable for Estimation Remove all missing values from the table, relative to the CPI rate (RCPI) and unemployment rate (UNRATE) series. varnames = ["RCPI" "UNRATE"]; DTT = rmmissing(DataTimeTable,DataVariables=varnames);
Remedy the time irregularity by shifting all dates to the first day of the quarter. dt = DTT.Time; dt = dateshift(dt,"start","quarter"); DTT.Time = dt;
Create Model Template for Estimation Create a default VAR(4) model by using the shorthand syntax. Specify the response variable names. p = 4; Mdl = varm(2,p); Mdl.SeriesNames = varnames;
Fit Model to Data Estimate the model. Pass the entire timetable DTT. By default, estimate selects the response variables in Mdl.SeriesNames to fit to the model. Alternatively, you can use the ResponseVariables name-value argument. EstMdl = estimate(Mdl,DTT);
Prepare for Conditional Simulation of Estimated Model Suppose economists hypothesize that the unemployment rate will be at 6% for the next 15 quarters. Create a timetable with the following qualities: • The timestamps are regular with respect to the estimation sample timestamps and they are ordered from Q2 of 2009 through Q4 of 2012. • The variable RCPI (and, consequently, all other variables in DTT) is a 15-by-1 vector of NaN values. • The variable UNRATE is a 15-by-1 vector, where each element is 6. numobs = 15; shdt = DTT.Time(end) + calquarters(1:numobs); DTTCondSim = retime(DTT,shdt,"fillwithmissing"); DTTCondSim.UNRATE = ones(numobs,1)*6;
DTTCondSim is a 15-by-15 timetable that follows directly, in time, from DTT, and both timetables have the same variables. All variables in DTTCondSim contain NaN values, except for UNRATE, which is a vector composed of the value 6. 12-2341
12
Functions
Perform Conditional Simulation of Estimated Model Simulate the CPI growth rate given the hypothesis by supplying the conditioning data DTTCondSim and specifying the response variable names. Generate 1000 paths. Because the simulation horizon is beyond the estimation sample data, supply the estimation sample as a presample to initialize the model. rng(1) % For reproducibility Tbl = simulate(EstMdl,numobs,NumPaths=1000, ... InSample=DTTCondSim,ResponseVariables=EstMdl.SeriesNames, ... Presample=DTT,PresampleResponseVariables=EstMdl.SeriesNames); size(Tbl) ans = 1×2 15
19
idx = endsWith(Tbl.Properties.VariableNames,["_Responses" "_Innovations"]); head(Tbl(:,idx)) Time _____
RCPI_Responses ______________
Q2-09 Q3-09 Q4-09 Q1-10 Q2-10 Q3-10 Q4-10 Q1-11
1x1000 1x1000 1x1000 1x1000 1x1000 1x1000 1x1000 1x1000
double double double double double double double double
UNRATE_Responses ________________ 1x1000 1x1000 1x1000 1x1000 1x1000 1x1000 1x1000 1x1000
double double double double double double double double
RCPI_Innovations ________________ 1x1000 1x1000 1x1000 1x1000 1x1000 1x1000 1x1000 1x1000
double double double double double double double double
UNRATE_Innovations __________________ 1x1000 1x1000 1x1000 1x1000 1x1000 1x1000 1x1000 1x1000
double double double double double double double double
Tbl is a 15-by-19 matrix of simulated responses and innovations of RCPI given UNRATE is 6% for the next 15 quarters. RCPI_Responses contains the simulated paths of the CPI growth rate and RCPI_Innovations contains the corresponding innovations series. UNRATE_Responses is a 15by-1000 matrix composed of the value 6. All other variables in Tbl are the variables and their values in DTTCondSim. Plot the simulated values of the CPI growth rate and their mean with the final few values of the estimation sample data. MeanRCPISim = mean(Tbl.RCPI_Responses,2); figure h1 = plot(DTT.Time((end-30):end),DTT.RCPI((end-30):end)); hold on h2 = plot(Tbl.Time,Tbl.RCPI_Responses,Color=[0.8 0.8 0.8]); h3 = plot(Tbl.Time,MeanRCPISim,Color="k",LineWidth=2); xline(Tbl.Time(1),"r--",LineWidth=2) hold off title(EstMdl.SeriesNames) legend([h1 h2(1) h3],["Estimation data" "Simulated paths" "Simulation mean"], ... Location="best")
12-2342
simulate
Input Arguments Mdl — VAR model varm model object VAR model, specified as a varm model object created by varm or estimate. Mdl must be fully specified. numobs — Number of random observations to generate positive integer Number of random observations to generate per output path, specified as a positive integer. The output arguments Y and E, or Tbl, have numobs rows. Data Types: double Presample — Presample data table | timetable Presample data that provides initial values for the model Mdl, specified as a table or timetable with numprevars variables and numpreobs rows. The following situations describe when to use Presample: • Presample is required when simulate performs an unconditional simulation, which occurs under one of the following conditions: 12-2343
12
Functions
• You do not supply data in the simulation horizon (that is, you do not use the InSample namevalue argument). • You specify only predictor data for the model regression component in the simulation horizon using the InSample and PredictorVariables name-value arguments, but you do not select any response variables from InSample. • Presample is optional when simulate performs a conditional simulation, that is, when you supply response data in the simulation horizon, on which to condition the simulated responses, by using the InSample and ResponseVariables name-value arguments. By default, simulate sets any necessary presample observations. • For stationary VAR processes without regression components, simulate sets presample observations to the unconditional mean μ = Φ−1(L)c . • For nonstationary processes or models that contain a regression component, simulate sets presample observations to zero. Regardless of the situation, simulate returns the simulated variables in the output table or timetable Tbl, which is commensurate with Presample. Each row is a presample observation, and measurements in each row, among all paths, occur simultaneously. numpreobs must be at least Mdl.P. If you supply more rows than necessary, simulate uses the latest Mdl.P observations only. Each variable is a numpreobs-by-numprepaths numeric matrix. Variables are associated with response series in Mdl.SeriesNames. To control presample variable selection, see the optional PresampleResponseVariables name-value argument. For each variable, columns are separate, independent paths. • If variables are vectors, simulate applies them to each respective path to initialize the model for the simulation. Therefore, all respective response paths derive from common initial conditions. • Otherwise, for each variable ResponseK and each path j, simulate applies Presample.ResponseK(:,j) to produce Tbl.ResponseK(:,j). Variables must have at least numpaths columns, and simulate uses only the first numpaths columns. If Presample is a timetable, all the following conditions must be true: • Presample must represent a sample with a regular datetime time step (see isregular). • The inputs InSample and Presample must be consistent in time such that Presample immediately precedes InSample with respect to the sampling frequency and order. • The datetime vector of sample timestamps Presample.Time must be ascending or descending. If Presample is a table, the last row contains the latest presample observation. InSample — Future time series response or predictor data table | timetable Future time series response or predictor data, specified as a table or timetable. InSample contains numvars variables, including numseries response variables yt or numpreds predictor variables xt for the model regression component. You can specify InSample only when other data inputs are tables or timetables. Use InSample in the following situations: 12-2344
simulate
• Perform conditional simulation. You must also supply the response variable names in InSample by using the ResponseVariables name-value argument. • Supply future predictor data for either unconditional or conditional simulation. To supply predictor data, you must specify predictor variable names in InSample by using the PredictorVariables name-value argument. Otherwise, simulate ignores the model regression component. simulate returns the simulated variables in the output table or timetable Tbl, which is commensurate with InSample. Each row corresponds to an observation in the simulation horizon, the first row is the earliest observation, and measurements in each row, among all paths, occur simultaneously. InSample must have at least numobs rows to cover the simulation horizon. If you supply more rows than necessary, simulate uses only the first numobs rows. Each response variable is a numeric matrix with numpaths columns. For each response variable K, columns are separate, independent paths. Specifically, path j of response variable ResponseK captures the state, or knowledge, of ResponseK as it evolves from the presample past (for example, Presample.ResponseK) into the future. For each selected response variable ResponseK: • If InSample.ResponseK is a vector, simulate applies to each of the numpaths output paths (see NumPaths). • Otherwise, InSample.ResponseK must have at least numpaths columns. If you supply more pages than necessary, simulate uses only the first numpaths columns. Each predictor variable is a numeric vector. All predictor variables are present in the regression component of each response equation and apply to all response paths. If InSample is a timetable, the following conditions apply: • InSample must represent a sample with a regular datetime time step (see isregular). • The datetime vector InSample.Time must be ascending or descending. • Presample must immediately precede InSample, with respect to the sampling frequency. If InSample is a table, the last row contains the latest observation. Elements of the response variables of InSample can be numeric scalars or missing values (indicated by NaN values). simulate treats numeric scalars as deterministic future responses that are known in advance, for example, set by policy. simulate simulates responses for corresponding NaN values conditional on the known values. Elements of selected predictor variables must be numeric scalars. By default, simulate performs an unconditional simulation without a regression component in the model (each selected response variable is a numobs-by-numpaths matrix composed of NaN values indicating a complete lack of knowledge of the future state of all simulated responses). Therefore, variables in Tbl result from a conventional, unconditional Monte Carlo simulation. For more details, see “Algorithms” on page 12-2351. Example: Consider simulating one path from a model composed of two response series, GDP and CPI, three periods into the future. Suppose that you have prior knowledge about some of the future values of the responses, and you want to simulate the unknown responses conditional on your knowledge. Specify InSample as a table containing the values that you know, and use NaN for values you do not know but want to simulate. For example, InSample=array2table([2 NaN; 0.1 NaN; NaN 12-2345
12
Functions
NaN],VariableNames=["GDP" "CPI"]) specifies that you have no knowledge of the future values of CPI, but you know that GDP is 2, 0.1, and unknown in periods 1, 2, and 3, respectively, in the simulation horizon. ResponseVariables — Variables to select from InSample to treat as response variables yt string vector | cell vector of character vectors | vector of integers | logical vector Variables to select from InSample to treat as response variables yt, specified as one of the following data types: • String vector or cell vector of character vectors containing numseries variable names in InSample.Properties.VariableNames • A length numseries vector of unique indices (integers) of variables to select from InSample.Properties.VariableNames • A length numvars logical vector, where ResponseVariables(j) = true selects variable j from InSample.Properties.VariableNames, and sum(ResponseVariables) is numseries To perform conditional simulation, you must specify ResponseVariables to select the response variables in InSample for the conditioning data. ResponseVariables applies only when you specify InSample. The selected variables must be numeric vectors (single path) or matrices (columns represent multiple independent paths) of the same width. Example: ResponseVariables=["GDP" "CPI"] Example: ResponseVariables=[true false true false] or ResponseVariable=[1 3] selects the first and third table variables as the response variables. Data Types: double | logical | char | cell | string Name-Value Pair Arguments Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter. Before R2021a, use commas to separate each name and value, and enclose Name in quotes. Example: simulate(Mdl,100,NumPaths=1000,Y0=PS) returns a numeric array of 1000, 100period simulated response paths from Mdl and specifies the numeric array of presample response data PS. NumPaths — Number of sample paths to generate 1 (default) | positive integer Number of sample paths to generate, specified as a positive integer. The outputs Y and E have NumPaths pages, and each simulated response and innovation variable in the output Tbl is a numobs-by-NumPaths matrix. Example: NumPaths=1000 Data Types: double Y0 — Presample responses numeric matrix | numeric array 12-2346
simulate
Presample responses that provide initial values for the model Mdl, specified as a numpreobs-bynumseries numeric matrix or a numpreobs-by-numseries-by-numprepaths numeric array. Use Y0 only when you supply optional data inputs as numeric arrays. numpreobs is the number of presample observations. numprepaths is the number of presample response paths. Each row is a presample observation, and measurements in each row, among all pages, occur simultaneously. The last row contains the latest presample observation. Y0 must have at least Mdl.P rows. If you supply more rows than necessary, simulate uses the latest Mdl.P observations only. Each column corresponds to the response series name in Mdl.SeriesNames. Pages correspond to separate, independent paths. • If Y0 is a matrix, simulate applies it to simulate each sample path (page). Therefore, all paths in the output argument Y derive from common initial conditions. • Otherwise, simulate applies Y0(:,:,j) to initialize simulating path j. Y0 must have at least numpaths pages, and simulate uses only the first numpaths pages. By default, simulate sets any necessary presample observations. • For stationary VAR processes without regression components, simulate sets presample observations to the unconditional mean μ = Φ−1(L)c . • For nonstationary processes or models that contain a regression component, simulate sets presample observations to zero. Data Types: double PresampleResponseVariables — Variables to select from Presample to use for presample response data string vector | cell vector of character vectors | vector of integers | logical vector Variables to select from Presample to use for presample data, specified as one of the following data types: • String vector or cell vector of character vectors containing numseries variable names in Presample.Properties.VariableNames • A length numseries vector of unique indices (integers) of variables to select from Presample.Properties.VariableNames • A length numprevars logical vector, where PresampleResponseVariables(j) = true selects variable j from Presample.Properties.VariableNames, and sum(PresampleResponseVariables) is numseries PresampleResponseVariables applies only when you specify Presample. The selected variables must be numeric vectors and cannot contain missing values (NaN). PresampleResponseNames does not need to contain the same names as in Mdl.SeriesNames; simulate uses the data in selected variable PresampleResponseVariables(j) as a presample for Mdl.SeriesNames(j). 12-2347
12
Functions
If the number of variables in Presample matches Mdl.NumSeries, the default specifies all variables in Presample. If the number of variables in Presample exceeds Mdl.NumSeries, the default matches variables in Presample to names in Mdl.SeriesNames. Example: PresampleResponseVariables=["GDP" "CPI"] Example: PresampleResponseVariables=[true false true false] or PresampleResponseVariable=[1 3] selects the first and third table variables for presample data. Data Types: double | logical | char | cell | string X — Predictor data numeric matrix Predictor data for the regression component in the model, specified as a numeric matrix containing numpreds columns. Use X only when you supply optional data inputs as numeric arrays. numpreds is the number of predictor variables (size(Mdl.Beta,2)). Each row corresponds to an observation, and measurements in each row occur simultaneously. The last row contains the latest observation. X must have at least numobs rows. If you supply more rows than necessary, simulate uses only the latest numobs observations. simulate does not use the regression component in the presample period. Each column is an individual predictor variable. All predictor variables are present in the regression component of each response equation. simulate applies X to each path (page); that is, X represents one path of observed predictors. By default, simulate excludes the regression component, regardless of its presence in Mdl. Data Types: double PredictorVariables — Variables to select from InSample to treat as exogenous predictor variables xt string vector | cell vector of character vectors | vector of integers | logical vector Variables to select from InSample to treat as exogenous predictor variables xt, specified as one of the following data types: • String vector or cell vector of character vectors containing numpreds variable names in InSample.Properties.VariableNames • A length numpreds vector of unique indices (integers) of variables to select from InSample.Properties.VariableNames • A length numvars logical vector, where PredictorVariables(j) = true selects variable j from InSample.Properties.VariableNames, and sum(PredictorVariables) is numpreds Regardless, selected predictor variable j corresponds to the coefficients Mdl.Beta(:,j). PredictorVariables applies only when you specify InSample. The selected variables must be numeric vectors and cannot contain missing values (NaN). By default, simulate excludes the regression component, regardless of its presence in Mdl. Example: PredictorVariables=["M1SL" "TB3MS" "UNRATE"] 12-2348
simulate
Example: PredictorVariables=[true false true false] or PredictorVariable=[1 3] selects the first and third table variables as the response variables. Data Types: double | logical | char | cell | string YF — Future multivariate response series numeric matrix | numeric array Future multivariate response series for conditional simulation, specified as a numeric matrix or array containing numseries columns. Use YF only when you supply optional data inputs as numeric arrays. Each row corresponds to observations in the simulation horizon, and the first row is the earliest observation. Specifically, row j in sample path k (YF(j,:,k)) contains the responses j periods into the future. YF must have at least numobs rows to cover the simulation horizon. If you supply more rows than necessary, simulate uses only the first numobs rows. Each column corresponds to the response variable name in Mdl.SeriesNames. Each page corresponds to a separate sample path. Specifically, path k (YF(:,:,k)) captures the state, or knowledge, of the response series as they evolve from the presample past (Y0) into the future. • If YF is a matrix, simulate applies YF to each of the numpaths output paths (see NumPaths). • Otherwise, YF must have at least numpaths pages. If you supply more pages than necessary, simulate uses only the first numpaths pages. Elements of YF can be numeric scalars or missing values (indicated by NaN values). simulate treats numeric scalars as deterministic future responses that are known in advance, for example, set by policy. simulate simulates responses for corresponding NaN values conditional on the known values. By default, YF is an array composed of NaN values indicating a complete lack of knowledge of the future state of all simulated responses. Therefore, simulate obtains the output responses Y from a conventional, unconditional Monte Carlo simulation. For more details, see “Algorithms” on page 12-2351. Example: Consider simulating one path from a model composed of four response series three periods into the future. Suppose that you have prior knowledge about some of the future values of the responses, and you want to simulate the unknown responses conditional on your knowledge. Specify YF as a matrix containing the values that you know, and use NaN for values you do not know but want to simulate. For example, YF=[NaN 2 5 NaN; NaN NaN 0.1 NaN; NaN NaN NaN NaN] specifies that you have no knowledge of the future values of the first and fourth response series; you know the value for period 1 in the second response series, but no other value; and you know the values for periods 1 and 2 in the third response series, but not the value for period 3. Data Types: double Note • NaN values in Y0 and X indicate missing values. simulate removes missing values from the data by list-wise deletion. If Y0 is a 3-D array, then simulate performs these steps: 1
Horizontally concatenate pages to form a numpreobs-by-numpaths*numseries matrix. 12-2349
12
Functions
2
Remove any row that contains at least one NaN from the concatenated data.
In the case of missing observations, the results obtained from multiple paths of Y0 can differ from the results obtained from each path individually. For conditional simulation (see YF), if X contains any missing values in the latest numobs observations, then simulate issues an error. • simulate issues an error when selected response variables from Presample and selected predictor variables from InSample contain any missing values.
Output Arguments Y — Simulated multivariate response series numeric matrix | numeric array Simulated multivariate response series, returned as a numobs-by-numseries numeric matrix or a numobs-by-numseries-by-numpaths numeric array. simulate returns Y only when you supply optional data sets as numeric matrices or arrays, for example, you use the Y0 name-value argument. Y represents the continuation of the presample responses in Y0. Each row is a time point in the simulation horizon. Values in a row, among all pages, occur simultaneously. The last row contains the latest simulated values. Each column corresponds to the response series name in Mdl.SeriesNames. Pages correspond to separate, independently simulated paths. If you specify future responses for conditional simulation using the YF name-value argument, the known values in YF appear in the same positions in Y. However, Y contains simulated values for the missing observations in YF. E — Simulated multivariate model innovations series numeric matrix | numeric array Simulated multivariate model innovations series, returned as a numobs-by-numseries numeric matrix or a numobs-by-numseries-by-numpaths numeric array. simulate returns E only when you supply optional data sets as numeric matrices or arrays, for example, you use the Y0 name-value argument. Elements of E and Y correspond. If you specify future responses for conditional simulation (see the YF name-value argument), simulate infers the innovations from the known values in YF and places the inferred innovations in the corresponding positions in E. For the missing observations in YF, simulate draws from the Gaussian distribution conditional on any known values, and places the draws in the corresponding positions in E. Tbl — Simulated multivariate response, model innovations, and other variables table | timetable
12-2350
simulate
Simulated multivariate response, model innovations, and other variables, returned as a table or timetable, the same data type as Presample or InSample. simulate returns Tbl only when you supply at least one of the inputs Presample and InSample. Tbl contains the following variables: • The simulated paths within the simulation horizon of the selected response series yt. Each simulated response variable in Tbl is a numobs-by-numpaths numeric matrix, where numobs is the value of NumObs and numpaths is the value of NumPaths. Each row corresponds to a time in the simulation horizon and each column corresponds to a separate path. simulate names the simulated response variable ResponseK ResponseK_Responses. For example, if Mdl.Series(K) is GDP, Tbl contains a variable for the corresponding simulated response with the name GDP_Responses. If you specify ResponseVariables, ResponseK is ResponseVariable(K). Otherwise, ResponseK is PresampleResponseVariable(K). • The simulated paths within the simulation horizon of the innovations εt corresponding to yt. Each simulated innovations variable in Tbl is a numobs-by-numpaths numeric matrix. Each row corresponds to a time in the simulation horizon and each column corresponds to a separate path. simulate names the simulated innovations variable of response ResponseK ResponseK_Innovations. For example, if Mdl.Series(K) is GDP, Tbl contains a variable for the corresponding innovations with the name GDP_Innovations. If Tbl is a timetable, the following conditions hold: • The row order of Tbl, either ascending or descending, matches the row order of InSample, when you specify it. If you do not specify InSample and you specify Presample, the row order of Tbl is the same as the row order Presample. • If you specify InSample, row times Tbl.Time are InSample.Time(1:numobs). Otherwise, Tbl.Time(1) is the next time after Presample(end) relative to the sampling frequency, and Tbl.Time(2:numobs) are the following times relative to the sampling frequency.
Algorithms Suppose Y0 and YF are the presample and future response data specified by the numeric data inputs in Y0 and YF or the selected variables from the input tables or timetables Presample and InSample. Similarly, suppose E contains the simulated model innovations as returned in the numeric array E or the table or timetable Tbl. • simulate performs conditional simulation using this process for all pages k = 1,...,numpaths and for each time t = 1,...,numobs. 1
simulate infers (or inverse filters) the model innovations for all response variables (E(t,:,k) from the known future responses (YF(t,:,k)). In E, simulate mimics the pattern of NaN values that appears in YF.
2
For the missing elements of E at time t, simulate performs these steps. a
Draw Z1, the random, standard Gaussian distribution disturbances conditional on the known elements of E.
b
Scale Z1 by the lower triangular Cholesky factor of the conditional covariance matrix. That is, Z2 = L*Z1, where L = chol(C,"lower") and C is the covariance of the conditional Gaussian distribution.
c
Impute Z2 in place of the corresponding missing values in E. 12-2351
12
Functions
3
For the missing values in YF, simulate filters the corresponding random innovations through the model Mdl.
• simulate uses this process to determine the time origin t0 of models that include linear time trends. • If you do not specify Y0, then t0 = 0. • Otherwise, simulate sets t0 to size(Y0,1) – Mdl.P. Therefore, the times in the trend component are t = t0 + 1, t0 + 2,..., t0 + numobs. This convention is consistent with the default behavior of model estimation in which estimate removes the first Mdl.P responses, reducing the effective sample size. Although simulate explicitly uses the first Mdl.P presample responses in Y0 to initialize the model, the total number of observations in Y0 (excluding any missing values) determines t0.
Version History Introduced in R2017a R2022b: simulate accepts input data in tables and timetables, and return results in tables and timetables In addition to accepting input data in numeric arrays, simulate accepts input data in tables and timetables. simulate chooses default series on which to operate, but you can use the following name-value arguments to select variables. • Presample specifies the input table or regular timetable of presample response data. • PresampleResponseVariables specifies the response series names in Presample. • Insample specifies the table or regular timetable of future response and predictor data for conditional simulation. • ResponseVariables specifies the response series names in InSample. • PredictorVariables specifies the predictor series in InSample for a model regression component.
References [1] Hamilton, James D. Time Series Analysis. Princeton, NJ: Princeton University Press, 1994. [2] Johansen, S. Likelihood-Based Inference in Cointegrated Vector Autoregressive Models. Oxford: Oxford University Press, 1995. [3] Juselius, K. The Cointegrated VAR Model. Oxford: Oxford University Press, 2006. [4] Lütkepohl, H. New Introduction to Multiple Time Series Analysis. Berlin: Springer, 2005.
See Also Objects varm Functions estimate | filter | infer | forecast 12-2352
simulate
Topics “VAR Model Forecasting, Simulation, and Analysis” on page 9-44 “Simulate Responses Using filter” on page 9-88 “Simulate Responses of Estimated VARX Model” on page 9-78 “Simulate VAR Model Conditional Responses” on page 9-84 “VAR Model Case Study” on page 9-90
12-2353
12
Functions
simulate Monte Carlo simulation of vector error-correction (VEC) model
Syntax Y = simulate(Mdl,numobs) Y = simulate(Mdl,numobs,Name=Value) [Y,E] = simulate( ___ ) Tbl = simulate(Mdl,numobs,Presample=Presample) Tbl = simulate(Mdl,numobs,Presample=Presample,Name=Value) Tbl = simulate(Mdl,numobs,InSample=InSample,ResponseVariables= ResponseVariables) Tbl = simulate(Mdl,numobs,InSample=InSample,ResponseVariables= ResponseVariables,Presample=Presample) Tbl = simulate( ___ ,Name=Value)
Description Conditional and Unconditional Simulation for Numeric Arrays
Y = simulate(Mdl,numobs) returns the numeric array Y containing a random numobs-period path of multivariate response series from performing an unconditional simulation of the fully specified VEC(p – 1) model Mdl. Y = simulate(Mdl,numobs,Name=Value) uses additional options specified by one or more namevalue arguments. simulate returns numeric arrays when all optional input data are numeric arrays. For example, simulate(Mdl,100,NumPaths=1000,Y0=PS) returns a numeric array of 1000, 100period simulated response paths from Mdl and specifies the numeric array of presample response data PS. To perform a conditional simulation, specify response data in the simulation horizon by using the YF name-value argument. [Y,E] = simulate( ___ ) also returns the numeric array containing the simulated multivariate model innovations series E corresponding to the simulated responses Y, using any input argument combination in the previous syntaxes. Unconditional Simulation for Tables and Timetables
Tbl = simulate(Mdl,numobs,Presample=Presample) returns the table or timetable Tbl containing the random multivariate response and innovations variables, which results from the unconditional simulation of the response series in the model Mdl. simulate uses the table or timetable of presample data Presample to initialize the response series. simulate selects the variables in Mdl.SeriesNames to simulate or all variables in Presample. To select different response variables in Presample to simulate, use the PresampleResponseVariables name-value argument. Tbl = simulate(Mdl,numobs,Presample=Presample,Name=Value) uses additional options specified by one or more name-value arguments. For example, 12-2354
simulate
simulate(Mdl,100,Presample=PSTbl,PresampleResponseVariables=["GDP" "CPI"]) returns a timetable of variables containing 100-period simulated response and innovations series from Mdl, initialized by the data in the GDP and CPI variables of the timetable of presample data in PSTbl. Conditional Simulation for Tables and Timetables
Tbl = simulate(Mdl,numobs,InSample=InSample,ResponseVariables= ResponseVariables) returns the table or timetable Tbl containing the random multivariate response and innovations variables, which results from the conditional simulation of the response series in the model Mdl. InSample is a table or timetable of response or predictor data in the simulation horizon that simulate uses to perform the conditional simulation and ResponseVariables specifies the response variables in InSample. Tbl = simulate(Mdl,numobs,InSample=InSample,ResponseVariables= ResponseVariables,Presample=Presample) uses the presample data in the table or timetable Presample to initialize the model. Tbl = simulate( ___ ,Name=Value) uses additional options specified by one or more name-value arguments, using any input argument combination in the previous two syntaxes.
Examples Return Response Series in Matrix from Unconditional Simulation Consider a VEC model for the following seven macroeconomic series, fit the model to the data, and then perform unconditional simulation by generating a random path of the response variables from the estimated model. • Gross domestic product (GDP) • GDP implicit price deflator • Paid compensation of employees • Nonfarm business sector hours of all persons • Effective federal funds rate • Personal consumption expenditures • Gross private domestic investment Suppose that a cointegrating rank of 4 and one short-run term are appropriate, that is, consider a VEC(1) model. Load the Data_USEconVECModel data set. load Data_USEconVECModel
For more information on the data set and variables, enter Description at the command line. Determine whether the data needs to be preprocessed by plotting the series on separate plots. figure tiledlayout(2,2) nexttile plot(FRED.Time,FRED.GDP)
12-2355
12
Functions
title("Gross Domestic Product") ylabel("Index") xlabel("Date") nexttile plot(FRED.Time,FRED.GDPDEF) title("GDP Deflator") ylabel("Index") xlabel("Date") nexttile plot(FRED.Time,FRED.COE) title("Paid Compensation of Employees") ylabel("Billions of $") xlabel("Date") nexttile plot(FRED.Time,FRED.HOANBS) title("Nonfarm Business Sector Hours") ylabel("Index") xlabel("Date")
figure tiledlayout(2,2) nexttile plot(FRED.Time,FRED.FEDFUNDS) title("Federal Funds Rate") ylabel("Percent") xlabel("Date")
12-2356
simulate
nexttile plot(FRED.Time,FRED.PCEC) title("Consumption Expenditures") ylabel("Billions of $") xlabel("Date") nexttile plot(FRED.Time,FRED.GPDI) title("Gross Private Domestic Investment") ylabel("Billions of $") xlabel("Date")
Stabilize all series, except the federal funds rate, by applying the log transform. Scale the resulting series by 100 so that all series are on the same scale. FRED.GDP = 100*log(FRED.GDP); FRED.GDPDEF = 100*log(FRED.GDPDEF); FRED.COE = 100*log(FRED.COE); FRED.HOANBS = 100*log(FRED.HOANBS); FRED.PCEC = 100*log(FRED.PCEC); FRED.GPDI = 100*log(FRED.GPDI);
Create a VECM(1) model using the shorthand syntax. Specify the variable names. Mdl = vecm(7,4,1); Mdl.SeriesNames = FRED.Properties.VariableNames Mdl = vecm with properties:
12-2357
12
Functions
Description: SeriesNames: NumSeries: Rank: P: Constant: Adjustment: Cointegration: Impact: CointegrationConstant: CointegrationTrend: ShortRun: Trend: Beta: Covariance:
"7-Dimensional Rank = 4 VEC(1) Model with Linear Time Trend" "GDP" "GDPDEF" "COE" ... and 4 more 7 4 2 [7×1 vector of NaNs] [7×4 matrix of NaNs] [7×4 matrix of NaNs] [7×7 matrix of NaNs] [4×1 vector of NaNs] [4×1 vector of NaNs] {7×7 matrix of NaNs} at lag [1] [7×1 vector of NaNs] [7×0 matrix] [7×7 matrix of NaNs]
Mdl is a vecm model object. All properties containing NaN values correspond to parameters to be estimated given data. Estimate the model using the entire data set and the default options. EstMdl = estimate(Mdl,FRED.Variables) EstMdl = vecm with properties: Description: SeriesNames: NumSeries: Rank: P: Constant: Adjustment: Cointegration: Impact: CointegrationConstant: CointegrationTrend: ShortRun: Trend: Beta: Covariance:
"7-Dimensional Rank = 4 VEC(1) Model" "GDP" "GDPDEF" "COE" ... and 4 more 7 4 2 [14.1329 8.77841 -7.20359 ... and 4 more]' [7×4 matrix] [7×4 matrix] [7×7 matrix] [-28.6082 109.555 -77.0912 ... and 1 more]' [4×1 vector of zeros] {7×7 matrix} at lag [1] [7×1 vector of zeros] [7×0 matrix] [7×7 matrix]
EstMdl is an estimated vecm model object. It is fully specified because all parameters have known values. By default, estimate imposes the constraints of the H1 Johansen VEC model form by removing the cointegrating trend and linear trend terms from the model. Parameter exclusion from estimation is equivalent to imposing equality constraints to zero. Simulate a response series path from the estimated model with length equal to the path in the data. rng(1); % For reproducibility numobs = size(FRED,1); Y = simulate(EstMdl,numobs);
Y is a 240-by-7 matrix of simulated responses. Columns correspond to the variable names in EstMdl.SeriesNames.
12-2358
simulate
Simulate Responses Using filter Illustrate the relationship between simulate and filter by estimating a 4-D VEC(1) model of the four response series in Johansen's Danish data set. Simulate a single path of responses using the fitted model and the historical data as initial values, and then filter a random set of Gaussian disturbances through the estimated model using the same presample responses. Load Johansen's Danish economic data. load Data_JDanish
For details on the variables, enter Description. Create a default 4-D VEC(1) model. Assume that a cointegrating rank of 1 is appropriate. Mdl = vecm(4,1,1); Mdl.SeriesNames = DataTable.Properties.VariableNames Mdl = vecm with properties: Description: SeriesNames: NumSeries: Rank: P: Constant: Adjustment: Cointegration: Impact: CointegrationConstant: CointegrationTrend: ShortRun: Trend: Beta: Covariance:
"4-Dimensional Rank = 1 VEC(1) Model with Linear Time Trend" "M2" "Y" "IB" ... and 1 more 4 1 2 [4×1 vector of NaNs] [4×1 matrix of NaNs] [4×1 matrix of NaNs] [4×4 matrix of NaNs] NaN NaN {4×4 matrix of NaNs} at lag [1] [4×1 vector of NaNs] [4×0 matrix] [4×4 matrix of NaNs]
Estimate the VEC(1) model using the entire data set. Specify the H1* Johansen model form. EstMdl = estimate(Mdl,Data,Model="H1*");
When reproducing the results of simulate and filter, it is important to take these actions. • Set the same random number seed using rng. • Specify the same presample response data using the Y0 name-value argument. Simulate 100 observations by passing the estimated model to simulate. Specify the entire data set as the presample. rng("default") YSim = simulate(EstMdl,100,Y0=Data);
YSim is a 100-by-4 matrix of simulated responses. Columns correspond to the columns of the variables in EstMdl.SeriesNames. 12-2359
12
Functions
Set the default random seed. Simulate 4 series of 100 observations from the standard Gaussian distribution. rng("default") Z = randn(100,4);
Filter the Gaussian values through the estimated model. Specify the entire data set as the presample. YFilter = filter(EstMdl,Z,Y0=Data);
YFilter is a 100-by-4 matrix of simulated responses. Columns correspond to the columns of the variables in EstMdl.SeriesNames. Before filtering the disturbances, filter scales Z by the lower triangular Cholesky factor of the model covariance in EstMdl.Covariance. Compare the resulting responses between filter and simulate. (YSim - YFilter)'*(YSim - YFilter) ans = 4×4 0 0 0 0
0 0 0 0
0 0 0 0
0 0 0 0
The results are identical.
Simulate Arrays of Multiple Response and Innovations Paths Consider this VEC(1) model for three hypothetical response series. Δyt = c + AB′yt − 1 + Φ1 Δyt − 1 + εt =
−1 −0 . 3 0 . 3 0 0.1 0.2 0 . 1 −0 . 2 0 . 2 −3 + −0 . 2 0 . 1 yt − 1 + 0 . 2 −0 . 2 0 Δyt − 1 + εt . −0 . 7 0 . 5 0 . 2 −30 −1 0 0 . 7 −0 . 2 0 . 3
The innovations are multivariate Gaussian with a mean of 0 and the covariance matrix 1.3 0.4 1.6 Σ = 0.4 0.6 0.7 . 1.6 0.7 5 Create variables for the parameter values. A = [-0.3 0.3; -0.2 0.1; -1 0]; B = [0.1 -0.7; -0.2 0.5; 0.2 0.2]; Phi = {[0. 0.1 0.2; 0.2 -0.2 0; 0.7 -0.2 0.3]}; c = [-1; -3; -30]; tau = [0; 0; 0]; Sigma = [1.3 0.4 1.6; 0.4 0.6 0.7; 1.6 0.7 5];
12-2360
% % % % % %
Adjustment Cointegration ShortRun Constant Trend Covariance
simulate
Create a vecm model object representing the VEC(1) model using the appropriate name-value pair arguments. Mdl = vecm(Adjustment=A,Cointegration=B, ... Constant=c,ShortRun=Phi,Trend=tau,Covariance=Sigma);
Mdl is effectively a fully specified vecm model object. That is, the cointegration constant and linear trend are unknown, but are not needed for simulating observations or forecasting given that the overall constant and trend parameters are known. Simulate 1000 paths of 100 observations. Return the innovations (scaled disturbances). rng(1); % For reproducibility numpaths = 1000; numobs = 100; [Y,E] = simulate(Mdl,numobs,NumPaths=numpaths);
Y is a 100-by-3-by-1000 matrix of simulated responses. E is a matrix whose dimensions correspond to the dimensions of Y, but represents the simulated, scaled disturbances. Columns correspond to the response variable names Mdl.SeriesNames. For each time point, compute the mean vector of the simulated responses among all paths. MeanSim = mean(Y,3);
MeanSim is a 100-by-7 matrix containing the average of the simulated responses at each time point. Plot the simulated responses and their averages, and plot the simulated innovations. figure tiledlayout(2,2) for j = 1:numel(Mdl.SeriesNames) nexttile h1 = plot(squeeze(Y(:,j,:)),Color=[0.8 0.8 0.8]); hold on h2 = plot(MeanSim(:,j),Color="k",LineWidth=2); hold off title(Mdl.SeriesNames{j}); legend([h1(1) h2],["Simulated" "Mean"]) end
12-2361
12
Functions
figure tiledlayout(2,2) for j = 1:numel(Mdl.SeriesNames) nexttile h1 = plot(squeeze(E(:,j,:)),Color=[0.8 0.8 0.8]); hold on yline(0,"r--") title("Innovations: " + Mdl.SeriesNames{j}) end
12-2362
simulate
Return Timetable of Responses and Innovations from Unconditional Simulation Consider a VEC model for the following seven macroeconomic series, and then fit the model to a timetable of response data. This example is based on “Return Response Series in Matrix from Unconditional Simulation” on page 12-2355. Load and Preprocess Data Load the Data_USEconVECModel data set. load Data_USEconVECModel DTT = FRED; DTT.GDP = 100*log(DTT.GDP); DTT.GDPDEF = 100*log(DTT.GDPDEF); DTT.COE = 100*log(DTT.COE); DTT.HOANBS = 100*log(DTT.HOANBS); DTT.PCEC = 100*log(DTT.PCEC); DTT.GPDI = 100*log(DTT.GPDI);
Prepare Timetable for Estimation When you plan to supply a timetable directly to estimate, you must ensure it has all the following characteristics: 12-2363
12
Functions
• All selected response variables are numeric and do not contain any missing values. • The timestamps in the Time variable are regular, and they are ascending or descending. Remove all missing values from the table. DTT = rmmissing(DTT); T = height(DTT) T = 240
DTT does not contain any missing values. Determine whether the sampling timestamps have a regular frequency and are sorted. areTimestampsRegular = isregular(DTT,"quarters") areTimestampsRegular = logical 0 areTimestampsSorted = issorted(DTT.Time) areTimestampsSorted = logical 1
areTimestampsRegular = 0 indicates that the timestamps of DTT are irregular. areTimestampsSorted = 1 indicates that the timestamps are sorted. Macroeconomic series in this example are timestamped at the end of the month. This quality induces an irregularly measured series. Remedy the time irregularity by shifting all dates to the first day of the quarter. dt = DTT.Time; dt = dateshift(dt,"start","quarter"); DTT.Time = dt;
DTT is regular with respect to time. Create Model Template for Estimation Create a VEC(1) model using the shorthand syntax. Specify the variable names. Mdl = vecm(7,4,1); Mdl.SeriesNames = DTT.Properties.VariableNames;
Mdl is a vecm model object. All properties containing NaN values correspond to parameters to be estimated given data. Fit Model to Data Estimate the model by supplying the timetable of data DTT. By default, because the number of variables in Mdl.SeriesNames is the number of variables in DTT, estimate fits the model to all the variables in DTT. EstMdl = estimate(Mdl,DTT); p = EstMdl.P p = 2
12-2364
simulate
EstMdl is an estimated vecm model object. Perform Unconditional Simulation of Estimated Model Simulate a response and innovations path from the estimated model and return the simulated series as variables in a timetable. simulate requires information for the output timetable, such as variable names, sampling times for the simulation horizon, and sampling frequency. Therefore, supply a presample of the earliest p = 2 observations of the data DTT, from which simulate infers the required timetable information. Specify a simulation horizon of numobs - p. rng(1) % For reproducibility PSTbl = DTT(1:p,:); T = T - p; Tbl = simulate(EstMdl,T,Presample=PSTbl); size(Tbl) ans = 1×2 238
14
PSTbl PSTbl=2×7 timetable Time GDP ___________ ______ 01-Jan-1957 01-Apr-1957
615.4 615.87
GDPDEF ______
COE ______
HOANBS ______
FEDFUNDS ________
PCEC ______
GPDI ______
280.25 280.95
556.3 557.03
400.29 400.07
2.96 3
564.3 565.11
435.29 435.54
head(Tbl) Time ___________ 01-Jul-1957 01-Oct-1957 01-Jan-1958 01-Apr-1958 01-Jul-1958 01-Oct-1958 01-Jan-1959 01-Apr-1959
GDP_Responses _____________
GDPDEF_Responses ________________
616.84 619.27 620.08 620.73 621.25 621.9 622.57 624.12
281.66 282.31 282.64 282.94 283.36 284.06 284.44 285
COE_Responses _____________ 557.71 559.72 561.48 562.02 562.07 562.91 564.48 566.1
HOANBS_Responses ________________ 400.25 400.14 400.26 400 400.21 399.89 399.68 399.1
Tbl is a 238-by-14 matrix of simulated responses (denoted responseVariable_Responses) and corresponding innovations (denoted responseVariable_Innovations). The timestamps of Tbl follow directly from the timestamps of PSTbl, and they have the same sampling frequency.
Simulate Responses From Model Containing Regression Component Consider the model and data in “Return Response Series in Matrix from Unconditional Simulation” on page 12-2355. Load the Data_USEconVECModel data set. load Data_USEconVECModel
12-2365
FEDF ____
12
Functions
The Data_Recessions data set contains the beginning and ending serial dates of recessions. Load this data set. Convert the matrix of date serial numbers to a datetime array. load Data_Recessions dtrec = datetime(Recessions,ConvertFrom="datenum");
Remove the exponential trend from the series, and then scale them by a factor of 100. DTT = FRED; DTT.GDP = 100*log(DTT.GDP); DTT.GDPDEF = 100*log(DTT.GDPDEF); DTT.COE = 100*log(DTT.COE); DTT.HOANBS = 100*log(DTT.HOANBS); DTT.PCEC = 100*log(DTT.PCEC); DTT.GPDI = 100*log(DTT.GPDI);
Create a dummy variable that identifies periods in which the U.S. was in a recession or worse. Specifically, the variable should be 1 if FRED.Time occurs during a recession, and 0 otherwise. Include the variable with the FRED data. isin = @(x)(any(dtrec(:,1) 0. • σ3 > 0. MdlBSSM = ssm2bssm(MdlSSM,@flatPriorSSM2BSSM) MdlBSSM = Mapping that defines a state-space model: @(params)ParamMap2(params,MdlSSM) Log density of parameter prior distribution: @flatPriorSSM2BSSM
Local Functions This example uses the flatPriorSSM2BSSM function, which is the log prior distribution of the parameters. function logprior = flatPriorSSM2BSSM(theta) paramconstraints = [(abs(theta(1)) >= 1) (abs(theta(2)) >= 1) ... (theta(3) < 0) (theta(4) < 0) (theta(5) < 0)]; if(sum(paramconstraints)) logprior = -Inf; else logprior = 0; end end
Input Arguments MdlSSM — Standard, linear state-space model ssm object Standard, linear state-space model, specified as an ssm object returned by ssm. Note The ssm2bssm converter is best suited for converting explicitly created, simple state-space models. For moderate through complex models, particularly implicitly created models where a 12-2439
12
Functions
parameter-to-matrix mapping function specifies the state-space, create a Bayesian model directly by using the bssm function. ParamDistribution — Log of joint probability density function of the state-space model parameters Π(θ) @(x)0 (default) | function handle Log of joint probability density function of the state-space model parameters Π(θ), specified as a function handle in the form @fcnName, where fcnName is the function name. ParamDistribution sets the ParamDistribution property of MdlBSSM. Suppose logPrior is the name of the MATLAB function defining the joint prior distribution of θ. Then, logPrior must have this form. function logpdf = logPrior(theta,...otherInputs...) ... end
where: • theta is a numParams-by-1 numeric vector of the linear state-space model parameters θ. Elements of theta must correspond to the unknown parameters of MdlSSM (see “Tips” on page 12-2441). The function can accept other inputs in subsequent positions. • logpdf is a numeric scalar representing the log of the joint probability density of θ at the input theta. If ParamDistribution requires the input parameter vector argument only, you can create the bssm object by calling: MdlBSSM = ssm2bssm(MdlSSM,@logPrior)
In general, create the bssm object by calling: MdlBSSM = ssm2bssm(MdlSSM,@(theta)logPrior(theta,...otherInputArgs...))
The default @(x)0 indicates a joint prior density that is proportional to 1 everywhere. Tip • The default joint prior is not necessarily a proper density. Consider specifying a proper prior instead. • Because out-of-bounds prior density evaluation is 0, set the log prior density of out-of-bounds parameter arguments to -Inf.
Data Types: function_handle
Output Arguments MdlBSSM — Bayesian state-space model bssm object 12-2440
ssm2bssm
Bayesian state-space model, returned as a bssm object. MdlBSSM completely specifies the state-space model structure (likelihood) and joint prior distribution.
Tips • To determine the order of the parameters for the first input argument theta of the log joint prior density function ParamDistribution, display the standard state-space model MdlSSM at the command line. MATLAB labels the parameters cj under the State equations and Observation equations headings, where j is the index of the parameter in the vector theta. For example, consider the following display of the standard state-space model MdlSSM. MdlSSM = State-space model type: ssm [ ... ] State equations: x1(t) = (c1)x1(t-1) + (c2)x2(t-1) + (c3)u1(t) x2(t) = x1(t-1) Observation equation: y1(t) = x1(t) + (c4)e1(t) [...]
In this case, theta is a 4-by-1 vector, where: • theta(1) is c1, the lag 1 AR coefficient of state variable x1,t. • theta(2) is c2, the lag 2 AR coefficient of state variable x1,t. • theta(3) is c3, the standard deviation of state disturbance u1,t. • theta(4) is c4, the standard deviation of observation innovation ε1,t.
Version History Introduced in R2022a
See Also ssm | bssm
12-2441
12
Functions
subchain Extract Markov subchain
Syntax sc = subchain(mc,states)
Description sc = subchain(mc,states) returns the subchain sc extracted from the discrete-time Markov chain mc. The subchain contains the states states and all states that are reachable from states.
Examples Extract Recurrent Subchain from Markov Chain Consider this theoretical, right-stochastic transition matrix of a stochastic process. 0 0.5 P= 0 0
1 0 0 0
0 0.5 0.5 0.5
0 0 . 0.5 0.5
Create the Markov chain that is characterized by the transition matrix P. P = [0 1 0 0; 0.5 0 0.5 0; 0 0 0.5 0.5; 0 0 0.5 0.5]; mc = dtmc(P);
Plot a directed graph of the Markov chain. Visually identify the communicating class to which each state belongs by using node colors. figure; graphplot(mc,'ColorNodes',true);
12-2442
subchain
Determine the stationary distribution of the Markov chain. x = asymptotics(mc) x = 1×4 0.0000
0.0000
0.5000
0.5000
The Markov chain eventually gets absorbed into states 3 and 4, and subsequent transitions are stochastic. Extract the recurrent subchain of the Markov chain by passing mc to subchain and specifying one of the states in the recurrent, aperiodic communicating class. sc = subchain(mc,3);
sc is a dtmc object. Plot a directed graph of the subchain. figure; graphplot(sc,'ColorNodes',true)
12-2443
12
Functions
Extract Unichain from Markov Chain Consider this theoretical, right-stochastic transition matrix of a stochastic process. 0.5 0 P= 0 0
0.5 0.5 0 0
0 0.5 0.5 0.5
0 0 . 0.5 0.5
Create the Markov chain that is characterized by the transition matrix P. Name the states Regime 1 through Regime 4. P = [0.5 0.5 0 0; 0 0.5 0.5 0; 0 0 0.5 0.5; 0 0 0.5 0.5]; mc = dtmc(P,'StateNames',["Regime 1" "Regime 2" "Regime 3" "Regime 4"]);
Plot a digraph of the chain. Visually identify the communicating class to which each state belongs by using node colors. figure; graphplot(mc,'ColorNodes',true);
12-2444
subchain
Regimes 1 and 2 are in their own communicating class because Regime 2 does not transition to Regime 1. Extract the subchain containing Regime 2, a transient state. Display the transition matrix of the subchain. sc = subchain(mc,"Regime 2"); sc.P ans = 3×3 0.5000 0 0
0.5000 0.5000 0.5000
0 0.5000 0.5000
Regime 1 is not in the subchain. Plot a digraph of the subchain. figure; graphplot(sc,'ColorNodes',true);
12-2445
12
Functions
The plot shows a unichain: a Markov chain containing one recurrent communicating class and the selected transient class.
Input Arguments mc — Discrete-time Markov chain dtmc object Discrete-time Markov chain with NumStates states and transition matrix P, specified as a dtmc object. P must be fully specified (no NaN entries). states — States to include in subchain numeric vector of positive integers | string vector | cell vector of character vectors States to include in the subchain, specified as a numeric vector of positive integers, string vector, or cell vector of character vectors. • For a numeric vector, elements of states correspond to rows of the transition matrix mc.P. • For a string vector or cell vector of character vectors, elements of states must be state names in mc.StateNames. Example: ["Regime 1" "Regime 2"] Data Types: double | string | cell 12-2446
subchain
Output Arguments sc — Discrete-time Markov chain dtmc object Discrete-time Markov chain, returned as a dtmc object. sc is a subchain of mc containing the states states and all states reachable from states. The state names of the subchain sc.StateNames are inherited from mc.
Algorithms • State j is reachable from state i if there is a nonzero probability of moving from i to j in a finite number of steps. subchain determines reachability by forming the transitive closure of the associated digraph, then enumerating one-step transitions. • Subchains are closed under reachability to ensure that the transition matrix of sc remains stochastic (that is, rows sum to 1), with transition probabilities identical to the transition probabilities in mc.P. • If you specify a state in a recurrent communicating class, then subchain extracts the entire communicating class. If you specify a state in a transient communicating class, then subchain extracts the transient class and all classes reachable from the transient class. To extract a unichain, specify a state in each component transient class. See classify.
Version History Introduced in R2017b
References [1] Gallager, R.G. Stochastic Processes: Theory for Applications. Cambridge, UK: Cambridge University Press, 2013. [2] Horn, R., and C. R. Johnson. Matrix Analysis. Cambridge, UK: Cambridge University Press, 1985.
See Also Objects digraph Functions classify Topics “Markov Chain Modeling” on page 10-8 “Create and Modify Markov Chain Model Objects” on page 10-17 “Visualize Markov Chain Structure and Evolution” on page 10-27 “Identify Classes in Markov Chain” on page 10-47
12-2447
12
Functions
summarize Display univariate ARIMA or ARIMAX model estimation results
Syntax summarize(Mdl) results = summarize(Mdl)
Description summarize(Mdl) displays a summary of the ARIMA model Mdl. • If Mdl is an estimated model returned by estimate, then summarize prints estimation results to the MATLAB Command Window. The display includes an estimation summary and a table of parameter estimates with corresponding standard errors, t statistics, and p-values. The estimation summary includes fit statistics, such as the Akaike Information Criterion (AIC), and the estimated innovations variance. • If Mdl is an unestimated model returned by arima, then summarize prints the standard object display (the same display that arima prints during model creation). results = summarize(Mdl) returns one of the following variables and does not print to the Command Window. • If Mdl is an estimated model, then results is a structure containing estimation results. • If Mdl is an unestimated model, then results is an arima model object that is equal to Mdl.
Examples Display Estimation Results Print the results from estimating an ARMA model using simulated data. Simulate data from an ARMA(1,1) model using known parameter values. MdlSim = arima('Constant',0.01,'AR',0.8,'MA',0.14,... 'Variance',0.1); rng 'default'; Y = simulate(MdlSim,100);
Fit an ARMA(1,1) model to the simulated data, turning off the print display. Mdl = arima(1,0,1); EstMdl = estimate(Mdl,Y,'Display','off');
Print the estimation results. summarize(EstMdl) ARIMA(1,0,1) Model (Gaussian Distribution)
12-2448
summarize
Effective Sample Size: 100 Number of Estimated Parameters: 4 LogLikelihood: -41.296 AIC: 90.592 BIC: 101.013 Value ________ Constant AR{1} MA{1} Variance
0.044537 0.82289 0.12032 0.13373
StandardError _____________ 0.046038 0.071163 0.10182 0.017879
TStatistic __________ 0.96741 11.563 1.1817 7.4794
PValue __________ 0.33334 6.3104e-31 0.23731 7.466e-14
Extract Estimation Results from Fitting Composite Conditional Mean and Variance Model Load the NASDAQ data included with Econometrics™ toolbox. Convert the daily close composite index series to a return series. For numerical stability, convert the returns to percentage returns. Specify an AR(1) and GARCH(1,1) composite model. This is a model of the form rt = c + ϕ1rt − 1 + εt, where εt = σtzt, σt2 = κ + γ1σt2− 1 + α1εt2− 1, and zt is an independent and identically distributed standardized Gaussian process. load Data_EquityIdx nasdaq = DataTable.NASDAQ; r = 100*price2ret(nasdaq); T = length(r); Mdl = arima('ARLags',1,'Variance',garch(1,1));
Fit the model Mdl to the return series r by using estimate. Use the presample observations that estimate chooses by default. EstMdl = estimate(Mdl,r,'Display','params'); ARIMA(1,0,0) Model (Gaussian Distribution): Value ________ Constant AR{1}
0.072632 0.13816
StandardError _____________ 0.018047 0.019893
TStatistic __________
PValue __________
4.0245 6.945
5.7086e-05 3.7846e-12
12-2449
12
Functions
GARCH(1,1) Conditional Variance Model (Gaussian Distribution):
Constant GARCH{1} ARCH{1}
Value ________
StandardError _____________
TStatistic __________
PValue __________
0.022377 0.87312 0.11865
0.0033201 0.0091019 0.008717
6.7399 95.927 13.611
1.5852e-11 0 3.434e-42
Create a variable named results that contains the estimation results by using summarize. results = summarize(EstMdl) results = struct with fields: Description: "ARIMA(1,0,0) Model (Gaussian Distribution)" SampleSize: 3027 NumEstimatedParameters: 5 LogLikelihood: -4.7414e+03 AIC: 9.4929e+03 BIC: 9.5230e+03 Table: [2x4 table] VarianceTable: [3x4 table]
Extract the parameter estimate summary tables from the estimation results structure array by using dot notation. The Table field contains the conditional mean model parameter estimates and inferences. The VarianceTable field contains the conditional variance model parameter estimates and inferences. meanEstTbl = results.Table meanEstTbl=2×4 table Value ________ Constant AR{1}
0.072632 0.13816
StandardError _____________
TStatistic __________
PValue __________
4.0245 6.945
5.7086e-05 3.7846e-12
StandardError _____________
TStatistic __________
PValue __________
0.0033201 0.0091019 0.008717
6.7399 95.927 13.611
1.5852e-11 0 3.434e-42
0.018047 0.019893
varianceEstTbl = results.VarianceTable varianceEstTbl=3×4 table Value ________ Constant GARCH{1} ARCH{1}
0.022377 0.87312 0.11865
Input Arguments Mdl — ARIMA model arima model object ARIMA model, specified as an arima model object returned by estimate or arima. 12-2450
summarize
Output Arguments results — Model summary structure array | arima model object Model summary, returned as a structure array or an arima model object. • If Mdl is an estimated model, then results is a structure array containing the fields in this table. Field
Description
Description
Model summary description (string)
SampleSize
Effective sample size (numeric scalar)
NumEstimatedParameters
Number of estimated parameters (numeric scalar)
LogLikelihood
Optimized loglikelihood value (numeric scalar)
AIC
Akaike Information Criterion (numeric scalar)
BIC
Bayesian Information Criterion (numeric scalar)
Table
Maximum likelihood estimates of the model parameters with corresponding standard errors, t statistics (estimate divided by standard error), and p-values (assuming normality); a table with rows corresponding to model parameters
VarianceTable
Maximum likelihood estimate of the model variance with corresponding standard errors, t statistics (estimate divided by standard error), and p-values (assuming normality). If Mdl.Variance is constant, then VarianceTable is a table containing one row. If Mdl.Variance is an estimated conditional variance model (for example, a garch model), then VarianceTable is a table whose rows correspond to estimated variance model parameters.
• If Mdl is an unestimated model, then results is an arima model object that is equal to Mdl.
Version History Introduced in R2018a
See Also Objects arima | garch | egarch | gjr 12-2451
12
Functions
Functions estimate
12-2452
summarize
summarize Distribution summary statistics of Bayesian vector autoregression (VAR) model
Syntax summarize(Mdl) summarize(Mdl,display) Summary = summarize(Mdl)
Description summarize(Mdl) displays, at the command line, a tabular summary of the coefficients of the Bayesian VAR(p) model on page 12-2466 Mdl, and the innovations covariance matrix. The summary includes the means and standard deviations of the distribution Mdl represents. summarize(Mdl,display) prints the summary using the display style display. Summary = summarize(Mdl) returns distribution summary statistics Summary.
Examples Inspect Minnesota Prior Assumptions Among Models Consider the 3-D VAR(4) model for the US inflation (INFL), unemployment (UNRATE), and federal funds (FEDFUNDS) rates. INFLt UNRATEt FEDFUNDSt
4
=c+
∑
j=1
INFLt −
ε1, t
j
Φ j UNRATEt −
+ ε2, t .
j
FEDFUNDSt −
j
ε3, t
For all t, εt is a series of independent 3-D normal innovations with a mean of 0 and covariance Σ. Assume that a prior distribution π Φ1, . . . , Φ4, c ′, Σ governs the behavior of the parameters. Consider using Minnesota regularization to obtain a parsimonious representation of the coefficient posterior distribution. For each supported prior assumption, create the corresponding Bayesian VAR(4) model object for the three response variables by using bayesvarm. For each model that supports the option, specify all the following. • The response variable names. • Prior self-lag coefficients have variance 100. This large-variance setting allows the data to influence the posterior more than the prior. • Prior cross-lag coefficients have variance 1. This small-variance setting tightens the cross-lag coefficients to zero during estimation. • Prior coefficient covariances decay with increasing lag at a rate of 2 (that is, lower lags are more important than larger lags). 12-2453
12
Functions
• For the normal conjugate prior model, assume that the innovations covariance is the 3-D identity matrix. seriesnames = ["INFL" "UNRATE" "FEDFUNDS"]; numseries = numel(seriesnames); numlags = 4; DiffusePriorMdl = bayesvarm(numseries,numlags,'SeriesNames',seriesnames); ConjugatePriorMdl = bayesvarm(numseries,numlags,'ModelType','conjugate',... 'SeriesNames',seriesnames,'Center',0.75,'SelfLag',100,'Decay',2); SemiConjugatePriorMdl = bayesvarm(numseries,numlags,'ModelType','semiconjugate',... 'SeriesNames',seriesnames,'Center',0.75,'SelfLag',100,'CrossLag',1,'Decay',2); NormalPriorMdl = bayesvarm(numseries,numlags,'ModelType','normal',... 'SeriesNames',seriesnames,'Center',0.75,'SelfLag',100,'CrossLag',1,'Decay',2,... 'Sigma',eye(numseries));
For each model, display summary of the prior distribution. summarize(DiffusePriorMdl) | Mean Std ------------------------Constant(1) | 0 Inf Constant(2) | 0 Inf Constant(3) | 0 Inf AR{1}(1,1) | 0 Inf AR{1}(2,1) | 0 Inf AR{1}(3,1) | 0 Inf AR{1}(1,2) | 0 Inf AR{1}(2,2) | 0 Inf AR{1}(3,2) | 0 Inf AR{1}(1,3) | 0 Inf AR{1}(2,3) | 0 Inf AR{1}(3,3) | 0 Inf AR{2}(1,1) | 0 Inf AR{2}(2,1) | 0 Inf AR{2}(3,1) | 0 Inf AR{2}(1,2) | 0 Inf AR{2}(2,2) | 0 Inf AR{2}(3,2) | 0 Inf AR{2}(1,3) | 0 Inf AR{2}(2,3) | 0 Inf AR{2}(3,3) | 0 Inf AR{3}(1,1) | 0 Inf AR{3}(2,1) | 0 Inf AR{3}(3,1) | 0 Inf AR{3}(1,2) | 0 Inf AR{3}(2,2) | 0 Inf AR{3}(3,2) | 0 Inf AR{3}(1,3) | 0 Inf AR{3}(2,3) | 0 Inf AR{3}(3,3) | 0 Inf AR{4}(1,1) | 0 Inf AR{4}(2,1) | 0 Inf AR{4}(3,1) | 0 Inf AR{4}(1,2) | 0 Inf AR{4}(2,2) | 0 Inf AR{4}(3,2) | 0 Inf
12-2454
summarize
AR{4}(1,3) | 0 Inf AR{4}(2,3) | 0 Inf AR{4}(3,3) | 0 Inf Innovations Covariance Matrix | INFL UNRATE FEDFUNDS -----------------------------------INFL | NaN NaN NaN | (NaN) (NaN) (NaN) UNRATE | NaN NaN NaN | (NaN) (NaN) (NaN) FEDFUNDS | NaN NaN NaN | (NaN) (NaN) (NaN)
Diffuse prior models put equal weight on all model coefficients. This specification allows the data to determine the posterior distribution. summarize(ConjugatePriorMdl) | Mean Std ------------------------------Constant(1) | 0 33.3333 Constant(2) | 0 33.3333 Constant(3) | 0 33.3333 AR{1}(1,1) | 0.7500 3.3333 AR{1}(2,1) | 0 3.3333 AR{1}(3,1) | 0 3.3333 AR{1}(1,2) | 0 3.3333 AR{1}(2,2) | 0.7500 3.3333 AR{1}(3,2) | 0 3.3333 AR{1}(1,3) | 0 3.3333 AR{1}(2,3) | 0 3.3333 AR{1}(3,3) | 0.7500 3.3333 AR{2}(1,1) | 0 1.6667 AR{2}(2,1) | 0 1.6667 AR{2}(3,1) | 0 1.6667 AR{2}(1,2) | 0 1.6667 AR{2}(2,2) | 0 1.6667 AR{2}(3,2) | 0 1.6667 AR{2}(1,3) | 0 1.6667 AR{2}(2,3) | 0 1.6667 AR{2}(3,3) | 0 1.6667 AR{3}(1,1) | 0 1.1111 AR{3}(2,1) | 0 1.1111 AR{3}(3,1) | 0 1.1111 AR{3}(1,2) | 0 1.1111 AR{3}(2,2) | 0 1.1111 AR{3}(3,2) | 0 1.1111 AR{3}(1,3) | 0 1.1111 AR{3}(2,3) | 0 1.1111 AR{3}(3,3) | 0 1.1111 AR{4}(1,1) | 0 0.8333 AR{4}(2,1) | 0 0.8333 AR{4}(3,1) | 0 0.8333 AR{4}(1,2) | 0 0.8333 AR{4}(2,2) | 0 0.8333 AR{4}(3,2) | 0 0.8333 AR{4}(1,3) | 0 0.8333 AR{4}(2,3) | 0 0.8333
12-2455
12
Functions
AR{4}(3,3) | 0 0.8333 Innovations Covariance Matrix | INFL UNRATE FEDFUNDS ----------------------------------------INFL | 0.1111 0 0 | (0.0594) (0.0398) (0.0398) UNRATE | 0 0.1111 0 | (0.0398) (0.0594) (0.0398) FEDFUNDS | 0 0 0.1111 | (0.0398) (0.0398) (0.0594)
With a tighter prior variance around 0 for larger lags, the posterior of the conjugate model is likely to be more sparse that the posterior of the diffuse model. summarize(SemiConjugatePriorMdl) | Mean Std -----------------------------Constant(1) | 0 100 Constant(2) | 0 100 Constant(3) | 0 100 AR{1}(1,1) | 0.7500 10 AR{1}(2,1) | 0 1 AR{1}(3,1) | 0 1 AR{1}(1,2) | 0 1 AR{1}(2,2) | 0.7500 10 AR{1}(3,2) | 0 1 AR{1}(1,3) | 0 1 AR{1}(2,3) | 0 1 AR{1}(3,3) | 0.7500 10 AR{2}(1,1) | 0 5 AR{2}(2,1) | 0 0.5000 AR{2}(3,1) | 0 0.5000 AR{2}(1,2) | 0 0.5000 AR{2}(2,2) | 0 5 AR{2}(3,2) | 0 0.5000 AR{2}(1,3) | 0 0.5000 AR{2}(2,3) | 0 0.5000 AR{2}(3,3) | 0 5 AR{3}(1,1) | 0 3.3333 AR{3}(2,1) | 0 0.3333 AR{3}(3,1) | 0 0.3333 AR{3}(1,2) | 0 0.3333 AR{3}(2,2) | 0 3.3333 AR{3}(3,2) | 0 0.3333 AR{3}(1,3) | 0 0.3333 AR{3}(2,3) | 0 0.3333 AR{3}(3,3) | 0 3.3333 AR{4}(1,1) | 0 2.5000 AR{4}(2,1) | 0 0.2500 AR{4}(3,1) | 0 0.2500 AR{4}(1,2) | 0 0.2500 AR{4}(2,2) | 0 2.5000 AR{4}(3,2) | 0 0.2500 AR{4}(1,3) | 0 0.2500 AR{4}(2,3) | 0 0.2500 AR{4}(3,3) | 0 2.5000 Innovations Covariance Matrix
12-2456
summarize
| INFL UNRATE FEDFUNDS ----------------------------------------INFL | 0.1111 0 0 | (0.0594) (0.0398) (0.0398) UNRATE | 0 0.1111 0 | (0.0398) (0.0594) (0.0398) FEDFUNDS | 0 0 0.1111 | (0.0398) (0.0398) (0.0594) summarize(NormalPriorMdl) | Mean Std -----------------------------Constant(1) | 0 100 Constant(2) | 0 100 Constant(3) | 0 100 AR{1}(1,1) | 0.7500 10 AR{1}(2,1) | 0 1 AR{1}(3,1) | 0 1 AR{1}(1,2) | 0 1 AR{1}(2,2) | 0.7500 10 AR{1}(3,2) | 0 1 AR{1}(1,3) | 0 1 AR{1}(2,3) | 0 1 AR{1}(3,3) | 0.7500 10 AR{2}(1,1) | 0 5 AR{2}(2,1) | 0 0.5000 AR{2}(3,1) | 0 0.5000 AR{2}(1,2) | 0 0.5000 AR{2}(2,2) | 0 5 AR{2}(3,2) | 0 0.5000 AR{2}(1,3) | 0 0.5000 AR{2}(2,3) | 0 0.5000 AR{2}(3,3) | 0 5 AR{3}(1,1) | 0 3.3333 AR{3}(2,1) | 0 0.3333 AR{3}(3,1) | 0 0.3333 AR{3}(1,2) | 0 0.3333 AR{3}(2,2) | 0 3.3333 AR{3}(3,2) | 0 0.3333 AR{3}(1,3) | 0 0.3333 AR{3}(2,3) | 0 0.3333 AR{3}(3,3) | 0 3.3333 AR{4}(1,1) | 0 2.5000 AR{4}(2,1) | 0 0.2500 AR{4}(3,1) | 0 0.2500 AR{4}(1,2) | 0 0.2500 AR{4}(2,2) | 0 2.5000 AR{4}(3,2) | 0 0.2500 AR{4}(1,3) | 0 0.2500 AR{4}(2,3) | 0 0.2500 AR{4}(3,3) | 0 2.5000 Innovations Covariance Matrix | INFL UNRATE FEDFUNDS ----------------------------------INFL | 1 0 0 | (0) (0) (0) UNRATE | 0 1 0
12-2457
12
Functions
| FEDFUNDS | |
(0) 0 (0)
(0) 0 (0)
(0) 1 (0)
Semiconjugate and normal conjugate prior models yield a richer prior specification than the conjugate and diffuse models.
Adjust Distribution Summary Displays Consider the 3-D VAR(4) model of “Inspect Minnesota Prior Assumptions Among Models” on page 122453. Assume that the prior distribution is diffuse. Load the US macroeconomic data set. Compute the inflation rate, stabilize the unemployment and federal funds rates, and remove missing values. load Data_USEconModel seriesnames = ["INFL" "UNRATE" "FEDFUNDS"]; DataTimeTable.INFL = 100*[NaN; price2ret(DataTimeTable.CPIAUCSL)]; DataTimeTable.DUNRATE = [NaN; diff(DataTimeTable.UNRATE)]; DataTimeTable.DFEDFUNDS = [NaN; diff(DataTimeTable.FEDFUNDS)]; seriesnames(2:3) = "D" + seriesnames(2:3); rmDataTimeTable = rmmissing(DataTimeTable);
Create a diffuse Bayesian VAR(4) prior model for the three response series. Specify the response variable names. numseries = numel(seriesnames); numlags = 4; PriorMdl = bayesvarm(numseries,numlags,'SeriesNames',seriesnames);
Estimate the posterior distribution. PosteriorMdl = estimate(PriorMdl,rmDataTimeTable{:,seriesnames}); Bayesian VAR under diffuse priors Effective Sample Size: 197 Number of equations: 3 Number of estimated Parameters: 39 | Mean Std ------------------------------Constant(1) | 0.1007 0.0832 Constant(2) | -0.0499 0.0450 Constant(3) | -0.4221 0.1781 AR{1}(1,1) | 0.1241 0.0762 AR{1}(2,1) | -0.0219 0.0413 AR{1}(3,1) | -0.1586 0.1632 AR{1}(1,2) | -0.4809 0.1536 AR{1}(2,2) | 0.4716 0.0831 AR{1}(3,2) | -1.4368 0.3287 AR{1}(1,3) | 0.1005 0.0390 AR{1}(2,3) | 0.0391 0.0211 AR{1}(3,3) | -0.2905 0.0835 AR{2}(1,1) | 0.3236 0.0868
12-2458
summarize
AR{2}(2,1) | 0.0913 0.0469 AR{2}(3,1) | 0.3403 0.1857 AR{2}(1,2) | -0.0503 0.1647 AR{2}(2,2) | 0.2414 0.0891 AR{2}(3,2) | -0.2968 0.3526 AR{2}(1,3) | 0.0450 0.0413 AR{2}(2,3) | 0.0536 0.0223 AR{2}(3,3) | -0.3117 0.0883 AR{3}(1,1) | 0.4272 0.0860 AR{3}(2,1) | -0.0389 0.0465 AR{3}(3,1) | 0.2848 0.1841 AR{3}(1,2) | 0.2738 0.1620 AR{3}(2,2) | 0.0552 0.0876 AR{3}(3,2) | -0.7401 0.3466 AR{3}(1,3) | 0.0523 0.0428 AR{3}(2,3) | 0.0008 0.0232 AR{3}(3,3) | 0.0028 0.0917 AR{4}(1,1) | 0.0167 0.0901 AR{4}(2,1) | 0.0285 0.0488 AR{4}(3,1) | -0.0690 0.1928 AR{4}(1,2) | -0.1830 0.1520 AR{4}(2,2) | -0.1795 0.0822 AR{4}(3,2) | 0.1494 0.3253 AR{4}(1,3) | 0.0067 0.0395 AR{4}(2,3) | 0.0088 0.0214 AR{4}(3,3) | -0.1372 0.0845 Innovations Covariance Matrix | INFL DUNRATE DFEDFUNDS ------------------------------------------INFL | 0.3028 -0.0217 0.1579 | (0.0321) (0.0124) (0.0499) DUNRATE | -0.0217 0.0887 -0.1435 | (0.0124) (0.0094) (0.0283) DFEDFUNDS | 0.1579 -0.1435 1.3872 | (0.0499) (0.0283) (0.1470)
Summarize the posterior distribution; compare each estimation display type. summarize(PosteriorMdl); % The default is 'table'. | Mean Std ------------------------------Constant(1) | 0.1007 0.0832 Constant(2) | -0.0499 0.0450 Constant(3) | -0.4221 0.1781 AR{1}(1,1) | 0.1241 0.0762 AR{1}(2,1) | -0.0219 0.0413 AR{1}(3,1) | -0.1586 0.1632 AR{1}(1,2) | -0.4809 0.1536 AR{1}(2,2) | 0.4716 0.0831 AR{1}(3,2) | -1.4368 0.3287 AR{1}(1,3) | 0.1005 0.0390 AR{1}(2,3) | 0.0391 0.0211 AR{1}(3,3) | -0.2905 0.0835 AR{2}(1,1) | 0.3236 0.0868 AR{2}(2,1) | 0.0913 0.0469 AR{2}(3,1) | 0.3403 0.1857 AR{2}(1,2) | -0.0503 0.1647
12-2459
12
Functions
AR{2}(2,2) | 0.2414 0.0891 AR{2}(3,2) | -0.2968 0.3526 AR{2}(1,3) | 0.0450 0.0413 AR{2}(2,3) | 0.0536 0.0223 AR{2}(3,3) | -0.3117 0.0883 AR{3}(1,1) | 0.4272 0.0860 AR{3}(2,1) | -0.0389 0.0465 AR{3}(3,1) | 0.2848 0.1841 AR{3}(1,2) | 0.2738 0.1620 AR{3}(2,2) | 0.0552 0.0876 AR{3}(3,2) | -0.7401 0.3466 AR{3}(1,3) | 0.0523 0.0428 AR{3}(2,3) | 0.0008 0.0232 AR{3}(3,3) | 0.0028 0.0917 AR{4}(1,1) | 0.0167 0.0901 AR{4}(2,1) | 0.0285 0.0488 AR{4}(3,1) | -0.0690 0.1928 AR{4}(1,2) | -0.1830 0.1520 AR{4}(2,2) | -0.1795 0.0822 AR{4}(3,2) | 0.1494 0.3253 AR{4}(1,3) | 0.0067 0.0395 AR{4}(2,3) | 0.0088 0.0214 AR{4}(3,3) | -0.1372 0.0845 Innovations Covariance Matrix | INFL DUNRATE DFEDFUNDS ------------------------------------------INFL | 0.3028 -0.0217 0.1579 | (0.0321) (0.0124) (0.0499) DUNRATE | -0.0217 0.0887 -0.1435 | (0.0124) (0.0094) (0.0283) DFEDFUNDS | 0.1579 -0.1435 1.3872 | (0.0499) (0.0283) (0.1470)
The default is the same default tabular display that estimate prints. summarize(PosteriorMdl,'equation');
VAR Equations | INFL(-1) DUNRATE(-1) DFEDFUNDS(-1) INFL(-2) DUNRATE(-2) DFEDFUNDS(-2) INFL(-3) ------------------------------------------------------------------------------------------------INFL | 0.1241 -0.4809 0.1005 0.3236 -0.0503 0.0450 0.4272 | (0.0762) (0.1536) (0.0390) (0.0868) (0.1647) (0.0413) (0.0860) DUNRATE | -0.0219 0.4716 0.0391 0.0913 0.2414 0.0536 -0.0389 | (0.0413) (0.0831) (0.0211) (0.0469) (0.0891) (0.0223) (0.0465) DFEDFUNDS | -0.1586 -1.4368 -0.2905 0.3403 -0.2968 -0.3117 0.2848 | (0.1632) (0.3287) (0.0835) (0.1857) (0.3526) (0.0883) (0.1841) Innovations Covariance Matrix | INFL DUNRATE DFEDFUNDS ------------------------------------------INFL | 0.3028 -0.0217 0.1579 | (0.0321) (0.0124) (0.0499) DUNRATE | -0.0217 0.0887 -0.1435 | (0.0124) (0.0094) (0.0283) DFEDFUNDS | 0.1579 -0.1435 1.3872 | (0.0499) (0.0283) (0.1470)
12-2460
summarize
In the 'equation' display, rows correspond to response equations in the VAR system, and columns correspond to lagged response variables within equations. Elements in the table correspond to the posterior means of the corresponding coefficient; under each mean in parentheses is the standard deviation of the posterior. summarize(PosteriorMdl,'matrix'); VAR Coefficient Matrix of Lag 1 | INFL(-1) DUNRATE(-1) DFEDFUNDS(-1) -------------------------------------------------INFL | 0.1241 -0.4809 0.1005 | (0.0762) (0.1536) (0.0390) DUNRATE | -0.0219 0.4716 0.0391 | (0.0413) (0.0831) (0.0211) DFEDFUNDS | -0.1586 -1.4368 -0.2905 | (0.1632) (0.3287) (0.0835) VAR Coefficient Matrix of Lag 2 | INFL(-2) DUNRATE(-2) DFEDFUNDS(-2) -------------------------------------------------INFL | 0.3236 -0.0503 0.0450 | (0.0868) (0.1647) (0.0413) DUNRATE | 0.0913 0.2414 0.0536 | (0.0469) (0.0891) (0.0223) DFEDFUNDS | 0.3403 -0.2968 -0.3117 | (0.1857) (0.3526) (0.0883) VAR Coefficient Matrix of Lag 3 | INFL(-3) DUNRATE(-3) DFEDFUNDS(-3) -------------------------------------------------INFL | 0.4272 0.2738 0.0523 | (0.0860) (0.1620) (0.0428) DUNRATE | -0.0389 0.0552 0.0008 | (0.0465) (0.0876) (0.0232) DFEDFUNDS | 0.2848 -0.7401 0.0028 | (0.1841) (0.3466) (0.0917) VAR Coefficient Matrix of Lag 4 | INFL(-4) DUNRATE(-4) DFEDFUNDS(-4) -------------------------------------------------INFL | 0.0167 -0.1830 0.0067 | (0.0901) (0.1520) (0.0395) DUNRATE | 0.0285 -0.1795 0.0088 | (0.0488) (0.0822) (0.0214) DFEDFUNDS | -0.0690 0.1494 -0.1372 | (0.1928) (0.3253) (0.0845) Constant Term INFL | 0.1007 | (0.0832) DUNRATE | -0.0499 | 0.0450 DFEDFUNDS | -0.4221 | 0.1781 Innovations Covariance Matrix | INFL DUNRATE DFEDFUNDS -------------------------------------------
12-2461
12
Functions
INFL
| | DUNRATE | | DFEDFUNDS | |
0.3028 (0.0321) -0.0217 (0.0124) 0.1579 (0.0499)
-0.0217 (0.0124) 0.0887 (0.0094) -0.1435 (0.0283)
0.1579 (0.0499) -0.1435 (0.0283) 1.3872 (0.1470)
In the 'matrix' display, each table contains the posterior mean of the corresponding coefficient matrix. Under each mean in parentheses the posterior standard deviation.
Return and Inspect Estimation Summary Information Consider the 3-D VAR(4) model of “Inspect Minnesota Prior Assumptions Among Models” on page 122453. Assume that the parameters follow a semiconjugate prior model. Load the US macroeconomic data set. Compute the inflation rate, stabilize the unemployment aand federal funds rates, and remove missing values. load Data_USEconModel seriesnames = ["INFL" "UNRATE" "FEDFUNDS"]; DataTimeTable.INFL = 100*[NaN; price2ret(DataTimeTable.CPIAUCSL)]; DataTimeTable.DUNRATE = [NaN; diff(DataTimeTable.UNRATE)]; DataTimeTable.DFEDFUNDS = [NaN; diff(DataTimeTable.FEDFUNDS)]; seriesnames(2:3) = "D" + seriesnames(2:3); rmDataTimeTable = rmmissing(DataTimeTable);
Create a semiconjugate Bayesian VAR(4) prior model for the three response series. Specify the response variable names, and suppress the estimation display. numseries = numel(seriesnames); numlags = 4; PriorMdl = bayesvarm(numseries,numlags,'Model','semiconjugate',... 'SeriesNames',seriesnames);
Estimate the posterior distribution. Suppress the estimation display. PosteriorMdl = estimate(PriorMdl,rmDataTimeTable{:,seriesnames},'Display','off');
Because the posterior of a semiconjugate model is analytically intractable, PosteriorMdl is an empiricalbvarm model object storing the draws from the Gibbs sampler. Summarize the posterior distribution; return the estimation summary. Summary = summarize(PosteriorMdl); | Mean Std ------------------------------Constant(1) | 0.1830 0.0718 Constant(2) | -0.0808 0.0413 Constant(3) | -0.0161 0.1309 AR{1}(1,1) | 0.2246 0.0650 AR{1}(2,1) | -0.0263 0.0340 AR{1}(3,1) | -0.0263 0.0775
12-2462
summarize
AR{1}(1,2) | -0.0837 0.0824 AR{1}(2,2) | 0.3665 0.0740 AR{1}(3,2) | -0.1283 0.0948 AR{1}(1,3) | 0.1362 0.0323 AR{1}(2,3) | 0.0154 0.0198 AR{1}(3,3) | -0.0538 0.0685 AR{2}(1,1) | 0.2518 0.0700 AR{2}(2,1) | 0.0928 0.0352 AR{2}(3,1) | 0.0373 0.0628 AR{2}(1,2) | -0.0097 0.0632 AR{2}(2,2) | 0.1657 0.0709 AR{2}(3,2) | -0.0254 0.0688 AR{2}(1,3) | 0.0329 0.0308 AR{2}(2,3) | 0.0341 0.0199 AR{2}(3,3) | -0.1451 0.0637 AR{3}(1,1) | 0.2895 0.0665 AR{3}(2,1) | 0.0013 0.0332 AR{3}(3,1) | -0.0036 0.0530 AR{3}(1,2) | 0.0322 0.0538 AR{3}(2,2) | -0.0150 0.0667 AR{3}(3,2) | -0.0369 0.0568 AR{3}(1,3) | 0.0368 0.0298 AR{3}(2,3) | -0.0083 0.0194 AR{3}(3,3) | 0.1516 0.0603 AR{4}(1,1) | 0.0452 0.0644 AR{4}(2,1) | 0.0225 0.0325 AR{4}(3,1) | -0.0097 0.0470 AR{4}(1,2) | -0.0218 0.0468 AR{4}(2,2) | -0.1125 0.0611 AR{4}(3,2) | 0.0013 0.0491 AR{4}(1,3) | 0.0180 0.0273 AR{4}(2,3) | 0.0084 0.0179 AR{4}(3,3) | -0.0815 0.0594 Innovations Covariance Matrix | INFL DUNRATE DFEDFUNDS ------------------------------------------INFL | 0.2983 -0.0219 0.1750 | (0.0307) (0.0121) (0.0500) DUNRATE | -0.0219 0.0890 -0.1495 | (0.0121) (0.0093) (0.0290) DFEDFUNDS | 0.1750 -0.1495 1.4730 | (0.0500) (0.0290) (0.1514) Summary Summary = struct with fields: Description: "3-Dimensional VAR(4) Model" NumEstimatedParameters: 39 Table: [39x2 table] CoeffMap: [39x1 string] CoeffMean: [39x1 double] CoeffStd: [39x1 double] SigmaMean: [3x3 double] SigmaStd: [3x3 double]
12-2463
12
Functions
Summary is a structure array of fields containing posterior estimation information. For example, the CoeffMap field contains a list of the coefficient names. The order of the names corresponds to the order the all coefficient vector inputs and outputs. Display CoeffMap. Summary.CoeffMap ans = 39x1 string "AR{1}(1,1)" "AR{1}(1,2)" "AR{1}(1,3)" "AR{2}(1,1)" "AR{2}(1,2)" "AR{2}(1,3)" "AR{3}(1,1)" "AR{3}(1,2)" "AR{3}(1,3)" "AR{4}(1,1)" "AR{4}(1,2)" "AR{4}(1,3)" "Constant(1)" "AR{1}(2,1)" "AR{1}(2,2)" "AR{1}(2,3)" "AR{2}(2,1)" "AR{2}(2,2)" "AR{2}(2,3)" "AR{3}(2,1)" "AR{3}(2,2)" "AR{3}(2,3)" "AR{4}(2,1)" "AR{4}(2,2)" "AR{4}(2,3)" "Constant(2)" "AR{1}(3,1)" "AR{1}(3,2)" "AR{1}(3,3)" "AR{2}(3,1)" ⋮
Input Arguments Mdl — Prior or posterior Bayesian VAR model conjugatebvarm model object | semiconjugatebvarm model object | diffusebvarm model object | normalbvarm model object | empiricalbvarm model object Prior or posterior Bayesian VAR model, specified as a model object in this table. Model Object
Description
conjugatebvarm
Dependent, matrix-normal-inverse-Wishart conjugate model returned by bayesvarm, conjugatebvarm, or estimate
semiconjugatebv Independent, normal-inverse-Wishart semiconjugate prior model returned by arm bayesvarm or semiconjugatebvarm
12-2464
summarize
Model Object
Description
diffusebvarm
Diffuse prior model returned by bayesvarm or diffusebvarm
empiricalbvarm
Prior or posterior model characterized by random draws from respective distributions, returned by empiricalbvarm or estimate
display — Distribution summary display style 'table' (default) | 'off' | 'equation' | 'matrix' Distribution summary display style, specified as a value in this table. Value
Description
'off'
summarize does not print to the command line.
'table'
summarize prints the following: • Estimation information
• Tabular summary of coefficient posterior means and standard deviations; each row corresp to a coefficient, and each column corresponds to an estimate type
• Posterior mean of the innovations covariance matrix with standard deviations in parenthes 'equation'
summarize prints the following: • Estimation information
• Tabular summary of posterior means and standard deviations; each row corresponds to a response variable in the system, and each column corresponds to a coefficient in the equat (for example, the column labeled Y1(-1) contains the estimates of the lag 1 coefficient of first response variable in each equation)
• Posterior mean of the innovations covariance matrix with standard deviations in parenthes 'matrix'
summarize prints the following: • Estimation information
• Separate tabular displays of posterior means and standard deviations (in parentheses) for e parameter in the model Φ1,…, Φp, c, δ, Β, and Σ Data Types: char | string
Output Arguments Summary — Distribution summary statistics structure array Distribution summary statistics, returned as a structure array containing these fields: Field
Description
Data type
Description
Model description
string scalar
NumEstimatedParameters
Number of coefficients
numeric scalar
12-2465
12
Functions
Field
Description
Data type
Table
Table of coefficient distribution table means and standard deviations; each row corresponds to a coefficient and each column corresponds to a statistic
CoeffMap
Coefficient names
string vector
CoeffMean
Coefficient distribution means
numeric vector, rows correspond to CoeffMap
CoeffStd
Coefficient distribution standard numeric vector, rows deviations correspond to CoeffMap
SigmaMean
Innovations covariance distribution mean matrix
numeric matrix, rows and columns correspond to response equations
SigmaStd
Innovations covariance distribution standard deviation matrix
numeric matrix, rows and columns correspond to response equations
More About Bayesian Vector Autoregression (VAR) Model A Bayesian VAR model treats all coefficients and the innovations covariance matrix as random variables in the m-dimensional, stationary VARX(p) model. The model has one of the three forms described in this table. Model
Equation
Reduced-form VAR(p) in difference-equation notation
yt = Φ1 yt − 1 + ... + Φp yt − p + c + δt + Βxt + εt .
Multivariate regression
yt = Zt λ + εt .
Matrix regression
yt = Λ′zt′ + εt .
For each time t = 1,...,T: • yt is the m-dimensional observed response vector, where m = numseries. • Φ1,…,Φp are the m-by-m AR coefficient matrices of lags 1 through p, where p = numlags. • c is the m-by-1 vector of model constants if IncludeConstant is true. • δ is the m-by-1 vector of linear time trend coefficients if IncludeTrend is true. • Β is the m-by-r matrix of regression coefficients of the r-by-1 vector of observed exogenous predictors xt, where r = NumPredictors. All predictor variables appear in each equation. • zt = yt′ − 1 yt′ − 2 ⋯ yt′ − p 1 t xt′ , which is a 1-by-(mp + r + 2) vector, and Zt is the m-by-m(mp + r + 2) block diagonal matrix
12-2466
summarize
zt 0z ⋯ 0z 0z zt ⋯ 0z ⋮ ⋮ ⋱ ⋮ 0z 0z 0z zt
,
where 0z is a 1-by-(mp + r + 2) vector of zeros. •
Λ = Φ1 Φ2 ⋯ Φp c δ Β ′, which is an (mp + r + 2)-by-m random matrix of the coefficients, and the m(mp + r + 2)-by-1 vector λ = vec(Λ).
• εt is an m-by-1 vector of random, serially uncorrelated, multivariate normal innovations with the zero vector for the mean and the m-by-m matrix Σ for the covariance. This assumption implies that the data likelihood is ℓ Λ, Σ y, x =
T
∏
t=1
f yt; Λ, Σ, zt ,
where f is the m-dimensional multivariate normal density with mean ztΛ and covariance Σ, evaluated at yt. Before considering the data, you impose a joint prior distribution assumption on (Λ,Σ), which is governed by the distribution π(Λ,Σ). In a Bayesian analysis, the distribution of the parameters is updated with information about the parameters obtained from the data likelihood. The result is the joint posterior distribution π(Λ,Σ|Y,X,Y0), where: • Y is a T-by-m matrix containing the entire response series {yt}, t = 1,…,T. • X is a T-by-m matrix containing the entire exogenous series {xt}, t = 1,…,T. • Y0 is a p-by-m matrix of presample data used to initialize the VAR model for estimation.
Version History Introduced in R2020a
See Also Functions estimate | bayesvarm Objects normalbvarm | conjugatebvarm | semiconjugatebvarm | diffusebvarm | empiricalbvarm
12-2467
12
Functions
summarize Display estimation results of conditional variance model
Syntax summarize(Mdl) results = summarize(Mdl)
Description summarize(Mdl) displays a summary of the conditional variance model Mdl. • If Mdl is an estimated model returned by estimate, then summarize prints estimation results to the MATLAB Command Window. The display includes an estimation summary and a table of parameter estimates with corresponding standard errors, t statistics, and p-values. The estimation summary includes fit statistics, such as the Akaike Information Criterion (AIC). • If Mdl is an unestimated model returned by garch, egarch, or gjr, then summarize prints the standard object display (the same display printed during model creation). results = summarize(Mdl) returns one of the following variables and does not print to the Command Window. • If Mdl is an estimated model, then results is a structure containing estimation results. • If Mdl is an unestimated model, then results is a garch, egarch, or gjr model object that is equal to Mdl.
Examples Display Estimation Results Print the results from estimating a GARCH model using simulated data. Simulate data from a GARCH(1,1) model with known parameter values. Mdl0 = garch('Constant',0.01,'GARCH',0.8,'ARCH',0.14); rng 'default'; % For reproducibility [V,Y] = simulate(Mdl0,100);
Fit a GARCH(1,1) model to the simulated data. Suppress the estimation display. Mdl = garch(1,1); EstMdl = estimate(Mdl,Y,'Display','off');
Display an estimation summary. summarize(EstMdl) GARCH(1,1) Conditional Variance Model (Gaussian Distribution)
12-2468
summarize
Effective Sample Size: 100 Number of Estimated Parameters: 3 LogLikelihood: -96.5255 AIC: 199.051 BIC: 206.866 Value _______ Constant GARCH{1} ARCH{1}
0.0167 0.77263 0.19169
StandardError _____________
TStatistic __________
PValue __________
1.0117 9.945 2.5535
0.31169 2.6523e-23 0.010664
0.016508 0.07769 0.075068
Extract Estimation Results from Estimated Model Estimate several models by passing an EGARCH model template and data to estimate. Vary the number of ARCH and GARCH lags among the models. Extract the AIC from the estimation results, and choose the model that minimizes the fit statistic. Simulate data from an EGARCH(0,1) model with known parameter values. Mdl0 = egarch('Constant',0.01,'ARCH',0.75,'Leverage',-0.1); rng(2); % For reproducibility [~,Y] = simulate(Mdl0,100);
To determine the number of ARCH and GARCH lags, create and estimate multiple EGARCH models. Vary the number of GARCH and ARCH lags (p and q, respectively) among the models from 0 to 1 lag. Exclude the case where p = 1 and q = 0 because the presence of GARCH lags requires the presence of ARCH lags. Suppress all estimation displays. Extract the AIC from the estimation results structure. The field AIC stores the AIC. pq = [0 0; 0 1; 1 1]; AIC = zeros(size(pq,1),1); % Preallocation for j = 1:size(pq,1) Mdl = egarch(pq(j,1),pq(j,2)); EstMdl = estimate(Mdl,Y,'Display','off'); results = summarize(EstMdl); AIC(j) = results.AIC; end
Compare the AIC values among the models. [minAIC,bestidx] = min(AIC,[],1); bestPQ = pq(bestidx,:) bestPQ = 1×2 0
1
The best fitting model is the EGARCH(0,1) model because its corresponding AIC is the lowest. This model also has the structure of the model used to simulate the data. 12-2469
12
Functions
Input Arguments Mdl — Conditional variance model garch model object | egarch model object | gjr model object Conditional variance model, specified as a garch, egarch, or gjr model object returned by estimate, garch, egarch, or gjr.
Output Arguments results — Model summary structure array | garch model object | egarch model object | gjr model object Model summary, returned as a structure array or a garch, egarch, or gjr model object. • If Mdl is an estimated model, then results is a structure array containing the fields in this table. Field
Description
Description
Model summary description (string)
SampleSize
Effective sample size (numeric scalar)
NumEstimatedParameters
Number of estimated parameters (numeric scalar)
LogLikelihood
Optimized loglikelihood value (numeric scalar)
AIC
Akaike Information Criterion (numeric scalar)
BIC
Bayesian Information Criterion (numeric scalar)
Table
Maximum likelihood estimates of the model parameters with corresponding standard errors, t statistics (estimate divided by standard error), and p-values (assuming normality); a table with rows corresponding to model parameters
• If Mdl is an unestimated model, then results is a conditional variance model object that is equal to Mdl.
Version History Introduced in R2012a
See Also Objects garch | egarch | gjr Functions estimate 12-2470
summarize
summarize Summarize Markov-switching dynamic regression model estimation results
Syntax summarize(Mdl) summarize(Mdl,state) results = summarize( ___ )
Description summarize(Mdl) displays a summary of the Markov-switching dynamic regression model Mdl. • If Mdl is an estimated model returned by estimate, then summarize displays estimation results to the MATLAB Command Window. The display includes: • A model description • Estimated transition probabilities • Fit statistics, which include the effective sample size, number of estimated submodel parameters and constraints, loglikelihood, and information criteria (AIC and BIC) • A table of submodel estimates and inferences, which includes coefficient estimates with standard errors, t-statistics, and p-values. • If Mdl is an unestimated Markov-switching model returned by msVAR, summarize prints the standard object display (the same display that msVAR prints during model creation). summarize(Mdl,state) displays only summary information for the submodel with name state. results = summarize( ___ ) returns one of the following variables and does not print to the Command Window. • If Mdl is an estimated Markov-switching model, results is a table containing the submodel estimates and inferences. • If Mdl is an unestimated model, results is an msVAR object that is equal to Mdl.
Examples Estimate Markov-Switching Dynamic Regression Model Consider a two-state Markov-switching dynamic regression model of the postwar US real GDP growth rate, as estimated in [1]. Create Partially Specified Model for Estimation Create a Markov-switching dynamic regression model for the naive estimator by specifying a twostate discrete-time Markov chain with an unknown transition matrix and AR(0) (constant only) submodels for both regimes. Label the regimes. P = NaN(2); mc = dtmc(P,'StateNames',["Expansion" "Recession"]);
12-2471
12
Functions
mdl = arima(0,0,0); Mdl = msVAR(mc,[mdl; mdl]);
Mdl is a partially specified msVAR object. NaN-valued elements of the Switch and SubModels properties indicate estimable parameters. Create Fully Specified Model Containing Initial Values The estimation procedure requires initial values for all estimable parameters. Create a fully specified Markov-switching dynamic regression model that has the same structure as Mdl, but set all estimable parameters to initial values. This example uses arbitrary initial values. P0 = 0.5*ones(2); mc0 = dtmc(P0,'StateNames',Mdl.StateNames); mdl01 = arima('Constant',1,'Variance',1); mdl02 = arima('Constant',-1,'Variance',1); Mdl0 = msVAR(mc0,[mdl01; mdl02]);
Mdl0 is a fully specified msVAR object. Load and Preprocess Data Load the US GDP data set. load Data_GDP
Data contains quarterly measurements of the US real GDP in the period 1947:Q1–2005:Q2. The estimation period in [1] is 1947:Q2–2004:Q2. For more details on the data set, enter Description at the command line. Transform the data to an annualized rate series: 1
Convert the data to a quarterly rate within the estimation period.
2
Annualize the quarterly rates.
qrate = diff(Data(2:230))./Data(2:229); % Quarterly rate arate = 100*((1 + qrate).^4 - 1); % Annualized rate
Estimate Model Fit the model Mdl to the annualized rate series arate. Specify Mdl0 as the model containing the initial estimable parameter values. EstMdl = estimate(Mdl,Mdl0,arate);
EstMdl is an estimated (fully specified) Markov-switching dynamic regression model. EstMdl.Switch is an estimated discrete-time Markov chain model (dtmc object), and EstMdl.Submodels is a vector of estimated univariate VAR(0) models (varm objects). Display the estimated state-specific dynamic models. EstMdlExp = EstMdl.Submodels(1) EstMdlExp = varm with properties: Description: "1-Dimensional VAR(0) Model" SeriesNames: "Y1"
12-2472
summarize
NumSeries: P: Constant: AR: Trend: Beta: Covariance:
1 0 4.90146 {} 0 [1×0 matrix] 12.087
EstMdlRec = EstMdl.Submodels(2) EstMdlRec = varm with properties: Description: SeriesNames: NumSeries: P: Constant: AR: Trend: Beta: Covariance:
"1-Dimensional VAR(0) Model" "Y1" 1 0 0.0084884 {} 0 [1×0 matrix] 12.6876
Display the estimated state transition matrix. EstP = EstMdl.Switch.P EstP = 2×2 0.9088 0.2303
0.0912 0.7697
Display an estimation summary containing parameter estimates and inferences. summarize(EstMdl) Description 1-Dimensional msVAR Model with 2 Submodels Switch Estimated Transition Matrix: 0.909 0.091 0.230 0.770 Fit Effective Sample Size: 228 Number of Estimated Parameters: 2 Number of Constrained Parameters: 0 LogLikelihood: -639.496 AIC: 1282.992 BIC: 1289.851 Submodels Estimate _________
StandardError _____________
TStatistic __________
PValue ___________
12-2473
12
Functions
State 1 Constant(1) State 2 Constant(1)
4.9015 0.0084884
0.23023 0.2359
21.289 0.035983
1.4301e-100 0.9713
Display Estimation Summary Separately for Each State Create the following fully specified Markov-switching model the DGP. •
• • •
0.5 0.2 0.3 State transition matrix: P = 0 . 2 0 . 6 0 . 2 . 0.2 0.1 0.7 State 1: State 2: State 3:
y1, t y2, t y1, t y2, t y1, t y2, t
=
−1 −0 . 5 0 . 1 y1, t − 1 0 0.5 0 + + ε1, t, where ε1, t ∼ N2 . , −1 0 . 2 −0 . 75 y2, t − 1 0 0 1
=
−1 0 1 0 . + ε2, t, where ε2, t ∼ N2 , 2 0 0 1
=
1 0 . 5 0 . 1 y1, t − 1 0 1 −0 . 1 + + ε3, t, where ε3, t ∼ N2 , . 2 0 . 2 0 . 75 y2, t − 1 0 −0 . 1 2
PDGP = [0.5 0.2 0.3; 0.2 0.6 0.2; 0.2 0.1 0.7]; mcDGP = dtmc(PDGP); constant1 = [-1; -1]; constant2 = [-1; 2]; constant3 = [1; 2]; AR1 = [-0.5 0.1; 0.2 -0.75]; AR3 = [0.5 0.1; 0.2 0.75]; Sigma1 = [0.5 0; 0 1]; Sigma2 = eye(2); Sigma3 = [1 -0.1; -0.1 2]; mdl1DGP = varm(Constant=constant1,AR={AR1},Covariance=Sigma1); mdl2DGP = varm(Constant=constant2,Covariance=Sigma2); mdl3DGP = varm(Constant=constant3,AR={AR3},Covariance=Sigma3); mdlDGP = [mdl1DGP; mdl2DGP; mdl3DGP]; MdlDGP = msVAR(mcDGP,mdlDGP);
Generate a random response path of length 1000 from the DGP. rng(1) % For reproducibiliy Y = simulate(MdlDGP,1000);
Create a partially specified Markov-switching model that has the same structure as the DGP, but the transition matrix, and all submodel coefficients and innovations covariance matrices are unknown and estimable. mc = dtmc(nan(3)); mdlar = varm(2,1); mdlc = varm(2,0); Mdl = msVAR(mc,[mdlar; mdlc; mdlar]);
Initialize the estimation procedure by fully specifying a Markov-switching model that has the same structure as Mdl, but has the following parameter values: 12-2474
summarize
• A randomly drawn transition matrix • Randomly drawn contant vectors for each model • AR self lags of 0.1 and cross lags of 0 • The identify matrix for the innovations covariance P0 = randi(10,3,3); mc0 = dtmc(P0); constant01 = randn(2,1); constant02 = randn(2,1); constant03 = randn(2,1); AR0 = 0.1*eye(2); Sigma0 = eye(2); mdl01 = varm(Constant=constant01,AR={AR0},Covariance=Sigma0); mdl02 = varm(Constant=constant02,Covariance=Sigma0); mdl03 = varm(Constant=constant03,AR={AR0},Covariance=Sigma0); submdl0 = [mdl01; mdl02; mdl03]; Mdl0 = msVAR(mc0,submdl0);
Fit the Markov-switching model to the simulated series. Plot the loglikelihood after each iteration of the EM algorithm. EstMdl = estimate(Mdl,Mdl0,Y,IterationPlot=true);
The plot displays the evolution of the loglikelihood with increasing iterations of the EM algorithm. The procedure terminates when one of the stopping criteria is satisfied. Display an estimation summary of the model. 12-2475
12
Functions
summarize(EstMdl) Description 2-Dimensional msVAR Model with 3 Submodels Switch Estimated Transition Matrix: 0.501 0.245 0.254 0.204 0.549 0.247 0.188 0.102 0.710 Fit Effective Sample Size: 999 Number of Estimated Parameters: 14 Number of Constrained Parameters: 0 LogLikelihood: -3634.005 AIC: 7296.010 BIC: 7364.704 Submodels
State State State State State State State State State State State State State State
1 1 1 1 1 1 2 2 3 3 3 3 3 3
Constant(1) Constant(2) AR{1}(1,1) AR{1}(2,1) AR{1}(1,2) AR{1}(2,2) Constant(1) Constant(2) Constant(1) Constant(2) AR{1}(1,1) AR{1}(2,1) AR{1}(1,2) AR{1}(2,2)
Estimate ________
StandardError _____________
-0.98929 -1.0884 -0.48446 0.1835 0.083953 -0.72972 -0.9082 1.9514 1.1212 1.9561 0.48965 0.22688 0.095847 0.72766
0.023779 0.030164 0.01547 0.019624 0.0070162 0.0089002 0.030103 0.030483 0.044427 0.0593 0.023149 0.030899 0.012005 0.016024
TStatistic __________ -41.603 -36.083 -31.316 9.3509 11.966 -81.989 -30.17 64.016 25.237 32.986 21.152 7.3427 7.9838 45.41
PValue ___________ 0 4.1957e-285 2.8121e-215 8.6868e-21 5.3839e-33 0 5.9064e-200 0 1.5818e-140 1.2831e-238 2.6484e-99 2.0936e-13 1.4188e-15 0
Display an estimation summary separately for each state. summarize(EstMdl,1) Description 2-Dimensional VAR Submodel, State 1 Submodel
State State State State State State
1 1 1 1 1 1
Constant(1) Constant(2) AR{1}(1,1) AR{1}(2,1) AR{1}(1,2) AR{1}(2,2)
summarize(EstMdl,2)
12-2476
Estimate ________
StandardError _____________
-0.98929 -1.0884 -0.48446 0.1835 0.083953 -0.72972
0.023779 0.030164 0.01547 0.019624 0.0070162 0.0089002
TStatistic __________ -41.603 -36.083 -31.316 9.3509 11.966 -81.989
PValue ___________ 0 4.1957e-285 2.8121e-215 8.6868e-21 5.3839e-33 0
summarize
Description 2-Dimensional VAR Submodel, State 2 Submodel Estimate ________ State 2 Constant(1) State 2 Constant(2)
StandardError _____________
-0.9082 1.9514
TStatistic __________
PValue ___________
-30.17 64.016
5.9064e-200 0
TStatistic __________
PValue ___________
25.237 32.986 21.152 7.3427 7.9838 45.41
1.5818e-140 1.2831e-238 2.6484e-99 2.0936e-13 1.4188e-15 0
0.030103 0.030483
summarize(EstMdl,3) Description 2-Dimensional VAR Submodel, State 3 Submodel Estimate ________ State State State State State State
3 3 3 3 3 3
Constant(1) Constant(2) AR{1}(1,1) AR{1}(2,1) AR{1}(1,2) AR{1}(2,2)
StandardError _____________
1.1212 1.9561 0.48965 0.22688 0.095847 0.72766
0.044427 0.0593 0.023149 0.030899 0.012005 0.016024
Return Estimation Summary Table Consider the model for the US GDP growth rate in “Estimate Markov-Switching Dynamic Regression Model” on page 12-2471. Create a Markov-switching dynamic regression model for the naive estimator. P = NaN(2); mc = dtmc(P,'StateNames',["Expansion" "Recession"]); mdl = arima(0,0,0); Mdl = msVAR(mc,[mdl; mdl]);
Create a fully specified Markov-switching dynamic regression model that has the same structure as Mdl, but set all estimable parameters to initial values. P0 = 0.5*ones(2); mc0 = dtmc(P0,'StateNames',Mdl.StateNames); mdl01 = arima('Constant',1,'Variance',1); mdl02 = arima('Constant',-1,'Variance',1); Mdl0 = msVAR(mc0,[mdl01; mdl02]);
Load the US GDP data set. Preprocess the data. load Data_GDP qrate = diff(Data(2:230))./Data(2:229); % Quarterly rate arate = 100*((1 + qrate).^4 - 1); % Annualized rate
Fit the model Mdl to the annualized rate series arate. Specify Mdl0 as the model containing the initial estimable parameter values. 12-2477
12
Functions
EstMdl = estimate(Mdl,Mdl0,arate);
Return an estimation summary table. results = summarize(EstMdl) results=2×4 table
State 1 Constant(1) State 2 Constant(1)
Estimate _________
StandardError _____________
TStatistic __________
PValue ___________
4.9015 0.0084884
0.23023 0.2359
21.289 0.035983
1.4301e-100 0.9713
results is a table containing estimates and inferences for all submodel coefficients. Identify significant coefficient estimates. results.Properties.RowNames(results.PValue < 0.05) ans = 1x1 cell array {'State 1 Constant(1)'}
Input Arguments Mdl — Markov-switching dynamic regression model msVAR object Markov-switching dynamic regression model, specified as an msVAR object returned by estimate or msVAR. state — State to summarize integer in 1:Mdl.NumStates (default) | state name in Mdl.StateNames State to summarize, specified as an integer in 1:Mdl.NumStates or a state name in Mdl.StateNames. The default summarizes all states. Example: summarize(Mdl,3) summarizes the third state in Mdl. Example: summarize(Mdl,"Recession") summarizes the state labeled "Recession" in Mdl. Data Types: double | char | string
Output Arguments results — Model summary table | msVAR object Model summary, returned as a table or an msVAR object. • If Mdl is an estimated Markov-switching model returned by estimate, results is a table of summary information for the submodel parameter estimates. Each row corresponds to a submodel 12-2478
summarize
coefficient. Columns correspond to the estimate (Estimate), standard error (StandardError), tstatistic (TStatistic), and the p-value (PValue). When the summary includes all states (the default), results.Properties stores the following fit statistics: Field
Description
Description
Model summary description (character vector)
EffectiveSampleSize
Effective sample size (numeric scalar)
NumEstimatedParameters
Number of estimated parameters (numeric scalar)
NumConstraints
Number of equality constraints (numeric scalar)
LogLikelihood
Optimized loglikelihood value (numeric scalar)
AIC
Akaike information criterion (numeric scalar)
BIC
Bayesian information criterion (numeric scalar)
• If Mdl is an unestimated model, results is an msVAR object that is equal to Mdl. Note When results is a table, it contains only submodel parameter estimates: • Mdl.Switch contains estimated transition probabilities. • Mdl.Submodels(j).Covariance contains the estimated residual covariance matrix of state j. For details, see msVAR.
Algorithms estimate implements a version of Hamilton's Expectation-Maximization (EM) algorithm, as described in [3]. The standard errors, loglikelihood, and information criteria are conditional on optimal parameter values in the estimated transition matrix Mdl.Switch. In particular, standard errors do not account for variation in estimated transition probabilities.
Version History Introduced in R2021b
References [1] Chauvet, M., and J. D. Hamilton. "Dating Business Cycle Turning Points." In Nonlinear Analysis of Business Cycles (Contributions to Economic Analysis, Volume 276). (C. Milas, P. Rothman, and D. van Dijk, eds.). Amsterdam: Emerald Group Publishing Limited, 2006. [2] Hamilton, J. D. "Analysis of Time Series Subject to Changes in Regime." Journal of Econometrics. Vol. 45, 1990, pp. 39–70. [3] Hamilton, James D. Time Series Analysis. Princeton, NJ: Princeton University Press, 1994. 12-2479
12
Functions
[4] Hamilton, J. D. "Macroeconomic Regimes and Regime Shifts." In Handbook of Macroeconomics. (H. Uhlig and J. Taylor, eds.). Amsterdam: Elsevier, 2016.
See Also Objects msVAR | dtmc Functions estimate
12-2480
summarize
summarize Display estimation results of regression model with ARIMA errors
Syntax summarize(Mdl) results = summarize(Mdl)
Description summarize(Mdl) displays a summary of the regression model with ARIMA errors Mdl. • If Mdl is an estimated model returned by estimate, summarize prints estimation results to the MATLAB Command Window. The display includes an estimation summary and a table of parameter estimates with corresponding standard errors, t statistics, and p-values. The estimation summary includes fit statistics, such as the Akaike Information Criterion (AIC), and the estimated innovations variance. • If Mdl is an unestimated model returned by regARIMA, summarize prints the standard object display. results = summarize(Mdl) returns one of the following variables and does not print to the Command Window. • If Mdl is an estimated model, results is a structure containing estimation results. • If Mdl is an unestimated model, results is a regARIMA model object equal to Mdl.
Examples Display Estimation Results Regress the US gross domestic product (GDP) onto the US consumer price index (CPI) using a regression model with ARMA(1,1) errors, and summarize the results. Load the US Macroeconomic data set and preprocess the data. load Data_USEconModel; logGDP = log(DataTimeTable.GDP); dlogGDP = diff(logGDP); dCPI = diff(DataTimeTable.CPIAUCSL);
Fit the model to the data. Mdl = regARIMA(1,0,1); EstMdl = estimate(Mdl,dlogGDP,X=dCPI,Display="off");
Display the estimates. summarize(EstMdl) ARMA(1,1) Error Model (Gaussian Distribution)
12-2481
12
Functions
Effective Sample Size: 248 Number of Estimated Parameters: 5 LogLikelihood: 798.406 AIC: -1586.81 BIC: -1569.24 Value __________ Intercept AR{1} MA{1} Beta(1) Variance
0.014776 0.60527 -0.16165 0.002044 9.3578e-05
StandardError _____________ 0.0014627 0.08929 0.10956 0.00070616 6.0314e-06
TStatistic __________ 10.102 6.7787 -1.4755 2.8946 15.515
PValue __________ 5.4238e-24 1.2124e-11 0.14009 0.0037969 2.7338e-54
Extract Estimation Results from Estimated Model Estimate several models by passing the data to estimate. Vary the autoregressive and moving average degrees p and q, respectively. Estimation results contain the AIC, which you can extract and then compare among the models. Simulate response and predictor data for the regression model with ARMA errors: −2 + ut 1.5 ut = 0 . 75ut − 1 − 0 . 5ut − 2 + εt + 0 . 7εt − 1, yt = 2 + Xt
where εt is Gaussian with mean 0 and variance 1. Mdl0 = regARIMA(Intercept=2,Beta=[-2; 1.5],AR={0.75, -0.5}, ... MA=0.7,Variance=1); rng(2,"twister"); % For reproducibility Pred = randn(1000,2); % Predictors y = simulate(Mdl0,1000,X=Pred);
To determine the number of AR and MA lags, create and estimate multiple regression models with ARMA(p, q) errors. Vary p = 1,..,3 and q = 1,...,3 among the models. Suppress all estimation displays. Extract the AIC from the estimation results structure. The field AIC stores the AIC. pMax = 3; qMax = 3; AIC = zeros(pMax,qMax); % Preallocation for p = 1:pMax for q = 1:qMax Mdl = regARIMA(p,0,q); EstMdl = estimate(Mdl,y,X=Pred,Display="off"); results = summarize(EstMdl); AIC(p,q) = results.AIC;
12-2482
summarize
end end
Compare the AIC values among the models. minAIC = min(min(AIC)) minAIC = 2.9280e+03 [bestP,bestQ] = find(AIC == minAIC) bestP = 2 bestQ = 1
The best fitting model is the regression model with AR(2,1) errors because its corresponding AIC is the lowest. This model also has the structure of the model used to simulate the data.
Input Arguments Mdl — Regression model with ARIMA errors regARIMA model object Regression model with ARIMA errors, specified as a regARIMA model object returned by regARIMA or estimate.
Output Arguments results — Model summary structure array | regARIMA model object Model summary, returned as a structure array or a regARIMA model object. • If Mdl is an estimated model, results is a structure array containing the fields in this table. Field
Description
Description
Model summary description (string)
SampleSize
Effective sample size (numeric scalar)
NumEstimatedParameters
Number of estimated parameters (numeric scalar)
LogLikelihood
Optimized loglikelihood value (numeric scalar)
AIC
Akaike Information Criterion (numeric scalar)
BIC
Bayesian Information Criterion (numeric scalar)
Table
Maximum likelihood estimates of the model parameters with corresponding standard errors, t statistics (estimate divided by standard error), and p-values (assuming normality); a table with rows corresponding to model parameters
12-2483
12
Functions
• If Mdl is an unestimated model, results is a regARIMA model object equal to Mdl.
Version History Introduced in R2013b
See Also Objects regARIMA Functions estimate
12-2484
summarize
summarize Distribution summary statistics of standard Bayesian linear regression model
Syntax summarize(Mdl) SummaryStatistics = summarize(Mdl)
Description To obtain a summary of a Bayesian linear regression model for predictor selection, see summarize. summarize(Mdl) displays a tabular summary of the random regression coefficients and disturbance variance of the standard Bayesian linear regression model on page 12-2488 Mdl at the command line. For each parameter, the summary includes the: • Standard deviation (square root of the variance) • 95% equitailed credible intervals • Probability that the parameter is greater than 0 • Description of the distributions, if known SummaryStatistics = summarize(Mdl) returns a structure array that stores a: • Table containing the summary of the regression coefficients and disturbance variance • Table containing the covariances between variables • Description of the joint distribution of the parameters
Examples Summarize Posterior Distribution Consider the multiple linear regression model that predicts the US real gross national product (GNPR) using a linear combination of industrial production index (IPI), total employment (E), and real wages (WR). GNPRt = β0 + β1IPIt + β2Et + β3WRt + εt . For all t time points, εt is a series of independent Gaussian disturbances with a mean of 0 and variance σ2. Assume these prior distributions: •
β | σ2 ∼ N4 M, σ2V . M is a 4-by-1 vector of means, and V is a scaled 4-by-4 positive definite covariance matrix.
• σ2 ∼ IG(A, B). A and B are the shape and scale, respectively, of an inverse gamma distribution. 12-2485
12
Functions
These assumptions and the data likelihood imply a normal-inverse-gamma conjugate model. Create a normal-inverse-gamma conjugate prior model for the linear regression parameters. Specify the number of predictors p and the variable names. p = 3; VarNames = ["IPI" "E" "WR"]; PriorMdl = bayeslm(p,'ModelType','conjugate','VarNames',VarNames);
PriorMdl is a conjugateblm Bayesian linear regression model object representing the prior distribution of the regression coefficients and disturbance variance. Summarize the prior distribution. summarize(PriorMdl) | Mean Std CI95 Positive Distribution ----------------------------------------------------------------------------------Intercept | 0 70.7107 [-141.273, 141.273] 0.500 t (0.00, 57.74^2, 6) IPI | 0 70.7107 [-141.273, 141.273] 0.500 t (0.00, 57.74^2, 6) E | 0 70.7107 [-141.273, 141.273] 0.500 t (0.00, 57.74^2, 6) WR | 0 70.7107 [-141.273, 141.273] 0.500 t (0.00, 57.74^2, 6) Sigma2 | 0.5000 0.5000 [ 0.138, 1.616] 1.000 IG(3.00, 1)
The function displays a table of summary statistics and other information about the prior distribution at the command line. Load the Nelson-Plosser data set and create variables for the predictor and response data. load Data_NelsonPlosser X = DataTable{:,PriorMdl.VarNames(2:end)}; y = DataTable.GNPR;
Estimate the posterior distributions. Suppress the estimation display. PosteriorMdl = estimate(PriorMdl,X,y,'Display',false);
PosteriorMdl is a conjugateblm model object that contains the posterior distributions of β and σ2. Obtain summary statistics from the posterior distribution. summary = summarize(PosteriorMdl);
summary is a structure array containing three fields: MarginalDistributions, Covariances, and JointDistribution. Display the marginal distribution summary and covariances by using dot notation. summary.MarginalDistributions ans=5×5 table
Intercept
12-2486
Mean _________
Std __________
-24.249
8.7821
CI95 ________________________ -41.514
-6.9847
Positive _________
Distr ____________
0.0032977
{'t (-24.25,
summarize
IPI E WR Sigma2
4.3913 0.0011202 2.4683 44.135
0.1414 0.00032931 0.34895 7.802
4.1134 0.00047284 1.7822 31.427
4.6693 0.0017676 3.1543 61.855
1 0.99952 1 1
{'t (4.39, 0 {'t (0.00, 0 {'t (2.47, 0 {'IG(34.00,
summary.Covariances ans=5×5 table
Intercept IPI E WR Sigma2
Intercept __________
IPI ___________
E ___________
WR ___________
Sigma2 ______
77.125 0.77133 -0.0023655 0.5311 0
0.77133 0.019994 -6.5001e-06 -0.02948 0
-0.0023655 -6.5001e-06 1.0844e-07 -8.0013e-05 0
0.5311 -0.02948 -8.0013e-05 0.12177 0
0 0 0 0 60.871
The MarginalDistributions field is a table of summary statistics and other information about the posterior distribution. Covariances is a table containing the covariance matrix of the parameters.
Input Arguments Mdl — Standard Bayesian linear regression model conjugateblm model object | semiconjugateblm model object | diffuseblm model object | empiricalblm model object | customblm model object Standard Bayesian linear regression model, specified as a model object in this table. Model Object
Description
conjugateblm
Dependent, normal-inverse-gamma conjugate prior or posterior model returned by bayeslm or estimate
semiconjugatebl Independent, normal-inverse-gamma semiconjugate prior model returned by m bayeslm diffuseblm
Diffuse prior model returned by bayeslm
empiricalblm
Prior or posterior model characterized by random draws from respective distributions, returned by bayeslm or estimate
customblm
Prior distribution function that you declare returned by bayeslm
Output Arguments SummaryStatistics — Parameter distribution summary structure array Parameter distribution summary, returned as a structure array containing the information in this table.
12-2487
12
Functions
Structure Field
Description
MarginalDistribu Table containing a summary of the parameter distributions. Rows correspond tions to parameters. Columns correspond to the: • Estimated posterior mean (Mean) • Standard deviation (Std) • 95% equitailed credible interval (CI95) • Posterior probability that the parameter is greater than 0 (Positive) • Description of the marginal or conditional posterior distribution of the parameter (Distribution) Row names are the names in Mdl.VarNames, and the name of the last row is Sigma2. Covariances
Table containing covariances between parameters. Rows and columns correspond to the intercept (if one exists) the regression coefficients, and disturbance variance. Row and column names are the same as the row names in MarginalDistributions.
JointDistributio A string scalar that describes the distributions of the regression coefficients n (Beta) and the disturbance variance (Sigma2) when known. For distribution descriptions: • N(Mu,V) denotes the normal distribution with mean Mu and variance matrix V. This distribution can be multivariate. • IG(A,B) denotes the inverse gamma distribution with shape A and scale B. • t(Mu,V,DoF) denotes the Student’s t distribution with mean Mu, variance V, and degrees of freedom DoF.
More About Bayesian Linear Regression Model A Bayesian linear regression model treats the parameters β and σ2 in the multiple linear regression (MLR) model yt = xtβ + εt as random variables. For times t = 1,...,T: • yt is the observed response. • xt is a 1-by-(p + 1) row vector of observed values of p predictors. To accommodate a model intercept, x1t = 1 for all t. • β is a (p + 1)-by-1 column vector of regression coefficients corresponding to the variables that compose the columns of xt. • εt is the random disturbance with a mean of zero and Cov(ε) = σ2IT×T, while ε is a T-by-1 vector containing all disturbances. These assumptions imply that the data likelihood is ℓ β, σ2 y, x =
T
∏
t=1
ϕ yt; xt β, σ2 .
ϕ(yt;xtβ,σ2) is the Gaussian probability density with mean xtβ and variance σ2 evaluated at yt;. 12-2488
summarize
Before considering the data, you impose a joint prior distribution assumption on (β,σ2). In a Bayesian analysis, you update the distribution of the parameters by using information about the parameters obtained from the likelihood of the data. The result is the joint posterior distribution of (β,σ2) or the conditional posterior distributions of the parameters.
Version History Introduced in R2017a
See Also Objects conjugateblm | semiconjugateblm | diffuseblm | empiricalblm | customblm Functions estimate | forecast Topics “Bayesian Linear Regression” on page 6-2 “Implement Bayesian Linear Regression” on page 6-10
12-2489
12
Functions
summarize Distribution summary statistics of Bayesian linear regression model for predictor variable selection
Syntax summarize(Mdl) SummaryStatistics = summarize(Mdl)
Description To obtain a summary of a standard Bayesian linear regression model, see summarize. summarize(Mdl) displays a tabular summary of the random regression coefficients and disturbance variance of the Bayesian linear regression model on page 12-2493 Mdl at the command line. For each parameter, the summary includes the: • Standard deviation (square root of the variance) • 95% equitailed credible intervals • Probability that the parameter is greater than 0 • Description of the distributions, if known • Marginal probability that a coefficient should be included in the model, for stochastic search variable selection (SSVS) predictor-variable-selection models SummaryStatistics = summarize(Mdl) returns a structure array with a table summarizing the regression coefficients and disturbance variance, and a description of the joint distribution of the parameters.
Examples Summarize Prior and Posterior Distributions Consider the multiple linear regression model that predicts the US real gross national product (GNPR) using a linear combination of industrial production index (IPI), total employment (E), and real wages (WR). GNPRt = β0 + β1IPIt + β2Et + β3WRt + εt . For all t, εt is a series of independent Gaussian disturbances with a mean of 0 and variance σ2. Assume these prior distributions for k = 0,...,3: •
12-2490
βk | σ2, γk = γkσ V k1Z1 + (1 − γk)σ V k2Z2, where Z1 and Z2 are independent, standard normal random variables. Therefore, the coefficients have a Gaussian mixture distribution. Assume all coefficients are conditionally independent, a priori, but they are dependent on the disturbance variance.
summarize
• σ2 ∼ IG(A, B). A and B are the shape and scale, respectively, of an inverse gamma distribution. • γk ∈ 0, 1 and it represents the random variable-inclusion regime variable with a discrete uniform distribution. Create a prior model for SSVS. Specify the number of predictors p. p = 3; VarNames = ["IPI" "E" "WR"]; PriorMdl = bayeslm(p,'ModelType','mixconjugateblm','VarNames',VarNames);
PriorMdl is a mixconjugateblm Bayesian linear regression model object for SSVS predictor selection representing the prior distribution of the regression coefficients and disturbance variance. Summarize the prior distribution. summarize(PriorMdl) | Mean Std CI95 Positive Distribution -----------------------------------------------------------------------------Intercept | 0 1.5890 [-3.547, 3.547] 0.500 Mixture distribution IPI | 0 1.5890 [-3.547, 3.547] 0.500 Mixture distribution E | 0 1.5890 [-3.547, 3.547] 0.500 Mixture distribution WR | 0 1.5890 [-3.547, 3.547] 0.500 Mixture distribution Sigma2 | 0.5000 0.5000 [ 0.138, 1.616] 1.000 IG(3.00, 1)
The function displays a table of summary statistics and other information about the prior distribution at the command line. Load the Nelson-Plosser data set, and create variables for the predictor and response data. load Data_NelsonPlosser X = DataTable{:,PriorMdl.VarNames(2:end)}; y = DataTable.GNPR;
Estimate the posterior distributions. Suppress the estimation display. PosteriorMdl = estimate(PriorMdl,X,y,'Display',false);
PosteriorMdl is an empiricalblm model object that contains the posterior distributions of β and σ2. Obtain summary statistics from the posterior distribution. summary = summarize(PosteriorMdl);
summary is a structure array containing two fields: MarginalDistributions and JointDistribution. Display the marginal distribution summary by using dot notation. summary.MarginalDistributions ans=5×5 table Mean __________
Std _________
CI95 ________________________
Positive ________
Distribution _____________
12-2491
12
Functions
Intercept IPI E WR Sigma2
-18.66 4.4555 0.00096765 2.4739 47.773
10.348 0.15287 0.0003759 0.36337 8.6863
-37.006 4.1561 0.00021479 1.7607 33.574
0.8406 4.7561 0.0016644 3.1882 67.585
0.0412 1 0.9968 1 1
{'Empirical'} {'Empirical'} {'Empirical'} {'Empirical'} {'Empirical'}
The MarginalDistributions field is a table of summary statistics and other information about the posterior distribution.
Input Arguments Mdl — Bayesian linear regression model for predictor variable selection mixconjugateblm model object | mixsemiconjugateblm model object | lassoblm model object Bayesian linear regression model for predictor variable selection, specified as a model object in this table. Model Object
Description
mixconjugateblm
Dependent, Gaussian-mixture-inverse-gamma conjugate model for SSVS predictor variable selection, returned by bayeslm
mixsemiconjugateblm
Independent, Gaussian-mixture-inverse-gamma semiconjugate model for SSVS predictor variable selection, returned by bayeslm
lassoblm
Bayesian lasso regression model returned by bayeslm
Output Arguments SummaryStatistics — Parameter distribution summary structure array Parameter distribution summary, returned as a structure array containing the information in this table.
12-2492
summarize
Structure Field
Description
MarginalDistribu Table containing a summary of the parameter distributions. Rows correspond tions to parameters. Columns correspond to the: • Estimated posterior mean (Mean) • Standard deviation (Std) • 95% equitailed credible interval (CI95) • Posterior probability that the parameter is greater than 0 (Positive) • Description of the marginal or conditional posterior distribution of the parameter (Distribution) Row names are the names in Mdl.VarNames. The name of the last row is Sigma2. JointDistributio A string scalar that describes the distributions of the regression coefficients n (Beta) and the disturbance variance (Sigma2) when known. For distribution descriptions: • N(Mu,V) denotes the normal distribution with mean Mu and variance matrix V. This distribution can be multivariate. • IG(A,B) denotes the inverse gamma distribution with shape A and scale B. • Mixture distribution denotes a Student’s t mixture distribution. Note If Mdl is a lassoblm model and Mdl.Probability is a function handle representing the regime probability distribution, then summarize cannot estimate prior distribution statistics for the coefficients. Therefore, entries corresponding to coefficient statistics are NaN values.
More About Bayesian Linear Regression Model A Bayesian linear regression model treats the parameters β and σ2 in the multiple linear regression (MLR) model yt = xtβ + εt as random variables. For times t = 1,...,T: • yt is the observed response. • xt is a 1-by-(p + 1) row vector of observed values of p predictors. To accommodate a model intercept, x1t = 1 for all t. • β is a (p + 1)-by-1 column vector of regression coefficients corresponding to the variables that compose the columns of xt. • εt is the random disturbance with a mean of zero and Cov(ε) = σ2IT×T, while ε is a T-by-1 vector containing all disturbances. These assumptions imply that the data likelihood is ℓ β, σ2 y, x =
T
∏
t=1
ϕ yt; xt β, σ2 .
ϕ(yt;xtβ,σ2) is the Gaussian probability density with mean xtβ and variance σ2 evaluated at yt;. 12-2493
12
Functions
Before considering the data, you impose a joint prior distribution assumption on (β,σ2). In a Bayesian analysis, you update the distribution of the parameters by using information about the parameters obtained from the likelihood of the data. The result is the joint posterior distribution of (β,σ2) or the conditional posterior distributions of the parameters.
Algorithms • If Mdl is a lassoblm model object and Mdl.Probability is a numeric vector, then the 95% credible intervals on the regression coefficients are Mean + [–2 2]*Std, where Mean and Std are variables in the summary table. • If Mdl is a mixconjugateblm or mixsemiconjugateblm model object, then the 95% credible intervals on the regression coefficients are estimated from the mixture cdf. If the estimation fails, then summarize returns NaN values instead.
Version History Introduced in R2018b
See Also Objects mixconjugateblm | mixsemiconjugateblm | lassoblm Functions estimate Topics “Bayesian Linear Regression” on page 6-2 “Implement Bayesian Linear Regression” on page 6-10
12-2494
summarize
summarize Summarize threshold-switching dynamic regression model estimation results
Syntax summarize(Mdl) summarize(Mdl,state) results = summarize( ___ )
Description summarize(Mdl) displays a summary of the threshold-switching dynamic regression model Mdl. • If Mdl is an estimated model returned by estimate, then summarize displays estimation results to the MATLAB Command Window. The display includes: • A model description • Estimated threshold transitions • Fit statistics, which include the effective sample size, number of estimated submodel parameters and constraints, loglikelihood, and information criteria (AIC and BIC) • A table of submodel estimates and inferences, which includes coefficient estimates with standard errors, t-statistics, and p-values • If Mdl is an unestimated threshold-switching model returned by tsVAR, summarize prints the standard object display (the same display that tsVAR prints during model creation). summarize(Mdl,state) displays only summary information for the submodel with name state. results = summarize( ___ ) returns one of the following variables using any of the input argument combinations in the previous syntaxes. • If Mdl is an estimated threshold-switching model, results is a table containing the submodel estimates and inferences. • If Mdl is an unestimated model, results is a tsVAR object that is equal to Mdl. summarize does not print to the Command Window
Examples Fit SETAR Model to Simulated Data Assess estimation accuracy using simulated data from a known data-generating process (DGP). This example uses arbitrary parameter values. Create Model for DGP Create a discrete threshold transition at mid-level 1. ttDGP = threshold(1)
12-2495
12
Functions
ttDGP = threshold with properties: Type: Levels: Rates: StateNames: NumStates:
'discrete' 1 [] ["1" "2"] 2
ttDGP is a threshold object representing the state-switching mechanism of the DGP. Create the following fully specified self-exciting TAR (SETAR) model for the DGP. • State 1: yt = εt. • State 2: yt = 2 + εt. • εt ∼ Ν 0, 1 . Specify the submodels by using arima. mdl1DGP = arima(Constant=0); mdl2DGP = arima(Constant=2); mdlDGP = [mdl1DGP mdl2DGP];
Because the innovations distribution is invariant across states, the tsVAR software ignores the value of the submodel innovations variance (Variance property). Create a threshold-switching model for the DGP. Specify the model-wide innovations variance. MdlDGP = tsVAR(ttDGP,mdlDGP,Covariance=1);
MdlDGP is a tsVAR object representing the DGP. Simulate Response Paths from DGP Generate a random response path of length 100 from the DGP. By default, simulate assumes a SETAR model with delay d = 1. In other words, the threshold variable is yt − 1. rng(1) % For reproducibiliy y = simulate(MdlDGP,100);
y is a 100-by-1 vector of representing the simulated response path. Create Model for Estimation Create a partially specified threshold-switching model that has the same structure as the datagenerating process, but specify the transition mid-level, submodel coefficients, and model-wide constant as unknown for estimation. tt = threshold(NaN); mdl1 = arima('Constant',NaN); mdl2 = arima('Constant',NaN); Mdl = tsVAR(tt,[mdl1,mdl2],'Covariance',NaN);
Mdl is a partially specified tsVAR object representing a template for estimation. NaN-valued elements of the Switch and Submodels properties indicate estimable parameters. 12-2496
summarize
Mdl is agnostic of the threshold variable; tsVAR object functions enable you to specify threshold variable characteristics or data. Create Threshold Transitions Containing Initial Values The estimation procedure requires initial values for all estimable threshold transition parameters. Fully specify a threshold transition that has the same structure as tt, but set the mid-level to 0. tt0 = threshold(0);
tt0 is a fully specified threshold object. Estimate Model Fit the model to the simulated path. By default, the model is self-exciting and the delay of the threshold variable is d = 1. EstMdl = estimate(Mdl,tt0,y) EstMdl = tsVAR with properties: Switch: Submodels: NumStates: NumSeries: StateNames: SeriesNames: Covariance:
[1x1 threshold] [2x1 varm] 2 1 ["1" "2"] "1" 1.0225
EstMdl is a fully specified tsVAR object representing the estimated SETAR model. Display an estimation summary of the submodels. summarize(EstMdl) Description 1-Dimensional tsVAR Model with 2 Submodels Switch Transition Type: discrete Estimated Levels: 1.128 Fit Effective Sample Size: 99 Number of Estimated Parameters: 2 Number of Constrained Parameters: 0 LogLikelihood: -141.574 AIC: 287.149 BIC: 292.339 Submodels
State 1 Constant(1) State 2 Constant(1)
Estimate ________
StandardError _____________
TStatistic __________
PValue __________
-0.12774 2.1774
0.13241 0.16829
-0.96474 12.939
0.33467 2.7264e-38
12-2497
12
Functions
The estimates are close to their true values. Plot the estimated switching mechanism with the threshold data, which is the response data. figure ttplot(EstMdl.Switch,'Data',y)
Display Estimation Summary for One State Create the following fully specified SETAR model for the DGP. • • • •
State 1: State 2: State 3: ε1, t ε2, t
y1, t y2, t y1, t y2, t y1, t y2, t
∼ N2
=
ε1, t −1 −0 . 5 0 . 1 y1, t − 1 + + . −4 0 . 2 −0 . 75 y2, t − 1 ε2, t
=
ε1, t 1 . + 4 ε2, t
=
ε1, t 1 0 . 5 0 . 1 y1, t − 1 + + . 4 0 . 2 0 . 75 y2, t − 1 ε2, t
0 2 −1 , . 0 −1 1
• The system is in state 1 when y2, t − 4 < − 3, the system is in state 2 when −3 ≤ y2, t − 4 < 3, and the system is in state 3 otherwise. 12-2498
summarize
t = [-3 3]; ttDGP = threshold(t); constant1 = [-1; -4]; constant2 = [1; 4]; constant3 = [1; 4]; AR1 = [-0.5 0.1; 0.2 -0.75]; AR3 = [0.5 0.1; 0.2 0.75]; Sigma = [2 -1; -1 1]; mdl1DGP = varm(Constant=constant1,AR={AR1}); mdl2DGP = varm(Constant=constant2); mdl3DGP = varm(Constant=constant3,AR={AR3}); mdlDGP = [mdl1DGP; mdl2DGP; mdl3DGP]; MdlDGP = tsVAR(ttDGP,mdlDGP,Covariance=Sigma);
DIsplay a summary of the unestimated DGP. summarize(MdlDGP) Mdl = tsVAR with properties: Switch: Submodels: NumStates: NumSeries: StateNames: SeriesNames: Covariance:
[1x1 threshold] [3x1 varm] 3 2 ["1" "2" "3"] ["1" "2"] [2x2 double]
summarize prints an object display. Generate a random response path of length 500 from the DGP. Specify that second response variable with a delay of 4 as the threshold variable. rng(10) % For reproducibiliy y = simulate(MdlDGP,500,Index=2,Delay=4);
Create a partially specified threshold-switching model that has the same structure as the DGP, but specify the transition mid-level, submodel coefficients, and model-wide covariance as unknown for estimation. tt = threshold([NaN; NaN]); mdlar = varm(2,1); mdlc = varm(2,0); Mdl = tsVAR(tt,[mdlar; mdlc; mdlar],Covariance=nan(2));
Fully specify a threshold transition that has the same structure as tt, but set the mid-levels to -1 and 1. t0 = [-1 1]; tt0 = threshold(t0);
Fit the threshold-switching model to the simulated series. Specify the threshold variable y2, t − 4. Plot the loglikelihood after each iteration of the threshold search algorithm. EstMdl = estimate(Mdl,tt0,y,IterationPlot=true,Index=2,Delay=4);
12-2499
12
Functions
The plot displays the evolution of the loglikelihood as the estimation procedure searches for optimal levels. The procedure terminates when one of the stopping criteria is satisfied. Display an estimation summary for state 3 only. summarize(EstMdl,3) Description 2-Dimensional VAR Submodel, State 3 Submodel Estimate ________ State State State State State State
3 3 3 3 3 3
Constant(1) Constant(2) AR{1}(1,1) AR{1}(2,1) AR{1}(1,2) AR{1}(2,2)
1.0621 3.8707 0.47396 0.23013 0.10561 0.7568
StandardError _____________ 0.095701 0.068772 0.058016 0.041691 0.018233 0.013102
Return Estimation Summary Table Create a discrete threshold transition at mid-level 1. 12-2500
TStatistic __________
PValue __________
11.098 56.284 8.1694 5.5199 5.7924 57.761
1.2802e-28 0 3.0997e-16 3.3927e-08 6.9371e-09 0
summarize
ttDGP = threshold(1) ttDGP = threshold with properties: Type: Levels: Rates: StateNames: NumStates:
'discrete' 1 [] ["1" "2"] 2
Create the following fully specified self-exciting TAR (SETAR) model for the DGP. • State 1: yt = εt. • State 2: yt = 2 + εt . • εt ∼ Ν 0, 1 . Specify the submodels by using arima. mdl1DGP = arima(Constant=0); mdl2DGP = arima(Constant=2); mdlDGP = [mdl1DGP mdl2DGP];
Create a threshold-switching model for the DGP. Specify the model-wide innovations variance. MdlDGP = tsVAR(ttDGP,mdlDGP,Covariance=1);
Generate a random response path of length 100 from the DGP. By default, simulate assumes a SETAR model with delay d = 1. In other words, the threshold variable is yt − 1. rng(1) % For reproducibiliy y = simulate(MdlDGP,100);
Create a partially specified threshold-switching model that has the same structure as the datagenerating process, but specify the transition mid-level, submodel coefficients, and model-wide constant as unknown for estimation. tt = threshold(NaN); mdl1 = arima('Constant',NaN); mdl2 = arima('Constant',NaN); Mdl = tsVAR(tt,[mdl1,mdl2],'Covariance',NaN);
Fully specify a threshold transition that has the same structure as tt, but set the mid-level to 0. tt0 = threshold(0);
Fit the model to the simulated path. By default, the model is self-exciting and the delay of the threshold variable is d = 1. EstMdl = estimate(Mdl,tt0,y);
Return an estimation summary table. results = summarize(EstMdl) results=2×4 table Estimate
StandardError
TStatistic
PValue
12-2501
12
Functions
State 1 Constant(1) State 2 Constant(1)
________
_____________
__________
__________
-0.12774 2.1774
0.13241 0.16829
-0.96474 12.939
0.33467 2.7264e-38
results is a table containing estimates and inferences for all submodel coefficients. Identify significant coefficient estimates. results.Properties.RowNames(results.PValue < 0.05) ans = 1x1 cell array {'State 2 Constant(1)'}
Input Arguments Mdl — Threshold-switching dynamic regression model tsVAR object Threshold-switching dynamic regression model, specified as a tsVAR object returned by estimate or tsVAR. state — State to summarize integer in 1:Mdl.NumStates (default) | state name in Mdl.StateNames State to summarize, specified as an integer in 1:Mdl.NumStates or a state name in Mdl.StateNames. The default summarizes all states. Example: summarize(Mdl,3) summarizes the third state in Mdl. Example: summarize(Mdl,"Recession") summarizes the state labeled "Recession" in Mdl. Data Types: double | char | string
Output Arguments results — Model summary table | tsVAR object Model summary, returned as a table or tsVAR object. • If Mdl is an estimated threshold-switching model returned by estimate, results is a table of summary information for the submodel parameter estimates. Each row corresponds to a submodel coefficient. Columns correspond to the estimate (Estimate), standard error (StandardError), tstatistic (TStatistic), and the p-value (PValue). When the summary includes all states (the default), results.Properties stores the following fit statistics:
12-2502
summarize
Field
Description
Description
Model summary description (character vector)
EffectiveSampleSize
Effective sample size (numeric scalar)
NumEstimatedParameters
Number of estimated parameters (numeric scalar)
NumConstraints
Number of equality constraints (numeric scalar)
LogLikelihood
Optimized loglikelihood value (numeric scalar)
AIC
Akaike information criterion (numeric scalar)
BIC
Bayesian information criterion (numeric scalar)
• If Mdl is an unestimated model, results is a tsVAR object that is equal to Mdl. Note When results is a table, it contains only submodel parameter estimates: • Mdl.Switch contains estimates of threshold transitions. • Threshold-switching models can have one or more residual covariance matrices. When Mdl has a model-wide covariance, Mdl.Covariance contains the estimated residual covariance. Otherwise, Mdl.Submodels(j).Covariance contains the estimated residual covariance of state j. For details, see tsVAR.
Algorithms estimate searches over levels and rates for estimated threshold transitions while solving a conditional least-squares problem for submodel parameters, as described in [2]. The standard errors, loglikelihood, and information criteria are conditional on optimal parameter values in the estimated threshold transitions Mdl.Switch. In particular, standard errors do not account for variation in estimated levels and rates.
Version History Introduced in R2021b
References [1] Teräsvirta, Tima. "Modelling Economic Relationships with Smooth Transition Regressions." In A. Ullahand and D.E.A. Giles (eds.), Handbook of Applied Economic Statistics, 507–552. New York: Marcel Dekker, 1998. [2] van Dijk, Dick. Smooth Transition Models: Extensions and Outlier Robust Inference. Rotterdam, Netherlands: Tinbergen Institute Research Series, 1999.
See Also Objects tsVAR | threshold 12-2503
12
Functions
Functions estimate Topics “Estimate Threshold-Switching Dynamic Regression Models” on page 10-94
12-2504
summarize
summarize Display estimation results of vector autoregression (VAR) model
Syntax summarize(Mdl) results = summarize(Mdl)
Description summarize(Mdl) displays a summary of the VAR(p) model Mdl. • If Mdl is an estimated VAR model returned by estimate, then summarize prints estimation results to the MATLAB Command Window. The display includes a table of parameter estimates with corresponding standard errors, t statistics, and p-values. The summary also includes the loglikelihood, Akaike Information Criterion (AIC), and Bayesian Information Criterion (BIC) model fit statistics, as well as the estimated innovations covariance and correlation matrices. • If Mdl is an unestimated VAR model returned by varm, then summarize prints the standard object display (the same display that varm prints during model creation). results = summarize(Mdl) returns one of the following variables and does not print to the Command Window. • If Mdl is an estimated VAR model, then results is a structure containing estimation results. • If Mdl is an unestimated VAR model, then results is a varm model object that is equal to Mdl.
Examples Fit VAR(4) Model to Matrix of Response Data Fit a VAR(4) model to the consumer price index (CPI) and unemployment rate series. Supply the response series as a numeric matrix. Load the Data_USEconModel data set. load Data_USEconModel
Plot the two series on separate plots. figure; plot(DataTimeTable.Time,DataTimeTable.CPIAUCSL); title('Consumer Price Index') ylabel('Index') xlabel('Date')
12-2505
12
Functions
figure; plot(DataTimeTable.Time,DataTimeTable.UNRATE); title('Unemployment Rate'); ylabel('Percent'); xlabel('Date');
12-2506
summarize
Stabilize the CPI by converting it to a series of growth rates. Synchronize the two series by removing the first observation from the unemployment rate series. rcpi = price2ret(DataTimeTable.CPIAUCSL); unrate = DataTimeTable.UNRATE(2:end);
Create a default VAR(4) model by using the shorthand syntax. Mdl = varm(2,4) Mdl = varm with properties: Description: SeriesNames: NumSeries: P: Constant: AR: Trend: Beta: Covariance:
"2-Dimensional VAR(4) Model" "Y1" "Y2" 2 4 [2×1 vector of NaNs] {2×2 matrices of NaNs} at lags [1 2 3 ... and 1 more] [2×1 vector of zeros] [2×0 matrix] [2×2 matrix of NaNs]
Mdl is a varm model object. All properties containing NaN values correspond to parameters to be estimated given data. Estimate the model using the entire data set. 12-2507
12
Functions
EstMdl = estimate(Mdl,[rcpi unrate]) EstMdl = varm with properties: Description: SeriesNames: NumSeries: P: Constant: AR: Trend: Beta: Covariance:
"AR-Stationary 2-Dimensional VAR(4) Model" "Y1" "Y2" 2 4 [0.00171639 0.316255]' {2×2 matrices} at lags [1 2 3 ... and 1 more] [2×1 vector of zeros] [2×0 matrix] [2×2 matrix]
EstMdl is an estimated varm model object. It is fully specified because all parameters have known values. The description indicates that the autoregressive polynomial is stationary. Display summary statistics from the estimation. summarize(EstMdl) AR-Stationary 2-Dimensional VAR(4) Model Effective Sample Size: 241 Number of Estimated Parameters: 18 LogLikelihood: 811.361 AIC: -1586.72 BIC: -1524
Constant(1) Constant(2) AR{1}(1,1) AR{1}(2,1) AR{1}(1,2) AR{1}(2,2) AR{2}(1,1) AR{2}(2,1) AR{2}(1,2) AR{2}(2,2) AR{3}(1,1) AR{3}(2,1) AR{3}(1,2) AR{3}(2,2) AR{4}(1,1) AR{4}(2,1) AR{4}(1,2) AR{4}(2,2)
Value ___________
StandardError _____________
TStatistic __________
PValue __________
0.0017164 0.31626 0.30899 -4.4834 -0.0031796 1.3433 0.22433 7.1896 0.0012375 -0.26817 0.35333 1.487 0.0028594 -0.22709 -0.047563 8.6379 -0.00096323 0.076725
0.0015988 0.091961 0.063356 3.6441 0.0011306 0.065032 0.069631 4.005 0.0018631 0.10716 0.068287 3.9277 0.0018621 0.1071 0.069026 3.9702 0.0011142 0.064088
1.0735 3.439 4.877 -1.2303 -2.8122 20.656 3.2217 1.7951 0.6642 -2.5025 5.1742 0.37858 1.5355 -2.1202 -0.68906 2.1757 -0.86448 1.1972
0.28303 0.0005838 1.0772e-06 0.21857 0.004921 8.546e-95 0.0012741 0.072631 0.50656 0.012331 2.2887e-07 0.705 0.12465 0.033986 0.49079 0.029579 0.38733 0.23123
Innovations Covariance Matrix: 0.0000 -0.0002 -0.0002 0.1167
12-2508
summarize
Innovations Correlation Matrix: 1.0000 -0.0925 -0.0925 1.0000
Compare Several VAR Model Fits Consider these four VAR models of consumer price index (CPI) and unemployment rate: VAR(0), VAR(1), VAR(4), and VAR(8). Using historical data, estimate each, and then compare the model fits using the resulting BIC. Load the Data_USEconModel data set. Declare variables for the consumer price index (CPI) and unemployment rate (UNRATE) series. Remove any missing values from the beginning of the series. load Data_USEconModel cpi = DataTimeTable.CPIAUCSL; unrate = DataTimeTable.UNRATE; idx = all(~isnan([cpi unrate]),2); cpi = cpi(idx); unrate = unrate(idx);
Stabilize CPI by converting it to a series of growth rates. Synchronize the two series by removing the first observation from the unemployment rate series. rcpi = price2ret(cpi); unrate = unrate(2:end);
Within a loop: • Create a VAR model using the shorthand syntax. • Estimate the VAR Model. Reserve the maximum value of p as presample observations. • Store the estimation results. numseries = 2; p = [0 1 4 8]; estMdlResults = cell(numel(p),1); % Preallocation Y0 = [rcpi(1:max(p)) unrate(1:max(p))]; Y = [rcpi((max(p) + 1):end) unrate((max(p) + 1):end)]; for j = 1:numel(p) Mdl = varm(numseries,p(j)); EstMdl = estimate(Mdl,Y,'Y0',Y); estMdlResults{j} = summarize(EstMdl); end
estMdlResults is a 4-by-1 cell array of structure arrays containing the estimation results of each model. Extract the BIC from each set of results. BIC = cellfun(@(x)x.BIC,estMdlResults) BIC = 4×1 103 ×
12-2509
12
Functions
-0.7153 -1.3678 -1.4378 -1.3853
The model corresponding to the lowest BIC has the best fit among the models considered. Therefore, the VAR(4) is the best fitting model.
Input Arguments Mdl — VAR model varm model object VAR model, specified as a varm model object returned by estimate, varm, or varm (a vecm function).
Output Arguments results — Model summary structure array | varm model object Model summary, returned as a structure array or a varm model object. • If Mdl is an estimated VAR model, then results is a structure array containing the fields in this table.
12-2510
Field
Description
Description
Model summary description (string)
SampleSize
Effective sample size (numeric scalar)
NumEstimatedParameters
Number of estimated parameters (numeric scalar)
LogLikelihood
Optimized loglikelihood value (numeric scalar)
AIC
Akaike information criterion (numeric scalar)
BIC
Bayesian information criterion (numeric scalar)
Table
Parameter estimates with corresponding standard errors, t statistics (estimate divided by standard error), and p-values (assuming normality); a table with rows corresponding to model parameters
Covariance
Estimated residual covariance matrix (the maximum likelihood estimate), a Mdl.NumSeries-by-Mdl.NumSeries numeric matrix with rows and columns corresponding to the innovations in the response equations ordered by the data Y
summarize
Field
Description
Correlation
Estimated residual correlation matrix, its dimensions correspond to the dimensions of Covariance
summarize uses mvregress to implement multivariate normal, maximum likelihood estimation. For more details on estimates and standard errors, see “Estimation of Multivariate Regression Models”. • If Mdl is an unestimated VAR model, then results is a varm model object that is equal to Mdl.
Version History Introduced in R2017a
See Also Objects varm Functions estimate | varm | mvregress Topics “VAR Model Estimation” on page 9-34 “Fit VAR Model of CPI and Unemployment Rate” on page 9-38 “VAR Model Case Study” on page 9-90 “Estimation of Multivariate Regression Models”
12-2511
12
Functions
summarize Display estimation results of vector error-correction (VEC) model
Syntax summarize(Mdl) results = summarize(Mdl)
Description summarize(Mdl) displays a summary of the VEC(p – 1) model Mdl. • If Mdl is an estimated VEC model returned by estimate, then summarize prints estimation results to the MATLAB Command Window. The display includes an estimation summary and a table of parameter estimates with corresponding standard errors, t statistics, and p-values. The estimation summary includes fit statistics, such as the Akaike Information Criterion (AIC), and the estimated innovations covariance and correlation matrices. • If Mdl is an unestimated VEC model returned by vecm, then summarize prints the standard object display (the same display that vecm prints during model creation). results = summarize(Mdl) returns one of the following variables and does not print to the Command Window. • If Mdl is an estimated VEC model, then results is a structure containing estimation results. • If Mdl is an unestimated VEC model, then results is a vecm model object that is equal to Mdl.
Examples Fit VEC(1) Model to Matrix of Response Data Fit a VEC(1) model to seven macroeconomic series. Supply the response data as a numeric matrix. Consider a VEC model for the following macroeconomic series: • Gross domestic product (GDP) • GDP implicit price deflator • Paid compensation of employees • Nonfarm business sector hours of all persons • Effective federal funds rate • Personal consumption expenditures • Gross private domestic investment Suppose that a cointegrating rank of 4 and one short-run term are appropriate, that is, consider a VEC(1) model. Load the Data_USEconVECModel data set. 12-2512
summarize
load Data_USEconVECModel
For more information on the data set and variables, enter Description at the command line. Determine whether the data needs to be preprocessed by plotting the series on separate plots. figure tiledlayout(2,2) nexttile plot(FRED.Time,FRED.GDP); title("Gross Domestic Product"); ylabel("Index"); xlabel("Date"); nexttile plot(FRED.Time,FRED.GDPDEF); title("GDP Deflator"); ylabel("Index"); xlabel("Date"); nexttile plot(FRED.Time,FRED.COE); title("Paid Compensation of Employees"); ylabel("Billions of $"); xlabel("Date"); nexttile plot(FRED.Time,FRED.HOANBS); title("Nonfarm Business Sector Hours"); ylabel("Index"); xlabel("Date");
12-2513
12
Functions
figure tiledlayout(2,2) nexttile plot(FRED.Time,FRED.FEDFUNDS) title("Federal Funds Rate") ylabel("Percent") xlabel("Date") nexttile plot(FRED.Time,FRED.PCEC) title("Consumption Expenditures") ylabel("Billions of $") xlabel("Date") nexttile plot(FRED.Time,FRED.GPDI) title("Gross Private Domestic Investment") ylabel("Billions of $") xlabel("Date")
Stabilize all series, except the federal funds rate, by applying the log transform. Scale the resulting series by 100 so that all series are on the same scale. FRED.GDP = 100*log(FRED.GDP); FRED.GDPDEF = 100*log(FRED.GDPDEF); FRED.COE = 100*log(FRED.COE); FRED.HOANBS = 100*log(FRED.HOANBS); FRED.PCEC = 100*log(FRED.PCEC); FRED.GPDI = 100*log(FRED.GPDI);
12-2514
summarize
Create a VEC(1) model using the shorthand syntax. Specify the variable names. Mdl = vecm(7,4,1); Mdl.SeriesNames = FRED.Properties.VariableNames Mdl = vecm with properties: Description: SeriesNames: NumSeries: Rank: P: Constant: Adjustment: Cointegration: Impact: CointegrationConstant: CointegrationTrend: ShortRun: Trend: Beta: Covariance:
"7-Dimensional Rank = 4 VEC(1) Model with Linear Time Trend" "GDP" "GDPDEF" "COE" ... and 4 more 7 4 2 [7×1 vector of NaNs] [7×4 matrix of NaNs] [7×4 matrix of NaNs] [7×7 matrix of NaNs] [4×1 vector of NaNs] [4×1 vector of NaNs] {7×7 matrix of NaNs} at lag [1] [7×1 vector of NaNs] [7×0 matrix] [7×7 matrix of NaNs]
Mdl is a vecm model object. All properties containing NaN values correspond to parameters to be estimated given data. Estimate the model using the entire data set and the default options. EstMdl = estimate(Mdl,FRED.Variables) EstMdl = vecm with properties: Description: SeriesNames: NumSeries: Rank: P: Constant: Adjustment: Cointegration: Impact: CointegrationConstant: CointegrationTrend: ShortRun: Trend: Beta: Covariance:
"7-Dimensional Rank = 4 VEC(1) Model" "GDP" "GDPDEF" "COE" ... and 4 more 7 4 2 [14.1329 8.77841 -7.20359 ... and 4 more]' [7×4 matrix] [7×4 matrix] [7×7 matrix] [-28.6082 109.555 -77.0912 ... and 1 more]' [4×1 vector of zeros] {7×7 matrix} at lag [1] [7×1 vector of zeros] [7×0 matrix] [7×7 matrix]
EstMdl is an estimated vecm model object. It is fully specified because all parameters have known values. By default, estimate imposes the constraints of the H1 Johansen VEC model form by removing the cointegrating trend and linear trend terms from the model. Parameter exclusion from estimation is equivalent to imposing equality constraints to zero. Display a short summary from the estimation. results = summarize(EstMdl)
12-2515
12
Functions
results = struct with fields: Description: "7-Dimensional Rank = 4 VEC(1) Model" Model: "H1" SampleSize: 238 NumEstimatedParameters: 112 LogLikelihood: -1.4939e+03 AIC: 3.2118e+03 BIC: 3.6007e+03 Table: [133x4 table] Covariance: [7x7 double] Correlation: [7x7 double]
The Table field of results is a table of parameter estimates and corresponding statistics.
Compare Several VEC Model Fits Consider the model and data in “Fit VEC(1) Model to Matrix of Response Data” on page 12-2512 and these four alternative VEC models: VEC(0), VEC(1), VEC(3), and VEC(7). Using historical data, estimate each of the four models, and then compare the model fits using the resulting Bayesian Information Criterion (BIC). Load the Data_USEconVECModel data set and preprocess the data. load Data_USEconVECModel FRED.GDP = 100*log(FRED.GDP); FRED.GDPDEF = 100*log(FRED.GDPDEF); FRED.COE = 100*log(FRED.COE); FRED.HOANBS = 100*log(FRED.HOANBS); FRED.PCEC = 100*log(FRED.PCEC); FRED.GPDI = 100*log(FRED.GPDI);
Within a loop: • Create a VEC model using the shorthand syntax. • Estimate the VEC Model. Reserve the maximum value of p as presample observations. • Store the estimation results. numlags = [0 1 3 7]; p = numlags + 1; Y0 = FRED{1:max(p),:}; Y = FRED{((max(p) + 1):end),:}; for j = 1:numel(p) Mdl = vecm(7,4,numlags(j)); EstMdl = estimate(Mdl,Y,'Y0',Y); results(j) = summarize(EstMdl); end
results is a 4-by-1 structure array containing the estimation results of each model. Extract the BIC from each set of results. BIC = [results.BIC]
12-2516
summarize
BIC = 1×4 103 × 5.3948
5.4372
5.8254
6.5536
The model corresponding to the lowest BIC has the best fit among the models considered. Therefore, the VEC(0) model is the best fitting model.
Input Arguments Mdl — VEC model vecm model object VEC model, specified as a vecm model object returned by estimate or vecm.
Output Arguments results — Model summary structure array | vecm model object Model summary, returned as a structure array or a vecm model object. • If Mdl is an estimated VEC model, then results is a structure array containing the fields in this table. Field
Description
Description
Model summary description (string)
Model
Johansen model of deterministic terms ("H2", "H1*", "H1", "H*", "H") [1]
SampleSize
Effective sample size (numeric scalar)
NumEstimatedParameters
Number of estimated parameters (numeric scalar)
LogLikelihood
Optimized loglikelihood value (numeric scalar)
AIC
Akaike Information Criterion (numeric scalar)
BIC
Bayesian Information Criterion (numeric scalar)
Table
Parameter estimates with corresponding standard errors, t statistics (estimate divided by standard error), and p-values (assuming normality); a table with rows corresponding to model parameters
Covariance
Estimated residual covariance matrix (the maximum likelihood estimate), an Mdl.NumSeries-by-Mdl.NumSeries numeric matrix with rows and columns corresponding to the innovations in the response equations ordered by the columns of Y
12-2517
12
Functions
Field
Description
Correlation
Estimated residual correlation matrix whose dimensions correspond to the dimensions of Covariance
summarize uses mvregress to implement multivariate normal, maximum likelihood estimation. For more details on estimates and standard errors, see “Estimation of Multivariate Regression Models”. • If Mdl is an unestimated VEC model, then results is a vecm model object that is equal to Mdl.
Version History Introduced in R2017b
References [1] Johansen, S. Likelihood-Based Inference in Cointegrated Vector Autoregressive Models. Oxford: Oxford University Press, 1995.
See Also Objects vecm Functions estimate | vecm | mvregress Topics “Model the United States Economy” on page 9-150 “Estimation of Multivariate Regression Models”
12-2518
threshold
threshold Create threshold transitions
Description threshold creates threshold transitions from the specified levels and transition type, either discrete or smooth. Use a threshold object to specify the switching mechanism of a threshold-switching dynamic regression model (tsVAR). To study a threshold transitions model, pass a fully specified threshold object to an object function on page 12-2522. You can specify transition levels and rates as unknown parameters (NaN values), which you can estimate when you fit a tsVAR model to data by using estimate. Alternatively, to create a random switching mechanism, governed by a discrete-time Markov chain, for a Markov-switching dynamic regression model, see dtmc and msVAR.
Creation Syntax tt = threshold(levels) tt = threshold(levels,Name,Value) Description tt = threshold(levels) creates the threshold transitions object tt for discrete state transitions on page 12-2527 specified by the transition mid-levels levels. tt = threshold(levels,Name,Value) sets properties on page 12-2520 using name-value argument syntax. For example, threshold([0 1],Type="exponential",Rates=[0.5 1.5]) specifies smooth, exponential transitions at mid-levels 0 and 1 with rates 0.5 and 1.5, respectively. Input Arguments levels — Transition mid-levels t1, t2,… tn increasing numeric vector Transition mid-levels t1, t2,… tn, specified as an increasing numeric vector. Levels tj separate threshold variable data into n + 1 states represented by the intervals (−∞,t1),[t1,t2), … [tn,∞). NaN entries indicate estimable transition levels. The estimate function of tsVAR treats the known elements of levels as equality constraints during optimization. threshold stores levels in the Levels property Data Types: double 12-2519
12
Functions
Properties You can set most properties when you create a model by using name-value argument syntax. You can modify only StateNames by using dot notation. For example, create a two-state, logistic transition at 0, and then label the first and second states Depression and Recession, respectively. tt = threshold(0,Type="logistic"); tt.StateNames = ["Depression" "Recession"];
Type — Type of transitions "discrete" (default) | "normal" | "logistic" | "exponential" | "custom" This property is read-only. Type of transitions, specified as a character vector or string scalar. The transition function F(zt,tj,rj) associates a transition type with each threshold level tj, where zt is a threshold variable and rj is a level-specific transition rate. Each function F is bounded between 0 and 1. This table contains the supported types of transitions: Value "discrete" (default)
Description Discrete transitions: F(zt, t j) =
0 , zt < t j 1 , zt > = t j
.
Discrete transitions do not have transition rates. "normal" "logistic" "exponential"
Cumulative normal transitions: F(zt,tj,rj) = normcdf(zt,tj,1/rj). Logistic transitions: F(zt, t j, r j) =
1 −r j(zt − t j)
1+e
Exponential transitions: 2
F(zt, t j, r j) = 1 − e−r j(zt − t j) . "custom"
Custom transition function specified by the function handle of the TransitionFunction property.
Example: "normal" Data Types: char | string Levels — Transition mid-levels t1, t2,… tn increasing numeric vector This property is read-only. Transition mid-levels t1, t2,… tn, specified as an increasing numeric vector. The levels input argument sets Levels. Data Types: double Rates — Transition rates r1, r2,… rn ones(n,1) (default) | positive numeric vector 12-2520
.
threshold
This property is read-only. Transition rates r1, r2,… rn, specified as a positive numeric vector of length n, the number of levels. Each rate corresponds to a level in Levels. NaN values indicate estimable rates. The estimate function of tsVAR treats the known elements of Rates as equality constraints during optimization. threshold ignores rates for discrete transitions. Example: [0.5 1.5] Data Types: double NumStates — Number of states positive scalar This property is read-only. Number of states, specified as a positive scalar and derived from Levels. Data Types: double StateNames — Unique state labels string(1:(n + 1)) (default) | string vector | cell vector of character vectors | numeric vector Unique state labels, specified as a string vector, cell vector of character vectors, or numeric vector of length numStates. StateNames(1) names the state corresponding to (−∞,t1), StateName(2) names the state corresponding to [t1,t2),… and StateNames(n + 1) names the state corresponding to [tn,∞). Example: ["Depression" "Recession" "Stagnant" "Boom"] Data Types: string TransitionFunction — Custom transition function [] (default) | function handle This property is read-only. Custom transition function F(zt,tj,rj), specified as a function handle. The handle must specify a function with the following syntax: function f = transitionfcn(z,tj,rj)
where: • You can replace the name transitionfcn. • z is a numeric vector of threshold variable data. • tj is a numeric scalar threshold level. • rj is a numeric scalar rate. • f is a numeric vector of in the interval [0,1]. When Type is not "custom", threshold ignores TransitionFunction. Example: @transitionfcn 12-2521
12
Functions
Data Types: function_handle
Object Functions ttplot ttdata ttstates
Plot threshold transitions Transition function data Threshold variable data state path
Examples Create Discrete Threshold Transition Create a threshold transition at mid-level t1 = 0. t1 = 0; tt = threshold(t1) tt = threshold with properties: Type: Levels: Rates: StateNames: NumStates:
'discrete' 0 [] ["1" "2"] 2
tt is a threshold object representing a discrete threshold transition at mid-level 0. tt is fully specified because its properties do not contain NaN values. Therefore, you can pass tt to any threshold object function. Given a univariate threshold variable, tt divides the range of the variable into two distinct states, which tt labels "1" and "2". tt also specifies the switching mechanism of a threshold-switching autoregressive (TAR) model, represented by a tsVAR object. Given values of the observed univariate transition variable: • The TAR model is in state "1" when the transition variable is in the interval − ∞ , 0 . • The TAR model is in state "2" when the transition variable is in the interval [0, ∞ ).
Plot Smooth Threshold Transitions This example shows how to create two logistic threshold transitions with different transition rates, and then display a gradient plot of the transitions. Load the yearly Canadian inflation and interest rates data set. Extract the inflation rate based on consumer price index (INF_C) from the table, and plot the series. load Data_Canada INF_C = DataTable.INF_C; plot(dates,INF_C); axis tight
12-2522
threshold
Assume the following characteristics of the inflation rate series: • Rates below 2% are low. • Rates at least 2% and below 8% are medium. • Rates at least 8% are high. • A logistic transition function describes the transition between states well. • Transition between low and medium rates are faster than transitions between medium and high. Create threshold transitions to describe the Canadian inflation rates. t = [2 8]; % Thresholds r = [3.5 1.5]; % Transition rates statenames = ["Low" "Med" "High"]; tt = threshold(t,Type="logistic",Rates=r,StateNames=statenames) tt = threshold with properties: Type: Levels: Rates: StateNames: NumStates:
'logistic' [2 8] [3.5000 1.5000] ["Low" "Med" 3
"High"]
12-2523
12
Functions
Plot the threshold transitions; show the gradient of the transition function between the states, and overlay the data. figure ttplot(tt,Data=INF_C)
Prepare Switching Mechanism of Threshold-Switching Model for Estimation A threshold-switching dynamic regression (tsVAR) model has two main components: • Threshold transitions, which represent the switching mechanism between states. Mid-levels and transition rates are estimable. • A collection of autoregressive models describing the dynamic system among states. Submodel coefficients and covariances are estimable. Before you create a threshold-switching model, you must specify its threshold transitions by using theshold. If you plan to fit a threshold-switching model to data, you can fully specify its threshold transitions if you know all mid-levels and transition rates. If you need to estimate some or all midlevels and rates, you can enter NaN values as placeholders for unknown parameters. estimate treats all specified parameters as equality constraints during estimation. Regardless, to fit threshold transition parameters to data, you must specify a partially specified threshold object as the switching mechanism of a threshold-switching model.
12-2524
threshold
Prepare All Estimable Parameters for Estimation Consider a smooth transition autoregressive (STAR) model that switches between three states (two thresholds) with an exponential transition function. Create the switching mechanism. Specify that all estimable parameters are unknown. t1 = [NaN NaN]; % Two unknown mid-levels r1 = [NaN NaN]; % Two unknown transition rates tt1 = threshold(t1,Type="exponential",Rates=r1) tt1 = threshold with properties: Type: Levels: Rates: StateNames: NumStates:
'exponential' [NaN NaN] [NaN NaN] ["1" "2" 3
"3"]
tt is a partially specified threshold object to pass to tsVAR as the switching mechanism. The estimate function of tsVAR fits the two mid-levels and transition rates to the data with any unknown submodel parameters in the threshold-switching model. Specify Equality Constraints Consider a STAR model, which has a switching mechanism with the following qualities: • The thresholds are at -1 and 1. • The transition function is exponential. • The transition rate between the first and second state is 0.5, but the rate between the second and third states is unknown. Create the switching mechanism. t2 = [-1 1]; r2 = [0.5 NaN]; tt2 = threshold(t2,Type="exponential",Rates=r2) tt2 = threshold with properties: Type: Levels: Rates: StateNames: NumStates:
'exponential' [-1 1] [0.5000 NaN] ["1" "2" 3
"3"]
tt is a partially specified threshold object to pass to tsVAR as the switching mechanism. The estimate function of tsVAR does the following: • Treat the mid-levels tt.Levels and the first transition rate tt.Rates(1) as equality constraints
12-2525
12
Functions
• Fit the second transition rate tt.Rates(2) to the data with any unknown submodel parameters in the threshold-switching model
Specify Custom Transition Functions Create smooth threshold transitions with the following qualities: • Mid-levels are at -1, 1, and 2. • The transition function is the Student's t cdf, which allows for a more gradual mixing than the normal cdf. • The transition rates, which are the degrees of freedom of the distribution, are 3, 10, and 100. t = [-1 1 2]; r = [3 10 100]; ttransfcn = @(z,ti,ri)tcdf(z,ri); tt = threshold(t,Type="custom",TransitionFunction=ttransfcn,Rates=r) tt = threshold with properties: Type: Levels: Rates: StateNames: NumStates:
'custom' [-1 1 2] [3 10 100] ["1" "2" 4
"3"
Plot graphs of each transition function. figure ttplot(tt,Type="graph") legend(string(tt.Levels))
12-2526
"4"]
threshold
More About State Transition In threshold-switching dynamic regression models (tsVAR), a state transition occurs when a threshold variable zt crosses a transition mid-level. Discrete transitions result in an abrupt change in the submodel computing the response. Smooth transitions create weighted combinations of submodel responses that change continuously with the value of zt, and state changes indicate a shift in the dominant submodel. The smooth transition weights are determined by a transition function F(zt,tj,rj), where tj is threshold j and rj is transition rate j (see Type). Discrete, normal, and logistic transition functions separate states at small and large values of zt. Exponential transitions separate states at small and large values of |zt|. Exponential transitions model economic variables with "inner" and "outer" states, such as deviations from purchasing power parity.
Tips • To widen a smooth transition band to show a more gradual mixing of states, decrease the transition rate by specifying the Rates name-value argument when you create threshold transitions.
12-2527
12
Functions
Version History Introduced in R2021b
References [1] Enders, Walter. Applied Econometric Time Series. New York: John Wiley & Sons, Inc., 2009. [2] Teräsvirta, Tima. "Modelling Economic Relationships with Smooth Transition Regressions." In A. Ullahand and D.E.A. Giles (eds.), Handbook of Applied Economic Statistics, 507–552. New York: Marcel Dekker, 1998. [3] van Dijk, Dick. Smooth Transition Models: Extensions and Outlier Robust Inference. Rotterdam, Netherlands: Tinbergen Institute Research Series, 1999.
See Also Objects tsVAR Functions ttplot | ttdata | ttstates | estimate Topics “Create Threshold Transitions” on page 10-73 “Visualize Threshold Transitions” on page 10-76 “Create Threshold-Switching Dynamic Regression Models” on page 10-88
12-2528
toCellArray
toCellArray Convert lag operator polynomial object to cell array
Syntax [coefficients, lags] = toCellArray(A)
Description [coefficients, lags] = toCellArray(A) converts a lag operator polynomial object A(L) to an equivalent cell array. coefficients is the cell array equivalent to the lag operator polynomial A(L). lags is a vector of unique integer lags associated with the polynomial coefficients. Elements of lags are in ascending order. The first element of lags is the smaller of the smallest nonzero coefficient lag of the object and zero; the last element of lags is the degree of the polynomial. That is, lags = [min(A.Lags,0), 1, 2, ... A.Degree].
Examples Convert Lag Operator to a Cell Array Create a LagOp polynomial and convert it to a cell array: A = LagOp({0.8 1 0 .6}); B = toCellArray(A); class(B) ans = 'cell'
Algorithms LagOp objects implicitly store polynomial lags and corresponding coefficient matrices of zero-valued coefficients via lag-based indexing. However, cell arrays conform to traditional element indexing rules, and must explicitly store zero coefficient matrices. The output cell array is equivalent to the input lag operator polynomial in the sense that the same lag operator is created when the output coefficients and lags are used to create a new LagOp object. That is, the following two statements produce the same polynomial A(L): [coefficients,lags] = toCellArray(A); A = LagOp(coefficients,'Lags',lags);
12-2529
12
Functions
tsVAR Create threshold-switching dynamic regression model
Description The tsVAR function returns a tsVAR object that specifies the functional form of a threshold-switching dynamic regression model on page 12-2546 for the univariate or multivariate response process yt. The tsVAR object also stores the parameter values of the model. A tsVAR object has two key components: • Switching mechanism among states, represented by threshold transitions (threshold object) • State-specific submodels, either autoregressive (ARX) or vector autoregression (VARX) models (arima or varm objects), which can contain exogenous regression components The components completely specify the model structure. The threshold transition levels, smooth transition function rates, and submodel parameters, such as the AR coefficients and innovationdistribution variance, are unknown and estimable unless you specify their values. To estimate a model containing unknown parameter values, pass the model and data to estimate. To work with an estimated or fully specified tsVAR object, pass it to an object function on page 12-2533. Alternatively, to create a Markov-switching dynamic regression model, which has a switching mechanism governed by a discrete-time Markov chain, see dtmc and msVAR.
Creation Syntax Mdl = tsVAR(tt,mdl) Mdl = tsVAR(tt,mdl,Name,Value) Description Mdl = tsVAR(tt,mdl) creates a threshold-switching dynamic regression model Mdl (a tsVAR object) that has the threshold transitions switching mechanism among states tt and the statespecific, stable dynamic regression submodels mdl. Mdl = tsVAR(tt,mdl,Name,Value) sets some properties on page 12-2531 using name-value argument syntax. For example, tsVAR(tt,mdl,Covariance=Sigma,SeriesNames=["GDP" "CPI"]) generates all innovations from the covariance matrix Sigma and labels the submodel response series "GDP" and "CPI", respectively. Input Arguments tt — Threshold transitions threshold object 12-2530
tsVAR
Threshold transitions, with NumStates states, specified as a threshold object. The states represented in tt.StateNames correspond to the states represented in the submodel vector mdl. tsVAR stores tt in the Switch property. mdl — State-specific dynamic regression submodels vector of arima objects | vector of varm objects State-specific dynamic regression submodels, specified as a length mc.NumStates vector of model objects individually constructed by arima or varm. All submodels must be of the same type (arima or varm) and have the same number of series. Unlike other model estimation tools, estimate does not infer the size of submodel regression coefficient arrays during estimation. Therefore, you must specify the Beta property of each submodel appropriately. For example, to include and estimate three predictors of the regression component of univariate submodel j, set mdl(j).Beta = NaN(3,1). tsVAR processes and stores mdl in the property Submodels.
Properties You can set the Covariance and SeriesNames properties when you create a model by using namevalue argument syntax. MATLAB derives the values of all other properties from inputs tt and mdl. You can modify only SeriesNames by using dot notation. For example, create a threshold-switching model for a 2-D response series in which all submodels share the same unknown covariance matrix, and then label the first and second series "GDP" and "CPI", respectively. Mdl = tsVAR(tt,mdl,Covariance=nan(2,2)); Mdl.SeriesNames = ["GDP" "CPI"];
Switch — Threshold transitions for switching mechanism among states threshold object This property is read-only. Threshold transitions for the switching mechanism among states, specified as a threshold object. Submodels — State-specific vector autoregression submodels vector of varm objects This property is read-only. State-specific vector autoregression submodels, specified as a vector of varm objects of length NumStates. tsVAR removes unsupported submodel components. • For arima submodels, tsVAR does not support the moving average (MA), differencing, and seasonal components. If any submodel is a composite conditional mean and variance model (for example, its Variance property is a garch object), tsVAR issues an error. 12-2531
12
Functions
• For varm submodels, tsVAR does not support the trend component. tsVAR converts submodels specified as arima objects to 1-D varm objects. NumStates — Number of states positive scalar This property is read-only. Number of states, specified as a positive scalar. Data Types: double NumSeries — Number of time series positive integer This property is read-only. Number of time series, specified as a positive integer. NumSeries specifies the dimensionality of the response variable and innovation in all submodels. Data Types: double StateNames — State labels string vector This property is read-only. State labels, specified as a string vector of length NumStates. Data Types: string SeriesNames — Unique Series labels string(1:numSeries) (default) | string vector | cell array of character vectors | numeric vector Unique series labels, specified as a string vector, cell array of character vectors, or a numeric vector of length numSeries. tsVAR stores the series names as a string vector. Data Types: string Covariance — Model-wide innovations covariance [] (default) | positive numeric scalar | positive semidefinite matrix Model-wide innovations covariance, specified as a positive numeric scalar for univariate models or a numSeries-by-numSeries positive semidefinite matrix for multivariate models. If Covariance is not an empty array ([]), object functions of tsVAR generate all innovations from Covariance and ignore submodel covariances. If Covariance is [] (the default), object functions of tsVAR generate innovations from submodel covariances. estimate does not support equality constraints on the innovations covariance. estimate ignores specified entries in Covariance or in submodel innovations covariances, and estimates all covariances instead. Data Types: double 12-2532
tsVAR
Notes • NaN-valued elements in either the properties of Switch or the submodels of Submodels indicate unknown, estimable parameters. Specified elements, except submodel innovation variances, indicate equality constraints on parameters in model estimation. • All unknown submodel parameters are state dependent.
Object Functions estimate forecast simulate summarize
Fit threshold-switching dynamic regression model to data Forecast sample paths from threshold-switching dynamic regression model Simulate sample paths of threshold-switching dynamic regression model Summarize threshold-switching dynamic regression model estimation results
Examples Create Fully Specified Univariate TAR Model Create a two-state TAR model for a 1-D response process. Specify all parameter values (this example uses arbitrary values). Create Switching Mechanism Create a discrete threshold transition at level 0. Label the regimes to reflect the state of the economy: • When the threshold variable (currently unknown) is in − ∞ , 0 , the economy is in a recession. • When the threshold variable is in [0, ∞ , the economy is expanding. t = 0; tt = threshold(t,StateNames=["Recession" "Expansion"]) tt = threshold with properties: Type: Levels: Rates: StateNames: NumStates:
'discrete' 0 [] ["Recession" 2
"Expansion"]
tt is a fully specified threshold object that describes the switching mechanism of the thresholdswitching model. Create State-Specific Models for Response Series Assume the following univariate models describe the response process of the system: • Recession: yt = − 1 + 0 . 1yt − 1 + ε1, t, where ε1, t ∼ Ν 0, 1 . • Expansion: yt = −1 + 0 . 3yt − 1 + 0 . 2yt − 2 + ε2, t, where ε2, t ∼ Ν 0, 4 . 12-2533
12
Functions
For each regime, use arima to create an AR model that describes the response process within the regime. c1 = -1; c2 = 1; ar1 = 0.1; ar2 = [0.3 0.2]; v1 = 1; v2 = 4; mdl1 = arima(Constant=c1,AR=ar1,Variance=v1, ... Description="Recession State Model") mdl1 = arima with properties: Description: SeriesName: Distribution: P: D: Q: Constant: AR: SAR: MA: SMA: Seasonality: Beta: Variance:
"Recession State Model" "Y" Name = "Gaussian" 1 0 0 -1 {0.1} at lag [1] {} {} {} 0 [1×0] 1
ARIMA(1,0,0) Model (Gaussian Distribution) mdl2 = arima(Constant=c2,AR=ar2,Variance=v2, ... Description="Expansion State Model") mdl2 = arima with properties: Description: SeriesName: Distribution: P: D: Q: Constant: AR: SAR: MA: SMA: Seasonality: Beta: Variance:
"Expansion State Model" "Y" Name = "Gaussian" 2 0 0 1 {0.3 0.2} at lags [1 2] {} {} {} 0 [1×0] 4
ARIMA(2,0,0) Model (Gaussian Distribution)
mdl1 and mdl2 are fully specified arima objects. Store the submodels in a vector with order corresponding to the regimes in tt.StateNames. 12-2534
tsVAR
mdl = [mdl1; mdl2];
Create Threshold-Switching Model Use tsVAR to create a TAR model from the switching mechanism tt and the state-specific submodels mdl. Mdl = tsVAR(tt,mdl) Mdl = tsVAR with properties: Switch: Submodels: NumStates: NumSeries: StateNames: SeriesNames: Covariance:
[1x1 threshold] [2x1 varm] 2 1 ["Recession" "Expansion"] "1" []
Mdl.Submodels(2) ans = varm with properties: Description: SeriesNames: NumSeries: P: Constant: AR: Trend: Beta: Covariance:
"AR-Stationary 1-Dimensional VAR(2) Model" "Y1" 1 2 1 {0.3 0.2} at lags [1 2] 0 [1×0 matrix] 4
Mdl is a fully specified tsVAR object representing a univariate two-state TAR model. tsVAR stores specified arima submodels as varm objects. Because Mdl is fully specified, you can pass it to any tsVAR object function for further analysis (see “Object Functions” on page 12-2533). Alternatively, you can specify the threshold model parameters in Mdl.Switch as initial values for the estimation procedure (see estimate).
Create Fully Specified Univariate STAR Model Create a three-state STAR model with logistic transitions (LSTAR) for a 1-D response process. Specify all parameter values (this example uses arbitrary values). Create smooth, logistic threshold transitions at levels 2 and 8. Specify the following transition rates: • 3.5, when the system transitions from state 1 to state 2. • 1.5, when the system transitions from state 2 to state 3. t = [2 8]; r = [3.5 1.5]; tt = threshold(t,Type="logistic",Rates=r)
12-2535
12
Functions
tt = threshold with properties: Type: Levels: Rates: StateNames: NumStates:
'logistic' [2 8] [3.5000 1.5000] ["1" "2" "3"] 3
tt is a fully specified threshold object. Assume the following univariate models describe the response process of the system: • State 1: y = − 5 + ε , where ε ∼ Ν 0, 0 . 12 . 1 1 t • State 2: y = ε , where ε ∼ Ν 0, 0 . 22 . 2 2 t • State 3: y = 5 + ε , where ε ∼ Ν 0, 0 . 32 . 3 3 t mdl1 = arima(Constant=-5,Variance=0.1); mdl2 = arima(Constant=0,Variance=0.2); mdl3 = arima(Constant=5,Variance=0.3); mdl = [mdl1,mdl2,mdl3];
Create a STAR model from the switching mechanism tt and the state-specific submodels mdl. Mdl = tsVAR(tt,mdl) Mdl = tsVAR with properties: Switch: Submodels: NumStates: NumSeries: StateNames: SeriesNames: Covariance:
[1x1 threshold] [3x1 varm] 3 1 ["1" "2" "3"] "1" []
Mdl is a fully specified tsVAR object representing the STAR model.
Create Partially Specified Univariate Model for Estimation Consider fitting to data a two-state exponential STAR model for a 1-D response process. Assume all parameters are unknown (includes transition mid-level t, rate r , and all dynamic model coefficients and variances θ). Create an exponential threshold transition. Specify unknown elements using NaN. tt = threshold(NaN,Type="exponential",Rates=NaN) tt = threshold with properties:
12-2536
tsVAR
Type: Levels: Rates: StateNames: NumStates:
'exponential' NaN NaN ["1" "2"] 2
tt is a partially specified threshold object. The mid-level tt.Levels and transition rate tt.Rates are unknown and estimable. Create AR(1) and AR(2) models by using the shorthand syntax of arima. mdl1 = arima(1,0,0); mdl2 = arima(2,0,0);
mdl1 and mdl2 are partially specified arima objects. NaN-valued properties correspond to unknown, estimable parameters. Store the submodels in a vector. mdl = [mdl1; mdl2];
Create a STAR model template from the switching mechanism tt and the state-specific submodels mdl. Mdl = tsVAR(tt,mdl) Mdl = tsVAR with properties: Switch: Submodels: NumStates: NumSeries: StateNames: SeriesNames: Covariance:
[1x1 threshold] [2x1 varm] 2 1 ["1" "2"] "1" []
Mdl is a partially specified tsVAR object representing a univariate two-state STAR model. Mdl.Submodels(1) ans = varm with properties: Description: SeriesNames: NumSeries: P: Constant: AR: Trend: Beta: Covariance:
"1-Dimensional VAR(1) Model" "Y1" 1 1 NaN {NaN} at lag [1] 0 [1×0 matrix] NaN
Mdl.Submodels(2)
12-2537
12
Functions
ans = varm with properties: Description: SeriesNames: NumSeries: P: Constant: AR: Trend: Beta: Covariance:
"1-Dimensional VAR(2) Model" "Y1" 1 2 NaN {NaN NaN} at lags [1 2] 0 [1×0 matrix] NaN
tsVAR converts the arima object submodels to 1-D varm object equivalents. Mdl is prepared for estimation. You can pass Mdl to estimate, along with data and a fully specified threshold transition (threshold object) containing initial values for optimization.
Create Fully Specified Multivariate Model Create the following two, three-state threshold-switching dynamic regression models for a 2-D response process: 1
A model with state-specific innovations distributions
2
A model with a model-wide innovation covariance
Specify all parameter values (this example uses arbitrary values). Create Threshold Transitions Create logistic threshold transitions at mid-levels 2 and 8 with rates 3.5 and 1.5, respectively. Label the corresponding states "Low", "Med", and "High". t = [2 8]; r = [3.5 1.5]; stateNames = ["Low" "Med" "High"]; tt = threshold(t,Type="logistic",Rates=[3.5 1.5],StateNames=stateNames) tt = threshold with properties: Type: Levels: Rates: StateNames: NumStates:
'logistic' [2 8] [3.5000 1.5000] ["Low" "Med" 3
"High"]
tt is a fully specified threshold object. Specify State-Specific Innovations Covariance Matrices Assume the following VAR models describe the response processes of the system: 12-2538
tsVAR
• • •
State 1: yt =
1 −1
+ ε1, t, where ε1, t ∼ N
0 0
,
1
−0 . 1
−0 . 1
1
.
State 2: yt =
2 0.5 0.1 0 2 −0 . 2 + yt − 1 + ε2, t, where ε2, t ∼ N , . −2 0.5 0.5 0 −0 . 2 2
State 3: yt =
3 0 . 25 0 0 0 0 3 −0 . 3 + yt − 1 + yt − 2 + ε3, t, where ε3, t ∼ N , . −3 0 0 0 . 25 0 0 −0 . 3 3
% Constants (numSeries x 1 vectors) C1 = [1; -1]; C2 = [2; -2]; C3 = [3; -3]; % Autoregression coefficients (numSeries AR1 = {}; % 0 AR2 = {[0.5 0.1; 0.5 0.5]}; % 1 AR3 = {[0.25 0; 0 0] [0 0; 0.25 0]}; % 2
x numSeries matrices) lags lag lags
% Innovations covariances (numSeries x numSeries matrices) Sigma1 = [1 -0.1; -0.1 1]; Sigma2 = [2 -0.2; -0.2 2]; Sigma3 = [3 -0.3; -0.3 3]; % VAR Submodels mdl1 = varm('Constant',C1,'AR',AR1,'Covariance',Sigma1); mdl2 = varm('Constant',C2,'AR',AR2,'Covariance',Sigma2); mdl3 = varm('Constant',C3,'AR',AR3,'Covariance',Sigma3);
mdl1, mdl2, and mdl3 are fully specified varm objects. Store the submodels in a vector with order corresponding to the regimes in tt.StateNames. mdl = [mdl1; mdl2; mdl3];
Create an LSTAR model from the switching mechanism tt and the state-specific submodels mdl. Label the series Y1 and Y2. Mdl = tsVAR(tt,mdl,SeriesNames=["Y1" "Y2"]) Mdl = tsVAR with properties: Switch: Submodels: NumStates: NumSeries: StateNames: SeriesNames: Covariance:
[1x1 threshold] [3x1 varm] 3 2 ["Low" "Med" ["Y1" "Y2"] []
"High"]
Mdl is a fully specified tsVAR object representing a multivariate three-state LSTAR model. Because the Covariance property is empty ([]), the submodels have their own innovations covariance matrix.
12-2539
12
Functions
Specify Model-Wide Innovations Covariance Matrix Suppose that the innovations covariance matrix is invariant across states and has value 2 −0 . 2 . −0 . 2 2 Create an LSTAR model like Mdl that has the model-wide innovations covariance matrix. MdlCov = tsVAR(tt,mdl,SeriesNames=["Y1" "Y2"],Covariance=Sigma2) MdlCov = tsVAR with properties: Switch: Submodels: NumStates: NumSeries: StateNames: SeriesNames: Covariance:
[1x1 threshold] [3x1 varm] 3 2 ["Low" "Med" ["Y1" "Y2"] [2x2 double]
"High"]
MdlCov.Covariance ans = 2×2 2.0000 -0.2000
-0.2000 2.0000
The Covariance property of MdlCov is nonempty, which means the innovations distribution of all submodels are equal.
Create Fully Specified Model Containing Regression Component Consider including regression components for exogenous variables in each submodel of the threshold-switching dynamic regression model in “Create Fully Specified Multivariate Model” on page 12-2538. Create logistic threshold transitions at mid-levels 2 and 8 with rates 3.5 and 1.5, respectively. Label the corresponding states "Low", "Med", and "High". t = [2 8]; r = [3.5 1.5]; stateNames = ["Low" "Med" "High"]; tt = threshold(t,Type="logistic",Rates=[3.5 1.5],StateNames=stateNames) tt = threshold with properties: Type: Levels: Rates: StateNames: NumStates:
12-2540
'logistic' [2 8] [3.5000 1.5000] ["Low" "Med" 3
"High"]
tsVAR
Assume the following VARX models describe the response processes of the system: • • •
State 1: yt =
1 1 0 1 −0 . 1 + x1, t + ε1, t, where ε1, t ∼ N , . −1 −1 0 −0 . 1 1
State 2: yt =
2 2 2 0.5 0.1 0 2 −0 . 2 + x2, t + yt − 1 + ε2, t, where ε2, t ∼ N , . −2 −2 −2 0.5 0.5 0 −0 . 2 2
3 3 3 3 0 . 25 0 0 0 + x3, t + yt − 1 + yt − 2 + ε3, t, where −3 −3 −3 −3 0 0 0 . 25 0 0 3 −0 . 3 ε3, t ∼ N , . 0 −0 . 3 3 State 3: yt =
x1, t represents a single exogenous variable, x2, t represents two exogenous variables, and x3, t represents three exogenous variables. Store the submodels in a vector. % Constants (numSeries x 1 vectors) C1 = [1; -1]; C2 = [2; -2]; C3 = [3; -3]; % Regression coefficients (numSeries x numRegressors matrices) Beta1 = [1; -1]; % 1 regressor Beta2 = [2 2; -2 -2]; % 2 regressors Beta3 = [3 3 3; -3 -3 -3]; % 3 regressors % Autoregression coefficients (numSeries x numSeries matrices) AR1 = {}; AR2 = {[0.5 0.1; 0.5 0.5]}; AR3 = {[0.25 0; 0 0] [0 0; 0.25 0]}; % Innovations covariances (numSeries x numSeries matrices) Sigma1 = [1 -0.1; -0.1 1]; Sigma2 = [2 -0.2; -0.2 2]; Sigma3 = [3 -0.3; -0.3 3]; % VARX mdl1 = mdl2 = mdl3 =
submodels varm(Constant=C1,AR=AR1,Beta=Beta1,Covariance=Sigma1); varm(Constant=C2,AR=AR2,Beta=Beta2,Covariance=Sigma2); varm(Constant=C3,AR=AR3,Beta=Beta3,Covariance=Sigma3);
mdl = [mdl1; mdl2; mdl3];
Create an LSTAR model from the switching mechanism tt and the state-specific submodels mdl. Label the series Y1 and Y2. Mdl = tsVAR(tt,mdl,SeriesNames=["Y1" "Y2"]) Mdl = tsVAR with properties: Switch: Submodels: NumStates: NumSeries: StateNames: SeriesNames:
[1x1 threshold] [3x1 varm] 3 2 ["Low" "Med" ["Y1" "Y2"]
"High"]
12-2541
12
Functions
Covariance: [] Mdl.Submodels(1) ans = varm with properties: Description: SeriesNames: NumSeries: P: Constant: AR: Trend: Beta: Covariance:
"2-Dimensional VARX(0) Model with 1 Predictor" "Y1" "Y2" 2 0 [1 -1]' {} [2×1 vector of zeros] [2×1 matrix] [2×2 matrix]
Mdl.Submodels(2) ans = varm with properties: Description: SeriesNames: NumSeries: P: Constant: AR: Trend: Beta: Covariance:
"AR-Stationary 2-Dimensional VARX(1) Model with 2 Predictors" "Y1" "Y2" 2 1 [2 -2]' {2×2 matrix} at lag [1] [2×1 vector of zeros] [2×2 matrix] [2×2 matrix]
Mdl.Submodels(3) ans = varm with properties: Description: SeriesNames: NumSeries: P: Constant: AR: Trend: Beta: Covariance:
"AR-Stationary 2-Dimensional VARX(2) Model with 3 Predictors" "Y1" "Y2" 2 2 [3 -3]' {2×2 matrices} at lags [1 2] [2×1 vector of zeros] [2×3 matrix] [2×2 matrix]
Create Partially Specified Multivariate Model for Estimation Consider fitting to data a three-state TAR model for a 2-D response process. Assume all parameters are unknown (including the transition two mid-levels t and all dynamic model coefficients and variances θ). Create discrete threshold transitions at two unknown levels. This specification imples a three-state model. 12-2542
tsVAR
t = [NaN NaN]; tt = threshold(t);
tt is a partially specified threshold object. The transition mid-levels tt.Levels are completely unknown and estimable. Create 2-D VAR(0), VAR(1), and VAR(2) models by using the shorthand syntax of varm. Store the models in a vector. mdl1 = varm(2,0); mdl2 = varm(2,1); mdl3 = varm(2,2); mdl = [mdl1 mdl2 mdl3]; mdl(2) ans = varm with properties: Description: SeriesNames: NumSeries: P: Constant: AR: Trend: Beta: Covariance:
"2-Dimensional "Y1" "Y2" 2 1 [2×1 vector of {2×2 matrix of [2×1 vector of [2×0 matrix] [2×2 matrix of
VAR(1) Model"
NaNs] NaNs} at lag [1] zeros] NaNs]
mdl contains three state-specific varm model templates for estimation. NaN values in the properties indicate estimable parameters. Create a threshold-switching model template from the switching mechanism tt and the state-specific submodels mdl. Mdl = tsVAR(tt,mdl) Mdl = tsVAR with properties: Switch: Submodels: NumStates: NumSeries: StateNames: SeriesNames: Covariance:
[1x1 threshold] [3x1 varm] 3 2 ["1" "2" "3"] ["1" "2"] []
Mdl.Submodels(2) ans = varm with properties: Description: SeriesNames: NumSeries: P:
"2-Dimensional VAR(1) Model" "Y1" "Y2" 2 1
12-2543
12
Functions
Constant: AR: Trend: Beta: Covariance:
[2×1 {2×2 [2×1 [2×0 [2×2
vector of matrix of vector of matrix] matrix of
NaNs] NaNs} at lag [1] zeros] NaNs]
Mdl is a partially specified tsVAR model for estimation.
Specify Model Regression Component for Estimation Consider estimating all submodel coefficients and innovations covariances, and the threshold levels, of the TAR model in “Create Fully Specified Model Containing Regression Component” on page 122540. Create logistic threshold transitions at two unknown (NaN) mid-levels and with two unknown rates. Label the corresponding states "Low", "Med", and "High". stateNames = ["Low" "Med" "High"]; tt = threshold([NaN NaN],Type="logistic",Rates=[NaN NaN],StateNames=stateNames) tt = threshold with properties: Type: Levels: Rates: StateNames: NumStates:
'logistic' [NaN NaN] [NaN NaN] ["Low" "Med" 3
"High"]
Specify the appropriate 2-D VAR model for each state by using thee shorthand syntax of varm, then use dot notation to specify a numSeries-by-numRegressors unknown, estimable exogenous regression coefficient matrix. • State 1: VARX(0) mode1 with one regressor • State 2: VARX(1) model with two regressors • State 3: VARX(2) model with three regressors Store the submodels in a vector. mdl1 = varm(2,0); mdl1.Beta = nan(2,1); % numSeries-by-numRegressors mdl2 = varm(2,1); mdl2.Beta = nan(2,2); mdl3 = varm(2,2); mdl3.Beta = nan(2,3); mdl = [mdl1; mdl2; mdl3];
Create an estimable LSTAR model from the switching mechanism tt and the state-specific submodels mdl. Label the series Y1 and Y2. Mdl = tsVAR(tt,mdl,SeriesNames=["Y1" "Y2"])
12-2544
tsVAR
Mdl = tsVAR with properties: Switch: Submodels: NumStates: NumSeries: StateNames: SeriesNames: Covariance:
[1x1 threshold] [3x1 varm] 3 2 ["Low" "Med" ["Y1" "Y2"] []
"High"]
Mdl.Submodels(2) ans = varm with properties: Description: SeriesNames: NumSeries: P: Constant: AR: Trend: Beta: Covariance:
"2-Dimensional "Y1" "Y2" 2 1 [2×1 vector of {2×2 matrix of [2×1 vector of [2×2 matrix of [2×2 matrix of
VARX(1) Model with 2 Predictors"
NaNs] NaNs} at lag [1] zeros] NaNs] NaNs]
Create Model Specifying Equality Constraints for Estimation The estimate function generally supports constraints on any parameter to a known constant. Also, you can specify a model-wide innovations covariance by setting the Covariance property of tsVAR. Consider estimating a univariate threshold-switching model with the following characteristics: • A threshold transition is known to occur at 0. • The transition function is the normal cdf with unknown rate. • States 1 and 2 are constant models. The constants are unknown. • The innovations process is invariant among states, but the variance is unknown. Create the described threshold transition. t = 0; tt = threshold(t,Type="normal",Rates=NaN) tt = threshold with properties: Type: Levels: Rates: StateNames: NumStates:
'normal' 0 NaN ["1" "2"] 2
12-2545
12
Functions
tt is a partially specified threshold object. Only the transition function rate tt.Rates is unknown and estimable. Create the described submodels by using the shorthand syntax of arima. Store the submodels in a vector. mdl1 = arima(0,0,0); mdl2 = arima(0,0,0) mdl2 = arima with properties: Description: SeriesName: Distribution: P: D: Q: Constant: AR: SAR: MA: SMA: Seasonality: Beta: Variance:
"ARIMA(0,0,0) Model (Gaussian Distribution)" "Y" Name = "Gaussian" 0 0 0 NaN {} {} {} {} 0 [1×0] NaN
mdl = [mdl1 mdl2];
mdl1 and mdl2 are partially specified arima objects representing constant-only linear models. Each model constant is unknown and estimable. Create a threshold-switching model from the specified threshold transitions and submodels. Specify and unknown, estimable model-wide innovations covariance matrix. Mdl = tsVAR(tt,mdl,Covariance=NaN) Mdl = tsVAR with properties: Switch: Submodels: NumStates: NumSeries: StateNames: SeriesNames: Covariance:
[1x1 threshold] [2x1 varm] 2 1 ["1" "2"] "1" NaN
Mdl is a partially specified tsVAR object configured for estimation. Because Mdl.Covariance is nonempty, MATLAB ignores any specified submodel innovations variances.
More About Threshold-Switching Dynamic Regression Model A Threshold-switching dynamic regression model of a univariate or multivariate response series yt is a nonlinear time series model that describes the dynamic behavior of the series in the presence of 12-2546
tsVAR
structural breaks or regime changes. A collection of state-specific dynamic regression submodels describes the dynamic behavior of yt within the regimes.
yt
f 1 yt; xt, θ1
, st(zt) = 1
f 2 yt; xt, θ2
, st(zt) = 2
fn + 1
, ⋮ ⋮ yt; xt, θn + 1 , st(zt) = n + 1
where: • n + 1 is the number of regimes (NumStates). •
f i yt; xt, θi is the regime i dynamic regression model of yt (Submodels(i)). Submodels are either univariate (ARX) or multivariate (VARX). Such a collection of models yields a threshold autoregressive model (TAR).
• zt is the univariate threshold variable. zt can be exogenous to the system or endogenous and delayed. If zt = yk, t − d, the system is a self-exciting threshold autoregressive model (SETAR) with unobserved delay d. • xt is a vector of observed exogenous variables at time t. • θi is the regime i collection of parameters of the dynamic regression model, such as AR coefficients and the innovation variances. The switching mechanism (Switch) is governed by threshold transitions and zt. The state index variable st(zt) is not random—observed values of the threshold variable zt determine the state of the system: 1 st =
, zt < t1
2
, t1 ≤ zt < t2 , ⋮ ⋮ n + 1 , zt ≥ tn
where tj is the unobserved threshold mid-level j (Switch.Levels(j)). Threshold levels must be inferred from the data. A state transition occurs when zt crosses a transition mid-level. Transitions can be discrete or smooth. Transitions of TAR models are discrete, which result in an abrupt change in the submodel computing the response. An extension of the TAR model is to allow for smooth transitions. Smooth transition autoregressive models (STAR) create weighted combinations of submodel responses that change continuously with the value of zt, and state changes indicate a shift in the dominant submodel. The smooth transition weights are determined by a transition function F(zt,tj,rj) (Switch.Type) and transition rate parameter rj (Switch.Rates), where 0 ≤ F(zt,tj,rj) ≤ 1. Supported transition functions include the normal cdf, logistic (LSTAR), bounded exponential (ESTAR), and custom functions. As a result, the general form of the threshold-switching autoregressive model is yt = f 1(yt; xt, θ1) +
n
∑
j=1
f j + 1 yt; xt, θ j + 1 − f j yt; xt, θ j F zt, t j, r j
.
• In this general case, innovation covariances can switch with the submodel. 12-2547
12
Functions
• For STAR models, the formula assigns weights to all submodel means, regardless of the current state. • For TAR models, the formula assigns unit weight to the current state/submodel only. • With observations, F(zt,tj,rj) = ttdata(tt,z), where z is a vector of threshold variable data. When Covariance is a nonempty array Σ, it is used to generate all innovations, independent of the submodels. The model reduces to ([3], Eqn. 3.6) yt = μ1 +
n
∑
j=1
μ j + 1 − μ j F zt, t j, r j
+ εt,
where: • μ j = E yt yt − 1, xt , which is the conditional mean of the response series yt in state j. • εt is an iid Gaussian innovations series with covariance Σ.
Algorithms A threshold variable zt, which triggers transitions between states, is not required to create Mdl. Specify exogenous or endogenous threshold variable data, and its characteristics, when you pass Mdl to an object function on page 12-2548.
Version History Introduced in R2021b
References [1] Enders, Walter. Applied Econometric Time Series. New York: John Wiley & Sons, Inc., 2009. [2] Teräsvirta, Tima. "Modelling Economic Relationships with Smooth Transition Regressions." In A. Ullahand and D.E.A. Giles (eds.), Handbook of Applied Economic Statistics, 507–552. New York: Marcel Dekker, 1998. [3] van Dijk, Dick. Smooth Transition Models: Extensions and Outlier Robust Inference. Rotterdam, Netherlands: Tinbergen Institute Research Series, 1999.
See Also threshold | arima | varm Topics “Create Threshold Transitions” on page 10-73 “Create Threshold-Switching Dynamic Regression Models” on page 10-88 “Simulate Paths of Threshold-Switching Dynamic Regression Models” on page 10-111
12-2548
ttdata
ttdata Transition function data
Syntax T = ttdata(tt,z) T = ttdata(tt,z,UseZeroLevels=tf)
Description ttdata evaluates the transition function for observations of the threshold variable. To plot transition functions of threshold transitions, use ttplot. T = ttdata(tt,z) returns transition function data T for the threshold transitions in tt at values of the threshold variable z. T = ttdata(tt,z,UseZeroLevels=tf) returns transformation function data with all levels set to 0 when tf is true.
Examples Evaluate Transition Function Create logistic threshold transitions at levels 0 and 5. t = [0 5]; tt = threshold(t,Type="logistic");
tt is a threshold object. By default, the rate of each logistic transition function is 1. Evaluate the transition function at a sequence of transition variable data from -10 to 10. z = -10:0.01:10; F = ttdata(tt,z); size(F) ans = 1×2 2001
2
F is a 2001-by-2 vector of transition function data. Each column is the transition function data for the corresponding threshold in tt.Levels.
12-2549
12
Functions
Plot Transition Functions at Respective Levels To facilitate comparisons among transition rates, the Type="graph" option of ttplot graphs all transition functions at the same level. This example shows how to graph transition functions each at their respective level. Create normal threshold transitions at levels 0 and 5 with rates 0.5 and 1.5, respectively. tt = threshold([0 5],Type="normal",Rates=[0.5 1.5]);
Evaluate the transition functions at their respective level (the default), and then evaluate them each relative to level 0. Specify a sequence of transition variable data from -10 to 10. z = -10:0.01:10; n = numel(z); T0 = ttdata(tt,z); T1 = ttdata(tt,z,UseZeroLevels=true);
T0 is an n-by-1 vector of raw transition function data evaluated at the grid of transition variable data. T1 is an n-by-1 vector of transition function data translated to be centered at level 0. Plot both sets of transition functions separately. % Raw transition functions figure plot(z,T0,LineWidth=2) xline(tt.Levels,'--') grid on xlabel("Level") legend(["Level 0, Rate 0.5" "Level 5, Rate 1.5"],Location="northwest")
12-2550
ttdata
% Shifted transition functions figure plot(z,T1,LineWidth=2) grid on xlabel("Distance from Level") legend(["Level 0, Rate 0.5" "Level 5, Rate 1.5"],Location="northwest")
12-2551
12
Functions
Input Arguments tt — Threshold transitions threshold object Threshold transitions, with NumStates states, specified as a threshold object. tt must be fully specified (no NaN entries). z — Threshold variable data numeric vector Threshold variable data, specified as a numeric vector. Data Types: double tf — Flag indicating whether to compute data with all levels set to zero false (default) | true Flag indicating whether to compute data with all levels set to 0, specified as a value in this table:
12-2552
Value
Description
true
Computes transformation function data with all levels set to 0. This setting is useful for comparing transition rates.
ttdata
Value
Description
false
Computes raw transformation function data.
Example: UseZeroLevels=true Data Types: logical
Output Arguments T — Transition function data numeric matrix Transition function data F(z,tt.Levels,tt.Rates), returned as a numeric matrix. F is specified by tt.TransitionFunctionData. Data in T gives transition function values relative to each level in tt.Levels. The number of rows of T is equal to the length of z and the number of columns is equal to the number of levels. For details, see threshold.
Version History Introduced in R2021b
See Also Objects threshold Functions ttplot | ttstates Topics “Create Threshold Transitions” on page 10-73 “Visualize Threshold Transitions” on page 10-76
12-2553
12
Functions
ttplot Plot threshold transitions
Syntax ttplot(tt) ttplot(tt,Name,Value) ttplot(ax, ___ ) h = ttplot( ___ )
Description ttplot plots transition functions of threshold transitions. To evaluate the transition function for observations of the threshold variable, use ttdata. ttplot(tt) plots transition bands between states of the threshold transitions tt on the y-axis. The plot shows gradient shading of the mixing level on page 12-2563 in the transition bands. ttplot(tt,Name,Value) uses additional options specified by one or more name-value arguments. For example, ttplot(tt,Type="graph") specifies plotting a line plot of the transition function at each threshold level on the same axes. ttplot(ax, ___ ) plots on the axes specified by ax instead of the current axes (gca) using any of the input argument combinations in the previous syntaxes. h = ttplot( ___ ) returns a handle h to the threshold transitions plot. Use h to modify properties of the plot after you create it.
Examples Plot Discrete Threshold Transitions Create discrete threshold transitions at 0 and 2. t = [0 2]; tt = threshold(t) tt = threshold with properties: Type: Levels: Rates: StateNames: NumStates:
'discrete' [0 2] [] ["1" "2" 3
"3"]
tt is a threshold object. The specified thresholds split the space into three states. Plot the threshold transitions. 12-2554
ttplot
ttplot(tt);
ttplot graphs a gradient plot by default. The y-axis represents the value of the threshold variable zt (currently undefined) and the state-space: • The system is in state 1 when zt < 0. • The system is in state 2 when 0 ≤ zt < 2. • The system is in state 3 when zt ≥ 2. Because the transitions are discrete, ttplot graphs the levels as lines—the regime switches abruptly when zt crosses a threshold variable. Because zt is undefined, the x-axis is arbitrary. When you specify threshold variable data by using the Data name-value argument, the x-axis is the sample index.
Plot Smooth Threshold Transitions This example shows how to create two logistic threshold transitions with different transition rates, and then display a gradient plot of the transitions. Load the yearly Canadian inflation and interest rates data set. Extract the inflation rate based on consumer price index (INF_C) from the table, and plot the series. 12-2555
12
Functions
load Data_Canada INF_C = DataTable.INF_C; plot(dates,INF_C); axis tight
Assume the following characteristics of the inflation rate series: • Rates below 2% are low. • Rates at least 2% and below 8% are medium. • Rates at least 8% are high. • A logistic transition function describes the transition between states well. • Transition between low and medium rates are faster than transitions between medium and high. Create threshold transitions to describe the Canadian inflation rates. t = [2 8]; % Thresholds r = [3.5 1.5]; % Transition rates statenames = ["Low" "Med" "High"]; tt = threshold(t,Type="logistic",Rates=r,StateNames=statenames) tt = threshold with properties: Type: 'logistic' Levels: [2 8]
12-2556
ttplot
Rates: [3.5000 1.5000] StateNames: ["Low" "Med" NumStates: 3
"High"]
Plot the threshold transitions; show the gradient of the transition function between the states, and overlay the data. figure ttplot(tt,Data=INF_C)
Visually Compare Transition Functions Create normal cdf threshold transitions at levels 0 and 5, with rates 0.5 and 1.5. t = [0 5]; r = [0.5 1.5]; tt = threshold(t,Type="normal",Rates=r) tt = threshold with properties: Type: 'normal' Levels: [0 5] Rates: [0.5000 1.5000]
12-2557
12
Functions
StateNames: ["1" NumStates: 3
"2"
"3"]
To compare the behavior of the transition functions, plot their graphs at the same level. figure ttplot(tt,Type="graph",Width=20)
Plot the transition functions at their levels. Evaluate the transition function over a 1-D grid of values by using ttdata, and then plot the results. lower = tt.Levels(1) - 3/min(tt.Rates); upper = tt.Levels(end) + 3/min(tt.Rates); z = lower:0.1:upper; F = ttdata(tt,z,UseZeroLevels=false); figure plot(z,F,LineWidth=2) grid on xlabel("Level") legend(["Level 0, Rate 0.5" "Level 5, Rate 1.5"],Location="NorthWest")
12-2558
ttplot
Plot Exponential Threshold Transitions Create smooth threshold transitions for the Australian to US dollar exchange rate to model price parity. Load the Australia/US purchasing power and interest rates data set. Extract the log of the exchange rate EXCH from the table. load Data_JAustralian EXCH = DataTable.EXCH;
Consider a two-state system where: • State 1 occurs when the Australian dollar buys more than the US dollar (EXCH ≥ 0). • State 2 occurs when the US dollar buys more than the Australian dollar (EXCH < 0). • States are weighed more highly as the system deviates from parity (EXCH = 0). Create threshold transitions representing the system. To attribute a greater amount of mixing away from the threshold, specify an exponential transition function. Set the transition rate to 2.5. tt = threshold(0,Type="exponential",Rates=2.5) tt = threshold with properties:
12-2559
12
Functions
Type: Levels: Rates: StateNames: NumStates:
'exponential' 0 2.5000 ["1" "2"] 2
Plot the threshold transitions with the threshold data. figure ttplot(tt,Data=EXCH);
Try improving the display by experimenting with the transition band width (Width name-value argument). figure ttplot(tt,Data=EXCH,Width=2);
12-2560
ttplot
Plot the transition function. figure ttplot(tt,Type="graph");
12-2561
12
Functions
Input Arguments tt — Threshold transitions threshold object Threshold transitions, with NumStates states, specified as a threshold object. tt must be fully specified (no NaN entries). ax — Axes on which to plot Axes object Axes on which to plot, specified as an Axes object. By default, ttplot plots to the current axes (gca). Name-Value Arguments Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter. Before R2021a, use commas to separate each name and value, and enclose Name in quotes. Example: Type="graph" specifies plotting a line plot of the transition function at each threshold level on the same axes. 12-2562
ttplot
Type — Plot type "gradient" (default) | "graph" Plot type, specified as a value in this table. Value
Description
"gradient"
Gradient shading of the mixing level in each transition band.
"graph"
Graphs of transition functions at each level. ttplot plots graphs with levels set to zero.
Example: Type="graph" Data Types: char | string Data — Data on threshold variable zt to include in plot [] (empty array) (default) | numeric vector Data on a threshold variable zt to include in the plot, specified as a numeric vector. ttplot plots Data with gradient shading of transition bands (Type="gradient"). If Type="graph", ttplot ignores Data. Data Types: double Width — Width of transition bands positive numeric scalar Width of transition bands, specified as a positive numeric scalar. • For gradient plots (Type="gradient"), ttplot truncates transition function data outside of the bands. • For transition function graphs (Type="graph"), ttplot sets the x-axis limits to [-Width/2 Width/2]. By default, ttplot selects the band width automatically. Example: Width=10 Data Types: double
Output Arguments h — Plot handle graphics object Plot handle, returned as a graphics object. h contains a unique plot identifier, which you can use to query or modify properties of the plot.
More About Mixing Level The mixing level is the degree to which adjacent states contribute to a response. 12-2563
12
Functions
Transition functions F vary between 0 and 1; adjacent states are assigned weights F and 1 – F. The mixing level between adjacent states is the minimum weight min(F, 1 – F). The following characteristics define the mixing behavior of each transition type: • Discrete transitions have no mixing. • Normal and logistic transitions achieve maximum mixing at threshold levels. • Exponential transitions achieve maximum mixing on either side of threshold levels. For more details, see threshold.
Tips • Use the Width name-value argument to adjust the display of transition function graph (Type="graph") plots with varying rates. In multilevel gradient plots (Type="gradient"), a large enough width results in overlapping transition bands that can misrepresent data. By default, ttplot chooses an appropriate width for displaying all transitions.
Version History Introduced in R2021b
See Also Objects threshold Functions ttdata Topics “Create Threshold Transitions” on page 10-73 “Visualize Threshold Transitions” on page 10-76 “Simulate Paths of Threshold-Switching Dynamic Regression Models” on page 10-111
12-2564
ttstates
ttstates Threshold variable data state path
Syntax states = ttstates(tt,z)
Description states = ttstates(tt,z) returns the state path for values of the threshold variable z relative to the levels in the Levels property of tt.
Examples Compute State Path Load the yearly Canadian inflation and interest rates data set. Extract the inflation rate based on consumer price index (INF_C) from the table. load Data_Canada INF_C = DataTable.INF_C;
Assume the following characteristics of the inflation rate series: • Rates below 2% are low. • Rates at least 2% and below 8% are medium. • Rates at least 8% are high. • States transition abruptly. Create threshold transitions to describe the Canadian inflation rates. statenames = ["Low" "Med" "High"]; tt = threshold([2 8],StateNames=statenames);
Infer the state path by passing the inflation rate series through the threshold transitions. n = numel(INF_C); states = ttstates(tt,INF_C); snpath = tt.StateNames(states);
states is an n-by-1 vector of inferred state indices. snpath is the state path using state names instead of indices. Separately plot the inflation rate series and inferred state path. figure tiledlayout(2,1) nexttile h = ttplot(tt,Data=INF_C); legend(h([1 3]),["State threshold" "Inflation rate"])
12-2565
12
Functions
nexttile plot(states,'go',LineWidth=2) ylabel('State') yticks(1:3) yticklabels(tt.StateNames) axis tight
Input Arguments tt — Threshold transitions threshold object Threshold transitions, with NumStates states, specified as a threshold object. tt must be fully specified (no NaN entries). z — Threshold variable data numeric vector Threshold variable data, specified as a numeric vector. Data Types: double
12-2566
ttstates
Output Arguments states — Threshold data states numeric vector Threshold data states, returned as a numeric vector with the same length as z. If the transition mid-levels tt.Levels are t1, t2,… tn, ttstates labels states (−∞,t1), [t1,t2),… [tn,∞) as 1, 2, …, n+1, respectively. States are independent of threshold rates.
Algorithms In threshold-switching dynamic regression models (tsVAR), state transitions occur when a threshold variable crosses a transition mid-level. Discrete transitions result in an abrupt change in the submodel computing the response. Smooth transitions create weighted combinations of submodel responses that change continuously with the value of the threshold variable, and state changes indicate a shift in the dominant submodel. For more details, see tsVAR.
Version History Introduced in R2021b
See Also Objects threshold | tsVAR Functions ttdata Topics “Create Threshold Transitions” on page 10-73 “Simulate Paths of Threshold-Switching Dynamic Regression Models” on page 10-111
12-2567
12
Functions
tune Tune Bayesian state-space model posterior sampler
Syntax [params,Proposal] = tune(PriorMdl,Y,params0) [params,Proposal] = tune(PriorMdl,Y,params0,Name=Value)
Description The tune function searches for the posterior mode to form proposal distribution moments of the Metropolis-Hastings sampler [1][2]. To improve the proposal distribution, for example, increase the acceptance rate of proposed posterior draws, pass the outputs of tune to simulate. [params,Proposal] = tune(PriorMdl,Y,params0) returns proposal distribution parameter mean vector params and scale matrix Proposal to improve the Metropolis-Hastings sampler. PriorMdl is the Bayesian state-space model that specifies the state-space model structure (likelihood) and prior distribution, Y is the data for the likelihood, and params0 is the vector of initial values for the unknown state-space model parameters θ in PriorMdl. [params,Proposal] = tune(PriorMdl,Y,params0,Name=Value) specifies additional options using one or more name-value arguments. For example, tune(Mdl,Y,params0,Hessian="opg",Display=false) uses the outer-product of gradients method to compute the Hessian matrix and suppresses the display of the optimized values.
Examples Tune Proposal Distribution for Metropolis Sampler of Bayesian State-Space Model Simulate observed responses from a known state-space model, then treat the model as Bayesian and draw parameters from the posterior distribution. Tune the proposal distribution of the MetropolisHastings sampler by using tune. Suppose the following state-space model is a data-generating process (DGP). xt, 1 xt, 2
=
xt − 1, 1 0.5 0 1 0 ut, 1 + 0 −0 . 75 xt − 1, 2 0 0 . 5 ut, 2
yt = 1 1
xt, 1 xt, 2
.
Create a standard state-space model object ssm that represents the DGP. trueTheta = [0.5; -0.75; 1; 0.5]; A = [trueTheta(1) 0; 0 trueTheta(2)]; B = [trueTheta(3) 0; 0 trueTheta(4)]; C = [1 1]; DGP = ssm(A,B,C);
12-2568
tune
Simulate a response path from the DGP. rng(1); % For reproducibility y = simulate(DGP,200);
Suppose the structure of the DGP is known, but the state parameters trueTheta are unknown, explicitly xt, 1 xt, 2
=
ϕ1 0 xt − 1, 1
yt = 1 1
0 ϕ2 xt − 1, 2 xt, 1 xt, 2
+
σ1 0 ut, 1 0 σ2 ut, 2
.
Consider a Bayesian state-space model representing the model with unknown parameters. Arbitrarily assume that the prior distribution of ϕ1, ϕ2, σ12, and σ22 are independent Gaussian random variables with mean 0.5 and variance 1. The Local Functions on page 12-2570 section contains two functions required to specify the Bayesian state-space model. You can use the functions only within this script. The paramMap function accepts a vector of the unknown state-space model parameters and returns all the following quantities: • •
A= B=
ϕ1 0 0 ϕ2 σ1 0 0 σ2
. .
• C= 1 1. • D = 0. • Mean0 and Cov0 are empty arrays [], which specify the defaults. • StateType = 0 0 , indicating that each state is stationary. The paramDistribution function accepts the same vector of unknown parameters as does paramMap, but it returns the log prior density of the parameters at their current values. Specify that parameter values outside the parameter space have log prior density of -Inf. Create the Bayesian state-space model by passing function handles directly to paramMap and paramDistribution to bssm. Mdl = bssm(@paramMap,@priorDistribution) Mdl = Mapping that defines a state-space model: @paramMap Log density of parameter prior distribution: @priorDistribution
12-2569
12
Functions
The simulate function requires a proposal distribution scale matrix. You can obtain a data-driven proposal scale matrix by using the tune function. Alternatively, you can supply your own scale matrix. Obtain a data-driven scale matrix by using the tune function. Supply a random set of initial parameter values. numParams = 4; theta0 = rand(numParams,1); [theta0,Proposal] = tune(Mdl,y,theta0); Local minimum found. Optimization completed because the size of the gradient is less than the value of the optimality tolerance. Optimization and Tuning | Params0 Optimized ProposalStd ---------------------------------------c(1) | 0.6968 0.4459 0.0798 c(2) | 0.7662 -0.8781 0.0483 c(3) | 0.3425 0.9633 0.0694 c(4) | 0.8459 0.3978 0.0726
theta0 is a 4-by-1 estimate of the posterior mode and Proposal is the Hessian matrix. Both outputs are the optimized moments of the proposal distribution, the latter of which is up to a proportionality constant. tune displays convergence information and an estimation table, which you can suppress by using the Display options of the optimizer and tune. Draw 1000 random parameter vectors from the posterior distribution. Specify the simulated response path as observed responses and the optimized values returned by tune for the initial parameter values and the proposal distribution. [Theta,accept] = simulate(Mdl,y,theta0,Proposal); accept accept = 0.4010
Theta is a 4-by-1000 matrix of randomly drawn parameters from the posterior distribution. Rows correspond to the elements of the input argument theta of the functions paramMap and priorDistribution. accept is the proposal acceptance probability. In this case, simulate accepts 40% of the proposal draws. Local Functions This example uses the following functions. paramMap is the parameter-to-matrix mapping function and priorDistribution is the log prior distribution of the parameters. function [A,B,C,D,Mean0,Cov0,StateType] = paramMap(theta) A = [theta(1) 0; 0 theta(2)]; B = [theta(3) 0; 0 theta(4)]; C = [1 1]; D = 0; Mean0 = []; % MATLAB uses default initial state mean
12-2570
tune
Cov0 = []; % MATLAB uses initial state covariances StateType = [0; 0]; % Two stationary states end function logprior = priorDistribution(theta) paramconstraints = [(abs(theta(1)) >= 1) (abs(theta(2)) >= 1) ... (theta(3) < 0) (theta(4) < 0)]; if(sum(paramconstraints)) logprior = -Inf; else mu0 = 0.5*ones(numel(theta),1); sigma0 = 1; p = normpdf(theta,mu0,sigma0); logprior = sum(log(p)); end end
Suppress Tuning Displays and Specify Hessian Method Consider the following time-varying, state-space model for a DGP: • From periods 1 through 250, the state equation includes stationary AR(2) and MA(1) models, respectively, and the observation model is the weighted sum of the two states. • From periods 251 through 500, the state model includes only the first AR(2) model. • μ0 = 0 . 5 0 . 5 0 0 and Σ0 is the identity matrix. Symbolically, the DGP is x1t x2t x3t
=
x4t
ϕ1 ϕ2 0 0 x1, t − 1 1 0 0 0 x2, t − 1
σ1 0 0 0 u1t 0 1 u2t for t = 1, . . . , 250, 0 1
+
0 0 0 θ x3, t − 1 0 0 0 0 x4, t − 1
yt = c1 x1t + x3t + σ2εt . x1, t − 1 x1t x2t
=
ϕ1 ϕ2 0 0 x2, t − 1
σ1
+
1 0 0 0 x3, t − 1
0
u1t
for t = 251,
x4, t − 1 yt = c2x1t + σ3εt . x1t x2t
=
ϕ1 ϕ2 x1, t − 1 1 0 x2, t − 1
+
σ1 0
u1t
for t = 252, . . . , 500 .
yt = c2x1t + σ3εt . where: • The AR(2) parameters ϕ1, ϕ2 = 0 . 5, − 0 . 2 and σ1 = 0 . 4. • The MA(1) parameter θ = 0 . 3. 12-2571
12
Functions
• The observation equation parameters c1, c2 = 2, 3 and σ2, σ3 = 0 . 1, 0 . 2 . Write a function that specifies how the parameters theta and sample size T map to the state-space model matrices, the initial state moments, and the state types. Save this code as a file named timeVariantParamMapBayes.m on your MATLAB® path. Alternatively, open the example to access the function. type timeVariantParamMapBayes.m % Copyright 2022 The MathWorks, Inc. function [A,B,C,D,Mean0,Cov0,StateType] = timeVariantParamMapBayes(theta,T) % Time-variant, Bayesian state-space model parameter mapping function % example. This function maps the vector params to the state-space matrices % (A, B, C, and D), the initial state value and the initial state variance % (Mean0 and Cov0), and the type of state (StateType). From periods 1 % through T/2, the state model is a stationary AR(2) and an MA(1) model, % and the observation model is the weighted sum of the two states. From % periods T/2 + 1 through T, the state model is the AR(2) model only. The % log prior distribution enforces parameter constraints (see % flatPriorBSSM.m). T1 = floor(T/2); T2 = T - T1 - 1; A1 = {[theta(1) theta(2) 0 0; 1 0 0 0; 0 0 0 theta(4); 0 0 0 0]}; B1 = {[theta(3) 0; 0 0; 0 1; 0 1]}; C1 = {theta(5)*[1 0 1 0]}; D1 = {theta(6)}; Mean0 = [0.5 0.5 0 0]; Cov0 = eye(4); StateType = [0 0 0 0]; A2 = {[theta(1) theta(2) 0 0; 1 0 0 0]}; B2 = {[theta(3); 0]}; A3 = {[theta(1) theta(2); 1 0]}; B3 = {[theta(3); 0]}; C3 = {theta(7)*[1 0]}; D3 = {theta(8)}; A = [repmat(A1,T1,1); A2; repmat(A3,T2,1)]; B = [repmat(B1,T1,1); B2; repmat(B3,T2,1)]; C = [repmat(C1,T1,1); repmat(C3,T2+1,1)]; D = [repmat(D1,T1,1); repmat(D3,T2+1,1)]; end
Simulate a response path of length 500 from the model. params = [0.5; -0.2; 0.4; 0.3; 2; 0.1; 3; 0.2]; numObs = 500; numParams = numel(params); [A,B,C,D,mean0,Cov0,stateType] = timeVariantParamMapBayes(params,numObs); DGP = ssm(A,B,C,D,Mean0=mean0,Cov0=Cov0,StateType=stateType); rng(1) % For reproducibility y = simulate(DGP,numObs); plot(y) ylabel("y")
12-2572
tune
Write a function that specifies a flat prior distribution on the state-space model parameters theta. The function returns the scalar log prior for an input set of parameters. Save this code as a file named flatPriorBSSM.m on your MATLAB® path. Alternatively, open the example to access the function. type flatPriorBSSM.m % Copyright 2022 The MathWorks, Inc. function logprior = flatPriorBSSM(theta) % flatPriorBSSM computes the log of the flat prior density for the eight % variables in theta (see timeVariantParamMapBayes.m). Log probabilities % for parameters outside the parameter space are -Inf. % theta(1) and theta(2) are lag 1 and lag 2 terms in a stationary AR(2) % model. The eigenvalues of the AR(1) representation need to be within % the unit circle. evalsAR2 = eig([theta(1) theta(2); 1 0]); evalsOutUC = sum(abs(evalsAR2) >= 1) > 0; % Standard deviations of disturbances and errors (theta(3), theta(6), % and theta(8)) need to be positive. nonnegsig1 = theta(3) 1, and cell j contains a qj-vector, j = 1,...,k, then the software conducts k independent Wald tests. Each qj must be less than the number of unrestricted model parameters. Data Types: double | cell R — Restriction function Jacobians row vector | matrix | cell vector of row vectors or matrices Restriction function Jacobians, specified as a row vector, matrix, or cell vector of row vectors or matrices. • Suppose r1,...,rq are the q restriction functions, and the unrestricted model parameters are θ1,...,θp. Then, the restriction function Jacobian is 12-2683
12
Functions
∂r1 ∂r1 … ∂θ1 ∂θp R=
⋮ ⋱ ⋮ . ∂rq ∂rq ⋯ ∂θ1 ∂θp
• If R is a q-by-p matrix or a singleton cell array containing a q-by-p matrix, then the software conducts one Wald test. q must be less than p, which is the number of unrestricted model parameters. • If R is a cell vector of length k > 1, and cell j contains a qj-by-pj matrix, j = 1,...,k, then the software conducts k independent Wald tests. Each qj must be less than pj, which is the number of unrestricted parameters in model j. Data Types: double | cell EstCov — Unrestricted model parameter covariance estimate matrix | cell vector of matrices Unrestricted model parameter covariance estimates, specified as a matrix or cell vector of matrices. • If EstCov is a p-by-p matrix or a singleton cell array containing a p-by-p matrix, then the software conducts one Wald test. p is the number of unrestricted model parameters. • If EstCov is a cell vector of length k > 1, and cell j contains a pj-by-pj matrix, j = 1,...,k, then the software conducts k independent Wald tests. Each pj is the number of unrestricted parameters in model j. Data Types: double | cell alpha — Nominal significance levels 0.05 (default) | scalar | vector Nominal significance levels for the hypothesis tests, specified as a scalar or vector. Each element of alpha must be greater than 0 and less than 1. When conducting k > 1 tests, • If alpha is a scalar, then the software expands it to a k-by-1 vector. • If alpha is a vector, then it must have length k. Data Types: double
Output Arguments h — Test rejection decisions logical | vector of logicals Test rejection decisions, returned as a logical value or vector of logical values with a length equal to the number of tests that the software conducts. • h = 1 indicates rejection of the null, restricted model in favor of the alternative, unrestricted model. 12-2684
waldtest
• h = 0 indicates failure to reject the null, restricted model. pValue — Test statistic p-values scalar | vector Test statistic p-values, returned as a scalar or vector with a length equal to the number of tests that the software conducts. stat — Test statistics scalar | vector Test statistics, returned as a scalar or vector with a length equal to the number of tests that the software conducts. cValue — Critical values scalar | vector Critical values determined by alpha, returned as a scalar or vector with a length equal to the number of tests that the software conducts.
More About Wald Test The Wald test compares specifications of nested models by assessing the significance of q parameter restrictions to an extended model with p unrestricted parameters. The test statistic is W = r′ RΣθ R′
−1
r,
where • r is the restriction function that specifies restrictions of the form r(θ) = 0 on parameters θ in the unrestricted model, evaluated at the unrestricted model parameter estimates. In other words, r maps the p-dimensional parameter space to the q-dimensional restriction space. In practice, r is a q-by-1 vector, where q < p. Usually, r = θ − θ0, where θ is the unrestricted model parameter estimates for the restricted parameters and θ0 holds the values of the restricted model parameters under the null hypothesis. • R is the restriction function Jacobian evaluated at the unrestricted model parameter estimates. • Σ θ is the unrestricted model parameter covariance estimator evaluated at the unrestricted model parameter estimates. • W has an asymptotic chi-square distribution with q degrees of freedom. When W exceeds a critical value in its asymptotic distribution, the test rejects the null, restricted hypothesis in favor of the alternative, unrestricted hypothesis. The nominal significance level (α) determines the critical value.
12-2685
12
Functions
Note Wald tests depend on the algebraic form of the restrictions. For example, you can express the restriction ab = 1 as a – 1/b = 0, or b – 1/a = 0, or ab – 1 = 0. Each formulation leads to different test statistics.
Tips • Estimate unrestricted univariate linear time series models, such as arima or garch, or time series regression models (regARIMA) using estimate. Estimate unrestricted multivariate linear time series models, such as varm or vecm, using estimate. estimate returns parameter estimates and their covariance estimates, which you can process and use as inputs to waldtest. • If you cannot easily compute restricted parameter estimates, then use waldtest. By comparison: • lratiotest requires both restricted and unrestricted parameter estimates. • lmtest requires restricted parameter estimates.
Algorithms • waldtest performs multiple, independent tests when the restriction function vector, its Jacobian, and the unrestricted model parameter covariance matrix (r, R, and EstCov, respectively) are equal-length cell vectors. • If EstCov is the same for all tests, but r varies, then waldtest “tests down” against multiple restricted models. • If EstCov varies among tests, but r does not, then waldtest “tests up” against multiple unrestricted models. • Otherwise, waldtest compares model specifications pair-wise. • alpha is nominal in that it specifies a rejection probability in the asymptotic distribution. The actual rejection probability is generally greater than the nominal significance. • The Wald test rejection error is generally greater than the likelihood ratio and Lagrange multiplier test rejection errors.
Version History Introduced in R2009a
References [1] Davidson, R. and J. G. MacKinnon. Econometric Theory and Methods. Oxford, UK: Oxford University Press, 2004. [2] Godfrey, L. G. Misspecification Tests in Econometrics. Cambridge, UK: Cambridge University Press, 1997. [3] Greene, W. H. Econometric Analysis. 6th ed. Upper Saddle River, NJ: Pearson Prentice Hall, 2008. [4] Hamilton, J. D. Time Series Analysis. Princeton, NJ: Princeton University Press, 1994. 12-2686
waldtest
See Also Objects arima | varm | garch | regARIMA | vecm Functions lmtest | estimate | estimate | estimate | estimate Topics “Conduct Wald Test” on page 3-64 “Classical Model Misspecification Tests” on page 3-69 “Time Series Regression IX: Lag Order Selection” on page 5-264
12-2687
A
Appendices
A
Data Sets and Examples
Data Sets and Examples In this section... “Data Sets” on page A-2 “Featured Examples” on page A-3 Econometrics Toolbox features historical data sets and examples for use with its functions. The data sets include a diverse collection of macroeconomic time series you can use to estimate models, experiment, or establish benchmarks. The featured examples demonstrate common workflows in econometric analysis and explore connections among related toolbox functions by using the data sets.
Data Sets Generally, each data set is a MAT file containing the following variables: • Data — A matrix of data. Each column is a variable (time series). Each row contains associated observations of the variables. • DataTable — A table of data. DataTable contains the same observations and has the same dimensionality as Data. • DataTimeTable — A timetable of data. DataTimeTable contains the same observations and has the same dimensionality as Data. • Description — Textual data set description, including data set variable definitions and references. • series — Vector of descriptive variable names. To load the variables of a data set into the MATLAB Workspace, enter the following command at the MATLAB command line, where DataSetName is one of the MAT files in the table. load DataSetName
A-2
Data Set Name
Description
Data_Accidental
Monthly number of accidental deaths in the US, 1973–1978
Data_Airline
Monthly number of international airline passengers, 1949–1960
Data_Canada
Canadian inflation and interest rates, 1954–1994
Data_Consumption
US food consumption, 1927–1962
Data_CreditDefaults
Investment-grade corporate bond defaults and four predictors, 1984–2004
Data_Danish
Danish stock returns, bond yields, 1922–1999
Data_DieboldLi
U.S. Treasury unsmoothed Fama-Bliss zero-coupon yields and macroeconomic factors, 1972–2000
Data_ElectricityPrices
Simulated daily electricity spot prices, 2010–2013
Data_EquityIdx
U.S. equity indices, 1990–2001
Data_FXRates
Currency exchange rates, 1979–1998
Data_GDP
U.S. Gross Domestic Product, 1947–2005
Data_GlobalIdx1
Global large-cap equity indices, 1993–2003
Data Sets and Examples
Data Set Name
Description
Data_GNP
U.S. Gross National Product, 1947–2005
Data_Income1
Simulated data on income and education
Data_Income2
Average annual earnings by educational attainment in eight workforce age categories
Data_JAustralian
Johansen's Australian data, 1972–1991
Data_JDanish
Johansen's Danish data, 1974–1987
Data_MarkPound
Deutschmark/British Pound foreign-exchange rate, 1984–1991
Data_NelsonPlosser
Macroeconomic series of Nelson and Plosser, 1860–1970
Data_Overshort
Daily overshorts from an underground gasoline tank in Colorado. 57 consecutive days
Data_PowerConsumption
Canadian electrical power consumption and GDP, 1960–2009
Data_Recessions
U.S. recession start and end dates, 1857–2022
Data_SchwertMacro
Macroeconomic series of Schwert, 1947–1985
Data_SchwertStock
Indices of U.S. stock prices, 1871–2008
Data_TBill
Three-month U.S. treasury bill secondary market rates, 1947– 2005
Data_USEconModel
U.S. macroeconomic series, 1947–2009
Data_USEconVECModel
U.S. macroeconomic series 1957–2016 and projections for the following 10 years from the Congressional Budget Office
Unlisted data sets that you load in some examples are accessible only through the documentation. For help installing the Econometrics Toolbox documentation and those data sets, see “Help Preferences”.
Featured Examples You can access an example by clicking its title in the table. Then, to open the example script, click Open Live Script . Alternatively, if you have the Econometrics Toolbox documentation installed, you can open an example script by entering the following command at the MATLAB command line, where exampleName is the example name in the table. openExample('econ/exampleName')
Example Name
Title
Description
AnalyzeLinearizedDSGEMod “Analyze Linearized DSGE elsExample Models” on page 11-190
Analyze the dynamic stochastic general equilibrium (DSGE) model in [76] by using Bayesian state-space model tools.
Demo_ClassicalTests
Perform classical model misspecification tests.
“Classical Model Misspecification Tests” on page 3-69
A-3
A
A-4
Data Sets and Examples
Example Name
Title
Description
Demo_DieboldLiModel
“Apply State-Space Methodology Analyze the popular Diebold-Li to Analyze Diebold-Li Yield yields-only and yields-macro Curve Model” on page 11-160 models of monthly yield-curve time series derived from U.S. Treasury bills and bonds by using state-space models and the Kalman filter.
Demo_HPFilter
“Use Hodrick-Prescott Filter to Reproduce Original Result” on page 2-29
Demo_RiskFHS
“Using Bootstrapping and Use bootstrapping and filtered Filtered Historical Simulation to historical simulation to evaluate Evaluate Market Risk” on page market risk 8-102
Demo_RiskEVT
“Using Extreme Value Theory Use extreme value theory and and Copulas to Evaluate Market copulas to evaluate market risk Risk” on page 8-114
Demo_TSReg1
“Time Series Regression I: Linear Models” on page 5-176
Introduce basic assumptions behind multiple linear regression models
Demo_TSReg2
“Time Series Regression II: Collinearity and Estimator Variance” on page 5-183
Detect correlation among predictors and accommodating problems of large estimator variance
Demo_TSReg3
“Time Series Regression III: Influential Observations” on page 5-193
Detect influential observations in time series data and accommodating their effect on multiple linear regression models
Demo_TSReg4
“Time Series Regression IV: Investigate trending variables, Spurious Regression” on page 5- spurious regression, and 200 methods of accommodation in multiple linear regression models
Demo_TSReg5
“Time Series Regression V: Predictor Selection” on page 5212
Select a parsimonious set of predictors with high statistical significance for multiple linear regression models
Demo_TSReg6
“Time Series Regression VI: Residual Diagnostics” on page 5-223
Evaluate model assumptions and investigating respecification opportunities by examining the series of residuals
Use the Hodrick-Prescott filter to reproduce their original result.
Data Sets and Examples
Example Name
Title
Description
Demo_TSReg7
“Time Series Regression VII: Forecasting” on page 5-234
Present the basic setup for producing conditional and unconditional forecasts from multiple linear regression models
Demo_TSReg8
“Time Series Regression VIII: Examine how lagged predictors Lagged Variables and Estimator affect least-squares estimation Bias” on page 5-243 of multiple linear regression models
Demo_TSReg9
“Time Series Regression IX: Lag Illustrate predictor history Order Selection” on page 5-264 selection for multiple linear regression models
Demo_TSReg10
“Time Series Regression X: Estimate multiple linear Generalized Least Squares and regression models of time series HAC Estimators” on page 5-282 data in the presence of heteroscedastic or autocorrelated innovations
Demo_USEconModel
“Model the United States Economy” on page 9-150
ModelAndSimulateElectric “Model and Simulate Electricity itySpotPricesUsingSkewNo Spot Prices Using the SkewrmalExample Normal Distribution” on page 7184
Model the U.S. economy using a VEC model as a linear alternative to the SmetsWouters DSGE macroeconomic model Simulate the future behavior of electricity spot prices from a time series model fitted to historical data, and use the skew normal distribution to model the innovations process.
See Also More About •
“Create Tables and Assign Data to Them”
•
“Resample and Aggregate Data in Timetable”
•
“Combine Timetables and Synchronize Their Data”
•
“Clean Timetable with Missing, Duplicate, or Nonuniform Times”
A-5