
Econometric Analysis

Econometric Analysis: An Applied Approach to Business and Economics By

Sharif Hossain

Econometric Analysis: An Applied Approach to Business and Economics

By Sharif Hossain

This book first published 2024

Cambridge Scholars Publishing
Lady Stephenson Library, Newcastle upon Tyne, NE6 2PA, UK

British Library Cataloguing in Publication Data
A catalogue record for this book is available from the British Library

Copyright © 2024 by Sharif Hossain

All rights for this book are reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording or otherwise, without the prior permission of the copyright owner.

ISBN (10): 1-5275-6337-5
ISBN (13): 978-1-5275-6337-7

TABLE OF CONTENTS

Preface ........ viii

Chapter One ........ 1
Introduction: Basic Concepts and Ideas
1.1 Introduction
1.2 Meaning of Econometrics
1.3 Meaning of Theoretical Model
1.4 Ideas behind Econometric Modeling
1.5 Types of Relationships
1.6 Random Error Term
1.7 Data Associated with Econometric Model
1.8 Variables
Exercises

Chapter Two ........ 15
Simple Regression Models
2.1 Introduction
2.2 Simple Linear Regression Models and Estimation Methods
2.3 Properties of the Least Squares Estimators
2.4 Sampling Distribution of the Least Squares Estimators
2.5 Student t-Test for the Least Squares Estimators
2.6 Confidence Interval Estimation for the Population Parameter βj (j = 0, 1)
2.7 Partitioning Total Sum of Squares
2.8 Coefficient of Determination
2.9 Relationship between Regression Parameter and Elasticity
2.10 Relationship between Correlation Coefficient and Regression
2.11 The Capital Asset Pricing Model (CAPM)
2.12 Two Variables Non-linear Relationships
Exercises

Chapter Three ........ 75
Multiple Regression Models
3.1 Introduction
3.2 Multiple Regressions
3.3 Three Variables Linear Regression Model
3.4 Multiple Linear Regression Model
3.5 Properties of the Ordinary Least Squares (OLS) Estimators
3.6 Maximum Likelihood Estimation of the Parameters of Multiple Linear Regression Equations
3.7 Properties of the Maximum Likelihood Estimators
3.8 Some Important Theorems of the Estimators
3.9 Sampling Distribution of s², and the Mean and Variance of s²
3.10 Important Properties of Ŷ
3.11 Important Properties of Residuals
3.12 Characteristics of the Coefficient of Multiple Determination (R²)
3.13 Steps That Are Involved in Fitting a Multiple Linear Regression Model (Without Distributional Assumptions of the Random Error Term)
3.14 Steps That Are Involved in Fitting a Multiple Linear Regression Model (With Distributional Assumptions of the Random Error Term)
3.15 Show that r²yŷ = R², where ryŷ is the Correlation Coefficient between y and Ŷ
3.16 Analysis of Variance
3.17 Test of Significance of Parameter Estimates of Multiple Linear Regression Models
3.18 Multivariate Non-linear Relationships
3.19 Testing for Non-Linearity
Exercises


Chapter Four ........ 166
Heteroscedasticity
4.1 Introduction: Meaning of Heteroscedasticity
4.2 Structural Forms of Heteroscedasticity
4.3 Possible Reasons for Heteroscedasticity
4.4 Nature of Heteroscedasticity
4.5 Consequences of Heteroscedasticity
4.6 Properties of the OLS Estimator of β when Var(εi) = σi² (i = 1, 2, ....., n): Effects on the Properties of the OLS Estimators
4.7 Methods for Detecting the Presence of Heteroscedasticity
4.8 Estimation of the Heteroscedastic Model
Exercises

Chapter Five ........ 231
Autocorrelation
5.1 Introduction
5.2 Meaning of Autocorrelation
5.3 Sources of Autocorrelation
5.4 Mean, Variance and Covariance of Autocorrelated Random Error Terms
5.5 Consequences of Autocorrelation
5.6 Properties of the OLS Estimator of β when Random Error Terms are Autocorrelated: Effects on the Properties of the OLS Estimators
5.7 OLS Estimation with Lagged Dependent Variables when Random Error Terms are Autocorrelated
5.8 Methods for Detecting the Presence of Autocorrelation
5.9 Methods for Estimating Autocorrelated Models
Exercises

Chapter Six ........ 283
Multicollinearity
6.1 Introduction
6.2 Some Important Concepts
6.3 Sources of Multicollinearity
6.4 Consequences of Multicollinearity
6.5 Properties of the OLS Estimators: Effects on the Properties of the OLS Estimators
6.6 Some Important Theorems
6.7 Detection of Multicollinearity
6.8 Solutions to the Problem of Multicollinearity
Exercises

Chapter Seven ........ 333
Selection of Best Regression Equation and Diagnostic Testing
7.1 Introduction
7.2 Important Techniques to Select the Best Regression Equation
7.3 Model Specification and Diagnostic Testing
Exercises

Chapter Eight ........ 359
Time Series Econometrics
8.1 Introduction
8.2 Some Fundamental Concepts and Ideas
8.3 Time Series Econometric Models
8.4 Autoregressive (AR) Process
8.5 Moving Average (MA) Process
8.6 Partial Autocorrelation Function
8.7 Sample ACF and PACF Plots for Different Processes
8.8 Autoregressive and Moving Average (ARMA) Process
8.9 Model Selection Criteria
8.10 Maximum Likelihood (ML) Method for Estimating AR Processes
8.11 Maximum Likelihood (ML) Method for Estimating MA Processes
8.12 Methods for Estimating ARMA Models
8.13 Autoregressive Conditional Heteroscedastic (ARCH) Models
8.14 Generalized Autoregressive Conditional Heteroscedasticity (GARCH) Model
8.15 The GARCH-in-Mean (GARCH-M) Models
Exercises


Chapter Nine ........ 422
Univariate Time Series Econometrics with Unit Roots
9.1 Introduction
9.2 Meaning of the Unit Root
9.3 Distribution and Rates of Convergence of the OLS Estimated Coefficients of Unit Root Processes
9.4 Brownian Motion: Continuous Time Stochastic Processes
9.5 The Functional Central Limit Theorem (FCLT)
9.6 Asymptotic Theory for Integrated Processes
9.7 Unit Root Tests
9.8 Summary of Dickey-Fuller Tests for Unit Roots in a First-Order Autoregressive Model (Absence of Serial Correlation)
9.9 Unit Root Tests with Serially Correlated Errors
9.10 Summary of Phillips-Perron Tests for Unit Roots in a First-Order Autoregressive Model
9.11 General AR Processes with a Unit Root: The Augmented Dickey-Fuller Test
9.12 Summary of the ADF Tests for Unit Roots in Autoregressive Equation
9.13 Kwiatkowski, Phillips, Schmidt, and Shin (KPSS) Tests for Stationarity
Exercises

Chapter Ten ........ 493
Multivariate Time Series Models
10.1 Introduction
10.2 Dynamic Models with Stationary Variables
10.3 Spurious Regressions
10.4 Cointegration
10.5 Vector Autoregressive (VAR) Models
10.6 Seemingly Unrelated Regression Equations (SURE) Models
10.7 Error Correction Mechanism (ECM)
10.8 Causality Tests
Exercises

Chapter Eleven ........ 567
Limited Dependent Variable Models
11.1 Introduction
11.2 Dummy Dependent Variable
11.3 Dummy Dependent Variable Models
11.4 Goodness of Fit of Limited Dependent Variable Models
11.5 Censoring and Truncation
11.6 Poisson Regression Models
Exercises

Chapter Twelve ........ 636
Regression Models Based on Panel Data
12.1 Introduction
12.2 Panel Data Regression Models
12.3 Treatment of Individual Effect
12.4 Instrumental Variables and GMM Estimation
12.5 Panel Unit Root Tests
12.6 Panel Data Cointegration Analysis

Appendices: Statistical Tables ........ 695
References ........ 703
Subject Index ........ 707
Greek Letters and Mathematical Symbols Used in the Book ........ 728

PREFACE

During the last four decades, the use of econometric tools and techniques in different areas of human activity, irrespective of discipline, has been increasing rapidly for solving problems, making appropriate decisions and designing policy. It is now generally accepted that the study of modern business, economics, socio-economics, finance, banking and management is incomplete without proper knowledge of basic and advanced econometric tools and techniques. Given the increasing complexity and variety of problems in the fields of business, economics, socio-economics, finance, banking, management, medical sciences, psychology, sociology and the natural sciences, students and researchers without a fair knowledge of econometrics may not be able to cope, and hence may remain unfamiliar with many aspects of business, economic, socio-economic, financial, banking, managerial and scientific problems. Thus, the book “Econometric Analysis: An Applied Approach to Business and Economics” is written for basic and advanced econometrics courses at the undergraduate and graduate levels for students of business, economics, socio-economics, finance, banking, management and other social sciences, as well as for researchers already engaged in these fields who desire an introduction to econometric methods and their applications for solving real-life problems and reforming policies. In writing the first edition of this book, I aim to offer students and researchers a balanced presentation of fundamental and advanced econometric concepts and methods, along with practical advice on their effective application to real-life problems. This book covers almost all aspects of econometric techniques, giving priority to modelling, inferential statistics, estimation, and testing of statistical hypotheses based on cross-sectional, time-series and panel data. It includes the basic concepts and ideas of econometrics; simple and multiple linear and non-linear regression analyses; the problems of single-equation models: heteroscedasticity, autocorrelation and multicollinearity; selection of the best regression equation along with diagnostic tests; univariate and multivariate time-series econometric models; limited dependent-variable models; and econometric analyses of panel data models. Econometrics is frequently concerned with modelling how dependent and independent variables are related to each other. The principal objective of econometrics is to quantify the relationships between dependent and independent variables based on numerical data using statistical techniques, and the resulting estimates play a significant role in appropriate decision-making and policy reform. In general, econometrics deals with macroeconomic relationships to reform policies for the economic development of any society. Nowadays, econometric tools and techniques are widely applicable, and they play an important role in almost every field of social, business, economic and scientific research and in the empirical verification of theoretical constructs. The wide application of econometrics in the field of finance has given rise to a new discipline, namely Financial Econometrics. A preliminary study of econometrics requires a clear idea of the basic concepts and ideas.
Keeping that in mind, the fundamental concepts and ideas of econometrics are discussed first, along with the different functions associated with regression modelling. The book is organised as follows. Some basic concepts and ideas of econometrics are introduced in Chapter 1. In Chapter 2, simple linear and non-linear regression models are presented, along with how to fit them to a set of data points using different estimation methods, namely the ordinary least-squares (OLS) method, the maximum likelihood method, and the method of moments. The important properties of the OLS estimators are discussed along with their proofs. It is then shown how to judge whether a relationship between a dependent variable (y) and an independent variable (x) is statistically significant, how to use the model to estimate the expected value of y, and how to forecast a future value of y for a given value of x. Proper justifications for the techniques employed in regression analysis are also provided, along with numerical problems. In Chapter 3, multiple linear and non-linear regression models are presented together with their estimation techniques. The important properties of the OLS and ML estimators are discussed with their proofs, and it is shown how to judge whether the relationship between a dependent variable (y) and independent variables X’s is statistically significant. The Wald test, the F-test, and the LM test are also discussed for testing joint null hypotheses and general linear hypotheses. This chapter discusses some fundamental theorems based on the OLS estimators and their proofs, and presents the ANOVA tables used in fitting multiple regression equations in different cases. All the techniques are described along with their applications to numerical problems. In Chapters 4, 5, and 6, the problems of a single-equation model, namely heteroscedasticity, autocorrelation and multicollinearity, are presented along with their consequences and proofs. Different tests for detecting heteroscedasticity, autocorrelation and multicollinearity, and different techniques for estimating single-equation models with these problems, are discussed. All the techniques are presented along with their applications to numerical problems.


In Chapter 7, different criteria for selecting the best regression equation are presented along with the diagnostic tests, and all the techniques are discussed along with their applications to numerical problems. In Chapter 8, different time-series econometric models such as the AR, MA, ARMA, ARIMA, ARCH, GARCH, EGARCH, MGARCH and GARCH-in-Mean models are presented along with their important properties, estimation techniques, and applications to numerical problems. We know that the usual techniques of regression analysis can produce highly misleading conclusions when the time-series variables contain stochastic trends (Stock and Watson (1988), Nelson and Kang (1981), Granger and Newbold (1974)). In particular, if the dependent variable and at least one independent variable contain a stochastic trend, and if they are not cointegrated, the regression results are spurious (Phillips (1986), Granger and Newbold (1974)). Therefore, to identify the correct specification of the model, it is very important to investigate the presence of stochastic trends in the time-series variables. In Chapter 9, the most popular and widely applicable tests of whether time-series data contain a stochastic trend, or unit root, namely the Dickey-Fuller (DF), Phillips-Perron (PP), Augmented Dickey-Fuller (ADF), and Kwiatkowski, Phillips, Schmidt and Shin (KPSS) tests, are discussed with their derivations. The DF, ADF and PP tests are discussed in four different cases. First, the meaning of a unit root is explained with a numerical problem, and later the convergence rates of the OLS estimators of unit root processes are discussed. Brownian motion and the functional central limit theorem are also discussed in this chapter, together with different unit root tests for different cases and their applications to numerical problems. Multivariate time-series models are very important relative to univariate time-series models because they may improve forecasts, so that appropriate decisions can be taken and appropriate policies formulated. Therefore, in Chapter 10, multivariate time-series models such as dynamic, VAR, VEC and ARDL models are presented. The Engle-Granger test procedure is presented for detecting causal relationships between pairs of variables. Three different types of models are presented for detecting the direction of causality between two variables X and Y, depending upon the order of integration and the presence or absence of a cointegrating relationship. The Toda-Yamamoto approach is also discussed for detecting the direction of causality between two variables X and Y when X and Y may be integrated of different orders, or not integrated at all. All the techniques are discussed with their applications to numerical problems. In Chapter 11, the most important and widely applicable limited dependent variable regression models, namely the linear probability, logit, probit and Tobit models, are presented for modelling the probability of an event associated with different independent variables, along with their important properties and estimation techniques. The censored and truncated regression models are also presented along with their important properties and estimation techniques.
In this chapter, the Poisson regression model is also presented, along with its important properties and estimation techniques, to deal with count dependent variables. All the techniques are discussed with their applications to numerical problems. Sometimes, we have to deal with panel data for econometric analyses of economic relationships in advanced study programmes. Therefore, in Chapter 12, an econometric analysis of panel data models is presented. The most popular and widely applicable panel-data regression models, the fixed-effects model and the random-effects model, are presented along with their important properties and estimation techniques. Different panel unit-root tests and cointegration tests are also presented, and all the techniques are discussed with their applications to numerical problems. Undergraduate and graduate students who study this book should come to see that econometric tools and techniques are fundamental and important for an in-depth understanding of the subject matter and for dealing with real-life problems irrespective of discipline. They should be able to formulate, estimate, test and interpret suitable models for empirical studies in any discipline; to develop the ability to use econometric argumentation in the exposition of economic, socio-economic and business problems; to apply different econometric tools and techniques for solving different kinds of business, economic, socio-economic, financial, banking and managerial problems based on cross-sectional, time-series and panel data, and for appropriate decision and policy making; to specify a time-series model for explanation or forecasting with an eye on the problems of non-stationarity, spuriousness and seasonality; to apply different univariate time-series models such as the AR, MA, ARMA, ARIMA, ARCH, GARCH, EGARCH and GARCH-in-Mean models to real-life financial problems; to apply different multivariate time-series econometric models such as the VAR, VEC and SURE models to real-life problems and to formulate short-run as well as long-run policies; to apply different econometric tools and techniques for panel-data analyses; to get ideas about many software-based applications such as EViews, RATS and STATA for dealing with numerical problems in any discipline; to apply econometric tools and techniques in writing their Master’s and Ph.D. theses; and to apply econometric methods in different research projects. Thus, it can be said that the book will be appreciated by undergraduate and graduate students, professionals, trainers, and researchers working in the fields of business, economics, socio-economics, finance, banking, management and other physical and natural sciences. It is also a substantial addition to the voluminous literature on cross-sectional, time-series and panel data regression models, and formally trained econometricians and statisticians interested in the many applications of econometric tools and techniques to real-life problems might find it helpful. I would like to express my thanks to my student Bablu Nasir for his sincere cooperation. I would also like to acknowledge my gratitude to Cambridge Scholars Publishing and its editors and staff for taking special care in the publication of this book.

CHAPTER ONE

INTRODUCTION: BASIC CONCEPTS AND IDEAS

1.1 Introduction

Econometrics is frequently interested in illustrating the behaviour of the relationships between dependent and independent economic variables. Thus, it can be said that the main concern of econometrics is to describe, through modelling, how the dependent and independent variables are related to each other. The principal objective of econometrics is to quantify the relationships between dependent and independent variables based on numerical data, using statistical tools and techniques to ease interpretation. The resulting outcomes play an important role in appropriate decision-making and policy formulation. In general, econometrics deals with macroeconomic relationships, but nowadays econometric tools and techniques are also widely used in almost every field of social science, business, finance and economic research, and for the empirical verification of theoretical constructs. The wide application of econometrics in the field of finance has given rise to a new discipline called Financial Econometrics. Studying econometrics introduces tools and techniques which are fundamental and important for an in-depth understanding of, and for dealing with, business, economic, socio-economic, financial and scientific problems; for formulating, estimating, testing and interpreting suitable models for the empirical study of economic, socio-economic, financial and business phenomena; for developing the ability to use econometric argumentation in the exposition of economics, socio-economics, finance and business; for applying different econometric tools and techniques to solve different kinds of economic, socio-economic, financial and business problems and for appropriate decision-making and policy formulation; for specifying a time-series model for explanation or forecasting with an eye on the problems of non-stationarity, spurious regression and seasonality; for applying econometric tools and techniques to panel-data analyses; for applying different software packages, including EViews, GAUSS, LIMDEP, Python, R, RATS, SPSS, STATA, etc., to solve problems in any discipline; for applying econometric methods in Master’s and Ph.D. theses; and for applying econometric methods in research projects. Therefore, to study econometrics, we first have to get a clear idea about some basic concepts and ideas. In this chapter, some preliminary concepts and ideas associated with the study of econometrics are discussed, along with different functions associated with regression modelling.

1.2 Meaning of Econometrics

Econometrics is defined as the mathematical, empirical or numerical measurement of economic relationships in which the dependent variable is a function of deterministic components plus a random error term. In general, econometrics is defined as the branch of economics which deals with the mathematical, empirical or numerical measurement of economic relationships. The main objective of econometrics is the empirical measurement of economic theories and their verification.

Ex. 1-1: Suppose that an advertising agency is interested in modelling the relationship between a firm’s sales revenue (Y) and advertising costs (X) to know the impact of advertising costs on sales revenue. The relationship between Y and X can be written as

Y = f(X, β0, β1) + ε                                        (1.1)

where f(X, β0, β1) is the deterministic component and ε is a random error term. The mathematical or empirical measurement of relationship (1.1) is called econometrics.
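To make the idea of "measuring" relationship (1.1) concrete, the short sketch below simulates hypothetical sales-advertising data from a linear version of the model, Y = β0 + β1X + ε, and recovers numerical estimates of β0 and β1 by ordinary least squares (the OLS method itself is developed in Chapter 2). The data, the parameter values, and the use of Python with NumPy and statsmodels are illustrative assumptions, not part of the text.

```python
# Illustrative sketch (not from the book): numerical measurement of
# Y = beta0 + beta1*X + epsilon using simulated advertising/sales data.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)

# Hypothetical "true" parameters and data-generating process
beta0_true, beta1_true = 50.0, 4.0          # assumed intercept and slope
X = rng.uniform(10, 100, size=200)          # advertising costs
eps = rng.normal(0, 15, size=200)           # random error term
Y = beta0_true + beta1_true * X + eps       # sales revenue

# Econometric measurement: estimate beta0 and beta1 by OLS
model = sm.OLS(Y, sm.add_constant(X)).fit()
print(model.params)    # estimated beta0 (const) and beta1 (slope)
```

The estimated coefficients should come out close to the assumed values of 50 and 4, which is exactly the sense in which econometrics "measures" the relationship numerically.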

1.3 Meaning of Theoretical Model

A theoretical model is defined as a functional relationship between economic variables in which one is the dependent variable and the remaining is/are independent variable(s). Theoretical models mainly deal with the development of appropriate methods for the mathematical, empirical or numerical measurement of economic relationships.


Ex. 1-2: The relationship between profits (Y) and sales revenue (X) of a firm can be expressed by the following theoretical model:

Y = f(X)                                        (1.2)

If Y linearly depends on X, then the theoretical model between Y and X is given by

Y = β0 + β1X                                        (1.3)

where β0 and β1 are the unknown and unsigned parameters connecting X and Y. β1 is the rate of change of Y with respect to X, and β0 is the value of Y at the origin.

Types of Theoretical Models

There are two different types of theoretical models: (i) the economic model or deterministic model; and (ii) the statistical model, econometric model or probabilistic model.

Economic Model or Deterministic Model

The economic or deterministic model is defined as a mathematical relationship between economic variables, in which one is the dependent variable and the remaining is/are independent variable(s), which involves questions concerning the signs and magnitudes of unknown parameters. An economic or deterministic model hypothesises an exact relationship between economic variables. Thus, we can say that, in the economic model, the dependent variable Y is a function only of the systematic or deterministic components. The mathematical form of an economic model is given by

Y = f(X1, X2, ....., Xk, β0, β1, ....., βk)                                        (1.4)

If the variable Y linearly depends on X1, X2, ……, and Xk, then the economic model of the above relationship can be written as:

Y = β0 + β1X1 + β2X2 + ....... + βkXk                                        (1.5)

where Y is the dependent variable and X1, X2, ……, and Xk are the independent variables. β1, β2, …, and βk are the unknown and unsigned parameters connecting the independent variables X1, X2, ……, Xk with the dependent variable Y. These unknown and unobservable parameters are directly or indirectly related to elasticities and multipliers that play significant roles in economic decisions and actions. The economic model helps us to identify the relevant economic variables and parameters and gives us a basis for making economic conclusions and decisions.

Ex. 1-3: The economic model of the quantity demanded of a product can be written as

Qd = f(P, Ps, Pc, Y, β0, β1, β2, β3, β4)                                        (1.6)

where Qd is the quantity demanded of the product; P is the per-unit price of the product; Ps is the per-unit price of substitute commodities; Pc is the per-unit price of complementary items; and Y is the income level of consumers. β1, β2, β3 and β4 are the unknown and unsigned parameters connecting the price and income variables P, Ps, Pc and Y with the quantity variable Qd. These unknown and unobservable parameters are directly or indirectly related to the elasticities of the quantity demanded of the product with respect to price and income, which play significant roles in economic decisions and actions concerning the demand function. If Qd linearly depends on P, Ps, Pc, and Y, relationship (1.6) can be expressed as

Qd = β0 + β1P + β2Ps + β3Pc + β4Y                                        (1.7)

β1 indicates the rate of change of Qd with respect to P, given that all other independent variables are constant; β2 indicates the rate of change of Qd with respect to Ps, given that all other independent variables are constant; β3 indicates the rate of change of Qd with respect to Pc, given that all other independent variables are constant; β4 indicates the rate of change of Qd with respect to Y, given that all other independent variables are constant; and β0 is the value of Qd when all the independent variables are zero.

Statistical or Econometric or Probabilistic Model


The statistical, econometric or probabilistic model is a mathematical relationship of economic variables in which the dependent variable Y can be expressed as a function of the systematic components and the nonsystematic random error term. The mathematical form of the statistical, econometric or probabilistic model is given by

Y = f(X1, X2, ......, Xk, β0, β1, β2, ....., βk) + ε                                        (1.8)

where f(X1, X2, ......, Xk, β0, β1, ......, βk) is the systematic component and ε is the nonsystematic component that we know is present but cannot be observed. This is called the random error term. If the variable Y linearly depends on X1, X2, ……, and Xk, the econometric model of the above relationship can be expressed as

Yi = β0 + β1X1i + β2X2i + ...... + βkXki + εi                                        (1.9)

where βj (j = 1, 2, ……, k) indicates the average impact on Y of a per-unit change in Xj, given that all other independent variables are constant; and β0 is the regression constant, which indicates the average value of Y when all X’s are zero. A statistical or econometric model specifies a sampling process by which the sample data are generated and identifies the unknown parameters β0, β1, ......, and βk included in the model that describe the underlying probability distribution.

Ex. 1-4: The econometric model between household consumption expenditure (C) and the level of income (Y) can be expressed as

C = f(Y, β0, β1) + ε                                        (1.10)

If C linearly depends on Y, the relationship can be written as

Ci = β0 + β1Yi + εi                                        (1.11)

where Ci indicates the consumption expenditure of the ith household; Yi indicates the income level of the ith household; and εi is the random error term corresponding to the ith set of observations (i = 1, 2, ....., N). β0 and β1 are the unknown and unsigned parameters connecting the income level (Y) and consumption expenditure (C). β1 is called the regression coefficient, which indicates the average impact on C of a per-unit change in Y, and β0 is called the regression constant, which indicates the average value of C when Y is zero.
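As an illustration of how the unknown parameters in (1.11) are recovered from data, the sketch below fits the household consumption equation by ordinary least squares on a small set of hypothetical income-expenditure pairs. The numbers and the use of Python with statsmodels are assumptions made for illustration only; the formal estimation methods are developed in Chapter 2.

```python
# Illustrative sketch (hypothetical data): estimating C_i = beta0 + beta1*Y_i + eps_i
import numpy as np
import statsmodels.api as sm

# Hypothetical household income (Y) and consumption expenditure (C)
Y = np.array([120, 150, 180, 210, 260, 300, 340, 400, 460, 520], dtype=float)
C = np.array([110, 135, 160, 185, 220, 250, 280, 325, 370, 410], dtype=float)

fit = sm.OLS(C, sm.add_constant(Y)).fit()
b0_hat, b1_hat = fit.params
print(f"estimated regression constant beta0 = {b0_hat:.2f}")
print(f"estimated regression coefficient beta1 = {b1_hat:.3f}")
# b1_hat is interpreted as the average change in consumption per unit change in income.
```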

1.4 Ideas behind Econometric Modeling

An economic model is a hypothetical or theoretical construct which represents quantitative or logical relationships between two or more economic variables, in which one is the dependent variable and the remaining is/are independent variable(s), describing the economic behaviour of an economic agent, of an economy, or of a sector of the economy. Economists frequently use economic models to make several simplified statements or assumptions and then to see how different scenarios might play out. These theoretical constructs make statements or hypotheses that are mostly qualitative in nature. For example, microeconomic theory states that an increase in the price of a commodity is expected to increase the quantity supplied of that commodity, given that other things are constant. Thus, the theoretical model illustrates a positive linear or non-linear relationship between the price and the quantity supplied of the commodity. However, this theoretical construct does not provide any numerical measure of the relationship between the two variables. It does not tell us how much the quantity supplied of the commodity will increase due to a per-unit change in the price of the commodity while all other independent variables are constant. The econometric model, by contrast, will provide us with numerical estimates. The econometric model gives the numerical, empirical or mathematical measurement of economic relationships. In addition, the main concern of mathematical economics is to express an economic theory in mathematical or equational form without regard to empirical verification of the theory, whereas econometric techniques are the outstanding methods for the empirical verification of economic theorems. The economic models are confronted with the observational data. Using econometric techniques, we try to explore whether the theory can adequately explain the behaviour of the actual economic units. The main concern of economic statistics is to collect numerical data, record them, and represent them in tabular forms or graphs. Economists and statisticians attempt to describe their changes over time and also to detect relationships between economic variables. Economic statistics is mainly a descriptive aspect of economics; it does not provide explanations of the development of economic variables, and it does not provide any numerical measurement of economic relationships. However, econometric models will provide the mathematical or empirical measurement of economic relationships. The use of econometric models for the empirical or mathematical measurement or empirical verification of economic relationships faces mainly two criticisms. First, sometimes econometric models may not be correctly specified for the empirical or mathematical measurement of economic relationships. For example, the use of a linear regression model for the mathematical measurement of the relationship between marginal costs and the output level of an industrial sector is not correct. Another criticism is that


an econometric model may not be realistically compatible with the type of data to be used. An econometric model based on the data of an individual economic agent or commodity may not be suitable for use with data aggregated over individuals or commodities. Problems may also arise if the data are aggregated over a period of time longer than that to which the economic model applies. Despite these drawbacks, econometric models play an important role as a guide for economic policies.

Economic Theory, Mathematical Economics and Econometrics

An economic theory deals with the laws and principles which govern the functioning of an economy and its various parts. The main concern of an economic theory is to express statements or hypotheses about relationships between economic variables that are mostly qualitative in nature. Economic theory does not allow the inclusion of any random factor that might affect the relationship between economic variables, and thus prevents the inclusion of a stochastic variable. In addition, economic theory does not provide any mathematical or numerical measurement of economic relationships. For example, in microeconomic theory, the law of demand states that the quantity demanded of a commodity and the price of that commodity are inversely related, given that all other factors are constant. Thus, the economic theory implies a negative or inverse relationship between the price and the quantity demanded of the commodity, but it does not tell us how a per-unit change in the price of the commodity affects the quantity demanded of that product. The main concern of mathematical economics is to express the relationship between economic variables in a mathematical or equational form without the mathematical measurement or empirical verification of the theory. The use of mathematical formulas does not allow economists or researchers to include any random factor which might affect the relationship, but it does allow them to make assumptions regarding economic theories. It can be said that mathematical economics deals with the economic behaviour of deterministic models. Thus, there are no significant differences between economic theory and mathematical economics, as both of them state the same economic relationships, the former in linguistic form and the latter in mathematical or equational form. Both express economic relationships in an exact form, not as a probabilistic model. Neither economic theory nor mathematical economics provides the numerical measurement of economic relationships. However, econometrics differs significantly from economic theory and mathematical economics. The main concern of econometrics is to express economic relationships in a mathematical form, but it does not assume the relationships to be exact, as the model used is probabilistic in nature. Thus, econometric techniques are developed to take into account the effect of the random disturbances which create differences from the exact behavioural patterns suggested by economic theory and mathematical economics. Furthermore, econometric methods provide us with the mathematical, empirical or numerical measurement of economic relationships. For example, if we apply econometric methods to estimate a demand function of the type qt = α + βpt + εt, we find the numerical values of the coefficients α and β respectively.
Thus, it can be said that econometrics provides us with precise estimates of the coefficients or elasticities of economic relationships, which play an important role for policy-makers in policy reform.

Division of Econometrics

Econometrics may be divided into two categories: (i) theoretical econometrics and (ii) applied econometrics.

Theoretical Econometrics

Theoretical econometrics is the branch of econometrics which deals with the development of appropriate methods for the mathematical, empirical or numerical measurement of economic relationships. For this purpose, different techniques such as the OLS method, the ML method, WLS, GLS, GMM, unit root tests, cointegration analysis, Granger causality tests, etc. have been developed.

Applied Econometrics

Applied econometrics is the branch of econometrics which deals with the application of econometric methods to the mathematical, empirical or numerical measurement of economic relationships, in which one variable is the dependent variable and the remaining is/are independent variable(s) plus a random error term, in specific branches of economic theory.

Ex. 1-5: Assume that the relationship between the return on assets (RET) and risk (BETA) is linear, so the econometric model is given by

RETi = β0 + β1BETAi + εi                                        (1.12)

where RETi is the return of the ith company; BETAi is the beta coefficient from the CAPM of the stock price index of the ith company; and εi is the random error term corresponding to the ith set of observations (i = 1, 2, ....., N). β0 is the


regression constant, which indicates the average value of RET when BETA is zero, and β1 is the regression coefficient, which indicates the average rate of change of RET with respect to BETA. If we apply the OLS method for the empirical measurement of relationship (1.12) based on the observed data of RET and BETA, this is applied econometrics.

Scope and Limitation of Econometrics

Econometrics is the branch of economics which deals with the application of statistical methods and mathematical statistics to economic data for the mathematical measurement of economic relationships. It focuses on giving empirical content to economic relationships. The main objective of econometrics is to compute the relationships between economic variables through statistical techniques. Therefore, the study of econometrics is very important for solving economic problems analytically and thereby providing effective and efficient solutions. Thus, econometrics is immensely important in the study of complex mathematical and statistical models which help in a detailed study of economic relationships for a specific branch of economics. The scope of econometrics is discussed below.

Scope of Econometrics

(i) Econometric tools and techniques are very important for a solid understanding of, and for dealing with, business, economic, socio-economic, financial and scientific problems.

Ex. 1-6: If we are interested in examining empirically the risk-return relationship of the Dhaka Stock Exchange (DSE), we can deal with this problem using an econometric method called the GARCH-in-mean (GARCH(p, q)-M) model.

(ii) Econometric methods are widely applicable for appropriate decision-making and policy formulation by government agencies, businessmen, economists, statisticians, researchers and other policymakers, irrespective of discipline.

Ex. 1-7: Suppose the Bangladesh Government wants to devalue its currency to correct the balance of payments. To devalue the BDT, the Government has to estimate the price elasticities of the import and export demand functions of commodities. If exports and imports are inelastic, then devaluation will only harm the economy. On the other hand, if the price elasticities are elastic, then devaluation plays a positive role in the economy. These price elasticities are to be estimated with the help of the demand functions for imported and exported commodities. Econometric tools and techniques may be applied to calculate the price elasticities of the demand for exports and imports. Based on the calculated values, policymakers will then decide whether the BDT should be devalued or not.

(iii) Econometric techniques are the outstanding methods for the empirical verification of economic theorems. The mathematical formulations in economic theory are called models, and they are confronted with observational data. Using econometric techniques, we try to explore whether the theory can adequately explain the behaviour of the actual economic units or not.

Ex. 1-8: The Keynesian consumption function is given by

C = f(Y)                                        (1.13)

where C is the aggregate consumption expenditure and Y is the level of income. If the variable C linearly depends on Y, then the Keynesian consumption function can be expressed by the following econometric model:

Ct = β0 + β1Yt + εt                                        (1.14)

where β0 is the average aggregate consumption expenditure at the zero income level, and β1 is the average rate of change of C with respect to Y, which is called the marginal propensity to consume (MPC) and lies between 0 and 1. Using econometric technique(s), we can estimate β0 and β1 and verify how closely such estimates conform to economic theory. The estimated MPC is used to calculate the size of the multiplier and can be used for predicting changes in income due to changes in investment. Here, dC/dY = β1 is the MPC and 1/(1 − MPC) is the multiplier effect. If the estimated value of β1 lies between 0 and 1, it can be said that relationship (1.14) satisfies Keynes’ law. Thus, econometrics helps us to investigate whether a theory appears to be consistent with the observed data, whether the functional relationship is stable over time, and whether it changes over periods.

(iv) To establish new relationships between economic variables and to prove old theorems, econometricians frequently use econometric tools and techniques.


(v) Econometric methods are the most popular and widely applicable methods for forecasting the demand functions of commodities (a worked numerical sketch of the forecasting rule below appears after this list).

Ex. 1-9: Let there exist a linear demand function between the quantity demanded (Y) of a product and the time trend (t) of the type:

Yt = α + βt + εt                                        (1.15)

where Yt represents the quantity demanded of the product at time t, εt is the random error term corresponding to the tth set of observations, α is a constant which indicates the average value of Y at the point of origin, and β is the regression coefficient which indicates the average change in Y for a one-unit change in time t. Let α̂ and β̂ be the least squares estimates of α and β. They can be expressed as

α̂ = Ȳ − β̂·t̄  and  β̂ = (Σ t·Yt − T·t̄·Ȳ) / (Σ t² − T·t̄²)

where the sums run over t = 1, 2, ....., T. Then we can forecast the demand for the product in time period (t+1) as Ŷt+1 = α̂ + β̂(t+1), and so on.

(vi) Econometric methods are widely applicable for forecasting business cycles.

(vii) Econometric methods are widely applicable for market research purposes, such as the analysis of profit functions, production functions, cost functions, supply functions, the distribution of wealth, etc.

(viii) Econometric techniques are widely applicable for solving different kinds of socio-economic problems such as poverty, income inequality, crime, divorce, etc.

(ix) Nowadays, the study of econometrics is very important for hypothesis testing, estimation of parameters, identifying proper functional forms of economic variables, and measuring the impact of independent variable(s) on a dependent variable.
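The following short sketch, referred to in item (v) above, applies the trend-forecasting rule of Ex. 1-9 to a small hypothetical demand series: it computes the least squares estimates α̂ and β̂ from the closed-form expressions given there and then forecasts demand one period ahead. The data and the use of Python with NumPy are illustrative assumptions.

```python
# Illustrative sketch for Ex. 1-9 (hypothetical data): trend forecasting
# Y_t = alpha + beta*t + eps_t, estimated by the closed-form OLS expressions.
import numpy as np

Y = np.array([102, 108, 115, 119, 127, 133, 138, 146], dtype=float)  # demand
T = len(Y)
t = np.arange(1, T + 1, dtype=float)                                  # time trend 1..T

t_bar, y_bar = t.mean(), Y.mean()
beta_hat = (np.sum(t * Y) - T * t_bar * y_bar) / (np.sum(t**2) - T * t_bar**2)
alpha_hat = y_bar - beta_hat * t_bar

forecast_next = alpha_hat + beta_hat * (T + 1)   # demand forecast for period T+1
print(alpha_hat, beta_hat, forecast_next)
```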

Limitation of Econometrics

Econometrics has its own limitations; it cannot be applied to answer all kinds of queries. Some of its limitations are given below.

(i) Econometric methods are widely applicable for quantitative analyses, but for qualitative analyses they are not directly applicable, or have only limited application.

Ex. 1-10: If we are interested in knowing the impact of democracy or of political crises on economic development, econometric methods fail to answer these questions directly, as such problems cannot be transformed into a mathematical model. However, econometrics can answer them indirectly by using dummy variable(s) in the regression equation.

(ii) Econometric techniques are based upon certain statistical assumptions which are not always true of economic data.

(iii) Econometric methods do not give priority to moral judgment, but for policy formulation and decision-making, moral judgments play an important role.

(iv) Econometric methods are time-consuming, very complex and not easy to understand.

(v) Econometric techniques should be applied by econometricians or skilled persons who know econometrics well enough to deal with problems in any discipline, but such persons are not always available in our country.

Steps Involved in an Econometric Analysis of Economic Models

The following steps are involved in an econometric analysis of an economic model.

Step 1: First, we have to formulate an economic model that is appropriate for answering the questions of interest. For example, the economic model for the demand analysis of a commodity can be written as

Qd = f(P, Ps, Pc, Y, β0, β1, β2, β3, β4)                                        (1.16)

where all the terms of equation (1.16) have already been defined previously.


Step 2: Second, we have to formulate an econometric model from the economic model which is appropriate for the empirical measurement of the relationship, given the underlying assumptions associated with the problem under consideration. For example, the econometric model corresponding to the economic model in equation (1.16) can be written as

Qd = f(P, Ps, Pc, Y, β0, β1, β2, β3, β4) + ε                                        (1.17)

If the quantity demanded Qd of the commodity linearly depends on P, Ps, Pc and Y, the econometric model can be written as

Qdi = β0 + β1Pi + β2Psi + β3Pci + β4Yi + εi                                        (1.18)

Step 3: Third, we have to collect an appropriate data set that is properly defined and matches the concepts of the econometric model for empirical measurement. For example, for the empirical measurement of the economic relationship (1.17), we have to collect data on the variables P, Ps, Pc, Y and Qd respectively.

Step 4: Fourth, we have to use a suitable software package such as Microsoft Excel, GAUSS, LIMDEP, RATS, SPSS, EViews, STATA, R, Python, TSP, etc. to estimate and test the model with the observed data.

Step 5: In the final stage, we use the estimated model for prediction and policy purposes. For example, using the software RATS with the observed data on the variables P, Ps, Pc, Y and Qd, we can estimate the values of β0, β1, β2, β3, and β4 respectively, predict the value of Qd, and test whether the relationship is significant or not. Based on these estimated values, a decision can be made about the functional form, and a new policy can also be formulated about the quantity demanded of that particular commodity.

The different steps involved in an econometric analysis of economic models are shown below:

Formulate an Economic Model → Formulate an Econometric Model from the Economic Model → Collect an Appropriate Data Set for Empirical Measurement → Estimate the Model Using Software with the Observed Data Set → Test Hypotheses → (Reformulate the Model, if necessary) → Interpret the Results → Forecasting → Policy Making

Fig. 1-1: Flowchart of the steps involved in an econometric analysis of economic models
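As a compact illustration of Steps 1-5, the sketch below simulates a hypothetical data set for the linear demand equation (1.18), estimates it by OLS, and produces a prediction. The coefficient values, the simulated data and the use of Python with pandas and statsmodels are assumptions for illustration; the book itself works such examples in packages like RATS, EViews and STATA.

```python
# Illustrative end-to-end sketch of Steps 1-5 (hypothetical data), for
# Qd_i = b0 + b1*P_i + b2*Ps_i + b3*Pc_i + b4*Y_i + eps_i   (eq. 1.18)
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
n = 300

# Steps 1-2: assumed data-generating process consistent with the econometric model
df = pd.DataFrame({
    "P":  rng.uniform(5, 25, n),      # own price
    "Ps": rng.uniform(5, 25, n),      # price of substitutes
    "Pc": rng.uniform(5, 25, n),      # price of complements
    "Y":  rng.uniform(20, 100, n),    # consumer income
})
df["Qd"] = (100 - 3.0*df.P + 1.5*df.Ps - 1.0*df.Pc + 0.8*df.Y
            + rng.normal(0, 5, n))

# Steps 3-4: estimate and test the model with the observed (here simulated) data
fit = smf.ols("Qd ~ P + Ps + Pc + Y", data=df).fit()
print(fit.summary())                  # coefficient estimates, t-tests, F-test

# Step 5: use the estimated model for prediction
new = pd.DataFrame({"P": [15.0], "Ps": [12.0], "Pc": [10.0], "Y": [60.0]})
print(fit.predict(new))               # predicted quantity demanded
```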

1.5 Types of Relationships

A relationship refers to the functional form connecting two or more economic variables in which one is the dependent variable and the remaining is/are independent variable(s). Based on the functional form, the values of the dependent variable are determined for given values of the independent or explanatory variable(s). There are many different types of relationships between economic variables that we have to deal with. This section describes the different types of relationships with corresponding examples.


Behavioral Relationship

The behavioural relationship tells us how the dependent variable will, on average, change its behaviour in response to changes in the independent variable(s). In a behavioural relationship, the dependent variable is the one that takes values depending on the values of the independent variables.

Ex. 1-11: The relationship between the quantity demanded of a product and its price, of the type

q = α + βp                                        (1.19)

is called a behavioural relationship. Here, q is the quantity demanded and p is the price per unit of the commodity. α and β are two parameters. This relationship describes that every additional unit of the price (p) of the commodity generates β amount of change in the quantity demanded (q) of the commodity, and at the origin the value of q will be α. Thus, it can be said that, in a behavioural relationship, the dependent variable is determined based on the given values of the independent variable(s).

Technical Relationship

The functional relationship between a firm’s physical production output and the factors associated with production inputs is called a technical relationship. Thus, the production function is purely a technical relationship which connects factor inputs and output. The production function of a firm depends on the technological development of a country; with every stage of technological development, the production function of the firm undergoes changes.

Ex. 1-12: The Cobb-Douglas production function of the type

P = A0·K^α·L^β                                        (1.20)

is called a technical relationship. Here, P is the output, A0 is a technical constant, K is the capital input, L is the labour input, α is the output elasticity with respect to capital, and β is the output elasticity with respect to labour. The Cobb-Douglas production function reflects how much output we can expect for a given combination of labour and capital and for known values of A0, α, and β.
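Although the Cobb-Douglas function in (1.20) is non-linear in K and L, taking logarithms gives ln P = ln A0 + α ln K + β ln L, so the output elasticities can be recovered by a linear regression on the logged inputs. The sketch below illustrates this on simulated data; the parameter values, the added error term, and the use of Python with NumPy and statsmodels are assumptions made for illustration only.

```python
# Illustrative sketch (hypothetical data): estimating Cobb-Douglas elasticities
# from the log-linear form ln P = ln A0 + alpha*ln K + beta*ln L + error.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
n = 150
K = rng.uniform(50, 500, n)                 # capital input
L = rng.uniform(20, 200, n)                 # labour input
A0, alpha, beta = 1.8, 0.35, 0.60           # assumed technical constant and elasticities
P = A0 * K**alpha * L**beta * np.exp(rng.normal(0, 0.05, n))  # output with noise

X = sm.add_constant(np.column_stack([np.log(K), np.log(L)]))
fit = sm.OLS(np.log(P), X).fit()
print(fit.params)   # estimates of [ln A0, alpha, beta]
```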

Definitional Relationship

A relationship is said to be a definitional relationship if a meaningful definition is obtained by relating the variables.

Ex. 1-13: The productivity of a firm may be measured either on an aggregate basis or on an individual basis; these are called total and partial measures respectively. The total productivity measure is obtained from the following definitional relationship:

Total productivity index/measure = Total output / Total input

The value of a product can be obtained by multiplying the selling price of the product by the quantity sold. Thus, a relationship of the type v = p×q is called a definitional relationship.

Stochastic Relationship

A functional relationship of economic variables is said to be a stochastic relationship if the dependent variable Y can be expressed as a function of the systematic components plus a nonsystematic random error term. The mathematical form of the stochastic relationship is given by

Y = f(X1, X2, ..., Xk, β0, β1, ..., βk) + ε                                        (1.21)

where f(X1, X2, ...., Xk, β0, β1, ...., βk) is the systematic component and ε is the nonsystematic component that we know is present but cannot be observed. This ε is called a random error term. If the variable Y linearly depends on X1, X2, ......, and Xk, then the stochastic relationship can be expressed as

Y = β0 + β1X1 + .... + βkXk + ε                                        (1.22)

All the terms of equation (1.22) have already been defined.

Ex. 1-14: The linear function relating the returns (RET) of firms to some firm-specific factors, of the type

RETi = β0 + β1Si + β2PEi + β3BETAi + εi                                        (1.23)


is called a stochastic relationship. Here, RETi is the annual return (in percentage) of the ith firm, Si is the size of the ith firm measured in terms of sales revenue, PEi is the price-to-earnings (P/E) ratio of the ith firm, BETAi is the ith firm’s CAPM beta coefficient, and εi is the random error term corresponding to the ith set of observations, which is assumed to be independently, identically and normally distributed with zero mean and constant variance for all i, i.e., εi ~ IIND(0, σ²), ∀ i. β0, β1, β2, and β3 are the parameters. This stochastic relationship describes that every additional unit of the jth independent variable (j = 1, 2, 3) generates βj amount of average change in returns (RET), and β0 indicates the average value of RET when the values of S, PE and BETA are zero. In this relationship, RET is determined by S, PE and BETA.

Static Relationship

A functional relationship of economic variables is said to be a static relationship if the dependent variable Y can be expressed as a function only of the deterministic or systematic components in one specified period with a constant increment. Thus, a static relationship is a non-probabilistic model. The mathematical form of the static relationship is given by

Y = f(X1, X2, ...., Xk, β0, β1, ...., βk)                                        (1.24)

where f(X1, X2, ...., Xk, β0, β1, ...., βk) is the deterministic component. The static relationship does not allow a random factor to be included in the equation.

Ex. 1-15: The linear consumption function of the type

C = α + βY                                        (1.25)

is called a static relationship. Here, C is the per capita consumption expenditure, Y is per capita income, Į, and ȕ are two parameters. The parameter ȕ indicates the rate of change of C with respect to Y and Į is the value of C at the origin. In this relationship, C is determined only by Y. Dynamic Relationship

A functional relationship is said to be a dynamic relationship if the dependent variable Y at time t can be expressed as a function of its own past values together with current and lagged values of another independent variable X. If the dependent variable Y depends linearly on X, the dynamic relationship is given by

Yt = β0 + β1Xt + β2Xt-1 + α1Yt-1   (1.26)

Both behavioural and technical relationships can be expressed in static and dynamic form.

Ex. 1-16: The linear consumption function of the type

Ct = β0 + β1Yt + β2Yt-1 + α1Ct-1   (1.27)

is called a dynamic relationship. Here, Ct is per capita consumption expenditure at time t, Ct-1 is per capita consumption expenditure at time (t-1), Yt is per capita income at time t, Yt-1 is per capita income at time (t-1), and β0, β1, β2, and α1 are the unknown parameters connecting the variables Y and C.

Micro Relationship

A functional relationship between a dependent variable (Y) and an independent variable (X) is said to be a micro relationship if it is used to study the economic behaviour of a single firm, a single household, a single consumer, and so on. The demand function, supply function, cost function, profit function, etc. of a single firm are micro relationships.

Ex. 1-17: If we consider a production function to study the behaviour of a single firm of the type

q = CL^α K^β   (1.28)

it is called a micro relationship. Here, q is the output level of the firm, L is the quantity of the input labour, and K is the capital investment of the firm; α is the output elasticity with respect to labour and β is the output elasticity with respect to capital of the firm. Likewise, a consumption function used to study the behaviour of a single consumer is a micro relationship.


Macro Relationship

A functional relationship between a dependent variable (Y) and an independent variable (X) is said to be a macro relationship if it is used to study the behaviour of a nation, an aggregate of consumers, or an aggregate of firms. The national consumption function, national savings function, national investment function, and the aggregate demand function are macro relationships.

Ex. 1-18: The functional relationship between savings and gross national income of the type

S = α + βY   (1.29)

is called a macro relationship, and in particular a national saving function. Here, the dependent variable S is savings and Y is gross national income. α and β are two parameters. α is the constant term, the level of savings when gross national income is zero; since, with no income, one has to borrow money from someone to survive, α is expected to be negative. The parameter β is the rate of change of savings with respect to Y. β is less than or equal to one because savings cannot be greater than income, so the ratio of the change in savings to the change in income is less than 1; it equals 1 only when consumption expenditure is zero. This can be expressed by the following equation:

β = ΔS/ΔY = Δ(Y − C)/ΔY = (ΔY − ΔC)/ΔY = 1 − ΔC/ΔY = 1 − b

where b is the marginal propensity to consume (MPC). Thus, the saving function can be written as

S = α + (1 − b)Y   (1.30)

Here, (1 − b) is called the marginal propensity to save (MPS). Based on this functional relationship, we determine the impact of Y on S while the influence of other factors is assumed to be constant.

1.6 Random Error Term

In an economic model, the dependent variable can be expressed as a function of deterministic or systematic components, which describes an exact relationship between economic variables. In practice, however, whatever the discipline, the observations do not all fall exactly on a straight line or on any other smooth curve; we can only expect the observed values to lie close to the line or curve. For this reason, a stochastic disturbance term is introduced into regression models to measure the deviation of the actual value of the dependent variable from its central value. This disturbance term is known as the random error term, because it is introduced into the regression model to capture the effects of all the factors that are not included in the equation by the investigator or researcher. Such errors include errors of omitted variables, errors of misspecification of the mathematical form of the model, errors of measurement of the dependent variable, and so on.

Ex. 1-19: Suppose we want to study the relationship between profit (Y) and investment (X) of a number of companies listed at the DSE at time t. Let us have n sample observations which are divided into m sub-groups on the basis of investment level. We have investment levels X1, X2,…, Xm whose profits are Y1, Y2,…, Ym. We cannot expect that all the listed companies within the first sub-group, having investment level X1, will have an identical profit Y1: some companies will have profits greater than Y1 while others will have less, but all values will vary around the central figure Y1. Hence, a disturbance term is introduced into the regression model to capture the deviations of actual values from the central value. Here, the profits of the different listed companies of sub-group one will be Y1+ε1, Y1+ε2, Y1+ε3, Y1+ε4, and so on. The error term may take positive or negative values and is drawn at random.

Reasons to Include the Random Error Term in a Model

We know that econometric models are probabilistic or statistical models. Econometric models therefore rest on probabilistic judgments and, like all statistical judgments, they are subject to errors. These errors may arise for the following reasons: (i) The error may arise due to the omission of relevant variables from the model. Every economic variable is affected by many other economic variables at the same time; if any relevant variable is left out and cannot be included in the model, an error is bound to occur.


(ii) Errors may also arise due to the non-availability of data. Econometric work is frequently based on the assumption that large samples of accurate data are available, but reliable and representative data are rarely found in practice. In the absence of accurate data, sampling error arises in the model. (iii) Errors may also arise due to misspecification of the functional form relating the economic variables. Sometimes we assume that the relationship between variables is linear when it is in fact non-linear; in such a case the forecast is bound to be incorrect. This type of error arises mainly from a misjudgment by the investigator.

1.7 Data Associated with Econometric Analysis

A variety of economic data sets are used in econometric analysis for the empirical measurement of economic models. The most important data sets, which are widely used in the econometric analysis of economic relationships, are described below.

Cross-Sectional Data

Data observed at a particular point in time, or at one point in time, are called cross-sectional data. For cross-sectional data the subscript i is used. For example, let Y1, Y2,..., Yi,..., Yn be a sample of the total number of people infected with coronavirus in each of n countries in the year 2020. This is a cross-sectional data set in which Yi indicates the total number of infected people in country i (i = 1, 2,……, n).

Ex. 1-20: The number of deaths due to the coronavirus pandemic in the year 2020, the GDP of Asian countries for the year 2023, and the number of car accidents recorded in different big cities in the year 2023 all constitute cross-sectional data.

Time Series Data

Data observed over a period of time are called time-series data. In other words, an arrangement of statistical data in chronological order, i.e., in accordance with the occurrence of time, is known as time-series data. Mathematically, a time series is defined by a functional relationship such as

Yt = f(t), where t = t1, t2,....., tn   (1.31)

where Yt is the value of the variable Y at time t.

Ex. 1-21: The GDP, growth rate of GDP, export values, import values, etc. of Bangladesh for a specified period of time constitute time-series data.

Panel Data or Longitudinal Data

The combination of cross-sectional and time-series data is called panel data. For panel data, the subscripts i and t are used: i for the cross-sectional unit and t for time. For example, {Yit, i = 1, 2,..., n; t = 1, 2,..., T}, a set of values of the variable Y, constitutes panel data. Here, Yit is the value of the variable Y for the ith (i = 1, 2,…, n) individual at the tth (t = 1, 2,……., T) time period.

Ex. 1-22: The GDP of SAARC countries over a period of time; FDI of Asian, European, and OECD countries over a period of time; per capita income of the people of different districts of Bangladesh over a period of time; agricultural production of different districts of Bangladesh over a period of time; and profits of different firms over a period of time all constitute panel data.

Experimental Data

Data collected by experiment, as in the natural sciences, are called experimental data. When investigators collect data to find the effect of some factors on a given phenomenon while holding the effects of other factors constant, the resulting data are experimental data.

Ex. 1-23: Suppose that, to find the impact of ice-cream consumption on weight gain, a researcher collects data while keeping the eating, smoking, and drinking habits of the people constant; such data constitute experimental data.


1.8 Variables

In this section, the variables which are most popular and commonly used in econometric analyses are discussed.

Variable

A variable is defined as a characteristic which can take on different values at different times, places, or situations within its domain. The domain of a variable X is the set of all permissible values that X can take in a given context.

Ex. 1-24: Household income, household expenditure, national income, national expenditure, prices, interest rates, wages, profits, industrial output, agricultural output, etc. are all variables, whose values are obtained from published external data sources. For the income variable of a family, if the lowest value is BDT 0 and the highest value is BDT 50,000, the domain of the income variable is 0 to 50,000.

In general, variables can be classified into two broad categories, namely: (i) quantitative variables and (ii) qualitative variables.

Quantitative Variable

Quantitative variables are those variables whose values can be measured and expressed on a numerical scale.

Ex. 1-25: GDP, GNP, export values, and import values of Bangladesh are quantitative variables.

Types of Quantitative Variable

Quantitative variables can be classified into two broad categories, namely: (i) discrete variables and (ii) continuous variables.

Discrete Variable

A quantitative variable X is said to be discrete if it takes only particular values, finite or countably infinite in number, within its domain.

Ex. 1-26: The number of persons in a family, the number of crimes happening every day in Dhaka city, the number of deaths due to road accidents, and the number of deaths due to coronavirus every day in Dhaka city are all discrete variables.

Continuous Variable

A quantitative variable X is said to be continuous if it can take any value within a specific interval or range. For example, X is continuous if it can take every value from 1 to 5, that is, 1 ≤ X ≤ 5.

Ex. 1-27: The height and weight of a person, household income and expenditure, the distance between two places, etc. are all continuous variables.

Qualitative Variable

Qualitative variables are those variables which cannot be measured and expressed numerically but can be classified into several groups or categories. The characteristics used to classify an object or an individual into different categories are called attributes.

Ex. 1-28: The sex of a person, the education level of a university, and the outcome of a coin toss are qualitative variables. The sex of a person has two categories, male or female; the education level of a university can be classified into three categories, good, medium or bad; and a coin toss gives two categories, head or tail. These categories are sometimes called attributes.

Categorical Variable

A categorical variable is one that has two or more categories. There are two types of categorical variables: nominal and ordinal. A categorical variable is said to be nominal if there is no intrinsic ordering to its categories, whereas an ordinal variable has a clear ordering of its categories.

Ex. 1-29: Gender is a categorical variable with two categories, (i) male and (ii) female, with no intrinsic ordering. The categorical variable temperature has three ordered categories, (i) low, (ii) medium and (iii) high. A frequency table is a way of counting how often each category of the variable in question occurs; it may be enhanced by adding the percentage of observations that fall into each category.


Dichotomous Variable

Categorical variables with only two categories or levels are called dichotomous variables.

Ex. 1-30: The outcome of a coin toss (head or tail), the classification of the employees of a public university by gender (male or female), the classification of the politicians of Bangladesh by income class (rich or poor), the classification of politicians by political party (Democrat or Republican), the classification of students as pass or fail, and the classification of deaths due to coronavirus by age (under 65, or 65 and over) are all dichotomous variables.

Binary Variable

Binary variables are those variables which take only the two values 1 or 0, where 1 denotes success and 0 denotes failure.

Ex. 1-31: Let us define a variable Y such that

Y = 1, if a family owns a new car
Y = 0, otherwise

then Y is called a binary variable.

Dependent and Independent Variables

A variable is said to be a dependent variable if its values depend on changes in other variables in a functional relationship of two or more variables. In general, the dependent variable is denoted by Y. A variable is said to be an independent variable if its values do not depend on changes in other variables in the functional relationship; this is in contrast to the definition of the dependent variable. An independent variable is denoted by X.

Ex. 1-32: If we consider a demand function of the type q = a + bp, then the quantity demanded q is the dependent variable and the per unit price p of the product is the independent variable. a and b are two parameters, where b is the rate of change of q with respect to p and a is the value of q at the origin.

Control Variable

A variable that is held constant in order to assess or clarify the relationship between two or more other variables is called a control variable (sometimes called a controlled variable). A control variable can strongly influence the results of an experiment.

Ex. 1-33: Let us consider an example to understand the meaning of a control variable. Suppose we want to perform a regression analysis to investigate how the amount of sunlight received affects the growth of a plant. Here, the amount of sunlight received is the independent variable and the growth of the plant is the dependent variable. As the independent variable changes, we observe the corresponding changes in the dependent variable, i.e., the growth of the plant. A control variable is another factor in the regression analysis; in this example, the control variables could be the water and fertilizer supplied. If the control variables are not kept constant, they can ruin the regression analysis. We might conclude that plants grow optimally with 6 hours of light a day; however, if the plants receive different levels of fertilizer and water, the experiment becomes invalid. Hence, we need to identify the variables that may affect the outcome of the analysis and take action to control them.

Intervening Variable/Mediating Variable

An intervening variable is a hypothetical variable used to explain causal links between other variables. Intervening variables cannot be observed in an experiment, that is why they are called the hypothetical variable. In psychology, the intervening variable is sometimes called a mediator variable. In statistics, an intervening variable is usually considered to be a sub-type of a mediating variable. However, the lines between the two terms are somewhat fuzzy, and they are often used interchangeably. Ex. 1-34: We know there is an association between being poor and having a shorter life span. Just because someone is poor it does not mean that he/she will face an early death. So, other hypothetical variables are used to explain the phenomenon. These intervening variables could include lack of access to healthcare or poor nutrition, environmental pollution etc.


Exercises

1-1: Explain the meaning of econometrics with an example.
1-2: Define a theoretical model with an example.
1-3: Discuss different types of theoretical models with an example of each.
1-4: Explain the ideas behind econometric modeling of economic relationships with an example.
1-5: Distinguish between economic theory, mathematical economics and econometrics with an example of each.
1-6: Distinguish between economic and econometric models with an example of each.
1-7: Distinguish between theoretical and applied econometrics.
1-8: Discuss the advantages of econometrics.
1-9: What are the limitations of econometrics?
1-10: Write different steps that are involved in an econometric analysis of economic models.
1-11: Why do we prefer an econometric model rather than an economic model? Discuss.
1-12: Distinguish between economic theory and econometrics.
1-13: Distinguish between mathematical economics and econometrics.
1-14: Define different types of relationships with an example of each.
1-15: Distinguish between static and dynamic relationships with an example of each.
1-16: Distinguish between static and stochastic relationships with an example of each.
1-17: Distinguish between micro and macro relationships with an example of each.
1-18: Define a random error term with an example. Discuss the reasons to include the random error term in a model.
1-19: Define data with an example. Discuss different types of data with an example of each.
1-20: Distinguish between cross-sectional and time-series data with an example of each.
1-21: Distinguish between time-series and panel data with an example of each.
1-22: Define a variable with an example. Discuss different types of variables with an example of each.
1-23: Distinguish between quantitative and qualitative variables with an example of each.
1-24: Distinguish between discrete and continuous variables with an example of each.
1-25: Distinguish between dependent and independent variables with an example of each.
1-26: Distinguish between control and independent variables with an example of each.
1-27: Distinguish between categorical and dichotomous variables with an example of each.
1-28: Distinguish between categorical and binary variables with an example of each.
1-29: Identify the following functions:
(i) q_d = α + βp; (ii) q_s = α + βp; (iii) GDP_t = GDP_0(1 + r)^t; (iv) GDP_t = e^(α + βt + ε_t); (v) q_d = A_0 y^α; (vi) q_s = A_0 y^α e^(ε_t); (vii) GDP_t = A_0 L_t^α K_t^β; (ix) Quick Ratio = (Current Assets − Inventories)/Current Liabilities; (x) Total Debt Ratio = Total Debt/Total Assets; (xi) Y_t = β_0 + β_1 Y_{t-1}; and (xii) C_t = α + βY_t.

CHAPTER TWO

SIMPLE REGRESSION MODELS

2.1 Introduction

Most research problems, irrespective of discipline, involve modelling, i.e., describing how a dependent variable and one or more independent variables are related to each other. For example, a businessman might be interested in modelling the relationship between the amount of capital investment and returns. A producer might be interested in modelling the relationship between the level of output and production costs. An advertising agency might be interested in modelling the relationship between a firm's sales revenue and its advertising costs. A government agency might be interested in modelling the relationship between economic growth and investment, or in relating the growth rate of GDP to time. An investment firm might be interested in modelling the relationship between risk and stock returns, or in relating the performance of the stock market to time. A doctor might be interested in modelling the relationship between the weight of children and breastfeeding. To deal with such problems, students and researchers need to be familiar with simple linear and non-linear regression models. In this chapter, therefore, simple linear and non-linear regression models are discussed, and we show how to fit them to a set of data points using different estimation methods, namely the method of least squares, the maximum likelihood method, and the method of moments. We then show how to judge whether a relationship between a dependent variable (y) and an independent variable (x) is statistically significant, how to use the model to estimate the expected value of y, and how to forecast a future value of y for a given value of x. Proper justifications for the techniques employed in a regression analysis are also provided. For the empirical measurement of the relationships, the software packages RATS, EViews and STATA are used.

2.2 Simple Linear Regression Models and Estimation Methods

In this section, the meaning of the simple population regression equation, the sample regression equation, and simple linear regression equations are discussed, together with the estimation methods that are most popular and widely used for simple linear regression equations and the assumptions associated with the models.

Population Regression Function or Population Regression

The functional relationship between the dependent variable (Y) and the independent variable (X) of the type

Yi = E(Y|Xi) + εi   (2.1)

is called a (two-variable) population regression function (PRF) or population regression equation. Here, E(Y|Xi) is the conditional mean or conditional expectation of Y, read as the expected value of Y given that X takes the specific value Xi, and it is defined as a function of Xi. Mathematically it can be written as

E(Y|Xi) = f(Xi), (i = 1, 2,…….., N)   (2.2)

Equation (2.2) tells us how the average response of Y varies with X. εi is the random error term. Thus, in a PRF, Yi is equal to the conditional mean of Y plus the random error term εi. If Y depends linearly on X, then the PRF is defined as

Yi = β0 + β1Xi + εi   (2.3)

Here, E(Y|Xi) is defined as

E(Y|Xi) = β0 + β1Xi   (2.4)

where β0 is called the intercept or constant term, which indicates the value of E(Y|Xi) at the origin, and β1 is called the regression coefficient or slope coefficient, which gives the marginal effect of X on Y. Equation (2.3) is also known as the linear population regression function (LPRF) or simply the linear population regression. In the literature it is sometimes alternatively called a linear population regression model (LPRM) or a linear population regression equation (LPRE). In real-life problems we cannot examine the entire population, which is why we have to estimate, predict or forecast on the basis of sample information. Simply put, regression is the mathematical or empirical measurement of the average relationship of a dependent variable with one or more independent variables.

Ex. 2-1: Assume that the consumption expenditure (C) of households depends on the income level Y. Then the population regression equation between C and Y can be written as

C = f(Y) + ε   (2.5)

where C is called the dependent variable/explained variable/regressand or response variable; Y is called the independent variable/explanatory variable or regressor; and ε is the random error term. If C depends linearly on Y, the average relationship of C can be expressed as

Ci = β0 + β1Yi + εi   (2.6)

where β0 is the regression constant, which indicates the average consumption expenditure of households when the income level Y is zero, β1 is the regression coefficient, which indicates the average change in household consumption expenditure for a one-unit change in Y, and εi is the ith random error term. The graph of this equation is given below:

Fig. 2-1: Regression equation between consumption expenditure and level of income

A regression analysis attempts to estimate the nature of the mathematical or empirical relationship between economic variables and thereby provides a mechanism for prediction and forecasting based on the PRF.

Simple Linear Regression Model

The linear relationship between two variables Y and X of the type

Yi = β0 + β1Xi + εi   (2.7)

is called a simple linear regression model. Here, Y is called the dependent variable/explained variable/regressand/response variable, X is called the independent variable, explanatory variable or regressor, ε is called the disturbance term or random error term, β0 is called the regression constant, which indicates the average value of Y when X is zero, and β1 is called the regression coefficient, which indicates the average impact on Y of a one-unit change in X. The variable Y is called the dependent variable because its value changes with the value of X and with the disturbance term ε. For time-series data, Yi and Xi represent the values of Y and X in the ith period, while for cross-sectional data they represent observations on the ith individual or object. Thus, the simple linear regression model expresses the average relationship of the dependent variable Y with one independent variable X plus the random error term ε. The stochastic nature of the regression model implies that, for each value of X, there is a sampling distribution of the values of Y. The graphical presentation of a simple linear regression model between two variables Y and X is given below:


Fig. 2-2: Simple linear regression model between Y and X.

Ex. 2-2: The simple linear regression model between the profit and investment of several industries is given by

profi = β0 + β1invi + εi, (i = 1, 2,………, N)   (2.8)

where invi is the ith value of the independent variable investment, profi is the corresponding ith value of the dependent variable profit, εi is the ith value of the random error term ε, β0 is the regression constant, which indicates the average profit when the investment level is zero, and β1 is the regression coefficient, which indicates the average impact on profit of a one-unit change in investment.

Sample Regression Function (SRF)

In a population regression function (PRF), we do not know the values of β0, β1 and εi, but we can obtain estimates of them based on sample information. The sample counterpart of equation (2.4) can be written as

Ŷi = β̂0 + β̂1Xi (deterministic form)   (2.9)

where Ŷi is read as Y-hat or Y-cap; Ŷi is the estimator of E(Yi|Xi), β̂0 is the estimator of β0, and β̂1 is the estimator of β1.

We want to estimate the population regression function (2.3) on the basis of the regression function

Yi = β̂0 + β̂1Xi + ei (stochastic form)   (2.10)

which is called a sample regression function (SRF). When we are dealing with sample data, the individual values of Y are defined as

Yi = Ŷi + ei   (2.11)

where ei, the estimate of εi, is called the residual. Thus, the individual value of Y is equal to the estimate of the conditional mean of Y plus the residual (ei). Equation (2.11) can be written as

Yi = β̂0 + β̂1Xi + ei   (2.12)

The SRF and the PRF are shown together graphically below:


Fig. 2-3: Sample and population regression equations

Ex. 2-3: Economic theory tells us that the quantity demanded (q) of a commodity is a function of its price (p), i.e., q = f(p). However, economic theory does not tell us whether the function is linear or non-linear. If we assume a linear relationship between q and p, we can write

qi = β0 + β1pi   (2.13)

where β1 is always expected to be negative. From the sample, the price elasticity of demand is given by

e_p = β̂1 × (p/q)   (2.14)

where β̂1 is the regression estimate of β1. The relationship between q and p obtained here will not be exact, because q is influenced by many other factors, such as consumers' income, the prices of other related commodities, the time period, the age of consumers, etc. To account for their effects, we use an unknown disturbance term εi. Let ei be the residual. Then the SRF is given by

qi = β̂0 + β̂1pi + ei   (2.15)

where β̂0 is the estimate of β0 and β̂1 is the regression estimate of β1. Equation (2.15) is called a sample regression function (SRF). The predicted line of equation (2.15) is shown for the data given in Table 2-1.

Table 2-1: Random sample from the population of per unit price and consumption of a commodity

Price per unit (p, in TK):   30    48    25    55    40    20    35    50    36    45
Consumption (q, in kg):     112    55   120    40    75   125    98    45    90    65

The actual values with the predicted values are shown below graphically:

Fig. 2-4: Actual versus predicted values
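For readers who wish to reproduce a fit of this kind themselves, the following short Python sketch (added here for illustration; it is not part of the original estimation output) applies the closed-form least squares formulas derived later in this section to the Table 2-1 data and, by way of example, evaluates the price elasticity of demand of equation (2.14) at the sample means. The variable names are illustrative only.

```python
# Minimal sketch: OLS fit of consumption (q) on price (p) for the Table 2-1 data,
# plus the price elasticity of demand of equation (2.14) evaluated at the sample means.
import numpy as np

p = np.array([30, 48, 25, 55, 40, 20, 35, 50, 36, 45], dtype=float)
q = np.array([112, 55, 120, 40, 75, 125, 98, 45, 90, 65], dtype=float)

n = len(p)
b1 = (np.sum(p * q) - n * p.mean() * q.mean()) / (np.sum(p**2) - n * p.mean()**2)
b0 = q.mean() - b1 * p.mean()          # beta_0-hat = q-bar - beta_1-hat * p-bar
e_p = b1 * p.mean() / q.mean()         # elasticity at the sample means

print(f"q-hat = {b0:.4f} + {b1:.4f} p,  elasticity at the means = {e_p:.4f}")
```

Because the slope estimate is negative for these data, the computed elasticity is also negative, as economic theory suggests.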

Statistical Assumptions of Simple Linear Regression Models

Let us consider the simple linear regression model between Y and X of the type

Yi = β0 + β1Xi + εi, (i = 1, 2,………, N)   (2.16)

The application of econometric techniques for the mathematical or empirical measurement of the simple linear regression model rests on some basic assumptions, which are given below:

Normality: The disturbance term εi is normally distributed.

Zero mean: The mean value of the error term εi is zero, i.e., E(εi) = 0 for all i (i = 1, 2,…., N). This means that, for each value of X, some values of εi will be positive and some negative, but their sum will be zero.

Homoscedasticity: Every disturbance has the same variance around its zero mean, whose value is unknown, i.e., Var(εi) = E(εi²) = σ² for all i. This means that, for all values of X, εi shows the same dispersion around the zero mean: the variance of the disturbance term does not depend on the value of X, and the sampling distribution of εi is identical for small as well as large values of X. Thus, we can write εi ~ N(0, σ²) for all i.

Non-autoregression: The disturbance terms are not related to each other, i.e., Cov(εi, εj) = 0 for i ≠ j, i, j = 1, 2, ........., N. A zero covariance between εi and εj means that the value of the disturbance term in one period does not depend on its value in the previous period.

Non-stochastic regressor: The independent or regressor variable X is a non-random variable whose values are fixed in repeated samples.

Stochastic regressand: The dependent or response variable Y is a random (stochastic) variable, so it has a sampling distribution. Yi is also normally distributed, with mean β0 + β1Xi and constant variance σ²; thus we can write Yi ~ N(β0 + β1Xi, σ²).

Independence: Xi and εj are independent for all i and j. Since X is treated as a non-random variable, it follows automatically that E(Xiεj) = 0 for all i and j.

Methods for Estimating Simple Linear Regression Models or Parameters

The following three methods are the most popular and widely used for estimating simple linear regression models, or the parameters of simple linear regression models:
(1) Method of moments
(2) Ordinary least squares (OLS) method
(3) Maximum likelihood (ML) method

Method of Moments

The simple linear regression model between two variables Y and X is given by


Yi = β0 + β1Xi + εi, (i = 1, 2,…………., N)   (2.17)

where Y is called the dependent variable/explained variable/regressand/response variable, X is called the independent variable/explanatory variable/regressor, ε is called the disturbance term or random error term, β0 is called the regression constant, which indicates the average value of Y when X is zero, β1 is the regression coefficient, which indicates the average change in Y for a one-unit change in X, Yi is the ith observation on the dependent variable Y, Xi is the corresponding ith observation on the independent variable X, and εi is the random error term corresponding to the ith set of observations. Since the value of Yi depends on εi, which is unknown to us, we take the expected value of Yi for a given value of Xi, which is given by

E(Yi|Xi = xi) = β0 + β1xi + E(εi)   (2.18)

Assumptions: E(εi) = 0 for all i; Var(εi) = σ² for all i; Cov(εi, εj) = E(εiεj) = 0 for i ≠ j; εi ~ NIID(0, σ²); and E(Xiεj) = 0 for all i and j.

The regressor variable X is a non-random variable whose values are fixed in repeated samples; thus Cov(Xi, εj) = 0 for all i and j. Let yi be the observed value of the dependent variable corresponding to the ith set of observations in the sample, xi the corresponding ith value of the independent variable X, and ei the estimated value of εi (the residual). Let β̂0 and β̂1 be the estimators of β0 and β1 respectively. Then, for the sample observations, the regression equation is

y_i = \hat{\beta}_0 + \hat{\beta}_1 x_i + e_i, \quad (i = 1, 2, \ldots, n), \qquad e_i = y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i   (2.19)

For the sample observations, the moment conditions are

\sum_{i=1}^{n} e_i = 0 \;\Rightarrow\; \sum_{i=1}^{n}\left(y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i\right) = 0   (2.20)

\sum_{i=1}^{n} e_i x_i = 0 \;\Rightarrow\; \sum_{i=1}^{n}\left(y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i\right)x_i = 0   (2.21)

From equation (2.20), it can be written that

\sum_{i=1}^{n} y_i = n\hat{\beta}_0 + \hat{\beta}_1\sum_{i=1}^{n} x_i   (2.22)

and from equation (2.21), we have

\sum_{i=1}^{n} y_i x_i = \hat{\beta}_0\sum_{i=1}^{n} x_i + \hat{\beta}_1\sum_{i=1}^{n} x_i^2   (2.23)

From equation (2.22), we have

\hat{\beta}_0 = \bar{y} - \hat{\beta}_1\bar{x}   (2.24)

Putting this value of β̂0 in equation (2.23), we have

\hat{\beta}_1 = \frac{\sum_{i=1}^{n} x_i y_i - n\bar{x}\bar{y}}{\sum_{i=1}^{n} x_i^2 - n\bar{x}^2}   (2.25)

Therefore, the estimated regression equation is given by

\hat{y}_i = \hat{\beta}_0 + \hat{\beta}_1 x_i   (2.26)

and the estimated residuals are given by

e_i = y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i   (2.27)
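As an illustration of how equations (2.24)-(2.25) can be computed in practice, the following small Python sketch (added for illustration; it is not part of the original text) solves the two sample moment conditions for β̂0 and β̂1. The data used in the example call are artificial.

```python
# Minimal sketch of the moment estimators in equations (2.24)-(2.25): the two sample
# moment conditions sum(e_i) = 0 and sum(e_i * x_i) = 0 are solved for beta_0-hat and
# beta_1-hat.  Illustrative only; any equal-length (x, y) arrays can be passed in.
import numpy as np

def moment_estimates(x, y):
    x, y = np.asarray(x, float), np.asarray(y, float)
    n = x.size
    b1 = (np.sum(x * y) - n * x.mean() * y.mean()) / (np.sum(x**2) - n * x.mean()**2)
    b0 = y.mean() - b1 * x.mean()
    e = y - b0 - b1 * x                          # residuals
    # the two moment conditions hold up to rounding error:
    assert abs(e.sum()) < 1e-8 and abs((e * x).sum()) < 1e-6
    return b0, b1

# Example with artificial numbers (not the bank data of Ex. 2-4):
b0, b1 = moment_estimates([1, 2, 3, 4], [2.1, 3.9, 6.2, 7.8])
print(b0, b1)
```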

Ex. 2-4: A simple linear regression model is estimated by the method of moments using time-series data for the period 1973-2018, taking the yearly total expenditure (in crore BDT) of foreign banks in Bangladesh as the dependent variable and their total income (in crore BDT) as the independent variable.1 Let us consider the regression equation of total expenditure (Y) on total income (X) of the type

Yt = β0 + β1Xt + εt   (2.28)

where Yt is total expenditure at time t, Xt is total income at time t, β0 is the regression constant, which indicates the average expenditure when total income is zero, and β1 indicates the average change in total expenditure for a one-unit change in total income. εt is the random error term corresponding to the tth set of observations, and we assume that it satisfies all the usual assumptions of a linear regression equation. Let β̂0 and β̂1 be the moment estimators of β0 and β1, given by equations (2.24) and (2.25). For the given data, we have

\sum_{t=1}^{T} x_t = 60799.8218, \quad \sum_{t=1}^{T} y_t = 29519.5867, \quad \sum_{t=1}^{T} x_t y_t = 112169938.449, \quad \sum_{t=1}^{T} x_t^2 = 235055920.1

Thus, we have x̄ = 60799.8218/46 = 1321.7353 and ȳ = 29519.5867/46 = 641.7301.

Putting these values into equations (2.25) and (2.24), we have

\hat{\beta}_1 = \frac{112169938.4485 - (46 \times 1321.7353 \times 641.7301)}{235055920.0537 - (46 \times 1321.7353^2)} = 0.4729   (2.29)

and

\hat{\beta}_0 = 641.7301 - (0.4729 \times 1321.7353) = 16.7007   (2.30)

Therefore, the fitted regression equation is

\hat{y}_t = 16.7007 + 0.4729x_t   (2.31)

Comment: From the estimated results, it is found that if yearly income increases by one crore BDT, the total expenditure of foreign banks in Bangladesh will increase on average by BDT 0.4729 crore. If there is no income, the average expenditure will be BDT 16.7007 crore.

The predicted values with the actual values are also shown below graphically:

1

Source: Statistics Department, Bangladesh Bank

Fig. 2-5: Actual versus predicted values

Ordinary Least Squares (OLS) Method

The OLS method is the most popular of these methods and is widely used for estimating simple linear regression models. Let us consider the following simple linear regression model between two variables Y and X:

Yi = β0 + β1xi + εi, (i = 1, 2, …………, N)   (2.32)

All the terms, including the assumptions of the linear regression equation, have been discussed previously. Since the value of Yi depends on εi, which is unknown to us, we take the expected value of Yi for a given value of Xi. The conditional mean of Yi is

E(Yi|Xi = xi) = β0 + β1xi + E(εi|xi)   (2.33)

Based on the assumptions, equation (2.33) can be written as

yi = β0 + β1xi   (2.34)

Let yi be the ith observed value of the variable Y, xi the corresponding ith observation on the variable X, and ei the residual corresponding to the ith set of observations in the sample, and let β̂0 and β̂1 be the least squares estimators of β0 and β1 respectively. Then the simple linear relationship between Y and X for the sample observations is

y_i = \hat{\beta}_0 + \hat{\beta}_1 x_i + e_i, \quad (i = 1, 2, \ldots, n), \qquad e_i = y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i   (2.35)

The principle of the least squares method is that the OLS estimators β̂0 and β̂1 are obtained by minimising the residual sum of squares. Squaring the residuals and summing, we can write

\sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n}\left(y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i\right)^2   (2.36)

To find the values of β̂0 and β̂1 that minimise this residual sum of squares, we differentiate the sum of squared residuals with respect to β̂0 and β̂1 and equate the derivatives to zero. These are called the first-order conditions for minimisation:

\frac{\partial \sum e_i^2}{\partial \hat{\beta}_0} = 0 \;\Rightarrow\; -2\sum_{i=1}^{n}\left(y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i\right) = 0   (2.37)

\frac{\partial \sum e_i^2}{\partial \hat{\beta}_1} = 0 \;\Rightarrow\; -2\sum_{i=1}^{n}\left(y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i\right)x_i = 0   (2.38)

From equation (2.37), we have

\sum_{i=1}^{n} y_i - n\hat{\beta}_0 - \hat{\beta}_1\sum_{i=1}^{n} x_i = 0   (2.39)

and from equation (2.38), we get

\sum_{i=1}^{n} y_i x_i - \hat{\beta}_0\sum_{i=1}^{n} x_i - \hat{\beta}_1\sum_{i=1}^{n} x_i^2 = 0   (2.40)

Equations (2.39) and (2.40) are called the normal equations. These normal equations can also be written as

n\hat{\beta}_0 + \hat{\beta}_1\sum_{i=1}^{n} x_i = \sum_{i=1}^{n} y_i   (2.41)

\hat{\beta}_0\sum_{i=1}^{n} x_i + \hat{\beta}_1\sum_{i=1}^{n} x_i^2 = \sum_{i=1}^{n} x_i y_i   (2.42)

There are two linear equations, (2.41) and (2.42), in the two unknowns β̂0 and β̂1. The estimators β̂0 and β̂1 can therefore be obtained by solving this system of two linear equations by different methods, namely: (i) the substitution procedure, (ii) the matrix form, and (iii) Cramer's rule.

(i) Substitution Procedure

The first equation of the system gives

\hat{\beta}_0 = \bar{y} - \hat{\beta}_1\bar{x}   (2.43)

Substituting this value of β̂0 into the second equation of the system, we have

\hat{\beta}_1 = \frac{\sum_{i=1}^{n} x_i y_i - n\bar{x}\bar{y}}{\sum_{i=1}^{n} x_i^2 - n\bar{x}^2}   (2.44)

(ii) Matrix Form

The system of normal equations can be written in the matrix form

\begin{bmatrix} n & \sum_{i=1}^{n} x_i \\ \sum_{i=1}^{n} x_i & \sum_{i=1}^{n} x_i^2 \end{bmatrix}\begin{bmatrix} \hat{\beta}_0 \\ \hat{\beta}_1 \end{bmatrix} = \begin{bmatrix} \sum_{i=1}^{n} y_i \\ \sum_{i=1}^{n} x_i y_i \end{bmatrix}   (2.45)

Let us define

A = \begin{bmatrix} n & \sum x_i \\ \sum x_i & \sum x_i^2 \end{bmatrix}, \quad \hat{\beta} = \begin{bmatrix} \hat{\beta}_0 \\ \hat{\beta}_1 \end{bmatrix}, \quad b = \begin{bmatrix} \sum y_i \\ \sum x_i y_i \end{bmatrix}

Thus, equation (2.45) can be written as A\hat{\beta} = b, so that

\hat{\beta} = A^{-1}b   (2.46)

Solving equation (2.46), we can find the values of β̂0 and β̂1. We have

A^{-1} = \frac{1}{n\sum_{i=1}^{n} x_i^2 - n^2\bar{x}^2}\begin{bmatrix} \sum_{i=1}^{n} x_i^2 & -n\bar{x} \\ -n\bar{x} & n \end{bmatrix}   (2.47)

Putting the values of A^{-1} and b in equation (2.46), we have

\hat{\beta} = \frac{1}{n\sum_{i=1}^{n} x_i^2 - n^2\bar{x}^2}\begin{bmatrix} \sum x_i^2 & -n\bar{x} \\ -n\bar{x} & n \end{bmatrix}\begin{bmatrix} \sum y_i \\ \sum x_i y_i \end{bmatrix} = \frac{1}{n\sum_{i=1}^{n} x_i^2 - n^2\bar{x}^2}\begin{bmatrix} n\bar{y}\sum x_i^2 - n\bar{x}\sum x_i y_i \\ n\sum x_i y_i - n^2\bar{x}\bar{y} \end{bmatrix}   (2.48)

From equation (2.48), we have

\hat{\beta}_0 = \frac{n\bar{y}\sum_{i=1}^{n} x_i^2 - n\bar{x}\sum_{i=1}^{n} x_i y_i}{n\sum_{i=1}^{n} x_i^2 - n^2\bar{x}^2}   (2.49)

and

\hat{\beta}_1 = \frac{n\sum_{i=1}^{n} x_i y_i - n^2\bar{x}\bar{y}}{n\sum_{i=1}^{n} x_i^2 - n^2\bar{x}^2}   (2.50)

Equation (2.50) can also be expressed as

\hat{\beta}_1 = \frac{\sum_{i=1}^{n} x_i y_i - n\bar{x}\bar{y}}{\sum_{i=1}^{n} x_i^2 - n\bar{x}^2} = \frac{SP(x,y)}{SS(x)}   (2.51)

Equation (2.49) can also be written as

\hat{\beta}_0 = \frac{\bar{y}\sum_{i=1}^{n} x_i^2 - n\bar{x}^2\bar{y} - \bar{x}\left(\sum_{i=1}^{n} x_i y_i - n\bar{x}\bar{y}\right)}{\sum_{i=1}^{n} x_i^2 - n\bar{x}^2} = \bar{y} - \frac{\left(\sum_{i=1}^{n} x_i y_i - n\bar{x}\bar{y}\right)\bar{x}}{\sum_{i=1}^{n} x_i^2 - n\bar{x}^2} = \bar{y} - \hat{\beta}_1\bar{x}   (2.52)

(iii) Cramer's Rule

Using Cramer's rule, β̂0 and β̂1 are given by

\hat{\beta}_0 = \frac{\begin{vmatrix} \sum y_i & \sum x_i \\ \sum x_i y_i & \sum x_i^2 \end{vmatrix}}{\begin{vmatrix} n & \sum x_i \\ \sum x_i & \sum x_i^2 \end{vmatrix}}   (2.53)

and

\hat{\beta}_1 = \frac{\begin{vmatrix} n & \sum y_i \\ \sum x_i & \sum x_i y_i \end{vmatrix}}{\begin{vmatrix} n & \sum x_i \\ \sum x_i & \sum x_i^2 \end{vmatrix}}   (2.54)

From equation (2.54), we have

\hat{\beta}_1 = \frac{n\sum x_i y_i - n^2\bar{x}\bar{y}}{n\sum x_i^2 - n^2\bar{x}^2} = \frac{\sum x_i y_i - n\bar{x}\bar{y}}{\sum x_i^2 - n\bar{x}^2} = \frac{SP(x,y)}{SS(x)}   (2.55)

From equation (2.53), we have

\hat{\beta}_0 = \frac{n\bar{y}\sum x_i^2 - n\bar{x}\sum x_i y_i}{n\sum x_i^2 - n^2\bar{x}^2} = \frac{\bar{y}\sum x_i^2 - n\bar{x}^2\bar{y} - \bar{x}\left(\sum x_i y_i - n\bar{x}\bar{y}\right)}{\sum x_i^2 - n\bar{x}^2} = \bar{y} - \hat{\beta}_1\bar{x}   (2.56)

Thus, all three methods yield the same results. The second-order conditions for minimisation are

\frac{\partial^2 \sum e_i^2}{\partial \hat{\beta}_1^2} = 2\sum_{i=1}^{n} x_i^2 > 0   (2.57)

\frac{\partial^2 \sum e_i^2}{\partial \hat{\beta}_0^2} = 2n > 0   (2.58)

so the second-order conditions for minimisation are satisfied.

The third-order condition for minimisation implies that

\frac{\partial^2 \sum e_i^2}{\partial \hat{\beta}_1^2} \times \frac{\partial^2 \sum e_i^2}{\partial \hat{\beta}_0^2} > \left[\frac{\partial^2 \sum e_i^2}{\partial \hat{\beta}_1 \partial \hat{\beta}_0}\right]^2   (2.59)

We have

\frac{\partial^2 \sum e_i^2}{\partial \hat{\beta}_1 \partial \hat{\beta}_0} = 2n\bar{x}   (2.60)

We know that

\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^2 \geq 0 \;\Rightarrow\; \sum_{i=1}^{n} x_i^2 - n\bar{x}^2 \geq 0 \;\Rightarrow\; 4n\sum_{i=1}^{n} x_i^2 \geq 4n^2\bar{x}^2 \;\Rightarrow\; 2\sum_{i=1}^{n} x_i^2 \times 2n \geq \left[2n\bar{x}\right]^2   (2.61)

From equation (2.61), we have (strictly, provided the xi are not all equal)

\frac{\partial^2 \sum e_i^2}{\partial \hat{\beta}_1^2} \times \frac{\partial^2 \sum e_i^2}{\partial \hat{\beta}_0^2} > \left[\frac{\partial^2 \sum e_i^2}{\partial \hat{\beta}_1 \partial \hat{\beta}_0}\right]^2   (2.62)

Thus, the third-order condition for minimising the residual sum of squares is also satisfied. Therefore, equations (2.43) and (2.44), or (2.51) and (2.52), or (2.55) and (2.56) give us the OLS estimators of β0 and β1. The estimated equation is

\hat{y}_i = \hat{\beta}_0 + \hat{\beta}_1 x_i   (2.63)

and the estimated residuals are

e_i = y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i, \quad (i = 1, 2, \ldots, n)   (2.64)
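As a computational footnote to the matrix form above, the following Python sketch (illustrative, not part of the original text) builds the matrix A and the vector b of equation (2.45) from a data set and solves the system Aβ̂ = b numerically; the simulated data are arbitrary.

```python
# Minimal sketch of the matrix solution beta-hat = A^{-1} b of the normal equations
# (2.45)-(2.46).  A = X'X and b = X'y for the design matrix X = [1, x], so the same
# estimates can equivalently be written as (X'X)^{-1} X'y.
import numpy as np

def ols_via_normal_equations(x, y):
    x, y = np.asarray(x, float), np.asarray(y, float)
    n = x.size
    A = np.array([[n, x.sum()], [x.sum(), (x**2).sum()]])
    b = np.array([y.sum(), (x * y).sum()])
    return np.linalg.solve(A, b)        # returns [beta_0-hat, beta_1-hat]

# Example with artificial data:
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, 50)
y = 2.0 + 0.5 * x + rng.normal(0, 1, 50)
print(ols_via_normal_equations(x, y))
```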

Ex. 2-5: A simple linear regression model is estimated by the OLS method using time-series data for the period 1973 to 2018, taking the yearly total expenditure (in crore BDT) of the state-owned banks of Bangladesh as the dependent variable and their total income (in crore BDT) as the independent variable.2 Assume that the relationship between total expenditure (Y) and total income (X) of the state-owned banks is linear. The linear regression equation of total expenditure (Y) on total income (X) is

Yt = β0 + β1Xt + εt   (2.65)

Here, the variable Y indicates the total expenditure and the variable X the total income of the state-owned banks. β0 is the regression constant, which indicates the average expenditure of the state-owned banks when income is zero, β1 is the regression coefficient, which indicates the impact on average expenditure of a one-unit change in total income, and ε is the random error term. Let β̂0 and β̂1 be the least squares estimators of β0 and β1. For the data, we have

\sum_{t=1}^{T} x_t = 216693.86, \quad \sum_{t=1}^{T} y_t = 181226.06, \quad \sum_{t=1}^{T} x_t y_t = 2260000000, \quad \sum_{t=1}^{T} x_t^2 = 2750700000

Thus, we have x̄ = 216693.86/46 = 4710.7361 and ȳ = 181226.06/46 = 3939.697.

Putting these values into equations (2.55) and (2.56), we have

\hat{\beta}_1 = \frac{2.2600\times10^{9} - (46 \times 4710.7361 \times 3939.697)}{2.7507\times10^{9} - (46 \times 4710.7361^2)} = 0.8129   (2.66)

and

\hat{\beta}_0 = 3939.697 - 0.8129 \times 4710.7361 = 110.3103   (2.67)

So the estimated regression equation is

\hat{y}_t = 110.3103 + 0.8129x_t   (2.68)

Comment: From the estimated results, it is found that if yearly income increases by one crore BDT, total expenditure will increase on average by BDT 0.8129 crore. If there is no income, the average expenditure will be BDT 110.3103 crore.

Comparison: From the estimated results, it is found that for a one-unit increase in yearly income, the average increase in expenditure is higher in the state-owned banks than in the foreign banks. It is also found that, at a zero level of income, the average expenditure of the state-owned banks is much higher than that of the foreign banks in Bangladesh. This clearly points to corruption, mismanagement, and over-employment in the state-owned banking sector in Bangladesh.

2

Source: Statistics Department, Bangladesh Bank


Note: Various software packages, such as RATS, EViews, STATA, R and Python, can be applied directly by students, researchers and academics to estimate a simple linear regression equation. The OLS estimates obtained directly with EViews are given in Table 2-2.

Table 2-2: Least squares estimates for the given problem

Dependent Variable: TEX
Method: Least Squares
Sample: 1973 2018
Included observations: 46

Variable      Coefficient    Std. Error    T-Stat      Signif
Constant      110.3103       107.7777      1.0235      0.3117
TINC          0.8129         0.01393       58.3249     0.0000

R-squared            0.9872       Mean dependent var       3939.697
Adjusted R-squared   0.9869       S.D. dependent var       5072.667
S.E. of regression   579.6932     Akaike info criterion    15.60538
Sum squared resid    14785943     Schwarz criterion        15.68489
Log-likelihood       -356.9237    Hannan-Quinn criterion   15.63516
F-statistic          3401.791     Durbin-Watson stat       0.678090
Prob(F-statistic)    0.0000
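The same regression can be reproduced with free software. The following sketch shows one way of doing so with the statsmodels package in Python; the file name "banks.csv" and its column names TEX and TINC are hypothetical stand-ins for the Bangladesh Bank series used in Ex. 2-5, not files distributed with this book.

```python
# A sketch of how the Table 2-2 regression could be reproduced in Python with
# statsmodels; file and column names are assumed, not taken from the original source.
import pandas as pd
import statsmodels.api as sm

data = pd.read_csv("banks.csv")         # assumed columns: YEAR, TEX, TINC
X = sm.add_constant(data["TINC"])       # adds the intercept column
model = sm.OLS(data["TEX"], X).fit()
print(model.summary())                  # coefficients, t-statistics, R-squared, DW, etc.
```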


The predicted values and the actual values are also shown graphically below:

Fig. 2-6: Actual versus predicted values

Maximum Likelihood (ML) Method

Under the assumptions on the random error term εi, Yi (i = 1, 2,…., n) is also a normal variate, with mean β0 + β1Xi and variance σ². The likelihood function is given by

L(Y, \beta_0, \beta_1, \sigma^2) = f(Y_1, \beta_0, \beta_1, \sigma^2)\, f(Y_2, \beta_0, \beta_1, \sigma^2) \cdots f(Y_n, \beta_0, \beta_1, \sigma^2) = \prod_{i=1}^{n} f(Y_i, \beta_0, \beta_1, \sigma^2)

L(Y, \beta_0, \beta_1, \sigma^2) = \prod_{i=1}^{n}\frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{1}{2\sigma^2}\left[Y_i - \beta_0 - \beta_1 X_i\right]^2} = \left[\frac{1}{\sqrt{2\pi\sigma^2}}\right]^{n} e^{-\frac{1}{2\sigma^2}\sum_{i=1}^{n}\left[Y_i - \beta_0 - \beta_1 X_i\right]^2}   (2.69)

Taking the logarithm of equation (2.69), we have

\log L(Y, \beta_0, \beta_1, \sigma^2) = -\frac{n}{2}\log(2\pi) - \frac{n}{2}\log(\sigma^2) - \frac{1}{2\sigma^2}\sum_{i=1}^{n}\left[Y_i - \beta_0 - \beta_1 X_i\right]^2   (2.70)


Let β̂0, β̂1, and σ̂² be the maximum likelihood estimators of β0, β1, and σ². The estimated log-likelihood function is then

\log L(Y, \hat{\beta}_0, \hat{\beta}_1, \hat{\sigma}^2) = -\frac{n}{2}\log(2\pi) - \frac{n}{2}\log(\hat{\sigma}^2) - \frac{1}{2\hat{\sigma}^2}\sum_{i=1}^{n}\left[Y_i - \hat{\beta}_0 - \hat{\beta}_1 X_i\right]^2   (2.71)

The maximum likelihood estimators β̂0, β̂1, and σ̂² are obtained by taking the partial derivatives of logL with respect to β̂0, β̂1, and σ̂² and equating them to zero. These are called the first-order conditions, and they imply that

\frac{\partial \log L}{\partial \hat{\beta}_0} = 0 \;\Rightarrow\; \frac{1}{\hat{\sigma}^2}\sum_{i=1}^{n}\left[Y_i - \hat{\beta}_0 - \hat{\beta}_1 X_i\right] = 0   (2.72)

\frac{\partial \log L}{\partial \hat{\beta}_1} = 0 \;\Rightarrow\; \frac{1}{\hat{\sigma}^2}\sum_{i=1}^{n}\left[Y_i - \hat{\beta}_0 - \hat{\beta}_1 X_i\right]X_i = 0   (2.73)

\frac{\partial \log L}{\partial \hat{\sigma}^2} = 0 \;\Rightarrow\; -\frac{n}{2\hat{\sigma}^2} + \frac{1}{2\hat{\sigma}^4}\sum_{i=1}^{n}\left[Y_i - \hat{\beta}_0 - \hat{\beta}_1 X_i\right]^2 = 0   (2.74)

From equation (2.72), we have

\hat{\beta}_0 = \bar{Y} - \hat{\beta}_1\bar{X}   (2.75)

Putting this value of β̂0 in equation (2.73) and solving, we have

\hat{\beta}_1 = \frac{\sum_{i=1}^{n} X_i Y_i - n\bar{X}\bar{Y}}{\sum_{i=1}^{n} X_i^2 - n\bar{X}^2}   (2.76)

From equation (2.74), we have

\hat{\sigma}^2 = \frac{1}{n}\sum_{i=1}^{n}\left(Y_i - \hat{\beta}_0 - \hat{\beta}_1 X_i\right)^2   (2.77)

The second-order partial derivatives of logL with respect to β̂0, β̂1, and σ̂² are negative at these solutions (for example, ∂²logL/∂β̂0² = −n/σ̂² < 0), so the first-order conditions do locate a maximum of the likelihood.

Equation (2.195) indicates the ex-ante equality in terms of unobserved expectations. Thus, equation (2.195) empirically cannot be estimated and tested statistically. Expectations equation (2.195) is a value that has not been

Simple Regression Models

57

observed. However, if we assume that the expectations are rational, so that expectations of economic agents correspond to mathematical expectations, we can derive an equation from equation (2.195) which involves actual return which is called a regression equation. Therefore, the derived equation of the CAPM model can be empirically estimated and testable. To derive the regression equation, let us now define the unexpected returns on asset j at time t as u jt = R jt  E(R jt )

(2.197)

E(R jt ) = R jt  u jt

Again, let us define the unexpected returns on the market portfolio as u mt = R mt  E(R mt ) E(R mt )

(2.198)

R mt  u mt

Since, R f is constant, equation (2.195), can be written as: R jt  u jt  R f = ȕ j (R mt  u mt  R f ) R jt  R f = ȕ j (R mt  R f ) + u jt  ȕ j u mt

R jt  R f = ȕ j (R mt  R f ) + İ jt

(2.199)

where İ jt = u jt  ȕ j u mt Equation (2.199) is called a regression equation without an intercept. If we add the intercept term the regression equation of the CAPM model is given by (2.200)

R jt  R f = ȕ 0 + ȕ j (R mt  R f ) + İ jt

where İ jt is a random error term. This error term is not something just added to the model, but it has a meaning, being a function of unexpected returns. This error term satisfies some minimal requirements for a regression error term that are given below: (i) It has a zero mean Proof: The expected value of the error term İ jt is given by E(İ jt ) = E(u jt )  ȕ j E(u mt )

(2.201)

0

(ii) It has a constant variance Proof: The variance of the error term İ jt is given by Var(İ jt ) = Var(u jt )  2ȕ jCov(u jt ,u mt )+ȕ 2j Var(u mt )

(2.202)

Since, İ jt is uncorrelated with the regressor (R mt  R f ), from the definition of ȕ j we have ȕj=

Cov(u jt ,u mt )

(2.203)

Var(u mt )

Therefore, equation (2.202) can be written as Var(İ jt )

Var(u jt )  2ȕ 2j Var(u mt )+ȕ 2j Var(u mt ) Var(u jt )  ȕ 2j Var(u mt ) 2

ª Cov(u jt ,u mt ) º = Var(u jt )  « » Var(u mt ) ¬ Var(u mt ) ¼ ı2 = ı12  122 ı2

(2.204)

Chapter Two

58

Where ı12 = Var(u jt ), ı 22 = Var(u mt ), and ı12 = Cov(u jt ,u mt ). Thus, it can be said that the variance of Var(İ jt ) is constant. (iii) It is uncorrelated with the regressor (R mt  R f ) Proof: The covariance between İ jt and (R mt  R f ) is given by Cov ^İ jt (R mt  R f )` = E ^İ jt (R mt  R f )` E ^İ jt R mt `

[R f is not stocchastic, E(İ jt R f )

^

E u jt  ȕ j u mt u mt

0]

`

E(u jt u mt )  ȕ j E(u 2mt ) E(u jt u mt )  E(u jt u mt )

=0

(2.205)

Thus, we can apply the OLS method to equation (2.199) or in equation (2.200). We know that the OLS estimators are asymptotically normally distributed thus by virtue of the asymptotic properties, the OLS estimates and the standard tests are appropriate in the case of the CAP model. Estimation and Testing of the CAPM

In this section, the CAPM is estimated for Grameen Phone (GP) Limited in Bangladesh using the daily price of GP and DSEX index for the year 2015 and the return on 10 years of government bonds is used as a risk-free return which is equal to 2.62%. To estimate the CAPM model for the daily return of GP3, STATA is used and the results are given below: Table 2-5: OLS estimates of the CAPM for GP

Variable Constant X (Excess market return) Number of Observation R2 AdjR 2 Residual Sum of squares Residual MS

Coef. 0.1771 0.2907

Std. Err. 0.0812 0.1053 202 0.0367 0.0319 335.5101 1.678

t-Test P>|t| [95% Conf. Interval] 1.94 0.030 [-0.0027, 0.3570] 2.76 0.006 [0.0830, 0.4984] Regression Sum of Squ. 12.7763 12.7763 Regression MS 1.2952 Root MSE F(1, 200) 7.62 0.0063 Prob of F

The estimated beta coefficient of the CAP model indicates how sensitive the value of industry share is to general market movements. From the estimated results, it is found that, for 100% excess market return at DSE corresponds to an expected excess return of GP to be about 29.07% and it is statistically significant at a 1% significance level. From the estimated intercept term, it can be concluded that the CAPM is valid at a 5% level of significance. The point estimate of 0.1771 implies that GP is expected to have a return that is 0.1771% per day higher than the CAPM predicts. The R 2 value is quite lower, thus, it can be concluded that the fit is not good.

1 ª­ ½º °§ R 1 ·§ R 2 · § R n · n °» « Average of Daily Market Return = ®¨ 1+ 1+ .......¨ 1+  1¾ u 100 ¸¨ ¸ ¸ « © 100 ¹© 100 ¹ » © 100 ¹ ° °¿ » ¬« ¯ ¼ Here, R 1 , R 2 ,.......,R n are the daily market returns at day-1,day-2,......day-n respectively

3

ª§ Average of Daily Market Returns ·T º Therefore, R m = «¨ 1+ ¸  1» ×100 , Here, T=365 days, and n=No. of trading days. 100 ¹ »¼ ¬«©

Simple Regression Models

59

2.12 Two Variables Non-linear Relationships In the previous sections, two-variable linear relationships were discussed. However, in practice, most of the business and economic relationships are not linear. The relationships between economic variables can be adequately represented only by non-linear functions. For example, the production function, the consumption function, the investment function, the cost function, the supply function, etc. are not linear. In such a situation, the method of least squares cannot be applied directly to estimate the parameters or regression models. Thus, to apply the OLS method for estimation, first, we have to transform the non-linear relationship into a linear form, and then the OLS method can be applied to the transformed equation. The logarithmic and reciprocal are the most commonly used transformations. Some of the transformations are discussed below: Linear-Log Model

A functional relationship between two variables Y and X is said to be a linear-log model in which only the independent variable X appears in a logarithmic form but the dependent variable Y is unchanged. The linear-log model is given by (2.206)

Yi = ȕ 0 +ȕ1lnX i +İ i

From equation (2.206), we have įYi įX i

ȕ1 Xi

(2.207)

From equation (2.207), it is clear to us, if ȕ1 ! 0, the marginal change in Y with respect to an increase in X is a decreasing function of X. Ex. 2-11: Let the variable Y be the production of potatoes and X be the number of acres cultivated land in Bangladesh. įY is the marginal product of an extra acre of cultivated land. We assume that the marginal product decreases Then, įX as acreage increases. This is because, when the acreage is low, we expect that more fertile lands will be cultivated first. As acreage increases, less fertile areas will be put to use, and the additional output from these areas may not be as high as the output from the more fertile lands. This suggests a diminishing marginal product of potato acreage. Thus, for the relationship between the production of potatoes and the acreage cultivated lands, we have to use a linear-log model.

Fig. 2-13(a): A non-linear log functional form (Y plotted against X)
Fig. 2-13(b): A linear-log functional form (Y plotted against ln(X))

Estimation of a Linear-Log Model

The linear-log model between two variables Y and X is given by

Yᵢ = β₀ + β₁lnXᵢ + εᵢ    (2.208)

where the variable Y is the dependent variable, the variable X is the independent variable, the regression coefficient β₁ indicates that for a 100% change in the independent variable X, the variable Y will change by β₁/100, and the regression constant β₀ indicates the average value of Y when ln(Xᵢ) is zero, that is, when X is one. Let Zᵢ = lnXᵢ; then equation (2.208) can be written as

Yᵢ = β₀ + β₁Zᵢ + εᵢ    (2.209)

This is a linear function of Y on Z. Thus, we can apply the OLS method to the transformed equation to estimate β₀ and β₁. Let β̂₀ and β̂₁ be the least squares estimates of β₀ and β₁, which are given by

β̂₀ = Ȳ − β̂₁Z̄    (2.210)

and

β̂₁ = (Σᵢ₌₁ⁿ ZᵢYᵢ − nZ̄Ȳ) / (Σᵢ₌₁ⁿ Zᵢ² − nZ̄²)    (2.211)

So, the estimated equation will be

Ŷᵢ = β̂₀ + β̂₁Zᵢ    (2.212)
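A minimal Python sketch of equations (2.210)–(2.212) is given below. It uses a small set of made-up (X, Y) values purely to illustrate the computation; it is not the potato data analysed in Ex. 2-12.

```python
import numpy as np

# Sketch: OLS estimation of the linear-log model Y = b0 + b1*ln(X) + e
# using equations (2.210) and (2.211); the data below are illustrative only.
X = np.array([10.0, 25.0, 60.0, 150.0, 400.0, 900.0])
Y = np.array([120.0, 180.0, 230.0, 290.0, 350.0, 410.0])

Z = np.log(X)                     # transformed regressor Z = ln(X)
n = len(Y)

b1 = (np.sum(Z * Y) - n * Z.mean() * Y.mean()) / (np.sum(Z**2) - n * Z.mean()**2)  # eq. (2.211)
b0 = Y.mean() - b1 * Z.mean()                                                      # eq. (2.210)

Y_hat = b0 + b1 * Z               # fitted values, eq. (2.212)
r2 = 1 - np.sum((Y - Y_hat)**2) / np.sum((Y - Y.mean())**2)

print(f"b0 = {b0:.4f}, b1 = {b1:.4f}, R^2 = {r2:.4f}")
```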

Ex. 2-12: A linear-log model is estimated using the OLS method based on time-series data for the period 1983 to 2018, with the production of potatoes (in thousand metric tons) as the dependent variable and the irrigated area (in acres) as the independent variable for Bangladesh⁴. The following linear-log model between the production of potatoes (Y) and the irrigated area (X) is considered for estimation:

Y_t = β₀ + β₁lnX_t + ε_t    (2.213)

where Y_t indicates the production of potatoes in thousand metric tons at time t, X_t indicates irrigated land in acres at time t, the regression coefficient β₁ indicates that for a 100% change in the irrigated area the production of potatoes will change by β₁/100 units, and the regression constant β₀ indicates the average production of potatoes when ln(X_t) is zero, that is, when the irrigated area is one acre. ε_t is the random error term corresponding to the tth set of observations, which satisfies all the usual assumptions. Let Z_t = lnX_t; the transformed equation is given by

Y_t = β₀ + β₁Z_t + ε_t    (2.214)

We can now apply the OLS method to equation (2.214). For the given problem, we have

ΣY_t = 146500,  ΣZ_t = Σln(X_t) = 463.1165,  ΣZ_tY_t = 1959152.1159, and ΣZ_t² = 5974.4426.

Thus, we have

Ȳ = 146500/36 = 4069.4444, and Z̄ = 463.1165/36 = 12.8643

The OLS estimates β̂₁ and β̂₀ are given by

β̂₁ = [1959152.1159 − (36 × 12.8643 × 4069.4444)] / [5974.4426 − (36 × 12.8643²)] = 4448.6897    (2.215)

and

β̂₀ = 4069.4444 − (4448.6897 × 12.8643) = −53160.0393    (2.216)

Therefore, the estimated equation will be

Ŷᵢ = −53160.0393 + 4448.6897lnXᵢ,  R² = 0.9192    (2.217)
SE:      2914.3871    226.2298
t-test:  -18.2405     19.6645

4 Source: Statistical Yearbooks of Bangladesh

Comment: From the estimated results, it can be said that for a 100% increase in the irrigated area in Bangladesh, the production of potatoes will increase by about 44.4868 thousand metric tons. The regression coefficient of log(X) is highly significant, thereby supporting the hypothesis that the marginal effect on the production of potatoes decreases as the irrigated area increases. From the coefficient of determination, it can be concluded that the linear-log model fits the data very well: about 91.92% of the total variation of the dependent variable is explained by the fitted equation and the remaining 8.08% is due to random factors.

Note: The linear-log model is also estimated using RATS and the results are given in Table 2-6.

Table 2-6: The OLS estimates of the linear-log model

Linear-log model – Estimation by Least Squares, Dependent Variable: Production of Potatoes (Y), Annual Data From 1982:01 To 2017:01
Variable     Coefficient      Std. Error    T-Stat      Signif
Constant     -53160.03929     2914.3871     -18.2405    0.0000
Z            4448.6897        226.2298      19.6644     0.0000
Usable Observations 36                    Standard Error of Estimate 925.9466
Centered R2 0.9192                        Sum of Squared Residuals 29150822.61
Adjusted R2 0.9168                        Regression F(1,34) 386.6915
Uncentered R2 0.9695                      Significance Level of F 0.0000
TR2 34.903                                Log Likelihood -295.9623
Mean of Dependent Variable 4069.4444      Durbin-Watson Statistic 0.5937
Std Error of Dependent Variable 3210.2127

Also, the predicted values are shown together with the actual values graphically:

Fig. 2-14: Actual versus fitted values of the linear-log model (Y and fitted values plotted against z = lnX)

Reciprocal Model

A functional relationship between two variables Y and X is said to be a reciprocal model when the independent variable X appears in reciprocal form and the dependent variable Y remains unchanged. The reciprocal relationship between Y and X is given by

Yᵢ = β₀ + β₁(1/Xᵢ) + εᵢ    (2.218)


Ex. 2-13: The demand function of a commodity may take a reciprocal form because there is an inverse relationship between the quantity demanded of a commodity and the per unit price of that commodity. Thus, the demand curve is downward sloping, and we would expect β₁ to be positive. As X becomes very large, Y asymptotically approaches β₀.

Fig. 2-15: A reciprocal relationship between Y and X (Y approaches the asymptote β₀ as X increases)

Estimation of a Reciprocal Model

The reciprocal model between Y and X is given by

Yᵢ = β₀ + β₁(1/Xᵢ) + εᵢ    (2.219)

Let us define Zᵢ = 1/Xᵢ. Thus, equation (2.219) can be written as

Yᵢ = β₀ + β₁Zᵢ + εᵢ    (2.220)

The transformed model (2.220) represents a linear relationship between Y and Z. Thus, we can apply the OLS method to the transformed model to estimate β₀ and β₁.

Ex. 2-14: A reciprocal model is estimated using the OLS method based on time-series data for the USA⁵ from 1975 to 2018, with the per capita consumption of beef (in pounds) as the dependent variable and the real retail price of beef per pound (in cents) as the independent variable. Let us consider the relationship between the per capita consumption of beef (q) and the per unit price of beef (p) of the type

q_t = β₀ + β₁(1/p_t) + ε_t    (2.221)

where q_t is the per capita consumption of beef (in pounds) at time t, p_t is the per unit price of beef at time t, β₁ is the average impact on q of a per unit change in 1/p, and β₀ is the average value of q when 1/p_t is zero; for a per unit change in p, the average change in q will be −β₁/p_t². ε_t is the random error term corresponding to the tth set of observations, which satisfies all the usual assumptions. Let x_t = 1/p_t. Therefore, the transformed equation is given by

q_t = β₀ + β₁x_t + ε_t    (2.222)

This is a linear function of q on x. Thus, we can apply the OLS method to the transformed equation to estimate ȕ 0 and ȕ1 .

5 Source: USDA, National Chicken Council of USA


For the given data, we have

q̄ = 3020.2/44 = 68.6409,  x̄ = 0.1505/44 = 0.00342,  Σq_t x_t = 10.8788, and Σx_t² = 5.9099e-004.

Thus, the OLS estimate β̂₁ of β₁ is given by

β̂₁ = [10.8788 − (44 × 0.00342 × 68.6409)] / [5.9099e-004 − (44 × 0.00342²)] = 7194.4603    (2.223)

and

β̂₀ = 68.6409 − (7194.4603 × 0.00342) = 44.0319    (2.224)

Therefore, the estimated equation will be

q̂_t = 44.0319 + 7194.4603x_t,  R² = 0.9337    (2.225)
SE:      1.0844     295.8963
t-test:  40.6035    24.3141

Comment: From the estimated results, it can be said that, if the value of x = 1/p increases by one unit, the average per capita consumption of beef will increase by 7194.4603 pounds. If the value of x is zero, the average per capita consumption of beef will be 44.0319 pounds. The relationship between q and 1/p is statistically significant at any conventional significance level, and the fit is very good.
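The following Python sketch shows how such a reciprocal model can be fitted by applying OLS to the transformed regressor x = 1/p. The price and consumption figures used here are made up for illustration and are not the USDA series used in Ex. 2-14.

```python
import numpy as np

# Sketch: fitting the reciprocal model q = b0 + b1*(1/p) + e (illustrative data only).
p = np.array([180.0, 200.0, 230.0, 260.0, 300.0, 350.0])   # price per unit (hypothetical)
q = np.array([82.0, 79.0, 74.0, 71.0, 68.0, 65.0])         # per capita consumption (hypothetical)

x = 1.0 / p                                                 # transformed regressor
n = len(q)

b1 = (np.sum(x * q) - n * x.mean() * q.mean()) / (np.sum(x**2) - n * x.mean()**2)
b0 = q.mean() - b1 * x.mean()

q_hat = b0 + b1 * x
r2 = 1 - np.sum((q - q_hat)**2) / np.sum((q - q.mean())**2)
print(f"b0 = {b0:.4f}, b1 = {b1:.4f}, R^2 = {r2:.4f}")

# Implied marginal effect of price at a given p: dq/dp = -b1 / p^2
print("Marginal effect at p = 250:", -b1 / 250.0**2)
```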

The results are also obtained using RATS and are given in Table 2-7.

Table 2-7: The OLS estimates of the reciprocal model

Reciprocal model – Estimation by Least Squares, Dependent Variable: Per Capita Consumption of Beef (Y), Annual Data From 1975:01 To 2018:01
Variable     Coefficient    Std. Error   T-Stat     Signif
Constant     44.0319        1.0844       40.6035    0.0000
x            7194.4603      295.8963     24.3141    0.0000
Usable Observations 44                   Standard Error of Estimate 2.58269
Centered R2 0.9337                       Sum of Squared Residuals 280.1532
Adjusted R2 0.9321                       Regression F(1,42) 591.1766
Uncentered R2 0.9987                     Significance Level of F 0.0000
TR2 43.942                               Log Likelihood -103.1585
Mean of Dependent Variable 68.6409       Durbin-Watson Statistic 0.3569
Std Error of Dependent Variable 9.91063

Also, the predicted values are shown together with the actual values graphically:

Fig. 2-16: Actual versus fitted values of the reciprocal model (q and fitted values plotted against x = 1/p)


Semi-log Model (Semi-log Transformation)

A functional relationship between Y and X is said to be a semi-log model when only the dependent variable Y appears in logarithmic form and the independent variable X remains unchanged. The semi-log model between Y and X is given by

ln(Y_t) = β₀ + β₁X_t + ε_t    (2.226)

Sometimes we are given a relation of the form

Y_t = Y₀e^(β₁t + ε_t)    (2.227)

or

Y_t = e^(β₀ + β₁t + ε_t)    (2.228)

which are also called semi-log models. The regression coefficient β₁ implies that, for an additional unit of the variable X, the variable Y will increase by approximately 100β₁ percent. The regression constant β₀ indicates the average value of log(Y) when the value of X is zero. The relations in (2.227) and (2.228) can be transformed into a linear form by taking logarithms on both sides. Doing so, we have

lnY_t = lnY₀ + β₁t + ε_t    (2.229)

or

lnY_t = β₀ + β₁t + ε_t    (2.230)

Let Z_t = lnY_t, β₀ = lnY₀, and X_t = t; then either equation (2.229) or equation (2.230) can be written as

Z_t = β₀ + β₁X_t + ε_t    (2.231)

This is a linear form between Z and X, and we can apply the OLS method to estimate β₀ and β₁. Let β̂₁ and β̂₀ be the least squares estimates of β₁ and β₀, which are given by

β̂₁ = (Σᵀₜ₌₁ X_tZ_t − TX̄Z̄) / (Σᵀₜ₌₁ X_t² − TX̄²)    (2.232)

and

β̂₀ = Z̄ − β̂₁X̄    (2.233)

For equation (2.229), the estimated value of Y₀ is given by Ŷ₀ = Exp(β̂₀). Thus, the estimated equation will be

Ŷ_t = Ŷ₀e^(β̂₁t)    (2.234)

Ex. 2-15: Let the GDP of Bangladesh be growing approximately at a constant rate r. More specifically, we can write GDP_t = (1+r)GDP_{t-1}, where r is the fixed growth rate of GDP. By repeated substitution we get GDP_t = GDP₀(1+r)^t. To estimate the growth rate r, we have to use the semi-log model.


Estimation of Semi-log Models

Assume that the variable Y is growing approximately at a constant rate r. More specifically, we can write Y_t = (1+r)Y_{t-1}, where r is the fixed growth rate of the variable Y. By repeated substitution, we get

Y_t = Y₀(1+r)^t    (2.235)

To estimate this model, we have to transform this non-linear relationship into a linear form. Taking logarithms on both sides of equation (2.235), we have

lnY_t = lnY₀ + t·ln(1+r)    (2.236)

Let Z_t = lnY_t, β₀ = lnY₀, β₁ = ln(1+r), and X_t = t. Then, the transformed equation can be written as

Z_t = β₀ + β₁X_t    (2.237)

Since Z and X may not satisfy the relationship exactly, we have to add a random error term ε_t. Thus, equation (2.237) becomes

Z_t = β₀ + β₁X_t + ε_t    (2.238)

Exponentiating this relationship, we get back the original model:

Y_t = e^(β₀ + β₁X_t + ε_t)    (2.239)

Equation (2.238) is a linear relationship between Z and X, so we can apply the OLS method to estimate β₀ and β₁. Let β̂₀ and β̂₁ be the least squares estimates of β₀ and β₁ respectively, which are given by equations (2.233) and (2.232). The estimated values of r and Y₀ are given by r̂ = Exp(β̂₁) − 1 and Ŷ₀ = Exp(β̂₀). Thus, the estimated equation in its original form is given by

Ŷ_t = Ŷ₀(1+r̂)^t    (2.240)
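A short Python sketch of this procedure is given below: it regresses lnY on a time trend and recovers r̂ = exp(β̂₁) − 1 as in equation (2.240). The series is simulated from an assumed 5% growth path, so the numbers are illustrative only.

```python
import numpy as np

# Sketch: estimating a constant growth rate from the semi-log model ln(Y) = b0 + b1*t + e.
rng = np.random.default_rng(0)
t = np.arange(1, 31)                                            # time index 1..30
Y = 100.0 * (1.05 ** t) * np.exp(rng.normal(0, 0.02, t.size))   # simulated series, roughly 5% growth

Z = np.log(Y)
T = len(Z)
b1 = (np.sum(t * Z) - T * t.mean() * Z.mean()) / (np.sum(t**2) - T * t.mean()**2)  # eq. (2.232)
b0 = Z.mean() - b1 * t.mean()                                                      # eq. (2.233)

r_hat = np.exp(b1) - 1          # estimated growth rate
Y0_hat = np.exp(b0)             # estimated initial level
print(f"Estimated growth rate: {r_hat:.4f}, estimated Y0: {Y0_hat:.2f}")
```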

Fig. 2-17(a): An exponential functional form (GDP of Bangladesh plotted against time)
Fig. 2-17(b): A log-linear functional form (lnGDP_t plotted against time)


Ex. 2-16: A semi-log model is estimated using the OLS method based on time-series data for the period 1972 to 2019 for the GDP (in constant 2015 USD) of Bangladesh⁶. Let us consider the growth model for the GDP of Bangladesh of the type

GDP_t = GDP₀(1+r)^t    (2.241)

where r is the fixed growth rate of the real GDP of Bangladesh. The logarithmic transformation of equation (2.241) is

lnGDP_t = lnGDP₀ + t·ln(1+r)    (2.242)

Let us define β₀ = lnGDP₀, β₁ = ln(1+r), and X_t = t; adding a random error ε_t, we have

ln(GDP_t) = β₀ + β₁X_t + ε_t    (2.243)

Equation (2.243) is called a semi-log model. Now, we will estimate this model based on the real GDP of Bangladesh. Let Y_t = lnGDP_t; then equation (2.243) can be written as

Y_t = β₀ + β₁X_t + ε_t    (2.244)

This is a linear function between Y and X, and we can apply the OLS method to the equation. For the given data we have

X̄ = 1176/48 = 24.5,  Ȳ = 1189.8829/48 = 24.7892,  ΣX_tY_t = 29585.5951, and ΣX_t² = 38024.

Thus, the OLS estimate β̂₁ of β₁ is given by

β̂₁ = [29585.5951 − (48 × 24.5 × 24.7892)] / [38024 − (48 × 24.5²)] = 0.0471    (2.245)

and

β̂₀ = 24.7892 − (0.0471 × 24.5) = 23.6364    (2.246)

Thus, the growth rate is given by r̂ = Exp(0.0471) − 1 = 0.0482, and the estimated GDP₀ = Exp(23.6364) = 1.8414e+010.

Thus, the estimated equation is given by

ŷ_t = 23.6364 + 0.0471t;  R² = 0.9887    (2.247)
SE:      0.0209    0.00074
t-test:  1131      63.3707

or

GDP̂_t = 1.8414e+010(1+0.0482)^t    (2.248)

Comment: From the estimated results, it is found that the real GDP of Bangladesh has grown at an average rate of about 4.82% per year over the sample period. The growth rate is statistically significant at any conventional significance level. The fitted model explains about 98.87 percent of the total variation in ln(GDP), so we can say that the fit is very good.
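Using the fitted growth path in equation (2.248), one can project the level of GDP for any year index t. The snippet below simply evaluates the estimated equation at a few horizons, so the projections inherit all the limitations of the constant-growth assumption.

```python
# Sketch: evaluating the fitted constant-growth model GDP_hat = GDP0_hat * (1 + r_hat)**t
GDP0_hat = 1.8414e10     # estimated initial level from Ex. 2-16
r_hat = 0.0482           # estimated annual growth rate from Ex. 2-16

for t in (10, 25, 48, 60):   # year indices counted from the start of the sample (1972)
    gdp_t = GDP0_hat * (1 + r_hat) ** t
    print(f"t = {t:2d}: projected real GDP = {gdp_t:,.0f} USD")
```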

The results are also obtained using RATS and are given in Table 2-8.

6 Source: WDI, 2020


Table 2-8: The OLS estimates of the semi-log model

Semi-log model – Estimation by Least Squares, Dependent Variable: ln of Real GDP of Bangladesh, Annual Data From 1972:01 To 2019:01
Variable     Coefficient   Std. Error   T-Stat    Signif
Constant     23.6364       0.0209       1131.     0.0000
t            0.0471        0.00074      63.371    0.0000
Usable Observations 48                   Standard Error of Estimate 0.0712
Centered R2 0.9887                       Sum of Squared Residuals 0.2336
Adjusted R2 0.9884                       Regression F(1,46) 4015.8389
Uncentered R2 0.9999                     Significance Level of F 0.0000
TR2 48                                   Log Likelihood 59.6960
Mean of Dependent Variable 24.7892       Durbin-Watson Statistic 0.0882
Std Error of Dependent Variable 0.6625

Also, the predicted values are shown together with the actual values graphically:

Fig. 2-18: Actual versus predicted values of the semi-log model (ln of real GDP and fitted values plotted against year)

Double-log Models

A functional relationship between Y and X is said to be a double-log model when both the dependent variable (Y) and the independent variable (X) appear in logarithmic form. Let the relationship between Y and X be given by

Y_t = Y₀X_t^β    (2.249)

Taking logarithms on both sides of equation (2.249) and then adding a random error term ε_t, the transformed equation will be

lnY_t = lnY₀ + βlnX_t + ε_t    (2.250)

The transformed model (2.250) is known as the double-log model because both the dependent and independent variables are in logarithmic form. In equation (2.250), the regression coefficient β is the elasticity of Y with respect to X, meaning that for a 1 percent change in X, the percentage change in Y will be β.

Estimation of Double-log Models

Let Z_t = lnY_t, α = lnY₀, and W_t = lnX_t. So, equation (2.250) can be written as

Z_t = α + βW_t + ε_t    (2.251)

This is a linear equation and we can apply the OLS method to estimate α and β. Let α̂ and β̂ be the least squares estimates of α and β, which are given by

α̂ = Z̄ − β̂W̄    (2.252)

and

β̂ = (Σᵀₜ₌₁ W_tZ_t − TW̄Z̄) / (Σᵀₜ₌₁ W_t² − TW̄²)    (2.253)

So, the estimated equation will be

Ŷ_t = Ŷ₀X_t^β̂    (2.254)

where Ŷ₀ = Exp(α̂).
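The sketch below fits a double-log model by regressing lnY on lnX with the same textbook formulas; the slope is then directly interpretable as an elasticity. The data are simulated from an assumed elasticity of 2, so the output only illustrates the mechanics.

```python
import numpy as np

# Sketch: estimating an elasticity with the double-log model ln(Y) = a + b*ln(X) + e.
rng = np.random.default_rng(1)
X = np.linspace(50, 200, 40)                              # hypothetical regressor
Y = 0.5 * X**2.0 * np.exp(rng.normal(0, 0.05, X.size))    # simulated with true elasticity 2

W, Z = np.log(X), np.log(Y)
T = len(Z)
b = (np.sum(W * Z) - T * W.mean() * Z.mean()) / (np.sum(W**2) - T * W.mean()**2)  # eq. (2.253)
a = Z.mean() - b * W.mean()                                                        # eq. (2.252)

print(f"Estimated elasticity b = {b:.4f}")
print(f"Estimated Y0 = {np.exp(a):.4f}")   # back-transformed intercept, eq. (2.254)
```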

Ex. 2-17: A double-log model is estimated using the OLS method based on time-series data for the period 1972 to 2019, considering GDP (in constant 2015 USD) as the dependent variable and the labour force (LF) as the independent variable for the USA⁷. The following regression model between GDP and LF is formulated:

GDP_t = A₀LF_t^β e^(ε_t)    (2.255)

The logarithmic transformation of equation (2.255) is given by

ln(GDP_t) = ln(A₀) + βln(LF_t) + ε_t    (2.256)

Let Y_t = lnGDP_t, X_t = ln(LF_t), and α = ln(A₀). So, equation (2.256) can be written as

Y_t = α + βX_t + ε_t    (2.257)

The transformed equation (2.257) is a simple linear regression equation between Y and X. Thus, we can apply the OLS method to the equation. Let α̂ and β̂ be the least squares estimators of α and β, which are given by

β̂ = (Σᵀₜ₌₁ X_tY_t − TX̄Ȳ) / (Σᵀₜ₌₁ X_t² − TX̄²)    (2.258)

and

α̂ = Ȳ − β̂X̄    (2.259)

For the given data we have

Ȳ = 29.9562,  X̄ = 11.7835,  ΣY_tX_t = 16946.9328, and ΣX_t² = 6666.5478.

Putting these values into equations (2.258) and (2.259), we have

β̂ = [16946.9328 − (48 × 11.7835 × 29.9562)] / [6666.5478 − (48 × 11.7835²)] = 2.0298    (2.260)

and

α̂ = 29.9562 − (2.0298 × 11.7835) = 6.0386    (2.261)

Thus, the estimated equation is given by

Ŷ_t = 6.0386 + 2.0298X_t;  R² = 0.9731    (2.262)
SE:      0.5860     0.0497
t-test:  10.3056    40.8234

Therefore, we have Â₀ = Exp(6.0386) = 419.3076.

Thus, the estimated non-linear equation is given by

GDP̂_t = 419.3076 × LF_t^2.0298    (2.263)

7 Source: WDI, 2020

The results are also obtained using the RATS software and are given below in Table 2-9. Table 2-9: The OLS estimates of the double log model

Double log model- Estimation by Least Squares, Dependent Variable Real GDP of USA Annual Data From 1972:01 To 2019:01 Variable Coefficient Std. Error T-Stat Signif Constant 6.0386 0.5860 10.3056 0.0000 lnLF 2.0298 0.0497 40.8234 0.0000 0.0650 Usable Observations Standard Error of Estimate 48 0.1943 Sum of Squared Residuals Centred R 2 0.9731 1666.5523 Regression F(1, 46) 0.9726 Adjusted R 2 0.0000 Significance Level of F 2 0.9999 Uncentered R 64.1181 Log Likelihood 48 nR 2 0.1042 Durbin-Watson Statistic 29.9562 Mean of Dependent Variable Std Error of Dependent Var. 0.3923

29

29.5

30

30.5

Also, the predicted values with the actual values are shown below graphically:

11.4

11.6 y

x

11.8

12

Fitted values

Fig. 2-19: Actual versus predicted values of double log model

Note: Different software packanges such as EViews, GAUSS, LIMDEP, MATLAB, Python, R, RATS, SAS, SHAZAM, SPLUS, SPSS, STATA, TSP etc. can be applied directly to estimate simple regression models.

Chapter Two

70

Exercises 2-1: Define a population and sample regression functions with an example of each. 2-2: Define a simple linear regression equation with an example. 2-3: Explain all the terms of a simple linear regression equation. Also, represent this equation graphically. 2-4: Discuss the method of moments, OLS, and ML methods to estimate a simple linear regression equation. 2-5: Write the important properties of OLS estimators of a simple linear regression equation. 2-6: Show that the OLS estimators are unbiased of the true population parameters. 2.7: Show that the OLS estimators are the best linear unbiased estimators (BLUE). 2.8: Discuss the importance of the BLUE property. 2-9: Discuss the asymptotic properties of the OLS estimators. 2-10: Find the sampling distribution of the OLS estimators. 2-11: Discuss the t-test for testing the significance of the relationship between Y and X. 2-12: Define the (1  D )100 percent confidence interval for a population parameter. Discuss the technique to obtain a 95% confidence interval for the regression parameters. 2-13: Show that the total sum of squares can be partitioned into components’ sum of squares. 2-14: Define the coefficient of determination and interpret its meaning. Show that the coefficient of determination lies between 0 and 1. 2-15: Discuss the important characteristics of the coefficient of determination. 2-16: How do we estimate the elasticity of the dependent variable y with respect to the independent variable x from the regression coefficient? Explain. n

2-17: Show that the square of the correlation coefficient rxy2 is given by rxy2 = ȕˆ 12

¦x i=1 n

2 i

n

and rxy2 = ȕˆ 1

¦ yi2 i=1

yi

Yi  Y, and x i

¦x y i

i

i=1 n

; where

¦ yi2 i=1

X i  X.

2-18: Define the capital asset pricing model (CAPM) with an example. Derive the regression equations of the CAPM. Estimate the CAPM using the real data. 2-19: What is meant by a non-linear relationship? Define different types of non-linear regression equations and discuss their estimation techniques. 2-20: Explain the significance of the parameter estimates of different types of non-linear regression equations. 2-21: Write the regression equation for each of the following relationships along with the importance of the parameters of each of the models. Give an example of each model that will be applicable to deal with real problems. (i ) Level-level relationship, (ii) Log-level relationship, (iii) Level-log relationship (iv) Log-log relationship 2-22: Let the variable Y indicate the per capita expenditure ( in USD) on public schools and the variable X indicate the per capita income (in USD) by districts of a country for a particular year. The following information is given 14

¦Y

i

i 1

4848;

14

¦X i=1

i

101267;

14

¦X Y i

i=1

i

35928869, and

14

¦X

2 i

744436813

i=1

Total Sum of Squares (TSS) = 78670.857 and Residual Sum of Squares (ESS) = 16485.243 Requirements:

[i] Obtain the regression equation of expenditure on income; Yi = Į+ȕX i +İ i , and comment on your obtained results.

Simple Regression Models

71

ˆ and hence the standard error of ȕ. ˆ [ii] Obtain the var(ȕ)

[iii] Calculate R 2 and comment on your obtained result. [iv] Test the null hypothesis H 0 : ȕ = 0, , against the alternative hypothesis H1: ȕ z 0, at Į = 5%. [v] What is the elasticity of expenditure for schooling with respect to income in the country? Comment on your result. [vi] Calculate the 95% confidence interval for ȕ. 2-23: Let the variable Y indicate the level of savings (in $) and the variable X indicate the level of income (in $) of 26 households of city A. Here, given that 26

¦X

i

68773.1;

1=1

26

¦Y

i

4214.3;

i=1

26

¦X Y i

1=1

i

26

13180859.9; and ¦ Xi2 =235882818.47 i=1

Total Sum of Squares (TSS) = 99870.087, and Residual Sum of Squares (ESS) = 23248.298 Requirements:

[i] Obtain the regression equation Yi = Į+ȕX i +İ i and comment on your results. [ii] What will be the savings if the level of income is 10000 $? ˆ and hence the standard error of ȕ. ˆ [iii] Obtain the var(ȕ)

[iv] Do you think the relationship between savings and income is statistically significant at a 5% level of significance? Justify. [v] Calculate R 2 and Adjusted R 2 . Explain your results. [vi] Obtain the elasticity of savings with respect to income and comment on your result. [vii] Obtain the 95% and 90% confidence intervals for ȕ and explain your results. 2-24: Let the variable Y indicate the amount of profits in million $ and the variable X indicate the amount of sales revenue in million $. Here, given that 32

¦Y

1

i=1

200;

32

¦X i=1

i

7140;

32

¦X Y i

i=1

i

141760; and

32

¦X

2 i

4944778

i=1

Total Sum of Squares (TSS) = 3299, Residual Sum of Squares (ESS) = 332 Requirements:

[i] Obtain the regression equation Yi = ȕ 0 +ȕX i +İ i and comment on your results. [ii] Obtain the var(ȕˆ 1 ) and hence the standard error of ȕˆ 1 . [iii] Test the null hypothesis H 0 : ȕ1 = 0, at a 5% level of significance. [iv] Calculate R 2 and compare it with Adjusted R 2 . [v] Obtain the elasticity of profits with respect to sales revenue and comment on your result. [vi] Obtain the 95% confidence interval for ȕ1 and explain your results.

Chapter Two

72

2-25: Given below are the data of the average productivity (X) and average salary (Y) of 10 different districts.

Average productivity (X) in USD) 8 7 9 10 9 11 13 12 14 15 18 20

Average salary (Y) in USD) 50 51 52 54 53 57 60 43 68 88 92 98

[i] Obtain the regression equation Yi = Į+ȕX i +İ i and comment on your results. ˆ and hence the standard error of ȕ. ˆ [ii] Obtain the var(ȕ)

[iii] Do you think the relationship between Y and X is statistically significant at a 5% level of significance? Justify. [iv] Calculate R 2 and Adjusted R 2 . Explain your results. [v] Obtain 95% and 90% confidence intervals for ȕ and explain your results. 2-26: The following data indicate total sales revenue ( in billions USD) and profits (in millions USD) for 28 companies in a developed country.

Profits 26 53 18 6 5 11 9 14 9 4 3 4 3 7

Sales 1082 1245 879 345 367 618 575 768 560 210 156 170 150 589

Profits 3 3 3 4 3 2 3 2 1 2 1 2 1 1

Sales 119 89 75 98 55 65 50 45 33 32 36 32 30 28

Requirements:

[i] Obtain the regression equation of profits on sales revenue and comment on your estimated results. [ii] Find the 95% confidence interval for the regression coefficient. [iii] Calculate TSS, RSS and ESS. [iv] Calculate the value of the coefficient of determination, and comment on your obtained result. [v] Find the Adj(R 2 ) and compare with R 2 . [vi] Calculate the elasticity of profits with respect to sales revenue and comment on your result.

Simple Regression Models

73

2-27: The obtained points of 12 students on midterm (X) and final exam. (Y) are as follows:

x y

26 85

16 58

28 79

23 70

25 82

31 84

32 86

15 55

34 88

35 92

15 54

18 60

Requirements:

[i] Obtain the regression equation of y on x. [ii] Test the null hypothesis H 0 : ȕ = 0 against a suitable alternative hypothesis. [iii] Calculate the value of the coefficient of determination. [iv] How R 2 and Adj(R 2 ) are different? What caution should you take when using R 2 for comparing two or more nested models? [v] What would be the private investment if the level of GDP is TK 8765? 2-28: Estimate the linear log model for the production of rice in Bangladesh and then compare it with India using the real data. 2-29: Given below are data showing the price per unit in $ and consumption in kilograms of a commodity:

Price 30 48 25 55 40 20 35 50 36 45 42 40

Quantity Demanded 112 55 120 40 75 125 98 45 90 65 70 74

Obtain the reciprocal model and comment on your results. Also, obtain the price elasticity of demand. 2-30: Obtain the semi-log model for the GDP of Bangladesh and then compare it with that of India using the real data. 2-31: Let the functional relationship between private investment and GDP be given by pinv t = A 0 GDPtȕ eİ t , where pinv t indicates private investment at the time t, GDPt indicates the gross domestic product at time t and İ t is the random error term corresponding to the tth set of observation. The private investment and GDP in billions $ are given below:

Private Investment (in billion $) 825 855 826 810 875 896 970 1075 1140 1251 1275 1450

GDP (in billion $) 5260 5585 5690 5978 6245 6675 7210 7358 7975 8235 8356 8480

74

Chapter Two

Requirements:

[i] Obtain the regression equation of private investment on GDP. [ii] Test the null hypothesis H 0 : ȕ = 0 against a suitable alternative hypothesis. [iii] Calculate the value of the coefficient of determination and comment. [iv] What would be the private investment if the level of GDP is TK 8765?

CHAPTER THREE
MULTIPLE REGRESSION MODELS

3.1 Introduction

In Chapter Two, two-variable linear and non-linear regression models and their estimation techniques were discussed. In two-variable regression models, we assumed that the dependent variable is influenced by only one independent variable at a given point in time or in a given situation. However, irrespective of the discipline, it is very difficult to find a situation in which the dependent variable is influenced by only one independent variable. In most cases, the dependent variable is influenced by several independent variables at the same time. For example, in the case of a demand function, the quantity demanded of a commodity depends not only on the price of the commodity but also on the consumer's level of disposable income, the price of related commodities, the price of complement commodities, etc. In the case of the production function, the level of output depends on land, labour, capital, the size of the industry, etc. The economic growth of a country depends on domestic investment, foreign direct investment, government expenditure, money supply, the inflation rate, trade openness, the human capital index, etc. The annual returns of a firm depend on the size of the firm, the price-to-earnings ratio, the beta risk obtained from the CAPM, the market-to-book ratio of the firm, etc. The number of deaths due to COVID-19 depends on age, immune response, level of blood sugar, blood pressure, the albumin-to-creatinine ratio, the aspartate aminotransferase (AST) level, the alanine transaminase (ALT) level, etc. Thus, to deal with these types of functional relationships, students and researchers need to be familiar with multiple linear and non-linear regression models. Therefore, in this chapter, regression models with two or more independent variables, i.e., multiple regression models, are discussed. This chapter presents the assumptions underlying classical multiple linear regression models together with the OLS and ML estimation methods, and shows how to fit such models to a set of data points. Furthermore, it discusses how to determine whether the relationship between a dependent variable (Y) and the independent variables X's is statistically significant. The F-test, Wald test and LM test are also discussed for testing joint null hypotheses. ANOVA tables are presented for fitting multiple regression equations in different cases. Proper justifications for the techniques employed in a regression analysis are also provided with numerical applications. All numerical analyses have been performed using RATS, EViews or STATA.

3.2 Multiple Regressions

Meaning: The average relationship of a dependent variable with two or more independent variables is called a multiple regression. If the dependent variable Y depends on k (where k ≥ 2) independent variables, say X₁, X₂, ....., and X_k, then the general form of the relationship can be expressed as

Y = f(X₁, X₂, ....., X_k, β₀, β₁, β₂, ....., β_k) + ε    (3.1)

where f(X₁, X₂, ....., X_k, β₀, β₁, β₂, ....., β_k) is called the systematic or deterministic component, and ε is called the non-systematic component or random error term. The parameters β₀, β₁, β₂, ....., and β_k are the unknown parameters connecting the independent variables X₁, X₂, ......., X_k with the dependent variable Y. These unknown and unobservable parameters are directly or indirectly related to elasticities and multipliers that play important roles in economic decisions and actions. If the variable Y depends linearly on X₁, X₂, ....., and X_k, the relationship can be written as

Yᵢ = β₀ + β₁X₁ᵢ + β₂X₂ᵢ + ....... + β_kX_kᵢ + εᵢ;  (i = 1, 2, ....., N)    (3.2)

where Y is the response or dependent variable; X₁, X₂, ....., and X_k are the k regressors (explanatory or independent variables); β₀ is the regression constant, which indicates the average value of Y given that all the independent variables are zero; and β₁, β₂, ....., β_k are the regression coefficients, where β_j (j = 1, 2, ....., k) represents the average amount of change in the dependent variable Y due to a per unit change in the explanatory variable X_j, given that all other explanatory variables are held constant. ε is the random error term, which captures the effect of random factors that are not included in the model.
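To preview how a model such as (3.2) is handled computationally, the sketch below builds a design matrix with a column of ones for the intercept and obtains the OLS coefficients; the data are simulated, and the routine is a bare-bones illustration rather than the estimation procedure developed later in this chapter.

```python
import numpy as np

# Sketch: OLS for a multiple regression Y = b0 + b1*X1 + b2*X2 + e on simulated data.
rng = np.random.default_rng(42)
n = 50
X1 = rng.uniform(0, 10, n)
X2 = rng.uniform(0, 5, n)
eps = rng.normal(0, 1, n)
Y = 2.0 + 1.5 * X1 - 0.8 * X2 + eps                 # true coefficients chosen for illustration

X = np.column_stack([np.ones(n), X1, X2])           # design matrix with intercept column
beta_hat, *_ = np.linalg.lstsq(X, Y, rcond=None)    # numerically stable solution of X'X b = X'Y
print("Estimated coefficients (b0, b1, b2):", np.round(beta_hat, 4))
```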


Ex. 3-1: The quantity demanded (Q) of a commodity depends on its own price (P), the price of related commodities (Pr), the price of complement commodities (Pc), and the consumer's income level (Y), and the relationship can be written as

Q = f(P, Pr, Pc, Y, β₀, β₁, β₂, β₃, β₄)    (3.3)

The theory does not specify the mathematical form of the demand function. If we assume that the relationship is linear, the mathematical form is given by

Qᵢ = β₀ + β₁Pᵢ + β₂Prᵢ + β₃Pcᵢ + β₄Yᵢ;  (i = 1, 2, ....., N)    (3.4)

The demand for a commodity may also be influenced by some other factors, such as consumers' tastes and preferences, the age structure of consumers, time, household size, the quality of the commodity, etc. These factors are omitted from the function. The influence of such factors may be captured by introducing an error term into the model. Thus, the econometric model is given by

Q = f(P, Pr, Pc, Y, β₀, β₁, β₂, β₃, β₄) + ε    (3.5)

If the relationship is linear, equation (3.5) can be written as

Qᵢ = β₀ + β₁Pᵢ + β₂Prᵢ + β₃Pcᵢ + β₄Yᵢ + εᵢ;  (i = 1, 2, ....., N)    (3.6)

The regression constant β₀ represents the average value of Q when P, Pr, Pc, and Y are zero. The regression coefficient β_j (j = 1, 2, 3, 4) represents the average change in the quantity demanded of the commodity for a per unit change in the jth explanatory variable, given that all other independent variables are held constant. The random error term ε must satisfy certain assumptions for estimating the parameters of the regression model.

Basic Assumptions of Multiple Linear Regression Models

To estimate multiple linear regression models, the following assumptions are needed:

(i) The random error term εᵢ is normally distributed with mean zero and constant variance σ², i.e., E(εᵢ) = 0 and Var(εᵢ) = σ² for all i. Thus, εᵢ ~ NID(0, σ²) for all i.
(ii) The random error terms are independent of each other, i.e., Cov(εᵢ, ε_j) = 0 for i ≠ j. Thus, εᵢ ~ NIID(0, σ²) for all i.
(iii) The dependent or response variable Y is a stochastic variable.
(iv) Each of the explanatory variables X₁, X₂, ........, X_k is a non-stochastic variable whose values are selected by the researcher without error and remain fixed in repeated samples.
(v) No exact linear relationship exists among the explanatory variables, so that the regression matrix X is a non-stochastic matrix with full column rank, i.e., Rank(X) = (k+1), which is less than n.
(vi) The number of observations must be greater than the number of explanatory variables.
(vii) The X_j's and εᵢ's are independent, i.e., Cov(X_j, εᵢ) = 0 for all i and j.
(viii) Rank(X'X) = Rank(X) = (k+1).
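Assumptions (v) and (viii) can be checked numerically before estimation: if the design matrix does not have full column rank, X'X cannot be inverted and the OLS estimator is not defined. The snippet below illustrates such a check on a deliberately collinear matrix; the data are artificial.

```python
import numpy as np

# Sketch: checking the full-column-rank assumption of the design matrix X.
n = 30
x1 = np.linspace(1, 10, n)
x2 = 3.0 * x1                      # exact linear dependence on x1 (violates assumption (v))
X_bad = np.column_stack([np.ones(n), x1, x2])
X_ok = np.column_stack([np.ones(n), x1, np.log(x1)])

for name, X in [("collinear X", X_bad), ("well-behaved X", X_ok)]:
    rank = np.linalg.matrix_rank(X)
    print(f"{name}: columns = {X.shape[1]}, rank = {rank}, full rank = {rank == X.shape[1]}")
```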

3.3 Three Variables Linear Regression Model

Meaning: The linear relationship between Y and two explanatory variables X₁ and X₂ of the type

Yᵢ = β₀ + β₁X₁ᵢ + β₂X₂ᵢ + εᵢ,  (i = 1, 2, ....., N)    (3.7)

is called a three-variable linear regression model, where Yᵢ is the ith observation of the dependent variable Y, X_jᵢ is the ith observation of the jth independent variable (j = 1, 2), β₀ is the regression constant, which indicates the expected value of Y given that X₁ and X₂ are zero, β_j (j = 1, 2) is the regression coefficient, which represents the average change in the dependent variable Y for a per unit change in the explanatory variable X_j (j = 1, 2) given that the other explanatory variable is held constant, and εᵢ is the random error term corresponding to the ith set of observations.

Ex. 3-2: If private investment (PIN) of a country linearly depends on GDP and the interest rate (IR), then the linear regression model of the type

PINᵢ = β₀ + β₁GDPᵢ + β₂IRᵢ + εᵢ    (3.8)

is called a three-variable linear regression model, where β₀ is the regression constant, which indicates the average value of PIN when GDP and IR are zero; β₁ is the regression coefficient, which represents the average change in PIN for a per unit change in GDP given that IR is constant; β₂ is the regression coefficient, which represents the average change in PIN for a per unit change in IR given that GDP is constant; and εᵢ is the random error term corresponding to the ith set of observations.

Estimation of Three Variables Linear Regression Models

Assume that the dependent variable Y depends linearly on two explanatory variables X₁ and X₂; then the three-variable regression model can be written as

Yᵢ = β₀ + β₁X₁ᵢ + β₂X₂ᵢ + εᵢ,  (i = 1, 2, ....., N)    (3.9)

All the terms of this model were explained previously and so are not illustrated again. Since Yᵢ depends on εᵢ, which is unknown to us, we take the expected value of Yᵢ for given values of X₁ᵢ and X₂ᵢ. Thus, we have

E(Yᵢ | X₁ᵢ, X₂ᵢ) = β₀ + β₁X₁ᵢ + β₂X₂ᵢ + E(εᵢ | X₁ᵢ, X₂ᵢ)    (3.10)

Based on the assumptions, equation (3.10) can be written as

E(Yᵢ | X₁ᵢ, X₂ᵢ) = β₀ + β₁X₁ᵢ + β₂X₂ᵢ    (3.11)

Let β̂₀, β̂₁, and β̂₂ be the least squares estimators of β₀, β₁, and β₂ respectively, and let eᵢ be the residual. Thus, for sample data, the model can be written as

Yᵢ = β̂₀ + β̂₁X₁ᵢ + β̂₂X₂ᵢ + eᵢ,  (i = 1, 2, ....., n),  so that  eᵢ = Yᵢ − β̂₀ − β̂₁X₁ᵢ − β̂₂X₂ᵢ    (3.12)

The estimators ȕˆ 0 , ȕˆ 1 , and ȕˆ 2 can be obtained using the OLS method. The principle of the OLS method is that these estimators are obtained by minimising the residual sum of squares, that is by minimizing

n

2 i

¦ e . The residual sum of i=1

n

¦e

squares

2 i

is given by

i=1

n

n

¦ e ¦ (Y  ȕˆ 2 i

i

i=1

Since

0

 ȕˆ 1X1i  ȕˆ 2 X 2i ) 2

(3.13)

i=1

n

¦e

2 i

is a function of ȕˆ 0 , ȕˆ 1 , and ȕˆ 2 to apply the least-squares principle, partial derivatives of

i=1

respect to ȕˆ 0 , ȕˆ 1 , and ȕˆ 2 should be zero. That is,

n

¦e i=1

2 i

with

Chapter Three

78 n

½ ° i=1 =0 Ÿ 2¦ (Yi  ȕˆ 0  ȕˆ 1X1i  ȕˆ 2 X 2i ) = 0 ° ° įȕˆ 0 i=1 ° n ° į¦ ei2 n ° i=1 ˆ ˆ ˆ =0 Ÿ 2¦ (Yi  ȕ 0  ȕ1X1i  ȕ 2 X 2i )X1i = 0 ¾ įȕˆ 1 i=1 ° ° n ° į¦ ei2 n i=1 =0 Ÿ 2¦ (Yi  ȕˆ 0  ȕˆ 1X1i  ȕˆ 2 X 2i )X 2i = 0 °° įȕˆ 1 i=1 ° °¿

į¦ ei2

n

(3.14)

Equation (3.14) is called a system of three linear equations in three variables ȕˆ 0 , ȕˆ 1 , and ȕˆ 2 . This system can be written as n

n

i=1

i=1

n

nȕˆ 0 + ȕˆ 1 ¦ X1i ȕˆ 2 ¦ X 2i

¦Y

i

i=1

n

n

n

i=1

i=1

i=1

n

n

n

i=1

i=1

i=1

ȕˆ 0 ¦ X1i + ȕˆ 1 ¦ X1i2  ȕˆ 2 ¦ X1i X 2i ȕˆ 0 ¦ X 2i + ȕˆ 1 ¦ X1i X 2i  ȕˆ 2 ¦ X 22i

½ ° ° n ° X1i Yi ¾ ¦ i=1 ° n ° X 2i Yi ° ¦ i=1 ¿

(3.15)

The three equations of the system (3.15) are called the normal equations. The system of equations in (3.15) can be solved using different methods namely: substitution method, matrix form and Cramer’s rule. Substitution Method

The first equation of (3.15) can be written as ȕˆ 0 = Y  ȕˆ 1X1  ȕˆ 2 X 2

(3.16)

Putting the value of ȕˆ 0 in the second equation of (3.15), we have n

n

i=1

i=1

n

nX1 ª¬ Y  ȕˆ 1X1  ȕˆ 2 X 2 º¼ + ȕˆ 1 ¦ X1i2  ȕˆ 2 ¦ X1i X 2i n

n

i=1

i=1

¦X

1i

Yi

i=1

n

nX1Y  nȕˆ 1X12  nȕˆ 2 X1X 2 +ȕˆ 1 ¦ X1i2 +ȕˆ 2 ¦ X1i X 2i =

¦X

§ n · § n · ȕˆ 1 ¨ ¦ X1i2  nX12 ¸  ȕˆ 2 ¨ ¦ X1i X 2i  nX1X 2 ¸ © i=1 ¹ © i=1 ¹

1i

1i

Yi

i=1

n

¦X

Yi  nX1Y

i=1

ȕˆ 1SSX1 + ȕˆ 2SP(X1 ,X 2 ) = SP(X1 ,Y)

(3.17)

Again putting the value of ȕˆ 0 in the third equation of (3.15), we have ȕˆ 1SP(X1 , X 2 ) + ȕˆ 2SS(X 2 ) = SP(X 2 ,Y)

(3.18)

Equations (3.16), (3.17) and (3.18) are called reduced forms of normal equations. From equation (3.17), we have SP(X1 ,Y)  ȕˆ 2SP(X1 ,X 2 ) ȕˆ 1 = SSX1

(3.19)

and from equation (3.18), we have SP(X 2 ,Y)  ȕˆ 1SP(X1 ,X 2 ) ȕˆ 2 = SSX 2

(3.20)

Multiple Regression Models

79

Putting the value of ȕˆ 1 in equation (3.20), we have ª SP(X1 ,Y)  ȕˆ 2SP(X1 ,X 2 ) º ȕˆ 2SSX 2 = SP(X 2 ,Y)  SP(X1 ,X 2 ) « » SSX1 «¬ »¼ 2 ȕˆ 2 ªSSX 2SSX1  >SP(X1 ,X 2 ) @ º = SP(X 2 ,Y)SSX1  SP(X1 ,X 2 )SP(X1 ,Y) ¬ ¼

ȕˆ 2

SP(X 2 ,Y)SSX1  SP(X1 ,X 2 )SP(X1 ,Y) 2

SSX 2SSX1  >SP(X1 ,X 2 ) @

(3.21)

Again, putting the value of ȕˆ 2 in equation (3.19), we have ȕˆ 1

SP(X1 ,Y)SSX 2  SP(X1 ,X 2 )SP(X 2 ,Y) 2

SSX 2SSX1  >SP(X1 ,X 2 ) @

(3.22)

Putting the value of ȕˆ 1 and ȕˆ 2 in equation (3.16) we can obtain ȕˆ 0 . Putting the value of ȕˆ 0 , ȕˆ 1 , and ȕˆ 2 , we can obtain the estimated equation, which is given by ˆ = ȕˆ +ȕˆ X +ȕˆ X Y i 0 1 1i 2 2i

(3.23)

So, the estimated residuals are given by ˆ , ( i= 1, 2,…..,n) ei = Yi  Y i

Matrix Form

The system (3.15) can be written as the following matrix form ª « n « « n « ¦ X1i « i=1 « n « ¦ X 2i «¬ i=1

n

¦ X1i i=1 n

2 1i

¦X i=1

n

¦X

1i

X 2i

i=1

n

º X 2i » ¦ i=1 » ªȕˆ 0 º n »« » X1i X 2i » « ȕˆ 1 » ¦ i=1 » «ȕˆ » n » «¬ 2 »¼ 2 X 2i » ¦ »¼ i=1

ª n º « ¦ Yi » « i=1 » « n » « ¦ X1i Yi » « i=1 » « n » « ¦ X 2i Yi » «¬ i=1 »¼

(3.24)

Equation (3.24) can also be written as X cXȕˆ = X cY

ª1 X11 «1 X 12 « «. . « where X = « . . «. . « «1 X1n-1 ««1 X ¬ 1n

(3.25) X 21 º X 22 »» . » » . »; . » » X 2n-1 » X 2n »¼»

ª Y1 º «Y » « 2» ˆ ªȕ 0 º « . » « » ˆȕ = « ȕˆ » ; and Y = « . » 1 « » «ˆ » « » . «¬ȕ 2 »¼ « » « Yn-1 » «« Y »» ¬ n¼

Equation (3.25) can be written as: 1 ȕˆ = XcX X cY

Solving equation (3.26), we can obtain the value of ȕˆ 0 , ȕˆ 1 and ȕˆ 2 respectively. Cramer’s Rule

(3.26)

Chapter Three

80

Using Cramer’s rule the ȕˆ 0 , ȕˆ 1 and ȕˆ 2 are given by n

n

¦Y n

i=1 n

i=1 n

ȕˆ 0

n

2 1i

¦X

1i

i=1

¦X

2i

n

¦X

Yi

1i

i=1

n

¦X

X 2i

n

¦X

¦X

1i

i=1

n

n

i=1

n

2i

i=1

¦X

1i

n

n

1i

i=1

¦X

1i

i=1

¦X

Yi

1i

n

i=1

2 1i

¦X n

2i

i=1

2 2i

¦X

1i

(3.28)

¦X

2i

i=1

i=1

n

¦X n

i=1

1i

X 2i

1i

i=1

¦X

n

n

¦X n

Yi

2i

n

¦X

2i

i=1

i=1

n

2 2i

i=1

n

2i

n

n

i=1

n

X 2i

¦X

¦X

i

n

¦X

1i

i=1

¦Y

¦X

2i

i=1

X 2i

i=1

ȕˆ 1

¦X

i=1

n

¦X

n

2 1i

i=1

n

¦X

(3.27)

i=1

¦X

1i

2 2i

i=1

n

¦X

X 2i

i=1

i=1

n

2i

i=1

¦X

Yi

1i

¦X

1i

i=1

¦X

n

¦X

i

n

¦X

1i

X 2i

i=1

X 2i

i=1

n

¦X

2 2i

i=1

and n

n

¦X

n

¦Y

1i

i

i=1

n

n

¦X i=1

i=1

n

ȕˆ 2

¦X

2 1i

¦X

1i

n

2i

i=1

¦X

X 2i

1i

i=1

¦X

1i

i=1

n

n

¦X

1i

i=1

i=1

2 1i

¦X i=1

n

¦X

n

2i

¦X

1i

i=1

n

¦X

1i

Yi

2i

Yi

i=1 n

¦X i=1

n

n

i=1

X 2i

(3.29)

n

¦X

2i

i=1

n

¦X

1i

X 2i

i=1

n

¦X

2 2i

i=1

Solving equations (3.27), (3.28) and (3.29), we can find the values of ȕˆ 0 , ȕˆ 1 and ȕˆ 2 respectively. Estimation of Three Variables Linear Regression Models Using Matrix ( Deviation Form]

In deviation form, equation (3.9) can be written as Yi  Y = ȕ1 (X1i  X1 ) +ȕ 2 (X 2i  X 2 )+İ i [ since, ȕ 0 = Y  ȕ1X1  ȕ 2 X 2 ] yi = ȕ1 x1i +ȕ 2 x 2i +İ i

(3.30)

Multiple Regression Models

81

where yi = Yi  Y, x1i = (X1i  X1 ), and x 2i = (X 2i  X 2 ) Let ȕˆ 1 and ȕˆ 2 be the least squares estimators of ȕ1 and ȕ 2 , and ei be the residual. Thus, for sample data equation (3.30) can be written as yi = ȕˆ 1 x1i +ȕˆ 2 x 2i +ei ; (i = 1, 2, ……,n)

(3.31)

In matrix form, equation (3.31) can be written as Y = Xȕˆ + e

(3.32)

ª y1 º «y » « 2» « . » where Y = « » ; X = « . » « . » « » ¬« y n ¼» nu1

ª x11 «x « 12 « . « « . « . « ¬« x1n

x 21 º ª e1 º «e » x 22 »» « 2» ª ȕˆ 1 º «.» . » » ; ȕˆ = « » ; and e = « » . » «¬ȕˆ 2 »¼ 2×1 «.» » «.» . » « » x 2n ¼» nu 2 ¬«e n ¼» nu1

The least squares estimator ȕˆ of ȕ can be obtained by minimising the residual sum of squares. The residual sum of squares is given by S = ece ˆ c (Y  Xȕ) ˆ = (Y  Xȕ) = Y cY  Y cXȕˆ  ȕˆ cXcY+ȕˆ cX cXȕˆ

(3.33)

Since, ȕˆ cX cY is a scalar quantity, (ȕˆ cX cY)c = Y cXȕˆ = ȕˆ cX cY Therefore, equation (3.33) can be written as S = Y cY  2ȕˆ cX cY+ȕˆ cX cXȕˆ

=

n

¦y i=1

2 i

 2 ª¬ȕˆ 1b1  ȕˆ 2 b 2 º¼ +ȕˆ 12 a11 +ȕˆ 1ȕˆ 2 a12 +ȕˆ 1ȕˆ 2 a 21 +ȕˆ 22 a 22

where YcY =

n

2 i

¦y , i=1

ªb º XcY = « 1 » , and XcX = ¬ b 2 ¼ 2u1

ª a11 «a ¬ 21

(3.34) a12 º . a 22 »¼ 2×2

Taking partial differentiation of S with respect to ȕˆ and then equating to zero, we have

įS =0 įȕˆ  § įS ¨ ˆ ¨ įȕ1 ¨ įS ¨¨ ˆ © įȕ 2

· ¸ ¸= 0 ¸  ¸¸ ¹

2b1 +2ȕˆ 1a11 +ȕˆ 2 a12 +ȕˆ 2 a 21 = 0 ½° ¾ 2b 2 +ȕˆ 1a12 +ȕˆ 1a 21 +2ȕˆ 2 a 22 = 0 °¿

Since, a12 = a 21 , system (3.35) can be written as

(3.35)

Chapter Three

82

2b1 +2ȕˆ 1a11 +2ȕˆ 2 a12 = 0 ½° ¾ 2b 2 +2ȕˆ 1a 21 +2ȕˆ 2 a 22 = 0 °¿

(3.36)

The system (3.36) of two linear equations can be arranged in the following matrix form a12 º ª ȕˆ 1 º ª0 º « » = a 22 »¼ «¬ȕˆ 2 »¼ «¬0 »¼

ªb º ªa 2 « 1 » + 2 « 11 ¬b2 ¼ ¬a 21

2X cY + 2(XcX)ȕˆ = 0  ȕˆ = (X cX)-1X cY

(3.37)

From equation (3.37), we can obtain ȕˆ 1 and ȕˆ 2 which are given by n

n

¦x y ¦x 1i

ȕˆ 1 =

i

i=1

2 2i

i=1

n

n

n

i=1

i=1

 ¦ x1i x 2i ¦ x 2i yi

ª n º x1i2 ¦ x 22i  « ¦ x1i x 2i » ¦ i=1 i=1 ¬ i=1 ¼

=

n

2

SP(x1 ,y) SS(x 2 )  SP(x1 ,x 2 ) SP(x 2 ,y) 2

SS(x1 ) SS(x 2 )  >SP(x1 ,x 2 ) @

(3.38)

and n

ȕˆ 2 =

=

n

2 1i

n

¦x y ¦x ¦x 2i

i

i=1

i=1

n

n

1i

i=1

n

x 2i ¦ x1i yi i=1

n

ª º x1i2 ¦ x 22i  « ¦ x1i x 2i » ¦ i=1 i=1 ¬ i=1 ¼

2

SP(x 2 ,y) SS(x1 )  SP(x1 ,x 2 ) SP(x1 ,y) 2

SS(x 2 ) SS(x1 )  >SP(x1 ,x 2 ) @

(3.39)

Then, we can obtain ȕˆ 0 which is given by ȕˆ 0 = Y  ȕˆ 1X1  ȕˆ 2 X 2

(3.40)

Therefore, the estimated equation is given by ˆ = ȕ +ȕˆ X +ȕˆ X Y i 0 1 1i 2 2i

(3.41)

The residuals are given by: ei = Yi  ȕˆ 0  ȕˆ 1X1i  ȕˆ 2 X 2i

(3.42)

Variance and Standard Error of OLS Estimators of Three Variables Regression Models

Equation (3.38) can also be written as n

ȕˆ 1 =

¦>x i=1

n

1i

SS(x 2 )  x 2i SP(x1 ,x 2 ) @ yi

ª n º x ¦ x  « ¦ x1i x 2i » ¦ i=1 i=1 ¬ i=1 ¼ 2 1i

n

2

2 2i

where x1i = X1i  X1 , and x 2i = X 2i  X 2 .

(3.43)

Multiple Regression Models

Let D =

83

2

n

n ª n º x1i2 ¦ x 22i  « ¦ x1i x 2i » and Pi = x1i SS(x 2 )  x 2iSP(x1 ,x 2 ) ¦ i=1 i=1 ¬ i=1 ¼

Therefore, equation (3.43) can be written as follows 1 n ȕˆ 1 = ¦ Pi yi D i=1

(3.44)

The variance of ȕˆ 1 is given by ı2 var(ȕˆ 1 ) = 2 D

n

2 i

¦P

(3.45)

i=1

Now n

n

¦ Pi2

¦ >x

i=1

i=1 n

¦x

2 1i

i=1

2

1i

SS(x 2 )  x 2iSP(x 1 , x 2 ) @ n

n

i=1

i=1

(SS(x 2 )) 2  2¦ x1i x 2iSS(x 2 )SP(x1 , x 2 )  ¦ x 22i (SP(x1 , x 2 )) 2 2

2

= SS(x1 )(SS(x 2 )) 2  2SS(x 2 ) >SP(x1 , x 2 ) @ +SS(x 2 ) >SP(x1 , x 2 ) @ 2 = SS(x 2 ) ªSS(x1 )SS(x 2 )  >SP(x1 , x 2 )@ º ¬ ¼

Putting the value of

n

2 i

¦P

(3.46)

and D in equation, (3.45) we have

i=1

SS(x 2 ) u ı2 SS(x 2 )SS(x 2 )  [SP(x1 ,x 2 )]2

var(ȕˆ 1 ) =

(3.47)

Similarly, we can find the variance of ȕˆ 2 which is given by SS(x1 ) u ı2 SS(x 2 )SS(x 2 )  [SP(x1 ,x 2 )]2

var(ȕˆ 2 ) =

(3.48)

The variance of ȕˆ 0 is given by ª 1 X 2SS(x 2 )+X 2 2SS(x1 )  2X1X 2SP(x1 ,x 2 ) º 2 var(ȕˆ 0 ) = « + 1 »ı SS(x1 )SS(x 2 )  [SP(x1 ,x 2 ]2 ¬n ¼

(3.49)

Since, ı 2 is unknown to us, we have to replace it with its’ unbiased estimate. The unbiased estimate of ı 2 is given by: n

2

2

ıˆ = s =

¦e

2 i

i=1

(3.50)

n 3

Therefore, the standard errors of ȕˆ 0 , ȕˆ 1 , and ȕˆ 2 are given by n

SE(ȕˆ 0 ) = var(ȕˆ 0 ) =

2 1

2

¦e

2

ª 1 X SS(x 2 )+X 2 SS(x1 )  2X1X 2SP(x1 ,x 2 ) º i=1 i « + »× SS(x1 )SS(x 2 )  [SP(x1 ,x 2 ]2 ¬n ¼ n-3

(3.51)

Chapter Three

84

n

SE(ȕˆ 1 ) =

var(ȕˆ 2 ) =

¦e n

var(ȕˆ 2 ) =

SE(ȕ 2 ) =

2

ª º i=1 i SS(x 2 ) u « 2 » ¬ SS(x1 )SS(x 2 )  [SP(x1 ,x 2 ] ¼ n  3

¦e

(3.52)

2

ª º i=1 i SS(x1 ) u « 2 » ¬ SS(x1 )SS(x 2 )  [SP(x1 ,x 2 ] ¼ n  3

(3.53)

Test of Significance of Parameter Estimates of Three Variables Regression Equations

The null hypothesis to be tested is H0 : ȕ j

0, ( j = 0, 1, 2)

against the following three alternative hypotheses Case 1: H1 : ȕ j ! 0 Case 2: H1 : ȕ j  0 Case 3: H1 : ȕ j z 0 Method: Under the null hypothesis, the test statistic is given by t=

ȕˆ j  E(ȕˆ j )

~ t (n-3)d.f.

var(ȕˆ j ) ȕˆ j var(ȕˆ j )

~ t (n-3)d.f.

t cal

(3.54)

Let the level of significance be D . We then find the table value of the test statistic at a D level of significance with (n-3) degrees of freedom in each case and compare the calculated value with the table value to make a decision on whether a null hypothesis will be accepted or rejected. If the calculated value falls in the acceptance region, the null hypothesis will be accepted, otherwise it will be rejected. Test of Significance of Equality of Two Parameters ȕ1 and ȕ 2 of Three Variables Regression Equations

The null hypothesis to be tested is H 0 : ȕ1 = ȕ 2 = ȕ

H 0 : ȕ1  ȕ 2 = 0

against the alternative hypothesis H1: ȕ1 z ȕ 2

The residual sum of squares of the unrestricted regression equation is given by ˆ c (Y  Xȕ) ˆ ece = (Y  Xȕ) = Y cY-ȕˆ cX cY Y cY = ȕˆ cX cY + ece

TSS = RSS + ESS

(3.55)

Multiple Regression Models

85

where TSS = Total sum of squares, RSS = Regression sum of squares and ESS = Residual sum of squares. Under the null hypothesis, the regression equation (3.9) is given by  Yi = ȕ 0 + ȕ(X 1i +X 2i ) + İ i  +İ Yi = ȕ 0 + ȕZ i i

(3.56)

In matrix notation, the equation (3.56) can be written as Y = ZȜ+İ ª Y1 º «Y » « 2» « . » where Y = « » , Z = « . » « . » « » ¬« YN ¼» N×1

ª1 Z1 º ª İ1 º «1 Z » «İ » 2 » « « 2» «. . » « . » ªȕ 0 º , Ȝ = « » , and İ = « » . « »  «. . » « . » ¬ȕ ¼ «. . » « . » « » « » ¬«1 Z N »¼ N×2 ¬«İ N ¼» N×1

Let Oˆ is the OLS estimator of O , then for sample observations the restricted model is given by: ˆ Y = ZȜ+e r

where Oˆ

(3.57)

Z cZ

-1

ZcY

Thus, the total sum of squares for the restricted regression equation is given by: YcY = Oˆ cZcY + ecr e r

TSSr = RSSr +ESSr

(3.58)

where YcY is the total sum of squares, Oˆ cZcY = RSSr is the regression sum of squares, and ecr e r =ESSr is the residual sum of squares for the restricted regression equation. So, the extra sum of squares due to H 0 is given by Extra SS = ESSr  ESS = Y cY  Ȝˆ cZcY  Y cY+ȕˆ cXcY = ȕˆ cX cY  Ȝˆ cZcY

(3.59)

with degrees of freedom (n-2)-(n-3) = 1 Under the null hypothesis, the test statistic is given by F=

ȕˆ cX cY  Ȝˆ cZcY ~ F(1, n  3) ESS/(n  3)

(3.60)

Let, at, Į = 5% level of significance with 1 and n-3 degrees of freedom the table value of the F-test statistic be Ftab . Decision: If the calculated value of the test statistic is greater than the table value Ftab , then the null hypothesis will be rejected, implying that both variables have equal importance on Y. Otherwise, we reject the null hypothesis.

Chapter Three

86

Table 3-1: ANOVA Table

Source of Variation Reg.

d.f. 3 2

Sum ȕˆ cX cY Ȝˆ cZcY

MS ȕˆ cXcY/3 Ȝˆ cZcY/2

1

ȕˆ cX cY-Ȝˆ cZcY

n-3 n

ece YcY

ȕˆ cXcY-Ȝˆ cZcY ece/n-3 Y cY/n

Reg due to H 0 Extra Residual Total

F-Test

F=

Decision If, F> Ftab at Į level of significance we reject H 0 , otherwise we accept it

ȕˆ cX cY  Ȝˆ cZcY ~F(1, n-3) ESS/(n-3)

Confidence Interval Estimation for the Population Parameter ȕ j (j =0, 1, 2) from the Student t-Test

To construct the confidence interval for the population parameter ȕ j (j=0, 1, 2), first, we have to choose the (1  D )100 percent confidence level and then find out the table value of the t-test statistic corresponding to (1  D )100% confidence level. Let the table value of the t-test statistic be t{(n-3)Į/2} .

Therefore, (1  D )100% confidence interval of the test statistic t is given by Prob{  t{(n-3)Į/2} d t d t{(n-3)Į/2} } 1  D Prob{  t{(n-3)Į/2} d

ȕˆ j  ȕ j t d t{(n-3)Į/2} } 1  D SE(ȕˆ ) j

Prob ȕˆ j  t{(n-3)Į/2}SE(ȕˆ j ) d ȕ j d ȕˆ j +t{(n-3)Į/2}SE(ȕˆ j )

^

`

1D

(3.61)

Therefore, the (1  D )100% confidence interval for the population parameter ȕ j is given by ªȕˆ j  t{(n-3);D /2}SE(ȕˆ j ), ȕˆ j +t{(n-3);Į/2}SE(ȕˆ j ) º ¬ ¼

(3.62)

where {ȕˆ j  t{(n-3)Į/2}SE(ȕˆ j )} is called the lower limit and {ȕˆ j +t{(n-3)Į/2}SE(ȕˆ j )} is called the upper limit of the interval. If we consider the level of significance Į = 5%, then we can obtain the 95% confidence level from equation (3.62) for the population parameter ȕ j (j = 0, 1, 2) Coefficient of Multiple Determination in Case of Three Variables Regression Equations

The coefficient of multiple determination R 2 is defined as the proportion of the total variation in the dependent variable Y as explained by the fitted regression equation of Y on X1 and X2 which is given by

R2

Regression Sum of Squares Total Sum of Squares n

¦ (yˆ -y)

2

¦ (y -y)

2

i

i=1 n

i

i=1

n

¦ yˆ

2 i

-ny 2

¦y

2 i

-ny 2

i=1 n

i=1

Another Method:

The coefficient of multiple determination R 2 is given by

(3.63)

Multiple Regression Models n

¦ (Yˆ  Y)

2

¦ (Y  Y)

2

87

i

R2

i=1 n

i

i=1

n

¦ yˆ

2 i

¦y

2 i

i=1 n

i=1

n

1

2 i

¦e i=1 n

(3.64)

¦y

2 i

i=1

We know that ei = yi  yˆ i and yˆ i

ȕˆ 1 x1i +ȕˆ 2 x 2i .

The sum of squared residuals can be written as: n

n

2 i

i

 yˆ i `

i

 ȕˆ 1 x1i  ȕˆ 2 x 2i

¦ e ¦ e ^y i

i=1

i=1 n

¦ e ^y i

i=1 n

¦e y i

i=1

i

n

n

i=1

i=1

 ȕˆ 1 ¦ ei x1i  ȕˆ 2 ¦ ei x 2i n

¦e x

We know that

`

i

0 and

1i

i=1

n

¦e x i

2i

(3.65)

0.

i=1

Therefore, equation (3.65) can be written as n

n

2 i

¦e ¦e y i

i=1

i

i=1 n

¦ ^y

i

 yˆ i ` yi

i

 ȕˆ 1 x1i  ȕˆ 2 x 2i yi

i=1 n

¦ ^y

`

i=1 n

¦y

2 i

i=1

n

n

i=1

i=1

 ȕˆ 1 ¦ x1i yi  ȕˆ 2 ¦ x 2i yi

Putting the value of

n

¦e

2 i

(3.66)

in equation (3.64)

i=1

n

R2

1

¦y i=1

2 i

n

n

 ȕˆ 1 ¦ x1i yi  ȕˆ 2 ¦ x 2i yi i=1

i=1

n

¦y

2 i

i=1

ȕˆ 1SP(x1 ,y) + ȕˆ 2SP(x 2 ,y) SS(y)

(3.67)

Chapter Three

88

Thus, the value of R 2 can also be obtained using the technique in (3.67). The value of R 2 lies between 0 and 1. The higher the value of R 2 , the greater the variation of the dependent variable Y is explained by the fitted regression equation and the fit is quite good. If the value of R 2 is close to zero, the greater the variation of the dependent variable Y is explained by the random error term and the fit is not good. 2 Adjusted R

Since the inclusion of an additional independent variable is likely to increase the regression sum of squares for the same total sum of squares regardless of how irrelevant it may be in the equation, as a result, R 2 will be increased, because equation (3.64) does not take into account the loss of degrees of freedom due to the inclusion of the additional independent variables. Thus, to overcome this problem, we use adjusted R 2 which is defined by

ESS n-k-1 Adjusted R =1TSS n-1 2

§ n-1 · TSS-RSS = 1- ¨ ¸ © n-k-1 ¹ TSS § n-1 · § RSS · = 1- ¨ ¸ ¨1  ¸ © n-k-1 ¹ © TSS ¹

§ n-1 · 2 = 1- ¨ ¸ 1  R © n-k-1 ¹ § n-1 · § n-1 · 2 = 1- ¨ ¸¨ ¸R © n-k-1 ¹ © n-k-1 ¹ -k § n-1 · 2 ¨ ¸R n-k-1 © n-k-1 ¹

(3.68)

where n is the number of observations and k is the number of explanatory variables in the equation Ex. 3-3: Data on total income (TINC, in billion BDT), deposit money banks’ investment (INVEST, in billion BDT) in the banking sector and money supply ((M2) in billion BDT) of Bangladesh over a period of time are given below. Table 3-2: Total income, investment (INVEST) and money supply (M2) in the banking sector

Year 1974 1975 1976 1977 1978 1979 1980 1981 1982 1983 1984 1985 1986 1987 1988 1989

TINC 1.2891 1.7709 2.1456 2.3725 2.8579 3.3258 4.4001 5.8226 7.5085 8.8735 12.1347 16.1048 18.4315 20.7063 22.9188 26.2918

M2 12.596 13.968 17.398 21.41 27.6 32.449 41.36 45.487 58.992 83.858 105.342 123.381 143.531 164.08 190.781 222.976

INVEST 9.148 10.803 12.962 16.263 21.058 28.337 34.139 44.805 50.596 69.586 91.922 110.007 119.121 141.189 23.463 22.082

Year 1997 1998 1999 2000 2001 2002 2003 2004 2005 2006 2007 2008 2009 2010 2011 2012

TINC 45.524 50.9722 59.6942 79.6139 92.6807 97.4897 108.594 126.0868 153.441 207.8684 233.929 311.5922 368.9982 457.6637 586.5271 735.4997

INVEST 558.691 630.271 747.624 871.742 986.16 1139.945 1297.212 1514.464 1806.743 2115.044 2487.948 2964.997 3630.312 4405.199 5171.095 6035.054

M2 90.19 104.385 129.279 139.404 144.592 187.742 178.515 206.429 188.867 244.964 390.923 490.914 564.05 747.992 954.279 1348.618

Multiple Regression Models

28.2294 1990 29.3173 1991 30.5067 1992 27.6932 1993 27.9681 1994 32.6162 1995 37.898 1996 Source: Bangladesh Bank

250.044 285.26 315.356 364.03 422.123 456.905 506.275

25.449 49.12 55.764 59.155 78.706 78.192 82.394

803.7272 857.4984 902.2317 908.6477 970.8793

2013 2014 2015 2016 2017 2018 2019

1118.6475 1274.0731

89

7006.235 7876.137 9163.778 10160.761 11099.81 12196.115 13737.35

1654.174 1705.878 1736.275 1680.739 1695.581 1889.909 2518.515

(i) Estimate the linear regression equation of total income on deposit money banks' investment and money supply (M2).
(ii) Obtain the variances of the OLS estimators.
(iii) Test the significance of the relationship.
(iv) Test the null hypothesis H0: β1 = β2.
(v) Calculate the 95% confidence interval for the parameter βj (j = 1, 2).
(vi) Compare the R² with the adjusted R².

Solution: Let the variable Y indicate total income (in billion BDT), the variable X1 indicate deposit money banks' investment (in billion BDT), and the variable X2 indicate money supply (in billion BDT). The linear regression equation of Y on X1 and X2 is given by

Yi = β0 + β1X1i + β2X2i + εi                                                  (3.69)

where β0 is the regression constant, indicating the average value of Y when X1 and X2 are zero; β1 is the regression coefficient indicating the average impact on Y of a per-unit change in X1 with X2 held fixed; β2 is the regression coefficient indicating the average impact on Y of a per-unit change in X2 with X1 held fixed; and εi is the random error term corresponding to the ith set of observations, which satisfies all the usual assumptions of a CLRM. For the given problem we have

Ȳ = 10923.0630/46 = 237.4579,  X̄1 = 20226.4750/46 = 439.7060,  and  X̄2 = 111507.889/46 = 2424.0845.

SS(X1) = Σ X1i² − nX̄1² = 19761117.5337,   SS(X2) = Σ X2i² − nX̄2² = 616636766.3021,

SP(X1, X2) = Σ X1iX2i − nX̄1X̄2 = 108647629.5807,   SP(X1, Y) = Σ X1iYi − nX̄1Ȳ = 10608107.4803,  and

SP(X2, Y) = Σ X2iYi − nX̄2Ȳ = 59187624.8503,

where all sums run over i = 1, 2, ....., n.

The OLS estimators β̂0, β̂1, and β̂2 of the parameters β0, β1, and β2 are given by

β̂1 = [(10608107.4803 × 616636766.3021) − (59187624.8503 × 108647629.5807)] / [(19761117.5337 × 616636766.3021) − (108647629.5807)²]
   = 0.2906                                                                   (3.70)

β̂2 = [(59187624.8503 × 19761117.5337) − (10608107.4803 × 108647629.5807)] / [(19761117.5337 × 616636766.3021) − (108647629.5807)²]
   = 0.0448                                                                   (3.71)


and

β̂0 = Ȳ − β̂1X̄1 − β̂2X̄2 = 237.4579 − (0.2906 × 439.706) − (0.0448 × 2424.0845) = 1.1224        (3.72)

The estimated equation is given by

Ŷi = 1.1224 + 0.2906 X1i + 0.0448 X2i ;   R² = 0.9948                         (3.73)
SE:      (4.6809)  (0.0335)  (0.0060)
t-test:  (0.2398)  (8.6658)  (7.4600)

Using Matrix Form

In matrix form, the OLS estimator β̂ of β is given by

β̂ = (X′X)⁻¹X′Y

  = [ 0.0315         −2.0381e-005   −3.4007e-007 ] [ 10923.0630    ]   [ 1.1224 ]
    [ −2.0381e-005    1.6179e-006   −2.8507e-007 ] [ 15411043.5823 ] = [ 0.2906 ]        (3.74)
    [ −3.4007e-007   −2.8507e-007    5.1850e-008 ] [ 85666053.0360 ]   [ 0.0448 ]

From equation (3.74), we have β̂0 = 1.1224, β̂1 = 0.2906, and β̂2 = 0.0448.
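This matrix computation is easy to reproduce numerically. The following is a minimal sketch (not part of the original example), assuming the Table 3-2 series are held in NumPy arrays named tinc, invest and m2 (hypothetical names); it evaluates β̂ = (X′X)⁻¹X′y.

import numpy as np

# Hypothetical arrays holding the Table 3-2 data (46 observations each):
# tinc, invest, m2 = np.array([...]), np.array([...]), np.array([...])

def ols_beta(y, *regressors):
    """Return the OLS estimate (X'X)^(-1) X'y, with an intercept column prepended."""
    X = np.column_stack([np.ones(len(y)), *regressors])
    return np.linalg.solve(X.T @ X, X.T @ y)  # numerically safer than forming the inverse

# beta_hat = ols_beta(tinc, invest, m2)
# Expected for Ex. 3-3: approximately [1.1224, 0.2906, 0.0448]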

Comment: From the estimated results, it is found that for an increase in investment of one billion BDT, the average value of total income will increase by 0.2906 billion BDT, holding the money supply constant. For an increase of one billion BDT in the money supply, total income will on average increase by 0.0448 billion BDT, holding investment fixed. The average total income would be 1.1224 billion BDT if total investment and money supply were both zero. The impacts of investment and money supply on total income are statistically significant at any conventional significance level. From the estimated value of R², it can be said that 99.48% of the total variation in the dependent variable, total income, is explained by the fitted regression equation, so the fit is very good.

Table 3-3: The three-variable model estimated using RATS

Estimation by Least Squares Method, Dependent Variable: Real Total Income

Variable    Coefficient   Std. Error   T-Stat    Signif
Constant    1.1224        4.6809       0.2398    0.8116
INVEST      0.2906        0.0335       8.6658    0.0000
M2          0.0448        0.0060       7.4600    0.0000

Usable Observations                46          Standard Error of Estimate    26.3634
Centered R2                        0.9948      Sum of Squared Residuals      29886.1422
Adjusted R2                        0.9946      Regression F(2, 43)           4124.5142
Uncentered R2                      0.9964      Significance Level of F       0.00000
nR2                                45.835      Log Likelihood                -214.23087
Mean of Dependent Variable         237.4579    Durbin-Watson Statistic       0.7491
Std Error of Dependent Variable    357.8696

(ii) Variance of the OLS estimators

The residual sum of squares is given by

Σ ei² = Σ (Yi − 1.1224 − 0.2906 X1i − 0.0448 X2i)² = 29886.1422


Thus, we have

var(β̂0) = [1/46 + {(439.71² × 616636766.30) + (2424.08² × 19761117.53) − (2 × 439.71 × 2424.08 × 108647629.58)} / {(19761117.53 × 616636766.30) − (108647629.5807)²}] × (29886.1422/43)
        = 21.9109                                                              (3.75)

var(β̂1) = [616636766.30 / {(19761117.53 × 616636766.30) − (108647629.5807)²}] × (29886.1422/43)
        = 1.1245e-003                                                          (3.76)

var(β̂2) = [19761117.53 / {(19761117.53 × 616636766.30) − (108647629.5807)²}] × (29886.1422/43)
        = 3.6037e-005                                                          (3.77)

Hence, the standard errors of β̂0, β̂1, and β̂2 are given by

SE(β̂0) = √21.9109 = 4.6809,  SE(β̂1) = √1.1245e-003 = 0.0335,  and  SE(β̂2) = √3.6037e-005 = 0.0060
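These variances can be checked numerically. A short sketch under the same assumptions as the earlier one (X is the 46×3 matrix with a constant column, y is the TINC vector and beta_hat the OLS estimate); it forms s²(X′X)⁻¹ and takes square roots of the diagonal.

import numpy as np

def ols_cov(X, y, beta_hat):
    """Variance-covariance matrix of the OLS estimator: s^2 (X'X)^(-1)."""
    n, p = X.shape                      # p = k + 1 (constant column included)
    e = y - X @ beta_hat                # residual vector
    s2 = (e @ e) / (n - p)              # unbiased estimate of sigma^2
    return s2 * np.linalg.inv(X.T @ X)

# se = np.sqrt(np.diag(ols_cov(X, y, beta_hat)))   # roughly [4.6809, 0.0335, 0.0060]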

(iii) Here the null hypothesis to be tested is

H0: βj = 0  (j = 1, 2)

against the alternative hypothesis H1: βj ≠ 0.

Method: For the given problem we have obtained SE(β̂1) = 0.0335 and SE(β̂2) = 0.0060.

Under the null hypothesis H0: β1 = 0, the test statistic is given by

t = 0.2906/0.0335 = 8.6658 ~ t with 43 d.f.                                    (3.78)

Under the null hypothesis H0: β2 = 0, the test statistic is given by

t = 0.0448/0.0060 = 7.4600 ~ t with 43 d.f.                                    (3.79)
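The same t-statistics, together with their two-sided p-values (the "significance" levels reported by RATS), can be computed directly. A minimal sketch, plugging in the Ex. 3-3 estimates and 43 degrees of freedom:

import numpy as np
from scipy import stats

def t_tests(beta_hat, se, df):
    """Two-sided t-tests of H0: beta_j = 0 for each coefficient."""
    t_stat = beta_hat / se
    p_val = 2 * stats.t.sf(np.abs(t_stat), df)
    return t_stat, p_val

t_stat, p_val = t_tests(np.array([0.2906, 0.0448]), np.array([0.0335, 0.0060]), df=43)
# t_stat is about [8.67, 7.47]; both p-values are effectively zero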

Comment: From the estimated results, it is found that both the parameters β1 and β2 are statistically significant at any significance level, which implies that the variables INVEST and M2 have significant positive impacts on TINC.

(iv) Here, the null hypothesis to be tested is

H0: β1 = β2 = β

against the alternative hypothesis H1: β1 ≠ β2.

Under the null hypothesis, the test statistic is given by

F = (RESS − UESS) / (UESS/43) ~ F(1, 43)                                       (3.80)

The unrestricted residual sum of squares (UESS) is given by UESS = 29886.1422. Under the null hypothesis, the restricted residual sum of squares (RESS) is given by RESS = 56862.4032. Thus, the test statistic is given by

F = (56862.4032 − 29886.1422) / (29886.1422/43) = 38.8133                      (3.81)
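The restricted/unrestricted comparison in (3.80)–(3.81) is a generic F-test and is easy to script. A small sketch using the two residual sums of squares reported above:

from scipy import stats

def restriction_f_test(ress, uess, n_restrictions, df_resid):
    """F-test based on restricted (RESS) and unrestricted (UESS) residual sums of squares."""
    f_stat = ((ress - uess) / n_restrictions) / (uess / df_resid)
    return f_stat, stats.f.sf(f_stat, n_restrictions, df_resid)

f_stat, p_val = restriction_f_test(56862.4032, 29886.1422, n_restrictions=1, df_resid=43)
# f_stat is about 38.81, far above the 5% critical value of roughly 4.07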

Let the level of significance be 5%.

Decision: At a 5% level of significance with 1 and 43 degrees of freedom, the table value of the test statistic is 4.067. Since the calculated value of the test statistic is greater than the table value, we reject the null hypothesis, implying that the two parameters are not of equal importance in the regression equation.

Another Method: (iv) The null hypothesis to be tested is

H0: β1 = β2, or H0: β1 − β2 = 0

against the alternative hypothesis

H1: β1 ≠ β2, or H1: β1 − β2 ≠ 0.

Under the null hypothesis, the test statistic is given by

t = (β̂1 − β̂2)/√var(β̂1 − β̂2) = (β̂1 − β̂2)/√[var(β̂1) + var(β̂2) − 2cov(β̂1, β̂2)] ~ t with (n−k−1) d.f.     (3.82)

The variance-covariance matrix is given by

var(β̂) = (X′X)⁻¹σ̂²
       = [ var(β̂0)       cov(β̂0, β̂1)   cov(β̂0, β̂2) ]   [ 21.9109        −0.0142         −2.3635e-004 ]
         [ cov(β̂1, β̂0)   var(β̂1)       cov(β̂1, β̂2) ] = [ −0.0142         1.1245e-003    −1.9813e-004 ]     (3.83)
         [ cov(β̂2, β̂0)   cov(β̂2, β̂1)   var(β̂2)     ]   [ −2.3635e-004   −1.9813e-004     3.6037e-005 ]

Putting these values in equation (3.82), we have

t = (0.2906 − 0.0448) / √[1.1245e-003 + 3.6037e-005 − (2 × (−1.9813e-004))] = 6.2297 ~ t with 43 d.f.     (3.84)

Let the level of significance be α = 5%.

Decision: At a 5% level of significance with 43 degrees of freedom, the table values of the t-test statistic are ±2.01. Since the calculated value of the test statistic does not fall in the acceptance region, the null hypothesis is rejected, implying that the two parameters β1 and β2 are not equal; that is, the variables INVEST and M2 do not have equal impacts on TINC.

(v) For the given problem, we have β̂1 = 0.2906 and SE(β̂1) = 0.0335. At α = 5% with 43 degrees of freedom, t{43; 0.025} = 2.01.

Putting these values in equation (3.62), the 95% confidence interval for β1 is given by

[0.2906 − 2.01 × 0.0335, 0.2906 + 2.01 × 0.0335] = [0.2233, 0.3579]            (3.85)

Also, we have β̂2 = 0.0448 and SE(β̂2) = 0.0060. Thus, putting the values of all the terms in equation (3.62), the 95% confidence interval for β2 is

[0.0448 − (2.01 × 0.0060), 0.0448 + (2.01 × 0.0060)] = [0.0329, 0.0569]        (3.86)
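A small helper reproduces both intervals; it is illustrative only and simply applies β̂j ± t(df, α/2)·SE(β̂j).

from scipy import stats

def conf_int(beta_hat, se, df, level=0.95):
    """Symmetric confidence interval: beta_hat +/- t_{df, alpha/2} * se."""
    t_crit = stats.t.ppf(1 - (1 - level) / 2, df)
    return beta_hat - t_crit * se, beta_hat + t_crit * se

print(conf_int(0.2906, 0.0335, df=43))   # about (0.223, 0.358)
print(conf_int(0.0448, 0.0060, df=43))   # about (0.033, 0.057)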

(vi) For the given problem, the total sum of squares (TSS) is

TSS = Σ Yi² − nȲ² = 5763180.0556,

and the residual sum of squares (ESS) is ESS = 29886.1422. Thus, we have

R² = 1 − 29886.1422/5763180.0556 = 0.9948                                      (3.87)

Another Method: Putting the values of all the terms in equation (3.67), we have

R² = [(0.2906 × 10608107.4803) + (0.0448 × 59187624.8503)] / 5763180.0556 = 0.9948        (3.88)

The adjusted R² is given by

Adj(R²) = 1 − [(46 − 1)/(46 − 2 − 1)] × (1 − 0.9948) = 0.9946                  (3.89)
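The two goodness-of-fit measures follow directly from the sums of squares. A one-function sketch with the Ex. 3-3 values plugged in:

def r2_and_adj_r2(ess, tss, n, k):
    """Coefficient of determination and its degrees-of-freedom adjusted version."""
    r2 = 1 - ess / tss
    adj_r2 = 1 - (n - 1) / (n - k - 1) * (1 - r2)
    return r2, adj_r2

print(r2_and_adj_r2(ess=29886.1422, tss=5763180.0556, n=46, k=2))
# -> approximately (0.9948, 0.9946)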


Comment: From the estimated result of R2, it can be said that in a three-variable regression model, 99.48% of the total variation in the dependent variable total income is explained by the fitted regression equation and the remaining 0.52% is explained by random factors. Thus, it can be concluded that the fit is very good. It is also found that the estimated value of adjusted R2 is almost equal to the estimated value of R 2 . Thus it can be said that there is no irrelevant independent variable in the model.

3.4 Multiple Linear Regression Model

Meaning: A multiple linear regression model is defined as a linear relationship among three or more variables in which one is the dependent variable and the remaining are independent variables, plus a random error term. In a multiple linear regression model, the change in the dependent variable is explained by reference to changes in several independent variables. If the dependent variable (Y) linearly depends on k (k ≥ 2) independent variables X1, X2, ....., and Xk, the multiple linear regression equation is given by

Yi = β0 + β1X1i + β2X2i + ....... + βkXki + εi ;  (i = 1, 2, ....., N)          (3.90)

where Yi is the ith observation of the dependent variable Y, Xji is the ith observation of the jth independent variable (j = 1, 2, ....., k), β0 is the regression constant which indicates the average value of Y when all X's are zero, βj (j = 1, 2, ....., k) is the jth regression coefficient which gives the average change in Y for a per-unit change in Xj with all other variables held constant, and εi is the random error term corresponding to the ith set of observations.

Ex. 3-4: Assume that the return (RET) of firms linearly depends on various firm-specific factors, namely: size of the firm (SIZE), market-to-book ratio (MB), price-to-earnings ratio (PE), beta risk (BETA), and investment (INV) in a particular year. Thus, the linear equation of RET on SIZE, MB, PE, BETA and INV is given by

RETi = β0 + β1SIZEi + β2MBi + β3PEi + β4BETAi + β5INVi + εi                     (3.91)

where RETi is the percentage annual return of the ith firm, SIZEi is the size of the ith firm measured in terms of sales revenue, MBi is the market-to-book ratio of the ith firm, PEi is the price-to-earnings (P/E) ratio of the ith firm, BETAi is the ith firm's CAPM beta coefficient, and INVi is the capital investment of the ith firm. β0 is the regression constant, indicating the average return of the firms when all the explanatory variables are zero, and βj (j = 1, 2, 3, 4, 5) is the jth regression coefficient, representing the average change in RET for a per-unit change in the jth explanatory variable with all other explanatory variables held constant.

Variance-Covariance Matrix of a Multiple Linear Regression Equation

For i = 1, 2, ....., N, equation (3.90) can be written as

Y1 = β0 + β1X11 + β2X21 + ...... + βkXk1 + ε1
Y2 = β0 + β1X12 + β2X22 + ...... + βkXk2 + ε2
.
.
YN = β0 + β1X1N + β2X2N + ...... + βkXkN + εN                                   (3.92)

The above system of N linear equations can be written in the following matrix form:

Y = Xβ + ε                                                                      (3.93)

where Y = [Y1, Y2, ....., YN]′ is the (N×1) vector of observations on the dependent variable,

X = [ 1  X11  X21  .....  Xk1 ]
    [ 1  X12  X22  .....  Xk2 ]
    [ .   .    .            . ]
    [ 1  X1N  X2N  .....  XkN ]   is the (N×(k+1)) matrix of observations on the regressors,

β = [β0, β1, ....., βk]′ is the ((k+1)×1) vector of parameters, and ε = [ε1, ε2, ....., εN]′ is the (N×1) vector of random errors.

The variance-covariance matrix of ε is given by


Var(ε) = E[(ε − E(ε))(ε − E(ε))′] = E(εε′),  [since E(ε) = 0]

       = [ E(ε1²)     E(ε1ε2)   .....  E(ε1εN) ]
         [ E(ε2ε1)    E(ε2²)    .....  E(ε2εN) ]
         [   .           .                .    ]
         [ E(εNε1)    E(εNε2)   .....  E(εN²)  ]                                (3.94)

We assumed that Var(εi) = E(εi²) = σ² for all i, and Cov(εi, εj) = E(εiεj) = 0 for all i ≠ j. Based on these assumptions, equation (3.94) can be written as

var(ε) = [ σ²  0   .....  0  ]
         [ 0   σ²  .....  0  ]
         [ .   .          .  ]
         [ 0   0   .....  σ² ] (N×N)

       = σ² I_N                                                                 (3.95)

where I_N is the (N×N) identity matrix.

Estimation of Multiple Linear Regression Equations

In matrix form, the multiple linear regression equation can be written as

Y = Xβ + ε                                                                      (3.96)

where Y, X, β and ε are defined previously. Since the value of Y depends on ε, which is a vector of unknown random error terms, we have to take the expected value of Y for a given value of X. The expected value of Y for a given value of X is given by

E(Y|X) = Xβ,  i.e.,  y = Xβ                                                     (3.97)

Let β̂ be the vector of least squares estimators of the parameters and e the vector of residuals. Thus, for sample data, equation (3.97) can be written as

y = Xβ̂ + e                                                                      (3.98)


where y = [y1, y2, ....., yn]′ is the (n×1) vector of sample observations, X is the corresponding (n×(k+1)) matrix of observations on the regressors (with a column of ones for the constant), β̂ = [β̂0, β̂1, ....., β̂k]′ is the ((k+1)×1) vector of least squares estimators, and e = [e1, e2, ....., en]′ is the (n×1) vector of residuals.

The least squares estimator β̂ of β can be obtained by minimising the residual sum of squares. The residual sum of squares is given by

S = e′e = (y − Xβ̂)′(y − Xβ̂) = y′y − y′Xβ̂ − β̂′X′y + β̂′X′Xβ̂                        (3.99)

Since β̂′X′y is a scalar quantity, (β̂′X′y)′ = y′Xβ̂ = β̂′X′y. Therefore, equation (3.99) can be written as

S = y′y − 2β̂′X′y + β̂′X′Xβ̂
  = Σ yi² − 2[β̂0b0 + β̂1b1 + .... + β̂kbk] + [a00β̂0² + a01β̂0β̂1 + .... + a0kβ̂0β̂k + a10β̂1β̂0 + a11β̂1² + ...... + a1kβ̂1β̂k + ...... + ak0β̂kβ̂0 + ak1β̂kβ̂1 + ..... + akkβ̂k²]        (3.100)

where X′y = [b0, b1, ....., bk]′ and X′X = [alj] is the ((k+1)×(k+1)) matrix of sums of squares and cross-products of the regressors.

Taking the partial derivatives of S with respect to the elements of β̂ and equating them to zero, i.e., δS/δβ̂ = 0, we have

−2X′y + 2X′Xβ̂ = 0   ⇒   β̂ = (X′X)⁻¹X′y                                           (3.101)


Another way

Taking the partial derivatives of S with respect to β̂0, β̂1, ....., β̂k and equating each to zero, we have

δS/δβ̂0 = 0  ⇒  −2Σ(yi − β̂0 − β̂1X1i − β̂2X2i − ..... − β̂kXki) = 0
δS/δβ̂1 = 0  ⇒  −2Σ(yi − β̂0 − β̂1X1i − β̂2X2i − ..... − β̂kXki)X1i = 0
............
δS/δβ̂k = 0  ⇒  −2Σ(yi − β̂0 − β̂1X1i − β̂2X2i − ..... − β̂kXki)Xki = 0               (3.102)

The above system of (k+1) linear equations can be written as

nβ̂0 + β̂1ΣX1i + β̂2ΣX2i + ..... + β̂kΣXki = Σyi
β̂0ΣX1i + β̂1ΣX1i² + β̂2ΣX1iX2i + ..... + β̂kΣX1iXki = ΣX1iyi
............
β̂0ΣXki + β̂1ΣX1iXki + β̂2ΣX2iXki + ..... + β̂kΣXki² = ΣXkiyi                        (3.103)

The above (k+1) equations are called the normal equations, or the reduced forms of the normal equations. In matrix form they are

X′Xβ̂ = X′y                                                                       (3.104)

so that

β̂ = (X′X)⁻¹X′y                                                                   (3.105)

Thus, β̂ = (X′X)⁻¹X′y is the unique least squares estimator of β. This is called the ordinary least squares estimator of β.

Ex. 3-5: A multiple linear regression model is estimated by the OLS method using time-series data for the period 1970-2018 on per capita carbon dioxide emissions in metric tons (CO2), per capita primary energy consumption (EN, millions of Btu), trade openness (OPN, exports plus imports as % of GDP), urbanisation (UR, urban population as % of total) and per capita real GDP (PGDP, constant 2015 US$) of the US (source: World Bank Development Indicators), of the type

Yt = β0 + β1X1t + β2X2t + β3X3t + β4X4t + εt                                     (3.106)

where Y is per capita carbon dioxide emissions in metric tons, X1 is energy consumption (kg of oil equivalent per capita), X2 is trade openness (% of exports and imports of GDP), X3 is urbanisation (% urban population of total), and X4 is per capita real GDP (PGDP, constant 2015 US$). β0 is the regression constant, indicating the average value of carbon dioxide emissions when all the independent variables are zero; βj (j = 1, 2, 3, 4) is the jth regression coefficient, indicating the average change in carbon dioxide emissions for a per-unit change in the jth explanatory variable with the remaining variables fixed; and εt is the random error term corresponding to the tth observation, which satisfies all the assumptions of a CLRM. In matrix form, equation (3.106) can be written as

Y = Xβ + ε                                                                       (3.107)

The OLS estimator β̂ of β is given by

β̂ = (X′X)⁻¹X′Y                                                                   (3.108)

For the given data we have

(X′X)⁻¹ = [ 367.2334       −0.0818         −0.0896         −5.1739         1.5851e-003  ]
          [ −0.0818         1.1072e-004     3.5765e-005     6.3309e-004   −1.1886e-007 ]
          [ −0.0896         3.5765e-005     6.1981e-003     1.0394e-003   −3.5083e-006 ]
          [ −5.1739         6.3309e-004     1.0394e-003     0.0759        −2.3732e-005 ]
          [ 1.5851e-003    −1.1886e-007    −3.5083e-006    −2.3732e-005    9.3631e-009 ]

and X′Y = [941.3737, 313384.6292, 19877.8650, 72476.9925, 35754049.3329]′.

Putting the values of (X′X)⁻¹ and X′Y in equation (3.108),

β̂ = [−14.2622, 0.0745, −0.0590, 0.1796, −9.86753e-005]′                          (3.109)

Thus, we have β̂0 = −14.2622, β̂1 = 0.0745, β̂2 = −0.0590, β̂3 = 0.1796, and β̂4 = −9.86753e-005.

The estimated equation is given below:

Ŷt = −14.2622 + 0.0745X1t − 0.0590X2t + 0.1796X3t − 9.86753e-005X4t ;   R² = 0.9588        (3.110)
SE:     (7.0711)   (0.0039)    (0.0291)    (0.1016)    (3.57e-05)
t-Test: (−2.0170)  (19.1812)   (−2.0320)   (1.7665)    (−2.7637)

Comment: From the estimated results, it is found that for a one-unit increase in energy consumption, carbon dioxide emissions will on average increase by 0.0745 units, holding all other explanatory variables constant; for a one-unit increase in trade openness, carbon dioxide emissions will on average decrease by 0.0590 units, holding the other explanatory variables constant; for a one-unit increase in urban population, carbon dioxide emissions will on average increase by 0.1796 units, holding the other independent variables constant; and for a one-unit increase in per capita real GDP, carbon dioxide emissions will on average decrease by 0.0000986 units, holding the other independent variables constant. The average level of carbon dioxide emissions would be −14.2622 units if all the independent variables were zero. It is also found that the impacts of the variables on carbon emissions are statistically significant. From the estimated value of R², it can be said that 95.88% of the total variation in the dependent variable, carbon dioxide emissions, is explained by the fitted regression equation and the remaining 4.12% is explained by random factors. Thus, it can be concluded that the fit is very good.

The results are obtained using RATS and given in Table 3-4.


Table 3-4: The OLS estimates of the multiple linear regression model

Estimation by Least Squares, Dependent Variable: Per Capita Carbon Dioxide Emissions, Annual Data From 1970:01 To 2018:01

Variable    Coefficient   Std. Error   T-Stat     Signif
Constant    -14.2622      7.0711       -2.0170    0.0498
ENER        0.0745        0.0039       19.1812    0.0000
OPN         -0.0590       0.0291       -2.0320    0.0482
URP         0.1796        0.1016       1.7665     0.0843
PGDP        -9.87E-05     3.57E-05     -2.7637    0.0083

Usable Observations     49          Mean dependent var        19.2117
R-squared               0.9588      S.D. dependent var        1.7407
Adjusted R-squared      0.9551      Akaike info criterion     0.9404
S.E. of regression      0.3690      Schwarz criterion         1.1334
Sum squared resid       5.9907      Hannan-Quinn criterion    1.0136
Log-likelihood          -18.0386    Durbin-Watson stat        0.3867
F-statistic             256.0503    Prob(F-statistic)         0.0000
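The RATS output in Table 3-4 can be reproduced with other standard software. A minimal sketch with statsmodels, assuming the five annual series for 1970-2018 are available as arrays named co2, ener, opn, urp and pgdp (hypothetical names):

import numpy as np
import statsmodels.api as sm

# Hypothetical arrays: co2 (dependent variable) and four regressors, each of length 49
# X = sm.add_constant(np.column_stack([ener, opn, urp, pgdp]))
# fit = sm.OLS(co2, X).fit()
# print(fit.params)     # should be close to the coefficient column of Table 3-4
# print(fit.summary())  # standard errors, t-statistics, R-squared, F-statistic, etc.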

3.5 Properties of the Ordinary Least Squares (OLS) Estimators

(i) Property of Linearity: The OLS Estimator β̂ Is a Linear Function of Y

Proof: The ordinary least squares estimator of β is given by

β̂ = (X′X)⁻¹X′y = By                                                              (3.111)

where B = (X′X)⁻¹X′ is a ((k+1)×n) matrix of fixed numbers with elements bji (j = 0, 1, ....., k; i = 1, 2, ....., n). Thus, equation (3.111) can be written as

[β̂0, β̂1, ....., β̂k]′ = By                                                        (3.112)

so that

β̂j = Σ_{i=1}^{n} bji yi ,   j = 0, 1, 2, ....., k                                (3.113)

Thus, each β̂j is a linear combination of the components of the vector y. Therefore, it can be said that the least squares estimators are linear functions of the observations on Y, and so they are called linear estimators.


(ii) Property of Unbiasedness: The OLS Estimator β̂ Is an Unbiased Estimator of β

Proof: We have

β̂ = (X′X)⁻¹X′y = (X′X)⁻¹X′[Xβ + ε] = β + (X′X)⁻¹X′ε                               (3.114)

Taking expectations on both sides of equation (3.114), we have

E(β̂) = β + (X′X)⁻¹X′E(ε) = β + (X′X)⁻¹X′·0 = β                                    (3.115)

Thus, we can say that β̂ is an unbiased estimator of β.

(iii) Variance Property: The Variance-Covariance Matrix of β̂ Is Given by (X′X)⁻¹σ²

Proof: The variance-covariance matrix of β̂ is given by

var(β̂) = E[β̂ − E(β̂)][β̂ − E(β̂)]′ = E[β̂ − β][β̂ − β]′                                (3.116)

From equation (3.114), we have β̂ − β = (X′X)⁻¹X′ε. Putting this value in equation (3.116), we have

var(β̂) = E[(X′X)⁻¹X′ε((X′X)⁻¹X′ε)′]
       = (X′X)⁻¹X′E(εε′)X(X′X)⁻¹
       = (X′X)⁻¹X′σ²IX(X′X)⁻¹
       = (X′X)⁻¹σ²                                                               (3.117)

(iv) Property of Sampling Distribution: The OLS Estimator β̂ Is Normally Distributed with Mean Vector β and Variance-Covariance Matrix (X′X)⁻¹σ²

Proof: We have

β̂ = β + (X′X)⁻¹X′ε                                                               (3.118)

Since the least squares estimators are linear combinations of the ε's, which are normally distributed, β̂ is also normally distributed. We have E(β̂) = β and var(β̂) = (X′X)⁻¹σ². Thus, it can be said that β̂ is normally distributed with mean vector β and variance-covariance matrix (X′X)⁻¹σ², i.e., β̂ ~ N(β, (X′X)⁻¹σ²).

(v) Gauss–Markov Theorem: Least Squares Estimators Have Minimum Variance Among All Linear Unbiased Estimators, and So β̂ Is the Best Linear Unbiased Estimator of β

Proof: Let β̃ be any other linear unbiased estimator of β which differs from the OLS estimator β̂ of β by a quantity DY, such that β̃ − β̂ = DY. Thus, we can write

β̃ = β̂ + DY = β + (X′X)⁻¹X′ε + D(Xβ + ε) = β + DXβ + [(X′X)⁻¹X′ + D]ε              (3.119)

Taking expectations on both sides of equation (3.119), we have

E(β̃) = β + DXβ                                                                   (3.120)

Since β̃ is an unbiased estimator of β, this implies that DX = 0.

Variance-Covariance Matrix of β̃

Since DX = 0, from equation (3.119), we have

β̃ − β = [(X′X)⁻¹X′ + D]ε                                                          (3.121)

The variance-covariance matrix of β̃ is given by

var(β̃) = E[β̃ − β][β̃ − β]′
       = E[[(X′X)⁻¹X′ + D]ε] [[(X′X)⁻¹X′ + D]ε]′
       = [(X′X)⁻¹X′ + D] E(εε′) [X(X′X)⁻¹ + D′]
       = [(X′X)⁻¹X′ + D] σ²I [X(X′X)⁻¹ + D′]
       = [(X′X)⁻¹X′X(X′X)⁻¹ + (X′X)⁻¹X′D′ + DX(X′X)⁻¹ + DD′] σ²
       = [(X′X)⁻¹ + (X′X)⁻¹(DX)′ + DX(X′X)⁻¹ + DD′] σ²
       = [(X′X)⁻¹ + DD′] σ²    [since DX = 0]
       = (X′X)⁻¹σ² + DD′σ²
       = var(β̂) + a positive quantity                                            (3.122)

From equation (3.122), we have

var(β̃) > var(β̂)                                                                  (3.123)

Thus, we can say that the OLS estimators have minimum variance among the class of all other linear unbiased estimators. Hence, OLS estimators are BLUE. (vi) Invariance Property of the Least Square Estimators : Best Linear Unbiased Estimator of Any Linear Function of Parameters Is the Same Linear Function of the Least Squares Estimators. More Specifically It Can be Said that BLUE of the Linear Function ȥ = Ȗcȕ =

k

¦Ȗ ȕ j

j=1

j

ˆ = Ȗcȕˆ = Is ȥ

k

¦ Ȗ ȕˆ j

j

j=1

Proof: We have ȥˆ = Ȗ cȕˆ ˆ = Ȗ cȕ = ȥ ˆ = ȖcE(ȕ) E(ȥ)

Thus, ȥˆ = Ȗ cȕˆ is an unbiased estimator of ȥ = Ȗ cȕ

(3.124)


The variance-covariance matrix of ȥˆ is given by ˆ ˆ = Ȗ cvar(ȕ)Ȗ var(ȥ) = Ȗ cı 2 (XcX)-1 Ȗ

(3.125)

Let us define ȥ = PcY be any other linear unbiased estimator of ȥ = Ȗ cȕ . Then, we have  = PcE(Y) = PcXȕ = Ȗcȕ which implies that PcX = Ȗc. E(ȥ)

The variance-covariance matrix of ȥ is given by var(\ ) = Pcvar(Y)P = Pcı 2 IP = ı 2 PcP

(3.126)

Again, from equation (3.124), we have ˆ = ı 2 PcX(X cX)-1X cP var(ȥ)

(3.127)

The difference between equations (3.126) and (3.127) is

  var(ȥ) ˆ = ı 2 PcP  ı 2 PcX(X cX)-1X cP var(ȥ) = ı 2 Pc ª¬ I  X(XcX)-1Xcº¼ P = ı 2 PcMP, [where M = I  X(XcX)-1X c]

(3.128)

Here, M is a symmetric, idempotent and semi-definite positive matrix. Therefore, we have   var(ȥ) ˆ t0 var(ȥ)  t var(ȥ) ˆ Ÿ var(ȥ) ˆ d var(PcY) Ÿ var(Ȗ cȕ)

(3.129)

ˆ = Ȗcȕˆ is the BLUE of Ȗ cȕ . Hence, ȥ (vii) Property of Consistency: The OLS Estimators Are Consistent i.e. if the Sample Size Is Large ȕˆ Tends to ȕ.

Proof: From equation (3.118), we have 1

ª X cX º Xcİ ȕˆ = ȕ + « » ¬ n ¼ n

(3.130)

Taking the limit in both sides of equation (3.130), we have 1

ª X cX º ª X cİ º lim ȕˆ = ȕ + lim « » nlim of « n » n of n of n ¬ ¼ ¬ ¼

We know that, if n o f , then

(3.131)

XcX ª X cX º tends to a finite quantity, which implies that « » n ¬ n ¼

if n o f. To show that ȕˆ is a consistent estimator of ȕ, we must show that

1

tends to a finite quantity


ª X cİ º lim « » ¬ n ¼

n of


(3.132)

0

We know that İ~N(0, ı 2 I n ) . Thus, we can write  ª X cİ º E« » =0 ¬ n ¼

The variance-covariance matrix is given by ª Xcİ Xcİ c º § ·§ ·» E «¨ ¸¨ ¸ «© n ¹ © n ¹ » ¬ ¼

X c E(İİ c)X n2

X c ı 2 IX n2 X cX ı 2 u n n ª XcX ı 2 º u » The variance-covariance matrix is finite which implies that lim « n of n¼ ¬ n

(3.133)

0

Thus, we have ª X cİ Xcİ c º § ·§ ·» lim E «¨ ¸¨ ¸ n of «© n ¹ © n ¹ » ¬ ¼

0

(3.134)

§ Xcİ · Here, ¨ ¸ has a zero mean and its variance-covariance matrix vanishes asymptotically, which implies that © n ¹ § Xcİ · lim ¨ ¸ 0. n of © n ¹

Thus, we have lim ȕˆ = ȕ + A Finite Quantity u 0 n of

(3.135)



Therefore, ȕˆ is a consistent estimator of ȕ. This property is called the large sample property. (viii) Asymptotic Normality: The OLS Estimator ȕˆ Is Asymptotically Normally Distributed with Mean ȕ and a o N ª¬ȕ, (XcX)-1ı 2 º¼ . Variance-covariance Matrix (X cX)-1ı 2 , i.e., ȕˆ  Proof: We have ȕˆ = ȕ +(X cX)-1Xcİ

(3.136)

The stabilising transformation of ȕˆ is given by n ȕˆ  ȕ = n (X cX) -1X cİ





-1

§ X cX · Xcİ = n ¨ ¸ © n ¹ n X cİ = n Q-1 n

(3.137)


The limiting behaviour of

n ȕˆ  ȕ is the same as that of





n Q-1

X cİ . n

Here, Q is a fixed matrix. Asymptotic behaviour depends on the random variable

n

Xcİ . n

Now, ª n º « ¦ xi İi » Xcİ » n = n « i=1 n « n » «¬ »¼ § n ¨ ¦ Wi = n ¨ i=1 ¨ n ¨ ©

· ¸ ¸ ¸ ¸ ¹

(3.138)

= nW

where W is the sample mean of independent observations p Under the assumption W  o 0,

Var(x i İ i ) = x ci ı 2 x i = ı 2 x ci x i p Ÿ Var(W)  o

ı2Q n

p Ÿ Var( nW)  o ı2Q

(3.139)

Applying the Lindeberg-Feller Central Limit Theorem (CLT), we can write that d nW)  o N(0, ı 2 Q)

d Ÿ Q-1 nW)  o N(0, V 2 Q-1QQ -1 ) d Ÿ n ȕˆ  ȕ  o N(0, ı 2 Q-1 )





ª ı 2 -1 º a Ÿ ȕˆ  o N «ȕ, Q » ¬ n ¼ ª ı 2 § XcX ·-1 º a Ÿ ȕˆ  o N «ȕ, ¨ ¸ » ¬« n © n ¹ ¼» -1 a Ÿ ȕˆ  o N ªȕ, XcX ı 2 º ¬ ¼

(3.140)

Hence, the theorem is proved. (ix) The OLS Estimator of ı 2 Is given by ıˆ 2 = s 2 =

Let the residual vector e be given by

n 1 ei2 ¦ n  k  1 i=1

Multiple Regression Models

105

e = y  Xȕˆ = y  X(XcX)-1X cy

= My [ where M = I  X(XcX)-1Xc ] = MXȕ + Mİ = Mİ [ Since, MX=0]

(3.141)

which indicates that the estimated residuals are the linear functions of the true unknown disturbances. The residual sum of squares is given by ece = (Mİ)cMİ

= İ cM cMİ = İ cMİ [ Since, M is a symmetric and idempotent matrix]

(3.142)

Taking expectation in both sides of equation (3.142), we have E(ece) = E(İ cMİ) = E[trace(İ cMİ)] [Since İ cMİ is a scalar constant, so İ cMİ = trace(İ cMİ) ] = E[trace(Mİİ c)] = trace[M E(İİ c)]

= trace(Mı 2 I n ) = ı 2 trace(M)

= ı 2 [trace(In )  trace(X(XcX)-1Xc)] = ı 2 [trace(I n )  trace(XcX)-1XcX] = ı 2 [trace(I n )  trace I k+1 ]

= ı 2 (n  k  1)

(3.143)

From equation (3.143), we have ª ece º E« » ¬ n-k-1 ¼

ı2

This implies that

ıˆ 2 = s 2 =

(3.144) n 1 ei2 is an unbiased estimator of ı 2 . Therefore, the OLS estimate of ı 2 is given by ¦ n  k  1 i=1

n 1 ei2 ¦ n  k  1 i=1

(3.145)
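In code, the unbiased estimate in (3.145) is one line once the residuals are available. A minimal sketch (illustrative, with hypothetical inputs y, X and beta_hat):

import numpy as np

def sigma2_hat(y, X, beta_hat):
    """Unbiased estimator of sigma^2: e'e / (n - k - 1), where X has k + 1 columns."""
    e = y - X @ beta_hat
    n, p = X.shape            # p = k + 1
    return (e @ e) / (n - p)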

3.6 Maximum Likelihood Estimation of the Parameters of Multiple Linear Regression Equations

Under the normality assumption on the random errors of a multiple linear regression model, we can write ε ~ N(0, σ²In), i.e., εi ~ NID(0, σ²). Thus, the probability density function of εi is given by

f(εi) = (1/√(2πσ²)) exp{−εi²/(2σ²)}                                               (3.146)

The likelihood function is given by

L = Π f(εi) = (2πσ²)^(−n/2) exp{−(1/(2σ²)) Σ εi²}
  = (2πσ²)^(−n/2) exp{−(1/(2σ²)) ε′ε}
  = (2πσ²)^(−n/2) exp{−(1/(2σ²)) (y − Xβ)′(y − Xβ)}                                (3.147)

Taking logarithms on both sides of equation (3.147), we have

log(L) = −(n/2)log(2π) − (n/2)log(σ²) − (1/(2σ²))(y − Xβ)′(y − Xβ)                 (3.148)

The maximum likelihood estimators of β and σ² are obtained by taking the partial derivatives of log(L) with respect to β and σ² and equating them to zero:

δlog(L)/δβ |β=β̂ = 0  ⇒  −(1/(2σ²))(−2X′y + 2X′Xβ̂) = 0                             (3.149)

δlog(L)/δσ² |σ²=σ̂² = 0  ⇒  −n/(2σ̂²) + (1/(2σ̂⁴))(y − Xβ̂)′(y − Xβ̂) = 0              (3.150)

Solving equation (3.149), we have

β̂ = (X′X)⁻¹X′y                                                                    (3.151)

From equation (3.150), we have

σ̂² = (1/n)(y − Xβ̂)′(y − Xβ̂) = e′e/n = (1/n) Σ ei²                                 (3.152)

It is found that the MLE of β is identical to the OLS estimator of β, but the MLE of σ² is not identical to the OLS estimator of σ², and it is biased.
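The downward bias of the ML estimator of σ² is easy to see in a small Monte Carlo experiment. The following sketch is purely illustrative (simulated data, not from the text): it repeatedly draws samples with σ² = 4 and compares the average of e′e/n with the average of e′e/(n−k−1).

import numpy as np

rng = np.random.default_rng(0)
n, k, sigma2 = 30, 2, 4.0
X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])
beta = np.array([1.0, 0.5, -0.3])

mle, unbiased = [], []
for _ in range(5000):
    y = X @ beta + rng.normal(scale=np.sqrt(sigma2), size=n)
    e = y - X @ np.linalg.solve(X.T @ X, X.T @ y)
    mle.append(e @ e / n)                  # ML estimator: divides by n
    unbiased.append(e @ e / (n - k - 1))   # OLS estimator: divides by n - k - 1

print(np.mean(mle), np.mean(unbiased))     # the first average sits below sigma2 = 4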


3.7 Properties of the Maximum Likelihood (ML) Estimators (i) The Variance Property of the ML Estimators of ȕˆ and ı 2 .

The variance of the OLS estimators can be obtained from the information matrix. To obtain the information matrix, we need to take the second-order partial derivatives, i.e., į 2 log(L) (XcX) =  įȕįȕ ı2

(3.153)

į 2 log(L) (Xcİ) = 4 įȕįı 2 ı

(3.154)

į 2 log(L) n 1 ˆ c(Y  Xȕ) ˆ =  (Y  Xȕ) įı 4 2ı 4 ı 6

n ece  2ı 4 ı 6

(3.155)

From equations (3.153), (3.154) and (3.155), we have ª į 2 log(L) º (XcX) ½ E « ° » = V2 ° ¬ įȕįȕ ¼ ° ª į 2 log(L) º ° E « = 0 ¾ 2 » įȕįı ¬ ¼ ° ° 2 ª į log(L) º n ° E « = » 4 4 °¿ ¬ įı ¼ 2ı

(3.156)

Thus, the inverse of the information matrix, is given by

I(ș)

-1

§ (X cX)-1ı 2 = ¨¨ 0 ¨ ©

0 · ¸ 2ı 4 ¸ ¸ n ¹

(3.157)

From the inverse of the information matrix, we have 4 ˆ ıˆ 2 ) = 0, and var(ıˆ 2 ) = 2ı . var(ȕˆ ) = (X cX)-1ı 2 , cov(ȕ, n

(ii) Functional Property of the ML Estimators ˆ is the MLE of g(T ). For example, ıˆ 2 = ece = s 2 is the Let șˆ be the MLE of ș. If g(ș) is a function of ș, then g(ș) n 2 MLE of ı , then s is the MLE of ı.

(iii) Property of Distribution of the MLE of ı 2 : The MLE ıˆ 2 =

ece of ı 2 Is Distributed as Chi-square. n

Proof: We know, the MLE of ı 2 is given by ıˆ 2 =

ece n

(3.158)

We know, e = Mİ [where M = I  X(X cX)-1Xc]

Putting the value of e in equation (3.158), we have

(3.159)


nıˆ 2 = Mİ c Mİ

= İ cM cMİ = İ cMİ [ Since, M is a symmetric and idempotent matrix]

(3.160)

İ cMİ is the quadratic form. Since İ~N(0, ı 2 I n ) , we have  İ cMİ 2 ~Ȥ trace(M)d.f ı2 nıˆ 2 2 ~Ȥ (n-k-1)d.f ı2

ıˆ 2 ~

ı2 2 Ȥ (n-k-1)d.f n

(3.161) 2

2

ˆ Is Not an Unbiased Estimator of ı (iv) Property of Biasedness of the MLE of ı : The MLE ı Proof: Since

nıˆ 2 2 ~Ȥ (n-k-1)d.f , we have ı2

ª nıˆ 2 º E « 2 » = (n-k-1) ¬ı ¼ E(ıˆ 2 ) =

(n-k-1) 2 ı n

 E(ıˆ 2 ) z ı 2

(3.162)

Thus, the MLE ıˆ 2 is not an unbiased estimator of ı 2 . The variance of

nıˆ 2 is given by ı2

ª nıˆ 2 º var « 2 » = 2(n-k-1) ¬ı ¼ var(ıˆ 2 ) =

2ı 4 (n-k-1) n2

(3.163)

(v) Property of Efficiency of the MLE of Proof: We have

ı 2 : The MLE ıˆ 2 Is Not a FULL Efficient Statistic

nıˆ 2 2 ~Ȥ (n-k-1)d.f ı2

The distribution function of the chi-square variate Ȥ 2 = 1

f(Ȥ 2 ) = 2 Now,

n-k-1 n-k-1 2 2

e



Ȥ2 2

ª¬ Ȥ 2 º¼

n-k-1 -1 2

; 0 I n  P @ , where P= X(X cX)-1X c is a symmetric and idempotent

matrix Proof: The variance-covariance matrix of e is given by var(e) = Mvar(Y)Mc = Mı 2 I n M c = ı 2 MM, [  M c = M]

= ı 2 M, [  M 2 =M] = ı 2 ª¬ I n  X(XcX)-1X cº¼ = ı 2 > In  P @

(3.224)

Since, Pc = P, and P 2 =P, P is a symmetric and idempotent matrix. Equation (3.218) can also be written as ª1  p11 p12 « p « 21 1  p 22 « . . var(e) = ı 2 « . « . « . . « p p   n2 ¬« n1

. . . . . . . . . . . . . . . . . .

p1n º p 2n »» . » » . » . » » 1  p nn ¼» n×n

Thus, from equation (3.225), we have var(ei ) = ı 2 1  pii ; i= 1, 2,.....,n, and cov(ei ,e j ) =  ı 2 pij , for i z j.

(3.225)


where pii is the ith diagonal element and pij is the (i, j)th element of the matrix P Property iv: The correlation between ei and e j is given by

pij

1  pii

1  p

.

jj

Proof: The correlation coefficient between ei and e j is given by

ȡij =

=

=

cov(ei ,e j ) var(ei ) var(e j ) ı 2 pij ı 2 1  pii ı 2 1  p jj

pij

1  pii

1  p

(3.226)

jj

This shows that the residuals are correlated and have an unequal variance. Property v: The residual vector e is normally distributed with mean vector 0 and variance-covariance matrix ı 2 M  where M = I n  X(X cX)-1X c. Proof: We have e = MY,which implies that e is a linear function of Y. Since Y is normally distributed, e is also normally distributed. We have E(e) = 0 and var(e) = ı 2 M. Thus, we can say under the normality assumption,  e~N(0, ı 2 M).  Property vi: Residuals and OLS estimators are uncorrelated. Proof: The residual vector e is given by

e = MY

(3.227)

where M = I n  X(X cX)-1Xc, is an asymmetric and idempotent matrix. The OLS estimator ȕˆ is ȕˆ = (X cX) -1X cY = CY, [where C = (XcX)-1 Xc]

The covariance between e and ȕˆ is given by ˆ = cov(MY, CY) cov(e, ȕ)

= E[MY  E(MY)] [CY  E(CY)]c = M > Y  E(Y) @[Y  E(Y)]cCc = Mvar(Y)Cc

= Mı 2 I n Cc = ı 2 ª¬ I n -X(X cX)-1X cº¼ ª¬(X cX)-1X cº¼c

= ı 2 ª¬ X(X cX)-1  X(XcX)-1XcX(XcX)-1 º¼

(3.228)


= ı 2 ª¬ X(X cX)-1  X(XcX)-1 º¼ (3.229)

=0 

This shows that the residuals and the OLS estimators are uncorrelated. Property vii: The residuals are uncorrelated with estimated values. Proof: The residual vector e is e = MY; where M = I n  X(XcX)-1X c is an asymmetric and idempotent matrix and the ˆ = PY, where P = X(X cX)-1X c is also a symmetric and idempotent matrix. vector of estimated values is given by Y ˆ is given by The covariance between e and Y ˆ = cov(MY, PY) cov(e, Y) = E[MY  E(MY)] [PY  E(PY)]c = M > Y  E(Y) @[Y  E(Y)]cPc

= M var(Y) Pc

= Mı 2 I n Pc = ı 2 ª¬ I n  X(X cX)-1X cº¼ ª¬ X(X cX) -1X cº¼c

= ı 2 ª¬ I n  X(XcX)-1Xcº¼ ª¬ X(XcX)-1Xcº¼ = ı 2 ª¬ X(X cX)-1X c  X(X cX) -1X cX(X cX) -1X c º¼

= ı 2 ª¬ X(X cX)-1Xc  X(XcX)-1Xcº¼ (3.230)

=0 

This shows that the residuals are uncorrelated with the estimated values.

Property viii: Residuals are uncorrelated with the explanatory variables.

Proof: The covariance between e and X is given by

cov(X′e) = cov[X′(Y − Ŷ)] = cov(X′Y − X′X(X′X)⁻¹X′Y) = cov(X′Y − X′Y) = 0          (3.231)

This shows that the residuals are uncorrelated with explanatory variables.
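The orthogonality X′e = 0 is also a convenient numerical check on any OLS routine. A short illustrative sketch with simulated data (not from the text):

import numpy as np

rng = np.random.default_rng(1)
X = np.column_stack([np.ones(50), rng.normal(size=(50, 2))])
y = X @ np.array([2.0, 1.0, -0.5]) + rng.normal(size=50)

e = y - X @ np.linalg.solve(X.T @ X, X.T @ y)   # OLS residuals
print(X.T @ e)                                  # zero up to floating-point rounding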

3.12 Characteristics of the Coefficient of Multiple Determination ( R 2 ) The important characteristics of the coefficient of multiple determination are given below: (i) A measure of goodness of fit of a multiple regression equation is called a coefficient of multiple determination which is denoted by R 2 and is given by


R² = Regression Sum of Squares / Total Sum of Squares
   = β̂′X′Y / Y′Y
   = [β̂1SP(X1,Y) + β̂2SP(X2,Y) + ........ + β̂kSP(Xk,Y)] / Y′Y                       (3.232)

where SP(Xj,Y) = Σ_{i=1}^{n} (Xji − X̄j)(Yi − Ȳ), j = 1, 2, ....., k.

(ii) R² lies between 0 and 1, i.e., 0 ≤ R² ≤ 1.

(iii) The value of R² indicates what portion of the total variation in the dependent variable Y is explained by the fitted regression equation, i.e., by the explanatory (regressor) variables.

(iv) If R² = 1, the relationship between Y and the X's is perfect and 100 percent of the total variation in the dependent variable Y is explained by the fitted regression equation; that is, there is no impact of the random error term on Y. If R² = 0, there is no linear relationship between Y and the X's, although a non-linear relationship may exist. Thus, when the model fits the data very well, the observed and predicted values will be close to each other and the error sum of squares will be relatively small; in such a case R² will be close to unity. If the error sum of squares is zero, then R² will be 1, because

R² = 1 − Residual Sum of Squares / Total Sum of Squares = 1 − e′e/Y′Y              (3.233)

From equation (3.233), we see that if e′e = 0, then R² = 1, indicating that the relationship between Y and the X's is perfect. On the other hand, if there is no relationship between Y and the X's, the linear model gives a poor fit, the best estimate of each observation would be Ȳ, the residual sum of squares will be close to the total sum of squares, and R² will be close to zero.

(v) The positive square root of R² is called the multiple correlation coefficient.

Adjusted R² in the Case of a Multiple Linear Regression Model

In a multiple linear regression equation, one common problem is that if we include additional independent variable(s) in the equation, regardless of how irrelevant they may be, R² will increase. To overcome this type of problem, we use the adjusted R², denoted R̄², instead of R² for measuring the goodness of fit of a regression equation:

R̄² = 1 − [ESS/(n−k−1)] / [TSS(Adjusted)/(n−1)]
   = 1 − [(n−1)/(n−k−1)] [TSS(Adjusted) − RSS(Adjusted)] / TSS(Adjusted)
   = 1 − [(n−1)/(n−k−1)] [1 − RSS(Adjusted)/TSS(Adjusted)]
   = 1 − [(n−1)/(n−k−1)] (1 − R²)
   = (n−k−1−n+1)/(n−k−1) + [(n−1)/(n−k−1)] R²
   = −k/(n−k−1) + [(n−1)/(n−k−1)] R²                                               (3.234)


where k is the number of explanatory variables in the equation. From equation (3.234), R̄² is less than R² [unless R² = 1, in which case R̄² = 1]. Thus, it is good practice to use the adjusted R² rather than R², because R² tends to give an overly optimistic picture of the fit of the regression equation, particularly when the number of explanatory variables is not very small compared to the number of observations.

Ex. 3-6: Calculate the coefficient of multiple determination and the adjusted R² for the problem given in Ex. 3-5. Comment on your results.

Solution: For the given problem in Ex. 3-5, the total sum of squares (TSS) is

TSS = ¦ Yi2  nY 2 i=1

18230.8370  49 u 19.2117 2 145.4389

(3.235)

The residual sum of squares (ESS) is given by n

ESS = ¦ ei2

5.9907

(3.236)

i=1

The coefficient of multiple determination ( R 2 ) is given by R2 = 1

5.9907 145.4389

= 0.9588

(3.237)

The adjusted R 2 is given by § 48 · R 2 =1  ¨ ¸ (1  0.9588) © 44 ¹

= 0.9551

(3.238)

Comment: From the estimated results of R 2 , it can be said that 95.88% of the total variation in the dependent variable carbon dioxide emissions is explained by the fitted equation and the remaining 4.12 percent is explained by random factors. Thus, it can be concluded that the fit is very good. It is also found that the adjusted R 2 is almost similar to the R 2 . Thus, it can be concluded that in the multiple linear regression equation, there is no irrelevant variable.

3.13 Steps That Are Involved in Fitting a Multiple Linear Regression Model (Without Distributional Assumptions of Random Error Term) In this case, the following steps are followed in fitting a multiple linear regression equation. Step 1: First, we obtain the OLS estimate of ȕ which is given by ȕˆ = (XcX)-1 X cY. Then we obtain the fitted value of ˆ ˆ = Xȕ. Y which is given by Y ˆ Step 2: Second, we obtain the residual vector e which is given by e = Y  Y



Step 3: Third, we obtain the variance-covariance matrix of ȕˆ which is given by ˆ = (X cX)-1ı 2 var(ȕ)

ª V00 «V « 10 2 =ı « . « « . «¬ Vk0

V01 ... ... V0k º V11 ... ... V1k »» . ... ... . » » . ... ... . » Vk1 ... ... Vkk »¼ (k+1)×(k+1)

(3.239)

If we multiply the diagonal elements by ı 2 , then we can obtain the variance of the OLS estimates, i.e., var(ȕˆ j ) = Vjjı 2 , j = 0, 1, ....,k . If we multiply the off-diagonal elements by ı 2 , then we find the co-variance of OLS estimates, i.e., cov(ȕˆ l , ȕˆ j ) = ı 2 Vlj ; (i, j = 0, 1,....,k, i z j). Step 4: Let Xc0 be a specified vector which consists of a set of given values of X1, X2,……, and Xk and is given by ˆ is the predicted value of X by the regression equation and is given by Xc0 = >1, x1 , x 2 , ... ... x k @ . Then Y 0 0 ˆ Thus, the variance-covariance matrix of Y ˆ = X c ȕ. ˆ is given by Y 0

0

0

ˆ ) = X c var(Eˆ )X var(Y 0 0 0

X c0 (X cX)-1ı 2 X 0 = X c0 (X cX)-1X 0 ı 2

(3.240)
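For a given point X0, equation (3.240) gives the variance of the fitted value. A minimal sketch (illustrative; x0, X and s2 are assumed to be available from an earlier fit):

import numpy as np

def fitted_value_se(x0, X, s2):
    """Standard error of Y0_hat = x0' beta_hat: sqrt(x0' (X'X)^(-1) x0 * s^2)."""
    v = x0 @ np.linalg.inv(X.T @ X) @ x0
    return np.sqrt(v * s2)

# Example of use (hypothetical point for the Ex. 3-3 model):
# x0 = np.array([1.0, 500.0, 3000.0])
# fitted_value_se(x0, X, s2)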

Step 5: Then, we have to decompose of the total variation of Y in the explained and un-explained variation which are shown in ANOVA Table 3-5. Table 3-5: ANOVA Table

Source of Variation

d.f

SS

Explained

k+1

ȕˆ cXcY

Unexplained

n-k-1

Y cY  ȕˆ cX cY

Total

n

YcY

MS

RMS =

F-Test

ȕˆ cX cY k+1

(Y cY-ȕˆ cXcY) EMS = n-k-1

F=

Decision

RMS ~F(k+1,n-k-1) EMS

If Fcal > Ftab , then the null hypothesis will be rejected

Step 6: Test the lack of fit:

(i) The lack of fit can be tested using the following ratio: R2 = 1

ece TSS

(3.241)

ˆ R 2 is called the coefficient of multiple determination. It is the square of the correlation coefficient between Y and Y, 2 i.e., rYY ˆ

R 2 . R 2 lies between 0 and 1. An imperfect fit reflects the existence of pure error. For such situations it is

ˆ , for such situations, R 2 impossible for R 2 to actually attain 1. If the fit is perfect, that is Yi = Y i

1. If R 2 is near

about 1, we say that there is no pure error and the fit is adequate. If R 2 is very low, it indicates a lack of fit. (ii) Sometimes, it is better to use adjusted R 2 to test the lack of fit of a regression equation which is defined as ª n 1 º R 2 = 1  (1  R 2 ) « » ¬ n  k  1¼

(3.242)

The adjusted R 2 is defined for the corresponding degrees of freedom of the two quantities namely: residual sum of squares and total sum of squares (corrected), the idea being that the statistic R 2 can be used to compare equations


fitted not only to a specific set of data but also to two or more different data sets.

3.14 Steps That Are Involved in Fitting a Multiple Linear Regression Model (With Distributional Assumptions of Random Error Term) The following steps are involved in fitting a multiple regression model under the assumption that the random error term İ i ~N(0, ı 2 ),  i, Ÿ İ~N(0, ı 2 I n ). Step 1: First, we obtain the OLS estimate of ȕ, which is given by ȕˆ = (XcX)-1XcY. Then we obtain the fitted value of ˆ ˆ = Xȕ. Y which is given by Y ˆ Step 2: Second, we obtain the residual vector e which is given by e = Y  Y.

Step 3: Third, we obtain the variance-covariance matrix of ȕˆ which is given by ˆ = (X cX)-1ı 2 var(ȕ) ª V00 «V « 10 = ı2 « . « « . «¬ Vk0

V01 ... ... V0k º V11 ... ... V1k »» . ... ... . » » . ... ... . » Vk1 ... ... Vkk »¼ (k+1)u(k+1)

(3.243)

If we multiply the diagonal elements by ı 2 , then we can obtain the variance of the OLS estimates, i.e., var(ȕˆ j ) = Vjjı 2 ; j = 0, 1,......,k . If we multiply the off-diagonal elements by ı 2 , then we find the co-variance of the OLS estimates, i.e., cov(ȕˆ l , ȕˆ j ) = ı 2 Vlj ; (l, j = 0, 1,...,k, l z j). Step 4: Let x c0 be a specified vector which consists of a set of given values of X1 , X 2 ,......, and X k and given by,

Xc0 = >1 x1

x 2 ... ... x k @ . Then Yˆ 0 is the predicted value of X 0 by the regression equation and is given by

ˆ Thus the variance-covariance matrix of Y ˆ = X c ȕ. ˆ is given by Y 0 0 0 ˆ ) = Xc var(Eˆ )X var(Y 0 0 0 X c0 (X cX)-1V 2 X 0

= X c0 (XcX)-1X 0V 2

(3.244)

Step 5: Then, we have to decompose the total variation of Y in explained and un-explained variation which are shown in ANOVA Table 3-6. Table 3-6: The ANOVA Table

Source of Variation

d.f

SS

Explained

k+1

ȕˆ cX cY

Unexplained

n-k-1

YcY  ȕˆ cXcY

Total

n

YcY

MS

RMS =

ȕˆ cX cY k+1

EMS =

(Y cY-ȕˆ cXcY) n-k-1

F-Test

Decision

If Fcal > Ftab , then the null hypothesis RMS F= ~F(k+1,n-k-1) will be rejected EMS

Step 6: If repeat observations are available, we can split the residual sum of squares into two parts namely: SS (pure 2

error) with n e degrees of freedom whose estimation is given by n e ı and SS(lack of fit) with (n-k-1-n e ) degrees of


freedom [repeat means in all co-ordinates X1 , X 2 ,........,X k must be reiterated]. Then, we can rearrange the ANOVA table as follows: Table 3-7: The ANOVA Table

Source of Variation ­°ȕˆ 0 Regression ® °¯ȕˆ 1 ,......,ȕˆ k |ȕˆ 0

d.f ­1 ® ¯k

­Lack of fit Residual ® ¯Pure error Total

­(n-k-1-n e ) ® ¯n e

SS °­Yc11c Y ®ˆ °¯ȕcX cY  Yc11c Y ­SS (lack of fit) ® ¯SS (pure error)

n

YcY

MS °­Yc11c Y ® ˆ °¯(ȕcX cY  Y c11c Y)/k ­MS (lack of fit) ® ¯MS (pure error)

Step 7: Test the lack of fit:

(i) The lack of fit can be tested using the F-test statistic instead of R 2 which is given by SS (lack of fit) /(n-k-1-n e ) F= ~F(n-k-1-n e , n e ) (3.245) SS (pure error)/n e This F value will be compared with the table value of F-statistic with degrees of freedom (n-k-1-n e ) and n e at D level of significance. If the result is significant, it indicates the existence of a lack of fit in the equation, otherwise, Residual SS is usually called s 2 and is an unbiased estimate of ı 2 . If the lack there is no lack of fit. Here, MSE = n-k-1 of fit can not be tested, use of s 2 as an estimate of ı 2 , another assumption considering the model to be correct, is to be made. Step 8: Testing the overall regression equation:

The null hypothesis to be tested is H 0 : ȕ1 = ȕ 2 =........=ȕ k = 0

against the alternative hypothesis H1 : At least one of them is not zero

Under the null hypothesis, the test statistic is given by (Restricted Residual SS  Unrestricted Residual SS)/k ~F(k, n-k-1) Unrestricted Residual SS /(n-k-1)

F=

(3.246)

This calculated value will be compared with the table value of the F-test statistic with k and (n-k-1) degrees of freedom at D level of significance. If the calculated value is greater than the table value, we reject the null hypothesis meaning that the fitted equation is statistically significant. Otherwise, we accept the null hypothesis meaning that the equation will be fitted to the errors only.

3.15 Show that ryy2ˆ = R 2 where ryyˆ Is the Correlation Coefficient between y and yˆ For sample data, in deviation form, the multiple linear regression equation can be written as: k

¦ ȕˆ x

yi =

j

ji

+ei , [where x ji = (X ji  X j ), and yi = Yi  Y]

j=1

ei

k

yi  ¦ ȕˆ j x ji j=1

n

k ª º e = ¦ « yi  ¦ ȕˆ j x ji » ¦ i=1 i=1 ¬ j=1 ¼ 2 i

n

2

(3.247)


Taking the partial differentiation of

n

¦e

2 i


with respect to ȕˆ j and then equating to zero, we have

i=1

n

į¦ ei2 i=1

=0

įȕˆ j

n ª k º 2¦ « yi  ¦ ȕˆ j x ji » x ji i=1 ¬ j=1 ¼ n

¦e x i

0

(3.248)

0

ji

i=1

The estimated value of yi is given by k

¦ ȕˆ x

yˆ i =

j

(3.249)

ji

j=1

Thus we have n

n

k

¦ e yˆ ¦ e ¦ ȕˆ x i

i

i

i=1

j

i=1

=

ji

j=1

k

n

¦ ȕˆ ¦ e x j

i

j=1

ji

i=1

(3.250)

=0

We have TSS (Adjusted) = RSS (Adjusted) + ESS

(3.251)

Now, n

n

¦ y yˆ = ¦ (yˆ i

i

i=1

i

 ei )yˆ i

i=1

n

n

n

2 i

¦ y yˆ = ¦ yˆ  ¦ e yˆ i

i

i=1

i

i=1

n

n

¦ y yˆ = ¦ yˆ i

i

i=1

i

i=1

2 i

i=1

n

¦ y yˆ = RSS (Adjusted) i

(3.252)

i

i=1

ˆ is given by The correlation coefficient between Y and Y rYYˆ =

ˆ cov(Y, Y) ˆ var(Y) var(Y) n

¦ (Y  Y) (Yˆ  Y) i

=

i

i=1

n

n

¦ (Y  Y) ¦ (Yˆ  Y) 2

i

2

i

i=1

i=1

n

¦ y yˆ i

=

i

i=1

n

2 i

n

¦ y ¦ yˆ i=1

i=1

2 i

n

n

i=1

i=1

ˆ  Y)] , [where yi =¦ (Yi  Y), and yˆ i = ¦ (Y i

(3.253)


Squaring both sides of (3.253), we have 2

2 rYY ˆ

ª n º « ¦ yi yˆ i » = ¬n i=1 n ¼ ¦ yi2 ¦ yˆ i2 i=1

i=1

2

ª n 2º « ¦ yˆ i » = n¬ i=1 n ¼ ¦ yi2 ¦ yˆ i2 i=1

i=1

n

¦ yˆ

2 i

¦y

2 i

i=1 n

=

i=1

RSS (Adjusted) TSS (Adjusted)

R2

(3.254)

Hence, the theorem is proved.
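The identity r²_{yŷ} = R² is easy to confirm numerically. The following sketch uses simulated data (purely illustrative, not from the text):

import numpy as np

rng = np.random.default_rng(2)
X = np.column_stack([np.ones(40), rng.normal(size=(40, 3))])
y = X @ np.array([1.0, 0.8, -0.4, 0.2]) + rng.normal(size=40)

y_hat = X @ np.linalg.solve(X.T @ X, X.T @ y)
r2 = 1 - np.sum((y - y_hat) ** 2) / np.sum((y - y.mean()) ** 2)
corr_sq = np.corrcoef(y, y_hat)[0, 1] ** 2
print(r2, corr_sq)   # the two values agree up to rounding error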

3.16 Analysis of Variance Analysis of variance is a statistical technique which is used to analyse the sample variance to evaluate econometric models with two or more independent variables or to compare models with a different number of independent variables. The analysis of variance technique can be used in different situations. In every situation it should be asked what happens to the residuals of an econometric model if we assume that the null hypothesis H 0 is true. When we use this technique, we have to use some notations which are given below: (i) Total Sum of Squares (TSS) or Total Variation TSS =

n

¦ (Y Y)

2

i

i=1

=

n

2 i

¦Y

 nY 2

(3.255)

i=1

(ii) Regression Sum of Squares (RSS) or Explained Sum of Squares or Systematic Variation n

ˆ Y) 2 RSS = ¦ (Y i i=1

=

n

¦ Yˆ

2 i

 nY 2

(3.256)

i=1

(iii) Residual Sum of Squares (ESS) or Chance Variation ESS =

n

¦ (Y Yˆ ) i

2

i

i=1

=

n

¦e

2 i

(3.257)

i=1

We use TSS, RSS and ESS in different situations to test the null hypothesis about the population parameter(s) of a regression equation. Let the null hypothesis to be tested is


H 0 : ȕ j = 0, (j = 1, 2,.....,k) and the alternative hypothesis H1: ȕ j z 0

Assume that, RESS = residual sum of squares when H0 is assumed to be true, RESS is also called the restricted residual sum of squares. UESS = residual sum of squares when H1 is assumed to be true, UESS is called the unrestricted residual sum of squares. Under the null hypothesis, the test statistic is given by F=

(RESS  UESS)/v1 ~F(v1 , v 2 ) UESS/v 2

(3.258)

= Fcal

We have to compare this calculated value with the table value at level of significance. If Fcal ! Ftab , then we reject the null hypothesis, otherwise we accept it. The results of the decision are reported below in the ANOVA Table 3-8. Table 3-8: The ANOVA Table

Source of Variations

df

Sum Squares

Regression

k+1

RSS

Residual

n-k-1

ESS

Total

n

TSS

of

Mean Square RSS k+1 ESS EMS = n-k-1 TSS TMS = n

F-test

Decision If

RMS =

RMS F= ~F(k+1,n-k-1) EMS

Fcal > Ftab ,

then

we reject the null hypothesis otherwise we accept it.

Definition of ANOVA Table

The ANOVA table is defined as the arrangement of the results of the analysis of variance systematically and scientifically in a tabular form to evaluate an econometric model with two or more independent variables or to compare models with a different number of independent variables. Uses of Analysis of Variance in Regression Analysis

In an econometric analysis of different economic relationships, the analysis of variance method is used for conducting various tests of significance. The most important applications are given below: (i) to test the significance of all the regressors in a regression equation (ii) to test the significance of the improvement in the fit by adding one or more explanatory variables in the regression equation (iii) to test the equality of coefficients obtained from different samples (iv) to test the stability of the regression coefficients (v) to test the restrictions imposed on the coefficients of a regression equation. ANOVA for Adjusted Model

The total sum of squares (TSS) can be partitioned into the regression sum of squares (RSS) and the residual sum of squares (ESS) as given by YcY = ȕˆ cXcY+ece

The RSS is given by

(3.259)


ȕˆ cX cY = ª¬ y  ȕˆ 1X1  ȕˆ 2 X 2  .......  ȕˆ k X k

ȕˆ cX cY

ˆ ˆ ˆ ª ¬ y  ȕ1X1  ȕ 2 X 2  .......  ȕ k X k

ȕˆ 1

ȕˆ 1

ª 1 «X « 11 « . . . . ȕˆ k º¼ « « . « . « ¬« X k1

1 X12 . . . X k2

. . . .

. . . .

. 1 º ª y1 º . X1n »» «« y 2 »» . . »« . » »« » . . »« . » . . . . »« . » »« » . . . X kn ¼» ¬« y n ¼»

ª n º « ¦ yi » « i=1 » « n » « ¦ X1i yi » « i=1 » » . . . . ȕˆ k ¼º « « » . « » « » . « n » « X y» ¦ ki i »»¼ ««¬ i=1

n

n

n

n

n

n

n

i=1

i=1

i=1

i=1

i=1

i=1

i=1

ȕˆ cX cY = y ¦ yi  ȕˆ 1X1 ¦ yi  ȕˆ 2 X 2 ¦ yi  ......  ȕˆ k X k ¦ yi  ȕˆ 1 ¦ X1i yi  ȕˆ 2 ¦ X 2i yi +ȕˆ k ¦ X ki yi ª n º « ¦ yi » ˆȕcX cY  ¬ i=1 ¼ n

2

ª n º « ¦ yi » ˆȕcX cY  ¬ i=1 ¼ n

2

ª n º ª n º ª n º ȕˆ 1 « ¦ X1i yi  nX1 y »  ȕˆ 2 « ¦ X 2i yi  nX 2 y »  .....  ȕˆ k « ¦ X ki yi  nX k y » ¬ i=1 ¼ ¬ i=1 ¼ ¬ i=1 ¼

k

ª

n

¦ ȕˆ «¬¦ X j

j=1

i=1

ji

º yi  nX j y » ¼ 2

ª n º « ¦ yi » ˆ RSS (Adjusted ) = ȕcXcY  ¬ i=1 ¼ = ȕˆ cXcY  ny 2 n

(3.260)

The total adjusted sum of squares (TSS) is given by TSS (Adjusted) =

n

¦y

2 i

 ny 2

(3.261)

i=1

Thus, from equation (3.259), we have Y cY c  ny 2 = ȕˆ cX cY  ny 2 +ece

TSS (Adjusted) = RSS (Adjusted) +ESS5

(3.262)

The ANOVA Table for adjusted data is shown below:

5

RSS (Unadjusted) = SS(ȕˆ 0 , ȕˆ 1 ,.........,ȕˆ k ) = ȕˆ cXcY ; RSS (Adjusted) = SS(ȕˆ 1 ,.........,ȕˆ k |ȕˆ 0 ) = ȕˆ cXcY  ny 2 2

ª n º « ¦ yi » SS(ȕˆ 0 ) = ¬ i=1 ¼ = ny 2 n


Table 3-9: The ANOVA Table

Source of Variation Reg (ȕˆ 1 ,....,ȕˆ k |ȕˆ 0 )

df k

Sum of Squares ȕˆ cX cY  ny 2

Error

n-k1

YcY  ȕˆ cXcY

Total

n

YcY c  ny 2

Mean Square ȕˆ cX cY  ny 2 k ece EMS = n-k-1

RMS =

F-Calculated Value

Decision

RMS F= ~F(k, n-k-1) EMS = Fcal

If Fcal > Ftab , then the null hypothesis will be rejected

Decision: If Fcal >Ftab , at D level of significance with k and (n-k-1) d.f. we reject the null hypothesis, otherwise we accept the null hypothesis.

3.17 Test of Significance of Parameter Estimates of Multiple Linear Regression Models Testing Null Hypothesis for Significance of the Individual Parameters

Hypothesis testing uses relatively straight-forward methods to draw valid inferences about population parameters based on sample information drawn from the population. The testing procedure for individual parameters of a multiple regression model is very simple and similar to simple linear regression models. In this section, the t-test statistic is discussed for testing the significance of individual parameters of a multiple linear regression equation. Assume that the dependent variable (Y) linearly depends on k (k•2) independent variables X1, X2,….., and Xk. Then the multiple linear regression equation is given by Yi = ȕ 0 +ȕ1X1i +ȕ 2 X 2i +.......+ȕ k X ki +İ i , (i= 1, 2, …… ,n)

(3.263)

All the terms of equation (3.263) are already explained. The null hypothesis to be tested is 0, ( j = 0, 1, 2,......,k)

H0 : ȕ j

against the alternative hypothesis H1: ȕ j z 0 Under the null hypothesis, the test statistic is given by t=

ȕˆ j  E(ȕˆ j )

=

var(ȕˆ j ) ȕˆ j var(ȕˆ j )

~ t (n-k-1)d.f.

~ t (n-k-1)d.f.

(3.264)

where k is the number of explanatory variables. n

var(ȕˆ j )

2

¦ ei

i=1

n-k-1

v jj

(3.265)

where v jj is the diagonal element of (XcX)-1 and k is the number of independent variables in the multiple linear regression model. Let the level of significance be Į. Table Value: Let at Į level of significance with n-k-1 degrees of freedom, the table values of the test statistic be ±t (n-k-1);Į/2 . Comment: If the calculated value of the test statistic falls in the acceptance region, we accept the null hypothesis, otherwise we reject it. Rejecting the null hypothesis implies that the variable Xj (j=1, 2,….,k) has a significant impact


on Y. Joint Test

We know that Total Sum of Squares = Regression Sum of Squares + Residual Sum of Squares TSS = RSS +ESS

(3.266)

where the unrestricted regression sum of squares (URSS) is given by URSS = ȕˆ cXcY

(3.267)

and the unrestricted residual sum of squares (UESS) is given by UESS = ece = Y cY  ȕˆ cXcY

(3.268)

The null hypothesis to be tested is: H 0 : ȕ1

ȕ 2 =....=ȕ k = 0

against the alternative hypothesis H1 : At least one of them is not zero

Under the null hypothesis, equation (3.263) can be written as Yi = ȕ 0 +İ i

(3.269)

So the restricted regression sum of squares (RRSS) is given by RRSS = ȕˆ cX cY = nY 2

(3.270)

The restricted residual sum of squares (RESS) is given by RESS = ece = Y cY  nY 2

(3.271)

Under the null hypothesis, the test statistic is given by F=

(RESS  UESS)/k ~ F(k, n-k-1) UESS/n-k-1 (YcY  nY 2  YcY  Eˆ cXcY) / k (YcY  Eˆ cXcY) / n-k-1

=

(ȕˆ cXcY  nY 2 )/k (YcY  ȕˆ cX cY)/n-k-1

(3.272)

We know that the R 2 is given by R2 =

RSS(Adjusted) TSS(Adjusted) ȕˆ cXcY  nY 2 YcY  nY 2

(3.273)

From equation (3.262), we have ȕˆ cX cY  nY 2 = R 2 (YcY  nY 2 )

Also, we have

(3.274)

Multiple Regression Models

1 R2

135

Y cY  ȕˆ cX cY YcY  nY 2

YcY  ȕˆ cXcY = (1  R 2 )(YcY  nY 2 ) UESS = (1  R 2 )(YcY  nY 2 )

(3.275)

Putting these values in equation (3.272), we have F=

(R 2 (YcY-nY 2 )) / k ~F(k, n-k-1) ((1-R 2 )(YcY-nY 2 )) / n-k-1 R2 / k ~F(k, n-k-1) (1-R 2 ) / n-k-1

(3.276)

Here, k is the number of restrictions imposed by the hypothesis. Let the level of significance be Į. Table Value: Let at Į level of significance with k and n-k-1 degrees of freedom, the table value of the test statistic be Ftab . Comment: If the calculated value of the test statistic is greater than the table value, we reject the null hypothesis, otherwise we accept the hypothesis. Rejection of the null hypothesis implies that some of the independent variables have significant impacts on Y. Testing the Joint Null Hypothesis

The null hypothesis to be tested is H 0 : ȕ1

ȕ 2 =....=ȕ h = 0 [where h < k]

against the alternative hypothesis H1 : At least one of them is not zero.

Under the null hypothesis, the multiple linear regression equation can be written as Yi = ȕ 0 +ȕ h+1X h+1,i +ȕ h+2 X h+2,i +.......+ȕ h+k X h+k,i +İ i

(3.277)

Equation (3.277) is called the restricted regression equation. In matrix notation, equation (3.277), can be expressed as  Y = Xȕ+İ

ª y1 º «y » « 2» « . »  = where Y = « » ; X « . » « . » « » ¬« y n ¼» nu1

(3.278) ª1 X h+1,1 «1 X h+1,2 « «. . « . «. «. . « 1 X h+1,n ¬«

X h+2,1 . . . X k1 º ª ȕ0 º «ȕ » . . . X k2 »» « h+1 » « . » . . . . . » ; ȕ = « ; and İ = » » . . . . . » « . » « . » . . . . . » » « » X h+2,n . . . X kn ¼» (nu(k-h+1)) «¬ ȕ k »¼ (k-h+1)×1 X h+2,2

ª İ1 º «İ » « 2» «.» « » «.» «.» « » ¬«İ n ¼» n×1

ˆ Let ȕ be the OLS estimator of ȕ which is given by ˆ  cX)  -1XY  ȕ = (X

(3.279)

The estimated equation will be  ˆ Y = Xȕ

(3.280)

Chapter Three

136

So, the residual vector e r for the restricted regression model is given by er

 ˆ (Y  Xȕ)

(3.281)

The residual sum of squares for the restricted regression model is given by ˆ c e rce r = RESS = Y cY  ȕ cX Y

(3.282)

Under the null hypothesis, the test statistic is given by F=

(RESS  UESS)/h ~ F(h, n-k-1) UESS/n-k-1

(3.283)

where h is the number of restrictions. Let the level of significance be Į . Comment: At Į level of significance with h and n-k-1 degrees of freedom, we find the table value of the F-test statistic. If the calculated value of the test statistic is greater than the table value, we reject the null hypothesis, otherwise we accept the hypothesis. Extra Sum of Squares Principle and Partial F-Test

To determine whether several regressor variables in a regression model is meaningful, we investigate the contribution of this number of regressor variables by considering the extra sum of squares which arises due to the inclusion of some independent variables in the model. The mean square derived from this extra sum of squares can then be compared with the estimated value s 2 of ı 2 to see whether it appears at a large scale significantly. Let us consider the multiple linear regression model of the type Yi = ȕ 0 +ȕ1X1i +ȕ 2 X 2i +.....+ȕ h X hi +ȕ h+1X (h+1)i +ȕ h+2 X (h+2)i +.....+ȕ k X ki +İ i

(3.284)

In matrix form, the equation (3.284) can be expressed as Y = Xȕ+İ

(3.285)

For model (3.285), we have YcY = ȕˆ cXcY +ece

TSS = RSS +ESS

(3.286)

where YcY = TSS, ȕˆ cX cY = RSS, and ece =ESS. From equation (3.286) we have ESS = TSS-RSS

(3.287)

Let us define S1 = ȕˆ cX cY

(3.288)

Thus, the residual sum of squares is given by ESS = Y cY  S1

(3.289)

And the residual means squares (EMS) is given by EMS =

ESS Y cY  S1 = df n  k 1

The null hypothesis to be tested is

(3.290)

Multiple Regression Models

137

H 0 : ȕ h+1 = ȕ h+2 =.....= ȕ k = 0

Under H 0 , the reduced model is given by (3.291)

Yi = ȕ 0 +ȕ1X1i +ȕ 2 X 2i +........+ȕ h X hi +İ i

In matrix form, equation (2.291) can be written as  +İ Y = XȜ

ª y1 º ª1 X11 «y » «1 X 12 « 2» « « . » «. .  = « where Y = « » ; X . « . » «. « . » «. . « » « «¬ y n »¼ nu1 «¬1 X1n

(3.292) X 21 . . . X k1 º ªȕ 0 º » «ȕ » X 22 . . . X k2 » « 1» «.» . . . . . » ;O= « » ; and İ = » . . . . . » «.» «.» . . . . . » » « » X 2n . . . X kn »¼ (n)×(k+1) «¬ȕ h »¼ (h+1)×1

ª İ1 º «İ » « 2» «.» « » «.» «.» « » «¬İ n ¼» n×1

Applying the OLS method to the reduced form we have  cX  -1 X  cY Oˆ = X

(3.293)

The regression sum of squares of the reduced form is given by  cY S2 = Oˆ cX

(3.294)

Then, the difference between S1 and S2 , i.e., S1  S2 is called the extra sum of squares due to the inclusion of the independent variables X (h+1) , X (h+2) ,......,and X k . Since S1 has (k+1) degrees of freedom and S2 has (h+1) degrees of freedom, and hence, (S1  S2 ) has (k+1-h-1) = k-h degrees of freedom. Now (S1  S2 ) can be written as S1  S2 = RSS (ȕˆ h+1 ,ȕˆ h+2 ,.........,ȕˆ k |ȕˆ 0 ,ȕˆ 1 , ȕˆ 2 ,.......ȕˆ h )

When the null hypothesis H 0 : ȕ h+1 = ȕ h+2 =.........= ȕ k = 0 is true, then it can be shown that E(S1  S2 ) = (k  h)ı 2 Under the assumption of normality of the random error terms, it can be shown that

(S1  S2 ) is distributed as chiı2

(n-k-1)s 2 is also distributed as chi-square independently with degrees of ı2 freedom (n-k-1). Therefore, to test the null hypothesis H 0 : ȕ h+1 = ȕ h+2 =.......= ȕ k = 0 against an appropriate alternative hypothesis, the test statistic is given by

square with (k-h) degrees of freedom and

ª S1  S2 º « V 2 » / (k-h) ¼ F= ¬ ~F(k-h, n-k-1) ª (n-k-1)s 2 º / (n-k-1) « V2 » ¬ ¼ =

(S1  S2 ) ~F(k-h, n-k-1) (k  h)s 2

The ANOVA table for this model is given below:

(3.295)

Chapter Three

138

Table 3-10: The ANOVA table

Source of Variation Reg due to Oˆ

d.f.

Sum of Squares S2

S2 /(h+1)

k-h

S1 -S2

(S1  S2 ) /(k  h)

n-k-1

YcY-S1 YcY

(YcY-S1 ) /(n-k-1) YcY/n

h+1

Reg. due ˆ (ȕˆ h+1 ,....,ȕˆ k |Ȝ) Error

to

Total

n

MS

F-Test

Decision

S -S F = 1 2 2 ~F(k-h, n-k-1) (k-h)s

If Fcal > Ftab , then the null hypothesis will be rejected

Decision: If the result is significant then the regressor variables should be included in the model. If not, there is no improvement in the prediction of Y by including the regressor variables. This test is known as the partial F-test. Note: If we wish, then the extra sum of squares principle (SS) can be used to determine SS(ȕˆ 0 ), SS(ȕˆ 1 |ȕˆ 0 ), SS(ȕˆ 2 |ȕˆ 0 , ȕˆ 1 ), ..........,SS(ȕˆ h |ȕˆ 0 , ȕˆ 1 ,.......ȕˆ h-1 ) successively for any regression model each with 1 degree of freedom.

These sum of squares are distributed independently of s 2 . Since each sum of squares has one d.f., these can be compared to s 2 by a series of partial F-tests. Ex. 3-7: Data on returns (RET), earnings per share (EPS), market-to-book ratio (MBR), sales revenue in million BDT (REV), beta risk (BETA) and leverage ratio (LEV) for 29 listed companies for the year 2019 are collected from DSE to answer the following requirements: Requirements:

(i) Construct the ANOVA Tables for unadjusted and adjusted models for the given data. (ii) Test the null hypotheses H 0 : ȕ j = 0, (j= 1, 2,...,5). (iii) Test the null hypothesis, H 0 : ȕ1 = ȕ 2 =....= ȕ 5 = 0 and (iii) Test the null hypothesis H 0 : ȕ1 = ȕ 3 = ȕ 4 = 0. Solution: (i): Assume that the returns of the firms (RET) linearly depend on earnings per share (EPS), market-to-book ratio (MBR), sales revenue (REV), beta risk (BETA) and leverage ratio (LEV). Therefore, the multiple linear regression equation is given by

(3.296)

Yi = ȕ 0 +ȕ1X1i +ȕ 2 X 2i +ȕ 3 X 3i +ȕ 4 X 4i +ȕ 5 X 5i +İ i

where Y is the returns of the firm i, X1i is the earning per share of the ith firm, X 2i is the market to book ratio of the ith firm, X 3i is the sales revenue of the ith firm, X 4i is the beta risk of the ith firm and X 5i is the leverage ratio (LEV) of the ith firm. İ i is the random error term corresponding to the ith set of observations that satisfies all the usual assumptions of a CLRM. Applying the OLS method to equation (3.296), we have ȕˆ 0

3.8834, ȕˆ 1

0.0111, ȕˆ 2

0.3137, ȕˆ 3

0.00000078, ȕˆ 4

0.0123, and ȕˆ 5

0.0382 .

Thus, the estimated equation will be ˆ = 3.8834 + 0.0111X + 0.3137X +0.00000078X  0.0123X  0.0382X Y i 1i 2i 3i 4i 5i

The average value of the dependent variable RET is given by Y=

18.0502 29

0.6224

The unadjusted total sum of squares (TSSU) is given by

(3.297)

Multiple Regression Models

TSSU =

n

2 i

¦Y

139

(3.298)

32.3259

i=1

The unadjusted regression sum of squares (RSSU) is given by RSSU =

n

¦ Yˆ

2 i

(3.299)

29.9685

i=1

The residual sum of squares (ESS) is given by ESSU =

n

¦e

2 i

(3.300)

TSSU  RSSU = 2.3574

i=1

The adjusted total sum of squares (TSSA) is given by TSSA =

n

2 i

¦Y

 nY 2

32.3259  (29 u 0.62242 )

21.0912

(3.301)

i=1

The adjusted regression sum of squares (RSSA) is given by RSSA =

n

¦ Yˆ

2 i

 nY 2

29.9685  (29 u 0.6224 2 ) 18.7338

(3.302)

i=1

The null hypothesis for the unadjusted model to be tested is H 0 : ȕ 0 = ȕ1 = ȕ 2 = ȕ 3 = ȕ 4 = ȕ5 = 0

against an alternative hypothesis H1 : At least one of them is not zero.

Under the null hypothesis, the test statistic is given by F=

=

RSSU/6 ~F(6, 23) ESSU/23

29.9685/6 ~F(6, 23) 2.3574/23

= 48.7314

(3.303)

Let the level of significance be 5%. Decision: At a 5% level of significance with 6 and 23 degrees of freedom, the table value of the F-test statistic is 2.53. Since the calculated value of the F-test statistic is greater than the table value, the null hypothesis will be rejected. Thus, it can be said that all the parameters are not zero. Some of them have significant impacts on Y.

For the adjusted model the null hypothesis to be tested is H 0 : ȕ1 = ȕ 2 = ȕ3 = ȕ 4 = ȕ5 = 0

against an alternative hypothesis H1 : At least one of them is not zero.

Under the null hypothesis the test statistic is given by F=

RSSA/5 ~F(5, 23) ESSA/23

=

18.7338/5 ~F(5, 23) 2.3574/23

= 36.55524

(3.304)

Chapter Three

140

Decision: At a 5% level of significance with 5 and 23 degrees of freedom, the table value of the F-test statistic is 2.64. Since the calculated value of the test statistic is greater than the table value, the null hypothesis will be rejected. Thus, it can be said that some of the independent variables have significant impacts on stock returns Y.

Now, all the results for unadjusted and adjusted models are shown in the ANOVA Table 3-11. Table 3-11: The ANOVA Table

Source of Variations

df

ANOVA table for the unadjusted model Sum of Squares Mean Square F-test

Regression

6

29.9685

Residual Total

23 29

2.3574 32.3259

Regression Error Total

5 23 28

4.9948

F = 48.7314 0.1025 1.1147 ANOVA Table for the adjusted model 18.7338 3.7468 2.3574 0.1025 F = 36.5552 21.0912 0.7533

Decision The null hypothesis will be rejected.

The null hypothesis will be rejected.

(ii) The null hypothesis to be tested is H 0 : ȕ j = 0, (j = 1, 2,...,5)

against the alternative hypothesis H 0 : ȕ j z 0, (j = 1, 2,...,5)

The residual mean square is S2 =

ESS 2.357399 = 23 23

0.1025

The variance-covariance matrix of ȕˆ is given by ˆ = (XcX)-1s 2 var(ȕ)

ª0.2625 «-7.0226e-004 « «-0.0306 « «-1.1067e-006 «-0.0389 « ¬«-1.8201e-003

-7.0226e-004 6.0125e-005 -1.7081e-005 9.6641e-009 7.9470e-005 3.3641e-006

-0.0306 -1.1067e-006 -1.7081e-005 9.6641e-009 0.0109 -1.8475e-007 -1.8475e-007 1.0157e-010 1.9947e-003 4.3720e-008 1.8103e-004 5.2715e-009

-0.0389 -1.8201e-003 º 7.9470e-005 3.3641e-006 »» 1.9947e-003 1.8103e-004 » » 4.3720e-008 5.2715e-009 » 0.0289 8.4399e-005 » » 8.4399e-005 1.5361e-005¼»

Under the null hypothesis, for the parameter ȕ1 , the test statistic is given by t=

0.0111 6.0125e-005

~t (23) d.f.

1.4175

(3.305)

Table Value: At a 5% level of significance with 23 degrees of freedom, the table value of the test statistic is ± 1.7139. Decision: Since the calculated value of the test statistic falls in the acceptance region, the null hypothesis will be accepted. Thus, it can be said that the variable earning per share has no significant impact on returns.

Under the null hypothesis for the parameter ȕ 2 , the test statistic is given by

Multiple Regression Models

t=

0.31375 0.0109

141

~t (23)d.f.

= 3.0027

(3.306)

Decision: Since the calculated value of the test statistic does not fall in the acceptance region, the null hypothesis will be rejected. Thus, it can be said that the variable market-to-book ratio has a significant positive impact on returns.

Under the null hypothesis for the parameter ȕ 3 , the test statistic is given by t=

0.00000078 1.0157e-010

~t (23)d.f.

= 0.0774

(3.307)

Decision: Since the calculated value of the test statistic is smaller than the table value, the null hypothesis will be accepted. Thus it can be said that the variable sales revenue has no significant impact on returns at all.

Under the null hypothesis for the parameter ȕ 4 , the test statistic is given by t=

-0.01232734 0.0289

~t (23)d.f.

= -0.07247

(3.308)

Decision: Since the calculated value of the test statistic is smaller than the table value, the null hypothesis will be accepted. Thus, it can be said that the variable beta risk has no significant impact on returns.

Under the null hypothesis for the parameter ȕ 5 , the test statistic is given by: t=

-0.038158 1.5361e-005

~t (23)d.f.

= -9.73607

(3.309)

Decision: Since the calculated value of the test statistic does not fall in the acceptance region, the null hypothesis will be rejected. Thus, it can be said that the variable leverage ratio has a significant negative impact on returns.

(ii) The null hypothesis to be tested is H 0 : ȕ1 = ȕ 2 =.....= ȕ5 = 0

against the alternative hypothesis H1: At least one of them is not zero.

Under the null hypothesis, the test statistic is given by: F=

(RESS  UESS)/ h ~F(h, n-h-1) UESS/(n-h-1)

(3.310)

where RESS = Restricted residual sum of squares, UESS = Unrestricted residual sum of squares, h = No. of restrictions, and n = Sample size. For the given problem, we have, UESS = 2.3574, RESS = 21.0912, h = 5 and n = 29. Putting the values of all the terms in equation (3.298), we have F=

(21.0912  2.3574)/5 ~F(5, 23) 2.3574/23

= 36.5552 Let the level of significance be Į = 5% .

(3.311)

Chapter Three

142

Decision: At a 5% level of significance with 5 and 23 degrees of freedom, the table value of the F-test statistic is 2.64. Since the calculated value of the test statistic is greater than the table value, the null hypothesis will be rejected, implying that some of the independent variables are associated with returns.

(iii) The null hypothesis to be tested is H 0 : ȕ1 = ȕ3 = ȕ 4 = 0

against the alternative hypothesis

H1: At least one of them is not zero. We have UESS = 2.3574, RESS = 2.5668, h = 3 and n = 29. Putting the values of all the terms in equation (3.310), we have F=

(2.5668  2.3574)/3 ~F(3, 25) 2.3574/25

= 0.7402

(3.312)

Decision: At a 5% level of significance with 3 and 25 degrees of freedom, the table value of the F-test statistic is 5.77. Since the calculated value of the test statistic is smaller than the table value, the null hypothesis will be accepted, implying that the independent variables namely the earning per share, sales revenue and the beta risk are not associated with returns at all. Wald Test Procedure for Testing a General Linear Hypothesis in Case of Multiple Linear Regression Equation

In matrix notation, the multiple regression equation can be written as Y = Xȕ+İ

(3.313)

All the terms of equation (3.313) have already been explained previously. The null hypothesis to be tested is H 0 : Rȕ = q

against the alternative hypothesis H1: Rȕ z q

where R is known as a {m u (k+1)} with full row rank m (m < k+1) , q is known as a (m u 1) vector of constant, and ȕ is known as a {(k+1) u 1} vector of parameters. The hypothesis imposes m linearly independent restrictions on the parameters6. Let ȕˆ be the maximum likelihood estimator of ȕ which is normally distributed with mean ȕ and varianceˆ covariance matrix (XcX)-1ı 2 , i.e., ȕ~N[ȕ, (X cX) -1ı 2 ]. The expected value of (Rȕˆ  q) is given by ˆ  q = Rȕ  q E(Rȕˆ  q) = E(Rȕ)

(3.314)

Under the null hypothesis we have E(Rȕˆ  q) = 0

(3.315)

The variance-covariance matrix of (Rȕˆ  q) is given by

6

We know that, if Y follows p-variate normal distribution with mean 0 and variance V, i.e., if Y~N p (0, V) , then YcV -1Y is

distributed as chi-square with p degrees of freedom.

Multiple Regression Models

143

ˆ c = R(X cX)-1ı 2 R c var(Rȕˆ  q) = Rvar(ȕ)R

(3.316)

ˆ we have Since (Rȕˆ  q) is a linear combination of ȕ, (Rȕˆ  q) ~ N ª¬ Rȕ  q , R(XcX)-1R cı 2 º¼

(3.317)

Under the null hypothesis, equation (3.317) can be written as (Rȕˆ  q) ~ N ª¬ 0 , R(X cX)-1R cı 2 º¼

(3.318) -1

Thus, under the null hypothesis, the quadratic form, (Rȕˆ  q )c ª¬ var(Rȕˆ  q ) º¼ (Rȕˆ  q ) is distributed as a chi-square with m degrees of freedom, i.e., 1 (Rȕˆ  q)c ª¬ R(X cX)-1R c)ı 2 º¼ (Rȕˆ  q) ~ Ȥ 2m

(3.319)

where m is the number of linear restrictions, for example if ­ȕ1 = ȕ 2 =.........= ȕ k =1 H0 : ® ¯ȕ1  ȕ 2 0

In this case, m is 2. Here, ı 2 is unknown to us, so we have to replace it with an unbiased and consistent estimator which is given by s2 =

1 n 2 ¦ ei n-k-1 i=1

(3.320)

Thus, under the null hypothesis, the Wald test statistic is given by WT

1 (Rȕˆ  q)c ª¬s 2 R(X cX)-1R c) º¼ (Rȕˆ  q) ~ Ȥ 2m

Let the level of significance be

(3.321)

D.

Decision: At Į level of significance with m degrees of freedom, we find the table value of the test statistic WT. If the calculated value of the test statistic WT is greater than the table value, we reject the null hypothesis, otherwise we accept it. F-Test Procedure for Testing a General Linear Hypothesis in Case of Multiple Linear Regression Equation

Under H 0 , we have Rȕˆ  q = Rȕˆ  Rȕ = R ª¬ȕˆ  ȕ º¼

(3.322)

We have ȕˆ  ȕ

(X cX)-1Xcİ

Thus, equation (3.322) can be expressed as Rȕˆ  q = R(XcX)-1Xcİ

Replacing (Rȕˆ  q) by R(XcX)-1X cİ, the WT can be written as

(3.323)

Chapter Three

144 -1

ª R(X cX)-1X cİ º¼c ª¬ R(XcX)-1R c) º¼ ª¬ R(X cX)-1X cİ º¼ 2 WT = ¬ ~Ȥ m ı2 -1

=

=

İ cX(XcX)-1R c ª¬ R(X cX)-1R c) º¼ R(X cX)-1X cİ ı2

~Ȥ 2m

İ cVİ 2 ~Ȥ m ı2

(3.324) 1

where V = X(X cX)-1R c ª¬ R(X cX)-1R c) º¼ R(X cX)-1X c We know that ESS İ cMİ 2 = ~Ȥ (n-k-1) [ where M = I n -X(XcX)-1X c ] ı2 ı2

(3.325)

Now, 1

MV = MX(X cX)-1R c ª¬ R(X cX)-1R c) º¼ R(XcX)-1X c = 0, [ MX = 0 ]

Since MV = 0, the two quadratic forms İ cVİ and İ cMİ are independent. Thus,

(3.326) İ cVİ İ cMİ and are independently 2 ı ı2

distributed as chi-square with degrees of freedom m and (n-k-1) respectively. From the definition of the F test, we can see that F=

Ȥ 2m /m ~F(m, n-k-1) 2 Ȥ (n-k-1) /n-k-1

ª(Rȕˆ  q)c ª R(XcX)-1ı 2 R c) º 1 (Rȕˆ  q) º / m ¬ ¼ « »¼ = ¬ ~F(m, n-k-1) 2 2 ª¬(n-k-1)s /ı º¼ /n-k-1

=

1 (Rȕˆ  q)c ª¬ R(XcX)-1R c) º¼ (Rȕˆ  q)

where s 2 =

ms 2

~F(m, n-k-1)

(3.327)

Residual SS n-k-1

Let R12 be the residual sum of squares for the full model which is called the unrestricted residual sum of squares and is given by R12 = ece

(3.328)

Again, let us define, R 02 be the residual sum of squares in the reduced model under H 0 and is given by R o2 = ec0 e0  c (Y  Xȕ)  = (Y  Xȕ)

(3.329)

where ȕ is the OLS estimator of ȕ in the reduced model Under the null hypothesis H 0 : Rȕ = q, it can be shown that 1 (Rȕˆ  q)c ª¬ R(XcX)-1R c) º¼ (Rȕˆ  q)

R 02  R12

(3.330)

Multiple Regression Models

145

Also, under the null hypothesis it can be shown that (R 02  R12 ) and R12 are independently distributed as chi-square. Thus the F-test statistic is given by F=

(R 02  R12 ) / m ~F(m, n-k-1) R12 / n-k-1

(Restricted Residual SS  Unrestricted Residual SS)/m ~ F(m, n-k-1) Unrestricted Residual SS/n-k-1

(3.331)

The Lagrange Multiplier (LM) Test for Testing a General Linear Hypothesis in the case of Multiple Linear Regression Equation

In matrix form, the multiple linear regression equation can be written as (3.332)

Y = Xȕ+İ

The null hypothesis to be tested is H 0 : Rȕ = q, [ where there are r linear restrictions]

against the alternative hypothesis H1: Rȕ z q

The LM test is based on the statistical behaviour of the Lagrange multipliers in the Lagrangian form of the constrained (by null hypothesis) least-squares problem defining the restricted LS estimate of the type (3.333)

b r = min[(Y  Xȕ)c(Y  Xȕ), subject to Rȕ = q ȕ

The Lagrange expression for the constrained least squares minimisation problem is (3.334)

L(ȕ, Ȝ) = (Y  Xȕ)c(Y  Xȕ)  O c( Rȕ  q)

where O is a (r×1) vector of Lagrange multipliers corresponding to the r constraints Rȕ = q. The first-order conditions for minimisation are į L(ȕ, Ȝ) = 0 Ÿ 2XcY+ 2XcXȕ  R cO = 0 įȕ|ȕ=b r

(3.335)

į L(ȕ, Ȝ) = 0 Ÿ q  Rȕ = 0 įȜ

(3.336)

From equation (3.335), we have 1 b r = (X cX)-1X cY + ((X cX)-1R cȜ 2

(3.337)

From equation (3.336), we have ȕ = R -1q

(3.338)

From equation (3.335), we have R cO = 2X cXȕ  2X cY

XcX

-1

R cO

-1

ˆ 2[q  Rȕ]

-1

R X cX R cO

-1

ˆ O = 2 ª R X cX R cº [q  Rȕ] -1

¬

-1

2 X cX XcXȕ  2(X cX)-1X cY, [ Multiplying both sides by X cX ]

¼

(3.339)

Chapter Three

146

Putting the value of br

O in equation (3.337), we have

-1 ȕˆ  (X cX)-1R c ª¬ R(X cX)-1R cº¼ ¬ªq  Rȕˆ ¼º

(3.340)

where b r is the restricted least squares estimate of ȕ and ȕˆ is the unrestricted least squares estimate of ȕ. -1

-1 ˆ is a random vector representing all of the We know the Lagrange multiplier O = 2 ª R X cX R cº [q  Rȕ] ¬ ¼ hypothetically possible outcomes of the optimal Lagrange multiplier relating to all of the hypothetically possible ȕˆ

outcomes in a classical predate context. Under the null hypothesis, we have ˆ ~ N(0, ı 2 R(X cX)-1R c) (q  Rȕ)

(3.341)

Then, from the properties of linear combinations of the normally distributed random variables,

O

1 ~N ª0, V 2 R(X cX)-1R c º « »¼ ¬ 2

(3.342)

Thus, we have

O c ª¬ R(XcX)-1R cº¼ O 4ı 2

~Ȥ 2r

(3.343)

Let s 2r be the restricted estimator of ı 2 as given by s 2r =

(Y  Xb r )c(Y  Xb r ) n  k+r

(3.344)

Thus, from equation (3.343), we have

O c ª¬ R(XcX)-1R cº¼ O 4s 2r

~Ȥ 2r

(3.345)

It is known as the Lagrange multiplier test statistic. Ex. 3-8: Using the data given in Ex. 3-5, test whether linear combinations of coefficients are equal to some specified ­ȕ1  ȕ 2 +ȕ3  ȕ 4 = 1 ° values, i.e., H 0 : ®ȕ1 =  ȕ 2 °ȕ = ȕ ¯ 3 4

using the Wald test, F-test and LM test considering the regression equation of the type: ln(CO2 t ) = ȕ 0 +ȕ1ln(ENER t )+ȕ 2 ln(OPN t )+ȕ 3 ln(UR t )+ȕ 4 ln(PGDPt )+İ t

Solution: Given that ln(CO2 t ) = ȕ 0 +ȕ1ln(ENER t )+ȕ 2 ln(OPN t )+ȕ 3 ln(UR t )+ȕ 4 ln(PGDPt )+İ t

(3.346)

We assume that the random error term İ t satisfies all the usual assumptions of a CLRM. In matrix form, the equation (3.346) can be written as Y = Xȕ +İ

The OLS estimator of ȕ is

(3.347)

Multiple Regression Models

ª-4.1855 º «1.3425 » « » ȕˆ = «-0.0620 » « » «0.1428 » «¬-0.1031 »¼

147

(3.348)

Thus, we have, Wald Test Statistic

The null hypothesis to be tested is ­ȕ1  ȕ 2 +ȕ 3  ȕ 4 = 1 ° H 0 : ®ȕ1 =  ȕ 2 °ȕ = ȕ ¯ 3 4 ªȕ1 º ª1 -1 1 -1º « » ª1 º ȕ2 H 0 : ««1 1 0 0 »» « » = ««0 »» «ȕ3 » «¬0 0 1 -1»¼ « » «¬0 »¼ ¬«ȕ 4 ¼» H 0 : Rȕ = q

against the alternative hypothesis H1: Rȕ z q ª1 -1 1 -1º where R = ««1 1 0 0 »» ; ȕ = «¬0 0 1 -1»¼

ªȕ1 º ª1 º «ȕ » « 2 » ; and q = « 0» « » «ȕ 3 » «¬0 »¼ « » ¬«ȕ 4 ¼»

Under the null hypothesis the Wald test statistic is given by WT

1 (Rȕˆ  q)c ª¬s 2 R(X cX)-1R c) º¼ (Rȕˆ  q) ~ F m2

(3.349)

We have ª1 -1 1 -1º ˆ Rȕ  q = ««1 1 0 0 »» «¬ 0 0 1 -1»¼

ª1.3425 º « -0.0620 » ª1 º « »  «0 » « 0.1428 » « » « » «¬0 »¼ ¬« -0.1031 ¼»

ª 0.6504 º = ««1.2805 »» ¬« 0.2459 ¼»

(3.350)

For the given problem, we have ª 4.2946e-003 « 6.6709e-005 (XcX)-1s 2 = « « 6.2130e-003 « ¬« -5.7565e-004

where s 2 = 3.5565e-004

6.6709e-005 6.2130e-003 -5.7565e-004º 8.4580e-004 1.6433e-003 -1.0449e-003»» » 1.6433e-003 0.0889 -0.0141 » -1.0449e-003 -0.0141 3.1330e-003 ¼»

(3.351)

Chapter Three

148

Thus, we have

2

-1

ª¬s R(X cX) R c) º¼

1

ª369.1341 -254.8718 -361.6340 º « -254.8718 396.8828 232.2814 » « » «¬ -361.6340 232.2814 363.9758 »¼

(3.352)

Therefore, under the null hypothesis, the WT statistic is given by ª 0.6504 ºc ª369.1341 -254.8718 -361.6340 º ª0.6504 º WT = ««1.2805 »» ««-254.8718 396.8828 232.2814 »» ««1.2805 »» ~ Ȥ 32 d.f. ¬« 0.2459 ¼» ¬«-361.6340 232.2814 363.9758 ¼» ¬« 0.2459 ¼»

= 434.9916

(3.353)

Let the level of significance be Į = 5% . Decision: At a 5% level of significance with 3 degrees of freedom, the table value of the Chi-square test statistic is 7.82. Since the calculated value of the test statistic is greater than the table value, the null hypothesis will be rejected. Using the F-Test:

Under the null hypothesis the F-test statistic is given by: F=

(R 02  R12 ) / m ~F(m, n  k  1) R12 / n  k  1

(3.354)

We have 1 (Rȕˆ  q)c ª¬ R(X cX)-1R c) º¼ (Rȕˆ  q)

(R 02  R12 )

0.1547

(3.355)

The unrestricted residual sum of squares is given by: UESS =

n

¦e

2 i

(3.356)

0.01565

i=1

Putting these values in equation (3.354), we have F=

0.1547/3 ~F(3, 44) 0.01565/44

= 144.9972

(3.357)

Let the level of significance be Į = 5% . Decision: At a 5% level of significance with 3 and 44 degrees of freedom the table value of the F-test statistic is 2.82. Since the calculated value of the test statistic is greater than the table value, the null hypothesis will be rejected. Using the LM Test

Under the null hypothesis, the LM test statistic is given by LM =

O c ª¬ R(XcX)-1R cº¼ O 4s 2r

~Ȥ 2r

(3.358)

For the given problem we have ª 0.1246 º

-1 -1 ˆ « -0.2842 » O = 2 ª R XcX R cº [q  Rȕ] « » ¬ ¼

«¬ -0.1079 »¼

We know that:

(3.359)

Multiple Regression Models

R 02  R12

149

0.1547

R 02 = 0.1547+R 12 = 0.1547+0.01565 = 0.17035

(3.360)

Thus, we have s 2r

0.17035 47

0.00362

(3.361)

We also have

-1

R(X cX) R c

ª375.2037 36.3438 349.5958º «36.3438 14.8287 26.6466 » « » «¬349.5958 26.6466 338.0663»¼

(3.362)

Putting these values in equation (3.358), we have

LM

ª0.1246 ºc ª375.2037 «-0.2842 » «36.3438 « » « ¬«-0.1079 ¼» ¬«349.5958

36.3438

349.5958 º ª0.1246 º 14.8287 26.6466 »» «« -0.2842»» 26.6466 338.0663¼» ¬«-0.1079 ¼»

4 u 0.00362

~ F 32

= 42.6841

(3.363)

Let the level of significance be Į = 5%. Decision: At a 5% level of significance with 3 degrees of freedom, the table value of the test statistic is 7.82. Since the calculated value of the test statistic is greater than the table value, the null hypothesis will be rejected. Note: We are lucky that different software packages like RATS, EViews, STATA, R and Python etc., can be applied directly for testing the joint null hypotheses and the linear forms of hypotheses. Test of Significance of Equality of Parameters of Two Multiple Linear Regression Equations

Let us consider a multiple linear regression equation of the type Yi = ȕ 0 +ȕ1X1i +ȕ 2 X 2i +......+ȕ k X ki +İ i

(3.364)

In matrix form, equation (3.364) can be written as Y = Xȕ +İ

(3.365)

For this model, the residual sum of squares ( ESS1 ) is given by ESS1 = ycy  ȕˆ cXcy

(3.366)

where TSS1 = ycy, RSS1 = ȕˆ cX cy, and ȕˆ = (XcX)-1X cy. Let us consider another multiple linear regression equation of the type Zi = Į 0 +Į1 W1i +Į 2 W2i +.......+Į k Wki +ȟ i

(3.367)

In matrix form, the equation (3.367) can be written as Z = WĮ + ȟ

(3.368)

The residual sum of squares ( ESS2 ) for model (3.368) is given by ESS2 = ZcZ  Įˆ cW cZ

where TSS2 = ZcZ, RSS2 = Įˆ cW cZ, and Įˆ = (W cW)-1 W cZ.

(3.369)

Chapter Three

150

Thus, the unrestricted residual sum of squares (UESS) is given by UESS = ESS1 +ESS2

(3.370)

With degrees of freedom (n-k-1+m-k-1) = n+m-2(k+1) The null hypothesis to be tested is ­Į 0 = ȕ 0 ° °°Į1 = ȕ1 H 0 : ®. °. ° °¯Į k = ȕ k

against the alternative hypothesis H1: At least one of them is not equal.

Under the null hypothesis the combined model/restricted model is given by (3.371)

Pi = Ȝ 0 +Ȝ1M1i +Ȝ 2 M 2i +........+Ȝ k M ki +u i

In matrix notation, equation (3.371) can be written as (3.372)

P = MO + u

The residual sum of squares (RESS) for the restricted regression equation (3.72), is given by RESS = PcP  Ȝˆ cM cP

(3.373)

where the total sum of squares for the restricted model is RTSS = PcP , the regression/explained sum of squares (RRSS) for the restricted model is RRSS = Ȝˆ cM cP, and Oˆ = (M cM)-1M cP. Under the null hypothesis, the test statistic is given by F=

(RESS  UESS)/k+1 ~F{(k+1, n+m-2(k+1)} UESS/{n+m-2(k+1)}

(3.374)

Let the level of significance be Į. Decision: At D level of significance with {k+1, n+m-2(k+1)} degrees of freedom, we find out the table value of the F-test statistic. If the calculated value of the F-test is greater than the table value, we reject the null hypothesis implying that all the parameters of the given two models are not equal. Ex. 3-9: The data given below are the net profit (NETP), investment (INVEST), total revenue (REV), and total assets (ASSET) of BRAC Bank and Dutch Bangla Bank Ltd. over a period of time. All the variables are measured in billion BDT. Table 3-12: Net profit (NETP), investment (INVEST), revenue (REV) and total assets (ASSET) of BRAC and Dutch Bangla Banks Ltd.

Year 2005 2006 2007 2008 2009 2010 2011 2012 2013 2014

NETP 0.19 0.33 0.62 0.97 1.30 1.66 1.70 0.54 1.40 2.09

BRAC BANK INVEST REV 2.02 2.16 3.77 3.71 5.00 6.11 8.25 10.90 10.38 13.34 12.86 15.32 14.20 18.30 25.37 21.09 21.30 23.94 22.81 23.90

ASSET 16.88 30.01 46.38 72.44 95.13 119.15 133.20 173.68 179.71 204.59

NETP 0.37 0.36 0.48 0.82 1.14 2.00 2.15 2.31 2.00 2.21

DUTCH BANGLA BANK INVEST REV 3.44 3.43 5.88 5.18 5.91 6.37 5.39 4.00 8.91 9.69 10.61 11.00 10.90 14.11 13.43 18.21 17.44 20.05 19.26 20.74

ASSET 32.28 45.49 49.37 60.68 81.79 101.18 123.27 155.92 185.54 215.99

Multiple Regression Models

151

Test the null hypothesis that the coefficients of two multiple regression equations of net profit on investment, revenue and total assets for two banks will be equal. Solution: Let the regression equation of net profit (Y) on investment ( X1 ), total revenue ( X 2 ) and total assets ( X3 ) for BRAC bank be given by Yt = ȕ 0 +ȕ1X1t +ȕ 2 X 2t +ȕ3 X 3t +İ t

(3.375)

Assume that İ t satisfies all the usual assumptions of a CLRM. In matrix form, equation (3.375) can be written as Y = Xȕ +İ

(3.376)

For the given data, the OLS estimator of ȕ is ª0.1848 º «-0.2535 » » ȕˆ = « «-0.0240 » « » «¬0.0416 »¼

(3.377)

The total sum of square ( TSS1 ) is given by TSS1 = Y cY =15.4663

(3.378)

The regression/explained sum of squares ( RSS1 ) is given by RSS1 = ȕˆ cX cY = 15.3074

(3.379)

Thus, the residual sum of squares ( ESS1 ) for model 1 is given by ESS1 = 15.4663  15.3074 = 0.1588

(3.380)

Again the regression equation of net profit (Z) on investment (W1), total revenue (W2) and total assets (W3) for DUTCH Bangla Bank is given by Zi = Į 0 +Į1 W1i +Į 2 W2i +Į 3 W3i +u i

(3.381)

Assume that ui satisfies all the usual assumptions of a CLRM. In matrix form, the equation (3.381) can be written as Z = WĮ + u

(3.382)

For the given data, the OLS estimator of D is ª0.23212 º «-0.05369 » » Įˆ = « «0.10931 » « » «¬0.00458 »¼

(3.383)

The total sum of squares ( TSS2 ) for model (3.382) is given by TSS2 = ZcZ = 25.3463

(3.384)

The regression/explained sum of squares ( RSS2 ) for model (3.382) is given by RSS2 = Įˆ cW cZ = 24.1475

The residual sum of squares ( ESS2 ) for model (3.382) is given by

(3.385)

Chapter Three

152

ESS2 = 25.3463  24.3463 = 1.1988

(3.386)

The unrestricted residual sum of squares (URSS) is given by UESS = ESS1 +ESS2

0.1588  1.1988 1.3576

(3.387)

The null hypothesis to be tested is ­Į 0 ° °Į1 H0 : ® °Į 2 °¯Į3

= ȕ0 = ȕ1 = ȕ2 = ȕ3

against the alternative hypothesis H1: At least one of them is not equal.

Under the null hypothesis, the restricted model is given by: Pi = Ȝ 0 +Ȝ1M1i +Ȝ 2 M 2i +Ȝ 3 M 3i + vi

(3.388)

In matrix form, the equation (3.388) can be written as: (3.389)

P = MO +v

The OLS estimator of O is: ª 0.2916 º «-0.1703» » Oˆ = « «0.0562 » « » ¬«0.0207 ¼»

(3.390)

The total sum of squares (TSS) for the restricted model (3.389) is: (3.391)

TSS = PcP = 40.8126

The regression sum of squares (RRSS) for the restricted model (3.389) is ˆ cP = 38.9847 RRSS = ȜM

(3.392)

The residual sum of squares (RESS) for the restricted model (3.389) is given by RESS = 40.8126-38.9847 = 1.8279

(3.393)

Under the null hypothesis, the test statistic is given by F=

(1.8279  1.3576)/4 ~F(4, 12) 1.3576/12

= 1.0393

(3.394)

Let the level of significance be 5%. Decision: At a 5% level of significance with 4 and 12 degrees of freedom, the table value of the test statistic is 2.48. Since the calculated value is less than the table value, the null hypothesis will be accepted. Thus, it can be concluded that all the parameters of the two models are identical.

3.18 Multivariate Non-linear Relationships In practice, for many business, economic, banking, financial and managerial problems, the assumption of the linear relationship between Y and the explanatory variables X’s may not be appropriate. The theory may suggest that the relationship among variables in which one is the dependent variable and the remaining are independent variables can

Multiple Regression Models

153

be adequately explained only by a non-linear function. For example, in the demand function, supply function, production function, cost function etc. nonlinearities may be expected. In such a situation, the method of least squares will not be applicable directly to estimate the equation. Thus, to apply the OLS method to the equation, the non-linear relationship can be transformed into a linear form. Logarithmic transformation is the most commonly used technique to transform the non-linear relationship into a linear form. Some of the examples of transformations are discussed below: Transformation of Polynomial Functions

Let the non-linear relationship between Y and X’s be presented by a polynomial function of the type Yi = ȕ 0 +ȕ1X i +ȕ 2 X i2 +......+ȕ k X ik +İ i

(3.395)

To estimate this non-linear relationship, we have to transform X i2 , X 3i ,......, and X ik to new variables and then we regress Y on X and new transformed variables with a constant term. Let us define, Zi = X i2 , Wi =X 3i , etc. So, the transformed equation will be (3.396)

Yi = ȕ 0 +ȕ1X i +ȕ 2 Zi +ȕ 3 Wi +.......+İ i

The transformed equation (3.396) is linear and we can apply the ordinary least squares method to estimate this equation. Ex. 3-10: We know a marginal cost function is a quadratic form, thus the functional relationship between the marginal cost and the level of output is given by MCi

ȕ 0 +ȕ1q i +ȕ 2 q i2 +İ i

(3.397)

where MC is the marginal cost of a commodity, q is the level of output of a commodity, İ is the random error term. This is a second-degree polynomial function. To estimate this non-linear equation, we have to transform this seconddegree polynomial function into a linear form. Let us define X1i = q i , and X 2i = q i2 . Thus, the transformed equation would be (3.398)

MCi = ȕ 0 +ȕ1X1i +ȕ 2 X 2i +İ i

The transformed equation (3.398) is a three-variable linear equation and we can apply the OLS method to estimate the equation. Ex. 3-11: The yearly outputs (in units) and the total costs (in USD) of a firm over a period of time are given in Table 3-13. Table 3-13: Yearly outputs and costs of a firm

Year 1995 1996 1997 1998 1999 2000 2001 2002 2003 2004 2005 2006

Costs 9500 12500 19800 31500 45600 32600 50300 62500 85200 76500 95600 100200

Output 88 100 225 380 460 385 520 690 885 820 935 975

Year 2007 2008 2009 2010 2011 2012 2013 2014 2015 2016 2017 2018 Source: DSE

Estimate the total cost function of the type

Costs 120000 135000 125000 152000 162000 142000 195000 198000 202500 204525 210500 225000

Output 1050 1150 1100 1280 1335 1185 1350 1365 1385 1398 1425 1435

154

Chapter Three

C t = ȕ 0 +ȕ1q t +ȕ 2 q 2t +ȕ3 q 3t + İ t Solution: The given cost function is C t = ȕ 0 +ȕ1q t +ȕ 2 q 2t +ȕ3 q 3t + İ t

(3.399)

To estimate the given non-linear equation, we compute q i2 , and q 3i , and then we can apply the OLS method to the equation (3.382). The equation is estimated using RATS and the results are given below. Cˆ t =  2239.1636 +131.8542q t  0.117975q 2t + 0.000092q 3t ; R 2 t-Test: -0.30404 3.21777  1.95816 SE: 7364.5949 40.97692 0.060248

3.63653 0.000025

0.9916 ½ ° ¾ ° ¿

(3.400)

Thus, from the estimated results it can be said that the total cost function will be non-linear. Table 3-14: The estimated results of the cost equation

Estimation by Least Squares, Dependent Variable C: Annual Data from 1995:01-2018:01 Variable Coefficient Std. Error T-Stat Signif Constant -2239.1636 7364.5949 -0.30404 0.7642 q 131.8542 40.97692 3.21777 0.0043 q2 -0.11797 0.060248 -1.95816 0.0643 q3 0.000092 0.000025 3.63653 0.0016 Usable Observations 24 Standard Error of Estim. 6924.39633 Centered R2 0.9916 Sum of Squared Residuals 958945290.77 Adjusted R2 0.9903 Regression F(3,20) 785.9606 Uncentered R2 0.9978 Significance Level of F 0.0000 TR2 23.945 Log Likelihood -244.09401 Mean of Dependent Variable 112221.875 Durbin-Watson Statistic 1.2998 Std Error of Depen. Variable 70406.5629 Double-log Transformation

In a double-log model, both the dependent and independent variables appear in the logarithmic form. Let us consider the following non-linear relationship between Y and independent variables X’s Yi = A 0 X1iȕ1 Xȕ2i2 ......Xȕkik eİi

(3.401)

where Yi is the ith observation of the dependent variable Y, X ji (j=1, 2,…,k) is the ith observation of the jth independent variable, the parameter ȕ j (j =1, 2,…,k) is called the elasticity of the variable Y with respect to Xj (j = 1, 2,……,k) which means that, for 1% change in the variable Xj the variable Y will change by ȕ j % given that all other independent variables are constant. To estimate equation (3.401) or the parameters of equation (3.401), we have to transform this non-linear equation into a linear form using the logarithmic technique. Taking logarithms in both sides of equation (3.401), we have ln(Yi ) = ln(A 0 )+ȕ1ln(X1i )+ȕ 2 ln(X 2i )+......+ȕ k ln(X ki )+İ i

(3.402)

Let Zi = ln(Yi ), ȕ 0 = ln(A 0 ), W1i =ln(X1i ), W2i =ln(X 2i ), …….., and Wki =ln(X ki ). Thus, we have Zi = ȕ 0 +ȕ1 W1i +ȕ 2 W2i +......+ȕ k Wki +İ i

(3.403)

This is a multiple linear regression equation and we can apply the ordinary least squares method to this transformed equation (3.403) to estimate the parameters or to estimate the equation. Ex. 3-12: Let us consider the Cobb-Douglas production of the type Pt = A 0 LEt1 K ȕt 2 eİ t

(3.404)

Multiple Regression Models

155

where P = level of the output variable, L= quantity of input variable labour, K= quantity of input variable capital, A 0 is constant, ȕ1 , and ȕ 2 are two parameters, ȕ1 is called the output elasticity with respect to input labour given that K is fixed, and ȕ 2 is called the output elasticity with respect to input capital given that L is fixed. If ȕ1  ȕ 2 1, the production function is said to be a constant return to scale; if ȕ1  ȕ 2 ! 1, the production function is said to be an increasing return to scale; and if ȕ1  ȕ 2  1, the production function is said to be a decreasing return to scale. To estimate ȕ1 and ȕ 2 , we have to transform the non-linear equation into a linear form using the logarithmic technique. Taking logarithms on both sides of equation (3.404), we have (3.405)

ln(Pt ) = ln(A 0 )+ȕ1ln(L t )+ȕ 2 ln(K t ) +İ t

Model (3.405) is known as the double-log transformation. Let us define, Yt = ln(Pt ), ȕ 0 = ln(A 0 ), X1t = ln(L t ), and X 2t = ln(K t ) . So, the equation (3.405) can be written as (3.406)

Yt = ȕ 0 +ȕ1X1t +ȕ 2 X 2t +İ t

This is a three-variable linear regression equation and we can apply the OLS method to estimate ȕ 0 , ȕ1 , and ȕ 2 . Ex. 3-13: Given below are the data of export values (EX) in million USD of the footwear industry of Bangladesh, world GDP (WGDP, in billion USD, constant 2015 USD), and world relative price index in percentage over a period of time. Table 3-15: Export values (EX) of the footwear industry, world GDP (WGDP) and word relative price index (RPI)

YR 1980 1981 1982 1983 1984 1985 1986 1987 1988 1989 1990 1991 1992 1993 1994 1995 1996 1997 1998

EX 1.20 1.29 1.45 1.70 1.85 1.95 2.11 2.20 2.40 2.62 2.80 3.00 5.10 11.18 20.20 19.10 22.10 22.30 40.10

WGDP

RPI

27906.71 28444.33 28564.81 29254.26 30573.30 31709.56 32785.74 34000.05 35569.85 36877.30 37951.34 38489.94 39170.13 39771.07 40968.95 42209.90 43637.26 45253.76 46410.54

2.34 2.06 2.42 2.21 2.50 2.73 2.88 3.00 3.83 7.24 17.39 20.70 28.45 62.30 119.95 127.53 136.78 137.47 133.20

Year 1999 2000 2001 2002 2003 2004 2005 2006 2007 2008 2009 2010 2011 2012 2013 2014 2015 2016 2017

EX 51.30 52.20 38.20 40.00 39.00 68.00 88.00 95.00 123.00 159.00 183.00 204.00 298.00 336.00 419.00 550.00 673.00 902.20 1234.00

WGDP

RPI

47929.01 50035.31 51008.65 52126.92 53643.34 55994.77 58160.93 60681.27 63240.65 64413.01 63326.81 66036.93 68093.25 69804.15 71655.32 73690.29 75792.98 77737.72 80198.00

131.88 134.51 137.09 136.97 133.83 128.17 123.56 119.51 113.03 109.24 105.19 100.00 93.36 90.70 86.79 83.32 80.15 77.77 75.56

Source: World Bank’s Development Indicators. UNCTAD Statistics, EPB, Bangladesh

Estimate the export demand function for the footwear industry of Bangladesh considering the non-linear relationship. Solution: Let the export demand function for the footwear industry of Bangladesh be EX t =A 0 WGDPtȕ1 RPIȕt 2 eİ t

(3.407)

where the variable EXt is the export values of the footwear industry of Bangladesh at time t, WGDPt indicates the world income at time t, RPIt is the word relative price index at time t, and A 0 is constant. The parameter ȕ1 is the elasticity of EX with respect to WGDP given that RPI is fixed and ȕ 2 is the elasticity of EX with respect to RPI given

Chapter Three

156

that WGDP is fixed. İ t is the random error term corresponding to the tth set of observations that satisfies all the usual assumptions of a CLRM. To estimate the equation (3.407) or to estimate A 0 , ȕ1 , and ȕ 2 we have to transform the non-linear equation (3.407) into a linear form. Taking logarithms on both sides of equation (3.407), we have (3.408)

ln(EX t ) = ln(A 0 ) +ȕ1ln(WGDPt )+ȕ 2 ln(RPI t )+İ t

This model (3.408) is known as the double-log transformation. Let Yt = ln(EX t ), ȕ 0 = ln(A 0 ), X1t = ln(WGDPt ), and X 2t =ln(RPI t ). Therefore, equation (3.408) can be written as (3.409)

Yt = ȕ 0 +ȕ1X1t + ȕ 2 X 2t + İ t

This is a three-variable linear equation. We assume that the random error term İ t satisfies all the usual assumptions of a linear regression equation. We can then apply the OLS method to equation (3.409) to estimate ȕ 0 , ȕ1 , and ȕ 2 respectively. For the given problem, we have, Y = 3.286, X1 = 10.766, and X 2 = 3.606. SP(X1 ,Y)

n

¦ (X

 X1 )(Yt  Y)

1t

n

¦ (X

25.464, SS(X1 )

1t

t=1

SP(X 2 ,Y)

 X 2 )(Yt  Y) =103.38, SS(X 2 )

2t

t=1

SP(X1 , X 2 )

3.903,

t=1

n

¦ (X

 X1 ) 2

n

¦ (X

2t

 X 2 )2

97.212, and

t=1

n

¦ (X

1t

 X1 )(X 2t  X 2 ) = 15.473.

t=1

Putting the values of all the terms in equations (3.22), (3.21) and (3.16), we have 25.464 u 97.212  15.473 u 103.38 ȕˆ 1 = 2 3.903 u 97.212  >15.473@ 103.38 u 3.903  15.473 u 25.464 ȕˆ 2 = 2 3.903 u 97.212  >15.473@

6.255

(3.410)

0.0678

(3.411)

and ȕˆ 0 = 3.286  6.255×10.766  0.0678×3.606 =  64.2977

(3.412)

Therefore, the estimated equation is ˆ =  64.2977 + 6.255X + 0.0678X ; R 2 Y t 1t 2t t-test: -19.5268 SE: 3.2928

19.3815 0.3227

1.0492 0.0647

0.9694½ ° ¾ ° ¿

(3.413)

Comment: From the estimated results, it is found that the export elasticity with respect to world income is 6.255, which means that for increasing 100% world income, the export values of the footwear industry of Bangladesh will increase by 625.5% when the relative price index is constant; and the export elasticity with respect to relative price index is 0.0678 which means that, for increasing 100% relative price index, the export values will increase by 6.78% when the world income is constant. From the estimated value of R 2 , it can be said that 96.94% of the total variation in the dependent variable Y is explained by the fitted regression equation and the remaining 3.06% is explained by the random factors. Thus the fit is very good.

Multiple Regression Models

157

3.19 Testing for Non-Linearity When we are dealing with business, economic, socio-economic, finance, banking and managerial problems the assumption of linearity between Y and X may not be appropriate. In most cases, the relationships between Y and X will be non-linear. Thus, it is very important to test for departures from the linearity assumption. To test for linearity, the null hypothesis is H 0 : E(Yt |X t = x t ) = x ct ȕ

against the alternative hypothesis H1: E(Yt |X t = x t ) z x ct ȕ , that is H1: E(Yt |X t = x t ) = f(x t )

The question may arise of postulating a particular functional form for f(x t ), which is not available unless we are prepared to assume a particular form for D(Yt |X t :ȕ) . Alternatively, we can use the parametrisation related to the Kolmogorov-Gabor and systematic component polynomials. Using third-order Kolmogorov-Gabor polynomials (KG(3)), we can postulate the alternative statistical general model of the type Yt =X ct ȕ 0 + Zc2t Ȗ 2 +Zc3t Ȗ 3 +u t

(3.414)

where Z2t includes the second-order terms x it x jt where i t j; i, j = 1, 2,.....,k, and Z3t includes the third-order terms x it x jt x lt where i t j t l; i, j,l, = 1, 2,.....,k (where x1t is assumed to be constant). Assuming that T is large enough to enable us to estimate (3.397). The null hypothesis to be tested is ­J 2 = 0 H0 : ® ¯J 3 = 0

against the alternative hypothesis ­J 2 z 0 H1: ® ¯J 3 z 0

Equivalently, the null hypothesis can be written as H 0 : RȖ = 0

against the alternative hypothesis H1: RȖ z 0 ªȖ º ª1 0 º where R = « , and Ȗ = « 2 » . » ¬0 1 ¼ ¬ Ȗ3 ¼

Under the null hypothesis, the test statistic is given by F=

(RESS  UESS)/q ~F(q, T-k * ) URSS/T-k *

(3.415)

where UESS= unrestricted residual sum of squares, RESS = restricted residual sum of squares which is based on the null hypothesis. Another form of the F test is given by 1 F = ª« RJˆ c ª¬s 2 R(X cX)-1R cº¼ RȖˆ º» / q~F(q, T-k * ) ¬ ¼

(3.416)

where q is the number of restrictions, (XcX)-1ı 2 is the variance-covariance matrix of the parameters, and s 2 is the OLS estimate of ı 2 .

Chapter Three

158

Comment: At Į level of significance with q and (T-k*) degrees of freedom, we find out the table value of the F-test statistic. If the calculated value of the F-test is greater than the table value, the null hypothesis will be rejected implying the existence of a non-linearity in the functional form between Y and X’s.

An asymptotically equivalent test can be based on the R 2 of the auxiliary regression equation ˆ + J c Z +J c Z +u e t = x ct (ȕ 0  ȕ) 2 2t 3 3t t

(3.417)

Under the null hypothesis, the Lagrange multiplier test statistic LM(y) = TR 2 ~F q2

(3.418)

R2 is the coefficient of determination of equation (3.417), and q is the number of restrictions. Comment: At D level of significance with q degrees of freedom, we find out the table value of the Chi-square test statistic. If the calculated value of the Chi-square test is greater than the table value, the null hypothesis will be rejected indicating the existence of a non-linear functional form between Y and X’s.

For a small sample, the F test is preferable in practice because of the degrees of freedom adjustment. Using the polynomial in M t , we can postulate an alternative general model of the form Yt = x ct ȕ 0 + C 2 M 2t + C3 M 3t +......+C m M mt + v t

(3.419)

where M t = x ct ȕ The null hypothesis to be tested is H 0 : C2 = C3 =......=Cm

0

against the alternative hypothesis H1: At least one of them is not zero.

Again, this can be tested using the F-type test. Under the null hypothesis, the test statistic is given by F=

(RESS  UESS)/m ~F(m, T-k * ) UESS/T-k *

(3.420)

where m is the number of restrictions. We can also use the LM test based on the auxiliary regression ˆ + e t = x ct (ȕ 0  ȕ)

m

¦ C Mˆ +v , where Mˆ = x cȕˆ i

j t

t

t

t

(3.421)

j=2

Under the null hypothesis, the LM test is given by LM = TR e2 ~F m2

(3.422)

Where R 2e is the coefficient of determination from the auxiliary regression equation (3.421) and m is the number of restrictions. If we accept the null hypothesis, it indicates the linear relationship between Y and X’s. Otherwise, the relationship between Y and X’s will be non-linear. Note: Different software packages like EViews, GAUSS, LIMDEP, MATLAB, Python, R, RATS, SAS, SHAZAM, SPLUS, SPSS, STATA, TSP can be applied directly for different analyses multiple regression models.

Multiple Regression Models

159

Exercises 3-1: Define a multiple regression and multiple linear regression equation with an example of each. 3-2: Distinguish between three-variable linear and non-linear regression equations. 3-3: What are the basic assumptions of a multiple linear regression equation? 3-4: Let three-variable linear regression equation be given by: Yi = ȕ 0 +ȕ1X1i +ȕ 2 X 2i +İ i ;

(i) Explain all the terms of the given equation. (ii) Discuss the technique to estimate this equation. (iii) Obtain the variance of the estimators ȕˆ 0 , ȕˆ 1 , and ȕˆ 2 . (iv) Discuss the technique to test the null hypothesis H 0 : ȕ j

0, ( j=1, 2) against an appropriate alternative hypothesis.

(v) Discuss the technique to obtain the 90% and 95% confidence intervals for the population parameter ȕ j (j=1, 2). (vi) Discuss the technique to test the null hypothesis H 0 : ȕ1 = ȕ 2 of a three-variable linear regression equation with the ANOVA table. 3-5: Define the coefficient of determination in case of a three-variable linear regression equation. Discuss the techniques to obtain it. 3-6: Define a multiple linear regression equation with an example. Explain the meaning of all the terms of a multiple linear regression equation. 3-7: Write a multiple linear regression equation in the matrix form, and discuss the OLS method to estimate a multiple linear regression equation. 3-8: Discuss the important properties of the OLS estimators of a multiple linear regression equation. 3-9: Discuss the ML method to estimate a multiple linear regression equation. 3-10: Discuss all the important properties of the MLE of ı 2 . 3-11: Show that the OLS estimator s 2 is an unbiased estimator of ı 2 but the ML estimator of ı 2 is biased. ece is a consistent estimator of ı 2 where e is the residual vector, n is the sample size and k is n-k-1 the number of independent variables in a multiple linear regression equation.

3-12: Show that s 2 =

3-13: Show that Y~N(Xȕ, ı 2 I n ) , where I n is a unit matrix of order n. ˆ 3-14: Show that Y~N(Xȕ, ı 2 P), where P is a symmetric and idempotent matrix.

3-15: Show that RSS = ȕˆ 1SP(X1 ,Y)+ȕˆ 2SP(X 2 ,Y)+......+ȕˆ k SP(X k ,Y) , where RSS is the regression sum of squares and SP(X j ,Y) is the sum of products between Y and X j (j=1, 2,…,k). 3-16: Show that the total sum of squares can be partitioned into components sum of squares. 3-17: Find the quadratic forms of the regression sum of squares (RSS), residual sum squares (ESS) and total sum of squares (TSS). 3-18: Show that RSS and ESS are independent. 3-19: Find the expected value of the components’ sum of squares. 3-20: Show that

ESS is distributed as Chi-square with degrees of freedom (n-k-1). ı2

3-21: Find the sampling distribution of s 2 = ıˆ 2 and also find its mean and variance.

Chapter Three

160

ˆ 3-22: Discuss the important properties of Y. 3-23: Discuss the important properties of residuals. 3-24: Explain the meaning of the coefficient of multiple determination ( R 2 ). Discuss the techniques to obtain it. Discuss the important characteristics of R 2 . 3-25: Discuss different steps that are involved in fitting a multiple linear regression equation with and without distributional assumptions of the random error term. 3-26: Show that ryy2ˆ

R 2 , where ryyˆ is the correlation coefficient between y and yˆ .

3-27: Define analysis of variance. Construct the ANOVA tables for the unadjusted and adjusted multiple linear regression equations. 3-28: Discuss the test of significance of parameter estimates of a multiple linear regression equation. 3-29: Discuss the technique to test the joint null hypothesis H 0 : ȕ1 = ȕ 2 hypothesis where h < k, k is the number of independent variables.

.... ȕ h against an appropriate alternative

3-30: Discuss the F-test, Wald test, and Lagrange multiplier (LM) test procedures for testing a general null hypothesis in case of multiple linear regression equations. 3-31: Discuss the test procedure for testing the equality of parameters of two multiple linear regression equations. 3-32: Define multivariate non-linear relationships with an example of each. 3-33: Define a double log model with an example. Define the Cobb-Douglas production function with an example. Explain the meaning of all the terms of a Cobb-Douglas production function. 3-34: Discuss the technique to estimate the Cobb-Douglas production function of the type q i =A 0 Lȕi1 K ȕi 2 eİi and explain the meaning of all the terms of this function. 3-35: Discuss the testing procedure for non-linearity of a multiple regression equation. 3-36: Let us consider a multiple linear regression equation of type Yt =ȕ 0 +ȕ1X1t +ȕ 2 X 2t +ȕ3 X 3t +ȕ 4 X 4t  İ t . Which of the following hypotheses can be tested using the t-test? Which of them can be tested using an F test, an LM test, and the Wald test? In each case, state the number of restrictions. (i). H 0 : ȕ1 =1.5 , (ii). H 0 : ȕ 2 =1 , (iii). H 0 : ȕ1 = ȕ 2 =ȕ3 =ȕ 4 = 0 , (iv). H 0 : ȕ1 +ȕ 2 =2 ­ȕ1 +ȕ 2 = 2 (v). H 0 : ® ¯ ȕ2 = 1 (vi). H 0 : ȕ 2ȕ3 1 3-37: Which would you expect to be bigger: the unrestricted residual sum of squares (UESS) or the restricted residual sum of squares (RESS) and why?

3-38: Let the variable Y indicate private investment (in million USD), the variable X1 indicate GDP (in million USD) and the variable X 2 indicate interest rate (in USD). Here, given that, 20

¦Y

i

20934.6,

i=1 20

¦X

1i

20

¦X

1i

= 131441.3,

i=1

X 2i = 790570.132,

i=1

20

¦X

2i

= 130.8,

i=1

20

2 1i

¦X

=956461413.85,

i=1

20

¦X

1i

20

¦X

Yi =154118253.87,

i=1

20

¦X i=1

2 2i

=951.2334, and

2i

Yi =125989.52,

i=1

20

2 i

¦Y

=25009757.62.

i=1

Requirements:

(i) Obtain the regression equation of private investment on GDP and interest rate and comment on your obtained results. (ii) Do you think the hypothesis H 0 : ȕ j

0, ( j = 1, 2) is statistically significant at a 5% level of significance? Why?

(iii) Calculate the 95% confidence interval for the parameter ȕ j (j = 1, 2) of the regression equation.

Multiple Regression Models

161

(iv) Calculate total sum of squares (TSS), regression sum of squares (RSS) and residual sum of squares (ESS). (v) Calculate the value of the coefficient of determination, and comment on your obtained results. (vi) Find Adj(R 2 ) and compare it with R 2 . (vii) Calculate the investment elasticity with respect to GDP and interest rate and comment on your obtained results. 3-39: Let the variable Y indicate per capita consumption of chickens (in kg), the variable X1 indicates the real retail price of chicken per kg (in $) and the variable X 2 indicates per capita real disposable income (in $). Let for the sample observations the equation is ˆ = ȕˆ +ȕˆ X +ȕˆ X Ÿ y = ȕˆ x +ȕˆ x +e ; where y Y i 0 1 1i 2 2i i 1 1i 2 2i i i

Yi  Y, , x1i

X1i  X1 , and x 2i

X 2i  X 2 .

Given that, for n = 15 observations,

ΣYi = 533.5, ΣX1i = 610.2, ΣX2i = 9776,

ss(x1) = Σ(X1i − X̄1)² = Σx1i² = 248.36, sp(x1, x2) = Σ(X1i − X̄1)(X2i − X̄2) = Σx1ix2i = 7087.54, ss(x2) = Σ(X2i − X̄2)² = Σx2i² = 590164.37,

sp(x1, y) = Σ(X1i − X̄1)(Yi − Ȳ) = Σx1iyi = 101.3, sp(x2, y) = Σ(X2i − X̄2)(Yi − Ȳ) = Σx2iyi = 12955.38,

Total Sum of Squares (TSS) = 316.953, and Residual Sum of Squares (ESS) = 14.7210.

Requirements:

(i) Obtain the regression equation and comment on your estimated results.
(ii) Obtain the variance-covariance matrix of β̂ = (β̂1, β̂2)′.
(iii) Do you think the hypothesis H0: βj = 0 (j = 1, 2) is statistically significant at a 5% level of significance, and why?
(iv) Find Adj(R²) and compare it with R².
(v) Find the 95% confidence intervals for β1 and β2 respectively and comment on your obtained results.
(vi) Calculate the price and income elasticities of demand.

3-40: Let the variable Y indicate the quantity demanded of a commodity Q (in kg), the variable X1 indicate the per-unit price of that commodity (in $), and the variable X2 indicate per capita disposable income (in $). For the sample observations the fitted equation is Yi = β̂0 + β̂1X1i + β̂2X2i + ei, or yi = β̂1x1i + β̂2x2i + ei, where yi = Yi − Ȳ, x1i = X1i − X̄1, and x2i = X2i − X̄2. Given that, for n = 23 observations,

ΣYi = 912.4, ΣX1i = 1103.9, ΣX2i = 23806.5,

x′x = [ Σx1i²  Σx1ix2i ; Σx1ix2i  Σx2i² ] = [ 2719.04  140788.25 ; 140788.25  8398168.67 ],  and  x′y = ( Σx1iyi ; Σx2iyi ) = ( 1514.67 ; 94923.39 )


Total Sum of Squares (TSS) = 1195.9287; Residual Sum of Squares (ESS) = 106.6517.

Answer the following questions:

(i) Obtain the regression equation and comment on your estimated results.
(ii) Obtain the variance-covariance matrix of β̂ = (β̂1, β̂2)′.
(iii) Do you think the hypothesis H0: βj = 0 (j = 1, 2) is statistically significant at a 5% level of significance? Why?
(iv) Calculate R² and compare it with Adjusted(R²).
(v) Calculate the price and income elasticities of demand.

3-41: Suppose you estimate the following regression equation to evaluate the effects of various firm-specific factors on the returns of a sample of 150 firms:

RETi = β0 + β1PEi + β2MBRi + β3Si + β4BETAi + εi

where RETi is the annual return (in percent) of the ith firm, PEi is the price-earnings ratio of the ith firm, MBRi is the market-to-book ratio of the ith firm, Si is the size of the ith firm measured in terms of sales revenue, and BETAi is the stock's CAPM beta coefficient. The obtained results are given below:

R̂ETi = 0.0785 + 0.1656PEi + 0.5678MBRi + 0.7568Si − 0.1025BETAi;  R² = 0.6578
SE:      (0.0654)   (0.4575)      (0.1455)      (0.1875)     (0.756)

Answer the following questions:

(i) Calculate the t-ratios.
(ii) What do you conclude about the effect of each variable on the returns of firms?
(iii) Based on your obtained results, what variable(s) would you consider deleting from the equation?
(iv) If a firm's beta risk increased from 1 to 1.3, what would be the expected effect on the firm's return?
(v) Is the sign on beta as you would have expected? Explain your answer in each case.

3-42: Assume that the variable Y indicates the level of output in logs (Yi = ln(Pi), where P is the level of output), the variable X1 indicates labour in logs (X1i = ln(Li), where L is the number of labourers), and the variable X2 indicates capital in logs (X2i = ln(Ki), where K is the level of capital). For the sample observations the fitted equation is Yi = β̂0 + β̂1X1i + β̂2X2i + ei, or yi = β̂1x1i + β̂2x2i + ei, where yi = Yi − Ȳ, x1i = X1i − X̄1, and x2i = X2i − X̄2.

Given that, for n = 20 observations,

ΣYi = 244.521, ΣX1i = 186.047, ΣX2i = 253.385,

SS(x1) = Σx1i² = 0.6759, SS(x2) = Σx2i² = 2.675, sp(x1, x2) = Σx1ix2i = 1.32, sp(x1, y) = Σx1iyi = 1.35, sp(x2, y) = Σx2iyi = 2.72,

Total Sum of Squares = 2.7653, and Residual Sum of Squares = 0.0136.

Requirements:

(i) Estimate the non-linear equation Pi = A0 Li^β1 Ki^β2 e^εi and comment on your obtained results.


(ii) Show that β1 and β2 are the output elasticities with respect to labour and capital respectively.
(iii) Test the null hypothesis H0: βj = 0 (j = 1, 2) against a suitable alternative hypothesis.
(iv) Find the 95% confidence interval for the parameter βj (j = 1, 2) of the regression equation.
(v) Calculate TSS, RSS and ESS.
(vi) Calculate the value of the coefficient of determination and comment on your obtained result.
(vii) Find Adjusted(R²) and compare it with R².
(viii) Calculate the output elasticity with respect to the inputs labour and capital.

3-43: Data on the quantity demanded of a commodity (in kg), the price of that commodity (in USD) and the level of income (in USD) are given below:

Quantity demanded (q): 100, 75, 80, 70, 50, 65, 90, 100, 110, 60, 75, 65
Price (p): 5, 7, 6, 6, 8, 7, 5, 4, 3, 9, 6, 7
Income (in): 1000, 600, 1200, 500, 300, 400, 1300, 1100, 1300, 300, 450, 350

Let the demand function be given by qi = A pi^β1 ini^β2 e^εi, where q = per capita consumption of the commodity, p = price per unit of the commodity, and in = per capita disposable income.

Requirements:

(i) Show that β1 is the elasticity of demand with respect to price and β2 is the elasticity of demand with respect to income.
(ii) Discuss the method to estimate the parameters.
(iii) Estimate the non-linear equation qi = A pi^β1 ini^β2 e^εi and comment on your obtained results.
(iv) Test the null hypothesis H0: βj = 0 (j = 1, 2) against a suitable alternative hypothesis.
(v) Find the 95% confidence intervals for the parameters.
(vi) Calculate TSS, RSS and ESS.
(vii) Calculate the value of the coefficient of determination and comment on your obtained results.
(viii) Find Adjusted(R²) and compare it with R².


3-44: Let the functional relationship of private investment with GDP and the interest rate be given by pinvt = A GDPt^β1 irt^β2 e^εt, where the variable pinv indicates private investment, the variable ir indicates the interest rate, and ε is the random error term. Private investment (in billion dollars), GDP (in billion dollars) and the interest rate are given below:

Private investment: 524, 616, 696, 729, 750, 769, 823, 850, 824, 802, 890, 979, 1072, 1135, 1251, 1369, 1524, 1652
GDP: 3316, 3689, 4034, 4319, 4538, 4892, 5259, 5588, 5847, 6081, 6470, 6797, 7218, 7530, 7982, 8479, 8975, 9560
Interest rate: 12, 9, 11, 8, 7, 7, 8, 10, 9, 6, 4, 4, 5, 6, 6, 6, 6, 6

Source: World Bank’s Development Indicators.

Requirements:

(i) Obtain the regression equation of private investment on GDP and the interest rate and comment on your obtained results.
(ii) Test the null hypothesis H0: βj = 0 (j = 1, 2) against a suitable alternative hypothesis.
(iii) Find the 95% confidence interval for the parameter βj (j = 1, 2) and comment on your results.
(iv) Calculate TSS, RSS and ESS.
(v) Calculate the value of the coefficient of determination and comment on your obtained results.
(vi) Find the Adjusted(R²) and compare it with R².
(vii) Calculate the investment elasticity with respect to GDP and the interest rate and comment on your results.

3-45: Let the non-linear multiple regression equation be given by Yi = β0 X1i^β1 X2i^β2 e^εi, where Yi is the ith value of the output variable Y, X1i is the ith value of the input variable X1 (labour), X2i is the ith value of the input variable X2 (capital), and ε is the random error term. Given that ln(β̂0) = −1.652, β̂1 = 0.34, β̂2 = 0.846, R² = 0.995, n = 20, SE(ln(β̂0)) = 0.606, SE(β̂1) = 0.186, and SE(β̂2) = 0.0934.

Answer the following questions:

(i) Interpret the estimated values of β1 and β2. Does the function exhibit increasing or decreasing returns to scale? Why?
(ii) Looking at the results, what can you say about the model?
(iii) Find the 95% confidence intervals for β1 and β2, and comment on your results.
(iv) Test the null hypothesis H0: βj = 0 (j = 1, 2) against a suitable alternative hypothesis.
(v) Test the null hypothesis H0: β1 = β2 against a suitable alternative hypothesis.


(vi) Obtain the extra sum of squares due to the above null hypotheses.

3-46: Test the following joint null hypothesis using the Wald test, F-test and LM test:

H0: β1 − β2 + β3 + β4 − β5 = 1, β1 = β2, and β3 = β4.

3-47: Define a Cobb-Douglas production function and explain all the terms of the Cobb-Douglas production function. Find the output elasticities with respect to labour and capital of the Cobb-Douglas production function and also find their marginal products.

The estimated Cobb-Douglas production function of a developed country is given below.

Dependent variable log(Q), estimated by the OLS method. Usable observations: 42; mean of dependent variable: 12.2260; std error of dependent variable: 0.38149; standard error of estimate: 0.02828; sum of squared residuals: 0.01361; regression F(2,39): 1719.2311; significance level of F: 0.00000; log likelihood: 44.5522; Durbin-Watson statistic: 0.42567.

Variable     Coefficient    Standard Error
Constant     -1.6524        0.6062
log(L)        0.3397        0.1857
log(K)        0.8460        0.0934

Answer the following questions:
(i) Looking at the results, what can you say about the model?
(ii) Explain the output elasticities with respect to labour and capital.
(iii) Test the null hypothesis H0: βj = 0 (j = 1, 2) against a suitable alternative hypothesis and comment on your results.
(iv) If TSS = 2.765, find the Adjusted(R²) and compare it with R².
(v) Do you think the production function exhibits decreasing returns to scale? Why?
(vi) Obtain the estimated equation.

CHAPTER FOUR

HETEROSCEDASTICITY

4.1 Introduction: Meaning of Heteroscedasticity To estimate classical linear regression models using the OLS method, we assume that the variance of the random error term, İ i conditional on the explanatory variable(s) is constant, i.e., Var(İ i |X= x) = E(İ i2 |x) = ı 2 ;  i, or we can write that Var(İ|X) = ı 2 I N . It means that, for all values of X's, the İ's show the same dispersion around the zero mean. This implies that the variance of the disturbance term does not depend on the values of X. Thus, we can say that, İ i ~N(0, ı 2 ),  i, then, the İ's are said to be homoscedastic and its structure is shown in Fig. 4-2. However, in many business, economic, socio-economic, and financial studies, especially in cross-sectional or microeconomic analyses, the assumption of constant variance of the random error term İi , i, may not be valid, i.e., Var(İi |X=x) = E(İi2 |x) z ı2 ,  i; or Var(İ|x) z ı 2 I N . This is because the variance of the unobservable random error terms changes across different segments of the population, where the segments are determined by the different values of the explanatory variables. It means that, for all values of X's, the İ's do not show the same dispersion around the zero mean. This implies that the variance of the disturbance term depends on the values of X, i.e., Var(İ i |X= x) = f(x). Thus, we can say that

İ i ~N(0, ıi2 ), or İ~N(0, ı 2 ȍ). Therefore, in this situation, it can be said that İ's are heteroscedastic. Thus, we can say  that the problem of heteroscedasticity arises and the heteroscedastic structure is shown in Fig 4-3. The heteroscedastic nature implies that, for larger values of X, the variance of the random error terms will be larger around the zero mean. For example, if we consider an equation: proi = ȕ 0 +ȕ1invi +İ i , where proi is the profit of the ith industry, and invi is the investment of the ith industry, in this profit equation, the assumption of homoscedasticity is violated because the average higher investment corresponds to a higher profit. In addition, we may expect that the variation in profit among the industries having a large-scale investment is much larger than the variation among the industries that have a smallscale investment. If this is the case, the variance of İ i (i=1, 2, …,N) increases with investment which is called heteroscedasticity and the heteroscedastic nature is shown below graphically.

[Figure: scatter plot of profit (y) against investment (x) with fitted values; profit ranges roughly 0–8000, investment 0–10000]

Fig. 4-1: The curve with heteroscedasticity

Fig. 4-1 illustrates this case with data from the banking sector of Bangladesh. In the banking sector of Bangladesh, larger values of investment correspond to higher expected profits, but they also have a higher variance. Since the curve for profit is expected to be upward-sloping, the heteroscedasticity in Fig. 4-1 could be described by the following equation:

Var(εi | invi) = σi² = f(invi) = exp(β0 + β1invi)                (4.1)

Heteroscedasticity may arise with correlated disturbances or uncorrelated disturbances either in a multiplicative form or in an additive form. In either case, we cannot apply the OLS method to estimate the heteroscedastic models. Thus,


students and researchers have to know the forms of heteroscedasticity, reasons for heteroscedasticity, problems of heteroscedasticity, tests for detecting the presence of heteroscedasticity, and the estimation techniques of the heteroscedastic models. Therefore, in this chapter, different structural forms of heteroscedasticity, reasons for heteroscedasticity, problems of heteroscedasticity, different tests for detecting the presence of heteroscedasticity, and estimating techniques of heteroscedastic models are discussed with their application to numerical problems.

4.2 Structural Forms of Heteroscedasticity

When we are dealing with mathematical or empirical measurements of economic relationships based on cross-sectional or micro-economic data, heteroscedasticity may arise without correlated disturbances. However, if we deal with time-series data, heteroscedasticity may arise with correlated disturbances. The two structural forms of heteroscedasticity are discussed below.

Heteroscedasticity with No Autocorrelation

The variance-covariance matrix of the vector of random error terms is given by

Var(ε) = E[ε − E(ε)][ε − E(ε)]′ = E(εε′)

       = [ E(ε1²)    E(ε1ε2)  …  E(ε1εN)
           E(ε2ε1)   E(ε2²)   …  E(ε2εN)
             ⋮          ⋮      ⋱     ⋮
           E(εNε1)   E(εNε2)  …  E(εN²) ]  (N×N)                (4.2)

We assume that Var(εi) = E(εi²) = σi² for all i and that there is no autocorrelation between the random error terms, i.e., Cov(εi, εj) = E(εiεj) = 0 for i ≠ j. Under these assumptions, equation (4.2) can be written as

Var(ε) = diag(σ1², σ2², …, σN²)  (N×N) = Ω                (4.3)

Heteroscedasticity with Autocorrelation

Let ρij be the correlation coefficient between εi and εj, which is given by

ρij = Cov(εi, εj)/√(Var(εi)Var(εj)) = E(εiεj)/(σiσj)                (4.4)

From equation (4.4) we have

E(εiεj) = ρij σi σj, for i ≠ j, (i, j = 1, 2, …, N)                (4.5)

Therefore, equation (4.2) can be written as


Var(ε) = [ σ1²         ρ12σ1σ2   …  ρ1Nσ1σN
           ρ21σ2σ1     σ2²       …  ρ2Nσ2σN
             ⋮            ⋮       ⋱      ⋮
           ρN1σNσ1     ρN2σNσ2   …  σN²     ]  (N×N)                (4.6)

In either case, heteroscedasticity occurs principally in the multiplicative form or additive form. In this section, the multiplicative and additive forms of heteroscedasticity are discussed. Multiplicative Form of Heteroscedasticity

The multiplicative form of heteroscedasticity was proposed by Lahiri and Egy (1981) and is given by

σi² = σ² zi^θ                (4.7)

where θ measures the importance of heteroscedasticity, σ² is the constant of proportionality, and z is the variable responsible for heteroscedasticity. Taking the logarithm of equation (4.7), we have

log(σi²) = log(σ²) + θ log(zi)                (4.8)

If θ is significantly different from zero, this implies the existence of the multiplicative form of heteroscedasticity. A general representation of the multiplicative form of heteroscedasticity is given by

σi² = σ² z1i^θ1 z2i^θ2 ⋯ zki^θk                (4.9)

The logarithm of equation (4.9) is given by

log(σi²) = log(σ²) + θ1log(z1i) + θ2log(z2i) + ⋯ + θklog(zki)                (4.10)

Here, the parameters ș1 , ș 2 ,......,and ș k measure the importance of heteroscedasticity, ı 2 is the proportional constant, and z’s are the variables which are responsible for heteroscedasticity. If ș1 =ș 2 =.......=ș k =0, it implies that there is no problem of heteroscedasticity. If ș1 , ș 2 ,....., and ș k are significantly different from zero, it implies the existence of the multiplicative form of heteroscedasticity. Additive Form of Heteroscedasticity

The additive form of heteroscedasticity was proposed by Goldfeld and Quandt (1972) and is given by

σi² = α + βXi + δXi²                (4.11)

where α, β, and δ are constants, β and δ measure the importance of heteroscedasticity, and X is the explanatory variable responsible for heteroscedasticity. A general representation of the additive form of heteroscedasticity is given by

σi² = α0 + α1z1i + α2z2i + ⋯ + αkzki                (4.12)

where the non-stochastic variables z’s may be identical to the variables x’s or may be the function of x’s. Here, the variables z’s are responsible for heteroscedasticity. If the parameters Į1 , Į 2 ,....., and Į k are significantly different from zero, it implies that there is a problem of heteroscedasticity. If Į1 = Į 2 =........= Į k = 0, implies that there is no problem of heteroscedasticity.
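These two structural forms can be made concrete with a small simulation. The sketch below (illustrative Python, not code from the book; the variable names, sample size and parameter values are assumptions) generates errors whose variance follows the multiplicative form σi² = σ²zi^θ of equation (4.7) and then recovers θ by regressing log(ei²) on log(zi), in the spirit of equation (4.8).

import numpy as np

rng = np.random.default_rng(0)
n = 500
z = rng.uniform(1.0, 10.0, n)          # variable responsible for heteroscedasticity
x = rng.uniform(0.0, 5.0, n)           # regressor
theta, sigma2 = 1.5, 0.4               # assumed true values
eps = rng.normal(0.0, np.sqrt(sigma2 * z**theta))   # Var(eps_i) = sigma^2 * z_i^theta
y = 2.0 + 3.0 * x + eps

# OLS of y on x, then regress log(e^2) on log(z) to estimate theta
X = np.column_stack([np.ones(n), x])
b = np.linalg.lstsq(X, y, rcond=None)[0]
e = y - X @ b
Z = np.column_stack([np.ones(n), np.log(z)])
g = np.linalg.lstsq(Z, np.log(e**2), rcond=None)[0]
print("estimated theta:", g[1])        # close to 1.5 in large samples

A value of the estimated θ significantly different from zero signals the multiplicative form; replacing log(zi) by zi and zi² gives the analogous check for the additive form in (4.11).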


4.3 Possible Reasons for Heteroscedasticity In general, heteroscedasticity may arise due to various reasons among which the following are the most important ones. (i) Heteroscedasticity may arise when we are dealing with a natural phenomenon that may have either an increasing or decreasing trend. For example, the variation in consumption patterns on food increases as income increases. Similarly, the variation in the number of car accidents decreases as the number of hours of driving practice increases. (ii) Heteroscedastic problems may arise in the regression model when we are dealing with the observations that are in the form of averages. For example, it is easier to collect data on the expenditure on food for the whole family rather than on a particular family member. Suppose, in a simple linear regression model of the type

(4.13)

Yij = ȕ 0 + ȕ1X ij + İ ij , i = 1, 2,....,N, j = 1, 2,.....,mi

Yij denotes the expenditure on food of the jth member of the ith family having mi members, and Xij denotes the age of the jth person in the ith family. It is difficult to record data for individual family members, but it is easy to get data for the whole family, so the Yij's are known only collectively. Then, instead of per-member expenditure, we work with the family averages

ȳi = (1/mi) Σj Yij  and  x̄i = (1/mi) Σj Xij,

and model (4.13) becomes

ȳi = β0 + β1x̄i + ε̄i                (4.14)

If we assume that E(εij) = 0 and Var(εij) = σ², then we have E(ε̄i) = 0 and

Var(ε̄i) = (1/mi²) Σj Var(εij) = miσ²/mi² = σ²/mi,

which indicates that the resulting variance of the disturbances does not remain constant but depends on the number of members mi in each family. So a heteroscedasticity problem occurs in the data. The variance will remain constant only

when all mi 's are the same. (iii) Sometimes, heteroscedastic problems may happen in the data due to theoretical considerations, i.e., heteroscedasticity may also arise in a random coefficient model. For example, suppose, in the simple linear regression model

(4.15)

Yi = ȕ 0 +ȕ1X i +İ i

Yi denotes the yield of potatoes and Xi denotes the quantity of fertilizer in an agricultural experiment. It is observed that, when the quantity of fertilizer increases, the yield of Potatoes increases. Initially for increasing the quantity of fertilizer, the yield increases. Gradually, the rate of increase slows down, and if fertilizer is increased further, the crop burns. Thus, it can be said that ȕ1 changes with different levels of fertilizer. Since the rate of change of production of potatoes with respect to quantity of fertilizer is not constant, a possible way is to express it as a random variable with constant mean ȕ1 and constant variance, say, į 2 . Thus, equation (4.15) reduces to

(4.16)

Yi = ȕ 0 +ȕ1i X i +İ i

where β1i = β1 + vi, i = 1, 2, …, N, with E(vi) = 0 ∀ i, Var(vi) = δ² ∀ i, and Cov(εi, vi) = 0 ∀ i.

Therefore, equation (4.16) can be written as

Yi = β0 + β1Xi + Xivi + εi = β0 + β1Xi + ui                (4.17)

where ui = Xivi + εi is the new random error term of the complete model. Now,

E(ui) = XiE(vi) + E(εi) = 0 ∀ i, and

Var(ui) = E(ui²) = Xi²E(vi²) + 2XiE(εivi) + E(εi²) = Xi²δ² + σ²

So, the variance of the new random error terms depends on X i2 , and thus, heteroscedasticity is introduced in the model. (iv) Heteroscedasticity may also arise due to misspecification of the model, that is, the regression model which is not correctly specified. In this situation, we look at heteroscedasticity because some important variables are omitted from the model. For example, let the true monthly consumption function of beef be given by Yi = ȕ 0 +ȕ1X1i +ȕ 2 X 2i +ȕ 3 X 3i +u i

(4.18)

where Yi is the monthly consumption of beef of the ith person, i = 1, 2,……., N, X1i is the monthly income level of the ith consumer, X 2i is the per unit price of beef, and X 3i is the per unit price of pork. However, for some reasons, we run the following consumption function of beef Yi = ȕ 0 +ȕ1X1i +ȕ 2 X 2i +İ i

(4.19)

Here, equation (4.18) is the correct model, but we run equation (4.19), letting İ i = ȕ 3 X 3t +u i . The new random error term İ i represents the influence of the omitted variable X3 . As a result, the disturbance terms İ's will reflect a systematic pattern which may cause the heteroscedastic problem in the data. (v) The skewness in the distribution of one or more explanatory variables in the model also causes heteroscedastic problems. In most countries, the distribution of income, expenditure, profits, wealth, and education is uneven, with the bulk of the income, expenditure, and wealth being owned by the upper strata of society. If we deal with income, expenditure, and wealth etc. the heteroscedastic problem might arise. (vi) Heteroscedasticity can happen while dealing with grouped data. For example, let us consider a functional relationship between expenditure and income for high-income, medium-income and low-income groups of the type Yij = ȕ 0 +ȕ1X ij +İ ij , i=1, 2, 3, j = 1, 2,.....,N i

(4.20)

where Yij is the expenditure of the jth person in the ith group (i=1, 2, 3), X ij is the income of the jth person in the ith group and İ ij is the random error term. In this equation, the assumption of homoscedasticity is violated because the average higher income corresponds to a higher expenditure. In addition, we can expect that the variation in expenditure among the persons having the higher income (high-income group) is much larger than the variation among the persons having the smaller income (medium-income and low-income groups). If this is the case, the variance of İ ij increases with higher income which is called heteroscedasticity. Thus, we have Var(İ ij |X ij =x ij ) = ı i2 = f(x ij )

(vii) Heteroscedasticity may also arise due to the presence of outliers in the data. An outlying observation, or outlier, is an observation that is very different, either very small or very large, in relation to the other observations in the sample. More precisely, an outlier is an observation from a different population from the one that generates the remaining sample observations. The inclusion of such an observation, especially if the sample size is small, can substantially alter the results of regression analysis. (viii) Heteroscedasticity may arise in many studies, especially in cross-sectional or micro-economic analyses. (ix) David Hendry notes that heteroscedasticity may also arise because of an incorrect data transformation (e.g., the ratio or first-difference transformations) or an incorrect functional form (e.g., linear versus log-linear forms).

(x) Heteroscedasticity may arise due to changes in various economic policies like monetary policy, tax reform policy, liberal trade policy etc.


4.4 Nature of Heteroscedasticity In a classical regression model, we assume that the random error terms İ's are identically distributed with zero mean and constant variance, i.e., E(İ i |X i ) = 0, i, and Var(İ i |X i ) = ı 2 ,  i. Then, we can say that İ's are homoscedastic. This means that the dispersion of the random error terms İ's around their zero mean are the same and its structure is shown below graphically

[Figure: density curves f(ε) of the disturbances around E(εi) = 0 at X1, X2, …, XN, all with the same spread]

Fig. 4-2: Homoscedastic disturbances

It also means that the dispersion of the observed values of the dependent variable Y around their mean Xβ, that is, around the regression line, is the same for all observations. Thus, from the graphical presentation, it can be said that, for increasing or decreasing values of X, the conditional variance of Y at any given X remains constant. When the conditional variance of the random error terms at a given X does not remain the same, that is, when it is a function of X, the problem of heteroscedasticity is present; its structure is shown below graphically. Heteroscedasticity implies that the conditional variances of the random error terms may increase for larger values of X, or they may decrease as X increases.

[Figure: density curves f(ε) of the disturbances around E(εi) = 0 at X1, X2, …, XN, with the spread changing with X]

Fig. 4-3: Heteroscedastic disturbances

However, in many situations especially in cross-sectional or micro-economic analyses, the conditional variances of the random error terms at given X will be either increased or decreased for increasing values of X. For example, if we deal with a relationship based on a random sample of household consumption expenditure and income, we know households with low income do not have much flexibility in spending and consumption patterns among such lowincome households may not vary much. On the other hand, households with high-income have a great deal of flexibility in spending. That is, some households consume on a large scale, and some households may save on a large scale and may invest in financial sectors. This implies that the actual consumption expenditure might be different from the average consumption expenditure. Therefore, it is very common for households with higher income to have a larger dispersion around mean consumption than lower-income households. Thus, the conditional variances of the random error terms around zero mean or the conditional variance of the dependent variable Y around the mean consumption will increase with increasing values of X. This situation is called heteroscedasticity. If we assume that İ i


is a random error term with E(εi|Xi) = 0 and Var(εi|Xi) = E(εi²|Xi) = σi² ∀ i, this implies that each error term has a different variance around its zero mean. Hence, the variance-covariance matrix of ε will be

Var(ε) = diag(σ1², σ2², …, σN²)  (N×N)                (4.21)

Sometimes it is convenient to write Var(εi|xi) = σi² = σ²Wi, where Wi = f(Xi). Then the variance-covariance matrix of ε is given by

Var(ε) = σ² diag(W1, W2, …, WN)  (N×N) = σ²Ω                (4.22)

For convenience, we use the normalization trace(Ω) = Σi Wi = N. Heteroscedasticity may arise with or without

correlated disturbances. In either case, heteroscedasticity may arise in an additive form or in multiplicative form which is illustrated briefly in section 4.2.
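As a small numerical illustration of equation (4.22) and the normalization trace(Ω) = N, the following sketch (illustrative Python; the weight function Wi ∝ Xi² is an arbitrary assumption chosen only for the example) constructs such a variance-covariance matrix.

import numpy as np

rng = np.random.default_rng(1)
N = 6
X = rng.uniform(1.0, 10.0, N)
sigma2 = 2.0

W = X**2                       # assumed form W_i = f(X_i) = X_i^2
W = W * N / W.sum()            # rescale so that trace(Omega) = sum(W_i) = N
Omega = np.diag(W)
V = sigma2 * Omega             # Var(eps) = sigma^2 * Omega, as in (4.22)

print(np.trace(Omega))         # -> 6.0
print(np.diag(V))              # heteroscedastic variances sigma_i^2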

4.5 Consequences of Heteroscedasticity

Let us consider a multiple linear regression model of the type

Yi = β0 + β1X1i + β2X2i + ⋯ + βkXki + εi = xi′β + εi                (4.23)

where Var(εi|xi) = σi², i = 1, 2, …, N, i.e., the error variances differ across observations and are unknown. If the variances of the error terms are not constant, but all other assumptions of the classical linear regression model are satisfied, then using the OLS estimators of the population parameters has the following consequences:
(i) The linearity of the OLS estimator of β is not affected.
(ii) The unbiasedness of the OLS estimator of β is not affected.
(iii) The OLS estimator of β is consistent.
(iv) The OLS estimator of β is inefficient and no longer has the minimum-variance (BLUE) property.
(v) The estimated variances and covariances of the OLS estimators are biased and inconsistent when heteroscedasticity is present but ignored; therefore, the standard tests are not valid for testing statistical hypotheses.
(vi) The residual mean square s² is not an unbiased estimator of σ².
(vii) Predictions of the dependent variable Y for a given value of x are inefficient.
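Consequences (ii), (iv) and (v) can be checked with a short Monte Carlo experiment. The sketch below (illustrative Python; the design, the error standard deviation 0.5Xi, and all other settings are assumptions) shows that the OLS slope remains unbiased while the conventional OLS standard error differs from the true sampling standard deviation of the estimator.

import numpy as np

rng = np.random.default_rng(2)
n, reps = 200, 2000
x = rng.uniform(1.0, 10.0, n)
X = np.column_stack([np.ones(n), x])
beta = np.array([1.0, 2.0])

b1_draws, se1_conv = [], []
for _ in range(reps):
    eps = rng.normal(0.0, 0.5 * x)              # sd grows with x: heteroscedastic errors
    y = X @ beta + eps
    XtX_inv = np.linalg.inv(X.T @ X)
    b = XtX_inv @ X.T @ y
    e = y - X @ b
    s2 = e @ e / (n - 2)
    b1_draws.append(b[1])
    se1_conv.append(np.sqrt(s2 * XtX_inv[1, 1]))  # conventional OLS standard error

print("mean of b1 estimates :", np.mean(b1_draws))   # close to 2 -> still unbiased
print("true sd of b1        :", np.std(b1_draws))
print("avg conventional SE  :", np.mean(se1_conv))   # differs from the true sd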


4.6 Properties of the OLS Estimator of β when Var(ε) ≠ σ²In: Effects on the Properties of the OLS Estimators

Let us consider a multiple linear regression equation of the type

Yi = xi′β + εi                (4.24)

The variance-covariance matrix of ε is given by equation (4.22). Sometimes it is convenient to write Var(εi|xi) = σi² = σ²Wi, where Wi = f(Xi). If we apply the OLS method to equation (4.24), the OLS estimators have the following properties.

Property i: The linearity of the OLS estimator of β is not affected.

Proof: The OLS estimator of β is given by

β̂OLS = (X′X)⁻¹X′Y = BY                (4.25)

where B = (X′X)⁻¹X′ is a {(k+1) × n} matrix of fixed numbers with elements bji (j = 0, 1, …, k; i = 1, 2, …, n). From equation (4.25), we have

β̂j = Σ(i=1 to n) bjiYi,  j = 0, 1, 2, …, k                (4.26)

Thus, each ȕˆ j is a linear combination of the components of the vector Y. Hence, it can be said that the linearity of the OLS estimators will not be affected if the disturbances are heteroscedastic. Property ii: The unbiasedness of the OLS estimator of ȕ will not be affected. Proof: From equation (4.25), we have ȕˆ OLS = ȕ + (X cX)-1X cİ

(4.27)

Taking the expectation of equation (4.27), we have

E(β̂OLS) = β,  [since E(ε) = 0]                (4.28)

This shows that the unbiasedness of the OLS estimator of β is not affected if the disturbances are heteroscedastic.

Property iii: The OLS estimator of β is consistent.

Proof: From Chebyshev's inequality, we can write

Prob{|β̂ − E(β̂)| ≥ δ} ≤ Var(β̂)/δ²                (4.29)

For the heteroscedastic model, E(β̂) = β and Var(β̂) = σ²(X′X)⁻¹X′ΩX(X′X)⁻¹. Putting these values into equation (4.29), we have

Prob{|β̂ − β| ≥ δ} ≤ σ²(X′X)⁻¹X′ΩX(X′X)⁻¹/δ²

lim(n→∞) Prob{|β̂ − β| ≥ δ} ≤ (1/δ²)(X′X/n)⁻¹(X′ΩX/n)(X′X/n)⁻¹(σ²/n)

lim(n→∞) Prob{|β̂ − β| < δ} ≥ 1 − (1/δ²)(X′X/n)⁻¹(X′ΩX/n)(X′X/n)⁻¹(σ²/n)                (4.30)

Now, (X′X/n)⁻¹(X′ΩX/n)(X′X/n)⁻¹ tends to a finite quantity and σ²/n → 0 as n → ∞. Thus, equation (4.30) can be written as

lim(n→∞) Prob{|β̂ − β| < δ} = 1,  i.e.,  plim(β̂) = β                (4.31)

This shows that ȕˆ is a consistent estimator of ȕ even if the disturbances are heteroscedastic. Property iv: The OLS estimates are inefficient and do not have the minimum variance (BLUE) property Proof: The variance-covariance matrix of ȕ is

Var(β̂OLS) = E(β̂OLS − β)(β̂OLS − β)′ = E[(X′X)⁻¹X′εε′X(X′X)⁻¹]

          = (X′X)⁻¹X′E(εε′)X(X′X)⁻¹ = σ²(X′X)⁻¹X′ΩX(X′X)⁻¹ ≠ σ²(X′X)⁻¹                (4.32)

We have Var(β̂OLS) ≠ σ²(X′X)⁻¹; when Var(ε) = σ²In, Var(β̂OLS) reduces to σ²(X′X)⁻¹. Thus, when Var(ε) = σ²Ω but it is wrongly assumed that Var(ε) = σ²In, then Var(β̂OLS) ≠ σ²(X′X)⁻¹ and Var(β̂OLS) ≠ s²(X′X)⁻¹. The OLS estimator β̂ is no longer best, that is, it does not have the minimum-variance (BLUE) property and it is inefficient.

Property v: The OLS estimators are less efficient than the GLS estimators if the random error terms are heteroscedastic.

Proof: If the variable Y and all the explanatory variables are measured as deviations from their mean values, equation (4.24) can be written as yi = β1x1i + β2x2i + ⋯ + βkxki + εi, where yi = Yi − Ȳ and xji = Xji − X̄j, j = 1, 2, …, k. In what follows, consider the case of a single regressor in deviation form, so that

yi = xiβ + εi,  or  y = xβ + ε                (4.33)

where x is a column vector of n observations on xi, E(yi) = 0 ∀ i, and E(xji) = 0 ∀ i, j. The variance of the OLS estimator is given by


Var(β̂OLS) = σ²(x′x)⁻¹x′Ωx(x′x)⁻¹ = σ² Σxi²wi / (Σxi²)²                (4.34)

The variance of the GLS estimator of β is given by

Var(β̂GLS) = σ²(x′Ω⁻¹x)⁻¹ = σ² / Σ(xi²/wi)                (4.35)

The relative efficiency of the OLS estimator to the GLS estimator is given by

Var(β̂OLS)/Var(β̂GLS) = [Σxi²wi][Σ(xi²/wi)] / (Σxi²)²                (4.36)

The efficiency of the OLS estimators relative to the GLS estimators depends on the postulated form of the heteroscedasticity. If we define wi = xi² = zi, then from equation (4.36) we have

Var(β̂OLS)/Var(β̂GLS) = (1/n)Σzi² / z̄² = [z̄² + (1/n)Σ(zi − z̄)²] / z̄² = 1 + Var(xi²)/[E(xi²)]²                (4.37)

Since Var(xi²)/[E(xi²)]² is a positive quantity, from equation (4.37) we have

Var(β̂OLS)/Var(β̂GLS) > 1,  i.e.,  Var(β̂OLS) > Var(β̂GLS)                (4.38)

This shows that the OLS estimators are less efficient than the GLS estimators if the random error terms are heteroscedastic.

Property vi: The usual tests are not valid for testing the null hypotheses when the random error terms are heteroscedastic.

Proof: To test whether the parameters of a regression equation are statistically significant, we generally use the Student t-test, the F-test and the Chi-square test. In applying the t-test and F-test, we assume that β̂ ~ N(β, σ²(X′X)⁻¹), or β̂j ~ N(βj, σ²ajj), where ajj is the jth diagonal element of (X′X)⁻¹. This assumption is true only if Var(ε) = σ²In. However, when Var(ε) = σ²Ω, β̂ ~ N(β, σ²(X′X)⁻¹X′ΩX(X′X)⁻¹), or β̂j ~ N(βj, σ²bjj) (j = 1, 2, …, k), where bjj is the jth diagonal element of (X′X)⁻¹X′ΩX(X′X)⁻¹. Thus, when there is a problem of heteroscedasticity, to test a null hypothesis H0: βj = 0 (j = 1, 2, …, k), the t-test uses the wrong variance of the estimator, so the test statistic (β̂j − βj)/se(β̂j) no longer has a t-distribution under H0. When there is a problem of heteroscedasticity, the true standard error of the OLS estimator is typically larger; consequently, the computed value of the test statistic will be misleading and the null hypothesis may be accepted wrongly. Hence, this test is invalid. Thus, it can be said that, if it is wrongly assumed that Var(ε) = σ²In, the application of the t-test is invalid: the OLS estimator β̂j is not efficient, the wrong variance-covariance matrix is used, and the estimate s² = e′e/(n−k−1) of σ² is wrong. The joint F-test is also invalid and, for similar reasons, confidence intervals for the population parameters are invalid.

(4.39)

Taking the expectation of equation (4.39), we have E(ece) = E[trace(İ cMİ)], [ Since İ cMİ is a scalar constant, so trace(İ cMİ) = İ cMİ ] = E[trace(Mİİ c)] = trace [ME(İİ c)]

= trace [Mı 2 ȍ] = ı 2ȍ trace(M)

= ı 2 ȍ n-k-1 § ece · 2 Ÿ E¨ ¸ =ı ȍ n-k-1 © ¹

(4.40)

ece . This shows that s 2 is not an unbiased estimator of ı 2 if the n-k-1 disturbances are heteroscedastic. Therefore, to conduct any test statistic, the estimated value of ı 2 which is given by

Thus, we have E(s 2 ) z ı 2 , where s 2 =

Heteroscedasticity n

¦e

177

2

i ece cannot be applied if the disturbances are heteroscedastic. ıˆ = s = = i=1 n-k-1 n-k-1 2

2
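Property vii can be verified numerically. The following sketch (illustrative Python; the design matrix and the weights are arbitrary assumptions) computes σ²trace(MΩ)/(n−k−1) and compares it with σ² and with the simulated mean of s².

import numpy as np

rng = np.random.default_rng(3)
n, k = 30, 2
X = np.column_stack([np.ones(n), rng.uniform(0, 10, (n, k))])
sigma2 = 1.5
W = np.linspace(0.2, 3.0, n)
W = W * n / W.sum()                         # normalization trace(Omega) = n
Omega = np.diag(W)

M = np.eye(n) - X @ np.linalg.inv(X.T @ X) @ X.T
expected_ss = sigma2 * np.trace(M @ Omega)  # E(e'e) = sigma^2 * trace(M Omega)
print("E(s^2) =", expected_ss / (n - k - 1), "vs sigma^2 =", sigma2)

# Monte Carlo confirmation
draws = []
for _ in range(5000):
    eps = rng.multivariate_normal(np.zeros(n), sigma2 * Omega)
    e = M @ eps
    draws.append(e @ e / (n - k - 1))
print("simulated mean of s^2:", np.mean(draws))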

4.7 Methods for Detecting the Presence of Heteroscedasticity In applied econometrics, one of the most important objectives is to test the assumptions involved in an econometric model. Thus, when we are dealing with cross-sectional and grouped data, testing for heteroscedasticity is an important issue because the assumption of the constant variance of random disturbances across observations is likely to be violated. Therefore, in this section, some informal and formal methods for detecting the presence of heteroscedasticity are discussed. First, the graphical method is discussed to detect the presence of heteroscedasticity and then some formal test statistics are discussed for detecting the presence of heteroscedasticity in data. Graphical Method for Detecting Heteroscedasticity

If we have no prior information about the nature of heteroscedasticity, in practice, we can do the regression analysis on the assumption that there is no problem with heteroscedasticity, and then, by examining squared residuals ei2 graphically, we can detect the presence of heteroscedasticity. The following steps are involved in detecting the presence of heteroscedasticity graphically: Step 1: First, apply the OLS method to the simple linear or multiple linear regression equation and then obtain the residuals ei 's which are given by ei = yi  yˆ i , i = 1, 2,.......,n.

(4.41)

where yˆ i = ȕˆ 0 + ȕˆ 1X i , for a simple linear regression equation or for a multiple linear regression equation it is given by yˆ i = ȕˆ 0 +ȕˆ 1X1i +ȕˆ 2 X 2i +.......+ȕˆ k X ki Step 2: Second, we obtain the squared values of all ei , i.e., ei2 for all i. (i = 1, 2,…. ,n). Step 3: Third, we plot ei2 to the Y-axis corresponding to the values of yˆ i to the X-axis. Plotting ei2 against yˆ i , we

can get the idea about whether the estimated value of Y is systematically related to the squared residuals ei2 . In the case of a simple linear regression equation, one may plot ei2 against X i instead of plotting ei2 against yˆ i . Also, in the case of a multiple linear regression equation, one may plot ei2 against each of the explanatory variables instead of plotting ei2 against yˆ i . Step 4: Let us now examine some situations to detect the presence of heteroscedasticity given below:

[Figure 4-4: scatter plots of ei² against ŷi — (a) no heteroscedasticity; (b) heteroscedasticity with a linear relationship; (c) heteroscedasticity with a linear relationship; (d) heteroscedasticity with an exponential relationship; (e) heteroscedasticity with a quadratic relationship; (f) heteroscedasticity with a quadratic relationship]

In Fig. 4-4(a), there is no systematic pattern in the relationship between ei² and ŷi, so it can be said that there is no heteroscedastic problem in the data. Figs. 4-4(b) to 4-4(f) indicate the existence of a definite pattern in the relationship between ei² and ŷi: Fig. 4-4(b) and Fig. 4-4(c) indicate the presence of heteroscedasticity with a linear relationship between ei² and ŷi, Fig. 4-4(d) indicates heteroscedasticity with an exponential relationship, and Figs. 4-4(e) and 4-4(f) indicate heteroscedasticity with a quadratic relationship between ei² and ŷi.

Ex. 4-1: The data given below are the net profit (NETP, in million BDT) and sales revenue (REV, in million BDT) in the year 2019 of several companies listed on the DSE. Detect the existence of heteroscedasticity using the graphical method.

Table 4-1: Net profit and sales revenue of several listed companies at DSE

NETP     REV        NETP    REV       NETP    REV
26.81    1176.90    2.50    131.50    1.56    26.68
48.50    1346.60    2.35     85.89    1.48    24.65
20.20     685.50    2.65     78.85    1.25    22.65
15.85     425.68    4.25     80.65    1.26    25.36
 9.55     335.65    2.50     38.54    1.25    22.36
10.65     520.25    2.65     75.30    0.98    20.65
 8.55     405.25    3.25     78.36    0.65    19.56
12.65     404.65    2.56     80.26    0.78    16.59
 9.05     325.26    1.50     35.26    1.65    38.29
 3.75     112.25    1.66     34.60    0.85    19.35
 3.95     115.85    1.75     37.68    0.78    16.96
 4.50     150.65    1.35     32.52    0.75    15.65
 2.25     105.55    1.46     39.58    1.68    36.78
 5.30     125.50    1.35     38.26    0.87    18.96

Source: Yearly report of the companies

Solution: The simple linear regression equation of net profit on sales revenue is given by

(4.42)

NETPi = ȕ 0 +ȕ1REVi +İ i

Applying the OLS method to the equation, we obtain the residuals, which are given by ei = NETPi − 0.2930 − 0.0292REVi, (i = 1, 2, …, n)

(4.43)

We then obtain the squared residuals ei2 for all i. The results are obtained using RATS and given in Table 4-2. Table 4-2: Squared residuals

61.46 2.66 0.24

79.14 0.20 0.22

0.01 0.00 0.09

9.80 2.57 0.05

0.29 1.17 0.09

23.33 0.03 0.01

12.77 0.45 0.05

0.30 0.01 0.00

0.55 0.03 0.06

0.03 0.13 0.00

0.08 0.13 0.00

0.04 0.01 0.00

1.26 0.00 0.10

1.80 0.00 0.00

Now, we plot ei² on the Y-axis against the estimated net profit (or revenue) on the X-axis. The graph obtained by plotting ei² against the estimated net profit is given below.

[Figure: scatter of squared residuals (roughly 0–80) against fitted net profit (roughly 0–40)]

Fig. 4-5: Graph for detecting the presence of heteroscedasticity
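The computations reported in this example were obtained with RATS; the plot in Fig. 4-5 can be reproduced with a short script such as the sketch below (illustrative Python; the array names netp and rev are assumptions holding the 42 observations of Table 4-1).

import numpy as np
import matplotlib.pyplot as plt

def plot_squared_residuals(netp, rev):
    X = np.column_stack([np.ones(len(rev)), rev])
    b = np.linalg.lstsq(X, netp, rcond=None)[0]   # OLS of NETP on REV
    fitted = X @ b
    e2 = (netp - fitted) ** 2                     # squared residuals
    plt.scatter(fitted, e2)
    plt.xlabel("fitted net profit")
    plt.ylabel("squared residuals")
    plt.show()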

The graph shows the presence of a heteroscedastic problem in the data.

Tests for Detecting the Presence of Heteroscedasticity

Tests for detecting the presence of heteroscedasticity are broken into two parts: (1) Tests for detecting heteroscedasticity without the presence of autocorrelation (2) Test for detecting heteroscedasticity with the presence of autocorrelation. Tests for Detecting Heteroscedasticity without the Presence of Autocorrelation

The tests which are the most popular and widely applicable for detecting the presence of heteroscedasticity are discussed in this section. Bartlett Test for Detecting Heteroscedasticity in Case of Simple Linear Regression Equation

A standard procedure for testing the equality of variances of several groups due to Bartlett (1937) is briefly discussed below. This test procedure involves the following steps: Step 1: First, we consider a simple linear regression model of the type Yij = Į+ȕX ij +İ ij , (i = 1, 2,......,k, j = 1, 2,......,N i )

(4.44)


where Yij is the jth observation of the dependent variable Y corresponding to the ith group, Xij is the jth observation of the explanatory variable X corresponding to the ith group, İ ij is the jth observation of the disturbances term İ corresponding to the ith group, N i is the population size corresponding to the ith group, and k is the number of groups. The above model, İ i is normally distributed with zero mean and variance ı i2 I Ni . Step 2: Second, we set up the null hypothesis against an alternative hypothesis. The null hypothesis to be tested is

H 0 : ı12 = ı 22 =..........= ı 2k against the alternative hypothesis H1: At least two of them are not equal.

Step 3: Third, we select the appropriate test statistic. Under the null hypothesis, the test statistic is given by

Q = (1/M)[m ln(s²) − Σ(i=1 to k) mi ln(si²)] ~ χ² with (k−1) d.f.                (4.45)

where

M = 1 + [1/(3(k−1))][Σ(i=1 to k)(1/mi) − 1/m],  mi = (ni − 2),  m = Σ(i=1 to k) mi                (4.46)

ni is the sample size of the ith group,

si² = Σ(j=1 to ni) eij² / (ni − 2)                (4.47)

and

s² = Σ(i=1 to k) Σ(j=1 to ni) eij² / Σ(i=1 to k)(ni − 2) = Σ(i=1 to k)(ni − 2)si² / Σ(i=1 to k)(ni − 2) = Σ(i=1 to k) misi² / Σ(i=1 to k) mi                (4.48)

TEX ij = Į +ȕTINCij +İ ij , (i = 1, 2, j = 1, 2,.........,n i )

(4.49)

Heteroscedasticity

181

The null hypothesis to be tested is H 0 : ı12 = ı 22

against the alternative hypothesis H1: They are not equal.

The estimated values of all the terms of equation (4.45) are presented in Table 4-3. Table 4-3: Estimated values of s12 , s 22 and

2

¦ m ln(s i

2 i

)

i=1

ni

Bank (i)

Estimated Variance, si2 =

Foreign banks State Owned banks

s12 s 22

216305.9667 44 14785942.5414 44

2 1j

¦e j=1

(n i  2)

mi = n i  2

mi ln(si2 )

4916.0447

44

374.0114

336044.1487

44

559.8999

88

933.9113

Total Now, we have s2 =

44×4916.0447+44× 336044.1487 = 384328.041, mln(s 2 ) 88

M = 1+

88 u ln(384328.0410) 1131.6142, and

1 ª1 1º 1.0114.  3(2  1) «¬ 22 88 »¼

Putting the values of all the terms in equation (4.45), we have Q=

1 2 >1131.6142  933.9113@ ~F1d.f. 1.0114

= 195.4745

(4.50)

Let the level of significance be 5% Decision: At a 5% level of significance with 1 degree of freedom, the table value of the Chi-square test statistic is 3.84. Since the calculated value of the test statistic is greater than the table value, the null hypothesis will be rejected indicating the presence of heteroscedasticity. Bartlett Test for Detecting Heteroscedasticity in Case of Multiple Linear Regression Model

The null hypothesis to be tested is H 0 : ı12 = ı 22 =.......= ı 2m against the alternative hypothesis H1: At least two of them are not equal

Assume that the dependent variable Y linearly depend on k independent variables. Thus, the multiple linear regression equation for the ith group is given by: Yij = ȕi0 +ȕ i1X1ij +ȕ i2 X 2ij +.......+ȕ ik X kij +İ ij , (i = 1, 2,......,m, j = 1, 2,....,N i )

(4.51)

where m is the number of groups, N i is the population size corresponding to the ith group, Yij is the jth observation of the dependent variable Y corresponding to the ith group, X pij is the jth observation of the pth ( p = 1, 2,….,k)


explanatory variable corresponding to the ith group, İ ij is the jth disturbance term corresponding to the ith group. The random error term İ i is normally distributed with zero mean and variance ı i2 I Ni . Let eij be the residual corresponding to the jth observation of the ith group which is given by eij

Yij  ȕˆ 0  ȕˆ i1X1ij  ȕˆ i2 X 2ij  .......  ȕˆ ik X kij

(4.52)

where β̂i0, β̂i1, β̂i2, …, and β̂ik are the OLS estimators of βi0, βi1, βi2, …, and βik respectively. Under the null hypothesis, the test statistic is given by

Q = (1/C)[h ln(s²) − Σ(i=1 to m) hi ln(si²)] ~ χ² with (m−1) d.f.                (4.53)

where

C = 1 + [1/(3(m−1))][Σ(i=1 to m)(1/hi) − 1/h],  hi = ni − k − 1,  h = Σ(i=1 to m) hi,

si² = Σ(j=1 to ni) eij² / (ni − k − 1),  and  s² = Σ(i=1 to m) hisi² / h.

Rejection of the null hypothesis at a given level of significance implies the presence of heteroscedasticity. Ex. 4-3: Let us consider the multiple linear regression equation for G7 countries (except Germany) of the type Yij = ȕi0 +ȕ i1X1ij +ȕ i2 X 2ij +ȕi3 X 3ij +ȕ i4 X 4ij +İ ij , (i = 1, 2,....,6, j = 1, 2,....,Ni )

(4.54)

where Y is the per capita carbon dioxide emissions in metric tons, X1 is the energy consumption (kg of oil equivalent per capita), X2 is the trade openness (% of exports and imports of GDP), X3 is the urbanisation (% of the urban population of total), and X4 is the per capita real GDP (PGDP) (constant 2015 US$). The null hypothesis to be tested is H 0 : ı12 = ı 22 =.....= ı 62 against the alternative hypothesis H1: At least two of them are not equal.

The estimated results for the G7 countries except Germany are given in Table 4-4.

Table 4-4: Residual sum of squares, si², hi and hiln(si²)

Country (i)   Residual SS   hi = (ni−k−1)   si²       hiln(si²)
Canada        15.2598        50             0.3052    −59.3399
France        13.9250        51             0.2653    −66.2050
Italy          5.1847        51             0.1017    −116.5915
Japan          6.5487        51             0.1284    −104.6803
UK             4.7608        51             0.0933    −120.9421
USA            4.0095        51             0.0786    −129.7007
Total                       305                       −597.4595

The estimated value of the pooled variance is given by

s² = (1/305){50×0.3052 + 51×[0.2653 + 0.1017 + 0.1284 + 0.0933 + 0.0786]} = 0.1616

Thus, we have h ln(s²) = 305×ln(0.1616) = −555.8759, and C = 1 + (1/15)[1/50 + 5/51 − 1/305] = 1.0077.

Putting the values of all the terms in equation (4.53), we have

Q = (1/1.0077)[−555.8759 + 597.4595] = 41.2679 ~ χ² with 5 d.f.

(4.55)

Let the level of significance be 5%.

Decision: At a 5% level of significance with 5 degrees of freedom, the table value of the test statistic is 11.07. Since the calculated value of the test statistic is greater than the table value, the null hypothesis will be rejected, implying that the variability of carbon emissions across the G7 countries (except Germany) is not equal.

Bartlett Test for Detecting Heteroscedasticity Based on the Likelihood Ratio (LR) Test Statistic

The null hypothesis to be tested is H 0 : ı12 = ı 22 =........= ı 2m

against the alternative hypothesis H1: At least two of them are not equal.

Bartlett's test is based on the likelihood ratio test statistic. If there are m independent normal random samples with ni observations in the ith sample, the likelihood ratio test statistic for testing the null hypothesis is given by

LR = Π(i=1 to m) (si²/s²)^(ni/2)                (4.56)

where si² = (1/ni) Σ(j=1 to ni) eij², (i = 1, 2, …, m; j = 1, 2, …, ni), s² = (1/n) Σ(i=1 to m) nisi², and n = Σ(i=1 to m) ni.

i=1

To obtain an unbiased test and a modification of -2 lnLR which is a closer approximation to the chi-square with (m-1) degrees of freedom under H 0 . Bartlett test replaces n i by (n i -1) and divides by a scalar constant. This leads to a test statistic is given by Q=

m 1 ª º hln(ıˆ 2 )  ¦ h i ln(ıˆ i2 ) » « M¬ i= 1 ¼

(4.57)

which is distributed as Ȥ 2 with (m-1) degrees of freedom under H0. where M = 1+

ıˆ 2 =

m ª m 1 1º 1 « ¦  » , h i = n i -1, h = ¦ h i 3(m  1) ¬ i=1 h i h ¼ i=1

m

¦ (n i=1

i

 1) = (n  m), ıˆ i2 =

ni 1 eij2 , and ¦ (n i  1) j=1

1 m ¦ (n i  1)ıˆ i2 . n  m i=1

Ex. 4-4: Test the problem of heterogeneity for the given problem in Ex. 4-3 using the Bartlett test based on the likelihood ratio test. Solution: The null hypothesis to be tested is

H 0 : ı12 = ı 22 =.....= ı 62 against the alternative hypothesis H1: At least two of them are not equal.

The estimated residual variances for G7 countries except Germany based on Bartlett are given by ıˆ i2 = (i = 1, 2,…,6) and the results are given in Table 4-5 including the estimated values of h i ln(ıˆ i2 ).

ni 1 ¦ eij2 , (n i  1) j=1

Chapter Four

184

Table 4-5: Residual sum of squares, ıˆ i2 , h i ln(ıˆ i2 ), and

6

¦ h ln(ıˆ i

2 i

)

i= 1

Country (i)

Residual SS

h i = (n i  1)

15.2598 13.9250 5.1847 6.5487 4.7608 4.0095

54 55 55 55 55 55 329

Canada France Italy Japan UK USA Total

ıˆ i2 0.2826 0.2532 0.0943 0.1191 0.0866 0.0729

h i ln(si2 ) -682432 -75.5506 -129.8892 -117.0437 -134.5805 -144.0267 -669.3337

The estimated pooled variance is given by ıˆ 2 =

1 [(54 u 0.2826)+55(0.2532+0.0943+0.1191+0.0866+0.0729)] = 0.1861 329

Thus, we have hln(ıˆ 2 ) = 329×ln(0.1861) = -553.2040, and M = 1+

1 ª1 5 1 º 1.0071.   « 15 ¬ 54 55 329 »¼

Therefore, the test statistic is given by Q=

1 2 > 553.2040  669.3337 @ ~ F5d.f 1.0071

= 115.3151

(4.58)

Let the level of significance be 5% Decision: At a 5% level of significance with 5 degrees of freedom, the table value of the test statistic is 11.07. Since the calculated value of the test statistic is greater than the table value, the null hypothesis will be rejected implying that the variabilities of carbon emissions of G7 (except German) countries are not equal. The Park Test for Detecting Heteroscedasticity

The Park test involves the following steps: Step 1: First, we regress Y on X of the type Yi = ȕ 0 +ȕ1X i +u i , (i =1, 2,.......,n.)

(4.59)

Step 2: Second, we apply the OLS method to equation (4.59) and then we obtain the squared residuals ei2 for all i, which are given by ei2 = ( yi  ȕˆ 0  ȕˆ 1 x i ) 2 , (i = 1, 2,....,n)

(4.60)

Step 3: For the Park test, the structural form of heteroscedasticity is

ıi2 = ı 2 X și e vi

(4.61)

where vi is the new random error term that satisfies all the usual assumptions of a CLRM , the parameter ș measures the importance of heteroscedasticity, ı 2 is the proportional constant, and X is the independent variable which is responsible for heteroscedasticity. The logarithmic transformation of equation (4.61) will be ln(ı i2 ) = ln(ı 2 )+ ș(ln X i ) +vi

(4.62)

Step 4: Since ı i2 is unknown to us, Park suggested using ei2 as a proxy variable of ı i2 , so equation (4.62) can be written as

Heteroscedasticity

185

ln(ei2 ) = ln(ı 2 )+ș(ln X i )+vi zi = Į+șw i +vi ,

(4.63)

where zi = ln(ei2 ), and w i = ln(X i ). Step 5: We set up the null hypothesis against an appropriate alternative hypothesis. The null hypothesis to be tested is H0 : ș = 0

against the alternative hypothesis H1: ș z 0

Under the null hypothesis, the test statistic is given by t=

șˆ ˆ se(ș)

~t (n-2)d.f.

(4.64)

Step 6: Apply the OLS method to equation (4.63) and obtain the OLS estimator of ș. Let șˆ be the OLS estimator of ˆ and let se(ș) ˆ be the standard error of ș. ˆ Putting these values in equation (4.64), we ș. Obtain the standard error of ș, calculate the value of the t-test. Step 7: We find the table value of the test statistic at a given level of significance Į with n-2 degrees of freedom. Step 8: In the final step, we make a decision on whether the null hypothesis will be accepted or not. To make a decision, we compare the calculated value of the test statistic with the table value. If the calculated value of the test statistic falls in the acceptance region, the null hypothesis will be accepted, implying that there is no heteroscedastic problem in the data. Otherwise, we can say that u’s are heteroscedastic. Ex. 4-5: Detect the heteroscedastic problem for the given problem in Ex. 4-1, using the Park test. Solution: To detect the presence of heteroscedasticity using the Park test, first we regress profit (NETP) on sales revenue (REV) of the type NETPi = ȕ 0 +ȕ1REVi +İ i

(4.65)

We apply the OLS method to equation (4.65) and then obtain the squares of the residuals, which are given by ei² = (NETPi − 0.2930 − 0.0292REVi)²

(4.66)

and then, we obtain ln(ei2 ) and ln(REVi ). Now, we regress ln(ei2 ) on ln(REVi ) of the type lnei2 = Į+șln(REVi )+vi

Assume that the new random error term vi satisfies all the usual assumptions of a CLRM. The null hypothesis to be tested is H0 : ș = 0

against the alternative hypothesis H1: θ ≠ 0. The OLS estimate of θ is 2.2741, i.e., θ̂ = 2.2741, with SE(θ̂) = 0.4198.

Under the null hypothesis, the test statistic is given by

(4.67)


t = 2.2741/0.4198 = 5.4179 ~ t with (n−2) d.f.

(4.68)
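The Park regression for this example can be run with a few lines of code. The sketch below (illustrative Python, not the author's RATS workflow; the function name and arguments are assumptions) regresses ln(ei²) on ln(Xi) and returns the estimate of θ with its t-ratio.

import numpy as np
from scipy import stats

def park_test(y, x):
    # Steps 1-2: OLS of y on x and squared residuals
    X = np.column_stack([np.ones(len(x)), x])
    b = np.linalg.lstsq(X, y, rcond=None)[0]
    e2 = (y - X @ b) ** 2
    # Steps 3-5: regress ln(e^2) on ln(x) and test H0: theta = 0
    Z = np.column_stack([np.ones(len(x)), np.log(x)])
    g = np.linalg.lstsq(Z, np.log(e2), rcond=None)[0]
    u = np.log(e2) - Z @ g
    n = len(x)
    s2 = u @ u / (n - 2)
    se_theta = np.sqrt(s2 * np.linalg.inv(Z.T @ Z)[1, 1])
    t = g[1] / se_theta
    p_value = 2 * (1 - stats.t.cdf(abs(t), df=n - 2))
    return g[1], t, p_value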

Let the level of significance be 5%.

Decision: At a 5% level of significance with 40 degrees of freedom, the table value of the test statistic is ±2.015. Since the calculated value of the test statistic does not fall in the acceptance region, the null hypothesis will be rejected. Thus, it can be concluded that there is a heteroscedastic problem in the data.

The Goldfeld and Quandt Test

The Goldfeld-Quandt (1965) test is used for a large sample sizes under the following assumptions: (i) Random error terms İ's are normally distributed. (ii) İ's are serially independent, i.e., Cov(İ i , İ j ) = 0, i z j. The null hypothesis to be tested is H 0 : Var(İ i ) = ı 2 ,  i

against the alternative hypothesis H1: Var(İ i ) z ı 2 , [with an increasing variance] This test procedure involves the following steps: Step 1: First, we order the observations according to the magnitude of the explanatory variable X. Step 2: Second, a certain number “C” of central observations are omitted. Goldfeld and Quandt suggest that, if the sample size is very large, i.e., n t 30, then the value of “C” will be one-quarter of the total number of observations. Then, the remaining observations (n-C) are divided into two equal groups, one group contains the small values of X, and another group contains the large values of X. Step 3 : Third, we regress Y on X for the first part of the sample of the type Y1i = Į 0 +Į1X1i + İ1i , (i = 1, 2,......,(n-C)/2)

(4.69)

We apply the OLS method to equation (4.69) and then obtain the residual sum of squares, which is given by

ESS1 = Σ(i=1 to n1) e1i²                (4.70)

where e1i = Y1i − α̂0 − α̂1X1i.

Step 4: Next, we regress Y on X for the second part of the sample which includes large values of X of the type Y2i = ș 0 +ș1X 2i +İ 2i

(4.71)

We apply the OLS method to equation (4.71) and then obtain the residual sum of squares which is given by n2

ESS2 = ¦ e 22i

(4.72)

i=1

where e 2i

Y2i  șˆ 0  șˆ 1X 2i , and n1 = n 2 =(n-C)/2.

Under the null hypothesis, the test statistic is given by F=

ESS2 / n 2 ~ F(n 2 , n1 ) ESS1 / n1

(4.73)

Heteroscedasticity

187

Since n1 = n 2 , the test statistic is given by F=

ESS2 ~ F(n 2 , n1 ) ESS1

(4.74)

If the value of F tends to one, we can say that there is no problem of heteroscedasticity.

Let the level of significance be 5%.

Decision: At a 5% level of significance with $n_2$ and $n_1$ degrees of freedom, we find the table value of the F-test statistic. If the calculated value of the test statistic is greater than the table value, we reject the null hypothesis, implying the existence of the heteroscedastic problem in the data.

Note: The Goldfeld-Quandt test (1965, 1972) cannot yield any specific estimate of the form of heteroscedasticity. If there is more than one explanatory variable, we cannot apply this test statistic for detecting the presence of heteroscedasticity. The test overcomes the problem of the lack of independence of the least-squares residuals by running two separate regression equations. Some central observations are omitted so that, under heteroscedasticity, the residuals in the numerator correspond to relatively large variances and those in the denominator to relatively small variances; the omitted observations correspond to intermediate-sized variances. The optimal value of C is not obvious. Large values are likely to increase the power of the test through an increase in the value of the F-test statistic, but to decrease the power through a reduction in the degrees of freedom. Goldfeld-Quandt (1965) suggest that C = 8 for n = 30, and C = 16 for n = 60. However, in their later work (1972), they use C = 4 for n = 30.

Ex. 4-6: Test the presence of heteroscedasticity using the Goldfeld-Quandt test for the given problem in Ex. 4-1.

Solution: The assumptions for applying the Goldfeld-Quandt test are

(i) The random error terms $\varepsilon$'s of the regression equation of net profit (Y) on sales revenue (X) are normally distributed. (ii) The $\varepsilon$'s are serially independent, i.e., $Cov(\varepsilon_i, \varepsilon_j) = 0,\ \forall\, i \neq j$.

The null hypothesis to be tested is $H_0: Var(\varepsilon_i) = \sigma^2,\ \forall i$

against the alternative hypothesis $H_1: Var(\varepsilon_i) \neq \sigma^2$ [with an increasing variance].

To test the null hypothesis, we first order the observations according to the magnitude of the explanatory variable sales revenue (X). After that, 4 central observations are omitted from the data, and then the remaining observations are divided into two equal groups. One group includes the small values of X and the other group includes the large values of X; both are given in Table 4-6.

Table 4-6: Observations of small and large samples

        Small Sample                 Large Sample
      Y           X               Y           X
    0.750      15.650           2.650       78.850
    0.780      16.590           2.560       80.260
    0.780      16.960           4.250       80.650
    0.870      18.960           2.350       85.890
    0.850      19.350           2.250      105.550
    0.650      19.560           3.750      112.250
    0.980      20.650           3.950      115.850
    1.250      22.360           5.300      125.500
    1.250      22.650           2.500      131.500
    1.480      24.650           4.500      150.650
    1.260      25.360           9.050      325.260
    1.560      26.680           9.550      335.650
    1.350      32.520          12.650      404.650
    1.660      34.600           8.550      405.250
    1.500      35.260          15.850      425.680
    1.680      36.780          10.650      520.250
    1.750      37.680          20.200      685.500
    1.350      38.260          26.810     1176.900
    1.650      38.290          48.500     1346.600

We now regress Y on X for the first part of the sample and then obtain the residual sum of squares, which is given by

$ESS_1 = 0.5565$                (4.75)

We also regress Y on X for the second part of the sample, which includes the large values of X, and then obtain the residual sum of squares, which is

$ESS_2 = 193.9161$                (4.76)

Under the null hypothesis, the test statistic is given by

$F = \dfrac{193.9161/19}{0.5565/19} \sim F(19, 19)$
$\;\;= 348.4667$                (4.77)

Let the level of significance be 5%.

Table Value: At a 5% level of significance with 19 and 19 degrees of freedom, the table value of the test statistic is 2.19.

Decision: Since the calculated value of the test statistic is greater than the table value, the null hypothesis of homoscedasticity will be rejected. Thus, it can be concluded that the heteroscedastic problem is present in the data.
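A minimal sketch of the Goldfeld-Quandt steps in Python follows. It implements the procedure described above directly (sort by X, drop the central observations, compare residual sums of squares); netp and rev in the usage note are placeholders for the Ex. 4-1 data.

```python
import numpy as np
import statsmodels.api as sm

def goldfeld_quandt(y, x, drop_frac=0.25):
    """Goldfeld-Quandt test: sort by x, drop central observations,
    fit OLS on each half, and compare residual sums of squares."""
    y, x = np.asarray(y, float), np.asarray(x, float)
    order = np.argsort(x)                     # Step 1: order by X
    y, x = y[order], x[order]
    n = len(y)
    c = int(round(drop_frac * n))             # Step 2: central observations to omit
    n1 = (n - c) // 2
    ess = []
    for s in (slice(0, n1), slice(n - n1, n)):   # Steps 3-4: small-X and large-X halves
        resid = sm.OLS(y[s], sm.add_constant(x[s])).fit().resid
        ess.append(np.sum(resid ** 2))
    return ess[1] / ess[0], n1                # eq. (4.74), since n1 = n2

# Usage (netp, rev are placeholders for the Ex. 4-1 data; the example drops 4
# central observations, i.e. drop_frac = 4/len(netp)):
# f_stat, n1 = goldfeld_quandt(netp, rev, drop_frac=4/len(netp))
# Compare f_stat with the F table value for (n1, n1) degrees of freedom.
```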

The Glejser Test

Since the Goldfeld-Quandt (1965) test cannot estimate any specific form of heteroscedasticity which could then be inserted in $Var(\varepsilon)$ to derive the generalised least squares (GLS) estimators, a test in this respect was developed by Glejser (1969). This technique may be used for large samples, and may be used in small samples strictly as a qualitative method to learn something about heteroscedasticity. The test procedure is highlighted below:

Step 1: First, we regress Y on X of the type

$Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$                (4.78)

Step 2: Second, we apply the OLS method to equation (4.78), and then compute the residuals $e_i$'s.

Step 3: Third, we regress the absolute values of the $e_i$'s on the explanatory variable raised to different powers, of the types

$|e_i| = \alpha_0 + \alpha_1 X_i + v_i$                (4.79)

$|e_i| = \alpha_0 + \alpha_1 \sqrt{X_i} + v_i$                (4.80)

$|e_i| = \alpha_0 + \alpha_1 X_i^2 + v_i$                (4.81)

$|e_i| = \alpha_0 + \alpha_1 \dfrac{1}{X_i} + v_i$                (4.82)

$|e_i| = \sqrt{\alpha_0 + \alpha_1 X_i} + v_i$                (4.83)

$|e_i| = \sqrt{\alpha_0 + \alpha_1 X_i^2} + v_i$                (4.84)

and so on.

Heteroscedasticity

189

where vi is the new random variable that satisfies all the usual assumptions of a CLRM. We choose the form of regression which gives the best fit in light of the correlation coefficient and the standard error of the coefficient of Į 0 and Į1 . Now, we can say that, if Į 0 = 0, and Į1 z 0, the situation is referred to as pure heteroscedasticity. If both Į 0 and Į1 are not zero, the case is referred to as mixed heteroscedasticity. Heteroscedasticity will be detected in light of the statistical significance of Į 0 and Į1 . The null hypothesis to be tested is H 0 : Į j = 0, ( j = 0, 1)

against the alternative hypothesis $H_1: \alpha_j \neq 0$.

Under the null hypothesis, we can use any standard test such as the Z-test, t-test, or F-test. If these coefficients are found to be statistically significantly different from zero, we accept that the $\varepsilon$'s are heteroscedastic.

Comment: Goldfeld and Quandt point out that the new random error term $v_i$ has some problems in that its expected value is nonzero, it is serially correlated, and, ironically, it is heteroscedastic. An additional difficulty with the Glejser (1969) method is that the last two equations are non-linear in the parameters and therefore cannot be estimated with the usual OLS procedure. However, the first four models generally give satisfactory results in detecting the presence of heteroscedasticity for large sample sizes.

Ex. 4-7: Test the problem of heteroscedasticity using the Glejser test for the given problem in Ex. 4-1.

Solution: First, we regress net profit (NETP) on sales revenue (REV), which is given by equation (4.65). We apply the OLS method to equation (4.65) and then obtain the residuals $e_i$'s, which are given by

$e_i = NETP_i - 0.2930 - 0.0292\,REV_i$                (4.85)

We regress the absolute values of the residuals on the explanatory variable (REV) of the type

$|e_i| = \alpha_0 + \alpha_1 REV_i + v_i$                (4.86)

We assume that the new random error term $v_i$ satisfies all the usual assumptions. The null hypothesis to be tested is $H_0: \alpha_1 = 0$

against the alternative hypothesis $H_1: \alpha_1 \neq 0$.

We apply the OLS method to equation (4.86) and then obtain $\hat{\alpha}_1$ and $SE(\hat{\alpha}_1)$. We have $\hat{\alpha}_1 = 0.005889$ and $SE(\hat{\alpha}_1) = 0.000493$.

Under the null hypothesis, the test statistic is

$t = \dfrac{0.005889}{0.000493} \sim t_{(40)\,d.f.}$
$\;\;= 11.9360$                (4.87)

The test result indicates the presence of heteroscedasticity in the data at any significance level.
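A minimal sketch of the Glejser test in Python is given below; netp and rev in the usage note are placeholders for the Ex. 4-1 data, and only the linear form (4.79) is shown.

```python
import numpy as np
import statsmodels.api as sm

def glejser_test(y, x):
    """Glejser test: regress |e| on x (the linear form) and t-test the slope."""
    y, x = np.asarray(y, float), np.asarray(x, float)
    e = sm.OLS(y, sm.add_constant(x)).fit().resid       # first-stage residuals
    res = sm.OLS(np.abs(e), sm.add_constant(x)).fit()   # |e| = a0 + a1*x + v
    return res.params[1], res.tvalues[1], res.pvalues[1]

# Usage (netp, rev are placeholders for the Ex. 4-1 data):
# a1, t_stat, p = glejser_test(netp, rev)
# Transformations such as np.sqrt(x) or 1.0/x can be passed instead of x
# to try the other Glejser specifications.
```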

The Spearman's Rank Correlation Test

An alternative to the Goldfeld-Quandt (1965) test, a non-parametric test, is suggested by Johnston (1972). The idea is to compute the rank correlation between the absolute values of the residuals and the independent variables X's. This test procedure involves the following steps:


Step 1: First, we regress Y on X of the type

$Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$                (4.88)

Step 2: Second, we apply the OLS method to equation (4.88) and then compute the residuals e's.

Step 3: Third, we order the e's (ignoring their signs) and the X values in ascending or descending order. Then, we compute the rank correlation coefficient between the absolute values of the e's and the X's, which is denoted by $r_{ex}$ and given by

$r_{ex} = 1 - 6\left[\dfrac{\sum_{i=1}^{n} d_i^2}{n(n^2-1)}\right]$                (4.89)

where $d_i$ is the difference between the ith ranks of the variables |e| and X. $r_{ex}$ lies between -1 and +1. A high rank correlation coefficient suggests the presence of heteroscedasticity.

Step 4: In this step, we set up the null hypothesis. The null hypothesis to be tested is $H_0: \rho_{ex} = 0$

against the alternative hypothesis $H_1: \rho_{ex} \neq 0$.

Step 5: Now, we calculate the value of the test statistic. Under the null hypothesis, the test statistic is given by

$Z = \dfrac{r_{ex}}{\sqrt{1/(n-1)}} \sim N(0, 1)$                (4.90)

or

$t = \dfrac{r_{ex}\sqrt{n-2}}{\sqrt{1-r_{ex}^2}} \sim t_{(n-2)\,d.f.}$                (4.91)

Rejection of the null hypothesis at a given level of significance indicates the presence of a heteroscedastic problem in the data.

Note: We can use this test procedure for both large and small samples. When it is not possible or very difficult to measure the values of the residuals and the independent variables, we rank them and apply Spearman's rank correlation test for detecting the presence of heteroscedasticity. If there is more than one explanatory variable, we compute the rank correlation coefficient between the e's and each of the explanatory variables. If the sample size is small, we apply the t-test. For large sample sizes, we can use the standard normal test.

Ex. 4-8: Verify the existence of heteroscedasticity using Spearman's rank correlation test for the given problem in Ex. 4-1.

Solution: First, we regress net profit (NETP) on sales revenue of the type given by equation (4.65). We apply the OLS method to equation (4.65) and then obtain the absolute values of the residuals, which are given in Table 4-7.

Table 4-7: Absolute values of residuals (e's)

7.84  1.63  0.49    8.90  0.45  0.47    0.11  0.06  0.30    3.13  1.60  0.23
0.54  1.08  0.30    4.83  0.16  0.08    3.57  0.67  0.21    0.54  0.08  0.00
0.74  0.18  0.24    0.18  0.36  0.01    0.28  0.36  0.01    0.19  0.11  0.00
1.12  0.01  0.31    1.34  0.06  0.02


We order the values of the $|e_i|$'s and REV in ascending order and then find the differences of their ranks. Let $d_i$ be the difference between the ith rank of |e| (erank) and that of REV (xrank), which is given by $d_i = erank_i - xrank_i$, i = 1, 2, ..., 42. Then we compute $d_i^2$ and $\sum_{i=1}^{42} d_i^2$, which are given in Table 4-8.

Table 4-8: Calculations of $d_i$ and $d_i^2$

erank  xrank    di    di^2      erank  xrank    di    di^2
  41     41      0       0        9     25    -16     256
  42     42      0       0       14     15     -1       1
  11     40    -29     841       24     14     10     100
  38     38      0       0       25     17      8      64
  29     35     -6      36       12     13     -1       1
  40     39      1       1        5     21    -16     256
  39     37      2       4        8     18    -10     100
  30     36     -6      36       28     12     16     256
  32     34     -2       4       27     10     17     289
  15     29    -14     196       21      9     12     144
  20     30    -10     100       18     11      7      49
  16     33    -17     289       22      8     14     196
  34     28      6      36       10      7      3       9
  35     31      4      16       17      6     11     121
  37     32      5      25        2      2      0       0
  26     27     -1       1       19     19      0       0
   7     24    -17     289        3      5     -2       4
  36     26     10     100        4      3      1       1
  33     20     13     169        1      1      0       0
  13     22     -9      81       23     16      7      49
  31     23      8      64        6      4      2       4

We have $\sum_{i=1}^{42} d_i^2 = 4188$.

Then, we compute the rank correlation coefficient between the absolute values of the e's and REV, which is denoted by $r_{ex}$ and given by

$r_{ex} = 1 - 6\left[\dfrac{4188}{42(42^2-1)}\right]$
$\;\;\;\;= 0.6606$                (4.92)

The null hypothesis to be tested is $H_0: \rho = 0$

against the alternative hypothesis $H_1: \rho \neq 0$.

Under the null hypothesis, the test statistic is given by

$z = \dfrac{0.6606}{\sqrt{1/41}}$
$\;\;= 4.2301$                (4.93)

Let the level of significance be 5%.

Decision: At a 5% level of significance, the table value of the test statistic is ±1.96. Since the calculated value of the test statistic is greater than the table value, the null hypothesis will be rejected, indicating the presence of a heteroscedastic problem in the data.
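A minimal sketch of this rank-correlation check in Python is shown below (scipy's spearmanr computes the rank correlation; netp and rev are placeholders for the Ex. 4-1 data; ties are handled with average ranks, so the value may differ slightly from a hand calculation).

```python
import numpy as np
import statsmodels.api as sm
from scipy import stats

def spearman_het_test(y, x):
    """Spearman rank-correlation test: correlate |OLS residuals| with x
    and compute the z statistic of eq. (4.90)."""
    y, x = np.asarray(y, float), np.asarray(x, float)
    e = sm.OLS(y, sm.add_constant(x)).fit().resid
    r_ex, _ = stats.spearmanr(np.abs(e), x)   # rank correlation coefficient
    z = r_ex * np.sqrt(len(y) - 1)            # eq. (4.90)
    return r_ex, z

# Usage (netp, rev are placeholders for the Ex. 4-1 data):
# r_ex, z = spearman_het_test(netp, rev)
# |z| > 1.96 rejects homoscedasticity at the 5% level.
```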

The Likelihood Ratio Test: Testing Heteroscedasticity for Grouped Data

The likelihood ratio test was proposed by Kmenta (1971) and Maddala (1977) for large samples to detect the presence of heteroscedasticity. This test procedure involves the following steps:

Step 1: First, we consider the simple linear regression equation for the ith group of the type

$Y_{ij} = \beta_0 + \beta_1 X_{ij} + \varepsilon_{ij}, \quad (i = 1, 2, \ldots, m;\ j = 1, 2, \ldots, n_i)$                (4.94)

or a multiple linear regression equation of the type

$Y_{ij} = \beta_{i0} + \beta_{i1} X_{1ij} + \beta_{i2} X_{2ij} + \cdots + \beta_{ik} X_{kij} + \varepsilon_{ij}, \quad (i = 1, 2, \ldots, m;\ j = 1, 2, \ldots, n_i)$                (4.95)

Step 2: Second, we apply the OLS method to equation (4.94) or (4.95) and then obtain the least squares residuals e's for each group. For the ith group, the residuals are denoted $e_{ij}$'s.

Step 3: We then obtain the maximum likelihood estimate of the residual variance for each group. Let $s_i^2$ be the maximum likelihood estimate of the residual variance $\sigma_i^2$ for the ith group, which is given by

$s_i^2 = \dfrac{\sum_{j=1}^{n_i} e_{ij}^2}{n_i}, \quad i = 1, 2, \ldots, m$                (4.96)

Step 4: Now, we regress Y on the X's for the entire sample and then obtain the maximum likelihood estimate of the residual variance for the combined sample. Let $s^2$ be the maximum likelihood estimate of the residual variance $\sigma^2$ for the entire sample, which is given by

$s^2 = \dfrac{\sum_{j=1}^{n} e_j^2}{n}, \quad \text{where } n = \sum_{i=1}^{m} n_i$                (4.97)

Step 5: We set up the null hypothesis against an appropriate alternative hypothesis. The null hypothesis to be tested is $H_0: \sigma_1^2 = \sigma_2^2 = \cdots = \sigma_m^2$

against the alternative hypothesis

$H_1$: at least two of them are not equal.

Step 6: Under the null hypothesis, the test statistic is given by

$LRT = n\ln(s^2) - \sum_{i=1}^{m} n_i \ln(s_i^2) \sim \chi^2_{(m-1)\,d.f.}$                (4.98)

Rejection of the null hypothesis at a given level of significance with the given degrees of freedom implies the presence of heteroscedasticity.

Ex. 4-9: Test the equality of variances of total expenditure between foreign banks and state-owned banks for the given problems in Ex. 2-4 and Ex. 2-5 using the LR test.


Solution: First, we regress total expenditure (TEX) on total income (TINC) for foreign banks of the type

$TEX_t = \beta_0 + \beta_1 TINC_t + \varepsilon_t, \quad t = 1, 2, \ldots, n_1$                (4.99)

We apply the OLS method to equation (4.99) and then obtain the squares of the residuals. Then, we obtain the maximum likelihood estimate of $\sigma_1^2$, which is given by

$s_1^2 = \dfrac{1}{n_1}\sum_{t=1}^{n_1} e_t^2 = \dfrac{216305.9667}{46} = 4702.3036$                (4.100)

where $e_t = TEX_t - 16.7007 - 0.4729\,TINC_t$.

We now regress total expenditure (TEX) on total income (TINC) for state-owned banks of the type

$TEX_t = \alpha_0 + \alpha_1 TINC_t + u_t, \quad t = 1, 2, \ldots, n_2$                (4.101)

We apply the OLS method to equation (4.101) and then obtain the sum of squares of the residuals. Then, we obtain the maximum likelihood estimate of $\sigma_2^2$, which is given by

$s_2^2 = \dfrac{1}{n_2}\sum_{t=1}^{n_2} \hat{u}_t^2 = \dfrac{14785942.5414}{46} = 321433.5335$                (4.102)

where $\hat{u}_t = TEX_t - 110.3103 - 0.8129\,TINC_t$.

We now regress TEX on TINC for the combined sample of the type

$TEX_t = \delta_0 + \delta_1 TINC_t + v_t, \quad t = 1, 2, \ldots, n$                (4.103)

We apply the OLS method to equation (4.103) and then obtain the sum of squared residuals. Then we obtain the maximum likelihood estimate of $\sigma^2$, which is given by

$s^2 = \dfrac{1}{n}\sum_{t=1}^{n} \hat{v}_t^2 = \dfrac{39620224.0056}{92} = 430654.6088$                (4.104)

where $\hat{v}_t = TEX_t - 146.7843 - 0.8081\,TINC_t$.

The null hypothesis to be tested is $H_0: \sigma_1^2 = \sigma_2^2$

against the alternative hypothesis $H_1$: they are not equal.

Under the null hypothesis, the likelihood ratio test (LRT) statistic is given by

$LRT = 92\ln(430654.6088) - \left[46\ln(4702.3036) + 46\ln(321433.5335)\right] \sim \chi^2_{(1)\,d.f.}$
$\;\;\;\;\;= 221.2492$                (4.105)

Let the level of significance be 5%.

Decision: At a 5% level of significance with 1 degree of freedom, the table value of the test statistic is 3.84. Since the calculated value of the test statistic is greater than the table value, we reject the null hypothesis. Thus, it can be concluded that there is a problem of heteroscedasticity in the data. This implies that the variances of the total expenditure of foreign banks and state-owned banks in Bangladesh are not equal.
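A minimal sketch of the grouped-data LR test in Python follows; the function works for any number of groups, and tex_f, tinc_f, tex_s, tinc_s in the usage note are placeholders for the foreign-bank and state-owned-bank series.

```python
import numpy as np
import statsmodels.api as sm
from scipy import stats

def lr_grouped_variance_test(groups):
    """LR test of equal error variances across groups.
    `groups` is a list of (y, x) pairs, one per group; x excludes the constant."""
    stat, n_total, ys, xs = 0.0, 0, [], []
    for y, x in groups:
        y, x = np.asarray(y, float), np.asarray(x, float)
        resid = sm.OLS(y, sm.add_constant(x)).fit().resid
        s2_i = np.sum(resid ** 2) / len(y)        # group MLE, eq. (4.96)
        stat -= len(y) * np.log(s2_i)
        n_total += len(y)
        ys.append(y); xs.append(x)
    resid_all = sm.OLS(np.concatenate(ys), sm.add_constant(np.concatenate(xs))).fit().resid
    s2 = np.sum(resid_all ** 2) / n_total         # pooled MLE, eq. (4.97)
    stat += n_total * np.log(s2)                  # eq. (4.98)
    df = len(groups) - 1
    return stat, stats.chi2.sf(stat, df)

# Usage (tex_f, tinc_f and tex_s, tinc_s are placeholders for the two bank groups):
# lrt, p = lr_grouped_variance_test([(tex_f, tinc_f), (tex_s, tinc_s)])
```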

The Breusch-Pagan Test

An asymptotic test for heteroscedasticity was developed by Breusch and Pagan (1979). It is based on the idea that, if the hypothesis of homoscedasticity is true, the ordinary least squares estimates of the regression coefficients do not differ significantly from the maximum likelihood estimates that allow for possible heteroscedasticity. The log-likelihood function that allows for heteroscedasticity is

$\log(L) = -\dfrac{n}{2}\log(2\pi) - \dfrac{1}{2}\sum_{i=1}^{n}\log(\sigma_i^2) - \dfrac{1}{2}\sum_{i=1}^{n}\left(\dfrac{Y_i - \beta_0 - \beta_1 X_i}{\sigma_i}\right)^2$                (4.106)

The first derivatives of log(L) should be equal to zero when the unknown parameters are replaced by their respective maximum likelihood estimates. If, instead, the unknown parameters are replaced by the ordinary least squares estimates, and if the disturbance terms are homoscedastic, then the first derivatives of (4.106) should not differ significantly from zero. Let us now discuss the test procedure. The Breusch-Pagan test procedure involves the following steps:

Step 1: First, we consider the regression equation of the type

$Y_i = \beta_0 + \beta_1 X_{1i} + \cdots + \beta_k X_{ki} + \varepsilon_i$                (4.107)

where the disturbance terms $\varepsilon_i$'s are assumed to be normally and independently distributed with zero mean and variance $\sigma_i^2$ (i = 1, 2, ..., n), with $\sigma_i^2 = h(z_i'\alpha)$, where h(.) denotes some unspecified functional form, $\alpha$ is a $\{(m+1)\times 1\}$ vector of unknown parameters unrelated to the $\beta$'s, and $z_i' = (1\ z_{1i}\ z_{2i}\ \ldots\ z_{mi})$ is a vector of non-stochastic variables that may be identical to, or functions of, the X variables.

Step 2: Second, we apply the OLS method to equation (4.107) and then obtain the residuals $e_i$'s.

Step 3: Now, we obtain the maximum likelihood estimator of $\sigma^2$, which is given by

$s^2 = \dfrac{\sum_{i=1}^{n} e_i^2}{n}$                (4.108)

Step 4: Next, we construct a new variable $p_i$, which is given by

$p_i = \dfrac{e_i^2}{s^2}, \quad i = 1, 2, \ldots, n$                (4.109)

Step 5: We regress $p_i$ on the z variables, that is,

$p_i = \alpha_0 + \alpha_1 z_{1i} + \alpha_2 z_{2i} + \cdots + \alpha_m z_{mi} + v_i$                (4.110)

where $v_i$ is the new random error term that satisfies all the usual assumptions, and the z's are the non-stochastic variables that may be identical to, or functions of, the X's.

Step 6: Now, we apply the OLS method to equation (4.110) and then obtain the regression/explained sum of squares (RSS):

RSS = total sum of squares - residual sum of squares


Step 7: We set up the null hypothesis against an appropriate alternative hypothesis. The null hypothesis to be tested is $H_0: \alpha_1 = \alpha_2 = \cdots = \alpha_m = 0$

against the alternative hypothesis $H_1$: at least one of them is not zero.

Under the null hypothesis, the Breusch-Pagan (BP) test statistic is given by

$BP = \dfrac{RSS}{2} \sim \chi^2_{(m)\,d.f.}$                (4.111)

Under the null hypothesis, it can also be shown that

$BP = nR_e^2$                (4.112)

is asymptotically distributed as a Chi-square with m degrees of freedom, where $R_e^2$ is the R-squared from the regression equation (4.110). Rejection of $H_0$ at a given level of significance implies the presence of heteroscedasticity.

Note: The Breusch-Pagan test is the most powerful test when heteroscedasticity is present, but in the case of a small sample, the stated level of significance is only a rough indication of the true level. The test is more applicable in situations where the observations are not replicated; the most common situation is one of a single observation per data point. The Breusch-Pagan test covers a wide range of heteroscedastic situations, is very simple, and is based on the OLS residuals. However, it is very sensitive to any violation of the normality assumption, and it requires prior knowledge of what might be causing the heteroscedasticity.

Ex. 4-10: Verify the existence of heteroscedasticity using the Breusch-Pagan (BP) test for the given problem in Ex. 3-3.

Solution: First, we regress total income (TINC) on investment (INV) and money supply (M2) of the type

$TINC_t = \beta_0 + \beta_1 INV_t + \beta_2 M2_t + \varepsilon_t$                (4.113)

We assume that the random error term $\varepsilon_t$ satisfies all the usual assumptions of a CLRM. We apply the OLS method to equation (4.113) and then obtain the residuals $e_t$'s, which are given by

$e_t = TINC_t - 1.1224 - 0.2906\,INV_t - 0.0448\,M2_t, \quad t = 1, 2, \ldots, n$

and then we obtain the maximum likelihood estimator of $\sigma^2$, which is given by

$s^2 = \dfrac{1}{n}\sum_{t=1}^{n} e_t^2 = \dfrac{29886.1422}{46} = 649.6987$

We construct a new variable $p_t$, defined as $p_t = \dfrac{e_t^2}{s^2},\ \forall t$, which is given in Table 4-9.                (4.114)


Table 4-9: New variable p

Year      p        Year      p        Year      p
1974    0.014      1990    0.112      2006    2.567
1975    0.015      1991    0.002      2007    0.093
1976    0.019      1992    0.001      2008    1.889
1977    0.03       1993    0.074      2009    2.637
1978    0.049      1994    0.343      2010    2.702
1979    0.086      1995    0.21       2011    9.011
1980    0.111      1996    0.149      2012    8.024
1981    0.165      1997    0.072      2013    0.102
1982    0.185      1998    0.117      2014    0.097
1983    0.405      1999    0.24       2015    0.294
1984    0.642      2000    0.002      2016    1.986
1985    0.78       2001    0.044      2017    0.619
1986    0.867      2002    0.131      2018    0.755
1987    1.276      2003    0.01       2019    8.457
1988    0.064      2004    0.012
1989    0.118      2005    0.42

We now regress $p_t$ on INV and M2 of the type

$p_t = \alpha_0 + \alpha_1 INV_t + \alpha_2 M2_t + v_t$                (4.115)

where $v_t$ is the random error term that satisfies all the usual assumptions of a CLRM. We apply the OLS method to equation (4.115) and the OLS estimates are given below

$\hat{p}_t = 0.2685 + 0.0034\,INV_t - 0.0003\,M2_t, \quad R^2 = 0.4107$                (4.116)
t-Test:   0.816    1.450    -0.755
SE:       0.329    0.0023    0.0004

The null hypothesis to be tested is H 0 : Į1 = Į 2 = 0

against the alternative hypothesis H1 : At least one of them is not zero.

Under the null hypothesis, the test statistic is given by:

$BP = n \times R_e^2 \sim \chi^2_2 = 46 \times 0.4107 = 18.892$                (4.117)

Decision: At a 5% level of significance with 2 degrees of freedom, the table value of the Chi-square test statistic is 5.99. Since the calculated value of the test statistic is greater than the table value, the null hypothesis will be rejected, implying that there is a heteroscedastic problem in the data.
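This calculation can be reproduced with statsmodels' built-in routine. A minimal sketch in Python follows; tinc, inv and m2 in the usage note are placeholders for the Ex. 3-3 series.

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.diagnostic import het_breuschpagan

def breusch_pagan(y, exog_vars):
    """Breusch-Pagan test: OLS of y on the regressors, then the LM = n*R^2
    statistic from the auxiliary regression of the squared residuals."""
    X = sm.add_constant(np.column_stack(exog_vars))
    resid = sm.OLS(np.asarray(y, float), X).fit().resid
    lm_stat, lm_pval, f_stat, f_pval = het_breuschpagan(resid, X)
    return lm_stat, lm_pval

# Usage (tinc, inv, m2 are placeholders for the Ex. 3-3 series):
# lm, p = breusch_pagan(tinc, [inv, m2])
# lm is compared with the chi-square critical value with 2 degrees of freedom.
```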

The Lagrange Multiplier (LM) Test for Heteroscedasticity

In a linear regression model, the LM test is an appealing candidate for detecting the presence of heteroscedasticity. If we apply the LM test, we do not need to assume that the variances of the disturbances are increasing or decreasing functions of a particular variable. The test procedure is discussed below:


Step 1: First, we regress Y on X’s of the type

(4.118)

Yi = ȕ 0 +ȕ1X1i +........+ȕ k X ki +İ i

We assume that the random error term İ i satisfies all the usual assumptions of a CMLRM. The null hypothesis to be tested is H0 : Var(İi |X1 , X2 ,...,Xk ) = ı2 . Because İ

is assumed to have a zero conditional mean, i.e.,

E(İi |X1 , X2 ,.......,Xk ) = 0 and Var(İ i |X1 , X 2 ,...,X k ) = E(İ i2 |X1 , X 2 ,..,X k ) = ı 2 . Therefore, the null hypothesis of homoscedasticity is equivalent to H 0 : E(İ i2 |X1 , X 2 ,.....,X k ) = ı 2 . This shows that to test for violation of the assumption of homoscedasticity, we want to test whether the expected value of İ i2 is related to one or more of the explanatory variables. If H 0 is false, the expected value of İ i2 will be a function of the X’s. Step 2: Second, a simple approach is to assume a linear function for ı i2 , i.e., ı i2 = Į 0 +Į1X1i +......+Į k X ki +vi

(4.119)

where vi 's are the new random error terms that satisfy all the usual assumptions of a regression equation. Since ıi2 is unknown to us, we have to replace it by ei2 , where ei 's are the OLS residuals which are obtained from equation (4.118). Thus, we can estimate the following equation ei2 = Į 0 +Į1X1i +........+Į k X ki + vi

(4.120)

Step 3: We set up the null hypothesis against an appropriate alternative hypothesis. The null hypothesis to be tested is H 0 : Į1 =Į 2 =.......=Į k =0

against the alternative hypothesis H1: At least one of them is not zero.

Step 4: We apply the OLS method to equation (4.120) and then obtain the R-squared. Let R e2 be the coefficient of multiple determination of equation (4.120). Under the null hypothesis, the Lagrange multiplier (LM) test statistic is given by

LM = n u R e2

(4.121)

which is asymptotically distributed as a Chi-square with k degrees of freedom. Rejection of H 0 at a given level of significance implies the presence of heteroscedasticity. Ex. 4-11: Verify the existence of heteroscedasticity using the LM test for the given problem in Ex. 3-5. Solution: First, we regress carbon dioxide emissions (CO2) on energy consumption (EN), trade openness (OPN), urbanisation (UR) and economic growth (PGDP) of the type CO2t = ȕ 0 +ȕ1EN t +ȕ 2 OPN t +ȕ 3 UR t +ȕ 4 PGDPt +İ t

(4.122)

Assuming that the random error terms İ t 's satisfy all the usual assumptions of a CMLRM. We now apply the OLS method to equation (4.122) and then obtain the squared residuals e2t 's which are given by e 2t = (CO2 t +14.2622  0.0745EN t +0.059OPN t  0.1796UR t +9.86753e-005PGDPt ) 2

(4.123)

We regress e 2t on EN, OPN, UR, and PGDP of the type e 2t = Į 0 +Į1EN t +Į 2 OPN t +Į3 UR t +Į 4 PGDPt +v t

(4.124)

where v t is the random error term that satisfies all the usual assumptions of a classical multiple linear regression equation. The null hypothesis to be tested is


H 0 : Į1 = Į 2 = Į3 = Į 4 = 0

against the alternative hypothesis H1 : At least one of them is not zero.

We apply the OLS method to the equation (4.124) and the OLS estimates are given below

$\hat{e}_t^2 = -2.075 + 0.0037\,EN_t + 0.0049\,OPN_t + 0.0098\,UR_t + 0.000003\,PGDP_t, \quad R^2 = 0.1687$                (4.125)
t-Test:   -0.659    2.153    0.380    0.217    0.166
SE:        3.146    0.002    0.013    0.045    0.00002

Under the null hypothesis, the test statistic is given by

$LM = 49 \times 0.1687 \sim \chi^2_{(4)\,d.f.}$
$\;\;\;= 8.2669$                (4.126)

Let the level of significance be 5%. Decision: At a 5% level of significance with 4 degrees of freedom, the table value of the Chi-square test statistic is 9.49. Since the calculated value of the test statistic is smaller than the table value, the null hypothesis will be accepted implying that there is no heteroscedastic problem in the data.

Here, we can also apply the F-test statistic. Under the null hypothesis, the F-test statistic is given by

$F = \dfrac{R_e^2/k}{(1-R_e^2)/(n-k-1)} \sim F(k,\ n-k-1)$
$\;\;= \dfrac{0.1687/4}{(1-0.1687)/45} \sim F(4, 45)$
$\;\;= 2.2832$                (4.127)

Decision: At a 5% level of significance with 4 and 45 degrees of freedom, the table value of the test statistic is 2.61. Since the calculated value of the test statistic is smaller than the table value, the null hypothesis will be accepted, implying that there is no heteroscedastic problem in the data.
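A minimal sketch of the LM test in Python is given below, producing both the chi-square and the F versions used above; co2, en, opn, ur and pgdp in the usage note are placeholders for the Ex. 3-5 series.

```python
import numpy as np
import statsmodels.api as sm
from scipy import stats

def lm_het_test(y, exog_vars):
    """LM test: regress squared OLS residuals on the explanatory variables,
    then LM = n*R^2 (chi-square, k d.f.) or the equivalent F statistic."""
    X = sm.add_constant(np.column_stack(exog_vars))
    e2 = sm.OLS(np.asarray(y, float), X).fit().resid ** 2
    aux = sm.OLS(e2, X).fit()                      # auxiliary regression (4.120)
    n, k = X.shape[0], X.shape[1] - 1
    lm = n * aux.rsquared                          # eq. (4.121)
    f = (aux.rsquared / k) / ((1 - aux.rsquared) / (n - k - 1))
    return lm, stats.chi2.sf(lm, k), f

# Usage (co2, en, opn, ur, pgdp are placeholders for the Ex. 3-5 series):
# lm, p, f = lm_het_test(co2, [en, opn, ur, pgdp])
```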

White's Test

The Breusch-Pagan (1979) test has been criticised on the grounds that it is very sensitive to a minor violation of the assumption of normality of the regression disturbances. This dependency on normality can be removed by a slight modification of the test statistic. One such modification was proposed by H. White (1980). White's (1980) test is principally based on the fact that heteroscedasticity leads to a least-squares covariance matrix estimator that is inconsistent. This test involves the following steps:

Step 1: First, we regress Y on the X's of the type

$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \cdots + \beta_k X_{ki} + \varepsilon_i$                (4.128)

The typical homoscedastic assumption in truth may be replaced with a weaker assumption in the typical OLS, that the error term İ be uncorrelated with the independent variables, X j's, the squares of the independent variables, X 2j 's, and the cross products of the independent variables, X j X m , where j z m, , and j, m  {1, 2,....,k} . This led to the White test by White which suggested the inclusion of all the above as covariates in the third step regression. Step 2: Second, we apply the OLS method to equation (4.128) and then obtain the least-squares residuals ei 's. Then

we obtain squared residuals ei2 's. Step 3: Third, we consider the following artificial regression equation ei2 = į0 +į1 Z1i +į 2 Z2i +......+į p Zpi +vi

(4.129)


where vi is the new random error term that satisfies all the usual assumptions of a CLRM, and Z's are the function of the explanatory variables X’s. For a simple linear regression equation, White sets Z1i = X i , and Z2i = X i2 , the specification is p =2. For a multiple linear regression equation with two explanatory variables, the specification is p=5, i.e., Z1i = X1i , Z2i = X 2i , Z3i = X1i2 , Z4i = X 22i , and Z5i = X1i X 2i . We determine the Z’s in an analogy fashion for a model with more than two explanatory variables. We compute the coefficient of determination from the artificial regression equation (4.129) which is denoted by R e2 . Step 4: We set up the null hypothesis against an appropriate alternative hypothesis. The null hypothesis to be tested is H 0 : į1 = į 2 =.......=į p = 0

against the alternative hypothesis H1: At least one of them is not zero.

Under the null hypothesis, the test statistic is given by WT = nR e2 ~Ȥ 2p

(4.130)

Rejection of H 0 at a given level of significance implies the presence of heteroscedasticity. Note: This test could be regarded as a one for misspecification as it is also likely to pick up other specification errors such as a misspecified mean function of the dependent variable Y or correlation between X and e in the stochastic regression model. If we are confident about the absence of these latter problems, the test can be regarded as one designed to detect the presence of heteroscedasticity. If this type of problem is present in our regression equation, we cannot apply this test to detect the presence of heteroscedasticity. Ex. 4-12: Test the presence of heteroscedasticity for the given problem in Ex. 4-1 using the White test. Solution: First, we regress net profit (NETP) on sales revenue (REV) of the type

(4.131)

NETPi = ȕ 0 +ȕ1REVi +İ i , i = 1, 2,......,n.

We apply the OLS method to equation (4.131) and then obtain the squares of residuals (ei2 ). The results are given in Table 4-2 . We consider the following auxiliary regression equation ei2 = į0 +į1REVi +į 2 REVi2 +vi

(4.132)

where $v_i$ is the new random error term that satisfies all the usual assumptions of a CLRM. We now apply the OLS method to equation (4.132); the OLS estimates are given below

$\hat{e}_i^2 = 0.6515 - 0.0093\,REV_i + 0.00005\,REV_i^2, \quad R_e^2 = 0.9347$                (4.133)
t-Test:   0.739    -1.394    8.989
SE:       0.8806    0.0066   5.5705e-006

The hypothesis to be tested is H 0 : į1 =į 2 = 0

against the alternative hypothesis H1: At least one of them is not zero.

Under the null hypothesis, the test statistic is given by

$WT = 42 \times 0.9347 \sim \chi^2_{(2)\,d.f.}$
$\;\;\;\;= 39.3564$                (4.134)

Let the level of significance be 5%.


Decision: At a 5% level of significance with 2 degrees of freedom, the table value of the test statistic is 5.99. Since the calculated value of the test statistic is greater than the table value, the null hypothesis will be rejected, implying the presence of heteroscedasticity in the data.
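A minimal sketch of the White test for the simple-regression case in Python follows (the auxiliary regressors are x and x squared, as in equation (4.132)); netp and rev in the usage note are placeholders for the Ex. 4-1 data.

```python
import numpy as np
import statsmodels.api as sm
from scipy import stats

def white_test_simple(y, x):
    """White test for a simple regression: auxiliary regression of e^2 on
    x and x^2, then WT = n*R^2 ~ chi-square(2)."""
    y, x = np.asarray(y, float), np.asarray(x, float)
    e2 = sm.OLS(y, sm.add_constant(x)).fit().resid ** 2
    Z = sm.add_constant(np.column_stack([x, x ** 2]))   # analogue of eq. (4.132)
    aux = sm.OLS(e2, Z).fit()
    wt = len(y) * aux.rsquared                          # eq. (4.130)
    return wt, stats.chi2.sf(wt, 2)

# Usage (netp, rev are placeholders for the Ex. 4-1 data):
# wt, p = white_test_simple(netp, rev)
# For models with several regressors, statsmodels.stats.diagnostic.het_white
# builds the squares and cross products automatically.
```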

The Harvey and Godfrey Test

Harvey (1976) and Godfrey (1978) suggested a test statistic for cases in which there is a specific structure of heteroscedasticity; that is, to use the Harvey-Godfrey test, we must choose a specific functional form for the relationship between the error variance and the variables responsible for determining it. The Harvey-Godfrey test assumes that the error variance is an exponential function of one or more non-stochastic variables, which are usually assumed to be one or more of the explanatory variables in the regression equation. In applying the Harvey-Godfrey test to detect the presence of heteroscedasticity, we do not need to assume that the error variances are increasing or decreasing functions of the non-stochastic variables. The test procedure is discussed below:

Step 1: First, we regress Y on the X's of the type

$Y_i = \beta_0 + \beta_1 X_{1i} + \cdots + \beta_k X_{ki} + \varepsilon_i, \quad i = 1, 2, \ldots, n$                (4.135)

The Harvey-Godfrey test assumes that the error variance is an exponential function of the non-stochastic variables. This can be written as follows ı i2 = exp(Į 0 +Į1 Z1i +Į 2 Z2i +.......+Į p Zpi ), i = 1, 2,………,n

(4.136)

where exp means the exponential function, p is the number of unknown coefficients, and the Z’s are non-stochastic variables with known values. (some or all of the Z’s might be the X’s in the model). The logarithmic transformation of equation (4.136) is given by ln(ı i2 ) = Į 0 +Į1 Z1i +Į 2 Z2i +........+Į p Zpi

(4.137)

Step 2: We apply the OLS method to equation (4.135) and then obtain the residuals ei 's. Then, we obtain squared

residuals ei2 's. Step 3: Third, we consider the following auxiliary regression equation ln(ei2 ) = Į 0 +Į1 Z1i +Į 2 Z2i +..........+Į p Zpi +vi

(4.138)

where vi is a new random error term that satisfies all the usual assumptions of a CLRM. Step 4: We set up the null hypothesis against an appropriate alternative hypothesis. The null hypothesis to be tested is H 0 : Į1 = Į 2 =........= Į p = 0

against the alternative hypothesis H1: At least one of them is not zero.

Step 5: We apply the OLS method to the auxiliary regression equation (4.138) and then obtain the coefficient of determination which is denoted by R e2 . Step 6: Under null hypothesis, the test statistic is given by LM = nR e2 ~Ȥ 2p

(4.139)

Rejection of H 0 at a given level of significance implying the presence of heteroscedasticity in the data. Note: The Harvey-Godfrey test requires prior knowledge of what might be causing heteroscedasticity if it exists. For example, the Harvey-Godfrey test assumes that the error variances are non-linear functions of one or more than one explanatory variable. Thus, if heteroscedasticity exists, but the error variances are linear functions of one or more than one explanatory variable, then this test will not be applicable. The Harvey-Godfrey test for heteroscedasticity covers a wide range of heteroscedastic situations and it is also a very simple test and it is based on the OLS residuals. The


Harvey-Godfrey test is very sensitive to any violation of the normality assumption of the random error terms. If the random error terms are not normally distributed, then this test may not be valid. Ex. 4-13: Detect the presence of heteroscedasticity for the given problem in Ex. 4-1 using the Harvey-Godfrey test. Solution: First, we regress net profit (NETP) on sales revenue (REV) of the type

(4.140)

NETPi = ȕ 0 +ȕ1REVi + İ i , i = 1, 2,........,n.

Here, we assume that error variance ı i2 is an exponential function of the non-stochastic variable REV. We apply the OLS method to equation (4.140) and then obtained the squared residuals which are given in Table 4-2. We now consider the following auxiliary regression equation ln(ei2 ) = Į 0 + Į1REVi +vi

(4.141)

We assume that the new random error term vi satisfies all the usual assumptions of a CLRM. The null hypothesis to be tested is H 0 : Į1 = 0

against the alternative hypothesis H1: Į1 z 0

We apply the OLS method to the auxiliary regression equation (4.141) and the OLS estimates are given below

$\ln(\hat{e}_i^2) = -4.2768 + 0.0077\,REV_i, \quad R_e^2 = 0.2650$                (4.142)
t-Test:   -6.2186    3.7974
SE:        0.6877    0.0020

Under the null hypothesis, the test statistic is given by

$LM = 42 \times 0.2650 \sim \chi^2_{(1)\,d.f.}$
$\;\;\;= 11.1293$                (4.143)

Let the level of significance be 5%.

Decision: At a 5% level of significance with 1 degree of freedom, the table value of the test statistic is 3.84. Since the calculated value of the test statistic is greater than the table value, the null hypothesis will be rejected, indicating the presence of heteroscedasticity in the data. Thus, it can be said that the error variance is an exponential function of the variable REV.
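A minimal sketch of the Harvey-Godfrey test in Python is given below; netp and rev in the usage note are placeholders for the Ex. 4-1 data, and the single regressor is used as the z variable driving the variance.

```python
import numpy as np
import statsmodels.api as sm
from scipy import stats

def harvey_godfrey(y, x):
    """Harvey-Godfrey test: auxiliary regression of ln(e^2) on x,
    then LM = n*R^2 ~ chi-square(1)."""
    y, x = np.asarray(y, float), np.asarray(x, float)
    e2 = sm.OLS(y, sm.add_constant(x)).fit().resid ** 2
    aux = sm.OLS(np.log(e2), sm.add_constant(x)).fit()   # eq. (4.141)
    lm = len(y) * aux.rsquared                            # eq. (4.139)
    return lm, stats.chi2.sf(lm, 1)

# Usage (netp, rev are placeholders for the Ex. 4-1 data):
# lm, p = harvey_godfrey(netp, rev)
```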

Test for Detecting Heteroscedasticity with the Presence of Autocorrelation

When we deal with time-series data, we may wish to test the null hypothesis of homoscedasticity against the alternative hypothesis of autoregressive conditional heteroscedasticity (ARCH). In this case, we may use a simple test developed by Engle (1982).

Engle's Test

This test procedure involves the following steps: Step 1: First, we regress Y on X’s of the type Yt = ȕ 0 +ȕ1X1t +........+ȕ k X kt +İ t , t = 1, 2,.........,T.

(4.144)

Step 2: Second, we apply the OLS method to equation (4.144) and then obtain the least squares residuals e t , t. Then

we obtain squared residuals e 2t ,  t. Step 3: Third, we consider the following Engle's autoregression conditional heteroscedastic model

Chapter Four

202

e 2t = O0 +O1e 2t-1 +O2 e 2t-2 +........+Op e 2t-p +v t

(4.145)

where v t is a new random error term that satisfies all the usual assumptions of a CRLM. Step 4: We apply the OLS method to the equation (4.145) and then we compute the coefficient of determination which is denoted by R e2 . The lagged values of equation (4.145) are selected by using three most popular and widely used criteria namely: (i) Akaike’s Information Criterion (AIC) proposed by Akaike (1973), which is given by AIC = log(ıˆ 2 ) +

2k T

(4.146)

(ii) Schwarz’s Bayesian Information Criterion (SBIC) proposed by Schwarz (1978) which is given by: SBIC = log(ıˆ 2 ) +

k log(T) T

(4.147)

(iii) Hannan–Quinn information criterion (HQC) which is used as an alternative to the AIC and SBIC criteria. It is proposed by Edward James Hannan and Barry Gerard Quinn (1979) and given by HQC = ln ıˆ 2 +

2k log ^log(T)` T

(4.148)

1 T 2 ¦ et , k is the number of parameters in the equation to be T t=1 estimated and T is the sample size. The model with a lower value of AIC or BIC or HQC will be selected.

where ıˆ 2 is the MLE of ı 2 which is given by ıˆ 2 =

Step 5: We set up the null hypothesis against an appropriate alternative hypothesis. The null hypothesis to be tested is H 0 : Ȝ1 = Ȝ 2 =........=Ȝ p = 0

against the alternative hypothesis H1 : At least one of them is not zero.

Under the null hypothesis, the test statistics is given by LM = T×R e2 ~Ȥ 2p

(4.149)

Rejection of the null hypothesis at a given level of significance implies the presence of heteroscedasticity with autocorrelation. Note: This test is a simplification of the Lagrange multiplier test. This test is closely related to the Breusch-Pagan test and to the White test. Ex. 4-14: Test the null hypothesis of homoscedasticity against ARCH using the Engle test for the given problem in Ex. 2-5. Solution: First, we regress total expenditure (TEX) on total income (TINC) of the type TEX t = ȕ 0 +ȕ1TINC t + İ t , t = 1, 2,.....,T.

(4.150)

Assuming that the random error terms e t ,s satisfy all the usual assumptions of a CLRM, we apply the OLS method to equation (4.150) and then obtain the squares of the OLS residuals given by e 2t = (TEX t  110.3101  0.8129TINC t ) 2 , t = 1, 2,…………,T

(4.151)

See the results in Ex. 2-5. We now consider the following Engle's autoregression conditional heteroscedastic model e 2t = Ȝ 0 + Ȝ1e2t-1 + Ȝ 2 e2t-2 +.......+ Ȝ p e2t-p + v t

(4.152)

Heteroscedasticity

203

Assuming that the new random error term v t satisfies all the usual assumptions, we then apply the OLS method to equation (4.152) and then the values of AIC, SBIC and HQC criteria are obtained and given in Table 4-10. Table 4-10: AIC, SBIC and HQC criteria

Lags 1 2 3 4 5

AIC 27.3915 27.4354 27.5029 27.5198 27.4911

SBIC 27.4718 27.5571 27.6667 27.7267 27.7418

HQC 27.4214 27.4805 27.5633 27.5957 27.5824

It is found that for lag 1, the AIC, SBIC, and HQC are smaller. Therefore, the final Engle's autoregression conditional heteroscedastic model will be e 2t =O0 +O1e2t-1 +v t

(4.153)

The null hypothesis to be tested is H 0 : O1

0

against the alternative hypothesis H1 : O1 z 0

The OLS estimates of the equation (4.153) are given below eˆ 2t = 210702.7448+0.3580e 2t-1 , R e2 =0.1281½ ° t-Test: 1.5312 2.5137 ¾ ° SE: 137606.3044 0.1424 ¿

(4.154)

Under the null hypothesis, the test statistic is given by 2 LM = 45 u 0.1281~F1d.f.

= 5.7556

(4.155)

Let the level of significance be 5%. Decision: At a 5% level of significance with 1 degree of freedom, the table value of the test statistic is 3.84 . Since the calculated value of the test statistic is greater than the table value, the null hypothesis will be rejected implying the presence of an autoregressive conditional heteroscedastic problem in the data. Note: We are very fortunate that different software packages like EViews, GAUSS, LIMDEP, Python, R, RATS, SHAZAM, SPSS, STATA, and TSP, etc. can be applied directly to detect the presence of heteroscedasticity based on different tests.
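A minimal sketch of Engle's ARCH test in Python follows; tex and tinc in the usage note are placeholders for the Ex. 2-5 series, and the LM statistic is computed from the usable (post-lag) observations, as in the example above.

```python
import numpy as np
import statsmodels.api as sm
from scipy import stats

def engle_arch_test(y, x, lags=1):
    """Engle's ARCH test: regress squared OLS residuals on their own lags
    and use LM = (usable T) * R^2 ~ chi-square(lags)."""
    y, x = np.asarray(y, float), np.asarray(x, float)
    e2 = sm.OLS(y, sm.add_constant(x)).fit().resid ** 2
    # build the matrix of lagged squared residuals
    Z = np.column_stack([e2[lags - i - 1: len(e2) - i - 1] for i in range(lags)])
    aux = sm.OLS(e2[lags:], sm.add_constant(Z)).fit()   # eq. (4.153) when lags = 1
    lm = len(e2[lags:]) * aux.rsquared                   # eq. (4.149)
    return lm, stats.chi2.sf(lm, lags)

# Usage (tex, tinc are placeholders for the Ex. 2-5 series):
# lm, p = engle_arch_test(tex, tinc, lags=1)
# statsmodels.stats.diagnostic.het_arch offers the same test on a residual series.
```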

4.8 Estimation of the Heteroscedastic Model

Different methods have been derived to estimate a heteroscedastic model. In this section, the most popular and widely applicable methods are discussed for estimating a heteroscedastic model.

Weighted Least Squares (WLS) Method

After detecting the presence of heteroscedasticity in the data using any test statistic, we need to resolve the issue of how to estimate the heteroscedastic model. The appropriate solution is to transform the original model in such a way that we obtain a form in which the transformed disturbance term has a constant variance, and then, we can apply the OLS method to the transformed model. The transformation of the original model depends on the particular form of heteroscedasticity. In general, the transformation of the original model lies in dividing through the original relationship by the square root of the term which is responsible for heteroscedasticity. Let us now consider the regression equation of the type:

Chapter Four

204

(4.156)

Yi = ȕ 0 +ȕ1X i +u i

where u i 's are heteroscedastic disturbances and satisfy all other usual assumptions. Suppose the form of heteroscedasticity Var(u i ) = ıi2 is known to us. Then, the appropriate transformation of the original model reduces to the form ȕ ȕX u Yi = 0+ 1 i+ i ıi ıi ıi ıi Yi* = ȕ 0 X*0i +ȕ1X1i* + u *i

where Yi* =

(4.157)

Yi X u 1 , X*0i = , X1i* = i , and u *i = i . ıi ıi ıi ıi

The new error term u *i satisfies all the usual assumptions, and hence, we can apply the OLS method to the transformed equation (4.157) to obtain the estimates of the parameters. Thus we can obtain the values of the coefficient of the original model. The WLS estimators ȕˆ 0 and ȕˆ 1 of ȕ 0 and ȕ1 are obtained by minimising the residual sum of squares. The residual sum of squares is given by n

¦ uˆ

*2 i

i=1

n

= ¦ (Yi*  ȕˆ 0 X*0i  ȕˆ 1X1i* ) 2

(4.158)

i=1

Taking partial derivatives of equation (4.58) with respect to ȕˆ 0 and ȕˆ 1 , and then equating to zero, we have n

į¦ uˆ *2 i i=1

įȕˆ 0

n

= 0 Ÿ 2¦ (Yi*  ȕˆ 0 X*0i  ȕˆ 1X1i* )X*0i

0

(4.159)

i=1

n

į¦ uˆ *2 i i=1

įȕˆ 1

n

=0 Ÿ 2¦ (Yi*  ȕˆ 0 X*0i  ȕˆ 1X1i* )X1i*

0

(4.160)

i=1

Equations (4.159) and (4.160) can also be written as the following reduced forms n

n

i=1

i=1

* * ˆ ȕˆ 0 ¦ X*2 0i +ȕ1 ¦ X1i X 0i =

n

* i

¦Y X

* 0i

(4.161)

i=1

n

n

n

i=1

i=1

i=1

ȕˆ 0 ¦ X1i* X*0i +ȕˆ 1 ¦ X1i*2 = ¦ Yi* X1i*

(4.162)

From equation (4.161), we have n

ȕˆ 0 =

* i

¦Y X i=1

* 0i

n

 ȕˆ 1 ¦ X1i* X*0i i=1

n

¦X

*2 0i

i=1

Putting the value of ȕˆ 0 in equation (4.162), we have

(4.163)

Heteroscedasticity

205

ª n * * ˆ n * * º n « ¦ Yi X 0i  ȕ1 ¦ X1i X 0i » * * * * *2 i=1 i=1 ˆ « »   Y X X X ȕ ¦ ¦ i 1i 1i 0i 1 ¦ X1i = 0 n « » *2 i=1 i=1 i=1 X 0i ¦ «¬ »¼ i=1 n

n

2

n ˆ ª X* X* º ȕ X X Y X ¦ 1 1i 0i » ¦ ¦ « n n ¬ i=1 ¼  ȕˆ * * *2 i=1 i=1  Y X + ¦ i 1i 1 ¦ X1i = 0 n n *2 *2 i=1 i=1 ¦ X 0i ¦ X 0i n

* 1i

n

* 0i

* i

* 0i

i=1

n

n

i=1

i=1

i=1

n

n

i=1

i=1

n ª n ª n * *º *2 º  ȕˆ 1 « ¦ X*2 X ¦ 0i 1i » « ¦ X1i X 0i » i=1 ¼ ¬ i=1 ¼ = ¬ i=1

¦ X*20i ¦ Yi* X1i*  ¦ X1i* X*0i ¦ Yi*X*0i n

¦X

n

*2 0i

¦X

i=1

n

ȕˆ 1 =

*2 0i

n

* i

n

* 1i

i=1

* 1i

n

n

X*0i ¦ Yi* X*0i i=1

n

ª ª *2 *2 º * * º « ¦ X 0i ¦ X1i »  « ¦ X1i X 0i » i=1 ¬ i=1 ¼ ¬ i=1 ¼

Defining w i = n

ȕˆ 1 =

i=1

n

(4.164)

2

1 , equation (4.164) can be written as ı i2

n

n

n

¦w ¦w X Y ¦w X ¦w Y i

i=1

*2 0i

i=1

¦X ¦Y X  ¦X i=1

2

i

i

i

i

i=1

i

i=1

i

i

i=1

n ª n ª n º 2º «¦ w i ¦ w i Xi »  «¦ w i Xi » i=1 ¬ i=1 ¼ ¬ i=1 ¼

(4.165)

2

The variance of ȕˆ 1 is given by n

Varr(ȕˆ 1 ) =

¦w

i

i=1

n ª n ª n º 2º w w X  ¦ ¦ i i i « » «¦ w i Xi » i=1 ¬ i=1 ¼ ¬ i=1 ¼

(4.166)

2

If the form of heteroscedasticity is ı i2 = ı 2 X i , where ı 2 is the finite constant. Thus, the appropriate transformation of the original model reduces to the form Yi Xi

= ȕ0

1 Xi

+ȕ1

Xi Xi

+

ui

(4.167)

Xi

w i Yi = ȕ 0 w i +ȕ1 w i X i +w i u i , where w i = Yi* = ȕ 0 w i +ȕ1X*i +u *i

1 Xi

.

(4.168)

where Yi* = w i Yi , X*i = w i X i , and u *i = w i u i . Since the new random error term u *i satisfies all the usual assumptions, we can apply the OLS method to obtain the estimates of the parameters, and hence, we can obtain the values of the coefficient of the original model. If the form of heteroscedasticity is ı i2 = ı 2 X i2 , where ı 2 is the finite constant. Thus, the appropriate transformation of the original model reduces to the form

Chapter Four

206

ȕ ȕX u Yi = 0 + 1 i+ i Xi Xi Xi Xi

Yi* = ȕ1 +ȕ 0 X*i + u *i

(4.169)

Yi u 1 , X*i = , and u *i = i . Xi Xi Xi

where Yi* =

The new random error term u *i of the transformed equation (4.169) satisfies all the usual assumptions, and hence, we can apply the OLS method to obtain the estimates of the parameters. Thus, we can obtain the values of the coefficient of the original model. If the form of heteroscedasticity is ı i2 = ı 2 (a 0 +a1X i ) , then the appropriate transformation a 0 + a1Xi . Thus, the transformation model will be

implies the division of the original equation by Yi (a 0 + a1X i )

ȕ0

=

(a 0 + a1X i )

+

ȕ1X i (a 0 + a1X i )

ui

+

(a 0 + a1 X i ) 1

w i Yi = ȕ 0 w i +ȕ1 w i X i +w i u i , where w i =

a 0 +a1X i

Yi* = ȕ 0 w i +ȕ1X*i +u *i

(4.170)

.

(4.171)

where Yi* = w i Yi , X*i = w i X i , and u *i = w i u i . The new random error term u *i of the transformed equation (4.171) satisfies all the usual assumptions, and hence, we can apply the OLS method to the transformed model to obtain the estimates of the parameters. Thus, we can also obtain the coefficient of the original model. This transformation is equivalent to the application of the weighted least squares. In ordinary least squares method, we minimise the simple sum of squares residuals, i.e.,

n

¦e

2 i

in which each

i=1

residual is assigned equal weight. That is,

n

¦e

2 i

is the unweighted residual sum of squares where u i 's are estimated by

i=1

ei 's is assumed to give an equally precise indication of the true regression line due to the assumption of homoscedasticity. However, if the variance of u i 's are not constant but increases with the increasing values of X, the greater dispersion of the observations on the right causes them to give a less accurate indication of where the true line lies. Therefore, it seems plausible to pay less attention to those observations for which dispersion is less, and hence provide a more accurate idea of the position of the true line. This can be achieved by assigning different weights to each u i . It is reasonable to use a weight the ratio 1/ ı i2 , i.e., to divide each residual by the variance of the disturbances term. When a disturbance u i is large, it's variance will be large and the weight will be small. Hence the large disturbances are assigned smaller weights. Thus, instead of minimising of the simple sum of squared residuals as in OLS, we minimise the weighted sum of squared residuals, i.e.,

ª ei2 º 2 » i=1 ¬ i ¼ n

¦ «ı

n

1

¦ı i=1

2 i

(yi  ȕˆ 0  ȕˆ 1X i ) 2

(4.172)

Hence, the reason for naming this method is the weighted least squares (WLS) method. By applying calculus, we may derive the partial derivatives of the weighted sum of squares residuals and the formulas for ȕˆ 0 and ȕˆ 1 . These weighted least-squares estimates are the best linear unbiased estimates. Ex. 4-15: Estimate the heteroscedastic model for the given problem in Ex. 4-1 using the weighted least-squares method. Solution: Let the linear regression equation of net profit (NETP) on sales revenue (REV) be given by NETPi = ȕ 0 +ȕ1REVi +İ i , i = 1, 2,…..,n.

(4.173)

Heteroscedasticity

207

We assume that the random error term İ i satisfies all other usual assumptions except the assumption of homoscedasticity. We assume that the form of heteroscedasticity is ıi2 = ı 2 REVi , where ı 2 is a finite constant term. Thus, we divide equation (4.173) by NETPi REVi

= ȕ0

1 REVi

+ ȕ1

REVi REVi

+

REVi , and the transform equation will be İi REVi

(4.174)

Yi = ȕ 0 w i +ȕ1X i +u i

where Yi =

NETPi REVi

, wi =

1 REVi

, X i = REVi , and u i =

İi REVi

.

Since the new random error term u i of the transformed equation (4.174) satisfies all the usual assumptions, we can apply the OLS method to equation (4.174) which is called the WLS method. The WLS estimates are obtained using RATS and given in Table 4-11. Table 4-11: The WLS estimates

Linear Regression - Estimation by Weighted Least Squares, Dependent Variable NETP Variable Coeff. Std Error T-Stat Constant 0.47865 0.108402 4.41544 REV 0.02814 0.0012461 22.58476 Mean of Depen. Variable 42 Usable Observations Std Error of Depen. Variable 40 Degrees of Freedom Standard Error of Estimate 0.8249 Centered R2 Sum of Squared Residuals 0.8206 Adjusted (R2) Log Likelihood 0.9539 Uncentered R2 Durbin-Watson Statistic 40.065 nR2

Signif 0.00007 0.00000 0.3649 1.4492 0.2207 0.3497 40.9609 1.9218
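The same weighted fit can be reproduced with other software. A minimal sketch in Python with statsmodels is shown below; netp and rev are placeholders for the Ex. 4-1 data, and the weights follow the assumed form of heteroscedasticity $\sigma_i^2 = \sigma^2 REV_i$.

```python
import numpy as np
import statsmodels.api as sm

def wls_fit(netp, rev):
    """WLS under the assumption Var(u_i) = sigma^2 * REV_i,
    i.e. weights proportional to 1/REV_i."""
    netp, rev = np.asarray(netp, float), np.asarray(rev, float)
    X = sm.add_constant(rev)
    return sm.WLS(netp, X, weights=1.0 / rev).fit()

# Usage (netp, rev are placeholder arrays of net profit and sales revenue):
# res = wls_fit(netp, rev)
# res.params, res.bse and res.rsquared correspond to the quantities in Table 4-11.
```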

Comment: From the estimated results, it is found that for increasing sales revenue to 1 million BDT, the average net profit will increase by 0.02814 million BDT which is statistically significant at any significance level. Also, it is found that, when the sales revenue is at zero level, the average net profit will be 0.47864 million BDT which is also statistically significant. From the estimated value of R 2 , it can be said the fit is quite good. Problems with Using the WLSE Estimator

The main problem with the WLS estimator is that, to use it, we must know the true error variance and the standard deviation of the error for each observation in the sample. However, the true error variance is always unknown and unobservable. Thus, the WLS estimator is not feasible. Generalised Least Squares Estimator (GLS)

Sometimes, it may happen that some of the observations are less reliable than others in our regression model. This indicates that the variance of the observations is not equal. In other words, the non-singular matrix is not of the form Var(İ) = ı 2 I n , but it is a diagonal with unequal diagonal elements. It may also happen in some problems that the off-diagonal elements of Var(İ) are not zero or both of these events occur. When one or both of these events occur, the OLS method is not applicable for estimating the regression equation. In this situation, we transform the original equation in such a way that the random error terms of the transformed model satisfy all the usual assumptions of a CLRM. The transformation of the original model depends on the particular form of heteroscedasticity. In general, the transformation of the original model lies of dividing through the original relationship by the square root of the term which is responsible for the heteroscedasticity. The estimators which are obtained from the transformed model using the principle of least squares are called the generalised least squares estimators. Let us now discuss the technique to obtain the GLS estimators. Let us now consider a multiple linear regression equation in matrix form, of the type Y = Xȕ+İ

All the terms of equation (4.175) have already been defined previously.

(4.175)

Chapter Four

208

ªı12 « «0 « . Var(İ) = « « . « . « «¬ 0

0 ı

0 ..... 0

0º » 0 ..... 0 0 » . ..... . . » » . ..... . . » . ..... . . » » 0 ..... 0 ı 2n »¼

2 2

. . . 0

(4.176)

Let Var(İ i ) = ı 2 w i ,  i, where w i is a non-stochastic variable, be identical or function of X’s. Thus, we have ª w1 «0 « « . 2 Var(İ) = ı « « . « . « ¬« 0

0 w2 . . . 0

0 0 . . . 0

..... 0 0 º ..... 0 0 »» ..... . . » » ..... . . » ..... . . » » ..... 0 w n »¼

= ı2ȍ ª w1 «0 « « . where : = « « . « . « ¬« 0

(4.177) 0 w2 . . . 0

0 0 . . . 0

..... 0 0 º ..... 0 0 »» ..... . . » » ..... . . » ..... . . » » ..... 0 w n ¼»

Therefore, we cannot apply the OLS method to estimate equation (4.175) or to estimate the parameters of equation (4.175). Thus, we have to transform equation (4.175) in such a way that the variance-covariance matrix of İ* will be ı 2 I n , where İ* is a vector of random error terms of the transformed equation. Since : is a positive definite matrix, there exists a matrix P such that PcP = :-1

PcP

-1

=:

P -1Pc-1 = : PP -1Pc-1Pc = P:Pc I n = P:Pc

(4.178)

Multiplying equation (4.175) by the P matrix, we have PY = PXȕ+Pİ

Y* =X*ȕ+İ*

(4.179)

The variance-covariance matrix of the vector of random error terms of the transformed equation (4.179) is given by Var(İ* ) = Var(Pİ) = PVar(İ)Pc = Pı 2 ȍPc

= ı2 In

(4.180)

Heteroscedasticity

209

Since İ* satisfies the Gauss-Markov conditions, we can apply the OLS method to the transformed equation (4.180). If we apply the OLS method to the transformed equation, the OLS estimator of ȕ is given by ȕˆ = (X*c X* )-1X*c Y* = ((PX)cPX)XcPcPY

= (XcPcPX)-1X cPcPY = (Xc:-1X)-1X c:-1Y

(4.181)

This estimator is referred to as the GLS estimator of ȕ. If : =I, then this GLS estimator reduces to the OLS estimator. From equation (4.181), we can only compute the GLS estimator if : is known to us. If it is unknown to us, first we have to estimate : and then obtain the estimated value of ȕ which is called the feasible generalised least squares estimator of ȕ (FGLS or EGLS). For a simple linear regression equation, the GLS estimator can be obtained by regressing PY on PX . ª « « « « « where PY = « « « « « « « «¬

Y1 º ª X1 » « w1 » « w1 « X2 Y2 » » « w2 » « w2 » , PX = « . » « . » « . . » « . » « . « X Yn » » « n w n »¼ «¬ w n

º » » » » » » , and P = » » » » » » »¼

ª « « « « « « « « « « « « «¬

1 w1 0

0 1 w2

0 ... ... 0 ... ...

. . .

. . .

. . .

... ... ... ... ... ...

0

0

0 ... ...

º 0 » » » 0 » » . »» . . » » . » 1 » » w n »¼

Applying the OLS method to the transformed equation, we can obtain the GLS/WLS estimator of ȕ which is given by -1

1 ª n º ª n º ȕˆ = « ¦ h i X i Xci » « ¦ h i X i Yi » , where h i = wi ¬ i=1 ¼ ¬ i=1 ¼

(4.182)

The variance-covariance matrix of ȕˆ is given by ˆ = ı 2 (X c: -1X)-1 var(ȕ) ª n º = ı 2 « ¦ h i X i Xci » ¬ i=1 ¼

-1

(4.183)

Ex. 4-16: Obtain the GLS estimator for the given problem in Ex. 4-10. Solution: First, we regress total income (TINC) on money supply (M2) and investment (INV) of the type TINC t = ȕ 0 +ȕ1M2 t +ȕ 2 INVt +İ t , t = 1, 2,......,T.

(4.184)

where the disturbance terms İ t 's are assumed to be normally and independently distributed with zero mean and variance ı 2t . In matrix notation, equation (4.184) can be written as Y = Xȕ+İ

(4.185)

Chapter Four

210

where Y is a (T×1) matrix of the observations of the dependent variable TINC, X is a (T×3) matrix of the observations of the dependent variables M2 and INV, ȕ is a (3 u 1) matrix of parameters ȕ 0 , ȕ1 , and ȕ3 , and İ is a (T×1) vector of random error terms. The variance-covariance matrix of İ is ª ı12 « «0 « . Var(İ) = « « . « . « «¬ 0

0 ı

2 2

. . . . . .

.

. . .

.

. . .

.

. . .

0

. . .

0º » 0» . » » . » . » » ı T2 »¼

=:

(4.186)

The GLS estimators of the equation (4.184) can be obtained by regressing w t TINC t , on w t , w t M2t , and w t INVt of the type: w t TINC t = ȕ 0 w t + ȕ1 w t M2 t +ȕ 2 w t INVt +İ*t

(4.187)

where İ*t is a new random error term that satisfies all the usual assumptions of a CLRM and w t = 2 t

2 t

2 t

1 ı

2 t

=

1 . Since ıt

2 t

ı is unknown to us, we use e as a proxy variable of ı . e 's are the squared OLS residuals from equation (4.184). Applying the OLS method to equation (4.187) the estimators which are obtained are called the GLS estimators. The RATS is used to obtain the GLS estimators whose results are given in Table 4.12. Table 4-12: The GLS estimates

Linear Regression - Estimation by GLS, Dependent Variable Y (TINC)
Variable | Coeff. | Std Error | T-Stat | Signif
Constant | -0.0952 | 0.6326 | -0.1505 | 0.8811
M2 | 0.0441 | 0.0026 | 16.6233 | 0.0000
INV | 0.2966 | 0.0135 | 21.8603 | 0.0000
Usable Observations 46; Degrees of Freedom 43; Centered R² 0.9986; Adjusted R² 0.9986; Uncentered R² 0.9991; T×R² 45.958; Mean of Dependent Variable 17.4137; Std Error of Dependent Variable 26.1677; Standard Error of Estimate 0.9801; Sum of Squared Residuals 41.3071; Log Likelihood -170.9985; Durbin-Watson Statistic 0.7684

In matrix form, the GLS estimator of β is given by

β̂_GLS = (X'Ω̂⁻¹X)⁻¹X'Ω̂⁻¹Y   (4.188)

The results are given below:

(X'Ω̂⁻¹X)⁻¹ = [0.4166, -6.8244e-004, 2.5698e-003; -6.8244e-004, 7.3380e-006, -3.6686e-005; 2.5698e-003, -3.6686e-005, 1.9167e-004], and X'Ω̂⁻¹Y = [212.3723; 447197.1024; 84296.2371].

Putting these values in equation (4.188), we have

β̂_GLS = [-0.0952; 0.0441; 0.2966]   (4.189)

The variance-covariance matrix of β̂_GLS is given by

Var(β̂_GLS) = [0.4002, -6.5557e-004, 2.4686e-003; -6.5557e-004, 7.0491e-006, -3.5241e-005; 2.4686e-003, -3.5241e-005, 1.8412e-004]   (4.190)

Comment: From the estimated results, it is found that both M2 and INV have significant positive impacts on total income in the banking sector of Bangladesh at any conventional significance level. From the estimated value of R², it can be said that the fit is very good.

Feasible Generalised Least Squares (FGLS) Estimator

The GLS estimator requires that the error variance ı i2 should be known for each observation in the sample. However, the structure of the heteroscedasticity, ıi2 is generally unknown to us. Hence, to make the GLS estimator feasible, we can use the sample data to obtain an estimate of ı i2 for each observation in the sample. We can then obtain the GLS estimator using the estimate of ı i2 . This estimator will be different from the GLS estimator. This estimator is called the Feasible Generalised Least Squares (FGLS) estimator. The following steps are involved in obtaining the FGLS estimators of the parameters of a regression equation: Step 1: First, we regress Y on k explanatory variables of the type Yi = ȕ 0 +ȕ1X1i +.......+ȕ k X ki + İ i , i = 1, 2,......,n.

(4.191)

In practice, a common and widely used form of heteroscedasticity is the multiplicative form of heteroscedasticity. Here, it is assumed that the error variance ı i2 is a function of some exogenous variables z i 's. An exponential function is used to make sure that the error variance will be positive for all parameter estimates. In particular, it is assumed that Var(İ i |x i ) = ıi2 = ı 2 exp{Į1z1i +Į 2 z 2i +...........+Į p z pi }

= ı 2 exp(zicĮ)

(4.192)

where “exp” means the exponential function, p is the number of unknown coefficients, zi is a vector of p observed variables, and z’s are non-stochastic variables with known values. (some or all of the z’s might be equal to the X’s or the function of X’s in the model). Except this, all other assumptions are satisfied. Step 2: Second, apply the OLS method to equation (4.191) and then obtain the residuals which are given by ei = Yi  ȕˆ 0  ȕˆ 1X1i  ......  ȕˆ k X ki , i = 1, 2,.......,n.

(4.193)

Step 3: Next, obtain the squares of these residuals, i.e., ei2 ,  i, i = 1, 2,........,n. Now, regress ln(ei2 ) on z’s , that is, ln(ei2 ) = Į 0 +Į1z1i +Į 2 z 2i +......+Į k z ki +vi , i = 1, 2,.......,n.

(4.194)

where vi is a new random error term that satisfies all the usual assumptions of a CLRM. Step 4: Apply the OLS method to equation (4.194) and then obtain the least squares estimates of Į 0 , Į1 , Į 2 ,...., and Į k . The OLS estimator Įˆ will be consistent of Į. Step 5: Using these LS estimates, obtain the predicted values of ei2 ,  i, i = 1, 2,........,n. , which are given by eˆ i2 = exp ^Įˆ 0 +Įˆ 1z1i +Įˆ 2 z 2i +.......+Įˆ k z ki ` exp ^z ciDˆ `

Step 6: Calculate the weight w i =1

(4.195) eˆ i2 for each observation.

Step 7: Multiply Yi , X1i ,.......,and X ki by w i for each observation.


Step 8: Regress w i Yi on w i , w i X1i ,......, and w i X ki of the type

w i Yi = ȕ 0 w i +ȕ1 w i X1i +........+ȕ k w i X ki + İ*i

(4.196)

Apply the OLS method to equation (4.196) and obtain the OLS estimates of the parameters. This yields the FGLS estimators of β₀, β₁, … , and βₖ. Let β̂_FGLS be the feasible generalised least squares estimator of β, where β is the vector of the (k+1) parameters β₀, β₁, … , and βₖ. Then, a consistent estimator of the variance-covariance matrix of β̂_FGLS is given by

Var(β̂_FGLS) = σ̂²[Σᵢ₌₁ⁿ xᵢxᵢ'/êᵢ²]⁻¹   (4.197)

where σ̂² = [1/(n-k-1)]Σᵢ₌₁ⁿ (yᵢ - xᵢ'β̂_FGLS)²/êᵢ² is a consistent estimator of σ².
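The eight steps above can be collected into one routine. The Python sketch below is an illustration only: it assumes the multiplicative form (4.192), uses numpy's least-squares solver, and the function name and data layout are this sketch's assumptions rather than any package's API.

```python
import numpy as np

def fgls_multiplicative(X, y, Z):
    """FGLS for Var(eps_i) = sigma^2 * exp(z_i' alpha), following Steps 1-8.

    X : (n, k+1) regressor matrix including a constant
    Z : (n, p+1) matrix of the z variables, including a constant
    """
    # Steps 1-2: OLS on the original equation and its residuals
    b_ols, *_ = np.linalg.lstsq(X, y, rcond=None)
    e = y - X @ b_ols
    # Steps 3-4: regress log(e^2) on the z's
    a_hat, *_ = np.linalg.lstsq(Z, np.log(e**2), rcond=None)
    # Step 5: predicted variances
    e2_hat = np.exp(Z @ a_hat)
    # Steps 6-8: weight by 1/sqrt(e2_hat) and re-run OLS on the transformed data
    w = 1.0 / np.sqrt(e2_hat)
    b_fgls, *_ = np.linalg.lstsq(X * w[:, None], y * w, rcond=None)
    # Consistent variance-covariance estimator, eq. (4.197)
    n, k1 = X.shape
    u = y - X @ b_fgls
    sigma2_hat = np.sum(u**2 / e2_hat) / (n - k1)
    var_b = sigma2_hat * np.linalg.inv(X.T @ (X / e2_hat[:, None]))
    return b_fgls, var_b
```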

Ex. 4-17: Obtain the FGLS estimates for the given problem in Ex. 4-1. Solution: Let us consider the following heteroscedastic model NETPi = ȕ 0 +ȕ1REVi +İ i , i = 1, 2,........,n.

(4.198)

where NETPᵢ is the net profit (in million BDT) and REVᵢ is the sales revenue (in million BDT) of the ith listed company at DSE, and εᵢ is the random error term corresponding to the ith set of observations. Here, we assume that the error variance is proportional to exp(α₁REVᵢ), that is, Var(εᵢ|REVᵢ) = σᵢ² = σ²exp(α₁REVᵢ), (i = 1, 2, … , n).

(4.199)

We apply the OLS method to equation (4.198) and then obtain the residuals ( ei 's ), and squares of residuals ( ei2 's ) which are given in Table 4-2. We now regress ln(ei2 ) on REVi of the type ln(ei2 ) = Į 0 +Į1REVi +vi , i = 1, 2,.......,n.

(4.200)

where vᵢ is a new random error term that satisfies all the usual assumptions. We apply the OLS method to equation (4.200) and obtain the OLS estimates, which are given in Table 4-13. RATS is used to estimate the equation.

Table 4-13: The OLS estimates of equation (4.200)

Linear Regression - Estimation by OLS, Dependent Variable ln(eᵢ²)
Variable | Coeff. | Std Error | T-Stat | Signif
Constant | -4.2768 | 0.6877 | -6.2186 | 0.0000
REV | 0.0077 | 0.0020 | 3.7974 | 0.0004
Usable Observations 42; Degrees of Freedom 40; Centered R² 0.2649; Adjusted R² 0.2466; Uncentered R² 0.4942; n×R² 20.757; Mean of Dependent Variable -2.9128; Std Error of Dependent Variable 4.3791; Standard Error of Estimate 3.8010; Sum of Squared Residuals 577.8908; Regression F(1, 40) 14.4205; Significance Level of F 0.0004; Log Likelihood -114.6514; Durbin-Watson Statistic 1.5442; Chi-Squared(1) 11.1293 [Signif 0.0008]

The Chi-square test result indicates that the assumed form of heteroscedasticity is supported by the data. Using these LS estimates, we then obtain the predicted values of eᵢ², i = 1, 2, … , n, which are given by

ln(êᵢ²) = -4.2768 + 0.0077REVᵢ, so that êᵢ² = exp{-4.2768 + 0.0077REVᵢ}   (4.201)

We now calculate the weight wᵢ = 1/√(êᵢ²) for each observation and then regress wᵢNETPᵢ on wᵢ and wᵢREVᵢ:

wᵢNETPᵢ = β₀wᵢ + β₁wᵢREVᵢ + ε*ᵢ, i = 1, 2, … , n   (4.202)

Finally, we apply the OLS method to equation (4.202); the resulting estimates are the FGLS estimates. RATS is used to obtain the FGLS estimates, which are given in Table 4-14.

Table 4-14: The FGLS estimates

Linear Regression - Estimation by OLS, Dependent Variable NETP
Variable | Coeff. | Std Error | T-Stat | Signif
Constant | 0.5610 | 0.1154 | 4.8587 | 0.0000
REV | 0.0270 | 0.0016 | 16.6269 | 0.0000
Usable Observations 42; Degrees of Freedom 40; Centered R² 0.7344; Adjusted R² 0.7278; Uncentered R² 0.9515; n×R² 39.963; Mean of Dependent Variable 13.5953; Std Error of Dependent Variable 6.5043; Standard Error of Estimate 3.3935; Sum of Squared Residuals 460.6223; Log Likelihood -109.8885; Durbin-Watson Statistic 1.9137

Comment: From the estimated results, it is found that an increase in sales revenue of 1 million BDT increases the average net profit of the listed companies at DSE by 0.0270 million BDT, and this effect is statistically significant at any conventional significance level. It is also found that, when REV is zero, the average net profit is 0.5610 million BDT, which is also statistically significant. From the estimated value of R², it can be said that the fit is not so good.

Properties of the FGLS Estimator

If the form of heteroscedasticity that we assumed is a reasonable approximation of the true heteroscedasticity, then the FGLS estimator has the following properties: Property i: It is a linear function of the observed values of the dependent variable y. Property ii: It is biased in small samples. Property iii: It is asymptotically more efficient than the OLS estimator. Property iv: Monte Carlo studies suggest that it tends to yield more precise estimates than the OLS estimator.

However, if the model of heteroscedasticity that we assumed is not a reasonable approximation of the true heteroscedasticity, then the FGLS estimator will yield worse estimates than the OLS estimator. Estimation of a Heteroscedastic Model With Correlated Disturbances

Let us consider a multiple linear regression equation of the type Yt = ȕ 0 +ȕ1X1t +ȕ 2 X 2t +.......+ȕ k X kt + İ t , t = 1, 2,......,T.

Y_t - Ȳ = β₁(X₁t - X̄₁) + β₂(X₂t - X̄₂) + … + βₖ(Xₖt - X̄ₖ) + ε_t, i.e., y_t = β₁x₁t + β₂x₂t + … + βₖxₖt + ε_t   (4.203)

where β₀ = Ȳ - β₁X̄₁ - β₂X̄₂ - … - βₖX̄ₖ, y_t = Y_t - Ȳ, and x_jt = X_jt - X̄_j, j = 1, 2, … , k. In matrix form, equation (4.203) can be written as

Y = Xβ + ε   (4.204)

where Y = [y₁, y₂, … , y_T]' is a (T×1) vector, X is the (T×k) matrix whose t-th row is [x₁t, x₂t, … , xₖt], β = [β₁, β₂, … , βₖ]' is (k×1), and ε = [ε₁, ε₂, … , ε_T]' is (T×1).

The assumptions are given below: (i) E(ε_t) = 0 ∀t, (ii) E(ε_t²) = σ_t² ∀t, (iii) E(ε_t ε_s) ≠ 0 for t ≠ s, (iv) Rank(X) = k, and (v) X and ε are independent. The variance-covariance matrix of ε is the (T×T) matrix

Var(ε) = [E(ε₁²), E(ε₁ε₂), … , E(ε₁ε_T); E(ε₂ε₁), E(ε₂²), … , E(ε₂ε_T); … ; E(ε_Tε₁), E(ε_Tε₂), … , E(ε_T²)]   (4.205)

The autocorrelation coefficient between ε_t and ε_{t-j} is given by

ρ_j = E(ε_t ε_{t-j}) / √[Var(ε_t)Var(ε_{t-j})], so that E(ε_t ε_{t-j}) = ρ_j √[Var(ε_t)Var(ε_{t-j})]   (4.206)

where the first-order autoregressive scheme is

ε_t = ρε_{t-1} + v_t   (4.207)

Here, v_t is a new random error term that satisfies all the usual assumptions. For the first-order autoregressive model, we have

Var(ε_t) = σ_v²/(1 - ρ²) = σ² ∀t   (4.208)

Putting these values in equation (4.205), we have

Var(ε) = σ²[1, ρ, ρ², … , ρ^{T-1}; ρ, 1, ρ, … , ρ^{T-2}; … ; ρ^{T-1}, ρ^{T-2}, … , 1] = σ²Ω   (4.209)

where Ω is the (T×T) matrix whose (t, s) element is ρ^{|t-s|}.
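The structure of σ²Ω in (4.209) is easy to generate numerically: the (t, s) element of Ω is ρ^{|t-s|}. Below is a minimal numpy sketch (the function name is this illustration's own).

```python
import numpy as np

def ar1_omega(T, rho):
    """T x T matrix with (t, s) element rho**|t - s|, as in equation (4.209)."""
    idx = np.arange(T)
    return rho ** np.abs(idx[:, None] - idx[None, :])

# Example: Var(eps) = sigma**2 * ar1_omega(5, 0.6)
```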

Let H be a non-singular transformation matrix. Multiplying equation (4.204) by H, we have

HY = HXβ + Hε, i.e., Y* = X*β + ε*   (4.210)

Now, E(ε*) = HE(ε) = 0, and Var(ε*) = Var(Hε) = HVar(ε)H' = σ²HΩH'   (4.211)

If it is possible to specify H such that

HΩH' = I_T   (4.212)

then we can apply the OLS method to the transformed model, and the resulting estimates would have all the optimal properties of OLS and could be validly subjected to the usual inference procedures. Since Ω is a symmetric positive definite matrix, a non-singular matrix P can be found such that PP' = P'P = Ω. Then

P⁻¹PP'(P')⁻¹ = P⁻¹Ω(P')⁻¹, so that P⁻¹Ω(P')⁻¹ = I_T   (4.213)

Comparing equations (4.212) and (4.213), we find that the appropriate H is given by

H = P⁻¹, and H' = (P')⁻¹   (4.214)

Thus, we have

Var(ε*) = σ²P⁻¹PP'(P')⁻¹ = σ²I_T   (4.215)

It is also true that ε* ~ NIID(0, σ²I_T); hence, our appropriate transformation is given by equation (4.210). Since E(ε*) = 0 and Var(ε*) = σ²I_T, we can apply the OLS method to equation (4.210) to obtain the estimator of β, and the resulting estimator is the GLS estimator of β, given by

β̂_GLS = (X*'X*)⁻¹(X*'Y*) = [(P⁻¹X)'(P⁻¹X)]⁻¹[(P⁻¹X)'P⁻¹Y] = (X'(P⁻¹)'P⁻¹X)⁻¹(X'(P⁻¹)'P⁻¹Y) = (X'Ω⁻¹X)⁻¹(X'Ω⁻¹Y)   (4.216)

The variance-covariance matrix of β̂_GLS is given by

Var(β̂_GLS) = σ²(X*'X*)⁻¹ = σ²[(P⁻¹X)'(P⁻¹X)]⁻¹ = σ²(X'(P⁻¹)'P⁻¹X)⁻¹ = σ²(X'Ω⁻¹X)⁻¹   (4.217)

Now, Ω⁻¹ = (PP')⁻¹ = (P')⁻¹P⁻¹ = H'H   (4.218)

It is shown in Chapter 5 that this is approximately equivalent to applying the OLS method to the transformed model

Y_t - ρY_{t-1} = β₀(1 - ρ) + β₁(X₁t - ρX₁,t-1) + β₂(X₂t - ρX₂,t-1) + … + βₖ(Xₖt - ρXₖ,t-1) + ε_t - ρε_{t-1}

which gives consistent and asymptotically efficient estimators. But because a lagged dependent variable appears in the transformed equation, there will be a finite-sample bias. This is explained in detail in Chapter 5 with a numerical example.

Maximum Likelihood Estimation of a Heteroscedastic Model

Let us consider the multiple linear regression model of the type

Yᵢ = xᵢ'β + εᵢ, i = 1, 2, … , n   (4.219)

where xᵢ = [1, X₁ᵢ, … , Xₖᵢ]' and β = [β₀, β₁, … , βₖ]' are {(k+1)×1} vectors. Equation (4.219) can also be written in the matrix form

Y = Xβ + ε   (4.220)

Assumptions: E(ε|X) = 0, Var(ε|X) = σ²Ω, and E(εᵢεⱼ) = 0 ∀ i ≠ j. Here, Ω is a matrix of known constants. If the disturbances are multivariate normally distributed, that is, ε ~ N(0, σ²Ω), then the likelihood function is given by

L = (2π)^{-n/2} |σ²Ω|^{-1/2} exp{-(1/(2σ²))(Y - Xβ)'Ω⁻¹(Y - Xβ)}   (4.221)

The log-likelihood function is given by

log(L) = -(n/2)log(2π) - (n/2)log(σ²) - (1/2)log|Ω| - (1/(2σ²))(Y - Xβ)'Ω⁻¹(Y - Xβ)   (4.222)

Since Ω is a matrix of known constants, maximising logL with respect to β is equivalent to minimising the weighted (generalised) sum of squares, which is given by

WSS = (Y - Xβ)'Ω⁻¹(Y - Xβ) = Y'Ω⁻¹Y - β'X'Ω⁻¹Y - Y'Ω⁻¹Xβ + β'X'Ω⁻¹Xβ   (4.223)

Since β'X'Ω⁻¹Y is a scalar, β'X'Ω⁻¹Y = (β'X'Ω⁻¹Y)' = Y'Ω⁻¹Xβ. Differentiating WSS with respect to β and equating to zero, we have

-2X'Ω⁻¹Y + 2X'Ω⁻¹Xβ = 0   (4.224)

Therefore,

X'Ω⁻¹Xβ = X'Ω⁻¹Y, and β̂ = (X'Ω⁻¹X)⁻¹X'Ω⁻¹Y   (4.225)

Equivalently, the necessary conditions for maximising logL are

∂log(L)/∂β = -(1/(2σ²)){-2X'Ω⁻¹Y + 2X'Ω⁻¹Xβ} = 0   (4.226)

∂log(L)/∂σ² = -n/(2σ²) + (1/(2σ⁴))(Y - Xβ)'Ω⁻¹(Y - Xβ) = 0   (4.227)

Solving equations (4.226) and (4.227), we have

β̂_ML = (X'Ω⁻¹X)⁻¹X'Ω⁻¹Y   (4.228)

and

σ̂²_ML = (1/n)(Y - Xβ̂)'Ω⁻¹(Y - Xβ̂)   (4.229)

Thus, it can be said that the GLS and ML estimators of β are identical.

Variance of ML Estimates

The variance of the ML estimator of β is given by

Var(β̂_ML) = -[E(∂²log(L)/∂β∂β')]⁻¹   (4.230)

From equation (4.226), we have

∂²log(L)/∂β∂β' = -(1/σ²)X'Ω⁻¹X   (4.231)

Thus, we have

-E[∂²log(L)/∂β∂β'] = (1/σ²)X'Ω⁻¹X   (4.232)

Putting this value in equation (4.230), we have

Var(β̂_ML) = σ²(X'Ω⁻¹X)⁻¹   (4.233)

The variance of σ̂²_ML is given by

Var(σ̂²_ML) = -[E(∂²log(L)/∂(σ²)²)]⁻¹   (4.234)

From equation (4.227), we have

E[∂²log(L)/∂(σ²)²] = -n/(2σ⁴)   (4.235)

Thus, putting this value in equation (4.234), we have

Var(σ̂²_ML) = 2σ⁴/n   (4.236)

Thus, asymptotically, we have

β̂_ML →ᵃ N[β, σ²(X'Ω⁻¹X)⁻¹]   (4.237)

and

σ̂²_ML →ᵃ N[σ², 2σ⁴/n]   (4.238)

We can apply the likelihood ratio statistic for testing a null hypothesis about β. If we insert the ML estimates of β and σ² into the log-likelihood function, we obtain the maximised log-likelihood, which is given by

log(L̂) = -(n/2)[1 + log(2π) + log(σ̂²_ML)] - (1/2)log|Ω|   (4.239)

Under the null hypothesis, the maximised log-likelihood is given by

log(L̂_H₀) = -(n/2)[1 + log(2π) + log(σ̂²_ML,H₀)] - (1/2)log|Ω|   (4.240)

The likelihood ratio test statistic is then given by

LR = -2[log(L̂_H₀) - log(L̂)] = -2[-(n/2)log(σ̂²_ML,H₀) + (n/2)log(σ̂²_ML)] = n·log(σ̂²_ML,H₀/σ̂²_ML)   (4.241)

The LR statistic is asymptotically distributed as Chi-squared with m degrees of freedom, where m is the number of restrictions.
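As a small, hedged illustration of (4.241), the sketch below computes the LR statistic from the restricted and unrestricted estimates of σ² and compares it with a Chi-squared critical value; the function name is hypothetical.

```python
import numpy as np
from scipy import stats

def lr_test(sigma2_restricted, sigma2_unrestricted, n, m, alpha=0.05):
    """LR = n * log(sigma2_H0 / sigma2_ML), asymptotically chi-square with m d.f."""
    lr = n * np.log(sigma2_restricted / sigma2_unrestricted)
    critical = stats.chi2.ppf(1 - alpha, df=m)
    p_value = 1 - stats.chi2.cdf(lr, df=m)
    return lr, critical, p_value
```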

Maximum Likelihood Estimation of a Heteroscedastic Model if the Form of Heteroscedasticity Is σᵢ² = σ²zᵢ^α

The given form of heteroscedasticity is σᵢ² = σ²zᵢ^α (σ > 0, zᵢ > 0), where zᵢ is a non-stochastic variable which will be identical to, or a function of, the variable X's, and σ² and α are two parameters that measure the importance of heteroscedasticity. The lower the magnitude of α, the smaller the differences between the individual variances; when α = 0, the model is homoscedastic. These two parameters are unknown, in which case they have to be estimated along with the regression coefficients β, or at least one of them may be specified a priori. The log-likelihood function is given by

logL = -(n/2)log(2π) - (n/2)log(σ²) - (α/2)Σᵢ₌₁ⁿ log(zᵢ) - (1/2)Σᵢ₌₁ⁿ [(Yᵢ - xᵢ'β)²/(σ²zᵢ^α)]   (4.242)

The necessary conditions for maximising logL are

∂logL/∂β = 0 ⟹ (1/σ²)Σᵢ₌₁ⁿ [xᵢ(Yᵢ - xᵢ'β)/zᵢ^α] = 0   (4.243)

∂logL/∂σ² = 0 ⟹ -n/(2σ²) + (1/(2σ⁴))Σᵢ₌₁ⁿ [(Yᵢ - xᵢ'β)²/zᵢ^α] = 0   (4.244)

∂logL/∂α = 0 ⟹ -(1/2)Σᵢ₌₁ⁿ log(zᵢ) + (1/(2σ²))Σᵢ₌₁ⁿ [(Yᵢ - xᵢ'β)²log(zᵢ)/zᵢ^α] = 0   (4.245)

Equations (4.243), (4.244) and (4.245) are highly non-linear because of the presence of the parameter α. A relatively simple way of obtaining a solution is as follows. We obtain the following three equations to be solved for the unknown values of β, σ², and α:

Σᵢ₌₁ⁿ [xᵢYᵢ/zᵢ^α] = Σᵢ₌₁ⁿ [xᵢxᵢ'β̂/zᵢ^α]   (4.246)

σ̂² = (1/n)Σᵢ₌₁ⁿ [(Yᵢ - xᵢ'β̂)²/zᵢ^α]   (4.247)

(1/2)Σᵢ₌₁ⁿ log(zᵢ) = (1/(2σ̂²))Σᵢ₌₁ⁿ [(Yᵢ - xᵢ'β̂)²log(zᵢ)/zᵢ^α]   (4.248)

If α is known, then equations (4.246) and (4.247) can be solved for β̂ and σ̂². For a given value of α, we can obtain β̂ and σ̂² by generalised least squares using the weights wᵢ = zᵢ^{-α/2}. In most economic models, it is most unlikely that α would lie outside the range 0 to 4. We can therefore obtain the estimates of β and σ² by setting α = 0, α = 0.1, α = 0.2, and so on. For each value of α and the corresponding values of β and σ², we can also calculate the value of the likelihood function (L) or the log-likelihood function (log(L)). Then, from all of these solutions, we select the one that gives the largest value of L or logL. This solution maximises the likelihood (or log-likelihood) function as desired. This method is called the search method.
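A minimal sketch of the search method, under the assumed form σᵢ² = σ²zᵢ^α: for each trial α it computes the GLS estimates with weights zᵢ^{-α/2}, evaluates the log-likelihood (4.242), and keeps the α that maximises it. The grid and the function name are assumptions of this Python illustration.

```python
import numpy as np

def search_ml(X, y, z, alphas=np.arange(0.0, 4.01, 0.1)):
    """Grid search over alpha for the model Var(eps_i) = sigma^2 * z_i**alpha."""
    n = len(y)
    best = None
    for a in alphas:
        w = z**(-a / 2.0)                              # GLS weights for this alpha
        beta, *_ = np.linalg.lstsq(X * w[:, None], y * w, rcond=None)
        resid2 = (y - X @ beta)**2 / z**a
        sigma2 = resid2.mean()                         # ML estimate of sigma^2 given alpha, beta
        loglik = (-0.5 * n * np.log(2 * np.pi) - 0.5 * n * np.log(sigma2)
                  - 0.5 * a * np.log(z).sum() - 0.5 * resid2.sum() / sigma2)
        if best is None or loglik > best[0]:
            best = (loglik, a, beta, sigma2)
    return best   # (max log-likelihood, alpha, beta, sigma2)
```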

The variance of β̂ is given by

Var(β̂) = -[E(∂²logL/∂β∂β')]⁻¹ evaluated at β = β̂   (4.249)

From equation (4.243), we have

∂²logL/∂β∂β' = -(1/σ²)Σᵢ₌₁ⁿ [xᵢxᵢ'/zᵢ^α]   (4.250)

Therefore, we have

Var(β̂) = σ²[Σᵢ₌₁ⁿ xᵢxᵢ'/zᵢ^α]⁻¹ = σ²(X'Ω⁻¹X)⁻¹   (4.251)

The variance of σ̂² is given by

Var(σ̂²) = -[E(∂²logL/∂(σ²)²)]⁻¹ evaluated at σ² = σ̂²   (4.252)

From equation (4.244), we have

∂²logL/∂(σ²)² = n/(2σ⁴) - (1/σ⁶)Σᵢ₌₁ⁿ [(Yᵢ - xᵢ'β̂)²/zᵢ^α]   (4.253)

Thus, taking expectations, we have

-E[∂²logL/∂(σ²)²] = n/(2σ⁴)   (4.254)

Therefore, from equation (4.252), we have

Var(σ̂²) = 2σ⁴/n   (4.255)

The variance of α̂ is given by

Var(α̂) = -[E(∂²log(L)/∂α²)]⁻¹ evaluated at α = α̂   (4.256)

We have

∂²log(L)/∂α² = -(1/(2σ²))Σᵢ₌₁ⁿ [εᵢ²{log(zᵢ)}²/zᵢ^α]

so that, using E(εᵢ²) = σ²zᵢ^α,

-E[∂²log(L)/∂α²] = (1/(2σ²))Σᵢ₌₁ⁿ [σ²zᵢ^α{log(zᵢ)}²/zᵢ^α] = (1/2)Σᵢ₌₁ⁿ {log(zᵢ)}²   (4.257)

Therefore, from equation (4.256), we have

Var(α̂) = 2 / Σᵢ₌₁ⁿ {log(zᵢ)}²   (4.258)

Ex. 4-18: Obtain the maximum likelihood (ML) estimators for a heteroscedastic model TEX i = ȕ 0 +ȕ1TINCi +İ i , if

the form of heteroscedasticity is ı i2 = ı 2 TINCiĮ , where the variable TEX indicates the total spending and the variable TINC indicates the total income of the state-owned banks of Bangladesh. Solution: The given heteroscedastic model is TEX t = ȕ 0 +ȕ1TINC t +İ t

All the terms of equation (4.259) are explained before. Here, given that ı i2 = ı 2 TINCiĮ . The log-likelihood function will be

(4.259)

logL = -(T/2)log(2π) - (T/2)log(σ²) - (α/2)Σₜ₌₁ᵀ log(TINC_t) - (1/2)Σₜ₌₁ᵀ [(TEX_t - β₀ - β₁TINC_t)²/(σ²TINC_t^α)]   (4.260)

We obtain the log-likelihood function for various values of α, as recorded in Table 4-15.

Table 4-15: The estimated values of log(L) for different values of α

α | Log(L)
0.0 | -356.9237
0.5 | -337.8372
1.0 | -323.4842
1.5 | -312.1381
2.0 | -302.4884
2.5 | -299.9958
3.0 | -309.2943
3.5 | -325.2296

It is found that, for α = 2.5, the log-likelihood function is at its maximum. The full set of maximum likelihood estimates for α = 2.5 is given in Table 4-16.

Table 4-16: The ML estimates for a multiplicative form of heteroscedasticity

Variable | Coeff. | Standard Error | t-ratio | Prob.
Constant | -0.8814 | 1.5028 | -0.5865 | 0.5605
TINC | 0.8891 | 0.0148 | 60.0748 | 0.0000
σ̂² | 0.000201 | 4.1982e-005 | 4.7958 | 0.0000
α̂ | 2.5 | 0.0109 | 229.4285 | 0.0000

Estimation of a Heteroscedastic Model with Variances Constant Within Subgroups of Observations

In many instances, it might be reasonable to assume that the variance is constant within subgroups of observations but varies from group to group. Let there be m groups with Σᵢ₌₁ᵐ nᵢ = n, where nᵢ is the number of observations in the ith group (i = 1, 2, … , m). The model for the m groups can be written as

Y = Xβ + ε   (4.261)

where Y = [Y₁', Y₂', … , Y_m']', X = [X₁', X₂', … , X_m']', and ε = [ε₁', ε₂', … , ε_m']' are stacked over the groups. Here, Yᵢ is an (nᵢ×1) vector of the observations of the dependent variable, Xᵢ is an {nᵢ×(k+1)} non-stochastic design matrix, β is a {(k+1)×1} vector of unknown parameters, and εᵢ is an (nᵢ×1) vector of random error terms. Assumptions: E(εᵢ) = 0 ∀i, E(εᵢεᵢ') = σᵢ²I_{nᵢ}, and E(εᵢεⱼ') = 0 for i ≠ j.

Equation (4.261) is a set of m regression equations in which the coefficient vector β is restricted to be the same in every equation and in which the disturbances of different groups are uncorrelated. The generalised least squares (GLS) estimator of β is given by

β̂_GLS = (X'Ω⁻¹X)⁻¹X'Ω⁻¹Y = [Σᵢ₌₁ᵐ σᵢ⁻²Xᵢ'Xᵢ]⁻¹[Σᵢ₌₁ᵐ σᵢ⁻²Xᵢ'Yᵢ]   (4.262)

For the ith group we have

Xᵢ'Xᵢbᵢ = Xᵢ'Yᵢ, so that bᵢ = (Xᵢ'Xᵢ)⁻¹Xᵢ'Yᵢ   (4.263)

where bᵢ is a {(k+1)×1} vector of OLS estimators based on the ith group of observations. Putting the value of Xᵢ'Yᵢ into equation (4.262), we have

β̂_GLS = [Σᵢ₌₁ᵐ σᵢ⁻²Xᵢ'Xᵢ]⁻¹[Σᵢ₌₁ᵐ σᵢ⁻²Xᵢ'Xᵢbᵢ]   (4.264)

The variance-covariance matrix of bᵢ is given by

Var(bᵢ) = (Xᵢ'Xᵢ)⁻¹σᵢ², so that σᵢ⁻²Xᵢ'Xᵢ = [Var(bᵢ)]⁻¹   (4.265)

Thus, putting this value into equation (4.264), we have

β̂_GLS = [Σᵢ₌₁ᵐ [Var(bᵢ)]⁻¹]⁻¹[Σᵢ₌₁ᵐ [Var(bᵢ)]⁻¹bᵢ] = Σᵢ₌₁ᵐ Wᵢbᵢ, where Wᵢ = [Σᵢ₌₁ᵐ [Var(bᵢ)]⁻¹]⁻¹[Var(bᵢ)]⁻¹   (4.266)

This is a matrix-weighted average of the m least squares estimators, with weighting matrices Wᵢ. The weight is larger the smaller the covariance matrix of the corresponding estimator. For this estimator to be operational, we need to replace σᵢ² with an estimator. If nᵢ > k (i = 1, 2, … , m), then

σ̂ᵢ² = (Yᵢ - Xᵢbᵢ)'(Yᵢ - Xᵢbᵢ)/(nᵢ - k - 1), where bᵢ = (Xᵢ'Xᵢ)⁻¹Xᵢ'Yᵢ   (4.267)

Then, the estimated GLS estimator is given by

β̂_GLS = (X'Ω̂⁻¹X)⁻¹X'Ω̂⁻¹Y = [Σᵢ₌₁ᵐ σ̂ᵢ⁻²Xᵢ'Xᵢ]⁻¹[Σᵢ₌₁ᵐ σ̂ᵢ⁻²Xᵢ'Yᵢ]   (4.268)

If the rows of each Xᵢ are identical, (4.268) reduces to

β̂_GLS = Σᵢ₌₁ᵐ wᵢbᵢ, where wᵢ = σᵢ⁻²/Σᵢ₌₁ᵐ σᵢ⁻²   (4.269)

with σ̂ᵢ² = Σⱼ₌₁^{nᵢ}(Yᵢⱼ - Ȳᵢ)²/(nᵢ - 1) and Ȳᵢ = Σⱼ₌₁^{nᵢ}Yᵢⱼ/nᵢ.

Again, it can be said that the weights are larger for the estimators whose variances are smaller. The variance-covariance matrix of β̂_GLS is given by

Var[β̂_GLS] = σ²(X'Ω⁻¹X)⁻¹ = σ²[Σᵢ₌₁ᵐ σᵢ⁻²Xᵢ'Xᵢ]⁻¹   (4.270)

where σ² = Σᵢ₌₁ᵐ hᵢσᵢ²/h, hᵢ = nᵢ - k - 1, and h = Σᵢ₌₁ᵐ hᵢ.
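A compact sketch of (4.263), (4.267) and (4.268): fit OLS within each group, estimate each group variance, and pool the groups with weights σ̂ᵢ⁻². The function name and the list-of-groups data layout are assumptions of this Python illustration.

```python
import numpy as np

def groupwise_gls(groups):
    """groups: list of (X_i, y_i) pairs, one pair per subgroup with constant variance inside."""
    A = 0.0   # running sum of sigma_i^{-2} X_i'X_i
    c = 0.0   # running sum of sigma_i^{-2} X_i'y_i
    for X, y in groups:
        b, *_ = np.linalg.lstsq(X, y, rcond=None)      # group OLS, eq. (4.263)
        resid = y - X @ b
        n_i, k1 = X.shape
        sigma2_i = resid @ resid / (n_i - k1)          # eq. (4.267)
        A = A + X.T @ X / sigma2_i
        c = c + X.T @ y / sigma2_i
    return np.linalg.solve(A, c)                       # pooled estimator, eq. (4.268)
```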

Ex. 4-19: Estimate the heteroscedastic model assuming that the variances are constant within subgroups but unequal variances between subgroups for the given problems in Ex. 2-4 and Ex. 2-5. Solution: There are two groups for the given problem. We assume that the variances are constant within groups but the group variances are not equal.

The model for the 2 groups can be written as

Yᵢⱼ = β₀ + β₁Xᵢⱼ + εᵢⱼ, i = 1, 2 and j = 1, 2, … , nᵢ   (4.271)

where Yᵢⱼ is the jth expenditure of the ith group, Xᵢⱼ is the jth income of the ith group, and εᵢⱼ is the random error term corresponding to the jth observation of the ith group. In matrix form, equation (4.271) can be written as

Y = Xβ + ε   (4.272)

where Y = [Y₁', Y₂']', X = [X₁', X₂']', and ε = [ε₁', ε₂']'. Here, Yᵢ is an (nᵢ×1) vector of the observations of the dependent variable, Xᵢ is an (nᵢ×2) non-stochastic design matrix, β is a (2×1) vector of unknown parameters, and εᵢ is a vector of random error terms. Assumptions: E(εᵢ) = 0 ∀i, E(εᵢεᵢ') = σᵢ²I_{nᵢ}, and E(εᵢεⱼ') = 0 for i ≠ j.

The estimated results are given below for the two groups.

Results for Foreign Banks:

The OLS estimator b₁ for group 1 is given by

b₁ = (X₁'X₁)⁻¹X₁'Y₁ = [0.0330, -8.5442e-006; -8.5442e-006, 6.4643e-009][29519.5867; 112169938.4485] = [16.70069; 0.47289]   (4.273)

σ̂₁² is given by

σ̂₁² = (Y₁ - X₁b₁)'(Y₁ - X₁b₁)/(n₁ - k - 1) = 216305.9667/44 = 4916.0447   (4.274)

The variance-covariance matrix of b₁ is given by

Var(b₁) = (X₁'X₁)⁻¹σ̂₁² = [0.0330, -8.5442e-006; -8.5442e-006, 6.4643e-009] × 4916.0447 = [162.3880, -0.0420; -0.0420, 3.1779e-005]   (4.275)

Results for State-Owned Banks:

The OLS estimator b₂ for group 2 is given by

b₂ = (X₂'X₂)⁻¹X₂'Y₂ = [0.0346, -2.7231e-006; -2.7231e-006, 5.7807e-010][181226.0600; 2.2600e+009] = [110.31025; 0.81291]   (4.276)

σ̂₂² is given by

σ̂₂² = (Y₂ - X₂b₂)'(Y₂ - X₂b₂)/(n₂ - k - 1) = 14785942.5414/44 = 336044.1487   (4.277)

The variance-covariance matrix of b₂ is given by

Var(b₂) = (X₂'X₂)⁻¹σ̂₂² = [0.0346, -2.7231e-006; -2.7231e-006, 5.7807e-010] × 336044.1487 = [11616.0381, -0.9151; -0.9151, 1.9426e-004]   (4.278)

Putting all the terms into equation (4.268), we have

β̂_GLS = [9.4940e-003, 13.0125; 13.0125, 55999.5508]⁻¹[6.5440; 29542.3024] = [154.5516, -0.0359; -0.0359, 2.6202e-005][6.5440; 29542.3024] = [49.5540; 0.5391]   (4.279)

The variance-covariance matrix of β̂_GLS:

We have

σ̂² = (44 × 4916.0447 + 44 × 336044.1487)/88 = 384328.0410   (4.280)

We have already found that

[Σᵢ₌₁² σ̂ᵢ⁻²Xᵢ'Xᵢ]⁻¹ = [154.5516, -0.0359; -0.0359, 2.6202e-005]

Thus, putting these values into equation (4.270), we have

Var[β̂_GLS] = [154.5516, -0.0359; -0.0359, 2.6202e-005] × 384328.0410 = [59398521.9638, 13802.2774; 13802.2774, 10.0703]   (4.281)

Derivation of the Variance-covariance Matrix of the GLS Estimator

The GLS estimator of β is given by

β̂_GLS = (X'Ω⁻¹X)⁻¹X'Ω⁻¹Y = (X'Ω⁻¹X)⁻¹X'Ω⁻¹[Xβ + ε] = (X'Ω⁻¹X)⁻¹X'Ω⁻¹Xβ + (X'Ω⁻¹X)⁻¹X'Ω⁻¹ε = β + (X'Ω⁻¹X)⁻¹X'Ω⁻¹ε   (4.282)

Taking the expectation of equation (4.282), we have

E(β̂_GLS) = β + (X'Ω⁻¹X)⁻¹X'Ω⁻¹E(ε) = β, since E(ε) = 0   (4.283)

This shows that β̂_GLS is an unbiased estimator of β. The variance-covariance matrix of β̂_GLS is given by

Var(β̂_GLS) = E[(β̂_GLS - β)(β̂_GLS - β)'] = (X'Ω⁻¹X)⁻¹X'Ω⁻¹E(εε')Ω⁻¹X(X'Ω⁻¹X)⁻¹ = (X'Ω⁻¹X)⁻¹X'Ω⁻¹σ²ΩΩ⁻¹X(X'Ω⁻¹X)⁻¹ = σ²(X'Ω⁻¹X)⁻¹X'Ω⁻¹X(X'Ω⁻¹X)⁻¹ = σ²(X'Ω⁻¹X)⁻¹   (4.284)

Since σ² is unknown to us, we have to estimate it. It can be estimated as

σ̂² = (Y - Xβ̂_GLS)'Ω̂⁻¹(Y - Xβ̂_GLS)/(n - k - 1)

Thus, we have

V̂ar(β̂_GLS) = σ̂²(X'Ω̂⁻¹X)⁻¹   (4.285)
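The unbiasedness and the efficiency gain shown in (4.283)-(4.284) can be illustrated with a small Monte Carlo sketch (artificial data, not the book's data sets): both estimators are centred on the true β, but the GLS sampling variance is smaller.

```python
import numpy as np

rng = np.random.default_rng(1)
n, reps = 100, 2000
x = rng.uniform(1, 10, n)
X = np.column_stack([np.ones(n), x])
w = x**2                                  # known pattern: Var(eps_i) = sigma^2 * x_i^2
beta_true = np.array([1.0, 0.5])

ols_draws, gls_draws = [], []
for _ in range(reps):
    y = X @ beta_true + rng.normal(0, 1, n) * np.sqrt(w)
    ols_draws.append(np.linalg.lstsq(X, y, rcond=None)[0])
    h = 1.0 / w
    gls_draws.append(np.linalg.solve(X.T @ (h[:, None] * X), X.T @ (h * y)))

ols_draws, gls_draws = np.array(ols_draws), np.array(gls_draws)
print("mean OLS:", ols_draws.mean(axis=0), " mean GLS:", gls_draws.mean(axis=0))
print("var  OLS:", ols_draws.var(axis=0), " var  GLS:", gls_draws.var(axis=0))
```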

Exercises
4-1: Explain the meaning of heteroscedasticity. Distinguish between homoscedasticity and heteroscedasticity with an example of each.
4-2: Let ε (n×1) be a vector of random error terms of a multiple linear regression equation. Find the variance-covariance matrix of ε if the random error terms are not correlated and if they are correlated.
4-3: Let the heteroscedastic model between Y and X be Yᵢ = α + βXᵢ + εᵢ and the form of heteroscedasticity be Var(εᵢ) = σᵢ² = σ²Xᵢ², i = 1, 2, … , n, with Cov(εᵢ, εⱼ) = 0, i ≠ j. Obtain the variance-covariance matrix of ε, where ε is an (n×1) vector of the random error terms.
4-4: Distinguish between multiplicative and additive forms of heteroscedasticity with an example of each.
4-5: Write the important sources of heteroscedasticity and discuss the nature of heteroscedasticity.
4-6: Discuss the consequences of applying the OLS method to a heteroscedastic model.
4-7: Let the heteroscedastic model between Y and the X's be yᵢ = β₁x₁ᵢ + β₂x₂ᵢ + … + βₖxₖᵢ + εᵢ, where the variable Y and all the independent variables are measured in deviation form from their mean values. The equation can also be written as yᵢ = xᵢ'β + εᵢ, or y = xβ + ε. If the form of heteroscedasticity is Var(εᵢ|xᵢ) = σᵢ² = σ²wᵢ ∀i, show that the OLS estimator β̂ of β is no longer the best, and show that the OLS estimator β̂ is less efficient than the GLS estimator of β. Show also that s² = e'e/(n-k-1) is not an unbiased estimator of σ², where e is the vector of OLS residuals.

4-8: Discuss the graphical method for detecting the presence of heteroscedasticity in the case of simple and multiple linear regression equations. 4-9: Discuss the Bartlett test for detecting the presence of heteroscedasticity in the data set in the case of simple and multiple linear regression equations. 4-10: Discuss the Park test for detecting the presence of heteroscedasticity. 4-11: Let the regression equation between Y and X be given by Yi = Į+ȕX i +İ i and the form of heteroscedasticity be Var(İ i ) = ıi2 = ı 2 XGi e vi , i = 1, 2,.........,n. Test the null hypothesis H 0 : į = 2 against an appropriate alternative hypothesis.


4-12: Discuss the following tests for detecting the presence of heteroscedasticity:

(i) The Goldfeld and Quandt test, (ii) The Glejser test, (iii) The Spearman's rank correlation test, (iv) The likelihood ratio (LR) test, (v) The Breusch-Pagan test, (vi) The Lagrange multiplier test, (vii) The White test, (viii) The Harvey and Godfrey test, and (ix) The Engle test.
4-13: The data given below are the annual sales of a company (Y, in million USD) in ten different districts and the disposable income (X, in million USD) of the inhabitants of these districts.

District | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10
Y | 14 | 16 | 17 | 19 | 18 | 20 | 21 | 24 | 23 | 25
X | 90 | 130 | 145 | 152 | 163 | 167 | 172 | 195 | 205 | 210

Detect the presence of heteroscedasticity using an appropriate test statistic. 4-14: Let the regression model between the corruption perception index (CPI) and per capita GDP (PGDP, constant 2015 USD) of high-income and low-income countries be given by CPIij = ȕ 0 + ȕ i PGDPij +İ ij , (i=1, 2, j = 1, 2,….., n i ),

where E(İ i |PGDPi ) = 0, the form of heteroscedasticity is Var(İ i |PGDPi ) = ı i2 I ni , i = 1, 2, and Cov(İ i ,İ j ) = 0, i z j. Discuss Bartlett’s test for detecting the presence of heteroscedasticity in CPI between two groups of countries. Collect the numerical data for two groups of countries and then testify. 4-15: Discuss the likelihood ratio test for testing heteroscedasticity. Assume that the variable Y indicate the corruption perception index and X indicates the economic growth (per capita real GDP) of all the countries of this globe. Let all the countries be divided into two groups namely (i) low-income group and (ii) high income group. In the low-income group, there are 72 countries and in the high-income group, there are 51 countries. For the combined regression

equation, the residual sum of squares is Σᵢ₌₁¹²³ eᵢ² = 17373.9057. For the low-income group, the residual sum of squares is Σᵢ₌₁⁷² e₁ᵢ² = 6484.7149, and for the high-income group, the residual sum of squares is Σᵢ₌₁⁵¹ e₂ᵢ² = 6084.7682. Test the

presence of heteroscedasticity between these two groups using the likelihood ratio test. 4-16: Assume that the regression equation indicate the relationship between expenditure and income of ith industry ( i = 1, 2, 3, 4, 5, 6 7). The results for the maximum likelihood estimate of ı 2 for the combined sample is s 2 =19. Also, the results for the maximum likelihood estimate of ıi2 ( i = 1, 2, 3, 4, 5, 6 7) and the sample size for each industry are given below

Industry 1 2 3 4 5 6 7

Estimated Variance ( si2 ) 16 20 25 12 22 17 21

Sample Size ( n i ) 25 35 34 36 32 28 35

Test the problem of heterogeneity using the likelihood ratio test. 4-17 Let the multiple linear regression equation for BRIC countries is Yij = ȕi0 +ȕi1X1ij +ȕ i2 X 2ij +ȕi3 X 3ij +ȕ i4 X 4ij +İ ij , (i = 1, 2, 3, 4; j = 1, 2,......,n i ) where Y is the per capita carbon dioxide emissions in metric tons , X1 is the energy consumption (kg of oil equivalent per capita), X2 is the trade openness (% of exports and imports of GDP), X3 is the urbanisation (% of the urban population of total), and X4 is the per capita real GDP (PGDP) (constant US $). Discuss the appropriate test statistic for detecting the presence of heteroscedasticity in carbon emissions of BRIC countries. Collect the numerical data and then testify.

4-18: Discuss the WLS method and GLS method to estimate a heteroscedastic model. 4-19: Let the regression model between Y and X be given by yi = ȕx i +İ i , where E(İ i |x i ) = 0, 2

2 i

the form of

heteroscedasticity is Var(İ i |x i ) = ı x , i = 1, 2,....,n, and Cov(İ i ,İ j ) = 0, i z j. Here, the variables y and x are measured in the deviation form from their mean.


(i) Given a sample of observations on y and x, what is the most efficient estimator of ȕ? What is its variance? (ii) What is the OLS estimator of ȕ and what is the variance of the OLS estimator? (iii) Show that the OLS estimator of ȕ is never as efficient as the GLS estimator. 4-20: If the regression model is yi = ȕ 0 +İ i , for the structural form of heteroscedasticity given in problems 4-19, answer the following questions.

(i) What is the most efficient estimator of ȕ 0 ? What is its variance? (ii) What is the OLS estimator of ȕ 0 and what is the variance of the OLS estimator? (iii) Show that the OLS estimator of ȕ 0 is never as efficient as the GLS estimator. 4-21: Let the regression equation between Y and X is given by Yi = Į+ȕX i +İ i , where E(İ i |x i ) = 0, and the form of

heteroscedasticity is Var(İ i |x i ) = ı i2 , i = 1, 2,....,n. Given a sample of observations on Y and X, discuss the technique to estimate the equation. If the form of heteroscedasticity is Var(İ i |x i ) = ı i2 X i2 , i = 1, 2,....,n, then discuss the technique to estimate the equation. If the form of heteroscedasticity is Var(İ i |x i ) = ıi2 = ı 2 (a+bx i ), then discuss the technique to estimate the equation. If the form of heteroscedasticity is Var(İ i |x i ) = ı i2 = ı 2 (a+bx i +cx i2 ), discuss the technique to estimate the equation. 4-22: Let the regression equation between Y and X’s 2 i

be given by Yi = ȕ 0 +ȕ1X1i +ȕ 2 X 2i +.......+ȕ k X ki +İ i ,

2

Ÿ Yi x ci ȕ  İ i , where E(İ i |x i ) = 0, and Var(İ i |x i ) = ı = ı w i , where w i is a non-stochastic variable which will be identical or function of X’s. Given a sample of observations on Y and X’s, discuss the technique to obtain the generalised least squares (GLS) estimator of ȕ. If the form of heteroscedasticity is Var(İ i |x i ) = ıi2 = ı 2 X1i2 , then discuss the technique to obtain the GLS estimator of ȕ .

4-23: Discuss the method to estimate a heteroscedastic model with correlated disturbances. 4-24: Consider a linear model to the explain monthly beer consumption Yi = ȕ 0 +ȕ1X1i +ȕ 2 X 2i +ȕ3 X 3i +İ i , Ÿ Yi x ci ȕ  İ i , where the variable Y indicates consumption of milk, X1 is the income level of a consumer, X2 is the

price per litre of beer, and X3 is the education level of a consumer. If E(İ i |x i ) = 0, and Var(İ i |x i ) = ı i2 = ı 2 X1i2 , discuss the technique to obtain the GLS estimator of ȕ. 4-25: Let the regression equation between Y and X’s be given by, Yi = ȕ 0 +ȕ1X1i +........+ȕ k X ki +İ i , Ÿ Yi = x ci ȕ+İ i ,

where E(İ i |x i ) = 0 and Var(İ i |x i ) = ı i2 = ı 2 exp ^Į1z1i +Į 2 z 2i +......+Į p z pi ` = ı 2 exp ^zcĮ` , where zi 's are the nonstochastic variables which are identical to the X’s or function of X’s. Given a sample of observations on Y and X’s, discuss the technique to obtain the feasible generalised least squares (FGLS) estimator of ȕ. 4-26: For the regression equation given in problems 4-25, the form of heteroscedasticity is Var(İ i |x i ) = ı 2 ȍ, and E(İ i , İ j ) = 0,  i z j. Given a sample of observations on Y and X’s, then discuss the maximum likelihood (ML)

method to estimate the heteroscedastic model. 4-27: For the regression equation given in problem 4-25, if the form of heteroscedasticity is Var(İ i |x i ) = ı 2 ziĮ , where zi is a non-stochastic variable and which will be identical or a function of X’s, discuss the ML method to estimate the heteroscedastic model. 4-28: Discuss the technique to estimate a heteroscedastic model with variances constant within subgroups of observations. 4-29: Let the regression equation between per capita income (X) and per capita consumption expenditure (Y) of a developed country be given by Yi = Į+ȕX i +İ i , where E(İ i |x i ) = 0, and the form of heteroscedasticity is

Var(İ i |x i ) = ıi2 = ı 2 x i . The moment matrices and all other information are given below (the matrix X includes constant and one independent variable x).


Factor | Value
X'Ω⁻¹X | [1.4932, 49; 49, 1709.293]
X'Ω⁻¹Y | [16.464; 916.3349]
Total sum of squares of the heteroscedastic model | 492.0414
Sum of the dependent variable of the heteroscedastic model | 154.6032
Sample size | 49

(i) Obtain the GLS estimate of the coefficient vector β and comment on your results. (ii) Obtain the variance-covariance matrix of β̂, where β̂ is the GLS estimator of β. (iii) Obtain the regression and residual sums of squares for the heteroscedastic model. (iv) Obtain the centered R², adjusted R² and uncentered R² and comment on your results. (v) Test the significance of the parameters in the regression equation.
4-30: Let the regression equation of total income (Y, in million $) on money supply (X1, in million $) and total investment (X2, in million $) of the banking sector of a developing country be given by Y_t = α + β₁X₁t + β₂X₂t + ε_t, i.e., Y_t = x_t'β + ε_t, where E(ε_t|x_t) = 0, Cov(ε_t, ε_j) = 0, t ≠ j, and the form of heteroscedasticity is Var(ε_t|x_t) = σ_t² = σ²x_t. The moment matrices and all other information are given below (the matrix X includes a constant and the independent variables).

Factor | Value
X'Ω⁻¹X | [0.5502, 44, 8.3572; 44, 75426.108, 15198.087; 8.3572, 15198.087, 3187.1374]
X'Ω⁻¹Y | [5.2392; 8530.3424; 1759.893]
Total sum of squares of the heteroscedastic model | 989.1192
Sum of the dependent variable of the heteroscedastic model | 154.6135
Sample size | 44

(i) Obtain the GLS estimate of the coefficient vector ȕ and comment on your results. (ii) Obtain the variance-covariance matrix of ȕˆ , where ȕˆ is the GLS estimator of ȕ . (iii) Obtain the regression and residual sum of squares for the heteroscedastic model. (iv) Obtain the centered R2, adjusted R2 and uncentered R2 and comment on your results. (v) Test the significance of the parameters in the regression equation. 4-31: The moment matrices and all other necessary information for the regression equation of per capita real GDP (Y, in thousand USD) on corruption perception index (X) for high-income and low-income countries are given below: (in each case the matrix X includes a constant and one independent variable).


Factor | Sample 1 [HIC] | Sample 2 [LIC]
X'X | [51, 3382; 3382, 236226] | [72, 2130; 2130, 70046]
X'Y | [1716.165; 124735.792] | [127.706; 4026.673]
Y'Y | 78109.2816 | 339.2215
ΣY_t | 1716.165 | 127.7060
n | 51 | 72

(i) Compute the least squares regression coefficients and the residual variances si2 (i = 1 , 2) for each group. Comment on your results. (ii) Obtain the variance-covariance matrix of the OLS estimator of the coefficient vector for each group. (iii) Compute the centered R2, adjusted R2 and uncentered R2 and comment on your results. (iv) Test the null hypothesis that the variances in the two regression equations are the same without assuming that the coefficients are the same in the two regression equations. (v) Estimate the GLS estimate of the regression coefficients in the regression equation, assuming that the constant and slope are the same in both regression equations. Comment on your result. (vi) Obtain the variance-covariance matrix of the GLS estimator. (vii) Test the significance of the regression coefficients.

CHAPTER FIVE AUTOCORRELATION

5.1 Introduction To estimate classical linear regression models using the least squares method, we assume that the successive values of the random disturbance term İ's are independent i.e., the value that the random error term İ assumes in any period is independent of the value which is assumed in any previous period. This assumption implies that the covariance of İ i and İ j is zero i.e., cov(İ i , İ j ) = 0, for i z j. It is quite natural that in most of the economic relationships, this assumption is not true. When this assumption is not satisfied, we say there is an autocorrelation or serial correlation problem of the random variable İ . When we deal with cross-sectional data, spatial problems may happen. A more serious correlation problem among successive values of a random variable can happen when we deal with time series or macroeconomic data for empirical measurement. Autocorrelation is a special case of correlation. Correlation is defined as the relationship between two or more than two variables. Autocorrelation is defined as the relationship, not between two or more than two variables but between the members of a series of observations ordered in time of a random variable. In the case of econometric analyses of economic relationships, the random error terms of one period may affect those from different time intervals. For example, the effects of tsunamis, cyclones, floods, earthquakes, hartals, strikes, COVID-19, war, etc., of one period may be related to the effects of other periods. We know at the same time an economic variable is influenced by several economic variables, but due to different limitations of econometric analyses for the empirical measurement of economic relationships, we can't include all these variables in the econometric model. Due to the omission of the variables, the autocorrelation problem arises. If the random error terms of a classical linear regression equation are assumed to be homoscedastic but autocorrelated, we cannot apply the OLS method to estimate econometric models. Different forms of autocorrelation are very common in economic phenomena and for each form, the structure of the variance-covariance matrix of the random error terms will be different. The most popular and widely used form is known as the first-order autoregressive form or AR(1) process which is given by İ t = ȡİ t-1 +v t , where v t is the random error term that satisfies all the usual assumptions of a CLRM. If ȡ = 0, then, we have İ t = v t , which implies that the standard Gauss-Markov conditions are satisfied and we can apply the OLS method to estimate econometric models. If we apply the OLS method to an autocorrelated model, different problems may occur. Thus, we need to know whether the random error terms of an econometric model are autocorrelated or not. Therefore, in this Chapter, different forms of autocorrelation, sources of autocorrelation, consequences of autocorrelation, different tests for detecting the presence of autocorrelation problems, and estimation techniques of autocorrelated models are discussed with their numerical applications.

5.2 Meaning of Autocorrelation In Statistics, autocorrelation of a random process is defined as the correlation between successive values of the process at different points in time, as a function of the two times or of the time differences. Let Y be a repeatable process, and t be a point in time after the start of that process (t may be an integer for a discrete-time process or a real number for a continuous-time process). Then, Yt is the value that is produced by a given run of the process at time t. Suppose, the process has the defined known mean value ȝ t and variance ı 2t for all times t. Then the autocorrelation between times s and t is denoted by ȡ(s, t) and is given by: ȡ(s, t) =

ρ(s, t) = Cov(Y_t, Y_s)/√[Var(Y_t)Var(Y_s)] = E{[Y_t - E(Y_t)][Y_s - E(Y_s)]}/(σ_tσ_s) = E{[Y_t - μ_t][Y_s - μ_s]}/(σ_tσ_s)   (5.1)

where "E" is the expected value operator. The expression (5.1) is not well-defined for all time series or processes, because the variance may be zero (for a constant process) or infinite. If the function ȡ(s, t) is well-defined, its value


must lie in the range [-1, 1], with 1 indicating perfect positive correlation and -1 indicating perfect negative correlation. If Y_t is a second-order stationary process, then the mean and the variance of Y_t are time-independent. Also, the autocorrelation depends only on the time distance between a pair of values and not on their position in time. This further implies that the autocorrelation can be expressed as a function of the time lag, and that it is an even function of the lag j = s - t. This gives the more familiar form:

ρ(j) = E{[Y_t - μ][Y_{t+j} - μ]}/σ²   (5.2)

Since this is an even function, ρ(j) = ρ(-j), and we can write

ρ(-j) = E{[Y_t - μ][Y_{t-j} - μ]}/σ²   (5.3)

In the econometric analysis of an economic relationship in a classical regression model, one of the basic assumptions is that the random error components or disturbances are independently distributed. That is, the value that the random error term ε assumes in any period is independent of the value assumed in any previous period. Thus, in the regression model Y_t = x_t'β + ε_t, it is assumed that

E(ε_t ε_{t-j}) = σ² if j = 0, and E(ε_t ε_{t-j}) = 0 if j ≠ 0   (5.4)

i.e., the correlation between the successive disturbances is zero. When the assumption E(İ t ,İ t-j ) = ı 2 , j = 0, is violated, i.e., the variance of the disturbance term does not remain constant, then the problem of heteroscedasticity arises. When the assumption E(İ t ,İ t-j ) = 0, for t z j, is violated, i.e., the variance of the disturbance term remains constant through the successive values of the disturbance terms, but the disturbance terms are correlated, then we say that there is an autocorrelation or serial correlation problem in the data. Autocorrelation is a special case of correlation. Autocorrelation refers to a relationship between successive values of the same variable. When autocorrelation is absent, all of the off-diagonal elements of Var(İ) = E(İİ c) are zero; if autocorrelation is present, some or all of the off-diagonal elements will not be zero, where İ is a vector of the random error terms. We know, that in a time series variable, the autocorrelation problem is very common. The autocorrelation coefficient at lag j is denoted by ȡ j and is given by: ȡj =

ρ_j = Cov(ε_t, ε_{t-j})/√[Var(ε_t)Var(ε_{t-j})]   (5.5)

where Cov(ε_t, ε_{t-j}) is the autocovariance at lag j; it is denoted by γ_j and is given by

γ_j = Cov(ε_t, ε_{t-j}) = E{[ε_t - E(ε_t)][ε_{t-j} - E(ε_{t-j})]} = E(ε_t ε_{t-j}), since E(ε_t) = E(ε_{t-j}) = 0   (5.6)

Since ε_t is covariance stationary, we have

Var(ε_t) = Var(ε_{t-j}) = γ₀, where γ₀ = E[ε_t - E(ε_t)]² = E(ε_t²) ∀t   (5.7)

Therefore, equation (5.5) can be written as

ρ_j = γ_j/γ₀, j = 0, ±1, ±2, ±3, …   (5.8)

Assuming that ρ_j and γ_j are symmetric in j, i.e., that these coefficients are constant over time and depend only on the lag length j, the autocorrelation between the successive terms ε_t and ε_{t-1}, where t = 2, 3, … , T, gives the autocorrelation of order one, i.e., ρ₁. First-order autocorrelation is very common in economic phenomena and can generally be expressed as ε_t = ρ₁ε_{t-1} + v_t, where ε_t is the value of ε at time t, ε_{t-1} is the value of ε at time t-1, ρ₁ is called the autocorrelation coefficient, and v_t is a new random error term that satisfies all the usual assumptions of a CLRM. Similarly, the autocorrelation between the terms ε_t and ε_{t-2}, t = 3, 4, … , T, gives the autocorrelation of order two, i.e., ρ₂, and it can be expressed as a linear regression equation of the type ε_t = ρ₁ε_{t-1} + ρ₂ε_{t-2} + v_t. This equation is called an autoregressive model of order 2, where ρ₁ is the first-order autocorrelation coefficient and ρ₂ is the second-order autocorrelation coefficient. It is common practice in some disciplines other than Statistics to drop the normalisation by σ² and to use the term "autocorrelation" interchangeably with "autocovariance". However, the normalisation is important, both because the interpretation of the autocorrelation as a correlation provides a scale-free measure of the strength of statistical dependence, and because the normalisation affects the statistical properties of the estimated autocorrelations.
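A short sketch of how the sample counterparts of γ_j and ρ_j in (5.5)-(5.8) could be computed from a series of residuals (illustrative numpy code, not tied to any particular package's ACF routine).

```python
import numpy as np

def sample_autocorrelations(e, max_lag=10):
    """Sample rho_j = gamma_j / gamma_0 for j = 0, 1, ..., max_lag."""
    e = np.asarray(e, dtype=float)
    e = e - e.mean()
    T = len(e)
    gamma0 = (e @ e) / T
    rho = [1.0]
    for j in range(1, max_lag + 1):
        gamma_j = (e[j:] @ e[:-j]) / T   # sample autocovariance at lag j
        rho.append(gamma_j / gamma0)
    return np.array(rho)
```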

5.3 Sources of Autocorrelation There are several reasons for which the random error terms of a regression equation are autocorrelated. The following are the most important ones: (i) Omitted Explanatory Variables We know in business, economic, socio-economic, and financial problems etc., one variable is influenced by many other variables at the same time. But, various limitations, and circumstances do not allow investigators or researchers to include all the independent variables that affect the dependent variable in a given model. For the econometric analysis of various relationships, the investigator includes only important and directly fit variables in econometric models. Due to the omission of one or more explanatory variables from a set of explanatory variables in the model, the random error terms may be autocorrelated because the influence of the omitted variable may be reflected in the random variable İ . Because the error term in one period may be related to the error terms in successive periods, the problem of autocorrelation arises. For example, assume that the true demand function of beef is given by: Q t = ȕ 0 +ȕ1Pt +ȕ 2 Yt +ȕ 3 PO t +u t

(5.9)

where Q t is the quantity demanded of beef at time t; Pt is the unit price of beef at time t; Yt is the income level of a consumer at time t; PO t is the unit price of pork at time t, and u t is the true random error term corresponding to the tth set of observations (t = 1, 2,…….,T). However, for some reasons, we run the following demand function for beef: Q t = ȕ 0 +ȕ1Pt +ȕ 2 Yt +İ t

(5.10)

Equation (5.9) is the correct or the true model, but we run equation (5.10) for estimation, letting İ t = ȕ 3 PO t +u t . The error term İ represents the influence of the omitted variable PO. As a result, the disturbance term İ will reflect a systematic pattern. Because, many time series data exhibit trends over time, PO t is also likely to depend on PO t-1 , PO t-2 ,..... . This will cause a correlation between İ t and İ t-1 , İ t-2 , …………, thereby violating the assumption of the independence of the random error terms. Thus, the omission of explanatory variables may cause the problem of autocorrelation. (ii) Misspecification of the Mathematical Form of the Econometric Model

For econometric analysis of an economic relationship, if we use a mathematical form which differs from the true form of the relationship, then the disturbance terms must show the problem of serial correlation. For example, suppose the true or correct model in a cost-output study is as follows: MCi = ȕ 0 +ȕ1q i +ȕ 2 q i2 +u i

(5.11)

where MC is the marginal cost of a commodity, q is the level of output of that commodity, and u is the true random error term. But, for forecasting, the following equation is considered: MCi = ȕ 0 +ȕ1q i +İ i

(5.12)

We know that the marginal cost curve for the true model will be non-linear but our fitted model gives the linear relationship. This linear marginal cost function will consistently overestimate the true marginal cost curve, whereas,

Chapter Five

234

beyond these points, it will consistently underestimate the true marginal cost. This happens because the disturbance term İ i in the estimated linear equation contains the q i2 term, i.e., İ i = ȕ 2 q i2 +u i , which may cause inter-relation among successive values of İ , thereby, creating the problem of autocorrelation. (iii) Misspecification of the True Random Error Term

If the random error terms contain systematic measurement errors, it may cause the problem of autocorrelation. If the explanatory variable is measured wrongly, the disturbances will be autocorrelated. For example, suppose a country is updating its national savings in a given period of time. If there is a systematic error in the way that it was measured, cumulative savings will reflect accumulated measurement errors. This will show up as a serial correlation problem. (iv) Interpolation in the Statistical Observations

Most of the published time-series data by national or international agencies, and public or private agencies involve some of the interpolation and smoothing processes, which averages the true disturbances over successive time periods. As a result, the successive values of İ will be autocorrelated.

5.4 Mean, Variance and Covariance of Autocorrelated Random Error Terms First-Order Autoregressive Scheme

In this section, first, the techniques are discussed to calculate the mean, variance, and covariance of the random error terms İ's with the simple Markov process for the first-order autocorrelated model. Then, the discussion focuses on higher-order autoregressive models. Let us consider the regression equation of the type: y t = x tcȕ+İ t

(5.13)

where x t is a {(k+1) u 1} vector of k independent variables, ȕ is a {(k+1) u 1} vector of parameters. The random error terms İ t 's are autocorrelated. In this case, the first-order autoregressive structure is given by: İ t = ȡİ t-1 +v t [|ȡ| 1 , the test is not applicable. In this case, another approach namely: The Breusch-Godfrey testing approach will be applicable to test for serially-correlated residuals when lagged dependent variables appear in X’s. Ex. 5-5: Data on per capita real GDP (PGDP, constant 2015 USD), domestic investment (DINV), FDI, trade openness (OPN), and government expenditure (GEXP), of Bangladesh are collected from 1972-2018 2 to detect the autocorrelation problem assuming that there is a lagged dependent variable as an explanatory variable. For this purpose, the following regression equation is considered, PGDPt = ȕ 0 +ĮPGDPt-1 +ȕ1DINVt +ȕ 2 FDI t +ȕ3 OPN t +ȕ 4 GOEX t +İ t

(5.114)

We may expect that İ t = ȡİ t-1 + v t , where {v t } is a Gaussian white noise process. The null hypothesis to be tested is, H0 : ȡ = 0

2

DINV: measured as the difference between the new and the disposed fixed assets owned by the household, business and government sectors; FDI: measured as the difference between credits and debits in capital transactions; OPN: measured as the trade-to-GDP ratio which is calculated as the sum of the exports and imports divided by GDP; GEXP: the total government expenditure on goods and services for people. Except the per-capita real GDP, all other variables are measured as a percentage of the nominal GDP. Source: WDI.


against the alternative hypothesis H₁: ρ ≠ 0.

Since a lagged dependent variable is included in the equation as a regressor, Durbin's h test is applicable for testing the null hypothesis. Under the null hypothesis, the test statistic is given by equation (5.113). We apply the OLS method to equation (5.114) and then obtain the Durbin-Watson test statistic d and SE(α̂). The results are given in Table 5-3.

Table 5-3: The OLS results of equation (5.114)

Linear Regression - Estimation by Least Squares, Dependent Variable PGDP
Variable | Coeff | Std Error | T-Stat | Signif
Constant | -28.3017 | 6.6784 | -4.2378 | 0.0001
PGDP{1} | 1.0837 | 0.0130 | 83.5133 | 0.0000
FDI | -1.6941 | 5.6769 | -0.2984 | 0.7669
DINV | -0.3381 | 0.5564 | -0.6077 | 0.5468
OPN | 0.1921 | 0.3135 | 0.6128 | 0.5435
GOEX | 1.0444 | 2.3444 | 0.4455 | 0.6584
Centered R² 0.9990; Uncentered R² 0.9998; Adjusted R² 0.9989; TR² 45.993; Mean of Dependent Variable 559.9193; Std Error of Dependent Variable 235.9251; S.E. of Estimate 7.7524; Sum of Squared Residuals 2403.9998; Log Likelihood -156.2649; Regression F(5, 40) 8327.2127; Significance Level of F 0.0000; Durbin-Watson Statistic 2.6134

From the estimated results in Table 5-3, we have the Durbin-Watson test statistic d = 2.6134. The standard error for ˆ = 0.0130, and T = 46. Thus, the least squares coefficient on the first lagged dependent variable PGDPt-1 is SE(Į) putting these values in equation (5.1123), we have 46 § 2.61339 · h = ¨1  ~N(0, 1) ¸ 2 © ¹ 1  {46 u 0.013}

= -3.2807

(5.115)

Let the level of significance be 5%. At a 5% level of significance, the table value of the test statistic is r1.96 . Since the calculated value is less than -1.96, the null hypothesis will be rejected. Thus, it can be concluded that the random error terms are autocorrelated The Breusch-Godfrey LM Test for the First-Order Autocorrelation

The Breusch-Godfrey LM test is useful in identifying serial correlation for the first-order autocorrelation. The test procedure is discussed below: First, we regress Y on X’s of the type, y t = x ct ȕ +İ t

(5.116)

We may suspect that, İ t = ȡİ t-1 + v t , where {v t } is a Gaussian white noise process. The null hypothesis to be tested is, H 0 : ȡ = 0, against H1: ȡ z 0.

We rewrite equation (5.116) y t = ȡy t-1  ȡy t-1 +x ct ȕ+İ t y t = ȡy t-1  ȡ > x ct-1ȕ +İ t-1 @ + x ct ȕ +İ t , [  y t-1 = x ct-1ȕ +İ t-1 ] y t = ȡy t-1  > x ct  ȡx ct-1 @ ȕ +İ t  ȡİ t-1 y t = ȡy t-1  > x ct  ȡx ct-1 @ ȕ+v t

(5.117)

Autocorrelation

where v t

259

İ t  ȡİ t-1

Thus, the restricted maximum likelihood estimator of ȕ is equal to the OLS estimator of ȕ of equation (5.116). The OLS residuals are given by, e t = y t  x ct ȕˆ

(5.118)

Under the null hypothesis, the test statistic is given by 2 LM = (T  1)R e2 ~Ȥ (1)d.f.

(5.119)

where R e2 is the coefficient of determination which is obtained from the regression equation of the type: e t = x ct G  O e t-1  u t , if x t contains lagged dependent or stochastic variables

(5.120)

Otherwise R e2 is obtained from the regression equation of the type: e t =O e t-1  u t

(5.121)

where {u t } is a Gaussian white noise process. Let the level of significance be Į = 5% We compare the calculated value of the test statistic with the table value for decision-making. Let, at a 5% level of significance with 1 degree of freedom the table value of the test statistic be Ȥ 2tab . If the calculated value of the test statistic is greater than Ȥ 2tab , the null hypothesis of no autocorrelation will be rejected implying that the error terms are autocorrelated. We can also apply the F test statistic which is given by: F=

R e2 / m ~F(m, T  k) (1  R e2 ) /(T  k)

(5.122)

where m = number of restrictions and k = number of estimated coefficients. Ex. 5-6: Perform the Breusch-Goldfrey LM test for the first-order autocorrelation of the given problem in Ex. 5-5. Solution: First, we regress PGDP on DINV, FDI, OPN and GOEX of the type, PGDPt = ȕ 0 +ȕ1DINVt +ȕ 2 FDI t +ȕ 3 OPN t +ȕ 4 GOEX t +İ t

(5.123)

We may suspect that İ t = ȡİ t-1 + v t , where {v t } is a Gaussian white noise process. The null hypothesis to be tested is, H 0 : ȡ = 0, against H1: ȡ z 0

We apply the OLS method to run equation (5.123) and then we obtain the residuals. The OLS estimates are given below: ˆ =100.1952+11.6695DINV +216.4671FDI +3.0528OPN +10.0424GOEX ½ PGDP t t t t t ° t-Test: 1.2047 1.8062 3.2338 0.7613 0.3233 ° ¾ 4.010 31.0539 SE: 83.168 6.4609 66.9373 ° 2 ° R 0.8265 ¿

(5.124)

The OLS residuals are given by, e t = PGDPt -100.1952-11.6695DINVt -216.4671FDI t -3.0528OPN t -10.0424GOEX t

Now, we regress e t on e t-1 of the type,

(5.125)

Chapter Five

260

e t = O e t-1 +u t

(5.126)

where ^u t ` is a Gaussian white noise process. We now apply the OLS method to run equation (5.126) and then we obtain the goodness of fit, i.e., coefficient of determination R e2 . The OLS estimates of equation (5.126), are given below, eˆ t = 0.8439e t-1 , R e2 =0.5609½ ° t-Test: 7.5869 ¾ ° SE: 0.1112 ¿

(5.127)

Under the null hypothesis, the test statistic is given by: 2 LM = 46 u 0.5609~Ȥ (1)d.f.

= 25.8014

(5.128)

Let the level of significance be 5%. At a 5% level of significance with 1 degree of freedom, the table value of the Chi-square test statistic is 3.84. Since, the calculated value is greater than the table value, the null hypothesis of no autocorrelation will be rejected implying that the error terms are autocorrelated. The Breusch-Godfrey LM Test for Higher-Order Autocorrelation

The Breusch-Godfrey LM test is useful in identifying serial correlation not only for the first order but also for higher orders. One of the serious problems in the Durbin-Watson test is that the Durbin-Watson test is applicable based on the assumption that the explanatory variables are non-stochastic (fixed in repeated samples). The application of the Breusch-Godfrey LM test for autocorrelation does not depend on the assumption of non-stochastic. It is also applicable if the explanatory variables contain stochastic variables, i.e., lagged dependent variables. Previously, we discussed the test procedure for first-order autocorrelation. Here, we discuss the test procedure for higher-order autocorrelation. This test is a specific type of Lagrange multiplier test. This test procedure involves the following steps: Step 1: First, we regress y on X’s of the type y t = x ct ȕ +İ t

(5.129)

We may expect that İ t = ȡ1İ t-1 +ȡ 2 İ t-2 +......+ȡ p İ t-p + v t , where {v t } is a Gaussian white noise process. It is called the pth-order autoregressive scheme or AR(p) model. Step 2: Second, we set up the null hypothesis against an appropriate alternative hypothesis. The null hypothesis to be tested is,

H 0 : ȡ1 = ȡ 2 =........= ȡ p = 0 against the alternative hypothesis

H1: At least one of them is not zero. Step 3: We apply the OLS method to run equation (5.128) and then we obtain the OLS residuals e’s. Step 4: If the explanatory variables contain stochastic variables, i.e., lagged dependent variables, then we regress e t on all original x’s and e t-1 , e t-2 ,......,e t-p of the type e t = x ctD +G1e t-1 +G 2 e t-2 +......+G p e t-p + u t

(5.130)

If, all the x’s are strictly non-stochastic, we regress e t on e t-1 , e t-2 ,......,e t-p of the type e t = G1e t-1 +G 2 e t-2 +......+G p e t-p + u t

(5.131)

Autocorrelation

261

where {u t } is a Gaussian white noise process. Step 5: We apply the OLS method to run equation (5.130) or (5.131), then we obtain the OLS residuals which are given by, uˆ t = e t  x ctDˆ  Gˆ1e t-1  Gˆ2 e t-2  ......  Gˆp e t-p

(5.132)

Or uˆ t = e t  Gˆ1e t-1  Gˆ2 e t-2  ......  Gˆp e t-p

(5.133)

We then compute R e2 from the auxiliary regression equation either (5.130) or (5.131) which is given by, T

R e2 = 1 

¦ uˆ

2 t

t=p+1 T

(5.134)

¦ e2t

t=p+1

Since the Breusch-Godfrey LM (BG) test cannot specify the lag length of “e”, the Akaike Information Criteria (AIC), the Schwarz Bayesian Information Criteria (SBIC), or the Hannan-Quinn information criterion (HQIC) can be applied to select the lagged values of “e” in equation (5.130) or (5.131). These criteria are defined in equations (4.146), (4.147) and (4.148) respectively. Step 6: Under the null hypothesis, the test statistic is given by: BG = (T  p)R e2 ~Ȥ 2p

(5.135)

where T is the sample size and p is the number of restrictions. Step 7: Finally, we make a decision by comparing the calculated value of the test statistic with the corresponding table value. Let, the level of significance be 5%. At a 5% level of significance with p degrees of freedom, we find the table value of the Chi-square test statistic. If the calculated value of the test statistic is greater than the table value, we reject the null hypothesis. Otherwise, we accept the hypothesis.

We can also apply the F test statistic which is given by: F=

R e2 / p ~F(p, T-k) (1  R e2 ) / T-k

(5.136)

where p = number of restrictions, and k = number of estimated coefficients. At a given level of significance with p and (T-k) degrees of freedom, we find the table value of the F-test statistic. If the calculated value of the test statistic is greater than the table value, we reject the null hypothesis. Otherwise, we accept the hypothesis. Ex. 5-7: Test the higher-order autocorrelation using the Breusch-Godfrey LM (BG) test considering the regression equation for the given problem in Ex. 5-1. Solution: First, we regress PCEX on PGDP of the type, PCEX t = ȕ 0 +ȕ1PGDPt +İ t

(5.137)

We may expect that, İ t = ȡ1İ t-1 +ȡ 2 İ t-2 +....+ȡ p İ t-p + v t , where {v t } is a Gaussian white noise process. The null hypothesis to be tested is H 0 : ȡ1 = ȡ 2 =.......= ȡ p = 0

against the alternative hypothesis

H1: At least one of them is not zero.

Chapter Five

262

We apply the OLS method to run equation (5.137) and then we obtain the OLS residuals e’s, which are given by equation (5.92). We now regress e t on e t-1 , e t-2 ,......,e t-p of the type, e t = G1e t-1 +G 2 e t-2 +......+G p e t-p + u t

(5.138)

where {u t } is a Gaussian white noise process. We apply the OLS method to run equation (5.138) and then we obtain the OLS residuals uˆ t 's and hence R e2 from this auxiliary regression equation. Using the AIC, BIC and HQIC criteria, it is found that p= 2. The OLS estimates for p=2 of equation (5.138) are given below, eˆ t = 0.8884e t-1  0.0908e t-2 , R e2 = 0.7485½ ° t-Test: 5.5270 -0.764 ¾ ° SE: 0.1607 0.1188 ¿

(5.139)

Under the null hypothesis, the Breusch-Godfrey LM (BG) test statistic is given by, 2 BG = 45 u 0.7485 ~Ȥ (2)d.f.

= 33.6825

(5.140)

Let the level of significance be 5%. At a 5% level of significance with 2 degrees of freedom, the table value of the Chi-square test statistic is 5.99. Since, the calculated value of the test statistic is greater than the table value we reject the null hypothesis. Thus, it can be concluded that the random error terms are autocorrelated of order 2. The Runs Test for Autocorrelation

The null hypothesis to be tested is: H 0 : The errors are random or independent

against the alternative hypothesis H1: The errors are autocorrelated.

The test procedure involves the following steps: Step 1: First, we regress Y on X’s, of the type, y t = x ct ȕ +İ t

(5.141)

We may expect that İ t = ȡİ t-1 + v t , where {v t } is a Gaussian white noise process. Step 2: We apply the OLS method to run equation (5.141), and then we obtain the least squares residuals e’s. We have to observe the sign of every residual for a given data. We will get the sequences of + signs and minus signs of residuals like,

(++++++++)(---------)(+++)(----) (+++++++++++++)(--)(+)(-) Step 3: We have to find the number of runs from the sequences of + signs and – signs. Let, k indicates the number of runs. From the above sequences of signs, the number of runs k is 8. Step 4: We have to calculate the mean and the variance of k which are given by, E(k) =

2n n ^2n1n 2  n1  n 2 ` 2n1n 2 +1, and Var(k) = 1 2 . 3 n1  n 2 ^n1  n 2 ` ^n1  n 2  1`

where n = total number of observations (n = n1 + n 2 ) , n1 = number of + ve signs, n 2 = number of – ve signs, and k = number of runs. Step 6: Under the null hypothesis, the test statistic is given by,

Autocorrelation

Z=

k  E(k) Var(k)

263

~N(0, 1)

(5.142)

Step7: We have to select the level of significance that we can tolerate: Let, the level of significance be 5%. Step 8: At a 5% level of significance, the table value of the test statistic is r1.96 . Step 9: We have to compare the calculated value of the test statistic with the table value to make a decision on whether the null hypothesis will be accepted or not. If the calculated value of Z lies between -1.96 and +1.96, we accept the null hypothesis, indicating that the random error terms are not autocorrelated. Otherwise, we reject the null hypothesis indicating that the random error terms are autocorrelated. Ex. 5-8: Test the autocorrelation problem using the run test for the given problem in Ex. 5-5. Solution: First, we regress economic growth (PGDP) on domestic investment (DINV), foreign direct investment (FDI), trade openness (OPN) and government expenditure (GOEX) of the type is given by (5.123).

The null hypothesis to be tested is,

H 0 : The errors are random or independent against the alternative hypothesis

H1: The errors are autocorrelated We apply the OLS method to run equation (5.123) and then we obtain the OLS residuals. From the sequences of + ve signs and – ve signs of OLS residuals, it is found that the number of runs k is 11. It is also found that the number of +ve signs of OLS residuals is, n1 20, and the number of –ve signs is, n 2 27 . Thus, the mean and the variance of k are given by, E(k) =

2 u 20 u 27 ^2 u 20 u 27  47` 2 u 20 u 27 +1= 23.9787 , and Var(k) = 3 20  27 ^47` ^47  1`

0.2336

Under the null hypothesis, the test statistic is Z=

11  23.9787 0.2336

~N(0, 1)

= -26.8531

(5.143)

Let the level of significance be 5% Step 8: At a 5% level of significance, the table values of the test statistic is r1.96 . Since, the calculated value of Z does not fall between -1.96 and +1.96, we reject the null hypothesis indicating that the random error terms are autocorrelated Q-Test for Higher-Order Autocorrelation: Box-Pierce (1970) Q Test and Ljung-Box (1979) Q Test

This test statistic was derived by Box and Pierce in 1970 to test the joint null hypothesis that all the autocorrelation coefficients are simultaneously equal to zero. The null hypothesis to be tested is, H 0 : ȡ1 = ȡ 2 =........= ȡ p = 0 against the alternative hypothesis H1: At least one of them is not zero.

This test involves the following steps: Step 1: First, we regress Y on X’s of the type, y t = x ct ȕ + İ t , t = 1, 2,........,T

(5.144)

We may expect that İ t = ȡ1İ t-1 +ȡ 2 İ t-2 +......+ȡ p İ t-p + v t , where ^v t ` is a Gaussian white noise process.

Chapter Five

264

Step 2: We apply the OLS method to run equation (5.144), and then obtain the least squares residuals e’s. Step 3: We obtain the estimated value of ȡ j (j =1, 2,…..,p) which is given by, T

¦e e

t t-j

rj =

t=j+1 T

(5.145)

¦ e2t t=1

Step 4: We obtain the value of the test statistic. Under the null hypothesis, the test statistic is given by p

Q = T ¦ rj2 ~ Ȥ 2p

(5.146)

j=1

Step 5: We have to select the level of significance that we can tolerate. Let, the level of significance be Į = 5% . Step 6: At a 5% level of significance with p degrees of freedom, we find the table value of the Chi-square test statistic. If the calculated value of the test statistic is greater than the table value, the null hypothesis will be rejected implying that the error terms are autocorrelated with order p. Otherwise, we accept the null hypothesis, implying that the error terms are not autocorrelated.

It is found that the Box-Pierce test leads to the wrong decision frequently when the sample size is small. A modified version of the Box-Pierce test has better small-sample properties which was derived by Ljung and Box in 1979. This test is known as the Ljung-Box (1979) test. The test statistic is given by: p

Q = T ^T+2` ¦

rj2

j=1 (T  j)

~ Ȥ 2p

(5.147)

It is very clear from the Ljung-Box Q test statistic that, if the sample size is very large as T tends to infinity, the terms (T+2) and (T-j) of equation (5.147) will be cancelled out. Hence, for a large sample size, the Ljung-Box test will be equivalent to the Box-Pierce Q test. Thus, it is reasonable that the Ljung-Box Q test is more powerful than the BoxPierce Q test. To select the lagged length, we can apply different selection criteria say: AIC, SBIC and HQIC criterion. For estimating these criteria, see equations (4.146), (4.147) and (4.148) respectively. Ex. 5-9: The first six autocorrelation coefficients are estimated from the data of savings and investment of the USA and are given below:

Lag Autocorrelation Coefficient

1 0.8708

2 0.6301

3 0.3924

4 0.2026

5 0.0739

6 -0.0516

Test whether each of the individual correlation coefficients is statistically significant or not and test the significance of all six correlation coefficients jointly using both Box-Pierce and Ljung-Box tests. Solution: To test the significance of each of the individual correlation coefficients, we have to construct the 95% 1 . Given that T = 55, we have CI = [-0.2643, 0.2643]. confidence interval (CI) for each coefficient using r1.96 u T Therefore, it can be concluded that the first three correlation coefficients are individually significantly different from zero at a 5% level of significance because the estimated values of the first three correlation coefficients lie outside the range [-0.2643, 0.2643]. To test whether the six autocorrelation coefficients are simultaneously equal to zero using the Box-Pierce Q and Ljung-Box Q tests, we set up the following null hypothesis, H 0 : ȡ1 =ȡ 2 =.........=ȡ 6 = 0

against the alternative hypothesis H1: At least one of them is not zero.

Under the null hypothesis, the Box-Pierce Q test statistic is given by,

Autocorrelation

265

Q = 55 u ^0.87082  0.63012 +0.39242 +0.202632 +0.07392 + (-0.0516) 2 ` ~Ȥ 62 (5.148)

= 72.4581

Under the null hypothesis, the Ljung-Box Q test statistic is given by, ­ 0.87082 0.63012 0.39242 0.20262 0.0739 2 (-0.0516) 2 ½ 2  Q* = 55 u 57 u ® + + + + ¾ ~Ȥ 6 53 52 51 50 49 ¯ 54 ¿

(5.149)

= 81.181

Let the level of significance be 5%. At a 5% level of significance with 6 degrees of freedom, the table value of the Chi-square test statistic is 12.59. Since, the calculated values of both tests are greater than the table value, the null hypothesis will be rejected, implying that all of the first six autocorrelation coefficients are not zero. Note: In this instance, the individual test cause an acceptance of the null hypotheses H 0 : ȡ 4 = 0, H 0 : ȡ5 = 0, and H 0 : ȡ6 = 0, while the joint test did not. This is an unexpected result that may happen due to the low power of the joint test when three of the six individual autocorrelation coefficients are insignificant. Thus, the effect of the insignificant autocorrelation coefficient is diluted in the joint test by the significant coefficients. The sample size is also quite large in this example.

5.9 Methods for Estimating Autocorrelated Models The methods which are most popular and widely applicable for estimating autocorrelated models are discussed in this section. These are: (i) Generalized least squares (GLS) method (ii) Maximum likelihood (ML) method (iii) Feasible generalized least squares (FGLS) method Generalized Least Squares (GLS) Method

It would be possible for us to apply the GLS method to estimate an autocorrelated model if the form of autocorrelation is known. One approach which is most popular and widely applicable is known as the Cochrane-Orcutt procedure. This method works very well for the first-order autoregressive model. Let, us consider the regression equation of the type, (5.150)

Y = Xȕ +İ

We may expect that, İ t = ȡİ t-1 + v t , where ^v t ` is a Gaussian white noise process. We have Cov(İ t , İ t-j ) = ȡ jı İ2 , Var(İ t ) =

ı 2v = ı İ2 ,  t , and Corr(İ t , İ t-j ) = ȡ j . 1  ȡ2

We see that the lagged correlations are all powers of ȡ and they decline geometrically. These expressions can be used to derive the variance-covariance matrix of İ, and using this variance-covariance matrix, we can obtain the coefficient of the regression equation. This is called the GLS method. We have E(İ) = 0, and Var(İ) =ı İ2 ȍ, where ȍ is defined in equation (5.31). If, : is known, the GLS estimator of ȕ  is given by: -1 ȕˆ GLS = ª¬ Xc:-1X º¼ ª¬ X c:-1Y º¼

The sampling variance of ȕˆ GLS is given by:

(5.151)

Chapter Five

266 -1 Var(ȕˆ GLS ) = ª¬ X c:-1X º¼ ı İ2 -1

= ª¬ Xc:-1X º¼ s 2

(5.152)

where s2 =

ec:-1e , (n  k  1)

(5.153)

Here, e = Y  Xȕˆ GLS , and k is the number of explanatory variables. For : 1 , see the equation (5.68). In the case of the first-order autoregressive scheme of the random error terms, the estimate of ȕ can be obtained alternatively by using a simple two-stage procedure namely: In the first stage, we transform the original observation by the use of the known parameter ȡ . In the second stage, we can apply the OLS method to the transformed model. The procedure is discussed below: We are now seeking a transformation matrix P such that, PY = PXȕ+Pİ Y* =X*ȕ+İ*

(5.154)

The new random error vector İ* of the transformed equation (5.154) has a scalar variance-covariance matrix which is given by: Var(İ* ) = PVar(İ)Pc

= Pı İ2 ȍPc = ı İ2 PȍPc = ı İ2 I n

(5.155)

If we define, ª 1 U 2 « « U « 0 « « . 2 1 U P = H = « « . « . « « 0 «« 0 ¬

0

0 ....

0

0

1

0 ....

0

0

U

1 ....

0

0

.

. ....

.

.

.

. ....

.

.

.

. ....

.

.

0 0

0 ....  U 0 .... 0

1 U

0º » 0» 0» » .» » .» .» » 0» 1 »¼»

(5.156)

Then it can be shown that E(Hİİ cHc) = ı İ2 I n . Thus, the OLS method can be applied to the transformed equation of the type: HY = HXȕ + Hİ

HY is given by:

(5.157)

Autocorrelation

ª 1 U 2 « « U « 0 « « . HY = « « . « . « « 0 «¬« 0

0

0 ....

0

1 U

0 .... 1 ....

0 0

.

. ....

.

.

. ....

.

. 0

. .... . 0 ....  U

0

0 ....

0º » 0 0» 0 0» » . .» » . .» . .» » 1 0»  U 1 »¼» 0

0

267

ª Y1 º «Y » « 2 » « Y3 » « » « . » « . » « » « . » «Y » « T-1 » ¬« YT ¼»

ª 1 U 2 Y º 1 « » « Y2  ȡY1 » « Y  ȡY » 2 » « 3 = « » . « » . « » « » . « » «¬ YT  ȡYT-1 »¼

(5.158)

and HX is given by: ª 1 U 2 « « U « 0 « « . HX = « . « « . « « 0 «« 0 ¬

ª 1  ȡ2 « « 1 ȡ « . « « . « « . « . « « . « 1 ȡ «¬

0

0 ....

0

0

1 U

0 .... 1 ....

0 0

0 0

.

.

....

.

.

. .

. .

.... ....

. .

. .

0

0 ....  U

0

0 ....

0

1 U

0º » 0» 0» » .» » .» .» » 0» 1 »»¼

ª1 X 11 «1 X 12 « «1 X 13 « . «. «. . « . «. «. . « «¬1 X 1T

1  U 2 X 11 X 12  ȡX 11

1  ȡ 2 X 21 X 22  ȡX 21

.... . . .... . .

.

.

.... . .

.

.

.... . .

.

.

.... . .

.

.

.... . .

X 1T

.  ȡX 1,T-1

X 2T

.  ȡX 2,T-1

.... . . .... . .

X 21 X 22

... . . . ... . . .

X 23

... . . .

. .

... . . . ... . . .

.

... . . .

. X 2T

... . . . ... . . .

1  ȡ 2 X k1 º » X k2  ȡX k1 » » . » » . » . » » . » . » X kT  ȡX k,T-1 »»¼

X k1 º X k2 »» X k3 » » . » . » » . » . » » X kT »¼

(5.159)

These transformations can be done in different ways including partial differences, quasi-differences or pseudodifferences. One more common and widely used transformation is to multiply equation (5.150) by ª U « 0 « « . « H1 = « . « . « « 0 «« 0 ¬

1 U

0 1

.... ....

0 0

0 0

. .

. .

.... ....

. .

. .

.

.

....

.

.

0 0

0 0

.... ....

U 0

1 U

0º 0 »» .» » .» .» » 0» 1 »»¼

(5.160)

where H1 is obtained by deleting the first row of the matrix H. Thus, we are now using (T-1) observations rather than T where each transformed observation is obtained by subtracting ȡ times the previous observation from the current observation. It can be easily shown that the application of the OLS method to the transformed equation HY = HXȕ+Hİ will give exactly the estimator, while with the application of the OLS method to the transformed

Chapter Five

268

equation H1Y = H1Xȕ + H1İ , the result will be a very close approximation. The GLS estimates cannot be computed directly if the order of the autocorrelation structure is not known or the value(s) of the parameter(s) is unknown. Let, us now discuss the method considering the simple linear regression equation of the type: Yt = ȕ 0 +ȕ1X t +İ t ; t = 1, 2,……..,T

(5.161)

We may expect that, İ t = ȡİ t-1 + v t , where {v t } is a Gaussian white noise process. For (t-1) time period, equation (5.161) can be written as, (5.162)

Yt-1 = ȕ 0 +ȕ1X t-1 +İ t-1

Multiplying equation (5.162) by ȡ , we have (5.163)

ȡYt-1 = ȡȕ 0 + ȡȕ1X t-1 +ȡİ t-1

Subtracting equation (5.163) from (5.161), we have Yt  ȡYt-1 = ȕ 0 >1  ȡ @ + ȕ1 > X t  ȡX t-1 @ +v t

Yt* = Į +ȕ1X*t +v t

(5.164)

where Yt* = Yt  ȡYt-1 , Į

ȕ 0 >1  ȡ @ , X*t

X t  ȡX t-1 , and v t

İ t  ȡİ t-1 .

Since, ^v t ` is a Gaussian white noise process, we can apply the OLS method to estimate the parameters Į and ȕ1 of the transformed model (5.164). However, this statement is not entirely correct. Since, in this method, we cannot apply all the observations, we applied here (T-1) observations, and we are losing one observation in the process of taking the first difference. Thus, this procedure is not exactly the GLS procedure. For the first observation, the equation (5.161), can be written as (5.165)

Y1 = ȕ 0 +ȕ1X1 +İ1

We

have

to

multiply

equation

(5.164)

by

1-ȡ 2

because

the

variance

of

İ1

is

ı İ2 =

ı 2v . 1  ȡ2

Thus, the transformed equation for the first observation will be, Y1 1  ȡ 2 = ȕ 0 1  ȡ 2 + ȕ1X1 1  ȡ 2 +İ1 1  ȡ 2

Y1* = Į +ȕ1X1* + v1 If we make the transformations Yt* = Yt  ȡYt-1 , X*t

(5.166) X t  ȡX t-1 , for t = 2, 3,….,T, Y1* = 1-ȡ 2 Y1 , and X1* = 1-ȡ 2 X1 ,

for t=1, and then we regress Yt* on X*t , for t = 1, 2,…,T, i.e., Yt* = Į +ȕ1X*t + v t

(5.167)

All of the v’s have the same variance to the transformed equation (5.1667 for T observations. Thus, we can apply the OLS method to equation (5.167), which is known as the GLS method, and the resulting estimates are GLS estimates, which are BLUE. Ex. 5-10: Obtain the GLS estimators of the regression coefficients for the given problem in Ex.5-5 if ȡˆ = 0.779 . Solution: The transformations of the variables for t = 2, 3,………,T are given by,

ˆ ˆ ˆ ˆ ˆ Yt = PGDPt  ȡPGDP t-1 , X1t = FDI t  ȡFDI t-1 , X2t = DINVt  ȡDINVt-1 , X3t = GEXPt  ȡGEXPt-1 , and X4t = OPNt  ȡOPNt-1 . The transformations of the variables for the first observations are given by,

Autocorrelation

269

Y1 = 1  ȡˆ 2 PGDP1 , X11 = 1  ȡˆ 2 FDI1 , X 21 = 1  ȡˆ 2 DINV1 , X 31 = 1  ȡˆ 2 GEXP1 , and X 41 = 1  ȡˆ 2 OPN1 . We now regress Y on X’s for t= 1, 2,……,T of the type: (5.168)

Yt = ȕ 0 + ȕ1X1t + ȕ 2 X 2t + ȕ3 X 3t + ȕ 4 X 4t + v t

where ^v t ` is a white-noise process In matrix form, equation (5.168) can be written as (5.169)

Y = Xȕ+v

Since, {v t } is a Gaussian white noise process, we can apply OLS to the transformed model (5.169) where the resultant estimator of ȕ is called the GLS estimator of ȕ . The GLS estimator of ȕ is given by, -1 ȕˆ GLS = X cX XcY

(5.170)

For the transformed variables, we have

0.0346 -0.0168 -0.0697 ª 0.2013 « 0.0346 0.4071 -6.6545e-003 0.0125 « (XcX)-1 = « -0.0168 -6.6545e-003 6.0633e-003 -8.0719e-003 « 0.0125 -8.0719e-003 0.1379 « -0.0697 «¬-4.0322e-003 -8.8987e-003 -6.7796e-004 -5.4552e-003

-4.0322e-003º -8.8987e-003»» -6.7796e-004» » -5.4552e-003» 2.1504e-003 »¼

(5.171)

and ª 6580.4584 º «1003.5497 » « » (XcY) = «35963.2100 » « » « 7410.6105 » «¬ 48937.2886 »¼

(5.172)

Thus, we have

ȕˆ GLS

0.0346 -0.0168 -0.0697 ª 0.2013 « 0.0346 0.4071 -6.6545e-003 0.0125 « « = -0.0168 -6.6545e-003 6.0633e-003 -8.0719e-003 « -0.0697 0.0125 -8.0719e-003 0.1379 « «¬-4.0322e-003 -8.8987e-003 -6.7796e-004 -5.4552e-003

ª 42.09816 º «54.08492 » « » = « 7.97883 » « » «18.52279 » «¬ 4.96425 »¼

-4.0322e-003º ª6580.4584 º -8.8987e-003»» ««1003.5497 »» -6.7796e-004 » «35963.2100 » » »« -5.4552e-003» «7410.6105 » 2.1504e-003 »¼ «¬ 48937.2886 »¼

(5.173)

The variance-covariance matrix of ȕˆ GLS is given by T

Var ª¬ȕˆ GLS º¼ = > X cX @ ı , where ıˆ = -1

2

2

For the transformed model, we have

¦e

2 i

i=1

T-k-1

(5.174)

Chapter Five

270

ıˆ 2 =

127130.0946 = 3026.907 . 42

Hence, we have 0.0346 -0.0168 -0.0697 ª 0.2013 « 0.0346 0.4071 -6.6545e-003 0.0125 « Var ª¬ȕˆ GLS º¼ = « -0.0168 -6.6545e-003 6.0633e-003 -8.0719e-003 « 0.0125 -8.0719e-003 0.1379 « -0.0697 «¬-4.0322e-003 -8.8987e-003 -6.7796e-004 -5.4552e-003

ª « « = « « « «¬

609.1654

104.7944

-50.7845

-210.8689

104.7944

1232.2625

-20.1426

37.7865

-50.7845 -210.8689

-20.1426 37.7865

18.3532 -24.4328

-24.4328 417.3095

-12.2051

-26.9355

-2.0521

-16.5125

-4.0322e-003º -8.8987e-003»» -6.7796e-004 » u 3026.907 » -5.4552e-003» 2.1504e-003 »¼

-12.2051º -26.9355 »» -2.0521 » » -16.5125» 6.5092 »¼

(5.175)

Maximum Likelihood Method

We can apply the ML method to estimate an AR(1) model if the form of autocorrelation is either known or unknown. If the form of autocorrelation is known, the GLS and ML estimation for AR(1) model are equivalent. Let, us consider the following general linear regression equation: (5.176)

y t = x ct ȕ +İ t , t = 1, 2,.......,T

We may expect that, İ t = ȡİ t-1 + u t , where {u t } is a Gaussian white noise process. We have Cov(İ t , İ t-j ) = ȡ jı İ2 , and Var(İ t ) =

ı 2u = ı İ2 ,  t 1  ȡ2

Thus, the lagged correlations are all powers of ȡ and they decline geometrically. These expressions can be used to derive the variance-covariance matrix of the errors İ (see equation (5.30)), and using these variance-covariance matrix, we can then obtain the ML estimates of the coefficients of the regression equation (5.176). The likelihood function of the observations y1 , y 2 ,......, and y T is given by, f(y1 , y 2 ,......,yT ) = f(y1 ) f(y 2 |y1 ) f(y3 |y2 ).......f(yT |yT-1 )

(5.177)

The transformed model is given by, y*t = x *t cȕ +u t

(5.178)

So that in terms of the original data, we have y1 = x1*c E +

u1 1 U 2

, for t =1

(5.179)

For the remaining observations, we have y t |y t-1 = ȡy t-1 +x ct ȕ  x ct-1ȡȕ +u t , for t = 2, 3,………,T

(5.180)

For the first observation y1 , the p.d.f is given by f(y1 ) =

=





1  U 2 f u1 ª 1  ȡ 2 y1  x1*cȕ º ¬« ¼»

1  ȡ2 2ʌı 2u

e



1 ȡ 2 2ı 2u

> y1  x1cȕ @2

(5.181)

Autocorrelation

271

The p.d.f of y t |y t-1 1

f(y t |y t-1 ) =

e

2ʌı 2u



2 1 ª y t  ȡy t-1  x t  ȡx t-1 c ȕ º» ¼ 2ı 2u ¬«

, t = 2, 3,…..,T

(5.182)

Therefore, the likelihood function for the whole sample is given by, L=

1  ȡ2 2ʌı 2u

e



1 ȡ 2 2ı 2u

> y1  x1cȕ @2

1 ª 1  2 ª y t  ȡy t-1  x t  ȡx t-1 c ȕ º ¼» 2ı u ¬« « e – 2 t=2 « 2ʌı u ¬ T

2

º » ; t = 1, 2,….,T. » ¼

(5.183)

The log likelihood function is given by: 1 1  ȡ2 T 1 2 ªlog(2ʌ)+log(ı 2u ) º¼ log L =  ª¬ log(2ʌ)+log(ı 2u )  log(1  ȡ 2 º¼  y  x1cȕ @  2 > 1 2 2ı u 2 ¬



log L = 

1 2ı 2u

T

¦ ª¬ y t=2

t

 ȡy t-1  x t  ȡx t-1 c ȕ º ¼

2

T 1 1 ª log(2ʌ) + log(ı 2u ) º¼  log(1  ȡ 2 )  2 2¬ 2 2ı u

(5.184) T

¦ ª¬ y t=2

* t

 x *t cȕ º ¼

2

(5.185)

The sum of squares term of logL function is the residual sum of squares of the transformed model (5.178). Thus, the maximization of logL with respect to ȕ is equivalent to the minimization of the residual sum of squares. Therefore, maximizing logL with respect to ȕ , we have § T · ȕˆ ML = ¨ ¦ x *t x *t c ¸ © t=1 ¹

-1 T

* t

¦x y

* t

t=1

-1

= ª¬ Xc: -1X º¼ X c: -1Y

(5.186)

If ȡ is known then the ML estimates of ȕ will be equivalent to the GLS estimates of ȕ . Taking the partial derivative of logL with respect to ı 2u and then equating to zero, we have ıˆ 2u =

1 T 2 ¦ uˆ t T t=1

(5.187)

The information matrix is given by:

I ȕ, ı 2u

ª§ 1 · -1 «¨ 2 ¸ Xcȍ X ı u © ¹ = « « « 0 ¬«

º 0 » » T » » 2ı 4u ¼»

(5.188)

Thus, from the information matrix (5.188), we have -1 Var(ȕˆ ML ) = ª¬ X c: -1X º¼ ı u2

(5.189)

and Var(ıˆ 2u ) =

2ı 4u T

(5.190)

Feasible Generalized Least squares (FGLS) Estimator

The main problem with the GLS estimator is that we must know the true autocorrelation coefficient U in order to use it. If we don’t know the value of U, it is not possible for us to transform the variables Y and X or to create transformed

Chapter Five

272

variables Yt* and Xt*. However, in most cases, the true value of U is unknown and unobservable. Thus, the GLS is not a feasible estimator. The GLS estimator requires to know the value of U. To make the GLS estimator feasible, we can use the sample data to obtain an estimate of U. When we do this, we have a different estimator. This estimator is called the Feasible Generalized Least Squares (FGLS) Estimator. The following methods are the most popular and widely applicable to obtain the FGLS estimator. (1) The Cochrane-Orcutt (1949) method (2) The Hildreth-Lu (1960) search procedure (3) Durbin’s (1960) method (4) Prais-Winsten estimation procedure. The Cochrane-Orcutt (1949) Method

Let us consider a linear regression model of the type Yt = ȕ 0 + ȕ1X1t +ȕ 2 X 2t +........+ȕ k X kt + İ t , t = 1, 2,......,T

(5.191)

We may expect that, İ t = ȡİ t-1 + v t , where {v t } is a Gaussian white noise process. Recall that the new error term v t satisfies all the usual assumptions of the classical linear regression model. This statistical model describes what, we believe, is the true underlying process that is generating the data. To obtain the FGLS estimates of ȕ 0 , ȕ1 , ȕ 2 ,......., and ȕ k using the Cochrane-Orcutt method, the following steps are involved: Step 1: Use the OLS method to run equation (5.190), and then obtain the least squares residuals e’s which are given by, e t = Yt  ȕˆ 0  ȕˆ 1X1t  ȕˆ 2 X 2t  ........  ȕˆ k X kt ,  t

(5.192)

where, ȕˆ 0 , ȕˆ 1 ,....., and ȕˆ k are the OLS estimators of ȕ 0 , ȕ1 ,....., and ȕ k respectively. Step 2: Regress e t on e t-1 of the type, e t = ȡe t-1 + u t

(5.193)

where ^u t ` is a Gaussian white noise process. Step 3: Apply the OLS method to run equation (5.193) and then obtain the least squares estimate of ȡ which is given by, T

¦e e

t t-1

ȡˆ =

t=2 T

(5.194)

¦ e2t-1 t=2

Step 4: Use the estimate ȡˆ of ȡ to create the transformed variables such that, ˆ t-1 , X1t* = X1t  ȡX ˆ 1,t-1 , ………….., and X*kt = X kt  ȡX ˆ k,t-1 Yt* = Yt  ȡY

Step 5: Regress the transformed variable Yt* on a constant and the transformed variables X*'s of the type, ˆ t-1 =ȕ 0 >1  ȡˆ @ +ȕ1 ª¬ X1t  ȡX ˆ 1,t-1 º¼ +ȕ 2 ª¬ X 2t  ȡX ˆ 2,t-1 º¼ +......+ȕ k ª¬ X kt  ȡX ˆ k,t-1 º¼ +İ t  ȡİ ˆ t-1 Yt  ȡY Yt*

Į 0  ȕ1X1t*  ȕ 2 X*2t +.........+ ȕ k X*kt  v t

where ^v t ` is a Gaussian white noise process.

(5.195)

Autocorrelation

273

Step 6: Apply the OLS method to run equation (5.195) and then obtain the least squares estimators of the parameters. ˆ

ˆ

ˆ

Denoting the second-round estimates of the parameters by ȕˆ 0 , ȕˆ 1 ,......, and ȕˆ k and using these second-round estimates we then compute the second-round residuals which are given by eˆˆ t

Įˆ 0 ˆ ˆ ˆ ˆ Yt  ȕˆ 0  ȕˆ 1X1t  ......  ȕˆ k X kt , where ȕˆ 0 = ˆ (1  ȡ)

(5.196)

Step 7: Regress eˆˆ t on eˆˆ t-1 of the type, eˆˆ t = ȡeˆˆ t-1 + u t

(5.197)

where ^u t ` is a white noise process. We apply the OLS method to run equation (5.197) and then we obtain the second round estimate of ȡ which is given by: T

ȡˆˆ =

¦ eˆˆ eˆˆ

t t-1

t=2 T

(5.198)

¦ eˆˆ 2t-1 t=2

This iterative procedure is repeated until the value of the estimate of ȡ converges, that is, until the estimate of U from two successive iterations differs by not more than some small predetermined value say 0.001. Step 10: Use the final estimate of U to get the final estimates of ȕ 0 , ȕ1 , ȕ 2 ,......, and ȕ k . Ex. 5-11: Estimate the model with AR(1) autoregressive errors for the given problem in Ex- 5-1 using the Cochrane Orcutt iterative method. Solution: First, we regress PCEX on PGDP which is given by equation (5.90). We may expect that İ t = ȡİ t-1 + v t , where {v t } is a Gaussian white noise process. We apply the OLS method to run equation (5.90) and then obtain the least squares residauls which are given by equation (5.92). We now regress e t on e t-1 of the type,

(5.199)

e t = ȡe t-1 + u t

where ^u t ` is a Gaussian white-noise process. We apply the OLS to run equation (5.199) and then obtain the firstround estimate of ȡ which is given by T

¦e e

t t-1

ȡˆ =

t=2 T

¦e

(5.200)

= 0.7720

2 t-1

t=2

We use ȡˆ = 0.7591 to transform the original variables such that * t

PGDP

* t

PCEX*t

PCEX t  0.772PCEX t-1 , and

* t

PGDPt  0.772PGDPt-1 . Now, we regress PCEX on PGDP of the type,

PCEX*t = Į +ȕ1PGDPt* +w t

(5.201)

Įˆ . Then we apply the OLS method to run equation (5.201) and then obtain the second-round ˆ (1  ȡ) estimates of the parameters which are given by:

where ȕˆ 0 =

ˆ * = 36.5283+0.5431PGDP* , R 2 PCEX t t t-Test:

6.9009

16.0066

SE:

5.2932

0.0339

0.8506 ½ ° ¾ ° ¿

(5.202)

Chapter Five

274

We now use these second-round estimates of the parameters to obtain the second-round residauls which are given by: eˆˆ t = PCEX t  160.1829  0.5431PGDPt

(5.203)

Next, we regress eˆˆ t on eˆˆ t-1 of the type eˆˆ t = ȡeˆˆ t-1 + Ȧ t

(5.204)

where ^Zt ` is a Gaussian white noise process. Apply the OLS method to run equation (5.204) and then obtain the second-round estimate of ȡ which is T

ȡˆˆ =

¦ eˆˆ eˆˆ

t t-1

t=2 T

(5.205)

= 0.7798

¦ eˆˆ 2t-1 t=2

ˆˆ ˆ Continuing this procedure, it is found that for the fifth-round estimate of ȡ , i.e., for ȡˆˆ = 0.7830 , it is convergent. Finally we use ȡˆ = 0.7830 to obtain the final estimates of ȕ 0 and ȕ1 which are given below. All the results are obtained using the software RATS.

Thus, the estimated equation is 2 ˆ PCEX t = 155.9962+0.5479PGDPt , R

t-Test:

6.3709

15.4380

SE:

24.4794 0.0355

0.9816 ½ ° ¾ ° ¿

(5.206)

The Hildreth-Lu (1960) Search Procedure

Let us consider the linear regression model of the type Yt = Į + ȕX t + İ t , for t = 1, 2, …… …, T

(5.207)

We may expect that İ t = ȡİ t-1 +u t , where {u t } is a Gaussian white noise process. This procedure involves the following steps to obtain the FGLS estimates of D and E. Step 1: Choose a value of U between –1 and 1 in intervals of 0.10. Step 2: Use this value of U to create the transformed variables such that,

Yt* = Yt - UYt-1, and Xt* = Xt - UXt-1. Step 3: Regress the transformed variable Yt* on a constant and the transformed variable Xt* of the type:

Yt* = Į 0 + ȕX*t + w t where ^w t ` is a Gaussian white noise process and D 0

D (1  U )

Apply the OLS method to run equation (5.208) and then obtain the residual sum of squares, i.e.,

Step 4: ESS =

(5.208)

T

¦e

2 t

.

(5.209)

t=1

Step 5: Choose a different value of U between –1 and 1 in intervals of 0.10. Step 6: Repeat step 2 through step 4. Step 7: Repeat step 5 and step 6. By letting U vary between –1 and 1 in a systematic fashion, you get a set of values for the residual sum of squares, one for each assumed value of U.

Autocorrelation

275

Step 8: Choose the value of U for which the residual sum of squares will be minimum. Step 9: Use this estimate of U to get the final estimates of D and E. Ex. 5-12: Obtain the FGLS estimators of the parameters with AR(1) autoregressive errors for the given problem in Ex. 5-1 using the Hildreth-Lu (1960) search procedure. Solution: First, we regress PCEX on PGDP which is given by equation (5.90). We may expect that İ t = ȡİ t-1 + v t , where {v t } is a white noise process. The results are obtained using the RATS based on the Hildreth-Lu (1960) search procedure and are given below. Table 5-4: Results obtained by using the Hildreth-Lu (1960) search procedure

Regression with AR1 - Estimation by Hildreth-Lu Search Method, Dependent Variable PCEX Variable Coeff Std Error T-Stat Signif 1. Constant 155.9567 24.4795 6.3709 0.0000 2. PGDP 0.5479 0.0355 15.4379 0.0000 3. RHO 0.7831 0.0415 18.8795 0.0000 0.9816 Centered R2 Sum of Squared Residuals 0.9990 Uncentered R2 Log Likelihood 0.9808 Adjusted R2 Regression F(2,44) TR2 46.953 Significance Level of F –Stat. Mean of Depen. Variable 480.0499 Durbin-Watson Statistic S. E. of Dependent Variable 116.6430 Q(11-1) S.E. of Estimate 16.16422 Significance Level of Q

11496.4133 -195.9317 1175.6651 0.0000 2.2135 13.4441 0.1998

Comparison between the Cochrane-Orcutt and Hildreth-Lu Estimators

(i) Using the Cochrane-Orcutt iterative method, we choose the convergent value of ȡ and the corresponding regression gives us the estimates of the parameters. According to the Hildreth-Lu search procedure, we choose the value of ȡ for which the residual sum of squares will be minimum and the corresponding regression gives us the estimates of parameters. (ii) The Cochrane-Orcutt method cannot find the global minima if there is more than one local minima for the residual sum of squares, but the Hildreth-Lu Search procedure will find the global minima. (iii) Most statistical packages have both estimators. Some econometricians suggested to estimate the model using both estimators to make sure that the Cochrane-Orcutt estimator does not miss the global minimum. Durbin’s (1960) Method

Let, us consider the linear regression model of the type, Yt = Į + ȕX t + İ t , for t = 1, 2, …… …, T

(5.210)

We may expect that, İ t = ȡİ t-1 +u t , , where ^u t ` is a Gaussian white noise process. The GLS transformation of equation (5.210), is given by, Yt  ȡYt-1 = D >1  ȡ @ +ȕ > X t  ȡX t-1 @ +İ t  ȡİ t-1

(5.211)

We can rearrange equation (5.211) by moving Yt-1 to the righthand side of the type, Yt = Į >1  U @ + ȡYt-1 +ȕX t  ȕȡX t-1 + u t

(5.212)

Yt = ȕ 0 + ȡYt-1 +ȕX t +įX t-1 + u t

where ȕ 0 = Į >1  U @ , į =  ȕȡ, and u t

İ t  ȡİ t-1 .

We apply the OLS method to run equation (5.212). The OLS estimate of ȡ as a coefficient of Yt-1 will be biased but consistent. This will be biased because u t is correlated with Yt-1 . This is called the Durbin estimate of ȡ and it is denoted by ȡˆ D . In this stage, using this estimate ȡˆ D , the Cochrane-Orcutt iterative procedure is performed to obtain

Chapter Five

276

the FGLS estimators of Į and ȕ . That is, at this stage, we use ȡˆ D to compute the transformed variable such that Yt* = Yt  ȡˆ D Yt-1 , and X*t

> X t  ȡˆ D X t-1 @ .

We now regress Yt* on X*t of the type: Yt* = ȕ 0 +ȕX*t +u t

(5.213)

where {u t } is a Gaussian white noise process. Next, we apply the OLS method to the transformed equation (5.213) and we obtain the least squares estimates of ȕ 0 ȕˆ 0 and ȕ which are denoted by ȕˆ 0 and ȕˆ . Then, Įˆ = and ȕˆ are called the FGLS estimators of Į and ȕ 1  ȡˆ D respectively. Ex. 5-13: Estimate the FGLS estimates of the parameters with AR(1) autoregressive errors for the given problem in Ex. 5-1 by applying the Durbin’s (1960) method. Solution: First, we regress PCEX on PGDP of the type, PCEX t = ȕ 0 +ȕ1PGDPt +İ t , for t = 1, 2, …… …, T

(5.214)

We may expect that İ t = ȡİ t-1 +u t , where {u t } is a Gaussian white noise process. The transformation of equation is given by: PCEX t  ȡPCEX t-1 = ȕ 0 >1  ȡ @ +ȕ1 > PGDPt  ȡPGDPt-1 @ +İ t  ȡİ t-1

(5.215)

PCEX t = Į 0 +ȡPCEX t-1 +ȕ1PGDPt  ȕ 2 PGDPt-1 + u t

where Į 0

ȕ 0 >1  ȡ @ , ȕ 2 =  ȕ1ȡ, and u t = İ t  ȡİ t-1

We apply the OLS method to run equation (5.215) and the OLS estimates are given below, 2 ˆ PCEX t = 25.5819+ 0.7599PCEX t-1 + 0.1402PGDPt  0.0278PGDPt-1 , R

t-Test:

2.1451

18.1662

SE:

11.9259 0.0418

0.6702

0.1185

0.2092

0.2348

0.9832 ½ ° ¾ ° ¿

(5.216)

The Durbin estimate of ȡ is 0.7599, that is, ȡˆ D = 0.7599 . Using this Durbin estimate of ȡ , that is, ȡˆ D = 0.7599 , the Cochrane-Orcutt procedure is performed to obtain the FGLS estimators of ȕ 0 and ȕ1 . The transformed variables can be obtained as PCEX*t = PCEX t  0.7599PCEX t-1 , and PGDPt* = PGDPt  0.7599PGDPt-1 . We now regress PCEX*t on PGDPt* of the type, PCEX*t = į 0 +ȕ1PGDPt* +v t

(5.217)

where {v t } is a Gaussian white noise process and į0 = ȕ 0 >1  0.7471@ We apply the OLS method to run the transformed equation (5.217) and then we obtain the least squares estimates of į0 and ȕ1 . The results are given below ˆ * = 39.4768+0.5382PGDP* , R 2 PCEX t t t-Test:

7.3834

SE:

5.3467 0.0328

16.404

0.8567 ½ ° ¾ ° ¿

(5.218)

39.4768 = 164.3896 . Therefore, the FGLS estimates Thus, we have įˆ 0 = 39.4768 and ȕˆ 1 = 0.5381 , and hence, ȕˆ 0 = 0.2401 of ȕ 0 and ȕ1 are 164.3896 and 0.5382 respectively.

Autocorrelation

277

Hence, the estimated equation will be 2 ˆ PCEX t = 164.3896 + 0.5382PGDPt , R

t-Test:

7.3834

16.4040

SE:

22.2649

0.0328

0.8567 ½ ° ¾ ° ¿

(5.219)

Prais-Winsten Estimation Procedure

Let us consider the linear regression model of the type, Yt = X ct ȕ+İ t , for t = 1, 2, …… …, T

(5.220)

We may expect that İ t = ȡİ t-1 +u t , where {u t } is a Gaussian white noise process. In this method, the estimated value of ȡ is obtained from the Durbin-Watson d statistic. The estimated value of ȡ from the Durbin-Watson d statistic is given by: ȡˆ = 1 

d 2

(5.221)

The FGLS estimator of ȕ is given by, -1

ˆ -1X º X c: ˆ -1Y ȕˆ = ª¬ X c: ¼  Uˆ 0 ª 1 «  Uˆ 1+Uˆ 2  Uˆ « « 0  Uˆ 1+Uˆ 2 « . . ˆ -1 = 1 « . where : 2 « . . . 1  Uˆ « . . « . « 0 0 0 « 0 0 «¬ 0

(5.222) 0

0

....

....

0  Uˆ

0

....

....

0

....

....

.

.

....

....

.

.

....

....

.

....

....

. 0 0

....  Uˆ 1+Uˆ 2 .... 0  Uˆ

0 º 0 »» 0 » » . » . » » . »  Uˆ » » 1 »¼

The FGLS estimators can also be obtained by making transformation. The transformation for the first observation is given by * Y1* = 1  ȡˆ 2 Y1 , X11 = 1  ȡˆ 2 X11 , X*21 = 1  ȡˆ 2 X 21 ,…..and X*k1 = 1  ȡˆ 2 X k1 .

And the transformation for the remaining observations will be ˆ t-1 , X1t* = X1t  ȡX ˆ 1t-1 , X*2t = X 2t  ȡX ˆ 2t-1 ,……,and, X*kt = X kt  ȡX ˆ kt-1 , for t = 2, 3,…….,T . Then, we Yt* = Yt  ȡY

regress Yt* on X1t* , X*2t ,….., X*kt , for t = 1, 2,……….,T, of the type Yt* = Į + ȕ1X*2t +ȕ 2 X*2t +......+ȕ k X*kt +u t

(5.223)

Thus, all of the v’s have the same variance to the transformed equation (5.223) for T observations and we can apply the OLS method to equation (5.223) which is known as the FGLS method. The resulting estimates are called the FGLS estimates which are BLUE. In practice, the Prais-Winston estimation technique is used in an iterative scheme. That is, once the FGLS estimator is found using the ȡˆ from the Durbin-Watson test, we can obtain a new set of residuals e’s from equation (5.223). Then, we can obtain the new estimate of ȡ from the regression equation of e t on

e t-1 , i.e., e t = ȡe t-1 +v t

(5.224)

Next, we transform the variables using the new estimate of ȡ and estimate the transformed equation (5.223) using the OLS method. We can repeat this process until the estimated value of ȡ converges, i.e., until the estimate of U from

Chapter Five

278

two successive iterations differs by not more than a small predetermined value say 0.001. Finally, we use the final estimate of U to get the final FGLS estimators of the parameters ȕ's . Ex. 5-14: Estimate the FGLS estimates of the parameters with AR(1) autoregressive errors for the given problem in Ex. 5-1 by applying the Prais Winsten estimation procedure. Solution: First, we regress PCEX on PGDP of the type which is given by equation (5.214). We apply the OLS method to run equation (5.214) and then we obtain the Durbin-Watson d statistic. The OLS estimates are given below, PCEX t = 244.8516+0.4329PGDPt , R 2 t-Test:

10.9235

11.5530

SE:

22.4150 0.0375

0.7437, d = 0.1397 ½ ° ¾ ° ¿

(5.225)

We now obtain ȡˆ from the Durbin-Watson d statistic which is given by, ȡˆ = 1 

0.1397 2

0.9301

Based on ȡˆ = 0.9301, we estimate the FGLS estimates of ȕ 0 and ȕ1 using the Prais-Winston method by using the software package RATS. The results are given below PCEX t = 230.8908+0.4490PGDPt , R 2 t-Test:

12.4228 14.5440

SE:

18.5859

0.0309

0.8255½ ° ¾ ° ¿

(5.226)

Properties of the FGLS Estimator

If the model of autocorrelation that we assume is a reasonable approximation of the true autocorrelation, then the FGLS estimator will yield more precise estimates than the OLS estimates. The estimates of the variance and covariance of the parameter estimates will also be unbiased and consistent. However, if the model of autocorrelation that we assume is not a reasonable approximation of the true autocorrelation, then the FGLS estimator will yield worse estimates than the OLS estimates. Estimating Models with Higher Order AR(p) Autoregressive Errors

Let us consider a linear regression equation of the type, y t = x ct E  H t

(5.227)

We may expect that, İ t = ȡ1İ t-1 + ȡ 2 İ t-2 +......+ȡ p İ t-p +v t , where {v t } is a Gaussian white noise process. If the Lagrange multiplier (LM) test rejects the null hypothesis of no serial correlation of order p, i.e., H 0 : ȡ1 = ȡ 2 =...............=ȡ p = 0 , FGLS estimates of the parameters can be obtained from equation (5.227) with the pth-order autoregressive error. The estimation procedure involves the following steps: Step 1: First, apply the OLS method to run equation (5.227) and then obtain the OLS residuals e’s which are given by e t = y t  ȕˆ 0  ȕˆ 1X1t  ȕˆ 2 X 2t  ..........  ȕˆ k X kt , t = 1, 2,………,T

(5.228)

Step 2: Second, regress e t on e t-1 , e t-2 ,..........,e t-p of the type:

e t = ȡ1e t-1 + ȡ 2 e t-2 +........+ȡ p e t-p +u t , t = p+1, p+2,………,T

(5.229)

where ^u t ` is a Gaussian white noise process. Apply the OLS method to run equation (5.229) and then obtain the OLS estimates of ȡ1 , ȡ 2 ,…., and ȡ p . Step 3: Use these estimates ȡˆ 1 , ȡˆ 2 , ...............,and ȡˆ p to create the transformed variables. The transformed variables are

given by, y*t = y t  ȡˆ 1 y t-1  ȡˆ 2 y t-2  ..........  ȡˆ p y t-p , , and X*jt

X jt  ȡˆ 1 X j,t-1  ȡˆ 2 X j,t-2  ..........  ȡˆ p X j,t-p , j = 1, 2,…….,k.

Autocorrelation

279

Step 4: Regress the transformed variable Yt* on a constant and the transformed variable Xt*’s of the type y*t = ȕ*0 + ȕ1X1t* +ȕ 2 X*2t +.........+ȕ k X*kt + w t

(5.230)

where ^w t ` is a Gaussian white noise process and ȕ*0 = ȕ 0 ª¬1  ȡ1  ȡ 2  ........  ȡ p º¼ . Step 5: Apply the OLS method to the transformed equation (5.230) and then obtain the least squares estimates of the parameters. The slope coefficient of the transformed model, E’s will be the same as the slope coefficient of the original model. The constant term in the original model is

different and is given by ȕˆ 0 =

ȕˆ *0 . 1  ȡˆ 1  ȡˆ 2  ..........  ȡˆ p

Step 6: Use the ȕˆ 0 along with the estimates of ȕ j , j = 1, 2,......,k into the original regression equation to obtain the

revised residuals e’s. Then, go back to Step 2 and iterate until the estimates of ȡ's converge and the resultant estimates of ȕ's give the FGLS estimates. Ex. 5-15: Estimate the FGLS estimates of the parameters with AR(p) autoregressive errors for the given problem in Ex. 5-1. Solution: First, we regress PCEX on PGDP of the type

(5.231)

PCEX t = ȕ 0 + ȕ1PGDPt + İ t

We may expect that İ t = ȡ1İ t-1 + ȡ 2 İ t-2 +....+ȡ p İ t-p + v t , where {v t } is a Gaussian white noise process. For the given problem using the Breusch-Godfrey LM test it is found that the error terms are autocorrelated with order 2. Thus, we have to estimate equation (5.231) with AR(2) autoregressive errors. The estimated results are obtained by using the software package RATS for the model (5.231) with AR(2) autoregressive random errors for which ȡ's converge. The FGLS estimates of ȕ 0 and ȕ1 are given below in Table 5-5. Table 5-5: FGLS estimates of equation (5.229) with AR(2) autoregressive errors.

Linear Regression - Estimation by Least Squares, Dependent Variable PCEX Variable Coeff Std Error 1. Constant 153.4244 18.4109 2. PGDP 0.5521 0.0265 ˆȡ1 0.6443 0.1539 0.0904 0.1147 ȡˆ 2 Centered R2 Adjusted R2 Uncentered R2 Mean of Depen. Variable S. E. of Depen. Variable S.E. of Estimate

0.9099 0.9077 0.9908 132.2404 45.0686 13.6861

Sum of Squared Residuals Regression F(1, 43) Significance Level of F Log Likelihood Durbin-Watson Statistic

T-Stat 8.3333 20.8359 4.1871 0.7879

Signif 0.0000 0.0000 0.0001 0.4350 8054.2983 434.1346 0.0000 -180.5664 1.9843

Note: Many software packages such as EViews, GAUSS, LIMDEP, Python, R, RATS, STATA, SPSS and TSP can be applied directly to test for autocorrelation and for estimating autocorrelated models, allowing us not to perform additional task in the estimation process.

Chapter Five

280

Exercises 5-1: What is meant by autocorrelation? Explain with an example. 5-2: Write different sources of autocorrelation with an example of each. 5-3: Find the mean, variance, covariance and autocorrelation of AR(1) autoregressive random error terms. 5-4: Find the variance-covariance matrix of İ Tu1 if the random error terms are autocorrelated with order 1. 5-5: What are the consequences of autocorrelation? 5-6: Show that the OLS estimator ȕˆ will be a consistent estimate of ȕ even if the disturbance terms are autocorrelated. 5-7: Find the variance-covariance matrix of ȕˆ if the random error terms are autocorrelated with order one. 5-8: Show that the BLUE property is not satisfied if the random error terms are autocorrelated. ˆ underestimates the true variance of ȕˆ if the random error terms are autocorrelated. 5-9: Show that var(ȕ)

5-10: Show that the least squares estimates of the regression coefficients will be inefficient relative to GLS estimates if the random terms are autocorrelated. 5-11: Explain why the Student-t test, F-test and Chi-square test are not applicable to test the significance of the parameters if the random error terms are autocorrelated. 5-12: Show that the least squares estimators will be biased or inconsistent, if the regression equation contains the lagged values of the dependent variable and the random error terms are autocorrelated. 5-13: Discuss the graphical method to detect the presence of autocorrelation problem in a data. 5-14: Discuss the Von- Neumann ratio test for detecting the presence of the autocorrelation problem. 5-15: What do you understand by the term autocorrelation? An econometrician suspects that the residuals of his considered model might be autocorrelated. Explain the different steps involved in testing this theory using the DurbinWatson test. 5-16: Explain what is meant by the “inconclusive region” of the Durbin-Watson test. Show it graphically. 5-17: Let the Durbin-Watson test statistic be 0.96 and the sample size be 45 with three explanatory variables plus a constant term. Perform a test. What is your conclusion? 5-18: What are the shortcomings of the Durbin-Watson test statistic? Explain how you will tackle the problem of inconclusion of the Durbin-Watson test. 5-19: The least squares regression based on 45 observations produces the following results: yˆ t = 0.3256+1.2561x t , R 2 SE: 0.1256 0.2145,

0.9875½° ¾ DW = 1.325 °¿

Test the hypothesis that the true disturbances are not autocorrelated. 5-20: Discuss the asymptotic test for testing the first-order autocorrelation problem. 5-21: The least squares estimates based on a sample of 65 observations are given below: yˆ t = 5.7456+0.0752x t , R 2 0.9675½° ¾ SE: 0.3298 0.0565, ȡˆ = 0.7473 °¿

Test the hypothesis that the true disturbances are not autocorrelated. 5-22: Discuss the Breusch-Godfrey LM test for the first-order autocorrelation problem 5-23: The least squares regression based on 45 observations produces the following results:

Autocorrelation

yˆ t =  839.4854+0.3895x1t  0.19761x 2t  0.22926x 3t ; SE: 395.8126 0.0366 eˆ t

0.3025e t-1 , R e2

SE: 0.1472

0.0213

0.0176;

281

R2

0.9957 ½° ¾ DW = 1.3887 °¿

0.1256 ½° ¾ °¿

Test the hypothesis that the true disturbances are not autocorrelated using both DW test and the Breusch-Godfrey LM test. 5-24: Discuss the Breusch-Godfrey LM test for the higher-order autocorrelation problem. 5-25: A least squares method based on 59 observations produces the following results: yˆ t = 15.0627+0.324494x t , R 2 SE: 1.0649 eˆ t

0.0514

0.9657 ½° ¾ °¿

1.2442e t-1  0.4658e t-2 , R e2

SE: 0.1193

0.1190

0.7825°½ ¾ °¿

Test the hypothesis that the true disturbances are not autocorrelated using the Breusch-Godfrey LM test. 5-26: Discuss the run test for testing an autocorrelation problem in a data set. 5-27: The least squares residuals of a simple linear regression equation are given below:

-2.81, 5.63, -4.25, -2.26, -3.37, 2.52 1.97, 3.86. -0.04, 0.08, -1.04, -0.37, -2.04, -1.59, 3.73 3.56, -2.59, -0.39, -2.15, 1.98, 2.24, -3.25, -1.56, 0.35, -1.25, 3.51, -0.89, -2.56, -3.22, 1.56 Test the presence of autocorrelation problem in the data using the run test. 5-28: Discuss the Box-Pierce Q-test statistic and Ljung-Box Q test for testing the higher-order autocorrelation problem. 5-29: The first six autocorrelation coefficients are estimated based on a sample of 57 observations and are given below:

Lag

1

2

3

4

5

6

Autocorrelation coefficient

0.751

0.512

0.382

0.298

0.209

0.142

Test each of the individual correlation coefficients for significance and test the significance of all six correlation coefficients jointly using both Box-Pierce Q and Ljung-Box Q tests. 5-30: Explain why adding a lagged dependent variable and lagged independent variable(s) to the model eliminates the problem of the first-order autocorrelation. Give at least two reasons why this is not necessarily a preferred solution. 5-31 : Let us consider a regression equation in a matrix form of the type: YT×1 =X (T×(k+1))ȕ (k+1)×1) +İ T×1

We may expect that İ t = ȡİ T-1 +v t , where {v t } is a white noise process. Discuss the generalized least square method to estimate the model. Discuss the maximum likelihood method to estimate the model. 5-32: Explain what is meant by a feasible generalized least squares estimator. Discuss the following methods to estimate feasible generalized least squares estimators of a regression equation if the random error terms are autocorrelated with order 1.

(i) Cochrane-Orcutt method (ii) Hildreth-Lu search procedure (iii) Durbin’s method (iv) Prais-Winston estimation procedure

282

Chapter Five

5-33: Describe in steps how you would obtain the feasible generalized least squares estimators of the regression equation Yt = ȕ 0 +ȕ1X t +İ t , with AR(1) autoregressive errors. 5-34: Describe in steps how you would obtain the feasible generalized least squares estimators of the regression equation Yt = ȕ 0 +ȕ1X t +İ t , with AR(2) autoregressive errors. 5-35: Describe in steps how you would obtain the feasible generalized least squares estimators of the regression equation Yt = ȕ 0 +ȕ1X1t +........+ȕ k X kt +İ t , with AR(p) autoregressive errors. 5-36: Let, the output function of the UK be given by GDPt = GDP0 LĮt K ȕt eİ t , where GDPt is the real GDP at time t, L t is the labour force at time t, K t is the capital investment at time t, GDP0 is the initial real GDP, and İ t is the random error term corresponding to the tth set of observations. Test the presence of autocorrelation problem using the data of the UK and then estimate the model.

CHAPTER SIX
MULTICOLLINEARITY

6.1 Introduction

To estimate multiple linear regression models using the least squares method, we assume that the rank of the matrix X of observations on the explanatory variables is the same as the number of explanatory variables. In other words, the X matrix is of full column rank. This assumption implies that all the explanatory variables of a multiple linear regression equation are independent, i.e., there is no linear relationship among the explanatory variables; the explanatory variables are orthogonal. But in most econometric analyses of economic relationships, this assumption is not satisfied: the explanatory variables may not be independent, and thus the multicollinearity problem arises. The term multicollinearity is used to describe the situation where the explanatory variables are highly or nearly intercorrelated. If two or more explanatory variables are closely related, we cannot distinguish the effect of one from the other. For example, if we consider a regression equation to find the effects of household income and liquid assets on household consumption expenditure, we cannot separate the effects of the two because, in our observations, income and liquid assets are highly related. Since household income and liquid assets change in the same proportion, we really cannot tell how much of the impact on consumption expenditure is due to income and how much is due to liquid assets. But we can say what the combined impact is; that is, we will find

\hat{\beta}_1 + \lambda\hat{\beta}_2 = \frac{\sum_{i=1}^{n} x_{1i}y_i}{\sum_{i=1}^{n} x_{1i}^2}, \quad \text{where } x_{2i} = \lambda x_{1i}.    (6.1)

Thus, no matter which arbitrary solution we take, the linear combination \hat{\beta}_1 + \lambda\hat{\beta}_2 will always give us the same numerical value. So \hat{\beta}_1 + \lambda\hat{\beta}_2 is an estimable function, where \lambda, the parameter defining the linear dependence between x_1 and x_2, is also estimable. But \hat{\beta}_1 and \hat{\beta}_2 cannot be estimated separately; only certain linear combinations of \hat{\beta}_1 and \hat{\beta}_2 can be estimated for any given values of X_1 and Y. It is to be noted that multicollinearity may exist not only in linear forms but may also occur in non-linear forms. Multicollinearity may be perfect or imperfect. When one explanatory variable is an exact linear function of one or more explanatory variables with no error term, the linear relationship between the explanatory variables is called perfect multicollinearity. Imperfect multicollinearity means that one explanatory variable is a linear function of one or more explanatory variables plus an error term. If the explanatory variables are perfectly related, the OLS method fails to estimate the parameters of multiple linear regression equations because the matrix X'X will not be invertible. Even if the explanatory variables are only closely related, the standard errors of the estimators will be very large, leading to wrongly accepting the null hypothesis; as a result, important explanatory variable(s) may be removed from the equation. If the regression equation contains only two explanatory variables, multicollinearity is measured by the simple correlation coefficient (r). If more than two explanatory variables are included in the regression equation, multicollinearity can be measured by the multiple correlation coefficient (R), obtained by regressing any one explanatory variable on the remaining explanatory variables, or by a partial correlation coefficient (r_{12.34...}). In the case of perfect or near-perfect collinearity, software packages such as EViews, GAUSS, LIMDEP, Python, R, RATS, SHAZAM, STATA, TSP, etc., estimate the equation automatically by excluding one of the involved variables from the analysis. If we find that the software is dropping one of several explanatory variables from the analysis, this indicates the existence of a multicollinearity problem. Thus, students and researchers need to know what consequences arise if we apply the OLS method to a multiple linear regression equation when there is a multicollinearity problem. Therefore, in this chapter, different forms of multicollinearity, sources of multicollinearity, consequences of multicollinearity, different tests for detecting the presence of multicollinearity problems, and different estimation techniques under multicollinearity are discussed, with applications to numerical problems.


6.2 Some Important Concepts

This section discusses some fundamental issues and concepts associated with the multicollinearity problem of a single-equation regression model.

Multicollinearity: The term multicollinearity refers to a statistical phenomenon in which two or more independent variables in a multiple regression equation are related in such a way that one independent variable can be linearly predicted from the other independent variable(s) with a high degree of accuracy. Multicollinearity generally arises in observational studies and is less common in experimental studies.

Ex. 6-1: Let us consider the wage equation of the type:

wage_i = \beta_0 + \beta_1 age_i + \beta_2 edu_i + \beta_3 exp_i + \beta_4 gen_i + \varepsilon_i    (6.2)

where wage = hourly wages, age = years of age, edu = years of formal education, exp = years of working experience, gen = gender of wage earners, which takes 1 for male and 0 for female, and \varepsilon_i ~ N(0, \sigma^2). For the given equation, the variables age and edu are correlated, because if we have a sample of young people, an extra year of age also implies another year of education (assuming that all students go to school). As a change in age also implies an extra year of education, we cannot say that education is constant when we change the value of age.

Perfect (or Exact) Multicollinearity

If two or more independent variables of a multiple linear regression equation have an exact linear relationship between them, then the linear relationship between the explanatory variables is called perfect multicollinearity. The phrase "perfect multicollinearity" means that a change in one explanatory variable can be completely explained by movements in another explanatory variable.

Ex. 6-2: Suppose we want to estimate the linear regression equation of the type:

Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \varepsilon_i, \quad \varepsilon_i ~ N(0, \sigma^2)    (6.3)

If there is an exact linear relationship between X_1 and X_2 of the type:

X_{1i} = 5 + 2.5X_{2i}    (6.4)


Then, we cannot estimate the effects of X_1 and X_2 on Y separately, but we can find their combined effect on Y.


Fig. 6-1: Perfect multicollinearity between X_1 and X_2
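As a quick illustration of why estimation breaks down here, the following minimal Python sketch (simulated data and hypothetical coefficient values, not data from the book) verifies that the exact relationship in equation (6.4) makes X'X singular, so that the two slope coefficients are not separately estimable while a linear combination of them is.

```python
import numpy as np

rng = np.random.default_rng(0)
x2 = rng.uniform(8, 18, size=50)
x1 = 5 + 2.5 * x2                        # perfect multicollinearity, as in equation (6.4)
y = 3 + 1.2 * x1 + 0.8 * x2 + rng.normal(0, 1, size=50)   # hypothetical true coefficients

X = np.column_stack([np.ones_like(x1), x1, x2])
XtX = X.T @ X
print("rank of X'X:", np.linalg.matrix_rank(XtX))   # 2 rather than 3: X'X is singular
print("|X'X|:", np.linalg.det(XtX))                  # zero up to floating-point error

# lstsq returns one of infinitely many least squares solutions (the minimum-norm one);
# beta1 and beta2 are not separately identified, but the combination 2.5*beta1 + beta2
# (the coefficient of x2 in the reduced model) is the same for every solution.
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
print("one arbitrary solution:", beta)
print("estimable combination 2.5*beta1 + beta2:", 2.5 * beta[1] + beta[2])
```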

Imperfect (or Near) Multicollinearity

If two or more independent variables of a multiple linear regression equation have an imperfect linear relationship between them, then the linear relationship between the explanatory variables is called imperfect multicollinearity or near multicollinearity. The phrase "imperfect multicollinearity" means that a change in one explanatory variable cannot be explained completely by movements in another explanatory variable.

Ex. 6-3: Suppose we want to estimate equation (6.3) and there is a linear relationship between X_1 and X_2 of the type:

X_{1i} = \alpha_0 + \alpha_1 X_{2i} + u_i    (6.5)

where u_i is a random error term; then the relationship between X_1 and X_2 is called imperfect multicollinearity.


Fig. 6-2: Imperfect multicollinearity between X_1 and X_2

6.3 Sources of Multicollinearity

Multicollinearity may arise for different reasons in applied research work. Some of the important sources of multicollinearity are highlighted below:
(i) Economic variables tend to move together over time, which may cause interrelationships between or among explanatory variables and hence a multicollinearity problem. For example, income, consumption expenditure, savings, investment, price, employment, inflation, etc., tend to rise in periods of economic growth and decrease in periods of recession. Growth and trend factors in time series are therefore a common cause of multicollinearity.
(ii) The use of lagged values of some explanatory variables as separate independent variables in the regression equation causes the problem of multicollinearity. For example, in an ice-cream consumption function, it is customary to include the past as well as the present values of the variables income and temperature along with other explanatory variables. As the successive values of a variable are interrelated, a multicollinearity problem arises.
(iii) The use of non-linear forms of some explanatory variables as separate independent variables in the regression equation causes the problem of multicollinearity. For example, in a wage equation, it is customary to include the squared form of the variable experience. The squared form of a variable is intercorrelated with its level form, and this causes the multicollinearity problem.
(iv) The use of improper dummy variables in a regression analysis causes the problem of multicollinearity.
(v) Including an explanatory variable that is computed from other explanatory variables in the equation causes the problem of multicollinearity. For example, family income = family savings + family expenditure; if the variables income, savings, and expenditure are all used as explanatory variables, this will cause the problem of multicollinearity.
(vi) If we include the same or almost the same variable twice (height in feet and height in inches; or, more commonly, two different operationalisations of the same concept) in a model, it also causes the problem of multicollinearity.
(vii) Repeated measures of the same variable cause the problem of multicollinearity. When data are collected over a period of time, it is very common to obtain repeated measures of a single variable. Such measures often exhibit high correlations, and if they are used as independent variables in a regression analysis, they will cause the problem of multicollinearity.
(viii) Sometimes, the multicollinearity problem is found in an overdetermined model. A large number of variables are included to make the model more realistic, and consequently the number of observations becomes smaller than the number of explanatory variables. This can happen in medical research, where the number of patients may be small but information is collected on a large number of variables, which may cause the problem of multicollinearity.

6.4 Consequences of Multicollinearity

In the case of perfect or near-perfect multicollinearity, the following possible consequences are encountered:
(i) The linearity of the OLS estimators will not be affected.
(ii) The unbiasedness of the OLS estimators will not be affected.
(iii) The estimates of the coefficients are indeterminate.
(iv) The sampling variance of the OLS estimators becomes very large, so the OLS estimators become imprecise and the BLUE property no longer holds for \hat{\beta}.
(v) Due to large standard errors, the null hypothesis will often be accepted, so important explanatory variables may be dropped from the equation.
(vi) Large confidence intervals for the parameters may arise due to large standard errors.
(vii) It is impossible to separate the effects of changes in individual explanatory variables.
(viii) The OLS estimators and their variances are very sensitive to small changes in the data.

6.5 Properties of the OLS Estimators: Effects on the Properties of the OLS Estimators

If we apply the OLS method to estimate a multiple regression model in which there is a multicollinearity problem, then the following consequences/properties are found:

Property i: The linearity of the OLS estimators will not be affected.
Property ii: The unbiasedness of the OLS estimators will not be affected.
Property iii: The estimates of the coefficients are indeterminate.

Proof: Let us consider the regression equation of the type:

Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \varepsilon_i    (6.6)

In deviation form, equation (6.6) can be written as y_i = \beta_1 x_{1i} + \beta_2 x_{2i} + \varepsilon_i, where y_i = Y_i - \bar{Y}, x_{1i} = X_{1i} - \bar{X}_1, and x_{2i} = X_{2i} - \bar{X}_2. In matrix notation, equation (6.6) can also be written as:

Y = X\beta + \varepsilon, \quad Var(\varepsilon) = \sigma^2 I_n    (6.7)

where Y = (y_1, y_2, ..., y_n)', X is the n×2 matrix whose ith row is (x_{1i}, x_{2i}), \beta = (\beta_1, \beta_2)', and \varepsilon = (\varepsilon_1, \varepsilon_2, ..., \varepsilon_n)'. The OLS estimator of \beta is given by (all sums run from i = 1 to n):

\hat{\beta} = (X'X)^{-1}X'Y = \frac{1}{\sum x_{1i}^2 \sum x_{2i}^2 - \left(\sum x_{2i}x_{1i}\right)^2} \begin{bmatrix} \sum x_{2i}^2 & -\sum x_{1i}x_{2i} \\ -\sum x_{2i}x_{1i} & \sum x_{1i}^2 \end{bmatrix} \begin{bmatrix} \sum x_{1i}y_i \\ \sum x_{2i}y_i \end{bmatrix}    (6.8)

From equation (6.8), we have

\hat{\beta}_1 = \frac{\sum x_{1i}y_i \sum x_{2i}^2 - \sum x_{2i}y_i \sum x_{1i}x_{2i}}{\sum x_{1i}^2 \sum x_{2i}^2 - \left(\sum x_{2i}x_{1i}\right)^2}    (6.9)

and

\hat{\beta}_2 = \frac{\sum x_{2i}y_i \sum x_{1i}^2 - \sum x_{1i}y_i \sum x_{2i}x_{1i}}{\sum x_{1i}^2 \sum x_{2i}^2 - \left(\sum x_{2i}x_{1i}\right)^2}    (6.10)

Equation (6.9) can be written as:

\hat{\beta}_1 = \frac{\sum x_{1i}y_i \sum x_{2i}^2 - \sum x_{2i}y_i \sum x_{1i}x_{2i}}{\sum x_{1i}^2 \sum x_{2i}^2 \left[1 - r^2\right]}, \quad \text{where } r = \frac{\sum x_{2i}x_{1i}}{\sqrt{\sum x_{1i}^2 \sum x_{2i}^2}}    (6.11)

If the explanatory variables are perfectly related, then r = ±1. Therefore, from equation (6.11),

\hat{\beta}_1 = \frac{\sum x_{1i}y_i \sum x_{2i}^2 - \sum x_{2i}y_i \sum x_{1i}x_{2i}}{0} = \infty    (6.12)

Similarly, we can show that:

\hat{\beta}_2 = \infty    (6.13)

If there exists a high degree of collinearity between X_1 and X_2, i.e., x_{2i} = cx_{1i} where c is an arbitrary constant, then equation (6.9) can be written as:

\hat{\beta}_1 = \frac{c^2\sum x_{1i}y_i \sum x_{1i}^2 - c^2\sum x_{1i}y_i \sum x_{1i}^2}{c^2\left(\sum x_{1i}^2\right)^2 - c^2\left(\sum x_{1i}^2\right)^2} = \frac{0}{0}    (6.14)

Similarly, we have

\hat{\beta}_2 = \frac{0}{0}    (6.15)

Thus, it can be concluded that the parameters are indeterminate if the explanatory variables of a regression equation are perfectly or nearly perfectly correlated.

Property iv: The sampling variance of the OLS estimators becomes very large, so the OLS estimators become imprecise and the BLUE property no longer holds for \hat{\beta}.

Proof: The variance-covariance matrix of \hat{\beta} is given by (all sums run from i = 1 to n):

var(\hat{\beta}) = \frac{\sigma^2}{\sum x_{1i}^2 \sum x_{2i}^2 - \left(\sum x_{2i}x_{1i}\right)^2} \begin{bmatrix} \sum x_{2i}^2 & -\sum x_{1i}x_{2i} \\ -\sum x_{2i}x_{1i} & \sum x_{1i}^2 \end{bmatrix}    (6.16)

Thus, from equation (6.16), we have

var(\hat{\beta}_1) = \frac{\sigma^2 \sum x_{2i}^2}{\sum x_{1i}^2 \sum x_{2i}^2 - \left(\sum x_{2i}x_{1i}\right)^2}    (6.17)

and

var(\hat{\beta}_2) = \frac{\sigma^2 \sum x_{1i}^2}{\sum x_{1i}^2 \sum x_{2i}^2 - \left(\sum x_{2i}x_{1i}\right)^2}    (6.18)

From equation (6.17), we have

var(\hat{\beta}_1) = \frac{\sigma^2}{\sum x_{1i}^2 \left[1 - r^2\right]}    (6.19)

If the variables X_1 and X_2 are perfectly related, then from equation (6.19) we have

var(\hat{\beta}_1) = \infty    (6.20)

Similarly, we have

var(\hat{\beta}_2) = \frac{\sigma^2}{\sum x_{2i}^2 \left[1 - r^2\right]}    (6.21)

If X_1 and X_2 are perfectly related, then from equation (6.21) we have

var(\hat{\beta}_2) = \infty    (6.22)

The covariance between \hat{\beta}_1 and \hat{\beta}_2 is given by:

Cov(\hat{\beta}_1, \hat{\beta}_2) = \frac{-\sigma^2 \sum x_{1i}x_{2i}}{\sum x_{1i}^2 \sum x_{2i}^2 \left[1 - r^2\right]} = \frac{-\sigma^2 r}{\sqrt{\sum x_{1i}^2 \sum x_{2i}^2}\left[1 - r^2\right]} = \infty \;\text{ when } r = \pm 1    (6.23)

This shows that if the explanatory variables are perfectly related, the variance of the OLS estimates becomes infinitely large. From equations (6.19) and (6.21), we see that, for increasing collinearity, the variance of the OLS estimates increases sharply.

Ex. 6-4: Here \sigma^2, \sum x_{1i}^2, and \sum x_{2i}^2 are constants; let us define c = \frac{\sigma^2}{\sum x_{1i}^2} and d = \frac{\sigma^2}{\sum x_{2i}^2}.

Then the magnification of the variances of \hat{\beta}_1 and \hat{\beta}_2 for increasing collinearity is shown in Table 6-1.

Table 6-1: The values of Var(\hat{\beta}_1) and Var(\hat{\beta}_2) for increasing values of r

r              0     0.4     0.5    0.6    0.7     0.8    0.9    0.95    0.98    0.99    0.999
var(\hat{\beta}_1)   c     1.67c   2c     2.5c   3.33c   5c     10c    20c     50c     100c    1000c
var(\hat{\beta}_2)   d     1.67d   2d     2.5d   3.33d   5d     10d    20d     50d     100d    1000d

Table 6-1 shows that the sampling variances rise sharply when r exceeds 0.90. Thus, it can be said that the OLS estimators do not retain the BLUE property when there is a multicollinearity problem.

Property v: Often, the null hypothesis will be accepted due to the multicollinearity problem, so important explanatory variables may be dropped from the equation.

Proof: The null hypothesis to be tested is:

H_0: \beta_j = 0, (j = 1, 2)

against the alternative hypothesis:

H_1: \beta_j \neq 0

Under the null hypothesis, the test statistic is given by:

t = \frac{\hat{\beta}_j}{\sqrt{var(\hat{\beta}_j)}} \sim t_{(n-k-1)\,d.f.}    (6.24)

Due to the multicollinearity problem, var(\hat{\beta}_j) will be very large, so the calculated value of the t statistic will be very small. Consequently, the null hypothesis will be accepted frequently. Therefore, important variable(s) may sometimes be dropped from the regression equation due to the multicollinearity problem.

Property vi: Large confidence intervals for the parameters may arise due to the multicollinearity problem.

Proof: The confidence interval for the parameter \beta_j (j = 1, 2) is given by:

\hat{\beta}_j - t_{\alpha/2,\,n-k-1}SE(\hat{\beta}_j) \le \beta_j \le \hat{\beta}_j + t_{\alpha/2,\,n-k-1}SE(\hat{\beta}_j)    (6.25)

Due to the multicollinearity problem, SE(\hat{\beta}_j) will be very large. Consequently, the confidence interval of \beta_j becomes wider.

Property vii: It is impossible to separate the effects of changes in individual explanatory variables.

Proof: If we apply the OLS method to equation (6.7), we have

(X'X)\hat{\beta} = X'Y    (6.26)


If the two variables X_1 and X_2 are exactly related, then we have x_{2i} = cx_{1i}, so that

\sum_{i=1}^{n} x_{2i}^2 = c^2 \sum_{i=1}^{n} x_{1i}^2    (6.27)

We have

X'X = \begin{bmatrix} \sum x_{1i}^2 & c\sum x_{1i}^2 \\ c\sum x_{1i}^2 & c^2\sum x_{1i}^2 \end{bmatrix} = \sum_{i=1}^{n} x_{1i}^2 \begin{bmatrix} 1 & c \\ c & c^2 \end{bmatrix}    (6.28)

Thus, the determinant of (X'X) is zero and the rank of (X'X) is 1. Hence, the X matrix is not a full-column-rank matrix, and we cannot obtain the unique OLS estimators defined in equation (6.26). Again, we have

X'Y = \begin{bmatrix} \sum x_{1i}y_i \\ \sum cx_{1i}y_i \end{bmatrix} = \sum_{i=1}^{n} x_{1i}y_i \begin{bmatrix} 1 \\ c \end{bmatrix}    (6.29)

Thus, from equation (6.26), we have:

\sum_{i=1}^{n} x_{1i}^2 \begin{bmatrix} 1 & c \\ c & c^2 \end{bmatrix} \begin{bmatrix} \hat{\beta}_1 \\ \hat{\beta}_2 \end{bmatrix} = \sum_{i=1}^{n} x_{1i}y_i \begin{bmatrix} 1 \\ c \end{bmatrix} \;\;\Rightarrow\;\; \hat{\beta}_1 + c\hat{\beta}_2 = \frac{\sum_{i=1}^{n} x_{1i}y_i}{\sum_{i=1}^{n} x_{1i}^2}    (6.30)

i=1

Equation (6.30) shows that if the explanatory variables are exactly correlated, we can't separate the effects of changes in individual explanatory variables on the dependent variable. Property viii: Great sensitivity of OLSE or variance of OLSE to a small change in data. Proof: Let us consider the linear regression model of the type: Yi = ȕ 0 +ȕ1X1i +.........+ȕ k X ki + İ i

(6.31)

In matrix notation, equation (6.31) can be written as: Y = Xȕ + İ

(6.32)

The OLS estimator of ȕ is -1 ȕˆ = X cX XcY

Partitioning the X matrix as:

(6.33)

Multicollinearity

X = ª¬ x j

291

X i º¼

(6.34)

where x j is the jth column vector of the observations on the jth explanatory variable, and X i is the submatrix of observations on the (k-1) other explanatory variables. We have § xc · XcX = ¨ j ¸ x j © Xci ¹ § x cj x j =¨ © Xci x j

Xi x cj X i · ¸ Xci X i ¹

(6.35)

The inverse matrix of the matrix (XcX) is given by:

XcX

-1

§A = ¨ 11 © A 21

A12 · ¸ A 22 ¹

(6.36)

where A11

ª x c x  x c X XcX -1 X cx º j i i i i j ¬ j j ¼

1

-1

-1

= ª¬ x cj M i x j º¼ , [M i = I  X i X ci X i X ci ]

(6.37)

The variance of ȕˆ is given by: ˆ = X cX -1 ı 2 var(ȕ)

(6.38)

So, the sampling variance of ȕˆ j is given by: var(ȕˆ j ) = A11ı 2 =

ı2 x cj M i x j

(6.39)

We regress the jth explanatory variable X j on the remaining (k-1) explanatory variables of the type: X ji = Į 0 +Į1 X1i +Į 2 X 2i +.......+Į j-1X j-1,i +Į j+1X j+1,i +.......+Į k X ki + u i

(6.40)

where ^u i ` is a Gaussian white noise process. In matrix form, equation (6.40) can be written as: (6.41)

x j = Xi Į + u

ª X j1 º «X » « j2 » « . » « » where x j = « . » « . » « » « . » «X » ¬« jn ¼»

n u1

ª1 X11 «1 X 12 « «. . « , Xi = « . . «. . « «. . «1 X 1n ¬«

The OLS estimates of Į is given by:

..... X j-1,1 ..... X j-1,2 ..... . ..... .....

. .

..... . ..... X j-1,n

X j+1,1 ..... Xk1 º ª Į0 º «Į » X j+1,2 ..... Xk2 »» « 1» « . » . ..... . » » « » . ..... . » , Į = « . » , and u = « . » . ..... . » » « » . ..... . » « Į k-1 » «« Į »» X j+1,n ..... Xkn »¼» ¬ k ¼ k×1

ª u1 º «u » « 2» « . » « » « . » . « . » « » « . » «« u »» ¬ n ¼ nu1

Chapter Six

292 -1

Įˆ = Xci X i Xci x j

(6.42)

The total sum of squares (TSS) is given by: (6.43)

TSS = x cj x j

The residual sum of squares (ESS) is given by: ESS = x cj x j  Įˆ cXci x j c -1 = x cj x j  ª Xci X i Xci x j º X cx j ¬ ¼ -1

= x cj x j  x jc X i Xci X i Xci x j -1 = x cj ª I  X i X ci X i X ci º x j ¬ ¼ -1

= x cj Mi x j , [Mi = I  Xi Xci Xi Xci ]

(6.44)

Thus, we can say that x cj M i x j is the residual sum of squares of equation (6.40). The residual sum of squares decreases with increasing collinearity between the jth explanatory variable with the remaining (k-1) explanatory variables. Thus, from equation (6.39), it can be said that the sampling variance of the OLS estimator ȕˆ j will be increased. The denominator in the sampling variance of the jth OLS estimator is the residual sum of squares from the multiple linear regression equation of the jth explanatory variables on the remaining (k-1) explanatory variables, and this can vary considerably from one to another. We now study R2’s very carefully to show that there is an association between the size of R2 and the extent to which the corresponding sampling variance is increased over the orthogonal case. To explain the relationship, let us now define: TSSj = Total sum of squares in deviation form for Xj ESSj = Residual sum of squares when Xj is regressed on (k-1) the remaining explanatory variables. R 2j = Square of the multiple correlation coefficient from the same regression equation.

Thus, R 2j is given by: R 2j = 1 

ESS j

ESS j TSS j

ª¬1  R 2j º¼ TSS j

(6.45)

The sampling variance of ȕˆ j is given by: var(ȕˆ j ) =

ı2 ª¬1  R 2j º¼ TSS j

(6.46)

Let ȕˆ jo be the OLS estimate of ȕ j in case of orthogonality. Then, we have var(ȕˆ jo ) =

ı2 TSS j

(6.47)

Since the total sum of squares TSSj and ı 2 are constants, the magnitude of the sampling variance with increased collinearity is given by:

Multicollinearity

293

var(ȕˆ j ) 1 = ª¬1  R 2j º¼ var(ȕˆ jo )

(6.48)

Here, the orthogonal case is used as the benchmark for measuring the relative magnification of the sampling variance of the different coefficients. Let us now examine the relative magnitude of the sampling variance of \hat{\beta}_j over \hat{\beta}_{jo} for increasing collinearity:

Table 6-2: Relative magnitudes of the sampling variance of \hat{\beta}_j over \hat{\beta}_{jo} for increasing multicollinearity

R_j^2                              0.4      0.5   0.6   0.7      0.8   0.9    0.95   0.98   0.99
var(\hat{\beta}_j)/var(\hat{\beta}_{jo})   1.6667   2     2.5   3.3333   5.0   10.0   20.0   50.0   100.0

Also, the relative magnitudes are shown graphically in Fig. 6-3, which plots the sampling-variance magnification factor against the square of the multiple correlation coefficient.

Fig. 6-3: The magnitude of the sampling variance with increasing collinearity.

From the graphical presentation, we can say that the relationship is highly non-linear and the magnification factor increases dramatically as R_j^2 exceeds 0.9.
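A small Python sketch (illustrative only) reproduces the magnification factors of Table 6-2 directly from equation (6.48).

```python
import numpy as np

# Variance magnification factor var(beta_j)/var(beta_jo) = 1/(1 - R_j^2), equation (6.48)
r2_grid = np.array([0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 0.95, 0.98, 0.99])
magnification = 1.0 / (1.0 - r2_grid)

for r2, m in zip(r2_grid, magnification):
    print(f"R_j^2 = {r2:<5}  variance ratio = {m:8.4f}")
# The ratio is 5 at R_j^2 = 0.8, 10 at 0.9 and 100 at 0.99, matching Table 6-2.
```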

6.6 Some Important Theorems

Theorem 6.6.1: Show that the least squares estimate \hat{\beta} becomes too large in absolute value in the presence of a multicollinearity problem.

Proof: Let d be the distance between \hat{\beta} and \beta. Thus, the squared distance between \hat{\beta} and \beta is given by:

d^2 = (\hat{\beta} - \beta)'(\hat{\beta} - \beta)    (6.49)

Taking the expectation of equation (6.49), we have

E(d^2) = \sum_{j=1}^{k} E(\hat{\beta}_j - \beta_j)^2 = \sum_{j=1}^{k} var(\hat{\beta}_j) = \sigma^2\,trace(X'X)^{-1}    (6.50)

The trace of a matrix is the sum of its eigenvalues. If \lambda_1, \lambda_2, ........, \lambda_k are the eigenvalues of the matrix (X'X), then 1/\lambda_1, 1/\lambda_2, .........., 1/\lambda_k are the eigenvalues of (X'X)^{-1}.


Thus, from equation (6.50), we have

E(d^2) = \sigma^2 \sum_{i=1}^{k} \frac{1}{\lambda_i}    (6.51)

Due to the presence of multicollinearity, if (X'X) is ill-conditioned, at least one of the eigenvalues will be very small, so the distance between \hat{\beta} and \beta will be large. Thus,

E(d^2) = E\left[(\hat{\beta} - \beta)'(\hat{\beta} - \beta)\right] = E\left(\hat{\beta}'\hat{\beta} - 2\hat{\beta}'\beta + \beta'\beta\right) = E\left(\hat{\beta}'\hat{\beta}\right) - \beta'\beta = \sigma^2\,trace(X'X)^{-1}

so that

E\left(\hat{\beta}'\hat{\beta}\right) = \sigma^2\,trace(X'X)^{-1} + \beta'\beta    (6.52)

Equation (6.52) shows that \hat{\beta} is, on average, longer than \beta. Therefore, it can be said that the OLS estimates are too large in absolute value, and hence that the least squares method produces poor estimates of the parameters in the presence of multicollinearity. This does not imply that the fitted model produces bad predictions: if the predictions are confined to the x-space with non-harmful multicollinearity, the predictions are satisfactory.

Theorem 6.6.2: Show that the maximum likelihood (ML) estimate of \sigma^2 cannot be obtained if the multicollinearity is perfect.

Proof: The maximum likelihood estimate of \sigma^2 is given by:

\hat{\sigma}^2 = \frac{1}{n}e'e = \frac{1}{n}\left[Y'Y - \hat{\beta}'X'Y\right] = \frac{1}{n}\left[Y'Y - \left((X'X)^{-1}X'Y\right)'X'Y\right] = \frac{1}{n}\left[Y'Y - Y'X(X'X)^{-1}X'Y\right] = \frac{1}{n}\,Y'\left[I - X(X'X)^{-1}X'\right]Y    (6.53)

In the case of perfect multicollinearity, the determinant of (X'X) is zero, so (X'X) is a singular matrix and the inverse (X'X)^{-1} cannot be obtained. Thus, we cannot obtain \hat{\sigma}^2.
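To see equation (6.51) at work, the following Python sketch (simulated data with a hypothetical collinearity strength, not an example from the book) compares the eigenvalues of X'X and the quantity \sigma^2\,trace(X'X)^{-1} for a nearly orthogonal design and a nearly collinear one.

```python
import numpy as np

rng = np.random.default_rng(1)
n, sigma2 = 200, 1.0

def expected_sq_distance(X, sigma2):
    """sigma^2 * trace[(X'X)^-1] = sigma^2 * sum(1/lambda_i), as in equation (6.51)."""
    eigenvalues = np.linalg.eigvalsh(X.T @ X)
    return sigma2 * np.sum(1.0 / eigenvalues), eigenvalues

x1 = rng.normal(size=n)
x2_orth = rng.normal(size=n)                     # essentially unrelated to x1
x2_coll = x1 + rng.normal(scale=0.01, size=n)    # nearly collinear with x1

for label, x2 in [("near-orthogonal", x2_orth), ("near-collinear", x2_coll)]:
    X = np.column_stack([x1, x2])
    e_d2, lam = expected_sq_distance(X, sigma2)
    print(f"{label:16s} eigenvalues of X'X = {np.round(lam, 3)}  E(d^2) = {e_d2:.4f}")
# The near-collinear design has one tiny eigenvalue, so E(d^2) explodes:
# the OLS estimate is expected to lie far from the true coefficient vector.
```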

6.7 Detection of Multicollinearity

In this section, several measures which are most popular and commonly used to detect the presence of multicollinearity in a data set are presented. Each of them is based on a particular approach, and it is difficult to say which measure is ultimately the best. The detection of multicollinearity involves three aspects:
(i) Determining its presence.
(ii) Determining its severity.
(iii) Determining its form or location.


Pairwise Correlation

The multicollinearity problem can be detected by taking the pairwise correlation between explanatory variables X_j and X_k. If the correlation coefficient between X_j and X_k is very high, then we can say that there exists collinearity between X_j and X_k. Let r be the correlation coefficient between X_j and X_k, which is given by:

r = \frac{Cov(X_j, X_k)}{\sqrt{Var(X_j)}\sqrt{Var(X_k)}} = \frac{\sum_{i=1}^{n} X_{ji}X_{ki} - n\bar{X}_j\bar{X}_k}{\sqrt{\left[\sum_{i=1}^{n} X_{ji}^2 - n\bar{X}_j^2\right]\left[\sum_{i=1}^{n} X_{ki}^2 - n\bar{X}_k^2\right]}}    (6.54)

If |r| = 1 or |r| is close to 1, then we can say that a multicollinearity problem exists between the two explanatory variables X_j and X_k.

Ex. 6-5: Data on profit after tax (PROFIT, in million BDT), green banking activities (GB, in million BDT), investments (INVEST, in million BDT), loans and advances (LOAN, in million BDT), deposits and other accounts (DEPO, in million BDT), and paid-up capital (PAID, in million BDT) of 45 banks in the year 2018 of Bangladesh are collected to detect the multicollinearity problem.¹ To detect the multicollinearity problem, the regression equation of PROFIT on the variables GB, INVEST, LOAN, DEPO, and PAID is given by:

PROFIT_i = \beta_0 + \beta_1 GB_i + \beta_2 INVEST_i + \beta_3 LOAN_i + \beta_4 DEPO_i + \beta_5 PAID_i + \varepsilon_i    (6.55)

The multicollinearity problem will be detected using the correlation coefficients between different pairs of explanatory variables. The estimated correlation coefficients between the different pairs of variables are given in Table 6-3.

Table 6-3: Estimated correlation matrix

GB INVEST LOAN DEPO PAID

GB 1 -0.0380 0.1270 0.1275 0.1677

INVEST -0.0380 1 0.6714 0.8548 0.4982

LOAN 0.1270 0.6714 1 0.9159 0.5884

DEPO 0.1275 0.8548 0.9159 1 0.5887

PAID 0.1677 0.4982 0.5884 0.5887 1

From the estimated results, it is found that the correlation coefficients between INVEST and DEPO and between LOAN and DEPO are very high. Thus, it can be said that there exists a multicollinearity problem in the data.

¹ Sources: Data are collected from banks' Annual Reports of 2018, banks' disclosures on green banking for 2018, banks' websites, Bangladesh Bank's Annual Report on Green Banking of 2018, and green-banking-related articles and journals.
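In practice, a pairwise correlation matrix like Table 6-3 can be produced in one line once the data are loaded. The sketch below is illustrative only: the file name bank_data.csv and its layout are hypothetical placeholders, not the book's actual data file.

```python
import pandas as pd

# Hypothetical file with columns PROFIT, GB, INVEST, LOAN, DEPO, PAID (one row per bank)
df = pd.read_csv("bank_data.csv")

explanatory = ["GB", "INVEST", "LOAN", "DEPO", "PAID"]
corr_matrix = df[explanatory].corr()          # pairwise Pearson correlations
print(corr_matrix.round(4))

# Flag pairs whose absolute correlation exceeds a chosen threshold, e.g. 0.8
high = (corr_matrix.abs() > 0.8) & (corr_matrix.abs() < 1.0)
print(high)
```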

Determinant of the (X'X) Matrix

This method is based on the fact that, if there exists a multicollinearity problem among the explanatory variables, the matrix (X'X) becomes ill-conditioned. The value of the determinant of the matrix (X'X) declines as the degree of multicollinearity increases. If the matrix X is not of full column rank, i.e., if rank(X'X) < k, where k is the number of explanatory variables, then the matrix (X'X) will be singular, i.e., |X'X| = 0. Thus, if the degree of multicollinearity increases, |X'X| → 0; in the case of perfect multicollinearity, the determinant of (X'X) is zero. Therefore, the determinant |X'X| is used as a measure to detect the presence of a multicollinearity problem in the data. If |X'X| = 0, it indicates the existence of perfect multicollinearity; if |X'X| is very close to zero, it indicates the existence of near multicollinearity. This measure has some drawbacks, as described below:

(i) The determinant of (X'X) is affected by the variability of the explanatory variables. For example, suppose that we deal with a regression equation which includes two explanatory variables in deviation form, i.e., k = 2; then we have

X'X = \begin{bmatrix} \sum x_{1i}^2 & \sum x_{1i}x_{2i} \\ \sum x_{2i}x_{1i} & \sum x_{2i}^2 \end{bmatrix}    (6.56)

The determinant of the matrix (X'X) is given by:

|X'X| = \sum x_{1i}^2 \sum x_{2i}^2 - \left[\sum x_{1i}x_{2i}\right]^2 = \sum x_{1i}^2 \sum x_{2i}^2 \left[1 - r_{12}^2\right]    (6.57)

where r_{12} is the correlation coefficient between X_1 and X_2. From equation (6.57), we see that |X'X| depends not only on the correlation coefficient but also on the variability of the explanatory variables X_1 and X_2. If the explanatory variables have very low variability, then |X'X| may tend to zero, indicating the presence of multicollinearity even when this is not the case. If X_1 and X_2 are perfectly related, then r_{12} = ±1, which implies that |X'X| = 0.

(ii) It gives no idea about the relative effects on individual coefficients. If multicollinearity is present, this measure does not indicate which variables are causing it, so the source of the problem is hard to determine.
(iii) The determinant |X'X| is not bounded; it can take any value between 0 and infinity, which makes it difficult to judge how close to zero is "small".

Correlation Matrix

If more than two explanatory variables are included in a regression equation and there exists a near-linear dependency, it is not necessary that any of the pairwise correlation coefficients r_{jk} be large. Thus, the measure of pairwise correlation coefficients is not sufficient for detecting a multicollinearity problem in the data; it is better to inspect the correlation matrix. The correlation matrix refers to the transformation of the (X'X) matrix into a correlation matrix by standardising the regressors using

x_{ji} = \frac{X_{ji} - \bar{X}_j}{s_j\sqrt{n}}

where n is the sample size and s_j is the standard deviation of the variable X_j (j = 1, 2, ..., k). Then the correlation matrix is given by:

R = \begin{bmatrix} 1 & r_{12} & \cdots & r_{1k} \\ r_{21} & 1 & \cdots & r_{2k} \\ \vdots & \vdots & \ddots & \vdots \\ r_{k1} & r_{k2} & \cdots & 1 \end{bmatrix}_{k \times k}    (6.58)

If the determinant of the correlation matrix is zero, then there exists perfect multicollinearity. If the determinant is near zero, then we can say that there exists a collinearity problem. If the determinant is 1, then the columns of the X matrix are orthogonal. Thus, a value of |R| close to 0 is an indication of a high degree of multicollinearity, and any value of |R| lying between 0 and 1 gives an idea of the degree of multicollinearity. The standardisation is used to eliminate the problem of units of measurement.


The main drawbacks of this measure are given below:
(i) This measure will not give us any information about the number of linear dependencies among the explanatory variables.
(ii) The determinant of the correlation matrix is not affected by the dispersion of the explanatory variables. For example, if we consider k = 2, the correlation matrix is

R = \begin{bmatrix} 1 & r_{12} \\ r_{21} & 1 \end{bmatrix}

Thus, we have |R| = 1 - r_{12}^2, since r_{12} = r_{21}, which does not depend on the dispersion of the explanatory variables.
(iii) |R| is bounded between 0 and 1.

Ex. 6-6: Detect the problem of multicollinearity using the correlation matrix for the given problem in Ex. 6-5.

Solution: For the given problem in Ex. 6-5, we have the following correlation matrix:

R = \begin{bmatrix} 1 & -0.038 & 0.127 & 0.1275 & 0.1677 \\ -0.038 & 1 & 0.6714 & 0.8548 & 0.4982 \\ 0.127 & 0.6714 & 1 & 0.9159 & 0.5884 \\ 0.1275 & 0.8548 & 0.9159 & 1 & 0.5887 \\ 0.1677 & 0.4982 & 0.5884 & 0.5887 & 1 \end{bmatrix}    (6.59)

The determinant of R is 0.0169, i.e., |R| = 0.0169. Since |R| is very close to zero, it indicates the presence of a high degree of multicollinearity in the data.
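The determinant reported above can be checked directly from the entries of Table 6-3; the following short sketch computes |R| with NumPy and reproduces a value of roughly 0.0169.

```python
import numpy as np

# Correlation matrix of GB, INVEST, LOAN, DEPO, PAID from Table 6-3
R = np.array([
    [ 1.0000, -0.0380, 0.1270, 0.1275, 0.1677],
    [-0.0380,  1.0000, 0.6714, 0.8548, 0.4982],
    [ 0.1270,  0.6714, 1.0000, 0.9159, 0.5884],
    [ 0.1275,  0.8548, 0.9159, 1.0000, 0.5887],
    [ 0.1677,  0.4982, 0.5884, 0.5887, 1.0000],
])

det_R = np.linalg.det(R)
print(f"|R| = {det_R:.4f}")   # close to 0.0169, far below 1, so multicollinearity is indicated
```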

Measure Based on Partial Regression

A simple measure based on the coefficients of determination of partial regressions can be used to detect the presence of the multicollinearity problem in the data. Let R² be the coefficient of determination of the full model, i.e., R² is obtained from the regression equation of the type:

y_i = \beta_0 + \beta_1 X_{1i} + ....... + \beta_k X_{ki} + \varepsilon_i    (6.60)

And let R_j² be the coefficient of determination of the model in which the jth (j = 1, 2, ..., k) explanatory variable is dropped from the full model (6.60), i.e., of the model of the type:

y_i = \beta_0 + \beta_1 X_{1i} + .... + \beta_{j-1}X_{j-1,i} + \beta_{j+1}X_{j+1,i} + ...... + \beta_k X_{ki} + \varepsilon_i    (6.61)

Thus, for k explanatory variables, we can obtain k coefficients of determination from the k partial regression equations, denoted by R_1², R_2², ......., and R_k² respectively. Let R_L² be the largest among them, i.e., R_L² = Max{R_1², R_2², ......., R_k²}. Then, we calculate the difference between R² and R_L². If R² − R_L² is close to zero, it indicates the presence of a high degree of multicollinearity in the data set. This procedure involves the following steps:

Step 1: Firstly, regress Y on the k explanatory variables of the type:

Y_i = \beta_0 + \beta_1 X_{1i} + ....... + \beta_k X_{ki} + \varepsilon_i    (6.62)

where {\varepsilon_i} is a Gaussian white noise process. Run the regression equation (6.62) using the OLS method, and then obtain the coefficient of determination R².

Step 2: Drop the jth (j = 1, 2, ..., k) explanatory variable and then regress Y on the remaining (k-1) explanatory variables of the type:

Y_i = \alpha_0 + \alpha_1 X_{1i} + ..... + \alpha_{j-1}X_{j-1,i} + \alpha_{j+1}X_{j+1,i} + ..... + \alpha_k X_{ki} + u_i    (6.63)

where {u_i} is a Gaussian white noise process. Run the regression equation (6.63) using the OLS method, and then obtain the coefficient of determination R_j² (j = 1, 2, ..., k). Now, obtain the k coefficients of determination for the k partial regression equations, denoted by R_1², R_2², ......, and R_k², and let R_L² be the largest among them, i.e., R_L² = Max{R_1², R_2², ......, R_k²}.


Step 3: Calculate R² − R_L².

Step 4: Finally, decide whether the multicollinearity problem exists based on the magnitude of R² − R_L². If a multicollinearity problem is present in the data, the value of R_L² will be very high; the higher the degree of multicollinearity, the higher the value of R_L². Therefore, it can be concluded that, if the quantity R² − R_L² is close to zero, it indicates the presence of a high degree of multicollinearity in the data.
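The four steps above translate directly into code. The sketch below is a generic illustration on simulated data (the variable names and coefficient values are placeholders, not the book's data), computing R² for the full model and R_j² for each leave-one-out model.

```python
import numpy as np

def r_squared(y, X):
    """R^2 from an OLS regression of y on X (X should include a constant column)."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return 1.0 - resid @ resid / np.sum((y - y.mean()) ** 2)

rng = np.random.default_rng(2)
n, k = 45, 5
Z = rng.normal(size=(n, k))
Z[:, 3] = 0.9 * Z[:, 1] + 0.1 * rng.normal(size=n)    # build some collinearity in
y = 1.0 + Z @ np.array([0.5, 1.0, -0.3, 0.8, 0.2]) + rng.normal(size=n)

const = np.ones((n, 1))
X_full = np.hstack([const, Z])

r2_full = r_squared(y, X_full)                                         # Step 1
r2_drop = [r_squared(y, np.hstack([const, np.delete(Z, j, axis=1)]))   # Step 2
           for j in range(k)]
r2_L = max(r2_drop)                                                    # largest leave-one-out R^2
print("R^2 =", round(r2_full, 4), " R_L^2 =", round(r2_L, 4),
      " difference =", round(r2_full - r2_L, 4))                       # Steps 3-4
```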

Limitations

(i) This measure will not give any information about the underlying relations among the explanatory variables, i.e., from this measure, we are unable to say how many relationships are present or how many explanatory variables are responsible for the multicollinearity in the data.
(ii) Due to a specification problem, the value of (R² − R_L²) may be very small, and it may then be wrongly inferred that multicollinearity is present.

Ex. 6-7: Detect the presence of multicollinearity for the given problem in Ex. 6-5 using the partial regression method.

Solution: First, we regress PROFIT on GB, INVEST, LOAN, DEPO, and PAID as in equation (6.55). We run the regression equation (6.55) by applying the OLS method, and then we obtain the coefficient of determination. The estimated coefficient of determination is R² = 0.7504. We now form the following five partial regression equations by dropping each of the explanatory variables GB, INVEST, LOAN, DEPO, and PAID in turn from the full model (6.55):

PROFIT_i = \alpha_0 + \alpha_1 INVEST_i + \alpha_2 LOAN_i + \alpha_3 DEPO_i + \alpha_4 PAID_i + \varepsilon_{1i}    (6.64)

PROFIT_i = \delta_0 + \delta_1 GB_i + \delta_2 LOAN_i + \delta_3 DEPO_i + \delta_4 PAID_i + \varepsilon_{2i}    (6.65)

PROFIT_i = \gamma_0 + \gamma_1 GB_i + \gamma_2 INVEST_i + \gamma_3 DEPO_i + \gamma_4 PAID_i + \varepsilon_{3i}    (6.66)

PROFIT_i = \lambda_0 + \lambda_1 GB_i + \lambda_2 INVEST_i + \lambda_3 LOAN_i + \lambda_4 PAID_i + \varepsilon_{4i}    (6.67)

PROFIT_i = \theta_0 + \theta_1 GB_i + \theta_2 INVEST_i + \theta_3 LOAN_i + \theta_4 DEPO_i + \varepsilon_{5i}    (6.68)

We apply the OLS method to run each of the five partial regression equations (6.64)-(6.68), and then we obtain the coefficient of determination for each of them: R_1² = 0.6757, R_2² = 0.7266, R_3² = 0.6951, R_4² = 0.6976, and R_5² = 0.7479. We find that R_5² = 0.7479 is the largest among them. Now, we have R² − R_5² = 0.7504 − 0.7479 = 0.0025. Since (R² − R_5²) is approximately zero, it can be said that a multicollinearity problem is present in the data.

Auxiliary Regression

In the case of two explanatory variables in a regression equation, we can test for the existence of a multicollinearity problem in the data by examining the simple correlation coefficient between those two variables. But in the case of more than two explanatory variables, we need to consider the auxiliary regression equations to investigate the existence of a multicollinearity problem. Then, from the multiple correlation coefficient of an auxiliary regression equation, we can detect the presence of a multicollinearity problem. Let us now consider the regression equation of the jth explanatory variable X_j (j = 1, 2, ..., k) on the remaining (k-1) explanatory variables, of the type given in equation (6.40). Let R^2_{x_j, x_1...x_{j-1}x_{j+1}...x_k} be the squared multiple correlation coefficient from the regression equation (6.40). If R^2_{x_j, x_1...x_{j-1}x_{j+1}...x_k} is very high, then we can say that near multicollinearity exists. This can be examined by considering the form of var(\hat{\beta}_j) from the regression equation (6.31). We have

var(\hat{\beta}_j) = \frac{\sigma^2}{x_j'M_i x_j} \quad [\text{see equation (6.39)}]    (6.69)

We know that the residual sum of squares (ESS) of the auxiliary regression equation (6.40) is

ESS = x_j'M_i x_j \quad [\text{see equation (6.44)}]    (6.70)

The coefficient of multiple determination of the auxiliary regression equation (6.40) is given by:

R^2_{x_j, x_1...x_{j-1}x_{j+1}...x_k} = 1 - \frac{ESS}{TSS} \;\;\Rightarrow\;\; ESS = x_j'M_i x_j = \left(1 - R^2_{x_j, x_1...x_{j-1}x_{j+1}...x_k}\right)x_j'x_j    (6.71)

Thus, equation (6.69) can be written as:

var(\hat{\beta}_j) = \frac{\sigma^2}{\left(1 - R^2_{x_j, x_1...x_{j-1}x_{j+1}...x_k}\right)x_j'x_j}    (6.72)

Thus, from equation (6.72), we can say that when the value of R^2_{x_j, x_1...x_{j-1}x_{j+1}...x_k} is very high, the denominator will be very small, so the value of var(\hat{\beta}_j) will be very high. Hence, a high value of var(\hat{\beta}_j) indicates that the jth explanatory variable X_j (j = 1, 2, ..., k) is correlated with the other regressors, i.e., that near collinearity exists.

Klein's Rule to Detect the Problem of Multicollinearity

To detect the multicollinearity problem, Klein suggested a rule. According to Klein's rule, multicollinearity is regarded as a problem only if R_y² < R²_{x_j}, where R_y² is the coefficient of multiple determination from the multiple regression equation of Y on the k explanatory variables, and R²_{x_j} is the coefficient of multiple determination from the auxiliary regression equation. The following steps are involved in detecting the multicollinearity problem using Klein's rule:

Step 1: Firstly, regress Y on the k explanatory variables as in equation (6.31). Apply the OLS method to run equation (6.31) and obtain the coefficient of multiple determination, denoted by R_y².

Step 2: Regress the jth explanatory variable (j = 1, 2, ..., k) on the remaining (k-1) explanatory variables as in equation (6.40), apply the OLS method to run the auxiliary regression equation (6.40), and obtain the coefficient of multiple determination R²_{x_j} (j = 1, 2, ..., k). Now, obtain the k coefficients of multiple determination, say R²_{x_1}, R²_{x_2}, ........, and R²_{x_k}, for the k auxiliary regression equations.

Step 3: Compare R_y² with each of the k coefficients of multiple determination R²_{x_1}, R²_{x_2}, ...., and R²_{x_k}. If R_y² < R²_{x_j} for some j (j = 1, 2, ..., k), then it can be said that a multicollinearity problem is present in the data.

Ex. 6-8: Detect the presence of a multicollinearity problem for the given problem in Ex. 6-5 using Klein's rule.

Solution: First, we regress PROFIT on GB, INVEST, LOAN, DEPO, and PAID as in equation (6.55). Then, we apply the OLS method to run equation (6.55) and obtain the coefficient of multiple determination. The estimated coefficient of multiple determination is R_y² = 0.7504. We now regress each of the explanatory variables GB, INVEST, LOAN, DEPO, and PAID on the remaining explanatory variables:

GB_i = \alpha_0 + \alpha_1 INVEST_i + \alpha_2 LOAN_i + \alpha_3 DEPO_i + \alpha_4 PAID_i + \varepsilon_{1i}    (6.73)

INVEST_i = \delta_0 + \delta_1 GB_i + \delta_2 LOAN_i + \delta_3 DEPO_i + \delta_4 PAID_i + \varepsilon_{2i}    (6.74)

LOAN_i = \gamma_0 + \gamma_1 GB_i + \gamma_2 INVEST_i + \gamma_3 DEPO_i + \gamma_4 PAID_i + \varepsilon_{3i}    (6.75)

DEPO_i = \lambda_0 + \lambda_1 GB_i + \lambda_2 INVEST_i + \lambda_3 LOAN_i + \lambda_4 PAID_i + \varepsilon_{4i}    (6.76)

PAID_i = \theta_0 + \theta_1 GB_i + \theta_2 INVEST_i + \theta_3 LOAN_i + \theta_4 DEPO_i + \varepsilon_{5i}    (6.77)

We now apply the OLS method to run each of the auxiliary regression equations (6.73)-(6.77), and then obtain the coefficient of multiple determination for each of them. Let the coefficients of multiple determination be denoted by R_1², R_2², R_3², R_4², and R_5², given by R_1² = 0.1380, R_2² = 0.8304, R_3² = 0.892, R_4² = 0.948, and R_5² = 0.3799. Since R_2² = 0.8304, R_3² = 0.892, and R_4² = 0.948 are greater than R_y² = 0.7504, it can be said that the multicollinearity problem exists in the data and that the variables INVEST, LOAN, and DEPO are responsible for the multicollinearity.

Variance Inflation Factors (VIF)

If the multicollinearity problem is present in the data, then the matrix (X'X) becomes ill-conditioned, so the diagonal elements of the inverse matrix (X'X)^{-1} help us detect the multicollinearity problem. Let R_j² denote the coefficient of multiple determination obtained from the regression equation of the jth explanatory variable X_j (j = 1, 2, ..., k) on the remaining (k-1) explanatory variables, of the type:

X_{ji} = \alpha_0 + \alpha_1 X_{1i} + ..... + \alpha_{j-1}X_{j-1,i} + \alpha_{j+1}X_{j+1,i} + ....... + \alpha_k X_{ki} + v_i    (6.78)

where {v_i} is a Gaussian white noise process. Then the jth diagonal element of (X'X)^{-1} is given by

a_{jj} = \frac{1}{\left(1 - R_j^2\right)x_j'x_j}    (6.79)

where x_j'x_j is the total sum of squares (in deviation form) of the regression equation (6.78). The variance of \hat{\beta}_j is given by:

var(\hat{\beta}_j) = \frac{\sigma^2}{\left(1 - R_j^2\right)x_j'x_j}    (6.80)

The VIF of the jth explanatory variable X_j is denoted by VIF_j and is defined as:

VIF_j = \frac{1}{1 - R_j^2}    (6.81)

If the jth explanatory variable X_j is nearly orthogonal to the remaining (k-1) explanatory variables, then R_j² will be very small, i.e., close to zero, and consequently VIF_j will be close to 1. If X_j is nearly linearly dependent on a subset of the remaining explanatory variables, then R_j² is close to 1, and consequently VIF_j will be very large. If the relationship is perfect, then R_j² will be 1 and VIF_j will be infinite. Since the variance of the jth OLS estimator of \beta_j is var(\hat{\beta}_j) = \frac{\sigma^2}{\left(1 - R_j^2\right)x_j'x_j}, we have

var(\hat{\beta}_j) = VIF_j\left[x_j'x_j\right]^{-1}\sigma^2    (6.82)

Thus, VIF_j is the factor by which the variance of \hat{\beta}_j increases when the explanatory variables are linearly dependent. In other words, the variance inflation factor VIF_j for the jth (j = 1, 2, ..., k) explanatory variable is the factor responsible for inflating the sampling variance of the OLS estimator of the corresponding regression coefficient. Hence, the variance inflation factor (VIF) is used to measure the combined effect of the dependencies among the explanatory variables on the variance of the jth coefficient, as given by equation (6.81). One or more large VIFs indicate the presence of a multicollinearity problem in the data. In practice, if VIF > 5 or 10, it usually indicates that the associated regression coefficients are poorly estimated because of multicollinearity. If the regression coefficients are estimated by the OLS method, with variance matrix (X'X)^{-1}\sigma^2, then VIF_j = 1/(1 - R_j^2) is the component of this variance attributable to the linear dependencies among the regressors.

Limitations

(i) It will not give us any information about the number of dependencies among the explanatory variables.
(ii) The rule of VIF > 5 or 10 is a rule of thumb, and the appropriate cut-off may differ from one situation to another.


Ex. 6-9: For the given problem in Ex. 6-8, we have R_2² = 0.8304, R_3² = 0.892, and R_4² = 0.948. Thus, the variance inflation factors (VIF) for the variables INVEST, LOAN, and DEPO are given by:

VIF_2 = \frac{1}{1 - 0.8304} = 5.89    (6.83)

VIF_3 = \frac{1}{1 - 0.892} = 9.2593    (6.84)

VIF_4 = \frac{1}{1 - 0.948} = 19.2308    (6.85)

It is found that the variance inflation factors VIF_2, VIF_3, and VIF_4 are greater than 5; hence, it can be said that the multicollinearity problem is present in the data.
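A generic VIF calculation follows equation (6.81): regress each explanatory variable on the others and take 1/(1 − R_j²). The sketch below defines such a function for an arbitrary data matrix (a generic helper, not the book's code) and, as an arithmetic check, reproduces the values in equations (6.83)-(6.85) from the auxiliary R_j² reported in the text.

```python
import numpy as np

def vif(Z):
    """Variance inflation factors for the columns of Z (explanatory variables only)."""
    n, k = Z.shape
    const = np.ones((n, 1))
    out = []
    for j in range(k):
        Xj = Z[:, j]
        X_others = np.hstack([const, np.delete(Z, j, axis=1)])
        beta, *_ = np.linalg.lstsq(X_others, Xj, rcond=None)
        resid = Xj - X_others @ beta
        r2_j = 1.0 - resid @ resid / np.sum((Xj - Xj.mean()) ** 2)
        out.append(1.0 / (1.0 - r2_j))          # equation (6.81)
    return np.array(out)

# Arithmetic check against equations (6.83)-(6.85), using the auxiliary R_j^2 values in the text
for name, r2 in [("INVEST", 0.8304), ("LOAN", 0.892), ("DEPO", 0.948)]:
    print(name, "VIF =", round(1.0 / (1.0 - r2), 4))   # about 5.90, 9.26 and 19.23
```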

The Farrar and Glauber Tests for Multicollinearity

Farrar and Glauber (1967) suggested a procedure for detecting the multicollinearity problem when the determinant of R_x is different from 1, where R_x is the correlation matrix of the explanatory variables. The Farrar and Glauber procedure for detecting the presence of a multicollinearity problem is a set of three test statistics:
(i) The first is a Chi-square test, which is used to detect the presence and the severity of multicollinearity in multiple regression models.
(ii) The second is an F-test, which is used to determine which regressors are collinear.
(iii) The third is a t-test, which is used for finding out the form of the multicollinearity, i.e., for determining which variables are responsible for the appearance of the multicollinear variables.

A Chi-square Test for Detecting the Presence and Severity of Multicollinearity in a Function with Several Explanatory Variables

This test statistic is applied to test whether the X's are orthogonal. First, we standardise the explanatory variables X_j (j = 1, 2, ..., k) as

x_{ji} = \frac{X_{ji} - \bar{X}_j}{s_j\sqrt{n}}

where n is the sample size and s_j is the standard deviation of the regressor variable X_j (j = 1, 2, ..., k). Then the correlation matrix of the explanatory variables is given by:

R_x = \begin{bmatrix} 1 & r_{12} & \cdots & r_{1k} \\ r_{21} & 1 & \cdots & r_{2k} \\ \vdots & \vdots & \ddots & \vdots \\ r_{k1} & r_{k2} & \cdots & 1 \end{bmatrix}_{k \times k}    (6.86)

where r_{jm} is the correlation coefficient between x_j and x_m, given by:

r_{jm} = \frac{Cov(x_j, x_m)}{\sqrt{Var(x_j)}\sqrt{Var(x_m)}}    (6.87)

We now compute the value of the determinant of the matrix R_x, i.e., |R_x|. If the determinant of R_x lies between 0 and 1, it indicates that there exists some degree of multicollinearity in the regression model. This multicollinearity may be considered a departure from orthogonality: the stronger the departure from orthogonality, the more severe the multicollinearity problem, and vice versa. The null hypothesis to be tested is:

H_0: the regressor variables X's are orthogonal

against the alternative hypothesis


H_1: they are not orthogonal.

Under the null hypothesis, the test statistic is given by:

\chi^2 = -\left[n - 1 - \frac{1}{6}(2k + 5)\right]\ln|R_x| \sim \chi^2_v    (6.88)

where n is the sample size, k is the number of explanatory variables, and v = \frac{k(k-1)}{2}.

Comment: If the calculated value of the test statistic is less than the table value at a 5% level of significance with v degrees of freedom, we accept the null hypothesis, i.e., we accept that there is no multicollinearity problem in the function. Otherwise, we accept that the X's are multicollinear.

The F-test for Determination of Collinear Regressors

Farrar and Glauber suggested that the F-test statistic can be used to locate the explanatory variables that are multicollinear when R_j² > R_y², where R_j² is the coefficient of multiple determination from an auxiliary regression equation, i.e., from the regression equation of the jth explanatory variable on the remaining (k-1) explanatory variables, and R_y² is the coefficient of multiple determination from the full regression equation, i.e., when we regress y on the k explanatory variables. Let R^2_{x_j, x_1....x_{j-1}x_{j+1}....x_k} be the squared multiple correlation coefficient obtained from the auxiliary regression equation (6.78), for j = 1, 2, ..., k. The null hypothesis to be tested is:

H_0: \rho^2_{x_j, x_1....x_{j-1},x_{j+1}....x_k} = 0, for j = 1, 2, ..., k

against the alternative hypothesis:

H_1: \rho^2_{x_j, x_1....x_{j-1},x_{j+1}....x_k} \neq 0, for j = 1, 2, ..., k

Under the null hypothesis, the test statistic is given by:

F = \frac{R^2_{x_j, x_1....x_{j-1},x_{j+1}....x_k}/(k-1)}{\left(1 - R^2_{x_j, x_1....x_{j-1},x_{j+1}....x_k}\right)/(n-k)} \sim F(k-1,\, n-k)    (6.89)

Let the level of significance be 5%.

Comment: At a 5% level of significance with (k-1) and (n-k) degrees of freedom, we find the table value of the F-test statistic. If the calculated value of the F-test is greater than the table value, we reject the null hypothesis, i.e., we can say that the jth explanatory variable X_j is multicollinear with the other explanatory variables. Otherwise, we accept the null hypothesis.

t-Test for Detecting the Pattern of Multicollinearity

This test is used to detect which of the explanatory variables cause the multicollinearity problem. To find which variables are responsible for multicollinearity, we compute the partial correlation coefficients among the explanatory variables. Suppose there are k explanatory variables in the regression model, and we are interested in computing the partial correlation coefficient between X_j and X_m, keeping all other variables fixed. The following steps are involved in computing the partial correlation coefficient between the two variables X_j and X_m after eliminating the effects of the other explanatory variables from both X_j and X_m.

Step 1: Firstly, regress X_j on the remaining (k-1) explanatory variables:

X_{ji} = \alpha_0 + \alpha_1 X_{1i} + ...... + \alpha_{j-1}X_{j-1,i} + \alpha_{j+1}X_{j+1,i} + ...... + \alpha_k X_{ki} + v_i    (6.90)

where {v_i} is a Gaussian white noise process. Also, regress X_m on the remaining (k-1) explanatory variables:

X_{mi} = \delta_0 + \delta_1 X_{1i} + ...... + \delta_{m-1}X_{m-1,i} + \delta_{m+1}X_{m+1,i} + ..... + \delta_k X_{ki} + u_i    (6.91)

where {u_i} is a Gaussian white noise process.

Step 2: Apply the OLS method to run equations (6.90) and (6.91), and then obtain the fitted values of X_j and X_m respectively, that is,

\hat{X}_{ji} = \hat{\alpha}_0 + \hat{\alpha}_1 X_{1i} + ....... + \hat{\alpha}_{j-1}X_{j-1,i} + \hat{\alpha}_{j+1}X_{j+1,i} + ...... + \hat{\alpha}_k X_{ki}    (6.92)

and

\hat{X}_{mi} = \hat{\delta}_0 + \hat{\delta}_1 X_{1i} + ....... + \hat{\delta}_{m-1}X_{m-1,i} + \hat{\delta}_{m+1}X_{m+1,i} + ....... + \hat{\delta}_k X_{ki}    (6.93)

Step 3: Obtain the residuals \hat{v}_i and \hat{u}_i, which are given by:

\hat{v}_i = X_{ji} - \hat{X}_{ji}    (6.94)

and

\hat{u}_i = X_{mi} - \hat{X}_{mi}    (6.95)

Step 4: Now, obtain the correlation coefficient between \hat{v}_i and \hat{u}_i, which is called the partial correlation coefficient between X_j and X_m. It is denoted by r_{jm.1,2,.......,k} and is given by

r_{jm.1,2,.......,k} = \frac{Cov(\hat{v}_i, \hat{u}_i)}{\sqrt{Var(\hat{v}_i)}\sqrt{Var(\hat{u}_i)}}    (6.96)

Using the above procedure, we can calculate the partial correlation coefficients between different pairs of variables. The null hypothesis to be tested is:

H_0: \rho_{jm.1,2,.....,k} = 0, for all j and m

against the alternative hypothesis:

H_1: \rho_{jm.1,2,.....,k} \neq 0, for all j and m

Under the null hypothesis, the test statistic is given by:

t = \frac{r_{jm.1,2,.......,k}\sqrt{n-k}}{\sqrt{1 - r^2_{jm.1,2,.......,k}}} \sim t_{n-k}    (6.97)

Let the level of significance be 5%.

Comment: At a 5% level of significance with n-k degrees of freedom, we find the table values of the test statistic. If the calculated value of the test statistic does not fall in the acceptance region, we reject the null hypothesis. This implies that the partial correlation coefficient between X_j and X_m is statistically significant, i.e., the variables X_j and X_m are responsible for the multicollinearity problem in the function.
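Steps 1-4 can be coded directly. The sketch below is a generic helper (simulated data with placeholder names, not the book's data) that computes the partial correlation between two columns of a data matrix by correlating the residuals from the two auxiliary regressions, and then forms the t statistic of equation (6.97). Following the standard construction of a partial correlation, each variable of the pair is regressed on the variables other than the pair itself.

```python
import numpy as np

def residuals(y, X):
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return y - X @ beta

def partial_corr_t(Z, j, m):
    """Partial correlation between columns j and m of Z given the other columns, with its t statistic."""
    n, k = Z.shape
    others = [c for c in range(k) if c not in (j, m)]
    X = np.hstack([np.ones((n, 1)), Z[:, others]])
    v_hat = residuals(Z[:, j], X)        # Steps 1-3 for X_j
    u_hat = residuals(Z[:, m], X)        # Steps 1-3 for X_m
    r = np.corrcoef(v_hat, u_hat)[0, 1]  # Step 4, equation (6.96)
    t = r * np.sqrt(n - k) / np.sqrt(1 - r ** 2)   # equation (6.97)
    return r, t

rng = np.random.default_rng(3)
Z = rng.normal(size=(45, 5))
Z[:, 1] = 0.8 * Z[:, 2] + 0.2 * rng.normal(size=45)   # induce collinearity between columns 1 and 2
print(partial_corr_t(Z, 1, 2))
```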

Advantages of Farrar and Glauber Tests

(i) This method will give us information about which explanatory variables are multicollinear with others. (ii) This measure will give information about the underlying relations among the explanatory variables. That is, from this measure, we are able to say how many relationships are present or how many explanatory variables are responsible for the multicollinearity in the data.


Ex. 6-10: Detect the problem of multicollinearity for the given problem in Ex. 6-5 using the Farrar and Glauber tests. Solution: The Chi-square test will be used for detecting the presence and severity of the multicollinearity problem in a function with 5 explanatory variables. For the given problem, the correlation matrix R is given by equation (6.59). The determinant of R is 0.0169. Since, |R| is close to zero, it indicates the presence of a high degree of multicollinearity problem in the data.

The null hypothesis to be tested is:

H_0: the regressor variables are orthogonal
against the alternative hypothesis:
H_1: they are not orthogonal.

For the given problem, we have n = 45, k = 5, |R| = 0.0169, and v = \frac{k(k-1)}{2} = 10.

Putting these values in equation (6.88), we have

\chi^2 = -\left[45 - 1 - \frac{1}{6}(2\times 5 + 5)\right]\ln(0.0169) = 169.3383 \sim \chi^2_{10}    (6.98)
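The computation in equation (6.98) is easy to verify; the following sketch evaluates the Farrar-Glauber chi-square statistic from n, k, and |R| and compares it with the 5% critical value.

```python
import numpy as np
from scipy.stats import chi2

n, k, det_R = 45, 5, 0.0169
v = k * (k - 1) // 2                                      # 10 degrees of freedom

stat = -(n - 1 - (2 * k + 5) / 6) * np.log(det_R)         # equation (6.88)
critical = chi2.ppf(0.95, v)

print(f"chi-square statistic = {stat:.4f}")               # about 169.34
print(f"5% critical value (df = {v}) = {critical:.2f}")   # about 18.31
print("reject H0 (variables not orthogonal):", stat > critical)
```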

Let the level of significance be 5%.

Comment: At a 5% level of significance with 10 degrees of freedom, the table value of the test statistic is 18.31. Since the calculated value of the test statistic is greater than the table value, the null hypothesis will be rejected, implying that the explanatory variables are not orthogonal. Thus, the multicollinearity problem is present in the data.

The F-test for Determination of Collinear Regressors

Since there are five explanatory variables, we have five auxiliary regression equations, which are presented in equations (6.73) through (6.77). Applying the OLS method to each of the five auxiliary regression equations (6.73)-(6.77), we obtain the following coefficients of determination: R_1² = 0.1380, R_2² = 0.8304, R_3² = 0.892, R_4² = 0.948, and R_5² = 0.3799. Since R_2² = 0.8304, R_3² = 0.892, and R_4² = 0.948 are greater than R_y² = 0.7504, it can be said that a multicollinearity problem exists in the data and that the variables INVEST, LOAN, and DEPO are responsible for the multicollinearity. The null hypothesis to be tested is:

H_0: \rho^2_{x_j, x_1....x_{j-1},x_{j+1}....x_k} = 0, for j = 1, 2, 3, 4, 5

against the alternative hypothesis:

H_1: \rho^2_{x_j, x_1....x_{j-1},x_{j+1}....x_k} \neq 0, for j = 1, 2, 3, 4, 5

Under the null hypothesis, the calculated values of the F-test statistic for the five explanatory variables are given in Table 6-4.

Table 6-4: The calculated values of the F-test

Auxiliary regression                           F-statistic
GB regressed on the remaining variables        F1 = 1.6009
INVEST regressed on the remaining variables    F2 = 48.9623
LOAN regressed on the remaining variables      F3 = 82.5926
DEPO regressed on the remaining variables      F4 = 182.3077
PAID regressed on the remaining variables      F5 = 6.1264
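The entries of Table 6-4 follow from equation (6.89) with k = 5 and n = 45, i.e., F_j = [R_j²/4]/[(1 − R_j²)/40]; the short sketch below reproduces them from the auxiliary R_j² values and compares them with the 5% critical value.

```python
from scipy.stats import f

n, k = 45, 5
aux_r2 = {"GB": 0.1380, "INVEST": 0.8304, "LOAN": 0.892, "DEPO": 0.948, "PAID": 0.3799}

critical = f.ppf(0.95, k - 1, n - k)                 # F(4, 40) at 5%, about 2.61
for name, r2 in aux_r2.items():
    F_j = (r2 / (k - 1)) / ((1 - r2) / (n - k))      # equation (6.89)
    print(f"{name:7s} F = {F_j:9.4f}  collinear: {F_j > critical}")
```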

Let the level of significance be 5%. Comment: At a 5% level of significance with 4 and 40 degrees of freedom, the table value of the F-test statistic is 2.61. It has been found that when INVEST, LOAN, DEPO and PAID are considered as independent variables, the calculated values of the F-test are greater than the table value. Thus, the null hypothesis will be rejected for these


variables. Hence, we can say that the explanatory variables INVEST, LOAN, DEPO, and PAID are each multicollinear with the others.

t-Test for Detecting the Pattern of Multicollinearity

We have to calculate the partial correlation coefficients between different pairs of explanatory variables to detect which variables cause multicollinearity problem in the data using the t-test. Let us define, r12.3,4,5 , r13. 2,4,5 , r14. 2,3,5 and r15. 2,3,5 are the partial correlation coefficients between GB and INVEST, GB and LOAN, GB and DEPO and GB and PAID respectively. The estimated values of these partial correlation coefficients are given as: r12.3,4,5 = 0.3344, r13.2,4,5 = 0.1809, r14.2,3,5 =  0.2793, and r15.2,3,5 =  0.1480 . Again, let us define, r23.1,4,5, , r24.1,3,5 and

r25.1,3,4 are the partial correlation coefficients between INVEST and LOAN, INVEST and DEPO and INVEST and PAID respectively. The estimated values of these partial correlation coefficients are r23.1,4,5 = 0.5605, r24.1,3,5 =  0.8183, and r25.1,3,4 =  0.1268 . Also, let us define, r34.1,2,5 and r35.1,2,4 are the partial correlation coefficients between LOAN and DEPO, and LOAN and PAID respectively. The estimated values of the partial correlation coefficients are given as: r34.1,2,5 =  0.8733, and r35.1,2,4 =  0.1941 . Again, let us define, r45.123 is the partial correlation coefficient between DEPO and PAID. The estimated value is r45.123 = 0.0175 . All the values are obtained using the software package RATS. The null hypothesis to be tested is: H 0 : ȡij.1, 2, 3, 4, 5 = 0,  i and j , i z j

against the alternative hypothesis

H1: ȡij.1, 2, 3, 3, 5 z 0,  i and j, i z j Under the null hypothesis, the calculated values of the test statistic for partial correlation coefficients of different pairs of variables are given below in Table 6-5. Table 6-5: Calculated values of t-test statistic

Test values for partial correlation between GB and each of the four explanatory variables t=

0.3344 40

1  0.3345 = 2.2439

t=

2

0.1809 40

1  0.18092 = 0.1632

t=

~t 40

~t 40

0.2793 40

1  (0.2793) 2 = -1.8394

t=

0.148 40

1  (0.148) 2 =  0.9467

~t 40

Test values for partial correlation between INVEST and each of the remaining three explanatory variables t=

0.5605 40

1  0.5605 = 4.2805

t=

2

~t 40

0.8183 40 1  (0.8183) 2

~t 40

=  9.0044 0.1268 40 t= ~t 40 1  (0.1268) 2

Test values for partial correlation between LOAN and each of the remaining two explanatory variables t=

0.8733 40 1  (0.8733)

2

~t 40

11.3376 0.1941 40 t= ~t 40 1  (0.1941) 2

Test value for partial correlation between DEPO and PAID

t=

0.0175 40 1  0.01752 0.1106

~t 40

1.2514

=  0.8082

~t 40

Let the level of significance be 5% Comment: At a 5% level of significance with 40 degrees of freedom, the table value of the test statistic is r2.021 . Since the calculated values of the test statistics for the partial correlation coefficients between GB and INVEST, INVEST and LOAN, INVEST and DEPO and LOAN and DEPO do not fall between the table values, it can be concluded that the variables GB, INVEST, LOAN and DEPO are responsible for multicollinearity problem in the data.

Chapter Six

306

Leamer’s Method

Leamer (in Greene, 1993) has suggested the following measure of the effect of multicollinearity for the jth explanatory variable: 1/2

1 ­§ n 2· ½ ° ¨ ¦ X ji  X j ¸ ° ° ¹ ° ; j = 1, 2,…….,k. C j = ® © i=1 ¾ 1 XcX jj ° ° ° ° ¯ ¿ 1

(6.99)

-1

where X cX jj is the jth element of the matrix X cX . This measure is the square root of the ratio of the variances of ȕˆ j when estimated without and with the other variables. If the jth explanatory variable is uncorrelated with other explanatory variables, then C j will be 1. If the jth explanatory variable is correlated with other (k-1) explanatory variables, then C j will be equal to (1  R 2j )1/2 where R 2j is the coefficient of multiple determination

from a

regression equation in which the jth explanatory variable X j regresses on (k-1) remaining explanatory variables. The Condition Number

Another way to test the degree of multicollinearity is the magnitude of the eigenvalues of the correlation matrix of the regressors. Large variability among the eigenvalues indicates a higher degree of multicollinearity. Two features of these eigenvalues are given below: (i) If the eigenvalues are zero, it indicates exact collinearity among the explanatory variables. Therefore, very small eigenvalues indicate near-linear dependencies or high degrees of multicollinearity among the explanatory variables. (ii) Let, O1 , O2 ,........,Ok be the eigenvalues of the correlation matrix: ª1 «r « 21 «. R =« «. «. « «¬ rk1

r12 1 . . . rk2

... ... ... ... ... ...

... ... ... ... ... ...

... r1k º ... r2k »» ... . » » ... . » ... . » » ... 1 »¼ k uk

(6.100)

Then the square root of the ratio of the largest to the smallest eigenvalue is called the condition number and is given by: k=

Omax Omin

(6.101)

This number is the most popular and commonly used index to test the “instability” of the least-squares regression coefficients. A large condition number (say, 10 or more) indicates that relatively small changes in the data tend to produce large changes in the least-squares estimate. In this event, the correlation matrix of the regressors is said to be ill-conditioned (Greene, 1993). Let us now consider a regression equation for two explanatory variables. The condition number is given by: 1/ 2

­° 1+ r 2 ½° 12 k= ® ¾ 2 1  r 12 ° ¯° ¿

(6.102)

Setting the condition number k=10, we have r122 = 0.9608 (Fox 1997). Let us now examine how the variance of each estimated coefficient of ȕ j 's depends on the eigenvalues of (XcX) . We know, ˆ = (X cX)-1ı 2 var(ȕ)

(6.103)

Multicollinearity

307

Using the spectral decomposition of (XcX) , we can write that diagonal O1

P(X cX)Pc = $

O2 ..... Or ..... Ok

(6.104)

and (6.105)

PcP = PPc = I

where P is the k u k orthogonal matrix whose columns are given by the eigenvectors of (XcX) and O1 , O2 ,........,Or are the non-zero eigenvalues. Thus, we can write that P(XcX)Pc = $ PcP(X cX)PcP = Pc$P (XcX) = Pc$P (XcX)-1 = (Pc$P)-1 (XcX)-1 = P$-1Pc

(6.106)

Since A is the diagonal matrix of the eigenvalues of the matrix (XcX) , we have ˆ = ı2 var(ȕ)

§p pc · i i ¸ ¸ O i=1 © i ¹ k

¦ ¨¨

(6.107)

From equation (6.107), we have: var(ȕˆ j ) = ı 2

§ p 2ji ¨¨ ¦ i=1 © Oi k

· ¸¸; j = 1, 2,.....,k ¹

(6.108)

This suggests that the variance of each estimated coefficient of ȕ j depends on the eigenvalues of (XcX) . Small eigenvalues Ȝ i will dominate these variances. For this reason, we look at all the condition numbers Ȝ max Ȝi

ki =

(i =1, 2,……,k).

(6.109)

The large values indicate the existence of near collinearity among the explanatory variables where Omax is the largest eigenvalue of the correlation matrix of explanatory variables. Various applications with experimental and actual data sets suggest that thr condition number in the range of 10 to more indicates the existence of a collinearity problem in the data. If the condition number lies between 20 and 30, it indicates that the collinearity problem is serious. Ex. 6-11: Detect the multicollinearity problem for the given problem in Ex. 6-5, using the condition number. Solution: From the given problem the correlation matrix R of the explanatory variables is given by equation (6.59). The eigenvalues of the correlation matrix R are O1 = 3.0999, O2 = 1.0292, O3 = 0.5312, O4 = 0.3071, and O5 = 0.0325 respectively. The largest and smallest eigenvalues are 3.0999 and 0.0325. Therefore, the condition number is given by: k=

3.0999 0.0325

# 10

(6.110)

Since the condition number is approximately 10, it can be said that collinearity is present in the data. Variance Decomposition Proportions

Let us consider a multiple linear regression equation of the type:

Chapter Six

308

Yi = ȕ 0 + ȕ1X1i +ȕ 2 X 2i +.........+ȕ k X ki + İ i

(6.111)

yi = ȕ1 x1i +ȕ 2 x 2i +.........+ȕ k x ki + İ i where, yi = Yi  Y, and x ji = (X ji  X j )

In matrix notation, equation (6.111) can be written as: (6.112)

Y = Xȕ +İ

where Y is a (n×1) matrix, ȕ is a (k×1) matrix of parameters, X is a (n u k) design matrix and İ is a (n×1) matrix of random error terms. The variance-covariance matrix of ȕˆ is given by: ˆ = (X cX)-1ı 2 var(ȕ)

(6.113)

Let us now consider a reparametrised version by using the singular value decomposition of X. The matrix X can be written as X = Q/1/ 2 Pc

(6.114)

where / = diagonal (O1 , O2 ,.......,Ok ) , Q is a (n ×k) matrix such that QcQ = I, and Pƍ is a (k × k) matrix such that PPƍ = I. The O1 , O2 ,.......,Ok are the eigenvalues of the correlation matrix (XcX) of regressors. Now putting the value of X in equation (6.113), we have -1

ˆ = ª Q/1/ 2 Pc Q/1/ 2 Pc º ı 2 var(ȕ) ¬ ¼

= (P/ -1Pc)ı 2

(6.115)

Using this decomposition, it is possible to decompose the estimated variance of each regression coefficients into a sum of the data matrix X. We can express the variance of a single coefficient as var(ȕˆ j ) = ı 2

§ p 2ji · ¸¸; j = 1, 2,.....,k i=1 © i ¹ k

¦ ¨¨ O

(6.116)

where p ji denotes the (j, i)th element of the matrix P. Consequently, the proportion of var(ȕˆ j ) associated with any single eigenvalue is ij ji =

p 2ji / Oi k

¦p

2 ji

(6.117)

/ Oi

i 1

These values are shown below in Table 6-6: Table 6-6: Variance-decomposition proportion of OLS estimators associated with a single eigenvalue.

Eigenvalues Variables 1 2 . . . j . . . k

O1

O2

………..

Oi

…….

Ok

I11 I21

I12 I22

……….

I1i I2i

……..

I1k I2k

. . .

. . .

I j1

I j2

. . .

. . .

Ik1

Ik2

……….. ………… ………… ……….. ……….. ………… ………… ………… ………..

. . .

I ji . . .

Iki

………. ……… ………. ………. ………. ……….. ……….. ………. ……….

. . .

I jk . . .

Ikk

Multicollinearity

Each row total of Table 6-6, will be one. The presence of two or more large values of

309

I ji in a row indicates that linear

dependence associated with the corresponding eigenvalue is adversely affecting the precision of the associated coefficients. If I ji is greater than 0.50, it indicates that ȕˆ j is adversely affected by the multicollinearity problem, i.e., the estimate of ȕˆ j is influenced by the presence of multicollinearity. It is a good diagnostic tool in the sense that it informs of the presence of harmful multicollinearity as well as indicates the number of linear dependencies responsible for multicollinearity. This diagnostic is better than other diagnostics. Ex. 6-12: Detect the multicollinearity problem for the given problem in Ex. 6-5, using the Variance Decomposition Proportions technique. Solution: Firstly, we regress PROFIT on the explanatory variables GB, INVEST, LOAN, DEPO, and PAID of the type that is given in equation (6.55). In deviation form, equation (6.55) can be written as:

(6.118)

yi = ȕ1 x1i +ȕ 2 x 2i +ȕ 3 x 3i +ȕ 4 x 4i +ȕ5 x 5i + İ i

where yi = PROFITi  PROFIT, x1i = GBi  GB, x2i =INVESTi  INVEST, , x3i =LOANi  LOAN, x 4i = DEPOi  DEPO, and x 5i = PAIDi  PAID . In matrix form, equation (6.118) can be written as: (6.119)

Y= Xȕ+İ

For the given problem, the (XcX) matrix is given by:

ª4.2922e+009 «-4.7971e+008 « (XcX) = « 4.3637e+009 « « 6.2996e+009 ¬«221395773.8714

-4.7971e+008 4.3637e+009 6.2996e+009 221395773.871º 3.7166e+010 6.7868e+010 1.2428e+011 1.9352e+009 »» 6.7868e+010 2.7495e+011 3.6218e+011 6.2167e+009 » » 1.2428e+011 3.6218e+011 5.6877e+011 8.9460e+009 » 1.9352e+009 6.2167e+009 8.9460e+009 405948374.387¼»

The eigenvalues of the matrix (XcX) are given below:

O1 = 8.3776e+011, O2 = 3.7650e+010, O3 = 6.6823e+009, O4 = 3.2414e+009, and O5 = 251051907.366 The eigenvectors of the matrix (XcX) are shown below in a matrix form: ª8.9565e-003 « 0.1735 « EV = « 0.5477 « « 0.8183 «¬ 0.0132

-0.0278

-0.5289

0.8472

-0.0419

0.4490

0.7380

0.4722

-0.0273

-0.7831

0.2628

0.1316

0.4294

-0.3266

-0.1983

-4.4196e-003 3.6320e-003 0.0513

ª 2.7027e-010 « 6.9238e-011 « -1 (X cX) = «1.7257e-011 « « -2.6698e-011 «¬ -1.5339e-010

6.9238e-011

1.7257e-011

1.5865e-010

4.0969e-011

4.0969e-011 3.3677e-011 -5.9938e-011 -2.9471e-011 -1.0063e-010 -7.0994e-011

º » » » -0.0184 » 2.4548e-003» »¼ 0.9986

-2.6698e-011 -1.5339e-010 º -5.9938e-011 -1.0063e-010 »» -2.9471e-011 -7.0994e-011 » » 3.3816e-011 6.4087e-012» 6.4087e-012 3.9727e-009»¼

The decompositions of the estimated variance of each regression coefficients associated with any single eigenvalue are shown below in Table 6-7.

Chapter Six

310

Table 6-7: Decompositions of the estimated variance of each regression coefficients

Dimension Eigenvalues GB INVEST LOAN DEPO PAID

1 8.3776e+011 0.0000 0.00023 0.0106 0.0236 0.00000

2 3.7650e+010 0.0000 0.0338 0.4837 0.1448 0.0000

3 6.6823e+009 0.1549 0.5137 0.3069 0.4720 0.00000

4 3.2414e+009 0.8193 0.4336 0.1587 0.3587 0.000214

5 491171872.1963 0.0259 0.0187 0.0400 0.0071 0.9998

From the estimated results given in Table-6.8, it can be concluded that ȕˆ 2 , ȕˆ 3 , and ȕˆ 4 are affected adversely by the multicollinearity problem.

6.7 Solutions to the Problem of Multicollinearity Various estimation techniques which are most popular and widely applicable to estimate regression models with multicollinearity problem are variable selection, ridge regression, and principal component analysis. In this section, several alternative techniques are also discussed to estimate econometric models with the problems resulting from the presence of multicollinearity in the data. (i) Ignore It

Some writers have suggested that if the multicollinearity problem does not seriously affect the estimates of the regression coefficients, then one may tolerate its presence in the function. Sometimes, the existence of a multicollinearity problem in the data does not reduce the t-ratios on variables that would have been statistically significant without the multicollinearity problem sufficient to make them insignificant. The BLUE properties of the OLS estimators will not be affected if a near multicollinearity problem is present in the data. That is, the consistency, unbiasedness, and efficiency properties of the OLS estimators will not be affected seriously due to the presence of near multicollinearity problem in the data. However, it is very difficult to say that one can obtain small standard errors if near multicollinearity is present in the data set. If the principal objective is to produce forecasts from the estimated model, this will not be a matter for researchers. Since the estimates of the coefficients will not be affected by the presence of near multicollinearity, one may ignore its presence while estimating the regression equation. (ii) Drop Some Collinear Variables

If we can identify the variables which cause multicollinearity problem using some tests, then these collinear variables can be dropped from the equation so as to match the condition of the full rank of the X matrix. The process of omitting the explanatory variables from the function may be carried out based on some kinds of ordering of explanatory variables, e.g., variables having a smaller value of t-ratios will be dropped from the equation. In another example, sometimes we are not interested in all the parameters. In such cases, we can get the estimators of the parameters of interest which have smaller mean squared errors than the variance of OLSE by dropping some variables. If some variables are eliminated, this may reduce the predictive power of the equation. Sometimes, there is no assurance that how the model will exhibit less multicollinearity. Ex. 6-13: Let us consider the following demand function: q = f(p, p r , p c , y, t)

(6.120)

where q is the quantity demanded of a particular commodity, p is the per unit price of the commodity, pr is the per unit price of other related commodities, pc is the per unit price of complementary commodities, y is the level of income of a consumer, and t is a suitable measure of consumer’s test of preference. If the variables pr, pc, and t are not important in the demand function but cause the multicollinearity problem, we can exclude these variables from the demand function. We now consider the following regression equation to explain the procedure to drop the variables from the equation: Yi = ȕ 0 +ȕ1X1i +ȕ 2 X 2i +İ i Yi  Y = ȕ1 (X1i  X1 ) + ȕ 2 (X 2i  X 2 ) + İ i yi = ȕ1 x1i +ȕ 2 x 2i +İ i

(6.121)

where yi = Yi  Y , x1i = X1i  X1 , and x 2i = X 2i  X 2 , {İ t } is a Gaussian white noise process.

Multicollinearity

311

The problem is that the variables x1 , and x 2 are correlated. Suppose, we are very much interested to know the effect of the variable x1 . We then drop the variable x 2 and estimate the following equation: (6.122)

yi = ȕ1 x1i +İ i

Let, ȕˆ 1 be the OLS estimator of ȕ1 from the complete model (6.121), and let ȕ 1 be the OLS estimator of ȕ1 from the omitted variable regression equation (6.122). For the OLS estimator ȕˆ , we have: 1

n

E(ȕˆ 1 ) = ȕ1 , and var(ȕˆ 1 ) =

ı

2

s11 1  r122

, where

s11 =

n

¦ x1i2 , and i=1

r12 =

¦x

1i

x 2i

i=1

n

2 1i

n

¦x ¦x i=1

. Here, r12 is called the 2 2i

i=1

correlation coefficient between variables x1 and x 2 . The OLS estimator ȕ 1 of ȕ1 from the omitted variable regression equation is given by: n

ȕ 1 =

¦x

1i

yi

i=1 n

(6.123)

¦ x1i2 i=1

Substituting for yi from equation (6.130), we have n

ȕ 1 =

¦ x ȕ x 1i

1

1i

+ȕ 2 x 2i +İ i

i=1

n

¦x

2 1i

i=1

n

= ȕ1 +ȕ 2

¦x

1i

n

x 2i

i=1



n

¦ x 1i2 i=1

¦x

İ

1i i

(6.124)

i=1 n

¦ x1i2 i=1

From equation (6.123), we have s E ȕ 1 = ȕ1 +ȕ 2 12 , [where s12 s11



n

¦x

1i

x 2i ]

(6.125)

i=1

and s ı2 var ȕ 1 = 11 2 >s11 @



=

ı2 s11

(6.126)

Thus, it is found that the estimator ȕ 1 is biased but has a smaller variance than ȕˆ 1 . The ratio of the variances of ȕ 1 and ȕˆ is given by: 1



var ȕ 1

2

var ȕˆ 1

s11



s11 1  r122 ı2

= 1 r

var ȕ 1 var ȕˆ 1



2 12

(6.127)

Chapter Six

312

If the relationship between x1 , and x 2 is very high, then, from equation (6.127), we have

E(Įˆ R )  Į @ ¬ ¼ 1 1 = ı 2 trace ª x cx+O I (x cx) x cx+O I º + > PĮ  Į @c > PĮ  Į @ ¬ ¼ 1 1 = ı 2 trace ª x cx+O I (x cx) x cx+O I º +Įc > P  I @c > P  I @ Į ¬ ¼

(6.167)

1

where P = x cx+O I (x cx) . If O = 0, then P = I and the first term of equation (6.167) will be the sum of the variances of the least-squares estimators of the coefficients when the second term vanishes. For this case, the mean squared error of the ridge estimator Įˆ R is given by: MSE(Įˆ R ) = ı 2 trace(x cx)-1

(6.168)

which is equal to the mean squared error of the OLS estimator Įˆ . The existence theorem states that there always exists the value of O , say O * , which is greater than zero, such that MSE ^Įˆ R (Ȝ* )` < MSE ^Įˆ R (0)`

(6.169)

Provided that ĮcĮ is bounded. The mean squared error of Įˆ R can also be written as: k

MSE(Įˆ R ) = ı 2 ¦ j=1

įj (į j +Ȝ)

-2

2

+Ȝ 2 Įc x cx+ȜI Į

where į1 , į 2 ,.........,į k are the eigenvalues of x cx .

(6.170)

Multicollinearity

321

Thus, as O increases, the bias in Įˆ R increases whereas its variance decreases. Thus, the trade-off between bias and variance depends upon the value of O . Choice of O

The estimation of ridge regression estimator depends on the value of O . There is no unique technique to determine the value of O . Different techniques have been suggested in the literature of econometrics to determine the value of O . The value of O can be determined using the following techniques: (i) Ridge trace, i.e., stability of the estimators with respect to O . (ii) Reasonable signs. (iii) Magnitude of the residual sum of squares etc. The technique ridge trace is discussed below to detect the value of O . Ridge Trace

Ridge trace is defined as a graphical method by which we can detect the value of Ȝ for which the estimators will be stabilised by plotting the ridge regression estimators to the Y-axis corresponding to different values of O . If the multicollinearity problem is serious, the instability of regression coefficients is reflected in the ridge trace. If the multicollinearity problem is present in the data, some of the ridge estimators change dramatically as the value of O increases and they stabilise at a particular value of O . The main objective of ridge trace is to inspect the curve and then detect the desirable value of O for which the estimators will be stabilised. For this value of O , say O * , the mean squared error of the ridge estimator will be smaller than the variance of OLSE. Ex. 6-16: Let us consider a model of six explanatory variables of the type: yi =

6

¦ Dˆ x j

ji

S jj + ei

j=1

=

6

¦ ȕˆ x j

ji

(6.171)

+ ei

j=1

ȕˆ j where ȕˆ j = Įˆ j S jj Ÿ Įˆ j = S jj

The ridge estimator of ȕ is ȕˆ R =(x cx+ȜI)-1 x cy

(6.172)

The ridge estimator ȕˆ j (j = 1, 2, 3, 4, 5, 6) of ȕ j is obtained for different values of O . These ridge estimators are plotted to the Y-axis corresponding to different values of O . For each value of O , we have a dot for each ridge estimator. The graph which is obtained by joining these dots for each ridge estimator by a smooth curve is called the ridge trace for the respective parameter and is shown below in Fig. 6-8. We now choose the value of O for which all the curves stabilise and become nearly parallel to each other. From Fig. 6-8, we see that the curves become nearly parallel starting from O O3 or so. Thus, one possible choice of O is O O3 , and the ridge estimators of the parameters are obtained as ȕˆ = (x cx+Ȝ I)-1 x cy . R

3

Chapter Six

322

Eˆ1

Eˆ2

Eˆ3

Eˆ4 Eˆ5

Eˆ6 O0

O1

O2

O3

O4

O5

O6

Fig. 6-8: Ridge trace

Fig. 6-8 indicates the presence of multicollinearity in the data because the characteristic of the ridge estimators at O = 0 is different from that of other values of O . For small values of O , the estimates change rapidly. The estimates stabilise gradually as O increases. The value of O at which all the estimates stabilise gives the desired value of O because moving away from such O will not bring any appreciable reduction in the residual sum of squares. If multicollinearity is present, the variation in ridge regression estimators is very high around O O0 . The optimal O is chosen such that, after that value of O , almost all traces stabilise. Advantages of Ridge Regression Analysis

(i) In a ridge regression analysis, we choose the value of O for which the reduction in the variance will be greater than the increase in the squared bias. Thus, the mean squared error of the ridge estimator Įˆ R is less than the variance ˆ . of the OLS estimator Įˆ , i.e., MSE(Įˆ R ) < var(Į) (ii) If multicollinearity problem is present in the data, then ridge trace is widely applicable to select the desirable explanatory variables. (iii) The ridge estimators make the regressors more nearly orthogonal. (iv) Ridge regression estimation is closely related to the Bayesian estimation. (v) The ridge estimates may result in a regression equation that does a better job of prediction in future observations than the least-squares estimates. Disadvantages of Ridge Regression Analysis

(i) The ridge estimator Įˆ R is biased. (ii) The ridge estimate will not necessarily provide the best fit to the data. (iii) The choice of O depends on data, and therefore, is a random variable. Using it as a random variable violates the assumption that is a constant. This will disturb the optimal properties derived under the assumption that O is constant. (iv) The value of O lies between 0 and infinity. We have to choose a large number of values of O for which the ridge estimators will be stabilised. This procedure is time consuming. (v) The choice of O from ridge trace is not unique. Different researchers may choose different values of O . As a kıˆ 2 , where ȕˆ and ıˆ 2 result, the ridge regression estimators will be different. Another technique to choose O is Ȝ= ȕˆ cȕˆ are obtained from the least squares estimation. (vi) To determine the value of O based on the stability of numerical estimates of ȕˆ j's is just a rough way. Different estimates may exhibit stability for different values of O and it may often be hard to strike a compromise. In such a situation, we can apply the generalised ridge regression estimators. (v) There is no guidance available regarding the testing of the hypothesis and for confidence interval estimation.

Multicollinearity

323

Another Way to Estimate Ridge Regression Estimator

Let us consider a multiple linear regression equation of the type: Yi = ȕ 0 +ȕ1X1i +........+ȕ k X ki +İ i

(6.173)

In deviation form, equation (6.173) can be written as: yi = ȕ1 x1i +ȕ 2 x 2i +..........+ȕ k x ki +İ i

(6.174)

where yi = Yi  Y, and x ji = X ji  X j , j = 1, 2,....,k. In matrix form, equation (6.174) can also be written as: (6.175)

Y = Xȕ+İ

-1 There are several interpretations of the ridge estimator ȕˆ R = X cX+ȜI of ȕ . One is to obtain the least-squares k

estimator when ¦ ȕ 2j = C, i.e., ȕcȕ = C. The estimator of ȕ can be obtained by minimising the Lagrange function of j=1

the type: S(ȕ) = (Y  Xȕ)c(Y  Xȕ)  Ȝ(ȕcȕ  C)

(6.176)

where O is the Lagrange multiplier. Differentiating S(ȕ) with respect to ȕ and then equating to zero, we have įS(ȕ) =0 įȕ 2X cY+2X cXȕ+2Ȝȕ = 0

XcX+ȜI ȕ = XcY

(6.177)

Equation (6.177) is called the normal equation and by solving this normal equation, we can obtain the estimator of ȕ which is called the ridge estimator. The ridge estimator is given by: -1 ȕˆ R = X cX+ȜI X cY

(6.178)

The ridge estimator of ȕ 0 is given by: k

ȕˆ 0R = Y  ¦ ȕˆ jR X j

(6.179)

j=1

If C is very small, it may indicate that most of the regression coefficients are close to zero; if C is large, it may indicate that the regression coefficients are away from zero. So, C puts a sort of penalty on the regression coefficients to enable its estimation. Principal Component Analysis

The principal component is a statistical technique/method/procedure which is used to produce an interrelationship among many correlated variables with a smaller number of principal components that are linearly uncorrelated. In the orthogonal transformation, the number of principal components is less than or equal to the number of original explanatory variables. This transformation is defined in such a way that the first principal component has the largest possible variance and the second component has the second largest variance, and so on. Let us now discuss the principal component analysis to solve the multicollinearity problem in a multiple linear regression equation of the type: Yi = ȕ 0 +ȕ1X1i +........+ȕ k X ki +İ i

(6.180)

Chapter Six

324

Let ȕˆ 0 , ȕˆ 1 ,........., and ȕˆ k be the estimates of ȕ 0 , ȕ1 ,......, and ȕ k and let ei be the residual corresponding to the ith set of sample observation. Then, for sample data, equation (6.180) can be written as: k

Yi  Y =

¦ ȕˆ (X j

k

ji

j=1

 X j )+ei [ where ȕˆ 0 =Y  ¦ ȕˆ j X j ]

(6.181)

j=1

Let us now transform the variable Xj (j=1, 2,….,k) such that x ji =

X ji  X j S jj

X ji  X j

n

, where S jj = ¦ (X ji  X j ) 2 i=1

x ji S jj

and yi = Yi  Y,

Thus, equation (6.181) can be written as k

yi =

¦ ȕˆ x j

S jj +ei

ji

j=1

k

= ¦ Įˆ j x ji +ei , where Įˆ j = ȕˆ j S jj Ÿ ȕˆ j j=1

Įˆ j S jj

(6.182)

In matrix notation, equation (6.182) can be written as y = xĮˆ +e

(6.183)

The correlation matrix (x cx) is given by ª1 «r « 21 «. x cx = « «. «. « «¬ rk1

r12 1 . . . rk2

... ... ... r1k º ... ... ... r2k »» ... ... ... . » » ... ... ... . » ... ... ... . » » ... ... ... 1 »¼ ku k

(6.184)

Let O1 , O2 ,.....,and Ok be the characteristic roots of the correlation matrix x cx which are the solutions of the determinantal equation: (6.185)

|x cx  O I| = 0

The sum of the characteristic roots of x cx is equal to the sum to the trace of the correlation matrix x cx , i.e., k

¦ Oj

k

(6.186)

j=1

Here, k is known as the total variation of the x’s, and under principal components, it will be reallocated in terms of the transformed variables P’s. Let b j be the characteristic vector associated with the characteristic roots O j , i.e., b j satisfies the following homogeneous set of equations:

x cx  O I b j

j

=0

(6.187)

Multicollinearity

The solutions bcj = b j1

325

b j2 . . . b jk (j = 1, 2,…..,k) are chosen from the infinity of proportional solutions that

exist for j which are the normalised solutions subject to: bcj b j = 1 k

2

¦ b ji

(6.188)

1

i=1

It can be shown that, if all the O j (j= 1, 2,…….,k) are different, the characteristic vectors or the latent vectors are pairwise. Then (6.189)

bcj bl = 0, for all j z l = 1, 2,.....,k

The b j (j=1, 2,...,k) vector is used to re-express the x’s in terms of the principal component Pj (j=1, 2,…,k) of the following linear function: Pj = b j1 x1 +b j2 x 2 +...........+b jk x k , j = 1, 2,.....,k

(6.190)

That is P1 = b11 x1 +b12 x 2 +............+b1k x k ½ P2 = b 21 x1 +b 22 x 2 +...........+b 2k x k °° ° . ¾ ° . ° Pk = b k1 x1 +b k2 x 2 +...........+b kk x k °¿

(6.191)

k

If the variance of P1 is maximum subject to ¦ b1j2 j=1

1 (called normalisation condition), then P1 is said to be the first

principal component. It is the linear function of x’s that has the highest variance. If P2 has the second highest variance k

subject to the condition, ¦ b 22j j=1

1, then P2 is said to be the second principal component. Here, P2 is uncorrelated with

P1 . Following this technique, we find k linear functions P1 , P2 , …., and Pk of x’s. These P1 , P2 , …., and Pk are called the k principal components corresponding to O1 , O2 ,......,and Ok . Thus, the principal component procedure creates a set of k artificial variables P’s from the original x’s in such a way that P’s are mutually orthogonal or uncorrelated. It can be shown that Var(P1 ) + Var(P2 ) +.......+Var(Pk ) = Var(x1 ) + Var(x 2 ) +.......+Var(x k )

(6.192)

But, unlike x1 , x 2 ,........,x k which may be highly correlated, P1 , P2 ,.......,Pk are orthogonal. The first principal component explains the largest proportion; the second principal component explains the second highest proportion, and further P’s explain smaller and smaller proportions of the variation of the standardised data until the variation is explained. The principal component analysis technique suggests that, instead of regressing y on x’s, we have to regress y on P’s. If we do this and then substitute the values of the P’s in terms of the values of x’s, we finally get the same results as before. Let us define a diagonal matrix / such that ªO1 «0 « «. /= « «. «. « ¬« 0

0

O2 . . . 0

0 0 . . . 0

.... .... .... .... .... ....

.... 0 º .... 0 »» .... . » » .... . » .... . » » .... Ok ¼»

(6.193)

Chapter Six

326

The / is the (k u k) diagonal matrix of the eigenvalues of the matrix x cx . Let us define, B is the (k u k) orthogonal matrix whose jth (j = 1, 2,……..,k) column is the jth eigenvector associated with O j and is given by: ª b11 «b « 12 « . B =« « . « . « «¬ b1k

b 21 b 22 . . . b 2k

... ... ... ... ... ...

... ... ... ... ... ....

... b k1 º ... b k2 »» ... . » » ... . » ... . » » ... b kk »¼

(6.194)

We have k ª ¦ b2 « j=1 1j «k « ¦ b 2j b1j « j=1 « BcB = « . « . « « . «k b kj b1j «« ¦ ¬ j=1

k

¦ b1j b 2j j=1

k

2

¦ b 2j j=1

. . . k

¦ b kj b 2j j=1

k ... ... ... ¦ b1j b kj º » j=1 » k ... ... ... ¦ b 2j b kj » j=1 » ... ... ... . » » ... ... ... . » » ... ... ... . » k » ... ... ... ¦ b 2kj » j=1 ¼»

ª1 0 0 ... ... 0 º «0 1 0 ... ... 0 » « » « . . . ... ... . » = « » « . . . ... ... . » « . . . ... ... . » « » ¬«0 0 0 ... ... 1 ¼»

(6.195)

= Ik

Since (x cx  O j I) b j = 0

x cxb j  O j b j = 0 b jc x cxb j  b jcO j b j = 0 b jc x cxb j  O j b jc b j = 0 b jc x cxb j

O j [  b jc b j = 1]

(6.196)

Thus, we can write Bcx cxB = /

(6.197)

Considering the canonical form of the linear regression model, y = xȕ+İ = xBBcȕ+İ = PĮ+İ

where P = xB, Į = Bcȕ , and Bcx cxB = PcP = /

(6.198)

Multicollinearity

Columns of P = P1

P2

327

... ... Pk define a new set of explanatory variables which are called principal

components. The OLS estimator Įˆ of D is given by: Įˆ = (PcP)-1Pcy = / -1Pcy

(6.199)

The variance-covariance matrix of Įˆ is given by: ˆ = (PcP)-1ı 2 var (Į) = ȁ -1ı 2 §1 1 1 · = ı 2 diagonal ¨ , ,.........., ¸ Ok ¹ © O1 O2

(6.200) k

k

Note that O j is the variance of the jth principal component and PcP = ¦ ¦ Pi Pj i=1 l=1

/ . A small eigenvalue of x cx means

that the linear relationship between the original explanatory variables exists and the variance of the corresponding orthogonal regression coefficient is large which indicates that multicollinearity exists. If one or more Ȝ j's are small, it indicates that multicollinearity is present. The Variance-Covariance Matrix of ȕˆ

We have ȕˆ = BĮˆ

(6.201)

The variance-covariance matrix of ȕˆ is given by: ˆ = Bvar(Į)B ˆ c var (ȕ) = B/ -1ı 2 Bc §1 1 1 · = B diagonal ¨ , ,.........., ¸ Bcı 2 Ok ¹ © O1 O2

(6.202)

Thus, from equation (6.212), we have 2

bij var(ȕˆ j ) = ı 2 ¦ i=1 Ȝ i k

(6.203)

That is, the variance of ȕˆ j is a linear combination of the reciprocal of the eigenvalues. O j is the variance of the jth principal component. If all the O j's are equal to one, it implies that the originally expected variables are all orthogonal. If O j = 0 , it implies a perfect linear relationship between the original explanatory variables. And, if O j tends to zero, it implies a linear relationship between the original explanatory variables. Retainment of Principal Components

To obtain the principal components estimators, the new set of variables or principal components P1 , Pk ,........, and Pk are assumed to be orthogonal and P’s are arranged in a decreasing-order magnitude of the eigenvalues of x cx such that O1 t O2 t ...... t Ok > 0 , and they retain the same magnitude of variance as of the original set of variables x’s. If the multicollinearity problem is severe, there will be at least one small value of eigenvalues. The elimination of one or more principal components associated with the smallest eigenvalues will reduce the total variance in the model. However, the principal components which are responsible for the multicollinearity problem will be removed from the model and the resulting model will be better. The principal component matrix P = P1 , P2 , ... ... ,Pk with P1 , Pk ,........, and Pk contains exactly the same information as the original data in x in the sense that the total variability in x and P is the same. The difference between them is that the original set of variables x’s are correlated and the set of

Chapter Six

328

new variables P1 , P2 ,........, and Pk are uncorrelated with each other and are arranged with respect to the magnitude of eigenvalues. The jth column vector Pj (j =1, 2,…….,k) corresponding to the largest O j accounts for the largest proportion of the variation in the original data and O j is the variance of Pj . If all the O j's are equal to one, it implies that the original expected variables are all orthogonal. O j = 0 implies a perfect linear relationship between the original explanatory variables. Thus, to solve this problem, the principal components will be eliminated from the model which are associated with the smallest eigenvalues. Since the principal component associated with the smallest eigenvalue is contributing the least variance, and so is the least informative, it will be eliminated. Using this procedure, principal components are eliminated until the remaining components explain some preselected variance in terms of the percentage of total variance. For example, if we assume that 95% of the total variance is needed, and suppose “s” principal components are eliminated which means that (k-s) principal components contribute 95% of the total variation, then s is selected to satisfy k-s

¦ Oj j=1 k

(6.204)

> 0.95

¦ Oj j=1

Also, the principal components associated with near-zero eigenvalues are removed from the model and the leastsquares method will be applied for estimation with the remaining principal components. Let the s principal components be eliminated. Now, only the remaining (k-s) components will be used for regression. So, the P matrix is partitioned as P = M1

M 2 = x Bs

Bk-s

(6.205)

where the submatrix M1 is of order (n u s) and contains s principal components to be eliminated. The submatrix M 2 is of order (n u (k-s)) and contains (k-s) principal components to be retained. After eliminating the s principal components from k, the reduced model can be written as: yi = Į1P1i +Į 2 P2i +......+Į k-s Pk-s,i +İ i

(6.206)

= M 2 į+İ

The reduced coefficients contain the coefficients associated with retained Pj's . So, M 2 = P1

P2

..... ..... Pk-s

į = Į1 , Į 2 , ... ... ,Į k-s c Bk-s = b1 , b 2 , ... ... ,b k-s

The OLS estimator of įˆ is given by:



įˆ = M 2c M 2



-1

M c2 y

(6.207)

Now, it is transformed back to the original explanatory variables as follows: Į = Bcȕ įˆ = Bck-sȕˆ pc

and ȕˆ pc = Bk-s įˆ

(6.208)

is the principal component regression estimator of ȕ . This method improves the efficiency as well as multicollinearity.

Multicollinearity

329

Ex. 6-17: Solve the multicollinearity problem for the given data in Ex. 6-14 using the principal component analysis. Solution: The principal component technique can be used to reduce the multicollinearity problem in the data. The reduction is accomplished by using less than the full set of principal components to explain the variation in the response variable. The correlation matrix for the variables x’s is given below in Table 6-9. Table 6-9: The Correlation matrix

x1 x2 x3

x4 x5

x1

x2

x3

x4

x5

1

-0.0380

0.1275

0.1275

0.1677

-0.0380

1

0.6714

0.8548

0.4988

0.1270

0.6714

1

0.9158

0.5884

0.1275

0.8548

0.9158

1

0.5887

0.1677

0.4988

0.5884

0.5887

1

The eigenvalues and eigenvectors of the correlation matrix are obtained by using the software package RATS and are given below in Table 6-10. Table 6-10: Eigenvalues and eigenvectors of the correlation matrix

Eigenvalues of the correlation matrix Difference Proportion Cumulative Value

Variable

Value

x1 x2 x3

3.0999

2.0707

0.6200

3.0999

Cumulative Proportion 0.6200

1.0292

0.4980

0.2058

4.1291

0.8258

0.5312

0.2241

0.1062

4.6603

0.9321

x4 x5

0.3071

0.2745

0.0614

4.9674

0.9935

0.0326

---

5.0000

1.0000

Variables

EV1 0.0898

0.0065 Eigenvectors EV2 EV3 0.9557 0.2286

EV4 0.1525

EV5 0.0559

0.4891

-0.2453

0.2761

0.7028

0.3612

0.5204

-0.0050

0.1465

-0.6777

0.4985

0.5502

-0.0520

0.2593

-0.0991

-0.7858

0.4233

0.1543

-0.8847

0.1175

-0.0207

x1 x2 x3 x4 x5

Let P1 , P2 , P3 , P4 , and P5 are the principal components which are given by: P1 = 0.0898x1 +0.4891x 2  0.5204x 3  0.5502x 4  0.4233x 5 ½ ° P2 = 0.9557x1  0.2453x 2  0.0050x 3  0.0520x 4 +0.1543x 5 ° ° P3 = 0.2286x1 +0.2276x 2 +0.1465x 3 +0.2593x 4  0.8847x 5 ¾ P4 = 0.1525x1 +0.7028x 2  0.6777x 3  0.0991x 4 +0.1175x 5 ° ° P5 = 0.0559x1 +0.3612x 2 + 0.4985x 3  0.7858x 4  0.0207x 5 °¿

(6.209)

Using these principal components, the model can be written as: yi = Į1P1i +Į 2 P2i +Į3 P3i +Į 4 P4i +Į 5 P5i +İ i

(6.210)

From the estimated results in Table 6-10, it is found that the first four components account for 99.35% of the total variance. Therefore, we can say that the 5th principal component is not significant. Hence, the first four components have been chosen. Then, the linear regression of Y on P1 , P2 , P3 , and P4 is given by: yi = Į1P1i +Į 2 P2i +Į3 P3i +Į 4 P4i +vi

(6.211)

Chapter Six

330

The OLS estimates of Į's of equation (6.211) are obtained using the software package RATS and the results are given below in Table 6-11. Table 6-11: The OLS estimates P1

Coefficient -15648.92

Std. Error 2005.220

t-Statistic -7.804092

Prob. 0.0000

P2

17530.43

3474.577

5.045342

0.0000

P3

-5893.337

4901.699

-1.202305

0.2361

P4

-26104.58

6362.100

-4.103139 Mean dependent var S.D. dependent var Akaike info criterion Schwarz criterion Hannan-Quinn criter.

0.0002

R-squared Adjusted R-squared S.E. of regression Sum squared resid Log-likelihood Durbin-Watson stat

0.7159 0.6951 3524.665 5.09E+08 -429.2970 1.557120

1.11E-05 6383.519 19.25765 19.41824 19.31751

The estimated values of ȕ's can be obtained using the following equation ȕˆ pc = BĮˆ

(6.212)

and the results are given below in Table 6-12.

ˆ Table 6-12: Estimated results of ȕ's x1 x2 x3

Coefficient 10020.3936

Std. Error 3634.3276

t-Statistic 2.7572

Prob. 0.0058

-31927.5504

4903.1343

-6.5117

0.0000

8596.3499

4483.9771

1.9171

0.0552

x4

-8462.7966

1841.0000

-4.5968

0.0000

Exercises 6-1: Explain the meaning of multicollinearity with an example. 6-2: Let us consider a multiple linear regression equation of the type: yi = ȕ 0 +ȕ1X1i +ȕ 2 X 2i +İ i . If the variable X1 is linearly related to the variable X2, then show that the parameters ȕ1 and ȕ 2 cannot be estimated separately. 6-3: Write different sources of multicollinearity with an example of each. 6-4: What are the consequences of multicollinearity? 6-5: Let us consider a multiple linear regression equation of the type: yi = ȕ 0 +ȕ1X1i +ȕ 2 X 2i +İ i .

(i) Show that the OLS estimators of ȕ1 and ȕ 2 will be undefined if the variables X1 and X 2 are perfectly related. (ii) Show that the OLS estimators of ȕ1 and ȕ 2 will be zero if there exists a high degree of collinearity between X1 and X 2 . (iii) Show that the variances of the OLS estimators of ȕ1 and ȕ 2 are undefined if the variables X1 and X 2 are perfectly related. Also show that the magnitudes of the variances of OLS estimators ȕ1 and ȕ 2 rise sharply when the degree of relationship between X1 and X 2 will be very high. 6-6: Show that due to a multicollinearity problem, some important variables can be dropped from the multiple linear regression equation. 6-7: Let us consider a multiple linear regression equation of the type:

Multicollinearity

331

Yi = ȕ 0 +ȕ1X1i +ȕ 2 X 2i +.......+ȕ k X ki +İ i

Let ȕˆ j be the OLS estimate of ȕ j when the jth explanatory variable linearly depends on the remaining (k-1) explanatory variables, ȕˆ j0 be the OLS estimate of ȕ j in case of orthogonality and let R 2j be the coefficient of multiple determination which is obtained from a regression equation of the jth explanatory variable on the remaining (k-1) var(ȕˆ j ) 1 explanatory variables. Then show that . Also, shows that the magnitude of var(ȕˆ j ) relative to = 2 var(ȕˆ ) 1  R j j0

var(ȕˆ j0 ) will rise sharply if the degree of relationship of X j with the remaining (k-1) explanatory variables is very

high. 6-8: Show that the OLS estimator ȕˆ of ȕ becomes too large in absolute value in the presence of multicollinearity. 6-9: Show that the maximum likelihood estimate of ı 2 cannot be obtained if the multicollinearity is perfect. 6-10: Write different techniques to detect the presence of multicollinearity problem in a data set. 6-11: Discuss the pairwise correlation technique, determinant of (XcX) matrix, and correlation matrix to detect the presence of multicollinearity problem. 6-12: In which situation do we have to consider the auxiliary regression equation to detect the presence of a multicollinearity problem? Discuss this technique to detect the presence of multicollinearity problem. 6-13: Discuss Klein’s rule to detect the problem of multicollinearity in a data set. 6-14: Define the variance inflation factor (VIF). Explain how the variance inflation factor (VIF) can be applied to detect the multicollinearity problem. 6-15: Discuss the Farrar and Glauber tests for detecting the presence of multicollinearity problem in a data set. 6-16: What are the advantages of the Farrar and Glauber tests? 6-17: Discuss Leamer’s method to detect the multicollinearity problem. 6-18: Discuss the condition number technique for detecting the presence of multicollinearity problem in a data set. 6-19: Write different estimation techniques that can be applied when a multicollinearity problem exists in a data. 6-20: Discuss the ridge regression estimation technique in case of a non-orthogonal situation. 6-21: Find the mean, variance-covariance matrix, and mean squared error of the ridge estimator Įˆ R of Į . 6-22: Define the ridge trace. Explain how this method can be applied to obtain a ridge estimator. 6-23: What are the advantages and disadvantages of ridge regression analysis? 6-24: What is meant by principal component? Explain with an example. 6-25: Discuss the principal component analysis to solve the multicollinearity problem in a multiple linear regression equation. 6-26: Let us consider a multiple linear regression equation of the type: yi = ȕ 0 +ȕ1X1i +ȕ 2 X 2i +ȕ3 X 3i +İ i

How will you apply the principal component procedure to estimate the parameters if there exists a multicollinearity problem? 6-27: Justify whether the following statements are true or false. Explain:

(i) In a multiple linear regression, the linear relationship among the regressors in the sample implies that the effects of change in an individual regressor cannot be obtained separately. (ii) The OLS estimates will be biased.

Chapter Six

332

(iii) It is not possible for us to decide whether the multicollinearity problem is present or not in a data set by looking at the correlations between the explanatory variables. (iv) The OLS estimators have high variances; this is evidence of multicollinearity. (v) The small value of the t-test for testing the null hypothesis H 0 : ȕi = 0 (i =1, 2,...,k) implies the presence of multicollinearity. 6-28: The results of the principal component analysis of the regression equation of carbon emissions on per capita energy use (EN), per capita electricity consumption (ELEC), per capita real GDP (PGDP), urbanisation (UR), and trade openness (OPN) of USA from the period 1970-2018 are given below:

(Eigenvalues: (Sum = 5, Average = 1) Number 1 2 3 4 5

Value 3.982782 0.781643 0.145824 0.079695 0.010056

Variable

EV1

EN ELEC PGDP UR OPN

-0.294277 0.443879 0.493235 0.487921 0.484793

EN ELEC PGDP UR OPN

EN 1.000000 -0.265166 -0.503425 -0.537680 -0.514466

Difference Proportion 3.201139 0.7966 0.635819 0.1563 0.066129 0.0292 0.069638 0.0159 --0.0020 Eigenvectors (loadings): EV2 EV3 0.906465 0.289110 0.406801 -0.740163 0.089714 0.306936 0.020195 0.523613 0.066168 0.013919 Ordinary correlations: ELEC PGDP 1.000000 0.871874 0.819961 0.857785

1.000000 0.984728 0.939764

Cumulative Value 3.982782 4.764425 4.910249 4.989944 5.000000

Cumulative Proportion 0.7966 0.9529 0.9820 0.9980 1.0000

EV4

EV5

0.090171 -0.271544 -0.252878 -0.308558 0.871194

0.002746 0.126121 -0.768449 0.626216 0.037764

UR

OPN

1.000000 0.923012

1.000000

(i) Which variables would you like to eliminate from a group of 5 explanatory variables? Why? Explain.

(ii) Obtain the principal components. (iii) Discuss the OLS method to estimate the parameters using these principal components. 6-29: Let us consider a multiple linear regression equation of the type: yi = ȕ 0 +ȕ1X1i +ȕ 2 X 2i +ȕ3 X 3i +İ i

What problems can happen if we apply the OLS method to estimate the parameters when X 2 and X3 are linearly related to each other? Justify.

CHAPTER SEVEN SELECTION OF BEST REGRESSION EQUATION AND DIAGNOSTIC TESTING

7.1 Introduction In a multiple regression model, several independent or explanatory variables are included, but all of them are not important, i.e., do not have significant effects on the dependent variable Y. Thus, we have to develop a multiple regression equation in which all the variables have significant effects on the dependent variable Y. For example, suppose that we wish to establish a multiple linear regression equation for a particular dependent variable RET, say annual returns, in terms of some independent variables such as the size of the firm (S), market-to-book ratio (MB), price earning ratio (PE), liquidity ratio (LR), debt ratio (DR) and beta coefficient (BETA). The regression equation is given by: RETi = ȕ 0 +ȕ1Si +ȕ 2 MBi +ȕ 3 PE i +ȕ 4 LR i +ȕ5 DR i ȕ 6 BETA i +İ i

(7.1)

where RETi is the annual returns in the percentage of stock i, Si is the size of the ith firm which is measured in terms of sales revenue, MBi is the market-to-book ratio of the ith firm, PE i is the price earning ratio of the ith firm, LR i is the liquidity ratio of the ith firm, DR i is the debt ratio of the ith firm, BETA i is the stock’s CAPM beta coefficient of the ith firm, and İ i is the random error term corresponding to the ith set of observations. We know that the variables which are considered in the regression model (7.1) have no significant effect on RET. Therefore, we have to develop a return equation in which all the explanatory variables have significant effects on the dependent variable RET. This is known as the selection of the best regression equation. Compromising two possible criteria usually does the selection of the best regression equation namely: One: To make the multiple regression equation useful for prediction purposes, we should develop the regression model in such a way that we have to include as many variables as possible so that the reliable fitted value can be determined. As a result, the developed equation can be used for prediction purposes. Two: Since the cost, time, and number of skilled persons are associated with collecting information on several variables and subsequently monitoring them, we should derive the equation to include as few variables as possible to minimise time and costs. Therefore, we have to deal with different appropriate techniques to select the best regression equation among different alternatives. We know that different criteria are developed to select the best regression equation, but, in this chapter, the criteria which are most popular and widely applicable to select the best regression equation are all possible regressions, best subset regressions, backward elimination procedure, forward selection procedure, and stepwise regression. Some variations on these methods are discussed with their applications to numerical problems. In the case of multiple regression models, if the model is not correctly specified, the estimator would be biased. The amount of biasedness depends on the size of the parameters of omitted variables. This biasedness misleads the equation and prediction purposes. Thus, it is very important to apply the diagnostic tests. In this chapter, the most popular and widely applicable diagnostic tests including the Chow Test, CUSUM test, CUSUSQ test, Harvey and Collier (1977) test, and Ramsey test are discussed with their applications to numerical problems.

7.2 Important Techniques to Select the Best Regression Equation There is no unique statistical technique, method, procedure or criteria for selecting a regression equation which will be the best. If we know the magnitude of ı 2 (the true random variance of the observations) for any single well-defined problem, our choice of the best regression equation would be much easier. The techniques which are most popular and widely applicable for selecting the best regression equation are discussed in this section. (1) All possible regressions based on three criteria (i) Coefficient of determination R2 R2 =

Regression Sum of Squares (Adjusted) Total Sum of squares (Adjusted)

Chapter Seven

334 n

¦e

2 i

i=1

= 1

n

(7.2)

¦ yi2  ny 2 i=1

(ii): Residual mean square s 2 (Error mean square): s2 =

Residual Sum of Squares Degrees of Freedom n

=

¦e

2 i

i=1

n  k 1

(7.3)

where k is the number of explanatory variables (iii): Mallows C p statistic Cp =

ESSp s2

 (n  2p)

(7.4)

where ESSp is the residual sum of squares from a regression equation which contains p parameters including ȕ 0 , and s 2 is the residual mean square of the regression equation which contains all the independent variables.

(2) Best subset regressions based on three criteria (i): Coefficient of determination R 2 (ii) Adjusted R 2 which is given by: R2 = 1

Residual Sum of Squares/n-k-1 Total Sum of Squares/n-1

(7.5)

(iii): Mallows Cp statistic (3) Backward elimination procedure (4) Forward selection procedure (5) Stepwise regression (6) Ridge regression (8) Principal components regression (9) Latent root regression All Possible Regressions

This procedure first requires the fitting of every possible regression equation involving any number of explanatory variables. Each regression equation is examined according to the following three criteria: (i): The value of R 2 achieved by the least-squares fit. It is called the coefficient of multiple determination. It measures the proportion of total variation in the dependent variable Y which can be explained by the fitted regression equation. (ii) The value of residual mean square s 2 achieved by the least-squares fit. (iii). Mallow's Cp statistic. Use of the R2 Statistic

This procedure involves the following steps to select the best regression equation:

Selection of Best Regression Equation and Diagnostic Testing

335

Step 1: Run all possible regression equations in the following manner. For k explanatory variables, we will divide all the runs or regression equations into (k+1) sets of the type: Set (a): Consists of the regression equation with only the mean value, of the type: E(Yi ) = ȕ 0 Set (b): Consists of the regression equations with only a single explanatory variable of the type:

E(Yi ) = ȕ 0 +ȕ j X ji , (j = 1, 2,…..,k). Set (c): Consists of the regression equations with only two explanatory variables of the type: E(Yi ) = ȕ 0 +ȕ j X ji +ȕ m X mi , (j, m = 1, 2,….,k, j z m )

… … … Set (k+1): Consists of the regression equation with k explanatory variables of the type: E(Yi ) = ȕ 0 +ȕ1X1i +ȕ 2 X 2i +.........+ȕ k X ki

Step 2: Let R 2p denote the coefficient of determination for a subset regression model with p parameters, p = 1, 2,

§k · ……,.k+1. For a p parameters subset model, there are ¨ ¸ values of R 2p for each value of p. The value of R 2p © p-1¹ increases as p increases and it will be maximum when p=k+1. This procedure is terminated when a small increase in R 2p is found. Order the regression equations within each set (functions of same number of variables) by the value of R 2p .

Step 3: Examine the leading equation(s) which have the maximum Rp² in each set of equations and see whether there is any consistent pattern of variables in the leading equations across the sets.

Step 4: Select the regression equation which has a stabilised Rp² (or R²) value and contains the fewest explanatory variables; this will be the best regression equation.

Ex. 7-1: Suppose we have four explanatory variables, say X1, X2, X3, and X4. After running all possible regression equations, we have the leader(s) in each set of equations given in Table 7-1.

Table 7-1: Leader(s) in each set of equations

Set      Leader(s) in Each Set                                         R² in %
Set A    Ŷi = f(β̂0)                                                    0
Set B    Ŷi = β̂0 + β̂4X4i                                               67.5
Set C    Ŷi = β̂0 + β̂1X1i + β̂2X2i                                       97.95
         Ŷi = β̂0 + β̂1X1i + β̂4X4i                                       97.2
Set D    Ŷi = β̂0 + β̂1X1i + β̂2X2i + β̂4X4i                               98.15
Set E    Ŷi = β̂0 + β̂1X1i + β̂2X2i + β̂3X3i + β̂4X4i                       98.17

From the estimated results, it is clear that the best choice lies in Set C. But which one should be chosen? If Ŷi = f(X1, X2) is chosen, there is some inconsistency, because the significant variable X4 is not included in the model. Therefore, the best selection would be Ŷi = f(X1, X2, X4). For this fitted model, the effect of the variable X3 on Y is negligible, but it may be possible that the effect of the variable X4 is significant. That is why it would be better to consider the equation Yi = β0 + β1X1i + β2X2i + β3X4i + εi for estimation.

Ex. 7-2: Select the best regression equation using the R² statistic for the problem given in Ex. 3-5, considering a regression of CO2 on EN, OPN, UR and PGDP.

Solution: Firstly, we find the total number of sets for the given problem. Since the number of explanatory variables k is 4, the total number of sets will be (k+1) = (4+1) = 5. Now, we arrange all possible regression equations for each set. Let the variable Y indicate per capita carbon dioxide emissions in metric tons (CO2), and let the variables X1, X2, X3, and X4 represent per capita primary energy consumption (EN, millions of Btu), trade openness (OPN, exports plus imports as % of GDP), urbanisation (UR, urban population as % of the total) and per capita real GDP (PGDP, constant 2010 US$) of the USA over a period of time. The regression equations for each set are given below in Table 7-2.

Table 7-2: Number of regression equations in each set

Set (A): Only the constant term is included in the equation
  (i) E(Yt) = β0
Set (B): Only one independent variable is included in the regression equations
  (i) E(Yt) = β10 + β11X1t   (ii) E(Yt) = β20 + β22X2t   (iii) E(Yt) = β30 + β33X3t   (iv) E(Yt) = β40 + β44X4t
Set (C): Two independent variables are included in the regression equations
  (i) E(Yt) = α10 + α11X1t + α12X2t   (ii) E(Yt) = α20 + α21X1t + α23X3t   (iii) E(Yt) = α30 + α31X1t + α34X4t
  (iv) E(Yt) = α40 + α42X2t + α43X3t   (v) E(Yt) = α50 + α52X2t + α54X4t   (vi) E(Yt) = α60 + α63X3t + α64X4t
Set (D): Three independent variables are included in the regression equations
  (i) E(Yt) = δ10 + δ11X1t + δ12X2t + δ13X3t   (ii) E(Yt) = δ20 + δ21X1t + δ22X2t + δ24X4t
  (iii) E(Yt) = δ30 + δ31X1t + δ33X3t + δ34X4t   (iv) E(Yt) = δ40 + δ42X2t + δ43X3t + δ44X4t
Set (E): Four independent variables are included in the regression equation
  (i) E(Yt) = λ10 + λ11X1t + λ12X2t + λ13X3t + λ14X4t

We apply the OLS method to run all of these regression equations and then obtain the value of R². The estimated values of R² for each equation in each set are shown below in Table 7-3.

Table 7-3: Estimated values of R² (%)

Set (A): (i) Ŷt = f(β̂0): 0
Set (B): Only one independent-variable regression equations
  (i) Ŷt = f(X1): 82.67   (ii) Ŷt = f(X2): 57.40   (iii) Ŷt = f(X3): 60.27   (iv) Ŷt = f(X4): 60.14
Set (C): Two independent-variable regression equations
  (i) Ŷt = f(X1, X2): 94.95   (ii) Ŷt = f(X1, X3): 93.87   (iii) Ŷt = f(X1, X4): 95.16
  (iv) Ŷt = f(X2, X3): 61.41   (v) Ŷt = f(X2, X4): 60.86   (vi) Ŷt = f(X3, X4): 60.67
Set (D): Three independent-variable regression equations
  (i) Ŷt = f(X1, X2, X3): 95.17   (ii) Ŷt = f(X1, X2, X4): 95.59   (iii) Ŷt = f(X1, X3, X4): 95.49   (iv) Ŷt = f(X2, X3, X4): 61.43
Set (E): Four independent-variable regression equations
  (i) Ŷt = f(X1, X2, X3, X4): 95.88


We now select the leading equation(s) from each set which have the maximum value of R². The leading equation(s) for each set are given below in Table 7-4.

Table 7-4: Leading equation of each set

Set                              Variables in Equations          100R² %
Set (B): One-variable model      Ŷt = f(X1)                      82.67
Set (C): Two-variable model      Ŷt = f(X1, X2)                  94.95
                                 Ŷt = f(X1, X4)                  95.16
Set (D): Three-variable model    Ŷt = f(X1, X2, X3)              95.17
                                 Ŷt = f(X1, X2, X4)              95.59
                                 Ŷt = f(X1, X3, X4)              95.49
Set (E): Four-variable model     Ŷt = f(X1, X2, X3, X4)          95.88

It is found that for Set (D), R² is the highest for the three-variable regression equation Ŷt = f(X1, X2, X4). It is also found that R² is highest for the four-variable regression equation Ŷt = f(X1, X2, X3, X4), but the difference between these two values is very small. That is why we could select the three-variable regression equation Ŷt = f(X1, X2, X4). The problem is that, if we select the three-variable regression equation, the variable X3, which has a significant effect on Y, will be dropped from the equation. Therefore, it would be better to select the four-variable regression equation. So, our best regression equation will be E(Yt) = β0 + β1X1t + β2X2t + β3X3t + β4X4t.
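For readers who want to carry out this all-possible-regressions search numerically, a minimal Python sketch is given below. It assumes the data are already held in a pandas DataFrame df, with the illustrative column names "CO2" for the dependent variable and "EN", "OPN", "UR", "PGDP" for the regressors (these names are not taken from the book). The sketch enumerates every subset, fits it by OLS and reports R², which can then be screened set by set as in Tables 7-3 and 7-4.

```python
# A minimal sketch of the all-possible-regressions R-squared search described above.
from itertools import combinations

import pandas as pd
import statsmodels.api as sm

def all_subsets_r2(df, y_col, x_cols):
    """Fit every subset regression and return (subset, R^2), ordered by subset size."""
    y = df[y_col]
    results = []
    for p in range(0, len(x_cols) + 1):               # p = number of regressors in the subset
        for subset in combinations(x_cols, p):
            if subset:
                X = sm.add_constant(df[list(subset)])
            else:
                X = pd.DataFrame({"const": 1.0}, index=df.index)   # constant-only model (Set A)
            fit = sm.OLS(y, X).fit()
            results.append((subset, fit.rsquared))
    return sorted(results, key=lambda r: (len(r[0]), -r[1]))

# Example usage (df is assumed to exist):
# for subset, r2 in all_subsets_r2(df, "CO2", ["EN", "OPN", "UR", "PGDP"]):
#     print(len(subset), subset, round(100 * r2, 2))
```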

Using the Residual Mean Square

This procedure for selecting the best regression equation involves the following steps:

Step 1: Firstly, we find the total number of sets of regression equations for the given number of explanatory variables. For k explanatory variables, the total number of sets will be (k+1).

Step 2: Secondly, we arrange all possible regression equations for each set of the type:
Set (A): Only the constant term is included in the equation, i.e., E(Yi) = β0.
Set (B): A single explanatory variable is included in each equation, i.e., E(Yi) = β0 + βjXji, (j = 1, 2, ….., k).
Set (C): Two explanatory variables are included in each equation, i.e., E(Yi) = β0 + βjXji + βlXli, (j ≠ l = 1, 2, ….., k).
……
Set (k+1): All k explanatory variables are included in the equation, i.e., E(Yi) = β0 + β1X1i + β2X2i + …… + βkXki.

Step 3: Thirdly, we apply the OLS method to run every equation in each set and obtain the residual mean square (EMS) for each regression equation. The residual mean square is given by

s²(p) = EMS = Residual Sum of Squares / Degrees of Freedom

where p is the number of parameters in the equation. Then, we find the mean value of EMS for every set. This can be explained with the help of Table 7-5.


Table 7-5: Residual mean squares (EMS) and the average value of EMS for each set

No. of Parameters (p)    Residual Mean Squares (EMS)                  Average Value of EMS, s̄²(p)
2                        s1²(2), s2²(2), ………, sk²(2)                  s̄²(2)
3                        s1²(3), s2²(3), ………, sk²(3)                  s̄²(3)
4                        s1²(4), s2²(4), ………, sk²(4)                  s̄²(4)
...                      ...                                          ...
(k+1)                    s1²(k+1)                                     s̄²(k+1)

Step 4: Plot the average value of the residual mean squares, s̄²(p), on the Y-axis against the corresponding number of parameters p.

Fig. 7-1: Plot of average values of residual mean squares against p

Step 5: Obtain the number of parameters, say p = 3, for which s̄²(p) tends to stabilise.

Step 6: The best regression equation will be the model for which the residual mean square is approximately equal to this stabilised value of s̄² and which contains (p-1) explanatory variables.

Ex. 7-3: Data on the logarithmic forms of the per capita GDP in USD (Y), per capita energy consumption in kg of oil equivalent (X1), domestic investment (X2), trade openness (X3), external debt (X4) and government spending (X5) of Bangladesh are collected for the period 1972-2018 (the variables domestic investment, trade openness, external debt and government spending are measured as % of GDP; source: WDI, own calculations) to select the best regression of Y on X1, X2, X3, X4, and X5 using the residual mean square criterion. Since the number of explanatory variables k is 5, the total number of sets will be (k+1) = (5+1) = 6. Now, we arrange all possible regression equations for each set. Thus, we have the following regression equations:

Set (A): Only one equation is to be formed, including only the constant term in the model.
Set (B): This set contains C(5, 1) = 5 equations with a single explanatory variable, i.e., with two parameters.
Set (C): This set contains C(5, 2) = 10 equations with two explanatory variables, i.e., with three parameters.
Set (D): There will be C(5, 3) = 10 equations in Set D with three explanatory variables, i.e., with four parameters.
Set (E): The total number of equations in Set E will be C(5, 4) = 5, each with four explanatory variables, i.e., with five parameters.


Set (F): There will be C(5, 5) = 1 equation in Set F with all five explanatory variables, i.e., with six parameters.

We now apply the OLS method to run all the regression equations for every set and calculate the residual mean square (EMS) for each equation. Then, we find the mean value of EMS for each set. The results, obtained using the software package RATS, are given in Table 7-6.

Table 7-6: Residual mean squares and their average value for each group

No. of Parameters (p)   Residual Mean Squares (EMS)                                                                                Average Value of EMS
2                       4.3112e-003, 0.1048, 0.0315, 0.1420, 0.0831                                                                0.0732
3                       4.0047e-003, 4.2014e-003, 3.2305e-003, 4.1373e-003, 0.0305, 0.0932, 0.0765, 0.0308, 0.0320, 0.0831        0.0362
4                       3.8089e-003, 3.2588e-003, 3.6622e-003, 3.0899e-003, 3.6475e-003, 3.0302e-003, 0.0280, 0.0312, 0.0712, 0.0314   0.0182
5                       3.0857e-003, 2.8148e-003, 2.9895e-003, 2.4983e-003, 0.0287                                                8.0138e-003
6                       2.2616e-003                                                                                               2.2616e-003
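Computations such as those summarised in Table 7-6 can be reproduced with a short script. The sketch below assumes a DataFrame df holding the series of Ex. 7-3 under the illustrative column names "Y", "X1", ..., "X5" (the names and data loading are assumptions, not the book's code); it computes the residual mean square of every subset regression and averages it by the number of parameters.

```python
# A minimal sketch of the residual-mean-square (EMS) criterion.
from itertools import combinations
from collections import defaultdict

import statsmodels.api as sm

def average_ems_by_size(df, y_col, x_cols):
    """Return {number of parameters p: average residual mean square over all subsets}."""
    ems_by_p = defaultdict(list)
    for r in range(1, len(x_cols) + 1):
        for subset in combinations(x_cols, r):
            fit = sm.OLS(df[y_col], sm.add_constant(df[list(subset)])).fit()
            ems_by_p[r + 1].append(fit.ssr / fit.df_resid)   # residual SS / degrees of freedom
    return {p: sum(v) / len(v) for p, v in sorted(ems_by_p.items())}

# Example usage:
# print(average_ems_by_size(df, "Y", ["X1", "X2", "X3", "X4", "X5"]))
```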

We plot the average value of the residual mean squares on the Y-axis against the corresponding number of parameters p. The figure is shown below.

Fig. 7-2: Plot of the average value of residual mean squares for each set

From Fig. 7-2, it can be seen that the regression equation with six parameters has the smallest average residual mean square. Finally, it can be concluded that it would be better to consider all the explanatory variables in the regression equation. Therefore, the final regression equation will be E(Yi) = β0 + β1X1i + β2X2i + β3X3i + β4X4i + β5X5i.

Use of the Mallows Cp Statistic

C. L. Mallows has suggested the following statistic for selecting the best regression equation. The Mallows Cp statistic is given by:

Cp = ESSp/s² - (n - 2p)     (7.6)

where ESSp is the residual sum of squares from an equation containing p parameters including β0, and s² is the residual mean square from the equation containing all the explanatory variables, which is given by

s² = Residual Sum of Squares / Degrees of Freedom = (Σ ei²)/(n - k - 1), the sum running over i = 1, ….., n     (7.7)

where k is the number of explanatory variables.


Assumptions:

(i) R. W. Kennard has pointed out that the value of Cp is closely related to the adjusted R² statistic, R̄², and it is also related to the R² statistic.

(ii) If an equation with p parameters is adequate, that is, it does not suffer from lack of fit, then E(ESSp) ≈ (n - p)σ², because we also assume that E(s²) = σ².

(iii) The expected value of Cp is then given by:

E(Cp) ≈ E[ESSp/σ² - (n - 2p)] ≈ (n - p)σ²/σ² - n + 2p = p   [for an adequate model]     (7.8)

We now calculate the value of Cp for all p and then plot Cp against p. The best model is selected by examining Cp against p. The selection procedure is as follows:

Fig. 7-3: Plot of the Mallows Cp statistic against p, with the reference line Cp = p

(i) If the points are fairly close to the Cp = p line, then the model is adequate.
(ii) If a point lies above the line Cp = p, then the corresponding equation suffers from lack of fit, i.e., it is a biased equation.
(iii) Because of random variation, points representing a well-fitting equation can also fall below the Cp = p line.
(iv) The best model is one with a low Cp value lying close to the line Cp = p.
(v) If the choice is not clear-cut, it is a matter of personal judgment.
(vi) A biased equation does not represent the actual data as well, because it has a larger ESSp so that Cp > p; however, a smaller value of Cp indicates a smaller discrepancy from the true but unknown model.
(vii) An equation with more parameters may fit the actual data well and yet have a larger discrepancy from the true model, which remains unknown.

Ex. 7-4: Select the best regression equation for the given problem in Ex. 7-3 using the Mallows Cp statistic.

Solution: The estimated values of the Mallows Cp statistic corresponding to different numbers of parameters are given in Table 7-7.


Table 7-7: Mallows Cp statistic corresponding to different numbers of parameters

No. of Parameters (p)    Cp Statistic
2                        42.7794
3                        36.9109
4                        33.4180
5                        20.3030
6                        6.0000

From the estimated results in Table 7-7, it is found that the estimated values of the Mallows Cp statistic lie above the line Cp = p for all regression equations except the one with 6 parameters; for the regression equation with 6 parameters, Cp = p. Thus, it can be said that, except for the regression equation E(Yi) = β0 + β1X1i + β2X2i + β3X3i + β4X4i + β5X5i, all the equations suffer from lack of fit, i.e., they are not adequate and are biased equations. Thus, for the given problem, the best regression equation will be E(Yi) = β0 + β1X1i + β2X2i + β3X3i + β4X4i + β5X5i.
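A minimal sketch of the Cp computation in equation (7.6) is given below; it assumes the same illustrative DataFrame df and column names used earlier, and returns (p, subset, Cp) triples that can be plotted against the reference line Cp = p.

```python
# A minimal sketch of the Mallows Cp statistic, eq. (7.6).
from itertools import combinations

import statsmodels.api as sm

def mallows_cp(df, y_col, x_cols):
    y = df[y_col]
    n = len(df)
    full_fit = sm.OLS(y, sm.add_constant(df[x_cols])).fit()
    s2 = full_fit.ssr / full_fit.df_resid              # s^2 from the full model, eq. (7.7)
    rows = []
    for r in range(1, len(x_cols) + 1):
        for subset in combinations(x_cols, r):
            fit = sm.OLS(y, sm.add_constant(df[list(subset)])).fit()
            p = r + 1                                   # parameters including the intercept
            cp = fit.ssr / s2 - (n - 2 * p)             # eq. (7.6)
            rows.append((p, subset, cp))
    return rows

# Example usage: keep subsets whose Cp value is close to p.
# for p, subset, cp in mallows_cp(df, "Y", ["X1", "X2", "X3", "X4", "X5"]):
#     print(p, subset, round(cp, 2))
```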

The Backward Elimination Procedure

In this method, the explanatory variables which have no significant effect on the dependent variable Y are eliminated one by one until a satisfactory regression equation is obtained. This procedure involves the following steps:

Step 1: Firstly, we regress Y on all k explanatory variables of the type:

Yi = β0 + β1X1i + β2X2i + ……. + βkXki + εi     (7.9)

We assume that the random error term εi satisfies all the usual assumptions of a CLRM.

Step 2: Secondly, we set up the null hypothesis against an appropriate alternative hypothesis. The null hypothesis to be tested is

H0: βj = 0, (j = 1, 2, ….., k)

against the alternative hypothesis

H1: βj ≠ 0, (j = 1, 2, ….., k)

Step 3: Thirdly, we apply the OLS method to estimate the regression equation (7.9), and then we calculate the value of the partial F-test statistic for testing the null hypothesis for every predictor variable. Under the null hypothesis, the F-test statistic is given by:

F = (RESS - UESS) / [UESS/(n - k - 1)] ~ F(1, n - k - 1) = Fcal     (7.10)

where RESS = restricted residual sum of squares, UESS = unrestricted residual sum of squares, and k = number of explanatory variables. For k explanatory variables, we obtain k partial F-test values.

Step 4: We select the lowest value, say Flcal, among the k partial F-test values and compare it with the table value of the F-statistic, say Ftab.

(i) If Flcal < Ftab, we accept the null hypothesis, implying that the corresponding explanatory variable, say Xl, has no significant effect on Y. Thus, we remove the variable Xl from the regression equation (7.9), and then we repeat steps 1, 2, 3 and 4 with the remaining explanatory variables until the best model is selected. (ii) If Flcal > Ftab, we stop the elimination procedure and consider that the regression equation obtained in step 1 is the best.
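The backward elimination steps above can be automated as in the following sketch. The DataFrame df, the column names and the use of scipy for the 5% critical value are illustrative assumptions.

```python
# A minimal sketch of backward elimination using partial F-tests.
import pandas as pd
import statsmodels.api as sm
from scipy import stats

def backward_elimination(df, y_col, x_cols, alpha=0.05):
    remaining = list(x_cols)
    while remaining:
        full = sm.OLS(df[y_col], sm.add_constant(df[remaining])).fit()
        uess, df_resid = full.ssr, full.df_resid
        partial_f = {}
        for var in remaining:
            reduced = [v for v in remaining if v != var]
            if reduced:
                X_red = sm.add_constant(df[reduced])
            else:
                X_red = pd.DataFrame({"const": 1.0}, index=df.index)
            ress = sm.OLS(df[y_col], X_red).fit().ssr
            partial_f[var] = (ress - uess) / (uess / df_resid)      # eq. (7.10)
        weakest = min(partial_f, key=partial_f.get)
        if partial_f[weakest] < stats.f.ppf(1 - alpha, 1, df_resid):
            remaining.remove(weakest)      # the weakest variable is not significant; drop it
        else:
            break                          # every remaining variable passes the partial F-test
    return remaining

# Example usage with illustrative column names:
# print(backward_elimination(df, "lnCO2", ["lnPGDP", "lnOPN", "lnUR", "lnEN"]))
```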


Advantages:

(1) The backward elimination method is more economical than the all-possible-regressions method, in the sense that it tries to examine only the best regression containing a certain number of variables.
(2) It is a satisfactory procedure for statisticians who like to see all the variables in the equation first, in order not to miss any variable.

Disadvantages:

(1) When multicollinearity is a serious problem, the backward elimination procedure does not work well.
(2) This procedure may eliminate the most important variable at the very first step.

Ex. 7-5: For the given problem in Ex. 3-5, select the best regression equation, considering the non-linear regression equation CO2t = A0 PGDPt^β1 OPNt^β2 URt^β3 ENt^β4 e^εt, using the backward elimination procedure.

Solution: Here, the given non-linear regression equation is:

CO2t = A0 PGDPt^β1 OPNt^β2 URt^β3 ENt^β4 e^εt     (7.11)

All the variables are defined previously. The logarithmic transformation of equation (7.11) is:

ln(CO2t) = ln(A0) + β1ln(PGDPt) + β2ln(OPNt) + β3ln(URt) + β4ln(ENt) + εt     (7.12)

or

Yt = β0 + β1X1t + β2X2t + β3X3t + β4X4t + εt

where Yt = ln(CO2t), X1t = ln(PGDPt), X2t = ln(OPNt), X3t = ln(URt), X4t = ln(ENt), and β0 = ln(A0). We assume that the random error term εt satisfies all the usual assumptions of a CLRM. The partial F-test values for testing the null hypothesis H0: βj = 0, (j = 1, 2, 3, 4) against H1: βj ≠ 0, (j = 1, 2, 3, 4) are obtained using the software package RATS and are given in Table 7-8.

Table 7-8: The partial F-test values

Hypothesis      F-test Value    Ftab                                                                  Decision
H0: β1 = 0      3.3931          At a 5% level of significance with 1 and 44 degrees of freedom,      Since 0.2293 < 4.064, the variable X3
H0: β2 = 0      4.5508          the table value of the test statistic is 4.064.                      will be eliminated from the equation.
H0: β3 = 0      0.2293
H0: β4 = 0      419.6751

After eliminating the variable X3 from the original regression equation (7.12), we have

E(Yt) = β0 + β1X1t + β2X2t + β4X4t     (7.13)

The partial F-test values for testing the null hypothesis H0: βj = 0, (j = 1, 2, 4) against H1: βj ≠ 0, (j = 1, 2, 4) are calculated and given in Table 7-9.

Table 7-9: The partial F-test values

Hypothesis      F-test Value    Ftab                                                                  Decision
H0: β1 = 0      7.3481          At a 5% level of significance with 1 and 45 degrees of freedom,      Since 5.2196 > 4.06, we stop the
H0: β2 = 0      5.2196          the table value of the test statistic is 4.06.                       elimination procedure.
H0: β4 = 0      467.9529

Therefore, the final equation will be E(Yt) = β0 + β1X1t + β2X2t + β4X4t, or CO2t = A0 PGDPt^β1 OPNt^β2 ENt^β4 e^εt.


Forward Selection Procedure

In this method, the explanatory variables are added to the equation one by one until a satisfactory regression equation is obtained. This procedure involves the following steps:

Step 1: Firstly, we consider the regression equation of the type E(Yi) = β0.

Step 2: Secondly, we calculate the simple correlation coefficient between Y and each of the X variables and select the variable which has the largest correlation coefficient with Y. Let the correlation coefficient between Y and Xj (j = 1, 2, ….., k) be the highest. Now, we test whether this relationship is statistically significant or not.

(i) The null hypothesis to be tested is

H0: ρYXj = 0, ∀j

against the alternative hypothesis

H1: ρYXj ≠ 0, ∀j

Under the null hypothesis, the test statistic is given by:

t = rYXj √(n-2) / √(1 - r²YXj) ~ t(n-2) d.f.     (7.14)

If rYXj is significantly large, then the variable Xj enters the equation, and the model becomes E(Yi) = β0 + β1Xji.

(ii) Otherwise, we stop the selection procedure and accept the equation E(Yi) = β0 as the best regression equation.

Step 3: After entering the variable Xj in the model, we search for a second independent variable. We examine the partial F-test value for each of the remaining explanatory variables. Say the variable Xm has the highest partial F-test value, denoted by FLar.

Step 4: We then compare the largest partial F-value with the table value at a 5% level of significance with (1, n-k-1) degrees of freedom, where k is the number of explanatory variables in the unrestricted regression equation.

(i) If FLar < Ftab, stop the selection procedure and consider the equation E(Yi) = β0 + β1Xji as the best regression equation. (ii) Otherwise, enter the variable Xm into the regression equation, so that the equation becomes E(Yi) = β0 + β1Xji + β2Xmi.

Step 5: Repeat steps 3 and 4 until the best regression equation is obtained.

Advantages:

(i) The forward selection procedure involves much less computational effort than the backward elimination procedure. (ii) When the explanatory variables are multicollinear, this method is more applicable than the backward elimination procedure.

Disadvantages:

(i) The most serious drawback of this procedure is that a variable which may have been the best single variable at an early stage may be superfluous at a later stage.
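The forward selection procedure can be sketched in Python as follows. The DataFrame df and the column names are illustrative assumptions; at the first step the partial F-statistic of a single regressor equals the square of the correlation t-statistic in (7.14), so partial F-tests are used throughout.

```python
# A minimal sketch of the forward selection procedure.
import pandas as pd
import statsmodels.api as sm
from scipy import stats

def forward_selection(df, y_col, x_cols, alpha=0.05):
    selected, candidates = [], list(x_cols)
    while candidates:
        if selected:
            base = sm.OLS(df[y_col], sm.add_constant(df[selected])).fit()
        else:
            base = sm.OLS(df[y_col], pd.DataFrame({"const": 1.0}, index=df.index)).fit()
        ress = base.ssr
        best_var, best_f, best_df = None, -1.0, None
        for var in candidates:
            fit = sm.OLS(df[y_col], sm.add_constant(df[selected + [var]])).fit()
            f_val = (ress - fit.ssr) / (fit.ssr / fit.df_resid)    # partial F of adding var
            if f_val > best_f:
                best_var, best_f, best_df = var, f_val, fit.df_resid
        if best_f > stats.f.ppf(1 - alpha, 1, best_df):
            selected.append(best_var)          # the strongest candidate is significant
            candidates.remove(best_var)
        else:
            break                              # no remaining candidate improves the fit significantly
    return selected

# Example usage with illustrative names for the variables of Ex. 7-6:
# print(forward_selection(df, "lnGDP", ["lnK", "lnLF", "lnEC", "lnLAND"]))
```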


Ex. 7-6: Data on the logarithmic forms of the variables GDP (Y = ln(GDP), constant 2010 USD), capital investment (X1 = ln(K), constant 2010 USD), labour force (X2 = ln(LF)), total electricity consumption (X3 = ln(EC), kWh) and total arable land (X4 = ln(LAND), hectares) of Bangladesh are collected for the period 1980-2018 (source: WDI, 2020) to detect the best regression equation using the forward selection procedure.

First, we regress Y on the constant term of the type E(Yt) = β0, and then we calculate the simple correlation coefficients between Y and each of the four explanatory variables, which are given in Table 7-10.

Table 7-10: Estimated correlation coefficients between Y and each of the explanatory variables

Correlation Coefficient    Estimated Value    t-test Value    Decision
rYX1                       0.9982             102.6585        The variable X1 enters the equation.
rYX2                       0.9109
rYX3                       0.9892
rYX4                       -0.9291

Since the value of rYX1 = 0.9982 is the largest, we select the variable X1 to enter the equation. We have to test whether the relationship between Y and X1 is statistically significant or not. To test this, we set up the null hypothesis H0: ρYX1 = 0 against the alternative hypothesis H1: ρYX1 ≠ 0.

Under the null hypothesis, the test statistic is given by:

t = 0.9982√37 / √(1 - (0.9982)²) = 102.6582 ~ t(37) d.f.     (7.15)

At a 5% level of significance with 37 degrees of freedom, the table values of the test statistic are ±1.687. Since the calculated value of the test statistic does not fall within the table values, the null hypothesis will be rejected and the variable X1 will enter the equation. Thus, the equation will be E(Yt) = β0 + β1X1t. We now calculate the partial F-test values for the remaining explanatory variables X2, X3, and X4, which are given in Table 7-11.

Table 7-11: The partial F-test values for the variables X2, X3, and X4

Explanatory Variable    F-test Value    Decision
X2                      16.0930         The variable X3 will enter the equation at any significance level.
X3                      115.8160
X4                      2.1876

Since the partial F-test value for the variable X3 is the highest, we compare this value with the table value of the F-statistic. At a 5% level of significance with 1 and 35 degrees of freedom, the table value of the F-test statistic is 4.127. Since the calculated value of the F-test statistic is greater than the table value, the variable X3 has a significant effect on Y, and thus the variable X3 enters the equation. After entering the variable X3, the equation becomes E(Yt) = β0 + β1X1t + β3X3t. Now, we calculate the partial F-test values for the variables X2 and X4, which are given in Table 7-12.


Table 7-12: The partial F-test values for the variables X2 and X4

Explanatory Variable    F-test Value    Decision
X2                      43.5310         The variable X4 will enter the equation.
X4                      63.0531

Since the partial F-test value for the variable X4 is higher, we compare this value with the table value of the F-statistic. At a 5% level of significance with 1 and 36 degrees of freedom, the table value of the F-test statistic is 4.116. Since the calculated value of the test statistic is greater than the table value, the variable X4 will enter the equation. After entering the variable X4, the equation becomes E(Yt) = β0 + β1X1t + β3X3t + β4X4t. Now, we calculate the partial F-test value for the remaining variable X2, which is given in Table 7-13.

Table 7-13: The partial F-test value for the variable X2

Explanatory Variable    F-test Value    Decision
X2                      180.2755        The variable X2 will enter the equation at any significance level.

At a 5% level of significance with 1 and 37 degrees of freedom, the table value of the test statistic is 4.107, which is smaller than the calculated value. Thus, the variable X2 will enter the equation. After entering the variable X2, our best regression equation will be E(Yt) = β0 + β1X1t + β2X2t + β3X3t + β4X4t.

The Stepwise Regression

The stepwise regression is nothing but the successive application of the forward selection procedure and the backward elimination procedure. In this method, at each stage, the forward selection procedure is used to decide which variable to include, and the backward elimination procedure is used to decide which variable to eliminate from the regression equation. The procedure is outlined as follows:

Step 1: Firstly, we consider the regression equation of the type E(Yi) = β0.

Step 2: Secondly, we calculate the simple correlation coefficient between Y and each of the explanatory variables. We select the variable which has the largest correlation coefficient with Y. Let the correlation coefficient between Y and the variable Xl (l = 1, 2, ….., k) be the highest.

(i) The null hypothesis to be tested is

H0: ρYXl = 0, ∀l

against the alternative hypothesis

H1: ρYXl ≠ 0, ∀l.

Under the null hypothesis, the test statistic is given by:

t = rYXl √(n-2) / √(1 - r²YXl) ~ t(n-2) d.f.     (7.16)

If rYXl is significantly large, the variable Xl enters the equation, so the equation becomes E(Yi) = β0 + β1Xli. (ii) Otherwise, stop the selection procedure and accept the equation E(Yi) = β0 as the best regression equation.

Step 3: After including the variable Xl in the model, we search for a second independent variable. We examine the partial correlation coefficients between Y and the remaining independent variables and select the next variable to enter the model as the one with the highest partial correlation coefficient with Y. Let the variable Xj have the highest partial correlation coefficient with Y and enter the model. After entering the variable Xj, the equation becomes E(Yi) = β0 + βlXli + βjXji.


Step 4: Compute the regression equation of Y on Xl and Xj, i.e., Ŷi = β̂0 + β̂lXli + β̂jXji. Now, compute the partial F-test values for each of the explanatory variables in the model, select the lower of these two partial F-values, say FLow, and compare it with the table value at a 5% level of significance with (1, n-3) degrees of freedom.

Step 5: If FLow < Ftab, then drop the corresponding variable. Otherwise, keep it in the model. Repeat steps 3, 4, and 5 until a satisfactory model is obtained.

Note: Different software packages such as EViews, Python, R, RATS, STATA, etc. can be applied directly to select the best regression equation using the stepwise method.

Ex. 7-7: Select the best regression equation for the given problem in Ex. 7-6 using the stepwise method.

Solution: Firstly, we consider the regression equation of Y on the constant term of the type E(Yt) = β0. We now calculate the simple correlation coefficient between Y and each of the explanatory variables. The results are given in Table 7-14.

Table 7-14: The correlation coefficients between Y and the explanatory variables

Correlation between Y and Explanatory Variables      rYXi       t-test      Decision
Correlation coefficient between Y and X1             0.9982     102.6585    The variable X1 will enter the equation.
Correlation coefficient between Y and X2             0.9109
Correlation coefficient between Y and X3             0.9892
Correlation coefficient between Y and X4             -0.9291

Since the correlation coefficient between Y and X1 is the highest and statistically significant at any significance level, the variable X1 will be selected to enter the regression equation. After entering the variable X1, the equation will be E(Yt) = β0 + β1X1t. We now calculate the partial correlation coefficients between Y and the remaining independent variables X2, X3 and X4. The estimated values are given in Table 7-15.

Table 7-15: The partial correlation coefficients between Y and each of the remaining variables

Partial Correlation between Y and Explanatory Variables      Estimated Value     Decision
Partial correlation coefficient between Y and X2             0.5612              The variable X3 will enter the equation.
Partial correlation coefficient between Y and X3             0.8763
Partial correlation coefficient between Y and X4             -0.2425

Since the partial correlation coefficient between Y and X3 is the highest, the variable X3 will be selected to enter the model. After entering the variable X3, the equation will be E(Yt) = β0 + β1X1t + β3X3t. We apply the OLS method to run the equation E(Yt) = β0 + β1X1t + β3X3t, and then we compute the partial F-test values for the explanatory variables X1 and X3. The results are given in Table 7-16.

Table 7-16: The partial F-test values for the variables X1 and X3

Explanatory Variable    F-test Value    Decision
X1                      215.2914        The variable X3 will remain in the equation.
X3                      4.9647

The lowest partial F-test value is 4.9647. We compare it with the table value. At a 5% level of significance with 1 and 36 degrees of freedom, the table value of the F-test statistic is 4.116. Since the calculated value of the test statistic is greater than the table value, the variable X3 will remain in the equation. Therefore, the equation will be E(Yt) = β0 + β1X1t + β3X3t. We now calculate the partial correlation coefficients between Y and the remaining independent variables X2 and X4, which are given below in Table 7-17.


Table 7-17: The partial correlation coefficients

Partial Correlation between Y and Explanatory Variables      Estimated Value     Decision
Partial correlation coefficient between Y and X2             0.7398              The variable X4 will enter the equation.
Partial correlation coefficient between Y and X4             0.7978

Since the partial correlation coefficient between Y and X4 is the highest, the variable X4 will enter the equation. After entering the variable X4, the equation becomes E(Yt) = β0 + β1X1t + β3X3t + β4X4t. We apply the OLS method to run the equation E(Yt) = β0 + β1X1t + β3X3t + β4X4t and then compute the partial F-test values for the explanatory variables X1, X3 and X4. The results are given in Table 7-18.

Table 7-18: The partial F-test values

Explanatory Variable    F-test Value    Decision
X1                      203.7197        The variable X4 will be excluded from the equation.
X3                      4.7799
X4                      0.00513

The lowest partial F-test value is 0.00513. We compare it with the table value. At a 5% level of significance with 1 and 35 degrees of freedom, the table value of the F-test statistic is 4.125. Since the calculated value of the test statistic is smaller than the table value, the variable X4 will be excluded from the equation. Therefore, the final equation for the given problem will be E(Yt) = β0 + β1X1t + β3X3t.

Criteria for Model Selection

To select the lag length of AR(p), ARMA(p, q), ARIMA(p, d, q), ARCH, GARCH models, etc., we can apply different selection criteria. The most popular and widely applicable criteria are Akaike's Information Criterion (AIC), Schwarz's Bayesian Information Criterion (SBIC) and the Hannan-Quinn Information Criterion (HQIC). These criteria are given below:

AIC = ln σ̂² + 2k/T     (7.17)

SBIC = ln σ̂² + k log(T)/T     (7.18)

HQIC = ln σ̂² + 2k log{log(T)}/T     (7.19)

where σ̂² is the maximum likelihood estimate of the residual variance σ², given by σ̂² = (1/T) Σ et² (the sum running over t = 1, ….., T), the e's are the residuals of the equation, k is the total number of parameters estimated, and T is the sample size. We choose the model for which the criterion values are the smallest. When two criteria differ in their trade-off between fit and parsimony, the HQIC can be preferred because it has the property of almost surely selecting the true model as T tends to infinity, provided that the true model is in the class of ARMA(p, q) models with relatively small values of p and q.
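The three criteria in (7.17)-(7.19) are easy to compute from a residual vector, as in the minimal sketch below; the residuals array and the value of k are assumed to come from a previously fitted regression or ARMA model.

```python
# A minimal sketch of the information criteria in eqs. (7.17)-(7.19).
import numpy as np

def information_criteria(residuals, k):
    e = np.asarray(residuals, dtype=float)
    T = e.size
    sigma2_hat = np.sum(e ** 2) / T                             # ML estimate of the residual variance
    aic = np.log(sigma2_hat) + 2 * k / T                        # eq. (7.17)
    sbic = np.log(sigma2_hat) + k * np.log(T) / T               # eq. (7.18)
    hqic = np.log(sigma2_hat) + 2 * k * np.log(np.log(T)) / T   # eq. (7.19)
    return {"AIC": aic, "SBIC": sbic, "HQIC": hqic}

# Example usage: compare candidate lag orders and keep the one with the smallest values.
# print(information_criteria(fit.resid, k=3))
```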

7.3 Model Specification and Diagnostic Testing

Meaning of Misspecification

In the case of multiple regression models, if we consider a mathematical form for estimation which differs from the true form of the relationship between the dependent variable and the observed independent variables, this is called misspecification of the mathematical model.

Ex. 7-8: Let the true wage equation be

ln(Wi) = α0 + α1EDUi + α2EXPi + α3EXPi² + εi     (7.20)


where Wi is the wage, EDUi is the education level and EXPi is the experience of the ith wage earner. But for estimation, we consider the following model:

ln(Wi) = α0 + α1EDUi + α2EXPi + εi     (7.21)

In the estimated model, we omit the squared term of experience, and thus we are committing a misspecification of the functional form between the wage and its determinants. In this case, the estimators would be biased. The amount of bias depends on the size of α3 and the correlations among education, experience and the squared term of experience. Even if we obtained an unbiased estimator of α2, we would not be able to estimate the return to experience, because the marginal return equals α2 + 2α3EXPi. Using only the biased estimator of α2 can be misleading, especially at extreme values of EXP.
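The bias caused by omitting the quadratic term can be illustrated with a small simulation: data are generated from the true equation (7.20) and the misspecified equation (7.21) is estimated. All parameter values and the sample size below are illustrative assumptions, not estimates from the book.

```python
# A small simulation sketch of the misspecification discussed in Ex. 7-8.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 5000
edu = rng.uniform(8, 18, n)
exp_ = rng.uniform(0, 40, n)
# True model (7.20) with illustrative coefficients
lnw = 0.5 + 0.08 * edu + 0.06 * exp_ - 0.001 * exp_**2 + rng.normal(0, 0.3, n)

X_true = sm.add_constant(np.column_stack([edu, exp_, exp_**2]))
X_miss = sm.add_constant(np.column_stack([edu, exp_]))

print(sm.OLS(lnw, X_true).fit().params)   # recovers alpha_2 near 0.06 and alpha_3 near -0.001
print(sm.OLS(lnw, X_miss).fit().params)   # the coefficient on experience is biased
```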

Chow Test for Parameter Structural Change

Time-series data can often contain a structural break due to a change of monetary policy, a tax reform policy, or a sudden shock to the economy caused by stock market crashes, wars, earthquakes, tsunamis, the COVID-19 pandemic, etc. A parameter structural break is interpreted as a special case of time dependence. To test for a structural break, we often use the Chow test, which is Chow's first test (the second test relates to predictions). The model in effect uses an F-test to determine whether a single regression is more efficient than two separate regressions obtained by splitting the data into two sub-samples. This could occur as illustrated in Fig. 7-4 (Case 2), where there is a structural break at time t.

Fig. 7-4: A structural break at time t (Case 1: a single regression line fits the data; Case 2: two regression lines, Model 1 and Model 2, with a break at time t)

In the first case, we have just a single regression line fitting the data points (scatter plot). It can be expressed as

Yt = β0 + β1Xt + εt     (7.22)

In the second case, where a structural break occurs at time t = T1, we have two separate models expressed as

Yt = α0 + α1Xt + ε1t,  t = 1, 2, ….., T1     (7.23)

and

Yt = δ0 + δ1Xt + ε2t,  t = T1+1, T1+2, ….., T     (7.24)

with Φ1 = (α0, α1, σ1²) and Φ2 = (δ0, δ1, σ2²) being the underlying parameters of interest. This means that model (7.23) applies before the break at time t = T1 and model (7.24) applies after the structural break. If the parameters in the two models are the same, i.e., α0 = δ0 and α1 = δ1, then models (7.23) and (7.24) can be combined into a single-equation model as in Case 1, i.e., model (7.22), where there is a single regression line. In matrix notation, equations (7.23) and (7.24) can also be written as

Y1 = X1α + ε1     (7.25)


Y2 = X2δ + ε2     (7.26)

where α = (α0, α1)′ and δ = (δ0, δ1)′.

For the case where the sample period is t = 1, 2, ….., T1, T1+1, T1+2, ….., T, the distribution of the sample takes the form

[Y1|X1; Y2|X2] ~ N( [X1α; X2δ], [σ1²I_T1  0; 0  σ2²I_T2] )     (7.27)

where T2 = T - T1. The null hypothesis to be tested is

H0: α = δ, σ1² = σ2², which can be separated as H01: α = δ and H02: σ1² = σ2²,

against the alternative hypothesis

H11: α ≠ δ, H12: σ1² ≠ σ2²

where H0 is the intersection of H01 and H02. For this case, the main advantage is that the point of the structural break is assumed to be known a priori. This, however, raises the question of whether T2 > k (so that σ2² is estimable) or T2 ≤ k. If T2 > k, then Chow (1960) proposed the F-type test for testing the structural break. This test involves the following steps:

Step 1: Firstly, we combine all of the sample observations and regress Y on X. We estimate the equation over the whole period and compute the residual sum of squares from that equation, denoted by ESS.

Step 2: Secondly, we split the whole data set into two sub-samples. The first sub-sample contains T1 observations and the second sub-sample contains T2 observations. We regress Y on X for the first sub-sample and compute the residual sum of squares from that equation, denoted by ESS1. Again, we regress Y on X for the second sub-sample and compute the residual sum of squares from that equation, denoted by ESS2.

Step 3: Thirdly, we compute the value of the test statistic. Under the null hypothesis H01, the test statistic is given by:

F = [{ESS - (ESS1 + ESS2)}/k] / [(ESS1 + ESS2)/(T - 2k)] ~ F(k, T - 2k) = Fcal     (7.29)

where ESS = residual sum of squares for the whole sample, ESS1 = residual sum of squares for sub-sample 1, ESS2 = residual sum of squares for sub-sample 2, k = the number of restrictions, which is equal to the number of parameters in each equation, and T = the number of observations for the whole sample. ESS is called the restricted residual sum of squares and (ESS1 + ESS2) is called the unrestricted residual sum of squares. Since the restriction is that the parameters are equal across the sub-samples, the restricted regression is the single regression equation for the whole sample. Thus, the test measures how much larger the residual sum of squares for the whole sample (ESS) is than the sum of the residual sums of squares for the two sub-samples (ESS1 + ESS2). If the parameters do not change much between the two sub-samples, the restricted residual sum of squares will not rise much. Thus, the test statistic in (7.29) can be considered a straightforward application of the standard F-test formula.


Step 4: We choose the level of significance that we can tolerate. Let the level of significance be α = 5%.

Step 5: We then find the table value of the test statistic. At a 5% level of significance with k and (T-2k) degrees of freedom, let the table value of the test statistic be Ftab.

Step 6: Finally, we decide whether the null hypothesis will be accepted or rejected by comparing the calculated value of the test statistic with the table value. If Fcal > Ftab, we reject the null hypothesis, implying that the parameters of the two sub-samples are not equal. Thus, it can be concluded that the parameters are not stable over time.

This test raises the same issue as the time-invariance test. However, there is no reason to believe that H02 would be valid when H01 might not be. Thus, we also have to test the null hypothesis H02. A test of H02 against H12 can be based on the two estimated variances. Under the null hypothesis H02, the test statistic is given by:

F = (s2²/σ2²) / (s1²/σ1²) = s2²/s1² ~ F(T2 - k, T1 - k) = Fcal     (7.30)

Let the table value of the test statistic at the α level of significance with (T2 - k) and (T1 - k) degrees of freedom be Ftab. If Fcal > Ftab, we reject the null hypothesis. Otherwise, we accept it.
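Both Chow statistics can be computed with a few lines of code. In the sketch below, y is a 1-D NumPy array, X a 2-D array of regressors (a constant is added), and break_point is the number of observations in the first sub-sample; all names are illustrative assumptions.

```python
# A minimal sketch of the Chow test statistics in eqs. (7.29) and (7.30).
import numpy as np
import statsmodels.api as sm
from scipy import stats

def chow_test(y, X, break_point):
    X = sm.add_constant(np.asarray(X, dtype=float))
    y = np.asarray(y, dtype=float)
    k = X.shape[1]                                   # parameters in each equation
    T = len(y)
    ess = sm.OLS(y, X).fit().ssr                     # restricted (pooled) residual SS
    fit1 = sm.OLS(y[:break_point], X[:break_point]).fit()
    fit2 = sm.OLS(y[break_point:], X[break_point:]).fit()
    ess1, ess2 = fit1.ssr, fit2.ssr
    f_coeff = ((ess - (ess1 + ess2)) / k) / ((ess1 + ess2) / (T - 2 * k))   # eq. (7.29)
    p_coeff = 1 - stats.f.cdf(f_coeff, k, T - 2 * k)
    f_var = fit2.mse_resid / fit1.mse_resid                                  # eq. (7.30)
    p_var = 1 - stats.f.cdf(f_var, fit2.df_resid, fit1.df_resid)
    return f_coeff, p_coeff, f_var, p_var

# Example usage: chow_test(M, np.column_stack([CEX, INF, IR]), break_point=34)
```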

Ex. 7-9: Let us consider the money equation for the USA of the type:

Mt = β0 + β1CEXt + β2INFt + β3IRt + εt     (7.31)

where Mt = money stock at time t, CEXt = real consumption expenditure at time t, INFt = inflation rate at time t, and IRt = interest rate at time t. Now we are interested in testing for a structural break which may have happened in 2003 due to the war between the USA and Iraq. The estimated results of the money equation for the whole sample, for the time period 1970-2018, are given below:

M̂t = -7.6795e+012 + 1.6213CEXt + 2.1907e+011INFt - 1.8252e+011IRt     (7.32)
SE:       (1.1648e+012)   (0.0789)      (1.0916e+011)      (7.4796e+010)
t-TEST:   -6.59298        20.53957      2.00685            -2.44018
R² = 0.9533, R̄² = 0.9502, ESS = 6.2827e+025, σ̂² = s² = 1.3962e+024, Log(L) = -1429.538, and T = 49

The estimated results of the money equation for the first sub-sample, for the time period 1970-2003, are given below:

M̂t = -3.8086e+012 + 1.0364CEXt + 1.6043e+010INFt + 2.2659e+010IRt     (7.33)
SE:       (3.0796e+011)   (0.310)       (2.5279e+010)      (1.7516e+010)
t-TEST:   -12.3673        33.4683       0.6346             1.2936
R² = 0.9893, R̄² = 0.9882, ESS1 = 1.6901e+024, σ̂1² = s1² = 5.6337e+022, Log(L) = -936.67189, and T1 = 34

The estimated results of the money equation for the 2nd sub-sample, for the time period 2004-2018, are given below:

M̂t = -2.0780e+013 + 2.7913CEXt - 5.7146e+011INFt - 2.9880e+011IRt     (7.34)
SE:       (2.3883e+012)   (0.1674)      (2.3036e+011)      (1.3341e+011)
t-TEST:   -8.7005         16.6777       -2.4807            -2.2398
R² = 0.9736, R̄² = 0.9663, ESS2 = 3.1958e+024, σ̂2² = s2² = 2.9052e+023, Log(L) = -424.15269, and T2 = 15

Under the null hypothesis H01, the test statistic is given by:

F = [{6.2827e+025 - (1.6901e+024 + 3.1958e+024)}/4] / [(1.6901e+024 + 3.1958e+024)/41] = 34.1308 ~ F(4, 41)     (7.35)

At a 5% level of significance with 4 and 41 degrees of freedom, the table value of the test statistic is 2.61. Since the calculated value of the test statistic is greater than the table value, the null hypothesis will be rejected, implying that the parameters of the two sub-samples are not equal. Thus, it can be concluded that there is a structural break in 2003 due to the war between the USA and Iraq.

Under the null hypothesis H02, the test statistic is given by:

F = 2.9052e+023 / 5.6337e+022 = 5.1569 ~ F(11, 30)     (7.36)

At a 5% level of significance with 11 and 30 degrees of freedom, the table value of the test statistic is 1.7961. Since the calculated value of the test statistic is greater than the table value, the null hypothesis will be rejected, implying that the variances σ1² and σ2² of the two sub-samples are not equal.

Model Stability Tests Based on Recursive Estimation (CUSUM and CUSUMSQ Tests)

An alternative to the Chow (1960) test for model stability, proposed by Brown, Durbin and Evans (1975), is applicable when we believe that a time-series data set may contain a structural break but we are uncertain when the break might take place. This test is based on recursive residuals, i.e., residuals obtained by the recursive least-squares estimation method, and is known as the CUSUM test. This test is appropriate for time-series data. The recursive least-squares estimation technique simply involves starting with a sub-sample of the time-series data set, estimating the regression model, then sequentially adding one observation at a time and re-estimating the regression equation until the end of the sample is reached. Initially, it is very common to estimate the regression equation using the minimum number of observations possible, which is (k+1) observations for k independent variables. Thus, in the first step, we estimate the regression equation using observations 1 to (k+1); at the second step, the regression equation is estimated using observations 1 to (k+2), and so on; at the final step, we estimate the regression equation using observations 1 to T. For each parameter of the regression model we thus obtain (T-k) separate estimates. We may expect the parameter estimates obtained near the start of the recursive least-squares procedure to be unstable, since they are based on so few observations. However, a natural question is whether these estimates gradually settle down or whether the instability continues through the whole sample. We can answer this question using the CUSUM and CUSUMSQ tests. These tests are based on the one-step-ahead prediction errors, known as the recursive residuals, i.e., the difference between yt and its predicted value based on the parameters estimated at time t-1. Let us now discuss these test procedures.

Consider the linear regression model of the type:

yt = xt′βt + εt,  t = 1, 2, …., T     (7.37)

where βt is a k×1 parameter vector, xt is the vector of regressors and εt is the random error term. The null hypothesis to be tested is

H0: βt = β, ∀t

against the alternative hypothesis


H1: βt ≠ β, ∀t

Assumptions:

(i) εt is independently, identically and normally distributed with mean zero and constant variance σ², i.e., εt ~ IIN(0, σ²), ∀t.

(ii) xt is a vector of strictly exogenous variables, i.e., {xit; t = 1, ….., T} and {εt; t = 1, ….., T} are mutually independent for all i.

Let β̂(t) be the OLS estimator using data up to and including time t, which is given by:

β̂(t) = [Σ(s=1 to t) xs xs′]⁻¹ Σ(s=1 to t) xs ys     (7.38)

The tth recursive residual is defined as the ex-post prediction error for yt when the regression equation is estimated using only the first (t-1) observations, and is given by:

et = yt - xt′β̂(t-1)     (7.39)

The estimated variance of this residual is given by:

σ̂t² = σ²[1 + xt′(X′t-1Xt-1)⁻¹xt]     (7.40)

where Xt-1 contains the first (t-1) rows of the regressor matrix. Let the tth scaled (recursive) residual be defined as:

zt = et / √[1 + xt′(X′t-1Xt-1)⁻¹xt]     (7.41)

Under the null hypothesis that the coefficients remain constant during the whole sample period, zt is normally distributed with mean zero and variance σ², i.e., zt ~ N(0, σ²), and it is independent of zm for all m ≠ t. A change in the distribution of zt over time weighs against the hypothesis of model stability. Brown, Durbin and Evans (1975) suggested two tests based on zt, namely the CUSUM test and the CUSUMSQ test. The CUSUM test is based on the cumulated sum of the scaled residuals, and the CUSUMSQ test is based on the cumulated sum of their squares. The CUSUM statistic is given by:

Wt = Σ(r=k+1 to t) zr/σ̂     (7.42)

where σ̂² = [1/(T-k)] Σ(r=k+1 to T) (zr - z̄)², and z̄ = [1/(T-k)] Σ(r=k+1 to T) zr.

Under the null hypothesis, Wt has a zero mean and a variance approximately equal to the number of residuals being summed (each term has variance 1 and the terms are independent). The test is performed by plotting Wt against t. Confidence bounds for the sum are obtained by plotting the two lines that connect the points (k, ±a√(T-k)) and (T, ±3a√(T-k)). Values of "a" corresponding to various significance levels can be found in their paper; at the 95% and 99% confidence levels, the values of "a" are 0.948 and 1.143 respectively. The hypothesis will be rejected if Wt does not stay within the boundaries.

The CUSUMSQ test is given by:


St = Σ(r=k+1 to t) zr² / Σ(r=k+1 to T) zr²     (7.43)

Since the residuals are independent, each of the two sums in equation (7.43) is approximately a sum of Chi-square variates, each with one degree of freedom. Therefore, the expected value of St is approximately (t - k)/(T - k), i.e., E(St) ≈ (t - k)/(T - k). The test is performed by constructing confidence bounds for E(St) at each value of t and plotting St and these bounds against t. The bounds are {E(St) ± b}, where b depends on both (T-k) and the significance level (tables of b may be found in Harvey (1981) and Johnston (1984)). If the value of St does not fall within the bounds, the null hypothesis will be rejected, implying that the parameters are not stable over time, i.e., there is a structural break(s) in the data set.

Note: Different software packages such as EViews, Python, R, RATS, STATA, etc. can be applied directly for these tests.
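The recursive residuals and the CUSUM and CUSUMSQ statistics can be computed directly from equations (7.39)-(7.43), as in the sketch below; y is assumed to be a 1-D array and X a 2-D array of regressors (a constant column is added), and all names are illustrative. Recent versions of statsmodels also ship a recursive_olsresiduals helper that returns similar quantities.

```python
# A minimal sketch of recursive residuals and the CUSUM/CUSUMSQ statistics.
import numpy as np
import statsmodels.api as sm

def cusum_statistics(y, X):
    X = sm.add_constant(np.asarray(X, dtype=float))
    y = np.asarray(y, dtype=float)
    T, k = X.shape
    z = []
    for t in range(k, T):                              # recursion starts once k observations are available
        X_prev, y_prev = X[:t], y[:t]
        XtX = X_prev.T @ X_prev
        beta_prev = np.linalg.solve(XtX, X_prev.T @ y_prev)          # beta-hat(t-1), eq. (7.38)
        x_t = X[t]
        e_t = y[t] - x_t @ beta_prev                                  # recursive residual, eq. (7.39)
        z.append(e_t / np.sqrt(1.0 + x_t @ np.linalg.solve(XtX, x_t)))   # scaled residual, eq. (7.41)
    z = np.array(z)
    sigma_hat = np.sqrt(np.sum((z - z.mean()) ** 2) / (T - k))
    cusum = np.cumsum(z) / sigma_hat                   # W_t, eq. (7.42)
    cusumsq = np.cumsum(z ** 2) / np.sum(z ** 2)       # S_t, eq. (7.43)
    return z, cusum, cusumsq
```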

Harvey and Collier (1977) Test for Model Stability

A related test for model stability is proposed by Harvey and Collier (1977), which is based on the mean of the recursive residuals, z̄. Under the null hypothesis of model stability, z̄ is normally distributed with mean zero and variance σ²/(T-k). Under the null hypothesis, the test statistic is given by:

d = z̄ / √[s²/(T-k)] ~ N(0, 1)     (7.44)

If the sample size is small, a t-test can be applied to test the null hypothesis. Under the null hypothesis, the t-test statistic is given by:

t = z̄√(T-k)/s ~ t(T-k-1) d.f.     (7.45)

where s² = [1/(T-k-1)] Σ(r=k+1 to T) (zr - z̄)².

Let the level of significance be α. At the α level of significance with (T-k-1) degrees of freedom, we find the table value of the t-test statistic. If the calculated value is greater than the table value, we reject the null hypothesis, implying that there is a structural break(s) over time in the data set.

Ex. 7-10: Data on the logarithmic forms of the per capita GDP (Y = ln(GDP)), FDI (X1 = ln(FDI)), domestic investment (X2 = ln(INVEST)), and trade openness (X3 = ln(OPN)) of the USA are collected for the period 1972-2018 (source: WDI, 2020, own calculations) to examine the stability of the equation of Y on X1, X2, and X3 using the CUSUM, CUSUMSQ and Harvey-Collier tests. First, we consider the regression equation of the type:

Yt = xt′βt + εt     (7.46)

where xt′ = [1  X1t  X2t  X3t], βt is a 4×1 vector of parameters, and εt is the random error term.

The null hypothesis to be tested is

H0: βt = β, ∀t

against the alternative hypothesis


H1: βt ≠ β, ∀t

We apply the OLS method to equation (7.46), and the recursive least-squares estimates are given in Table 7-19.

Table 7-19: The recursive least-squares estimates of equation (7.46)

Linear Regression - Estimation by Recursive Least Squares
Dependent Variable: Y

Variable     Coeff      Std Error    T-Stat     Signif
Constant     1.5625     1.2483       1.2517     0.2174
X1           0.0124     0.0057       2.1724     0.0354
X2           0.3190     0.0443       7.2004     0.0000
X3           0.2018     0.0429       4.7040     0.0000

R²                                  0.9999
Adjusted R²                         0.9999
TR²                                 47
Mean of Dependent Variable          9.6867
Std Error of Dependent Variable     2.9940
Standard Error of Estimate          0.0184
Sum of Squared Residuals            0.0146
Log Likelihood                      123.1946
Durbin-Watson Statistic             0.3729

The recursive residuals (re), the cumulative sum of recursive residuals (CUSUM), and the cumulative sum of squares of recursive residuals (CUSUMSQ) are given in Table 7-20.

Table 7-20: The recursive residuals (re), CUSUM and CUSUMSQ

Year    re        CUSUM     CUSUMSQ
1972              0         0
1973              0         0
1974              0         0
1975              0         0
1976    0.022     1.219     0.035
1977    0.012     1.889     0.045
1978    -0.014    1.135     0.058
1979    -0.009    0.626     0.064
1980    -0.013    -0.064    0.075
1981    0.002     0.021     0.075
1982    0.002     0.128     0.076
1983    0.013     0.823     0.087
1984    -0.004    0.619     0.088
1985    0.004     0.838     0.089
1986    0.002     0.937     0.089
1987    0.005     1.189     0.091
1988    0.009     1.692     0.097
1989    0.006     2.026     0.099
1990    -0.003    1.876     0.1
1991    -0.009    1.381     0.105
1992    -0.007    1.027     0.108
1993    -0.021    -0.095    0.138
1994    -0.024    -1.406    0.178
1995    -0.036    -3.361    0.266
1996    -0.04     -5.559    0.379
1997    -0.042    -7.825    0.498
1998    -0.043    -10.174   0.627
1999    -0.04     -12.371   0.739
2000    -0.038    -14.447   0.839
2001    -0.012    -15.115   0.849
2002    0.009     -14.628   0.855
2003    0.009     -14.116   0.861
2004    -0.002    -14.238   0.861
2005    0.003     -14.088   0.862
2006    -0.005    -14.353   0.864
2007    0         -14.378   0.864
2008    0.002     -14.297   0.864
2009    0.036     -12.361   0.951
2010    0.017     -11.447   0.97
2011    0         -11.47    0.97
2012    -0.006    -11.819   0.973
2013    -0.008    -12.269   0.978
2014    -0.009    -12.742   0.983
2015    -0.007    -13.123   0.986
2016    -0.003    -13.278   0.987
2017    0.002     -13.16    0.987
2018    0.014     -12.42    1

The cumulative sum of recursive residuals (CUSUM) and the cumulative sum of squares of the recursive residuals (CUSUMSQ) are shown below graphically with their 95% confidence bounds


Fig. 7-5(a): Plot of the cumulative sum of recursive residuals (CUSUM) with 5% significance bounds

Fig. 7-5(b): Plot of the cumulative sum of squares of recursive residuals (CUSUMSQ) with 5% significance bounds

From Fig. 7-5(a) and Fig. 7-5(b), it can be said that the CUSUM and CUSUMSQ test results do not stay within the critical bounds, implying that the coefficients of the linear regression model are not stable. The results indicate that there are structural breaks in the data. Under the null hypothesis of model stability, the Harvey-Collier test value is:

d = 0.005313 / √(0.000318/39) = 1.861     (7.47)

At a 5% level of significance, the null hypothesis of model stability will be rejected.

RESET Test for Functional Form Misspecification

This test was proposed by Ramsey (1969) for testing misspecification of a general functional form. It is called the regression specification error test (RESET). The idea of this test is very simple and is highlighted below. Let the original regression equation be of the type:

Yi = β0 + β1X1i + β2X2i + ……….. + βkXki + εi     (7.48)

To implement the RESET test, we must decide how many functions of the fitted values to include in the expanded regression. There is no right answer to this question, but we can consider the squared and cubic terms of the fitted value. Let us now consider the expanded equation of the type:

Yi = β0 + β1X1i + β2X2i + ……. + βkXki + δ1Ŷi² + δ2Ŷi³ + ui     (7.49)

The null hypothesis to be tested is

H0: δ1 = δ2 = 0

against the alternative hypothesis

H1: At least one of them is not zero.

Under the null hypothesis, the test statistic is given by:


F = [(RESS - UESS)/2] / [UESS/(n - k - 3)] ~ F(2, n - k - 3)     (7.50)

where RESS is the restricted residual sum of squares obtained from equation (7.48) and UESS is the unrestricted residual sum of squares obtained from equation (7.49). We can also apply the likelihood ratio test for testing the null hypothesis, which is given by:

2log(LR) = 2[log(L) - log(L0)] ~ χ²(2)     (7.51)

where log(L0) is the log-likelihood function under H0.

Comment: If the null hypothesis is rejected at the α level of significance, it indicates that the model is not specified correctly. Otherwise, we can say that the model is specified correctly.
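The RESET F-test in (7.50) can be carried out as in the sketch below; y is a 1-D array and X a 2-D array of regressors without the constant, and the names are illustrative. Recent versions of statsmodels also provide a linear_reset diagnostic for fitted OLS results.

```python
# A minimal sketch of the RESET test, eqs. (7.48)-(7.50).
import numpy as np
import statsmodels.api as sm
from scipy import stats

def reset_test(y, X):
    Xc = sm.add_constant(np.asarray(X, dtype=float))
    restricted = sm.OLS(y, Xc).fit()
    yhat = restricted.fittedvalues
    X_aug = np.column_stack([Xc, yhat ** 2, yhat ** 3])    # add squared and cubic fitted values
    unrestricted = sm.OLS(y, X_aug).fit()
    ress, uess = restricted.ssr, unrestricted.ssr
    df2 = unrestricted.df_resid                            # n - k - 3
    f_stat = ((ress - uess) / 2) / (uess / df2)            # eq. (7.50)
    p_value = 1 - stats.f.cdf(f_stat, 2, df2)
    return f_stat, p_value

# Example usage: reset_test(Y, np.column_stack([X1, X2, X3]))
```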

Ex. 7-11: For the given problem in Ex. 7-10, test for misspecification of the regression equation using the RESET test.

Solution: First, we regress Y on X1, X2, and X3 of the type:

Yt = β0 + β1X1t + β2X2t + β3X3t + εt     (7.52)

We now apply the OLS method to run equation (7.52), obtain the residual sum of squares, which is called the restricted residual sum of squares (RESS), and also obtain the predicted values of y. For the given problem, RESS = 0.0146 and the predicted values of y are given by:

Ŷt = 1.5625 + 0.0124X1t + 0.3190X2t + 0.2018X3t     (7.53)

We now consider the expanded regression equation of the type:

Yt = β0 + β1X1t + β2X2t + β3X3t + δ1Ŷt² + δ2Ŷt³ + ut     (7.54)

and we set up the null hypothesis

H0: δ1 = δ2 = 0

against the alternative hypothesis

H1: At least one of them is not zero.

We apply the OLS method to run equation (7.54) and obtain the residual sum of squares, which is called the unrestricted residual sum of squares (UESS): UESS = 0.008865. Under the null hypothesis, the test statistic is given by:

F = [(0.01455 - 0.00887)/2] / [0.00887/41] = 13.1472 ~ F(2, 41)     (7.55)

Let the level of significance be 5%.

Decision: At a 5% level of significance with 2 and 41 degrees of freedom, the table value of the test statistic is 3.23. Since the calculated value of the test statistic is greater than the table value, the null hypothesis will be rejected, implying that the model is not correctly specified.

We can also apply the likelihood ratio test. The likelihood ratio test statistic is given by:

2log(LR) = 2[log(L) - log(L0)] = 2[134.8389 - 123.1946] = 23.2886 ~ χ²(2)     (7.56)


At a 5% level of significance with 2 degrees of freedom, the table value of the Chi-square test statistic is 5.99. Since the calculated value of the test statistic is greater than the table value, the null hypothesis will be rejected, implying that the model is not correctly specified for estimation.

Exercises

7-1: Why do we need to select the best regression equation? Explain with an example.
7-2: Write different techniques to select the best regression equation.
7-3: In the all-possible-regression-equations procedure, discuss the R² criterion, the residual mean square criterion and the Mallows Cp statistic for selecting the best regression equation, with an example of each.
7-4: Discuss the backward elimination procedure to select the best regression equation with an example.
7-5: Write the advantages and disadvantages of the backward elimination procedure.
7-6: Discuss the forward selection procedure to select the best regression equation with an example.
7-7: What are the advantages and disadvantages of the forward selection procedure?
7-8: Discuss the stepwise procedure to select the best regression equation with an example.
7-9: Discuss Akaike's Information Criterion (AIC), Schwarz's Bayesian Information Criterion (SBIC) and the Hannan-Quinn Information Criterion (HQIC) for selecting the lag length of AR, ARMA and ARIMA models.
7-10: Explain the meaning of model misspecification with an example.
7-11: Discuss the Chow test for parameter structural change with an example.
7-12: Discuss the CUSUM and CUSUMSQ tests for model stability.
7-13: Discuss the Harvey and Collier (1977) test for model stability.
7-14: Discuss the RESET test for misspecification of the functional form.
7-15: Using the data on GDP (constant 2010 US$), capital investment (constant 2010 US$), and labour force of the USA over a period of time:

(i) Estimate the Cobb-Douglas production function. (ii) Test for a structural break in the Cobb-Douglas production function using the Chow test. (iii) Test the stability of the model using the CUSUM and CUSUMSQ tests.

7-16: The data given below are the values of output (OUT, in million $), labour input (L) and capital investment input (K, in million $) of 26 firms in the year 2018.

OUT: 1060 1020 1090 1150 1300 1360 1380 1430 1440 1490 1500 1520 1540 1410 1350 1382 1573 1590 1520 1535 1595 1580 1564 1485 1465 1470
L:   100 180 140 150 190 195 200 220 225 285 300 302 325 290 250 265 330 350 365 310 315 340 345 295 280 285
K:   300 400 420 400 510 590 600 630 610 630 850 870 980 850 750 860 850 890 875 780 790 810 820 815 860 885

(i) Estimate the Cobb-Douglas production function. (ii) Test the hypothesis that the estimates are sensitive to the sample size, using the additional information of observations 20-26.


(iii) Test for misspecification of the functional form using the RESET test.

7-17: The logarithmic transformation of the Cobb-Douglas production function of Bangladesh is ln(GDPt) = β0 + β1ln(LFt) + β2ln(Kt) + εt, where GDPt = GDP (constant 2010 USD) at time t, LFt = labour force at time t, and Kt = capital investment at time t. The estimated results of the equation for the whole sample, for the time period 1980-2018, are given below:

ln(GDP)^t = 9.8400 - 0.0475ln(LFt) + 0.6836ln(Kt)
SE:        (0.6595)   (0.0459)       (0.0181)
t-TEST:    14.9211    -1.0355        9.8456
R² = 0.9993, R̄² = 0.9992, ESS = 0.0083, log(L) = 106.1927, T = 38

The estimated results of the function for the 1st sub-sample, for the time period 1980-1995, are given below:

ln(GDP)^t = 11.9046 - 0.1193ln(LFt) + 0.6495ln(Kt)
SE:        (0.9998)   (0.0443)       (0.0205)
t-TEST:    11.9064    -2.6925        1.3114
R² = 0.9946, R̄² = 0.9931, ESS1 = 0.00228, log(L0) = 44.6462, T1 = 15

The estimated results of the equation for the 2nd sub-sample, for the time period 1996-2018, are given below:

ln(GDP)^t = 4.6062 + 0.2354ln(LFt) + 0.6596ln(Kt)
SE:        (2.8677)   (0.2355)       (0.0607)
t-TEST:    1.6062     1.1783         10.8729
R² = 0.9995, R̄² = 0.9994, ESS2 = 0.00156, log(L) = 77.7633, T2 = 23

Test the null hypothesis of parameter structural change.

CHAPTER EIGHT TIME SERIES ECONOMETRICS

8.1 Introduction In economics, business, and especially in finance, we often observe data over a period of time. For example, in 1972, the GDP of Bangladesh was about 26.7476 billion USD (2015 constant US$ price) but in 2021, it reached 285.2695 billion USD which is about 10.6652 times of 1972. Such temporal data is time-series data. A difficulty in basing an econometric equation on time-series data is that the data may change in basic ways over time, although the equation is assumed to hold true for all times. The graph below gives an example. Clearly, the level of output in Bangladesh is rising over time. If we try to develop an econometric model that outputs as a function of some explanatory variables that are not changing similarly over time, we cannot hope to obtain an equation that always holds true. It may hold true for a few years, but probably not for the entire time span, because the equation fails to account for the fact that the mean and variance of the output series is changing over time. 300 250 200 150 100 50 0 1972 1975 1978 1981 1984 1987 1990 1993 1996 1999 2002 2005 2008 2011 2014 2017 2020

Fig. 8-1: GDP (constant price in 2015 US$) of Bangladesh over some time

One problem in dealing with time-series data is how to develop an econometric model for a time-series analysis, such as output, profits, stock price index, sales revenue, market capital, etc., that evolve over time. Such series are said to be nonstationary. It is well known that the usual techniques of regression analysis can result in highly misleading conclusions when variables contain a stochastic trend (Stock and Watson (1988), Nelson and Kang (1981), Granger and Newbold (1974)). In particular, if the dependent variable and at least one independent variable contain a stochastic trend, and if they are not co-integrated, the regression results are spurious (Phillips (1986), Granger and Newbold (1974)). To identify the correct specification of the model, students and researchers should be able to distinguish between stationarity and non-stationarity and should be familiar with the different time-series econometric models. Therefore, in this chapter, fundamental concepts that are associated with time series including stationarity and nonstationarity are discussed. Also, the most popular and widely applicable time-series econometric models with their estimation techniques are discussed. These models are also explained with the help of numerical problems. Software packages such as EViews, RATS, STATA, R, Python, etc. are used for solving the numerical problems that are associated with different time-series econometric models.

8.2 Some Fundamental Concepts and Ideas In this section, some fundamental concepts and ideas that are associated with time-series econometric models are discussed with numerical examples. Definition of Time Series Time series refers to such data in which one variable is time. In other words, an arrangement of statistical data in chronological order, i.e., in accordance with the occurrence of time, is known as time-series data. Time-series data may be defined as a set of figures observed over a period of time. Mathematically, a time series is defined as the functional relationship of the type Yt = f(t), where Yt is the value of the variable Y at time t.

Chapter Eight

360

40

60

80

100

120

Ex. 8-1: A time series relating to the primary energy consumption of the USA (Quadrillion Btu) over a period of time will give an upward trend.

1940

1960

1980 Yr ENER

2000

2020

Fitted values

Fig. 8-2: Primary energy consumption of the USA in Quadrillion Btu from 1949-2019.

Uses of a Time-Series Analysis The time series analysis is a very essential tool in the fields of business, economics, finance, banking, and management for appropriate and timely decision-making and for policy formulation with respect to different problems for which risk will be minimal. Time-series analysis is not only important in business, economics, banking, finance and management, it is also very useful in natural, social and physical sciences. Some of the important uses of time series are given below: (i) To study the past behaviour of a time-series variable under consideration. (ii) To forecast the behaviour of a time-series variable in future based on the information of the past and present, which is essential for future planning. (iii) To compare the changes in the values of different time-series variables at different times or places. (iv) To evaluate the current achievement of a time-series variable. (v) To compare the actual current performance with respect to the past of a time-series variable. Stationarity A time-series variable Y is said to be stationary if it does not change in fundamental ways over time. In particular, a stationary variable Y has the following characteristics: (i) The mean is always the same. (ii) The variance is always the same. (iii) The covariance of any two observations depends only on the length of time that has passed between those observations, not on when the observations occurred. Let us consider the model of the type: Yt = Į+ȕX t +İ t

(8.1)

This stochastic model is said to be stationary if the mean and variance of Y are constant over time and the value of the covariance between two time periods depends only on the distance between these two periods or lag values, and not on the actual time, i.e., Mean : E(Yt ) = ȝ,  t [ constant, does not depend on time period t] Variance : Var(Yt ) = E[Yt  ȝ]2 = ı 2 ,  t [ constant, does not depend on time period t] Covariance : Cov(Yt , Yt-k ) = E[(Yt  E(Yt ))(Yt-k  E(Yt-k )] = Ȗ k [ depends on lags k, does not depend on time period t]

Sometimes, the stochastic process Yt is called the covariance-stationary or weakly stationary.

Time Series Econometrics

361

Non-Stationarity

If a stochastic process Yt is not stationary, then it is called a non-stationary process, i.e., a time-series variable Yt is said to be a non-stationary stochastic process if, Mean: E(Yt ) z ȝ [ not constant, depends on time period t], i.e., E(Yt )= tȝ Variance: Var(Yt ) = E[Yt  E(Yt )]2 z ı 2 [not constant, depends on time period t], i.e., Var(Yt ) = tı 2

The value of the variable Y at any time depends only on its prior value and on a random error term, i.e., (8.2)

Yt = Yt-1 + İ t

Ex. 8-2: The stock price this year depends only on last year’s price level, plus a random error term. Believers in the efficient market hypothesis argue that the stock price follows a random walk because only unforeseen events (reflected in the random error İ t ) can cause it to vary from the previous price, which already has incorporated all information known to the public. Trend Stationary Process (TSP)

Let us consider the model of the type: (8.3)

Yt = Į+ȕt +İ t

Then the stochastic process Yt is said to be a trend stationary process if E(İ t ) = ȝ;  t, and Var(İ t ) = ı 2 ,  t. Difference Stationary Process (DSP)

Let us consider the model of the type (8.4)

Yt = Į+ȕt +İ t

Then the stochastic process Yt is said to be a trend difference stationary process if E(ǻYt ) = ȝ,  t and Var(ǻYt ) = ı 2 ,  t. , where ' stands for the first difference. For time period (t-1), equation (8.4) can be written as (8.5)

Yt-1 = Į+ȕ(t-1) +İ t-1

Taking the difference between Yt and Yt-1 , we have, Yt  Yt-1 = Į+ȕt +İ t  Į  ȕt+ȕ  İ t-1 ǻYt = ȕ +u t , where, u t

İ t  İ t-1

(8.6)

Then, process (8.6) is said to be a difference stationary process if E(u t ) = 0,  t, and Var(u t ) = ı 2 ,  t.

White Noise Process and Gaussian White Noise Process

Let us consider the regression equation of the type Yt = Į+ȕX t +İ t

(8.7)

Then, the stochastic process {İ t } is said to be a white noise process if E(İ t ) = 0,  t, Var(İ t ) = ı 2 ,  t, and Cov(İ t , İ s ) = 0,  t z s. The stochastic process {İ t } is said to be a Gaussian white noise process if E(İ t ) = 0,  t, Var(İ t ) = ı 2 ,  t, and Cov(İ t , İ s ) = 0,  t z s , and İ t is normally distributed for all t, i.e., İ t ~IIN(0, ı 2 ),  t . Thus, we can say that the stochastic process {İ t } is said to be a Gaussian white noise process, if İ t is independently,

identically and normally distributed with zero mean and constant variance ı 2 for all t.

Chapter Eight

362

Random Walk Model

Let us consider the model of the type (8.8)

Yt = Yt-1 + İ t

where {İ t } is a Gaussian white noise process. From equation (8.8), we have ǻYt = İ t which indicates that the change in Yt , i.e., 'Yt , is independently and identically distributed with zero mean and constant variance ı 2 . This is called the simplest random walk model. If t =1, then from equation (8.8), we have (8.9)

Y1 = Y0 + İ1

Let Y0 = 0 at time t = 0. Thus, from (8.9), we have (8.10)

Y1 = İ1

If t = 2 , then from equation (8.9), we have Y2 = İ1 + İ 2 2

=¦ İ t

(8.11)

t=1

If t = 3, then we have Y3 = Y2 + İ 3 Y3 = İ1 + İ 2 +İ 3 3

=¦ İ t

(8.12)

t=1

Continuing this process, we have Yt = İ1 + İ 2 +İ 3 +........+İ t =

t

¦İ

(8.13)

j

j=1

Thus, from equation (8.13), we have E(Yt ) =

t

t

¦ E(İ ) = tȝ , and Var(Y ) = ¦ Var(İ ) = tı j

j=1

t

j

2

. Therefore, it can be said that the process Yt = Yt-1 + İ t is a

j=1

purely random walk process because the first difference of the process is stationary. Since the expected value and its variance change over time, the random walk model is nonstationary. Ex. 8-3: If we consider the model for the stock price of the type Pt = Pt-1 + İ t

(8.14)

where Pt indicates the stock price at time t and {İ t } is a Gaussian white noise process. Then this process is called a purely random walk model and the variable P is nonstationary. It is also interesting to note that random shocks get built permanently into the price structure.

Time Series Econometrics

363

Random Walk Model with Drift

The model of the type (8.15)

Yt = Į +Yt-1 + İ t

is called a random walk model with a drift component Į, where {İ t } is a white noise process. Ex. 8-4: Assume that the GDP of Bangladesh is growing exponentially at a constant rate. Thus, we have the GDP at time t GDPt = GDP0 eȕt +İ t

(8.16)

where {İ t } is a white noise process. Taking the logarithm of equation (8.16), we have ln(GDPt ) = ln(GDP0 )+ȕt+İ t

(8.17)

Yt = Į+ȕt+İ t

where Yt = ln(GDPt ), and Į = ln(GDP0 ) . Again, the GDP at a time (t-1) is given by GDPt-1 = GDP0 eȕ(t-1) +İ t-1

(8.18)

Again, taking the logarithm of equation (8.18), we have, ln(GDPt-1 ) = ln(GDP0 )+ȕt  ȕ+İ t-1 Yt-1 = Į+ȕt  ȕ+İ t-1 , where Yt-1 = ln(GDPt-1 ).

(8.19)

Taking the difference between equations Yt and Yt-1 , we have, Yt  Yt-1

Į+ȕt+İ t  Į  ȕt+ȕ  İ t-1

ǻYt = ȕ+İ t  İ t-1 ǻYt = ȕ+u t , where u t

İ t  İ t-1

(8.20)

Since {İ t } is a white noise process, {u t } is also a white noise. This indicates that the change in Yt , i.e., 'Yt with Yt ln(GDPt ) is independently and identically distributed with mean ȕ and constant variance. Thus, this has the form of a random walk model with drift D and the process is non-stationary. The Lag Operator

When we deal with the AR, MA, ARMA, and ARIMA models, the lag operator is used frequently. Let us now discuss the notation of the lag operator. Let the time series variable be denoted by y t . The lag operator L is used just to shift the time index one period back and is given by: Ly t = y t-1

(8.21)

We can do algebra using the lag operator L, which is why, it is very important to write this way. For example, L(Ly t ) = L(y t-1 ) = y t-2

(8.22)

This can be written as L2 y t = y t-2

(8.23)

Chapter Eight

364

The lag operator is used as a way of transforming time-series variables with the rules of multiplication and addition. With the lag notation, we can express compactly complicated time-series models. For example, y t = (ĮL+ȕL2 ) y t +İ t

(8.24)

= Įy t-1 + ȕy t-2 +İ t

8.3 Time-Series Econometric Models In time-series econometrics or econometrics for finance, we mainly deal with the following econometric models: (i) Autoregressive (AR) model (ii) Moving average (MA) model (iii) Autoregressive moving average (ARMA) model (iv) Autoregressive integrated moving (ARIMA) model (v) Autoregressive conditional heteroscedastic (ARCH) model (vi) Generalised autoregressive conditional heteroscedastic (GARCH) model (vii) Nonlinear generalised autoregressive conditional heteroscedastic (NGARCH) model (viii) Integrated generalised autoregressive heteroscedastic (IGARCH) model (ix) The exponential generalised autoregressive conditional heteroskedastic (EGARCH) model (xi) The GARCH-in-mean (GARCH-M) model (xii) The Quadratic GARCH (QGARCH) model (xiii) The Glosten-Jagannathan-Runkle GARCH (GJR-GARCH) model (xix) Multivariate GARCH (MGARCH) model But, in this chapter, the models which are most popular and widely applicable for time-series analyses are discussed.

8.4 Autoregressive (AR) Process An autoregressive process is one where the current value of a time-series variable Y depends on its past values plus the random error term. The First-Order Autoregressive (AR(1)) Process

A first-order autoregressive process is one where the value of a time-series variable Y at time t depends on the value of that variable at time (t-1) plus the random error term. In a first-order autoregressive process, we regress Y on its first lag value plus the random error term. An autoregressive model or process of order 1 is denoted by AR(1) and is given by: (8.25)

y t = Į+șy t-1 +İ t

where y t is the value of the stochastic variable Y at time t, y t-1 is the value of the variable Y at time (t-1), and {İ t } is a white noise process. From the AR(1) process, we see that y t depends on the first lag value of Y. Using the lag operator, equation (8.25) can also be written as y t = Į+șLy t +İ t [where L is the lag operator, y t-1 = Ly t ]

yt

D 1  șL

1

 ^1  șL` İ t , [ Lag operator L may be set to 1 when it operates on a constant]

Time Series Econometrics

yt

Į  ª1+șL+(șL) 2 +(șL)3 +.......º¼ İ t 1 ș ¬

yt

Į  İ t +șLİ t +ș 2 L2 İ t +ș3 L3 İ t +......... 1 ș

D

yt

1 ș

 İ t +șİ t-1 +ș 2 İ t-2 +ș3 İ t-3 +........

365

(8.26)

Equation (8.26) is called a moving average process of order infinity, i.e., MA(f). Thus, it can be said that the AR(1) process can be converted into MA(f) process. The Mean of AR(1) Process

If we take the expectation of equation (8.26), then we have, E(y t ) =

ȝ=

Į +0 +0 +0+........... 1 ș

Į 1T

(8.27)

The Variance of AR(1) Process

The variance of AR(1) is given by 2

Ȗ0 = E > yt  ȝ @

ª Į º = E« +İ t +șİ t-1 +ș 2 İ t-2 +ș 3 İ t-3 +.........  ȝ » 1  ș ¬ ¼ = E ª¬İ t +șİ t-1 +ș 2 İ t-2 +ș 3 İ t-3 +....................º¼

2

2

= ª¬1+ș 2 +ș 4 +ș 6 +...........º¼ ı 2 , [ All the covariance terms will be zero] -1

= ª¬1  ș 2 º¼ ı 2

=

1 ı2 1  ș2

(8.28)

The Autocovariance Function of AR(1) Process

The jth autocovariance of AR(1) process is given by Ȗ j = E > y t  ȝ @ ª¬ y t-j  ȝ º¼ = E ª¬İ t +șİ t-1 +ș 2 İ t-2 +ș3 İ t-3 +........+ș jİ t-j +ș j+1İ t-j-1 +ș j+2 İ t-j-2 +ș j+3 İ t-j-3 +........º¼ ª¬İ t-j +șİ t-j-1 +ș 2 İ t-j-2 +ș3 İ t-j-3 +...............º¼ 2

2

2

2

= ș j E ª¬İ t-j º¼ +ș j+2 E ª¬İ t-j-1 º¼ +ș j+4 E ª¬İ t-j-2 º¼ +ș j+6 E ª¬İ t-j-3 º¼ +........... = ș jı 2 +ș j+2 ı 2 +ș j+4 ı 2 +ș j+6 ı 2 +............

= ª¬1+ș 2 +ș 4 +ș 6 +..............º¼ ș j ı 2 șj ı2 1  ș2

(8.29)

Chapter Eight

366

The Autocorrelation Function of AR(1) Process

The jth autocorrelation function is given by: ȡj =

Ȗj Ȗ0

= șj

(8.30)

The important properties of AR(1) were derived above by viewing it as a process. Another way to arrive at the same results is to assume that the process is covariance-stationary and we can calculate the moments directly from equation (8.25). The Mean from AR(1) Model Directly

Taking the expectation of equation (8.25) we have (8.31)

E(y t ) = Į+șE(y t-1 )+E(İ t )

Since the process is covariance-stationary, we have E(y t ) = E(y t-1 ) = ȝ . Substituting the value in equation (8.31), we have ȝ = Į+șȝ+0

ȝ

Į 1 ș

Į = ȝ 1  ș

(8.32)

The Variance from AR(1) Model Directly

D , the AR(1) process can be written as

Putting the value of

(8.33)

y t  ȝ = ș(y t-1  ȝ)+İ t

Thus, it can be said that in an AR(1) process, y t can be expressed as a deviation form from the mean value ȝ which can be modelled as the sum of (y t-1  ȝ) and the random error term İ t . The variance of AR(1) is given by 2

Var(y t ) = E > y t  ȝ @

(8.34)

Squaring equation (8.34), we have 2

2

ș 2 > y t-1  ȝ @  2ș > y t-1  ȝ @ İ t +İ 2t

> yt  ȝ @

(8.35)

Taking the expectation of equation (8.35), we have 2

2

E > y t  ȝ @ = ș 2 E > y t-1  ȝ @ +2șE > y t-1  ȝ @ İ t +E ª¬ İ 2t º¼

Since İ t 's are uncorrelated with (y t-1  ȝ) , E > y t-1  ȝ @ İ t 2

(8.36) 0 and the process is covariance-stationary. Thus, we have

2

E > y t  ȝ @ = E > y t-1  ȝ @ = Ȗ 0 . Putting these values in equation (8.36), we have Ȗ 0 = ș 2 Ȗ 0 +2ș×0+ı 2 Ȗ0 =

ı2 1  ș2

(8.37)

Time Series Econometrics

367

The Autocovariance Function from AR(1) Model Directly

The autocovariance between Ȗ j for AR(1) process is given by Ȗ j = E > y t  ȝ @ ª¬ y t-j  ȝ º¼

(8.38)

Multiplying both sides of equation (8.33) by (y t-j  ȝ) , and then taking the expectation, we have



E > y t  ȝ @ ª¬ y t-j  ȝ º¼ = șE[y t-1  ȝ][y t-j  ȝ]+E İ t , ª¬ y t-j  ȝ º¼

(8.39)

Ȗ j = șȖ j-1

From equation (8.39), we have Ȗ1 = șȖ 0

Ȗ 2 = șȖ1 = ș 2 Ȗ 0 Ȗ 3 = șȖ 2 = ș3 Ȗ 0 and so on.

Continuing this process, finally, we have Ȗ j = ș j Ȗ0

(8.40)

The Autocorrelation Function from AR(1) Process Directly

The autocorrelation function ȡ j is given by: ȡj =

Ȗj

(8.41)

Ȗ0

Putting the value of Ȗ j in equation (8.41), we have ȡj = șj

(8.42)

Equation (8.42) shows that the autocorrelation function of the AR(1) process follows a pattern of geometric decay for increasing values of j as shown in Fig. 8-3 given below: 1.00 0.75 0.50 0.25 0.00 -0.25 -0.50 -0.75 -1.00

0

1

2

3

4

5

6

7

8

9

10

11

12

13

14

15

16

17

Fig. 8-3: ACF for AR(1) process

The effects on the appearance of a time-series variable y t for different values of the parameter ș for the AR(1) model are shown below graphically.

Chapter Eight

368 30 20 10 0 -10 -20 -30 5

10

15

20

25

30

35

40

45

50

55

60

65

70

75

80

15

20

25

30

35

40

45

50

55

60

65

70

75

80

30

35

40

45

50

55

60

65

70

75

80

85

90

95

100 105 110 115

Fig. 8-4 (a): ș = 0 (white noise) 30 20 10 0 -10 -20 -30 5

10

85

90

95

100 105

110

Fig. 8-4(b): ș = 0.5 30 20 10 0 -10 -20 -30 5

10

15

20

25

85

90

95

100 105 110

Fig. 8-4(c): ș = 0.9 Fig. 8-4: Effects of an AR(1) process y t = șy t-1 +İ t for different values of ș

Fig. 8-4 shows realisations of the AR(1) process with Į = 0 and {İ t } is a white noise process for different values of the autoregressive parameter ș . Fig. 8-4(a) shows that, in a time-series variable with no autocorrelation, the value of one observation will not provide information about the value of the next observation and appears patternless. Fig. 84(b) shows that with autoregressive parameter value ș = 0.5, the series is smoother, with observations above or below the mean often appearing as clusters of a modest duration. Fig. 8-4(c) shows that, for the autoregressive parameter value ș = 0.9, observations are deviated more from the mean value, may be quite prolonged and appear more clustered of a modest duration, and any shock of the time-series variable takes time to die out. The Second-Order Autoregressive (AR(2)) Process

A second-order autoregressive process is one where the current value of a time-series variable Y depends on the two past values of that variable plus a random error term, i.e., the value of Y at time t depends on the values of (t-1) and (t2) plus a random error term. In a second-order autoregressive process, we regress Y on its first and second lag values. A second-order autoregressive process or a process of order 2 is denoted by AR(2) and is given by

Time Series Econometrics

y t = Į+ș1 y t-1 +ș 2 y t-2 +İ t

369

(8.43)

where {İ t } is a white noise process. The Mean Value of AR(2) Process

Assuming that the process is covariance-stationary and then taking the expectation of the equation (8.43), we have E(y t ) = Į+ș1E(y t-1 ) + ș 2 E(y t-2 )+E(İ t )

(8.44)

Since the process is covariance-stationary, hence, E(y t ) = E(y t-1 ) = E(y t-2 ) = ȝ Therefore, equation (8.44) can be written as ȝ >1  ș1  ș 2 @ = Į

ȝ=

Į >1  ș1  ș 2 @

(8.45)

The Variance of AR(2) Process

Putting the value of D in equation (8.43), we have y t  ȝ = ș1 (y t-1  ȝ)+ș 2 (y t-2  ȝ)+İ t

(8.46)

Multiplying equation (8.46) by (y t  ȝ) and then taking expectation, we have E(y t  ȝ) 2 = ș1E(y t-1  ȝ)(y t  ȝ)+ș 2 E(y t-2  ȝ)(y t  ȝ)+E{İ t (y t  ȝ)} Ȗ 0 = ș1 Ȗ1 +ș 2 Ȗ 2

(8.47)

The jth Autocovariance Function of AR(2) Process

Multiplying equation (8.46) by (y t-j  ȝ) and then taking expectation, we have E[y t  ȝ][y t-j  ȝ] = ș1E[y t-1  ȝ][y t-j  ȝ]+ș 2 E[y t-2  ȝ][y t-j  ȝ]+E{İ t [y t-j  ȝ]} Ȗ j = ș1 Ȗ j-1 +ș 2 Ȗ j-2 , j = 1, 2

(8.48)

The jth Autocorrelation Function of AR(2) Process

The jth autocorrelation function ȡ j is given by ȡj =

=

Ȗj Ȗ0 ș1 Ȗ j-1 +ș 2 Ȗ j-2 Ȗ0

= ș1ȡ j-1 +ș 2 ȡ j-2

(8.49)

The pth-Order Autoregressive (AR(p)) Process

A pth-order autoregressive process is one where the current value of a time-series variable Y depends on the p lag values of that variable plus the random error term, i.e., the value of Y at time t depends on the values of (t-1), (t-2),…., (t-p) plus a random error term. In a pth-order autoregressive process, we regress Y on its p lag values plus a random error term. A pth-order autoregressive process or a process of order p is denoted by AR(p) and is given by y t = Į+ș1 y t-1 +ș 2 y t-2 +.......+ș p y t-p +İ t

where {İ t } is a white noise process.

(8.50)

Chapter Eight

370

The Mean Value of the pth-Order Autoregressive Process

Assuming that the process is covariance-stationary and then taking the expectation of equation (8.50), we have E(y t ) = Į +ș1E(y t-1 )+ș 2 E(y t-2 )+.......+ș p E(y t-p ) +E(İ t )

(8.51)

Since the process is covariance-stationary, E(y t ) = E(y t-1 ) =......= E(y t-p ) = ȝ Therefore, equation (8.51) can be written as ȝ ª¬1  ș1  ș 2  .......  ș p º¼ = Į ȝ=

Į ª¬1  ș1  ș 2  ..........  ș p º¼

(8.52)

The Variance of AR(p) Process

Putting the value of Į in equation (8.50), we have y t  ȝ = ș1 (y t-1  ȝ)+ș 2 (y t-2  ȝ)+........+ș p (y t-p  ȝ) +İ t

(8.53)

Multiplying equation (8.53) by (y t  ȝ) and then taking expectation, we have E(y t  ȝ) 2 = ș1E(y t-1  ȝ)(y t -ȝ)+ș 2 E(y t-2  ȝ)(y t  ȝ)+.....+ș p E(y t-p  ȝ)(y t  ȝ)+E{İ t (y t  ȝ)}

Ȗ 0 = ș1 Ȗ1 +ș 2 Ȗ 2 +........+ș p Ȗ p

(8.54)

The jth Autocovariance Function of AR(p) Process

Multiplying equation (8.53) by (y t-j  ȝ) and then taking the expectation, we have E(y t  ȝ)(y t-j  ȝ) = ș1E(y t-1  ȝ)(y t-j  ȝ)+ș 2 E(yt-2  ȝ)(y t-j  ȝ)+.....+ș p E(y t-p  ȝ)(yt-j  ȝ)+E{İ t (yt-j  ȝ)}

Ȗ j = ș1 Ȗ j-1 +ș 2 Ȗ j-2 +.........+ș p Ȗ j-p , [ j = 1, 2,……,p]

(8.55)

The jth Autocorrelation Function of AR(p) Process

The jth autocorrelation function ȡ j is given by: ȡj =

=

Ȗj Ȗ0 ș1 Ȗ j-1 +ș 2 Ȗ j-2 +...........+ș p Ȗ j-p Ȗ0

= ș1ȡ j-1 +ș 2 ȡ j-2 +.........+ș p ȡ j-p

(8.56)

Stationarity Condition for an AR(p) Process

The AR(p) model is given by y t = Į+ș1 y t-1 +ș 2 y t-2 +.........+ș p y t-p +İ t

(8.57)

where {İ t } is a white noise process. Using the lag operator L, equation (8.57) can be written as y t = Į+ș1Ly t +ș 2 L2 y t +........+ș p Lp y t +İ t [1  ș1L  ș 2 L2  ........  ș p Lp ]y t = Į+İ t

ș(L) y t = Į+İ t , [where, ș(L) = [1  ș1 L  ș 2 L2  ........  ș p Lp ] ]

(8.58)

Time Series Econometrics

371

It can be said that the AR(p) process will be stationary if equation (8.58) can be written as yt =

Į + ș(L)-1İ t ș(L)

yt =

Į + ș(L)-1İ t [1  ș1  ș 2  ........  ș p ]

y t = ȝ+ ș(L)-1İ t , where, ȝ = E(y t )

(8.59)

with ș(L)-1 converging to zero. This means that the autocorrelation functions will decline eventually as the lag length is increased. Expanding the term ș(L)-1 , equation (8.57) can be written as y t = ȝ +İ t +Į1İ t-1 +Į 2 İ t-2 +Į3 İ t-3 +Į 4 İ t-4 +.........

(8.60)

Equation (8.60) is called an MA(f) . Thus, it can be said that the AR(p) process given by equation (8.57) will be stationary, if the coefficients of MA(f) process eventually will decline for increasing lag length. On the other hand, if the AR(p) process is non-stationary, the coefficients of the MA(f) process given in equation (8.60) will not decline or will not converge to zero as the lag length increases. The condition for testing the stationarity of the AR(p) process is that all the roots of the characteristic equation 1  ș1z  ș 2 z 2  ..........  ș p z p = 0

(8.61)

lie outside the unit circle. Equation (8.61) is called the characteristic equation because its roots determine whether the AR(p) process y t is stationary or non-stationary. The autocorrelation function of the AR(p) process will depend on the roots of the characteristic equation, which is a polynomial function in z. Ex. 8-5: Let us now consider the random walk model of the type y t = y t-1 + İ t

(8.62)

To test the stationarity of the series y t , first we express y t-1 in a lag operator notation, i.e., y t-1 = Ly t . Using the lag operator, equation (8.62) can be written as y t = Ly t +İ t y t  Ly t [1  L]y t

İt İt

(8.63)

The characteristic equation is given by 1 z = 0 z=1

(8.64)

Since the root z = 1 does not lie outside the unit circle, or lies on the unit circle, the specific AR process given in equation (8.62) is not a stationary process, i.e., non-stationary process. This procedure can be applied to test whether the series is stationary or not for AR model with longer lag lengths. Let us now consider the AR(2) model of the type y t = 0.8y t-1  0.15y t-2 +İ t

Using the lag operator equation (8.64), can be written as

(8.65)

Chapter Eight

372

y t = 0.8Ly t  0.15L2 y t  İ t y t  0.8Ly t  0.15L2 y t (1  0.8L  0.15L2 )y t

İt

(8.66)

İt

The characteristic equation of model (8.66) is given by 1  0.8z  0.15z 2

(8.67)

0

The roots of equation (8.67) are given by z1 =

0.8+ (  0.8) 2  4 u 0.15 2 u 0.15

3.33

z2 =

0.8  (  0.8) 2  4 u 0.15 2 u 0.15

2.0

Since the roots lie outside the unit circle, the AR(2) process for y t described by equation (8.65) is a stationary process. Again let us consider the AR(3) model for y t of the type y t = 3y t-1  2.7y t-2 +0.7y t-3 + İ t

(8.68)

Using the lag operator, equation (8.68) can be written as y t = 3Ly t  2.7L2 y t +0.7L3 y t + İ t [1  3L  2.7L2  0.7L3 ]y t = İ t

(8.69)

The characteristic function of model (8.69) is given by 1  3z  2.7z 2  0.7z 3 = 0

(8.70)

The roots of the equation are as (1  z) (1  2z+0.7z 2 ) = 0

(8.71)

Solving equation (8.71), we have the following three roots z = 1, z = 2.21, and z = 0.6461

It is found that only 1 root of three lies outside the unit circle. Thus, it can be said that the AR process for y t given in equation (8.68) is not stationary, i.e., a non-stationary process. Test for Stationarity Based on Correlogram Definition of Correlogram: Let y1 , y 2 ,.......,y T be a univariate time-series data set. The sample autocorrelation coefficient between y t , and y t-k is denoted by ȡˆ k and is given by ȡˆ k =

cov(y t ,y t-k ) var(y t ) var(y t-k ) T

=

¦ (y

t

 y)(y t-k  y)

t=k+1

T

¦ (y t  y)2 t=1

,k1+I L@ >1+I L@ t -1

>1  (I )L@

yt =

Į +İ >1+I @ t

Į ª¬1+(I )L+(I ) 2 L2  (I )3 L3  ........º¼ y t = +İ >1+I @ t y t  I Ly t +I 2 L2 y t  I 3 L3 y t  ...........

yt

D

>1  I @

Į +İ >1+I @ t

+I y t-1  I 2 y t-2  I 3 y t-3  ............. + H t

y t = ș 0 +ș1 y t-1 +ș 2 y t-2 +ș3 y t-3 +........ + İ t , where, ș 0 =

Į 1I

(8.82)

Thus, from equation (8.82), it can be said that the first-order moving average process can be expressed as AR(f) .

Time Series Econometrics

377

The Mean Value of MA(1) Process

Taking the expectation of equation (8.81), we have E(y t ) = Į+I E(İ t -1 )+ E(İ t )

(8.83)

ȝ=Į

The Variance of MA(1) Process

The variance of MA(1) process is given by

Ȗ 0 = E(y t  ȝ) 2 = E I İ t -1 + İ t



2

= I 2 E(İ 2t-1 )+2I E(İ t -1İ t ) +E(İ 2t ) = I 2 ı 2 +ı 2 ı 2 ª¬1  I 2 º¼

(8.84)

The Autocovariance Function for MA(1) Process

The first-order autocovariance function for MA(1) process is given by Ȗ1 = E > y t  ȝ @> y t-1  ȝ @

= E >I İ t-1 +İ t @>I İ t-2 +İ t-1 @ = I 2 E(İ t-1İ t-2 )+I E(İ 2t-1 ) +I E(İ t İ t-2 ) + E(İ t-1İ t )

= I ı2

(8.85)

The jth-order (j>1) autocovariance for MA(1) process is given by Ȗ j = E > y t  ȝ @ ª¬ y t-j  ȝ º¼ , j>1

= E >I İ t-1 +İ t @ ª¬I İ t-j-1 +İ t-j º¼ = I 2 E(İ t-1İ t-j-1 )+I E(İ t-1İ t-j ) +I E(İ t İ t-j-1 ) + E(İ t İ t-j )

=0

(8.86)

The Autocorrelation Function for MA(1) Process

The first autocorrelation function is given by ȡ1 =

=

=

Ȗ1 Ȗ0

I ı2 (1  I 2 )ı 2 I (1  I 2 )

(8.87)

Chapter Eight

378

From equation (8.87), we see that the first-order autocorrelation ȡ1 of MA(1) process is a function of I . Thus, for different specifications of I , the value of ȡ1 will be different. Positive values of I induce positive autocorrelation in the series and for negative value of I the autocorrelation ȡ1 will be negative. The jth (j>1) Autocorrelation Function of MA(1) Process

The jth autocorrelation function ȡ j is given by ȡj =

Ȗj Ȗ0

=0

(8.88)

Thus, it can be concluded that, in the case of MA(1) process, the higher-order autocorrelations are all zero. The Second-Order Moving Average (MA(2)) Process

In a second-order moving average process, the time-series variable y t can be generated as a sum of a constant and moving average of two past and current random error terms. The second-order moving average process is denoted by MA(2) and is given by y t = D +I1İ t-1 +I2 İ t-2 +İ t

(8.89)

where {İ t } is a white noise process. The Mean Value of MA(2) Process

Taking the expectation of equation (8.89), we have E(y t ) = Į+I1E(İ t-1 )+I2 E(İ t-2 )+E(İ t )

(8.90)

ȝ=Į

The Variance of MA(2) Process

The variance of MA(2) process is given by 2

Ȗ0 = E > yt  ȝ @

2

= E >I1İ t-1 +I2 İ t-2 +İ t @

= E ª¬İ 2t +I12 İ 2t-1 +I22 İ 2t-2 +2I1İ t İ t-1 +2I1I2 İ t-1İ t-2 +2I2 İ t İ t-2 º¼ = E(İ 2t ) +I12 E(İ 2t-1 )+I22 E(İ 2t-2 ) +2I1E(İ t İ t-1 )+2I1I2 E(İ t-1İ t-2 )+2I2 E(İ t İ t-2 )

= ı 2 +I12 ı 2 +I22 ı 2 = ı 2 ª¬1+I12 +I22 º¼

(8.91)

The Autocovariance Function for MA(2) Process

The first autocovariance function is given by Ȗ1 = E > y t  ȝ @> y t-1  ȝ @

= E > İ t  I1İ t-1  I2 İ t-2 @> İ t-1  I1İ t-2  I2 İ t-3 @ = I1E ª¬İ 2t-1 º¼ +I1I2 E ª¬İ 2t-2 º¼

= I1ı 2 +I1I2 ı 2 = >I1 +I1I2 @ ı 2

(8.92)

Time Series Econometrics

379

The second autocovariance function is given by Ȗ 2 = E > y t  ȝ @> y t-2  ȝ @

= E > İ t +I1İ t-1 +I2 İ t-2 @> İ t-2 +I1İ t-3 +I2 İ t-4 @ = I2 E ª¬İ 2t-2 º¼

= I2 ı 2

(8.93)

The jth (j>2) autocovariance function for MA(2) process is given by Ȗ j = E > y t  ȝ @ ª¬ y t-j  ȝ º¼

= E > İ t +I1İ t-1 +I2 İ t-2 @ ª¬İ t-j +I1İ t-j-1 +I2 İ t-j-2 º¼

=0

(8.94)

The Autocorrelation Function for MA(2) Process

The first autocorrelation function for MA(2) process is given by ȡ1 =

Ȗ1 Ȗ0 (I1  I1I2 ) (1  I12  I22 )

(8.95)

The second autocorrelation function for MA(2) process is given by ȡ2 =

Ȗ2 Ȗ0

I2 (1  I12  I22 )

(8.96)

The jth (j>2) autocorrelation function for MA(2) process is given by ȡj =

Ȗj Ȗ0

=0

(8.97)

The qth-Order Moving Average (MA(q)) Process

In a qth-order moving average process, the time-series variable y t can be generated as a sum of a constant and moving averages of q lag values of the random error term and its current value. The qth-order moving average process is denoted by MA(q) and is given by y t = Į+I1İ t-1 +I2 İ t-2 +.........+Iq İ t-q +İ t

(8.98)

where {İ t } is a white noise process. The Mean Value of MA(q) Process

Taking the expectation of equation (8.98), we have E(y t ) = D +I1E(İ t-1 )+I2 E(İ t-2 )+......+Iq E(İ t-q )+E(İ t ) ȝ=Į

(8.99)

Chapter Eight

380

The Variance of MA(q) Process

The variance of MA(q) process is given by 2

E > y t  ȝ @ = E ª¬ İ t +I1İ t-1 +I2 İ t-2 +...........+Iq İ t-q º¼

2

= E(İ 2t ) + I12 E(İ 2t-1 )+I22 E(İ 2t-2 )+.......+Iq2 E(İ 2t-q )

[ E(İ i ,İ j ) = 0, i z j ]

= ı 2 +I12 ı 2 +I22 ı 2 +..........  Iq2 ı 2

(1  I12  I22  .........  Iq2 )ı 2

(8.100)

The Autocovariance Function of MA(q) Process

The jth autocovariance function of MA(q) is given by Ȗ j = E > y t  ȝ @ ª¬ y t-j  ȝ º¼ , (j = 1, 2,.......,q) = E ª¬ İ t +I1İ t-1 +I2 İ t-2 +.......+I jİ t-j +I j+1İ t-j-1 +I j+2 İ t-j-2 +......  Iq İ t-q º¼

ª¬İ t-j +I1İ t-j-1 +I2 İ t-j-2 +.......+Iq-jİ t-j-q+j +.......+Iq İ t-j-q º¼ = I j E(İ 2t-j )+I1I j+1E(İ 2t-j-1 )+I2I j+2 E(İ 2t-j-2 )+.....  Iq-jIq E(İ 2t-q ) [ E(İ i ,İ j ) = 0, i z j ]

= I j ı 2 +I1I j+1ı 2 +I2I j+2 ı 2 +........  Iq-jIq ı 2 ª¬I j  I1I j+1  I2I j+2  ........  Iq-jIq º¼ ı 2

(8.101)

The Autocorrelation Function of MA(q) Process

The jth autocorrelation function (j = 1, 2,……….,q) is given by ȡj =

Ȗj Ȗ0

ª¬I j  I1I j+1  I2I j+2  ............  Iq-jIq º¼ ª¬1 + I12 +I22 +..........+Iq2 º¼

(8.102)

The Invertibility Condition for a MA Process

Let us now consider the following MA(q) process in order to examine the shape of the partial autocorrelation functions (PACF) for a moving average process y t = c +I1İ t-1 +I2 İ t-2 +.........+Iq İ t-q +İ t

where {İ t } is a white noise process. Using the lag operators, equation (8.103) can be written as y t = c +I1Lİ t +I2 L2 İ t +.......+Iq Lq İ t +İ t y t = c +[1+I1L+I2 L2 +.......+Iq Lq ]İ t yt [1+I1L+I2 L2 +.......+Iq Lq ]

c  İt [1+I1L+I2 L2 +.......+Iq Lq ]

yt c = +İ t , where ) (L) = [1+I1L+I2 L2 +.......+Iq Lq ] ĭ(L) ĭ(L)

(8.103)

Time Series Econometrics

yt ȝ = +İ t , where E(y t ) = ȝ ĭ(L) ĭ(L)

381

(8.104)

If the process is invertible, then equation (8.104) can be written as ĭ(L)-1 y t =

yt =

ȝ +İ t ĭ(L)

ȝ + ș1 y t-1 +ș 2 y t-2 +...........+İ t ĭL

(8.105)

Thus, it can be said that if the MA process is invertible, then the MA process can be written as AR(f) .

8.6 Partial Autocorrelation Function Let {y t , t  } be a stationary process. The partial autocorrelation coefficient at lags k for k t 2 is denoted by Ik and is defined as the direct correlation between y t and y t-k after controlling the effects for observations at intermediate lags, i.e., all lags < k. Thus, the partial autocorrelation function (PACF) is defined as the correlation between y t and y t-k after reducing the effects of the observations y t-1 , y t-2 ,......,y t-(k-1) . For example, for lags 2, the PACF measures the direct correlation between y t , and y t-2 after controlling for the effects of y t-1 . At lag 0, the partial autocorrelation coefficient is 1, i.e., I0 = 1 ; and at lag 1, the partial autocorrelation coefficient is equal to the autocorrelation coefficient because there are no intermediate lag effects for removal. Thus, we have I1 = ȡ1 , where ȡ1 is the autocorrelation coefficient between y t and y t-1 . At lags 2, the partial autocorrelation coefficient is given by

I2 =

ȡ 2  ȡ12 1  ȡ12

(8.106)

We will now derive the formula to calculate the PACF for lags greater than 2. For an AR(p) process, there will be direct links between y t and y t-k for k d p , but no direct connections for k>p. For example, consider the AR(2) model for a time-series variable y of the type y t = Į +ș1 y t-1 +ș 2 y t-2 +İ t

(8.107)

where {İ t } is a white noise process. For this AR(2) model, there is a direct connection between y t and y t-1 , and between y t and y t-2 , but there will be no direct connection between y t and y t-j , where j t 3 . Thus, we will not find any zero partial autocorrelation coefficient for lags up to the order of the model, but we will find zero partial autocorrelation coefficient for lags greater than the order of the model. For example, for AR(2) models, only the first two partial autocorrelation coefficients will be non-zero, but for lag k>2, the partial autocorrelation coefficient will be zero. Derivation: The correlation between y t , and y t-k is called the kth-order autocorrelation. This kth-order autocorrelation between y t , and y t-k can in part be due to the correlation these observations have with the intervening lags y t-1 , y t-2 ,.....,and y t-(k-1) . Now, the partial autocorrelations are calculated to adjust for this kth-order correlation. We

may discuss this procedure as follows: Let us now consider the following system of k linear equations for estimation: y t = I11 y t-1 +H1t

½ ° ° = I31 y t-1 +I32 y t-2  I33 y t-3  H 3t °° ¾ ° ° ° = Ik1 y t-1 +Ik2 y t-2  Ik3 y t-3  .......  Ikk y t-k  H kt °¿

y t = I21 y t-1 +I22 y t-2  H 2t yt . . yt

(8.108)

Chapter Eight

382

Here, the coefficients I11 , I22 , , I33 , …,and Ikk are called the partial autocorrelations. In practice, they are not derived in this manner, but we can derive them from the autocorrelation functions discussed below: Multiplying the last equation of (8.108) by y t-k and then taking the expectation and then dividing the resultant equation by the variance of y t , we have, (8.109)

ȡ k = Ik1ȡ k-1 +Ik2 ȡ k-2 +Ik3ȡ k-3 +.......  Ik(k-1) ȡ1 +Ikk

If we do the same operation with y t-1 , y t-2 ,.......,y t-k successively, we get the following set of k equations (YuleWalker)

U1 = Ik1 +Ik2 U1  Ik3 U 2  ..............  Ik(k-1) U k-2  Ikk U k-1 ½ °

U 2 = Ik1 U1 +Ik2  Ik3 U1  .............  Ik(k-1) U k-3  Ikk U k-2 °

°° ¾ ° ° ° U k = Ik1 U k-1 +Ik2 U k-2  Ik3 U k-3  ...........  Ik(k-1) U1  Ikk °¿

. . .

(8.110)

The system (8.110) of k linear equations can be written as the following matrix form: ª ȡ1 «ȡ « 2 « . « « . « . « «¬ȡ k

ȡ1 º ª 1 » «ȡ 1 » « 1 » « . . » = « . . » « » « . . » « »¼ «¬ȡ k-1 ȡ k-2

ȡ2 ȡ1 . . . ȡ k-3

..... ȡ k-2 ..... ȡ k-3 ..... . ..... . ..... . ..... ȡ1

ȡ k-1 º ªIk1 º ȡ k-2 »» ««Ik2 »» . »« . » »« » . »« . » . »« . » »« » 1 »¼ «¬Ikk »¼

(8.111)

P = M)

ª ȡ1 «ȡ « 2 « . where P = « « . « . « «¬ȡ k

º » » » », M = » » » »¼

ȡ1 ª 1 «ȡ 1 « 1 « . . « . « . « . . « «¬ȡ k-1 ȡ k-2

ȡ2 ȡ1 . . . ȡ k-3

..... ȡ k-2 ..... ȡ k-3 ..... ..... ..... .....

. . . ȡ1

ȡ k-1 º ªIk1 º » «I » ȡ k-2 » « k2 » « . » . » » , and ) = « » . » « . » « . » . » » « » 1 »¼ «¬Ikk »¼

Since M is a non-singular matrix, its inverse matrix exists. Now, multiplying equation (8.111) by M -1 , we have ) = M -1P

Solving equation (8.112), we can find the value of

(8.112)

Ikk .

Another Way: To find the value of Ikk , we can apply the Cramer’s rule. Applying the Cramer’s rule to the above system of k linear equations, we have

Time Series Econometrics

Ikk =

1 ȡ1 . .

ȡ1 1 . .

ȡ2 ȡ1 . .

..... ȡ k-2 ..... ȡ k-3 ..... . ..... .

ȡ1 ȡ2 . .

.

.

.

..... . ..... ȡ1 ..... ȡ k-2 ..... ȡ k-3 ..... . ..... . ..... . ..... ȡ1

. ȡk ȡ k-1 ȡ k-2 . . . 1

ȡ k-1 ȡ k-2 ȡ k-3 1 ȡ1 ȡ 2 ȡ1 1 ȡ1 . . . . . . . . . ȡ k-1 ȡ k-2 ȡ k-3

383

(8.113)

It follows from the definition of Ikk that the partial autocorrelations of autoregressive processes have a particular form. For AR(1) model, I11 = ȡ1 , and Ikk = 0, for k>1 For AR(2) model we have, I11 = U1 , I22 =

U 2  U12 , and Ikk = 0 for k>2 1  U12

1 ȡ1 2 ȡ2 ȡ ȡ For AR(3) model, I11 = ȡ1 , I22 = 2 21 , I33 = 1 1  ȡ1 ȡ1 ȡ2

ȡ1 ȡ1 1 ȡ2 ȡ1 ȡ3 , and Ikk = 0, for k>3 ȡ1 ȡ 2 1 ȡ1 ȡ1 1

and for AR(p) model, we have

1 ȡ1

I11 = ȡ1 , I22 =

ȡ ȡ 2  ȡ12 , I33 = 2 2 1 1  ȡ1 ȡ1 ȡ2

1 ȡ1 ȡ2

ȡ1 1 ȡ1

ȡ2 ȡ1 1

ȡ1 ȡ2 ȡ3

ȡ1 ȡ3 ȡ , I44 = 3 ȡ1 ȡ 2 1 1 ȡ1 ȡ1 ȡ1 1 ȡ2 ȡ3

ȡ2 ȡ1 1

ȡ1 ȡ2 ȡ1

ȡ4 , so on, and Ikk = 0, for k>p ȡ3 ȡ2

ȡ1 ȡ2

1 ȡ1

ȡ1 1

ȡ1 ȡ1 1 ȡ2

Hence, for an AR process, the partial autocorrelation coefficient is zero for lags greater than the order of the process.

8.7 Sample ACF and PACF Plots for Different Processes Some typical examples of some standard MA and AR processes with their autocorrelation and partial autocorrelation functions are given in Figures 8-6(a) to 8-6(e). Each Fig. has 5% two-side rejection bands. These are based on r1.96 u 1/ T , where T is the sample size.

Chapter Eight

384 1.00 0.75 0.50 0.25 0.00 -0.25 -0.50 -0.75 -1.00

0

1

2

3

4

5

6

7 ACF

8

9

10

11

12

13

14

15

PACF

Fig. 8-6(a): Sample autocorrelation and partial autocorrelation functions for an MA(1) process: y t = -0.5İ t-1 +İ t 1.00 0.75 0.50 0.25 0.00 -0.25 -0.50 -0.75 -1.00

0

1

2

3

4

5

6

7 ACF

8

9

10

11

12

13

14

15

PACF

Fig. 8-6(b): Sample autocorrelation and partial autocorrelation functions for an MA(2) process: y t = 0.5İ t-1 -0.20İ t-2 +İ t 1.00 0.75 0.50 0.25 0.00 -0.25 -0.50 -0.75 -1.00

0

1

2

3

4

5

6

7 ACF

8

9

10

11

12

13

14

15

PACF

Fig. 8-5(c): Sample autocorrelation and partial autocorrelation functions for a slowly decaying AR(1) process: y t = 0.9y t-1 +İ t 1.00 0.75 0.50 0.25 0.00 -0.25 -0.50 -0.75 -1.00

0

1

2

3

4

5

6

7 ACF

8

9

10

11

12

13

14

15

PACF

Fig. 8-6(d): Sample autocorrelation and partial autocorrelation functions for a more rapidly decaying AR(1) process:

y t = 0.5y t-1 +İ t

Time Series Econometrics

385

1.00 0.75 0.50 0.25 0.00 -0.25 -0.50 -0.75 -1.00

0

1

2

3

4

5

6

7 ACF

8

9

10

11

12

13

14

15

PACF

Fig. 8-6(e): Sample autocorrelation and partial autocorrelation functions for a more rapidly decaying AR(1) process:

y t = -0.5y t-1 +İ t

In Fig. 8-6(a), the MA(1) process with confidence interval [-0.035, 0.035] has an autocorrelation function (ACF) that is statistically significant only for lag 1 which is negative, while the partial autocorrelation functions (PACF) declines geometrically and the PACFs are statistically significant for the first five lags and they are negative. As a result, the coefficients of the generating MA process will be negative. Again, the ACFs and PACFs for MA(2) process are anticipated in Fig. 8-6(b) with the confidence interval [-0.03, 0.03]. The first five autocorrelation functions are statistically significant while the first nine partial autocorrelation functions are statistically significant and they are declining geometrically. Note that, since in the MA(2) model the second coefficient on the lagged error term is negative, the ACFs and PACFs are between positive and negative alternatively, we termed this alternating and declining partial autocorrelation functions as a damped sine wave or damped sinusoid. In Fig. 8-6(c), the ACFs and PACFs for AR(1) model are presented with the confidence interval [-0.3139, 0.3139], and fairly high autocorrelation coefficient which is close to 1. The autocorrelation functions decline very slowly for increasing lag lengths. Thus, ACF would be expected to die away very slowly. Again, it is observed that for the AR(1) model, the first two PACFs are statistically significant while the remaining others are insignificant and close to zero. Fig. 8-6(d) plots the ACFs and PACFs of the AR(1) process with confidence interval [-0.3139, 0.3139], which is generated using the identical error terms but a much smaller autoregressive coefficient. In this case, the autocorrelation function dies away much more quickly than in the previous example. In fact, the first five autocorrelation functions and the first two PACFs are statistically significant. Fig. 8-6(e) shows the autocorrelation and partial autocorrelation functions with the confidence interval [-0.03, 0.03] for the AR(1) process with negative autocorrelation coefficients. Fig. 8-6(e) shows the damped sinusoidal pattern for the ACFs and the PACFs are statistically insignificant after lags 5. Recall that the autocorrelation coefficient for this AR(1) model at lags k is equal to (-0.5) k , and this will be positive for even numbers of k and will be negative for odd numbers of k. Only the first three partial autocorrelation functions are statistically significant.

8.8 Autoregressive and Moving Average (ARMA) Process Sometimes, in business, finance, banking, economic, and socio-economic problems, the time-series variable y t may exhibit both AR and MA characteristics. Thus, we have to combine the autoregressive and moving average specification which is called the ARMA model. The ARMA model consists of an AR part and an MA part. In fact, there are no fundamental differences between an AR process and an MA process. Under some conditions, an AR process can be written as an MA process and also an MA process can be written as an AR process. The order of any one of these models can be selected using some selection criteria, and then, we can choose for an AR, MA or ARMA process by diagnostic checking the ACF and PACF. In an ARMA process, we regress Y on its lagged values and also the lagged values of the random error terms. An autoregressive and moving average process of order p and q is denoted by ARMA (p, q) and is given by Yt = c+Į1Yt-1 +Į 2 Yt-2 +.......+Į p Yt-p +ș1İ t-1 +ș 2 İ t-2 +.......+ș q İ t-q +İ t

where {İ t } is a white noise process. Using the lag operator L, the ARMA(p, q) model can be written as y t = c +Į1Ly t +Į 2 L2 y t +......+Į p Lp y t + İ t +ș1Lİ t +ș 2 L2 İ t +.......+ș q Lq İ t

y t  Į1Ly t  Į 2 L2 y t  .........  Į p Lp y t = c + ª¬1+ș1L+ș 2 L2 +...........+ș q Lq º¼ İ t ª¬1  Į1L  Į 2 L2  ...........  Į p Lp º¼ y t = c + ª¬1+ș1L+ș 2 L2 +........+ș q Lq º¼ İ t

(8.114)

Chapter Eight

386

Į(L) y t = c +ș(L)İ t yt =

c ș(L) + İt Į(L) Į(L)

(8.115)

The Mean of ARMA(p, q) Process

The expected value of equation (8.115) is given by E(y t ) =

ȝ=

c ș(L) + E(İ t ) Į(L) Į(L)

c 1  Į1  Į 2  .........  Į p

(8.116)

Thus, we have y t = ȝ+

ș(L) İt Į(L)

(8.117)

Sometimes, it is better to write the ARMA(p, q) process in terms of the deviations form from the mean value. From equation (8.116), we have c = ȝ  Į1ȝ  Į 2 ȝ  ........  Į p ȝ

(8.118)

Putting the value of c in equation (8.114), we have y t  ȝ = Į1 (y t-1  ȝ)+Į 2 (y t-2  ȝ)+.....+Į p (y t-p  ȝ) + İ t +ș1İ t-1 +.....+ ș q İ t-q

(8.119)

The Variance of ARMA(p, q) Process

Multiplying equation (8.119) by (y t  ȝ) and then taking the expectation, we have 2

E > y t  ȝ @ = Į1E[y t  ȝ][y t-1  ȝ]+Į 2 E[y t  ȝ][y t-2  ȝ]+.....+Į p E[y t  ȝ][y t-p  ȝ]+ E{İ t [y t  ȝ]}+ș1E{İ t-1[y t  ȝ]}+ș 2 E{İ t-2 [y t  ȝ]}+.....+ș q E{İ t-q [y t  ȝ]}

Ȗ 0 = Į1 Ȗ1 +Į 2 Ȗ 2 +.........+Į p Ȗ p

(8.120)

The variance of ARMA (p, q) process can also be obtained as 2

ª ș(L) º 2 Var > y t @ = « » ı ¬ Į(L) ¼

(8.121)

The Autocovarince of ARMA(p, q) Process

Multiplying equation (8.119) by (y t-j  ȝ) for j > p and then taking expectation, we have Ȗ j = Į1 Ȗ j-1 +Į 2 Ȗ j-2 +.........+Į p Ȗ j-p

This is called the autocovariance function for the ARMA (p, q) model. The Autocorrelation Function of ARMA(p, q) Process

The jth (j>p) autocorrelation function is given by

ȡj =

Ȗj Ȗ0

(8.122)

Time Series Econometrics

=

387

Į1 Ȗ j-1 +Į 2 Ȗ j-2 +........+Į p Ȗ j-p Ȗ0 (8.123)

= Į1ȡ j-1 +Į 2 ȡ j-2 +........+Į p ȡ j-p

8.9 Model Selection Criteria Different techniques are developed that can be applied directly to select a good econometric model, but for time-series econometric models, the Akaike Information Criterion (AIC), Schwarz’s Bayesian Information Criterion (SBIC), and the Hannan–Quinn Information Criterion (HQIC) are the most popular and widely applicable to measure the relative quality and are discussed in chapter 7. Given a set of time-series econometric/statistical models for the data, AIC, SBIC and HQIC estimate the quality of each model relative to each of the other models. Hence, AIC, SBIC and HQIC criteria are used for model selection among a finite set of models. The model with the lowest value of AIC, SBIC and HQIC is preferred. The Schwarz’s Bayesian Information Criterion (SBIC) is based, in part, on the likelihood function and it is closely related to the Akaike Information Criterion (AIC). The Hannan–Quinn information criterion (HQIC) is an alternative to the Akaike information criterion (AIC), and the Schwarz’s Bayesian information criterion (SBIC). We will choose the model among a set of models for which the criteria values are the smallest. While the AIC and SBIC criteria differ in their trade-off between fit and parsimony, the SBIC criterion can be preferred because it has the property that will almost surely select the true model, if T tends to infinity provided that the true model is in the class of ARMA (p, q) models for relatively small values of p and q. Selection of an Appropriate AR(p) Process Using These Criteria

The AR(p) model for a time series variable Y is given by Yt = c+Į1Yt-1 +Į 2 Yt-2 +.........+Į p Yt-p +İ t

(8.124)

where {İ t } is a white noise process. Here, we have E(Yt-j , İ t ) = 0,  j, implying that the error terms are uncorrelated with the explanatory variables. The estimation of an AR process is straightforward. We can estimate the AR process using the OLS technique, i.e., by minimising the residual sum of squares, and then, we can obtain the ML estimate of ı 2 . Based on the ML estimate of ı 2 , the number of parameters that are included in the estimable AR model and sample size, we can obtain the AIC, SBIC and HQIC criteria using equations (7.17), (7.18) and (7.19). The AR model for which the values of AIC, SBIC and HQIC are the smallest will be selected for estimation and other purposes. Ex. 8-8: Select an appropriate AR model using AIC, SBIC and HQIC criteria for the given problem in Ex. 8-6. Solution: To select an appropriate AR model using AIC, SBIC and HQIC criteria, we regress ER on its p lag values of the type ER t = c+ș1ER t-1 +ș 2 ER t-2 +.......+ș p ER t-p +İ t

(8.125)

where {İ t } is a white noise process and the variable ER is used for the exchange rate between Bangladesh and the USA. Using the given data in Ex. 8-6, the AR models of different orders are estimated using the software package RATS including the AIC, SBIC and HQIC criteria. The estimated results are reported in Table 8-3. Table 8-3: The estimated results of AIC, SBIC and HQIC criteria for each AR model.

The models are estimated by the Box-Jenkins method using RATS software: Dependent Variable ER Estimated Results for AR(1) Model Variable Coeff Std Error T-Stat Signif AIC SBIC 1. Constant 1197.7288 10778.9534 0.1111 0.9121 1.3439 1.4250 2. AR{1} 0.9986 0.0129 77.4824 0.0000 Estimated Results for Model AR(2) Variable Coeff Std Error T-Stat Signif AIC SBIC 1.3659 1.4888 1. Constant 295.1924 608.3725 0.4853 0.6301 2. AR{1} 1.1675 0.1550 7.5311 0.0000 0.1557 -1.1109 0.2732 3. AR{2} -0.1730

HQIC 1.3741 HQIC 1.4113

Chapter Eight

388

From the estimated results, it is found that the values of AIC, SBIC and HQIC criteria are the smallest for the AR(1) model. Thus, it can be said that, for the variable ER, the AR(1) model will be appropriate, i.e., ER t = c+ș1ER t-1 +İ t . The estimated AR(1) model is given below: ERˆ t = 1197.7288+ 0.9986ER t-1 , R 2 t-Test: 0.1111 77.4824 SE: 10778.9534 0.0129

0.993½ ° ¾ ° ¿

(8.126)

Selection of an Appropriate MA(q) Process Using These Criteria

The estimation for an MA process is not straightforward like an AR model; it is very complicated. Let us now consider the MA(1) process of the type (8.127)

Yt = ȝ+İ t +Įİ t-1

Since İ t-1 is not observed, we cannot apply the regression technique to estimate the MA process directly. If we apply the OLS method to the MA(1) process, we have to minimise the following equation: S(Į, ȝ) =

T

¦ (Y  ȝ  Įİ t

t-1

)2

(8.128)

t=2

But in equation (8.128), İ t-1 is unknown to us, so we cannot apply the OLS technique to minimise equation (8.128). A possible solution arises if we write İ t-1 in the expression as a function of observed Yt 's . This is possible only if the MA polynomial is invertible. The invertible function of İ t-1 is given by f

İ t-1 = ¦ (Į) j (Yt-j-1  ȝ)

(8.129)

j=0

Putting the value of İ t-1 in equation (8.129), the error sum of squared can be written as T

f ª º S(Į,ȝ) = ¦ « Yt  ȝ  Į ¦ (Į) j (Yt-j-1  ȝ) » t=2 ¬ j=0 ¼

2

(8.130)

In practice, Yt is not observed for t=0, -1, -2,………. . So, we have to cut off the infinite sum in this expression to obtain an approximate sum of squares as follows: S(Į, ȝ) =

T

t-2

j

¦ (Y  ȝ  Į¦ (Į) (Y t

t=2

t-j-1

 ȝ)) 2

(8.131)

j=0

We can now estimate the MA process using the OLS technique, i.e., by minimising the residual sum of squares which is presented by equation (8.131). Then, we can obtain the residual sum of squares, and hence, we can obtain the ML estimate of ı 2 . Based on the ML estimate of ı 2 , we can then estimate the AIC, SBIC and HQIC criteria corresponding to the number of parameters that are included in the estimable MA model and the sample size. The MA model for which the values of AIC, SBIC and HQIC are the smallest will be selected for estimation and for other purposes. Ex. 8-9: The data given below are the market price index of DSE of Bangladesh over a period of time1. Select an appropriate MA process using the AIC, SBIC and HQIC criteria.

1

The data of market index have been collected from the monthly index of Dhaka Stock Exchange Limited. Later, the monthly 12

index was annualised by the following formula: INDX t =

¦ INDX

t

t=1

12

¦ INDXt (1990) t=1

×100

Time Series Econometrics

389

Table 8-4: Yearly price index of DSE

Year 1976 1977 1978 1979 1980 1981 1982 1983 1984 1985 1986 1987 1988 1989 1990 1991

INDX 27.2234 29.9592 32.9699 36.2833 39.9296 43.9423 48.3583 53.2180 58.5662 64.4518 70.9289 126.4316 145.6655 133.4685 100.0000 83.6811

Year 1992 1993 1994 1995 1996 1997 1998 1999 2000 2001 2002 2003 2004 2005 2006

INDX 88.5703 104.0662 187.5054 211.0291 381.5475 288.3805 163.8942 130.6047 151.8730 178.5843 213.7943 215.3603 360.8285 359.6011 313.3393

Year 2007 2008 2009 2010 2011 2012 2013 2014 2015 2016 2017 2018 2019 2020 2021

INDX 522.1619 639.0281 674.9076 1437.1778 1295.2480 1001.9304 1028.9065 1231.8739 1212.0131 1196.4072 1527.8563 1446.6753 1357.5051 1184.9951 1640.958

Source: DSE website; Own Calculations

Solution: First, we consider the MA model of the type: INDX t = ȝ+Į1İ t-1  Į 2 İ t-2 +.......+Į q İ t-q +İ t

(8.132)

where {İ t } is a white noise process and the variable INDX indicates the yearly price index of DSE. Now, to select an appropriate MA model based on the AIC, SBIC and HQIC criteria, the MA model of different orders is estimated using the LS method. To estimate the MA model of different orders, the software RATS is applied. The results of the MA model of different orders including the estimated values of AIC, SBIC and HQIC are reported in Table 8-5. Table 8-5: Estimated results of MA processes including the AIC, SBIC and HQIC criteria.

Variable 1. Constant 2. MA{1} 1. Constant 2. MA{1} 3. MA{2} 1. Constant 2. MA{1} 3. MA{2} 4. MA{3} 1. Constant 2. MA{1} 3. MA{2} 4. MA{3} 5. MA{4} 1. Constant 2. MA{1} 3. MA{2} 4. MA{3} 5. MA{4} 6. MA{5}

Estimated by LS Gauss-Newton Method: Dependent Variable INDX Estimated Results of an MA(1) Model Coeff Std Error T-Stat Signif AIC SBIC HQIC 14.5118 14.6311 14.5565 487.3526 146.2461 3.3324 0.0018 0.8901 0.0532 16.7177 0.0000 Estimated Results of an MA(2) Model 14.0545 14.2135 14.1141 497.0901 163.2267 3.0454 0.0040 1.2998 0.0792 16.4070 0.0000 0.7118 0.0865 8.2261 0.0000 Estimated Results of an MA(3) Model 495.9748 183.3036 2.7058 0.0099 14.0294 14.2282 14.1039 0.8056 0.2069 3.8920 0.0004 0.6521 0.1402 4.6525 0.0000 0.7441 0.0889 8.3695 0.0000 Estimated Results of an MA(4) Model 13.6523 13.8909 13.7417 501.6705 194.7570 2.5759 0.0138 1.1719 0.0993 11.7956 0.0000 1.0685 0.1560 6.8511 0.0000 0.8807 0.1446 6.0925 0.0000 0.7614 0.1100 6.9217 0.0000 Estimated Results of an MA(5) Model 520.8552 246.7807 2.1106 0.0413 13.6618 13.9401 13.7660 1.2010 0.1486 8.2834 0.0000 1.1725 0.1934 6.0633 0.0000 1.0493 0.2253 4.6563 0.0000 0.9044 0.2189 4.1322 0.0002 0.2274 0.1696 1.3409 0.1877

R2 0.6282 0.7781

0.7959

0.8722

0.8720

Chapter Eight

390

From the estimated results in Table 8-5, it is found that the values of the AIC, SBIC and HQIC criteria are the smallest for the MA(4) model. Thus, it can be said that, for the variable INDX, the MA(4) model will be appropriate.

8.10 Maximum Likelihood (ML) Method for Estimating AR Processes In this section, the maximum likelihood and the conditional maximum likelihood methods are discussed to estimate AR processes. Maximum Likelihood (ML) Method for Estimating an AR(1) Process

Let us now consider the AR(1) process of the type Yt = c + șYt-1 +İ t Yt =

c 1  İt 1  șL 1  șL

(8.133)

where {İ t } is a Gaussian white noise process and L is the lag operator. The expected value and the variance of Yt are given by E(Yt ) = ȝ =

c , and Var(Yt ) = E(Yt  ȝ) 2 1 ș

ı2 1  ș2

We will find the probability distribution of the first observation Y1 in the sample from equation (8.133). Since

{İ t }ft= f is a Gaussian white noise process, Y1 is also Gaussian. Hence, the probability density function of Y1 is given by 1

2

f Y1 (y1 ; c, ș, ı ) =

2ʌ(ı 2 /(1  ș 2 ))

e



1 c º ª « y1  1T » ¼ 2V 2 /(1T 2 ) ¬

2

(8.134)

We consider the distribution for the second observation Y2 conditioning on the observation Y1 = y1 . Thus, from equation (8.133), we have (8.135)

Y2 = c + șY1 + İ 2

Conditioning on Y1 = y1 means treating the random variable Y1 as if it were the deterministic constant y1 . Thus, we have (8.136)

Y2 = c + șy1 + İ 2

The distribution of Y2 is given by (Y2 |Y1 = y1 ) ~ N(c+șy1 , ı 2 ) . Hence, the probability density function of Y2 for a given Y1 = y1 is given by f Y2 |Y1 (y 2 |y1 ;c, ș,ı 2 ) =

1 2ʌı 2

e



1 2V 2

> y2  c T y1 @2

(8.137)

The joint density function of Y2 and Y1 is given by f Y2 ,Y1 (y 2 ,y1 ; c, ș, ı 2 ) = f Y2 |Y1 (y 2 |y1 ;c, ș,ı 2 ) f Y1 (y1 ;c, ș, ı 2 )

(8.138)

Similarly, the distribution of the third observation Y3 conditional on the first two observations Y2 = y 2 and Y1 = y1 is given by f Y3 |Y2 ,Y1 (y3 |y 2 ,y1 ;c, ș, ı 2 ) =

1 2ʌı 2

e



1 2V 2

( y3  c T y2 )2

(8.139)

Time Series Econometrics

391

The joint density function of Y3 , Y2 , and Y1 is given by Y3 ,Y2 ,Y1

(y3 ,y 2 ,y1 ;c,ș,ı 2 ) = f Y3 |Y2 ,Y1 (y3 |y 2 ,y1 ;c,ș,ı 2 ),f Y2 |Y1 (y 2 |y1 ;c,ș,ı 2 )f Y1 (y1 ;c,ș,ı 2 ) = f Y3 |Y2 ,Y1 (y3 |y 2 ,y1 ;ș), f Y2 ,Y1 (y 2 ,y1 ;ș)

(8.140)

In general, the values of Y1 , Y2 ,.......,Yt-1 matter for Yt only through the values Yt-1 , and the density function of the observation at time t conditional on the preceding t-1 observations is given by f Yt |Yt-1 ,Yt-2 ,..........,Y1 (y t |y t-1 , y t-2 ,.....,y1 ;c,ș,ı 2 ) = f Yt |Yt-1 (y t |y t-1:c, ș, ı 2 ) =

1 2ʌı 2

e



1 2V 2

> yt  c T yt-1 @2

(8.141)

The joint density function of the orbservations Y1 , Y2 ,.......,Yt-1 , and Yt is given by f Yt ,Yt-1 ,Yt-2 ,......,Y1 (y t ,y t-1 , y t-2 ,.....,y1 ;c, ș, ı 2 ) = f Yt |Yt-1 (y t |y t-1:c, ș, ı 2 )× f Yt-1 ,Yt-2 ,......,Y1 (y t-1 , y t-2 ,..,y1 ;c, ș, ı 2 )

(8.142)

Therefore, the likelihood function of the complete sample can thus be calculated as below: L(c, ș, ı 2 ) = f YT ,YT-1 ,YT-2 ,..........,Y1 (yT ,yT-1 , y T-2 ,.....,y1 ; c, ș, ı 2 ) T

= f Y1 (y1 ;c, ș, ı 2 )– f Yt |Yt-1 (y t |y t-1 ; c, ș, ı 2 )

(8.143)

t=2

Therefore, the log likelihood function is given by T

log(L(c, ș, ı 2 )) = log(f Y1 (y1 ;c,ș,ı 2 ))+¦ log(f Yt |Yt-1 (y t |y t-1:c, ș, ı 2 )) t=2

ª ı2 º 1 1 1 ª T  1º  2 log(L(c, ș, ı 2 )) =  log(2ʌ)  log « (y1  (c/(1  ș)) 2  « 2 » 2 » log(2ʌ) 2 2 ¬ 2 ¼ ¬1  ș ¼ 2ı /(1  ș ) T ª (y  c  șy t-1 ) 2 º ª T  1º « log(ı 2 )  ¦ « t » » 2ı 2 ¬ 2 ¼ t=2 ¬ ¼

(8.144)

The MLE of c, ș and ı 2 can be obtained by differentiating equation (8.144) with respect to c, ș and ı 2 , and then equating to zero. In practice, the result is a system of nonlinear equation in c, ș and ı 2 , and (y1 , y 2 ,......., yT ) for which there is no simple solution for c, ș and ı 2 in terms of (y1 , y 2 ,......., yT ) . Thus, to maximise the equation with respect to c, ș and ı 2 , we have to apply the iterative procedure or numerical procedure. Maximum Likelihood (ML) Method for Estimation AR(P) Process

The pth-order autoregressive function is given by Yt = c+ș1Yt-1 +ș 2 Yt-2 +.........+ș p Yt-p +İ t

(8.145)

where İ t ~IIN(0, ı 2 ). Let us define, ĭ = (c, ș1 , ș 2 ,......,ș p ) . For this model, the vector ) of the population parameters is to be estimated. The mean value of the AR(p) process is given by E(Yt ) = c +ș1E(Yt-1 )+ș 2 E(Yt-2 )+........+ș p E(Yt-p ) +E(İ t ) Since the process is covariance-stationary, we have

(8.146)

Chapter Eight

392

E(Yt ) = E(Yt-1 ) =......= E(Yt-p ) = ȝ . Therefore, equation (8.146) can be written as ȝ=

c

(8.147)

ª¬1  ș1  ș 2  ........  ș p º¼

Let (y1 , y 2 ,.........,y T ) be the sample observations. Now, the combination of the two methods which are described in the case of estimating an AR(1) process can be used to obtain the likelihood function based on the sample of size T for an AR(p) process. The first p sample observations (y1 , y 2 ,......,y p ) are collected in a (p u 1) vector and is denoted by

yp

which is a p-dimensional Gaussian variable. The mean of the vector

yp

is ȝp which denotes a (p u 1) vector,

each of whose elements is given by equation (8.147). Let ı 2 Vp denote the (p u p) variance-covariance matrix of (Y1 , Y2 ,.......,Yp ) which is given by ª E(Y1  ȝ) 2 E(Y1  ȝ)(Y2  ȝ) « E(Y2  ȝ) 2 « E(Y2  ȝ)(Y1  ȝ) « . . ı 2 Vp = « . . « « . . « «¬ E(Yp  ȝ)(Y1  ȝ) E(Yp  ȝ)(Y2  ȝ)

ª Ȗ0 «Ȗ « 1 =« . « « . « Ȗ p-1 ¬

Ȗ1 Ȗ0 . . Ȗ p-2

Ȗ2 Ȗ1 . . Ȗ p-3

...... E(Y1  ȝ)(Yp  ȝ) º » ...... E(Y2  ȝ)(Yp  ȝ) » » ...... . » ...... . » » ...... . » ...... E(Yp  ȝ) 2 »¼

...... Ȗ p-1 º ...... Ȗ p-2 »» ...... . » » ...... . » ...... Ȗ 0 »¼

(8.148)

where Ȗ j is the jth autocovariance function of the AR(p) process which is given by Ȗ j = E(Yt  ȝ)(Yt-j  ȝ), j = 1, 2,......,p . Here,

yp ~N(ȝp , ı 2 Vp ) .

The probability density function of the first p-observations is given by f Yp ,Yp-1 ,......Y1 (y p ,y p-1 ,.....y1 ;ĭ) = (2ʌ)-p/2 ı -2 Vp -1

= (2ʌ)



p 2

ı -2

p 2

1/2

ª 1 º exp «  2 (y p  ȝ p )cVp -1 (y p  ȝ p ) » ¬ 2ı ¼

Vp -1

1/2

ª 1 º exp «  2 (y p  ȝ p )cVp -1 (y p  ȝ p ) » ¬ 2ı ¼

(8.149)

For the remaining observations (y p+1 , y p+2 ,.......,y T ) in the sample, the prediction-error decomposition can be used. Conditional on the first (t-1) observations, the tth observation is Gaussian with mean c + ș1 y t-1 +ș 2 y t-2 +......+ș p y t-p and variance V 2 . Thus, the probability density function for t>p, is given by fYt |Yt-1 ,Yt-2 ,......,Y1 (yt |yt-1 ,yt-2 ,....,y1; ĭ) =

1

2º ª 1 exp « 2 ª¬ yt  c  ș1yt-1  ș2 yt-2  ......  șp yt-p º¼ » 2ı ¬ ¼ 2ʌı 2

(8.150)

The likelihood function for the complete sample y1 , y 2 ,......,y t-1 , y t is given by T

f Yt ,Yt-1 ,.....,Y1 (y t ,yt-1 ,....,y1 ; ĭ) = f Yp ,Yp-1 ,...,Y1 (yp ,yp-1 ,..,y1 ; ĭ) – f Yt |Yt-1 ,Yt-2, .....,Yt-p (yt |yt-1 ,yt-2 ,...y t-p : ĭ) t=p+1

Therefore, the log-likelihood function is given by

(8.151)

Time Series Econometrics

393

logL() ) = log{f Yt ,Yt-1 ,Yt-2 ,..........,Y1 (y t ,y t-1 , y t-2 ,.........,y1 ; ) )} T

= log(f Yp ,Yp-1 ,....,Y1 (yp ,y p-1 ,...,y1 ; ĭ)) + ¦ log{f Yt |Yt-1 ,Yt-2, ....,Yt-p (y t |y t-1 ,y t-2 ,...y t-p ; ĭ)} t=p+1

p p 1 1 =  log(2ʌ)  log(ı 2 ) + log Vp 1  2 (y p  ȝ p )cVp 1 (y p  ȝ p ) 2 2 2 2ı 

= 

Tp Tp 1 log(2ʌ)  log(ı 2 )  2 2 2 2ı

T

¦ ª¬ y

t=p+1

t

 c  ș1 y t-1  ș 2 y t-2  ......  ș p y t-p º¼

2

T T 1 1 log(2ʌ)  log(ı 2 ) + log Vp -1  2 (y p  ȝ p )cVp -1 (y p  ȝ p ) 2 2 2 2ı 

1 2ı 2

T

¦ ª¬ y

t

t=p+1

 c  ș1 y t-1  ș 2 y t-2  ......  ș p y t-p º¼

2

(8.152)

To evaluate equation (8.152), it requires inverting the (p u p) matrix Vp . We denote the ith row and jth column of Vp-1 by vij (p) which is given by p+i-j ª i-1 º vij (p) = « ¦ ș k ș k+j-i  ¦ ș k ș k+j-i » , for, 1 d i d j d p k= p+1-j ¬ k=0 ¼

(8.153)

Equation (8.153) is called the Galbraith and Galbraith (1974) equation, where, ș 0 =  1 . Here Vp-1 is a symmetric matrix, thus, for i > j, we have vij (p) = v ji (p) . For example, for an AR(1) process, Vp-1 will be a scalar quantity whose value can be obtained by taking i=j=p=1: 1 ª 0 º V1-1 = v11 (1) = « ¦ ș k ș k  ¦ ș k ș k » k= 1 ¬ k=0 ¼

= ª¬ș 02  ș12 º¼

= ª¬1  ș12 º¼ Thus, we have ı 2 V =

(8.154)

ı2 which is the variance of the AR(1) process. 1  ș12

For an AR(2) process, the V2-1 is given by ª v (2) v12 (2) º V2-1 = « 11 » ¬ v 21 (2) v 22 (2) ¼ Now, 2 ª 0 º v11 (2) = « ¦ ș k ș k  ¦ ș k ș k » k= 2 ¬ k=0 ¼

= ª¬ș 02  ș 22 º¼ = ª¬1  ș 22 º¼

In the same way, we have v12 (2) = v 21 (2) =  >ș1 +ș1ș 2 @ , and v 22 (2) = ª¬1  ș 22 º¼ .

(8.155)

Chapter Eight

394

Thus, putting these values in (8.166), we have ª (1  ș 22 ) V2-1 = « ¬ (ș1 +ș1ș 2 )

 (ș1 +ș1ș 2 ) º » (1  ș 22 ) ¼

 ș1 º ª(1  ș 2 ) = (1+ș 2 ) « (1  ș 2 ) »¼ ¬ ș1

(8.156)

Thus, we have, V2-1 = (1+ș 2 ) ª¬(1  ș 2 ) 2  ș12 º¼

(8.157)

And, (y p  ȝ p )cVp 1 (y p  ȝ p )

 ș1 º ª(y1  ȝ) º ª(1  ș 2 ) (1  ș 2 ) »¼ «¬ (y 2  ȝ) »¼ ¬ ș1

>(y1  ȝ) (y2  ȝ)@ (1+ș 2 ) «

= 1+ș 2 ^(1  ș 2 )(y1  ȝ) 2  2ș1 (y1  ȝ)(y 2  ȝ)+(1  ș 2 )(y 2  ȝ) 2 `

(8.158)

Therefore, the exact log likelihood function for the AR(2) process is given by logL()) = 

T T 1 log(2ʌ)  log(ı 2 ) + log (1+ș 2 ) ª¬(1  ș 2 ) 2  ș12 º¼ 2 2 2

^

`

­ (1+ș 2 ) ½ (1  ș 2 )(y1  ȝ) 2  2T1 (y1  ȝ)(y 2  ȝ)  (1  ș 2 )(y 2  ȝ) 2 ` ® 2 ¾^ 2ı ¯ ¿ 

where ȝ =

1 2ı 2

T

¦>y t=3

2

t

 c  ș1 y t-1  ș 2 y t-2 @

(8.159)

c . 1  ș1  ș 2

The MLE of c, ș1 , ș 2 , and ı 2 can be obtained by differentiating equation (8.159) with respect to c, ș1 , ș 2 , and ı 2 , and then equating to zero. In practice, the result is a system of nonlinear equation in c, ș1 , ș 2 , ı 2 , and (y1 , y 2 ,......., yT ) for which there is no simple solution for c, ș1 , ș 2 , and ı 2 , in terms of (y1 , y 2 ,......., yT ) . Thus, to

maximise the equation with respect to c, ș1 , ș 2 , and ı 2 , we have to apply the iterative procedure or the numerical procedure.

Ex. 8-10: Estimate the appropriate AR model for the given problem in Ex. 8-9 using the ML method. Solution: First, we consider the AR model of the type INDX t = c+ș1INDX t-1 +ș 2 INDX t-2 +.......+ș p INDX t-p +İ t

(8.160)

where {İ t } is a Gaussian white noise process and the variable INDX indicates the yearly price index of DSE. To estimate an appropriate AR model using the ML method, the software package EViews is applied. It is found that the values of the AIC, SBIC and HQIC criteria are the smallest for the AR(1) model. Theefore, the ML estimates of the AR(1) are reported in Table 8-6.

Time Series Econometrics

395

Table 8-6: The ML estimates of an AR(1) model Dependent Variable: INDX Method: ARMA Maximum Likelihood (OPG - BHHH) Sample: 1976 2021 Included observations: 46 Convergence achieved after 8 iterations Coefficient covariance computed using the outer product of gradients Coefficient Std. Error t-Statistic 1.0532 649.4189 683.9720 13.7094 0.0707 0.9697 9.5722 2858.113 27358.51 Mean dependent var 0.8979 S.D. dependent var 0.8934 Akaike info criterion 171.0767 Schwarz criterion 1258491 Hannan-Quinn criter. -301.6672 Durbin-Watson stat 189.1333 0.0000

Variable c AR(1) SIGMASQ R-squared Adjusted R-squared S.E. of regression Sum squared resid Log-likelihood F-statistic Prob(F-statistic)

Prob. 0.2981 0.0000 0.0000 474.8196 523.4355 13.2464 13.3657 13.2911 1.9386

Therefore, for the given data in Ex. 8-9, the estimated AR(1) model will be: ˆ INDX t = 683.9720+0.9697INDX t-1

(8.161)

8.11 Maximum Likelihood (ML) Method for Estimating MA Processes In this section, the conditional maximum likelihood methods are discussed to estimate MA processes.

ML Method for Estimation MA(1) Process: Conditional Maximum Likelihood Estimates Let us consider the Gaussian MA(1) process of the type (8.162)

Yt = ȝ + İ t  I1İ t-1

where İt ~IIN(0, ı2 ), t. Let ) = (ȝ, I, ı2 )c be a vector of the population parameters ȝ, I , and ı 2 . If the value of İ t-1 is known with certainty, then we have Yt |İ t-1 ~N(ȝ +I İ t-1 , ı 2 ) . Thus, the conditional pdf of Yt given İ t-1 is given by f Yt |İ t-1 (y t |İ t-1 ;ĭ) =

ª  y t  ȝ  I İ t-1 2 º exp « » 2ı 2 «¬ »¼ 2ʌı 2 1

(8.163)

From equation (8.163), we have Y1 = ȝ + İ1 +I İ 0 . Letting İ 0 = 0, we have Y1 |İ 0 ~N(ȝ, ı 2 ) . Given the value of the observation Y1 = y1 , we then can find the value of İ1 which is given by

İ1 = y1  ȝ

(8.164)

Thus, we have f Y2 |Y1 ,İ0 =0 (y 2 |y1 ,İ 0 =0;ĭ) =

ª  y 2  ȝ  I İ1 2 º exp « » 2ı 2 «¬ »¼ 2ʌı 2 1

(8.165)

Since İ1 is known with certainty, İ 2 can be calculated from the following equation İ 2 = y 2  ȝ  I İ1

(8.166)

Since İ 2 is known with certainty, İ 3 can be calculated from the following equation

İ 3 = y3  ȝ  I İ 2

(8.167)

Chapter Eight

396

Now, continuing this procedure and letting the value İ 0 = 0, the full sequence {İ1 , İ 2 ,.......,İ T } can be calculated from {y1 , y 2 ,..........,y T } by iterating on the following equation: İ t = y t  ȝ  I İ t-1 , for t = 1, 2, .....,T, and starting from İ 0 = 0 .

(8.168)

The conditional density function of the tth observation can then be calculated from equation (8.163) as f Yt |Yt-1 ,Yt-2 ,......,Y1 ,İ0 =0 (y t |y t-1 , y t-2 ,.....,y1 ;İ 0 = 0;ĭ) = f Yt |İ t-1 (y t |İ t-1 ;ĭ)

­ İ2 ½ exp ® t 2 ¾ 2ʌı 2 ¯ 2ı ¿ 1

(8.169)

Therefore, the likelihood function of the complete sample can thus be calculated as below: L(ĭ) = f YT ,YT-1 ,YT-2 ,..........Y1 |İ0 =0 (yT ,yT-1 , y T-2 ,.........,y1 |İ 0 =0; ĭ) T

= f Y1 |İ0 =0 (y1 |İ 0 =0; ĭ)– f Yt |Yt-1 ,Yt-2 ,......,Y1 ,İ0 =0 (y t |y t-1, y t-2 ,...,y1 , İ 0 = 0; ĭ)

(8.170)

t=2

Therefore, the conditional log likelihood function is given by



log > L(ĭ) @ = log f Yt ,Yt-1 ,Yt-2 ,.............Y1 |İ0 =0 (y t ,y t-1 , y t-2 ,......,y1 |İ 0 =0; ĭ) = 



T İ2 T T log(2ʌ)  log(ı 2 )  ¦ t 2 2 2 t = 1 2ı

(8.171)

Now, using the data of y and for particular numerical values of P and I , we can calculate the sequence of İ's from equation (8.168). The conditional log likelihood function (8.171) will be a function of the sum of squares of İ's . Since the conditional log likelihood function is a non-linear function of P and I , the parameters P , I and V 2 can be obtained using a computer software program for iteration for which the conditional likelihood function of the MA(1) process will be numerically optimised. Based on an arbitrary starting value İ 0 = 0, the result of iteration on equation (8.168) will be İ t = (y t  ȝ)  I (y t-1  ȝ)+I 2 (y t-2  ȝ)  .....+(  1) t-1I t-1 (y1  ȝ)+(-1) tI t İ 0

(8.172)

If |I |1 , the consequences of imposing the restriction İ 0 = 0 accumulate over time. In such a situation, the conditional approach is not reasonable. If |I |>1 is to be found from the numerical optimisation of (8.171), the result must be discarded. In such a case, the numerical optimisation of the conditional likelihood function should be attempted again with the reciprocal of Iˆ used as a starting value for the numerical search procedure.

Maximum Likelihood Estimation for a Gaussian MA(q) Process: Conditional Likelihood Function The maximum likelihood estimation technique is discussed for a Gaussian MA(q) process. Let us now consider the Gaussian MA(q) process of the type Yt = ȝ + İ t  I1İ t-1 +I2 İ t-2  ........  Iq İ t-q

(8.173)

where all the roots of 1+I1L+I2 L2 +.......+Iq Lq = 0 lie outside the unit circle and İ t ~IIN(0, ı 2 ). Let ĭ = (ȝ, I1 ,....., Iq , ı 2 )c be a vector of the population parameters ȝ, I1 ,.....Iq , and ı 2 to be estimated. From equation

(8.173), we have Y1 = ȝ + İ1 +I1İ 0 +I2 İ -1  ........  Iq İ1-q

(8.174)

Time Series Econometrics

397

A simple approach is to condition on the assumption that the first q values of İ were all zero, i.e., conditioning on İ 0 = İ -1 = İ -2 =........= İ -q+1 = 0 . Let us now defining E 0 = (İ 0 , İ -1 , İ -2 ,......,İ -q+1 )c be a q u 1 vector . Then, we have Y1 |E 0 =0~N(ȝ, ı 2 ) . Thus, the conditional pdf of Y1 given that E 0 f Y |E 1

0=

0 (y1 |E 0 =

0 is given by

ª  y1  ȝ 2 º exp « » 2ı 2 «¬ »¼ 2ʌı 2 1

0;ĭ) =

ª İ 2 º exp « 12 » 2ʌı 2 ¬ 2ı ¼ 1

=

(8.175)

Next, we consider the distribution of the second observation Y2 conditioning on Y1 = y1 . From equation (8.173), we have (8.176)

Y2 = ȝ + İ 2 +I1İ1 +I2 İ 0  I3 İ -1  ........  Iq İ -q+2

Moreover, for the given observation y1 , the value of İ1 is then known with certainty as well by İ1 = y1  ȝ and İ 0 = İ -1 = İ -2 =........= İ -q+2 = 0 . Hence, we have Y2 |Y1 f Y2 |Y1 ,E0 =0 (y 2 |y1 ,E 0 =0;ĭ) =

=

y1 , E 0 =0~N(ȝ  I1İ1 , ı 2 ) . Therefore, we have

ª  y 2  ȝ  I1İ1 2 º exp « » 2ı 2 «¬ »¼ 2ʌı 2 1

ª İ 2 º exp « 22 » 2ʌı 2 ¬ 2ı ¼ 1

(8.177)

Since İ1 is known with certainty, İ 2 can be calculated from the following equation: İ 2 = y 2  ȝ  I1İ1 .

(8.178)

Since İ 2 is known with certainty, İ 3 can be calculated from the following equation: (8.179)

İ 3 = y3  ȝ  I1İ1  I2 İ 2

Now, continuing this procedure and letting the value of E 0 from {y1 , y 2 ,......,y T } by iterating on the following equation:

0, the full sequence {İ1 , İ 2 ,.....,İ T } can be calculated

(8.180)

İ t = y t  ȝ  I1İ t-1  I2 İ t-2  ........  Iq İ t-q for t = 1, 2, .....,T, and starting from E0 = 0.

The likelihood function (conditional on E0 = 0 ) of the complete sample can thus be calculated as the product of these individuals densities as below: f YT ,YT-1 ,.......,Y1 |E0 =0 (yT ,yT-1 ,.....,y1 |E 0 =0; ĭ) = f Y1 |E0 =0 (y1 |E 0 =0; ĭ)× Yt |Yt-1 ,Yt-2 ,........,Y1 ,E 0 =0

(y t |y t-1, y t-2 ,......,y1 , E 0 =0; ĭ)

(8.181)

Therefore, the conditional log likelihood function is denoted by log( L() ) ) and is given by logL(ĭ) = log f Yt ,Yt-1 ,Yt-2 ,..........,Y1 |E0 =0 (y t ,y t-1 , y t-2 ,......,y1 |E 0 =0; ĭ) = 

T İ2 T T log(2ʌ)  log(ı 2 )  ¦ t 2 2 2 t = 1 2ı

(8.182)

Chapter Eight

398

The MLE of (ȝ, I1, I2,.....,Iq ) can then be obtained by minimising the sum of squared residuals, i.e., Max

ȝ, I1 , I2 ,.....,Iq

log{L(ȝ, I1 , I2 ,....., Iq )} =

Min

T

¦İ I

ȝ, I1 , I2 ,.....,

q

2 t

(ȝ, I1 , I2 ,....., Iq )

(8.183)

t=1

The conditional ML estimate of ı 2 turns out to be ıˆ 2 =

1 T 2 ¦ ݈ t T t=1

(8.184)

It is very important to say that, if you have a sample of size T to estimate an MA(q) process by conditional MLE, you will also use all the T observations of this sample. Analytical expressions for MLE are usually not available due to highly non-linear FOCs. MLE requires to application of numerical optimisation techniques.

Ex. 8-11: Estimate an appropriate MA model for the given problem in Ex. 8-9 using the ML method. Solution: First, we consider the MA(q) model of the type INDX t = c+I1İ t-1 +I2 İ t-2 +.......+Iq İ t-q +İ t

(8.185)

where {İ t } is a Gaussian white noise process and the variable INDX indicates the yearly price index of DSE. Using the software package EViews based on the ML method, it is found that the AIC, SBIC and HQIC criteria are the smallest for the MA(4) model. Therefore, the ML estimates of the MA(4) process are reported in Table 8-7.

Table 8-7: The estimated results of an MA(4) model

Variable c MA(1) MA(2) MA(3) MA(4) SIGMASQ R-squared Adjusted R-squared S.E. of regression Sum squared resid Log-likelihood F-statistic Prob(F-statistic)

Dependent Variable: INDX Method: Maximum Likelihood Method Sample: 1976-2021 Included Observations: 46 Convergence achieved after 91 iterations Coefficient Std. Error t-Statistic 501.6705 208.8537 2.4020 1.1719 0.1065 10.9994 1.0685 0.1672 6.3887 0.8807 0.1550 5.6812 0.7614 0.1180 6.4545 34241.90 7814.418 4.3819 Mean dependent var 0.8722 S.D. dependent var 0.8563 Akaike info criterion 198.4394 Schwarz criterion 1575127 Hannan-Quinn criter. -308.0037 Durbin-Watson stat 54.6200 0.0000

Prob. 0.0210 0.0000 0.0000 0.0000 0.0000 0.0001 474.8196 523.4355 13.6523 13.8909 13.7417 1.6526

Thus, the estimated MA(4) model is ˆ INDX t = 501.6705+1.1719İ t-1 +1.06859İ t-2 +0.8807İ t-3 +0.7614İ t-4

(8.186)

8.12 Methods for Estimating ARMA Models In this section, different estimation methods for ARMA models are described.

Conditional Maximum Likelihood Estimates of the ARMA(p, q) Process In this section, the maximum likelihood estimation technique of an ARMA(p, q) process is discussed. Let us now consider the ARMA( p, q) model of the type Yt = c+ș1Yt-1 +ș 2 Yt-2 +......+ș p Yt-p + İ t  I1İ t-1 +I2 İ t-2  ......  Iq İ t-q

(8.187)

Time Series Econometrics

399

where all the roots of (1  ș1L  ș 2 L2  ........  ș p Lp ) = 0, and (1  I1L +I2 L2  ........  Iq Lq )

0 lie outside the unit

2

circle, and İ t ~IIN(0, ı ) . Let us now define, Ĭ = (c, ș1 , ș 2 ,.....,ș p , I1 , I2 ,.....,Iq , ı 2 )c is a vector of parameters which is to be estimated for the ARMA(p, q) model. The approximation to the likelihood function for an autoregressive process conditional on the initial p values of Y, i.e., conditioning on Y0 = (Y0 , Y-1 , Y-2 ,......,Y-p+1 )c = 0 and the approximation to the likelihood function for a moving average process conditioned on the initial q values of İ , i.e., conditioning on İ0 = (İ 0 ,İ -1 ,......,İ -q+1 )c= 0 . A common approximation to the likelihood function for an ARMA(p, q) process conditions on both Y’s and İ's . Now, for the (p+1)th observation, equation (8.187) can be written as (8.188)

Yp+1 = c+ș1Yp +ș 2 Yp-1 +......+ș p Y1 + İ p+1 +I1İ p +I2 İ p-1  ......  Iq İ p-q+1

Now, conditioning on Y1 = y1 , Y2 = y 2 , Y3 = y3 ,......., and Yp = y p , and setting İ p = İ p-1 =.........= İ p-q+1 = 0, we have Yp+1 ~ N( c+ș1Yp +ș 2 Yp-1 +........+ș p Y1 , ı 2 ) .

Then, the conditional log-likelihood function can be calculated from t = p+1, p+2,……,T and is given by log(L(Ĭ) = log f(y t , y t-1 ,...,y p+1 |Y1 = y1 , Y2 = y 2 ,.....,Yp = y p ;İ p = İ p-1 =.....=İ p-q+1 =0;Ĭ) 1 T İ 2t ªT  pº ªT  pº 2 =« log(2ʌ) log(ı )   ¦ » « 2 » 2 t=p+1 ı 2 ¬ 2 ¼ ¬ ¼

(8.189)

where the sequence {İ p+1 , İ p+2 ,.....,İ T } can be calculated from {y1 , y 2 ,......,y T } iterating on (8.190)

İ t = Yt  c  ș1Yt-1  ș 2 Yt-2  .......  ș p Yt-p  I1İ t-1  I2 İ t-2  .....  Iq İ t-q where t=p+1, p+2,……...,T.

It is important to note that, if you have a sample of size T to estimate an ARMA(p, q) process by conditional MLE, you can use only the T-p observations of this sample. Now the maximisation of equation (8.189) with respect to c, ș1 ,.....,ș p , I1 ,......, Iq is equivalent to minimising the term T

¦ y

t

t=p+1

 c  ș1 y t-1  ....  ș p y t-p  I1İ t-1  I2 İ t-2  .....  Iq İ t-q

2

(8.191)

which refers to minimising the residual sum of squares of the regression equation of y t on the constant term c and its own p-lagged values and q-lagged values of the random error term. Thus, applying the OLS method to the regression equation Yt = c+ș1Yt-1 +ș 2 Yt-2 +........+ș p Yt-p + İ t +I1İ t-1 +I2 İ t-2  ........  Iq İ t-q , we can obtain the OLS estimators of c, ș1 ,.....,ș p , I1 ,......, Iq . The conditional maximum likelihood estimate of ı 2 is obtained by taking the differentiation of equation (8.189) with respect to V 2 and then equating to zero. Thus, differentiating equation (8.189) with respect to ı 2 and then equating to zero and then solving the equation, we have ª y  cˆ  șˆ y  ....  șˆ y  Iˆ İ  Iˆ İ  ........  Iˆ İ t 1 t-1 p t-p 1 t-1 2 t-2 q t-q 2 ıˆ = ¦ «« Tp t=p+1 ¬« T





2

º » » ¼»

(8.192)

Thus, it can be said that the maximum likelihood estimate of ı 2 is the average squared residual from the regression equation (8.187).

Note: From equations (8.171), (8.182), and (8.189), we see that all the conditional log-likelihood functions take a T İ2 T* T* concise form of the type  log(2ʌ)  log(ı 2 )  ¦ t 2 , where T* is the appropriate total sample observations 2 2 t=t* 2ı

Chapter Eight

400

ˆ is and t * is the first sample observation used, respectively. The solution to the conditional log-likelihood function 4 ˆ also called the conditional sums of the squared estimator, CSS, denoted as 4CSS .

Ex. 8-12: An appropriate ARMA model is estimated using the maximum likelihood method based on the per capita GDP (constant 2015 USD) of Bangladesh from 1971-20212. First, we consider the ARMA(p, q) model of the type PGDPt = c+ș1PGDPt-1 +ș 2 PGDPt-2 +......+ș p PGDPt-p + İ t  I1İ t-1 +I2 İ t-2  ......  Iq İ t-q

(8.193)

where {İ t } is a Gaussian white noise process and the variable PGDP indicates the per capita GDP (constant 2015 USD) of Bangladesh. The ML estimates of ARMA models with different orders are obtained using the software package EViews. From the estimated results, it is found that the AIC, SBIC and HQIC criteria are the smallest for the ARMA(2, 2) model. Therefore, the ML estimates of the ARMA(2, 2) process are reported in Table 8-8.

Table 8-8: The ML estimates of an ARMA(2, 2) model

Variable C AR(1) AR(2) MA(1) MA(2) SIGMASQ R-squared Adjusted R-squared S.E. of regression Sum squared resid Log likelihood F-statistic Prob(F-statistic)

Dependent Variable: PGDP Method: ARMA Maximum Likelihood Method Sample: 1971-2021 Included Sample Observations: 51 Convergence achieved after 50 iterations Coefficient Std. Error t-Statistic 1608.747 997.6221 1.6126 1.9883 0.0162 122.9358 -0.9903 0.0151 -65.5394 -1.1421 0.2833 -4.0310 0.5353 0.2850 1.8783 260.8816 55.2717 4.7199 Mean dependent var 0.9980 S.D. dependent var 0.9977 Akaike info criterion 17.1949 Schwarz criterion 13304.96 Hannan-Quinn criter. -220.1857 Durbin-Watson stat 4421.135 0.0000

Prob. 0.1138 0.0000 0.0000 0.0002 0.0668 0.0000 742.0574 361.9171 8.8696 9.0969 8.9565 1.4064

Therefore, the estimated ARMA(2, 2) model is ˆ = 1608.747+1.9883PGDP -0.9903PGDP -1.1421İ +0.5353İ ½ PGDP t t-1 t-2 t-1 t-2 ° t-Test: 1.6126 122.9358 -65.5394 -4.0310 1.8783 ° ° 0.2833 0.2850 SE: 997.6221 0.0162 0.0151 ¾ ° R 2 : 0.9980 ° °¿ ıˆ 2 : 260.8816

(8.194)

The Box-Jenkins Approach to Estimate ARMA Model The Box-Jenkins (1976) approach is one of the most popular and widely used methodologies to analyse time-series data in a systematic manner. It is very popular for analysing time-series data because: (i) It is applicable either for stationary or nonstationary time series variables; (ii) It can be used with or without seasonal elements; (iii) It has well-documented computer programs; (iv) This approach is practical and pragmatic; The Box-Jenkins approach is widely applicable for estimating the AR(p), MA(q), ARMA (p, q) and ARIMA(p, d, q) models. Sometimes, the ARMA models are referred to as the Box-Jenkins models. The major steps namely: identification, estimation, diagnostic checking, and forecasting and controlling are involved in estimating an

2

Source: WDI, 2022

Time Series Econometrics

401

ARMA(p, q) model using the Box-Jenkins approach. To estimate the ARMA(p, q) model using the Box-Jenkins approach, the following steps are involved:

Step 1: If necessary, transform the original data to achieve stationarity. Step 2: To capture the dynamic features of the data, it is necessary to determine the order of the ARMA model. For the specification of the model, graphical procedures are applied. By plotting the data over time and plotting the autocorrelation functions (ACFs) and partial autocorrelation functions (PACFs), we can determine the order of the ARMA model. We can also use the AIC, SBIC and HQIC criteria to determine the appropriate order of the ARMA model. Step 3: The OLS estimates or the ML estimates of the parameters of the identified ARMA model can be obtained using an appropriate software package namely: EViews, Python, R, RATS, SPSS, STATA, etc. Step 4: This step involves model checking, i.e., diagnostic analysis to confirm whether the specified and estimated model is consistent with the observed feature of the data. Box and Jenkins have suggested two methods for diagnostic checking namely: (i) overfitting and (ii) residuals diagnostics. Overfitting means that we have to fit an ARMA model with greater orders than it is required to capture the dynamics of the data as identified in step 2. If the model is correctly and adequately specified in step 2, and if we add any extra term in the ARMA model, it would be insignificant. Residual diagnostic means checking the residuals for evidence of linear dependence. If there exists the linear dependency of the residuals, the model which is specified in step 2 is not adequate to capture the dynamics of the data. The ACF, PACF or Ljung-Box tests are applied for residual diagnostics. If the model is not adequate, go back to step 2 for re-estimation. Step 5: Finally, we use the model for forecasting and control. Transform the original data to achieve it as stationary Identify the model Estimate the parameters of the identified model Diagnostic checking. Is the model adequate?

No

Yes Use the model for forecasting

Fig. 8-7: The Box-Jenkins methodology for an ARMA(p, q) model

Choosing a Model Before estimating any model, it is common to estimate autocorrelation and partial autocorrelation functions directly from the data. These will give you an idea about which model might be appropriate for the given data. After one or more models are estimated, their quality can be judged by checking whether the residuals have more or less white noise and by comparing them with alternative specifications. These comparisons can be based on statistical significance tests or the use of particular model selection criteria.

Diagnostic Checking of the ARMA(p, q) Model If an ARMA(p, q) model is chosen on the basis of the sample ACF and PACF, we also estimate an ARMA(p+1, q) and ARMA(p, q+1) model, and test the significance of the additional parameters. The significance of the residual 1 autocorrelations is often checked by comparing them with approximately two standard-error bounds r1.96 u , T where T is the sample size. If the sample autocorrelation coefficient ȡˆ k falls outside the region for a given value of k, where k is the number of lags, then the null hypothesis H 0 : ȡ k = 0, will be rejected at a 5% level of significance.

Chapter Eight

402

Thus, the term y t-k will be added to the model. Also, to check the overall acceptability of the autocorrelations of the time-series variable y, the Ljung-Box (1978) Q test statistic is applicable which is given by k

Q = T(T+2) ¦

ȡˆ 2j

j=1 (T  j)

2 [ where k > p+q] ~ Ȥ (k-p-q)d.f

(8.195)

Where ȡˆ j is the estimated autocorrelation coefficients of the time-series variable y and k is chosen by the researchers. This test statistic is very useful to test for linear dependence of time-series data.

Ex. 8-13: An appropriate ARMA model is estimated using the Box-Jenkins approach based on the time-series data carbon dioxide emission (in kt) of Japan from 1970-20193. The following steps are involved in estimating an ARMA model for the given data. Step 1: First, we transform the given variable to a new variable Y such that Yt = ln(CO2t )  ln(CO2t-1 ) to achieve stationarity, where the variable CO2 t indicates carbon dioxide emission in kt of Japan at time t. From the graphical presentation of ACF and PACF, we can conclude whether the transformed series is stationary or not. The estimated autocorrelation functions (ACF) and partial autocorrelation functions (PACF) of different lags of the variable Y are given in Table 8-9. Table 8-9: Estimated ACF and PACF for different lags of the variable Y Lags 1 2 3 4 5 6 7 8 9 10

ACF 0.4660 0.4607 0.4570 0.4042 0.3031 0.3764 0.3449 0.2098 0.2919 0.1896

PACF 0.4660 0.3111 0.2321 0.1097 -0.0439 0.1304 0.0794 -0.1223 0.0773 -0.0869

The graphical presentation of these estimated ACF and PACF is shown below: 1.00 0.75 0.50 0.25 0.00 -0.25 -0.50 -0.75 -1.00 0

1

2

3

4

5 ACF

6

7

8

9

10

PACF

Fig. 8-8: Correlogram of the time series variable Y.

Since the correlogram of the series Y does not drop off as k, the number of lags, becomes large, thus the variable Y is nonstationary. That is why we have to plot the ACF and PACF for the first difference of the variable Y that is 'Yt = Yt -Yt-1 which are given below in Table 8-10.

3

Source: WDI

Time Series Econometrics

403

Table 8-10: Estimated ACF and PACF for different lags of ǻY Lags 1 2 3 4 5 6 7 8 9 10

ACF -0.4708 -0.0234 0.0437 0.0531 -0.1725 0.0798 0.1122 -0.2162 0.1610 -0.0833

PACF -0.47077 -0.31484 -0.16967 -01054 -0.1869 -0.1409 0.0581 -0.1620 0.0020 -0.1053

The graphical presentation of these estimated ACF and PACF is shown below: 1.00 0.75 0.50 0.25 0.00 -0.25 -0.50 -0.75 -1.00 0

1

2

3

4

5 ACF

6

7

8

9

10

PACF

Fig. 8-9: Correlogram of the variable 'Yt

Since the correlogram of the series 'Yt drops off as k, the number of lags, becomes larger, and the variable 'Yt is stationary.

Step 2: After getting the stationary time-series data by differencing the original series Y, we examine the correlogram to decide on the appropriate orders of AR and MA components. The correlogram of an MA process is zero after a point. Thus, the ARMA(0, 1) model is specified for the variable 'Yt . Step 3: The information criteria are also calculated for different orders of ARMA models which are presented in Table 8-11. Table 8-11: Information criteria to select the ARMA model of the variable 'Yt

AR p MA o

0

0 1 2 3

-168.3828 -180.7422 -184.6775 -184.3541

AIC 1

2

3

-190.0287* -188.0287 -176.7438 -184.2713 SBIC 1

-188.0287 -186.0325 -184.0403 -183.2441

-186.0668 -184.0392 -182.3175 -182.9557

2

3

-183.9078 -179.8512 -175.7985 -172.9419

-179.8854 -175.7974 -172.0153 -170.5930

2

3

-186.4236 -183.6247 -180.8300 -179.2312

-183.6590 -180.8289 -178.3046 -178.1402

AR p MA o

0

0 1 2 3

-168.3828 -178.6817 -180.5566 -178.1727

AR p MA o

0

-187.9683* -183.9079 -170.5625 -176.0295 HQIC 1

0 1 2 3

-168.3828 -179.9396 -183.0723 -181.9463

-189.2262* -186.4236 -174.3361 -181.0610

Chapter Eight

404

From the estimated values of these criteria, it can be said that the ARMA(0, 1) model is appropriate for the variable 'Yt .

Step 4: In this step, the identified model will be estimated using the Box-Jenkins approach. The estimated results for the ARMA(0, 1) model are given in Table 8-12. Table 8-12: Estimated results for an ARMA(0, 1) model Box-Jenkins - Estimation by LS Gauss-Newton Convergence in 9 Iterations. Final criterion was 0.0000081 0 and Į1 > 0 . Under these assumptions, the conditional mean and variance for İ t and Yt are given below: E > İ t |İ t-1 @ = 0 ,  t. Thus, we have E(İ t ) = 0, and we have

(8.208)

E(Yt ) = X ct ȕ

Also, we have Var > İ t |İ t-1 @ = E ª¬İ 2t |İ t-1 º¼ 2 = E(v 2t ) D 0  D1H t-1

= Į 0 +Į1İ 2t-1

(8.209)

We have E(Yt |Yt-1 ) = X ct ȕ, and Var(Yt |Yt-1 ) = Į 0 +Į1İ 2t-1 . Furthermore, both İ t |İ t-1 and Yt |Yt-1 are normally distributed. However, the marginal (unconditional) densities for İ t and Yt will not be normal. Here, İ t is heteroscedastic conditional on İ t-1 . The unconditional variance of İ t is given by Var(İ t ) = E ª¬ Var İ t |İ t-1 º¼ = E ª¬ Į 0 +Į1İ 2t-1 º¼

= Į 0 +Į1E(İ 2t-1 ) = Į 0 +Į1Var(İ t-1 )

(8.210)

Time Series Econometrics

407

The unconditional variance remains unchanged over time if the process is variance stationary. Then, we have Var(İ t ) = Į 0 +Į1Var(İ t ), [ since Var(İ t ) = Var(İ t-1 )] Var(İ t ) =

Į0 1  Į1

(8.211)

It can be shown that the İ t 's are uncorrelated so that we have E(İ t ) = 0,  t, and Var(İ t ) = constant. Therefore, the classical assumptions of a linear regression model are satisfied. Thus, the least-squares estimator of ȕ such that ȕˆ = (X cX) -1X cY is the best linear unbiased estimator, and the usual estimator for the covariance matrix of ȕˆ , is valid. However, by using the maximum likelihood technique, it is possible to find a non-linear estimator that is asymptotically more efficient than the least-squares estimators. Conditioning on an initial observation Y0 , the likelihood function for Į = Į 0

Į1 c , and ȕ can be written as

T

L = – f(Yt |Yt-1 )

(8.212)

t=1

Using the results and ignoring a constant factor, the log-likelihood function can be written as logL = 

= 

T ª (Y -Xcȕ) 2 º 1 T log ¬ª Į 0 +Į1İ 2t-1 ¼º  ¦ « t t 2 » ¦ 2 t=1 t=1 ¬ Į 0 +Į1İ t-1 ¼ T ª º (Yt  X ct ȕ) 2 1 T 2 log ªĮ 0 +Į1 Yt-1  X ct-1ȕ º  ¦ « » ¦ 2 ¬ ¼ t=1 « Į +Į Y  X c ȕ » 2 t=1 0 1 t-1 t-1 ¬ ¼

(8.213)

A maximum likelihood estimator for Į and ȕ can be obtained by maximising the log-likelihood function log(L). Alternatively, we can find an asymptotically equivalent (FGLS) estimator that is based on the scoring algorithm using least-squares computer programs Engle (1982). It involves the following five steps:

Step 1: Regress Y on X’s of the type: Yt = x ct ȕ+İ t

(8.214)

-1 and then, find the least-squares estimator of ȕ which is given by ȕˆ = X cX X cY and corresponding residuals e = Y  Xȕˆ . Here, we assume that (T+1) observations (Y0 , Y1 ,.......,YT ) and (X 0 , X1 ,........,X T ) are available.

Step 2: Regress e 2t on e 2t-1 of the type e 2t = Į 0 +Į1e2t-1 + v t

(8.215)

and then obtain initial estimates of Į 0 and Į1 using T observations, t = 1, 2,……,T. Let us define, Įˆ = Įˆ 0

Įˆ 1 c is a

(2 u 1) vector. We then find an initial estimate of the conditional variance h 0t = Įˆ 0 +Įˆ 1e 2t-1 , for t = 1, 2,……….,T.

(8.216)

Step 4: We compute the asymptotically efficient estimator for D such that aˆ asy = Įˆ +d Į

(8.217)

ª e2 º e2 1 where dD is the vector of the least-squares estimators from a regression equation of « t0  1» on 0 and t-10 . That ht ht ¬ ht ¼ is,

e 2t e 2t-1 1 1 d +d +u t  0 1 h 0t h 0t h 0t

(8.218)

Chapter Eight

408

Thus, we have d Į = dˆ 0



-1 c dˆ 1 . The asymptotic covariance matrix for Dˆ is given by 2 QcQ , where Q is the



regressor matrix in this regression.

Step 5: Given that aˆ = aˆ 0

aˆ 1 c , we compute h1t = aˆ 0 + aˆ 1e 2t-1 , for t=1, 2,…….,T. Now, we compute

1

2 ª1 § aˆ e · º 2 rt = « 1  2 ¨ 11 t ¸ » ; t = 1, 2,……..,(T-1). «¬ h t © h t+1 ¹ »¼

(8.219)

and, ª1 ·º aˆ § e 2 s t = « 1  11 ¨ 1t+1  1¸ » , t = 1, 2,…….,(T-1). «¬ h t h t+1 © h t+1 ¹ »¼

An asymptotic estimator of ȕ is given by ˆ ȕˆ asy. = ȕ+d ȕ

(8.220)

where dȕ is the vector of the least-squares estimators from a regression equation of



dȕ = X*c X*



-1

et s t on X t rt . Thus, rt

X*c Y* , where X* and Y* are respectively {(T-1) u k}, and {(T-1) u 1} with tth row equal to

et s t . The asymptotic covariance matrix of ȕˆ asy. is given by X*c X* rt matrix in this regression.



X*tc = X ct rt and Yt*



-1

, where X* is the regressor

This procedure is asymptotically equivalent to the maximum likelihood method if desirable further sets of estimates of Į and ȕ can be obtained by using (8.217) and (8.220).

Testing for ARCH Effects The following steps are involved in determining whether the ARCH effects are present in the residuals or not.

Step 1: First, we regress Y on X’s of the type (8.221)

Yt = x ct ȕ + İ t

where x t

1

x1t

x 2t

..... x kt c and ȕ = ȕ 0

ȕ1 ȕ 2

..... ȕ k c .

Step 2: Second, we apply the OLS method to run equation (8.221) and then we determine the least-squares estimator -1 of ȕ which is given by ȕˆ = X cX X cY and corresponding residuals e = Y  Xȕˆ . Step 3: Third, we regress e 2t on its q own lags of the type e 2t = Į 0 +Į1e 2t-1 +Į 2 e 2t-2 +.......+Į q e 2t-q +v t

(8.222)

and we set up the null hypothesis to test the effects of ARCH coefficients. The null hypothesis to be tested is H 0 : Į1 = Į1 =......= Į q = 0

against the alternative hypothesis H1: At least one of them is not zero.

Step 4: Then, we calculate the value of the test statistic. Under the null hypothesis, the test statistic is given by LM = T u R e2 ~Ȥ q2

(8.223)

Time Series Econometrics

409

where T is the number of observations and R e2 is the coefficients of determination from equation (8.222).

Step 5: Finally, we will make a decision whether the null hypothesis will be accepted or rejected by comparing the calculated value of the test statistic with the table value. If the calculated value of the test statistic is greater than the table value at a given level of significance with q degrees of freedom, the null hypothesis will be rejected. The rejection of the null hypothesis implies that the ARCH model will be of order q. Ex. 8-14: The data given below are the DSE index (INDX, base year 1990) and money supply (M2, in million BDT) of Bangladesh over a period of time. Table 8-13: DSE index (the base year 1990) and money supply of Bangladesh Year 1976 1977 1978 1979 1980 1981 1982 1983 1984 1985 1986 1987 1988 1989 1990 1991 1992 1993 1994 1995 1996 1997 1998

INDX 27.22 29.96 32.97 36.28 39.93 43.94 48.36 53.22 58.57 64.45 70.93 126.43 145.67 133.47 100.00 83.68 88.57 104.07 187.51 211.03 381.55 288.38 163.89

M2 14421 17154 21283 26883 32944 39862 46564 52189 73242 99829 113488 131937 156755 178166 211370 233352 265340 297274 328530 392048 439677 486873 534308

Year 1999 2000 2001 2002 2003 2004 2005 2006 2007 2008 2009 2010 2011 2012 2013 2014 2015 2016 2017 2018 2019 2020 2021

INDX 130.60 151.87 178.58 213.79 215.36 360.83 359.60 313.34 522.16 639.03 674.91 1437.18 1295.25 1001.93 1028.91 1231.87 1212.01 1196.41 1527.86 1446.68 1357.51 1185.00 1640.96

M2 595294 687394 820385 1173131 1343240 1532211 1748702 2025231 2434644 2764498 3218075 3869555 4685213 5477734 6409569 7361378 8510505 9777918 11410611 12978360 14471672 16194563 18313603

Source: INDX, Own Calculations, M2 from WDI 2020.

Solution: First, the F-test and LM tests are applied to detect the presence of ARCH effects in the residuals of the estimated model ˆ ½ INDX t = 173.9229+9.75E-05M2 t ° t-Test: 4.1696 13.2346 ¾ SE: 41.71246 7.37E-06 °¿

(8.224)

Table 8-14: The F and LM Tests Results for the ARCH Effects F-test LM Test

9.5862 8.2032

Prob F(1, 43) Prob. Chi-square (1)

0.0034 0.0042

Both tests’ results support the presence of ARCH in the residuals of the estimated equation. The test results are given below: Let us now consider the ARCH model of the type

Chapter Eight

410

INDX t = ȕ 0 +ȕ1M2t +İ t ; where İ t ~ N(0, ı 2t )

(8.225)

ı 2t = Į 0 +Į1İ 2t-1 +Į1İ 2t-2 +....+Į p İ 2t-p

(8.226)

From the ML estimates of ARCH models of different orders, it is found that the AIC, SBIC and HQIC criteria are the smallest for an ARCH(1) model. The final estimated results for the ARCH(1) model are given in Table 8-15.

Table 8-15: The ML estimates of an ARCH(1) model

Variable Constant M2 Constant İ 2t-1 R-squared Adjusted R-squared S.E. of regression Sum squared resid Log-likelihood Durbin-Watson stat

Dependent Variable: INDX Method: ML - ARCH (Marquardt) - Normal distribution Convergence achieved after 96 iterations Presample variance: backcast (parameter = 0.7) Coefficient Std. Error z-Statistic 82.7031 74.0583 1.1167 0.00012 1.49E-05 8.0524 Variance Equation 35361.55 15362.57 2.3018 1.3447 0.7345 1.6112 0.7535 Mean dependent var 0.7479 S.D. dependent var 262.7996 Akaike info criterion 3038799 Schwarz criterion -305.3448 Hannan-Quinn criter. 0.3997

Prob. 0.2641 0.0000 0.0213 0.1071 474.8196 523.4355 13.4498 13.6088 13.5093

1,200,000 1,000,000 800,000 600,000 400,000 200,000 0 1980

1985

1990

1995

2000

2005

2010

2015

2020

Conditional variance

Fig. 8-12: Plot of one-step conditional variance for an ARCH(1) model

From the estimated results in Table 8-15, it is found that the effect of İ 2t-1 on ı 2t is statistically significant at a 15% significance level for the variable INDX. The significance of Į1 supports the hypothesis that the conditional volatility changes over time due to the volatility clustering effect. Since Įˆ 0 = 35361.55 > 0 and Įˆ 1 = 1.3447 > 0 , the implied variances are positive. Also, it is found that the money supply has a significant effect on the price index at DSE.

8.14 Generalised Autoregressive Conditional Heteroscedasticity (GARCH) Model The GARCH model was developed independently by Bollerslev and Taylor in 1986. The GARCH model is an econometric model which is used in analysing time-series data where the conditional error variance is believed to be serially autocorrelated, i.e., the conditional error variance depends on its own lags. Although the generalised autoregressive conditional heteroscedasticity (GARCH) models can be used in the analysis of a number of different types of financial data, such as macroeconomic data, financial institutions typically use them to estimate the volatility of stock returns, bonds, and market indices etc. They use the resulting information in decision-making about the pricing of different stocks and judge which assets/stocks will potentially provide higher returns, and also in forecasting the returns of current investments to help in their asset allocation, hedging, risk management, and portfolio optimisation decisions. The GARCH models are used when the variance of the error terms is not constant. That is, the error terms are heteroscedastic. Heteroscedasticity describes the irregular pattern of variation of a random error term, or a variable, in an econometric model. Essentially, wherever there is a heteroscedastic problem, observations do not conform to a linear pattern. Instead, they tend to cluster. Therefore, if econometric models that assume constant variance are used on this data, the conclusions and predictive values one can draw from the model will not be reliable.

Time Series Econometrics

411

The variance of the error term in a GARCH model is assumed to vary systematically, conditional on the average size of the error terms in previous periods. In other words, it has conditional heteroscedasticity, and the reason for the heteroscedasticity is that the error term follows an autoregressive moving average pattern. This means that it is a function of an average of its own past values.

Definition: Let us now consider the econometric model for a conditional mean of the type y t = x ct ȕ+İ t , where, İ t ~ N(0, ı 2t )

(8.227)

x t may contain the lag(s) of y t and/or dummies for special features of the market. The GARCH model assumes that the conditional variance of the random error terms follows an autoregressive moving average process. The simplest form of the GARCH model is given by ı 2t

D 0  D1İ 2t-1  O1ı 2t-1

(8.228)

The model (8.228) is called the GARCH(1, 1) model, it is also called the simplest form of a conditional-variance equation. Since the conditional variance ı 2t is non-negative, the model requires the coefficients to be non-negative. Using the GARCH model, we can interpret the current fitted variance ıˆ 2t as a weighted function of a long-term average value (dependent on Į 0 ), information about volatility during the previous period ( D1İ 2t-1 ), and the fitted variance from the model during the previous period ( O1ı 2t-1 ). It is useful to define the surprise in the squared innovations as u t = İ 2t  E(İ 2t | İ t-1 , İ t-2 ,........) u t = İ 2t  ı 2t ı 2t

İ 2t  u t

(8.229)

Thus, equation (8.228) can be rewritten as İ 2t  u t İ 2t

Į 0 +Į1İ 2t-1  O1 (İ 2t-1  u t-1 )

D 0  (D1 +O1 )İ 2t-1  O1u t-1  u t

(8.230)

Equation (8.230) is called an ARMA(1, 1) process for the squared errors. Thus, it can be said that the GARCH model can be expressed as an ARMA process for the conditional variance. The GARCH (1, 1) model can be extended to a GARCH (p, q) model, where the conditional variance at time t is parameterised to depend upon q lags of the squared error and p lags of the conditional variance. The GARCH( p, q) model is given by ı 2t = Į 0 +Į1İ 2t-1 +Į 2 İ 2t-2 +......+Į q İ 2t-q +Ȝ1ı 2t-1 +Ȝ 2 ı 2t-2 +......+Ȝ p ı 2t-p q

p

i=1

j=1

= Į 0 +¦ Įi İ 2t-i +¦ Ȝ jı 2t-j

(8.231)

But in most cases, the GARCH(1, 1) model will be sufficient to measure the volatility clustered in the data. You will find a very rare situation where the higher-order GARCH model will be applicable to measure the volatility clustered in the data of finance or macroeconomics.

Show that the GARCH (1, 1) Model Can be Converted into ARCH(f) Proof: The GARCH(1, 1) model is given by ı 2t = Į 0 +Į1İ 2t-1 +Ȝ1ı 2t-1

(8.232)

For a time period (t-1), equation (8.232) can be written as ı 2t-1 = Į 0 +Į1İ 2t-2 +Ȝ1ı 2t-2

Substituting the value of ı 2t-1 in equation (8.233), it can be written as

(8.233)

Chapter Eight

412

ı 2t = Į 0 +Į1İ 2t-1 +Ȝ1 (Į 0 +Į1İ 2t-2 +Ȝ1ı 2t-2 ) = Į 0 +Į1İ 2t-1 +Ȝ1Į 0 +Ȝ1Į1İ 2t-2 +Ȝ12 ı 2t-2

(8.234)

Again, for a time period (t-2), equation (8.232) can be written as ı 2t-2 = Į 0 +Į1İ 2t-3 +Ȝ1ı 2t-3

(8.235)

Again, substituting the value of ı 2t-2 in equation (8.234), we have ı 2t = Į 0 +Į1İ 2t-1 +Ȝ1Į 0 +Ȝ1Į1İ 2t-2 +Ȝ12 Į 0 +Į1İ 2t-3 +Ȝ1ı 2t-3

= Į 0 +Į1İ 2t-1 +Ȝ1Į 0 +Ȝ1Į1İ 2t-2 +Ȝ12 Į 0 +Ȝ12 Į1İ 2t-3 +Ȝ13 ı 2t-3 = Į 0 1+Ȝ1 +Ȝ12 +Į1İ 2t-1 +Ȝ1Į1Lİ 2t-1 +Ȝ12 Į1L2 İ 2t-1 +Ȝ13 ı 2t-3 = Į 0 1+Ȝ1 +Ȝ12 + Į1İ 2t-1 1+Ȝ1L+Ȝ12 L2 +Ȝ13 ı 2t-3

(8.236)

Continuing this procedure for an infinite number of successive substitutions, we have ı 2t

D 0 1  O1  O12  ......  D1İ 2t-1 1  O1L  O12 L2  .......  O1f ı 02

(8.237)

The first term on the RHS of (8.237) is a constant term, and as the number of observations tends to infinity, then O1f o 0. Hence, equation (8.236) can also be written as ı 2t = į 0 +Į1İ 2t-1 1+Ȝ1L+Ȝ12 L2 +....... = į 0 +į1İ 2t-1 +į 2 İ 2t-2 +į3 İ 2t-3 +.........

(8.238)

Equation (8.238) indicates an ARCH model of order infinity. Thus, it can be said that the GARCH model is better than the ARCH model because we see that the GARCH(1, 1) model containing only three parameters in the conditional variance equation is a very parsimonious model, that allows an infinite number of past squared errors to influence the current conditional variance.

Show that the GARCH Model is a Generalisation of the ARCH(f) Proof: The ARCH(f) model can be written as ı 2t = Į 0 +Į1İ 2t-1 +Į 2 İ 2t-2 +Į3 İ 2t-3 +........

(8.239)

Now, imposing a condition of the lagged coefficients of equation (8.236) of the type Į j = Į1O1j-1 , i.e., geometric lag structure, equation (8.239) can be written as ı 2t = Į 0 +Į1İ 2t-1 +Į1Ȝ1İ 2t-2 +Į1Ȝ12 İ 2t-3 +........ (D 0  O1D 0 )  Į1İ 2t-1 +Ȝ1{Į 0 +Į1İ 2t-2 +Į1Ȝ1İ 2t-3 +........}

(8.240)

Since ı 2t-1 = Į 0 +Į1İ 2t-2 +Į1Ȝ1İ 2t-3 +........ , equation (8.240) can be written as ı 2t = į+Į1İ 2t-1 +Ȝ1ı 2t-1

(8.241)

where G Į 0  O1Į 0 . Equation (8.241) is called a GARCH(1, 1) model. Thus, it can be said that the GARCH(1, 1) model is a generalisation of the ARCH(f). It can be viewed as a special case of the GARCH(p, q) model, where p is the number of lags of ı 2 and q is the number of lags of İ 2 . If the GARCH model is said to be stationary when Į1 +O1  1 , if Į1 +O1 t 1 , the GARCH model is said to be the integrated GARCH model or the IGARCH model.

Time Series Econometrics

413

Estimation of a GARCH (1, 1) Model The GARCH model is very similar to the ARCH model. The difference is that the lagged conditional variance is allowed to affect the conditional variance. The estimation technique is very much similar to the ARCH model. Here, the ML method is discussed to estimate the GARCH model. In this estimation technique, we only focus on the GARCH(1, 1) model. Let us consider the GARCH(1, 1) model of the type y t = x ct ȕ+İ t

½ ° ı = Į 0 +Į İ +Ȝ1ı ¾ İ t = ı t z t , z t ~N(0, 1) °¿ 2 t

2 1 t-1

2 t-1

(8.242)

x t may contain the lag(s) of y t and/or dummies for special features of the market. Į 0 >0, Į1 >0, and O1 ! 0 .

Equation (8.242) specifies that y t as a function of x tc (1, x1t , x 2t ,....., x kt )  I t-1 . In a simple specification, x t (1, y t-1 , y t-2 ,....., y t-k )c  I t-1 such that the model is univariate. The conditional mean of y t is given by E(y t |I t-1 ) = x ct ȕ+E(ı t z t |I t-1 ) = x ct ȕ+0 = x ct ȕ

(8.243)

The conditional variance of y t is given by Var(y t | I t-1 )

Var(İ t |I t-1 ) = Var(ı t z t |I t-1 ) = ı 2t Var(z t |I t-1 ) = ı 2t

(8.244)

Thus, we have y t |I t-1 ~N(x ct ȕ, ı 2t ) . The conditional log-likelihood function for the sample observations is given by log(L)



T 1 T 1 T (y  x cȕ) 2 log(2ʌ)  ¦ log(ı 2t )  ¦ t 2 t 2 2 t=1 2 t=1 ıt



T 1 T log(2ʌ)  ¦ log ª¬Į 0 +Į1 ((y t-1  x ct-1ȕ) 2 +Ȝ1ı 2t-1 º¼  2 2 t=1 (y t  x ct ȕ) 2 1 T ¦ 2 t=1 (Į 0 +Į1 ((y t-1  x ct-1ȕ) 2 +Ȝ1ı 2t-1 )

(8.245)

A maximum likelihood estimator for the parameters Į = (Į 0 , Į1 , Ȝ1 )c and ȕ can be obtained by maximising the loglikelihood function log(L). Since there are generally no close-form solutions for this problem, the maximum is found through numerical optimisation.

Ex. 8-15: The GARCH(1, 1) model is estimated for the market returns using the monthly price index of DSE. The following GARCH (1, 1) model is considered for estimation: R t = ȕ 0 +İ t , where İ t ~ N(0, ı 2t )

(8.246)

ı 2t = Į 0 +Į1İ 2t-1 +Ȝ1ı 2t-1

(8.247)

where R t is the market returns at time t which is given by

Chapter Eight

414

ªP º R t = ln « t » ×100 ¬ Pt-1 ¼

(8.248)

where Pt is the month price index of DSE at time t, The GARCH (1, 1) model is estimated for market returns based on the data from January 1990 to September 2020 using the software package STATA . The results are given in Table 8-17.

Table 8-16: Estimated results of a GARCH(1, 1) model

500 0

Conditional variance, one-step

1000

(setting optimization to BHHH) Iteration 6: log-likelihood = -1263.8433 Iteration 0: log-likelihood = -1280.5108 Iteration 7: log-likelihood = -1263.8421 Iteration 1: log-likelihood = -1266.5761 Iteration 8: log-likelihood = -1263.8421 Iteration 2: log-likelihood = -1264.761 Sample: 1990m2 - 2020m9 Iteration 3: log- likelihood = -1264.1841 Number of obs = 368 Iteration 4: log-likelihood = -1263.9637 Distribution: Gaussian (switching optimization to BFGS) Log-likelihood = -1263.842 Iteration 5: log-likelihood = -1263.8797 Variable Coefficient Std. Error z-Statistic Prob. [95% Conf. Interval] Constant 0.4289 0.3858 1.11 0.2660 [-0.3272, 1.1849] Variance Equation Constant 14.3724 3.5685 4.03 0.000 [7.3782, 21.3665] RESID(-1)^2 0.2891 0.0542 5.34 0.000 [0.1830, 0.3953] GARCH(-1) 0.5147 0.0895 5.75 0.000 [0.3393, 0.6902]

1990m1

2000m1

t

2010m1

2020m1

Fig. 8-13: Plot of one-step conditional variance for a GARCH(1, 1)

From the estimated results in Table 8-16, it is found that the coefficients on both the lagged squared residuals and lagged conditional variance terms in the conditional-variance equation are statistically significant at any significance level. The significance of Į1 and O1 support the hypothesis that the conditional volatility changes over time due to the volatility clustering effect as implied by significant Į1 and due to temporal dependence as reflected by the significant O . Furthermore, the sum of Įˆ and Oˆ is very high (0.8038) which indicates that shocks to the conditional variance 1

1

1

will be highly persistent. A large sum of these coefficients will imply that a large positive or a large negative return will lead the future forecast of the variance to be high for a protracted period. One could also expect for individual coefficients of the conditional variance. The constant term of the variance equation is quite large and statistically significant.The ARCH parameter is around 0.29 while the coefficient on the lagged conditional variance (GARCH) is 0.51 which is larger. These coefficients are statistically significant at any significance level.

8.15 The GARCH-in-Mean (GARCH-M) Models The application of the Generalised Autoregressive Conditional Heteroscedasticity-in-Mean (GARCH(p, q)-M) model plays a significant role for appropriate decision making in investment and in corporate finance based on time-series data, when the basic assumptions of the classical linear regression model of the Capital Asset Pricing Model are not q

p

i=1

j=1

satisfied. Within the framework of the original GARCH model: ı 2t = į0 +¦ Įi İ 2t-i +¦ Ȝ jı 2t-j , the conditional mean and variance of stock returns are assumed to be influenced by the past returns and volatility based on the available information at a particular point of time. The GARCH(p, q)-M model, introduced by Engle, Lilien and Robins (1987), provides a new framework to study the risk-return relationship since the model explicitly links the conditional variance to the conditional means of returns. The GARCH(p, q) model specifies the mean equation

Time Series Econometrics

415

p

R t = ȝ+¦ ȡ j R t-j +į ı2t +İ t , where ı 2t is the conditional variance indicating the time-varying variance. The inclusion j=1

of the conditional variance into the mean equation under the GARCH(p, q)-M framework depicts the resemblance with the Capital Asset Pricing Model (CAPM) since it indicates the presence of a risk component on stock returns. The į coefficient can be interpreted as a risk aversion parameter which assumes a positive linear relationship between the conditional variance and returns. A positive and significant risk coefficient į will imply that the market rewards investors for taking an additional risk by reaping a higher return. Thus, the GARCH(p, q)-M framework uses the conditional variability of returns as a measure of time-varying risk and captures the independence between expected returns and changing volatility of asset holding postulated by portfolio theory. Thus, we can apply the GARCH(p, q)M model in order to empirically examine the risk-return relationship. Definition: It is known to us that, in finance, the stock return may depend on its volatility (risk). To model such phenomena, the GARCH-in-Mean (GARCH-M) model adds a heteroscedasticity term into the mean equation. It has the specification of the type y t = x ct ȕ +įı t +İ t

½ ° ° ı 2t = Į 0 +¦ Įi İ 2t-i +¦ Ȝ i ı 2t-i ¾ i=1 i=1 ° °¿ İ t = ı t z t , z t ~N(0, 1) q

p

(8.249)

It is called the GARCH-M(p, q) model, where $y_t$ is the time-series value at time t; $x_t$ may contain the lag(s) of $y_t$ and/or dummies for special features of the market, and it also includes the mean of the GARCH model; $\delta$ is the volatility coefficient for the mean; $\varepsilon_t$ is the model's residual at time t; $\sigma_t$ is the conditional standard deviation (i.e., volatility) at time t; q is the order of the ARCH component, and $\alpha_1, \alpha_2, \ldots, \alpha_q$ are the parameters of the ARCH component; p is the order of the GARCH component, and $\lambda_1, \lambda_2, \ldots, \lambda_p$ are the parameters of the GARCH component; and the $z_t$'s are the standardised residuals, i.e., $z_t \sim IIN(0, 1)$, so that $E(z_t) = 0$ for all t and $Var(z_t) = 1$ for all t.

If $\delta$ is positive and statistically significant, it implies that an increase in the conditional variance (risk) leads to a rise in the mean return. Thus, $\delta$ can be interpreted as a risk premium. In some cases, the conditional variance term $\sigma_t^2$ appears directly in the conditional mean equation rather than in the square-root form $\sigma_t$.

Remarks:

(i) The GARCH-M(p, q) model with normally-distributed innovation has p+q+3 estimated parameters.
(ii) The GARCH-M(p, q) model with generalised error distribution (GED) or Student's t-distributed innovation has p+q+4 estimated parameters.
(iii) A positive risk premium, i.e., $\delta > 0$, indicates that the data series is positively related to its volatility.
(iv) Furthermore, the GARCH-M model implies that there are serial correlations in the data series itself which are introduced by those in the volatility process $\sigma_t^2$.
(v) The mere existence of the risk premium is, therefore, another reason that some historical stock returns exhibit serial correlations.

Estimation of a GARCH-M(1, 1) Model

Let us consider the GARCH-M(1, 1) model of the type

$y_t = x_t'\beta + \delta\sigma_t + \varepsilon_t$
$\sigma_t^2 = \alpha_0 + \alpha_1\varepsilon_{t-1}^2 + \lambda_1\sigma_{t-1}^2$   (8.250)
$\varepsilon_t = \sigma_t z_t$, $z_t \sim N(0, 1)$


All the terms here have been explained before. Equation (8.250) specifies $y_t$ as a function of $x_t' = (1, x_{1t}, x_{2t}, \ldots, x_{kt}) \in I_{t-1}$. In a simple specification, $x_t = (1, y_{t-1}, y_{t-2}, \ldots, y_{t-k})' \in I_{t-1}$, so that the model is univariate. The conditional mean of $y_t$ is given by

$E(y_t|I_{t-1}) = x_t'\beta + \delta\sigma_t + E(\sigma_t z_t|I_{t-1}) = x_t'\beta + \delta\sigma_t$   (8.251)

The conditional variance of $y_t$ is given by

$Var(y_t|I_{t-1}) = Var(\varepsilon_t|I_{t-1}) = Var(\sigma_t z_t|I_{t-1}) = \sigma_t^2\,Var(z_t|I_{t-1}) = \sigma_t^2$   (8.252)

Thus, we have $y_t|I_{t-1} \sim N(x_t'\beta + \delta\sigma_t,\ \sigma_t^2)$. The family of GARCH models is estimated using the maximum likelihood method. The log-likelihood function is computed from the product of all conditional densities of the prediction errors. When $z_t$ is assumed to have a standard normal distribution, the log-likelihood function is given by

$\log(L) = -\frac{T}{2}\log(2\pi) - \frac{1}{2}\sum_{t=1}^{T}\log(\sigma_t^2) - \frac{1}{2}\sum_{t=1}^{T}\frac{(y_t - x_t'\beta - \delta\sigma_t)^2}{\sigma_t^2}$

$\qquad = -\frac{T}{2}\log(2\pi) - \frac{1}{2}\sum_{t=1}^{T}\log\left[\alpha_0 + \alpha_1(y_{t-1} - x_{t-1}'\beta - \delta\sigma_{t-1})^2 + \lambda_1\sigma_{t-1}^2\right] - \frac{1}{2}\sum_{t=1}^{T}\frac{(y_t - x_t'\beta - \delta\sigma_t)^2}{\alpha_0 + \alpha_1(y_{t-1} - x_{t-1}'\beta - \delta\sigma_{t-1})^2 + \lambda_1\sigma_{t-1}^2}$   (8.253)

A maximum likelihood estimator for the parameters $\alpha = (\alpha_0, \alpha_1, \lambda_1)'$ and $\beta$ can be obtained by maximising the log-likelihood function $\log(L)$. Since there is generally no closed-form solution to this problem, the maximum is found through numerical optimisation. The likelihood function is maximised via either the dual quasi-Newton or the trust-region algorithm. The starting values for the regression parameters $\beta$ are obtained from the OLS estimates. When there are autoregressive parameters in the model, the initial values are obtained from the Yule-Walker estimates. The starting value 1.0E-6 is used for the GARCH process parameters. The variance-covariance matrix is computed using the Hessian matrix. The dual quasi-Newton method approximates the Hessian matrix, while the quasi-Newton method approximates the inverse of the Hessian. The trust-region method uses the Hessian matrix obtained by numerical differentiation. When there are active constraints, that is, $q(\theta) = 0$, the variance-covariance matrix is given by $\widehat{Var}(\hat{\theta}) = H^{-1}\left[I - Q'(QH^{-1}Q')^{-1}QH^{-1}\right]$, where $H = \frac{\partial^2\log(L)}{\partial\theta\,\partial\theta'}$ and $Q = \frac{\partial q(\theta)}{\partial\theta'}$. Without active constraints, the variance-covariance matrix reduces to $\widehat{Var}(\hat{\theta}) = H^{-1}$.
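For readers who want to reproduce this estimation outside a packaged routine, the following is a minimal sketch of maximising the GARCH-M(1, 1) log-likelihood (8.253) numerically in Python with scipy. The simulated series, the presample variance choice, the starting values and the Nelder-Mead optimiser are illustrative assumptions rather than the settings described above.

import numpy as np
from scipy.optimize import minimize

def garch_m_negloglik(params, y):
    # Negative log-likelihood of a GARCH-M(1,1) with an AR(1) mean:
    # y_t = b0 + b1*y_{t-1} + delta*sigma_t + eps_t,
    # sigma_t^2 = a0 + a1*eps_{t-1}^2 + lam1*sigma_{t-1}^2.
    b0, b1, delta, a0, a1, lam1 = params
    T = len(y)
    sigma2 = np.empty(T)
    eps = np.empty(T)
    sigma2[0] = np.var(y)          # presample variance (an assumption of this sketch)
    eps[0] = y[0] - np.mean(y)
    ll = 0.0
    for t in range(1, T):
        sigma2[t] = a0 + a1 * eps[t - 1] ** 2 + lam1 * sigma2[t - 1]
        if sigma2[t] <= 0.0:
            return 1e10            # penalise infeasible parameter values
        mu_t = b0 + b1 * y[t - 1] + delta * np.sqrt(sigma2[t])
        eps[t] = y[t] - mu_t
        ll += -0.5 * (np.log(2 * np.pi) + np.log(sigma2[t]) + eps[t] ** 2 / sigma2[t])
    return -ll

# Simulated returns are used only to make the sketch runnable.
rng = np.random.default_rng(0)
y = rng.normal(0.5, 2.0, size=500)

start = np.array([0.1, 0.1, 0.05, 0.5, 0.1, 0.6])   # rough starting values (assumed)
res = minimize(garch_m_negloglik, start, args=(y,), method='Nelder-Mead',
               options={'maxiter': 5000})
print("ML estimates (b0, b1, delta, a0, a1, lam1):", res.x)

In practice the derivative-based optimisers and the analytical starting values discussed above converge faster; the sketch only illustrates the structure of the likelihood maximisation.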

Ex. 8-17: The GARCH-M(1, 1) model is estimated for the market returns which are obtained using the monthly price index of DSE. The following GARCH-M(1, 1) model is considered for estimation:

$R_t = \beta_0 + \beta_1 R_{t-1} + \delta\sigma_t + \varepsilon_t$, where $\varepsilon_t \sim N(0, \sigma_t^2)$   (8.254)
$\sigma_t^2 = \alpha_0 + \alpha_1\varepsilon_{t-1}^2 + \lambda_1\sigma_{t-1}^2$   (8.255)

Here, R t is the stock returns at time t which is defined in equation (8.248). The GARCH-M(1, 1) model is estimated by using the software package EViews. The results are given in Table 8-17.


Table 8-17: The ML estimates of a GARCH-M(1, 1) model

Dependent Variable: R
Method: ML - ARCH (Marquardt) - Normal distribution
Sample (adjusted): 3/22/1900 22:03 3/24/1900 04:09
Included observations: 367 after adjustments
Estimation settings: tol = 0.00010, derivs = accurate numeric (linear)
Initial Values: C(1)=0.06300, C(2)=0.14124, C(3)=0.00500, C(4)=48.6577, C(5)=0.15000, C(6)=0.60000
Convergence achieved after 37 iterations
Presample variance: backcast (parameter = 0.7)
GARCH = C(4) + C(5)*RESID(-1)^2 + C(6)*GARCH(-1)

Variable      Coefficient   Std. Error   z-Statistic   Prob.
σ              0.1204        0.27777      0.4333       0.6647
Constant      -0.3555        1.8851      -0.1886       0.8504
R(-1)          0.1024        0.0714       1.4342       0.1515
Variance Equation
Constant      15.3126        3.7242       4.1117       0.0000
ARCH(-1)       0.3007        0.0682       4.4083       0.0000
GARCH(-1)      0.4852        0.0970       4.9996       0.0000


Fig. 8-14: Predictions from the mean equation


Fig. 8-15: Conditional variances from the variance equation




Fig. 8-16: Conditional standard deviations predicted from the variance equation.

From the estimated results in Table 8-17, it is found that the estimated volatility coefficient in the mean equation has a positive sign but is not statistically significant. Therefore, we can conclude that, for these stock returns, there is no significant feedback from the conditional variance to the conditional mean. It is also found that the estimated coefficients on both the lagged squared residuals and the lagged conditional variance terms in the conditional variance equation are statistically significant at any significance level. The significance of $\alpha_1$ and $\lambda_1$ supports the hypothesis that the conditional volatility changes over time, due to the volatility clustering effect implied by a significant $\alpha_1$ and due to the temporal dependence reflected by the significant $\lambda_1$. Furthermore, the sum of $\hat{\alpha}_1$ and $\hat{\lambda}_1$ is 0.7859, which indicates that shocks to the conditional variance will be highly persistent.

Note: Different software packages such as EViews, Python, R, RATS, SHAZAM, SPLUS, SPSS, STATA and TSP can be applied directly to estimate the AR, MA, ARMA, ARIMA, ARCH, GARCH, and GARCH-M models.
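As one concrete illustration of the Note above, the Python package arch can estimate an AR(1) mean with GARCH(1, 1) errors in a few lines. The in-mean ($\delta\sigma_t$) term itself is not included in this particular sketch, the return series is simulated rather than the DSE data, and the parameter labels alpha[1] and beta[1] (the book's $\alpha_1$ and $\lambda_1$) are package conventions assumed here.

import numpy as np
import pandas as pd
from arch import arch_model

rng = np.random.default_rng(42)
returns = pd.Series(rng.normal(0.5, 2.0, size=500))   # placeholder for a monthly return series

# AR(1) mean with GARCH(1,1) errors and normal innovations
model = arch_model(returns, mean='AR', lags=1, vol='GARCH', p=1, q=1, dist='normal')
result = model.fit(disp='off')
print(result.summary())
print("alpha1 + lambda1 =",
      result.params['alpha[1]'] + result.params['beta[1]'])   # volatility persistence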


Exercises

8-1: Define time-series data, time-series variables and time-series econometrics with an example of each.
8-2: Write different uses of a time-series analysis.
8-3: Distinguish between stationarity and non-stationarity of a time-series variable with an example of each.
8-4: What kinds of variables are likely to be non-stationary? How can such variables be made stationary? Explain with an example.
8-5: Define the following terms with an example of each:

(i) Trend stationary process, (ii) Difference stationary process, (iii) White noise process, and (iv) Gaussian white noise process.
8-6: Define the random walk model with and without drift. Explain it with an example.
8-7: Write the names of different econometric models that are widely applicable in time-series econometrics or in financial econometrics.
8-8: Define an AR(1) process. Find the mean, variance, and autocorrelation function for AR(1) processes.
8-9: Define an AR(2) process. Find the mean, variance, and autocorrelation function for AR(2) processes.
8-10: Define an AR(p) process. Find the mean, variance, and autocorrelation function for AR(p) processes.
8-11: Discuss the stationarity condition for AR(p) models. Explain with an example.
8-12: Define correlogram. How will you detect whether a time-series variable is stationary using a correlogram? Discuss with an example.
8-13: Discuss the Box-Pierce (1970) Q-test and Ljung-Box (1979) Q-test to test for significance of higher-order autocorrelation.
8-14: Define an MA(1) process. Find the mean, variance, and autocorrelation function for MA(1) processes.
8-15: Define an MA(2) process. Find the mean, variance, and autocorrelation function for MA(2) processes.
8-16: Define an MA(q) process. Find the mean, variance, and autocorrelation function for MA(q) processes.
8-17: Define partial autocorrelation functions. Discuss the technique to derive the partial autocorrelation functions.
8-18: Discuss the invertibility condition of MA processes. Show that an MA process can be converted into an AR(∞).
8-19: How will you detect the order of AR and MA processes by plotting the sample autocorrelation and partial autocorrelation functions? Discuss with an example.
8-20: Define an ARMA(p, q) process. Find the mean, variance and autocorrelation function for ARMA(p, q) processes.
8-21: What are the differences between AR and MA models?
8-22: For the stock market prices, a researcher might suggest the following three different types of models:

Model 1: $y_t = y_{t-1} + \varepsilon_t$;  Model 2: $y_t = 0.54y_{t-1} + \varepsilon_t$;  and Model 3: $y_t = 0.85\varepsilon_{t-1} + \varepsilon_t$.
(i) Write the name of the classes of these models.
(ii) What would be the autocorrelation coefficient for each of these models?
(iii) Which model is more likely to represent stock market prices from a theoretical perspective, and why?
8-23: You obtain the following estimates for an AR(2) model of stock returns data: $y_t = 0.91y_{t-1} + 0.72y_{t-2} + \varepsilon_t$

where {İ t } is a white noise process. By examining the characteristic equation, check whether the model is stationary.


8-24: Explain what stylised shapes would be expected for the autocorrelation and partial autocorrelation functions for the following processes:

(i) an AR(1), (ii) an AR(2), (iii) an MA(1), (iv) an MA(2), (v) an ARMA(1, 1), (vi) an ARMA(2, 1), and (vii) an ARMA(2, 2).
8-25: Discuss the AIC, SBIC and HQIC criteria to select the AR, MA and ARMA models.
8-26: Discuss the OLS and ML methods to estimate the AR, MA, and ARMA models.
8-27: Discuss the Box-Jenkins approach to estimate ARMA(p, q) models.
8-28: Define the autoregressive conditional heteroscedastic (ARCH) model. Discuss the technique to estimate an ARCH model.
8-29: Define a GARCH(p, q) model. Show that the GARCH(1, 1) model can be converted into an ARCH(∞).
8-30: Discuss the technique to estimate a GARCH model.
8-31: Define a GARCH-M(p, q) model. Discuss the estimation technique of GARCH-M(p, q) models.
8-32: A researcher uses a sample of 60 observations on Y, the unemployment rate, to model the time-series behaviour of the series and to generate predictions. First, he computes the sample autocorrelation functions with the following results:

k:                   1      2      3      4      5      6      7      8      9
$\hat{\rho}_k$:     0.78   0.72   0.65   0.45   0.32   0.28   0.12  -0.08  -0.05

(i) What do you mean by the sample autocorrelation function? Do the given results indicate that an AR or an MA process is more appropriate? Why? (ii) Use the Box-Pierce (1970) Q-test and Ljung-Box (1979) Q-test to determine whether the first five autocorrelation coefficients taken together are jointly significantly different from zero. 8-33: Using the same data given in 8-32, the researcher calculates the partial autocorrelation functions that given below:

k:                      1      2      3      4      5      6      7      8      9
$\hat{\theta}_{kk}$:   0.78   0.15  -0.10   0.06   0.04  -0.05   0.09  -0.03  -0.01

(i) What do you mean by the sample partial autocorrelation function? Why is the first partial autocorrelation equal to the first autocorrelation coefficient (0.78)? (ii) Does the above pattern indicate that an AR or MA process is more appropriate? Why? 8-34: Suppose you obtain the following sample autocorrelations and partial autocorrelations for a sample of 65 observations from actual data of the inflation rate.

k:                      1      2      3      4      5      6      7      8      9
$\hat{\rho}_k$:        0.43   0.12   0.09  -0.21  -0.15   0.06  -0.08  -0.06   0.02
$\hat{\theta}_{kk}$:   0.65   0.42   0.38   0.26   0.21   0.18   0.10   0.08   0.06

(i) Does the above pattern indicate that an AR or MA process is more appropriate? Why? (ii) Use the Box-Pierce (1970) Q-test and Ljung-Box (1979) Q-test to determine whether the first five autocorrelation coefficients taken together are jointly significantly different from zero.


8-35: Using the data set given in problem 8-32, the researcher obtains both AR(1) and AR(2) models with the following results (standard errors in parentheses):

$y_t = 2.75 + 0.78y_{t-1} + e_t$,  SE: (3.562) (0.09)
$y_t = 2.5 + 0.75y_{t-1} + 0.18y_{t-2} + e_t$,  SE: (3.562) (0.10) (0.07)

(i) Would you prefer the AR(2) model to the AR(1)? Why? How would you check whether an ARMA(2, 1) model may be more appropriate?
8-36: Estimate AR, MA, ARMA, ARCH and GARCH models using the consumer price index (CPI) data of the UK and then compare them with those of the USA.
8-37: Estimate appropriate ARCH, GARCH, and GARCH-M models for the variable stock return considering the Standard & Poor's 500 Index (S&P500) over a period of time.

CHAPTER NINE

UNIVARIATE TIME SERIES ECONOMETRICS WITH UNIT ROOTS

9.1 Introduction

Recently, econometricians have been paying more careful attention to research based on time-series data. In business, economics, finance, banking, the social sciences, etc., time-series data have become more popular and are intensively used in empirical work. In Chapter 8, different time-series econometric models, namely the AR, MA, ARMA, ARCH, GARCH and GARCH-in-Mean models, were discussed; these are based on univariate time-series data. It is well known that most macroeconomic time-series variables tend to be dominated by long-run trend behaviour, and thus it is important to identify the source of this trending behaviour. It is also known that the usual techniques of regression analysis can result in highly misleading conclusions when the variables contain a stochastic trend (Stock and Watson (1988), Granger and Newbold (1974)). In particular, if the dependent variable and at least one independent variable contain a stochastic trend, and if they are not co-integrated, the regression results are spurious (Phillips (1986), Granger and Newbold (1974)). Therefore, to identify the correct specification of the model, an investigation of the presence of a stochastic trend in the time-series variables is very much needed. In this chapter, the most popular and widely applicable tests for investigating whether time-series data contain a stochastic trend or unit root problem, namely the Dickey-Fuller (DF) (1979, 1981) tests, the Phillips-Perron (PP) tests [see Phillips (1987, 1988)], the Augmented Dickey-Fuller (ADF) tests [see Fuller (1996)], and the Kwiatkowski, Phillips, Schmidt and Shin (1992) test, are discussed with their derivations. The DF, ADF and PP tests are discussed for four different cases with their application to numerical problems. In this chapter, first, the meaning of the unit root is explained with an example, and later on, convergence rates of OLS estimators of unit root processes, and Brownian motion with the functional central limit theorem, are discussed. The software packages RATS, EViews, and STATA are used for solving the numerical problems.

9.2 Meaning of the Unit Root

Let us consider the Gaussian AR(1) process of the type

$y_t = \alpha + \rho y_{t-1} + \varepsilon_t$, where $\varepsilon_t \sim IIN(0, \sigma^2)$

$y_t - \rho L y_t = \alpha + \varepsilon_t$, where L is the lag operator.

$y_t = \dfrac{\alpha}{1 - \rho L} + [1 - \rho L]^{-1}\varepsilon_t$

$y_t = \dfrac{\alpha}{1 - \rho} + \left[1 + \rho L + (\rho L)^2 + (\rho L)^3 + \ldots\right]\varepsilon_t$

$y_t = \dfrac{\alpha}{1 - \rho} + \sum_{j=0}^{\infty}\rho^j\varepsilon_{t-j}$   (9.1)

The lag polynomial $(1 - \rho L)$ has a root equal to $1/\rho$. If $|\rho| < 1$, the root lies outside the unit circle and the process is stationary; if $\rho = 1$, the polynomial has a unit root and the process is non-stationary.

Since $E(u_t^4)$ is finite, this probability will be zero as $T \to \infty$, which indicates that $S_T(\cdot) \xrightarrow{p} 0$.

Application of FCLT to Unit Root Processes

In this section, the simplest case is discussed to illustrate how to use the FCLT to compute the asymptotic distribution of the random walk $y_t$ with $y_0 = 0$, an approach pioneered by Phillips (1986, 1987).¹ Let us now consider the random walk model of the type

$y_t = y_{t-1} + \varepsilon_t$, where $\varepsilon_t \sim IIN(0, \sigma^2)$   (9.46)

Since $y_0 = 0$, from equation (9.46), we have

$y_t = \sum_{i=1}^{t}\varepsilon_i$   (9.47)

¹ Phillips (1986, 1987) developed the general derivation presented here based on the functional central limit theorem (FCLT) and the continuous mapping theorem.


Let the stochastic function $X_T(r)$ be defined as:

$X_T(r) = \begin{cases} 0, & \text{for } r \in [0, 1/T) \\ y_1/T, & \text{for } r \in [1/T, 2/T) \\ y_2/T, & \text{for } r \in [2/T, 3/T) \\ \;\vdots & \;\vdots \\ y_T/T, & \text{for } r = 1. \end{cases}$   (9.48)

The graphical presentation of $X_T(r)$ as a function of r is shown below:

Fig. 9-1: Plot of $X_T(r)$ as a function of r (a step function taking the heights $y_1/T, y_2/T, \ldots, y_{T-1}/T$ over the intervals $[1/T, 2/T), [2/T, 3/T), \ldots, [(T-1)/T, 1]$)

The area under this step function is the sum of T rectangles. The t-th rectangle has width 1/T and height $y_{t-1}/T$, and therefore has an area of $y_{t-1}/T^2$. If we integrate $X_T(r)$ over $r \in [0, 1]$, we have

$\int_0^1 X_T(r)\,dr = \dfrac{y_1}{T^2} + \dfrac{y_2}{T^2} + \cdots + \dfrac{y_{T-1}}{T^2} = \dfrac{1}{T^2}\sum_{t=1}^{T} y_{t-1}$   (9.49)

Multiplying both sides of equation (9.49) by $\sqrt{T}$, we have

$\int_0^1 \sqrt{T}\,X_T(r)\,dr = T^{-3/2}\sum_{t=1}^{T} y_{t-1}$   (9.50)

We know that $\sqrt{T}\,X_T(\cdot) \xrightarrow{L} \sigma W(\cdot)$. Thus, it follows that

$\int_0^1 \sqrt{T}\,X_T(r)\,dr \xrightarrow{L} \sigma\int_0^1 W(r)\,dr$   (9.51)

This implies that, from equation (9.50),

$T^{-3/2}\sum_{t=1}^{T} y_{t-1} \xrightarrow{L} \sigma\int_0^1 W(r)\,dr$   (9.52)

Thus, when $y_t$ is a drift-less random walk, its sample mean $T^{-1}\sum_{t=1}^{T} y_{t-1}$ diverges but $T^{-3/2}\sum_{t=1}^{T} y_{t-1}$ converges.
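A small Monte Carlo experiment can make the convergence in (9.52) concrete. The sketch below simulates driftless random walks and checks that the variance of $T^{-3/2}\sum y_{t-1}$ settles near $\sigma^2/3$, the variance implied by $\sigma\int_0^1 W(r)\,dr$; the sample sizes and the number of replications are arbitrary choices made only for illustration.

import numpy as np

rng = np.random.default_rng(1)
sigma = 2.0

def scaled_sum(T, reps=5000):
    # Simulate T^(-3/2) * sum_{t=1}^{T} y_{t-1} for a driftless random walk with y_0 = 0.
    eps = rng.normal(0.0, sigma, size=(reps, T))
    y = np.cumsum(eps, axis=1)              # y_t = eps_1 + ... + eps_t
    return T ** (-1.5) * y[:, :-1].sum(axis=1)   # sum of y_0, ..., y_{T-1} with y_0 = 0

for T in (50, 200, 800):
    s = scaled_sum(T)
    print(f"T={T}: mean={s.mean():+.4f}, variance={s.var():.4f} "
          f"(limit sigma^2/3 = {sigma**2 / 3:.4f})")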


Derivation of the Asymptotic Distribution of $T^{-3/2}\sum_{t=1}^{T} t\varepsilon_t$

In this section, the asymptotic distribution of $T^{-3/2}\sum_{t=1}^{T} t\varepsilon_t$ is derived.

We can write that

$T^{-3/2}\sum_{t=1}^{T} y_{t-1} = T^{-3/2}\left[y_1 + y_2 + \cdots + y_{T-1}\right]$  [since $y_0 = 0$]

$\quad = T^{-3/2}\left[\varepsilon_1 + (\varepsilon_1 + \varepsilon_2) + (\varepsilon_1 + \varepsilon_2 + \varepsilon_3) + \cdots + (\varepsilon_1 + \varepsilon_2 + \cdots + \varepsilon_{T-1})\right]$

$\quad = T^{-3/2}\left[(T-1)\varepsilon_1 + (T-2)\varepsilon_2 + (T-3)\varepsilon_3 + \cdots + \{T-(T-1)\}\varepsilon_{T-1}\right]$

$\quad = T^{-3/2}\sum_{t=1}^{T}(T-t)\varepsilon_t = T^{-1/2}\sum_{t=1}^{T}\varepsilon_t - T^{-3/2}\sum_{t=1}^{T} t\varepsilon_t$   (9.53)

We know that

$\begin{bmatrix} T^{-1/2}\sum_{t=1}^{T}\varepsilon_t \\ T^{-3/2}\sum_{t=1}^{T} t\varepsilon_t \end{bmatrix} \xrightarrow{L} N\left(\begin{bmatrix} 0 \\ 0 \end{bmatrix},\ \sigma^2\begin{bmatrix} 1 & 1/2 \\ 1/2 & 1/3 \end{bmatrix}\right)$   (9.54)

Therefore, from equation (9.53), we derive that the variance of $T^{-3/2}\sum_{t=1}^{T} y_{t-1}$ is $\sigma^2\left[1 + 1/3 - 2(1/2)\right] = \sigma^2/3$. Therefore, we can say that $T^{-3/2}\sum_{t=1}^{T} y_{t-1}$ is asymptotically Gaussian with mean zero and variance $\sigma^2/3$. Equation (9.53) also gives us a way to describe the asymptotic distribution of $T^{-3/2}\sum_{t=1}^{T} t\varepsilon_t$ in terms of functions of Brownian motion. From expression (9.53), we have

$T^{-3/2}\sum_{t=1}^{T} t\varepsilon_t = T^{-1/2}\sum_{t=1}^{T}\varepsilon_t - T^{-3/2}\sum_{t=1}^{T} y_{t-1} \xrightarrow{L} \sigma W(1) - \sigma\int_0^1 W(r)\,dr$   (9.55)

The last part of (9.55) is from (9.52) and the first part is from (9.44). Evidently, the random variable $\sigma\int_0^1 W(r)\,dr$ is normally distributed with mean zero and variance $\sigma^2/3$.

Asymptotic Distribution of the Sum of Squares of a Random Walk

Using a similar method, the asymptotic distribution of the sum of squares of a random walk can be derived. Let us now consider the random walk model of the type

$y_t = y_{t-1} + \varepsilon_t$, where $\varepsilon_t \sim IIN(0, \sigma^2)$, and $y_0 = 0$   (9.56)

From equation (9.56), we have

$y_t = \sum_{i=1}^{t}\varepsilon_i$   (9.57)


Let us now define the statistic $S_T(r)$ as

$S_T(r) = T\left[X_T(r)\right]^2$   (9.58)

which can be written as

$S_T(r) = \begin{cases} 0, & \text{for } r \in [0, 1/T) \\ y_1^2/T, & \text{for } r \in [1/T, 2/T) \\ y_2^2/T, & \text{for } r \in [2/T, 3/T) \\ \;\vdots & \;\vdots \\ y_T^2/T, & \text{for } r = 1. \end{cases}$   (9.59)

If we integrate $S_T(r)$ over $r \in [0, 1]$, we have

$\int_0^1 S_T(r)\,dr = \dfrac{y_1^2}{T^2} + \dfrac{y_2^2}{T^2} + \cdots + \dfrac{y_{T-1}^2}{T^2} = \dfrac{1}{T^2}\sum_{t=1}^{T} y_{t-1}^2$   (9.60)

Since we have that $S_T(r) \xrightarrow{L} \sigma^2[W(r)]^2$, by the continuous mapping theorem, we can write that

$T^{-2}\sum_{t=1}^{T} y_{t-1}^2 \xrightarrow{L} \sigma^2\int_0^1 [W(r)]^2\,dr$   (9.61)

Another Two Useful Derivations

In this section also, two other important derivations are given. We have

$T^{-5/2}\sum_{t=1}^{T} t\,y_{t-1} = T^{-3/2}\sum_{t=1}^{T}\left[\dfrac{t}{T}\right] y_{t-1}$   (9.62)

For $r = t/T$, and if we make use of $T^{-3/2}\sum_{t=1}^{T} y_{t-1} \xrightarrow{L} \sigma\int_0^1 W(r)\,dr$, we have

$T^{-5/2}\sum_{t=1}^{T} t\,y_{t-1} = T^{-3/2}\sum_{t=1}^{T}\left[\dfrac{t}{T}\right] y_{t-1} \xrightarrow{L} \sigma\int_0^1 rW(r)\,dr$   (9.63)

Again, for $r = t/T$, and if we make use of $T^{-2}\sum_{t=1}^{T} y_{t-1}^2 \xrightarrow{L} \sigma^2\int_0^1 [W(r)]^2\,dr$, we have

$T^{-3}\sum_{t=1}^{T} t\,y_{t-1}^2 = T^{-2}\sum_{t=1}^{T} (t/T)\,y_{t-1}^2 \xrightarrow{L} \sigma^2\int_0^1 r[W(r)]^2\,dr$   (9.64)

Show that $T^{-1}\sum_{t=1}^{T} y_{t-1}\varepsilon_t \xrightarrow{L} \dfrac{1}{2}\sigma^2\left\{[W(1)]^2 - 1\right\}$.

Proof: Let us now consider the random walk model of the type

$y_t = y_{t-1} + \varepsilon_t$, where $\varepsilon_t \sim IIN(0, \sigma^2)$, and $y_0 = 0$   (9.65)

Now, squaring both sides of equation (9.65), we have


$y_t^2 = (y_{t-1} + \varepsilon_t)^2$
$y_t^2 = y_{t-1}^2 + 2y_{t-1}\varepsilon_t + \varepsilon_t^2$
$y_{t-1}\varepsilon_t = \dfrac{1}{2}\left[y_t^2 - y_{t-1}^2 - \varepsilon_t^2\right]$   (9.66)

Equation (9.66) can also be written as

$T^{-1}\sum_{t=1}^{T} y_{t-1}\varepsilon_t = T^{-1}\dfrac{1}{2}\sum_{t=1}^{T}\left[y_t^2 - y_{t-1}^2 - \varepsilon_t^2\right] = \dfrac{1}{2}T^{-1}\sum_{t=1}^{T}\left[y_t^2 - y_{t-1}^2\right] - \dfrac{1}{2}T^{-1}\sum_{t=1}^{T}\varepsilon_t^2 = \dfrac{1}{2}T^{-1}y_T^2 - \dfrac{1}{2}T^{-1}\sum_{t=1}^{T}\varepsilon_t^2$   (9.67)

We know that $T^{-1/2}y_T \xrightarrow{L} \sigma W(1)$; thus, by the continuous mapping theorem, we have

$\dfrac{1}{2}T^{-1}y_T^2 \xrightarrow{L} \dfrac{\sigma^2}{2}[W(1)]^2$   (9.68)

And by the LLN, we have

$\dfrac{1}{2}T^{-1}\sum_{t=1}^{T}\varepsilon_t^2 \xrightarrow{L} \dfrac{\sigma^2}{2}$   (9.69)

Therefore, using the results of (9.69) and (9.68), it follows from (9.67) that

$T^{-1}\sum_{t=1}^{T} y_{t-1}\varepsilon_t \xrightarrow{L} \dfrac{1}{2}\sigma^2\left\{[W(1)]^2 - 1\right\}$   (9.70)

Hence, the theorem is proved.
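The algebraic identity (9.67) behind this result can be verified numerically. The following sketch simulates one random walk and confirms that $T^{-1}\sum y_{t-1}\varepsilon_t$ equals $\frac{1}{2}\left[T^{-1}y_T^2 - T^{-1}\sum\varepsilon_t^2\right]$ exactly; the sample size and $\sigma$ are arbitrary choices.

import numpy as np

rng = np.random.default_rng(2)
sigma, T = 1.5, 1000
eps = rng.normal(0.0, sigma, size=T)
y = np.cumsum(eps)                       # random walk with y_0 = 0
y_lag = np.concatenate(([0.0], y[:-1]))  # y_0, y_1, ..., y_{T-1}

lhs = (y_lag * eps).sum() / T                       # T^(-1) * sum y_{t-1} eps_t
rhs = 0.5 * (y[-1] ** 2 / T - (eps ** 2).sum() / T)  # right-hand side of identity (9.67)
print(lhs, rhs)                                      # the two numbers coincide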

9.6 Asymptotic Theory for Integrated Processes

In applied econometric research, and in time-series econometric research in particular, asymptotic theory for integrated random variables is very important. In this section, some of the asymptotic results that are appropriate for integrated processes are reviewed and derived. The asymptotic distributions are all written in terms of functionals of standard Brownian motion, denoted by $W(r)$. The following propositions can be used to calculate the asymptotic distribution of statistics from several simple regressions involving unit root problems. Therefore, for I(1) series, the properties of the estimators and test statistics will be more readily interpretable. Most of the attention is devoted to the statistical properties of series containing a single unit root, i.e., I(1) processes, extending to the more general I(d) processes when necessary.

Proposition 9.1: Suppose that $y_t$ is a random walk without drift of the type $y_t = y_{t-1} + \varepsilon_t$, where $y_0 = 0$ and $\varepsilon_t \sim IIN(0, \sigma^2)$.

Then, the following propositions are derived:

P1: $T^{-1/2}\sum_{t=1}^{T}\varepsilon_t \xrightarrow{L} \sigma W(1)$

P2: $T^{-1}\sum_{t=1}^{T} y_{t-1}\varepsilon_t \xrightarrow{L} \dfrac{1}{2}\sigma^2\left\{[W(1)]^2 - 1\right\}$

P3: $T^{-3/2}\sum_{t=1}^{T} t\varepsilon_t \xrightarrow{L} \sigma W(1) - \sigma\int_0^1 W(r)\,dr$   (9.71)

P4: $T^{-3/2}\sum_{t=1}^{T} y_{t-1} \xrightarrow{L} \sigma\int_0^1 W(r)\,dr$

P5: $T^{-2}\sum_{t=1}^{T} y_{t-1}^2 \xrightarrow{L} \sigma^2\int_0^1 [W(r)]^2\,dr$

P6: $T^{-5/2}\sum_{t=1}^{T} t\,y_{t-1} \xrightarrow{L} \sigma\int_0^1 rW(r)\,dr$

P7: $T^{-3}\sum_{t=1}^{T} t\,y_{t-1}^2 \xrightarrow{L} \sigma^2\int_0^1 r[W(r)]^2\,dr$

P8: $T^{-(v+1)}\sum_{t=1}^{T} t^v \to \dfrac{1}{v+1}$, for $v = 0, 1, 2, \ldots$

T

1

t=1

0

¦o ³

, t/T o r, 1/T o dr, T -1/2 İ t o dW, etc.

T 1 o . It can be shown Ex. 9-3: To give an example of P8, let us now consider v = 3. Then, P8 represents that T4¦t3 L 4 t=1 3

T T ªtº that T4 ¦t3 = T1 ¦« » t=1 t=1 ¬ T ¼

T

1

T-1 ¦r3 o ³ r3dr = t=1

0

1 . 4

9.7 Unit Root Tests In this section, different tests are discussed to test whether the time-series variable contains the unit root problem. First, the Dickey-Fuller (1987, 1988) test is discussed for unit root problems in an AR(1) model without serial autocorrelation, and then, Phillips-Perron (1987, 1988) and Augmented Dickey-Fuller tests are discussed with the presence of serial correlation. Finally, the Kaiatkowski, Phillips, Schmidt and Shin (1992) test is discussed to detect the presence of unit root problems. Dickey-Fuller Tests for Unit Roots in an AR(1) Model (Absence of Serial Correlation)

In this section, several key cases are discussed for testing the presence of a unit root problem in time-series data. Case 1: No Constant and Trend Terms are included in the Regression Equation: True Process is a Random Walk

Let us Consider the first-order autoregressive model of the type y t = ȡy t-1 +İ t

(9.72)

where y t is the value of the variable y at time t; y t-1 is the value of the variable y at time (t-1); ȡ is the correlation coefficient between y t and y t-1 ; and İ t is the random error term which is independently, identically distributed with zero mean and constant variance ı 2 , i.e., İ t ~IID(0, ı 2 ) . The OLS estimate of ȡ is given by T

¦y y t

ȡˆ T =

t=1 T

¦y

t-1

(9.73)

2 t-1

t=1

Here, we are interested in finding the asymptotic distributions of the OLS estimator ȡˆ T , when ȡ =1 . When ȡ =1, the OLS estimator ȡˆ T of ȡˆ is given by

Chapter Nine

436 T

ȡˆ T =

¦y

t-1

[y t-1  İ t ]

t=1

T

¦y

2 t-1

t=1

T

1

¦y

İ

t-1 t

t=1 T

(9.74)

¦y

2 t-1

t=1

From equation (9.74), we have T

ȡˆ T  1=

¦y

İ

t-1 t

t=1 T

¦y

2 t-1

t=1

T

T(ȡˆ T  1) =

T -1 ¦ y t-1İ t t=1 T

T

-2

(9.75)

¦ y2t-1 t=1

T 1 2 L From proposition P2, we have T 1 ¦ y t-1İ t  o ı 2 > W(1) @  1 2 t=1

^

T

`

and from proposition P5, we have

1

L T 2 ¦ y 2t-1  o ı 2 ³ [W(r)]2 dr . Thus, equation (9.75) can be written as 0

t=1

1 2 2 ı > W(1) @  1 2 o T(ȡˆ T  1)  1 ı 2 ³ [W(r)]2 dr

^

L

`

0

^> W(1)@  1`  1)  o 2

T(ȡˆ T

L

1

2 ³ [W(r)]2 dr

(9.76)

0

First, it will be noted that $(\hat{\rho}_T - 1)$ converges at the order of T instead of $\sqrt{T}$ as in the cases when $|\rho| < 1$. Therefore, when the true coefficient is unity, $\hat{\rho}_T$ is super-consistent. Second, since $W(1) \sim N(0, 1)$, $[W(1)]^2$ will be a $\chi^2(1)$ variable. The probability that a $\chi^2(1)$ variable is less than one is 0.68. Since the denominator of equation (9.76) is positive, $(\hat{\rho}_T - 1)$ will be negative in probability as T becomes large, which implies that the limiting distribution of $T(\hat{\rho}_T - 1)$ is skewed to the left.

ˆ Fig. 9-2: Negatively skewed distribution of T(ȡ-1) in case one.
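The left skewness of this limiting distribution is easy to see by simulation. The sketch below repeatedly draws $T(\hat{\rho}_T - 1)$ under the null for the no-constant regression; roughly 68% of the draws are negative, matching the $\chi^2(1)$ probability mentioned above, and the lower quantiles line up with the case-one critical values tabulated later in the chapter. The sample size and the number of replications are arbitrary.

import numpy as np

rng = np.random.default_rng(3)

def rho_stat(T):
    # One draw of T*(rho_hat - 1) for the regression y_t = rho*y_{t-1} + e_t (no constant),
    # when the true process is a driftless random walk.
    eps = rng.normal(size=T)
    y = np.cumsum(eps)
    y_lag = np.concatenate(([0.0], y[:-1]))
    rho_hat = (y_lag @ y) / (y_lag @ y_lag)
    return T * (rho_hat - 1.0)

draws = np.array([rho_stat(250) for _ in range(20000)])
print("share of draws below zero:", (draws < 0).mean())   # about 0.68, as argued above
print("5% quantile:", np.quantile(draws, 0.05))            # close to the -8.0 reported for T = 250 in Table 9-2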


Recall that in the AR(1) process with |ȡ| W(1) @  1 2

^

L

t  o



2

`

1/ 2

³ [W(r)] dr` 1

2

0

1 2 > W(1)@  1 2

^

`

1/ 2

^ ³ [W(r)] dr` 1

0

2

ı2

(9.81)

Chapter Nine

438

For the same reason as in (9.76), this t-statistic is asymmetric and skewed to the left and is shown below in Fig. 9-3.

Fig. 9-3: Negatively skewed distribution of t-statistic in case one.

The critical values of the DF t-test or DF ȡ test are given for various sample sizes T at different significance levels. We then compare the observed value with the reported critical value. If the observed value is above the critical value, the null hypothesis will be accepted, otherwise, the null hypothesis will be rejected. If we accept the null hypothesis, it indicates that the true process is a random walk model. Case 2: Constant Term but No Trend Term is included in the Regression Equation: True Process is a Random Walk without Drift

The true process is a random walk of the type y t = y t-1 + İ t , where İ t ~IID(0, ı 2 )

(9.82)

The following AR(1) model is considered for estimation: (9.83)

y t = Į +ȡy t-1 +İ t

The OLS estimates of the regression coefficients of (9.83) are given by ª T ª Įˆ T º « « « ȡˆ » = T ¬ T¼ « y «¦ t ¬ t=1

T

º t » t=1 » T 2 » y t-1 » ¦ t=1 ¼

¦y

-1

ª T º « ¦ yt » « t=1 » «T » « ¦ y t y t-1 » ¬ t=1 ¼

(9.84)

In deviation form, the OLS estimated coefficients vector ȕˆ T from the true value ȕ T is given by ȕˆ T  ȕ T =(x cx)-1 x cİ

(9.85)

For the given equation (9.83), equation (9.85) can be written as

ª Įˆ T  Į º « » ¬ ȡˆ T  ȡ ¼

§ ¨ ¨ 1 ¨ ª« ¨ ¬ y0 ¨ ¨ ©

1

ª1 y0 º · «1 y » ¸ 1 »¸ ... ... 1 º « ª1 «. . »¸ « » ... ... y T-1 ¼ « »¸ y . »¸ ¬ 0 «. «¬1 yT-1 »¼ ¸ ¹

1 y1

1 y1

ª İ1 º «İ » ... ... 1 º « 2 » «.» ... ... yT-1 »¼ « » «.» «¬İ T »¼

(9.86)

Under the null hypothesis H 0 : Į = 0, ȡ =1 , equation (9.86) can be written as

ª Įˆ T º «ȡˆ  1» ¬ T ¼

ª « T « «T « ¦ y t-1 ¬ t=1

T

º y t-1 » ¦ t=1 » T 2 » y t-1 » ¦ t=1 ¼

-1

ª T º « ¦ İt » « t=1 » «T » « ¦ y t-1İ t » ¬ t=1 ¼

(9.87)

Univariate Time Series Econometrics with Unit Roots

Proposition P4 establishes that

T

¦y

t-1

439

must be divided by T 3/2 before obtaining a random variable that converges in

t=1

distribution, i.e., T

1

L T 3/2 ¦ y t-1  o ı ³ W(r)dr

(9.88)

0

t=1

In other words, the order in probability of the term

T

¦y

t-1

is given by

t=1

T

¦y

O P (T 3/2 )

t-1

(9.89)

t=1

Similarly, propositions P2 and P5 establish that T

¦y

t-1 t

İ = O P (T)

(9.90)

2 t-1

= O P (T 2 )

(9.91)

t=1 T

¦y t=1

and proposition P1 establishes that T

¦İ

t

= O P (T1/2 )

(9.92)

t=1

Thus, we see that in the regression equation with a constant term, the OLS estimates have different convergent rates. Now, using the order in the probability of each term in (9.87), we have ª Įˆ T º «ȡˆ  1» ¬ T ¼

-1

ª O P (T) O P (T 3/2 ) º ª O P (T1/2 ) º « » 3/2 2 » « ¬O P (T ) O P (T ) ¼ ¬ O P (T) ¼

(9.93)

We now use the scaling matrix H T to describe the limiting distribution of the OLS estimates. Pre-multiplying equation (9.85) by H T , we can write H T (ȕˆ T  ȕ T ) = H T (x cx)-1H T H -1T x cİ -1

= ª¬ H -1T (x cx)H -1T º¼ ª¬ H -1T x cİ º¼

(9.94)

From (9.94), for the application of (9.93), we should specify the scaling matrix H T as below: ª T1/2 « ¬ 0

HT

0º » T¼

(9.95)

Therefore, equation (9.95) can be written as

1/2

ªT « ¬ 0

1/2

§ ¨ 0 º ª Įˆ T º ¨ ªT 1/2 = »« « » T ¼ ¬ȡˆ T  1¼ ¨ ¬ 0 ¨¨ ©

ª T Įˆ T º « » ¬T(ȡˆ T  1) ¼

ª 1 « « « 3/2 T «T ¦ y t-1 t=1 ¬

ª T 0 º« »« T 1 ¼ « T « ¦ y t-1 ¬ t=1 T º T 3/2 ¦ y t-1 » t=1 » T 2 » 2 T ¦ y t-1 » t=1 ¼

1

T

º y t-1 » 1/2 ¦ ªT t=1 »« T » 0 y 2t-1 » ¬ ¦ t=1 ¼

ª 1/2 T º « T ¦ İt » t=1 « » « 1 T » «T ¦ y t-1İ t » t=1 ¬ ¼

· ¸ 0 º¸ » T 1 ¼ ¸ ¸¸ ¹

-1

§ ¨ 1/2 ¨ ªT ¨ «¬ 0 ¨¨ ©

ª T º· ¦ İt » ¸ 0 º « t=1 »¸ »« »¸ T 1 ¼ « T « ¦ y t-1İ t » ¸¸ ¬ t=1 ¼¹

(9.96)

Chapter Nine

440

From

proposition

P3,

we

have

T

1

L T 3/2 ¦ y t-1  o ı ³ W(r)dr, and 0

t=1

T

from

proposition

P5,

we

have

1

L T 2 ¦ y 2t-1  o ı 2 ³ [W(r)]2 dr . Therefore, the first part of the right side of equation (9.96) can be written as 0

t=1

ª 1 « « « 3/2 T « T ¦ y t-1 t=1 ¬

T º 1 T 3/2 ¦ y t-1 » ª 1 ı ³ W(r)dr º t=1 0 » L »  o «« 1 T 1 » 2 2 2 » 2 ı W(r)dr ı ³ [W(r)] dr » T ¦ y t-1 » 0 ¬« ³0 ¼ t=1 ¼

ª ª1 0 º « 1 «0 ı » « 1 ¬ ¼ « W(r)dr ¬ ³0

Again,

from

proposition

P1,

we

have

1

W(r)dr º ª1 0 º » 1 » «0 ı » 2 ³0 [W(r)] dr »¼ ¬ ¼

³

0

(9.97)

T

L T 1/2 ¦ İ t  o ıW(1), and

from

proposition

t=1

T 1 2 L o ı 2 > W(1) @  1 . Therefore, the second part of (9.97) can be written as T 1 ¦ y t-1İ t  2 t=1

^

`

ª 1/2 T º ıW(1) ª º « T ¦ İt » t=1 L « » « »  o 1 2 « ı 2 > W(1) @  1 » « 1 T » «¬ 2 »¼ « T ¦ y t-1İ t » t=1 ¬ ¼

^

`

W(1) ª ª1 0 º « ı« 2 » 1 ¬0 ı ¼ «« > W(1) @  1 ¬2

^

`

º » » »¼

(9.98)

Now, using equations (9.98) and (9.97) in equation (9.96), we have 1 ª ª T1/2 Įˆ T º L ª1 0 º « 1 oı « « »  » « 1 ¬0 V ¼ « W(r)dr ¬T(ȡˆ T  1) ¼ ¬ ³0

ª ªı 0 º « 1 «0 1» « 1 ¬ ¼ « W(r)dr ¬ ³0

1

1

1

W(1) W(r)dr º ª1 0 º 1 ª1 0 º ª 0 » «1 1 » «0 V » «0 V » « > W(1) @2  1 2 ¬ ¼ ¬ ¼« [W(r)] dr » ³0 ¬2 ¼

³

³0 W(r)dr º» 1 » 2 ³0 [W(r)] dr »¼

^

1

W(1) ª «1 « > W(1) @2  1 ¬« 2

^

`

º » » ¼»

1 1 1 2 ª º ıW(1) ³ [W(r)]2 dr  ı > W(1)@  1 ³ W(r)dr » « 0 0 1 2 « » 2 1 1 1 2 2 » ª 1 W(r)dr º «  [W(r)] dr   W(1) 1 W(1) W(r)dr > @ ³0 ³0 «¬ ³0 »¼ «¬ »¼ 2

^

^

`

`

º » » »¼

`

(9.99)

Therefore, from (9.99), we have 1 1 1 2 ­ ½ ı ® W(1) ³ [W(r)]2 dr  > W(1) @  1 ³ W(r)dr ¾ 0 0 2 ¿ L o ¯ T1/2Dˆ T  2 1 1 2 ª º ³0 [W(r)] dr  ¬« ³0 W(r)dr ¼»

^

and

`

(9.100)

P2,

we

have

Univariate Time Series Econometrics with Unit Roots 1 1 2 W(1) @  1  W(1) ³ W(r)dr > 0 L o2 T(ȡˆ T  1)  2 1 1 2 ª W(r)dr º  [W(r)] dr ³0 ¬« ³0 ¼»

^

`

441

(9.101)

Thus, we can say that neither Įˆ T nor ȡˆ T has a limiting Gaussian distribution. Dickey and Fuller (1979) have derived the asymptotic distribution of ȡˆ T to test the null hypothesis H 0 : ȡ = 1 . Thus, we can apply the DF ȡ test statistic to test the null hypothesis H 0 : ȡ = 1 , and has the following limiting distribution 1 1 2 W(1) @  1  W(1) ³ W(r)dr > 0 L o2 T(ȡˆ T  1)  2 1 1 2 ª W(r)dr º  [W(r)] dr ³0 ¬« ³0 ¼»

^

`

(9.102)

which is not the same as the asymptotic distribution in (9.76). Thus, different critical values must be used to test the unit root problem when a constant term is included in the regression model. Note that this distribution is more strongly negatively skewed than that for case one. Thus, when a constant term is included in the regression, the estimated coefficient on $y_{t-1}$, that is $\hat{\rho}_T$, must deviate further from unity to reject the null hypothesis of a unit root. Indeed, for T > 25, 95% of the time the estimated value $\hat{\rho}_T$ will be less than one. In practice, critical values for the random variable in equation (9.102) are found by computing the exact finite-sample distribution of $T(\hat{\rho}_T - 1)$ for a given T, assuming that $\varepsilon_t$ is a Gaussian process. Then, the critical values are tabulated by Monte Carlo or by numerical approximation. If the observed value is above the critical value, the null hypothesis will be accepted; otherwise, the null hypothesis will be rejected. If we accept the null hypothesis, it indicates that the true process is a random walk model.
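Such a Monte Carlo tabulation is straightforward to carry out. The following sketch simulates the DF t-statistic for the regression with a constant when the true process is a driftless random walk and reports empirical 1%, 5% and 10% quantiles; the sample size and replication count are illustrative, and the quoted reference values (about -3.5, -2.9 and -2.6) are the commonly tabulated ones for this case.

import numpy as np

rng = np.random.default_rng(4)

def df_t_stat_with_constant(T):
    # DF t-statistic for H0: rho = 1 in y_t = a + rho*y_{t-1} + e_t,
    # when the data are generated by a driftless random walk.
    eps = rng.normal(size=T)
    y = np.cumsum(eps)
    y_lag = np.concatenate(([0.0], y[:-1]))
    X = np.column_stack([np.ones(T), y_lag])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    s2 = resid @ resid / (T - 2)
    cov = s2 * np.linalg.inv(X.T @ X)
    return (beta[1] - 1.0) / np.sqrt(cov[1, 1])

T = 100
stats = np.array([df_t_stat_with_constant(T) for _ in range(20000)])
print("simulated 1%, 5%, 10% critical values:", np.quantile(stats, [0.01, 0.05, 0.10]))
# These should be close to the tabulated Dickey-Fuller values for this case (about -3.5, -2.9, -2.6).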

Fig. 9-4 : Negatively skewed distribution of T(ȡˆ T  1) for case 2.

Dickey and Fuller have also proposed the t-test statistic for testing the null hypothesis H 0 : ȡ = 1 , based on the OLS estimates. The t-test statistic is given by t=

ȡˆ T  1 ıˆ ȡˆ T

(9.103)

where ª « T 2 2 ıˆ ȡˆ T = s T > 0 1@ « T « « ¦ y t-1 ¬ t=1

T

1

º y t-1 » ¦ 0 t=1 » ª« º» T » 1 y 2t-1 » ¬ ¼ ¦ t=1 ¼

(9.104)

and s T2

1 T 2 ¦ y t  Įˆ T  ȡˆ T y t-1 T  2 t=1

(9.105)

Chapter Nine

442

Multiplying both sides of (9.105) by T 2 , we have 1

T

ª « T 2 2 2 T ıˆ ȡˆ T = s T > 0 T @ « T « « ¦ y t-1 ¬ t=1

º y t-1 » ¦ 0 t=1 » ª« º» T » T y 2t-1 » ¬ ¼ ¦ t=1 ¼ 1

T

ª « T s T2 > 0 1@ H T « T « « ¦ y t-1 ¬ t=1

º y t-1 » ¦ 0 t=1 » H T ª« º» T » ¬1 ¼ y 2t-1 » ¦ t=1 ¼

(9.106)

Now, we can write that T

ª « T HT « T « « ¦ y t-1 ¬ t=1

1

§ ª ¨ « T ¨ H -1 « ¨ T«T ¨¨ « ¦ y t-1 ¬ t=1 ©

º y t-1 » ¦ t=1 » HT T 2 » y t-1 » ¦ t=1 ¼

§ ¨ 1/2 ¨ ªT ¨ «¬ 0 ¨¨ ©

T

· º y t-1 » ¸ ¦ t=1 » H -1T ¸ T ¸ » y 2t-1 » ¸¸ ¦ t=1 ¼ ¹

ª T 0 º« »« T 1 ¼ « T « ¦ y t-1 ¬ t=1

ª 1 « « « 3/2 T «T ¦ y t-1 t=1 ¬

1

T

º y t-1 » 1/2 ¦ ªT t=1 »« T » 0 y 2t-1 » ¬ ¦ t=1 ¼

T º T 3/2 ¦ y t-1 » t=1 » T 2 2 » T ¦ y t-1 » t=1 ¼

1 ª ª1 0 º « 1 L  o « » « 1 ¬0 ı ¼ « W(r)dr ¬ ³0

· ¸ 0 º¸ » T 1 ¼ ¸ ¸¸ ¹

1

1

1

1

W(r)dr º ª1 0 º 1 0 » 1 » «0 ı » 2 ³0 [W(r)] dr »¼ ¬ ¼

³

(9.107)

Therefore, from equation (9.106), we have

p o s T2 T 2 ıˆ ȡ2ˆ T 

1

1

ª 1 ª¬0 ı -1 ¼º « 1 « «¬ ³0 W(r)dr

³0 W(r)dr º» ª 0 º 1 » «ı -1 » 2 ³0 [W(r)] dr »¼ ¬ ¼

(9.108)

By LLN, it can be easily shown that P sT2  o ı2

(9.109)

Therefore, from (9.108), we have

T 2 ıˆ ȡ2ˆ T

ª 1 L  o > 0 1@ «« 1 «¬ ³0 W(r)dr

1

1

³0 W(r)dr º» ª0º 1 » «1 » 2 ³0 [W(r)] dr »¼ ¬ ¼

1 1

1

2 ³0 [W(r)] dr  ª¬« ³0 W(r)dr º¼»

2

Therefore, the asymptotic distribution of the OLS t-test in (9.103) is given by

(9.110)

Univariate Time Series Econometrics with Unit Roots

t=

443

T(ȡˆ T  1) T 2 ıˆ ȡ2ˆT 1/ 2

2 1 ­ 1 ½ P  o T(ȡˆ T  1) ® ³ [W(r)]2 dr  ª ³ W(r)dr º ¾ « » 0 0 ¬ ¼ ¿ ¯ 1 1 2 > W(1)@  1  W(1) ³0 W(r)dr 2  o 2 1/ 2 1 ­ 1 ½ 2 ª º ® ³0 [W(r)] dr  « ³0 W(r)dr » ¾ ¬ ¼ ¯ ¿

^

L

`

(9.111)

The critical values of the DF t-test statistic are given for various sample sizes T at different levels of significance. We then compare the observed value with the reported critical values corresponding to various levels of significance. If the observed value is above the critical value, the null hypothesis will be accepted; otherwise, the null hypothesis will be rejected. If we accept the null hypothesis, it indicates that the true process is a random walk model. Case 3: Constant Term but No Trend Term is included in the Regression Equation: True Process is a Random Walk with Drift

The true process is a random walk with drift of the type (9.112)

y t = Į+y t-1 +İ t

where İ t ~IID(0, ı 2 ) , and the true value of Į is not zero. Here, the following AR(1) model is considered for estimation: (9.113)

y t = Į +ȡy t-1 +İ t

The OLS estimates of the regression coefficients of (9.113) are given by

ª Įˆ T º « ȡˆ » ¬ T¼

ª « T « «T «¦ yt ¬ t=1

T

º yt » ¦ t=1 » T 2 » y t-1 » ¦ t=1 ¼

1

ª T º « ¦ yt » « t=1 » «T » « ¦ y t y t-1 » ¬ t=1 ¼

(9.114)

When the true process is a random walk with drift, in deviation form, the OLS estimated coefficients vector ȕˆ T from the true value ȕ T can be written as ȕˆ T  ȕ T =(x cx)-1 x cİ

ª Įˆ T  Į º « ȡˆ  ȡ » ¬ T ¼

§ ¨ ¨ 1 ¨ «ª ¨ ¬ y0 ¨ ¨ ©

(9.115) 1

ª1 y0 º · «1 y » ¸ 1 »¸ ... ... 1 º « ª1 « . . »¸ « ... ... y T-1 ¼» « y ¸ » . »¸ ¬ 0 «. «¬1 yT-1 »¼ ¸ ¹

1 y1

1 y1

ª İ1 º «İ » ... ... 1 º « 2 » «.» ... ... yT-1 ¼» « » «.» «¬ İ T »¼

(9.116)

Under the null hypothesis H 0 : ȡ = 1, equation (9.116) can also be written as

ª Įˆ T  Į º « ȡˆ  1 » ¬ T ¼

ª « T « «T « ¦ y t-1 ¬ t=1

T

º y t-1 » ¦ t=1 » T 2 » y t-1 » ¦ t=1 ¼

1

ª T º « ¦ İt » « t=1 » «T » « ¦ y t-1İ t » ¬ t=1 ¼

(9.117)

We can now use the scaling matrix H T to describe the limiting distribution of the OLS estimates. Premultiplying equation (9.115) by H T , we have

Chapter Nine

444

H T (ȕˆ T  ȕ T ) = H T (x cx)-1H T H -1T x cİ -1

= ª¬ H T1 (x cx)-1H -1T º¼ ª¬ H -1T x cİ º¼

(9.118)

From equation (9.112), we have (9.119)

y t = y0 + tĮ + İ1 + İ 2 +........+ İ t

Without loss of generality, we can set y 0

0, and defining ut = İ1 + İ2 +........+ İt , equation (9.119) can be written as

y t = tĮ +u t , for t =1, 2,…..,T.

(9.120)

From equation (9.120), we have T

¦y

t-1

t=1

T

T

t=1

t=1

= Į¦ (t-1) +¦ u t-1

(9.121)

The first term of the right-hand side of (9.121) is given by T

ª T(T+1) º Į«  T» 2 ¬ ¼

Į ¦ (t  1) t=1

ª T 2 +T  2T º = Į« » 2 ¬ ¼ ĮT 2 ĮT  2 2

(9.122)

Thus, the first term must be divided by T 2 in order to converge, i.e., ª ĮT 2 ĮT º p Į T 2 «  o »  2 ¼ 2 ¬ 2

T

T 2 Į ¦ (t  1) t=1

(9.123)

The last term of the right-hand side of (9.121) will converge when divided by T 3/2 , i.e., T

1

L T 3/2 ¦ u t-1  o ı ³ W(r)dr [From proposition P4] 0

t=1

(9.124)

We see that the order in the probability of two individual terms in (9.121) are not the same. We have T

Į ¦ (t  1)

O P (T 2 )

(9.125)

t=1

and T

¦u

O P (T 3/2 )

t-1

(9.126)

t=1

Therefore, from equation (9.116), we have T T T ­ ½ p Į T 2 ¦ y t-1 = T 2 Į ¦ (t  1) +T 1/2 ®T 3/2 ¦ u t-1 ¾  o 2 t=1 t=1 t=1 ¯ ¿

(9.127)

Similarly, we can write that T

¦y

2 t-1

=

t=1

T

2

¦ >Į(t  1)+u @ t-1

t=1

=

T

¦Į t=1

2

T

T

t=1

t=1

(t  1) 2  ¦ u 2t-1  2¦ Į(t  1)u t-1

(9.128)

Univariate Time Series Econometrics with Unit Roots

445

The first term of the right-hand side of (9.128) can be written as T

2

¦Į t=1

T T ªT º (t  1) 2 = Į 2 « ¦ t 2  2¦ t+¦1» t=1 t=1 ¼ ¬ t=1

ª T(T+1)(2T+1) 2T(T+1) º = Į2 «  +T » 6 2 ¬ ¼

(9.129)

Thus, for convergence, the first term of (9.128) must be divided by T 3 , i.e., T

P T 3 ¦ Į 2 (t  1) 2  o t=1

Į2 3

(9.130)

The order in probability of the term

T

¦u

2 t-1

is

t=1

T

¦u

2 t-1

O P (T 2 )

(9.131)

t=1

T

And the order in probability of the term 2¦ Į(t  1)u t-1 is t=1

T

2¦ Į(t  1)u t-1

O P (T 5/2 )

(9.132)

t=1

Thus, if we divide equation (9.123) by T 3 , the first term will not vanish asymptotically but other terms will be vanished asymptotically. Hence, we have T

P T 3 ¦ y 2t-1  o t=1

Į2 3

(9.133)

Finally, T

¦y t=1

T

T

t=1

t=1

İ = Į ¦ (t  1)İ t +¦ u t-1İ t

t-1 t

(9.134)

From proposition P3, we have T

Į ¦ (t  1) İ t = O P (T 3/2 )

(9.135)

t=1

and from proposition, P2 we have T

¦u

(9.136)

İ = O P (T)

t-1 t

t=1

Thus, if we divide equation (9.134) by T 3/2 , we have T

T

t=1

t=1

P T 3/2 ¦ y t-1İ t  o T 3/2 ¦ Į (t  1) İ t

= O P (T 3/2 )

(9.137)

We found that, when the true regression model is a random walk with drift, the OLS estimates have different convergent rates. Using the order in the probability of each term in (9.117), we have ª Įˆ T  D º « ȡˆ  1 » ¬ T ¼

1

ª O P (T) O P (T 2 ) º ª O P (T1/2 ) º « « 2 3 » 3/2 » ¬ O P (T ) O P (T ) ¼ ¬O P (T ) ¼

(9.138)

Chapter Nine

446

Thus, for the application of (9.118), we should specify the scaling matrix H T as HT

ª T1/2 « ¬ 0

0 º » T 3/2 ¼

(9.139)

Therefore, equation (9.118) can be written as

0 º ª Įˆ T  Į º »« » T 3/2 ¼ ¬ ȡˆ T  1 ¼

ª T1/2 « ¬ 0

ª « ª 1/2 « «T «¬ 0 « «¬

ª T 0 º« « » T 3/2 ¼ « T « ¦ y t-1 ¬ t=1

1/2

ª T (Įˆ T  Į) º « 3/2 » ¬ T (ȡˆ T  1) ¼

º y t-1 » 1/2 ¦ ªT t=1 »« T » 0 y 2t-1 » ¬ ¦ t=1 ¼

º » 0 º» » T 3/2 ¼ » » »¼

1

ª T ºº ¦ İt »» 0 º « t=1 »» »« »» T 3/2 ¼ « T « ¦ y t-1İ t » » ¬ t=1 ¼ ¼»

ª « ª 1/2 T u «« «¬ 0 « ¬« ª 1 « « « 2 T « T ¦ y t-1 t=1 ¬

T

T º T 2 ¦ y t-1 » t=1 » T 3 2 » T ¦ y t-1 » t=1 ¼

1

ª 1/2 T º « T ¦ İt » t=1 « » « 3/2 T » « T ¦ y t-1İ t » t=1 ¬ ¼

(9.140)

From (9.127) and (9.133), the first term of the right-hand side of (9.140) converges to ª 1 « « « 2 T « T ¦ y t-1 t=1 ¬

T º ª T 2 ¦ y t-1 » «1 t=1 P »  « o T «D 2 » 3 T ¦ y t-1 » «¬ 2 t=1 ¼

D º

2 » »{Q D2 » 3 »¼

(9.141)

From (9.137), the second term of equation (9.140) satisfies that T ª 1/2 T º ª º 1/2 « T ¦ İt » « T ¦ İt » t=1 t=1 P « »  » o« « 3/2 T » « 3/2 T » « T ¦ y t-1İ t » «T ¦ Į(t-1)İ t » t=1 t=1 ¬ ¼ ¬ ¼

ª § «§ 0 · ¨1 L  o N «¨ ¸ , ı 2 ¨ «© 0 ¹ ¨D ¨ « ©2 ¬

D ·º

2 ¸ »» ¸ D 2 ¸» ¸ 3 ¹ »¼

N 0, ı 2 Q 

(9.142)

Therefore, we have the following limiting distribution for the OLS estimates: ª T1/2 (Įˆ T  D ) º L o N ª¬0, Q-1ı 2 QQ-1 º¼ « 3/2 »  ˆ  T (ȡ 1)  ¬ ¼ T

N ª¬0, ı 2 Q 1 º¼ 

(9.143)

We see that in case 3, both estimated coefficients are asymptotically Gaussian, and the asymptotic distributions of Įˆ T and ȡˆ are exactly the same as Įˆ and įˆ in the regression with the deterministic trend of the type: y = Į+įt+İ . This T

t

t

happened because, here, the regressor y t-1 is asymptotically dominated by the term Į(t  1) . In large samples, the regressor variable y t-1 can be replaced by the time trend Į(t  1) . It follows that, in case 3, the standard OLS t and F

Univariate Time Series Econometrics with Unit Roots

447

tests can be applied to test the null hypotheses and then the results of the test statistics can be compared with the table values of the usual t-test and F-test statistics. Case 4: Constant and Trend Terms are included in the Regression Equation: True Process is a Random Walk with or without Drift

The true process is a random walk with drift of the type (9.144)

y t = Į+y t-1 +İ t

where İ t ~IID(0, ı 2 ) , and the Į may or may not be zero. The following regression equation is considered for estimation: (9.145)

y t = Į +ȡy t-1 +įt+İ t

Without loss of information, we assume that y0 0 , and if Į z 0, then y t-1 would be asymptotically equivalent to a time trend. Since a time trend variable is already included as a separate variable in the regression equation, there will be an asymptotic collinear problem between y t and t. Rewriting equation (9.145), we have yt

(1  ȡ)Į  ȡ > y t-1  Į(t  1) @ + > į+ȡĮ @ t+İ t

Į* +ȡu t-1 +į* t+İ t

(9.146)

where Į* = (1  ȡ)Į, u t y t  Įt, and į* = į+ȡĮ . Under the null hypothesis H 0 : ȡ =1 and į = 0 , the transformation indicates that u t is a random walk. Under the null hypothesis, from equation (9.146), we have t

y t = Įt +¦ İ i

(9.147)

i =1

Therefore, we have (9.148)

u t = İ1 + İ 2 +........+ İ t

With this transformation, we regress y t on a constant D* , a drift, random walk u t-1 , and a deterministic time trend t. The OLS estimates of the regression coefficients of (9.146) are given by

ª Įˆ *T º « » « ȡˆ T » « ˆ* » ¬ įT ¼

ª « T « «T « ¦ u t-1 « t=1 « T « ¦t ¬« t=1

T

¦ u t-1 t=1 T

¦u

2 t-1

t=1 T

¦ tu

t-1

t=1

T

º t » ¦ t=1 » T » tu t-1 » ¦ t=1 » T 2 » t » ¦ t=1 ¼»

1

ª T º « ¦ yt » « t=1 » «T » « ¦ y t u t-1 » « t=1 » « T » « ¦ ty t » ¬« t=1 ¼»

(9.149)

Here, the hypothesis is that Į = Į 0 , ȡ =1, and į = 0, which implies that, in the transformed equation, it would be that Į* = 0, ȡ=1, and į* = Į 0 . Thus, the OLS estimates in the deviation form from these true values are given by

ª Įˆ *T º « » « ȡˆ T  1 » « ˆ* » ¬įT  Į 0 ¼

ª « T « «T « ¦ u t-1 « t=1 « T « ¦t ¬« t=1

T

¦ u t-1 t=1 T

¦u

2 t-1

t=1 T

¦ tu t=1

t-1

T

º t » ¦ t=1 » T » tu t-1 » ¦ t=1 » T 2 » t » ¦ t=1 ¼»

1

ª T º « ¦ İt » « t=1 » «T » « ¦ u t-1İ t » « t=1 » « T » « ¦ tİ t » ¬« t=1 ¼»

(9.150)

Note that these three estimates have different convergence rates. The convergence rate of Įˆ *T , ȡˆ T , and įˆ *T are T1/2 , T and T 3/2 respectively. Therefore, for this case, we need a scaling matrix of the type

Chapter Nine

448

HT

ª T1/2 « « 0 « 0 ¬

0

0 º » T 0 » 0 T 3/2 »¼

(9.151)

Using the scaling matrix, equation (9.150) can be written as

ªT1/2 « « 0 « 0 ¬

0 º ª Įˆ *T º » »« T 0 » « ȡˆ T  1 » 0 T3/2 »¼ «įˆ *T  D 0 » ¬ ¼ 0

ª « « ªT 1/2 «« «« 0 «« 0 «¬ « ¬«

0 T1 0

ª « « ªT 1/2 «« u «« 0 «« 0 «¬ « ¬«

ª T1/2 Įˆ *T º « » « T(ȡˆ T  1) » « 3/2 ˆ * » ¬T (įT  D 0 ) ¼

ª 1 « « « 3/2 T «T ¦ u t-1 t=1 « « 2 T « T ¦t t=1 ¬«

T

T 3/2 ¦ u t-1 t=1

T

2

T

¦u

T

ª « T º 0 «T »« 0 » «¦ u t-1 t=1 T 3/2 »¼ « T « « ¦t «¬ t=1

2 t-1

t=1 T

T 5/2 ¦ tu t-1 t=1

0 T 1 0

¦ u t-1 t=1 T

¦ u 2t-1 t=1 T

¦ tu

t-1

t=1

T

º t » ¦ t=1 » ªT1/2 T »« tu t-1 » « 0 ¦ t=1 »« 0 ¬ T 2 » t » ¦ »¼ t=1

º » 0 º» »» 0 »» T3/2 »¼ » » » ¼»

0 T 1 0

ª T ºº « ¦ İt »» »» 0 º « T t=1 »» »« 0 » « ¦ u t-1İ t » » t=1 »» T 3/2 »¼ « T « »» « ¦ tİ t » » «¬ t=1 »¼ ¼»

T º T 2 ¦ t » t=1 » T » 5/2 T ¦ tu t-1 » t=1 » T 3 2 » T ¦t » t=1 ¼»

1

1

(9.152)

ª 1/2 T º « T ¦ İt » t=1 « » « 1 T » « T ¦ u t-1İ t » t=1 « » « 3/2 T » « T ¦ tİ t » t=1 ¬« ¼»

(9.153)

The limiting distribution of each term in equation (9.153) can be found in Proposition 9.1. Based on the propositions, equation (9.153) can be written as 1 1 ª º 1 ı ³ W(r)dr « » 1/2 * 0 2 ª T Įˆ T º » 1 1 « » L « 1 2 2 « » ˆ T(ȡ 1) ı W(r)dr ı [W(r)] dr rW(r)dr V   o « » T ³0 ³0 « ³0 » « 3/2 ˆ * » 1 « » 1 1 ¬T (įT  Į 0 ) ¼ ı ³ rW(r)dr « » 0 «¬ »¼ 2 3

ª 1 1 « ª1 0 0 º « 1 = ı ««0 ı 0 »» « ³ W(r)dr 0 « «¬0 0 1 »¼ « 1 « 2 ¬«

^

ª ıW(1) « « 1 « ı 2 > W(1) @2  1 « 2 « 1 «ı W(1)  ³0 W(r)dr ¬

^

º ³0 W(r)dr » » 1 1 2 » [W(r)] dr rW(r)dr ³0 ³0 » 1 » 1 » ³0 rW(r)dr 3 ¼»

`

`

º » » » » » » ¼

`

^

1 2

1

ª W(1) « ª1 0 0 º « 1 2 u ««0 ı 0 »» « > W(1)@  1 « 2 «¬0 0 1 »¼ « 1 « W(1)  ³0 W(r)dr ¬

^

1

`

1

ª1 0 0 º «0 ı 0» « » «¬0 0 1 »¼

1

º » » » » » » ¼

Univariate Time Series Econometrics with Unit Roots

ª 1 « ı 0 0 ª º« 1 = «« 0 1 0 »» « ³ W(r)dr « 0 «¬ 0 0 ı »¼ « 1 « «¬ 2

1

1 2

º ³0 » » 1 1 2 » u [W(r)] dr rW(r)dr ³0 ³0 » 1 » 1 » ³0 rW(r)dr »¼ 3 1

ª W(1) « « 1 2 « > W(1)@  1 « 2 « 1 « W(1)  ³0 W(r)dr ¬

^

449

`

^

`

W(r)dr

º » » » » » » ¼

(9.154)

The DF unit root ȡ test in this case is given by the middle row of (9.154). The asymptotic distribution of T(ȡˆ T  1) does not depend on either ı or Į . Thus, in practice, it does not matter whether or not the true value of Į is zero. The DF t-test can also be used to test the null hypothesis H 0 : ȡ = 1, against the alternative hypothesis H1: ȡ 0 1 0@ « ¦ u t-1 « t=1 « T « ¦t «¬ t=1

ª « « ªT1/2 «« sT2 > 0 1 0@ « « 0 «« 0 «¬ « ¬«

T

¦u

T

t-1

t=1 T

¦u

2 t-1

t=1 T

¦ tu

t-1

t=1

0 T 1 0

ª 1 « « T « s T2 > 0 1 0@ «T -3/2 ¦ u t-1 t=1 « « -2 T « T ¦t t=1 ¬«

º t » ¦ t=1 » T » tu t-1 » ¦ t=1 » T 2 » t » ¦ »¼ t=1

ª « T 0 º« T »« 0 » «¦ u t-1 t=1 T 3/2 »¼ « T « « ¦t ¬« t=1 T

T -3/2 ¦ u t-1 t=1

T

-2

T

¦u

2 t-1

t=1 T

T -5/2 ¦ tu t-1 t=1

ª 1 1 « ª1 0 0 º « 1 L T 2 ı ȡ2ˆ T  o ı 2 > 0 1 0@ ««0 ı 0 »» « ³ W(r)dr « 0 «¬ 0 0 1 »¼ « 1 « «¬ 2

1

ª0º « » «1 » ¬«0 ¼»

T

¦u

T

t-1

t=1 T

¦u

2 t-1

t=1 T

¦ tu t=1

t-1

º t » ¦ t=1 » ªT 1/2 T »« tu t-1 » « 0 ¦ t=1 »« 0 T ¬ 2 » t » ¦ t=1 ¼»

T º T -2 ¦ t » t=1 » T » -5/2 T ¦ tu t-1 » t=1 » T -3 2 » T ¦t » t=1 ¼»

0 T1 0

1

ª0º «1 » « » «¬0 »¼

1 2

1

º ³0 » » 1 1 2 » u [W(r)] dr rW(r)dr ³0 ³0 » 1 » 1 » ³0 rW(r)dr »¼ 3 1

W(r)dr

º » 0 º» »» 0 »» T3/2 »¼ » » » ¼»

1

ª0º « » «1 » «¬0»¼

Chapter Nine

450 1

ª1 0 0 º ª0 º « 0 ı 0 » «1 » « » « » «¬0 0 1 »¼ «¬0 »¼ ª 1 « « 1 >0 1 0@ «« ³0 W(r)dr « 1 « 2 ¬«

1 2

º ³0 W(r)dr » » 1 1 2 » [W(r)] dr rW(r)dr ³0 ³0 » 1 » 1 » ³0 rW(r)dr 3 ¼» 1

1

ª0 º «1 » « » «¬0 »¼

Q

(9.156)

Thus, the asymptotic distribution of the OLS t-test is given by t=

T(ȡˆ T  1) 2

T ıˆ

2 ȡT

P  o

T(ȡˆ T  1) Q

(9.157)

Therefore, we can say that the asymptotic distribution of the DF t-test does not depend on Į and ı . The table values of the non-standard t distribution are given corresponding to different levels of significance and different sample sizes. We compare the calculated value of the test statistic with the table values at a given level of significance. If the calculated value is less than the table value, we reject the null hypothesis indicating that there is no problem of unit root in the data; otherwise, we accept the null hypothesis indicating that there is a problem of unit root in the data.
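For applied work, the four cases above correspond to the deterministic-term options of standard ADF routines. A minimal sketch using statsmodels is shown below on a simulated random walk; regression='n', 'c' and 'ct' are assumed to denote no deterministic terms, a constant, and a constant plus trend respectively (older versions of the package use 'nc' instead of 'n').

import numpy as np
from statsmodels.tsa.stattools import adfuller

rng = np.random.default_rng(5)
y = np.cumsum(rng.normal(size=300))      # a driftless random walk, so the unit root is genuine

# regression='n' : no constant, no trend (case 1)
# regression='c' : constant only          (cases 2 and 3)
# regression='ct': constant and trend     (case 4)
for reg in ('n', 'c', 'ct'):
    stat, pvalue, usedlag, nobs, crit, icbest = adfuller(y, regression=reg, autolag='AIC')
    print(f"regression='{reg}': ADF statistic = {stat:.3f}, "
          f"p-value = {pvalue:.3f}, 5% critical value = {crit['5%']:.3f}")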

9.8 Summary of Dickey-Fuller Tests for Unit Roots in a First-Order Autoregressive Model (Absence of Serial Correlation)

Case 1: No Constant and Trend Terms are included in the Regression Equation

The true process is purely a random walk model of the type

$y_t = y_{t-1} + \varepsilon_t$, where $\varepsilon_t \sim IIN(0, \sigma^2)$   (9.158)

We consider the following regression equation for estimation:

$y_t = \rho y_{t-1} + \varepsilon_t$   (9.159)

The null hypothesis to be tested for the unit root problem is $H_0: \rho = 1$ against the alternative hypothesis $H_1: \rho < 1$. Under the null hypothesis, the Dickey-Fuller test statistic is given by

$\tau = \dfrac{\hat{\rho} - 1}{\hat{\sigma}_{\hat{\rho}}}$  [it is called the tau statistic]   (9.160)

where ȡˆ is the OLS estimation of ȡ and is given by T

ȡˆ =

¦y t=1 T

t-1

¦y

yt

(9.161) 2 t-1

t=1

and ıˆ ȡˆ is the usual OLS standard error of the OLS estimate ȡˆ , which is given by

Univariate Time Series Econometrics with Unit Roots

ıˆ ȡˆ =

s2 T

¦y

, where s 2 = 2 t-1

1 T ˆ t-1 ) 2 ¦ (y t  ȡy T  1 t=1

451

(9.162)

t=1

This tau statistic is known as the Dickey-Fuller test. We then compare the calculated value of the Dickey-Fuller test with Dickey-Fuller table values at 1%, 5% or 10% level of significance that are given in Table 9-1. If the calculated value of the test statistic is smaller than the table value, we reject the null hypothesis of unit root, indicating that the time-series data does not exhibit the unit root problem; in other words, we can say that the time-series data is stationary. Otherwise, the time-series data is said to be non-stationary or it indicates that the time-series data exhibits the unit root problem. Table 9-1: The critical values of the Dickey-Fuller test at 1%, 5% and 10% levels of significance for case 1

Sample Size T T=25 T=50 T= 100 T=250 T=500 T= ’

0.01 -2.66 -2.62 -2.60 -2.58 -2.58 -2.58

0.05 -1.95 -1.95 -1.95 -1.95 -1.95 -1.95

0.10 -1.60 -1.61 -1.61 -1.62 -1.62 -1.62

Source: Fuller, W. A. (1976), Introduction to Statistical Time Series, John Wiley & Sons, New York, p-373

We can also apply the Dickey-Fuller ȡ test in testing the null hypothesis. The test statistic is given by ȡ-test = T(ȡˆ  1)

(9.163)

Under the assumption that the true value of ȡ is unity, the limiting distribution of T(ȡˆ  1) is already derived in case one. We then compare the calculated value of the Dickey-Fuller ȡ test with the Dickey-Fuller table values at 1%, 5% or 10% level of significance which are given below in Table 9-2. If the calculated value of the test statistic is greater than the table value, we accept the null hypothesis of unit root, indicating that the time-series data exhibits the unit root problem; in other words, we can say that the time-series data is non-stationary. Otherwise, the time-series data is said to be stationary, or it indicates that the time-series data does not exhibit the unit root problem. Table 9-2: The critical values of the Dickey-Fuller ȡ test at 1%, 5% and 10% levels of significance for case one

Sample Size T T=25 T=50 T= 100 T=250 T=500 T= ’

0.01 -11.9 -12.9 -13.3 -13.6 -13.7 -13.8

0.05 -7.3 -7.7 -7.9 -8.0 -8.0 -8.1

0.10 -5.3 -5.5 -5.6 -5.7 -5.7 -5.7

Ex. 9-4: For the given problem in Ex. 8-12, test the null hypothesis of the unit root problem for case one. Solution: The following AR(1) model is considered for estimation while the true model is a purely random walk: PGDPt = ȡPGDPt-1 +İ t , where İ t ~IID(0, ı 2 )

(9.164)

The null hypothesis to be tested is H 0 : ȡ =1

against the alternative hypothesis H1: ȡ < 1 .

Under the null hypothesis, the Dickey-Fuller test statistic is given by equation (9.160). The OLS estimate of ȡ is ȡˆ = 1.0417, and ıˆ ȡˆ = 0.0037 , where ıˆ ȡˆ is the OLS standard error of the OLS estimate ȡˆ . Putting the values of ȡˆ

and ıˆ ȡˆ in equation (9.160), we have

$\tau = \dfrac{1.0417 - 1}{0.0037} = 11.2715$   (9.165)

Comment: At a 5% level of significance with T = 50, the table value of the DF test statistic is -1.95 [see Table 9-1] . The calculated value of the test statistic is greater than the table value, and hence, the null hypothesis of unit root will be accepted. Therefore, it can be said that the time-series variable PGDP contains the unit root problem of order 1.

The DF $\rho$-test value is also given by

$\rho\text{-test} = 50 \times (1.0417 - 1.0) = 2.0835$   (9.166)

Comment: At a 5% level of significance with T = 50, the table value of the DF $\rho$-test statistic is -7.7 [see Table 9-2]. The calculated value of the test statistic is greater than the table value, and hence, the null hypothesis of a unit root is accepted. Thus, it can be said that the series PGDP contains a unit root problem of order 1.

Test for the Second-Order Unit Root Problem

To test for the second-order unit root problem, the following regression equation is considered for estimation while the true process is a purely random walk: ǻPGDPt = ȡǻPGDPt-1 +u t , where u t ~IID(0, ı 2 )

(9.167)

The null hypothesis to be tested is H 0 : ȡ =1

against the alternative hypothesis H1: ȡ W(1) @2  1 ¬« 2

^

º » 2 » [W(r)] dr ³ ¼

³ W(r)dr

1

`

º » » ¼»

0 ª º½ «1 »° « ^O 2  J 0 ` / O » ¾° «¬ 2 »¼ ¿

(9.211)

The second term of equation (9.211) is given by ª 1 o > 0 1@ « T(ȡˆ T  1)  « W(r)dr ¬³ L

º » 2 » [W(r)] dr ³ ¼

³ W(r)dr

1

W(1) ª «1 « > W(1) @2  1 ¬« 2

^

`

2 ª 1 1 ^O  J 0 ` « 0 1 > @ « W(r)dr 2 O2 ¬³

1 2 > W(1)@  1  W(1)³ W(r)dr 2  2 2 O2 ³ [W(r)] dr  ¬ª ³ W(r)dr ¼º

^

`

º » » ¼» 1

º ª0º » « » 2 » ¬1 ¼ [W(r)] dr ³ ¼

³ W(r)dr

1 2 ^O  J 0 ` 2 2 2 ³ [W(r)] dr  ¬ª ³ W(r)dr ¼º

^

`

1 2 W(1) @  1  W(1) ³ W(r)dr > L o2  T(ȡˆ T  1)  2 2 ª W(r)dr º  [W(r)] dr O2 ³ ¬³ ¼

1 2 ^O  J 0 ` 2 2 2 ³ [W(r)] dr  ¬ª ³ W(r)dr ¼º

(9.212)

Therefore, we have

^

`

^

`

(9.213)

The first term of equation (9.213) is the same as in (9.102) which describes that T(ȡˆ T  1) has the asymptotic distribution if İ t is i.i.d. The second term is a correction factor for serial correlation. When İ t is serially uncorrelated,

Univariate Time Series Econometrics with Unit Roots

465

then ș 0 =1, and ș j =0, for j =1, 2,……. . Thus, we can say that, if İ t 's are serially uncorrelated, 4(1) 1, and

O 2 = ı 2 = Ȗ 0 . Thus, the second term of (9.213) will disappear. Hence, it can be said that (9.213) included the earlier result (9.102) as a special case when İ t is uncorrelated. Now, to construct a sample statistic that can be used to estimate the correction factor of serial correlation, we can use the OLS standard error of ȡˆ T say ı ȡˆ T . Let us now define a scaling matrix H T such that ª T1/2 « ¬ 0

HT

0º » T¼

(9.214)

and let sT2 be the OLS estimate of the variance of İ t which is given by s T2

1 T ¦ (y t  Įˆ T  ȡˆ T y t-1 )2 T  2 t=1

Then, the asymptotic distribution of T 2 ı ȡ2ˆT can be found as

T²σ̂²_ρ̂_T = s²_T·[0 1]·{H_T(X′X)⁻¹H_T}·[0 1]′ →L s²_T / [ λ²{ ∫[W(r)]²dr − (∫W(r)dr)² } ]        (9.215)

It follows from (9.213) that

T(ρ̂_T − 1) − (1/2)·{T²σ̂²_ρ̂_T / s²_T}·{λ² − γ₀} →L [ (1/2){[W(1)]² − 1} − W(1)∫W(r)dr ] / [ ∫[W(r)]²dr − (∫W(r)dr)² ]        (9.216)


Thus, the statistic in (9.216) has the same asymptotic distribution as (9.102). The calculated value of the test statistic is then compared with the table value at the 1%, 5% or 10% level of significance given in Table 9-9. If the calculated value is less than the table value, we reject the null hypothesis of a unit root, indicating that the data do not contain a unit root even in the presence of serially correlated errors. Otherwise, we accept the null hypothesis that the data contain a unit root.

Table 9-9: The critical values of the Phillips-Perron rho-test at 1%, 5% and 10% levels of significance for case 2

Sample Size T    0.01     0.05     0.10
T = 25          -17.2    -12.5    -10.2
T = 50          -18.9    -13.3    -10.7
T = 100         -19.8    -13.7    -11.0
T = 250         -20.3    -14.0    -11.2
T = 500         -20.5    -14.0    -11.2
T = ∞           -20.7    -14.1    -11.3

Source: Fuller, W. A. (1976), Introduction to Statistical Time Series, John Wiley & Sons, New York, p-373

OLS t-Test

Phillips and Perron have also proposed the t-test statistic for testing the null hypothesis H₀: ρ = 1. The OLS t-test statistic is

t_T = (ρ̂_T − 1)/σ̂_ρ̂_T = T(ρ̂_T − 1)/√(T²σ̂²_ρ̂_T)        (9.217)

From equation (9.217), together with (9.213) and (9.215), we have

t_T →L (λ/√γ₀)·[ (1/2){[W(1)]² − 1} − W(1)∫W(r)dr + (1/2){λ² − γ₀}/λ² ] / [ ∫[W(r)]²dr − (∫W(r)dr)² ]^{1/2}        (9.218)

Moreover, we have

s²_T = [1/(T − 2)]·Σ_{t=1}^{T} (y_t − α̂_T − ρ̂_T y_{t-1})² →p E(ε²_t) = γ₀        (9.219)


Therefore, equation (9.218) can be written as

{γ₀/λ²}^{1/2}·t_T − (1/2)·{(λ² − γ₀)/λ}·{T·σ̂_ρ̂_T / s_T} →L [ (1/2){[W(1)]² − 1} − W(1)∫W(r)dr ] / [ ∫[W(r)]²dr − (∫W(r)dr)² ]^{1/2}        (9.220)

This limiting distribution is the same as in (9.111). Since equation (9.220) contains the unknown population parameters λ and γ₀, we have to replace them with estimated values. The estimated values of γ₀ and λ are given below:

γ̂₀ = (1/T)·Σ_{t=1}^{T} e²_t, where e_t = y_t − α̂_T − ρ̂_T y_{t-1}        (9.221)

Phillips and Perron used the standard OLS estimate

γ̂₀ = [1/(T − 2)]·Σ_{t=1}^{T} e²_t        (9.222)

From the result of Proposition 9.2, λ² is the asymptotic variance of the sample mean of ε_t:

√T·ε̄ = T^{-1/2}·Σ_{t=1}^{T} ε_t →L N(0, λ²)        (9.223)

The magnitude of λ² can equivalently be described as

λ² = σ²[θ(1)]² = γ₀ + 2Σ_{j=1}^{∞} γ_j = 2π·s_ε(0)        (9.224)

where γ_j is the jth autocovariance of ε_t, and s_ε(0) is the population spectrum of ε_t at frequency zero. The Newey-West estimator, based on the first q autocovariances, can be used for λ². The Newey-West estimator of λ² is given by

λ̂² = γ̂₀ + 2Σ_{j=1}^{q} [1 − j/(q+1)]·γ̂_j,  where  γ̂_j = (1/T)·Σ_{t=j+1}^{T} e_t e_{t-j}        (9.225)

Now, putting these estimated values in equation (9.220), we obtain the feasible Phillips-Perron t-test statistic

{γ̂₀/λ̂²}^{1/2}·t_T − (1/2)·{(λ̂² − γ̂₀)/λ̂}·{T·σ̂_ρ̂_T / s_T} →L [ (1/2){[W(1)]² − 1} − W(1)∫W(r)dr ] / [ ∫[W(r)]²dr − (∫W(r)dr)² ]^{1/2}        (9.226)

The table values of this non-standard t distribution are given for different levels of significance and different sample sizes. We then compare the calculated value of the test statistic with the table value given in Table 9-10 at a given level of significance. If the calculated value is less than the table value, we reject the null hypothesis of a unit root, indicating that the data do not contain a unit root; otherwise, we accept the null hypothesis that the data contain a unit root.
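To make the estimators in (9.221)-(9.225) concrete, the following short Python sketch computes γ̂₀, the autocovariances γ̂_j and the Newey-West estimate λ̂² from a vector of OLS residuals. It is only a minimal illustration written for this section; the function name and the use of NumPy are our own choices and are not part of any particular software package.

import numpy as np

def newey_west_lambda2(e, q):
    # gamma_j-hat = (1/T) * sum_{t=j+1}^{T} e_t * e_{t-j}   (eq. 9.225)
    # lambda2-hat = gamma0 + 2 * sum_{j=1}^{q} (1 - j/(q+1)) * gamma_j
    e = np.asarray(e, dtype=float)
    T = e.shape[0]
    gamma0 = np.sum(e ** 2) / T
    lam2 = gamma0
    for j in range(1, q + 1):
        gamma_j = np.sum(e[j:] * e[:-j]) / T
        lam2 += 2.0 * (1.0 - j / (q + 1)) * gamma_j
    return gamma0, lam2

With q = 4 this reproduces the weights (1 − j/5) used in the worked examples later in this chapter.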


Table 9-10: The critical values of the Phillips-Perron t-test at 1%, 5% and 10% levels of significance for case 2

Sample Size T    0.01     0.05     0.10
T = 25          -3.75    -3.00    -2.63
T = 50          -3.58    -2.93    -2.60
T = 100         -3.51    -2.89    -2.58
T = 250         -3.46    -2.88    -2.57
T = 500         -3.44    -2.87    -2.57
T = ∞           -3.43    -2.86    -2.57

Source: Fuller, W. A. (1976), Introduction to Statistical Time Series, John Wiley & Sons, New York, p-373

Phillips-Perron Test for Cases 1 and 4

The Phillips-Perron tests were derived above for case 2. In the same way, we can derive the Phillips-Perron ρ-test and t-test for cases 1 and 4. The test values are then compared with the table values given in Table 9-11 and Table 9-12 for cases 1 and 4, at different levels of significance and for different sample sizes. If the calculated value of the test statistic is less than the table value, we reject the null hypothesis of the unit root; otherwise, we accept the null hypothesis of the unit root.

Table 9-11: The critical values of the Phillips-Perron ρ-test and t-test at 1%, 5% and 10% levels of significance for case 1

The critical values of the Phillips-Perron ρ-test at 1%, 5% and 10% levels for case 1
Sample Size T    0.01     0.05     0.10
T = 25          -11.9    -7.3     -5.3
T = 50          -12.9    -7.7     -5.5
T = 100         -13.3    -7.9     -5.6
T = 250         -13.6    -8.0     -5.7
T = 500         -13.7    -8.0     -5.7
T = ∞           -13.8    -8.1     -5.7

The critical values of the Phillips-Perron t-test at 1%, 5% and 10% levels for case 1
Sample Size T    0.01     0.05     0.10
T = 25          -2.66    -1.95    -1.60
T = 50          -2.62    -1.95    -1.61
T = 100         -2.60    -1.95    -1.61
T = 250         -2.58    -1.95    -1.62
T = 500         -2.58    -1.95    -1.62
T = ∞           -2.58    -1.95    -1.62

Table 9-12: The critical values of the Phillips-Perron ρ-test and t-test at 1%, 5% and 10% levels of significance for case 4

The critical values of the Phillips-Perron ρ-test at 1%, 5% and 10% levels for case 4
Sample Size T    0.01     0.05     0.10
T = 25          -22.5    -17.9    -15.6
T = 50          -25.7    -19.8    -16.8
T = 100         -27.4    -20.7    -17.5
T = 250         -28.4    -21.3    -18.0
T = 500         -28.9    -21.5    -18.1
T = ∞           -29.5    -21.8    -18.3

The critical values of the Phillips-Perron t-test at 1%, 5% and 10% levels for case 4
Sample Size T    0.01     0.05     0.10
T = 25          -4.38    -3.60    -3.24
T = 50          -4.15    -3.50    -3.18
T = 100         -4.04    -3.45    -3.15
T = 250         -3.99    -3.43    -3.13
T = 500         -3.98    -3.42    -3.13
T = ∞           -3.96    -3.41    -3.12


9.10 Summary of the Phillips-Perron Tests for Unit Roots in a First-Order Autoregressive Model

Case 1: No Constant and Trend Terms are included in the Regression Equation

The true process is a purely random walk model without drift with serially correlated errors of the type

y_t = y_{t-1} + ε_t        (9.227)

where ε_t = θ(L)u_t = Σ_{j=0}^{∞} θ_j u_{t-j},  Σ_{j=0}^{∞} j·|θ_j| < ∞, and u_t ~ i.i.d.(0, σ², μ₄).

The following AR(1) model is considered for estimation

y_t = ρy_{t-1} + ε_t        (9.228)

The null hypothesis to be tested for a unit root problem is H₀: ρ = 1

against the alternative hypothesis H₁: ρ < 1.

Under the null hypothesis, the Phillips-Perron ρ-test statistic is given by

Z_ρ = T(ρ̂_T − 1) − (1/2)·{T²σ̂²_ρ̂_T / s²_T}·{λ̂² − γ̂₀}        (9.229)

and the Phillips-Perron t-test is given by

Z_t = {γ̂₀/λ̂²}^{1/2}·t_T − (1/2)·{(λ̂² − γ̂₀)/λ̂}·{T·σ̂_ρ̂_T / s_T}        (9.230)

where

ρ̂_T = Σ_{t=1}^{T} y_{t-1}y_t / Σ_{t=1}^{T} y²_{t-1},   s²_T = [1/(T−1)]·Σ_{t=1}^{T} (y_t − ρ̂_T y_{t-1})²,   σ̂_ρ̂_T = SE(ρ̂_T),

λ̂² = γ̂₀ + 2Σ_{j=1}^{q} [1 − j/(q+1)]·γ̂_j,   γ̂_j = (1/T)·Σ_{t=j+1}^{T} e_t e_{t-j},   γ̂₀ = (1/T)·Σ_{t=1}^{T} e²_t,

e_t = y_t − ρ̂_T y_{t-1},  and  t_T = (ρ̂_T − 1)/σ̂_ρ̂_T.
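The statistics Z_ρ and Z_t in (9.229)-(9.230) can be computed directly from these definitions. The following Python sketch is a minimal illustration of the case-1 formulas; it assumes NumPy, a one-dimensional series y, and q autocovariances in the Newey-West step, and the function name is ours rather than part of any package.

import numpy as np

def pp_case1(y, q=4):
    # Phillips-Perron Z_rho (9.229) and Z_t (9.230) for case 1 (no constant, no trend).
    y = np.asarray(y, dtype=float)
    y_lag, y_cur = y[:-1], y[1:]
    T = y_cur.shape[0]
    rho = np.sum(y_lag * y_cur) / np.sum(y_lag ** 2)        # OLS slope without intercept
    e = y_cur - rho * y_lag                                 # residuals e_t
    s2 = np.sum(e ** 2) / (T - 1)                           # s_T^2
    se_rho = np.sqrt(s2 / np.sum(y_lag ** 2))               # OLS standard error of rho-hat
    gamma0 = np.sum(e ** 2) / T
    lam2 = gamma0 + 2 * sum((1 - j / (q + 1)) * np.sum(e[j:] * e[:-j]) / T
                            for j in range(1, q + 1))       # Newey-West lambda^2-hat
    t_stat = (rho - 1) / se_rho
    z_rho = T * (rho - 1) - 0.5 * (T ** 2 * se_rho ** 2 / s2) * (lam2 - gamma0)
    z_t = (np.sqrt(gamma0 / lam2) * t_stat
           - 0.5 * ((lam2 - gamma0) / np.sqrt(lam2)) * (T * se_rho / np.sqrt(s2)))
    return z_rho, z_t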

We then compare the calculated values of the Phillips-Perron ρ-test and t-test with the table values given in Table 9-11 at the 1%, 5% or 10% level of significance. If the calculated value of either test statistic is smaller than the table value, the null hypothesis of a unit root will be rejected, indicating that the time-series data do not contain a unit root, i.e., the time-series variable is stationary. Otherwise, the time-series variable is said to be non-stationary, indicating that the time-series data exhibit the unit root problem. Ex. 9-8: For the given problem in Ex. 8-12, test the presence of the unit root problem using the Phillips-Perron test for case 1. Solution: The following AR(1) model is considered for estimation while the true model is a purely random walk with serially autocorrelated error terms without drift

PGDP_t = ρPGDP_{t-1} + ε_t        (9.231)

where PGDP_t is the per capita GDP (constant 2015 USD) of Bangladesh at time t, ε_t = θ(L)u_t = Σ_{j=0}^{∞} θ_j u_{t-j},  Σ_{j=0}^{∞} j·|θ_j| < ∞, and u_t ~ i.i.d.(0, σ², μ₄).

The null hypothesis to be tested is H₀: ρ = 1

against the alternative hypothesis H₁: ρ < 1.

Under the null hypothesis, the Phillips-Perron rho-test statistic is given by (9.229). For the given data, we have

ρ̂_T = Σy_{t-1}y_t / Σy²_{t-1} = 1.0417,   T = 50,   γ̂₀ = (1/T)Σe²_t = 21218.7949/50 = 424.3759,

s²_T = [1/(T−1)]Σe²_t = 21218.7949/49 = 433.0366,

γ̂₁ = (1/T)Σe_t e_{t-1} = 4474.6880/50 = 89.4938,   γ̂₂ = (1/T)Σe_t e_{t-2} = 5589.7016/50 = 111.7940,

γ̂₃ = (1/T)Σe_t e_{t-3} = 9809.9801/50 = 196.1996,   γ̂₄ = (1/T)Σe_t e_{t-4} = 4910.5424/50 = 98.2108,

and σ̂²_ρ̂_T = 0.0037² = 1.3616e-005.

Therefore, we have

λ̂² = γ̂₀ + (2×4/5)γ̂₁ + (2×3/5)γ̂₂ + (2×2/5)γ̂₃ + (2×1/5)γ̂₄
    = 424.3759 + (8/5)×89.4938 + (6/5)×111.794 + (4/5)×196.1996 + (2/5)×98.2108
    = 897.9628

Thus, putting the values of all the terms in equation (9.229), we have

Z_ρ = 50×(1.0417 − 1) − (1/2)×{50²×0.0037² / 433.0366}×{897.9628 − 424.3759}
    = 2.0649        (9.232)

Decision: At a 5% level of significance with T= 50, the table value of the Phillips-Perron rho test statistic is -7.7 [see Table 9-11]. Since the calculated value is greater than the table value, the null hypothesis of unit root will be accepted indicating that the time-series variable PGDP of Bangladesh exhibits the unit root problem with the presence of serial autocorrelation.

The Phillips-Perron t-test statistic is given by

Z_t = {424.3759/897.9628}^{1/2}×{(1.0417 − 1)/0.00370} − (1/2)×{(897.9628 − 424.3759)/√897.9628}×{50×0.00370/√433.0366}
    = 7.6933        (9.233)

Decision: At a 5% level of significance with T = 50, the table value of the Phillips-Perron t-test statistic is -1.95 [see Table 9-11]. Since the calculated value is greater than the table value, the null hypothesis of the unit root problem will be accepted, indicating that the time-series variable PGDP of Bangladesh exhibits the unit root problem.

Case 2: Only the Constant Term but no Trend Term is included in the Regression Equation

Let the true process be a purely random walk with serially correlated errors of the type

y_t = y_{t-1} + ε_t        (9.234)

where ε_t = θ(L)u_t ⇒ y_t − y_{t-1} = ε_t = θ(L)u_t,  Σ_{j=0}^{∞} j·|θ_j| < ∞, and u_t ~ i.i.d.(0, σ², μ₄).

The following AR(1) model is considered for estimation

y_t = α + ρy_{t-1} + ε_t        (9.235)

The null hypothesis to be tested for the unit root problem is H₀: ρ = 1

against the alternative hypothesis H₁: ρ < 1.

Under the null hypothesis, the Phillips-Perron ρ-test statistic is given by

Z_ρ = T(ρ̂_T − 1) − (1/2)·{T²σ̂²_ρ̂_T / s²_T}·{λ̂² − γ̂₀}        (9.236)

and the Phillips-Perron t-test statistic is given by

Z_t = {γ̂₀/λ̂²}^{1/2}·t_T − (1/2)·{(λ̂² − γ̂₀)/λ̂}·{T·σ̂_ρ̂_T / s_T}        (9.237)

where ρ̂_T is the OLS estimate of ρ, given by

ρ̂_T = Σ_{t=1}^{T} (y_{t-1} − ȳ)(y_t − ȳ) / Σ_{t=1}^{T} (y_{t-1} − ȳ)²,

and σ̂_ρ̂_T is the usual OLS standard error of ρ̂_T, given by σ̂_ρ̂_T = √[ s²_T / Σ_{t=1}^{T}(y_{t-1} − ȳ)² ], where

s²_T = [1/(T−2)]·Σ_{t=1}^{T} (y_t − α̂ − ρ̂_T y_{t-1})²,   t_T = (ρ̂_T − 1)/σ̂_ρ̂_T,

λ̂² = γ̂₀ + 2Σ_{j=1}^{q} [1 − j/(q+1)]·γ̂_j,   γ̂_j = (1/T)·Σ_{t=j+1}^{T} e_t e_{t-j},   γ̂₀ = (1/T)·Σ_{t=1}^{T} e²_t,   and   e_t = y_t − α̂_T − ρ̂_T y_{t-1}.
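In applied work these case-2 statistics are rarely computed by hand. As a usage illustration only, and assuming the third-party Python package arch and the interface of its PhillipsPerron class, the test could be run roughly as follows (er is a placeholder name for the series under test, such as the exchange-rate data used in Ex. 9-9):

from arch.unitroot import PhillipsPerron   # assumed third-party package and interface

pp_rho = PhillipsPerron(er, trend="c", test_type="rho")   # Z_rho, case 2 (constant only)
pp_tau = PhillipsPerron(er, trend="c", test_type="tau")   # Z_t, case 2
print(pp_rho.summary())
print(pp_tau.stat, pp_tau.critical_values)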

We then compare the calculated value of the Phillips-Perron Zȡ test with the table values given in Table 9-9 or the Phillips-Perron Z t test value with the table values given in Table 9-10 at 1%, 5% or 10% level of significance with the given sample sizes respectively. If the calculated value of any test statistic is smaller than the table value, we reject the null hypothesis of the unit root, indicating that the time-series data does not exhibit the unit root problem, i.e., the time-series variable is stationary. Otherwise, the time-series variable is said to be non-stationary indicating that the time-series data exhibits the unit root problem with autocorrelated errors. Ex. 9-9: For the given problem in Ex. 8-6, test the presence of the unit root problem using the Phillips-Perron test for case 2. Solution: Let the true process be a purely random walk with serially correlated errors of the type

ER_t = ER_{t-1} + ε_t        (9.238)

where ε_t = θ(L)u_t,  Σ_{j=0}^{∞} j·|θ_j| < ∞, and u_t ~ i.i.d.(0, σ², μ₄). ER_t indicates the exchange rate between BDT and USD at time t.

The following AR(1) model is considered for estimation

ER_t = α + ρER_{t-1} + ε_t        (9.239)

The null hypothesis to be tested for a unit root problem is H₀: ρ = 1

against the alternative hypothesis H₁: ρ < 1.

Under the null hypothesis, the Phillips-Perron ρ-test statistic is given by (9.236). For the given problem, we have

ρ̂_T = 0.9986,   T = 44,   γ̂₀ = 154.0406/44 = 3.5009,   γ̂₁ = 25.6282/44 = 0.5825,   γ̂₂ = −45.9518/44 = −1.0444,

γ̂₃ = 34.5190/44 = 0.7845,   γ̂₄ = 24.6147/44 = 0.5594,   s²_T = 154.0406/42 = 3.6676,   and σ̂²_ρ̂_T = 0.0129² = 0.000166.

Therefore, we have

λ̂² = γ̂₀ + (2×4/5)γ̂₁ + (2×3/5)γ̂₂ + (2×2/5)γ̂₃ + (2×1/5)γ̂₄
    = 3.5009 + (8/5)×0.5825 + (6/5)×(−1.044) + (4/5)×0.7845 + (2/5)×0.5594
    = 2.3282

Putting the values of all the terms in (9.236), the Phillips-Perron rho-test statistic is given by

Z_ρ = 44×(0.9986 − 1) − (1/2)×{44²×0.0129² / 3.6676}×{2.3282 − 3.5009}
    = −0.00901        (9.240)

Decision: At 5% level of significance with T= 44, the table value of the Phillips-Perron rho-test statistic is -13.076 [see Table 9-9]. Since the calculated value is greater than the table value, the null hypothesis of the unit root problem will be accepted indicating that the time-series variable ER exhibits the unit root problem.

The Phillips-Perron t-test statistic is given by

Z_t = {3.5009/2.3282}^{1/2}×{(0.9986 − 1)/0.0129} − (1/2)×{(2.3282 − 3.5009)/√2.3282}×{44×0.0129/√3.6676}
    = −0.0169        (9.241)

Decision: At a 5% level of significance with T = 44, the table value of the Phillips-Perron t-test statistic is -2.95 [see Table 9-10]. Since the calculated value is greater than the table value, the null hypothesis of the unit root problem will be accepted, indicating that the time-series variable ER exhibits the unit root problem.

Case 4: Constant and Trend Terms are included in the Regression Equation: True Process is a Random Walk with Drift with Serially Correlated Errors

The true process is a purely random walk with drift and with serially correlated errors of the type

y_t = α + y_{t-1} + ε_t        (9.242)

where ε_t = θ(L)u_t,  α ≠ 0,  Σ_{j=0}^{∞} j·|θ_j| < ∞, and u_t ~ i.i.d.(0, σ², μ₄).

The following regression equation is considered for estimation

y_t = α + ρy_{t-1} + δt + ε_t

The null hypothesis to be tested for the unit root problem is H₀: ρ = 1

against the alternative hypothesis H₁: ρ < 1.

[λ/σ]×T(ρ̂_T − 1) →L [ (1/2){[W(1)]² − 1} − W(1)∫₀¹W(r)dr ] / [ ∫₀¹[W(r)]²dr − (∫₀¹W(r)dr)² ]        (9.280)

Equation (9.280) implies that the term [λ/σ]×T(ρ̂_T − 1) of (9.279) has the same asymptotic distribution as in (9.102), which describes the estimate of ρ in a regression equation without the lagged Δy and without serially correlated errors. From equation (9.260), we have

λ/σ = 1/(1 − β₁ − β₂ − ....... − β_{p-1})        (9.281)

Thus, the magnitude of λ/σ is consistently estimated by

λ̂/σ̂ = 1/(1 − β̂_{1.T} − β̂_{2.T} − ....... − β̂_{p-1.T})        (9.282)

where β̂_{j.T} (j = 1, 2,……, p-1) is the OLS estimate from the regression equation (9.253). Thus, the generalisation of the Dickey-Fuller rho-test when lagged changes in y are included in the regression equation is

T(ρ̂_T − 1)/(1 − β̂_{1.T} − β̂_{2.T} − ....... − β̂_{p-1.T}) →L [ (1/2){[W(1)]² − 1} − W(1)∫₀¹W(r)dr ] / [ ∫₀¹[W(r)]²dr − (∫₀¹W(r)dr)² ]        (9.283)

Comment: To decide whether the null hypothesis will be accepted or rejected, the calculated value of T(ρ̂_T − 1)/(1 − β̂_{1.T} − β̂_{2.T} − ....... − β̂_{p-1.T}) is to be compared with the Dickey-Fuller table value at the 1%, 5% or 10% level of significance corresponding to the given sample size shown in Table 9-13.

Table 9-13: The critical values of the Augmented Dickey-Fuller (ADF) rho-test at 1%, 5% and 10% levels of significance for case 2

Sample Size T    0.01     0.05     0.10
T = 25          -17.2    -12.5    -10.2
T = 50          -18.9    -13.3    -10.7
T = 100         -19.8    -13.7    -11.0
T = 250         -20.3    -14.0    -11.2
T = 500         -20.5    -14.0    -11.2
T = ∞           -20.7    -14.1    -11.3

Source: Fuller, W. A. (1976), Introduction to Statistical Time Series, John Wiley & Sons, New York, p-373

If the calculated value is less than the table value, we reject the null hypothesis of a unit root, indicating that there is no unit root problem in the data. Otherwise, we accept the null hypothesis, indicating that the data contain a unit root even when lagged changes in y are included.

ADF t-Test Statistic to Test the Null Hypothesis H₀: ρ = 1

To test the null hypothesis H₀: ρ = 1 against the alternative hypothesis H₁: ρ < 1, the OLS t-test statistic is given by

t_T = (ρ̂_T − 1)/√[ s²_T·e′_{p+1}(X′X)⁻¹e_{p+1} ]        (9.284)

where e_{p+1} is a ((p+1)×1) vector whose last element is unity and all other elements are zero. Multiplying both the numerator and the denominator of (9.284) by T, we have


t_T = T(ρ̂_T − 1)/√[ s²_T·e′_{p+1}H_T(X′X)⁻¹H_T e_{p+1} ]        (9.285)

We know that

e′_{p+1}H_T(X′X)⁻¹H_T e_{p+1} = e′_{p+1}[H_T⁻¹(X′X)H_T⁻¹]⁻¹e_{p+1} →L e′_{p+1}·diag(V⁻¹, Q⁻¹)·e_{p+1}
                              = 1 / [ λ²{ ∫₀¹[W(r)]²dr − (∫₀¹W(r)dr)² } ]        (9.286)

and

s²_T = [1/(T−(p+1))]·Σ_{t=1}^{T} (y_t − β̂_{1.T}Δy_{t-1} − β̂_{2.T}Δy_{t-2} − ........ − β̂_{p-1.T}Δy_{t-p+1} − α̂_T − ρ̂_T y_{t-1})² →p E(ε²_t) = σ²        (9.287)

Therefore, equation (9.285) can be written as

t_T →L (σ/λ)·[ (1/2){[W(1)]² − 1} − W(1)∫₀¹W(r)dr ] / [ (σ/λ)·{ ∫₀¹[W(r)]²dr − (∫₀¹W(r)dr)² }^{1/2} ]
    = [ (1/2){[W(1)]² − 1} − W(1)∫₀¹W(r)dr ] / [ ∫₀¹[W(r)]²dr − (∫₀¹W(r)dr)² ]^{1/2}        (9.288)

This distribution is the same as in (9.111). Thus, the usual t-test can be applied to test the null hypothesis. Decision: To decide whether the null hypothesis will be accepted or rejected, we have to compare the calculated value of the test statistic t_T = (ρ̂_T − 1)/σ̂_ρ̂_T with the Dickey-Fuller table value given in Table 9-14 at the 1%, 5% or 10% level of significance corresponding to the given sample size.

Table 9-14: The critical values of the ADF t-test at 1%, 5% and 10% levels of significance for case 2

Sample Size T    0.01     0.05     0.10
T = 25          -3.75    -3.00    -2.63
T = 50          -3.58    -2.93    -2.60
T = 100         -3.51    -2.89    -2.58
T = 250         -3.46    -2.88    -2.57
T = 500         -3.44    -2.87    -2.57
T = ∞           -3.43    -2.86    -2.57

Source: Fuller, W. A. (1976), Introduction to Statistical Time Series, John Wiley & Sons, New York, p-373

If the calculated value of the test statistic is less than the table value, we reject the null hypothesis indicating that there is no problem of unit root in the data. Otherwise, we accept the null hypothesis that there is a problem of unit root in the data with lagged changes in y.
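For applied work, the ADF t-test of this section is available in standard software. The sketch below assumes the Python package statsmodels and its adfuller function; y is a placeholder for the series under test, and regression="c" corresponds to case 2 (use "n", or "nc" in older versions, for case 1 and "ct" for case 4).

from statsmodels.tsa.stattools import adfuller

stat, pvalue, usedlag, nobs, crit, icbest = adfuller(y, regression="c", autolag="AIC")
print(f"ADF t-statistic = {stat:.3f}, p-value = {pvalue:.3f}")
print("critical values:", crit)
# Reject the null hypothesis of a unit root if the statistic is smaller
# than (lies to the left of) the relevant critical value.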


ADF Test for Cases 1 and 4

In the preceding section, the ADF ρ-test and t-test were derived for case 2. In the same way, we can also derive the ADF ρ-test and t-test for cases 1 and 4, and then compare the test values with the table values given in Table 9-15 and Table 9-16 for cases 1 and 4 at the 1%, 5% and 10% levels of significance corresponding to different sample sizes. If the calculated value of the test statistic is less than the table value, we reject the null hypothesis of the unit root. Otherwise, we accept the null hypothesis of the unit root.

Table 9-15: The critical values of the ADF ρ-test and t-test at 1%, 5% and 10% levels of significance for case 1

The critical values of the ADF ρ-test at 1%, 5% and 10% levels for case 1
Sample Size T    0.01     0.05     0.10
T = 25          -11.9    -7.3     -5.3
T = 50          -12.9    -7.7     -5.5
T = 100         -13.3    -7.9     -5.6
T = 250         -13.6    -8.0     -5.7
T = 500         -13.7    -8.0     -5.7
T = ∞           -13.8    -8.1     -5.7

The critical values of the ADF t-test at 1%, 5% and 10% levels for case 1
Sample Size T    0.01     0.05     0.10
T = 25          -2.66    -1.95    -1.60
T = 50          -2.62    -1.95    -1.61
T = 100         -2.60    -1.95    -1.61
T = 250         -2.58    -1.95    -1.62
T = 500         -2.58    -1.95    -1.62
T = ∞           -2.58    -1.95    -1.62

Table 9-16: The critical values of the ADF ȡ -test and t-test at 1%, 5% and 10% levels of significance for case 4

The critical values of the ADF ρ-test at 1%, 5% and 10% levels for case 4
Sample Size T    0.01     0.05     0.10
T = 25          -22.5    -17.9    -15.6
T = 50          -25.7    -19.8    -16.8
T = 100         -27.4    -20.7    -17.5
T = 250         -28.4    -21.3    -18.0
T = 500         -28.9    -21.5    -18.1
T = ∞           -29.5    -21.8    -18.3

The critical values of the ADF t-test at 1%, 5% and 10% levels for case 4
Sample Size T    0.01     0.05     0.10
T = 25          -4.38    -3.60    -3.24
T = 50          -4.15    -3.50    -3.18
T = 100         -4.04    -3.45    -3.15
T = 250         -3.99    -3.43    -3.13
T = 500         -3.98    -3.42    -3.13
T = ∞           -3.96    -3.41    -3.12

9.12 Summary of the ADF Tests for Unit Roots in Autoregressive Equations

In this section, the applications of the ADF rho-test and t-test are shown for three different cases with numerical data.

Case 1: No Constant and Trend Terms are included in the Estimated Autoregressive Equation: True Process is a Random Walk without Drift

The true process is a purely random walk of the type

y_t = y_{t-1} + β₁Δy_{t-1} + β₂Δy_{t-2} + ........ + β_{p-1}Δy_{t-p+1} + ε_t        (9.289)

where ε_t ~ i.i.d. N(0, σ²).

The following regression equation is considered for estimation

y_t = ρy_{t-1} + β₁Δy_{t-1} + β₂Δy_{t-2} + ........ + β_{p-1}Δy_{t-p+1} + ε_t        (9.290)


y_t − y_{t-1} = ρy_{t-1} − y_{t-1} + β₁Δy_{t-1} + β₂Δy_{t-2} + ........ + β_{p-1}Δy_{t-p+1} + ε_t
Δy_t = (ρ − 1)y_{t-1} + β₁Δy_{t-1} + β₂Δy_{t-2} + ........ + β_{p-1}Δy_{t-p+1} + ε_t
Δy_t = δy_{t-1} + β₁Δy_{t-1} + β₂Δy_{t-2} + ........ + β_{p-1}Δy_{t-p+1} + ε_t        (9.291)

where δ = ρ − 1.

The null hypothesis to be tested is H₀: ρ = 1 ⇒ H₀: δ = 0

against the alternative hypothesis H₁: ρ < 1 ⇒ H₁: δ < 0.

We can apply the OLS t-test statistic, which is given by

t = (ρ̂_T − 1)/SE(ρ̂_T)   or   t = δ̂/SE(δ̂)        (9.292)

We can also apply the ADF rho-test statistic, which is given by

Z_ρ = T(ρ̂_T − 1)/(1 − β̂_{1.T} − β̂_{2.T} − ....... − β̂_{p-1.T})        (9.293)
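Both statistics in (9.292) and (9.293) can be obtained directly from an OLS fit of (9.291). The following Python sketch is a minimal NumPy illustration written for this summary: the function name is ours, the regression contains p−1 lagged differences and no deterministic terms (case 1), and p must be at least 2 here.

import numpy as np

def adf_case1(y, p=2):
    # Regress dy_t on y_{t-1} and (p-1) lags of dy_t; return the t-statistic
    # on delta (eq. 9.292) and the corrected rho-test (eq. 9.293).
    y = np.asarray(y, dtype=float)
    dy = np.diff(y)
    k = p - 1                                        # number of lagged differences
    T = dy.shape[0] - k
    X = np.column_stack([y[k:-1]] + [dy[k - j:-j] for j in range(1, k + 1)])
    Ydep = dy[k:]
    b, *_ = np.linalg.lstsq(X, Ydep, rcond=None)     # [delta-hat, beta_1, ..., beta_{p-1}]
    e = Ydep - X @ b
    s2 = e @ e / (T - X.shape[1])
    cov = s2 * np.linalg.inv(X.T @ X)
    t_delta = b[0] / np.sqrt(cov[0, 0])              # ADF t-test
    z_rho = T * b[0] / (1.0 - b[1:].sum())           # ADF rho-test
    return t_delta, z_rho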

We then compare the calculated value of the ADF t-test statistic or ADF rho-test statistic with the Dickey-Fuller table value at 1%, 5% or 10% level of significance corresponding to given sample size [see Table 9-15]. If the calculated value of the test statistic is smaller than the table value, we reject the null hypothesis of the unit root. This indicates that the time-series data is not integrated with order 1, or the time-series variable is said to be stationary. Otherwise, the time-series variable is said to be integrated with order 1, that is, the time-series variable is said to be nonstationary. Ex. 9-11: For the given problem in Ex. 8-6, test the presence of a unit root problem using the ADF for case 1. Solution: The true process is a purely random walk of the type ER t = ER t-1 +ȕ1ǻER t-1 +ȕ 2 ǻER t-2 +........+ȕ p-1'ER t-p+1 + İ t

(9.294)

where İ t ~i.i.d. N(0, ı 2 ) , and ERt is the exchange rate between BDT and USD at time t. The following regression equation is considered for estimation ER t = ȡER t-1 +ȕ1ǻER t-1 +ȕ 2 ǻER t-2 +........+ȕ p-1'ER t-p+1 + İ t

(9.295)

The null hypothesis to be tested is H 0 : ȡ =1

against the alternative hypothesis H₁: ρ < 1.

(iii) AIC overestimates the true order with positive probability. (iv) Since SBIC and HQIC are strongly consistent for p > 1, they may be used for choosing the correct model almost surely. (v) Intuitively, AIC is inconsistent because the penalty function used does not simultaneously go to infinity as T → ∞ and to zero when scaled by T. (vi) Ivanov and Kilian (2001) extensively study the small-sample properties of these three criteria using a variety of data-generating processes and data frequencies and find that HQIC is best for quarterly and monthly data, both when Y_t is covariance stationary and when it is a near unit-root process.

Ex. 10-10: The LR test, AIC, SBIC and HQIC are applied to select the VAR model for the time-series variables exchange rates of the British pound (EXUK), Japanese yen (EXJPN) and Canadian dollar (EXCAN) against the US dollar for 1971-2019. Let us consider a VAR(p) model of the type

[EXUK_t; EXJPN_t; EXCAN_t] = [C₁; C₂; C₃] + Σ_{i=1}^{p} [β₁₁ᵢ β₁₂ᵢ β₁₃ᵢ; β₂₁ᵢ β₂₂ᵢ β₂₃ᵢ; β₃₁ᵢ β₃₂ᵢ β₃₃ᵢ]·[EXUK_{t-i}; EXJPN_{t-i}; EXCAN_{t-i}] + [ε₁ₜ; ε₂ₜ; ε₃ₜ]        (10.134)

where EXUK is the exchange rate of the UK, EXJPN is the exchange rate of Japan, and EXCAN is the exchange rate of Canada against the US dollar respectively. İ1t , İ 2t , and İ 3t are the three white noise processes independent from the history of EXUK, EXJPN and EXCAN, and may be correlated. To select the appropriate lag-length of the VAR model, using software package EViews, different techniques are calculated and the results are given in Table 10-8.


Table 10-8: The results of different techniques for selecting the lag length of a VAR model.

Lag    LogL         LR Test      FPE        AIC       SBIC      HQIC
0      -171.5825    NA           0.47029    7.7592    7.8796    7.8041
1      -50.2672     221.0635     0.00319    2.7674    3.2492*   2.9470*
2      -38.3904     20.0585*     0.00283*   2.6396*   3.4827    2.9539
3      -32.8704     8.5868       0.00336    2.7942    3.9987    3.2432
4      -28.8131     5.7705       0.00430    3.0139    4.5797    3.5976

Source: Data are collected from the WDI 2020. FPE: indicates final prediction error, *: indicates lag order selected by the criteria

From the estimated results given in Table 10-8, the SBIC and HQIC both select order 1 as optimal, while the LR test, FPE and AIC select order 2. Since SBIC and HQIC are consistent criteria, we choose the VAR(1) model. Using the software package RATS, it is also confirmed that the lag length of the VAR model should be 1. Thus, the appropriate tri-variate VAR model will be

[EXUK_t; EXJPN_t; EXCAN_t] = [C₁; C₂; C₃] + [β₁₁₁ β₁₂₁ β₁₃₁; β₂₁₁ β₂₂₁ β₂₃₁; β₃₁₁ β₃₂₁ β₃₃₁]·[EXUK_{t-1}; EXJPN_{t-1}; EXCAN_{t-1}] + [ε₁ₜ; ε₂ₜ; ε₃ₜ]        (10.135)
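As a usage illustration only, and assuming the Python package statsmodels with a DataFrame df holding the three exchange-rate series, the same lag-selection exercise could be sketched as follows:

from statsmodels.tsa.api import VAR

model = VAR(df[["EXUK", "EXJPN", "EXCAN"]])
selection = model.select_order(maxlags=4)   # reports AIC, BIC (SBIC), HQIC and FPE by lag
print(selection.summary())
results = model.fit(1)                      # estimate the VAR(1) suggested by SBIC/HQIC
print(results.summary())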

Estimation of VAR Models

Let us consider a VAR(p) model of the type Yt = į + Ǻ1 Yt -1 + Ǻ 2 Yt -2 + ....... + Ǻp Yt-p + İ t , t = 1, 2,…….,T

(10.136)

The VAR(p) model is just a seemingly unrelated regression (SUR) model with lagged variables and deterministic terms as common regressors. Using the lag operator (L), the VAR(p) model can be written as follows Yt = į + Ǻ1 LYt + Ǻ 2 L2 Yt + ....... + Ǻ p Lp Yt + İ t

(10.137)

B(L)Yt = į + İ t

where Ǻ(L) = (I k  Ǻ1 L  Ǻ 2 L2  .......  Ǻp Lp ) The VAR(p) model is said to be stable if the roots of det(I k  Ǻ1 z  Ǻ 2 z 2  .......  Ǻp z p )

0, lie outside the complex

unit circle (have the modulus greater than one), or equivalently, if the eigenvalues of the companion matrix ª Ǻ1 «I « k « . F= « « . «0 « ¬« 0

Ǻ2 0 . . 0 0

... ... . . . . ... ...

. . . .

. ... 0 ... ... I k

Ǻp º 0 »» . » » . » 0» » 0 ¼»

Have the modulus less than one. Assuming that the process has been initialised in the infinite past, a stable VAR(p) process is stationary and ergodic with time-invariant means, variances and autocovariances. If Yt is a covariance stationary, the unconditional mean is given by ȝ = (I k  %1 L  %2 L2  .......  %p Lp )-1 į

(10.138)

The mean-adjusted form of the VAR(p) is then given by Yt  ȝ = Ǻ1 (Yt-1  ȝ) + Ǻ 2 (Yt-2  ȝ) + ....... + Ǻp (Yt-p  ȝ) + İ t y t = Ǻ1 y t-1 + Ǻ 2 y t-2 +.......+Ǻp y t-p +İ t

(10.139)

Chapter Ten

522

where y t - j

Yt- j  ȝ, j = 0, 1, 2,.........,p

We can estimate this VAR(p) model using the ordinary least-squares (OLS) estimators computed separately from each equation assuming that the VAR(p) model is a covariance stationary, and there are no restrictions on the parameters of the model. In SUR notation, the ith equation in the VAR(p) model can be written as y i = Zȕ i +İ i , ( i=1, 2,……,k )

(10.140)

where y i denotes a (T×1) vector of observations of the ith (i=1, 2,……,k) equation, Z is a (T×m) matrix with tth row and is given by Zct

(1, y t-1 , y t-2 ,....., y t-p )c , m = (kp+1), ȕ i is an (m u 1) vector of parameters, and İ i is a (T×1)

of error terms with the covariance matrix ı i2 IT . Since the VAR(p) model is in the form of a SUR model where each equation has the same explanatory variables, each equation may be estimated separately by the ordinary least squares without losing efficiency relative to the generalised least squares. Now, applying the OLS method to equation (10.140), the OLS estimator of ȕ i is given by ȕˆ i = (ZcZ)-1 Zcy i , i = 1, 2,.....,k

(10.141)

Let ȕˆ = ª¬ȕˆ 1

ȕˆ 2 ... ... ȕˆ k º¼ be an (m×k) matrix of the least squares coefficients for k equations of the VAR(p) ˆ denote the operator that stacks the columns of the (m×k) matrix ȕˆ into a long (km × 1) vector. model. Let vec(ȕ) That is, ª ȕˆ 1 º « » « ȕˆ 2 » vec(ȕˆ ) = « . » « » « . » « » ˆ ¬«ȕ k ¼»

(10.142)

Under standard assumptions regarding the behavior of stationary and ergodic VAR models (see Hamilton (1994) or ˆ is consistent and asymptotically normally distributed with an asymptotic varianceLutkepohl (1991)), vec(ȕ) covariance matrix. ˆ )]= ȍ ˆ … (ZcZ)-1 Asy[var(vec(ȕ) ª ıˆ 12 « «ıˆ 21 ˆ where ȍ = « . « « . « ıˆ ¬ k1

(10.143)

ıˆ 12 ıˆ 22

... ... ıˆ 12 º » ... ... ıˆ 2k » T ˆ and is given by ıˆ = 1 eit e jt , where . ... ... . » ; here, ıˆ ij is the (i,j)th element of ȍ ¦ ij T  p t=p+1 » . ... ... . » ıˆ k2 ... ... ıˆ 2k »¼ eit is the least-squares residuals from the ith equation of the VAR(p) model given by: eit = yit  yˆ it .

Ex. 10-11: For the given problem in Ex.10-10, VAR(1) model is selected using different criteria. Therefore, VAR(1) model is estimated for the variables EXUK, EXJPN and EXCAN. The software package EViews is used to estimate the VAR(1) model. The estimated results are given in Table 10-9.


Table 10-9: The estimated results of a VAR(1) model.

Vector Autoregression Estimates
Sample (adjusted): 1972-2019; included observations: 48 after adjustments

Var.              EXUK           EXJPN          EXCAN
EXUK(-1)          0.8783         -72.0632       0.2547
                  (0.1005)       (34.9214)      (0.1386)
                  [ 8.7371]      [-2.0636]      [ 1.8373]
EXJPN(-1)         -3.86E-05      0.8507         0.00013
                  (0.00012)      (0.0403)       (0.00016)
                  [-0.33302]     [ 21.1267]     [ 0.8052]
EXCAN(-1)         -0.02504       2.9138         0.8334
                  (0.0535)       (18.6002)      (0.07384)
                  [-0.4677]      [ 0.1567]      [ 11.2867]
C                 0.1170         58.2792        0.0381
                  (0.0759)       (26.3494)      (0.1046)
                  [ 1.5431]      [ 2.2118]      [ 0.3643]
R-squared         0.7593         0.9431         0.8409
Adj. R-squared    0.7429         0.9392         0.8301
Sum sq. resids    0.1038         12521.26       0.1973
S.E. equation     0.0486         16.8693        0.0670
F-statistic       46.2629        243.1740       77.5267
Log likelihood    79.1776        -201.6446      63.7504
Akaike AIC        -3.1324        8.5685         -2.4896
Schwarz SC        -2.9765        8.7245         -2.3337
Mean dependent    0.6029         155.5779       1.2296
S.D. dependent    0.0958         68.4361        0.1624

Note: values within parentheses are the standard errors and within third brackets are t- statistics

From the estimated results of the VAR(1) model given in Table 10-9, it can be said that there is little evidence of lead-lag interactions among the time-series variables.

Maximum Likelihood Estimation and Hypothesis Testing for Unrestricted Vector Autoregression Models

Let us consider the pth-order Gaussian VAR model of the type p

Yt = į + ¦ Ǻ j Yt- j + İ t

(10.144)

j=1

All the terms of (10.144) have already been explained previously. Suppose, we have observed the k variables for (T+p) time periods assuming that the initial values Y0 , Y1 , Y-2 , ........., Y-p 1 are fixed and the estimation is based on the last T observations denoted by Y1 , Y2 , Y3 , ........., YT . The conditional likelihood function is given by f YT ,YT-1 ,............,Y1 |Y0 , Y1 , Y-2 ,.........,Y-p1 (y T , y T-1 , .....y 1 | y 0 , y -1 , .....y -p+1 ;ș)

(10.145)

We maximise this function with respect to ș , where ș is a vector that contains the elements of į, Ǻ1 , Ǻ 2 , .........., Ǻp and ȍ . Vector autoregression functions can be estimated invariably on the basis of the conditional likelihood function (10.145) rather than the full-sample unconditional likelihood function. Conditional on the values of Y observed through date (t-1), the value of Y for the time period t is equal to a constant plus an N(0, ȍ) variable, i.e.,  p į + ¦ Ǻ j Yt - j  İ t . Thus, we j=1

Constant

have p

E ª¬ Yt | Yt-1 , Yt-2 ,........., Y-p 1 º¼

and

į + ¦ Ǻ j Yt - j j=1

(10.146)

Chapter Ten

524

Var ª¬ Yt | Yt-1 , Yt-2 ,........., Y-p 1 º¼

(10.147)

ȍ

Hence, the distribution of Yt | Yt-1 , Yt-2 ,........., Y-p 1 is given by p § · Yt | Yt-1 , Yt-2 ,........., Y-p 1 ~ N ¨ į + ¦ Ǻ j Yt- j , ȍ ¸ j=1 © ¹

(10.148)

It will be convenient for us if we express the conditional mean (10.146) in a more compact form. Let us define,

Xt

ª1 º «Y » « t-1 » « Yt-2 » « » «. » «. » « » «¬ Yt-p »¼

(10.149)

Thus, Xt is a {(kp+1) u 1} vector. Let us define Ȇ c as Ȇc

ª¬į Ǻ1

Ǻ2

.... Ǻp º¼

(10.150)

Then, Ȇ c is a (k u (kp+1)) matrix of parameters. Then, the conditional mean is equal to Ȇ cX t . The jth row of Ȇ c contains the parameters of the jth equation in the VAR(p) model. Therefore, equation (10.148) can be written in a more compact form as Yt | Yt-1 , Yt-2 ,........., Y-p 1 ~ N Ȇ cX t , ȍ

(10.151)

The conditional density function of the tth observation is given by f Yt |( Yt-1 , Yt-2 ,.........,Y-p1 (y t | y t-1 , y t-2 , ....., y -p+1 ;ș) = 2ʌ

-k/2

| ȍ -1 |1/2 u

exp ª(1/ 2) yt  ȆcXt c ȍ-1 yt  ȆcXt º ¬« ¼»

(10.152)

The joint density function of observations 1 through t conditioned on Y-1 , Y-2 ,....., Y-p 1 is given by f Yt , Yt-1 , Yt-2 ,.........,Y1 |Y0 , Y-1 , Y-2 ,.........,Y-p1 (y t , y t-1 , y t-2 , ....., y 1 | y 0, y -1 , y -2 , ....., y -p+1 ;ș) = f Yt-1 , Yt-2 ,.........,Y1 |Y0 , Y-1 , Y-2 ,.........,Y-p1 (y t-1 , y t-2 , ....., y 1 | y 0 , y -1 , y -2 , ....., y -p+1 ;ș) u f Yt |( Yt-1 , Yt-2 ,.........,Y-p1 (y t | y t-1 , y t-2 , ....., y -p+1 ;ș)

(10.153)

Applying this formula recursively, the likelihood function for the full sample y T , y T-1 , ........., y 1 conditioned on y 0 , y -1 , y -2 , ........., y -p+1 is the product of the individual conditional densities which can be written as f YT , YT-1 , YT-2 ,.........,Y1 |Y0 , Y-1 , Y-2 ,.........,Y-p1 (y T , y T-1 , y T-2 , ....., y 1 | y 0, y -1 , y -2 , ....., y -p+1 ;ș) T

=– f Yt |Yt-1 , Yt-2 ,.........,Y-p1 (y t | y t-1 ,y t-2 ,....., y -p+1 ;ș)

(10.154)

t=1

The sample log likelihood function is obtained by substituting (10.153) in (10.152) and then taking log which is given by T

logL(ș) =

¦ log[f t=1



Yt |Yt-1 , Yt-2 ,.........,Y-p 1

(y t | y t-1 , y t-2 , ....., y -p+1 ;ș)]

Tk T 1 T log 2ʌ  log | ȍ-1 |  ¦ ª y t  Ȇ cX t c ȍ-1 y t  Ȇ cX t º »¼ 2 2 2 t=1 «¬

(10.155)

Multivariate Time-Series Models

525

The Ȇ contains the constant term į and autoregressive coefficient Ǻ j . The maximum likelihood estimate of Ȇ can be obtained by differentiating equation (10.155) with respect to the elements of Ȇ and then equating to zero. Thus, turns out to be given by ªT ºª T º « ¦ y t Xct » « ¦ Xt Xct » ¬ t =1 ¼ ¬ t=1 ¼

ˆc Ȇ (k×(kp+1))

1

(10.156)

which can be viewed as the sample analog of the population linear equation of Yt on a constant term and Xt . The jth row of Ȇ c is given by ªT ºª T º c y X ¦ jt t « » « ¦ X t Xct » ¬ t =1 ¼ ¬ t=1 ¼

ʌj (1×(kp+1))

1

(10.157)

which is just the estimated coefficient vector from an OLS regression of y jt on Xt . Thus, the maximum likelihood estimates of the coefficients for the jth equation of a VAR model are found by an OLS regression of y jt on a constant term and p lags of all the variables in the system. Thus, the OLS residual vector et

ˆ cX yt  Ȇ t

(10.158)

Putting the MLE of Ȇ c in equation (10.155), we have ˆ log L(ȍ,Ȇ)



Tk T 1 T log 2ʌ  log | ȍ-1 |  ¦ ª¬ect ȍ-1et º¼ 2 2 2 t=1

(10.159)

The maximum likelihood estimate of ȍ can be obtained by differentiating equation (10.159) with respect to the elements of ȍ-1 and then equating to zero. Thus, we have ˆ G log L(ȍ,Ȇ) -1 Gȍ T 1 T ȍ  ¦ et ect 2 2 t=1

0

0

T

ˆ ȍ

¦ e ec t t

t=1

(10.160)

T

ˆ is given by The element of the ith row and the ith column of ȍ T

ıˆ i2 =

¦e

2 it

t=1

(10.161)

T

which is just the average squared residual from a regression equation of the ith variable in the VAR(p) model on a constant term and p lags of all the variables that are included in the VAR(p) model. ˆ is given by The element of the ith row and the jth column of ȍ T

¦e e it

ıˆ ij

jt

t=1

T

(10.162)

which is the average product of the OLS residauls for variable i and the OLS residual for variable j. The Idea of the Likelihood Ratio Test We start with a brief review of the likelihood ratio test under the multivariate normal distribution. To perform a likelihood ratio test, we have to obtain the maximum value of the likelihood function of a VAR(p) model. The maximum value of the likelihood function for a VAR(p) model is given by

Chapter Ten

526

ˆ Ȇ) ˆ log L(ȍ,



T Tk T ˆ -1 |  1 ªec ȍ ˆ -1e º log 2ʌ  log | ȍ ¦ t t¼ 2 2 2 t=1 ¬

(10.163)

ˆ is given by equation (10.160). The last term of (10.163) is a scalar quantity. Thus, we can write where ȍ

1 T ª ˆ -1 º ¦ ect ȍ et ¼ 2 t=1 ¬

T

ˆ -1e º (1/ 2)trace¦ ª¬ect ȍ t¼ t=1

T

ˆ -1e ec (1/ 2)trace¦ ȍ t t t=1

ˆ -1 Tȍ ˆ -1 º (1/ 2)trace ª¬ȍ ¼ (1/ 2)trace > T u I k @ =

Tk 2

(10.164)

Putting this value in (10.163), we have ˆ Ȇ) ˆ log L(ȍ,



Tk T ˆ -1 |  Tk log 2ʌ  log | ȍ 2 2 2

(10.165)

To perform the likelihood ratio test, equation (10.165) will be very helpful. Our objective is to choose the largest order for the time series. Let Ǻc = [Ǻ1c , Ǻc2 , ........., Ǻcp ]c be a (kp×k) matrix. Setting ȕ = vec(Ǻ), where vec(Ǻ) stacks the columns of Ǻ , ȕ is a (k 2 p u 1) vector of parameters. The null hypothesis to be tested is H 0 : R(ȕ) = 0

against the alternative hypothesis H1: R(ȕ) z 0

where R (ȕ) = 0, indicates a set of restrictions. Under the null hypothesis, the maximum value of the log likelihood function (10.165) is given by log(L 0 )



Tk T ˆ 1 |  Tk log 2ʌ  log | ȍ 0 2 2 2

(10.166)

To estimate the system under the null hypothesis, we perform a set of k OLS regressions of each variable in the system ˆ which is given by on a constant and the lag values of the variables under H 0 . Then, we obtain ȍ T

ˆ ȍ 0

¦e

ec

t 0 t0

t=1

(10.167)

T

ˆ is the variance-covariance matrix of the residual from these regressions and e is the residual vector under where ȍ 0 t0 the null hypothesis which is given by et 0

ˆ cX yt  Ȇ 0 t

(10.168)

The maximum value of the log likelihood function of the unrestricted regression equations is given by log(L ur )



Tk T ˆ 1 |  Tk log 2ʌ  log | ȍ ur 2 2 2

(10.169)

Multivariate Time-Series Models

527

ˆ is the variance-covariance matrix of the residuals from the unrestricted regression equations. Twice the log where ȍ ur likelihood ratio is given by LR = 2 > log(L ur )  log(L 0 ) @ ˆ 1 |  log | ȍ ˆ 1 |º T ª¬log | ȍ ur re ¼ ª 1 1 º T «log  log » ˆ ˆ | ȍ0 | »¼ ¬« | ȍ ur | ˆ |  log | ȍ ˆ |º T ª¬ log | ȍ 0 ur ¼

(10.170)

under the null hypothesis LR~Ȥ q , where q is the number of restrictions imposed under H 0 . Ex. 10-12: Let a bivariate VAR be estimated with four lags. Assume that the original sample contains 46 observations on each variable (denoted y -3 ,y -2 ,.....,y 46 ) and the sample observations 1 trough 42 are used to estimate the VAR(4) model. Under the null hypothesis, the VAR(3) model is estimated using the sample observations 1 through 42. Let us define, eit (i =1, 2) is the sample residual for observation t from the unrestricted regression equation of yit on constant and the lagged values of itself and also the lags of the other variable. For the sample observations, we have T

2 1

ıˆ

T

2 1t

¦e

2 2

2.5, ıˆ =

t=1

T

¦e

T

2 2t

T

e

1t 2t

=2.8, and ıˆ 12 = ıˆ 21

t=1

¦e t=1

=1.2 .

T

ª 2.5 1.2 º ˆ «1.2 2.8» for which we have log | ȍ ur | log(5.56) 1.72 . Again, let eit0 be the (i =1, 2) ¬ ¼ be the sample residual for observation t from the restricted regression equation of yit on a constant and its own 3 lagged values and also the three lags of the other variable. For the restricted VAR model, we have

ˆ Therefore, we have ȍ ur

T

2 10

ıˆ

T

2 1t0

¦e t=1

3.2, ıˆ

T ˆ log | ȍ 0 | log(8.82)

2 2

¦e t=1

T

T

2 2t0

¦e

e

1t0 2t0

3.6, and ıˆ 120

ıˆ 210

t=1

T

ˆ 1.8 . Therefore, we have ȍ 0

ª3.2 1.8 º «1.8 3.6 » , and ¬ ¼

2.18 .

Putting the values of all the terms in (10.170), we have LR = 42(2.18  1.72)

= 19.32 The degrees of freedom for this test statistic is 22 (4  3)

(10.171) 4.

Decision: At 5% level of significance with 4 degrees of freedom, the table value of the test statistic is 9.49. Since the calculated value of the test statistic is greater than the table value, the null hypothesis will be rejected. Therefore, it can be concluded that the dynamics are not completely captured by a three-lag VAR model, and thus, a four-lag VAR model would be preferable. For a small sample, the LR test in (10.170) will be biased against the null hypothesis. Thus, Sims (1980) suggested a modification to the likelihood ratio test to take into account the small-sample bias. He suggested to use the following test statistic to test the null hypothesis: LR

ˆ | ln|ȍ ˆ |º ~ Ȥ 2 (T  pk) ª¬ ln|ȍ 0 ur ¼ q

(10.172)

where pk is the total number of estimated parameters in each equation of the unrestricted VAR system. Finally, one should remember that the distribution of the LR test is only asymptotically valid. For the given problem, this test statistic will be

Chapter Ten

528

LR

(42  9)(2.18  1.72) ~ Ȥ 24 = 15.18

(10.173)

The null hypothesis will also be rejected at 5% level of significance. Forecasting with a VAR Model The VAR model is designed to know how the values of the variables at the current time period t are related to the past values. Therefore, the nature of VAR can be applied in forecasting the future values of the variables say x and y conditional on their past histories. Suppose that we have a sample of T observations of the variables x and y and we wish to forecast their values at time periods T + 1, T + 2, etc. To keep the algebra simple, we consider the bivariate VAR(1) model to explain forecasting. The bivariate VAR(1) model is y t = ȕ10 +ȕ11 y t-1 +ȕ12 x t-1 + İ1t ½ ¾ x t = ȕ 20 +ȕ 21 y t-1 + ȕ 22 x t-1 + İ 2t ¿

(10.174)

where İ1t and İ 2t are two white noise processes independent from the history of y and x, and may be correlated. For time period T+1, the VAR(1) model can be written as yT+1 = ȕ10 +ȕ11 yT +ȕ12 x T + İ1,T+1 ½° ¾ x T+1 = ȕ 20 +ȕ 21 y T + ȕ 22 x T + İ 2,T+1 °¿

(10.175)

Taking the expectation conditional on the relevant information from the sample ( y T and x T ), we have E(y T+1 |yT , x T ) = ȕ10 +ȕ11 y T +ȕ12 x T + E(İ1,T+1 |y T , x T ) ½° ¾ E(x T+1 |y T , x T ) = ȕ 20 +ȕ 21 yT + ȕ 22 x T + E(İ 2,T+1 |y T , x T ) °¿

(10.176)

The conditional expectation of the random error terms on the right-hand side of the VAR model must be zero in order to obtain the consistent OLS estimators of the coefficients. Whether this assumption is valid or not depends on the serial correlation properties of the random error terms. We have seen that serially correlated error terms are not related to the lagged dependent variables in the VAR model. Thus, we want to make sure that E(İ itf | İ1,t-1 , İ 2,t-1 ) 0, for i =1, 2 . Therefore, we have E(y T+1 |y T , x T ) = ȕ10 +ȕ11 yT +ȕ12 x T ½ ¾ E(x T+1 |y T , x T ) = ȕ 20 +ȕ 21 yT + ȕ 22 x T ¿

(10.177)

If we know the coefficients ȕ's , we can use (10.177) to forecast for period T+1. Generally, we may use our estimated VAR coefficients in place of the true values to calculate our predictions. Thus, the one-period-ahead forecasts are yˆ T+1 |T = ȕˆ 10 +ȕˆ 11 y T +ȕˆ 12 x T xˆ |T = ȕˆ +ȕˆ y + ȕˆ x T+1

20

21

T

22

½° ¾ T° ¿

(10.178)

The one-period-ahead forecast errors in equation (10.178) are given by yT+1  yˆ T+1 |T = (ȕ10  ȕˆ 10 )  (ȕ11  ȕˆ 11 )y T  (ȕ12  ȕˆ 12 )x T  İ1,T+1 ½° ¾ x T+1  xˆ T+1 |T = (ȕ 20  ȕˆ 20 )  (ȕ 21  ȕˆ 21 )y T  (ȕ 22  ȕˆ 22 )x T  İ 2,T+1 °¿

(10.179)

If the OLS estimates of ȕ’s are consistent and there is no serial correlation in İ's , then the expectation of the forecast error is asymptotically zero, i.e., E yT+1  yˆ T+1 |T

0 °½ ¾ E x T+1  xˆ T+1 |T = 0 °¿

The variance of the forecast error is given by

(10.180)

Multivariate Time-Series Models

Var(ȕˆ 10 )+Var(ȕˆ 11 )yT2  Var(ȕˆ 12 )x T2  2yT Cov(ȕˆ 10 ,ȕˆ 11 )  ½ ° °° 2x T Cov(ȕˆ 10 ,ȕˆ 11 )  2yT x T Cov(ȕˆ 11 ,ȕˆ 12 )  Var(İ1,T+1 ) ¾ Var(ȕˆ 20 )+Var(ȕˆ 21 )yT2  Var(ȕˆ 22 )x T2  2yT Cov(ȕˆ 20 ,ȕˆ 21 )  ° ° 2x T Cov(ȕˆ 20 ,ȕˆ 22 )  2yT x T Cov(ȕˆ 21 ,ȕˆ 22 )  Var(İ 2,T+1 ) °¿

Var y T+1  yˆ T+1 |T Var x T+1  xˆ T+1 |T

529

(10.181)

ˆ converge to the true population parameters. Thus, all of the If we increase the sample size, i.e., if T o f , then ȕ's terms in expression (10.181) converge to zero except the last one. Hence, we have

ı12 °½ ¾ Var(İ 2,T+1 ) ı 22 °¿

Var y T+1  yˆ T+1 |T

Var(İ1,T+1 )

Var x T+1  xˆ T+1 |T

(10.182)

In calculating the variances of the forecast errors, the errors in estimating the coefficients are often neglected. Using a recursive relationship, the two-period-ahead forecasts are given by E(yT+1 |yT+1 , x T+1 ) = ȕ10 +ȕ11 y T+1 +ȕ12 x T+1 ½ ¾ E(x T+1 |y T+1 , x T+1 ) = ȕ 20 +ȕ 21 y T+1 + ȕ 22 x T+1 ¿

(10.183)

So, by recursive expectations, we have E(yT+2 |y T , x T ) = ȕ10 +ȕ11E(y T+1 |y T , x T )+ȕ12 E(x T+1 |yT , x T )

½ ° = ȕ10 +ȕ11 >ȕ10 +ȕ11 yT +ȕ12 x T @ +ȕ12 >ȕ 20 +ȕ 21 y T + ȕ 22 x T @ ° ¾ E(x T+2 |yT , x T ) = ȕ 20 +ȕ 21E(y T+1 |y T , x T )+ ȕ 22 E(x T+1 |y T , x T ) ° = ȕ 20 +ȕ 21 >ȕ10 +ȕ11 yT +ȕ12 x T @ + ȕ 22 >ȕ 20 +ȕ 21 yT + ȕ 22 x T @°¿

(10.184)

The two-period-ahead forecasts are yˆ T+2 |T = ȕˆ 10 +ȕˆ 11 yT+1 |T+ȕˆ 12 x T+1 |T ½° ¾ xˆ T+2 |T = ȕˆ 20 +ȕˆ 21 y T+1 |T+ ȕˆ 22 x T+1 |T °¿

(10.185)

If we again ignore the errors in estimating the coefficients, the two-period-ahead forecast errors in (10.185) are given by yT+2  yˆ T+2 |T = ȕ11İ1,T+1 +ȕ12 İ 2,T+1 +İ1,T+2 ½° ¾ x T+2  xˆ T+2 |T = ȕ 21İ1,T+1 +ȕ 22 İ 2,T+1 +İ 2,T+2 °¿

(10.186)

Since the error terms for period (T+1) are correlated across equations, the variance of the two-period-ahead forecast is given by 2 2 2 2 = ȕ11 ı1 +ȕ12 ı 2 +2ȕ11ȕ12 ı12 +ı12 ½ ° 2 2 2 = (1+ȕ11 )ı12 +ȕ12 ı 2 +2ȕ11ȕ12 ı12 ° ¾ Var x T+2  xˆ T+2 |T = ȕ 221ı12 +ȕ 222 ı 22 +2ȕ 21ȕ 22 ı12 +ı 22 ° ° (1+ȕ 222 )ı 22 +ȕ 221ı12 +2ȕ 21ȕ 22 ı12 ¿

Var yT+2  yˆ T+2 |T



(10.187)

To estimate a VAR model, we can simply use the ordinary least-squares equation. Now, the (i, j)th element of 6 is 1 T 1 T eit e jt , and 6ˆ given by ıˆ ij = ¦ ¦ et ect . For the considered VAR model p =1, from equations (10.187) T  p t=p+1 T  p t=p+1 and (10.182), it can be said that the variance of the two- period-ahead forecast errors is larger than the variance of the one-period-ahead forecast error because the errors that we make in forecasting period (T+1) will be added with the forecast errors in period (T+2). If we increase our forecast horizon, the variance will be larger and larger, reflecting our inability to forecast a larger horizon in the future even if we have accurate estimates of the coefficients. The calculations of equation (10.187) will be increasingly more complex as we consider longer forecast horizons. If we include more than two variables in the VAR model or more than one lag on the right-hand side, also increases the number of terms in both equations (10.187) and (10.182) rapidly.


Ex. 10-13: A VAR model is forecasted for the variables number of persons left for abroad on employment (EM) and total workers’ remittances (RE) (in crore TK) of Bangladesh for the period 1978-2018 which is collected from Bangladesh Bank. First, we consider the following VAR(p) model for the given data: ª EM t º « RE » ¬ t¼

ªį1 º p ªȕ11i «į »  ¦ «ȕ ¬ 2 ¼ i=1 ¬ 21i

ȕ12i º ª EM t-i º ªİ1t º + ȕ 22i »¼ «¬ RE t-i »¼ «¬İ 2t »¼

(10.188)

where EM is the number of persons left for abroad on employment and RE is the remittance of total workers. İ1t and İ 2t are two white noise processes independent from the history of EM and RE, and may be correlated across the equations. Different techniques namely: LR, AIC, SBIC and HQIC are calculated using the software package EViews to select the lag-length of the VAR model and the results are given in Table 10.10.

Table 10-10: The results of different techniques for selecting the lag length of a VAR model.

VAR Lag Order Selection Criteria
Endogenous variables: EM RE; Exogenous variables: C; Sample: 1978-2018; included observations: 38

Lag    LogL         LR           FPE          AIC          SBIC         HQIC
0      -965.5842    NA           4.48e+19     50.92548     51.01167     50.95615
1      -880.0311    157.5979*    6.14e+17*    46.63321*    46.89178*    46.72521*
2      -877.3516    4.653720     6.59e+17     46.70272     47.13366     46.85604
3      -872.7388    7.526148     6.42e+17     46.67047     47.27379     46.88512

*: indicates lag order selected by the criterion

From the estimated results of all criterion, it can be said that the VAR(1) model is appropriate for the given data. The estimated results for the VAR(1) model are given in Table 10-11.

Table 10-11: Estimated results for the VAR(1) model

Vector Autoregression Estimates
Var.              RE             EM
RE(-1)            0.947758       1.752797
                  (0.03626)      (0.79335)
                  [ 26.1357]     [ 2.20936]
EM(-1)            0.020771       0.687849
                  (0.00607)      (0.13290)
                  [ 3.41921]     [ 5.17560]
C                 -1077.169      55509.57
                  (1374.67)      (30074.7)
                  [-0.78358]     [ 1.84572]
R-squared         0.983688       0.782064
Adj. R-squared    0.982807       0.770284
Sum sq. resids    1.17E+09       5.60E+11
S.E. equation     5622.787       123013.9
F-statistic       1115.652       66.38731
Log likelihood    -400.5816      -524.0004
Akaike AIC        20.17908       26.35002
Schwarz SC        20.30575       26.47669
Mean dependent    32494.80       295478.4
S.D. dependent    42881.50       256660.1

Values within the parentheses are the standard errors; values within third brackets are the t-test values

The software package STATA is used to forecast the VAR(1) model for long horizon. The forecasting results for both variables are give in Table 10-12.
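The forecasts reported below are generated recursively as in (10.178) and (10.185): each forecast is fed back in as the conditioning value for the next period. A minimal Python sketch of this recursion is given here as our own illustration; the rounded coefficients in the comment are taken from Table 10-11 with the state ordered as (EM, RE).

import numpy as np

def var1_forecast(c, B, y_T, h):
    # c: (k,) intercept vector, B: (k x k) coefficient matrix, y_T: last observation.
    preds = []
    y = np.asarray(y_T, dtype=float)
    for _ in range(h):
        y = c + B @ y            # y_{T+s}|T = c + B * y_{T+s-1}|T
        preds.append(y)
    return np.vstack(preds)      # (h x k) array of point forecasts

# Illustration with the estimates of Table 10-11 (state order EM, RE; values rounded):
# c = np.array([55509.57, -1077.17]); B = np.array([[0.6878, 1.7528], [0.0208, 0.9478]])
# var1_forecast(c, B, np.array([880037.0, 123156.0]), 8)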


Table 10-12: Forecasting values of the variables EM and RE

Year     EM             RE
2018     880037.00      123156.00
2019     876709.64      133924.10
2020     893295.21      144060.54
2021     922470.69      154011.92
2022     959981.79      164049.43
2023     1003377.50     174341.70
2024     1051267.40     184997.65
2025     1102886.20     196091.62
2026     1157837.60     207678.20

The results are shown below graphically.

[Two panels, "Forecast for em" and "Forecast for re": point forecasts with 95% confidence intervals over 2018-2026.]

Fig. 10-3: Forecast values of EM and RE

From Fig. 10-3, it can be said that both EM and RE are forecast to increase roughly linearly over the future time periods.

Impulse Response Function In a VAR model, using the block-F tests and the examination of causality, we can detect which of the variables have statistically significant effects on the future have valued of each of the variables in the system. But using the F-test, it will not be possible for us to detect the sign of the relationship or how long these effects require to take place. That is, the F-test will not be able to explain whether changes in the value of the given variable have either a positive or a negative effect on other variables in the system or how long it would take for the effect of that variable to work through the VAR model. One of the most popular and widely applicable techniques used to obtain this information is known as the impulse response function. Thus, impulse response functions are very useful for studying the interactions between variables in a VAR model. They represent the reactions of the variables to shocks hitting the system. It is often not clear, however, which shocks are relevant for studying specific economic problems. Therefore, structural information has to be used to specify meaningful shocks. To explain the meaning of the impulse response function, let us now consider the bivariate VAR(1) model of the type Yt = į1 +ș11Yt-1 +ș12 X t-1 + İ1t ½ ¾ X t = į 2 +ș 21Yt-1 +ș 22 X t-1 + İ 2t ¿

(10.189)

Equation (10.189) can be written as the following matrix form: § Yt · ¨ ¸ © Xt ¹

§ į1 · § ș11 ș12 · § Yt-1 · § İ1t · ¨ ¸¨ ¸¨ ¸¨ ¸ © į 2 ¹ © ș 21 ș 22 ¹ © X t-1 ¹ © İ 2t ¹ (10.190)

Z t = į + șZ t-1 + w t

§Y · where Zt = ¨ t ¸ , į = © Xt ¹

§ į1 · § ș11 ș12 · ¨ ¸, ș =¨ ¸ , Zt-1 = © į2 ¹ © ș 21 ș 22 ¹

§ Yt-1 · ¨ ¸ , and w t © X t-1 ¹

§ İ1t · ¨ ¸. © İ 2t ¹

Equation (10.190) can be written as the following VMA (vector moving average) model


Z t = ȝ+w t +șw t-1 +ș 2 w t-2 +.......+ș j w t-j +........... f

= ȝ+¦ ș j w t-j

(10.191)

j=0

The moving-average representation is very much important because it allows examining the interaction between the įZ ^Yt ` and ^X t ` sequences. Then, the function t = ș j , is called the impulse response function. Here, ș j is a įw t-j

2 u 2 matrix. The components of ș j are used to generate the effects of shocks İ1t and İ 2t on the entire time paths of

^Yt `

and ^X t ` sequences. The impact multipliers are the four elements. The impulse response functions are the four

sets of coefficients ș j (1, 1), ș j (1, 2), ș j (2, 1), and ș j (2, 2). More generally, the (k, m)th component of ș j is denoted by ș j (k, m). Then, the component ș j (k, m), measures the response of the kth variable to the error of the mth variable after j periods. For example, ș3 (1, 2) measures the response of the first variable to the error of the second variable after 3 periods. They are usually presented using a plot of ș j (k, m) against j. That is, for a given impulse response plot, we let j vary while holding k and m constant. For a bivariate system, there are four impulse responses plots. From these plots, we are able to know the timeဨpaths of y and x in response to the shocks İ1 and İ 2 . To illustrate how impulse response functions operate, we will consider the same simulated sequences as before without the intercept of the type (10.192)

Z t = ș Zt-1 + w t ª0.6 0.4 º where ș = « ». ¬ 0.3 0.5 ¼

The VAR can also be written by using the elements of the matrices and vectors as follows:

[y_t; x_t] = [0.6 0.4; 0.3 0.5]·[y_{t-1}; x_{t-1}] + [ε₁ₜ; ε₂ₜ]        (10.193)

Considering the effect at time t = 0, 1, 2,…… of a unit shock to y_t at time t = 0, we have

Z₀ = [ε₁₀; ε₂₀] = [1; 0];
Z₁ = θZ₀ = [0.6 0.4; 0.3 0.5]·[1; 0] = [0.6; 0.3];
Z₂ = θZ₁ = [0.6 0.4; 0.3 0.5]·[0.6; 0.3] = [0.48; 0.33];
Z₃ = θZ₂ = [0.6 0.4; 0.3 0.5]·[0.48; 0.33] = [0.42; 0.31];

and so on. We can now plot the impulse responses against time periods. From these graphical presentations, we can easily see the effects of shocks to the variables. More specifically, we can see the effect on y of a shock to y (ε₁), the effect on y of a shock to x (ε₂), the effect on x of a shock to y (ε₁), and the effect on x of a shock to x (ε₂). For a VAR(k) model, there will be k² impulse response functions, and the same principle can be applied to trace the response of one variable to a shock in another variable in the system.


Ex. 10-14: Impulse response functions are estimated for the VAR(1) model of the time-series variables exchange rates of the British pound (EXUK), Japanese yen (EXJPN) and Canadian dollar (EXCAN) against the US dollar for 1971-2019. From the estimated VAR(1) model, it is found that there are no problems of autocorrelation and heteroscedasticity. Also, it is found that the residuals are normally distributed. The impulse responses of EXUK, EXJPN and EXCAN with respect to one standard deviation shock to EXUK, EXJPN and EXCAN are estimated using EViews. The estimated results are presented graphically below.

[Nine panels, "Response to Cholesky One S.D. Innovations ± 2 S.E.", each over a 10-period horizon: Response of EXUK to EXUK, Response of EXUK to EXJPN, Response of EXUK to EXCAN; Response of EXJPN to EXUK, Response of EXJPN to EXJPN, Response of EXJPN to EXCAN; Response of EXCAN to EXUK, Response of EXCAN to EXJPN, Response of EXCAN to EXCAN.]

Fig. 10-4: Impulse responses and standard error bands for innovations in EXUK, EXJPN and EXCAN

From Fig. 10-4, it can be said that the exchange rate of the British pound (EXUK) will respond negatively, in a non-linear functional form, to a one standard deviation shock to itself, and negatively, in a linear functional form, to one standard deviation shocks to EXJPN and EXCAN over the next 10 years. The exchange rate of the Japanese yen (EXJPN) will respond negatively, in a non-linear functional form, to one standard deviation shocks to EXUK and to itself, and positively, in a linear functional form, to a one standard deviation shock to EXCAN over the next 10 years. The exchange rate of the Canadian dollar (EXCAN) will respond positively, in a non-linear functional form, for the first four years and then negatively for the following six years to a one standard deviation shock to EXUK; it will respond negatively to a one standard deviation shock to EXJPN; and it will respond negatively, in a non-linear functional form, to a one standard deviation shock to itself over the next 10 years.
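As a rough illustration of how results such as those in Ex. 10-14 could be reproduced outside EViews, the sketch below uses Python with statsmodels. The data file `exchange_rates.csv` and its column names are assumptions for illustration only; the ordering of the columns determines the Cholesky ordering and should match the one used in the text (EXUK, EXJPN, EXCAN).

```python
import pandas as pd
import matplotlib.pyplot as plt
from statsmodels.tsa.api import VAR

# Hypothetical annual data with columns EXUK, EXJPN, EXCAN (1971-2019)
df = pd.read_csv("exchange_rates.csv", index_col="year")

model = VAR(df[["EXUK", "EXJPN", "EXCAN"]])
results = model.fit(1)          # VAR(1), as in the example

irf = results.irf(10)           # impulse responses for 10 periods
irf.plot(orth=True)             # Cholesky-orthogonalised responses with error bands
plt.show()
```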

Forecast Error Variance Decomposition (FEVD)

Let us consider a k-dimensional VAR(p) model of the type

$$Y_t = \delta + B_1Y_{t-1} + B_2Y_{t-2} + \cdots + B_pY_{t-p} + \varepsilon_t = \delta + \sum_{i=1}^{p}B_iY_{t-i} + \varepsilon_t \qquad (10.194)$$

where $\varepsilon_t$ is an independently and identically distributed (iid) error term with zero mean vector and covariance matrix $\Omega$. Assuming weak stationarity, $Y_t$ can be expressed as an infinite-order moving average of the type

$$Y_t = \mu + \sum_{j=0}^{\infty}A_j\varepsilon_{t-j} \qquad (10.195)$$


where $\mathrm{Var}(\varepsilon_t) = \Omega$ and $A_0 = I_k$. Since $\Omega$ is typically not a diagonal matrix, the components $\varepsilon_{jt}$ are often correlated, making the interpretation of the elements $A_l(ij)$ of $A_l$ complicated. To overcome this difficulty, one can make use of the Cholesky decomposition of the covariance matrix $\Omega$. That is, if suitable identification restrictions are available, then $\Omega$ can be written as

$$\Omega = LGL' = PP' \qquad (10.196)$$

where $G$ is a diagonal matrix, $L$ is a lower-triangular matrix with unit diagonal elements, and $P = LG^{1/2}$. Let us define $\xi_t = P^{-1}\varepsilon_t$ as the orthogonalised error with identity covariance matrix, i.e., $\mathrm{Var}(\xi_t) = I_k$. Since $\mathrm{Var}(\xi_t) = I_k$, the elements of $\xi_t$ are uncorrelated with unit variance. The MA representation in equation (10.195) can be rewritten as

$$Y_t = \mu + PP^{-1}\varepsilon_t + \sum_{j=1}^{\infty}A_jPP^{-1}\varepsilon_{t-j} = \mu + \Psi_0\xi_t + \sum_{j=1}^{\infty}\Psi_j\xi_{t-j} \qquad (10.197)$$

where $\Psi_0 = P$ and $\Psi_j = A_jP$ for $j > 0$. Note that $\Psi_0$ is a lower-triangular matrix; by construction, $Y_{jt}$ depends contemporaneously on $Y_{it}$ with $i = 1, 2, \ldots, (j-1)$, where $j = 2, 3, \ldots, k$. Therefore, the matrices $\Psi_j$ depend on the ordering of the elements in $Y_t$. Different orderings give rise to different matrices $\Psi_j$, and there are $k!$ possible orderings of $Y_t$. Since the elements of $\xi_t$ in equation (10.197) are uncorrelated, we have

$$\frac{\partial y_{j,t+l}}{\partial \xi_{i,t}} = \frac{\partial y_{j,t}}{\partial \xi_{i,t-l}} = \psi_{l,ji}, \qquad l = 0, 1, 2, \ldots; \; j, i = 1, 2, \ldots, k \qquad (10.198)$$

where $\psi_{l,ji}$ is the $(j,i)$th element of $\Psi_l$. $\psi_{l,ji}$ is called the orthogonalised impulse response of $y_{j,t+l}$ to a unit shock in the ith equation (see, e.g., Lütkepohl, 2005, Section 2.3). Consequently, the $\Psi_j$ are the orthogonalised impulse response functions of $Y_t$ with respect to $\xi_t$. These impulse response functions depend on the ordering of the elements of $Y_t$. For the MA representation in equation (10.197), the l-step-ahead forecast error at the forecast origin n is given by

$$e_n(l) = \sum_{m=0}^{l-1}\Psi_m\xi_{n+l-m} \qquad (10.199)$$

For the jth element of $Y_t$, the forecast error is given by

$$e_{j,n}(l) = \sum_{m=0}^{l-1}\sum_{i=1}^{k}\psi_{m,ji}\,\xi_{i,t+l-m} = \sum_{m=0}^{l-1}\psi_{m,j1}\xi_{1,t+l-m} + \sum_{m=0}^{l-1}\psi_{m,j2}\xi_{2,t+l-m} + \cdots + \sum_{m=0}^{l-1}\psi_{m,jk}\xi_{k,t+l-m} \qquad (10.200)$$

Since the $\xi_{jt}$ $(j = 1, 2, \ldots, k)$ are uncorrelated with unit variance and have no serial correlation, we have

$$\mathrm{Var}\left[e_{j,n}(l)\right] = \sum_{m=0}^{l-1}\left[\psi_{m,j1}\right]^2 + \sum_{m=0}^{l-1}\left[\psi_{m,j2}\right]^2 + \cdots + \sum_{m=0}^{l-1}\left[\psi_{m,jk}\right]^2 \qquad (10.201)$$

Consequently, the portion of the variance of $e_{j,n}(l)$ that is due to the shock $\xi_{it}$ is called the forecast error variance decomposition, given by

$$\mathrm{FEVD}_{j,i}(l) = \frac{\displaystyle\sum_{m=0}^{l-1}\left[\psi_{m,ji}\right]^2}{\displaystyle\sum_{m=0}^{l-1}\left[\psi_{m,j1}\right]^2 + \sum_{m=0}^{l-1}\left[\psi_{m,j2}\right]^2 + \cdots + \sum_{m=0}^{l-1}\left[\psi_{m,jk}\right]^2}, \qquad i, j = 1, 2, \ldots, k \qquad (10.202)$$

Again, the FEVD depends on the ordering of the elements of $Y_t$. Thus, from equation (10.202), it can be said that the generalised forecast error variance decomposition measures the proportion of the error variance of one variable that is caused by the innovations (shocks) in the other variables (Koop et al., 1996; Pesaran and Shin, 1998). Thus, by employing the forecast error variance decomposition method, the relative importance of a set of variables that affect the variance of another variable can easily be identified. The advantage of using the generalised forecast error variance decomposition is that it is invariant to the ordering of the variables entering the VAR model.
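A minimal sketch of the computation in equation (10.202) for a VAR(1) model is given below (Python with NumPy). The coefficient matrix `B` and error covariance `Omega` are illustrative values only, not estimates from the book's data.

```python
import numpy as np

# Illustrative VAR(1): Y_t = B Y_{t-1} + e_t,  Var(e_t) = Omega
B = np.array([[0.6, 0.4],
              [0.3, 0.5]])
Omega = np.array([[1.0, 0.3],
                  [0.3, 0.5]])

P = np.linalg.cholesky(Omega)      # Omega = P P'
horizon = 10

# MA coefficients A_j = B^j and orthogonalised responses Psi_j = A_j P
A = np.eye(2)
Psi = []
for j in range(horizon):
    Psi.append(A @ P)
    A = B @ A
Psi = np.array(Psi)                # shape (horizon, k, k)

# FEVD_{j,i}(l): share of the l-step forecast error variance of variable j due to shock i
cum = np.cumsum(Psi**2, axis=0)            # running sums of squared psi_{m,ji}
fevd = cum / cum.sum(axis=2, keepdims=True)
print(fevd[9])                             # decomposition at l = 10; each row sums to 1
```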

Ex. 10-15: The forecast error variance decomposition for each variable is obtained for a VAR(1) model of the variables exchange rates of the British pound (EXUK), Japanese yen (EXJPN) and Canadian dollar (EXCAN) against the US dollar, based on the data from 1971 to 2019. Using EViews, the results are given in Table 10-13.

Table 10-13: Variance decomposition of the different variables

Variance Decomposition of EXUK
Period   S.E.      EXUK       EXJPN     EXCAN
1        0.0486    100.0000   0.0000    0.0000
2        0.0642    99.9214    0.0167    0.0619
3        0.0738    99.7617    0.0531    0.1852
4        0.0801    99.5428    0.1063    0.3509
5        0.0844    99.2846    0.1729    0.5425
6        0.0876    99.0042    0.2494    0.7464
7        0.0898    98.7160    0.3323    0.9517
8        0.0913    98.4316    0.4180    1.1503
9        0.0925    98.1599    0.5039    1.3363
10       0.0933    97.9067    0.5873    1.5060

Variance Decomposition of EXJPN
Period   S.E.      EXUK       EXJPN     EXCAN
1        16.8693   8.6924     91.3076   0.0000
2        21.7695   5.3498     94.6429   0.0073
3        24.8562   4.9414     95.0233   0.0353
4        27.2206   6.7799     93.1264   0.0937
5        29.2244   9.9550     89.8589   0.1861
6        30.9963   13.6746    86.0152   0.3103
7        32.5790   17.3983    82.1409   0.4607
8        33.9863   20.8240    78.5453   0.6307
9        35.2250   23.8185    75.3683   0.8132
10       36.3026   26.3511    72.6466   1.0023

Variance Decomposition of EXCAN
Period   S.E.       EXUK       EXJPN      EXCAN
1        0.066966   7.643453   1.523332   90.83322
2        0.090568   14.03782   1.812152   84.15003
3        0.106794   20.52991   2.017634   77.45246
4        0.119133   26.49063   2.149663   71.35970
5        0.128864   31.66045   2.224354   66.11520
6        0.136627   35.99154   2.257652   61.75081
7        0.142824   39.53904   2.262911   58.19805
8        0.147747   42.39880   2.250445   55.35075
9        0.151630   44.67585   2.227843   53.09631
10       0.154668   46.46992   2.200478   51.32960

Cholesky Ordering: EXUK EXJPN EXCAN

The combined graphs of the variance decompositions are shown below:

[Three stacked panels: Variance Decomposition of EXUK, Variance Decomposition of EXJPN and Variance Decomposition of EXCAN, each showing the percentage of the forecast error variance attributed to shocks to EXUK, EXJPN and EXCAN over periods 1 to 10.]

Fig. 10-5: Variance decomposition graphs

From the estimated results in Table 10-13, it can be said that, in the first year, 100% of the variance (change) in EXUK can be explained by the shock to itself. In the second year, 99.92%, 0.02%, and 0.06% of the variance in EXUK can be explained by the shocks to itself, EXJPN, and EXCAN, respectively. In the third year, 99.76%, 0.05%, and 0.19% of the variance in EXUK can be explained by the shocks to itself, EXJPN, and EXCAN, and so on. Finally, in the tenth year, 97.91%, 0.59%, and 1.51% of the variance in EXUK can be explained by the shocks to itself, EXJPN, and EXCAN, respectively. In the same way, we can also explain the variances (changes) of the other variables, EXJPN and EXCAN.

Advantages and Disadvantages of VAR Models

In macroeconomic and financial analyses, the VAR models have several advantages compared to univariate time-series models or simultaneous-equations structural models. These are discussed below.

Advantages of VAR Models
(i) In a VAR model, we do not need to specify which variables are endogenous or exogenous. In the VAR models, all variables are endogenous.
(ii) The VAR models might avoid "incredible identification" problems.
(iii) The VAR models are the most successful and flexible models for the analysis of multivariate time-series variables. For example, in economics, a VAR model is used to forecast macroeconomic variables such as GDP, money supply, and unemployment. In finance, a VAR model is used to predict the spot prices and futures prices of securities and foreign exchange rates across markets. In accounting, a VAR model is used to predict accounting variables such as sales, earnings and accruals. In marketing, it can be used to evaluate the impact of different factors on consumer behaviour and to forecast its future changes.
(iv) The VAR models allow the value of a variable to depend on more than just its own lags or combinations of white noise terms. So, the VAR models are more flexible than univariate AR models or ARMA models. Therefore, the VAR models have a much richer structure than AR or ARMA models, implying that they are able to capture more features of the data. Thus, we can say that the VAR models capture the dynamic response to shocks.
(v) In the VAR models, all the RHS variables are predetermined. That is, they are known at time t. This implies that there is no possibility of feedback from any of the LHS variables to any of the RHS variables. Thus, we can apply the OLS method to every single equation of the VAR model for estimation.
(vi) The forecasts generated by the VAR models are often better than those of "traditional structural" models. The VAR models are considered good for short-run forecasting.


(vii) The VAR models can aid the identification of shocks, including monetary-policy shocks.

Disadvantages of the VAR Models

The VAR models have their own drawbacks and limitations relative to other models. Some of them are highlighted below:
(i) In a VAR model, there is little theoretical information about the relationships between the variables to guide the specification of the model. That is why the VAR models are a-theoretical (as are ARMA models). The VAR models avoid the "incredible identification" problems, but identification remains a big issue: different identification schemes may give very different results, and that is why the VAR models may give spurious results.
(ii) In a VAR model, it is very difficult to determine the appropriate lag length of the variables.
(iii) In a VAR model, many parameters have to be estimated. If there are k equations, one for each of the k variables, with p lags of each of the variables in each equation, then (k + pk²) parameters have to be estimated. For example, if k = 3 and p = 2, then 21 parameters have to be estimated. That is why the VAR models tend to be over-fitted.
(iv) In a VAR model, we assume that all the variables are stationary. This assumption is not always true.

10.6 Seemingly Unrelated Regression Equations (SURE) Models

Meaning: The seemingly unrelated regression equations (SURE) model is defined as the generalisation of a linear regression model which consists of several linear regression equations, each having its own dependent variable and potentially different sets of exogenous explanatory variables. In some economic situations, when the objective is to explain the whole system, there may be more than one multiple linear regression equation. There may be interactions between the individual equations of the system if the random error terms associated with at least some of the different equations are correlated with each other. This means that the equations are linked statistically, even though not structurally (through the jointness of the distribution of the error terms and through the non-diagonal covariance matrix). For this type of economic behaviour, it may be more efficient to estimate all equations jointly, rather than to estimate each one separately using the least-squares method. The jointness of the equations is explained by the structure of the SURE model and the covariance matrix of the associated disturbances. In a SURE model, all the separate relationships are considered collectively in order to draw the statistical inferences about the model parameters.

Ex. 10-16: Suppose that we are interested in studying the consumption pattern of Bangladesh. In Bangladesh, there are 64 districts. There is one consumption equation for each district. So, altogether, there are 64 equations which describe 64 consumption functions. It is also not necessary that the same explanatory variables be included in all equations; different equations may contain different explanatory variables. It may be noted that the consumption patterns of neighbouring districts may have common characteristics, so we may assume that the error terms associated with the equations are contemporaneously correlated. Given the existence of contemporaneous correlation, it may be more efficient to estimate all 64 equations jointly, rather than to estimate each one separately using the least-squares method. The appropriate joint estimation technique is GLS estimation.

Explanation of the SURE Models

Let us consider a model which consists of m multiple linear regression equations of the type

$$y_{it} = \beta_{i0}x_{i0t} + \beta_{i1}x_{i1t} + \beta_{i2}x_{i2t} + \cdots + \beta_{ik_i}x_{ik_it} + \varepsilon_{it} = \sum_{j=0}^{k_i}\beta_{ij}x_{ijt} + \varepsilon_{it}, \qquad i = 1, \ldots, m,\; j = 0, 1, \ldots, k_i,\; t = 1, \ldots, T \qquad (10.203)$$

where $y_{it}$ is the tth observation of the ith dependent variable which is to be explained by the ith regression equation; $x_{ijt}$ is the tth observation on the jth explanatory variable appearing in the ith equation; $\beta_{ij}$ is the regression coefficient associated with $x_{ijt}$; and $\varepsilon_{it}$ is the tth value of the random error component associated with the ith equation of the model. Here, $x_{i0t} = 1$ for all i and t.

Assumptions:
(i) $E(\varepsilon_{it}) = 0$ for all t, $i = 1, 2, \ldots, m$; $t = 1, 2, \ldots, T$.
(ii) $\mathrm{Var}(\varepsilon_{it}) = \sigma_i^2$ for all t, $i = 1, 2, \ldots, m$.
(iii) $E(\varepsilon_{it}\varepsilon_{jt}) = \sigma_{ij}$, $i, j = 1, 2, \ldots, m$.
(iv) $E(\varepsilon_{it}\varepsilon_{js}) = 0$ for $t \neq s$, $i, j = 1, 2, \ldots, m$.

The set of m linear equations in (10.203) can be written in the following matrix form:

$$Y_1 = X_1\beta_1 + \varepsilon_1, \quad Y_2 = X_2\beta_2 + \varepsilon_2, \quad \ldots, \quad Y_m = X_m\beta_m + \varepsilon_m \qquad (10.204)$$

where

$$Y_i = \begin{bmatrix} y_{i1} \\ y_{i2} \\ \vdots \\ y_{iT} \end{bmatrix}_{T\times 1}, \quad X_i = \begin{bmatrix} 1 & x_{i11} & x_{i21} & \cdots & x_{ik_i1} \\ 1 & x_{i12} & x_{i22} & \cdots & x_{ik_i2} \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & x_{i1T} & x_{i2T} & \cdots & x_{ik_iT} \end{bmatrix}_{T\times (k_i+1)}, \quad \beta_i = \begin{bmatrix} \beta_{i0} \\ \beta_{i1} \\ \vdots \\ \beta_{ik_i} \end{bmatrix}_{(k_i+1)\times 1}, \quad \varepsilon_i = \begin{bmatrix} \varepsilon_{i1} \\ \varepsilon_{i2} \\ \vdots \\ \varepsilon_{iT} \end{bmatrix}_{T\times 1}$$

where $i = 1, 2, \ldots, m$. The set of m linear regression equations in (10.204) can be expressed as the following super model:

$$\begin{bmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_m \end{bmatrix} = \begin{bmatrix} X_1 & 0 & \cdots & 0 \\ 0 & X_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & X_m \end{bmatrix}\begin{bmatrix} \beta_1 \\ \beta_2 \\ \vdots \\ \beta_m \end{bmatrix} + \begin{bmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_m \end{bmatrix} \qquad (10.205)$$

$$Y = X\beta + \varepsilon$$

where Y is $(Tm\times 1)$, X is $(Tm\times k)$, $\beta$ is $(k\times 1)$, $\varepsilon$ is $(Tm\times 1)$, and $k = \sum_{i=1}^{m}k_i$.

It can be shown that least squares applied to this system is identical to applying least squares separately to each equation. For the above m classical linear regression equations, the following conventional assumptions are made:
(i) $X_i$ is fixed, $i = 1, 2, \ldots, m$;
(ii) $\mathrm{Rank}(X_i) = k_i$, $i = 1, 2, \ldots, m$;
(iii) $\lim_{T\to\infty}\left[\frac{1}{T}X_i'X_i\right] = Q_{ii}$, where $Q_{ii}$ is nonsingular with fixed and finite elements;
(iv) $E(\varepsilon_i) = 0$, $i = 1, 2, \ldots, m$;
(v) $E(\varepsilon_i\varepsilon_j') = \begin{cases}\sigma_i^2 I_T, & \text{for } i = j \\ \sigma_{ij} I_T, & \text{for } i \neq j\end{cases}$

where $\sigma_{ij}$ is the covariance between the disturbances of the ith and jth equations for each observation in the sample.


Considering the interactions between the equations of the model, we also assume that $\lim_{T\to\infty}\left[\frac{1}{T}X_i'X_j\right] = Q_{ij}$, where $Q_{ij}$ is a nonsingular matrix with fixed and finite elements. Compactly, we can write $E(\varepsilon) = 0$ and

$$\mathrm{Var}(\varepsilon) = E(\varepsilon\varepsilon') = \begin{bmatrix} \sigma_1^2 I_T & \sigma_{12}I_T & \cdots & \sigma_{1m}I_T \\ \sigma_{21}I_T & \sigma_2^2 I_T & \cdots & \sigma_{2m}I_T \\ \vdots & \vdots & & \vdots \\ \sigma_{m1}I_T & \sigma_{m2}I_T & \cdots & \sigma_m^2 I_T \end{bmatrix} = \Omega\otimes I_T = \Sigma \qquad (10.206)$$

where $\otimes$ denotes the Kronecker product operator, $\Sigma$ is an $(mT\times mT)$ matrix, and $\Omega$ is an $(m\times m)$ positive definite symmetric matrix of the type

$$\Omega = \begin{bmatrix} \sigma_1^2 & \sigma_{12} & \cdots & \sigma_{1m} \\ \sigma_{21} & \sigma_2^2 & \cdots & \sigma_{2m} \\ \vdots & \vdots & & \vdots \\ \sigma_{m1} & \sigma_{m2} & \cdots & \sigma_m^2 \end{bmatrix}$$

The positive definiteness of $\Omega$ rules out linear dependencies among the contemporaneous disturbances of the equations of the model. The structure $E(\varepsilon\varepsilon') = \Omega\otimes I_T$ implies that
(i) $\mathrm{Var}(\varepsilon_{it})$ is constant for all t;
(ii) $\mathrm{Cov}(\varepsilon_{it},\varepsilon_{jt}) = E(\varepsilon_{it}\varepsilon_{jt})$ is constant for all t;
(iii) $E(\varepsilon_{it}\varepsilon_{js}) = 0$ for $t \neq s$, for all i and j.

It is clear that, within a single equation of the model, the disturbance terms are homoscedastic. It is also clear that the equations are related stochastically through the disturbances, which are contemporaneously correlated across the equations of the model. That is why this system is referred to as the SURE model. The SURE model is a particular case of the simultaneous-equations model involving two or more structural equations with two or more jointly dependent variables and k distinct exogenous variables, in which neither the current nor the lagged endogenous variables appear as explanatory variables in any of the structural equations. The SURE model differs from the multivariate regression model only in the sense that it takes account of prior information concerning the absence of certain explanatory variables from certain equations of the model. Such exclusions are highly realistic in many economic situations.
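To make the stacked structure in equations (10.205) and (10.206) concrete, the short sketch below builds the block-diagonal regressor matrix and the covariance $\Sigma = \Omega\otimes I_T$ with NumPy/SciPy for a hypothetical two-equation system; the dimensions and numerical values are illustrative assumptions only.

```python
import numpy as np
from scipy.linalg import block_diag

T = 5                                    # observations per equation (illustrative)
rng = np.random.default_rng(0)

# Two equations with different regressor sets (intercept plus one regressor each)
X1 = np.column_stack([np.ones(T), rng.normal(size=T)])
X2 = np.column_stack([np.ones(T), rng.normal(size=T)])

X = block_diag(X1, X2)                   # stacked regressor matrix of equation (10.205)

# Contemporaneous covariance of the two disturbances (illustrative values)
Omega = np.array([[1.0, 0.4],
                  [0.4, 2.0]])

Sigma = np.kron(Omega, np.eye(T))        # Var(eps) = Omega ⊗ I_T as in equation (10.206)
print(X.shape, Sigma.shape)              # (10, 4) and (10, 10)
```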

The OLS and GLS Estimation of the SURE Models

The seemingly unrelated regression equations (SURE) model is given by

$$Y = X\beta + \varepsilon \qquad (10.207)$$

Here, $E(\varepsilon) = 0$ and $\mathrm{Var}(\varepsilon) = \Omega\otimes I_T = \Sigma$.

The OLS estimator of $\beta$ is given by

$$\hat{\beta}_{OLS} = \left[X'X\right]^{-1}X'Y \qquad (10.208)$$

We have

$$E(\hat{\beta}_{OLS}) = \beta \qquad (10.209)$$

The variance-covariance matrix is given by


$$\mathrm{Var}(\hat{\beta}_{OLS}) = E(\hat{\beta}_{OLS}-\beta)(\hat{\beta}_{OLS}-\beta)' = (X'X)^{-1}X'E(\varepsilon\varepsilon')X(X'X)^{-1} = (X'X)^{-1}X'(\Omega\otimes I_T)X(X'X)^{-1} = (X'X)^{-1}X'\Sigma X(X'X)^{-1} \qquad (10.210)$$

The generalised least-squares (GLS) estimator of $\beta$ is given by

$$\hat{\beta}_{GLS} = \left[X'\Sigma^{-1}X\right]^{-1}X'\Sigma^{-1}Y = \left[X'(\Omega^{-1}\otimes I_T)X\right]^{-1}X'(\Omega^{-1}\otimes I_T)Y \qquad (10.211)$$

The mean value of $\hat{\beta}_{GLS}$ is given by

$$E\left[\hat{\beta}_{GLS}\right] = \left[X'\Sigma^{-1}X\right]^{-1}X'\Sigma^{-1}X\beta + \left[X'\Sigma^{-1}X\right]^{-1}X'\Sigma^{-1}E(\varepsilon) = \beta \qquad (10.212)$$

The variance-covariance matrix of $\hat{\beta}_{GLS}$ is

$$\begin{aligned}
\mathrm{Var}(\hat{\beta}_{GLS}) &= E(\hat{\beta}_{GLS}-\beta)(\hat{\beta}_{GLS}-\beta)' \\
&= E\left\{\left[X'\Sigma^{-1}X\right]^{-1}X'\Sigma^{-1}\varepsilon\right\}\left\{\left[X'\Sigma^{-1}X\right]^{-1}X'\Sigma^{-1}\varepsilon\right\}' \\
&= \left[X'\Sigma^{-1}X\right]^{-1}X'\Sigma^{-1}E(\varepsilon\varepsilon')\Sigma^{-1}X\left[X'\Sigma^{-1}X\right]^{-1} \\
&= \left[X'\Sigma^{-1}X\right]^{-1}X'\Sigma^{-1}\Sigma\Sigma^{-1}X\left[X'\Sigma^{-1}X\right]^{-1} \\
&= \left[X'\Sigma^{-1}X\right]^{-1} = \left[X'(\Omega^{-1}\otimes I_T)X\right]^{-1} \qquad (10.213)
\end{aligned}$$

Let us now define $D = (X'X)^{-1}X' - \left[X'\Sigma^{-1}X\right]^{-1}X'\Sigma^{-1}$. We have $DX = 0$. We also have

$$\hat{\beta}_{OLS} - \hat{\beta}_{GLS} = \left[X'X\right]^{-1}X'Y - \left[X'\Sigma^{-1}X\right]^{-1}X'\Sigma^{-1}Y = \left\{\left[X'X\right]^{-1}X' - \left[X'\Sigma^{-1}X\right]^{-1}X'\Sigma^{-1}\right\}Y = DY \qquad (10.214)$$

From equation (10.214), we have $\hat{\beta}_{OLS} = \hat{\beta}_{GLS} + DY$, so that

$$\hat{\beta}_{OLS} = \beta + \left[X'\Sigma^{-1}X\right]^{-1}X'\Sigma^{-1}\varepsilon + DX\beta + D\varepsilon$$

$$\hat{\beta}_{OLS} - \beta = \left[X'\Sigma^{-1}X\right]^{-1}X'\Sigma^{-1}\varepsilon + D\varepsilon = (\hat{\beta}_{GLS}-\beta) + D\varepsilon = \left\{\left[X'\Sigma^{-1}X\right]^{-1}X'\Sigma^{-1} + D\right\}\varepsilon \qquad (10.215)$$


The variance of $\hat{\beta}_{OLS}$ is given by

$$\begin{aligned}
\mathrm{Var}(\hat{\beta}_{OLS}) &= E(\hat{\beta}_{OLS}-\beta)(\hat{\beta}_{OLS}-\beta)' \\
&= \left\{\left[X'\Sigma^{-1}X\right]^{-1}X'\Sigma^{-1} + D\right\}E(\varepsilon\varepsilon')\left\{\left[X'\Sigma^{-1}X\right]^{-1}X'\Sigma^{-1} + D\right\}' \\
&= \left\{\left[X'\Sigma^{-1}X\right]^{-1}X'\Sigma^{-1} + D\right\}\Sigma\left\{\Sigma^{-1}X\left[X'\Sigma^{-1}X\right]^{-1} + D'\right\} \\
&= \left[X'\Sigma^{-1}X\right]^{-1}X'\Sigma^{-1}X\left[X'\Sigma^{-1}X\right]^{-1} + \left[X'\Sigma^{-1}X\right]^{-1}X'\Sigma^{-1}\Sigma D' + D\Sigma\Sigma^{-1}X\left[X'\Sigma^{-1}X\right]^{-1} + D\Sigma D' \\
&= \mathrm{Var}(\hat{\beta}_{GLS}) + D\Sigma D' \qquad \text{[since } DX = 0\text{]} \qquad (10.216)
\end{aligned}$$

Since $\Sigma$ is a positive definite matrix, $D\Sigma D'$ is at least positive semidefinite. Therefore, from equation (10.216), we can write

$$\mathrm{Var}(\hat{\beta}_{GLS}) \leq \mathrm{Var}(\hat{\beta}_{OLS}) \qquad (10.217)$$

Thus, it can be said that the GLSE is, in general, more efficient than the OLSE for estimating $\beta$. In fact, it can be shown that the GLSE is the best linear unbiased estimator of $\beta$. Thus, it can be concluded that, in the case of a SURE model, $\hat{\beta}_{GLS}$ is the best linear unbiased estimator.

Feasible Generalised Least-Squares Estimation of SURE Models

The GLSE of $\beta$ is

$$\hat{\beta}_{GLS} = \left[X'(\Omega^{-1}\otimes I_T)X\right]^{-1}X'(\Omega^{-1}\otimes I_T)Y \qquad (10.218)$$

From equation (10.218), we see that, if $\Omega$ is unknown to us, the GLSE of $\beta$ cannot be used. Thus, we have to estimate $\Omega$ and then replace $\Omega$ by the estimated $(m\times m)$ matrix $\hat{\Omega}$. With such a replacement, the resulting estimator is called a feasible generalised least-squares (FGLS) estimator of $\beta$. A widely used estimator of $\Omega$ is $S = \hat{\Omega} = (\hat{\sigma}_{ij})$, where

$$s_{ij} = \hat{\sigma}_{ij} = \frac{1}{T}\sum_{t=1}^{T}e_{it}e_{jt},$$

and $e_{pt}$ is the OLS residual from the pth equation, i.e., $e_{pt} = y_{pt} - \hat{\beta}'_{p,OLS}x_{pt}$, $p = i, j$. Thus, the feasible generalised least-squares (FGLS) estimator of $\beta$ is given by

$$\hat{\beta}_{FGLS} = \left[X'(S^{-1}\otimes I_T)X\right]^{-1}X'(S^{-1}\otimes I_T)Y \qquad (10.219)$$

where

$$S = \begin{bmatrix} s_1^2 & s_{12} & \cdots & s_{1m} \\ s_{21} & s_2^2 & \cdots & s_{2m} \\ \vdots & \vdots & & \vdots \\ s_{m1} & s_{m2} & \cdots & s_m^2 \end{bmatrix}$$

is a nonsingular matrix and $s_{ij}$ is a consistent estimator of $\sigma_{ij}$. The FGLS estimator is a two-step estimator: OLS is used in the first step to obtain the residuals $e_{pt}$ and an estimator of $\Omega$; in the second step, $\hat{\beta}_{FGLS}$ is computed based on the estimate of $\Sigma$ obtained in the first step. This estimator is sometimes referred to as the restricted estimator, as opposed to the unrestricted estimator proposed by Zellner that uses the residuals from regressing each regressand on all distinct regressors in the system.
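The two-step FGLS procedure can be written out directly. The following sketch (Python with NumPy/SciPy) estimates a two-equation SURE system on simulated data; the variable names and the simulated design are illustrative assumptions, not part of the book's example.

```python
import numpy as np
from scipy.linalg import block_diag

rng = np.random.default_rng(1)
T = 200

# Simulate two equations with contemporaneously correlated errors
X1 = np.column_stack([np.ones(T), rng.normal(size=T)])
X2 = np.column_stack([np.ones(T), rng.normal(size=T)])
errors = rng.multivariate_normal([0, 0], [[1.0, 0.6], [0.6, 1.5]], size=T)
y1 = X1 @ np.array([1.0, 2.0]) + errors[:, 0]
y2 = X2 @ np.array([-0.5, 0.8]) + errors[:, 1]

# Step 1: equation-by-equation OLS and residual covariance S (divisor T)
b1 = np.linalg.lstsq(X1, y1, rcond=None)[0]
b2 = np.linalg.lstsq(X2, y2, rcond=None)[0]
E = np.column_stack([y1 - X1 @ b1, y2 - X2 @ b2])
S = (E.T @ E) / T

# Step 2: FGLS on the stacked system, equation (10.219)
X = block_diag(X1, X2)
Y = np.concatenate([y1, y2])
W = np.kron(np.linalg.inv(S), np.eye(T))     # S^{-1} ⊗ I_T
beta_fgls = np.linalg.solve(X.T @ W @ X, X.T @ W @ Y)
print(beta_fgls)
```

In practice a packaged routine (e.g., the SUR estimator in the linearmodels package) would normally be used, but the explicit computation makes the role of $S^{-1}\otimes I_T$ in equation (10.219) transparent.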


Ex. 10-17: The seemingly unrelated regression equations model is estimated in order to find the impact of population indicators on environmental degradation, using data on arable land (AL, thousand hectares), carbon dioxide emissions (CO2, in kt), the population growth rate (PG, annual %), and the population density (PD, people per km2) of Bangladesh for the period 1972-2017. To study the effects of the population indicators (growth and density) on the environment, the two environmental indicator variables AL and CO2 are used as the dependent variables in separate equations over the period of study. Thus, the following two simple nonlinear population-environment models are specified:

$$AL_t = A_0\, PG_t^{\beta_1}\, PD_t^{\beta_2}\, e^{\varepsilon_{1t}}, \qquad CO2_t = C_0\, PG_t^{\alpha_1}\, PD_t^{\alpha_2}\, e^{\varepsilon_{2t}} \qquad (10.220)$$

Taking logarithms of (10.220), we have

$$\ln(AL_t) = \ln(A_0) + \beta_1\ln(PG_t) + \beta_2\ln(PD_t) + \varepsilon_{1t}, \qquad \ln(CO2_t) = \ln(C_0) + \alpha_1\ln(PG_t) + \alpha_2\ln(PD_t) + \varepsilon_{2t} \qquad (10.221)$$

Using RATS, the SURE model is estimated for (10.221) and the results are given in Table 10-14.

Table 10-14: Estimated results of the SURE model

Estimation by Least Squares
Dependent Variable: ln(AL)
Variable    Coeff      Std Err.   T-Stat     Signif
Constant    9.9577     0.1275     78.0912    0.0000
ln(PG)      0.1105     0.01525    7.2467     0.0000
ln(PD)      -0.1445    0.0179     -8.0558    0.0000

Dependent Variable: ln(CO2)
Variable    Coeff      Std Err.   T-Stat     Signif
Constant    -10.639    0.4786     -22.225    0.0000
ln(PG)      -0.2954    0.0573     -5.1596    0.0000
ln(PD)      3.0752     0.0673     45.6719    0.0000

Estimation by Seemingly Unrelated Regressions
Dependent Variable: ln(AL)
Variable    Coeff      Std Err.   T-Stat     Signif
Constant    9.9577     0.1232     80.7693    0.0000
ln(PG)      0.1105     0.0147     7.4953     0.0000
ln(PD)      -0.1445    0.0173     -8.3320    0.0000

Dependent Variable: ln(CO2)
Variable    Coeff      Std Err.   T-Stat     Signif
Constant    -10.639    0.4628     -22.988    0.0000
ln(PG)      -0.2954    0.0553     -5.3365    0.0000
ln(PD)      3.0752     0.0651     47.2382    0.0000

The results reveal that population density significantly decreases arable land and significantly increases carbon dioxide emissions, while the population growth rate significantly increases arable land and significantly decreases carbon dioxide emissions in Bangladesh.

10.7 Error Correction Mechanism (ECM)

Explanation of ECM: The Granger representation theorem (Granger 1983; Engle and Granger 1987) states that, if a set of variables are cointegrated, there exists a valid error correction representation of the data. Thus, if $Y_t$ and $X_t$ are both I(1) and have a cointegration vector $(1, -\beta)'$, there exists an error-correction representation, with $Z_t = Y_t - \beta X_t$, of the form

$$\theta(L)\Delta Y_t = \delta + \phi(L)\Delta X_{t-1} - \lambda Z_{t-1} + \alpha(L)\varepsilon_t \qquad (10.222)$$

where $\{\varepsilon_t\}$ is a white noise process and $\theta(L)$, $\phi(L)$, and $\alpha(L)$ are polynomials in the lag operator L (with $\theta_0 = 1$). For simplicity, let us consider the special case of equation (10.222)

$$\Delta Y_t = \delta + \phi_1\Delta X_{t-1} - \lambda\left[Y_{t-1} - \beta X_{t-1}\right] + \varepsilon_t \qquad (10.223)$$

where the error term has no moving average part and the systematic dynamics are kept as simple as possible. If $Y_t$ and $X_t$ are both I(1) but have a long-run relationship, there must be some force that pulls the equilibrium error back towards zero. The error correction term describes how $Y_t$ and $X_t$ behave in the short run consistently with a long-run cointegration relationship. When $\Delta Y_t = \Delta X_{t-1} = 0$, we obtain the "no change" steady-state equilibrium $Y_t - \beta X_t = \delta/\lambda$, which corresponds to the long-run equilibrium relationship if $\alpha = \delta/\lambda$. In this case, the error-correction model can be written as

$$\Delta Y_t = \phi_1\Delta X_{t-1} - \lambda\left[Y_{t-1} - \alpha - \beta X_{t-1}\right] + \varepsilon_t \qquad (10.224)$$

where the constant is only present in the long-run relationship. If, however, the error correction model contains the constant term, then $\delta = \alpha\lambda + \gamma$ with $\gamma \neq 0$, indicating deterministic trends in both $Y_t$ and $X_t$, and the long-run equilibrium corresponds to a steady-state growth path with $\Delta Y_t = \Delta X_{t-1} = \gamma/(1-\phi_1)$.

ARDL and Error Correction Models

Let us consider the ARDL(1, 1) model of the type

$$Y_t = \alpha_0 + \alpha_1Y_{t-1} + \beta_0X_t + \beta_1X_{t-1} + \varepsilon_t \qquad (10.225)$$

where $Y_t$ and $X_t$ are both I(1) and have a cointegration relationship and $\{\varepsilon_t\}$ is a white noise process. By subtracting $Y_{t-1}$ from both sides of equation (10.225) and rearranging, we have

$$\begin{aligned}
Y_t - Y_{t-1} &= \alpha_0 + \alpha_1Y_{t-1} - Y_{t-1} + \beta_0X_t - \beta_0X_{t-1} + \beta_0X_{t-1} + \beta_1X_{t-1} + \varepsilon_t \\
\Delta Y_t &= \alpha_0 - Y_{t-1}[1-\alpha_1] + \beta_0\Delta X_t + X_{t-1}[\beta_0+\beta_1] + \varepsilon_t \\
&= \beta_0\Delta X_t - [1-\alpha_1]\left[Y_{t-1} - \frac{\alpha_0}{1-\alpha_1} - \frac{\beta_0+\beta_1}{1-\alpha_1}X_{t-1}\right] + \varepsilon_t \\
&= \beta_0\Delta X_t - [1-\alpha_1]\left[Y_{t-1} - \theta_1 - \theta_2X_{t-1}\right] + \varepsilon_t \qquad (10.226) \\
&= \beta_0\Delta X_t - \lambda\, ECM_{t-1} + \varepsilon_t
\end{aligned}$$

This is applicable to all ARDL models. The term $ECM_{t-1} = Y_{t-1} - \theta_1 - \theta_2X_{t-1}$ is called the error-correction term (equilibrium error). Equation (10.226) is widely known as the error correction model (ECM). Therefore, the ECM and the ARDL model are basically the same if the series $Y_t$ and $X_t$ are integrated of the same order [often I(1)] and cointegrated. In this model, $Y_t$ and $X_t$ are assumed to be in long-run equilibrium, i.e., changes in $Y_t$ relate to changes in $X_t$ according to $\beta_0$. If $Y_{t-1}$ deviates from its optimal (equilibrium) value, there is a correction. The speed of adjustment $\lambda = 1-\alpha_1$, which will be greater than 0 and less than one, is the coefficient of the error correction term.
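A simple way to see equation (10.226) at work is the two-step, Engle-Granger-style estimation sketched below (Python with statsmodels): first estimate the long-run relation in levels, then regress $\Delta Y$ on $\Delta X$ and the lagged equilibrium error. The series names `Y` and `X` and the data file are hypothetical, and this two-step route is one common way to estimate an ECM, not the only one.

```python
import pandas as pd
import statsmodels.api as sm

df = pd.read_csv("ardl_data.csv")           # hypothetical columns Y and X

# Step 1: long-run (cointegrating) regression Y_t = theta1 + theta2*X_t + u_t
longrun = sm.OLS(df["Y"], sm.add_constant(df["X"])).fit()
ecm = longrun.resid                          # equilibrium error, ECM_t

# Step 2: short-run ECM regression dY_t = b0*dX_t - lambda*ECM_{t-1} + e_t
dY = df["Y"].diff()
dX = df["X"].diff()
rhs = pd.DataFrame({"dX": dX, "ECM_lag": ecm.shift(1)}).dropna()
model = sm.OLS(dY.loc[rhs.index], sm.add_constant(rhs)).fit()
print(model.params)                          # ECM_lag coefficient estimates -lambda (expected negative)
```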

= D0 +

q

¦ Į Y +¦ ȕ X i

t-i

i

i=1

t-i

(10.227)

 İt

i=0

where Yt and X t are both I(1) and have a cointegration relationship, and {İ t } is a white noise process. By recursively replacing Yt-p+j

Yt-p+j+1  ǻYt-p+j+1 and X t-p+j

X t-p+j+1  ǻX t-p+j+1 , for j = 0, 1, 2, …….,p-1, model

(10.227) can be written as p-1

ǻYt = D 0 +

q-1

¦ ș ǻY j

j=1

t-j

+ș p Yt-1  ¦ ȥ j ǻX t-j +ȥq X t  İ t

(10.228)

j=0

p

where ș j ȥj

 ª¬D j+1  D j+2  .......  D p º¼ , j =1, 2,.....,p  1, ș p

 ª¬ȕ j+1  ȕ j+2  .....  ȕ q º¼ , j = 0, 1, 2,.....,q  1, and ȥ p

Equation (10.228) can be arranged as

¦D

i

1,

i=1

ȕ 0 +ȕ1  ȕ 2  .....  ȕ q

Chapter Ten

544 p-1

ǻYt =

ª D 0 ȥq º q-1 ș ǻY Y X »  ¦ ȥ ǻX +İ O    ¦ « t-1 j t-j O O t ¼ j=0 j t-j t j=1 ¬ p-1

q-1

¦ ș ǻY j

t-j

j=1

 O > Yt-1  G 0  G1 ('X t +X t-1 ) @  ¦ ȥ jǻX t-j +İ t j=0

p-1

q-1

¦ ș ǻY j

t-j

j=1

 O > Yt-1  G 0  G1X t-1 ) @  G1'X t +¦ ȥ jǻX t-j +İ t j=0

p-1

q-1

¦ ș ǻY j

t-j

j=1

where G 0

 ȜECM t-1  G1'X t +¦ ȥ jǻX t-j +İ t

(10.229)

j=0

ȥq

D0 ,G O 1

O

p

,O

1  ¦ D i , and ECM t-1 = Yt-1  G 0  X t-1 i=1

Equation (10.229) is called the error correction (EC) model and the term (Yt-1  G 0  G1X t-1 ) is called the errorcorrection term (equilibrium error). The coefficient of ECM t-1 is called the speed of adjustment which will be greater than zero and less than 1. Interpretation of the ECM The concepts of cointegration and the error correction mechanism are very closely related. To understand the ECM, it would be better to first think of the ECM as a convenient re-parametrisation of the general linear autoregressive distributed lag (ARDL) model given in equation (10.229). When two variables Y and X are cointegrated according to Asteriou (2007), the ECM incorporates not only the short-run effects but also the long-run effects. This is because the ECM model includes the long-run equilibrium (Yt-1  G 0  G1X t-1 ) with the short-run dynamics captured by the differenced terms. Another important advantage of the ECM is that all the terms of this model are stationary, and that is why, we can apply the standard OLS method for estimation. This is because, if Y and X are integrated of order one (i.e., if they are I(1)), then ǻYt-j and ǻX t-j are I(0) for all j. By definition, if Y and X are cointegrated, their linear combination (Yt-1  G 0  G1X t-1 ) will be integrated of order zero, i.e., I(0). An important point of an ECM is that the p

coefficient O

1  ¦ D i , provides us with information about the speed of adjustment in the cases of disequilibrium. i=1

To understand this, it would be better to consider the long-run condition. When equilibrium holds, then (Yt-1  G 0  G1X t-1 ) 0 . However, during the disequilibrium periods, this term is no longer zero and measures the distance the system is away from the equilibrium. For example, due to the coronavirus pandemic, there are negative shocks in an economy which causes (Yt-1  G 0  G1X t-1 ) to be negative because Yt-1 has moved below its long-run equilibrium path. However, since O is positive, the overall effect is to boost ǻYt back towards its long-run path as determined by X t in equation (10.229). Notice that the speed of this adjustment to equilibrium is dependent upon the p

magnitude of O

1  ¦ D i . The adjustment coefficient

O tells us how much of the adjustment to equilibrium takes

i=1

place each period, or how much of the equilibrium error is corrected each period. It can be explained in the following ways: (i) If O 1 , then 100% of the adjustment takes place within the period. Thus, it can be said that the speed of adjustment is very fast in case of any shock to the economy of society. (ii) If O 0.5, then it can be said that, when the variable Y is above its equilibrium level, it will be adjusted by 50% within the first year. The full convergence process to reach its equilibrium level takes about two years. (iii) If O

0, , then it can be said that there is no adjustment.

Vector Error Correction (VEC) Model and Cointegration When we deal with a VAR model, then cointegration analysis is somewhat more complex because the cointegration vector is generalised to a cointegration space, the dimension of which is not known previously. That is why, when we have a set of k I(1) variables, there may exist up to (k-1) independent linear combinations which implies that any linear combination of them is also I(0). That is, individual cointegration vector is no longer statistically identified;

Multivariate Time-Series Models

545

only the space spanned by these vectors is identified. If the variables of interest are stacked in the k-dimensional vector Yt , the elements of which are assumed to be I(1), there may be different vectors ȕ such that, Z t = ȕcYt , is I(0). That is, there may be more than one cointegration vectors ȕ for which the linear combination will be I(0). It is clearly possible for several equilibrium relations to govern the long-run behaviour of the k variables. In general, there can be r d (k  1), linearly independent cointegration vectors which are gathered together into the (k×r) cointegration matrix ȕ . By construction, the rank of the matrix ȕ is r which will be called the cointegration rank of Yt . This means that each element in the r-dimensional vector, Z t = ȕcYt , is I(0) in which each element in the k-dimensional vector Yt is I(1). Let us consider the pth-order Gaussian VAR model of the type p

Yt = į + ¦ Ǻ j Yt- j + İ t

(10.230)

j=1

All the terms of (10.230) are defined previously. Using the lag operator (L), equation (10.230) can be written as (10.231)

B(L)Yt = į + İ t

where Ǻ(L) = I k  Ǻ1 L  Ǻ 2 L2  .......  Ǻ p Lp Considering p =4, the VAR(4) model is given by (10.232)

Yt = į + Ǻ1 Yt-1 + Ǻ 2 Yt-2 + Ǻ 3 Yt -3 + Ǻ 4 Yt- 4  İ t

We have Yt  Yt-1 = į + Ǻ1 Yt-1  Yt-1 + Ǻ 2 Yt-2  Ǻ 2 Yt-1 + Ǻ 2 Yt-1 + Ǻ3 Yt-3  Ǻ 3 Yt- 2 + Ǻ 3 Yt-2  Ǻ 3 Yt-1  %3 Yt -1  Ǻ 4 Yt- 4  Ǻ 4 Yt-3 + Ǻ 4 Yt-3  Ǻ 4 Yt- 2 + %4 Yt- 2  %4 Yt -1 + %4 Yt-1  İ t

ǻYt

į + Ǻ1 Yt -1 + Ǻ 2 Yt -1 + Ǻ 3 Yt-1  Ǻ 4 Yt-1  Yt-1  Ǻ 2 ǻYt-1  Ǻ 3 ǻYt -2  Ǻ 3 ǻYt-1 +  Ǻ 4 ǻYt-3  Ǻ 4 ǻYt- 2  %4 'Yt-1  İ t

ǻYt = į + (Ǻ1 + Ǻ 2 + Ǻ 3 + Ǻ 4  I k )Yt -1 + ( Ǻ 2  Ǻ 3  Ǻ 4 )ǻYt-1  Ǻ 3  Ǻ 4 ǻYt-2 

Ǻ 4 ǻYt-3  İ t

(10.233)

ǻYt = į + ȆYt-1 + ī1 ǻYt-1 + ī 2 ǻYt-2 + ī 3 ǻYt -3 + İ t

where ī1

Ǻ 2  Ǻ 3  Ǻ 4 , ī 2

Ǻ 4 , and the long-run matrix is Ȇ

Ǻ 3  Ǻ 4 , ī 3

 (I k  Ǻ1  Ǻ 2  Ǻ 3  Ǻ 4 ) .

Similarly, the VAR( p) model can be expressed as ǻYt = į + ȆYt -1 + ī1 ǻYt-1 + ī 2 ǻYt-2 + ī 3 ǻYt-3  ......  ī P-1 ǻYt-(p-1) + İ t

(10.234)

p

where ī j

 ¦ Ǻi , j = 1, 2,.....(p-1) and the long-run matrix is i=j+1

Ȇ

(I k  Ǻ1  Ǻ 2  Ǻ 3  .....  Ǻ p )=  B(1)

The characteristic polynomial is I k  Ǻ1 z  Ǻ 2 z 2  ......  Ǻ p z p = B(z) . p-1

Interpretation of ǻYt = į + ȆYt-1 +

¦ ī ǻY j

t-j

+ İt

j=i

This equation is a direct generalisation of the regressions used in the augmented Dickey-Fuller test. Because ǻYt and İ t are stationary, it must be the case that ȆYt-1 is stationary. This could reflect three different situations which are illustrated below:

Chapter Ten

546

(1): If all the elements in Yt are integrated of order one and no cointegration relationships exist, it must be the case that Ȇ 0 , and equation (10.234) presents a stationary VAR model for ǻYt . (2): If all the elements in Yt are stationary I(0) variables, the matrix 3 that we can write a vector of moving average presentation, i.e.,

B(1) must be of full rank and invertible so

Yt = B -1 (L)(į + İ t ) .

(3): The interesting case is Rank(Ȇ ) r, where 0 < r < k. If Ȇ is of rank r ( 0 < r 1 . Equation (10.241) is called the VEC model with Įȕc = Ȇ . The term ȆYt-1 of (10.241) is called the error correction term. (iii) Here, we discuss the three different cases for Rank(Ȇ ) Case 1: r = 0, Ȇ = 0 (all

r:

O (Ȇ ) = 0 ), i.e., eigenvalues of Ȇ are zero;

Case 2: 0 < r < k, Ȇ = Įȕc, where Į is a (k×r) matrix and ȕ is also (k×r) ; Case 3: r =k , | Ȇ |=| -Ǻ(1) |z 0 . Case 1: Rank(Ȇ ) In case of Rank(Ȇ ) (i) Ȇ

0, m = 0 (all O (Ȇ ) = 0 ). 0 , i.e., r = 0, it follows that

0;

(ii) There does not exist a linear combination of the I(1) variables, which implies that VAR(p) model is stationary; (iii) The variables y’s are not cointegrated; (iv) The EC form reduces to a stationary VAR(p-1) in difference form of the type p-1

p

ǻYt = į + ¦ ī j ǻYt- j + İ t , where ī j j=1

Case 2: Rank(Ȇ )

 ¦ Ǻi , j = 1, 2,.....(p-1) i=j+1

r, 0 < r < k .

The rank of Ȇ is r, 0 < r < k. In case of Rank(Ȇ ) r, we factorise Ȇ in two m-rank matrices namely Į and ȕc . Rank(Į) = Rank(ȕ) = r . The order of Į is (k×r) and of ȕ , it is (k×r) . Ȇ = Įȕc z 0

In case of Rank(Ȇ )

r, 0 < r < k follows that

(i) The y’s are integrated of order 1, i.e., I(1);

Chapter Ten

548

(ii) There are r eigenvalues O (Ȇ ) z 0 ; (iii) The y’s are cointegrated. There are r linear combinations which are stationary. (iv) There are r linear independent cointegrating (column) vectors in ȕ . (v) There are r stationary linear combinations of the type ȕcYt . (vi) Yt has (k-r) unit roots, and so, (k-r) stochastic trends. (v) There are k I(1) variables, r cointegrating relations (r eigenvalues of Ȇ different from 0), and (k-r) stochastic trends r = r+(k-r). Case 3: Rank(Ȇ )

r , r =k.

In case of the full rank of Ȇ , it implies that (i) The | Ȇ |=| -Ǻ(1) |z 0 ; (ii) All the elements of Yt are stationary, i.e., Yt has no unit root; (iii) There are (k-r) = 0 stochastic trends; (iv) As a consequence, we model the relationship of the y’s in levels, not in the difference form; (v) There is no need to refer to the error correction representation. Ex. 10-18: If we consider two variables Y and X, then the VAR consists of two equations. Let us consider a VAR(1) model of the type y t = į1 +ș11 y t-1 +ș12 x t-1 +İ1t ½ ¾ x t = į 2 +ș 21 y t-1 +ș 22 x t-1 +İ 2t ¿

(10.242)

where İ1t and İ 2t are two white noise processes independent from the history of Y and X and may be correlated. In matrix form, the system of equations (10.242), can be written as § y t · § į1 · § ș11 ș12 · § y t-1 · § İ1t · ¨ ¸=¨ ¸+¨ ¸¨ ¸+¨ ¸ © x t ¹ © į 2 ¹ © ș 21 ș 22 ¹ © x t-1 ¹ © İ 2t ¹ (10.243)

Yt = į + Ǻ1 Yt-1 + İ t

Equation (10.243) is called a VAR(1) model. Equation (10.242) can also be written as y t  y t-1 = į1  (T11  1)y t-1  T12 x t-1  İ1t ½ ¾ x t  x t-1 į 2 +ș 21 y t-1  (T 22  1)x t-1  İ 2t ¿ 'y t = G1  (T11  1)y t-1  T12 x t-1  İ1t ½ ¾ ǻx t G 2  T 21 y t-1  (T 22  1)x t-1  İ 2t ¿

(10.244)

In matrix form, the system of equations (10.244) can be written as ª ǻy t º «ǻx » ¬ t¼

T12 º ª y t-1 º ª İ1t º ª G1 º ª(T11  1)  «G »  « T (T 22  1) »¼ «¬ x t-1 »¼ «¬İ 2t »¼ ¬ 2 ¼ ¬ 21 (10.245)

'Yt = į + ȆYt-1 + İ t

where Ȇ

B(1)

T12 º ª(T11  1) « T (T 22  1) »¼ ¬ 21

Multivariate Time-Series Models

549

This matrix will be zero matrix if ș11 = ș 22 =1, and ș12 = ș 21 = 0 . This corresponds to the case when two variables Y and X are nonstationary. The matrix Ȇ has a reduced rank if (T11  1)(T 22  1)  T 21T12

T 21T12

0

(T11  1)(T 22  1)

(10.246)

Thus, the reduced form of Ȇ is given by 1 ª º «T /(T  1) » > (T11  1) T12 @ ¬ 21 11 ¼

Ȇ DE c

Therefore, the system of equations (10.246) can also be written as 1 ª y t-1 º ª İ1t º ª G1 º ª º «G »  «T /(T  1) » > (T11  1) T12 @ « x »  «İ » ¬ 2 ¼ ¬ 21 11 ¼ ¬ t-1 ¼ ¬ 2t ¼

ª ǻy t º «ǻx » ¬ t¼ 'Yt

G  D (T11  1)y t-1  T12 x t-1  İ t

where D

(10.247)

1 ª º «T /(T  1) » ¬ 21 11 ¼

The error-correction form is thus quite simple as it excludes any dynamics. Here, both y and x adjust to the equilibrium error because ș 21 =0 is excluded. Also, ș 21 =0 implies that T11 T 22 1 and there is no cointegration. The linear combination z t = (ș11  1)y t +ș12 x t is stationary, that is, I(0). The linear combination is z t = (ș11  1)y t +ș12 x t

(10.248)

Equation (10.247) can be written as ª ǻy º ǻz t = > (ș11  1) ș12 @ « t » ¬ǻx t ¼ 1 ªİ º ªį º ª º ǻz t = > (ș11  1) ș12 @ « 1 » + > (ș11  1) ș12 @ « z t-1 + > (ș11  1) ș12 @ « 1t » » ¬į2 ¼ ¬ș 21 /(ș11  1) ¼ ¬ İ 2t ¼ z t = c +z t-1 +(ș11  1+ș 22  1)z t-1 +v t z t = c+(ș11 +ș 22  1)z t-1 +v t

(10.249)

where {v t } is a white noise process. z t is described by a stationary AR(1) process unless ș11 =1, and T 22

1 , which is

excluded. Ex. 10-19: In order to illustrate cointegration numerically, let us consider a first-order Gaussian VAR model of the type Yt = Ǻ1 Yt-1  İ t

(10.250)

where Yt is a (2×1) vector of endogenous variables and each of Y’s is I(1). İ t is a (2×1) vector of disturbance terms that are assumed to be independently and identically distributed errors with the distribution İ t ~N k (0, :). Ǻ1 is a 2 u 2 matrix of coefficients for Yt-1 . Model (10.250) can also be expressed as the following VEC model 'Yt = ȆYt-1  İ t

where Ȇ

(I 2  Ǻ1 )

Let the estimated value of Ȇ be

(10.251)

Chapter Ten

550

Ȇ

ª 0.5 1.0 º « 0.75 1.5 » ¬ ¼

Ȇ is a singular matrix and the rank of Ȇ is 1.

Then, Ȇ can be factorised as (10.252)

Ȇ 2u 2 = Į 2u1ȕ1cu 2

Here, k=2 is the number of endogenous variables and Rank(Ȇ ) A solution for Ȇ 2u2 = Į 2u1ȕ1cu 2 is ª 0.5 1.0 º « 0.75 1.5 » ¬ ¼

r =1 . Thus, there is only one cointegration relation.

ª 0.5 º ª1 ºc « 0.75» « 2 » ¬ ¼¬ ¼ ª 0.5 º « 0.75» >1 2@ ¬ ¼

Substituting this value in equation (10.251), we have ª ǻy1t º «ǻy » ¬ 2t ¼

ª y1,t-1 º ª İ1t º ª 0.5 º « 0.75» >1 2@ « y »  «İ » ¬ ¼ ¬ 2,t-1 ¼ ¬ 2t ¼

ª ǻy1t º «ǻy » ¬ 2t ¼

ª İ1t º ª 0.5 º « 0.75» y1,t-1  2y 2,t-1  «İ » ¬ ¼ ¬ 2t ¼

(10.253)

The linear combination z t-1 = y1,t-1  2y 2,t-1 , appears in both equations. Since the LHS variables 'y1t , ǻy 2t and the error terms are stationary, this linear combination is also stationary. This linear combination z t-1 = y1,t-1  2y 2,t-1 is a cointegrating relationship. Johansen’s Test for Cointegration In a VEC model, if we have more than two variables, there is a possibility of having more than one co-integrating equation. This means that, in the model, the variables might form several equilibrium relationships. In general, if we have k variables in the model, there may be only up to k-1 co-integrating equations. In the error-correction representation, the matrix Ȇ plays an important role to find out in co-integration vector. If Rank(Ȇ ) = 0, there is no co-integrating equation, implying that the system is not cointegrated. If Rank(Ȇ ) r, there are r cointegrating equations. Thus, for testing cointegration, we have to check the rank of Ȇ . Hence, to find out how many cointegrating relationships exist among k variables, we use the Johansen’s methodology2. This method is illustrated below. The Case of VAR Models For simplicity, emphasis should be given in the literature concerning cointegration tests based on VAR models. Johansen's method is perhaps the best-known approach in this literature. Let a set of k variables (k t 2), be under consideration that are I(1) and may be cointegrated. Consider the Gaussian VAR(p) model with a trend component of the type Yt = Ǻ1 Yt-1 + Ǻ 2 Yt-2 + .......... + Ǻp Yt-p + ĭDt + İ t

(10.254)

where Yt is a vector of the k variables at time t; Ǻ i is a k u k matrix of parameters (i= 1, 2,……….,p); D t is a vector of deterministic components with a vector of coefficients ĭ ; and İ t is a k u 1, vector of the random error terms. Assumptions: (i) The VAR(p) model is linear in the parameters; 2 Similar to Engle and Granger approach, the Johansen’s approach also requires all variables in the system are integrated of the same order 1 [I(1)].

Multivariate Time-Series Models

551

(ii) The parameters are constant; (iii) The error terms are identically and independently distributed and follow a Gaussian distribution, i.e., İ t ~IIN k (0, ȍ), where ȍ is the variance-covariance matrix of the error terms. Equation (10.254) can also be written as (I  Ǻ1 L  Ǻ 2 L2  .........  Ǻp Lp )Yt = ĭDt + İ t

(10.255)

p § · Then, the roots of | B(z)|= ¨ I  ¦ Ǻ i z i ¸ provide information on the stationarity of Yt . © i =1 ¹

1. If the roots of | B(z) | are all outside the unit circle, Yt is stationary. 2. If some roots are outside and some are on the unit circle, Yt is nonstationary. 3. If all roots are inside the unit circle, Yt is stationary. 4. If only one root of the characteristics equation lies outside the unit circle, Yt is explosive. Note: we can also find the roots by solving for eigenvalues of the companion matrix. These are equal to z -1 . Here, we p ª º assume that | B(z)| = « I  ¦ Ȇ i z i » z 0, for |z| < 1 and | B(1) | = 0 implying that Yt has some unit roots. ¬ i =1 ¼

Considering Vector Error Correction Model (VECM) In order to use the Johansen test, the VAR model (10.255) should be expressed into a VEC model of the type

ǻYt = ȆYt-1 + ī1ǻYt-1 + ī 2ǻYt-2 + ............ + īp-1Yt-p+1 + ĭDt + İ t

(10.256)

p

where Ȇ

(I k  Ǻ1  Ǻ 2  .....  Ǻ p ) =  B(1) , and ī j

 ¦ Ǻi , j = 1, 2,.....(p-1) . i=j+1

Note: The coefficient matrix of equation (10.256) can be obtained from the coefficient matrices of (10.254) as Ǻ j = ī j  ī j-1 , j = 2, 3,.....,p , and Ǻ1 = Ȇ + I + ī1 . This VEC model contains k variables in the first differenced form on the LHS and (p-1) lags of the dependent variables in the differenced form on the RHS, and ī j is the coefficient matrix which is attached with ǻYt- j (j = 1, 2,….(p-1)). In fact, the Johansen test is largely affected by the lag length employed in the VEC model. Thus, it is very important to select the lag length optimally of the VEC model as discussed in section 4.4 of this chapter. The Johansen test is based on the examination of the Ȇ matrix. Ȇ can be interpreted as a long-run coefficient matrix since, in equilibrium, all the ǻYt- j will be zero. Setting the error terms İ t to their expected value of zero, we have į + ȆE(Yt-1 ) = 0 .

In the case that 0  Rank(Ȇ ) = r < k, the number of equations of this system of linear equations which are different from zero is r. Since Rank(Ȇ )

r < k , it may be written as

Ȇ = Įȕc

where D and ȕ are the k×r full-rank matrices. Then, we have ǻYt = ĮȕcYt-1 + ī1 ǻYt-1 + ī 2 ǻYt -2 + ............ + īp-1 Yt-p+1 + ĭDt + İ t

(10.257)

where ȕcYt-1 is an (r u 1), vector of stationary co-integrating relations. All variables in equation (10.256) are now stationary; D denotes the speed of adjustment to equilibrium.

Chapter Ten

552

There are two cases of interest: 1. Rank(Ȇ ) = 0, implying that Ȇ = 0 . Thus, there is no cointegration vector. In this case, Yt has k unit roots and we can work directly on the differenced series ǻYt which is a VAR(p-1) process. 2. Rank(Ȇ ) = r > 0 . In this case, Yt has r cointegrating equations and (k-r) unit roots. As discussed before, there are (k×r) full-rank matrices Į and ȕ such that Ȇ = Įȕc . The vector series ȕcYt-1 is an I(0) process which is referred to as the cointegrating series and D denotes the impact of the co-integrating series on ǻYt . Let E A be a (k u (k-r)) full-rank matrix such that E Ac E 0 . Then, z t = ȕcYt has (k-r) unit roots and can be considered the (k-r) common trends of Yt .

Specification of Deterministic Terms and VEC Model In general, Johansen discusses five different forms for ĭDt = d 0 +d1 t . Although the first and the fifth cases are not realistic, all of the forms are presented here for reasons of complementarity.

Model 1: If ĭDt 0, it implies that there is no constant or trend term in the ECM in equation (10.256). That is, there is no intercept (or trend term) in CE (cointegrating equation) or VAR model. Thus, the components of Yt are I(1) processes without drift and z t = ȕcYt has zero mean. Model 2: If d 0 = D c0 and d1 = 0 , it implies that this is a case of restricted constant. The ECM model becomes ǻYt = Į(ȕcYt-1 + c0 ) + ī1 ǻYt-1 + ī 2 ǻYt -2 + ....... + īp-1 Yt-p+1 + İ t . The components of Yt are I(1) processes without drift and z t = ȕcYt has non-zero mean c0 . This model implies that there is intercept (or no trend) in CE, and no intercept (or trend) in the VAR model.

Model 3: If d1 = 0, and d 0 is unrestricted, this is a case of an unrestricted constant. The ECM becomes ǻYt = d 0 + ĮȕcYt -1 + ī1 ǻYt -1 + ī 2 ǻYt-2 + ...... + īp-1 Yt-p+1 + İ t . The components of Yt are I(1) processes with drift and z t = ȕcYt has non-zero mean. This is a case of intercept in CE and VAR, and no trend in CE and VAR.

Model 4: If ĭDt = d 0 + Įc1 t , this is a case of a restricted trend. The ECM then becomes ǻYt = d 0 + Į(ȕcYt -1 + c1 t) + ī1 ǻYt -1 + ī 2 ǻYt -2 + ......... + īp-1 Yt-p+1 + İ t . The components of Yt are I(1) processes with drift d 0 and the cointegrating series, and Yt has a linear trend. This implies that there are intercept and trend in CE, but intercept (no trend) in VAR.

Model 5: Intercept and quadratic trend in CE, intercept and linear trend in VAR. We now determine the rank of Ȇ or the number of cointegrating vectors. Given the specification of the deterministic term of a VEC model, two sequential tests namely: maximum eigenvalue test and trace test can be applied for determining the number of cointegrating relations or for the rank r of Ȇ .

Maximum Eigenvalue Test The null hypothesis to be tested is H 0 : Rank(Ȇ )

r

against the alternative hypothesis H1: Rank(Ȇ ) ! r

The test statistic is based on the characteristic roots (also called eigenvalues), obtained from the estimation procedure. The test consists of ordering the largest eigenvalues in a descending order and considering whether they are significantly different from zero. To understand the test procedure, suppose that we obtained m characteristic roots. Let the ith characteristic root be denoted by Oi (i =1, 2,.......,m) and put in an ascending order of the magnitude O1 t O2 t ..... t Om . If the variables under investigation are not cointegrated, the rank of Ȇ is zero and all the characteristic roots will be zero. Therefore, (1  Oˆ ) is equal to 1, and hence, ln(1  Oˆ ) = 0 . If Rank(Ȇ ) 1 , ln(1  Oˆ ) i

i

1

Multivariate Time-Series Models

553

will be negative and ln(1  Oˆi ) = 0,  i >1 . If the eigenvalue Oˆ is non-zero, ln(1  Oˆi ) < 0 ,  i >1 . Thus, for Rank(Ȇ ) 1 , the largest eigenvalue must be significantly different from zero while the others will not be significantly different from zero. Therefore, to test how many of the numbers of the characteristic roots are significantly different from zero, Johansen proposed the following test statistic:

Omax (r, r+1) =  Tln(1  Oˆr+1 )

(10.258)

We start with r = 0, implying that Rank(Ȇ ) 0, It indicates that there is no cointegration relationship against r = 1. That is, there is one cointegration relationship. If we reject r = m-1 cointegration relationships, we should have to conclude that there are r = m cointegration relationships. Since the test statistic is based on the maximum eigenvalue, it is called the maximal eigenvalue statistic and is denoted by Omax .

Trace Test The null hypothesis to be tested is H 0 : Rank(Ȇ ) d r

against the alternative hypothesis r+1 .

H1: Rank(Ȇ )

The trace test statistic is based on a likelihood ratio test about the trace of the matrix (and because of that, it is called the trace statistic). The trace statistic considers whether the trace is increased by adding more eigenvalues beyond the rth eigenvalue. The null hypothesis is that the number of cointegrating vectors is less or equal to r against an unspecified or general alternative that there are more than r cointegration vectors. It starts with m = r+1 eigenvalues, and then, for successively larger Ȝˆ i 0 , the trace statistic is equal to zero. Thus, when Ȝˆ i =0 for i = 1, 2,…..,m, the trace test statistic will be zero. k

Otrace (r) =  T ¦ ln(1  Oˆi )

(10.259)

i=r+1

We start with r = 0 implying that Rank(Ȇ )

0, It indicates that there is no cointegration relationship against r = 1.

That is, there is at least one cointegration relationship. The null hypothesis will be rejected when

Otrace is larger,

implying that the sum of the remaining eigenvalues Or 1 t Or  2 t ...... t Ok is large.

Critical values: Critical values for both test statistics are provided by Johansen and Juselius (1990). The distribution of both test statistics is non-standard, and the critical values depend on the values of (k-r) and the deterministic components which are included in each of the equations. Osterwald-Lenun (1992) provided a more complete set of critical values for the Johanset test. These critical values are directly provided by EViews after conducting a cointegration test. Decision: If the calculated value of the test statistic is greater than the critical value, the null hypothesis will be rejected. Thus, we can say that there are r cointegration vectors in favour of the alternative hypothesis that there are more than r (for Omax ) or r+1 (for Otrace ). Both tests are conducted in a sequence, and under the null hypothesis, r =0, r=1,……, r=k-1, the hypotheses for both tests are shown in Table 10-15.

Table 10-15: Sequence of hypotheses for

Omax and Otrace tests

Maximum Eigenvalue Test ( Omax ) Null Hypothesis H0 : r = 0 H0 : r = 1 H0 : r = 2 . . . H0 : r = k  1

Alternative Hypothesis H1 : 0 < r d k H1 :1 < r d k H1 : 2 < r d k . . . H1: r = k

Trace Test ( Otrace ) Null Hypothesis H0 : r = 0 H0 : r d 1 H0 : r d 2 . . . H0 : r d k  1

Alternative Hypothesis H1: r = 1 H1: r = 2 H1: r = 3 . . . H1: r = k

Chapter Ten

554

If the null hypothesis H 0 : r = 0 is rejected, there is r=1 or more than one cointegration equations. Thus, the null hypothesis H 0 : r = 1 will be tested, and so on. Therefore, the value of r will be increased continuously until the null hypothesis is accepted.

Advantages of the ECM The ECM is the most popular and widely applicable for econometric analyses of economic relationships for many reasons. Some of them are: (i) The main advantage of the VEC model is that it has a very nice interpretation with long-term and short-term equations. The vector error correction model is a representation of cointegrated VAR. If you have a cointegrated VAR, it has VECM representation, and vice versa. (ii) The error correction models are formulated in terms of the first differences, which typically eliminate the stochastic trends from the variables that are involved. As a result, the problem of spurious regressions will be solved. (iii) The VEC model indicates the existence of the causal relationships between variables but it does not indicate the direction of the causal relationships between variables. Therefore, a VEC model allows us to test for detecting the causal relationships between variables using the Engle and Granger tests procedure. (iv) The VEC model is very important for measuring the correction factor from the disequilibrium of the previous period which has a very good economic implication. In the VEC model, the coefficient of the ECM represents how fast the deviations from the long-run equilibrium are eliminated.

Ex. 10-20: The purchasing power parity (PPP) theorem indicates the real exchange rate of Japanese Yen against the USD as given by RER t =

EXJPN t ×CPIUSA t CPIJPN t

(10.260)

where EXJPN t is the nominal exchange rate of Japan for per unit of the USD at time t, CPIUSA t is the consumer price index of the USA at time t, and CPIJPN t is the consumer price index of Japan at time t. Taking logarithms of equation (10.259), we have (10.261)

q t = erjpn t  pusa t  pjpn t

where q t = ln(RER t ), erjpn t = ln(EXJPN t ), erjpnt =ln(EXJPNt), pusa t =ln(CPIUSA t ), and pjpn t =ln(CPIJPN t ). The necessary and sufficient condition of PPP implies that the variables on the RHS of equation (10.261) is the log of the exchange rate between Japan and USA, and the logs of the price levels of USA and Japan will be cointegrated with the cointegration vector [1 1 -1]c . The cointegration relationships among the variables are investigated using the Johansen (1990) test. The investigation of the cointegration relationships among the variables depends on the existence of the unit root problem in each variable. If the unit root problem is present in each variable, the long-run cointegration relationships among the variables are examined. The Augmented Dickey-Fuller (ADF) and PhillipsPerron (PP) tests are applied in order to investigate whether each of the variables contains the stochastic trend or not. EViews is used in order to detect the presence of the unit root problem using the ADF and PP tests for the data from 1960-2018. The ADF and PP test results support that all the variables are integrated of order 1. Thus, it can be said that there exists a cointegration relationship among the variables. The long-run relationships among the variables are investigated using Johansen’s (1990) trace and the maximum eigenvalue test statistics. The lag length of the unrestricted vector autoregressive (VAR) model for the given data is determined on the basis of the adjusted likelihood ratio (LR) test, AIC, SBIC, and HQIC criteria. EViews is used again to obtain Johansen’s (1990) trace and the maximum eigenvalue tests. The results are reported in Table 10-16.

Table 10-16: Johansen's cointegration test results

Model 1: Intercept (no trend) in the cointegrating equation and VAR
Hypothesised No. of CE(s)   Trace Statistic   5% Critical Value   Prob.     Max-Eigen Statistic   5% Critical Value   Prob.
None*                       32.4430           29.7971             0.0242    21.3907               21.1316             0.0460
At most 1                   11.0523           15.4947             0.2083    7.9235                14.2646             0.3865
At most 2                   3.1288            3.8415              0.0769    3.1288                3.8415              0.0769

Model 2: Intercept and trend in the cointegrating equation and VAR
Hypothesised No. of CE(s)   Trace Statistic   5% Critical Value   Prob.     Max-Eigen Statistic   5% Critical Value   Prob.
None*                       60.70574          42.91525            0.0004    35.54004              25.82321            0.0019
At most 1                   25.16570          25.87211            0.0610    17.98520              19.38704            0.0790
At most 2                   7.180494          12.51798            0.3261    7.180494              12.51798            0.3261

The results of the trace and maximum eigenvalue test statistics for both models support the existence of one cointegrating equation at the 5% level of significance, which indicates the existence of a cointegrating relationship among the variables. The estimated results for the VEC model are given in Table 10-17.

Table 10-17: The estimated results for the VEC model

Vector Error Correction Estimates
Sample (adjusted): 1960 2018; included observations: 43 after adjustments
Standard errors in ( ) and t-statistics in [ ]

Cointegrating Eq: CointEq1     Coeff.      Standard Error    t-Test
ejpn(-1)                       -1.0000     ---               ---
pusa(-1)                       -0.1809     0.2374            [-0.7620]
pjpn(-1)                       0.7209      0.3016            [2.3903]
Constant                       -8.9040     ---               ---

Error Correction:   D(ejpn)                        D(pusa)                        D(pjpn)
CointEq1            -0.2817* (0.0603) [-4.6675]    -0.0035 (0.0109) [-0.3212]     -0.0268 (0.0177) [-1.5138]
D(ejpn(-1))         -0.0578* (0.0204) [-2.8333]    0.3061* (0.1134) [2.7047]      -0.0182 (0.0332) [-0.5501]
D(pusa(-1))         2.0186* (0.6051) [3.3358]      0.7763* (0.1091) [7.1133]      0.0875 (0.1773) [0.4938]
D(pjpn(-1))         -0.5841 (0.3527) [-1.6561]     0.0654 (0.0636) [1.0276]       0.8017 (0.1033) [7.7596]
C                   -0.0719* (0.0212) [-3.3864]    0.0054 (0.0038) [1.4102]       0.0016 (0.0062) [0.2491]
R-squared           0.717345                       0.346892                       0.660020
Adj. R-squared      0.695602                       0.296653                       0.633868
Sum sq. resids      0.011051                       0.339780                       0.029155
S.E. equation       0.014578                       0.080835                       0.023678
F-statistic         32.99245                       6.904838                       25.23753
Log likelihood      162.7472                       65.11197                       135.0990
Akaike AIC          -5.534990                      -2.109192                      -4.564878
Schwarz SC          -5.355775                      -1.929977                      -4.385663
Mean dependent      0.037339                       -0.020734                      0.029098
S.D. dependent      0.026423                       0.096386                       0.039132

From the estimated results of the VEC model, it is found that the adjustment parameter λ1 = −0.2817 of the cointegration equation is statistically significant, but the adjustment parameters λ2 = −0.0035 and λ3 = −0.0268 are not statistically significant.
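The same workflow (unit-root tests, Johansen trace and maximum-eigenvalue tests, and estimation of the VEC model) can also be reproduced outside EViews. The following is a minimal Python sketch using statsmodels; the DataFrame name data and its columns 'ejpn', 'pusa', 'pjpn' are assumptions made only for illustration.

import pandas as pd
from statsmodels.tsa.stattools import adfuller
from statsmodels.tsa.vector_ar.vecm import coint_johansen, VECM

# Step 1: ADF unit-root test on each variable in levels (H0: the series has a unit root)
for col in ["ejpn", "pusa", "pjpn"]:
    stat, pvalue, *_ = adfuller(data[col].dropna(), autolag="AIC")
    print(f"{col}: ADF statistic = {stat:.3f}, p-value = {pvalue:.3f}")

# Step 2: Johansen cointegration tests; det_order=0 corresponds to an intercept in the
# cointegrating relation (Model 1 of Table 10-16), k_ar_diff = lags of the differences
jres = coint_johansen(data[["ejpn", "pusa", "pjpn"]], det_order=0, k_ar_diff=1)
print("Trace statistics:    ", jres.lr1)   # 90/95/99% critical values in jres.cvt
print("Max-eigen statistics:", jres.lr2)   # 90/95/99% critical values in jres.cvm

# Step 3: estimate the VEC model with one cointegrating relation
vecm_res = VECM(data[["ejpn", "pusa", "pjpn"]], k_ar_diff=1, coint_rank=1,
                deterministic="ci").fit()
print(vecm_res.alpha)  # adjustment (error-correction) coefficients
print(vecm_res.beta)   # cointegrating vector

The printed alpha and beta correspond, respectively, to the adjustment coefficients and the cointegrating vector reported in Table 10-17.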

Bounds Test Approach for Cointegration
The bounds test approach for cointegration is based on the autoregressive distributed lag (ARDL) model. This approach, developed by Pesaran et al. (2001), has become one of the most popular among researchers. The bounds test approach has certain econometric advantages in comparison with other single-equation cointegration procedures. They are as follows: (i) endogeneity problems and the inability to test hypotheses on the estimated long-run coefficients associated with the Engle-Granger method are avoided; (ii) the long-run and short-run parameters of the model in question are estimated simultaneously; (iii) the bounds test approach for testing the existence of a long-run relationship between the variables in levels is applicable irrespective of whether the underlying time-series variables are purely I(0), purely I(1), or fractionally integrated; and (iv) the small-sample properties of the bounds testing approach are far superior to those of multivariate cointegration procedures.

The Model
The ARDL modelling approach for cointegration involves estimating the following unrestricted error correction regression equations:

Δy_t = c_0 + Σ_{i=1}^{p} α_i Δy_{t-i} + Σ_{i=0}^{p} β_i Δx_{t-i} + π_1 y_{t-1} + π_2 x_{t-1} + ε_{1t}    (10.262)

Δx_t = d_0 + Σ_{i=0}^{p} δ_i Δy_{t-i} + Σ_{i=1}^{p} λ_i Δx_{t-i} + π_3 y_{t-1} + π_4 x_{t-1} + ε_{2t}    (10.263)

where {ε_{1t}} and {ε_{2t}} are two white noise processes that are independent of the history of y and x but may be correlated across the equations; Δ denotes the first-difference operator; c_0 and d_0 are the constants of the two equations; π_1 and π_2 signify the coefficients on the lagged levels, and α_i and β_i (i = 1, 2,……,p) denote the coefficients on the lagged differences, in equation (10.262); π_3 and π_4 signify the coefficients on the lagged levels, and δ_i and λ_i (i = 1, 2,……,p) denote the coefficients on the lagged differences, in equation (10.263); and p signifies the maximum lag length, which is decided by the user. If we deal with three or more variables, there will be three or more equations like (10.262) and (10.263). The test procedure involves the following steps:

Step 1: First, we detect the order of integration of y_t and x_t using either the DF test, the ADF test, or the PP test. The test results may indicate that y_t and x_t are integrated of different orders, say I(0) and I(1) or I(1) and I(0), or that both are I(0) or both are I(1), etc.

Step 2: Second, the orders of lags for each variable in the unrestricted regression equations (10.262) and (10.263) are determined using the AIC, SBIC, and HQIC criteria. We select the model for which the values of the AIC, SBIC, and HQIC criteria are the smallest.

Step 3: Third, we set up the null hypothesis for testing the cointegration relationship between y_t and x_t against an appropriate alternative hypothesis. In equation (10.262), where y is the dependent variable, the null hypothesis of no cointegration is
H_0: π_1 = π_2 = 0
against the alternative hypothesis of cointegration
H_1: π_1 ≠ π_2 ≠ 0.
On the other hand, in equation (10.263), where x is the dependent variable, the null hypothesis of no cointegration is
H_0: π_3 = π_4 = 0
against the alternative hypothesis of cointegration
H_1: π_3 ≠ π_4 ≠ 0.

Step 4: Now, we apply a test statistic for testing the null hypothesis of no cointegration relationship. According to Pesaran et al. (2001), for equations (10.262) and (10.263), an F-test is applied for investigating one or more long-run relationships. Under the null hypothesis, the asymptotic distribution of the F-statistic is non-standard; it was originally derived and tabulated by Pesaran et al. (2001) and later modified by Narayan (2005) to accommodate small sample sizes.

Step 5: Then, we make a decision on whether the null hypothesis will be accepted or rejected by comparing the calculated value of the non-standard F-test statistic with the critical values. Two sets of critical values are provided: one is appropriate where all the variables are I(0) and the other is appropriate where all the variables are I(1). According to Pesaran et al. (2001), if the calculated F-statistic falls above the upper critical value, a conclusive


inference can be made regarding cointegration without knowing whether the variables are I(0) or I(1). In this case, the variables are said to be cointegrated indicating the existence of a long-run relationship among the variables. Alternatively, if the calculated F-statistic falls below the lower critical value, the null hypothesis of no cointegration will not be rejected regardless of whether the variables are I(0) or I(1). In contrast, the inference is inconclusive if the calculated F-statistic falls within the lower and upper critical values unless we know whether the series are I(0) or I(1).
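As a rough illustration of Steps 1-5, the unrestricted error correction regression (10.262) can be estimated by OLS and the bounds F-statistic computed directly. The following is a minimal Python sketch, assuming a pandas DataFrame df with columns 'y' and 'x' and a single lag (p = 1) of the differences; the resulting F value still has to be compared with the Pesaran et al. (2001) or Narayan (2005) bound critical values read from their tables.

import pandas as pd
import statsmodels.formula.api as smf

d = pd.DataFrame({
    "dy":   df["y"].diff(),           # Δy_t
    "dx":   df["x"].diff(),           # Δx_t
    "dy_1": df["y"].diff().shift(1),  # Δy_{t-1}
    "dx_1": df["x"].diff().shift(1),  # Δx_{t-1}
    "y_1":  df["y"].shift(1),         # y_{t-1}
    "x_1":  df["x"].shift(1),         # x_{t-1}
}).dropna()

# Unrestricted error correction regression (10.262) with p = 1
uecm = smf.ols("dy ~ dy_1 + dx + dx_1 + y_1 + x_1", data=d).fit()

# Bounds test: H0 of no cointegration, pi1 = pi2 = 0 (the lagged levels)
print(uecm.f_test(["y_1 = 0", "x_1 = 0"]))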

Ex. 10-21: The cointegration relationships between financial development, measured by broad money (M2, % of GDP) and domestic credit (DC, % of GDP), and the economic growth (PGDP, constant 2015 USD) of Bangladesh are examined for the period 1974-2021 by applying the bounds test approach. In order to detect the cointegration relationships between financial development and economic growth using the bounds test approach, the following unrestricted regression equations are formulated:

ΔPGDP_t = α_0 + Σ_{i=1}^{p} α_{1i} ΔPGDP_{t-i} + Σ_{i=0}^{p} α_{2i} ΔM2_{t-i} + Σ_{i=0}^{p} α_{3i} ΔDC_{t-i} + π_{11} PGDP_{t-1} + π_{12} M2_{t-1} + π_{13} DC_{t-1} + ε_{1t}    (10.264)

ΔM2_t = β_0 + Σ_{i=0}^{p} β_{1i} ΔPGDP_{t-i} + Σ_{i=1}^{p} β_{2i} ΔM2_{t-i} + Σ_{i=0}^{p} β_{3i} ΔDC_{t-i} + π_{21} PGDP_{t-1} + π_{22} M2_{t-1} + π_{23} DC_{t-1} + ε_{2t}    (10.265)

ΔDC_t = λ_0 + Σ_{i=0}^{p} λ_{1i} ΔPGDP_{t-i} + Σ_{i=0}^{p} λ_{2i} ΔM2_{t-i} + Σ_{i=1}^{p} λ_{3i} ΔDC_{t-i} + π_{31} PGDP_{t-1} + π_{32} M2_{t-1} + π_{33} DC_{t-1} + ε_{3t}    (10.266)

From the DF, ADF and PP tests, it is found that all the variables are non-stationary. Thus, we may expect the existence of cointegration relationships between financial development and economic growth in Bangladesh. In equation (10.264), where PGDP is the dependent variable, the null hypothesis of no cointegration is
H_0: π_11 = π_12 = π_13 = 0
against the alternative hypothesis of cointegration
H_1: π_11 ≠ π_12 ≠ π_13 ≠ 0.
In equation (10.265), where M2 is the dependent variable, the null hypothesis of no cointegration is
H_0: π_21 = π_22 = π_23 = 0
against the alternative hypothesis of cointegration
H_1: π_21 ≠ π_22 ≠ π_23 ≠ 0.
On the other hand, in equation (10.266), where DC is the dependent variable, the null hypothesis of no cointegration is
H_0: π_31 = π_32 = π_33 = 0
against the alternative hypothesis of cointegration
H_1: π_31 ≠ π_32 ≠ π_33 ≠ 0.

The appropriate lag-length(s) of unrestricted regression equations (10.264), (10.265) and (10.266) are selected using the AIC, SBIC, and HQIC criteria. Using RATS, the values of the F-test statistic are calculated for testing the null hypotheses. The results are given in Table 10-18.

Table 10-18: The results of the F-test for the cointegration relationship

Functional Form       F-test Value
f(PGDP | M2, DC)      24.5628*
f(M2 | PGDP, DC)      2.7593
f(DC | PGDP, M2)      1.8030

Critical value bounds: at the 10% level [3.17, 4.14]; at the 5% level [3.79, 4.85]; at the 1% level [5.15, 6.36].
Source: WDI 2022; the critical values are taken from Table CI(iii), p. 300, of Pesaran et al. (2001).


Decision: For k+1=3, the lower and upper bounds for the F-test statistic at 10%, 5%, and 1% significance levels are [3.17, 4.14], [3.79, 4.85], and [5.15, 6.36] respectively. From the test results, it can be said that the null hypothesis of no cointegration will be rejected only for equation (10.264), but for equations (10.265) and (10.266), the null hypothesis of no cointegration will be accepted. Finally, it can be concluded that economic growth is cointegrated with financial development in Bangladesh.

10.8 Causality Tests

The Granger F Test for Causality
In order to investigate the causal directions among different macroeconomic variables, we initially have to use the Granger causality test. Prior to the Granger causality test, we have to examine the order of integration of all the variables by unit root tests, i.e., we can apply the Dickey-Fuller (DF), Augmented Dickey-Fuller (ADF), and Phillips-Perron (PP) tests. If we find that all the variables are integrated of order 1, we have to go for further investigation, that is, for the cointegration relationship between the different pairs of variables which are used for the Granger causality test. The cointegration relationship indicates the existence of causal relationships between variables, but it does not indicate the direction of those causal relationships. Therefore, it is common to test for the direction of the causal relationships between variables using the Engle and Granger test procedure. There are three different models that can be used to detect the direction of causality between two variables X and Y, depending upon the order of integration and the presence or absence of a cointegration relationship.

Model 1: If the two variables X and Y are individually integrated of order one, i.e., I(1), and cointegrated, then the Granger causality test may use the I(1) data because of the super-consistency property of the estimators. The regression equations for the Granger causality test will be

Y_t = α_0 + Σ_{i=1}^{p} α_i Y_{t-i} + Σ_{j=1}^{q} β_j X_{t-j} + ε_t    (10.267)

X_t = δ_0 + Σ_{i=1}^{p} α_i X_{t-i} + Σ_{j=1}^{q} β_j Y_{t-j} + ξ_t    (10.268)

Model 2: If X and Y are I(1) and cointegrated, the Granger causality test can also be applied to the I(0) data with an error correction term. For this case, the regression equations will be as follows:

ΔY_t = α_0 + Σ_{i=1}^{p} α_i ΔY_{t-i} + Σ_{j=1}^{q} β_j ΔX_{t-j} + λ_1 ECM_{t-1} + ε_t    (10.269)

ΔX_t = δ_0 + Σ_{i=1}^{p} α_i ΔX_{t-i} + Σ_{j=1}^{q} β_j ΔY_{t-j} + λ_2 ECM_{t-1} + ξ_t    (10.270)

ECM_{t-1} is the error correction term, which combines the short-run and long-run dynamics of the cointegrated variables as they adjust towards the long-run equilibrium.

Model 3: If X and Y are I(1) but not cointegrated, the Granger causality test requires a transformation of the data to make them I(0). The models for the Granger causality test will be as follows:

ΔY_t = α_0 + Σ_{i=1}^{p} α_i ΔY_{t-i} + Σ_{j=1}^{q} β_j ΔX_{t-j} + ε_t    (10.271)

ΔX_t = δ_0 + Σ_{i=1}^{p} α_i ΔX_{t-i} + Σ_{j=1}^{q} β_j ΔY_{t-j} + ξ_t    (10.272)

ε and ξ are random error terms which are serially uncorrelated with zero mean and constant variance. To select the lag values in each equation, we apply the AIC, SBIC, and HQIC criteria. The null hypothesis to be tested for equation (10.267) or (10.269) or (10.271) is
H_0: β_1 = β_2 = ……… = β_q = 0, or H_0: X does not Granger cause Y,
against the alternative hypothesis


H_1: At least one of them is not zero, or H_1: X causes Y.
Also, the null hypothesis to be tested for equation (10.268) or (10.270) or (10.272) is
H_0: β_1 = β_2 = ……… = β_q = 0, or H_0: Y does not Granger cause X,

against the alternative hypothesis

H1: At least one of them is not zero, or H1 : Y causes X. We now explain different cases of causality.

Case 1: Unidirectional causality from X to Y If the null hypothesis is rejected for equation (10.267) or (10.269) or (10.271) and accepted for equation (10.268), or (10.270) or (10.272), then it can be said that there is a unidirectional causality from X to Y.

Case 2: Conversely unidirectional causality from Y to X If the null hypothesis for equation (10.267) or (10.269) or (10.271) is accepted but rejected for the equation (10.268), or (10.270) or (10.272), then it can be said that there is unidirectional causality from Y to X.

Case 3: Bilateral causality If the null hypothesis is rejected for equations (10.267) and (10.268); or (10.269) and (10.270); or (10.271) and (10.272); it then can be said that there is bidirectional causality from X to Y and Y to X.

Case 4: Independence. If the null hypothesis is accepted for equations (10.267) and (10.268), or (10.269) and (10.270), or (10.271) and (10.272), then it can be said that there is no causal relationship between X and Y.

Another channel of causality can be studied by testing the significance of the ECM terms. This test is referred to as the long-run causality test. The rejection of the null hypothesis H_0: λ_1 = 0 implies that the long-run relationship between Y and X is statistically significant. The Granger causality test procedure is discussed in steps for two non-stationary variables X and Y as follows:

Step 1: First, we regress Y on the lagged values of Y and the lagged values of X of the type

Y_t = α_0 + Σ_{i=1}^{m} α_i Y_{t-i} + Σ_{j=1}^{q} β_j X_{t-j} + u_t    (10.273)

where {u_t} is a Gaussian white noise process.

Step 2: Second, we apply the OLS method to run equation (10.273) and then obtain the residual sum of squares, which is called the unrestricted residual sum of squares (UESS). It is given by

UESS = Σ_{t=1}^{T} e_t²    (10.274)

where e_t = Y_t − α̂_0 − Σ_{i=1}^{m} α̂_i Y_{t-i} − Σ_{j=1}^{q} β̂_j X_{t-j}. The lag lengths of (10.273) are selected using the AIC, SBIC, and HQIC criteria.

Step 3: Third, we set up the null hypothesis. The null hypothesis to be tested is
H_0: β_1 = β_2 = ……… = β_q = 0, or H_0: X does not Granger cause Y,
against the alternative hypothesis
H_1: At least one of them is not zero, or H_1: X causes Y.


Step 4: Now, we obtain the restricted residual sum of squares. Under the null hypothesis, the regression equation will be

Y_t = α_0 + Σ_{i=1}^{m} α_i Y_{t-i} + u_t    (10.275)

We apply the OLS method to run equation (10.275) and then obtain the residual sum of squares, which is called the restricted residual sum of squares (RESS). It is given by

RESS = Σ_{t=1}^{T} e_tr²    (10.276)

where e_tr = Y_t − α̂_0 − Σ_{i=1}^{m} α̂_i Y_{t-i}.

Step 5: Then, we calculate the value of the F-test statistic, which is given by

F = [(RESS − UESS)/q] / [UESS/(T − p)] ~ F(q, T − p)    (10.277)

where q is the number of restrictions; p is the number of parameters in the unrestricted regression equation; and T is the total number of observations.

Step 6: Finally, we make a decision on whether the null hypothesis will be accepted or rejected by comparing the calculated value of the test statistic with its table value. Let the level of significance be 5%. At the 5% level of significance with q and (T−p) degrees of freedom, we find the table value, say F_tab,0.05(q, T−p). If the calculated value of the test statistic is greater than the table value, we reject the null hypothesis. Thus, we can say that X causes Y. Otherwise, we can say that X does not cause Y.

Note: If we regress X on the lagged values of X and Y, then the same procedure can be applied to detect whether Y causes X or not.
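The six steps above can be carried out with any regression package. A minimal Python sketch of the F-test in (10.277) is given below; the series names y and x and the lag lengths m = q = 2 are assumptions used only for illustration.

import pandas as pd
import statsmodels.api as sm
from scipy import stats

def granger_f_test(y, x, m=2, q=2):
    data = pd.DataFrame({"y": y, "x": x})
    for i in range(1, m + 1):
        data[f"y_l{i}"] = data["y"].shift(i)
    for j in range(1, q + 1):
        data[f"x_l{j}"] = data["x"].shift(j)
    data = data.dropna()

    X_unres = sm.add_constant(data[[c for c in data if c.startswith(("y_l", "x_l"))]])
    X_res   = sm.add_constant(data[[c for c in data if c.startswith("y_l")]])
    uess = sm.OLS(data["y"], X_unres).fit().ssr   # unrestricted residual sum of squares
    ress = sm.OLS(data["y"], X_res).fit().ssr     # restricted residual sum of squares

    T, p = len(data), X_unres.shape[1]            # p = parameters of the unrestricted model
    F = ((ress - uess) / q) / (uess / (T - p))    # equation (10.277)
    return F, stats.f.sf(F, q, T - p)             # F value and its p-value

F, pval = granger_f_test(y, x)                    # H0: x does not Granger cause y
print(f"F = {F:.4f}, p-value = {pval:.4f}")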

Ex. 10-22: The causality relationships between different pairs of variables in logarithmic form of GDP (constant 2015 USD), money supply (M2, in crore Tk) and export values (EX, constant 2015 USD) of Bangladesh for the period 1972-2021 are detected using the Granger causality F-test. It is found that the variables ln(GDP), ln(M2), and ln(EX) are non-stationary and cointegrated. Thus, we may expect the existence of causality relationships between different pairs of variables. Since the variables are cointegrated, the causality analysis between different pairs of variables takes the form of (10.267) and (10.268). It is found that the number of lags for each variable is 2. The causality analysis is performed using EViews, and the test results are given in Table 10-19.

Table 10-19: The Granger F-test results

Hypothesis                                    F-Test      p-Value    Conclusion
1. ln(GDP) and ln(M2)
ln(M2) does not Granger cause ln(GDP)         3.2295**    0.0501     ln(M2) → ln(GDP)
ln(GDP) does not Granger cause ln(M2)         3.2802*     0.0480     ln(GDP) → ln(M2)
2. ln(GDP) and ln(EX)
ln(EX) does not Granger cause ln(GDP)         8.9789*     0.0006     ln(EX) → ln(GDP)
ln(GDP) does not Granger cause ln(EX)         2.0933      0.1366
3. ln(M2) and ln(EX)
ln(EX) does not Granger cause ln(M2)          1.4503      0.2466
ln(M2) does not Granger cause ln(EX)          4.0749*     0.0245     ln(M2) → ln(EX)

Source: WDI, 2022, own calculations. x → y means x Granger causes y. **: significant at the 10% level; *: significant at the 5% level.

The findings in Table 10-19 indicate the bidirectional causality running between broad money (M2) and economic growth, unidirectional causalities running from export (EX) to economic growth (GDP), and money supply (M2) to export values (EX).
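Pairwise tests of this kind can also be produced with statsmodels' built-in routine. A brief sketch follows; the DataFrame df and the column names 'lgdp' and 'lm2' (standing for ln(GDP) and ln(M2)) are hypothetical. Note that grangercausalitytests() examines whether the second column Granger causes the first.

from statsmodels.tsa.stattools import grangercausalitytests

# H0: ln(M2) does not Granger cause ln(GDP), tested with 2 lags as in Table 10-19
grangercausalitytests(df[["lgdp", "lm2"]], maxlag=2)

# Reversing the column order tests H0: ln(GDP) does not Granger cause ln(M2)
grangercausalitytests(df[["lm2", "lgdp"]], maxlag=2)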

Toda-Yamamoto Approach for Granger Causality
We know that the time-series variables Y and X may be integrated of different orders, or non-cointegrated, or both. According to Toda and Yamamoto (1995), in these cases the ECM cannot be applied to Granger causality tests.


For these cases, Toda and Yamamoto (1995) developed an alternative test statistic irrespective of whether Y and X are I(0), I(1) or I(2), non-cointegrated or cointegrated of a finite order. This is very popular and widely known as the Toda and Yamamoto (1995) augmented Granger causality test. This procedure is based on the asymptotic theory for testing the causality relationships between integrated variables of different orders.

The Model
The Toda and Yamamoto (1995) augmented Granger causality test procedure is based on the following regression equations:

Y_t = α_0 + Σ_{i=1}^{k+d} α_i Y_{t-i} + Σ_{i=1}^{k+d} β_i X_{t-i} + u_t    (10.278)

X_t = θ_0 + Σ_{i=1}^{k+d} θ_i X_{t-i} + Σ_{i=1}^{k+d} δ_i Y_{t-i} + v_t    (10.279)

where {u_t} and {v_t} are assumed to be Gaussian white noise with a zero mean, a constant variance, and no autocorrelation; d is the maximum order of integration of the variables in the system; and k is the optimal lag length of Y and X.

Test Procedure
This test procedure involves the following steps:

Step 1: First, using either the DF, ADF, or PP tests, we detect the maximum order of integration (d) of the variables in the system. Suppose the test results indicate that the variable Y_t is integrated of order 1, i.e., I(1), and the variable X_t is integrated of order 2, i.e., I(2). Then the maximum order of integration is 2, i.e., d = 2.

Step 2: Second, we develop a kth-order bivariate VAR model in level form of the type

Y_t = α_0 + Σ_{i=1}^{k} α_i Y_{t-i} + Σ_{i=1}^{k} β_i X_{t-i} + u_t    (10.280)

X_t = θ_0 + Σ_{i=1}^{k} θ_i X_{t-i} + Σ_{i=1}^{k} δ_i Y_{t-i} + v_t    (10.281)

Using the AIC, SBIC or HQIC criterion, we can select the optimum lag length k of the variables Y and X.
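As a sketch of this lag-selection step, the information criteria for each candidate lag can be tabulated with statsmodels; the DataFrame df holding the series in levels and the choice maxlags=8 are assumptions for illustration.

from statsmodels.tsa.api import VAR

sel = VAR(df).select_order(maxlags=8)  # fits VAR(1), ..., VAR(8)
print(sel.summary())                   # AIC, BIC (SBIC), HQIC and FPE by lag
k = sel.selected_orders["aic"]         # optimal lag length according to the AIC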

Step 3: Third, we develop the bivariate VAR(k+d) model for the variables Y and X of the type

Y_t = α_0 + Σ_{i=1}^{k} α_i Y_{t-i} + Σ_{j=1}^{d} α_{k+j} Y_{t-k-j} + Σ_{i=1}^{k} β_i X_{t-i} + Σ_{j=1}^{d} β_{k+j} X_{t-k-j} + u_t    (10.282)

X_t = θ_0 + Σ_{i=1}^{k} θ_i X_{t-i} + Σ_{j=1}^{d} θ_{k+j} X_{t-k-j} + Σ_{i=1}^{k} δ_i Y_{t-i} + Σ_{j=1}^{d} δ_{k+j} Y_{t-k-j} + v_t    (10.283)

We now apply the OLS method to run equations (10.282) and (10.283) and then obtain the residual sums of squares, which are called the unrestricted residual sums of squares. Let us define UESSy as the unrestricted residual sum of squares when Y is the dependent variable, and UESSx as the unrestricted residual sum of squares when X is the dependent variable. At this stage, diagnostic tests for serial correlation, autoregressive conditional heteroscedasticity, heteroscedasticity, functional form misspecification, and non-normal errors should be conducted.

Step 4: Now, we set up the null hypothesis against an appropriate alternative hypothesis. The null hypothesis to be tested for equation (10.282) is
H_0: β_1 = β_2 = ……… = β_k = 0, or H_0: X does not cause Y,
against the alternative hypothesis
H_1: At least one of them is not zero, or H_1: X causes Y.
For equation (10.283), the null hypothesis of no causal relationship is


H_0: δ_1 = δ_2 = ……… = δ_k = 0, or H_0: Y does not cause X,
against the alternative hypothesis
H_1: At least one of them is not zero, or H_1: Y causes X.

Step 5: Then, we obtain the restricted residual sums of squares. Under the null hypotheses, the restricted VAR model is given by

Y_t = α_0 + Σ_{i=1}^{k} α_i Y_{t-i} + Σ_{j=1}^{d} α_{k+j} Y_{t-k-j} + Σ_{j=1}^{d} β_{k+j} X_{t-k-j} + u_t    (10.284)

X_t = θ_0 + Σ_{i=1}^{k} θ_i X_{t-i} + Σ_{j=1}^{d} θ_{k+j} X_{t-k-j} + Σ_{j=1}^{d} δ_{k+j} Y_{t-k-j} + v_t    (10.285)

We apply the OLS method to run equations (10.284) and (10.285), and then we obtain the residual sums of squares, which are called the restricted residual sums of squares. Let us define RESSy as the restricted residual sum of squares when Y is the dependent variable, and RESSx as the restricted residual sum of squares when X is the dependent variable.

Step 6: At this stage, we calculate the value of the test statistic in order to draw valid causal inferences. Toda and Yamamoto utilise a modified Wald (MWALD) test statistic that restricts the parameters of the kth-order optimal lags of the vector autoregression. The MWALD statistic has an asymptotic chi-square distribution; it also has an asymptotic F distribution.

(i) For equation (10.282), the F-statistic for the modified Wald test is

F = [(RESSy − UESSy)/k] / [UESSy/(T − p)] ~ F(k, T − p)    (10.286)

(ii) For equation (10.283), the F-statistic for the modified Wald test is

F = [(RESSx − UESSx)/k] / [UESSx/(T − p)] ~ F(k, T − p)    (10.287)

Step 7: We make a decision on whether the null hypothesis will be accepted or rejected by comparing the calculated value of the F-test with the table value. If the calculated value of the test statistic is greater than the table value, we reject the null hypothesis. Therefore, we can say that X causes Y, or Y causes X.

Ex. 10-23: For the given problem in Ex. 10-21, the causality relationships between different pairs of variables are detected using the Toda-Yamamoto test. Using the DF, ADF, PP and KPSS tests, it is found that the variable PGDP is integrated of order 2, and the other two variables M2 and DC are integrated of order 1. Also, using the AIC, SBIC and HQIC criteria, it is found that the optimal lag length of the VAR model is 1. Thus, the VAR(1+2) model is given by

[PGDP_t]   [C_1]   [α_111  α_121  α_131] [PGDP_{t-1}]    2    [δ_11,1+j  δ_12,1+j  δ_13,1+j] [PGDP_{t-1-j}]   [ε_1t]
[M2_t  ] = [C_2] + [α_211  α_221  α_231] [M2_{t-1}  ] +  Σ    [δ_21,1+j  δ_22,1+j  δ_23,1+j] [M2_{t-1-j}  ] + [ε_2t]     (10.288)
[DC_t  ]   [C_3]   [α_311  α_321  α_331] [DC_{t-1}  ]   j=1   [δ_31,1+j  δ_32,1+j  δ_33,1+j] [DC_{t-1-j}  ]   [ε_3t]

where the C's, α's, and δ's are the parameters to be estimated, and the ε's are random error terms distributed identically and independently with mean zero and a finite covariance matrix. The null hypothesis that money supply does not Granger cause economic growth can be expressed as
H_0: α_121 = 0


against the alternative hypothesis H_1: α_121 ≠ 0.
The null hypothesis that domestic credit does not Granger cause economic growth can be expressed as
H_0: α_131 = 0
against the alternative hypothesis H_1: α_131 ≠ 0.
The joint null hypothesis that M2 and DC do not Granger cause economic growth can be expressed as
H_0: α_121 = α_131 = 0
against the alternative hypothesis H_1: α_121 ≠ α_131 ≠ 0.

In the same way, we can also set up the null hypotheses for other pairs of variables. For testing the null hypotheses, we apply the Toda-Yamamoto test which is asymptotically distributed as F. EViews is used to obtain the test values. The results are given in Table 10-20.

Table 10-20: Toda-Yamamoto test results

Hypothesis                               F-Test     p-Value    Conclusion
1. Dependent variable PGDP
M2 does not Granger cause PGDP           0.1715     0.6788
DC does not Granger cause PGDP           0.1674     0.6824
2. Dependent variable M2
PGDP does not Granger cause M2           0.1476     0.7008
DC does not Granger cause M2             1.5008     0.2206
3. Dependent variable DC
PGDP does not Granger cause DC           0.0861     0.7691
M2 does not Granger cause DC             4.2822     0.0385     M2 → DC

Note: x → y means x Granger causes y; the reported p-values are those of the F-tests.

The findings in Table 10-20 indicate the unidirectional causality running from broad money (M2) to domestic credit (DC). There is no causation between DC and PGDP, and between M2 and PGDP.

Note: Different software packages such as EViews, Python, R, RATS, and STATA can be applied directly to detect the presence of cointegration relationships between different pairs of variables or among variables, based on the EG test, the AEG test, the DW test and the Phillips-Ouliaris-Hansen test, and to carry out causality analyses.
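For instance, the Toda-Yamamoto regression (10.282) and the restriction on the first k lags can be coded directly. The following Python sketch is one possible implementation for two series; the Series names y and x, and the values k = 1 and d = 2 taken from Ex. 10-23, are assumptions for illustration.

import pandas as pd
import statsmodels.api as sm

def toda_yamamoto(y, x, k=1, d=2):
    data = pd.DataFrame({"y": y, "x": x})
    for i in range(1, k + d + 1):          # k optimal lags plus d extra lags
        data[f"y_l{i}"] = data["y"].shift(i)
        data[f"x_l{i}"] = data["x"].shift(i)
    data = data.dropna()

    regressors = [f"y_l{i}" for i in range(1, k + d + 1)] + \
                 [f"x_l{i}" for i in range(1, k + d + 1)]
    fit = sm.OLS(data["y"], sm.add_constant(data[regressors])).fit()

    # Wald/F test of H0: the first k lags of x are jointly zero (x does not cause y);
    # the d additional lags are left unrestricted, which is what keeps the test valid
    return fit.f_test([f"x_l{i} = 0" for i in range(1, k + 1)])

print(toda_yamamoto(y, x, k=1, d=2))       # H0: x does not Granger cause y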


Exercises

10-1: Define a dynamic model with an example.
10-2: Define an autoregressive distributed lag model of order p. Find the long-run effect of X on Y.
10-3: Define a distributed lag model of order p. Find the long-run effect of X on Y. Find the proportion of the long-run effect felt by the ith period of time. Explain it with an example.
10-4: Discuss the estimation technique of a distributed lag model.
10-5: Define spurious regression. Explain it with an example.
10-6: Define cointegration with an example. Write the names of different tests for cointegration.
10-7: Discuss the Engle and Granger test for cointegration. Test the existence of the long-run Purchasing Power Parity theorem between Bangladesh and the USA using the Engle-Granger test based on numerical data.
10-8: Discuss the Augmented Engle-Granger test for cointegration. Find the cointegration relationship between financial and economic development of the USA using the Augmented Engle-Granger test based on numerical data.
10-9: Discuss the Phillips-Ouliaris-Hansen test for cointegration. Find the cointegration relationship between government revenue collection and government expenditure using the Phillips-Ouliaris-Hansen test based on numerical data of a developing country.
10-10: Define a vector autoregressive model with an example. Write a VAR(p) model of three variables.
10-11: Define a stationary VAR model with an example. A researcher estimated a VAR model of two variables. The estimated results are given below:
Y_t = [1.2  -0.4; 0.6  0.3] Y_{t-1} + ε_t
Do you think that the VAR model is stationary and stable? Why?

10-12: Write a VAR(p) model of k endogenous variables and rewrite it as a VAR(1) model.
10-13: Find the mean vector and the autocovariance matrix of a VAR(p) model.
10-14: Define a structural VAR model with an example. Write a structural VAR model in standard form.
10-15: Define a vector moving-average process. Find the mean vector and variance-covariance matrix of the VMA process.
10-16: Express a VAR(1) model as a VMA(∞).
10-17: Discuss the likelihood ratio test to detect the optimal lag length of a VAR model.
10-18: Discuss the AIC, SBIC, and HQIC criteria to detect the optimum lag length of a VAR model.
10-19: Discuss the OLS and ML methods to estimate a VAR model.
10-20: Discuss the likelihood ratio test for testing a joint null hypothesis in the case of a VAR model.
10-21: Let a bivariate VAR model be estimated with four lags. Assume that the original sample contains 50 observations on each variable (denoted y_{-3}, y_{-2},……,y_{50}) and the sample observations 1 through 46 are used to estimate the VAR(4) model. Under the null hypothesis, the VAR(3) model is estimated using the sample observations 1 through 46. For the unrestricted model, the variance-covariance matrix of the random error terms is Ω̂_ur = [3.5  1.5; 1.5  2.8], and for the restricted model, it is Ω̂_0 = [4.2  1.75; 1.75  3.9]. Do you think that the dynamics are completely captured by a three-lag VAR model? Why?

10-22: Discuss the technique to forecast a VAR model.

Multivariate Time-Series Models

565

10-23: Illustrate the impulse response functions for the bivariate VAR model. To illustrate how impulse response functions operate, we consider the simulated sequences of the model without intercept of the type Z_t = θ Z_{t-1} + w_t, where Z_t = [y_t; x_t] and w_t = [ε_1t; ε_2t]. Here, it is given that θ = [1.6  1.4; 1.3  1.5]. Obtain the impulse response functions, show your results graphically, and explain the effects of shocks to the variables from your graphs.

10-24: Define the forecast error variance decomposition. Discuss the technique to obtain it.
10-25: To examine empirically the supply-leading or demand-following hypotheses in the case of the USA, three time-series variables, say economic growth, money supply, and domestic credit, are used. Estimate the VAR model and obtain the forecast error variance decomposition for each variable. Show your results graphically and comment on your results.
10-26: What are the advantages and disadvantages of the VAR model?
10-27: Define a seemingly unrelated regression model with an example.
10-28: Derive a seemingly unrelated regression equations (SURE) model from m multiple linear regression equations.
10-29: Obtain the OLS and GLS estimators of the seemingly unrelated regression model.
10-30: Show that the GLS estimator is more efficient than the OLS estimator for a SURE model.
10-31: Discuss the techniques to obtain the feasible GLS estimators of a SURE model.
10-32: Define an error correction model (ECM) with an example. Obtain the error correction model from an ARDL model and interpret all the terms of an error correction model.
10-33: Define a vector error correction model and derive a VEC model from a pth-order Gaussian VAR model.
10-34: Define a cointegrated VAR model with an example. Derive a cointegrated VAR model from a pth-order Gaussian VAR model and then interpret all the terms of a cointegrated VAR model.
10-35: Let us consider a first-order Gaussian VAR model of the type Y_t = B_1 Y_{t-1} + ε_t, where Y_t is a (2×1) vector of endogenous variables and each of the y's is I(1); ε_t is a (2×1) vector of disturbance terms that are assumed to be independently, identically distributed errors with the distribution ε_t ~ N(0, Ω); and B_1 is a (2×2) matrix of coefficients for Y_{t-1}. The model can also be expressed as the following VEC model: ΔY_t = Π Y_{t-1} + ε_t, where Π = −(I_2 − B_1). Here, it is given that
Π = [−0.75  1.5; 0.85  −1.7]
Obtain the cointegrating relationship between the two variables.

10-36: Briefly outline the Johansen test procedure for testing the cointegration relationships between a set of variables in the context of a VAR model. Give one or more examples from the economic growth literature where the Johansen test has been employed. What conclusion can be drawn from the results of the Johansen tests of this research? 10-37: Compare the Johansen maximum eigenvalue test with the trace test. Set up the null and alternative hypotheses in each case. 10-38: Suppose a researcher has a set of five economic variables. Yt (t =1 , 2,……,T) denotes a (5×1) vector of variables and he wishes to test the existence of cointegrating relationships using the Johansen test procedure. What is the implication of finding that the rank of the long-run matrix Ȇ takes on a value of (i) 0, (ii) 1, (iii) 2, (iv) 3, (v) 4 and (vi) 5? 10-39: Briefly outline the Bounds test approach for cointegration. 10-40: Discuss the Granger causality test to investigate the causal directions between two nonstationary and cointegrated variables. 10-41: Discuss the Granger causality test to investigate the causal directions between two nonstationary variables that have no cointegration relationship.


10-42: Suppose a researcher has a set of two economic variables Y and X which are I(1) but not cointegrated. The models for the Granger causality test take the form

ΔY_t = α_0 + Σ_{i=1}^{p} α_i ΔY_{t-i} + Σ_{j=1}^{q} β_j ΔX_{t-j} + ε_t

ΔX_t = α_0 + Σ_{i=1}^{p} α_i ΔX_{t-i} + Σ_{j=1}^{q} β_j ΔY_{t-j} + ξ_t

From a sample of 39 observations, it is found that the value of p is 1. The unrestricted and restricted residual sums of squares for the first model are UESS1 = 0.9707 and RESS1 = 1.1020. For the second model, the unrestricted and restricted residual sums of squares are UESS2 = 3.0372 and RESS2 = 3.0477, respectively. Based on the given information, test the existence of causal relationships between the pair of variables. Comment on your results.

10-43: Briefly outline the Toda-Yamamoto approach for testing the causality relationship between integrated variables of different orders. 10-44: A researcher wants to test the causal relationships between two variables say per capita GDP (Y) and the domestic investment (X) of Bangladesh. It is found that Y is I(2) and X is I(1). Which method would be appropriate for detecting the causal relationships between Y and X? Briefly outline the procedure to detect the causal relationships.

CHAPTER ELEVEN
LIMITED DEPENDENT VARIABLE MODELS

11.1 Introduction
This chapter presents different econometric models in which the dependent variable is bounded and restricted to take on values between a lower bound and an upper bound. Thus, in this chapter, we deal with limited dependent-variable models in which the response of an economic agent is limited in some way: the dependent variable is restricted rather than continuous on the real line. A model in which the dependent variable has a lower bound (which commonly takes the value zero) and an upper bound can be derived from an ordinary regression model by mapping the dependent variable Y through a sigmoid or S-shaped function, whose bounds are approached asymptotically as the value of the independent variable(s) representing the systematic influences increases; such a model is called the logistic model. In practice, a logistic model depends on finding a function that will map from the range of the systematic variable into the restricted interval of the response variable Y. In some cases, we deal with a discrete choice: the dependent variable Y may be restricted to a Boolean or binary choice, indicating that Y is a dichotomous variable which can take only the value one if a particular course of action is present, and zero if the action is not present. A model with a binary dependent variable may be obtained from an ordinary regression model by mapping the dependent variable Y through a step function representing a threshold mechanism. When Y falls short of the threshold value, the mechanism's response is to generate a zero; when Y exceeds the threshold, a one is delivered. Thus, the model generates a predicted probability that an individual will choose to answer yes rather than no. In such a model, if the regression coefficient β_j (j = 1, 2,…..,k) is greater than zero, it implies that individuals with high values of the explanatory variable X will be more likely to respond yes, and their predicted probability of choosing yes will approach 1. For instance, if higher disposable income makes it more probable to have a new car, we must be able to include high-income (rich) persons in the sample, for whom the predicted probability of having a new car is bounded by 1, while a poor person's predicted probability must be bounded by 0. This type of model is called the linear probability model (LPM). The logit and probit models are standard, and these models are discussed in this chapter with their important properties and estimation techniques. When we deal with different problems, irrespective of the discipline, the phenomena of censoring and truncation may arise. Therefore, in this chapter, censored and truncated regression models are discussed with their important properties and estimation techniques. Nobel Laureate and economist James Tobin developed the Tobit model in 1958 as an extension of the logit and probit models. Thus, this chapter also discusses the Tobit model with its important properties and estimation techniques. Then, the Poisson regression model is discussed to study situations where the dependent variable in a regression model is a count variable representing the number of occurrences of an event, i.e., y_i ∈ {0, 1, 2,………}, with its important properties and estimation techniques. The LPM, logit model, probit model, Tobit model, censored regression model, truncated regression model and Poisson regression model are illustrated with their applications to numerical problems using the software packages RATS, EViews and STATA.

11.2 Dummy Dependent Variable
A dependent variable of a regression equation is said to be a dummy dependent variable when the dependent variable can be expressed in the form of a dummy variable that takes on the values 1 and 0. When the variable takes the value 1, it can be interpreted as a success, and 0 as a failure. For example, suppose that we are interested in studying the house ownership of a household in Dhaka city as a function of family income. For this study, the dependent variable Y takes the value 1 if the ith family (i = 1, 2,…....,N) owns a house in Dhaka city, and 0 if not. Typically, it can be written as

Y_i = 1, if a family owns a house in Dhaka city; = 0, otherwise.

The variable Y is called a dummy dependent variable. Below, more examples of the applications of dummy dependent variables in the econometric analysis of economic relationships are given. Many variables are dummy variables at an individual level, like car ownership, house ownership, unemployment, immigration, etc.


Topic                         Dummy Dependent Variable    Description
Purchase a new car            Car owner                   Y_i = 1, if a family owns a new car; 0, otherwise
Smoking                       Smoker                      Y_i = 1, if a patient smokes; 0, if he/she does not smoke
Labour force participation    Labour force                Y_i = 1, if a person is in the labour force; 0, if he/she is not
Choice of occupation          Teaching                    Y_i = 1, if a person is a university teacher; 0, otherwise
Union membership              Union                       Y_i = 1, if a person is a member of a union; 0, if not
Retirement                    Retired                     Y_i = 1, if a person is retired; 0, if not
Use of drugs                  Drug used                   Y_i = 1, if a drug is effective to cure a disease; 0, if not

11.3 Dummy Dependent Variable Models
Meaning: An econometric model is said to be a dummy dependent variable model when the dependent variable Y can be conveniently represented by a dummy variable that takes on the values zero and one. For example, let us define Y as the dummy dependent variable indicating that Y_i = 1 if family i (i = 1, 2,……..,N) owns a house in Dhaka city and Y_i = 0 if not. We suppose that house ownership is a function of family income (the exogenous/independent variable) X. We want to study the relationship between house ownership in Dhaka city and family income by considering the following regression equation:

Y_i = β_0 + β_1 X_i + ε_i,  i = 1, 2,…………,N    (11.1)

Equation (11.1) is called a dummy dependent variable model. The different dummy dependent variable models are the linear probability model, logit model, probit model, multinomial model, censored and truncated data (Tobit) models, and sample selection model, etc.

The Linear Probability Model (LPM)

Let us consider the model of the type:

Y_i = β_0 + β_1 X_1i + β_2 X_2i + …… + β_k X_ki + ε_i    (11.2)

The dependent variable Y can be treated as Y_i = 1 for a success and Y_i = 0 for a failure. The regression coefficient β_j expresses the change in the probability that Y_i = 1 associated with a unit change in X_j (j = 1, 2,…….,k), given that all other independent variables are constant. The multiple linear regression model (11.2) with a binary or dichotomous dependent variable Y as a linear function of the k explanatory variables X_1, X_2,…....,X_k is called a linear probability model (LPM) because the response probability is linear in β_j. In the LPM, β_j measures the change in the response probability when X_j increases by one unit given that the remaining (k−1) independent variables are constant:

∂Prob(Y_i = 1)/∂X_ji = β_j    (11.3)

Mathematically, it can be written as E(Y_i|X_1i, X_2i,…..,X_ki) = Prob(Y_i = 1|X_1i, X_2i,…..,X_ki). The justification of the name LPM for a model like (11.2) can be expressed as follows:

The assumption E(ε_i|X_1i, X_2i,…..,X_ki) = 0 implies that


E(Y_i|X_1i, X_2i,…..,X_ki) = β_0 + β_1 X_1i + β_2 X_2i + …… + β_k X_ki    (11.4)

Let p be the probability of success and q the probability of failure. Thus, we have p_i = Prob(Y_i = 1) and q_i = Prob(Y_i = 0), where p_i + q_i = 1 ⇒ q_i = 1 − p_i. Here, p_i varies over the individuals i = 1, 2,………,N. The variable Y_i has the following probability distribution:

Y_i          1      0          Total
Prob(Y_i)    p_i    1 − p_i    1

Since Y_i is a binary dependent variable, we have
E(Y_i|X_1i, X_2i,…..,X_ki) = 1 × Prob(Y_i = 1|X_1i, X_2i,…..,X_ki) + 0 × Prob(Y_i = 0|X_1i, X_2i,….,X_ki)
= 1 × p_i + 0 × q_i
= p_i    (11.5)

Thus, the conditional expectation of model (11.2) can be interpreted as the conditional probability of Y_i, and the conditional variance is given by
var(Y_i|X_1i, X_2i,…..,X_ki) = 1² × Prob(Y_i = 1|X_1i, X_2i,….,X_ki) + 0² × Prob(Y_i = 0|X_1i, X_2i,….,X_ki) − p_i²
= p_i − p_i²
= p_i q_i    (11.6)
Thus, var(Y_i|X_1i, X_2i,….,X_ki) is not a constant because E(Y_i|X_1i, X_2i,….,X_ki) = p_i changes over the individuals i = 1, 2, 3,.........,N. Thus, we have
p_i = Prob(Y_i = 1|X_1i, X_2i,…..,X_ki) = β_0 + β_1 X_1i + β_2 X_2i + …… + β_k X_ki    (11.7)

Since the probability p_i lies between 0 and 1, we have the restriction
0 ≤ E(Y_i|X_1i, X_2i,….,X_ki) ≤ 1, i.e., 0 ≤ Prob(Y_i = 1|X_1i, X_2i,……..,X_ki) ≤ 1    (11.8)
Thus, the conditional expectation or conditional probability lies between 0 and 1.

Problems in Estimation of LPM

If we apply the OLS method to estimate the LPM, we will face some special problems which are discussed below:

(i) In the LPM, the disturbance term ε_i is highly non-normal.

Proof: Let us consider the LPM of the type

Y_i = β_0 + β_1 X_1i + β_2 X_2i + …… + β_k X_ki + ε_i    (11.9)

The variable Y is a binary dependent variable which takes the value 1 for a success and zero for a failure. Equation (11.9) can also be written as

Y_i = X_i′β + ε_i    (11.10)

where X_i = [1  X_1i  ….  X_ki]′ and β = [β_0  β_1  …  β_k]′.


From equation (11.10), we can write

ε_i = Y_i − X_i′β    (11.11)

­1  X icȕ, when Yi = 1 ® ¯X ci ȕ, when Yi = 0

(11.12)

We also have Prob(Yi =1|X1 , X 2 ,......,X k ) = E(Yi |X1 , X 2 ,......,X k )

(11.13)

pi = Xci ȕ

Since the sum of probability must be one, we can write Prob(Yi =1|X1 , X 2 ,......,X k ) + Prob(Yi = 0|X1 , X 2 ,......,X k ) = 1 Prob(Yi =0|X1 , X 2 ,......,X k ) = 1  Prob(Yi =1|X1 , X 2 ,....,X k ) Prob(Yi =0|X1 , X 2 ,......,X k ) = 1  Xci ȕ

(11.14)

q i = 1  pi

Since Y takes the value 1 or 0, the residuals İ i can take only two values which are shown in equation (11.12), conditional on X. The probability distribution of İ i is shown in Table 11-1. Table 11-1: Probability distribution of İ i

Value of İ i 1  Xci ȕ

Probability

X ci ȕ Total

1  pi 1

pi

Therefore, the variance of İ i is given by 2

2

Var(İ i |X1i , X 2i ,......,X ki ) = 1  X ciȕ pi + X ci ȕ (1  pi ) 2

2

= 1  X ci ȕ X ci ȕ  X ci ȕ (1  Xci ȕ)

1  Xciȕ Xciȕ >1  Xciȕ  Xciȕ @ 1  Xciȕ Xciȕ

(11.15)

which clearly varies with the explanatory variables X1 , X 2 ,......,X k . Thus, it can be said that the random error terms İ's are heteroscedastic in the LPM. Since the model is heteroscedastic, biased standard errors lead to biased inference,

Limited Dependent Variable Models

571

so the results of the tests for hypotheses testing will be wrong. The OLS estimator is still unbiased, but the standard errors will be biased, and hence, the t-values will be wrong. The easiest way of solving this problem is to obtain the estimates of the standard errors that are robust to heteroscedasticity. (iii) The predicted probabilities may be greater than 1 or smaller than zero if we apply the OLS method to the LPM. It has been shown that, in the LPM, E(Yi |X) measures the conditional probability of the occurrence of Yi given X. Thus, logically, it must be between 0 and 1. Priori it is true, but there is no guarantee that the predicted ˆ , the estimators of E(Y |X) , will lie in the range 0 to 1. Thus, this is the big problem if we apply the probabilities Y i i OLS method to estimate the linear probability models. There are two procedures to estimate the LPM in which the predicted probabilities lie between 0 and 1. One is to apply the OLS method to the LPM and find out whether the ˆ lie between 0 and 1. If some predicted probabilities are less than 0, we assume that Y ˆ is to predicted probabilities Y i i ˆ is to be 1 for those cases. be zero for those cases. If some predicted probabilities are greater than 1, we assume that Y i

ˆ lie The second procedure is to apply the logit and probit model to ensure that the estimated conditional probabilities Y i between 0 and 1.

(iv) R-squared becomes useless as a measure of goodness of fit of the LPM. For a binary dependent variable model, the computed R2 will give us a limited value. To explain the reason behind this, let us now consider the following figures. Logically, for a linear probability model, the value of Y lies between 0 and 1 for given X. Therefore, all the values of Y will lie either along the X-axis or along the line to point 1. Whether the predicted probabilities of an LPM model is an unconstrained LPM (Fig. 11-1(a)) or a constrained LPM (Fig. 111(b)), the estimated R2 is likely to be much smaller than 1. In practice, R2 lies between 0.2 to 0.6, but, when all the points are closely clustered around points A and B in Fig. 13-1(c), the value of R2 will be high, i.e., it may be greater than 0.8. For this model, it is very easy to fix the straight line by joining the two points A and B. For this case, the predicted probability will be either very close to 0 or 1.

ˆ Y 1

LPM (unconstrained) A more reasonable regression line

0

X

Fig. 11-1(a):

ˆ 1 Y

LPM (constrained)

A more reasonable regression line

0 Fig. 11-1(b):

X

Chapter Eleven

572

ˆ Y 1

A

LPM

A more reasonable regression line

B

0

X

Fig. 11-1(c) Fig. 11-1: Linear probability models

Weighted Least-Squares Method to Estimate a Linear Probability Model

In an LPM, the random error terms İ's are heteroscedastic. Therefore, the OLS method is not applicable to estimate linear probability models. If we apply the OLS method to the LPM, the OLS estimators will be unbiased but will not be efficient. That is, they do not have the minimum variance. Thus, to solve this problem, we can transform the original LPM model in such a way that the random error terms of the transformed model will be homoscedastic. Let us consider the LPM of the type (11.16)

Yi = ȕ 0 + ȕ1X1i + ȕ 2 X 2i +......+ȕ k X ki + İ i

We have var(İ i |X1i , X 2i ,......,X ki ) = pi (1  pi ) . Dividing equation (11.16) by Yi wi

= ȕ0

1 wi

+ ȕ1

X1i wi

+ ȕ2

w i , where w i X 2i wi

+......+ȕ k

pi (1  pi ), we have X ki wi

+

İi wi

Yi* = ȕ 0 w *i + ȕ1X1i* + ȕ 2 X*2i +......+ȕ k X*ki + İ*i

where Yi* =

Yi wi

, X1i*

X1i wi

, X*2i

X 2i wi

(11.17) , …….., X*ki

X ki wi

, and İ*i

İi wi

The variance of the new random error term İ*i is constant, i.e., Var(İ*i |X) = 1. Therefore, we can apply the OLS method to the transformed model (11.17). Since the true E(Yi |X1 , X 2 ,......,X k ) = pi is unknown to us, the weights w i 's are unknown to us. Thus, to estimate w i 's, we may use the following two-step procedure. = pˆ i , Step 1: First, we apply the OLS method to the original model, and then, we obtain yˆ i = Estimate of the true E(Y|X) i ˆ i pˆ i (1-pˆ i ),  i. and then, we obtain w ˆ i to the transform model which is given by Step 2: Use the estimated w Yi ˆi w

= ȕ0

X X X İ 1 + ȕ1 1i + ȕ 2 2i +......+ȕ k ki + i ˆi ˆi ˆi ˆi ˆi w w w w w

(11.18)

and then, we apply the OLS method to run the transformed model (11.18) for estimation. Ex. 11-1: A linear probability model (LPM) is estimated using both the OLS and WLS methods by considering the financial inclusion (FINC) as a dependent variable, and age (AGE), years of schooling (SCH), gender (GEN), monthly family expenses (ME), institutional involvement (IINV) and working in abroad (WA) are independent variables which are collected by a survey of 460 respondents of Chattogram city of Bangladesh. First, the following LPM is considered

Limited Dependent Variable Models

573

(11.19)

FINCi = ȕ 0 +ȕ1AGE i +ȕ 2SCH i +ȕ 3 GEN i +ȕ 4 ME i +ȕ 5 IINVi +ȕ 6 WA i +İ i

where the dependent variable financial inclusion (FINC) can be treated as FINCi

­1, if a person has access to financial products and services that meet his/her needs ® ¯0, otherwise

The variable AGE indicates the age of the respondents; SCH indicates the years of schooling of the respondents, and GEN is the gender which is treated as GEN i

­1, if a respondent is male ® ¯0, if a respondent is female

ME indicates the monthly family expenses of the respondents; IINV is a qualitative variable which is for institutional involvement of the respondents and is defined as IINVi

­1, if a respondent involves with institution ® ¯0, otherwise

and the variable WA is a qualitative variable which is used for the respondents working abroad and defined as WA i

­1, if a respondent works in abroad ® ¯0, otherwise

In the LPM, ȕ j (j =1, 2, 3, 4, 5, 6) measures the change in the response probability when the jth independent variable increases by one unit given that the remaining independent variables are constant. Mathematically, it can be written as E(FINCi |AGE i , SCH i ,....,WA i )

Prob(FINCi =1|AGE i , SCH i ,....,WA i )

pi

The assumption E(İ i |AGE1i , SCH i ,....,WA i ) = 0 implies that pi = ȕ 0 + ȕ1AGE i +ȕ 2SCH i +ȕ 3 GEN i +ȕ 4 ME i +ȕ5 IINVi +ȕ 6 WA i

(11.20)

Equation (11.20) is estimated using both the OLS and WLS methods, and the results are given below Table 11-2: The OLS estimates of the LPM

Variable Constant AGE SCH GEN ME IINV WA Usable Observations Degrees of Freedom Centered R2 Adj(R2) Uncentered R2 TR2 Mean of Depen, Variable

Coeff 0.0467 0.0028 0.0187 -0.1513 0.0285 0.0913 0.3121

Linear Regression - Estimation by Least Squares Dependent Variable FINC Std Error T-Stat 0.0917 0.5090 0.0016 1.7400 0.0047 3.9514 0.0583 -2.5958 0.0036 7.8783 0.0416 2.1817 0.0979 3.1870 460 Std Error of Depen. Var. 453 Standard Error of Estim. 0.24043 Sum of Squared Resid. 0.23037 Regression F(6,453) 0.60704 Significance Level of F 279.222 Log Likelihood 0.48261 Durbin-Watson Statistic

Signif 0.6110 0.0825 0.0000 0.0097 0.0000 0.0289 0.0015 0.50024 0.43886 87.2452 23.8980 0.00000 -270.3358 1.6311

Source: Primary data are collected by a survey.

From the estimated results, it is found that the variables AGE, SCH, ME, IINV and WA have significant positive effects on the probability of having access to useful and affordable financial products and services that meet their needs. But the variable GEN has a significant negative effect on the probability. The slope values of 0.0028, 0.0187, 0.0285, 0.0913 and 0.3121 mean that for one-year change in AGE, on an average, the probability of having access to useful and affordable financial products and services that meet their needs will be increased by 0.0028 or 0.28%, for one year change in schooling. Similarly, on an average, the probability of having access to useful and affordable

Chapter Eleven

574

financial products and services that meet their needs will be increased by 0.0187 or 1.87% for one unit change in monthly family expenses; on an average, the probability of having access to useful and affordable financial products and services that meet their needs will be increased by 0.0285 or 2.85% for having an institutional involvement; on an average, the probability of having access to useful and affordable financial products and services that meet their needs will be increased by 0.0913 or 9.13%; and, for working abroad, on an average, the probability of having access to useful and affordable financial products and services that meet their needs will be increased by 0.3121 or 31.21%. The slope value of -0.1513 means that, for the male gender, on an average, the probability of financial inclusion will be decreased by 0.1513 or 15.13%. For the given values of AGE, SCH, GEN, ME, IINV and WA, we can estimate the probability of having access to useful and affordable financial products and services that meet their needs. The estimated linear probability model given by pˆ i = 0.0467 + 0.0028AGE i +0.0187SCH i - 0.1513GEN i +0.0285ME i +0.0913IINVi +0.3121WA i

(11.21)

ˆ i , we can obtain the WLS estimates of equation (11.19). The results are given ˆ i = pˆ i (1  pˆ i ) . Using w Thus, we have w in Table 11-3. Table 11-3: The WLS estimates of the LPM

Linear Regression - Estimation by WLS Dependent Variable FINC Std Error T-Stat 0.0939 0.7606 0.0017 1.9014 0.0043 6.9321 0.0618 -2.5213 0.0030 6.0096 0.0424 2.6517 0.0783 4.6295 0.4703 Std Error of Depen. Var. 0.4630 Standard Error of Estim. 0.6756 Sum of Squared Resid. 299.304 Log Likelihood 1.12245 Durbin-Watson Statistic

Variable Coeff Constant 0.0714 AGE 0.0032 SCH 0.0303 GEN -0.1558 ME 0.0183 IINV 0.1125 WA 0.3623 Centered R2 Adj(R2) Uncentered R2 TR2 Mean of Depen, Variable

Signif 0.4473 0.0579 0.0000 0.0120 0.0000 0.0083 0.0000 1.4123 1.0349 467.0211 -640.2860 1.6310

The estimated equation is shown below: ˆ FINC AGE i SCH i GEN i ME i 1 i = 0.0714 + 0.0032 +0.0303  0.1558 +0.0183 + ˆi ˆi ˆi ˆi ˆi ˆi w w w w w w 0.1125

IINVi ˆi w

+0.3623

WA i ˆi w

(11.22)

From the estimated results in Table 11-3, it is found that the variables AGE, SCH, ME, IINV and WA have significant positive effects on the response probability of having access to useful and affordable financial products and services that meet one's needs, while the variable GEN has a significant negative effect. The estimated standard errors are smaller and the corresponding t-ratios are larger than before. These results should be taken with a grain of salt because, under the WLS method, 17 observations are dropped. Since the sample size is large, the statistical hypothesis-testing procedures remain valid even though the weights \hat{w}_i = \hat{p}_i(1-\hat{p}_i) are estimated values.

Concept of the Logit and Probit Models

Previously, it was shown that the linear probability model (LPM) has several problems: (i) the random error terms ε are not normally distributed; (ii) the random error terms ε are heteroscedastic; (iii) the predicted probabilities need not lie within the range 0 to 1; and (iv) the R-squared value tends to be low and is therefore of limited use as a measure of goodness of fit. Some of these problems can be solved, but others cannot. For example, the heteroscedasticity problem can be solved by using the weighted least-squares method with weight w_i = \hat{p}_i(1-\hat{p}_i), as in the sketch below; the non-normality of the random error terms becomes less of a concern as the sample size increases; and the predicted probabilities can even be forced into the range 0 to 1 by using mathematical programming techniques or the restricted least-squares method.
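As an illustration of this two-step correction, the following minimal Python sketch (hypothetical simulated data; numpy and statsmodels assumed) fits an LPM by OLS, forms the weights \hat{p}_i(1-\hat{p}_i), and refits by WLS after dropping observations whose fitted probabilities fall outside (0, 1):

```python
import numpy as np
import statsmodels.api as sm

# Hypothetical binary outcome y and regressors X (stand-ins for FINC, AGE, SCH, ...)
rng = np.random.default_rng(0)
X = sm.add_constant(rng.normal(size=(500, 3)))
y = (X @ np.array([0.2, 0.3, -0.2, 0.1]) + rng.normal(size=500) > 0.5).astype(float)

# Step 1: OLS on the linear probability model
p_hat = sm.OLS(y, X).fit().fittedvalues

# Step 2: keep fitted probabilities strictly inside (0, 1), form w_i = p(1-p),
# and re-estimate by WLS with weights 1/w_i (the inverse of the error variance)
keep = (p_hat > 0) & (p_hat < 1)
w = p_hat[keep] * (1 - p_hat[keep])
wls_fit = sm.WLS(y[keep], X[keep], weights=1.0 / w).fit()
print(wls_fit.params)
```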


But, even then, the main problem of the LPM is that the response variable is coded as 1 or 0, corresponding to responses of True or False to a particular question. In the LPM, p_i = E(Y_i = 1|X) increases linearly with X, and therefore the marginal or incremental effect of X on p_i, i.e., \partial Prob(Y_i = 1)/\partial X_{ji} = \beta_j, remains constant, implying that the model is logically inappropriate. The reason the LPM is unrealistic can be explained with the following example. Consider the problem of new car ownership. The LPM implies that the probability of owning a new car rises by the same constant amount for every one-unit increase in household income. This is unrealistic, because we would expect the probability p_i of owning a new car to be non-linear in income. At a very low income level, a household will not own a new car, while a household with a sufficiently high disposable income, say X*, will most likely own one, so a further increase in income beyond X* will have only a small effect on the probability. Therefore, at both ends of the household income distribution, the probability of owning a new car is barely affected by a small change in income. In such a situation, we have to consider a probability model that satisfies two important features: (i) as income X_i increases, p_i = E(Y_i = 1|X_i) increases but never falls outside the range 0 to 1; and (ii) the relationship between p_i and X_i is non-linear, that is, p_i approaches zero at a slower and slower rate as X_i gets small, and approaches 1 at a slower and slower rate as X_i gets very large.1 Thus, a useful approach is to use a latent variable model, which is given by

Y_i^* = \beta_0 + \beta_1 X_i + \varepsilon_i    (11.23)

where Y_i^* is an unobservable magnitude which can be considered the net benefit to individual i of purchasing a new car. We cannot observe that net benefit, but we can observe the outcome of the individual having followed the decision rule

Y_i = \begin{cases} 0, & \text{if } Y_i^* < 0 \\ 1, & \text{if } Y_i^* \geq 0 \end{cases}    (11.24)

That is, we observe whether the individual household did or did not purchase a new car in 2023. If a household owns a new car, we observe Y_i = 1 and take this as evidence that a rational household made a decision that improved its welfare. We say that the latent variable Y* is linearly related to the income variable X (or to several variables) and a random error term ε. In the latent variable model, we must assume that the disturbance \varepsilon_i has a known variance σ²; unlike the regression problem, we do not have sufficient information in the data to estimate its magnitude. Since we may divide equation (11.23) by any positive σ without altering the estimation problem, the most useful strategy is to set σ = σ² = 1. In the latent variable model framework, we find the probability of an individual making each choice. Using equations (11.23) and (11.24), we have

Prob(Y_i = 1|X) = Prob(Y_i^* \geq 0) = Prob(\varepsilon_i + X\beta \geq 0) = Prob(\varepsilon_i \geq -X\beta) = Prob(\varepsilon_i \leq X\beta) = \Phi(X\beta)    (11.25)

The function \Phi(\cdot) is a cumulative distribution function (CDF) which maps points on the real line (-\infty, \infty) into the interval [0, 1], as shown in Fig. 11-2.

1

John Aldrich and Forrest Nelson, op. cit. p.26

Fig. 11-2: A cumulative distribution function (CDF)

The explanatory variables in X are modelled in a linear relationship to the latent variable Y*. If Y_i = 1, then Y_i^* \geq 0, which implies that \varepsilon_i \geq -X\beta. Considering the case where \varepsilon_i = 0, a positive Y_i^* corresponds to X\beta > 0, and vice versa. If \varepsilon_i is negative, observing Y_i = 1 implies that X\beta must have outweighed the negative \varepsilon_i, and vice versa. Therefore, we can interpret the outcome Y_i = 1 as indicating that the explanatory factors and the disturbance term faced by family i have combined to produce a positive net benefit. For example, a family might have a low income, which suggests that a new car purchase is not likely, but may have a sibling who works for Toyota and can arrange an advantageous price on a new vehicle. We cannot observe this circumstance; it shows up as a large positive \varepsilon_i, explaining how X\beta + \varepsilon_i > 0 for that family. To handle such situations, two common estimators of the binary choice model are used, namely the binomial probit and binomial logit models. For the probit model, \Phi(\cdot) is the CDF of the normal distribution, which is given by

Prob(Y_i = 1|X) = \int_{-\infty}^{X\beta} f(\varepsilon_i, \mu, \sigma)\, d\varepsilon_i = \Phi(X\beta)    (11.26)

where f(\varepsilon_i, \mu, \sigma) = \frac{1}{\sqrt{2\pi}} e^{-\varepsilon_i^2/2}.

For the logit model, the CDF is that of the logistic distribution, which is given by

Prob(Y_i = 1|X) = \frac{\exp(X\beta)}{1+\exp(X\beta)}    (11.27)
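The two CDFs can be compared directly; the short sketch below (numpy and scipy assumed) evaluates (11.26) and (11.27) at the same index values and shows how close the implied probabilities are:

```python
import numpy as np
from scipy.stats import norm, logistic

z = np.linspace(-4, 4, 9)        # values of the index X*beta
probit_p = norm.cdf(z)           # equation (11.26)
logit_p = logistic.cdf(z)        # equation (11.27): exp(z)/(1+exp(z))

for zi, pp, lp in zip(z, probit_p, logit_p):
    print(f"index {zi:+.1f}: probit {pp:.3f}  logit {lp:.3f}")
```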

If the distribution of the sample values of Y is not too extreme, the two models will give almost similar results. However, a sample in which the proportion of Y_i = 1 (or the proportion of Y_i = 0) is very small will be sensitive to the choice of CDF, and neither of these cases is really amenable to the binary choice model.

The Logit Model

The logit model is an improvement of the linear probability model (LPM). It is based on the cumulative distribution function (CDF) of a random variable, with the logit model having the following logistic cumulative distribution function (CDF), and giving the following relationship:


Fig. 11-3: The logistic function exp(X)/(1+exp(X)) and its derivative. For large negative values of X, the function and its derivative are close; in the case of the exponential function exp(X), they coincide for all values of X.

The regression curve shown in Fig. 11-3 is non-linear, giving a more realistic description of the data, with very little change in the probability at the extreme values that the explanatory variable can take. Let p_i be the probability that the dependent variable Y_i takes the value 1. Then, in the logit model, the probability p_i for a given value of the explanatory variable X can be expressed as

p_i = E(Y_i = 1|X_i) = \frac{1}{1+e^{-Z_i}}, \text{ where } Z_i = \beta_0 + \beta_1 X_i, \text{ or equivalently } p_i = \frac{e^{Z_i}}{1+e^{Z_i}}    (11.28)

Equation (11.28) is known as the cumulative logistic distribution function.2 Equation (11.28) can be written as

1 - p_i = \frac{1}{1+e^{Z_i}}, \qquad \frac{p_i}{1-p_i} = e^{Z_i}    (11.29)

Thus, from equation (11.29), we see that the logit model can be expressed in terms of the odds ratio, which is simply the probability of an event occurring relative to the probability that it will not occur. Then, by taking the natural log of the odds ratio, we produce the logit (L_i) as follows:

L_i = \ln\!\left[\frac{p_i}{1-p_i}\right] = Z_i = \beta_0 + \beta_1 X_i    (11.30)

The above relationship shows that L_i is a linear function of X_i, whereas the probability p_i is not. L_i is called the logit, and model (11.30) is called the logit model.

Estimation of the Logit Models

The logit model is given by

L_i = \beta_0 + \beta_1 X_i + \varepsilon_i, \text{ where } L_i = \ln\!\left[\frac{p_i}{1-p_i}\right]    (11.31)

The logistic model has been used extensively in analysing growth phenomena such as population, GNP, money supply, etc. For theoretical and practical details of logit and probit models, see J.S. Kramer, The Logit Model for Economics, Edward Arnold Publishers, London 1991; and G. S. Maddala, Limited Dependent and Qualitative variables in Econometrics, Cambridge University Press, New York, 1983.


From equation (11.31), it can be said that L_i is a linear function of X_i. To estimate the model, we need information on L and X. If we apply the OLS method to estimate the linear function (11.31), we face some difficulties, shown below. If the probability of the occurrence of a specific event is 1, i.e., if p_i = 1, we have L_i = ln[1/0]; again, if the probability of the occurrence of the event is zero, i.e., if p_i = 0, then we have L_i = ln[0/1]. Both expressions are meaningless. It follows that, if we have data at the micro or individual level, it is not possible to estimate the logit model (11.31) by the OLS method. In this situation, the maximum likelihood (ML) method can be applied; in this section, however, the weighted least-squares (WLS) method is discussed for estimating the parameters of the model. Corresponding to each value X_i (i = 1, 2, ..., m) of the variable X, let there be N_i observations, of which n_i are observations for which the specific event occurs. If p_i is the probability of the occurrence of the specific event, we have

\hat{p}_i = \frac{n_i}{N_i}    (11.32)

Value of X

Ni

ni

X1

N1

n1

X2

N2

n2

. . . . . Xm

. . . . . Nm

. . . . . nm

pˆ i

Lˆ i

pˆ 1

n1 N1

ª pˆ º Lˆ 1 = ln « 1 » ¬1  pˆ 1 ¼

pˆ 2

n2 N2

ª pˆ º Lˆ 2 = ln « 2 » ¬1  pˆ 2 ¼ . . . . . ª pˆ º Lˆ m = ln « m » ¬1  pˆ m ¼

. . . . . pˆ m

nm Nm

If Ni (i = 1, 2,………..,m) corresponding to each value of X i (i = 1, 2,………,m) is fairly large, pˆ i will be a reasonably good estimate of the true pi 3. Using the estimated pˆ i , we can obtain the logit Li as ª pˆ º Lˆ i = ln « i » = ȕˆ 0 +ȕˆ 1X i ¬1  pˆ i ¼

(11.33)

If pˆ i is a reasonably good estimate of the true pi then Lˆ i will be a fairly good estimate of the true logit Li . Thus, we can obtain the data for the dependent variable Li using the grouped or replicated data which are shown in Table 11-4. Then, using the WLS method based on the estimated values of Li , we can estimate the logit model. It can be shown that, if N i is fairly large and if each X i is independently distributed as a binomial variate, then ª º 1 İ i ~N «0, » N p (1 p )  i i i ¼ ¬

3

(11.34)

The probability of an event is the limit of the relative frequency as the sample size becomes infinitely large

Limited Dependent Variable Models

579

Therefore, the disturbance term İ i of the logit model like the LPM is heteroscedastic. Thus, instead of the OLS method, we can apply the WLS method to estimate the logit model. For the logit model, the weight is given by Wi =

1 ı

2 i

N i pi (1-pi ) , where ıˆ i2 =

1 . ˆ N i pi (1-pˆ i )

Therefore, the original logit model will be transformed using the weighting factor Wi , which is given by Wi Li = ȕ 0 Wi +ȕ1X i Wi +İ i Wi L*i = ȕ 0 Wi +ȕ1X*i +u i

(11.35)

where L*i = Li W, X*i = X i Wi , and u i = İ i Wi . Here, u i is the new random error term. The expected value and the variance of u i are given by E(u i ) = Wi E(İ i ) = 0,  i, and Var(u i ) = Wi2 Var(İ i ) = Wi2 ı i2 =

1 2 ı i = 1. ı i2

Since u i ~N > 0, 1@ , we can apply the OLS method to the transformed equation (13.35) which is called the WLS method. The information of L*i can be obtained from the estimated values of Li and Wi which is given by

ˆ L*i = Lˆ i W i

ª Pˆ º ˆ = X N pˆ (1-pˆ ) . ln « i » Ni Pˆi (1-Pˆi ) . Also, the information of X*i can be obtained as X*i = X i W i i i i i ˆ 1-P ¬ i¼

We will now describe different steps that are involved in estimating the logit model using the WLS method: Step One: First, for each value Xi of X, we calculate the probability of the occurrence of the event as pˆ i =

ni , i = 1, 2,.........,m. Ni

ª pˆ º Step 2: Second, we obtain the logit Li for each X i as Lˆ i = ln « i » , i = 1, 2,.........,m. ¬1 pˆ i ¼ Step 3: Since the original logit model Li = ȕ 0 +ȕ1X i +İ i is a heteroscedastic model, to solve the heteroscedastic

problem, we have to transform the logit model by using the weighted factor Wi , where Wi

N i pˆ i (1-pˆ i ) . The

transformed model is given by Wi Li = ȕ 0 Wi +ȕ1X i Wi +İ i Wi L*i = ȕ 0 Wi +ȕ1X*i +u i

(11.36)

where L*i = Transformed or weighted Li = Lˆ i Wi , X*i = Transformed or weighted Xi = Xi Wi , and u i = Transformed random error term. .

The transformed error u i is homoscedastic, keeping in mind that the variance of the original random error term İ i is ı i2 =

1 . N i pi (1-pi )

Step 4: We apply the OLS method to the transformed equation which is called the weighted least squares (WLS) method to estimate the equation. The estimated equation is given by Lˆ*i = ȕˆ 0 Wi +ȕˆ 1X*i

(11.37)

Chapter Eleven

580

The estimated ȕˆ 1 indicates that, for increasing one unit of weighted X i or X*i , the weighted log of the odds ratio in favour of the occurrence of the specific event will be changed by ȕˆ . From the estimated value of ȕˆ , we can obtain 1

1

the value of pi , i.e., the probability of the occurrence of the specific event. Step 5: Finally, we test the hypotheses and estimate confidence intervals of the parameters in the usual OLS framework. But keep in mind that all the conclusions will be valid if the sample size is reasonably large for the normality of the random error terms. Therefore, in case of small sample sizes, we have to interpret the estimated results very carefully. Marginal Effect of the Logit Model

Let us now define the logistic model of the type Prob(Yi =1|X) =

exp(ȕ 0 +ȕ1X1i +....+ȕ j X ji +.....+ȕ k X ki ) 1+exp(ȕ 0 +ȕ1X1i +....+ȕ j X ji +.....+ȕ k X ki )

(11.38)

In most of the applications, the primary goal is to explain the effect of the jth independent variable X j on the response probability Prob ^Yi

1| X` .

Taking partial derivative of equation (11.38) with respect to X j , we have exp(ȕ 0 +ȕ1X1i +....+ȕ j X ji +.....+ȕ k X ki )ȕ j įProb(Yi =1|X) = ȕj 2 įX ji ª¬1+exp(ȕ 0 +ȕ1X1i +....+ȕ j X ji +.....+ȕ k X ki ) º¼ ȕ j Prob(Yi =1|X) >1  Prob(Yi =1|X) @ ȕ j pi >1  pi @

(11.39)

Thus, one unit increase in X j leads to an increase of ȕ j pi >1  pi @ in the response probability. The Change in Probability for a Unit Increase in Xj [Another Approach]

The logit model is given by ln

pi = ȕ 0 +ȕ1X1i +ȕ 2 X 2i +......+ȕ j X ji +........+ȕ k X ki +İ i 1-pi

(11.40)

The partial derivative of equation (11.40) with respect to X j is given by p į ln i = ȕ j įX ji 1-pi 1  pi įp 1 u u i 2 pi įX >1  pi @ ji įpi įX ji

ȕj

ȕ j pi >1  pi @

(11.41)

Thus, the change in probability of the logit model for increasing an additional unit of X j can be estimated by using ȕ j pi >1  pi @ .

Relative Effects of Two Variables in the Response Probability of the Logit Models

For the logit model (11.40), the effect of the jth independent variable on the response probability is given by įProb(Yi =1|X) = ȕ j pi >1  pi @ įX ji

(11.42)

Limited Dependent Variable Models

581

Similarly, the effect of the kth independent variable on the response probability for the logit model is given by įProb(Yi =1|X) = ȕ k pi >1  pi @ įX ki

(11.43)

Thus, for the logit model, the relative effect of the independent variable X j relative to X k on the response probability is given by įProb(Yi =1|X) įX ji įProb(Yi =1|X) įX ki

ȕ j pi >1  pi @ ȕ k pi >1  pi @

ȕj

(11.44)

ȕk

If the relative effect is 1, it indicates that both the variables have the same effect on the response probability. If the relative effect is 2, it indicates that the effect of the variable X j is twice relative to the effect of the variable X k on the response probability. Ex. 11-2: The data on household income ( X i ) in thousand USD, the number of households (N i ) having income X i , and the number of households ( n i ) owning a new car are given in Table 11-5. Table 11-5: Cross-sectional data on household income (X i ) , the number of households (N i ) having income (X i ) ,

and the number of households (n i ) owning a new car. Family Income ( Xi ) (in 000’ USD) 5 8 10 12 15 20 22 25 30 32 35 40 45 50

Ni

ni

50 62 75 85 90 110 75 65 50 45 40 30 25 15

10 15 22 25 32 50 38 42 40 38 35 27 23 14

Estimate the logit model using the WLS method. Solution: The logit model is given by Li = ȕ 0 +ȕ1X i +İ i

(11.45)

ª p º where Li = ln « i » ¬1  pi ¼ ª º 1 For large N i , İ i ~N «0, ». ¬ N i Pi (1  Pi ) ¼

Since the disturbance term İ i of the logit model is heteroscedastic, the transformation of the original logit model is given by

Chapter Eleven

582

(11.46)

Wi Li = ȕ 0 Wi +ȕ1X i Wi +İ i Wi

where Wi =

1 ı

2 i

= N i pi (1  pi ) , and ıˆ i2 =

1 . N i pˆ i (1  pˆ i )

Equation (11.46) can be rewritten as L*i = ȕ 0 Wi +ȕ1X*i +u i

(11.47)

where L*i = Li Wi , X*i = X i Wi , and u i = İ i Wi . The transformed random error term u i is homoscedastic and normally distributed for large N i . Thus, the OLS method can be applied to the transformed model and the test statistic can be applied to test the null hypotheses about the parameters. To estimate the logit model of owning a new car, the necessary information is given in Table 11-6. Table 11-6: Data to estimate the logit model of owning a new car. Xi

Ni

ni

5 8 10 12 15 20 22 25 30 32 35 40 45 50

50 62 75 85 90 110 75 65 50 45 40 30 25 15

10 15 22 25 32 50 38 42 40 38 35 27 23 14

pi =

ni Ni

0.200 0.242 0.293 0.294 0.356 0.455 0.507 0.646 0.800 0.844 0.875 0.900 0.920 0.933

ª p º Li = ln « i » ¬1-pi ¼ -1.386 -1.142 -0.879 -0.875 -0.595 -0.182 0.027 0.602 1.386 1.692 1.946 2.197 2.442 2.639

Wi = N i pi >1-pi @

L*i = Li Wi

X*i = X i Wi

2.828 3.372 3.943 4.201 4.541 5.222 4.330 3.855 2.828 2.431 2.092 1.643 1.356 0.966

-3.921 -3.851 -3.467 -3.678 -2.701 -0.952 0.115 2.321 3.921 4.113 4.070 3.610 3.313 2.550

14.142 26.977 39.429 50.410 68.118 104.447 95.254 96.377 84.853 77.801 73.208 65.727 61.041 48.305

If we apply the OLS method to the transformed equation (11.47) which is called the WLS method, the WLS estimates of the logit model are given in Table 11-7. Table 11-7: The WLS estimates of the logit model

Variable Coeff W -2.1041 * 0.1061 X Usable Observations Degrees of Freedom Centered R2 Adj(R2) Uncentered R2 nR2

Linear Regression - Estimation by Least Squares Dependent Variable LW Std Error T-Stat 0.1258 -16.7301 0.0061 17.4953 14 Mean of Dependent Variable 12 Std Error of Depen. Varia. 0.9623 Standard Error of Estimate 0.9591 Sum of Squared Residuals 0.9628 Log Likelihood 13.480 Durbin-Watson Statistic

Signif 0.0000 0.0000 0.3889 3.3582 0.6788 5.5289 -13.36 0.6499

From the estimated results, it can be said that, for increasing 1 unit of weighted income ($1000), the weighted log of the odds ratio will be increased by 0.1061 unit in favour of owning a new car which is statistically significant at any significance level. Taking the antilog of 0.1061, we have 1.1119 which means that for a unit increase in X*i , the weighted odds in favour of owning a new car will be increased by 1.1119 or about 11.19 percent4.

4

In general, if we take the antilog of the jth slope coefficient, subtract 1 from it, and then multiply the result by 100, we will get the percentage change in the odds for a unit change in the jth regressor of the equation.

Limited Dependent Variable Models

583

Estimated Probability of Owning a New Car

We have Lˆ* Lˆ i = i Wi

ª pˆ º Lˆ* ln « i » = i Wi ¬1  pˆ i ¼ ª Lˆ*i º ª pˆ i º = exp « » « ˆ » ¬1  pi ¼ ¬« Wi ¼»

pˆ i

ª Lˆ* º exp « i » ¬ Wi ¼ ª Lˆ* º 1  exp « i » ¬ Wi ¼

(11.48)

Thus, we can estimate the probability that a family will own a new car corresponding to its income level using equation (11.48) and the estimated probabilities corresponding to different income levels are given below: Table 11-8: The estimated probabilities of having a new car corresponding to different income levels

Income ( X i ) 5 8 10 12 15 20 22 25 30 32 35 40 45 50

Predicted probability ( pˆ i ) 0.172 0.222 0.260 0.303 0.374 0.504 0.557 0.634 0.746 0.784 0.833 0.895 0.935 0.961

Actual probability ( pi ) 0.200 0.242 0.293 0.294 0.356 0.455 0.507 0.646 0.800 0.844 0.875 0.900 0.920 0.933

From Table 11-8, we see the probability of a family having the income of 25 thousand USD will own a new car is 0.634. You can also find the probabilities at other income levels from Table 11-8. The actual and estimated probabilities are shown in Fig. 11-4.

Probability

1.0 0.9 0.8 0.7 0.6 0.5 0.4 0.3 0.2 0.1

Actual Estimated

0

10

20

30

40

50

Fig. 11-4: Actual and estimated probabilities of owning a new car corresponding to different income levels.

The changes in the probability of owning a new car for increasing an additional unit of income of a family can be calculated using the following formula:

Chapter Eleven

584

įpi ˆ = ȕ1pˆ i >1  pˆ i @ įX i

(11.49)

The estimated change in probabilities corresponding to different income levels are given in Table 11-9. Table 11-9: The changes in probabilities corresponding to different income levels įPi įX i 0.0151 0.0183 0.0204 0.0224 0.0249 0.0265 0.0262 0.0246 0.0201 0.0180 0.0147 0.0100 0.0064 0.0040 0.0180

Xi

5 8 10 12 15 20 22 25 30 32 35 40 45 50 Average change in probability

From the estimated results in Table 11-9, we see that, when the income level of a household is 25 thousand USD, then įpi we have = 0.025 . This means that, if the household income increases by 1 unit from the income level 25 thousand įX i USD, the probability of owning a new car will be increased by 0.025. But at the income level of 45 thousand USD, for increasing an addition unit of income, the probability of owning a new car will be increased by 0.006. It is also found that, for one unit increase in household income, on an average, the probability of owning a new car will be increased by 0.018. The changes in probability of owning a new car at various levels of income are graphed in Fig. 11-5. 0.030

C h a n g e in P r o b a b ilit y

0.025 0.020 0.015 0.010 0.005 0.000 0

10

20

30

40

50

Income

Fig. 11-5: The changes in probabilities of owning a new car at various income levels.

Maximum Likelihood (ML) Method to Estimate the Logit Models

Let Yi be a dummy dependent variable which takes the value 1 for the occurrence of a specific event and zero for non-occurrence. For example, ­1, if a country i (i = 1, 2,....,N) won a medal at the Olympic games Yi = ® ¯0, otherwise

Suppose that Yi is a linear function of k independent variables say X1 , X 2 ,......,X k . Let us now define,

Limited Dependent Variable Models

585

Prob (yi =1|x) = G(ȕ 0 +ȕ1 x1 +..........+ȕ k x k )

(11.50)

= G(xȕ)

where x = >1 x1

x 2 ...... x k @ and ȕ = >ȕ 0 ȕ1 ... ... ȕ k @c .

Here, G is a function taking on the values strictly between zero and one, that is, 0 < G(xȕ) < 1, for all real numbers xȕ . In general, model (11.50) is often referred to as an index model because Prob (yi = 1|x) is a function of the vector x only through the index xȕ, where xȕ = ȕ 0 +ȕ1 x1 +.......+ȕ k x k is a scalar quantity. Since 0 < G(xȕ) < 1, which ensures that the estimated response probabilities strictly lie between zero and one, addressing the main worries of using the LPM. Here, G(xȕ) is usually a cumulative density function (cdf), which is monotonically increasing in the index xȕ with Prob (yi = 1|x) o 0, if xȕ o f and Prob (yi = 1|x) o 1, if xȕ o f . It follows that G(xȕ) must be a nonlinear function, and hence, we cannot apply the OLS method to estimate this nonlinear function. Various nonlinear functions for G(xȕ) have been suggested in the literature. One of the most popular and widely used is the logistic distribution (yielding the logit model), and the standard normal distribution (yielding the probit model). In the logit model, G(xȕ) =

exp(xȕ) = ĭ(xȕ) 1+exp(xȕ)

(11.51)

which lies between 0 and 1 for all values of xȕ , where xȕ is a scalar quantity. This is the cumulative distribution function (CDF) for a logistic variable. Due to the non-linear nature of the logit model, the OLS method is not applicable for estimation. We can apply the maximum likelihood method (ML) to estimate the logit model. Let us now discuss the ML method to estimate the logit model. Assuming that we have a random sample of size n. Let us define, ȕˆ ML is the ML estimate of ȕ which gives the optimum likelihood of observing the sample (y1 , y 2 ,........,y n ) conditional on the explanatory variables. We assume that G(xȕ) is the probability of observing yi = 1, conditional on the explanatory variables, i.e., Prob (yi = 1|x) = G(xȕ), Therefore, 1  G(xȕ) is the probability of observing yi = 0, conditional on the explanatory variables, i.e., Prob (yi = 0|x)=1  G(xȕ). Thus, the probability of observing the entire sample, i.e., the likelihood function, is given by (11.52)

L(y|x;ȕ) = – G(x i ȕ)– (1  G(x i ȕ)) in1

in 2

where n1 refers to the sample observations for which yi = 1, and n 2 is the sample observations for which yi = 0. We can also rewrite equation (13.52) as n

y

1 yi

L(y|x;E ) = – > G(x i ȕ) @ i >1  G(x i ȕ) @

(11.53)

i=1

because we get G(xȕ) when yi =1, and 1  G(xȕ) when yi = 0. The log likelihood function is given by logL(y|x;E ) =

n

¦ ^y log >G(x ȕ@  (1  y )log >1  G(x ȕ)@` i

i

i

i

(11.54)

i=1

The MLE of ȕ is the value of ȕ that maximises this log likelihood function. For the logit model, G(xȕ) is the logistic CDF. Then, we obtain the logit log likelihood function given by

Chapter Eleven

586

logL(y|x;ȕ) =

n

¦ ^y log >ĭ(x ȕ)@  (1  y )log >1  ĭ(x ȕ)@` i

i

i

i

i=1

logL(y|x;ȕ) =

n

i=1

logL(y|x;ȕ) =

ª exp(x i ȕ) º

°­

exp(x i ȕ) º °½

ª

¦ ® y log «1+exp(x ȕ) »  (1  y )log «1  1+exp(x ȕ) » ¾ °¯

i

¬

i

i

¼

¬

i

¼ °¿

n

¦ ^y >(x ȕ)  log(1+exp(x ȕ))@  (1  y )log >1+exp(x ȕ)@` i

i

i

i

(11.55)

i

i=1

The maximum likelihood estimates of ȕ 0 , ȕ1 ,.......,ȕ k can be obtained by taking the partial derivatives of logL(y|x;ȕ) with respect to ȕ 0 , ȕ1 ,......, and ȕ k and then equating to zero, i.e., įlogL(y|x;ȕ) ½ = 0° įȕ 0 ° ° įlogL(y|x;ȕ) = 0° įȕ1 ° ° . ¾ ° . ° . ° ° įlogL(y|x;ȕ) = 0° °¿ įȕ k

(11.56)

These are called the (k+1) first-order conditions for MLE. Now, solving these (k+1) equations, we can obtain the MLE of ȕ 0 , ȕ1 ,......., and ȕ k respectively. If the regression equation is of the type (11.57)

yi = ȕ 0 +ȕ1 x i +İ i

Then, the first-order conditions are as logL(y|x;ȕ) =0Ÿ įȕ 0

n

­°

¦ ®y i=1

¯°

i

ª ª exp(ȕ 0 +ȕ1 x i )) º ½° exp(ȕ 0 +ȕ1 x i ) º «1  »  (1  yi ) « »¾ ¬ (1+exp(ȕ 0 +ȕ1 x i )) ¼ ¬ (1+exp(ȕ 0 +ȕ1 x i )) ¼ ¿°

n ­ ª exp(ȕ 0 +ȕ1 x i )) º ½° exp(ȕ 0 +ȕ1 x i ) º logL(y|x;ȕ) ° ª = 0 Ÿ ¦ ® yi «1  »  (1  yi ) « » ¾x i įȕ1 i=1 ¯ ° ¬ (1+exp(ȕ 0 +ȕ1 x i )) ¼ ¬ (1+exp(ȕ 0 +ȕ1 x i )) ¼ ¿°

0

0

(11.58)

(11.59)

The general form of (k+1) first-order conditions can also be derived from equation (11.54) as n

­°

¦ ®y i=1

¯°

i

g(x i ȕ) g(x i ȕ) ½°  (1  yi ) ¾x G(x i ȕ) >1  G(x iȕ)@ ¿° i

0

(11.60)

Typically, it is not possible for us to solve analytically equations (11.58) and (11.59) for ȕ 0 , and ȕ1 or equation (11.60) for ȕ 0 , ȕ1 ,......., and ȕ k . Now, to estimate ȕ 0 , ȕ1 ,......., and ȕ k , we can apply the iterative “trial and error” procedure. Several algorithms can be used to solve these equations to estimate the parameters, but we will not study these here. The most common ones are based on the first, and sometimes, second derivatives of the log likelihood function. Odds-Ratio for the Logit Models

The odds-ratio in a binary response model is defined as Prob(Yi =1|X) 1  Prob(Yi =1|X)

(11.61)

Limited Dependent Variable Models

587

If the value of this ratio is 1, it indicates that both outcomes have equal probabilities. If this ratio is equal to 2, then the outcome for the occurrence of the event Yi =1, is twice more likely than the outcome for the non-occurrence of the event that is Yi =0. The logit model is given by Prob(Yi =1|X i ) =

exp(X ci ȕ) 1+exp(X icȕ)

(11.62)

Thus, we have 1  Prob(Yi =1|X) =

1 1+exp(X ci ȕ)

(11.63)

Hence, for the logit model, the odds-ratio is given by Prob(Yi =1|X) 1  Prob(Yi =1|X)

(11.64)

exp(X ci ȕ)

Taking logarithm in both sides of equation (11.64), we have ª p º ln « i » ¬1  p i ¼

X icȕ , where pi = Prob(Yi =1|X)

ª p º ln « i » = ȕ 0 + ȕ1X1i +..........+ ȕ 2 X ki ¬1  p i ¼

(11.65)

Thus, for the logit model, the log odds-ratio is a linear function of the independent variables. Therefore, we can say that, in the logit model, ȕ j measures the marginal effect of the independent variable X j (j=1, 2,…….,k) on the log odds-ratio while all other independent variables are constant. That is, a unit increase in X j leads to an increase of 100ȕ j % in the odds-ratio.

Ex. 11-3: Using the data given in Ex. 13-1, estimate the logit model with the ML method considering financial inclusion (FINC) as a limited dependent variable. Solution: Let us define the dependent variable financial inclusion (FINC) as FINCi

­1, if a person has access to financial products and services ® ¯0, otherwise

Let pi be the probability if the dependent variable Y takes the value 1. Then, in the logit model, the probability pi for a given value of the explanatory variable X’s can be expressed as pi = E(FINCi =1|AGE i , SCH i , GEN i , ME i , IINVi , WAi ) =

1 1+e-Zi

(11.66)

where Zi = ȕ 0 +ȕ1AGE i +ȕ 2SCH i +ȕ3 GEN i +ȕ 4 ME i +ȕ5 IINVi +ȕ 6 WA i

(11.67)

All the variables are defined previously. Equation (11.67) can also be written as pi =

e Zi 1+e Zi

(11.68)

Chapter Eleven

588

Equation (11.68) is known as the cumulative or logistic distribution function5. It can be written as 1  pi =

1 1+e Zi

pi 1  pi

e Zi

(11.69)

Thus, from equation (11.69), we see that the logit model can be expressed as the odds-ratio, which is simply the probability if a person has access to financial products and services to meet their needs relative to the probability if the person does not. Then, by taking the natural log of the odds- ratio, we produce the logit (Li ) as follows ª p º Li = ln « i » = Zi ¬1  p i ¼

Li = ȕ 0 +ȕ1AGE i +ȕ 2SCH i +ȕ3 GEN i +ȕ 4 ME i +ȕ5 IINVi +ȕ 6 WA i

(11.70)

Using the random error term, equation (11.70) can also be written as Li = ȕ 0 +ȕ1AGE i +ȕ 2SCH i +ȕ3 GEN i +ȕ 4 ME i +ȕ 5 IINVi +ȕ 6 WA i +İ i

(11.71)

The above relationship shows that Li is a linear function with independent variables, but the probability pi is not a linear function with independent variables. RATS is applied to estimate the logit model using the ML method. The estimated results are given below. Table 11-10: The ML estimates of the logit model

Binary Logit - Estimation by Newton-Raphson Convergence in 6 Iterations. Final criteria was 0.0000000 |z| 0.00157 1.55 0.122 0.00442 3.89 0.000 0.05451 -2.70 0.007 0.00398 8.58 0.000 0.03985 2.00 0.045 0.1441 2.76 0.006

[95% Conf. Interval] [-0.00065, 0.00552] [0.00854, 0.02588] [-0.25426, -0.04057] [0.02637, 0.0420] [0.00169, 0.15789] [0.11505, 0.638002]

The marginal effects imply that, on an average, the male gender has a 14.741% lower probability, working abroad has a 39.754% higher probability, and intuitional involvement has a 7.979% higher probability of having access to financial products and services to meet demands. The additional year of schooling is associated with a 1.723% increase in the probability and an additional unit of monthly family expenses is associated with a 3.418% increase in the probability of having access to financial products and services to meet demands. The estimated results show that the average marginal effects of all the variables, except the variable AGE, are statistically significant. Testing Statistical Hypotheses of the Logit Models

The null hypothesis to be tested against an appropriate alternative hypothesis is H 0 : ȕ1 = ȕ 2 =.......= ȕ q = 0

against the alternative hypothesis H1: At least one of them is not zero.

Under the null hypothesis, the likelihood ratio test statistic (LR) is given by LR = 2(logL ur  logL r ) ~ Ȥ q2

(11.72)

where logL ur is the log-likelihood value for the unrestricted model; logL r is the log-likelihood value for the restricted model under the null hypothesis; and q is the number of restrictions.

Chapter Eleven

590

Let the level of significance be 5%. Decision: At 5% level of significance with q degrees of freedom, we find the table value of the chi-square test statistic. If the calculated value of the chi-square test is greater than the table value, we reject the null hypothesis; otherwise, we accept it. Ex. 11-4: Testify whether the variable AGE has a significant effect on financial inclusion (FINC) for the given problem in Ex. 11-3.

Solution: The null hypothesis to be tested is: H 0 : ȕ_AGE

0

against the alternative hypothesis H1 : ȕ_AGE z 0 .

For the given problem in Ex.11-3, we have the log-likelihood value for the unrestricted model logL ur

(11.73)

 250.8211

and for the restricted model, the log-likelihood value is (11.74)

logL r =  252.0006

Putting these values in equation (11.72), we have LR = 2(  250.8211+252.0006)~F12

= 2.359

(11.75)

Decision: At 5% level of significance with 1 degree of freedom, the table value of the test statistic is 3.84. Since the calculated value is smaller than the table value, the null hypothesis will be accepted. Thus, it can be said that the variable AGE has no significant effect on the study variable financial inclusion. Probit Models

The probit model is very similar to the logit model, with the normal CDF used instead of the logistic CDF. In most other respects it follows the logit approach. Let Yi be a dummy dependent variable which takes the value 1 for the occurrence of a specific event and zero for its non- occurrence. For example, ­1, if a family owns a apartment at Dhaka city Yi = ® ¯0, if not

Let Yi be a function of k exogenous or independent variables say X1 , X 2 ,......,X k . If Yi linearly depends on X1 , X 2 ,......, and X k , our regression model for the occurrence of the specific event is given by Yi = ȕ 0 +ȕ1X1i +ȕ 2 X 2i +.......+ȕ k X ki +İ i , i = 1, 2,.......,N

(11.76)

Yi = Xci ȕ+İ i

where X i = >1 X1i

X 2i

...... X ki @c , and ȕ = >ȕ 0

ȕ1 ȕ 2

...... ȕ k @c .

Let pi be the probability for the occurrence of the event. Thus, (1  pi ) is the probability for the non-occurrence of the event. If E(İ i |X i ) = 0 , from equation (11.76), we have E(Yi |X i ) = X ci ȕ

Since Y is a binary dependent variable, we have

(11.77)

Limited Dependent Variable Models

591

E(Yi |X i ) = 1u P(Yi =1|X i )+ 0 u P(Yi =0|X i ) Xci ȕ = 1u pi + 0 u (1  pi )

(11.78)

pi = Xci ȕ

In a linear regression model, we assumed that the errors are normally distributed. But in case of the binary dependent variable, İ i is highly non-normal and we have İi

­1  X ci ȕ, when Yi 1 . ® ¯X ci ȕ, when Yi 0

Thus, instead of a normal distribution, the distribution of errors for a given independent variable has two mass points. Since İ i is a binary variable, the variance of İ i is given by var(İ i )

>1  Xciȕ @ Xciȕ

(11.79)

Since the model is heteroscedastic, biased standard errors lead to biased inference. So, the results of hypothesis tests are possibly wrong. This problem can be solved by designing the binary choice models of the type Prob(Yi =1|X i ) = F(X ci E ) .

where F(.) is a generic function that takes values in [0, 1] i.e., 0 d F(X ci ȕ) d 1, for real X icȕ . In the probit model, F(.) is the standard normal CDF, Prob(Yi =1|X i ) = F(X ci E )  [0, 1] for real X icȕ and its cumulative distribution function is given by Prob(Yi =1|X i ) =

³

Xic E

-f

1

­ 1 ½ exp ® İ 2 ¾ dİ = F(X ci ȕ) 2ʌ ¯ 2 ¿

(11.80)

and is shown below graphically. 1.0

0.4

0.5

0.20

-5.0

-2.5

0.0

2.5

5.0 -5.0

-2.5

0.0

2.5

5.0

PDF of the the Standard Normal Distribution CDF of the Standard Normal Distribution

Fig. 11-7: The standard normal distribution and its cumulative distribution function

But for the logit model, F(.) is the CDF of the logistic distribution of the type Prob(Yi =1|X i ) =

exp(X ci ȕ) 1+exp(X icȕ)

(11.81)

As both models are non-linear, ȕˆ j is not the marginal effect of X j (j = 1, 2,…,k) on Y. The two models will give us almost similar results if the distribution of the sample values of Yi is not too extreme. However, a sample in which the proportion of Yi =1 or the proportion of Yi = 0 is very small will be sensitive to the choice of CDF. Neither of these cases is really amenable to the binary choice model.

Chapter Eleven

592

The Structure of the Probit Models

The probit model can also be derived from a latent variable approach. Let Y* be an unobservable magnitude or latent variable which can be considered the net benefit to individual i of taking a particular course of action (e.g., purchasing a new car, new apartment, etc.) which can be determined by Yi* = ȕ 0 +ȕ1X1i +ȕ 2 X 2i +.........+ȕ k X ki +İ i , i = 1, 2,.......,N Yi* = X ci ȕ+İ i , i = 1, 2,.......,N

where X i = >1 X1i

(11.82) .... X ki @c and ȕ = >ȕ 0

X 2i

ȕ1 . ... ȕ k @c

We cannot observe the net benefit but we can observe the outcome of the individual i that followed the decision rule. The rule is that we observe the variable Yi which is linked to Yi* as given below Yi

* °­1, if Yi ! 0 ® * °¯0, if Yi d 0

(11.83)

For example, we can observe that an individual i (i = 1, 2,,…….,N) did or did not purchase a new car in the year 2023. If the individual consumer did, we observe that Yi 1 and we take this as evidence that a rational consumer made a decision that improved her welfare. If the consumer could not buy a new car in the year 2023, we observe that Yi 0 . We speak of Yi* as a latent variable, linearly related to a set of factors X and a disturbance term İ i . In the latent variable model, we must assume that the disturbance İ i has a known variance ı İ2 . Like all other regression models, we do not have sufficient information in the data to estimate its magnitude. Since we may divide equation (11.82) by any positive ı İ without altering the estimation problem, the most useful strategy is to set up ı İ = ı İ2 =1. To estimate equation (11.82), we assume that the expected value of the error term İ i , given X, is 0, and they are uncorrelated. The distribution of the error term is dependent on the underlying assumption made about F(.) . Based on the assumptions on the distribution functions in the latent variable model framework, we find the probability of an individual making each choice. Using equations (11.82) and (11.83), we have Prob ^Yi

Prob ^Yi* >0|X`

1| X`

= Prob ^İ i >  X ci ȕ|X`

= 1  F(  X ci ȕ)

(11.84)

= F(Xci E )

The function F(.) is a cumulative distribution function (CDF) which maps points on the real line (-f, f) into the probability measure {0, 1} and is shown in Fig. 11-8. For the probit model, F(.) is the CDF of the Standard Normal distribution function which is given by Xic E

Prob(Yi =1|X) =

³ f(İ ,ȝ,ı)dİ i

i

= F(X ci ȕ)

(11.85)

-f

where f(İ i ,ȝ,ı) =

1 2ʌ

e



İi2 2

, İ i is a standard normal variate, i.e., İ i ~N(0, 1),  i.

Since Prob(Yi =1|X) = Pi represents the probability of the occurrence of an event, it is measured by the area of the standard normal probability curve from f to Ii as shown in Fig. 11-8, where I*i d Ii = ȕ 0 +ȕ1X1i +.....+ȕ k X ki = F-1 (pi ) .

Limited Dependent Variable Models

Pi=F(Ii) 1 Pi

0

f

593

Pi=F(Ii) 1 Pi

Prob(I* d Ii )

Ii =Xciȕ

Fig. 11-8(a): Given Ii find Pi

f

f

0

Ii =F-1 (Pi )

f

Fig. 11-.8(b): Given Pi find Ii

Fig. 11-8: CDF of the standard normal distribution

Now, we can obtain the index Ii as well as ȕ 0 , ȕ1 ,....., and ȕ k , by taking the inverse of equation (11.85), i.e., Ii = F-1 (pi ) ȕ 0 +ȕ1X1i +ȕ 2 X 2i +.............+ȕ k X ki = F-1 (pi )

(11.86)

where F-1 is the inverse function of the CDF of the standard normal distribution. The procedure to obtain the value of the index Ii is explained clearly in Fig. 11-8. In panel 11-8(a) of the figure, we obtain the cumulative probability pi of occurring the event given that I*i d Ii , and in panel 11.8(b), we obtain the value of Ii for the given value of pi which is simply the reverse of the former. Using the values of Ii , we can estimate the values of the parameters ȕ 0 , ȕ1 ,....., and ȕ k . Marginal Effect of the Probit Model

We know that the probability of the occurrence of an event is given by Prob(Yi =1|X) = 1  F(  X ci E )

(11.87)

We are very much interested in explaining the effect of the jth independent variable X j , on the response probability Prob ^Yi

1| X` in different problems. Now, taking the partial derivative of equation (11.87) with respect to X j , we

have

G Prob(Yi =1|X) = f(  X ci E )E j G X ji = f(X ci ȕ)ȕ j

(11.88)

ˆ in the response probability of the probit model. Thus, one unit increase in X j , leads to an increase of ȕˆ jf(Xci ȕ)

Relative Effects of Two Variables in the Response Probability of the Probit Model

For the probit model, the effect of the jth independent variable on the response probability is given by įProb(Yi =1|X) = ȕ jf(Xci ȕ) įX ji

(11.89)

Similarly, the effect of the kth independent variable on the response probability is given by įProb(Yi =1|X) = ȕ k f(X ci ȕ) įX ki

(11.90)

Chapter Eleven

594

Thus, for the probit model the relative effect of the two independent variables X j and X k on the response probability is given by įProb(Yi =1|X) įX ji ȕ jf(Xci ȕ) = įProb(Yi =1|X) ȕ k f(Xci ȕ) įX ki ȕj

(11.91)

ȕk

Thus, it can be shown that the relative effect of the two variables X j and X k on the response probability for the probit model is identical to the logit model. If the relative effect is 1, it indicates that both variables have the same effect on the response probability. If it is 0.5, it indicates that the effect of the variable X j is half relative to the effect of the variable X k on the response probability. Alternatively, we can say that the effect of the variable X k is twice relative to the effect of the variable X j on the response probability. Ex. 11-5: For the given problem in Ex. 11-2, fit the probit model. Obtain the marginal effects of the probit model. Solution: Let the binary dependent variable Y be defined as ­1, if a family purchases a new car . Yi = ® ¯0, otherwise

Let pi be the probability that a family owns a new car which indicates the relative frequencies of owning a new car having income level X i (the empirical measure of probability), which is given by pi =

ni Ni

(11.92)

We know, Prob ^Yi =1|X i ` = pi

(11.93)

These pi 's are calculated corresponding to different income levels Xi which are shown in Table 11-12. Since pi 's are known to us, we can obtain the index Ii from the cumulative distribution function (CDF) of the standard normal distribution which are shown in Table 11-12. Table 11-12: The calculated pi 's and Ii 's corresponding to different income levels X i 's

Family Income ( X i ) (in 000’ USD) 5 8 10 12 15 20 22 25 30 32 35 40 45 50

pi =

ni Ni

0.200 0.242 0.293 0.294 0.356 0.455 0.507 0.646 0.800 0.844 0.875 0.900 0.920 0.933

Ii

Z=I+5

-0.84 -0.70 -0.55 -0.54 -0.37 -0.11 0.02 0.38 0.85 1.1 1.15 1.29 1.41 1.50

4.16 4.3 4.45 4.46 4.63 4.89 5.02 5.38 5.85 6.1 6.15 6.29 6.41 6.5

Limited Dependent Variable Models

595

Ii =F-1 (pi )

1 0.646

-f

0

0.38

+f

Fig. 11-9: CDF of normal distribution

We regress the index I on the variable X of the type Ii = ȕ 0 + ȕ1X i +İ i

(11.94)

where İ i is the disturbance term which satisfies all the usual assumptions. We apply the OLS method to run equation (11.94). The results are obtained by using EViews and are presented in Table 11-13. Table 11-13: The OLS estimates of the probit model

Linear Regression - Estimation by Least Squares Dependent Variable I Variable Coeff Std Error T-Stat Constant -1.1540 0.1016 -11.3585 Income 0.0594 0.0036 16.6527 Usable Observations 14 S.E. of regression 12 Degrees of Freedom Sum of squared residuals 0.9585 R2 F-statistic Adjusted R2 0.9551 Prob(F-statistic) Log Likelihood 4.9557 Durbin-Watson Stat Diagnostic Tests Test Result LM test for heteroscedasticity 4.3874 11.5525 LM test for autocorrelation 1.2987 LM test for ARCH Normality test for residauls 0.3478

Signif 0.0000 0.0000 0.1834 0.4038 277.3122 0.0000 0.5236 Prob. 0.0362 0.0031 0.2545 0.8403

From the estimated results, it is found that, for increasing an additional unit of income X, on an average, the index I increases by 0.0594 which is statistically significant. But, from the diagnostic test results, it is found that there are problems of heteroscedasticity and autocorrelation in the data. That is why the model is reestimated using AR terms. The estimated results are given in Table 11-14.

Chapter Eleven

596

Table 11-14: The OLS estimates of the adjusted model

Dependent Variable: L Method: Least Squares Date: 06/02/20 Time: 22:13 Sample (adjusted): 3 14 Included observations: 12 after adjustments Estimation settings: tol= 0.00010 Initial Values: C(1)=-1.15292, C(2)=0.05942, C(3)=0.00250, C(4)=0.00250 Convergence achieved after 5 iterations Variable Coeff Std Error T-Stat Constant -1.1334 0.1595 -7.1063 Income 0.0583 0.0063 9.3136 AR(1) 1.4936 0.2409 6.2010 AR(2) -0.9343 0.2827 -3.3044 R-squared 0.9902 Mean dependent var Adjusted R-squared 0.9865 S.D. dependent var S.E. of regression 0.0922 Akaike info criterion Sum squared resid 0.0681 Schwarz criterion Log likelihood 14.0070 Hannan-Quinn criter. F-statistic 268.2493 Durbin-Watson stat Prob(F-statistic) 0.0000 Diagnostic Tests Test Result Brush Pagan LM test for heteroscedasticity 0.7668 LM test for autocorrelation 3.8509 LM test for ARCH 2.4813 Test for normality of the residuals 0.8226 Xi

pˆ i

5 8 10 12 15 20 22 25 30 32 35 40 45 50 Average marginal effect Standard Error t-Test

-----------0.27376 0.34370 0.33801 0.48639 0.51595 0.60425 0.79384 0.84852 0.89205 0.88835 0.91920 0.94256

Signif 0.0001 0.0000 0.0003 0.0108 0.5108 0.7928 -1.6678 -1.5062 -1.7277 2.6092

Prob. 0.3812 0.1458 0.1152 0.6627 ª įp º Marginal Change in Probability « i » ¬ įX i ¼

-----------0.01941 0.02145 0.02131 0.02324 0.02324 0.02246 0.01662 0.01368 0.01082 0.01108 0.00873 0.00671 0.0165 0.0061 9.4175

From the estimated results of the adjusted model, it is found that, for increasing an additional unit of income X, on an average, the index I will increase by 0.0583 which is statistically significant at any significance level. This indicates that the higher the value of ˆIi , the greater the probability that a family will own a new car. From the diagnostic test results, it is found that there are no problems of heteroscedasticity, autocorrelation, and ARCH. It is also found that there is no problem of normality of the residuals. From the estimated results in Table 11-14, it is found that įpi =0.0225, when the income level of a family is 25 thousand USD. It means that, for increasing an additional unit įX i of family income from 25 thousand USD, the probability of owning a new car will increase by 0.0225. It is also found that, for increasing one unit of family income, on an average, the probability of owning a new car will increase by 0.0165, which is also statistically significant at any significance level. The actual and predicted probabilities of owning a new car at various levels of income are shown in Fig. 11-10.

Limited Dependent Variable Models

597

1.0 0.9 0.8 0.7 0.6 0.5 0.4 0.3 0.2 0.1

4

8

12 16 20 24 28 32 36 40 44 48 52 Income (X) Actual Fitted

Fig. 11-10: The actual and estimated probabilities at various income levels

The changes in probability corresponding to different income levels are shown in Fig. 11-11. .024 .020 .016 .012 .008 .004

4

8 12 16 20 24 28 32 36 40 44 48 52 Marginal change in probability Income (X)

Fig. 11-11: The changes in probabilities at various income levels

From Fig. 11-11, it can be said that the marginal change in probability of owning a new car is a declining function with respect to income. In a probit analysis, the unobservable index Ii is simply known as the normal equivalent deviate (n.e.d.) (in short, normit). Since the normit will be negative whenever pi  0.50, in practice, observation 5 is added with the n.e.d. The result is called the probit and is given by Zi = n.e.d.+5 = Ii +5

(11.95)

The values of Z’s are reported in Table 11-12. The regression results of Z on the variable income (X) are given in Table 11-15.

Chapter Eleven

598

Table 11-15: The OLS estimates of the equation: Zi = Į+ȕX i +İ i

Variable Constant Income R-squared Adjusted R-squared S.E. of regression Sum squared resid Log likelihood F-statistic Prob(F-statistic)

Coeff 3.8460 0.0594

Std Error 0.1016 0.0036 0.9585 0.9551 0.1834 0.4038 4.9558 277.3122 0.0000 Diagnostic Test LM test for heteroscedasticity LM test for autocorrelation LM test for ARCH Normality test for residuals

T-Stat 37.8549 16.6527 Mean dependent var S.D. dependent var Akaike info criterion Schwarz criterion Hannan-Quinn criter. Durbin-Watson stat

Signif 0.0000 0.0000 5.3279 0.8654 -0.4223 -0.3310 -0.4307 0.5236

Test Result

Prob.

4.3874 11.5525 1.2987 0.3478

0.0362 0.0031 0.2545 0.8403

Except the intercept term, the results are identical with those which are obtained based on equation (11.94). Since the constant term 5 is added with the index Ii , then except the intercept term, all the results will be identical. The intercept will be different because Z will be different from I . If the distribution of the sample values of Y is not too extreme, the two models will give us almost similar results. However, a sample in which the proportion Yi = 1, or the proportion Yi = 0, is very small will be sensitive to the choice of CDF. Neither of these cases is really amenable to the binary choice model. Maximum Likelihood Method to Estimate the Probit Models n

Let ^(Yi , X1i , X 2i ,........,X ki )`i=1 be a random sample from the population distribution of Y conditional on X1 , X 2 ,......,X k , i.e., from f(Y|X1 , X 2 ,.....,X k , ȕ) . The likelihood estimation requires the parametric assumption of the functional form f(Y|X1 , X 2 ,.....,X k , ȕ) and we need to know the joint distribution. Let us now consider a binary dependent variable Yi , such that Yi = 1, for having a specific characteristic, and 0 otherwise. Let X i = X1i X 2i ...........X ki be a (k u 1) vector of individual characteristics, pi be the probability when Yi =1|X i =x i ,

and (1  pi ) be the probability when Yi =0|X i =x i . Thus, the conditional probability for the occurrence of a specific event is given by (11.96)

pi = Prob(Yi =1|X i =x i ) = F(x ci ȕ)

where F(x ci ȕ) = ȕ1 x1i +ȕ 2 x 2i +...........+ȕ k x ki , and ȕ = ȕ1 , ȕ 2 ,........,ȕ k c .

F(.) is the cumulative distribution function of the Normal distribution function which is given by

Since Yi

x icȕ

1

­ 1

½ ¾ dİ i = F(x ci ȕ) ¿

(11.97)

­1, with probability pi F(x ci ȕ) ® ¯0, with probability 1  Pi 1  F(x ci ȕ)

(11.98)

Prob(Yi =1|x i ) =



³ exp ®¯ 2 İ

2 i

-f

the conditional likelihood function of the observation Yi , given X i =x i , is given by y

1 yi

Li (ȕ; y|x) = f Yi |Xi (yi |x i ; ȕ) = > F(x ci ȕ) @ i >1  F(x ci ȕ) @

(11.99)

where f Yi |Xi (yi |x i ; ȕ) denotes the conditional probability mass function (pmf) of the observation Yi . The likelihood function for the entire sample is given by

L(y|x;ȕ) = – F(x ci ȕ) – (1  F(x ci ȕ)) in1

i n 2

(11.100)

Limited Dependent Variable Models

599

Where n1 refers to the sample observations for which yi = 1, and n2 to the sample observations for which yi = 0. We can also rewrite equation (11.100) for the entire sample as n

y

1 yi

L(y|x;ȕ) = – > F(x ci ȕ) @ i >1  F(x ci ȕ) @

(11.101)

i=1

because we get F(x ci ȕ), when yi = 1, and (1  F(x icȕ)), when yi = 0 . The log-likelihood function is given by

logL(y|x;E ) =

n1

n2

¦ y log > F(x cE )@  ¦ (1  y )log >1  F(xcE )@ i

i

i=1

i

i

(11.102)

i=1

We observed n1 observations for which Yi 1, and n 2 observations for which Yi 0 where n1 +n 2 = n . Now, the first-order conditions arising from equation (11.102) are non-linear and non-analytic. It is typically not possible for us to solve analytically for ȕ . Instead, to obtain parameter estimates, we rely on some sophisticated iterative “trial and error” techniques. There are many algorithms that can be applied to solve these non-linear functions. The most common ones are based on the first and sometimes the second derivatives of the log-likelihood function. Therefore, we must obtain the ML estimates using the numerical optimisation methods, e.g., the Newton-Raphson method. Fortunately, the log-likelihood functions for logit and probit models are concave, but this is not always the case for other models. The first derivative of the log-likelihood function or score vector for the probit model is given by įlogL(y|x;ȕ) = įȕ

n

ª

¦ «« y i=1

¬

i

f(x ci ȕ) f(x ci ȕ) º  (1  yi ) »x F(x ci ȕ) >1  F(x ciȕ)@ »¼ i

(11.103)

For the probit model pi = F(x ci ȕ) , where pi = Prob(Yi =1|x i ) =

1 2ʌ



­ 1

³ exp ®¯ 2 İ

-f

2 i

½ ¾ dİ i ¿

(11.104)

­ İ2 ½ exp ® i ¾ is the probit p.d.f and the probit c.d.f is given by 2ʌ ¯ 2¿

1

Thus, f(İ i ) =

F(İ) =

x icȕ

1

H

­ 1

³ exp ®¯ 2 v

-f

2

½ ¾ dv ¿

(11.105)

Here, note that, f c(İ) =  İf(İ) and F(-İ) = 1  F(İ) . The second derivative or the probit Hessian is given by n ª f(x cȕ)+(x cȕ)F(x cȕ) f(x cȕ)  (x ci ȕ) >1  F(x ci ȕ) @ º į 2 logL(y|x;ȕ) i i i =  ¦ f(x ci ȕ) « yi +(1  yi ) i » x i x ic (11.106) 2 2 įȕįȕc i=1 «¬ »¼ > F(x ciȕ)@ >1  F(x ciȕ)@

The Newton Raphson Method implies the following recursion:

Chapter Eleven

600 1

ª į 2 logL(y|x;ȕ) º ª įlogL(y|x;ȕ) º ȕˆ m+1 = ȕˆ m  « » « » įȕįȕc įȕ ¼ ȕ=ȕˆ m ¬ ¼ ȕ=ȕˆ n ¬ ª ȕˆ 1 º ª ȕˆ 1 º « » « » «ȕˆ 2 » «ȕˆ 2 » « » « » « . » =« . » «.» «.» « » « » «.» «.» «ˆ » «ˆ » ¬«ȕ k ¼» m+1 ¬«ȕ k ¼» m

ª į 2 logL « « įȕ1įȕ1 « į 2 logL « « įȕ įȕ « 1 2 . « « . « 2 « į logL «¬ įȕ1įȕ k

1

į 2 logL įȕ1įȕ 2

... ...

į 2 logL įȕ 2 įȕ 2

... ...

. .

... ... ... ...

į 2 logL įȕ 2 įȕ k

... ...

į 2 logL º » įȕ1įȕ k » į 2 logL » » įȕ 2 įȕ k » . » » . » » į 2 logL » įȕ k įȕ k »¼ ȕ =ȕˆ i im

ª įlogL º « įȕ » 1 « » « įlogL » « įȕ » 2 » « « . » « » « . » « įlogL » « » ¬« įȕ k ¼» ȕ =ȕˆ i

(11.107)

im

In equation (11.107), ȕˆ im is the mth round estimate of the ith parameter and the Hessian and score vectors are evaluated at this estimate. We know, ȕˆ is asymptotically normally distributed, i.e., ML

ª ª ª į 2 logL(y|x;ȕ) º 1 º º asmp n ȕˆ ML  ȕ  o N «0,  E « « » »» įȕįȕc « « ¬ ¼ ¼» »¼ ¬ ¬





(11.108)

where ȕˆ ML represents the last iteration of the Newton-Raphson procedure. For finite samples, the asymptotic distribution of ȕˆ can be approximated by ML

ª ª ª į 2 logL(y|x;ȕ) º 1 º º »» N «ȕ,  E « « » įȕįȕc « «¬ ¬ ¼ ȕ=ȕˆ ML »¼ »¼ ¬ 1

ª į 2 logL(y|x;ȕ) º ª įlogL(y|x;ȕ) º until ȕˆ m+1  ȕˆ m  Ȝ , where O is a very smaller Thus, we iterate ȕˆ m+1 = ȕˆ m  « » « » įȕįȕc įȕ ¼ ȕ=ȕˆ m ¬ ¼ ȕ=ȕˆ m ¬ number.

Odds-Ratio for the Probit Models

The odds-ratio in a binary response model is defined as pi 1  pi

Prob(Yi =1|X) 1  Prob(Yi =1|X)

(11.109)

If the value of this ratio is 1, it indicates that both outcomes have an equal probability. If this ratio is equal to 2, then the outcome for the occurrence of the event Yi =1, is twice the outcome for the non-occurrence of the event Yi =0 . If this ratio is equal to 1/2, the outcome for the non-occurrence of the event Yi =0, is twice the outcome for the occurrence of the event Yi =1. We know that the probit model is given by Xicȕ

Prob(Yi =1|X) =

³

-f

ª İ2 º exp «  i » dİ i 2ʌ ¬ 2¼

1

F(Xci ȕ)

(11.110)

Thus, we have 1  Prob(Yi =1|X) = 1  F(Xci ȕ)

Hence, the odds-ratio for the probit model is given by pi 1  pi

F(X ci ȕ) 1  F(X ci ȕ)

(11.111)

Limited Dependent Variable Models Xicȕ

ª İ i2 º exp ³ «¬ 2 »¼ dİi -f X cȕ ª İ i2 º 1 i 1 exp ³ «¬ 2 »¼ dİi 2ʌ - f 1 2ʌ

601

(11.112)

Ex. 11-6: Estimate the probit model using the data given in Ex. 11-1 and compare the estimated results with the linear probability and logit models. Solution: To predict the probability of whether a person has access to financial products and services that meet his/her needs, let us define the binary dependent variable FINC as FINCi

­1, if a person has access to financial products and services ® ¯0, otherwise

Let pi be the probability if the dependent variable FINCi takes the value 1, then in the probit model, the probability

pi for given values of the explanatory variables is given by pi = E(FINCi =1|x i ) = x ci ȕ

(11.113)

where xi = (1, AGEi , SCHi , GENi , MEi , IINVi , WAi )c and ȕ = (ȕ 0 , ȕ1 , ȕ 2 , ȕ3 , ȕ 4 , ȕ 5 , ȕ 6 )c . The probit model is estimated by the maximum likelihood (ML) method using RATS. The results are given in Table 11-16. Table 11-16: The ML estimates of the probit model

Binary Probit - Estimation by Newton-Raphson Convergence in 5 Iterations. Final criteria were 0.0000094 |z| 0.0016 1.46 0.144 0.0045 4.00 0.000 0.0560 -2.63 0.008 0.0038 8.81 0.000 0.0402 2.01 0.044 0.1303 2.94 0.003

[95% Conf. Interval] [-0.0008, 0.0054] [0.0091, 0.0266] [-0.2570, -0.03762] [0.0257, 0.0405] [0.0020, 0.1596] [0.1278, 0.6386]

The marginal effects imply that the male gender has a 14.73% lower probability of having access to financial products and services that meet their needs, whereas working abroad has a 38.323% higher probability and institutional involvement has an 8.08% higher probability of having access to financial products and services that meet their needs. The additional year of schooling is associated with a 1.78% increase in the probability and the additional unit of monthly family expenses is associated with a 3.31% increase in the probability of having access to financial products and services that meet their needs. The estimated results show that the average marginal effects of all the variables except the variable AGE are statistically significant. Comparison: Now, to compare the coefficients of the logit model with the coefficients of the probit model, we have to divide the logit coefficients by 1.6. This division produces the coefficients -1.5603, 0.00825, 0.0583, -0.49925, 0.11575, 0.27025, and 1.3463 respectively which are very close to the probit coefficients. The signs of the coefficients of the probit model are similar to the those of the logit and linear probability models. Only the variable GEN has a negative effect. Except for this variable, all other variables have positive effects on the likelihood of having access to financial products and services that meet their needs. Except for the effect of the variable AGE, other variables are statistically significant. The average marginal effects of different variables for both logit and probit models are very closed.

Now, for comparison, the predicted probabilities for both the logit and probit models are presented graphically below.

Limited Dependent Variable Models

603

Predicted Probability

1.00

0.75

0.50

0.25

0.00 50

100

150

200

250

300

350

400

450

Observation LOGITCDF

PROBITCDF

Fig. 11-13: The predicted probabilities (CDF) for both logit and probit models.

It is found that the correlation between the predicted probabilities of the logit and probit model is 0.9998. Thus, they are almost 100% related. Testing Statistical Hypotheses of the Probit Models

The null hypothesis to be tested is H 0 : ȕ1 = ȕ 2 =.......= ȕ q = 0

against the alternative hypothesis H1: At least one of them is not zero.

Under the null hypothesis, the likelihood ratio test statistic (LR) is given by LR = 2(logL ur  logL r ) ~ Ȥ q2

(11.114)

where logL ur is the log-likelihood value for the unrestricted probit model; logL r is the log- likelihood value for the restricted probit model under the null hypothesis; and q is the number of restrictions. Let the level of significance be 5%. Decision: At 5% level of significance with q degrees of freedom, we find the table value of the chi-square test statistic. If the calculated value of the chi-square test is greater than the table value, we reject the null hypothesis; otherwise, we accept it. Ex. 11-7: Test the null hypothesis H 0 : ȕ_AGE LR test.

ȕ_SCH = ȕ_GEN = 0 for the given problem in Ex. 11-6 using the

Solution: The null hypothesis to be tested is H 0 : ȕ_AGE

ȕ_SCH = ȕ_GEN = 0

against the alternative hypothesis H1: At least one of them is not zero.

For the given problem, the log-likelihood value for the unrestricted probit model is logL ur =  251.3652

(11.115)

The log-likelihood value for the restricted model is logL r =  261.323

(11.116)

Putting these values in equation (11.114), we have 2 LR = 2(  251.3652+261.323)~Ȥ 3(d.f)

= 19.9155

(11.117)

Chapter Eleven

604

Decision: At 5% level of significance with 3 degrees of freedom, the table value of the test statistic is 7.8147. Since the calculated value of the test statistic is greater than the table value, the null hypothesis will be rejected. Thus, we can say that the combined effect of the variables AGE, SCH and GEN on the study variable FINC is statistically significant but the effects of the individual variables may not be significant.

11.4 Goodness of Fit of Limited Dependent Variable Models In the case of limited dependent-variable models, there is a problem to use the conventional R2-type for measuring the goodness of fit6. In case of limited dependent variable model, the predicted values yˆ i 's are the probabilities and the actual values yi 's are either 1 or 0. We have

n

n

¦ y ¦ yˆ i

i=1

i

for the linear probability model and also for the logit

i=1

model, as if the constant term is also included in a linear regression model. But in the case of the probit model, the relationship is not exact, but it is approximately valid. Several R2-type measures have been used for measuring the goodness of fit of limited dependent variable models. Some of them are given below: Measures based on Sum of Squares Residuals

We know, for a linear regression model, ª Residual Sum of Squares º R2 = 1 « » ¬ Total Sum of Squares ¼ ª n 2 º « ¦ (yi  yˆ i ) » » 1  « i=1n « » (y y)  i «¬ ¦ »¼ i=1

Effron7 argued that

(11.118)

n

¦ (y

i

 yˆ i ) 2 can be used as a measure of the residual sum of squares in the case of a limited

i=1

dependent-variable model. In the case of a binary dependent variable model, we have n

¦ (y

i

n

¦y

 y)

i=1

2 i

 ny 2

i=1

ªn º = n1  n « 1 » ¬n¼ =

2

n1 n 2 n

(11.119)

where n1 is the number of one’s and n 2 is the number of zero’s , n1 +n 2 = n. Putting this value in equation (11.118), we have R2

1

n n1 n 2

n

¦ (y

i

 yˆ i ) 2

(11.120)

i=1

This is called Effron’s measure of R 2 . Thus, it can be written as Effron's R 2

1

n n1 n 2

n

¦ (y

i

 yˆ i ) 2

(11.121)

i=1

Amemiya8 has suggested that it will be more appropriate to define the residual sum of squares as

6

These are summarized in Maddala, Limited-Dependent, pp. 37-41 B. Effron, ‘Regression and ANOVA with Zero-One Data: Measure of Residual Variation,” Journal of the American Statistical Association, May 1978, pp. 113-121 8 Amemiya, “Qualitative Response Model,” p. 1504 7

Limited Dependent Variable Models n

¦ (y

605

 yˆ i ) 2

i

i=1

(11.122)

yˆ i (1  yˆ i )

Measures based on Likelihood Functions or Log Likelihood Functions

The conventional R-squared statistic is not applicable for measuring the goodness of fit of the binary dependent variable models. Many pseudo-R2 statistics have been proposed based on the likelihood ratios or log likelihood ratios. These pseudo-measures have the property that, when applied to the linear model, they match the interpretation of the linear model R-squared. Here, we discuss some of the pseudo-R-squared measures. Let L UR be the maximum of the likelihood function when maximised with respect to all the parameters and L R be the maximum of the likelihood function under the restriction ȕ1 = ȕ 2 = ........= ȕ k = 0 . Then, we have ªL º R2 = 1 « R » ¬ L UR ¼

2/n

(11.123)

We can use the analogous measure for the logit and probit model as well. We know, for the binary dependent-variable model, the likelihood function attains an absolute maximum of 1. This means that L R d L UR d 1 LR d

LR d1 L UR 2/n

> LR @ > LR @

2/n

2/n

ªL º d« R » ¬ L UR ¼

d1

d 1 R2 d 1

0 d R 2 d 1  > LR @ 2/n

2/n

2/n

> L UR @  > LR @ 0d ª1  > L @ º > L @ R «¬ »¼ UR 2/n

(11.124)

d1

2/n

Hence, Cragg and Uhler9 (1970) have suggested a pseudo- R 2 which lies between 0 and 1 and is given by 2/n

Pseudo-R

2

2/n

> L UR @  > LR @ ª1  > L @ º > L @ R «¬ »¼ UR 2/n

2/n

(11.125)

Another popular pseudo-R-squared measure is a function of the log-likelihood and it is called the McFadden pseudoR2 as given by McFadden pseudo-R 2

1

ln > L UR @ ln > L R @

(11.126)

However, this measure does not correspond to any R 2 measure in the linear regression model. If the slope parameters are all 0, McFadden's-R2 is 0, but it is never 1. It will always be less than 1. This index can also be adjusted to penalise for the number of predictors (k+1) in the model McFadden pseudo-R 2

9

1

ln > L UR @  (k+1) ln > L R @

(11.127)

G. Cragg and R. Uhler. “The Demand for Automobiles,” Canadian Journal of Economics, 1970, pp. 386-406.

Chapter Eleven

606

Maddala (1983) developed another pseudo- R 2 that can be applied to any model estimated by the maximum likelihood method. This popular and widely used measure is expressed as ªL º Maddala-R 2 = 1  « R » ¬ L UR ¼

2/n

(11.128)

where L R is the likelihood function under the null hypothesis in which only the intercept term will be included in the model; L UR is the likelihood function for the full model; and n is the sample size. The likelihood ratio test statistic is given by

O LR L UR

LR L UR

2 ln

e



O

(11.129)

2

Putting the value of

LR in terms of likelihood ratio test statistic, we have L UR

Maddala-R 2 = 1  e  O /n

(11.130)

Maddala proved that Maddala-R2 has an upper bound of 1  [L R ]2/n , and thus, suggested a normed measure based on a general principle of Cragg and Uhler (1970) as given in equation (11.130). Count-R2

Finally, we can measure the goodness of fit in the case of the binary dependent variable models in terms of the proportion of correct prediction. This proportion is also called count-R2. Count-R2 does not approach goodness of fit in a way comparable to any OLS approach. It transforms the continuous predicted probabilities into a binary variable on the same scale as the outcome variable (0-1) and then assesses the predictions as correct or incorrect. Count-R2 treats any record with a predicted probability of 0.5 or greater as having a predicted outcome of 1 and any record with a predicted probability less than 0.5 as having a predicted outcome of 0. Then, the predicted 1’s that match actual 1’s and predicted 0’s that match actual 0’s is tallied. This is the number of records correctly predicted, given that the cutoff point of 0.5. The R-square is the correct count divided by the total count. Let us now define the predicted value yˆ *i which is also a binary variable such that ­1, if yˆ i ! 0.5 yˆ *i = ® ¯0, if yˆ i  0.5

(11.131)

The count-R2 is given by Count-R 2

Number of Correct Predictions Total Number of Observations

(11.132)

Note: The value of R-squared lies between zero and one. If the value of R-squared is one, it indicates that the fit is perfect. Also, note that it assumes that there is an intercept in the model. This may be an actual explicit intercept or an implicit intercept (as when you use a complete set of indicator variables to represent a categorical variable). Ex. 11-8: Different measures of goodness of fit of the logit, probit and linear probability models are estimated using the data given in Ex. 11-1. The estimated values are reported in Table 11-18. Table 11-18: Different R 2 measures for the logit, probit, and linear probability models

Different R 2 measures

ˆ Squared correlation between FINC and FINC 2 Effron’s R Cragg-Uhler’s R2 Mcfadden’s R2 Maddala’s R2

Logit 0.2384 0.2664 0.3403 0.2127 0.2551

Probit 0.2596 0.2363 0.3380 0.2110 0.2534

LPM 0.2404 0.2404 0.3141 0.1896 0.2404

Limited Dependent Variable Models

607

The estimated results in Table 11-18 indicate that there is not much difference between goodness of fit measures in the logit and probit models. Thus, we can choose any one between them for analysis. The estimated results support that both the logit and probit model are better than the linear probability model.

11.5 Censoring and Truncation Censoring and truncation are two different phenomena that cause our samples to be incomplete. These phenomena may arise in different research problems irrespective of any discipline. If we ignore truncation or censoring when analysing data, our estimates of population parameters will be inconsistent. Censoring or truncation happens during the sampling process. Censoring is a statistical process in which observations will be censored, meaning that we only know that they are below (or above) some bounds. In other words, censoring is a condition in which the value of a measurement or observation is only partially known. In censoring, values of the dependent variable in a certain range are all transformed to (or reported at) a single value. Some examples are given below: (i) Bangladesh Bank intervenes if the exchange rate hits the band’s lower limit. (ii) Bangladesh Securities and Exchange Commission intervenes if the price index hits the lower limit. (iii) If we want to study income level, people having income below the poverty line will not be present in the sample. (iv) If we are interested in studying the size of Hilsha fish based on the specimens captured with a net, fish smaller than a specific size say m grams will not be present in our sample. (v) Suppose that, in a study, we may have full demographic information on a set of individuals, but we only observe the number of hours worked per week for those who are employed. The phenomenon of truncation occurs when values beyond a boundary are either excluded when gathered or excluded when analysed. In the case of truncation, the sample is drawn from a subset of the population so that only certain values are included in the sample. We lack observations on both the response variable and explanatory variables. Some examples are given below: (i) Suppose that we have a sample of individuals who have a high school diploma, some college degrees, some university degrees or more university degreess. The sample has been generated by interviewing those who have completed a college degree. This is a truncated sample, relative to the population, in that it excludes all individuals who have not completed a college degree. The characteristics of those excluded individuals are not likely to be the same as those in our sample. (iii) If we are interested in studying the size of Hilsha fish based on the specimens captured with a net, fish smaller than the net grid will not be present in the sample. Censored Data: For part of the range of y, we observe only that y is in that range, rather than observing the exact value of y. Ex. 11-9: Annual income top-coded at TK 60,000 (censored from above). Expenditures or hours worked bunched at 0 (censored from below).

Now, censored from below is explained with the following figure: 6 5 4 3 2 1 0

0

1

2

3

4

Fig. 11-14: Censored from below: If y d 2 we do not know its exact value.

5

6

7

Chapter Eleven

608

PDF(y*)

0.4 0.35

Prob(y*)

0.3

Prob(y*>2)

0.25 0.2 0.15 0.1

Prob(y*”2)

0.05 0

2

y*

-



Fig. 11-15: Censored from below

The pdf of the observable variable, y, is a mixture of discrete (prob. mass at y=2) and continuous (Prob[y*>2]) distributions.

0.4

PDF(y*) Prob(y*)

0.35 0.3 0.25 0.2

Prob(y*”2)

Prob(y*>2)

0.15 0.1 0.05 0 -

2

-

1

y*

Fig. 11-16: Censored from below.

Here, we assign the full probability in the censored region to the censoring point, 2 under censoring. The researcher may not care about (or instruments may not be able to detect) the level of pollutants if it falls below a certain threshold (e.g., 0.005 parts per million). In this case, any pollutant level below .005 ppm is reported as “ $60,000 per year. Sample excludes those individuals that have expenditures of TK 0.

Let us now consider the following terms: y = an observed dependent variable; y*

a latent variable ;

x = a set of k (k•1) explanatory variable(s); l = lower threshold; u = upper threshold. Truncation from below is defined as ­° y* , if y*i > l yi = ® i . °¯not observed, otherwise

Limited Dependent Variable Models

609

Truncation from above is given by ­° y* , if y*i < u yi = ® i . °¯not observed, otherwise

And truncation from between is given by ­° y* , if l < y*i < u yi = ® i . °¯not observed, otherwise

Thus, we can say that y is truncated when we only observe x for observations where Y would not be censored. We do not have a full sample for {y, x}; we exclude observations based on the characteristics of y. If y < 3 , the value of x or y is unknown to us which is called truncation from below. This is highlighted in Fig. 11-17. 14 12 10 8 6 4 2 0 0

1

2

3

4

5

6

7

Fig. 11-17: Truncation from below: If y < 3, we do not know its exact value.

For example, if a family’s income is below a certain level, say y 0 , we have no information about the family’s characteristics. Thus, we exclude those families whose income is less than y 0 . ( 0.4

PDF(Y)

0.35 0.3

Prob[Y>3]

0.25 0.2 0.15 0.1 0.05 0

3

Y

Fig. 11-18: Truncation from below: If y < 3, we do not know its exact value

Under truncation, the pdf of y will be conditional while under censoring, the censored distribution of y is a combination of probability mass function (pmf) and probability density function (pdf). Censored Regression Models

In a censored regression model, the dependent variable is censored, but we can include the censored observations in the regression equation. The dependent variable can be either left-censored, right-censored, or both left-censored and right-censored, where the lower and/or upper limit of the dependent variable can be any number. The censored regression model is called the generalisation of the standard Tobit model. Let us now consider that y is the observed dependent variable; X1 , X 2 ,........,X k are k (k t 1) independent variables; y* is the latent variable; l is the lower limit of the dependent variable y; and u is the upper limit of the dependent variable y. Let us consider the regression equation of the type y*i = ȕ 0 +ȕ1X1i +............+ȕ k X ki +İ i = x ci ȕ+İ i

(11.133)

Chapter Eleven

610

where x i

1

X1i

....... X ki c , and ȕ = >ȕ 0

ȕ1 ... ... ȕ k @c

Here, the subscript i (i = 1, 2,………,N) indicates the ith observation of the corresponding variable; y*i is the ith observation of the unobserved (“latent”) variable y* , x i is a vector of the explanatory variables, ȕ is a vector of unknown parameters, and İ i is a disturbance term. Equation (11.133) is called a censored regression equation if ­l, if y*i d l ° * yi = ® yi , if l < y*i < u ° * ¯u, if yi t u

If l =  f, or u = + f, the dependent variable y is not left-censored or right-censored, respectively. Truncated Regression Models

Truncated regression is different from censored regression. In censored regression, the dependent variable may be censored, but you can include the censored observations in the regression equation. But in truncated regression equations, a subset of observations is dropped. Thus, only the truncated data are available for the regression. If the dependent variable is truncated, we do not observe any information about a certain segment of the population. In other words, we do not have a representative (random) sample from the population. This can happen if a survey targets a sub-group of the population, for instance, when surveying industries, if we exclude the industries with less than 10 employees. Clearly, if we are modelling employment based on such data, we need to recognise the fact that industries with less than 10 employees are not included in our analytical data. Alternatively, it could be that we target poor households, and so, we exclude the household from our analysis with an income level higher than some upper threshold c. Let us now consider the regression equation of the type (11.134)

yi = ȕ 0 +ȕ1X1i +.........+ȕ k X ki +İ i

Here, the disturbance term İ i is normally distributed with mean 0 and variance ı 2 , i.e., İi ~N(0, ı2 ),  i , and uncorrelated with the explanatory variables, i.e., Cov(İi , X ji ) = 0,  i and j. Suppose that all the observations for which yi > c, are excluded from the sample. Then, equation (11.134) is called the truncated regression equation.

Theoretical Consideration of the Truncated Models

In a truncated model, we do not observe the data beyond a limit. If we just use the OLS method with this truncated data, our estimators will be biased as we now see. Let us now consider a latent random variable y* which linearly depends on k (k t 1) independent variable(s) of the type y*i = ȕ 0 +ȕ1X1i +............+ȕ k X ki +İ i (11.135)

= x ci ȕ+İ i

where x i

1

X1i

....... X ki c , and ȕ = >ȕ 0

ȕ1 ... ... ȕ k @c .

Here, the subscript i (i = 1, 2,………,N) indicates that the ith observation of the corresponding variable, x i is a vector of explanatory variables; ȕ is a vector of unknown parameters; and İ i is the disturbance term. The random error term İ i is independently, identically, and normally distributed with mean 0 and variance ı 2 , i.e., İ i ~IIN(0, ı 2 ),  i, and Cov(İ i , X ji ) = 0,  i and j. The distribution of y*i given x i is, therefore, also normal, i.e., y*i | xi ~ IIN(xciȕ, ı2 ). The

expected value of y*i is x ci ȕ i.e., E(y*i | x i )

x ci ȕ.

Suppose that the observed dependent variable y is truncated from below at zero and we want to estimate the model yi = x ci ȕ+İ i

(11.136)

Limited Dependent Variable Models

611

If we estimate model (11.136) using the truncated data, the OLS estimator ȕˆ will be biased. Visually, the threshold to the right of L is (l  x icȕ) . If we only use this part in data, our estimates will ignore the Lpart of the distribution [Fig. 11-19]. Thus, to ignore this part, the OLS estimator ȕˆ in the truncated data will be biased. Thus, for unbiasedness of the distribution, we need to add part L of the distribution. This can also be seen using the conditional expectation underlying the regression equation E(y*i |x i , y*i > 0) = x ci ȕ + E(İ i |x i , y*i > 0) = x ci ȕ + E(İ i | İ i > l  x ci ȕ)

(11.137)

= x ci ȕ + D f( l  x ci ȕ)

Given that (11.137) is the conditional expectation, the correct model for estimation should have been (11.138)

yi = x ci ȕ+Įf( l-x ci ȕ)+İ i

If we just use the model of the type (11.139)

yi = x ci ȕ+İ i

for estimation, we are omitting the term D f( l  x ci ȕ) from the estimated model. Thus, the OLS estimator ȕˆ of ȕ will be biased. Instead, we should use the correct model, including the term D f( l  x ci ȕ) . However, we cannot estimate this term using the OLS method because f( l  x ci ȕ) is most likely non-linear on ȕ . Instead of the OLS estimator, we can use the maximum likelihood estimator. (

0.4

0.35

0.3

0.25

0.2

0.15

L part

R part

0.1

0.05

0

… …….. ………. -1

Fig. 11-19: Truncation from below

We need a density (pdf) that integrates to 1 across the support of y. To get the pdf to integrate to 1, we will scale the pdf by the probability that an observation falls in the observed region. We will weigh each observation that falls in the R part shown in Fig. 11-19, by the probability that an observation will be in the R part instead of the L part. We will do this by dividing the probability density function (pdf) by the cumulative density function (cdf) of the observed part of the distribution shown below. f(İ i ) Prob(İ i >l)

f(İ i |İ i >l) =

f(İ i ) , [by symmetry] 1  Prob(İ i l) dH i =

³

f

l

f(İ i ) dİ i 1-F(l)

f 1 f(İ i )dH i ³ 1  F(l) l

=

F(f)  F(l) 1  F(l)

=

1  F(l) 1  F(l)

1

(11.141)

Thus, it is seen that this density integrates to 1. Estimation of the Truncated Models

Let us consider the regression model of the type yi = x ci ȕ+İ i , where İ i ~IIN(0, ı 2 ),  i

(11.142)

Here, we discuss the technique to estimate model (11.142) in such a situation in which the sample is limited in some manner based upon the values of the dependent variable y. We also discuss the technique using the simplest type of samples with the limited dependent variable that is truncated. In a truncated sample, an observation is left out of the observable sample if the value of y does not meet some criteria. Let the variable y be truncated below at l that is only observations for which yi t l are in the sample. Since İ i ~IIN(0, ı 2 ) Ÿ

İi ~IIN(0, 1) ı

We can write f(İ i |İ i >l) =

f(İ i ) Prob(İ i >l)

(11.143)

where lº ªİ Prob(İ i > l) = Prob « i ! » ¬ ı ı¼ §l· 1 F¨ ¸ ©ı¹ § l· F¨  ¸ © ı¹

(13.144)

We know, f(İ i )

1 2ʌı 2

exp 

1 İ i2 2 ı2

1 1 1 ªİ º exp  « i » ı 2ʌ 2¬ı¼

2

1 § İi · f V ¨© ı ¸¹

Thus, equation (11.143) can also be written as

(11.145)

Limited Dependent Variable Models

613

1 § İi · f ı ¨© ı ¸¹ f(İ i |İ i >l) = § l· F¨  ¸ © ı¹

Thus, the likelihood function for the ith sample observation is given by 1 § yi  x ci ȕ · f ı ¨© ı ¸¹ Li ȕ, ı = § x cȕ  l · F¨ i ¸ © ı ¹ 1 § yi  x ci ȕ · f ı ¨© ı ¸¹ , if ı z 1, and l z 0 § l  x ci ȕ · 1 F¨ ¸ © ı ¹

=

1 § yi  x icȕ · ¸ f¨ ¸ ı ¨© ı ¹

=

§ x cȕ · 1 F¨  i ¸ ¨ ı ¸ © ¹



f yi  x icȕ



F x icȕ

, if ı z 1, and l = 0

, if ı

1, and l = 0

(11.146)

Thus, the likelihood function for the observed sample is given by n

L (ȕ, ı) = – Li ȕ, ı i=1

ª 1 § y  x cȕ · º i « f¨ i ¸» ¸» n «ı ¨ ı © ¹ = –« » § x cȕ  l · » i=1 « i « F ¨¨ ı ¸¸ » ¹ ¼» ¬« ©

(11.147)

The log-likelihood function is given by ª 1 § y  x cȕ · º i « f¨ i ¸» ¸» n ı « ı ¨© ¹ log(L(ȕ, ı) = ¦ log « » § · i=1 « x icȕ  l » « F ¨¨ ı ¸¸ » ¹ ¼» ¬« ©

n 1 C  log(ı 2 )  2 2 2ı

n

¦ ª¬ y i=1

i

n 2 § x cȕ  l · ¸  x icȕ º  ¦ logF ¨ i ¼ ¨ ı ¸ i=1 © ¹

(11.148)

where C is a constant term which is free from the parameters to be estimated. Equation (11.148) can be slightly changed for parameterisation due to Olsen (1978), instead of (ȕ, ı) , change to (Ȗ, h) = (ȕ/ı, 1/ı) . This function will be a well-behaved function. Then, the log-likelihood function is given by

Chapter Eleven

614 n

2

n



log(L(Ȗ, h) = C + nlog(h)  ¦ ª hyi  x ic Ȗ º  ¦ logF x ic Ȗ  hl ¬ ¼ i=1 i=1



(11.149)

The maximum likelihood estimates of Ȗ 0 , Ȗ1 ,.....,Ȗ k , and h (where Ȗ i = ȕ i /ı, i= 1, 2,........,k ) can be obtained by taking the partial derivatives of logL(Ȗ, h) with respect to Ȗ 0 , Ȗ1 ,.......,Ȗ k , and h , and then, equating to zero, i.e., įlogL(Ȗ, h) ½ =0 ° įȖ 0 ° įlogL(Ȗ, h) ° =0 ° įȖ1 ° ° . ° . ¾ ° . ° įlogL(Ȗ, h) ° =0 ° įȖ k ° įlogL(Ȗ, h) ° =0 ° įh °¿

(11.150)

These are called the (k+2) first-order conditions for MLE. Solving these (k+2) equations, we can obtain the MLE of Ȗ 0 , Ȗ1 ,.......,Ȗ k , and h respectively. The first-order conditions arising from equation (11.150) are nonlinear and nonanalytic. It is typically not possible for us to solve analytically. Instead, to obtain the parameter estimates, we rely on some sophisticated iterative “trial and error” techniques. Several algorithms can be used to solve these (k+2) equations to estimate the parameters but we will not study these here. The most common ones are based on the first and sometimes second derivatives of the log likelihood function. Therefore, we need to obtain the ML estimates using the numerical optimisation methods, e.g., the Newton-Raphson method. The Tobit Model

Nobel Laureate and economist James Tobin10 developed the Tobit model in 1958 as an extension of the logit and probit models. The Tobit model is discussed below. Let us consider a latent variable regression model of the type y*i = ȕ 0 +ȕ1X1i +.........+ȕ k X ki +İ i

(11.151)

= x ci ȕ+İ i

where y* is a latent random variable which linearly depends on k (k t 1) independent variable(s), x i = (1 X1i ......X ki )c, and ȕ = >ȕ 0

ȕ1 ... ... ȕ k @c .

Here, the subscript i (i = 1, 2,………,N) indicates the ith observation of the corresponding variable; x i is a vector of explanatory variables; ȕ is a vector of unknown parameters; and İ i is a disturbance term. The random error term İ i is independently, identically and normally distributed with mean 0 and variance ı 2 , i.e., İ i ~IIN(0, ı 2 ),  i, and Cov(İ i , X ji ) = 0,  i and j . The distribution of y*i , given x i , is also normal, i.e., y*i | x i ~ N(x ci ȕ, ı 2 ). The expected value of y*i is x icȕ , i.e., E(y*i |x i ) = x ci ȕ. Let the latent variable y*i be observed if y*i > 0, and not observed if y*i d 0. Then, the observed yi is defined as ­° y* , if y*i > 0 yi = ® i * °¯0, if yi d 0

(11.152)

This model is known as the Tobit model. It is also known as a censored normal regression model because some observations on y*i (those for which y*i d 0 ) are censored.

10

J. Tobin, “Estimation of Relationships for Limited Dependent Variables,” Econometrica, Vol. 26, 1958, pp. 24-36

Limited Dependent Variable Models

615

Ex. 11-11: Suppose that we are interested in finding the effect of income on expenditure for buying an apartment in Dhaka city. Let the variable y* indicate the expenditure for buying an apartment in Dhaka city and X denote the income of a consumer. Let us consider a regression model of the type y*i = ȕ 0 +ȕ1X i +İ i , where İ i ~IIN(0, ı 2 ),  i.

(11.153)

The problem is that, if a consumer does not buy an apartment in Dhaka city, we have no information on housing expenditures for such consumers. We have information only on those consumers who buy an apartment. Thus, the consumers are divided into two groups, group one consisting of n1 consumers about whom we have information on income as well as the amount of expenditure on housing, and group two consisting of n 2 consumers about whom we have information only income but no information on housing expenditures. A sample in which information on the dependent variable is available only for some observations is known as the censored regression model. It is called a limited dependent-variable model because of the restrictions put on the values of the dependent variable. To find the impact of income on expenditure, we use the Tobin model. The model is specified as ­°ȕ 0 +ȕ1X i +İ i , if y*i > 0 yi = ® if y*i = 0 °¯0,

(11.154)

If we apply the OLS method to the equation based on n1 observations, the OLS estimators will be biased and inconsistent. Estimation of a Tobit Model

Let the latent variable y* linearly depend on k (k t 1) independent variable(s) and let us now consider the latent variable regression model of the type y*i = x ci ȕ+İ i

(11.155)

where x i = 1 X1i

..... X ki c , ȕ = >ȕ 0

ȕ1 ... ... ȕ k @c , and İ i ~IIN(0, ı 2 ),  i.

Thus, the linear conditional expectation of the latent variable y*i is given by E(y*i | x i )

x ci ȕ

(11.156)

Let the latent variable y*i be observed if y*i > 0 and is not observed if y*i d 0 . Then, the observed yi is defined as ­° y* , if y*i > 0 yi = ® i * °¯0, if yi d 0

(11.157)

Now, Prob(yi =0|x i ) = Prob(y*i d 0|x i ) = Prob(x ci ȕ + İ i d 0|x i ) = Prob( İ i d  x ci ȕ |x i )

 x cȕ ªİ º = Prob « i d i | x i » ı ¬ı ¼ §  x cȕ · F¨ i ¸ © ı ¹ § x cȕ · 1 F¨ i ¸ © ı ¹

The conditional density of yi is given by

(11.158)

Chapter Eleven

616

1 ª yi  x ci ȕ º f ı «¬ ı »¼

f(yi | x i )

(11.159)

Thus, the likelihood function is given by L(ȕ, ı 2 )

n1

ª

i=1 yi 0

¬

n § x ci ȕ · º 2 ª 1 ª yi  x ci ȕ º º f – ı ¸¹ »¼ i=1 «¬ ı «¬ ı »¼ »¼

– «1  F ¨©

(11.160)

yi ! 0

The log-likelihood function is given by lnL(ȕ, ı 2 ) =

n1

ª

i=1 yi 0

¬

n2 ª 1 ª y  x ci ȕ º º § x ci ȕ · º  ¦ ln « f « i » ¸ ı ¹ ¼ 1=1 ¬ ı ¬ ı »¼ »¼

¦ ln «1  F ¨©

yi ! 0

n1

n2 ª ª ª yi  x ci ȕ º º § x icȕ · º ln 1 F   ¦ « ¨ ı ¸ » ¦ ln «f « ı »  ln ı » © ¹ ¼ 1=1 ¬ ¬ ¼ i=1 ¬ ¼

lnL(ȕ, ı 2 ) =

yi 0

(11.161)

yi ! 0

The first-order conditions for maximising the log likelihood function are given by ­ § x cȕ · ½ ıf ¨ i ¸ ° n 2 ° 1 ° 1 © ı ¹ x ° 0Ÿ¦ 2 ® ¦ ^ yi  x ciȕ x i ` 0 i¾ § x icȕ · ° i=1 ı 2 i=1 ı ° 1 F¨ ¸ °¯ © ı ¹ °¿

(11.162)

­ § x cȕ · ½ x i ȕf ¨ i ¸ ° n 2 ª 2 ° 1 ° © ı ¹ °  « yi  x ci ȕ  1 º» 0Ÿ¦ 3 ® ¾ ¦ 2ı 4 2ı 2 »¼ § x cȕ · i=1 2ı ° 1  F ¨ i ¸ ° i=1 ¬« °¯ © ı ¹ °¿

(11.163)

n1

2

įlnL(ȕ, ı ) įȕ

n1

2

įlnL(ȕ, ı ) įı 2

Multiplying equation (11.162) by

ıˆ 2 =

1 n2

n2

¦ y

i

 x ci ȕˆ

i=1



0

ȕc and then adding with equation (11.163), we have 2ı 2

2

(11.164)

Thus, it can be said that MLE of V 2 is based on only positive observations in the sample data. Now, the MLE of ȕ = (ȕ 0 , ȕ1 ,.......,ȕ k )c, can be obtained by solving equation (11.162) respectively. But it is not possible for us to solve it analytically. Instead, to obtain the parameter estimates, we rely on some sophisticated and iterative “trial and error” techniques. Several algorithm techniques can be applied to solve these (k+1) equations to estimate the parameters, but the most popular and widely applicable technique is the Newton-Raphson method. Marginal Effects

Let us consider the latent variable regression model of the type y*i = x ci ȕ+İ i , where İ i ~IIN(0, ı 2 ),  i.

(11.165)

Thus, the conditional expectation of the latent variable y*i is given by E(y*i | x i )

x ci ȕ

Let the latent variable y*i be observed if y*i > 0, and is not observed if y*i d 0 . Then, the observed yi be defined as ­° y* , if y*i > 0 yi = ® i * °¯0, if yi d 0

(11.166)

Limited Dependent Variable Models

617

Marginal Effect on the Latent Variable

The marginal effect of the independent variable X j , (j =1, 2,.....,k) on the latent variable y* is given by wE(y*i |x i ) wX j,i

(11.167)

ȕj

Marginal Effect on Positive Values of Y

We can write that E(yi |x i , yi > 0)

(11.168)

x ci ȕ+E(İ i |İ i !  x ci ȕ)

We know that, if z is normally distributed with mean zero and variance 1, then E(z|z > c) =

f(c) 1  F(c)

Since İ i ~N(0, ı 2 ) Ÿ

(11.169) İi ~N(0, 1) ı

Now from equation (11.168), we can write that E(yi |x i , yi > 0)

 x cȕ º ªİ İ x ci ȕ  V E « i | i ! i » ı ı ı ¼ ¬ §  x cE · f¨ i ¸ ı ¹ x ci ȕ  ı © [From equation (11.169)]  § x cE · 1 F¨ i ¸ © ı ¹ § x cȕ · f¨ i ¸ ı ¹ x ci ȕ  ı © >Since f(-z) = f(z) and 1-F(-z) = F(z)@ § x icȕ · F¨ ¸ © ı ¹

Let us define the inverse Mills ratio as O (c) =

(11.170)

f(c) . Then, equation (11.170) can be written as F(c)

§ x cȕ · E(yi |x i , yi >0) = x ci ȕ+ıȜ ¨ i ¸ © ı ¹ Equation

(11.171)

is

called

(11.171) the

conditional

expectation

of

the

variable

y,

given

The partial derivative of E(yi |x i , yi >0) with respect to X j , (j =1, 2,.......,k) is given by

wE(yi |x i , yi >0) wX j, i

§ x cȕ · § x cȕ · wO ¨ i ¸ w ¨ i ¸ ı ¹ © ı ¹ ȕ j +ı © § x ci ȕ · wX j, i w¨ ¸ © ı ¹ § x cȕ · wO ¨ i ¸ ı ¹ ȕ j +ȕ j © § x cȕ · w¨ i ¸ © ı ¹

By using the quotient rule of differentiation, we can write that

(11.172)

that

y > 0.

Chapter Eleven

618

wO (c ) wc

F(c)f c(c)  f(c)Fc(c)

(11.173)

2

> F(c)@

But Fc(c) = f(c), by definition. Using the definition of normal density function, we have f c(c) equation (11.173) can be written as wO (c ) wc

 cf(c) . Therefore,

cF(c)f(c)  f(c)f(c) 2

> F(c)@

2

cȜ(c)  > Ȝ(c) @

 O (c ) > c  O (c ) @

Equation (11.172) can be written by replacing c by wE(yi |x i , yi >0) wX j, i

(11.174) x ci ȕ ı

§ x cȕ · ª§ x cȕ · § x cȕ · º ȕ j  ȕ j Ȝ ¨ i ¸ «¨ i ¸  Ȝ ¨ i ¸ » © ı ¹ ¬© ı ¹ © ı ¹¼ ­° § x cȕ · ª§ x cȕ · § x cȕ · º ½° ȕ j ®1  Ȝ ¨ i ¸ «¨ i ¸ +Ȝ ¨ i ¸ » ¾ © ı ¹ ¬© ı ¹ © ı ¹ ¼ ¿° ¯°

(11.175)

The expression within the brackets will be between 0 and 1, so the effect of X j on the conditional expectation of Y, given that Y is positive, is of the same sign as ȕ j but with a smaller magnitude. Testing ȕ j = 0, is a valid test for the partial effect being zero. Marginal Effect on the Actual Variable Y

The unconditional (on Y > 0) expectation but conditional on X is given by E(yi |x i ) = 0. Prob(yi = 0|x i ) + E(yi |x i , yi > 0) Prob(yi >0|x i ) = E(yi |x i , yi >0) Prob(yi > 0|x i ) ª § x cȕ · º f ¨ i ¸» «  x cȕ º ı ¹» ªİ = « x icȕ+ı © Prob « i ! i » ı ¼ § x cȕ · » « ¬ı F¨ i ¸ » « ı © ¹¼ ¬ ª § x cȕ · º f ¨ i ¸» « ı ¹»ª § -x cȕ · º = « x ci ȕ+ı © 1  F ¨ i ¸» « § x cȕ · » « © ı ¹¼ F¨ i ¸ » ¬ « ı © ¹¼ ¬

ª § x cȕ · º f ¨ i ¸» « x cȕ ı ¹ » F ¨§ i ¸· = « x ciȕ+ı © c x ȕ ı ¹ § · « F ¨ i ¸ »» © « ı © ¹¼ ¬ § x cȕ · § x cȕ · = F ¨ i ¸ x ci ȕ+ıf ¨ i ¸ ı © ¹ © ı ¹ The partial derivative of E(yi |x i ) with respect to X j , (j =1, 2,.......,k) is given by

(11.176)

Limited Dependent Variable Models

wE(yi |x i ) wX j, i

§ x cȕ · ȕ j F ¨ i ¸  x icȕ © ı ¹

619

§ x cȕ · § x cȕ · § x cȕ · § x cȕ · wF ¨ i ¸ w ¨ i ¸ wf ¨ i ¸ w ¨ i ¸ ı ı © ¹ © ¹ ı © ı ¹ © ı ¹ § x cȕ · wX j, i § x cȕ · wX j, i w¨ i ¸ w¨ i ¸ ı © ¹ © ı ¹

§ x cȕ · § x cȕ · ȕ j § x cȕ · § x cȕ · ȕ j ȕ j F ¨ i ¸  x ci ȕf ¨ i ¸  ı ¨  i ¸ f ¨ i ¸ © ı ¹ © ı ¹ı © ı ¹ © ı ¹ı § x cȕ · § x cȕ · ȕ j § x cȕ · ȕ j ȕ j F ¨ i ¸ +x icȕ f ¨ i ¸  x ci ȕ f ¨ i ¸ © ı ¹ © ı ¹ı © ı ¹ı § x cȕ · = ȕ jF ¨ i ¸ © ı ¹

(11.177)

Marginal Effect on the Probability that an Observation is Uncensored  x cȕ º ªİ Prob(yi >0|x i ) = Prob « i ! i » ı ı ¼ ¬ ª §  x icȕ · º «1  F ¨ ı ¸ » © ¹¼ ¬

§ x cȕ · = F¨ i ¸ © ı ¹

(11.178)

The partial derivative with respect to X j is given by

wProb(yi >0|x i ) wX j, i

§ x cȕ · § x cȕ · wF ¨ i ¸ w ¨ i ¸ © ı ¹ © ı ¹ § x cȕ · wX j, i w¨ i ¸ © ı ¹ ȕ j § x icȕ · f ı ¨© ı ¸¹

(11.179)

Ex. 11-12: The Tobit model is estimated for the banking sector of Bangladesh considering net profitability (PRO) as the dependent variable and green banking activities (GB, in million taka), loans and advances (LOAN, in million taka), deposits and other accounts (DEPO, in million taka), paid-up capital (PAID, in million taka), and investments (INVEST, in million taka) as the independent variables using the data for the period 2021. To estimate the Tobit model, the following latent variable regression equation is considered: PRO*i = ȕ 0 +ȕ1GBi +ȕ 2 INVTi +ȕ3 LOAN i +ȕ 4 DEPOi +ȕ 5 PAIDi +İ i

(11.180)

where PRO*i is the latent variable which indicates the net profit; GBi is the green banking activities; INVTi is the investment; LOAN i indicates the loans and advances; DEPOi is the deposits and other accounts; and PAIDi is the paid-up capital of the ith bank (i =1, 2,……,n). İ i is the random error term corresponding to the ith set of observations where İ i ~IIN(0, ı 2 ),  i. Let the latent variable PRO*i be observed if PRO*i > 0 and not observed if PRO*i d 0 . Then, the observed PROi is defined as ­° x ci ȕ+İ i , if PRO*i > 0 PROi = ® if PRO*i d 0 °¯0,

where xi

>1

GBi INVTi LOANi DEPOi PAIDi @c and ȕ = >ȕ 0

(11.181)

ȕ1

ȕ2

ȕ3

ȕ4

ȕ 5 @c .

Chapter Eleven

620

The ML method is applied to estimate model (11.181). The results are obtained using STATA for both the truncated and Tobit models. The estimated results are given in Table 11-19. Table 11-19: The ML estimates of the Tobit model ML estimates of Tobit Model Std. Err. t-Test Prob. 95% Confidence Interval [263.6276, 2031.541] 0.012 2.62 437.3695 [0.0561, 0.1533] 0.000 4.36 0.0240 [-0.0397, 0.0386] 0.977 -0.03 0.0194 [-0.0151, 0.0607] 0.230 1.22 0.0188 [-0.0440, 0.0189] 0.425 -0.81 0.0156 [-0.3746, 0.0182] 0.074 -1.83 0.0972 147.3535 [977.3047, 1572.93] 45 Obs. Summary 6 left-censored observations at pro0) Limit: lower = -inf , upper = +inf log likelihood = -327.61288 Number of obs = 39 log likelihood = -327.37417 Log likelihood = -327.3553 log likelihood = -327.35537 Wald chi2(5) = 39.71 log likelihood = -327.35532 Prob > chi2 = 0.0000 log likelihood = -327.35532 Coeff Std. Err. t-Test Prob. 95% Confidence Interval [-464.5618, 1390.455] 0.317 1.02 455.8867 462.9466 [0.0103, 0.1146] 0.020 2.44 0.0257 0.0625 [-0.0312, 0.0506] 0.635 0.48 0.0201 0.0096 [-0.0507, 0.0473] 0.944 -0.07 0.0241 -0.0017 [-0.0294, 0.0529] 0.565 0.58 0.0202 0.0117 [-0.3024, 0.0899] 0.279 -1.10 0.0964 -0.1062

Variable Constant GB INVT LOAN DEPO PAID SIGMA Number of obs Pseudo-R2 LR chi2(5) Prob(chi2) Log Likelihood Iteration 0: Iteration 1: Iteration 2: Iteration 3: Iteration 4: Variable Constant GB INVEST LOAN DEPO PAID

Coeff 1147.584 0.1047 -0.00055 0.0228 -0.0125 -0.1782 1275.117

Source: Data are collected from different banks’ annual reports of 2021 and financial statements of 2021.

The Tobit model reports the ȕ coefficients for the latent regression model. The marginal effect of X j (j =1, 2, 3, 4, 5) on PRO is simply the value of ȕ j because E(PROi |x) is a linear function in x. Thus, it can be said that, for a 1-unit increase in GB, the net profit will increase by 0.1047 unit, which is statistically significant; for one unit increase in INVT, the net profit will decline by 0.0055 units, which is not statistically significant; an additional unit of increase in LOAN leads to the increase in net profit by 0.0228 unit, which is not statistically significant; one unit increase in DEPO will lower the net profit by 0.0125 unit, which is not statistically significant; and for one unit increase in PAID, the net profit will decline by 0.1782 unit, which is statistically significant at 10% level. The results of the Tobit model are not consistent with the results of the truncated model. The average marginal effects on the expected value of the censored outcome are estimated with respect to independent variables as given below. Table 11-20: The marginal effects of the Tobit model

Variable GB INVT LOAN DEPO PAID

dy/dx 0.1047 -0.00056 0.0228 -0.0025 -0.1782

The marginal effects on the latent variable Marginal effects: Latent variable Marginal effects after Tobit y = Linear prediction (predict) = 1176.6609 Std. Err. z P>|z| 0.000 4.36 0.00404 0.977 -0.03 0.01936 0.223 1.22 0.0876 0.420 -0.81 0.01557 0.067 -1.85 0.09719

The marginal effects on the unconditional expected value of PRO Average marginal effects Number of obs = 45

[ 95% C.I.] [0.0576, 0.1518] [-0.0385, 0.0374] [-0.0139, 0.0596] [-0.0431, 0.0179] [-0.3687, 0.0123]

Limited Dependent Variable Models

Variable GB INVT LOAN DEPO PAID

Variable GB INVT LOAN DEPO PAID

Variable GB INVT LOAN DEPO PAID

dy/dx 0.0808 -0.00043 0.0176 -0.0097 -0.1375

Model VCE: OIM Expression: E(pro*|pro>0), predict (ystar (0,)) dy/dx w.r.t.: GB, INVEST, LOAN, DEPO, and PAID Delta-method Std. Err. z P>|z| 0.000 4.62 0.0175 0.977 -0.03 0.0149 0.221 1.22 0.0144 0.419 -0.81 0.0119 0.064 -1.85 0.0743

621

[ 95% C.I.] [0.0465, 0.1151 ] [-0.0297, 0.0289] [-0.0106, 0.0458] [-0.0332, 0.0138] [-0.2831, 0.0081]

The marginal effects on the expected value of PRO conditional on being uncensored Average marginal effects Number of obs = 45 Model VCE: OIM Expression: E(pro|pro>0), predict (e (0,)) dy/dx w.r.t.: GB, INVEST, LOAN, DEPO, and PAID Delta-method dy/dx Std. Err. z P>|z| [ 95% C.I. [0.0350, 0.0906] 0.000 4.43 0.0142 0.0628 [-0.0231, 0.0224] 0.977 -0.03 0.0116 -0.00033 [-0.0083, 0.0357] 0.222 1.22 0.0112 0.0137 [-0.0258, 0.0107] 0.419 -0.81 0.0093 -0.0075 [-0.2217, 0.0078] 0.068 -1.82 0.0586 -0.1069 The marginal effects for the probability of being uncensored Average marginal effects Number of obs = 45 Model VCE: OIM Expression: Pr(pro>0), predict (pr (0,)) dy/dx w.r.t.: GB, INVEST, LOAN, DEPO, and PAID Delta-method dy/dx Std. Err. z P>|z| 0.000 3.74 5.46e-06 0.0000204 0.977 -0.03 3.77e-06 -1.08e-07 0.222 1.22 3.64e-06 4.45e-06 0.419 -0.81 3.03e-06 -2.45e-06 0.060 -1.88 0.0000185 -0.0000347

[ 95% C.I.] [9.71e-06 , 0.0000311] [-7.50e-06, 7.29e-06] [-2.68e-06, 0.0000116] [-8.38e-06, 3.49e-06] [-.000071, 1.52e-06]

From the estimated results, it can be said that, on an average, an additional unit of green banking activities increases latent variable PRO* by 0.1047 unit, given that all other independent variables are constant; an additional unit of investment decreases latent variable PRO* by 0.00056 unit, given that all other independent variables are constant; an additional unit of loan and advances increases latent variable PRO* by 0.0228 unit, given that all other independent variables are constant; an additional unit of deposits and other accounts decreases latent variable PRO* by 0.00255 unit, given that all other independent variables are constant; and additional unit of paid-up capital decreases latent variable PRO* by 0.1782 unit, given that all other independent variables are constant. It is also found that the marginal effects of the variables GB and PAID are statistically significant. From the estimated results, it also can be said that, on an average, an additional unit of green banking activities increases net profit by 0.0808 unit, given that all other independent variables are constant; an additional unit of investment decreases net profit by 0.0043 unit, given that all other independent variables are constant; an additional unit of loan and advances increases net profit by 0.0176 unit, given that all other independent variables are constant; an additional unit of deposits and other accounts decreases net profit by 0.0097 unit, given that all other independent variables are constant; and an additional unit of paid-up capital decreases net profit by 0.1375 unit, given that all other independent variables are constant. It is also found that the marginal effects of the variable GB and PAID are statistically significant. The average marginal effects of the variables GB, INVT, LOAN, DEPO, and PAID on PRO (which is bounded by greater than 0) are about 0.0628, -0.00033, 0.0137, -0.0075, and -0.1069, respectively. The average marginal effects of the variables GB and PAID are statistically significant.

Chapter Eleven

622

The average marginal effects of the variables GB, INVT, LOAN, DEPO, and PAID on the probability are about 0.0000204, -1.08e-07, 4.45e-06, -2.45e-06, and -0.0000347, respectively. The average marginal effects of the variables GB and PAID on the probability are statistically significant.

11.6 Poisson Regression Models Explanation of the Poisson Regression Model: The general linear regression model assumes that the random errors are normally distributed with mean zero and variance ı 2 , and hence, the dependent or response variable is normally distributed. In the case of limited dependent-variable models, where the dependent variable is a dichotomous variable taking only binary values, viz., 0 and 1, the probit and logistic regression equations are used, where the study variable follows a Bernoulli distribution. Similarly, we consider the studies where the dependent variable in a regression model is a count variable that represents the number of occurrences of an event, i.e., yi  ^0, 1, 2,...........` . For example, the

dependent variable y can be a count of medals won in a Summer Olympic games in 2016 held in Brazil with one or more explanatory variables like ln(GDP), and ln(pop). The number of cancer patients will be a function of one or more independent variables like age, hemoglobin level, sugar level of patients, etc. The number of deaths due to coronavirus will be a function of one or more independent variables like age, gender, hemoglobin level, sugar level of patients, etc. The number of car accidents in Dhaka city in a year will be a function of one or more independent variables like years of education of drivers, years of driving experience, working hours per day, salary level of drivers, etc. The number of goals in World Cup 2018 will be a function of one or more independent variables like possession, number of fouls, number of corner kicks, number of shots on goal, etc. In such situations, for the count variable, the assumption of normal or Bernoulli distribution will not be applicable. To describe such situations, the Poisson distribution is appropriate. If we assume that the dependent variable yi is a count variable and follows a Poisson distribution with a ind

parameter ȝ i , i.e., yi ~ Poisson(ȝ i ), then the probability function is given by f(yi ,ȝ i )

e  ȝ i ȝ i yi , yi yi!

(11.182)

0, 1, 2,......

where E(yi ) = ȝ i , and var(yi ) = ȝ i . Based on the sample observations y1 , y 2 ,.........., y n , we can write that E(yi ) = ȝ i , We express the Poisson regression model of the type yi = E(yi ) + İ i , i = 1, 2, 3,…...,n

(11.183)

where İ i 's are the random error terms. We can define a link function g that relates to the mean of the study variable to a linear predictor as g(ȝ i ) = ȕ 0 +ȕ1X1i +ȕ 2 X 2i +...........+ȕ k X ki

(11.184)

= x ci ȕ

where x i = >1 X1i

X 2i

.... ... X ki @c , and ȕ = >ȕ 0

ȕ1 ȕ 2

.... ... ȕ k @c .

Equation (11.184) can be written as ȝ i = g -1 (x ci ȕ)

(11.185)

The identity link function is given by g(ȝ i ) = ȝ i = x ci ȕ

(11.186)

The log-link function is given by g(ȝ i ) = ln(ȝ i ) = x ci ȕ

ȝ i = g -1 (x ci ȕ) = exp(x ci ȕ) ȝ i = exp(ȕ 0 +ȕ1X1i +ȕ 2 X 2i +...........+ȕ k X ki )

(11.187)

Limited Dependent Variable Models

623

Thus, for the log-link function, equation (1.187) can be written as (11.188)

yi = exp(x ci ȕ) + İ i

Note that, in an identity link function, the predicted values of the dependent variable y can be negative. But in a loglink function, the predicted values of y are positive. ind

Sometimes, the Poisson regression model can also be written as follows. If the study variable yi ~ Poisson(t i Oi ) , where t i gives the time length in which events occur and t i is known, the probability function is given by e  t i Oi (t i Oi ) yi , yi yi!

f(yi )

0, 1, 2,.....

(11.189)

where E(yi ) t i Oi , and var(yi ) t i Oi . Based on the sample y1 , y 2 ,......., y n , we can write that E(yi ) = t i ȝ i . We express the Poisson regression model of the type yi = E(yi ) + İ i yi = t i Oi + İ i , i = 1, 2, 3,……...,n

(11.190)

where İ i 's are the random error terms. Then the link function relates to the mean of the study variable to a linear predictor as ln(ȝ i ) = ln(Oi ) + ln(t i )

(11.191)

ln(ȝ i ) = x ci ȕ + ln(t i )

where ln(t i ) is called the offset, which is defined as a part of the generalised linear regression model with the coefficient 1. In fact, ln(Ȝ i ) = ȕ 0 +ȕ1X1i +ȕ 2 X 2i +......+ ȕ k X ki

(11.192)

= x ci ȕ ind

which implies that yi ~ Poisson(t i e xicȕ ) If we only have a single observation for each X j , we can estimate (11.191) by ln(yi ) = ln(ȝ i ) = ln(t i ) + x ci ȕ

(11.193)

That is, the estimated equation is given by ln(yˆ i ) = ln(ȝˆ i ) = ln(t i ) + x ci ȕˆ

(11.194)

If the Poisson model is valid, then ln(yi )  ln(t i ) will be a linear function of X j's . If yi = 0, holds, we need to consider ln(yi +c) for small c. Like the binary regression, we can investigate the main and interaction effects. Interpretation of ȕ j

If the Poisson regression model is of the type log(ȝ i ) = ȕ 0 +ȕ1X1i +ȕ 2 X 2i +........+ȕ k X ki , ȕ j can be explained as (i) We have

w (yi |x i ) = ȝ i ȕ j . Thus, for every unit increases in X j , the value of y will increase by ȝ i ȕ j units. Since wX ji

ȝ i t 0, the changes will be either positive or negative depending on the sign of ȕ j if ȕ j >0, implying that y will

increase as X j increases. If ȕ j 0, it indicates that log( ȝ i ) will increase as X j increases, if ȕ j ȕ1 , ȕ2 ,.......,ȕ k @c Equation (12.40) is called a regression equation in the deviation form from individual means which does not contain the individual effects D i . The transformation produces observations in deviations from individual means in equation (12.40) which is called the within transformation. Equation (12.40) can also be written as (12.41)

y it = x cit ȕ+u it

where y it = yit  yi , x it

(Xit  Xi ), and u it = (u it  u i ) .

In matrix form, equation (12.41) can be written as  Xȕ+u   Y=

(12.42)

We can now apply the OLS technique to equation (12.42) to estimate the parameter ȕ . This OLS estimator is called the within estimator because it relies on variations within the individuals rather than between individuals. It is also called the fixed-effects estimator as it is identical to the least-squares dummy variable estimator (LSDV). The OLS estimator of ȕ is given by  cX)  -1X  cY  ȕˆ FE = (X ª n T º = « ¦¦ (X it  X i )(X it  X i )c» ¬ i=1 t=1 ¼

1

n

T

¦¦ (X

it

 X i )(y it  y i )

(12.43)

i=1 t=1

The variance-covariance matrix of ȕˆ FE is given by var(ȕˆ FE )

 cX)  -1ı 2 (X u 1

ª n T º = « ¦ ¦ (X it  X i )(X it  X i )c » ı 2u ¬ i=1 t=1 ¼

(12.44)

A consistent estimator ı2u is obtained from the sum of squared residuals from the within estimator divided by n(T-1), which is given by ıˆ 2u =

1 n(T  1)

n

T

¦¦ uˆ

2 it

, where uˆ it = (yit  yi )  (X it  X i )cȕˆ FE

i=1 t=1

Also, averaging across all observations, equation (12.38) can be written as

y = ȕ0 +ȕ1X1 +ȕ2 X2 +...........+ȕk Xk +u where y =

1 n T 1 n T 1 n T yit , X j = X jit , and u = ¦¦ ¦¦ ¦¦ u it . nT i=1 t=1 nT i=1 t=1 nT i=1 t=1

(12.45)

Regression Models Based on Panel Data

Here, we use the restriction

n

¦Į

647

= 0. To avoid the dummy variable trap or perfect multicollinearity, this arbitrary

i

i=1

restriction on the dummy variable coefficients is used. In fact, we estimate the parameters ȕ and (ȕ 0 +Į i ) from equation n

(12.38) and not ȕ 0 and Įi separately. Thus, we have to put the arbitrary restriction ¦ Į i = 0 . i=1

Equation (12.45) can be written as (12.46)

y = ȕ 0 +x cȕ+u

where x = ª¬ X1 , X 2 ,........,X k º¼c , and ȕ = >ȕ1 , ȕ2 ,.......,ȕ k @c . The estimator of ȕ 0 can be obtained from equation (12.46) and is given by ȕˆ 0 = y  x cȕˆ FE

(12.47)

Therefore, the estimator of Į i can be obtained from equation (12.39) as given by Įˆ i = yi  ȕˆ 0  x icȕˆ FE

(12.48)

where x i = ª¬ X1i , X 2i ,...........,X ki º¼c . Important Properties of Fixed Effects Estimator ȕˆ EE

The important properties of the fixed-effects estimator ȕˆ EE of ȕ are given below: (i) Property of Unbiasedness: The fixed-effects estimator ȕˆ FE is an unbiased estimator of ȕ. Proof: The fixed-effects estimator of ȕ is ª n T º ȕˆ FE = « ¦¦ (X it  X i )(X it  X i )c» ¬ i=1 t=1 ¼

1

ª n T º = « ¦¦ (X it  X i )(X it  X i )c» ¬ i=1 t=1 ¼ ª n T º = « ¦¦ (X it  X i )(X it  X i )c » ¬ i=1 t=1 ¼

-1

n

T

¦¦ (X

it

 X i )(y it  yi )

it

 X i ) ª¬ (X it  X i )cȕ+(u it  u i ) º¼

i=1 t=1

-1

n

T

¦¦ (X i=1 t=1

n

T

¦¦ (X

it

 X i )(X it  X i )cȕ+

i=1 t=1

ª n T º « ¦ ¦ (X it  X i )(X it  X i )c» ¬ i=1 t=1 ¼

-1

ª n T º = ȕ + « ¦¦ (X it  X i )(X it  X i )c» ¬ i=1 t=1 ¼

n

T

¦ ¦ (X

it

 X i )(u it  u i )

i=1 t=1

-1

n

T

¦¦ (X

it

 X i )(u it  u i )

i=1 t=1

-1

n ª n T º ­ n T ½ = ȕ + « ¦ ¦ (X it  X i )(X it  X i )c» ® ¦ ¦ (X it  X i )u it  ¦ (TX i  TX i )u i ¾ i=1 ¬ i=1 t=1 ¼ ¯ i=1 t=1 ¿ -1

ª n T º ­ n T ½ = ȕ + « ¦¦ (X it  X i )(X it  X i )c» ®¦¦ (X it  X i )u it ¾ ¬ i=1 t=1 ¼ ¯ i=1 t=1 ¿

(12.49)

It is assumed that all X it 's are independent of all of u it , i.e., E ^(X it  X i )u it ` 0 . Therefore, taking the expectation of equation (12.49), we have

Chapter Twelve

648

E(ȕˆ FE ) = ȕ

(12.50)

Thus, it can be said that the fixed-effects estimator ȕˆ FE is an unbiased estimator of ȕ. (ii) Property of Consistency: The fixed-effects estimator ȕˆ FE is a consistent estimator of ȕ. Proof: Taking the probability limit of equation (12.49), we have ª1 n T º plim(ȕˆ FE ) = ȕ  plim « ¦ ¦ (X it  X i )(X it  X i )c» ¬ n i=1 t=1 ¼ ª1 n T º = ȕ  plim « ¦ ¦ (X it  X i )(X it  X i )c» ¬ n i=1 t=1 ¼

-1 T

ª1

n

¦ plim «¬ n ¦ (X t=1

i=1

-1 T

¦ cov(X

it

it

º  X i )(u it  u i ) » ¼

(12.51)

u it )

t=1

It is assumed that all X it 's are independent of all of u it , and thus, cov(X it ,u it )

0.

Therefore, equation (12.51) can be written as plim(ȕˆ FE ) = ȕ

(12.52)

Thus, it can be said that ȕˆ FE is a consistent estimator of ȕ . (iii)

Variance-Covariance

For

Property:

large

T,

the

variance-covariance

of

ȕˆ FE

is

given

by

1

ª n T º 2 « ¦¦ (X it  X i )(X it  X i )c » ı u . ¬ i=1 t=1 ¼

Proof: The variance-covariance matrix of ȕˆ FE is given by ª n T º var(ȕˆ FE ) = « ¦ ¦ (X it  X i )(X it  X i )c» ¬ i=1 t=1 ¼

-1

n

 X i )var{(u it  u i )(X it  X i )}

-1

(T  1)ı 2u ª n T º « ¦¦ (X it  X i )(X it  X i )c » T ¬ i=1 t=1 ¼ ª n T º « ¦ ¦ (X it  X i )(X it  X i )c » ¬ i=1 t=1 ¼

=

it

i=1 t=1

ª n T º « ¦ ¦ (X it  X i )(X it  X i )c » ¬ i=1 t=1 ¼ =

T

¦ ¦ (X

-1

n

T

¦¦ (X

it

 X i )(X it  X i )

i=1 t=1

-1

(T  1)ı 2u ª n T º (X it  X i )(X it  X i )c» ¦¦ « T ¬ i=1 t=1 ¼

-1

(12.53)

If T is very large, (T-1) # T and equation (12.53) can be written as 1

ª n T º var(ȕˆ FE ) # « ¦¦ (X it  X i )(X it  X i )c» ı 2u ¬ i=1 t=1 ¼

(12.54)

Hence, the theorem is proved.

(iv) Property of Asymptotic Distribution of β̂_FE: The fixed-effects estimator β̂_FE is asymptotically normally distributed, that is, \sqrt{n}(\hat{\beta}_{FE}-\beta) \sim N(0, M^{-1}VM^{-1}), where M = \mathrm{plim}\big[\tilde{X}'\tilde{X}/n\big], V = \mathrm{plim}\big[\tilde{X}'\Omega\tilde{X}/n\big], Ω = var(u), and X̃ denotes the within-transformed regressors. We include the factor √n because β̂_FE is a consistent estimator of β, meaning that β̂_FE is asymptotically equal to β with probability one. Thus, (β̂_FE − β) has a degenerate distribution for n → ∞, with all probability mass at zero. If we multiply this term by √n and consider the asymptotic distribution of √n(β̂_FE − β), we obtain a non-degenerate normal distribution. In this case, √n is referred to as the rate of convergence, and it is sometimes said that the estimator β̂_FE is root-n consistent.

Proof: From equation (12.49), we can write

\sqrt{n}(\hat{\beta}_{FE}-\beta) = \sqrt{n}(\tilde{X}'\tilde{X})^{-1}\tilde{X}'u = \Big[\frac{\tilde{X}'\tilde{X}}{n}\Big]^{-1}\Big[\frac{\tilde{X}'u}{\sqrt{n}}\Big]   (12.55)

The asymptotic variance of \sqrt{n}(\hat{\beta}_{FE}-\beta) is given by

\mathrm{Asy\,var}\big[\sqrt{n}(\hat{\beta}_{FE}-\beta)\big] = \mathrm{plim}\Big\{\big[\sqrt{n}(\hat{\beta}_{FE}-\beta)\big]\big[\sqrt{n}(\hat{\beta}_{FE}-\beta)\big]'\Big\}

= \mathrm{plim}\Big\{\Big[\Big(\frac{\tilde{X}'\tilde{X}}{n}\Big)^{-1}\Big(\frac{\tilde{X}'u}{\sqrt{n}}\Big)\Big]\Big[\Big(\frac{\tilde{X}'\tilde{X}}{n}\Big)^{-1}\Big(\frac{\tilde{X}'u}{\sqrt{n}}\Big)\Big]'\Big\}

= \mathrm{plim}\Big(\frac{\tilde{X}'\tilde{X}}{n}\Big)^{-1}\mathrm{plim}\Big(\frac{\tilde{X}'uu'\tilde{X}}{n}\Big)\,\mathrm{plim}\Big(\frac{\tilde{X}'\tilde{X}}{n}\Big)^{-1}

= M^{-1}\,\mathrm{plim}\Big(\frac{\tilde{X}'\Omega\tilde{X}}{n}\Big)\,M^{-1} = M^{-1}VM^{-1}   (12.56)

Hence, the theorem is proved. Also, from equation (12.56), it can be said that β̂_FE is asymptotically normally distributed with mean β and asymptotic variance-covariance matrix M⁻¹VM⁻¹/n, i.e., β̂_FE →ᵃ N[β, M⁻¹VM⁻¹/n].

Testing for Fixed Effects

We can test the joint significance of the individual dummies of the fixed-effects model. The null hypothesis to be tested is

H₀: α₁ = α₂ = ....... = α_{n-1} = 0

against the alternative hypothesis H1 : At least one of them is not zero.

Under the null hypothesis, the test statistic is given by

F = \frac{(RESS-UESS)/(n-1)}{UESS/(nT-n-k)} \sim F_{\{(n-1),\,(nT-n-k)\}}   (12.57)

where UESS is the unrestricted residual sum of squares, namely that of the least-squares dummy variables (LSDV) regression in (12.33), and RESS is the restricted residual sum of squares, namely that of OLS on the pooled model. Rejection of the null hypothesis implies that the fixed effects are not all jointly zero (see the computational sketch below).
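To make the computation concrete, the following minimal sketch (assuming a balanced panel held in NumPy arrays y and X, with an array ids of individual identifiers; the function and variable names are illustrative, not from the text) computes the within (LSDV-equivalent) estimator, the pooled and within residual sums of squares, and the F statistic of equation (12.57).

```python
import numpy as np

def fe_joint_significance_F(y, X, ids):
    """F test of H0: alpha_1 = ... = alpha_{n-1} = 0 (eq. 12.57), balanced panel."""
    units = np.unique(ids)
    n = len(units)
    NT, k = X.shape
    T = NT // n

    # Restricted model: pooled OLS with a common intercept
    Z = np.column_stack([np.ones(NT), X])
    b_pool, *_ = np.linalg.lstsq(Z, y, rcond=None)
    ress = np.sum((y - Z @ b_pool) ** 2)

    # Unrestricted model: within (fixed-effects) regression on demeaned data
    y_dm = y.astype(float).copy()
    X_dm = X.astype(float).copy()
    for i in units:
        m = ids == i
        y_dm[m] -= y[m].mean()
        X_dm[m] -= X[m].mean(axis=0)
    b_fe, *_ = np.linalg.lstsq(X_dm, y_dm, rcond=None)
    uess = np.sum((y_dm - X_dm @ b_fe) ** 2)

    F = ((ress - uess) / (n - 1)) / (uess / (n * T - n - k))
    return b_fe, F
```

The statistic is referred to an F distribution with (n−1) and (nT−n−k) degrees of freedom; a large value leads to rejection of the pooled model in favour of the fixed-effects specification.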

The First Difference Estimator

In this section, the first-difference (FD) method is discussed as an alternative way of eliminating the individual effects α_i; under certain assumptions it gives a better solution than the fixed-effects (within) estimator.


Let us consider the panel-data model of the type

y_{it} = \beta_0+\alpha_i+\beta_1X_{1it}+\beta_2X_{2it}+\cdots+\beta_kX_{kit}+u_{it} = \beta_0+\alpha_i+X_{it}'\beta+u_{it}   (12.58)

where X_it = (X_{1it}, X_{2it},…,X_{kit})′ and β = [β₁, β₂,…,β_k]′. Here α_i is the fixed effect of the ith individual; it varies across individuals but not over time.

The statistical assumptions of the fixed-effects model are given below:
(i) The model is linear in parameters.
(ii) The sample is drawn at random.
(iii) Each X_j (j = 1, 2,…,k) varies both over time t and across individuals i.
(iv) E(u_it | X_{1it}, X_{2it},…,X_{kit}, α_i) = 0, ∀ i and t.
(v) (X_{1it}, X_{2it},…,X_{kit}, y_it) are i.i.d. over the cross-section.
(vi) Large outliers are unlikely.
(vii) There is no perfect multicollinearity.
(viii) cov(u_it, u_is | X_{1it}, X_{2it},…,X_{kit}, α_i) = 0, for t ≠ s.
(ix) var(u_it | X_{1it}, X_{2it},…,X_{kit}, α_i) = σ²_u, ∀ i and t.
(x) u_it ~ IID(0, σ²_u), ∀ i and t.

For time period (t−1), the equation will be

y_{i,t-1} = \beta_0+\alpha_i+X_{i,t-1}'\beta+u_{i,t-1}   (12.59)

The unobserved heterogeneity of the fixed-effects model can be eliminated by taking the first difference. Thus, subtracting equation (12.59) from (12.58), we have

y_{it}-y_{i,t-1} = (X_{it}-X_{i,t-1})'\beta+(u_{it}-u_{i,t-1})

\Delta y_{it} = \Delta X_{it}'\beta+\Delta u_{it}   (12.60)

where Δy_it = y_it − y_{i,t-1}, ΔX_it = X_it − X_{i,t-1}, and Δu_it = u_it − u_{i,t-1}.

Applying the OLS method to equation (12.60), the estimator which is obtained is called the first-difference estimator of β and is given by

\hat{\beta}_{FD} = \Big[\sum_{i=1}^{n}\sum_{t=1}^{T}\Delta X_{it}\Delta X_{it}'\Big]^{-1}\sum_{i=1}^{n}\sum_{t=1}^{T}\Delta X_{it}\Delta y_{it}   (12.61)

The consistency of this estimator requires that

E(\Delta X_{it}\Delta u_{it}) = E\{(X_{it}-X_{i,t-1})(u_{it}-u_{i,t-1})\} = 0,\ \forall\ i\ \text{and}\ t   (12.62)


This condition is weaker than the strict exogeneity condition E(X_it u_is) = 0 for all t and s, because it allows correlation between X_it and u_{i,t-2}. To compute the variance-covariance matrix of β̂_FD, it should be taken into account that Δu_it exhibits serial correlation, so the first-difference estimator of β will then be less efficient than the fixed-effects estimator. For T = 2, the fixed-effects and first-difference estimators are identical, so when T = 2 the choice between the FD and FE methods does not matter. When T ≥ 3, the two methods no longer give the same results, but both are unbiased under the assumptions of the FE model, and both are consistent for fixed T as n → ∞ under those assumptions. For large n and small T (a common setup in many data sets), we might be concerned with the relative efficiency of the estimators. When the u_it's are serially uncorrelated (and homoscedastic, which amounts to saying that they are i.i.d.), the FE estimator is more efficient than the FD estimator, and the standard errors reported from FE are valid. We often assume serially uncorrelated errors, but there is no reason why this condition must hold in the data. If {u_it} follows a random walk, its first difference is serially uncorrelated, and the first-difference estimator is then the appropriate one; in practice, however, we may encounter an error process with some serial correlation that is not necessarily a random walk. When T is large and n is not very large (for instance, many time periods on each of a small number of units, say n = 20 and T = 30), we must be careful in using the FE estimator, since its large-sample justification relies on n → ∞, not on T. If the classical assumptions of the fixed-effects model are violated, inference based on large T and small n can be very sensitive; with unit-root variables, the spurious regression problem can also arise, and in that case the FD estimator would be better than the FE estimator. When T is large and n is small, first differencing itself poses no problem, because the central limit theorem can be applied over time. If the FE and FD models give substantively different results, it may be hard to choose between them, and we might want to report both. One further consideration arises with an unbalanced panel, especially one in which the missing observations do not sit at the beginning or end of a unit's time series but create gaps within it. The FE estimator has no problem with this, but the FD estimator loses two observations for every single missing period in a unit's sequence. We must also consider why the data are missing: if they can be treated as missing at random, this may not be problematic, but if there is a pattern to the missingness, we should be concerned about it. One issue that often arises with individuals or firms is attrition, i.e., units leaving the sample: individuals can die, and firms can liquidate or be taken over. If these events are related to the variables used in the regression model, we may need to worry about the resulting sample-selection problem. Nevertheless, one advantage of fixed effects is that they allow the attrition to be correlated with α_i, the unobserved fixed effect. A computational sketch of the first-difference estimator is given below.
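As a small illustration of equation (12.61), the sketch below (hypothetical NumPy arrays y, X and unit identifiers ids, with observations sorted by time within each unit; not the author's code) forms within-individual first differences and applies OLS to them.

```python
import numpy as np

def first_difference_estimator(y, X, ids):
    """First-difference estimator (eq. 12.61): OLS on the differenced data."""
    dy_list, dX_list = [], []
    for i in np.unique(ids):
        m = ids == i                        # observations of unit i, time-ordered
        dy_list.append(np.diff(y[m]))       # y_it - y_i,t-1
        dX_list.append(np.diff(X[m], axis=0))  # X_it - X_i,t-1
    dy = np.concatenate(dy_list)
    dX = np.vstack(dX_list)
    beta_fd, *_ = np.linalg.lstsq(dX, dy, rcond=None)
    return beta_fd
```

Note that differencing drops the first observation of every unit, which is also why a single gap in an unbalanced panel costs the FD estimator two observations for that unit.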

Differences-in-Differences Estimator

The differences-in-differences (or double-difference) estimator is an econometric technique, applied in quantitative research in the social sciences, for measuring the impact of a treatment on an outcome variable. It is defined as the difference in the average outcome in the treatment group before and after treatment minus the difference in the average outcome in the control group before and after treatment: it is literally a "difference of differences". The terminology comes from the medical sciences, but treatment may also refer to a social or economic intervention. What is the impact of a drug on arsenic poisoning, HIV, or the coronavirus pandemic? What is the impact of a labour-training programme on earnings? What is the impact of an education from the University of Cambridge on earnings? Let us now consider a binary treatment indicator of the type

x_{it} = \begin{cases}1, & \text{if individual } i \text{ receives the treatment in period } t\\ 0, & \text{otherwise}\end{cases}

Let us consider the following fixed-effects regression equation

y_{it} = \alpha_i+\mu_t+\beta x_{it}+u_{it}   (12.63)

where α_i is the fixed effect of the ith individual and μ_t is the time-specific fixed effect. Now, the impact of treatment can be obtained by comparing individuals who receive treatment with those who do not, and by comparing individuals before and after the treatment; the panel-data model can easily combine both. For time t−1, model (12.63) can be written as

y_{i,t-1} = \alpha_i+\mu_{t-1}+\beta x_{i,t-1}+u_{i,t-1}   (12.64)

Subtracting equation (12.64) from (12.63), we have


y_{it}-y_{i,t-1} = \mu_t-\mu_{t-1}+\beta(x_{it}-x_{i,t-1})+u_{it}-u_{i,t-1}

\Delta y_{it} = \Delta\mu_t+\beta\Delta x_{it}+\Delta u_{it}   (12.65)

If we apply the OLS method to equation (12.65), the treatment effect β can be estimated consistently under the assumption E(Δx_it Δu_it) = 0, ∀ i and t. In this setting, there may exist correlation between α_i and the treatment indicator, because in many applications one can argue that individuals with certain (unobserved) characteristics are more likely to participate in a programme. This approach is very similar to the fixed-effects estimator: the fixed-effects estimator uses the within transformation, whereas here the first-difference transformation is used instead. Let us consider a situation in which there are only two time periods, say period 1 and period 2, and individuals receive treatment in period 2. Therefore, we have x_{i1} = 0 for all i, and x_{i2} = 1 for a subset of individuals. Equation (12.65) then implies a regression of y_{i2} − y_{i1} on the treatment dummy and a constant (corresponding to the time effect). Thus, the OLS estimate β̂ corresponds to the expected value of y_{i2} − y_{i1} given x_{i2} = 1 (the treatment group) minus the expected value of y_{i2} − y_{i1} given x_{i2} = 0 (the control group). Thus, the OLS estimate of β is given by

\hat{\beta} = E(y_{i2}-y_{i1}\,|\,x_{i2}=1)-E(y_{i2}-y_{i1}\,|\,x_{i2}=0)

= E(\Delta y_{i2}\,|\,x_{i2}=1)-E(\Delta y_{i2}\,|\,x_{i2}=0)

= \overline{\Delta y}_{i2}^{\,\text{treatment}}-\overline{\Delta y}_{i2}^{\,\text{control}}   (12.66)

This estimator is called the differences-in-differences estimator because one estimates the time difference for the treatment and control groups and then takes the difference between these two.

Fig. 12-1: \hat{\beta}_{\text{diff-in-diffs}} = (\bar{y}_{\text{treatment, after}}-\bar{y}_{\text{treatment, before}})-(\bar{y}_{\text{control, after}}-\bar{y}_{\text{control, before}})
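In the two-period, two-group case the estimator reduces to four sample means, as in Fig. 12-1. A minimal sketch (illustrative variable names; y_before and y_after are outcome arrays over the same individuals, treated is a boolean indicator) is:

```python
import numpy as np

def diff_in_diffs(y_before, y_after, treated):
    """beta_hat = (treated after - treated before) - (control after - control before)."""
    treated = np.asarray(treated, dtype=bool)
    return ((y_after[treated].mean() - y_before[treated].mean())
            - (y_after[~treated].mean() - y_before[~treated].mean()))
```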

The Random-Effects Models

Random-effects models are linear regression models in which some of the parameters (effects) that define the systematic components of the model exhibit some form of random variation; they are widely used for panel analyses. Econometric models describe the variation in observed variables in terms of systematic and nonsystematic components. In fixed-effects models, the systematic effects are considered fixed or nonrandom, whereas in random-effects models some of these systematic effects are treated as random. In a fixed-effects model there are many parameters, and the fixed-effects least-squares (or least-squares dummy variables) estimator suffers from a large loss of degrees of freedom. This loss of degrees of freedom can be avoided if we assume that α_i is random. In this case, α_i ~ IID(0, σ²_α), u_it ~ IID(0, σ²_u), and α_i and u_it are independent; in addition, the X_it are independent of α_i and u_it for all i and t. The random-effects model is an appropriate specification if the n individuals are drawn randomly from a large population, which is usually the case for household panel studies. Let us consider the random-effects model of the type


y_{it} = \beta_0+\beta_1X_{1it}+\beta_2X_{2it}+\cdots+\beta_kX_{kit}+\alpha_i+u_{it} = \beta_0+x_{it}'\beta+\alpha_i+u_{it}   (12.67)

where ε_it = α_i + u_it; y_it is the observation for the ith individual at time t; β₀ is a constant which indicates the general mean; β is the vector of regression coefficients, i.e., β = [β₁, β₂,…,β_k]′; x_it is a k-dimensional vector of independent variables, i.e., x_it = [X_{1it}, X_{2it},…,X_{kit}]′; and α_i + u_it is a random error term consisting of two components: α_i is the random effect of the ith individual, which does not vary over time, and u_it is the remaining stochastic disturbance, which varies over individuals i and time t and is assumed to be uncorrelated over time. If we apply the OLS method to equation (12.67), the OLS estimators β̂₀ and β̂ are unbiased and consistent estimators of β₀ and β, but they are not efficient: the error-components structure implies that the composite error term α_i + u_it exhibits a particular form of autocorrelation, so straightforward OLS inference is not correct. The GLS method will give us the more efficient estimator.

The Variance Structure of Random-Effects Models

To construct the efficient estimator, we need to derive the structure of the variance of the random error terms; then we can apply the GLS method to obtain the efficient estimator. The following assumptions must hold:
(i) The model is linear in parameters.
(ii) The sample is drawn at random.
(iii) E(α_i) = 0, ∀ i.
(iv) E(u_it) = 0, ∀ i and t.
(v) Each X_j (j = 1, 2,…,k) varies over time t and across individuals i.
(vi) There is no perfect multicollinearity.
(vii) E(α_i²) = σ²_α, ∀ i.
(viii) E(u_it u_is) = \begin{cases}\sigma_u^2, & t = s\\ 0, & t \neq s\end{cases}
(ix) E(u_it α_i) = 0, ∀ i and t.
(x) E(ε_it ε_is) = \begin{cases}\sigma_u^2+\sigma_\alpha^2, & t = s\\ \sigma_\alpha^2, & t \neq s\end{cases}
(xi) α_i ~ IID(0, σ²_α), ∀ i.
(xii) u_it ~ IID(0, σ²_u), ∀ i and t.
(xiii) E(X_jit α_i) = 0, ∀ j, i and t.

The correlation coefficient between ε_it and ε_is is given by

\rho = \mathrm{corr}(\varepsilon_{it},\varepsilon_{is}) = \begin{cases}1, & t = s\\ \dfrac{\sigma_\alpha^2}{\sigma_u^2+\sigma_\alpha^2}, & t \neq s\end{cases}   (12.68)

The last assumption (xiii) is crucial for the random-effects model: it is necessary for the consistency of the RE estimator, but not for the FE estimator, and it can be tested with the Hausman test. Now, we will derive the T×T matrix that describes the variance structure of the ε_it for each individual i.


For the ith individual, let us define the vector ε_i = [ε_{i1}, ε_{i2},…,ε_{iT}]′. The variance-covariance matrix of ε_i is given by

\mathrm{var}(\varepsilon_i) = E(\varepsilon_i\varepsilon_i') = \begin{bmatrix}\sigma_u^2+\sigma_\alpha^2 & \sigma_\alpha^2 & \cdots & \sigma_\alpha^2\\ \sigma_\alpha^2 & \sigma_u^2+\sigma_\alpha^2 & \cdots & \sigma_\alpha^2\\ \vdots & \vdots & \ddots & \vdots\\ \sigma_\alpha^2 & \sigma_\alpha^2 & \cdots & \sigma_u^2+\sigma_\alpha^2\end{bmatrix} = \sigma_\alpha^2\mathbf{1}_T\mathbf{1}_T'+\sigma_u^2I_T = \Omega_T   (12.69)

where 1_T = [1, 1,…,1]′ is a unit vector of size T and Ω_T is the (T×T) matrix above, the same for every individual i (i = 1, 2,…,N).

We have

\Omega = E(\varepsilon\varepsilon') = I_N\otimes\Omega_T = I_N\otimes(\sigma_\alpha^2\mathbf{1}_T\mathbf{1}_T'+\sigma_u^2I_T) = I_N\otimes\big(\sigma_\alpha^2T\,B_T+\sigma_u^2[Q_T+B_T]\big)   (12.70)

where B_T = (1/T)1_T1_T' is the between operator for a single individual, and Q_T = I_T − (1/T)1_T1_T' is the within operator for a single individual.


Therefore, equation (12.70) can be written as

\Omega = \sigma_\alpha^2T\,B+\sigma_u^2I_{NT}

where B = I_N ⊗ (1/T)1_T1_T' is the between-individual operator, Q = I_{NT} − B is the within-individual operator, and I_{NT} is an identity matrix of NT rows and NT columns. Thus, we have

\Omega = Q\sigma_u^2+(\sigma_\alpha^2T+\sigma_u^2)B   (12.71)
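For a small balanced panel, the error-components covariance matrix of equation (12.71) can be built directly with Kronecker products. The sketch below uses arbitrary illustrative values of n, T and the variance components (not taken from the text) and checks that the Q/B form of (12.71) agrees with the I_N ⊗ Ω_T form of (12.70).

```python
import numpy as np

n, T = 3, 4                      # illustrative panel dimensions
sig_a2, sig_u2 = 0.5, 1.0        # illustrative variance components

iota = np.ones((T, 1))
B_T = iota @ iota.T / T          # between operator for one individual
Q_T = np.eye(T) - B_T            # within operator for one individual

B = np.kron(np.eye(n), B_T)      # between-individual operator
Q = np.eye(n * T) - B            # within-individual operator

omega = sig_u2 * Q + (sig_a2 * T + sig_u2) * B                     # eq. (12.71)
omega_direct = np.kron(np.eye(n),
                       sig_a2 * iota @ iota.T + sig_u2 * np.eye(T))  # eq. (12.70)
assert np.allclose(omega, omega_direct)
```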

This clearly indicates that the random-effects model has the standard generalised least-squares structure, with the variance summed over all individuals in the data set; it can be used to derive the GLS estimator of the parameters, and for each individual we can transform the model accordingly. In matrix form, the random-effects model can be written as

Y = X\beta+\varepsilon,\qquad E(\varepsilon\varepsilon') = \Omega   (12.72)

The generalised least-squares (GLS) estimator of β is given by

\hat{\beta}_{GLS} = (X'\Omega^{-1}X)^{-1}X'\Omega^{-1}Y   (12.73)

2 2 The Generalised Least Squares (GLS) gives us the efficient parameter estimates of ȕ, ıĮ and ıu based on the known structure of the variance-covariance matrix of ȍ .

The variance of ȕˆ GLS is var(ȕˆ GLS ) = ı u2 (X cȍ -1X)-1

(12.74)

Now, to obtain ȍ -1 , we will use the following formula:

ȍr = (ı2u )r Q + (ıĮ2 T + ıu2 )r B for any arbitrary scalar r. Thus, for r = -1, we have ȍ -1 =

1 1 Q+ 2 B 2 ıu (ı Į T + ı 2u )

And for r = -1/2, we have ȍ -1/2 =

1 1 Q+ B 2 ıu (ı Į T + ı u2 )

We have -1

-1 ª § ȍ ·-1 º ªȍº ȕˆ GLS = « Xc ¨ 2 ¸ X » Xc « 2 » Y «¬ © ı u ¹ »¼ ¬ ıu ¼

-1

-1 -1 = ª Xc Q+\ B X º Xc > Q+\ B@ Y ¬ ¼

(12.75)


where ȥ = (ı 2u +ı Į2 T)/ı u2

1

ı Į2 T ı 2u

GLS as Weighted Least Squares -1/2 Premultiplying equation (12.72) by ıu ȍ , we have

ıu ȍ-1/2 Y = ıu ȍ-1/2 Xȕ+ıu ȍ-1/2İ

Y* = X*ȕ + İ* where Y*

(12.76)

ª º ıu ı u ȍ -1/2 Y = «Q + B» Y, and X* «¬ (ı Į2 T + ı u2 ) »¼

ª º ıu ı u ȍ -1/2 X = «Q + B» X. «¬ (ı Į2 T + ı 2u ) »¼

* -1/2 * -1/2 Thus, we can write Y = ( Q + ȥ B)Y, and X = ( Q + ȥ B)X .

In scalar form, it can be written as

y*it

(yit  yi )  ȥ-1/2 yi

ª 1 º yit  «1  » yi \ »¼ «¬

yit  (1  ș1/2 ) yi and

x*it

(xit  x i )  ȥ-1/2 x i

ª 1 º x it  «1  » xi \ »¼ «¬

xit  (1  ș1/2 ) xi

where ș =

ı 2u ı Į2 T + ı 2u

Thus, we can say that the GLS estimator can be easily obtained from the following transformed equation:

yit  (1  ș1/2 ) yi = ȕ0ș1/2 +(xit  (1  ș1/2 )xi )cȕ  uit  Įi ș1/2  (1  ș1/2 )ui yit  (1  ș1/2 ) yi = ȕ0ș1/2 +(xit  (1  ș1/2 )xi )cȕ  vit

(12.77)

1/2 1/2 where vit = uit  Įi ș  (1  ș )ui

In the transformed equation, the random error term vit is IID, and so, we can apply the OLS technique to the transformed equation. The GLS estimator of ȕ can also be obtained as -1

n ª n T º ȕˆ GLS = « ¦¦ (X it  X i )(X it  X i )c+șT ¦ (X i  X)(X i  X)c» u i=1 ¬ i=1 t=1 ¼ n ª n T º « ¦¦ (X it  X i )(yit  yi )+șT ¦ (X i  X)(Yi  Y) » i=1 ¬ i=1 t=1 ¼

(12.78)

and n ª n T º var(ȕˆ GLS ) = ı u2 « ¦¦ (X it  X i )(X it  X i )c+șT ¦ (X i  X)(X i  X)c» i=1 ¬ i=1 t=1 ¼

-1

(12.79)

From the above equation, we see that the random-effects estimator is the weighted combination of the within and between estimators. Thus, it can be written as


657

(12.80)

In (12.80), < is the weighted matrix and is proportional to the inverse of the covariance matrix of ȕˆ B 1. The value of < depends on \ in such a way that, if \ tends to zero, the fixed-effects and the random-effects estimators will coincide. This occurs when the variability of the individual effect is relatively larger than the random error terms. If \ tends to 1, ȍi would be a diagonal matrix and the random-effects estimator will be the OLS estimator. This can happen when the variability of the individual effect is very small compared to the random error terms. The between estimator is given by ȕˆ B = (X cBX) -1X cBY

ª n º = « ¦ (X i  X)(X i  X)c» ¬ i=1 ¼

-1

n

¦ (X

 X)(yi  y)

i

(12.81)

i=1

which is obtained using the OLS technique to the following regression equation: (12.82)

yi = ȕ 0 +x ci E +u i

The within estimator is given by ȕˆ W = (X cQX) -1X cQY

ª n T º = « ¦¦ (X it  X i )(X it  X i )c » ¬ i=1 t=1 ¼

1

n

T

¦¦ (X

it

 X i )(yit  yi )

(12.83)

i=1 t=1

which is obtained using the OLS method to the following regression equation:

yit  yi

(Xit  Xi )cȕ  (uit  ui )

(12.84)

Show that v it is independently and identically distributed Proof: Here, v it is defined as

vit = uit  Įi ș1/2  (1  ș1/2 )ui The expected value of v it is given by

E(vit ) = E(uit )+E(Įi )ș1/2  (1  ș1/2 )E(ui ) =0 The covariance between v it and v is is

cov(vit , vis ) = E(uit +Įi ș1/2  (1  ș1/2 )ui )(uis  Įi ș1/2  (1  ș1/2 )ui ) (1  ș1/2 )

1 1 E(u it2 )+șıĮ2  (1  ș1/2 ) E(u is2 )  (1  ș1/2 )2 E(u i2 ) T T

 2(1  ș1/2 )

ı 2u ı2  șı Į2 +(1  ș1/2 ) 2 u T T

 2(1  ș1/2 )

1

ı 2u ı2  șı Į2 +(1  2ș1/2 +ș) u T T

See Hsiao, 2003, Section 3.4 for details

(12.85)


ș

Tı Į2 +ı 2u ı 2u ª1  2ș1/2  2+2ș1/2 º¼ + T T ¬

ı 2u ı 2u  T T

(12.86)

0

Since cov(vit ,vis ) = 0,  t z s , we can say that v it is independently and identically distributed. The variance v it is given by var(v it ) = ı 2u +

4ı 2u ª1  ș1/2 º¼ T ¬

(12.87)

Hence, the theorem is proved. Feasible Generalised Least Squares (FGLS) of the Random-Effects Estimators

The variance components ı Į2 , and ı 2u are unknown to us, and so, we can obtain the feasible GLS estimator (FGLS), where the unknown variances are consistently estimated in the first step. We know that the regression equation for the between estimator is given by (12.88)

yi = ȕ 0 + x ci ȕ+İi

The variance of the error term İi is given by var(İi ) = var(Į i ) + var(u i )

= ıĮ2 +ı2u /T

(12.89)

Thus, we can write ıˆ 2B =

1 n ¦ (yi  ȕˆ 0B  x ciȕˆ B )2 n i=1

(12.90)

The variance for within regression is given by ıˆ 2u =

1 n(T  1)

n

T

¦¦ uˆ

2 it

(12.91)

i=1 t=1

where uˆ it = (yit  yi )  (x it  x i )cȕˆ FE . 2 Thus, using these two estimators from equation (12.89), we can obtain ıĮ as given by

ıˆ Į2 = ıˆ 2B  ıˆ 2u /T

(12.92)

Therefore, the estimated value of ș is given by șˆ =

ıˆ 2u ıˆ 2u +ıˆ Į2 T

(12.93)

Thus, the estimate of the random-effects estimator of ȕ is given by -1

n ª n T º ˆ ȕˆ FGLS = « ¦¦ (X it  X i )(X it  X i )c+șT (X i  X)(X i  X)c » u ¦ i=1 ¬ i=1 t=1 ¼ n ª n T º ˆ « ¦¦ (X it  X i )(yit  yi )+șT ¦ (X i  X)(Yi  Y) » i=1 ¬ i=1 t=1 ¼

(12.94)


The estimated variance of the random effect estimator is given by n ª n T º ˆ var(ȕˆ FGLS ) = ıˆ u2 « ¦¦ (X it  X i )(X it  X i )c+șT (X i  X)(X i  X)c» ¦ i 1 ¬ i=1 t 1 ¼

-1

(12.95)

Since șˆ is positive, from the above equation, it can be said that the random-effects estimators are more efficient. The gain in efficiency is due to the use of between variations in the data (Xi  X). The covariance matrix is estimated by the OLS method to the transformed equation yit  (1  șˆ 1/2 ) yi = ȕ 0 șˆ 1/2 +(X it  (1  șˆ 1/2 )X i )cȕ  vit

(12.96)
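The feasible GLS steps of equations (12.90)–(12.96) can be sketched as follows for a balanced panel held in NumPy arrays (illustrative names; a simplified sketch in which small-sample degrees-of-freedom corrections for the between regression are ignored).

```python
import numpy as np

def random_effects_fgls(y, X, ids):
    """FGLS sketch: estimate sigma_u^2, sigma_B^2, theta, then quasi-demeaned OLS."""
    units = np.unique(ids)
    n = len(units)
    NT, k = X.shape
    T = NT // n

    # Within (fixed-effects) residual variance, eq. (12.91)
    y_dm, X_dm = y.astype(float).copy(), X.astype(float).copy()
    y_bar, X_bar = np.zeros(n), np.zeros((n, k))
    for j, i in enumerate(units):
        m = ids == i
        y_bar[j], X_bar[j] = y[m].mean(), X[m].mean(axis=0)
        y_dm[m] -= y_bar[j]
        X_dm[m] -= X_bar[j]
    b_fe, *_ = np.linalg.lstsq(X_dm, y_dm, rcond=None)
    sig_u2 = np.sum((y_dm - X_dm @ b_fe) ** 2) / (n * (T - 1))

    # Between-regression residual variance, eq. (12.90)
    Zb = np.column_stack([np.ones(n), X_bar])
    bb, *_ = np.linalg.lstsq(Zb, y_bar, rcond=None)
    sig_B2 = np.sum((y_bar - Zb @ bb) ** 2) / n

    sig_a2 = max(sig_B2 - sig_u2 / T, 0.0)      # eq. (12.92)
    theta = sig_u2 / (sig_u2 + sig_a2 * T)      # eq. (12.93)

    # Quasi-demeaned (transformed) regression, eq. (12.96)
    lam = 1.0 - np.sqrt(theta)
    idx = np.searchsorted(units, ids)
    y_star = y - lam * y_bar[idx]
    X_star = X - lam * X_bar[idx]
    Z = np.column_stack([np.full(NT, 1.0 - lam), X_star])
    b_re, *_ = np.linalg.lstsq(Z, y_star, rcond=None)
    return b_re, theta
```

When θ̂ is close to one the transformation does almost nothing and the FGLS estimate approaches pooled OLS; when θ̂ is close to zero it approaches the within (fixed-effects) estimate, in line with the comparison of estimators discussed below.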

Important Properties of the Random-Effects Estimator

The important properties of the random-effects estimator ȕˆ RE are highlighted below: (i) The random-effects estimator or GLS estimator ȕˆ GLS is a linear function of the dependent variable y. (ii) The random-effects estimator or GLS estimator ȕˆ GLS is an unbiased estimator of ȕ . (iii)

The

variance-covariance

matrix

of

the

random-effects

estimator

or

GLS

estimator

ȕˆ GLS

is

-1

n ª n T º var(ȕˆ GLS ) = ı 2u « ¦¦ (X it  X i )(X it  X i )c+șT ¦ (X i  X)(X i  X)c» . i i=1 t=1 1 ¬ ¼

(iv) The random-effects estimator or GLS estimator ȕˆ GLS is BLUE. (v) The random-effects estimator or GLS estimator ȕˆ GLS is a consistent estimator of ȕ . (vi) If either n or T is very large, the random-effects estimator or GLS estimator ȕˆ GLS is more efficient. (vi) Taylor (1980) has derived that the ȕˆ FGLS is more efficient than least-squares dummy variables (LSDV) estimator but the degrees of freedom will be smaller. (vii) Taylor has derived that the variance of ȕˆ FGLS is never more than 17% above the Cramer-Rao lower bound. The Breusch-Pagan Lagrange Multiplier (LM) Test for Random Effects

The magnitude of the correlation coefficient ȡ of (12.68) plays an important role in the feature of the random-effects model. If the estimated value of ȡ is very high, then it can be said that a large fraction of the total error variance is attributed to individual heterogeneity. In a panel-data regression model, if Įi = 0, for every individual, we can say that there are no individual differences and there is no problem of individual heterogeneity. In this situation for panel data, we have no need to deal with the fixed-effects or random-effects model; the pooled linear regression model will be appropriate. Thus, Breusch-Pagan derived a Lagrange Multiplier-type test statistic to detect the presence of random 2 effects in a panel-data regression model. In a random-effects model, we assume that Įi ~IID(0, ıĮ ),  i, which implies 2

that E(Į i ) = 0, and var(Įi ) = ıĮ . In addition, if var(Įi ) = 0,  i, effectively, this implies that every individual has the 2

same intercept that is a constant with a value equal to zero, and we can run a pooled regression. If var(Įi ) = ıĮ 0, in this case, the correlation coefficient ȡ will be zero and there is no random individual heterogeneity problem in the data. The null hypothesis to be tested for detecting the presence of heterogeneity problem in the data is

H0 : ıĮ2 = 0 against the alternative hypothesis

H1: ıĮ2 > 0 .


The principle of the Breusch-Pagan Lagrange Multiplier (LM) test is very simple and convenient because this test is based on the estimation of the restricted model under the null hypothesis. Under the null hypothesis, i.e., when Į i = 0, the restricted random-effects model can be written as (12.97)

yit = ȕ 0 +x cit ȕ+u it

We know that, if the null hypothesis is true, the least-squares method would be appropriate. The least-squares residuals of the restricted model (12.97) are given by eit

yit  ȕˆ 0  x cit ȕˆ

(12.98)

The Breusch-Pagan Lagrange Multiplier (LM) test is based on these least-squares residuals, and for a balanced panel, it is given by 2 n ª ­T ½ º « ¦ ®¦ eit ¾ » nT « i=1 ¯ t=1 ¿ » LM = 1 n T » 2(T  1) « eit2 » « ¦¦ i=1 t=1 «¬ »¼

2

(12.99)

If the null hypothesis H0 : ıĮ2 = 0, is true, i.e., there are no random effects, the LM statistic is distributed (in large sample) as chi-square with 1 degree of freedom. The term

n

i=1

(12.99) because the term

n

­

i=1

t=1

2

T

t=1

it

½ ¾ differs from the term ¿

n

T

¦¦ e

2 it

of equation

i=1 t=1

2

T

¦ ®¯¦ e

­

¦ ®¯¦ e

it

½ ¾ contains terms like 2ei1ei2  2ei1ei3  2ei2 ei3  2ei2 ei4  ..... . If the null hypothesis ¿

is true, it indicates that the sum of these terms will not be significantly different from zero. Then, the term will be approximately equal to the term

n

T

¦¦ e

2 it

2

n

­T ½ ®¦ eit ¾ ¦ i=1 ¯ t=1 ¿

and the LM test will be approximately zero. If the null hypothesis is

i=1 t=1

not true, the sum of these terms 2ei1ei2  2ei1ei3  2ei2 ei3  2ei2 ei4  ..... will be positive, the term greater than

n

T

¦¦ e

2 it

n

2

­T ½ ® ¦ eit ¾ will be ¦ i=1 ¯ t=1 ¿

, and the LM test will be positive.

i=1 t=1

If we reject the null hypothesis at the α level of significance (with 1 degree of freedom), this indicates the presence of random effects in the data. A computational sketch of the statistic is given below.
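A direct implementation of the LM statistic (12.99) for a balanced panel, using the pooled OLS residuals, can be sketched as follows (illustrative array names; e is the vector of pooled residuals and ids the unit identifiers).

```python
import numpy as np

def breusch_pagan_lm(e, ids, T):
    """Breusch-Pagan LM test for random effects, eq. (12.99), balanced panel."""
    units = np.unique(ids)
    n = len(units)
    sum_sq_unit_sums = sum(e[ids == i].sum() ** 2 for i in units)
    lm = (n * T / (2.0 * (T - 1))) * (sum_sq_unit_sums / np.sum(e ** 2) - 1.0) ** 2
    return lm   # compare with the chi-square(1) critical value
```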

Test for Poolability

For the econometric analysis of economic relationships based on panel data, a common question is whether it is appropriate to pool the data; to answer it, we apply an F-test based on the panel-data regression model. Let us consider the general (unrestricted) panel-data regression model

y_{it} = x_{it}'\beta_i+\alpha_i+u_{it},\quad i = 1, 2,…,N,\ t = 1, 2,…,T   (12.100)

The null hypothesis to be tested is

H₀: β₁ = β₂ = ....... = β_N = β

imposing {(N−1)k} restrictions on equation (12.100), against the alternative hypothesis

H₁: At least one of them is not equal.


Under the null hypothesis, the restricted panel-data regression model can be expressed as (12.101)

yit = x itc ȕ+Įi +u it

where the ȕ parameters are allowed to differ between individuals. Under the null hypothesis, the test statistic is given by F=

(RESS  UESS)/{(N  1)k} ~F^(N-1)k, N(T-k-1))` UESS/{N(T  k  1)}

(12.102)

where RRSS is the restricted residual sum of squares from the OLS estimation of the restricted model (12.101) which is obtained from within regression, and UESS is the unrestricted residual sum of squares from the OLS estimation of the unrestricted regression equation (12.100). UESS is also given by UESS =

N

¦ RSS , i

i=1

Ti

Ti

t=1

t=1

where RSSi = SS yi  SPxy2 i /SSx i , SSxi = ¦ (Xit  X i )2 , SSyi = ¦ (yit -yi ) 2 ,

Ti

SPxyi = ¦ (X it  X i )(yit  yi ), yi = t=1

1 Ti

Ti

¦y

it

, xi =

t=1

1 Ti

Ti

¦x

it

, ȕˆ i =

t=1

SPxyi SSx i

, and Dˆ i

yi  ȕˆ i x i

The rejection of the null hypothesis at a given level of significance implies that the regression coefficients for all individuals in the panel data are not identical. The Fixed-Effects Estimator in a Random-Effects Model

We know that, in a panel-data regression model, the fixed-effects estimator is consistent even when the random error term Įi is correlated with any of the explanatory variables X j's (j =1, 2,..…,k). To show this, let us now consider the random-effects model of type yit = ȕ 0 +ȕ1X1it +ȕ 2 X 2it +...........+ȕ k X kit +Į i +u it

(12.103)

For fixed-effects estimation, we first have to express the panel-data model in the deviation form from the individual means which does not contain the random error term-individual effects Į i . Averaging the observations over time, equation (12.103) can be written as

yi = ȕ0 +ȕ1X1i +ȕ2 X2i +...........+ȕk Xki +Įi +ui where yi =

(12.104)

1 T 1 T 1 T y it , X ji = ¦ X jit , j = 1, 2,…,k. and u i = ¦ u it . ¦ T t=1 T t=1 T t=1

Subtracting equation (12.104) from (12.103), we have

yit  yi = ȕ1 (X1it  X1i )+ȕ2 (X2it  X2i )+...........+ȕik (Xkit  Xki )+(uit  ui ) yit  yi

(Xit  Xi )cȕ  (u it  ui )

(12.105)

where X it  X i = ª¬ X1it  X1 , X 2it  X 2 ,...........,X kit  X k º¼c , and ȕ = >ȕ1 , ȕ2 ,.......,ȕ k @c . Equation (12.105) is exactly the same as equation (12.41). The transformed equation (12.104) for the panel data eliminates the random effects Įi as well as any other time-invariant components. Thus, the least-squares estimator of ȕ of equation (12.105) will be identical to the fixed-effects estimator and will be consistent for large n whether the random effects Įi is correlated with the independent variable(s) or not.


Comparison of Between, GLS, OLS and Within Estimators

We have -1

§ · § · 1 1 ¨ X cQX+ X cBX ¸ ¨ X cQY+ X cBY ¸ ȥ ȥ © ¹ © ¹

ȕˆ GLS

-1

-1

§ · § · ­1 ½ 1 1 ¨ X cQX+ X cBX ¸ (X cQY)+ ¨ X cQX+ X cBX ¸ ® X cBY ¾ ȥ ȥ © ¹ © ¹ ¯ȥ ¿ ȕˆ W

X cQX

-1

(12.106)

(X cQY)

(12.107)

(X cBY)

(12.108)

and ȕˆ B

X cBX

-1

Equation (12.106) can be written as § · 1 ¨ X cQX+ X cBX ¸ ȥ © ¹

ȕˆ GLS

-1

-1

X cQX X cQX

-1

§ · 1 1 -1 (X cQY) + ¨ X cQX+ X cBX ¸ X cBX X cBX X cBY ȥ © ¹ ȥ

= W1ȕˆ W +W2ȕˆ B

where W1

(12.109)

§ · 1 ¨ X cQX+ X cBX ¸ ȥ © ¹

-1

-1

XcQX ,

and W2

§ · 1 1 X cBX . ¨ X cQX+ X cBX ¸ ȥ © ¹ ȥ

Thus, it can be said that the generalised least-squares estimator of ȕ or the random-effects estimator is the weighted sum of the within and between estimators. The weighted matrices are W1 and W2 respectively. (i) If ıĮ2

0, then

1

(ii) If T o f , , then

(iii) If

1

\

1, and we have ȕˆ GLS = ȕˆ OLS .

\ 1

\

o 0 , and then, ȕˆ GLS o ȕˆ W .

o f , then ȕˆ GLS o ȕˆ B .

(iv) var(ȕˆ W )  var(ȕˆ GLS ) is a positive semidefinite matrix. (v) If

1

\

o 0 , then var(ȕˆ W ) o var(ȕˆ GLS ) .

Summary of Different Estimators

We have seen that, in a panel analysis, there have been different types of estimators for the parameter vector ȕ. There are three ways to estimate ȕ : The Between Estimator

The regression equation is yi = ȕ 0 +x icȕ+İi

Assumptions: (i) n is large. (ii) E(x i Įi ) = 0.

(12.110)


(iii) E(x i u i ) = 0. This means that the explanatory variables are strictly exogenous and uncorrelated with the individual specific effect Įi . Applying the OLS method to equation (12.110), we have ª n º ȕˆ B = « ¦ (x i  x)(x i  x)c» ¬ i=1 ¼

-1

n

¦ (x

i

 x)(yi  y)

(12.111)

i=1

The Fixed Effects or Within Estimator

The regression equation will be of the type (12.112)

yit  yi = (x it  x i )cȕ+(u it  u i )

Assumptions:

(i) n is large or T is large. (ii) Įi ~IID(0, ıD2 ),  i. (iii) uit ~IID(0, ı2u ),  i and t. (iv) There is no perfect multicollinearity. (v) E{(x it  x i )u it } 0. This indicates that explanatory variables are strictly exogenous, but it does not impose any restrictions upon the relationship between xit and Įi . Applying the OLS method to equation (12.112), the estimator which is obtained is called the within estimator as given by ª n T º ȕˆ W = « ¦ ¦ (x it  x i )(x it  x i )c » ¬ i=1 t=1 ¼

-1

n

T

¦ ¦ (x

it

 x i )(y it  yi )

(12.113)

i=1 t=1

The OLS Estimator

The OLS estimator combines the within and between dimensions of the data. The regression equation is of the type yit = ȕ 0 +ȕ1X1it +ȕ 2 X 2it +...........+ȕ k X kit  Įi +u it

Assumptions:

(i) n is large or T is large. (ii) E ^X jit İ it )` = 0,  j, i and t. (iii) E(X jit ,Įi ) = 0,  j, i and t. (iv) E(X jit ,u it ) = 0,  j, i and t. (v) uit ~IID(0, ı2u ),  i and t. (vi) Įi ~IID(0, ıĮ2 ),  i. (vii) E(u it ,Į i ) = 0,  i and t. (ix) There is no multicollinearity problem.

(12.114)


Applying the OLS method to equation (12.114), the estimator which is obtained is called the OLS estimator as given by ª n T º ȕˆ OLS = « ¦¦ (X it  X i )(X it  X i )c » ¬ i=1 t=1 ¼

1

n

T

¦¦ (X

it

 X i )(y it  y)

(12.115)

i=1 t=1

The OLS estimators exploit the dimensions both within and between but not efficiently. The Random Effects Estimator

The random-effects estimator also combines both the within and between dimensions of the data. The regression equation is of the type (12.116)

yit = ȕ 0 +x cit ȕ+Įi +u it

Assumptions:

(i) n is large or T is large. (ii) Įi ~IID(0, ıĮ2 ),  i. (iii) uit ~IID(0, ı2u ),  i and t. (iv) E(X jit ,Įi ) = 0, j, i and t. (v) E(X jit ,uit ) = 0,  i, j and t. (vi) There is no perfect multicollinearity. The generalised least-squares estimator of ȕ of equation (12.116) is called the random-effects estimator which is given by -1 ȕˆ RE = ª¬Xcȍ-1X º¼ Xcȍ-1Y -1

ª n T º 1 n = « ¦ ¦ (X it  X i )(X it  X i )c+ T ¦ (X i  X)(X i  X)c» u \ i=1 ¬ i=1 t=1 ¼

ª n T º 1 n « ¦¦ (X it  X i )(yit  yi )+ \ T ¦ (X i  X)(Yi  Y) » i=1 ¬ i=1 t=1 ¼

(12.117)

This is also called the GLS estimation which exploits the combination of the between and within data set (differences between individual and within individuals) and determines the GLS estimator in the regression of individual y on individual x at time t. The First-Difference Estimator

The regression equation will be of the type ǻyit = ȕ1ǻX1it +ȕ 2 ǻX 2it +...........+ȕ k ǻX kit +ǻu it 'X itc ȕ+'u it

Assumptions:

(i) n is large. (ii) Įi ~IID(0, ıD2 ),  i. (iii) uit ~IID(0, ı2u ),  i and t. (iv) Each X j (j =1, 2,….…,k) varies over time t and across individuals i.

(12.118)


(vi) There is no perfect multicollinearity. (vii) E ^(X it  X it-1 )(u it  u it-1 )` 0. Applying the OLS method to equation (12.118), the estimator which is obtained is called the first-difference estimator of ȕ which is given by ª n T º ȕˆ FD = « ¦¦ ǻX it ǻX cit » ¬ i=1 t=1 ¼

1

n

T

¦¦ ǻX

it

(12.119)

ǻy it

i=1 t=1

This estimator is an alternative of the fixed-effects estimator based on the within transformation of the panel data. This estimator exploits the time variation in the panel data. Fixed Effects or Random Effects

It is very difficult to choose between the fixed-effects model and the random-effects model in many panel analyses of economic relationships, especially when T is small, and in the econometric literature this choice has generated considerable debate. Mundlak (1961) and Wallace and Hussain (1969) argued in favour of the fixed-effects model, whereas Balestra and Nerlove (1966) advocated the random-effects model. In many applications, applied researchers argue that the random-effects model is more appropriate if the sample observations are selected at random from the population, and that the fixed-effects model is more appropriate if the sample contains essentially the whole population of interest; for example, if we consider all the listed companies of the DSE, the fixed-effects model will be appropriate. The transformation involved in a random-effects model does not remove explanatory variables that are constant over time, so their effects on the dependent variable can still be estimated. Since the random-effects model does not include dummy variables for the individual effects, fewer parameters have to be estimated and fewer degrees of freedom are lost, and the GLS estimator will be more efficient than the fixed-effects estimator. However, the random-effects model assumes that the unobserved effects are uncorrelated with the explanatory variables, and in many cases we might expect this correlation to be non-zero; if so, the random-effects estimator is inconsistent because of the omitted individual effects. If this assumption is not satisfied, the fixed-effects model is preferable, although the fixed-effects estimator is then consistent but inefficient. To test whether this assumption holds, a specification test was developed by Hausman (1978), based on the difference between the fixed-effects and random-effects estimators. Another test was developed by Chamberlain (1982), based on restrictions on the parameters of the fixed-effects model; one can test the validity of these restrictions before applying the fixed-effects model. Different software packages, including RATS, STATA, EViews, TSP, LIMDEP and R, can be applied directly to estimate fixed-effects and random-effects models and to carry out the associated Hausman (1978) and Chamberlain (1982) tests. We should therefore not stop at estimation: we have to test the restrictions applicable to the fixed-effects model (Chamberlain, 1982) and check whether the unobserved effects and the explanatory variables are correlated.

Hausman Test for Panel Data

The Hausman (1978) test is applied to check for the existence of a relationship between the error component α_i and the independent variables X's. In panel data, it helps in choosing between the fixed-effects and random-effects models by testing the hypothesis of independence between the X's and the individual-specific effects. The random-effects model assumes that the individual effect α_i is uncorrelated with the X's, but in many cases this assumption is not true; that is why we need the Hausman test of the adequacy of the random-effects model. The test is based on a comparison of the estimated coefficients from the random-effects model with those from the fixed-effects model. If there is no correlation between the error components and the regressors, both the fixed-effects and random-effects estimators are consistent. Thus, the adequacy of the random-effects model is tested through the following null hypothesis:

H₀: The error term α_i is uncorrelated with the x's

against the alternative hypothesis H1 : The error term Į i is correlated with the x's .

Under H 0 , the fixed-effects and the random-effects estimators are consistent, but the random-effects estimator is more efficient. Under H , the fixed-effects estimator ȕˆ is still consistent, but the random-effects estimator ȕˆ is 1

FE

RE

inconsistent. Thus, a test based on the difference between these two estimators may indicate that endogeneity affects the consistency of the random-effects estimator.


For the regression model in panel data, the Hausman specification test is based on

\mathrm{var}(\hat{\beta}_{FE}-\hat{\beta}_{RE}) = \mathrm{var}(\hat{\beta}_{FE})+\mathrm{var}(\hat{\beta}_{RE})-2\,\mathrm{cov}(\hat{\beta}_{FE},\hat{\beta}_{RE})   (12.120)

If the null hypothesis is true, we have

\mathrm{cov}(\hat{\beta}_{FE},\hat{\beta}_{RE}) = \mathrm{var}(\hat{\beta}_{RE})   (12.121)

Therefore, equation (12.120) can be written as

\mathrm{var}(\hat{\beta}_{FE}-\hat{\beta}_{RE}) = \mathrm{var}(\hat{\beta}_{FE})-\mathrm{var}(\hat{\beta}_{RE})   (12.122)

Under the null hypothesis, the Hausman test statistic is given by

h = (\hat{\beta}_{FE}-\hat{\beta}_{RE})'\big[\mathrm{var}(\hat{\beta}_{FE})-\mathrm{var}(\hat{\beta}_{RE})\big]^{-1}(\hat{\beta}_{FE}-\hat{\beta}_{RE}) \sim \chi^2_k   (12.123)

So, the Hausman test examines whether the fixed-effects and random-effects estimators are significantly different. If h is statistically significant, we should not use the random-effects estimator; in that case, we may also conduct a test for the significance of the individual effects. If h is low, i.e., less than the critical value, we do not reject the null hypothesis and conclude that endogeneity does not affect the consistency of the random-effects estimator, so we can apply the random-effects model. If h is high, i.e., greater than the critical value, we reject the null hypothesis and conclude that endogeneity is a problem in the random-effects model, so we should use the fixed-effects model. A computational sketch is given below.
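Given the FE and RE coefficient vectors and their estimated covariance matrices, the statistic of equation (12.123) is a single quadratic form. The sketch below is illustrative; the use of a pseudo-inverse, in case the variance difference is not positive definite in a finite sample, is an implementation choice and not part of the text.

```python
import numpy as np

def hausman(b_fe, V_fe, b_re, V_re):
    """Hausman statistic h of eq. (12.123); compare with the chi-square(k) critical value."""
    d = b_fe - b_re
    Vd = V_fe - V_re
    return float(d @ np.linalg.pinv(Vd) @ d)
```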

Goodness-of-Fit of Panel Estimation

The goodness of fit measures what portion of the total variation of the dependent variable is explained by the fitted regression equation. In panel estimation, reporting a single goodness-of-fit measure is less common, because in panel analyses we may care separately about explaining the within and the between variation in the data. Another reason is that the R² and adjusted R² criteria are only appropriate when the model is estimated by OLS. Here, R² is defined as the squared correlation coefficient between the actual and fitted values. In this context, the total sum of squares of the variable y can be partitioned into component sums of squares, i.e., the within sum of squares (WSS) and the between sum of squares (BSS). Thus, the total sum of squares is given by TSS =

n

T

¦¦ (y

it

 y) 2

i=1 t=1 n

T

= ¦¦ (yit  yi +yi  y) 2 i=1 t=1 n

T

n

T

n

T

= ¦¦ (yit  yi ) 2 +2¦¦ (yit  yi )(yi  y)+¦¦ (yi  y) 2 i=1 t=1 n

T

i=1 t=1

i=1 t=1

n

= ¦¦ (yit  yi ) 2 +T ¦ (yi  y) 2 i=1 t=1

i =1

= WSS + BSS 1 n T ¦¦ yit , is the overall sample mean and yi nT i=1 t=1 is obtained by averaging over time.

where y =

(12.124) 1 T ¦ yit is the sample average of the ith individual which T t=1

Therefore, the total variation of the dependent variable y can be expressed as the sum of the within variation and between variation, i.e., 1 n T 1 n T 1 n (yit  y) 2 = (yit  yi ) 2 + ¦ (yi  y) 2 ¦¦ ¦¦ nT i=1 t=1 nT i=1 t=1 n i=1

(12.125)

We can define alternative versions of R2 measure, depending upon the dimension of the data that we are interested in. For example, the fixed-effects estimator is chosen to explain the within variation as much possible, and thus, maximise the within R2 which is defined as


667

(12.126)

The correlation between (yˆ itFE  yˆ iFE ) and (yit  yi ) is given by

corr (yˆ itFE  yˆ iFE ,yit  yi ) =

cov(yˆ itFE  yˆ iFE ,yit  yi ) var(yˆ itFE  yˆ iFE )var(yit  yi )

(12.127)

where yˆ itFE  yˆ iFE (x it  x i )cȕˆ FE and corr 2 indicates the squared correlation coefficient. We know that the between estimator ȕˆ is the OLS estimator in the panel-data model in terms of individual means that maximises the between R 2 B

and is defined as R 2B (ȕˆ B ) = corr 2 (yˆ iB , yi )

(12.128)

The correlation coefficient between yˆ iB and yi is given by

corr (yˆ iB ,yi ) = where yˆ iB

cov(yˆ iB ,yi )

(12.129)

var(yˆ iB )var(yi )

x icȕˆ B .

The OLS estimator ȕˆ maximises the overall goodness-of-fit, and thus, the overall R2 is defined as 2 ˆ = corr 2 (yˆ ,y ) R OV (ȕ) it it

(12.130)

The correlation coefficient between yˆ it and yit is given by corr (yˆ it ,yit ) =

cov(yˆ it ,y it )

(12.131)

var(yˆ it ) var(yit )

where yˆ it = x cit ȕˆ OLS For an arbitrary estimator ȕˆ of ȕ of a panel regression model, we can also define the within, between and overall R 2s by using a fitted value yˆ it

x cit ȕˆ , yˆ i

1 T 1 n T yˆ it , and yˆ = ¦ ¦¦ yˆ it , where the intercept terms are omitted from the T t=1 nT i=1 t=1

regression equation. For the fixed-effects estimator, we ignore the variation that is captured by Įˆ 1 , Įˆ 2 ,......, Įˆ n . It can be said that the fixedeffects model fits the between variation very well if we consider the variation that happens due to n estimated intercepts Įˆ 1 , Įˆ 2 ,......,Įˆ n respectively. It is somewhat unrealistic and very difficult to say that the variation between individuals is explained by fixed effects Įˆ i , they just capture it. It would be better to ignore this part of the model because we are not often computing Įˆ i . The three measures are defined in terms of the squared correlation coefficient that can be estimated by considering any of the estimators. If we consider the random-effects estimator which would be the most efficient estimator under the assumption that the individual effects Įi is uncorrelated with X’s, the within, between, and overall R2s will be smaller than the fixed effects, between, and the OLS estimators respectively. The goodness of fit between pooled model which is estimated by the OLS method and the fixed-effects model can be compared based on the usual R2. The usual R2 is not valid for comparison between the fixed-effects model and the random-effects model because in the fixed-effects model Įi 's are considered explanatory variables, while in the random-effects model, they are considered the random error terms. Based on the computed R2, comparison should be done only within the same class of models and estimators. Test for Autocorrelation in Fixed Effects Regression Models

The following steps are involved in testing the presence of serial correlation in the error terms of a fixed effects regression model.


Step 1: Suppose, we have panel data of (yit , x it ), for i = 1, 2, …..,n, and t = 1, 2,. . . , T. Here, yit is the scalar dependent variable and xit be the vector of k (k t 2) independent variables. Thus, in the first step, we consider the fixed-effects linear regression model of the type yit = ȕ 0 +ȕ1X1it +ȕ 2 X 2it +...........+ȕ k X kit +Į i +u it

(12.132)

= ȕ 0 +x itc ȕ+Įi +u it

where ȕ is a vector of parameters; Įi is the fixed effects of the ith individual; uit is the unobservable random error terms. We assume that xit is uncorrelated with u it , , i.e., xit is exogenous. We can apply the OLS method to equation (12.132) to perform the test for autocorrelation and it is less complex relative to the random-effects model. In the random-effects model, the assumptions are Įi ~IID(0, ıĮ2 ),  i and t, and E(Į i , x it ) = 0,  i and t. Under these assumptions, we can also apply the OLS method to the random-effects model. Step 2: Now, we consider the first order autocorrelation model of the type

uit = ȡui,t-1 +vit

(12.133)

where vit is IID across individuals and time. Here, the assumption is ȡi = ȡ . The null hypothesis to be tested is H0 : ȡ = 0

against the alternative hypothesis H1: ȡ z 0 .

Step 3: Next, we apply the OLS method to equation (12.135), and then, we obtain the OLS residuals eit 's which are the estimated value of u it 's and given by eit = (yit  yi )  (x it  x i )cȕˆ FE , i and t

(12.134)

where ȕˆ FE is the fixed-effects estimator of ȕ as given by ª n T º ȕˆ FE = « ¦ ¦ (x it  x i )(x it  x i )c » ¬ i=1 t=1 ¼

1

n

T

¦ ¦ (x

it

 x i )(yit  yi )

(12.135)

i=1 t=1

Step 4: Then, we apply the appropriate test statistic for detecting the presence of autocorrelation. Bhargava, Franzini and Narendranathan (1983) suggested the following generalisation of the Durbin-Watson statistic as given by n

DWp =

T

¦¦ (e i=1 t=1 n

it T

 eit-1 ) 2

(12.136)

¦¦ e

2 it

i=1 t=1

We can compare the calculated value with the table values proposed by Bhargava, Franzini and Narendranathan (1983) for different values of n, T, and k. Under the null hypothesis, the Breusch and Godfrey (1981) LM test is given by 2

LM =

nT 2 ª ece-1 º ~ Ȥ 12 T  1 ¬« ecec ¼»

where the residual vector e is given by e = y  xȕˆ FE

Residauls of mean deviation regression.

The rejection of the null hypothesis implies the presence of autocorrelation in the data.

(12.137)


LM Test for Serial Correlation in Random-effects Regression Models

The following steps are involved in testing the presence of serial correlation in the error terms of a random-effects regression model. Step 1: Now, we consider the random-effects linear regression model of the type

(12.138)

yit = x cit ȕ+Įi +u it

where yit (i =1, 2,….., n, t = 1, 2,…..,T) is the scalar dependent variable and xit is the vector of k (k t 2) independent variables; ȕ is a vector of parameters; Įi is the random effect of the ith individual; Įi ~IID(0, ıĮ2 ),  i and t, uit is the unobservable random error terms; and uit ~IID(0, ı2u ) . We assume that uit =ȡui,t-1 +vit , where vit is IID across individuals and time. Step 2: Then, we set up the null hypothesis against an appropriate alternative hypothesis. The null hypothesis to be tested is Case 1: Assuming ıĮ2 = 0, or the pooled model H0 : ȡ = 0

against the alternative hypothesis H1: ȡ z 0 .

Case 2: Assuming ıĮ2 >0, or the random-effects model H0 : ȡ = 0

against the alternative hypothesis H1: ȡ z 0 .

Case 3: Testing the joint null hypothesis

H0 : ȡ = 0, ıĮ2 = 0 against the alternative hypothesis

H1: ȡ z 0, ıĮ2 z 0 . Step 3: Under the null hypothesis, the Breusch and Godfrey (1981) LM test statistic for case 1 is given by 2

LMV 2 D

ª n T º 2 « ¦¦ eit eit-1 » nT i=1 t=2 « » ~Ȥ12 (ȡ=0) = 0 T 1 « n T 2 » eit » «¬ ¦¦ i=1 t= ¼

(12.139)

The joint LM test statistic for case 3 is given by LM(ı 2u =0, ȡ =0) =

nT A 2  4AB+2TB2 ~Ȥ 22 2(T  1)(T  2) n

where A

ec(I N … J T )e 1 ece

§

i=1

2

T

¦ ¨© ¦ e t=1 n T

¦¦ e i=1 t=1

· ¸ ¹  1 , and B = ece-1 = ece 2

it

it

(12.140) n

T

¦¦ e e

it it-1

i=1 t=2 n T

¦¦ e i=1 t=1

2 it


where eit 's are the OLS residuals from the restricted model, i.e., from the pooled regression model with no autocorrelation. The rejection of the null hypothesis indicates the presence of autocorrelation in the panel data. We apply the OLS method to equation (12.138) and obtain the OLS residuals eit which are the estimated value of uit as given by eit = (yit  yi )  (x it  x i )cȕˆ FE ,  i and t

(12.141)

where ȕˆ FE is the fixed-effects estimator of ȕ and given by ª n T º ȕˆ FE = « ¦ ¦ (x it  x i )(x it  x i )c » ¬ i=1 t=1 ¼

1

n

T

¦ ¦ (x

it

 x i )(yit  yi )

(12.142)

i=1 t=1

Step 4: Now, we apply the appropriate test statistic for detecting the presence of autocorrelation. Bhargava, Franzini and Narendranathan (1983) suggested the following generalisation of the Durbin-Watson statistic as given by n

DWp =

T

¦¦ (e i=1 t=1 n

it

 eit-1 ) 2

(12.143)

T

¦¦ eit2 i=1 t=1

We can compare the calculated value with the table values proposed by Bhargava, Franzini and Narendranathan (1983) for different values of n, T and k. Under the null hypothesis, the Breusch and Godfrey (1981) LM test is given by 2

LM =

nT 2 ª ece-1 º ~Ȥ 12 T  1 «¬ ecec »¼

(12.144)

where the residual vector e is given by e = y  xȕˆ FE

Residauls of mean deviation regression.

Estimation of AR(1) Models (Paris-Winsten)

Let us consider the fixed-effects model of the type yit = x cit ȕ+Įi +u it

(12.145)

where yit (i =1, 2,….., N, t = 1, 2,…..,T) is the scalar dependent variable and x it is the vector of k (k t 2) independent variables; ȕ is the vector of parameters; Įi is the fixed effect of the ith individual; uit is the unobservable random error terms; and u it ~IID(0, ı2u ). We assume that uit = ȡui,t-1 +vit , where vit is IID across individuals and time. To estimate the AR(1) model, we obtain the OLS residuals eit which are given by eit = y it  x it' ȕˆ  Įˆ i ,  i and t

where ȕˆ is the OLS estimator of ȕ obtained from equation (12.145). We obtain the first-round estimate of ȡ which is given by

(12.146)


Ti

¦¦ e e

it it-1

ȡˆ =

i=1 t=2 n Ti

¦¦

(12.147)

eit2

i=1 t=2

Using the first-round estimate of ȡ , we transform the variables such that ˆ it-1 , X*jit y*it = yit  ȡy

ˆ jit-1 , and u *it = u it  ȡu ˆ it-1 , i, j, and t >1 . X jit  ȡX

For t=1, the transformation will be

y*i1 = 1  ȡˆ 2 yi1 , X*ji1

1  ȡˆ 2 X ji1 ,  i and j, and u*i1 = 1  ȡˆ 2 u i1 .

The transformed regression equation is given by y*it

ˆ ˆi x*itcȕ+Į*i +u*it , where Į*i = Įˆ i  ȡĮ

(12.148)

We apply the OLS method to run equation (12.148) and obtain the least-squares estimators of the parameters. Denoting ˆˆ and Įˆˆ , and using these second-round estimators, we then compute the second-round estimates of the parameters by ȕ, i

the second-round residuals as given by ˆ eˆˆ it = yit  x it' ȕˆ  Įˆˆ i ,  i and t

(12.149)

and then, we obtain the second-round estimate of ȡ as given by n

ȡˆˆ =

Ti

¦¦ eˆˆ eˆˆ

it it-1

i=1 t=2 n Ti

¦¦

(12.150)

eˆˆ it2

i=1 t=2

This iterative procedure is repeated until the value of the estimate of ȡ and ȕ converges. Test for Heteroscedasticity

Let us consider the fixed-effects model of the type (12.151)

yit = x itc ȕ+Įi +u it

where var(Į i ) =

ı Į2 ,

­ı 2u ° and var(u it ) = ® it . 2 °¯ ı u i

Let us define, eit is the fixed effect residual which is the estimated value of uit . To test for heteroscedasticity, we have to define the auxiliary regression equation of the type

eit2 = Į0 +Į1z1it +Į2 z 2it +.........+Įk zkit +vit

(12.152)

Or, we can write that

var(u it ) = ı2 h(zcit Į)

(12.153)

where z jit 's are the function of explanatory variables and vit is IID across individuals and time. The null hypothesis to be tested is H 0 : Į1 = Į 2 =..........= Į k = 0


against the alternative hypothesis H1 : At least one of them is not zero.

Under the null hypothesis, the test statistic is given by

LR = n(T-1)R e2 ~Ȥ 2k

(12.154)

where R e2 is the goodness of fit of the auxiliary regression equation. Rejection of the null hypothesis at a given level of significance with k degrees of freedom indicates the presence of heteroscedasticity in the data. Ex. 12-1: For the data of the GDP (constant 2010 $), the total labour force (LF), and domestic investment (constant 2010 $) of G7 countries namely: Canada, France, Germany, Italy, Japan, the United Kingdom and the United States2:

(i) Estimate the Cobb-Douglas production function considering pooled model, fixed effects and random-effects models and comment on your results. (ii) Test the joint significance of the fixed effects. (iii) Compare the coefficients between the pooled model and the fixed-effects model. (iv) Compare the individual coefficients between the FE and RE models. (v) Test the adequacy of the random-effects model. Solution: Let us consider the pooled regression model of the type GDPit = A 0 LFitȕ1 CAPitȕ2 e İit

(12.155)

In equation (12.155), the subscript i (i =1, 2,….,N) denotes the ith country and the subscript t ( t =1, 2,….,T) denotes the tth time period. The variable GDPit indicates the gross domestic product (GDP, constant 2010 $) of the ith country at time t; LFit indicates the total labour force of the ith country at time t; and CAPit is the capital investment (constant 2010 $) of the ith country at time t. İ it indicates the random error term corresponding to the ith and tth set of observation. A 0 is a constant; the regression coefficient ȕ1 is the output elasticity with respect to labour force; and ȕ 2 is the output elasticity with respect to capital investment for the panel. Taking the logarithm of equation (12.155), it can be written as ln(GDPit ) = ln(A 0 ) + ȕ1ln(LFit ) + ȕ 2 ln(CAPit ) + İ it Yit = ȕ 0 + ȕ1X1it +ȕ 2 X 2it + İ it

where ȕ0 = ln(A 0 ), Yit = ln(GDPit ), X1it = ln(LFit ), and X 2it = ln(CAPit ) . Assumptions:

(i) Model (12.156) is linear in parameters (ii) E(İ it ) = 0,  i and t. (iii) var(İit ) = E(İit2 ) = ıİ2 . (iv) cov(İit ,İ js )

E(İit ,İ js ) = 0, for i z j, or t z s.

(v) cov(X jit ,İit ) 0,  j, i and t. (vi) There is no relationship between variables labour force and capital investment.

2

For each country, the time-series data is used from 1972 to 2018.

(12.156)


Assuming that the regression coefficients ȕ1 , and ȕ2 are constant for all countries, the logarithmic form of the fixedeffects model is given by yit = ȕ 0 +Į i +ȕ1X1it +ȕ 2 X 2it +u it

(12.157)

Assumptions:

(i) The Įi 's are assumed to be fixed parameters to be estimated. (ii) The disturbance terms u it 's are independently and identically distributed, with zero mean and constant variance. i.e.,

uit ~IID(0, ı2u ),  i and t. (iii) The variables X1 and X 2 are assumed to be independent of the u's , i.e., cov(X jit ,uit ) = 0,  j, i and t. (iv) There is no relationship between the variables labour force and capital investment. The logarithmic form of the random-effects model is given by yit = ȕ 0 +ȕ1X1it +ȕ 2 X 2it +Įi +u it

(12.158)

Assumptions:

(i) Įi ~IID(0, ıĮ2 ),  i. (ii) uit ~IID(0, ı2u ),  i and t. (iii) E(X jit ,Įi ) = 0, j, i and t. (iv) E(X jit ,uit ) = 0,  i, j and t. (v) There is no relationship between the variables labour force and capital investment. The pooled least-squares, fixed-effects and random-effects estimations for the Cobb-Douglas production function for the given panel are given in Tables 12-1, 12-2, 12-3, 12-4 and 12-5. All the results are obtained using STATA. Table 12-1: The OLS estimates of the pooled regression model

The OLS estimates of the pooled regression model
Variable    Coefficient   Std. Error   t-value   p-value   95% Conf. Interval
Constant    8.7399        0.4193       20.84     0.000     [7.9150, 9.5649]
ln(LF)      0.4778        0.0301       15.85     0.000     [0.4185, 0.5371]
ln(CAP)     0.5488        0.0261       21.05     0.000     [0.4975, 0.6001]
Number of Observations 329; R2 0.9739; Adjusted R2 0.9737; Regression SS 184.899; Regression MS 92.4495; Residual SS 4.9552; Residual MS 0.0152; F(2, 326) 6082.17; Prob of F 0.0000

Comments: From the least-squares estimates of the pooled regression model in Table 12-1, it is found that the output elasticity with respect to labour force is 47.78% and the output elasticity with respect to capital investment is 54.88%, and both are statistically significant at any significance level. From the estimated results, it can be said that the Cobb-Douglas production function exhibits approximately constant returns to scale. It is also found that about 97.37% of the total variation in ln(GDP) is explained by the fitted regression equation and the remaining 2.63% is explained by random factors. Thus, it can be said that the fit is very good.
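For readers who want to reproduce the pooled estimates of Table 12-1 outside STATA, they follow directly from the OLS formula. A minimal Python sketch, assuming a long-format data file g7_panel.csv (a hypothetical file name) with columns country, year, GDP, LF and CAP:

import numpy as np
import pandas as pd

# Hypothetical long-format panel: one row per (country, year) with the level variables.
df = pd.read_csv("g7_panel.csv")
y = np.log(df["GDP"].to_numpy(dtype=float))
X = np.column_stack([np.ones(len(df)),                 # constant
                     np.log(df["LF"].to_numpy(dtype=float)),
                     np.log(df["CAP"].to_numpy(dtype=float))])

b = np.linalg.solve(X.T @ X, X.T @ y)                  # pooled OLS: (X'X)^(-1) X'y
e = y - X @ b
n_obs, k = X.shape
s2 = e @ e / (n_obs - k)                               # residual mean square
se = np.sqrt(np.diag(s2 * np.linalg.inv(X.T @ X)))     # standard errors
R2 = 1 - (e @ e) / np.sum((y - y.mean()) ** 2)
print(np.round(b, 4), np.round(se, 4), round(R2, 4))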


Table 12-2: The least-squares estimates of the dummy variable regression model (LSDV)

Least-squares estimates of the dummy variable regression model (LSDV)
Variable    Coefficient   Std. Error   t-value   p-value   95% Conf. Interval
Constant    3.7835        0.3685       10.27     0.000     [3.0587, 4.5085]
ln(LF)      0.5374        0.0656       8.19      0.000     [0.4083, 0.6665]
ln(CAP)     0.7011        0.0311       22.56     0.000     [0.6399, 0.7622]
Country-2   0.3863        0.0540       7.15      0.000     [0.2800, 0.4925]
Country-3   0.3891        0.0882       4.41      0.000     [0.2155, 0.5627]
Country-4   -0.1209       0.0393       -3.07     0.002     [-0.1983, -0.0435]
Country-5   0.2717        0.0686       3.96      0.000     [0.1367, 0.4066]
Country-6   0.3620        0.0713       5.08      0.000     [0.2218, 0.5022]
Country-7   0.2382        0.0560       4.25      0.000     [0.1279, 0.3484]
Number of Observations 329; Residual SS (df = 320) 1.8398; Residual MS 0.0057; R2 0.9903; Adjusted R2 0.9901; F(8, 320) 4087.63; Regression SS (df = 8) 188.0144; Regression MS 23.5018; Prob of F 0.0000

The least-squares dummy variable model (LSDV) provides a good understanding of the fixed-effects model. The effects of labour force and capital investment are mediated by the differences across countries. By adding a dummy variable for each country, we are estimating the pure effects of labour force and capital investment (by controlling for the unobserved heterogeneity); each dummy absorbs the effects particular to its country. From the LSDV estimates of the fixed-effects model, it is found that the output elasticity with respect to labour force is 53.74% and the output elasticity with respect to capital investment is 70.11%, and both are statistically significant at any significance level. From the estimated results, it can be said that the Cobb-Douglas production function exhibits increasing returns to scale. It is also found that, except for Japan, the country dummies have significant positive effects, while Japan has a significant negative effect on economic growth. About 99.03% of the total variation in ln(GDP) is explained by the fitted regression equation and the remaining 0.97% is explained by random factors. Thus, it can be said that the fit is very good.

Table 12-3: The least-squares estimates of the fixed-effects model

Least-squares estimates of the fixed-effects model
Variable   Coeff.   Std. Error   t-value   p-value   95% Conf. Interval
Constant   4.0016   0.3589       11.15     0.000     [3.2954, 4.7078]
ln(LF)     0.5374   0.0656       8.19      0.000     [0.4082, 0.6665]
ln(CAP)    0.7011   0.0311       22.55     0.000     [0.6399, 0.7522]
R2-Within 0.9423; R2-Between 0.9867; R2-Overall 0.9738; F(2, 320) 2610.83; Prob. of F 0.0000
σ̂_u 0.0758; σ̂_α 0.2017; corr(α_i, Xb) -0.852; ρ̂ (fraction of variance due to α_i) 0.8761

From the estimates of the fixed-effects model, it is found that the output elasticity with respect to labour force is 53.74% and the output elasticity with respect to capital investment is 70.11%, and both are statistically significant at any significance level. From these estimated elasticities, it can be said that the production function exhibits increasing returns to scale for this panel. From the estimated results, it is concluded that 97.38% of the total variation of ln(GDP) is explained by the fitted regression equation and the remaining 2.62% is explained by random factors. The estimates of the error components (their standard errors) are σ̂_α = 0.2017 and σ̂_u = 0.0758 for the fixed-effects model.
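The within (fixed-effects) slopes of Table 12-3 can be reproduced by sweeping out the country effects through demeaning. A rough sketch continuing from the previous block (same assumed DataFrame df and column names):

# Create the log variables once.
for col in ["GDP", "LF", "CAP"]:
    df["l" + col.lower()] = np.log(df[col].astype(float))

cols = ["lgdp", "llf", "lcap"]
# Within transformation: subtract each country's mean, which removes alpha_i.
dm = df[cols] - df.groupby("country")[cols].transform("mean")
yw = dm["lgdp"].to_numpy()
Xw = dm[["llf", "lcap"]].to_numpy()
beta_fe = np.linalg.solve(Xw.T @ Xw, Xw.T @ yw)        # slopes (about 0.537 and 0.701 here)

# Country effects recovered from the group means: alpha_i = mean(y_i) - mean(x_i)' beta
means = df.groupby("country")[cols].mean()
alpha_i = means["lgdp"] - means[["llf", "lcap"]].to_numpy() @ beta_fe
print(np.round(beta_fe, 4))
print(alpha_i.round(4))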


Table 12-4: The least-squares estimate between-group regression (regression on group means)

Least-squares estimates of the between-group regression (regression on group means)
Variable   Coefficient   Std. Error   t-value   p-value   95% Conf. Interval
Constant   13.6958       2.4867       5.51      0.005     [6.7916, 20.6000]
ln(LF)     0.7707        0.1654       4.66      0.010     [0.3115, 1.2299]
ln(CAP)    0.2517        0.1527       1.65      0.175     [-0.1721, 0.6757]
R2-Within 0.9233; R2-Between 0.9936; R2-Overall 0.9641; Number of obs 329; Number of groups 7; Obs per group: min 47, avg 47, max 47; F(2, 4) 308.30; Prob. of F 0.0000; sd(u_i + avg(e_i.)) 0.0736
Comments: From the OLS estimates of the between regression (regression on group means), it is found that the between-R2 is higher than the within-R2. The variable ln(LF) explains a significant amount of the variation in ln(GDP) across countries, but the variable ln(CAP) does not. The large differences in the coefficients across specifications suggest that a single regression model with the classical assumptions does not provide a good fit.

Table 12-5: The GLS estimates of the random-effects model

GLS estimates of the random-effects model
Variable   Coeff.   Std. Error   t-value   p-value   95% Conf. Interval
Constant   4.3070   0.3692       11.67     0.000     [3.5834, 5.0306]
ln(LF)     0.3897   0.0521       7.48      0.000     [0.2876, 0.4918]
ln(CAP)    0.7471   0.0280       26.70     0.000     [0.6922, 0.8019]
R2-Within 0.9414; R2-Between 0.9824; R2-Overall 0.9723; Wald χ2(2) 5375.10; Prob. 0.0000
σ̂_u 0.07583; σ̂_α 0.07276; ρ̂ (fraction of variance due to α_i) 0.47939

Comments: From the GLS estimates of the random-effects model, it is found that the output elasticity with respect to labour force is 38.97% and the output elasticity with respect to capital investment is 74.71%, and these elasticities are statistically significant at any significance level. From the estimated results, it can be said that the production function for the panel of G7 countries exhibits increasing returns to scale. Also, from the estimated results of the random-effects model, 97.23% of the total variation in ln(GDP) is explained by the fitted regression equation and the remaining 2.77% is explained by random factors. The estimates of the error components (their standard errors) are σ̂_α = 0.07276 and σ̂_u = 0.07583 for the random-effects model. The estimated fraction of the error variance attributable to the individual effects is ρ̂ = 0.47939; thus, roughly half of the total error variance is attributed to individual heterogeneity.
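The GLS estimates in Table 12-5 can be approximated by quasi-demeaning the data with the error-component standard deviations reported above. A sketch, plugging in the reported values purely for illustration (df and cols are from the earlier sketches):

# Quasi-demeaning weight: 1 - sigma_u / sqrt(sigma_u^2 + T*sigma_alpha^2) (about 0.85 here).
sigma_u, sigma_alpha, T = 0.07583, 0.07276, 47
w = 1 - sigma_u / np.sqrt(sigma_u**2 + T * sigma_alpha**2)

qd = df[cols] - w * df.groupby("country")[cols].transform("mean")
Xr = np.column_stack([np.full(len(df), 1 - w),          # transformed constant
                      qd[["llf", "lcap"]].to_numpy()])
yr = qd["lgdp"].to_numpy()
beta_re = np.linalg.solve(Xr.T @ Xr, Xr.T @ yr)          # (constant, ln LF, ln CAP)
print(np.round(beta_re, 4))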

Now, comparing the estimated results for the alternative models, we can say that the output elasticity with respect to labour force is largest for the fixed-effects model, followed by the pooled regression and random-effects models. The output elasticity with respect to capital investment is largest for the random-effects model, followed by the fixed-effects and pooled regression models. The goodness of fit is almost the same for the pooled regression, random-effects and fixed-effects models. The estimate of the transformation parameter is

1 - θ̂ = 1 - 0.07583/√(0.00575 + 47 × 0.00529) = 0.8497

Using this value to transform the model as in (12.77), we apply the least-squares method to the transformed regression model to estimate the random-effects model.

(ii) Test the Joint Significance of the Fixed Effects

We can test the joint significance of the dummies. The null hypothesis to be tested is H0: α1 = α2 = ... = α6 = 0


against the alternative hypothesis H1 : At least one of them is not zero.

Under the null hypothesis, the test statistic is given by

F = [(RESS - UESS)/(n - 1)] / [UESS/(nT - n - k)] ~ F{(n - 1), (nT - n - k)}   (12.159)

For the given problem, the restricted residual sum of squares (from the pooled regression in Table 12-1) is RESS = 4.9552, the unrestricted residual sum of squares (from the LSDV regression in Table 12-2) is UESS = 1.8398, the number of countries is n = 7, T = 47, and k = 2. Thus, putting these values in equation (12.159), we have

F = [(4.9552 - 1.8398)/6] / [1.8398/320] = 90.3084 ~ F{6, 320}   (12.160)
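As a quick numerical check of (12.160), the statistic can be reproduced from the two residual sums of squares alone; a minimal sketch using the values reported in Tables 12-1 and 12-2:

RESS, UESS = 4.9552, 1.8398          # pooled (restricted) and LSDV (unrestricted) residual SS
n, T, k = 7, 47, 2
F = ((RESS - UESS) / (n - 1)) / (UESS / (n * T - n - k))
print(round(F, 4))                    # about 90.31, to be compared with F(6, 320)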

At 5% level of significance with 6 and 320 degrees of freedom, the table value of the F test statistic is 2.19. Since the calculated value of the test statistic is larger than the table value, the null hypothesis will be rejected. Thus, it can be concluded that the fixed effects are not all jointly zero. We can also conclude that there are differences in the individual intercepts and the data should not be pooled into a single-equation model with a common intercept parameter.

Test for Poolability

For the given problem, the following regression equation with country-specific slope coefficients is considered:

y_it = β0 + α_i + β_1i X_1it + β_2i X_2it + u_it,  i = 1, 2, ..., N and t = 1, 2, ..., T   (12.161)

where β0 = ln(A_0), Y_it = ln(GDP_it), X_1it = ln(LF_it), and X_2it = ln(CAP_it). The null hypothesis of poolability to be tested is

H0: β_1i = β_1 and β_2i = β_2 for all i (the slope coefficients are the same for all countries)

against the alternative hypothesis

H1: the slope coefficients are not the same for all countries.

Under the null hypothesis, the test statistic is given by

F = [(RESS - UESS)/{k(n - 1)}] / [UESS/{n(T - k - 1)}] ~ F{k(n - 1), n(T - k - 1)}   (12.162)

RESS is the restricted residual sum of squares from the OLS estimation of the restricted regression equation; for the given problem, RESS = 1.839846. UESS is the unrestricted residual sum of squares, obtained by summing the residual sums of squares of the separate country regressions:

UESS = ESS_USA + ESS_UK + ESS_CAN + ESS_JPN + ESS_ITA + ESS_FRA + ESS_GER
     = 0.0488 + 0.0483 + 0.0547 + 0.1667 + 0.0761 + 0.0252 + 0.1377 = 0.5579

For the given problem, we have n = 7, T = 47, and k = 2. Putting the values of all the terms in equation (12.162), we have

F = [(1.8398 - 0.5579)/12] / [0.5579/308] = 58.9835 ~ F{12, 308}   (12.163)

At 5% level of significance with 12 and 308 degrees of freedom, the table value of the F test statistic is 1.752. Since the calculated value of the test statistic is greater than the table value, the null hypothesis will be rejected. Thus, it can be said that the slope coefficients are not homogenous.


Test for Random Effects

Let us consider the following random-effects model for the given problem:

y_it = β0 + α_i + β1 X_1it + β2 X_2it + u_it,  i = 1, 2, ..., N and t = 1, 2, ..., T   (12.164)

where β0 = ln(A_0), Y_it = ln(GDP_it), X_1it = ln(LF_it), and X_2it = ln(CAP_it). We assume that α_i ~ IID(0, σ²_α) for all i, which implies that E(α_i) = 0 and var(α_i) = σ²_α. The null hypothesis to be tested for detecting the presence of heterogeneity is

H0: σ²_α = 0

against the alternative hypothesis

H1: σ²_α > 0.

Under the null hypothesis, the Breusch-Pagan Lagrange multiplier (LM) test statistic is LM = 1193.80

(12.165)

At 5% level of significance with 1 degree of freedom, the table value is 3.84. Since the calculated value is larger than the table value, the null hypothesis will be rejected. Therefore, it can be said that the intercept term is not identical for every country. That is, given the evidence of significant differences across countries, a simple pooled OLS regression is not appropriate.
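The Breusch-Pagan LM statistic itself is easy to compute from the pooled OLS residuals. A sketch under the usual balanced-panel formula LM = nT/[2(T-1)] × [Σ_i(Σ_t e_it)² / Σ_i Σ_t e²_it - 1]², which is not written out in the text and is stated here as an assumption (e is the pooled-OLS residual vector from the earlier sketch):

df["e"] = e
T = 47
n_c = df["country"].nunique()
num = (df.groupby("country")["e"].sum() ** 2).sum()
den = (df["e"] ** 2).sum()
LM = n_c * T / (2 * (T - 1)) * (num / den - 1) ** 2     # compare with chi-square(1)
print(round(LM, 2))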

Hausman Specification Test:

The null hypothesis to be tested is H0: the error term α_i is uncorrelated with the independent variables

against the alternative hypothesis H1: the error term α_i is correlated with the independent variables.

Under the null hypothesis, the Hausman test statistic is given by

h = (β̂_FE - β̂_RE)′ [var(β̂_FE) - var(β̂_RE)]⁻¹ (β̂_FE - β̂_RE) ~ χ²_k
  = 10.87   (12.166)

At 5% level of significance with 2 degrees of freedom, the table value of the test statistic is 5.99. Since the calculated value of the test statistic is greater than the table value, the null hypothesis will be rejected. Thus, it can be said that endogeneity is a problem in the random-effects model, and we should use the fixed-effects model.
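A direct implementation of the Hausman contrast in (12.166); the coefficient vectors and covariance matrices would come from the FE and RE fits and are left as placeholders here:

def hausman(b_fe, b_re, V_fe, V_re):
    # h = (b_FE - b_RE)' [var(b_FE) - var(b_RE)]^(-1) (b_FE - b_RE); compare with chi-square(k)
    d = np.asarray(b_fe) - np.asarray(b_re)
    return float(d @ np.linalg.inv(np.asarray(V_fe) - np.asarray(V_re)) @ d)

# Example call with placeholder covariance matrices:
# h = hausman(beta_fe, beta_re[1:], V_fe, V_re)          # about 10.87 for this panel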

Test for Autocorrelation in Fixed Effects Regression Models

Here the null hypothesis to be tested is H0: ρ = 0

against the alternative hypothesis H1: ρ ≠ 0.

For the fixed-effects model, the correlation matrix of residuals is given in Table 12-6.


Table 12-6: The correlation matrix of residuals

        e1       e2       e3       e4       e5       e6       e7
e1   1.0000
e2  -0.4891   1.0000
e3   0.6745  -0.6316   1.0000
e4  -0.5514   0.8908  -0.7502   1.0000
e5  -0.6838   0.6451  -0.5481   0.6140   1.0000
e6  -0.5829   0.7054  -0.5704   0.7333   0.9049   1.0000
e7  -0.1575   0.3596  -0.2336   0.3706   0.3021   0.2038   1.0000

Also, the LM test statistic is

LM = 344.362 ~ χ²₁   (12.167)

At 5% level of significance with 1 degree of freedom, the table value of the chi-square test statistic is 3.84. Since the calculated value is larger than the table value, the null hypothesis will be rejected. Therefore, it can be said that the residuals are correlated across entities. Another Lagrange multiplier test for serial correlation is also available in standard econometric software.

Test for Heteroscedasticity

The null hypothesis to be tested is

H0: σ²₁ = σ²₂ = ... = σ²₇ = σ² against the alternative hypothesis H1: the variances are not all equal.

Under the null hypothesis, the modified Wald test statistic for groupwise heteroscedasticity in a fixed-effects regression model is obtained using STATA and given below:

chi2(7) = 130.24   (12.168)

At 5% level of significance with 7 degrees of freedom, the table value of the chi-square test statistic is 14.07. Since the calculated value is larger than the table value, the null hypothesis of homoscedasticity will be rejected at any significance level. Therefore, it can be said that there is a heteroscedasticity problem in the data.

12.4 Instrumental Variables (IV) and GMM Estimation

Instrumental Variables (IV) Estimation

In the context of panel data, regressors in periods other than period t may be valid instruments for period-t regressors if these latter are endogenous or when they are lags of the dependent variable. These “natural” instruments will often permit consistent estimation even as the strict exogeneity assumption fails. For a general presentation of the panel IV estimation, we consider the following linear panel model in the matrix form: yit = x cit ȕ+u it

(12.169)

in which (i) xit may contain both time-varying and time-invariant explanatory variables and may include an intercept; (ii) There is no individual specific unobserved effect Įi (i=1, 2,.,N); and (iii) xit includes only current period variables (i.e., those with suffix t). Suppose, we have a panel data set (N is large, T is small) for which the observations are assumed to be independent over i (i =1, 2,…..,N). Stack all the T observations for the ith individual unit and consider the regression model


Y_i = X_i β + U_i   (12.170)

where Y_i = (y_i1, y_i2, ..., y_iT)′, X_i = (x′_i1; x′_i2; ...; x′_iT) is the T × k matrix of regressors, and U_i = (u_i1, u_i2, ..., u_iT)′.

Model (12.170) defines a system of N linear regression equations, and we can rewrite it as a seemingly unrelated regression equations model. We can estimate equation (12.170) by the IV technique. The principle of the IV estimation technique is to assume that there are h instrumental variables Z1, Z2, ..., Zh for the ith individual unit. The data matrix on these h instrumental variables is Z_i, a T × h matrix, where h ≥ k (i.e., the number of instrumental variables is at least as large as the number of explanatory variables k), which satisfies the h moment conditions E(Z′_i U_i) = 0 (i = 1, 2, ..., N)

(12.171)

The sample analogue of the moment condition (12.171) is (1/N) Σ_{i=1}^{N} Z′_i U_i = 0, i.e., (1/N) Σ_{i=1}^{N} Z′_i (Y_i - X_i β) = 0, where U_i = Y_i - X_i β

(12.172)

From (12.172), we have the following system of linear equations in elements of ȕ , i.e., N

ȕ ¦ Zci X i = i=1

N

¦ Z cY i

i

(12.173)

i=1

Equation (12.173) is the set of normal equations for the IV estimation of the given set of linear regression equations. (i) If h = k, we have the exact or just-identified case. (ii) If h > k, we have more estimating equations than the number of unknown parameters to be estimated, and thus, the over-identified case. The estimator derived from the above set of equations (implied by the given moment conditions) is called the generalised method of moments (GMM). The GMM Method

The GMM method is well suited to dynamic micro panel data, especially firm-level data, where endogeneity problems must be controlled for. The GMM method will be more efficient than the 2SLS method if there is heteroscedasticity or serial correlation in the error terms of a panel-data model. If we have two or more IVs and we want to use all of them, the natural way to do so is the GMM. Moreover, whenever we estimate a system of equations, the case for GMM estimation becomes more apparent than for dynamic OLS or fully modified OLS estimation. For more detail, see the Arellano-Bond (1991) and Blundell-Bond (1998) models for dynamic panel data. The GMM method also allows us to specify moment conditions from different data sets that have different units of observation. In the GMM method, we find the parameter estimates that minimise a quadratic form in the moment conditions, so the GMM allows us to easily incorporate additional "known" information. The GMM approach is a very general large-sample estimator and can deal with potential endogeneity. Hansen (1982) showed that all instrumental variables estimators, in linear or nonlinear models, can be interpreted as GMM estimators. In practice, the most important step in applying GMM is finding good instruments (instruments that are valid and strong). The OLS and 2SLS estimators are unbiased and consistent, but the GMM estimators are consistent though not unbiased, and thus may suffer from finite-sample problems. The GMM is a class of estimators that happens to be naturally well suited to dealing with potential endogeneity issues: like a hammer, it is only a tool, and it solves the problem only when applied carefully to the right nails. The GMM estimator of β is based on these moment conditions and minimises the associated quadratic form for a given h × h weighting matrix W.


The basic GMM panel estimators are based on moments of the form n

g(ȕ) = ¦ g i (ȕ) = i=1

n

¦ Zc U i

i

, where U i = Yi  X i ȕ

(12.174)

i=1

The GMM estimator of ȕ is obtained by minimising the quadratic form of the type ­ ½ ºc ª n º° °ª n S(ȕ) = min ® « ¦ Zci U i » W « ¦ Zci U i » ¾ ¼ ¬ i=1 ¼° °¯ ¬ i=1 ¿ min > g(ȕ)cWg(ȕ) @

(12.175)

with respect to ȕ for a suitably chosen (h u h) weighting matrix W and given by -1

ª§ n · § n ·º § n · § n · ȕˆ GMM = «¨ ¦ Xic Zi ¸ W ¨ ¦ Zci X i ¸ » ¨ ¦ Xci Zi ¸ W ¨ ¦ Zci Yi ¸ ¹ © i=1 ¹ ¼ © i=1 ¹ © i=1 ¹ ¬© i=1 -1

= > M cZX WM ZX @ > M cZX WM ZY @

(12.176) -1

If the model is just-identified, that is, h =k, then the GMM estimators just simplify the IV estimators > ZcX @ ZcY for any W. Proof: If h=k, then M cZX is a

(k×k)

nonsingular matrix, and we have

-1 ȕˆ GMM = > M cZX WM ZX @ > M cZX WM ZY @

-1 M-1Zx W-1McZX McZX WMZY

= M-1ZX MZY -1

= > ZcX @ ZcY

(12.177)

If Zi =Xi , then the GMM estimators simplify the OLS pooled estimators Proof: -1 ȕˆ GMM = > M cZX WM ZX @ > M cZX WM ZY @

-1 M-1ZX W-1McZX McZX WMZY

= M-1ZX MZY -1

= > ZcX@ ZcY -1

= > X cX @ X cY

= ȕˆ OLS

Issues: i. Choice of instruments Zi . 2. Choice of the weighting matrix W when the model is over-identified (i. e., h>k).

(12.178)


One-Step GMM or Two-Stage Least Square Estimator

The one-step GMM or the two-stage least-squares estimators use the weighting matrix ª n º W = « ¦ Zci Zi » ¬ i=1 ¼

-1

(ZcZ) -1

(12.179)

-1 ȕˆ 2SLS = ª¬McZX (ZcZ)-1MZX º¼ ª¬McZX (ZcZ)-1MZY º¼

(12.180)

implying that

It is the most efficient GMM estimator based on the moment conditions E(ZicU i ) = 0 , if var(Ui |Zi ) = ı2 IT

It is called the two-stage least-squares estimator because it is indeed the OLS estimator of the regression of y on Z(ZcZ)-1ZcX which in turn is the projection of X on the vector space generated by the columns of Z, i.e., the OLS fitted values of E(X | Z). In general, the most efficient (feasible) GMM estimator based on E(Zci U i ) = 0, uses the weighting

ˆ 1 where : ˆ is a consistent estimator of : and is defined as matrix : :

plim n of

1 n ¦ ZicUi Uci Zi n i=1

(12.181)

This yields the two-stage GMM estimator as -1

ȕˆ 2SGMM

ˆ -1M º ª M c ȍ ˆ -1M º ª M cZX ȍ ZX ¼ ¬ ZX ZY ¼ ¬

(12.182)

In this case, the asymptotic variance simplifies that Asy{var(ȕˆ 2SGMM )}

ˆ -1 )M º ª M cZX (nȍ ZX ¼ ¬

-1

(12.183)

ˆ 1 has to be based on fitted residuals, which are obtained in the first It is called the two-step GMM estimator because : step using a consistent estimator of ȕ , typically the 2SLS estimator of ȕ ˆ :

1 n ¦ Zci (Yi  Xiȕˆ 2SLS )(Yi  Xiȕˆ 2SLS )cZi n i=1

(12.184)

Testing Over-identified Restrictions

The h-moment conditions E(Zci U i ) = 0, are used to estimate k parameters. If h > k, the model is over-identified (i.e., there are more instruments than the number of parameters), and hence, one needs to test if any of the instruments in Zi is correlated with the error term (endogenous). The null hypothesis to be tested is H 0 : The over-identifying restrictions are valid

against the alternative hypothesis H1 : The over-identifying restrictions are not valid.

Under the null hypothesis, the over-identifying restriction (OIR) test statistic is given by OIRT =

n

n

ˆ ¦ Uˆ cZ (nȍ) ¦ ZcUˆ 1

i

i=1

ˆ where U i

i

i

i

(12.185)

i=1

Yi  X ci ȕˆ 2SGMM

2 Under the null hypothesis, OIRT is distributed as Ȥ (h-k) . This test is also known as the Sargan test statistic. The rejection

of the null hypothesis indicates that the over-identification restrictions are not valid.
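Before turning to the application, a compact numerical sketch of the generic GMM estimator in (12.176) and the Sargan statistic in (12.185); Z, X and Y are stacked data arrays, W is a chosen h × h weighting matrix, and all names here are illustrative rather than part of the text:

import numpy as np

def gmm_beta(Y, X, Z, W):
    # beta = [ (X'Z) W (Z'X) ]^(-1) (X'Z) W (Z'Y), as in (12.176)
    XZ = X.T @ Z
    return np.linalg.solve(XZ @ W @ XZ.T, XZ @ W @ (Z.T @ Y))

def sargan(Z_blocks, u_blocks, Omega_hat):
    # OIRT = [sum_i Z_i'u_i]' (n*Omega_hat)^(-1) [sum_i Z_i'u_i]; compare with chi-square(h - k)
    n = len(Z_blocks)
    m = sum(Zi.T @ ui for Zi, ui in zip(Z_blocks, u_blocks))
    return float(m @ np.linalg.inv(n * Omega_hat) @ m)

# One-step (2SLS) weighting matrix as in (12.179): W = (Z'Z)^(-1)
# beta_2sls = gmm_beta(Y, X, Z, np.linalg.inv(Z.T @ Z))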


Ex. 12-2: The Cobb-Douglas production function is estimated using the GMM method for the given problem in Ex. 121 and the results are given in Table 12-7. Table 12-7: The GMM estimates of the Cobb-Douglas production function

Dependent Variable: ln(GDP); Method: Panel Generalized Method of Moments; Sample: 1972-2018; Periods included: 47; Cross-sections included: 7; Total panel (balanced) observations: 329; 2SLS instrument weighting matrix; Instrument specification: ln(GDP) Constant ln(LF) ln(CAP)
Variable   Coefficient   Std. Error   t-Statistic   Prob.
Constant   8.7399        0.4193       20.8425       0.0000
ln(LF)     0.4778        0.0301       15.8505       0.0000
ln(CAP)    0.5488        0.0261       21.053        0.0000
R-squared 0.9739; Adjusted R-squared 0.9737; S.E. of regression 0.1233; Durbin-Watson stat 0.0285; Instrument rank 4; Mean dependent var 28.5773; S.D. dependent var 0.7608; Sum squared resid 4.9552; J-statistic 326.000

Comments: From the GMM estimates of the Cobb-Douglas production function for the panel of G7 countries, it is found that the output elasticity with respect to labour force is 47.78% and the output elasticity with respect to capital investment is 54.88%, and these elasticities are statistically significant at any significance level. From the estimated results, it can be said that the production function for the panel of G7 countries is an increasing return to scale. Also, from the GMM estimates, it can be said that 97.39% of the total variation in ln(GDP) is explained by the fitted regression equation and the remaining 2.61% is explained by random factors. Thus, it can be said that the fit is very good.

12.5 Panel Unit Root Tests For a panel time-series variable, it is very common for us to test unit root problems. However, the following tests are derived for testing unit roots in the case of panel analysis. Table 12-8: Different panel unit root tests

First Generation (cross-sectional independence)
  1. Non-stationarity tests: Levin and Lin (1992, 1993) and Levin, Lin and Chu (2002); Im, Pesaran and Shin (1997, 2003); Maddala and Wu (1999) and Choi (1999, 2001); Choi's (2001) extension
  2. Stationarity test: Hadri (2003)
Second Generation (cross-sectional dependence)
  1. Factor structure: Pesaran (2003); Moon and Perron (2004a); Bai and Ng (2002, 2004); Choi (2002)
  2. Other approaches: O'Connell (1998); Chang (2002, 2004)

Here, the tests which are the most popular and commonly used for testing unit roots for panel analysis are discussed. Levin, Lin and Chu (2002) Test

Levin and Lin (1992, 1993) and Levin, Lin, and Chu (2002) provide some new results on panel unit root tests. They generalised Quah's model to allow for heterogeneity of the individual deterministic components and a heterogeneous serial correlation structure of the error terms, while assuming a homogeneous first-order autoregressive parameter. They assume that both N and T tend to infinity, but T increases at a faster rate, such that N/T → 0.


They developed a procedure using the pooled t-statistic of the estimator to evaluate the hypothesis that each individual time series contains a unit root against the alternative hypothesis that each time series is stationary. Model Specifications for Panel Unit Root Tests

We observe the stochastic process {y it } for a panel of individuals (i = 1, 2,…..,N) and each individual contains t = 1, 2,…….,T time-series observations. We wish to determine whether {y it } is integrated for each individual in the panel. As in the case of a single time series, the individual regression may include the intercept and a time trend. We assume that all the individuals in the panel have identical first-order partial autocorrelation, but all other parameters in the error process are permitted to vary freely across individuals. Assumption 1: We assume that {y it } is generated by one of the following three models. Model 1: No constant and trend terms are included in the model

ǻyit = įyi,t-1 +u it

(12.186)

Model 2: Only the constant term but no time trend is included in the model

ǻyit = Į0i +įyi,t-1 +uit

(12.187)

Model 3: Both the constant and trend terms are included in the model

ǻyit = Į0i +Į1i t+įyi,t-1 +uit

(12.188)

Assumption 2: The error process uit is independently distributed across individuals and follows a stationary invertible ARMA process for each individual f

u it = ¦ Ȗ ij u i,t-j +İ it

(12.189)

j=1

Assumption 3: For all i = 1, 2,……….,N, and t = 1, 2,……….,T, f

E(ε⁴_it) < ∞; E(ε²_it) ≥ B_ε > 0; and E(u²_it) + 2 Σ_{j=1}^{∞} E(u_it u_{i,t-j}) ≤ B_u < ∞

In model 1, the panel unit root test procedure evaluates the null hypothesis H0 : į = 0

against the alternative hypothesis H1: į < 0 .

In model 2, only the individual-specific mean value but no time trend is included. The panel test procedure evaluates the null hypothesis that H 0 : į = 0 and Į oi = 0,  i

against the alternative hypothesis H1: į < 0 and Į 0i  R .

In model 3, the individual-specific mean and time trend are included. The panel test procedure evaluates the null hypothesis H 0 : į = 0 and Į1i = 0,  i

against the alternative hypothesis H1: į < 0 and D1i  R .


Test Procedure

Let us consider the model of the type pi

ǻyit = įyi,t-1 +¦ șijǻyi,t-j +Į mi d mt +u it

(12.190)

j=1

For notational simplicity, d mt is used to indicate the vector of deterministic components and Į m is used to indicate the corresponding vector of coefficients for a particular model m=1, 2, 3. Thus, d1t ={0}, d 2t ={1}, and d 3t ={1, t} . Since pi is unknown, the following steps are involved in conducting a formal test for testing the presence of the unit root problem in the panel data: Step 1: Perform the ADF regressions and generate orthogonalised residuals for each individual.

For each individual i (i= 1, 2,…….,n), we implement the ADF regression pi

ǻyit = įyi,t-1 +¦ șijǻyi,t-j +Į mi d mt +u it , (i = 1, 2,…….,n; t = 1, 2,…..,T; m = 1, 2, 3)

(12.191)

j=1

The lag order pi is permitted to vary across individuals. Campbell and Perron (1991) recommended the method proposed by Hall (1990) for selecting the appropriate lag order: for a given sample of length T, choose a maximum lag order p max and then use the t-statistic of șˆ ij to determine if a smaller lag order is preferred. These t-statistics have a standard normal distribution under the null hypothesis șij = 0, both when įi = 0, and when įi < 0. Having determined the autoregressive order pi in equation (12.191), we run two auxiliary regressions to generate orthogonalised residuals. Regress 'yit and yi,t-1 against ǻyi,t-j and the appropriate deterministic variables d mt , then save the residuals

eit , and vi,t-1 ,  i and t, i.e., pi

ǻyit = ¦ Ȗijǻyi,t-j +Į mi d mt +u it

(12.192)

j=1

and pi

yi,t-1 = ¦ ʌ ijǻyi,t-j +Į mi d mt +ȟ it

(12.193)

j=1

If we apply the OLS method to the auxiliary regression equations (12.192) and (12.193), the estimated equations are given by 'ˆ yit

pi

¦ Ȗˆ ǻy ij

i,t-j

+Įˆ mi d mt

(12.194)

j=1

pi

yˆ i,t-1 = ¦ ʌˆ ijǻyi,t-j +Įˆ mi d mt

(12.195)

j=1

Therefore, the residuals are given by eit

pi

'yit  ¦ Ȗˆ ij ǻyi,t-j  Įˆ mi d mt

(12.196)

j=1

pi

vit-1 = yˆ it-1  ¦ ʌˆ ijǻyi,t-j  Įˆ mi d mt

(12.197)

j=1

Now, to control for heterogeneity across individuals, we further standardise both residuals by the regression standard error from equation (12.191) which are given by


eit v ; and v it-1 = it-1 si si

e it =

where si is the regression standard error from equation (12.191) or equivalently it can be calculated from the regression equation

eit = ȕi vi,t-1 +İit

(12.198)

We have T 1 (eit  ȕˆ i v it-1 ) 2 ¦ T  pi  1 t=pi +2

si2 =

(12.199)

Step 2: Estimate the ratio of the long-run to short-run standard deviations.

Under the null hypothesis of a unit root, the long-run variance for Model 1 can be estimated as

σ̂²_yi = (1/(T-1)) Σ_{t=2}^{T} Δy²_it + 2 Σ_{j=1}^{k} w_kj [ (1/(T-1)) Σ_{t=2+j}^{T} Δy_it Δy_{i,t-j} ]   (12.200)

For Model 2, the long-run variance is given by

σ̂²_yi = (1/(T-1)) Σ_{t=2}^{T} (Δy_it - Δȳ_i)² + 2 Σ_{j=1}^{k} w_kj [ (1/(T-1)) Σ_{t=2+j}^{T} (Δy_it - Δȳ_i)(Δy_{i,t-j} - Δȳ_i) ]   (12.201)

If the data include a time trend as in Model 3, the trend should be removed before estimating the long-run variance. The truncation lag parameter k can be data dependent; Andrews (1991) suggests a procedure to determine k that ensures the consistency of σ̂²_yi. The sample covariance weight w_kj depends on the choice of kernel. For example, if the Bartlett kernel is used, then w_kj is given by

w_kj = 1 - j/(k+1)   (12.202)
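A small sketch of the kernel estimator in (12.200)-(12.202) for one series of first differences (for Model 2 the differences would be demeaned first); the function name is illustrative:

import numpy as np

def long_run_variance(dy, k):
    # dy holds the T-1 first differences of y_i; k is the truncation lag.
    dy = np.asarray(dy, dtype=float)
    Tm1 = len(dy)
    s2 = np.sum(dy ** 2) / Tm1
    for j in range(1, k + 1):
        w = 1 - j / (k + 1)                              # Bartlett weight, eq. (12.202)
        s2 += 2 * w * np.sum(dy[j:] * dy[:-j]) / Tm1
    return s2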

Now, for each individual i, we define the ratio of the long-run standard deviation to the innovation standard deviation which is given by sils =

ıˆ yi si

(12.203)

Let the average standard deviation ratio be sls =

1 N ¦ sils N i=1

(12.204)

This important statistic will be used to adjust the mean of the t-statistic later in step 3. Step 3: Pool all cross-section and time-series observations to estimate e it = įv it-1 +İ it

(12.205)

N  T-p-1, where p = 1 ¦ p is the average lag order for the This equation is based on NT observations where T= i N i=1 individual ADF test for individual unit root test. Now, the conventional regression t-statistic for testing the null hypothesis H 0 : į = 0, is given by

tį =

įˆ ˆ se(į)

where įˆ is the OLS estimator from equation (12.205) which is given by

(12.206)


686 N

įˆ =

T

¦ ¦ e v it

it-1

i=1 t=pi +2 N

(12.207)

T

¦ ¦ v

2 it-1

i=1 t=pi +2

And the variance of įˆ is given by ˆ var(į)

ıˆ İ2 N

T

¦¦

(12.208) 2 v it-1

i=1 t=pi +2

where ıˆ İ2 is given by 

ıˆ İ2 =

1 N T ˆ  )2 ¦ ¦ (e it  įv it-1 NT i=1 t=pi +2

(12.209)

Under the null hypothesis H 0 : į = 0, the asymptotic results in the next section indicates that the regression t-statistic (

tG ) has a standard normal limiting distribution in Model 1, but diverges to negative infinity models 2 and 3. Nevertheless, it is easy to calculate the following adjusted t-statistic

t*į =

ˆ *  se(į)ȝ tG  NTs ls mT ıˆ İ2 ı*mT

(12.210)

where the mean adjustment ȝ*mT and standard deviation ı*mT are given in the table for a given deterministic specification  ( m = 1, 2, 3) and time-series dimension T.

Decision: The rejection of the null hypothesis implies that there is no problem of unit root in the panel data. Im, Pesaran and Shin (2003) Test

Im, Pesaran and Shin (IPS, 2003) suggested a newer, more flexible and computationally simpler unit root testing procedure for panel using the likelihood framework which is called the t-bar statistic that allows for simultaneously stationary and non-stationary series. Im, Pesaran and Shin (IPS) test allows for a heterogeneous coefficient of yit-1 . Instead of pooling the data, IPS consider the average value of the ADF statistic computed for each individual in the panel when the error term uit of the model

ǻyit = įi yi,t-1 +Įmi d mt +u it

(12.211)

is serially correlated, possibly with different serial correlation patterns across cross-sectional units, that is,

u it = ¦ Ȗij u i,t-j +İ it

(12.212)

j=1

where T and N are sufficiently large. Substituting this uit in equation (12.211), we have pi

ǻyit = įi yit-1 +¦ Ȗ ijǻyit-j +Xcit Ȝ+İ it

(12.213)

j=1

where 'yit = yit  yi,t-1 , yit (i = 1, 2,….,n, t = 1, 2,….,T) is the series under investigation for country i over period t, pi is the number of lags in the ADF regression and İ it errors are assumed to be independently and normally distributed random variables for all i’s and t’s with zero mean and finite heterogeneous variance ıi2 . Both Įi and pi in equations (12.211) and (12.212) are allowed to vary across countries. The null hypothesis to be tested is that each series in the panel contains a unit root, i.e., H 0 : įi = 0,  i


against the alternative hypothesis that some of the individual series have unit roots but not all ­įi = 0, for some i's . H1 : ® ¯įi < 0, for at least one i

There are two stages in constructing the t-bar statistic proposed by Im, Pesaran and Shin (2003). In the first stage, the average value of the individual ADF t-statistics for the countries in the sample is calculated:

t̄_nT = (1/n) Σ_{i=1}^{n} t_iT   (12.214)

where t_iT is the calculated ADF test statistic for country i of the panel (i = 1, 2, ..., n). IPS assume that the t_iT are IID with finite mean and variance; therefore, by the Lindeberg-Levy central limit theorem, the standardised t-bar statistic converges to a standard normal variate as n tends to infinity under the null hypothesis. The second step is to calculate the standardised t-bar statistic, which is given by

Z_t̄ = √n [ t̄_nT - (1/n) Σ_{i=1}^{n} E(t_iT(p_i)) ] / √[ (1/n) Σ_{i=1}^{n} var(t_iT(p_i)) ] ~ N(0, 1)   (12.215)

where n is the size of the panel, which indicates the number of countries; and E(t_iT(p_i)) and var(t_iT(p_i)) are provided by IPS for various values of T and p. However, Im, Pesaran and Shin (2003) suggested that, in the presence of cross-sectional dependence, the data can be adjusted by demeaning, and the standardised demeaned t-bar statistic converges to the standard normal in the limit.

The Fisher-Type Test: Maddala and Wu (1999) and Choi (2001) Test

Maddala and Wu (1999) proposed a Fisher-type test which combines the p-values from the unit root tests for each cross-section i. The test is non-parametric and has a chi-square distribution with 2n degrees of freedom, where n is the number of countries in the panel. The test statistic is given by

λ = -2 Σ_{i=1}^{n} log_e(p_i) ~ χ²_{2n}   (12.216)

where p_i is the p-value for unit i. The Maddala and Wu (1999) test has the advantage over the Im, Pesaran and Shin (2003) test that it does not depend on different lag lengths in the individual ADF regressions. Maddala and Wu (1999) performed Monte Carlo simulations showing that their test is superior to that proposed by Im, Pesaran and Shin (2003).
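The Fisher combination in (12.216) is trivial to compute once the individual ADF p-values are in hand; a sketch (the p-values shown in the example call are made-up placeholders):

import numpy as np
from scipy.stats import chi2

def fisher_panel_unit_root(pvals):
    pvals = np.asarray(pvals, dtype=float)
    stat = -2.0 * np.sum(np.log(pvals))
    return stat, chi2.sf(stat, df=2 * len(pvals))        # statistic and its p-value

# stat, p = fisher_panel_unit_root([0.61, 0.43, 0.85, 0.72, 0.38, 0.55, 0.49])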

Choi (2001) Test

Choi (2001) noted that the LLC and IPS tests suffer from some common inflexibilities which can restrict their use in applications: (i) they require an infinite number of groups; (ii) all the groups are assumed to have the same type of non-stochastic component; (iii) T is assumed to be the same for all the cross-section units, and handling unbalanced panels requires further simulations; (iv) in the case of Levin and Lin, the critical values are sensitive to the choice of lag lengths in the ADF regressions; (v) finally, all the previous tests hypothesise that none of the groups has a unit root under the alternative hypothesis: they do not allow some groups to have a unit root while others do not. Choi (2001) tried to overcome these limitations and proposed a very simple test based on the combination of the p-values from a unit root test applied to each group in the panel data. Choi (2001) considers the following regression equation


yit = d it +x it , i = 1, 2,……,N and t = 1, 2,…..,T

(12.217)

where d it =D 0i +D1i t+D 2i t 2 +.......+Į mi i t mi and

xit = ȡi xi,t-1 +uit

(12.218)

uit is integrated of order zero. The observed data yit are composed of a non-stochastic process dit and a stochastic

process xit . Each time series yit can have a different sample size and different specification of non-stochastic and stochastic components depending on i. Here, uit may be heteroscedastic. The null hypothesis to be tested is H 0 : ȡi = 1,  i

which implies that all the time series are unit root nonstationary. The alternative hypothesis is that H1: |Ui | < 1, for at least one i for finite n .

That is, some time series are nonstationary while others are not or H1: |Ui | < 1, for some i's for infinite n .

Let G iTi be a one-sided unit root test statistic (e.g., DF tests) for the ith group in the model and assume that (i) under the null hypothesis, as Ti o f , G iTi o G i (where Gi is nondegenerate random variable) (ii) uit is independent of u js for all t and s when i z j (iii)

nk o k (a fixed constant) as n tends to infinity n

Let pi be the p value of a unit root test for cross-section i, i.e., p i = F(G iTi ) , where F(.) is the distribution function of Gi . The proposed Fisher-type test is given by n

(12.219)

P = -2¦ ln(pi ) i=1

which combines the p-values from the unit root tests for each cross-section i to test for the unit root in the panel data. Under the null hypothesis of the unit root, P is distributed as chi-square with 2n degrees of freedom as Ti o f for all n. Fisher test holds some important advantages: (i) it does not require a balanced panel as in the case of IPS. (ii) it can be carried out for any unit root test derived. (iii) it is possible to use different lag lengths in the individual ADF regressions. The main advantage of this test is that the p-values need to be derived by the Monte Carlo simulation. When n is large, it is necessary to modify the P test since in the limit, it has a degenerate distribution. For the P test, we have E > -2lnpi @ = 2, and var > -2lnpi @ = 4 .

Choi (2001) proposed a Z test which is given by

Z = (1/(2√n)) Σ_{i=1}^{n} (-2 ln p_i - 2)   (12.220)


This statistic corresponds to the standardised cross-sectional average of individual p-values. Under the cross-sectional independence assumption of the pi 's , the Lindeberg-Levy central limit theorem is sufficient to show that, under the unit root hypothesis, Z converges to a standard normal distribution as Ti , n o f . Residual-Based LM Test: Hadri (2000) Test

Hadri (2000) derives a residual-based Lagrange multiplier (LM) test where the null hypothesis is that there is no unit root in any of the series in the panel against the alternative hypothesis of a unit root in the panel. This is a generalisation of KPSS (Kwiatkowski, Phillips, Schmidt and Shin (1992) test from time-series to panel data. In particular, Hadri (2000) considers the following two models: Model 1: yit = x it +İ it , i= 1, 2,...,N ; t = 1, 2,………..,T

(12.221)

Model 2: yit = x it +ȕ i t+İ it , i= 1, 2,…..,N; t = 1, 2,………….,T.

(12.222)

where x it = x it-1 +u it is a random walk. İit ~IIN(0, ıİ2 ), and uit ~IIN(0, ı2u ) are mutually independent and normal that are IID across i and t. We have x i1

x i0  u i1

x i2

x i1  u i2

x i0  u i1 +u i2

x i3

x i2  u i3

x i0  u i1 +u i2  u i3

Continuing this, finally, we have t

(12.223)

x it = x i0 + ¦ u it s=1

Putting the value of xit in equation (12.222), we have t

yit = x i0 +ȕi t+¦ u it +İ it s=1

(12.224)

y it = x i0 +ȕ i t+v it t

where vit = ¦ u it +İ it . s=1

The null hypothesis to be tested is

H0 : ı 2u = 0 ; it indicates that the series is stationary. i.e., vit = İ it against the alternative hypothesis

H1: ı2u z 0. Under the null hypothesis, the LM test statistic is given by

1 N 1 ¦ N i=1 T 2 LM1 = ıˆ İ2 where Sit =

T

t=1

t

¦ ݈

2 it

¦S

is

are the partial sums of OLS residual ݈ is from equation (12.224) and ıˆ İ2 is the consistent estimate of

s=1

2 İ

(12.225)

ı under the null hypothesis, It is given by


ıˆ İ2 =

1 N T 2 ¦¦ ݈ it NT i=1 t=1

(12.226)

Hadri (2000) suggested an alternative LM test that allows for heteroscedasticity across i say ıİi2 . This is, in fact, LM 2 =

1§ N § 1 ¨¦¨ N © i=1 © T 2

T

2 it

¦ S /ıˆ

2 İi

t=1

·· ¸¸ ¹¹

(12.227)

The test statistic is given by Z=

N (LM  Ȝ) Ȗ

(12.228)

which is asymptotically distributed as N(0, 1), where λ = 1/6 and γ = 1/45 if the model only includes a constant, and λ = 1/15 and γ = 11/6300 otherwise. Hadri (2000) shows using Monte Carlo experiments that the empirical size of the test is close to its nominal 5% level for sufficiently large N and T. Ex. 12-3: The unit root problem for each panel variable in logarithmic form is examined for the given data in Ex. 12-1 using the Levin, Lin and Chu (LLC, 2002); Im, Pesaran and Shin (IPS, 2003); Maddala and Wu (MW, 1999); Choi (2001); and Hadri (2000) tests for different cases. The results are given in Table 12-9. Table 12-9: Unit root test results

Case 1: No constant and trend terms are included in the model [level form]
Variable   LLC Test            MW Test            Choi Test
ln(GDP)    12.3996 (1.0000)    0.0072 (1.0000)    11.3843 (1.0000)
ln(LF)     8.6019 (1.0000)     0.1509 (1.0000)    9.6172 (1.0000)
ln(CAP)    5.2611 (1.0000)     0.5350 (1.0000)    6.2185 (1.0000)

Case 2: Only the constant term is included in the model [level form]
Variable   LLC Test            IPS Test            MW Test             Choi Test           Hadri Test
ln(GDP)    -5.2126* (0.000)    -0.2266 (0.4104)    17.9861 (0.2074)    -0.1964 (0.4221)    12.4945* (0.0000)
ln(LF)     -11.2274* (0.0000)  -0.8642 (0.1938)    26.7689* (0.0206)   -0.8170 (0.2070)    12.0378* (0.0000)
ln(CAP)    -0.5445 (0.2931)    2.0948 (0.9819)     4.2363 (0.9939)     2.1609 (0.9846)     11.8590* (0.0000)

Case 3: Constant and trend terms are included in the model [level form]
ln(GDP)    -0.2411 (0.4047)    1.4606 (0.9279)     8.4915 (0.8622)     1.5245 (0.9363)     9.0981* (0.0000)
ln(LF)     -1.4681 (0.0710)    0.8363 (0.7957)     10.3171 (0.7387)    0.9011 (0.8162)     7.0953* (0.0000)
ln(CAP)    -0.4851 (0.3138)    -1.0281 (0.1520)    19.4051 (0.1500)    -1.0540 (0.1459)    6.0408* (0.0000)

Reported values in parentheses are the probabilities; *: indicates significance at the 5% level.

All the tests’ results indicate that the panel variables lnGDP, lnLF and lnCAP are integrated of order one.

12.6 Panel Data Cointegration Analysis The cointegration technique is applied to know the existence of the long-run equilibrium relationship between two panel variables. From the statistical point of view, a long-run equilibrium relationship means that the variables move together over time so that the short-term disturbances from the long-term trend will be corrected. A lack of cointegration suggests that such variables have no long-run equilibrium relationship, and in principle, they can wander arbitrarily far away from each other (Dickey et al., 1991). Note that the regression among integrated series is meaningful, if and only if they involve cointegrated variables.


Meaning of Cointegration: Two variables are said to be cointegrated of order (1, 1) if they are individually nonstationary or random-walk stochastic processes, (integrated of order 1) but their linear combination must be stationary, i.e., integrated of order (0). More specially, if we say that two panel variables Y and X are random walk processes of order 1, they are said to be cointegrated of order (1, 1) if the linear combination is stationary or integrated of order 0, i.e., I(0). Panel Cointegration Estimation

For panel cointegrated regression models, the asymptotic properties of the estimators of the regression coefficients and of the associated statistical tests are different from those of time-series cointegration regression models. Panel cointegration models are directed at studying questions surrounding long-run economic relationships encountered in macroeconomic and financial data. A number of cointegration tests have been derived for panel data analysis; in this section, the tests which are most popular and widely applicable are discussed: (1) the Kao tests for panel cointegration [Kao, 1999]; (2) the Pedroni test for panel cointegration; and (3) the Johansen Fisher panel cointegration test [proposed by Maddala and Wu in 1999]. Kao Test for Panel Cointegration

Kao (1999) presents two types of cointegration tests in panel data, the Dickey-Fuller (DF) and the augmented DickeyFuller (ADF) types’ tests. The DF- type test can be computed from the estimated residuals as (12.229)

eit = ȡeit-1 +vit

where eit 's are the estimated residuals from the panel static regression equation. To test the null hypothesis of no cointegration, the null hypothesis can be written as H0 : ȡ = 1

against the alternative hypothesis H1: ȡ < 1.

The OLS estimator of ρ is given by

ρ̂ = [ Σ_{i=1}^{n} Σ_{t=2}^{T} e_it e_{i,t-1} ] / [ Σ_{i=1}^{n} Σ_{t=2}^{T} e²_it ]   (12.230)
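Before the bias corrections below, ρ̂ in (12.230) is just a pooled regression of the stacked residuals on their first lags. A sketch, assuming the residuals of the static regression (12.235) have been stored as one array per country (names are illustrative):

import numpy as np

def kao_rho(resid_by_country):
    # rho_hat = sum_i sum_t e_it * e_i,t-1 / sum_i sum_t e_it^2, with t = 2, ..., T
    num = sum(np.sum(e[1:] * e[:-1]) for e in resid_by_country)
    den = sum(np.sum(e[1:] ** 2) for e in resid_by_country)
    return num / den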

The four DF-type tests are constructed as follows:

DF_ρ = [√n T(ρ̂ - 1) + 3√n] / √10.2   (12.231)

DF_t = √1.25 t_ρ̂ + √(1.875n)   (12.232)

DF*_ρ = [√n T(ρ̂ - 1) + 3√n σ̂²_v/σ̂²_0v] / √[3 + 7.2 σ̂⁴_v/σ̂⁴_0v]   (12.233)


DF*_t = [t_ρ̂ + √(6n) σ̂_v/(2σ̂_0v)] / √[σ̂²_0v/(2σ̂²_v) + 3σ̂²_v/(10σ̂²_0v)]   (12.234)

where σ̂²_v = Σ̂_u - Σ̂_uε Σ̂⁻¹_ε Σ̂_εu and σ̂²_0v = Ω̂_u - Ω̂_uε Ω̂⁻¹_ε Ω̂_εu.

The static regression equation is of the type

y_it = μ_i + x′_it β + u_it ;  i = 1, 2, ..., n; t = 1, 2, ..., T   (12.235)

where β is a (k × 1) vector of slope parameters; the μ_i are intercepts; the u_it are stationary disturbance terms; and x_it is a (k × 1) integrated process of order one for all i, i.e., x_it ~ I(1), so that x_it = x_{i,t-1} + ε_it. The {y_it, x_it} are independent across cross-sectional units, and ω_it = (u_it, ε′_it)′ is a linear process. The long-run covariance matrix of {ω_it} is denoted by Ω and given by

Ω = Σ_{j=-∞}^{∞} E(ω_ij ω′_i0) = [ Ω_u  Ω_uε ; Ω_εu  Ω_ε ],  and  Σ = E(ω_i0 ω′_i0) = [ Σ_u  Σ_uε ; Σ_εu  Σ_ε ]

where DF_ρ and DF_t are based on the assumption of strict exogeneity of the regressors with respect to the errors in the equations;

DF* (ȡ) and DF* (t) are for cointegration with endogenous regressors. For the ADF-type test, we can run the ADF regression equation of the type p

eit = ȡeit-1 +¦ Ȗ jǻeit-j +vit

(12.236)

j=1

With the null hypothesis of no cointegration, the ADF-type test statistic can be constructed as

ADF = [t_ρ̂ + √(6n) σ̂_v/(2σ̂_0v)] / √[σ̂²_0v/(2σ̂²_v) + 3σ̂²_v/(10σ̂²_0v)]   (12.237)

The asymptotic distributions of the DF and ADF converge to a standard normal distribution N(0, 1). The rejection of the null hypothesis implies that the panel variables are cointegrated. Pedroni (1995) Test for Cointegration

Pedroni (1995) has derived a pooled Phillips-Perron-type test based on the assumption of strictly exogenous regressor variables. Under the null hypothesis of no cointegration, the panel autoregressive coefficient estimator ρ̂ can be constructed as follows:

ρ̂ - 1 = [ Σ_{i=1}^{n} Σ_{t=2}^{T} (e_it Δe_it - λ̂_i) ] / [ Σ_{i=1}^{n} Σ_{t=2}^{T} e²_{i,t-1} ]   (12.238)

where λ̂_i acts as a scalar analogue of the correlation matrix Γ and corrects for any correlation effects:

Γ = Σ_{j=1}^{∞} E(ω_ij ω′_i0) = [ Γ_u  Γ_uε ; Γ_εu  Γ_ε ]

Under the null hypothesis, Pedroni's tests are given by

PC₁ = T√n (ρ̂ - 1)/√2 ~ N(0, 1)   (12.239)

PC₂ = √(nT) (T - 1)(ρ̂ - 1)/√2 ~ N(0, 1)   (12.240)

Rejection of the null hypothesis indicates that the panel variables are cointegrated. The Johansen Fisher Panel Cointegration Test

Maddala and Wu (1999) proposed the Johansen Fisher panel cointegration test to examine whether there is a long-run relationship between the panel variables. The Johansen Fisher panel cointegration test is a panel version of the individual Johansen cointegration test; it is based on the aggregation of the p-values of the individual Johansen maximum eigenvalue and trace statistics. If p_i is the p-value from an individual cointegration test for cross-section i, then under the null hypothesis of no cointegrating relationship the test statistic for the panel is given by

-2 Σ_{i=1}^{n} log(p_i) ~ χ²_{2n}   (12.241)

In Johansen-type panel cointegration tests, the results depend heavily on the number of lags in the VAR systems. Ex. 12-4: The long-run relationship among the panel variables ln(GDP), ln(LF) and ln(CAP) is examined using the Kao test and the Johansen Fisher panel cointegration test for the given data in Ex. 12-1. EViews is used for estimation and the results are given in Table 12-10. Table 12-10: Panel cointegration test results

Kao Test: -4.2007* (probability 0.0000)

Johansen Fisher Panel Cointegration Test
Model 1: Intercept (no trend) in CE and VAR
Cointegrating Equations   Fisher Statistic (Trace Test)   Probability   Fisher Statistic (Max. Eigenvalue)   Probability
none                      98.42*                          0.0000        64.37*                               0.0000
maximum 1                 51.00*                          0.0000        37.05*                               0.0000
maximum 2                 39.00*                          0.0004        39.00*                               0.0004
Model 2: Intercept and trend in CE, no intercept in VAR
none                      74.85*                          0.0000        48.58*                               0.0000
maximum 1                 39.49*                          0.0000        30.79*                               0.0059
maximum 2                 20.99                           0.1019        20.99                                0.1019

Both the Kao test and the Johansen Fisher panel cointegration test results confirm the existence of cointegrating relationships among the panel variables.


Exercises
12-1: Define a panel-data regression model and compare it with cross-sectional and time-series data with an example of each.
12-2: Discuss the panel-data regression model with an example.
12-3: Why are panel estimators more efficient than the cross-sectional and time-series estimators? Explain.
12-4: Define a pooled regression model along with the assumptions involved.
12-5: Discuss the important properties of the pooled least-squares estimator of β.
12-6: Define a fixed-effects model along with the assumptions involved.
12-7: Discuss the important properties of the fixed-effects estimator of β.
12-8: Discuss the technique to obtain the first-difference estimator of β.
12-9: Discuss the technique to obtain the difference-in-differences estimator of β.
12-10: Define the random-effects model along with the assumptions involved.
12-11: Discuss the important properties of the random-effects estimator of β.
12-12: Distinguish between the fixed-effects and random-effects models with an example of each.
12-13: Discuss the technique to obtain the variance structure of the random-effects model.
12-14: Discuss the technique to obtain the GLS and FGLS estimators of β.
12-15: Distinguish between "within" and "between" estimators.
12-16: Distinguish between the GLS and OLS estimators.
12-17: Discuss whether the fixed-effects or the random-effects model is applicable for many panel analyses.
12-18: Illustrate the Breusch-Pagan Lagrange multiplier (LM) test for random effects.
12-19: Discuss the Hausman test to select whether the fixed-effects or random-effects model will be appropriate for panel data analysis.
12-20: Discuss the poolability tests and the test for fixed effects.
12-21: Compare the GLS, OLS and within estimators.
12-22: Discuss the goodness of fit of panel-data regression models.
12-23: Discuss the Durbin-Watson and LM tests for autocorrelation in the fixed-effects models.
12-24: Discuss the Prais-Winsten method to estimate an AR(1) fixed-effects model.
12-25: Discuss the test for heteroscedasticity in panel regression models.
12-26: Discuss the IV and GMM methods to estimate panel regression models.
12-27: Discuss the Levin, Lin and Chu (2002); Im, Pesaran and Shin (2003); Maddala and Wu (1999); Choi (2001); and Hadri (2000) tests for the panel unit root problem.
12-28: Discuss the Kao, Pedroni, and Johansen Fisher panel tests for panel cointegration.

APPENDICES: STATISTICAL TABLES

Appendix A: Standard Normal Distribution The entries in the table give the areas under the Standard Normal curve from 0 to z

Z 0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0 1.1 1.2 1.3 1.4 1.5 1.6 1.7 1.8 1.9 2.0 2.1 2.2 2.3 2.4 2.5 2.6 2.7 2.8 2.9 3.0

0.00 0.0000 0.0398 0.0793 0.1179 0.1554 0.1915 0.2257 0.2580 0.2881 0.3159 0.3413 0.3643 0.3849 0.4032 0.4192 0.4332 0.4452 0.4554 0.4641 0.4713 0.4772 0.4821 0.4861 0.4893 0.4918 0.4938 0.4953 0.4965 0.4974 0.4981 0.4987

0.01 0.0040 0.0438 0.0832 0.1217 0.1591 0.1950 0.2291 0.2611 0.2910 0.3186 0.3438 0.3665 0.3869 0.4049 0.4207 0.4345 0.4463 0.4564 0.4649 0.4719 0.4778 0.4826 0.4864 0.4896 0.4920 0.4940 0.4955 0.4966 0.4975 0.4982 0.4987

0.02 0.0080 0.0478 0.0871 0.1255 0.1628 0.1985 0.2324 0.2642 0.2939 0.3212 0.3461 0.3686 0.3888 0.4066 0.4222 0.4357 0.4474 0.4573 0.4656 0.4726 0.4783 0.4830 0.4868 0.4898 0.4922 0.4941 0.4956 0.4967 0.4976 0.4982 0.4987

0.03 0.0120 0.0517 0.0910 0.1293 0.1664 0.2019 0.2357 0.2673 0.2967 0.3238 0.3485 0.3708 0.3907 0.4082 0.4236 0.4370 0.4484 0.4582 0.4664 0.4732 0.4788 0.4834 0.4871 0.4901 0.4925 0.4943 0.4957 0.4968 0.4977 0.4983 0.4988

0.04 0.0160 0.0557 0.0948 0.1331 0.1700 0.2054 0.2389 0.2704 0.2995 0.3264 0.3508 0.3729 0.3925 0.4099 0.4251 0.4382 0.4495 0.4591 0.4671 0.4738 0.4793 0.4838 0.4875 0.4904 0.4927 0.4945 0.4959 0.4969 0.4977 0.4984 0.4988

0.05 0.0199 0.0596 0.0987 0.1368 0.1736 0.2088 0.2422 0.2734 0.3023 0.3289 0.3531 0.3749 0.3944 0.4115 0.4265 0.4394 0.4505 0.4599 0.4678 0.4744 0.4798 0.4842 0.4878 0.4906 0.4929 0.4946 0.4960 0.4970 0.4978 0.4984 0.4989

0.06 0.0239 0.0636 0.1026 0.1406 0.1772 0.2123 0.2454 0.2764 0.3051 0.3315 0.3554 0.3770 0.3962 0.4131 0.4279 0.4406 0.4515 0.4608 0.4686 0.4750 0.4803 0.4846 0.4881 0.4909 0.4931 0.4948 0.4961 0.4971 0.4979 0.4985 0.4989

0.07 0.0279 0.0675 0.1064 0.1443 0.1808 0.2157 0.2486 0.2794 0.3078 0.3340 0.3577 0.3790 0.3980 0.4147 0.4292 0.4418 0.4525 0.4616 0.4693 0.4756 0.4808 0.4850 0.4884 0.4911 0.4932 0.4949 0.4962 0.4972 0.4979 0.4985 0.4989

0.08 0.0319 0.0714 0.1103 0.1480 0.1844 0.2190 0.2517 0.2823 0.3106 0.3365 0.3599 0.3810 0.3997 0.4162 0.4306 0.4429 0.4535 0.4625 0.4699 0.4761 0.4812 0.4854 0.4887 0.4913 0.4934 0.4951 0.4963 0.4973 0.4980 0.4986 0.4990

0.09 0.0359 0.0753 0.1141 0.1517 0.1879 0.2224 0.2549 0.2852 0.3133 0.3389 0.3621 0.3830 0.4015 0.4177 0.4319 0.4441 0.4545 0.4633 0.4706 0.4767 0.4817 0.4857 0.4890 0.4916 0.4936 0.4952 0.4964 0.4974 0.4981 0.4986 0.4990

Source: James T. McClave and P. George Benson, (1988), Statistics for Business and Economics, Appendix, Table IV, Fourth Edition, Dellen Publishing Company, San Francisco.


Appendix B: The t-Distribution The entries in the table give the critical values of the t-Statistic for the specified number of degrees of freedom and areas in the right tail.

d.f. 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40

Area in the Right Tail under the t-Distribution Curve 0.10 0.05 0.025 0.01 0.005 3.078 6.314 12.706 31.821 63.657 1.886 2.920 4.303 6.965 9.925 1.638 2.353 3.182 4.541 5.841 1.533 2.132 2.776 3.747 4.604 1.476 2.015 2.571 3.365 4.032 1.440 1.943 2.447 3.143 3.707 1.415 1.895 2.365 2.998 3.499 1.397 1.860 2.306 2.896 3.355 1.383 1.833 2.262 2.821 3.250 1.372 1.812 2.228 2.764 3.169 1.363 1.796 2.201 2.718 3.106 1.356 1.782 2.179 2.681 3.055 1.350 1.771 2.160 2.650 3.012 1.345 1.761 2.145 2.624 2.977 1.341 1.753 2.131 2.602 2.947 1.337 1.746 2.120 2.583 2.921 1.333 1.740 2.110 2.567 2.898 1.330 1.734 2.101 2.552 2.878 1.328 1.729 2.093 2.539 2.861 1.325 1.725 2.086 2.528 2.845 1.323 1.721 2.080 2.518 2.831 1.321 1.717 2.074 2.508 2.819 1.319 1.714 2.069 2.500 2.807 1.318 1.711 2.064 2.492 2.797 1.316 1.708 2.060 2.485 2.787 1.315 1.706 2.056 2.479 2.779 1.314 1.703 2.052 2.473 2.771 1.313 1.701 2.048 2.467 2.763 1.311 1.699 2.045 2.462 2.756 1.310 1.697 2.042 2.457 2.750 1.309 1.696 2.040 2.453 2.744 1.309 1.694 2.037 2.449 2.738 1.308 1.692 2.035 2.445 2.733 1.307 1.691 2.032 2.441 2.728 1.306 1.690 2.030 2.438 2.724 1.306 1.688 2.028 2.434 2.719 1.305 1.687 2.026 2.431 2.715 1.304 1.686 2.024 2.429 2.712 1.304 1.685 2.023 2.426 2.708 1.303 1.684 2.021 2.423 2.704

0.001 318.309 22.327 10.215 7.173 5.893 5.208 4.785 4.501 4.297 4.144 4.025 3.930 3.852 3.787 3.733 3.686 3.646 3.610 3.579 3.552 3.527 3.505 3.485 3.467 3.450 3.435 3.421 3.408 3.396 3.385 3.375 3.365 3.356 3.348 3.340 3.333 3.326 3.319 3.313 3.307

Econometric Analysis: An Applied Approach to Business and Economics

41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75

∞

1.303 1.302 1.302 1.301 1.301 1.300 1.300 1.299 1.299 1.299 1.298 1.298 1.298 1.297 1.297 1.297 1.297 1.296 1.296 1.296 1.296 1.295 1.295 1.295 1.295 1.295 1.294 1.294 1.294 1.294 1.294 1.293 1.293 1.293 1.293 1.282

1.683 1.682 1.681 1.680 1.679 1.679 1.678 1.677 1.677 1.676 1.675 1.675 1.674 1.674 1.673 1.673 1.672 1.672 1.671 1.671 1.670 1.670 1.669 1.669 1.669 1.668 1.668 1.668 1.667 1.667 1.667 1.666 1.666 1.666 1.665 1.645

2.020 2.018 2.017 2.015 2.014 2.013 2.012 2.011 2.010 2.009 2.008 2.007 2.006 2.005 2.004 2.003 2.002 2.002 2.001 2.000 2.000 1.999 1.998 1.998 1.997 1.997 1.996 1.995 1.995 1.994 1.994 1.993 1.993 1.993 1.992 1.960

2.421 2.418 2.416 2.414 2.412 2.410 2.408 2.407 2.405 2.403 2.402 2.400 2.399 2.397 2.396 2.395 2.394 2.392 2.391 2.390 2.389 2.388 2.387 2.386 2.385 2.384 2.383 2.382 2.382 2.381 2.380 2.379 2.379 2.378 2.377 2.326

2.701 2.698 2.695 2.692 2.690 2.687 2.685 2.682 2.680 2.678 2.676 2.674 2.672 2.670 2.668 2.667 2.665 2.663 2.662 2.660 2.659 2.657 2.656 2.655 2.654 2.652 2.651 2.650 2.649 2.648 2.647 2.646 2.645 2.644 2.643 2.576

3.301 3.296 3.291 3.286 3.281 3.277 3.273 3.269 3.265 3.261 3.258 3.255 3.251 3.248 3.245 3.242 3.239 3.237 3.234 3.232 3.229 3.227 3.225 3.223 3.220 3.218 3.216 3.214 3.213 3.211 3.209 3.207 3.206 3.204 3.202 3.090


Appendix C: Critical Values of the Chi-square Distribution
Prob(χ² > χα²) = α
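The same critical values can be generated directly, since the entry for a given d.f. and significance level α is the (1 − α) quantile of the chi-square distribution. A minimal Python/SciPy sketch (an assumed tool, shown only for illustration):

from scipy.stats import chi2

# P(chi-square > c) = alpha  =>  c = chi2.ppf(1 - alpha, df)
for df in (1, 10, 30):
    print(df, [round(chi2.ppf(1 - a, df), 2) for a in (0.10, 0.05, 0.01)])
# 1  -> [2.71, 3.84, 6.63]
# 10 -> [15.99, 18.31, 23.21]
# 30 -> [40.26, 43.77, 50.89]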

                 Significance Level
d.f.        0.10        0.05        0.01
1           2.71        3.84        6.63
2           4.61        5.99        9.21
3           6.25        7.81        11.34
4           7.78        9.49        13.28
5           9.24        11.07       15.09
6           10.64       12.59       16.81
7           12.02       14.07       18.48
8           13.36       15.51       20.09
9           14.68       16.92       21.67
10          15.99       18.31       23.21
11          17.28       19.68       24.72
12          18.55       21.03       26.22
13          19.81       22.36       27.69
14          21.06       23.68       29.14
15          22.31       25.00       30.58
16          23.54       26.30       32.00
17          24.77       27.59       33.41
18          25.99       28.87       34.81
19          27.20       30.14       36.19
20          28.41       31.41       37.57
21          29.62       32.67       38.93
22          30.81       33.92       40.29
23          32.01       35.17       41.64
24          33.20       36.42       42.98
25          34.38       37.65       44.31
26          35.56       38.89       45.64
27          36.74       40.11       46.96
28          37.92       41.34       48.28
29          39.09       42.56       49.59
30          40.26       43.77       50.89

Source: Jeffrey M. Wooldridge, Introductory Econometrics, South-Western Cengage Learning, USA.


Appendix D: Critical Values of the F Distribution
Prob(F(v1, v2) > Fα(v1, v2)) = α

v1 represents the numerator degrees of freedom and v2 represents the denominator degrees of freedom.
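Each tabulated value is therefore the (1 − α) quantile of the F(v1, v2) distribution. The following minimal Python/SciPy sketch (an assumption made for illustration; the book itself relies on the printed tables) reproduces two representative entries:

from scipy.stats import f

# P(F(v1, v2) > c) = alpha  =>  c = f.ppf(1 - alpha, dfn=v1, dfd=v2)
print(round(f.ppf(0.90, dfn=5, dfd=10), 2))   # 2.52  (alpha = 0.10 panel)
print(round(f.ppf(0.95, dfn=4, dfd=10), 2))   # 3.48  (alpha = 0.05 panel)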

Significance Level, α = 0.10
v2

v1

1 2 3 4 5

1 39.86 8.53 5.54 4.54 4.06

2 49.50 9.00 5.46 4.32 3.78

3 53.59 9.16 5.39 4.19 3.62
4 55.83 9.24 5.34 4.11 3.52

5 57.24 9.29 5.31 4.05 3.45

6 58.20 9.33 5.28 4.01 3.40

7 58.91 9.35 5.27 3.98 3.37

8 59.44 9.37 5.25 3.95 3.34

9 59.86 9.38 5.24 3.94 3.32

10 60.19 9.39 5.23 3.92 3.30

6 7 8 9 10

3.78 3.59 3.46 3.36 3.29

3.46 3.26 3.11 3.01 2.92

3.29 3.07 2.92 2.81 2.73

3.18 2.96 2.81 2.69 2.61

3.11 2.88 2.73 2.61 2.52

3.05 2.83 2.67 2.55 2.46

3.01 2.78 2.62 2.51 2.41

2.98 2.75 2.59 2.47 2.38

2.96 2.72 2.56 2.44 2.35

2.94 2.70 2.54 2.42 2.32

11 12 13 14 15

3.23 3.18 3.14 3.10 3.07

2.86 2.81 2.76 2.73 2.70

2.66 2.61 2.56 2.52 2.49

2.54 2.48 2.43 2.39 2.36

2.45 2.39 2.35 2.31 2.27

2.39 2.33 2.28 2.24 2.21

2.34 2.28 2.23 2.19 2.16

2.30 2.24 2.20 2.15 2.10

2.27 2.21 2.16 2.12 2.09

2.25 2.19 2.14 2.10 2.06

16 17 18 19 20

3.05 3.03 3.01 2.99 2.97

2.67 2.64 2.62 2.61 2.59

2.46 2.44 2.42 2.40 2.38

2.33 2.31 2.29 2.27 2.25

2.24 2.22 2.20 2.18 2.16

2.18 2.15 2.13 2.11 2.09

2.13 2.10 2.08 2.06 2.04

2.09 2.06 2.04 2.02 2.00

2.06 2.03 2.00 1.98 1.96

2.03 2.00 1.98 1.96 1.94

21 22 23 24 25

2.96 2.95 2.94 2.93 2.92

2.57 2.56 2.55 2.54 2.53

2.36 2.35 2.34 2.33 2.32

2.23 2.22 2.21 2.19 2.18

2.14 2.13 2.11 2.10 2.09

2.08 2.06 2.05 2.04 2.02

2.02 2.01 1.99 1.98 1.97

1.98 1.97 1.95 1.94 1.93

1.95 1.93 1.92 1.91 1.89

1.92 1.90 1.89 1.88 1.87

26 27 28 29 30

2.91 2.90 2.89 2.89 2.88

2.52 2.51 2.50 2.50 2.49

2.31 2.30 2.29 2.28 2.28

2.17 2.17 2.16 2.15 2.14

2.08 2.07 2.06 2.06 2.05

2.01 2.00 2.00 1.99 1.98

1.96 1.95 1.94 1.93 1.93

1.92 1.91 1.90 1.89 1.88

1.88 1.87 1.87 1.86 1.85

1.86 1.85 1.84 1.83 1.82

40 60 120

2.84 2.79 2.75 2.71

2.44 2.39 2.35 2.30

2.23 2.18 2.13 2.08

2.09 2.04 1.99 1.94

2.02 1.95 1.90 1.85

1.93 1.87 1.82 1.77

1.87 1.82 1.77 1.72

1.83 1.77 1.72 1.67

1.79 1.74 1.68 1.63

1.76 1.71 1.65 1.60

∞

Source: R. P. Hooda, Statistics for Business and Economics, Macmillan India Limited.


Appendix D: [Continued] Significance Level, α = 0.10

v2

v1

∞

1 2 3 4 5

12 60.71 9.41 5.22 3.90 3.27

15 61.22 9.42 5.20 3.87 3.24

20 61.74 9.44 5.18 3.84 3.21

24 62.00 9.45 5.18 3.83 3.19

30 62.25 9.45 5.17 3.82 3.17

40 62.53 9.47 5.16 3.80 3.16

60 62.79 9.47 5.15 3.79 3.14

120 63.06 9.48 5.14 3.78 3.12

63.33 9.49 5.13 3.76 3.10

6 7 8 9 10

2.90 2.67 2.50 2.38 2.28

2.87 2.63 2.46 2.34 2.24

2.84 2.59 2.42 2.30 2.20

2.82 2.58 2.40 2.28 2.18

2.80 2.56 2.38 2.25 2.16

2.78 2.54 2.36 2.23 2.13

2.76 2.51 2.34 2.21 2.11

2.74 2.49 2.32 2.18 2.08

2.72 2.47 2.29 2.16 2.06

11 12 13 14 15

2.21 2.15 2.10 2.05 2.02

2.17 2.10 2.05 2.01 1.97

2.12 2.06 2.01 1.96 1.92

2.10 2.04 1.98 1.94 1.90

2.08 2.01 1.96 1.91 1.87

2.05 1.99 1.93 1.89 1.85

2.03 1.96 1.90 1.86 1.82

2.00 1.93 1.88 1.83 1.79

1.97 1.90 1.85 1.80 1.76

16 17 18 19 20

1.99 1.96 1.93 1.91 1.89

1.94 1.91 1.89 1.86 1.84

1.89 1.86 1.84 1.81 1.79

1.87 1.84 1.81 1.79 1.77

1.84 1.81 1.78 1.76 1.74

1.81 1.78 1.75 1.73 1.71

1.78 1.75 1.72 1.70 1.68

1.75 1.72 1.69 1.67 1.64

1.72 1.69 1.66 1.63 1.61

21 22 23 24 25

1.87 1.86 1.84 1.83 1.82

1.83 1.81 1.80 1.78 1.77

1.78 1.76 1.74 1.73 1.72

1.75 1.73 1.72 1.70 1.69

1.72 1.70 1.69 1.67 1.66

1.69 1.67 1.66 1.64 1.63

1.66 1.64 1.62 1.61 1.59

1.62 1.60 1.59 1.57 1.56

1.59 1.57 1.55 1.53 1.52

26 27 28 29 30

1.81 1.80 1.79 1.78 1.77

1.76 1.75 1.74 1.73 1.72

1.71 1.70 1.69 1.68 1.67

1.68 1.67 1.66 1.65 1.64

1.65 1.64 1.63 1.62 1.61

1.61 1.60 1.59 1.58 1.57

1.58 1.57 1.56 1.55 1.54

1.54 1.53 1.52 1.51 1.50

1.50 1.49 1.48 1.47 1.46

40 60 120

1.71 1.66 1.60 1.55

1.66 1.60 1.55 1.49

1.61 1.54 1.48 1.42

1.57 1.51 1.45 1.38

1.54 1.48 1.41 1.34

1.51 1.44 1.37 1.30

1.47 1.40 1.32 1.24

1.42 1.35 1.26 1.17

1.38 1.29 1.19 1.00

∞

Source: R. P. Hooda, Statistics for Business and Economics, Macmillan India Limited.


Appendix D: [Continued] Significance Level, α = 0.05

v2

v1

1 2 3 4 5

1 161.4 18.51 10.13 7.71 6.61

2 199.5 19.00 9.55 6.94 5.79

3 215.7 19.16 9.28 6.59 5.41

6 7 8 9 10

5.99 5.59 5.32 5.12 4.96

5.14 4.74 4.46 4.26 4.10

11 12 13 14 15

4.84 4.75 4.67 4.60 4.54

16 17 18 19 20

224.6 19.25 9.12 6.39 5.19

230.2 19.30 9.01 6.26 5.05

234.0 19.33 8.94 6.16 4.95

236.8 19.35 8.89 6.09 4.88

238.9 19.37 8.85 6.04 4.82

9 240.5 19.38 8.81 6.00 4.77
10 241.9 19.40 8.79 5.96 4.74

4.76 4.35 4.07 3.86 3.71

4.53 4.12 3.84 3.63 3.48

4.39 3.97 3.69 3.48 3.33

4.28 3.87 3.58 3.37 3.22

4.21 3.79 3.50 3.29 3.14

4.15 3.73 3.44 3.23 3.07

4.10 3.68 3.39 3.18 3.02

4.06 3.64 3.35 3.14 2.98

3.98 3.89 3.81 3.74 3.68

3.59 3.49 3.41 3.34 3.29

3.36 3.26 3.18 3.11 3.06

3.20 3.11 3.03 2.96 2.90

3.09 3.00 2.92 2.85 2.79

3.01 2.91 2.83 2.76 2.71

2.95 2.85 2.77 2.70 2.64

2.90 2.80 2.71 2.65 2.59

2.85 2.75 2.67 2.60 2.54

4.49 4.45 4.41 4.38 4.35

3.63 3.59 3.55 3.52 3.49

3.24 3.20 3.16 3.13 3.10

3.01 2.96 2.93 2.90 2.87

2.85 2.81 2.77 2.74 2.71

2.74 2.70 2.66 2.63 2.60

2.66 2.61 2.58 2.54 2.51

2.59 2.55 2.51 2.48 2.45

2.54 2.49 2.46 2.42 2.39

2.49 2.45 2.41 2.38 2.35

21 22 23 24 25

4.32 4.30 4.28 4.26 4.24

3.47 3.44 3.42 3.40 3.39

3.07 3.05 3.03 3.01 2.99

2.84 2.82 2.80 2.78 2.76

2.68 2.66 2.64 2.62 2.60

2.57 2.55 2.53 2.51 2.49

2.49 2.46 2.44 2.42 2.40

2.42 2.40 2.37 2.36 2.34

2.37 2.34 2.32 2.30 2.28

2.32 2.30 2.27 2.25 2.24

26 27 28 29 30

4.23 4.21 4.20 4.18 4.17

3.37 3.35 3.34 3.33 3.32

2.98 2.96 2.95 2.93 2.92

2.74 2.73 2.71 2.70 2.69

2.59 2.57 2.56 2.55 2.53

2.47 2.46 2.45 2.43 2.42

2.39 2.37 2.36 2.35 2.33

2.32 2.31 2.29 2.28 2.27

2.27 2.25 2.24 2.22 2.21

2.22 2.20 2.19 2.18 2.16

40 60 120

4.08 4.00 3.92 3.84

3.23 3.15 3.07 3.00

2.84 2.76 2.68 2.60

2.61 2.53 2.45 2.37

2.45 2.37 2.29 2.21

2.34 2.25 2.17 2.10

2.25 2.17 2.09 2.01

2.18 2.10 2.02 1.94

2.12 2.04 1.96 1.88

2.08 1.99 1.91 1.83

∞

4

5

6

7

8

Source: R. P. Hooda, Statistics for Business and Economics, Macmillan India Limited.


Appendix D: [Continued] Significance Level, α = 0.05

v2

v1

∞

1 2 3 4 5

12 243.9 19.41 8.74 5.91 4.68

15 245.9 19.43 8.70 5.86 4.62

20 248.0 19.45 8.66 5.80 4.56

24 249.1 19.45 8.64 5.77 4.53

30 250.1 19.46 8.62 5.75 4.50

40 251.1 19.47 8.59 5.72 4.46

60 252.2 19.48 8.57 5.69 4.43

120 253.3 19.49 8.55 5.66 4.40

254.3 19.50 8.53 5.63 4.36

6 7 8 9 10

4.00 3.57 3.28 3.07 2.91

3.94 3.51 3.22 3.01 2.85

3.87 3.44 3.15 2.94 2.77

3.84 3.41 3.12 2.90 2.74

3.81 3.38 3.08 2.86 2.70

3.77 3.34 3.04 2.83 2.66

3.74 3.30 3.01 2.79 2.62

3.70 3.27 2.97 2.75 2.58

3.67 3.23 2.93 2.71 2.54

11 12 13 14 15

2.79 2.69 2.60 2.53 2.48

2.72 2.62 2.53 2.46 2.40

2.65 2.54 2.46 2.39 2.33

2.61 2.51 2.42 2.35 2.29

2.57 2.47 2.38 2.31 2.25

2.53 2.43 2.34 2.27 2.20

2.49 2.38 2.30 2.22 2.16

2.45 2.34 2.25 2.18 2.11

2.40 2.30 2.21 2.13 2.07

16 17 18 19 20

2.42 2.38 2.34 2.31 2.28

2.35 2.31 2.27 2.23 2.20

2.28 2.23 2.19 2.16 2.12

2.24 2.19 2.15 2.11 2.08

2.19 2.15 2.11 2.07 2.04

2.15 2.10 2.06 2.03 1.99

2.11 2.06 2.02 1.98 1.95

2.06 2.01 1.97 1.93 1.90

2.01 1.96 1.92 1.88 1.84

21 22 23 24 25

2.25 2.23 2.20 2.18 2.16

2.18 2.15 2.13 2.11 2.09

2.10 2.07 2.05 2.03 2.01

2.05 2.03 2.01 1.98 1.96

2.01 1.98 1.96 1.94 1.92

1.96 1.94 1.91 1.89 1.87

1.92 1.89 1.86 1.84 1.82

1.87 1.84 1.81 1.79 1.77

1.81 1.78 1.76 1.73 1.71

26 27 28 29 30

2.15 2.13 2.12 2.10 2.09

2.07 2.06 2.04 2.03 2.01

1.99 1.97 1.96 1.94 1.93

1.96 1.93 1.91 1.90 1.89

1.92 1.88 1.87 1.85 1.84

1.87 1.84 1.82 1.81 1.79

1.82 1.79 1.77 1.75 1.74

1.77 1.73 1.71 1.70 1.68

1.71 1.67 1.65 1.64 1.62

40 60 120

2.00 1.92 1.83 1.75

1.92 1.84 1.75 1.67

1.84 1.75 1.66 1.57

1.79 1.70 1.61 1.52

1.74 1.65 1.55 1.46

1.69 1.59 1.50 1.39

1.64 1.53 1.43 1.32

1.58 1.47 1.35 1.22

1.51 1.39 1.25 1.00

∞

Source: R. P. Hooda, Statistics for Business and Economics, Macmillan India Limited.

REFERENCES

Akaike, H. (1979). A Bayesian Extension of the Minimum AIC Procedure of Autogressive Model Fitting. Biometrika 66, 237–242. Akaike, H. (1981a). Likelihood of a Model and Information Criteria. Journal of Econometrics, 16, 3–14. Banerjee, A., Dolado J., Galbraith W., and Hendry D. F. (1993). Co-integration, Error Correction and the Econometric Analysis of Non-Stationary Data. Oxford: Oxford University Press. Baltagi, B.H. (2005). Econometric Analysis of Panel Data, 3rd Edition, John Wiley, Chichester Bierens, H. J. (2001). Unit Roots: Chapter 29 in B.H. Baltagi (ed.) A Companion to Theoretical Econometrics, Blackwell: Massachusetts. Bera, A. K., and Jarque C. M. (1981). An Efficient Large-Sample Test for Normality of Observations and Regression Residuals. Australian National University Working Papers in Econometrics 40, Canberra. Bollerslev, T. (1986). Generalized Autoregressive Conditional Heteroscedasticity, Journal of Econometrics 31, 307– 327. Bollerslev, T., Chou, R. Y., and Kroner, K. F. (1992). ARCH Modelling in Finance: A Review of the Theory and Empirical Evidence. Journal of Econometrics 52(5), 5–59. Bollerslev, T., Engle, R. F. and Wooldridge, J. M. (1988). A Capital-asset Pricing Model with Time-Varying Covariances. Journal of Political Economy 96(1), 116–31. Bound, J., Jaeger D. A., and Baker, R. M. (1995). Problems with Instrumental Variables Estimation when the Correlation between the Instruments and Endogenous Explanatory Variables is Weak. Journal of American Statistical Association, 90: 443-450. Box, G. E. P., Jenkins, G. M., and Reinsel G. C. (2015). Time Series Analysis: Forecasting and Control, 5th Edition, Holden-Day, San Francisco. Breitung, J. (2000). The Local Power of Some Unit Root Tests for Panel Data, in B. Baltagi (ed.) Nonstationary Panels, Panel Cointegration and Dynamic Panels, Advances in Econometrics 15, 161–78, JAI Press, Amsterdam. Breitung, J. and Das, S. (2005). Panel Unit Root Tests under Cross-sectional Dependence, Statistica Neerlandica 59, 414–33 Breitung, J. and Pesaran, M. H. (2008). Unit Roots and Cointegration in Panels, in L. Matyas and P. Sevestre (eds.) The Econometrics of Panel Data, 3rd edn, Springer-Verlag, Berlin Breusch, T. S., and A. R. Pagan, (1979). A Simple Test for Heteroscedasticity and Random Coefficient Variations. Econometrica 47, 987-1007. Brooks, C. (2008). Introductory Econometrics for Finance, 2nd Edition, Cambridge University Press, Cambridge, U.K. Cameron, A. C. and Trivedi P. K. (1998). Regression Analysis of Count Data. Cambridge: Cambridge University Press. Choi, I. (2001). Unit Root Tests for Panel Data. Journal of International Money and Finance 20, 249–72 Chow, G. C. (1960). Test of Equality between Sets of Coefficients in Two Linear Regression Models. Econometrica, 28, 591-605. Cochrane, D. and Orcutt, G. H. (1949). Application of Least Squares Regression to Relationships Containing Autocorrelated Error Terms. Journal of the American Statistical Association 44, 32–61. Dickey, D.A., and Fuller, W. A. (1979). Distribution of the Estimators for Autoregressive Time Series with a Unit Root. Journal of the American Statistical Association 74, 427–431. Dickey, D. A. and Fuller, W. A. (1981). Likelihood Ratio Statistics for Autoregressive Time Series with a Unit Root. Econometrica 49(4), 1057–72. Durbin, J., and Watson, G. S. (1951). Testing for Serial Correlation in Least Squares Regression. Biometrika 38, 159– 71 Dougherty, C. (1992). Introduction to Econometrics, Oxford University Press, Oxford. Enders, W. (1995). 
Applied Econometric Time Series, Wiley: New York. Engle, R.F., and Granger, C.W.J. (1987). Co-Integration and Error Correction: Representation, Estimation and Testing. Econometrica, 55, 251–276. Fabozzi, F. J., and Francis, J. C. (1980). Heteroscedasticity in the Single Index Model. Journal of Economics and Business 32, 243–8. Fisher, R. A. (1922). On the Mathematical Foundations of Theoretical Statistics. Royal Society of London. Philosophical Transactions (Series A) 222, 309–368. Fisher, R. A. (1932). Statistical Methods for Research Workers, 4th edn, Oliver and Boyd, Edinburgh. Franses, P. H., and Van Dijk, D. (2000). Non-linear Time Series Models in Empirical Finance, Cambridge University Press, Cambridge, UK. Fuller, W. A. (1976). Introduction to Statistical Time Series, Wiley, New York George. Godfrey, L.G. (1979). Testing the Adequacy of a Time Series Model. Biometrika 66, 67–72.


Goldberger, A. S. (1991). A Course in Econometrics. Cambridge, MA: Harvard University Press. Goldfeld, S. M., and Quandt, R. E. (1965). Some Tests for Homoskedasticity. Journal of the American Statistical Association 60, 539–47 Granger, C.W.J. (1969). Investigating Causal Relations by Econometric Models and Cross-Spectral Methods. Econometrica 37, 424–438. Granger, C.W.J., and Newbold P. (1974). Spurious Regressions in Econometrics. Journal of Econometrics 2, 111– 120. Greene, W. (2003). Econometric Analysis, 5th Edition, New York, MacMillan. Gregory, A. W. and Hansen, B. E. (1996). A Residual-based Test for Cointegration in Models with Regime Shifts. Journal of Econometrics 70, 99–126. Gujarati, D.N. (2002). Basic Econometrics, 4th Edition, McGraw Hill: New York. Hadri, K., (2000). Testing for stationarity in heterogeneous panel data. Econometrics Journal 3, 148-161. Hamilton, J.D. (1994). Time Series Analysis. Princeton University Press: Princeton, New Jersey. Hannan, E. J., and Quinn, B. G. (1979). The Determination of the Order of an Autoregression. Journal of The Royal Statistical Society (Series B),41, 190–195. Hansen, B. E. (1996). Inference When a Nuisance Parameter is not Identified under the Null Hypothesis. Econometrica 64, 413–30 Hansen, L. P. (1982). Large Sample Properties of Generalised Method of Moments Estimators. Econometrica 50, 1029–54 Harris, R. I. D. (1995). Cointegration Analysis in Econometric Modelling, Prentice-Hall, Harlow, UK. Harris, R. D. F., and Tzavalis, E. (1999). Inference for Unit Roots in Dynamic Panels where the Time Dimension is Fixed. Journal of Econometrics 91, 201–26. Harvey, A., Ruiz, E., and Shephard, N. (1994). Multivariate Stochastic Variance Models. Review of Economic Studies 61, 247–64 Hausman, J. A. (1978). Specification Tests in Econometrics. Econometrica, 46, 1251-1271. Heckman, J. J. (1976). The Common Structure of Statistical Models of Truncation, Sample Selection, and Limited Dependent Variable and a Simple Estimate for Such Models. Annals of Economics and Social Measurements 5, 475-492. Heckman, J. J. (1979). Sample Selection Bias as a Specification Error. Econometrica 47(1), 153–61 Hendry, D. F., and Juselius, K. (2000). Explaining Cointegration Analysis: Part I, Energy Journal 21, 1–42. Hill, C. W., Griffiths, W., and Judge, G. (2000). Undergraduate Econometrics, 2nd Edition, Wiley, New York. Hsiao, C. (2003). Analysis of Panel Data, 2nd Edition, Cambridge University Press, Cambridge, UK. Hossain, Md. S. (2011). Panel Estimation for CO2 Emissions, Energy Consumption, Economic Growth, Trade Openness and Urbanization of Newly Industrialized Countries. Energy Policy, 39(11), 6991-6999. Hossain, Md. S. (2012). Multivariate Granger Causality between Economic Growth, Electricity Consumption, Exports and Remittance for the Panel of Three SAARC Countries. Global Journal of Management and Business Research 12 (4) Version 1, 41-54. Hossain, Md. S., and Saeki, C. (2012). A Dynamic Causality Study between Electricity Consumption and Economic Growth for Global Panel: Evidence from 76 Countries. Asian Economic and Financial Review 2(1), 1-13. Hendry, D. F., and Richard, J. F. (1982). On the Formulation of Empirical Models in Dynamic Econometrics, Journal of Econometrics 20, 3–33. Im, K. S., Pesaran, M. H., and Shin, Y. (2003). Testing for Unit Roots in Heterogeneous Panels. Journal of Econometrics 115, 53–74 Johansen, S. (1988). Statistical Analysis of Cointegrating Vectors. Journal of Economic Dynamics and Control 12, 231–254. 
Johansen, S., and Juselius, K. (1990). Maximum Likelihood Estimation and Inference on Cointegration with Applications to the Demand for Money. Oxford Bulletin of Economics and Statistics 52, 169–210. Johnston, J. (1960). Econometric Methods, 2nd Edition, McGraw-Hill, New York. Judge, G.G., Hill, R. C., Griffiths, W.E., Lutkepohl, H., and Lee, T.C. (1985). The Theory and Practice of Econometrics, John Wiley and Sons: New York. Kwiatkowski, D., Phillips, P.C.B., Schmidt, P., and Shin, Y. (1992). Testing the Null Hypothesis of Stationarity against the Alternative of a Unit Root. Journal of Econometrics 54, 159–178. Kao, C. D. (1999). Spurious Regression and Residual-based Tests for Cointegration in Panel Data. Journal of Econometrics 90, 1–44. Kennedy, P. (2003). Guide to Econometrics, 5th Edition, Blackwell, Malden, MA. Koopmans, T. C. (1937). Linear Regression Analysis of Economic Time Series. Netherlands Economics Institute, Haarlem. Koutsoyiannis, A. (1973). Theory of Econometrics, 2nd Edition. The Macmillan Press Ltd. Lahiri, K., and Egy, D. (1981). Joint Estimation and Testing for Functional Form and Heteroskedasticity. Journal of Econometrics, Vol. 15(2), 299–307. Larsson, R., Lyhagen, J., and Lothgren, M. (2001). Likelihood-based Cointegration Tests in Heterogeneous Panels. Econometrics Journal 4, 109–42.


Leamer, E. E. (1978). Specification Searches: Ad Hoc Inference with Nonexperimental Data 1st Edition John Wiley, New York. Leamer, E. E. (1985). Vector Autoregressions for Causal Interference, in K. Brunner and A. Meltzer (eds.) Understanding Monetary Regimes, Cambridge University Press, Cambridge, UK, 255–304 Levin, A., Lin, C. F., and Chu, C.S. (2002). Unit Root Tests in Panel Data: Asymptotic and Finite Sample Properties. Journal of Econometrics 108, 1-24. Leybourne, S. J., Mills, T. C., and Newbold, P. (1998). Spurious Rejections by Dickey–Fuller Tests in the Presence of a Break under the Null. Journal of Econometrics 87, 191–203. Ljung, G.M., and Box, G.E.P. (1978). On a Measure of Lack of Fit in Time-Series Models. Biometrika, 65, 297–303. Lutkepohl, H. (2001). Vector Autoregressions. Chapter 32 in B.H. Baltagi (ed.) A Companion to Theoretical Econometrics (Blackwell: Massachusetts). MacKinnon, J.G. (1991). Critical Values for Cointegration Tests, Ch. 13 in Long-Run Economic Relationships: Readings in Cointegration, eds. R.F. Engle and C.W.J. Granger (Oxford University Press: Oxford ). MacKinnon, J. G. (1996). Numerical Distribution Functions for Unit Root and Cointegration Tests. Journal of Applied Econometrics 11, 601–18 MacKinnon, J. G., Haug, A., and Michelis, L. (1999). Numerical Distribution Functions of Likelihood Ratio Tests for Cointegration. Journal of Applied Econometrics 14(5), 563–77 Maddala, G. S. (1999). Limited-dependent and Quantitative Variables in Econometrics, Cambridge University Press, Cambridge, UK. Maddala, G.S. (2009), Introduction to Econometrics, 4th Edition, Macmillan, New York. Maddala, G. S., and Kim, I-M. (1999). Unit Roots, Cointegration and Structural Change, Cambridge University Press, Cambridge. Maddala, G. S., and Wu, S. (1999). A Comparative Study of Unit Root Tests with Panel Data and a New Simple Test. Oxford Bulletin of Economics and Statistics 61, 631–52. Mills, T. C., and Markellos, R. N. (2008). The Econometric Modelling of Financial Time Series, 3rd Edition, Cambridge University Press, Cambridge, UK. McFadden, D. L. (1974). Conditional Logit Analysis of Qualitative Choice Behavior. Frontiers in Econometrics, Ed. P. Zarembka, 105-142, New York Academic Press. Nelson, C.R., and Plosser, C.I. (1982). Trends and Random Walks in Macroeconomic Time Series: Some Evidence and Implications. Journal of Monetary Economics, 10: 139–162. Ng, S., and Perron, P. (1995). Unit Root Tests in ARMA Models with Data-Dependent Methods for the Selection of the Truncation Lag. Journal of the American Statistical Association, 90, 268–281. Newey, W. K., and West, K. D. (1987). A Simple Positive Semi-Definite Heteroscedasticity and Autocorrelation Consistent Covariance Matrix. Econometrica 55, 703-708. Neyman, J., and Pearson, E. S. (1928). On the Use and Interpretation of Certain Test Criteria for Purposes of Statistical Inference. Biometrika, 20A, 175–240 (Part I), 263–294 (Part II). Neyman, J., and Pearson, E. S. (1933). On the Problem of the Most Efficient Tests of Statistical Hypotheses. Royal Society of London. Philosophical Transactions. (Series A),231, 289–337. Pagan, A. R., and Schwert, G.W. (1990). Alternative Models for Conditional Stock Volatilities. Journal of Econometrics 45, 267–90. Park, R. (1966). Estimation with Heteroscedastic Error Terms. Econometrica, 34, 888. Pedroni, P. (1999). Critical Values for Cointegration Tests in Heterogeneous Panels with Multiple Regressors. Oxford Bulletin of Economic and Statistics 61, 727-731. Pedroni, P. (2004). 
Panel Cointegration, Asymptotic and Finite Sample Properties of Pooled Time Series Tests with an Application to the PPP Hypothesis. Econometric Theory 20(3), 597–625. Pesaran, M.H. (1997). The Role of Economic Theory in Modelling the Long Run. The Economic Journal 107, 178–191. Pesaran, M.H. (2007). A Simple Panel Unit Root Test in the Presence of Cross-Section Dependence. Journal of Applied Econometrics 22(2), 265–312. Pesaran, M.H., Smith, R.P., and Im, K.S. (1996). Dynamic Linear Models for Heterogeneous Panels, in Matyas, L. and P. Sevestre (eds), The Econometrics of Panel Data: A Handbook of Theory with Applications, 2nd Revised Ed., Kluwer Academic Publishers, Dordrecht. Pesaran, M.H., Shin, Y., and Smith, R. (1999). Pooled Mean Group Estimation of Dynamic Heterogeneous Panels. Journal of the American Statistical Association 94, 621–634. Phillips, P.C.B. (1986). Understanding Spurious Regressions in Econometrics. Journal of Econometrics 33, 311–340. Phillips, P.C.B., and Sul, D. (2003). Dynamic Panel Estimation and Homogeneity Testing Under Cross-Section Dependence. Econometrics Journal 6, 217–59. Phillips, P.C.B., and Perron, P. (1988). Testing for a Unit Root in Time Series Regression. Biometrika 75, 335–346. Phillips, P., and Ouliaris, S. (1990). Asymptotic Properties of Residual-Based Tests for Cointegration. Econometrica 58, 73–93. Quandt, R. (1960). Tests of the Hypothesis that a Linear Regression System Obeys Two Different Regimes. Journal of the American Statistical Association 55, 324–30.


Rahman, M. (2012). Basic Econometrics: Theory and Practice, 2nd Edition, The University Grants Commission of Bangladesh. Ramanathan, R. (1995). Introductory Econometrics with Applications, 3rd Edition, Dryden Press, Fort Worth, TX. Ramsey, J. B. (1969). Tests for Specification Errors in Classical Linear Least-squares Regression Analysis. Journal of the Royal Statistical Society B 31(2), 350–71. Schwarz, G. (1978). Estimating the Dimension of a Model. Annals of Statistics 6, 461–4. Shanken, J. (1992). On the Estimation of Beta-pricing Models. Review of Financial Studies 5, 1–33. Shephard, N. (1996). Statistical Aspects of ARCH and Stochastic Volatility, in D. R. Cox, D. V. Hinkley and O. E. Barndorff-Nielsen (eds.) Time Series Models: in Econometrics, Finance, and Other Fields, Chapman and Hall, London, 1–67. Sims, C.A., Stock, J. H., and Watson, M.W. (1990). Inference in Linear Time Series Models with Some Unit Roots. Econometrica 58, 113–144. Stock, J.H., and Watson, M.W. (1988). Variable Trends in Economic Time Series. Journal of Economic Perspectives 2, 147–174. Stock, J. H., and Watson, M. W. (2011). Introduction to Econometrics, 3rd Edition, Pearson, Boston, MA. Theil, H. (1966). Applied Economic Forecasting, North-Holland, Amsterdam. Toda, H.Y., and Yamamoto, T. (1995). Statistical Inference in Vector Autoregressions with Possibly Integrated Processes. Journal of Econometrics 66, 225–250. Tobin, J. (1958). Estimation of Relationships for Limited Dependent Variables, Econometrica 26(1), 24–36. Verbeek, M. (2008). A Guide to Modern Econometrics, 3rd Edition, John Wiley & Sons, Ltd. Wald, A. (1943). Tests of Statistical Hypotheses Concerning Several Parameters when the Number of Observations Is Large. Transactions of the American Mathematical Society 54, 426–482. White, H. (1980). A Heteroskedasticity-consistent Covariance Matrix Estimator and a Direct Test for Heteroskedasticity. Econometrica 48, 817–38. Wooldridge, J. M. (2010). Econometric Analysis of Cross-section and Panel Data, 2nd Edition, MIT Press, MA. Zellner, A. (1962). An Efficient Method of Estimating Seemingly Unrelated Regressions and Tests for Aggregation Bias. Journal of the American Statistical Association 57, 348–68. Zivot, E., and Andrews, D.W.K. (1992). Further Evidence on the Great Crash, the Oil Price Shock, and the Unit Root Hypothesis. Journal of Business and Economic Statistics 10, 251–270.

SUBJECT INDEX

A ACF (see autocorrelation function), 367, 373-75, 38385, 401-03. ADF test (see augmented Dickey-Fuller tests), 422, 461, 474, 480-490, 492, 499, 501-04, 554, 556-58, 561-62, 683, 685,-88, 691-92 Adjusted-R2, 28, 51, 52, 61, 63, 67, 69, 88, 90, 93, 94, 99, 124-26, 154, 207, 210, 212-13, 258, 275, 279, 334, 346, 354, 398, 400, 404, 410, 499-99, 506, 596-96, 598, 666, 681 AIC (see Akaike’s Information Criterion), 202-03, 261-63, 347, 387-90, 394, 398, 400-01, 403, 410, 484, 486-87, 498, 501, 503-05, 520-21, 523, 530, 554-59, 561-62 Alternative hypothesis, 45, 84, 91, 92, 128, 133-35, 137-42, 145-46, 150-51, 156-58, 180-87, 189-93, 195-97, 199-02, 250-52, 255-56, 258, 260-64, 289, 301-05, 341, 343-45, 349, 353-57, 376, 408, 449-60, 469-74, 478, 480, 482-89, 501-02, 505-06, 519, 526, 552-53, 556-59, 561-63, 589-90, 603, 649, 659-60, 665, 668, 669, 672, 676-78, 681, 683, 687-89, 691 Analogy property, 38, 39 Analysis of Variance (ANOVA), 86, 125, 127-28, 130-31, 133-138, 140 see also the ANOVA table AR Model (see also the autoregressive process), 234, 364, 368-71, 387-89, 394, 474, 507, 670 AR(1) Model (see also the AR(1) process), 231, 241, 243, 246, 256, 270, 364-68, 383-385, 387-88, 390, 392-93, 395, 405, 422, 435, 438, 443, 451, 455, 457, 463, 469-71, 474, 507, 670 the mean, 235, 365, 369-70 the variance, 235, 365-66, 370 the autocovariance function, 236, 365, 369, the autocorrelation function, 235, 366, 369 variance-covariance matrix, 238 AR(2) Model (see also the AR(2) process), 369, 37172, 381, 383, 393-94, 507 the mean, 369, the variance, 369 the autocovariance function, 369 the autocorrelation function, 369 AR(p) Model ( see also the AR(p) process), 260, 36971, 381, 383, 387, 391-92, 474 the mean, 370 the variance, 370 the autocovariance function, 370 the autocorrelation function, 370 ARCH Model (see autoregressive conditional heteroscedastic Model), 201-02, 347, 364, 404-06, 409-10, 412 concepts, 404 definition, 404-05 model estimation, 406 testing for ARCH effects, 408

ARDL Model (see also autoregressive distributed lag model), 494-95, 543-44, 555-56 Arellano-Bond GMM estimator, 679 ARIMA models (see autoregressive integrated moving average process), 347, 363-64, 400, 418 ARMA (p, q) models (see autoregressive moving average process), 385-86, 398, 401 autocovariance, 386 autocorrelation, 386 concepts, 385, definition, 385 maximum likelihood estimation method, 398 mean, 386 the Box-Jenkins approach, 401 choosing a model, 401 diagnostic checking, 401 variance, 386 Asset pricing model (see also capital asset pricing model), 56, 414, 415 Assumptions, 3, 4, 6, 7, 15, 19, 20, 22, 28, 40, 53, 60, 62, 76, 77, 89, 95, 98, 125, 127, 138, 146, 151, 156, 167, 172, 177, 184-87, 189, 194-97, 199, 201-07, 210-12, 214, 216, 221, 223, 232-33, 238, 250-51, 253-54, 272, 340-42, 352, 406-07, 414, 520, 522, 537-38, 550, 592, 595, 638, 641, 645, 649-51, 653, 662-64, 668, 672-73, 675 Asymptotic properties, 41, 42, 58, 691 consistency, 42 normally distributed, 42 OLS estimator, 41, 42, 58 Augmented Dickey-Fuller test (ADF test), 422, 434, 460, 474, 482, 485, 487, 490, 499, 501-04, 545, 556, 585, 687, Autocorrelation, 167, 174, 201-02, 231-238, 241, 245, 249-55, 257-62, 264-65, 268, 270-71, 278, 280, 366-86, 401-02, 435, 470, 489, 533, 561, 595-96, 598, 638, 653, 668, 670, 677, 683 asymptotic test, 254, Box-Pierce (1970) test, 263-64, 375-76. Breusch-Godfrey LM test, 258, 260, 279 Cochrane-Orcutt (1949) method, 268, 272-73, 27576, 281 consequences, 231, 238, 239 detecting, 201 Durbin’s (1960) method, 272, 275-76 Durbin-Watson test, 251-55, 257-58, 260, 277, 499 effects on the properties of the OLS estimators, 238 feasible generalized least squares (FGLS) estimator, 271-72, 277-78 first-order autoregressive scheme, 234-36, 242-43, 252, 266 generalized least squares (GLS) method, 265, 268 graphical method, 249 higher order autocorrelation, 260, 261, 263, 375, 378



Hildreth-Lu (1960) search procedure, 272, 274-75, 281 interpolation in the statistical observations, 234 maximum likelihood method, 270, 390 meaning, 231 methods for estimating, 265 misspecification, 233 OLS estimation with lagged dependent variables, 247 omitted explanatory variables, 233 Prais-Winsten estimation procedure, 272, 277-78, 281 properties of OLS estimator, 238 Q-test, 263 runs test, 262 shortcomings of the Durbin-Watson test, 255 sources, 233 tackling the problem of inconclusion, 255 Von Neumann ratio test, 250-52 Autocorrelation function (ACF), 366-67, 369-75, 37781, 385-86 Autocovariances, 232, 236, 265, 367, 369-70, 373, 377-80, 386, 392, 467, 512, 514, 516, Autoregressive conditional heteroscedasticity (ARCH), 201-02, 347, 364, 404-06 Autoregressive distributed lag model, 494-95, 543-44, 555-56 Autoregressive process, 234, 264, 368-70, 399 autocorrelation function, 366-67, 369-75, 377-81, 385-86 autocovariance function, 365-66, 368, 370, 373, 377-80, 386, 392 definition of correlogram, 372 first-order autoregressive (AR(1)) process, 231, 234, 241, 243, 246, 364-68, 384-85, 390, 39293, 405 pth-order autoregressive (AR(p)) process, 260, 366-71, 381, 391-92, 474 second-order autoregressive (AR(2)) process, 36869, 372, 393-94, stationarity condition, 235, 370 test for autocorrelation, 252, 255, 260, 262, 279, 667, 677 test for higher-order autocorrelation, 260, 263, 375 test for stationarity, 372, 374, 490 Auxiliary regression, 158, 199, 201, 261-62, 298-99, 302, 304, 671-72, 684 Average marginal effect, 589, 596, 602, 620-22, 62829 B Balanced panel, 660, 688 Bank, 21, 27, 88, 97, 150-51, 155, 157, 164, 166, 18081, 192-94, 211, 220, 223-24, 295, 530, 607, 61920, 636 Bayesian Information Criterion (see also Schwarz Bayesian Information Criterion (SBIC)), 202-03, 261-62, 264, 347, 387-90, 394, 398, 400-01, 403, 410, 483-87, 497-98, 501, 503-04, 520-21, 530, 554, 556-59, 561-62 Best linear unbiased estimator (BLUE), 38, 40, 100-01, 206, 313, 407, 541

Beta coefficient, 4, 9, 58, 94, 162, 333, 634 Beta risk, 75, 94, 138, 141-42, 162 Between estimator, 662, 657-58 BHHH estimator, 395, 414 Bias, 41, 216, 312-14, 320-22, 327 Biasedness, 108, 242, 246, 338, 348 Biased estimator, 312-14, 348 Binary choice model, 576-77, 591, 598 Binary dependent variable, 567-69, 571, 590-91, 594, 598, 601, 604-06 Binary response model, 586, 600 Binary variable, 13, 591, 606, 651 BLUE (see also best linear unbiased estimator), 38, 40, 101-02, 172, 174, 238, 240, 268, 277, 289-89, 310, 642, 659 Bounds test approach for cointegration, 555 Box-Jenkins methodology(see also Box-Jenkins approach), 387, 400-02, 404 Breusch-Godfrey test, 257-62, 279, 668-70 Breusch-Pagan test, 194-95, 198, 202, 659-60, 677 Brownian motion, 422, 426-29, 432, 434, 489 C Capital, 8, 9, 15, 56, 75, 94, 155, 162-63, 257, 295, 344, 359, 414, 500, 619, 621, 672-75, 682 Capital asset pricing model (CAPM), 56, 414, 415 concept, 56 estimation, 58 market risk premium, 56 mean-variance efficient, 56 regression equation of CAPM, 56 sensitivity to market risk, 56 testing, 58 Causality tests, 500, 558-59, 561, 565 bilateral causality, 559 conversely unidirectional causality, 559 Granger F-test for causality, 558, 560-61 independence, 559 model 1, 558 model 2, 558 model 3, 558 relationship, 503-06, 543, 546, 553-54, 556-58, 563 unidirectional causality, 559 Toda-Yamamoto approach, 567, 609, 615 Censored regression model, 567, 609, 615 Central limit theorem (CLT), 104, 256, 422, 426, 42830, 477, 651, 687-88, Characteristic equation, 371-72, 419 Characteristic roots, 324, 552-53 Chow test, 334, 348, 351 Classical linear regression model (CLRM), 166, 172, 231, 238, 272, 404, 414, 538 Cobb-Douglas production function, 8, 154, 673-74. 682 Cochrane-Orcutt procedure, 265, 272-73, 275-76, 283 Coefficient(s), 189, 200, 210, 232, 259, 261, 270, 286, 293, 295-96, 299-00, 302-03, 305-06, 308-09, 320, 328, 352, 355, 371, 382, 385, 411, 413, 418, 424, 446, 447, 463, 478-79, 516, 525, 528-29, 555, 602, 676, 684 ARCH, 408


autocorrelation, 263-65, 374-75, 381, 385, 402, 420 beta, 620, 638, 645 comparing, 672 lagged, 412, 496, 556, 646 matrix form, 509, 516-18, 522, 549-50 multiple correlation, 49, 124, 283, 292-93, 297, 302 of correlation (see also correlation coefficient), 264, 295-96, 299, 302-03, 305, 344, 346-47, of determination, (see also R2), 49, 50, 51, 52, 53, 54, 58, 63, 67, 69, 71, 72, 73, 86, 88, 89, 90, 93, 94, 98, 123-26, 128, 130, 134-35, 154, 156, 158, 160-66, 200, 202, 207, 210-13, 229-30, 238, 258-60, 275, 292, 297-98, 333-37, 340, 354, 357-58, 389, 404, 497-99, 506, 571, 573-74, 582, 595, 601, 604-07, 620, 626, 628, 666-67, 672-675 partial correlation, 283, 302-04, 345-47 rank correlation, 190-91 regression, 131, 151, 194, 218, 238, 243, 268, 300, 306, 308-10, 313, 321, 323, 438, 443, 497, 641, 653, 661, 665, 673, 691 unit root, 426 VAR, 528 Cointegration, 493, 500 Akaike’s Information Criterion (AIC), 498, 501, 503-04, 520-21, 523, 530, 554-59, 561-62 augmented Engle-Granger test, 500-503 concept, 500, 691 consumer price index, 502 definition, 500 Engle-Granger test, 500 Johansen test, 500, 550 Hannan and Quinn Information Criterion (HQIC), 498, 501, 503-04, 520-21, 523, 530, 554-59, 561-62 long-run purchasing power parity, 501 Phillips-Ouliaris-Hansen test, 500, tests for cointegration, 501, 503-04, 550, 557, 564, 692 Schwarz Bayesian Information Criteria (SBIC), 498, 501, 503-04, 520-21, 523, 530, 554-59, 561-62 Cointegration regression Durbin-Watson (CRDW) test, 500 Cointegrated variables, 558, 565, 590 Cointegration matrix, 545 Cointegrated vectors, 500, 542, 544-45, 552, 554 Collinearity, 283, 287, 289, 292-93, 295-96, 299, 30607 Conditional density function (see also pdf), 395-97, 416, 524 Conditional error variance, 410 Conditional expectation, 15, 527, 569, 611, 615-18, 647 Conditional heteroscedasticity, 201-03, 364, 404, 406, 410-11, 414, 561 Conditional likelihood function, 396, 523, 598 Conditional log-likelihood function, 396-00, 413 Conditional maximum likelihood, 390, 395, 397-99 Conditional mean, 15, 17, 22, 197, 405-06, 411, 41314, 416, 418, 510, 521, 524

709

Conditional mean equation, 405, 415 Conditional variance, 171, 405-07, 410-19, 569 Conditional variance equation, 412, 414 Conditional volatility, 410, 414, 418 Confidence intervals, 40, 385, 620, 628 concepts, 46 meaning, 46 for population parameters, 46, 47, 86, 176, 245, 287, 314 student t-test, 46, 47 Consistency property, 42, 102, 110, 240, 558, 642, 647 asymptotic properties of OLS estimator, 42 Consumption function, 5, 9, 10, 59, 170, 285 Continuous random variable(s), 426 Control variable, 12, 13 Correlation coefficient, 49, 52, 55, 56, 122, 126, 12829, 167, 189, 294, 296-97, 301, 311, 343-46, 435, 653, 659, 666-67, 675 autocorrelation, 232-35, 252, 265, 271, 372, 385, 401 partial, 283, 302-04, 345-47 sample, 401 Correlation matrix, 294-97, 301, 304, 306-08, 315, 324, 329, 377-78, 692, Correlogram, 372, 402-03 Covariance, 19, 58, 94, 100-03, 113, 120-23, 126-27, 140, 142, 157, 172-74, 176, 198, 208-10, 212, 21415, 222-25, 231, 234, 236-37, 240-41, 245, 265-66, 269-70, 278, 288, 308, 319, 327, 360, 365, 538-40, 551, 625, 640, 642, 645-46, 648, 657 Covariance stationary, 232, 360, 366, 369-70, 391, 510, 512, 516, 520-22 Critical value, 46, 438, 441, 443, 490, 499, 501-02, 553-57, 666 Cross-sectional data (see also regression model), 11, 231, 406, 581, 636, 638, 641 D Data, 1, 3, 5, 6, 7, 11, 15, 88, 124, 127, 138, 146, 150, 166-67, 169-70, 178, 196, 257, 264, 304, 307-08, 310, 318, 321, 329, 338, 340, 344, 348-49, 354-55, 374, 387-88, 395, 401, 409, 463, 466, 509, 521, 530, 535, 542, 554, 557, 577-78, 581-82, 587, 601, 606, 610, 619, 651, 672, 676, 690, 693 censored, 568, 606-07 concepts, 11 cross-sectional, 11, 15, 231, 406, 581, 636, 638, 640-41 experimental, 11 grouped, 170, 177, 192, 627 observed, 5, 7, 688 panel, 686-89, 691 primary, 573, 632 sample, 3, 17, 46, 77, 81, 95, 128, 211, 253, 272, 314, 324, 617 time series, 11, 21, 27, 60, 62, 66, 68, 97, 201, 234, 250, 348, 351, 359, 372-73, 400, 402, 406, 410, 414, 422, 435, 451, 453-55, 457-58, 469, 471, 473-74, 478, 483-90, 636, 640-41, 689 truncated, 568, 606, 608, 610-11 Debt ratio, 14, 333



Density, 106, 390-92, 396, 524, 542, 585, 609, 611-12, 615, 618 Degrees of freedom, 45, 46, 47, 84, 85, 88, 92-93, 119, 126-28, 133, 136-37, 139-40, 142-44, 148-50, 152, 158, 180, 183-88, 192, 195-98, 200, 207, 210, 21213, 218, 261-62, 264-65, 302-05, 334, 338, 340, 342-47, 350-51, 353, 356-57, 376, 404, 409, 492, 506, 519, 527, 560, 573, 582, 588, 590, 595, 601, 603-04, 652, 659, 665, 672, 367-78, 387-88 Demand curve, 62 elasticity, 18, 53, 54 export, 5 function, 3, 4, 5, 6, 9, 10, 13, 62, 75, 153, 155, 233, 310 law, 4 Denominator, 187, 257, 292, 299, 426, 436, 480, 495 Dependent variable, 1, 2, 3, 4, 6, 7, 8, 9, 10, 15, 16, 17, 20, 21, 27, 31, 38, 39, 49, 50, 51, 52, 54, 59, 60, 61, 62, 64, 67, 68, 69, 75, 76, 77, 86, 88, 90, 94, 98, 99, 124-25, 133, 138, 152, 154, 156, 171-72, 180-81, 199, 207, 210, 212-13, 216, 221, 223, 233, 238, 247, 257-58, 275, 290, 333-34, 341, 354, 359, 38788, 395, 398, 400, 404, 410, 417, 422, 486, 490, 492-93, 499, 506, 537, 542, 556-57, 562-63, 56869, 571-74, 577-78, 581-82, 584, 587-88, 590-91, 593-98, 601, 604-07, 609-12, 615, 619, 622-23, 627-28, 636, 641, 659, 665-66, 668-69, 678, 682 concepts, 13 define, 13 dummy, 567-68, 584, 590 binary, 567-71, 590-91, 594, 601, 604-06, 622 limited, 567, 587, 604, 612, 615, 622 Derivative(s) of regression equation, 29, 106, 194, 586, 614 first order, 29, 194 partial, 29, 77, 106, 204, 206, 586, 664 second order, 29, 107, 586, 599, 614 Determinant, 348, 519 Deterministic components, 1, 2, 9, 10, 75, 488, 510, 550, 684 factor, 1 model, 2, 4, 17 terms, 489, 510, 521, 546, 552-53 time trends, 446, 447, 488, 543, 547 Deterministic variable, 684 Deviation form, 36, 80, 114, 128, 241, 292, 309, 323, 366, 438, 443, 447, 463, 475, 646, 661 Dickey-Fuller (DF) test, 422, 435, 437, 450-61, 474, 480, 483-89, 503, 545, 554, 558, 691 Difference stationary process, 361 Differences-in-differences estimator, 651, 652 Differentiation, 81, 96, 97, 129, 399, 416, 617 Discrete variable(s), 12 Dispersion, 19, 41, 166, 171, 206, 297 Distributed lag model, 493-98, 543-44, 555-56 Autoregressive, 494-95 dynamic effects, 494 estimation, 497 long-run effects, 495-96 Disturbance term, 10, 16, 19, 20, 166, 182, 203, 23134, 247, 566-70, 576, 579, 582, 592, 595, 610, 614 Dummy dependent variable, 567-68, 584, 590

Dummy variable, 584, 590 concepts, 567-68 definition, 568 Durbin’s h test, 257-58 Durbin-Watson test, 251, 253-55, 257-58, 260, 277, 498-99 Durbin-Watson d statistic, 277-78 Dynamic models, 493-94, 499 Dynamic effects, 493-96 E Econometric analysis, 6, 7, 11, 35, 131, 232-33, 251, 267, 660 Econometric model(s), 2, 3, 4, 5, 7, 44, 76, 130-31, 177, 231, 233, 238, 359, 387, 405, 410-11, 493, 568 Econometrics applied, 4, 5, 177 concepts, 1 definition, 1 division, 4 limitations, 5, 6 methods, 1, 4, 5, 6, 243 scopes, 5 theoretical, 1 Efficient estimator, 228, 407, 640-41, 653, 667 Efficient market hypothesis, 361 EG test (see Engle-Granger test), 500-01, 563 Eigenvalues, 293-94, 306-09, 320, 326-29, 332, 510, 514, 521, 547-48, 551-53, 693 Eigenvectors, 307, 309, 329, 332 Elasticity measures, Cobb-Douglas production function, 8, 9, 155, 67275 demand, 18, 53, 54 double log model, 67, 254-55 linear regression model, 18, 52, 53 Endogeneity, 555, 665-66, 677, 679 Endogenous variable, 254, 508-09, 516-17, 530, 539, 549-50 Engle test, 201-02, 406-07 Engle-Granger (EG) test, 500-01, 503, 554 augmented Engle-Granger (AEG) test, 500, 503, 564 Equations linear, 67, 76, 94, 234, 525 non-linear, 69, 153-56 simple, 67, 76, 154, 155 multiple, 94, 153, 156 Error correction mechanism (ECM) ( see also cointegration and error correction mechanism), 493, 542-44, 546-49, 558 Error sum of squares (see also residual sum of squares), 22, 27, 48, 49, 50, 51, 52, 54, 58, 70, 71, 77, 79, 81, 84, 85, 90, 92, 93, 96, 105, 114-18, 124-27, 130-31, 134, 136, 139, 141, 144, 148-52, 157, 160-62, 176, 182, 184, 186, 188, 194, 204, 206, 227, 229, 242, 271, 274-76, 292, 298, 318, 321-22, 334, 337, 339, 341, 349, 356, 387-88, 399, 454, 459, 559-62, 566, 604, 649, 661, 676


Error terms (disturbances) mean, 19, 20, 57, 76, 166, 171, 172, 234-35, 406, 610, 614, 622, 644 Gaussian white-noise process, 255-56, 258-63, 265, 268-70, 272-79, 291, 297, 300, 303, 310, 36162, 390, 394, 398, 400, 501, 559, 561 stochastic, 10, 19, 20, 76, 194, 259, 361 variance, 20, 35, 57, 76, 166, 171-72, 234, 406, 610, 614, 622, 644 variance-covariance matrix, 94, 95, 173, 208, 210, 214, 237, 265, 270, 551, 654 white-noise process, 238, 241, 247-48, 255-56, 269, 273, 361, 363-64, 368-70, 374, 376, 378-81, 385, 387, 389-90, 401, 424, 488, 494-95, 50205, 507, 520, 528, 530, 536, 542-43, 548-49, 556 Estimates AR models, 394-95 ARCH models, 407, 410 ARMA models, 398, 400-01 best linear unbiased (BLUE), 38, 39, 40, 101-02, 174, 268, 277, 288-89, 642, 659 efficient, 243, 655 FGLS (see also feasible generalised least squares estimates), 212-13, 272, 274, 276-79 GARCH-M model, 417 GLS (see also generalised least squares), 210, 245, 268, 675 GMM (see also generalised method of moments estimates), 679, 682 LSDV: 645-46, 659, 674 maximum likelihood (ML), 31, 217-18, 221, 27071, 294-96, 394-95, 398, 400-01, 410, 417, 525, 586, 588, 599, 601, 614, 620, 624-25, 628 MA, 395, 398 OLS ( see also least squares estimates), 6, 28, 53, 58, 60, 61, 63, 64, 65, 67, 69, 99, 126-27, 172, 174, 194, 196, 198-99, 201, 203, 206, 211-12, 238, 245, 259-60, 262, 276, 278-79, 288-89, 291, 330, 354, 438-39, 441, 443, 445-47, 457, 463, 484, 486, 490, 506, 528, 570, 574, 595-96, 598, 673-75 Ridge, 315-17, 319-23 WLS (see also weighted least squares estimates), 207, 574, 582 Estimation methods (see also estimation techniques), 15, 75, 167, 231, 283, 310, 351, 360, 398, 537 Ad Hoc, 497 Box-Jenkins, 387, 400, 401-02, 404 Cochrane-Orcutt (1949), 265, 272-73, 275-76, 283 conditional maximum likelihood, 390, 395, 397-99 Durbin’s (1960), 275-76 feasible generalized least squares (FGLS), 212-13, 265, 272, 274, 276-79, 541 generalized least squares method (GLS), 210, 245, 265, 268, 270, 537, 539, 675, 679, 682 generalized method of moments (GMM), 4, 679, 682 graphical, 177, 249, 321 Hildreth-Lu (1960), 272, 274-75, 281 instrumental variables, 636, 678-79 method of moments, 15, 19, 21

711

maximum likelihood method, 4, 19, 105, 216, 218, 265, 270, 391, 395-96, 398, 417, 523, 588, 624 ordinary least squares (OLS) method, 4, 5, 19, 22, 27, 40, 58, 59, 60, 62, 64, 65, 66, 67, 68, 77, 97, 137-38, 153, 155-56, 173, 179, 184-86, 188-90, 192-13, 201-06, 215-16, 231, 238-39, 249-51, 254-56, 258-64, 266-68, 272-79, 283, 286, 289, 297-00, 303-04, 315, 336-37, 339, 341, 346-47, 354, 356, 388, 399, 408, 475, 490, 497-98, 516, 522, 536, 544, 559-62, 569-72, 578-79, 582, 585, 595, 610-11, 615, 641-42, 645, 650, 65253, 657, 659, 663-65, 667-68, 670-71, 684 Prais-Winsten, 272, 277-78, 281 weighted least squares (WLS), 4, 203, 207, 574 Estimator(s) best linear unbiased estimator (BLUE), 38, 40, 101-02, 172, 174, 238, 240, 268, 277, 286, 28889, 310 between, 657-58, 662, 667 biased, 312, 313-14, 320, 348 conditional omitted variable (COV), 313 consistent, 102-03, 110-11, 143, 173-74, 212, 238, 240, 642-43, 646, 648, 650, 659, 681 differences-in-differences, 651-52 efficient, 407, 640, 641, 653, 681 first difference, 650-51, 664-65 feasible generalized least squares(FGLS), 211-13, 271-72, 277-78, 407, 541, 658 fixed effects, 645-48, 651-52, 661, 665-68, 670 generalized least squares (GLS), 175, 207, 209-11, 216, 221-22, 225, 244, 265, 269, 272, 540, 65556, 658-59, 662, 664 generalized method of moments (GMM), 679, 680-82, instrumental variable (IV), 257, 636, 678-80 least-squares dummy variable (LSDV), 645-46, 659, 674 maximum likelihood, 142, 194-95, 259, 407, 413, 416, 611, 625 OLS, 40, 41, 42, 44, 63, 66, 81, 85, 90, 96, 97, 98, 99, 100, 103, 104, 106, 122, 135, 144, 146, 151-52, 172-74, 176, 185, 209, 211, 213, 22325, 238-41, 245, 247, 256, 259, 272, 286, 290, 292, 311, 313-15, 322, 327-28, 352, 435, 437, 449, 499, 523, 539-40, 574, 611, 645-46, 650, 657, 663-64, 667, 670, 681, 685 One step GMM, 681 pooled, 642-43 ridge, 314-16, 319-23 random effects, 655-59, 661-62, 664-67 two stage least square, 681, 687 unbiased, 33, 34, 35, 38, 39, 40, 100-02, 105, 108, 117-19, 172-73, 176, 225, 240, 242-43, 245-47, 312, 314-15, 320, 348, 407, 540-42, 647-48, 659 within, 646, 657, 663 weighted estimator (WE), 313 WLS, 204, 207, 209, 227, 574, 582 EViews, 1, 7, 15, 28, 69, 75, 149, 158, 204, 255, 279, 283, 318, 346, 353, 359, 394, 398, 400-01, 416, 418, 422, 490, 520, 522, 530, 533, 535, 553-54, 560, 563, 567, 629, 636, 665, 693

712

Subject Index

Exact multicollinearity (see also perfect multicollinearity), 283-85, 294-96, 641, 645, 647, 650, 653, 663-65 concepts, 284 definition, 284 Exogenous variables, 211, 352, 510, 530, 536-37, 539, 568, 590, 663 Exogeneity, 651, 678, 692 Expectation (conditional), 15, 32, 34, 36, 38, 40, 10001, 105, 117, 173, 176, 225, 235-36, 240, 242, 293, 319, 365-70, 377-79, 382, 386, 497, 512-14, 528, 546, 569, 611, 615-18, 642, 647 Expected value, 15, 20, 22, 42, 44, 57, 76, 77, 95, 113, 117-21, 142, 189, 197, 231, 340, 353, 362, 386, 390, 405, 512, 551, 579, 592, 610, 614, 620-21, 652, 657 Explanatory variables ( see also regressor variables), 7, 16, 19, 20, 75, 76, 77, 94, 98, 166, 168, 180, 182, 186-87, 189-90, 200, 234, 255, 257, 258, 283-85, 291-92, 297-00, 301-02, 306, 331, 335, 337-38, 341, 446, 493, 537, 567, 577, 587, 608, 634, 638 Exponential function, 65, 178, 200-01, 211, 577 Exponential GARCH (EGARCH), 364 Exponential relationship, 178 F F-distribution, 562, 699 F-test, 44, 75, 85, 86, 126-28, 131, 136, 138-40, 14243, 145-46, 148-150, 158, 165, 176, 187, 189, 198, 243, 245, 261, 301-02, 304, 341-49, 408, 447, 45456, 459, 531, 556-58, 562-63, 660 Farrar-Glauber test for multicollinearity, 301-04 First difference estimator, 650-51, 651, 664-65 concepts, 650, 664 estimator, 650, 664 statistical assumptions, 650, 664-65 First-order autoregression process (see also AR(1) process), 231, 241, 243, 246, 256, 270, 364-68, 383-385, 387-88, 390, 392-93, 395, 405, 422, 435, 438, 443, 451, 455, 457, 463, 469-71, 474, 507, 670 concepts, 231, 234 definition, 231, 234 autocorrelation function of, 236, 366, 369 autocovariance function of, 236, 265, 369 maximum likelihood method for estimation, 270 mean of, 235, 365, 369-70 variance of, 235, 365-66, 370 variance-covariance matrix of, 238 First-order moving average process (see also MA(1) process/ MA(1) model), 376-78, 384-85, 388, 390, 395, concepts, 376 definition, 376 autocorrelation function of, 377-78 autocovariance function of, 377 maximum likelihood method for estimation, 396 mean of, 377 variance of, 377 Fitted values, 19, 22, 28, 31, 61, 63, 67, 69, 167, 355, 360, 366, 381

Fixed effects models, 636, 644-46, 647-52, 659, 661, 663, 665-66, 668, 670, 672, 674-75 concepts, 644 estimator(s), 645, 652, 661, 663, 665-68, 670 Hausman test, 665-66, 677 important properties of fixed effects estimator, 647 property of asymptotic distribution of, 648 property of consistency, 648 property of unbiasedness, 647 statistical assumptions, 645 test for autocorrelation, 667, 677 testing for fixed effects, 649 test for heteroscedasticity, 671 test for joint significance, 675 variance-covariance property, 648 Forecasts/Forecasting, 6, 7, 11, 15, 16, 360, 404, 406, 414, 493, 528-31, 534-36 ARCH model, 347, 406, 409, 410 ARMA process, 400-01, 404 AR process, 387 Box-Jenkins approach, 387, 400, 401-02, 404 GARCH model, 410, 413-14 GARCH-in-Mean model, 415, 417-18 impulse response function, 531-32, 534 linear regression equation, 16, 21, 27, 28, 89, 90, 138, 253 MA process, 388 nonlinear regression equations, 153-56 VAR model, 528, 530-31, 536 Foreign exchange rate, 373-75, 387, 455-56, 471, 483, 489, 492, 501-02, 504, 506, 520, 533, 535-36, 554, 607 consumer price index (CPI), 492, 501-02, 554 interest rate, 12, 77, 160, 164, 350 purchasing power parity (PPP) theorem, 501, 554 Forward selection procedure, 333, 334, 343-45 Full column rank matrix, 76, 236, 283, 295 Functional form misspecifications, 355, 561 Functional central limit theorem, 422, 426, 428-30 Functional relationship(s), 1, 5, 8, 9, 10, 11, 13, 15, 59, 61, 64, 67, 73, 153, 164, 170, 539, 496, 636 G GARCH model ( see generalized autoregressive conditional heteroscedastic model), 347, 364, 41011 concepts, 410-11 definition, 411 estimation, 413 GARCH-in-mean model, 364, 414 concepts, 414-15 definition, 415 estimation, 415 Gauss-Markov theorem, 38, 40, 209 Gaussian white noise process, 255-63, 265, 268-70, 272-79, 291, 297, 300, 303, 310, 361-62, 390, 394, 398, 400, 423-24, 501, 559, 561 GDP, 11, 12, 14, 15, 64, 65, 66, 67, 68, 69, 73, 77, 97, 98, 99, 146, 155, 160, 164, 182, 197-98, 227-29, 250-51, 255, 258-59, 261, 282, 332, 338, 344, 354, 359, 363, 400, 457-58, 492, 497, 500, 509, 536,

Econometric Analysis: An Applied Approach to Business and Economics

557, 560, 566, 622, 627-29, 636, 673-77, 682, 690, 693 Generalized autoregressive conditional heteroscedastic model (GARCH model), 347, 364, 410-11 concepts, 410-11 definition, 411 estimation, 413 generalized least squares (GLS) estimator(s), 175, 207, 209-11, 216, 221-22, 225, 244, 265, 269, 272, 540, 655-56, 658-59, 662, 664 generalized method of moments (GMM), 636, 678-80, 682 instrumental variables, 257, 636, 678-80 concepts, 679 estimator, 679, 680-82 justidentified, 680 OLS pooled estimator, 680 one step GMM, 681 overidentified, 680-81 testing, 681 two stage least square estimator, 681 Glejser test, 188-89, 227 GLS (see generalized least squares), 4, 174-76, 188, 207, 209, 210-11, 215, 217, 221-22, 225, 244-45, 265, 268-72, 276, 537, 539-40, 653, 655-65, 65859, 662, 664-65, 675 GMM ( see generalized method of moments), 4, 636, 678-82 Goldfeld-Quandt test, 187, 189 Goodness –of-fit measures ( see R-squared), 49, 50, 51, 53, 54, 58, 63, 67, 69, 71, 72, 73, 86, 88, 89, 90, 93, 94, 98, 123-26, 128, 130, 134-35, 154, 156, 158, 160-66, 207, 210-13, 229-30, 238, 258, 275, 292, 297, 334-37, 340, 354, 357-58, 389, 404, 49799, 506, 571, 573-74, 582, 595, 601, 604-07, 620, 626, 628, 666-67, 672-675 binary choice models, 571, 582, 598, 604-08 coefficient of determination, 49, 50, 52, 61, 123, 158, 199, 200, 202, 259-60, 297-98, 333-35 concepts, 49 definition, 49 limited dependent variable models, 571, 604-08 linear regression model, 51, 123-24 panel data model, 666-67 Poisson regression model, 626 Granger F-test for causality, 558, 560, bidirectional causality, 559-60 bilateral causality, 559 cointegrated, 558, 560-61 concepts, 558-60 conversely unidirectional causality, 559 independence, 559 integrated of order 1, 561 model 1, 558 model 2, 558 model 3, 558 super consistency, 558 unidirectional causality, 559-60, 563 Graphs (see also individual figures listed under topics), 3, 15, 36, 179, 250, 318, 321, 359 Graphically, 7, 17, 18, 21, 28, 31, 34, 40, 48, 61, 63, 67, 70, 167, 171, 177, 250, 293, 354, 367, 404, 531, 533, 589, 591, 602, 631

713

Growth rates, 10, 15, 64, 65, 66, 542 H h-test (see Durbin’s h-test), 257-58 Hausman specification test, 653, 665-66, 677 Hessian matrix, 416, 599-00 Heteroscedasticity additive form, 166, 168, 172 Bartlett test, 179-81, 183 Breusch-Pagan test, 194-95, 198, 202 concepts, 166 consequences, 172 consistent, 172-74, 216 David Henry, 170 derivation of the variance-covariance matrix, 225 economic policies, 170 effects on the properties of OLS estimators, 173 estimation of, 203, 213, 216, 218, 221 Engle’s test, 201-03 feasible generalized least squares (FGLS) estimator, 209, 211-13 foreign banks, 180-81, 192-94, 223 inefficient, 172, 174 generalized least squares (GLS) estimator, 188, 207, 209, 211, 219, 221 Glejser test, 188-89 Goldfeld and Quandt test, 168, 186, 189 graphical method, 177-78 grouped data, 170, 177, 192 Harvey and Godfrey test, 200-01 Lagrange multiplier (LM) test, 196, 197, 202 less efficient, 174, 176 liberal trade policy, 170 likelihood ratio test, 183, 192-93, 218 linearity, 172-73, 238-39, 285 maximum likelihood estimation, 216, 218 meaning, 166 methods for detecting, 177 miss-specification, 170, 199 monetary policy, 170 multiplicative form, 166, 168, 172, 211, 221 nature of, 171, 177 Park test, 184-85 possible reasons, 169 properties of the FGLS estimator, 213 properties of OLS estimators, 173 random coefficient model, 169 residual mean square, 172, 176 skewness in the distribution, 170 Spearman’s rank correlation test, 189-90 state-owned banks, 181, 192-94, 220, 224 structural forms, 167, 184 tax reform policy, 170 test for detecting, 179, 181, 183-84, 190, 201 unbiasedness, 172-73 unbiased estimator, 172, 176, 226 variance of, 166-67, 169-71, 174-77, 205-07, 217, 219-20, 223 weighted least squares (WLS), 203-04, 206-07, 209 with autocorrelation, 167-67, 213 with correlated disturbances, 166-67, 213

714

Subject Index

without autocorrelation, 167, 179 White’s test, 198-99, 202 Higher order autocorrelation, 260, 261, 263, 375 Homoscedasticity, 19, 166, 170, 188, 194, 197, 202, 206-07, 678 assumptions, 19, 170 concepts, 19 definition, 20 Hypothesis ( see also individual null hypothesis), 2, 6, 40, 41, 45, 46, 61, 71, 84, 85, 89, 91, 92, 93, 12628, 130-31, 133-52, 157, 176, 180-03, 218, 243, 245, 251-65, 278, 283, 286, 289, 301-05, 322, 332, 341-45, 349, 351-53, 355-57, 374-76, 401, 408-10, 414, 418, 424, 426, 437-38, 441, 443, 447, 449-61, 463, 466-74, 478, 480-90, 501-06, 519-20, 526-28, 552-54, 556-63, 574, 589-91, 603-04, 606, 626, 649, 659-61, 665-66, 668-72, 675, 677-78, 683-89, 691-93 Hypothesis testing, 6, 133, 424, 523, 574 alternative, 45, 71, 84, 89, 91, 92, 128, 131, 133140, 142-143, 145, 150, 152, 157, 180-83, 18586, 189-93, 195-03, 218, 243, 251-53, 255-56, 258-64, 289, 301-05, 341-45, 349-51, 353, 35556, 376, 408, 450-60, 469, 470-73, 483-84, 486-88, 501-02, 505-06, 519, 526, 552-53, 55659, 561-63, 589-90, 603, 649, 659, 665, 668, 669, 672, 676-78, 683, 687-89, 691 autocorrelation, 252, 255, 260, 262, 596, 598, 66768, 677 cointegration, 500, 503-04, 550, 557, 692 F-test, 44, 75, 85, 86, 126-28, 131, 136, 138-40, 142-43, 145-46, 148-150, 158, 165, 176, 187, 189, 198, 243, 245, 261, 301-02, 304, 341-49, 408, 447, 454-56, 459, 531, 556-58, 562-63, 660 functional-form misspecification test, 353 heteroscedasticity, 179-81, 183-92, 194-96, 198, 200-01 joint null hypothesis, 135, 263, 454, 456, 460, 478, 563, 669 joint significance tests of regression coefficients, 649, 672, 675 Lagrange multiplier test, 145, 158, 202, 260, 678 likelihood ratio test, 183, 192-93, 218, 356, 518, 525-27, 553, 589, 603, 606 linear restriction, 143, 145 null hypothesis, 40, 41, 45, 46, 71, 84, 85, 89, 91, 92, 93, 126-28, 130-31, 133-52, 157, 176, 18003, 218, 243, 245, 251-65, 278, 283, 286, 289, 301-05, 322, 332, 341-45, 349, 351-53, 355-57, 374-76, 401, 408-10, 414, 418, 424, 426, 43738, 441, 443, 447, 449-61, 463, 466-74, 478, 480-90, 501-06, 519-20, 526-28, 552-54, 55663, 589-90, 603-04, 606, 626, 649, 659-61, 665-66, 668-72, 675, 677-78, 683-89, 691-93 p-value, 504, 560, 563, 673-75, 687-88, 693 significance level, 58, 61, 63, 66, 67, 69, 90, 91, 154, 165, 189, 207, 211-13, 257-58, 275, 279, 344-46, 404, 410, 414, 418, 502, 504, 519, 573, 582, 588, 596, 601, 673-75, 678, 682 specification test, 665-66, 677 t-ratio (see also t-test), 86, 90, 93, 98, 133-34, 154, 156, 162, 176, 185, 189-90, 196, 198-99, 201,

203, 221, 245, 250, 254, 260, 262, 273-74, 27678, 301-02, 305, 310, 312, 332, 334, 346, 350, 351, 353, 358, 388, 400, 409, 424, 442, 447, 450, 466, 497-98, 501, 555, 574 unit root, 435, 437-38, 441-43, 447, 449-50, 461, 466, 468-75, 479-88, 501, 503, 558, 596, 620, 682-83, 685-88, 696 vector autoregressive model, 523 I Idempotent matrix, 102, 105, 108, 112-13, 118-123, 640 Identity matrix, 509, 639, 655 Illustration, 498, 636 Impulse response function, 518, 531-34 Income, 2, 3, 5, 6, 9, 10, 11, 12, 13, 16, 18, 21, 27, 76, 88, 89, 90, 91, 155-56, 161-63, 169-71, 193, 195, 202, 209, 211, 220, 223, 233, 283, 285, 310, 493, 567-68, 575-76, 581-84, 594-98, 607-08, 610, 615, 632, 630 Inflation, 75, 285, 300, 301, 331, 350, 420 Information matrix, 107, 271, 625 Independence, 19, 187, 233, 415, 559, 665, 682, 689 Independent variable(s), 1, 2, 3, 4, 6, 7, 8, 9, 10, 13, 15, 16, 17, 20, 21, 27, 44, 51, 59, 61, 62, 64, 68, 75, 76, 88, 94, 124, 154, 184, 228, 284, 336, 343, 359, 422, 493, 567-68, 573, 580-81, 587, 591, 593, 610, 61415, 617, 638, 661 Index, 4, 8, 58, 75, 155-56, 306, 359, 363, 388-89, 394, 398, 405, 409-10, 413-14, 416, 426, 492, 501-02, 554, 585, 589, 593-98, 602, 605, 607, 639 Indices, 410, 503-04, 601-02 Inequality, 6, 42, 110, 173, 240, 430 Chebyshev’s, 42, 110, 173, 240, 430 Instrumental variables, 257, 636, 678-79 Instrumental variables estimators, 679 Integrated of order one, 487, 500-04, 544, 546-47, 554, 558, 561, 690-91 Integrated of order two, 561-62, Integrated of order zero, 499-00, 544, 688, 691 Integration, 463, 476, 500-01, 503-04, 556, 558, 561 Interest rate, 12, 77, 160, 164, 350 Interpretation, 1, 2, 33, 495, 534, 544-45, 554, 605, 623, 626-27 Interval estimation, 46, 86, 322 Inverse of a matrix, 107, 291, 300, 382, 416, 657 Invertible, 283, 381, 388, 495, 512, 546, 683 Investment, 5, 9, 10, 15, 17, 56, 59, 75, 77, 88, 89, 90, 94, 151, 161, 164, 166, 195, 209, 229, 257, 263-64, 282, 285, 338, 344, 353, 357, 388, 374, 414, 500, 509, 566, 619, 621, 632, 672-73, 675, 682 Irrelevant variable, 94, 125, Italy, 183-84, 672 J Johansen test, 500, 551-52, 565 Joint null hypothesis, 135, 263, 454, 456, 460, 478, 563, 669 Joint significance tests of regression coefficients, 649, 672, 675 Jointly statistically significant, 375


Judge, 15, 410 K k-dimensional matrix, 509 k-dimensional vector, 424, 429, 507-08, 512, 516, 545-46, 636 k-independent variables, 181, 234, 351, 546, 584 Klein’s rule, 299, 331 Kronecker product, 514-15, 539, 639 L Lag(s), 203, 232, 261, 264, 281, 364, 368-69, 371, 373, 381, 385, 405, 411-12, 413, 415, 503, 507, 518-21, 530, 537, 551, 554, 556-57, 561-62, 687-88 Lagged dependent variable(s), 216, 247, 257-60, 281, 528 Lag length, 232, 261, 371, 503, 518-21, 530, 537, 551, 554, 556-57, 561, 567, 687-88 Lag operator, 363-64, 370-72, 376, 380, 385, 390, 422, 495-96, 508, 510, 517-18, 521, 542, 545 Lag values, 9, 360, 375, 379, 387, 520, 526, 558 Lag order, 507, 518, 521, 530, 562, 684-85 Lag polynomial, 422, 474, 496, 518 Lagrange multiplier test, 196, 197, 202 LB test (see also Ljung-Box test), 263-65, 281, 375-76, 401-02 Least squares method (see also ordinary least squares (OLS) method), 4, 5, 19, 22, 27, 40, 58, 59, 60, 62, 64, 65, 66, 67, 68, 77, 97, 137-38, 153, 155-56, 173, 179, 184-86, 188-90, 192-13, 215-16, 231, 238-39, 249-51, 254-56, 258-64, 266-68, 272-79, 283, 286, 289, 297-00, 303-04, 315, 336-37, 339, 341, 346-47, 354, 356, 388, 399, 408, 475, 490, 497-98, 201-06, 516, 522, 536, 544, 559-62, 56972, 578-79, 582, 585, 595, 610-11, 615, 641-42, 645, 650, 652-53, 657, 659, 663-65, 667-68, 67071, 684 Cramer’s rule, 23, 25, 78, 79, 80, 382 Estimates, 6, 28, 53, 58, 60, 61, 63, 64, 65, 67, 69, 99, 126-27, 172, 174, 194, 196, 198-99, 201, 203, 206, 211-12, 238, 245, 259-60, 262, 276, 278-79, 288-89, 291, 330, 354, 438-39, 441, 443, 445-47, 457, 463, 484, 486, 490, 506, 528, 570, 574, 595-96, 598, 673-75 Estimators, 40, 41, 42, 44, 63, 66, 81, 85, 90, 96, 97, 98, 99, 100, 103-04, 106, 122, 135, 144, 146, 151-52, 172-74, 176, 185, 209, 211, 213, 223-25, 238-41, 245, 247, 256, 259, 272, 286, 290, 292, 311, 313-15, 322, 327-28, 352, 435, 437, 449, 499, 523, 539-40, 574, 611, 645-46, 650, 657, 663-64, 667, 670, 681, 685, first-order conditions for minimisation, 22, 145 matrix form, 22, 23, 78, 79, 81, 82, 90, 94, 95, 97, 98, 113, 136, 145-46, 149, 151-52, 207, 210, 213, 216, 223, 269, 291, 309, 323, 382, 507, 509, 531, 538, 548, 637, 639, 646, 655 second-order conditions for minimisation, 26, 28 substitution procedure, 23, 78, 235-36 third order condition for minimization, 26, 27 Likelihood function, 28, 29, 106, 109-11, 194, 216, 218-21, 270-71, 356, 387, 391-92, 394, 396-97,


399-00, 407, 413, 416, 523-26, 585-86, 598-99, 605-06, 613-14, 616, 624-26 Likelihood ratio test, 183, 192-93, 218, 356, 518, 52527, 553, 589, 603, 606 Limited dependent variable models, 568, 604, 615, 622 censoring, 568, 607-09, 633 censored regression models, 567, 609, 615, 633, dummy dependent variables, 567 dummy dependent variable models, 568 goodness-of-fit, 571, 574, 604-07 linear probability models (LPM), 567-68, 571-72, 574, 576, 601-02, 604, 606-07, 630 logit model, 567-68, 576-82, 585, 587-91, 594, 602, 629-31 Poisson regression model, 567, 622-28, 634 probit model, 567-68, 571, 574, 576-77, 585, 588, 590-95, 598-07, 629-31 truncation, 567, 607-09, 611 truncated data, 568, 608, 610-11, 633 truncated regression model, 567, 610 Tobit model, 567-68, 609, 614-15, 619-20, 633-34 Linear dependence, 283, 309, 401-02 Linear-log model, 59, 60, 61, 73 concepts, 59 definition, 59 estimation, 59 Linear probability model (LPM), 567-68, 570-71, 574, 576, 604, 607, 630 concepts, 568 definition, 568 problems in the estimation of LPM, 570 weighted least squares method, 572 Linear regression equation (see also linear regression models, simple and multiple), 3, 15, 16, 18, 21, 22, 27, 28, 40, 53, 62, 68, 76, 89, 97, 99, 124-25, 158, 177-78, 233, 283-84, 327-28, 415, 576 assumptions for empirical measurement, 3, 4, 6, 7, 15, 19, 20, 21, 22, 29, 40, 43, 60, 62, 75, 76, 77, 89, 95, 98, 125, 127, 138, 146, 151, 156, 159, 167, 172, 177, 184-87, 189, 194-97, 199, 20107, 210-12, 214, 216, 221, 223, 231-33, 238, 250-52, 254, 272, 340-42, 352, 406-07, 638, 641, 645, 649, 650-51, 653, 662-64, 668, 67274, 694 coefficient of determination, 49, 123, 124 concepts, 15, 16, 75, 76, 94 confidence interval estimation, 46, 47 definition, 15, 16, 75, 76, 94, 95 estimation, 20, 22, 23, 25, 28, 77, 80, 94, 95, 105, fitting a multiple linear regression equation, 125, 127 multiple, 75, 94 ordinary least squares method (OLS), 4, 5, 19, 22, 27, 40, 58, 59, 60, 62, 64, 65, 66, 67, 68, 77, 97, 137-38, 153, 155-56, 173, 179, 184-86, 188-90, 192-13, 215-16, 231, 238-39, 249-51, 254-56, 258-64, 266-68, 272-79, 283, 286, 289, 297-00, 303-04, 315, 336-37, 339, 341, 346-47, 354, 356, 388, 399, 408, 475, 490, 497-98, 516, 522, 536, 544, 559-62, 569-72, 578-79, 582, 585, 595, 610-11, 615, 641-42, 645, 650,652-53, 657, 659, 663-65, 667-68, 670-71, 684


partitioning the total sum of squares, 48 population regression model/function, 15, 16, 17, 18, 46 properties of OLS estimator(s), 31, 99 relationship between parameter and elasticity, 52 sample regression function, 17, 18 simple, 15, 16 test of significance of parameter estimate(s), 45, 133, 135, 136, 142, 143 variance-covariance matrix of, 94 Linear relationship, 16, 18, 22, 62, 65, 76, 94, 124, 158, 177-78, 233, 283-84, 327-28, 415, 576 Linearity, 31, 40, 99, 157, 172-73, 238-39, 285-86 Ljung-Box (LB) test, 263-65, 281, 375, 401-02, 41920 LM test (see also Lagrange multiplier test), 75, 145-46, 148, 158, 165, 196-97, 258-60, 278-81, 404, 409, 595-96, 598, 659-60, 668-70, 677-78, 689-90 Logarithmic transformation, 66, 68, 153, 184, 200, 342, 358 Logarithms, 64, 67, 106, 154-56, 554, 627-28 Log-linear model (see also semi-log model), 64, 65, 66, 67 concepts, 64 definition, 64 estimation, 65 Logit model, 567-68, 576-82, 585, 587-91, 594, 602, 629-31 changes in probability, 580 concepts, 574, 576 definition, 574, 576 estimation, 577, 584, 588 marginal effect, 580 maximum likelihood (ml) method, 584, 588 odds-ratio, 577, 576 relative effects, 580 testing statistical hypotheses, 589-90, Long-run effects, 494-96, 544, 564 Lower limit, boundary, 45, 46, 47, 86, 254-55, 607, 609 LR test ( see also likelihood ratio test), 183, 192-93, 218, 356, 518, 525-27, 553, 589, 603, 606 LSDV ( see least squares dummy dependent variable estimator), 645-46, 659, 674 M MA process ( see moving average process), 365, 376, 378-80, 399, 516-17, 531 MA(1) process (see also first-order moving average process), 376-78 autocorrelation function for, 377-78 autocovariance function of, 377 concepts, 376 definition, 376 mean value of, 377 variance of, 377 MA(2) process (see also second-order moving average process), 378-79 autocorrelation function for, 379 autocovariance function of, 378-79 concepts, 378 definition, 378

mean value of, 378 variance of, 378 MA(q) process ( see also qth-order moving average process), 379-80 autocorrelation function for, 380 autocovariance function of, 380 concepts, 379 definition, 379 mean value of, 379 variance of, 380 McFadden, 605, 626 Macroeconomic variables, 536, 558 Marginal propensity to consume (MPC), 5, 10 Marginal effect(s), 15, 61, 580, 587, 591, 593, 596, 617-20, 625 Market portfolio, 56, 57 Market risk, 56 Matrix/Matrices, 23, 76, 78, 79, 80, 81, 82, 85, 90, 95, 97, 98, 99, 104, 122, 135, 137, 142, 145-46, 14952, 210, 213, 216, 221, 223, 226-29, 239, 269, 283, 286, 290-91, 293, 295, 308-09, 315, 317, 323-24, 348, 382, 476, 478, 507, 509-10, 512, 514-16, 522, 524, 526, 531-32, 537-39, 541, 545-51, 553, 63839, 644, 646, 653, 677-81, autocovariance matrix, 512, 516, 564 covariance matrix, 198, 222, 407-08, 508, 521-22, 533-34, 624, 657, 659, 692 correlation matrix, 295-97, 301, 304, 306-08, 315, 324, 329, 677-78, 692 diagonal matrix, 307, 325-27, 533-34, 657 full-rank matrix, 290, 552 Hessian matrix, 416 idempotent matrix, 102, 105, 108, 112-13, 118-23 identity matrix, 509, 534, 639, 655 information matrix, 107, 271, 625 inverse matrix, 291, 300, 382 k-dimensional matrix, 509, 517, 550 positive definite matrix, 215, 541 positive semi-definite matrix, 102, 662 projection matrix, 645 scaling matrix, 439, 443, 446-48, 465, 475, 479 singular matrix, 294, 313, 547, 550 square matrix, 112 symmetric matrix, 102, 105, 108, 112-13, 115-16, 118-23, 215, 393 variance-covariance matrix, 92, 94, 100-03, 113, 121-22, 126-27, 140, 142, 157, 172-74, 176, 208-10, 212, 214-15, 222, 224-25, 229-31, 237, 240-41, 245, 265-66, 270, 288, 319, 327, 392, 416, 477, 514, 519-20, 522, 526-27, 539-40, 551, 562, 642, 644-46, 648-49, 651, 654-55, 659, Maximum eigenvalue test, 552-55 Maximum likelihood estimator, 142, 194-95, 259, 407, 413, 416, 611, 625 AR process, 390-92, 395 ARCH process, 415-17 ARMA process, 398-00 asymptotic properties, 107, 110 functional property, 107 logit model, 584, 586, 588 MA process, 396-97 probit model, 548, 599-00


properties of, 107 property of biasedness, 108 property of consistency, 110 property of distribution, 107 property of efficiency, 108 Poisson regression model, 624-25 regression model, 28, 29, 30, 105-06, 217 Tobit model, 615-16, 620 variance property, 107, 217 Maximum log-likelihood function, 218, 626 Market risk, 56 Market risk premium, 56 Mean (see also expected value), 15, 20, 22, 42, 44, 57, 76, 77, 95, 113, 117-21, 142, 189, 197, 231, 340, 353, 362, 386, 390, 405, 512, 551, 579, 592, 610, 614, 620-21, 652, 657 AR process, 235, 365, 369-70. ARMA process, 386 conditional, 15, 527, 569, 611, 615-18, 647 MA process, 377-79 of autocorrelated random error terms, 234 population, 428, 640 sample, 104, 428-29, 431, 467, 666 Mean squared error, 314, 320-22 Measurement errors, 234 Median, 630-31 Method of moments, 15, 19, 21, 679, 682 Microeconomic theory, 3, 4 Misspecification, 10, 11, 170, 199, 233-34, 347-48, 355-58, 561 ML estimator (see also maximum likelihood estimator), 142, 194-95, 259, 407, 413, 416, 611, 625 Model(s) (see also regression equations), 67, 69, 76, 94, 153-56, 234, 525 autoregressive (AR), 234, 364, 368-71, 387-89, 394, autoregressive distributed lag (ARDL), 494-95, 543-44, 555-56 ARCH, 201-02, 347, 364, 404-06, ARMA, 385-86, 398, 401 ARIMA, 347, 363-64, 400, 418 censored regression, 567, 609, 615 distributed lag, 493-98, 555, 543-44, 556, dynamic, 493-94, 499 double log, 67, 68, 69, 154-56, 160 econometric, 2, 3, 4, 5, 7, 44, 76, 130, 131, 177, 231, 233, 238, 359, 387, 405, 410-11, 493, 568 GARCH, 347, 364, 410-11, 413 GARCH-in-Mean, 364, 414-15 limited dependent variable, 568, 604, 615, 622 linear log, 59, 60, 61, 73 linear probability, 567-68, 570-71, 574, 576, 604, 607, 630 log-linear, 64, 65, 66, 67 logit, 567-68, 576-82, 585, 587-91, 594, 602, 62931 MA, 365, 376, 378-80, 399, 516-17, 531 multiple linear regression, 75, 76, 94, 95, 97, 99, 105, 124-25, 127-28, 133, 135-36, 138, 142-43, 145, 149, 154, 159-60, 172-73, 177, 181-82, 192, 197, 199, 207, 213, 216, 226-27, 283-84, 292, 307, 323, 330-33, 537, 565, 568, 645


multiple non-linear regression, 75, 153-55, 157 multivariate time series, 493, 506 Poisson regression, 567, 622, 623-28, 634 probit, 567-68, 571, 574, 576-77, 585, 588, 590-95, 598-07, 629-31 random walk, 362-63, 371, 419, 424, 427, 430, 432-33, 438, 441, 443, 450, 453, 469, 496 reciprocal, 61, 62, 63, 73 ridge regression, 310, 314 seemingly unrelated regression equations (SURE), 521, 537, 539, 542, 565, 679 simple linear regression, 15, 16, 17, 19, 20, 21, 22, 27, 28, 36, 48, 49, 52, 53, 68, 70, 133, 169, 177, 179, 192, 199, 209, 241, 243, 250, 252 simple non-linear regression, 15, 59, 61, 62, 64, 65, 66, 67, 68, 70, 75 time series econometric, 359, 364, 387, 419, 422 Tobit, 567-68, 609, 614-15, 619-20, 633-34 transformation of polynomial, 153 truncated regression, 567, 610 vector autoregressive (VAR), 493, 506, 508-09, 515-16, 518-21, 523, 525, 527-31, 535-37, 54449, 551-52, 554, 561-62, 564-65 vector error correction (VEC), 545, 546-47, 549-55, 565 Moving average process ( see also MA process), 365, 376, 378-80, 399, 516-17, 531 autocorrelation function of, 377-80 autocovariance function of, 377-80 first order, 376-78 invertible condition of, 380-81, 388 qth-order, 379-80 mean value of, 377-79 second order, 378-79 variance of, 377-80 MPC (see also marginal propensity to consume), 5, 10 Multicollinearity, 283-86, 289, 293-07, 309-10, 313, 318, 321-23, 327-32, 342, 497, 641, 645, 647, 650, 653, 663-65 auxiliary regression, 298-99, 302, 304 biasness of the ridge, 320 Chi-square test, 301 concepts, 284-85 condition number, 306-07 consequences of, 283, 285 correlation matrix, 295-97, 301, 304, 306-08, 315, 324, 329 detection of, 294, 296, 299-01 determinant of, 294 drop some, 310 effects on the properties, 286 exact, 284 F-test, 302, 304 Farrar and Glauber tests, 301-04 high degree of, 296-98, 304, 306, imperfect, 283-85 Klein’s rule, 299 Leamer’s method, 306 mean squared error of, 310, 314, 320-22 near, 284-85, 295, 298, 310 pairwise correlation, 295-96 partial regression, 297-98


perfect, 283-85, 294-96, 641, 645, 647, 650, 653, 663-65 presence of, 283, 294-96, 298-99, 301, 309-10, 318, 322, principal component, 310, 323-25, 327-29 problems of, 285, 289, 293, 295, 297-98, 300-01, 310, 321-23, 327, 329, 283, 299, 301, 304, 310 properties of the OLS estimator, 286 retainment of, 327 ridge estimator, 315-16, 319-23 ridge regression, 313, 323 ridge trace, 316, 321-22 sources of, 285 theorem, 293 t-test, 302, 305 variance-covariance matrix of, 327 variance decomposition proportions, 307 variance inflation factor (VIF), 300-01 Multiple correlation coefficient, 49, 124, 283, 292, 298, 302 Multiple regression, 75, 94, 113-14, 123, 127 Multiple regression equation/model, 75, 113-14, 123, 127, 133, 142, 151, 158-60, 165, 284, 286, 301, 333, 347 Multiplicative form of heteroscedasticity, 166, 168, 172, 211, 221 Multivariate time series models, 493 ARDL and ECM, 543-44, 555-56, 565 augmented Engle-Granger (1987) test, 500, 503, 564 autocovariance matrix of, 512, 516, 564 bilateral causality, 559 bounds test approach, 555, 557, 565 cointegrated VAR model, 543, 565 cointegration, 493, 500-06, 542-46, 549-50, 55258, 563-65, 636, 690-94 conversion of a VAR model, 517 conversely unidirectional causality, 559 determining the optimal lag length, 518 distributed lag model, 493-98, 544, 564 dynamic model, 493-94, 564 dynamic effects, 493-96 Engle Granger test, 500, 501, 503, 564 error correction mechanism (ECM), 542-44, 54651, 554-56, 558, 565 estimation of var models, 521 feasible generalized least squares estimation of, 533-35, 565 forecast error variance decomposition, 533-35, 565 forecasting with a var, 528-29 Granger F test for causality, 558, 560 independence, 559, 665, 682, 689 impulse response function, 518, 531-34 information criteria, 519-20 interpretation of the ECM, 544 Johansen’s test, 500, 551-52, 565 likelihood ratio test, 518, 525-27, 553, 589, 603, 606 long-run effects, 494-96, 544, 564 long-term relationship, 546 long-run purchasing power parity, 501, 564 maximum eigenvalue test, 552-55 maximum likelihood estimation, 523

non-stationary, 493, 499-05, 557, 559-60, 686-87 OLS and GLS estimation of SURE models, 539, 565 Phillips-Ouliaris-Hansen test, 500, 504-06, 564 rewriting a VAR(p) model, 511 seemingly unrelated regression equations (SURE) models, 521, 537, 539, 542, 565, 679 specification of deterministic terms, 552 spurious regressions, 493, 498-99, 502, 537, 554, 564, stationary, 519-22, 537, 544-51, 557, 559-60, 564, 683, 689, 691-92 stationary vector autoregression model, 509 tests for cointegration, 500, 564 Toda-Yamamoto approach, 567, 609, 615 trace test, 552-53, 555, 565, 693 unidirectional causality, 559 variance of the VAR process, 514 vector autoregressive (VAR) models, 493, 506, 508-09, 515-16, 518-21, 523, 525, 527-31, 53537, 544-49, 551-52, 554, 561-62, 564-65 vector error correction (VEC) model, 545, 546-47, 549-55, 565 vector moving average MA(q) process, 516-17, 531, 564 N Non-linear regression, 15, 18, 69, 70, 75, 154-55, 15960, 162-64, 200, 342, 396 Non-linear relationship, 18, 59, 65, 70, 124, 152, 153, 154, 155 Normal distribution, 42, 142, 256-57, 410, 416-17, 525, 576, 591, 595, 598, 643 standard, 585, 591-94, 684, 689, 692 Normal equation, 315, 323 Normality, 19, 42, 103, 105, 520, 570, 580, 595-96, 598 assumptions of CLRM, 19, 120, 122, 137, 195, 198, 201, 570 test for, 596 Null hypothesis ( see also individual hypothesis testing), 40, 41, 45, 46, 71, 84, 85, 89, 91, 92, 93, 126-28, 130-31, 133-52, 157-57, 176, 180-03, 218, 243, 245, 251-65, 278, 283, 286, 289, 301-05, 322, 332, 341-45, 349, 351-53, 355-57, 374-76, 401, 408-10, 414, 418, 424, 426, 437-38, 441, 443, 447, 449-61, 463, 466-74, 478, 480-90, 501-06, 519-20, 526-28, 552-54, 556-63, 589-90, 603-04, 606, 626, 649, 659-61, 665-66, 668-72, 675, 677-78, 683-89, 691-93 Numerical optimization, 396, 398, 413, 416, 599, 614, 625 Newton Raphson method, 588, 599-00, 614, 616 O Observations, 3, 4, 6, 10, 20, 21, 22, 28, 31, 40, 46, 46, 51, 54, 60, 61, 63, 67, 69, 76, 77, 85, 88, 90, 94, 98, 99, 104, 124-25, 127, 138, 154, 156, 161, 165, 169-70, 172, 174, 177, 183, 186-87, 206-07, 210, 212-13, 221-23, 231, 233-34, 242, 250-51, 262, 267-68, 270, 277, 283, 285, 291, 314, 317, 333,


349, 351-52, 360, 368, 381, 390-92, 395, 398-00, 404, 408-10, 412-14, 417, 428-29, 475, 492, 506, 520, 522-24, 527-28, 530, 555, 560, 573, 578, 582, 585, 588, 595-97, 601, 606-07, 609-10, 614-16, 619-20, 622, 629, 636, 639, 646, 651, 661, 665, 673-74, 678-79, 682-83, 685 Odds-ratio, 586-88, 600, 630-32 OLS estimates (see also ordinary least squares estimates), 6, 28, 53, 58, 60, 61, 63, 64, 65, 67, 69, 99, 126-27, 172, 174, 194, 196, 198-99, 201, 203, 206, 211-12, 238, 245, 259-60, 262, 276, 278-79, 288-89, 291, 330, 354, 438-39, 441, 443, 445-47, 457, 463, 484, 486, 490, 506, 528, 570, 574, 59596, 598, 673-75 OLS estimator(s) (see also ordinary least squares estimator(s)), 40, 41, 42, 44, 63, 66, 81, 85, 90, 96, 97, 98, 99, 100, 103-04, 106, 122, 135, 144, 146, 151-52, 172-74, 176, 185, 209, 211, 213, 223-25, 238-41, 245, 247, 256, 259, 272, 286, 290, 292, 311, 313-15, 322, 327-28, 352, 435, 437, 449, 499, 523, 539-40, 574, 611, 645-46, 650, 657, 663-64, 667, 670, 681, 685 OLS method ( see also ordinary least squares method), 4, 5, 19, 22, 27, 40, 58, 59, 60, 62, 64, 65, 66, 67, 68, 77, 97, 137-38, 153, 155-56, 173, 179, 184-86, 188-90, 192-13, 215-16, 231, 238-39, 249-51, 25456, 258-64, 266-68, 272-79, 283, 286, 289, 297-00, 303-04, 315, 336-37, 339, 341, 346-47, 354, 356, 388, 399, 408, 475, 490, 497-98, 516, 522, 536, 544, 559-62, 569-72, 578-79, 582, 585, 595, 61011, 615, 641-42, 645, 650, 652-53, 657, 659, 66365, 667-68, 670-71, 684 Operators, 380, 508 Omitted variables, 170, 233, 311, 313, Order condition, 22, 26, 27, 29 Order in probability, 439, 445 Orthogonal, 283, 292, 296, 300-02, 304, 307, 322-23, 325-28, 640 Orthogonality, 292-93, 301, 331 P PACF ( see partial autocorrelation function), 380-81, 384-85, 401-03, Pairwise correlation, 295-96, 325, 331 Panel data, 1, 11, 636, 638-41, 644, 650-51, 659-61, 665-68, 670, 678-79, 684, 686-91 Panel data cointegration analysis, 690 concepts, 690-91 definition, 691 Johansen Fisher panel cointegration test, 691, 69394 Kao test for panel cointegration, 692-93 Pedroni (1995) test for cointegration, 691-92 panel cointegration estimation, 691 Panel data regression models, 636 concepts, 636-39 definition, 636 differences-in-differences estimator, 651-52 efficiency of, 638, 640, 651 first difference estimator, 649-51, 664-65, 694 fixed effects model(s), 636, 644-45, 647-52, 659, 661, 663, 665-66, 668, 670, 672, 674-75


GMM method, 636, 678-82 goodness-of -fit, 666, 672, 675 Hausman test, 653, 665-66, 677 instrumental variables, 636, 678-79 model notation, 638 pooled regression models, 641, 675 random effects models, 636, 652-53, 665, 672, 675, 694 treatment of individual effect, 644 testing for fixed effects, 649 unit root tests, 682-83, 687-88 Panel unit root tests, 682-83 Choi (2001) test, 682, 687 Fisher’s type test, 687 Hadri (2000) test, 682, 689-90, 694 Im, Pesaran and Shin (2003) test, 682, 686-88 Levin, Lin and Chu (2002) test, 682, 690 Maddala and Wu (1999) test, 682, 687, 690-91, 693 model specifications for, 683 non-stationarity tests, 682 residual-based LM test, 689 stationarity test, 682 Parameters, 2, 3, 6, 8, 9, 10, 13, 19, 30, 31, 40, 41, 44, 47, 59, 70, 75, 76, 84, 88, 91, 92, 93, 95, 101, 105, 133, 139, 142, 149-50, 152, 154-55, 157, 163, 168, 175-76, 184, 194, 202, 204-06, 208, 210-12, 218, 221, 223-24, 234, 245, 250, 268, 273-76, 278-80, 283, 286, 288-89, 294, 308, 310, 313, 316, 320-35, 337-41, 348-51, 353, 356, 387-88, 391, 395-96, 399, 401, 406, 412-13, 415-16, 463, 467, 498-99, 510, 518-20, 522, 524, 526-27, 529, 537, 550-51, 555, 560, 562, 570, 578, 580, 582, 586, 593, 605, 607, 610, 613-14, 616, 624-26, 635-36, 639, 641, 644, 647, 650, 652, 653, 655, 661, 665, 668, 66972, 679, 681-83, 692 Park test, 184-85, 226 Partial autocorrelation function (PACF), 380-81, 38485, 401-03 Partial correlation coefficient, 283, 302-03, 305, 34547 Partitioning the total sum of squares, 48 Phillips-Ouliaris-Hansen tests, 500, 504-06, 564 Phillips-Perron tests, 463, 468-69, 471, 473, 491 Plim, 42, 111-12, 174, 247, 249, 642-44, 648-49 Poisson regression model, 567, 622, 623-28, 634 concepts, 622 definition, 622 deviance residual, 627 estimation of, 624 interpretation of ȕ 623 goodness of fit of, 626 goodness of fit tests, 627 marginal effect of, 625 maximum likelihood (ml) estimates, 624-25, 628 raw residual, 627 residuals, 627 Pearson residual, 627 variance-covariance matrix of ȕ, 625 Polynomial, 153, 158, 371, 388, 422, 474, 495-96, 518, 545 Pooled regression models, 641 assumptions, 641


concepts, 641 important properties of, 642 pooled least squares estimator, 642-43, 673, 694 property of asymptotic distribution, 643 property of consistency, 642 property of unbiasedness, 642 Population, 15, 16, 17, 18, 31, 40, 41, 42, 43, 44, 46, 47, 54, 86, 97, 98, 130, 133, 159, 166, 170, 172, 176, 180-82, 227, 245, 252, 267, 336, 391, 395-96, 428, 467, 498, 525, 529, 542, 577, 588, 598, 607, 610, 628, 640, 652, 665 Population regression function/equation, 15, 16, 17, 18, 46 Prais-Winsten estimation procedure, 272, 277 Prediction, 7, 16, 138, 172, 238, 322, 333, 351-52, 416, 521, 606, 620 Price elasticity, 18, 53, 54, 73 Price index, 4, 155-56, 359, 388-89, 394, 398, 410, 413-14, 416, 421, 492, 501-02, 554, 607 Primary data, 573, 632 Probability, 42, 46, 48, 106, 111, 247, 367-69, 390, 392, 426, 429-30, 436, 439, 444-45, 499, 520, 56981, 583-85, 587-90, 592-94, 596-98, 600-04, 60608, 611, 619, 621-23, 630-34, 642-44, 648-49, 693 Probability limit, 42, 111, 247, 642, 648 pth-order autoregressive process, 260, 278, 369, 370, 391 Probability density function, 106, 390, 392, 609, 611 Probability distribution, 3, 390, 569, 570, 644 Purchasing power parity, 501, 554, 564 p-value, 504, 560, 563, 573-75, 687, 689, 693 Q Q test, 263-65, 281, 375-76, 402, 419-20 Qth-order moving average process, 379 autocorrelation function of, 380 autocovariance function of, 380 invertible condition of, 380 maximum likelihood estimation of, 396, 398 mean of, 379 variance of, 380 Quadratic form, 108, 115-19, 143-44, 153, 159, 479, 679-80 Quadratic relationship, 178 Qualitative variable, 12, 573 Quantitative variable, 12 R R2 (see also the coefficient of determination), 49, 50, 51, 52, 53, 54, 58, 63, 67, 69, 71, 72, 73, 86, 88, 89, 90, 93, 94, 98, 123-26, 128, 130, 134-35, 154, 156, 158, 160-66, 200, 202, 207, 210-13, 229-30, 238, 258-60, 275, 292, 297-98, 333-37, 340, 354, 35758, 389, 404, 497-99, 506, 571, 573-74, 582, 595, 601, 604-07, 620, 626, 628, 666-67, 672-675 Ramsey’s RESET test, 535-38 Random effects models, 636, 652-53, 665, 672, 675, 694 assumptions, 653 Breusch-Pagan LM test for random effects, 659

comparison of between, GLS, OLS and within estimators, 662 concepts, 652 feasible generalized least squares (FGLS) of, 658 fixed effects estimator in a random effects model, 661 GLS as weighted least squares, 656-57 important properties of random effects estimator, 659 random effects estimator, 655, 658 variance structure of, 653 Random error terms, 41, 76, 95, 120, 137, 166-67, 170-71, 174, 176, 186-87, 197, 201-02, 207-08, 210, 221, 223, 226, 231-34, 238, 240-41, 243, 245, 247, 250-51, 255-56, 258, 262-63, 266, 280-81, 308, 353, 376, 378, 385, 404, 411, 503, 520, 528, 537, 550, 558, 562, 564, 570, 572, 574, 580, 62223, 638, 653, 657, 667-70 Random variable(s), 19, 43, 104, 169, 189, 231, 233, 251, 322, 390, 426, 428-30, 432, 434, 437, 439, 441, 576, 610, 614, 688 Random walk model, 362-63, 371, 419, 424, 427, 430, 432-33, 438, 441, 443, 450, 453, 469, 491 Rank of a matrix, 283, 290, 310, 345-48, 550, 552, 565 Rank correlation coefficient, 190, 191 Rank correlation test, 189, 190, 227 2 R-bar squared, (R2 see also R ), 51, 52, 58, 61, 63, 67, 69, 71, 72, 73, 88, 89, 90, 93, 124-26, 154, 161-65, 207, 210, 212-13, 229-30, 258, 275, 277, 334, 340, 354, 358, 404, 497-99, 506, 573-74, 582, 588, 595, 666, 673-74 Reciprocal models, 59, 61, 62, 63, 73 Recursive residuals, 352 Reduced form, 137, 509, 516, 549 Relationships, behavioral, 8, 14 definitional, 8, 14 dynamic, 9, 14 economic, 3, 4, 5, 7, 14, 35, 40, 41, 232-33 functional, 1, 5, 8, 9, 10, 11, 13, 15, 59, 61, 64, 67, 73, 153, 164, 170, 539, 496, 636 macro, 10, 14 micro, 10, 14 static, 9, 14 stochastic, 8, 9, 14 technical, 8, 9, 14 Regression analysis, 285, 322 Regressor, 16, 19, 20, 57, 58, 75, 131, 136, 138, 247, 252, 257-58, 296, 299, 301-02, 304, 306, 308, 322, 331, 351, 408, 446, 478, 506, 521, 541, 582, 627, 665, 678, 692 Regressor variable, 16, 19, 20, 124, 136, 138, 258, 301, 628 Regression coefficient(s), 3, 5, 6, 15, 16, 17, 20, 27, 30, 46, 47, 48, 53, 54, 59, 60, 61, 64, 67, 72, 75, 76, 77, 89, 94, 98, 313, 327, 424, 497-99, 537, 567-68, 672 Regression equation (see also regression model), autoregressive distributed lag (ARDL), 494-95, 543-44, 555-56 ARCH, 201-02, 347, 364, 404-06, ARMA, 385-86, 398, 401


ARIMA, 347, 363-64, 400, 418 censored regression, 567, 609, 615 classical linear, 166, 172, 231, 238, 272, 404, 414, 538, distributed lag, 493-98, 543-44, 555-56, double log, 67, 68, 69, 154-56, 160 dynamic, 493-94, 499 econometric, 2, 3, 4, 5, 7, 44, 76, 130, 131, 177, 231, 233, 238, 359, 387, 405, 410-11, 493, 568 GARCH, 347, 364, 410-11, 413 GARCH-in-Mean, 364, 414-15 limited dependent variable, 568, 604, 615, 622 linear, 15, 16, 17, 19, 20, 21, 22, 27, 28, 36, 48, 49, 52, 53, 68, 70, 75, 76, 94, 95, 97, 99, 105, 12425, 133, 135-36, 138, 142-43, 145, 149, 154, 159-60, 169, 172-73, 177, 179, 181-82, 192, 197, 199, 207, 209, 213, 216, 226-27, 241, 243, 250, 252, 283-84, 292, 307, 323, 330-33, 537, 565, 568, 645 linear-log, 59, 60, 61, 73 linear probability, 567-68, 570-71, 574, 576, 604, 607, 630 log-linear, 64, 65, 66, 67 logit, 567-68, 576-82, 585, 587-91, 594, 602, 62931 MA, 365, 376, 378-80, 399, 516-17, 531 multiple linear regression, 75, 76, 94, 95, 97, 99, 105, 124-25, 127-28, 133, 135-36, 138, 142-43, 145, 149, 154, 159-60, 172-73, 177, 181-82, 192, 197, 199, 207, 213, 216, 226-27, 283-84, 292, 307, 323, 330-33, 537, 565, 568, 645 multiple non-linear regression, 75, 153-55, 157 multivariate time series, 493, 506 Poisson regression, 567, 622, 623-28, 634 probit, 567-68, 571, 574, 576-77, 585, 588, 590-95, 598-07, 629-31 random walk, 362-63, 371, 419, 424, 427, 430, 432-33, 438, 441, 443, 450, 453, 469, 496 reciprocal, 61, 62, 63, 73 ridge regression, 310, 314 seemingly unrelated regression equations (SURE), 521, 537, 539, 542, 565, 679 simple linear regression, 15, 16, 17, 19, 20, 21, 22, 27, 28, 36, 48, 49, 52, 53, 68, 70, 133, 169, 177, 179, 192, 199, 209, 241, 243, 250, 252 simple non-linear regression, 15, 59, 61, 62, 64, 65, 66, 67, 68, 70, 75 single-equation, 284, 348, 536, 539, 676 spurious, 1, 359, 422, 493, 498-99, 554, 564 three variables, 76, 77, 78, 80, 82, 84, 86, 90, 94 time series econometric, 359, 364, 387, 419, 422 Tobit, 567-68, 609, 614-15, 619-20, 633-34 transformation of polynomial, 153 truncated regression, 567, 610 vector autoregressive (VAR), 493, 506, 508-09, 515-16, 518-21, 523, 525, 527-31, 535-37, 54449, 551-52, 554, 561-62, 564-65 vector error correction (VEC), 545, 546-47, 549-55, 565 Regression line, 31, 32, 171, 206, 348, 571-72, 577 Regularity conditions, 638


Rejection, 135, 182, 190, 192, 195, 197, 199-00, 202, 336, 383, 409, 559, 649, 661, 668, 670, 672, 681, 686, 692-93 Residuals, 17, 18, 20, 21, 22, 27, 30, 35, 36, 48, 49, 50, 51, 52, 54, 58, 61, 63, 67, 69, 70, 71, 77, 79, 81, 82, 84, 85, 86, 87, 90, 92, 93, 95, 96, 104-05, 114-18, 121-28, 130-31, 133-34, 136, 139-41, 144-45, 14852, 154, 157, 172, 176-77, 179, 182-88, 190, 19295, 197-02, 204, 206-07, 210-13, 242, 245-47, 249-52, 254-64, 271-79, 292, 298, 303, 314, 318, 321-22, 324, 334, 337-39, 341, 347, 349, 351-57, 387, 398-99, 401, 404, 407-09, 414-15, 418, 454, 459, 488, 490, 499, 501-06, 519-20, 522, 525-27, 533, 541, 559-62, 566, 570, 582, 595-96, 598, 604, 627, 640, 646, 649, 660-61, 668, 670-71, 673-74, 676-78, 681, 684, 689, 691 Residual sum of squares, 22, 27, 48, 49, 50, 51, 52, 54, 58, 70, 71, 77, 79, 81, 84, 85, 90, 92, 93, 96, 105, 114-18, 124-27, 130-31, 134, 136, 139, 141, 144, 148-52, 157, 160-62, 176, 182, 184, 186, 188, 194, 204, 206, 227, 229, 242, 271, 274-76, 292, 298, 318, 321-22, 334, 337, 339, 341, 349, 356, 387-88, 399, 454, 459, 559-62, 566, 604, 649, 661, 676, Response probability, 568, 573-74, 580-81, 593-94, 602, 630-34 Response variable, 16, 19, 20, 76, 324, 567, 575, 607, 622, 627, 633-34 Restricted regression model/equation, 85, 134-36, 150, 343, 349, 526-27, 556-57, 560-61, 676 Rho test, 466, 470, 472-74, 480, 482-88 Ridge regression, 310, 314 Ridge trace, 316, 321-22, 331 Risk, 4, 5, 15, 56, 360, 410, 414-15, 632 asset pricing, 56, 70, 414-15 aversion, 415 beta coefficients, 4, 5, 75, 94, 138, 141, 145, 632 of individual asset, 56 premium, 56, 415 return relationship, 5, 15, 414-15 systematic risk, 56 Risk-free rate, 56 Risk-free returns, 58 Riskless bond, 56 Runs test, 262 S Sample autocorrelation function, 373-74, 383-85, 401, 420 Sample data, 3, 46, 77, 95, 128, 211, 253, 372, 314, 324, 616, Sample information, 16, 17, 133 Sample mean, 104, 428-29, 431, 466, 498, 666 Sample observation(s), 10, 20, 22, 40, 46, 161-62, 170, 281, 314, 317, 324, 349, 392, 399-00, 413, 420, 527, 564, 585, 599, 613, 622, 624 Sample regression function/equation, 15, 17, 18, 70 Sample Size, 31, 35, 41, 42, 102, 141, 159, 170, 186, 190, 192, 202, 227, 357, 383, 387-88, 392, 398-99, 428, 438, 443, 450-51, 454-55, 458-61, 466-68, 471, 473, 475, 480-83, 485, 487, 498-99, 505, 51920, 528-29, 556, 570, 574, 580, 585, 606, 640, 642, 665, 688


Sample space, 426 Sample variance, 130, 417 Sargan test ( see overidentifying restriction tests), 681 SBIC (Schwarz Bayesian Information Criteria), 202, 203, 261, 264, 347, 357, 387, 389, 390, 394, 398, 400-01, 403, 410, 420, 484-87, 497-98, 501, 504, 520-21, 530, 554, 556-59, 561-62, 564 Scalar matrix, 81, 96, 105, 114, 116, 176, 183, 216, 242, 266, 393, 430, 512-13, 517-18, 526, 585, 638, 655-56, 668-70, 692 Seemingly unrelated regression equations (SURE), 521, 537, 539, 542, 565, 679 assumptions, 537-38 concepts, 537 definition, 537-38 estimated results, 542 explanation, 537 feasible generalized least squares estimation of, 541 meaning, 537 GLS estimation of, 539 OLS estimation of, 539 variance-covariance matrix, 539 Semi-log model, 64, 65, 66, 67 concepts, 64 definition, 64 estimation, 65, 67 Serial correlation, 231-34, 238, 249, 254, 258, 260, 278, 435, 450, 463-65, 491, 503, 528, 535, 561, 651, 667, 669, 678-79, 682, 686 Short-run effects, 544 Simple linear regression equation (see also simple linear regression model), 15, 16, 17, 19, 20, 21, 22, 27, 28, 36, 48, 49, 52, 53, 68, 70, 133, 169, 177, 179, 192, 199, 209, 241, 243, 250, 252 applied econometrics, 4, 5, 177 assumptions, 15, 19, 20, 22, 76, 95, 98, 138, 151, 156, 184-85 asymptotic properties of OLS estimators, 41, 42, 58, 70, 691 asymptotically normally, 42, 44, 58, 103, 110, 428, 522, 600, 643-44, 648-49 correlation coefficient, 49, 52, 54, 55, 56, 122, 124, 126, 128-29, 160, 167, 184, 190-91, 283, 29293, 295-96, 298, 301-03, 305, 343-47, 435, coefficient of determination, 49, 50, 51, 52, 53, 54, 58, 63, 67, 69, 71, 72, 73, 86, 88, 89, 90, 93, 94, 98, 123-26, 128, 130, 134-35, 154, 156, 158, 160-66, 200, 202, 207, 210-13, 229-30, 238, 258-60, 275, 292, 297-98, 333-37, 340, 354, 357-58, 389, 404 coefficient of non-determination, 52 confidence interval of, 41, 46, 47, 70, 71, 72, 86, 89, 93, 159-61, 163-64, 176, 245, 264, 286, 289, 314, 374-75, 385 deterministic model, 2 dependent variable, 1, 2, 3, 4, 6, 7, 8, 9, 10, 15, 16, 17, 20, 21, 27, 31, 38, 39, 49, 50, 51, 52, 54, 59, 60, 61, 62, 64, 67, 68, 69, 75, 76, 77, 86, 88, 90, 94, 98, 99, 124-25, 133, 138, 152, 154, 156, 171-72, 180-81, 199, 207, 210, 212-13, 216, 221, 223, 233, 238, 247, 257-58, 275, 290, 333-

34, 341, 354, 359, 387-88, 395, 398, 400, 404, 410, 417, 422, 486, 490 double log models, 67, 68, 69, 154-56, 160 economic model, 3, 4, 5, 7, 14, 35, 40, 41, 232-33 econometric analysis of economic model, 6, 7, 14 econometric model, 2, 3, 4, 5, 7, 44, 76, 130, 131, 177, 231, 233, 238, 359, 387, error sum of squares (see also residual sum of squares), 22, 27, 48, 49, 50, 51, 52, 54, 58, 70, 71, 77, 79, 81, 84, 85, 90, 92, 93, 96, 105, 11418, 124-27, 130-31, 134, 136, 139, 141, 144, 148-52, 157, 160-62, 176, 182, 184, 186, 188, 194, 204, 206, 227, 229, 242, 271, 274-76, 292, 298, 318, 321-22, 334, 337, 339, 341, 349, 356, 387-88, 399, 454, 459 Gauss Markov theorem, 38, 40, 209 independent variable, 1, 2, 3, 4, 6, 7, 8, 9, 10, 13, 15, 16, 17, 20, 21, 27, 44, 51, 59, 61, 62, 64, 68, 75, 76, 88, 94, 124, 154, 184, 228, 284, 336, 343, 359, 422 limitation of econometrics, 5, 6 linear-log model, 59, 60, 61, 73 log-linear relationship, 64, 65, 66, 67 maximum likelihood method, 4, 19, 105, 216, 218, 265, 270, 391, 395-96, 398, 417 meaning of econometrics, 1, 14 meaning of theoretical model, 1 method of moments, 15, 19, 21 minimum variance property, 38, 41, 172 observed or actual value, 10, 20, 22, 171, 438, 441, 443 ordinary least squares (OLS) method, 4, 5, 19, 22, 27, 40, 58, 59, 60, 62, 64, 65, 66, 67, 68, 77, 97, 13738, 153, 155-56, 173, 179, 184-86, 188-90, 192-13, 201-06, 215-16, 231, 238-39, 249-51, 254-56, 25864, 266-68, 272-79, 283, 286, 289, 297-00, 303-04, 315, 336-37, 339, 341, 346-47, 354, 356, 388, 399, 408, 475, 490. partitioning total sum of squares, 48 probabilistic model, 2, 3, 4, 10 properties of the least squares estimators, 31 random error term, 41, 76, 95, 120, 137, 166-67, 170-71, 174, 176, 186-87, 197, 201-02, 207-08, 210, 221, 223, 226, 231-34, 238, 240-41, 243, 245, 247, 250-51, 255-56, 258, 262-63, 266, 280-81, 308, 353, 376, 378, 385, 404, 411 reciprocal, 59, 61, 62, 63, 73 relationship between correlation coefficient and regression coefficient, 54 relationship between regression parameter and elasticity, 52 regression sum of squares, 48, 49, 50, 51, 52, 85, 86, 114, 115, 117, 124, 130, 131, 134, 137, 139, 152, 159, 161, 333 sampling distribution of, 16, 19, 42, 44, 70, 100, 119, 159 scope of econometrics, 5 statistical model, 2, 272, 404 t-test (see also t ratio), 86, 90, 93, 98, 133-34, 154, 156, 162, 176, 185, 189-90, 196, 198-99, 201, 203, 221, 245, 250, 254, 260, 262, 273-74, 27678, 301-02, 305, 310, 312, 332, 334, 346, 350-


51, 353, 358, 388, 400, 409, 424, 442, 447, 450, 466, theoretical econometrics, 1 total sum of squares, 48, 49, 50, 51, 52, 54, 70, 71, 85, 86, 88, 93, 114-16, 124-26, 130-31, 134, 138-39, 150-52, 159, 161-62, 229, 292, 300, 333-34 Unbiasedness, 32, 34, 41, 172-73, 238-40, 286, 310, 373, 570 variance of, 9, 19, 28, 34, 35, 38, 39, 40, 41, 42, 44, 56, 57, 58, 76, 82, 83, 89, 90, 92, 94, 100-03, 113, 119-23, 126-27, 130-31, 140, 142, 157, 159-62, 166-67, 169-72, 174-77, 179-84, 18687, 192, 194, 196, 198, 200-01, 203, 205-12, 214-15, 217-26, 228-34, 237-38, 240-48, 25152, 262-63, 265-66, 268-70, 277-78, 280, 286, 288-93, 300-01, 306, 308-11, 313-14, 319-23, 325, 327-28, 330-33, 347, 350, 352-53, 359-62, 365-66, 369-70, 373-74, 377-78, 380 variance-covariance matrix of, variance-covariance matrix, 92, 94, 100-03, 113, 121-22, 126-27, 140, 142, 157, 172-74, 176, 208-10, 212, 21415, 222, 224-25, 229-31, 237, 240-41, 245, 265-66, 270, 288, 319, 327, 392, 416, 477 Significance level, 58, 61, 63, 66, 67, 69, 90, 91, 154, 165, 189, 207, 211-13, 257-58, 275, 279, 344-46, 404, 410, 414, 418, 502, 504, 519, 573, 582, 588, 596, 601, 673-75, 678, 682 Small sample properties, 41, 264, 520, 556 Software package(s), 149, 158, 203, 255, 274-75, 279, 283, 305, 317-18, 329-30, 339, 342, 346, 353-54, 359, 384, 396, 398, 400-01, 414, 416, 418, 422, 490, 520-22, 530, 563, 567, 629, 631-32, 634-36, 665 Spearman’s rank correlation test, 189, 190, 227 Specification, 199, 298, 347, 355, 359, 378, 385, 401, 413, 415-16, 422, 499, 518, 522, 537, 552, 644, 612, 665, 666, 677, 682, 683, 686, 688 Specification errors, 199, 355 Spurious regression, 1, 359, 422, 493, 498-99, 554, 564 Square matrix, 112 SRF ( see also sample regression function), 17, 18 Stability test, 351 Standard deviation, 207, 296, 301, 415, 418, 437, 533, 627, 685-86 Standard error of, 41, 61, 63, 67, 69, 71, 72, 82, 83, 90, 91, 154, 165, 176, 185, 189, 207, 210, 212-13, 221, 238, 243, 245, 257-58, 283, 286, 310, 354, 374, 401, 404, 421, 437, 449-53, 455-58, 460, 465, 505, 523, 530, 533, 555, 570-71, 573-74, 582, 591, 638, 651, 674-75, 684 Static relationship, 9, 14, 691-92 Stationarity, 360 covariance, 360-61 mean, 360-61 non-stationary, 361 trend stationary, 361 difference stationary, 361 variance, 360-61 Stationary time series, 403, 428, 430, 461, 515 Statistically significant, 15, 44, 58, 63, 66, 71, 72, 75, 90, 91, 98, 128, 160-62, 176, 189, 207, 213, 245,


264, 303, 310, 343-44, 346, 374-75, 385, 410, 41415, 418, 498-99, 531, 553, 559, 582, 588-89, 59596, 301-02, 604, 620-22, 628-29, 666, 673-75, 682 Statistical tables, 696-702 Stepwise regression, 333-34, 345 Stochastic disturbances (see also random error terms), 10, 41, 76, 95, 120, 137, 166-67, 170-71, 174, 176, 186-87, 197, 201-02, 207-08, 210, 221, 223, 226, 231-34, 238, 240-41, 243, 245, 247, 250-51, 25556, 258, 262-63, 266, 280-81, 308, 353, 376, 378, 385, 404, 411, 503, 520, 528, 537, 550, 558, 562, 564, 570, 572, 574, 580, 622-23, 638, 653, 657, 667-70 Stochastic process, 360-61, 404, 426, 429, 683, 688 Stochastic variable(s), 3, 19, 76, 168, 194, 200-01, 208, 211, 218, 228, 267, 260, 364 Stock return(s), 15, 140, 410, 414-16, 418-19, 421 Structural break(s), 346, 349-51, 353, 355, 357 Structural form(s), 167, 184, 228 Structural models, 509, 536, 539, 567 Student’s t-test, (see also t-ratio), 86, 90, 93, 98, 13334, 154, 156, 162, 176, 185, 189-90, 196, 198-99, 201, 203, 221, 245, 250, 254, 260, 262, 273-74, 276-78, 301-02, 305, 310, 312, 332, 334, 346, 350, 351, 353, 358, 388, 400, 409, 424, 442, 447, 450, 466, 497-98, 501, 555, 574 Sum of squares explained/regression, 48, 49, 50, 51, 52, 85, 86, 114-15, 117, 124, 130, 131, 134, 137, 139, 150152, 159, 161, 194, 333 residual, 22, 27, 48, 49, 50, 51, 52, 54, 58, 70, 71, 77, 79, 81, 84, 85, 90, 92, 93, 96, 105, 114-18, 124-27, 130-31, 134, 136, 139, 141, 144, 14852, 157, 160-62, 176, 182, 184, 186, 188, 194, 204, 206, 227, 229, 242, 271, 274-76, 292, 298, 318, 321-22, 334, 337, 339, 341, 349, 356, 38788, 399, 454, 459, 559-62, 566, 604, 649, 661, 676, total, 48, 49, 50, 51, 52, 54, 70, 71, 85, 86, 88, 93, 114-16, 124-26, 130-31, 134, 138-39, 150-52, 159, 161-62, 229, 292, 300, 333-34, 604, 666 Super consistency, 558 Symmetric matrix, 102, 105, 108, 112-13, 115-16, 118-23, 215, 393 Systematic risk, 56 T Table(s) (see also individual tables), 18, 28, 30, 31, 45, 47, 53, 58, 61, 63, 66, 69, 84, 85, 86, 88, 90, 92, 93, 98, 99, 126, 128, 131-33, 135-43, 148-49, 152-55, 158-59, 178-88, 190-92, 194-96, 198-01, 203, 207, 210, 212, 213, 221, 245, 247, 251, 254, 256-65, 275, 279, 289, 293, 295, 302-05, 308-10, 317, 32930, 335-47, 350-51, 353-54, 356-57, 375-76, 387, 389-90, 394-95, 398, 401-04, 409-10, 414, 416-18, 447, 450-61, 466-74, 480-90, 499, 501-06, 519-23, 527, 530, 535-36, 542, 553-55, 557, 560, 562-63, 570, 573-74, 578, 581-84, 588-90, 594-98, 601-04, 606-07, 620, 628-29, 668, 670, 673-78, 682, 686, 690, 693 Table value(s), 45, 47, 53, 84, 85, 86, 92, 93, 128, 131, 133, 135-36, 139, 140-43, 148-50, 152-53, 158,


180-81, 183-84, 186-88, 192, 194-96, 198, 200-01, 203, 207, 210, 212, 251-59, 261-65, 302-05, 341, 343-47, 350-51, 353, 357, 376, 409, 450-61, 46674, 480-88, 505-06, 519, 527, 562, 590, 603-04, 668, 670, 676-78 T-test (see also t-ratio) 86, 90, 93, 98, 133-34, 154, 156, 162, 176, 185, 189-90, 196, 198-99, 201, 203, 221, 245, 250, 254, 260, 262, 273-74, 276-78, 30102, 305, 310, 312, 332, 334, 346, 350-51, 353, 358, 388, 400, 409, 424, 442, 447, 450, 466, 497-98, 501, 555, 574 Testing statistical hypothesis (see also hypothesis testing), 6, 133, 243, 424, 523, 574 alternative, 45, 71, 84, 89, 91, 92, 128, 131, 133140, 142-143, 145, 150, 152, 157, 180-83, 18586, 189-93, 195-03, 218, 243, 251-53, 255-56, 258-64, 289, 301-05, 341-45, 349-51, 353, 35556, 376, 408, 450-60, 469, 470-73, 483-84, 486-88, 501-02, 505-06, 519, 526, 552-53, 55659, 561-63, 589-90, 603, 649, 659, 665, 668, 669, 672, 676-78, 683, 687-89, 691 ARCH, 408 CAPM, 58 autocorrelation, 252, 255, 260, 262, 596, 598, 66768, 677 causality tests, 500, 558-59, 561, 565 cointegration, 500, 503-04, 550, 557, 692 F-test, 44, 75, 85, 86, 126-28, 131, 136, 138-40, 142-43, 145-46, 148, 149, 150, 158, 165, 176, 187, 189, 198, 243, 245, 261, 301-02, 304, 34149, 408, 447, 454-56, 459, 531, 556-58, 562-63, 660 functional-form misspecification test, 353 heteroscedasticity, 179-81, 183-92, 194-96, 198, 200-01 joint null hypothesis, 135, 263, 454, 456, 460, 478, 563, 669 joint significance tests of regression coefficients, 649, 672, 675 Lagrange multiplier test, 145, 158, 202, 260, 678 likelihood ratio test, 183, 192-93, 218, 356, 518, 525-27, 553, 589, 603, 606 linear restriction, 143, 145 null hypothesis, 40, 41, 45, 46, 71, 84, 85, 89, 91, 92, 93, 126-28, 130-31, 133-52, 157-58, 176, 180-03, 218, 243, 245, 251-65, 278, 283, 286, 289, 301-05, 322, 332, 341-45, 349, 351-53, 355-57, 374-76, 401, 408-10, 414, 418, 424, 426, 437-38, 441, 443, 447, 449-61, 463, 46674, 478, 480-90, 501-06, 519-20, 526-28, 55254, 556-63, 589-90, 603-04, 606, 626, 649, 659-61, 665-66, 668-72, 675, 677-78, 683-89, 691-93 p-value, 504, 560, 563, 673-75, 687-88, 693 significance level, 58, 61, 63, 66, 67, 69, 90, 91, 154, 165, 189, 207, 211-13, 257-58, 275, 279, 344-46, 404, 410, 414, 418, 502, 504, 519, 573, 582, 588, 596, 601, 673-75, 678, 682 specification test, 665-66, 677 t-ratio (see also t-test), 86, 90, 93, 98, 133-34, 154, 156, 162, 176, 185, 189-90, 196, 198-99, 201, 203, 221, 245, 250, 254, 260, 262, 273-74, 27678, 301-02, 305, 310, 312, 332, 334, 346, 350-

51, 353, 358, 388, 400, 409, 424, 442, 447, 450, 466, 497-98, 501, 555, 574 unit root, 435, 437-38, 441-43, 447, 449-50, 461, 466, 468-75, 479-88, 501, 503, 558, 596, 620, 682-83, 685-88, 696 vector autoregressive model, 523 Theorem(s), 38, 40, 42, 44, 55, 56, 100, 104, 111-13, 130, 156, 320, 422, 426, 428-30, 433-34, 477, 491, 502, 542, 554, 564, 644, 648-49, 651, 658, 689 theorem 3.8.1: consistency of OLS estimator s2, 111 theorem 3.8.2: symmetric and idempotent matrix, 112 theorem 3.8.3: symmetric and idempotent matrix, 113 theorem 3.8.4: normally distributed, 113 theorem 3.8.5: mean of the observed and estimated values of Y, 113 theorem 3.8.6: total sum of squares can be partitioned, 114 theorem 3.8.7: RSS = β̂1SP(X1, Y) + β̂2SP(X2, Y) + … + β̂kSP(Xk, Y), 114 theorem 3.8.8: RSS, ESS and TSS can be expressed as the quadratic forms, 115 theorem 3.8.9: E(Y′AY) = σ2 trace(A) + μ′Aμ, 117 theorem 3.8.10: regression mean square (RMS) will be an unbiased estimator of σ2, 117 theorem 3.8.11: residual mean square (EMS) will be an unbiased estimator of σ2, 118 theorem 3.8.12: ESS/σ2 is distributed as chi-square with degrees of freedom (n-k-1), 118 theorem 6.6.1: the least squares estimate β̂ becomes too large in absolute value in the presence of multicollinearity problem, 294 theorem 6.6.2: the maximum likelihood (ML) estimate of σ2 cannot be obtained if the multicollinearity is perfect, 294 Three variables regression equation, 76, 77, 78, 80, 82, 84, 86, 90, 94 adjusted R2, 88, 93 coefficient of multiple determination, 86, 93 concepts, 76 confidence interval estimation for, 86, 93 Cramer's rule, 79 Definition, 76 deviation form, 80 estimation, 77, 80, 84, 88, 90 matrix form, 79, 90 meaning, 76 substitution method, 78 test of significance of equality of, 84, 93 test of significance of parameter estimates, 84, 91 variance and standard error of OLS estimators, 82, 90 Time series, 359 concepts, 359 definition, 359 difference stationary, 361


gausian white noise process, 361 lag operator, 363 non-stationary, 361 random walk model, 362-63 stationary, 360 trend stationary, 361 uses, 360 white noise process, 361 Time series data, 11, 14, 16, 21, 27, 60, 62, 66, 68, 97, 167, 201, 233-34, 250, 348, 351, 359, 372-73, 400, 402-03, 406, 410, 414, 419, 422, 435, 451, 453-54, 457-58, 464, 471, 473-74, 483-88, 490-92, 636, 638, 640, 672, 694 Time series econometric models, AR process, 234, 364, 368-71, 387-89, 394, 474, 507, 670 ARCH model, 201-02, 347, 364, 404-06, 409-10, 412 ARMA process, 385-86, 398, 401 ARIMA process, 347, 363-64, 400, 418 Box-Jenkins approach, 387, 400, 401-02, 404 concepts, 359 conditional maximum likelihood estimates, 395, 399 diagnostic checking, 385, 400-01, 404 estimation of, 387, 406, 413, 415, 450, 453, 45658, 497, 521 GARCH models, 347, 364 GARCH-in-Mean models, 364, 414 Gaussian white noise process, 255-63, 265, 26870, 272-79, 291, 297, 300, 303, 310, 361-62, 390, 394, 398, 400, 419, 423-24, 501, 559 MA process, 365, 376, 378-80, 399, 516-17, 531 maximum likelihood (ML) method, 265, 270, 281, 294, 390-91, 395-96, 398, 400, 407-08, 416, 520, 523, 525, 578, 584, 601 model selection criteria, 387, 401 multivariate time series models, 493-665 partial autocorrelation function, 380-81, 384-85, 401-03 sample ACF, 367, 373-75, 383-85, 401-03, sample PACF, 380, 381, 384-85, 401-03, stationarity condition, 235, 370, 419, 512 test for stationarity based on correlogram, 372 white noise process, 238, 241, 247-48, 255-63, 265, 268-70, 272-291, 281, 291, 297, 300, 303, 310, 361-64, 368-79, 376, 378-81, 385, 387, 389-90, 394, 398, 400, 419, 423-24, 491, 49495, 501-05, 542-43, 549, 559 Time series variables, 360, 364, 374, 400, 419, 423, 493, 507, 515, 520, 523, 535-36, 560 Tobit model, 567-68, 609, 614-15, 619-20, 633-34 Total cost function, 153-54 Trace test, 552-53, 555, 565, 693 Treatment effects Trend deterministic, 446, 488, 547 stochastic, 539, 422, 499, 555 Trend variable, 447 Trend stationary, 361, 419, 488 Truncated data, 568, 608, 610-11, 633 Truncated regression model(s), 567, 610, 633 Truncation, 567, 607-09, 611, 633, 685


Total sum of squares, 48, 49, 50, 51, 52, 54, 70, 71, 85, 86, 88, 93, 114-16, 124-26, 130-31, 134, 138-39, 150-52, 159, 161-62, 229, 292, 300, 333-34, 604, 666 Two-step estimator, 541 Two-stage least squares method, 266, 681, 687 Two variables linear regression model(s), 16, 19, 22, 52 Two variables non-linear regression model(s), 59, 62, 93 U Unbiasedness, 32, 34, 41, 172-73, 238-40, 286, 310, 373, 570, 642, 647 Unbiased estimator, 33, 34, 35, 38, 39, 40, 100-02, 105, 108, 117-19, 172-73, 176, 225, 240, 242-43, 245-47, 312, 314-15, 320, 348, 407, 540-42, 64748, 659 Uncentered R-squared, 61, 63, 67, 69, 90, 154, 207, 210, 212-13, 230, 258 Unit matrix, 154, 509, 639, 655 Unit root augmented Dickey-Fuller (ADF) test, 422, 461, 474, 480-87, 557-58, 561-62, 488, 490, 492, 499, 501-04, 554, 556, 683, 685,-88, 691-92. concepts, 422 definition, 423 Dickey-Fuller (DF) tests, 422, 435, 437, 450-61, 474, 480, 483-89, 503, 545, 554, 558, 691 Engle Granger test, 500-01, 503, 554 Kwiatkowski, Phillips, Schmidt, Shin (KPSS) tests, 488, 489-91, 689 Phillips-Perron (PP) tests, 463, 468-69, 471, 473, 491 summary of the ADF tests, 482 summary of Dickey-Fuller tests, 450 summary of the Phillips-Perron tests, 469 unit root tests with serially correlated errors, 461 Unrestricted residual sum of squares, 92, 131, 134, 141, 144, 148, 150, 152, 157, 160, 341, 351, 356, 454, 459, 561, 649, 661, 676 V VAR models (see also vector autoregressive models), 493, 506, 508-09, 515-16, 518-21, 523, 525, 52731, 535-37, 544-49, 551-52, 554, 561-62, 564-65 advantages of, 536 asymptotic variance-covariance matrix, 522 autocovariance matrix of VAR(p) model, 512, 514-15 concepts, 506 conversion of a VAR process into a VMA process, 517-18 definitions, 506 determining the optimal lag length of, 518 disadvantages of, 536-37 error correction mechanism, 542 estimation of VAR models, 521, 523, 530 forecasting, 528, 531 forecast error variance decomposition, 533, 535 impulse response function, 531


information criteria 519-20 likelihood ratio test to determine, 518 maximum likelihood estimation and hypothesis testing for, 523 mean value of, 512 rewriting a VAR(p) model, 511 stationary vector autoregression model, 509 VAR model in case of contemporaneous terms, 515 variance of VAR(1) process, 514 variance-covariance of, 526-27 Variable(s) Binary variable, 13, 591, 606, 651 binary dependent, 567-71, 590-91, 594, 601, 60406, 622 categorical, 12, 13, 14, 606, 627 continuous, 12, 14 control, 13 dichotomous, 13, 14, 467, 568, 622 dummy, 567-68, 584, 590 dummy dependent, 567-68, 584, 590 dependent, 1, 2, 3, 4, 6, 7, 8, 9, 10, 15, 16, 17, 20, 21, 27, 31, 38, 39, 49, 50, 51, 52, 54, 59, 60, 61, 62, 64, 67, 68, 69, 75, 76, 77, 86, 88, 90, 94, 98, 99, 124-25, 133, 138, 152, 154, 156, 171-72, 180-81, 199, 207, 210, 212-13, 216, 221, 223, 233, 238, 247, 257-58, 275, 290, 333-34, 341, 354, 359, 387-88, 395, 398, 400, 404, 410, 417, 422, 486, 490, 492-93, 499, 506, 537, 542, 55657, 562-63, 568-69, 571-74, 577-78, 581-82, 584, 587-88, 590, 591, 593-98, 601, 604-07, 609-12, 615, 619, 622-23, 627-28, 636, 641, 659, 665-66, 668-69, 678, 682 discrete, 12 endogenous, 254, 508-09, 516-17, 530, 539, 54950 exogenous, 211, 352, 510, 530, 536-37, 539, 568, 590, 663 explanatory variables ( see also regressor variables), 7, 16, 19, 20, 75, 76, 77, 94, 98, 166, 168, 180, 182, 186-87, 189-90, 200, 234, 255, 257, 258, 283-85, 291-92, 297-00, 301-02, 306, 331, 335, 337-38, 341, 446, 493, 537,567, 577, 587, 608, 634, 638 independent, 1, 2, 3, 4, 6, 7, 8, 9, 10, 13, 15, 16, 17, 20, 21, 27, 44, 51, 59, 61, 62, 64, 68, 75, 76, 88, 94, 124, 154, 184, 228, 284, 336, 343, 359, 422, 493, 567-68, 573, 580-81, 587, 591, 593, 610, 614-15, 617, 638, 661 instrumental, 257, 636, 678-79 lagged dependent, 216, 247, 257-60, 281, 528 lagged independent, 281 panel, 690-93 qualitative, 12, 573 quantitative, 12 random, 19, 43, 104, 169, 189, 231, 233, 251, 322, 390, 426, 428-30, 432, 434, 437, 439, 441, 576, 610, 614, 688 stochastic, 3, 19, 76, 168, 194, 200-01, 208, 211, 218, 228, 267, 260, 364 time series, 360, 364, 374, 400, 419, 423, 493, 507, 515, 520, 523, 535-36, 560

Variance (see also covariance, heteroscedasticity) analysis of, 130-31, 160 of autocorrelated random error terms, 234 of the AR processes, 235, 365-66, 369-70 of the ARMA processes, 286 of the residual(s), 245, 247 of the MA processes, 377-78, 380 of OLS estimators, 35, 39, 40, 41, 42, 44, 83, 89, 90, 107, 108, 110, 119, 126-27, 159, 174, 228, 240-43, 286, 289-93, 300, 321-22, 327, 541 of the prediction, 120-21, 248, 268, 361, 373, 382, 390, 413, 416, 528-29, 535 of the random error terms, 35, 166, 238, 405-06 of VAR models, 514, 526-27 ˆ , 120 of Y structure of, 653-54, 694 Variance covariance matrix, 92, 94, 100-03, 113, 12122, 126-27, 140, 142, 157, 172-74, 176, 208-10, 212, 214-15, 222, 224-25, 229-31, 237, 240-41, 245, 265-66, 270, 288, 319, 327, 392, 416, 477, 514, 519-20, 522, 526-27, 539-40, 551, 562, 642, 644-46, 648-49, 651, 654-55, 659, Variance decomposition proportions, 307, 309 Variance inflation factor (VIF), 300-01, 331 Variance property of the ML estimators, 107 Variance property of the OLS estimators, 41, 101 Vectors, 325, 429, 507, 532, 545-46, 548, 552-53, 600 Vector error correction (VEC) models, 545, 546-47, 549-55, 565 advantages of the ECM, 554 bounds test approach for cointegration, 555, 557, 565 cointegration, 547, 565 cointegrated VAR model, 547, 665 cointegrated vectors, 500, 542, 544-45, 552, 554 Johansen’s test for cointegration, 500, 550 long-term relationship in, 546 maximum eigenvalue test, 552-55 specification of deterministic terms, 552 trace test, 552-53, 555, 565, 693 W Wald test, 75, 142-43, 146-47, 160, 165, 562, 678 Weak stationarity, 533 Weighted least squares (WLS) method, 4, 6, 203, 204, 206-07, 209, 572, 574, 582, 630 White noise process, 238, 241, 247-48, 255-63, 265, 268-70, 272-291, 281, 291, 297, 300, 303, 310, 361-64, 368-79, 376, 378-81, 385, 387, 389-90, 394, 398, 400, 419, 423-24, 491, 494-95, 501-05, 542-43, 549, 559 White test for heteroscedasticity, 198-99, 202 Within estimator(s), 646, 657, 662-63, 694 Wiener process, 427 Y Yield, 169, 187, 213, 278, 641 Yielding, 585 Yule-Walker equations, 382, 416


Z z-test ( see standardized normal test), 184, 252, 628 Zero conditional mean assumptions, 19, 20, 76, 172, 214, 216, 221, 223, 537, 641, 653, 662-63, 672


Zero mean, 19, 57, 103, 167, 171-72, 180, 182, 194, 209, 352, 361-62, 374, 405-06, 428, 435, 462, 478, 533, 552, 558, 561, 640, 673, 686 Zero correlation, 257

GREEK LETTERS AND MATHEMATICAL SYMBOLS USED IN THE BOOK

Greek Letters

Symbol (Name): Interpretation

α (alpha): Population regression coefficient
β (beta): Population regression coefficient
χ (chi): A variable with a chi-square distribution / chi-square test statistic
δ (delta): Small number; used for partial derivative, population regression coefficient, coefficient on time trend
Δ (delta): Change in value of variable
ε (epsilon): Random error term, a white noise variable, a Gaussian white noise variable
Γ (gamma): Autocovariance matrix for vector process
γ (gamma): Population regression coefficient, autocovariance for scalar process
Λ (lambda): Matrix of eigenvalues
λ (lambda): Population regression coefficient, individual eigenvalue, Lagrange multiplier, coefficient of the ECM
μ (mu): Population mean
Ω (omega): Variance-covariance matrix
ω (omega): Random error term
∏ (pi): Product
π (pi): The number 3.14159
Φ (phi): Matrix of autoregressive coefficients
φ (phi): Autoregressive coefficients
Ψ (psi): Matrix of moving average coefficients for vector MA process
ψ (psi): Moving average coefficient for scalar MA process; forecast error variance decomposition
ρ (rho): Autocorrelation, autocorrelation coefficient, coefficient of the first lag of the dependent variable Y for the unit root problem
∑ (sigma): Symbol of summation
Σ (sigma): Long-run variance-covariance matrix
σ (sigma): Population standard deviation, standard deviation of the random error terms
τ (tau): Time index, test statistic for testing the unit root problem
Θ (theta): Matrix of moving average coefficients
θ (theta): Vector of population parameters
θ (theta): Coefficients of AR process
ξ (xi): Random error term
ζ (zeta): Random error term

Other Letters a b C CLRM e f g i I J k

Random error term

Interpretation Elements of matrix Elements of vector/matrix Constant terms in a VAR model Classical linear regression model Residual Symbol of function Symbol of function Subscript for cross-sectional data Identity matrix Matrix of ones Number of independent variables in a regression equation

L            The lag operator
m            Number of groups
LM           Lagrange multiplier test
M            Matrix
n            Sample size for cross-sectional data; number of individuals in a panel data set; number of variables observed at date t in a vector system
O_P(T)       Order T in probability
p            The order of an AR process
p            Prob(Y_i = 1)
P            Matrix
q            The order of an MA process; number of autocovariances used in the Newey-West estimate; number of restrictions
Q            Variance-covariance matrix of disturbances; limiting value of (X′X)/T, where X is a (T × k) matrix of independent variables
r            Index of date for a continuous-time process
s² or s_T²   Unbiased estimate of σ² for an OLS regression with a sample of size n or T
T            Sample size for a time series variable
u            Random error term; {u} is a white noise; {u} is a Gaussian white noise
X            Explanatory variables of regression equations
X            (n × k) matrix of explanatory variables for an OLS regression
y_t          tth observation of the dependent variable y
Y            (n × 1) vector of the observations of the dependent variable y
Z            Standard normal variate

Mathematical Symbols

Symbol                   Explanation
Corr(X, Y)               Correlation between X and Y
Cov(X, Y)                Covariance between X and Y
dy/dx                    Differentiation of y with respect to x
∂y/∂x                    Partial derivative of y with respect to x
E(Y)                     Expectation of Y
E(Y|X)                   Expected value of Y given X
exp(x)                   The variable x appears as an exponent to the base e (the base for natural logarithms)
f_Y(y)                   Probability density function of the random variable Y
I_n                      Identity matrix of order n
⊗                        Kronecker product
log(x), ln(x)            Natural logarithm of x
log L(μ, σ²)             Log-likelihood function in which the two parameters μ and σ² are to be estimated
Ly_t, L²y_t, L^k y_t     Ly_t = y_{t-1}, L²y_t = L(Ly_t) = L(y_{t-1}) = y_{t-2}, L^k y_t = y_{t-k}; the lag operator (L) shifts a time series value y_t back by one period
(1 − φL)⁻¹               (1 − φL)⁻¹ = Σ_{i=0}^∞ φ^i L^i (see the worked example following this table)
p                        Prob(Y_i = 1)
q                        Prob(Y_i = 0)
Var(X)                   Variance of the variable X
|X|                      Determinant of a square matrix X
X′                       Transpose of the matrix X
(X′X)                    Product of the two matrices X′ and X
(X′X)⁻¹                  Inverse matrix of (X′X)
X ~ N(μ, σ²)             The variable X is normally distributed with mean μ and variance σ²

X_n → Y                  lim_{n→∞} X_n = Y
X_T →^L Y                X_T converges in distribution to Y
X_n →^p Y                X_n converges in probability to Y
X_T(·) →^p X(·)          The sequence of functions whose value at r is X_T(r) converges in probability to the function whose value at r is X(r)
X_T(·) →^L X(·)          The sequence of functions whose value at r is X_T(r) converges in probability law to the function whose value at r is X(r)
y ≈ x                    y is approximately equal to x
Ŷ_t                      Linear projection of Y at time t when Y is regressed on k independent variables including a constant
1_T                      Square matrix of ones
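
As a brief worked illustration of the lag-operator and expansion entries above (added as an example rather than part of the symbol list; it assumes the AR(1) model y_t = φy_{t-1} + u_t with white noise u_t in the notation of these tables, and |φ| < 1 so the geometric expansion converges):

\[
y_t = \phi y_{t-1} + u_t
\;\Longleftrightarrow\;
(1-\phi L)\,y_t = u_t
\;\Longrightarrow\;
y_t = (1-\phi L)^{-1} u_t
    = \sum_{i=0}^{\infty} \phi^{i} L^{i} u_t
    = \sum_{i=0}^{\infty} \phi^{i} u_{t-i}.
\]

That is, inverting the lag polynomial (1 − φL) converts the AR(1) model into its MA(∞) representation, which is how the expansion (1 − φL)⁻¹ = Σ φ^i L^i is used for the AR processes discussed in the book.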