225 47 8MB
English Pages 644 [645] Year 2022
HANDS-ON INTERMEDIATE ECONOMETRICS USING R Templates for Learning Quantitative Methods and R Software (Second Edition)
B1948
Governing Asia
This page intentionally left blank
B1948_1-Aoki.indd 6
9/22/2014 4:24:57 PM
HANDS-ON INTERMEDIATE ECONOMETRICS USING R Templates for Learning Quantitative Methods and R Software (Second Edition)
Hrishikesh D Vinod Fordham University, USA
World Scientific NEW JERSEY
•
LONDON
•
SINGAPORE
•
BEIJING
•
SHANGHAI
•
HONG KONG
•
TA I P E I
•
CHENNAI
Published by World Scientific Publishing Co. Pte. Ltd. 5 Toh Tuck Link, Singapore 596224 USA office: 27 Warren Street, Suite 401-402, Hackensack, NJ 07601 UK office: 57 Shelton Street, Covent Garden, London WC2H 9HE
Library of Congress Control Number: 2022008587 British Library Cataloguing-in-Publication Data A catalogue record for this book is available from the British Library.
HANDS-ON INTERMEDIATE ECONOMETRICS USING R Templates for Learning Quantitative Methods and R Software Second Edition Copyright © 2022 by World Scientific Publishing Co. Pte. Ltd. All rights reserved. This book, or parts thereof, may not be reproduced in any form or by any means, electronic or mechanical, including photocopying, recording or any information storage and retrieval system now known or to be invented, without written permission from the publisher.
For photocopying of material in this volume, please pay a copying fee through the Copyright Clearance Center, Inc., 222 Rosewood Drive, Danvers, MA 01923, USA. In this case permission to photocopy is not required from the publisher.
ISBN 978-981-125-617-2 (hardcover) ISBN 978-981-125-673-8 (paperback) ISBN 978-981-125-618-9 (ebook for institutions) ISBN 978-981-125-619-6 (ebook for individuals) For any available supplementary material, please visit https://www.worldscientific.com/worldscibooks/10.1142/12831#t=suppl Desk Editors: Aanand Jayaraman/Venkatesh Sandhya Typeset by Stallion Press Email: [email protected] Printed in Singapore
Aanand Jayaraman - 12831 - Hands-on Intermediate Econometrics Using R.indd 1
1/3/2022 3:52:57 pm
March 12, 2022
8:57
Hands-on Intermediate Econometrics Using R. . .
9in x 6in
b4465-fm
To the memory of My father Maharshi Nyaya Ratna Vinod, who had been an Indian freedom fighter, Marathi poet, and was elected as the World Peace Ambassador, http://www.maharshivinod.org/ My mother Maitreyi Dhundiraj Vinod, who was a pioneer in women’s education in India and an inspiration to millions of women through her own outstanding academic achievement and My mother-in-law Dr. Kamal Krishna Joshi, who provided free medical care to countless poor and needy women in Nagpur, India, for 60 long years, twice a week.
v
page v
B1948
Governing Asia
This page intentionally left blank
B1948_1-Aoki.indd 6
9/22/2014 4:24:57 PM
March 12, 2022
8:57
Hands-on Intermediate Econometrics Using R. . .
9in x 6in
b4465-fm
Foreword Comments cum Endorsements by Distinguished Econometricians The following are in alphabetical order by the last name of distinguished professors, who are all Fellows of the Journal of Econometrics. Professor William A. Barnett, Oswald Distinguished Professor of Macroeconomics, University of Kansas, USA This book provides a unified and broadly accessible presentation of modern econometrics, with emphasis on application and computing. Professor Jean-Marie Dufour, William Dow Professor of Economics, McGill University, Canada This book is a beautiful and highly accessible introduction to modern econometric methods tailored to the needs of those who wish to apply them. The main theoretical concepts are clearly explained in a non-technical way and the link with economic theory is underscored. Hrishikesh Vinod also undertook to show how the methods can be programmed with the powerful (and free) R software, which has become the reference language of statisticians for both statistical computing and graphics. Students of this book can thus quickly apply and modify the methods explained in the book, and they will also be able to draw from a wide stock of freely available code to pursue their own projects. This book is a major addition to applied econometrics, which should prove useful to students and researchers all over the world. vii
page vii
March 12, 2022
viii
8:57
Hands-on Intermediate Econometrics Using R. . .
9in x 6in
b4465-fm
Hands-on Intermediate Econometrics Using R
It should also contribute to bridge the gap between econometricians and statisticians. I strongly recommend this book to students and applied researchers, and will certainly do so for my own students. Professor Subal C. Kumbhakar, University (distinguished) Professor, State University of New York, Binghamton The learning of econometrics is never complete without some “handson” experience. One way to accomplish this is to work on replicating previously published results. This is much easier today than it was 50 years ago thanks to the recent trend in transparency of applied econometric research. Many reputed journals now require that data and software be available for others to replicate results. This gives researchers, young and old, an opportunity to test their modeling and estimation skills. Recognizing the importance of this hands-on work, many textbook writers provide datasets from published papers on a CD and add problem sets to verify and extend results. Since all of these involve the use of some software, some books today come bundled with a student or full version of it. These “tailored” versions of the software most often have limited capacity in terms of what they can do. Vinod has done a great service by writing a book utilizing R software that can supplement the standard econometrics textbook in an advanced undergraduate and applied graduate econometrics course. Although the book does not cover all the standard topics in an econometrics textbook, it gives a much deeper understanding in terms of economic theory and econometric models and explains in detail how to use the R software on the topics covered. R is open source software that is as powerful as commercial software. Vinod utilizes the software to teach econometrics by providing interesting examples and actual data applied to important policy issues. This helps the reader to choose the best method from a wide array of tools and packages available. The data used in the examples along with R program snippets illustrate economic theory and sophisticated statistical methods that go beyond the usual regression. The R program snippets are not merely given as black boxes, but include detailed comments which help the reader better understand the software steps and use them as templates for possible extension and modification. Readers of this book, be they students of econometrics or applied economists, will benefit from the hands-on
page viii
March 12, 2022
8:57
Hands-on Intermediate Econometrics Using R. . .
Foreword
9in x 6in
b4465-fm
page ix
ix
experience either by using the R snippets in replicating published results or by customizing them for their own research. Professor Peter C. B. Phillips, Sterling Professor of Economics & Professor of Statistics, Yale University, USA Modern approaches to econometrics education acknowledge the importance of practical implementation even in introductory courses. The R software package provides an open source statistical and graphics engine that facilitates this learning process, empowering students and researchers to integrate theory and practice. Rick Vinod’s book is an outstanding tool in this educational process, gently nursing its readers through simple examples on a vast range of topics that illustrate the practical side of econometric work, forcing economic ideas to face the reality of observation. Professor Jean-Fran¸ cois Richard, University (distinguished) Professor, University of Pittsburgh, USA This book embraces R as a powerful tool to promote hands-on learning of econometric methods. It covers a wide range of commonly used techniques with emphasis on problems faced by practitioners. It does so in a way that allows readers to initially reproduce by themselves several substantive illustrations presented in the book and to subsequently develop their own applications. It also emphasizes often overlooked careful data analysis prior to model specification. All together, this book presents a very convincing case and I definitely intend to recommend it as a supplemental textbook to my graduate students, especially those with empirical interests. Professor Aman Ullah, Chair, Economics Department, University of California, Riverside, USA Econometrics applies statistical methods to study economic and financial data. The data used can be in the form of a cross-section (micro), or a time series (macro), or panel with discrete, continuous, bounded, truncated, or censored variables related simultaneously or dynamically. Econometric models include linear and nonlinear regression models, ARMA time series models, vector autoregressive models, limited dependent variable models, among others. Most econometrics texts concentrate on discussing model selection, statistical estimation, and hypothesis testing for all such models.
March 12, 2022
x
8:57
Hands-on Intermediate Econometrics Using R. . .
9in x 6in
b4465-fm
Hands-on Intermediate Econometrics Using R
Vinod’s book is a superb work and it is unique in various respects. First, it provides a solid introduction to important topics in both micro/macro Economics and Finance in a simple way. Second, it integrates econometrics methods with economic models in producer and consumer theories, labor economics, and financial econometrics. Third, each chapter is accompanied with empirical illustrations in the software R. Fourth, the hands-on approach provides the implementation of all the results in the book easily by anyone and anywhere. It is eminently appealing to the new generation of economists and econometricians. I am sure this book will be greatly used outside the econometrics field by many readers from other applied sciences. For example, readers in applied statistics, engineering, sociology, and psychology would enjoy learning some clever modeling tricks and graphics in R. Rick Vinod is a distinguished econometrician with great width, depth, and originality in his published work and what impressed me greatly was the ease with which this has been communicated in the book as hands-on examples. It is a very user-friendly book and I forecast that this new kind of book will be extensively used by students, faculty, and applied researchers.
page x
March 12, 2022
8:57
Hands-on Intermediate Econometrics Using R. . .
9in x 6in
b4465-fm
Preface My teacher, Wassily Leontief, a Nobel laureate, had a great influence on me. In fact, my Harvard dissertation was on “Non-Linearization of Leontief Input–Output Analysis”. Leontief’s (1982) piece in Science criticizes typical models used by academic economists for assuming stationary equilibria and lamented “the splendid isolation” of academic economists from the real world. Morgan (1988) studies Leontief’s points by studying the content of economics journals such as the American Economic Review (AER) and argues that there is “market failure” in academic economics with excess emphasis on “fine points of logic to win approval within the guild”. Fortunately, things are changing in the new century. This book includes realistic econometric tools which model nonstationary equilibria. The book is inspired by Ernst Berndt’s (1991) hands-on type book suitable for an earlier era before the Internet. Similar to Berndt, the book should help the reader in learning to practice econometrics with numerical estimation work involving computer software. Econometrics has been around as a subject of scientific inquiry since about 1933 when the journal Econometrica was founded by Ragner Frisch and others. It has grown a great deal in the 20th century and matured. The book uses econometric examples involving several policy issues such as divestiture of the Bell System, global warming, etc. Hence, it can be of interest to all social and biological scientists, engineers, legal professionals in addition to those who use the R software. I assume that the reader has only a limited (not zero) exposure to the basics in Economics, Statistics, Finance, Computer Science, and Mathematics. Every attempt is made to xi
page xi
March 12, 2022
8:57
xii
Hands-on Intermediate Econometrics Using R. . .
9in x 6in
b4465-fm
Hands-on Intermediate Econometrics Using R
derive results from first principles, to use self-explanatory notation and minimal cross-referencing, and to fully explain the mathematics used. Some R snippets allow the readers to “see” what various matrix operations (e.g., Kronecker products) do and numerically verify the algebraic identities (e.g., singular value decomposition) with simple examples. Thus, the book is structured for the following five types of potential uses: 1. as a textbook for graduate or advanced undergraduate econometrics courses for students with mixed backgrounds focused on applications rather than proofs; 2. as a supplemental textbook for usual econometrics/applied statistics courses in social sciences, engineering, law, and finance; 3. as a tool for any student or researcher wanting hands-on learning of some of the basic regression and applied statistics methods; 4. as a supplemental material for computer science courses teaching object-oriented languages, including R; 5. as a reference for data sources and research ideas. In the new century, fast linking of all scientists through the Internet is revolutionizing the exchange of scientific ideas including data and software tools. This book is intended to highlight and welcome these exciting changes. Since open source free software is particularly powerful for exchange of software tools, the book embraces R (distributed under Free Software Foundation’s GNU GENERAL PUBLIC LICENSE Version 2, June 1991). Why R? All empirical scientists should learn more than one programming language, not some point-and-click software restricted by the imagination of the programmer. R is based on an earlier language called S developed in Bell Labs around 1979 (by John Chambers and others with whom I often used to have lunch). It is an object-oriented Unix type language, where all inputs, data, and outputs from R are “objects” inside the computer. Also, R is an “interpreted programming language” not “compiled” language similar to FORTRAN or GAUSS (reviewed in Vinod, J2000c) or many older languages.
page xii
March 12, 2022
8:57
Hands-on Intermediate Econometrics Using R. . .
Preface
9in x 6in
b4465-fm
page xiii
xiii
Hence, all R commands (or command sets enclosed in curly braces) are implemented as they are typed (or “sourced”). An advantage of Unix type languages is that numerical results are identical (platform-independent), and yet R is available for a wide variety of UNIX platforms, Windows, and Mac systems. This provides one kind of flexibility in using R. SPlus is the commercial version of S, and R is the free version. Since Splus programs work in R, this allows second kind of flexibility for R. Third kind of flexibility of R arises from the fact that it is open source, meaning that every line of the code is available for any researcher anywhere to see, modify, criticize, etc. A black box of hidden code is unavoidable for any proprietary commercial software. Not so for R. Anyone with enough patience to learn R can modify available software to suit a particular application. The fourth kind of flexibility of R is that it offers a package called “Rcmdr” for the convenience of cursory users who want the point-and-click mode and do not wish to go beyond standard techniques. A fifth kind of flexibility is the ability to choose from a wide choice of packages without having to figure out if they are money’s worth. New packages and modules of proprietary packages are often expensive and expect users to keep on paying for ever newer versions. The buyer has to figure out if the latest version is really new, bug-free, and worth buying. By contrast, free latest versions of all R packages are readily available, making life easier for users. Vinod (J1999b, J2000c, J2003c, J2003d, J2004d, J2004e) discuss numerical accuracy issues. R is believed to be numerically one of the most accurate languages available. The accuracy of its nonlinear algorithms and random number generators is reasonable. Although not perfect, R language is progressing fast. I am convinced that R is headed to become the lingua franca of all applied statistics, including econometrics. The book provides numerous snippets containing R code explaining what the code does, with detailed comments hoping to simplify and speed up the learning curve. Since the idea of learning a programming language like R may seem daunting to disinterested students, this book does some spoon feeding. It suitably motivates the reader by making the learning a bit more hands-on and fun. The reader can first do any one of the dozens of computing tasks in our snippets (e.g., run projection pursuit regression), look at the results,
March 12, 2022
xiv
8:57
Hands-on Intermediate Econometrics Using R. . .
9in x 6in
b4465-fm
Hands-on Intermediate Econometrics Using R
and choose to learn only the relevant features of interesting tasks at leisure. My students find it fun to discover hidden jewels inside R by direct searches on Google’s website and by doing illustrative examples inside the contributed packages.
Advantages of R for Replication of Published Research There is a new emphasis on replicable empirical work, for the benefit of all students and researchers, anywhere in the globalized world. Under the able editorship of Ben Bernanke (current Chair of the US Central Bank called the Federal Reserve Bank) reproducible empirical work is being emphasized at the American Economic Review (AER), the main journal of the American Economic Association. Bernanke (2004) cites McCullough and Vinod (J2003d) in his “editorial statement” requiring all AER authors to submit both data and software code. Indeed, it would be beneficial for the profession if any researcher anywhere is able to replicate published quantitative results from all Economics journals. Econometrica, Journal of Political Economy, and many top journals are following the lead of AER and require authors to submit both data and software. I hope that this book facilitates the movement toward replicable econometrics by making R more easy and fun. Replication with free R, will be much simpler and fast for international students, eliminating possible delays in paying for software in different currencies.
Obtaining R and its documentation files Go to http://www.r-project.org. On the left side, under “Download” click on the CRAN (Comprehensive R Archive Network) link. Pick a mirror closest to your location (e.g., Pittsburgh). Assuming that you have a Windows PC, under “Download and install R” click on Windows. Now, under subdirectories, click on “base”. Next, click on the fourth item under “in this directory”: Do not be confused by options. Click on R-2.7.0-win32.exe (or similar latest setup program, size over 25 megabytes). Depending on your Internet connection, this process will take some time. The “setup” program creates an Icon for R-Gui (graphical user interface). The point is that anyone with
page xiv
March 12, 2022
8:57
Hands-on Intermediate Econometrics Using R. . .
Preface
9in x 6in
b4465-fm
page xv
xv
an Internet connection sitting almost anywhere in the world can get R on any number of computers completely free of charge. This is perhaps too good to be true. Many books, some free manual, and other documentation is also available by clicking on a link in the left column under “documentation” at the R Homepage. The Wiki for R at (http://wiki.r-project.org) is particularly useful for newcomers.
Packages within R R comes with the “base” package and additional contributed “contrib” packages have to be explicitly requested from the R website. The contributed packages are written by statisticians, computer scientists, and econometricians from around the world and contain several functions for doing operations plus many illustrative datasets. They are freely available on demand for non-commercial purposes. All R packages are required to follow a nice readable format describing all their functions. Each function is described with details of inputs and outputs, references to the literature, and generally does have examples. If one is curious to know exactly how the function was implemented, usually all one has to do is write the name of the function and the entire program can be seen. The user then has an option to modify the program, except that beginners will find that many programs are hard to understand, since their expert authors use their ingenuity in writing efficient (fast, not necessarily easy to read) code. By contrast, the code in the snippets in this book should be easy to read. In addition to user manuals, some packages also have “vignettes”, which are fully worked out examples explaining the usage and interpretation of results (type “vignette()” to know what is available). I always download and see both the user manuals and vignettes from the r-project website for contributed packages to fully understand what each package does and its full potential. Downloading the latest version of any package itself into R takes only a couple of clicks and automatic updates are also similarly available from the R-Gui on the user’s desktop. The Journal of Statistical Software of the American Statistical Association has the URL (http://www.jstatsoft.org/) for free download with a number of articles dealing with R. A recent volume 27 (July 2008) edited by Achim Zeileis and Roger Koenker
March 12, 2022
xvi
8:57
Hands-on Intermediate Econometrics Using R. . .
9in x 6in
b4465-fm
Hands-on Intermediate Econometrics Using R
deals with “Econometrics in R”. It has excellent articles describing the following packages: plm, forecast, vars, np, Redux, sampleSelection, pscl.
Organization of the book The book begins with an introductory chapter using production functions to illustrate relevance of nonlinear functions of regression coefficients. We include preliminary data analysis with recent tools such as (Cook’s distance) multiple regression methods, singular value decomposition, collinearity problem, and ridge regression. Chapters 2, 3, and 5 deal with time series analysis with the standard topics and some sophisticated ones including: autoregressive distributed lags (ARDL), chaos theory, mean reversion, long memory, spectrum analysis, ergodicity, stationarity, business cycles with imaginary roots of AR(2), impulse response, vector autoregression (VAR) models, stochastic diffusions, cointegration, Granger causality testing, and multivariate techniques including canonical correlations. Chapter 4 discusses expected and non-expected utility theory and implications for Finance and other areas. It has new tools for measuring up to fourth-order stochastic dominance shown to study “prudence”. Chapter 6 has detailed derivations of simultaneous equations theory including k-class estimators, limited information maximum likelihood (LIML) and the “identification” problem, with hands-on examples. The novelty of Chapter 7 on “limited dependent variables” is that besides economists’ favorite Tobit and Heckman estimators, it explains the less familiar general linear model (GLM) viewpoint used by Biostatisticians with their “link functions” implying superiority of logit over probit. We also discuss survival models for Oil Company CEOs. Chapter 8 has sophisticated consumer theory including Wiener– Hopf dynamic optimization and Kernel estimation. Chapter 9 explains the traditional bootstrap, its limitations, and several newer tools including double and maximum entropy bootstraps (with easy to use R code). Chapter 10 has generalized least squares (GLS), generalized method of moments (GMM), vector autoregressive moving average (ARMA) models, “estimating function”, and related pivot functions with examples. Chapter 11 deals with nonlinear models
page xvi
March 12, 2022
8:57
Hands-on Intermediate Econometrics Using R. . .
Preface
9in x 6in
b4465-fm
page xvii
xvii
and explains how to use “projection pursuit regression” for money demand equations. The topics covered are influenced by a need to illustrate them with examples within the size limitations of a book and my own familiarity with the topic. Admittedly somewhat vain, the long list of my own and joint papers is separated from the list of other authors. The list uses the prefix J for journal articles, P for published proceedings, B for books or chapters in books, and U for unpublished but widely circulated pieces. Since it is not possible to include additional papers of mine, an appendix lists a thematic classification of my papers. It shows that the book could not accommodate some themes and state space (Kalman filter) modeling from Vinod (B1983, 1990, J1995c) using the R package “sspir”. An important selling point of the book is that it has numerous program snippets in R. My hope is that the snippets have useful practical information about implementing various theoretical results. The reader is encouraged to read them and treat the snippets as templates. I hope readers will modify the snippets to apply to different and more interesting data and models. This is the sense in which we call this book “hands-on”. The reader can readily copy and paste each snippet into R while reading the discussion and see R work out the results first hand. For brevity, very few of the outputs appear in the printed book. I am grateful to the following former/current students for detailed comments: Erik Dellith on Chapter 1, Brian Belen on Chapter 5, Caleb Roepe on Chapter 7, and Diana Rudean on Chapter 9. Many snippets in the book have been tried by graduate students at Fordham University. The students have found it fun to try them on some datasets of their own choice. It is important not to treat the snippets as black boxes to blindly perform some tasks. Professional programmers try to write the most efficient code in the sense that it works fast and has as few lines as possible. The snippets in this book are not at all professional. Instead, they should be viewed as tools for learning to use R in applied work. Hence, instead of merely copying and pasting them, my students were encouraged to slightly modify each snippet to suit a distinct problem. Such a hands-on approach makes it fun. A purchaser of this book need not type the snippets, since a CD in a sleeve inside the back cover contains all snippets as text files.
March 12, 2022
xviii
8:57
Hands-on Intermediate Econometrics Using R. . .
9in x 6in
b4465-fm
Hands-on Intermediate Econometrics Using R
I welcome suggestions for improvements to the content of the book and/or the snippets. Although improved code to professionals means faster running with fewer lines, I prefer a readable code with some intuition for the steps used and lots of comments. Of course, I will give proper credit to the person(s) suggesting any improvements. On that note, Professor Peter C. B. Phillips, the eminent econometrician from Yale University who has seen a preprint of this book, suggested that I should clarify that my snippets often use the assignment symbol “=” from GAUSS and FORTRAN, instead of the “” showing. If not, go back to the above website and get more detailed directions for getting started. The following snippet cleans up R from previous use and gives a date–time stamp, so one knows exactly when the particular run was made: #R1.1.1 Recommended Start of a session of R software objects() # this command lists all objects already #lurking in the memory of R rm(list=ls()) #rm means remove them all to clean up old stuff
print(paste("Following executed on", date())) #date−time stamp 1.1.1.
Data on metals production available in R
Our hands-on example considers production of metal by 27 regional US firms belonging to the Standard Industrial Classification Code (SIC) 33. Our data are readily available in R with the following command snippet: #R1.1.2 Read production function data already in R into the #memory library(Ecdat) #pulls into the current memory of R # Ecdat and fBasics packages help(Metal)# gives further info about the dataset Metal, #sources etc. data(Metal)#pulls the Metal data into memory of R names(Metal)#provides the names of series in various #columns summary(Metal) #provides descriptive stats of the data met=as.matrix(Metal)#we create an object called met Ly=log(met[,1])#pull first column of met, take log, define Ly LL=log(met[,2])#pull second col. of met, take log, define LL LK=log(met[,3])#pull third col. of met, take log, define LK datmtx=cbind(Ly,LK, LL) #bind columns into a matrix fBasics::basicStats(datmtx) matplot(cbind(Ly, LK, LL), typ="b", pch=c("y","K","L"), main="Metals Data Description", xlab="Observation Number", ylab="Logs of output, capital, and labor")
March 11, 2022
13:56
4
Hands-on Intermediate Econometrics Using R. . .
9in x 6in
b4465-ch01
Hands-on Intermediate Econometrics Using R
The command fBasics::basicStats(datmtx) containing the double colon (::) asks R to pull the R-package called “fBasics” and then use the command “basicStats(.)” function, avoiding the command library(fBasics). However, (::) will not work if “fBasics” is not installed. The package and function names routinely exploit the fact that R is case-sensitive. For example, lower case c followed by upper case S in “basicStats” help separate the two words in its name. The command matplot(cbind(Ly, LK, LL), refers to plotting a matrix (mtx) created by binding the indicated three columns into a matrix by the R function “cbind(.)”. The options within a command are separated by commas. The plot-type choice “b” refers to both line and print characters listed as a vector c(.,.) of characters distinguished by straight (not fancy) quotes: typ="b", pch=c("y","K","L"). The plotting option “main” refers to the main header, whereas the axis labels are “xlab, ylab”. 1.1.2.
Descriptive statistics using R
One should always study the basic descriptive statistics of available data before fitting any model. See the output in Table 1.1, where the abbreviations in the first column are standard: nobs=number of observations=T in our example, NAs=number of missing values. 50% data are both below and above the median. The first quartile has 25% data below it and 75% above it. The third quartile has 75% below it and 25% above it. SE (mean) in Table 1.1 refers to the standard error of the sample mean, which is the standard deviation of the sampling distribution. Table 1.1 reports SE of the mean. Think of the sample mean as a random variable and consider the mean of all possible samples of size T as the sample space. The sample mean is a random variable defined on this sample space, in the sense that we compute all possible samples and then all possible sample means. This is the sampling distribution of the mean. By the central limit theorem (CLT) this distribution is Normal. The standard deviation of this sampling distribution is the standard error, SE. The descriptive statistics in R include the LCL and UCL as the lower and upper confidence limits, respectively, of a 95% confidence interval on the mean. It also reports skewness and kurtosis, which are also defined in snippet #R1.1.3 and described in the following.
page 4
March 11, 2022
13:56
Hands-on Intermediate Econometrics Using R. . .
9in x 6in
b4465-ch01
Production Function and Regression Methods Using R Table 1.1.
1.1.3.
page 5
5
Basic descriptive statistics, metals data.
Description
Ly
LK
LL
Nobs NAs Minimum Maximum 1st Quartile 3rd Quartile Mean Median Sum SE (mean) LCL (mean) UCL (mean) Variance Std dev Skewness Kurtosis
27 0 6.384941 9.195142 6.935645 7.928982 7.443631 7.378996 200.978045 0.146484 7.142529 7.744733 0.579354 0.761153 0.613768 −0.522245
27 0 5.634754 9.546066 6.736932 8.063829 7.445922 7.436605 201.03991 0.186384 7.062804 7.829041 0.937957 0.968482 0.231518 −0.677601
27 0 4.919981 7.355532 5.360868 6.233483 5.763652 5.560335 155.6186 0.126293 5.504052 6.023252 0.430651 0.65624 0.701836 −0.49553
Writing skewness and kurtosis functions in R
Let population central moments of order j be denoted by μj = T ¯)j . Skewness measures the nature of asymmetry in the i=1 (xi − x data. Positively skewed density has outcomes above the mean more likely than outcomes below the mean. Negatively skewed is similarly defined. Kurtosis measures the degree to which extreme outcomes in the tails of a distribution are likely. Pearson’s measures of skewness and kurtosis are skewp = (μ3 )2 /(μ2 )3 and kurtp = μ4 /(μ2 )2 . Since Pearson’s formula does not distinguish between positive and negative skewness, R package “fBasics” uses a more informative signed square root, skew = (μ3 )/σ 3 , where μ2 = σ 2 . The Normal distribution has zero skewness: skewp = 0 = skew. Since the kurtosis of the Normal is 3, R package “fBasics” subtracts 3 to define “excess kurtosis” as: kurt = [μ4 /(μ2 )2 − 3]. If kurtP > 3, i.e., if kurt > 0, the distribution is more peaked than Normal near the mode (maximum frequency) of the distribution and has thicker (fatter) tails. In Table 1.1, all variables have kurt < 0, suggesting that their probability distributions are flatter than the Normal. The sample
March 11, 2022
6
13:56
Hands-on Intermediate Econometrics Using R. . .
9in x 6in
b4465-ch01
Hands-on Intermediate Econometrics Using R
estimates of skewness and kurtosis use sample moments defined by the usual unbiased estimates. Snippet #R1.1.3 illustrates how to write R functions (algorithms) for computing skewness and kurtosis, whose definitions involve third and fourth powers. Download the R package “moments”. Issuing commands moments::skewness and moments::kurtosis on the R console prints to screen more sophisticated versions of these algorithms. Packaged functions often minimize computing time, but can be hard to understand. #R1.1.3 New Functions in R to compute skewness and kurtosis myskew |z|) (Intercept) 2.50937 0.18389 13.646 t) denote the indicator function, which is unity when the condition (Ti > t) holds, that is, when the duration (tenure) random variable for the ith individual observation (e.g., CEO) exceeds t time units; I(Ti > t) = 0, otherwise. Now, f (t) =
exp(β x) h0 exp(β x) = , i I(Ti > t)h0 exp(β x) i I(Ti > t) exp(β x)
(7.4.8)
where h0 is absent from the last term. The main appeal of the Cox model over Weibull for econometric applications is that the baseline hazard h0 cancels out from the last portion of (7.4.8); that is, the individual effects are allowed for, removed (not assumed away), and need not be estimated. Assuming that there are n individuals in the study, the usual likelihood function is simply a product of n density functions. In Cox’s model, it is a product of f (t). In many problems, we must allow for censoring, because very often, we cannot wait long enough till we know that the duration of every individual included in the study has ended. For example, a study data collection might end at a particular date (e.g., 1983 in our example). Hence, we cannot study the tenure of any CEO surviving beyond 1983. For example, such a CEO may last a few years or may be fired immediately. The censoring involves a modification of the indicator function in (7.4.8) to incorporate the upper limit (e.g., 10 years) on Ti . The maximum likelihood estimation of parameters β from such likelihood functions is provided in many software programs, including R. Let us now turn to an interpretation of the coefficients β associated with the regressors x. First, we integrate out the time variable
page 360
March 11, 2022
13:56
Hands-on Intermediate Econometrics Using R. . .
9in x 6in
Limited Dependent Variable (GLM) Models
b4465-ch07
page 361
361
from both sides of (7.4.7) from 0 to t. Thus, we replace h(t) and h0 with H(t) and H(0), respectively, denoting cumulative hazard functions by the upper case H. Now, take logs of both sides to yield ln H(t) = ln H(0) + βx.
(7.4.9)
This shows that β measures the proportionate change in cumulative hazard as a result of a change in x. Differentiate both sides of (7.4.9) with respect to x and rearrange to yield (dH(t)/dx) = H(t)β. Recall from (7.4.3) that S(t) = exp(−H(t)); that is, H(t) = − ln S(t). Using this, we have (−1/S(t))(dS(t)/dx) = − ln S(t)β, or we write dS(t) = S(t) ln S(t)β. dx
(7.4.10)
At different levels of S(t), the marginal effect of each regressor x on S(t) is estimated by (7.4.10). Note that Eqs. (7.4.8) and (7.4.10) show that the Cox regression not only controls for the baseline individual firm effects, but also permits a detailed study of marginal effects. Since S(t) and log S(t) change from one observation to another, the marginal effect also changes, even if β is fixed. However, in R, it is a simple matter to compute such effects. Now, we provide further discussion of an adjustment of likelihood function by allowing for censoring, which is similar to the adjustment for Tobit models in Section 7.2.3. This involves the clever use of censored probability distributions in a likelihood function. Consider a sample of i = 1, 2, . . . , n CEOs (patients), each having f (ti ), S(ti ), h(ti ), where we remove some unique aspects via the h0 (t) cancellation mentioned above. Let Ci denote the maximum duration possible due to censoring. Our Ci = 10 years. Let Ti denote the actual tenure (survival duration) of ith CEO. We observe the removal of the CEO only if Ti ≤ Ci ; otherwise, we conclude that the tenure Ti > 10. The tenure time recorded is min(Ti , Ci ). Now, define a binary “indicator” function δi which equals 1 if Ti ≤ Ci ; otherwise, δi = 0. Our data have the column titled “censor” containing δi values. The uncensored likelihood function is simply a product of n density functions f (ti ) for n individuals in the study. This is inappropriate here, because we do not know if and when a surviving person beyond the ending date of the study actually died (or CEO was
March 11, 2022
362
13:56
Hands-on Intermediate Econometrics Using R. . .
9in x 6in
b4465-ch07
Hands-on Intermediate Econometrics Using R
removed). When we allow for right censoring, the likelihood function for survival data is a product of two parts: (a) Density functions f (ti ) when δi = 0, and (b) The product of random survivor functions S(ti ) when δi = 1, since the CEO survived (i.e., did not get fired by the board of directors or death did not occur). The estimation usually proceeds after using the definition of h(ti ) given above to rewrite f (ti ) in part (a) of the likelihood function. The ith term of the likelihood function eventually becomes h(ti )δi S(ti ). The parameter β is estimated by maximizing the censored likelihood function obtained from a product of n terms, which correctly allows for data censoring. The maximum likelihood estimation is rather easy to implement in R. Frank Harrell’s function “bj” of the package “rms” fits the Buckley–James distribution-free least squares multiple regression model to a possibly right-censored response variable. This model reduces to ordinary least squares if there is no censoring. By default, model fitting is done after taking logs of the response variable. The function “cph” in the package “rms” fits the Cox proportional hazard model. Andersen–Gill model extends the Cox model to allow for interval time-dependent regressors, time-dependent strata, and repeated events. #R7.4.2 Cox type analysis of Oil and Gas Company CEO survial oil=read.table(file="c:\\data\\oildata.txt" , header=T) attach(oil); library(survival) survreg(Surv(tenure, censor) ~ outside+sales+age) fit=coxph(Surv(tenure, censor) ~ outside+sales+age) plot(survfit(fit), xlab="Survival years", main="Plot of Fitted Survival Curves for Oil&Gas CEOs", ylab="Survival probability with confidence limits")
A typical Kaplan–Meier plot is a step function starting at a survival probability of 1. Its jump points are given in time durations, and jump size measures the change in the probability of survival. If r(ti ) denotes the number of individuals at risk (of dying) and di the number of deaths during the half-open interval [ti , ti+1 ), then a jump in probability is given by (r(ti ) − di ))/r(ti ). The Kaplan–Meier
page 362
March 11, 2022
13:56
Hands-on Intermediate Econometrics Using R. . .
9in x 6in
b4465-ch07
Limited Dependent Variable (GLM) Models
Fig. 7.3.
page 363
363
Plot fitted survival curves for oil and gas CEOs.
non-parametric (survival) probability of surviving until time ti is given by the product of all such jump probabilities Prob(T > ti ) =
i−1 [r(ti ) − di )] 0
r(ti )
.
(7.4.11)
Using an estimate of the standard error of the curve, a 95% confidence interval for the true Kaplan–Meier curve is usually given in typical plots similar to Fig. 7.3. #R7.4.3 CEO Survival Analysis with sophisticated rms #Package oil=read.table(file="c:\\replication\\oildata.txt" , header=T) attach(oil); library(rms) #bring Harrell's package into #the memory of R fit2=cph(Surv(tenure,censor)~outside+sales+age,data=oil) fit2 #print results
March 11, 2022
364
13:56
Hands-on Intermediate Econometrics Using R. . .
9in x 6in
b4465-ch07
Hands-on Intermediate Econometrics Using R
#cph is Cox Prop Hazard Model in this package #Buckley-James distribution-free least squares regression fit3=bj(Surv(tenure, censor)~outside+sales+age,data=oil) fit3 #print B-James results, fitp=psm(Surv(tenure, censor)~outside+sales+age,data=oil) fitp#print parametric survival model
#R7.4.Out1 OUTPUT of Cox Proportional Hazards Model # obtained by using the "cph" function of rms library Model Tests Discrimination Indexes Obs 80 LR chi2 23.29 R2 0.253 Events 72 d.f. 3 Dxy 0.524 Center -3.0625 Pr(> chi2) 0.0000 g 0.718 Score chi2 23.77 gr 2.050 Pr(> chi2) 0.0000 coef se(coef) z p-val outside 3.86e-02 6.79e-02 0.569 0.569462 sales 7.18e-05 2.94e-05 2.445 0.014505 age -6.14e-02 1.77e-02 -3.478 0.000505
The output of Buckley–James regression model by the “cph” is found in #R7.4.out1, and “bj” function is found in #R7.4.out2. Note that this adds an estimate of the intercept absent in “cph”. The slope for outside directors and size of the company measured by sales have become negative. Both have become statistically insignificant. Only the age of the CEO has a significant effect on CEO survival. When the intercept is absent as in the Cox model, the effect of age on survival is negative, suggesting that older CEOs are removed sooner. By contrast, the output from “bj” in #R7.4.out2 suggests a significant positive slope, suggesting the greater the age of the CEO, the greater the survival probability. The two Kaplan–Meier plots for young (line marked true) versus old CEOs in Fig. 7.4 support the estimated sign by “bj” over Cox’s “cph”. The output in #R7.4.out3 based on the Chi-square test supports a significant negative effect of the presence of outsiders on the CEO survival, again more consistent with “bj”.
page 364
March 11, 2022
13:56
Hands-on Intermediate Econometrics Using R. . .
9in x 6in
b4465-ch07
Limited Dependent Variable (GLM) Models
Fig. 7.4.
page 365
365
Young age seems hazardous for CEOs survival.
#R7.4.Out2 OUTPUT Buckley-James Regression Model by using the "bj" function Obs Events d.f. error d.f. sigma 80 72 3 68 0.7125 Value Std. Error Z Pr(>|Z|) Intercept -7.846e-01 7.215e-01 -1.0875 2.768e-01 outside -3.612e-02 5.586e-02 -0.6466 5.179e-01 sales -3.981e-05 2.033e-05 -1.9584 5.019e-02 age 5.820e-02 1.063e-02 5.4733 4.417e-08 #R7.4.4 CEO Survival with categorical classifications # the snippet #R7.4.3 must be in memory for following to work #define factor (categorical variable) for #outside directors below the data median of 5 such directors. fac.outside=factor(outside |t|) 0.7156 0.0000
Note: Residual standard error: 37.61 on 576 degrees of freedom. Adjusted R2 : 0.9989, F (1, 576): 5.376e+05, p-value: < 2.2e−16. Breusch–Godfrey LM test statistic: 79.6229, df = 1, p-value < 2.2e − 16 for the null hypothesis that order 1 serial correlation among regression errors is zero. Breusch–Pagan studentized test statistic (heteroscedasticity) is 2.1951, df = 1, p-value = 0.1385.
March 11, 2022
13:56
382
Hands-on Intermediate Econometrics Using R. . .
9in x 6in
b4465-ch08
Hands-on Intermediate Econometrics Using R
is misspecified anyway. Since one might regard rejection of NLHS on the basis of a test using the residuals of a misspecified model as hasty, the following subsection offers our updated estimates of Hall’s more sophisticated specification (8.3.4). 8.3.3.
Direct estimation of Hall’s NLHS specification
Vinod (B1988a) evaluated the log-likelihood function over 150 times using different starting values and computational strategies available in the Gauss software at the time. For the first stage, a quasiNewton type Broyden–Fletcher–Goldfarb–Shanno (BFGS) algorithm was used. During the later stages, the Berndt–Hall–Hall–Hausman (BHHH) algorithm was used. Vinod (B1988a) found that the parameter estimates were very close to α = 0 and γ = 1. He provided graphics showing that for values of α and γ away from the 0 and 1 values, respectively, the log-likelihood was lower. Vinod (B1988a) also tested for serial dependence and found that the autocorrelation of all orders was within the 95% band, so the null of serial independence was not rejected. Vinod’s parameter estimations were estimated after imposing two additional constraints: α > 0 and γ ∈ [0, 1]. These were implemented by estimating α0.5 instead of α and γ1 instead of γ = 1/[1 + exp(γ1 )]. I now have second thoughts about the second constraint: γ ∈ [0, 1]. The interpretation of γ > 1 is simply that the time preference rate is higher than the market interest rate, which could well happen! We do not impose the unnecessary second constraint in the current implementation. Our estimation of (8.3.5) to (8.3.8) using R is intended to be a template for implementing analogous problems. To implement the maximum likelihood estimation in R, we use the command “optim”, which in turn offers two methods (BFGS and Nelder–Mead). Similarly, the command “maxBHHH” of the package “micEcon” is used here. Both commands expect the user to specify the likelihood function (8.3.8) as a user-specified “function” in R. The reader can see how these functions are defined by studying the following snippet: #R8.3.3.1 Function to compute concentrated likelihood fn0 n=length(pccnd); constant=1; psi2=psi^(-1/alpha) alp2=-2*alpha-2; alp3=2*alpha+2 tby2=(n-1)/2 #t1 term 1 ct - psi-1/alph s1=rep(NA,n-1); s2=rep(NA,n-1) for(i in 2:n) s1[i-1]={((pccnd[i]-psi2*pccnd[i-1])^2)*(pccnd[i-1] ^alp2)} for(m in 2:n) {s2[m-1]=log(pccnd[m-1])^alp3} s3 |t|) Quarterly t
Estimate
Std. error
Dependent = Ct Intercept −5.465434 Ct−1 0.993522 0.007968 Yt−1
15.080794 0.007138 0.007027
−0.362 139.194 1.134
0.717 |t|)
−161.1326 0.9753 0.0403 0.6632 −0.6864
73.5966 0.0084 0.0124 0.4310 0.3126
−2.19 115.69 3.24 1.54 −2.20
0.0290 0.0000 0.0013 0.1244 0.0285
improvements, and greater variety becomes available over time. It is well known that the CPI does not fully reflect the quality improvements achieved over time. In Table 8.4, the negative coefficient consistent with the first tenet of demand theory actually appears on the time regressor, presumably due to such confounding between quality and price. Reductions in all kinds of costs (the price of borrowing is the interest rate) lead to a higher indirect utility by (iii) above. Our strategy is to conduct a thought experiment involving such price reductions. The negative effect of price on consumption will be artificially reduced in constructing our Ct∗ . Note that the specifics of confounding are not relevant, provided we make sure that utility associated with the target consumption is higher, that is, Ct∗ > Ct . Let us use the following derivation of target consumption level based on fitted coefficients of (8.5.4), except that we delete the time variable having a negative coefficient from the right-hand side. This makes the left side generally larger than Ct . Ct∗ = ηˆ0 + ηˆ1 Ct−1 + ηˆ2 Yt + ηˆ3 πt .
(8.5.5)
Now that we have constructed a plausible time series for target consumption, we are ready to estimate the equation representing the linear decision rule (8.5.2) for income determination after introducing the intercept term. This completes our estimation of our system of two equations. The estimates of b1 to b3 appearing in the habit equation are −20.13363, 0.01775, and 0.98362, respectively. The estimates of Q1 and Q2 appearing in the decision rules are −21.25511 and 21.76450, respectively (Table 8.5). The minimand of the Wiener–Hopf–Whittle theory has three terms: E(Ct − Ct∗ )2 , E(Yt − Y¯t )2 , and Y¯ . The first term is the variance of the gap between the target consumption and actual
March 11, 2022
398
13:56
Hands-on Intermediate Econometrics Using R. . .
9in x 6in
b4465-ch08
Hands-on Intermediate Econometrics Using R Table 8.5.
(Intercept) Ct−1 Ct∗
Estimation of the decision rule for income. Estimate
Std. error
t-Value
Pr(> |t|)
4832.5390 −21.2551 21.7645
50.8774 0.3083 0.3014
94.98 −68.94 72.22
0.0000 0.0000 0.0000
consumption or Vgap . For our data, this is estimated to be 19432.60. The variance of per capita income appearing in the second term is estimated to be VY = 28820968. The mean per capita income is 18392.12. The weights (1, μ1 , and μ2 ) involving Lagrangian coefficients shed some light on the importance given by the consumer to the three terms of the minimand. Vinod (P1996) shows that the first Lagrangian is given by μ1 = b2 (Q1 b2 + b3 )/Q2 b3 ,
(8.5.6)
estimated to be 0.0005. Similarly, the second Lagrangian is μ2 = −2Y¯ (C¯t∗ − C¯t−1 )/Q1 ,
(8.5.7)
estimated to be 21.83251. These suggest three weights (1, 0.0005, and 21.8) on the three terms. These can be made free from units of measurement by using partials with respect to logs of Vgap , VY , and Y¯ , suggesting the formula [Vgap , μ1 VY , 2μ2 Y¯ ].
(8.5.8)
Then the weights become (1, 0.7416, 41.27). Clearly, US consumers care a great deal about the level of income (last term) and less about the variance of income (second term). The failure to reach the target consumption level has a reference weight of unity. Since highly nonlinear functions (8.5.6) and (8.5.7) are involved in estimating the shadow prices, their finite sample estimates are subject to random variation. Since our estimates might be sensitive to the specific choice of Ct∗ , more research and empirical studies are needed before they can be taken seriously.
page 398
March 11, 2022
13:56
Hands-on Intermediate Econometrics Using R. . .
9in x 6in
b4465-ch08
Consumption and Demand: Kernel Regressions and Machine Learning
8.5.2.
page 399
399
Implications for various puzzles of consumer theory
Consumer theory contains various “puzzles” useful as pedagogical devices to encourage students to think more deeply about subtle aspects of consumer behavior. (1) Lowenstein and Thaler (1989, p. 192) note that “people care about changes in, as well as absolute levels of income”. This is considered a puzzle, because traditional LC/PIH theory does not recognize such behavior. By contrast, our minimand (8.5.1) contains a term for such changes. (2) An implication of LC/PIH theory, Flavin (1981), is that consumption should not respond to anticipated changes in income. On the other hand, Deaton (1986) shows that consumption is “too smooth” with reference to unanticipated changes in income. Since the habit equation (8.5.3) of our theory explicitly recognizes the response of consumption to income, excess smoothness or excess sensitivity are empirical questions — not puzzles. (3) The PIH theory implies, as asserted by Zeldes (1989, p. 277) and Caballero (1990), that consumption should decrease when δ > r (market interest is lower than the rate of time preference). Since it is a fact that during periods of low r consumption grows, this is a puzzle. Wiener–Hopf–Whittle theory can explain higher consumption during those periods by letting the target consumption Ct∗ reflect higher target values in response to the “income effect” of lower prices (interest charges on home mortgages and cars). (4) The fourth puzzle of traditional theory is that the effect of interest rates on consumption is “too small”, in the sense that rational consumers should respond more actively to changes in r, (Hall and Taylor 1986, Section 7.4). An analogous puzzle is that LC/PIH does not allow for the inability of liquidity-constrained consumers to borrow during recessions (Muelbauer, 1983). In our theory, the target consumption adjusts to incorporate the effect of such changes viewed as a kind of price change. After all, consumption of housing, automobiles, etc., is sensitive to r. It is a simple matter to recognize that the target values Ct∗ would be lower during recessions due to liquidity constraints. (5) The last puzzle of consumer theory is that during the business cycle boom periods, consumers work harder and consume more.
March 11, 2022
13:56
Hands-on Intermediate Econometrics Using R. . .
9in x 6in
b4465-ch08
Hands-on Intermediate Econometrics Using R
400
It is a puzzle because LC/PIH does not allow the time preference δ to depend on business cycles. Our explicit decision Eq. (8.5.2) encourages consumers to earn more when they can, in response to changing Ct∗ and Ct−1 . 8.6.
Consumption Demand System and Forecasting
While we have focused on aggregate consumption, we should not ignore its role in microeconomics. Econometric tools discussed in earlier chapters are widely used in forecasting product demands to enable multi-product firms to manage their inventories and make other strategic decisions touching almost all aspects of a business. A much-cited Baumol and Vinod (J1970) paper uses inventory theory to study optimal transport demand for various modes of transport by individual shippers and total demand for transportation services. 8.6.1.
Machine learning tools in R: Policy relevance
Demand forecasting in the 21st century has access to Big Data (meaning data on thousands of variables, having possibly millions of observations, and other complexities). New machine learning tools, readily available in R, are designed for these tasks allowing modern managers to forecast demands at time t and location i for very specific products. Big data models can predict the demand for a specific model number j manufactured by a specific company k, as well as demand for competing products. After all, a real-world consumer must choose from a complex list of choices. Mullainathan and Spiess (2017, Table 1) shows how machine learning tools can improve OLS forecasts of house prices at five quintiles. They use a large 2011 sample of 10,000 metropolitan units in the American Housing Survey. Exploratory data analysis of high-dimensional data is called “unsupervised learning”. The neural network is better at dealing with messy and unstructured data such as images, audio, and text. If the prediction rules for one variable are specified using a highdimensional vector of variables, it is called supervised learning. Support vector machine (SVM) is a supervised machine learning algorithm used for both classification and regression. The
page 400
March 11, 2022
13:56
Hands-on Intermediate Econometrics Using R. . .
9in x 6in
b4465-ch08
Consumption and Demand: Kernel Regressions and Machine Learning
page 401
401
The miscellaneous R package “e1071” by D. Meyer and others is useful here. See the R function e1071::svm() for examples and references. SVM represents each data item as a point in n-dimensional space, using its n feature values as coordinates. The SVM algorithm then finds the hyperplane that best differentiates the two classes. The SVM classifier finds a frontier that best separates those who buy a product from those who do not.

Section 6.1.5 of Chapter 6 mentions the “undersized sample problem” in the context of simultaneous equations, where the number of regressors exceeds the number of data points. It is also called the “wide data” problem in data science. In Big Data machine learning situations, undersized samples are very common. Vinod (2020s) states that a ridge-type “regularization” alone permits solutions to the wide-data regression $y = X\beta + \epsilon$ and to generalized linear models involving link functions. Hence, the ridge biasing parameter k is the lifeblood needed to obtain estimates in modern data science. Using the singular value decomposition, $X = U\Lambda^{1/2}G'$, the uncorrelated components c of the OLS estimator b are described in Chapter 1 of this book. Equation (1.10.11) states that the ridge estimator $b_k$ satisfies

$$b_k = G\Delta c, \qquad \Delta = \mathrm{diag}(\delta_i), \qquad c = \Lambda^{-1/2}U'y, \tag{8.6.1}$$

where $\lambda_i$ denotes the ordered eigenvalues of $X'X$ along the diagonal of $\Lambda$, and where the shrinkage factors $\delta_i = \lambda_i/(\lambda_i + k)$ are applied to the uncorrelated components c. Modern data science algorithms extend these ideas to wide-data problems with appropriate adjustments. For example, one keeps only the positive eigenvalues and chooses a large enough k. One needs to calculate the vector c only once for each matrix X of regressors, however wide or narrow. Each choice of the regularizer (the biasing parameter k in the ridge context) leads to a unique set of shrinkage factors $\delta_i$, and hence to unique coefficient estimates and fitted values. Now, leave-one-out (LOO) cross-validation allows quick computer iterations to obtain sequentially improved estimates of k. Of course, one can similarly select many (not just one) observations at random for placement in a so-called “training” subset of the data. One can then estimate the model using only the leave-many-out training set and find fitted values for all left-out (but known) y values in the “testing” dataset.
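As a hands-on illustration of (8.6.1), the following sketch (not one of the book's numbered snippets; the simulated data and the grid of k values are made up for illustration) computes ridge estimates from the SVD and runs a crude leave-one-out search over k.

set.seed(1)
n = 30; p = 4
X = matrix(rnorm(n * p), n, p)
y = X %*% c(1, -1, 0.5, 0) + rnorm(n)
sv = svd(X)                      # X = U diag(d) G', so lambda_i = d_i^2
U = sv$u; G = sv$v; d = sv$d
lam = d^2                        # ordered eigenvalues of X'X
cvec = (1 / d) * crossprod(U, y) # c = Lambda^{-1/2} U'y, computed only once
ridge.bk = function(k) {
  delta = lam / (lam + k)        # shrinkage factors delta_i
  G %*% (delta * cvec)           # b_k = G Delta c, Eq. (8.6.1)
}
ridge.bk(0)                      # k = 0 reproduces OLS
ridge.bk(2)                      # shrunken coefficients for k = 2
# crude leave-one-out (LOO) search over an illustrative grid of k values
loo.mse = sapply(c(0.1, 1, 2, 5, 10), function(k) {
  err = sapply(1:n, function(i) {
    svi = svd(X[-i, ]); ci = (1 / svi$d) * crossprod(svi$u, y[-i])
    bi = svi$v %*% ((svi$d^2 / (svi$d^2 + k)) * ci)
    y[i] - X[i, ] %*% bi         # out-of-sample error for the left-out point
  })
  mean(err^2)
})
loo.mse                          # pick the k with the smallest LOO MSE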
Such computer-intensive refitting, repeated many times, is designed to yield sequentially improved regularizers and hence improved model parameter estimates. The algorithm admits a new set of regularizers and parameters and tracks their performance as the mean squared forecast error (MSFE, say). The algorithm continues as long as the performance in the testing subset keeps improving. More sophisticated algorithms are designed to “learn” from their mistakes in the form of large MSFE values. The averaging in the computation of MSFE is over several random subsets. Since only the “training” subset is used for estimation, all MSFE performance measurements on the testing subsets are obviously out-of-sample. Modern data analysis applied to demand forecasting tries to obtain superior out-of-sample predictions while controlling the risk of in-sample overfitting.

Decision tree algorithms construct and analyze choices in the form of a tree diagram. The R package “party” (by Hothorn and others, for partitioning) has the function ctree(formula, data), whose argument “formula” states the relationship between variables. Nonparametric regression trees for nominal, ordinal, numeric, censored, and multivariate responses are also included in the “party” package. The “rpart” package (rpart = recursive partitioning) by Terry Therneau and others does classification and regression trees (or CART, by Breiman, Friedman, Olshen, and Stone). They are designed for both regression and classification tasks, where the dependent variable y is a usual numeric variable or a “factor”, respectively. The function rpart() “grows” a tree, while the prune(fit, cp=..) function removes overfitting by using Mallows’ Cp criterion. The Cp criterion computes unbiased estimates of risk (mean squared error) for model selection. The following code illustrates the prediction of car mileage from price, country, reliability, and car type.

library(rpart)   # provides rpart() and the cu.summary data
fit = rpart(Mileage ~ Price + Country + Reliability + Type,
    method = "anova", data = cu.summary)

Random forests from the R package “randomForest” construct a large number of bootstrapped trees. The trees have nodes, and the random forest belongs to the class of supervised learning algorithms. Random Forest randomly chooses the partition of each node and a subset of features (regressors), and constructs a forecast by averaging over all predictions. Sometimes it faces convergence problems.
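To make the random-forest discussion concrete, here is a minimal sketch (not one of the book's numbered snippets) that fits a random forest to the same cu.summary data used in the rpart() example above; the na.omit() step and the ntree value are illustrative choices.

library(rpart)          # cu.summary data ships with rpart
library(randomForest)
dat = na.omit(cu.summary[, c("Mileage", "Price", "Country", "Reliability", "Type")])
set.seed(42)
rf = randomForest(Mileage ~ Price + Country + Reliability + Type,
    data = dat, ntree = 500)
print(rf)                         # out-of-bag (OOB) error summary
head(predict(rf, newdata = dat))  # fitted mileage forecasts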
Table 8.6.  Confusion matrix.

                          Predicted
    Actual          False               True
    False           True negative       False positive
    True            False negative      True positive
The word “classification” in the machine learning context used by computer scientists refers to a discrete choice from high-dimensional options. What econometricians call “estimation and testing” is called “training and testing” in computer science. The confusion matrix is similar to the matrix of Type I and Type II errors in statistical inference. It counts how often the classification correctly predicts a consumer’s choice. The rows of the confusion matrix in Table 8.6 represent actual choices, while the columns represent predicted choices. It is often appropriate to evaluate the performance as the ratio of correct classifications to total attempts:

$$\text{Performance} = \frac{\text{TruNeg} + \text{TruPos}}{\text{TruNeg} + \text{TruPos} + \text{FalNeg} + \text{FalPos}}, \tag{8.6.2}$$
using self-explanatory abbreviations. The R package “caret” has a function confusionMatrix() for measuring the various components on the right-hand side of (8.6.2). Situations where (8.6.2) is inappropriate due to distinct incidence rates are described in the sequel. It is convenient to denote the elements of the 2 × 2 confusion matrix as [C11, C12] along the first row and [C21, C22] along the second row. Now Eq. (8.6.2) measures performance by (C11+C22)/GT, where GT is the grand total of all four cells in the denominator on the right-hand side of (8.6.2). Consider applying (8.6.2) to checking the accuracy of a disease test. Medical testing scientists, regulatory authorities, and others have long recognized a serious flaw in (8.6.2): it does not adjust for disease incidence. The flaw is best described with the example of a fraudulent medical test for a rare disease known to affect only one person in a thousand. The fraudulent test simply reports every patient as disease-free without actually bothering to test. The confusion matrix of the fraudulent test then has TN=C11=999, FP=C12=0, FN=C21=1, TP=C22=0.
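A quick base-R check of Eq. (8.6.2) for this fraudulent test (the cell counts below are the illustrative ones just described; caret::confusionMatrix() reports the same accuracy, plus further statistics, from factor-valued inputs):

# confusion matrix of the fraudulent "always negative" test
cmat = matrix(c(999, 0,    # actual False: TN=C11, FP=C12
                  1, 0),   # actual True : FN=C21, TP=C22
    nrow = 2, byrow = TRUE,
    dimnames = list(Actual = c("False", "True"),
                    Predicted = c("False", "True")))
GT = sum(cmat)                        # grand total of all four cells
(cmat[1, 1] + cmat[2, 2]) / GT        # Eq. (8.6.2): 0.999, a misleading 99.9%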
The right-hand side of (8.6.2) will then have the sum of the main diagonal (999+0) divided by the grand total of 1000. The fraudulent test is found to be 99.9% effective. If the regulators were to approve such an (always negative) test, it would be a disaster. How does one allow for the incidence of the disease? Scientists in this field have two jargon measurement terms called precision and recall, rather unintuitive names for econometricians. “Precision” P is defined as the ratio of true positives to all the positives predicted by the model: P=C22/(C12+C22) or P=TP/(FP+TP). The precision P is lower, the larger the number of false positives. Since the fraudulent test has zero true positives, its numerator is zero, and the test is rejected out of hand by this criterion. “Recall” R is defined as the ratio of true positives to all positives in reality, irrespective of the model predictions. R=C22/(C21+C22) or R=TP/(FN+TP) is high when the prediction model does not miss any positives. Again, since the fraudulent test has zero true positives, its numerator is zero, and the test is rejected out of hand by this criterion also. How does one combine the two numbers P and R into one index? Ideally, both P and R should be unity. In reality, there is a trade-off between P and R. We want an average of P and R that penalizes each for being different from unity. Such an average is not the arithmetic or geometric mean but the harmonic mean, which defines the F1 score:

$$F_1 = \frac{2PR}{P+R}. \tag{8.6.3}$$
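Continuing the base-R check above (same illustrative matrix cmat), precision, recall, and the F1 score all expose the fraudulent test:

TP = cmat[2, 2]; FP = cmat[1, 2]; FN = cmat[2, 1]
P  = TP / (FP + TP)         # precision; 0/0 gives NaN here, flagging the fraud
R  = TP / (FN + TP)         # recall; equals 0 here
F1 = 2 * P * R / (P + R)    # harmonic mean, Eq. (8.6.3)
c(precision = P, recall = R, F1 = F1)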
Another piece of jargon is the receiver operating characteristic (ROC) curve for classification trees. The R package “ROCR” implements it, depicting the quality of a machine learning classifier much like the accuracy of a medical test. Its axes both run from 0 to 1, so the curve is defined over the unit square, similar to the Lorenz curve of Section 4.2.1. Its horizontal axis shows the false positive rate (FPR), while its vertical axis shows the sensitivity or true positive rate (TPR). The worst possible ROC is a 45° line, and the best is Γ-shaped, with a vertical line at FPR=0 representing perfect sensitivity and zero false positives at all sensitivity levels. The ROC chart depicts the trade-off between TPR and FPR. One chooses the smallest FPR on the horizontal axis that achieves
the highest TPR on the vertical axis. Hence, the area under the ROC curve (AUC) is proposed as a better single-number summary of the trade-off between “sensitivity”, or C22/(C22+C21) (true positives to all actual positives), and “specificity”, or C11/(C11+C12) (true negatives to all actual negatives), in terms of the elements of the 2 × 2 confusion matrix Cij for medical tests and classifiers. The R package “pROC” contains a function auc() to compute the AUC. Note that the AUC equals unity when the ROC (defined over the unit square) is best, with shape Γ. With width and height both equal to unity, the area under a Γ-shaped ROC curve is also unity. Thus, the AUC measures how far the current classification tree is from the best. The reader can see ROC charts for the Boston housing data and a decision tree forecasting exercise on those data, showing its superior performance, at: https://homepages.uc.edu/lis6/Teaching/ML19Spring/Lab/lab8_tree.html

Another useful machine learning algorithm, k-nearest-neighbor (kNN), is for supervised learning. It classifies a new data point based on the Euclidean distance between its features and those of its neighboring data points. For example, using the GRE and GPA scores as “features” of a student, kNN classifies the student as admitted or not to a school. The “caret” package in R implements kNN with the functions trainControl() and train(.., metric = "ROC",..), which set up a grid of tuning parameters for a number of classification and regression routines, fit each model, and calculate a resampling-based performance metric. Besides the “ROC” shown above, the metric can be “Sens” for sensitivity or “Spec” for specificity, measures defined above. The ROC for kNN is somewhat different from the above-mentioned ROC for classification trees defined over the unit square. The horizontal axis of the kNN-ROC is k (the number of nearest neighbors), while the vertical axis remains the TPR. The “train” function helps choose the kNN model with the smallest k that achieves a high-enough TPR for the data at hand in terms of the ROC. A small k achieves parsimony. After choosing a suitable k and the corresponding model, one uses it for out-of-sample prediction as pr=predict(..,newdata=test), and further sends its output to confusionMatrix(pr). We can then measure classification accuracy by (8.6.2) when incidence rates are unchanged for the application at hand. The following url has worked-out examples, including a student
admission dataset with GPA and GRE scores: https://finnstats.com/index.php/2021/04/30/knn-algorithm-machine-learning/
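Returning to the AUC measure discussed above, a minimal self-contained sketch using the pROC package mentioned earlier (the toy class labels and classifier scores are made up for illustration):

library(pROC)
actual = c(0, 0, 1, 1, 0, 1, 1, 0, 1, 0)               # true classes
score  = c(.1, .3, .7, .8, .25, .6, .9, .4, .65, .2)   # classifier scores
r = roc(actual, score)    # builds the ROC curve
auc(r)                    # area under the curve; unity is best
plot(r)                   # ROC plot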
8.6.2. Almost ideal demand system
The “almost ideal demand system” (AIDS) was developed by Deaton and Muellbauer in the 1980s. A summary of the econometric tools is available in the Chern et al. (2003) UN FAO report on Japanese imports of food products. Both single-equation models and systems of many demand equations involve regressing $w_i$, the share of food i among the n food items, on prices, total expenditure x, and various dummy variables for household characteristics. Chern et al. (2003) provide useful hints for computing elasticities from estimated coefficients. Heckman’s two-step estimator described in Chapter 7, along with the “inverse Mills ratio”, is used in this literature. AIDS leads to several restrictions on the regression coefficients. The R package “micEconAids” by Henningsen comes with a vignette that describes estimation details, including the (i) adding-up, (ii) homogeneity, and (iii) symmetry properties needed to be consistent with economic theory. One uses panel data with (food) product i bought at time t, while denoting expenditure shares and prices by $(w_{it}, p_{it})$. The linear approximation (LA) to AIDS is given by

$$w_{it} = \alpha_i + \sum_j \gamma_{ij}\ln(p_{jt}) + \beta_i \ln(x_t/P_t) + u_{it}, \tag{8.6.4}$$

where the coefficients are $(\alpha, \beta, \gamma)$ and $x_t$ is the expenditure. Also, the price index $P_t$ has a log-linear relation to individual product prices:

$$\ln P_t = \sum_i w_{it}\ln p_{it}. \tag{8.6.5}$$
The package “micEconAids” includes tools for the estimation of LA-AIDS and its extensions, including bias-type technical problems. For example, the Stone index is sensitive to measurement units, the expenditure function may not increase monotonically in prices, and the Hessian matrix may not be negative semi-definite (denying a unique solution to utility maximization). The function aidsConsist() provides a comprehensive assessment of the consistency of the data with economic theory.
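A minimal estimation sketch follows; it assumes the Blanciforti86 US food-demand data shipped with “micEconAids” and its column names (pFood1–pFood4 for prices, wFood1–wFood4 for shares, xFood for total expenditure), so consult the package vignette and ?aidsEst if these differ.

library(micEconAids)
data("Blanciforti86")
priceNames = c("pFood1", "pFood2", "pFood3", "pFood4")
shareNames = c("wFood1", "wFood2", "wFood3", "wFood4")
# linearly approximated AIDS (LA-AIDS) with the Stone price index
laaids = aidsEst(priceNames, shareNames, "xFood",
    data = Blanciforti86, method = "LA")
summary(laaids)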
8.7. Consumers’ Surplus: Cost/Benefit Analysis of Taxes
Consumers’ surplus is defined as the difference between the maximum amount the consumer is willing to pay and what he actually pays. For example, consider a demand curve with the quantity demanded (gallons per person) on the horizontal axis and the price of water per month on the vertical axis. If the consumer actually pays $10 for 30 gallons, that combination represents one (market equilibrium) point on the demand curve. Draw a horizontal line from the market equilibrium price to the vertical axis. Now consumers’ surplus is the area above this horizontal line and below the demand curve, representing the surplus satisfaction enjoyed without paying for it. The area computation assumes that the marginal utility (MU) of income is constant. See Schaffa (2018) for detailed graphics and explanations. He also argues that the computation of a change in consumers’ surplus, being the difference between two areas, is actually simpler since it is not path-dependent. Hausman and Newey (1995) study the effect of a gasoline tax by defining the “exact” consumers’ surplus for a reference utility $u^r$ as

$$CS(p^1, p^0, u^r) = e(p^1, u^r) - e(p^0, u^r), \tag{8.7.1}$$
where e(.) denotes expenditure functions, $p^1$ the price after the gasoline tax, and $p^0$ the initial price. Focusing on the reference utility with r = 1, they avoid assuming a constant MU of income by considering the equivalent variation

$$EV(p^1, p^0, y) = e(p^1, u^1) - e(p^0, u^1) = y - e(p^0, u^1), \tag{8.7.2}$$
where income y is assumed fixed over the price change. Tax increases reduce consumers’ surplus, except that increased tax revenues may be returned to the consumer in a lump-sum manner. The resulting dead-weight loss is defined by

$$DWL(p^1, p^0, y) = EV - (p^1 - p^0)'\, h(p^1, u^1) \tag{8.7.3}$$
$$\phantom{DWL(p^1, p^0, y)} = y - e(p^0, u^1) - (p^1 - p^0)'\, q(p^1, y), \tag{8.7.4}$$
where $(p^1 - p^0)$ is the vector of price changes due to taxes, h(p, u) is the Hicksian compensated demand curve, and q(p, y) is the Marshallian demand curve. Applying Shephard’s lemma, the implicit function theorem, and Roy’s identity, they write a differential equation in the
equivalent variation for a price change. Hausman and Newey (1995) then estimate the following log-linear demand regression in the logs of gasoline quantity q and price p:

$$\ln q = \eta_0 + \eta_1 \ln p + \eta_2 \ln y + w'\beta + \epsilon, \tag{8.7.5}$$
where w has region and time dummies as covariates and the estimated η are elasticities. They also use cross-validation to estimate the nonparametric regressions described in Section 8.4, without providing software details. Their estimates of EV and DWL, defined in (8.7.2) and (8.7.4), by parametric and non-parametric methods are found to be close.

We need user-friendly R tools for computing consumers’ surplus and dead-weight losses. The following snippet computes the exact area representing the reduction in consumers’ surplus (CS) when the price increases from p1 to p2. We add the area of an appropriate rectangle to the area of a triangle.

q = seq(from = 1, to = 10, by = .1)
p = 50 - 3 * q                     # linear demand curve
plot(q, p, type = "l", main = "Demand Curve p=50-3q")
p1 = 29; p2 = 35; q1 = 7; q2 = 5   # two points on the demand curve
lines(x = c(0, q1), y = c(p1, p1)) # horizontal line at the lower price
lines(x = c(0, q2), y = c(p2, p2)) # horizontal line at the higher price
pdiff = abs(p2 - p1)
qmin = min(q1, q2)
qmax = max(q1, q2)
lines(x = c(qmin, qmin), y = c(p1, p2))
tribase = qmax - qmin              # base of the triangle
CS = (qmin * pdiff) + 0.5 * (pdiff * tribase); CS

The computation CS=36 in Fig. 8.5 is exact. However, it ignores the “income effect” (a price rise indirectly reduces overall income) and the “substitution effect” (when one price rises, consumers switch to buying a closely related product). If the price rise is due to government taxes, a certain proportion of the tax revenue benefits the consumer. Computation of the net CS after all these adjustments is challenging.
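The same area can be cross-checked by integrating the demand curve between the two prices; inverting p = 50 − 3q gives q(p) = (50 − p)/3, so a one-line check (not part of the original snippet) is:

qdem = function(p) (50 - p) / 3          # demand as a function of price
integrate(qdem, lower = 29, upper = 35)  # about 36, matching CS above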
Fig. 8.5.  Computation of consumers’ surplus from demand curve.
[Figure: plot titled “Demand Curve p=50−3q”, with quantity q on the horizontal axis and price p on the vertical axis.]
Ryan (2012) criticizes typical engineering estimates of the cost of environmental regulation in terms of compliance costs, because they ignore the adverse longer-term effects of the regulation on the market power of entrenched entities. He finds that the welfare costs of the 1990 Amendments to the Clean Air Act, passed on to consumers of the US Portland cement industry via higher prices, were high. Thus, in addition to price, income, substitution, and tax-revenue sharing, long-run effects on market structure (monopoly power) also matter.

8.8. Final Remarks on Modeling Consumer Behavior
We begin by noting that a stylized fact attributed to Kuznets is no longer valid. Our aim here is to update and review Hall’s (1978) AR(1) random walk model for consumption to see if it can stand the
test of time after using more detailed and up-to-date monthly data. We begin with a review of the Euler condition for the optimum path and show that it gives the basic result of Hall’s model. His Corollary 5 states that the underlying AR(1)-type regression for consumption has heteroscedastic errors. Our tests using data for non-durable consumption goods do not find statistically significant heteroscedasticity. A maximum likelihood estimate of γ = (1 + δ)/(1 + r) for a nonlinear heteroscedastic model is found to exceed unity, suggesting that the time preference is generally greater than the market interest rate, δ > r, and that α is close to zero. Although the latter estimate is similar to Hall’s, a subsection of this chapter is devoted to explaining why several assumptions in his theory are unrealistic. We use non-parametric kernel estimation with plausible results for unanticipated consumption. We find that the “amorphous partial derivatives” as slopes are neither nearly constant nor close to unity. Using a VAR model, we consider Granger-causality tests and find that the traditional assumption of income preceding consumption might not be entirely valid. If the converse is valid, we need an alternative model for consumer behavior, where the decision equation is for income. Wiener–Hopf–Whittle methods are used to estimate our alternative model, and our numerical results provide interesting estimates of the shadow prices of the terms in the alternative objective function. It is not surprising that a two-equation model of consumer theory works better than the single AR(1) equation of Hall’s model. We demonstrate that our alternative model has the advantage that it is subject to fewer puzzles than the traditional LC/PIH theory. Decision analysis, random forests, neural networks, and many other machine learning tools in R are increasingly being applied to consumption demand forecasting and evaluation by confusion matrices. These methods do not rely much on economic theory. By contrast, AIDS has an intimate connection with economic theory. Henningsen’s R package “micEconAids” estimates AIDS, including its linear approximations, and checks whether the data agree with economic theory. We mention the need for easy computation of consumers’ surplus and dead-weight loss using R. An important purpose of this discussion is to engage the reader in interesting problems and to provide potentially useful R software templates for various sophisticated tasks. We hope that our discussion of various puzzles and extensions of consumer theory and a rich set of new and old software
tools inspire new research insights. A better assessment of the costs and benefits of policy changes, using R tools for quantifying consumers’ surplus, deserves further attention.

8.9. Appendix: Additional Macroeconomic VARs
Since the data used in the first edition of the book is no longer directly available from the earlier sources, it is posted to my website so that readers continue to have hands-on experience. The descriptions of the eleven monthly series are given earlier, before snippet #R8.3.1.1. The appendix snippet #R8.A.1 uses the R package “vars” to fit the VAR model to the monthly and quarterly macroeconomic data. The VAR estimation with Granger causality and cross-correlation estimation is next.

#R8.A.1
# download data as in #R8.3.1.1
head(cbind(percapc, percapi))
# get ready for the Granger causality study
library(vars)
myy = data.frame(cbind(percapc, percapi))
var.1c