Springer Texts in Business and Economics
Burkhard Heer · Alfred Maußner

Dynamic General Equilibrium Modeling
Computational Methods and Applications
Third Edition
Springer Texts in Business and Economics
Springer Texts in Business and Economics (STBE) delivers high-quality instructional content for undergraduates and graduates in all areas of Business/Management Science and Economics. The series comprises self-contained books with broad and comprehensive coverage that are suitable for class use as well as for individual self-study. All texts are authored by established experts in their fields and offer a solid methodological background, often accompanied by problems and exercises.
Burkhard Heer, University of Augsburg, Augsburg, Germany
Alfred Maußner, University of Augsburg, Augsburg, Germany
ISSN 2192-4333    ISSN 2192-4341 (electronic)
Springer Texts in Business and Economics
ISBN 978-3-031-51680-1    ISBN 978-3-031-51681-8 (eBook)
https://doi.org/10.1007/978-3-031-51681-8

1st edition: © Springer-Verlag Berlin Heidelberg 2005
2nd edition: © Springer-Verlag GmbH Germany, part of Springer Nature 2009
3rd edition: © The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG 2024

This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed.

The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.

The publisher, the authors, and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This Springer imprint is published by the registered company Springer Nature Switzerland AG. The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland.

Paper in this product is recyclable.
For our families:
Barbara, Joe, and Carla
Christa, Florian, and Johannes
Preface to the Third Edition

Since we wrote the second edition of our book in 2008, we have observed three major trends in the computation of dynamic general equilibrium (DGE) models:

1) Graduate and advanced undergraduate students increasingly rely on easy-to-use software tools that allow them to compute standard representative-agent business cycle models with minimal knowledge of the mathematics of stochastic difference equations. Typical software packages include Dynare and our toolbox CoRRAM, where the former is available as MATLAB code and the latter as both MATLAB and GAUSS code. Both packages draw on local perturbation methods. The reorganization of our book reflects this development and streamlines the exposition so that the reader has the choice either to familiarize him- or herself with the underlying mathematical concepts or to skip to more advanced material.

2) Some important economic problems that have received more attention during the recent major crises (the Great Recession of 2007-2009, the COVID-19 pandemic, and the energy and inflation crisis of 2022) have shifted the focus of attention away from local approximation methods toward global methods. A long-lasting disruption of supply chains, the phenomenon of deglobalization, or a jump in energy prices of more than 100 percent casts doubt on the assumption that we consider an economy close to its long-run equilibrium, the steady state. Instead, it becomes necessary to study the dynamics of the model far from the steady state by applying global methods. Among others, we demonstrate these global methods with an example from the stock market known as rare disaster theory, which tries to explain the equity premium puzzle. According to this theory, the large excess returns of stocks vis-à-vis bonds (averaging approximately 6 percentage points over the postwar period) can be explained with the help of large disruptive events, e.g., an extreme productivity shock caused by a disruption of supply chains. We include a detailed analysis of the rare disaster model in Chapter 5 on ‘Weighted Residuals Methods’ and expand our description of the underlying theory. Unlike local perturbation methods, global methods are difficult to provide as a black box to the user; they require the individual to write his or her own code and to acquire a deeper comprehension of numerical mathematics. As a consequence, we have also extended Part III on ‘Tools’ and provide more detailed descriptions of the various numerical concepts.
3) Recent developments in computer technology have been characterized by two major events: i) the increase in speed and ii) the advance of Artificial Intelligence (AI).

i) In 1975, Gordon Moore revised his prediction from 1965 that the number of transistors in an integrated circuit (IC) doubles every year and increased the necessary time period to two years. This empirical relationship is commonly known as Moore’s law. In the recent decade, however, the gains in semiconductor advancement and computer speed have slowed down. As of today, we are still faced with the constraint of computational capacity and the trade-off between speed and accuracy in many applications, in particular those on heterogeneous agent economies studied in Part II of our book. In other words, the curse of dimensionality still weighs heavily on us and forces us to apply time-saving and efficient computational methods for deriving solutions of models with endogenous distributions of state variables. Often, computing the distribution of individual continuous state variables in more than two or three dimensions is beyond reasonable time limits with present-day computer technology. In this edition, we pay particular attention to speed in the computation of heterogeneous agent economies. For example, we compare the speeds of the computer languages MATLAB and GAUSS in the computation of the large-scale overlapping generations (OLG) model with those of other computer languages, such as Fortran, Python, or Julia, and comment on the use of these various languages. While most of our computer programs are provided as MATLAB and GAUSS code, we also provide many applications in the form of Fortran, Python, or Julia code.¹ The download page of our book is available at:
https://www.uni-augsburg.de/de/fakultaet/wiwi/prof/vwl/heer/dgebook/

As before, the reader does not need to download any program code from other websites in order to replicate our findings, for example, on the statistics and characteristics of business cycle models or the dynamics of the distribution function in heterogeneous agent economies.
¹ For the three computer languages Python, Julia, and Fortran, we would like to point out two additional useful websites for computational economists in the field of DGE modeling. Thomas J. Sargent and John Stachurski provide Python and Julia code in their internet lab ‘QuantEcon’ on their website https://quantecon.org/lectures/, while Hans Fehr and Fabian Kindermann provide the Fortran source code for their textbook on the website www.ce-fortran.com.
The choice of the right computer language for one’s own purposes is one of the most important questions for the reader who would like to specialize in the field of computational economics. There are several decision criteria: a) freeware versus commercial software, b) ease of application versus execution speed, and c) availability of sample code. For instance, MATLAB and Python are the programming languages for which the researcher finds the most sample code.² Fortran, a compiled language tailored to number crunching for which open-source compilers are available, has a steep learning curve but is particularly fast in simulation-based applications, quite contrary to MATLAB. Julia provides a freeware alternative to Python and is particularly useful for economists. It is slightly more difficult to apply but, on the other hand, we find it to be considerably faster than Python. In the case of commercial software, we prefer GAUSS to MATLAB in terms of speed and user friendliness. In sum, the choice of the right programming language is difficult and depends on the student’s mathematical and coding skills on the one hand and on the kind of problems that he or she would like to tackle on the other. For a student who wants to run applications of the standard real business cycle model, MATLAB might be sufficient. For the more aspiring researcher, the speed of the software and the availability of auxiliary code might be more important.

ii) The digital revolution is accelerating and affects jobs, the income distribution, the aggregate economy, and, of course, economic theory. It is difficult to judge whether AI will soon replace computational economists completely. There are several arguments brought forward by Maliar et al. (2019) that this is unlikely to happen in the near future: “Thus, the idea that there is an all-purpose procedure, which will remove the curse of dimensionality without any trade-offs and will produce a highly precise and uniformly accurate solution for any kind of models is beyond current capabilities.”
We concur with these authors in their assessment. Even if the opportunities to handle and simulate large DGE models with the help of AI gain significant momentum, the economic problems still need to be set up in a way that makes them accessible to the deep learning algorithms of AI. For this reason, we are still optimistic that the study of computational methods is a worthwhile subject for the young researcher who decides upon his or her field of specialization. AI will support, but not replace, the computational economist.
² See, e.g., the code pages of the ‘Review of Economic Dynamics’ or the software collection at IDEAS under https://ideas.repec.org/i/c.html.
In addition to the recent advances in the field of computational economics, the three general developments (1)-(3) resulted in a major overhaul of our book. Furthermore, this edition contains some 300 pages of new text. Among other subjects, we included the following new material:

• New chapters and sections on numerical methods:
  – Chapter 3 on higher-order perturbation methods
  – Chapter 6 on simulation-based methods
  – Section 8.3 on Gorman preferences and aggregation
  – Section 10.1.3 on the problem of finding an initial value for large-scale OLG models
  – Section 11.1.3 on multi-dimensional state spaces in large-scale OLG models
  – Section 13.9 on Smolyak’s algorithm

• New applications:
  – Rare disaster theory (Chapter 5)
  – A New Keynesian model in the spirit of Christiano et al. (2005) and Smets and Wouters (2007) for the study of monetary policy (Chapter 4)
  – An overlapping generations model with endogenous pensions (Chapter 11)
ORGANIZATION OF THE BOOK. The book is aimed at graduate students and advanced undergraduate students. It is self-contained, and the student should be able to follow its analysis without any additional material or prior knowledge of dynamic macroeconomic theory. The only prerequisite for understanding the material is a solid background in mathematical methods, including analysis and linear algebra. In addition, if the student would like to use the computer code and adapt it for her or his purposes, a basic knowledge of computer programming is needed. We provide code for selected problems in various programming languages, including MATLAB, GAUSS, Fortran, Python, and Julia.

The book consists of three parts. Part I studies methods to compute representative agent economies, Part II looks at heterogeneous agent economies, and Part III collects numerical and other mathematical tools.
In the first chapter, we introduce the benchmark model, the stochastic Ramsey model with endogenous labor, calibrate it, and provide an overview of possible solution methods. Chapters 2-4 study local solution methods, while Chapters 5-7 analyze global methods. We illustrate these methods with various applications, where information from either the stationary equilibrium only (local methods) or the entire state space (global methods) is used for the solution of the model, and compare them to each other with respect to their accuracy, speed, and ease of implementation. Chapter 2 provides an introduction to perturbation methods and the necessary tools, while Chapter 3 lays out the (somewhat cumbersome) details of first-, second-, and third-order solutions. Readers who are only interested in applying the methods can skip Chapter 3 without losing the thread in subsequent chapters. The perturbation methods are applied in Chapter 4 to study both the benchmark real business cycle model and the New Keynesian model, where the latter is the workhorse of modern central bank policy analysis. Chapter 5 introduces the first global method in Part I and describes weighted residuals methods with applications to the benchmark model, a search-and-matching model, and the disaster-risk model. Chapter 6 considers simulation-based methods. Different from the weighted residuals methods of Chapter 5, these methods do not choose pre-defined sets of points in the model’s state space to approximate the policy functions. This chapter provides a loose collection of methods, e.g., the extended path method and genetic search. Applications include an open-economy model and the limited-participation model of Christiano et al. (1997). After the study of Part I, the reader should be able to choose among the different methods the one that best suits the computation of his or her particular business cycle model. In addition, he or she should be prepared to apply the value function iteration (VI) method introduced in Chapter 7 to the heterogeneous agent models of Part II.

The second part of the book applies numerical methods to the computation of heterogeneous agent economies. In particular, we consider the heterogeneous agent extension of the stochastic growth model in Chapters 8 and 9 on the one hand and the overlapping generations model in Chapters 10 and 11 on the other. Chapters 8 and 10 are devoted to the solution of the stationary equilibrium, while Chapters 9 and 11 consider the dynamics and aggregate uncertainty.

Part III of the book constitutes a detailed description of numerical tools. Chapters 12-16 cover the necessary prerequisites for the first two parts from the fields of linear algebra, function approximation, differentiation
and integration, nonlinear equations, numerical optimization, and stochastic processes.

We appreciate that this book cannot easily be covered in one semester, but one can conveniently choose Part I or Part II as a one-semester course or combine Parts I and II. For example, a course on computational methods in business cycle theory may choose Chapters 1 through 5. Graduate students with prior knowledge of numerical analysis may use Chapters 8 through 11 as an introduction to the computation of heterogeneous agent economies and the theory of income distribution. Alternatively, one may concentrate on Chapters 1-2, 4, 7, 8, and 9 to understand in detail the heterogeneous agent Aiyagari-Bewley economy proposed by Bewley (1986) and developed further in Aiyagari (1994) and Aiyagari (1995). A one-semester course in computational public finance that is aimed at the computation of Auerbach-Kotlikoff models and their extensions by Huggett (1996) and Ríos-Rull (1996) to include individual and aggregate uncertainty can be based on Chapters 1-2, 4, 10, and 11.

RELATED TEXTBOOK LITERATURE. The presentation in our book is self-contained, and reading it is possible without consulting other material. The field of computational economics, however, is vast, and we do not pretend to survey it. There are several other graduate textbooks that are complementary to ours. Judd (1998) gives a comprehensive survey of computational economics and remains the standard reference. Miranda and Fackler (2002) have written a book that, like ours, is more directed towards the illustration of examples and algorithms; however, their focus is more on continuous time models. Stachurski (2022) provides a treatment of various numerical tools where the focus, however, is on techniques rather than DGE models. For those readers who are absolute beginners in computing, the textbook of Heer (2019) provides an introduction to both MATLAB and GAUSS programming, starting from a ten-line code and proceeding to more advanced material with a focus on OLG models. Fehr and Kindermann (2018) provide an introduction to computational economics using Fortran with an extensive treatment of dynamic programming and OLG models.

In our book, we also do not cover the calibration and estimation of dynamic stochastic general equilibrium (DSGE) models with the help of econometric techniques, such as maximum likelihood and the method of moments. The textbooks of Canova (2007) and DeJong and Dave (2011) are excellent references for the study of these empirical methods. Herbst and Schorfheide (2016) present Bayesian methods for
the estimation of DSGE models. The textbook by Ljungqvist and Sargent (2018) on recursive macroeconomic theory and the monograph by Stokey et al. (1989) on recursive methods may serve as helpful references for the economic theory and mathematical background applied in this book. Acemoğlu (2009) is a most useful advanced textbook on macroeconomic theory with illuminating chapters on dynamic programming and both deterministic and stochastic growth.
Acknowledgements

We would like to thank many people for their advice, support, and encouragement towards our book project. The first edition of the book was written during 2000-2004; we then revised the book during 2006-2008. Since 2018 we have revised and improved the book further and now present the third edition. We are grateful to our many students in graduate classes in computational economics that were taught at the universities of Augsburg, Bamberg, Bolzano, Innsbruck, Leipzig, Munich, and Luxembourg and the Deutsche Bundesbank (German Central Bank). We received useful comments from Selahattin İmrohoroğlu, Andreas Irmen, Ken Judd, Dirk Krueger, Paul McNelis, Vito Polito, Michael Reiter, José-Víctor Ríos-Rull, Bernd Süssmuth, and Mark Trede. Part of this book was written during Burkhard's stays at Georgetown University, Stanford University, Fordham University, the University of Luxembourg, and the Federal Reserve Bank of St. Louis, and we would like to thank Jim Bullard, Mark Huggett, Andreas Irmen, Ken Judd, Paul McNelis, Christopher Waller, and Christian Zimmermann for their hospitality. The views expressed in this book are ours and do not necessarily reflect the official positions of the Federal Reserve Bank of St. Louis, the Federal Reserve System, or the Board of Governors. Burkhard Heer also kindly acknowledges support from the German Science Foundation (Deutsche Forschungsgemeinschaft, DFG) during his stay at Georgetown University and Stanford University. For particular assistance in the preparation of the text, including critical comments on several drafts and helpful suggestions, we would like to thank Jürgen Antony, André de Beisac, Roman Bothmann, Monika Bredow, Hans-Helmut Bünning, Anja Erdl, Daniel Fehrle, Sabine Gunst, Christopher Heiberger, Michael Holsteuer, Johannes Huber, Nikolai Hristov, Torben Klarl, Vasilij Konysev, Jana Kremer, Dominik Menno, Christian Scharrer, and Sotir Trambev.
Contents
Part I  Representative Agent Models

1  Basic Models  3
   1.1  Introduction  3
   1.2  The Deterministic Finite Horizon Ramsey Model  4
      1.2.1  The Ramsey Problem  4
      1.2.2  The Karush-Kuhn-Tucker Theorem  7
   1.3  The Deterministic Infinite Horizon Ramsey Model  9
      1.3.1  Recursive Utility  10
      1.3.2  Euler Equations  11
      1.3.3  Dynamic Programming  13
      1.3.4  The Saddle Path  17
      1.3.5  Models with Analytical Solution  21
   1.4  The Stochastic Ramsey Model  25
      1.4.1  Stochastic Output  25
      1.4.2  Stochastic Euler Equations  27
      1.4.3  Stochastic Dynamic Programming  28
   1.5  Labor Supply, Growth, and the Decentralized Economy  31
      1.5.1  Substitution of Leisure  31
      1.5.2  Growth and Restrictions on Technology and Preferences  32
      1.5.3  Parameterizations of Utility and Important Elasticities  39
      1.5.4  The Decentralized Economy  44
   1.6  Model Calibration and Evaluation  47
      1.6.1  The Benchmark Business Cycle Model  47
      1.6.2  Calibration  51
      1.6.3  Model Evaluation  56
   1.7  Numerical Solution Methods  63
      1.7.1  Overview  63
      1.7.2  Accuracy of Solutions  66
   Appendices  68
      A.1  Solution to Example 1.3.1  68
      A.2  Restrictions on Technology and Preferences  70
   Problems  75
2  Perturbation Methods: Framework and Tools  79
   2.1  Introduction  79
   2.2  Order of Approximation  80
   2.3  Tools  81
      2.3.1  A Brief List  81
      2.3.2  Application to the Deterministic Ramsey Model  82
   2.4  The Stochastic Linear-Quadratic Model  88
      2.4.1  The Model  89
      2.4.2  Policy Functions  89
      2.4.3  Certainty Equivalence  91
   2.5  A Canonical DSGE Model  92
      2.5.1  Example  92
      2.5.2  Generalization  93
   2.6  More Tools and First Results  96
      2.6.1  Computer Algebra versus Paper and Pencil  96
      2.6.2  Derivatives of Composite Functions and Tensor Notation  98
      2.6.3  Derivatives of Composite Functions and Matrix Chain Rules  103
      2.6.4  Computation of Partial Derivatives  106
   Appendices  111
      A.3  Solution of the Stochastic LQ Problem  111
      A.4  Third-Order Effects  113
   Problems  115
3  Perturbation Methods: Solutions  119
   3.1  Introduction  119
   3.2  First-Order Solution  120
      3.2.1  First-Order Policy Functions  120
      3.2.2  BA Model  120
      3.2.3  System Reduction  123
      3.2.4  Digression: Solving Separately for the Deterministic and Stochastic Components  126
   3.3  Second-Order Solution  131
      3.3.1  Second-Order Policy Functions  131
      3.3.2  Coefficients of the State Variables  132
      3.3.3  Coefficients of the Perturbation Parameter  135
   3.4  Third-Order Solution  139
      3.4.1  Third-Order Policy Functions  139
      3.4.2  Coefficients of the State Variables  139
      3.4.3  Coefficients of the State-Dependent Uncertainty  142
      3.4.4  Coefficients of the Perturbation Parameter  143
   3.5  Implementation  144
   Appendices  146
      A.5  Coefficients of the State-Dependent Uncertainty  146
      A.6  Coefficients of the Perturbation Parameter  152
4  Perturbation Methods: Model Evaluation and Applications  155
   4.1  Introduction  155
   4.2  Second Moments  155
      4.2.1  Analytic Second Moments: Time Domain  156
      4.2.2  Digression: Unconditional Means  158
      4.2.3  Analytical Second Moments: Frequency Domain  159
      4.2.4  Second Moments: Monte-Carlo Approach  165
   4.3  Impulse Responses  174
   4.4  The Benchmark Business Cycle Model  176
   4.5  Time-to-Build Model  181
   4.6  A New Keynesian Model  186
      4.6.1  The Monopolistically Competitive Economy  187
      4.6.2  Price Staggering  199
      4.6.3  Wage Staggering  202
      4.6.4  Nominal Frictions and Interest Rate Shocks  204
      4.6.5  Habits and Adjustment Costs  206
   Appendices  217
      A.7  Derivation of the Demand Function  217
      A.8  Price Phillips Curve  219
      A.9  Wage Phillips Curve  223
   Problems  225
5  Weighted Residuals Methods  231
   5.1  Introduction  231
   5.2  Analytical Framework  232
      5.2.1  Motivation  232
      5.2.2  Residual, Test, and Weight Function  236
      5.2.3  Common Test Functions  239
      5.2.4  Spectral and Finite Element Functions  241
      5.2.5  Illustration  241
      5.2.6  General Procedure  243
   5.3  Implementation  244
      5.3.1  State Space  244
      5.3.2  Basis Functions  246
      5.3.3  Residual Function  248
      5.3.4  Projection and Solution  251
      5.3.5  Accuracy  255
   5.4  The Deterministic Growth Model  256
   5.5  The Benchmark Business Cycle Model  260
   5.6  The Benchmark Search and Matching Model  264
      5.6.1  Motivation  264
      5.6.2  The Model  265
      5.6.3  Galerkin Solution  270
      5.6.4  Results  272
   5.7  Disaster Risk Models  275
      5.7.1  Motivation  275
      5.7.2  The Benchmark Business Cycle Model with Disaster Risk  276
      5.7.3  Generalized Expected Utility  285
      5.7.4  Adjustment Costs of Capital  291
      5.7.5  Variable Disaster Size and Conditional Disaster Probability  294
      5.7.6  The Full Model  299
   Problems  305
6  Simulation-Based Methods  311
   6.1  Introduction  311
   6.2  Extended Path Method  312
      6.2.1  Motivation  312
      6.2.2  The General Algorithm  319
      6.2.3  Application: The Benchmark Business Cycle Model  321
      6.2.4  Application: The Model of a Small Open Economy  323
      6.2.5  Conclusion  335
   6.3  Simulation and Function Approximation  336
      6.3.1  Motivation  336
      6.3.2  The General Algorithm  341
      6.3.3  Application: The Benchmark Business Cycle Model  345
      6.3.4  Application: The Limited Participation Model of Money  349
      6.3.5  Conclusion  360
   Problems  362
7  Discrete State Space Value Function Iteration  369
   7.1  Introduction  369
   7.2  Solution of Deterministic Models  371
   7.3  Solution of Stochastic Models  383
      7.3.1  Framework  383
      7.3.2  Approximations of Conditional Expectations  383
      7.3.3  Basic Algorithm  385
      7.3.4  Initialization  386
      7.3.5  Interpolation  387
      7.3.6  Acceleration  388
      7.3.7  Value Function Iteration and Linear Programming  391
      7.3.8  Evaluation  393
   7.4  Further Applications  397
      7.4.1  Nonnegative Investment  398
      7.4.2  The Benchmark Model  401
   Problems  405
Part II  Heterogeneous Agent Models
8  Computation of Stationary Distributions  413
   8.1  Introduction  413
   8.2  Easy Aggregation and Gorman Preferences  415
      8.2.1  A Numerical Example  415
      8.2.2  Gorman Preferences  425
   8.3  A Simple Heterogeneous Agent Model with Aggregate Certainty  429
   8.4  The Stationary Equilibrium of a Heterogeneous Agent Economy  436
      8.4.1  Discretization of the Distribution Function  442
      8.4.2  Discretization of the Density Function  449
      8.4.3  Monte-Carlo Simulation  451
      8.4.4  Function Approximation  453
   8.5  The Risk-Free Rate  456
      8.5.1  The Exchange Economy  457
      8.5.2  Computation  459
      8.5.3  Results  462
   8.6  Heterogeneous Productivity and Income Distribution  463
      8.6.1  Empirical Facts on the Income and Wealth Distribution and Income Dynamics  464
      8.6.2  The Model  470
      8.6.3  Computation  477
      8.6.4  Results  479
   Problems  481
9  Dynamics of the Distribution Function  485
   9.1  Introduction  485
   9.2  Motivation  486
   9.3  Transition Dynamics  489
      9.3.1  Partial Information  490
      9.3.2  Guessing a Finite Time Path for the Factor Prices  500
   9.4  Aggregate Uncertainty: The Krusell-Smith Algorithm  505
      9.4.1  The Economy  505
      9.4.2  Computation  508
      9.4.3  Calibration and Numerical Results  513
   9.5  Applications  517
      9.5.1  Costs of Business Cycles with Indivisibilities and Liquidity Constraints  518
      9.5.2  Income Distribution and the Business Cycle  526
   Problems  541
10  Overlapping Generations Models with Perfect Foresight  543
   10.1  Introduction  543
   10.2  The Steady State in OLG Models  545
      10.2.1  An Elementary Model  545
      10.2.2  Computational Methods  550
      10.2.3  Direct Computation  552
      10.2.4  Computation of the Policy Functions  556
   10.3  The Laffer Curve  561
   10.4  The Transition Path  579
      10.4.1  A Stylized 6-Period OLG Model  580
      10.4.2  Computation of the Transition Path  581
   10.5  The Demographic Transition  589
      10.5.1  The Model  590
      10.5.2  Calibration  597
      10.5.3  Computation  598
      10.5.4  Results  609
   10.6  Conclusion  614
   Appendices  618
      A.10  Derivation of Aggregate Bequests in (10.29)  618
   Problems  620

11  OLG Models with Uncertainty  627
   11.1  Introduction  627
   11.2  Overlapping Generations Models with Individual Uncertainty  628
      11.2.1  The Model  630
      11.2.2  Computation of the Stationary Equilibrium  642
      11.2.3  Multi-Dimensional Individual State Space  663
   11.3  Overlapping Generations with Aggregate Uncertainty  672
      11.3.1  Perturbation Methods  674
      11.3.2  The OLG Model with Quarterly Periods  675
      11.3.3  Business Cycle Dynamics of Aggregates and Inequality  685
      11.3.4  The Krusell-Smith Algorithm and Overlapping Generations  693
   Appendices  715
      A.11  Derivation of the Stationary Dynamic Program of the Household  715
      A.12  First-Order Conditions of the Stationary Dynamic Program (11.13)  718
      A.13  Derivation of the Parameters of the AR(1)-Process with Annual Periods  720
   Problems  721

Part III  Numerical Methods

12  Linear Algebra  727
   12.1  Introduction  727
   12.2  Complex Numbers  727
   12.3  Vectors  729
   12.4  Norms  729
   12.5  Linear Independence  730
   12.6  Matrices  730
   12.7  Linear and Quadratic Forms  735
   12.8  Eigenvalues and Eigenvectors  737
   12.9  Matrix Factorization  738
      12.9.1  Jordan Factorization  738
      12.9.2  Schur Factorization  740
      12.9.3  QZ Factorization  740
      12.9.4  LU and Cholesky Factorization  741
      12.9.5  QR Factorization  742
      12.9.6  Singular Value Decomposition  743
13  Function Approximation  747
   13.1  Introduction  747
   13.2  Function Spaces  748
   13.3  Taylor's Theorem  749
   13.4  Implicit Function Theorem  752
   13.5  Lagrange Interpolation  753
      13.5.1  Polynomials and the Weierstrass Approximation Theorem  753
      13.5.2  Lagrange Interpolating Polynomial  754
      13.5.3  Drawbacks  756
   13.6  Spline Interpolation  758
      13.6.1  Linear Splines  759
      13.6.2  Cubic Splines  760
   13.7  Orthogonal Polynomials  762
      13.7.1  Orthogonality in Euclidean Space  762
      13.7.2  Orthogonality in Function Spaces  763
      13.7.3  Orthogonal Interpolation  765
      13.7.4  Families of Orthogonal Polynomials  766
   13.8  Chebyshev Polynomials  766
      13.8.1  Definition  766
      13.8.2  Zeros and Extrema  769
      13.8.3  Orthogonality  770
      13.8.4  Chebyshev Regression  771
      13.8.5  Chebyshev Evaluation  774
      13.8.6  Examples  775
   13.9  Multivariate Extensions  777
      13.9.1  Tensor Product and Complete Polynomials  777
      13.9.2  Multidimensional Splines  779
      13.9.3  Multidimensional Chebyshev Regression  781
      13.9.4  The Smolyak Polynomial  783
      13.9.5  Neural Networks  788

14  Differentiation and Integration  791
   14.1  Introduction  791
   14.2  Differentiation  792
      14.2.1  First-Order Derivatives  792
      14.2.2  Second-Order Derivatives  796
   14.3  Numerical Integration  798
      14.3.1  Newton-Cotes Formulas  798
      14.3.2  Gaussian Formulas  799
      14.3.3  Monomial Integration Formula  801
   14.4  Approximation of Expectations  803
      14.4.1  Expectation of a Function of Gaussian Random Variables  803
      14.4.2  Gauss-Hermite Integration  804
      14.4.3  Monomial Rules for Expectations  806

15  Nonlinear Equations and Optimization  811
   15.1  Introduction  811
   15.2  Stopping Criteria for Iterative Algorithms  812
   15.3  Nonlinear Equations  815
      15.3.1  Single Equations  815
      15.3.2  Multiple Equations  818
   15.4  Numerical Optimization  829
      15.4.1  Golden Section Search  829
      15.4.2  Gauss-Newton Method  831
      15.4.3  Quasi-Newton  835
      15.4.4  Genetic Search Algorithms  838

16  Difference Equations and Stochastic Processes  847
   16.1  Introduction  847
   16.2  Difference Equations  848
      16.2.1  Linear Difference Equations  848
      16.2.2  Nonlinear Difference Equations  851
      16.2.3  Boundary Value Problems and Shooting  854
   16.3  Stochastic Processes  856
      16.3.1  Univariate Processes  856
      16.3.2  Trends  858
      16.3.3  Multivariate Processes  858
   16.4  Markov Processes  859
      16.4.1  The First-Order Autoregressive Process  859
      16.4.2  Markov Chains  860
   16.5  Linear Filters  866
      16.5.1  Definitions  866
      16.5.2  The HP-Filter  867

Bibliography  870
List of Figures
1.1  Boundedness of the Capital Stock  12
1.2  Phase Diagram of the Infinite-Horizon Ramsey Model  18
1.3  No Path Leaves the Region A2  19
1.4  Convergence of the Capital Stock in the Infinite-Horizon Ramsey Model  21
1.5  Stationary Distribution of the Capital Stock in the Stochastic Infinite-Horizon Ramsey Model  30
1.6  Risk Aversion  40
1.7  Impulse Responses in the Benchmark Model  57
1.8  Impulse Responses from an Estimated VAR  58
1.9  Productivity Shock in the Benchmark Business Cycle Model  62
2.1  Eigenvalues of J  84
2.2  Approximate Time Path of the Capital Stock in the Deterministic Growth Model  86
2.3  Policy Function for Consumption in the Deterministic Growth Model  88
4.1  Third-Order Approximate Policy Function for Capital  167
4.2  Impulse Responses in the Time-to-Build Model  185
4.3  Structure of the NK Model  188
4.4  Impulse Responses in the Monopolistically Competitive Economy  198
4.5  Interest Rate Shock and Nominal Rigidities  205
4.6  Interest Rate Shock and Real Frictions  211
4.7  TFP Shock in the NK Model  213
4.8  Government Spending Shock in the NK Model  214
5.1  Approximations of e^{-t}  242
5.2  Ergodic Set of the Benchmark Business Cycle Model from a Second-Order Solution  246
5.3  Euler Equation Residuals: Deterministic Growth Model  260
5.4  Policy Function for the Value of Employment  273
5.5  Distribution of Unemployment in the Search and Matching Model  273
5.6  Simulated Time Path of Unemployment  274
6.1  Example Solutions of the Finite-Horizon Ramsey Model  314
6.2  Approximate Time Paths of the Capital Stock in the Deterministic Growth Model  316
6.3  Simulated Time Path of the Stochastic Growth Model  318
6.4  Ergodic Set of the Benchmark Business Cycle Model from the Extended Path Simulation  324
6.5  Impulse Responses to a Productivity Shock in the Small Open Economy Model  332
6.6  Impulse Responses to a World Interest Rate Shock in the Small Open Economy Model  333
6.7  Impulse Response to a Money Supply Shock in the Limited Participation Model  358
7.1  VI versus LP  398
7.2  Policy Function for Consumption in the Stochastic Growth Model with Nonnegative Investment  401
8.1  Dynamics of Aggregate Capital Stock K_t  420
8.2  Dynamics of the Gini Coefficient of Wealth  423
8.3  Dynamics of the Gini Coefficient of Market Income  424
8.4  Savings Function  441
8.5  Convergence of the Distribution Mean  447
8.6  Convergence of the Capital Stock  448
8.7  Invariant Density Function of Wealth  449
8.8  Invariant Density Function, Employed Worker  451
8.9  Next-Period Assets of the Employed Worker  460
8.10  Next-Period Assets of the Unemployed Worker  460
8.11  Savings in the Exchange Economy  461
8.12  Stationary Distribution Function  462
8.13  Lorenz Curve of US Earnings, Income, and Wealth in 1992  465
9.1  Value Function of the Employed Worker  496
9.2  Savings of the Workers  496
9.3  Dynamics of the Density Function over Time  497
9.4  Convergence of the Aggregate Capital Stock  498
9.5  The Dynamics of the Density Function  504
9.6  Goodness of Fit for Stationary Density  504
9.7  Distribution Function in Period T = 3,000  515
9.8  Prediction Errors without Updating  516
9.9  Time Path of the Aggregate Capital Stock  517
9.10  Consumption Functions in the Storage Economy  521
9.11  Savings Functions in the Storage Economy  521
9.12  Invariant Density Functions in the Storage Economy  523
9.13  Consumption in the Economy with Intermediation  523
9.14  Savings Functions in the Economy with Intermediation  524
9.15  Invariant Density Functions in the Economy with Intermediation Technology  524
9.16  Lorenz Curve of Income  538
9.17  Lorenz Curve of Wealth  540
10.1  Wealth-Age Profile in the Standard OLG Model  554
10.2  Labor-Supply-Age Profile in the Standard OLG Model  555
10.3  Survival Probabilities φ_s in Benchmark Equilibrium  569
10.4  Productivity-Age Profile ȳ_s  570
10.5  Wealth-Age Profile Approximation  573
10.6  Wealth-Age Profile with Age-Dependent Productivities  575
10.7  Labor-Supply-Age Profile in the Economy with Age-Dependent Productivities  575
10.8  Consumption-Age Profile in the Economy with Age-Dependent Productivities  576
10.9  Laffer Curves  576
10.10  Wealth-Age and Labor-Supply-Age Profiles in the New and in the Old Steady State  583
10.11  Wealth-Age and Labor-Age Profiles in the Old Steady State and for the Household Born in Period t = -2  585
10.12  Transition from the Old to the New Steady State  586
10.13  Survival Probabilities in the Years 2014 and 2100  591
10.14  US Population Growth Rate 1950-2100 (annual %)  591
10.15  Stationary Age Distribution, Initial and Final Steady State  600
10.16  Decline of the Labor Force Share during the Transition  600
10.17  Increase in the Old-Age Dependency Ratio During the Transition  602
10.18 Convergence of Transition Path K̃_t . . . 607
10.19 Individual Wealth ã_t^{s,j} in the Initial and Final Steady State . . . 610
10.20 Individual Labor Supply l_t^{s,j}, Initial and Final Steady State . . . 611
10.21 Convergence of Capital per Working Hour, K̃_t = K_t/(A_t L_t), and Aggregate Labor L̃_t . . . 612
10.22 Convergence of Factor Prices w_t and r_t . . . 613
10.23 Convergence of Government Variables tr_t and τ_t^p . . . 613
11.1 Measure µ_s of the s-Year-Old Cohort . . . 645
11.2 Labor Supply of the Low- and High-Skilled Workers with Idiosyncratic Productivity θ_4 . . . 650
11.3 Wealth-Age Profile in the Stochastic OLG Model . . . 656
11.4 Consumption-Age Profile in the Stochastic OLG Model . . . 657
11.5 Labor-Supply-Age Profile in the Stochastic OLG Model . . . 658
11.6 Lorenz Curve of US and Model Earnings . . . 659
11.7 Lorenz Curve of US and Model Wealth . . . 659
11.8 Policy Functions as a Function of Wealth . . . 668
11.9 Policy Functions as a Function of Accumulated Average Earnings . . . 669
11.10 Cumulative Distribution Functions . . . 671
11.11 Steady-State Age Profiles of Capital, Consumption, and Working Hours in the OLG Model of Example 11.3.1 . . . 683
11.12 Impulse Responses to a Technology Shock in the OLG Model of Example 11.3.1 . . . 686
11.13 Impulse Responses to a Government Demand Shock in the OLG Model of Example 11.3.1 . . . 687
11.14 Nonstochastic Steady-State Distribution of k̃ (case 1) . . . 703
11.15 Nonstochastic Steady-State Age Profiles . . . 710
11.16 Simulation Results . . . 711
12.1 Gaussian Plane . . . 728
13.1 Linear Interpolation . . . 755
13.2 Polynomial Approximation of the Runge Function on an Equally Spaced Grid . . . 757
13.3 Polynomial Approximation of the Runge Function on Chebyshev Zeros . . . 757
13.4 Monomials on [0, 1.2] . . . 758
13.5 Spline Interpolation . . . 759
13.6 Spline Interpolation of the Runge Function . . . 762
13.7 Orthogonal Vectors . . . 762
13.8 Orthogonal Projection in Euclidean Space . . . 763
13.9 Chebyshev Polynomials T_1 through T_5 . . . 768
13.10 Weight Function of the Chebyshev Polynomials . . . 770
13.11 Approximation of the Runge Function with Chebyshev Polynomials . . . 776
13.12 Approximation of a Kinked Function with Chebyshev Polynomials . . . 777
13.13 Rectangular Grid . . . 779
13.14 Tensor and Smolyak Grid . . . 785
13.15 Neural Networks . . . 788
15.1 Bisection Method . . . 816
15.2 Modified Newton-Raphson Method . . . 817
15.3 Secant Method . . . 818
15.4 Gauss-Seidel Iterations . . . 820
15.5 Dogleg Step . . . 827
15.6 Golden Section Search . . . 830
15.7 Stochastic Universal Sampling . . . 841
16.1 Topological Conjugacy . . . 853
16.2 Local Stable and Unstable Manifolds . . . 854
List of Tables
1.1 Calibration of the Benchmark Business Cycle Model . . . 56
1.2 Business Cycles Statistics from the Benchmark Model . . . 61
2.1 Code List for Equation (2.40) . . . 108
2.2 Computation of Derivatives of Example 1.6.1 . . . 110
4.1 Euler Equation Residuals: Benchmark Business Cycle Model . . . 178
4.2 Second Moments: German Data . . . 179
4.3 Second Moments: Benchmark Business Cycle Model . . . 180
4.4 Second Moments: Time-to-Build Model . . . 186
4.5 Calibration of the NK Model . . . 196
4.6 Second Moments: NK Model . . . 215
5.1 Weighted Residuals Solution of the Deterministic Growth Model . . . 259
5.2 Euler Equation Residuals of the Galerkin Solution of the Benchmark Business Cycle Model . . . 263
5.3 Second Moments from the Benchmark Business Cycle Model: Perturbation versus Galerkin Solution . . . 264
5.4 Calibration of the Search and Matching Model . . . 270
5.5 Data on Global Real Returns . . . 276
5.6 Calibration of the Benchmark Model with Disaster Risk . . . 278
5.7 Annualized Real Returns in the Benchmark Model . . . 284
5.8 Annualized Real Returns with GEU . . . 290
5.9 Annualized Real Returns with GEU and Adjustment Costs of Capital . . . 294
5.10 Model Calibration with Variable Disaster Size . . . 297
5.11 Annualized Real Returns with Variable Disaster Size . . . 298
5.12 Baseline Calibration of the Full Disaster Risk Model . . . 302
5.13 Annualized Real Returns in the Full Disaster Risk Model . . . 303
6.1 Second Moments from the Benchmark Business Cycle Model: Extended Path Solution . . . 324
6.2 Second Moments from the Small Open Economy Model . . . 335
6.3 Successful generalized stochastic simulation (GSS) Solutions of the Stochastic Growth Model . . . 341
6.4 GSS Solutions of the Benchmark Business Cycle Model . . . 348
6.5 Second Moments from the Benchmark Business Cycle Model: GSS Solutions . . . 349
6.6 Calibration of the Limited Participation Model . . . 355
6.7 Second Moments from the Limited Participation Model . . . 359
7.1 Value Function Iteration in the Deterministic Growth Model: Runtime . . . 380
7.2 Value Function Iteration in the Deterministic Growth Model: Accuracy . . . 382
7.3 Value Function Iteration in the Stochastic Growth Model: Runtime . . . 394
7.4 Value Function Iteration in the Stochastic Growth Model: Accuracy . . . 396
7.5 VI Solution of the Benchmark Business Cycle Model . . . 404
8.1 Statistics for the Computation of the Invariant Distribution . . . 452
8.2 Credit Limit and Interest Rate . . . 463
8.3 Results of Tax Reform Policies . . . 479
9.1 Calibration of Employment Rates . . . 532
9.2 Correlation of Income Shares and Output . . . 539
10.1 Computation of the Steady State of the OLG Model . . . 558
10.2 Calibration of the Large-Scale OLG Model . . . 571
10.3 Computation of Laffer Curves: Runtime . . . 574
10.4 Computation of the Transition Path: Runtime . . . 588
11.1 Calibration of OLG Model with Idiosyncratic Uncertainty . . . 639
11.2 Comparison of Runtime and Accuracy . . . 644
11.3 Comparison of Runtime . . . 672
11.4 Second Moments of the OLG Model of Example 11.3.1 . . . 689
11.5 Comparison of Second Moments Across Studies . . . 691
11.6 Calibration of the OLG Model with Individual and Aggregate Uncertainty . . . 697
11.7 Runtime: Krusell-Smith Algorithm and OLG Models . . . 704
11.8 Cyclical Behavior of the Income Distribution . . . 713
13.1 Tabulated Values of the Sine and Cosine Function . . . 767
16.1 Iterative Computation of the Ergodic Distribution . . . 862
16.2 Simulation of a Markov Chain . . . 863
Acronyms
AD      automatic differentiation . . . 106
AI      Artificial Intelligence . . . vii
AR(1)   first-order autoregressive . . . 52
AR(2)   second-order autoregressive . . . 473
CAS     computer algebra system . . . 97
CES     constant elasticity of substitution . . . 23
CPU     central processing unit . . . 63
CRRA    constant relative risk aversion . . . 43
DARE    discrete algebraic Riccati equation . . . 90
DGE     dynamic general equilibrium . . . vi
DSGE    dynamic stochastic general equilibrium . . . xi
etc     and so forth . . . 13
FOC     first-order conditions . . . 4
FT      Fourier transform . . . 160
GA      genetic algorithm . . . 838
GDP     gross domestic product . . . 54
GEU     generalized expected utility . . . 276
GSS     generalized stochastic simulation . . . xxxii
HP      Hodrick-Prescott . . . 59
IES     intertemporal elasticity of substitution . . . 41
iid     independently and identically distributed . . . 89
IRF     impulse response function . . . 174
KKT     Karush-Kuhn-Tucker . . . 7
LAPACK  linear algebra package . . . 135
lhs     left-hand side . . . 9
LP      linear programming . . . 391
LQ      linear-quadratic . . . 88
NIPA    national product and income accounts . . . 54
NK      New Keynesian . . . 187
OLG     overlapping generations . . . vii
OLS     ordinary least squares . . . 498
PEA     parameterized expectations approach . . . 234
rhs     right-hand side . . . 9
SD      symbolic differentiation . . . 106
s.t.    subject to . . . 7
TAS     time-additive separable . . . 10
TFP     total factor productivity . . . 48
VAR(1)  first-order vector autoregressive . . . 94
VI      value function iteration . . . x
wrt     with respect to . . . 237
List of Symbols
Z                        set of all integers
R                        real line
R_+                      non-negative real numbers, i.e., x ∈ R and x ≥ 0
R_++                     positive real numbers, i.e., x ∈ R and x > 0
R^n                      Euclidean n-space
C^n                      complex n-space
C^n                      class of functions having n continuous derivatives
f′ or f^(1)              first derivative of a single valued function of a single argument
f″ or f^(2)              second derivative of a single valued function of a single argument
f^(n)                    nth order derivative of a single valued function of a single argument
f_i or D_i f or f_{x_i}  first partial derivative of a single valued function with respect to its ith argument
f_{ij} or D_i D_j f or f_{x_i x_j}  second partial derivative of a single valued function with respect to arguments i and j (in this order)
A = (a_{ij})             n by m matrix A with typical element a_{ij}
A^{−1} = (a^{ij})        the inverse of matrix A with typical element a^{ij}
A′, A^T                  the transpose of the matrix A = (a_{ij}) with elements A′ = (a_{ji})
J(x̄)                     the Jacobian matrix of the vector valued function f(x) at the point x̄
H(x̄)                     the Hesse matrix of the single valued function f(x) at the point x̄
∇f(x)            the gradient of f at x, that is, the row vector of partial derivatives ∂f(x)/∂x_i
‖x‖_2            the Euclidean norm (length) of the vector x ∈ R^n, given by sqrt(x_1^2 + x_2^2 + · · · + x_n^2)
tr A             the trace of the square matrix A, i.e., the sum of its diagonal elements
det A            the determinant of the square matrix A
ε ∼ N(µ, σ^2)    the random variable ε is normally distributed with mean µ and variance σ^2
∀                for all
∃                exists
!                factorial, i.e., n! = 1 × 2 × · · · × n
(n over k)       binomial coefficient
x = argmin_x f(x)  the value x that minimizes the function f(x)
List of Programs
Fortran BM_EP.f90 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 323 BM_VI.f90 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 403 DGM_VI.f90 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 379 Differentiation.f90 CDHesse . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 797 CDJac . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 796 IVDenF.f90 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 451 Optimization.f90 GaussNewton . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 834 GSearch1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 845 GSearch2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 845 GSS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 831 SGM_NNI_VI.f90 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 400 SGM_VI.f90 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 390, 396, 405 SOE_EP.f90 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 334 GAUSS AK280_perturb.g . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 678, 680, 684, 690 AK60_direct.g . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 553 AK60_proj.g . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 560 AK60_value.g . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 557 AK70_prog_pen.g . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 664, 666, 722 AK70_stoch_inc.g . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 644, 652 AK70_stock_inc.g . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 643 BM_pert.g . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56 CoRRAM_1.src xxxix
SolveModel . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 144 CoRRAM_2.src HPFilter . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 869 Impulse1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 175 Impulse2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 175 CoRRAM_3.src Bisec . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 815 CDHesse . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 797 CDJac . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 796 costs_cycles.g . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 520 Demo_trans.g . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 591, 598, 605, 609 DGM_VI.g . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 379 DGM_WRM.g . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 257 dynamics_income_distrib.g . . . . . . . . . . . . . . . . . . . . . . . . . . . . 531, 532, 534 Function.src Cheb_coef . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 773 Cheb_eval . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 775 CSpline2_coef . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 781 CSpline2_eval . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 781 CSpline_coef . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 257, 761 CSpline_eval . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 257, 761 Find . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 760 LSpline_coef . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 760 LSpline_eval . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 760 GetPar.g. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .54 Gorman.g . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 419 IVDenF.g . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 450, 503 equivec . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 442, 863 IVDisF.g . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 444 IVdisF.g . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 443 IVExpF.g . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 455 IVMonteCarlo.g . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 
452 Krusell_Smith_algo.g . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 514 Laffer.g . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 569, 573, 574, 621 Laffer_p.g . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 574, 578 NLEQ.src Fixp1. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .821 FixvMN1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 821 FixvMN2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 821 MNRStep . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 826
OLG6_trans.g . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 582, 585, 587, 588 OLG_Krusell_Smith.g . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 698, 708 Ramsey1.g . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 85 Ramsey2.g . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 314 Ramsey3.g . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 315 Ramsey4.g . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 318 Risk_free_rate.g . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 459 SGM_NNI_VI.g . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 400 SGM_VI.g . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 390, 405 GSS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 440, 831 Markov_AR1_R . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 866 Markov_AR1_T. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .865 SGM_VI_MT.g . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 395 SOE.g. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .331 SVar.g . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59 Tax_reform.g . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 477 transition_guess.g. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .503 transition_part.g . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 493–495, 503 Julia AK60_value.jl . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 558 AK70_prog_pen.jl . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 664 AK70_stock_inc.jl . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 643 R MATLAB BM_CGC.m . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 262 BM_CGT.m . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 262 BM_CGT_Eqs.m. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .262 BM_EP.m . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 323 BM_GSS.m . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 347 BM_GSS_NN.m . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 347 BM_pert.m . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56, 176 CDHesse.m . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 797 CDHesseRE.m . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 
. . . . . . . . . . . . . 797 CDJac.m . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 796 ChebBase.m . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 787 Cubic.m . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 145 Der_BM.m . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 109 DGM.m . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 87 DGM_VI.m. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .379
DR_V1.m . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 280 DR_V2.m . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 289 DR_V3.m . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 293 DR_V4.m . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 297 DR_V5.m . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 297 DR_V6.m . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 302 DRV1_Eqs_Smolyak.m . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 282 DRV2_Eqs_Smolyak.m . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 293 DRV_V3.m . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 308 Example_Valder.m . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 109 GetPar.m. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .54, 178 GH_NW.m . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 806 GH_quad.m. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .806 GSearch1.m . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 340, 845 Impulse1.m . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 175 Impulse2.m. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .175, 308 Laffer.m . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 569, 573, 574, 621 Laffer_p.m. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .574, 578 Linear.m . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 145 LP_GSS_LR.m . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 356 Mon_quad.m . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 809 NK_Model.m. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .197 Num_Stab_Approx.m . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 345 Quadratic.m . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 145 Ramsey2.m . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 314 Ramsey3.m . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 315 Search_CGT.m . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 272, 275, 306 Search_CGT_Eqs.m . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 272 SGM_GSS.m . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 340 SGM_LP.m . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 
. . . . . . . 397 SGM_VI.m . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 390, 405 SOE_EP.m. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .334 SolveModel.m . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 144 SVar.m . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59 TTB.m . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 184 Python AK60_value.py . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 557 AK70_prog_pen.py . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 664, 667 AK70_stock_inc.py . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 643
IVDenF.py . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 450 OLG_Krusell_Smith.py . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 698, 708 transition_part.py . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 493
Part I
Representative Agent Models
Chapter 1
Basic Models
1.1 Introduction
Macroeconomics at the intermediate and graduate levels rests on three workhorses: the Solow-Swan model, the overlapping generations (OLG) model, and the Ramsey model. The Solow-Swan model, introduced independently by Solow (1956) and Swan (1956), is a powerful analytical tool to study economic growth. The households in this model do not optimize but instead follow a simple rule of thumb. Most, if not all, of its properties can be derived by paper-and-pencil methods. Therefore, we do not consider this model in this book. The OLG model, which rests on the seminal work of Allais (1947) and Samuelson (1958), captures the demographic structure of an economy and features optimizing households. Since households of different ages behave differently, we introduce this model in Part II in Chapter 10. The Ramsey model, introduced by Ramsey (1928) as a model of a benevolent planner, lies at the heart of dynamic general equilibrium (DGE) models with homogeneous agents. This model and its modern versions, the deterministic and stochastic neoclassical growth model and the benchmark real business cycle model, are the focus of the present chapter. All these models depict the interactions of optimizing agents via markets over time. Therefore, they are called DGE models. Insofar as they also include various shocks that either hit the aggregate economy and/or pertain to individual firms or households, they are referred to as dynamic stochastic general equilibrium (DSGE) models. Our presentation in this chapter serves two aims: First, we prepare the ground for the algorithms presented in subsequent chapters that use one out of two possible characterizations of a model's solution. Second,
we develop standard tools in model building and model evaluation used throughout the book. The most basic DGE model is the so-called Ramsey model, where a single consumer-producer chooses a utility maximizing consumption profile. We begin with the deterministic, finite-horizon version of this model. The set of first-order conditions (FOC) for this problem is a system of nonlinear equations that can be solved with adequate software. Then, we consider the infinite-horizon version of this model. We characterize its solution along two lines: 1) The Euler equations provide a set of nonlinear difference equations that determine the optimal time path of consumption. 2) Dynamic programming delivers a policy function that relates the agent’s choice of current consumption to his stock of capital. Both characterizations readily extend to the stochastic version of the infinite-horizon Ramsey model that we introduce in Section 1.4. In Section 1.5, we add productivity growth and labor supply to this model. We use this benchmark model in Section 1.6 to illustrate the problems of parameter choice and model evaluation. Section 1.7 concludes this chapter with a synopsis of the numerical solution techniques presented in Chapters 2 through 7 and introduces measures to evaluate the goodness of the approximate solutions. Readers who already have experience with the stochastic growth model with an endogenous labor supply (our benchmark model) may consider skipping the first four sections and starting with Section 1.6 to become familiar with our notation and to obtain an idea of the methods presented in subsequent chapters.
1.2 The Deterministic Finite-Horizon Ramsey Model

1.2.1 The Ramsey Problem
In 1928, Frank Ramsey, a young mathematician, posed the problem “How much of its income should a nation save?” (Ramsey (1928), p. 542) and developed a dynamic model to answer this question. Though greatly praised
by Keynes,1 it took almost forty years and further papers by Cass (1965), Koopmans (1965), and Brock and Mirman (1972) before Ramsey's formulation stimulated macroeconomic theory. Today, variants of his dynamic optimization problem are the cornerstones of most models of economic fluctuations and growth. At the heart of the Ramsey problem, there is an economic agent producing output from labor and capital who must decide how to split production between consumption and capital accumulation. In Ramsey's original formulation, this agent was a fictitious planning authority. However, we may also think of a yeoman growing corn or of a household, who receives wage income and dividends and buys stocks. In the following, we employ the farmer narrative to develop a few basic concepts. Time is divided into intervals of unit length. The index t either specifies the current period or — if the meaning is clear — an arbitrary point in time. Let K_t denote the amount of seed and L_t the number of hours of labor available in period t. These factors produce the amount Y_t of corn according to:

Y_t = F(K_t, L_t).    (1.1)
The production function F : R_+^2 → R_+ satisfies several natural and desirable properties:2,3
1) there is no free lunch, i.e., F(0, 0) = 0,
2) F is twice continuously differentiable with positive yet diminishing marginal products, i.e., F_i(K, L) > 0 > F_ii(K, L) for i ∈ {1, 2},
3) and exhibits constant returns to scale in K and L, i.e., λY = F(λK, λL) for all λ ∈ R_+.
1 Keynes (1930), p. 153 wrote: “... one of the most remarkable contributions to mathematical economics ever made, both in respect of the intrinsic importance and difficulty of its subject, the power and elegance of the technical methods employed, and the clear purity of illumination with which the writer's mind is felt by the reader to play about its subject.”
2 See, e.g., Barro and Sala-i-Martin (2004), pp. 27f. and the discussion of these properties by Irmen and Maußner (2017).
3 For functions of several variables, indices denote the partial derivative with respect to the ith argument of the function. Multiple indices denote higher-order partial derivatives in the order indicated by the sequence of indices. Hence, F_1(·) is the marginal product of capital, F_2(·) the marginal product of labor, and F_ij(·), i, j ∈ {1, 2}, are the respective second-order partial derivatives of the production function.
At each period the farmer must decide how much corn to produce, to consume, and to put aside for future production. His future stock of capital K_{t+1} is the amount of the next period's seed. The farmer's choice of consumption C_t and investment K_{t+1} is bounded by current production: C_t + K_{t+1} ≤ Y_t. The farmer does not value leisure but works a given number of hours L each period and seeks to maximize the utility function U(C_t, C_{t+1}, ..., C_{t+T}). In the farmer example, capital depreciates fully, because the seeds used for growing corn are not available for future production. When we think of capital in terms of machines, factories, or even more generally, human knowledge, this is an overly restrictive assumption. More generally, the resource constraint is given by Y_t + (1 − δ)K_t ≥ C_t + K_{t+1}, where δ ∈ [0, 1] is the rate of capital depreciation. In the following, the notation will become slightly simpler if we define the production function to include any capital left after depreciation and drop the constant L:

f(K_t) := F(K_t, L) + (1 − δ)K_t.    (1.2)
Since production without seeds is impossible, we assume f(0) = 0, whereas the other properties of F carry over to f. We are now in the position to state the finite-horizon deterministic Ramsey problem formally as follows:

max_{(C_t, ..., C_{t+T})}  U(C_t, ..., C_{t+T})                      (1.3)
s.t.  K_{t+s+1} + C_{t+s} ≤ f(K_{t+s}),   s = 0, ..., T,
      0 ≤ C_{t+s},
      0 ≤ K_{t+s+1},
      K_t given.

In other words, the farmer seeks a time profile of consumption

{C_{t+s}}_{s=0}^{T} := (C_t, ..., C_{t+T}),
which maximizes his or her lifetime utility U subject to (s.t.) four constraints: 1) Consumption C_{t+s} and investment K_{t+s+1} cannot exceed production f(K_{t+s}), 2) consumption cannot be negative, 3) investment cannot be negative, and 4) he or she has a given initial stock of capital K_t. The adjective ‘finite-horizon’ refers to the assumption that the farmer does not plan beyond a given future period t + T < ∞. The adjective ‘deterministic’ designates the absence of uncertainty in this problem: The farmer knows in advance how much corn he will obtain when he plans to work L hours and has K_t pounds of seeds. Furthermore, he is also sure as to how he will value a given sequence of consumption {C_{t+s}}_{s=0}^{T}.

1.2.2 The Karush-Kuhn-Tucker Theorem
Problem (1.3) is a standard nonlinear programming problem: Choose an n-dimensional vector x ∈ R^n that maximizes the real-valued function f(x) subject to constraints of the form h_i(x) ≥ 0, i = 1, ..., l. The Karush-Kuhn-Tucker (KKT) theorem characterizes the necessary conditions. A local maximum satisfies these conditions if the constraints are regular. There are several conditions that qualify the constraints as regular. Among them are the linear independence of the gradients of the binding constraints and the Slater condition.4 The KKT conditions are both necessary and sufficient if the objective function and the l constraints are concave. The statement of this result in Theorem 1.2.1 (taken from Sundaram (1996), Theorem 7.16, pp. 187 f.) rests on the definition of the Lagrangian function:

L(x, λ) := f(x) + Σ_{i=1}^{l} λ_i h_i(x).    (1.4)
4 See, e.g., Bertsekas (2008), p. 315 and p. 331.

Theorem 1.2.1 (KKT) Let f be a concave C^1 function mapping U into R, where U ⊂ R^n is open and convex. For i = 1, ..., l, let h_i : U → R be concave C^1 functions. Suppose the Slater constraint qualification holds, i.e., there is some x̄ ∈ U such that

h_i(x̄) > 0,   i = 1, ..., l.

Then, x* maximizes f over D = {x ∈ U | h_i(x) ≥ 0, i = 1, ..., l} if and only if there is λ* ∈ R^l such that the Karush-Kuhn-Tucker first-order conditions hold:
1 Basic Models
∂ L (x, λ) ∂ f (x∗ ) X ∗ ∂ hi (x∗ ) = + λi = 0, ∂ xj ∂ xj ∂ x j i=1
j = 1, . . . , n,
λ∗i ≥ 0,
i = 1, . . . , l,
l
λ∗i hi (x∗ ) = 0,
i = 1, . . . , l.
It is easy to see that problem (1.3) fits this theorem if the utility function U and the production function f are strictly concave, strictly increasing, and twice continuously differentiable. The Lagrangian of the problem (1.3) is given by L =U(C t , . . . , C t+T ) + +
T X
T X s=0
λ t+s ( f (K t+s ) − C t+s − K t+s+1 ) +
T X
µ t+s C t+s
s=0
ω t+s+1 K t+s+1
s=0
so that the first-order conditions read:5 0=
∂ U(C t , . . . , C t+T ) − λ t+s + µ t+s , ∂ C t+s
s = 0, . . . , T,
(1.5a)
0 = −λ t+s + λ t+s+1 f 0 (K t+s+1 ) + ω t+s+1 , s = 0, . . . , T − 1,
(1.5b)
0 = λ t+s ( f (K t+s ) − C t+s − K t+s+1 ) ,
s = 0, . . . , T,
(1.5d)
0 = µ t+s C t+s ,
s = 0, . . . , T,
(1.5e)
0 = ω t+s+1 K t+s+1 ,
s = 0, . . . , T.
(1.5f)
0 = −λ t+T + ω t+T +1 ,
(1.5c)
The multipliers λ t+s , µ t+s , and ω t+s value the severeness of the respective constraint. According to the KKT theorem, a constraint that does not bind has a multiplier of zero. For example, if C t > 0, then (1.5e) implies µ t = 0. If we want to rule out corner solutions, i.e., solutions where one or more of the nonnegativity constraints bind, we need to impose an additional assumption. In the present context this assumption has a very intuitive meaning: The farmer would hate to starve to death during any period. Formally, this translates into the statement: 5
As usual, a prime denotes the first (two primes the second) derivative of a function f (x) of one variable x. Condition (1.5c) derives from the budget constraint of period T , f (K T ) − C T − K T +1 ≥ 0, which has the multiplier λ T , and the nonnegativity constraint on K T +1 , which has the multiplier ω T +1 .
1.3 The Deterministic Infinite Horizon Ramsey Model
9
∂ U(C t , . . . , C t+T ) → ∞ if C t+s → 0 for all s = 0, . . . , T. ∂ C t+s This is sufficient to imply that C t+s > 0 for all s = 0, . . . , T , µ t+s = 0 (from (1.5e)), and the Lagrangian multipliers λ t+s equal the marginal utility of consumption in period t and, thus, are also strictly positive: ∂ U(C t , . . . , C t+T ) = λ t+s , s = 0, 1, . . . , T. ∂ C t+s Condition (1.5d) thus implies that the resource constraints always bind. Furthermore, since we have assumed f (0) = 0, positive consumption also requires positive amounts of seed K t+s+1 > 0 from period s = 0 through period s = T − 1. However, the farmer will consume his entire crop in the last period of his life, since any seed left reduces his lifetime utility. More formally, this result is implied by equations (1.5f) and (1.5c), which yield λ T K t+T +1 = 0. Taking all pieces together, we arrive at the following characterization of an optimal solution: K t+s+1 = f (K t+s ) − C t+s ,
s = 0, 1, . . . , T − 1,
(1.6a)
0 = f (K t+T ) − C t+T ,
∂ U(C t , . . . C t+T )/∂ C t+s = f 0 (K t+s+1 ), ∂ U(C t , . . . C t+T )/∂ C t+s+1
(1.6b) s = 0, 1, . . . , T − 1.
(1.6c)
The left-hand side (lhs) of equation (1.6c) is the marginal rate of substitution between consumption in two adjacent periods. It gives the rate at which the farmer is willing to forego consumption in t for consumption one period ahead. The right-hand side (rhs) provides the compensation for an additional unit of savings: the increase in future output.
1.3 The Deterministic Infinite-Horizon Ramsey Model In equation (1.6c), the marginal rate of substitution between two adjacent periods depends on the entire time profile of consumption. For this reason, we must solve the system of 2T + 1 nonlinear, simultaneous equations (1.6) at once to obtain the time profile of consumption. Though probably difficult in practice, in principle, this is a viable strategy as long as T is
10
1 Basic Models
finite. However, if we consider an economy with an indefinite final period, that is, if T approaches infinity, this is no longer feasible. We cannot solve for infinitely many variables at once. To circumvent this problem, we restrict the class of intertemporal optimization problems to problems that can be tackled with the tools of dynamic programming. The basic tool of this approach, the Bellman equation, is a recursive functional equation. Solutions of dynamic programming problems are decision rules rather than a time profile of optimal actions. The time-additive separable (TAS) utility function, which we introduce in the next subsection, rests itself on a recursive definition of life-time utility. We derive the first-order conditions for maximizing this function along two routes: the KKT method in Subsection 1.3.2 and dynamic programming in 1.3.3. Subsection 1.3.4 provides a characterization of the dynamics of the infinite-horizon Ramsey model. We close this section with a brief digression that considers the few models that admit an analytical solution of the Ramsey problem.
1.3.1 Recursive Utility The TAS utility function is defined recursively from U t = u(C t ) + β U t+1 ,
β ∈ (0, 1).
(1.7)
In this definition β is a discount factor and β −1 − 1 is known as the pure rate of time preference. The function u : [0, ∞) → R is called the oneperiod, current period, or felicity function. We assume that u is strictly increasing, strictly concave and twice continuously differentiable. The solution to the finite-horizon Ramsey model depends upon the chosen terminal date T . However, in as far as we want to portray the behavior of the economy with Ramsey-type models, there is no natural final date T . As a consequence, most models extend the planning horizon into the indefinite future by letting T → ∞. Iterating on (1.7), we arrive at the following definition of the utility function Ut =
∞ X s=0
β s u(C t+s ).
(1.8)
If we want to rank consumption streams according to this function, we must ensure that the sum on the rhs is bounded from above, i.e., U t < ∞
1.3 The Deterministic Infinite Horizon Ramsey Model
11
for every admissible sequence of points C t , C t+1 , C t+2 , . . . . This will hold if the growth factor of the one-period utility gu := u(C t+s+1 )/u(C t+s ) is smaller than 1/β for all s = 0, 1, 2, . . . . Consider the Ramsey problem (1.3) with infinite time horizon: max
C t ,C t+1 ,...
Ut =
∞ X s=0
β s u(C t+s )
s.t. K t+s+1 + C t+s ≤ f (K t+s ), 0 ≤ C t+s , 0 ≤ K t+s+1 , Kt
(1.9) s = 0, 1, . . . ,
given.
In this model we do not need to assume that the one-period utility function u is bounded. Since u is continuous, it is sufficient to assume that the economy’s resources are finite. In a dynamic context, this requires that ¯ such that there is an upper bound on capital accumulation, i.e., there is K ¯ for each K > K output is smaller than that needed to maintain K: ¯ so that ∀K t > K ¯ ⇒ K t+1 < K t . ∃K
(1.10)
For instance, let f (K) = K α , α ∈ (0, 1). Then: ¯ = 11/(α−1) = 1. K ≤ Kα ⇒ K
Condition (1.10) ensures that any admissible sequence of capital stocks ¯ , K0 } and that consumption in any period is bounded by K max := max{K max cannot exceed f (K ). Figure 1.1 makes that obvious: Consider any point ¯ , such as K1 , and assume that consumption equals zero in all to the left of K periods. Then, the sequence of capital stocks originating in K1 approaches ¯ . Similarly, the sequence starting in K2 approaches K ¯ from the right. K
1.3.2 Euler Equations There are two approaches to characterize the solution to the Ramsey problem (1.9). The first is an extension of the KKT Theorem 6 , and the second is 6
See, e.g., Chow (1997), Chapter Two and Romer (1991).
12
1 Basic Models
f (K)
45◦ K1
¯ K
K2
K
Figure 1.1 Boundedness of the Capital Stock
dynamic programming.7 According to the first approach, necessary conditions may be derived from maximizing the following Lagrangian function Lagrangian function with respect to C t , C t+1 , . . . , K t+1 , K t+2 , . . . : Lt =
∞ X s=0
β s u(C t+s ) + λ t+s ( f (K t+s ) − C t+s − K t+s+1 ) + µ t C t+s + ω t+s+1 K t+s+1 .
Note that in this expression, the Lagrangian multipliers λ t+s , µ t+s , and ω t+s+1 refer to period t + s values. Period t values are given by β s λ t+s , β s µ t+s , and β s ω t+s+1 . The first-order conditions for maximizing L t with respect to C t and K t+1 are given by: u0 (C t ) = λ t − µ t ,
0
7
(1.11a)
λ t = βλ t+1 f (K t+1 ) + ω t+1 ,
(1.11b)
0 = λ t ( f (K t ) − C t − K t+1 ),
(1.11c)
0 = µt Ct ,
(1.11d)
0 = ω t+1 K t+1
(1.11e)
Here, the standard reference is Chapter 4 of Stokey et al. (1989). See also Chapter 6 of Acemo˘ glu (2009).
1.3 The Deterministic Infinite Horizon Ramsey Model
13
and must hold from period t onwards, i.e., for t, t + 1, and so forth (etc). We continue to assume that the farmer hates starving to death, lim u0 (C) = ∞,
C→0
(1.12)
so that the nonnegativity constraints never bind. Since u is strictly increasing in its argument, the resource constraint always binds. Therefore, we can reduce the first-order conditions to a second-order difference equation in the capital stock: u0 ( f (K t ) − K t+1 ) − β f 0 (K t+1 ) = 0, 0 u ( f (K t+1 ) − K t+2 )
for t, t + 1, t + 2, . . . . (1.13)
This equation is often referred to as the Euler equation, since the mathematician Leonhard Euler (1707-1783) first derived it from a continuous time dynamic optimization problem. To find the unique optimal time path of capital from the solution to this functional equation, we need two additional conditions. The given stock of capital in period t, K t , provides the first condition. The second condition is the so called transversality condition. It is the limit of equation (1.5f) for lim T → ∞: Using equation (1.5c) in equation (1.5f) yields 0 = λ T K T +1 . In the infinite-horizon model, this limit can be written as lim β s λ t+s K t+s+1 = 0.
s→∞
(1.14)
Since λ t+s = u0 (C t+s ) from condition (1.11a), the expression β s λ t+s K t+s+1 is the utility value of the period t + s + 1 stock of capital from the perspective of the current period t. Condition (1.14), therefore, states that the present value of the terminal capital stock must approach zero.8 In the Ramsey model (1.9), condition (1.14) is a necessary condition,9 as well as conditions (1.11).
1.3.3 Dynamic Programming We now turn to a recursive formulation of the Ramsey problem. For this purpose, we assume that we already know the solution (denoted by a star) 8
Formally, solving equation (1.13) subject to the initial condition K t and the terminal condition (1.14) is a boundary value problem of a second-order nonlinear difference equation. See Section 16.2.3. 9 See Kamihigashi (2002).
14
1 Basic Models
∗ ∗ ∗ {K t+1 , K t+2 , . . . } ≡ {K t+s }∞ s=1 so that we are able to compute the lifetime utility from
v(K t ) := u( f
∗ (K t ) − K t+1 )+
∞ X s=1
∗ ∗ β s u( f (K t+s ) − K t+s+1 ).
(1.15)
Obviously, the maximum value of lifetime utility v(K t ) depends upon K t directly — via the first term on the rhs of the previous equation — and ∗ indirectly via the effect of K t on the optimal sequence {K t+s }∞ s=1 . Note also ∗ that the second term on the rhs of equation (1.15) is equal to β v(K t+1 ), i.e., the value of lifetime utility obtained from period t + 1 onwards, starting ∗ at K t+1 discounted back to period t. Before we further develop this approach, we adopt notation that is common in dynamic programming. Since K t is an arbitrary initial stock of capital, we drop the time subscript and use K to designate this variable. Furthermore, we use a prime for all next-period variables. We are then able to define the function v recursively via: v(K) :=
max
0≤K 0 ≤ f (K)
u( f (K) − K 0 ) + β v(K 0 ).
(1.16)
The first term to the right of the max operator is the utility of consumption C = f (K) − K 0 as a function of the next-period capital stock K 0 . The second term is the discounted optimal value of lifetime utility, if the sequence of optimal capital stocks starts in the next period with K 0 . Suppose we know the function v so that we can solve the optimization problem on the rhs of equation (1.16). Obviously, its solution K 0 depends upon the given value of K so that we may write K 0 = h(K). The function h is the agent’s decision rule or policy function. Note that the problem does not change with the passage of time: When the next period has arrived, the agent’s initial stock of capital is K = K 0 , and he has to make the same decision with respect to the capital stock of period t + 2, which we denote by K 00 . In this way, he ∗ can determine the entire sequence {K t+s }∞ s=1 . However, we may also view equation (1.16) as an implicit definition of the real-valued function v and the associated function h. From this perspective, it is a functional equation,10 which is named the Bellman equation after its discoverer, the US mathematician Richard Bellman (19201984). His principle of optimality states that the solution of problem (1.9) is equivalent to the solution of the Bellman equation (1.16). Stokey et al. 10
As explained in Section 16.2, a functional equation is an equation whose unknown is a function and not a point in Rn .
1.3 The Deterministic Infinite Horizon Ramsey Model
15
(1989), pp. 67-77, establish the conditions for this equivalence to hold. In this context of dynamic programming, v is referred to as the value function and h as the policy function, decision rule, or feed-back rule. Both functions are time-invariant. The mathematical theory of dynamic programming addresses the existence, the properties, and the construction of v and h. Given that both u(C) and f (K) are strictly increasing, strictly concave and twice continuously differentiable functions of their respective arguments C and K and that there exists a maximum sustainable capital ¯ as defined in (1.10), one can prove the following results:11 stock K 1) The function v exists, is differentiable, strictly increasing, and strictly concave. 2) The policy function h is increasing and differentiable. 3) The function v is the limit of the following sequence of steps s = 0, 1, . . . : v s+1 (K) =
max
0≤K 0 ≤ f (K)
u( f (K) − K 0 ) + β v s (K 0 ),
with v 0 = 0. We illustrate these findings in Example 1.3.1. Example 1.3.1 Let the one-period utility function u and the production function f be given by u(C) := ln C, f (K) := K α ,
α ∈ (0, 1),
respectively. In Appendix A.1, we use iterations over the value function to demonstrate that the policy function K t+1 = h(K t ) that solves the Ramsey problem (1.9) is given by K t+1 = αβ K tα . Furthermore, the value function is linear in ln K and given by v(K) = a + b ln K, αβ 1 a := ln(1 − αβ) + ln αβ , 1−β 1 − αβ
b :=
α . 1 − αβ
The dynamic programming approach also provides the first-order conditions (1.13). Two steps are required to arrive at this result. First, consider 11
See, e.g., Acemo˘ glu (2009), pp. 187-201, Harris (1987), pp. 34-45, or Stokey et al. (1989), pp. 103-105.
16
1 Basic Models
the first-order condition for the maximization problem on the rhs of equation (1.16): u0 ( f (K) − K 0 ) = β v 0 (K 0 ).
(1.17)
Comparing this with condition (1.11a) (and assuming µ t = 0) reveals that the Lagrange multiplier λ t ≡ β v 0 (K t+1 ) is a shadow price for newly produced capital (or investment expenditures): It equals the current value of the increase in lifetime utility obtained from an additional unit of capital. Second, let K 0 = h(K) denote the solution of this implicit equation in K 0 . This allows us to write the (1.16) as an identity, v(K) = u( f (K) − h(K)) + β v(h(K)), so that we can differentiate with respect to K on both sides. This yields v 0 (K) = u0 (C) f 0 (K) − h0 (K) + β v 0 (K 0 )h0 (K), where C = f (K) − h(K). Using the first-order condition (1.17) provides v 0 (K) = u0 (C) f 0 (K).
(1.18)
Since K is an arbitrarily given stock of capital, this equation relates the derivative of the value function v 0 (·) to the derivative of the one-period utility function u0 (·) and the derivative of the (net) production function f 0 (·) for any value of K. Thus, letting C 0 = f (K 0 ) − K 00 denote the next period’s consumption, we may write v 0 (K 0 ) = u0 (C 0 ) f 0 (K 0 ). Replacing v 0 (K 0 ) in (1.17) by the rhs of this equation yields 1=β
u0 ( f (K 0 ) − K 00 ) 0 0 f (K ). u0 ( f (K) − K 0 )
This equation must hold for any triple (K, K 0 , K 00 ) that establishes the ∗ optimal sequence {K t+s }∞ s=1 solving the Ramsey problem (1.9). Thus, it is identical to the Euler equation (1.13), except that we used primes instead of the time indices.
1.3 The Deterministic Infinite Horizon Ramsey Model
17
1.3.4 The Saddle Path To gain insights into the dynamics of the Ramsey model (1.9), we use the phase diagram technique to characterize the solution of the Euler equation (1.13). Substituting the resource constraint C t = f (K t ) − K t+1 into (1.13) yields a first-order, nonlinear system of difference equations that governs the optimal time path of capital accumulation: K t+1 = f (K t ) − C t , 1=β
0
u (C t+1 ) 0 f (K t+1 ). u0 (C t )
(1.19a) (1.19b)
Together with the initial capital stock K t and the transversality condition (1.14), these equations determine a unique solution. We use Figure 1.2 to construct it.12 The red line in this figure represents the graph of the function C t = f (K t ) that divides the plane into two regions. All points (K t , C t ) on and below this graph meet the nonnegativity constraint on the future capital stock, K t+1 ≥ 0. No time path that starts in this region can leave it via the abscissa, since for all pairs (K t , C t ) > 0, the solution to equations (1.19) in C t+1 is positive due to assumption (1.12). We divide the area below the graph of C t = f (K t ) into four parts, labeled clockwise A1 through A4 . Consider first the locus of all pairs (K t , C t ) along which consumption does not change, i.e., C t = C t+1 . According to equation (1.19b), this happens when the capital stock reaches K ∗ , given by 1 = f 0 (K ∗ ). β The graph of C t := f (K t ) − K ∗ is the locus of all pairs (K t , C t ), implying K ∗ as the next period’s capital stock. Below (above) this graph, consumption is smaller (larger), and thus, K t+1 > K ∗ (K t+1 < K ∗ ). To meet condition (1.19b), at K t+1 , marginal utility must increase (decrease), u0 (C t+1 ) > u0 (C t ) (u0 (C t+1 ) < u0 (C t )), which in turn implies C t+1 < C t (C t+1 > C t ). We have thus established that points (K t , C t ) below (above) this graph are associated with decreasing (increasing) consumption. The vertical arrows in Figure 1.2 designate that behavior. 12
The time paths shown in this figure are obtained from a numerical simulation. Since they represent the solution of a system of difference equations and not of a system of differential equations, they are connected line segments rather than smooth curves.
18
1 Basic Models Ct C t = f (K t ) ⇔ K t+1 = 0 C t = f (K t ) − K ∗ ⇔ C t+1 = C t C t = f (K t ) − K t ⇔ K t+1 = K t Saddlepath
A4
A1
C∗ A3 A2
K∗
¯ K
Kt
Figure 1.2 Phase Diagram of the Infinite-Horizon Ramsey Model
Consider second the locus of all pairs (K t , C t ) along which the capital stock does not change. Assuming K t = K t+1 in equation (1.19a) implies: C t = f (K t ) − K t . The graph of this function equals the vertical distance between the function f (K t ) and the 45-degree line in Figure 1.1. Thus, it starts at the origin, attains a maximum at K m (not shown), defined by 1 = f 0 (K m ), and cuts ¯ . Points above (below) that locus have a higher (smaller) the K-axis at K consumption, and thus, the capital stock declines (increases) in that region, as shown by the horizontal arrows. The optimal path of capital accumulation is the green line in Figure 1.2, the so-called saddle path. Points on that locus converge towards the stationary equilibrium at (K ∗ , C ∗ ). All other time paths either violate the nonnegativity constraint on K t+1 in finite time or the transversality condition (1.14). To derive this assertion, we study the behavior of the
1.3 The Deterministic Infinite Horizon Ramsey Model
19
dynamic system (1.19) in the four different regions. Consider a time path starting in region A1 . According to the arrows, it either 1) 2) 3) 4) 5)
moves towards the graph of C t = f (K t ), enters the region A4 , converges towards the stationary solution (K ∗ , C ∗ ), ¯, converges towards K or enters the region A2 .
It can be shown (by a straightforward but somewhat tedious argument) that paths that move towards the graph of C t = f (K t ) hit that line in finite time and thus constitute no feasible paths. Likewise, all paths that originate in the region A4 violate the nonnegativity constraint on K t+1 in finite time since they can only move towards the border of the feasible region as designated by the arrows. Time paths that originate in A3 either 1) enter the region A4 , 2) converge towards the stationary solution, 3) or enter the region A2 . Consider a path starting in A2 . We already know that it cannot cross the abscissa. In addition, it cannot move into A1 . To see this, consider a point P0 := (K0 , C0 ) on the border — so that C0 = f (K0 ) − K0 — and a point P1 := (K1 , C0 ), K1 < K0 to the left of P0 (see Figure 1.3). Ct A1 P1
C0
P0 ∆1 ∆2
A2
K1 Figure 1.3 No Path Leaves the Region A2
K0
Kt
20
1 Basic Models
The length of the horizontal arrow that points from P1 to the right is given by ∆1 := ( f (K1 ) − C0 ) − K1 = f (K1 ) − f (K0 ) + K0 − K1 ,
which is less than the horizontal distance between P0 and P1 , ∆2 = K0 − K1 , ¯. since f (K1 ) − f (K0 ) < 0. Therefore, each path in A2 must converge to K Consider what happens along this path. Since the stock of capital increases in this region, K t will eventually exceed K ∗ , and the marginal product of ¯ ) < 1/β. Therefore, there exists a capital decreases from 1/β at K ∗ to f 0 (K point (K0 , C0 ) on that path so that the growth factor of the marginal utility of consumption implied by (1.19b) exceeds c > 1/β: gu0 :=
u0 (C1 ) 1 = > c, 0 0 u (C0 ) β f (K1 )
and there is a lower bound on u0 (C t ) given by u0 (C t ) ≥ c t u0 (C0 ).
This implies
lim β t u0 (C t )K t+1 ≥ lim (β c) t u0 (C0 )K t+1 = ∞,
t→∞
t→∞
¯ and lim t→∞ (β c) t = ∞. Thus, we have shown since lim t→∞ K t+1 = K ¯ violates the transversality condition. A similar that a path converging to K ¯ from the right. argument applies to all paths that approach K To summarize, the only paths left are those that start on the saddle path and that converge to the stationary solution at the point (K ∗ , C ∗ ). From the point of view of dynamic programming, this line is the graph of the policy function for consumption implied by the decision rule for the next-period capital stock via the resource constraint: C t = g(K t ) := f (K t ) − h(K t ). It relates the capital stock at each date t to the optimal choice of consumption at this date. Given the initial capital stock K0 , the optimal strategy is to choose C0 = g(K0 ) and then to iterate either over the Euler equations (1.19) or, equivalently, over the policy functions h and g. The problem we have to deal with is how to derive the function h. Unfortunately, the Ramsey model (1.9) admits an analytical solution of the policy function only in a few special cases, which we consider in the next subsection.13 13
We do not pretend that the following list completely exhausts the class of models with exact solution for the policy function.
1.3 The Deterministic Infinite Horizon Ramsey Model
21
1.3.5 Models with Analytical Solution LOGARITHMIC UTILITY AND LOG -LINEAR TECHNOLOGY. In Example 1.3.1, we assume a logarithmic utility function u(C t ) = ln C t and a net production function of the Cobb-Douglas type f (K t ) = K tα . If capital depreciates fully, that is, if δ = 1, this function also describes the gross output of the economy. In Appendix A.1, we show that the policy function of the next-period capital stock is given by K t+1 = h(K t ) := αβ K tα .
(1.20)
A multisector version of this model was used in one of the seminal articles on real business cycles by Long and Plosser (1983) to demonstrate that a very standard economic model without money and other trading frictions is capable of explaining many features of the business cycle. Radner (1966) is able to dispense with the assumption of 100% depreciation. He, instead, assumes that each vintage of capital investment is a separate factor of production in a log-linear technology. The disadvantage of his model is that output is zero if gross investment in any prior period is zero. Figure 1.4 displays the time path of the stock of capital implied by the solution of Example 1.3.1. We use α = 0.36 and β = 0.996 and set the initial capital stock K0 equal to one tenth of the stationary capital stock K ∗ = (αβ)1/(1−α) . It takes only a few periods for K0 to be close to K ∗ . 0.20
Kt
0.15 0.10 0.05
1
2
3
4
5
6
7
8
9
10
t Figure 1.4 Convergence of the Capital Stock in the Infinite-Horizon Ramsey Model
22
1 Basic Models
LOGARITHMIC UTILITY AND LOG -LINEAR ADJUSTMENT COSTS. A second class of models with logarithmic utility and a log-linear production function for gross output is provided in an article by Hercowitz and Sampson (1991). Instead of full depreciation, the authors assume adjustment costs of capital that give rise to the following transition function for the stock of next period capital: K t+1 = K t1−δ I tδ ,
(1.21)
where gross investment I t equals output Yt = K tα minus consumption C t : I t = Yt − C t .
We ask you in Problem 1.1.3 to show that the policy functions for the next period capital stock and for consumption are given by k
K t+1 = k0 K t 1 , 1/δ C t = 1 − k0 K tα , where the constants k0 and k1 are functions of the model’s parameters α, β, and δ. ISOELASTIC UTILITY AND CES-TECHNOLOGY. Benhabib and Rustichini (1994) provide a class of models where the utility function is not restricted to the logarithmic case but is given by the isoelastic function 1−η
u(C t ) =
Ct
−1
1−η
,
η > 0,
which approaches ln C t for η → 1. There are two vintages of capital, K1t and K2t , that produce output according to the constant elasticity of substitution function 1 1−ε 1−ε Yt = a1 K1t + a2 K2t + (1 − a1 − a2 )L 1−ε 1−ε ,
where L is the exogenously given amount of labor. The two vintages are related to each other via the equation K2t+1 = δK1t ; that is, capital lasts for two periods, and new vintages depreciate at the rate δ ∈ (0, 1). The economy’s resource constraint is given by
1.3 The Deterministic Infinite Horizon Ramsey Model
23
Yt = C t + K1t+1 . Assuming η = ε, the solution of this model is a constant savings rate s determined from 1 s = β a1 + β 2 a2 δ1−ε ε , so that the policy function for K1t+1 is given by K1t+1 = sYt . Antony and Maußner (2012) argue that this model can be extended and interpreted as a model with adjustment costs of capital that give rise to the transition equation 1 K t+1 = (1 − δ)K t1−ε + δI t1−ε 1−ε , with generalizes equation (1.21) to a constant elasticity of substitution (CES) function with elasticity 1/ε 6= 1. The production function in their model is 1 Yt = bL 1−ε + (1 − b)K t1−ε 1−ε , with L as the given amount of labor used in the production. The savings rate that solves this model is determined from s=
∞ X j=1
1ε (1 − b)(1 − δ)δ j−1 β j .
THE LINEAR-QUADRATIC MODEL. In Section 2.4, we consider a special class of models known as linear quadratic models or optimal linear regulator problems. The Ramsey model sketched next is an example of this class. We assume a quadratic current period utility function u(C t ) := u1 C t −
u2 2 C , 2 t
u1 , u2 > 0
and a linear (net) production function f (K t ) := AK t ,
A > 0.
24
1 Basic Models
With these functions, the system of difference equations (1.19) may be written as: K t+1 = AK t − C t , u1 1 1 C t+1 = 1− + Ct . u2 βA βA
(1.22a) (1.22b)
We use the method of undetermined coefficients explained in Appendix A.1 to find the policy functions. We guess that the policy function for consumption g is linear: C t = c1 + c2 K t . Substituting this function into (1.22b) provides: u1 1 1 c1 + c2 K t+1 = 1− + (c1 + c2 K t ), u2 βA βA u1 1 1 c1 + c2 (AK t − c1 − c2 K t ) = 1− + (c1 + c2 K t ). u2 βA βA The last equation holds for arbitrary values of K t if the constant terms on both sides sum to zero: u1 1 1 0 = c1 1 − c2 − − 1− , (1.23a) βA u2 βA and if the coefficients of the variable K t also sum to zero. This condition provides the solution for c2 : c2 = A −
1 , βA
(1.23b)
which can be used to infer c1 from equation (1.23a). Inserting the solution for c2 in equation (1.22a) delivers the policy function for capital: K t+1 = h(K t ) :=
1 K t − c1 . βA
If 1/β < A, the stock of capital approaches the stationary solution, K∗ =
c1 , 1/(βA) − 1
from any given initial value K0 , and consumption converges to C ∗ = u1 /u2 , so that the transversality condition (1.14) holds.
1.4 The Stochastic Ramsey Model
25
1.4 The Stochastic Ramsey Model 1.4.1 Stochastic Output In the Ramsey problem (1.9), everything is under the farmer’s control. This is obviously an overly optimistic picture of farming: Less rain during the summer causes harvest failure, whereas the right balance between rainfall and sunshine boosts crop growth. The amount of rainfall is outside the control of the farmer, and usually, he is unable to fully predict it accurately. The ensuing uncertainty turns the crop and, in turn, the resulting consumption into stochastic variables. As a consequence, we must restate the farmer’s decision problem in the framework of expected utility maximization. We illustrate the points involved in this task in Example 1.4.1. An in-depth treatment of the analytical framework that underlies stochastic control is beyond the scope of this book. We refer the interested reader to Acemo˘ glu (2009), Chapter 16 and Stokey et al. (1989), Part III. Example 1.4.1 Assume the farmer’s planning horizon is T = 1. His one-period utility function u(C t ) is strictly increasing in consumption C t . Output in period t = 0 is given by f (K0 ) and in period t = 1 by Z1 f (K1 ), where Z1 = Z with probability π and Z1 = Z¯ > Z with probability 1 − π. f (K t ) is strictly increasing in the capital stock K t . K0 is given. Since the farmer does not plan beyond t = 1, we already know that he will choose C1 = Z1 f (K1 ). Given his investment decision in the current period, K1 , his future consumption is a random variable with realizations C1 (Z) = Z f (K1 ) and C1 ( Z¯ ) = Z¯ f (K1 ). Hence, the farmer’s expected lifetime utility is E0 [u(C0 ) + βu(C1 )] := u( f (K0 ) − K1 ) + β πu(Z f (K1 )) + (1 − π)u( Z¯ f (K1 )) ,
where E0 denotes expectations as of period t = 0. The farmer chooses K1 to maximize this expression. Differentiating with respect to K1 and setting the resulting expression to zero yields the following first-order condition: u0 (C0 ) = β u0 (Z f (K1 ))Z f 0 (K1 )π + u0 ( Z¯ f (K1 )) Z¯ f 0 (K1 )(1 − π) . | {z } =:E0 [u0 (C1 )Z1 f 0 (K1 )]
This equation is the stochastic analog to the respective Euler equation in the deterministic case. It states that the utility loss from increased savings in the current period, u0 (C0 ), must be compensated by the discounted expected future utility increase.
26
1 Basic Models
We will consider the following stochastic infinite-horizon Ramsey model, which is also known as the stochastic growth model: ∞ X s max E t β u(C t+s ) Ct
s=0
s.t. K t+s+1 + C t+s ≤ Z t+s f (K t+s ) + (1 − δ)K t+s , 0 ≤ C t+s , 0 ≤ K t+s+1 , Kt , Zt
(1.24) s = 0, 1, . . . ,
given.
Note that from here on, f (K) := F (K, L) for fixed L denotes the gross value added, and we consider capital depreciation explicitly. We need to do so, since using our specification of the production function from (1.2), Z t f (K t ) would imply stochastic depreciation otherwise. Problem (1.24) differs from the deterministic model in two respects: First, output at each period t depends on not only the amount of capital K t but also the realization of a stochastic variable Z t capturing weather conditions. We assume that the farmer knows the amount of rainfall Z t at harvest time, when he must decide about consumption. Second, as a consequence, in the current period t, the farmer only chooses the current consumption C t . In the deterministic case, he receives no new information as the future unfolds. Therefore, he can safely determine consumption from the present to the very distant future. In technical terms, his decision problem is an open-loop control problem, as opposed to closed-loop control one in the stochastic case. Here, as in Example 1.4.1, future consumption is a stochastic variable from the perspective of the current period. Thus, the farmer does better if he postpones the decision on period t consumption until this period t. As a consequence of the uncertainty with respect to consumption, the farmer aims to maximize the expected value of his lifetime utility. More specifically, the notation E t [·] denotes expectations with respect to the probability distribution of the sequence of random variables {C t+s }∞ s=0 conditional on information available at period t. The fact that we use the mathematical expectations operator means that agents use the true — or objective as opposed to subjective — probability distribution of the variables they have to forecast. Since the seminal article of Muth (1961) economists have used the term ‘rational expectations’ to designate this hypothesis on expectations formation. The solution of the deterministic, infinite-horizon Ramsey model in terms of a time-invariant policy function rests on the recursive structure
1.4 The Stochastic Ramsey Model
27
of the problem that in turn is implied by the TAS utility function. To preserve this structure in the context of a stochastic model requires us to restrict the class of probability distributions to stochastic processes that have the Markov property. If you are unfamiliar with Markov processes we recommend consulting Section 16.4, where we sketch the necessary definitions and tools. We proceed to derive the first-order conditions that govern the model’s evolution over time. As in the previous section, we obtain these conditions via two tracks: the Karush-Kuhn-Tucker (KKT) approach and stochastic dynamic programming.
1.4.2 Stochastic Euler Equations First-order conditions for the stochastic Ramsey model (1.24) can be derived in a manner analogous to that in the deterministic case. Consider the following Lagrangian function: L = Et
§X ∞ s=0
β s u(C t+s ) + µ t+s C t+s + ω t+s+1 K t+s+1
+ λ t+s (Z t+s f (K t+s ) + (1 − δ)K t+s − C t+s − K t+s+1 )
ª
.
Since the expectations operator is a linear operator, we can differentiate the expression in curly brackets with respect to C t and K t+1 (see Example 1.4.1). This delivers ∂L = E t {u0 (C t ) − λ t + µ t } = 0, (1.25a) ∂ Ct ∂L = E t {−λ t + ω t+1 + βλ t+1 (1 − δ + Z t+1 f 0 (K t+1 ))} = 0, ∂ K t+1 (1.25b) 0 = λ t (Z t f (K t ) + (1 − δ)K t − C t − K t+1 ),
(1.25c)
0 = µt Ct ,
(1.25d)
0 = ω t+1 K t+1 .
(1.25e)
Since, as in Example 1.4.1, C t , K t+1 , and hence the multipliers λ t , µ t , and ω t+1 are nonstochastic, we can replace the first condition with u0 (C t ) = λ t − µ t
(1.26a)
28
1 Basic Models
and the second with λ t = βE t λ t+1 [1 − δ + Z t+1 f 0 (K t+1 )] + ω t+1 .
(1.26b)
Note that both conditions must hold for arbitrary values of the time index t.14 We can therefore use (1.26a) and its period t + 1 equivalent to eliminate λ t and λ t+1 in condition (1.26b). An interior solution with strictly positive consumption and capital must therefore satisfy the stochastic analog to the Euler equation (1.13):15 1 =βE t
u0 (Z t+1 f (K t+1 ) + (1 − δ)K t+1 − K t+2 ) u0 (Z t f (K t ) + (1 − δ)K t − K t+1 )
(1.27)
0
× (1 − δ + Z t+1 f (K t+1 )).
In addition to the stochastic Euler equation (1.27), there is also the stochastic analog of the transversality condition (1.14), namely lim β s E t λ t+s K t+s+1 = 0,
s→∞
(1.28)
that provides a boundary condition for the solution to (1.27). Kamihigashi (2005) shows that this condition is a necessary optimality condition in the following cases: 1) the utility function u(C t ) is bounded, 2) the utility function is logarithmic u(C t ) = ln C t , 1−η 3) the utility function is of the form u(C t ) = C t /(1−η), η ∈ [0, ∞)\{1} and lifetime utility at the optimum is finite.
1.4.3 Stochastic Dynamic Programming As in the deterministic Ramsey model, there is a dynamic programming approach to characterize solutions of the stochastic Ramsey model (1.24). The value function v(K, Z) is now defined as the solution to the following stochastic functional equation: 14
To see this, replace t by t+1 in the statement of the Lagrangian function, and differentiate with respect to C t+1 and K t+2 . The respective FOC are equal to equations (1.25) with t + 1 in place of t. 15 Note that C t > 0 implies µ t = 0 via equation (1.25d) and that K t+1 implies ω t+1 = 0 via condition (1.25e).
1.4 The Stochastic Ramsey Model
v(K, Z) =
max
0≤K 0 ≤Z f (K)+(1−δ)K
29
u(Z f (K) + (1 − δ)K − K 0 ) + βE v(K 0 , Z 0 )|Z ,
where expectations are conditional on the given realization of Z and where a prime denotes next-period values. In the case of a Markov chain with realizations [z1 , z2 , . . . , zn ] and transition matrix P = (pi j ), the expression E v(K 0 , Z 0 )|Z is given by
0
0
E v(K , Z )|zi =
n X j=1
pi j v(K 0 , z j ),
and in the case of the continuous-valued Markov process with conditional probability density function π(z, Z 0 ) over the interval [a, b], it is
0
0
E v(K , Z )|z =
Z
b a
v(K 0 , Z 0 )π(z, Z 0 ) d Z 0 .
Some sophisticated mathematics are required to prove the existence and to find the properties of the value function and the associated policy function K 0 = h(K, Z). We refer the interested reader to Acemo˘ glu (2009), Chapter 16 and Stokey et al. (1989), Chapter 9 and proceed under the assumption that both the value and the policy function exist and are sufficiently differentiable with respect to K. Under this assumption, it is easy to use the steps taken on page 16 to show that the dynamic programming approach also delivers the stochastic Euler equation (1.27). We leave this exercise to the reader (see Problem 1.1.5). Example 1.4.2 extends Example 1.3.1 to the stochastic case. As in this example, there is an analytical solution for the policy function h. Example 1.4.2 Let the one-period utility function u and the production function f be given by u(C) := ln C, f (K) := K α ,
α ∈ (0, 1),
respectively. In Example 1.3.1, we find that K 0 is directly proportional to K α . Therefore, let us try K t+1 = h(K t , Z t ) := AZ t K tα
30
1 Basic Models
as a policy function with the unknown parameter A. If this function solves the problem, it must satisfy the stochastic Euler equation (1.27). To prove this assertion, we replace K t+1 in equation (1.27) with the rhs of the previous equation. This gives (1 − A)Z t K tα αβ α α−1 1 = βE t αZ [AZ K ] = . t+1 t t (1 − A)Z t+1 [AZ t K tα ]α A If we put A = αβ, the function h(Z t , K t ) = αβ Z t K tα indeed satisfies the Euler equation and is thus the policy function we are looking for.
The solution of the deterministic Ramsey model is a time path for the capital stock. In the stochastic case, K 0 = h(K, Z) is a random variable, since Z is random. The policy function induces a time-invariant probability distribution over the space of admissible capital stocks. This distribution is the counterpart to the stationary capital stock K ∗ in the deterministic Ramsey model (1.9). We illustrate this point with the aid of Example 1.4.2 for α = 0.36 and β = 0.996. We assume that Z t has a uniform distribution over the interval [0.95, 1.05] and employ a random number generator to obtain independent draws from this distribution. Starting with K ∗ = (αβ)1/(1−α) , we then iterate over K t+1 = αβ Z t K tα to obtain a path with one million observations on K t . We divide the interval between the smallest and the highest value of K attained along this path into 150 nonoverlapping intervals and count the number of capital stocks that lie in each interval. Figure 1.5 displays the result of this exercise. Since it rests on a sample from the distribution of K, it provides an approximate
Percent
1.00 0.75 0.50 0.25 0.00
0.19
0.19
0.2
0.2
0.21
0.21
0.22
Capital Stock Figure 1.5 Stationary Distribution of the Capital Stock in the Stochastic Infinite-Horizon Ramsey Model
0.22
1.5 Labor Supply, Growth, and the Decentralized Economy
31
picture of the density function of the capital stock implied by the model of Example 1.4.2. Note that despite the fact that each small subinterval S ⊂ [0.95, 1.05] of length l has the same probability of l/0.1, the distribution of the capital stock is not uniform. To understand this, note that for each fixed Z ∈ [0.95, 1.05] the capital stock approaches K(Z) = (αβ Z)1/(1−α) . Since the mean of the uniform distribution over [0.95, 1.05] is Z = 1, neither very small nor very high values of K have a high chance of being realized. Note in particular that the mean of the distribution E(K t ) is in general not equal to K ∗ .
1.5 Labor Supply, Growth, and the Decentralized Economy 1.5.1 Substitution of Leisure Thus far, we have taken the labor supply as exogenous. However, it is well-known that there are considerable employment fluctuations over the business cycle. In the context of our farming example, variations in labor input may arise from shocks to labor productivity if the farmer values both consumption and leisure. To allow for that case, we include leisure in the one-period utility function. Leisure is the farmer’s time endowment, which we normalize to 1, minus his working hours L. Thus, we may state the one-period utility function now as u(C, 1 − L).
(1.30)
In the following subsection, we will ask what kinds of restrictions we must place on u in addition to the usual assumptions with respect to concavity and monotonicity when we deal with a growing economy. Before we proceed, we briefly consider what we can expect in general from including leisure into the one-period utility function. Assume that in today’s marginal product of labor, the farmer observes an increase that he or she considers short-lived. How will he or she react? In the current period, the shock increases the farmer’s opportunity set, since at any given level of labor input the farmer’s harvest will be higher than before the shock. At the same time the shock changes the relative price of leisure: The farmer loses more output for each additional unit of leisure he or she desires. The overall effect of the shock on the intratemporal substitution between labor and consumption depends upon the relative size of the associated income and substitution effect. If leisure and
32
1 Basic Models
consumption are normal goods, the farmer wants both more consumption and more leisure (income effect). However, since leisure is more costly than before the shock, the farmer also wants to substitute consumption against leisure (substitution effect). In the intertemporal setting we are considering here, there is an additional, intertemporal substitution effect. The shock raises the current reward for an additional hour of work vis-à-vis the future return. Consequently, the farmer will want to work more now and less in the future. He or she can achieve this goal by increasing today’s savings and spending the proceeds in subsequent periods. Thus, investment serves as a vehicle to the intertemporal substitution of consumption and leisure.
1.5.2 Growth and Restrictions on Technology and Preferences LABOR-AUGMENTING TECHNICAL PROGRESS. When we refer to economic growth, we think of increases in output at given levels of input brought about by increases in technological knowledge. This kind of technological progress is called disembodied as opposed to embodied progress that operates via improvements in the quality of the factors of production. Disembodied technological progress simply shifts the production function outward. Equivalently, we may think of it as if it redoubled the available physical units of labor and capital. For instance, if L is the amount of physical or raw labor and A its efficiency level, effective labor is AL. Using this concept, the output is given by Yt = Z t F (B t K t , A t L t ), where the efficiency factors A t and B t as well as the productivity shock Z t are exogenously given time series or stochastic processes. We continue to assume that the production function F (·) has positive but diminishing marginal products, that both factors of production are essential, and that F (·) exhibits constant returns to scale. Formally, we have the following:16 1) Fi (BK, AL) > 0 > Fii (BK, AL) for i ∈ {BK, AL}, 2) F (0, AL) = 0 and F (BK, 0) = 0, 3) λY = F (λBK, λAL). 16
Here and in the following, for any function F (x 1 , . . . , x n ) the expression Fi denotes the first partial derivative of F with respect to x i , and Fi j denotes the derivative of Fi (x 1 , . . . , x n ) with respect to x j .
1.5 Labor Supply, Growth, and the Decentralized Economy
33
In Section 1.3.4, we have seen that the solution to the deterministic, infinite-horizon Ramsey model approaches a stationary equilibrium. There is an appropriate concept of stationarity in models of growth, the socalled balanced growth path. Referring to Solow (1988), p. 4, we define a balanced growth path by two requirements: 1) output per working hour grows at a constant rate, 2) and the share of net savings in output is constant. The motivation for this definition has two different sources. First, from the empirical perspective, the balanced growth path replicates the broad facts about the growth of advanced industrial economies.17 Second, from the theoretical perspective, the balanced growth path allows to define variables that relative to their trend path are stationary, such as the unscaled variables in no-growth models. Therefore, the techniques used to study stationary economies remain valid. In Appendix A.2, we show that for a balanced growth path to exist technical progress must be of the labor-augmenting type; i.e., B t ≡ 1 ∀t. As a consequence, we specify the production function as Yt = Z t F (K t , A t L t ).
(1.31)
TREND VERSUS DIFFERENCE STATIONARY GROWTH. The specification of the production function (1.31) leaves two possible modeling choices for the process governing the evolution of the efficiency factor of raw labor. If we consider growth a deterministic process, the efficiency factor A t grows at a given and constant growth factor a > 1: A t+1 = aA t .
(1.32)
Variations around the long-run path are induced by the stochastic process {Z t+s }∞ s=0 . For these variations to be temporary and not permanent, the process that governs Z t must be covariance stationary. This requires (see also Section 16.3): 1) that the unconditional mean E(Z t ) = Z is independent of time 2) and that the covariance between Z t and Z t+s , cov(Z t , Z t+s ) = E[(Z t − Z)(Z t+s − Z)], depends upon the time lag s but not on time t itself.
17
See Solow (1988), pp. 3ff.
34
1 Basic Models
To find the long-run behavior of output, assume that Z t is equal to its unconditional mean Z¯ . Since F has constant returns to scale,we may write Yt = A t Z¯ F (K t /A t , L t ).
(1.33)
Note that according to our utility function (1.30) the labor supply L t is bounded above by 1. The economy’s resource restriction Yt = C t + K t+1 − (1 − δ)K t enforces that output Yt , consumption C t , and capital K t must eventually grow at the same rate. This common rate must be the rate of growth of A t , i.e., a − 1. The assumption of deterministic growth has obvious empirical implications: Output is a trend stationary stochastic process; i.e., when we subtract a linear trend from log-output, the resulting time series is a covariance stationary stochastic process. In an influential paper, Nelson and Plosser (1982) question this implication. They provide evidence that major macroeconomic aggregates are better modeled as difference stationary stochastic processes. A stochastic process {x t } t∈Z is difference stationary if the process {(x t+1 − x t )} t∈Z is a covariance stationary stochastic process. In the context of our neoclassical production function we obtain this result if we set Z t ≡ 1 and let a difference stationary Markov process govern the evolution of the efficiency level of labor. For instance, we may assume A t to follow the process A t = A t−1 aeε t , ε t iid N − 21 σ2 , σ2 , a > 1. (1.34) Under this process, the growth factor of the efficiency level of labor, A t /A t−1 , fluctuates around its long-run mean of a, and the first difference of log-output, ln Yt − ln Yt−1 , is covariance stationary. To see this, use (1.33), and compute the log-difference ln Yt − ln Yt−1 = ln A t − ln A t−1 + ln [F (K t /A t , L t )/F (K t−1 /A t−1 , L t−1 )] , = ln a + ε t + ln [F (K t /A t , L t )/F (K t−1 /A t−1 , L t−1 )] .
The first two terms on the rhs of this expression constitute a random walk with drift ln a. The rightmost term, though also random, is stationary because hours are bounded and the resource constraint enforces that output and capital cannot depart arbitrarily far from each other.
1.5 Labor Supply, Growth, and the Decentralized Economy
35
RESTRICTIONS ON PREFERENCES. The restriction to labor-augmenting technical progress is not sufficient to guarantee the existence of a balanced growth path when the labor supply is endogenous. To develop this argument, we focus on the long-run and ignore stochastic elements. The efficiency factor of labor, therefore, always grows at the rate a, and Z ≡ 1. The distinction between trend and difference stationary growth vanishes. Using the one-period utility function (1.30), the farmer’s maximization problem is ∞ X
max
{C t+s ,L t+s }∞ s=0
s=0
s.t.
K t+s+1 + C t+s ≤ 0≤ 1≥ 0≤ Kt
β s u(C t+s , 1 − L t+s ) (1.35) F (K t+s , A t+s L t+s ) + (1 − δ)K t+s , C t+s , s = 0, 1, . . . , L t+s ≥ 0, K t+s+1 ,
given.
Since we are interested in a long-run solution with positive consumption and leisure, we will ignore the nonnegativity restrictions and the upper bound on labor in setting up the respective Lagrangian: L =
∞ X s=0
β s u(C t+s , 1 − L t+s ) + Λ t+s F (K t+s , A t+s L t+s ) + (1 − δ)K t+s − C t+s − K t+1 .
Differentiating this expression with respect to C t , L t , and K t+1 provides the following set of first-order conditions: 0 = u1 (C t , 1 − L t ) − Λ t ,
0 = −u2 (C t , 1 − L t ) + Λ t F2 (K t , A t L t )A t ,
0 = −Λ t + βΛ t+1 (1 − δ + F1 (K t+1 , A t+1 L t+1 )).
(1.36a) (1.36b) (1.36c)
Conditions (1.36a) and (1.36b) imply that the marginal rate of substitution between consumption and leisure, u2 /u1 , equals the marginal product of labor: u2 (C t , 1 − L t ) = A t F2 (K t , A t L t ). u1 (C t , 1 − L t )
(1.37)
36
1 Basic Models
Conditions (1.36a) and (1.36c) yield u1 (C t , 1 − L t ) = β(1 − δ + F1 (K t+1 , A t+1 L t+1 )). u1 (C t+1 , 1 − L t+1 )
(1.38)
Consider the rhs of this equation. Since F is homogenous of degree one, F1 is homogenous of degree zero; i.e., F1 (K t+1 , A t+1 L t+1 ) = F1 (K t+1 /A t+1 , L t+1 ). We have already seen that on a balanced growth path, both L t+1 and K t+1 /A t+1 are constants. Thus, in the long-run, the rhs of equation (1.38) is constant and the lhs must also be. Now consider the resource constraint K t+1 = Yt − C t + (1 − δ)K t . If capital and output grow at the common rate a − 1, consumption must grow at the same rate, since otherwise the growth factor of capital g K , g K :=
K t+1 Yt Ct = − + (1 − δ), Kt Kt Kt
is not constant. If consumption grows at the rate a − 1, the marginal utility of consumption must fall at a constant rate. As we show in Appendix A.2, this restricts the one-period utility function u to the class of constantelasticity functions with respect to consumption. Further restrictions derive from condition (1.37). Since the marginal product of labor increases in the long-run at the rate a − 1, there must be exactly off-setting income and substitution effects with respect to the static labor supply decision. As we demonstrate in Appendix A.2, we must restrict the one-period utility function (1.30) to C 1−η v(1 − L) if η 6= 1, u(C, 1 − L) = (1.39) ln C + v(1 − L) if η = 1. The function v must be chosen so that u(C, 1 − L) is concave. Remember that a function is concave, if and only if uii ≤ 0 and (u11 u22 − u212 ) ≥ 0, and that it is strictly concave, if u11 < 0 and (u11 u22 − u212 ) > 0.18 . For example, in the parameterization of u used in Example 1.6.1 below, the restriction of η to η > θ /(1 + θ ) implies that u is strictly concave. 18
See, e.g.,Takayama (1985) Theorem 1.E.13.
1.5 Labor Supply, Growth, and the Decentralized Economy
37
TREND IN HOURS PER CAPITA. The restriction of preferences being consistent with constant working hours per capita in the long-run was mainly motivated by the observation that post World War II US hours per capita exhibit no trend. In a recent study, Boppart and Krusell (2020) challenge this widely held belief. From a panel of 25 industrialized countries observed over the period 1870-1998, they estimate an average downward trend of hours per capita of 0.46 percent per annum. They also argue that the long-run trend in US hours has been intermitted in the postwar period by tax policies, the baby boom, stagnant real wages, and women’s increased labor force participation. How do we have to modify preferences to be consistent with a given trend in hours? In their study of the German business cycle, Maußner and Spatz (2006) ψ follow Lucke (1998), p. 76 and assume that l t := A t L t instead of raw hours L t are an argument of the instantaneous utility function u(C t , 1 − l t ). For a given trend in hours g L := L t+1 /L t , they determine the parameter ψ such that l t is constant in the long-run. Thus, ψ must solve the equation
A t+1 g l := At
ψ
L t+1 ≡ aψ g L = 1 Lt
(1.40)
for a given trend in hours g L and labor-augmenting technical progress a. Now, consider the production function (1.31). Constant returns to scale allow us to write Yt = (A t L t )Z t F (K t /(A t L t ), 1). In the long-run, if both Z t and K t /(A t L t ) are constant, output grows at the factor (1.40)
g Y = a g L = a1−ψ . As we have argued above, the long-run growth rate of per capita consumption C t must be equal to g Y . Therefore, ψ 1−ψ
h t := C t
Lt
will also stay constant, because ψ 1−ψ
gh = g C
g L = a1−ψ
ψ 1−ψ
a−ψ = a0 = 1.
With h t in place of L t , the instantaneous utility function u(C t , 1 − h t ) must still meet two requirements:
38
1 Basic Models
1) the first-order condition (1.37) implies that the marginal rate of substitution between consumption and leisure u2 (C t , 1 − h t )/u1 (C t , 1 − h t ) must grow at the rate of technical progress a − 1; 2) the first-order condition (1.38) implies that the marginal rate of substitution between consumption in two adjacent periods u1 (C t , 1 − h t )/βu1 (C t+1 , 1 − h t+1 ) must be constant. Boppart and Krusell (2020) extend the class of utility functions considered in (1.39). They prove a remarkable result: If u(C, L) is twice continuously differentiable and satisfies 1) and 2), it must be of the form u(C, L) =
(C v(h))1−η − 1 , 1−η
ψ
h = C 1−ψ L,
(1.41)
where v(h), is an arbitrary, twice continuously differentiable function of h. TRANSFORMATION TO STATIONARY VARIABLES. Given the restrictions on technology and preferences, it is always possible to choose new variables that are constant in the long-run. As an example, consider the deterministic Ramsey model (1.35). Assume η 6= 1 in (1.39) and a deterministic growth of the efficiency level of labor according to (1.32). The static labor supply condition (1.37) can then be written as v 0 (1 − L t ) Ct = F2 (K t /A t , L t ), (1 − η)v(1 − L t ) A t
(1.42)
and the intertemporal condition (1.38) is: −η
C t v(1 − L t ) −η
C t+1 v(1 − L t+1 ) =
=
−η
(aA t )η C t v(1 − L t ) η
−η
A t+1 C t+1 v(1 − L t+1 )
aη (C t /A t )−η v(1 − L t ) (C t+1 /A t+1 )−η v(1 − L t+1 )
(1.43)
= β(1 − δ + F1 (K t+1 /A t+1 , L t+1 )).
Since F is homogenous of degree one, we can transform the resource constraint to K t+1 = F (K t /A t , L t ) + (1 − δ)(K t /A t ) − (C t /A t ). A t+1 /a
(1.44)
Equations (1.42) through (1.44) constitute a dynamic system in labor L t and the new variables c t := C t /A t and k t = K t /A t . Their stationary values L, c and k are found as the solution to the system of three equations
1.5 Labor Supply, Growth, and the Decentralized Economy
39
(1 − η)v(1 − L) F2 (k, L), v 0 (1 − L) 1 = β a−η (1 − δ + F1 (k, L)), c=
0 = F (k, L) − (1 − δ − a)k − c. Note, that we can derive the efficiency conditions (1.42) through (1.44) from solving the problem max ∞
{c t+s ,L t+s }s=0
∞ X s=0
1−η β˜s c t+s v(1 − L t+s )
s.t.
c t+s ≤ F (k t+s , L t+s ) + (1 − δ)k t+s − ak t+s+1 for s = 0, 1, . . . , k t given, with discount factor β˜ := β a1−η in the stationary decision variables c t := C t /A t and k t+1 := K t+1 /A t+1 .
1.5.3 Parameterizations of Utility and Important Elasticities In this section, we consider common parameterizations of the period utility function u(·) and related elasticities that capture important aspects of the household’s preferences and determine the dynamic properties of DSGE models. Three aspects are important: 1) the household’s attitudes with regard to risk bearing, 2) the household’s willingness to substitute consumption between different periods, 3) and the household’s willingness to substitute between consumption and leisure within a given period. We consider risk bearing first and intertemporal substitution second. To keep the argument simple, we focus on period utility functions with consumption as single argument. In the final paragraph, where we consider consumption and labor, we generalize the results. RISK AVERSION. Consider the period utility function u : [0, ∞] → R introduced in Section 1.3. Arrow (1970), p. 94 and Pratt (1964), p. 122 introduced the coefficient of relative risk aversion defined by
40
1 Basic Models
R(C) := −
Cu00 (C) . u0 (C)
It is a measure of the curvature of the utility function. We explain its relation to risk aversion in Figure 1.6. v(C) u(C)
u(C2 )
¯ u(C) E(u)
u(C1 )
C1
C˜
C¯
C2
C
Figure 1.6 Risk Aversion
Consider a gamble over consumption C. The gamble offers the household consumption C1 with probability p ∈ (0, 1) and C2 > C1 with probability 1 − p. Expected consumption is C¯ = pC1 + (1 − p)C2 . The expected utility of the gamble is ¯ E(u) := pu(C1 ) + (1 − p)u(C2 ) < u(C).
Since u is a concave function, the household would prefer to receive C¯ with ˜ at which the u(C) ˜ = E(u), is the certainty over the gamble. The point C, certainty equivalent of the gamble. The household is indifferent between receiving C˜ with certainty and the gamble. Accordingly, the household would be willing to give up at most C¯ − C˜ to avoid the gamble. In the case of the less curved utility function v, drawn in black in Figure 1.6, the certainty
1.5 Labor Supply, Growth, and the Decentralized Economy
41
¯ In the equivalent of the gamble is to the right of C˜ and, thus, closer to C. 00 case of a linear utility function, u (C) = 0, the certainty equivalent and expected consumption coincide, and the household is indifferent between the gamble and the certain outcome and said to be risk-neutral. Therefore, larger values of the measure R(C) are associated with more risk aversion and a larger willingness C¯ − C˜ to avoid risky gambles. A frequently encountered parameterization of u(C) is the constant elasticity function u(C) :=
C 1−η − 1 , η ≥ 0. 1−η
(1.45)
The constant −1/(1 − η) ensures that this definition is valid for all nonnegative values of the parameter R(C) = η.19 INTERTEMPORAL ELASTICITY OF SUBSTITUTION. Consider the additively separable intertemporal utility function U(C1 , C2 ) := u(C1 ) + βu(C2 ), β > 0. The marginal rate of substitution (M RS) between consumption at t = 1 ¯ = U(C1 , C2 ), and t = 2, i.e., the slope of the indifference curve defined by U is given by M RS := −
u0 (C1 ) d C2 = . d C1 βu0 (C2 )
The intertemporal elasticity of substitution (IES) is defined as the percentage change of C2 /C1 relative to the percentage change of M RS:20 IES =
d ln(C2 /C1 ) . d ln M RS
(1.46)
19
Consider the case η = 1 so that u(C) = 0/0 for all C ∈ [0, ∞]. Using l’Hôpital’s rule, we find d(C 1−η −1) d(e(1−η) ln C −1) 1−η C −1 dη dη lim = d(1−η) = = ln C. η→1 1 − η −1 dη η=1
20
η=1
See, e.g., Barro and Sala-i Martin (2004), p. 91 for the continuous time case.
42
1 Basic Models
It is not difficult to compute this elasticity at the point C = C2 = C1 and to establish that it is equal to the inverse of R(C):21 IES = −
u0 (C) . Cu00 (C)
Accordingly, in the case of the isoelastic function (1.45) the IES is equal to 1/η. CONSUMPTION AND LABOR. Let u : R2+ → R, (C, L) 7→ u(C, L) denote the household’s period utility over consumption C and working hours L. We assume that it is increasing in its first argument, decreasing in its second, twice differentiable, and strictly concave. The static labor supply condition equates the real wage w with the marginal rate of substitution between consumption and hours (see also (1.37)): wu1 (C, L) = −u2 (C, L),
(1.47)
where the indices denote partial derivatives with respect to argument i = 1, 2. The Frisch elasticity of hours with respect to the real wage is defined as the proportionate change in hours relative to a proportionate change of the real wage, holding the marginal utility of consumption constant:22 u1 u11 d ln(L) w F ε L,w := =− . (1.48) d ln w d u1 =0 u22 u11 − u12 u21 L
Strict concavity of u implies ε FL,w > 0. Frequently encountered parameterizations of u are the following ones: 1−η C(1 − L)θ −1 u(C, L) := , (1.49a) 1−η
21
Using that C2 = g(C1 ) on any indifference curve, the elasticity is found from differentiating both C2 /C1 = g(C1 )/C1 and M RS = u0 (C1 )/(β g(C1 )) with respect to C1 , plugging the resulting into (1.46), and evaluating this expression at the point C = C1 = C2 . 22 The rhs of (1.48) follows from combining d u1 = u11 d C + u12 d L = 0 ⇒ d C = −
u12 u11
with u1 d w + u11 d C + u12 d L = −u21 d C − u22 d L.
1.5 Labor Supply, Growth, and the Decentralized Economy
u(C, L) :=
C γ (1 − L)1−γ
u(C, L) :=
C−
1−η
1−η
−1
1−η ν0 1+ν1 N 1−ν1
,
−1
1−η 1−η C (1 − L)1−γ1 u(C, L) = + γ0 . 1−η 1 − γ1
43
(1.49b)
,
(1.49c) (1.49d)
(1.49a) is introduced in King et al. (1988a) and (1.49c) in Greenwood et al. (1988). Note that both (1.49c) and (1.49d) are not of the form presented in (1.39) and, thus, not suitable for growth models. It is easy to establish that the Frisch elasticity implied by these functional forms is given by η 1− L , L η − θ (1 − η) 1 − L 1 − γ(1 − η) = , L η 1 = , ν1 1− L 1 = . L γ1
F εC,L =
(1.50a)
F εC,L
(1.50b)
F εC,L F εC,L
(1.50c) (1.50d)
Swanson (2012) considers risk aversion in the presence of labor in the utility function. The following formula for the coefficient of relative risk aversion applies in the steady state equilibrium and for given time endowments of 1: −u11 + λu12 C + w(1 − L) , u1 1 + wλ u1 u12 − u2 u11 λ := . u1 u22 − u2 u12
R(C, L) =
(1.51)
For the parameterizations (1.49a) and (1.49b) formula (1.51) implies: R(C, L) = η − θ (1 − η), R(C, L) = η.
(1.52a) (1.52b)
Accordingly, functions (1.49a) and (1.49b) are characterized by a constant relative risk aversion (CRRA). Swanson (2012) also shows that the IES is the reciprocal of the coefficient of relative risk aversion, if and only if λ = ( ¯L − L)/C. This condition holds for both (1.49a) and (1.49b).
44
1 Basic Models
1.5.4 The Decentralized Economy Thus far, for ease of exposition, we have dealt with a single agent. For each of the Ramsey models considered above, however, it is straightforward to develop a model of a decentralized economy whose equilibrium allocation coincides with the equilibrium allocation of the respective Ramsey model. Since the latter is a utility maximizing allocation, the decentralized equilibrium is optimal in the sense of Pareto efficiency. In the static theory of general equilibrium with a finite-dimensional commodity space, the correspondence between a competitive equilibrium and a Pareto efficient allocation of resources is stated in the Two Fundamental Theorems of Welfare Economics.23 The infinite-horizon Ramsey model has infinitely many commodities. Nevertheless, as shown by Debreu (1954), it is possible to extend the correspondence between competitive equilibrium and Pareto efficiency to infinite-dimensional commodity spaces. We illustrate the relation between efficiency and intertemporal equilibrium and reformulate the planing problem presented in (1.35) in terms of a decentralized economy. In this economy firms and households interact with each other in three markets. FIRMS. Consider a single firm i that employs the technology Yi = F (Ki , AL i ), where A denotes the economy wide level of labor-augmenting technical progress. Yi , Ki , and L i are the firm’s output, capital and labor input, respectively.24 The firm operates under perfect competition. It rents capital services at the rate r and buys effective labor services AL i at the real wage w. The firm maximizes profits Di := Yi − r Ki − wAL i = F (Ki , AL i ) − r Ki − wAL i .
The first-order condition with respect to capital services is given by r = F1 (Ki , AL i ) = F1 (Ki /(AL i ), 1).
(1.53)
Since the firm has to pay the same rental rate r as all other firms, this condition uniquely determines the scaled capital-labor ratio25 23
For a statement, see, e.g., Mas-Colell et al. (1995) pp. 545ff. or Starr (1997) pp. 144ff. For simplicity we suppress the time index at the moment. 25 Uniqueness follows from the fact that the marginal product of capital F1 (·) is strictly decreasing in Ki . 24
1.5 Labor Supply, Growth, and the Decentralized Economy
45
˜k := Ki . AL i This allows us to write the profit function of the firm as Di := Yi − r Ki − wAL i = AL i F (˜k, 1) − r ˜k − w = AL i F (˜k, 1) − F1 (˜k, 1)˜k − w = AL i F2 (˜k, 1) − w .
The last line employs Euler’s theorem for homogenous functions.26 Note that the term in brackets is positive (negative) if the marginal product of labor F2 (·) exceeds (falls below) the real wage w. In the first case, the firm could earn unbounded profits, if it hired an unbounded number of labor. In the second case, the firm would shut down. For a given finite supply of labor L, equilibrium on the labor market therefore requires that the real wage equals the marginal product of labor. Equilibrium on the market for capital holds, if condition (1.53) is met. Note that these conditions follow from two assumptions: 1) constant returns to scale and 2) perfect competition; i.e., firm i is a price taker on both the factor markets and the market for its output. There is no need to assume a certain number of firms that populate the economy. In fact, there is no way to determine the number of firms: In equilibrium, profits in this economy are always equal to zero, whether a single firm or hundreds of firms are operative. We can therefore drop the firm index i in (1.53), reintroduce the time index t, and state the conditions for equilibrium on both factor markets as w t = F1 (K t , A t L t ), r t = F2 (K t , A t L t ).
(1.54)
HOUSEHOLDS. Our example economy is populated by a continuum of households of mass 1; i.e., each individual household is assigned a unique real number h on the interval [0, 1]. All households have the same oneperiod utility function and the same period t capital stock. When they face 26
Since F (Ki , AL i ) has constant returns to scale, this theorem states (see, e.g., Sydsæter et al. (1999), p. 28, equation 4.21) Yi = F (Ki , AL i ) = F1 (Ki , AL i )Ki + F2 (Ki , AL i )AL i . .
46
1 Basic Models
a given path of output and factor prices, they choose identical sequences of consumption and labor supply. Let x(h) denote an arbitrary decision variable of household h ∈ [0, 1], and put x(h) = x¯ ∀h ∈ [0, 1].
Since
Z x¯ =
Z
1 0
x(h) d h =
1
x¯ d h, 0
aggregate and individual variables are identical. As a consequence, we can consider a representative member from [0, 1] without explicit reference to his or her index h. This representative household supplies labor services L t with efficiency factor A t and capital services K t at the given real wage w t and rental rate of capital r t . He or she saves in terms of capital, which depreciates at the rate δ ∈ (0, 1]. Thus, his or her budget constraint reads:27 K t+1 − K t ≤ w t A t L t + (r t − δ)K t − C t .
(1.55)
The household seeks time paths of consumption and labor supply that maximize its lifetime utility U t :=
∞ X s=0
β s u(C t+s , 1 − L t+s ),
β ∈ (0, 1),
(1.56)
subject to (1.55) and the given initial stock of capital K t . From the Lagrangian of this problem, L =
∞ X s=0
β s u(C t+s , 1 − L t+s )
+ β s Λ t+s (w t+s A t+s L t+s + (1 − δ + r t+s )K t+s − C t+s − K t+s+1
we derive the following first-order conditions: u1 (C t , 1 − L t ) = Λ t ,
u2 (C t , 1 − L t ) = Λ t w t ,
Λ t = βΛ t+1 (1 − δ + r t+1 ).
27
(1.57a) (1.57b) (1.57c)
Here, we use the fact that the firms’ profits are zero. In general, we must include the profits that firms distribute to their shareholders.
1.6 Model Calibration and Evaluation
47
Using the factor market equilibrium conditions (1.54) to substitute for w t and r t+1 and applying the Euler’s theorem to F (·), Yt = F (K t , A t L t ) = F1 (K t , A t L t )K t + F2 (K t , A t L t )A t L t equations (1.57) reduce to u2 (C t , 1 − L t ) = A t F2 (K t , A t L t ), u1 (C t , 1 − L t ) u1 (C t , 1 − L t ) = β(1 − δ + F1 (K t+1 , A t+1 L t+1 )), u1 (C t+1 , 1 − L t+1 ) K t+1 = F (K t , A t L t ) + (1 − δ)K t − C t .
(1.58a) (1.58b) (1.58c)
This system is identical to the first-order conditions that we derived for the Ramsey model (1.35) in equations (1.37) and (1.38) with the resource constraint being equal to (1.58c). Thus, the equilibrium time path of the decentralized economy is optimal in the sense that it maximizes the utility of all households given the resource constraint of the economy. On the other hand, a benevolent planner who solved the Ramsey problem (1.35) could implement this solution in terms of a competitive equilibrium. He simply has to choose time paths of wages and rental rates equal to the equilibrium sequences of the respective marginal products.
1.6 Model Calibration and Evaluation The task of numerical DGE analysis is to obtain an approximate solution of the model at hand and to use this solution to study the model’s properties. Before this can be done, specific values must be assigned to the model’s parameters. In this section, with the aid of an example that we will introduce in the next subsection, we illustrate both the calibration and the evaluation step.
1.6.1 The Benchmark Business Cycle Model Example 1.6.1 presents our benchmark model of the business cycle. More or less similar models appear among others in the papers by Hansen (1985), King et al. (1988a), and Plosser (1989). It is a stripped-down version of the celebrated model of Kydland and Prescott (1982), who were
48
1 Basic Models
awarded the Nobel Prize in economics in 2004 for their contribution to the theory of business cycles and economic policy. The model provides an integrated framework for studying economic fluctuations in a growing economy. Since it depicts an economy without money, it belongs to the class of real business cycle models. The economy is inhabited by a representative household and a representative firm. The household derives utility from consumption C t and leisure 1 − L t . He or she rents his or her labor and capital services to the firm that produces the economy’s output Yt . Labor-augmenting technical progress at the deterministic rate a − 1 > 0 accounts for output growth. Stationary shocks to total factor productivity (TFP) Z t induce deviations from the balanced growth path of output. Similar models have been used to demonstrate that elementary economic principles may account for a substantial part of the observed economic fluctuations. In the following chapters, we will apply various methods to solve this model. It thus serves as a point of reference to compare the performance of different algorithms. Example 1.6.1 Consider the following stochastic model of a household and a firm. The firm produces output Yt from labor L t and capital services K t according to the function Yt = Z t K tα (A t L t )1−α ,
α ∈ (0, 1).
(1.59)
The level of labor-augmenting technical progress grows deterministically at the rate a − 1: A t+1 = aA t .
(1.60)
The natural logarithm of TFP ln Z t follows the AR(1)-process ln Z t+1 = ρ Z ln Z t + ε t+1 ,
ε t+1 iid ∼ N (0, σε2 ).
(1.61)
The firm takes the real wage w t and the rental rate of capital r t as given and maximizes its profits Dt := Yt − r t K t − w t A t L t
(1.62)
subject to the production function (1.59). The representative household maximizes its expected utility ∞ X C 1−η (1 − L )θ (1−η) t+s s t+s Et β , β ∈ (0, 1), θ ≥ 0, η > θ /(1 + θ ) 1−η s=0 subject to a given initial stock of capital K t and the sequence of constraints
1.6 Model Calibration and Evaluation I t+s + C t+s K t+s+1 − (1 − δ)K t+s 0 0
≤ = ≤ ≤
49
Wt+s L t+s + r t+s K t+s + Dt+s I t+s ∀s = 0, 1, . . . . C t+s , I t+s .
FIRST-ORDER CONDITIONS. The first-order conditions of the firm are w t = (1 − α) rt = α
Yt . Kt
Yt /A t , Lt
From the Lagrangian function of the household’s problem 1−η
L :=
Ct
(1 − L t )θ (1−η) 1−η
+ Λ t [w t A t L t + r t K t + Dt − C t − (K t+1 − (1 − δ)K t )] 1−η C t+1 (1 − L t+1 )θ (1−η) + βE t 1−η + Λ t+1 w t+1 A t+1 L t+1 + r t+1 K t+1 + Dt+1 − C t+1 − (K t+2 − (1 − δ)K t+1 ) + ...
we derive the following first-order conditions: ∂L −η = C t (1 − L t )θ (1−η) − Λ t = 0, ∂ Ct ∂L 1−η = θ C t (1 − L t )θ (1−η)−1 − Λ t w t A t = 0, ∂ Lt ∂L = Λ t − βE t Λ t+1 (1 − δ + r t+1 ) = 0. ∂ K t+1
(1.63a) (1.63b) (1.63c)
η
In terms of stationary variables λ t := A t Λ t , x t := X t /A t , X t ∈ {C t , I t , K t , Yt }, the following system of equations governs the dynamics of the economy: −η
0 = c t (1 − L t )θ (1−η) − λ t ,
(1.64a)
50
1 Basic Models
0 = θ c t − w t (1 − L t ), 0 = y t − Z t kαt L 1−α , t yt 0 = w t − (1 − α) , Lt yt 0 = rt − α , kt 0 = yt − ct − it ,
0 = ak t+1 − (1 − δ)k t − i t , 0 = λt − β a
−η
E t λ t+1 (1 − δ + r t+1 ) .
(1.64b) (1.64c) (1.64d) (1.64e) (1.64f) (1.64g) (1.64h)
Equation (1.64a) is the scaled first-order condition (1.63a). Equation (1.64b) combines the first-order conditions (1.63a) and (1.63b). Equation (1.64c) is the production function, and equations (1.64d) and (1.64e) are the firm’s first-order conditions. Equation (1.64f) is the economy’s resource constraint, which follows from the household’s budget constraint at equality and the definition of the firm’s profits (1.62). Equation (1.64g) is the law of capital accumulation, and equation (1.64h) is the scaled Euler equation (1.63c). STATIONARY SOLUTION. From these equations, we can obtain the balanced growth path of the deterministic counterpart of the model. In the first step, we remove the uncertainty from the model by assuming that ε t+s = 0 ∀t+s ∈ N. As a consequence, we can dispose with the expectations operator in the system of equations (1.64). In the second step, we assume that all variables have converged to their stationary values; i.e., lim t→ x t = x for all x t ∈ { y t , c t , i t , L t , w r , r t , k t , λ t , ln Z t }. Considering the process for the log of the productivity shock, this assumption implies lim t→∞ ln Z t = 0 so that the long-run level of TFP is equal to Z = e0 = 1. In the final third step, irrespective of their time index, we replace all variables in the system (1.64) by their respective stationary value. This delivers:28 28
Equation (1.65a) derives from the Euler equation (1.64h) and from the first-order condition (1.64e). Equation (1.65b) follows from the resource constraint (1.64f) and the stationary version of equation (1.64g). Equation (1.65c) combines equations (1.64b) and (1.64d).
1.6 Model Calibration and Evaluation
y aη − β(1 − δ) = , k αβ y c = − (a − 1 + δ), k k L 1 − α y/k = . 1− L θ c/k
51
(1.65a) (1.65b) (1.65c)
Equations (1.65) constitute a nonlinear system of equations in the three variables y/k, c/k, and L. However, we do not need any of the methods presented in Section 15.3 to solve this system. For given values of the parameters a, α, β, and δ, equation (1.65a) determines y/k. Given this solution, we obtain c/k from equation (1.65b). If we also know the parameter θ , we can solve the third equation (1.65b) for L. The production function (1.64c) implies that the stationary capital-labor ratio k/L is a function of the output-capital ratio y/k: 1 k y α−1 = . (1.65d) L k Accordingly, we obtain the stationary levels of the variables from the identities k = (k/L)L, y = ( y/k)k, and c = (c/k)k.
1.6.2 Calibration DEFINITIONS. In this book, we use the term calibration for the process by which researchers choose the parameters of their DGE models from various sources. The most common ways are as follows: 1) the use of time series averages of the levels or ratios of economic variables, 2) the estimation of single equations, 3) a reference to econometric studies based on either macroeconomic or microeconomic data, and 4) the gauging of the parameters so that the model replicates certain empirical facts as second moments of the data or impulse responses from a structural vector autoregression. Very good descriptions of this process are given by Cooley and Prescott (1995) and by Gomme and Rupert (2007). Other authors, for instance Canova (2007) p. 249 and DeJong and Dave (2011) pp. 248 ff., use the term calibration in the sense of an empirical methodology that involves the following steps:
52
1) 2) 3) 4) 5)
1 Basic Models
select an economic question, decide about a DGE model to address this question, choose the functional forms and the parameters of this model, solve the model and evaluate its quality, and propose an answer.
In this sense, calibration is an empirical research program distinct from classical econometrics. An econometric model is a fully specified probabilistic description of the process that may have generated the data to be analyzed. The econometric toolkit is employed to estimate this model, to draw inferences about its validity, to provide forecasts, and to evaluate certain economic policy measures. The distinction between calibration and classical econometrics is most easily demonstrated with the aid of Example 1.4.2. The policy function for the next-period capital stock is K t+1 = αβ Z t K tα .
(1.66)
From this equation, we can derive an econometric, single-equation model once we specify the stochastic properties of the productivity shock Z t . Since, empirically, the stock of capital is a time series with a clear upward trend, we could assume that ln Z t is a difference stationary stochastic process with positive drift a; that is ln Z t+1 − ln Z t = a + ε t ,
where ε t is a serially uncorrelated process with mean E(ε t ) = 0 and variance E(ε2t ) = σ2 . Then, equation (1.66) implies ln K t+1 − ln K t = a + α(ln K t − ln K t−1 ) + ε t .
This is a first-order autoregressive (AR(1))-process in the variable x t := ln K t − ln K t−1 . The method of ordinary least squares provides consistent estimates of the parameters a, α, and σ2 . It should come as no surprise that the data will usually reject this model. For instance, using quarterly data for the German capital stock, we obtain an estimate of α of approximately 0.94 and of a = 0.00015. However, if capital is rewarded its marginal product, α should be equal to the capital share of income which is approximately 0.36 (see below). Furthermore, a should be equal to the quarterly growth rate of output, which — between 1991 and 2019 — was approximately twenty times larger than our estimate from this equation. The view that DSGE models are too simple to provide a framework for econometric research does not mean that they are useless. In the words of Edward Prescott (1986) p. 10:
1.6 Model Calibration and Evaluation
53
“The models constructed within this theoretical framework are necessarily highly abstract. Consequently, they are necessarily false, and statistical hypothesis testing will reject them. This does not imply, however, that nothing can be learned from such quantitative theoretical exercises.”
We have already demonstrated how we can use the model of Example 1.4.2 for ‘quantitative theoretical exercises’ in Section 1.3.5, where we constructed the distribution of the capital stock implied by the model. For this exercise, we set α = 0.36 and β = 0.996. We will explain in a moment on which considerations this choice rests. At this point, it should suffice to recognize that these values are not derived from the estimated policy function for capital but rely on time series averages. Calibration — in the sense of empirically grounded theoretical exercises — is the main use of DSGE models. However, there is also a substantial body of more recent work that employs econometric techniques — such as moment and Likelihood based methods — to estimate DSGE models. Since our focus is on numerical solutions, we refer the interested reader to the review article of Fernández-Villaverde et al. (2016) and the books by DeJong and Dave (2011), Canova (2007), and Herbst and Schorfheide (2016) that cover the application of econometric techniques to the estimation of DSGE models. PARAMETER CHOICE FOR THE BENCHMARK MODEL. We start with the assumption that the real economic data were produced by the model of Example 1.6.1. To account for the representative agent nature of this model, it is common to scale the data by the size of the population if appropriate. Since our model displays fluctuations around a stationary state, a valid procedure to select the model’s key parameters is to use long-run time series averages. We use seasonally adjusted quarterly economic data for the German economy over the period 1991.Q1 through 2019.Q4 provided by the German Statistical Office.29,30,31 In a first step, we must match the variables 29
We use the notation year.Qi to refer to the ith quarter of a year. Usually, the US economy is taken for this purpose. However, since this economy has been the focus of numerous real business cycle models, we think it is interesting to use an economy that differs in a number of aspects. 31 The German Statistical Office also provides data for the West German economy between 1970.Q1 and 1991.Q4. We limit our sample to the post-unification period to circumvent the problems introduced by merging different time series of the same variables. We provide the data in the file Data(2019).xlsx and two programs that calibrate the model’s 30
54
1 Basic Models
of our model to the variables in the tables of national product and income accounts (NIPA). On the expenditure side, NIPA measures gross domestic product (GDP) at market prices and splits GDP into the consumption of durables and nondurables, investment, government spending, and net exports. On the income side, wages and profits refer to factor costs. The difference between market prices and factor costs are the indirect taxes paid to the government less the subsidies received from the government. Our model, however, considers a closed economy without the government, and consumption refers to the expenditures on nondurables. To reconcile the model with the data, we measure all aggregates on the expenditure side at factor costs, use expenditures of the private sector on nondurables and the government’s consumption expenditures as our measure of consumption, and include net exports as well as private expenditures on durables in our measure of investment.32 Given these definitions, we can proceed to estimate various parameters of the model. In the stationary equilibrium of our model, output per household grows at the rate of labor-augmenting technical progress a − 1. Thus, we can infer a from fitting a linear time trend to the log of GDP per capita. This gives a = 1.003, implying a quarterly growth rate of 0.3 percent. The second parameter of the production technology, α, equals the average labor share in GDP at factor prices. The NIPA present no data on the wage income of self-employed persons. However, from the viewpoint of economic theory, this group of households also receives wage income as well as capital income. To account for this fact, we assume that the self-employed earn wages equal to the average wage of employees. This delivers a higher labor share of 1 − α = 0.64. The third parameter that describes the economy’s production technology is the rate of depreciation δ. We compute this rate as the average ratio of quarterly real depreciation to the quarterly capital stock.33 Compared to the number of 0.025 commonly used for the US economy34 our figure of δ = 0.014 is much smaller. With these parameters at hand we can infer the productivity shock Z t from the R parameters from this data set. The GAUSS program is GetPar.g, and the MATLAB program is GetPar.m . 32 Except for a few quarters German net exports are positive. Accordingly, they should be considered as (foreign) investment and not as consumption. 33 For this purpose, we construct a quarterly series of the capital stock from the yearly data on the stock of capital and the quarterly data on investment from the perpetual inventory method. The details of this approach can be found in the GAUSS program GetPar.g and R the MATLAB program GetPar.m 34 See, e.g., King et al. (1988a), p. 214 and Plosser (1989) p. 75.
1.6 Model Calibration and Evaluation
55
production function using the time series on the gross domestic product at factor prices Yt , on hours L t , and on the stock of capital K t : Zt =
Yt . 0.36 K t ((1.003) t L t )0.64
Given our specification of the Markov process for Z t in Example 1.6.1, we fit an AR(1)-process to the log of Z t . This delivers our estimates of ρ Z = 0.82 and of σε = 0.0071. It is not possible to determine from an aggregate time series alone all of the parameters that describe the preferences of the representative household. The critical parameter in this respect is the elasticity of the marginal utility of consumption, −η. Microeconomic studies provide evidence that this elasticity varies both with observable demographic characteristics and with the level of wealth. Browning et al. (1999) argue that if the constancy of this parameter across the population is imposed there is no strong evidence against η being slightly above one. We use η = 2, which implies that the household desires a smoother consumption profile than in the case of η = 1, i.e., the case of logarithmic preferences used in many studies. The reason for this choice is that a larger η reduces the variability of output in our model to approximately the value found for the standard deviation of GDP. Once the choice of η is made, there are several possibilities to select the value of the discount factor β. The first alternative uses the observed average (quarterly) capital-output ratio k/ y to solve for β from equation (1.65a). This yields β = 0.998. The second alternative rests on the observation that the term (1 − δ + α y/k) on the rhs of equation (1.65a) is the gross rate of return on capital. King et al. (1988a) p. 207 measure this rate from the average rate of return on equity. The average annual real return on the German stock index DAX was approximately 8 percent, implying β = 0.987. If we use bonds rather than equities and estimate the return on capital from the broad German bond index REX, whose average annual real return was 3.84 percent, we obtain β = 0.997. Other studies, e.g., Lucke (1998) p. 102, equate aη /β on the lhs of equation (1.65a) to the ex post real interest rate on short term bonds. Our estimate of 1.26 percent p.a., however, implies a value of β = 1.003, which violates the restriction β < 1. The average from the four estimates is β = 0.996, and we choose this value. The final choice concerns the preference parameter θ . We use condition (1.65c) and choose θ so that L = 0.126, which is the average quarterly fraction of 1440 (=16 × 90) hours spent on work by the typical German
56
1 Basic Models
employee. Note that many other studies put L = 1/3 arguing that individuals devote approximately 8 hours a day to market activities.35 However, we consider the typical individual to be an average over the total population, including children and retired persons. Therefore, we find a much smaller fraction of a sixteen hour day engaged in income earning activities. Table 1.1 summarizes our choice of parameters. Table 1.1 Calibration of the Benchmark Business Cycle Model Preferences Production Capital Accumulation
β=0.996 a=1.003 δ=0.014
η=2.0 α=0.36
L=0.126 ρ Z =0.82
σε =0.0071
1.6.3 Model Evaluation Econometric estimation of DSGE models, in particular with Bayesian methods, is becoming increasingly popular. As argued above in Section 1.6.2, we will neither introduce these methods nor apply them to our example models. Instead we rely on the analytically simpler tools that were used in the beginnings of DSGE modeling. Thus, we compute impulse responses and second moments from our example models and compare these with the respective empirical counterparts. IMPULSE RESPONSES. Impulse responses are the deviations of the model’s variables from their stationary solution that occur after a one-time shock that hits the economy. Figure 1.7 displays the response of several variables, measured in percentage deviations from their stationary values. They are R computed from the MATLAB program BM_Pert.m.36 The time path of productivity (1.61) is displayed in the upper left panel. In period t = 0 (not shown), the economy is in its stationary equilibrium. In period t = 1, TFP increases by one standard deviation σε = 0.0071 and follows 35 36
See, e.g., Hansen (1985) p. 319 f. The GAUSS version of this program is BM_Pert.g.
Percent
Percent
1.6 Model Calibration and Evaluation 1.2 1.0 0.8 0.6 0.4 0.2 0.0
0.7 0.6 0.5 0.4 0.3 0.2 0.1 0.0
Log TFP Output
1 2 3 4 5 6 7 8 9 10 Quarter Hours Real Wage
57 3.50 3.00 2.50 2.00 1.50 1.00 0.50 0.00
0.25 0.20
Consumption Investment
1 2 3 4 5 6 7 8 9 10 Quarter Capital
0.15 0.10 0.05 1 2 3 4 5 6 7 8 9 10 Quarter
0.00
1 2 3 4 5 6 7 8 9 10 Quarter
Figure 1.7 Impulse Responses in the Benchmark Model
ln Z t+1 = ρ Z ln Z t , t = 1, 2, . . . thereafter. Since ln Z t is highly autocorrelated (ρ Z = 0.82), Z t remains above Z = 1 for many periods. The above-average productivity raises the real wage, and the representative household substitutes leisure for consumption so that working hours increase (see the upper right panel of Figure 1.7). Both the increased productivity and the additional supply of labor boost output. Investment expenditures show by far the strongest reaction. Their increase is approximately three times the change in output and 8.8 times the increase in consumption. To understand the reaction of investment, note first that the household wants to spread the extra income earned in period t = 1 to smooth consumption. Second, the household anticipates higher real interest rates in the future since the productivity increase also raises the future marginal product of capital, providing an additional incentive to invest. Since investment expenditures are a small part of the existing capital stock (I/K = δ in the steady state), we only observe a modest, hump-shaped increase of capital in panel four. The capital stock attains its maximum 16 periods after the shock (not shown).
58
1 Basic Models
The above average supply of capital explains why real wages remain high even after the productivity shock has almost faded.
Percent
Percent
Output
Consumption 0.4
1.0 0.8 0.6 0.4 0.2 0.0
0.3 0.2 0.1 0.0 1 2 3 4 5 6 7 8 9 10 Quarter
1 2 3 4 5 6 7 8 9 10 Quarter
Investment
Hours 0.4
2.5 2.0 1.5 1.0 0.5 0.0
0.3 0.2 0.1 0.0 1 2 3 4 5 6 7 8 9 10 Quarter
1 2 3 4 5 6 7 8 9 10 Quarter
Figure 1.8 Impulse Responses from an Estimated VAR
Figure 1.8 displays impulse responses from a vector autoregressive model estimated from the same data that we used to calibrate the parameters of our benchmark model. The variables of the model are (in this order) GDP, consumption, investment, and hours. The model was estimated with one lag as indicated by the Schwarz information criterion. We used the Hodrick-Prescott (HP) filter (see Section 16.5.2) to remove the apparent trend in the data. The identification of the productivity shock was achieved by placing GDP at the top and by using the Cholesky factorization of the covariance matrix of the estimated residuals to obtain orthogonal shocks. Ninety-five-percent confidence bounds (the red lines in Figure 1.8) were obtained from a bootstrap procedure.37 37
Readers unfamiliar with structural vector autoregressive models may want to consult, for instance, Amisano and Giannini (1997), Canova (2007) Chapter 4, Favero (2001)
1.6 Model Calibration and Evaluation
59
Similarly as in our theoretical model investment expenditures display the largest amplitude. According to our model, the relation between the maximum increase in investment and the maximum increase in output is approximately three. In the estimated impulse responses of Figure 1.8, however, this relation is approximately two. There is a second difference between the model and the data. In the model, the maximum increase of all variables occurs in the period of the shock. In the data, the maximum increase in working hours takes place in the period after the shock hit the economy. The failure of the benchmark model to replicate this humpshaped pattern has been a concern among researchers since it was pointed out first by Cogley and Nason (1995b). SECOND MOMENTS. A second typical tool used to evaluate small-scale DSGE models is to compare the second moments of the time series implied by the model to those of the respective macroeconomic aggregates. Most of these aggregates have an upward trend that must be removed to render the time series stationary. There are several ways to do this. The simplest procedure, justified by the assumption of the model that there is a deterministic trend in labor productivity, is to extract a linear time trend from the logged variables. The Hodrick-Prescott (HP) filter described in more detail in Section 16.5.2 fits a more flexible time trend to the data. More advanced methods, such as the Baxter-King filter (see Baxter and King (1999)), approximate band-pass filters that remove cyclical components whose frequencies either exceed or fall short of the frequency of the business cycle. Whatever filter the researcher employs, he or she has to take care that the filtered data are comparable to the time series implied by his or her model. In Chapter 4 we discuss several methods to obtain second moments from the solution of a DSGE model. Here, we follow the common practice and compute second moments from simulated time series, which we compare to second moments obtained from the HP-filtered data described above. The cyclical component of a time series that the filter returns from logged data is the percentage deviation of the original series from its HP-trend component. The solution of our model consists of time paths Chapter 6 or Hamilton (1994) Chapters 10 and 11. The GAUSS program SVar.g, its R MATLAB version SVar.m, and the data set used for this estimation can be downloaded from the website of this book. Both programs have several options. For instance, you may extract a linear rather than a Hodrick-Prescott (HP)-trend from the data and change the number of lags.
60
1 Basic Models
of stationary variables x t := X t /A t , where X t denotes the level of the respective variable. Therefore, given our specification of the evolution of labor-augmenting technical progress, A t+1 = aA t
⇔ A t = A0 a t ,
we can recover the time paths of the logs of the levels from ln X t = ln x t + ln A t = ln x t + ln A0 + at. To gain comparable results, we must apply the HP-filter to ln X t . However, we can bypass the computation of ln X t , since, as we demonstrate in Section 16.5.2, the cyclical component of ln x t is equal to the cyclical component of ln X t . Table 1.2 displays the results from solving and simulating the model from Example 1.6.1 using the most widely employed perturbation method that we describe in Chapter 3. The second moments from the model are averages over 500 simulations. At the beginning of the first quarter, our model economy is on its balanced growth path. In this and the following quarters, it is hit by productivity shocks that drive the business cycle. To remove the influence of the starting point, we discard the first 50 quarters and compute second moments from the following 116 quarters. This number of quarters is equal to the number of quarterly observations from 1991.Q1 through 2019.Q4. Consider the match between the data and the model’s time series. The numbers in Table 1.2 reveal well-known results. The model is able to reproduce the fact that investment is more volatile than output and consumption, but it exaggerates this stylized fact of the business cycle. Consumption is much too smooth compared to its empirical counterpart. The autocorrelations, however, are quite in line with the data. The cross-correlations between output and the other variables are almost perfect in the model, quite in contrast to the cross-correlations found in the data. Note in particular that while the model predicts an almost perfect positive correlation between output and the real wage, this correlation is slightly negative in the data. The quite obvious mismatch between the data and the artificial time series highlights the fact that our model is apparently too simple to provide an adequate account of the empirical facts. Let us consider just two points. First, without a government and international trade, we have constructed our measure of consumption as the sum of private non-durable consumption and government consumption. If we remove government
1.6 Model Calibration and Evaluation
61
Table 1.2 Business Cycles Statistics from the Benchmark Model Variable
sx
rx y
rx
Output
1.35 (1.41) 0.44 (0.73) 4.03 (3.68) 0.80 (0.91) 0.55 (0.80)
1.00 (1.00) 0.99 (0.61) 1.00 (0.94) 1.00 (0.83) 0.99 (−0.29)
0.61 (0.80) 0.62 (0.64) 0.61 (0.80) 0.61 (0.85) 0.62 (0.60)
Consumption Investment Working Hours Real Wage
Notes:Empirical values from the HP-filtered German data are in parenthesis. s x :=standard deviation of the HP-filtered simulated series of variable x, r x y :=cross-correlation of variable x with output, and r x :=first-order autocorrelation of variable x.
spending and focus on private non-durables consumption only, the standard deviation increases to 0.93 percent. Obviously, our representative household prefers a smooth consumption profile much more than what we see in the data. Second, consider the household’s labor supply decision. Combining the first-order conditions (1.64a) and (1.64b) gives −1/η
w t = θ λt
(1 − L t )
θ (1−η) −1 η
.
For a given marginal utility of consumption λ t , this is an upward sloping labor supply schedule (see Figure 1.9).38 Using equation (1.64h), the firstorder condition of the firm with respect to labor demand can be written as w t = (1 − α)Z t kαt L −α t . For a given level of capital k t , this is a downward sloping labor demand schedule. 38
Note that we restricted η to η > θ /(1 + θ ) so that the one-period utility function is strictly concave.
62
1 Basic Models Lt Labor Demand Schedule for Z = Z2 > Z1
Labor Supply Schedule
Labor Demand Schedule for Z = Z1
L2 L1
w1
w2
wt
Figure 1.9 Productivity Shock in the Benchmark Business Cycle Model
A productivity shock raising Z from Z1 to Z2 shifts the labor demand schedule outward. Equilibrium in the labor market requires higher wages, and as a result, the representative household supplies more hours. Thus, the immediate impact of the shock is to raise the real wage, hours, and output. Since current consumption is a normal good, it increases as a consequence of the higher current income. Investment increases for several reasons: First, future consumption as well as future leisure are normal goods. Thus, the household wants to spend part of its higher current income on future consumption and future leisure. It builds up his stock of capital over the next periods so that future production is potentially higher. Second, since the productivity shock is highly autocorrelated, the household expects above normal returns to capital. Thus, all variables in the model move closely together with income which, in turn, is driven by a single shock. In reality, however, there may be additional shocks. For instance, think of a preference shock that shifts the labor supply curve to the left. This shock increases the real wage and reduces employment and output. As a consequence, the tight positive correlation between output, hours, and the real wage loosens. If this adverse shock is relatively more important than the productivity shock, the model would predict a negative correlation between output and the real wage.
1.7 Numerical Solution Methods
63
In subsequent chapters you will see how these and other extensions help to bring artificial and empirical data closer together. Before we close this chapter, we present an overview of the solution techniques to be introduced in the following chapters and relate them to the different characterizations of a model’s solution presented in the preceding sections.
1.7 Numerical Solution Methods We have seen in Sections 1.3.4 and 1.4.3 that only very special DGE models allow for an analytical solution. Thus, we usually must resort to numerical methods that provide approximate solutions. What are the general ideas behind these solutions and how are we able to determine how accurate they are? The next two subsections address these issues.
1.7.1 Overview Solutions of DGE models fall into two categories: the perturbation approach considered in Chapters 2 through 4 provides local solutions, whereas the methods presented in Chapters 5-7 provide global ones. Local methods use information about the model at a certain point in the model’s state space, e.g., the stationary equilibrium of the benchmark model considered in equations (1.65). Global methods, instead, incorporate information from (most of) the model’s entire state space. The mathematical foundation of local methods are the implicit function theorem (see Section 13.4) and Taylor’s theorem (see Section 13.3). From the viewpoint of the user, local methods have the advantage that they are easy to use, fast, and applicable to models of any size. Basically, the user must provide the model’s equations and parameter values to a program that computes the solution. This task requires relatively little programming experience, and even large-scale models can be solved on a personal computer in a few seconds or minutes, depending on the required degree of accuracy, the employed programming language and algorithm, and the speed of the central processing unit (CPU) on which the program runs. These advantages make perturbation methods the most frequently employed tool to solve DGE models. Therefore, we devote the next three chapters to their foundations, development, and usage.
64
1 Basic Models
There is no unifying framework behind the global methods considered in this book. Most of them employ tools from function approximation reviewed in Chapter 13. However, as will become apparent in a moment, there are two exceptions. Chapter 5 introduces weighted residuals or projection methods. The intuition behind these methods should be familiar from elementary econometrics: The solution of the least-squares problem in the linear regression model projects the vector of the dependent variable orthogonally into the space spanned by the observations of the independent variables.39 Weighted residual methods for functional equations extend the notions of residuals, basis vectors, and orthogonal projection known from the linear regression analysis to function spaces. After all, the solution of DGE models are functions whose arguments are the model’s endogenous state variables and shocks. In our Example 1.6.1, these arguments are the capital stock k t and the log of the TFP shock ln Z t . For each pair (k t , ln Z t ) we can compute a residual as the lhs of the Euler equation (1.64h) after substituting the approximate solution for the exact but unknown solution. Weighted residual methods construct the approximate solution from linear combinations of basis functions and choose the parameters to either minimize the sum of weighted residuals over the state space or to set to zero this sum on certain points of the state space. The finite element method approximates the unknown function by splines of low degree polynomials. Spectral methods employ orthogonal polynomials as basis functions. A researcher who uses projection methods, therefore, must choose from a variety of options and must employ many tools from numerical mathematics, including integration, maximization, and the solution of nonlinear equations. Chapter 6 introduces simulation based methods. The first part of this chapter, Section 6.2, presents a straightforward approach, known as the extended path method. As the name already indicates, it does not rely on functional analysis. Instead, it computes either the entire time path of the model or part of this path from solving a large system of nonlinear equations. For instance, the system (1.13) determines a sequence of T T unknown capital stocks {K t+s }s=1 , if the economy starts in period t with a given initial capital K t and converges after T + 1 periods to the stationary capital stock that solves 1 = β(1 − δ + f 0 (K t+T +1 )). In the stochastic case, a related deterministic system is solved repeatedly to trace out the time 39
See, e.g., Davidson and MacKinnon (2004), Section 2.3 for this geometric interpretation of the least-squares problem.
1.7 Numerical Solution Methods
65
path of the model under a given sequence of shocks. The cumbersome part of this procedure is to code this system, and the tricky part is to find initial values for nonlinear equations solvers to converge. The generalized stochastic simulation (GSS) approach presented in Section 6.3 combines function approximation and simulation. It differs from weighted residuals methods insofar as it does not employ a fixed set of points on which it computes the residuals. Instead, it aims to approximate the model on its ergodic set. Intuitively, a stochastic simulation of infinite length that employs the exact solution of the model would trace out this set. GSS approximates this set iteratively: It employs parameterized functions that approximate the solution, draws a sequence of random numbers and computes long time series. In the second step, it updates the parameters by fitting the functions most closely to these series. It repeats the simulation and fitting steps until the parameters have converged. Chapter 7 introduces VI on a discrete set of points. VI in general rests on the dynamic programming approach and computes approximations of either the value functions or the policy functions associated with the decision problems of the economic agents that populate the respective economy. It employs the fact that under certain conditions iterations over the Bellman equation provide successively better approximations of the value function (see Section 1.3.3). In general, this is also a problem of functional analysis. However, once we restrict the problem to a denumerable set of points (the discrete state space), VI becomes particularly simple. For instance, in the case of the infinite-horizon Ramsey model of Section 1.3, we can approximate the value function by a vector with n elements, each corresponding to the value obtained from a given capital stock Ki ∈ K := {K1 , K2 , . . . , Kn }. We can then solve the planner’s problem on the grid K . Accordingly, the policy function reduces to a vector of n capital stocks that represents the solution of the rhs of the Bellman equation for each Ki ∈ K . Though computationally simple, the practical relevance is limited to models with a low-dimensional state space, since the number of algebraic operations increases exponentially with the dimension of the state space. However, the method is an indispensable tool for solving heterogenous agent models as the reader will learn in Part II.
66
1 Basic Models
1.7.2 Accuracy of Solutions How shall we compare the solutions obtained from different methods and decide which one to use? In this subsection, we consider two different criteria. SECOND MOMENTS. In as much as we are interested in the kind of model evaluation considered in Section 1.6, the second moments of the time series obtained from simulating the model provide a first benchmark. Thus, each of the following chapters provides the results from the solution of the benchmark model of Example 1.6.1. Our simulations use the same sequence of shocks so that differences in the results can be traced to differences in the solution procedure. As in Heer and Maußner (2008), we will find that there are no noteworthy differences in second moments that favor the more advanced global methods over the local ones.40 EULER EQUATION RESIDUALS. However, there are considerable differences with respect to a measure of accuracy known as Euler equation residuals. To develop this measure, we will introduce a more general framework. Suppose we want to approximate a function h : X → Y that maps the subset X of Rn into the subset Y of R. The function h is implicitly defined by the functional equation41 G(h) = 0. The operator G : C1 → C2 maps the elements of the function space C1 to the function space C2 . Examples of functional equations are the Bellman equation (1.16) of the deterministic growth model and the Euler equation of the stochastic growth model (1.27). The unknown function of the former is the value function v(K), and the policy function h(K, Z) is the unknown of the latter. Suppose we have found a function ˆh that approximates the solution in the state space X . Then, for each x ∈ X , we can compute the residual R(x) := G(ˆh(x)). 40 41
For a related but independent study with similar results, see Aruoba et al. (2006). See Section 13.2 for the definition of a function space.
1.7 Numerical Solution Methods
67
Since ˆh approximates h, R(x) will in general not be equal to zero, and we can use the maximum absolute value of R(·) over all x ∈ X as a measure of the goodness of our approximation. For instance, let ˆh(K) denote an approximate solution of the policy function of the next-period capital stock in the deterministic growth model. Then, we can compute the residual of the Euler equation (1.13) from42 R(K) = 1 −
βu0 ( f (ˆh(K)) − ˆh(ˆh(K))) 0 ˆ f (h(K)). u0 ( f (K) − ˆh(K))
A more interpretable definition of the Euler equation residual is due to Christiano and Fisher (2000). In the context of equation (1.13), it is given by C˜ − 1, C C = f (K) − ˆh(K),
˜ (K) := R
1=
βu0 ( f (ˆh(K)) − ˆh(ˆh(K))) 0 ˆ f (h(K)), ˜ u0 (C)
(1.67a) (1.67b) (1.67c)
where C˜ is the value of consumption that solves equation (1.67c) exactly. ˜ (K) is the rate by which consumption had to be raised above Thus, R consumption given by the policy function ˆh in order to deliver a Euler equation residual equal to zero. In Heer and Maußner (2008), we find for the benchmark model that both the extended path and the projection method provide very accurate results. The second-order perturbation approximation of the policy functions also delivers good results and outperforms the solutions obtained from VI. The least accurate solution is the first-order perturbation approximation of the policy functions.
42
This measure was first proposed by Judd and Guu (1997).
68
1 Basic Models
A.1 Solution to Example 1.3.1 We derive the solution to Example 1.3.1 using iterations over the value function. We set the initial value equal to zero and work backwards. Hence, letting v 0 = 0, we solve v 1 = max 0 K
ln(K α − K 0 )
yielding K 0 = 0 and v 1 = α ln K. In the next step we seek K 0 that solves v 2 = max 0 K
ln(K α − K 0 ) + βα ln K 0 .
From the first order condition Kα we get
αβ 1 = 0 0 −K K
K0 =
αβ K α, 1 + αβ
v 2 = α(1 + αβ) ln K + A1 , A1 := ln(1/(1 + αβ)) + αβ ln(αβ/(1 + αβ)). The value function in step s = 3 is given by v 3 = max 0 K
ln(K α − K 0 ) + βα(1 + αβ) ln K 0 + βA1
yielding K0 =
αβ + (αβ)2 K α, 1 + αβ + (αβ)2
v 3 = α(1 + αβ + (αβ)2 ) ln K + A2 , αβ + (αβ)2 1 2 A2 = ln + αβ + (αβ) ln + βA1 . 1 + αβ + (αβ)2 1 + αβ + (αβ)2 Continuing in this fashion we find the policy function in step s given by Ps−1 s i=1 (αβ) 0 K = Ps−1 Kα s (αβ) i=0 with limit s → ∞ equal to K 0 = αβ K α .
Obviously, from the first two steps, the value function is a linear function of ln K. To infer the parameters of v := lims→∞ v s , we use the method of undetermined coefficients.
Appendix 1
69
This method postulates a functional form for the solution with unknown parameters, also called the undetermined coefficients. The parameterized function replaces the unknown function in the equations that describe the model. The resulting equations determine the coefficients. Thus, assume v = a + b ln K with a and b as yet undetermined coefficients. Solving ln(K α − K 0 ) + β(a + b ln K 0 )
max 0 K
yields K0 =
βb K α. 1+βb
Therefore
βb 1 v = α(1 + β b) ln K + β a + ln + β b ln . | {z } 1+βb 1+βb | {z } b a
Equating the constant on the rhs of this equation to a and the slope parameter to b, we get: α b = α(1 + β b) ⇒ b = , 1 − αβ βb 1 a = β a + ln + β b ln , 1+βb 1+βb αβ 1 ⇒a= ln(1 − αβ) + ln αβ . 1−β 1 − αβ
70
1 Basic Models
A.2 Restrictions on Technology and Preferences Here we derive formally the restrictions that we must place on technology and preferences to ensure the existence of a balanced growth path. We draw strongly on the Appendix to King et al. (1988a) and on Boppart and Krusell (2020). Like Solow (1988) pp. 1-4, we define a balanced growth path as an equilibrium that features (see page 33) 1) a constant rate of output growth, 2) and a constant share of savings in output. Technology. As an immediate consequence of the second assumption, savings S t = Yt − C t must grow at the same rate as output. The growth rate of capital g K follows from the resource constraint as: St
gK =
K t+1 Kt
z }| { Yt − C t +(1 − δ)K t S t Yt = = + (1 − δ), Kt Yt K t
This rate is constant, if in addition to s := S t /Yt the output-capital ratio Yt /K t is also time-invariant. This, in turn, implies that capital must grow at the same rate as output: g K = g Y . If g K > g Y , g K would approach g K = 1 − δ < 1 and the stock of capital tends zero. If instead g K < g Y , capital growth accelerates and savings cannot be constant to fuel this process. Now, consider the general case of labor and capital augmenting technical progress: Yt = F (B t K t , A t L t ),
A t = A0 a t , B t = B 0 b t .
Since F is linear homogenous, the growth factor of output, g Y , can be factored as follows Yt+1 B t+1 K t+1 F (1, X t+1 ) = = bg K g F , Yt Bt Kt F (1, X t ) ag L X t := (A0 /B0 )(a/b) t (L t /K t ) ⇒ g X = . bg K
gY =
Since g Y = g K we get from (A.2.1a) 1 = b gF . There are two cases to consider: i. b = g F ≡ 1,
ii. g F = 1/b, b > 1.
(A.2.1a) (A.2.1b)
Appendix 2
71
In the first case technical progress is purely labor augmenting and for g F ≡ 1 we must have g X = 1, implying g K = ag L . Now, in our representative agent framework with a constant population size, L is bounded between zero and one. Thus, a constant rate of capital and output growth requires g L = 1 (otherwise L → 1 or L → 0). Therefore, output and capital grow at the rate of labor augmenting technical progress a − 1. For the share of savings to remain constant, consumption must also grow at this rate. Now consider the second case. For F (1, X t+1 ) g F := = constant < 1 F (1, X t ) X t must grow at the constant rate gX =
a g L g L =1 a =⇒ g X = . b gK b gK
Let X t = X0ct ,
c=
a , b gK
and define f (X t ) := F (1, X t ) so that the condition reads f (X 0 c t+1 ) = constant. f (X 0 c t ) Since this must hold for arbitrary given initial conditions X 0 , differentiation with respect to X 0 implies § ª X t+1 Xt 1 0 0 0= f (X t ) f (X t+1 ) − f (X t+1 ) f (X t ) d X 0, f (X t )2 X0 X0 § 0 ª f (X t+1 )X t+1 f 0 (X t )X t f (X t+1 ) d X 0 0= − . f (X t+1 ) f (X t ) f (X t ) X 0 For the term in curly brackets to be zero, the elasticity of f with respect to X t must be a constant, say 1 − α: f 0 (X t )X t = 1 − α. f (X t )
However, the only functional form with constant elasticity is f (X ) = Z X 1−α with Z an arbitrary constant of integration. Thus, output must be given by a Cobb-Douglas function Y = F (BK, AL) = BK( f (AL/BK)) = BK Z(AL/BK)1−α = Z(BK)α (AL)1−α . However, if F is Cobb-Douglas technical progress can always be written as purely labor-augmenting since Yt = Z(B t K t )α (A t L t )1−α = Z K tα (A˜t L t )1−α ,
α/(1−α) A˜t := A t B t .
72
1 Basic Models
Preferences. The instantaneous utility function u(C, 1 − L) must be consistent with the first-order conditions (1.37) and (1.38). Let A t := A0 a t denote the level of labor augmenting technical progress on the balanced growth path, C t := C0 a t the respective level of consumption, and L the constant supply of labor. Remember that K t and A t must grow at the same rate so that K t /(A t L t) is constant and equal to, say, K/(AL). Since F2 (K t , A t L t ) = F2 (K t /A t L t , 1), condition (1.37) can be written as u2 (C0 a t , 1 − L) = W0 a t , W0 := A0 a t F2 (K/AL, 1). u1 (C0 a t , 1 − L)
This condition must hold for arbitrary a, C0 , W0 , and t. For t = 0 this implies: u2 (C0 , 1 − L) = W0 u1 (C0 , 1 − L)
and for C0 a t = 1: p(1 − L) :=
u1 (1, 1 − L) W0 = , u2 (1, 1 − L) C0
where p is an arbitrary function. Taken together, the marginal rate of substitution must satisfy u2 (C, 1 − L) = C p(1 − L), u1 (C, 1 − L)
(A.2.2)
where we have suppressed the time index, since C0 and W0 are arbitrary. In the same way, we can state condition (1.38) on the balanced growth path as u1 (C0 a t , 1 − L) = ∆, ∆ := β(1 − δ + F1 (K/AL, 1)). u1 (C0 a t+1 , 1 − L)
Differentiating with respect to C0 yields: d C0 u11 (C0 a t , 1 − L) u11 (C0 a t+1 , 1 − L) t t+1 ∆ C0 a − C0 a = 0. C0 u1 (C0 a t , 1 − L) u1 (C0 a t+1 , 1 − L)
The term in curly brackets must be equal to zero, again for arbitrary a, C0 and t. At C0 a t = 1 this gives −q(1 − L) :=
u11 (1, 1 − L) u11 (a, 1 − L) = a. u1 (1, 1 − L) u1 (a, 1 − L)
(A.2.3)
Since a is arbitrary, we may also set a = C. In this way, equation (A.2.3) tells us that the elasticity of the marginal utility of consumption, i.e., η := Cu11 (C, 1 − L)/u1 (C, 1 − L) depends only on L via the function q(1 − L). Writing this as
Appendix 2 −
73
u11 (C, 1 − L) q(1 − L) = C u1 (C, 1 − L)
and integrating both sides over C, Z Z u11 (C, 1 − L) q(1 − L) − dC = dC, C u1 (C, 1 − L) gives
ln u1 (C, 1 − L) = −q(1 − L) ln C + ln v1 (1 − L),
(A.2.4)
where the integration constant ln v1 depends on 1 − L. We can now use condition (A.2.2) to show that the elasticity η does not depend on L. Differentiating the previous equation with respect to 1 − L gives v10 (1 − L) u12 (C, 1 − L) = −q0 (1 − L) ln C + . u1 (C, 1 − L) v1 (1 − L)
(A.2.5)
Differentiating equation (A.2.2) with respect to C yields u21 (C, 1 − L) u2 (C, 1 − L)u11 (C, 1 − L) − = p(1 − L). u1 (C, 1 − L) u1 (C, 1 − L)2
The second term on the lhs of this expression is equal to −p(1 − L)q(1 − L) (see (A.2.2) and (A.2.3)) so that u21 (C, 1 − L) = p(1 − L) (1 − q(1 − L)) . u1 (C, 1 − L)
The rhs of this expression is a function with leisure 1 − L as its single argument. Therefore, the lhs is independent of C and only a function of 1 − L. If the utility function u(C, 1 − L) is at least twice continuously differentiable Young’s theorem establishes that u12 (C, 1 − L) = u21 (C, 1 − L). Therefore, the coefficient −q0 (1 − L) of ln C in equation (A.2.5) must be equal to zero. Setting q = η in equation (A.2.4) provides u1 (C, 1 − L) = C −η v1 (1 − L).
Integrating with respect to C yields: ( C 1−η v1 (1−L) + v2 (1 − L) if η 6= 1, 1−η u(C, 1 − L) = v1 (1 − L) ln C + v2 (1 − L) if η = 1.
(A.2.6)
(A.2.7)
Restriction on the functions v1 (1 − L) and v2 (1 − L) derive from (A.2.2). Differentiating (A.2.7) with respect to 1 − L gives ( C 1−η 0 0 for η 6= 1, 1−η v1 (1 − L) + v2 (1 − L) u2 (C, 1 − L) = ln C v10 (1 − L) + v20 (1 − L) for η = 1.
74 while (A.2.2) and (A.2.6) imply ¨ C 1−η p(1 − L)v1 (1 − L) u2 (C, 1 − L) = p(1 − L)v1 (1 − L)
1 Basic Models
for η 6= 1, . for η = 1
Comparing both expression implies v20 (1 − L) = 0 for η 6= 1 and v10 (1 − L) = 0 for η = 1. Setting the respective constants equal to zero and one, respectively, yields the functional forms of the one-period utility function given in (1.39).
Problems
75
Problem 1.1: Finite-Horizon Ramsey Model Prove that the finite horizon Ramsey model stated in (1.3) meets the assumptions of the Karush-Kuhn-Tucker (KKT) theorem 1.2.1.
Problem 1.2: Coefficient of Relative Risk Aversion 1) Compute the coefficient of relative risk aversion from (1.51) for the utility function defined in (1.49d). 2) For period utility functions u(C, L) without given time endowment, Swanson (2012) defines the consumption-only coefficient of relative risk aversion as R(C) :=
−u11 + λu12 C u1 1 + wλ
with w = −u2 /u1 and λ defined in (1.51). Use this formula and compute the coefficient of relative risk aversion for the utility function defined in (1.49c).
Problem 1.3: Infinite-Horizon Ramsey Model with Adjustment Costs Consider the following Ramsey model: A fictitious planer maximizes ∞ X
β s ln C t+s ,
β ∈ (0, 1),
s=0
subject to 1−δ δ K t+s+1 = K t+s I t+s , α I t+s = K t+s − C t+s ,
K t given.
δ ∈ (0, 1),
α ∈ (0, 1),
The symbols have the usual meaning: C t is consumption, K t is the stock of capital, and I t is investment. 1) State the Lagrangian of this problem and derive the first-order conditions of this problem. (Hint: Substitute for I t in the transition equation for capital from the definition of I t .) 2) Suppose the policy function for capital is given by k
K t+1 = k0 K t 1 . Use this equation to derive the policy functions for investment and consumption.
76
1 Basic Models
3) Assume that the policy function for consumption can be written as c
C t = c0 K t 1 . If this guess is true, how are c0 , c1 , k0 , and k1 related to the model’s parameters α, β, and δ? 4) Substitute the policy functions into the Euler equation for capital. Show that the assumptions made thus far hold, if k0 meets the condition k0 =
αβδ 1 − β(1 − δ)
δ .
Problem 1.4: A Vintage Model of Capital Accumulation In Section 1.3.5 we consider the problem max
∞ X
1−η
βs
s=0
subject to
C t+s − 1 1−η
,
β ∈ (0, 1), η > 0,
K2t+s+1 = δK1t+s , δ ∈ (0, 1), 1−η 1 1−η Yt+s = aK1t+s + (1 − a)K2t+s 1−η , Yt+s = C t+s + K1t+s+1 ,
a ∈ (0, 1),
K1t and K2t given. 1) Use dynamic programming to derive the first-order conditions for this problem. (Hint: Use v(K1 , K2 ) as value function, note that K20 = δK1 , and substitute the economy’s resource constraint for K10 .) 2) Prove that K1t+1 = sYt , where s is determined from 1 1 − s = β a + β 2 (1 − a)δ1−η η , solves this problem.
Problem 1.5: Dynamic Programming and the Stochastic Ramsey Model The stochastic Euler equations of the Ramsey model (1.24) are given in (1.27). Use stochastic dynamic programming as considered in Section 1.4.3 to derive these conditions.
Problems
77
Problem 1.6: Analytic Solution of the Benchmark Model Consider the benchmark model of Example 1.6.1. Assume η = 1 so that the current period utility function is given by u(C t , L t ) := ln C t + θ ln(1 − L t ). Furthermore, suppose δ = 1, that is, full depreciation. Use the method of undetermined coefficients (see Appendix A.1) to verify that k t+1 = AZ t kαt L 1−α , t with A to be determined, is the policy function for the next-period capital stock. Show that working hours L t are constant in this model.
Problem 1.7: A Model With Flexible Working Hours and Analytic Solution In Section 1.3.5 we considered a model with adjustment costs. We extend this model to a stochastic model with endogenous labor supply. Assume the current period utility function θ u(C t , L t ) = ln C t − L 1+ω , θ , ω > 0. 1+ω t The transition equation for the capital stock is K t+1 = K t1−δ I tδ ,
δ ∈ (0, 1).
The production function is Yt = Z t K tα L 1−α . t Determine the coefficients of the following guesses for the policy functions for consumption C t , working hours L t , and the next-period capital stock K t+1 : c
c
C t = c1 Z t 2 K t 3 , n
n
k
k
L t = n1 Z t 2 K t 3 , K t+1 = k1 Z t 2 K t 3 .
Chapter 2
Perturbation Methods: Framework and Tools
2.1 Introduction Perturbation methods are the most frequently employed tool to solve DGE models. Their popularity rests on the fact that they are easy to apply. Programmers of toolkits can keep all the algorithm details away from users. For instance, users of the collection of routines provided by the platform Dynare1 need only supply the model’s equations and parameter values and the program delivers the solution after a keystroke. Perturbation methods were among the first methods employed to solve the seminal real business cycle models developed in the 1980s. For instance, Kydland and Prescott (1982) solved their Nobel Prize winning model by transforming the nonlinear optimization problem to a linear-quadratic optimal control problem, and the famous review papers of King et al. (1988a,b) pioneered the technique to linearize the model’s equations and to solve the ensuing linear rational expectations model via the procedure proposed by Blanchard and Kahn (1980). In the 1990s and the beginning of the 2000s, Judd (1998) and Jin and Judd (2002) developed a general framework for the application of perturbation methods to DGE models, and several authors, including Kim et al. (2008), Schmitt-Grohé and Uribe (2004), Andreasen (2012), and Binning (2013) developed computer code to provide higher-order solutions. Due to their widespread use, we decided to extend our presentation of perturbation methods and to spread the material over three chapters. The present chapter introduces the perturbation approach and related issues, develops a general framework, and reviews the required tools. Chapter 3 1
See www.dynare.org.
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2024 B. Heer and A. Maußner, Dynamic General Equilibrium Modeling, Springer Texts in Business and Economics, https://doi.org/10.1007/978-3-031-51681-8_2
79
80
2 Perturbation Methods: Framework and Tools
derives first-, second-, and third-order solutions. Chapter 4 considers the model evaluation and presents several applications.
2.2 Order of Approximation Perturbation refers to the technique that extends a known and easy-toidentify solution of a problem to a solution of a nearby problem. A differentiable function provides a simple example. Consider a real-valued function f (x) on an open interval I := (a, b). This function is differentiable at x¯ ∈ I if there exists a real number c such that f (¯ x + h) = f (¯ x ) + ch + o(h), where o(h) denotes an infinitesimal of a higher order than h, i.e., o(h) = 0. h→0 h lim
In this definition, c := f 0 (¯ x ) is the derivative of f at the point x¯ and ch is the differential of the function. If there is no more information about f than its value at the point x¯ , f (¯ x ), and its derivative f 0 (¯ x ), we may approximate this function locally at x¯ by the linear function (the tangent to f at x¯ ): g(¯ x + h) := f (¯ x ) + f 0 (¯ x )h. If f is twice continuously differentiable at x¯ , we may invoke Taylor’s theorem (see Section 13.3) to confirm that the approximation error is proportional to h2 : o(h) = 12 f 00 (ξ)h2 , for one ξ ∈ (¯ x , x¯ + h). Furthermore, if the function is k + 1-times continuously differentiable, this theorem allows us to approximate f by a kth degree polynomial in h: g(¯ x + h) = f (¯ x ) + f 0 (¯ x )h + 12 f 00 (¯ x )h2 + · · · + The approximation error has the property lim
h→0
f (¯ x + h) − g(¯ x + h) = 0. hk
1 k!
f k (¯ x )hk .
2.3 Tools
81
This gives rise to identifying the order of the approximations with the integer k. The cases k ∈ {1, 2, 3} are often also labeled linear, quadratic, and cubic approximations. In the multidimensional case, where y ∈ Rm and x ∈ Rn so that y = f(x), we must use some measure of distance to define the degree of approximation. Let h ∈ Rn and let ||x|| denote a norm on Rn (see Section 12.4). ¯, if2 Then, we say g is an kth order approximation of f at x lim h→0 h6=0
||f(¯ x + h) − g(¯ x + h)|| = 0. ||h||k
(2.1)
2.3 Tools 2.3.1 A Brief List The equilibrium conditions of DGE models cannot be solved analytically, except in the rare cases considered in Section 1.3.5. Instead, the variables are related to each other, as in f (x, y) = 0. This is an implicit function. In this case, the implicit function theorem (see Section 13.4) provides conditions under which a local solution at the point (¯ x , ¯y ) implies the existence of a continuous and differentiable function y = h(x) so that f (x, h(x)) = 0 near (¯ x , ¯y ). To approximate the function h locally by a linear function g(x), we must compute h0 (¯ x ). The respective tool is the chain rule. Differentiating f (x, h(x)) with respect to x, applying the chain rule and evaluating the ensuing expression at the point x¯ gives f x (¯ x , ¯y ) + f y (¯ x , ¯y )h0 (¯ x ) = 0, which can be solved for h0 (¯ x ) = − f x (¯ x , ¯y )/ f y (¯ x , ¯y ). Note that this requires f y (¯ x , ¯y ) 6= 0. If this holds, the function g(x), which approximates h(x) linearly in a neighborhood ∆x := x − x¯ of x¯ , is given by g(x) := ¯y + h0 (¯ x )(x − x¯ ).
Consider a slightly extended example, where x and y refer to the same variable but at different points in time: 2
See, e.g., Judd (1998), p. 449.
82
2 Perturbation Methods: Framework and Tools
f (x t , x t+1 ) = 0.
(2.2)
Proceeding as before gives x t+1 = g(x t ) = x¯ + h0 (¯ x )(x t − x¯ ), where f (¯ x , x¯ ) = 0 denotes the stationary equilibrium of the dynamic system defined by equation (2.2). This is a linear, first-order difference equation. Written in terms of deviations from x¯ : x¯ t+1 = h0 (¯ x )¯ x t , x¯ t := x t − x¯
it is easy to see that |h0 (¯ x )| < 1 ensures that x t will converge to x¯ for any x t ∈ (¯ x − ∆x, x¯ + ∆x). The results gathered in Section 16.2 establish the conditions under which the behavior of the linear difference equation locally approximates the dynamics of the nonlinear model (2.2). In summary, differentiability, Taylor’s theorem, the implicit function theorem, the chain rule of differentiation, and results from the theory of nonlinear difference equations build the toolkit of perturbation methods for the solution of DGE models. Before we extend this univariate example to the case of many variables, we will demonstrate the interplay between the tools in solving the deterministic Ramsey model introduced in Section 1.2.
2.3.2 Application to the Deterministic Ramsey Model This section solves the deterministic Ramsey model in two ways. First, we use the implicit function theorem to obtain a system of two linear difference equations that approximate the model’s dynamics near the stationary solution. Second, we employ the implicit function theorem in a more direct approach to find the first-order and second-order approximations of the policy functions for capital and consumption. Our starting point for both approaches is the following set of equilibrium conditions derived in Section 1.3, which we reproduce from equations (1.19):
0
K t+1 − f (K t ) + C t =: g 1 (K t , C t , K t+1 , C t+1 )= 0, 0
0
2
u (C t ) − βu (C t+1 ) f (K t+1 ) =: g (K t , C t , K t+1 , C t+1 )= 0.
(2.3a) (2.3b)
2.3 Tools
83
Equation (2.3a) is the farmer’s resource constraint.3 It states that seed available for the next period K t+1 equals production f (K t ) minus consumption C t . The first-order condition with respect to the next-period stock of capital K t+1 is equation (2.3b). APPROXIMATE COMPUTATION OF THE SADDLE PATH. Equations (2.3) implicitly specify a nonlinear system of difference equations x t+1 = Ψ(x t ) in the vector x t := [K t , C t ]0 : g(x t , Ψ(xt )) = 02×1 , g = [g 1 , g 2 ]0 . The stationary solution is defined by K = K t = K t+1 and C = C t = C t+1 for all t ∈ N.4 Using this in (2.3) yields K = f (K) − C,
(2.4a)
0
1 = β f (K).
(2.4b)
Point x := [K, C]0 is a fixed point of Ψ, in the sense that x = Ψ(x). We obtain the linear approximation of Ψ at x via equation (13.12): ¯ t+1 = J(x)¯ x xt ,
¯ t := x t − x, x
(2.5)
with the Jacobian matrix J determined by J(x) =
−1
∂ g 1 (x,x) ∂ g 1 (x,x) ∂ K2 t+1 ∂ 2C t+1 ∂ g (x,x) ∂ g (x,x) ∂ K t+1 ∂ C t+1
∂ g 1 (x,x) ∂ g 1 (x,x) ∂ Ct ∂2 K t . ∂ g (x,x) ∂ g 2 (x,x) ∂ Kt ∂ Ct
(2.6)
The derivatives of g at the fixed point are easily obtained from (2.3a) and (2.3b):5 −1 1 1 −1 −β 1 1 0 β J(x) = − = u0 f 00 βu0 f 00 . −βu0 f 00 −u00 0 u00 − 00 1 + 00 u
u
In computing the matrix on the rhs of this equation, we used the definition of the inverse matrix given in (12.11). The eigenvalues λ1 and λ2 of J satisfy (see (12.29)):
3
Remember that in the notation of Section 1.3 f (K) := (1 − δ)K + F (K, L), where L are the farmer’s exogenously given working hours. 4 We use the convention that symbols without a time index refer to the stationary values. 5 We suppress the arguments of the functions and write f 0 instead of f 0 (K) etc.
84
2 Perturbation Methods: Framework and Tools
det J =
1 β
= λ1 λ2 ,
1 βu0 f 00 tr J = 1 + + = λ1 + λ2 . β u00 | {z } =:∆
Therefore, they solve equation φ(λ) := λ +
1/β = ∆. λ
The solutions are the points of intersection between the horizontal line through ∆ and the hyperbola p φ(λ) (see Figure 2.1). The graph of φ obtains a minimum at λmin = 1/ β > 1, where φ 0 (λmin ) = 1 − (1/β)λ−2 = 0.6 Since φ(1) = 1 + (1/β) < ∆, there must be one intersection to the right of λ = 1 and one to the left, proving that J has one real eigenvalue λ1 < 1 and another real eigenvalue λ2 > 1.
φ(λ) := λ +
1/β λ
λ
∆
1/β λ
λ1 1 λmin λ2 Figure 2.1 Eigenvalues of J 6
In Figure 2.1 λmin is so close to λ = 1 that we do not show it.
λ
2.3 Tools
85
Let J = T S T −1 denote the Schur factorization7 of J with T T −1 = I2 and the triangular matrix λ1 s12 S= . 0 λ2 In the new variables (where T −1 = (t i j )) y1t t 11 t 12 K t − K −1 ¯t ⇔ yt = T x = 21 22 , y2t t t Ct − C
(2.7)
the system of equations (2.5) is given by y t+1 = Sy t . The second line of this matrix equation is y2t+1 = λ2 y2t . Since λ2 > 1, the variable y2t will diverge unless we set y20 = 0. This restricts the system to the stable eigenspace. Using y2t = 0 in (2.7) implies 0 = t 21 x¯1t + t 22 x¯2t , y1t = (t
11
12
21
(2.8a) 22
− t (t /t ))x 1t .
(2.8b)
The first line is the linearized policy function for consumption: Ct − C = −
t 21 [K t − K] . t 22
(2.9a)
The second line of (2.8) implies via y1t+1 = λ1 y1t the linearized policy function for savings: K t+1 − K = λ1 [K t − K] .
(2.9b)
We illustrate these computations in the program Ramsey1.g, where we use u(C) = [C 1−η −1]/(1−η) and F (K, L) = K α . In this program, we show that it is not necessary to compute the Jacobian matrix analytically as we have done here. You may also write a procedure that receives the vector [K t , C t , K t+1 , C t+1 ] T as input and that returns the rhs of equations (2.3). This procedure can be passed to a routine that numerically evaluates the 7
See equation (12.33) in Section 12.9.
86
2 Perturbation Methods: Framework and Tools
partial derivatives at point (K, C, K, C). From the output of this procedure, you can extract the matrices that appear on the rhs of equation (2.6). Figure 2.2 compares the time path of the capital stock under the analytic solution K t+1 = αβ K tα (which requires η = δ = 1) with the path obtained from the approximate linear solution. The parameters are set equal to α = 0.36 and β = 0.996, respectively. The initial capital stock equals one-tenth of the stationary capital stock. As we would expect, far from the fixed point, the linear approximation is not that good. However, after approximately five iterations it is visually indistinguishable from the analytic solution. 0.20
Kt
0.15 0.10
Linear Approximation Analytic Solution
0.05
1
2
3
4
5
6
7
t Figure 2.2 Approximate Time Path of the Capital Stock in the Deterministic Growth Model
LINEAR AND QUADRATIC APPROXIMATIONS VIA THE IMPLICIT FUNCTION THEOREM. We now apply the implicit function theorem directly to find the linear approximation of the policy function for optimal savings. Let K t+1 = h(K t ) denote this function. Since K = h(K), its linear approximation at K is given by K t+1 = K + h0 (K)(K t − K).
(2.10)
Substituting equation (2.3a) for C t = f (K t ) − h(K t ) into equation (2.3b) delivers: 0 = φ(K t ) := u0 [ f (K t ) − h(K t )]
− βu0 [ f (h(K t )) − h(h(K t ))] f 0 (h(K t )).
2.3 Tools
87
Note that this condition must hold for arbitrary values of K t so that Theorem 13.4.1 allows us to compute h0 (K) from φ 0 (K) = 0. Differentiating with respect to K t and evaluating the resulting expression at K provides the following quadratic equation in h0 (K) (we suppress the arguments of all functions): (h0 )2 − 1 + (1/β) + (βu0 f 00 )/u00 h0 + (1/β) = 0. (2.11) | {z } =:∆
Let h01 and h02 denote the solutions. Since (by Vieta’s rule) h01 + h02 = ∆ and h01 h02 = 1/β, the solutions of equation (2.11) equal the eigenvalues of the Jacobian matrix λ1 and λ2 obtained in the previous subsection. The solution is, thus, given by h0 (K) = λ1 , and the approximate policy function coincides with equation (2.9a). Note that we actually do not need to compute the approximate policy function for consumption. Given the approximate savings function (2.9a) we obtain the solution for consumption directly from the resource constraint (2.3a). Observe further that this way of computing h0 (K) is less readily implemented on a computer. If we do not want to derive the quadratic equation by paper and pencil, we must employ software that is able to handle symbolic expressions, such as the computer algebra systems Mathematica or Maple. The need for symbolic algebra tools becomes even more apparent if we want to solve for the second-order approximations of the functions K t+1 = h(K t ) and C t = g(K t ), given by K t+1 = K + h0 (K)(K t − K) + 0.5h00 (K)(K t − K)2 ,
C t = C + g 0 (K)(K t − K) + 0.5g 00 (K)(K t − K)2 .
In Problem 2.2 we ask you to show that h00 and g 00 solve a system of linear R equations. Our MATLAB program DGM.m solves this system for u(C) :=
C 1−η − 1 , 1−η
f (K) := K α + (1 − δ)K.
It also demonstrates how our toolkit CoRRAM can be used to find the same solution with fewer lines of code. Figure 2.3 displays the linear and quadratic approximations of the policy function for consumption for the case of log-utility and full depreciation so that the analytic solution is given by C t = (1 − αβ)K tα . The parameter
88
2 Perturbation Methods: Framework and Tools 0.39 0.38
Linear Quadratic Analytic
Ct
0.37 0.36 0.35 0.34 0.33 0.16
0.17
0.18
0.19
0.2
0.21
0.22
0.23
0.24
0.25
Kt Figure 2.3 Policy Function for Consumption in the Deterministic Growth Model
values are α = 0.36 and β = 0.996. The second-order approximation (the blue line) is close to the analytic solution (the red line) so that the difference is almost invisible. It should be obvious from this example that perturbation solutions of more general models require additional tools. One does not want to derive complicated expressions using paper and pencil. Aside from the tremendous effort, this approach is error prone. We will return to this point in Section 2.6. We proceed with the linear-quadratic model in the next section. In addition to its importance as a framework of its own for dynamic models (see Hansen and Sargent (2014)), it serves as a starting point for the introduction of our canonical model in Section 2.5.
2.4 The Stochastic Linear-Quadratic Model This section presents the linear-quadratic (LQ) model and derives some of its important properties. The model consists of a quadratic objective function and a linear law of motion. Its solutions are linear policy functions. Accordingly, more generally, nonlinear models can be reduced to this structure if the nonlinear part of the model is approximated quadratically.
2.4 The Stochastic Linear-Quadratic Model
89
2.4.1 The Model Consider an economy governed by the following stochastic linear law of motion: x t+1 = Ax t + Bu t + ε t+1 .
(2.12)
The n-dimensional column vector x t holds the state variables , i.e., variables that are predetermined at period t. A fictitious social planner sets the values of the variables stacked in the m-dimensional column vector u t . We refer to x t as the state vector and to u t as the vector of control variables. A ∈ Rn×n and B ∈ Rn×m are matrices. Due to the presence of shocks, the planner cannot control this economy perfectly. The n vector of shocks ε t+1 is independently and identically distributed (iid) with mean E(ε t+1 ) = 0 and covariance matrix E ε t+1 ε Tt+1 = Σ.8 The planner must choose u t before the shocks are drawn by nature. Given x t , the planner’s objective is to maximize Et
∞ X s=0
β s x Tt+s Qx t+s + u Tt+s Ru t+s + 2u Tt+s Sx t+s ,
β ∈ (0, 1),
subject to (2.12). The current period objective function T T Q S T xt g(x t , u t ) := x t , u t S R ut
(2.13)
(2.14)
T is quadratic and concave in x Tt , u Tt . This requires that both the symmetric n × n matrix Q and the symmetric m × m matrix R are negative semidefinite.
2.4.2 Policy Functions The Bellman equation for the stochastic LQ problem is given by: v(x) := max u
x T Qx + 2u T Sx + u T Ru + βE [v(Ax + Bu + ε)] ,
(2.15)
Here and thereafter, the superscript T denotes transposition, i.e., ε T is a row vector and ε a column vector. 8
90
2 Perturbation Methods: Framework and Tools
where we used (2.12) to replace the next-period state variables in Ev(·) and where we dropped the time indices for convenience because all variables refer to the same date t. Expectations are taken conditional on the information contained in the current state x. We guess that the value function is given by v(x) := x T Px + d, P being a symmetric, negative semidefinite square matrix of dimension n and d ∈ R an unknown constant.9 Thus, we may write (2.15) as follows:10 x T Px + d = max u
x T Qx + 2u T Sx + u T Ru + βE (Ax + Bu + ε) T P(Ax + Bu + ε) + d .
(2.16)
Evaluating the conditional expectations on the rhs of (2.16) yields: x T Px + d = max u
x T Qx + 2uSx + u T Ru + βx T AT PAx + 2βx T AT P Bu + βu T B T P Bu
(2.17)
+ β tr(PΣ) + β d. In the next step, we differentiate the rhs of (2.17) with respect to the control vector u, set the result equal to the zero vector, and solve for u. This provides the solution for the policy function: u = − (R + β B T P B)−1 (S + β B T PA) x. | {z }
(2.18)
F
To find the solution for the matrix P and the constant d, we eliminate u from the Bellman equation (2.17) and compare the quadratic forms and the constant terms on both sides. Thus, P must satisfy the following implicit equation, known as discrete algebraic Riccati equation (DARE): P = Q + βAT PA
−1 − (S + β B T PA) T R + β B T P B (S + β B T PA)
(2.19)
and that d is given by:
Note, since x Tt Px t is a quadratic form, it is not restrictive to assume that P is symmetric. Furthermore, since the value function of a well-defined dynamic programming problem is strictly concave, P must be negative semidefinite. 10 If you are unfamiliar with matrix algebra, you may find it helpful to consult Sections 12.6 and 12.7. We present the details of the derivation of the policy function in Appendix A.3. 9
2.4 The Stochastic Linear-Quadratic Model
d=
91
β tr(PΣ). 1−β
There are several algorithms to solve DARE (see, e.g., Bini et al. (2012)). A straightforward approach is to iterate the matrix Riccati difference equation Ps+1 = Q + βAT Ps A
−1 − (S + β B T Ps A) T R + β B T Ps B (S + β B T Ps A),
starting with some initial negative definite matrix P0 11 and stopping if ||Ps+1 − Ps || < t ol in a given matrix norm ||·|| and for a prescribed tolerance criterion t ol. Once the solution for P has been computed, the dynamics of the model are governed by x t+1 = Ax t + Bu t + ε t+1 = (A − BF )x t + ε t+1 .
2.4.3 Certainty Equivalence The solution of the stochastic LQ problem has a remarkable feature. Since the covariance matrix of the shocks Σ does not appear equation (2.18) nor in equation (2.19), the optimal control is independent of the stochastic properties of the model summarized by Σ. Had we considered a deterministic linear quadratic problem by assuming ε t+1 = 0∀t ∈ N, we would have found the same feedback rule (2.18). You may want to verify this claim by solving Problem 2.3. This property of the stochastic LQ problem is called the certainty equivalence property. It is important to note that if we use the LQ approximation to solve DSGE models, we enforce the certainty equivalence property on this solution. This may hide important features of the model. For instance, consider two economies A and B that are identical in all respects except for the size of their productivity shocks. If economy A’s shock has a much larger standard deviation than economy B’s, it is difficult to believe that the agents in both economies use the same feed-back rules. 11
For example P0 = −0.01I n .
92
2 Perturbation Methods: Framework and Tools
2.5 A Canonical DSGE Model 2.5.1 Example There are several approaches to compute perturbation solutions of DSGE models, and each of them requires a specific formulation of the model. Examples are King and Watson (1998), Uhlig (1999), Klein (2000), or Sims (2002). Our canonical model is close to the approach of Schmitt-Grohé and Uribe (2004). Heiberger et al. (2015) employ the same model and prove that the linear part of the perturbation solution is unique, given that it exists at all. To motivate our framework, let us return to Example 1.6.1 and the set of equations (1.64) that determines the dynamics of this model. We repeat this system for the reader’s convenience: −η
0 = c t (1 − L t )θ (1−η) − λ t ,
(2.20a)
0 = θ c t − w t (1 − L t ),
(2.20b)
0 = y t − Z t kαt L 1−α , t yt 0 = w t − (1 − α) , Lt yt 0 = rt − α , kt
(2.20c) (2.20d) (2.20e)
0 = yt − ct − it
(2.20f)
0 = ak t+1 − (1 − δ)k t − i t , 0 = λt − β a
−η
(2.20g)
E t λ t+1 (1 − δ + r t+1 ) ,
0 = ln Z t+1 − ρ Z ln Z t + ε t+1 ,
ε t+1 iid
(2.20h) N (0, σε2 ).
(2.20i)
The symbols denote (where necessary) variables scaled by the level of labor augmenting technical progress A t , which grow at the rate a − 1. They refer to consumption c t , working hours L t , the Lagrange multiplier λ t of the household’s budget constraint, the real wage w t , output y t , total factor productivity Z t , the stock of capital k t , the rental rate of capital services r t , and investment i t . Equation (2.20a) states that the Lagrange multiplier must be equal to the marginal utility of consumption. Condition (2.20b) determines labor supply, equation (2.20c) is the production function, and equations (2.20d) and (2.20e) are the firm’s first-order conditions with respect to the demand for labor and capital services, equation (2.20f) is the equilibrium condition of the goods market, equation (2.20g) is the law of capital accumulation, equation (2.20h) is the household’s first-
2.5 A Canonical DSGE Model
93
order condition for the future stock of capital, and equation (2.20i) is the stochastic process for the log of total factor productivity (TFP) Z t . In this system, we can distinguish between three kinds of variables. The variable z t := ln Z t is purely exogenous and determined by the the autoregressive process given in equation (2.20i). The capital stock k t is predetermined at the beginning of each period t but evolves endogenously from period to period via equation (2.20g). The remaining variables ut := [ y t , c t , i t , L t , w t , r t , λ t ] T are determined within each period from equations (2.20a) through (2.20f) and equation (2.20h) given the values of k t and z t . Therefore, we will refer to them interchangeably as not predetermined, jump, or control variables. In terms of the LQ model of the previous subsection, k t and z t are the model’s state variables that follow the nonlinear law of motion given by equations (2.20g) and (2.20i). Each of the eight equations (2.20a)-(2.20h) is a particular version of the more general statement 0 = E t g i (st ),
T st = k t+1 , z t+1 , u Tt+1 , k t , z t , u Tt , i = 1, 2, . . . , 8.
2.5.2 Generalization VARIABLES. This structure is readily generalized to a canonical dynamic stochastic general equilibrium (DSGE) model. In the following, we denote the number of elements (i.e., the dimension) of a vector x with n(x) and distinguish three kinds of variables: endogenous states, exogenous states, and jump variables. We collect the endogenous state variables in the vector x t ∈ X ⊂ Rn(x) , which belongs to some subset X of the n(x) dimensional Euclidean space Rn(x) . Endogenous states have predetermined values at the beginning of period t but are endogenous insofar as their time paths are determined from the solution of the model. In our example, k t is the single element of this vector. Exogenous states, collected in the vector z t ∈ Z ⊂ Rn(z) , refer to variables determined outside of the model. In our example, the log of TFP Z t is the single element of this vector. Finally, the vector y t ∈ Y ⊂ Rn( y) comprises all variables that are not predetermined and not exogenous to the model. In our example, the elements of the vector u t belong to this category.
94
2 Perturbation Methods: Framework and Tools
EQUATIONS. The dynamics of the model are determined by a set of nonlinear difference equations and the law of motion of the stochastic variables in the vector z t . The former are stated as 0[n(x)+n( y)]×1 = E t g(x t+1 , y t+1 , z t+1 , x t , y t , z t ).
(2.21a)
This system determines implicitly the dynamics of the endogenous variables x t+1 and y t for any given sequence of the stochastic variables z t . For the latter we assume a first-order vector autoregressive (VAR(1))-process 0n(z)×1 = z t+1 − Rz t − ση t+1 ,
(2.21b)
η t = Ωε t ,
T
ε t iid E(ε t ) = 0n(z)×1 , var(ε t ) = E ε t ε t = I n(z) , Sn(z)×n(z)2 = E t η t+1 η Tt+1 ⊗ η Tt+1 .
(2.21c) (2.21d) (2.21e)
The parameter σ ∈ R≥0 can be used to transform the model into a deterministic one, since for σ = 0, the stochastic part drops out of equation (2.21b). The n(z) × n(z) matrix Ω introduces covariations between the elements of the n(z) vector ε t+1 . To see this, compute the covariance matrix Ση of the vector η t+1 . This gives Ση := E η t+1 η Tt+1 = E Ωε t+1 ε Tt+1 Ω T = ΩI n(z) Ω T = ΩΩ T . Note that we can recover Ω from a given positive definite matrix Ση as the square root of Ση (see, e.g., Greene (2012), p. 1041): Ω = CΛ1/2 C T , where Λ1/2 is the diagonal matrix with the square roots of the eigenvalues of Ση on the main diagonal and C is the matrix of normalized eigenvectors of Ση so that C C T = I n(z) . The statement in (2.21e) involves the Kronecker product ⊗ defined in equation (12.18) in Section 12.6. Using this definition, we obtain:
s111 . . . s11n(z) s121 s211 . . . s21n(z) s221 S= .. .. ... . . . . . sn(z)11 . . . sn(z)1n(z) sn(z)21 si, j,k = E t ηi t+1 η j t+1 ηkt+1
. . . s12n(z) . . . s22n(z) .. .. . . . . . sn(z)2n(z)
. . . s1n(z)n(z) . . . s2n(z)n(z) , .. .. . . . . . sn(z)n(z)n(z) (2.22a) (2.22b)
2.5 A Canonical DSGE Model
=
n(z) X n(z) X n(z) X s=1 r=1 l=1
95
Ωis Ω jr Ωkl E t (εst+1 ε r t+1 εl t+1 ) .
Note that S will be a zero matrix if E t (εst+1 ε r t+1 εl t+1 ) = 0 for all s, r, l ∈ {1, 2, . . . , n(z)}. Sufficient conditions for this to hold are independence and symmetry. Independence implies σsr l := E t (εst+1 ε r t+1 εl t+1 ) = E t (εst+1 )E t (ε r t+1 )E t (εl t+1 ). Since we assume E t (εi t+1 ) = 0 for all i = 1, . . . , n(z) (see (2.21d)), σsr l = 0 for all s 6= r 6= l and for all s 6= r = l. If all the n(z) elements of ε t+1 are drawn from a symmetric distribution, σsss = E t (ε3st+1 ) = 0 for all s = 1, . . . , n(z). We assume that the matrix R is such that the linear difference equation z t+1 = Rz t is stable. Accordingly, if there are no contemporary shocks, i.e., σ = 0 or ε t+1 = 0n(z)×1 for all t ∈ N, the vector z t asymptotically approaches the zero vector. This will happen if the eigenvalues of the matrix R are all within the unit circle.12 This condition also implies that the stochastic process defined by equation (2.21b) is covariance stationary (see Lütkepohl (2005), p. 15). Except for the explicit distinction between endogenous and exogenous states and the extension to nonsymmetric innovations in (2.21e), the canonical model (2.21) is equal to the framework adopted by SchmittGrohé and Uribe (2004). Different from the LQ model of Section (2.4), only the exogenous states z t follow a linear stochastic law of motion. The dynamics of the endogenous states x t are implicitly determined by the system (2.21a) and may well be nonlinear. SOLUTIONS. The solution of the model are time-invariant functions of the vector of state variables w t := [x Tt , z Tt ] T and the parameter σ. They determine the future endogenous state variables x t+1 and the current control variables y t . We will refer to them interchangeably as solution, policy functions, or decision rules and employ the notation: x1 h (w t , σ) .. , x t+1 = h x (x t , z t , σ) := (2.23a) . h x n(x) (w t , σ)
12
The unit circle is considered in Section 12.2 and Theorem 16.2.1 establishes that this condition is necessary and sufficient for the asymptotic.
96
2 Perturbation Methods: Framework and Tools
h y1 (w t , σ) .. . y t = h y (x t , z t , σ) := . h
yn( y)
(2.23b)
(w t , σ)
The perturbation approach approximates these functions by Taylor polynomials in the vector w t and the perturbation parameter σ. Given that the system of equations (2.21a) and (2.21b) has a stable solution at σ = 0, the implicit function theorem (13.4.1) implies the existence of a solution for a nearby stochastic model at σ = 1. The qualification ‘nearby’ requires that the elements on the diagonal of the covariance matrix ΩΩ T are not too large. As we have seen in the example in Section 2.3.2, finding these approximations requires the derivatives of the system (2.21a). This was easy in the deterministic model with one endogenous state K t and one jump variable C t , but it is quite intricate in the general case with many variables. We will need a few more tools to be able to handle the canonical DSGE model (2.21).
2.6 More Tools and First Results 2.6.1 Computer Algebra versus Paper and Pencil As we have seen in Section 2.3.2, it requires several steps to obtain approximate solutions for the policy function (2.23). The first step is to solve the system (2.21a) at the deterministic stationary solution. For σ = 0, equation (2.21b) becomes a purely deterministic vector-valued autoregressive process. Assuming that the deterministic system implied by 0[n(x)+n( y)]×1 = g(x t+1 , y t+1 , z t+1 , x t , y t , z t ), 0n(z)×1 = z t+1 − Rz t is asymptotically stable, it approaches a stationary solution defined by 0[n(x)+n( y)]×1 = g(x, y, 0, x, y, 0).
(2.24)
This is a nonlinear system of equations. In simple cases, as the model of Example 1.6.1, it is possible to solve this system using paper and pencil (see equations (1.65)). However, more complex models require a nonlinear equation solver, as the algorithms considered in Section 15.3.
2.6 More Tools and First Results
97
In this book, we use the terms deterministic stationary solution, deterministic stationary equilibrium, deterministic steady state , and nonstochastic steady state interchangeably. They all refer to the solution of the system (1.65). The second and all further steps invoke the implicit function theorem. To apply this theorem, we replace x t+1 , y t+1 , and y t by the policy functions (2.23a) and (2.23b) and z t+1 by (2.21b). This yields: 0m×1 = E t g h x (x t , zt , σ), h y (h x (x t , z t , σ), Rz t + σΩε t+1 , σ) , (2.25) Rz t + σΩε t+1 , x t , h y (x t , z t , σ), z t , where m = n(x)+n( y). This nonlinear system must hold for all conceivable values of the vectors x t and z t and the parameter σ. Thus, it must also hold with respect to the derivatives of this system with respect to x t , z t , and σ evaluated at the point s := [x T , y T , 0 T , x T , y T , 0 T ] T . As you will see in a moment, computing these derivatives by paper and pencil is a formidable task. Therefore, one way to proceed is to employ a computer algebra system (CAS). Software, such as Mathematica, Maple, or Sage are able to derive the formulas for the partial derivatives of (2.25) to any desired order (although the respective formulas will be difficult to read). They allow the user to manipulate the ensuing systems of equations symbolically up to the point where the embedded solvers can numerically compute the matrices that represent the approximate solution. The Mathematica program of Swanson et al. (2006) employs this approach. The reader who does not want to get involved in all the remaining details can stop here and proceed to the applications in Chapter 4. The second approach employs at least partly paper and pencil and uses one of several methods to compute derivatives. This approach was pioneered by Jin and Judd (2002) and Judd (1998) and employed, among others, by Schmitt-Grohé and Uribe (2004), Gomme and Klein (2011), Andreasen (2012), and Binning (2013). The analytical derivations either rely on the use of tensor notation (Jin and Judd (2002), Schmitt-Grohé and Uribe (2004), and Andreasen (2012)) or employ chain rules for matrixvalued derivatives (Gomme and Klein (2011) and Binning (2013)). We introduce the tensor approach in the next subsection and derive a few results analytically. Our own approach, explained in great detail in Chapter R 3 and implemented in our GAUSS and MATLAB toolboxes, rests on the use of chain rules for matrix-valued derivatives. We introduce these rules in Section 2.6.3 and close this chapter in Section 2.6.4 with a sketch of the different ways to compute partial derivatives numerically.
98
2 Perturbation Methods: Framework and Tools
2.6.2 Derivatives of Composite Functions and Tensor Notation COMPOSITE FUNCTION. The rhs of the system of equations (2.25) is a composite function. Consider the vector s t := [x Tt+1 , y Tt+1 , z Tt+1 , x Tt , y Tt , z Tt ] T ∈ R2[n(x)+n( y)+n(z)] . Via the policy functions (2.23) and via equation (2.21b), its elements are themselves functions f i (·), i = 1, 2, . . . , 2[n(x) + n( y) + n(z)] of the vector w t := [x Tt , z Tt , σ] T ; for instance: s1t = h x 1 (x t , z t , σ) and s[n(x)+n( y)+n(z)+1]t = x 1t . The elements g j (s t ), j = 1, 2, . . . , n(x) + n( y) of g map w t to a point v ∈ Rn(x)+n( y) . Therefore, and ignoring the time index for the time being, we have the composite function p(w) := (g ◦ f) (w) defined by 1 1 g f (w), f 2 (w), . . . , f n( f ) (w) , g 2 f 1 (w), f 2 (w), . . . , f n( f ) (w) , (2.26) p(w) := (g ◦ f) (w) = .. . g n(g) f 1 (w), f 2 (w), . . . , f n( f ) (w) where n(g) = n(x)+n( y) and n( f ) = 2[n(x)+n( y)+n(z)]. The application of the implicit function theorem requires computing the partial derivatives of the map p. PARTIAL DERIVATIVES. We begin with the matrix of first-order partial derivatives, the Jacobian matrix. Let p ij denote the element in row i and column j of this matrix. According to the chain rule of differentiation this element is equal to p ij
n( f )
∂ p i (w) X ∂ g i (s) ∂ f r (w) := = . ∂ wj ∂ sr ∂ wj r=1
(2.27)
The tensor notation simplifies this expression in two ways. First, we employ p ij as a shortcut for the collection of real numbers that built the Jacobian matrix. In the same way, we employ gsi and f wr as a shortcut for the r
j
gradient vectors of g i with respect to s, and of f r with respect to w. Second, we suppress the summation sign. Hence, equation (2.27) simplifies to p ij = gsi f wr . r
j
2.6 More Tools and First Results
99
Differentiating equation (2.27) with respect to w k gives the elements of the matrix of second-order partial derivatives, the Hessian matrix. Applying the chain rule again, we find: ∂ 2 p i (w) ∂ w j ∂ wk
p ijk = =
n( f) X r=1
(2.28)
n( f ) n( f ) ∂ g i (s) ∂ 2 f r (w) X X ∂ 2 g i (s) ∂ f r2 (w) ∂ f r1 (w) + . ∂ s r ∂ w j ∂ w k r =1 r =1 ∂ s r1 ∂ s r2 ∂ w k ∂ wj 1
2
Applying the same rules, we may identify the expression p ijk with the object
whose elements are the n(g) × n( f )2 second-order partial derivatives of the vector-valued function p(w) and gsi ,s and f wl ,w as the objects whose r1
r2
j
k
elements are the second-order partial derivatives of g and f, respectively. Thus, the previous equation can be written briefly as p ijk = gsi f wr r
j ,w k
+ gsi
r1 ,s r2
f wr2 f wr1 . k
j
Note that the indices r, r1 and r2 represent the summation. The index i refers to the number of the equation and the indices j and k to the number of the argument of p. Thus, what is lost by applying this notation are the upper limits of summation.13 Finally, consider an element of the matrix of third-order partial derivatives. It derives from applying the chain rule to (2.28): p ijkl := =
∂ 3 p i (w) ∂ w j ∂ wk ∂ wl n( f) X r1
n( f ) n( f) X X ∂ g i (s) ∂ 3 f r1 (w) ∂ 2 g i (s) ∂ 2 f r1 (w) f r2 (w) + ∂ s r1 ∂ w j ∂ w k ∂ w l r =1 r =1 ∂ s r1 ∂ s r2 ∂ w j ∂ w k ∂ w l =1
+
+
+ 13
1
n( f ) n( f) X X r1 =1 r2
2 i
r2
2
∂ g (s) ∂ f (w) ∂ f r1 (w) ∂ s r1 ∂ s r2 ∂ w k ∂ w j ∂ w l =1 2
n( f ) n( f) X X r1 =1 r2
∂ 2 g i (s) ∂ 2 f r2 (w) ∂ f r1 (w) ∂ s r1 ∂ s r2 ∂ w k ∂ w l ∂ w j =1
(2.29a)
n( f ) n( f ) n( f) X X X r1 =1 r2 =1 r3
∂ 3 g i (s) ∂ f r3 ∂ f r2 (w) ∂ f r1 (w) . ∂ s r1 ∂ s r2 ∂ s r3 ∂ w l ∂ w k ∂ wj =1
r We could go even further and suppress the elements s and u and write p ijk = g ri f j,k + However, as you will notice in a moment, this would raise ambiguities when
r r g ri 1 ,r2 f k 2 f j 1 .
applied to the system (2.21a).
100
2 Perturbation Methods: Framework and Tools
In tensor notation and suppressing the arguments s and w, this can be written as r
r p ijkl =g ri f jkl + g ri
1 ,r2
r
f jk1 f l 2 + g ri
r
+ g ri
1 ,r2 ,r3
r
r
1 ,r2
f l 3 f r2 + g ri
1 ,r2 ,r3
r
f k 2 f jl1 + g ri r
r
1 ,r2
r
r
f kl2 f j 1
fl 3 fk 2 f j 1 .
Note that the order of differentiation does not matter if we assume that the involved functions are at least three-times continuously differentiable. In this case, Young’s theorem establishes that pπ1 = pπ2 for any two of the six permutations π of the three indices j, k, l. For instance, p jkl = p jlk = plk j . CERTAINTY EQUIVALENCE. We will now apply tensor notation to prove that the linear approximation of the policy functions (2.23) is independent of the parameter σ and, hence, of the covariance matrix of the innovations ε. We have encountered this property as a feature of the solution of the stochastic LQ problem in Section 2.4. Using Taylor’s theorem (13.3.2), the linear approximation of the policy functions (2.23) at the point s = [x T , y T , 0 T , x T , y T , 0 T ] T can be written in tensor notation as x i t+1 = x i + h xx i (x j t − x j ) + hzx i z j t + hσx i σ, j
(2.30a)
j
yi t = yi + h xyi (x j t − x j ) + hzyi z j t + hσyi σ. j
(2.30b)
j
x
y
We will first show that all elements of hσi and hσi are equal to zero. In a x x y y second step, we will demonstrate that all elements of h x ij , hz ji , h x ij , and hz ji are independent of the parameter σ, and in a third step, we will prove x x y y that the second-order partial derivatives h x ij ,σ , hz ji,σ , h x ij ,σ , and hz ji,σ are also equal to zero. Differentiating the system (2.25) with respect to the parameter σ, yields the following conditions: yr
yr
xr
0n(g)×1 = E t g xi 0 hσx r + g iy 0 h x r1 hσ 2 + g iy 0 hz 0 1 Ω r2 ε t+1 + g iy 0 hσyr r
2
r1
r1
r
r2
+ gzi 0 Ω r ε t+1 + g iy hσyr , r
r
yr
xr
= g xi 0 hσx r + g iy 0 h x r1 hσ 2 + g iy 0 hσyr + g iy hσyr , r
r1
2
r
r
where Ω r denotes the rth row of the matrix Ω. The second line follows yr from the property (2.21d) of the model. In this expression, h x r1 are the 2
2.6 More Tools and First Results
101
coefficients of the linear part of the Taylor series approximation of the policy function h y (see (2.30b)) and g ij , j ∈ {x k0 , yk0 , yk } are the elements of the Jacobian matrix of partial derivatives of the vector valued function g(s t ) with respect to x t+1 , y t+1 , and y t . The prime designates the variables as of time t + 1. The second line of the above equation forms a linear homogenous system in the coefficients of the partial derivatives of the policy functions h x and h y with respect to the parameter σ. A unique y solution hσx = 0n(x)×1 and hσ = 0n( y)×1 exists if the coefficient matrix of this system has rank n(x) + n( y).14 Therefore, the linear approximation can only depend on the vector [x Tt , z Tt ] T . To prove the second result, we differentiate the system (2.25) with respect to vectors x t and z t . This yields two systems of quadratic equations (we have encountered this property in the example in Section 2.3.2): ¦ © yr x r 0 = E t g xi 0 h xx r + g iy 0 h x r1 h x j2 + g xi + g iy h xyr , (2.31a) j j r j 2 r r1 ¦ © yr x r yr r 0 = E t g xi 0 hzx r + g iy 0 h x r1 hz j 2 + g iy 0 hzr 1 R j2 + gzi 0 R rj + g iy hzyr + gzi . r
j
r1
2
r1
2
r
r
j
j
(2.31b)
Note that these are [n(x)+ n( y)]×[n(x)+ n(z)] equations in the unknown x x y y coefficients h x ij , hz ji , h x ij , and hz ji . Since the partial derivatives of g are evaluated at the stationary solution with σ = 0, none of the coefficients of this system and, hence, its solution, depends on σ. We defer the solution of this system to Chapter 3 and differentiate both equations with respect to the parameter σ. To shorten the result, note that g ij , j ∈ {x 0j , y 0j , x j , y j } 14
The system can be written in matrix terms as x1 hσ . . a1,1 . . . a1,n(x) b1,1 . . . b1,n( y) x . n(x) . . .. .. . . .. hσy = 0 .. . . m×m , . h1 . . . σ am,1 . . . am,n(x) bm,1 . . . bm,n( y) . .. yn( y)
hσ
where m = n(x) + n( y). The elements of the coefficient matrix are given by X ai j = g xi 0 + g iy 0 h xylj and bi j = g iy 0 + g iy j . j
l
l
j
Therefore, rank deficiency would, e.g., require bik = 0 for some k so that the respective variable yk would not occur in the system (2.21a), either at time t + 1 or at time t; thus, this system would be misspecified.
102
2 Perturbation Methods: Framework and Tools
depends on the vector s t . Consider, e.g., the term E t g xi . Differentiating j with respect to σ yields ( i ) ∂ g x (s t ) y x yr yr j r r Et = E t g xi ,x 0 hσx r + g xi , y 0 h x r1 hσ 2 + hzr 1 Ω r2 ε t+1 + hσ 1 j r j r1 2 2 ∂σ + g xi x
0 j ,z r
Ω r ε t+1 + g xi
j , yr
hσyr .
y
We already know hσr = hσr = 0. Furthermore, E t Ω r ε t+1 = 0 due to assumption (2.21d). Hence, the whole expression evaluates to zero. This result holds for the derivatives of all the terms g ij with respect to σ. Therefore, we obtain the following system of equations: ¦ yr x r yr xr xr 0 = E t g xi 0 h xx r ,σ + g iy 0 h x r1 h x j2,σ + g iy 0 h x r1 ,x r hσ 3 h x j2 j 2 2 3 r r1 r1 © y r1 y r1 x r2 xr i i + g y 0 h x r ,zr Ω r3 ε t+1 h x j + g y 0 h x r ,σ h x j2 + g iy h xyr ,σ , r j 2 2 3 r1 r1 ¦ y r1 x r 2 y r1 x r3 x r 2 i xr i i 0 = E t g x 0 hz ,σ + g y 0 h x r hz j ,σ + g y 0 h x r ,x r hσ hz j j
r
+
g iy 0 r
1
r1
2
yr xr h x r1 ,zr Ω r3 ε t+1 hz j 2 2 3 yr
r1
+
g iy 0 r
r
+ g iy 0 hzr 1,zr Ω r3 ε r+1 R j2 + g iy 0 r1
2
3
r1
x Taking expectations and using hσr = xr x system in the matrices h x j ,σ , hz jr,σ ,
2
3
yr xr h x r1 ,σ hz j 2 2
yr
xr
r
+ g iy 0 hzr 1,x r hσ 3 R j2 2 3 r1 1 © y r1 r2 i yr hzr ,σ R j + g y hz ,σ . 2
r
j
y hσr = 0 delivers a linear y y h x rj ,σ , and hz jr,σ . Again,
homogenous given that a unique solution exists, these matrices must be zero matrices so that they do not appear in the quadratic approximation of the policy functions. The same logic applies to the third-order partial derivatives of the policy y x functions. In particular, hwwσ and hwwσ are zero matrices. We sketch the proof of this assertion in Appendix A.4 and summarize the results of this paragraph for later reference: hσx = 0n(x)×1 ,
x hwσ x hwwσ
= 0n(x)×n(w) , = 0n(x)n(w)×n(w) ,
hσy = 0n( y)×1 ,
y hwσ y hwwσ
(2.32a)
= 0n( y)×n(w) ,
(2.32b)
= 0n( y)n(w)×n(w) .
(2.32c)
While helpful in the derivation of analytical results, the tensor notation does not lend itself to be easily implemented on the computer, except if one employs computer algebra.15 Therefore, we will consider a second approach that rests on matrix chain rules for composite functions. 15
The sceptical reader might want to read Section 2.5.4 of the second edition of the present book.
2.6 More Tools and First Results
103
2.6.3 Derivatives of Composite Functions and Matrix Chain Rules Matrix chain rules can be used to develop systems of linear equations whose solutions are the matrices of the second- and third-order Taylor series approximations of the policy functions (2.23). The advantage of these formulas over the tensor approach is that they are easily implemented with matrix-oriented programming environments, such as GAUSS R and MATLAB . They have been employed by Gomme and Klein (2011) (second-order approximation) and by Binning (2013) (third-order approximation) to solve the canonical DGE model defined in (2.21). We follow the approach of these authors in Chapter 3 and introduce the required tools in this subsection. Chain rules are formulas for the Hessian and the matrix of third-order partial derivatives of composite functions and we will apply them to the function defined in equation (2.26). They involve matrices of first-, secondand third-order partial derivatives of the functions g i (s), i = 1, 2, . . . , n(x)+ n( y) with respect to elements of the vector s, 0 0 0 s = [x 10 , . . . , x n(x) , y10 , . . . , yn( , z 0 , . . . , zn(z) , y) 1
x 1 , . . . , x n(x) , y1 , . . . , yn( y) , z1 , . . . , zn(z) ] T , and of the function f j (w t ), j = 1, 2, . . . , 2[n(x) + n( y)] with respect to the elements of the vector w t , w := [x 1 , . . . , x n(u) , z1 , . . . , zn(z) , σ] T , where the prime denotes variables dated t + 1. Before we can state the rules, we must define several matrices of partial derivatives. We do so for the vector-valued map f : Rn(w) → Rn(s) , w 7→ f(w) with element functions f i (w) : Rn(w) → R, w 7→ f i (w) but the definitions also apply to the map g : Rn(s) → Rn(x)+n( y) , s 7→ g(s). We define the following matrices:
104
2 Perturbation Methods: Framework and Tools
Jacobian matrix: 1 f w . . . f w1 1 n(w) . . fw := .. . . . .. , n( f ) f w1
...
(2.33a)
n( f ) f w n(w)
Hessian matrix: i 1 f w ,w . . . f wi ,w f ww 1 1 1 n(w) . i .. .. .. fww := .. , fww := , . . . n( f ) i i f w ,w . . . f w ,w f ww n(w)
Third-order matrix: 1 f www . fwww := .. , f i n( f ) f www
fwi
f wi
www
j ,w 1 ,w 1
fi w j ,w2 ,w1 = .. j ,w,w . f wi
j ,w n(w) ,w 1
:=
f wi f wi f wi
f wi
j ,w 2 ,w 2
.. .
j ,w n(w) ,w 2
n(w)
1 ,w,w
.. .
f wi ,w,w n(w)
j ,w 1 ,w 2
n(w)
1
(2.33b)
,
. . . f wi
(2.33c)
j ,w 1 ,w n(w)
. . . f wi ,w ,w j 2 n(w) .. .. . . i . . . f w ,w ,w j
n(w)
.
n(w)
In the same way, we define the Jacobian matrix gs , the Hessian matrix gss , and the matrix of third-order partial derivatives gsss of the vector valued function g : Rn(s) → Rn(x)+n( y) , s 7→ g(s). The chain rule for Hessian matrices from Magnus and Neudecker (1999), Theorem 9, p. 125, yields the following formula for the Hessian matrix of the composite function p(w) := (g ◦ f) (w) defined in (2.26): T pww = I n(g) ⊗ fw gss fw + gs ⊗ I n(w) fww . (2.34) Binning (2013), Theorem 1, establishes the following result for the matrix of third-order derivatives: T T T pwww = I n(g) ⊗ fw ⊗ fw gsss fw + I n(g) ⊗ ˜fww gss fw T + I n(g) ⊗ fw ⊗ I n(w) gss ⊗ I n(w) fww (2.35) T + I n(g) ⊗ ˜f gss ⊗ I n(w) fww + gs ⊗ I n(w)2 fwww , w
where
2.6 More Tools and First Results
105
1 1 1 I n(w) ⊗ f w1 f w2 . . . f w n(w) .. ˜fw := . n( f ) n( f ) n( f ) I n(w) ⊗ f w1 f w2 . . . f w n(w)
(2.36)
and the n( f ) × n(w)2 matrix ˜fww arranges each block of the Hessian matrix of f in one row:16
f w1 ,w . . . f w1 ,w 1
1
f 2 w1 ,w1 ˜fww := . .. n( f )
f w1 ,w1
1
n(w)
f w1 ,w . . . f w1 ,w 2
1
. . . f w2 ,w f w2 ,w 1 n(w) 2 1 .. .. .. . . . n( f ) n( f ) . . . f w1 ,w n(w) f w2 ,w1
2
n(w)
. . . f w2 ,w 2 n(w) .. .. . . n( f ) . . . f w2 ,w n(w)
. . . f w1
n(w) ,w 1
. . . f w2 ,w n(w) 1 .. .. . . n( f ) . . . f w n(w) ,w1
. . . f w1
n(w) ,w n(w)
. . . f w2 ,w n(w) n(w) . .. .. . . n( f ) . . . f w n(w) ,w n(w) (2.37)
In Chapter 3 we will need to compute mixed partial derivatives of the i form pσσw . The respective chain rule is given by l
T T pσσw = gs ⊗ I n(w)−1 fσσw + I n(g) ⊗ fw gss fσσ + 2 I n(g) ⊗ fσw gss fσ T T + I n(g) ⊗ fw ⊗ fσ gsss fσ . (2.38) To establish this result, put w j = w k = σ in equation (2.29a), which simplifies to17 i pσσl :=
=
∂ 3 g i (s) ∂ σ∂ σ∂ w l n( f) X r1
n( f ) n( f) X X ∂ g i (s) ∂ 3 f r1 (w) ∂ 2 g i (s) ∂ 2 f r1 (w) f r2 (w) + ∂ s r1 ∂ σ∂ σ∂ w l r =1 r =1 ∂ s r1 ∂ s r2 ∂ σ∂ σ ∂ w l =1 1
+2
+ 16
n( f ) n( f) X X r1 =1 r2
2 i
2
r2
∂ g (s) ∂ f (w) ∂ 2 f r1 (w) ∂ s r1 ∂ s r2 ∂ σ ∂ wl ∂ σ =1
n( f ) n( f ) n( f) X X X r1 =1 r2 =1 r3
∂ 3 g i (s) ∂ f r3 ∂ f r2 (w) ∂ f r1 (w) ∂ s r1 ∂ s r2 ∂ s r3 ∂ w l ∂σ ∂σ =1
Binning (2013) employs a different ordering of the partial derivatives. However, due to the symmetry properties of the matrices defined in (2.33b) and (2.33c) this has no effect on the results established in Chapter 3. 17 Note that for w j = w k = σ the third and fourth sums on the right-hand side of equation (2.29a) are equal to each other, since we can always interchange the order of the summation.
106
2 Perturbation Methods: Framework and Tools
n( f ) i = gsi fσσw l + f w1 . . . f w l gss fσσ l n( f ) n( f ) i T i 1 + 2 fσw fσ + f w1 . . . f w l ⊗ fσ gsss fσ . . . . fσw l gss l
l
Without loss of generality we may assume w n(w) = σ and stack the remaining n(w) − 1 = n(x) + n(z) elements in the vector i pσσw T i T 1 gsi ⊗ I n(x)+n(z) fσσw + fw gss fσσ + 2fσw gss fσ .. i pσσw := = . T T i + fw ⊗ fσ gsss fσ , i pσσw n(w)−1
where
1 fσσw 1 .. . n( f ) fσσw1 .. .
fσσw := , 1 fσσw n(w)−1 .. . n( f ) fσσw n(w)−1
f w1 1 . T fw := .. f w1 n(w)−1 1 fσw 1 .. T fσw := . 1 fσw
n( f ) . . . f w1 .. .. . . , n( f ) . . . f w n(w)−1 n( f ) . . . fσw1 .. .. . . .
n(w)−1
n( f )
. . . fw
n(w) −1
i If we finally stack the vector pσσw vertically, we obtain equation (2.38).
2.6.4 Computation of Partial Derivatives So far, we have assumed that the matrices of first-, second-, and thirdorder partial derivatives of the system (2.21a), gs , gss , and gsss , are given. However, once we implement the solution on the computer, we cannot ignore this issue. There are three different approaches to compute derivatives with the computer: 1) via numerical differentiation, 2) symbolic differentiation (SD) via a computer algebra system (CAS), or 3) via automatic differentiation (AD). To illustrate these different approaches, consider the first-order condition of the model of Example 1.6.1 as presented in equation (2.20a), now written as a function of three variables, f (λ, c, L) ≡ λ − c −η (1 − L)θ (1−η) ,
(2.40)
2.6 More Tools and First Results
107
where λ, c, and L denote, respectively, the (scaled) Lagrange multiplier of the resource constraint, consumption, and working hours.18 Numerical differentiation approximates derivatives by finite differences as explained in Section 14.2. Let h denote a small number. Then, the central difference formula employed in our Jacobian routines computes the derivative of f with respect to λ as: ∂ f (λ, c, L) f (λ + h, c, L) − f (λ − h, c, L) ≈ , ∂λ 2h ∂ f (λ, c, L) f (λ, c + h, L) − f (λ, c − h, L) f c := ≈ , ∂c 2h ∂ f (λ, c, L) f (λ, c, L + h) − f (λ, c, L − h) f L := ≈ . ∂L 2h
fλ :=
Note that each of these derivatives requires only two evaluations of the function f and is thus cheap in terms of runtime to compute. A second advantage is that numerical differentiation is not limited to simple functions such as (2.40). It is applicable to functions without analytical expression and to functions involving many complex computations. Furthermore, routines that provide numerical derivatives are available in many programming languages. However, the precision of the approximation depends critically on the choice of h and deteriorates with the order of the derivative. Computer algebra systems (CAS) operate with symbols instead of numbers. Examples of CAS programs are Maple, Mathematica, Sage, or R the symbolic toolbox of MATLAB . They derive the formulas for the partial derivatives from the symbolic definition of a function. In our example, they would compute the exact symbolic representation of the partial derivatives: ∂ f (C, L, Y ) = 1, ∂λ ∂ f (C, L, Y ) fc : = = ηc −η−1 (1 − L)θ (1−η) , ∂L ∂ f (C, L, Y ) fL : = = θ (1 − η)c −η (1 − L)θ (1−η)−1 . ∂Y
fλ : =
Given this code, one can evaluate the derivatives to the precision of the computers floating point system. Therefore, CAS can be used for derivatives of any order without loss of precision. The operations involved in computing the symbolic derivatives can consume quite some runtime in 18
We omit the time index for simplicity.
108
2 Perturbation Methods: Framework and Tools
models with many variables. Furthermore, they are limited to functions with analytic representation. Automatic differentiation (AD) evaluates derivatives while the value of the function is being computed. It just requires an algorithmic specification of the function and produces derivatives as precise as the function value itself (see Rall and Corliss (1996)). To understand what is meant by this statement, consider again the example in equation (2.40). The left column in Table 2.1 represents the steps involved in computing the function value at a given point (λ, c, L). The rightmost column of the list assigns to each step the respective gradient ∇, i.e., the vector of partial derivatives with respect to the three variables λ, c, and L. The operations in the middle column involve the power rule, the chain rule, and the product rule. The last step delivers in t 8 the function value and in ∇t 8 the gradient. This can be seen by successive substitution of the expressions for t i , i ∈ {2, 4, 5, 6} in the expression for ∇t 8 in Table 2.1. Thus, the algorithm delivers the partial derivatives at the same precision as it computes the function value. The algorithm does not supply the symbolic derivatives. Rather, it differentiates at each step using the elementary rules of differentiation. Table 2.1 Code List for Equation (2.40) Steps
Gradient
t1 t2 t3 t4 t5 t6 t7 t8
∇t 1 ∇t 2 ∇t 3 −η−1 ∇t 4 = −ηt 2 ∇t 2 ∇t 5 = −∇t 3 ∇t 6 = ϑt 5ϑ−1 ∇t 5 ∇t 7 = ∇t 4 t 6 + t 4 ∇t 6 ∇t 8 = ∇t 1 − ∇t 7
=λ =c =L −η = t2 , = 1 − t3 = t 5ϑ = t4 t6, = t1 − t7
ϑ := θ (1 − η)
Result = (1, 0, 0) = (0, 1, 0) = (0, 0, 1) −η−1 = (0, −ηt 2 , 0), = (0, 0, −1), = (0, 0, −ϑt 5ϑ−1 ), −η−1 = (0, −ηt 2 t 6 , −ϑt 5ϑ−1 t 4 ), −η−1 = (1, −ηt 2 t 6 , ϑt 5ϑ−1 t 4 )
There are two strategies to implement the computations in Table 2.1: source code transformation and operator overloading (see Rall and Corliss (1996)). The former transforms the program code displayed in the left column to a program code for the operations defined in the middle column. This is a two-step procedure and it requires other software to perform this
2.6 More Tools and First Results
109
task. It is not, however, limited to functions that have an analytic expression as the function defined by (2.40). Source code transformation can also be applied to functions, whose evaluation requires complex computations. Object-oriented programming languages are well suited to implement operator overloading (see, e.g., Neidinger (2010)). With respect to our example, consider the following strategy. We define a new data type, represented by the ordered pair x~ = [x, x 0 ], where the first component holds the value and the second component stores the value of the derivative. If x is a variable, its first derivative is equal to one. Hence, we initialize x~ = [x, 1]. If c is a constant so that its derivative is equal to zero, we initialize ~c = [c, 0]. To implement the code in the middle row of Table 2.1 we only have to define addition, subtraction, multiplication, and the power function for this new data type. This can be done by redefining the respective operators +, -, * and ˆ as in the following formulas: x~ + ~y = [(x + y), (x 0 + y 0 )], x~ − ~y = [(x − y), (x 0 − y 0 )],
x~ ∗ ~y = [(x ∗ y), (x 0 y + x y 0 )], x~ a = [(x a ), (a x a−1 )].
If we initialize ~t 1 = [λ, (1, 0, 0)], ~t 2 = [c, (0, 1, 0)], and ~t 3 = [L, (0, 0, 1)], the steps defined in the left column of Table 2.1 provide in ~t 8 both the R value of the function and its three partial derivatives. The MATLAB script Example_Valder.m illustrates this approach. It employs the class valder defined by Neidinger (2010). More recent programming languages, such as Python and Julia, provide packages that compute derivatives via operator overloading. The website http://www.autodiff.org/ provides an overview of the progress made in this area of computing and points to implementations of AD in various programming languages. Our comparison of the different ways of computing derivatives in the next paragraph uses CasADi, an AD impleR mentation in MATLAB (see Andersson et al. (2018)). R COMPARISON. The MATLAB script Der_BM.m uses the model of Example 1.6.1 and computes the Jacobian and the Hessian matrices of the system of equations (2.20) at the stationary solution. We compare four different methods: 1) numerical differentiation with central difference formulas (CD), 2) numerical differentiation with central difference formulas and Richardson’s extrapolation (CD&RE), 3) automatic differentiation (AD)
110
2 Perturbation Methods: Framework and Tools
R using the CasADi toolbox, and 4) SD as implemented in the MATLAB symbolic toolbox. We use the SD solution as benchmark with respect to the precision of the results. The error of one of the methods 1)-3) reported in Table 2.2 is defined as the maximum absolute value of the difference between the solution obtained from the respective method and the SD solution.
Table 2.2 Computation of Derivatives of Example 1.6.1 Method CD CD&RE AD SD
Jacobian Error Runtime 5.78e-8 1.12e-12 8.88e-16
0.002 0.005 0.034 0.132
Hessian Error Runtime 1.21e-05 9.93e-09 5.68e-14
0.012 0.080 0.014 0.336
Notes: CD: Central difference formulas, CD&RE: central difference formulas and Richardson extrapolation, AD: automatic differentiation, SD: symbolic differentiation, Error: maximum absolute error with respect to R the SD solution, runtime in seconds on a workstation with Intel Xeon W-2133 CPU running at 3.60 GHz, e-n:= 10−n .
The table shows that the AD solution for the Jacobian does not differ from the SD solution. The relative error is smaller than the machine epsilon of approximately 2.22 × 10−16 . The AD solution for the Hessian is also close to the SD solution. Its relative error is approximately two orders of magnitude larger than the machine epsilon. The precision of the numerical derivatives deteriorates: The relative error of the CD solution (CD&RE) for the Hessian is two (three) orders of magnitude larger than the relative error of the solution for the Jacobian. Richardson’s extrapolation increases the precision vis-à-vis the CD solution by approximately four to three orders of magnitude. With respect to runtime, the SD solution is the slowest. To compute the 8×18×18 = 2, 592 elements of the Hessian, it requires about one-third of a second, although this number should be used cautiously since it depends on the hardware. In the case of the Jacobian, numerical derivatives are fast, whereas the loops required for the Hessian in the case of Richardson’s extrapolation increase runtime so that this method is slower than AD.
Appendix 3
111
A.3 Solution of the Stochastic LQ Problem In this appendix, we provide the details of the solution of the stochastic linear quadratic (LQ) problem. If you are unfamiliar with matrix algebra, you should consult Section 12.7 before proceeding. Using matrix algebra we may write the Bellman equation (2.16) as follows: x T Px + d = max x T Qx + u T Ru + 2u T Sx u + βE x T AT PAx + u T B T PAx + ε T PAx (A.3.1) + x T AT P Bu + u T B T P Bu + ε T P Bu + x T AT Pε + u T B T Pε + ε T Pε + d . Since E(ε) = 0 the expectation of all linear forms involving the vector of shocks ε evaluate to zero. The expectation of the quadratic form ε T Pε is: n n n X n XX X E pi j εi ε j = pi j σi j , i=1 i=1
i=1 j=1
where σi j (σii ) denotes the covariance (variance) between εi and ε j (of εi ). It is not difficult to see that this expression equals the trace of the matrix PΣ; i.e., E(ε T Pε) = tr(PΣ). Furthermore, since P = P T and z := u T B T PAx = z T = (x T AT P B T u) T we may write the Bellman equation as x T Px + d = max x T Qx + 2u T Sx + u T Ru + βx T AT PAx u
+ 2βx T AT P Bu + βu T B T P Bu + β tr(PΣ) + β d .
(A.3.2)
This is equation (2.17) in the main text. Differentiation of the rhs of this expression with respect to u yields 2Sx + 2Ru + 2β(x T AT P B) T + 2β(B T P B)u. Setting this equal to the zero vector and solving for u gives (R + β B T P B) u = − (S + β B T PA) x | {z } | {z } C −1
D
⇒
u = −C Dx.
If we substitute this solution back into (A.3.2), we get: x T Px + d =x T Qx − 2(C Dx) T Sx + (C Dx) T RC Dx + βx T AT PAx
− 2βx T AT P BC Dx + β(C Dx) T B T P BC Dx + β tr (PΣ) + β d
=x T Qx + βx T AT PAx
− 2x T D T C T Sx − 2βx T AT P BC Dx
+ x T D T C T RC Dx + βx T D T C T B T P BC Dx + β tr (PΣ) + β d.
(A.3.3)
112
2 Perturbation Methods: Framework and Tools
The expression on the fourth line of (A.3.3) can be simplified to − 2x T D T C T Sx − 2βx T AT P BC Dx | {z } =2βx T D T C T B T PAx
T
T
T
= −2x D C (S + β B T PA) x = −2x T D T C T Dx. | {z } D
The terms on the fifth line of (A.3.3) add to x T D T C T (R + β B T P B)C Dx = x T D T C T D. | {z } I
Therefore, x T Px + d = x T Qx + βx T AT PAx − x T D T C T Dx + β tr(PΣ) + β d.
(A.3.4)
For this expression to hold, the coefficient matrices of the various quadratic forms on both sides of equation (A.3.4) must satisfy the matrix equation P = Q + βAT PA + D T C T D, and the constant d must be given by d=
β tr(PΣ). 1−β
This finishes the derivation of the solution of LQ the problem.
Appendix 4
113
A.4 Third-Order Effects

In this appendix, we sketch the proof of the result presented in equation (2.32c). In a first step, we differentiate equation (2.31a) with respect to x_k (summation over the repeated indices r_1, r_2, ... is understood). This yields

0 = g^i_{x'_{r_1}} h^{x_{r_1}}_{x_j x_k}
  + ( g^i_{x'_{r_1} x'_{r_2}} h^{x_{r_2}}_{x_k} + g^i_{x'_{r_1} y'_{r_2}} h^{y_{r_2}}_{x_{r_3}} h^{x_{r_3}}_{x_k} + g^i_{x'_{r_1} y_{r_2}} h^{y_{r_2}}_{x_k} + g^i_{x'_{r_1} x_k} ) h^{x_{r_1}}_{x_j}
  + g^i_{y'_{r_1}} ( h^{y_{r_1}}_{x_{r_2} x_{r_3}} h^{x_{r_3}}_{x_k} h^{x_{r_2}}_{x_j} + h^{y_{r_1}}_{x_{r_2}} h^{x_{r_2}}_{x_j x_k} )
  + ( g^i_{y'_{r_1} x'_{r_3}} h^{x_{r_3}}_{x_k} + g^i_{y'_{r_1} y'_{r_3}} h^{y_{r_3}}_{x_{r_4}} h^{x_{r_4}}_{x_k} + g^i_{y'_{r_1} y_{r_3}} h^{y_{r_3}}_{x_k} + g^i_{y'_{r_1} x_k} ) h^{y_{r_1}}_{x_{r_2}} h^{x_{r_2}}_{x_j}   (A.4.1)
  + g^i_{y_{r_1}} h^{y_{r_1}}_{x_j x_k}
  + ( g^i_{y_{r_1} x'_{r_2}} h^{x_{r_2}}_{x_k} + g^i_{y_{r_1} y'_{r_2}} h^{y_{r_2}}_{x_{r_3}} h^{x_{r_3}}_{x_k} + g^i_{y_{r_1} y_{r_2}} h^{y_{r_2}}_{x_k} + g^i_{y_{r_1} x_k} ) h^{y_{r_1}}_{x_j}
  + g^i_{x_j x'_{r_1}} h^{x_{r_1}}_{x_k} + g^i_{x_j y'_{r_1}} h^{y_{r_1}}_{x_{r_2}} h^{x_{r_2}}_{x_k} + g^i_{x_j y_{r_1}} h^{y_{r_1}}_{x_k} + g^i_{x_j x_k}.
In a second step, we differentiate the previous equation with respect to the perturbation parameter σ. We can shorten the presentation of the final result considerably by noting that

1) all derivatives of g^i_r and g^i_{rs} (r ∈ {x'_r, y'_r, x_r, y_r}, s ∈ {x'_s, y'_s, x_s, y_s}) with respect to σ involve (by the chain rule) either h^{y_r}_σ = 0 or h^{x_r}_σ = 0 and can thus be ignored,

2) we have already shown in Section 2.6.2 that all partial derivatives of the form h^{y_r}_{x σ} and h^{x_r}_{x σ} are also equal to zero,

3) and that (again by the chain rule) the derivatives of terms that involve h^{y_{r_1}}_{x_{r_2}}(x', z', σ) are given by

h^{y_{r_1}}_{x_{r_2} σ} + h^{y_{r_1}}_{x_{r_2} x_{r_3}} h^{x_{r_3}}_σ
and are, thus, also equal to zero. Therefore, at the stationary solution, the derivative of equation (A.4.1) with respect to σ implies the linear homogeneous system of equations

0 = g^i_{x'_{r_1}} h^{x_{r_1}}_{x_j x_k σ} + g^i_{y'_{r_1}} h^{y_{r_1}}_{x_{r_2}} h^{x_{r_2}}_{x_j x_k σ} + g^i_{y_{r_1}} h^{y_{r_1}}_{x_j x_k σ} + g^i_{y'_{r_1}} h^{y_{r_1}}_{x_{r_2} x_{r_3} σ} h^{x_{r_2}}_{x_j} h^{x_{r_3}}_{x_k},

i = 1, 2, ..., n(x) + n(y), j, k = 1, 2, ..., n(x). Given that this system has a unique solution, this solution is given by h^{x_r}_{x_j x_k σ} = 0 and h^{y_r}_{x_j x_k σ} = 0.

Next we differentiate equation (2.31a) with respect to the variable z_k. The result is given by:
0 = g^i_{x'_{r_1}} h^{x_{r_1}}_{x_j z_k}
  + ( g^i_{x'_{r_1} x'_{r_2}} h^{x_{r_2}}_{z_k} + g^i_{x'_{r_1} z'_{r_2}} R^{r_2}_k + g^i_{x'_{r_1} y'_{r_2}} ( h^{y_{r_2}}_{x_{r_3}} h^{x_{r_3}}_{z_k} + h^{y_{r_2}}_{z_{r_3}} R^{r_3}_k ) + g^i_{x'_{r_1} y_{r_2}} h^{y_{r_2}}_{z_k} + g^i_{x'_{r_1} z_k} ) h^{x_{r_1}}_{x_j}
  + g^i_{y'_{r_1}} ( h^{y_{r_1}}_{x_{r_2} x_{r_3}} h^{x_{r_3}}_{z_k} + h^{y_{r_1}}_{x_{r_2} z_{r_3}} R^{r_3}_k ) h^{x_{r_2}}_{x_j} + g^i_{y'_{r_1}} h^{y_{r_1}}_{x_{r_2}} h^{x_{r_2}}_{x_j z_k}   (A.4.2)
  + ( g^i_{y'_{r_1} x'_{r_3}} h^{x_{r_3}}_{z_k} + g^i_{y'_{r_1} z'_{r_3}} R^{r_3}_k + g^i_{y'_{r_1} y'_{r_3}} ( h^{y_{r_3}}_{x_{r_4}} h^{x_{r_4}}_{z_k} + h^{y_{r_3}}_{z_{r_4}} R^{r_4}_k ) + g^i_{y'_{r_1} y_{r_3}} h^{y_{r_3}}_{z_k} + g^i_{y'_{r_1} z_k} ) h^{y_{r_1}}_{x_{r_2}} h^{x_{r_2}}_{x_j}
  + g^i_{y_{r_1}} h^{y_{r_1}}_{x_j z_k}
  + ( g^i_{y_{r_1} x'_{r_2}} h^{x_{r_2}}_{z_k} + g^i_{y_{r_1} z'_{r_2}} R^{r_2}_k + g^i_{y_{r_1} y'_{r_2}} ( h^{y_{r_2}}_{x_{r_3}} h^{x_{r_3}}_{z_k} + h^{y_{r_2}}_{z_{r_3}} R^{r_3}_k ) + g^i_{y_{r_1} y_{r_2}} h^{y_{r_2}}_{z_k} + g^i_{y_{r_1} z_k} ) h^{y_{r_1}}_{x_j}
  + g^i_{x_j x'_{r_1}} h^{x_{r_1}}_{z_k} + g^i_{x_j z'_{r_1}} R^{r_1}_k + g^i_{x_j y'_{r_1}} ( h^{y_{r_1}}_{x_{r_2}} h^{x_{r_2}}_{z_k} + h^{y_{r_1}}_{z_{r_2}} R^{r_2}_k ) + g^i_{x_j y_{r_1}} h^{y_{r_1}}_{z_k} + g^i_{x_j z_k}.
Differentiating this equation with respect to σ and applying the same rules as above yields:

0 = g^i_{x'_{r_1}} h^{x_{r_1}}_{x_j z_k σ} + g^i_{y'_{r_1}} h^{y_{r_1}}_{x_{r_2}} h^{x_{r_2}}_{x_j z_k σ} + g^i_{y'_{r_1}} h^{x_{r_2}}_{x_j} ( h^{y_{r_1}}_{x_{r_2} x_{r_3} σ} h^{x_{r_3}}_{z_k} + h^{y_{r_1}}_{x_{r_2} z_{r_3} σ} R^{r_3}_k ) + g^i_{y_{r_1}} h^{y_{r_1}}_{x_j z_k σ}.
Since we have already shown that h^{y_r}_{x_j x_k σ} = 0, this is also a linear homogeneous system of equations in the matrices h^x_{xzσ} and h^y_{xzσ}. A unique solution of this system is given by h^{x_r}_{x_j z_k σ} = 0 and h^{y_r}_{x_j z_k σ} = 0.

The last step to prove the assertion (2.32c) consists in differentiating equation (2.31b) with respect to z_k, and next with respect to σ. We leave this step as Problem 2.4 to the reader. The final result is the linear homogeneous system:

0 = g^i_{x'_{r_1}} h^{x_{r_1}}_{z_j z_k σ} + g^i_{y'_{r_1}} ( h^{y_{r_1}}_{x_{r_2}} h^{x_{r_2}}_{z_j z_k σ} + h^{y_{r_1}}_{z_{r_2} z_{r_3} σ} R^{r_2}_j R^{r_3}_k ) + g^i_{y_{r_1}} h^{y_{r_1}}_{z_j z_k σ}.   (A.4.3)
Problems
Problem 2.1: Numerical Computation of the Saddle Path. Figure 1.2 displays the saddle path of the deterministic, infinite-horizon Ramsey model. From the perspective of dynamic programming, this path is the graph of the policy function for consumption C_t = g(K_t). In this problem, we ask the reader to compute the graph of this function numerically via shooting (see also Section 16.2.3). We parameterize the model as follows:

u(C) := (C^{1−η} − 1)/(1 − η),  η = 2,
f(K) = K^α + (1 − δ)K,  α = 0.36,  δ = 0.014.

Accordingly, the following system of equations determines the dynamics of the model (see (2.3)):

K_{t+1} = K_t^α + (1 − δ)K_t − C_t,
1 = β (C_{t+1}/C_t)^{−η} (1 − δ + α K_{t+1}^{α−1}).   (P.2.1.1)

1) Compute the stationary solution of this system and the Jacobian matrix (2.6) at this point.
2) Compute the Schur factorization of this matrix. Hint: In both GAUSS and MATLAB the command schur returns the matrices S and T.
3) Compute g'(K) from (2.9a).
4) Compute the right wing of the saddle path: This path originates at the stationary solution (K, C) and follows from iterating forward over the system (P.2.1.1). However, the iterations cannot start at (K, C). Why does this not work? Instead, consider a point close to (K, C) computed from the linear policy function (2.9a):

C_0 = C + g'(K) dK,  dK := (K_0 − K).

Choose dK = 0.001 K, start your iterations at (K_0, C_0), and continue until K_{t+1} > K̄ = δ^{1/(α−1)}.
5) Compute the left wing of the saddle path: Iterate backwards, i.e., solve (P.2.1.1) for (K_t, C_t) as functions of (K_{t+1}, C_{t+1}). Start the iterations at K_T = K − dK and C_T = C − g'(K) dK.

Problem 2.2: Second-Order Approximation of the Policy Functions of the Deterministic Ramsey Model
Consider equations (2.3). They determine the dynamics of the deterministic, infinite-horizon Ramsey model. Let K_{t+1} = h(K_t) and C_t = g(K_t) denote the
policy functions for capital and consumption that solve this model. Therefore, we may write

0 = h(K_t) − f(K_t) + g(K_t),
0 = u'(g(K_t)) − β u'(g(h(K_t))) f'(h(K_t)).

Differentiating both equations with respect to K_t provides the system of equations from which we have determined h' in Section 2.3.2:

0 = h'(K) − f'(K) + g'(K),
0 = u''(g(K)) g'(K) − β u'(g(h(K))) f''(h(K)) h'(K) − β u''(g(h(K))) f'(h(K)) h'(K) g'(K).

Differentiate this system again with respect to K and show that the solution yields the following linear system in the unknown values of the second derivatives of the policy functions at the point K:

[ 1     1   ] [ h'' ]   [ f'' ]
[ a_21  a_22] [ g'' ] = [ b_2 ],

a_21 := −β (u' f'' + u'' f' g'),
a_22 := u'' [1 − β f' (h')²],
b_2 := −u''' (g')² [1 − β f' (h')²] + β (h')² [u' f''' + 2 u'' f'' g'].
Note that we suppress the arguments of all functions in this expression for ease of readability.

Problem 2.3: Certainty Equivalence
Consider the deterministic linear quadratic optimal control problem of maximizing

Σ_{t=0}^∞ β^t ( x_t^T Q x_t + u_t^T R u_t + 2 u_t^T S x_t )

subject to the linear law of motion x_{t+1} = A x_t + B u_t. Adapt the steps followed in Sections 2.4 and A.3 to this problem, and show that the optimal control as well as the matrix P are the solutions to equations (2.18) and (2.19), respectively.

Problem 2.4: Derive Equation (A.4.3)
Follow these steps:
1) Differentiate equation (2.31b) with respect to z_k.
2) Differentiate the resulting system with respect to the perturbation parameter σ.
3) Evaluate the expectational terms using property (2.21d).
4) Use h^y_{xxσ} = 0 and h^y_{xzσ} = 0.
Chapter 3
Perturbation Methods: Solutions
3.1 Introduction In this chapter, we derive perturbation solutions of the canonical DSGE model introduced in Section 2.5. In particular, we provide formulas for the computation of the matrices of the first-, second-, and third-order approximations of the policy functions (2.23). We start in Section 3.2 with the first-order solution. As can be seen from equations (2.31) the coefficients that constitute this solution solve a system of quadratic equations. Rather than solving this system with a nonlinear equations solver,1 we rely on methods from linear algebra: The linearized system of equations (2.21a) together with the process (2.21b) is a linear rational expectations model for which several solution methods exist. We consider the approach proposed by Klein (2000) and a second approach that proceeds in two steps (see, e.g., Hernandez (2013) and Heiberger et al. (2017)). The first step reduces the model to a smaller one. The second step solves the reduced model. Depending on the properties of the reduced model we either employ the method of Klein (2000) or a somewhat simpler method at this step. We proceed in Section 3.3 with the second-order solution. The coefficients of this solution solve a large system of linear equations. We derive this system following the approach advocated by Gomme and Klein (2011) which rests on the chain rule (2.34) introduced in Section 2.6.3. In Section 3.4 we follow Binning (2013) and derive an even larger system of linear equations whose solution delivers the coefficients of the third-order solution. The reader who does not wish to follow the tedious 1
See Meyer-Gohde and Saecker (2022) for this approach.
derivations can safely skip this section. Its main purpose is to document the formulas used in our GAUSS and MATLAB programs that implement the perturbation solution of DSGE models.
3.2 First-Order Solution

3.2.1 First-Order Policy Functions

This section partly follows the presentation in Heiberger et al. (2017). For further details, in particular with respect to the uniqueness of the solution, see Heiberger et al. (2015). As we show in Section 2.6.2, the linear approximate solution does not depend on the perturbation parameter σ. Therefore, using Taylor's theorem for the multivariable case (see Theorem 13.3.2), a first-order approximation of the policy functions (2.23) at the stationary solution x_t = x, y_t = y, and z_t = 0_{n(z)×1} is given by

x_{t+1} = x + h^x_w w̄_t,   (3.1a)
y_t = y + h^y_w w̄_t,   (3.1b)
w̄_t := [x_t − x; z_t].   (3.1c)

Thus, our task is to find the matrices h^x_w and h^y_w.
3.2.2 BA Model

We begin by linearizing the system (2.21a) at the point s^T = (x^T, y^T, 0_{1×n(z)}, x^T, y^T, 0_{1×n(z)}) that solves the system (2.24). Differentiating (2.21a) with respect to the six vectors x_{t+1}, y_{t+1}, z_{t+1}, x_t, y_t, and z_t yields

0_{[n(x)+n(y)]×1} = g_{x'} E_t x̄_{t+1} + g_{y'} E_t ȳ_{t+1} + g_{z'} E_t z_{t+1} + g_x x̄_t + g_y ȳ_t + g_z z_t.

The notation g_i indicates the Jacobian matrix of the system of equations g(·) with respect to the vector argument i ∈ {x', y', z', x, y, z}. The prime distinguishes vectors with variables dated at time t+1 from those that refer to period t. The bar denotes deviations from the stationary solution,
i.e., x̄_t := x_t − x and ȳ_t := y_t − y. If the variables of the model are the logs rather than the levels of the original variables, the bar indicates percentage deviations. This distinction, however, is not important for what follows. The preceding equation and the process (2.21b) form a linear system of stochastic difference equations that can be written as

B E_t [w̄_{t+1}; ȳ_{t+1}] = A [w̄_t; ȳ_t],   (3.2a)
A = [ −g_x  −g_z  −g_y ; 0_{n(z)×n(x)}  R  0_{n(z)×n(y)} ],   (3.2b)
B = [ g_{x'}  g_{z'}  g_{y'} ; 0_{n(z)×n(x)}  I_{n(z)}  0_{n(z)×n(y)} ].   (3.2c)

We refer to this linear model as the BA-model. In general, many of the equations in system (3.2) will involve only variables dated at time t, as for instance the linearized equations (1.64a)-(1.64f) of the model in Example 1.6.1. In this case, the matrix B is singular and we cannot solve for [w̄^T_{t+1}, ȳ^T_{t+1}]^T as a function of [w̄^T_t, ȳ^T_t]^T. As noted by Klein (2000), the QZ factorization can be used to solve the model in (3.2). This factorization of (A, B) is given by (see (12.34))

U^H B V = S,   U^H A V = T,   (3.3)

where U and V are unitary matrices, S and T are upper triangular matrices, and the superscript H denotes the Hermitian transpose of a matrix. The eigenvalues of the matrix pencil are given by λ_i = t_ii/s_ii for s_ii ≠ 0. Furthermore, the matrices S and T can be arranged such that the eigenvalues appear in ascending order with respect to their absolute value. We define new variables

[w̃_t; ỹ_t] = [V_ww  V_wy; V_yw  V_yy]^H [w̄_t; ȳ_t]   (3.4)

so that

U^H B V E_t [w̃_{t+1}; ỹ_{t+1}] = U^H A V [w̃_t; ỹ_t]

or

[S_ww  S_wy; 0  S_yy] E_t [w̃_{t+1}; ỹ_{t+1}] = [T_ww  T_wy; 0  T_yy] [w̃_t; ỹ_t].   (3.5)
3 Perturbation Methods: Solutions
Assume that n(w) = n(x) + n(z) eigenvalues have modulus less than one so that |sii | > |t ii | for i = 1, . . . , n(w). In this case, the matrices Sww and Tww are upper triangular matrices of dimension n(w) × n(w), the matrices Sw y and Tw y are of dimension n(w) × n( y), and the matrices S y y and T y y are upper triangular matrices of dimension n( y) × n( y). Furthermore the matrix Sww is invertible.2 Accordingly, the system ˜ t+1 = T y y y ˜t S y y Et y is unstable.3 To force the system (3.2a) into its stable subspace (the saddle ˜ t = 0n( y) for all t. Thus, path in the two-dimensional case), we must set y from the first line of (3.5) −1 ˜ t+1 = Sww ˜ t. w Tww w
Since −1 ˜ t = Vww ¯t w w
(3.6)
from the first line of (3.4), we obtain −1 −1 ¯ t+1 = Vww Sww ¯ t. w Tww Vww w | {z } hw w
The second line of (3.4) together with (3.6) implies −1 ¯ t = Vy w Vww ¯ t. y w | {z } y
hw x The first n(x) rows of the matrix hw w form the matrix hw in (3.1a), since ¯ t+1 is equal to z t+1 = Rz t + ση t+1 . the second part of the vector w Note that this solution requires that the matrix Vww be invertible. Otherwise no solution exists. However, if a solution does exist, it is unique, despite that the QZ factorization is not unique (see Heiberger et al. (2015)). 2
² S_ww is an upper triangular matrix with nonzero elements on its diagonal so that |S_ww| ≠ 0, which implies that its inverse exists.
³ To see this, consider the last line of this system, which may be written as E_t ỹ_{n(y),t+1} = λ_{n(y)} ỹ_{n(y),t}, |λ_{n(y)}| = |t_{n(y),n(y)}/s_{n(y),n(y)}| > 1.
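To fix ideas, the following minimal MATLAB sketch carries out the steps of this subsection with the built-in qz and ordqz commands. It assumes that the matrices A and B from (3.2) and the dimension nw = n(x)+n(z) are available; it is an illustration, not the code of our toolbox.

% QZ (generalized Schur) solution of B E_t[w;y]' = A [w;y], equations (3.2)-(3.6).
[T, S, Q, Z] = qz(A, B, 'complex');        % Q*A*Z = T, Q*B*Z = S
[T, S, ~, Z] = ordqz(T, S, Q, Z, 'udi');   % stable eigenvalues |t_ii/s_ii| < 1 first
Vww = Z(1:nw, 1:nw);
Vyw = Z(nw+1:end, 1:nw);
Sww = S(1:nw, 1:nw);
Tww = T(1:nw, 1:nw);
hww = real(Vww*(Sww\Tww)/Vww);             % w_{t+1} = hww * w_t
hyw = real(Vyw/Vww);                       % y_t     = hyw * w_t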
3.2.3 System Reduction

STATIC AND DYNAMIC EQUATIONS. Assume that we are able to sort the equations (2.21a) so that the first n(u) equations g¹ through g^{n(u)} involve only period t variables, i.e.,

0_{n(u)×1} = g¹(·, ·, ·, x_t, y_t, z_t),
0_{(n(x)+n(v))×1} = g²(x_{t+1}, y_{t+1}, z_{t+1}, x_t, y_t, z_t),   n(v) := n(y) − n(u).

We refer to the former as static equations and to the latter, which involve variables dated at t and t+1, as dynamic equations. We can, therefore, partition the Jacobian matrix of the system (2.21a) as

[ g¹_{x'}  g¹_{z'}  g¹_{y'}  g¹_x  g¹_z  g¹_y ]   [ 0_{n(u)×n(x)}  0_{n(u)×n(z)}  0_{n(u)×n(y)}  C_x  C_z  C_y ]
[ g²_{x'}  g²_{z'}  g²_{y'}  g²_x  g²_z  g²_y ] ≡ [ D_x           D_z           D_y           F_x  F_z  F_y ]

and can reformulate the linearized system (3.2) as:

C_y ȳ_t = −C_x x̄_t − C_z z_t,   (3.7a)
D_x E_t x̄_{t+1} + D_z E_t z_{t+1} + F_x x̄_t + F_z z_t = −D_y E_t ȳ_{t+1} − F_y ȳ_t.   (3.7b)
Our goal will be to reduce the second equation to a system similar to (3.2) but with fewer variables. There are two reasons for this approach. First, in large systems, this two-step procedure saves computing time, since the factorization of a smaller matrix pencil is faster.⁴ Second, as discussed in Heiberger et al. (2017), there may be gains in numerical precision.

REDUCTION WITHOUT QR FACTORIZATION. In a first step, we partition the vector ȳ_t and the matrix C_y,

ȳ_t = [ū_t; v̄_t],   C_y = [C_u  C_v].

If the matrix C_u has full rank n(u), we can solve (3.7a) for ū_t as a function of the remaining variables. We will consider the case of rank(C_u) < n(u) later and proceed. Let D_y = [D_u  D_v] and F_y = [F_u  F_v] so that we can rewrite (3.7) as
See Hernandez (2013) for the efficiency gains of model reduction.
ū_t = C_u^{-1} [−C_x  −C_z  −C_v] [x̄_t; z_t; v̄_t],   (3.8a)

[ D_x  D_z  D_v ; 0_{n(z)×n(x)}  I_{n(z)}  0_{n(z)×n(v)} ] E_t [x̄_{t+1}; z_{t+1}; v̄_{t+1}]
  + [ F_x  F_z  F_v ; 0_{n(z)×n(x)}  −R  0_{n(z)×n(v)} ] [x̄_t; z_t; v̄_t]
  = −[ D_u ; 0_{n(z)×n(u)} ] E_t ū_{t+1} − [ F_u ; 0_{n(z)×n(u)} ] ū_t.   (3.8b)

Note that in the second line we have added the process (2.21b). Using (3.8a) to replace E_t ū_{t+1} and ū_t in (3.8b) yields

B̃ E_t [w̄_{t+1}; v̄_{t+1}] = Ã [w̄_t; v̄_t],   w̄_t := [x̄_t; z_t],   (3.9a)
B̃ := [ D_x  D_z  D_v ; 0_{n(z)×n(x)}  I_{n(z)}  0_{n(z)×n(v)} ] − [ D_u ; 0_{n(z)×n(u)} ] C_u^{-1} [ C_x  C_z  C_v ],   (3.9b)
Ã := −[ F_x  F_z  F_v ; 0_{n(z)×n(x)}  −R  0_{n(z)×n(v)} ] + [ F_u ; 0_{n(z)×n(u)} ] C_u^{-1} [ C_x  C_z  C_v ].   (3.9c)

This system can be solved by factoring the matrix pencil (Ã, B̃) as explained in Section 3.2.2.

SCHUR FACTORIZATION. Here we will explore a second approach that assumes that the matrix B̃ is invertible. This will be the case in many applications because the first step has removed the static equations from the model, which are responsible for the matrix B in (3.2a) being not invertible. Thus, let

E_t [w̄_{t+1}; v̄_{t+1}] = W [w̄_t; v̄_t],   W = B̃^{-1} Ã.   (3.10)

The (simple) Schur factorization of the matrix W is given by

S = T^H W T,   T T^H = I_{n(w)+n(v)},
125
where S is an upper triangular matrix with the eigenvalues of W on the main diagonal. Assume that n(w) = n(x) + n(z) eigenvalues are within and n(v) eigenvalues outside the unit circle. S and T can be chosen so that the first n(w) eigenvalues appear first on the main diagonal of S. In the new variables H ˜t ¯t w Tww Twv w = (3.11) ˜t ¯t v Tvw Tv v v the transformed system reads ˜ t+1 ˜t w Sww Swv w Et = . ˜ t+1 ˜t v 0 S vv v
(3.12)
˜ t+1 = S vv v ˜ t is unstable, and we must set v ˜t = Accordingly, the system E t v 0n(v) ∀t. As a consequence, the first line of (3.12) reads as ˜ t+1 = Sww w ˜t Et w
and the first line of (3.11) is equal to ¯ t = Tww w ˜ t. w A solution exists, if the matrix Tww is invertible. It is given by: −1 ¯ t+1 = Tww Sww Tww ¯ t. w w {z } |
(3.13a)
=:hw w
x The upper n(x) rows of hw w are equal to the matrix hw in Equation (3.1a). The solution with respect to the vector v t follows from the second line of (3.11) −1 ¯ t = Tvw Tww ¯ t. v w | {z }
(3.13b)
v =:hw
¯ t yields Using (3.8a) to solve for u I n(w) −1 ¯ t = Cu −C x −Cz −C v ¯t u −1 w Tvw Tww | {z }
(3.14)
=:huw
so that hwy
huw = v . hw
(3.15)
126
3 Perturbation Methods: Solutions
REDUCTION WITH PREVIOUS QR FACTORIZATION. We turn to the case in which the matrix Cu does not have rank n(u). As proposed by Hernandez ¯ t such that the associated (2013), we reorder the variables in the vector y matrix, say, C˜u , has full rank. We assume that the n(u) × n( y) matrix C y has full row rank n(u).5 In this case, the QR factorization of C y delivers: Q T C y P = C˜u C˜v , where C˜u is invertible and C˜v is of size n(u) × [n( y) − n(u)].6 Since P P T = I n( y) , we can transform Equation (3.7a) as follows: ¯ t + C˜v v ¯ − QT C z . ¯ t ) =: C˜u u ¯t = − QT Cx x Q T C y P(P T y | {z } t | {z }z t =:C˜x
(3.16a)
=:C˜z
In the same way, we also partition the matrices D y and F y in (3.7b): ˜u D ˜v , Dy P = D F y P = F˜u F˜v .
(3.16b) (3.16c)
Replacing Cu , C x , Cz , C v , Du , and Fu in (3.9) with the respective new matrices with the tilde ~ allows us to solve the model in the same way as ˜ wy denote this solution. presented in the preceding two paragraphs. Let h The solution in the initially given ordering of the variables is then equal to: ˜y. hwy = P h w
(3.16d)
3.2.4 Digression: Solving Separately for the Deterministic and Stochastic Components y
Note that the first n(x) columns of hwx and of hw are equal to the solution of a model without exogenous shocks. The procedure adopted thus far, however, does not allow us to separate the computation of the deterministic component from the stochastic component, i.e., the remaining n(z) y columns of hwx and of hw , respectively. Now, imagine estimating the shock process of a model as proposed by Chari et al. (2007). Maximizing the 5
This will be the case if none of the n(u) linearized equations is redundant. However, as shown by Hernandez (2013), the case of rank(C y ) < n(u) can also be treated. 6 See Equation (12.38) in Section 12.9.
3.2 First-Order Solution
127
likelihood function is an iterative process. At each of its steps the model must be solved for the current estimate of the matrix R of the process (2.21b). Therefore, one can accelerate the estimation process if only the remaining n(z) columns of the linearized policy function are updated. The next paragraphs show how this can be accomplished. SYSTEM REDUCTION. Our starting point is model (3.2). We assume that it can be written as:7 x ¯t ¯ t = −C x −C v Cu u − Cz zt , (3.17a) ¯t v x ¯ ¯t x Dx Dv E t t+1 + Dz E t z t+1 + F x F v + Fz z t (3.17b) ¯ t+1 ¯t v v ¯ t+1 − Fu u ¯t. = −E t Du u
¯ t and use this As in section 3.2.3, we can solve Equation (3.17a) for u ¯ t+1 and u ¯ t in Equation (3.17b). In the last step we solution to replace E t u use (2.21b) to replace E t z t+1 with Rz t . The final result is the system: ¯ t+1 ¯ x x ¯ t, ¯ ¯ BEt = A t + Cz (3.18a) ¯ t+1 ¯t v v ¯ := Dx Dv − Du C −1 C x C v , B (3.18b) u −1 A¯ := − F x F v + Fu Cu C x C v , (3.18c) −1 −1 C¯ := − Dz − Du Cu Cz R − Fz − Fu Cu Cz . (3.18d) ¯ SCHUR FACTORIZATION. In this paragraph we assume that the matrix B defined in (3.18b) is invertible so that the reduced model can be written as ¯ t+1 ¯t x ¯ ¯ x ¯ := B ¯ −1 A, ¯ V¯ := B ¯ −1 C. Et =W + V¯ z t , W (3.19) ¯ t+1 ¯t v v ¯ and define new Let S¯ = T¯ H W T¯ denote the Schur factorization of W variables by ¯t ˜t ¯t ˜ x T¯x x T¯x v x T¯ x x T¯ x v x x = ¯ ¯ ⇔ ¯ v x ¯ vv = t (3.20) ¯ ˜ ¯ ˜t vt Tv x Tv v v t T T vt v | {z } =: T¯ H
7
Otherwise, we can reorder the variables in the vector y t as explained in the previous subsection.
128
3 Perturbation Methods: Solutions
so that the transformed system reads: ˜ t+1 ˜t x S¯x x S¯x v x ¯ t, Q ¯ := T¯ H V¯ . Et = + Qz ¯ ˜ t+1 ˜t v 0 Sv v v
(3.21)
Consider the second line of this system: ¯ v zt , ˜ t+1 = S¯v v v ˜t + Q Et v ¯ v is the lower n(v) × n(z) part of the (n(x) + n(v)) × n(z) matrix where Q ¯ Assume that v ˜ t = Φz t is the solution of this forward-looking linear Q. difference equation (see Heer and Maußner (2009a) p. 109). Therefore: ¯ v zt , E t Φz t+1 = S¯vv Φz t + Q ¯ v zt ΦRz t = S¯vv Φz t + Q so that the matrix Φ is the solution of the following Sylvester equation ¯ v. RΦ = S¯v v Φ + Q
(3.22)
Applying the vec operator to this equation (see (12.21c)) yields T ¯ v. R ⊗ I n(v) − I n(z) ⊗ S¯vv vec Φ = vec Q However, solving this linear system is numerically extensive and unstable. There exist smarter algorithms to solve equation (3.22). For instance, in R MATLAB the command Phi=sylvester(R,Svv,Qv) computes Φ and the same command has been available in GAUSS since version 17. ¯ t from the second line of the We obtain the solution for the vector v rightmost system of equations defined in (3.20): ¯ t + T¯ v v v ¯ t = Φz t T¯ v x x so that −1 v x ¯ t + (T vv )−1 Φ z t . ¯ t = − T¯ v v v T¯ x | {z } | {z } =:h vx
=:hzv
¯ t+1 follows from the first line of (3.19): The solution for x ¯x xx ¯xvv ¯ t+1 = W ¯t + W ¯ t + V¯x zt , x ¯x x − W ¯ x v T¯ vv −1 T¯ v x x ¯ x v (T vv )−1 Φ z t . ¯ t + V¯x + W = W
(3.23)
3.2 First-Order Solution
129
As shown in Heer and Maußner (2009a) p. 112, the deterministic component of the solution for the vector of states can be simplified to ¯ x v (T vv )−1 Φ z t . ¯ t+1 = T¯x x S¯x x T¯x−1 ¯ t + V¯x + W x (3.24) x x | {z } | {z } =:h xx
=:hzx
¯ t . This yields: In final step we use equation (3.18a) to solve for the vector u ¯t = u
−Cu−1 |
I n(x) 0n(x)×n(z) −1 −1 C x Cv ¯ t + Cu Cz − Cu C x C v x zt . h vx hzv {z } | {z } =:hux
=:huz
(3.25) ¯ from Equation QZ-FACTORIZATION. Suppose that either the matrix B ˆ from the system (3.18) or the matrix B ¯ t+1 ¯ x x ˆ t, ˆ ˆ BEt = A t + Cz (3.26a) ¯ t+1 ¯t y y ˆ = g x 0 g y 0 , Cˆ = −gz 0 R − gz Aˆ = −g x −g y , B (3.26b)
is not invertible. In this case we can employ the factorization defined in (12.34). Considering the system (3.18) its transformed version is:8 ¯ ˜ t+1 ˜t S¯x x S¯x v x T¯x x T¯x v x Q ¯ = U H C. ¯ Et = + ¯ x zt , Q (3.27) ˜ t+1 ˜t 0 S¯v v v 0 T¯vv v Qv ˜ t and z t . The second line of this system defines a dynamic system in v ˜ t = Φz t so that Φ solves the matrix Assume that the solution is given by v equation ¯ v. S¯v v ΦR − T¯v v Φ = Q
(3.28)
−1 ¯ We could transform this equation into a Sylvester equation ΦR − S vv Ty y = −1 ¯ S v v Q v . However, the upper triangular matrix S¯vv may have elements on its main diagonal close to zero. Therefore, we follow Heer and Maußner (2009b) who show that Φ can be computed stepwise starting with the last row of this matrix. To see this, note that the last line of the system of equations (3.28) can be written as 8
The derivations in the case of system (3.26) are basically the same. Therefore, we skip them.
130
3 Perturbation Methods: Solutions
φ n(v) sn(v),n(v) R − t n(v),n(v) I n(z) = qvn(v) .
In this equation sn(v),n(v)) and t n(v),n(v) denote the last element on the main diagonal of the matrices S¯vv and T¯vv , respectively, φ n(v) is the last row of ¯ v . Assuming that the matrix Φ and qvn(v) is the last row of the matrix Q not both sn(v),n(v) and t n(v),n(v) are equal to zero, this system can be solved for φ n(v) . Given this solution line n(v) − 1 of the system can be solved for φ n(v)−1 . For row i = n(v), . . . , 1 the solution is given by φ i = qvi +
n(v) X j=i+1
−1 t i, j φ j − si, j φ j R si,i R − t i,i I n(z) .
(3.29)
Using ˜t ¯ V¯x x V¯x v x x = t ¯ ¯ ˜t ¯t Vv x Vv v v v and assuming that Vx x is invertible it is straightforward to show that the ¯ t is given by solution for the vector of jump variables v ¯ ¯ ¯ −1 ¯ ¯ t = V¯v x V¯x−1 v (3.30a) x + Vvv − Vv x Vx x Vx v Φ . | {z } | {z } =:h vx
=:hzv
˜ t+1 yielding: The first line of (3.27) can be solved for E t x ¯ x + T¯x v Φ − S¯x v ΦR)z t . ¯ ˜ t + S¯−1 (Q ˜ t+1 = S¯−1 Et x x x Tx x x xx
Employing again the relation between the original variables and the transformed variables, i.e., ¯ t+1 = V¯x x E t x ˜ t+1 + V¯x v E t v ˜ t+1 =V¯x x E t x ˜ t+1 + V¯x v ΦRz t , Et x ¯ ˜t ˜ t = V¯x−1 ¯ t − V¯x−1 x x x x Vx v v
¯ ¯ t − V¯x−1 =V¯x−1 x x x Vx v Φz t ,
˜ t+1 and x ˜ t so that the solution for the vector of allows us to eliminate x states is given by ¯ t+1 = h xx x ¯ t + hzx z t , x h x := V¯x x S¯−1 T¯x x V¯ −1 , x hzx
(3.30b)
xx xx −1 ¯ ¯ ¯ x + V¯x v ΦR − V¯x x S −1 T¯x x V¯ −1 V¯x v Φ. ¯ := Vx x S x x Tx v Φ − S¯x v ΦR + Q xx xx
3.3 Second-Order Solution
131
3.3 Second-Order Solution 3.3.1 Second-Order Policy Functions Section 2.6.2 shows that the second-order partial derivatives of the policy functions (2.23) that involve the perturbation parameter σ are equal to zero. Therefore, we can arrange the second-order partial derivatives of the policy functions in one large matrix and one vector. Let ξ ∈ {x, y}. We define the matrix ξi ξ ξ ξ h x 1 ,x 1 . . . h x 1i ,x n(x) h x 1i ,z1 . . . h x 1i ,zn(z) ξ1 .. .. .. .. .. .. hww . . . . . . ξ2 ξ ξ ξ ξ i i i i hww h x ,x . . . h x ,x h x n(x) ,z1 . . . h x n(x) ,zn(z) n(x) 1 n(x) n(x) i := hξww := . , hξww ξ ξ ξ ξ i i i .. h i . . . h h . . . h z ,x z ,x z ,z z ,z 1 1 1 n(x) 1 1 1 n(z) .. .. .. ξn(ξ) .. .. ... . . hww . . . ξ
ξ
ξ
ξ
i i i i hzn(z) ,x 1 . . . hzn(x) ,x n(z) hzn(z) ,z1 . . . hzn(x) ,zn(z)
and the vector ξ1 hσσ ξ2 hσσ ξ hσσ := .. . . ξ
n(ξ) hσσ
T ¯ t := (x t − x) T , z Tt Letting w and employing Taylor’s theorem (13.3.2) yields the second-order expansion of the policy functions (2.23) at the stationary solution x t = x, y t = y, z t = 0 for the stochastic model with σ = 1: x x ¯ t + 12 I n(x) ⊗ w ¯ Tt hww ¯ t + 12 hσσ x t+1 = x + hwx w w σ2 , (3.31a) 1 1 y y ¯ t + 2 I n( y) ⊗ w ¯ Tt hww ¯ t + 2 hσσ y t = y + hwy w w σ2 . (3.31b) We employ the chain rule (2.34) to find the systems of equations whose y y x solutions are the objects hww , hww , hx σσ , and hσσ . In a first step, we incorporate the canonical model (2.21) into the framework of Section 2.6.3. In the vector s t := x t+1 , z t+1 , y t+1 , x t , z t , y t ∈ R2(n(x)+n(z)+n( y))
we gather the arguments of the vector valued function g so that (2.21a) can be written as 0(n(x)+n( y))×1 = E t g (s t ) .
(3.32a)
We can now employ (2.23) and (2.21b) to define the maps f :Rn(w)+1 → Rn(s)
by
: (w t , σ) 7→ f(w t , σ),
u :Rn(w)+1 → Rn(w)+1 : (w t , σ) 7→ u(w t , σ)
h x (w t , σ) Rz t + σΩε t+1 y h u(w t , σ) T f(w t , σ) := , xt zt h y (w t , σ) h x (w t , σ) u(w t , σ) := Rz t + σΩε t+1 . σ
(3.32b)
(3.32c)
Therefore, the first set of equilibrium conditions (2.21a) is the composite function E t (g ◦ f)(w t , σ). 3.3.2 Coefficients of the State Variables y
x To find hww and hww , we interpret σ as a parameter and not as a variable of the functions f and u defined in equations (3.32b) and (3.32c), respectively. The vector u, thus, is restricted to the first n(w) element functions. To distinguish this from the general case we employ the notation h x (w t , σ) u(w t ) := . (3.33) Rz t + σΩε t+1
Differentiating f twice with respect to w t and evaluating the result at the x point s yields a system of equations in the coefficients of the matrices hww y and hww . To find this system, we employ (2.34). The Hessian matrix of g◦ f with respect to w t , (g ◦ f)ww , is a matrix of size (n(x) + n( y))n(w) × n(w). Thus, let 0 denote a matrix of zeros with this dimension, we obtain
3.3 Second-Order Solution
133
T 0 = (g ◦ f)ww = I n(x)+n( y) ⊗ fw gss fw + gs ⊗ I n(w) fww .
(3.34)
In this expression, the matrices gs and gss are the Jacobian matrix and the Hessian matrix, respectively, of the system of equations with respect to the 2(n(x) + n(z) + n( y)) variables of the model. The Jacobian fw and the Hessian fww follow from (3.32b) and (3.32c) via differentiation with respect to w t . The reader who is eager to see the final result and is not interested in the following derivations might wish to proceed directly to the final system of equations presented in Equation (3.42). For the Jacobian matrix we obtain (recall the partition w = (x, z)): h xx hzx 0n(z)×n(x) R y x y x y h x h x h x hz + hz R , fw = (3.35) 0n(x)×n(z) I n(x) 0 I n(z) n(z)×n(x) y
hx
y
hz
where the third line follows from y y h xx hzx y hw uw = h x hz . 0n(z)×n(x) R | {z }
(3.36)
=:uw
To evaluate the Hessian fww , we first apply (2.34) to the composite function (h y ◦ u)(w t , σ) in the third line of (3.32b). This yields y T (h y ◦ u)ww = I n( y) ⊗ uw hww uw + hwy ⊗ I n(w) uww . (3.37) The matrix uw is given by the rightmost matrix in Equation (3.36). Differentiating this matrix with respect to w t yields x hww uww = (3.38) 0n(z)n(w)×n(w) so that y
(h ◦ u)ww =
T I n( y) ⊗ uw
y hww uw
+
hwy
⊗ I n(w)
x hww
0n(z)n(w)×n(w)
y x T = I n( y) ⊗ uw hww uw + h xy ⊗ I n(w) hww .
,
134
3 Perturbation Methods: Solutions
The remaining parts of the Hessian of f are easy to compute. Simply note that, e.g., the definition x i t := f i (w t , σ), i = 1, . . . , n(x) has the Hessian i matrix f ww = 0n(w)×n(w) . Therefore, the matrix fww is given by
x hww
0n(z)n(w)×n(w) x y y T ⊗ uw hww uw + h x ⊗ I n(w) hww I fww = n( y) . 0n(x)n(w)×n(w) 0n(z)n(w)×n(w) y hww
(3.39)
To evaluate the second term on the rhs side of Equation (3.34) we partition the Jacobian of g as in Section 3.2.2. Let gi , i ∈ {x 0 , z 0 , y 0 , x, z, y} denote the derivatives of g with respect to its ith vector argument. Then: gs ⊗ I n(w) fww = g x 0 gz 0 g y 0 g x gz g y ⊗ I n(w) fww y x y x = B1 hww + B2 hww + B3 C1 hww C2 + B3 C3 hww .
where B1 = g y ⊗ I n(w) ,
B2 = g x 0 ⊗ I n(w) ,
B3 = g y 0 ⊗ I n(w) ,
C1 = I n( y) ⊗ C2T , h xx hzx C2 = , 0n(z)×n(x) R
(3.40)
C3 = h xy ⊗ I n(w) .
Finally, let
T A1 := I n(x)+n( y) ⊗ fw gss fw
(3.41)
so that the system of equations (3.34) can be written as y y x −A1 = B1 hww + B3 C1 hww C2 + (B2 + B3 C3 ) hww .
(3.42) y
This is a linear system in the unknown coefficient matrices h^y_ww and h^x_ww. A straightforward way to solve this system employs the vec operator. Using (12.21c), Equation (3.42) can be transformed to

−vec(A_1) = A [ vec(h^y_ww) ; vec(h^x_ww) ],   A := [ I_{n(w)} ⊗ B_1 + C_2^T ⊗ B_3 C_1 ,  I_{n(w)} ⊗ (B_2 + B_3 C_3) ].   (3.43)
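A minimal MATLAB sketch of this brute-force vec-operator solution follows. The matrices A1, B1, B2, B3, C1, C2, C3 as defined in (3.40)–(3.41) and the dimensions nw, nx, ny are assumed to be given; the variable names are chosen here for exposition only.

% Brute-force solution of (3.43) for the second-order coefficients.
AA   = [kron(eye(nw), B1) + kron(C2.', B3*C1), kron(eye(nw), B2 + B3*C3)];
sol  = AA \ (-A1(:));                            % [vec(hYww); vec(hXww)]
hYww = reshape(sol(1:ny*nw*nw),     ny*nw, nw);  % h^y_ww, size n(y)n(w) x n(w)
hXww = reshape(sol(ny*nw*nw+1:end), nx*nw, nw);  % h^x_ww, size n(x)n(w) x n(w)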
The second way to solve Equation (3.42) is to note that this is a generalized Sylvester equation (see Kågström and Poromaa (1994), equation (1.1)) AR − LB = C,
(3.44a)
DR − L E = F.
(3.44b)
The linear algebra package (LAPACK), whose routines are freely available at http://www.netlib.org/lapack/, provides routines to solve this kind of equation. Our toolbox (see Section 3.5) provides access to these R routines from GAUSS and MATLAB via their respective foreign language interfaces. To place (3.42) into the form (3.44), note first that (3.42) can be written as y hww −A1 = B1 , B2 + B3 C3 (3.45a) x |{z} | {z } hww | {z } =:C =:A =:R y hww − B3 C1 , 0(n(x)+n( y))n(w)×n(x)n(w) (−C2 ) x hww | {z } | {z } =:B =:L
so that (3.44b) reduces to the definition of the matrix L: B3 C1 , 0(n(x)+n( y))n(w)×n(x)n(w) R − L I n(w) = 0(n(x)+n( y))n(w)×n(w) . |{z} | | {z } {z } =:D
=:E
=:F
(3.45b)
For small models, the storage requirements for the matrix equation (3.43) remain modest. The model in Example 1.6.1 has the dimensions n(x) = 1, n(z) = 1, and n( y) = 7 so that the coefficient matrix on the rhs of Equation (3.43) has size 32× 32. However, consider a medium-sized model, as, e.g., the Smets and Wouters (2007) model with n(x) = 9, n(z) = 6, and n( y) = 24. In this model, the same matrix has 7425 × 7425 elements. The matrix A from Equation (3.45), instead, is of size (n(x) + n( y))n(w) × (n(x) + n( y))n(w) and increases from 16 × 16 elements in the small model to 495 × 495 in the medium-sized model. Thus, it requires 225 times less memory than the large matrix from Equation (3.43).
3.3.3 Coefficients of the Perturbation Parameter We will now employ (2.34) with x := σ to the composite function (3.32). The respective Hessian (g ◦ f)σσ is a column vector with n(x) + n( y) ele-
ments: (g ◦ f)σσ = 0(n(x)+n( y))×1 = E t
T I n(x)+n( y) ⊗ fσ gss fσ + (gs ⊗ I1 ) fσσ .
(3.46)
The final result of the following derivations is presented in Equation (3.53) and the eager reader can go to there. The Jacobian matrix of f with respect to σ follows from differentiating (3.32b). This yields:
hσx Ωε t +1 y h u uσ fσ = , 0n(x)×1 0n(z)×1 y hσ
y
y
y
huy uσ = h x hz hσ
hσx Ωε t+1 . 1 | {z }
(3.47)
=:uσ
Recall that all derivatives in this expression must be evaluated at σ = 0 and s = (x, 0n(z)×1 , y, x, 0n(z)×1 , y). Using the results from Section (2.6.2) in Equation (2.32) yields:
0n(x)×1 0n(x)×n(z) Ωε t +1 I n(z) y y h Ωε t+1 hz fσ = z =: N Ωε t+1 . = Ωε 0n(x)×1 0n(x)×n(z) t+1 0n(z)×1 0n(z)×n(z) 0n( y)×1 0n( y)×n(z)
(3.48)
Differentiating Equation (3.47) again with respect to σ yields
x hσσ
0n(z)×1 M fσσ = . 0n(x)×1 0n(z)×1 y hσσ
(3.49)
The matrix M derives from applying (2.34) to the composite function (h y ◦ u)(σ).
y T M := (h y ◦ u)σσ = M1 + M2 = I n( y) ⊗ uσ huu uσ + huy ⊗ I1 uσσ , {z } | {z } |
y huu1 y huu2
=:M1
=:M2
y y yi x h x ix h xzi h xσ hσσ yi yi yi yi y huu = . , huu = hz x hzz hzσ , uσσ = 0n(z)×1 . yi y y1 .. 0 hσx hσzi hσσ yn( y) huu
(3.50)
T Using again (2.32), uσ = (01×n(x) , ε Tt+1 Ω T , 1) and the expressions for M1 and M2 reduce to
y ε Tt+1 Ω T hzz1 Ωε t+1 y ε Tt+1 Ω T hzz2 Ωε t+1 + h y =: M + h y , M1 = .. 11 σσ σσ . yn( y)
ε Tt+1 Ω T hzz Ωε t+1 x h σσ y y x M2 = h x hz 0n( y)×1 0n(z)×1 = h xy hσσ . 0
(3.51)
Summarizing the results in (3.48) and (3.51) the expression in (3.46) is equal to (N Ωε t+1 ) T g1ss (N Ωε t+1 ) (N Ωε t+1 ) T g2 (N Ωε t+1 ) ss 0n(x)+n( y) = E t .. . T n(x)+n( y) (N Ωε t+1 ) gss (N Ωε t+1 ) x hσσ 0n(z)×1 M1 + M2 g x 0 gz 0 g y 0 g x gz g y + Et , 0n(x)×1 0n(z)×1 y hσσ T 1 (N Ωε ) g (N Ωε ) t+1 t+1 ss (N Ωε t+1 ) T g2 (N Ωε t+1 ) ss = Et .. . T n(x)+n( y) (N Ωε t+1 ) gss (N Ωε t+1 ) y y x y + g x 0 hσσ + g y 0 E t M11 + g y 0 hσσ + g y 0 h xy hσσ + g y hσσ .
The final step is to evaluate the conditional expectations in this expression. First note that each element i i ∆i = (N Ωε t+1 ) T gss (N Ωε t+1 ) = ε Tt+1 Ω T N T gss (N Ωε t+1 ) {z } | {z } | ∆i1
=:∆i2
of the first term in the previous equation is a scalar, ∆_i = ∆_{i1} ∆_{i2}. By definition, the trace of a scalar is equal to the scalar itself, tr(∆_i) = ∆_i, so that we can apply property (12.15d) of the trace operator to obtain

E_t ∆_i = E_t tr(∆_i) = E_t tr(∆_{i1} ∆_{i2}) = E_t tr(∆_{i2} ∆_{i1}) = tr(E_t(∆_{i2} ∆_{i1}))
        = tr( N^T g^i_ss N E_t[Ω ε_{t+1} ε_{t+1}^T Ω^T] ) = tr( N^T g^i_ss N Ω Ω^T ),

where the last step uses (2.21d). In the same way we can determine E_t M_11:

E_t M_11 = [ tr(h^{y_1}_zz Ω Ω^T) ; tr(h^{y_2}_zz Ω Ω^T) ; ... ; tr(h^{y_{n(y)}}_zz Ω Ω^T) ].

Finally, consider an extension of the trace operator proposed by Gomme and Klein (2011):

trm( [X_1; X_2; ...; X_m] ) := [ tr(X_1); tr(X_2); ...; tr(X_m) ].   (3.52)
This operator returns the traces of the m blocks of the mn × n matrix X in an m-dimensional column vector. Accordingly, the system of linear equations (3.46) can be written as:

−trm( (I_{n(x)+n(y)} ⊗ N^T) g_ss N Ω Ω^T ) − g_{y'} trm( (I_{n(y)} ⊗ (Ω Ω^T)) h^y_zz )
   = [ g_y + g_{y'} ,  g_{x'} + g_{y'} h^y_x ] [ h^y_σσ ; h^x_σσ ].   (3.53)
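Since the operator trm defined in (3.52) appears repeatedly in what follows, a minimal MATLAB sketch of it may be helpful. X stacks m square blocks of size n × n; m is passed explicitly. This is an illustration, not the toolbox routine.

% Block-trace operator trm from (3.52): X is mn x n, returns an m x 1 vector.
function t = trm(X, m)
    n = size(X, 2);
    t = zeros(m, 1);
    for i = 1:m
        t(i) = trace(X((i-1)*n+1:i*n, :));
    end
end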
3.4 Third-Order Solution 3.4.1 Third-Order Policy Functions Drawing on the results stated in Equation (2.32) in Section 2.6.2 and using Taylor’s theorem (13.3.2) up to the order of three, yields the following expressions for the policy functions of the states x t+1 and the jump variables y t :9 x x x ¯ Tt hww ¯ t + 12 hσσ ¯ t + 12 I n(x) ⊗ w x t+1 =x + hwx w w σ2 + 16 hσσσ σ3 (3.54a) x x 1 1 T T T 2 ¯t ⊗w ¯ t hwww w ¯ t + 2 I n(x) ⊗ w ¯ t hσσw σ , + 6 I n(x) ⊗ w y y y ¯ Tt hww ¯ t + 12 hσσ ¯ t + 12 I n( y) ⊗ w y t =y + hwy w w σ2 + 16 hσσσ σ3 (3.54b) 1 1 y y ¯ Tt ⊗ w ¯ Tt hwww ¯ t + 2 I n( y) ⊗ w ¯ Tt hσσw + 6 I n( y) ⊗ w w σ2 ,
¯ t := [x 1t − x 1 , . . . , x n(x)t − x n(x) , z1t , . . . , zn(z)t ] T , x and y denote where w the stationary solution of x t and y t .
3.4.2 Coefficients of the State Variables y
x The matrices hwww and hwww solve a system of linear equations. This system is the result of applying the chain rule (2.35) to the system of equations (2.25). As in Section 3.3.2, we consider σ as a parameter of the functions f i (x 1t , . . . , x n(x)t , z1t , . . . , zn(z)t , σ). The system has the same structure as the system (3.42) and is given by
−
5 X i=1
y y x Pi = Q 3 hwww + Q 2 S1 hwww S2 + (Q 1 + Q 2 S3 )hwww ,
(3.55a)
where Q 1 :=g x 0 ⊗ I n(w)2 ,
(3.55b)
Q 3 :=g y ⊗ I n(w)2 ,
(3.55d)
Q 2 :=g y 0 ⊗ I n(w)2 9
(3.55c)
For the definitions of the respective matrices, please see Sections 2.6.3 and 3.3, respectively. Note also that σ = 1 in the stochastic model.
S1 :=I n( y) ⊗ S2T ⊗ S2T , h xx hzx S2 :=uw = , 0n(z)×n(x) R
(3.55e)
S3 :=h xy ⊗ I n(w)2 ,
(3.55g)
P1 := I n(x)+n( y) P2 := I n(x)+n( y) P3 := I n(x)+n( y)
(3.55f)
T T ⊗ fw ⊗ fw gsss fw , T ⊗ ˜fww gss fw , T ⊗ fw ⊗ I n(w) gss ⊗ I n(w) fww , T ⊗ ˜fw gss ⊗ I n(w) fww ,
(3.55h) (3.55i) (3.55j)
P4 := I n(x)+n( y) y T ˜ ww P5 :=Q 2 I n( y) ⊗ u hww uw y T + I n( y) ⊗ uw ⊗ I n(w) hww ⊗ I n(w) uww y T ˜w + I n( y) ⊗ u hww ⊗ I n(w) uww .
(3.55k) (3.55l)
The matrices P1 through P4 are equal to the first four terms of the chain rule (2.35), respectively. The matrices fw , fww , uww are defined in equations ˜ w are built from fw (3.35), (3.39), and (3.38), respectively. ˜fw as well as u and uw , respectively, according to (2.36), and the rows of the matrix ˜fww are built from the blocks of size n(w) × n(w) of fww , as defined in (2.37). The rhs of Equation (3.55a) as well as P5 follow from the fifth term of the chain rule (2.35), which yields gs ⊗ I n(w)2 fwww x hwww 0(n(z)n(w)2 )×n(w) P6 = g x 0 gz 0 g y 0 g x gz g y ⊗ I n(w)2 , 0(n(x)n(w)2 )×n(w) (3.56) 0(n(z)n(w)2 )×n(w) y hwww x y = g x 0 ⊗ I n(w)2 hwww + g y 0 ⊗ I n(w)2 ∆ + g y ⊗ I n(w)w hwww . | {z } | {z } | {z } =:Q 1
=:Q 2
=:Q 3
The matrix P6 is the third-order derivative of the composite function h y (u(w t )) defined in (3.32b) and (3.33). Applying (2.35) to this function yields
3.4 Third-Order Solution
141
y y T T T ˜ ww P6 = I n( y) ⊗ uw ⊗ uw hwww uw + I n( y) ⊗ u hww uw |{z} | {z } :=S1
=:S2
T y + I n( y) ⊗ uw ⊗ I n(w) hww ⊗ I n(w) uww y T ˜w + I n( y) ⊗ u hww ⊗ I n(w) uww + hwy ⊗ I n(w)2 uwww .
(3.57)
˜ w and u ˜ ww are built from uw and uww as shown in Again, the matrices u equations (2.36) and (2.37), respectively. The matrix uwww is matrix of the third-order partial derivatives of u(w) as defined in Equation (3.33) and therefore equal to x hwww uwww = 0n(z)n(w)2 ×n(w) so that
x hwy ⊗ I n(w)2 uwww = h xy ⊗ I n(w)2 hwww . | {z } =:S3
Inserting the rhs of (3.57) into (3.56) and collecting terms explains the remaining parts of Equation (3.55a). Again, the system (3.55a) can be solved by applying the vec operator, which transforms the system into the matrix Equation Ax = b: − vec |
=
|
5 X i=1
{z =b
Pi }
I n(w) ⊗ Q 3 +
R2T
y vec hwww . ⊗ Q 2 R1 ; I n(w) ⊗ (Q 1 + Q 2 R3 ) x {z } vec hwww | {z } =:A
=:x
(3.58)
Alternatively, the system may be written as a generalized Sylvester equation (3.44): y 5 X hwww − Pi = Q 3 Q 1 + Q 2 R3 (3.59a) x | {z } hwww i=1 | {z } | {z } =:A =:R =:C y hwww − Q 2 R1 0m1 ×m2 (−R2 ), x hwww | {z } | {z } =:B =:L
0m1 ×n(w) = Q 2 R1 0m1 ×m2 R − L I n(w) , |{z} {z } | {z } | =:E
=:D
=:F
(3.59b)
m1 :=[n(x) + n( y)]n(w)2 , m2 := n(x)n(w)2 .
3.4.3 Coefficients of the State-Dependent Uncertainty This subsection presents the linear system of equations whose solution is the (n( y) + n(x))n(w) column vector
x1 y1 hσσw hσσw1 1 .. .. . . h y1 h x 1 σσw n(w) σσw n(w) y hσσw .. .. y , hx . , hσσw := := . . x σσw hσσw y x n( y) n(x) hσσw1 hσσw1 .. .. . . yn( y)
hσσw n(w)
x
n(x) hσσw n(w)
Since the perturbation parameter σ captures the uncertainty that emanates y x from the shocks that drive the model, hσσw and hσσw reflect the statedependent effect of uncertainty. We apply the chain rule (2.38) to the system (2.25). We collect the single steps of this derivation in Appendix A.5. The final result is given by: y hσσw −A = B1 B2 , (3.60a) x hσσw A :=
3 X i=1
Ai + g y 0 ⊗ I n(w)
y x T I n( y) ⊗ uw hwx hσσ
y T ⊗ I n(z) hwzz , + g y 0 ⊗ I n(w) trm I n( y)n(w) ⊗ ΩΩ T I n( y) ⊗ uw x hσσ 0n(z)×1 y y y x T trm I n( y) ⊗ ΩΩ hzz + hσσ + h x hσσ T A1 : = I n(x)+n( y) ⊗ fw gss , 0n(x)×1 0n(z)×1 y hσσ
T 0n(w)×n(w)n(z) y ˜ wz uw ⊗ I n(z) h T A2 := 2 trm I n(x)+n( y) ⊗ g N ΩΩ ss , 0n(w)×n(w)n(z) 0n( y)×n(w)n(z) T A3 := trm I n(x)+n( y) ⊗ fw ⊗ N T gsss N ΩΩ T , T B1 := g y ⊗ I n(w) + g y 0 ⊗ I n(w) I n( y) ⊗ uw , y B2 := g x 0 ⊗ I n(w) + g y 0 ⊗ I n(w) h x ⊗ I n(w) .
3.4.4 Coefficients of the Perturbation Parameter This subsection presents the linear system of equations whose solution is y x the n( y) vector hσσσ and the n(x) vector hσσσ . To find this system, we apply the chain rule (2.35) to the composite function (2.25) where we consider the inner functions f i , i = 1, 2, . . . , 2[n(x) + n( y) + n(z)] defined in (3.32b) and (3.32b) as functions of the single argument σ. As in the previous subsection we defer the details of this exercise to Appendix A.6 and present the result in the next equation: y y hσσσ −A = g y + g y 0 g x 0 + g y 0 h x , x hσσσ A := trm I n(x)+n( y) ⊗ N T ⊗ N T gsss N S T (3.61) 0(n(x)+n(z))×n(z)2 y ˜ zz gsss N S h + 3 trm I n(x)+n( y) ⊗ 0(n(x)+n(z)+n( y))×n(z)2 y + g y 0 trm I n( y) ⊗ S hzzz .
The matrix S in this expression is defined in (2.21e) and its elements are related to the third moments of the innovations εi , i = 1, 2, . . . , n(z) of the shock process (2.21b) in Equation (2.22b). Therefore, if the n(z) elements of ε are drawn independently from a symmetric distribution, S is a zero matrix. Accordingly, the system (3.61) will become a homogenous y linear system with solution (provided that it exists) hσσσ = 0n( y)×1 and x hσσσ = 0n(x)×1 .
3.5 Implementation We provide implementations of the perturbation solution in two toolboxes R written in GAUSS and MATLAB . Broadly, our approach is object oriented. The properties of an instance of the canonical DSGE model presented in R Section 2.5.2 are summarized in a structure (GAUSS) or class (MATLAB ). This allows us to define default options for the solution and evaluation of a model, e.g., the choice between various ways to compute derivatives, the order of the solution, and whether the model is reduced to a smaller model before its first-order solution is computed. The object also holds the matrices that are computed at the various stages of the solution procedure y and the matrices of the final solution, e.g. hwx and hw from the linear component of the solution presented in (3.1). The task for the user is to write a procedure (GAUSS) or function R (MATLAB ) that returns the left-hand side (lhs) of the system of equations (2.21a) and to provide the matrices R, Ω, and S from (2.21b), (2.21c), and (2.21e), respectively. In addition, the user must provide the stationary solution defined in (2.24), as well as the number of states n(x), the number of jump variables n( y), the number of shocks n(z), and the number of static equations n(u). The procedure SolveModel(&M,&E) (GAUSS) computes the model’s solution. This procedure receives two pointers. The first &M points to an instance of the DSGE object and the second points to the procedure &E with the model’s equations. If successful, the procedure y returns in M.Hxw and M.Hyw the matrices hwx and hw . The elements of the second-order solution (3.31) are stored in M.Hxww, M.Hyww, M.Hxss, and M.Hyss. The third-order solution is not available. The reason is that there is no way to implement symbolic or automatic differentiation in GAUSS. Since GAUSS provides no command to solve the generalized Sylvester equation (3.44), our toolbox provides access to the respective LAPACK routines via a dynamic link library file. R The MATLAB version of our toolbox is more versatile and computes solutions up to the third-order. Here, the pointer to the function that defines the model’s equations is stored in the property Model.Equations so that Model=SolveModel(Model) computes the solution. Note that you must pass an instance of the DSGE class (again named Model in this example) to the function and that the command returns and overwrites the existing instance of the object. The function SolveModel computes in a first step the Jacobian, Hessian, and matrix of third-order derivatives of (2.21a) at the stationary solution. In the next step it calls the procedure Linear,
3.5 Implementation
which computes the linear component of the solution. Depending on the setting of Model.order, the functions Quadratic and Cubic compute the remaining components of the solution. All three functions receive a copy of Model and return a new instance of this copy. The advantage of this approach is that each new copy includes both the information about the model’s properties and the results of the preceding steps. It is, therefore, not necessary to pass large lists of arguments to each of these functions. R Our toolbox uses the MATLAB function lapack.m to access LAPACK R routines that are not implemented as MATLAB commands. Both toolboxes implement the computation of welfare measures for models with mean preserving shock processes as proposed by Heiberger and Maußner (2020). In addition, both toolboxes provide commands to compute standard diagnostic tools used to evaluate the implications of a model (see Sections 4.2 and 4.3) and to visualize the results in graphs and tables. Both toolboxes can be downloaded from the website of this book and come with a detailed user’s manual that explains the installation, the setup, the solution, and the evaluation of a model. Examples that illustrate the use of our toolboxes are the programs that accompany our applications of perturbation methods in Sections 4.4 and 4.6. In later chapters, we either compare perturbations solutions to solutions obtained from global methods, or we use perturbation solutions to initialize algorithms that implement global solution methods.
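For orientation, a schematic MATLAB usage pattern follows. Only SolveModel, the property Model.Equations, and Model.order are taken from the description above; the constructor name DSGE and the function name MyModelEquations are hypothetical placeholders, so please consult the user's manual for the actual interface.

% Schematic use of the MATLAB toolbox (names partly hypothetical).
Model           = DSGE;               % hypothetical constructor of the DSGE class
Model.Equations = @MyModelEquations;  % user-supplied function returning the lhs of (2.21a)
Model.order     = 2;                  % order of the perturbation solution
Model           = SolveModel(Model);  % returns and overwrites the solved instance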
A.5 Coefficients of the State-Dependent Uncertainty Applying (2.35) to the system (2.25) yields: § 0(n(x)+n( y))(w)×1 = E t +2
g f ⊗ I n(w) fσσw + I n(x)+n( y) ⊗ fwT gss fσσ T I n(x)+n( y) ⊗ fσw
gss fσ +
I n(x)+n( y) ⊗ fwT
= gs ⊗ I n(w) E t fσσw + A1 + A2 + A3 .
⊗ fσT
ª gsss fσ . (A.5.1)
The matrices fσ , fw , and fσσ are defined in (3.48), (3.35), and (3.49), respectively. The expectation of the matrix M , which appears in (3.49), follows from (3.50) and (3.51). Therefore, A1 := I n(x)+n( y) ⊗ fwT gss E t fσσ x hσσ 0n(z)×1 y y x + hσσ + h xy hσσ trm I n( y) ⊗ ΩΩ T hzz (A.5.2) T = I n(x)+n( y) ⊗ fw gss . 0n(x)×1 0n(z)×1 y hσσ It remains to determine fσw and fσσw . Given the results in (2.32), the only nonzero element in fσw is the derivative of h y (u(w t , σ)) with respect to σ and w t . Differentiating h yi (u(w t , σ)) first with respect to σ and then with respect to x kt yields yi hσx = k
n(z) X n(x) X s=1 j=1
x
hzyi x h x kj Ωs ε t+1 , Ωs := [Ωs1 , . . . , Ωsn(z) ] s
j
(A.5.3)
since all other terms are equal to zero at the stationary solution of the model. Differentiation with respect to σ and zkt yields yi hσz = k
n(z) X n(x) X s=1 j=1
x
hzyi x h x kj Ωs ε t+1 + k
j
n(z) X n(z) X s=1 j=1
hzyiz R jk Ωs ε t+1 . s j
The expressions in (A.5.3) and (A.5.4) are the elements of the matrix ˜ y u ⊗ I n(z) I n(w) ⊗ Ωε t+1 , h w wz ˜ y1 h wz .. ˜y = h . , wz yn( y) ˜ hwz y y i ˜ = h x i z . . . h xyi z . . . h xyi z hzyi z . . . hzyi z . . . hzyi z h . wz 1 1 1 n(z) n(x) n(z) 1 1 1 n(z) n(z) n(z)
(A.5.4)
(A.5.5)
Thus, fσw can be written as fσw = L I n(w) ⊗ Ωε t+1 ,
0n(w)×n(w)n(z) ˜ y u ⊗ I n(z) h L := wz w . 0n(w)×n(w)n(z) 0n( y)×n(w)n(z)
(A.5.6)
T The term 2 I n(x)+n( y) ⊗ fσw gss fσ in (A.5.1), thus, consists of n(x) + n( y) blocks of the form T i i fσw gss fσ = I n(w) ⊗ (Ωε t+1 ) T L T gss N Ωε t+1 .
i If we partition the n(w)n(z) × n(z) matrix ∆i := L T gss N into n(w) blocks ∆ij of size n(z) × n(z), this can be written as (Ωε t+1 ) T ∆1i Ωε t+1 .. i I n(w) ⊗ (Ωε t+1 ) T L T gss N Ωε t+1 = , . Ωε t+1 ) T ∆in(w) Ωε t+1 i T i ∆1 . . . ∆in(w) := L T gss N.
The expectation of this expression is equal to tr ∆1i ΩΩ T .. i i E t I n(w) ⊗ (Ωε t+1 ) T L T gss N Ωε t+1 = = trm L T gss N ΩΩ T , . tr ∆in(w) ΩΩ T implying
T I n(x)+n( y) ⊗ fσw gss fσ = 2 trm I n(x)+n( y) ⊗ L T gss N ΩΩ T T 0n(w)×n(w)n(z) (A.5.7) ˜ y u ⊗ I n(z) h I g N ΩΩ T . w wz = 2 trm ⊗ n(x)+n( y) ss 0n(w)×n(w)n(z) 0n( y)×n(w)n(z)
A2 := 2E t
Similar steps will help us to evaluate the term A3 of (A.5.1). The n(x) + n( y) blocks of this expression are equal to10 i i fwT ⊗ fσT gsss fσ = fwT ⊗ (Ωε t+1 ) T N T gsss N Ωε t+1 T i T = I n(w) ⊗ (Ωε t+1 ) fw ⊗ N T gsss N Ωε t+1 , T (Ωε t+1 ) ∆1i Ωε t+1 ∆1i i . .. T T = , .. = fw ⊗ N gsss N . . (Ωε t+1 ) T ∆in(w) Ωε t+1
∆in(w)
The second step applies the rule (A ⊗ B)(C ⊗ D) = AC ⊗ BD to A := I n(w) , B := (Ωε t+1 ) T , C := fwT , and D := N T . See (12.19c).
10
Therefore, I n(x)+n( y) ⊗ fwT ⊗ fσT gsss fσ = trm I n(x)+n( y) ⊗ fwT ⊗ N T gsss N ΩΩ T .
A3 := E t
(A.5.8)
It remains to determine the matrix fσσw . Differentiating the expression in (3.32b) yields
x hσσw
0n(z)n(w)×1 H fσσw = . 0n(x)n(w)×1 0n(z)n(w)×1 y hσσw
(A.5.9)
The matrix H in this expression results from applying the chain rule (2.38) to the composite function h y (u(w t , σ)), where the inner vector-valued function u is defined in Equation (3.32c). Therefore, H is equal to: H=
4 X
Hi ,
i=1 huy
⊗ I n(w) uσσw , y H2 = I n( y) ⊗ uwT huu uσσ , y T H3 = 2 I n( y) ⊗ uσw huu uσ y H4 = I n( y) ⊗ uwT ⊗ uσT huuu uσ . H1 =
(A.5.10)
The partial derivatives of u follow from (3.32c) and the results presented in (2.32). They are given by: 0n(x)×1 uσ = Ωε t+1 , 1 x hσσ uσσ = 0n(z)×1 , 0 x hσσw uσσw = 0n(z)n(w)×1 , 0n(w)×1 h xx hzx uw = 0n(z)×n(x) Π , 01×n(x) 01×n(z) uσw = 0(n(w)+1)×n(w) .
Therefore, H1 = huy ⊗ I n(w) uσσw
x hσσw
= hwy ⊗ I n(w) hσy ⊗ I n(w) 0n(z)n(w)×1 0n(w)×1
x hσσw
x = h xy ⊗ I n(w) hzy ⊗ I n(w) 0n( y)×1 ⊗ I n(w) 0n(z)n(w)×1 = h xy ⊗ I n(w) hσσw . 0n(w)×1 (A.5.12) The matrix H2 has n( y) blocks H2i given by yi yi yi x h xσ hσσ T h xyx h xz yi yi yi x hzσ 0n(z)×1 = uwT hwx H2i = uw 0n(w)×1 hz xi hzz hσσ , yi yi yi hσx hσz hσσ 0
yi hwx :=
h xyix yi , hz x
where uw is the matrix defined in Equation (3.36). Thus, y x H2 = I n( y) ⊗ uwT hwx hσσ .
(A.5.13)
Obviously, H3 = 0n( y)n(w)×1 and it remains to determine H4 . Again, this matrix consists of n( y) blocks H4i given by y i H4i = uwT ⊗ uσT huuu uσ , y y T T i i = uw ⊗ uσ 0n(w)×(n(w)+1) huuu uσ = uwT ⊗ uσT hwuu uσ , yi T T T = uw ⊗ 01×n(x) , (Ωε t+1 ) uw hwuu uσ , y yi i = uwT ⊗ 01×n(x) , (Ωε t+1 ) T hwwz Ωε t+1 + uwT hσσw , (A.5.14) T T yi (Ωε t+1 ) uw ⊗ I n(z) hwzz Ωε t+1 1 .. + u T h yi . = . w σσw T T yi (Ωε t+1 ) uw ⊗ I n(z) hwzz Ωε t+1 n(w)
yi The different steps in this derivation exploit the structure of the matrix huuu , which is given by
150
hwyi uu 1 .. yi huuu := y . , i h w n(w)uu yi hσuu yi hw x x . . . i 1 1 .. .. . hwyi uu := y . i hwi z x . . . i n(z) 1 hwyi σx . . . i 1 yi hw x x i 1 1 .. (2.32c) = y . hwi z x i n(z) 1 0
3 Perturbation Methods: Solutions yi yi . . . hσx hσx 1 zn(z) 1σ .. .. .. . . . , yi yi . . . hσz z hσz σ
yi hσx 1 x1 .. yi hσuu := y . hσzi x n(z) 1 n(z) n(z) n(z) yi y1 yi hσσx . . . hσσz hσσσ 1 n(z) hwyi x z hwyi x σ i 1 n(z) i 1 .. .. . . hwyi z z hwyi z σ
h
i n(z) n(z) y1 w i σzn(z)
h
(A.5.15)
i n(z) yi w i σσ
... 0 .. .. .. . . . . . . . hwyi z z 0 i n(z) n(z) ... 0 hwyi σσ hwyi x z i 1 n(z)
i
Accordingly, the n(w) + 1 rightmost columns of the matrix uwT ⊗ uσT eliminate yi the matrix hσuu explaining the second line of (A.5.14). The third line recognizes the n(w) zero elements in the last row of the matrices hwyi uu . The step from line i three to four follows since the n(x) zero elements of the vector uσ eliminate yi yi the first n(x) columns of hwuu . Finally, the first n(x) lines of each hwwz drop out, T due to the zeros in (01×n(x) , Ωε t+1 ) . We can now employ the trace operator to evaluate the expectation of the n(w) scalar elements of the matrix in the last line of (A.5.14). For the entire matrix H4 this can be written in terms of the matrix trace operator defined in (3.52). This yields y y E t (H4 ) = trm I n( y)n(w) ⊗ ΩΩ T I n( y) ⊗ uwT ⊗ I n(z) hwzz + I n( y) ⊗ uwT hσσw . Together with the results from (A.5.12) and (A.5.13) we have, thus, established that the expectation of the matrix H from (A.5.9) is equal to x y x y E t (H) = h xy ⊗ I n(w) hσσw + I n( y) ⊗ uwT hwx hσσ + I n( y) ⊗ uwT hσσw y (A.5.16) + trm I n( y)n(w) ⊗ ΩΩ T I n( y) ⊗ uwT ⊗ I n(z) hwzz .
We are now able to replace the four terms on the right-hand side of (A.5.1) with the results established in (A.5.16), (A.5.2), (A.5.7), and (A.5.8). This yields x hσσw 0n(z)n(w)×1 3 E t (H) X 0(n(x)+n( y))×1 = g x 0 gz 0 g y 0 g x gz g y ⊗ I n(w) A, + 0n(x)n(w)×1 i=1 i 0 n(z)n(w)×1 y hσσw x y = g x 0 ⊗ I n(w) hσσw + g y 0 ⊗ I n(w) E t (H) + g y ⊗ I n(w) hσσw +
3 X i=1
Ai .
Replacing E t (H) in this equation by the rhs of Equation (A.5.16) and collecting terms delivers the solution presented in Equation (3.60a).
A.6 Coefficients of the Perturbation Parameter This appendix covers the derivation of Equation (3.61). We restrict the inner functions f of the composite function (2.25) to the single argument σ, h x (σ) Rz + ση t+1 h x (σ) y h (u(σ)) f(σ) := , u(σ) := Rz + ση t+1 , η t+1 := Ωε t+1 . x σ z h y (σ) The Jacobian fσ and the Hessian fσσ both evaluated at the stationary solution are presented in (3.48) and (3.49), respectively. Applying the chain rule (2.35) to (g ◦ f)(σ) yields
0[n(x)+n( y)]×1 = E t +
T I n(x)+n( y) ⊗ fσT ⊗ fσT gsss fσ + I n(x)+n( y) ⊗ ˜fσσ gss fσ I n(x)+n( y) ⊗ fσT
T ˜ gss fσσ + I n(x)+n( y) ⊗ fσ gss fσ + g f fσσσ . (A.6.1)
The first term on the right-hand side of this expression is a vector X 1 with elements i (3.48) T i fσT ⊗ fσT gsss fσ = E t η t+1 N T ⊗ η Tt+1 N T gsss N η t+1 , i = E t η Tt+1 ⊗ η Tt+1 N T ⊗ N T gsss N η t+1 , i (2.21e) T i = E t tr N T ⊗ N T gsss N η t+1 η Tt+1 ⊗ η Tt+1 = tr N ⊗ N T gsss NS .
X 1i = E t
The second line in this derivation employs the rule (12.19c) for A = B = η Tt+1 and C = D = N T , the third line employs (12.17b) with A = (η Tt+1 ⊗ η Tt+1 ) and i B = (N T ⊗ N T )gsss N η t+1 , and the final equality employs the definition of the matrix S given in (2.21e). Therefore, the entire term X 1 is equal to X 1 = Et
i I n(x)+n( y) ⊗ fσT ⊗ fσT gsss fσ = trm I n(x)+n( y) ⊗ N T ⊗ N T gsss N S .
(A.6.2)
The second, third, and fourth terms on the right-hand side of (A.6.1) can be added to a single term T X 2 := 3E t I n(x)+n( y) ⊗ fσσ gss fσ . (A.6.3) To see this, note that each inner function f i is a function of a single variable so that fσ = ˜fσ and fσσ = ˜fσσ (see (2.36) and (2.37), respectively). Furthermore,
the product f_{σσ}^T g^i_{ss} f_σ is a scalar and g^i_{ss} is symmetric, so that f_{σσ}^T g^i_{ss} f_σ = f_σ^T g^i_{ss} f_{σσ}. The vector X_2 has scalar elements
\[
X_2^i = E_t\bigl[f_{\sigma\sigma}^T g^i_{ss} f_\sigma\bigr]
= E_t\Biggl\{\sum_{s=1}^{n(y)}\sum_{r=1}^{n(z)}\sum_{l=1}^{n(z)}\sum_{k=1}^{n(z)} q^i_{rs}\, h^{y_s}_{z_l z_k}\,\eta_{r\,t+1}\eta_{l\,t+1}\eta_{k\,t+1}\Biggr\},\qquad
q^i_{rs} := g^i_{z_r' y_s'} + \sum_{j=1}^{n(y)} h^{y_j}_{z_r}\, g^i_{y_j' y_s'}.
\]
In this expression, g^i_{ab} denotes the element in row a and column b of the Hessian matrix of g^i(x_{t+1}, z_{t+1}, y_{t+1}, x_t, z_t, y_t), and the prime abbreviates variables dated at t + 1. This result rests on the definitions of f_σ and f_σσ in Equations (3.48), (3.49), and (3.51), as well as on E_t(ε_{t+1}) = 0_{n(z)×1}. The reader may wish to verify that
\[
X_2^i = E_t\bigl[\bigl(\eta_{t+1}^T \otimes \eta_{t+1}^T\bigr)P^T g^i_{ss} N\eta_{t+1}\bigr]
= E_t\operatorname{tr}\bigl[P^T g^i_{ss} N\eta_{t+1}\bigl(\eta_{t+1}^T \otimes \eta_{t+1}^T\bigr)\bigr]
= \operatorname{tr}\bigl[P^T g^i_{ss} N S\bigr],\qquad
P := \begin{bmatrix} 0_{(n(x)+n(z))\times n(z)^2} \\ \tilde h^y_{zz} \\ 0_{(n(x)+n(z)+n(y))\times n(z)^2} \end{bmatrix},
\]
where each row of \tilde h^y_{zz} holds the matrix of second-order coefficients of h^{y_i} with respect to the vector z_t (see (2.37)). Therefore, the entire vector X_2 is equal to
\[
X_2 = 3\operatorname{trm}\bigl[\bigl(I_{n(x)+n(y)} \otimes P^T\bigr)g_{ss}NS\bigr].
\tag{A.6.4}
\]
The last term on the right-hand side of (A.6.1) is equal to
\[
X_3 := E_t\{g_f f_{\sigma\sigma\sigma}\}
= g_f\begin{bmatrix} h^x_{\sigma\sigma\sigma} \\ 0_{n(z)\times 1} \\ X_{33} \\ 0_{n(x)\times 1} \\ 0_{n(z)\times 1} \\ h^y_{\sigma\sigma\sigma} \end{bmatrix}.
\tag{A.6.5}
\]
The matrix X 33 abbreviates the expectation of the third-order derivative of the composite function defined in (3.32b) and (3.32c) with respect to the perturbation parameter σ. This derivative is equal to
\[
\begin{aligned}
X_{33} &= E_t\Bigl[\bigl(I_{n(y)} \otimes u_\sigma^T \otimes u_\sigma^T\bigr)h^y_{uuu}u_\sigma
+ \bigl(I_{n(y)} \otimes \tilde u_{\sigma\sigma}^T\bigr)h^y_{uu}u_\sigma
+ \bigl(I_{n(y)} \otimes u_\sigma^T\bigr)h^y_{uu}u_{\sigma\sigma}
+ \bigl(I_{n(y)} \otimes \tilde u_\sigma^T h^y_{uu}\bigr)u_{\sigma\sigma}
+ h^y_u u_{\sigma\sigma\sigma}\Bigr],\\
&= E_t\Bigl[\bigl(I_{n(y)} \otimes u_\sigma^T \otimes u_\sigma^T\bigr)h^y_{uuu}u_\sigma
+ 3\bigl(I_{n(y)} \otimes u_\sigma^T\bigr)h^y_{uu}u_{\sigma\sigma}
+ h^y_u u_{\sigma\sigma\sigma}\Bigr],\\
&= E_t\Bigl[\bigl(I_{n(y)} \otimes u_\sigma^T \otimes u_\sigma^T\bigr)h^y_{uuu}u_\sigma\Bigr] + h^y_u u_{\sigma\sigma\sigma},\\
&= \operatorname{trm}\bigl[\bigl(I_{n(y)} \otimes S\bigr)h^y_{zzz}\bigr] + h^y_{\sigma\sigma\sigma} + h^y_x h^x_{\sigma\sigma\sigma}.
\end{aligned}
\tag{A.6.6}
\]
The step from the first to the second equality sign exploits the fact that the second, third, and fourth terms of X_33 are identical. The expectation of
\[
\Delta_i := u_\sigma^T h^{y_i}_{uu} u_{\sigma\sigma}
= \begin{bmatrix} 0_{1\times n(x)} & \eta_{t+1}^T & 1\end{bmatrix}
\begin{bmatrix} h^{y_i}_{xx} & h^{y_i}_{xz} & 0_{n(x)\times 1} \\ h^{y_i}_{zx} & h^{y_i}_{zz} & 0_{n(z)\times 1} \\ 0_{1\times n(x)} & 0_{1\times n(z)} & h^{y_i}_{\sigma\sigma}\end{bmatrix}
\begin{bmatrix} h^x_{\sigma\sigma} \\ 0_{n(z)\times 1} \\ 0 \end{bmatrix}
\]
is equal to zero, since E_t η_{t+1} = 0_{n(z)×1} according to Assumption (2.21d). This result explains the step from the second to the third equality sign in (A.6.6). The last line in (A.6.6) exploits the structure of the matrix h^y_{uuu} presented in (A.5.15). Each scalar element
\[
\Delta_i := \bigl(u_\sigma^T \otimes u_\sigma^T\bigr)h^{y_i}_{uuu}u_\sigma
= \bigl(\eta_{t+1}^T \otimes \eta_{t+1}^T\bigr)h^{y_i}_{zzz}\eta_{t+1} + h^{y_i}_{\sigma\sigma z}\eta_{t+1} + h^{y_i}_{\sigma\sigma\sigma}
\]
has the expectation
\[
E_t(\Delta_i) = \operatorname{tr}\bigl(S h^{y_i}_{zzz}\bigr) + h^{y_i}_{\sigma\sigma\sigma},
\]
so that the entire vector (Δ_1, . . . , Δ_{n(y)})^T is equal to the first two arguments on the right-hand side of the last line in (A.6.6).
Chapter 4
Perturbation Methods: Model Evaluation and Applications
4.1 Introduction This chapter introduces the reader to two standard tools to evaluate the implications of a solved dynamic stochastic general equilibrium (DSGE) model: second moments and impulse response functions. The next section covers the various ways to compute standard deviations and autocorrelations for the model’s variables and cross-correlations between them. Section 4.3 explains the computation of impulse responses that depict the time path of the model’s variables following a one-time shock. Equipped with these diagnostic tools, we consider three applications: the benchmark business cycle model, a time-to-build variation of this model, which illustrates the usefulness of the linear-quadratic (LQ) approach considered in Section 2.4, and in the final Section 4.6, we consider the model proposed by Smets and Wouters (2007).
4.2 Second Moments A standard tool to evaluate DSGE models is to compare the second moments implied by the model to those of the respective macroeconomic aggregates. The researcher regards the model as a data generating mechanism in the same way as the econometrician considers empirical time series as realizations of some well-specified stochastic process. The econometrician starts with assumptions about this process and estimates its underlying parameters from the time series at hand. The DSGE researcher infers the time series implications of his model from the model’s solution. He is therefore able to provide a structural interpretation of the
observed data. The matching of observed time series statistics with those implied by a model requires knowledge of several concepts and tools from time series analysis. We will introduce these in the following subsections where needed. The reader who is unfamiliar with the concept of stochastic processes may wish to read Section 16.3 before continuing.
4.2.1 Analytic Second Moments: Time Domain
The linear or first-order perturbation solution of the canonical DSGE model presented in (3.1) defines a first-order vector autoregressive (VAR(1)) process in the vector of endogenous and exogenous states. Therefore, standard tools from the econometrician’s toolkit can be used to infer the covariance matrix of this process. To begin, let us write the solution for the vector of endogenous states and shocks w_t = [x_t^T, z_t^T]^T in a more compact form by defining the vector of driving forces¹
\[
\nu_{t+1} := \begin{bmatrix} 0_{n(x)\times n(x)} & 0_{n(x)\times n(z)} \\ 0_{n(z)\times n(x)} & \Omega\end{bmatrix}\begin{bmatrix} 0_{n(x)\times 1} \\ \varepsilon_{t+1}\end{bmatrix},\qquad
\varepsilon_{t+1} \sim \text{iid } N\bigl(0_{n(z)\times 1}, I_{n(z)}\bigr).
\tag{4.1}
\]
The covariance matrix of this vector random variable is equal to
\[
\Sigma_\nu = E\bigl(\nu_{t+1}\nu_{t+1}^T\bigr) = \begin{bmatrix} 0_{n(x)\times n(x)} & 0_{n(x)\times n(z)} \\ 0_{n(z)\times n(x)} & \Omega\Omega^T\end{bmatrix}.
\tag{4.2}
\]
We can now combine equations (3.1a) and (2.21b) to the vector autoregressive model
\[
\bar w_{t+1} = H^w_w \bar w_t + \nu_{t+1},\qquad
H^w_w \equiv \begin{bmatrix} H^x_x & H^x_z \\ 0_{n(z)\times n(x)} & R\end{bmatrix},\qquad
\bar w_t := \begin{bmatrix} x_t - x \\ z_t\end{bmatrix}.
\tag{4.3}
\]
To obtain the covariance matrix of this process, we multiply the left-hand side (lhs) of (4.3) by \bar w_{t+1}^T, the right-hand side (rhs) by [H^w_w \bar w_t + \nu_{t+1}]^T, and take expectations on both sides. This yields:
\[
E\bigl(\bar w_{t+1}\bar w_{t+1}^T\bigr)
= E\Bigl(H^w_w \bar w_t \bar w_t^T (H^w_w)^T + \nu_{t+1}\nu_{t+1}^T + H^w_w \bar w_t \nu_{t+1}^T + \nu_{t+1}\bar w_t^T (H^w_w)^T\Bigr).
\tag{4.4}
\]
Recall that we define the solution of the stochastic model at the point σ = 1. Henceforth we will therefore omit σ from the formulas to come.
Covariance stationarity implies
\[
E\bigl(\bar w_{t+1}\bar w_{t+1}^T\bigr) = E\bigl(\bar w_t\bar w_t^T\bigr) =: \Gamma^w_0.
\]
Since the innovations ν_{t+1} are independent from the current states w̄_t, the expectation of the two rightmost terms in equation (4.4) vanishes. Accordingly, the covariance matrix Γ^w_0 is the solution of
\[
\Gamma^w_0 = H^w_w \Gamma^w_0 (H^w_w)^T + \Sigma_\nu.
\tag{4.5}
\]
This is an example of the discrete Lyapunov equation, which can be solved in several ways. One is provided by Kitagawa (1977). A second approach is to note that (4.5) is a specific version of the generalized Sylvester equation (3.44) encountered in Section 3.3.2.² An analytic solution follows via the property (12.21c) of the vec operator:
\[
\bigl(I_{n(w)^2} - H^w_w \otimes H^w_w\bigr)\operatorname{vec}\Gamma^w_0 = \operatorname{vec}(\Sigma_\nu).
\tag{4.6}
\]
Given the covariance matrix Γ^w_0, we can solve for the covariance matrix of the entire vector of variables
\[
\bar s_t := \begin{bmatrix} x_t - x \\ y_t - y \\ z_t\end{bmatrix} = M\bar w_t,\qquad
M := \begin{bmatrix} I_{n(x)} & 0_{n(x)\times n(z)} \\ H^y_x & H^y_z \\ 0_{n(z)\times n(x)} & I_{n(z)}\end{bmatrix}.
\]
It is given by
\[
\Gamma^s_0 := E\bigl(\bar s_t\bar s_t^T\bigr) = E\bigl(M\bar w_t\bar w_t^T M^T\bigr) = M E\bigl(\bar w_t\bar w_t^T\bigr)M^T = M\Gamma^w_0 M^T.
\tag{4.7}
\]
Lagged autocovariances E(\bar s_t\bar s_{t-k}^T) are given by
\[
\Gamma^s_k = E\bigl(M\bar w_t\bar w_{t-k}^T M^T\bigr) = M E\bigl(\bar w_t\bar w_{t-k}^T\bigr)M^T = M\bigl(H^w_w\bigr)^k\Gamma^w_0 M^T,
\tag{4.8}
\]
where the rightmost equality follows, e.g., from equation (2.1.31) of Lütkepohl (2005). Since Γ^s_{-k} = (Γ^s_k)^T, the elements of Γ^s_k below the main diagonal are the covariances between s_{i,t+k} and s_{j,t}. The interpretation of the elements of Γ^s_k depends on the formulation of the model. Suppose that the arguments of the functions (2.21a) are the (possibly scaled) variables s_{it}, i = 1, . . . , n(x) + n(y) + n(z). In this case, s̄_{it} refers to the absolute deviation of variable i from its stationary solution. The empirical counterparts of the model’s variables are, however,
Set A = I n(w) , R = Γ0w , B = (H ww ) T , D = H ww , E = I n(w) , and F = 0n(w)×n(w) .
very often percentage deviations from an estimated trend. Therefore, we must transform s̄_t into a vector of relative deviations, ŝ_t = D s̄_t = DM w̄_t, where D has the main diagonal (1/x_1, . . . , 1/x_{n(x)}, 1/y_1, . . . , 1/y_{n(y)}, 1, . . . , 1) and zeros elsewhere. In this case, we must replace the matrix M in equation (4.8) with M̃ := DM. If the arguments of the system (2.21a) are the logs of the variables, the elements of s̄_t are equal to ln(s_{it}/s_i) and thus already refer to relative deviations, so that no further transformation is required.
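For readers who want to compute these moments directly, the following minimal MATLAB sketch solves (4.5)-(4.8) via the vec operator. The names Hxx, Hxz, R, Omega, and M are our own placeholders for the first-order solution matrices, not those used by our toolbox.

```matlab
% Minimal sketch: analytic covariances of the first-order solution (4.3)-(4.8).
% Hxx, Hxz, R, Omega, M are assumed to be supplied by the first-order solution.
nx = size(Hxx,1); nz = size(R,1); nw = nx + nz;
Hw      = [Hxx, Hxz; zeros(nz,nx), R];            % law of motion of wbar, eq. (4.3)
Signu   = blkdiag(zeros(nx), Omega*Omega');       % covariance of nu, eq. (4.2)
vecG0   = (eye(nw^2) - kron(Hw,Hw)) \ Signu(:);   % eq. (4.6)
Gamma0w = reshape(vecG0, nw, nw);                 % covariance of wbar, eq. (4.5)
Gamma0s = M*Gamma0w*M';                           % eq. (4.7)
k       = 1;                                      % lagged autocovariance, eq. (4.8)
Gammaks = M*(Hw^k)*Gamma0w*M';
```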
4.2.2 Digression: Unconditional Means
In some applications, for instance in the computation of welfare measures as proposed by Schmitt-Grohé and Uribe (2007), the researcher might need unconditional expected values of the model’s variables. Consider the second-order approximate solution of the vector w̄_{t+1} given by
\[
\bar w_{t+1} = H^w_w\bar w_t + \nu_{t+1} + \tfrac{1}{2}\bigl(I_{n(w)} \otimes \bar w_t'\bigr)H^w_{ww}\bar w_t + \tfrac{1}{2}H^w_{\sigma\sigma},
\tag{4.9}
\]
where H^w_{ww} consists of the n(x) stacked matrices H^{x_i}_{ww} from equation (3.31a) appended by a zero matrix of size n(z)n(w) × n(w),
\[
H^w_{ww} = \begin{bmatrix} H^{x_1}_{ww} \\ H^{x_2}_{ww} \\ \vdots \\ H^{x_{n(x)}}_{ww} \\ 0_{n(z)n(w)\times n(w)}\end{bmatrix}.
\]
Analogously, H^w_{σσ} is the vector
\[
H^w_{\sigma\sigma} = \begin{bmatrix} H^{x_1}_{\sigma\sigma} \\ \vdots \\ H^{x_{n(x)}}_{\sigma\sigma} \\ 0_{n(z)\times 1}\end{bmatrix}.
\]
Taking unconditional expectations E on both sides of this equation yields
\[
E\bar w_{t+1} = H^w_w E\bar w_t + E\nu_{t+1} + \tfrac{1}{2}E\bigl[\bigl(I_{n(w)} \otimes \bar w_t'\bigr)H^w_{ww}\bar w_t\bigr] + \tfrac{1}{2}H^w_{\sigma\sigma}.
\]
Covariance stationarity implies E\bar w_{t+1} = E\bar w_t. Furthermore, E\nu_{t+1} = 0_{n(w)×1}. Let H^i_{ww} denote the i-th block of H^w_{ww} so that
\[
E\bigl[\bigl(I_{n(w)} \otimes \bar w_t'\bigr)H^w_{ww}\bar w_t\bigr]
= \begin{bmatrix} E\bigl(\bar w_t' H^1_{ww}\bar w_t\bigr) \\ \vdots \\ E\bigl(\bar w_t' H^{n(w)}_{ww}\bar w_t\bigr)\end{bmatrix}.
\]
Since each of the elements of the n(w) × 1 vector is a scalar, we can employ the property (12.17b) of the trace operator to obtain
\[
\begin{bmatrix} E\bigl(\bar w_t' H^1_{ww}\bar w_t\bigr) \\ \vdots \\ E\bigl(\bar w_t' H^{n(w)}_{ww}\bar w_t\bigr)\end{bmatrix}
= \begin{bmatrix} \operatorname{tr}\bigl[E\bigl(H^1_{ww}\bar w_t\bar w_t'\bigr)\bigr] \\ \vdots \\ \operatorname{tr}\bigl[E\bigl(H^{n(w)}_{ww}\bar w_t\bar w_t'\bigr)\bigr]\end{bmatrix}
= \begin{bmatrix} \operatorname{tr}\bigl(H^1_{ww}\Gamma^w_0\bigr) \\ \vdots \\ \operatorname{tr}\bigl(H^{n(w)}_{ww}\Gamma^w_0\bigr)\end{bmatrix}.
\]
Therefore, the unconditional expectation is given as the solution of the linear system
\[
E\bar w_t = \tfrac{1}{2}\bigl(I_{n(w)} - H^w_w\bigr)^{-1}\left(\begin{bmatrix} \operatorname{tr}\bigl(H^1_{ww}\Gamma^w_0\bigr) \\ \vdots \\ \operatorname{tr}\bigl(H^{n(w)}_{ww}\Gamma^w_0\bigr)\end{bmatrix} + H^w_{\sigma\sigma}\right).
\tag{4.10}
\]
The unconditional expectation of the vector y_t follows by an analogous derivation from equation (3.31b) and is given by
\[
E\bar y_t = H^y_w E\bar w_t + \tfrac{1}{2}\begin{bmatrix} \operatorname{tr}\bigl(H^{y_1}_{ww}\Gamma^w_0\bigr) \\ \vdots \\ \operatorname{tr}\bigl(H^{y_{n(y)}}_{ww}\Gamma^w_0\bigr)\end{bmatrix} + \tfrac{1}{2}H^y_{\sigma\sigma}.
\tag{4.11}
\]
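The linear system (4.10) is easy to set up once Γ^w_0 is available. The sketch below assumes the matrices Hw and Gamma0w from the previous sketch and stacked second-order matrices Hw_ww (of size n(w)·n(w) × n(w)) and Hw_ss (n(w) × 1); these names are our own placeholders.

```matlab
% Minimal sketch of eq. (4.10): unconditional mean of wbar from the
% second-order solution; Hw_ww and Hw_ss must be supplied by the user.
nw  = size(Hw,1);
trv = zeros(nw,1);
for i = 1:nw
    Hi     = Hw_ww((i-1)*nw+1:i*nw, :);   % i-th n(w) x n(w) block of Hw_ww
    trv(i) = trace(Hi*Gamma0w);           % tr(H^i_ww * Gamma_0^w)
end
Ewbar = 0.5*((eye(nw) - Hw) \ (trv + Hw_ss));
```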
4.2.3 Analytical Second Moments: Frequency Domain The covariances defined in equation (4.8) pertain to variables that are stationary by construction. For instance, the output y t in the model of Example 1.6.1 is defined relative to the balanced growth path given by a t y, so that 100( y t − y)/ y is the percentage deviation of output in the model from its trend. The empirical counterpart depends on the definition of the trend or, equivalently, on the definition of the cyclical component of gross domestic product. In technical terms, nonstationary time series must be filtered before the researcher can study their cyclical properties.
However, filters usually introduce spurious dynamics, so that it is difficult to interpret differences between model-implied second moments and their empirical counterparts. For this reason, it is common to also apply the filter that has been used to extract the cyclical component from the data to the model’s variables. Implementing filters analytically in the time domain is computationally intricate. A nice example are the formulas developed by Burnside (1999), p. 24f. for the HP-filter. In this subsection, we therefore follow Uhlig (1999) and apply the filter in the frequency domain. There, the power spectrum of a filtered stochastic process is merely the product of the gain of the filter and the power spectrum of the unfiltered stochastic process. The relation between the time and the frequency domain is the Fourier transform (FT). The next paragraph introduces the required concepts in the univariate case. POPULATION SPECTRUM. Let Y := { y t } t∈Z , y t ∈ R denote a covariance stationary stochastic process with mean µ and covariances γk := E( y t − µ)( y t−k − µ),
k ∈ Z.
The sequence of covariances is said to be absolutely summable if
\[
\sum_{k=-\infty}^{\infty}|\gamma_k| < \infty.
\tag{4.12}
\]
In this case, the population spectrum of Y, denoted by s_Y(ω), is defined as the FT of the sequence of covariances at the angular frequency ω:
\[
s_Y(\omega) = \frac{1}{2\pi}\sum_{k=-\infty}^{\infty}\gamma_k e^{-i\omega k},\qquad i^2 = -1.
\tag{4.13}
\]
The angular frequency is measured in radians per unit of time. If f denotes the number of cycles per unit of time, ω and f are related to each other by ω = 2πf. According to de Moivre’s theorem, e^{−iωk} = cos(ωk) − i sin(ωk), so that equation (4.13) can also be written as
\[
s_Y(\omega) = \frac{1}{2\pi}\sum_{k=-\infty}^{\infty}\gamma_k\bigl[\cos(\omega k) - i\sin(\omega k)\bigr].
\]
Furthermore, since γ_k = γ_{−k}, cos(ω) = cos(−ω), and sin(−ω) = −sin(ω), the population spectrum of Y is real valued and given by
\[
s_Y(\omega) = \frac{1}{2\pi}\Bigl[\gamma_0 + 2\sum_{k=1}^{\infty}\gamma_k\cos(\omega k)\Bigr].
\]
Given the population spectrum of the process Y, its autocovariances can be recovered by an inverse FT³
\[
\gamma_k = \int_{-\pi}^{\pi} s_Y(\omega)e^{i\omega k}\,d\omega.
\]
In the time domain, a filter is represented by a polynomial in the lag operator L, which is defined by y_{t−j} = L^j y_t. Let
\[
h(L) := \sum_{j=-\infty}^{\infty} h_j L^j
\tag{4.14}
\]
denote such a polynomial and assume that its coefficients h_j are absolutely summable as defined in (4.12). The population spectrum of the filtered process X, x_t = h(L)y_t, is related to the population spectrum of the unfiltered process by⁴
\[
s_X(\omega) = h(e^{-i\omega})\,s_Y(\omega)\,h(e^{i\omega}),
\]
where h(e^{±iω}) is obtained from (4.14) by replacing L with e^{±iω}:
\[
h(e^{\pm i\omega}) = \sum_{j=-\infty}^{\infty} h_j e^{\pm i\omega j}.
\]
See Hamilton (1994), Proposition 6.1 and the proof on pp. 172f. See Hamilton (1994), equation [6.4.7].
In general, h(e^{±iω}) will be complex valued, so that
\[
h(e^{\pm i\omega}) = a \pm ib = r[\cos(\theta) \pm i\sin(\theta)] = re^{\pm i\theta},\qquad \theta = \arctan(b/a),
\]
where r is the absolute value of the complex scalar z = a − ib. Since (a − ib)(a + ib) = a² + b² = r², the relation between the population spectra of Y and X thus is given by
\[
s_X(\omega) = |h(e^{-i\omega})|^2\, s_Y(\omega).
\]
The absolute value of the complex scalar h(e−iω ) is known as the gain of the filter h(L). POWER SPECTRUM OF VECTOR PROCESSES. Consider now the case of a vector-valued stochastic process {y t } t∈Z , y t ∈ Rn( y) with covariance matrices Γk := E (y t − µ) (y t−k − µ) T ,
µ = E(y t ).
(4.15)
Its population spectrum is defined as
\[
S_Y(\omega) := \frac{1}{2\pi}\sum_{k=-\infty}^{\infty}\Gamma_k e^{-i\omega k}.
\tag{4.16}
\]
Since Γ_k ≠ Γ_{−k}, the entries of the n(y)×n(y) matrix S_Y(ω) below and above the main diagonal are in general complex scalars. As in the univariate case, the covariance matrices can be recovered from the reverse transformation
\[
\Gamma_k = \int_{-\pi}^{\pi} S_Y(\omega)e^{i\omega k}\,d\omega.
\tag{4.17}
\]
Note from equations (4.15) and (4.16) that the population spectrum of the vector-valued white noise process {ε_t}_{t∈Z}, ε_t iid N(0_{n×1}, Σ), is equal to
\[
S_\varepsilon = \frac{1}{2\pi}\Sigma.
\]
Let
\[
H(L) := \sum_{j=-\infty}^{\infty} H_j L^j,\qquad H_j \in \mathbb{R}^{n\times n},
\]
denote a matrix-valued polynomial in the lag operator and x t = H(L)y t . Similar to the univariate case, the population spectrum of the process {x t } t∈Z is related to the spectrum of {y t } t∈Z by5 SX (ω) = H(e−iω )SY (ω)H(e iω ) T .
(4.18)
POWER SPECTRUM OF THE FIRST-ORDER SOLUTION. Using the lag operator, the process in equation (4.3) can be written as
\[
\bigl(I_{n(w)} - H^w_w L\bigr)\bar w_{t+1} = \nu_{t+1},\qquad\text{i.e.,}\qquad \bar w_{t+1} = H(L)\nu_{t+1},\quad H(L) := \bigl(I_{n(w)} - H^w_w L\bigr)^{-1}.
\]
Therefore,
\[
H(e^{-i\omega}) = \begin{bmatrix} I_{n(x)} - H^x_x e^{-i\omega} & -H^x_z e^{-i\omega} \\ 0_{n(z)\times n(x)} & I_{n(z)} - R e^{-i\omega} \end{bmatrix}^{-1}
\quad\text{and}\quad
H(e^{i\omega})^T = \begin{bmatrix} I_{n(x)} - (H^x_x)^T e^{i\omega} & 0_{n(x)\times n(z)} \\ -(H^x_z)^T e^{i\omega} & I_{n(z)} - R^T e^{i\omega} \end{bmatrix}^{-1}.
\]
Using the formula for the inverse of a partitioned matrix given in equation (12.14) yields
\[
H(e^{-i\omega}) = \begin{bmatrix} A^{11} & A^{12} \\ 0_{n(z)\times n(x)} & A^{22} \end{bmatrix},\qquad
A^{11} := \bigl(I_{n(x)} - H^x_x e^{-i\omega}\bigr)^{-1},\quad
A^{22} := \bigl(I_{n(z)} - R e^{-i\omega}\bigr)^{-1},\quad
A^{12} := A^{11} H^x_z e^{-i\omega} A^{22},
\]
and
See Hamilton (1994), equation [10.4.43].
\[
H(e^{i\omega})^T = \begin{bmatrix} \bar A^{11} & 0_{n(x)\times n(z)} \\ \bar A^{21} & \bar A^{22} \end{bmatrix},\qquad
\bar A^{11} := \bigl(I_{n(x)} - (H^x_x)^T e^{i\omega}\bigr)^{-1},\quad
\bar A^{22} := \bigl(I_{n(z)} - R^T e^{i\omega}\bigr)^{-1},\quad
\bar A^{21} := \bar A^{22} (H^x_z)^T e^{i\omega} \bar A^{11}.
\]
According to equation (4.18), the population spectrum of the process (4.3) thus is given by
\[
S_w(\omega) = \frac{1}{2\pi}\begin{bmatrix} A^{11} & A^{12} \\ 0_{n(z)\times n(x)} & A^{22}\end{bmatrix}
\begin{bmatrix} 0_{n(x)\times n(x)} & 0_{n(x)\times n(z)} \\ 0_{n(z)\times n(x)} & \Sigma\end{bmatrix}
\begin{bmatrix} \bar A^{11} & 0_{n(x)\times n(z)} \\ \bar A^{21} & \bar A^{22}\end{bmatrix}
= \frac{1}{2\pi}\begin{bmatrix} A^{12} \\ A^{22}\end{bmatrix}\Sigma\begin{bmatrix} \bar A^{21} & \bar A^{22}\end{bmatrix}.
\tag{4.19}
\]
The spectrum of the HP-filtered series follows from multiplying this expression by the square of the gain of this filter. King and Rebelo (1993), p. 220, show that the gain is the following function of the filter weight λ:
\[
|B(\omega)| = \frac{4\lambda[1-\cos(\omega)]^2}{1 + 4\lambda[1-\cos(\omega)]^2}.
\tag{4.20}
\]
Summarizing, in the frequency domain the spectral matrix of the HP-filtered process (4.3) is equal to
\[
S^{HP}_w(\omega) = |B(\omega)|^2\, S_w(\omega),
\tag{4.21}
\]
where |B(ω)| is defined in (4.20) and S_w(ω) is defined in (4.19). Given this solution, the covariance matrices of the filtered data follow from the reverse transformation
\[
\Gamma^{HP}_k = \int_{-\pi}^{\pi} S^{HP}_w(\omega)e^{i\omega k}\,d\omega
\simeq \frac{2\pi}{N}\sum_{j=0}^{N-1} S^{HP}_w(\omega_j)e^{i\omega_j k}.
\tag{4.22}
\]
The rightmost term in this equation, which approximates the integral on its left, is the inverse discrete FT of the sequence of points \{S^{HP}_w(\omega_j)e^{i\omega_j k}\}_{j=0}^{N-1}.
To see this, note first that the power spectrum is periodic with period 2π, i.e., SwH P (ω) = SwH P (ω + 2π j), j ∈ Z. Accordingly,
\[
\int_{-\pi}^{\pi} S^{HP}_w(\omega)e^{i\omega k}\,d\omega = \int_{0}^{2\pi} S^{HP}_w(\omega)e^{i\omega k}\,d\omega.
\]
Note second that the left Riemann sum of this integral is equal to
\[
\sum_{j=0}^{N-1} S^{HP}_w(\omega_j)e^{i\omega_j k}(\omega_{j+1} - \omega_j)
= \frac{2\pi}{N}\sum_{j=0}^{N-1} S^{HP}_w(\omega_j)e^{i\omega_j k},
\]
since ω_j := 2πj/N. Both arguments together establish the approximation (4.22). Numerical routines that compute the sum on the rhs of (4.22) for k = 0, . . . , N − 1 employ algorithms, known as inverse fast Fourier transforms, that considerably reduce the number of floating point operations that the N − 1 sums of N − 1 terms require. Given the Γ^{HP}_k, the covariance matrix of the filtered vector s̄_t is given by
\[
\Omega^{HP}_k = M\,\Gamma^{HP}_k\,M^T.
\tag{4.23}
\]
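A minimal MATLAB sketch of equations (4.19)-(4.23) for lag k = 0; Hw, Signu, and M are the placeholder names introduced above, and the number of frequencies N and the HP weight λ are illustrative choices.

```matlab
% Minimal sketch: HP-filtered second moments in the frequency domain.
N = 2^10; nw = size(Hw,1); lambda = 1600;
Sw = zeros(nw,nw,N);
for j = 0:N-1
    om   = 2*pi*j/N;
    A    = (eye(nw) - Hw*exp(-1i*om)) \ eye(nw);                   % transfer matrix
    gain = 4*lambda*(1-cos(om))^2 / (1 + 4*lambda*(1-cos(om))^2);  % eq. (4.20)
    Sw(:,:,j+1) = gain^2 * (A*Signu*A')/(2*pi);                    % eqs. (4.19), (4.21)
end
GammaHP0 = real((2*pi/N)*sum(Sw,3));   % k = 0 term of the inverse DFT, eq. (4.22)
OmegaHP0 = M*GammaHP0*M';              % eq. (4.23)
```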
4.2.4 Second Moments: Monte-Carlo Approach DESCRIPTION. A third approach to compute second moments refers to simulations of the model. It proceeds in several steps: In a first step, we use a random number generator and draw an independently and identically distributed (iid) sequence of random vectors {ε} Tt=1 , which represents the innovations ε t of the process (2.21b). Next, we iterate over the policy function for the endogenous states x t using one of the approximate solutions (3.1a), (3.31a), or (3.54a) and an initial condition x1 . In a third step, we compute the time path for the jump variables y t from one of the solutions (3.1b), (3.31b), or (3.54b). From the simulated time series, we finally compute second moments. Before doing so, one can discard the first, say, T1 points from each series to remove the influence of the chosen starting point. The time from t = 1 to t = T1 is known as the burn-in period. With respect to the first-order solution, this procedure has sound theoretical underpinnings. The VAR(1)-process (4.3) generates stationary time series so that for large T the sample mean (16.6), the sample autocovariances (16.7), and covariances (16.9) are consistent estimates of the respective population moments.
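The following minimal MATLAB sketch illustrates the simulation steps just described for the first-order solution; hx, hy, R, and Omega are our own placeholder names for the solution matrices, and the sample and burn-in lengths are illustrative choices.

```matlab
% Minimal sketch: simulation of the first-order solution with a burn-in period.
T = 50000; T1 = 500;
nx = size(hx,1); nz = size(R,1); ny = size(hy,1);
x = zeros(nx,1); z = zeros(nz,1);            % start at the stationary solution
X = zeros(nx,T); Z = zeros(nz,T); Y = zeros(ny,T);
for t = 1:T+T1
    y = hy*[x; z];                           % jump variables, eq. (3.1b)
    if t > T1                                % discard the burn-in period
        X(:,t-T1) = x; Z(:,t-T1) = z; Y(:,t-T1) = y;
    end
    xn = hx*[x; z];                          % next period's states, eq. (3.1a)
    z  = R*z + Omega*randn(nz,1);            % exogenous states, eq. (2.21b)
    x  = xn;
end
```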
What are the advantages of this procedure over the analytical approach considered in the previous two subsections? First, we might be interested in the small sample properties of the model. Second, we can check during the simulations whether the simulated time paths remain within reasonable bounds. For instance, the central bank may not be able to set negative nominal interest rates. Third, as we will see, the simulation-based approach allows more flexibility with respect to filtering and the treatment of models with growth. Before we turn to these issues, we must touch on the problem of stationarity in the case of second- and third-order solutions. PRUNING: INDUCING STATIONARITY. Second- and third-order solutions of DSGE models do not guarantee that simulated time paths are stationary. The source of this problem are the quadratic and cubic terms in the approximate solution. They induce nonlinearities that may give rise to multiple equilibria and to explosive time paths. Figure 4.1 illustrates this issue. It depicts the graph of the third-order approximate policy function for the stock of capital K t in the deterministic Ramsey model of Section 1.3. The production function is specified as Yt = K tα , and the current-period utility 1−η function is specified as u(C t ) = (C t − 1)/(1 − η). The parameter values are α = 0.36, β = 0.996, η = 1, and δ = 1 so that the analytic solution is K t+1 = αβ K tα (see equation (1.20)). The coefficients of the third-order approximate solution K t+1 = ˆh(K t ) = K + a1 (K t − K) +
\tfrac{a_2}{2}(K_t - K)^2 + \tfrac{a_3}{6}(K_t - K)^3
are given by a1 = α2 β K α−1 , a2 = (α − 1)α2 β K α−2 , a3 = (α − 2)(α − 1)α2 β K α−3 ,
where K = (αβ)1/(1−α) denotes the true stationary solution of the model. In Figure 4.1 the point K1 marks this solution. In addition to the true stationary solution of the model, there is a further point, K2 , at which the policy function ˆh(K t ) cuts the 45◦ line. Time paths computed with ˆh(K t ) that start in the interval (0, K2 ) will converge to the true stationary solution, whereas paths starting to the right of K2 will diverge. As we know from our study of the phase diagram in Section 1.3.4, the exact solution always converges to the stationary equilibrium at K. Therefore, the behavior illustrated in Figure 4.1 is an artifact of the third-order approximation of the policy function.
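The spurious second fixed point is easy to reproduce. The following minimal MATLAB sketch evaluates the cubic approximation with the parameter values stated above and locates K_2 numerically; the bracketing interval passed to fzero is our own choice.

```matlab
% Minimal sketch: third-order approximate policy function of the Ramsey model
% (alpha = 0.36, beta = 0.996, eta = 1, delta = 1) and its second fixed point.
alpha = 0.36; beta = 0.996;
Kstar = (alpha*beta)^(1/(1-alpha));                 % true stationary solution K1
a1 = alpha^2*beta*Kstar^(alpha-1);
a2 = (alpha-1)*alpha^2*beta*Kstar^(alpha-2);
a3 = (alpha-2)*(alpha-1)*alpha^2*beta*Kstar^(alpha-3);
hhat = @(K) Kstar + a1*(K-Kstar) + a2/2*(K-Kstar).^2 + a3/6*(K-Kstar).^3;
K2 = fzero(@(K) hhat(K) - K, [Kstar+0.1, 2]);       % second crossing of the 45-degree line
```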
[Figure 4.1 Third-Order Approximate Policy Function for Capital: the graph of K_{t+1} = ĥ(K_t) against K_t together with the 45° line; K_1 marks the true stationary solution and K_2 the second intersection.]
Kim et al. (2008), Den Haan and De Wind (2012), and Andreasen et al. (2018) propose to eliminate higher-order terms from the perturbation solution. We sketch the approach proposed by the latter authors. They decompose the state vector into a first-, second-, and third-order component and develop the dynamics of each component separately. In our notation, let x̄_t denote the vector of endogenous states in terms of deviations from the deterministic steady state, i.e., x̄_t := x_t − x.
We decompose this vector additively into a first-, second-, and third-order component, indicated by the superscripts f, s, and r, respectively:⁶
\[
\bar x_t = \bar x^f_t + \bar x^s_t + \bar x^r_t.
\]
The dynamics of the first-order component follows from the joint dynamics of the vector of exogenous states z_t (see equation (2.21b)) and the first-order policy function (3.1a):
\[
z_{t+1} = Rz_t + \Omega\varepsilon_{t+1},
\tag{4.24a}
\]
\[
\bar x^f_{t+1} = h^x_w\bar w^f_t,\qquad \bar w^f_t := \begin{bmatrix}\bar x^f_t \\ z_t\end{bmatrix}.
\tag{4.24b}
\]
⁶ We develop the formulas for the second- and third-order pruned solution. Accordingly, the term x̄^r_t can be skipped if only a second-order solution is desired.
Since the eigenvalues of both matrices R and h^x_x are within the unit circle, the system (4.24a) and (4.24b) constitutes a stable first-order stochastic difference equation in w̄^f_t. The dynamic equation of the second-order part x̄^s_t involves only second-order terms, where we consider the perturbation parameter σ as a variable. From the second-order policy function (3.31a), we obtain the dynamic equation
\[
\bar x^s_{t+1} = h^x_x\bar x^s_t + \tfrac{1}{2}\bigl(I_{n(x)} \otimes \bar w^{f\,T}_t\bigr)h^x_{ww}\bar w^f_t + \tfrac{1}{2}h^x_{\sigma\sigma}\sigma^2.
\tag{4.24c}
\]
The vector x̄^s_t cannot diverge since h^x_{σσ} is a matrix of constants, σ = 1,
h^x_x has eigenvalues within the unit circle, and w̄^f_t is a stable stochastic difference equation. Accordingly, also the sum of the first- and second-order term, x̄^{f+s}_t = x̄^f_t + x̄^s_t, cannot diverge. From the policy function for the jump variables (3.31b), we obtain the pruned solution, which includes only first- and second-order terms:⁷
\[
\bar y^{f+s}_t = h^y_w\bigl(\bar w^f_t + \bar w^s_t\bigr) + \tfrac{1}{2}\bigl(I_{n(y)} \otimes \bar w^{f\,T}_t\bigr)h^y_{ww}\bar w^f_t + \tfrac{1}{2}h^y_{\sigma\sigma}\sigma^2,\qquad
\bar w^s_t := \begin{bmatrix}\bar x^s_t \\ 0_{n(z)\times 1}\end{bmatrix}.
\tag{4.24d}
\]
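The following minimal MATLAB sketch simulates the second-order pruned system (4.24a)-(4.24d); hx_w, hy_w, hx_ww, hy_ww, hx_ss, hy_ss, R, and Omega are our own placeholder names for the solution matrices and must be supplied by the user.

```matlab
% Minimal sketch of the second-order pruned simulation (sigma = 1).
nx = size(hx_w,1); nz = size(R,1); ny = size(hy_w,1); T = 1000;
xf = zeros(nx,1); xs = zeros(nx,1); z = zeros(nz,1);
Ybar = zeros(ny,T);
for t = 1:T
    wf = [xf; z];                                                 % first-order state
    Ybar(:,t) = hy_w*(wf + [xs; zeros(nz,1)]) ...                 % eq. (4.24d)
                + 0.5*kron(eye(ny), wf')*hy_ww*wf + 0.5*hy_ss;
    xs = hx_w(:,1:nx)*xs ...                                      % eq. (4.24c)
         + 0.5*kron(eye(nx), wf')*hx_ww*wf + 0.5*hx_ss;
    xf = hx_w*wf;                                                 % eq. (4.24b)
    z  = R*z + Omega*randn(nz,1);                                 % eq. (4.24a)
end
```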
The same logic applies with respect to the dynamics of the third-order component of the state vector x̄_t. The respective dynamic equation involves only third-order effects. The policy function (3.54a) implies⁸
\[
\bar x^r_{t+1} = h^x_x\bar x^r_t
+ \bigl(I_{n(x)} \otimes \bar w^{f\,T}_t\bigr)h^x_{ww}\bar w^s_t
+ \tfrac{1}{6}h^x_{\sigma\sigma\sigma}\sigma^3
+ \tfrac{1}{6}\bigl(I_{n(x)} \otimes \bar w^{f\,T}_t \otimes \bar w^{f\,T}_t\bigr)h^x_{www}\bar w^f_t
+ \tfrac{1}{2}\bigl(I_{n(x)} \otimes \bar w^{f\,T}_t\bigr)h^x_{\sigma\sigma w}\sigma^2.
\tag{4.24e}
\]
This third part of the vector x̄_t is also a stable difference equation since h^x_{σσσ} is a vector of constants and, as explained above, the processes for w̄^f_t and w̄^s_t are stable. Accordingly, the sum x̄^{f+s+r}_t = x̄^f_t + x̄^s_t + x̄^r_t is
⁷ Note that (2.21b) is a linear stochastic process with second- and third-order effects equal to zero.
⁸ The second term on the rhs of the first line derives from the fact that
\[
\tfrac{1}{2}\bigl(I_{n(x)} \otimes \bar w^{f\,T}_t\bigr)h^x_{ww}\bar w^s_t + \tfrac{1}{2}\bigl(I_{n(x)} \otimes \bar w^{s\,T}_t\bigr)h^x_{ww}\bar w^f_t = \bigl(I_{n(x)} \otimes \bar w^{f\,T}_t\bigr)h^x_{ww}\bar w^s_t,
\]
since the matrices h^{x_i}_{ww} are symmetric.
a stationary stochastic process. It constitutes the pruned solution of the state vector up to third-order effects. From the policy function (3.54b), we obtain the pruned solution for the jump variables, which includes effects up to the third order:⁹
\[
\begin{aligned}
\bar y^{f+s+r}_t = h^y_w\bigl(\bar w^f_t + \bar w^s_t + \bar w^r_t\bigr)
&+ \tfrac{1}{2}\bigl(I_{n(y)} \otimes \bar w^{f\,T}_t\bigr)h^y_{ww}\bigl(\bar w^f_t + 2\bar w^s_t\bigr) \\
&+ \tfrac{1}{6}\bigl(I_{n(y)} \otimes \bar w^{f\,T}_t \otimes \bar w^{f\,T}_t\bigr)h^y_{www}\bar w^f_t + \tfrac{1}{2}h^y_{\sigma\sigma}\sigma^2 \\
&+ \tfrac{1}{2}\bigl(I_{n(y)} \otimes \bar w^{f\,T}_t\bigr)h^y_{\sigma\sigma w}\sigma^2 + \tfrac{1}{6}h^y_{\sigma\sigma\sigma}\sigma^3,\qquad
\bar w^r_t := \begin{bmatrix}\bar x^r_t \\ 0_{n(z)\times 1}\end{bmatrix}.
\end{aligned}
\tag{4.24f}
\]
Summarizing, the second-order pruned solution consists of equations (4.24a), (4.24b), (4.24c), and (4.24d). The third-order pruned solution involves additionally equation (4.24e) and equation (4.24f) instead of (4.24d).
CONSISTENCY. In this paragraph, we consider the consistent estimation of second moments from simulated time series. Suppose that we have simulated, either from the original or the pruned solution, i = 1, . . . , N time series of length T for two variables, say, x and y. For each time series i, we estimate the mean and the covariance at lag k from
\[
\bar x^i := \frac{1}{T-k}\sum_{t=1+k}^{T}x^i_t,
\tag{4.25a}
\]
\[
\bar y^i := \frac{1}{T-k}\sum_{t=1}^{T-k}y^i_t,
\tag{4.25b}
\]
\[
c^i_k := \frac{1}{T-1-k}\sum_{t=1+k}^{T}\bigl(x^i_t - \bar x^i\bigr)\bigl(y^i_{t-k} - \bar y^i\bigr).
\tag{4.25c}
\]
⁹ Note that from the product (w̄^f_t + w̄^s_t + w̄^r_t) ⊗ (w̄^f_t + w̄^s_t + w̄^r_t) only the sum w̄^f_t ⊗ w̄^f_t + w̄^f_t ⊗ w̄^s_t + w̄^s_t ⊗ w̄^f_t includes effects of order less than four. Since the n(y) blocks of the matrix h^y_{ww} are symmetric matrices, the third-order terms can be summarized in the term 2w̄^s_t.
If each of the N simulated time series is stationary and ergodic, these formulas provide consistent estimates of µ_x = E(x^i_t), µ_y = E(y^i_t), and γ^{xy}_k = E(x^i_t − µ_x)(y^i_{t−k} − µ_y) as T → ∞.¹⁰ A common practice, at least in the early days of DSGE modeling, however, is to estimate second moments from averages over N simulations. Often, the length T of each simulation matches the number of observations from which the empirical moments are estimated.¹¹ We will show that this practice does not deliver consistent estimates. In particular,
\[
\operatorname*{plim}_{N\to\infty}\frac{1}{N}\sum_{i=1}^{N}c^i_k \neq \gamma^{xy}_k.
\]
To see this, note that (4.25c) can be written as
\[
\begin{aligned}
c^i_k &= \frac{1}{T-1-k}\sum_{t=1+k}^{T}x^i_t y^i_{t-k} - \frac{T-k}{T-1-k}\,\bar x^i\bar y^i,\\
&= \frac{1}{T-1-k}\sum_{t=1+k}^{T}x^i_t y^i_{t-k}
- \frac{1}{(T-k)(T-1-k)}\sum_{s=1}^{T-1-k}\sum_{t=1+k}^{T-s}x^i_t y^i_{t+s-k}
- \frac{1}{(T-k)(T-1-k)}\sum_{s=1}^{T-1-k}\sum_{t=1+s+k}^{T}x^i_t y^i_{t-s-k}.
\end{aligned}
\]
For independent simulations i = 1, . . . , N and given t, x^i_t and y^i_t are iid variables. We can then invoke Khinchine’s weak law of large numbers (see, e.g., Greene (2012), Theorem D.5) and conclude that
\[
\operatorname*{plim}_{N\to\infty}\frac{1}{N}\sum_{i=1}^{N}x^i_t y^i_{t-k} = E(X_t Y_{t-k}),\qquad
\operatorname*{plim}_{N\to\infty}\frac{1}{N}\sum_{i=1}^{N}x^i_t y^i_{t+s-k} = E(X_t Y_{t-(k-s)}),\qquad
\operatorname*{plim}_{N\to\infty}\frac{1}{N}\sum_{i=1}^{N}x^i_t y^i_{t-s-k} = E(X_t Y_{t-(k+s)}).
\]
The probability limit of cki can be written as 10
See, e.g., Hamilton (1994), pp. 46-47. See, e.g., Cooley and Prescott (1995) who estimate second moments as averages from 100 simulations with 150 periods. 11
\[
\begin{aligned}
\operatorname*{plim}_{N\to\infty}\frac{1}{N}\sum_{i=1}^{N}c^i_k
&= \frac{1}{T-1-k}\sum_{t=1+k}^{T}\operatorname*{plim}_{N\to\infty}\frac{1}{N}\sum_{i=1}^{N}x^i_t y^i_{t-k}\\
&\quad - \frac{1}{(T-k)(T-1-k)}\sum_{s=1}^{T-1-k}\sum_{t=1+k}^{T-s}\operatorname*{plim}_{N\to\infty}\frac{1}{N}\sum_{i=1}^{N}x^i_t y^i_{t+s-k}\\
&\quad - \frac{1}{(T-k)(T-1-k)}\sum_{s=1}^{T-1-k}\sum_{t=1+s+k}^{T}\operatorname*{plim}_{N\to\infty}\frac{1}{N}\sum_{i=1}^{N}x^i_t y^i_{t-s-k}.
\end{aligned}
\]
Covariance stationarity implies that the population covariances E(X_t Y_{t−k}), E(X_t Y_{t−(k−s)}), and E(X_t Y_{t−(k+s)}) depend on k, k − s, and k + s but not on the time index t. Therefore,
\[
\operatorname*{plim}_{N\to\infty}\frac{1}{N}\sum_{i=1}^{N}c^i_k
= E(X_t Y_{t-k})
- \frac{1}{(T-k)(T-1-k)}\sum_{s=1}^{T-k-1}(T-s-k)E\bigl(X_t Y_{t-(k-s)}\bigr)
- \frac{1}{(T-k)(T-1-k)}\sum_{s=1}^{T-k-1}(T-s-k)E\bigl(X_t Y_{t-(k+s)}\bigr).
\]
Adding and subtracting µ_X = E(X_t) and µ_Y = E(Y_t) and noting that γ^{XY}_k = E(X_t Y_{t−k}) − µ_X µ_Y finally yields
\[
\operatorname*{plim}_{N\to\infty}\frac{1}{N}\sum_{i=1}^{N}c^i_k
= \gamma^{XY}_k
- \frac{1}{(T-k)(T-1-k)}\sum_{s=1}^{T-1-k}(T-s-k)\gamma^{XY}_{k-s}
- \frac{1}{(T-k)(T-1-k)}\sum_{s=1}^{T-1-k}(T-s-k)\gamma^{XY}_{k+s}.
\tag{4.27}
\]
Thus, the average of sample covariances from N independently and identically distributed (iid) simulations of the model does not converge to γ^{XY}_k. The bias is introduced by the two rightmost terms in equation (4.27). In practice, this bias may be small, in particular if T is large so that the c^i_k are already close to γ^{XY}_k. A consistent estimate can be constructed by replacing x̄^i and ȳ^i in equation (4.25c) with the means over all simulations
\[
\bar x := \frac{1}{N}\sum_{i=1}^{N}\bar x^i \quad\text{and}\quad \bar y := \frac{1}{N}\sum_{i=1}^{N}\bar y^i.
\]
Since x̄ and ȳ converge to µ_X and µ_Y and (1/N)Σ_{i=1}^{N}x^i_t y^i_{t−k} converges to E(X_t Y_{t−k}), the probability limit of c^i_k will then be γ^{XY}_k.
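A minimal MATLAB sketch of this consistent estimator, assuming the simulated series are stored in T × N matrices X and Y (our own layout):

```matlab
% Minimal sketch: lag-k covariance estimated across N independent simulations,
% replacing the simulation means in (4.25c) by grand means over all draws.
[T, N] = size(X);
xbar = mean(X(:)); ybar = mean(Y(:));            % grand means over all simulations
ck = zeros(N,1);
for i = 1:N
    ck(i) = sum( (X(1+k:T,i) - xbar).*(Y(1:T-k,i) - ybar) ) / (T-1-k);
end
gammak = mean(ck);                               % converges to gamma_k^XY as N grows
```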
MODELS WITH TRENDS. The benchmark business cycle model considered in Example 1.6.1 involves a deterministic trend in labor-augmenting technical progress A t . The solution procedure outlined in Chapter 3 requires variables that converge to constants in the deterministic version of the underlying DSGE model. We have therefore scaled all growing variables by the level of A t . If we compute second moments for the scaled variables, either analytically or from simulated time series, how do they relate to second moments estimated from the data? This is the issue addressed in this paragraph. Throughout the following, we assume that the empirical second moments refer to data in percentage deviations from an estimated trend. For the moment, we will be agnostic about its estimation. It could be a simple time series mean, a linear trend, or some more advanced method such as, e.g., the filters proposed by Hodrick and Prescott (1997), Baxter and King (1999), or Hamilton (2018). We must, however, distinguish between two kinds of trends: 1) The growth factor a of the trend is exogenously given so that A t+1 = aA t is a deterministic trend as in the benchmark model, 2) the growth factor is determined within the model and has a stationary value of a. There are two kinds of models that belong to this class: 2a) models of stochastic growth, such as, e.g., the model of Christiano and Eichenbaum (1992), where a t = aez t and z t is a covariance stationary stochastic process. 2b) models with endogenous growth such as, e.g., the models of Romer (1986), Lucas (1988), or Aghion and Howitt (1992). Consider a model with deterministic growth and let Vt denote the level ξ of one of the growing variables so that vt := Vt /A t is the scaled variable. We introduce the parameter ξ for additional flexibility. The value ξ = 0 indicates no scaling as, e.g., for hours or the rental rate of capital in the model of Example 1.6.1. In the same model ξ = −η for the Lagrangian multiplier of the household’s budget constraint and ξ = 1 for output and the remaining variables. For some given initial A0 , the level and the scaled variable are related to each other according to Vt = (a t A0 )ξ vt .
(4.28)
Simulations of the model provide a time series for the scaled variable {vt } Tt=1 from which we can recover the level of the variable. We are then able to remove the same trend from the simulated time series {Vt } Tt=1 that has been used to construct the empirical moments. The same reasoning applies to models, where the growth factor A t /A t−1 =: a t is endogenous. In ξ these models, we must scale by A t−1 to preserve the property of endogenous
state variables to be predetermined at the beginning of period t. From the simulated time paths of {a t } Tt=1 and {vt } Tt=1 we construct the level from Vt = (a1 a2 · · · a t−1 A0 )ξ vt .
(4.29)
Now, suppose that we have used the HP-filter to estimate the cyclical component of our empirical time series. Taking logs on both sides of (4.28) delivers ln Vt = ξ ln A0 + (ξ ln a)t + ln vt = (ξ ln A0 + ln v) + (ξ ln a)t + ln(vt /v). The first two terms on the rhs of the second equality sign are a constant, ξ ln A0 + ln v, and a linear time trend (ξ ln a)t. We show in Section 16.5.2 that the cyclical component of the HP-filter does not change if we add a constant and a linear time trend to a given time series. Therefore, passing ln Vt through the HP-filter and passing ln(vt /v) through this filter delivers y identical cyclical components. The matrices hwx and hw of the first-order solution of the model either relate to absolute or relative deviations from the stationary solution v, depending on whether the model is formulated in the levels or in the logs of the scaled variables. In both cases we can employ the technique from Section 4.2.3 to construct analytical second moments from the first-order solution without the need to simulate the model. These estimates are consistent with second moments computed from HP-filtered empirical time series. However, this equivalence does not hold in the case of models with stochastic or endogenous growth. Taking logs on both sides of (4.29) yields ln Vt = ξ ln A0 + ξ
t−1 X i=1
ln ai + ln vt ,
= (ξ ln A0 + ln v) + (ξ ln a)(t − 1) + ξ
t−1 X i=1
ln(ai /a) + ln(vt /v).
Applying the HP-filter to the log of V_t yields the same cyclical component as the filtering of the sum of the two rightmost terms in the second line of the previous equation. Second moments constructed from h^x_w and h^y_w, however, refer only to the term ln(v_t/v) and thus ignore the sum
\[
\xi\sum_{i=1}^{t-1}\ln(a_i/a).
\tag{4.30}
\]
Accordingly, the Monte Carlo approach is the adequate method to compute second moments for models with a time-varying growth factor a_t. Both versions of our toolbox take the sum (4.30) into account.
4.3 Impulse Responses PURPOSE. The impulse response function (IRF) shows the time path of a variable of the canonical DSGE model presented in Section 2.5.2 triggered by a one-time shock in one of the model’s driving processes zi t , i = 1, . . . , n(z). IRFs serve two purposes. First, they give insight into the mechanics of the model. Consider, e.g., the model presented in Example 1.6.1. Figure 1.7 plots the percentage deviation of several variables from their stationary solution in response to an unexpected increase in the log of total factor productivity (TFP) of the size of one standard deviation in period t = 1. The graphs show that this shock triggers an increase in real wages that in turn increases labor supply so that output increases by more than that due to the outward shift of the production function. Second, the impulse responses from a model can be compared to those estimated from a set of macroeconomic time series (see, e.g., Figure 1.8) to see how close the model comes to the data. Equipped with a metric that measures closeness, researches can estimate certain parameters of the model by minimizing the distance between the model’s impulse responses and those obtained from the data. DEFINITION. More formally, let a hat denote the perturbation solution of the policy functions (2.23) of the model defined in (2.21) as given by one of the formulas (3.1), (3.31), or (3.54). Assume that the model was in its deterministic stationary equilibrium before time t = 1 and that at t = 1 there is an unexpected, one-time shock to the ith element of the model’s exogenous variables z t ∈ Rn(z) . Then, the time path of the model follows approximatively by iterating over ¨ Ωei , for t = 1, z t+1 = Rz t + (4.31a) 0n(z)×1 , for t = 2, 3, . . . , T, ˆ x (x t , z t , σ), x t+1 = h ˆy
y t = h (x t , z t , σ),
(4.31b) (4.31c)
for t = 1, 2, . . . , T , where ei is the vector with 1 in place i and zeros elsewhere. It is straightforward to compute relative (or, if the stationary value of a certain variable is equal to zero, absolute) deviations from this solution. Two qualifications to this statement are in order.
4.3 Impulse Responses
175
IRFS AND ORDER OF SOLUTION. First, assume that IRFs are computed from a second- or third-order perturbation solution. Then, as explained in Section 4.2.4, it is not guaranteed that they converge, and if they do, they will not converge to the stationary solution due to the presence of secondand third-order terms in the policy functions. For the pruned systems (4.24), Andreasen et al. (2018) construct generalized impulse responses analytically. A simulation based method works as follows. In a first step, it obtains the long-run equilibrium values of the vector of endogenous states x t from iterations over (4.31b) with z t = 0∀t ∈ N until convergence of x t has been achieved. Using this solution in (4.31c) provides the long-run equilibrium of the vector y t . The IRFs computed in the second step from iterations over the system (4.31) are then expressed as percentage deviations from the long-run solutions computed in the first step. IRFs constructed in this way approach zero for t → ∞. Note that both problems will not surface, if the linear approximate policy functions are used to compute the IRFs. The linear part of the solution is always stable and converges to the deterministic stationary solution of the model. If the objective of IRFs is to understand the working of the model, precision is less of an issue. Therefore, our GAUSS procedure Impulse R and the MATLAB function Impulse1.m compute IRFs from the linear solution, while the Impulse2 and Impulse2.m employ the simulation based approach. IRFS AND GROWTH. Second, assume that (some) variables of the original model display a trend and must, therefore, be scaled by some variable A t . ξ As in Section 4.2.4, we assume that vt := Vt /A t for the relation between the level of a variable Vt and its stationary value vt and distinguish between two cases: 1) The trend is purely deterministic with growth factor a, and 2) the trend is either stochastic or endogenous to the model so that a t is a variable with stationary solution a. The model’s solution always delivers a time path for the scaled variable vt , but we want the IRF to be the deviation between the time path of the variable triggered by the shock and the path the variable would have followed without the shock. Consider case 1). For some given initial A0 we have A t = a t A0 , and the trend path is given by VtTr end = (a t A0 )ξ v, whereas the time path triggered by the shock is Vt = (a t A0 )ξ vt . Therefore, the relative deviation between the two paths is equal to
176
4 Perturbation Methods: Model Evaluation and Applications
Vt Tr end Vt
=
vt v
and is, thus, equal to the relative deviation of the scaled variable. Now, consider case 2) so that A t = a t A t−1 . For a state variable X i t to remain predetermined at period t we must scale by A t−1 rather than by ξ A t (which changes within period t). Thus, let us define vt := Vt /A t−1 as the relation between the scaled variable and its level Vt . The time path of Vt after the shock is then equal to ξ
Vt = A t−1 vt = (a1 a2 · · · a t−1 A0 )ξ vt , for t = 2, 3, . . . whereas the trend path is given by VtTr end = (a t−1 A0 )ξ v. The relation between the two paths is then equal to Vt VtTr end
=
a a a t−1 ξ vt 1 2 ··· . a a a v
The percentage deviation is thus given by12 100 ln(Vt /VtTr end ) = 100 ξ
t−1 X i=1
ln(ai /a) + ln(vt /v) , for t = 2, 3, . . . .
Both versions of our toolbox compute impulse responses according to this formula if the user indicates that the growth factor of the model is an endogenous variable.
4.4 The Benchmark Business Cycle Model The benchmark business cycle model is presented in Example 1.6.1. Equations (1.64) define the dynamics of this model, and equations (1.65) determine the stationary solution of the deterministic counterpart of this model. In this subsection, we use this model to illustrate the precision of first-, second-, and third-order solutions and compare different ways to compute second moments from the solution of the model. The parameter values used for these exercises are those presented in Table 1.1. All results R were computed with the MATLAB script BM_pert.m. 12
Note that ln(vt /v) ≈ (vt − v)/v from a linear approximation of ln(vt /v) at the point vt = v.
4.4 The Benchmark Business Cycle Model
177
EULER EQUATION RESIDUALS. The benchmark model has one endogenous state variable, the stock of capital k t , and one shock, the log of total factor productivity (TFP) z t := ln Z t . Therefore, the policy functions for one of the model’s variables v are functions vt = h v (k t , z t ). For an arbitrary point (ki , z j ) in a neighborhood of the deterministic steady state (k, 0), we compute the left- and the right-hand side of equation (1.64h) from the perturbation solution ˆh v (ki , z j ) for capital v = k, the Lagrange multiplier v = λ, and the rental rate of capital v = r. Since the innovations ε are normally distributed, we obtain: rhs = β a−η E t ˆhλ (k0 , ρ Z z j + ε)(1 − δ + ˆh r (k0 , ρ Z z j + ε)), Z∞ 2 − ε2 β a−η λ 0 r 0 2σε ˆ ˆ =Æ h (k , ρ Z z j + ε)(1 − δ + h (k , ρ Z z j + ε))e d ε, 2πσε2 −∞ where k0 = ˆhk (ki , z j ). We employ the Gauss-Hermite quadrature formula (14.31) with 15 nodes to approximate the integral. Finally, solving −η
\[
\text{rhs} = c_h^{-\eta}\bigl(1 - \hat h^L(k_i, z_j)\bigr)^{\theta(1-\eta)}
\]
for ch yields the hypothetical value of consumption c that is required to meet equation (1.64h). Our measure of the approximation error is the relative change in consumption: ch ˆhc (ki , z j )
− 1.
The perturbation approach employs only information about the exact solution at the deterministic steady state. Hence, we can expect that the precision deteriorates farther away from this point. To check this intuition, we consider four two-dimensional grids. They are presented in the first column of Table 4.1. Each combines 100 equally spaced points ki of the capital stock with 100 equally spaced points z j of the log of TFP. The smallest interval for the capital stock is between ±5 percent of the stationary capital k, and the largest is between ±10 percent. Even in a simulation with 50,000 periods the capital stock remains within this latter interval. The standard deviation for the process for z, σz , follows from equation (4.5).13 . We consider intervals of size ±2σz and ±2.5σz . Since z t is nor13
For H^w_w = ρ_Z and Σ_ν = σ_ε² we obtain σ_z = σ_ε/√(1 − ρ_Z²).
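The following minimal MATLAB sketch computes the Euler equation residual at a single grid point along the lines described above. The policy functions hk, hlam, hr, hL, and hc are placeholder (vectorized) function handles, and the Gauss-Hermite nodes and weights are obtained here with the standard Golub-Welsch eigenvalue method rather than the routine behind (14.31).

```matlab
% Minimal sketch: Euler equation residual at the grid point (ki, zj).
m  = 15;                                            % number of quadrature nodes
J  = diag(sqrt((1:m-1)/2),1) + diag(sqrt((1:m-1)/2),-1);
[V,D] = eig(J); xi = diag(D); wgt = sqrt(pi)*V(1,:)'.^2;
kp  = hk(ki, zj);                                   % next period's capital stock
zp  = rhoZ*zj + sqrt(2)*sigeps*xi;                  % quadrature points for z'
rhs = beta*a^(-eta)/sqrt(pi) * ...
      sum( wgt .* hlam(kp, zp).*(1 - delta + hr(kp, zp)) );
ch  = ( rhs ./ (1 - hL(ki, zj)).^(theta*(1-eta)) ).^(-1/eta);
resid = ch/hc(ki, zj) - 1;                          % relative change in consumption
```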
mally distributed, approximately 95 and 99 percent of the realizations of z t lie within the respective interval. Table 4.1 Euler Equation Residuals: Benchmark Business Cycle Model Grid
1st
2nd
3rd
[0.05k, 1.05k] × [−2.0σz , 2.0, σz ] [0.05k, 1.05k] × [−2.5σz , 2.5, σz ] [0.10k, 1.10k] × [−2.0σz , 2.0, σz ] [0.10k, 1.10k] × [−2.5σz , 2.5, σz ]
1.265e − 03 1.431e − 03 3.956e − 03 4.228e − 03
5.279e − 05 5.545e − 05 3.862e − 04 4.022e − 04
2.960e − 06 3.249e − 06 3.969e − 05 4.181e − 05
The table confirms the intuition. The first-order solution has a maximum Euler equation residual of approximately 0.13 percent on the smallest grid and of 0.42 percent on the largest grid. On the smallest grid, the first-order solution is approximately 24 times larger than the second-order solution, which in turn is approximately 18 times larger than the third-order solution. Even on the largest grid, the third-order solution requires less than 0.0042 percent of additional consumption to meet the Euler equation. SECOND MOMENTS. Table 4.2 presents three sets of second moments computed from quarterly German time series between the first quarter of 1991 and the fourth quarter of 2019. The reader will find a more detailed description of the data and the construction of the five time series in Section 1.6.2. The three sets of second moments differ with respect to the time series filter employed to remove the trend. We consider a linear trend, the popular Hodrick-Prescott filter, and the filter proposed recently by Hamilton (2018). The latter identifies the cyclical component of a time series { y t } Tt=1 with the residuals from a regression of y t+h on the p most recent observations y t−i , i = 0, . . . , p − 1. In our program GetPar.m we use h = 8 and p = 4, as proposed by Hamilton (2018). The second moments are the standard deviations, the correlations with output, and the first-order autocorrelation of output, consumption, investment, working hours, and the real wage. The Hamilton filter yields considerably more volatile cyclical components than the Hodrick-Prescott filter. The linear filter implies less volatility for output, consumption, and investment but more volatility for hours and the real wage than the Hamilton filter. Both the Hamilton and the
4.4 The Benchmark Business Cycle Model
179
Table 4.2 Second Moments: German Data Variable
sx
rx y
rx
Linear Trend Output Consumption Investment Hours Real Wage
1.64 1.00 1.17 0.45 4.57 0.88 2.35 0.44 1.75 −0.14
0.83 0.85 0.83 0.92 0.86
sx
rx y
rx
Hamilton Filter 2.60 1.00 1.36 0.64 6.69 0.93 1.92 0.68 1.62 −0.20
0.85 0.85 0.84 0.89 0.85
sx
rx y
rx
HP Filter 1.41 1.00 0.73 0.61 3.68 0.94 0.91 0.83 0.80 −0.29
0.80 0.64 0.80 0.85 0.60
Notes: For the definition of the time series data see Section 1.6.2. s x :=standard deviation of variable x, r x y :=cross-correlation of variable x with output, r x :=first-order autocorrelation of variable x.
Hodrick-Prescott filters imply a stronger correlation between output and the other variables in the data set than the linear filter. In particular, the Hodrick-Prescott filter yields the strongest negative correlation between output and the real wage. Note that there are almost negligible differences between autocorrelations based on the linear trend and those based on the Hamilton filter. The Hodrick-Prescott filter yields smaller persistence, in particular with respect to investment and the real wage. Table 4.3 displays six sets of second moments computed from the solution and simulation of the model. The first three panels display results for unfiltered artificial time series. The upper-left panel refers to averages over 500 simulations of length 116 periods, equal to the number of periods that underly the empirical estimates in Table 4.2. The second moments shown in the upper-middle panel are from one simulation with 50,000 observations. Both simulations employ the first-order solution so that they can be compared with second moments in the upper-right panel. This panel shows second moments computed analytically from equation (4.8). The three sets illustrate the result from Section 4.2.4, where we find that averages over short time series do not converge to the true second moments. The moments from the one long simulation differ from the analytical moments only marginally. However, there are noticeable differences to the results from many short simulations. The latter show less volatility, less persistence, and – with respect to hours – a stronger correlation with output.
180
4 Perturbation Methods: Model Evaluation and Applications Table 4.3 Second Moments: Benchmark Business Cycle Model Variable
sx
rx y
rx
sx
rx y
rx
sx
rx y
rx
Unfiltered T = 116, S = 500
Output Consumption Investment Hours Real Wage
1.95 0.78 5.59 1.10 0.92
1.00 0.93 0.99 0.97 0.96
0.79 0.85 0.78 0.78 0.83
T = 50, 000, S = 1
2.17 1.06 5.89 1.14 1.18
1.00 0.90 0.97 0.94 0.94
0.85 0.93 0.82 0.81 0.92
Analytical
2.16 1.05 5.87 1.13 1.17
1.00 0.90 0.97 0.94 0.94
0.85 0.93 0.82 0.81 0.91
Filtered Hamilton T = 50, 000, S = 1
Output Consumption Investment Hours Real Wage
2.05 0.78 5.78 1.12 0.94
1.00 0.99 0.99 0.97 0.99
0.82 0.85 0.80 0.80 0.84
Hodrick-Prescott T = 116, S = 500
1.35 0.44 4.02 0.80 0.56
1.00 0.99 1.00 1.00 1.00
0.61 0.62 0.61 0.61 0.62
Analytical
1.39 0.46 4.13 0.82 0.57
1.00 0.99 1.00 1.00 0.99
0.64 0.66 0.64 0.64 0.65
The second moments in the lower-middle and the lower-right panel are comparable to HP-filtered empirical data. The entries in the lower-right panel are computed from equation (4.21), while those in in the middleright panel are averages from 500 simulations of length 116 periods. Note that different from the unfiltered second moments the small sample bias is small. The standard deviations and autocorrelations from the simulated time series are slightly smaller than the analytical values while the correlations with output are almost identical. As is the case with the empirical data, applying the Hamilton filter to simulated data yields more volatility and more persistence. Note that the second moments presented in the upper three panels are comparable to second moments computed from empirical data that represent percentage deviations from a linear trend. We can, therefore, ask how closely the model matches the second moments presented in the upper panel of Table 4.2. A simple measure of distance is the sum of squared differences between the empirical and the model-implied second moments. Using the analytical second moments from the upper-right panel
4.5 Time-to-Build Model
181
of Table 4.3 yields a score of 5.29. If we look at both the data and the model through the lens of the HP filter, the sum of squared differences between the second moments in the third panel of Table 4.2 and those in the lower-right panel of Table 4.3 is 2.25. With respect to this metric the Hamilton filter scores between the HP filter and the linear filter. The sum of squared differences between the entries in the middle panel of Table 4.2 and those in the lower-left panel of Table 4.3 is 4.19. The reader may wonder which filter, if any, is appropriate. For a long time period the HP-filter was the consensus between most macroeconomic researchers. Until a new consensus emerges, a transparent way to present the implications of a model is to provide the results as in Table 4.3 for different time series filters.
4.5 Time-to-Build Model GESTATION PERIOD. In the benchmark business cycle model investment projects are finished after one quarter. In their classic article, Kydland and Prescott (1982) use a more realistic gestation period. Based on published studies of investment projects, they assume that it takes four quarters for an investment project to be finished. The investment costs are spread out evenly over this period. We introduce the time-to-build assumption into the benchmark model of the previous subsection and show the ease of adapting a rather tricky model to the linear-quadratic model. The only difference between the benchmark business cycle model and the time-to-build model is the timing of investment expenditures. In each quarter t the representative household launches a new investment project of size S4t . After four quarters this project is finished and adds to the capital stock. The investment costs are spread out over the entire gestation period. More formally, let Si t denote the outstanding projects at period t that require i = 1, 2, 3, 4 more quarters to be completed, and let ωi ∈ [0, 1] denote the fraction of their total costs incurred in period t. Since there are four unfinished projects in each quarter, the household’s investment expenditures I t amount to: It =
4 X i=1
ωi Si t ,
4 X i=1
ωi = 1.
Obviously, the Si t are related to each other in the following way:
(4.32)
182
4 Perturbation Methods: Model Evaluation and Applications
S1t+1 = S2t , S2t+1 = S3t ,
(4.33)
S3t+1 = S4t , and the capital stock evolves according to K t+1 = (1 − δ)K t + S1t .
(4.34)
FIRST-ORDER CONDITIONS. The framework of the stochastic LQ model of Section 2.4 involves a quadratic objective function and a linear law of motion. To use this framework, we formulate the model in terms of a stationary planning problem. In the unscaled variables the Lagrangian function of this problem reads: X ∞ 1 1−η s L t =E t β C t+1 (1 − L t+s )θ (1−η) 1 − η s=0 4 X α + Λ t+s Z t+1 K t+s (A t+s L t+s )1−α − C t+s −
ωi Si t+s
i=1
+ Γ t+s (K t+s+1 − (1 − δ)K t+s − S1t+1 ) . Since the level of labor-augmenting technical progress A t grows deterministically, A t+1 = aA t , an equivalent way to write this function is X ∞ 1 1−η 1−η s ˜ L t =A t E t β c t+1 (1 − L t+s )θ (1−η) 1 − η s=0 4 X + λ t+s Z t+1 kαt+s L 1−α t+s − c t+s −
ωi si t+s
i=1
+ γ t+s (ak t+s+1 − (1 − δ)k t+s − s1t+1 ) , η
η
where λ t := Λ t A t , γ t := Γ t A t , c t := C t /A t , i t := I t /A t , k t := K t /A t , si t := Si t /A t , and β˜ := β a1−η . Obviously, maximizing the expression in braces with respect to the scaled variables solves the planning problem. Differentiating the Lagrangian with respect to c t , L t , s4t and k t+4 provides the following conditions:14 14
To keep track of the various terms that involve s4t and k t+4 , it is helpful to write out the sum for s = 0, 1, 2, 3, 4.
4.5 Time-to-Build Model
183
−η
λ t = c t (1 − L t )θ (1−η) ,
(4.35a)
θ ct = (1 − α)Z t kαt L −α t , 1 − Lt ˜ ˜ 2 0 = E t − ω4 λ t − (β/a)ω 3 λ t+1 − (β/a) ω2 λ t+2 ˜ 3 ω1 λ t+3 + (β/a) ˜ 3 γ t+3 , − (β/a) ˜ 3 γ t+3 + (β/a) ˜ 4 (1 − δ)γ t+4 0 = E t − (β/a) ˜ 4 λ t+4 αZ t+4 kα−1 L 1−α . + (β/a) t+4
(4.35b) (4.35c)
(4.35d)
t+4
The first and the second conditions are standard and need no comment. The third and the fourth conditions imply the following Euler equation in the shadow price of capital: ¦ ˜ 0 = E t ω4 [(β/a)(1 − δ)λ t+1 − λ t ] (4.35e) ˜ ˜ + ω3 (β/a)[( β/a)(1 − δ)λ t+2 − λ t+1 ] 2 ˜ ˜ + ω2 (β/a) [(β/a)(1 − δ)λ t+3 − λ t+2 ] ˜ 3 [(β/a)(1 ˜ + ω1 (β/a) − δ)λ t+4 − λ t+3 ] © ˜ 4 αλ t+4 Z t+4 kα−1 L 1−α . + (β/a) t+4 t+4
THE LQ-PROBLEM. The state variables of this model are x t = [1, k t , s1t , s2t , s3t , ln Z t ] T , where the first element captures the constant terms in the policy functions. The current-period objective function can be written as a function of these state variables and the two control variables u t := [s4t , L t ] T : 1−η 4 X 1 ln Z t α 1−α g(x t , u t ) := e kt L t − ωi si t (1 − L t )θ (1−η) . 1−η i=1
The objective function of the LQ-problem of Section 2.4 is a quadratic function of the state and control variables. Therefore, we approximate g(x t , u t ) at the deterministic stationary solution x and u quadratically: T T xt x g(x t , u t ) ' g(x, u) + ∇g(x, u) + x t u t H(x, u) t , ut ut where ∇g(x, u) and H(x, u) are the gradient and the Hessian matrix of g(x, u), respectively, at the stationary solution.
184
4 Perturbation Methods: Model Evaluation and Applications
The linear law of motion for the state variables follows from (4.33) and (4.34): 1 0 000 0 00 0 0 1−δ 1 0 0 0 0 0 0 a a 0 0 0 1 0 0 0 0 0 a x t+1 = (4.36) ut + . 0 0 0 0 1 0 x t + 0 0 0 a 1 0 0 0 0 0 0 0 a 0 εt 00 0 0 000ρ | {z } | {z } :=A
=:B
STATIONARY EQUILIBRIUM. The remaining task is to compute the stationary equilibrium of the deterministic version of the model. In this equilibrium, the level of total factor productivity is equal to unity: Z = 1. Using this and λ t+s = λ for all t and s in condition (4.35c) yields ˜ − δ) y a − β(1 ˜ 2 + (a/β) ˜ 2 ω3 + (a/β) ˜ 3 ω4 . (4.37) = ω1 + (a/β)ω ˜ k αβ Given a, β, δ, and η, we can solve this equation for the output-capital ratio y/k. From (1 − δ)k + s1 = ak we find s1 = (a + δ − 1)k, the stationary level of new investment projects started in each period. Total investment per unit of capital is then given by 4 4 X i 1X = ωi si = (a + δ − 1) a i−1 ωi . k k i=1 i=1
Using this, we can solve for y c i = − . k k k Since y/c = ( y/k)/(c/k), we can finally solve the stationary version of (4.35b) for L. This solution in turn provides k = L( y/k)1/(α−1) , which allows us to solve for i and c. R RESULTS. The MATLAB script TTB.m solves the model. It employs the parameter values from Table 1.1, which we have also used in the benchmark model of the previous subsection. As Kydland and Prescott (1982), we set ωi = 1/4 for i = 1, 2, 3, 4. Figure 4.2 displays the impulse response
4.5 Time-to-Build Model
185
Capital Investment
3.00
New Project
10.00
2.50
8.00
2.00 Percent
12.00
6.00 1.50
4.00
1.00
2.00
0.50
0.00
0.00
−2.00 2
4
6
8
10 12 Quarter
14
16
18
−4.00 20
Figure 4.2 Impulse Responses in the Time-to-Build Model
of investment i t , the stock of capital k t and the expenditures on new projects ω4 s4t to a one-time shock to ln Z t of size σε at time t = 1. Note the sawtooth-like behavior of the new project. The shock (not shown) shifts the production function outwards and the related income effect triggers additional investment expenditures. The size of a new project s4t increases. However, this entails additional expenditures in the next three quarters until the project is finished and the new machines will be installed. Therefore, in these quarters the amount of new projects started is below average (see the right scale in the Figure). Thereafter, at quarter five, a new project is again launched. However, its size is smaller than that of the project launched in the first quarter. As a consequence, total investment expenditures gradually converge towards their long-run equilibrium value (see the left scale). The capital stock jumps upward in quarter five when the first new project is finished. Its jagged response is a consequence of the time-to-build assumption. Table 4.4 reveals that the model’s prediction for the second moments of output, consumption, investment, hours, and the real wage are close to the results obtained from the benchmark business cycle model. The entries in the table are computed from HP-filtered long time series of 50,000 observations each. Compared to the second moments in the lowerright panel of Table 4.3 they show that output, investment, and hours are
186
4 Perturbation Methods: Model Evaluation and Applications
slightly less volatile than in the benchmark model. The intuition behind this result is straightforward. When a positive technological shock hits the benchmark economy, the household takes the chance, works more at the higher real wage and transfers part of the increased income via capital accumulation into future periods. Since the shock is highly autocorrelated, the household can profit from the still above-average marginal product of capital in the next quarter. However, in the time-to-build economy, intertemporal substitution is not that easy. Income spent on additional investment projects will not pay out in terms of more capital income until the fourth quarter after the shock. However, by this time, a substantial part of the shock will have faded. This reduces the incentive to invest and, therefore, the incentive to work more. Table 4.4 Second Moments: Time-to-Build Model Variable Output Consumption Investment Hours Real Wage
sx
rx y
rx
1.32 0.50 3.88 0.76 0.60
1.00 0.94 0.99 0.98 0.97
0.64 0.50 0.66 0.66 0.55
Christiano and Todd (1996) embed the time-to-build structure in a model where labor-augmenting technical progress follows a random walk. They use a different parameterization of the weights ωi . Their argument is that investment projects typically begin with a lengthy planning phase. The overwhelming part of the project’s costs are spent in the construction phase. As a consequence, they set ω1 = 0.01 and ω2 = ω3 = ω4 = 0.33. This model is able to account for the positive autocorrelation in output growth, whereas the Kydland and Prescott (1982) parameterization of the same model – ωi = 0.25, i = 1, . . . , 4 – is not able to replicate this empirical finding.
4.6 A New Keynesian Model In this section, we develop at some length a model that combines real and nominal frictions. On the real side, the model features habit formation,
4.6 A New Keynesian Model
187
adjustment costs of investment, and variable capacity utilization. On the nominal side, it features staggered price and wage setting. These building blocks constitute New Keynesian (NK) models, which serve as a framework to study monetary and fiscal policy issues. Famous versions of this framework were developed by Christiano, Eichenbaum, and Evans (2005), Smets and Wouters (2003), and Smets and Wouters (2007). Large-scale New Keynesian models are used by many central banks, including the Federal Reserve Board and the European Central Bank.15 Book-length treatments of this subject are Woodford (2003) and Galí (2015). On the textbook level Walsh (2010), Chapter 8, provides a good introduction. Our presentation closely follows the setup of two papers by Frank Smets and Ralf Wouters. We do not, however, include the full battery of shocks considered by these authors. After all, our aim is not the estimation of models. Rather, we want to introduce the reader to this framework step by step and to uncover the contribution of the various building blocks to the effect of technology and monetary policy shocks on output and hours. Different from the papers of Smets and Wouters, we do not loglinearize the model’s equations but present the full set of equations which can be used for any order of approximation. With respect to price and wage staggering, this requires a recursive formulation of the respective first-order conditions as in, e.g., Schmitt-Grohé and Uribe (2006).
4.6.1 The Monopolistically Competitive Economy We begin with a model without adjustment costs, variable capacity utilization, and nominal frictions. To set the stage for price and wage staggering, we consider monopolistic competition in the product and labor markets. In addition, we allow for a government sector that consumes part of the economy’s output and whose monetary authority sets the nominal interest rate. Figure 4.3 depicts the different sectors of this economy and the flow of goods and services between them. The household supplies labor L t to a monopoly union and capital services K t to the intermediate goods producers. The union differentiates the labor services into a unit mass of available services L t (h), h ∈ [0, 1] and sells these at the nominal wage Wt (h) to a labor agency. This agency bundles the services to ˜L t and delivers this bundle to the intermediate 15
See Tables 1 and 2 in Sergi (2020), which collect prominent DSGE models used in central banks.
188
4 Perturbation Methods: Model Evaluation and Applications Ct , I t Lt Household
Labor Union DL t
L t (h)
Labor Agency ˜L t
DY t Tt
Kt
Bt
Intermediate Producers Yt ( j)
Gt
Government
Final Producer
Figure 4.3 Structure of the NK Model
producers.16 The intermediate sector is populated by a unit mass of firms, indexed by j ∈ [0, 1]. Each of these firms combines labor ˜L t ( j) and capital K t ( j) to the product Yt ( j). The final goods producer combines the intermediate goods into the final good Yt and delivers it to the household for consumption C t and investment I t and to the government G t . The government finances its purchases of goods from a lump-sum tax Tt and from selling nominal bonds B t to the household. We begin with the problem of the final goods producer. FINAL GOODS PRODUCER. The representative firm in this sector buys the differentiated products Yt ( j), j ∈ [0, 1] at the money price Pt ( j), assembles them into the final good Yt according to Z Yt = 16
1 0
Yt ( j)
1 1+θY
1+θY dj
, θY > 0,
(4.38)
A different approach, pursued by Erceg et al. (2000), assumes heterogenous labor services on the part of the household so that the members of the household set the wage. The approach pursued here is that developed in Schmitt-Grohé and Uribe (2005). For a comparison of the two approaches see Schmitt-Grohé and Uribe (2006).
4.6 A New Keynesian Model
189
and sells its output to the household and the government at the money price Pt . Profit maximization Z max Pt Yt −
{Yt ( j)}1j=0
1
Pt ( j)Yt ( j) d j
0
subject to (4.38) implies the demand for product j
Pt ( j) Pt
Yt ( j) =
Y − 1+θ θ Y
Yt ,
(4.39)
where the price index is given by Z Pt =
1
Pt ( j)
0
− θ1
Y
dj
−θY
(4.40)
and implies that the firm earns no profits:17 Z Pt Yt =
1
Pt ( j)Yt ( j) d j.
0
(4.41)
LABOR AGENCY. The labor agency buys the labor services L t (h), h ∈ [0, 1] at the money wage Wt (h), packs them into the bundle Z ˜L t =
1
L t (h)
0
1 1+θ L
1+θ L dh
,
(4.42)
˜ t to the intermediary producers. and sells this bundle at the nominal wage W It maximizes Z1 ˜ ˜ Wt L t − Wt (h)L t (h) d h 0
with respect to L t (h) subject to (4.42). The solution is the demand function for labor of variety L t (h) L t (h) = 17
Wt (h) ˜t W
− 1+θ L θL
˜L t ,
See Appendix A.7 for the derivation of these results.
(4.43)
190
4 Perturbation Methods: Model Evaluation and Applications
˜ t satisfies: where the wage index W Z ˜t = W
1 0
Wt (h)
− θ1
L
dh
−θ L
.
(4.44)
At this wage, the agency earns no profits. LABOR UNION. The labor union buys labor services from the household at the money wage Wt and sells the differentiated services L t (h) at the money wage Wt (h) to the labor agency. It chooses Wt (h) to maximize Z
1 0
(Wt (h) − Wt )Lht d h
subject to the labor demand function (4.43). The solution of this problem is a mark-up on wages Wt (h) = (1 + θ L )Wt .
(4.45)
At these wages, the union’s profit in terms of the final good is equal to DL t
1 := Pt
=
Z |
Z
1 0
1
Wt (h)L t (h) d h −Wt L t (h) d h , 0 {z } | {z } ˜ t ˜L t =W
=L t
˜t W W ˜L t − t L t . Pt Pt
(4.46)
The union transfers these profits to the household sector. INTERMEDIATE GOODS PRODUCERS. A producer j ∈ [0, 1] operates the technology Yt ( j) = Z t K t ( j)α A t ˜L t ( j)
1−α
− A t F,
α ∈ (0, 1),
(4.47)
where Z t is a common shock to total factor productivity (TFP) and A t is the level of the labor-augmenting technical progress. The latter grows deterministically at the rate a − 1 ≥ 0 and the former is governed by ln Z t = ρz ln Z t−1 + εz t ,
ρz ∈ [0, 1), εz t iid N (0, σz2 ).
(4.48)
4.6 A New Keynesian Model
191
K t ( j) and ˜L t ( j), denote labor and capital services at the firm level, respectively, and F is a fixed cost common to all producers. ˜ t from the The producer buys labor services at the nominal wage W labor agency and rents capital services at the real rental rate r t from the household. Cost minimization implies the first-order conditions ˜ t /A t W = mc t ( j)(1 − α)Z t K t ( j)α (A t ˜L t ( j))−α , Pt r t = mc t ( j)αZ t K t ( j)α−1 (A t ˜L t ( j))1−α ,
(4.49a) (4.49b)
where mc t ( j) denotes the marginal costs. Hence, all producers choose the same capital-labor ratio K t ( j)/(A t ˜L t ( j)), given by κ t :=
˜t ˜t α K t ( j) Kt w W ˜ t := = = ,w rt 1 − α A t Pt A t ˜L t ( j) A t ˜L t
and, therefore, have identical marginal costs mc t = mc t ( j) ∀ j ∈ [0, 1]. This allows us to compute aggregate production Y˜t as a function of aggregate labor ˜L t and capital K t : Z Y˜t := =
Z 1
1 0
Yt ( j) d j = Z
A t Z t καt |
0
(A t ˜L t ( j))Z t
1 0
K t ( j) A t ˜L t ( j)
α
− A t F d j,
˜L t ( j) d j −A t F = Z t K α (A t ˜L t )1−α − A t F. t {z }
(4.50)
=: ˜L t
¯ t ( j) and Note that each firm must employ a minimum amount of capital K labor ¯L t ( j) before it can sell goods to the market. This amount is implicitly defined by ¯ t ( j)α (A t ¯L t ( j))1−α . F = Zt K Thus, with respect to total output Yt ( j) + F average costs ac t are equal to marginal costs: ac t :=
˜ t A t L t ( j) + r t K t ( j) (4.49) mc t Z t K t ( j)α (A t L t ( j))1−α w = = mc t . Yt ( j) + F Z t K t ( j)α (A t L t ( j))1−α
Therefore, profits in terms of the final product are equal to DY t ( j) :=
Pt ( j) Yt ( j) − mc t (Yt ( j) + F ). Pt
(4.51)
192
4 Perturbation Methods: Model Evaluation and Applications
The optimal relative price maximizes this expression subject to the demand function (4.39) and is therefore equal to a markup on marginal costs P j ( j) Pt
= (1 + θY )mc t .
(4.52)
Profits in this sector are equal to Z 1 DY t = (4.49)
=
0
Z
1 0
˜t Pt ( j) W ˜L t ( j) − r t K t ( j) d j, Yt ( j) − Pt Pt Z1 Pt ( j) Yt ( j) d j − mc t Z t K t ( j)α (A t ˜L t ( j))1−α d j Pt 0
(4.41)
= Yt − mc t Z t K tα (A t ˜L t )1−α .
(4.53)
GOVERNMENT. The government collects a lump-sum tax Tt from the household, issues one-period nominal bonds B t , which pay interest R t − 1, and buys goods G t from the firm. Its budget constraint, therefore, reads B t+1 − B t Bt = G t + (R t − 1) − Tt , Pt Pt
(4.54)
where Pt denotes the money price of goods. The central bank is part of the government sector. It sets the gross interest rate R t+1 according to a Taylor rule. This rule has four arguments: the current rate R t , the gap between inflation π t := Pt /Pt−1 and the inflation target π, the gap between current (scaled) output y t and stationary output y,18 and a stochastic component that captures unexpected monetary policy shocks: R t+1 = R1−ϑ1 R t 1 (π t /π)ϑ2 (1−ϑ1 ) ( y t / y)ϑ3 (1−ϑ1 ) eεRt , ϑ
(4.55)
εRt = ρR εRt−1 + ξRt , ϑ1 ∈ [0, 1), ϑ2 > 1, ϑ3 ≥ 0, ξRt iid N (0, σR ). Scaled government expenditures g t := G t /A t follow the process
18
ln g t+1 = (1−ρ g )g +ρ g ln g t +ε g t , ρ g ∈ [0, 1), ε g t iid N (0, σ2g ). (4.56)
Smets and Wouters (2007) consider the gap between y t and the natural level of output y t∗ that prevails in the economy with flexible money prices and wages. To reduce the complexity of our model, we consider the deviation of current output from the balanced growth path y.
4.6 A New Keynesian Model
193
HOUSEHOLD. The household receives wage income (Wt /Pt )L t and profits DL t from the labor union, capital income r t K t and profits DY t from the sector of intermediate producers, and interest income (R t − 1)B t /Pt from the government. He pays taxes Tt , buys consumption goods, and saves in terms of capital goods and government bonds. His budget constraint is: Ct + I t +
B t+1 − B t Wt ≤ L t + DL t + r t K t + DY t Pt Pt Bt + (R t − 1) − Tt , Pt
(4.57)
and his stock of capital accumulates in the usual way K t+1 = (1 − δ)K t + I t .
(4.58)
The household’s current-period utility function is parameterized as η − 1 1+ν 1 1−η u(C t , L t ) := C exp L , η ≥ 0, η 6= 1, ν ≥ 0. (4.59) 1−η t 1+ν t The household maximizes ∞ X U t := E t β s u(C t+s , L t+s ) s=
subject to the budget constraint (4.57) and the law of capital accumulation (4.58). From the Lagrangian function η − 1 1+ν 1 1−η Lt = Et C t exp Lt 1−η 1+ν Wt Bt + Λt L t + DL t + r t K t + DY t + (R t − 1) − Tt − C t Pt Pt B t+1 − B t − (K t+1 − (1 − δ)K t ) − Pt β η − 1 1+ν 1−η + C t+1 exp L 1−η 1 + ν t+1 Wt+1 L t+1 + DL t+1 + r t+1 K t+1 + DY t+1 + βΛ t+1 Pt+1 B t+1 + (R t+1 − 1) − Tt+1 − C t+1 Pt+1 B t+2 − B t+1 − (K t+2 − (1 − δ)K t+1 ) − + ... Pt+1
194
4 Perturbation Methods: Model Evaluation and Applications
we obtain the first-order conditions with respect to C t , L t , K t+1 , and B t+1 : η − 1 1+ν −η 0 = C t exp Lt − Λt , (4.60a) 1+ν Wt η − 1 1+ν ν 1−η 0 = −C t exp Lt L t + Λt , (4.60b) 1+ν Pt 0 = −Λ t + βE t Λ t+1 (1 − δ + r t+1 ), R t+1 Λt 0=− + βE t Λ t+1 . Pt Pt+1
(4.60c)
(4.60d)
Note that we can combine conditions (4.60a) and (4.60b) into the wellknown static labor supply condition Wt = L νt C t , Pt which states that the household supplies labor services up to the point where the real wage equals the marginal rate of substitution between consumption and hours of work. The first-order conditions (4.60c) and (4.60d) imply that the expected discounted return on capital is equal to the expected discounted return on government bonds: βE t
Λ t+1 Λ t+1 R t+1 (1 − δ + r t+1 ) = βE t , Λt Λ t π t+1
π t+1 :=
Pt+1 . Pt
In this expression R t+1 /π t+1 is the gross real interest rate earned from holding government bonds. EQUILIBRIUM DYNAMICS. In equilibrium all markets clear. In particular, labor supply of the household equals aggregate labor sold from the trade union to the trade agency, and this amount of labor must in turn be equal to the aggregate labor demand of the intermediate producers: Z Lt =
Z 1
1 0
L t (h) d h =
0
Wt (h) ˜t W
−
θL 1+θ L
˜L t d h.
Clearing of the market for intermediate goods requires Z1 Y˜t = Yt ( j) d j. 0
Using (4.50) and (4.39) yields
4.6 A New Keynesian Model
Z t K tα (A t ˜L t )1−α
195
Z 1 − At F =
0
Pt ( j) Pt
Y − 1+θ θ Y
Yt d j.
The labor union charges the same wages for all varieties of labor (see (4.45)) and all intermediate producers set the same price (see (4.52)). ˜ t = Wt (h) = (1 + θ L )Wt and Pt = Pt ( j) so that mc t = 1/(1 + Therefore, W θY ). Furthermore, the two market clearing conditions imply L t = ˜L t and Y˜t = Yt . Inserting (4.46) and (4.53) as well as the government’s budget constraint (4.54) into the household’s budget constraint (4.57) delivers the economy’s resource restriction Yt = C t + I t + G t . Since the model depicts a growing economy, we must scale the respective variables by the level of labor-augmenting technical progress. Letting η λ t := A t Λ t and vt := Vt /A t for Vt ∈ {C t , G t , I t , K t }, and w t := Wt /(A t Pt ), the dynamics of the model is given by η − 1 1+ν −η λ t = c t exp L , (4.61a) 1+ν t w t = c t L νt , (4.61b) y t = Z t kαt L 1−α − F, t 1−α (1 + θ L )w t = Z t kαt L −α t , 1 + θY α rt = Z t kα−1 L 1−α , t t 1 + θY
(4.61c) (4.61d) (4.61e)
yt = ct + it + g t ,
(4.61f)
ak t+1 = (1 − δ)k t + i t ,
(4.61g)
R t+1 = R1−ϑ1 R t 1 (π t /π)ϑ2 (1−ϑ1 ) ( y t / y)ϑ3 (1−ϑ1 ) eεRt , ϑ
−η
E t λ t+1 (1 − δ + r t+1 ), R t+1 λ t = β a−η E t λ t+1 . π t+1 λt = β a
(4.61h) (4.61i) (4.61j)
These ten equations determine for given shocks z t = [ln Z t , ln g t , εRt ] T the dynamics of the two endogenous state variables x t = [k t , R t ] T and the eight jump variables y t = [ y t , c t , i t , L t , w t , r t , π t , λ t ] T . CALIBRATION AND DETERMINISTIC STATIONARY EQUILIBRIUM. This equilibrium follows from the system (4.61), if we eliminate the shocks and
196
4 Perturbation Methods: Model Evaluation and Applications
set Z = 1, g t = g, and εRt = 0, and vt = v for all t and all variables v of the model. We set g to a fraction γ of output y. The value of γ as well as the numeric values of the other parameters of the model are presented in Table 4.5. Several of the parameters in this table will appear in the final version of the model and will be introduced as we proceed. All values reflect properties of the US economy. The sources are Smets and Wouters (2007) (Table 1A and Table 1B as well as pp. 592-593) and Schmitt-Grohé and Uribe (2005) (Table 6.1, for the price elasticity ε = 1 + (1/θY )). Table 4.5 Calibration of the NK Model Preferences
β=0.9984 χ=0.71
Production
a=1.0043 σ Z =0.0045
Capital Accumulation
δ=0.025
Capital Utilization
Ψ=0.54
η=1.38
ν=1.83
α=0.19 ρ Z =0.95 κ=5.74
Price Setting
θY =0.2
ιY =0.24 ϕY =0.66
Wage Setting
θ L =0.015
ι L =0.58 ϕ L =0.70
Government
g/ y=0.18 ρ g =0.97 σ g =0.0053 π=1.030.25 ϑ1 =0.81 ϑ2 =2.04 ϑ3 =0.08 ρR =0.15 σR =0.0024
We choose the fixed cost parameter F so that there are no profits in the intermediate goods sector. Equation (4.53) implies F = (1 − mc)kα L 1−α
(4.62a)
y = mckα L 1−α .
(4.62b)
so that (from equation (4.61c))
Using this result, equation (4.61e) yields y r =α . k
(4.62c)
We can therefore solve the stationary version of equation (4.61i) for the output-capital ratio:
4.6 A New Keynesian Model
y aη − β(1 − δ) = . k αβ
197
(4.62d)
The stationary version of equation (4.61g) yields the investment to capital ratio: i = a−1+δ k
(4.62e)
so that equation (4.61f) can be solved for the consumption-capital ratio: y c i = (1 − γ) − . k k k
(4.62f)
Combining equations (4.61b) and (4.61d) delivers the solution for the stationary level of hours: L=
y/k 1 − α c/k 1 + θ L
1 1+ν
.
(4.62g)
In a final step, we solve (4.62b) for the capital-labor ratio k = L
y/k mc
1 α−1
,
(4.62h)
which allows us to compute the levels of k, y, c, and i from k/L, y/k, c/k, and i/k, respectively. Wages w and the Lagrange multiplier λ follow from equations (4.61b) and (4.61a), respectively. RESULTS. The model (as well as its following extensions) is solved and R evaluated with the MATLAB script NK_Model.m. Figure 4.4 presents the impulse response of several variables after a technology shock, an interest rate shock, and a government spending shock in period t = 2. The figure illustrates three important properties of the model. 1) The technology shock increases output by more and hours by less than in the model without monopolistic distortion. 2) Unexpected shocks to the nominal interest rate exert no real effects. 3) Unexpected increases in government spending have small positive effects on output and slightly negative ones on wages. The upper-left panel of Figure 4.4 displays the response of output to a technology shock. The line labeled ‘No Markups’ assumes that θY = θ L = 0 so that the model is equivalent to a competitive economy. The impulse
198
4 Perturbation Methods: Model Evaluation and Applications
Percent
TFP Shock and Output 0.90 0.80 0.70 0.60 0.50 0.40 0.30 0.20 0.10 0.00
TFP Shock and Hours 0.06 0.05 0.04 0.03
Shock No Markups Only Price Markup Price and Wage Markup
1
2
3
4
5
6 7 Quarter
8
9 10
0.02
0.00
0.25
2
3
4
5 6 7 Quarter
8
9 10
0.50
0.00 Percent
1
Government Spending Shock
Interest Rate Shock
0.40
Shock Output Hours Real Wage Inflation
−0.25 −0.50 −0.75
No Markups Only Price Markup Price and Wage Markup
0.01
Shock Output Hours Real Wage
0.30 0.20 0.10 0.00
1
2
3
4
5 6 7 Quarter
8
9 10
1
2
3
4
5 6 7 Quarter
8
9 10
Figure 4.4 Impulse Responses in the Monopolistically Competitive Economy
response of output for zero wage markups and for both nonzero markups on prices and wages are visually indistinguishable. The small distortion on the labor market θ L = 0.015 adds only marginally to the effect of the much larger distortion on the product market. The reason for the larger effect of the shock on output are the larger marginal products of labor and capital. Equation (4.62h) shows that the capital-labor ratio is an increasing function of the price markup. Accordingly, the marginal products of labor and capital in the deterministic steady state are increasing functions of the parameter θY . The smaller effect of the TFP shock on hours shown in the upper-right panel of the figure stems from the income effect: As output increases, profits in the intermediate goods sector increase so that the household supplies less labor.
4.6 A New Keynesian Model
199
The lower-left panel of Figure 4.4 documents the neutrality of monetary policy shocks. An unexpected increase in the nominal interest rate does not affect the real variables. The impulse responses of output, hours, and the real wage are zero. The shock lowers current inflation so that relative to the current price level the future price level, and hence, inflation must increase. The higher expected inflation completely offsets the increased nominal rate so that the real rate, and, hence, the household’s decisions to consume and to invest remain unchanged. Without a demand effect, hours and output do not change. Finally, the lower-right panel of Figure 4.4 shows the effects of a government spending shock. The increased taxes for financing the additional spending lower the household’s income so that the demand for leisure falls. The household supplies more hours. The large elasticity of labor supply with respect to hours, ν = 1.83, implies that it requires only a small decline in the real wage (see the red line) for the labor market to clear and for output to increase. However, note that the relation between the size of the impact and the size of the effects is approximately 17:1.
4.6.2 Price Staggering In this subsection we introduce a nominal rigidity. Its effect is to prevent a perfect adjustment of prices in the intermediate goods sector. The way to model this friction was proposed by Calvo (1983). Suppose that among the unit mass of producers only the fraction 1 − ϕY is allowed to set its relative price p t = Pt ( j)/Pt optimally. The remaining fraction of ϕY adjusts its money price by the rule of thumb Y Pt ( j) = π t−1 π1−ιY Pt−1 ( j),
ι
ιY ∈ [0, 1].
For ιY = 1 the nominal price is fully indexed to previous inflation π t−1 ; otherwise, it is indexed to a combination of previous inflation π t−1 and long-run inflation π. The latter is equal to the (gross) rate of inflation in the stationary equilibrium of the deterministic version of the model. Therefore, ( 1, for s = 0 Pt+s ( j) = X t,s p t , X t,s = Qs πιY π1−ιY (4.63) t+i−1 Pt+s , for s = 1, 2, . . . i=1
π t+i
is the relative price of a producer who was able to choose his optimal price in period t and who has not been able to re-optimize through period t + s.
200
4 Perturbation Methods: Model Evaluation and Applications
Substituting the demand function (4.39) for Yt ( j) into the profit function (4.51) delivers the profits of a producer with price p t : 1+θY 1+θY 1+θY 1+θY 1−
DY t+s ( j) = X t,s
1−
pt
θY
θY
−
Yt+s − mc t+s X t,s
θY
−
pt
θY
Yt+s + A t+s F .
The producers who receive the signal to optimize choose p t such that max E t pt
∞ X s=0
(βϕY )s
Λ t+1 DY t+s ( j), Λt
(4.64)
where β s Λ t+s /Λ t is the household’s stochastic discount factor for random returns accruing in period t + s and ϕYs is the probability of not being able to optimize for s periods. The first-order condition of problem (4.64) reads: ∞ X Yt+s ( j) s Λ t+s 0 = Et (βϕY ) (1 + θY )mc t+s − p t X t,s . (4.65) Λ θ t Y s=0 This can be rewritten as p t = (1 + θY )
Ψ1t , Ψ2t
(4.66a)
where Ψ1t = Ψ2t =
∞ X s=0 ∞ X s=0
(βϕY )s Λ t+s mc t+s Yt+s ( j), (βϕY )s Λ t+s X t,s Yt+s ( j).
We demonstrate in Appendix A.8 that both infinite sums can be defined η−1 recursively and in scaled variables ψi t := A t Ψi t , i = 1, 2 as ψ1t =
− λ t mc t p t
−
ψ2t = λ t p t
1+θY θY
1+θY θY
yt + β a
1−η
ϕY E t
y t + β a1−η ϕY E t
ι
π tY π1−ιY p t π t+1 p t+1
Y − 1+θ θ Y
ψ1t+1 .
(4.66b) 1 1+θ Y −θ ι Y π tY π1−ιY p t − θY ψ2t+1 . π t+1 p t+1 (4.66c)
4.6 A New Keynesian Model
201
In Appendix A.8 we derive two further equations that characterize the price staggering block of the model. With a mass of 1 − ϕY optimizing and a mass of ϕY rule-of-thumb producers, the price index (4.40) implies − 1 − θ1 ιY 1 = (1 − ϕY )p t Y + ϕY π t−1 π1−ιY /π t θY . (4.66d)
Since the prices differ between the two kinds of producers, aggregate output Y˜t is not equal to final production Yt . Rather, the relation between the two concepts is given by − 1+θY Z1 θY 1+θY ˜t P − ˜ := ˜y t = sY t y t , sY t = , P Pt ( j) θY d j (4.66e) Pt 0 where sY t is a measure of price dispersion and has the recursive definition −
sY t = (1 − ϕY )p t
1+θY θY
Y + ϕY π t−1 π1−ιY /π t
ι
− 1+θY θY
sY t−1 .
(4.66f)
The six equations (4.66) complement the ten equations (4.61). Since the prices between optimizing producers and rule-of-thumb producers differ, the production function in equation (4.61c) must be replaced by ˜y t = Z t kαt L 1−α − F. t
(4.67a)
(1 + θ L )w t = (1 − α)mc t Z t kαt L −α t ,
(4.67b)
Furthermore, equations (4.61d) and (4.61e), which derive from the costminimizing conditions (4.49), must be written as: r r = αmc t Z t kα−1 L 1−α . t t
(4.67c)
The new model has two further endogenous state variables, sY t−1 and π t−1 , and six further jump variables, mc t , p t , sY t , ˜y t , ψ1t , and ψ2t . It has the same stationary deterministic equilibrium as the model in the previous subsection. For π t = π t−1 = π equation (4.66d) implies p = 1 so that equation (4.66f) implies sY = 1. Accordingly, aggregate output is equal to final production: ˜y = y. Equations (4.66b) and (4.66c) imply the following stationary solution for the auxiliary variables ψ1 and ψ2 : λ y mc , 1 − β a1−η ϕY λy ψ2 = . 1 − β a1−η ϕY ψ1 =
(4.68a) (4.68b)
Using these in equation (4.66a) implies mc =
1 . 1 + θY
(4.68c)
202
4 Perturbation Methods: Model Evaluation and Applications
4.6.3 Wage Staggering Now consider the labor union and assume that at each period t it can set optimal wages only for the fraction 1 − ϕ L of the labor services it sells to the labor agency. For the remaining fraction nominal wages are indexed to past inflation π t−1 , long-run inflation π, and average productivity growth a. The respective rule is L Wt (h) = aπ t−1 π1−ι L Wt−1 (h),
ι
ι L ∈ [0, 1].
(4.69)
Let w ot := Wt (h)/(A t Pt ) denote the optimal real wage per efficiency unit of labor and assume that from period t onward the nominal wage Wt+s (h) is governed by the rule (4.69) so that Wt+s (h) Wt (h) L = X t,s = X t,s w ot , A t+s Pt+s A t Pt ( 1, for s = 1 X t,s = Qs πι L π1−ι L t+i−1 , for s = 1, 2, . . . . π t+i i=1
(4.70)
˜ t /(A t Pt ) where W ˜ t is the nominal wage ˜ t := W Let w t := Wt /(A t Pt ) and w index defined in equation (4.44). This allows us to write the labor demand function (4.43) in terms of real wages per efficiency unit of labor: L t (h) =
Wt (h) ˜t W
− 1+θ L θL
˜L t =
w ot ˜t w
− 1+θ L θL
˜L t .
As long as the labor union cannot re-optimize, it sells labor at the real wage X t,s w ot to the labor agency and buys the same amount at the real wage w t+s from the household. With probability ϕ L the union cannot re-optimize the wage so that the expected discounted sum of the profits is given by L L − 1+θ ∞ θ L X X t,s w ot Λ t+s L ˜L t+s . Et (βϕ L )s X t,s w ot − w t+s ˜ t+s Λt w s=0 The first-order condition for maximizing this expression with respect to w ot reads: 0 = Et
∞ X s=0
(βϕ L )
s Λ t+s
Λt
§ ª w t+s 1 L −X t,s + (1 + θ L ) L t+s (h). θL w ot
(4.71)
4.6 A New Keynesian Model
203
Proceeding as in the previous subsection (see Appendix A.9 for the derivation) delivers the set of equations that embed the nominal wage rigidity into the model: ξ1t , ξ2t 1+θ L w ot − θ L ˜L t ξ1t =λ t w t ˜t w L ι L 1−ι − 1+θ θL π t π L w ot −η + β a ϕL Et ξ1t+1 , π t+1 w ot+1 1+θ L w ot − θ L ˜L t ξ2t =λ t ˜t w ι L 1−ι − θ1 1+θ L L πt π L w ot − θ L −η + β a ϕL Et ξ2t+1 , π t+1 w ot+1 ι L 1−ι L − 1 1 ˜ t−1 θ L π t−1 π w w ot − θ L 1 =(1 − ϕ L ) + ϕL , ˜t ˜t w πt w L t =s L t ˜L t ,
w ot =(1 + θ L )
sL t
w ot =(1 − ϕ L ) ˜t w
− 1+θ L θL
+ ϕL
ι
L ˜ t−1 π t−1 π1−ι L w
˜t πt w
1+θ − θ L L
(4.72a) (4.72b)
(4.72c)
(4.72d) (4.72e) s L t−1 . (4.72f)
Equations (4.72b) and (4.72c) define the infinite sums in the nominator and denominator of equation (4.72a), which is the first-order condition (4.71). Equation (4.72d) derives from the definition of the wage index in equation (4.44). Equation (4.72e) gives the relation between labor supplied by the household L t and labor sold from the labor agency to the intermediate goods sector ˜L t . The variable s L t is a measure of wage dispersion. The six equations (4.72) determine the additional jump variables ˜L t , ˜ t , w ot , s L t , ξ1t , and ξ2t . The model has the additional endogenous state w ˜ t−1 . Since equation (4.45) no longer holds, and since variables s L t−1 and w labor supply of the household differs from labor sold to the intermediate sector, equation (4.61c) must be replaced by ˜y t = Z t kαt ˜L 1−α −F t
(4.73a)
and the cost-minimizing conditions (4.61d)-(4.61e) must be stated as
204
4 Perturbation Methods: Model Evaluation and Applications
˜ t = (1 − α)mc t Z t kαt ˜L −α w t , ˜L 1−α . r t = αmc t Z t kα−1 t t
(4.73b) (4.73c)
The extended model has the same deterministic stationary equilibrium as the model without price and wage staggering: The stationary version of ˜ so that equation (4.72f) implies s L = 1. equation (4.72d) implies w o = w Therefore, L = ˜L (from equation (4.72e)). Since ξ1 = ξ2 =
λw ˜L , 1 − β a−η ϕ L λ ˜L 1 − β a−η ϕ L
(4.74a) (4.74b)
equation (4.72a) yields ˜ = (1 + θ )w. wo = w
(4.74c)
4.6.4 Nominal Frictions and Interest Rate Shocks Figure 4.5 illustrates the role of nominal rigidities. It displays the impulse response of several variables to an unexpected interest rate shock. Without the nominal frictions, the shock drives down inflation but has no impact on output, consumption, investment, and real wages. Since inflation falls below the central bank’s target, the Taylor rule implies lower nominal interest rates in the quarters following the shock (see the upper-left panel). Sticky nominal prices slightly reduce the negative effect on inflation. Sticky nominal prices together with sticky nominal wages cut the negative effect on inflation in half. With sticky nominal prices alone, the effects on output, consumption, and investment are moderate. Since deflation is insufficient, the real interest rate increases and the household substitutes current for future consumption. This negative demand effect lowers output and reduces labor demand. The household’s desire to smooth consumption reduces investment. Sticky nominal wages hamper the required downward adjustment of the producer’s real wage. The considerable effects on output, consumption, and investment, however, stem from the effect on real wages paid to the household. In the stationary equilibrium, the relation between ˜ is given wages received by the household w and those paid by the firm w ˜ by w = w/(1 + θ L ). The shock increases the inverse markup so that the
4.6 A New Keynesian Model
205 Inflation
Shock and Nominal Interest Rate Shock Price Staggering Price & Wage Staggering Flexible Prices
Percent
0.2 0.1
0.0 −0.2 −0.4
Price Staggering Price & Wage Staggering Flexible Prices
−0.6
0.0 1
2
3
4
5 6 7 Quarter
8
9 10
1
2
3
Output
8
9 10
8
9 10
0.5 0.0
0.0 Percent
5 6 7 Quarter
Consumption
2.0
−0.5
−2.0
−1.0 −1.5
−4.0
−2.0
−6.0 1
2
3
4
5
6 7 Quarter
8
9 10
−2.5
1
2
3
0.0
0.0
−10.0
−5.0
−20.0
−10.0
−30.0
−15.0
1
2
3
4
5 6 7 Quarter
4
5
6 7 Quarter
Real Wage
Investment
Percent
4
8
9 10
Price Staggering Price & Wage Staggering Household Wages Flexible Prices
1
2
3
4
5 6 7 Quarter
8
9 10
Figure 4.5 Interest Rate Shock and Nominal Rigidities
wages received by the household fall by approximately 15 percent below their stationary value (see the red line in the lower-right panel of Figure 4.5) and trigger a sharp recession.
206
4 Perturbation Methods: Model Evaluation and Applications
The real effects of the shock shown in the figure are implausibly large and short-lived. It requires additional real frictions to reconcile the model with the empirical wisdom about the real effects of monetary policy as documented, e.g., in Christiano et al. (1999).
4.6.5 Habits and Adjustment Costs EXTENSIONS OF THE MODEL. There are two well-known mechanisms that prevent a rapid adjustment of consumption and investment: habits and adjustment costs of capital. A third ingredient to the model is a flexible utilization of installed capital. We modify the household’s utility function (4.59) and include an exogenous habit CH t equal to average consumption of the previous period C t−1 : η − 1 1+ν 1 1−η u(C t , L t ) := (C t − χ CH t ) exp L , 1−η 1+ν t η ≥ 0, η 6= 1, ν ≥ 0, χ ∈ [0, 1).
The parameter χ controls the strength of the habit. For χ = 0 the model is identical to the model from the previous subsection. The household faces costs that prevent a rapid adjustment of capital. We follow Christiano et al. (2005) and assume the following law of capital accumulation: 2 It κ K t+1 = (1 − δ)K t + 1 − −a It . (4.75) 2 I t−1 Rapid changes in investment I t /I t−1 relative to investment growth in the deterministic stationary equilibrium of a reduce the resources to accumulate capital. The size of these costs depends on the parameter κ. Capital services supplied to firms are now the product of the utilization rate u t and the current stock of capital K t . Therefore, the production function of intermediate producers in equation (4.47) needs to be changed to 1−α Yt ( j) = Z t (u t K t ( j))α (A t ˜L t ( j) − A t F. We assume that the household bears a cost ω(u t )K t if his rate of capital utilization deviates from unity. We specify the function ω(u t ) as in SchmittGrohé and Uribe (2005):
4.6 A New Keynesian Model
ω(u t ) := ω1 (u t − 1) +
207
ω2 (u t − 1)2 . 2
(4.76)
Note that both kinds of adjustment costs are equal to zero in the deterministic stationary solution of the model. Given these assumptions the Lagrangian function of the household’s intertemporal choice problem reads Lt = Et
η − 1 1+ν 1 (C t − χ CH t )1−η exp Lt 1−η 1+ν Wt Bt + Λt L t + DL t + r t u t K t + DY t + R t − ω(u t )K t Pt Pt B t+1 − Tt − C t − I t − Pt 2 It κ + q t Λ t (1 − δ)K t + 1 − −a I t − K t+1 2 I t−1 β η − 1 1+ν + (C t+1 − χ CH t+1 )1−η exp L t+1 1−η 1+ν Wt+1 B t+1 + βΛ t+1 L t+1 + DL t+1 + r t+1 u t+1 K t+1 + DY t+1 + R t+1 Pt+1 Pt+1 B t+2 − ω(u t+1 )K t+1 − Tt+1 − C t+1 − I t+1 − Pt+1 2 κ I t+1 + βq t+1 Λ t+1 (1 − δ)K t+1 + 1 − −a I t+1 − K t+2 2 It + ... .
Maximizing this function with respect to C t , L t , u t , I t , K t+1 , and B t+1 implies the following first-order conditions: η − 1 1+ν Λ t = (C t − χ C t−1 )−η exp Lt , (4.77a) 1+ν Wt Λt = (C t − χ C t−1 )1−η L νt , (4.77b) Pt r t = ω1 + ω2 (u t − 1), 2 It It It κ −a −κ −a Λt = Λt qt 1 − 2 I t−1 I t−1 I t−1 2 I t+1 I t+1 + κβE t Λ t+1 q t+1 −a , It It
(4.77c)
(4.77d)
208
4 Perturbation Methods: Model Evaluation and Applications
Λ t q t = βE t Λ t+1 ((1 − δ)q t+1 − ω(u t+1 ) + u t+1 r t+1 ), R t+1 Λ t = βE t Λ t+1 . π t+1
(4.77e) (4.77f)
Equations (4.77a) and (4.77b) modify equations (4.60a) and (4.60b). Note that we have replaced CH t with previous consumption C t−1 . Equation (4.77c) equates the rental rate of capital r t with the marginal costs of increasing the utilization of capital ω0 (u t ). Condition (4.77d) determines the shadow price of capital q t . Equation (4.77e) modifies the Euler equation for capital from equation (4.60c). Equation (4.77f) is the same as the first-order condition for bonds presented as equation (4.60d). EQUILIBRIUM DYNAMICS. In the scaled variables introduced in the previous subsections the dynamics of the full model is determined by the following set of equations: η − 1 1+ν λ t = (c t − (χ/a)c t−1 )−η exp Lt , (4.78a) 1+ν w t = (c t − (χ/a)c t−1 ) L νt , (4.78b) ˜y t = Z t (u t k t )α ˜L 1−α − F, t
˜t = w rt =
(4.78c)
(1 − α)mc t Z t (u t k t )α ˜L −α t , αmc t Z t (u t k t )α−1 ˜L 1−α , t
(4.78d) (4.78e)
r t = ω1 + ω2 (u t − 1),
p t = (1 + θY )
(4.78f)
ψ1t , ψ2t − θ1
1 = (1 − ϕY )p t
Y
(4.78g)
Y + ϕY π t−1 π1−ιY /π t
ι
−
1 θY
,
(4.78h)
˜y t = sY t y t ,
1+θ − θ Y Y
sY t = (1 − ϕY )p t
(4.78i) Y + ϕY π t−1 π1−ιY /π t
ι
1+θ − θ Y Y
sY t−1 ,
ξ1t , ξ2t ι L 1−ι L − 1 1 ˜ t−1 θ L π t−1 π w w ot − θ L + ϕL , 1 = (1 − ϕ L ) ˜t ˜t w πt w L t = s L t ˜L t ,
w ot = (1 + θ L )
(4.78j) (4.78k) (4.78l) (4.78m)
4.6 A New Keynesian Model
209
sL t
− 1+θ L
w ot = (1 − ϕ L ) ˜t w
θL
+ ϕL
ι
L ˜ t−1 π t−1 π1−ι L w
L − 1+θ θ L
˜t πt w
ω y t = c t + i t + g t + ω1 (u t − 1) + 2 (u t − 1)2 k t , 2 2 κ ai t ak t+1 = (1 − δ)k t + 1 − −a it , 2 i t−1 ϑ
R t+1 = R t 1 (π t /π)ϑ2 (1−ϑ1 ) ( y t / y)ϑ3 (1−ϑ1 ) eεRt ,
−
ψ1t = λ t mc t p t + βa ψ2t =
1−η
− λt pt
1+θY θY
yt
ϕY E t
1+θY θY
yt
ξ1t = λ t w t + βa ξ2t = λ t
−η
w ot ˜t w
ι
π tY π1−ιY p t π t+1 p t+1
1+θ − θ Y Y
(4.78r) (4.78s)
ψ1t+1 ,
− θ1 1+θY ι Y π tY π1−ιY p t − θY ψ2t+1 , π t+1 p t+1
θL
ι
(4.78v)
π tL π1−ι L w ot π t+1 w ot+1
− 1+θ L θL
˜L t L − 1+θ θ L
ξ1t+1 ,
˜L t
(4.78w) − θ1 L
1+θ L w ot − θ L + β a ϕL Et ξ2t+1 , π t+1 w ot+1 2 ai t ai t κ ai t 1 = qt 1 − −a −κ −a 2 i t−1 i t−1 i t−1 λ t+1 ai t+1 i t+1 2 −η + κβ a E t q t+1 −a . λt it it −η
(4.78p)
(4.78t)
− 1+θ L
ϕL Et
w ot ˜t w
(4.78o)
(4.78u)
+ β a1−η ϕY E t
(4.78n)
(4.78q)
−η
E t λ t+1 ((1 − δ)q t+1 − ω(u t+1 ) + u t+1 r t+1 ), R t+1 λ t = β a−η E t λ t+1 , π t+1
qt λt = β a
s L t−1 ,
ι
π tL π1−ι L
(4.78x)
Equations (4.78a) and (4.78b) are the first-order conditions (4.77a) and (4.77b) of the household with respect to consumption and labor supply in
210
4 Perturbation Methods: Model Evaluation and Applications
the scaled variables. Equation (4.78c) is the aggregate production function of the intermediate sector. Equations (4.78d) and (4.78e) are the costminimizing conditions of the intermediate producers. Equation (4.78f) is the first-order condition of the household with respect to the utilization rate of capital. Equations (4.78g)-(4.78j) together with (4.78t)-(4.78u) repeat the price-staggering equations (4.66), while equations (4.78k)-(4.78n) with equations (4.78v)-(4.78w) repeat the wage-staggering equations (4.72). Equation (4.78o) is the economy’s resource constraint, and equation (4.78p) derives from the law of capital accumulation (4.75). Equation (4.78q) repeats the Taylor rule (4.55), and equations (4.78r),(4.78s), and (4.78x) are the household’s first-order conditions for capital, bonds, and investment, respectively. DETERMINISTIC STATIONARY EQUILIBRIUM. Due to the habit in consumption, the deterministic stationary equilibrium of the extended model differs from the equilibrium of the monopolistically competitive economy considered in Subsection 4.6.1. Note first that the stationary version of equation (4.78x) implies q = 1 for i t = i for all t. Second, for u = 1 the adjustment costs of capital utilization are equal to zero, ω(1) = 0, and equation (4.78f) requires ω1 = r. As a consequence, the solutions for the fixed costs F , the rental rate of capital r, the output-capital ratio, the investment-capital ratio, the consumption share in output, and the capital-labor ratio presented in equations (4.62a), (4.62c)-(4.62f), and (4.62h) still apply. For c t = c t−1 equation (4.78b) can be solved for L=
y/k 1 − α 1 c/k 1 + θ L 1 − χ/a
1 1+ν
,
which differs from the solution (4.62g). We calibrate the parameter ω2 of the adjustment cost function (4.76) from equation (7) of Smets and Wouters (2007) and their estimate of the parameter Ψ = 0.54: In terms of percentage deviations from the stationary solution (denoted by a hat ^), equation (4.78f) implies that the relation between ω2 and Ψ is given by: ˆt = u
r 1−Ψ ˆr t ≡ ˆr t . ω2 Ψ
IMPULSE RESPONSES. Figure 4.6 displays the impulse response to an interest rate shock for inflation, output, consumption, investment, and the
4.6 A New Keynesian Model
211
real wage. The black lines in panels 2-6 trace out the response of the model without real frictions and are, therefore, identical to the respective IRFs considered in Figure 4.5. The blue lines show the response if only Inflation
Interest Rate Shock 0.0
Percent
0.2
0.1
−0.2
0.0
−0.4
1 2 3 4 5 6 7 8 9 10 Quarter
χ = 0, κ = 0, Ψ = 1 χ > 0, κ = 0, Ψ = 1 χ > 0, κ > 0, Ψ = 1 Full Model
1 2 3 4 5 6 7 8 9 10 Quarter
Output
Consumption
2.0 0.0
Percent
0.0 −2.0
−0.1
−0.1 −1.0
−4.0
−0.2
−6.0
Percent
0.0
0.0
−0.2
−2.0
1 2 3 4 5 6 7 8 9 10 Quarter
1 2 3 4 5 6 7 8 9 10 Quarter
Investment
Real Wage 0.1
0.0
0.0
−0.5
0.0
0.0
−20.0
−0.1 −0.1
−0.1 −1.0
1 2 3 4 5 6 7 8 9 10 Quarter
−0.2
1 2 3 4 5 6 7 8 9 10 Quarter
Figure 4.6 Interest Rate Shock and Real Frictions
the habit is included. The habit dampens the reaction of consumption but increases the negative effect on inflation and on the household’s real wage. The additional effects on output and investment in the first period of the
212
4 Perturbation Methods: Model Evaluation and Applications
shock are small and hardly visible in the Figure. The green lines show the effect of introducing adjustment costs. To make the effect on output, consumption, investment, and the household’s real wage visible, we added a right scale to the respective panels. The adjustment costs effectively reduce the demand effects of the shock and introduce the protracted, Ushaped response of inflation, output, consumption, and investment known from estimated IRFs (see, e.g., Christiano et al. (1999, 2005)). Including endogenous capacity utilization strengthens the negative effects of the shock on output, whereas the additional effect on inflation, consumption, investment, and the real wage is small. Figure 4.7 displays the impulse response to a shock to total factor productivity (TFP). The habit dampens the effect of consumption (see the blue line in the middle-right panel) so that on impact consumption remains almost unchanged and displays a hump-shaped response in the following quarters. Adjustment costs are responsible for the flattened and hump-shaped response of output and investment and the larger negative effect on inflation. The negative response of hours is driven by an increased inverse mark-up on the wages received by the households. Accordingly, their labor supply drops and dampens the positive effect on output. While some authors have observed this effect in estimated vector-autoregressive models (see Galí (1999) and Francis and Ramey (2005)) other still question whether this effect is a stylized fact of the US business cycle (see Christiano et al. (2004)). Note that adjustment costs strengthen and prolong the negative effect on hours. As in the case of the interest rate shock, the variable rate of capacity utilization strengthens the positive effect on output, but is less important for the response of the other variables depicted in the figure. Figure 4.8 displays the impulse response to a shock to government spending. The basic mechanism behind the positive effect on output is still the negative wealth effect that increases the household’s labor supply but decreases consumption and investment demand. The consumption habit dampens the decline in consumption but increases the negative response of investment. Adjustment costs of capital considerably dampen the negative response of investment and raise the mark-up on the producer’s real wages (not shown) so that the response of hours almost doubles (compare the blue and the green IRF in the bottom-right panel). Accordingly, the effect on output is larger. The additional effect of variable capacity utilization is a further increase in output and, therefore, smaller negative effects on consumption, investment, and inflation.
4.6 A New Keynesian Model
213
TFP Shock 0.0
0.4
Percent
Inflation
0.2
−0.1
0.0
−0.2
χ = 0, κ = 0, Ψ = 1 χ > 0, κ = 0, Ψ = 1 χ > 0, κ > 0, Ψ = 1 Full Model
1 2 3 4 5 6 7 8 9 10 Quarter
1 2 3 4 5 6 7 8 9 10 Quarter
Output
Consumption
1.0
Percent
0.4 0.5
0.2
0.0
0.0
1 2 3 4 5 6 7 8 9 10 Quarter
1 2 3 4 5 6 7 8 9 10 Quarter
Investment
Hours 0.5
Percent
4.0
0.0
2.0
0.0
1 2 3 4 5 6 7 8 9 10 Quarter
−0.5
1 2 3 4 5 6 7 8 9 10 Quarter
Figure 4.7 TFP Shock in the NK Model
SECOND MOMENTS. Table 4.6 displays second moments. For each of the nine variables in the table the respective first row shows the results computed from the first-order policy functions of the full model. The respective second row shows in brackets second moments computed from one long simulation with the pruned third-order solution. The entries in parentheses in the respective third row refer to second moments computed from linearly detrended U.S. quarterly data for the period 1966.Q1-2004.Q4.
214
4 Perturbation Methods: Model Evaluation and Applications Government Spending Shock
Inflation
Percent
0.02
χ = 0, κ = 0, Ψ = 1 χ > 0, κ = 0, Ψ = 1 χ > 0, κ > 0, Ψ = 1 Full Model
0.01
0.40
0.01
0.20
0.00 0.00
1 2 3 4 5 6 7 8 9 10 Quarter
1 2 3 4 5 6 7 8 9 10 Quarter
Output
Consumption
Percent
0.10
0.00 −0.02
0.05
−0.04 −0.06
0.00
1 2 3 4 5 6 7 8 9 10 Quarter
1 2 3 4 5 6 7 8 9 10 Quarter
Investment
Hours
Percent
0.00
0.08 0.06
−0.10
0.04
−0.20 −0.30
0.02 0.00
1 2 3 4 5 6 7 8 9 10 Quarter
1 2 3 4 5 6 7 8 9 10 Quarter
Figure 4.8 Government Spending Shock in the NK Model
Our source is the data appendix of Smets and Wouters (2007). Note that the parameter values used for our model are those that have been esti-
4.6 A New Keynesian Model
215
mated by Smets and Wouters (2007) from the same data using Bayesian estimation.19 Table 4.6 Second Moments: NK Model Variable
sx
s x /s y
Output
1.98 [2.08] (3.31) 1.89
1.00 [1.00] (1.00) 0.95
1.00 [1.00] (1.00) 0.94
[2.07] (3.66) 5.18 [5.33] (8.28) 0.75 [0.81]
[0.99] (1.10) 2.61 [2.56] (2.50) 0.38 [0.39]
(2.78) 1.63 [1.66] (2.54) 0.44 [0.47] (0.49)
Consumption
Investment
Hours
Real Wage
Inflation
rx y
rx L
rx 0.99 [0.99] (0.96) 0.99
[0.94] (0.95) 0.95 [0.93] (0.35) 0.01 [0.14]
0.01 [0.14] (0.83) −0.01
[0.15] (0.74) −0.17 [−0.11] (0.24) 1.00 [1.00]
[0.99] (0.97) 0.99 [0.99] (0.96) 0.72 [0.75]
(0.84) 0.82 [0.80] (0.77) 0.22 [0.22]
(0.83) 0.99 [0.99] (0.02) −0.23 [−0.19]
(1.00) −0.05 [0.06] (−0.45) 0.91 [0.86]
(0.96) 0.99 [0.99] (0.96) 0.80 [0.80]
(0.15)
(−0.32)
(−0.41)
(0.79)
Notes: The second moments in the first row for each variable were computed from the linear policy functions as explained in Section 4.2.1. The moments in in brackets in the second row were computed from a simulation with 50,000 observations and a burn-in period of 1,000 quarters. This simulation employs the pruned thirdorder solution of the model. The moments in parenthesis in the third row were computed from linearly detrended US data between the first quarter of 1966 and the fourth quarter of 2004 taken from the data appendix to Smets and Wouters (2007). s x :=standard deviation of variable x, s x /s y := standard deviation of variable x relative to standard deviation of output y, r x y :=cross-correlation of variable x with output, r x L :=cross-correlation of variable x with hours L, r x :=first-order autocorrelation of variable x.
19
An exception is the parameter θY , for which Smets and Wouters (2007) do not provide an estimate. As mentioned in Section 4.6.1, we took the value of this parameter from Schmitt-Grohé and Uribe (2005).
216
4 Perturbation Methods: Model Evaluation and Applications
The second column of the table presents the standard deviations. Except for inflation, all variables are considerably less volatile than in the data. The nonlinearities of the model appear in the simulation results. Without pruning, the time series diverge quickly in simulations. The pruned thirdorder solution delivers slightly larger standard deviations for all variables. The model matches the fact that consumption is nearly as volatile as output and that investment is approximately 2.6 times more volatile than output (see column three). It also predicts the strong correlation between output and consumption (see column four), but is not able to explain the weak correlation between investment and output and between output and the real wage as well as the strong correlation between hours and output observed in the data. The model also implies a positive correlation between hours and the real wage, contrary to the negative empirical correlation. The model well replicates the first-order autocorrelations found in the data. We ask the reader in Problem 4.1 to explore whether the additional shocks to the risk-free rate and the relative price of investment goods considered in Smets and Wouters (2007) bring the model’s second moments closer to the data.
Appendix 7
217
A.7 Derivation of the Demand Function The demand function (4.39) solves the problem (we omit the time index for convenience) Z1 max PY −
{Y ( j)}1j=0
P( j)Y ( j) d j.
0
The solution is a continuous function Y ∗ ( j) on the interval [0, 1]. We employ the calculus of variations (see, e.g., Kamien and Schwartz (1981), Section 3) to find the first-order condition for this problem. Let a ∈ R be an arbitrary constant and h( j) be a continuous function on [0, 1] that obeys h(0) = 0 and h(1) = 0, and consider the function Y ( j) := Y ∗ ( j) + ah( j). Note that Y (0) = Y ∗ (0) and Y (1) = Y ∗ (1). For given functions Y ∗ ( j) and h( j), the objective function of the problem depends only on the parameter a and is given by: Z L (a) := P
1 ∗
(Y ( j) + ah( j))
0
1+θY
1 1+θY
Z
dj
−
1 0
P( j)(Y ∗ ( j) + ah( j)) d j.
This function achieves its maximum at L (0). Therefore, it must satisfy the firstorder condition L 0 (0) = 0. Differentiating with respect to the scalar a gives: Z
0
L (a) =P
1
0 1
Z
∗
(Y ( j) + ah( j))
1 1+θY
θY Z
1
dj 0
1
(Y ∗ ( j) + ah( j)) 1+θY −1 h( j) d j
P( j)h( j) d j.
−
0
Note that from equation (4.38) Y
θY 1+θY
Z =
1 0
∗
(Y ( j) + ah( j))
1 1+θY
θ Y dj
so that at a = 0 the first-order condition is given by 0
L (0) =
Z 1h 0
i θY θY PY 1+θY Y ∗ ( j)− 1+θY − P( j) h( j) d j = 0.
For this integral to vanish the expression in square brackets must vanish on [0, 1], requiring θN
P( j) = P(Y ( j)/Y )− 1+θN .
218
4 Perturbation Methods: Model Evaluation and Applications
Solving for Y ( j) yields the demand function (4.39). Substituting the solution into the objective function yields Z L (0) =PY −
1 0
Z 1
=P
=P
0 1+θY θY
P( j)Y ( j) d j P( j) P
Z
1
Y
− θ1
Y
1+θY dj
− θ1 N
P( j) 0
Y
1 1+θY
Z
Z −
1
P( j) 0
1+θY θY
0
1
d j P
1
P( j)− θY P − θ1 Y
θ Y dj
Y d j,
− 1
Therefore, the definition of the price index given in (4.40) implies zero profits.
Appendix 8
219
A.8 Price Phillips Curve This appendix covers the derivation of equations (4.66b)-(4.66f). Consider the infinite sum from equation (4.66a): Ψ1t = E t Λ t mc t Yt ( j) + βϕY Λ t+1 mc t+1 Yt+1 ( j)
(A.8.1)
+ (βϕY )2 Λ t+2 mc t+2 Yt+2 ( j) + . . . , = E t Λ t mc t X t,0 p t
Y − 1+θ θ Y
Yt + βϕY Λ t+1 mc t+1 X t,1 p t
2
+ (βϕY ) Λ t+2 mc t+2 X t,2 p t
Y − 1+θ θ Y
Y − 1+θ θ Y
Yt+1
Yt+2 + . . . ,
where we employ Yt+s ( j) = X t,s p t
Y − 1+θ θ Y
.
Shifting the time index in (A.8.1) one period into the future delivers Ψ1t+1 = E t+1 Λ t+1 mc t+1 X t+1,0 p t+1
Y − 1+θ θ Y
Yt+1
+ βϕY Λ t+2 mc t+2 X t+1,1 p t+1 2
Y − 1+θ θ
+ (βϕY ) Λ t+3 mc t+3 X t+1,2 p t+1
Y
Yt+2
Y − 1+θ θ Y
Yt+3 + . . . .
This equation characterizes the auxiliary variable Ψ1t+1 of a price-setting firm in period t + 1. Multiplying both sides of this equation by
ι
π tY π1−ιY p t π t+1 p t+1
Y − 1+θ θ Y
,
using (see the definition of X t,s in equation (4.63)) X t+1,s =
π t+1 ι
π tY π1−ιY
X t,s+1 ,
and the law of iterated expectations (E t X t+1 = E t E t+1 X t+1 ) yields
(A.8.2)
220
4 Perturbation Methods: Model Evaluation and Applications βϕY E t
ι
π tY π1−ιY p t π t+1 p t+1
Y − 1+θ θ Y
= E t βϕY Λ t+1 mc t+1 X t,1 p t
Ψ1,t+1 Y − 1+θ θ
Yt+1
Y
2
+ (βϕY ) Λ t+2 mc t+2 X t,2 p t
Y − 1+θ θ Y
Yt+2 + . . . .
Comparing the rhs of this equation with the rhs of equation (A.8.1) establishes Ψ1,t =
− Λ t mc t p t
1+θY θY
Yt + βϕY E t
ι
π tY π1−ιY p t π t+1 p t+1
Y − 1+θ θ Y
Ψ1,t+1 .
η−1
η−1
Multiplying both sides of this equation by A t and defining ψi t := A t Ψi t , i = 1, 2 yields equation (4.66b). Proceeding as before, we write the second infinite sum from equation (4.66a) as: Y Y − 1+θ − 1+θ θY θY Ψ2t = E t Λ t X t,0 X t,0 p t Yt + βϕY Λ t+1 X t,1 X t,1 p t Yt+1 , (A.8.3) 2
(βϕY ) Λ t+2 X t,2 X t,2 p t
Y − 1+θ θ Y
Yt+2 + . . . .
Hence, the variable Ψ2t+1 is equal to the infinite sum − 1+θY Ψ2t+1 = E t+1 Λ t+1 X t+1,0 X t+1,0 p t+1 θY Yt+1 + βϕY Λ t+2 X t+1,1 X t+1,1 p t+1 2
Y − 1+θ θ
+ (βϕY ) Λ t+3 X t+1,2 X t+1,2 p t+1
Y
)Yt+2
Y − 1+θ θ Y
Yt+3 + . . .
Using again (A.8.2) and the law of iterated expectations implies βϕY E t
ι
π tY π1−ιY π t+1
− θ1 Y
pt p t+1
= E t βϕY Λ t+1 X t,1 X t,1 p t 2
Y − 1+θ θ Y
Y − 1+θ θ Y
+ (βϕY ) Λ t+2 X t,2 X t,2 p t
Ψ2t+1
Yt+1
Y − 1+θ θ Y
Yt+2 + . . . .
Comparing the rhs of this equation with the rhs of equation (A.8.3) establishes the recursive formula:
Appendix 8 Ψ2t =
221
− Λt pt
1+θY θY
Yt + βϕY E t
ι
π tY π1−ιY
− θ1 Y
π t+1
pt p t+1
Y − 1+θ θ Y
Ψ2t+1 .
Equation (4.66c) in the main text is the scaled version of this equation, which η−1 follows from multiplying both sides by A t . The definition of the price index in equation (4.40) implies − θ1
Pt
Y
1
1
= (1 − ϕY )Pt (A)− θY + ϕY Pt (N )− θY ,
where A (N ) indicates the set of producers who were able (not able) to set their ιY price optimally. Since Pt (N ) = π t−1 π1−ιY Pt−1 20 , this yields equation (4.66d). Since optimizing and rule-of-thumb producers charge different prices, aggregate production as defined by equation (4.50) is no longer equal to final output as given by equation (4.38). The relation between the two concepts involves a measure of price dispersion. Since Z Y˜t =
1 0
Yt ( j) d j
(4.39)
Z 1
=
0
Pt ( j) Pt
Y − 1+θ θ Y
Yt d j,
we define a second price index by − ˜t P
1+θY θY
Z
1
:= 0
Pt ( j)−
1+θY θY
dj
so that Y˜t = sY t Yt ,
sY t =
˜t P Pt
Yt − 1+θ θ Yt
.
This leads to equation (4.66e) in the main text. Since −
˜t P
1+θY θY
we obtain
= (1 − ϕY )Pt ( j)− −
sY t = (1 − ϕY )p t
1+θY θY
1+θY θY
Y ˜t−1 )− + (1 − ϕY )(π t−1 π1−ιY P
ι
Y + (1 − ϕY )(π t−1 π1−ιY /π t )−
ι
1+θY θY
1+θY θY
,
˜t−1 /Pt−1 P | {z
Y − 1+θ θ Y
}
,
=sY t−1
which is equation (4.66f) in the main text. Note the five equations that characterize the price staggering part of the model are required for perturbation solutions of second- and third-order. For a first-order solution the five equations can be reduced to a single equation in the percentage deviations vˆt := (vt − v)/vt of the relevant variables. At the stationary solution the linearized system (4.66a)-(4.66d) is given by: 20
This argument derives from a recursive definition of Pt (N ), see Maußner (2010), p. 6.
222
4 Perturbation Methods: Model Evaluation and Applications ˆ1t − ψ ˆ2t , ˆp t =ψ
ˆ1t ψ
ˆ2t ψ
(A.8.4a) 1 + θY ˆt ˆp t + ˆy t + mc ˆ t +λ = 1 − βϕY a1−η − (A.8.4b) θY 1 + θY ˆ t + ˆp t ) − βφY a1−η (ιY π θY 1 + θY ˆ1t+1 , + βϕY a1−η E t (ˆp t+1 + π t+1 ) + βϕY a1−η E t ψ θY 1 + θY ˆt ˆp t + ˆy t + λ = 1 − βϕY a1−η − (A.8.4c) θY 1 + θY 1 ˆ t − Et π ˆ t+1 ) − βϕY a1−η (ˆp t − E t ˆp t+1 ) − βϕY a1−η (ιY π θY θY ˆ2t+1 , + βϕY a1−η E t ψ
ˆp t = Therefore,
ϕY ˆ t − ιY π ˆ t−1 ) . (π 1 − ϕY
(A.8.4d)
ˆp t =(1 − βϕY a1−η )mc ˆ t − ιY βϕY a1−η π ˆ t + βϕY a1−η E t π ˆ t+1 + βϕY a1−η E t (ψ1t+1 − ψ2t+1 ) . | {z } E t p t+1
Replacing ˆp t on the left-hand side by equation (A.8.4d) and E t ˆp t+1 on the righthand side by the same equation shifted one period into the future yields (after collecting terms and simplifying the ensuing expression) ˆt = π
(1 − ϕY )(1 − βϕY a1−η ) ιY β a1−η ˆ ˆ ˆ t+1 . mc + π + Et π t t−1 ϕY (1 + ιY β a1−η ) 1 + ιY β a1−η 1 + ιY β a1−η
This is a version of the New Keynesian Phillips curve that explains current inflation ˆ t and past π ˆ t−1 and expected E t π ˆ t+1 inflation. as a function of marginal costs mc
Appendix 9
223
A.9 Wage Phillips Curve The first-order condition (4.71) can be written as the quotient of two infinite sums: w ot = (1 + θ L )
Ξ1t , Ξ2t
given by
Ξ1t := E t Λ t w t
L X t,0 w ot
L − 1+θ θ
L
˜L t + βϕ L Λ t+1 w t+1
˜t w
L X t,1 w ot
L − 1+θ θ L
˜ t+1 w
˜L t+1 (A.9.1)
2
+ (βϕ L ) Λ t+2 w t+2
1
L − θL Ξ2t := E t Λ t (X t,0 )
w ot ˜t w
L X t,2 w ot
L − 1+θ θ L
˜ t+2 w
1+θ − θ L L
˜L t+2 + . . . ,
˜L t
(A.9.2)
1+θ L w ot − θ L ˜L t+1 ˜ t+1 w 1+θ L 1 w ot − θ L L − θL ˜L t+2 + . . . , + (βϕ L )2 Λ t+2 (X t,2 ) ˜ t+2 w 1
L − θL + βϕ L Λ t+1 (X t,1 )
Therefore, the auxiliary variable Ξ1,t+1 , which is part of the first-order condition of a wage setter in period t + 1, is the infinite sum
Ξ1t+1 = E t+1 Λ t+1 w t+1
L X t+1,0 w ot+1
L
˜L t+1 ˜ t+1 w L L − 1+θ θL X t+1,1 w ot+1
+ βϕ L Λ t+2 w t+2
˜ t+2 w
2
L − 1+θ θ
+ (βϕ L ) Λ t+3 w t+3
L X t+1,2 w ot+1
˜ t+3 w
˜L t+2
L − 1+θ θ L
˜L t+3 + . . . .
L The definition of X t,s in equation (4.70) implies L X t+1,s =
π t+1 ι L 1−ι πt π L
L X t,s+1
so that the relation between the terms from period t + 1 onwards in the sum (A.9.1) and the variable Ξ1t+1 reads:
224
4 Perturbation Methods: Model Evaluation and Applications
$$\begin{aligned}
\beta\varphi_L E_t\left(\frac{\pi_t^{\iota_L}\pi^{1-\iota_L}w_t^o}{\pi_{t+1}w_{t+1}^o}\right)^{-\frac{1+\theta_L}{\theta_L}}\Xi_{1,t+1}
= E_t\Big[&\beta\varphi_L\Lambda_{t+1}w_{t+1}\left(\frac{X_{t,1}^L w_t^o}{\tilde{w}_{t+1}}\right)^{-\frac{1+\theta_L}{\theta_L}}\tilde{L}_{t+1}\\
&+ (\beta\varphi_L)^2\Lambda_{t+2}w_{t+2}\left(\frac{X_{t,2}^L w_t^o}{\tilde{w}_{t+2}}\right)^{-\frac{1+\theta_L}{\theta_L}}\tilde{L}_{t+2} + \ldots\Big],
\end{aligned}$$
where we have used the law of iterated expectations. Therefore, $\Xi_{1t}$ has the recursive definition
$$\Xi_{1t} = \Lambda_t w_t\left(\frac{w_t^o}{\tilde{w}_t}\right)^{-\frac{1+\theta_L}{\theta_L}}\tilde{L}_t + \beta\varphi_L E_t\left(\frac{\pi_t^{\iota_L}\pi^{1-\iota_L}w_t^o}{\pi_{t+1}w_{t+1}^o}\right)^{-\frac{1+\theta_L}{\theta_L}}\Xi_{1,t+1}.$$
The definition of the scaled variable $\xi_{1t} := \Xi_{1t}A_t^{\eta}$ therefore implies
$$\xi_{1t} = \lambda_t w_t\left(\frac{w_t^o}{\tilde{w}_t}\right)^{-\frac{1+\theta_L}{\theta_L}}\tilde{L}_t + \beta a^{-\eta}\varphi_L E_t\left(\frac{\pi_t^{\iota_L}\pi^{1-\iota_L}w_t^o}{\pi_{t+1}w_{t+1}^o}\right)^{-\frac{1+\theta_L}{\theta_L}}\xi_{1,t+1},$$
which is equation (4.72b) in the main text. Similar steps show that the sum (A.9.2) implies the recursive definition (4.72c).
Problems
Problem 4.1: Investment-Specific Shocks and Risk Premium Shocks in the NK Model
Our NK model presented in Section 4.6 features only three shocks: a technology shock, a government spending shock, and a monetary policy shock. Smets and Wouters (2007) consider additional shocks. In this problem, we ask the reader to explore whether two more shocks bring the model closer to the data. The first shock drives a wedge between the price of investment goods and the price of consumption goods. Due to the presence of the investment-specific shock $x_t$, the equation for capital accumulation (4.75) needs to be reformulated as:
$$K_{t+1} = (1-\delta)K_t + \left[1 - \frac{\kappa}{2}\left(\frac{I_t}{I_{t-1}} - a\right)^2\right]x_t I_t.$$
The variable $x_t$ follows the process
$$\ln x_{t+1} = \rho_x\ln x_t + \epsilon^x_{t+1}, \qquad \epsilon^x_{t+1} \sim \text{iid } N(0,\sigma_x^2).$$
The second shock $s_t$ drives a wedge between the return on bonds and the risk-free rate set by the central bank. As a consequence, (4.60d) needs to be modified as follows:
$$\Lambda_t = \beta s_t E_t\Lambda_{t+1}\frac{R_{t+1}}{\pi_{t+1}}.$$
The variable $s_t$ follows the process
$$\ln s_{t+1} = \rho_s\ln s_t + \epsilon^s_{t+1}, \qquad \epsilon^s_{t+1} \sim \text{iid } N(0,\sigma_s^2).$$
Add both shocks to the system of equations (4.78). Note that the investment-specific shock also affects the formulation of the first-order conditions (4.77d) and (4.77e). Solve and simulate the model. Employ the parameter values from Table 4.5 and use $\rho_x = 0.71$, $\sigma_x = 0.0045$, $\rho_s = 0.18$, $\sigma_s = 0.0024$ (see Smets and Wouters (2007), Table 1B). Compare the second moments from this simulation to those presented in Table 4.6.
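As a warm-up, the two exogenous processes can be simulated independently of the model solution. The following minimal Python sketch (the function and variable names are ours, not part of the CoRRAM toolbox) draws time series for the investment-specific and the risk premium shock with the parameter values quoted above:

```python
import numpy as np

def simulate_ar1(rho, sigma, periods, seed):
    """Simulate ln z_{t+1} = rho * ln z_t + eps_{t+1}, eps ~ N(0, sigma^2), ln z_0 = 0."""
    rng = np.random.default_rng(seed)
    log_z = np.zeros(periods)
    eps = rng.normal(0.0, sigma, periods)
    for t in range(1, periods):
        log_z[t] = rho * log_z[t - 1] + eps[t]
    return np.exp(log_z)

# parameters from Smets and Wouters (2007), Table 1B, as quoted in the problem
x = simulate_ar1(rho=0.71, sigma=0.0045, periods=500, seed=1)   # investment-specific shock
s = simulate_ar1(rho=0.18, sigma=0.0024, periods=500, seed=2)   # risk premium shock

print(x.std(), s.std())   # rough check of the unconditional volatility of the levels
```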
Problem 4.2: Government Spending and the Hours-Real-Wage Correlation Christiano and Eichenbaum (1992) argue that government spending shocks explain the small correlation between real wages and working hours observed in the data. However, in their model, a shock to government spending decreases private consumption, which is at odds with the evidence presented in Galí et al.
(2007). We ask the reader to consider the hours-real-wage correlation and the response of private consumption in a slightly modified version of the benchmark business cycle model.
Firm. The firm produces output $Y_t$ from labor $L_t$ and capital services $K_t$ according to
$$Y_t = Z_t K_t^{\alpha}(A_t L_t)^{1-\alpha}, \qquad \alpha\in(0,1).$$
The natural logarithm of total factor productivity, $z_t := \ln Z_t$, is governed by
$$z_{t+1} = \rho_z z_t + \epsilon_{z,t+1}, \qquad \rho_z\in(-1,1),\ \epsilon_{z,t+1}\sim\text{iid } N(0,\sigma_z^2).$$
The level of labor augmenting technical progress $A_t$ grows deterministically:
$$A_{t+1} = aA_t, \qquad a\ge 1.$$
The firm maximizes profits given the real wage $W_t$ and the rental rate of capital $r_t$.
Household. The household maximizes
$$U_t := E_t\sum_{s=0}^{\infty}\beta^s\frac{C_{t+s}^{1-\eta}\left(1-L_{t+s}\right)^{\theta(1-\eta)}-1}{1-\eta}$$
subject to the constraints
$$0 \le W_t L_t + r_t K_t - T_t - C_{pt} - I_t,$$
$$K_{t+1} = (1-\delta)K_t + I_t.$$
$T_t$ are taxes paid to the government and $I_t$ is investment in the capital stock. Consumption is a CES index of private consumption expenditures $C_{pt}$ and government spending $G_t$:
$$C_t = \left[C_{pt}^{\frac{\omega-1}{\omega}} + \kappa G_t^{\frac{\omega-1}{\omega}}\right]^{\frac{\omega}{\omega-1}}.$$
Government. The government runs a balanced budget, $G_t = T_t$. Scaled government spending $g_t \equiv G_t/A_t$ follows the process
$$\ln(g_{t+1}/g) = \rho_g\ln(g_t/g) + \epsilon_{g,t+1}, \qquad \rho_g\in(-1,1),\ \epsilon_{g,t+1}\sim\text{iid } N(0,\sigma_g^2).$$
The steady-state level of government spending $g$ is the fraction $\gamma$ of scaled output $y$, i.e., $g = \gamma y$.
Calibration. $\beta=0.996$, $\eta=2$, $L=0.126$, $\kappa=0.1$, $\omega\in\{0.2, 0.5, 2.0\}$, $a=1.003$, $\alpha=0.36$, $\rho_z=0.84$, $\sigma_z=0.0072$, $\delta=0.014$, $\gamma=0.216$, $\rho_g=0.53$, $\sigma_g=0.0093$.
1) Derive the first-order conditions for the firm and the household.
2) Derive the system of equations that determines the equilibrium of the economy.
3) Convert this system into a system of variables that are stationary on a balanced growth path. 4) Derive the stationary solution of the deterministic counterpart of the model. 5) Compute the impulse responses of output, private consumption, investment, hours, and the real wage to a government spending shock for all three values of ω given above. What do you observe with respect to the response of private consumption? Can you explain this result? 6) Simulate the model for all three values of the parameter ω. What happens with the hours-real-wage correlation?
Problem 4.3: Endogenous Growth and the US Business Cycle
Ozlu (1996) extends the Lucas (1988) endogenous growth model to include variable capital in the production of human capital. His model can either be regarded as a one-sector model with home production of human capital or as a two-sector model. We ask the reader to compare both versions of the model to US business cycle statistics computed from HP-filtered quarterly data that span the period 1966.Q1 through 2004.Q4 and that were taken from the data appendix to Smets and Wouters (2007).
Firms. In the two-sector interpretation of the model, the firm in sector $i = 1, 2$ employs labor $L_{it}$ and capital services $K_{it}$ to produce output $Y_{it}$ according to
$$Y_{it} = Z_{it}K_{it}^{\alpha_i}(A_t L_{it})^{1-\alpha_i}.$$
The log of total factor productivity in each sector, $z_{it} := \ln Z_{it}$, is governed by
$$z_{i,t+1} = (1-\rho_i)z_i + \rho_i z_{it} + \epsilon_{i,t+1}, \qquad \epsilon_{i,t+1}\sim\text{iid } N(0,\sigma_i^2).$$
Firms in both sectors take the real wage $w_t$, the rental rate of capital $r_t$, and labor efficiency $A_t$ as given. The firm in sector $i = 1$ maximizes
$$D_{1t} := Y_{1t} - w_t A_t L_{1t} - r_t K_{1t}$$
subject to its production function. The firm in sector $i = 2$ maximizes
$$D_{2t} = p_t Y_{2t} - w_t A_t L_{2t} - r_t K_{2t}$$
subject to its production function, where $p_t$ is the price of its output in terms of consumption goods.
Household. The household maximizes
$$U_t := E_t\sum_{s=0}^{\infty}\beta^s\frac{C_{t+s}^{1-\eta}\left(1-L_{t+s}\right)^{\theta(1-\eta)}-1}{1-\eta}$$
subject to the constraints
$$0 \le w_t A_t L_t + r_t K_t + D_{1t} + D_{2t} - C_t - I_{1t} - p_t I_{2t},$$
$$A_{t+1} = (1-\delta_A)A_t + I_{2t},$$
$$K_{t+1} = (1-\delta_K)K_t + I_{1t}.$$
Calibration. $\beta=0.99264$, $\eta=1$, $L_1=0.2$, $\alpha_1=0.35$, $\alpha_2=0.05$, $\delta_K=0.025$, $a=1.004$, $\delta_A=0.025$, $\rho_1=0.9857$, $\sigma_1=0.01369$, $\rho_2=0.9857$, $\sigma_2=0.007$. The remaining parameters follow from the steady state solution of the model.
1) Derive the first-order conditions for the firms in both sectors.
2) Derive the first-order conditions for the household. (Hint: The household decides about his total labor supply $L_t$ and total physical capital $K_{t+1}$. As usual, $K_t$ is given at the beginning of period $t$. $K_{1t}$, $K_{2t}$ as well as $L_{1t}$ and $L_{2t}$ are determined by the equilibrium conditions for the factor markets.)
3) Derive the system of equations that determines the dynamics of the economy.
4) Scale the growing variables appropriately by the level of $A_t$. Use lower case letters to refer to scaled variables.
5) Assume $Z_1 = 1$ in the deterministic steady state equilibrium and derive the stationary solution of the deterministic counterpart of the model. (Hint: You must determine $Z_2$ from the model's equations!)
6) For simulations of the two-sector model, define the stationary level of gross domestic product as $gdp_t = y_{1t} + p\,y_{2t}$ and the stationary level of aggregate investment as $i_t = i_{1t} + p\,i_{2t}$, where $p$ is the relative price of sector 2 output in the deterministic stationary equilibrium. Using $p$ instead of $p_t$ conforms with the practice in the NIPA accounts to present real aggregates in terms of constant prices.
7) For simulations of the one-sector interpretation of the model set $gdp_t = y_{1t}$, $i_t = i_{1t}$, and $hours_t = L_{1t}$.
8) The table below presents second moments from HP-filtered quarterly data from the US economy.

Variables      s_x    r_xy   r_xL   r_x
GDP            1.54   1.00   0.90   0.86
Consumption    1.18   0.89   0.81   0.86
Investment     5.06   0.90   0.88   0.91
Hours          1.27   0.90   1.00   0.88
Real Wage      0.86   0.16  -0.00   0.82

Notes: $s_x$ standard deviation of variable x, $r_{xy}$ contemporaneous correlation between variable x and output y, $r_{xL}$ contemporaneous correlation between variable x and hours L, $r_x$ first-order autocorrelation of variable x.
Compute the same statistics from simulations of the model. As a measure of fit, compute the sum of squared differences between the model-generated second moments and those presented in the table. According to this measure, which interpretation of the model provides a better account of the data? (Hint: As explained in Section 4.2.4, since this is a model with an endogenous growth factor, the computation of second moments differs from the computation in models with deterministic growth. The CoRRAM toolbox can handle this case. You must set the flag Flag.DS=1, provide the scaling of the variables in XiVec, and place $a_t$ at the top of the vector $y_t$. For more details, please consult the CoRRAM manual.) A small computational sketch of these statistics follows the problem statement.
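The following Python fragment illustrates one way to compute the statistics in the table and the suggested measure of fit from simulated, HP-filtered series. The series names (`gdp`, `cons`, and so on) are placeholders for the reader's own simulation output; they are not variables provided by the CoRRAM toolbox.

```python
import numpy as np

def second_moments(x, y, hours):
    """Standard deviation (in percent of a log-deviation series), correlations with
    output and hours, and first-order autocorrelation of an HP-filtered series x."""
    s_x = 100.0 * np.std(x)
    r_xy = np.corrcoef(x, y)[0, 1]
    r_xL = np.corrcoef(x, hours)[0, 1]
    r_x = np.corrcoef(x[1:], x[:-1])[0, 1]
    return np.array([s_x, r_xy, r_xL, r_x])

# hypothetical simulated and HP-filtered series: gdp, cons, inv, hours, wage
# model_moments = np.vstack([second_moments(v, gdp, hours)
#                            for v in (gdp, cons, inv, hours, wage)])
# data_moments = np.array([[1.54, 1.00, 0.90, 0.86],
#                          [1.18, 0.89, 0.81, 0.86],
#                          [5.06, 0.90, 0.88, 0.91],
#                          [1.27, 0.90, 1.00, 0.88],
#                          [0.86, 0.16, -0.00, 0.82]])
# fit = np.sum((model_moments - data_moments) ** 2)   # sum of squared differences
```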
Problem 4.4: Public Infrastructure
Ramey (2021) computes the effects of shocks to public infrastructure investment. Rather than reproducing her NK model, we ask the reader to extend the NK model of Section 4.6 to include infrastructure capital.
Government. Assume that scaled government expenditures $g_t := G_t/A_t$ consist of investment in infrastructure $g_{It}$ and government consumption $g_{Ct}$:
$$g_t = g_{It} + g_{Ct}.$$
The latter is exogenously fixed at $g_C = \gamma_C y$, and the former follows the process
$$\ln g_{I,t+1} = (1-\rho_I)\ln g_I + \rho_I\ln g_{It} + \epsilon_{I,t+1}, \qquad \epsilon_{I,t+1}\sim\text{iid } N(0,\sigma_I^2),$$
where $g_I = \gamma_I y$. The scaled stock of public infrastructure capital $k_{It}$ accumulates according to
$$a k_{I,t+1} = (1-\delta_I)k_{It} + g_{It}. \qquad (P.4.4.1)$$
Intermediate Production. The services of public capital are an additional factor of production. Accordingly, scaled aggregate production is given by
$$\tilde{y}_t = Z_t k_t^{\alpha}\tilde{L}_t^{1-\alpha}k_{It}^{\xi} - F.$$
Calibration. The additional parameters are calibrated as follows: $\gamma_C = 0.145$, $\gamma_I = 0.035$, $\xi = 0.05$, $\delta_I = 0.01$, $\rho_I = 0.95$, and $\sigma_I = 0.01$.
1) Modify the dynamic system (4.78). In particular, extend equations (4.78c)-(4.78e) to include $k_{It}$, add equation (P.4.4.1) to this system, and recalibrate the parameter $F$ such that there are zero profits in the steady state.
2) Compute the short-run multiplier of an infrastructure investment shock as the cumulative increase of output for the four quarters following the shock, i.e.,
$$\frac{\Delta y}{\Delta g_I} := \frac{\sum_{s=1}^{4}(y_s - y)}{\sum_{s=1}^{4}(g_{Is} - g_I)}.$$
3) Compute the long-run multiplier of an infrastructure investment shock as the cumulative increase of output from the sum
$$\frac{\Delta y}{\Delta g_I} := \frac{\sum_{s=1}^{1{,}000}(y_s - y)}{\sum_{s=1}^{1{,}000}(g_{Is} - g_I)}$$
(a computational sketch of both multipliers follows the problem statement).
4) Explain the difference between both multipliers. 5) Compare both the short- and the long-run multiplier to multipliers computed from a version of the model with flexible nominal prices and wages.
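For orientation, the following hedged Python fragment shows how both multipliers can be computed once impulse responses for output and infrastructure investment are available. The names `y_irf`, `gI_irf`, `y_ss`, and `gI_ss` are hypothetical arrays of levels along the impulse response and the corresponding steady-state values; they are ours and not part of the text or the CoRRAM toolbox.

```python
import numpy as np

def cumulative_multiplier(y_irf, gI_irf, y_ss, gI_ss, horizon):
    """Cumulative output increase per cumulative increase in infrastructure investment."""
    dy = np.sum(y_irf[1:horizon + 1] - y_ss)     # sum_{s=1}^{H} (y_s - y)
    dg = np.sum(gI_irf[1:horizon + 1] - gI_ss)   # sum_{s=1}^{H} (g_Is - g_I)
    return dy / dg

# short-run multiplier: first four quarters; long-run multiplier: 1,000 quarters
# m_short = cumulative_multiplier(y_irf, gI_irf, y_ss, gI_ss, horizon=4)
# m_long  = cumulative_multiplier(y_irf, gI_irf, y_ss, gI_ss, horizon=1000)
```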
Chapter 5
Weighted Residuals Methods
5.1 Introduction Perturbation methods approximate the solution of DGE models at one point in the state space of the model. This chapter introduces a class of methods that draw on information from many points in the state space. This class belongs to the wider class of global methods, as opposed to local perturbation methods. We refer to the methods introduced in this chapter interchangeably as weighted residuals methods and projection methods and explain both terms in a moment. Judd (1992) introduced projection methods to economics and presents them in more detail in Judd (1998), Chapter 11. McGrattan (1999) is an early application to both the deterministic and the stochastic growth model. Fernández-Villaverde et al. (2016) is a more recent presentation of perturbation and projection methods with application to the benchmark business cycle. We can easily explain the basic idea behind these methods to readers familiar with the least squares estimator. Consider the model y = X β + ε, where y is the N vector of observations of the dependent variable, X the N × K matrix of observations of K independent variables, β the vector of unobserved parameters, and ε is the vector of unobserved disturbances. The least squares estimator of β, denoted by b, solves the linear system X T e = 0K×1 , e = y − X b. Geometrically, the vector of residuals e is perpendicular to the hyperplane spanned by the columns of X or, equivalently, the estimated equation is the orthogonal projection of the vector y into the hyperplane X b. The least squares estimator thus is a weighted residuals or projection method in the Euclidean N -space. The methods considered in this chapter apply the same concept to spaces, whose elements are
not N -tuples of real numbers but functions with a domain in Euclidean N -space. Chapter 3 shows that perturbation methods approximate the policy functions for all variables of a model simultaneously in successive steps equal to the order of approximation. Weighted residual methods isolate as few variables as possible from the model, compute approximate policy functions, and solve the linear and nonlinear equations of the model for the remaining variables. The next section motivates this concept, introduces the building blocks of weighted residuals methods, and summarizes the presentation in terms of a general algorithm. Section 5.3 discusses various aspects of the implementation. It draws heavily on results provided in Chapter 13 and Section 14.3, so readers unfamiliar with function approximation and numerical integration may want to read these parts first. The remaining sections of the chapter consider various applications. Section 5.4 considers the deterministic growth model and compares different numeric approximations of the policy function for consumption. Section 5.5 compares two different solutions of the benchmark business cycle model. The reader will learn that – at least for the typical calibration of this model – the different degrees of numeric precision between a second-order perturbation solution and a weighted residuals solution do not appear in business cycle statistics computed from simulated time series. Section 5.6 considers a model with labor market frictions where this conclusion does not hold. In fact, the perturbation solution predicts a much different time series behavior of the standard search and matching model than a weighted residuals solution. This result is not true in general but holds for particular calibrations of the model. We end this chapter with models driven by shocks that trigger infrequent but severe recessions. These kinds of models have been developed to explain the empirically observed large risk premia on stocks.
5.2 Analytical Framework

5.2.1 Motivation

The canonical dynamic stochastic general equilibrium (DSGE) model presented in Section 2.5 involves $n(x) + n(y)$ equations that determine the
solution for the model’s endogenous state variables x t+1 and the not predetermined variables y t as functions of the model’s endogenous states x t and shocks z t at time t. We refer to these solutions as policy functions and use the notation x t+1 = h x (x t , z t ) and y t = h y (x t , z t ). Careful readers will notice that we do not include the perturbation parameter σ in the list of arguments. The reason is that we are looking for global rather than local solutions. The former use information from many points of the state space to provide good approximations everywhere in this space whereas the latter focus on a particular point in the state space. Accordingly, the solution rests on neither the stationary deterministic solution of the model nor the assumption of small shocks as parameterized by σ. In contrast to the perturbation approach, which computes the parameters of all policy functions simultaneously, we look for the smallest number n(u) of unknown variables, which we need to compute the solutions of other variables from the remaining n(x) + n( y) − n(u) equations. We illustrate this point with the set of equations (1.64) that determines the dynamics of the benchmark business cycle model from Example 1.6.1. For the reader’s convenience, we reproduce this system: −η
$$\begin{aligned}
0 &= c_t^{-\eta}(1-L_t)^{\theta(1-\eta)} - \lambda_t, &(5.1a)\\
0 &= \theta c_t - w_t(1-L_t), &(5.1b)\\
0 &= y_t - Z_t k_t^{\alpha}L_t^{1-\alpha}, &(5.1c)\\
0 &= w_t - (1-\alpha)\frac{y_t}{L_t}, &(5.1d)\\
0 &= r_t - \alpha\frac{y_t}{k_t}, &(5.1e)\\
0 &= y_t - c_t - i_t, &(5.1f)\\
0 &= ak_{t+1} - (1-\delta)k_t - i_t, &(5.1g)\\
0 &= \lambda_t - \beta a^{-\eta}E_t\lambda_{t+1}\left(1-\delta+r_{t+1}\right). &(5.1h)
\end{aligned}$$
The endogenous state of this model is the (scaled) capital stock k t , and the exogenous state is the log of total factor productivity (TFP) z t := ln Z t . Suppose we know the policy function for working hours L t = h L (k t , z t ). This knowledge would enable us to compute the solution for all other variables in a few simple steps. Given L t , k t , and z t , the production function (5.1c) yields the solution for output y t . The first-order conditions of the firm, equations (5.1d) and (5.1e), deliver the solution for the real wage w t and the rental rate of capital r t , respectively. Next, we use the real wage and hours in equation (5.1b) to find consumption c t . Consumption and output
determine investment i t from the economy’s resource constraint (5.1f). The law of capital accumulation (5.1g) delivers the next period capital stock k t+1 . Finally, the Lagrange multiplier of the household’s budget constraint λ t follows from equation (5.1a). Accordingly, with n(u) = 1 policy functions known, we can recover the solutions for all remaining variables. How do we determine this function? Note that we have not used the Euler equation (5.1h). The policy function h L (k t , z t ) must also satisfy this equation for all admissible pairs (k t , z t ). To understand this requirement, let k t+1 = hk (h L , k t , z t ) denote the solution for the next period capital stock computed from h L (k t , z t ) and the system of equations (5.1). Analogously, let λ t = hλ (h L , k t , z t ) and r t = h r (h L , k t , z t ) denote the solutions for the Lagrange multiplier and the rental rate of capital. Note also that the next period values of both variables follow from h L (k t+1 , z t+1 ) and the system (5.1): λ t+1 = hλ (h L , k t+1 , z t+1 ) = hλ (h L , hk (h L , k t , z t ), ρ Z z t + ε t+1 ), r t+1 = h r (h L , k t+1 , z t+1 ) = h r (h L , hk (h L , k t , z t ), ρ Z z t + ε t+1 ). We may abbreviate the expression λ t+1 (1 − δ + r t+1 ) as λ t+1 (1 − δ + r t+1 ) =: g(h L , k t , z t , ε t+1 ).
Since $\epsilon_{t+1}$ is normally distributed, the conditional expectation on the right-hand side (rhs) of the Euler equation (5.1h) is equal to
$$\text{rhs}(h^L, k_t, z_t) := E_t\lambda_{t+1}(1-\delta+r_{t+1}) = \int_{-\infty}^{\infty} g(h^L, k_t, z_t, \epsilon_{t+1})\,\frac{e^{-(\epsilon_{t+1}/\sigma_\epsilon)^2/2}}{\sqrt{2\pi}\,\sigma_\epsilon}\,d\epsilon_{t+1}. \qquad (5.2a)$$
Therefore, the entire Euler equation may be written as
$$0 = h^{\lambda}(h^L, k_t, z_t) - \beta a^{-\eta}\,\text{rhs}(h^L, k_t, z_t). \qquad (5.2b)$$
The exact solution h L : W → [0, 1], W ⊂ R2 satisfies this equation for all admissible values of the state vector w t := (k t , z t ) ∈ W . Hence, the Euler equation (5.1h) can be reduced to a functional equation in the yet unknown policy function for hours. An alternative way to solve the model, known as parameterized expectations approach (PEA), is to approximate the conditional expectation on the rhs of the Euler equation (5.1h). Recall the definition of conditional expectations: let y denote a random variable that we wish to forecast
using observations on a vector of observed variables $x := [x_1, x_2, \ldots, x_n]'$. We seek a function $h^e$ that minimizes the expected mean quadratic error $E\left(y - h^e(x_1, x_2, \ldots, x_n)\right)^2$. The solution to this problem is the conditional expectation¹
$$E[y|x] := \underset{h^e}{\operatorname{argmin}}\; E\left(y - h^e(x_1, x_2, \ldots, x_n)\right)^2.$$
In the benchmark business cycle model, the observed variables are k t and z t so that E t λ t+1 (1 − δ + r t+1 ) = he (k t , z t ).
Given the function $h^e$, we can proceed in a similar way to solve the model. Instead of the Euler equation (5.2a) we obtain
$$0 = \beta a^{-\eta}h^e(k_t, z_t) - \int_{-\infty}^{\infty} g(h^e, k_t, z_t, \epsilon_{t+1})\,\frac{e^{-(\epsilon_{t+1}/\sigma_\epsilon)^2/2}}{\sqrt{2\pi}\,\sigma_\epsilon}\,d\epsilon_{t+1}, \qquad (5.2c)$$
and the exact solution $h^e : W \to (0,\infty)$ satisfies this equation for all admissible values of the state vector. An early application of this idea to solve models with rational expectations is the partial equilibrium model developed by Wright and Williams (1984). Applications to DSGE models include, among others, Den Haan and Marcet (1990), Marcet and Lorenzoni (1999), Christiano and Fisher (2000), and Duffy and McNelis (2001). In the model of (5.1), parameterizing the policy function for marginal utility $\lambda_t$ is equivalent to approximating the conditional expectation. However, in this case, there is no analytic solution for working hours $L_t$, as can be seen from combining equations (5.1a) and (5.1b). Solving this equation numerically introduces an additional source of numerical errors. Equations (5.2b) or (5.2c) define maps $R$ with an image in the set of real numbers $\mathbb{R}$ and domain $F \times W$, where $F$ is the set of admissible functions and $W \subset \mathbb{R}^2$ is the subset of admissible states $w_t := (k_t, z_t) \in W$:
$$R : F \times W \to \mathbb{R}.$$
We call R the residual function because we expect that its result R( f , w t ) will usually differ from zero for an arbitrary but admissible function f ∈ F . 1
See, e.g., Sargent (1987), p. 224.
236
5 Weighted Residuals Methods
Having introduced the residual function, a pragmatic way to proceed would be to simplify the problem further (see Fernández-Villaverde et al. (2016)). We could approximate the unknown function h L (or he ) by a linear combination of the members of a family of polynomials ϕk (wt ), ˆh L (w t ) =
K X
γk ϕk (w t ),
k=0
select some measure of distance, and choose the K + 1 parameters γk such that the residual function R is close to zero according to this measure for all wt ∈ W . To shed light on the nature of the problem we will proceed in a more abstract fashion in the next section.
5.2.2 Residual, Test, and Weight Function We begin this section with several definitions of objects that constitute weighted residuals methods tailored to solve DSGE models and characterize the conditions defining an approximate solution. We also define the notions ‘weighted residuals’ and ‘projection methods’. Let X denote a compact subset of the Euclidian n-space Rn , B(X ) the Borel σ-algebra on this set, λn the Lebesgue measure, and w : X → [0, ∞] a density function, such that µ is another measure on B(X ) that assigns to every element B ∈ B(X ) the number Z Z µ(B) := B
w(x) d λn (x) =
w(x) d x, B
i.e., the Lebesgue integral of w over the set B.2 The object (X , B(X ), µ) is a measure space (see, e.g., Halmos (1974), p. 73.) The set of real valued functions f : X → R on this space builds a vector space, i.e., a set F (X , R), where for f , g ∈ F (X , R), the sum f + g and for a scalar a ∈ R, the product a f belong to the set:3 f , g ∈ F (X , R) ⇒ f + g ∈ F (X , R),
2
f ∈ F (X , R), a ∈ R ⇒ a f ∈ F (X , R).
We use d x for the Lebesgue integral over B. If the Riemann integral of w exists, it is equal to the Lebesgue integral. 3 For the entire set of axioms that constitute a vector space, see, e.g., Luenberger (1969), pp. 11f.
5.2 Analytical Framework
237
Let R denote a map from the Cartesian product F (X , R) × X to the real line: R : F (X , R) × X → R.
For each member f of the set F (X , R), the map R defines a new function that maps x ∈ X on the real line, i.e., R( f , .) : X → R, x 7→ R( f , x). Now, consider a second vector space, the space of square integrable functions on X with respect to the measure µ: Z 2 2 L w := f : X → R f (x)w(x) d x < ∞ . X
Together with the inner product Z 〈 f , g〉w :=
X
f (x)g(x)w(x) d x
(5.3)
and the induced norm vZ u t k f kw := f 2 (x)w(x) d x X
this space is a Hilbert space.4 If the mapping f 7→ R( f , .) maps functions f 2 in F (X , R) to square integrable functions R( f , .) in L w , the policy function h ∈ F (X , R) satisfies the condition R(h, x) = 0 for almost all x ∈ X wrt µ.
Hence, the properties of a norm5 imply for all h ∈ F (X , R) that Z R(h, x) = 0 for almost all x ∈ X wrt µ ⇔
X
R2 (h, x)w(x) d x = 0. (5.4)
Note that the restriction of R(h, .) to the space of square integrable functions is not severe. Assuming a compact state space X and continuity of R(h, .) : X → R for all h, the Weierstrass theorem (see, e.g, Luenberger (1969), p. 40) guarantees that R(h, .) achieves a minimum of m and a maximum of M on X so that 4
See, e.g., Lang (1993), Theorem 1.4, pp. 182f. Namely, k f kw = 0, if and only if f (x) = 0 for almost all x ∈ X with respect to (wrt) the measure µ. 5
238
5 Weighted Residuals Methods
Z
Z 2
X
2
2
R (h, x)w(x) d x ≤ max{m , M }
X
w(x) d x < ∞.
In the next step, we choose an orthonormal basis ψn , n ∈ N for the 2 space L w so that R(h, x) =
∞ X n=1
cn ψn (x).
(5.5a)
Orthonormality of the basis implies that the coefficients cn satisfy the condition cn = 〈R(h, .), ψn 〉w ∈ R.
(5.5b)
Therefore, condition (5.4) requires cn = 0 for all n ∈ N. Hence R(h, x) = 0 for almost all x ∈ X wrt µ Z ⇔
X
R(h, x)ψn (x)w(x) d x = 0 for all n ∈ N.
(5.6)
Note that the policy function h characterized in this way is a member of the function space F (X , R), which is a vector space of infinite dimension. We therefore simplify the problem and choose basis functions ϕk for this latter space and approximate the policy function h with a linear combination of K elements: ˆh(x) :=
K X k=1
γk ϕk (x), γk ∈ R.
(5.7)
Thus, we have reduced the problem to find an element in an infinite dimensional function space to the much simpler problem to choose K real numbers γk so that Z X K R γk ϕk (x), x ψn (x)w(x) d x = 0 for all n = 1, . . . , K. (5.8) X
k=1
The meaning of the notion ‘weighted residuals methods’ should now be obvious: The integrand in the previous equation is the residual function R(ˆh, .) at the point x ∈ X weighted by the product ψn (x)w(x). In this product, the basis functions ψn may be considered as a test function for the condition R(h, x) = 0 according to the equivalence (5.6), whereas the density w is the weight function in the inner product (5.3).
5.2 Analytical Framework
239
Regarding the meaning of the alternative notion ‘projection methods’, observe that ˆ (h, .) := R
K X
cn ψn
n=1
approximates the residual function R with the first K < ∞ elements of 2 the basis ψn for the space L w . The K conditions cn = 〈R(h, .), ψn 〉w = 0,
n = 1, . . . , K
ˆ into the linear space spanned by define the orthogonal projection of R the linear combinations of the K basis functions ψn (see, e.g., Luenberger (1969), pp. 50ff.). Accordingly, choosing the parameters γk as in (5.8) guarantees that the residual vector is as close as possible to the zero vector. Note that the condition (5.8) does not guarantee that the function ˆh is close to the exact policy function h. Since we do not know the latter, and hence its value h(x), x ∈ X , theorems, e.g., Theorem (13.7.1) on orthogonal interpolation, do not apply. Accordingly, we must assume or prove that h ∈ F (X , R) is the unique solution of condition (5.4). In this case, the limit of ˆh defined in equation (5.7) for K → ∞ would coincide with the exact policy function h. Note, furthermore, that in practice, we will usually lack an analytic expression for the residual function R(h, .) : X → R. For example, consider the Euler equation (5.2). It involves an integral, whose integrand derives from the solution of several nonlinear equations, so we are unable to solve the integral analytically. Hence, instead of the exact residual function ˆ (h, .) for R(h, .) in R(h, .), we must employ a numerical approximation R ˆ ˆ the condition (5.4). With R(h, .) in place of R(h, .), it might not be possible to choose the parameters γk to achieve the true minimum of zero for the integral on the rhs of condition (5.4). An application oriented approach would choose the parameters γk to minimize the squared residuals: K Z X 2 ˆ min R γk ϕk (x) w(x) d x. (5.9) γ1 ,...,γK
X
k=1
5.2.3 Common Test Functions In this subsection, we consider three common test functions. The minimizer of the problem (5.9) will satisfy the condition
240
5 Weighted Residuals Methods
Z 2
ˆ (ˆh, x) ∂R ˆ (ˆh, x) R w(x) d x = 0 for all k = 1, . . . K. ∂ γk X
From this point of view, the least squares solution is a special version of problem (5.8), with the test functions given by ψk (x) ≡
∂ R(ˆh, x) . ∂ γk
To the extent it is impossible to obtain the analytic expression of the residual function R(ˆh, .), it will be infeasible to derive explicit formulas for its derivatives. In these cases the numeric solution of the minimization problem (5.9) is the appropriate method. A second choice, known as the Galerkin method, employs the basis functions ϕk as test functions: ψk (x) := ϕk (x)
(5.10)
so that condition (5.8) becomes Z X K R γk ϕk (x), x ϕk (x)w(x) d x = 0 for all k = 1, . . . K. X
k=1
Note that both the least squares method and the Galerkin method involve integration. The collocation method uses a measure that makes integration particularly simple, since it reduces integration to the evaluation of the integrand at a given point. More formally, this method employs integration with respect to the Dirac measure, ¨ 1, if x ∈ A, δ x (A) = 0, otherwise. This measure assigns a unit mass to the set A if this set includes the point x ∈ Rn . On a set of xk ∈ X , k = 1, . . . , K given points, the collocation method determines the parameters γk by solving the system of K nonlinear equations: K X R γk ϕk (xk ) = 0 for k = 1, . . . , K. (5.11) k=1
We can regard this condition as the limit of the Lebesgue integrals (5.8) if we employ the Dirac delta function ¨ ∞, if x = xk , δk (x) := (5.12) 0, otherwise as the test function.
5.2 Analytical Framework
241
5.2.4 Spectral and Finite Element Functions We can now distinguish between spectral and finite element methods.6 The function ˆhγ defined in equation (5.7) is a spectral function if each element of the vector γ := [γ1 , γ2 , . . . , γK ] T affects the shape of the function over the entire domain of ˆhγ . Alternatively, we may divide X into I subspaces, I X = ∪i=1 X i and join the local approximations ˆhγi to the function ˆhγ . In this case, changes to some elements of γ change the shape of the approximation only on the subspace X i , and we refer to ˆhγ as a finite element function. Since spectral methods approximate a function over its entire domain, they usually require many basis functions, i.e., K is large. Instead, finite element methods construct the approximating function from a few basis functions on a large number of subspaces I. By suitably choosing the subspaces, finite element methods are more appropriate to deal with the local behavior of the policy function, as, for example, with kinks that arise from binding constraints.
5.2.5 Illustration We now illustrate these concepts with a non-economic example. Consider the problem to approximate the solution x(t) = e−t of the differential equation d x(t) + x(t) = 0 dt with initial condition x(0) = 1 over the domain X := [0, 2]. For the spectral function, we choose the three monomials t 0 , t 1 , t 2 as basis functions ϕk (t) so that equation (5.7) yields: xˆγ (t) = γ0 + γ1 t + γ2 t 2 . The initial condition x(0) = 1 requires γ0 = 1 so that the residual function is given by R(ˆ x γ , t) = γ1 + 2γ2 t + 1 + γ1 t + γ2 t 2 . | {z } | {z } d xˆγ / d t
6
xˆγ (t)
For this distinction see also Boyd (2000), Section 1.3.
(5.13)
242
5 Weighted Residuals Methods
For the weight function, we choose w(t) = 1 for all t ∈ [0, 2], and for the test function, we choose the Dirac delta function (5.12). To determine the two parameters, we use the two points t 1 = 1 and t 2 = 2. Inserting these into (5.13) yields the two equations −1 = 2γ1 + 3γ2 , −1 = 3γ1 + 8γ2 .
The left panel of Figure 5.1 shows the respective solution along with the exact one. 1.0 0.8 x(t)
1.0
e−t Collocation: Quadratic 4 Finite Elements
0.8
0.6
0.6
0.4
0.4
0.2
0.2
0.0
0
0.5
1 Time t
1.5
e−t Collocation: Cubic 2 Finite Elements
2
0.0
0
0.5
1 Time t
1.5
2
Figure 5.1 Approximations of e−t
Alternatively, let us divide the interval [0, 2] into subintervals [t i , t i+1 ], i = 0, 1, . . . I and approximate the solution over each of them with a linear function xˆai ,bi (t) = ai + bi (t − t i ),
i = 1, 2, . . . I
so that locally, the residual function is given by R(ˆ x ai ,bi , t) = bi + ai + bi (t − t i ).
(5.14)
The initial condition xˆa1 ,b1 (0) = 1 requires a1 = 1. To determine the remaining 2I − 1 parameters, we again employ w(t) = 1∀t ∈ [0, 2] and the Dirac measure δ(t mi ), and choose the midpoints t mi := (t i+1 + t i )/2 of the subintervals. Equations (5.5) and (5.14) thus imply: 0 = bi + ai + bi (t i+1 − t i )/2 ⇒ bi = −
ai 1 + (t i+1 − t i )/2
.
5.2 Analytical Framework
243
We join the I local solutions at nodes t i by requiring: xˆai ,bi (t i+1 ) = xˆai+1 ,bi+1 (t i+1 ) ⇒ ai+1 = ai + bi (t i+1 − t i ). The left panel of Figure 5.1 shows the solution constructed from I = 4 elements. It is much closer to the exact function than the spectral solution. However, the latter has only three parameters, whereas the former has eight. The right panel of the figure compares a spectral solution with a cubic function xˆγ (t) =
3 X
γi t k
k=0
with the same number of parameters as a finite element solution on the subintervals [0, 1] and [1, 2].7 Obviously, with the same number of parameters, the spectral solution is much closer to the exact one than the finite element solution.
5.2.6 General Procedure The lesson from the previous subsections is that the term ‘weighted residual methods’ encompasses several approaches to find an approximate solution to a functional equation that arises as part of the system of equations that determines the solution of a DGE model. We present the various decisions and steps necessary to compute this approximation in terms of an algorithm. For ease of exposition, we extend the meaning of the term policy function to include conditional expectations as well. Algorithm 5.2.1 (Weighted Residuals Method) Purpose: Approximate the policy function h of a DGE model. Steps: Step 1: Choose a compact (i.e., closed and bounded) state-space X ⊂ Rn . Decide between a spectral or a finite element method. In the latter case, I divide the space X into I subspaces X = ∪i=1 X i . The following steps pertain either to the set X or to each of the subsets X i individually.8 7
We leave it as an exercise to the reader to determine the four parameters of the spectral solution. 8 For simplicity, we use X to refer to either the entire state space or one of its subspaces.
244
5 Weighted Residuals Methods
Step 2: Choose a family of basis functions ϕk : X → R and a degree of approximation K. Let ˆhγ (x) :=
K X k=1
γk ϕk (x), x ∈ X .
Step 3: Construct either the residual function R : F (X , R) × X → R
ˆ (., .) from the respective equations of the model. or its approximation R Step 4: Choose the test functions ψk : X → R, k = 1, . . . , K, and the weight function w : X → R>0 , and either solve the system of equations Z 0= R(ˆhγ , x)ψk (x)w(x) d x, k = 1, . . . , K, i = 1, . . . I X
for the elements of the vector γ = (γ1 , γ2 , . . . , γK ) or minimize Z R(ˆhγ , x)2 d x X
with respect to the elements of the vector γ. Step 5: Verify the quality of the candidate solution γ. If necessary, return to step 2 and increase the degree of approximation K. Alternatively, return to step 1, and, if applicable, increase the number of finite elements I, choose a different family of basis functions and/or a different test and/or a different weight function.
5.3 Implementation In this section, we take a closer look at the building blocks of weighted residuals methods. Our goal is to provide a practical guide for making the various decisions to solve a particular problem.
5.3.1 State Space The state space of the infinite horizon Ramsey model consists of a single variable, the farmer’s capital stock K t . The analysis of the saddle path
5.3 Implementation
245
in Section 1.3.4 reveals that the policy function for next-period capital, K t+1 = hK (K t ), moves the current capital stock K t closer to the stationary ¯] solution K ∗ . Accordingly, we can choose any closed interval I := [K, K ∗ around K as the state space of interest because the exact policy function h will always map this interval onto itself: h : I → I. This property is particularly relevant if we employ a family of polynomials as base functions ϕk (K t ), whose domain is restricted to a closed interval of the real line, as is the case for the Chebyshev polynomials. Suppose, however, that this is not the case so that at some point, our candidate solution ˆhK (K t ) yields an image K t+1 outside of I, i.e., ˆhK (K t ) ∈ / I. Consequently, K t+1 is outside of the domain of ˆh and the value K t+2 = ˆhK (K t+1 ) is not reliable. The behavior of an approximate solution at the boundaries of the state space deserves particular attention in stochastic models, where GaussHermite integration (see Section 14.4.2) is used to compute expected values. The respective integrations nodes may well lie outside the state space considered relevant for simulations of the model. In these cases, it may be necessary to extrapolate beyond the domain of ˆh or to extend the state space. Finite element methods that employ simple monomials defined on the entire real line as base functions circumvent this problem. They have the additional advantage that one can adapt the number of subintervals to deal with kinks or rapid changes in the curvature of the policy function. Choosing the proper state space of a given model may very well require some experimentation. Ideally, we would like to approximate the solution on the ergodic set. Intuitively, this set consists of the points x t traced out by a stochastic simulation of the exact solution with infinite length. Figure 5.2 displays an estimate of the ergodic set of the benchmark business cycle model introduced in Example 1.6.1. It was constructed from a simulation with the second-order perturbation solution (see Section 4.4). The light blue dots show the t = 1, 2, . . . , 100, 000 pairs (k t , z t ) of the scaled capital stock k t and the natural logarithm of total factor productivity z t := ln Z t . None of these points lies outside the square X := [0.91k, 1.17k] × [−0.064, 0.055]. Hence, simulations of an easy to compute perturbation solution may provide an estimate of the relevant state space. As we will demonstrate later in this chapter, however, this guess may be misleading, if the perturbation solution is a poor approximation of the exact solution. Alternatively, one may start with a weighted residuals solution on a small state space, simulate the model with this solution, check whether the model remains within this set, and if not, extend the state space.
246
5 Weighted Residuals Methods 0.050
zt
0.025 0.000 −0.025 −0.050 0.92
0.94
0.96
0.98
1.00
1.02
1.04
1.06
1.08
k t /k Figure 5.2 Ergodic Set of the Benchmark Business Cycle Model from a Second-Order Solution
Figure 5.2 reveals that the ergodic set is not a rectangular region in the plane but rather is shaped like an ellipse. Now, suppose we use 16 points on the square X to compute a collocation solution and pick them from the Cartesian product of the zeros of a fourth degree Chebyshev polynomial. The black squares indicate this grid. Most of them are outside the region where the model economy stays most of the time, which is the area inside the red ellipse. The size of this area is approximately one-fourth of the size of the area of the square surrounded by the outer points of the grid. Accordingly, choosing the state space to include include all points from a simulation with a large number of observations is not always advisable. Indeed, this approach would force the residuals to be close to zero at many points irrelevant for the dynamic path of the model economy.
5.3.2 Basis Functions The domain of the basis functions ϕk : X → R of step 2 of the Algorithm 5.2.1 is a compact subset of the Euclidean n space. Before we address this n-dimensional problem, we consider the simpler univariate problem to approximate h : [a, b] → R, where [a, b] ⊂ R is an interval on the real line. A variety of univariate functions come into mind when thinking about basis functions. One obvious choice are the monomials x k , k = 0, 2, . . . , which we used in the example of Section 5.2.5. Low order monomials are
5.3 Implementation
247
useful for finite element methods because the accuracy of the solution can be controlled by the number and location of the nodes that delimit the subintervals. In addition, as mentioned above, their domain is unrestricted. They are, however, less suited for spectral methods, where the increasing order of the polynomials determines the accuracy of the solution. As we explain in Section 13.5.3, approximating a function with high-order monomials introduces oscillations at the end-point of [a, b], and the accuracy of the approximation suffers from near multicollinearity. Trigonometric functions, as the sine or the cosine function, are intensively used in the natural sciences to approximate periodic functions. Accordingly, they are less suited to reproduce the monotone behavior that usually characterizes the policy functions of DGE models (see also Judd (1998), p. 381). In economics, there seems to be a broad consensus to follow the recommendation of Boyd (2000). In his bestiary of polynomials in Appendix A of his book on spectral methods he designates Chebyshev polynomials as suitable for “Any problem whatsoever” (p. 497) and in his “Moral Principle 1” recommends this family for use except in cases where the researcher is “really, really sure that another set of basis functions is better” (p. 10). Fernández-Villaverde, Rubio-Ramírez, and Schorfheide (2016) also focus on this family in their presentation of projection methods. We consider Chebyshev polynomials at some length in Section 13.8 and employ them in later applications in this chapter. There, the reader will learn to exploit several of their properties to implement weighted residuals methods. Let us now return to the problem to express the function ϕk (x), x ∈ Rn in terms of univariate basis functions ϕki (x i ), x i ∈ R, i = 1, 2, . . . , n. The idea is to construct this function from products of univariate basis functions. Let d denote the (maximum) degree of univariate polynomials involved in the procedure and consider the set of integers {0, 1, . . . , d}. The n-fold Cartesian product of this set, K = {0, 1, . . . , d}n , has (1 + d)n members k := (k1 , k2 , . . . , kn ), ki ∈ {0, 1, . . . , d}.9 Then, we have ϕk (x) =
n Y i=1
ϕki (x i ).
The set of all these combinations is known as the n-fold tensor product. For instance, in the two-dimensional state space of the benchmark business cycle model and with d = 2, the 32 = 9 tuples are 9
Henceforth, we use ki = 0 as the smallest integer and Ki as the largest, in line with the convention to enumerate polynomials on the set {0, 1, 2, . . . }.
248
5 Weighted Residuals Methods
k1 = (0, 0), k2 = (0, 1), k3 = (0, 2), k4 = (1, 0), k5 = (1, 1), k6 = (1, 2), k7 = (2, 0), k8 = (2, 1), k9 = (2, 2) and the approximating polynomial is: ˆh(k t , z t ) = γ0,0 ϕ0 (k t )ϕ0 (z t ) + γ0,1 ϕ(k t )ϕ1 (z t ) + γ0,2 ϕ0 (k t )ϕ2 (z t ) + γ1,0 ϕ1 (k t )ϕ0 (z t ) + γ1,1 ϕ1 (k t )ϕ1 (z t ) + γ1,2 ϕ1 (k t )ϕ2 (z t ) + γ2,0 ϕ2 (k t )ϕ0 (z t ) + γ2,1 ϕ2 (k t )ϕ1 (z t ) + γ2,2 ϕ2 (k t )ϕ2 (z t ). The exponential growth in the number of coefficients with the dimension n of the state space is known as the curse of dimensionality. A smaller set of basis functions derives from the condition n X i=1
ki = j, j = 0, 1, . . . , d.
This set is known as the complete set. In the two-dimensional case, the respective combinations of indices are k1 = (0, 0), k2 = (0, 1), k3 = (1, 0), k4 = (0, 2), k5 = (1, 1), k6 = (2, 0). This set grows only polynomially in the dimension n of the state space. For instance, for d = 2, the number of elements is equal to 1 + n + n(n + 1)/2 (see Judd (1998), p. 240, Table 6.6). A further way to address the curse of dimensionality, which we will consider in one of our applications, is the Smolyak polynomial introduced in Section 13.9.4.
5.3.3 Residual Function In many economic applications, there are several ways to solve the model. For instance, in the example in Section 5.2.1, we chose the policy function of hours as the target. Alternatively, we could have used the conditional expectation hλ (k t , ln Z t ) := βE t Λ t+1 (1 − δ + r t+1 ).
However, in this case, we would have had to solve the nonlinear equation10 10
This equation derives from replacing λ t by hλ (k t , ln Z t ) in equation (1.64a) and combining the result with equations (1.64b) and (1.64c).
5.3 Implementation
(1 − L t )θ (1−η) hλ (k t , ln Z t )
249
η =
1−α α (1 − L t )Z t L −α t kt θ
for hours L t numerically, which introduces an additional source of inaccuracy. Even if we decided on the function that we wish to approximate, it is not always obvious how to define the residual function in step 3 of Algorithm 5.2.1. Consider the Euler equation of the deterministic growth model from (1.13): 0=
u0 ( f (K t ) − K t+1 ) − β f 0 (K t+1 ) u0 ( f (K t+1 ) − K t+2 )
with K t as the agent’s capital stock, u0 (C t ) the marginal utility of consumption C t = f (K t ) − K t+1 , and f (K t ) the production function (including the depreciated capital (1 − δ)K t ). Assume we want to approximate the policy function for capital K t+1 = hK (K t ) and define the residual function as: u0 f ˆhK (K t+1 ) − ˆhK (ˆhK (K t )) 0 K K R ˆh , K t = β f ˆh (K t ) − 1, (5.15) u0 f (K t ) − ˆhK (K t ) ˆhK (K t ) :=
K X
γk ϕk (K t )
k=1
so that the first term on the rhs of (5.15) is normalized to unity at the solution. Notice that by this formulation, we do not put more weight on low asset values K t (and, hence, low consumption C t ) with a corresponding high value of marginal utility, because we form the fraction of next period and current period marginal utilities. Consider the alternative residual function R ˆhK , K t = u0 f (K t ) − ˆhK (K t ) − βu0 f (ˆhK (K t )) − ˆhK ˆhK (K t ) f 0 ˆhK (K t ) .
Both terms on the rhs are not normalized but depend on the value of K t . In this case, small errors in approximating the exact policy function result in large residuals at low values of the capital stock K t , whereas relatively larger deviations in the approximated function ˆhK from the exact solution h for high values of K t yield much smaller residuals. As we aim to find a good uniform approximation of the policy function over the complete state-space, we should carefully choose of the residual function and use the definition (5.15).
250
5 Weighted Residuals Methods
In stochastic models, the residual function usually involves expected values as the expression in equation (5.2a). There are two ways to proceed. The first is to approximate the respective integrals with quadrature formulas considered in Section 14.4. Gauss-Hermite integration is feasible when few shocks drive the model. Otherwise, the exponentially increasing number of integration nodes necessitates monomial rules, which require much fewer nodes. The second approach is to replace a continuous valued shock with a finite state Markov chain. Again, we use the benchmark business cycle model to illustrate this point. Using the Gauss-Hermite formula (14.31) with nodes x 1 , x 2 , . . . , x m ˜ 1, ω ˜ 2, . . . , ω ˜ m the integral in the Euler equation (5.2a) is and weights ω approximately equal to Z∞ 2 e−(ε t+1 /σε ) /2 L g(h , k t , z t , ε t+1 ) p dz 2πσε −∞ m X p ˜ i g(h L , k t , z t , 2σε x i ). ≈ ω i=1
Alternatively, we may employ Algorithm 16.4.1 and replace the continuous valued process z t = ρ Z z t−1 + ε t by a Markov chain with elements z1 , z2 , . . . , zn and transition matrix P = (pi j ), i, j = 1, 2, . . . , n. Let L t = h L (k t , zi ) denote the policy function for hours in state (k t , z t = zi ) and let g(h L , k t , zi , z j ) represent the solution for λ t+1 (1 − δ + r t+1 ) in state (hk (k t , zi ), z t+1 = z j ). This approach allows us to approximate the expression on the rhs of equation (5.2a) by L
rhs(h , k t , zi ) ≈
n X j=1
pi j g(h L , k t , zi , z j ).
The advantage of this second approach over the first is that the latter will always compute the residual function at more extreme values of the shock process, where it may be difficult to solve the model. To demonstrate this fact, suppose we choose the interval [z, z¯] for the log of TFP. The L Gauss-Hermite formula will p compute thepresidual function R(h , (k t , z t )) over the interval [ρ Z z + 2σε x 1 , ρ Z z¯ + 2σε x m ], whereas the Markov chain approach does not go below z1 = z and beyond zn = z¯. A second advantage of a discrete process is that we can employ n univariate policy functions for hours hzL (k t ) i
:=
K X k=0
γki ϕk (k t ),
i = 1, 2, . . . , n
5.3 Implementation
251
instead of the bivariate function h L (k t , z t ). However, since we need hzL , j = j 1, 2, . . . , n to compute the residual function at (k t , zi ), we must nevertheless solve for the (1 + K) × n coefficients γki simultaneously. Finally, note that except for the case of the deterministic growth model all our examples demonstrate that our computations do not involve the ˆ (ˆh, ·). exact residual function R(ˆh, ·) but a numeric approximation R 5.3.4 Projection and Solution Depending on the choice of the weight and the test functions, step 4 of Algorithm 5.2.1 may become more or less involved. We begin with variants of the weighted residuals method that do not require integration. COLLOCATION. As stated above, the collocation method does not require integration but determines the parameters of the approximating function by selecting as many points from the state space X as there are elements of the vector γ. In the univariate case and with Chebyshev polynomials of degree k, Tk (z), z ∈ [−1, 1], as base functions, the number of parameters is equal 1 + K, where K is the degree of the polynomial ˆh(x) :=
K X k=0
γk Tk (ξ−1 (x)),
which we use to approximate the exact policy function h : [x, x¯ ] → R. The map ξ−1 is defined in equation (13.28b) and maps the state space X := [x, x¯ ] to the domain of Chebyshev polynomials [−1, 1]. In the multivariate case x ∈ Rn , the number of parameters depends on our construction of the basis. On a tensor product basis, ϕ j (x j ) :=
n Y i=1
Tki (ξ−1 (x i j ))
with 1 + Ki elements for each of i = 1, 2, . . . , n dimensions, the number Qthe n of parameters is equal to K = i=1 (1 + Ki ). Accordingly, the number of parameters increases exponentially with the dimension n of the state space.11 The K collocation nodes x j , j = 1, . . . , K, i.e., the points at which we want 11
For Ki = d, ∀i we have K = (1 + d)n .
252
5 Weighted Residuals Methods
to force the residual function R(ˆh, x j ) equal to zero, follow from the 1 + Ki 0 0 0 zeros zi,1 , zi,2 , . . . , zi,K of the Chebyshev polynomial TKi +1 (z) adjusted i +1 to the interval [x i , x¯i ] via the transformation (see equation (13.28a)) x i,l =
0 ξ(zi,l )
= xi +
0 (1 + zi,l )(¯ xi − x i)
2
,
l = 1, 2, . . . Ki + 1.
If we enumerate the K possible n-tuples of indices (l1 , l2 , . . . , l n ) with the index j, we can define any of the K collocation nodes as x j = (x 1l1 , x 2l2 , . . . , x nl n ).
(5.16)
This construction encounters a problem if we want to use the smaller complete set of univariate basis functions. Since this set has fewer members and hence fewer parameters than the K collocation nodes defined in the previous equation, the problem is which combination of points to choose. For instance, in the two-dimensional case K1 = K2 = 3, there are 16 = 4×4 collocation nodes according to (5.16) to determine the 10 nodes of the complete set n o −1 −1 Tk1 ξ (x 1 ) Tk2 ξ (x 2 ) k1 + k2 = k, k = 0, 1, 2, 3. . Heer and Maußner (2018) show for the N -country real business cycle model that an ad hoc selection of the collocation points performs very well. However, if the state space is larger than can be handled with acceptable computation time on a tensor product base, the Smolyak collocation should be used instead because this method determines its nodes optimally (see Section 13.9.4). FINITE ELEMENTS. We can also devise a finite element method without integration. For concreteness, we develop this approach for the univariate case and sketch the extension to the multivariate case. First, we divide the state space X := [x, x¯ ] into K non-overlapping intervals by selecting 1 + K points x 0 < x 1 < · · · < x K , with x 0 = x and x K+1 = x¯ . Second, we construct a spline from low-order monomials that approximates the policy function h : [x, x¯ ] → R. Suppose we employ linear polynomials ˆhk (x) := ak + bk (x − x k ) so that ¨ ak + bk (x − x k ), for x k ≤ x ≤ x k+1 and k = 0, 1, . . . K − 1, ˆh(x) := 0, otherwise.
5.3 Implementation
253
This construction allows us to extrapolate function values outside X by using ˆh0 (x) = a0 + b0 (x − x 0 ) for x < x and ˆhK−1 = aK−1 + bK−1 (x − x K−1 ) for x > x¯ . The spline ˆh(x) has 2K parameters and, as we show in Section 13.6.1, we need K+1 function values to determine them. However, different from drawing a spline through a given number of points (x k , yk ), k = 0, 1, . . . , K or approximating a known function h : [x, x¯ ] → R, we do not know the values of the policy function h. Instead of using given function values, we determine the parameters of our spline from the condition that the residual function R(ˆh, x) is equal to zero at the K + 1 nodes x k . If we want a smoother approximation, we can employ cubic functions (see Section 13.6.2) over each of the K subintervals. Still, we need only K + 1 function values to determine the 4K parameters of this spline. Accordingly, we still must solve the nonlinear system of equations R(ˆh, x k ) = 0,
k = 0, 1, . . . K.
The extension to the multivariate case should be obvious. In each dimension i = 1, 2, . . . , n of the state space we use Ki +1 nodes and construct the n-dimensional grid from the Cartesian product of the one-dimensional grid points. For instance, in the two-dimensional case, with x 1k1 and x 2k2 as nodes in the first and second dimension, respectively, the pairs (x 1k1 , x 2k2 ), k1 = 0, 1, . . . , K1 and k2 = 0, 2, . . . , K2 define a rectangular grid with elements [x 1k1 ≤ x 1 ≤ x 2k1 +1 ] × [x 2k2 ≤ x 2 ≤ x 2k2 +1 ]. For a bilinear spline or a bilinear cubic spline, the K = (1 + K1 ) × (1 + K2 ) conditions R(ˆh, (x 1k1 , x 2k2 )) = 0 determine the respective parameters. Accordingly, as with the collocation method, the number of unknown parameters increases exponentially with the dimension of the state space. Even if the number of nodes equals the number of parameters employed in the collocation method, the finite element method involves additional computations, namely those required to construct the spline. GALERKIN AND LEAST SQUARES. We turn to the Galerkin and the least squares projection. Both require the computation of integrals. In the onedimensional case, the integral will be the familiar Riemann integral, i.e., the area under a given function f : R → R over the interval [x, x¯ ]. If the state space X has more than one dimension, we consider the n-dimensional product of intervals [x 1 , x¯1 ] × · · · [x n , x¯n ], which is a hypercube in Rn . We R then denote by the shorthand X the n-fold Riemann integral
254
5 Weighted Residuals Methods
Z I(ˆhγ ) :=
R(ˆhγ , x)ψk (x)w(x) d x
Z
X x¯1
Z
:= x1
Z
x¯2 x2
···
x¯n xn
(5.17) R(ˆhγ , x)ψk (x)w(x) d x 1 d x 2 . . . d x n .
As mentioned above, it is impossible to evaluate this integral analytically, so we must employ numeric integration techniques. To the extent we use a basis of Chebyshev polynomials to approximate the policy function h : X → R, this case suggests to resort to Gauss-Chebyshev integration (see Section 14.3.2). Let ξi :[x i , x¯i ] → [−1, 1],
x i 7→ ξ(x i ) =
2(x i − x)
− 1, x¯i − x i (1 + zi )(¯ xi − x i) zi → 7 ξ−1 (zi ) = x i + 2
¯i ], ξ−1 i :[−1, 1] → [x i , x
define the bijection between the interval [x i , x¯i ] and the domain of Chebyshev polynomials [−1, 1]. Accordingly, the maps between the hypercubes X := [x 1 , x¯1 ] × · · · × [x n , x¯n ] ⊂ Rn and Z := [−1, 1]n are given by
−1 ξ1 (x 1 ) ξ1 (z1 ) . .. . ξ(x) = .. and ξ(z)−1 = . ξ−1 n (zn )
ξn (x n )
Note that the Jacobian matrix of ξ−1 (z) is equal to the product J ξ−1 (z) = 2−n (¯ x 1 − x 1 ) . . . (¯ x n − x n) . Applying the change of variable formula (14.21) to the integral (5.17) yields Z R(ˆhγ , x)ψk (x)w(x) d x X
=
(¯ x 1 − x 1 ) · · · (¯ x n − x n) 2n
Z Z
R ˆh, ξ−1 (z) ψk (ξ−1 (z))w(ξ−1 (z)) d z.
Accordingly, if we choose the weight function w(x) := p
1 1 − ξ1 (x 1 )2
··· p
1 1 − ξn (x n )2
,
5.3 Implementation
255
we can approximate the multiple integral (5.17) by the multiple sum x¯n − x n πn x¯1 − x 1 I(ˆhγ ) ≈ n ... 2 L1 Ln Ln L1 X X 0 0 0 0 ··· R ˆhγ , ξ−1 (z1,l , . . . , zn,l ) ψk ξ−1 (z1,l , . . . , zn,l ) , l1 =1
l n =1
1
n
1
n
(5.18) where L1 through L n are the number of integration nodes z 0j,l used to i approximate the integral in dimension j, i = 1, 2, . . . , n. It should be obvious that integration amplifies the curse of dimensionality. For instance, Heer and Maußner (2018) use the Galerkin method to solve the multi-country real business cycle model. For eight countries with idiosyncratic TFP shocks, the state space of this model has dimension n = 16. Even with only three integration nodes in each dimension, there are 316 = 43, 046, 721 evaluations of the integrand in each step of the solution procedure. The monomial integration formula, as presented in Section 14.3.3, considerably reduces the computational burden. For instance Heer and Maußner (2018) employ equation (14.26) and thereby reduce the number of integration nodes from 316 to 216 + 33 = 65, 569. INITIAL VALUES. All variants of the weighted residual methods lead to solving a more or less complicated system of nonlinear equations or minimizing a function in step 4 of Algorithm 5.2.1. The successful application of the available numerical tools that tackle this problem requires good starting values. Otherwise, the respective algorithms will fail to find a solution. In the applications presented later in this chapter, we initialize all computations from a perturbation solution of the respective model. After all, coding the residual function usually involves most if not all the equations required by our toolbox to compute the perturbation solution.
5.3.5 Accuracy The last step of Algorithm 5.2.1 involves a measure of solution accuracy, usually the residuals of Euler equations. To the extent the latter coincide with the values of the residual function R(ˆhγ , .), we must also evaluate the respective Euler equations on points x different from those involved in the
256
5 Weighted Residuals Methods
solution procedure because at these points, the Euler equation residuals have a numerically small absolute value. To understand this point, consider equation (5.11). It determines the parameters from the conditions that the residuals are numerically as close as possible to zero at K given points x1 , x2 , . . . , xK . The following sections present applications of Algorithm 5.2.1. We begin with the one-dimensional case of the deterministic growth model. Next, we consider the two-dimensional benchmark business cycle model. We then proceed to the standard search and matching model. It provides a nice example of where the weighted residuals solution differs markedly from the perturbation solution. We close the chapter with a disaster risk model that features a five-dimensional state space.
5.4 The Deterministic Growth Model THE MODEL. In Section 1.3, we introduce the deterministic growth model. For the reader’s convenience, we restate the farmer’s decision problem given in (1.9): max
C t ,C t+1 ,...
Ut =
∞ X
1−η
β
s
s=0
s.t. α K t+s
C t+s − 1 1−η
,
β ∈ (0, 1) , η > 0,
K t+s+1 + C t+s ≤ + (1 − δ)K t+s , 0 ≤ C t+s , 0 ≤ K t+s1 , Kt
α ∈ (0, 1),
(5.19) s = 0, 1, . . . ,
given,
where C t is consumption in period t and K t is the farmer’s capital stock. Here, we assume that the current period utility function u(C t ) has a constant elasticity of marginal utility with respect to consumption of −η. The production function F (K t , L) = K tα L 1−α with L = 1 is of the Cobb-Douglas type and capital depreciates at the rate δ ∈ (0, 1]. The resource constraint and the Euler equation of this problem are: 0 = K tα + (1 − δ)K t − C t − K t+1 , C t+1 −η α−1 0= β 1 − δ + αK t+1 − 1 = 0. Ct
(5.20a) (5.20b)
5.4 The Deterministic Growth Model
257
From the Euler equation (5.20b) we derive the steady state values of the capital stock and of consumption: 1/(1−α) αβ K= , 1 − β(1 − δ) C = (K ∗ )α − δK.
IMPLEMENTATION. The state-space X of the problem is one-dimensional and consists of the capital stock K t . We approximate the policy function for ¯ = 1.5K]. consumption hC : X → R≥0 over the interval X := [K = 0.5K, K Depending on the question at hand, the reader may want to choose a different interval. For example, if one aims to study the transition dynamics from an initial capital stock K0 < K to the stationary solution K, one would ¯ ] that contains K0 and K, a lower bound K choose an interval X := [K, K ¯ slightly above K. slightly below K0 , and an upper bound K C We compute approximations ˆh of the consumption function with four different methods: 1) finite element, 2) collocation, 3) Galerkin, and 4) least squares. A second-order perturbation solution serves as our benchmark and provides an initial guess for our algorithms. For each ˆhC , we compute the residual function R(ˆhC , Ki ) at a given point Ki , i = 1, 2, . . . , I, where I depends on the method, in two steps: ˜ i = K α + (1 − δ)Ki − ˆhC (Ki ), K i C ˆh (Ki ) η C ˜ α−1 . R(ˆh , Ki ) = 1 − β 1 − δ + αK i ˆhC (K ˜i )
(5.21a) (5.21b)
R Matrix oriented programming languages as GAUSS or MATLAB allow T C C ˆ ˆ us to compute the vector R := R(h , K1 ), . . . , R(h , KN ) from the vector
K := [K1 , . . . , KN ] T with two lines of code.12 For the finite element method, we divide X into nine intervals of equal length and approximate hC with a cubic spline. The values of ˆhC (Ki ) at each of the I = 10 grid points Ki are determined so that the residual function is equal to zero at these points. We use our GAUSS procedures CSpline_coef and CSpline_eval to implement the spline. For the remaining three methods, we approximate the consumption function with a Chebyshev polynomial of degree K = 4: ˆhC (K) =
K X k=0
12
γk Tk (ξ−1 (K)),
See our Gauss program DGM_WRM.g, which computes the results reported below.
258
5 Weighted Residuals Methods
¯ ] to where ξ−1 is the function defined in equation (13.28b). It maps [K, K the domain of Chebyshev polynomials [−1, 1]. The collocation method determines the five parameters of this polynomial from the condition that R(ˆhC , ξ(zi0 )) is equal to zero at the five zeros zi0 , i = 1, 2, . . . , I of the Chebyshev polynomial of degree K + 1 = 5. For the Galerkin and the least squares method, we employ Gauss-Chebyshev integration on I = 10 nodes. In the present model, equation (5.18) implies the system of equations K I X ¯ −K X πK R γk Tk , zi0 Tk (zi0 ), 0= 2 I i=1 k=0
k = 0, 1, . . . K,
for the Galerkin method, whereas the least squares method selects the K + 1 parameter values that minimize the function K 2 I X ¯ −K X πK f (γ0 , . . . , γK ) := R γk Tk , zi0 . 2 I i=1 k=0 We initialize both the GAUSS nonlinear equations solver EqSolve and the GAUSS minimization procedure QNewton with a vector obtained from Algorithm 13.8.1. The function values Ki (ξ(zi0 )) required in Step 3 are taken from the second-order perturbation solution. This guess is good ˜ i computed in step (5.21a) is enough, so we do not check whether K 13 ¯ outside the interval [K, K ]. RESULTS. We calibrate the model with the parameter values presented in Table 1.1, i.e., we use α = 0.36, β = 0.996, and δ = 0.014. Table 5.1 displays the parameter values for the five different solutions. The second-order solution has only two parameters (see footnote 13), and the collocation, the Galerkin, and the least squares solution have five parameters. In the case of the finite element method, the parameter values are the values of the cubic spline at the I = 10 grid points. The last line in the table presents the respective maximum of the absolute values of the residuals of the Euler equation (5.20b). They are computed on a grid of 100 points spread evenly over the interval [0.5K ∗ , 1.5K ∗ ]. As explained 13
Note that in the present model, the approximation of hC follows from equation (3.31b)
as ˆhC (Ki ) = C + γ0 (Ki − K) + γ1 (Ki − K)2 , γ0 := h y , γ1 := 0.5h y . w ww
5.4 The Deterministic Growth Model
259
in Section 1.7.2, we compute the residuals in a dimension-free way as the rate by which consumption needed to be changed vis-a-vis ˆhC (Ki ) to ˜ i given by equation (5.20a), the satisfy the Euler equation. Hence, with K Euler equation residual at a given grid point Ki , is computed as: E ER i :=
C˜i
− 1, ˆhC (Ki ) 1 ˜ i ) β 1 − δ + αK ˜ α−1 − η . C˜i = ˆhC (K i Table 5.1 Weighted Residuals Solution of the Deterministic Growth Model
Coefficient Second Order Finite Collocation Perturbation Elements γ1 γ2 γ3 γ4 γ5 γ6 γ7 γ8 γ9 γ10 EER
Galerkin
Least Squares
0.016445 2.845269 3.823873 3.823798 3.823826 −0.000075 3.109771 0.908607 0.908517 0.908496 3.350707 −0.059874 −0.059602 −0.059540 3.573873 0.008758 0.008807 0.008817 3.782536 −0.001398 −0.001526 −0.001569 3.979432 4.166374 4.344766 4.515751 4.680197 0.000940 0.000004
0.000013
0.000009
0.000008
Notes: The coefficients γi are either the parameter values of the respective polynomial (second order perturbation, collocation, Galerkin, or least squares) or the values of the spline at the grid points (finite element). EER denotes the maximum absolute value of the Euler equation residual.
The first observation from the table is that all four weighted residual methods are orders of magnitude more accurate than the second-order perturbation solution. The ratio between the Euler equation residual of the second-order perturbation solution and the residuals of the other methods is between 73 (for the collocation solution) and 248 (for the finite element solution). Note that the finite element, the Galerkin and the least squares methods employ the same amount of information since they use I = 10 points from the state space of the model to compute the parameter vector. The collocation method employs information from I = 5 points, whereas the second-order perturbation solution considers only the
260
5 Weighted Residuals Methods
point (K ∗ , C ∗ ). Accordingly, it should be unsurprising that the perturbation method performs worst. Figure 5.3 illustrates this point. It displays the Euler equation residuals from the perturbation and the Galerkin solution. The former become larger (in absolute value) the farther apart the capital stock Ki is from its stationary value K.
0.0002 0.0000 −0.0002 −0.0004
2nd Order Perturbation Galerkin with 10 nodes
−0.0006 −0.0008 −0.0010
0.5
0.6
0.7
0.8
0.9
1
1.1
1.2
1.3
1.4
1.5
Ki /K Figure 5.3 Euler Equation Residuals: Deterministic Growth Model
The second observation from Table 5.1 is that there are only very small numeric differences between the parameter values of the three methods that employ Chebyshev polynomials. Accordingly, all three methods share the same order of accuracy. Additionally, the absolute values of the parameters decline rapidly. The Chebyshev truncation theorem (13.8.1), thus, predicts that the truncation error will be small. We invite the reader to use our program DGM_WRM.g and check the sensitivity of our results with respect to the degree of the polynomial, the number of integration points, and the number of finite elements.
5.5 The Benchmark Business Cycle Model THE MODEL. We introduce the benchmark business cycle model in Example 1.6.1, compute perturbation solutions for the model in Section 4.4, and explain in Section 5.2.1 a stepwise procedure to determine the
5.5 The Benchmark Business Cycle Model
261
solution from the policy function for working hours h L . In this section we approximate this function on both a tensor product basis and a complete basis of univariate Chebyshev polynomials and determine the respective parameters with the Galerkin weighted residuals method. We calibrate the model as before according to the parameter values presented in Table 1.1. IMPLEMENTATION. The state space of the model is two-dimensional and consists of the (scaled) capital stock k t and the natural logarithm of total factor productivity (TFP) ln Z t . For ease of notation, we use the index i = 1, 2 to differentiate between both dimensions. An underscore indicates the lower bound of a variable and a bar indicates the upper bound. The function ξi : [−1, 1] → [x i , x¯i ] maps variables in the domain of Chebyshev polynomials to the domain of the capital stock (i = 1) and the domain of the TFP shock i = 2 (see equation (13.28a)). The symbol T j (z) denotes the univariate Chebyshev polynomial of degree j at point z ∈ [−1, 1]. Finally, let Ki denote the degree of approximation in dimension i. On a tensor product base with degrees K1 and K2 we approximate h L by the polynomial14 ˆh L (k, ln Z) :=
K1 X K2 X k1 =0 k2 =0
−1 γk1 ,k2 Tk1 ξ−1 1 (k) Tk2 ξ2 (ln Z) .
(5.22)
On a complete product basis, the polynomial of degree d ˆh L (k, ln Z) :=
K1 d−k X X1 k1 =0 k2 =0
−1 γk1 ,k2 Tk1 (ξ−1 1 (k))Tk2 (ξ2 (ln Z))
(5.23)
approximates the policy function. The Galerkin method combined with Gauss-Chebyshev integration on L i nodes determines the parameters γk1 ,k2 from the condition (see equation (5.18)) 14
In the following equations the symbols k1 and k2 index the parameters of the Chebyshev polynomial, and the symbols L1 and L2 indicate the number of integration nodes. Readers should not confuse the former with the symbol for the capital stock k and the latter with the symbol for working hours L.
262
5 Weighted Residuals Methods L1 L2 π2 ¯k − k ln Z − ln Z X X R ξ1 (zl1 ), ξ2 (zl2 ) Tk1 (zl1 )Tk2 (zl2 ), 0= L2 4 L1 l =1 l =1 1
k1 = 0, 1, . . . K1 , ¨ 0, 1, . . . , K2 , k2 = 0, 1, . . . , d − k1 ,
2
(5.24) tensor product basis, complete basis,
where R(k, ln Z) denotes the residual of the Euler equation (5.2) at the integration node (zl1 , zl2 ) with k = ξ1 (zl1 ) and ln Z = ξ2 (zl2 ). R Interpreted computer languages, as GAUSS and MATLAB , perform poorly in loops. We can avoid the loops involved in the double sums in equations (5.22)-(5.24). Let
TK,L
γ0,0 T0 (z1 ) T0 (z2 ) . . . T0 (z L ) T1 (z1 ) T1 (z2 ) . . . T1 (z L ) γ1,0 := and Γ := .. .. .. ... ... . . . γK1 ,0 TK (z1 ) TK (z2 ) . . . TK (z L )
. . . γ0,K2 . . . γ1,K2 . (5.25) .. . .. . . . γK1 ,K2
so that the elements of the matrix H = TK0
1 ,L 1
Γ TK2 ,L2
are working hours on all L1 × L2 integration nodes. This matrix allows us to compute the L1 × L2 matrix of Euler equation residuals R using R matrix multiplication instead of loops (see the code in the MATLAB function BM_CGT_Eqs.m) . We employ Gauss-Hermite integration (see Section 14.4.2) to approximate the expectations in equation (5.2a). Given R, the K1 × K2 conditions in equation (5.24) are equivalent to 0K1 ×K2 = TK1 ,L1 RTK0
2 ,L 2
.
The matrices TKi ,L i , i = 1, 2 also allow us to initialize the matrix Γ from an L1 × L2 matrix of working hours computed on the integration nodes (z1 , z2 ) from a perturbation solution of the model according to equation (13.44). We can approach the case of a complete product basis in a similar R way (see our MATLAB script BM_CGC.m and the related functions). To handle cases where either the capital stock or the TFP shock are outside their respective bounds, we compute a two-dimensional cubic spline on the integration nodes and use it to extrapolate working hours. R The MATLAB script BM_CGT.m solves and simulates the model on a tensor product basis and the script BM_CGC.m on a complete product basis.
5.5 The Benchmark Business Cycle Model
263
The interval for the TFP shock is equal to I Z := [−¯ z , z¯], where z¯ is equal to 3.5 timespthe unconditional standard deviation of the TFP shock, i.e., z¯ = 3.5σ Z / 1 − ρ Z . With an eye to Figure 5.2, we consider three different intervals for the capital stock, defined relative to the stationary capital stock k. The intervals are shown in the left-most column of Table 5.2. All solutions employ L1 = 7 and L2 = 7 Gauss-Chebyshev integration nodes. The Gauss-Hermite integration for the approximation of the expectation in equation (5.2a) uses nine nodes. RESULTS. Table 5.2 displays the maximum absolute value of Euler equation residuals computed on 100 × 100 evenly spaced nodes over the space defined in the leftmost column. They are computed as in Section 4.4 as the relative change in consumption required to meet the Euler equation (5.2). Table 5.2 Euler Equation Residuals of the Galerkin Solution of the Benchmark Business Cycle Model State Space
Second Order Perturbation
Tensor Product Basis K1 = 2 K2 = 2
[0.90k, 1.10k] × I Z [0.85k, 1.15k] × I Z [0.80k, 1.20k] × I Z
4.320E-04 1.360E-03 3.088E-03
K1 = 3 K2 = 3
Complete Product Basis d =2
d =3
1.367E-06 2.501E-08 3.948E-06 1.127E-07 2.622E-06 1.114E-07 5.504E-06 3.098E-07 5.197E-06 3.565E-07 8.927E-06 6.996E-07
Compared to the perturbation solution, the four weighted residuals solutions are at least two orders of magnitude more accurate. As we would expect, the precision of all methods declines as we widen the interval of the capital stock, and it increases with the degree of the polynomials. Note, however, that the increase is limited to approximately one order of magnitude despite the fact that the number of coefficients increases from 9 to 16 (tensor product basis) and from 6 to 10 (complete product basis). Finally, observe that the complete basis polynomial delivers approximately the same precision with fewer parameters than the product basis polynomial. The results in Table 5.3 echo the finding of Aruoba et al. (2006) and Heer and Maußner (2008). Both studies compare solutions of the benchmark
264
5 Weighted Residuals Methods
business model across different methods and report negligible numerical differences in business cycles statistics from simulated data. The table presents the typical set of second moments from simulations with the second-order perturbation solution and the Galerkin solution with degrees K1 = K2 = 3 on the state space [0.85k, 1.15k] × I Z , which is the most accurate among those shown in Table 5.2. We computed the standard deviations, correlation with output, and first-order autocorrelations reported in the table from simulated time series with 50,000 observations after a burn-in period of 1,000 periods. All 15 second moments from the two sets agree up to two digits after the decimal point. The next section presents a model whose simulated time paths differ markedly between the perturbation and the weighted residuals method. Table 5.3 Second Moments from the Benchmark Business Cycle Model: Perturbation versus Galerkin Solution
Variable Output Consumption Investment Working Hours Real Wage
Second Order Perturbation sx rx y rx 1.40 0.46 4.17 0.83 0.58
1.00 0.99 1.00 1.00 0.99
0.65 0.66 0.65 0.65 0.66
Galerkin K1 = 3, K2 = 3 sx rx y rx 1.40 0.46 4.17 0.83 0.58
1.00 0.99 1.00 1.00 0.99
0.65 0.66 0.65 0.65 0.66
Notes: Second moments computed from an HP-filtered simulated time series with 50,000 included observations and a burn-in period of 1,000 observations. s x :=standard deviation of variable x, r x y :=crosscorrelation of variable x with output, r x :=first-order autocorrelation of variable x.
5.6 The Benchmark Search and Matching Model 5.6.1 Motivation Search theoretic models of unemployment were introduced into the economics literature in the 1970s and 1980s by Peter Diamond, Dale Mortensen, and Christopher Pissarides. Since their seminal papers, an extensive
5.6 The Benchmark Search and Matching Model
265
body of literature has developed. More recent reviews are Rogerson et al. (2005) and Rogerson and Shimer (2012). Among the first studies that replaced the frictionless labor market in the standard real business cycle model with a market where both unemployed workers and vacant jobs coexist are Merz (1995), Andolfatto (1996), and Den Haan et al. (2000). Shimer (2005) contrasts data from the US labor market with the quantitative predictions of search theoretic models and finds that they are not able to replicate the large standard deviations in unemployment, vacancies, and labor market tightness, measured as the ratio between vacancies and unemployed persons. Hagedorn and Manovskii (2008) argue that this failure is not a problem with the model per se but its calibration. PetroskyNadeau et al. (2018) calibrate the labor market of their real business cycle model as recommended by Hagedorn and Manovskii (2008) and use a weighted residual method to solve it. They show that this model is able to explain infrequent but severe economic recessions as observed in the data by Barro and Ursúa (2012). Heiberger (2020) scrutinizes the disaster mechanism in this model and identifies the compensation of unemployed workers in the range of more than 80 percent of average labor productivity in combination with a highly persistent TFP shock as its source. Here, we follow Heiberger (2017) and compare the time series properties of a model very close to the model of Petrosky-Nadeau et al. (2018) from a second-order perturbation solution with those from a weighted residuals method.
5.6.2 The Model The economy is populated by a representative household with a unit mass of members, a representative firm, and the government. HOUSEHOLD. The representative household holds stocks issued by the firm and earns a dividend d t per unit of stocks S t . The fraction Nt ∈ [0, 1] of the members are employed and earn the real wage w t , the fraction U t = 1 − Nt is without a job and receives unemployment compensation b from the government. The household pays taxes Tt to the government, consumes C t and invests its remaining savings in new stocks sold at price vt . Accordingly, the budget constraint reads d t S t + w t Nt + (1 − Nt )b − Tt − C t ≥ vt (S t+1 − S t ).
(5.26)
266
5 Weighted Residuals Methods
At each period, the fraction ω ∈ (0, 1) of the employed members of the household loses their jobs and the fraction κwt of the unemployed members is matched to new jobs. Accordingly, the mass of employed members evolves as Nt+1 = (1 − ω)Nt + κwt (1 − Nt ).
(5.27)
Let J h (Nt , S t ) denote the expected life-time utility of the household with Nt employed members and stocks S t . It is determined as a solution to the dynamic program J h (Nt , S t ) = max u(C t ) + βE t J h (Nt+1 , S t+1 ) , β ∈ (0, 1), (5.28) S t+1
where β is the household’s subjective discount factor and E t denotes expectations as of period t. We parameterize the current-period utility function as usual: 1−η
u(C t ) :=
Ct
−1
1−η
,
η ≥ 0.
The first-order condition of this problem with respect to the number of stocks S t+1 is the Euler equation15 1 = βE t
C t+1 Ct
−η
d t+1 + vt+1 . vt
(5.29)
Accordingly, the stochastic discount factor applied to uncertain returns is
C t+1 M t+1 = β Ct
−η
.
Let ψ t denote the net value a new job in terms of consumption goods. This value is the value of a new job minus the value of staying unemployed. It is equal to the change in income w t − b plus the discounted expected continuation value. Since, from equation (5.27), one more employed household member in period t increases future employment by 1−ω−κwt , the net value of a new job follows recursively from the equation:
15
ψ t = w t − b + β(1 − ω − κwt )E t M t+1 ψ t+1 .
(5.30)
For a more detailed derivation of this equation and the following results see Maußner (2023), which is available on the books’s website.
5.6 The Benchmark Search and Matching Model
267
THE FIRM. The representative firm operates the linear technology Yt = Z t Nt , where the log of labor productivity ln Z t follows an AR(1)-process. At each period, the firm loses ωNt of its workers. It posts vacancies Vt at the unit cost c and expects that κ f t Vt of them will be occupied by workers in the following period. Thus, Nt+1 = (1 − ω)Nt + κ f t Vt describes the employment dynamics from the firm’s perspective. The firm chooses vacancies to maximize its stream of discounted profits. The Bellman equation of this problem is J(Nt , Z t ) = max Z t Nt − w t Nt − cVt + E t M t+1 J(Nt+1 , Z t+1 ) . Vt
Let ζ t :=
∂ J(Nt , Z t ) and ζ˜t := βE t M t+1 ζ t+1 ∂ Nt
denote the current period value of a new employee and its expected future discounted value, respectively. The former is related to the latter via the condition ζ t = Z t − w t + (1 − ω)ζ˜t since one more employed worker increases the current profit by Z t − w t and will stay in the firm for one additional period with probability (1 − ω). Accordingly, the first-order condition of the firm with respect to (wrt) the mass of posted vacancies Vt reads c = κ f t ζ˜t . If this condition cannot be satisfied for Vt > 0, the firm does not post vacancies and the KKT condition on vacancies reads c = κ f t ζ˜t + µ t , where µ t ≥ 0 is the Lagrange multiplier of the constraint Vt ≥ 0. BARGAINING. The real wage distributes the gains from a match between an unemployed worker and a vacancy between the firm and the worker. It follows from a Nash bargaining solution. The objective is to choose a wage
268
5 Weighted Residuals Methods
w t that maximizes the geometrically weighted average of the values ψ t and ζ t , where the weights reflect the bargaining power of the participants: max wt
1−ϕ
ζt
ϕ
ψt ,
ϕ ∈ [0, 1].
The solution of this problem delivers the wage Vt w t = (1 − ϕ)b + ϕ Z t + c . Ut Accordingly, if the bargaining power of workers ϕ approaches zero, they only receive a wage that compensates for their transition from unemployment to employment: w t = b. If the bargaining power of the firm is negligible, the worker seizes the entire value from the match, which is equal to the additional production Z t and the expenditures cVt /U t the firm would have to incur if the worker did not agree and the firm than needed to post new vacancies. MATCHING. The process of matching workers to vacancies is not modeled explicitly. Rather, its outcome M t is determined by the matching function Mt =
Vt U t Vtτ + U tτ
1/τ
with parameter τ > 0. Therefore, employment follows Nt+1 = (1 − ω)Nt + M t and the consistency of expectations requires κwt =
Mt Mt and κ f t = . Ut Vt
DYNAMICS. Finally, since the firm does not accumulate capital, the number of newly issued shares vt (S t+1 − S t ) and the amount of retained earnings RE t = Yt − w t Nt − cVt − d t S t must sum to zero. In addition, taxes Tt must equal the government’s payments to unemployed household members, Tt = U t b. The budget constraint of the household thus implies the aggregate resource constraint C t = Yt − cVt . We summarize this sketch of the model in the system of equations:
5.6 The Benchmark Search and Matching Model
269
Yt = Z t Nt ,
(5.31a)
C t = Yt − cVt ,
(5.31b)
U t = 1 − Nt , U t Vt Mt = , τ (U t + Vtτ )1−τ ¨M t , if Vt > 0 κ f t = Vt 1, if Vt = 0, κwt =
Mt , Ut
(5.31c) (5.31d) (5.31e) (5.31f)
Vt w t = (1 − ϕ)b + ϕ Z t + c , Ut ¨ κ f t ζ˜t , if Vt > 0, c= κ f t ζ˜t + µ t , µ t ≥ 0, if Vt = 0, ζ˜t = βE t
Ct C t+1
η
(5.31g) (5.31h)
Z t+1 − w t+1 + (1 − ω)ζ˜t+1 ,
Nt+1 = (1 − ω)Nt + M t ,
ln Z t+1 = ρ Z ln Z t + ε t ,
ρ Z ∈ (−1, 1), ε t+1 iid
N (0, σε2 ).
(5.31i) (5.31j) (5.31k)
STATIONARY EQUILIBRIUM AND CALIBRATION. For given values of the parameters β, b, and ϕ and given values of the variables U, Z = 1, κw , and κ f , the stationary solution for the remaining variables and the free parameters follows from the system: N = 1 − U,
(5.32a)
Y = N,
(5.32b)
M = κw U, M V= , κf M ω= , N β(1 − ϕ)(1 − b) ζ˜ = , 1 − β(1 − ω) + βϕκw
(5.32c)
c = κ f ζ,
w = (1 − ϕ)b + ϕ 1 + c
V . U
(5.32d) (5.32e) (5.32f) (5.32g) (5.32h)
270
5 Weighted Residuals Methods
We calibrate the model on a monthly frequency following Heiberger (2017), who in turn refers to Kuehn et al. (2012). Table 5.4 presents the values of the parameters and variables chosen as targets. The values of the parameters κw and κ f were first used in Den Haan et al. (2000). Together with the value of U = 0.1 they imply a job separation rate of ω = 0.05 and τ = 1.2897 from the system (5.32). The small value of the worker’s bargaining power ϕ = 0.052 is advocated by Hagedorn and Manovskii (2008). Whereas these authors employ an unemployment compensation of 0.955, Kuehn et al. (2012) choose the smaller value b = 0.85. Table 5.4 Calibration of the Search and Matching Model Preferences TFP Shock Labor Market Bargaining Value of Unemployment
β=0.991/3 η=2.0 ρ Z =0.951/3 σε =0.0077 U=0.1 κ f =0.71 κw =0.45 ϕ=0.052 b=0.85
5.6.3 Galerkin Solution There are many different ways to solve this model with weighted residuals methods. Heiberger (2017) computes finite element solutions with or without a discrete Markov chain approximation of the process (5.31k) and Galerkin solutions with Chebyshev polynomials. Here, we solve the model with a Galerkin weighted residual method. POLICY FUNCTION AND RESIDUALS. The state space of the model consists of the mass of employed household members N and the natural logarithm of TFP ln Z.16 It is evident from the system (5.31) that we can solve the model given the policy function for the value of employment from the firm’s perspective ζ˜t , hζ (N , ln Z). We approximate this function with a tensor product of K1 and K2 univariate Chebyshev polynomials: 16
We follow the convention introduced in the previous subsection and index the two dimensions with i = 1, 2.
5.6 The Benchmark Search and Matching Model
ˆhζ (N , z) :=
K1 X K2 X k1 =0 k2 =0
−1 γk1 ,k2 Tk1 (ξ−1 1 (N ))Tk1 (ξ2 (ln Z)).
271
(5.33)
Given this function, we can compute the residual function for each point (N , ln Z) from the system (5.31) in two steps. In the first step, we solve the system for consumption and next-period employment. In this step, we must check the constraint V ≥ 0. If c > hζ (N , ln Z) we set κ f = 1 and V = 0; otherwise, we solve (5.31d) and (5.31h) for V and (5.31b) for C. ˆ = (1 − ω)N + M (N , ln Z) denote these solution. In Let C(N , ln Z) and N the second step, we employ Gauss-Hermite integration to compute the expected value on the rhs of the Euler equation (5.31i). Let x i , i = 1, . . . , I denote the weights, and ln Zˆi := p integration nodes, νi the integration ζ ˆ ˆ ˆ ρ Z ln Z + 2σε x i . We then compute ζi = h (N (N , ln Z), ln Zˆi ) and solve ˆ N ˆ , ln Zˆi ) and W ˆ (N ˆ , ln Zˆi ) so that the expected the system (5.31) again for C( value is approximately equal to (see formula (14.31)): I ˆ N ˆ , ln Zˆi ) −η C( β X rhs(N , ln Z) = p νi C(N , ln Z) π i=1 ˆ (N ˆ , ln Zˆi ) + (1 − ω)ζˆi . × Zi − W We can again employ the matrices TK L and Γ defined in (5.25) to compute the system of equations whose solution in the matrix Γ solves the model. First, note that the matrix of residuals R has elements R(ξ1 (zl1 ), ξ2 (zl2 )) := ˆhζ (ξ1 (zl1 ), ξ2 (zl2 )) − rhs(ξ1 (zl1 ), ξ2 (zl2 )).
In matrix notation: R = TK0
1 ,L 1
Γ TK2 ,L2 − RHS.
Accordingly, condition (5.18) can be written as 0K1 ×K2 =
¯ − N ln Z − ln Z π2 N TK1 ,L1 RTK0 ,L . 2 2 4 L1 L2
Employing the discrete orthogonality of Chebyshev polynomials (see conditions (13.31)) and ignoring the multiplicative factor in front of the previous condition yield the equivalent condition 1 1 L1 0 0 . . . 0 L2 0 0 . . . 0 0 2 0 ... 0 0 2 0 ... 0 L1 L 0 0K1 ×K2 = Γ − . . . . . TK1 ,L1 RHS TK ,L . .2 . . . . 2 2 . . . . . . . . . . . . . . . . . . . . 2 0 0 0 . . . L22 0 0 0 . . . L1
272
5 Weighted Residuals Methods
Hence, at each iteration of a nonlinear equations solver, we need only to compute the rhs of the Euler equation (5.31i). IMPLEMENTATION. The script Search_CGT.m provides the code to solve the model. The function Search_CGT_Eqs.m, returns the residuals for a R given matrix Γ . We initialize the MATLAB nonlinear equations solver from a second-order perturbation solution computed on the L1 × L2 matrix of Chebyshev integration nodes. It was no easy task to find an acceptable solution. The script starts ¯ ] × [ln Z, ln Z] with N = 0.7, N ¯ = 0.97, ln Z = with an initial grid [N , N q 2 −3.5σε / 1 − ρ Z , and ln Z = −ln Z. The initial degrees are K1 = K2 = 4 and the number of integration nodes is L1 = 35 and L2 = 15. The script then increases K1 stepwise to K1 = 7, K2 to K2 = 6, and ln Z to q −4σε / 1 − ρ Z2 . Then, we decrease the lower bound on N in small steps to N = 0.09. Finally, we raise K1 to K1 = 8 and K2 to K2 = 7.
5.6.4 Results Figure 5.4 compares the policy function ˆhζ (N , ln Z) from a second-order perturbation solution to the projection solution. There is a marked difference between both solutions. The former slopes upward if both current employment N and current labor productivity ln Z decrease, whereas the latter decreases. In the present model, it is therefore important to include information from more than the stationary solution when computing the policy function. Consequently, simulations of the model with the perturbation solution imply that unemployment remains below a certain threshold. In Figure 5.5, the left panel shows the distribution of unemployment rates computed from a simulation with 480,000 periods. The largest value of unemployment observed is equal to U = 17.27 percent. The right panel of the figure displays the distribution of unemployment computed from the projection solution with the same starting point (the stationary solution) and the same random numbers for the process (5.31k). The right tail of the histogram indicates that much larger unemployment rates occur. The largest value observed is equal to U = 77.34 percent. Hence, there are periods during which the economy endogenously enters economic disasters with unusually high unemployment and low consumption.
5.6 The Benchmark Search and Matching Model
273 Projection
Employment Value ζ
Perturbation
10
0
0.1 0.2
0.4
0 0.6
−0.1
0.8
Employment N
g
Lo
FP
ln
Z
T
Figure 5.4 Policy Function for the Value of Employment
Figure 5.6 reveals such an event. It pictures the time path of unemployment between period t = 3001 and t = 4000. While the unemployment Projection
Perturbation
12.5
2.0
10.0
Percent
1.5
7.5 1.0 5.0 0.5
2.5
0.0 0.08
0.12
0.16 Unemployment Rate
0.0 0.00
0.20
0.40
0.60 Unemployment Rate
0.80
Figure 5.5 Distribution of Unemployment in the Search and Matching Model
274
5 Weighted Residuals Methods
rate remains below 20 percent during the entire period according to the perturbation solution, it increases beyond 30 percent in period t = 3364 according to the weighted residuals solution. Perturbation Projection
Unemployment Rate
0.30
0.20
0.10
3,000 3,100 3,200 3,300 3,400 3,500 3,600 3,700 3,800 3,900 4,000
Month Figure 5.6 Simulated Time Path of Unemployment
Petrosky-Nadeau et al. (2018) compute the average size and occurrence of economic crises defined as a cumulative decrease in consumption or output of at least 10 percent between the peak and the trough of a cycle and compare simulations of their model with empirical data. They find that the average size of a crises in consumption is close to that found in the data. The probability of entering a crisis, however, is much smaller in their model than in the data. We do not pursue these computations here, since our goal is to provide an example, where the predictions from the model simulations differ widely between those obtained from a perturbation solution and from a global method. Heiberger (2020) argues that the potential of the search and matching model to generate infrequent but severe economic downturns rests on the large value of unemployment compensation, which is independent of the state of the economy. The wage equation (5.31g) together with the small bargaining power of workers implies that the real wage hardly responds to productivity shocks. A series of unfavorable shocks can thus drive down the firm’s profits so that investment in job offers comes to a halt and the economy slides into a crisis since the exogenous separation of workers from jobs continues. We encourage the reader to experiment with different
5.7 Disaster Risk Models
275
calibrations of the parameters b and ϕ in the script Search_CGT.m to establish this claim. In the next subsection, we explore the role of a shock process that exogenously triggers economic crises to solve the equity premium puzzle.
5.7 Disaster Risk Models 5.7.1 Motivation The interest in disaster risk models stems from papers by Rietz (1988), Barro (2006), and Barro and Ursúa (2012), who argue that disaster risk is a key to solving the equity premium puzzle raised by Mehra and Prescott (1985). The puzzle paraphrases the inability of the standard real business cycle model to replicate the empirically observed size of the spread between the return on stocks and the interest earned from holding relatively riskless securities such as treasury bills or government bonds. Since the seminal article by Mehra and Prescott (1985) numerous papers have considered the empirical validity of the premium and have provided arguments to resolve the puzzle. The chapters in the handbook edited by Mehra (2008) compile much of this research. The data base compiled by Jordà, Knoll, Kuvshinov, Schularick, and Taylor (2019), covering annual return rates from 1870 through 2015 for 16 countries, confirms earlier findings. Table 5.5 reports average returns, standard deviations of returns, equity premia, and Sharpe ratios across time and countries from this study. The equity premium is the difference between the return on equity and the riskless rate. Indicators of the latter are either the return on Treasury Bills or the return on long term government bonds. The Sharpe ratio is the equity premium per unit of the standard deviation of the equity return. If we use the real return on bills (bonds) as the indicator of the riskless rate, the equity premium is 5.75 (4.35) percentage points and the Sharpe ratio to 0.26 (0.20). As Jordà, Schularick, and Taylor (2019) put it: “The risk premium puzzle is worse than you think.” Here, we will study the ability of several models with disaster risk to reproduce the data in Table 5.5. To disentangle the contribution of the various features of the final model, we proceed from a simple to a relatively complex model. The state space of the latter model has five dimensions, so we employ the Smolyak collocation method (see Section 13.9.4) to avoid
276
5 Weighted Residuals Methods Table 5.5 Data on Global Real Returns
Mean Standard Deviation
Bills
Bonds
Equity
1.03 6.0
2.53 10.69
6.88 21.79
Equity Sharpe Equity Sharpe Premia Ratio Premia Ratio Bills Bills Bonds Bonds 5.75
0.26
4.35
0.20
Notes: Return data are from Table II of Jordà et al. (2019). The implied equity premia are the difference between the return on equity and the respective yield on bills or bonds. The Sharpe ratios are equity premia per unit of the standard deviation of equity returns.
the curse of dimensionality. To ensure the results are comparable, we also employ this method for the simpler models with a smaller state space. We start with the benchmark business cycle model with an additional binary shock that indicates whether the economy is in a severe recession. Next, we replace the standard additively separable intertemporal utility function with the generalized expected utility (GEU) function introduced by Epstein and Zin (1989) and Weil (1990). To this model, we add adjustment costs of capital as in Jermann (1998). Finally, we consider a time varying size of disasters and a time varying disaster probability similar to Gourio (2012). Our final model still lacks some desirable features of an asset pricing model. For instance, we do not consider leverage as in Gourio (2012) or include additional assets as in Fehrle and Heiberger (2020). The following sections serve to guide the reader through the various steps to develop and solve a complex model and to demonstrate the virtues of a global solution methods over the much simpler to implement perturbation method.
5.7.2 The Benchmark Business Cycle Model with Disaster Risk TECHNOLOGY. We introduce disaster risk into the model presented in Example 1.6.1 by modifying the production technology for output Yt of the representative firm. Instead of equation (1.59), we employ Yt = K tα (A t L t )1−α ,
α ∈ (0, 1).
(5.34)
K t and L t denote rented capital and employed labor, respectively. The labor-augmenting technical progress A t grows stochastically at the rate
5.7 Disaster Risk Models
a t :=
277
At = a¯ ez t −ωd t , A t−1
z t = ρz z t−1 + εz t ,
a¯ ≥ 1, ω ≥ 0, d t ∈ {0, 1},
ρz ∈ (−1, 1), εz t iid N (0, σz2 ).
(5.35a) (5.35b)
The state d t = 1 indicates a severe recession, that is, an economic disaster. The probability of such an event occurring is the constant p. The time independence implies that the probability of entering a diaster and that of remaining in a disaster state are identical: Prob(d t = 1|d t−1 = 0) = Prob(d t = 1|d t−1 = 1) = p, Prob(d t = 0|d t−1 = 0) = Prob(d t = 0|d t−1 = 1) = 1 − p.
(5.36)
An economic disaster lowers productivity growth from a t = ez t to a t = ˜ t so that capital ez t −ω and destroys part of the existing capital stock K employed in production is equal to ˜t . K t = e−ωd t K In their empirical study of economic crises Barro and Ursúa (2008) use the peak-to-trough method and define a crisis as a cumulative decline in either consumption or GDP of at least 10%. Their data set includes 42 countries and covers the period 1870 to 2006. They identify 247 crises with an average size of 21-22 percent and a probability to occur of 3.5 percent per year. More recent examples of severe crises are the Great Recession 2007-2009 and the Covid Pandemic 2020-2022. DYNAMICS. All other elements of the model remain the same. However, since A t is now stochastic and changes at the beginning of the period, we cannot scale capital by A t since k t = K t /A t would not be predetermined. We therefore scale capital as well as all other growing variables by A t−1 : x t :=
Xt , X t ∈ {A t , C t , I t , K t , Wt , Yt }. A t−1
C t , I t , and Wt denote consumption, investment, and the real wage, respectively. The first-order condition for consumption, condition (1.63a), requires scaling the Lagrange multiplier of the budget constraint Λ t as η
λ t := Λ t A t−1 . Accordingly, the following set of equations determines the dynamics of the economy together with the processes (5.35) and (5.36):
278
5 Weighted Residuals Methods −η
0 = c t (1 − L t )θ (1−η) − λ t , 0 = θ c t − w t (1 − L t ), 0 = k t − e−ωd t ˜k t , 0=
0 = rt − α
yt , kt
(5.37b) (5.37c)
y t − kαt (a t L t )1−α ,
0 = w t − (1 − α)
(5.37a)
(5.37d)
yt , Lt
(5.37e) (5.37f)
0 = yt − ct − it , 0 = a t ˜k t+1 − (1 − δ)k t − i t , § ª λ t+1 −ωd t+1 −η 0 = 1 − β at Et e (1 − δ + r t+1 ) . λt
(5.37g) (5.37h) (5.37i)
CALIBRATION. Table 5.6 presents the choice of the model’s parameters. The period length is one quarter. Except for the size and probability of a disaster, the parameter values refer to the German economy and reflect quarterly observations from the first quarter of 1991 to the fourth quarter of 2019. We take the probability p of entering a disaster period from a non-disaster quarter from Fernández-Villaverde and Levintal (2018), Table 1 and the size of the shock ω from Fehrle and Heiberger (2020), Table 7. Table 5.6 Calibration of the Benchmark Model with Disaster Risk Preferences Production Disaster Shock Capital Accumulation
β=0.996 η=2.0 L=0.126 a¯=1.0028 α=0.36 ρz =0.0 σz =0.0116 p=0.0043 ω=0.067 δ=0.016
PERTURBATION SOLUTION. Perturbation methods are not really suited to the present model due to the binary variable d t . However, transforming d t into a continuous random variable allows us to derive an approximate model that fits into the perturbation framework set out in Section 2.5.2. The unconditional expectation of an economic disaster is equal to µ = p × 1 + (1 − p) × 0 = p
5.7 Disaster Risk Models
279
and its variance is given by σ2 = p(1 − p)2 + (1 − p)(0 − p)2 = p(1 − p).
Accordingly, let s t denote a continuous random variable with mean E(s t ) = 0 and variance E(s2t ) = 1. Then define Æ d˜t = µ + σs t = p + p(1 − p)s t (5.38)
so that d˜t is a continuous random variable with the same mean and the same variance as the original binary random variable d t . The model in equations (5.37) with d t replaced by d˜t can be solved with perturbation methods. These methods require solving for the stationary solution of the deterministic version of a model. To obtain this solution, we must substitute d˜t with its unconditional expectation p, assume z t = 0 for all t, and vt = vt+1 = v for all variables v of the model. From equation (5.35) we obtain a = a¯ e−ωp .
(5.39a)
Given this value and the parameters β, α, δ, and η we can use equations (5.37f) and (5.37i) to solve for the capital-output ratio y aη eωp − β(1 − δ) = . k αβ
(5.39b)
Using (5.37d) gives the capital-labor ratio y α−1 k =a . L k 1
(5.39c)
For a given value of working hours L, this solution delivers the capital stock k and the stationary value of output y. Hence, the solution of the capital stock before the realization of the disaster variable follows form equation (5.37c) and is equal to ˜k = eωp k.
(5.39d)
Equations (5.37e) and (5.37f) allow us to solve for the stationary real wage w and the stationary rental rate of capital r. The stationary value of investment follows from equation (5.37f): i = (¯ a − 1 + δ)k.
(5.39e)
280
5 Weighted Residuals Methods
Equation (5.37b) allows us to infer the value of the parameter θ as θ = (1 − L)
w c
(5.39f)
so that we obtain λ from equation (5.37a). The model has one endogenous state variable ˜k t , two shocks (z t , s t ), and nine variables that are not predetermined. v t := (a t , y t , c t , i t , Nt , k t , w t , r t , λ t ). The model equations capture the process (5.35) and the system (5.37), where d t is replaced by d˜t as defined in equation (5.38). The code for the R perturbation solution is part of the MATLAB script DR_V1.m. The system (5.37) is encoded in the function DRV1_Eqs_SD.m. SMOLYAK COLLOCATION SOLUTION. The state of the model is characterized by the triple (˜k t , z t , d t ). We approximate the policy function for hours by two Smolyak polynomials (see Section 13.9.4), one for normal periods and one for periods with economic disasters. The univariate polynomials that build the Smolyak polynomial are Chebyshev polynomials defined on the interval [−1, 1]. Hence, as in Section 5.5, we must choose bounds on ˜k and z and map the elements from ˜k ∈ [k, k] and z ∈ [z, z] to [−1, 1]. To avoid confusion with the productivity shock z t , we use x ∈ [−1, 1] to refer to the arguments of the Chebyshev polynomials. Accordingly, we now define the functions ξk and ξz as (see also equation (13.28a)): ˜k = ξk (x k )=k + z = ξz (x z ) =z +
(1 + x k )(¯k − k)
2 (1 + x z )(¯ z − z) 2
,
x k ∈ [−1, 1],
(5.40a)
,
x z ∈ [−1, 1].
(5.40b)
The Smolyak polynomial at the point (x k,i , x z,i ) is the inner product of the Smolyak base b(x k,i , x z,i ) and the vector of polynomial coefficients φ. Accordingly, the two policy functions for hours are: ˆh L (ξk (x k,i ), ξz (x z,i ), d) := b(x k,i , x z,i ) T φ , d ∈ {0, 1}. d
(5.41)
The number of points in the base, which we denote by n b , depends on the dimension of the state space ns and the level of approximation µ = 1, 2, . . . . For instance, in our two dimensional state space, the base has 13 elements for µ = 2 and 29 for µ = 3 (see Judd et al. (2014), Table 1). The Smolyak collocation determines the coefficients φ 0 and φ 1 such that the residuals of the Euler equation (5.37i) vanish on the Smolyak
5.7 Disaster Risk Models
281
grid.17 This grid is a collection of points (x k,i , x z,i ) ∈ [−1, 1] × [−1, 1] determined by the algorithm. For the entire base, a matrix of size n b × n b , we write b(x k,1 , x z,1 ) T b(x k,2 , x z,2 ) T B := (5.42) .. . b(x k,n b , x z,n b ) T
so that hours at state d on all collocation points follow from Bφ d . COMPUTATION OF RESIDUALS. The n b residuals of equation (5.37i) for d ∈ {0, 1} are the solution to the following steps. First, given B and the vector φ d compute for i = 1, 2, . . . , n b ˜ki = ξk (x k,i ),
(5.43a)
zi = ξz (x z,i ),
(5.43b)
L i,d = b(x k,i , x z,i ) T φ d ,
(5.43c)
zi −ωd
(5.43d)
ai,d = a¯ e , ki,d = ˜ki e−ωd , yi,d =
w i,d = (1 − α) ri,d = α ci,d = λi,d
(5.43e)
α ki,d (ai,d L i,d )1−α ,
yi,d
yi,d
L i,d
(5.43f)
,
(5.43g)
, ki,d w i,d (1 − L i,d )
(5.43h)
, θ −η = ci,d (1 − L i,d )θ (1−η) ,
ii,d = yi,d − ci,d ,
˜k0 = i,d
(1 − δ)ki,d + ii,d ai,d
.
(5.43i) (5.43j) (5.43k) (5.43l)
0 Second, given the next-period capital stock ˜ki,d and zi,d , compute the expectation on the right-hand side of equation (5.37i) via Gauss-Hermite 17
See Section 13.9.4 on the construction of this grid. Note in particular that x k,i , zz,i ∈ [−1, 1] refer to extrema of Chebyshev polynomials and that the index i refers to an element of the Smolyak base but does not imply that x k,i is the same extrema as x z,i .
282
5 Weighted Residuals Methods
integration with nodes x GH and weights νGH j j , j = 1, 2, . . . nGH : In the system (5.43), replace x k,i , x z,i , and d as follows: ˜0 x k,i ← ξ−1 k ( ki ),
x z,i, j ← ξ−1 z (ρ Z zi +
p
2σz x GH j ),
d ← d 0.
and compute the (scaled) Lagrange multiplier λ and the rental rate of capital r via the steps (5.43c) through (5.43j). For brevity, let p 0 0 si, j,d 0 := ˜ki,d , ρz zi + 2σz x GH j ,d summarize the new state vector, and let λ(si, j,d 0 ) and r(si, j,d 0 ) denote the values of λ and r at this point. Then, the residual r es(x k,i , x z,i , d) is (approximately) equal to r es(x k,i , x z,i , d) =1 − (1 − +
−η p)β ai,d
−η pβ ai,d
nGH X λ(si, j,0 ) j=1
nGH X λ(si, j,1 ) j=1
λi,d
λi,d
GH
νj 1 − δ + r(si, j,0 ) p
π
GH
νj 1 − δ + r(si, j,1 ) p . π
Note that these computations can be encoded in terms of matrix operations as shown in the code of the function DR_V1_Eqs_Smolyak.m that returns the 2n b vector of residuals. EQUITY PREMIUM. The gross return on one unit of capital installed in period t and used in production in period t + 1 is equal to R t+1 = e−ωd t+1 (1 − δ + r t+1 ).
(5.44)
The return on a bond, purchased in period t, which returns one unit of consumption in period t + 1 independent of the state of the economy, equals the inverse of the expected stochastic discount factor, i.e.: f
Rt =
1 , E t m t+1
−η
E t m t+1 = β a t E t
λ t+1 . λt
(5.45)
The equity premium is the expected excess of the return on capital over the riskless rate:
5.7 Disaster Risk Models
283
f
E Pt = E t (R t+1 − R t ). We estimate this premia as average from a model simulation. In each period t, we use the Gauss-Hermite quadrature to approximate E t m t+1 with a two-step procedure. Given the state (˜k t , z t , d t ), we compute p a t , λ t , and ˜k0 from equations (5.43c)-(5.43l). For s t, j,d 0 := (˜k0 , ρz z t + 2σz x GH , d 0 ), t t j we compute λ t+1 with the new state vector s t, j,d 0 in place of (˜k t , z t , d) so that (approximately)
GH GH nGH nGH X X ν ν λ(s t, j,0 ) j λ(s t, j,1 ) j −η E t m t+1 = β a t (1 − p) p +p p . (5.46) λ λ π π t t j=1 j=1
The model simulations start at the deterministic stationary solutions (˜k, z = 0, d = 0). We discard the first 1,000 periods and compute the financial statistics from the remaining 100,000 quarters. In the next step, we time aggregate four quarters to one year so that the average returns and standard deviations refer to annual data as in Table 5.5. The simulations include periods with disasters. The implication of this procedure will become obvious in the following paragraph. RESULTS. Table 5.7 presents annualized rates of return from a simulation of the model over 100,000 periods. The first two rows of results are from the version of the model without disaster risk.18 The next four rows present results for two different calibrations of the disaster size. The returns include periods with economic disasters. The final two rows present results that exclude disaster periods. It is well known that the benchmark business cycle model is not able to replicate the empirically observed equity premium. The presence of disasters does not change this picture, though the equity premium is approximately one order of magnitude larger than in the model without disasters. The model requires a much larger impact from the disaster to generate an equity premium of over one percentage point. The value of ω = 0.5108, which achieves this result, is taken from Fernández-Villaverde and Levintal (2018). Note that the larger premium is not the result of an increased return on equity but a smaller riskless return. The intuition behind this result is simple: The household wants to substitute away from 18
Here and in the following tables, we ignore rows with parameter values and count only rows with financial statistics.
284
5 Weighted Residuals Methods Table 5.7 Annualized Real Returns in the Benchmark Model Rf
sR f
R
sR
EP
SR
p = 0 and ω = 0 Perturbation 3.88 Projection 3.88
0.43 0.43
3.88 3.88
0.44 0.44
0.001 0.002 0.001 0.002
p = 0.0043 and ω = 0.067 Perturbation 3.86 Projection 3.86
0.43 0.43
3.87 3.87
0.99 0.99
0.016 0.016 0.016 0.016
p = 0.0043 and ω = 0.5108 Perturbation 2.55 Projection 2.55
0.44 0.43
3.84 3.79
5.52 5.52
1.29 1.24
0.23 0.23
p = 0.0043, ω = 0.5108, d = 0 Perturbation 2.55 Projection 2.55
0.44 0.43
4.57 4.51
0.45 0.45
2.02 1.97
4.50 4.39
Notes: R f : riskless rate, sR f : standard deviation of riskless rate, R: return on equity, sR : standard deviation of equity return, E P: equity premium, SR: Sharpe ratio. Averages from simulations with 100,000 periods.
riskier capital to save bonds so that the price of bond increases and their return declines. In the deterministic version of the model, the gross riskless rate is equal to aη /β. For the values of the three parameters from Table 5.6, the annual rate amounts to 3.91%. Therefore, the effect of potential disasters, namely to drive down the riskless rate from 3.91% to 2.55%, appears only in stochastic simulations of the model. The perturbation and the projection solution differ only slightly with respect to the implied return on equity in the case of a large disaster (compare the entries in column R in rows five and six and in rows seven and eight, respectively). The results in the last two rows exclude periods in which the economy is in a disaster state. Compared with the preceding two rows, they illustrate three points: first, the equity premium emerges primarily from the effect of potential economic disasters on the riskless rate. Disasters must not be realized to drive down the save return. Second, as expected, because disasters destroy capital, equity earns higher returns if disasters do not
5.7 Disaster Risk Models
285
happen. Therefore, the equity premium is 0.73 percentage points higher in simulations without disasters. Third, and related to the second point, disasters considerably increase the volatility of equity returns. Note that the return statistics in Table 5.5 include the impact of severe recessions (see Jordà et al. (2019), p. 1242). Accordingly, our simulation results presented below include disaster events (if not stated otherwise).
5.7.3 Generalized Expected Utility In this section, we replace the additively separable intertemporal utility function with the GEU function introduced by Epstein and Zin (1989) and Weil (1990). The production technology, the market structure, and the properties of both the technology shock and the disaster shock remain the same as in the previous section. PREFERENCES. Let 1 1−η 1−η 1−η 1−γ 1−γ 1−η θ ˜ t = (1 − β a ) C t (1 − L t ) ˜ U + β Et U , t+1
(5.47)
β ∈ (0, 1), η, γ ∈ R>0 \ {1}
denote the household’s intertemporal utility function. For a given weight on leisure θ , the additional parameter γ determines the coefficient of relative risk aversion R(C, L) = γ − θ (1 − γ).19 The inverse of the parameter η is the IES with respect to u t := C t (1 − L t )θ . The function on the righthand side of this equation is a CES aggregator of current utility derived from consumption C t and leisure 1 − L t and the certainty equivalent of ˜ 1−γ )1/(1−γ) . The parameter a is equal to the deterministic future utility (E t U t+1 steady-state value of the factor a t defined in (5.35a).20 The function reduces to the time additive intertemporal utility function, if η = γ. The case η < γ characterizes a representative consumer, who dislikes the uncertainty arising from the ex ante unknown realization of consumption and leisure more than intertemporal fluctuations in expected utility. The opposite holds for η > γ (see Weil (1990), pp. 32f.). 19
See Swanson (2018), p. 300 and compare this with (1.52a), which applies to the case of the additively separable intertemporal utility function. 20 The effect of introducing the term 1 − β a1−η in front of C t (1 − L t )θ ) is a convenient solution for lifetime utility in the deterministic version of the model. See equation (5.55) below.
286
5 Weighted Residuals Methods
FIRST-ORDER CONDITIONS OF THE HOUSEHOLD. Note that (5.47) implies 1 1−η 1−η −1 η 1−γ 1−γ 1−η θ 1−η ˜ ˜ U t = (1 − β a ) C t (1 − L t ) + β E t U t+1 , and
˜ t+1 = (1 − β a1−η ) C t+1 (1 − L t+1 ) U
θ 1−η
+β
˜ 1−γ E t+1 U t+2
1−η 1−γ
1 1−η
.
Using ˜ t + (1 − δ)e−ωd t K ˜t − K ˜ t+1 , C t = Wt L t + r t e−ωd t K
˜ t+1 + (1 − δ)e−ωd t+1 K ˜ t+1 − K ˜ t+2 C t+1 = Wt+1 L t+1 + r t+1 e−ωd t+1 K to replace consumption in both periods in equation (5.47) and differentiating the resulting function with respect to hours L t and the future capital ˜ t+1 provides the first-order conditions: stock K § ª ˜t ∂U Ct ˜ tη C t−η (1 − L t )θ (1−η) Wt − θ = (1 − β a1−η )U = 0, ∂ Lt 1 − Lt (5.48a) η ˜t ˜t U ∂U −η = (1 − η)(1 − β a1−η )C t (1 − L t )θ (1−η) (5.48b) ∂ K t+1 1−η 1−η ˜ 1−η 1−γ 1−γ −1 −γ ∂ U t+1 ˜ ˜ E t U t+1 (1 − γ)E t U t+1 =0 + ˜ t+1 1−γ ∂K
where
˜ t+1 ∂U ˜ η (1 − β a1−η )C −η (1 − L θ (1−η) )e−ωd t+1 (1 − δ + r t+1 ). =U t+1 t+1 t+1 ˜ ∂ K t+1 (5.48c) Accordingly, the first-order condition for labor supply is the same as in the benchmark model. Combining (5.48b) and (5.48c) yields the first-order condition for capital accumulation: Λ t = βE t Λ t+1 −η
˜ 1−γ U t+1
η−γ 1−γ
˜ 1−γ Et U t+1
Λ t := C t (1 − L t )θ (1−η) .
e−ωd t+1 (1 − δ + r t+1 ) ,
(5.49)
5.7 Disaster Risk Models
287
For η = γ, the term in square brackets drops out and we again have the benchmark model. To derive the dynamics of the model in stationary variables, let us first define the variable 1−η
˜t U t := U
(5.50)
and the parameter χ := 1 −
1−γ γ−η = 1−η 1−η
(5.51)
Accordingly, equation (5.47) implies 1−η
U t = (1 − β a1−η )C t
Let
1 1−χ 1−χ (1 − L t )θ (1−η) + β E t U t+1 .
η−1
u t := U t A t−1 and 1−χ
vt := E t u t+1
(5.52)
so that the scaled life-time utility has the recursive definition 1−η
u t = (1 − β a1−η )c t
1−η
(1 − L t )θ (1−η) + β a t
1 1−χ
vt
and the Euler equation (5.49) can be written as −χ u −η t+1 λ t = β a t E t λ t+1 e−ωd t+1 (1 − δ + r t+1 ). 1/(1−χ) vt
(5.53)
(5.54)
DYNAMICS AND CALIBRATION. The dynamics of the model consists of equations (5.37a)-(5.37h). Equation (5.54) replaces the Euler equation (5.37i), and the two equations (5.52) and (5.53) determine the additional variables u t and vt . For a given value of η, the model has the same deterministic stationary solution as the model with the standard intertemporal utility function. Equations (5.52) and (5.53) imply u = c 1−η (1 − L)θ (1−η) .
(5.55)
288
5 Weighted Residuals Methods
For this reason, we continue to use the value of the parameter η from Table 5.6. Caldara et al. (2012) consider values for the parameter γ in the set {2, 5, 10, 40}. We select the smallest value from this set that differs from our choice of η and put γ = 5. The values of all other parameters are equal to those in Table 5.6. SMOLYAK COLLOCATION. The model has the same state variables as the model in the previous section. In addition to the policy function for hours h L we need the policy function for utility hu . Accordingly, we have twice as many parameters as in the benchmark model: ˆh L (ξk (x k,i ), ξz (x z,i ), d) = b(x k,i , x z,i )φ L , d
(5.56)
ˆhu (ξk (x k,i ), ξz (x z,i ), d) = b(x k,i , x z,i )φ u , d ∈ {0, 1}. d
(5.57)
The residuals of the Euler equation (5.54) and of equation (5.53) at the collocation nodes in the matrix B supply the required number of conditions. The steps to compute the residuals are basically the same as in the previous 0 section. In the first step, we compute ai,d , ci,d , L i,d , λi,d , and ˜ki,d from equations (5.43c)-(5.43l) and ui,d from equation (5.57). In the second step, we solve the system at the points p 0 0 si, j,d 0 := ˜ki,d , ρz zi + 2σz x GH , d j for the variables λ, u, and r. The expected value on the rhs of equation (5.52) is then approximately equal to vi,d = (1 − p)
nGH X j=1
1−χ
u(si, j,0 )
nGH X νGH νGH j 1−χ j u(si, j,1 ) p +p p π π j=1
(5.58)
so that the residual of equation (5.53) follows from 1−η
r esu (x k,i , x z,i , d) =ui,d − (1 − β a1−η )ci,d (1 − L i,d )θ (1−η) 1−η
1 1−χ
− β ai,d vi,d . The residual of the Euler equation (5.54) is (approximately) equal to
5.7 Disaster Risk Models
289
−η r es k (˜ki , zi , d) = 1 − (1 − p)β ai,d
−
−η pβ ai,d
nGH X λ(si, j,0 )
nGH X λ(si, j,1 ) j=1
−ω
!−χ
1/(1−χ)
vi,d
νGH j × (1 − δ + r(si, j,0 )) p π !−ν u(si, j,1 ) 1/(1−ν)
λi,d ×e
λi,d
j=1
u(si, j,0 )
vi,d
νGH j (1 − δ + r(si, j,1 )) p . π
EQUITY PREMIUM. The gross return on capital is still given by equation (5.44). The stochastic discount factor, however, differs from that in equation (5.45). Accordingly, the riskless rate is now equal to f Rt
1 := , E t m t+1
m t+1 =
−η λ t+1 β at λt
u t+1 1/(1−χ)
vt
−χ
.
(5.59)
Again, we employ the Gauss-Hermite quadrature to approximate E t m t+1 in a two-step procedure. Given the state (˜k t , z t , d t ), we compute a t , λ t , and ˜k0t from equations (5.43c)-(5.43l). For p 0 s t, j,d 0 := ˜k0t , ρz t + 2σz x GH j ,d we solve the system (5.43) for λ t+1 , compute u t+1 from (5.57), and vt from (5.58) so that −ν GH nGH X νj λ(s t, j,0 ) u(s t, j,0 ) −η E t m t+1 = (1 − p)β a t 1/(1−ν) λt π vt j=1 (5.60) −ν GH nGH X νj λ(s t, j,1 ) u(s t, j,1 ) −η + pβ a t . 1/(1−ν) λt π vt j=1 RESULTS. Table 5.8 presents summary financial statistics from simulations of the model. The interested reader finds the respective computer code in the script DR_V2.m. As the model with standard preferences, the model with GEU and without disaster risk is unable to predict a non-negligible equity premium.
290
5 Weighted Residuals Methods Table 5.8 Annualized Real Returns with GEU Rf
sR f
R
sR
EP
SR
γ = 5, p = 0 and ω = 0 Perturbation 3.78 Projection 3.78
0.43 0.43
3.79 3.78
0.43 0.43
0.004 0.01 0.004 0.01
γ = 5, p = 0.0043 and ω = 0.067 Perturbation 3.72 Projection 3.72
0.43 0.43
3.77 3.77
0.99 0.99
0.05 0.05
0.05 0.05
γ = 5,p = 0.0043 and ω = 0.36 Perturbation 0.99 Projection 0.48
0.42 0.40
3.45 3.09
4.16 4.14
2.46 2.61
0.59 0.63
β = 0.99, γ = 5, p = 0.0043, and ω = 0.42 Perturbation 1.69 Projection 1.02
0.48 0.46
5.86 5.25
4.83 4.80
4.17 4.23
0.86 0.88
Notes: R f : riskless rate, sR f : standard deviation of riskless rate, R: return on equity, sR : standard deviation of equity return, E P: equity premium, SR: Sharpe ratio. Averages from simulations with 100,000 periods, including disaster periods. Unless noted otherwise, all parameters are calibrated according to Table 5.6.
However, if the effect of an economic disaster increases, the riskless rate declines much more in this model than in the model with standard preferences. The channel for this effect is the rightmost term of the stochastic discount factor (5.59). Expected utility declines with increasing disaster impact and drives down the stochastic discount factor.21 For instance, with ω = 0.36, the riskless rate declines in the projection solution from 3.78 percent in the model without disasters to 0.48 percent. At the same time, the return on equity decreases only slightly so that the equity premium raises from zero to 2.61 percent. Further increases of the parameter ω drive the riskless rate below zero. A smaller discount factor β outweighs this effect. For instance, decreasing the discount factor to β = 0.99 as in Fernández-Villaverde and Levintal (2018) and increasing the effect of the 21
Note that η = 2 and γ = 5 imply χ = −3 from equation (5.51).
5.7 Disaster Risk Models
291
disaster to ω = 0.42 (still below the value of ω = 0.5108 employed by the same authors) implies an equity premium of 4.8 percentage points. Note that the perturbation solution does not fully capture the effect of disaster size on the riskless rate, a result also known from the study of Fernández-Villaverde and Levintal (2018). While the last row in Table 5.8 shows that a model with GEU and disaster risk is able to imply both a riskless rate and an equity premium close to the data in Table 5.5, our model fails to predict the standard deviations of asset returns. Both the riskless rate and the equity return series from the data are considerably more volatile than their counterparts in the model.
5.7.4 Adjustment Costs of Capital This section introduces adjustment costs of capital as in Jermann (1998) into the model presented in the previous section. As shown by Heiberger and Ruf (2019), the combination of adjustment costs of capital and GEU is able to reproduce the empirically observed equity premium, albeit only for an IES of 1/η close to zero and for a relatively small subjective discount factor of β = 0.987 (see their tables 1 and 2). CAPITAL ACCUMULATION. Instead of the usual law of capital accumulation ˜ evolves as we now assume that the capital stock K ˜ t+1 = (1 − δ)K t + Φ(I t /K t )K t K
(5.61)
and parameterize the function Φ as Jermann (1998): Φ(x) :=
ϕ1 1−κ x + ϕ2 , 1−κ
κ ∈ R≥0 \ {1}.
(5.62)
The deterministic stationary solution of the model does not change, if the function Φ satisfies Φ0 (¯ a − 1 + δ) = 1 and Φ(¯ a − 1 + δ) = a¯ − 1 + δ. Accordingly, the parameters ϕ1 and ϕ2 must be equal to ϕ1 = (¯ a − 1 + δ)κ , κ ϕ2 = (¯ a − 1 + δ). κ−1
(5.63)
FIRST-ORDER CONDITIONS OF THE HOUSEHOLD. The function Φ(x) is a strictly increasing function of its argument x so that its inverse
292
5 Weighted Residuals Methods
x = Φ−1 ( y) gives the value x = I t /K t associated with a desired increase of the produc˜ t+1 − 1 + δ)/K t . The tive capacity per unit of the existing capacity y := (K inverse is also differentiable (see, e.g., Lang (1997), Theorem 3.2, p. 75). ˜ t+1 : Hence, we can solve (5.61) for investment I t as a function of K t and K ˜ −1 K t+1 It = Φ − 1 + δ Kt Kt with derivative ∂ It 1 = Φ−1 (·)0 = 0 =: q t ˜ t+1 Φ (I t /K t ) ∂K
(5.64)
Thus, we can write the household’s budget constraints in both periods as ˜ −1 K t+1 C t =Wt Nt + r t K t − Φ − 1 + δ Kt , Kt ˜ t+1 C t+1 =Wt+1 Nt+1 + r t+1 e−ωd t+1 K ˜ t+2 K −1 ˜ t+1 . − 1 + δ e−ωd t+1 K −Φ ˜ t+1 e−ωd t+1 K ˜ t+1 yields the Differentiating the utility function (5.47) with respect to K first-order condition for capital accumulation. In the scaled variables, it replaces the Euler equation (5.54) and reads: qt =
−η β at Et
λ t+1 λt
u t+1
−χ
1/(1−χ)
vt i t+1 −ωd t+1 e r t+1 − + q t+1 (1 − δ + Φ(i t+1 /k t+1 )) , k t+1
(5.65)
−η
where the variables Λ t = A t−1 λ t and q t are defined in equations (5.49) and (5.64), respectively. The scaled version of equation (5.61) a t ˜k t+1 = (1 − δ)k t + Φ(i t /k t )k t replaces equation (5.37h).
(5.66)
5.7 Disaster Risk Models
293
STATIONARY SOLUTION AND CALIBRATION. As noted, the properties of the function Φ in equations (5.63) lead to the deterministic stationary solution coinciding with that of the model in the previous section. In comparison with the model in the previous section, the present model has one additional parameter κ, which is equal to the elasticity of the variable q t with respect to the investment-capital ratio i t /k t . Equation (5.64) tells us that q t units of investment increase the capital stock by one unit. Hence, we can interpret this variable as the relative price of capital in terms of the final good. This price is more sensitive to a desired increase in investment relative to the given capital stock, the larger is the parameter κ. We shall consider several values of this parameter within the interval of values employed in the literature. The values of all other parameters remain as presented in Table 5.6. We compute the solution of the model as we solved the model in the previous section. We only need to change the code for the residuals of the new Euler equation (5.65). The interested reader can consult the code in the function DRV2_Eqs_Smolyak.m. EQUITY PREMIUM. The return on one additional unit of investment is equal to the term in the second line of equation (5.65). Simple algebraic manipulations using equations (5.37f) and (5.66) lead to the equivalent formula R t+1 = e−ωd t+1
α y t+1 − i t+1 + q t+1 a t+1˜k t+2 . q t k t+1
(5.67)
Equation (5.59) for the riskless rate and the approximation in equation (5.60) are still valid. RESULTS. Table 5.9 presents financial statistics from three simulations of the model coded in the script DR_V3.m. We kept the disaster size at the highest value in Table 5.8 so that the model can produce a sizeable equity premium and we can focus on the effect of the parameter κ. Adjustment costs impair the intertemporal substitution of consumption and leisure and give rise to a variable relative price of capital. For the value of κ = 0.8 the standard deviation of the return on equity increases slightly and the riskless rate declines moderately. Increasing κ to 3.0 and
294
5 Weighted Residuals Methods Table 5.9 Annualized Real Returns with GEU and Adjustment Costs of Capital Rf
sR f
R
sR
EP
SR
2.55 2.72
0.59 0.63
2.66 2.86
0.56 0.60
2.72 3.00
0.53 0.58
γ = 5, ω = 0.36, κ = 0.8 Perturbation 0.95 Projection 0.44
0.25 0.24
3.50 3.15
4.31 4.29
γ = 5, ω = 0.36, κ = 3.0 Perturbation 0.93 Projection 0.41
0.14 0.15
3.58 3.26
4.77 4.75
γ = 5, ω = 0.36, κ = 6.0 Perturbation 0.92 Projection 0.39
0.11 0.11
3.64 3.40
5.18 5.18
Notes: R f : riskless rate, sR f : standard deviation of riskless rate, R: return on equity, sR : standard deviation of equity return, E P: equity premium, SR: Sharpe ratio. Averages from simulations with 100,000 periods, including disaster periods. If not noted otherwise, all parameters are calibrated according to Table 5.6.
6.0 in line with values from related studies22 raises the volatility in the return to equity so that the model comes closer to the empirically observed fluctuations in equity returns. Still, however, the implied Sharpe ratios of over 0.5 are approximately more than twice as large as those reported in Table 5.5.
5.7.5 Variable Disaster Size and Conditional Disaster Probability In this section, we extend the model in two directions. First, we replace the time independent size of an economic disaster ω by a stochastic process for this variable. Second, we relax the assumption of a state independent probability of remaining in a disaster state. The small probability of p = 0.0043 from Table 5.6 implies that it is very unlikely for a disaster to last more than one quarter. However, the historical data on severe economic 22
For instance, Jermann (1998), employs κ = 4.35, Heiberger and Ruf (2019) employ κ = 6.3, and Heer et al. (2018) use κ = 2 and κ = 6.
5.7 Disaster Risk Models
295
recessions compiled and analyzed by Barro and Ursúa (2012) imply an average duration of approximately 14 quarters. DISASTER SIZE PROCESS. We assume that the amount by which a disaster d t = 1 lowers the growth rate of labor augmenting technical progress and by which it destroys capital, e−ω t , follows the stochastic process ˆ t = ρω ω ˆ t−1 + εωt , ω 2 ˆ t = ln(ω t /ω), ¯ ρω ∈ (−1, 1), εωt iid N (0, σω ω ).
(5.68)
Consequently, the expected disaster size, the constant ω in the models of the previous sections, is now equal to23 2
2
¯ 0.5σω /(1−ρω ) . ω := E(ω t ) = ωe
(5.69)
The process for the growth factor is now a t :=
At = a¯ ez t −ω t d t , A t−1
(5.70)
where the growth factor shock in normal times still follows the process in equation (5.35b). The capital stock in production is now determined from the state variable ˜k t by k t = e−ω t d t ˜k t .
(5.71)
In the Euler equation (5.65) and in the definition of the equity return in equation (5.67), we must replace the constant ω by the time dependent variable ω t+1 . Otherwise, the model does not differ from the model in the previous section. 23
ˆ t has unconditional mean µ = E(ω ˆ t ) = 0 and unconditional variance The variable ω 2 2 ˆ t ) = σω σ2 = var(ω /(1 − ρω ). The expectation of the log-normally distributed variable X with ln X ∼ N (µ, σ2 ) is (see, e.g., Sydsæter et al. (1999), p. 189): 2
E(X ) = eµ+0.5σ . Accordingly: 2
2
¯ = e0.5σω /(1−ρω ) , E(ω t /ω) which explains (5.69).
296
5 Weighted Residuals Methods
ˆ t , d t ) repreSMOLYAK COLLOCATION SOLUTION. The quadruple (˜k t , z t , ω sents the state of the economy in period t. The Smolyak polynomials which approximate the policy function for hours and utility now employ tensor products of three univariate Chebyshev polynomials: ˆh L (ξk (x k,i ), ξz (x z,i ), ξω (x ω,i ), d) := b(x k,i , x z,i , x ω,i ) T φ L , d
(5.72)
ˆhu (ξk (x k,i ), ξz (x z,i ), ξω (x ω,i ), d) := b(x k,i , x z,i , x ω,i ) T φ u . d
(5.73)
The first step in computing the residuals of equations (5.65) and (5.53) thus does not differ from the first step for the model in the previous section. The second step, computing the expectations from Gauss-Hermite integration, now involves double sums over products of Gauss-Hermite nodes. We illustrate this step for the Euler equation (5.52) and refer the interested reader to the function DRV4_Eqs-Smolyak.m for further details. Let p p 0 GH 0 ˆ si, j,l,d 0 := ˜ki,d , ρz zi + 2σz x GH , ρ ω + 2σ x , d ω ω i j l denote the system state at the Gauss-Hermite nodes x GH and x lGH with j
weights νGH and νGH , j, l = 1, 2, . . . nGH and u(si, j,l,d ) the value of the j l policy function (5.73) at this point. The extension of equation (5.58) to the enlarged state space is then given by: vi,d =(1 − p) +p
nGH X nGH X j=1 l=1
nGH X nGH X j=1 l=1
u(si, j,l,0 )
u(si, j,l,1 )
1−χ
1−χ
GH νGH j νl
π GH νGH j νl
π
.
(5.74)
In the same way, we adapt the computation of the stochastic discount factor in equation (5.46). CONDITIONAL DISASTER PROBABILITY. To account for longer durations of economic disasters in the model, we modify assumption (5.36) to: Prob(d t = 1|d t−1 = 0) = p0 , Prob(d t = 1|d t−1 = 1) = p1 ,
p1 > p 0 ,
Prob(d t = 0|d t−1 = 0) = 1 − p0 , Prob(d t = 0|d t−1 = 1) = 1 − p1 .
(5.75)
5.7 Disaster Risk Models
297
Note that we cannot introduce a state dependent probability in the perturbation framework, but it is easy to adapt the equations for the residuals of the projection solution. For instance, given that d = 1 in equation (5.74), we set p = p1 and p = p0 otherwise. The same applies to the computation of the expected discount factor. CALIBRATION. Table 5.10 summarizes the calibration of the model. For the baseline calibration, we employ the parameter values for the process (5.68) from Fehrle and Heiberger (2020), Table 7, and choose κ = 3.0. Table 5.10 Model Calibration with Variable Disaster Size Preferences Production Disaster Shock Capital Accumulation
β=0.996 a¯=1.0028 p0 =0.0043 δ=0.016
η=2.0 α=0.36 ¯ ω=0.067 κ=3.0
γ=5.0 ρz =0.0 ρω =0.0
N =0.126 σz =0.0116 σω =0.67
RESULTS. Table 5.11 presents results from several simulations. The script DR_V4.m solves and simulates the model with variable disaster size and constant probability of entering a disaster (p1 = 0), whereas the script DR_V5.m computes the financial statistics for the model that additionally includes the state dependent probability of remaining in a disaster (p1 > 0). ¯ = 0.067 To interpret the results, note that the benchmark value of ω implies that the expected value of the disaster size computed from equation (5.69) is equal to 0.084. Therefore, the first two rows show a small equity premium of approximately 0.4 percentage points. The next two pairs of rows demonstrate that smaller values for the expected disaster size are sufficient to generate an equity premium of approximately the size observed in the model from the previous section for ω = 0.36. The ¯ = 0.125 implies ω = E(ω) ≈ 0.16 and an equity premium of value ω 2.85 percentage points from the projection solution compared to a premium of 2.86 percentage points in Table 5.9. The variability in the equity return, however, is smaller than in the model with a constant disaster size. Therefore, the Sharpe ratios are larger and thus diverge more from their empirical counterparts. Additionally, the tendency of the projection
298
5 Weighted Residuals Methods Table 5.11 Annualized Real Returns with Variable Disaster Size Rf
sR f
R
sR
EP
SR
0.36 0.41
0.14 0.16
0.79 2.26
0.25 0.71
0.86 2.85
0.26 0.88
0.95 3.34
0.28 0.99
¯ = 0.067, d = 1 p1 = 0, ω Perturbation 3.52 Projection 3.43
0.15 0.15
3.88 3.84
2.64 2.63
¯ = 0.12, d = 1 p1 = 0, ω Perturbation 3.03 Projection 1.38
0.15 0.15
3.82 3.64
3.19 3.20
¯ = 0.125, d = 1 p1 = 0, ω Perturbation 2.96 Projection 0.76
0.15 0.15
3.81 3.61
3.25 3.26
¯ = 0.125, d = 1 p1 = 0.1, ω Perturbation 2.84 Projection 0.11
0.89 2.15
3.79 3.46
3.39 3.37
¯ = 0.125, d = 1 β = 0.99, p1 = 0.15, ω Perturbation 5.81 Projection 2.18
0.97 3.22
6.32 5.85
3.41 3.36
0.51 3.67
0.15 1.09
¯ = 0.125, d = 1 β = 0.99, p1 = 0.25, ω Perturbation 5.71 Projection 1.09
1.75 4.95
6.29 5.48
3.64 3.60
0.58 4.39
0.16 1.22
¯ = 0.125, d = 0 β = 0.99, p1 = 0.25, ω Perturbation 5.94 Projection 1.75
0.15 0.17
6.62 5.81
2.17 2.23
0.68 4.07
0.31 1.82
Notes: R f : riskless rate, sR f : standard deviation of riskless rate, R: return on equity, sR : standard deviation of equity return, E P: equity premium, SR: Sharpe ratio. Averages from simulations with 100,000 periods. d = 1 indicates results that include disaster periods. Unless otherwise noted, the parameters are calibrated according to Table 5.10.
solution to overstate the riskless rate is more pronounced in the model with a variable disaster size.
5.7 Disaster Risk Models
299
The remaining rows of the table reflect the effect of prolonged disasters. Already, the small value of p1 = 0.1 reduces the riskless rate in the projection solution from 0.76 percent to 0.11 percent so that the equity premium increases to 3.34 percentage points. A second effect is the considerable increase in the standard deviation of the riskless rate by two percentage points from 0.15 to 2.15. In this respect, the model estimate is closer to the empirically observed sizable volatility in the riskless rate. As observed, if we want to increase the equity premium further by increases of p1 , we must raise the deterministic riskless rate. The obvious way to accomplish this increase is reducing the discount factor β. The combination of β = 0.99 and p1 = 0.25 implies an equity premium of more than four percentage points, predicting a riskless rate that nearly matches the empirically observed bill rate with a standard deviation only one percentage point below the empirical standard deviation of the bill rate in Table 5.5. The last two rows of the table highlight the effect of excluding disaster periods. We already know their effect on the equity return. However, the longer duration is also responsible for the smaller riskless rate and their large standard deviation. The latter drops from 4.95 percentage points to 0.17 percentage points if we disregard periods with disasters in computing the financial statistics. Finally, note the large differences between the perturbation and the projection solution with respect to the average riskless rate and the standard deviation of this variable in simulations with conditional disaster risk. As explained above, the parameter p1 enters into the coefficients of policy functions from the projection solution but not from the perturbation solution.
5.7.6 The Full Model In this section, we introduce a further shock, namely a time varying probability of entering a disaster. DISASTER PROBABILITY. We replace the constant p0 in the process (5.75) by the process ˆp t = ρ p ˆp t−1 + ε pt , ˆp t = ln(p t /¯p), ρ p ∈ (−1, 1), ε pt iid N (0, σ2p ).
(5.76)
300
5 Weighted Residuals Methods
With this process, assumption (5.75) now reads:24 Prob(d t+1 = 1|d t = 0) = min{p t , 1}, Prob(d t+1 = 1|d t = 1) = max{p1 , min{p t , 1}}, Prob(d t+1 = 0|d t = 0) = 1 − min{p t , 1}
(5.77)
Prob(d t+1 = 0|d t = 1) = 1 − max{p1 , min{p t , 1}}.
The process (5.76) has unconditional mean E(ˆp t ) = 0 and variance σ2p /(1 − ρ p2 ), so the unconditional expected value of p t is equal to (see footnote 23): 2
2
p = ¯p e0.5σp /(1−ρp ) . PERTURBATION SOLUTION. For the perturbation solution, we substitute the binary variable d t by the continuous random variable Æ d˜t+1 = ¯p e ˆp t + ¯p e ˆp t (1 − ¯p e ˆp t )s t , ˆp t := ln(p t /¯p).
As in equation (5.38), the variable s t has mean E(s t ) = 0 and variance E(s2t ) = 1. Since ˆp t = 0 at the deterministic stationary solution, we must adapt the computation of the deterministic stationary solution in equations (5.39a), (5.39b), and (5.39d). For a given value of the growth factor a, the variable a¯ now follows from: a = a¯ e−ω¯p ,
(5.78a)
the capital-output ratio derived from the Euler equation (5.65) is equal to y aη eω¯p − β(1 − δ) = , k αβ
(5.78b)
and the capital stock before realizing the disaster shock is given by ˜k = eω¯p k.
(5.78c)
We must also consider the timing in the process (5.77). At the beginning of period t, the probability of a disaster within this period is known to the household from the realization of the variable ε pt−1 in the previous period (see (5.77)). Therefore, the variable d˜t is a function of p t−1 and we must add it as an additional state variable. The entire model, thus, has ˆ t , ˆp t , d˜t ), and the vector the endogenous states (˜k t , ˆp t−1 ), the shocks (z t , ω of jump variables (a t , y t , c t , i t , Nt , k t , w t , r t , q t , u t , vt , λ t ). 24
The min operator ensures that the probability to enter a disaster does not exceed one. After all, with the normally distributed innovation ε pt there is a small but positive probability that p t exceeds one.
5.7 Disaster Risk Models
301
SMOLYAK COLLOCATION SOLUTION. The state space for the policy functions of hours L t and utility u t has dimension ns = 4 and consists of the ˆ t , ˆp t ). As in the previous sections, the variable ˜k t and the three shocks (z t , ω additional state variable d t enters the policy functions indirectly since we compute two different policy functions, one for disaster states and another for non-disaster states. Observe also that the computation of the Euler equation residuals, e.g., in equation (5.74) involves the realization of ˆp t so that there is no need to add ˆp t−1 to the state space as is necessary for the perturbation solution. With the obvious extension of notation, the Smolyak base consists of tensor products of four univariate Chebyshev polynomials evaluated at the extrema (x k,i , x z,i , x ω,i , x p,i ). Hence, the approximate policy functions for labor and utility are: ˆh L (ξk (x k,i ), ξz (x z,i ), ξω (x ω,i ), ξ p (x p,i ), d) := b(x k,i , x z,i , x ω,i , x p,i ) T φ L , d ˆhu (ξk (x k,i ), ξz (x z,i ), ξω (x ω,i ), ξ p (x p,i ), d) := b(x k,i , x z,i , x ω,i , x p,i ) T φ u . d Since the state space now has the dimension ns = 4 a level of approximation of µ = 3 implies that each of the vectors b(·) has n b = 137 elements. The Euler equation residuals are computed following the same steps for the models in previous sections. However, the numeric approximation of expectations in computing both the residuals and the riskless rate via Gauss-Hermite integration becomes time-consuming. With the obvious extension of notation the formula in equation (5.60) becomes: E t m t+1 = (1 −
−η p t )β a t
nGH X nGH X nGH X λ(s t, j1 , j2 , j3 ,0 )
λt −χ GH GH GH νj νj νj u(s t, j1 , j2 , j3 0 ) 1 2 3
j1 =1 j2 =1 j3 =1
×
1/(1−χ) π3/2 vt nGH X nGH X nGH X λ(s t, j1 , j2 , j3 ,1 ) −η
+ max{p1 , p t }β a t
j1 =1 j2 =1 j3 =1
×
u(s t, j1 , j2 , j3 ,1 ) 1/(1−χ) vt
λt −χ
GH GH νGH j νj νj 1
2
π3/2
3
,
p t = min{¯p e ˆp t , 1}. Instead of the time-consuming Gauss-Hermite product formula, we employ monomial integration rules. These rules are able to reduce the computational time from more than eight hours to approximately one hour. For
302
5 Weighted Residuals Methods
instance, even with only nGH = 5 nodes the triple sums in the previous equation have n3GH = 125 summands, whereas the degree 5 rule in equation (14.38) involves only (ns − 1)2 + 1 = 10 elements. CALIBRATION. Table 5.12 shows the baseline calibration of the full model with the time variable disaster probability, disaster size, and conditional probability of remaining in a disaster once it has q occurred. The standard deviation of the disaster risk shock of σ p = 2.5 1 − ρ p2 is moderately smaller than in Gourio (2012) and is taken from Fehrle and Heiberger (2020). Table 5.12 Baseline Calibration of the Full Disaster Risk Model Preferences Production Disaster Probability Disaster Size Capital Accumulation
β=0.996 a¯=1.0028 p0 =0.0043 ¯ ω=0.125 δ=0.016
η=2.0 γ=5.0 N =0.126 α=0.36 ρz =0.0q σz =0.0116 ρ p =0.0 σ p =2.5 1 − ρ p2 p1 =0.0 ρω =0 σω =0.025 κ=3.0
RESULTS. Table 5.13 presents results from several simulations of the model with the script DR_V6.m. Both the solution and the simulation employ the monomial integration rule (14.40). The first two rows of results refer to the baseline calibration from Table 5.12 and reveal the effect of a variable disaster risk. The parameters that determine the expected size and duration of a disaster are the same as those in Table 5.11 for rows five and six. Regarding the model with a constant probability of entering a disaster there is a noteworthy increase in both the riskless return and its volatility in the projection solution. The riskless rate increases from 0.76 percent (see row seven in Table 5.11) to 2.70 percent, and its standard deviation increases from 0.15 percentage points to 2.76. Since the average return on equity increases only slightly, the equity premium drops to 1.03 percentage points. The source of the larger volatility in the riskless rate is the household’s savings behavior, triggered by the information of the changed probability of entering a disaster in the following period. If the disaster risk increases, the household wants to buy more riskless bonds,
5.7 Disaster Risk Models
303
thus driving down their return. This effect is operative only in the model with a variable disaster risk. Simulations that increase the autocorrelation parameter from ρ p = 0 to ρ p = 0.7 strengthen this effect and reduce the equity premium further (compare rows one and two with rows three and four in Table 5.13). Table 5.13 Annualized Real Returns in the Full Disaster Risk Model Rf
sR f
R
sR
EP
SR
ρ p = 0, p1 = 0 Perturbation 3.08 Projection 2.70
1.89 3.71 3.02 0.62 0.21 2.76 3.73 3.25 1.03 0.32 ρ p = 0.7, p1 = 0
Perturbation 3.08 Projection 2.78
2.55 3.70 3.16 0.62 0.37 3.27 3.12 3.81 0.34 0.09
ρ p = 0, p1 = 0.40 Perturbation 2.34 15.49 3.58 3.82 1.24 0.32 Projection 1.74 5.41 2.43 4.89 0.69 0.14 η = 0.8, ρ p = 0, p1 = 0.40 Perturbation 1.72 Projection 0.01
4.94 2.36 3.98 0.63 0.16 4.90 2.72 4.18 2.71 0.65
¯ = 0.135, p1 = 0.40 β = 0.99, η = 0.8, ω Perturbation 4.15 Projection 1.35
5.40 4.80 4.06 0.65 0.16 6.18 5.47 4.37 4.12 0.94
Notes: R f : riskless rate, sR f : standard deviation of riskless rate, R: return on equity, sR : standard deviation of equity return, E P: equity premium, SR: Sharpe ratio. Averages from simulations with 100,000 periods, including disaster periods.
Our insights from the model with a conditional disaster probability carry over to the present model. The larger risk to stay in a disaster state reduces the riskless rate and the return on equity. Since the effect on the riskless rate is more pronounced, a larger value of p1 = 0.4 increases the equity premium as the sixth row in Table 5.13 shows.. In rows seven and eight, we report the sensitivity of our results with respect to the intertemporal elasticity of substitution (IES) 1/η. A reduction
304
5 Weighted Residuals Methods
of η = 2 (benchmark case) to η = 0.8 has two effects. It lowers the riskless rate aη /β and increases the IES. In the baseline calibration, the deterministic riskless rate is approximately 3.9 percent p.a. and decreases to 2.5 percent for η = 0.8. In simulations with the projection solution this value decreases further to zero, whereas the perturbation solution predicts a value near the deterministic solution. The smaller value of η = 0.8 implies that the parameter χ defined in equation (5.51) changes its sign and increases from χ = −3 to χ = 21. The household is more inclined to substitute consumption and labor between the present and the future. Accordingly, risk shocks have a lower impact on the riskless rate, and its standard deviation declines. We can increase both the riskless rate and the equity return if we decrease the household’s discount factor β. The values β = 0.99 and η = 0.8 yield a deterministic riskless rate (equal to the deterministic equity return) of 5.04 percent p.a. Rows nine and ten of Table 5.13 show that for these values and a slightly larger disaster size, the projection solution yields a riskless rate close the bill rate in Table 5.5 with a standard deviation that nearly matches its empirical counterpart, and an equity premium of more than 4 percentage points. However, the model is unable to predict the empirically observed volatility of stock returns. The standard deviation in the model is approximately one-fifth of the empirical standard deviation of stock returns. Accordingly, the Sharpe ratio is much too large in the model. CONCLUSION. The preceding sections show that various parameterizations of models with disaster risk, GEU preferences, and adjustment costs of capital are able to predict equity premia of the magnitude observed in global historical data. However, none of the models and their various parameterizations can predict the empirically observed volatility of stock returns. Addressing this shortcoming requires additional features. For instance, Gourio (2012) introduces leverage into his model. He assumes that a given fraction of the firm’s capital stock is debt financed and that a fraction of this debt defaults in disaster periods. Therefore, the results of his model are close to the standard deviation in the data (see his Table 3). Finally, our results show that the local second-order perturbation solution of models with disaster risk is less reliable than the global weighted residuals method.
Problems
305
Problem 5.1: Finite Element Solution of the Benchmark Business Cycle Model Solve the benchmark business cycle model considered in Section 5.5 with the finite element method. Employ an equally spaced grid with n = 20 nodes over the capital stock in the interval [0.5, 1.5]k, where k denotes the scaled stationary capital stock. Use Algorithm 16.4.2 and approximate the process for log-TFP z t with a Markov chain with m = 15 elements. Approximate the policy function for R hours h L (k, z) on this grid with a bi-cubic spline. (Hint: in MATLAB you can use the command griddedinterpolant.) Initialize this function with a secondorder perturbation solution computed with the CoRRAM toolbox. (Hint: consult R the code in the MATLAB program BM_CGT.m.) Code the residual function. The zero of this function in the matrix L = (L i j ), i = 1, . . . , n, j = 1, . . . , m is the finite elements solution. Proceed as follows: For each (ki , z j , L i j ) compute output from the production function yi j = ez j kiα L i1−α j , real wages from the first-order condition w i j = (1 − α)
yi j Li j
,
consumption from the first-order condition for labor supply ci j = (1 − L i j )
wi j θ
,
marginal utility from −η
λi j = ci j (1 − L i j )θ (1−η) , and next-period capital from ki0 j =
yi j + (1 − δ)ki j − ci j a
.
The n × m residuals of the Euler equation are the result of the next steps: For (ki j , zl ), l = 1, . . . , m, use the approximate policy function ˆh L (·) and compute L i j,l = ˆh L (ki j , zl ). Follow the steps from above and compute marginal utility λi j,l . The residual of the Euler equation REi j is then given by REi j = 1 − β a
−η
m X l=1
p jl
λi j,l λi j
1−δ+α
yi j,l ki0 j
.
Use a nonlinear equations solver to locate the zeros of the Euler equation residuals as functions of L i j .
306
5 Weighted Residuals Methods
Problem 5.2: Search and Disutility of Employment Heiberger (2017), Section 4.7.1, considers the robustness of the search and matching model in Section 5.6 to generate endogenous disasters. Following Merz (1995), he assumes that the household’s utility function includes the disutility of hours worked by the employed members of the household. Instead of u(C t ) in (5.28), he parameterizes the period utility function as 1−η
u(C t , Nt ) :=
Ct
−1
1−η
−
ν0 1+ν N 1 , η, ν0 , ν1 > 0, 1 + ν1 t
where he normalizes hours per worker to unity. 1) Use the envelope theorem and show that the net value of a new job follows recursively from η
ν
ψ t = w t − b − ν0 C t Nt 1 + β(1 − ω − κwt )E t M t+1 ψ t+1 . 2) Solve the bargaining problem 1−ϕ
max ζ t wt
1−ϕ
ψt
, ϕ ∈ (0, 1),
for the real wage w t , where ζ t is the current period value of a new employee. Show that the bargained real wage is given by Vt η ν w t = ϕ Zt + c + (1 − ϕ) v + ν0 C t Nt 1 Ut and compare this to (5.31g). 3) Calibration: Vis-a-vis the baseline calibration presented in Table 5.4, the extended model is calibrated such that the period value of unemployment in terms of consumption goods, ˜b := b + ν0 C η N ν1 , is unchanged and equal to ˜b = 0.85. Note that, for this value of ˜b, the steady state value of C is the same as in the base line model. (N = 0.9 follows from U = 0.1.) Set ν1 = 2 and compute ν0 such that the disutility of hours in the steady state is equal to 1 3
= ν 0 C η N ν1 .
Accordingly, the new value of unemployment benefits is equal to b = ˜b − 13 .
R 4) Modify the code in the MATLAB script Search_CGT.m and the function Search_CGT_Eqs.m and compute the Galerkin solution of the model.
Problems
307
Æ 5) Place a fine grid over the square [0.1, 0.9] × [−¯ z , z¯], z¯ = 3.5σε / 1 − ρ Z2 , and compute the policy function for the value of employment ζ˜t on this grid. 6) Compare this function with the one computed from a second-order perturbation solution on the same grid points. 7) Compare your results to Figure 5.4. 8) What is the mechanism that makes endogenous disasters less likely?
Problem 5.3: Oil Price Shocks Consider the following model with a variable utilization rate of capital u t and a second shock that represents exogenous variations in the price of imported oil p t (adapted from Finn (1995)). The representative agent maximizes X ∞ s U t := E t β ln C t+s + θ ln(1 − L t+s ) , β ∈ (0, 1), θ > 0, s=0
subject to the constraints K t+1 = (u t K t )α (A t L t )1−α + (1 − δ(u t ))K t − C t − p t Q t , γ
δ(u t ) :=
ut γ
,
ζ
ut Qt = , Kt ζ A t = a t A t−1 , a t = aeεat , εat iid N (0, σ2a ), ln p t+1 = ρ p ln p t + ε pt+1 , ε pt+1 iid N (0, σ2p ), K t given. As usual, A t denotes labor efficiency in period t, C t is consumption, L t are working hours, K t is the stock of capital, and Q t it the quantity of oil imported at the price of p t . A more intense utilization of capital increases the amount of energy required per unit of capital. Thus, if the price of oil rises capital utilization will decrease. Verify this claim as follows. 1) First-order conditions: Derive the first-order conditions of the agent’s optimization problem with respect to consumption, hours, capital utilization, and next-period capital. 2) Dynamics: Define the stationary variables y t :=
Yt Ct It Kt , c t := , i t := , k t := , A t−1 A t−1 A t−1 A t−1
308
5 Weighted Residuals Methods
and derive the system of equations that determines the dynamics of the model in these variables. 3) Compute the steady state. Use the following parameter values taken from Finn (1995): β = 0.9542, θ = 2.1874, α = 0.7, γ = 1.4435, ζ = 1.7260, ρ p = 0.9039, σ p = 0.0966, Z = 1.0162, σa = 0.021. 4) Policy function for hours: The state variables of the model are the scaled capital stock k t , the growth shock εat , and the oil price shock p t . Let L t := h L (k t , εat , p t ) denote the policy function for hours. Show that, given (k t , εat , p t , L t ), the dynamic system derived in step b) can be solved for the variables u t , c t , i t , and k t+1 . Hint: This step requires solving α (aeεat L t /k t )1−α = u t
γ−α
ζ−α
+ pt ut
for the utilization rate u t . 5) Approximate h L (k t , εat , p t ) with a Smolyak polynomial. (Hint: Consult the R MATLAB code of the script DR_V3.m and the associated function DRV4_ Eqs_Smolyak.m to see how to handle models with three state variables.) 6) Compute the impulse response of the model’s variables to a one-time oil price shock. (Hint: Use the function Impulse2.m.)
Problem 5.4: Sticky Real Wages and the Equity Premium Uhlig (2007) considers the equity premium puzzle in a model with habits and sticky real wages. The presentation of his model follows Heer and Maußner (2013). Firm. The representative firm maximizes its beginning-of-period expected value X ∞ Vt := E t M t+s Yt+s − Wt+s L t+s − I t+s s=0
subject to the constraints Yt = Z K tα (A t L t )1−α , A t = aeεat A t−1 , εat iid N (0, σ2a ), K t+1 = (1 − δ)K t + Φ(I t /K t )K t .
M t+s is the stochastic discount factor for returns that accrue in period t + s. The function Φ(I t /K t ) is parameterized as in (5.62). From the perspective of the firm, both M t+s and the path of real wages Wt+s are exogenously given. Household. The representative household is the shareholder of the firm. At period t, he owns S t shares and receives the dividends per share d t . In addition, he
Problems
309
receives the wages paid by the firm to their employees. The household maximizes expected life-time utility Et
∞ X
βs
h [(C t+s − C t+s )(B + (1 − L t+s − F th )θ ]1−η − 1
1−η
s=0
subject to the constraint Wt L t + d t S t − C t ≥ vt (S t+1 − S t ).
The consumption habit C th and the leisure habit F th evolve as follows: h C th = a (1 − λC )χ C C t−1 + λC C t−1 ,
h F th = (1 − λ L )χ L (1 − L t−1 ) + λ L F t−1 . f
Real Wage. Let Wt denote the marginal rate of substitution between leisure and consumption. Uhlig (2007) assumes that the real wage of period t is the f geometrically weighted mean of the past real wage Wt−1 and Wt : f
Wt = (aWt−1 )µ (Wt )1−µ . Return on Equity. As shown in Heer and Maußner (2013), the return on equity R t+1 is equal to R t+1 =
αYt+1 − I t+1 + q t+1 K t+2 , q t K t+1
where – as in Section 5.7.4 – q t denotes the Lagrange multiplier of the constraint on capital accumulation. Calibration. Heer and Maußner (2013) conduct a search over the six parameters χ C , χ L , λC , λ L , µ, and κ to match the following targets from the US economy as close as possible: An annual equity premium of 6.18 percent, the standard deviation of GDP, sY =1.72, the relative volatilities of investment, hours, and the real wage, s I /sY =2.97, s L /sY =0.98, sW /sY =0.44, and the correlations between GDP and hours and between hours and the real wage, rY L = 0.78, r LW = 0.21. They find: χ C = 0.935, χ L = 0.96, λC = 0.68, λ L = 0.01, µ = 0.95, κ = 2.7. The remaining parameters take the following values: β = 0.99, η = 1, L = 0.33, α = 0.36, ln a = 0.004, σa = 0.018, δ = 0.025. The parameters φ1 and φ2 of (5.62) are chosen to imply q = 1 and Φ(a − 1 − δ) = a − 1 + δ. For the parameters Z, B, and ν see the instructions below. 1) Derive the first-order conditions of the firm and the household. 2) Let Λ t denote the Lagrange multiplier of the household’s budget constraint and show that M t+1 = βΛ t+1 /Λ t .
310
5 Weighted Residuals Methods
3) Scale the non-stationary variables, i.e., A t , C t , C th , I t , Wt , Yt , and Λ t , appropriately by A t−1 and show that the dynamic system is given by the following set of equations, where lower case symbols refer to scaled variables: y t = Z kαt (a t L t )1−α , yt w t = (1 − α) , Lt yt = ct + it , 1 qt = 0 , Φ (i t /k t )
ν 1−η λ t = (c t − c th )−η B + 1 − L t − F th , f
wt = ν
c t − c th
ν 1 − L t − F th
ν−1
B + 1 − L t − F th a µ f 1−µ wt = wt , a t−1
,
a t = aeεat , a t k t+1 = (1 − δ)k t + Φ(i t /k t )k t , a h c th = (1 − λC )χ C c t−1 + λC c t−1 ], a t−1 h L ht = (1 − λ L )χ L (1 − L t−1 ) + λ L F t−1 , ¦ λ −η t+1 qt = β at Et α( y t+1 /k t+1 ) λt
+ q t+1 1 − δ + Φ(i t+1 /k t+1 ) − Φ0 (i t+1 /k t+1 )(i t+1 /k t+1 )
©
.
4) Compute the deterministic stationary solution of this model for the parameter values given above. Choose the free parameters Z, B, and θ such that λ = 1 and that the Frisch elasticity ε FL,w defined in (1.48) is equal to 0.2. (Note that the stationary level of hours L is given.) 5) The model’s state variables are k t , εat , a t−1 , c th , l th , and w t−1 . Use a Smolyak polynomial in these variables to approximate the policy function of hours. Determine the parameters of this polynomial from the Euler equation for next-period capital. f
6) Confirm that the risk-free rate in this model is equal R t defined in (5.45) and that the return to equity R t+1 is equal to the rhs of (5.67) for d = 0 and ˜k t+2 = k t+2 . 7) Draw a long sequence of shocks {εa,t } Tt=1 , say T =100,000, simulate the model, and compute the equity premium as explained in Section 5.7.
Chapter 6
Simulation-Based Methods
6.1 Introduction The perturbation methods presented in Chapters 2-4 use one point in the state space of DSGE models to compute approximate solutions of the model’s policy functions. The weighted residual methods considered in Chapter 5 use a predefined set of points in the model’s state space. A subset of these methods constructs this set from the zeros or extrema of polynomials used to build the basis functions for the approximation. Many of the points chosen in this way may lie far outside the model’s ergodic set i.e., the set consisting of the points traced out by a stochastic simulation of the exact solution with infinite length. Accordingly, the algorithm wastes effort to compute good approximations in areas of the state space that are never visited by the simulation. Moreover, the algorithm may fail to find a solution at all if the model’s equations cannot be solved at some extreme locations of the predefined state space. This chapter presents methods that combine stochastic simulation with other tools to find approximate solutions on the model’s ergodic set. We devote the first part of this chapter to the extended path method. This method dates back to Fair and Taylor (1983) and was applied to the stochastic Ramsey model by Gagnon (1990). It rests on the repeated solution of a large system of nonlinear equations. This system derives from the system of stochastic difference equations that determines the time path of the model under two assumptions: first, given the observed realization of the model’s shocks, there will be no future unexpected shocks, and second, the economy will return to its deterministic balanced growth path after a given transition period of finite length. Recently, Maliar et al. (2020) extended this approach to solve models in which the economic © The Author(s), under exclusive license to Springer Nature Switzerland AG 2024 B. Heer and A. Maußner, Dynamic General Equilibrium Modeling, Springer Texts in Business and Economics, https://doi.org/10.1007/978-3-031-51681-8_6
311
312
6 Simulation-Based Methods
fundamentals, e.g, the parameters of the agent’s utility functions or the production technology, change with the passage of time. We apply the extended path method to our benchmark model in Example 1.6.1 and to the model of a small open economy similar to the economy studied by Correia et al. (1995). In the second part of this chapter, we consider approaches that combine stochastic simulation with the methods of function approximation presented in Chapter 13. As in the weighted residuals methods considered in the previous Chapter 5, these methods choose the smallest subset of policy functions required to solve the model and compute the solutions of the remaining variables from the model’s nonlinear equations. Different from the weighted residuals methods, the residual function is evaluated on the ergodic set. Similar to the former, the computation of the residuals and the final solution give rise to a wide variety of specific implementations. We illustrate this point with our benchmark model and close the chapter with a computationally more involved model, the limited participation model of Christiano et al. (1997).
6.2 Extended Path Method 6.2.1 Motivation We motivate the extended path method with the finite-horizon deterministic Ramsey model introduced in Section 1.2. THE MODEL. We repeat the first-order conditions of this model for the reader’s convenience: K t+1 = f (K t ) − C t ,
∂ U(C0 , . . . , C T )/∂ C t = f 0 (K t+1 ), ∂ U(C0 , . . . , C T )/∂ C t+1 K T +1 = 0.
t = 0, . . . , T − 1, t = 0, . . . , T − 1,
(6.1a) (6.1b) (6.1c)
The first equation is the economy’s resource constraint, the second condition determines the farmer’s savings. As usual, K t denotes the capital stock of period t, and C t represents consumption. To determine the time path of capital and consumption from this system, we must specify functional forms for U and f . We assume that U is given by the time-separable utility
6.2 Extended Path Method
function U(C0 , . . . , C T ) :=
T X
313
βt
t=0
1−η
Ct
−1
1−η
,
β ∈ (0, 1), η > 0,
(6.2a)
where β denotes the discount factor. The function f includes depreciated capital and output is parameterized as a Cobb-Douglas function f (K t ) := (1 − δ)K t + K tα ,
α ∈ (0, 1), δ ∈ [0, 1],
(6.2b)
where δ denotes the rate of capital depreciation. By means of these two functions, equations (6.1) simplify to −η 0 = (1 − δ)K t + K tα − K t+1 −η α α−1 − β (1 − δ)K t+1 + K t+1 − K t+2 1 − δ + αK t+1 , t = 0, 1, . . . , T − 1,
0 = K T +1 .
(6.3)
For a given initial capital stock K0 , this is a system of T nonlinear equations in the T unknown capital stocks K1 , K2 , . . . , K T . Thus, it is an example of the general nonlinear system of equations f(x) = 0, x ∈ Rn considered in Section 15.3 and can be solved by using the algorithms considered there. SOLVING THE SYSTEM. The problem that we face with this approach is twofold. First, we must select T initial values for the nonlinear equations solver at which the function defined by equation (6.3) can be numerically evaluated. For instance, if for two consecutive capital stocks K t and K t+1 , consumption (the term in square brackets) is negative, the marginal utility of consumption [(1 − δ)K t + K tα − K t+1 ]−η is not defined and the solver will terminate with an error message. Second, if the solver starts with an admissible solution, it is not guaranteed that it will not try to find a solution that either implies negative values of the capital stock or again negative values of consumption. If the nonlinear equations solver has no option to supply bounds on the vector x, there is no way to prevent it from running into negative values of the capital stock. However, we can prevent negative consumption: if at some stage of the solution process C t would become negative, we replace the actual value with a small positive number. This will result in a large marginal utility and drives the left-hand side (lhs) of system (6.3) far from zero so that the solver will not accept the current point. To ensure positive consumption at the initial vector, we choose a constant sequence of capital equal to a fraction φ ∈ (0, 1) of the initial resources; i.e., K t = φ(K0α + (1 − δ)K0 ), t = 1, . . . , T .
314
6 Simulation-Based Methods
SOME TIME PROFILES. Figure 6.1 displays the results of four different optimal time paths of the capital stock. They are computed from the GAUSS program Ramsey2.g for the parameter values α = 0.36, β = 0.996, and η = 2.1 The title of each of the four panels shows the initial capital stock relative to the stationary capital stock K of the infinite horizon version of the model and the chosen depreciation rate. δ = 1 and K0 /K = 0.1
δ = 1 and K0 /K = 1.4 0.25
0.20
0.20
Kt
Kt
0.15 0.10 0.05 0.00
10
20
30
40
50
0.00
60
10
20
30
40
50
60
Period
Period
δ = 0.014 and K0 /K = 0.1
δ = 0.014 and K0 /K = 1.4
20 15
Kt
Kt
0.10 0.05
25
10 5 0
0.15
10
20
30
40
50
60
150 125 100 75 50 25 0
Period
10
20
30
40
50
60
Period
Figure 6.1 Example Solutions of the Finite-Horizon Ramsey Model
With one hundred percent depreciation, δ = 1, the two upper panels clearly show the household’s desire for smooth consumption. It either quickly builds up or draws down capital to a certain level and remains there for an extended period of time. Shortly before the end of the planning horizon capital approaches zero. In the more realistic scenarios, with a quarterly depreciation rate of 1.4 percent, displayed in the two bottom panels, this strategy is not optimal, since investment I t = K t+1 − (1 − δ)K t is only a small fraction of the capital stock. Accordingly, the time path is 1
R The MATLAB version of this program is Ramsey2.m.
6.2 Extended Path Method
315
hump-shaped for small initial capital and monotonically decreasing for large initial capital. THE INFINITE-HORIZON MODEL. Can we use this approach to solve the infinite-horizon version of the Ramsey model presented in Section 1.3? Clearly, using larger and larger values of the planning horizon T will not work for two reasons. First, for very large values, say billions, we run into work space limits of the computer hardware. Second, and more importantly, even if the end of the planning horizon lies far in the future, the terminal value of the capital stock is still zero. However, as we know from the analysis in Section 1.3.4, the optimal solution of the infinite horizon Ramsey model is to approach a positive, constant level of capital. This level follows from condition (6.1b) evaluated at the point of constant consumption and constant capital. The parameterization in (6.2) yields
1 − β(1 − δ) K= αβ
1 α−1
.
(6.4)
The trick to solve this model is to use this insight and to replace the terminal value of zero with this stationary value. This changes the interpretation of the variable T : it becomes a guess of the length of the transition period. To obtain an accurate approximation of the saddle path, we must set T sufficiently large. An appropriate number has the property that a further increase, say, from T to T 0 has negligible effects on the first T elements of both solutions. In the infinite-horizon model, the choice of the starting value is more delicate than that in the finite-horizon model. The strategy that was successful in the latter model does not work if the economy’s inherited capital stock K0 (recall, this is a parameter of our model) is small relative to the stationary capital stock K because it implies C T < 0. On the other hand, if we set all initial values equal to K, and K0 is small, we obtain C0 < 0. Instead of using different starting values for each K t , we employ a homotopy method to approach K0 : We use K for all K t to initialize the nonlinear equations solver. This approach works if we set K0 very close to K. We then use this solution as the starting value for a smaller K0 and continue in this fashion until K0 has reached the value we desire. In our program Ramsey3.g, we reduce K0 in this way to ten percent of the stationary capital stock.2 2
R The MATLAB version of this program is Ramsey3.m.
316
6 Simulation-Based Methods
Figure 6.2 illustrates that we indeed must set T to a large value of 400 to find an acceptable solution for the parameter values α = 0.36, β = 0.996, η = 2, and δ = 0.014. If we resolve the model with T = 420, the maximum absolute value of the relative difference between the new and old solution, max t=1,...,400 |1 − K t400 /K t420 |, is less than 0.002.
100
Second-order Solution First-order Solution Extended Path Solution
Kt
75
50
25
0
50
100
150
200
250
300
350
400
Period Figure 6.2 Approximate Time Paths of the Capital Stock in the Deterministic Growth Model
The extended path solution computes the exact transition path to the stationary equilibrium up to the precision of the nonlinear equations solver. Accordingly, Figure 6.2 additionally reveals that both the first- and the second-order perturbation solutions overstate the speed of adjustment: both time paths lie above the time profile of the extended path solution. The maximum vertical difference between the first-order (second-order) solution and the extended path solution is more than 17.7 (9.8) percent of the latter. With respect to runtime, however, the GAUSS program requires 0.2 hundreds of a second to compute both the first- and the second-order solutions, while it needs more than 70 seconds for the extended path R solution on a workstation with Intel Xeon W-2133 CPU running at 3.60 GHz. THE STOCHASTIC RAMSEY MODEL. With a few modifications, we can devise a method suitable for the solution of the infinite-horizon stochastic Ramsey model presented in Section 1.4. Given the parameterization in
6.2 Extended Path Method
317
(6.2), the stochastic Euler equation (1.27) is given by −η 0 = (1 − δ)K t + Z t K tα − K t+1 ¦ −η α − βE t (1 − δ)K t+1 + Z t+1 K t+1 − K t+2 © α−1 × 1 − δ + αZ t+1 K t+1 , t = 0, 1, 2, . . . ,
(6.5)
where the variable Z t denotes the total factor productivity (TFP). We assume that the natural logarithm of this variable, z t := ln Z t , follows a first-order autoregressive (AR(1))-process: z t+1 = ρz z t + ε t+1 ,
ρz ∈ (−1, 1), ε t+1 iid N (0, σε2 ).
(6.6)
Iterating this equation forward for s = 1, 2, . . . periods implies z t+s = ρzs +
s−1 X i=0
ρzi ε t+s−i .
Accordingly, if the farmer observes the current TFP shock z t , his or her expectation of the future path of this variable is E t z t+s = ρzs , since E t ε t+s−i = 0 for i = 0, . . . , s − 1. If we replace the expectation E t {·} in equation (6.5) with the expectation of z t+s , the system of equations becomes −η s α 0 = (1 − δ)K t+s + eρz z t K t+s − K t+s+1 −η s+1 α − β (1 − δ)K t+s+1 + eρz z t K t+s+1 − K t+s+2 (6.7a) s+1 α−1 × 1 − δ + αeρz K t+s+1 , s = 0, 1, 2, . . . T − 1 and K t given.
Note that for $T\to\infty$, the sequence of capital stocks that solves this system will approach the stationary capital stock $K$ from equation (6.4). We can use this insight and add the condition
$$K_{t+T} = K \tag{6.7b}$$
for some large integer value $T$ to equations (6.7a) so that we obtain a system of $T$ equations that determines a convergent path $\{K_{t+s}\}_{s=1}^{T}$ of capital stocks. From the solution of this system, the farmer will choose
only the capital stock $K_{t+1}$ as his or her optimal savings decision. After all, in the following period he or she will learn about the actual TFP shock $z_{t+1} = \rho_z z_t + \varepsilon_{t+1} \neq \rho_z z_t$. Replacing $K_t$ with $K_{t+1}$ and $z_t$ with $z_{t+1}$ in system (6.7a) again yields a system of $T$ equations. The solution determines the optimal capital stock of period $t+2$. In this way, we can trace out the time path of the model for a given sequence of shocks $\{z_t, z_{t+1}, z_{t+2},\dots\}$ obtained from the process (6.6). The GAUSS program Ramsey4.g implements this approach for the functional forms given in (6.2). In each period $t = 1,\dots,60$, it employs the solution from the previous period as the initial values for the nonlinear equations solver. At $t = 1$, it sets the vector of initial values equal to the stationary solution $K$. The length of the extended path equals $T = 150$.³
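The period-by-period structure of this approach can be summarized in a short MATLAB sketch. It is our own stylized rendering, not the code of Ramsey4.g: in each period the expected TFP path $E_t z_{t+s} = \rho_z^{s}z_t$ replaces the expectation in (6.7a), the system of $T$ equations is solved with fsolve, and only $K_{t+1}$ is kept.

```matlab
function Ksim = ramsey_ep_stochastic_sketch(z, K1, T)
% Per-period extended path for the stochastic Ramsey model: a minimal sketch
% with illustrative names; z is a simulated log-TFP path from (6.6).
alpha = 0.36; beta = 0.996; eta = 2; delta = 0.014; rhoz = 0.82;
Ks    = (alpha/(1/beta - 1 + delta))^(1/(1-alpha));   % stationary capital stock
opts  = optimoptions('fsolve','Display','off');
n     = numel(z); Ksim = zeros(n+1,1); Ksim(1) = K1;
guess = Ks*ones(T,1);                                 % K_{t+1},...,K_{t+T}
for t = 1:n
    guess     = fsolve(@(K) res(K, Ksim(t), z(t)), guess, opts);
    Ksim(t+1) = guess(1);                             % keep only K_{t+1}
end
    function r = res(K, K0, z0)
        Kp = [K0; K(:)];                              % K_t,...,K_{t+T}
        Z  = exp(rhoz.^(0:T)'*z0);                    % TFP implied by E_t z_{t+s}
        C  = Z(1:T).*Kp(1:T).^alpha + (1-delta)*Kp(1:T) - Kp(2:T+1);
        r  = [C(1:T-1).^(-eta) - beta*C(2:T).^(-eta) ...   % (6.7a), s = 0,...,T-2
                .*(1-delta+alpha*Z(2:T).*Kp(2:T).^(alpha-1));
              Kp(T+1) - Ks];                          % terminal condition (6.7b)
    end
end
```

Consumption then follows from the resource constraint along the simulated path.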
Figure 6.3 Simulated Time Path of the Stochastic Growth Model (capital stock $K_t$, left scale, from the first-order, second-order, and extended path solutions; TFP $Z_t$, right scale)
Figure 6.3 presents the results of this exercise. The right ordinate depicts the level of the TFP for which the program solves the model. The parameters of the process (6.6) are $\rho_z = 0.82$ and $\sigma_\varepsilon = 0.0071$. The left ordinate shows the associated time paths of the capital stock obtained
³ A second solution with $T = 250$ does not increase the precision: the maximum absolute value of the relative difference between both solutions is less than $4\times10^{-6}$.
from a first- and second-order perturbation solution and from the extended path method. The capital stock follows the evolution of the TFP with a time lag of several periods. The downturn of productivity between periods $t = 14$ and $t = 24$ triggers a recession with a trough in period $t = 36$. In the extended path solution, this recession is slightly more pronounced than in the first- and second-order perturbation solutions. Note that the solution for the capital stock depends only on the first (conditional) moment of the process (6.6) and not on its second moment $\sigma_\varepsilon^2$. Accordingly, the extended path method imposes the certainty equivalence principle that we have already seen in Section 2.4.3 as a property of the LQ approximation. We measure the accuracy of the three solutions along the simulated path $\{(K_t, Z_t)\}_{t=1}^{60}$ from the residuals of the Euler equation (6.5). For each pair $(K_t, Z_t)$, we compute the expectation on the rhs of the Euler equation from the Gauss-Hermite formula (14.31) with nine nodes. In the case of the extended path method, we must therefore solve the system (6.7a) for each of the nine nodes. The maximum absolute residual (in terms of the required change in consumption yielding a residual of zero) of the second-order time path is approximately 0.00024 and differs only slightly from the residual of the first-order solution. The Euler residual of the extended path solution is less than 1.5e-5 and thus one order of magnitude smaller. The relatively small differences in accuracy follow from the fact that all three time paths remain close to the stationary solution: the maximum absolute difference between the time paths and the stationary solution is less than one percent.
6.2.2 The General Algorithm
In this subsection we provide the general structure of the extended path algorithm. We resort to the notation of Section 2.5.2 for the canonical DSGE model. For the reader's convenience we briefly restate this model.
NOTATION. The model consists of three kinds of variables. $x_t\in\mathbb{R}^{n(x)}$ is the vector of endogenous state variables. These variables have a historically given initial value $x_t$, but their future time path $x_{t+s}$, $s = 1,2,\dots$ is determined by the model's equations. The elements of the vector $y_t\in\mathbb{R}^{n(y)}$ are determined in every period by the model's equations, and we refer to
them interchangeably as control, jump, or not predetermined variables. The vector of shocks $z_t\in\mathbb{R}^{n(z)}$ follows a first-order vector autoregressive (VAR(1)) process. The model's evolution follows from the system of stochastic difference equations:⁴
$$\begin{aligned}
0_{[n(x)+n(y)]\times1} &= E_t\, g(x_{t+1}, y_{t+1}, z_{t+1}, x_t, y_t, z_t), && (6.8a)\\
0_{n(z)\times1} &= z_{t+1} - Rz_t - \sigma\Omega\varepsilon_{t+1}, && (6.8b)\\
\varepsilon_{t+1} &\ \text{iid}\ N\!\left(0_{n(z)\times1}, I_{n(z)}\right). && (6.8c)
\end{aligned}$$
All eigenvalues of the matrix $R$ are within the unit circle so that $z_t$ will approach the zero vector if $\sigma = 0$ or if the innovations $\varepsilon_{t+1}$ are equal to the zero vector for all future periods $t+1, t+2,\dots$. We assume that the endogenous variables $x_{t+1}$ and $y_t$ converge toward the deterministic solution determined by
$$0_{[n(x)+n(y)]\times1} = g(x, y, 0, x, y, 0). \tag{6.9}$$
We can infer the local convergence from the eigenvalues of the linearized system as explained in Sections 3.2.2 and 3.2.3.
THE ALGORITHM. Given these assumptions, we can derive a finite-dimensional system of nonlinear equations from (6.8), which enables us to approximate the time path of the model for each given initial condition $(x_0, z_0)$.
Algorithm 6.2.1 (Extended Path)
Purpose: Simulation of the DSGE model (6.8)
Steps:
Step 1: Initialize: Let $p$ denote the number of periods to consider and $(x_0, z_0)$ the initial state of the model.
Step 1.1: Use a random number generator and draw a sequence of shocks $\{\varepsilon_t\}_{t=1}^{p}$.
Step 1.2: Compute the time path $\{z_t\}_{t=1}^{p}$ from equation (6.8b).
Step 1.3: Choose $T$ sufficiently large so that $(x, y)$ is a good approximation of $(x_T, y_T)$ under the maintained assumption that
⁴ The extended assumptions on the third moments of the process in condition (2.21e) are unnecessary here, since the algorithm enforces the certainty equivalence principle.
after period $t = 0$ the vector of innovations equals its unconditional mean: $\varepsilon_t = 0\ \forall t = 1,2,\dots,T$. (Iterate over $T$ to see whether this condition holds.)
Step 2: For $t = 0,1,\dots,p$ repeat these steps:
Step 2.1: Compute the expected time path $E_t z_{t+s} = R^{s}z_t$ for $s = 1,2,\dots,T$.
Step 2.2: Solve the system of $T(n(x)+n(y))$ equations
$$\begin{aligned}
0 &= g_i(x_{t+s+1}, y_{t+s+1}, R^{s+1}z_t, x_{t+s}, y_{t+s}, R^{s}z_t),\quad i = 1,2,\dots,n(x)+n(y),\ s = 0,1,\dots,T-1,\\
& x_t \text{ given},\quad x = x_{t+T},
\end{aligned}$$
for $\{x_{t+s}\}_{s=1}^{T}$ and $\{y_{t+s}\}_{s=0}^{T}$. From the solution, keep $x_{t+1}$ and $y_t$.
Step 2.3: Use $x_{t+1}$ in Step 2.2 as the starting value for period $t+1$.
Note that it is not possible to set $y_{t+T}$ equal to $y$ in Step 2.2, since this would yield a system with more equations than unknown variables. Accordingly, it is not possible to iterate backwards starting from the point $(x, y, R^{T}z_t)$. Indeed, we must solve the entire system of interdependent equations. Even with a moderate number of variables, this system will comprise several hundred equations. Nonlinear equations solvers that employ the Jacobian matrix of the system will consume a considerable amount of computation time. Thus, it is advisable to reduce the system as much as possible. For instance, one may use the model's static equations (those that only involve variables at period t) to substitute out a subset or even all of the control variables of the model. The applications in the next section illustrate this approach.
6.2.3 Application: The Benchmark Business Cycle Model EQUATIONS. Our starting point is the system of stochastic difference equations from Section 1.6. In the first step, we use equations (1.64c), (1.64d), (1.64e), and (1.64f) to remove output y t , the real wage w t , the rental rate
of capital $r_t$, and investment $i_t$ from this system.⁵ The following reduced system determines the dynamics of the Lagrange multiplier of the budget constraint $\lambda_t$, consumption $c_t$, hours $L_t$, and the capital stock $k_t$:
$$\begin{aligned}
0 &= c_t^{-\eta}(1-L_t)^{\theta(1-\eta)} - \lambda_t, && (6.10a)\\
0 &= \theta\frac{c_t}{1-L_t} - (1-\alpha)Z_t k_t^{\alpha}L_t^{-\alpha}, && (6.10b)\\
0 &= ak_{t+1} - (1-\delta)k_t + c_t - Z_t k_t^{\alpha}L_t^{1-\alpha}, && (6.10c)\\
0 &= \lambda_t - \beta a^{-\eta}E_t\left\{\lambda_{t+1}\left(1-\delta+\alpha Z_{t+1}k_{t+1}^{\alpha-1}L_{t+1}^{1-\alpha}\right)\right\}. && (6.10d)
\end{aligned}$$
The natural logarithm of TFP, $z_t := \ln(Z_t)$, follows the process (6.6), which repeats equation (1.61). This system is an example of the general model defined in equations (6.8), with $x_t \equiv k_t$, $y_t \equiv [c_t, L_t, \lambda_t]'$, and $z_t \equiv \ln Z_t$. For a given $T$, for example $T = 150$, we have to solve a system of 600 unknown variables. This is a considerably large number. In Problem 6.1, we ask you to write a program that solves this system. Here, we reduce this system further: we use the two static equations (6.10a) and (6.10b) to remove consumption and the Lagrange multiplier from system (6.10). The result is a system of $2T$ equations in the unknown variables $\{L_{t+s}\}_{s=0}^{T-1}$ and $\{k_{t+s}\}_{s=1}^{T}$:
$$\begin{aligned}
0 &= \theta\,\frac{e^{\rho_z^{s}z_t}k_{t+s}^{\alpha}L_{t+s}^{1-\alpha} + (1-\delta)k_{t+s} - ak_{t+s+1}}{1-L_{t+s}} - (1-\alpha)e^{\rho_z^{s}z_t}k_{t+s}^{\alpha}L_{t+s}^{-\alpha},\quad s = 0,1,\dots,T-1, && (6.11a)\\[4pt]
0 &= 1 - \beta a^{-\eta}\left(\frac{e^{\rho_z^{s+1}z_t}k_{t+s+1}^{\alpha}L_{t+s+1}^{1-\alpha} + (1-\delta)k_{t+s+1} - ak_{t+s+2}}{e^{\rho_z^{s}z_t}k_{t+s}^{\alpha}L_{t+s}^{1-\alpha} + (1-\delta)k_{t+s} - ak_{t+s+1}}\right)^{\!-\eta}\left(\frac{1-L_{t+s+1}}{1-L_{t+s}}\right)^{\!\theta(1-\eta)}\\
&\qquad\times\left(1-\delta+\alpha e^{\rho_z^{s+1}z_t}k_{t+s+1}^{\alpha-1}L_{t+s+1}^{1-\alpha}\right),\quad s = 0,1,\dots,T-2, && (6.11b)\\[4pt]
0 &= k_{t+T} - k. && (6.11c)
\end{aligned}$$
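Written as a residual function for a nonlinear equations solver, system (6.11) might look as follows. This is only a sketch: the function name, the parameter structure par, and the stacking of the unknowns are our own assumptions; the book's implementations are the programs discussed below, and Problem 6.1 asks for your own.

```matlab
function r = bm_ep_residuals(x, kt, zt, par)
% Residuals of the reduced system (6.11) for the benchmark business cycle model.
% x stacks L_{t+s}, s = 0,...,T-1, and k_{t+s}, s = 1,...,T; par holds the fields
% a, alpha, beta, delta, eta, theta, rhoz and the stationary capital stock kbar.
T  = numel(x)/2;
L  = x(1:T);                        % hours L_{t+s}, s = 0,...,T-1
k  = [kt; x(T+1:2*T)];              % k_{t+s}, s = 0,...,T
Z  = exp(par.rhoz.^(0:T-1)'*zt);    % TFP implied by E_t z_{t+s} = rho_z^s z_t
% consumption from the resource constraint (6.10c)
c  = Z.*k(1:T).^par.alpha.*L.^(1-par.alpha) + (1-par.delta)*k(1:T) - par.a*k(2:T+1);
% (6.11a): static labor supply condition, s = 0,...,T-1
r1 = par.theta*c./(1-L) - (1-par.alpha)*Z.*k(1:T).^par.alpha.*L.^(-par.alpha);
% (6.11b): Euler equation, s = 0,...,T-2
gc = (c(2:T)./c(1:T-1)).^(-par.eta).*((1-L(2:T))./(1-L(1:T-1))).^(par.theta*(1-par.eta));
r2 = 1 - par.beta*par.a^(-par.eta)*gc ...
       .*(1-par.delta+par.alpha*Z(2:T).*k(2:T).^(par.alpha-1).*L(2:T).^(1-par.alpha));
% (6.11c): terminal condition
r3 = k(T+1) - par.kbar;
r  = [r1; r2; r3];
end
```

A call such as fsolve(@(x) bm_ep_residuals(x, kt, zt, par), x0) then delivers the extended path for one period.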
SOLUTIONS. We provide two programs that simulate this model. Both employ the parameter values presented in Table 1.1. The length of the
⁵ Recall that lower case symbols in this system refer to stationary variables. Except for hours $L_t$, upper case symbols designate variables that grow at the rate of technical progress $a - 1$.
extended path is T = 150. The simulations start in period t = 1 at the point $(k, \varepsilon_1)$ with the stationary solution for hours and the capital stock as the initial guess. The following steps employ the solution found in the previous one. The first 1,000 periods of the simulation serve as a burn-in period. Second moments are from an additional 50,000 periods. On a notebook with an Intel® Core i7, 1.90 GHz processor, the MATLAB® script BM_EP.m requires more than five hours to finish these computations. The script employs the nonlinear equations solver fsolve from the MATLAB® Optimization Toolbox. The Fortran program BM_EP.f90 is much faster. On the same machine, it performs the simulations in approximately 15 minutes. The program employs the nonlinear equations solver programmed by Nowak and Weimann (1991). For this reason, we check the accuracy of the solution only with the Fortran program. Along the simulated path of 50,000 periods, it computes the residuals of the Euler equation (6.10d) from both the extended path method and a second-order perturbation solution. As in Sections 4.4 and 5.5, they have the interpretation of the rate of change in consumption required to meet the Euler equation. Gauss-Hermite integration with 9 nodes requires 450,000 additional solutions of system (6.11). For both the simulations and the computation of residuals, the program runs for slightly more than 3 hours.
RESULTS. Figure 6.4 depicts the ergodic set traced out by our simulation. Total factor productivity $z_t$ remains in the interval [−0.05, 0.051], and the capital stock deviates by no more than 7 percent from its stationary value. The maximum absolute value of the Euler equation residual of the extended path method on the set displayed in the figure is equal to approximately 4.0e-6 and is thus three orders of magnitude smaller than the residual of the second-order perturbation solution, which is equal to approximately 6.5e-3. The second moments obtained from the extended path simulation are presented in Table 6.1 and differ only in the second decimal place from those presented in Table 5.3.
Figure 6.4 Ergodic Set of the Benchmark Business Cycle Model from the Extended Path Simulation ($z_t$ plotted against $k_t/k$)

Table 6.1 Second Moments from the Benchmark Business Cycle Model: Extended Path Solution

Variable       s_x    r_xy   r_x
Output         1.39   1.00   0.64
Consumption    0.46   0.99   0.66
Investment     4.16   1.00   0.64
Hours          0.82   1.00   0.64
Real Wage      0.57   0.99   0.65

Notes: Second moments computed from HP-filtered simulated time series with 50,000 included observations and a burn-in period of 1,000 observations. s_x := standard deviation of variable x, r_xy := cross correlation of variable x with output, r_x := first order autocorrelation of variable x.

6.2.4 Application: The Model of a Small Open Economy
As a second example, we present the small open economy model of Correia et al. (1995). We portray this economy from the perspective of a representative household that is both a consumer and a producer. This will streamline the derivation of the necessary equations. Problem 6.2 sketches a decentralized economy with the same dynamic properties. The mindful reader will notice that some of the model's building blocks recur from previous sections.
THE MODEL. Consider a household in a small open economy that uses domestic labor $L_t$ and domestic capital $K_t$ to produce output $Y_t$ according to
$$Y_t = Z_t F(K_t, A_t L_t).$$
As in many earlier sections, $A_t$ denotes the level of labor-augmenting technical progress and $Z_t$ total factor productivity (TFP). The first variable grows deterministically at the gross rate
$$a := A_{t+1}/A_t,\qquad a \ge 1,$$
and the natural logarithm of the second variable, $z_t := \ln(Z_t)$, follows the process
$$z_{t+1} = \rho_z z_t + \varepsilon_{t+1},\qquad \rho_z\in(-1,1),\ \varepsilon_t\ \text{iid}\ N(0,\sigma_\varepsilon^2).$$
As in Section 5.7.4, capital formation is subject to frictions. Investment expenditures $I_t$ do not produce additional capital one-to-one. Instead, it becomes more and more difficult to build up capital as investment expenditures increase. This is captured by
$$K_{t+1} = \Phi(I_t/K_t)K_t + (1-\delta)K_t,\qquad \delta\in(0,1), \tag{6.12}$$
where $\Phi(x)$ is a concave function of its argument $x$. The usual, frictionless process of capital accumulation, $K_{t+1} = I_t + (1-\delta)K_t$, is a special case of (6.12) for $\Phi(I_t/K_t)\equiv I_t/K_t$. The household can freely borrow or lend on the international capital market at the real interest rate $r_t$. This variable is out of the control of the domestic economy, but the household knows its time path. At period $t$, the household's net foreign wealth is $B_t$. Accordingly, the country's trade balance
$$TB_t = Y_t - C_t - I_t$$
can differ from zero and the household's budget constraint reads
$$B_{t+1} - B_t \le TB_t + r_t B_t. \tag{6.13}$$
However, there are legal restrictions on the amount of international borrowing that prevent the household from accumulating debt at a rate that exceeds the respective interest rate, that is:
$$\lim_{s\to\infty} E_t\left\{\frac{B_{t+s+1}}{(1+r_t)(1+r_{t+1})(1+r_{t+2})\cdots(1+r_{t+s})}\right\}\ge 0.$$
A country that is initially a net debtor (B t < 0) must therefore allow for future trade surpluses so that the inequality
$$-B_t \le E_t\left\{\sum_{s=0}^{\infty}\frac{TB_{t+s}}{(1+r_t)(1+r_{t+1})\cdots(1+r_{t+s})}\right\} \tag{6.14}$$
will be satisfied. The household chooses consumption $C_t$, investment $I_t$, working hours $L_t$, future domestic capital $K_{t+1}$, and net foreign wealth $B_{t+1}$ to maximize
$$U_t = E_t\sum_{s=0}^{\infty}\beta^{s}u(C_{t+s}, L_{t+s}),\qquad \beta\in(0,1)$$
subject to the budget constraint (6.13), the capital accumulation equation (6.12), the solvency condition (6.14), and the given initial stocks of $K_t$ and $B_t$, respectively.
FIRST-ORDER CONDITIONS. The Lagrangian of this problem is
$$\begin{aligned}
\mathcal{L} =\;& u(C_t, L_t) + \Lambda_t\left[Z_t F(K_t, A_t L_t) + (1+r_t)B_t - C_t - I_t - B_{t+1}\right]\\
&+ \Lambda_t q_t\left[\Phi(I_t/K_t)K_t + (1-\delta)K_t - K_{t+1}\right]\\
&+ \beta E_t\Big\{u(C_{t+1}, L_{t+1}) + \Lambda_{t+1}\left[Z_{t+1}F(K_{t+1}, A_{t+1}L_{t+1}) + (1+r_{t+1})B_{t+1} - C_{t+1} - I_{t+1} - B_{t+2}\right]\\
&\qquad\quad + \Lambda_{t+1}q_{t+1}\left[\Phi(I_{t+1}/K_{t+1})K_{t+1} + (1-\delta)K_{t+1} - K_{t+2}\right]\Big\} + \dots
\end{aligned}$$
The multiplier $q_t$ is the price of capital in terms of the consumption good so that $\Lambda_t q_t$ is the price in utility terms (that is, in the units in which we measure utility $u$). Differentiating this expression with respect to $C_t$, $L_t$, $I_t$, $K_{t+1}$ and $B_{t+1}$ provides the first-order conditions
$$\begin{aligned}
0 &= u_C(C_t, L_t) - \Lambda_t, && (6.15a)\\
0 &= u_L(C_t, L_t) + \Lambda_t Z_t F_{AL}(K_t, A_t L_t)A_t, && (6.15b)\\
0 &= q_t - \frac{1}{\Phi'(I_t/K_t)}, && (6.15c)\\
0 &= q_t - \beta E_t\frac{\Lambda_{t+1}}{\Lambda_t}\Big[Z_{t+1}F_K(K_{t+1}, A_{t+1}L_{t+1})\\
&\qquad\qquad + q_{t+1}\left(1-\delta+\Phi(I_{t+1}/K_{t+1}) - \Phi'(I_{t+1}/K_{t+1})(I_{t+1}/K_{t+1})\right)\Big], && (6.15d)\\
0 &= \Lambda_t - \beta E_t\Lambda_{t+1}(1+r_{t+1}). && (6.15e)
\end{aligned}$$
Equations (6.15a), (6.15b), and (6.15e) are standard and need no further comment. Equation (6.15c) determines investment expenditures as a function of the current capital stock and the price of capital $q_t$. According to equation (6.15d), the current price of capital must equal the expected discounted future reward from an additional unit of capital. This reward has several components: the increased output given by the marginal product of capital $Z_{t+1}F_K(K_{t+1}, A_{t+1}L_{t+1})$, the residual value of the remaining unit of capital $q_{t+1}(1-\delta)$, and the increased productivity of future investment $q_{t+1}\Phi(I_{t+1}/K_{t+1}) - q_{t+1}\Phi'(I_{t+1}/K_{t+1})(I_{t+1}/K_{t+1})$.
FUNCTIONAL FORMS. Correia et al. (1995) assume that $F$ is the usual Cobb-Douglas function
$$F(K_t, A_t L_t) = K_t^{\alpha}(A_t L_t)^{1-\alpha},\qquad \alpha\in(0,1). \tag{6.16}$$
For the current period utility function, they consider the specification proposed by Greenwood et al. (1988):
$$u(C_t, L_t) = \frac{\left(C_t - \frac{\theta}{1+\nu}A_t L_t^{1+\nu}\right)^{1-\eta}}{1-\eta},\qquad \theta,\nu > 0. \tag{6.17}$$
We shall see in a moment that this function excludes the income effect on labor supply implied by the utility function in our benchmark business cycle model of Example 1.6.1. As a consequence, working hours are stationary even if consumption follows a random walk. Correia et al. (1995) do not need to specify the function $\Phi$ because they resort to a first-order perturbation solution, which requires only the elasticity of $\Phi'$. However, we need an explicit function to solve the extended path problem and use the specification introduced in (5.62):
$$\Phi(I_t/K_t) = \frac{\varphi_1}{1-\kappa}\left(\frac{I_t}{K_t}\right)^{1-\kappa} + \varphi_2,\qquad \kappa\in\mathbb{R}_{\ge0}\setminus\{1\}. \tag{6.18}$$
The parameter κ determines the degree of concavity. For κ close to zero, adjustment costs of capital play a minor role.
TEMPORARY EQUILIBRIUM. The model depicts a growing economy. Therefore, we must define new variables that are stationary. As in the benchmark business cycle model, this is accomplished by scaling the original variables (in as much as they are not themselves stationary) by the level of labor-augmenting technical progress $A_t$. We think by now the reader is familiar with this procedure and able to derive the following system from (6.15) and the functional specifications (6.16), (6.17), and (6.18), respectively.
$$\begin{aligned}
0 &= \left(c_t - \tfrac{\theta}{1+\nu}L_t^{1+\nu}\right)^{-\eta} - \lambda_t, && (6.19a)\\
0 &= \theta L_t^{\nu} - (1-\alpha)Z_t L_t^{-\alpha}k_t^{\alpha}, && (6.19b)\\
0 &= i_t - (\varphi_1 q_t)^{1/\kappa}k_t, && (6.19c)\\
0 &= q_t - \beta a^{-\eta}E_t\frac{\lambda_{t+1}}{\lambda_t}\Big[\alpha Z_{t+1}k_{t+1}^{\alpha-1}L_{t+1}^{1-\alpha}\\
&\qquad\qquad + q_{t+1}\left(1-\delta+\Phi(i_{t+1}/k_{t+1}) - \varphi_1(i_{t+1}/k_{t+1})^{1-\kappa}\right)\Big], && (6.19d)\\
0 &= \lambda_t - \beta a^{-\eta}E_t\lambda_{t+1}(1+r_{t+1}), && (6.19e)\\
0 &= ak_{t+1} - \Phi(i_t/k_t)k_t - (1-\delta)k_t, && (6.19f)\\
0 &= ab_{t+1} - Z_t k_t^{\alpha}L_t^{1-\alpha} - (1+r_t)b_t + c_t + i_t. && (6.19g)
\end{aligned}$$
The lower case variables are defined as $x_t := X_t/A_t$, $X_t\in\{C_t, I_t, K_t, B_t\}$, except for $\lambda_t := A_t^{\eta}\Lambda_t$. Equation (6.19b) follows from (6.15b), if $\Lambda_t$ is replaced by (6.15a). It determines working hours $L_t$ as a function of the marginal product of labor. In a decentralized economy, the latter equals the real wage per efficiency unit of labor $w_t$. Viewed from this perspective, equation (6.19b) is a static labor supply equation with $w_t$ as its single argument so that there is no operative income effect. This is an implication of the utility function (6.17). Equation (6.19f) is the scaled transition law of capital (6.12), and equation (6.19g) derives from the household's budget constraint (6.13).
CALIBRATION. We do not intend to provide a careful, consistent calibration of this model with respect to a specific small open economy (say, the Portuguese economy, to which Correia et al. (1995) refer) since we left out a few details of the original model (such as government spending and international transfers) and since our focus is on the technical details of the solution and not on the model's descriptive power.⁶ For this reason,
We refer the reader interested in the calibration of small open economy, real business cycle models to Chapter 4 of Uribe and Schmitt-Grohé (2017).
we continue to use the values of the parameters $a$, $\alpha$, $\beta$, $\eta$, $\delta$, $\rho$, $\sigma_\varepsilon$, and $L$ from Table 1.1 and choose $\theta$ so that the stationary fraction of working hours equals $L = 0.126$. The additional parameters are $\nu$ and $\kappa$. The inverse of $\nu$ is the wage elasticity of labor supply, and the inverse of $\kappa$ is the elasticity of the investment to capital ratio with respect to the price of capital. We choose the values of these parameters so that the model approximately matches the standard deviation of output and investment found in German data.⁷ This is the case for $\nu = 0.70$ and $\kappa = 1/35$. Finally, the values of the parameters $\varphi_1$ and $\varphi_2$ of the function (6.18) follow from the condition that adjustment costs play no role on the model's balanced growth path. This requires $i = (a+\delta-1)k$ and $q = 1$, implying
$$\varphi_1 = (a+\delta-1)^{\kappa},\qquad \varphi_2 = (a+\delta-1)\frac{\kappa}{\kappa-1}.$$
BALANCED GROWTH PATH. Given the choices made up to this point, we can solve equations (6.19) for the economy's balanced growth path by ignoring the expectation operator and by setting $x_t = x_{t+1} = x$ for all variables $x$. Equation (6.19e) then implies
$$r = \frac{a^{\eta}}{\beta} - 1. \tag{6.20a}$$
This is a restriction on the parameters of our model, since the real interest rate $r$ is exogenous to the small open economy. The properties of the function $\Phi$ imply the solution for the output-capital ratio from equation (6.19d):
$$\frac{y}{k} = \frac{a^{\eta} - \beta(1-\delta)}{\alpha\beta}. \tag{6.20b}$$
Given $L$, we can infer $k$ and $y$ from this solution. This, in turn, allows us to solve (6.19c) for $i$. It is, however, not possible to obtain definite solutions for both $b$ and $c$: on the balanced growth path the budget constraint (6.19g) simplifies to
$$(a-(1+r))b = y - c - i. \tag{6.20c}$$
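The following few lines sketch this computation. They are only illustrative: the parameter values are placeholders and not necessarily those of Table 1.1, and, in line with (6.20c), consumption and foreign assets are left open.

```matlab
% Balanced growth path of the small open economy model, a minimal sketch.
% Parameter values below are illustrative placeholders, not Table 1.1.
a = 1.005; alpha = 0.36; beta = 0.996; eta = 2; delta = 0.014;
nu = 0.70; kappa = 1/35; L = 0.126;
r    = a^eta/beta - 1;                            % world interest rate, (6.20a)
yk   = (a^eta - beta*(1-delta))/(alpha*beta);     % output-capital ratio, (6.20b)
k    = L*yk^(1/(alpha-1));                        % from y/k = (k/L)^(alpha-1)
y    = yk*k;
i    = (a + delta - 1)*k;                         % investment, since Phi(i/k) = i/k
theta = (1-alpha)*k^alpha*L^(-alpha-nu);          % matches L via (6.19b) with Z = 1
phi1 = (a + delta - 1)^kappa;                     % adjustment cost parameters
phi2 = (a + delta - 1)*kappa/(kappa - 1);
% c and b are not pinned down by (6.20c); see the discussion below.
```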
Formally, the parameter restriction (6.20a) deprives the model of one equation. Economically, the possibility to borrow on the international
⁷ See the second moments from HP-filtered time series in Table 4.2.
capital market allows consumption smoothing to a degree that imposes a unit root. To understand this, consider equation (6.19e) and assume a constant real interest rate $r$ and perfect foresight. This implies
$$\frac{\lambda_{t+1}}{\lambda_t} = 1$$
so that there is no tendency for $\lambda_t$ to return to its initial value $\lambda$, say, after a shock. However, $L$ is determined from $a$, $\alpha$, $\beta$, $\delta$, $\theta$, and $\nu$, and will converge if the capital stock converges to $k$. As a consequence, any permanent jump of $\lambda_t$ translates into a permanent jump of consumption and — via the budget constraint — into a permanent change of $b_t$. This unit root is an obstacle for any local solution method. After all, these methods determine the parameters of the policy function from the stationary solution. A model without a tendency to return to its balanced growth path can be driven far from it, even by a sequence of shocks that are themselves generated from a stationary stochastic process. The policy functions that are used to simulate this model might thus become increasingly unreliable. As we will demonstrate in the next paragraph, the extended path algorithm is immune to this problem. Before we turn to the solution of our model, we resolve the problem with $c$ and $b$. We simply assume that the economy starts with zero net foreign debt, $b = 0$, so that $c = y - i$.
SOLUTION. The system of equations (6.19) fits into the general structure of equations (6.8). The shocks are $z_t = [\ln Z_t, r_t]'$, the endogenous state variables with initial conditions are $x_t = [k_t, b_t]'$, and the not predetermined variables are $y_t = [c_t, i_t, L_t, \lambda_t, q_t]'$. We assume — without proof — that the capital stock approaches $k$ but invite the reader to use the methods from Section 3.2 and to check numerically that the model has indeed one root equal to unity, one root between zero and one, and two roots outside the unit circle. As noted above, this implies that $i_t$, $L_t$, $w_t$, and $y_t$ also approach their stationary values. For the model to be consistent with the solvency condition (6.14), it must hold that $b_t$ does not grow without bound but converges to a certain limit in response to a shock. We induce this condition by assuming $b_T = b_{T-1}$ for some large $T$. As in the previous subsection, we reduce system (6.19). For a given value of $\lambda_t$, the Euler equation (6.19e) determines the future path of marginal utility from the given time path of the world interest rate. We can then
use conditions (6.19a)-(6.19c) to eliminate consumption $c_t$, hours $L_t$, and investment $i_t$ for given values of factor productivity $Z_t$, capital $k_t$, and the shadow price of capital $q_t$. Accordingly, our final system has $3T$ unknowns. They are the Lagrange multiplier of the current period budget constraint $\lambda_t$, the sequence of capital stocks $\{k_{t+s}\}_{s=1}^{T-1}$, the sequence of foreign assets $\{b_{t+s}\}_{s=1}^{T-1}$, and the shadow prices of capital $\{q_{t+s}\}_{s=0}^{T}$. Given the expected paths of both shocks, $E_t\{z_{t+s}\}_{s=0}^{T}$ and $E_t\{r_{t+s}\}_{s=0}^{T}$, the nonlinear system of equations consists of (6.19f) and (6.19g) for $t$ through $t+T-1$, and the Euler equation (6.19d) for $t$ through $t+T$. For a given $\lambda_t$, all future values follow from the Euler equation (6.19e). Equations (6.19b) can then be solved for hours for $t$ through $t+T$, and equations (6.19c) can be solved for investment.

RESPONSE TO A PRODUCTIVITY SHOCK. Figure 6.5 depicts the response of the model's variables to an unexpected increase in TFP in period $t = 1$ computed from the GAUSS program SOE.g. The world interest rate $r_t$ stays at its long-run solution. As in the benchmark model, the outward shift of the production function raises the marginal product of labor and the household works more so that output increases by more than the increase in productivity. The household's desire to smooth consumption raises the demand for domestic investment so that the price of capital increases. The prospect of above-average productivity raises the anticipated return on domestic investment so that investment remains above its long-run level for several periods. For a few periods, the increases in consumption and investment exceed the increase in output and the country's trade balance turns negative. Accordingly, the country becomes a net debtor. Since the world interest rate is given, marginal utility declines on impact and remains at this lower level. As the factor prices, investment, and capital return to their long-run levels, the trade balance increases temporarily. Finally, however, the income effect of the shock turns the country into a net lender. The proceeds from foreign assets finance permanent consumption beyond its initial level.

Figure 6.5 Impulse Responses to a Productivity Shock in the Small Open Economy Model (panels: TFP shock and output; hours and real wage; investment and consumption; marginal utility and price of capital; trade balance; foreign assets)

RESPONSE TO A WORLD INTEREST RATE SHOCK. Consider next an unanticipated shock to the world interest rate in $t = 1$. If this shock is not autocorrelated so that $r_t = r$ for $t = 2,3,\dots$, there are no anticipated effects via the Euler equation (6.19e), and the current impact depends on the foreign asset position. Since $Z_1 = Z$ and $k_1 = k$, the first-order
condition for labor supply implies L1 = L. Accordingly, if the country is initially neither a lender nor a borrower, b1 = 0, condition (6.19g) holds for an unchanged trade balance implying unchanged values of consumption c1 = c and investment. In this case, only autocorrelated shocks have temporary and lasting effects.
Figure 6.6 Impulse Responses to a World Interest Rate Shock in the Small Open Economy Model (panels: interest rate shock and output; hours and real wage; investment and consumption; marginal utility and price of capital; trade balance; foreign assets)
Figure 6.6 displays the consequences of an autocorrelated, positive interest rate shock in the upper left panel. The shock hits the economy in period t = 1 and raises the world interest rate by one percent. The shock is autocorrelated with coefficient equal to 0.9. The prospect of temporarily higher returns on the world capital market increases savings and triggers a portfolio adjustment. Consumption declines, the price of
capital falls, and investment in the country's capital stock declines in favor of foreign bonds. The smaller stock of capital decreases the marginal product of labor so that the real wage and employment decline. This occurs no sooner than in period $t = 2$ so that output begins to decline in the period after the incidence of the shock. In the end, we observe a permanent increase of consumption financed from the interest income on foreign bonds. Therefore, the initial trade surplus is replaced by a permanent trade deficit in the long run. Note, however, that all these effects are small relative to the size of the shock.
SECOND MOMENTS. How does the extended path method perform relative to the perturbation solution of the small open economy model? To answer this question, we use a modified version of the model without the unit root property. For this model, we compute the usual set of second moments from simulations with the second-order perturbation solution. We compare this set with a table of second moments obtained from the extended path solution of the model (6.19) with the unit root. Among the several ways to remove the unit root (see Schmitt-Grohé and Uribe (2003) and Uribe and Schmitt-Grohé (2017), Chapter 4) we introduce a debt-elastic interest rate by assuming
$$r_t = r + \psi\left(e^{b - b_t} - 1\right),$$
where $r$ is the time-invariant interest rate defined in equation (6.20a) and $b$ is the constant long-run level of foreign assets. Accordingly, a country that is initially a net debtor, $b < 0$, must pay an interest rate above $r$ if its current debt increases, $b_t < b$, and a country that is initially a net creditor, $b > 0$, must accept an interest rate below $r$ if its foreign assets increase beyond $b$. Our simulations with the Fortran program SOE_EP.f90 exclude world interest rate shocks. We set $\psi = 0.025$ for the perturbation solution and $\psi = 0$ in the extended path solution, so that in the latter the unit root remains and the world interest rate is constant.⁸
⁸ We also provide a MATLAB® version of this program. However, SOE_EP.m is considerably slower than the Fortran program.
Table 6.2 Second Moments from the Small Open Economy Model

               Perturbation           Extended Path
Variable       s_x    r_xy   r_x      s_x    r_xy   r_x
Output         1.39   1.00   0.64     1.41   1.00   0.65
Consumption    0.78   1.00   0.65     0.74   1.00   0.65
Investment     3.50   0.99   0.60     3.67   0.97   0.63
Hours          0.82   1.00   0.64     0.83   1.00   0.65
Real Wage      0.57   1.00   0.64     0.58   1.00   0.65

Notes: Second moments computed from HP-filtered simulated time series with 50,000 included observations and a burn-in period of 1,000 observations. s_x := standard deviation of variable x, r_xy := cross correlation of variable x with output, r_x := first order autocorrelation of variable x.
in the extended path solution. These differences are not very sensitive to the choice of the value of the parameter ψ unless ψ is close to zero. In this case, the feed-back effect ceases to work in the modified model. The process for foreign assets implied by the perturbation solution becomes highly persistent and domestic investment has to absorb more of the effects of TFP shocks. For instance, the value ψ = 0.000742 from Uribe and Schmitt-Grohé (2017), p. 85, increases the standard deviation of investment from 3.50 (see Table 6.2) to 3.75. The much larger value of ψ = 1.3, estimated by Uribe and Schmitt-Grohé (2017) from Argentine data, reduces the standard deviation of investment obtained from the perturbation solution to 3.20, while the other second moments differ only negligibly from those reported in Table 6.2. As in the case of the benchmark business cycle model, the extended path solution is highly accurate. Our Fortran program computes the residuals $er_t$ of equation (6.19d) with Gauss-Hermite integration on nine points. In a simulation with T = 10,000 observations, the maximum of $|er_t|/c_t$ over $t$ is less than 6.2e-6. The residual from a simulation with the second-order perturbation solution is equal to approximately 4.2e-2.
6.2.5 Conclusion
The extended path method presented in the preceding sections is a highly accurate method. In our two applications it achieves Euler residuals that
are several orders of magnitude smaller than those obtained from a second-order perturbation solution. Compared to this latter method, it requires more computational effort, both in terms of coding the solution and in terms of computational time. Whereas it takes only a few seconds to solve and simulate a model with perturbation methods, minutes or even hours may be required to perform the same simulation with the extended path method. Different from higher-order perturbation solutions, the extended path solution rests on certainty equivalence. However, it can be applied to models where some of the variables are not stationary. In addition, it is useful if the focus of a study is not on second moments but on the expected time path of a highly nonlinear model under given paths of its driving forces. A recent application in this respect is the study of Christiano et al. (2015) on the driving forces of the 2007/09 recession.
6.3 Simulation and Function Approximation

6.3.1 Motivation
The weighted residual methods considered in Chapter 5 compute the residual function on a predefined set of points. The drawback of this approach is that the algorithm must compute the residual even at points that are unlikely to be visited by the model economy in a stochastic simulation (see Figure 5.2). Hence, it is tempting to combine stochastic simulation with methods of function approximation. We follow Judd et al. (2011) and refer to this combination as generalized stochastic simulation (GSS). We employ the stochastic Ramsey model with the parameterization defined in (6.2) to illustrate this idea and consider a variety of ways to implement this approach.
THE MODEL. We repeat the system of stochastic difference equations that determines the time path of this model:
$$\begin{aligned}
K_{t+1} &= e^{z_t}K_t^{\alpha} + (1-\delta)K_t - C_t, && \alpha\in(0,1),\ \delta\in(0,1], && (6.21a)\\
C_t^{-\eta} &= \beta E_t\left\{C_{t+1}^{-\eta}\left(1-\delta+\alpha e^{z_{t+1}}K_{t+1}^{\alpha-1}\right)\right\}, && \beta\in(0,1),\ \eta\ge0, && (6.21b)\\
z_{t+1} &= \rho_z z_t + \varepsilon_{t+1},\quad \varepsilon_{t+1}\ \text{iid}\ N(0,\sigma_\varepsilon^2), && \rho_z\in(-1,1). && (6.21c)
\end{aligned}$$
The solution of this model consists of the policy functions for the next-period capital stock and consumption:
$$\begin{aligned}
K_{t+1} &= h^K(K_t, z_t), && (6.22a)\\
C_t &= h^C(K_t, z_t). && (6.22b)
\end{aligned}$$
Note that we need only one of the two functions to simulate the model since we can employ the economy's resource constraint (6.21a) to solve for the respective other variable. Alternatively, we may approximate the conditional expectation on the right-hand side (rhs) of the Euler equation (6.21b), which is equivalent to the policy function of marginal utility $\Lambda_t$:
$$\Lambda_t \equiv C_t^{-\eta} = h^{\Lambda}(K_t, z_t). \tag{6.22c}$$
APPROXIMATING FUNCTION. In Chapter 5, we argue for Chebyshev polynomials as basis functions. In the present context, they are less suited since we do not know in advance the boundaries of the state space that we will encounter during the simulation. Polynomials whose domain is the entire real line $\mathbb{R}$, such as monomials and Hermite polynomials, are better suited. The same applies to neural networks, which are an important tool in machine learning. To fix ideas, suppose we choose a second-order complete polynomial to approximate the policy function for consumption:
$$C_t = b_1 + b_2 K_t + b_3 z_t + b_4 K_t^2 + b_5 K_t z_t + b_6 z_t^2. \tag{6.23}$$
SIMULATION. In the next step, we choose the number of periods $T$ for the simulation, draw a set of $T$ random numbers from the $N(0,\sigma_\varepsilon^2)$ distribution, and compute the time series of TFP shocks $z := \{z_t\}_{t=0}^{T}$ from the process (6.21c) and an initial value $z_0$. We keep this sequence over the entire solution process. For $t = 0,\dots,T$ let $x_t := [1, K_t, z_t, K_t^2, K_t z_t, z_t^2]'$ abbreviate monomials up to degree 2 in the vector of states and $b := [b_1, b_2,\dots,b_6]'$ the vector of parameters so that $C_t = x_t'b$.
Given some initial capital stock $K_0$ and an initial parameter vector $b^0$, we can then iteratively compute a time sequence of capital and consumption:
$$\begin{aligned}
C_t &= x_t'b^0,\\
K_{t+1} &= e^{z_t}K_t^{\alpha} + (1-\delta)K_t - C_t,\qquad t = 0,\dots,T,\\
x_{t+1} &= [1, K_{t+1}, z_{t+1}, K_{t+1}^2, K_{t+1}z_{t+1}, z_{t+1}^2]'.
\end{aligned}\tag{6.24}$$
The problem that we may face at this point is that at the chosen parameters $b^0$ and/or the drawn sequence of shocks $z$, we are not able to compute the full sequence of consumption and capital. For instance, capital may diverge to some large number, and consumption may become negative at some $t < T$. Accordingly, the choice of the initial parameter vector is key to success.
RESIDUALS. Given the time paths of capital and consumption, $\{K_t\}_{t=0}^{T}$ and $\{C_t\}_{t=0}^{T}$, we can compute the residuals of the Euler equation (6.21b). Note that the capital stock at period $t+1$ is determined from the pair $(K_t, z_t)$, so that only consumption $C_{t+1}$ depends on the innovation $\varepsilon_{t+1}$. Let $\varepsilon_{t+1,j}$, $j = 1,\dots,m$ denote $m$ different integration nodes, $\nu_j$ denote their weights, and $C_{t+1,j}$ denote the respective solution for consumption:
$$\begin{aligned}
C_{t+1,j} &= x_{t+1,j}'b^0,\\
x_{t+1,j} &:= [1, K_{t+1}, z_{t+1,j}, K_{t+1}^2, K_{t+1}z_{t+1,j}, z_{t+1,j}^2]',\\
z_{t+1,j} &:= \rho_z z_t + \varepsilon_{t+1,j}.
\end{aligned}$$
Then, the numeric approximation of the conditional expectation on the rhs of the Euler equation (6.21b) is given by
$$E_t\left\{\beta C_{t+1}^{-\eta}\left(1-\delta+\alpha e^{z_{t+1}}K_{t+1}^{\alpha-1}\right)\right\} \approx \sum_{j=1}^{m}\nu_j\,\beta C_{t+1,j}^{-\eta}\left(1-\delta+\alpha e^{z_{t+1,j}}K_{t+1}^{\alpha-1}\right). \tag{6.25}$$
For $m = 1$ and therefore $\varepsilon_{t+1,1}\equiv\varepsilon_{t+1}$ and $\nu_1 = 1$, this is a one-node Monte Carlo integration and the respective residual $er_t$ equals:
$$er_t = 1 - \beta\left(\frac{C_{t+1}}{C_t}\right)^{-\eta}\left(1-\delta+\alpha e^{z_{t+1}}K_{t+1}^{\alpha-1}\right).$$
Judd et al. (2011) argue that deterministic integration provides both higher accuracy of the final solution and a faster convergence. Accordingly,
employing the Gauss-Hermite integration formula (14.31) with $m$ nodes $x_j^{GH}$ and weights $\nu_j^{GH}$ gives the approximate residual
$$\begin{aligned}
er_t &= 1 - \beta\sum_{j=1}^{m}\frac{\nu_j^{GH}}{\sqrt{\pi}}\left(\frac{C_{t+1,j}}{C_t}\right)^{-\eta}\left(1-\delta+\alpha e^{z_{t+1,j}}K_{t+1}^{\alpha-1}\right),\\
z_{t+1,j} &= \rho_z z_t + \sqrt{2}\,\sigma_\varepsilon x_j^{GH}.
\end{aligned}\tag{6.26}$$
Via the simulation step (6.24), the vector of Euler equation residuals $er := [er_0,\dots,er_T]'$ is a function of the parameter vector $b^0$, and we indicate this by writing $er(b^0)$. It might be tempting to choose the vector $b$ that minimizes the sum of squared residuals:
$$b := \operatorname*{argmin}_{b^0}\; er(b^0)'er(b^0). \tag{6.27}$$
However, algorithms that solve this nonlinear least squares problem bear the danger of running into parameter values at which the simulation step breaks down.
UPDATE OF THE PARAMETER VECTOR. Judd et al. (2011) argue for an iterative procedure. Let $y_t := (1+er_t)C_t = x_t'b + u_t$ define the linear regression model with errors $u_t$, which arise from the simulation of the model with initial parameter vector $b^0$, as defined in (6.24). We construct an update of $b^0$ by minimizing the sum of squared errors $u_t$. Let $X(b^0) := [x_0,\dots,x_T]'$ and $y(b^0) := [y_0,\dots,y_T]'$; then, the well-known formula of the linear least squares estimator provides the updated parameter vector $b^1$:⁹
$$b^1 = \left(X(b^0)'X(b^0)\right)^{-1}X(b^0)'y(b^0). \tag{6.28}$$
The simulation step (6.24) and the update step (6.28) proceed until $b^1$ and $b^0$ are sufficiently close together. Alternatively, one might stop the iteration process if two successive iterates for consumption are close together. Since the parameter vector $b$ is of no special interest, this latter criterion is more important and may shorten the solution process if the
⁹ See any econometrics textbook for this formula, e.g., Greene (2012), p. 68, equation (3-6), or Stock and Watson (2012), p. 744, equation (18.11).
policy function converges faster than the parameter vector. To stabilize the convergence process, a common update strategy is to employ a convex combination of $b^1$ and $b^0$ as the updated estimate of $b$:
$$b = \psi b^1 + (1-\psi)b^0,\qquad \psi\in(0,1).$$
Finally, instead of the fixed point iterations, one might employ a nonlinear equations solver to compute the solution of
$$0_{6\times1} = b - \left(X(b)'X(b)\right)^{-1}X(b)'y(b), \tag{6.29}$$
where $X$ and $y$ follow from the simulation step (6.24).
IMPLEMENTATION. We study the performance of various implementations of the algorithm outlined in the previous paragraphs with the program SGM_GSS.m. It allows for three different methods: fixed point iterations over (6.24) and (6.28), minimization of Euler equation residuals (6.27), and direct computation of the fixed point (6.29). The user can also choose among the policy functions for consumption, capital, and marginal utility. The computation of Euler residuals employs either one-node Monte Carlo integration or Gauss-Hermite integration. Finally, the script offers three strategies to find the initial parameter vector: a simulation that employs the perturbation solution of the model, a simulation that employs the exact analytic solution of the model for η = 1 and δ = 1, and a genetic search algorithm coded in our function GSearch1.m. Combining the various alternatives yields 54 different ways to solve the model. Table 6.3 displays only those combinations that successfully solved the model. The Euler equation residuals in the right-most column were computed from a second set of random numbers. As explained in Section 1.7.2, they measure the (maximum absolute) relative change in consumption required to meet the Euler equation (6.21b).
RESULTS. The first lesson drawn from the table is that finding a solution is by no means the rule: only 13 of the 54 attempts were successful. Among those, the iterative procedure dominates with 7 solutions. Minimizing the sum of squared Euler residuals is the least effective method with only one solution. The remaining 5 solutions result from the direct computation of the fixed point. Initial values from the perturbation solution accelerate the solution in two respects: they are fast to compute, and they reduce
Table 6.3 Successful GSS Solutions of the Stochastic Growth Model

Method   Policy Function   Initialization   Integration   Iterations   Euler Residual
1        1                 1                2             2            5.7478E-07
1        2                 1                1             569          7.2423E-05
1        2                 1                2             2            2.1092E-06
1        2                 3                1             9340         6.8978E-05
1        3                 1                1             1            1.1679E-04
1        3                 1                2             1            7.6980E-06
1        3                 3                1             164          7.6227E-05
2        3                 2                2             .            7.1144E-04
3        1                 1                2             .            4.3402E-07
3        1                 3                2             .            4.3402E-07
3        2                 3                2             .            7.4530E-03
3        3                 1                2             .            7.4955E-06
3        3                 3                2             .            7.4955E-06

Notes: Method: 1=fixed point iterations, 2=minimization of Euler equation residuals, 3=direct computation of the fixed point; Policy function: 1=consumption, 2=marginal utility, 3=next period capital; Initialization: 1=perturbation solution, 2=analytic solution, 3=genetic search; Integration: 1=one point Monte-Carlo, 2=5 point Gauss-Hermite quadrature.
the required number of iterations to convergence in the iterative solution procedure. Genetic search is computationally more expensive and the respective initial values appear to be far from the final solution so that the iterative algorithms require many steps (often more than our upper limit on the number of iterations) to converge. Targeting the policy function for capital was successful in 6 out of the possible 18 attempts, whereas targeting the consumption function was effective in only 3 of 18 attempts. One-node Monte-Carlo integration appears to be inferior to Gauss-Hermite integration (we used 5 nodes) with respect to both solving the model at all and the final accuracy in terms of the Euler equation residuals.
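To make the fixed-point variant concrete, the following MATLAB sketch simulates the model with (6.24), computes the Gauss-Hermite residuals as in (6.26), and updates the coefficients by the damped least squares step (6.28). It is only a sketch under stated assumptions: all names are ours, the crude constant-consumption initialization is for illustration (the discussion above stresses that initialization from a perturbation solution is more reliable), and the book's script SGM_GSS.m differs in many details.

```matlab
% GSS fixed-point iteration for the stochastic growth model (6.21): a sketch.
alpha = 0.36; beta = 0.996; eta = 2; delta = 0.014; rhoz = 0.82; sige = 0.0071;
T = 10000; psi = 0.5; maxit = 500; tol = 1e-8;
xGH = [-2.020183 -0.958572 0 0.958572 2.020183];          % 5-node Gauss-Hermite nodes
vGH = [ 0.019953  0.393619 0.945309 0.393619 0.019953];   % and weights
Ks  = (alpha/(1/beta-1+delta))^(1/(1-alpha));             % deterministic steady state
Cs  = Ks^alpha - delta*Ks;
z   = zeros(T+1,1); e = sige*randn(T+1,1);
for t = 1:T, z(t+1) = rhoz*z(t) + e(t+1); end             % TFP path from (6.21c)
X   = @(K,zz) [ones(size(K)) K zz K.^2 K.*zz zz.^2];      % regressors of (6.23)
b   = [Cs; 0; 0; 0; 0; 0];                                % crude initial guess: C = Cs
for it = 1:maxit
    K = zeros(T+1,1); C = zeros(T,1); K(1) = Ks;
    for t = 1:T                                           % simulation step (6.24)
        C(t)   = X(K(t),z(t))*b;
        K(t+1) = exp(z(t))*K(t)^alpha + (1-delta)*K(t) - C(t);
    end
    rhs = zeros(T,1);                                     % bracketed term in (6.26)
    for j = 1:numel(xGH)
        zj  = rhoz*z(1:T) + sqrt(2)*sige*xGH(j);
        Cj  = X(K(2:T+1),zj)*b;                           % C_{t+1,j} from the same rule
        rhs = rhs + vGH(j)/sqrt(pi)*beta*(Cj./C).^(-eta) ...
                  .*(1-delta+alpha*exp(zj).*K(2:T+1).^(alpha-1));
    end
    y    = (2-rhs).*C;                                    % y_t = (1+er_t)C_t, er_t = 1-rhs
    bnew = X(K(1:T),z(1:T))\y;                            % least squares update (6.28)
    if max(abs(bnew-b)) < tol, break, end
    b = psi*bnew + (1-psi)*b;                             % damped update, psi in (0,1)
end
```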
6.3.2 The General Algorithm In this subsection, we describe the steps involved in simulation-based approaches to function approximation in more general terms. The objective is to solve the canonical DSGE model defined in equations (6.8).
Algorithm 6.3.1 (Generalized Stochastic Simulation (GSS))
Purpose: Approximate solution of a DSGE model.
Steps:
Step 1: Initialize:
Step 1.1: Choose the smallest number $m$ of variables and/or expectational terms required to solve system (6.8a) for the remaining variables.
Step 1.2: Let $w_t := [x_t', z_t']'\in W\subset\mathbb{R}^{n(x)+n(z)}$ denote the vector of endogenous and exogenous state variables and $h_j : W\to\mathbb{R}$, $j = 1,\dots,m$ the respective policy functions.
Step 1.3: Choose some flexible functional form
$$\hat{h}_j(w_t,\gamma_j),\qquad \gamma_j = [\gamma_{1,j},\dots,\gamma_{K_j,j}]'\in\mathbb{R}^{K_j} \tag{6.30}$$
that approximates the policy function $h_j$.
Step 1.4: Choose the number of periods $T$, draw a random sequence of shocks $\{\varepsilon_t\}_{t=0}^{T}$ from the $N(0_{n(z)\times1}, I_{n(z)})$ distribution, and obtain the time path of the exogenous state variables $z_t$ by iterating equation (6.8b) for $t = 0,\dots,T-1$.
Step 1.5: Let $\Gamma_s = (\gamma_{jk})_s$ denote the collection of parameters at step $s$. For $s = 0$ initialize the matrix $\Gamma_s\in\mathbb{R}^{\max\{K_j\}\times m}$.
Step 2: Simulation and solution:
Step 2.1: Select an initial vector $w_0\in W$, employ the approximate policy functions (6.30), and for $t = 0,1,\dots,T$ solve system (6.8a) to obtain the sequences $\{x_t\}_{t=0}^{T}$ and $\{y_t\}_{t=0}^{T}$.
Step 2.2: Compute the matrix of residuals $R\in\mathbb{R}^{T\times m}$ from $\{x_t\}_{t=0}^{T}$, $\{y_t\}_{t=0}^{T}$, $\{z_t\}_{t=0}^{T}$ and the respective equations of the model.
Step 2.3: Determine the matrix $\Gamma$ that minimizes $R$ in some given matrix norm.
Step 3: Verify the quality of the solution $\Gamma$: Draw a new sequence of shocks $\{\varepsilon_t\}_{t=0}^{T}$, simulate the model, and compute the matrix $R$. If the errors are too large, return to Step 1 and increase the degree of approximation and/or choose a different family of basis functions.
As the previous section illustrates, a particular implementation of this algorithm involves many decisions: the choice of policy functions, basis functions, degrees of approximation, the number of simulation periods, the integration method involved in the computation of the residuals, the
update method, and the choice of the stopping criterion. Many of these resemble the decisions involved in the implementation of the weighted residual methods discussed in Section 5.3. Accordingly, we will consider only those points where the two methods differ.
BASIS FUNCTIONS. Chebyshev polynomials, the family of choice for the spectral methods considered in Chapter 5, are less suited for simulation-based methods. The first reason is that their domain is the compact interval [−1, 1]. However, we do not know the boundaries of the ergodic set in advance, and it might not resemble an n(w)-dimensional hypercube (compare Figure 6.4). Accordingly, using this family requires either multidimensional extrapolation of the policy functions at the boundaries or trial and error to determine the boundaries or a combination of both. The second reason is that Chebyshev polynomials are orthogonal on the discrete set of points consisting of the zeros of the polynomial of degree n (compare (13.32)) but not on the ergodic set. The domain of the monomials and the Hermite polynomials is the entire real line; however, this advantage comes at a cost. As explained in Section 13.5.3, higher order monomials are similar to each other. Therefore, solving the regression problem as part of Step 3 may suffer from multicollinearity. The elements of a basis of Hermite polynomials (see Section 13.7.4) are less correlated than the elements of a basis of monomials. Nevertheless, they are not orthogonal under the measure of the ergodic set. Accordingly, the regression step should be able to address multicollinearity. Since even the state space of the stochastic growth model has two dimensions, we have to address the curse of dimensionality. As explained in Section 13.9, the set of complete polynomials offers a partial remedy as compared to the tensor product base. Instead of linear combinations of polynomials, we may employ flexible nonlinear functions. Neural networks provide such an alternative; we introduce them in Section 13.9.5. Their potential can be seen in the work of Lim and McNelis (2008), who employ neural networks with few nodes to solve a variety of small open economy models. More recent applications are Maliar et al. (2021) and Fernández-Villaverde et al. (2023). A potential drawback of their use is that the solution step requires nonlinear regression methods.
SIMULATION LENGTH. Let $\Omega_T := \{(x_t, y_t, z_t)\}_{t=0}^{T}$ denote the set of points traced out by a simulation of the canonical model (6.8) with length $T$. Suppose we could compute this set with the exact policy functions. In this case, $\Omega_T$ is a sample of size $T$ drawn from the ergodic distribution arising from the solution of the system of stochastic difference equations. The law of large numbers implies that the first and second moments of time series estimated from this set converge in probability to those constructed from the measure of the ergodic set. Accordingly, simulation-based approaches to solve the stochastic growth model use large integer values of $T$. For instance, Duffy and McNelis (2001) use T = 2,000, Den Haan and Marcet (1990) use T = 2,500, the Fortran programs of Marcet and Lorenzoni (1999) allow for a maximum of T = 10,000, the MATLAB® program of Judd et al. (2011) also sets T = 10,000, and Christiano and Fisher (2000) choose an even larger T = 100,000. There are two criteria that can guide the choice of T: the maximum Euler equation error and/or the sensitivity of time series moments. If an increase in T from T₁ to T₂ does not decrease the error significantly and/or only changes time series moments beyond the second decimal digit, one can accept T₁.
REGRESSION METHOD. The solution of the model involves a regression problem. If we determine the parameter matrix $\Gamma$ by minimizing the sum of squared residuals, this problem is nonlinear. The nonlinearity arises from the computation of Euler equation residuals and not from the choice of the approximating functions. Even if the latter are linear in the parameters, as in our example in equation (6.23), the Euler equation residuals are nonlinear functions of the columns of the matrix $\Gamma$. In Section 15.4.2, we introduce the reader to the damped Gauss-Newton method, which is a standard procedure to solve nonlinear least squares problems. If we determine the parameter vector from fixed point iterations, we can avoid the additional complexities of nonlinear regression methods by choosing approximations of the policy functions that are linear in the parameters. The repeated steps of simulation and parameter estimation boil down to a nonlinear difference equation in the matrix $\Gamma$, and there is no guarantee that this system is asymptotically stable. To improve convergence, the updated estimate is taken as a convex combination of the matrix $\Gamma_s$ employed in the simulation and the matrix $\Gamma^{\star}$ that solves the regression problem:
$$\Gamma_{s+1} = \psi\Gamma^{\star} + (1-\psi)\Gamma_s,\qquad \psi\in(0,1]. \tag{6.31}$$
The choice of the dampening parameter ψ may require some trial and error. Note that the simulation and regression step are also involved in a direct computation of the fixed point via the solution of the respective system of nonlinear equations similar to the example given in (6.29). Polynomial bases in the state vector may lead to an ill-conditioned least squares problem. There are two sources of ill-conditioning: poorly scaled variables and multicollinearity. A solution to the first problem is normalization of the columns of the regressor matrix X so that each column has a mean of zero and unit variance. Multicollinearity occurs if the columns of the data matrix X are strongly correlated. Since higher order monomials are very similar to each other (see Figure 13.4), multicollinearity will be a particular problem in monomial bases with a large number of elements in each dimension of the state space. Judd et al. (2011) present a variety of remedies encoded in a simple-to-use MATLAB® function Num_Stab_Approx.m. Among them is the truncated principal components estimator presented in Section 12.9.6. They also consider an alternative to the least squares method and determine the parameter matrix Γ by minimizing the sum of absolute deviations.
6.3.3 Application: The Benchmark Business Cycle Model
In this section, we consider the benchmark business cycle model. We refer the reader to Example 1.6.1, the system of equations (1.64) (reproduced in (5.1)), and the calibration in Table 1.1.
IMPLEMENTATION. As in Section 5.5, we approximate the policy function for working hours $h_L$ on the two-dimensional state space $w_t = (k, z)'\in W\subset\mathbb{R}^2$, where $k$ is the scaled capital stock and $z = \ln Z$ is the natural logarithm of the total factor productivity (TFP) $Z$. We compare an approximation that is linear in the parameter vector $\gamma$ with a neural network. In the linear case, we construct a complete basis from the family of monomials with degree $d$:
$$\hat{h}_L(k, z,\gamma) = \sum_{j=0}^{d}\;\sum_{\substack{i_1,i_2\ge0\\ i_1+i_2=j}}\gamma_{i_1,i_2}k^{i_1}z^{i_2}$$
and consider degrees between $d = 2$ and $d = 4$ so that the parameter vector $\gamma$ has 6, 10, and 15 elements, respectively. In the nonlinear case, we consider a neural network with $N = 2$ and $N = 4$ neurons in the hidden layer and the sigmoid function as transfer function:
$$\hat{h}_L(k, z,\gamma) = \sum_{i=1}^{N}\frac{\gamma_{0i}}{1+e^{-x_i}},\qquad x_i = \gamma_{i1} + \gamma_{i2}k + \gamma_{i3}z.$$
Accordingly, the approximate functions have 8 and 16 parameters, respectively. From a simulation of length $T$, we compute the residuals of the Euler equation (1.64) via Gauss-Hermite integration with $j = 1,\dots,n_{GH}$ nodes $x_j^{GH}$ and weights $\nu_j^{GH}$:
$$er_t := 1 - \beta a^{-\eta}\sum_{j=1}^{n_{GH}}\frac{\nu_j^{GH}}{\sqrt{\pi}}\frac{\lambda(k_{t+1}, z_j)}{\lambda_t}\left(1-\delta+r(k_{t+1}, z_j)\right),\qquad z_j := \rho_z z_t + \sqrt{2}\,\sigma_\varepsilon x_j^{GH},$$
where $k_{t+1}$ and $\lambda_t$ are, respectively, the capital stock of the next period and the marginal utility of consumption of the current period, both computed by solving system (1.64) for a given $\hat{h}_L(k_t, z_t,\gamma_s)$. In the same way, $\lambda(k_{t+1}, z_j)$ and $r(k_{t+1}, z_j)$ denote the solution of this system for $\hat{h}_L(k_{t+1}, z_j,\gamma_s)$. For both kinds of approximating functions, we determine the parameter vector in step $s$ of the fixed-point iterations to minimize the sum of squared errors $u_t$ of the model
$$u_t(\gamma) = L_t(1+er_t) - \hat{h}_L(k_t, z_t,\gamma).$$
The solution $\gamma^{\star}$ of the least squares problem
$$\gamma^{\star} = \operatorname*{argmin}_{\gamma}\sum_{t=1}^{T}u_t(\gamma)^2$$
is then used to update the parameter vector as proposed in equation (6.31). We stop the algorithm if the policy functions for hours from two successive iterations differ by less than some small number eps:
$$\frac{1}{T}\sum_{t=1}^{T}\left|\frac{\hat{h}_L(k_t, z_t,\gamma_{s+1}) - \hat{h}_L(k_t, z_t,\gamma_s)}{\hat{h}_L(k_t, z_t,\gamma_{s+1})}\right| < \text{eps}.$$
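For concreteness, a short sketch of how the sigmoid network above can be evaluated in MATLAB follows; the helper name hL_net and the way the coefficients are stacked in gamma are our own assumptions.

```matlab
function L = hL_net(k, z, gamma, N)
% Evaluate the sigmoid network policy for hours at the points (k, z):
% L = sum_i gamma_{0i}/(1 + exp(-x_i)), x_i = gamma_{i1} + gamma_{i2} k + gamma_{i3} z.
g0 = gamma(1:N);                       % output weights gamma_{0i}
G  = reshape(gamma(N+1:end), N, 3);    % rows [gamma_{i1} gamma_{i2} gamma_{i3}]
x  = G(:,1)' + k*G(:,2)' + z*G(:,3)';  % T x N matrix of hidden-layer inputs
L  = (1./(1+exp(-x)))*g0(:);           % T x 1 vector of approximated hours
end
```

The nonlinear least squares step of the fixed-point iteration can then be handed, for example, to lsqnonlin, as in lsqnonlin(@(g) L.*(1+er) - hL_net(k, z, g, N), gamma0).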
We follow the recommendation of Judd et al. (2011), p. 196, and choose eps = ψ × 10^(−4−d), where d denotes the polynomial degree and ψ is the dampening parameter in (6.31). We initialize the policy function for hours in all our different implementations of the GSS method from a second-order perturbation solution. This solution involves 6 parameters. Finally, we test the accuracy of the solution in the following way: We draw a new sequence of innovations $\{\varepsilon_t\}_{t=1}^{T}$ of the process (1.61) and simulate time series for the model's variables from the system of equations (1.64) and the approximate policy function for hours. From this series we omit the first T₁ elements as the burn-in period of the model. From the remaining T − T₁ elements we compute the errors of the Euler equation (1.64h) in the same interpretable definition as in Sections 4.4, 5.5, and 6.2.3.
RESULTS. Table 6.4 presents the results of the different solutions. The MATLAB® program BM_GSS.m employs the linear policy function, the program BM_GSS_NN.m employs the nonlinear one. We compare the accuracy of the solutions to a second-order perturbation solution. Note that different from Tables 4.1 and 5.2, we do not compute the residuals on a rectangular grid but on the time path simulated with the GSS solution. Thus, the results from Table 6.4 are not directly comparable to those in the aforementioned tables. Consider first the linear case. For T = 10,000 the GSS solution with degree d = 2 (and, hence, the same number of parameters) is approximately as accurate as the perturbation solution. Increasing the degree to d = 3 increases the accuracy by one order of magnitude. For d = 4, the additional gain in accuracy is much smaller and amounts to a factor of approximately 0.35. However, it requires many more steps (75 versus 6) to compute this solution. The remaining entries in the table show that there is no remarkable change in the accuracy of solutions if we increase the length of the simulated time series so that the model visits more extreme points in its ergodic set. The relative performance of the different solutions observed for T = 10,000 does not change with longer time series. The number of iterations increases with the degree d of the polynomial. The accuracy in terms of the maximum absolute Euler equation residual is approximately the same for the second-order perturbation solution and the degree d = 2 GSS solution. The latter is approximately one order of magnitude less accu-
Table 6.4 GSS Solutions of the Benchmark Business Cycle Model

   T        Solution   Order/Degree/Neurons   Iterations   Euler Residual
Polynomial basis
   10,000   P          2                                   4.339E-06
   10,000   S          2                      2            5.613E-06
   10,000   S          3                      6            4.367E-07
   10,000   S          4                      75           1.542E-07
   20,000   P          2                                   1.107E-05
   20,000   S          2                      2            1.600E-05
   20,000   S          3                      6            1.268E-06
   20,000   S          4                      80           1.304E-07
   50,000   P          2                                   5.696E-06
   50,000   S          2                      2            6.565E-06
   50,000   S          3                      8            5.093E-07
   50,000   S          4                      82           1.927E-07
Neural network
   10,000   S          2                      25           1.081E-05
   10,000   S          4                      2            1.424E-06
   20,000   S          2                      26           1.688E-05
   20,000   S          4                      2            3.227E-06
   50,000   S          2                      353          6.934E-06
   50,000   S          4                      2            3.570E-06

Notes: Solution: P=perturbation, S=simulation
than the degree d = 3 solution, which is less accurate than the degree d = 4 solution. However, this further increase is not always remarkable. The neural network with 4 neurons performs roughly as well as the polynomial basis of degree d = 2. The ratio of the Euler equation residual of the former to the latter ranges from approximately 2 (T = 20,000) to 0.25 (T = 10,000). The number of iterations is the same. However, the algorithm employing neural networks must solve a nonlinear least squares problem at each iteration, which consumes more computational time than solving the linear regression problem. For instance, on a workstation with an Intel Xeon W-2133 CPU running at 3.60 GHz it takes less than 3 seconds to compute the GSS solution for d = 2 and T = 50,000 in the polynomial
basis and 11 minutes and 42 seconds on the neural net with N = 4 neurons. The algorithm requires the overwhelming part of this time, 11 minutes and 34 seconds, for the first iterative step, when the net is trained with a time series computed from the perturbation solution. Table 6.5 demonstrates that the second moments computed from the simulations do not differ between the polynomial basis and the neural network. Comparing the entries to those in Tables 5.3 and 6.1 reconfirms the conclusion drawn in Section 5.5 that different solution methods for the benchmark business cycle have only negligible effects on the second moments implied by the model. Table 6.5 Second Moments from the Benchmark Business Cycle Model: GSS Solutions
                 Polynomial Basis, d = 2     Neural Network, N = 4
Variable         s_x    r_xy   r_x           s_x    r_xy   r_x
Output           1.38   1.00   0.64          1.38   1.00   0.64
Consumption      0.45   0.99   0.66          0.45   0.99   0.66
Investment       4.13   1.00   0.64          4.13   1.00   0.64
Hours            0.82   1.00   0.64          0.82   1.00   0.64
Real Wage        0.57   0.99   0.65          0.57   0.99   0.65

Notes: Second moments computed from HP-filtered simulated time series with 50,000 included observations and a burn-in period of 1,000 observations. s_x := standard deviation of variable x, r_xy := cross correlation of variable x with output, r_x := first order autocorrelation of variable x.
6.3.4 Application: The Limited Participation Model of Money

Our second application of the GSS method considers a monetary model with three endogenous states that is driven by a technology and a monetary policy shock so that its state space has a dimensionality of five. Accordingly, the model is better suited than the benchmark business cycle model to test the effectiveness of the GSS method. We begin with the motivation of this model.
MOTIVATION. In the textbook IS-LM model, an expansionary monetary shock lowers the nominal interest rate. Since inflationary expectations do not adjust immediately, the real interest rate also declines. This spurs investment expenditures, which in turn raise aggregate spending. Given a sufficiently elastic short-run supply function, output and employment increase. This story is in line with the empirical evidence reviewed in Christiano et al. (1999). However, early monetary DSGE models do not reproduce this liquidity effect. Instead, as agents learn about a monetary shock, they increase their expectations of future inflation and demand compensation in terms of higher nominal interest rates. The model we present here is able to account for both the liquidity and the inflationary expectations effects. It rests on a paper by Christiano et al. (1997). Different from these authors, we also include capital services as a factor of production. The model includes a rudimentary banking sector. Households face a cash-in-advance constraint and can lend part of their financial wealth M t to the banking sector at the gross nominal interest rate q t (one plus the nominal interest rate). The firms in this model pay wages to the household sector before they sell their output. To finance their wage bill, they borrow money from the banking sector. The government injects money into the economy via the banking sector. The crucial assumption is that banks receive the monetary transfer after households have decided about the volume of their banking deposits. Given the additional money, banks lower the nominal interest rate to increase their loans to firms. At the reduced credit costs, firms hire more labor and increase production. The fact that households cannot trade on the market for deposits after the monetary shock has been observed gives the model its name: limited participation model. The economy consists of a representative household, a representative producer, a financial intermediary, and the central bank. We introduce the financial intermediary or bank first. FINANCIAL INTERMEDIARY. At the beginning of the period t, the bank receives deposits of size B t from the household and additional fiat money M t+1 − M t from the central bank. M t represents the beginning-of-period money balances. Accordingly, the bank can lend B t + (M t+1 − M t ) to the producer. At the end of the period, the producer pays interest and principal q t B t to the bank, and the household receives its deposits, including interest payments. Accordingly, in units of the produced good, whose money price is Pt , the bank’s profits are equal to
D_t^B = q_t (B_t + M_{t+1} − M_t)/P_t − q_t B_t/P_t = q_t (M_{t+1} − M_t)/P_t.   (6.32)
PRODUCER. The representative producer employs labor L_t and capital services K_t to produce output according to

Y_t = Z_t K_t^α (A_t L_t)^{1−α},   α ∈ (0, 1).   (6.33)

As in the benchmark model, A_t is the level of labor-augmenting technical progress, which grows deterministically at the rate a − 1 ≥ 0, and the natural log of TFP, z_t = ln Z_t, follows the process

z_{t+1} = ρ_z z_t + ε_{z,t+1},   ρ_z ∈ (−1, 1), ε_{z,t+1} iid N(0, σ_z²).   (6.34)
The producer hires workers at the money wage rate W_t and capital services at the real rental rate r_t. Since it has to pay workers in advance, it borrows W_t L_t at the nominal rate of interest q_t − 1 from the bank so that producer profits are equal to

D_t^P = Y_t − q_t W_t L_t / P_t − r_t K_t.   (6.35)

Maximizing (6.35) with respect to L_t and K_t provides the following first-order conditions:

q_t w_t = (1 − α) Z_t k_t^α L_t^{−α},   w_t := W_t/(A_t P_t), k_t := K_t/A_t,   (6.36a)
r_t = α Z_t k_t^{α−1} L_t^{1−α}.   (6.36b)
Accordingly, producer profits are zero.

MONEY SUPPLY. The central bank controls the money supply factor µ_t = M_{t+1}/M_t. The deviations of µ_t from its stationary level µ, µ̂_t := ln(µ_t/µ), follow the stochastic process:

µ̂_{t+1} = ρ_µ µ̂_t + ε_{µ,t+1},   ρ_µ ∈ (−1, 1), ε_{µ,t+1} iid N(0, σ_µ²).   (6.37)
HOUSEHOLD. The household’s total financial wealth at the beginning of period t is given by M t = B t +X t , where B t is the amount deposited in banks and X t represents cash balances kept for the purchase of consumption
goods. The household receives its wage income at the start of the period. Cash balances and wage income determine an upper threshold on the household's ability to purchase consumption goods:

C_t ≤ (X_t + W_t L_t)/P_t.   (6.38)
We refer to this requirement as the cash-in-advance constraint. The household's real income consists of wages W_t L_t/P_t, net rental income (r_t − δ)K_t from capital services (where capital depreciates at the rate δ), interest on banking deposits (q_t − 1)B_t/P_t, and dividends from the bank D_t^B. The household splits this income between consumption C_t and savings S_t. Savings increase financial wealth M_t and the stock of physical capital K_t. Accordingly, the budget constraint reads:

K_{t+1} − K_t + [(X_{t+1} − X_t) + (B_{t+1} − B_t)]/P_t ≤ (W_t/P_t) L_t + (r_t − δ) K_t + (q_t − 1) B_t/P_t + D_t^B − C_t.   (6.39)
We depart from our usual specification of the household's preferences over consumption and leisure and follow Christiano et al. (1997), who use the instantaneous utility function:

u(C_t, L_t) := (1/(1−η)) [ (C_t − (θ/(1+ν)) A_t L_t^{1+ν})^{1−η} − 1 ],   θ > 0, ν > 0,   (6.40)

that we have already encountered in the small open economy model in Section 6.2.4. Technically, this makes it easy to solve for L_t given the real wage and to separate the role of the elasticity of labor supply 1/ν from other factors. The household maximizes the expected stream of discounted utility

E_t Σ_{s=0}^{∞} β^s u(C_{t+s}, L_{t+s})

with respect to C_t, L_t, K_{t+1}, X_{t+1}, and B_{t+1} subject to (6.38) and (6.39). Since the household must decide on the size of its nominal deposits before the monetary shock is realized, X_t and B_t are state variables of the model. The Lagrangian for this problem is:

L = E_t Σ_{s=0}^{∞} β^s { (1/(1−η)) [ (C_{t+s} − (θ/(1+ν)) A_{t+s} L_{t+s}^{1+ν})^{1−η} − 1 ]
    + Λ_{t+s} [ (W_{t+s}/P_{t+s}) L_{t+s} + (r_{t+s} − δ) K_{t+s} + (q_{t+s} − 1) B_{t+s}/P_{t+s} + D_{t+s}^B − C_{t+s}
                − (K_{t+s+1} − K_{t+s}) − ((X_{t+s+1} − X_{t+s}) + (B_{t+s+1} − B_{t+s}))/P_{t+s} ]
    + Ξ_{t+s} [ (X_{t+s} + W_{t+s} L_{t+s})/P_{t+s} − C_{t+s} ] }.
From this expression, we can derive the set of first-order conditions that describes the household's decisions. In the following, we present these conditions in terms of the stationary variables y_t := Y_t/A_t, c_t := C_t/A_t, k_t := K_t/A_t, w_t := W_t/(A_t P_t), π_t := P_t/P_{t−1}, λ_t := Λ_t A_t^η, x_t := X_t/(A_{t−1} P_{t−1}), m_t := M_t/(A_{t−1} P_{t−1}), and ξ_t := Ξ_t A_t^η. The definitions of x_t and m_t ensure that these variables are predetermined at the beginning of period t.

λ_t + ξ_t = (c_t − (θ/(1+ν)) L_t^{1+ν})^{−η},   (6.42a)
L_t = (w_t/θ)^{1/ν},   (6.42b)
λ_t = β a^{−η} E_t [ λ_{t+1} (1 − δ + r_{t+1}) ],   (6.42c)
λ_t = β a^{−η} E_t [ λ_{t+1} q_{t+1} / π_{t+1} ],   (6.42d)
λ_t = β a^{−η} E_t [ (λ_{t+1} + ξ_{t+1}) / π_{t+1} ],   (6.42e)
0 = ξ_t ( x_t/(a π_t) + w_t L_t − c_t ).   (6.42f)
Equation (6.42a) shows that the marginal utility of consumption departs from the shadow price of wealth λ_t as long as the cash-in-advance constraint binds, i.e., if ξ_t > 0. The related Karush-Kuhn-Tucker condition is equation (6.42f). Equation (6.42b) is the labor supply schedule. The well-known Euler equation for capital is given in (6.42c). Together with equations (6.42d) and (6.42e), it implies equal expected rewards on the holdings of physical capital, banking deposits, and cash balances. In addition to these equations, the household's budget constraint is satisfied with the equality sign and the cash-in-advance constraint holds. The wage bill satisfies the equation

w_t L_t = (B_t + M_{t+1} − M_t)/(A_t P_t) = m_{t+1} − x_t/(a π_t).   (6.43)
Combining this with condition (6.42f) enables us to write the cash-in-advance constraint in the following way:
c_t = m_{t+1},   if ξ_t > 0,   (6.44a)
c_t ≤ m_{t+1},   if ξ_t = 0,   (6.44b)
m_{t+1} = µ_t m_t / (a π_t),   (6.44c)
where the third equation is implied from the definition of m_t. In equilibrium, the household's budget constraint reduces to the well-known resource restriction:

y_t = c_t + i_t,   (6.45a)
y_t = Z_t k_t^α L_t^{1−α},   (6.45b)
a k_{t+1} = (1 − δ) k_t + i_t.   (6.45c)
STATIONARY EQUILIBRIUM. The stationary equilibrium of the economy satisfies the usual requirements: all (scaled) variables v_t remain constant over time, v = v_t = v_{t+1}; there are no shocks, so that z_t and µ̂_t approach their limits z = µ̂ = 0, implying Z = 1 and µ_t = µ; and we can neglect the expectations operator. In this equilibrium, equation (6.44c) implies that the inflation factor π (one plus the rate of inflation) is proportional to the money growth factor:

π = µ/a.   (6.46a)
The Euler equation for capital (6.42c) delivers

1 = β a^{−η} (1 − δ + α(y/k))   ⇒   y/k = (a^η − β(1 − δ))/(α β),   (6.46b)

where the term in parentheses equals 1 − δ + r.
Together with (6.42d), this implies the Fisher equation, here written in terms of gross rates:

q = π(1 − δ + r).   (6.46c)

Given this result, the stationary version of (6.42e) implies:

ξ = λ(q − 1).   (6.46d)

Accordingly, the cash-in-advance constraint binds in equilibrium if the nominal interest rate is positive: q − 1 > 0. Combining (6.46a) and (6.46c), we find that this condition is satisfied if the growth rate of money is not excessively small:
µ > β a^{1−η}.

Finally, note that equation (6.42b) and equation (6.36a) imply

L = ( (1 − α) y / (θ q L) )^{1/ν}.   (6.46e)
Since y/L is a function of y/k, it is independent of the money growth rate. However, according to (6.46c) and (6.46a), q is an increasing function of µ. Thus, steady-state working hours depend inversely on the rate of money growth: money is not superneutral.

CALIBRATION. Table 6.6 presents our choice of parameter values for the model. We stay as close as possible to our benchmark business cycle model. Accordingly, we set the preference parameters β, η, and L, the parameters that characterize the production side, a, α, ρ_z, and σ_z, as well as the depreciation rate of capital δ to the values in Table 1.1. We choose the Frisch elasticity of labor supply 1/ν so that the model approximately matches the standard deviation of German output given in Table 1.2. We calibrate the money growth rate µ so that it implies an average inflation rate of 2% per year. With respect to the parameters of the process (6.37), we are agnostic and choose ρ_µ = 0 and σ_µ = 0.01.¹⁰

¹⁰ Alternatively, one could estimate both parameters from the growth rate of M1, as done by Cooley and Hansen (1995), who estimate ρ_µ = 0.49 and σ_µ = 0.0089 from US data between the first quarter of 1954 and the second quarter of 1991. For the German economy between the first quarter of 1975 and the fourth quarter of 1989, the estimates for M1 by Maußner (2004) imply ρ_µ = 0 and σ_µ = 0.0172.

Table 6.6 Calibration of the Limited Participation Model

Preferences              β = 0.996      η = 2.0      L = 0.126     ν = 0.3
Production               a = 1.003      α = 0.36     ρ_z = 0.82    σ_z = 0.0071
Capital Accumulation     δ = 0.014
Monetary Policy          π = 1.02^0.25  ρ_µ = 0      σ_µ = 0.01
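Given this calibration, the stationary equilibrium conditions (6.46a)-(6.46e) can be evaluated recursively. The following MATLAB lines are a minimal sketch of this computation; in particular, backing out θ from L = 0.126 is our reading of the calibration strategy rather than a reproduction of the accompanying programs:

```matlab
% Steady state of the limited participation model (calibration of Table 6.6)
beta = 0.996; eta = 2.0; nu = 0.3; Lbar = 0.126;
a = 1.003; alpha = 0.36; delta = 0.014;
pi_ss = 1.02^0.25;                              % quarterly gross inflation
mu    = a*pi_ss;                                % (6.46a): pi = mu/a
yk    = (a^eta - beta*(1-delta))/(alpha*beta);  % (6.46b): output-capital ratio
r     = alpha*yk;                               % rental rate of capital
q     = pi_ss*(1 - delta + r);                  % (6.46c): Fisher equation
assert(mu > beta*a^(1-eta))                     % cash-in-advance constraint binds
yL    = yk^(-alpha/(1-alpha));                  % y/L implied by y = k^alpha*L^(1-alpha)
theta = (1-alpha)*yL/(q*Lbar^nu);               % (6.46e) solved for theta given L
```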
CHOICE OF POLICY FUNCTIONS. The model has 13 endogenous variables: the three endogenous state variables k_{t+1}, m_{t+1}, and x_{t+1} and 10 not
predetermined variables y_t, c_t, i_t, L_t, w_t, r_t, q_t, π_t, λ_t, and ξ_t. The three Euler equations (6.42c)-(6.42e) are sufficient to determine the parameters of three policy functions. We must pick them in such a way that we can recover the remaining 10 variables from the system of equations (6.36), (6.42), (6.44), and (6.45). The argument of the policy functions is the state vector w_t, whose elements are the (scaled) capital stock k_t, the beginning-of-period real financial assets m_t, the beginning-of-period real cash balances x_t, the log of TFP z_t, and the money growth factor µ_t:

w_t = [k_t, m_t, x_t, z_t, µ̂_t]' ∈ W ⊂ R^5.

As approximating functions, we consider monomials and Hermite polynomials. From either family we build a complete set of degree d = 2. This means that each approximating function has 21 free parameters. A closer inspection of the system of equations reveals that we are not able to solve this system for (scaled) next-period real cash balances x_{t+1}. Accordingly, one of the three policy functions must be the policy function for this variable. We also require the solution for one of the two Lagrange multipliers to test whether the cash-in-advance constraint binds. Since the marginal utility of consumption cannot become negative, we select the Lagrange multiplier of the budget constraint λ_t as our second policy function. For the remaining policy function, there are various candidates. We tried several and invite the reader to use our MATLAB program LP_GSS_LR.m for his or her own experiments. The solution that works is the approximation of inflation π_t.

Given the (approximate) policy functions, we proceed as follows. From equation (6.44c), we compute the next-period real financial wealth m_{t+1} and use equation (6.43) to obtain the wage bill w_t L_t. The labor supply condition (6.42b) allows us to determine working hours and the real wage from the wage bill. In the next step, we compute output y_t from equation (6.45b) and the interest rate factor q_t from equation (6.36a). We can then solve equation (6.36b) for the rental rate of capital. Next we solve for consumption c_t. We try c_t = m_{t+1} in equation (6.42a). If the rhs exceeds the value given from λ_t = ĥ_λ(w_t, γ_λ), we accept this solution and solve for the Lagrange multiplier of the cash-in-advance constraint from

ξ_t = (m_{t+1} − (θ/(1+ν)) L_t^{1+ν})^{−η} − λ_t.
Otherwise, we set ξ_t = 0 and solve equation (6.42a) for consumption. Given the solution for consumption, equations (6.45a) and (6.45c) determine investment i_t and the next-period capital stock k_{t+1}.

EULER EQUATION RESIDUALS. We employ the Euler equations (6.42c), (6.42d), and (6.42e) to determine the parameters of our three policy functions. As in the previous section, we rewrite these equations so that the conditional expectation equals unity at the exact policy functions. We use the multivariate Gauss-Hermite integration formula (14.33) to approximate the respective right-hand sides so that 1 + er_{it}, i = 1, 2, 3, measures the deviation from the exact value of unity. The errors u_{it} are then given by

u_{1t}(γ_1) = π_t (1 + er_{1t}) − ĥ_π(w_t, γ_1),
u_{2t}(γ_2) = λ_t (1 + er_{2t}) − ĥ_λ(w_t, γ_2),
u_{3t}(γ_3) = x_{t+1} (1 + er_{3t}) − ĥ_x(w_t, γ_3).

We employ the truncated principal components formula (12.40) to estimate the vectors γ_i ∈ R^21 from a given simulation of length T = 10,000. We start the fixed-point iterations with a simulation that employs a second-order perturbation solution. The algorithm with a dampening parameter ψ = 0.1 requires 587 iterations to converge and consumes more than 19 minutes of computation time on a workstation with an Intel Xeon W-2133 CPU at 3.60 GHz.

ACCURACY. We stop the iterations if none of the three policy functions deviates on average in absolute terms by more than eps = 10^{−7} in two successive iterations. From a simulation with a second set of random numbers we compute the Euler equation errors er_{it}, i = 1, 2, 3, defined as deviations from the exact value of unity. The GSS solution for the baseline calibration in Table 6.6 does not perform markedly better than the second-order perturbation solution, although the latter implicitly assumes that the cash-in-advance constraint is always binding. The maximum absolute residual of equation (6.42c) is approximately 1.8E-4 and less than one third of the residual from the perturbation solution. The residuals of equations (6.42d) and (6.42e) have the same order of magnitude and are approximately two thirds and one sixth of the size of the perturbation solution, respectively.
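As an illustration of the kind of regression step involved, the lines below sketch a least squares fit via a truncated singular value decomposition (principal components). The cutoff rule is one common choice and is meant as an assumption for illustration only; it is not a restatement of formula (12.40):

```matlab
% Truncated-SVD least squares: regress targets y on the basis matrix X (T x 21)
[U,S,V] = svd(X, 'econ');
s    = diag(S);
keep = s >= 1e-8*s(1);                       % drop near-degenerate directions (assumed cutoff)
gam  = V(:,keep)*((U(:,keep)'*y)./s(keep));  % estimated parameter vector gamma_i
```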
[Figure 6.7: Impulse Response to a Money Supply Shock in the Limited Participation Model. Eight panels (responses in percent over quarters 1-11): money supply shock, nominal interest rate, real wage, inflation, output, hours, consumption, and investment; black lines: ρ_µ = 0.0, blue lines: ρ_µ = 0.5.]
RESULTS. We first consider the relative strength between the liquidity and the anticipated inflation effect. If the monetary shock is not autocorrelated, there is no anticipated inflation effect. This effect gains importance when the autocorrelation parameter ρ_µ increases. The impulse responses displayed in Figure 6.7 show this very clearly. The monetary shock hits the economy in period t = 2. The black lines correspond to the case ρ_µ = 0. The liquidity effect is obvious from the upper-right panel of Figure 6.7. The additional supply of money lowers the nominal interest rate. The costs of
hiring labor decrease, while working hours and production increase. Part of the extra income is consumed and part is transferred to future periods via additional capital accumulation. The positive effect on consumption is very small, and, thus, not visible in Figure 6.7. The blue lines correspond to an autocorrelated money supply process. In addition to the liquidity effect, there is an inflationary expectations effect. As illustrated in Figure 6.7, the latter dominates the former for our choice of parameter values. Accordingly, the nominal interest rate increases, and the firm lowers the real wage offered to workers to reduce its labor costs. Thus, the household withdraws labor from the market and output decreases. This negative income effect lowers consumption. The increased costs of holding financial wealth trigger the substitution of physical capital for money holdings so that investment demand increases sharply. Table 6.7 Second Moments from the Limited Participation Model
                          GSS Solution                         Perturbation Solution
                 σ_µ = 0.01           σ_µ = 0.011              σ_µ = 0.01
Variable         s_x    r_xy   r_x    s_x    r_xy   r_x        s_x    r_xy   r_x
Output           1.41   1.00   0.70   1.41   1.00   0.70       1.42   1.00   0.71
Consumption      0.85   1.00   0.67   0.85   1.00   0.67       0.87   0.99   0.65
Investment       3.07   1.00   0.72   3.07   1.00   0.72       3.03   1.00   0.74
Hours            0.90   0.98   0.76   0.90   0.98   0.75       0.90   0.98   0.76
Real Wage        0.27   0.98   0.76   0.27   0.98   0.75       0.27   0.98   0.76
Inflation        1.19  −0.16  −0.08   1.27  −0.14  −0.08       1.21  −0.17  −0.09

Notes: Second moments computed from HP-filtered simulated time series with 10,000 included observations and a burn-in period of 200 observations. s_x := standard deviation of variable x, r_xy := cross correlation of variable x with output, r_x := first order autocorrelation of variable x.
Table 6.7 presents second moments from two different simulations of the model. The first considers the benchmark calibration presented in Table 6.6; the second simulation assumes a standard deviation of the innovations in (6.37) that is ten percent larger than in the benchmark calibration, i.e., σµ = 0.011. First, compare columns 2-4 with columns 5-7. The effect of larger money supply shocks on the real variables, including the real wage, cannot be discerned and is limited to the rate of inflation, whose standard deviation increases by 0.08 percentage points. Accordingly, technology shocks are
the main drivers of the real business cycle. This observation allows us, second, to focus on the differences between the present model and the benchmark real business cycle model (compare, e.g., Table 6.5). These differences may be traced to the current-period utility function (6.40), which excludes income effects from the supply of labor as is obvious from the condition on labor supply in equation (6.42b). The limited participation model needs a Frisch elasticity of labor supply that is approximately twice as large as that in the benchmark model to yield the standard deviation of output of 1.41. In the benchmark model, the Frisch elasticity of labor supply, defined in (1.50a), is equal to approximately 1.7. Accordingly, the standard deviation of the real wage is much smaller than that in the benchmark model, whereas the standard deviation of hours differs only slightly between the two models. Note also that consumption is more volatile and investment less volatile in the limited participation model than in the benchmark model. The correlation between output and the other variables (except inflation) is as strong in the limited participation model as in the benchmark model, whereas the variables are somewhat more autocorrelated in the former model. Finally, consider the rightmost three columns in the table. They confirm our observations with respect to the benchmark model: the solution method has negligible effects on the second moments obtained from the simulations of the model.
6.3.5 Conclusion

Simulation-based function approximation methods solve DSGE models on the model's ergodic set. In this respect, they differ from spectral methods that solve a model on a predefined set of points. The choice of the approximating functions and the method to compute the parameter vector are key elements of their successful implementation and the involved programming effort. Fixed-point iterations together with functions that are linear in their parameters are easy to program. Nonlinear functions and the direct computation of the fixed point require additional algorithms that solve nonlinear least squares problems and compute the zeros of nonlinear functions. Different from perturbation methods and the extended path algorithm, simulation-based function approximation can handle binding and non-binding constraints. On the downside are convergence problems, so that it may not be possible to increase the accuracy beyond the
magnitude that is achieved by the much simpler to implement perturbation methods.
Problem 6.1: Benchmark Business Cycle Model In Section 6.2.3, we use a reduced system of equations to compute the extended path solution of the benchmark model from Example 1.6.1. Use the system of equations (6.10) instead of this system and recompute the solution. Compare the runtime of your program to the runtime of our program.
Problem 6.2: A Small Open Economy with Consumers and Producers The economy is populated by a unit mass of identical consumers. The representative consumer supplies labor services L t and allocates his wealth between the stocks S t of domestic firms and an internationally traded bond B t . The rate of return of this bond is determined on the world capital market and denoted by r t . Domestic firms are distributed on the unit interval and are identical. As a result, the consumer must choose how much of his wealth he wants to put in the stocks of domestic firms, but he has no need to decide about the allocation of funds invested into specific firms. The stock price of the representative firm is vt . Each stock yields a dividend payment of d t . The consumer’s budget constraint, thus, is: B t+1 − B t + vt (S t+1 − S t ) = w t A t L t + (1 + r t )B t + d t S t − C t ,
where C_t denotes consumption and w_t is the real wage per efficiency unit of labor A_t L_t. At period t, the consumer chooses C_t, L_t, S_{t+1}, and B_{t+1} to maximize

E_t Σ_{s=0}^{∞} β^s [ (C_{t+s} − (θ/(1+ν)) A_{t+s} L_{t+s}^{1+ν})^{1−η} − 1 ] / (1−η),   β ∈ (0, 1), θ > 0, ν > 0,
subject to his budget constraints and given his initial portfolio (B_t, S_t). The consumer is not allowed to accumulate debt at an ever increasing rate. Thus

lim_{s→∞} E_t { B_{t+s+1} / [(1 + r_t)(1 + r_{t+1})(1 + r_{t+2}) ⋯ (1 + r_{t+s})] } ≥ 0.

The representative firm produces output Y_t according to the function

Y_t = Z_t K_t^α (A_t L_t)^{1−α},   α ∈ (0, 1).
Z_t is a stationary random process and the level of labor-augmenting technical progress A_t is governed by

A_{t+1} = a A_t,   a ≥ 1.
The firm is not able to rent capital services, but must accumulate capital according to K t+1 = φ(I t /K t )K t + (1 − δ)K t .
The firm funds its investment expenditures I t from retained earnings RE t and the emission of new stocks vt (S t+1 − S t ): I t = RE t + vt (S t+1 − S t ).
Profits Y_t − w_t A_t L_t that are not retained for investment are distributed to the shareholders: d_t S_t = Y_t − w_t A_t L_t − RE_t.
Let Λ_t denote the marginal utility of consumption. The firm employs the household's stochastic discount factor

M_{t+s} := β^s Λ_{t+s}/Λ_t

to value its stochastic cash flow CF_t = Y_t − w_t A_t L_t − I_t.
At period t, the firm maximizes

V_t := E_t Σ_{s=0}^{∞} M_{t+s} ( Y_{t+s} − w_{t+s} A_{t+s} L_{t+s} − I_{t+s} )
subject to the above given constraints with respect to L t , I t , and K t+1 . Show that the first-order conditions of the consumer’s and the firm’s problem together with the various constraints specified above imply the system of stochastic difference equations given in (6.19).
Problem 6.3: Productivity and Interest Rate Shocks

Assume that the world interest rate r_t in the model of the small open economy presented in Section 6.2.4 follows the process

ln(r_{t+1}/r) = ρ_r ln(r_t/r) + ε_{r,t+1},   ρ_r ∈ (−1, 1), ε_{r,t+1} iid N(0, σ_r²).

Use ρ_r ∈ {0, 0.5, 0.9} and σ_r = 0.01. Solve and simulate this version with the extended path method and compare your results to the second moments presented in Table 6.2.
Problem 6.4: Productivity and Preference Shocks Empirically, the correlation between working hours and the real wage is close to zero or even negative. The benchmark model, however, predicts a strong positive
correlation. In the following model, which is adapted from Holland and Scott (1998), we introduce a preference shock in the benchmark model of Example 1.6.1. Specifically, we assume that the parameter θ in the instantaneous utility function of the representative household is not a constant but a random variable θ t that is governed by a first-order autoregressive process: ln(θ t+1 /θ ) = ρθ ln(θ t /θ ) + εθ ,t+1 , ρθ ∈ (−1, 1), εθ ,t+1 iid N (0, σθ2 ).
The innovations ε_{θ,t+1} induce shifts of the labor supply schedule along a given labor demand schedule. By this, they counteract the positive correlation between the real wage and working hours introduced by TFP shocks z_t := ln Z_t. The planner's problem is specified as follows:

max_{C_t, L_t, K_{t+1}}  E_t Σ_{s=0}^{∞} β^s [ C_{t+s}^{1−η} (1 − L_{t+s})^{θ_{t+s}(1−η)} ] / (1−η)

s.t.  K_{t+s+1} + C_{t+s} ≤ e^{z_{t+s}} K_{t+s}^α (A_{t+s} L_{t+s})^{1−α} + (1 − δ) K_{t+s},
      A_{t+s+1} = a A_{t+s},   a ≥ 1,
      z_{t+s+1} = ρ_z z_{t+s} + ε_{z,t+s+1},   ρ_z ∈ (−1, 1),
      ε_{z,t+s+1} ∼ N(0, σ_ε²),   s = 0, 1, ...,
      0 ≤ C_{t+s},   1 ≥ L_{t+s} ≥ 0,   0 ≤ K_{t+s+1},
      K_t, A_t, z_t given.

Use the parameter values given in Table 1.1 to calibrate this model. In addition, put ρ_θ = 0.9 and σ_θ = 0.01 and calibrate θ so that the stationary fraction of working hours equals L = 0.126.
1) Derive the first-order conditions for the planner's problem and write them down in terms of stationary variables. Modify the extended path algorithm 6.2.1 to suit this model.
2) Simulate a long time series. Pass the time series for working hours and the real wage to the HP-filter and compute the average cross-correlation between those two variables.
3) Repeat this exercise for a value of σ_θ close to zero.
Problem 6.5: Transition Dynamics and Endogenous Growth The following endogenous growth model is based on Lucas (1990). The description of the dynamics is adapted from Grüner and Heer (2000). Consider the following deterministic Ramsey problem that is augmented by a human capital sector. Households maximize intertemporal utility:
U_t = Σ_{j=0}^{∞} β^j ( c_{t+j} s_{t+j}^θ )^{1−η} / (1−η),   0 < β < 1, 0 < θ,
where c_t and s_t denote consumption and leisure (spare time) in period t. The individual can allocate his time endowment B to work l, learning v, and leisure s: B = l_t + v_t + s_t. The human capital of the representative individual h is determined by the time v he or she allocates to learning according to: h_{t+1} = h_t (1 + D v_t^γ). Physical capital k_t accumulates according to: k_{t+1} = (1 − τ_l) l_t h_t w_t + [1 + (1 − τ_k) r_t] k_t + b_t − c_t,
where wage income and interest income are taxed at the rates τ_l and τ_k, respectively. Pre-tax wage income is given by the product of the wage rate w_t, the working hours l_t, and the human capital h_t. r_t and b_t denote the real interest rate and government transfers, respectively. Production per capita y is a function of capital k and effective labor lh. Output is produced with a CES technology:

y_t = F(k, lh) = a_0 [ a_1 k^ρ + a_2 (lh)^ρ ]^{1/ρ},
where σ_p = 1/(1 − ρ) denotes the elasticity of substitution in production. Define the state variable z := k/(lh). The production per effective labor is defined by f(z) := F(z, 1). In a factor market equilibrium, factors are rewarded with their marginal products: w = f(z) − z f′(z), r = f′(z).
The government receives revenues from taxing labor income and capital income. The government budget is balanced so that government consumption g and transfers b equal tax revenues in any period: g_t + b_t = τ_w l_t h_t w_t + τ_r r_t k_t. Periods t correspond to years. The model is calibrated as follows: η = 2.0, θ = 0.5, β = 0.97, B = 2.13, D = 0.035, γ = 0.8, ρ = −2/3, a_0 = 0.77, a_1 = 0.36, a_2 = 0.64, τ_w = 0.36, τ_r = 0.40. The share of government consumption in output is g/y = 0.21.
1) Derive the first-order conditions of the household and the equilibrium conditions of the model.
2) On a balanced growth path, consumption, output, physical capital, and human capital grow at a constant rate µ, while the time allocation is constant. Derive the equations that characterize the balanced growth equilibrium. For this reason, express the equations with the help of stationary variables. For example, divide the government budget constraint by y_t.
3) Use a nonlinear equation solver to compute the stationary equilibrium.
4) How does the growth rate react to a reduction of the capital income tax rate τ_r from 40% to 25% that is financed a) by a reduction in transfers b_t and b) by an increase in the wage income tax rate τ_w? Explain why the growth rate decreases in the latter case.
5) Compute the dynamics for the transition between the old steady state that is characterized by a capital income tax rate τ_r = 40% and the new steady state that is characterized by τ_r = 25%. Assume that during the transition and in the new steady state, g/y and b/y are constant and that the wage income tax rate τ_w adjusts in order to balance the government budget. Use forward iteration to compute the dynamics. (difficult)
Problem 6.6: Business Cycle Fluctuations and Home Production

In the US economy, hours worked fluctuate considerably more than the Solow residual, and the correlation between the two is close to zero. The standard real business cycle model presented in Section 1.6.1 has considerable difficulties in replicating this fact. For our German calibration, for example, hours worked and productivity have approximately equal standard deviations of 0.82% (see Table 6.5). The following extension of the stochastic growth model is based on Benhabib et al. (1991). In their model, agents work in the production of both a market-produced good M and a home-produced good H. Households maximize intertemporal utility

U_t := E_t Σ_{s=0}^{∞} β^s [ C_{t+s}^{1−η} (1 − L_{t+s})^{θ(1−η)} − 1 ] / (1−η),

where C_t is the following composite of the consumption of goods M and H:

C_t = [ a C_{Mt}^φ + (1 − a) C_{Ht}^φ ]^{1/φ}.   (P.6.6.1)
Labor L t is allocated to market and home production according to: L t = L M t + LH t . Notice that the two types of work are assumed to be perfect substitutes, while the two consumption goods are combined by an aggregator that implies a constant elasticity of substitution equal to 1/(1 − φ).
The model has two technologies:

Y_{Mt} = F(Z_{Mt}, K_{Mt}, L_{Mt}) = Z_{Mt} K_{Mt}^α L_{Mt}^{1−α},
Y_{Ht} = G(Z_{Ht}, K_{Ht}, L_{Ht}) = Z_{Ht} K_{Ht}^γ L_{Ht}^{1−γ}.

The technology shocks follow the processes:

ln Z_{M,t+1} = ρ ln Z_{Mt} + ε_{M,t+1},
ln Z_{H,t+1} = ρ ln Z_{Ht} + ε_{H,t+1},

where ε_{i,t+1} ∼ N(0, σ_i²), i = M, H, with contemporaneous correlation r_{MH} = cor(ε_{Mt}, ε_{Ht}). The household operates the home production technology (index H), and a representative firm operates the market technology (index M). This firm employs L_{Mt} units of labor at the real wage w_t and rents K_{Mt} units of capital at the rate r_t. The firm maximizes profits

D_{Mt} = Y_{Mt} − w_t L_{Mt} − r_t K_{Mt}.
The household allocates his market income, w_t L_{Mt} + r_t K_{Mt} + D_{Mt}, to market consumption C_{Mt} and the accumulation of physical capital K_t = K_{Mt} + K_{Ht}. Accordingly, the household's budget constraint with respect to market income reads

0 ≤ w_t L_{Mt} + r_t K_{Mt} + D_{Mt} − C_{Mt} − I_t,
K t+1 = (1 − δ)K t + I t .
The Lagrangian of the household's problem is given by:

L = E_t Σ_{s=0}^{∞} β^s { [ C_{t+s}^{1−η} (1 − L_{M,t+s} − L_{H,t+s})^{θ(1−η)} − 1 ] / (1−η)
    + Λ_{M,t+s} [ w_{t+s} L_{M,t+s} + r_{t+s} K_{M,t+s} + D_{M,t+s} − C_{M,t+s} − (K_{t+s+1} − (1 − δ) K_{t+s}) ]
    + Λ_{H,t+s} [ Z_{H,t+s} K_{H,t+s}^γ L_{H,t+s}^{1−γ} − C_{H,t+s} ] }.

The household chooses K_{t+1}, K_{Mt}, K_{Ht}, L_{Mt}, L_{Ht}, C_{Mt}, and C_{Ht} to maximize this function subject to (P.6.6.1). In equilibrium, the markets for labor L_{Mt} and capital services K_{Mt} clear. Model periods correspond to quarters. The model is calibrated as follows: β = 0.99, α = 0.36, δ = 0.025, η = 1.5, φ = 0.8, γ = 0.08, r_{MH} = 0.66, ρ = 0.9, σ_M = σ_H = 0.007. The steady state leisure L̄ = 0.7 is used to calibrate θ. a is set so that C_H/C_M = 1/4.
2) Derive the system of equations that determines the dynamics of this economy from the first-order conditions, the market clearing conditions, and the production functions.
3) Compute the steady state and calibrate the parameters a and θ.
4) The system of equations derived in step 2) has one endogenous state variable, K_t, two exogenous shocks, ln Z_{Mt} and ln Z_{Ht}, and 13 jump variables. Approximate the policy function for market hours L_{Mt} = ĥ_{LM}(K_t, ln Z_{Mt}, ln Z_{Ht}). Show that, for given K_t and L_{Mt}, the system of static equations (i.e., those that only involve variables at period t) can be solved for the remaining 10 variables.
5) Employ a complete basis of Hermite polynomials of degree d = 2. Use the GSS algorithm to determine the parameters of ĥ_{LM}(·).
6) Apply the HP-filter to the simulated time series. Compute the standard deviation of hours worked in the market activity, L_{Mt}, and productivity, Z_{Mt}, as well as the correlation of L_{Mt} and Z_{Mt}. Explain why the variance of hours worked has increased. Vary φ and analyze the sensitivity of your result with regard to this parameter. Explain your result.
Chapter 7
Discrete State Space Value Function Iteration
7.1 Introduction

The methods presented in the previous chapters use a system of stochastic difference equations that governs the time path of an economy to find approximations of the policy functions that determine the endogenous variables given the current state of the economy. In this chapter, we switch the perspective from the Euler equations approach to the dynamic programming approach. As explained in Sections 1.3.3 and 1.4.3, the object of concern is the value function that solves the Bellman equation. Given the value function, the decision variables maximize the right-hand side (rhs) of the Bellman equation, which consists of the one-period return function and the value function. The latter can be found as the limit of a sequence of functions obtained from repeatedly solving the maximization problem on the rhs of the Bellman equation. Methods that solve dynamic general equilibrium (DGE) models via a converging sequence of approximations of the value function are known as value function iteration (VI). Note the qualifier 'approximations' in this statement. The value function of an infinite-horizon, dynamic, stochastic optimization problem with n endogenous state variables x ∈ X ⊂ R^n and m exogenous shocks z ∈ Z ⊂ R^m is a continuous map V : X × Z → R that cannot be represented accurately on a finite-precision computer. Accordingly, value function iteration must also draw on the methods of function approximation reviewed in Chapter 13. VI simplifies considerably if we replace the original model by a model whose state space consists of a finite number of discrete points (see Judd (1998), Section 12.3). In this case, the value function is a finite-dimensional object. For instance, if the state space is one-dimensional and consists
of n distinct points, i.e., X = {x 1 , x 2 , . . . , x n }, x i ∈ R, then the value function is simply a vector of n elements where each element gives the value attained by the optimal policy if the initial state of the system is x i ∈ X , i = 1, 2, . . . , n. We can start with an arbitrary vector of values representing our initial guess of the value function and then obtain a new vector by solving the maximization problem on the rhs of the Bellman equation. Given that the original model admits a solution, this procedure converges to the exact value function of this discrete-valued problem. Although simple in principle, this approach has a serious drawback: It suffers from the curse of dimensionality. In a one-dimensional state space, the maximization step is simple — we simply need to search for the maximal element among n. However, the value function of a d-dimensional problem with n different points in each dimension d is an array of nd different elements, and the computation time needed to search this array may be prohibitively high. Parallelization of the computer code provides a partial remedy to this problem. Even simple laptops come with several physical or virtual cores either in one or in multiple CPUs. In addition, and as explained in Aldrich et al. (2011), the graphics processor may also be used for parallel computing. As the reader will learn in this chapter, there are various steps in the algorithm that can be performed simultaneously. However, we do not provide an introduction to parallel programming and instead refer the reader to Fernández-Villaverde and Valencia (2018), who present a guide to several programming languages that allow writing code for parallel computing. In this chapter, we confine ourselves to problems where the maximization step can be reduced to searching a vector of n elements. While this limits the class of representative agent models to which we can apply this method, this endeavor is nevertheless worth the while. As you will learn in the second part of the book, there are many heterogeneous agent models in which discrete state space VI plays an integral part in the solution procedure. For higher dimensional problems, we refer the reader to R the MATLAB program suite of Kirkby (2017), which makes use of the parallelization toolbox of this programming language. In Section 7.2, we use the infinite-horizon Ramsey model (1.9) to discuss the choice of the set X , the choice of the initial value function, the maximization step, and the termination of the sequence of iterations. In addition, we consider methods to speed up convergence and to increase precision. Section 7.3 extends these methods to the stochastic growth model (1.24). Additional applications in Section 7.4 cover the stochastic
growth model with irreversible investment and our benchmark model of Example 1.6.1.
7.2 Solution of Deterministic Models In this section, we introduce the discrete state space VI. The infinite-horizon deterministic Ramsey model of Section 1.3 serves as our point of departure. We repeat its main properties in the next paragraph. Then, we present a simple algorithm that computes the value function of a discrete version of this model. Subsequently, we consider several improvements of this algorithm with respect to computation time and precision. THE MODEL. In the model of Section 1.3 a fictitious planner (or farmer), equipped with initial capital K t , chooses a sequence of future capital stocks {K t+s }∞ s=1 that maximizes the lifetime utility of a representative household Ut =
U_t = Σ_{s=0}^{∞} β^s u(C_{t+s}),   β ∈ (0, 1),
subject to the economy’s resource constraint f (K t+s ) ≥ C t+s + K t+s+1 , s = 0, 1, . . . and nonnegativity constraints on consumption C t+s and the capital stock K t+s+1 . The utility function u(C) is strictly concave and twice continuously differentiable. The function f (K) = F (K, L) + (1 − δ)K determines the economy’s current resources as the sum of output F (K, L) produced from a fixed amount of labor L = 1 and capital services K and the amount of capital left after depreciation, which occurs at the rate δ ∈ (0, 1]. The function f is also strictly concave and twice continuously differentiable. The method that we employ rests on a recursive formulation of this maximization problem in terms of the Bellman equation (1.16): v(K) =
max_{0 ≤ K′ ≤ f(K)} [ u(f(K) − K′) + β v(K′) ].   (7.1)
This is a functional equation in the unknown value function v. Once we know this function, we can solve for K′ as a function h of the current capital stock K. The function K′ = h(K) is known as the policy function.
DISCRETE APPROXIMATION. We know from the analysis in Section 1.3.4 that the optimal sequence of capital stocks monotonically approaches the stationary solution K* determined from the condition β f′(K*) = 1. Thus, the economy stays in the interval [K_0, K*] (or in the interval [K*, K_0] if K_0 > K*). Instead of considering this uncountable set, we use n distinct points of this set to represent the state space. In this way, we transform our problem from solving the functional equation (7.1) in the space of continuous functions (an infinite-dimensional object) to the much nicer problem of determining a vector of n elements. Note, however, that the stationary solution of this new problem differs from K*. For this reason, we use K̄ > K* as the upper bound of the state space and K_ < K_0 as the lower bound.¹

Our next decision concerns the number of points n. A fine grid K = {K_1, K_2, ..., K_n}, K_i < K_{i+1}, i = 1, 2, ..., n, provides a good approximation.² However, the number of function evaluations that are necessary to perform the maximization step on the rhs of the Bellman equation increases with increasing n so that the computation time constraints place a limit on n. We discuss the relation between accuracy and computation time below. For the time being, we consider a given number of grid points n. A related question concerns the distance between neighboring points in the grid. In our applications, we work with equally spaced points ∆ = K_{i+1} − K_i for all i = 1, 2, ..., n − 1. However, as the policy function and the value function of the original problem are more curved for lower values of the capital stock, the approximation is less accurate over this range. As one resolution to this problem, one might choose an unequally spaced grid with more points in the lower interval of the state space, for instance K_i = K_1 + ∆(i − 1)², ∆ = (K_n − K_1)/(n − 1)², or choose a grid with constant logarithmic distance, ∆ = ln K_{i+1} − ln K_i. However, one can show that neither grid type dominates uniformly across applications.

In our discrete model, the value function is a vector v of n elements. Its ith element holds the lifetime utility U_t obtained from a sequence of capital stocks that is optimal given the initial capital stock K_t = K_i ∈ K. The associated policy function can be represented by a vector h of indices. As before, let i denote the index of K_i ∈ K, and let j ∈ 1, 2, ..., n denote the index of K′ = K_j ∈ K, that is, the maximizer of the rhs of the Bellman equation for a given K_i. Then, h_i = j.

¹ Of course, if K_0 > K*, we choose K̄ > K_0 and K_ < K*.
² We use the index i to refer to the elements of the set K and the index t to refer to the elements of the time series {K_t}_{t∈Z}.
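A minimal sketch of the three grid types just mentioned, assuming lower and upper bounds K_lo and K_hi and n points:

```matlab
% Equally spaced, quadratically spaced, and log-spaced grids on [K_lo, K_hi]
K_equal = linspace(K_lo, K_hi, n);
Delta   = (K_hi - K_lo)/(n-1)^2;
K_quad  = K_lo + Delta*((0:n-1).^2);               % more points at low capital stocks
K_log   = exp(linspace(log(K_lo), log(K_hi), n));  % constant logarithmic distance
```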
The vector v can be determined by iterating over

v_i^{s+1} = max_{K_j ∈ D_i} [ u(f(K_i) − K_j) + β v_j^s ],   i = 1, 2, ..., n,   D_i := {K ∈ K : K ≤ f(K_i)}.
Successive iterations converge to the solution v* of the discrete-valued infinite-horizon Ramsey model according to the contraction mapping theorem.³

SIMPLE ITERATIVE PROCEDURE. The following steps describe a simple-to-program algorithm that computes v* iteratively. Since the solution to

max_{K′} u(f(K) − K′) + β × 0

is obviously K′ = 0, we start the iterations with s = 0 and v_i^s = u(f(K_i)) for all i = 1, ..., n. In the next step, we find a new value and policy function as follows: For each i = 1, ..., n:

Step 1: Compute w_j = u(f(K_i) − K_j) + β v_j^s, j = 1, ..., n.
Step 2: Find the index j ∗ such that w j ∗ ≥ w j ∀ j = 1, . . . , n.
Step 3: Set h_i^s = j* and v_i^{s+1} = w_{j*}.

Step 4: Check for convergence: If v^{s+1} is close to v^s, then stop. Otherwise, return to Step 1, and replace v^s with v^{s+1}.

We explain in Section 15.2 that if the maximum absolute error between v^{s+1} and v^s, ‖v^{s+1} − v^s‖_∞, is smaller than ε(1 − β), then the error in accepting h^s and v^s as solution is smaller than ε, i.e., ‖v^s − v*‖_∞ < ε. If one uses a standard programming language (e.g., C, Fortran, GAUSS, or MATLAB), there is no need to consider finding the maximal element of w = [w_1, w_2, ..., w_n]' in Step 2 since there are built-in subroutines, e.g., the MaxLoc function in Fortran 95, the maxindc command in GAUSS, and the max command in MATLAB. Note that this simple algorithm involves two loops: the outer loop over the index i and the inner loop over the index j. In this inner loop, K_i
³ See, e.g., Theorem 12.1.1 of Judd (1998), p. 402.
and vs = [v1s , v2s , . . . , vns ] are given, and the solution in Step 2 does not change their values. Accordingly, the outer loop could in principle be run on n different processor units (cores). For instance, GAUSS and the R parallelization toolbox in MATLAB provide loops that are executed on the available cores of the machine at hand. EXPLOITING MONOTONICITY AND CONCAVITY. The algorithm in the previous paragraph is not very smart. We can do much better if we exploit the structure of our problem. First, we can choose the initial value function more carefully, since an initial value function closer to the final solution saves iterations. Note that the stationary value of the original Ramsey problem follows from the Bellman equation as V (K ∗ ) = u( f (K ∗ ) − K ∗ ) + β V (K ∗ ).
Therefore, we can initialize all elements v_i of v^0 with v_i = u(f(K*) − K*)/(1 − β). Second, we can exploit the monotonicity of the policy function (see Section 1.3.3 on this result), that is: K_i ≥ K_j ⇒ K_i′ = h(K_i) ≥ K_j′ = h(K_j).
As a consequence, once we find the optimal index j_1* for K_1, we no longer need to consider capital stocks smaller than K_{j_1*} in the search for j_2*. More generally, let j_i* denote the index of the maximization problem in Step 2 for i. Then, for i + 1, we evaluate u(f(K_i) − K_j) + β v_j^s only for indices j ∈ {j_i*, ..., n}. Third, we can shorten the number of computations in maximization Step 2 since the function

φ(K′) := u(f(K) − K′) + β v(K′)
(7.2)
is strictly concave.⁴ A strictly concave function φ defined over a grid of n points takes its maximum either at one of the two boundary points or in the interior of the grid. In the first case, the function is decreasing (increasing) over the whole grid if the maximum is the first (last) point of the grid. In the second case, the function is first increasing and then decreasing. Consequently, we can select the midpoint of the grid K_m and the point

⁴ This is the case because the value function, the utility function, and the production function are strictly concave. See Section 1.3.3.
next to it Km+1 to determine whether the maximum lies to the left of Km (if φ(Km ) > φ(Km+1 )) or to the right of Km (if φ(Km+1 ) > φ(Km )). Thus, in the next step, we can reduce the search to a grid approximately half the size of the original grid. Kremer (2001), pp. 165ff. proves that a search based on this principle needs at most log2 (n) steps to reduce the grid to a set of three points that contains the maximum. For instance, instead of 1000 function evaluations, a binary search requires no more than 13! We describe this principle in more detail in the following algorithm: Algorithm 7.2.1 (Binary Search) Purpose: Find the maximum of a strictly concave function f (x) defined over a grid of n points X = {x 1 , ..., x n } Steps:
Step 1: Initialize: Set imin = 1 and imax = n.
Step 2: Select two points: il = floor((imin + imax)/2) and iu = il + 1, where floor(i) ∈ N denotes the largest integer less than or equal to i ∈ R.
Step 3: If f(x_iu) > f(x_il), set imin = il. Otherwise, set imax = iu.
Step 4: If imax − imin = 2, stop, and choose the largest element among f(x_imin), f(x_{imin+1}), and f(x_imax). Otherwise, return to Step 2.
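A minimal MATLAB sketch of Algorithm 7.2.1, written for a function handle f that returns the value at grid index i (in the value function iteration below, f would evaluate the rhs of the Bellman equation at K_j for a fixed K_i):

```matlab
function istar = binary_search(f, n)
% Maximizer of a strictly concave f(i) on the index grid 1,...,n (Algorithm 7.2.1)
imin = 1; imax = n;                 % Step 1 (assumes n >= 2)
while imax - imin > 2
    il = floor((imin + imax)/2);    % Step 2
    iu = il + 1;
    if f(iu) > f(il)                % Step 3
        imin = il;
    else
        imax = iu;
    end
end
cand = [imin, imin+1, imax];        % Step 4: at most three remaining candidates
[~, k] = max([f(cand(1)), f(cand(2)), f(cand(3))]);
istar = cand(k);
end
```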
Step 2: Initialize the value function: ∀i = 1, . . . , n set vi0 =
u( f (K ∗ ) − K ∗ ) , 1−β
376
7 Discrete State Space Value Function Iteration
where K ∗ denotes the stationary solution to the continuous-valued Ramsey problem. Step 3: Compute a new value function and the associated policy function, ∗ v1 and h1 , respectively; set j0∗ = 1. For i = 1, 2, . . . , n, and ji−1 , use ∗ Algorithm 7.2.1 to find the index ji that maximizes u( f (Ki ) − K j ) + β v 0j
∗ ∗ in the set of indices { ji−1 , ji−1 + 1, . . . , n}. Set h1i = ji∗ and vi1 = u( f (Ki ) − K j ∗ ) + β v 0j ∗ . i
i
Step 4: Check for convergence: If kv0 − v1 k∞ < ε(1 − β) for ε > 0 (or if the policy function has remained unchanged for a number of consecutive iterations) stop. Otherwise, replace v0 with v1 and h0 with h1 , and return to step 3.
POLICY FUNCTION ITERATION Value function iteration is a slow procedure since it converges linearly at the rate β (see Section 15.2 on rates of convergence), that is, successive iterations satisfy kvs+1 − v∗ k ≤ βkvs − v∗ k,
for a given norm kxk. Howard’s improvement algorithm or policy function iteration is a method to enhance convergence. Each time a policy function hs is computed, we solve for the value function that would occur if the policy were followed forever. This value function is then used in the next step to obtain a new policy function hs+1 . As pointed out by Puterman and Brumelle (1979), this method is akin to Newton’s method for locating the zero of a function (see Section 15.3), providing quadratic convergence under certain conditions. The value function that results from following a given policy h = [h1 , . . . , hn ]0 forever is defined by vi = u( f (Ki ) − K j ) + β v j , j = hi ,
i = 1, 2, . . . , n.
This is a system of n linear equations in ui := u( f (Ki ) − Khi ) and the unknown elements vi of the vector v. We write this system in matrixvector notation. Toward this purpose we introduce a matrix Q with zeros everywhere except for its row i and column j elements, which are ones. The above equations may then be written as v = u + βQv,
(7.3)
7.2 Solution of Deterministic Models
377
with solution v = [I − βQ]−1 u. Policy function iterations may be started with either a given value function or a given policy function. In the first case, we compute the initial policy function by performing Step 3 of Algorithm 7.2.2 once. The difference occurs at the end of Step 3. Let u1 and Q1 denote he matrix Q and the vector u obtained from the policy function h1 as explained above. Then, we set v1 = [I n − βQ1 ]u1 . If n is large, (I n − βQ) is a sizable object, and you may encounter a memory limit on your personal computer. For instance, if your grid contains 10,000 points, (I n − βQ) has 108 elements. Stored as double precision (that is, eight bytes of memory for each element), this matrix requires 0.8 gigabytes of memory. Fortunately, (I n − βQ) is a sparse matrix: It has 2n nonzero elements at most (corresponding to the case where all elements on the main diagonal of Q are equal to zero). Many linear algebra routines are able to solve linear systems Ax = b with a sparse matrix A. For instance, R using the GAUSS or MATLAB sparse matrix procedures allows us to store (I − βQ) in a matrix of maximum size 2n × 3, which occupies just 480 kilobytes of memory. If it is not possible to implement the solution of the large linear system or if it becomes too time consuming to solve this system, there is an alternative to full policy iteration. Modified policy iteration with k steps s computes the value function v1 at the end of Step 3 of Algorithm 7.2.2 in these steps: w1 = v0 , ws+1 = u + βQ1 ws ,
s = 1, . . . , k,
(7.4)
v1 = wk+1 . As proved by Puterman and Shin (1978), this algorithm achieves linear convergence at rate β k+1 (as opposed to β for value function iteration) for u close to the optimal solution u∗ . INTERPOLATION BETWEEN GRID POINTS. In the Ramsey model, we can restrict the state space to a small interval. This facilitates a reasonably accurate solution with a moderate number of grid points so that convergence is achieved in a few seconds or minutes, depending on the number of grid points n. However, in the heterogeneous agent models in the second part of the book, we encounter problems where the relevant state space is large and where we repeatedly need to compute the value function. In
378
7 Discrete State Space Value Function Iteration
these situations, the computation time on a grid with many points may become a binding constraint. Therefore, we look for methods that increase precision for a given number of grid-points without a compensating rise in computation time. How do we accomplish this? Consider Step 3 of Algorithm 7.2.2, where we maximize the rhs of the Bellman equation (7.2) with respect to K 0 . Assume that K j is this solution. Since the value function is increasing and concave, the exact maximizer must lie in the interval [K j−1 , K j+1 ]. If we are able to evaluate the rhs of the Bellman equation at all K 0 ∈ [K j−1 , K j+1 ], we can select the maximizer of φ(K 0 ) in this interval. Two things are necessary to achieve this goal: an approximation of the value function over the interval [K j−1 , K j+1 ] and a method to locate the maximum of a continuous function. We consider function approximation in Chapter 13. The methods that we employ here are linear (see Section 13.6.1) and cubic (see Section 13.6.2) spline interpolation. The first method provides a continuous func¯ ]. However, this function is not differentiable at tion over the interval [K, K the chosen nodes of the grid, say (Ki , vi ), i = 1, . . . , n. The second method determines a continuously differentiable function over the entire interval. Since the current-period utility function is smooth anywhere, these methods allow us to approximate the rhs of the Bellman equation (7.2) by a ˆ continuous function φ(K): ˆ φ(K) := u( f (Ki ) − K) + vˆ(K),
(7.5)
where vˆ is determined by interpolation, either linearly or cubically. In the interval [K j−1 , K j+1 ], the maximum of φˆ is located either at the endpoints or in the interior. For this reason, we need a method that can address both boundary and interior solutions of a one-dimensional optimization problem. The golden section search considered in Section 15.4.1 satisfies this requirement. We can now modify Step 3 of Algorithm 7.2.2 in the following way: We determine ji∗ as before and then refine the solution. First, we assume that ji∗ is the index neither of the first nor of the last grid point so that the optimum of (7.2) is bracketed by I j = [K j ∗ −1 , K j ∗ +1 ]. Instead of storing i i the index ji∗ , we now locate the maximum of (7.5) in I j with the aid of ˜ j ∗ ∈ I j in the vector h in Algorithm (15.4.1) and store the maximizer K i ˆ K ˜ j ∗ ) in vi . If j ∗ = 1, we evaluate (7.5) at a point position i. We store φ( i i close to K1 . If this returns a smaller value than at K1 , we know that the ˜ j ∗ in [K1 , K2 ]. We proceed maximizer is equal to K1 . Otherwise, we locate K i ∗ analogously if ji = n.
7.2 Solution of Deterministic Models
379
EVALUATION. In the preceding paragraphs, we introduced six different algorithms: 1) Simple value function iteration, which maximizes the rhs of the Bellman equation by selecting the maximizer from the list of all possible values, 2) Value function iteration (Algorithm 7.2.2), which exploits the monotonicity of the policy function and the concavity of the value function, 3) Policy function iteration, i.e., Algorithm 7.2.2, where we use v1 = [I − βQ1 ]−1 u1 in Step 3, 4) Modified policy function iteration, i.e., Algorithm 7.2.2, where v1 in Step 3 is computed via (7.4), 5) Value function iteration according to Algorithm 7.2.2 with linear interpolation between grid points, and 6) Value function iteration according to Algorithm 7.2.2 with cubic interpolation between grid points. We use these six algorithms to compute the approximate solution of the infinite-horizon Ramsey model with current-period utility function u(C) = C 1−η /(1 − η) and production function F (K, 1) = K α . Subsequently, we evaluate their performance with respect to computation time and accuracy. We gauge the latter with the maximum absolute value of the Euler equation residuals computed from equation (1.67). We coded the algorithms in three different languages: Fortran, GAUSS, R 5 and MATLAB . Our Fortran program employs routines from the Math R R Kernel Library of Intel and was compiled and linked with the Intel Fortran compiler, which offers built-in code optimization. The GAUSS and R MATLAB codes use built-in procedures of these languages as much as possible. All three programs, DGM_VI.f90, DGM_VI.g, and DGM_VI.m, R are ran on a workstation with an Intel Xeon W-2133 CPU running at 3.60 GHz. We employ the parameter values α = 0.36, β = 0.996, η = 2.0, and δ = 0.016 and compute the value and the policy function on a grid of n points over the interval [0.75K ∗ , 1.25K ∗ ]. The iterations stop if the maximum absolute difference between successive approximations of the value function falls below 0.01(1 − β) or if the policy function remains unchanged over 30 consecutive iterations.6 Modified policy iteration uses k = 20. We compute the Euler equation residuals on a grid with 200 equally spaced points over the interval [0.8K ∗ , 1.2K ∗ ]. This requires interpolation 5
See Aruoba and Fernández-Villaverde (2015) for a similar study with respect to the stochastic growth model. 6 This latter criterion is applicable only for algorithms 1 through 4.
380
7 Discrete State Space Value Function Iteration
between the points of the solution grid, and we use linear interpolation, irrespective of the solution algorithm. Table 7.1 presents the time required to compute the solution in hours, minutes, seconds, and hundredths of a second. Repeated runs of our programs reveal that the computation time is not completely stable between different runs, although the number of iterations required to find the solution is stable. Accordingly, the figures in the table provide a rough estimate. Table 7.1 Value Function Iteration in the Deterministic Growth Model: Runtime Language
n = 250
n = 500
n = 1, 000
n = 5, 000
n = 10, 000
Algorithm 1 Fortran 0:00:00:52 0:00:02:15 0:00:08:43 0:03:31:81 0:16:56:66 GAUSS 0:00:00:96 0:00:03:78 0:00:18:30 0:08:46:48 0:38:46:48 R MATLAB 0:00:00:95 0:00:02:48 0:00:08:44 0:02:21:02 0:10:17:97 Algorithm 2 Fortran 0:00:00:06 0:00:00:12 0:00:00:32 0:00:01:74 0:00:04:52 GAUSS 0:00:02:80 0:00:06:72 0:00:17:07 0:01:52:43 0:05:19:76 R MATLAB 0:00:05:03 0:00:10:96 0:00:28:91 0:03:12:97 0:07:46:80 Algorithm 3 Fortran 0:00:00:21 0:00:00:14 0:00:00:17 0:00:02:85 0:00:07:76 GAUSS 0:00:01:03 0:00:02:28 0:00:06:62 0:02:02:32 0:04:58:75 R MATLAB 0:00:00:94 0:00:02:26 0:00:04:40 0:01:36:33 0:05:30:82 Algorithm 4 Fortran 0:00:00:03 0:00:00:05 0:00:00:08 0:00:00:68 0:00:05:66 GAUSS 0:00:00:55 0:00:01:11 0:00:02:64 0:00:24:27 0:05:09:96 R MATLAB 0:00:00:97 0:00:02:14 0:00:04:23 0:00:40:88 0:08:35:98 Algorithm 5 Fortran 0:00:00:90 0:00:01:63 0:00:03:13 0:00:14:89 0:00:29:60 GAUSS 0:00:48:92 0:01:36:56 0:03:25:96 0:17:39:66 0:35:48:58 R MATLAB 0:01:45:03 0:03:31:03 0:07:03:11 0:35:03:65 1:11:49:61 Algorithm 6 Fortran 0:00:00:93 0:00:01:73 0:00:03:34 0:00:15:90 0:00:32:36 GAUSS 0:00:59:89 0:01:58:43 0:03:57:49 0:19:30:22 0:38:19:02 R MATLAB 0:01:46:45 0:03:33:21 0:07:12:18 0:34:07:96 1:12:33:47 Notes: Runtime is given in hours:minutes:seconds:hundredth of a seconds on a workR station with Intel Xeon W-2133 CPU running at 3.60 GHz.
7.2 Solution of Deterministic Models
381
Considering the first algorithm, surprisingly, the Fortran program perR forms worse than does the MATLAB program, although both programs evaluate the current-period utility at all grid points without a loop. GAUSS, on the other hand, requires more time, particularly for grid sizes n = 5, 000 and n = 10, 000, than does the Fortran program. The second algorithm provides a clue for the explanation of this behavior. Instead of built-in commands, it employs binary search to locate the maximum. Even for n = 10, 000, the Fortran program solves the model in approximately five R seconds, while GAUSS requires more than five minutes and MATLAB more than seven. This observation points to the MaxLoc function that we employ our Fortran implementation of algorithm one to locate the maxiR mum of the Bellman equation as a likely culprit. The MATLAB built-in command max seems to be more efficient. Algorithm three, full policy function iteration, does not speed up all programs although it reduces the number of iterations. For n = 10, 000, the Fortran program needs three seconds more to compute the solution, GAUSS R saves approximately 20 seconds, and MATLAB even saves 2 minutes. There is thus a trade-off between fewer iterations and the additional R time spent solving the sparse linear system. MATLAB seems to be more efficient in this respect, although it is altogether much slower than the Fortran program. Algorithm four, partial policy iteration, is sensitive with respect to the choice of k, the number of iterations over a given policy function. For instance, for k = 20 the algorithm requires over 1,200 iterations, while it finishes after 702 iterations for k = 30. We used this value to compute the solutions presented in the table. Again, Fortran is much faster than both R GAUSS and MATLAB . It requires less than six seconds, whereas GAUSS R requires more than five minutes7 and MATLAB over eight minutes. Algorithms five and six are computationally more expensive. Instead of a few seconds in the cases of algorithms two through four, the Fortran program needs approximately 30 seconds to compute the solution for n = 10, 000. The predominance of the compiled Fortran code over the R interpreted languages is most obvious: MATLAB needs more than one hour to finish, and GAUSS needs more than half an hour. 7
We discovered a strange behavior of GAUSS when we used the more recent sparse matrix commands: For n = 10, 000, GAUSS required over four hours to finish. When we switched to the older commands, no longer officially supported, GAUSS found the solution after five minutes and ten seconds, as Table 7.1 shows.
382
7 Discrete State Space Value Function Iteration
In summary, except for simple value function iteration, Fortran is much R faster than both GAUSS and MATLAB . Again, except for algorithm one,
GAUSS performs better than MATLAB R . Heer and Maußner (2011) point to a further improvement of value function iteration. They compute the value function on a coarse grid and use this solution to initialize the algorithms that compute the solution on a much finer grid. As shown in Table 7.1, all algorithms quickly compute the solution on a grid with few points. An initial value function closer to the final solution reduces the number of iterations and speeds up the solution process. Table 7.2 Value Function Iteration in the Deterministic Growth Model: Accuracy Algorithm 1 2 3 4 5 6
n = 250
n = 500
n = 1, 000
n = 5, 000
n = 10, 000
5.286E − 02 5.286E − 02 5.313E − 02 5.313E − 02 6.059E − 04 3.800E − 07
2.731E − 02 2.731E − 02 2.731E − 02 2.731E − 02 3.342E − 04 4.400E − 07
1.303E − 02 1.303E − 02 1.134E − 02 1.303E − 02 2.628E − 04 3.300E − 07
2.377E − 03 2.377E − 03 3.288E − 03 2.377E − 03 4.527E − 05 2.900E − 07
1.149E − 03 1.149E − 03 1.149E − 03 2.131E − 03 2.126E − 05 3.200E − 07
Table 7.2 provides our measure of accuracy computed by our Fortran program.8 Value function iteration without interpolation yields Euler equation residuals that decline from approximately five percent for n = 250 to approximately 0.1 percent for n = 10, 000. Together with runtime information presented in Table 7.1, this reveals a clear trade-off between accuracy and computational effort. In stark contrast, linear and cubic interpolation provide much higher degrees of accuracy for a number of grid points as small as n = 250: 0.06 percent and 0.00004 percent, respectively. Further increases in the number of grid points increase the runtime, but do not improve accuracy.9 8
R We do not report the accuracy results obtained from the use of GAUSS and MATLAB because the differences between the results of our Fortran programm and the results of R either the GAUSS or the MATLAB program are negligible and are caused by the respective implementation of the sparse matrix routines and the optimizer employed in algorithms five and six. 9 There are two lower bounds for accuracy: our stopping criteria for value function iteration and for the golden section search. Regarding the former, we stop after iteration s + 1 if
7.3 Solution of Stochastic Models
383
7.3 Solution of Stochastic Models In this section, we adapt the methods presented in the previous section to the stochastic growth model detailed in Section 1.4. This model belongs to a more general class of recursive problems that we describe in the next paragraph. We then develop a flexible algorithm that solves a discrete version of this problem via value function iteration.
7.3.1 Framework Let K and K 0 respectively denote the current and next-period endogenous state variables of the model and Z a purely exogenous shock governed by a covariance stationary stochastic process. The current-period return function u maps the triple (Z, K, K 0 ) ∈ X ⊆ R3 to the real line: u : X → R. The choice of K 0 is restricted to lie in a convex set DK,Z , which may depend on K and Z. In the stochastic growth model of Section 1.4, u(C) is the current-period utility of consumption C = Z f (K) + (1 − δ)K − K 0 and DK,Z := {K 0 : 0 ≤ K 0 ≤ Z f (K) + (1 − δ)K}. The solution of the problem is a value function v(K, Z) that solves the Bellman equation v(K, Z) = max u(Z, K, K 0 ) + βE v(K 0 , Z 0 )|Z , (7.6) K 0 ∈DK,Z
where E[·|Z] denotes expectations conditioned on the observed shock Z.
7.3.2 Approximations of Conditional Expectations As in Section 7.2, we replace the original problem by a discrete-valued problem and approximate the value function by an n × m matrix V = (vi j ), whose row i and column j argument gives the value of the optimal policy if the current state of the system is the pair (Ki , Z j ), Ki ∈ K = {K1 , K2 , . . . , Kn }, Z j ∈ Z = {Z1 , Z2 , . . . , Zm }. The next step in the approximation of the conditional expectation depends on the model’s assumptions with respect to Z. There are models that assume that Z is governed by a Markov chain with realizations given by the inequality kvs − vs+1 k∞ < 0.01(1 − β) holds. The latter is set to the square root of our machine epsilon and approximately equals 10−8 .
384
7 Discrete State Space Value Function Iteration
the set Z and transition probabilities given by a matrix P = (p jl ) ∈ Rm×m , whose row j and column l element is the probability of moving from Z j to state Zl (see Section 16.4 on Markov chains). In Section 8.3, for instance, you encounter a model with just two states. A household is either employed (Z1 = 1) or unemployed (Z2 = 0), and it faces given probabilities p12 of losing the job (if employed) or p21 of finding a job (if unemployed). Since the probability of staying employed p11 must equal 1 − p12 and the probability not to find a job must equal p22 = 1 − p21 , the matrix P is fully determined. Given Z and the matrix P, the Bellman equation of the discrete-valued problem is vi j =
max
Kk ∈Di j ∩K
u(Z j , Ki , Kk ) + β
m X
p jl vkl ,
l=1
(7.7)
i = 1, 2, . . . , n, j = 1, 2, . . . , m, where we use Di j as a shorthand for the set DKi ,Z j . As in the previous section, we can iterate over this equation to determine the matrix V . We suppose, as is the case in the benchmark model of Example 1.6.1, that z := ln Z follows an first-order autoregressive (AR(1))-process: z 0 = ρ z z + ε0 ,
ρz ∈ (−1, 1), ε0 iid N (0, σε2 ).
(7.8)
The first approach used to address this case is to use either Algorithm 16.4.1 or Algorithm 16.4.2 (see Section 16.4). Both approximate the process (7.8) with a finite-state Markov chain. Algorithm 16.4.2 determines the interval [z1 , zm ] endogenously from the parameters ρz , σz , and m, whereas Algorithm 16.4.1 requires an additional parameter that determines the length of the interval [z1 , zm ]. Both algorithms compute the elements of the set Z and the transition probability matrix P so that the discrete-valued Bellman equation (7.7) still applies. The second approach to approximate the conditional expectation on the rhs of the Bellman equation (7.6) rests on the analytic expression for E(·|Z). For the process (7.8), this expression equals Z∞ −(ε0 )2 0 1 2 0 0 E v(K , Z )|Z = v K 0 , eρz z+ε e 2σε d ε0 . p σε 2π −∞ If the value function is tabulated in the matrix V = (vi j ), we can interpolate between the row elements of V to obtain an integrable function of z, which allows us to employ numeric integration techniques to obtain E[·|Z]. As
7.3 Solution of Stochastic Models
385
explained in Section 14.3, Gauss-Hermite quadrature is a suitable method. In Heer and Maußner (2008), however, we point to a serious drawback of this approach. Gauss-Hermite quadrature may require a much larger interval for z than is necessary for simulations of the model. To see this, let x = −¯ x denote the smallest and the largest node employed in GaussHermite integration. For instance, x¯ ≈ 2.02 for five nodes. To ensure that the point p z 0 = ρz z + 2σε x¯ is also in Z for all zp∈ Z , the largest point of the grid (say, z¯), must be at least as large as 2σε x¯ /(1 − ρz ). Symmetrically, the smallest point p in the gridpmust be at least as small as − 2σε x¯ /(1 − ρz ). Accordingly, |¯ z − z| = 2 2σε x¯ . The unconditional standard deviation of the process Æ (7.8) is σz = σε / 1 − ρz2 . Relative to σz , the minimum size of the grid equals p p 1 − ρz µ=2 2 x¯ . 1 − ρz
For a given x¯ , this is an increasing function of the parameter ρz in the interval [0, 1). To give an example, µ ≈ 18 for ρz = 0.82 and x l = 2.02. Thus, instead of using an interval of size 3σz (as recommended for Algorithm 16.4.1), you must use an interval that is six times larger. In addition, and as explained below, the boundaries of K usually depend on the boundaries of Z . For a given number of grid points n, a larger interval I K = [K1 , Kn ] implies a less accurate solution that may outweigh the increase in precision provided by the continuous-valued integrand. With respect to the benchmark model of Example 1.6.1, Heer and Maußner (2008) indeed find that the Markov chain approximation allows a much faster computation of the value function for a given degree of accuracy. For this reason, we consider only this approach.
7.3.3 Basic Algorithm The problem that we thus have to solve, is to determine V = (vi j ) ∈ Rn×m iteratively from vis+1 j =
max
Kk ∈Di j ∩K
u(Z j , Ki , Kk ) + β
i = 1, 2, . . . , n, j = 1, 2, . . . , m.
m X l=1
s p jl vkl ,
(7.9)
386
7 Discrete State Space Value Function Iteration
This process also delivers the policy function H = (hi j ) ∈ Rn×m . In our basic algorithm, this matrix stores the index ki∗j of the optimal next-period state variable Kk0 ∈ K in its ith row and jth column element. The pair of indices (i, j) denotes the current state of the system, that is, (Ki , Z j ). We assume that the value function v of our original problem is concave in K and that the policy function h is monotonic in K so that we can continue to use all of the methods encountered in Section 7.2. As we have seen in this section, a reasonably fast algorithm should at least exploit the concavity of v and the monotonicity of h. Our basic algorithm, thus, consists of steps 1, 2.1, and 2.2i of Algorithm 7.3.1 (see below). We first discuss the choice of K and V 0 before we turn to methods that accelerate convergence and increase precision.
7.3.4 Initialization The choice of the grid K and the initial value function V 0 are a bit more delicate than the respective step of Algorithm 7.2.2. In the deterministic growth model considered in the previous section, the optimal sequence of capital stocks is either increasing or decreasing, depending on the given initial capital stock K0 . This makes the choice of K easy. In a stochastic model, the future path of K depends on the expected path of Z, and we do not know in advance whether for any given pair (Ki , Z j ) the optimal policy is to either increase or decrease K. For this reason, our strategy to choose K is ‘guess and verify’. We start with a small interval. If the policy function touches the boundaries of this interval, that is, if hi j = 1 or hi j = n for any pair of indices, we enlarge K . In the case of the stochastic growth model (1.24), an educated guess is the following: If the current shock is Z j and if we assume that Z = Z j forever, the sequence of capital stocks will approach K ∗j , determined from 1 = β(1 − δ + Z j f 0 (K ∗j )).
(7.10)
∗ Therefore, the approximate lower and upper bounds are K1∗ and Km , respectively. Since the stationary solution of the discrete-valued problem differs from the solution of the continuous-valued problem, K1 (Kn ) should ∗ be chosen as a fraction (a multiple) of K1∗ (Km ). As we already know from Section 7.2, computation time also depends on the initial V 0 . Using the zero matrix is usually not the best choice, but it may be difficult to find a better starting value. For instance, in the
7.3 Solution of Stochastic Models
387
stochastic growth model, we may try vi0j = u(Z j f (Ki ) − δKi ), that is, the utility obtained from a policy that maintains the current capital stock for one period. On the other hand, we may compute V 0 from the m different stationary solutions that result if Z equals Z j forever: vi0j
= u(Z j f
(K ∗j ) − δK ∗j ) + β
m X l=1
p jl vil0 ,
where K ∗j solves (7.10). This is a system of linear equations in the nm unknowns vi0j with solution V 0 = I − β P0
−1
U,
U = (ui j ), ui j = u(Z j f (Ki ) − δKi ), Ki = K ∗j , i = 1, . . . n, j = 1, . . . , m.
A third choice is vi0j = u( f (K ∗ ) − δK ∗ )/(1 − β), i.e., the value obtained from the stationary solution of the deterministic growth model. There is, however, an even better strategy: 1) We start with a coarse grid on the interval [K1 , Kn ]; 2) we use the basic algorithm to compute the value function V ∗ on this grid; and 3) we make the grid finer by using more points n. 4) We interpolate columnwise between neighboring points of the old grid and the respective points of V ∗ to obtain an estimate of the initial value function on the finer grid. Since on a coarse grid the algorithm quickly converges, the choice of V 0 in step 1) is not truly important, and V 0 = 0 may be used.
7.3.5 Interpolation We know from the results obtained in Section 7.2 that interpolation between the points of K is one way to increase the precision of the solution. Within the current framework, the objective is to obtain a continuous ˆ function φ(K) that approximates the rhs of the Bellman equation (7.6) given the tabulated value function in the matrix V and the grid K . We achieve this by defining ˆ φ(K) = u(Z j , Ki , K) + β
m X
p jl vˆl (K).
(7.11)
l=1
The function vˆl (K) is obtained from interpolation between two neighboring points Ki and Ki+1 from K and the respective points vil and vi+1l
388
7 Discrete State Space Value Function Iteration
ˆ from the matrix V . Thus, each time the function φ(K) is called by the maximization routine, m interpolation steps must be performed. For this reason, interpolation in the context of a stochastic model is much more time consuming than in the case of a deterministic model. Our Algorithm 7.3.1 allows for either linear or cubic interpolation in optional Step 2.2.ii.
7.3.6 Acceleration In Section 7.2, we discover that policy function iteration is a method to accelerate convergence. This method assumes that a given policy H 1 is maintained forever. In the context of the Bellman equation (7.7), this approach provides a linear system of equations in the nm unknowns vi j (for the moment, we suppress the superscript of V ): vi j = ui j + β
m X l=1
p jl vhi j l ,
ui j := u(Z j , Ki , Khi j ), i = 1, 2, . . . , n, j = 1, 2, . . . , m. Using the properties of the vec operator in (12.21) enables us to write this equation in matrix notation: vec V = vec U + βQ vec V,
U = (ui j ).
The nm × nm matrix Q is obtained from H and P: Its row r = (i − 1)m + j elements in columns c1 = (hi j − 1)m + 1 through cm = (hi j − 1)m + m equal the row j elements of P. All other elements of Q are zero. Even for a grid Z with only a few elements m, Q is much larger than its respective counterpart in equation (7.3). Table 7.1 in the previous section reveals that full policy iteration does not unambiguously outperform partial policy iteration. Therefore, we implement only modified policy iteration in our algorithm. Let s = 1, 2, . . . denote the current iteration over the value function, H s = (hsi j ) ∈ Rn×m the policy function, U s = (ui j ) ∈ Rn×m = u(Z j , Ki , Khs ) ij
the return matrix, V s ∈ Rn×m the respective value function, and R the number of iterations over the policy function. We obtain the value function for s + 1 from w1 = Vs , w r+1 = vec U + βQw r , vec V
s+1
=w
R+1
.
r = 1, 2, . . . R,
(7.12)
7.3 Solution of Stochastic Models
389
We can also use modified policy function iteration in the case of interpolation between the elements of K . Let K(i, j) denote the maximizer K of equation (7.11) and W r = (w ri j ) ∈ Rn×m the tabulated value function computed at step r = 1, 2, . . . , R of the iterations over the policy function. From this matrix, we find the value at the point (K(i, j), Zl ), l = 1, . . . , m ˆ r (K(i, j), l) denote this from interpolation by using column l of W r . Let w value. The element w r+1 of the value function that results from K(i, j) and ij W r is given by w r+1 i j = u(Z j , Ki , K(i, j) + β
m X
ˆ r (K(i, j), l). p jl w
(7.13)
l=1
We initialize W 1 with the matrix V s ∈ Rn×m obtained at step s = 1, 2, . . . of the iterations and replace V s+1 with the matrix W R+1 . Algorithm 7.3.1 (Value Function Iteration 2) Purpose: Find an approximate policy function of the recursive problem (7.6) given a Markov chain with elements Z = {Z1 , Z2 , . . . , Zm } and transition matrix P. Steps: Step 1: Choose a grid K = {K1 , K2 , . . . , Kn }, Ki < K j , i < j = 1, 2, . . . n,
and initialize V 0 . Step 2: Compute a new value function V 1 and an associated policy function H 1 : For each j = 1, 2, . . . , m, repeat these steps: Step 2.1: Initialize: k0∗ j = 1. ∗ Step 2.2: i) For each i = 1, 2, . . . , n and ki−1 j , use Algorithm 7.2.1 ∗ to find the index k that maximizes w k = u(Z j , Ki , Kk ) + β
m X l=1
0 p jl vkl
∗ ∗ in the set of indices k ∈ {ki−1 j , ki−1 j + 1, . . . , n}. Set
ki∗j = k∗ . If interpolation is not desired, set h1i j = k∗ and
vi1j = w k∗ , else, proceed as follows: ii) (optional) If k∗ = 1 evaluate the function φˆ defined by equation (7.11) at a point close to K1 . If the result is smaller than the value at
390
7 Discrete State Space Value Function Iteration
˜ = K1 , else, use Algorithm 15.4.1 to find the maxK1 , set K ˜ ˜ in h1 and imizer K of φˆ in the interval [K1 , K2 ]. Store K ij ˆ K ˜ ) in v 1 . Proceed analogously if k∗ = n. If k∗ equals φ( ij
˜ of φˆ in the interval neither 1 nor n, find the maximizer K 1 ˆ K ˜ and v 1 = φ( ˜ ). [Kk∗ −1 , Kk∗ +1 ] and set hi j = K ij Step 2.3: (optional) i) If only Step 2.2.i was taken: Compute V 1 from equation (7.12). ii) If Step 2.2.ii was also taken: Compute V 1 from equation (7.13). Step 3: Check for convergence: If max |vi1j − vi0j | ≤ ε(1 − β),
i=1,...,n j=1,...,m
ε ∈ R++
(or if the policy function has remained unchanged for a number of consecutive iterations), stop, else replace V 0 with V 1 and H 0 with H 1 , and return to Step 2. Note that in Step 2, the loop over the elements of the Markov chain can also be run in parallel. Furthermore, observe that except for the maximization Step 2.2, the basic structure of the algorithm is not limited to onedimensional problems. Consider, for instance, the case of two endogenous states, i = 1, 2 and grid points n1 and n2 . We can construct a (n1 × n2 ) × 2 matrix K that stores the Cartesian product of two grids K := K1 × K2 . Then, we can loop over the rows of this matrix to find for each (Ki , K j ) ∈ K the maximizer (Ki0 , K 0j ) ∈ K . Similarly, the algorithm can manage more than one AR(1)-process. In obvious notation, let m1 and m2 denote the number of elements of two Markov chains, m := m1 × m2 , and construct the m × 2 matrix Z that stores the Cartesian product Z := Z1 × Z2 . If both processes are independent, the transition matrix P is equal to the Kronecker product P = P1 ⊗ P2 ∈ Rm×m , and the value function v is a matrix of size V ∈ Rn×m . Accordingly, the summation over l = 1, . . . , m in Step 2.2 still applies. For correlated shocks, the methods provided by Terry and Knotek II (2011) and Gospodinov and Lkhagvasuren (2014) can be used to construct the transition probability matrix P. We provide implementations of Algorithm 7.3.1 in Fortran, GAUSS, and R MATLAB ; they can be found in the programs SGM_VI.f90, SGM_VI.g, and SGM_VI.m, respectively. These programs facilitate six different methods:
7.3 Solution of Stochastic Models
391
1) Value function iteration: Step 1, Step 2.1, Step 2.2.i, and Step 3, 2) Modified policy function iteration (method 1 amended by Step 2.3.i), 3) Value function iteration with linear interpolation: Step 2.2.ii in addition to Step 2.2.i, 4) Value function iteration with linear interpolation and modified policy iteration (method 3 amended by Step 2.3.ii), 5) Value function iteration with cubic interpolation, and 6) Value function iteration with cubic interpolation and modified policy iteration (method 5 amended by Step 2.3.ii).
7.3.7 Value Function Iteration and Linear Programming Trick and Zin (1993, 1997) employ linear programming to compute the value function. Their approach rests on the observation that the elements of the discrete value function V = (vi j ) satisfy the set of n2 m linear constraints vi j ≥ u(Z j , Ki , Kl ) + β p j1 vl1 + · · · + p jm vlm , i = 1, . . . , n, j = 1, . . . , m, l = 1, . . . , n.
The smallest nm elements that meet these conditions minimizes the linear function n X m X
vi j .
i=1 j=1
This is a standard linear programming (LP) problem without upper bounds: min c T v, v
subject to
(7.14)
u ≤ Av.
In this notation, v = vec V , c T is a row vector with all of its nm elements being equal to one, and u is the n2 m row vector defined by u(r(i, j, l)) = u(Z j , Ki , Kl ), r(i, j, l) = (i − 1)nm + ( j − 1)n + l,
i = 1, . . . , n, j = 1, . . . , m, l = 1, . . . , n.
392
7 Discrete State Space Value Function Iteration
The elements a(r, q) of the n2 m × nm matrix A are equal to zero, except the following ones: a(r(i, j, l), ( j − 1)n + i) = 1 if l 6= i, ¨ 1 − β p js , a(r(i, j, l), (s − 1)n + l) = −β p js ,
if l = i and s = j other wise.
,
i = 1, . . . , n, j = 1, . . . , m, l = 1, . . . , n, s = 1, . . . , m. Trick and Zin (1993, 1997) solve the stochastic growth model for m = 2 with discrete VI and with the fast commercial LP solver CPLEX. They document considerable gains in speed for the linear programming approach. For instance, for m = 2 and n = 33 linear programming is almost 80 times faster than VI. For m = 2 and n = 513, LP is approximately 13 times faster than VI. The LP approach is useful for the reduction of the dimension of the problem (see Trick and Zin (1997)). Suppose we partition the grid Sp K = {K1 , . . . , Kn } into p < n subsets Kl , l = 1, . . . , p so that K = l=1 Ki and approximate the value function on each of them by a linear spline Vl, j (K) := al, j + bl, j K, l = 1, . . . , p, j = 1, . . . , m. The objective function of the LP problem is now given by ¨ p n X m X X 1, if Ki ∈ Kl min 1i,l Vl, j (Ki ), 1i,l = al, j ,bl, j 0, otherwise. i=1 j=1 l=1 This is a linear function in 2pm parameters, instead of nm parameters of the original problem (7.14). For instance, for n = 1, 000 and m = 9 we are able to reduce the number of variables from 9,000 to only 40 × 9 = 360 if we use p = 20 subsets. The linear restrictions on the parameters involve p X l=1
1i,l Vl, j (Ki ) ≥ u(Z j , Ki , K r ) + β
m X q=1
p j,q
p X
1 r,l Vl,q (K r ),
l=1
i = 1, . . . , n, j = 1, . . . , m, r = 1, . . . , n and the continuity conditions al, j = al+1, j bl, j Kl,l+1 = bl+1, j Kl,l+1 where Kl,l+1 denotes the elements of K that join the sets Kl and Kl+1 (see Section 13.6.1). For instance, if K = {K1 , . . . , K5 , . . . , K10 } and p = 2 so that K1 = {K1 , . . . , K5 } and K2 = {K5 , . . . , K10 }, K1,2 = K5 .
7.3 Solution of Stochastic Models
393
Cai et al. (2017) adapt this approach to problems with continuous state and control variables. They use nonlinear programming and other tools, e.g., shape-preserving splines and numerical quadrature, and document considerable gains both in speed and accuracy vis-à-vis the LP approach.
7.3.8 Evaluation PARAMETERIZATION. We apply these six methods to the stochastic growth model presented in (1.24). We parameterize the model as usual: u(C) =
C 1−η , 1−η
ez f (K) = ez K α ,
z 0 = ρz + ε0 ,
ε0 iid N (0, σε2 ).
The parameters of the model are set equal to α = 0.36, β = 0.996, η = 2.0, δ = 0.016, ρ = 0.84, and σε = 0.0072. The programs compute the value and the policy function on a grid of n×m points. We use Algorithm 16.4.2 and approximate the stochastic process for z with a Markov chain of m = 9 elements. The boundaries of the grid for capital are found by assuming that z = z1 and z = z9 forever so that the stock of capital approaches
1 1 − β(1 − δ) α−1 K(z1 ) = , αβ ez1 1 1 − β(1 − δ) α−1 K(z9 ) = , αβ ez9 respectively. We then set the lower (upper) bound of the grid equal to 0.6K(z1 ) (1.4K(z9 )). RUNTIME. We initialize the value function on a small grid of n = 250 points for the capital stock, extend it with linear interpolation to the desired number of grid points, and measure the time required to compute the final solution. In all cases without interpolation, we stop the computation if the policy function remains unchanged over 30 consecutive iterations. Alternatively, we end the computation if the maximum absolute difference
394
7 Discrete State Space Value Function Iteration
between two successive approximations of the value function is smaller than 0.01(1 − β). Table 7.3 mainly echoes the findings from the deterministic growth model. The compiled Fortran code is the fastest. Method 1 requires less than 1 second for n = 500 and approximately 40 seconds for n = 10, 000. Modified policy iteration with R = 20 considerably speeds up convergence. The runtime ranges from 15 hundredths of a seconds for n = 500 to approximately 4 seconds for n = 10, 000. Employing interpolation slows down the computations. Linear interpolation (Method 3) requires between (approximately) 1 and 16 minutes. Cubic interpolation (Method 5) is even more time consuming, with a runtime between slightly more than 1 minute and more than 22 minutes. Modified policy iterations (Methods 4 and 6) substantially reduce the runtime; in the case of n = 10, 000, the reduction exceeds a factor of 6 for both methods. Table 7.3 Value Function Iteration in the Stochastic Growth Model: Runtime Language Method
n = 500
n = 1, 000
n = 5, 000
n = 10, 000
0:00:17:81 0:00:01:84 0:08:06:87 0:01:14:38 0:11:06:14 0:01:43:34
0:00:39:94 0:00:03:94 0:16:38:81 0:02:37:30 0:22:38:19 0:03:34:08
Fortran Fortran Fortran Fortran Fortran Fortran
1 2 3 4 5 6
0:00:00:79 0:00.00:15 0:00:46:27 0:00:05:98 0:01:03:59 0:00:09:07
0:00:02:26 0:00:00:32 0:01:25:74 0:00:12:28 0:02:07:07 0:00:17:52
GAUSS GAUSS GAUSS GAUSS GAUSS GAUSS
1 2 3 4 5 6
0:00:35:00 0:00:07:99 1:08:48:08 0:10:02:90 1:32:12:69 0:13:25:45
0:01:36:58 0:14:14:95 0:37:00:39 0:00:19:23 0:01:49:97 0:03:37:28 2:35:46:36 0:22:34:87 3:36:41:82 0:31:15:08
R MATLAB
MATLAB R R MATLAB
MATLAB R R MATLAB
MATLAB R
1 2 3 4 5 6
0:00:37:17 0:00:10:44 0:37:08:43 0:06:22:58 0:35:12:77 0:06:18:16
0:01:47:60 0:00:25:72 1:12:35:14 0:12:58:73 1:12:35:21 0:12:21:50
0:15:45:61 0:40:15:80 0:02:16:96 0:04:49:05 5:32:13:68 1:01:51:22
Notes: Runtime is given in hours:minutes:seconds:hundredths of a second on a R workstation with Intel Xeon W-2133 CPU running at 3.60 GHz.
7.3 Solution of Stochastic Models
395
The interpreted languages are considerably slower. For this reason, we have not solved the model for all six methods with all four different capital grid points n. For n = 10, 000, Fortran solves the model with Method 1 in less than 2% R of the time required either by GAUSS or MATLAB . In this case, modified policy iteration reduces the runtime by a factor between 10 (GAUSS) and R 8 (MATLAB ). For the moderately sized grid n = 1, 000 and with linear interpolation between the points of the capital grid, the relation between the runtime R of Fortan and MATLAB is approximately 1:50; for GAUSS, it is 1:108. Modified policy iteration speeds up the solution by a factor of more than 5 R in GAUSS code and more than 6 in MATLAB . R Linear and cubic interpolation requires less time in MATLAB than in GAUSS. Even for n = 500, GAUSS needs almost 1 hour more than R MATLAB to solve the model with method 5. In our GAUSS program SGM_VI_MT.g we parallelize the loop over the elements of the Markov chain using the threadfor loop. For the time consuming solution on grids with n = 500 and m = 9 elements and linear interpolation between grid points we are able to halve the runtime. ACCURACY. We measure the accuracy of the solution by the residuals of the Euler equation10 n o 0 C −η = βE (C 0 )−η 1 − δ + α(eρz+ε )(K 0 )α−1 z . The residual is computed by replacing C and C 0 in this equation with the approximate policy function for consumption, ˆhC (K, z) = ez K α + (1 − δ)K − ˆhK (K, z),
where the policy function for the next-period capital stock ˆhK is obtained from bilinear interpolation between the elements of the matrix H. We approximate the integral with the Gauss-Hermite quadrature formula (14.30) on 7 nodes. The residuals are computed over the Cartesian product from a grid of 200 points over the interval [0.75K ∗ , 1.25K ∗ ] and a grid of 100 points over the interval [−2σz , 2σz ], where σz denotes the unconditional standard deviation of the log TFP shock z. 10
See Section 1.4.2, where we derive this equation, and Section 1.7.2, where we introduce this concept.
396
7 Discrete State Space Value Function Iteration Table 7.4 Value Function Iteration in the Stochastic Growth Model: Accuracy Method
n = 500
n = 1, 000
n = 5, 000
n = 10, 000
m=9, Rouwenhorst 1 2 3 4 5 6
1 6
1 6
3.815E − 02 3.815E − 02 7.520E − 04 7.521E − 04 1.183E − 05 1.182E − 05
1.851E − 02 1.851E − 02 3.821E − 04 3.821E − 04 1.179E − 05 1.182E − 05
3.461E − 03 3.461E − 03 7.356E − 05 7.361E − 05 1.187E − 05 1.179E − 05
2.137E − 03 2.137E − 03 4.719E − 05 4.687E − 05 1.180E − 05 1.182E − 05
3.960E − 02 3.860E − 06
1.883E − 02 4.000E − 06
3.767E − 03 3.860E − 06
2.476E − 03 3.900E − 06
3.815E − 02 1.556E − 05
1.851E − 02 1.554E − 05
3.512E − 03 1.544E − 05
2.142E − 03 1.550E − 05
m=21, Rouwenhorst
m=9, Tauchen
Table 7.4 displays the maximum absolute value of the 200×100 residuals computed from the Fortran program SGM_VI.f90. With only a few grid points, n = 500, linear interpolation decreases the Euler residual from 3.815×10−2 to 7.52×10−4 by a factor greater than 50. Cubic interpolation achieves a factor of over 322. According to Table 7.3, Fortran requires less than 10 seconds to compute this solution with method 6. Increasing the number of grid points from n = 500 to n = 10, 000 is less successful in increasing accuracy. Method 2 reduces the Euler residual from 3.815×10−2 for n = 500 to 2.137 × 10−3 for n = 10, 000, a factor of approximately 18. Note also that increasing the number of grid points does not further improve accuracy when interpolation is employed. The results for m = 21 show that adding more points to the Markov chain does not improve accuracy for the method without interpolation. With cubic interpolation, this strategy moderately reduces the computation error by a factor of approximately 3. In our example, we observe that the approximation of the Markov process by the methods suggested by either Rouwenhorst (1995) or Tauchen (1986) has negligible effects on the accuracy of our results: in the first and sixth entry rows (Rouwenhorst approximation) and the last two entry rows (Tauchen approximation) in Table 7.4, the digit order of the Euler
7.4 Further Applications
397
equation errors is identical. We attribute this finding to the moderate autocorrelation ρ = 0.84 of the process for log TFP. As shown by Galindev and Lkhagvasuren (2010), Tauchen’s method requires many more grid points to approximate a highly persistent continuous AR(1)-process with the same degree of precision as Rouwenhorst’s method. For instance, for ρ = 0.99, n = 5, 000, and m = 9 our method 6 yields an Euler equation residual that is one order of magnitude larger for the Tauchen approximation than for the Rouwenhorst approximation. Since computational time increases with the number of grid points m, Algorithm 16.4.2 should be used instead of Algorithm 16.4.1 to approximate AR(1)-processes that are highly persistent. VI VERSUS LP. Finally, we compare the linear programming approach with R value function iteration. Our program SGM_LP.m employs the MATLAB solver linprog to solve (7.14). For m = 3 and different sizes of the grid K , we compare the solution with solutions obtained from simple VI that only exploits the monotonicity of the value function with respect to runtime and accuracy. We stop iterations if kvs+1 − vs k∞ < 0.01 × (1 − β). For the same level of accuracy, measured by the maximal Euler equation residual, LP outperforms VI only on a small grid with n = 100 elements. On the larger grid with n = 150, our LP solver requires 11 times more runtime than our VI routine, and, for n=500, 1,000 times more runtime (see Figure 7.1). Accordingly, we attribute the computational gains reported in Trick and Zin (1997) to the faster LP solver employed in their study.
7.4 Further Applications In this section, we consider two applications of Algorithm 7.3.1. The first model is the stochastic growth model under the assumption that the given stock of capital cannot be transferred into consumption goods. This places a nonnegativity constraint on investment. Second, we compute a discrete approximation of the policy function of our benchmark model.
398
7 Discrete State Space Value Function Iteration Runtime 1,000 750 500 250 0 100
150
200
250
300
350
400
450
500
350
400
450
500
n Accuracy
1.0
0.8
0.6 100
150
200
250
300
n Figure 7.1 VI versus LP
7.4.1 Nonnegative Investment Perturbation methods as well as the extended path algorithm are not suitable for models with binding constraints. The former require that the system of equations that determines the model’s dynamics is sufficiently differentiable at one point in the state space. A single binding constraint at this point is sufficient to violate this condition.11 Even if none of the constraints binds at this point but does so at nearby points, the exact policy functions have kinks that the approximate policy functions do not display. Thus, as soon as the model leaves the close vicinity of the stationary point, the approximate policy functions are no longer applicable. The nonlinear 11
Guerrieri and Iacoviello (2015) propose a piecewise first-order perturbation to solve models with occasionally binding constraints.
7.4 Further Applications
399
methods that we employ to solve for the model’s rational expectations path in Section 6.2 also rely on differentiable functions. However, even if one Each time a constraint binds, it creates a different branch of the economy’s future time path. All these different paths must be compared to each other to single out the correct one. This is a formidable task even in models with one state variable, easily encountering reasonable limits on computation time. Within the recursive approach taken in this chapter, however, it is not very difficult to take care of constraints. The stochastic growth model with a binding constraint on investment is a good example to make that point. MODEL. We suppose that it is not possible to consume the current stock of capital so that consumption cannot exceed production. This places the restriction K 0 ≥ (1 − δ)K on the choice of the future capital stock. Equivalently, investment I = K 0 − (1 − δ)K cannot be negative. Thus, the problem is to find a value function that solves the Bellman equation v(K, Z) = max
K 0 ∈DK,Z
u(Z f (K) + (1 − δ)K − K 0 ) + β E v(K 0 , Z 0 )|Z ,
(7.15)
DK,Z = K 0 : (1 − δ)K ≤ K 0 ≤ Z f (K) + (1 − δ)K . In Problem 7.1, we ask you to derive the first-order conditions for this maximization problem from the Karush-Kuhn-Tucker theorem 1.2.1 under the assumption of a given value function v. These conditions are required to compute Euler equation residuals for this model. However, to find v, it is not necessary to know these conditions at all. MODIFICATIONS OF THE ALGORITHM. The most obvious strategy to address the constraint is to disregard points Kk ∈ K in Step 2.2 i) for which Kk < (1 − δ)Ki holds. ∗ Instead of starting the search with the index k = ki−1 j , we first check if Kk ≥ (1 − δ)Ki . If Kk violates this condition, we try Kk+1 and so forth
400
7 Discrete State Space Value Function Iteration
until we arrive at a point Kk+r , r = 1, 2, . . . , n − k that meets this condition. Since (1 − δ)Ki < Ki , this r always exists. Then, we locate k in the set ∗ {ki−1 j + r, . . . , n}. Similar changes must be made to Step 2.2.ii. We think this is a good exercise, and we leave these changes to the reader (see Problem 7.1). However, there is an even simpler way to integrate the constraint without the necessity to modify Algorithm 7.3.1. We change the code in the function that returns the utility at the given triple (Z j , Ki , Kk ) in the Bellman equation (7.9): If Kk is smaller than (1−δ)Ki , we restrict consumption to output: ¨ Z j Kiα , if Kk ≤ (1 − δ)Ki , C= (7.16) α Z j Ki + (1 − δ)Ki − Kk , otherwise. Accordingly, the utility remains at u(Z j Kiα ) for all Kk ≤ (1 − δ)Ki . This indeed imposes the constraint. Imagine we start iterations with V 0 = (v ji ) = 0 ∀i = 1, . . . , n, j = 1, . . . , m. Without the constraint, the optimal strategy is to choose K1 for all Ki since C = Z j Kiα − δK1 spends the highest utility if capital is restricted to K ≥ K1 . With condition (7.16) imposed, our search algorithm (7.2.1) selects the index k∗ , for which the following holds: Z j Kiα + (1 − δ)Ki − Kk? ≥ Z j Kiα > Z j Kiα + (1 − δ)Ki − Kk? +1 .
The resulting matrix V 1 is then increasing in K so that also in all following iterations, the algorithm does not pick capital stocks that violate the constraint. RESULTS. The reader can find the modified Algorithm 7.3.1 in both the GAUSS program SGM_NNI_VI.g and the Fortran program SGM_NNI_VI. f90. For the constraint to bind, it requires large productivity shocks. Instead of σε = 0.0072 (the value that we use in the model of the previous section), we set σε = 0.05 and leave all other parameter values unchanged. We use a Markov chain of m = 21 points to approximate the AR(1)-process of the natural log of TFP. Our grid of the capital stock has n = 500 elements, and we interpolate cubically between these points. Figure 7.2 displays the policy function of consumption ˆhC (K, Z) in the domain [0.75K ∗ , 1.25K ∗ ] × ±4.2σz .12 The policy function is computed 12
The unconditional standard deviation of the AR(1)-process is σz = σε /
Æ
1 − ρz2
7.4 Further Applications
401
Consumption C
at 1002 pairs (K, z) via bilinear interpolation from the policy function of the next-period capital stock. The graph displays a clear kink. For each K, there is a threshold value for consumption that depends on the log of TFP. Below this point, the household would like to consume some of its capital stock to smooth consumption. Above this point, the constraint does not bind.
4
3.5
3 0.2
0
Log TFP ln Z
−0.2
70
80
90
100
110
Capital K
Figure 7.2 Policy Function for Consumption in the Stochastic Growth Model with Nonnegative Investment
7.4.2 The Benchmark Model BELLMAN EQUATION. Before we can solve the model from Example 1.6.1 with Algorithm 7.3.1, we must reformulate the optimization problem as a stationary dynamic programming problem. Let c t := C t /A t and k t := K t /A t denote consumption C t and capital K t scaled by the level of labor augmenting technical progress A t . Instead of the level of TFP Z t , we use its natural logarithm z t = ln Z t as in the previous section. Recall also that L t ∈ (0, 1] denotes hours of work. In these variables, the period t resource constraint of the economy reads
402
7 Discrete State Space Value Function Iteration
ak t+1 = ez t kαt L 1−α + (1 − δ)k t − c t . t
The period t planning problem with A t = 1 can then be written as max c t ,L t
Et
∞ X s=0
(β a
1−η s
)
1−η
c t+s (1 − L t+s )θ (1−η)
1−η
+ λ t+s ez t+s kαt+s L 1−α t+s + (1 − δ)k t+s − c t+s − ak t+s+1 , η
where λ t = Λ t A t (see Section 1.6). Accordingly, in the Bellman equation of the stationary optimization problem, the appropriate discount factor is not β but β˜ := β a1−η , and we can define this function as: 1−η z α 1−α e k L + (1 − δ)k − ak0 (1 − L)θ (1−η) v(k, z) = max k0 ,L∈(0,1] 1−η 0 0 ˜ + βE v(k , z )|z .
It is easy to apply Algorithm 7.3.1 to this model. There is just one change compared with the stochastic growth model of Section 7.3: Inside the procedure that returns the household’s utility u as a function of (z, k, k0 ), we must solve for L. We can use the first-order condition with respect to working hours for this purpose. Differentiating the rhs of the Bellman equation with respect to L and setting the result equal to zero yields θ (ez kα L 1−α + (1 − δ)k − ak0 ) = (1 − α)(1 − L)ez kα L −α .
(7.17)
For each k0 < (ez kα + (1 − δ)k)/a, this equation has a unique solution in (0, 1). We use the modified Newton-Raphson method described in Section 15.3.2 to solve this equation. From this solution, we can compute c and the utility u. SOLUTION. We parameterize the model as before (see Table 1.1): a = 1.003, α = 0.36, β = 0.996, δ = 0.014, η = 2, ρz = 0.82, σε = 0.0071, and θ replicates L = 0.126. We compare three solutions that rest on a Markov chain approximation of the process z t+1 = ρz z t + ε t+1 ,
ρz ∈ (−1, 1), ε t+1 iid N (0, σε )
(7.18)
with Algorithm 16.4.2 and m = 15 elements. The interval for the capital stock is symmetric around the stationary capital k and equal to [0.5k, 1.5k].
7.4 Further Applications
403
The first solution employs a grid of n = 50, 000 equally spaced points over this interval, and the algorithm does not interpolate between the elements of the grid. The second solution uses only n = 500 grid points over the same interval with linear interpolation. The third solution employs the same grid as the second one, but the algorithm interpolates cubically. To reduce the runtime, we first solve the problem on a grid of n = 250 and m = 15 points, and then, we use the result to initialize the value function for subsequent computations. It takes approximately seven minutes to find the first solution, 15 seconds for the second, and 20 seconds for the third. For each solution, we measure its accuracy as follows: We choose 200 equally spaced points over [0.9k, 1.1k] and 100 equally spaced points over 3.5[−σz , σz ] and compute the residuals of the Euler equation (1.64h) in the same way as explained in Section 4.4, where we obtain k0 from bilinear interpolation over the policy function for capital and L from solving equation (7.17). We compute expectations with Gauss-Hermite integration on 7 nodes. Simulations of the model work in the same way: We draw a sequence of innovations ε0t+1 , iterate over the process (7.18) to find the time path of the shock, start at t = 0 with k t = k and compute the future path of capital via bilinear interpolation. For each k0 = k t+1 , we solve equation (7.17) and compute the other variables, output, investment, and the real wage from the respective equations. RESULTS. Table 7.5 depicts the results produced with our Fortran program BM_VI.f90. The time series moments are from a simulation with T = 50, 000 observations after a burn-in period of 1, 000 quarters. 100 percent of the time series of the capital stock and almost 100 percent of the TFP series are within the grid chosen for the Euler equation residuals. The table provides two insights. The first one echoes the results known from the stochastic growth model. We can obtain a high degree of accuracy either from a very large number of grid points or from linear interpolation on a grid with a moderate number of points. Furthermore, cubic interpolation increases precision vis-à-vis linear interpolation by a factor of more than one order of magnitude. The second insight also confirms findings in previous chapters. For a reasonable degree of accuracy, the time series second moments are (in this case) virtually independent of the details of the algorithm that produced the solution.
404
7 Discrete State Space Value Function Iteration Table 7.5 VI Solution of the Benchmark Business Cycle Model
Interpolation Capital grid
none n = 50, 000
linear n = 500
cubic n = 500
Second Moments Variable Output Consumption Investment Hours Real Wage
sx
rx y
rx
sx
rx y
rx
sx
rx y
rx
1.39 0.46 4.14 0.82 0.57
1.00 0.99 1.00 1.00 0.99
0.64 0.66 0.64 0.64 0.65
1.39 0.46 4.14 0.82 0.57
1.00 0.99 1.00 1.00 0.99
0.45 0.66 0.64 0.64 0.65
1.39 0.46 4.14 0.82 0.57
1.00 0.99 1.00 1.00 0.99
0.64 0.66 0.64 0.64 0.65
Euler Equation Residuals 2.32E − 4
5.95E − 4
6.76E − 6
Notes:Second moments computed from HP-filtered simulated time series with 50,000 included observations and a burn-in period of 1,000 observations. s x :=standard deviation of variable x, r x y :=cross correlation of variable x with output, r x :=first order autocorrelation of variable x. Euler equations residuals: maximum absolute value of 200 × 100 residuals computed over the Cartesian product of [0.9k, 1.1k] × 3.5[−σz , σz ]. Gauss-Hermite integration with 7 nodes
Problems
405
Problem 7.1: Stochastic Growth Model with Non-Negative Investment In Section 7.4.1, we consider the stochastic growth model with non-negative investment. 1) Use the Karush-Kuhn-Tucker theorem 1.2.1 and the procedure outlined in Section 1.3.3 to derive the Euler equation of this model from the Bellman equation (7.15). 2) Devise a procedure to compute the residuals of this equation. R 3) Modify the function SGM_VI_3 in the MATLAB program SGM_VI.m (or the respective procedures in the programs SGM_VI.g or SGM_VI.f90) so that it can handle the non-negativity constraint on investment in the case of interpolation between grid-points.
Problem 7.2: Stochastic Growth In the benchmark model of Example 1.6.1, labor-augmenting technical progress grows deterministically. Suppose instead the following production function Yt = K tα (A t L t )1−α , where the log of labor-augmenting technical progress A t follows the process A t = aeεat A t−1 , εat iid N (0, σ2a ). The household’s period utility function is parameterized as 1−η θ (1−η) C t (1−L t ) if η 6= 1, 1−η u(C t , 1 − L t ) := ln C + θ ln(1 − L ) if η = 1. t t
Use a = 1.0032, σa = 0.0116, and the parameter values given in Table 1.1 to calibrate the model. 1) Show that life-time utility, i.e., the infinite sum U t := E t
∞ X s=0
β s u(C t+s , 1 − L t+s )
can be recursively defined as U t = u(C t , 1 − L t ) + βE t U t+1 .
Hint: Expand the sum (P.7.2.1), note that U t+1 =u(C t+1 , 1 − L t+1 ) ¦ © + E t+1 βu(C t+2 , 1 − L t+2 ) + β 2 u(C t+3 , 1 − L t+3 ) + . . . ,
and use the law of iterated expectation E t E t+1 U t+1 = E t U t+1 .
(P.7.2.1)
406
7 Discrete State Space Value Function Iteration η−1
2) Accordingly, instead of U t , we can consider scaled utility u t := U t A t−1 : 1−η
ut =
ct
(1 − L t )θ (1−η) − 1 1−η
1−η
+ β at
E t u t+1 .
In the scaled variables a t := A t /A t−1 , c t = C t /A t−1 , k t := K t /A t−1 , and y t := Yt /A t−1 , the planner’s problem can be stated as the following dynamic programming problem: v(k t , εat ) = max
1−η
ct
(1 − L t )θ (1−η) − 1 1−η
c t ,L t
subject to,
1−η
+ β at
E t u(k t+1 , εat+1 ) ,
a t k t+1 = kαt (a t L t )1−α + (1 − δ)k t − c t , k t given.
Solve this problem via VI on a discrete grid of n points for the scaled capital stock k t and a Markov chain with m elements for the random variable εat . Employ Algorithm 16.4.2 for this purpose. 3) Use a random number generator and bilinear interpolation and simulate the model. Compute second moments from first differences of the logged variables (why?) and compare these statistics to their empirical analogs obtained from German data in the table below. Variable Output Consumption Investment Hours Real Wage
sx
rx y
rx
0.75 0.76 1.99 0.97 1.01
1.00 0.56 0.68 0.59 −0.14
0.24 0.04 0.25 −0.26 −0.23
Notes: Second moments from first differences of logged German data, 70.i to 89.iv. s x := standard deviation of variable x, s x y :=cross correlation of x with output y, r x :=first order autocorrelation.
Problem 7.3: Wealth Allocation Erosa and Ventura (2004) analyze the money demand of households in a heterogeneous-agent economy. To compute the optimal household decision, they cannot rely upon perturbation methods because households differ with regard to their individual asset holdings. Instead, they use value function iteration. To facilitate the computation, they apply a nice trick that may become handy whenever you consider household optimization problems where the households hold different kinds of assets. In the present problem, households can choose to allocate
Problems
407
their wealth ω on real money m and capital k. In the following, we compute the steady state for a simple representative-agent economy. The household supplies one unit of labor inelastically. The individual consumes a continuum of commodities indexed by i ∈ [0, 1]. The representative household maximizes intertemporal utility ∞ X
β t u(c),
t=0
u(c) =
c 1−η , 1−η
where c denotes a consumption aggregator c = infi c(i). As a consequence, the household consumes the same amount of all goods i. Following Dotsey and Ireland (1996), the households chooses whether to buy the goods with cash or credit. Let s ≥ 0 denote the fraction of goods that are purchased with credit. The cash goods are purchased with the help of real money balances giving rise to the cash-in-advance constraint: c(1 − s) = m.
To purchase the good i by credit, the household must purchase wγ(i) of financial services: θ i γ(i) = γ0 , 1−i
with θ > 0. w denotes the wage rate, and the financial services are provided by competitive financial intermediaries who only use labor L f as an input. Clearly, some goods will be purchased with cash as the credit costs go to infinity for i → 1. Therefore, real money balances m will not be zero. Likewise, credit costs go to zero for i → 0 and some goods will be purchased with credit as long as nominal interest rates are above zero, which will be the case in our economy. Therefore, we have an interior solution with 0 < s < 1. Let π denote the exogenous inflation rate that characterizes monetary policy. The decision problem can be formulated by the following Bellman equation v(k, m) = max u(c) + β v(k0 , m0 ) 0 0 c,s,m ,k
subject to the cash-in-advance constraint and the budget constraint Zs c+w
0
γ(c, i) d i + k0 + m0 (1 + π) = (1 + r)k + w + m,
where r denotes the interest rate. Production uses capital K and labor L y : Y = K α (L y )1−α . Capital depreciates at rate δ. In a factor market equilibrium,
408
7 Discrete State Space Value Function Iteration w = (1 − α)K α (L y )−α ,
r = αK α−1 (L y )1−α − δ.
In general equilibrium, the government spends the seigniorage πM on government consumption G. The equilibrium conditions are given by G = πM , M = m, C = c, K = k, 1 = Ly + Lf , Y = G + δK + C. Periods correspond to quarters. The model parameters are set as follows: β = 0.99, η = 2.0, δ = 0.02, α = 0.36, π = 0.01, γ0 = 0.0421, θ = 0.3232. The algorithm consists of the following steps: Step 1: Choose initial values for K, M , and L y and compute w and r. Step 2: Solve the household decision problem. Step 3: Compute the steady state defined by k0 = k0 (k, m) = k and m0 = m0 (k, m) = m. Step 4: Return to step 1 if k 6= K and m 6= M .
Compute the steady state of the model as follows:
1) Use value function iteration over the state space (k, m).13 Provide a good initial guess for K and M (Hint: 1) assume that r = 1/β and L y ≈ 1 implying a value for K from the first-order condition of the firm. 2) Assume that c = Y − δK and that households finance about 82% of consumption with M1, which is the approximate number for the US.) 2) Use the following two-step procedure in order to solve the household optimization problem (as suggested by Erosa and Ventura (2004) in their Appendix A): a. Assume that the household allocates his wealth ω ≡ k + (1 + π)m on capital and money according to the optimal portfolio functions m = g m (ω) and k = g k (ω). As an initialization of these functions in the first iteration over K and M , use a linear function that represents the weights of K and M in total wealth K + (1 + π)M . Solve the following Bellman equation in the first stage: 13
In order to compute the optimum, you need to know Leibniz’s rule: Z
Z
b(x) a(x)
f (t, x) d t = f (b(x), x)b0 (x) − f (a(x), x)a0 (x) +
b(x) a(x)
∂ f (t, x) d t. ∂x
Problems
409 v(ω) = max0 u(c) + β v(ω0 ) c,s,ω
subject to the cash-in-advance constraint c(1 − s) = g m (ω) and the budget constraint: Zs c+w
0
γ(c, i) di + ω0 = (1 + r)g k (ω) + w + g m (ω).
This provides the policy function ω0 = gω (ω). b. In the second stage, solve the optimization problem: § ª (g k (ω), g m (ω)) = arg max max u(c) k,m
c,s
subject to (1 + π)m + k = ω c(1 − s) = m, Zs c+w
0
γ(c, i) di + ω0 = (1 + r)k + w + m,
where ω0 = gω (ω). Iterate until convergence and compare the policy functions, Euler equation residuals, and the computational time of the two procedures. 3) How does an increase of the quarterly inflation rate from 1% to 2% affect the equilibrium allocation?
Part II
Heterogenous Agent Models
Chapter 8
Computation of Stationary Distributions
8.1 Introduction This chapter introduces you to the modeling and computation of heterogeneous-agent economies. In this kind of problem, we have to compute the distribution of the individual state variable(s). While we focus on the computation of the stationary equilibrium in this chapter, you will learn how to compute the dynamics of such an economy in the next chapter. The representative agent framework embedded in the standard neoclassical growth model has become the standard tool for modern macroeconomics. It is based on the intertemporal calculus of the household that maximizes lifetime utility. Furthermore, the household behaves rationally. As a consequence, it is a natural framework for the welfare analysis of policy actions. However, it has also been subject to the criticism whether the results for the economy with a representative household carry over to one with heterogenous agents. In the real economy, agents are different with regard to many characteristics including their abilities, their education, their age, their marital status, their number of children, their wealth holdings, to name but a few. As a consequence it is difficult to define a representative agent. Simple aggregation may sometimes not be possible or lead to wrong implications. For example, if the savings of the households are a convex function of income and, therefore, the savings rate increases with higher income, the definition of the representative household as the one with the average income or median income may result in a consideration of an aggregate savings rate that is too low.1 In addition, we are 1
To see this argument, notice that the rich (poor) households with a high (low) savings rate contribute much more (less) to aggregate savings than the household with average income.
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2024 B. Heer and A. Maußner, Dynamic General Equilibrium Modeling, Springer Texts in Business and Economics, https://doi.org/10.1007/978-3-031-51681-8_8
413
414
8 Computation of Stationary Distributions
unable to study many important policy and welfare questions that analyze the redistribution of income among agents like, for example, through the reform of the social security and pensions system or by the choice of a flat versus a progressive schedule of the income tax. In the remaining part of the book, agents are no longer homogeneous and cannot be represented by a single agent. We start by the consideration of a very simple heterogeneous agent model where the distribution of the individual state variable (consisting of the individual wealth level) has no implication on the dynamics of the aggregate state variable (consisting of the aggregate capital stock) in the model. Using the assumption of a special form of preferences — Gorman preferences — we present a simple model in Section 8.2 that allows for an easy aggregation and computation. In this special case, the Gorman Aggregation Theorem holds. As a consequence, we are able to study the dynamics of the aggregate economy with the help of a representative agent. We argue that the assumptions of this theorem are not met empirically and consider the more relevant case where the distribution of wealth and income affects the dynamics of the aggregate economy. In Section 8.3, we, hence, consider an economy where easy aggregation is not possible. For obvious reasons, we will not start to introduce the diversity of agents along its multiple dimensions at once, but we will first confine ourselves to the consideration of one source of heterogeneity. Therefore, we augment the standard Ramsey model by the real life feature that some agents are employed, while others are unemployed.2 For simplicity, we assume that the agent cannot influence his employment probability, e.g. by searching harder for a new job or asking for a lower wage. In addition, agents cannot insure against the idiosyncratic risk of being unemployed. Accordingly, agents in our economy differ with regard to their employment status and their employment history. Those agents who were lucky and have been employed for many years are able to save more and build up higher wealth than their unlucky contemporaries who have been unemployed for longer periods of time. As a consequence, agents also differ with regard to their wealth. Besides, all agents are equal. In Section 8.4, we will compute the stationary distribution of the individual state variables. In Sections 8.5 and 8.6, we present two prominent applications from macroeconomic theory. In Section 8.5, we analyse the puzzle of the low risk-free interest rate. The standard representative agent neoclassical growth model has difficulties to 2
Different from the model of Hansen (1985), we also assume that agents do not pool their income.
8.2 Easy Aggregation and Gorman Preferences
415
replicate the fact that the real return on (supposedly) safe assets like the US Treasury Bills has been about 1% over recent decades. In Section 8.6, we evaluate the distributional effects of a switch from an income tax to a consumption tax. In stochastic steady state, we find little changes on the distribution of labor income and wealth. However, aggregate savings increase significantly in case of consumption rather than income taxation. In addition, Section 8.6 provides you with a short survey of the modern literature on the theory of income distribution.
8.2 Easy Aggregation and Gorman Preferences In this section, we study a special case of a heterogeneous agent economy where the initial distribution (or the distribution at any time) does not have an impact on the dynamics of the aggregate state variables. As a consequence it is much easier to compute the dynamics of the economy and the stationary distribution. The model is a simplified and discretized version of the one considered by García-Peñaloa and Turnovsky (2011).
8.2.1 A Numerical Example Consider an economy with N individuals. The individual household is indexed by i with individual measure µi , i = 1, . . . , N . Total measure of agents is normalized to one: N X i=1
µi = 1.
(8.1)
The household supplies labor l ti and holds wealth k it in period t. He maximizes intertemporal utility 1−η ∞ X c ti (1 − l ti )γ t β (8.2) 1−η t=0 subject to the budget constraint
k it+1 = (1 − τ)w t l ti + [1 + (1 − τ)r t ]k it + t r t − c ti ,
(8.3)
where t r t denotes government transfers in period t. The period utility is the familiar constant relative risk aversion (CRRA) function. For a given
416
8 Computation of Stationary Distributions
weight on utility γ, the parameter η controls the household’s attitude toward risk (see Section 1.5.3). The first-order conditions of the household i are given by λit = (c ti )−η (1 − l ti )γ(1−η) ,
λit (1 − τ)w t λit
= =
(8.4a)
γ(c ti )1−η (1 − l ti )γ(1−η)−1 , βλit+1 [1 + (1 − τ)r t+1 ].
(8.4b) (8.4c)
Combining (8.4a) and (8.4b), we derive the optimal labor supply as follows: c ti
γ
1 − l ti
= (1 − τ)w t .
(8.5)
Substituting (8.4a) into (8.4c), we derive
i c t+1
η
c ti
i 1 − l t+1
−γ(1−η)
1 − l ti
= β[1 + (1 − τ)r t+1 ].
With the help of this equation, after inserting (8.5), we can show that the i,c i growth rate of individual consumption, g t := c t+1 /c ti − 1, is the same for all individuals i ∈ {1, 2, . . . , N } and only depends on aggregate variables:
i c t+1
η−γ+γη
= (1 +
c ti
g tc )η−γ+γη
w t+1 =β wt
−γ(1−η)
[1 + (1 − τ)r t+1 ].
(8.6)
Together with (8.5), this equation also implies that the growth rate of leisure is the same for all individuals i ∈ {1, 2, . . . , N }: i 1 − l t+1
1 − l ti
=
(1 + g tc ) 1 + g tw
,
(8.7)
where g tw denotes the growth rate of real wages, g tw := w t+1 /w t − 1. It is straightforward to show that the equal growth rate of individual leisure implies that individual labor supply does not grow at the same rate (except for the cases that leisure is constant or equal for all individuals). Production Yt is characterized by constant returns to scale using agregate capital K t and labor L t as inputs: Yt = K tα L 1−α . t
(8.8)
8.2 Easy Aggregation and Gorman Preferences
417
In a market equilibrium, factors are compensated according to their marginal products and profits are zero: rt = α
Lt Kt
1−α (8.9a)
− δ, α Kt w t = (1 − α) , Lt
(8.9b)
where δ denotes the depreciation rate of capital. Next, we derive the growth rate of aggregate capital K t : Kt =
N X i=1
µi k it .
(8.10)
From the individual budget constraint (8.3) and the first-order condition with respect to labor, (8.5), we derive k it+1 Kt
l ti
k it
i
1 − l t t rt 1 = (1−τ)w t +[1+(1−τ)r t ] − (1−τ)w t + , (8.11) Kt Kt γ Kt Kt
which implies K t+1 Lt 1 − L t t rt 1 = (1 − τ)w t + 1 + (1 − τ)r t − (1 − τ)w t + , (8.12) Kt Kt γ Kt Kt where we used Lt ≡
N X i=1
µi l ti , Tr t ≡
N X
µi t r t .
i=1
Accordingly, the growth rate of aggregate capital, g tK ≡ K t+1 /K t − 1, only depends on aggregate variables (K t , L t ) noticing that the factor prices (8.9) and transfers t r t = τ (w t L t + r t K t ) = τ K tα L 1−α − δK t (8.13) t are also independent of the distribution of individual variables. Substituting w t , r t , and t r t from (8.9) and (8.13) into (8.12), we derive
1 K t+1 = K tα L 1−α + (1 − δ)K t − (1 − τ)(1 − α)K tα L −α t t (1 − L t ). (8.14) γ Since aggregate consumption growth and, hence, aggregate leisure growth only depends on aggregate variables (the growth rate of wages),
418
8 Computation of Stationary Distributions
we can multiply (8.7) by µi and sum over all individuals i ∈ {1, 2, . . . , N } to derive N X i=1
i µi (1 − l t+1 )=
N X i=1
µi
(1 + g tc ) 1 + g tw
(1 − l ti ).
implying 1 − L t+1 =
1 + g tc
1 + g tw
(1 − L t )
1
η
1
η
= (β[1 + (1 − τ)r t+1 ]) η−γ+γη (1 + g tw )− η−γ+γη (1 − L t ),
or
L t+1 = 1 − (β[1 + (1 − τ)r t+1 ]) η−γ+γη (1 + g tw )− η−γ+γη (1 − L t ). (8.15)
Since
1 + g tw =
K t+1 Kt
α
L t+1 Lt
−α
and r t+1 are functions of (K t , L t , K t+1 , L t+1 ) only, (8.12) and (8.15) define a system of (implicit) difference equations in (K t , L t ). In conclusion, aggregate dynamics of (K t , L t ) do not depend on the distribution of individual assets k it and labor l ti . Let us first consider the aggregate dynamics for a numerical example. In particular, we consider the case that the aggregate capital stock in period t = 0 is only equal to 90% of the steady-state capital stock K. We choose a period length of one year and assume production parameters α = 0.36 and δ = 8%. The coefficient or relative risk aversion, η, is set to 2. In addition, we assume that the government levies an income tax rate τ equal to 20%. We calibrate the remaining two parameters γ and β so that the steady-state labor supply L and real interest rate r amount to 0.30 and 3%, respectively. Therefore, we use the condition that the aggregate growth rates of consumption and wages, g tc and g tw , are equal to zero in steady state implying (with the help of (8.6)) β=
1 = 0.9766. 1 + (1 − τ)r
From (8.9a), we get K K=
1 α 1−α = 1.913. r +δ
8.2 Easy Aggregation and Gorman Preferences
419
With the help of K and L we can compute steady-state production Y = K α L 1−α = 0.5844 and, hence, steady-state consumption C = Y − δK = 0.4134. From (8.5), we derive the steady-state condition γ
N X i=1
µc i = (1 − τ)w
N X i=1
(1 − l i )
and, therefore, γ = (1 − τ)w
1− L = 1.618. C
For the period 0, we choose an initial capital stock K0 = 0.9K = 1.722. We compute the model with the help of direct computation in the GAUSS program Gorman.g.3 We apply the method of reverse shooting (see Section 16.2.3). We first set the number of transition periods equal to 70. In period t = 71, we assume that both the capital stock K t and labor L t are equal to their steady state values K = 1.9128 and L = 0.30. Next, we perturbate the final steady state by a small amount, say by -0.01% so that K70 = 1.9126. We compute L70 with the help of equation (8.14) which uses the first-order condition of the household with respect to his labor supply. In the next step, we iterate backwards in time using the two difference equations (8.14) and (8.15).4 For given (K t+1 , L t+1 ), we can compute (K t , L t ) from these two equations using the Newton-Rhapson algorithm. In essence, we have to solve two nonlinear equations. We continue this iteration until t = 0 and compare the resulting capital stock in period 0, K0 , with the initial condition for its values (if it is equal to 1.722). If it is too low, we increase the initial guess for K70 , otherwise we decrease the guess. To find the solution for K70 , we set up the problem as a nonlinear equations problem in the variable K70 so that we, again, can use the Newton-Rhapson algorithm to find the correct solution. The nonlinear equation is given by the difference of the resulting value K0 and the initial value at the amount of 1.722. The nonlinear equation is a function of the 3
García-Peñaloa and Turnovsky (2011) use local approximation methods. In particular, they linearize the equilibrium conditions around the steady state as you learned in Chapter 1. You are asked to compute the solution of the dynamics with this method in Problem 8.1. 4 Can you figure out why we do not use forward shooting to solve for the dynamics, i.e. supply a value of K1 and iterate over the difference equations system forward in time, i.e. find the solution (K t+1 , L t+1 ) using (K t , L t )?
420
8 Computation of Stationary Distributions
guess for K70 in the final period of the transition. The computation in the program Gorman.g takes less than a second. We have to be careful when we specify the initial value for K70 . If we had chosen a larger perturbation, e.g. by -1.0%, the iteration over the difference equations system would have stopped because K t would have become negative prior to the period t = 0. In addition, we have to chose a negative perturbation, i.e. K70 < K. Otherwise, the capital stock increases while we move backwards in time.
1.90
Kt
1.85 1.80 1.75 0
10
20
30
40
50
60
70
t Figure 8.1 Dynamics of Aggregate Capital Stock K t
The solution for the dynamics of the aggregate capital stock K t are illustrated in Fig. 8.1. Notice that the capital stock K t is equal to 1.722 in the initial period and approaches the steady state capital stock K = 1.913 smoothly from below. From this informal inspection, we can conclude that the number of transition periods is sufficiently high. If this were not the case and we observed a jump of K t in period t = 70, we would have to increase the number of transition periods. In the following, we study two cases for the final distribution of the individual capital stock: 1) equal distribution and 2) heterogeneous distribution. The aggregate dynamics are unaffected by the two different distributions as we have shown above. In the case of the equal distribution, the dynamics of the individual capital stocks k it , of course, are exactly equal to the one of the aggregate capital stock K t presented in Fig. 8.1. For the heterogeneous distribution, we will consider a parsimonious case. We just distinguish three types of households: poor, median, and rich workers with final wealth k i and measures µi , i = 1, 2, 3. Three different types are enough to capture the essence of the empirical US wealth distribution with regard to 1) number of people with essentially no wealth, 2) inequality, and 3) skewness of the distribution. In particular,
8.2 Easy Aggregation and Gorman Preferences
421
we fix the parameters of the final distribution to replicate the following facts from the US wealth distribution estimated by Díaz-Giménez et al. (1997) with the help of the 1992 Survey of Consumer Finances: 1) The two poorest quintiles of the distribution combined hold 1.35% of total wealth, 2) the Gini of the wealth distribution amounts to 0.78, and 3) average wealth relative to median wealth amounts to 3.61. We therefore fix the measure of the households who hold the lowest wealth equal to µ1 = 40% with k1 /K = 0.0135. Assuming that the second type i = 2 is also the median in the wealth distribution, i.e. µ2 > 10%, we can fix the remaining four free parameters and variables (µ2 , µ3 , k2 , k3 ) with the help of the following conditions: 1. Average wealth is equal to aggregate wealth: K=
3 X
µi k i .
i=1
2. Total measure of all households is equal to one: 3 X i=1
µi = 1.
3. The Gini coefficient is equal to 0.78: P3 i i=1 µ (Si−1 + Si ) Gini = 1 − = 0.78, S3
(8.16)
where the cumulative wealth shares of the bottom i types of the population with respect to wealth, Si , are defined by Si =
i X
µjkj
j=1
with S0 = 0 and k j−1 < k j . 4. The skewness of the wealth distribution is equal to the empirical one: K/k2 = 3.61. We find the following distribution for the final capital stock and measures, (k1 , k2 , k3 ) = (0.0258, 0.530, 12.288) and (µ1 , µ2 , µ3 ) = (0.400, 0.465, 0.135).5 5
The computation of the dynamics is greatly simplified by the fact that we use the calibration for the distribution in the final steady state. If we had used the calibration for
422
8 Computation of Stationary Distributions
To compute the dynamics for the three types i = 1, 2, 3, we first need to compute the labor supply l i in the final steady state. For this reason, we consider (8.11) using k it+1 = k it = k i to derive i
l =
1 γ (1 − τ)w −
t r − (1 − τ)r k i , 1 + γ1 (1 − τ)w
(8.17)
where the aggregate variables w, r, and t r are simply the final steady-state values of the wage rate, the interest rate, and government transfers. We find numerical values (l 1 , l 2 , l 3 ) = (0.328, 0.321, 0.146). The wealth-rich households supply substantially less labor than the wealth-poor households due to the income effect. The implications of our model for the individual labor supplies are difficult to reconcile with the empirical fact that highproductivity agents work more compared with the low productivity agents. For the US economy, Kydland (1984) and Ríos-Rull (1993) document that individual hours worked are increasing in skills. If the high-skilled are those agents in the economy who build up higher savings, they should supply more and not less labor than those with low skills and little wealth.6 In addition, we find that the Gini coefficient of earnings, wl i , market income, wl i + r k i , and total income, wl i + r k i + t r, are far more equally distributed than empirically observed. For these three earnings and income measures, we find Gini coefficients equal to 0.04, 0.07, and 0.03 in our model, while the empirical values of earnings and (total) income amounted to 0.61 and 0.55 for the US economy in 1998 according to Budría Rodríguez et al. (2002). To compute the dynamics of the individual variables {(k it , l ti )}70 t=0 , we i 70 first consider the labor supply {l t } t=0 . We start in the final steady state, 1 where, for example, the labor supply of type 1, l71 , is equal to 0.328 so 1 that leisure amounts to 1 − l = 0.672. With the help of the time series for aggregate labor L t , we can compute the growth factor of aggregate leisure, 1−L t+1 1−L t , which we have demonstrated to be equal to the growth factor of individual leisure. Therefore, we simply compute individual leisure with the help of the distribution of the capital stock in the initial period t = 0, we would have not been able to directly compute the distribution of labor l i among the three types in the final steady state. You are asked to solve the problem with the initial distribution given instead in Problem 8.2. 6 The same observation holds in a more elaborate model of Maliar and Maliar (2001) who introduce skill heterogeneity into a heterogeneous agent economy. See also below.
8.2 Easy Aggregation and Gorman Preferences i 1 − l t+1
1 − l ti
=
423
1 − L t+1 . 1 − Lt
i For given l t+1 starting in period t + 1 = 71, we can recover l ti from this equation. We iterate until we have found the complete series {l ti }70 t=0 for i = 1, 2, 3. Next, we compute {k it }70 with the help of the individual’s t=0 budget constraint (8.11):
1 k it+1 = (1 − τ)w t l ti + [1 + (1 − τ)r t ]k it − (1 − τ)w t (1 − l ti ) + t r t . γ
Gini Coefficient
i Starting in period t = 71 with k71 = k i , we can solve this equation for k it . We iterate backwards in time to recover the whole time series {k it }70 t=0 .
0.801 0.798 0.795 0.792 0.789 0.786 0.783 0.780 0
10
20
30
40
50
60
70
t Figure 8.2 Dynamics of the Gini Coefficient of Wealth
Fig. 8.2 displays the dynamics of the Gini coefficient for our distribution 1 2 3 {(k1t , k2t , k3t )}71 t=0 with constant measures (µ , µ , µ ) over time. We find that the inequality of the wealth distribution declines over time, from 0.801 at time t = 0 to 0.780 in the final steady state. The reason is that wealthier agents choose to enjoy higher leisure and, therefore, build up less capital (in relative terms).7 Fig. 8.3 presents the dynamics of the inequality in market income in our model. The Gini coefficient of market income also falls, from 0.06 to 0.04. Therefore, we find that during the process of transition to the steady state, both income and wealth inequality decreases ceteris paribus. However, we notice that our simple model is not sufficient to be an adequate description of empirical inequality. We leave out one basic mechanism of 7
García-Peñaloa and Turnovsky (2011) derive this result formally for K0 < K.
424
8 Computation of Stationary Distributions
Gini Coefficient
0.060 0.055 0.050 0.045 0.040 0
10
20
30
40
50
60
70
t Figure 8.3 Dynamics of the Gini Coefficient of Market Income
inequality which is differences in the wage rate. In our economy, all agents receive the same wage rate, while empirically, individual productivities are heterogeneous and wages are unequally distributed to a significant degree. For example, Heathcote et al. (2010) show that the Gini coefficient of wages has increased from 0.28 to 0.40 between 1970 and 2005, while in our model the Gini coefficient of wages is simply zero. Could we have been able to integrate wage heterogeneity into our model without complicating the aggregation? Or are we bound to sacrifice the ease of computation once we allow for these more realistic description of empirical distributional facts? For this reason, reconsider our derivation of the aggregation results. In particular, we summed up the individual budget constraints (8.11) to derive the dynamics of the aggregate capital stock and argued that it does not depend on the distribution of either individual capital stock or labor, k it and l ti . For the sake of argument, let us assume that the household types i = 1, 2, 3, differ with respect to their individual labor productivity so that they receive the wage rate wit = a i w t , with a1 < a2 < a3 (after all, this introduces a reason why agents of type 3 build up higher wealth than the poorer households because they are simply more productive and earn higher wages).8 We also normalize average productivity to one implying 8
Of course, this interpretation is far too simplifying. If wealth inequality were only explained by heterogeneity in inidividual productivity and, hence, wages, we would expect a very high correlation of earnings and wealth. However, Budría Rodríguez et al. (2002) only find a correlation coefficient equal to 0.47 for these two variables in the US in 1998. Other important factors that help to explain the higher inequality of wealth relative to earnings are the life-cycle (some retirees have high wealth but little earnings) and bequests. We will consider life-cycle aspects for the explanation of the wealth inequality in Chapter 10.
8.2 Easy Aggregation and Gorman Preferences 3 X i=1
425
µi a i = 1.
In this case, (8.5) changes to: γ
c ti
= (1 − τ)a i w t ,
1 − l ti
while (8.11) is presented by k it+1 Kt
= (1 − τ)a i w t
l ti Kt
+ [1 + (1 − τ)r t ]
k it
i
1 − l t t rt 1 − (1 − τ)a i w t + . Kt γ Kt Kt
Multiplying the latter equation by µi and summing it over all i also implies (8.12). In this case, however, L t is defined as the average efficient labor: Lt ≡
3 X i=1
µi a i l ti .
Similarly, it is easy to show that the growth rate of individual leisure is the same for all agents as captured by (8.7). In the next step, we sum up the individual leisures after multiplying both sides of the equation with the factor a i µi resulting in: N X i=1
i
aµ
i
i (1 − l t+1 )
=
N X
i
aµ
i=1
i
(1 + g tc ) 1+
g tw
(1 − l ti ).
Again, this equation implies that the dynamics of effective labor is unaffected by the distribution of the individual variables and that the dynamics are represented by (8.15). In conclusion, our aggregation results also holds in the case of heterogeneous individual productivity. We will discuss this finding in more detail in the following.
8.2.2 Gorman Preferences As a building stone of the neoclassical growth model, we look at the behavior of a representative household and a representative firm. A fundamental question, therefore, arises: When can we neglect the heterogeneity of households and firms? In the following, we will focus on the households
426
8 Computation of Stationary Distributions
where the theoretical foundation has been laid by Gorman (1953).9 In particular, Gorman has shown that, if preferences are homothetic like in (8.2) (implying that indirect utility can be written as a linear function of wealth) and households differ in initial wealth levels only, aggregate preferences do not depend on the distribution of individual wealth.10 We start by looking at the neoclassical growth model with exogenous labor supply and logarithmic utility function.11 The production sector is specified in the same way as in the previous section so that we only consider the household sector. Assume that we have N households indexed by i (each of measure 1/N ). The initial distribution of assets at time 0, a0i , is exogenously given. The household i receives exogenous (labor) income y ti and interest income r t a it on his assets in period t. He consumes the amount c ti in period t so that his budget constraint is presented by a it+1 = y ti + (1 + r t )a it − c ti .
(8.18)
For our analysis, it will be helpful to consider the intertemporal budget constraint of the household. Therefore, substitute (8.18) for period t − 1 into (8.18) and continue in this fashion for t − 2, t − 3, . . ., 0 to derive ! ! t t t Y X Y a it+1 = (1 + r j ) a0i + (1 + r j ) yτi − cτi , j=0
or, after division by the factor Q
a it+1 t j=0 (1 + r j )
j=τ+1
τ=0
=
a0i
+
Q
t X τ=0
t j=0 (1 + r j )
Q |
1 τ j=0 (1 + r j )
{z
≡pτ
yτi − cτi . }
p t denotes the price of the good in period t in units of the good at time 0. Therefore, we can rewrite this equation more compactly as 9
The case that the economy admits the standard assumption of a representative firm does require far less stringent assumptions, e.g. competitive markets and the absence of production externalities. The assumption of a representative firm in the neoclassical growth model is considered in many introductory books on microeconomics such as Mas-Colell et al. (1995), Section x, or economic growth such as Acemo˘ glu (2009), Section 5.4. 10 For a formal statement of the so-called Gorman Aggregation Theorem and the assumption of a representative household see, for example, Section 5.2 in Acemo˘ glu (2009). 11 Our argument follows Chatterjee (1994). In addition, we integrate exogenous (labor) income into our model.
8.2 Easy Aggregation and Gorman Preferences
p t a it+1 = a0i +
t X τ=0
427
pτ yτi − cτi .
Taking the limit t → ∞ and imposing the transversality condition lim p t a it+1 = 0,
t→∞
we derive the intertemporal budget constraint a0i
+
∞ X
pτ yτi
τ=0
=
∞ X
pτ cτi .
(8.19)
τ=0
The household i maximizes intertemporal utility U0i =
∞ X t=0
β t u(c ti )
(8.20)
subject to the intertemporal budget constraint (8.19). We will show in the following that the argument laid out by Gorman (1953) holds for the logarithmic utility function u(c ti ) = ln c ti .
(8.21)
Let us formulate the Lagrange function of the household i in period 0 ∞ ∞ ∞ X X X i t i i i i i L = β ln c t + λ a0 + pτ y τ − pτ cτ . (8.22) t=0
τ=0
τ=0
The first-order condition with respect to consumption in period t is presented by βt c ti
= λi p t .
(8.23)
Substituting this condition in the life-time budget constraint (8.19), we can derive the Lagrange multiplier of the household i as follows λi =
1 1 · , P i 1 − β a0i + ∞ p y τ τ=0 τ
(8.24)
which implies
∞ t X β (1 − β) c ti = a0i + pτ yτi . pt τ=0
(8.25)
428
8 Computation of Stationary Distributions
Accordingly, individual consumption is linear in individual wealth (which is the sum of individual physical and human capital). Therefore, aggregation implies that aggregate consumption is proportional to aggregate wealth (both physical and human): N ∞ X 1 X i β t (1 − β) Ct = c = A0 + pτ y τ , N i=1 t pt τ=0
(8.26)
with y t :=
N N 1X i 1X i y t , A0 := a . N i=1 N i=1 0
(8.26) implies that, if the agents differ with respect to their initial endowments of wealth, the distribution does not affect aggregate consumption, and hence, savings (wealth). This result captures the essence of the argument in Gorman (1953). Moreover, one can easily show that the CRRA utility function12 u(c) =
c 1−η 1−η
also fulfills the assumptions of the Gorman’s Aggregation Theorem and the aggregate behavior can be represented as if resulted from the maximization of a representative household with average wealth and exogenous income. Often, the preferences that fulfill the assumptions of the Gorman aggregation result are also referred to as Gorman preferences. The results for the neoclassical growth model with exogenous (labor) income and exogenous distribution of the initial endowment with assets can be extended to more general cases. Caselli and Ventura (2000) show that the Gorman Aggregation Theorem also holds in a version of the neoclassical growth model where agents also differ in their endowments of efficiency units of labor. We have already provided an informal argument in our numerical example from the previous Section 8.2.1. In Maliar and Maliar (2001), agents differ along two dimensions, initial wealth endowments and non-acquired skills. They prove that the neoclassical growth model with heterogeneous endowments of wealth and time-invariant heterogeneous endowments of efficiency units of labor admits a single-agent representation in the case of Cobb-Douglas utility. 12
See, for example, Chatterjee (1994).
8.3 A Simple Heterogeneous Agent Model with Aggregate Certainty
429
They find that demand for physical hours worked is not linear in wealth any longer, but demand for efficiency hours worked is. Consequently, the Gorman aggregation result can be shown similar to our argument above building on the aggregation of the supply of efficiency units. In particular, the preferences of the representative agent depend on average efficiency units of labor instead on working hours.13 Maliar and Maliar (2003) also introduce individual uncertainty in the heterogeneous agent neoclassical growth model. If idiosyncratic efficiency units of labor are time-varying but perfectly insurable (i.e., markets are still complete) then the strong aggregation result (the dynamics of aggregate variables do not depend on the distribution) fails, but one can obtain a weaker aggregation result. If agents preferences are characterized by the CRRA or the addilog utility function, the heterogeneous agent economy behaves like a representative agent economy with three kinds of shocks, to preferences, to technology and to labor. In sum, if labor productivity is subject to exogenous shocks and if labor income is not insurable, simple Gorman aggregation does not apply and we need to study the dynamics of the wealth and income distribution in order to determine the dynamics of the aggregate variables capital and labor. We consider this case to be the more realistic one and will concentrate on this in the remaining part of the book. A simple example of an economy where the Gorman aggregation theorem does not hold is provided in the next section.
8.3 A Simple Heterogeneous Agent Model with Aggregate Certainty In Chapter 1, we presented the deterministic infinite horizon Ramsey problem and showed that the equilibrium of this economy is equivalent to the one of a decentralized economy and that the fundamental theorems of welfare economics hold. In this section, we consider heterogeneity at the household level, but keep the simplifying assumption that all firms are equal and, hence, can act as a representative firm. As a consequence, 13
Maliar and Maliar (2001) also demonstrate that their heterogeneous agent economy, in contrast to the standard RBC model, is able to replicate the low empirical correlation between average productivity and hours worked, the so-called Dunlop-Tarshis observation as documented by Christiano and Eichenbaum (1992).
430
8 Computation of Stationary Distributions
we most conveniently formulate our model in terms of a decentralized economy and study the behavior of the households and the firm separately. As a second important characteristic of our model, we only consider idiosyncratic risk. In our economy, households can become unemployed and cannot insure themselves against this risk. However, there is no aggregate uncertainty. For example, the technology is deterministic. As you will find out, the economy will display a long-run behavior that is easily amenable to computational analysis. In the stationary equilibrium of the economy, the distribution of the state variable, the aggregate wage and the aggregate interest rate are all constant, while the employment status and the wealth level of the individual households vary.14 In our simple model, three sectors can be distinguished: households, production, and the government. Households maximize their intertemporal utility subject to their budget constraint. In order to insure against the risk of unemployment, they build up precautionary savings during good times. Firms maximize profits. The government pays unemployment compensation to the unemployed agents that is financed by an income tax. We will describe the behavior of the three sectors in turn. HOUSEHOLDS. The economy consists of many infinitely lived individuals. In particular, we consider a continuum of agents of total measure normalized to one.15 Each household consists of one agent and we will speak of households and agents interchangeably. Households differ only with regard to their employment status and their asset holdings. Households maximize their intertemporal utility ∞ X s Et β u (c t+s ) , (8.27) s=0
where β < 1 is the subjective discount factor and expectations are conditioned on the information set at time t. At time zero, the agent knows his beginning-of-period wealth a0 and his employment status ε0 ∈ {e, u}. If ε = e (ε = u), the agent is employed (unemployed). The agent’s instantaneous utility function is twice continuously differentiable, increasing and concave in his consumption c t and has the following CRRA form: 14
Aggregate uncertainty will be introduced into the heterogeneous agent extension of the Ramsey model in Chapter 9. 15 This amounts to assume that the number of individual households is infinite and, if we index the household with i ∈ [0, 1], the probability that i ∈ [i0 , i1 ] is simply i1 − i0 .
8.3 A Simple Heterogeneous Agent Model with Aggregate Certainty
431
1−η
u(c t ) =
ct
1−η
, η > 0,
(8.28)
where η, again, denotes the coefficient of relative risk aversion (which is equal to the inverse of the intertemporal elasticity of substitution, 1/η). In the following, lowercase letters denote individual variables and uppercase letters denote aggregate variables. For example, c t is individual consumption, while C t is aggregate consumption of the economy in period t. We, however, keep the notation that real prices are denoted by lower case letters, while nominal prices are denoted by upper case letters. Agents are endowed with one indivisible unit of time in each period. If the agent is employed (ε t = e) in period t, he earns gross wage w t . If the agent is unemployed (ε t = u) in period t, he receives unemployment compensation b t . We will assume that (1 − τ)w t > b t , where τ denotes the income tax rate. The individual-specific employment state is assumed to follow a first-order Markov chain. The conditional transition matrix is given by: puu pue 0 0 π(ε |ε) = Prob ε t+1 = ε |ε t = ε = , (8.29) peu pee where, for example, Prob {ε t+1 = e|ε t = u} = pue is the probability that an agent will be employed in period t + 1 given that the agent is unemployed in period t. Households know the law of motion of the employment status εt . In our economy, unemployment is exogenous. We have not modeled any frictions which might be able to explain this feature. In this regard, we follow Hansen and ˙Imrohoro˘ glu (1992) in order to simplify the exposition and the computation. Of course, it would be straightforward to introduce endogenous unemployment into this model. For example, various authors have used search frictions in the labor market in order to explain unemployment with the help of either endogenous search effort as in Costain (1997) or Heer (2003a) or endogenous separation from the firms as in Den Haan et al. (2000). In addition, we assume that there are no private insurance markets against unemployment and unemployed agents only receive unemployment compensation from the government.16 16
One possible reason why there are no private insurance markets against the risk of unemployment is moral hazard. Agents may be reluctant to accept a job if they may receive generous unemployment compensation instead. Chiu and Karni (1998) show that the presence of private information about the individual’s work effort helps to explain the failure of the private sector to provide unemployment insurance.
432
8 Computation of Stationary Distributions
The household faces the following budget constraint (1 + (1 − τ)r t ) a t + (1 − τ)w t − c t if ε t = e, a t+1 = (1 + (1 − τ)r t ) a t + b t − c t if ε t = u,
(8.30)
where r t denotes the interest rate in period t. Interest income and wage income are taxed at rate τ.17 Each agent smoothes his consumption {c t }∞ t=0 by holding the asset a t . An agent accumulates wealth in good times (ε t = e) and runs it down in bad times (ε t = u). As a consequence, agents are also heterogeneous with regard to their assets a t . We impose the asset constraint a t ≥ amin , so that households cannot run down their assets below amin ≤ 0. The first-order condition of the household that is not wealth-constrained can be solved by introducing the Lagrange multiplier λ and setting to zero the derivatives of the Lagrangian function L = Et
∞¦ X s=0
β s [u(c t+s )+ λ t+s 1ε t+s =u b t+s + (1 + (1 − τ)r t+s )a t+s +1ε t+s =e (1 − τ)w t+s − a t+s+1 − c t+s
©
with respect to c t and a t+1 . 1ε t =e (1ε t =u ) denotes an indicator function that takes the value one if the agent is employed (unemployed) in period t and zero otherwise. The first-order condition for the employed and unemployed agent in period t is u0 (c t ) = E t u0 (c t+1 )(1 + (1 − τ)r t+1 ) . β
(8.31)
The solution of this Euler equation is given by the policy function c(ε t , a t ) that is a function of the employment status ε t and the asset holdings a t in period t.18 In particular, the policy function is independent of calendar time t. Together with (8.30), the policy function c(ε t , a t ) also gives next-period asset holdings a t+1 = a0 (ε t , a t ). 17
In most OECD countries, unemployment insurance is financed by a tax that is only imposed on wage income. In our model, we assume that a general income tax is used to finance unemployment insurance. 18 The policy function also depends on aggregate variables such as the factor prices and the policy parameters. We dropped these arguments for notational convenience.
8.3 A Simple Heterogeneous Agent Model with Aggregate Certainty
433
PRODUCTION. Firms are owned by the households and maximize profits with respect to their labor and capital demand. Production Yt is characterized by constant returns to scale using capital K t and labor L t as inputs: Yt = K tα L 1−α , α ∈ (0, 1), t
(8.32)
where α denotes the production elasticity of capital. In a market equilibrium, factors are compensated according to their marginal products and profits are zero: rt = α
Lt Kt
1−α
− δ, α Kt w t = (1 − α) , Lt
(8.33a) (8.33b)
where δ denotes the depreciation rate of capital. GOVERNMENT. Government expenditures consist of unemployment compensation B t which are financed by a tax on income. The government budget is assumed to balance in every period: B t = Tt ,
(8.34)
where Tt denotes government revenues. STATIONARY EQUILIBRIUM. First, we will analyze a stationary equilibrium. We may want to concentrate on the stationary equilibrium, for example, if we want to analyze the long-run effects of a permanent change in the government policy, e.g. a once-and-for-all change in the unemployment compensation b. In a stationary equilibrium, the aggregate variables and the factor prices are constant and we will drop the time indices if appropriate, e.g. for the aggregate capital stock K or the interest rate r and the wage w. Furthermore, the distribution of assets is constant for both the employed and unemployed agents, and the numbers of employed and unemployed agents are constant, too. The individual agents, of course, are not characterized by constant wealth and employment status over time. While we focus on a stationary distribution in this chapter, we will also analyze 1) the transition dynamics for a given initial distribution of the
434
8 Computation of Stationary Distributions
assets to the stationary distribution and 2) the movement of the wealth and income distribution over the business cycle in the next chapter. For the description of the stationary equilibrium, we need to describe the heterogeneity in our economy. In this book, we use a very pragmatic and simple way to define the stationary equilibrium. In particular, we only use basic concepts from probability theory and statistics which all readers should be familiar with, namely the concept of a distribution function.19 In the stationary equilibrium, the distribution of assets is constant and we will refer to it as either the stationary, invariant, or constant distribution. In our particular model, we are aiming to compute the two cumulative distribution functions of the assets for the employed and unemployed agents, F (e, a) and F (u, a), respectively. The corresponding density functions are denoted by f (e, a) and f (u, a). The individual state space consists of the sets (ε, a) ∈ X = {e, u} × [amin , ∞). The concept of a stationary equilibrium uses a recursive representation of the consumer’s problem. Let v(ε, a) be the value function of a household characterized by employment status ε and wealth a. v(ε, a) for the benchmark government policy is defined as the solution to the dynamic program: v(ε, a) = max u(c) + βE v(ε0 , a0 )|ε , (8.35) c
subject to the budget constraint (8.30), the government policy (b, τ), and the stochastic process of the employment status ε as given by (8.29).20 DEFINITION. A stationary equilibrium for a given government policy parameter (b, τ) is a value function v(ε, a), individual policy rules c(ε, a) and a0 (ε, a) for consumption and next-period capital, a time-invariant density of the state variable x = (ε, a) ∈ X , f (e, a) and f (u, a), time-invariant
19
A description of more general heterogeneous agent economies might necessitate the use of more advanced concepts from measure theory. Since the algorithms and solution methods developed in this chapter do not require a thorough understanding of measure theory and should already be comprehensible with some prior knowledge of basic statistics, we dispense with an introduction into measure and probability theory. For a more detailed description of the use of measure theory in recursive dynamic models please see Stokey et al. (1989). 20 The solution obtained by maximizing (8.27) s.t. (8.30) and (8.29) corresponds to the solution obtained by solving (8.35) s.t. (8.30) and (8.29) under certain conditions on the boundedness of the value function v(·) (see also Section 1.3.3). This correspondence has been called the ‘Principle of Optimality’ by Richard Bellman.
8.3 A Simple Heterogeneous Agent Model with Aggregate Certainty
435
relative prices of labor and capital (w, r), and a vector of aggregates K, N , C, T , and B such that: 1. Factor inputs, consumption, tax revenues, and unemployment compensation are obtained aggregating over households: K=
X Z
amin
ε∈{e,u}
Z L= C=
∞ amin
a f (ε, a) d a,
f (e, a) d a,
X Z ε∈{e,u}
∞
∞ amin
c(ε, a) f (ε, a) d a,
(8.36a)
(8.36b) (8.36c)
T = τ(wL + r K),
(8.36d)
B = (1 − L)b.
(8.36e)
2. c(ε, a) and a0 (ε, a) are optimal decision rules and solve the household decision problem described in (8.35). 3. Factor prices (8.33a) and (8.33b) are equal to the factors’ marginal productivities, respectively. 4. The goods market clears: K α L 1−α + (1 − δ)K = C + K 0 = C + K.
(8.37)
5. The government budget (8.34) is balanced: T = B. 6. The distribution of the individual state variable (ε, a) is stationary: X F (ε0 , a0 ) = π(ε0 |ε) F ε, a0−1 (ε, a0 ) (8.38) ε∈{e,u}
for all (ε0 , a0 ) ∈ X . Here, a0−1 (ε, a0 ) denotes the inverse of the function a0 (ε, a) with respect to its second argument a.21 Accordingly, the distribution over states (ε, a) ∈ X is unchanging. 21
In particular, we assume that a0 (ε, a) is invertible. As it turns out, a0 (ε, a) is invertible in our example economy in this chapter. In Section 8.4, we will also discuss the changes in the computation of the model that are necessary if a0 (ε, a) is not invertible. This will be the case if the non-negativity constraint on assets is binding.
436
8 Computation of Stationary Distributions
CALIBRATION. As we will often use the model as an example in subsequent sections, we will already assign numerical values to its parameters in this introductory part. Following ˙Imrohoro˘ glu (1989a), periods are set equal to six weeks (≈ 1/8 of a year). Preferences and production parameters are calibrated as commonly in the dynamic general equilibrium models. In particular, we pick the values α = 0.36 and η = 2.0. Our choice of β = 0.995 implies a real annual interest rate of approximately 4% before taxes. The employment probabilities are set such that the average duration of unemployment is 2 periods (=12 weeks) and average unemployment is 8%.22 The employment transition matrix is given by: puu pue 0.5000 0.5000 = . (8.39) peu pee 0.0435 0.9565 The non-capital income of the unemployed household b amounts to 1.199 and is set equal to one fourth of the steady-state gross wage rate in the corresponding representative agent model,23 where the gross interest rate is equal to the inverse of the discount factor β and, therefore, the capital stock amounts to K = (α/(1/β − 1 + δ))1/(1−α) L. In the literature, the ratio of unemployment compensation to net wage income is also called the replacement rate which will be approximately equal to 25.6% in our model. In addition, the income tax rate τ is determined endogenously in the computation with the help of the balanced budget rule. Finally, the annual depreciation rate is set equal to 4% implying a six-week depreciation rate of approximately 0.5%.
8.4 The Stationary Equilibrium of a Heterogeneous Agent Economy With only very few exceptions, dynamic heterogeneous agent general equilibrium models do not have any analytical solution or allow for the derivation of analytical results. Algorithms to solve heterogeneous agent models with an endogenous distribution have been introduced into the 22
Notice that unemployed agents stay unemployed with a probability of 0.5. As a consequence, the average duration of unemployment is simply 1/0.5=2 periods. In Section 16.4, you will learn how to compute the stationary unemployment rate from the employment transition matrix. 23 In such a model, the ‘representative’ household consists of (1 − L) unemployed workers and L employed workers.
8.4 The Stationary Equilibrium of a Heterogeneous Agent Economy
437
economic literature during the 1990ies. Notable studies in this area are Aiyagari (1994), Aiyagari (1995), Den Haan (1997), Huggett (1993), ˙Imrohoro˘ glu et al. (1995), Krusell and Smith (1998), or Ríos-Rull (1999). We will use Example 8.4.1 which summarizes the model of the previous section as an illustration for the computation of the stationary equilibrium of such an economy. Example 8.4.1 Consider the following stationary equilibrium: a) Households are allocated uniformly on the unit interval [0, 1] and are of measure one. The individual household maximizes 1−η c v(ε, a) = max + βE v(ε0 , a0 )|ε , c 1−η s.t.
§
0
a =
(1 + (1 − τ)r) a + (1 − τ)w − c if ε = e, (1 + (1 − τ)r) a + b − c if ε = u,
a ≥ amin ,
puu pue 0 π(ε |ε) = Prob ε t+1 = ε |ε t = ε = . peu pee 0
b) The distribution of (ε, a) is stationary and aggregate capital K, aggregate consumption C, and aggregate employment L are constant. c) Factors prices are equal to their respective marginal products: 1−α L r =α − δ, K α K w = (1 − α) . L d) The government budget balances: B = T . e) The aggregate consistency conditions hold: X Z∞ K=
amin
ε∈{e,u}
Z L=
C=
∞ amin
f (e, a) d a,
X Z ε∈{e,u}
a f (ε, a) d a,
∞ amin
c(ε, a) f (ε, a) d a,
438
8 Computation of Stationary Distributions T = τ(wL + r K), B = (1 − L)b.
The computation of the solution of Example 8.4.1 consists of two basic steps, the computation of the policy function and the computation of the invariant distribution. For this reason, we apply several elements of numerical analysis that we introduced in the first part of this book. In order to solve the individual’s optimization problem, we need to know the stationary factor prices and the tax rate. For a given triplet (K, L, τ), we can use the methods presented in Part I in order to compute the individual policy functions c(ε, a) and a0 (ε, a). The next step is the basic new element that you have not encountered in the computation of representative agent economies. We need to compute the distribution of the individual state variables, aggregate the individual state variables, and impose the aggregate consistency conditions. The complete solution algorithm for Example 8.4.1 is described by the following steps: Algorithm 8.4.1 (Computation of Example 8.4.1) Purpose: Computation of the stationary equilibrium. Steps: Step 1: Compute the stationary employment L. Step 2: Make initial guesses of the aggregate capital stock K and the tax rate τ. Step 3: Compute the wage rate w and the interest rate r. Step 4: Compute the household’s decision functions. Step 5: Compute the stationary distribution of assets for the employed and unemployed agents. Step 6: Compute the capital stock K and taxes T that solve the aggregate consistency conditions. Step 7: Compute the tax rate τ that solves the government budget. Step 8: Update K and τ and return to step 2 if necessary. In Step 1, we compute the stationary employment L. In our simple Example 8.4.1, employment L t does not depend on the endogenous variables w t , r t , or the distribution of assets a t in period t. L t only depends on the number of employed in the previous period L t−1 . Given employment L t−1 in period t − 1, we know that next-period employment is simply the sum of the lucky unemployed agents who find a job and the lucky employed agents that keep their job
L_t = p_ue (1 − L_{t−1}) + p_ee L_{t−1}.   (8.41)
In stationary equilibrium, L_t = L_{t−1} = L, so that stationary labor is given by

L = p_ue / (1 + p_ue − p_ee).
Given the probabilities p_ue = 0.50 and p_ee = 0.9565, stationary labor amounts to L = 0.91996. In essence, we are computing the ergodic distribution of a simple 2-state Markov chain, with ergodic probabilities L = 0.91996 and 1 − L = 0.08004 for employment and unemployment, respectively. Efficient methods to compute the stationary values of (possibly more complex) Markov-chain processes are described in more detail in Section 16.4 on Markov processes. In Step 2, we provide initial guesses of the aggregate capital stock K and the tax rate τ. In order to initialize the capital stock, we consider the steady state of the corresponding representative agent economy where the household consists of L employed workers and 1 − L unemployed workers who pool their income and where the government does not provide unemployment insurance. Therefore, the initial value of the capital stock, K = 247.62, can be found as the solution to the Euler equation:
K = ( α / (1/β − 1 + δ) )^{1/(1−α)} L.
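For concreteness, Steps 1 and 2 can be translated into a few lines of code. The following Python sketch is only an illustration (the book's programs are written in GAUSS): β = 0.995 is taken from the discussion below, while the values of α and δ are our own assumptions, chosen so that the formula reproduces a capital stock of roughly 247.6.

import numpy as np

# employment transition matrix [[p_uu, p_ue], [p_eu, p_ee]]
# rows: current state (u, e), columns: next-period state (u, e)
p_ue, p_ee = 0.50, 0.9565
P = np.array([[1.0 - p_ue, p_ue],
              [1.0 - p_ee, p_ee]])

# Step 1: stationary employment L = p_ue / (1 + p_ue - p_ee),
# i.e. the ergodic distribution of the two-state Markov chain
L = p_ue / (1.0 + p_ue - p_ee)            # closed form: 0.91996
eigval, eigvec = np.linalg.eig(P.T)        # alternative: left eigenvector for eigenvalue 1
ergodic = np.real(eigvec[:, np.argmax(np.real(eigval))])
ergodic /= ergodic.sum()                   # (1 - L, L) up to rounding

# Step 2: initial guess of K from the representative-agent Euler equation;
# alpha and delta are illustrative values (assumptions), beta from the text
alpha, beta, delta = 0.36, 0.995, 0.005
K0 = (alpha / (1.0 / beta - 1.0 + delta)) ** (1.0 / (1.0 - alpha)) * L
tau0 = 0.02                                # initial guess of the income tax rate
print(L, ergodic, K0, tau0)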
We initialize the income tax rate with τ = 2%. Our guess derives from the ergodic unemployment rate of approximately 8% (= 1 − L) of the households times the replacement rate of the unemployment compensation with respect to wage income, which is equal to 25%. Our initialization is likely to be a little too high since unemployment compensation is financed by a tax on total income, but it is accurate enough for a first guess. In fact, the equilibrium value of τ amounts to 1.72% in our economy. Next, the wage and interest rate, w and r, are computed with the help of (8.33b) and (8.33a) in Step 3. In Step 4, the individual policy functions are computed with the help of value function iteration with linear interpolation as described in Section 7.3. We compute the value function at n = 200 equidistant grid points a_j in the interval [−2, 3000]. The interval is found by some trial and error. Of course, it should contain the steady-state capital stock of the corresponding representative agent economy, K = (α/(1/β − 1 + δ))^{1/(1−α)} L = 247.6. We would also like to choose an ergodic set so that, once the individual's capital stock is inside the set, it stays inside the interval. As it turns out, this interval
is rather large, and we choose the smaller interval [−2, 3000] instead. In the stationary equilibrium, all employed agents have strictly positive net savings over the complete interval [−2, 3000]. However, the number of agents that will have assets exceeding 1500 is extremely small. In fact, fewer than 0.01% of the agents have assets in the range [1500, 3000], so that we can be very confident that our choice of the interval is not too restrictive. The reason for the low number of very rich people is the law of large numbers: we simulate the economy over 25,000 periods or more, and sooner or later the employed agents will lose their job and start decumulating their wealth again. We compute the value function of the employed and unemployed separately. In order to speed up the computation, we initialize the value function assuming that the employed and unemployed do not change their employment status and consume their wage and interest income completely so that their wealth a remains constant. For example, the employed worker with wealth a (and employment status ε = e) has initial value

v(e, a) = (1/(1−β)) · c^{1−η}/(1−η),
with c = (1 − τ)w + (1 − τ)r a. Similarly, we initialize the value function of the unemployed with wealth a using consumption c = b + (1 − τ)r a. We solve the Bellman equation (8.35) with the help of golden section search as presented in Section 15.4.1, using the GAUSS procedure GSS. Between grid points, we interpolate the next-period value function linearly. In addition, we have to supply the upper and lower bound of the interval for the golden section search. We have to be careful not to specify an upper bound of next-period wealth a′ for which consumption is negative, so that our computation breaks down (when we try to evaluate c^{1−η}/(1 − η)). Therefore, we simply iterate over the grid points starting with a′ = a_min = −2 and find the grid points that bracket the maximum, i.e., if the value of the right-hand side of the Bellman equation starts to decrease again, we have located the upper bound of the interval for the golden section search. We also include a special treatment of the lower bound a′ = a_min = −2. In particular, we check whether the right-hand side of the Bellman equation increases for a small increase of a′ above a_min. If not, the constraint a′ ≥ a_min is binding. We continue to compute the solution of the Bellman equation for all a and ε ∈ {e, u}. We only update the value function (setting it equal to the right-hand side of the Bellman equation) after we have computed the new value for both the employed and the unemployed worker.
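The bracketing-plus-golden-section strategy just described can be sketched in a few lines. The following Python fragment is a generic illustration (the book itself uses the GAUSS procedure GSS); the function rhs in bracket_maximum is a placeholder for the right-hand side of the Bellman equation, evaluated at a candidate next-period asset level.

import numpy as np

def golden_section_max(f, a_lo, a_hi, tol=1e-6):
    """Maximize a single-peaked function f on [a_lo, a_hi]."""
    g = (np.sqrt(5.0) - 1.0) / 2.0             # golden ratio fraction
    x1 = a_hi - g * (a_hi - a_lo)
    x2 = a_lo + g * (a_hi - a_lo)
    f1, f2 = f(x1), f(x2)
    while a_hi - a_lo > tol:
        if f1 < f2:                             # maximum lies to the right of x1
            a_lo, x1, f1 = x1, x2, f2
            x2 = a_lo + g * (a_hi - a_lo)
            f2 = f(x2)
        else:                                   # maximum lies to the left of x2
            a_hi, x2, f2 = x2, x1, f1
            x1 = a_hi - g * (a_hi - a_lo)
            f1 = f(x1)
    return 0.5 * (a_lo + a_hi)

def bracket_maximum(rhs, grid):
    """Walk over the grid of next-period assets until the Bellman right-hand
    side starts to decrease; return two grid points bracketing the maximum."""
    j_best, v_best = 0, rhs(grid[0])
    for j in range(1, len(grid)):
        v = rhs(grid[j])
        if v < v_best:                          # value decreases: maximum bracketed
            return grid[max(j_best - 1, 0)], grid[j]
        j_best, v_best = j, v
    return grid[-2], grid[-1]                   # maximum at the upper end of the grid

# tiny self-test on a concave function with known maximum at 2.0
print(round(golden_section_max(lambda x: -(x - 2.0) ** 2, 0.0, 5.0), 4))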
Figure 8.4 Savings Function (net savings a′ − a plotted against asset holdings a for employed and unemployed workers)
The policy functions of the employed and unemployed worker in the stationary equilibrium are illustrated in Figure 8.4 for net savings, a′ − a, over the range [−2, 1000]. For the employed workers, savings are strictly positive, while they are negative for the unemployed worker. For individual wealth levels in excess of the average wealth, savings are almost constant for both workers. In Step 5, we compute the stationary distribution of assets for the employed and unemployed workers. The wealth distribution is continuous and, hence, is an infinite-dimensional object that can only be computed approximately. Therefore, in general, we apply other methods for its computation than in the case of a finite-state Markov chain. Four different kinds of methods are presented in order to compute the invariant distribution F(ε, a) of the heterogeneous agent model. With the first two methods, we compute either the distribution or the density function on a discrete number of grid points over the assets. As our third method, we use Monte Carlo simulations, constructing a sample of households and tracking them over time. And fourth, we assume a specific functional form of the distribution function and use iterative methods to compute the approximation. The remaining Steps 6-8 depend on the specific way of computing the invariant distribution, and we will therefore discuss them for each of the four methods separately.
8.4.1 Discretization of the Distribution Function

We first consider a method which relies upon the discretization of the state space. Our individual state space consists of two dimensions, the employment status ε and the wealth level a. However, the first state variable ε can only take two different values, ε ∈ {e, u}, so that we only need to discretize the second state variable, the asset level a. Assume that we choose a grid over the state space with m points. If the policy function has been computed with the help of methods that rely upon the discretization of the state space, for example discrete value function approximation, we want to choose a finer grid for the computation of the distribution, following Ríos-Rull (1999). Denote the distribution function by F(ε, a) and the density function by f(ε, a). If we discretize the distribution function, the state variable (ε, a) can only take a discrete number of values 2m. In this case, we are in essence trying to compute the Markov transition matrix between these states (ε, a). For the computation of the transition matrix between the employment states ε, we presented several methods in the previous section and in Section 16.4.2. These methods are not all applicable for the computation of the transition matrix between the states (ε, a). In particular, with current computer technology, we will run into problems using the GAUSS procedure equivec to compute the ergodic distribution due to the curse of dimensionality, because the Markov transition matrix has (2m)² entries. For reasonable numbers of grid points 2m, we have a storage capacity problem, and GAUSS, for example, will be unable to compute the ergodic matrix.24 In the following, we will present two iterative methods that rely upon the discretization of the state space in order to compute the discretized invariant distribution function. Both methods can be applied over a fine grid with a high number of points m. Algorithm 8.4.2 computes the invariant distribution function based on the equilibrium condition (8.38), while Algorithm 8.4.3 computes the invariant density function.

Algorithm 8.4.2 (Computation of the Invariant Distribution Function F(ε, a))

Purpose: Computation of the stationary equilibrium.
24
The transition matrix between the 2m states mainly consists of zero entries, i.e. the matrix is sparse. As a consequence, we may still be able to apply the procedure equivec.g; however, we have to change the computer code applying sparse matrix methods. In essence, we only store the non-zero entries. GAUSS, for example, provides commands that handle sparse matrix algebra.
Steps:

Step 1: Place a grid on the asset space A = {a_1 = a_min, a_2, . . . , a_m = a_max} such that the grid is finer than the one used to compute the optimal decision rules.
Step 2: Choose an initial piecewise distribution function F_0(ε = e, a) and F_0(ε = u, a) over the grid. The vectors have m rows each.
Step 3: Compute the inverse of the decision rule a′(ε, a).
Step 4: Iterate on

F_{i+1}(ε′, a′) = Σ_{ε=e,u} π(ε′|ε) F_i( ε, (a′)^{−1}(ε, a′) )   (8.42)
on the grid points (ε′, a′) until F converges.

Algorithm 8.4.1, which computes the stationary equilibrium of the heterogeneous agent economy of Example 8.4.1, and Algorithm 8.4.2, which computes the invariant distribution function, are implemented in the GAUSS program IVdisF.g. After we have computed the individual policy function a′(ε, a) for given capital stock K, unemployment compensation b, and income tax τ, we compute the invariant distribution function according to Algorithm 8.4.2. In Step 1, we choose an equidistant grid with m = 3n = 600 points on [−2, 3000] for the computation of the distribution function.25 In Step 2, we initialize the distribution function with the equal distribution so that each agent has the steady-state capital stock of the corresponding representative agent economy. In Step 3, we compute the inverse of the policy function a′(ε, a), a = (a′)^{−1}(ε, a_j), over the chosen grid with j = 1, . . . , m. Since the unemployed agent with low wealth may want to spend all his wealth and accumulate debt equal to or exceeding −a_min, a′ may not be invertible when a′ = a_min. For this reason, we define (a′)^{−1}(ε, a_min) as the maximum a such that a′(ε, a) = a_min.26 Furthermore, the computation of a′(ε, a) involves some type of interpolation, as a′(ε, a) is stored for only a finite number of values n < m. We use linear interpolation for the computation of a′(ε, a) for a_j < a < a_{j+1}. In Step 4, the invariant distribution is computed. F is computed for every wealth level a′ = a_j, j = 1, . . . , m, and ε = e, u. In the computation, we impose two conditions: 1) If (a′)^{−1}(ε, a_j) < a_min, F(ε, a_j) = 0,
25
The grids over the asset space for the value function and the distribution function do not need to be equally spaced. 26 Huggett (1993) establishes that a′ is strictly non-decreasing in a.
and 2) if (a′)^{−1}(ε, a_j) ≥ a_max, F(ε, a_j) = g(ε), where g(ε) denotes the ergodic distribution of the employment transition matrix. The first condition states that the number of employed (unemployed) agents with a current-period wealth below a_min is equal to zero. The second condition states that the number of employed (unemployed) agents with a current-period wealth equal to or above a_max is equal to the number of all employed (unemployed) agents. In addition, as there may be some round-off errors in the computation of the next-period distribution F_{i+1}(ε′, a′), we normalize the number of all agents to one and multiply F_{i+1}(e, a′) and F_{i+1}(u, a′) by 0.92/F_{i+1}(e, a_max) and 0.08/F_{i+1}(u, a_max), respectively. Again, we need to use an interpolation rule, this time for the computation of F_i(ε, a). In (8.42), a′ = (a′)^{−1}(ε, a_j), j = 1, . . . , m, does not need to be a grid point. As we have only stored the values of F_i(ε, a′) for grid points a = a_j, j = 1, . . . , m, we need to interpolate the value of F_i at this point. We use linear interpolation for the computation of F_i(ε, a) for a_j < a < a_{j+1}. In the program IVDisF.g, we also increase the number of iterations over the invariant distribution, as the algorithm only slowly converges to the invariant aggregate capital stock K. We start with an initial number of 500 iterations i over F_i(·), which we increase by 500 in each iteration up to 25,000 iterations in iteration q = 50 over the capital stock. In the first iterations over the capital stock, we do not need a high accuracy in the computation of the invariant distribution. It saves computational time to increase the accuracy as we get closer to the solution for the aggregate capital stock. Similarly, the value function becomes more accurate as the algorithm converges to the aggregate capital stock. The reason is that we use a better initialization of the value function in each iteration, namely the solution of the last iteration. Once we have computed the distribution function, we are also able to compute the aggregate capital stock in Step 6 of Algorithm 8.4.1. To this end, we assume that the distribution of wealth a is uniform in any interval [a_{j−1}, a_j]. Thus, with the notation Δ = F(ε, a_j) − F(ε, a_{j−1}), we have

∫_{a_{j−1}}^{a_j} a f(ε, a) da = ∫_{a_{j−1}}^{a_j} a · Δ/(a_j − a_{j−1}) da
  = (Δ/(a_j − a_{j−1})) · [a²/2]_{a_{j−1}}^{a_j}
  = ½ (F(ε, a_j) − F(ε, a_{j−1})) (a_j + a_{j−1}).   (8.43)
With the help of this assumption, the aggregate capital stock can be computed as follows:

K = Σ_{ε∈{e,u}} ∫_{a_min}^∞ a f(ε, a) da
  ≈ Σ_{ε∈{e,u}} [ Σ_{j=2}^m (F(ε, a_j) − F(ε, a_{j−1})) (a_j + a_{j−1})/2 + F(ε, a_1) a_1 ].   (8.44)
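The aggregation step (8.44) is straightforward to implement. The following Python sketch is only an illustration: the array names and the shapes of the two distribution functions are our own assumptions and serve merely to show the mechanics.

import numpy as np

def aggregate_capital(a_grid, F_e, F_u):
    """Aggregate assets from the discretized distribution functions
    F_e = F(e, a_j) and F_u = F(u, a_j) according to (8.44), assuming a
    uniform distribution of wealth within each interval [a_{j-1}, a_j]."""
    K = 0.0
    for F in (F_e, F_u):
        dF = F[1:] - F[:-1]                       # mass in (a_{j-1}, a_j]
        mid = 0.5 * (a_grid[1:] + a_grid[:-1])    # midpoints (a_j + a_{j-1})/2
        K += np.sum(dF * mid) + F[0] * a_grid[0]  # plus the mass at a_1
    return K

# illustrative usage with hypothetical distribution shapes on [-2, 3000]
a_grid = np.linspace(-2.0, 3000.0, 600)
F_e = 0.92 * (1.0 - np.exp(-(a_grid + 2.0) / 250.0))
F_u = 0.08 * (1.0 - np.exp(-(a_grid + 2.0) / 100.0))
F_e[-1], F_u[-1] = 0.92, 0.08                     # normalization at a_max
print(aggregate_capital(a_grid, F_e, F_u))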
In this computation, we assume that the distribution of the individual asset holdings is uniform in the interval [a_{j−1}, a_j] for j = 2, . . . , m. Of course, the accuracy of our computation will increase with a finer grid and an increasing number of grid points m. If the capital stock K is close to the capital stock in the previous iteration, we are done. We stop the computation if two successive values of the capital stock diverge by less than 0.1%. The divergence between the capital stocks in iterations 50 and 51 is less than 0.1%, so we stop the computation. The computational time is very long and amounts to 1 hour and 27 minutes using an Intel(R) Xeon(R), 2.90 GHz machine. For our calibration, the invariant aggregate capital stock is K = 243.7. The implied values for the wage rate, the interest rate, and the tax rate are w = 4.770, r = 0.513%, and τ = 1.724%. The calibration of the unemployment compensation b is associated with a replacement rate of b with respect to the net wage of ζ = b/((1 − τ)w) = 25.58%.27 Notice that β = 0.99500 ≈ 0.99499 = 1/(1 + r(1 − τ)), where the deviation is due to numerical round-off errors. As in the representative agent deterministic Ramsey model, the inverse of β is equal to the gross interest rate (after taxes). In the heterogeneous agent economies of Example 8.4.1, this equation does not always need to hold. For our calibration, the wealth constraint a ≥ a_min is found to be non-binding. Huggett and Ospina (2001) show that the stationary interest rate falls short of the time preference rate in any equilibrium with idiosyncratic shocks as long as the consumers are risk averse (η > 0) and the liquidity constraint binds for some agents. We will also demonstrate that this result holds in the application of Section 8.5. At this point, we need to draw your attention to an important point. For our Example 8.4.1, it is rather the exception than the rule that Algorithm 8.4.2 converges. For instance, if you increase the number of simulations over the distribution from {500, 1000, 1500, . . . , 25000} to {2500, 5000, . . . , 125000} while you iterate over the capital stock q =
27
We will use the replacement rate ζ in the calibration of the model in Chapter 9.3.
1, . . . , 50, the algorithm will not converge. Similarly, if we choose the uniform distribution over the interval [−2, 3000],

F(ε, a) = (a − a_min)/(a_max − a_min),  a ∈ [a_min, a_max],

for the initial distribution rather than the equal distribution

F(ε, a) = 1 if a ≥ K, and 0 else,

where all agents hold the representative agent economy steady-state capital stock, the algorithm does not converge either. Therefore, computing the stationary solution to Example 8.4.1 involves a lot of trial and error. Furthermore, as the computation time amounts to more than one hour in GAUSS, the search for a solution may be very time-consuming. Why is convergence so hard to achieve with the help of Algorithm 8.4.2? Consider what happens if we are not close to the stationary solution and, for example, our choice of the stationary capital stock is too low. As a consequence, the interest rate is too high and agents save a higher proportion of their income than in the stationary equilibrium. Consequently, if we choose rather too many time periods for the simulation of the distribution when we start the algorithm (and are far away from the true solution), the distribution of wealth among the employed agents becomes increasingly concentrated at the upper end of the wealth interval [−2, 3000]. As a result, we obtain a new average capital stock that is much higher than the stationary capital stock. In the next iteration over the capital stock, we might, therefore, also choose a capital stock that is much higher than the stationary capital stock and an interest rate that is lower than the stationary rate. As a consequence, agents may now save a much lower proportion of their wealth than in the stationary equilibrium. For this reason, as we simulate the distribution over many periods, the distribution may now become increasingly concentrated in the lower part of the interval [−2, 3000]. If we are unlucky, the distribution might alternate between one that is concentrated in the lower part of the interval for individual wealth and one that is concentrated close to the upper end of the interval. The algorithm, furthermore, fails to converge at all if we do not fix the unemployment compensation b,28 but, for example, calibrate it endogenously to amount to 25% of the net wage rate in each iteration over the
28
We encourage you to recompute Example 8.4.1 with the help of IVdisF.g for the cases discussed.
capital stock. In this case, you will not be able to achieve convergence even with the choice of the equal distribution for the initial distribution. Our choice of b = 1.299 serves as an anchor. If we do not fix it, b starts to alternate between high and low values and, as a consequence, the precautionary savings of the employed agents also switch between low and high values, respectively. The convergence of the algorithm would improve considerably if we could also fix the wage income of the agents. In fact, you will get to know two prominent applications from the literature in Sections 8.5 and 9.5.1, where we will do exactly this. By this device, we will be able to compute the stationary equilibrium in the models of Huggett (1993) and İmrohoroğlu (1989a) without any problems, and convergence can be achieved for any initial distribution. In Section 8.6, you will encounter another example where convergence is not a problem. In contrast to Example 8.4.1, we will then introduce endogenous labor supply. In this case, richer agents supply less labor ceteris paribus and, as a consequence, wage income decreases with higher wealth and so do savings. This mechanism, of course, improves convergence. In Chapter 9, where we compute the dynamics of the distribution endogenously, this problem does not occur either. In these models, as we will argue, an increase in the average capital stock during the simulation of a time series is accompanied by a decrease in the endogenous interest rate and, hence, an endogenous reduction of the savings rate.
Figure 8.5 Convergence of the Distribution Mean (mean capital stock K_t over the simulation periods t)
The convergence of the mean of the distribution during the final iteration over the capital stock is displayed in Figure 8.5. Notice that the rate of convergence is extremely slow. We also made this observation in all of
our other applications: convergence of the distributions' moments29 only occurs after a substantial number of iterations, well in excess of several thousand. It is for this reason that the computation of the stationary equilibrium of a heterogeneous agent economy is extremely time-consuming. Figure 8.5 also suggests that we should increase the number of iterations over the distribution function further, to perhaps n = 100,000 or more.30 In order to judge whether our results are already accurate, it is instructive to look at Figure 8.6, which displays the convergence of the new value of the aggregate capital stock K in each iteration q.31 At the first iteration over the capital stock, q = 1, we only use 500 iterations over the distribution functions and our value functions are highly inaccurate. For higher values of q > 30, our aggregate capital stock remains rather constant no matter whether we iterate 15,000, 20,000, or 25,000 times over the distribution function (corresponding to q = 30, 40, and 50, respectively). This result indicates that we have indeed found the stationary solution.
Figure 8.6 Convergence of the Capital Stock (new value of the aggregate capital stock K in each iteration q over the capital stock)
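The dampening device mentioned in footnote 31 is simple to implement. The following toy Python fragment only illustrates the updating rule itself, not the full outer loop; the weights are those quoted in the footnote.

def update_capital(K_old, K_new, damp=0.05):
    """Dampened update of the aggregate capital stock between iterations q:
    5% weight on the newly computed value, 95% on the previous one."""
    return damp * K_new + (1.0 - damp) * K_old

# toy illustration: even if the newly computed capital stock overshoots
# wildly, the dampened sequence approaches it only gradually
K = 200.0
for q in range(5):
    K = update_capital(K, 350.0)
print(round(K, 1))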
From the stationary distribution function that we computed with the help of IVDisF.g, we can also derive the invariant density function. Assuming that the wealth distribution is uniform in the interval [a j−1 , a j ], a j , a j−1 ∈ A , we compute the density such that f (ε, a) = (F (ε, a j ) − F (ε, a j−1 ))/(a j − a j−1 ) for a ∈ [a j−1 , a j ]. The invariant density function of the employed (unemployed) worker that is computed with the help of Algorithm 8.4.2 is displayed by the blue (red) line in the Figure 8.7. Notice 29
29 The same result holds for the second and third moments of the distributions.
30 We encourage the reader to change the program IVDisF.g accordingly.
31 We use a dampening device to update the aggregate capital stock by only 5% of the new value and 95% of the capital stock from the previous iteration q − 1.
that the wealth constraint a ≥ a_min is non-binding and that the number of agents with wealth above a = 1,000 is almost zero. Therefore, our choice of the wealth interval [a_min, a_max] = [−2, 3000] is sensible. Notice further that, as observed empirically, the distribution is skewed, with most of its mass concentrated at low wealth levels.
Figure 8.7 Invariant Density Function of Wealth (density f(ε, a) over assets a for employed and unemployed workers)
8.4.2 Discretization of the Density Function

Alternatively, we may approximate the continuous density function f(ε, a) by a discrete density function, which, for notational convenience, we also refer to as f(ε, a). Again, we discretize the asset space by the grid A = {a_1 = a_min, a_2, . . . , a_m = a_max}. We assume that the agent can only choose a next-period asset a′ from the set A. Of course, the optimal next-period capital stock a′(ε, a) will be on the grid with a probability of zero. For this reason, we introduce a simple lottery: if the optimal next-period capital stock happens to lie between a_{j−1} and a_j, a_{j−1} < a′ < a_j, we simply assume that the next-period capital stock will be a_j with probability (a′ − a_{j−1})/(a_j − a_{j−1}) and a_{j−1} with the complementary probability (a_j − a′)/(a_j − a_{j−1}). With these simplifying assumptions, we can compute the invariant discrete density function with the help of the following algorithm:
Algorithm 8.4.3 (Computation of the Invariant Density Function)

Purpose: Computation of the stationary equilibrium.

Steps:

Step 1: Place a grid on the asset space A = {a_1 = a_min, a_2, . . . , a_m = a_max} such that the grid is finer than the one used to compute the optimal decision rules.
Step 2: Set i = 0. Choose initial discrete density functions f_0(e, a) and f_0(u, a) over that grid. The two vectors have m rows each.
Step 3: Set f_{i+1}(ε, a) = 0 for all ε and a. i) For every a ∈ A, ε ∈ {e, u}, compute the optimal next-period wealth a_{j−1} ≤ a′ = a′(ε, a) < a_j and ii) for all a′ ∈ A and ε′ ∈ {e, u} the following sums:

f_{i+1}(ε′, a_{j−1}) = Σ_{ε=e,u} Σ_{a∈A: a_{j−1} ≤ a′(ε,a) < a_j} π(ε′|ε) [(a_j − a′(ε, a))/(a_j − a_{j−1})] f_i(ε, a),
f_{i+1}(ε′, a_j)     = Σ_{ε=e,u} Σ_{a∈A: a_{j−1} ≤ a′(ε,a) < a_j} π(ε′|ε) [(a′(ε, a) − a_{j−1})/(a_j − a_{j−1})] f_i(ε, a).
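A compact Python sketch of this lottery assignment and of the iteration on the discrete density is given below. It is only an illustration: function and variable names are our own, a_policy[e, j] denotes the optimal next-period asset of an agent with employment state e at grid point a_j, and pi is the employment transition matrix π(ε′|ε).

import numpy as np

def invariant_density(a_grid, a_policy, pi, tol=1e-10, maxit=100_000):
    """Iterate on the discrete density f(eps, a_j): the optimal a' between
    a_{j-1} and a_j is assigned to a_j with weight (a'-a_{j-1})/(a_j-a_{j-1})
    and to a_{j-1} with the complementary weight (Algorithm 8.4.3)."""
    n_e, m = a_policy.shape
    f = np.full((n_e, m), 1.0 / (n_e * m))             # initial guess: uniform
    for _ in range(maxit):
        f_new = np.zeros_like(f)
        for e in range(n_e):
            for j in range(m):
                a_next = a_policy[e, j]
                jr = np.searchsorted(a_grid, a_next)    # right bracketing index
                jr = min(max(jr, 1), m - 1)
                jl = jr - 1
                w = (a_next - a_grid[jl]) / (a_grid[jr] - a_grid[jl])
                w = min(max(w, 0.0), 1.0)               # lottery weight on a_jr
                for e_next in range(n_e):
                    f_new[e_next, jl] += pi[e, e_next] * (1.0 - w) * f[e, j]
                    f_new[e_next, jr] += pi[e, e_next] * w * f[e, j]
        if np.max(np.abs(f_new - f)) < tol:
            return f_new
        f = f_new
    return f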
a′(ε, a) exceeds a only for a ≤ 1 (a′ crosses the 45-degree line). In other words, the ergodic set for the asset is approximately [−2, 1] and the chosen upper limit a_max is not binding. Even if the initial distribution has agents with a > 1, after the transition to the stationary equilibrium, no agent has a credit balance exceeding a_max. The change in the asset level, a′ − a, is illustrated in Figure 8.11.
Figure 8.11 Savings in the Exchange Economy (change in assets a′(e, a) − a over a for e = e_h and e = e_l)
Once we have computed the decision functions c(·) and a′(·), we are able to compute the invariant distribution. We apply the methods of Section 8.4 and iterate over the density function applying Algorithm 8.4.3. We increase the number of iterations over the distribution from 5,000 to 25,000 while we iterate over the interest rate r. The computation of the stationary equilibrium is almost identical to the one in the production economy of Example 8.4.1 in the previous section, with only one exception. In Section 8.4, we analyzed a production economy where the equilibrium interest rate can be computed from the marginal product of capital. In the present exchange economy, we can only guess the equilibrium price of next-period capital q that clears the credit market. We need to modify our computation as follows: First, make two initial guesses of the interest rate r = 1/q − 1. We choose the values r_1 = 0% and r_2 = 1%, respectively. Next, compute the average asset holding of the economy for the two cases, a_1 and a_2. We compute the following guesses
for the equilibrium interest rate with the help of the secant method, which is described in more detail in Section 15.3.1. Given two points (a_s, r_s) and (a_{s+1}, r_{s+1}), we compute r_{s+2} from:

r_{s+2} = r_{s+1} − [(r_{s+1} − r_s)/(a_{s+1} − a_s)] a_{s+1}.   (8.53)
In order to improve convergence, we use extrapolation and employ the interest rate r = φ r_{s+1} + (1 − φ) r_s in the next iteration. We choose a value φ = 0.5 in our computation. We stop the computation as soon as the absolute average asset level is below 10^{−5}. We need 31 iterations over the interest rate, with a computational time of 30 minutes 21 seconds using an Intel(R) Xeon(R), 2.90 GHz machine. The stationary distribution F(e, a) is displayed in Figure 8.12 for the employed agent, e = e_h (blue line), and the unemployed agent, e = e_l (red line), respectively. The mean of this distribution is equal to zero.
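The updating scheme can be sketched in a few lines of Python. This is only an illustration: mean_assets is a placeholder for the entire inner loop (value function iteration plus Algorithm 8.4.3), and the smoothing of the secant step is one possible reading of the rule described above.

import numpy as np

def clear_credit_market(mean_assets, r1=0.0, r2=0.01, phi=0.5,
                        tol=1e-5, maxit=50):
    """Find the interest rate at which the economy-wide mean of assets is
    zero, combining the secant step (8.53) with a smoothed update."""
    a1, a2 = mean_assets(r1), mean_assets(r2)
    for _ in range(maxit):
        r_new = r2 - (r2 - r1) / (a2 - a1) * a2    # secant step (8.53)
        r_next = phi * r_new + (1.0 - phi) * r2    # smoothed update, phi = 0.5
        a_next = mean_assets(r_next)
        if abs(a_next) < tol:
            return r_next
        r1, a1, r2, a2 = r2, a2, r_next, a_next
    return r2

# illustrative usage with a toy asset-demand function (not the model itself)
print(clear_credit_market(lambda r: 50.0 * (r - 0.003)))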
Figure 8.12 Stationary Distribution Function (F(e, a) over assets a for e = e_h and e = e_l)
8.5.3 Results

Table 8.2 presents the results from our computation for different values of the credit limit ā. Clearly, the interest rate is lower for a stricter credit limit.
For a credit limit ā = −2, approximately equal to one half of the average annual income, the interest rate is even below zero. With a lower credit limit, the interest rate increases as agents can borrow more. For ā = −8, the (annualized) interest rate is already equal to 3.82%, which is much higher than the values we observe empirically. In the corresponding representative agent economy, the risk-free rate is equal to the time preference rate (1 − β)/β = 0.682%, corresponding to an annual value of 1.00682^6 − 1 = 4.16%. Notice that for a less binding credit constraint, the interest rate approaches the value of the representative agent economy. As noted above, the risk-free rate is strictly less than the time preference rate in a heterogeneous agent economy with incomplete insurance markets and binding liquidity constraints.

Table 8.2 Credit Limit and Interest Rate

Credit Limit ā   Annual Interest Rate r   Price q
-2               -7.38%                   1.0129
-4                1.18%                   0.9983
-6                3.08%                   0.9949
-8                3.82%                   0.9938
In conclusion, we find that incomplete insurance (against the risk of a negative endowment shock) and credit constraints help to explain why the empirically observed risk-free rate of return is lower than the one implied by standard representative agent models. As a consequence, the representative agent model might not be appropriate for the analysis of some problems in finance, and the application of heterogeneous agent models is warranted instead.
8.6 Heterogeneous Productivity and Income Distribution

Naturally, we are unable to study redistributive problems in the representative-agent model. The representative agent model cannot answer the question of how, for example, different fiscal policies affect the distribution
of income and wealth. Furthermore, it does not provide an answer to the question of how the dispersion of income and wealth arises in the first place. The explanation of the income and wealth distribution has been a central objective of the early literature on heterogeneous agent models. In this section, we analyze how we can model the income heterogeneity of the economy. As in most heterogeneous agent models, the source of income heterogeneity, e.g., different levels of individual productivity or education, is assumed to be exogenous. Agents with different incomes build up different savings, so that the wealth distribution can be computed endogenously and compared to the empirical distribution. We will find that our simple model is unable to replicate the empirical wealth distribution successfully, and we will discuss possible solutions to this problem in the next chapters. This section is organized as follows. First, empirical facts from the US and the German economy with regard to the distribution of income and wealth are reviewed.35 Second, we present a model with income heterogeneity and discuss the standard way of introducing income heterogeneity into heterogeneous agent models. For this model, we compute the endogenous invariant wealth distribution. As an application of a redistributive policy measure, we also analyze the steady-state effects of a fiscal policy reform that consists of a switch from a flat-rate income tax to a consumption tax.
8.6.1 Empirical Facts on the Income and Wealth Distribution and Income Dynamics

INEQUALITY OF INCOME AND WEALTH. US households hold different levels of income and wealth. To be precise, we define earnings to be wages and salaries plus a fraction of business income, income as all kinds of revenue before taxes, and wealth as the net worth of households.36 One striking feature of the US (and most industrialized and developed countries) is that wealth is much more unequally distributed than earnings and income. Using data from the 1992 Survey of Consumer Finances, Díaz-Giménez et al. (1997) compute Gini coefficients of income, earnings, and wealth equal to 0.57, 0.63, and 0.78, respectively. Quadrini and Ríos-Rull (2015) present
35
This is only a very brief presentation of facts and the interested reader is encouraged to consult any of the references cited in this section.
36 For a more detailed definition, see Díaz-Giménez et al. (1997).
empirical evidence that the US earnings and wealth concentration had increased during 1992-2010, while the effect on the income distribution is less clear-cut. For example, by 2010, the Gini coefficients of earnings and wealth had increased to 0.65 and 0.85, respectively. Quadrini and Ríos-Rull (2015) use data from the Survey of Consumer Finance (SCF). Using tax data, Piketty and Saez (2003), Piketty (2014), and Saez and Zucman (2014) find an even larger increase in income and wealth inequality during this period. The Lorenz curves of US earnings (red line), income (blue line), and wealth (green line) in 1992 are displayed in Figure 8.13.37 Households are ordered according to their earnings, income, and wealth holdings, respectively, and the cumulative shares of earnings, income, and wealth are graphed as a function of the share of the poorest households. For example, the lowest three quintiles (60%) of the earnings distribution receive 15.3% of total earnings, while the wealth-poorest 60% of the wealth distribution only hold 6.6% of total wealth.
Figure 8.13 Lorenz Curve of US Earnings, Income, and Wealth in 1992 (cumulative shares of earnings, income, and wealth against the proportion of households, together with the line of equal distribution)
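Gini coefficients such as those cited above can be computed directly from a few points on the Lorenz curve. The following Python sketch uses the trapezoid rule; except for the 6.6% wealth share of the poorest 60% mentioned in the text, the numbers in the usage example are made up for illustration.

import numpy as np

def gini_from_lorenz(pop_shares, value_shares):
    """Gini coefficient from grouped data: pop_shares and value_shares are
    cumulative population and income/wealth shares (Lorenz curve points),
    starting at 0 and ending at 1.  The Gini is one minus twice the area
    under the piecewise-linear Lorenz curve."""
    p = np.asarray(pop_shares)
    s = np.asarray(value_shares)
    area = np.sum((p[1:] - p[:-1]) * (s[1:] + s[:-1]) / 2.0)   # trapezoid rule
    return 1.0 - 2.0 * area

# hypothetical quintile points for illustration only
pop = [0.0, 0.2, 0.4, 0.6, 0.8, 1.0]
wealth = [0.0, 0.005, 0.025, 0.066, 0.18, 1.0]
print(gini_from_lorenz(pop, wealth))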
The distribution of income in many countries is a little less concentrated than the one in the US. For example, in Germany, the Gini coefficient of labor income amounts to 0.317, while the distribution of wages is even 37
The data on the US economy from the 1992 Survey of Consumer Finance is provided in Díaz-Giménez et al. (1997).
less concentrated with a Gini coefficient equal to 0.275.38 Again, wealth is much more unequally distributed in Germany than earnings and, according to the estimates of Fuchs-Schündeln et al. (2010), the distribution of total wealth is characterized by a Gini coefficient in the range 0.63-0.69. The main rise in the Gini coefficient from 0.63 to 0.69 occurred after the reunification of Germany because wealth levels differed between West and East Germans. Krueger et al. (2010) survey the cross-country facts on economic inequality from studies that are published as a special issue in the Review of Economic Dynamics. In particular, they consider inequality with respect to the variables wages, labor earnings, income, consumption, and wealth for nine countries (US, Canada, UK, Germany, Italy, Spain, Sweden, Russia, and Mexico). Countries with fewer institutional constraints and rigidities in labor markets, such as the US and Canada, are characterized by much higher inequality in wages than the continental European countries Spain, Italy, and Germany. Even though one should be very careful when comparing survey data from these countries due to reporting errors, the European countries in the sample have a log variance of hourly wages that is only half of the value in the US and Canada. The two least developed countries in this sample, Mexico and Russia, are those with the highest dispersion of wages. The inequality of earnings is even higher than that of wages due to the variance of log hours (with a negative, but low correlation of wages and hours). Krueger et al. (2010) also document that wage and earnings inequality has predominantly increased in these countries over the period 1980-2005, with a strong rise in the Anglo-Saxon countries Canada, UK, and US. In addition, the skill premium of wages grew significantly in the Anglo-Saxon countries and Mexico, while it even declined in continental Europe (with the exception of Sweden). Over the last two decades, empirical research has also paid special attention to the behavior of the top income percentiles. Usually, these studies use individual tax data rather than microeconomic panel data and, therefore, rely upon a much higher number of observations. Using US tax data (stemming mostly from the Internal Revenue Service, IRS), Piketty and Saez (2003) document the behavior of the top income and wage shares in the US during 1913-1998. Over the century, the shares display a U-shape. Between 1970 and 1998, the share of the top income percentile had risen from 5.13% to 10.88%. The top income shares at the end of
See Heer and Trede (2003), who use data on annual individual labor earnings from the German Socio-Economic Panel (SOEP) during 1995-1996.
last century approximately matched the high levels of inequality observed prior to World War II. The large shocks during the Great Depression and World War II also had a lasting effect on capital income. The U-shape of US income inequality over time has also been confirmed by research of Badel et al. (2018). They document that the income share of the top 1% earners has increased from 10% to above 20% in the US between 1980 and 2015 (with similar absolute changes from 6% to 13% and from 8% to 13% in the UK and Canada, respectively), while the change in the top income shares was negligible in France, Sweden, and Denmark, where the income share of the top 1% earners remained below 10% over the whole period 1980-2015.

DYNAMICS OF INCOME AND WEALTH. One crucial aspect of the analysis of the redistributive effects of economic policy is the consideration of mobility and, hence, the dynamics of earnings and income. Households move up and down across the different income, earnings, and wealth groups. Some people fulfill the American dream and become rich. Others simply have bad luck (such as an accident or a divorce) and become poor. A redistribution of income, therefore, may have multiple effects. For example, an increase in income taxes may help to finance a rise in unemployment benefits and redistributes income from the income-rich to the income-poor. This may increase welfare as utility is a concave function of consumption. On the other hand, higher income taxes reduce incentives both to supply labor and to accumulate savings. As a consequence, total income decreases and welfare is reduced because of the increased distortions in the economy. Redistribution comes at the expense of efficiency. If we also consider income mobility, the welfare effect of such a policy is reduced further. The reason is simple: income-poor agents may move up the income hierarchy and will also be harmed by higher taxes and a reduction in the efficiency of the economy in the future. Therefore, if we consider the redistributive effects of an economic policy in a heterogeneous agent model, mobility is a crucial ingredient. To get an initial idea of US earnings mobility, let us take a look at the following transition matrix estimated by Díaz-Giménez et al. (1997):39
39
Díaz-Giménez et al. (1997) use data from the 1984, 1985, 1989, and 1990 Panel Study of Income Dynamics in order to compute the transition matrix.
                    1989 Quintile
1984 Quintile   0.858  0.116  0.014  0.006  0.005
                0.186  0.409  0.300  0.071  0.034
                0.071  0.120  0.470  0.262  0.076      (8.54)
                0.075  0.068  0.175  0.465  0.217
                0.058  0.041  0.055  0.183  0.663
The matrix can be interpreted as follows: the entry in the first row, second column is equal to 0.116 and signifies that 11.6% of the households in the lowest earnings quintile in 1984 were in the second lowest earnings quintile in 1989. Notice that the entries on the diagonal are the maxima of each row, so that there is a tendency to remain in the same earnings group. These values range between 40.9% and 85.8%, and the lowest-income group is the least mobile group in the US. Income mobility is almost the same in Germany. For example, Schäfer and Schmidt (2009) estimate the following transition matrix for Germany between the years 2003 and 2007:
                    2007 Quintile
2003 Quintile   0.61  0.23  0.09  0.04  0.05
                0.18  0.46  0.24  0.09  0.02
                0.08  0.20  0.44  0.20  0.08      (8.55)
                0.03  0.07  0.20  0.46  0.23
                0.03  0.04  0.07  0.22  0.64
In accordance with these results, Burkhauser et al. (1997) find that, for the 1980s, even though earnings are more unequally distributed in the US than in Germany, the patterns of quintile-to-quintile mobility, surprisingly, are similar in the two countries.

In DSGE models of income heterogeneity, you have to introduce an exogenous source of such heterogeneity and mobility. Agents either have different (stochastic) abilities, inherit different levels of wealth, or just happen to be unemployed after experiencing bad luck. Usually, the DSGE literature that specifies a heterogeneous agent economy starts from the assumption of exogenous wage inequality. Workers differ with respect to their idiosyncratic productivity, the levels of which are chosen to replicate the wage inequality among the workers. In the overlapping generations models that we will consider in Chapters 10 and 11, individual productivity may also differ across the different age cohorts. As one of the earliest studies that treats wage inequality as endogenous, Heckman
et al. (1998) explain the rising wage inequality since the 1960s with the enlarged cohorts of the Baby Boom. Moreover, these authors endogenize the schooling choice of the young cohort. Another noteworthy exception that treats the accumulation of human capital as endogenous is the research by Krueger and Ludwig (2013), who find that the welfare-maximizing fiscal policy is characterized by a substantially progressive labor income tax code and a positive subsidy for college education. Heathcote et al. (2010) study the impact of the rising US college wage premium and the role of endogenous education choice, labor supply, and savings in an overlapping generations model with incomplete insurance markets. In Example 8.4.1, agents face idiosyncratic risk of unemployment which they cannot insure against. In this section, we extend our analysis by also introducing individual differences in earnings abilities. One can either assume that the individual's earnings y_t^i are stochastic or that labor productivity ε_t^i is stochastic. In the first case, labor income is an exogenous variable, whereas in the latter case, agents may still be able to vary their labor supply, so that the labor income of individual i in period t, y_t^i = ε_t^i w_t l_t^i, which is the product of individual productivity ε_t^i, the wage w_t, and labor time l_t^i, is endogenous. We will pursue the latter approach in the following. There have been many early empirical studies on the time-series behavior of earnings and wages. For example, Lillard and Willis (1978) estimate an AR(1) process for log earnings, while MaCurdy (1982) considers an ARMA(1,2) equation for log earnings. They find substantial persistence in the shocks to earnings (the autoregressive coefficients equal 0.406 and 0.974 for annual data, respectively). Empirical evidence provided by Shorrocks (1976) suggests that the dynamics of productivity (and income) may be modeled slightly better by an AR(2) process than by an AR(1) process. During the 1990s, theoretical models in the DSGE literature with income heterogeneity and exogenous labor supply have widely used a regression-to-the-mean process for log labor earnings. Examples include Aiyagari (1994), Hubbard et al. (1995), Huggett (1996), or Huggett and Ventura (2000). In these models, individual earnings y_t (where we drop the index i of the individual for notational convenience) follow the process:

ln y_t − ln ȳ = ρ (ln y_{t−1} − ln ȳ) + η_t,   (8.56)

where η_t ∼ N(0, σ_η²). Atkinson et al. (1992) report that estimates of the regression-towards-the-mean parameter ρ vary from 0.65 to 0.95 in annual data. In Huggett and Ventura (2000), who study a life-cycle economy, the
income is also age-dependent and follows the process:

ln y_j − ln ȳ_j = ρ (ln y_{j−1} − ln ȳ_{j−1}) + η_j,   (8.57)

where y_j is the income of the j-year-old household, η_j ∼ N(0, σ_η²), and ln y_1 ∼ N(ln ȳ_1, σ²_{y_1}). The parameters ρ, σ_{y_1}, and σ_η are calibrated in order to reproduce the Gini coefficient of US earnings of different cohorts and the overall economy on the one hand and the estimated variance of the persistence of the shocks to log earnings on the other hand. Guvenen et al. (2015) also study the dynamics of individual labor earnings over the life cycle. In contrast to previous studies on income inequality, such as Heathcote et al. (2010) and Guvenen (2009), that apply data from the Current Population Survey, the Panel Study of Income Dynamics, or the Survey of Consumer Finance, Guvenen et al. (2015) employ the more comprehensive data set from the Master Earnings File of the U.S. Social Security Administration records with more than a million observations. They find, among others, that 1) earnings shocks display substantial deviations from log-normality in the form of an extremely high kurtosis and that 2) the statistical properties of the labor earnings process vary over the life cycle. In particular, the variance of earnings is found to decline from ages 25 to 50 and to rise again subsequently. Individual income risk is also found to depend on the business cycle.40 Employing the same comprehensive data set as in Guvenen et al. (2015), Guvenen et al. (2014) study the nature of income risk over the business cycle. According to their results, the variance of idiosyncratic income shocks is not countercyclical, but the left-skewness increases during recessions. They also find that the different income groups face different risks. In particular, high-income workers suffer substantially less than low-income workers during a recession (but not during an expansion). Only the top income percentile behaves differently from this and is subject to much higher income risk during a recession.
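For illustration, the regression-to-the-mean process (8.56) can be simulated in a few lines of Python. The value ρ = 0.9 lies in the range reported by Atkinson et al. (1992); σ_η is an arbitrary illustrative value, not a calibrated parameter.

import numpy as np

def simulate_log_earnings(T, rho, sigma_eta, y_bar=1.0, seed=0):
    """Simulate the process (8.56):
    ln y_t - ln y_bar = rho*(ln y_{t-1} - ln y_bar) + eta_t."""
    rng = np.random.default_rng(seed)
    ln_dev = np.zeros(T)                       # deviation from ln y_bar
    eta = rng.normal(0.0, sigma_eta, size=T)
    for t in range(1, T):
        ln_dev[t] = rho * ln_dev[t - 1] + eta[t]
    return np.exp(np.log(y_bar) + ln_dev)

y = simulate_log_earnings(T=1_000, rho=0.9, sigma_eta=0.2)
print(y.mean(), np.log(y).var())               # var(ln y) -> sigma^2/(1-rho^2)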
8.6.2 The Model

In the following, we consider a heterogeneous agent economy where agents differ with regard to their productivity and employment status. Agents are also mobile and, between periods, the productivity and employment status
In Chapter 9, we will also consider the business cycle dynamics of the income distribution.
may change. As a consequence, individual labor income also changes. The model is able to account for both the observed heterogeneity in wage rates and the observed labor income mobility in Germany. In addition to the economy studied in Example 8.4.1, we model the household's labor supply decision. As a consequence, the labor income distribution is endogenous. As one major implication of our modeling framework, we are able to replicate the German labor income distribution quite closely. The model follows Heer and Trede (2003).41 Three sectors can be distinguished: households, firms, and the government.

HOUSEHOLDS. In the following model, labor supply l is endogenous. Accordingly, earnings are endogenous and we cannot specify an exogenous earnings process. Rather, the exogenous variable is productivity ε or, equivalently, the wage per unit of labor, wε, as all agents face the same wage rate w per efficiency unit. Similar to related studies, e.g., Ventura (1999) or Castañeda et al. (1998b), we assume productivity ε to follow a Markov chain with conditional transition probabilities given by:

π(ε′|ε) = Prob{ε_{t+1} = ε′ | ε_t = ε},   (8.58)

where ε, ε′ ∈ E = {ε_1, . . . , ε_{n_ε}}. The productivities ε ∈ E are chosen to replicate the discretized distribution of hourly wage rates which, in our model, are proportional to productivity. The number of productivities is set equal to n_ε = 5. We also consider unemployment and let ε_1 characterize the state of unemployment by setting ε_1 equal to zero. The productivities (ε_2, ε_3, ε_4, ε_5) are estimated from the empirical distribution of hourly wages in Germany in 1995. The productivity ε_i corresponds to the average hourly wage rate of earners in the (i − 1)-th quartile. Normalizing the average of the four nonzero productivities to unity, we arrive at

(ε_2, ε_3, ε_4, ε_5) = (0.4476, 0.7851, 1.0544, 1.7129).   (8.59)
41 Heer and Trede (2003) also study the more complicated case of a progressive income tax. In this case, the policy function for labor supply does not have a continuous derivative and the computation is a little bit more complicated. The interested reader is referred to the original article. For the US economy, we know various other studies which consider the effects of a flat-rate tax versus a progressive income tax. Ventura (1999) considers a life-cycle model, Castañeda et al. (1998a) use a model similar to ours, but with a different calibration procedure for the Markov process (8.58), and Caucutt et al. (2003) also model endogenous human capital formation. Heathcote et al. (2017) find the degree of progressivity in the US income tax code to be close to the optimal value. In their model, the investment into individual skills is endogenous and possibly constrained by poverty.
The transition probabilities into and out of unemployment, π(ε′ = 0|ε > 0) and π(ε′ > 0|ε = 0), where ε′ represents next period's productivity, are chosen in order to imply an average unemployment rate of 10.95%. Further, we assume that the probability of losing one's job does not depend on individual productivity. During unemployment, the worker's human capital depreciates or, equivalently, his productivity decreases. We assume that the worker can only reach productivity ε_2 after unemployment42 and, following Heer and Trede (2003), set π(ε′ = 0|ε = 0) = 0.35, π(ε′ = ε_2|ε = 0) = 1 − π(ε′ = 0|ε = 0) = 0.65, and π(ε′ > ε_2|ε = 0) = 0. The remaining (n_ε − 1)² = 16 transition probabilities are calibrated such that 1) each row in the Markov transition matrix sums to one and 2) the model economy matches the observed quartile transition probabilities of the hourly wage rate from 1995 to 1996 as given by the German Socio-Economic Panel data.43 Our transition matrix is given by:

            0.3500  0.6500  0.0000  0.0000  0.0000
            0.0800  0.6751  0.1702  0.0364  0.0383
π(ε′|ε) =   0.0800  0.1651  0.5162  0.2003  0.0384      (8.60)
            0.0800  0.0422  0.1995  0.5224  0.1559
            0.0800  0.0371  0.0345  0.1606  0.6879

You may want to compare the German wage mobility of the employed agents (the lower 4x4-matrix of (8.60) divided by 1 − 10.95% in order to imply a measure equal to unity for the employed agents) with the German and US earnings mobility as described by (8.55) and (8.54). Notice, however, that (8.60) considers a 1-year transition period while (8.55) and (8.54) consider time horizons of 4 and 5 years, respectively. If you assume that earnings follow an AR(1) process, you may derive the 5-year transition matrix for Germany by multiplying (8.60) four times with itself.44 If you compare these two matrices, you cannot help noticing that German workers are much more mobile than US workers. While the diagonal elements in (8.54) are in the range 0.409-0.858, the corresponding elements in the 5-year transition matrix for Germany amount to values between 0.27 and 0.37. This result, however, is an artefact of our approximation.
42 Alternatively, we could have assumed that the worker's productivity does not decrease during unemployment. In this case, however, we would have to introduce an additional state variable into the model, which makes the computation and calibration even more cumbersome.
43 A different approach is followed by Castañeda et al. (1998b) who calibrate the transition matrix in order to replicate the U.S. earnings and wealth distribution as closely as possible. As a consequence, the diagonal elements of the transition matrix calibrated by Castañeda et al. (1998b) are far larger than their empirical counterparts.
44 See also Section 16.4.
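The comparison suggested above is easy to carry out numerically. The following Python sketch raises (8.60) to the fifth power; the final renormalization of the lower 4x4 block is only one possible way of conditioning on the agents who are employed in both years, so the printed numbers should be compared with the range discussed in the text rather than taken as exact.

import numpy as np

# annual productivity transition matrix (8.60); state 1 is unemployment
pi = np.array([
    [0.3500, 0.6500, 0.0000, 0.0000, 0.0000],
    [0.0800, 0.6751, 0.1702, 0.0364, 0.0383],
    [0.0800, 0.1651, 0.5162, 0.2003, 0.0384],
    [0.0800, 0.0422, 0.1995, 0.5224, 0.1559],
    [0.0800, 0.0371, 0.0345, 0.1606, 0.6879],
])

# 5-year transition matrix: (8.60) multiplied four times with itself
pi5 = np.linalg.matrix_power(pi, 5)

# one way to look at the wage mobility of the employed: take the lower
# 4x4 block of pi5 and rescale each row so that it sums to one
block = pi5[1:, 1:]
block = block / block.sum(axis=1, keepdims=True)
print(np.round(np.diag(block), 2))     # cf. the 0.27-0.37 range in the text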
As pointed out above, the earnings process might be better modeled with the help of a second-order autoregressive (AR(2)) process, as suggested by Shorrocks (1976).

Households are of measure one and infinitely lived. Households are heterogeneous with regard to their employment status, their productivity ε^i, and their wealth k^i, i ∈ [0, 1].45 Individual productivity ε^i ∈ E = {0, 0.4476, 0.7851, 1.0544, 1.7129} follows the first-order finite-state Markov chain with conditional transition probabilities given by (8.58). Agents are not allowed to borrow, k^i ≥ 0. In addition, the household faces a budget constraint. He receives income from labor l_t^i and capital k_t^i, which he spends on consumption c_t^i and next-period wealth k_{t+1}^i:

k_{t+1}^i = (1 + r_t) k_t^i + w_t ε_t^i l_t^i − (1 + τ^c) c_t^i − τ^y y_t^i + 1_{ε=ε_1} b_t,   (8.61)

where r_t, w_t, τ^c, and τ^y denote the interest rate, the wage rate, the consumption tax rate, and the tax rate on income y, respectively. 1_{ε=ε_1} is an indicator function which takes the value one if the household is unemployed (ε = ε_1) and zero otherwise. If the agent is unemployed, he receives unemployment compensation b_t. Taxable income is composed of interest income and labor income:

y_t^i = y_t^i(ε_t^i, k_t^i) = r_t k_t^i + w_t ε_t^i l_t^i.   (8.62)
Household i, which is characterized by productivity ε_t^i and wealth k_t^i in period t, maximizes his intertemporal utility with respect to consumption c_t^i and labor supply l_t^i:

E_t Σ_{s=0}^∞ β^s u(c_{t+s}^i, 1 − l_{t+s}^i),   (8.63)

where β < 1 is a discount factor and expectations are conditioned on the information set of the household at time 0. Instantaneous utility u(c_t, 1 − l_t) is assumed to be additively separable in the utility from consumption and the utility from leisure and is represented by:

u(c_t, 1 − l_t) = c_t^{1−η}/(1−η) + γ_0 (1 − l_t)^{1−γ_1}/(1 − γ_1).   (8.64)
Our choice of the functional form for utility follows Castañeda et al. (1998b). Many quantitative studies of general equilibrium models specify a
45
As we only consider one type of asset, we will refer to k as capital, wealth, and asset interchangeably.
Cobb-Douglas functional form of utility. In this case, however, the elasticity of individual labor supply with regard to wealth is larger than for the utility function (8.64) and, consequently, the distribution of working hours varies more (and is less in accordance with empirical observations) than for our choice of the utility function (8.64).46 Notice that (8.64) differs from the functional form (1.39) and is, therefore, not applicable in models with productivity growth. To see this point, assume that we analyze a perfect-foresight economy with exogenous productivity growth at a rate g > 0 and no uncertainty. In steady state, capital, wages, and consumption also grow at rate g > 0, while labor supply is constant. The first-order condition of the household with respect to labor is given by

γ_0 (1 − l_t)^{−γ_1} / c_t^{−η} = (1 − τ^y) ε w_t / (1 + τ^c).   (8.65)
Consequently, for steady-state growth c_{t+1}/c_t = w_{t+1}/w_t = 1 + g with constant labor supply l_t = l, either g ≠ 0 and η = 1, or g = 0.

PRODUCTION. Firms are owned by the households and maximize profits with respect to their capital and labor demand. Production Y_t is characterized by constant returns to scale using aggregate capital K_t and labor L_t as inputs:

Y_t = K_t^α L_t^{1−α}.   (8.66)
In a market equilibrium, factors are compensated according to their marginal products and profits are zero:

r_t = α (L_t/K_t)^{1−α} − δ,   (8.67a)
w_t = (1 − α) (K_t/L_t)^α,   (8.67b)
where δ denotes the depreciation rate of capital. 46
In Section 6.2.4, you learned about the preferences of Greenwood et al. (1988) where the income effect on labor supply is zero. In Section 10.3, we will use a utility function proposed by Trabandt and Uhlig (2011) that is compatible with economic growth and displays a constant Frisch elasticity.
GOVERNMENT. Government expenditures consist of government consumption G t and unemployment compensation B t . In our benchmark case, government expenditures are financed by an income tax and a consumption tax. We will compare the employment and distribution effects of two tax systems with equal tax revenues: 1) a flat-rate income tax structure and 2) only a consumption tax (τ y = 0). The government budget is balanced in every period so that government expenditures are financed by tax revenues Tt in every period t: G t + B t = Tt .
(8.68)
STATIONARY EQUILIBRIUM. We will define a stationary equilibrium for a given government tax policy and a constant distribution F(ε, k) (with associated density f(ε, k)) over the individual state space (ε, k) ∈ E × [0, ∞).

Definition: A stationary equilibrium for a given set of government policy parameters is a value function v(ε, k), individual policy rules c(ε, k), l(ε, k), and k′(ε, k) for consumption, labor supply, and next-period capital, respectively, a time-invariant distribution F(ε, k) of the state variable (ε, k) ∈ E × [0, ∞), time-invariant relative prices of labor and capital, (w, r), and a vector of aggregates K, L, B, T, and C such that:

1. Factor inputs, consumption, tax revenues, and unemployment compensation are obtained by aggregating over households:

K = Σ_{ε∈E} ∫_0^∞ k f(ε, k) dk,   (8.69a)
L = Σ_{ε∈E} ∫_0^∞ ε l(ε, k) f(ε, k) dk,   (8.69b)
C = Σ_{ε∈E} ∫_0^∞ c(ε, k) f(ε, k) dk,   (8.69c)
T = τ^y (K^α L^{1−α} − δK) + τ^c C,   (8.69d)
B = ∫_0^∞ b f(ε_1, k) dk.   (8.69e)
2. c(ε, k), l(ε, k), and k′(ε, k) are optimal decision rules and solve the household decision problem

v(ε, k) = max_{c,l} { u(c, 1 − l) + β E[v(ε′, k′) | ε] },   (8.70)
where ε′ and k′ denote next-period productivity and wealth, subject to the budget constraint (8.61), the tax policy, and the stochastic process determining the next-period productivity level (8.58).

3. Factor prices (8.67a) and (8.67b) are equal to the factors' marginal productivities, respectively.

4. The goods market clears:

F(K, L) + (1 − δ)K = C + K′ + G = C + K + G.   (8.71)
5. The government budget (8.68) is balanced: G + B = T.

6. The distribution of the individual state variables is constant:

F(ε′, k′) = Σ_{ε∈E} π(ε′|ε) F(ε, k),   (8.72)

for all k′ ∈ [0, ∞) and ε′ ∈ E and with k′ = k′(ε, k).47

CALIBRATION. The model is calibrated as in Heer and Trede (2003). The preference parameters are set equal to η = 2, γ_0 = 0.13, and γ_1 = 10. The latter two parameters are selected in order to imply an average working time of l̄ = 32% and a coefficient of variation for hours worked equal to σ_l/l̄ = 0.367. The empirical value of this coefficient of variation for Germany is 0.385. The discount factor β amounts to 0.96. The productivities ε ∈ {0, 0.4476, 0.7851, 1.0544, 1.7129} together with the Markov transition matrix (8.60) imply an ergodic distribution (0.1095, 0.3637, 0.2074, 0.1662, 0.1532) for the five productivity types. The resulting Gini coefficient of wages (or, equally, productivities) amounts to 0.254, which compares favorably with the empirical counterpart (0.275). The income tax rate is set equal to 17.4%, while the consumption tax rate is computed endogenously in order to imply a government consumption share in GDP equal to 19.6%. The replacement rate of unemployment compensation b with respect to the gross wage of the lowest wage quartile is equal to 52%, b = 0.52 ε_2 w l̄_2, where l̄_2 denotes the average working time of the lowest-productivity workers. The production elasticity α is set equal to 0.36 and the annual depreciation rate is estimated at δ = 4%.
47
Our definition of the stationary equilibrium, again, does not use advanced concepts of measure theory. In particular, our formulation of the characteristics of the stationary distribution assumes that the number of households with zero capital is zero. This will be the case for our calibration.
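Before turning to the computational algorithm, it may help to see how the aggregation in (8.69a)-(8.69e) looks once the density has been discretized on a finite asset grid. The following Python fragment is a minimal sketch of this step (our own illustration; the array names and shapes are assumptions, and the book's accompanying program Tax_reform.g is written in GAUSS):

```python
import numpy as np

def aggregates(f, kgrid, eps, l_pol, c_pol, b, tauy, tauc, alpha=0.36, delta=0.04):
    """Aggregation (8.69a)-(8.69e) over a discretized density.

    f[i, j] is the probability mass of households with productivity eps[i] and
    capital kgrid[j]; l_pol and c_pol hold the optimal labor and consumption
    choices on the same grid. Integrals become sums over grid points.
    """
    K = np.sum(f * kgrid[None, :])                     # (8.69a)
    L = np.sum(f * eps[:, None] * l_pol)               # (8.69b)
    C = np.sum(f * c_pol)                              # (8.69c)
    T = tauy * (K**alpha * L**(1 - alpha) - delta * K) + tauc * C   # (8.69d)
    B = b * np.sum(f[0, :])                            # (8.69e): eps[0] = 0, the unemployed
    return K, L, C, T, B
```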
8.6.3 Computation

The solution algorithm for the benchmark case with a flat-rate income tax is described by the following steps:

1) Make initial guesses of the aggregate capital stock K, aggregate employment L, the consumption tax τ^c, and the value function v(ε, k).
2) Compute the wage rate w, the interest rate r, and unemployment compensation b.
3) Compute the household's decision functions k′(ε, k), c(ε, k), and l(ε, k).
4) Compute the steady-state distribution of assets.
5) Compute K, L, and taxes T that solve the aggregate consistency conditions.
6) Compute the consumption tax τ^c that solves the government budget.
7) Update K, L, and τ^c, and return to Step 2 if necessary.

In Step 3, the optimization problem of the household is solved with value function iteration. For this reason, the value function is discretized using an equispaced grid K of 1,000 points on the interval [0, k_max]. The upper bound on capital k_max = 12 is found to never be binding. The value function is initialized assuming that working agents supply 0.2 units of time as labor and that each agent consumes his current-period income permanently. The matrix that stores the values of the value function has 1,000 × 5 entries. We also assume that the agent can only choose discrete values from the interval [0, 1] for his labor supply. We choose an equispaced grid L of 100 points. The algorithm is implemented in the GAUSS program Tax_reform.g.

In order to find the maximum of the rhs of the Bellman equation (8.70), we need to iterate over the next-period capital stock k′ ∈ K and the optimal labor supply l ∈ L for every k ∈ K and ε_i, i = 1, . . . , n_ε. This amounts to 1,000 × 100 × 1,000 × 4 + 1,000 × 1,000 iterations (the labor supply of the unemployed is equal to 0). In order to reduce the number of iterations, we can exploit the fact that the value function is a monotone increasing function of assets k, that consumption is strictly positive and monotone increasing in k, and that the labor supply is a monotone decreasing function of assets k. Therefore, given an optimal next-period capital stock k′(ε, k_i) and labor supply l(ε, k_i), we start the iteration over the next-period capital stock for the optimal next-period capital stock k′(ε, k_{i+1}) at k′(ε, k_i) with k_{i+1} > k_i. Similarly, we start the iteration over the labor supply l at l(ε, k_i) and decrease the labor supply
at each iteration in order to find l(ε, k_{i+1}) ≤ l(ε, k_i). We also stop the iteration as soon as individual consumption becomes nonpositive, c ≤ 0. The number of iterations is reduced substantially by the exploitation of the monotonicity and nonnegativity conditions. During the first iterations over the aggregate capital stock, we do not need a high accuracy of the value and the policy functions. Therefore, we iterate only 10 times over the value function and increase the number of iterations to 20 as the algorithm converges to the true solution. By this device, we save a lot of computational time. The computer program is nevertheless very time-consuming and runs for 1 hour and 29 minutes on an Intel(R) Xeon(R), 2.90 GHz machine. As a much faster alternative, we may compute the optimal labor supply functions with the help of the first-order condition (8.65), and you will be asked to perform this computation in Problem 8.6. Using the time-consuming value function iteration over both the capital stock and the labor supply, however, might be a good starting point 1) if you would like to compute a rough approximation of the final solution as an initial guess for more sophisticated methods or 2) if your policy function is not well-behaved. The latter case might arise in the presence of a progressive income tax where the optimal labor supply does not have a continuous first derivative.48

As soon as we have computed the optimal policy function, we might want to check the accuracy of our computation. For this reason, we compute the residual functions for the two first-order conditions:

R_1(ε, k) = [u_l(c(ε, k), 1 − l(ε, k)) (1 + τ^c)] / [u_c(c(ε, k), 1 − l(ε, k)) (1 − τ^y) w ε] − 1,
R_2(ε, k) = E[ β u_c(c(ε′, k′), 1 − l(ε′, k′)) (1 + r(1 − τ^y)) / u_c(c(ε, k), 1 − l(ε, k)) ] − 1.
The mean absolute deviations are about 1.07% and 3.71% for the two residual functions R_1 and R_2, respectively. The maximum deviations even amount to 11% and 47% for R_1 and R_2, respectively. For a closer fit, we either need to increase the number of grid points or to compute the optimal policy functions at points off the grid (see Problem 8.6). The remaining steps of the algorithm are straightforward to implement using the methods presented in the previous chapters. For the computation
48 Again, the interested reader is referred to either Ventura (1999) or Heer and Trede (2003) for further reference.
of the invariant distribution, in particular, we discretize the wealth density and compute it as described in Algorithm 8.4.3.
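Returning to the accuracy check above, the intratemporal residual R_1 is easy to evaluate once the policy functions are stored on the grid. The following Python sketch is our own illustration (array names are assumptions, not the book's GAUSS code); it uses the marginal utilities implied by (8.64):

```python
import numpy as np

eta, gamma0, gamma1 = 2.0, 0.13, 10.0   # preference parameters of this section

def residual_R1(c_pol, l_pol, eps, w, tauy, tauc):
    """Intratemporal residual R_1(eps, k) on the grid for the employed types.

    c_pol[i, j] and l_pol[i, j] hold consumption and labor supply for
    productivity eps[i] at the j-th capital grid point; eps[0] = 0 (the
    unemployed) is skipped because the unemployed supply no labor.
    """
    employed = eps > 0.0
    uc = c_pol[employed, :]**(-eta)                      # u_c = c^(-eta)
    ul = gamma0 * (1.0 - l_pol[employed, :])**(-gamma1)  # u_l = gamma0 (1-l)^(-gamma1)
    wage = (1.0 - tauy) * w * eps[employed][:, None]
    return (ul / uc) * (1.0 + tauc) / wage - 1.0
```

The mean of the absolute values of this array should correspond roughly to the 1.07% deviation reported above for R_1.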
8.6.4 Results

In Table 8.3, the effects of the two different tax policies on the aggregate capital stock K, effective labor L, average working hours l̄, the real interest rate r, the Gini coefficients of the labor income and the wealth distribution, and the coefficients of variation of working time and effective labor are presented. In the stationary equilibrium, the unemployment rate is equal to 10.95%. Aggregate effective labor supply amounts to L = 0.251 with an average working time approximately equal to l̄ = 0.324. Working hours vary less than effective labor. The coefficient of variation of working hours l (effective labor εl) is equal to 0.367 (0.691) (see the last two columns of Table 8.3). The two coefficients of variation are in very good accordance with the empirical estimates 0.385 (0.638), which we computed using data from the German Socio-Economic Panel during 1995-96. The higher variation of effective labor relative to working hours reflects the optimizing behavior of the working agents, who work longer if they are more productive because the substitution effect of a rise in the wage dominates the income effect. The labor supply elasticity with regard to the wage rate, η_{lw}, is moderate, amounting to 0.213 for the average worker. Again, this compares favorably with the data. Sieg (2000), for example, estimates that elasticities for male labor supply are small and in the range between 0.02 and 0.2.
Table 8.3 Results of Tax Reform Policies

Tax Policy   K      L      l̄      r      Gini_{wεl}   Gini_k   σ_l/l̄   σ_{εl}/L
τ^y          2.70   0.251  0.324  3.88%  0.317        0.406    0.367   0.691
τ^c          3.24   0.249  0.323  3.01%  0.316        0.410    0.366   0.685

Notes: Policy τ^y refers to the case of a flat-rate income tax and τ^c to the case where the income tax rate is zero and the consumption tax rate τ^c is increased such that the government budget balances.
The aggregate capital stock amounts to K = 2.70, which is associated with a capital-output coefficient equal to K/Y = 4.57. During 1991-97, the empirical value of K/Y was equal to 5.0 (2.6) in Germany for the total economy (producing sector). The distribution of wealth, however, is not modeled in a satisfactory manner. In our model, the concentration of wealth is too low with a Gini coefficient equal to Gini_k = 0.406, which compares unfavorably with empirical estimates of the wealth Gini coefficient reported above (which are well in excess of 0.6). We will discuss the reasons why the simple heterogeneous-agent model of this section is unable to replicate the empirical wealth distribution in the next chapters.

In our second tax experiment, we set the income tax rate τ^y to zero and increase the consumption tax rate τ^c in order to generate the same tax revenues as in the benchmark case. The new steady-state consumption tax amounts to τ^c = 39.5% (compared to 20.5% under tax policy τ^y). As interest income is no longer taxed, households increase their savings. Accordingly, the aggregate capital stock K rises from 2.70 to 3.24. As labor income is no longer taxed either, the incentives to supply labor increase on the one hand. On the other hand, average wealth of the agents is higher and, for this reason, labor supply decreases. The net effect is rather small so that employment remains approximately constant. Associated with these changes of the input factors is a strong decline of the interest rate r by almost 0.9 percentage points. The distribution effects of the tax reform are rather modest. The Gini coefficient of gross labor income remains almost constant, and wealth is only a little more concentrated. Similarly, the coefficients of variation for labor supply and effective labor are hardly affected. In summary, the most marked effect of a switch to a consumption tax consists of a pronounced rise in savings.
Problems
Problem 8.1: Transition Dynamics
Compute the transition dynamics of the model presented in Section 8.2. Instead of the direct computation with the help of the dynamic equilibrium conditions, linearize the model around the steady state using the techniques presented in Chapter 3. Compare the dynamics of the direct computation with your results. Do you think it is acceptable to use linearization methods? What is the maximum divergence in the aggregate capital stock during the transition between those two methods?
Problem 8.2: Initial Instead of Final Distribution
Solve the Numerical Example in Section 8.2.1 for the case that the initial rather than the final distribution of individual capital stocks (k_1, k_2, k_3) is characterized by the conditions that 1) the two poorest quintiles of the distribution combined hold 1.35% of total wealth, 2) the Gini of the wealth distribution amounts to 0.78, and 3) average wealth relative to median wealth amounts to 3.61.
Problem 8.3: Function Approximation
Compute the invariant distribution of Example 8.4.1 with the help of functional approximation as described in Algorithm 8.4.5. However, choose an exponential function of order n = 3 for the approximation of the density function.

Problem 8.4: The Risk-Free Rate of Return
1) Compute the model with production in Example 8.4.1 with β = 0.96 and for different levels of the minimum asset level, a_min ∈ {−2, −4, −8}, and show that the equilibrium interest rate decreases with a more binding credit constraint.
2) Compute the equilibrium prices in the exchange economy of Huggett (1993) for a higher coefficient of risk aversion η = 3 and compare your results with Table 2 in Huggett (1993).
Problem 8.5: Unemployment Insurance and Moral Hazard
Consider the following extension of Example 8.4.1 (adapted from Hansen and İmrohoroğlu (1992)). The agents' utility function is now a function of both consumption and leisure,
u(c_t, l_t) = [c_t^γ (1 − l_t)^{1−γ}]^{1−η} / (1 − η).
All agents are either offered an employment opportunity (ε = e) or not (ε = u). The Markov transition matrix is again described by (8.29). Agents that receive an employment offer may either accept the offer and work full-time, l = 0.3, or reject the offer and receive unemployment insurance b_t with probability q(ε_{t−1}). In particular, the probability of receiving unemployment benefits may be different for a searcher, ε_{t−1} = u, and a quitter, ε_{t−1} = e, q(e) ≠ q(u): agents that turn down employment offers in order to extend unemployment spells may have different chances to receive unemployment benefits than quitters. Compute the stationary equilibrium of the model for the parameters of Example 8.4.1. In addition, set γ = 0.33. Compute the model for different replacement rates b_t/((1 − τ)w_t) ∈ {0.25, 0.5, 0.75} and different probabilities to receive unemployment benefits, q(e) = q(u) = 0.9, q(e) = q(u) = 0.8, and q(e) = 0.9, q(u) = 0.8. What does the optimal unemployment insurance (as measured by the average value of the households) look like?
Problem 8.6: Income Tax Reform
Recompute the model of Section 8.6 implementing the following changes:
1) Compute the optimal labor supply with the help of the first-order condition (8.65) (do not forget to check if the constraint 0 ≤ l ≤ 1 is binding). For this purpose, you need to solve a nonlinear equation.
2) Compute the optimal next-period capital k′, where k′ does not need to be a grid point. Use linear interpolation to evaluate the value function between grid points. Apply the golden section search algorithm presented in Section 15.4.1 in order to compute the maximum of the right-hand side of the Bellman equation.
Problem 8.7: Superneutrality in the Sidrauski Model
As is well known, money is superneutral in the model of Sidrauski (1967). A change in the money growth rate does not affect the real variables of the Ramsey model that is augmented by a monetary sector if 1) money demand is introduced with the help of money-in-the-utility preferences and 2) labor supply is inelastic. Consider the following heterogeneous-agent extension of the standard Sidrauski model that consists of three sectors: households, firms, and the monetary authority (adapted from Heer (2004)):
Households. The household j ∈ [0, 1] lives infinitely and is characterized by her productivity ε_t^j and her wealth a_t^j in period t. Wealth a_t^j is composed of capital k_t^j and real money m_t^j ≡ M_t^j/P_t, where M_t^j and P_t denote the nominal money holdings of agent j and the aggregate price level, respectively. Individual productivity ε_t^j is assumed to follow a first-order Markov chain with conditional probabilities given by:

Γ(ε′|ε) = Prob(ε_{t+1} = ε′ | ε_t = ε),

where ε, ε′ ∈ E = {ε_1, . . . , ε_n}.

The household faces a budget constraint. She receives income from labor l_t^j, capital k_t^j, and lump-sum transfers tr_t, which she either consumes at the amount c_t^j or accumulates in the form of capital k_{t+1}^j or money m_{t+1}^j:

k_{t+1}^j + (1 + π_{t+1}) m_{t+1}^j = (1 + r_t) k_t^j + m_t^j + w_t ε_t^j l_t^j + tr_t − c_t^j,

where π_t ≡ (P_t − P_{t−1})/P_{t−1}, r_t, and w_t denote the inflation rate, the real interest rate, and the wage rate in period t. The household j maximizes life-time utility:

U_t = E_t Σ_{s=0}^∞ β^s u(c_{t+s}^j, m_{t+s}^j)

subject to the budget constraint. The functional form of instantaneous utility u(·) is chosen as follows:

u(c, m) = γ ln c + (1 − γ) ln m.
Labor supply is exogenous, l = l̄ = 0.3.

Production. Firms are also allocated uniformly along the unit interval and produce output with effective labor L and capital K. Let F_t(ε, k, m) (with associated density f_t(ε, k, m)) denote the period-t distribution of the households with wealth a = k + m and idiosyncratic productivity ε. Effective labor L_t is given by:

L_t = Σ_{ε∈E} ∫_k ∫_m l̄ · ε · f_t(ε, k, m) dm dk.

Effective labor L is paid the wage w. Capital K is hired at rate r and depreciates at rate δ. Production Y_t is characterized by constant returns to scale and assumed to be Cobb-Douglas: Y_t = K_t^α L_t^{1−α}. In a factor market equilibrium, factors are rewarded with their marginal products:
w_t = (1 − α) L_t^{−α} K_t^α,
r_t = α L_t^{1−α} K_t^{α−1} − δ.
Monetary Authority. Nominal money grows at the exogenous rate θ_t:

(M_t − M_{t−1})/M_{t−1} = θ_t.

The seigniorage is transferred lump-sum to the households:

tr_t = (M_t − M_{t−1})/P_t.
1) Define a recursive stationary equilibrium which is characterized by a constant money growth rate θ and a constant distribution F(ε, k, m).
2) Show that in the homogeneous-agent case, ε^j = ε̄, money is super-neutral in the stationary equilibrium, i.e., the steady-state growth rate of money θ has no effect on the real variables of the model.
3) Compute the heterogeneous-agent model for the following calibration: Periods correspond to years. The number of productivities is set to n = 5 with E = {0.2327, 0.4476, 0.7851, 1.0544, 1.7129}. Further, γ = 0.990, β = 0.96, α = 0.36, and δ = 0.04. The transition matrix is given by:
π(ε′|ε) =
[ 0.3500  0.6500  0.0000  0.0000  0.0000
  0.0800  0.6751  0.1702  0.0364  0.0383
  0.0800  0.1651  0.5162  0.2003  0.0384
  0.0800  0.0422  0.1995  0.5224  0.1559
  0.0800  0.0371  0.0345  0.1606  0.6879 ].
Show that money is not superneutral (consider θ ∈ {0, 5%, 10%}). Can you think of any reason for this result?
Chapter 9
Dynamics of the Distribution Function
9.1 Introduction

This chapter presents methods to compute the dynamics of an economy that is populated by heterogeneous agents. In Section 9.2, we show that this amounts to computing the law of motion for the distribution function F(ε, a) of wealth among agents. In Section 9.3, we concentrate on an economy without aggregate uncertainty. The initial distribution is not stationary. For example, this might be the case after a change in policy, e.g., after a change in the income tax schedule, or during a demographic transition, which many modern industrialized countries are currently experiencing. Given this initial distribution, we compute the transition to the new stationary equilibrium. With the methods developed in this section, we are able to answer questions regarding how the concentration of wealth evolves following a change in capital taxation or how the income distribution evolves following a change in the unemployment compensation system. In Section 9.4, we consider a model with aggregate risk. There are many ways to introduce aggregate risk, but we will focus on a simple case. We distinguish good and bad times that we identify with booms and recessions during the business cycle. In good times, employment probabilities increase and productivity rises. The opposite holds during a recession. To solve for the dynamics of the stochastic heterogeneous-agent neoclassical growth model, we introduce you to the method developed by Krusell and Smith (1998). In Section 9.5, we study two prominent applications from the literature and compute the income and wealth distribution dynamics over the business cycle. While the first application focuses on the welfare effects from business-cycle fluctuations, the second application considers the cyclical dynamics of the income shares.
9.2 Motivation

In the previous chapter, we focused on the case of a stationary equilibrium where the distribution of wealth is invariant. If we want to compute the nonstationary state of an economy, we face severe problems. Consider Example 8.4.1 that we restate for your convenience. However, we will now also consider the case in which the economy is not in the stationary equilibrium. In our illustrative example, households are either employed (ε_t = e) or unemployed (ε_t = u) in period t, receiving wage income w_t or unemployment compensation b_t, respectively. They pay income taxes τ_t on wage income, w_t, and interest income, r_t a_t. The households maximize intertemporal utility (8.27)

E_t Σ_{s=0}^∞ β^s u(c_{t+s}),

subject to the budget constraint (8.30)

a_{t+1} = { (1 + (1 − τ_t) r_t) a_t + (1 − τ_t) w_t − c_t, if ε_t = e,
           (1 + (1 − τ_t) r_t) a_t + b_t − c_t,          if ε_t = u,   (9.1)

the employment transition probability (8.29)

π(ε′|ε) = Prob(ε_{t+1} = ε′ | ε_t = ε) = [ p_uu  p_ue
                                           p_eu  p_ee ],   (9.2)

and the aggregate consistency condition (8.36a)

K_t = Σ_{ε_t∈{e,u}} ∫_{a_min}^∞ a_t f_t(ε_t, a_t) da_t,   (9.3)

where f_t denotes the density function associated with the distribution function F_t. Note further that, outside the stationary equilibrium, the income tax rate τ_t is no longer time-invariant and adjusts to balance the fiscal budget:

τ_t Y_t = b_t (1 − N_t),   (9.4)

where Y_t = K_t^α L_t^{1−α} is aggregate production and N_t = L_t denotes both the number of employed households and aggregate labor supply because we assume an inelastic labor supply of one unit.
The dynamics of the distribution are described by

F_{t+1}(ε_{t+1}, a_{t+1}) = Σ_{ε_t∈{e,u}} π(ε_{t+1}|ε_t) F_t(ε_t, a_{t+1}^{−1}(ε_t, a_{t+1})) =: G(F_t),   (9.5)

where a_{t+1}^{−1}(ε_t, a_{t+1}) is the inverse of the optimal policy a_{t+1} = a_{t+1}(ε_t, a_t) with regard to current-period wealth a_t. Again, we assume that a_{t+1}(·) is invertible, which will be the case in our example economy. Given the distribution function F_t, we can compute the aggregate capital stock K_t from (9.3) as in (8.44). Furthermore, (9.5) constitutes a functional equation because it describes a map G on a function space. The factor prices depend on the aggregate capital stock K_t and the aggregate employment L_t in period t:

w_t = w(K_t, L_t),   (9.6)
r_t = r(K_t, L_t).   (9.7)

The household's first-order condition with respect to its intertemporal consumption allocation depends on consumption in this and the next period, c_t and c_{t+1}:

u′(c_t) = β E_t [u′(c_{t+1}) (1 + (1 − τ_{t+1}) r_{t+1})].   (9.8)

Consumption in this and the next period follow from the budget constraint (9.1). What do we need to compute the solution to (9.8)? The household observes the following current-period aggregate variables: the aggregate capital stock K_t, aggregate employment L_t, the wage rate w_t, the interest rate r_t, the tax rate τ_t, and the distribution of the assets F_t(ε_t, a_t). Her individual state space consists of her employment status ε_t and her individual assets a_t. The solution of (9.8), as we will argue in the following, consists of a time-invariant function a′(ε, a, F) which gives her the optimal next-period capital stock a_{t+1} = a′(ε_t, a_t, F_t). Different from the optimal policy function in the computation of the stationary state in Chapter 8, we also include the distribution F as an additional argument.1 Why? In
In our specific model, aggregate employment L = N is constant, and we are able to drop it from the list of arguments. In other models with endogenous labor supply, L is also an additional argument of the policy functions, as we will argue below. See Problem 9.1 for the consideration of endogenous labor L as an additional argument in the dynamics of the economy.
Chapter 8, the aggregate capital stock K_t = K is constant. Therefore, the interest rate and the wage rate are also constant. In the present model, K_t and, hence, r_t, w_t, and τ_t are not constant. As the solution of (9.8) clearly depends on r_t (via c_t and the budget constraint), K_t also needs to be an argument of the policy function a_{t+1} = a′(·).2 K_t, however, can be computed with the help of the distribution F_t(ε_t, a_t) using (9.3). Now, we only need to explain why we also have to include the distribution of the individual states, F_t(ε_t, a_t), as an additional argument and do not only use the capital stock K_t instead. Consider again (9.8). The next-period interest rate r_{t+1} appears on the rhs of the equation. Therefore, the households need to predict r_{t+1}. In the stationary economy, this is not a problem: r_{t+1} = r. In the representative-agent economy without aggregate risk, it is sufficient to assume that individual households know the law of motion of the aggregate capital stock K_{t+1} = g(K_t) and observe the current aggregate capital stock K_t to be able to solve their decision problem.3 In the heterogeneous-agent economy, however, the individual household is unable to infer the value of the next-period aggregate capital stock K_{t+1} from the present capital stock K_t. She needs to predict the distribution of the individual state variables in the next period, F_{t+1}(ε_{t+1}, a_{t+1}), using F_{t+1}(·) = G(F_t(·)) to infer K_{t+1} from (9.3). As a consequence, the distribution F_t(ε_t, a_t) is also an argument of the policy function. In particular, if we consider different distributions F_t(ε_t, a_t) that are characterized by the same mean ā_t = K_t, we will have different next-period distributions F_{t+1}(ε_{t+1}, a_{t+1}) which only by chance will all have the same mean ā_{t+1} = K_{t+1}.4

We are now ready to formulate the recursive problem. We will omit the time index t from the variables in the definition of the recursive equilibrium to keep the notation as simple as possible. The household maximizes her value function:

v(ε, a, F) = max_c { u(c) + β E[v(ε′, a′, F′) | ε, F] },   (9.9)
subject to the budget constraint (9.1), the government policy (b, τ), the stochastic process of the employment status ε as given by (9.2), and the distribution dynamics (9.5). Again, the value function is a function of individual states ε and a and the distribution F (·). The distribution of 2
Alternatively, we could have used the variable r_t rather than K_t as an argument of a′(·).
3 See Ljungqvist and Sargent (2018), Chapter 12, for the case of aggregate uncertainty and endogenous labor supply.
4 In Section 8.2, we considered the special case of Gorman preferences where the distribution of individual wealth does not have any effect on the dynamics of the aggregate variables.
assets, F , however, is an infinite-dimensional object, and we cannot track it. Furthermore, finding the law of motion for the distribution, G(F ), is not trivial, as G is a map from the set of functions (an infinite dimensional space) into itself.
9.3 Transition Dynamics

In this section, we consider the transition dynamics for a given initial state in the economy with aggregate certainty as described by the following Example:5

Example 9.3.1 Households are allocated uniformly along the unit interval and are of measure one. The individual household maximizes

v(ε, a, F) = max_c { c^{1−η}/(1 − η) + β E[v(ε′, a′, F′) | ε, F] },

s.t.

a′ = { (1 + (1 − τ)r) a + (1 − τ)w − c, if ε = e,
       (1 + (1 − τ)r) a + b − c,        if ε = u,

a ≥ a_min,

π(ε′|ε) = Prob(ε_{t+1} = ε′ | ε_t = ε) = [ p_uu  p_ue
                                           p_eu  p_ee ].

The distribution F of (ε, a) is described by the following dynamics:

F′(ε′, a′) = Σ_{ε∈{e,u}} π(ε′|ε) F(ε, a′^{−1}(ε, a′, F)).

Factor prices are equal to their respective marginal products:

r = α (L/K)^{1−α} − δ,
w = (1 − α) (K/L)^α.

The aggregate consistency conditions hold:
5
Please recall that aggregate variables such as r, w, b, or τ vary over time, even though we omitted the time index.
K = Σ_{ε∈{e,u}} ∫_{a_min}^∞ a f(ε, a) da,
C = Σ_{ε∈{e,u}} ∫_{a_min}^∞ c f(ε, a) da,
T = τ(wL + rK),
B = ∫_{a_min}^∞ b f(u, a) da,
L = N,

where f is the density function associated with F. The government policy is characterized by a constant replacement rate ζ = b/((1 − τ)w) and a balanced budget: T = B.
We will introduce two ways to approximate the dynamics of the distribution (9.5). The first is to use partial information and was initially applied by Den Haan (1997) and Krusell and Smith (1998). The basic idea is that households do not use all the information at hand, i.e., the distribution F, but only use limited information about F, for example the first moment. By this device, we reduce the infinite-dimensional problem of finding a law of motion for F to a finite-dimensional problem. The second method is a shooting method, which is only applicable to models with aggregate certainty. In this case, one assumes that one reaches the new stationary equilibrium after T periods and projects a transition path for the prices {(w_t, r_t)}_{t=0}^T over the next T periods.6 Given the dynamics of prices and the optimal policy functions, we can compute the dynamics of the distribution. From this, we can update the time path for the factor prices until the algorithm converges. We will present the two approaches in turn. Both approaches assume that the stationary equilibrium is stable and that the distribution function converges to the invariant distribution function.
9.3.1 Partial Information

In this subsection, we assume that agents only use partial information to predict the law of motion for the state variable(s) or, equivalently,
6
If we considered aggregate uncertainty, we would have to project a distribution over the factor prices, which would again make the problem much more complicated.
are boundedly rational. Agents perceive the dynamics of the distribution F′ = G(F) in a simplified way. In particular, they characterize the distribution F by I statistics m = (m_1, . . . , m_I). In Chapter 8, we approximated the invariant distribution function with an exponential function. One might use the parameters ρ_i of the approximated exponential distribution function as statistics m_i, for example. In this section, we follow Krusell and Smith (1998) and use the moments of the distribution function instead. In particular, we concentrate our analysis on the simple case in which agents only use the first moment m_1, i.e., the aggregate capital stock K. Krusell and Smith (1998) find that the forecast error due to the omission of higher moments is extremely small.7 The economic intuition for this result is straightforward. Higher moments of the wealth distribution only have an effect on the aggregate next-period capital stock if agents of different wealth levels have different propensities to save out of wealth. However, most agents (except for very poor agents, who, of course, do not contribute much to total savings) have approximately the same savings rate.8 Therefore, the omission of higher moments is justified for the present case. Accordingly, we assume that agents perceive the law of motion for m as follows:

m′ = H_I(m)   (9.10)

with I = 1. Given the law of motion for m and the initial value of m, each agent optimizes her intertemporal consumption allocation by solving the following problem:

v(ε, a, m) = max_c { u(c) + β E[v(ε′, a′, m′) | ε, m] },   (9.11)

subject to the budget constraint (9.1), the government policy (b, τ), the stochastic process of the employment status ε as given by (9.2), and the distribution dynamics (9.10). Again, the factor prices are computed as functions of the aggregate capital stock and employment, w = w(K, L) and
7 In Problem 9.1, you will be asked to verify this hypothesis. Young (2005) notes that higher moments neither influence nor are influenced by the mean in this class of models. Therefore, they do not affect the forecasting of prices. In addition, he finds that the algorithm is robust to changes in the demographic structure, preferences, and curvature in the savings return.
8 Empirically, high-income households save a larger fraction than low-income households in the US. Huggett and Ventura (2000), however, show that age and relative permanent earnings differences across households together with the social security system are sufficient to replicate this fact. All these factors are absent from the model in Example 9.3.1.
r = r(K, L), where the aggregate capital stock is given by the first moment of the distribution, K = m_1.9 Similarly, we can compute the income tax rate τ and the unemployment compensation b from the balanced budget and, for a given replacement rate ζ, aggregate capital K and employment L (with L = N) for every period t:

τ K^α L^{1−α} = T = B = (1 − N) b,
b = ζ(1 − τ)w = ζ(1 − τ)(1 − α)(K/L)^α.
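Because wL = (1 − α)K^α L^{1−α}, these two conditions can be solved in closed form: with L = N and A = ζ(1 − α)(1 − N)/N, the income tax rate is τ = A/(1 + A). The following Python sketch of this step is our own illustration (function names and default values are assumptions; the defaults follow the calibration given below):

```python
def prices_and_policy(K, N, zeta, alpha=0.36, delta=0.005):
    """Factor prices, income tax rate, and unemployment benefit for given
    aggregate capital K, employment L = N, and replacement rate zeta.

    Uses r = alpha*(L/K)**(1-alpha) - delta and w = (1-alpha)*(K/L)**alpha and
    solves tau*K**alpha*L**(1-alpha) = (1-N)*b with b = zeta*(1-tau)*w, which
    reduces to tau = A/(1+A) with A = zeta*(1-alpha)*(1-N)/N.
    """
    L = N
    r = alpha * (L / K)**(1 - alpha) - delta
    w = (1 - alpha) * (K / L)**alpha
    A = zeta * (1 - alpha) * (1 - N) / N
    tau = A / (1 + A)
    b = zeta * (1 - tau) * w
    return r, w, tau, b

# example call with illustrative values (K and zeta as in the calibration below):
# prices_and_policy(K=243.7, N=0.92, zeta=0.2588)
```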
The remaining problem is to approximate the law of motion for the moments m of the distribution, m = m_1 = K. We will choose a simple parameterized functional form for H_1(m) following Krusell and Smith (1998):

ln K′ = γ_0 + γ_1 ln K.   (9.12)
Given the function H_1, we can solve the consumer's problem and compute optimal decision functions. For a given initial distribution F_0 with mean K_0, we can simulate the behavior of the economy over time and, in particular, are able to compute the law of motion for K and compare it to our projection (9.12). If the goodness of fit is not satisfactory, we might want to consider different functional forms for H_I or a higher order I. As it turns out, one moment, I = 1, and the functional form (9.12) are quite satisfactory. The algorithm can be described by the following steps:

Algorithm 9.3.1 (Transition Dynamics with Bounded Rationality)

Purpose: Computation of the transition dynamics of the distribution function for Example 9.3.1 with given initial distribution F_0 and the dynamics presented by (9.10).

Steps:

Step 1: Choose the initial distribution of assets F_0 with mean K_0.
Step 2: Choose the order I of moments m.
Step 3: Guess a parameterized functional form for H_I, and choose initial parameters of H_I.
Step 4: Solve the consumer's optimization problem, and compute v(ε, a, m).
Step 5: Simulate the dynamics of the distribution.
Step 6: Use the time path for the distribution to estimate the law of motion for the moments m.
9
Again, we drop the time index from the variables to keep the notation simple.
Step 7: Iterate until the parameters of H_I converge.
Step 8: Test the goodness of fit for H_I. If the fit is satisfactory, stop; otherwise, increase I, or choose a different functional form for H_I.
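For the present model, Step 6 boils down to an ordinary least squares regression of ln K′ on a constant and ln K along the simulated capital path. A minimal Python sketch of this step is our own illustration (the book's implementation is part of the GAUSS program discussed below); as explained later in this section, only the first 1,000 simulated observations are used:

```python
import numpy as np

def update_law_of_motion(K_path, n_obs=1000):
    """OLS estimate of ln K' = gamma0 + gamma1 * ln K from a simulated capital path.

    Only the first n_obs observations are used, since close to the stationary
    equilibrium the pairs (K, K') show almost no variation.
    """
    lnK = np.log(np.asarray(K_path[:n_obs]))
    X = np.column_stack((np.ones(len(lnK) - 1), lnK[:-1]))   # regressors: constant, ln K_t
    y = lnK[1:]                                               # dependent variable: ln K_{t+1}
    gamma0, gamma1 = np.linalg.lstsq(X, y, rcond=None)[0]
    K_bar = np.exp(gamma0 / (1.0 - gamma1))                   # implied stationary capital stock
    return gamma0, gamma1, K_bar
```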
CALIBRATION. The model of Example 9.3.1 is calibrated as the model of Example 8.4.1. In particular, the parameter values are given by α = 0.36, β = 0.995, η = 2, δ = 0.005, and the employment transition matrix:

[ p_uu  p_ue ]   [ 0.500   0.500  ]
[ p_eu  p_ee ] = [ 0.0435  0.9565 ].

Different from Example 8.4.1, we do not need to fix unemployment benefits b to ensure convergence of the algorithm but can instead use the replacement rate of unemployment benefits with respect to wage income, ζ. This facilitates the calibration because the replacement rate ζ is readily observable from empirical data, contrary to the absolute amount of unemployment benefits b. For ζ, we will use the value of the endogenous replacement rate computed in Example 8.4.1, implying ζ = 25.88%. The minimum wealth a_min, again, is set equal to −2.

COMPUTATION. The algorithm for the computation of the transition dynamics is implemented in the GAUSS program transition_part.g.10 We choose an equispaced grid A = {a_1, . . . , a_n} = {−2, . . . , 3,000} for wealth a ∈ A with n = 201 nodes as in the computation of Example 8.4.1. We approximate the distribution over the same interval but using n_g = 603 points. For the aggregate capital stock K, we also choose an equispaced grid K = {K_1, . . . , K_{n_K}} = {140, . . . , 340}. The grid for the aggregate capital stock K consists of n_K = 6 nodes, and its minimum and maximum values are approximately equal to the stationary equilibrium value of the capital stock, K = 243.7 ± 100. In Step 1, we have to initialize the distribution function. We assume that in time period t = 0, the distribution is uniform over an interval approximately equal to [−2, 300]. The grid points that are closest to these values are −2 and 297.2, implying an aggregate capital stock equal to the mean K̄ = 147.6.
We also provide the Python code transition_part.py.
If we considered a policy change, e.g., an increase in unemployment benefits b, we would have computed the invariant distribution of wealth prior to the policy change with the help of the methods developed in the previous chapter and would have used this distribution for our initialization of F. In Step 2, we set the order I equal to one, i.e., households only use the first moment of the wealth distribution as information about the distribution F. In Step 3, we choose the log-linear law of motion (9.12) for the capital stock and initialize the parameters γ_0 = 0.09 and γ_1 = 0.95.11 For the solution of the consumer's optimization problem in Step 4, we resort to the methods presented in the first part of this book. In the computer program transition_part.g, we use value function iteration with linear interpolation. Our objective is to compute the value function v(ε, a, K) for ε = e, u. As we only consider the first moment of the distribution F, the value function is only a function of the employment status ε, the individual wealth a, and the aggregate capital stock K. Given the coarse grid for K, we again use linear interpolation to approximate the value function at points off the grid. The initial value functions for the employed agent, v(e, a, K), and the unemployed agent, v(u, a, K), are computed at the grid points (a_i, K_j), i = 1, . . . , n and j = 1, . . . , n_K, assuming that agents consume their current income permanently:

v_0(e, a_i, K_j) = Σ_{t=0}^∞ β^t u((1 − τ) r(K_j) a_i + (1 − τ) w(K_j)) = [1/(1 − β)] u((1 − τ) r(K_j) a_i + (1 − τ) w(K_j)),
v_0(u, a_i, K_j) = [1/(1 − β)] u((1 − τ) r(K_j) a_i + b(K_j)).
The interest rate and the wage rate, of course, are functions of the aggregate capital stock K j (and so are the unemployment benefits b = ζ(1−τ)w(K j )). For a given value function in iteration l, we can compute the value function of the employed agent, for example, in the next iteration l + 1 11
Our choice of parameters (γ0 , γ1 ) is loosely motivated by the solution of Krusell and Smith (1998) who, for example, compute γ1 equal to 0.951 and 0.953 in bad and good times, respectively. Alternatively, one could start with a value of γ1 from the policy solution of the linearized neoclassical growth model. For example, Heer (2019), p. 25, finds a value of γ1 = 0.969 which fulfills the stability condition γ1 < 1. In steady state, (9.12) then implies γ0 = (1 − γ1 ) ln K = 0.031 · ln(243.7) = 0.170. You are encouraged to use these alternative starting values for (γ0 , γ1 ) in your computations with the program transition_part.g.
from:

v_{l+1}(e, a_i, K_j) = max_c { u(c) + β [ p_ee v_l(e, a′, e^{γ_0 + γ_1 ln K_j}) + p_eu v_l(u, a′, e^{γ_0 + γ_1 ln K_j}) ] }

with

c = (1 + (1 − τ) r(K_j)) a_i + (1 − τ) w(K_j) − a′.
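A compact way to organize this update is to evaluate the next-period value functions at K′ = e^{γ_0+γ_1 ln K_j} once per outer iteration, as described below for the vectors ve1 and vu1. The following Python sketch of one such update for the employed agent is our own illustration (array names, the interpolation helper, and the maximization over a grid of candidate a′ values are assumptions, not the book's GAUSS code, which uses golden section search instead):

```python
import numpy as np

def bellman_update_employed(v_e, v_u, agrid, Kgrid, j, r, w, tau, gamma0, gamma1,
                            beta, eta, p_ee, p_eu):
    """One update of v(e, a, K_j) given the perceived law of motion (9.12).

    v_e and v_u have shape (n_a, n_K). The next-period value is interpolated at
    K' = exp(gamma0 + gamma1*log(K_j)) once; the maximization over a' is a
    simple search over the asset grid.
    """
    Kprime = np.exp(gamma0 + gamma1 * np.log(Kgrid[j]))
    # interpolate v(., a', K') over the K dimension for all a' (the vectors ve1, vu1)
    ve1 = np.array([np.interp(Kprime, Kgrid, v_e[i, :]) for i in range(len(agrid))])
    vu1 = np.array([np.interp(Kprime, Kgrid, v_u[i, :]) for i in range(len(agrid))])
    ev = p_ee * ve1 + p_eu * vu1                            # expected continuation value
    v_new = np.empty(len(agrid))
    for i, a in enumerate(agrid):
        c = (1 + (1 - tau) * r) * a + (1 - tau) * w - agrid  # consumption for each a'
        util = np.full_like(c, -1e10)                        # penalize infeasible choices
        feasible = c > 0
        util[feasible] = c[feasible]**(1 - eta) / (1 - eta)
        v_new[i] = np.max(util + beta * ev)
    return v_new
```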
The value function is computed for every aggregate capital stock K_j ∈ K and individual wealth a_i ∈ A. The outer loop of the iteration is over the capital stock K_j. Given K = K_j, we can compute the factor prices w and r, unemployment compensation b, income taxes τ, and the next-period capital stock K′. For given w, r, b, τ, and K′, we can compute the value functions v(e, a, K_j) and v(u, a, K_j) at every grid point a = a_i. Note that we do not have to compute the functions v(e, a′, K′) and v(u, a′, K′) on the rhs of the Bellman equation for a′ ∈ A at each iteration over the individual wealth a_i ∈ A, but only once before we start the iteration over a, because we know K′ in advance (after applying (9.12)). In the program transition_part.g, we store the value functions v(e, a′, K′) and v(u, a′, K′) for K′ = K′(K_j) and a′ = a_1, . . . , a_n in the vectors ve1 and vu1, respectively, before we start the iteration over a ∈ A. To find the optimum a′, we only need to use the values from these one-dimensional vectors (or interpolate linearly between two values a_m < a′ < a_{m+1} of these vectors, respectively). The maximization of the rhs of the Bellman equation is performed using the golden section search procedure explained in Section 15.4.1.

The computed value functions of the employed consumer for aggregate capital stocks K ∈ {140, 260, 340} are displayed in Figure 9.1. Since the value function of the unemployed worker displays the same behavior, we do not show this function. The value function is a concave increasing function of individual wealth a. For a low aggregate capital stock K = 140 (black line), the interest rate r is large and the wage rate w is low. For low wealth a, therefore, the value is lower than the value for a larger aggregate capital stock K = 260 or K = 340 (the blue and red lines). At an intermediate wealth level a approximately equal to 240, the value function of the agents is again higher for a low capital stock K = 140 than in the other cases with a high capital stock K because interest income ra becomes a more important component of total income.

The savings behavior of the households is displayed in Figure 9.2. Savings increase with a higher interest rate r or, equally, a lower capital stock K.
[Figure 9.1 Value Function of the Employed Worker: v(e, a, K) plotted against wealth a for K = 140, 260, and 340]

[Figure 9.2 Savings of the Workers: a′ − a plotted against wealth a for the employed worker (K = 260, 340) and the unemployed worker (K = 260)]
The savings function a′ − a of the employed worker is presented by the black (blue) line for aggregate capital stock K = 260 (K = 340). Savings of the unemployed workers are smaller than those of the employed workers because they have lower wage income. The bottom curve in Figure 9.2 displays the savings function of the unemployed worker if the aggregate
capital stock is equal to K = 260. Evidently, the unemployed household dissaves at any wealth level a.

In Step 5, we compute the dynamics of the distribution function. Given F_0, we can compute F_t from F_{t−1} with the help of the agent's savings function using Step 3 in Algorithm 8.4.2. The optimal policy functions off grid points (a, K) are computed with the help of linear interpolation. The dynamics of the distribution function are displayed in Figure 9.3, which presents the distribution function of the employed workers at periods t = 0, t = 10, t = 100, and t = 2,000. After 2,000 iterations, the distribution function is stationary, and the transition is complete.
[Figure 9.3 Dynamics of the Density Function over Time: density f(e, a) of the employed workers over wealth a at t = 0, 10, 100, and 2000]
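The density update in Step 5 can be sketched in a few lines of code. The following Python fragment is our own illustration of the idea behind Step 3 of Algorithm 8.4.2 (not the book's GAUSS code): the mass at each grid point is moved to the two grid points that bracket the optimal next-period asset choice, split with linear weights, and re-sorted according to the employment transition probabilities.

```python
import numpy as np

def update_density(f, a_opt, agrid, pi):
    """One-period update of the discretized density f[s, i] for states s in {e, u}.

    a_opt[s, i] is the optimal next-period asset choice at grid point agrid[i];
    pi[s, s_next] are the employment transition probabilities. Mass is assigned
    to the two neighboring grid points of a_opt with linear weights.
    """
    n_s, n_a = f.shape
    f_new = np.zeros_like(f)
    for s in range(n_s):
        for i in range(n_a):
            if f[s, i] == 0.0:
                continue
            j = np.searchsorted(agrid, a_opt[s, i])          # right bracket index
            j = min(max(j, 1), n_a - 1)
            w_right = (a_opt[s, i] - agrid[j - 1]) / (agrid[j] - agrid[j - 1])
            w_right = min(max(w_right, 0.0), 1.0)
            for s_next in range(n_s):
                mass = pi[s, s_next] * f[s, i]
                f_new[s_next, j - 1] += (1.0 - w_right) * mass
                f_new[s_next, j] += w_right * mass
    return f_new
```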
The mean of the distribution is already constant after 1,000 iterations and amounts to K̄ = 246.9. The convergence of the distribution's mean is displayed in Figure 9.4. If you compare Figures 9.4 and 8.5, you cannot help noticing that the convergence of the distribution's mean (and of the higher moments) occurs much faster in Figure 9.4 than in Figure 8.5, i.e., the distribution function approaches the stationary distribution much faster in the case where we model the transition of the aggregate capital stock. What is the reason for this observation? Assume that we start with a distribution that has an initial mean ā_0 below the stationary mean K̄. In the computation of the stationary equilibrium in Chapter 8, we assumed that K_t = K̄ in every period, while we use the aggregate capital stock K_0 = ā_0 in Algorithm 9.3.1.
[Figure 9.4 Convergence of the Aggregate Capital Stock: K_t plotted against time t over 2,000 periods]
Accordingly, the interest rate is lower in the economy with a constant interest rate than in the economy where we model the transition dynamics. Therefore, agents have lower savings in each period in the economy with constant factor prices, and the mean of the distribution adjusts at a slower pace. Consequently, Step 5 in Algorithm 8.4.1 where we compute the stationary distribution is much more time-consuming than the corresponding Step 5 in Algorithm 9.3.1. However, it would be wrong to conclude that Algorithm 9.3.1 is faster than Algorithm 8.4.1. Importantly, when using Algorithm 9.3.1, we 1) need to iterate over the law of motion for the aggregate capital stock and 2) compute the policy functions for a state space that is characterized by a higher dimension. As a consequence, the computational time may be much higher in the latter case. In our example, computational time amounts to 1 hour, 18 minutes, and 17 seconds on an Intel(R) Xeon(R), 2.90 GHz machine, which is approximately equal to the runtime of Algorithm 8.4.3 presented in Table 8.1. However, we have to recognize that we only use a very coarse grid over the aggregate capital stock K in the computation of the value function. Therefore, if we only seek to compute stationary equilibria, we would rather apply the methods presented in Chapter 8, bearing in mind that Algorithm 8.4.3 may not necessarily converge (as we experienced in the case in which we did not fix unemployment compensation b but adjusted it with the help of the equilibrium fiscal budget).

We can use the time path of the capital stock displayed in Figure 9.4 to update the coefficients γ_0 and γ_1 (Steps 6 and 7). We apply ordinary least squares (OLS) regression to compute the two coefficients (Step 6). However, we only use the first 1,000 values for the capital stock {K_t}_{t=0}^T.
Close to the stationary value of K, we only have observation points (K, K′) where K and K′ are almost equal and display little variation. In this regression, we obtain a very high R², almost equal to one, R² = 1.000. The computed dynamics (K, K′) (simulated) and the regression line (as predicted by the households with the help of (9.12)) are almost identical. Obviously, the fit is extremely good.12 In Step 7, we update the parameters γ_0 and γ_1 until they converge. The final solution for the law of motion for the capital stock is given by

ln K′ = 0.0425835 + 0.9922636 ln K.   (9.13)

This equation implies a stationary capital stock equal to K̄ = e^{γ_0/(1−γ_1)} = 245.8, which is somewhat lower than that computed from the simulation (K̄ = 246.9). For γ_1 close to one, small errors in the estimation of γ_i, i = 0, 1, imply large errors in the computation of K̄. For γ_0 = 0.04262 and γ_1 = 0.9922636, the stationary capital stock is already equal to K̄ = 246.9.

In the present model, K is a sufficient predictor for factor prices and taxes. We can compute the wage rate w, the interest rate r, the tax rate τ that balances the government budget, and the unemployment compensation b if we only know K. In many economic applications, however, the distribution of wealth and its mean are not a sufficient statistic for factor prices. Consider the case of an elastic labor supply. Households maximize their utility by their choice of leisure. For example, assume instantaneous utility to be of the form

u(c, 1 − l) = [c (1 − l)^θ]^{1−η} / (1 − η),   (9.14)
where l denotes labor supply and 1 − l is leisure (the time endowment of the household is normalized to one). The labor income of the employed worker is simply the net wage rate times the working hours, (1 − τ)wl, and aggregate labor L in period t is given by

L = ∫_{a_min}^∞ l(a; K, L) f(e, a) da,   (9.15)

where the labor supply of the unemployed is equal to zero. In this case, individual labor supply depends on individual wealth a, and consequently,13
12 We discuss the use of R² as an accuracy measure in the next section, where we introduce uncertainty into the model.
aggregate labor supply L depends on the distribution of wealth.13 In this case, we also need to estimate a prediction function for aggregate labor

L′ = J(L, K),   (9.16)

that, for example, might take the log-linear form ln L′ = ψ_0 + ψ_1 ln L + ψ_2 ln K. The household maximizes intertemporal utility subject to the additional constraint (9.16), and the value function v(ε, a, K, L) has aggregate labor L as an additional argument. Alternatively, you may attempt to specify aggregate employment L′ as a function of next-period capital K′:

L′ = J̃(K′).   (9.17)

The latter specification has the advantage that the state space is smaller; aggregate employment L is no longer a state variable. Current-period capital K is used to forecast K′, which, in turn, is used to forecast L′. You should choose the specification that provides a better fit as, for example, measured by R². Of course, the method developed in this chapter is still applicable to such more complicated problems, and you will be asked to solve the growth model with endogenous labor supply in Problem 9.1 using (9.16). In Section 11.3.4, we will apply the Krusell-Smith algorithm to the solution of an OLG model using the forecasting function (9.17). In particular, we will choose the functional form ln L′ = ψ_0 + ψ_1 ln K′.
9.3.2 Guessing a Finite Time Path for the Factor Prices

In the previous section, we computed the value function as a function of the aggregate capital stock. If the model becomes more complex, e.g., if we consider endogenous labor supply, endogenous technology, or multiple financial assets, the number of arguments in the value function rises, and the computation becomes more cumbersome. In this section, we introduce another method for computing the transition path that only considers the individual variables as arguments of the value function (or policy functions). The only additional variable of both the value function and the policy functions is time t. The method presented in this section, however, is only applicable to deterministic economies.
Individual labor supply also depends on the wage rate and, hence, on K and L.
Again, we consider the transition to a stationary equilibrium. For the computation, we assume that the stationary equilibrium is reached in finite time, after T periods. Typically, we choose T large enough, say T = 1,000 or higher. Furthermore, we can compute the stationary equilibrium at period t ≥ T with the help of the methods developed in the previous chapter. We also know the distribution of wealth in the initial period t = 0 and, therefore, the aggregate capital stock and the factor prices in period t = 0 and t = T. To compute the policy functions during the transition, we need to know the time path of the factor prices or, equivalently, the time path of the aggregate capital stock. We start with an initial guess for the time path of the factor prices and compute the decision functions, and with the help of the initial distribution and the computed decision functions, we are able to compute the implied time path of the factor prices. If the initial guess of the factor prices is different from the values implied by our simulation, we update the guess accordingly. The algorithm can be described by the following steps:14

Algorithm 9.3.2 (Computation of Example 9.3.1)

Purpose: Computation of the transition dynamics by guessing a finite-time path for the factor prices

Steps:

Step 1: Choose the number of transition periods T.
Step 2: Compute the stationary distribution F̃ of the new stationary equilibrium. Initialize the first-period distribution function F_0.
Step 3: Guess a time path for the factor prices r and w, unemployment compensation b, and the income tax rate τ that balances the budget. The values of these variables in both periods t = 0 and t = T are implied by the initial and stationary distribution, respectively.
Step 4: Compute the optimal decision functions using the guess for the interest rate r, wage income w, tax rate τ, and the unemployment compensation b. Iterate backwards in time, t = T − 1, . . . , 0.
Step 5: Simulate the dynamics of the distribution with the help of the optimal policy functions and the initial distribution for the transition from t = 0 to t = T.
Step 6: Compute the time path for the interest rate r, the wage w, unemployment compensation b, and the income tax rate τ, and return to Step 3, if necessary.
The algorithm follows Ríos-Rull (1999) with some minor modifications.
Step 7: Compare the simulated distribution F_T with the stationary distribution function F̃. If the goodness of fit is poor, increase the number of transition periods T.

In Step 4, we compute the optimal policy functions by backward iteration. In period T, we know the new stationary distribution, optimal policy functions, and the factor prices. For periods t = T − 1, . . . , 1, we may recursively compute the policy functions c_t(ε_t, a_t) and a_{t+1}(ε_t, a_t) for consumption and next-period assets with the methods developed in Part I of this book.15 For example, we may compute c_t(ε_t, a_t) and a_{t+1}(ε_t, a_t) for given policy functions c_{t+1}(ε_{t+1}, a_{t+1}) and a_{t+2}(ε_{t+1}, a_{t+1}) from the Euler equation (8.31) with the help of projection methods:16

u′(c_t(ε_t, a_t)) = β E_t [u′(c_{t+1}(ε_{t+1}, a_{t+1})) (1 + (1 − τ_{t+1}) r_{t+1})],   ε_t = e, u,   (9.18)

with

c_t(e, a_t) = (1 + r_t(1 − τ_t)) a_t + (1 − τ_t) w_t − a_{t+1}(e, a_t),
c_t(u, a_t) = (1 + r_t(1 − τ_t)) a_t + b_t − a_{t+1}(u, a_t)

for the employed and unemployed worker, respectively. Alternatively, we may compute the optimal policy functions with value function iteration from the Bellman equation (8.35):

v_t(ε_t, a_t) = max_{c_t} [u(c_t) + β E_t {v_{t+1}(ε_{t+1}, a_{t+1}) | ε_t}].   (9.19)
In period t = T − 1, again, we know the optimal next-period consumption policy c T and the value function vT , which are equal to the stationary optimal consumption policy and value function, respectively. Note that to compute the policy functions from (9.19), we need to store vt , t = 1, . . . , T , but the value function is only a function of individual state variables ε and a. We iterate backwards in time and compute c t given vt+1 . Once we have computed the optimal consumption policy for t = 0, . . . , T , with the help of the value function or the Euler equation, it is straightforward to simulate the behavior of the economy with the help of the 15
15 Note that with the present method, the policy functions are no longer time-invariant. Optimal consumption c_t(·) depends on the period t via the factor prices w_t and r_t, which are not arguments of the policy function. Therefore, we have to compute the optimal policy function in every period t = 0, . . . , T.
16 You will be asked to compute the transition dynamics using projection methods for the computation of the policy functions in Problem 9.1.
first-period distribution function and compute the time path for the capital stock {K_t}_{t=0}^T.

COMPUTATION. The algorithm is implemented in program transition_guess.g. The model of Example 9.3.1 is calibrated in exactly the same way as in the previous section, and we also choose the same grid over the asset space A for the value function and the distribution function as in program transition_part.g. This is very convenient, as it allows us to load the stationary policy function and distribution function as an input into the computation.17 The transition time is set equal to T = 2,000 periods. In Step 2, the initial distribution is chosen to be the uniform distribution over the interval [−2, 297.2] as in the previous section.

There are various ways to update the time path for {K_t}_{t=0}^T in Step 6. We may resort to a parameterized recursive equation (9.12) as in the previous section and adjust the parameters γ_0 and γ_1 as in program transition_part.g. Alternatively, we may use a tatonnement algorithm, guessing an initial sequence {K_t}_{t=0}^T and updating it after each iteration i:

K_t^i = (1 − φ) K_t^{i−1} + φ K̃_t^i,   t = 1, . . . , T − 1,
˜ i denotes the aggregate capital stock that results from summing up where K t the individual assets a t (·) in iteration i. This approach is used in program transition_guess.g. A third approach is to use any standard nonlinear equation solution method, e.g., Newton’s method, to find the sequence {K t } Tt=0 that implies the same sequence for the simulated model.18 In the present case, the algorithm converges after 26 iterations over the sequence t=2,000 {K t } t=0 . The computational time amounts to 1 hour, 28 minutes and is longer than that of Algorithm 9.3.1. Further note that, different from Algorithm 9.3.1, Algorithm 9.3.2 is using the new steady-state distribution as an input that requires an additional 1 hour, 30 minutes of computation, while Algorithm 9.3.1 might need some time-consuming experimentation with an educated guess for the law of motion for the moments. The simulated time path and the projected time path of the capital stock are almost identical, and the deviation only amounts to 0.1% on average during the transition. 17
The function and distribution are computed with the help of program IVDenF.g, as described in Chapter 8. 18 In Section 11.3.4, we will study the transition for an overlapping generations model and use a quasi-Newton method to update the factor price time series.
504
9 Dynamics of the Distribution Function
The dynamics of the distribution over time are displayed in Figure 9.5. From the initial uniform distribution (black line), the distribution slowly converges to the final distribution (brown) in period T = 2, 000.
0.004
f (e, a)
0.003
t t t t
=0 = 10 = 100 = 2000
600
700
0.002 0.001 0.000 −2
100
200
300
400
500
a
Figure 9.5 The Dynamics of the Density Function
stationary partial after t = 2000 guess after t = 2000
0.004
f (e, a)
0.003 0.002 0.001 0.000 −2
100
200
300
400
a
Figure 9.6 Goodness of Fit for Stationary Density
500
600
700
9.4 Aggregate Uncertainty: The Krusell-Smith Algorithm
505
The distributions at the end of the transition for both Algorithms 9.3.1 and 9.3.2 are compared to the new stationary distribution of wealth a in Figure 9.6. The three distributions have almost the same means, which deviate from one another by less than 1%. However, the second moments vary because the right tail of the density functions after 2,000 periods is thinner than that of the new stationary distribution. A longer transition period may even improve the fit.
9.4 Aggregate Uncertainty: The Krusell-Smith Algorithm Thus far, we have only considered individual risk in this chapter. Agents faced an idiosyncratic risk of becoming unemployed, while the real interest rate and the factor prices were constant in the stationary state. Only during the transition to the new steady state did factor prices vary. In this section, we also examine aggregate risk and consider the algorithm of Krusell and Smith (1998).19 Due to the presence of aggregate uncertainty, there are three major changes in the computation of the model compared to that in Section 9.3: 1) The distribution of wealth is not stationary. 2) The employment level L and aggregate capital K fluctuate. 3) If we approximate the distribution function of wealth by its first I moments, for example, the value function is a function of the employment status ε, individual wealth a, the first I moments of wealth, and, in addition, aggregate technology Z.
9.4.1 The Economy As in Chapter 1, aggregate risk is introduced by a stochastic technology level Z t in period t. In particular, the productivity shock follows a Markov process with transition matrix Γ Z (Z 0 |Z), where Z 0 denotes next-period technology level and π Z Z 0 denotes the transition probability from state Z to Z 0 . This assumption is not very restrictive. Given empirical evidence, we 19
The algorithm is considered in more detail in a special issue of the Journal of Economic Dynamics and Control (2010, vol. 34). Den Haan (2010b) provides a summary of the different numerical methods to solve the incomplete markets model with aggregate uncertainty that are published in this issue and compares the behavior of the individual variables (in particular, during exceptional times) and the high-order moments of the cross-sectional distribution.
506
9 Dynamics of the Distribution Function
assumed in Chapter 1 that productivity Z t followed an AR(1)-process. As you also learn in Section 16.4, an AR(1)-process can easily be approximated by a finite Markov chain.20 With stochastic technology level Z t , aggregate production is given by: Yt = Z t K tα L 1−α . t
(9.20)
We assume competitive factor and product markets, implying the factor prices: w t = Z t (1 − α)K tα L −α t , rt =
Z t αK tα−1 L 1−α t
(9.21a) (9.21b)
− δ.
The individual employment probabilities, of course, depend on the aggregate productivity Z t . In good times (high productivity Z t ), agents have higher employment probabilities than in bad times. The joint process of the two shocks, Z t and ε t , can be written as a Markov process with transition 0 0 matrix Γ (Z , ε )|(Z, ε) . We use p(Z,ε),(Z 0 ,ε0 ) to denote the probability of transition from state (Z, ε) to state (Z 0 , ε0 ). In the following, we restrict our attention to a very simple example. The economy only experiences good and bad times with technology levels Z g and Z b , respectively, where Z g > Z b . As before, agents are either employed (ε = e) or unemployed (ε = u). Consequently, the joint processes on (Z, ε) are Markov chains with 4 states. Households are assumed to know the law of motion of both ε t and Z t , and they observe the realization of both stochastic processes at the beginning of each period. In addition, the model is identical to that in Example 9.3.1 and is summarized in the following: Example 9.4.1 Households are of measure one. The individual household maximizes v(ε, a, Z, F ) = max c
s.t. a0 = 20
§
1−η
ct
1−η
+ βE v(ε , a , Z , F ) ε, Z, F ,
0
0
0
0
(1 + (1 − τ)r) a + (1 − τ)w − c, if ε = e, (1 + (1 − τ)r) a + b − c, if ε = u,
In Problem 9.2, you will be asked to compute the solution for a heterogeneous-agent economy with aggregate uncertainty where productivity shocks follow an AR(1)-process using Tauchen’s and Rouwenhorst’s methods.
9.4 Aggregate Uncertainty: The Krusell-Smith Algorithm
507
a ≥ amin ,
Γ (Z 0 , ε0 )|(Z, ε) = P r o b Z t+1 = Z 0 , ε t+1 = ε0 | (Z t = Z, ε t = ε) p(Z g ,e),(Z g ,e) p(Z g ,e),(Z g ,u) p(Z g ,e),(Z b ,e) p(Z g ,e),(Z b ,u) p g p g p g g g b b p g = (Z ,u),(Z ,e) (Z ,u),(Z ,u) (Z ,u),(Z ,e) (Z ,u),(Z ,u) , p(Z b ,e),(Z g ,e) p(Z b ,e),(Z g ,u) p(Z b ,e),(Z b ,e) p(Z b ,e),(Z b ,u) p(Z b ,u),(Z g ,e) p(Z b ,u),(Z g ,u) p(Z b ,u),(Z b ,e) p(Z b ,u),(Z b ,u)
where, for example, p(Z g ,u),(Z b ,e) denotes the probability that the unemployed worker in period t with aggregate technology level Z = Z g becomes employed in period t + 1 and next-period technology is equal to Z 0 = Z b . The distribution of the individual states (ε, a) for given aggregate technology Z in period t is denoted by F (ε, a, Z). The dynamics of the distribution of the individual states are described by the following equations: X F 0 (ε0 , a0 , Z 0 ) = Γ (Z 0 , ε0 |Z, ε)F (ε, a, Z), ε
−1
where a = a0 (e, a0 , F, Z) is the inverse of the optimal policy function a0 = a0 (ε, a, F, Z) with respect to individual wealth a. Factor prices are equal to their respective marginal products: 1−α L r = αZ t − δ, (9.22a) K α K w = (1 − α)Z . (9.22b) L The aggregate consistency conditions hold: XZ ∞ K=
amin
ε
Z L= C=
∞ amin
XZ ε
a f (ε, a, Z) d a,
f (e, a, Z) d a, ∞ amin
c(ε, a, Z) f (ε, a, Z) d a,
T = τ(wL + r K), Z∞ B=
amin
b f (u, a, Z) d a.
(9.23a) (9.23b) (9.23c) (9.23d) (9.23e)
Government policy is characterized by a constant replacement rate ζ = b/(1−τ)w and a balanced budget: T = B.
Notice that the individual policy and value functions are functions of the individual state variables, assets a and employment status ε, and the
508
9 Dynamics of the Distribution Function
distribution function F . With the help of the distribution function F , the individual is able to compute aggregate capital stock K and employment L using (9.23a) and (9.23b) which, in turn, also imply the factor prices r and w as well as the unemployment compensation b and taxes τ via (9.22a),(9.22b), (9.23e), and (9.23d).
9.4.2 Computation Individual employment probabilities depend on both the current employment status ε and the current- and next-period productivity, Z and Z 0 . Given an employment distribution in period t, the next-period employment distribution depends on the technology level Z 0 because agents have a higher job finding probability in good times, Z 0 = Z g , than in bad times, Z 0 = Z b. We will simplify the analysis further following Krusell and Smith (1998). In particular, we assume that the unemployment rate takes only two values u g and u b in good times and in bad times, respectively, with u g < u b . To accordingly simplify the dynamics of aggregate employment, the following restrictions have to be imposed on the transition matrix Γ : uZ
p(Z,u),(Z 0 ,u) pZ Z 0
+ (1 − u Z )
p(Z,e),(Z 0 ,u) pZ Z 0
= uZ 0 ,
(9.24)
for Z, Z 0 ∈ {Z g , Z b }. (9.24) formulates a condition on the conditional probabilities to move from state (Z, u) in period t to the state (Z 0 , u) in period t + 1 if aggregate technology changes from Z to Z 0 .21 It implies that unemployment is u g and u b if Z 0 = Z g and Z 0 = Z b , respectively. In comparison with Example 9.3.1, the household’s value function has an additional argument, the technology level Z. The Bellman equation can be formulated as follows: v(ε, a, Z, F ) = max u(c) + βE v(ε0 , a0 , Z 0 , F 0 ) ε, Z, F . c
The additional state variable Z has a finite number of values, while the distribution function F is an infinite-dimensional object. Therefore, we will apply the same finite approximation of F as in Section 9.3.1 and use the first I moments of the distribution. In particular, the household is assumed 21
The transition probability p Z Z 0 can easily be recovered from the transition matrix Γ . For example, p Z g Z g is equal to the sum of the elements in the upper left 2 × 2 submatrix of Γ .
9.4 Aggregate Uncertainty: The Krusell-Smith Algorithm
509
to be boundedly rational and to use only the first I moments m to predict the law of motion for the distribution F (·) with m1 = K:22 v(ε, a, Z, m) = max u(c) + βE v(ε0 , a0 , Z 0 , m0 ) ε, Z, m . c
We continue to apply value function iteration to compute the optimal household policy functions. In the iteration of the value function, we make use of two distinctive properties. First, we do not need to obtain high accuracy of the value function during the initial iterations over the law of the capital stock dynamics. Since we perform several outer loops over F 0 = G(F ) and use the value function found in the last iteration of the inner loop over v(ε, a, Z, m) in the previous outer loop, the value function approximation improves in each outer loop. Second, as noted by Krusell and Smith (1998), the optimal policy function a0 (ε, a, Z, m) converges faster than the value function v(ε, a, Z, m). Since we only need to know the policy function for the simulation of the economy, we do not need to compute the value function with high accuracy.23 In an economy with aggregate uncertainty, the distribution of capital is not stationary. The household’s income and savings depend on the aggregate productivity level, and for this reason, the distribution of capital changes over time. Similarly, the law of motion of the aggregate capital stock depends on the productivity level Z, and (9.10) needs to be modified: m0 = H I (m, Z).
(9.25)
In our economy with Z ∈ {Z g , Z b }, we will again analyze the simple case where the agents only use the first moment a¯ = K to predict the law of motion for the aggregate capital stock in good and bad times, respectively, according to: γ0g + γ1g ln K, if Z = Z g , 0 ln K = (9.26) γ0b + γ1b ln K, if Z = Z b . 22
Notice also that aggregate employment L only depends on aggregate technology Z due to our assumptions on the conditional employment probabilities so that the household has all the necessary information to compute w, r, b, and τ. 23 While we use an equispaced grid over individual wealth a, Krusell and Smith (1998) use an asset grid with many grid points near zero, where we observe a large curvature in the value function. Instead of value function iteration, Maliar et al. (2010) use a grid-based Euler equation algorithm to solve the individual’s optimization problem. We will apply the latter approach in Section 11.3.4 where we study the Krusell-Smith algorithm in the overlapping generations (OLG) model.
510
9 Dynamics of the Distribution Function
As aggregate productivity is a stochastic variable, we can only simulate the dynamics of the economy. We follow Krusell and Smith (1998) and use 5,000 agents to approximate the population. We choose an initial distribution of assets a, the initial technology level Z ∈ {Z g , Z b } and employment status ε over the 5,000 households in period t = 1. In particular, we use a random number generator and set Z = Z g with probability one half and Z = Z b otherwise. In addition, we assume that every household is endowed with the initial asset holdings a1 equal to the average capital stock of the economy and that the number of unemployed individuals is equal to u1 ∈ {u g , u b }. In the first iteration, the average capital stock is computed from the stationary Euler equation 1/β − δ = α(L/K)1−α with L = 0.95. We simulate the dynamics of the economy over 3,000 periods and discard the first 500 periods. As a consequence, the initialization of the distribution of (a, ε) in period t = 1 does not have any effect on our results for the statistics of the distribution in periods 501-3000.24 As in the previous section, we use the dynamics of the capital stock t=3,000 {K t } t=501 to estimate the law of motion for K t in good and bad times, Z t = Z g and Z t = Z b , respectively. For this reason, we separate the observation points (K t , K t+1 ) into two samples with either Z t = Z g or Z t = Z b and estimate the parameters (γ0 , γ1 ) separately for each subsample. To simulate the dynamics of the households’ wealth distribution, we use the optimal policy functions of the households. The optimal nextperiod asset level a0 is a function of the employment status ε, the currentperiod wealth a, the aggregate productivity level Z, and the aggregate capital stock K, a0 = a0 (ε, a, Z, K). We use value function iteration to compute the decision functions so that the individual asset level a and the aggregate capital stock K do not need to be a grid point ai or K j , respectively. Therefore, we have to use bilinear interpolation to compute 24
This Monte Carlo simulation is very time-consuming. In Chapter 11, we will consider a stochastic economy with 75 overlapping generations. If we simulate such an economy for 1,000 households in each generation, the computational time becomes a binding constraint given current computer technology. Therefore, we will approximate the cross-sectional distribution by a piecewise linear function. Algan et al. (2008) suggest the approximation of the distribution by a parameterized function and discuss various alternative approaches in the literature. Den Haan (2010a) discusses various methods to avoid the simulation step, including the projection methods introduced by Den Haan and Rendahl (2010) and Reiter (2010). These algorithms are faster than the standard Krusell-Smith algorithm and avoid the problem that the dynamics of the aggregate capital stock are determined at inefficiently chosen points of the aggregate capital stock K t (with little variation); however, in the case of the algorithm of Reiter (2010), the method is not easy to implement and requires sophisticated programming experience.
9.4 Aggregate Uncertainty: The Krusell-Smith Algorithm
511
the optimal next-period asset level a0 off grid points in our simulation (compare Section 13.9).25 Finally, we impose the law of large numbers on our simulation results. While we track the behavior of 5,000 agents, the fraction of unemployed agents does not need to be equal to u g in good times and u b in bad times. We use a random number generator to simulate the motion of the individuals’ employment status according to their appropriate conditional probabilities. In each period t, we check whether the fraction of unemployed individuals is equal to either u g or u b . If not, we choose a corresponding sample of agents at random and change their employment status accordingly. For example, if the number of unemployed agents is above u g ×5, 000 in period t with Z t = Z g , we choose an unemployed agent at random and switch his employment status to employed and continue this process until u t = u g . The complete algorithm can be described by the following steps:26 Algorithm 9.4.1 (Krusell-Smith Algorithm) Purpose: Computation of the dynamics in the heterogeneous-agent economy with aggregate uncertainty assuming bounded rationality of the consumers Steps: Step 1: Compute aggregate employment L as a function of current productivity Z: L = L(Z). Step 2: Choose the order I of moments m. 25
To be more exact, Krusell and Smith (1998) use different interpolation methods in the value function iteration and the simulation of the economy. In the interpolation of the value function in Step 4 of Algorithm 9.4.1, they use a nested procedure to evaluate v(a0 , K 0 ) offgrid points. For projected next-period capital stock K 0 , they interpolate the value function with a polynomial along the K-dimension for all grid points on the asset space a. Next, they use cubic splines to interpolate between grid points on the asset space. In the simulation of the economy, they apply an interpolation scheme that is much faster. They use the decision rules derived in the value function iteration to evaluate the policy functions at a finer grid over (a, K) for each value of (z, ε) once at the beginning of the simulation. For example, while they use a grid of 130 × 6 points in the value function iteration, they employ 600 × 100 points in the simulation of the economy. However, during the simulation, they only use bilinear interpolation to economize on computational time. Note that the computation of the individual policy functions becomes much more time-consuming once we use a more accurate approximation of the technology level Z t and more individual states of the individual; for example, we might use the method of Tauchen (1986) or Rouwenhorst (1995) described in Section 16.4.2 and approximate the AR(1)-process with five points instead of two points, Z g and Z b , or use different permanent skills for the individuals, e.g., skilled and unskilled labor. 26 The algorithm follows Krusell and Smith (1998) with some modifications.
512
9 Dynamics of the Distribution Function
Step 3: Guess a parameterized functional form for H I in (9.25), and choose initial parameters of H I . Step 4: Solve the consumer’s optimization problem, and compute the value function v(ε, a, Z, m). Step 5: Simulate the dynamics of the distribution function. Step 6: Use the time path for the distribution to estimate the law of motion for the moments m. Step 7: Iterate until the parameters of H I converge. Step 8: Test the goodness of fit for H I using, for example, R2 . If the fit is satisfactory, stop; otherwise, increase I, or choose a different functional form for H I . In Step 8, we suggest R2 as an accuracy test in the computation of the law of motion for aggregate capital in the stochastic economy. Den Haan (2010a) argues that both the R-squared, R2 , and the standard error of ˆ u , are very weak measures of the goodness of fit. Even for a regression, σ high value of R2 = 0.9999, he finds that there is an error for the standard deviation of aggregate capital equal to 14%. The problem of using R2 as a measure relates to the fact that it only measures the error for the prediction of the dynamics in the capital stock for one period ahead. However, the household chooses next-period wealth to smooth consumption intertemporally, and the prediction of the factor prices w t+n and r t+n over the next n = 1, 2, . . . periods also critically affect his savings behavior. Therefore, Den Haan proposes a measure that considers a multiperiod forecast.27 In addition, the R2 pertains to the average of the observations and does not accord special weight to periods of severe recessions. For this reason, Den Haan (2010a) proposes the following test. Let {Z t } Tt=t denote the exogenous sequence of the technology level. Let 0 ˆ t +1 = H(K t , Z t ) denote the forecast given our parameterized funcK 0 0 0 tion for the law of motion in Step 3 and the aggregate state variable (K t 0 , Z t 0 ) in period t 0 : ˆ t +1 = ln K 0
γ0g + γ1g ln K t 0 , if Z t 0 = Z g , γ0b + γ1b ln K t 0 , if Z t 0 = Z b .
ˆ t +2 = H(K t +1 , Z t +1 ) denote the forecast for the capital Similarly, let K 0 0 0 stock in period t 0 + 2 given the aggregate state variable (K t 0 +1 , Z t 0 +1 ) in period t 0 + 1. 27
Krusell and Smith (1998) also mention this problem and a 25-year forecast test in their article.
9.4 Aggregate Uncertainty: The Krusell-Smith Algorithm
513
To obtain an n-period ahead forecast instead, we do not use the state ˜ t+n , variable K t but the forecasted value. For this reason, let us define K n = 0, . . . , T , instead: ˜ t = Kt K 0 0 ˜ t +n−1 if Z t +n = Z g , γ0g + γ1g ln K 0 0 ˜ t +n = K 0 ˜ t +n−1 if Z t +n = Z b . γ0b + γ1b ln K 0 0
(9.27a) (9.27b)
In our numerical example below, we will also consider the accuracy measure | ln K˜t − ln K t |, which measures the percentage deviation of the predicted and actual capital stock, in addition to R2 . In particular, we will consider the n-period ahead forecast error for n = 1, . . . , 100 and for different time periods t 0 and compare the maximum and mean forecast error as a function of n.
9.4.3 Calibration and Numerical Results The parameterization is chosen for a model period equal to one year. We set the technology level equal to Z g = 1.03 in good times and Z b = 0.97 in bad times. The average duration of a boom or a recession is 5 years. Booms and recessions are of equal length, so the transition matrix Γ Z is equal to: 0.8 0.2 ΓZ = . (9.28) 0.2 0.8 The following conditional employment probabilities are taken from Castañeda et al. (1998b) who consider the annual employment mobility for the US economy: 0.9615 0.0385 0 0 g g Γ (ε |Z = Z , Z = Z , ε) = , 0.9581 0.0492 0.9525 0.0475 0 0 b b Γ (ε |Z = Z , Z = Z , ε) = . 0.3952 0.6048 These employment probabilities imply ergodic distributions with unemployment rates u g = 3.86% and u b = 10.73%, respectively. The conditional employment probabilities for the transition from good times to bad times, Z = Z g and Z 0 = Z b , are calibrated such that all unemployed agents stay
514
9 Dynamics of the Distribution Function
unemployed and that the unemployment rate is u b in the next period using (9.24). Accordingly, p(Z g ,e),(Z b ,e) PZ g Z b
=
1 − ub . 1 − ug
Similarly, the transition matrix from Z = Z b to Z 0 = Z g is calibrated such that all employed agents remain employed and the unemployment rate is equal to u g in the next period, again making use of (9.24). The asset grids over individual wealth a, A = {a1 = 0, . . . , ana = 12} and aggregate capital K, K = {K1 = 2.5, . . . , Knk = 5.5}, are chosen to be equispaced with na = 101 and nk = 10 nodes, respectively. The upper and lower bounds of these two intervals are found to be nonbinding. The replacement rate of unemployment insurance with respect to net wages is set equal to 25%. The remaining parameters are also taken from Castañeda et al. (1998b): α = 0.36, β = 0.96, δ = 0.1, and η = 1.5.28 The algorithm is implemented in program Krusell_Smith_algo.g. The computational time amounts to 1 hour, 58 minutes using an Intel(R) Xeon(R), 2.90 GHz machine. We set the required accuracy of the golden section search algorithm in the value function equal to 1e-7, the maximum divergence of the value functions in two successive iterations equal to 1e-6, and the maximum divergence of the γ-coefficients in the law of motion for capital equal to 0.0001. In the outer loop, we need q = 41 iterations for the convergence of the law of motion, while we use j = 87 iterations in the inner loop over the value function. The optimal policy functions and the value functions behave as expected, and for this reason, we do not display them. Savings a0 (ε, a, Z, m) and consumption c(ε, a, Z, m) increase with higher individual wealth a, while net savings a0 − a decline. In addition, households save a higher proportion of their income for higher interest rates r or, equivalently, lower aggregate capital K. The mean capital stock in an economy with uncertainty is equal to ¯ K = 4.30. The distribution of individual wealth in period T = 3, 000 is graphed in Figure 9.7. In our simulation, the aggregate capital stock in the last period t = 3, 000 is equal to K3,000 = 4.08 and the economy is in a boom, Z3,000 = Z g = 1.03. Note, in particular, that the upper grid point of A , ana = 12, is not binding and the maximum wealth of the households is approximately equal to a = 7.5. 28
Our values are close to those chosen by Den Haan et al. (2010). For example, these authors choose unemployment rates of 4% and 10% in good and bad times, respectively.
9.4 Aggregate Uncertainty: The Krusell-Smith Algorithm
515
F (e, a) + F (u, a)
1.00 0.75 0.50 0.25 0.00 0
2
4
a
6
8
Figure 9.7 Distribution Function in Period T = 3, 000
The law of motion for capital (9.26) is estimated at: 0.178 + 0.886 ln K, if Z = Z g , R2 = 1.0000 ln K 0 = 0.136 + 0.889 ln K, if Z = Z b , R2 = 1.0000
(9.30)
Using (9.30), the R2 amounts to 1.0000, and the mean prediction error of the capital stock is equal to 1.1%. In addition to these standard regression errors, we apply the accuracy measure in the form of the n-period ahead forecast error suggested by Den Haan (2010a). Figure 9.8 presents the ˜ t − ln K t |, over the (absolute) percentage deviation of the forecast, | ln K first 100 periods. We started the computation of the forecast error in period t 0 = 500 (so we dropped the initial iterations) in our simulation of the capital stock dynamics and computed the forecast error for the next 100 periods. Next, we examined period t 0 = 501 and, again, computed the forecast error over the next 100 periods. We continued to do so for t 0 = 500, 501, . . . , 2500. Fig 9.8 displays both the mean and the maximum predicted forecast errors for these 2001 subperiods as a function of the forecast periods 1, 2, . . . , 100. Note that the mean prediction error increases by a factor of almost 7, from 1.06% to 7.36%, if we increase the forecast from 1 to 100 periods. Therefore, we also observe the same increase in forecast error documented by Den Haan (2010a). The maximum prediction error is substantially higher and increases from 4.34% (n=1) to 19.21% (n=100). Both the mean and maximum errors do not increase significantly beyond n = 25 (the prediction horizon originally considered by Krusell and Smith (1998)). This measure documents that the prediction errors accumulate over time, and we, therefore, have to be careful in interpreting a high R2 and a low prediction error in the regression (9.30) as evidence of high accuracy.
516
9 Dynamics of the Distribution Function
˜ t ) − ln(K t )| | ln(K
20
Maximum Mean
15 10 5 0
10
20
30
40
50
60
70
80
90 100
n Figure 9.8 Prediction Errors without Updating
The n-period ahead prediction error also helps in the comparison of other model economies that are computed with the Krusell-Smith algorithm with the standard economy considered in this section (and in the original paper by Krusell and Smith (1998)). If both the R2 and the maximum of the prediction error over longer time horizons are close to those obtained above, we know that the algorithm is performing approximately as accurately as in the benchmark economy. If not, we should probably attempt to improve the accuracy in the computation of the law of motion for capital (and, possibly, other aggregate state variables). The dynamics of the capital stock in our simulation are displayed in Figure 9.9. The standard deviation of annual capital is equal to σK = 1.30%.29 Our simple model falls short of replicating important business cycle characteristics. For example, the standard deviation of annual aggregate output (σY = 4.08%) is smaller than that of aggregate consumption (σC = 5.42%).30 In the next section, you will be introduced to two more elaborate models of business cycle dynamics. 29
The log of the time series of aggregate capital stock K t , output Yt , and consumption C t have been HP-filtered with µ = 100. 30 To match the annual volatility of output σY with its empirical value, you may re-scale the technology states (Z g , Z b ) proportionally. Ríos-Rull (1996) reports a value of σY = 2.23% and σC = 1.69% for annual GDP and consumption in the US economy during 1956-87. See also Table 11.4 in Section 11.3.1.
9.5 Applications
517
4.8
Kt
4.5
4.3
4.0 500
1,000
1,500
2,000
2,500
3,000
t Figure 9.9 Time Path of the Aggregate Capital Stock
9.5 Applications In this section, we will consider two prominent applications of computational methods for heterogeneous-agent economies with uncertainty that consider business cycle dynamics. One of the first papers in the area of computable general equilibrium models of heterogenous-agent economies is the article by ˙Imrohoro˘ glu (1989a) published in the Journal of Political Economy. Her pioneering work, although the model is only partial equilibrium, can be considered the very first milestone in the literature on computable heterogeneous-agent economies that the second part of this book addresses.31 We will recompute her benchmark equilibrium. She shows that the costs of business cycles depend on the assumption of whether agents can borrow. Her equilibrium is partial in the sense that the factor prices are exogenous. As a consequence, agents do not need to project the cyclical behavior of the interest rate and labor income. They only need to consider the probability of being employed in the next period. Therefore, the computation is straightforward when applying the methods Moreover, Ay¸se ˙Imrohoro˘ glu was already publishing other important contributions in the field of computable heterogeneous-agent economies at this very early time, when computer technology started to allow for such computations. Among others, she also made an important contribution to the study of the welfare costs of inflation that was published in the Journal of Economic Dynamics and Control in 1992.
31
518
9 Dynamics of the Distribution Function
presented in Chapter 8.32 In the second application, we consider the business cycle dynamics of the income distribution. Our model closely follows Castañeda et al. (1998b), and we need to apply the methods developed in the previous section.
9.5.1 Costs of Business Cycles with Indivisibilities and Liquidity Constraints THE MODEL. The model in ˙Imrohoro˘ glu (1989a) is similar to the economy described in Example 9.3.1. There are many infinitely lived households of mass one who differ with respect to assets a t and their employment status ε t . Households maximize their intertemporal utility Et
∞ X s=0
β s u (c t+s ) ,
(9.31)
where β < 1 is the subjective discount factor and expectations are conditioned on the information set at time t. At time zero, the agent knows his beginning-of-period wealth a0 and his employment status ε0 ∈ {e, u}. The agent’s instantaneous utility function is a CRRA function of his consumption: 1−η
u(c t ) =
ct
1−η
, η > 0,
(9.32)
where η, again, denotes the coefficientof relative risk aversion. If ε = e (ε = u), the agent is employed (unemployed). If the agent is employed he produces y(e) = 1 units of income. If he is unemployed, he engages in home productionand produces y(u) = θ units of consumption goods, where 0 < θ < 1. Furthermore, agents cannot insure against unemployment. ˙Imrohoro˘ glu (1989a) considers two different economies: In the first economy, agents cannot borrow, a ≥ 0. They can insure against fluctuations in their income by storing the asset. The budget constraint is given by: a t+1 = a t − c t + y(ε t ).
(9.33)
In the second economy, the agents can borrow at rate r b . Agents can save assets by either lending at rate rl = 0 or storing them. There is an 32
However, we include this model in the present section because it also studies the effects of business cycle fluctuations.
9.5 Applications
519
intermediation sector between borrowing and lending households. The borrowing rate r b exceeds the lending rate r b > rl . The intermediation costs, which are equal to the difference between the borrowing rate and the lending rate times the borrowed assets, are private costs and reduce total consumption. In the case without business cycle fluctuations, the individual-specific employment state is assumed to follow a first-order Markov chain. The conditional transition matrix is given by: puu pue 0 0 π(ε |ε) = P r o b ε t+1 = ε |ε t = ε = , (9.34) peu pee where, for example, P r o b {ε t+1 = e|ε t = u} = pue is the probability that an agent will be employed in period t + 1 given that the agent is unemployed in period t. In the case of business cycle fluctuations, the economy experiences good and bad times. In good times, employment is higher, and both employed and unemployed agents have a higher probability of finding a job. We can distinguish four states s ∈ {s1 , s2 , s3 , s4 }: s = s1 ) the agent is employed in good times, s = s2 ) the agent is unemployed in good times, s = s3 ) the agent is employed in bad times, and s = s4 ) the agent is unemployed in bad times. The transition between the four states is described by a first-order Markov chain with conditional transition matrix π(s0 |s). The transition matrices are chosen such that economies with and without business cycles have the same average unemployment rate. CALIBRATION. The model is calibrated for a model period of 6 weeks or approximately 1/8 of a year. The discount factor β = 0.995 implies an annual subjective time discount rate of approximately 4%. The coefficient of relative risk aversion η is set equal to 1.5. The annual borrowing rate is set equal to 8%, corresponding to a rate of r b = 1% in the model period. The conditional transition matrices π(ε0 |ε) and π(s0 |s) are calibrated such that average unemployment is 8%, and unemployment in good times and bad times is 4.3% and 11.7%, respectively. In the economy with business cycles, the average duration of unemployment is 1.66 and 2.33 periods (10 and 14 weeks) in good and bad times, respectively. Furthermore, the probability that good or bad times continue for another period is set equal to 0.9375, so the average duration of good and bad times is equal to 24 months, implying an average duration of the business cycle equal to 4 years. The transition matrices are then given by:
520
9 Dynamics of the Distribution Function
0
π(ε |ε) =
0.5000 0.5000 , 0.9565 0.0435
(9.35)
and
0.9141 0.0234 0.0587 0.0038 0.5625 0.3750 0.0269 0.0356 π(s0 |s) = . 0.0608 0.0016 0.8813 0.0563 0.0375 0.0250 0.4031 0.5344
(9.36)
The Markov process described by matrix (9.36) implies average unemployment rates of 4.28% and 11.78% during good and bad times, respectively. Finally, the households’ home production is equal to θ = 0.25. COMPUTATION. In the following, we describe the computation of the economy with business cycle fluctuations. The computation of the model is simpler than the computation for the economy considered in Section 8.4. In Example 8.4.1 with the endogenous interest rate r, we had to select an initial value of the interest rate, compute the decision functions and the invariant distribution and subsequently update the interest rate until it converges. In the present economy, the interest rate is given. We first compute the decision functions by value function iteration. The value function of the individual is a function of his assets a and the state s: v(a, s) = max u(c) + βE v(a0 , s0 ) s (9.37) c X = max u(c) + β π(s0 |s)v(a0 , s0 ) . c
s0
From Chapter 7, we know how to solve this simple dynamic programming problem. In program costs_cycles.g, we use value function iteration with linear interpolation between grid points. The maximum of the right-hand side (rhs) of the Bellman equation (9.37) is computed with golden section search. We use na = 301 grid points for the asset space, so we have to store a matrix with na × 4 = 1204 entries. The grid is chosen to be equispaced on the interval [0, 8] and [−8, 8] for the economy with only a storage technology and the economy with intermediation, respectively. First, we consider the case of a storage economy with the credit constraint a ≥ 0. As presented in Figure 9.10, consumption is an increasing
9.5 Applications
521
function of income and is also higher in good times because agents have a higher expected next-period income (not presented). The optimal nextperiod asset a0 (a, s) is a monotone increasing function of assets a. Figure 9.11 displays the savings a0 − a, which are always negative for the unemployed agent (s = 2, 4) and become negative for the employed agents (s = 1, 3) at a wealth level equal to 3.76, so the ergodic set is contained in the interval [0, 3.76]. 1.2 1.0
s=1 s=2 s=3 s=4
c(a, s)
0.8 0.6 0.4 0.2 0
1
2
4
3
5
7
6
a
8
Figure 9.10 Consumption Functions in the Storage Economy
0.25
a0 (a, s) − a
0.00
s=1 s=2 s=3 s=4
−0.25 −0.50 −0.75 0
1
2
3
4
5
a Figure 9.11 Savings Functions in the Storage Economy
6
7
8
522
9 Dynamics of the Distribution Function
Next, we compute the invariant density function f (a, s). The associated invariant distribution can be interpreted as the fraction of time that a particular individual spends in the different states (a, s). For an economy with business cycle fluctuations, the invariant distribution is the limit of the predictive probability distribution of an individual in n periods where n goes to infinity. We compute the invariant density function as described in Chapter 8 and approximate it by a discrete-valued function fˆ(a, s), a ∈ {a1 , . . . , ang }, s ∈ {1, 2, 3, 4}. We use a finer grid over the asset space for the computation of the distribution than for the computation of the policy function. In particular, we compute the density function at na g = 903 equispaced points over the intervals [0, 8] and [−8, 8]. The (discretized) invariant density function fˆ(a, s) can be computed from the following dynamics: X X fˆ0 (a0 , s0 ) = π(s0 |s) fˆ(ai , s), (9.38) s
i:ai =a0−1 (a0 ,s)
where a0−1 (a0 , s) denotes the inverse of the function a0 (a, s) with respect to its first argument a. As the optimal next-period asset level a0 may not be a grid point, we simply assume that it will be on the lower or higher neighboring point with a probability that corresponds to the distance from the higher or lower neighboring point, respectively (see Step 3 in Algorithm 8.4.3). The invariant density function fˆ(a, s) is displayed in Figure 9.12. The ergodic set is approximately [0, 3.76], and the density is zero for a > 3.76.33 On average, assets are stored in an amount of a¯ = 2.35 in this economy. As the average unemployment rate is 8%, average income (which is equal to average consumption) is equal to ¯y = 0.92 × 1.0 + 0.08 × 0.25 = 0.94. Consumption and savings behavior in the economy with intermediation is different from that in the economy with a storage technology only. In particular, the consumption behavior changes around a = 0 as the interest rate on assets changes from the low lending rate r l = 0% to the high borrowing rate r b = 8% (see Figures 9.13 and 9.14). This is why we used value function iteration. If we had used a computational method such as projection methods that does not rely on the discretization of the individual Different from our density function, the density function computed by ˙Imrohoro˘ glu (1989a) (Figure 1 in her article) displays two spikes in the range 1.5-2.0 of individual wealth and maximum values of approximately 0.05 in good times. This, however, is an artifact of her computational methods. She only computes the policy functions at grid points and does not interpolate between grid points. As a consequence, our results differ to a slight degree, and our policy functions and distribution functions are much smoother. 33
9.5 Applications
523 s=1 s=2 s=3 s=4
fˆ(a, s)
0.3
0.2
0.1
0.0
0
1
2
3
4
5
a
6
7
8
Figure 9.12 Invariant Density Functions in the Storage Economy
asset grid, we may have had problems in capturing this nonmonotonicity of the first derivative of a0 (a, ε) at a = 0. 1.25
c(a, s)
1.00
s=1 s=2 s=3 s=4
0.75 0.50 0.25 −8
−6
−4
−2
0
2
4
a
6
8
Figure 9.13 Consumption in the Economy with Intermediation
The distribution of individual wealth in the economy with intermediation is graphed in Figure 9.15. The average of assets borrowed amounts to 0.510 and is not equal to the amount of assets saved (=0.287) because we only study partial equilibrium. In general equilibrium, the interest rates r b and r l would adjust to clear the capital market. The average income ¯y is equal to the one in the economy with storage technology only and amounts to 0.940. However, because intermediation costs are private costs, average
524
9 Dynamics of the Distribution Function
a0 (a, s) − a
0.25 0.00
−0.25
s=1 s=2 s=3 s=4
−0.50 −0.75 −1.00 −8
−6
−4
−2
0
2
4
a
8
6
Figure 9.14 Savings Functions in the Economy with Intermediation
consumption is smaller than average income, ¯c = 0.935 < 0.940 = ¯y . The difference between average income and average consumption is simply the product of the borrowing rate, r b , and the assets borrowed.
fˆ(a, s)
0.50
s=1 s=2 s=3 s=4
0.25
0.00 −8
−6
−4
−2
0
a
2
4
6
8
Figure 9.15 Invariant Density Functions in the Economy with Intermediation Technology
As one central problem of her work, ˙Imrohoro˘ glu (1989a) computes the welfare gain from eliminating business cycle fluctuations. For this reason, she computes average utility in the economy with and without business cycle fluctuations, either using (9.35) or (9.36) for the state transition matrix of the economy. For the benchmark calibration, the elimination of
9.5 Applications
525
business cycles is equivalent to a utility gain corresponding to 0.3% of consumption in the economy with a storage technology. If the relative risk aversion η is increased to 6.2, the welfare gain rises to 1.5% of consumption.34 An intermediation technology significantly reduces business cycle costs. For η = 1.5, the fluctuations only cause a utility loss equivalent to 0.05% of consumption. The computation of the welfare effects from business cycle fluctuations in ˙Imrohoro˘ glu (1989a) is only sensible if the average asset holdings for the economies with and without business cycles do not change significantly. This is the case in ˙Imrohoro˘ glu (1989a). Heer (2001a) considers an economy with endogenous prices where asset holdings differ in economies with and without business cycles. Agents may hold much higher assets for precautionary reasons in a fluctuating economy. As a consequence, average asset holdings may change, and in general equilibrium, average consumption may also change significantly.35 In his model, welfare changes that result from business cycle fluctuations are even more pronounced than in the model of ˙Imrohoro˘ glu (1989a). Storesletten et al. (2001) consider the costs of business cycles in an economy where 1) there is countercyclical variation in idiosyncratic income risk and 2) households have finite lifetime. They find that the elimination of aggregate productivity shocks increases welfare by 1.44% of agent consumption, which is an order of magnitude larger than the welfare costs of business cycles found by Lucas (1987). Storesletten et al. (2001) first provide empirical evidence that the conditional standard deviation of the persistent shock to earnings more than doubles during a recession. Finite lifetime exacerbates the negative welfare effects of a downturn. Imagine a young worker in Italy who starts his working life during the Great Recession of 2007-08 and whose likelihood of finding a job falls to a value in the vicinity of 50%. At the beginning of 2020, the onset of the COVID-19 pandemic crisis may have resulted in a significant prolongation of the worker’s unemployment period and a lasting effect on his human capital. Consequently, the cohorts educated or entering the labor market in Italy during the first two decades of this century are lost generations who will never fully recover from this adverse shock during their finite 34
Note that this is a very large welfare effect. Lucas (1987) estimates the costs of business cycles to be very small and only equivalent to 0.1% of total US consumption. Different from the present model, agents can insure against the idiosyncratic risk in his model. 35 Huggett (1997) studies the one-sector growth model where agents receive idiosyncratic labor endowment shocks and face a borrowing constraint. He shows that any steady-state capital stock lies strictly above the steady state in the model without idiosyncratic shocks.
526
9 Dynamics of the Distribution Function
lifetime. In the case of infinite lifetime studied by ˙Imrohoro˘ glu (1989a) or Lucas (1987), these adverse initial conditions contribute far less to total (lifetime) utility. More recent research on the welfare costs of business cycles has extended these early contributions by 1) the introduction of frictions in the economy and 2) the consideration of different sources of exogenous aggregate shocks. In this vein, for example, Iliopulos et al. (2018) study both financial and labor market frictions in the labor market search model of Diamond (1982), Mortensen (1982), and Pissarides (1985).36 The entrepreneur borrows to finance his hiring costs but has limited access to credit in the form of a collateral constraint, as considered in Kiyotaki and Moore (1997). The financial and labor market frictions interact and result in a sizable welfare gain from elimination of business cycles equal to 2.5% of permanent consumption. Braun and Nakajima (2012) also consider an incomplete market economy with countercyclical idiosyncratic labor and asset risk. Prices are sticky, and there are separate aggregate shocks on the return to savings and wages. Due to the incompleteness of markets, welfare costs are substantially higher in their model than in that of Schmitt-Grohé and Uribe (2007). In accordance with the latter, they also find the optimal monetary policy to be one of complete price stabilization. Heer and Scharrer (2018) consider an exogenous aggregate shock to government consumption in a New Keynesian heterogeneous-agent economy with finite lifetime where some of the households use rules of thumb for their consumption policy. They show that tax financing is welfare improving relative to debt financing in the presence of stochastic government demand and, contrary to common wisdom, even benefits retired households.
9.5.2 Income Distribution and the Business Cycle Castañeda et al. (1998b) explore the business cycle dynamics of the income distribution both empirically and in a theoretical computable general 36
Note the parallels to the literature on the equity premium puzzle. In the model of Jermann (1998), the ability to generate a sizable equity premium hinges on the assumption of inelastic labor supply. In the case of flexible labor, the equity premium basically vanishes as households use their intertemporal labor supply to smooth consumption; consequently, the price of risk drops. To re-establish a significant equity premium, Boldrin et al. (2001) introduce labor market frictions in their model. Similarly, the welfare costs of business cycle fluctuations are smaller if the household can use intertemporal substitution of labor to smooth consumption over time.
9.5 Applications
527
equilibrium model. They find that, in the US, the income share earned by the lowest quintile is more procyclical and more volatile than the other income shares. In particular, the income shares earned by the 60%-95% group are actually countercyclical, while the share earned by the top 5% is still acyclical. To address these issues, they construct a heterogeneous-agent economy with aggregate uncertainty based on the stochastic neoclassical growth model. The aggregate uncertainty is modeled as in Example 9.4.1 and follows a Markov process. Again, unemployment is higher during bad times and contributes to the explanation of the business cycle dynamics of the lowest income share. Contrary to the model by ˙Imrohoro˘ glu (1989a) presented in the previous section, the aggregate capital stock and the interest rate are endogenous variables of the model. As one of their major results, cyclical unemployment helps to reconcile the model’s behavior with the data on the dynamics of the income distribution. In the following, we present a slightly modified version of the model by Castañeda et al. (1998b). In particular, we introduce income mobility into their model and consider its effect on the cyclical behavior of the income shares of each income quintile. First, we describe the model. Second, we present the calibration and the computational method. We conclude with the presentation of our results.37 THE MODEL. There are many infinitely lived households of measure one who differ with respect to assets a t , their employment status ε t , and their efficiency type i ∈ {1, . . . , 5} in period t. The mass of the type i household is equal to µi = 20% for i = 1, . . . , 5. Households maximize their intertemporal utility38 Et
∞ X s=0
β s u (c t+s ) ,
(9.39)
where β < 1 is the subjective discount factor and expectations are conditioned on the information set at time t. At time zero, the agent knows his beginning-of-period wealth a0 , his employment status ε0 ∈ {e, u} and 37
In Section 11.3.4, we will consider an overlapping generations model with elastic labor supply and further improve the modeling of the business cycle dynamics of the income distribution. 38 For the sake of notational simplicity, we refrain from indexing individual variables such as consumption, wealth, employment status, and the efficiency index with a subscript j ∈ [0, 1].
528
9 Dynamics of the Distribution Function
his efficiency type i. The agent’s instantaneous utility function is a CRRA function of his consumption: 1−η
u(c t ) =
ct
1−η
, η > 0,
(9.40)
where η, again, denotes the coefficient of relative risk aversion. The model is characterized by both idiosyncratic and aggregate risk. At the individual level, the household may either be employed (ε = e) or unemployed (ε = u). Aggregate risk is introduced by a stochastic technology level Z t in period t. In particular, the productivity shock follows a Markov process with transition matrix Γ Z (Z 0 |Z), where Z 0 denotes the next-period technology level and π Z Z 0 denotes the transition probability from state Z to Z 0 . The individual employment probabilities, of course, depend on the aggregate productivity Z t . In good times (high productivity Z t ), agents have higher employment probabilities than in bad times. The joint process of the two shocks, Z t and ε t , can be written as a Markov process with transition matrix Γi ((Z 0 , ε0 )|(Z, ε)) and depends on the efficiency type i of the agent. We use πi ((Z 0 , ε0 )|(Z, ε)) to denote the probability of transition from state (Z, ε) to state (Z 0 , ε0 ) for an individual with efficiency type i. In the following, we restrict our attention to the simple case presented in Example 9.4.1. The economy only experiences good and bad times with technology levels Z g and Z b , respectively, Z g > Z b . Consequently, the joint processes on (Z, ε) are Markov chains with 4 states for each efficiency type i = 1, . . . , 5. We study a simple extension of the model by Castañeda et al. (1998b). In particular, different from their model, we assume that agents may change their efficiency type i. Given the empirical evidence on income mobility, we examine the effect of this assumption on the cyclical behavior of the income distribution.39 We assume that the efficiency type i follows a Markov chain π(i 0 |i) that is independent of the aggregate productivity. Furthermore, the employment probability in the next period depends only on the efficiency type of this period. In other words, πi ((Z 0 , ε0 )|(Z, ε)) is not a function of next-period type i 0 . The probability of an employed type i agent in the good state Z = Z g to be employed next period as a type i 0 agent in the bad state Z 0 = Z b , for example, is given by the product π(i 0 |i) πi ((e, Z b )|(e, Z g )). 39
In addition, the assumption of income mobility may change the wealth inequality in the model as we argue below. Nevertheless, wealth heterogeneity is too small in both our model and that of Castañeda et al. (1998b).
9.5 Applications
529
Households are assumed to know the law of motion of both π(i 0 |i) and πi ((Z 0 , ε0 )|(Z, ε)) and observe the realization of both stochastic processes at the beginning of each period. In good times, agents work h(Z g ) hours, and in bad times, agents work h(Z b ) hours. Let ζi denote the efficiency factor of a type i agent. If employed, the agent receives the labor income ¯ h(Z)ζi w; otherwise, he produces home production w. i Let N (Z) denote the number of employed households of type i for current productivity Z. We will calibrate these values below such that N i (Z) is constant for Z ∈ {Z g , Z b } and does not depend on the history of the productivity level Z, {Zτ }τ=t τ=−∞ . The assumption that employment only depends on current productivity greatly simplifies the computation. Agents do not have to form expectations about the dynamics of the aggregate employment level but only need to consider aggregate productivity and the distribution of wealth. The aggregate labor input measured in efficiency units is presented by X L(Z) = ζi h(Z)N i (Z). i
For technology level Z t , aggregate production is described by Cobb-Douglas technology: Yt = Z t K tα L 1−α . t
(9.41)
We assume competitive factor and product markets, implying the factor prices: w t = Z t (1 − α)K tα L −α t ,
r t = Z t αK tα−1 L 1−α − δ, t
(9.42) (9.43)
where δ denotes the rate of depreciation. Note that the agents only need to forecast the aggregate capital stock K 0 (and, therefore, the dynamics of the distribution of capital) and the aggregate technology level Z 0 to form a prediction of the next-period factor prices w0 and r 0 , respectively, as we assume L 0 to be a function of Z 0 only. In the following, we describe the household decision problem in a recursive form. Let F (·) denote the distribution of the individual state variables (i, ε, a). For each household, the state variable consists of her efficiency type i, her employment status ε, her individual asset level a, the aggregate technology level Z, the aggregate capital stock K (which is implied by the distribution F (·)), and the distribution of efficiency types, employment, and individual wealth, F (i, ε, a).
530
9 Dynamics of the Distribution Function
The recursive problem can be formulated as follows: v(i, ε, a; Z, F ) = max u(c) + βE v(i 0 , ε0 , a0 ; Z 0 , F 0 ) i, ε, Z, F (9.44) c
subject to the budget constraint: (1 + r) a + wζi h(Z) − c, if ε = e, 0 a = ¯ − c, (1 + r) a + w if ε = u,
(9.45)
and subject to (9.42) and (9.43), the stochastic process of the employment status ε and the aggregate technology Z, πi ((Z 0 , ε0 )|(Z, ε)), the agent’s efficiency mobility as given by π(i 0 |i), and the distribution dynamics F 0 = G(F, Z, Z 0 ), where G describes the law of motion for the distribution F . The definition of the equilibrium is analogous to that in Example 9.4.1, and we will omit it for this reason. The interested reader is referred to Section 3.5.2 of Castañeda et al. (1998b). CALIBRATION. The parameters, if not mentioned otherwise, are taken from the study of Castañeda et al. (1998b).40 Model periods correspond to 1/8 of a year (≈ 6 weeks). The coefficient of relative risk aversion is set equal to η = 1.5, and the discount factor is set equal to 0.961/8 , implying an annual discount rate of 4%.41 The transition matrix between good and bad states is set equal to: 0.9722 0.0278 0 π(Z |Z) = , 0.0278 0.9722 implying equal lengths of booms and recession averaging 4.5 years. Furthermore, employment is constant both in good times and bad times, 40
We would like to thank Victor Ríos-Rull for providing us with the calibration data on the transition matrices. 41 Our choice of discount rate implies a negative (annual) real interest rate in the amount of -0.6%. This is in accordance with recent empirical observations from the US economy presented in Figure 8 of Eggertsson et al. (2019) according to which the real Federal Funds rate turned negative in 2006. Summers (2014) formalized the idea of secular stagnation that is consistent with a permanent negative natural rate of interest and a chronically binding zero lower bound on the short-term nominal interest rate. Eggertsson et al. (2019) present a life-cycle model that accounts for a natural rate between -1.5% and -2.0%. They show that the declines in the mortality and total fertility rates together with the decline in productivity growth help to explain the decline in the natural rate, while the increase in government debt has exerted an only partially offsetting effect. We will consider life-cycle models in Chapters 10 and 11.
and the factor Z·h(Z)^{1−α} is set equal to 1 and 0.9130 for Z = Z^g and Z = Z^b, respectively. We assume that average working hours amount to h(Z^g) = 32% and h(Z^b) = 30% of the available time during good and bad times, respectively.42 The production elasticity of capital is set equal to α = 0.375, and the depreciation rate is equal to 1 − 0.9^{1/8}. Castañeda et al. (1998b) assume that agents are immobile, implying π(i'|i) = 1 if i' = i and zero otherwise. Budría Rodríguez et al. (2002) provide an estimate of the US earnings transition matrix between the different earnings quintiles from 1984 to 1989:43
      [ 0.58  0.28  0.09  0.03  0.02 ]
      [ 0.22  0.44  0.22  0.08  0.03 ]
P =   [ 0.10  0.15  0.43  0.23  0.09 ]   (9.46)
      [ 0.06  0.09  0.18  0.46  0.21 ]
      [ 0.06  0.02  0.06  0.21  0.65 ]
We still have to transform this 5-year transition matrix into a 1/8-year transition matrix. Using the definition of the root of a matrix in equation (12.31), we can compute P^{1/40}, which we set equal to π(i'|i):44
           [ 0.983  0.015  0.001  0.000  0.000 ]
           [ 0.011  0.974  0.013  0.001  0.000 ]
π(i'|i) =  [ 0.003  0.007  0.974  0.013  0.002 ]   (9.47)
           [ 0.001  0.004  0.010  0.976  0.010 ]
           [ 0.002  0.000  0.001  0.010  0.987 ]
The 6-week earnings mobility is rather small, as the entries on the diagonal of (9.47) are close to unity.45 Therefore, we expect that neglecting income mobility has little influence on the results.
42
Together with the income mobility transition matrix, these are the only parameters that differ from the calibration of Castañeda et al. (1998b). 43 See Table 24 in their appendix. 44 The computation is performed by the procedure 'matroot' in the program dynamics_income_distrib.g. Furthermore, we set all negative entries of the matrix root equal to zero and normalize the sum of each row to one in the routine 'matroot'. The error is rather small in our case (you may check this by computing π(i'|i)·π(i'|i)·… and comparing the result to the matrix P from (9.46)). 45 To obtain (9.47) from (9.46), we have assumed that earnings follow an AR(1) process. As we argued in Chapter 8, the behavior of earnings is much better described by an AR(2) process. Due to the lack of high-frequency data on earnings mobility, however, we do not have another option.
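To illustrate the matrix-root step, the following Python sketch (our own minimal illustration, not the GAUSS procedure 'matroot' from dynamics_income_distrib.g; function and variable names are ours) computes an approximate 40th root of the 5-year matrix (9.46) via an eigendecomposition and then repairs it exactly as described in footnote 44: negative entries are set to zero and each row is renormalized to sum to one.

```python
import numpy as np

def matrix_root(P, n):
    """Approximate n-th root of a transition matrix P via eigendecomposition."""
    eigval, eigvec = np.linalg.eig(P)
    eigval = eigval.astype(complex)              # guard against negative eigenvalues
    root = eigvec @ np.diag(eigval ** (1.0 / n)) @ np.linalg.inv(eigvec)
    root = np.real(root)                          # discard tiny imaginary parts
    root[root < 0.0] = 0.0                        # set negative entries to zero
    root /= root.sum(axis=1, keepdims=True)       # renormalize each row
    return root

# 5-year earnings quintile transition matrix (9.46)
P = np.array([[0.58, 0.28, 0.09, 0.03, 0.02],
              [0.22, 0.44, 0.22, 0.08, 0.03],
              [0.10, 0.15, 0.43, 0.23, 0.09],
              [0.06, 0.09, 0.18, 0.46, 0.21],
              [0.06, 0.02, 0.06, 0.21, 0.65]])

pi = matrix_root(P, 40)                           # 6-week transition matrix
print(np.round(pi, 3))
# Check the approximation error: pi raised to the 40th power
# should be close to the original 5-year matrix P.
print(np.abs(np.linalg.matrix_power(pi, 40) - P).max())
```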
We use five types of households with efficiency ζ_i ∈ {0.509, 0.787, 1.000, 1.290, 2.081}. The efficiency factors are chosen to be the relative earnings of the different income groups. The variations in the hours worked of these 5 income groups between good and bad times are treated as if they were variations in employment rates. With the help of the coefficient of variation of average hours, the employment rates are calibrated as in Table 9.1.46

Table 9.1 Calibration of Employment Rates

  i    N^i(Z^g)   N^i(Z^b)
  1     0.8612     0.8232
  2     0.9246     0.8854
  3     0.9376     0.9024
  4     0.9399     0.9081
  5     0.9375     0.9125
Given the employment of type-i households in good and bad times, N^i(Z^g) and N^i(Z^b), respectively, and the average duration of unemployment in good times (10 weeks) and in bad times (14 weeks), we are able to compute the matrices π^i((ε', Z')|(ε, Z)) with Z = Z'. For this purpose, we have to solve a system of four equations (some of them nonlinear), which is carried out in the routine transp in the program dynamics_income_distrib.g. Two equations are given by the conditions that agents are either employed or unemployed in the next period. Taking π^i((ε', Z^g)|(ε, Z^g)) as an example, we impose the following two conditions on the transition matrix:

π^i((e, Z^g)|(e, Z^g)) + π^i((u, Z^g)|(e, Z^g)) = π(Z^g|Z^g),
π^i((e, Z^g)|(u, Z^g)) + π^i((u, Z^g)|(u, Z^g)) = π(Z^g|Z^g).

Furthermore, the average duration of unemployment is 10 weeks, or 10/6 periods, in good times, implying

π^i((u, Z^g)|(u, Z^g)) = 4/10 × π(Z^g|Z^g).

46 Please see the original article for a detailed description of the calibration procedure.
The fourth condition is given by the equilibrium employment in good times, N^i(Z^g). We impose as our fourth nonlinear equation that the ergodic distribution of the employed agents is equal to N^i(Z^g). The matrix π^1((ε', Z^g)|(ε, Z^g)), for example, is given by

π^1((ε', Z^g)|(ε, Z^g)) = π^1(ε'|ε) · π(Z^g|Z^g) = [ 0.9033  0.0967 ] · π(Z^g|Z^g).
                                                   [ 0.6000  0.4000 ]

As you can easily check, the ergodic distribution of π^1(ε'|ε) is equal to (0.8612, 0.1388)', the average duration of unemployment is 10 weeks, and the sum of each row is equal to one. Similarly, we are able to compute π^i((ε', Z^b)|(ε, Z^b)) for all i = 1, . . . , 5.

It remains to compute the transition matrix π^i((ε', Z^b)|(ε, Z^g)) between good and bad times on the one hand and the transition matrix π^i((ε', Z^g)|(ε, Z^b)) between bad and good times on the other hand. First, we assume that all unemployed agents remain unemployed if the economy transits from good to bad times,

π^i((u, Z^b)|(u, Z^g)) = π(Z^b|Z^g)

and

π^i((e, Z^b)|(u, Z^g)) = 0 for all i = 1, . . . , 5.

Second, we assume that the employment levels N^i(Z^g) and N^i(Z^b) are constant. For this reason,

N^i(Z^g) π^i((e, Z^b)|(e, Z^g)) = N^i(Z^b) π(Z^b|Z^g)

must hold. Together with the condition that

π^i((e, Z^b)|(e, Z^g)) + π^i((u, Z^b)|(e, Z^g)) = π(Z^b|Z^g),

we have four conditions that determine the transition matrix π^i((ε', Z^b)|(ε, Z^g)). For the computation of the matrix π^i((ε', Z^g)|(ε, Z^b)), we assume that all employed agents remain employed if the economy transits from the bad to the good state. Furthermore, we assume N^i(Z^g) to be constant, so that we also impose the restriction

(1 − N^i(Z^g)) π(Z^g|Z^b) = π^i((u, Z^g)|(u, Z^b)) (1 − N^i(Z^b)).

Together with the two conditions that the sum of each row must be unity, we can determine the matrix π^i((ε', Z^g)|(ε, Z^b)) for all i = 1, . . . , 5. Finally, household production w̄ is set equal to 25% of average earnings in the economy. In particular, the earnings during unemployment w̄ are constant over the business cycle.
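As a concrete illustration of the within-state part of this construction, the following sketch (in Python; our own stylized version, not the book's GAUSS routine transp) builds π^i((ε', Z)|(ε, Z)) from the employment rate N^i(Z), the average duration of unemployment, and π(Z|Z):

```python
import numpy as np

def within_state_transition(N, duration, pZZ):
    """Employment transition probabilities conditional on staying in state Z.

    N        : employment rate N^i(Z) of type-i households in state Z
    duration : average duration of unemployment in model periods (6 weeks each)
    pZZ      : probability pi(Z|Z) of remaining in aggregate state Z
    """
    p_ue = 1.0 / duration                 # unemployed -> employed
    p_uu = 1.0 - p_ue                     # stay unemployed
    # stationarity of employment: N = p_ee*N + p_ue*(1-N)
    p_ee = 1.0 - (1.0 - N) * p_ue / N
    p_eu = 1.0 - p_ee
    pi_eps = np.array([[p_ee, p_eu],
                       [p_ue, p_uu]])
    return pi_eps * pZZ                   # joint matrix pi^i((eps', Z)|(eps, Z))

# Type i=1 in good times: N^1(Z^g) = 0.8612, unemployment lasts 10 weeks = 10/6 periods
print(within_state_transition(0.8612, 10.0 / 6.0, 0.9722))
# The employment block reproduces the matrix [[0.9033, 0.0967], [0.6, 0.4]]
# reported in the text, scaled by pi(Z^g|Z^g) = 0.9722.
```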
COMPUTATION. The program dynamics_income_distrib.g computes the solution. It employs the methods described in Section 9.4. In particular, we apply Algorithm 9.4.1 with the following Steps 1-8:

Step 1: In the first step, we choose computational parameters and compute the aggregate employment levels in good and bad times (with μ_i = 0.20),

N(Z^g) = Σ_i μ_i ζ_i h(Z^g) N^i(Z^g),
N(Z^b) = Σ_i μ_i ζ_i h(Z^b) N^i(Z^b).
The agents form very simple expectations about next-period employment. Employment next period depends only on productivity in the next period: L' = L'(Z'). The policy functions are computed on the interval A × K = [a_min, a_max] × [K_min, K_max] = [0, 800] × [80, 400]. The interval limits are found with some trial and error and do not bind. The policy and value functions are computed on an equispaced grid of the state space using n_a = 50 and n_k = 5 grid points on the intervals A and K, respectively. As an initial guess for the aggregate capital stock, we use the steady-state capital stock of the corresponding representative-agent model as implied by 1/β = 1 + r − δ. For the computation of the distribution function F(·), we again need to discretize the continuous variables of the individual state space. We use n_a = 100 equispaced points over the individual asset space A. Furthermore, we have n_i = 5 types of agents, n_z = 2 states of productivity, and n_e = 2 states of employment.

Step 2: Agents need to predict next-period factor prices w' and r'. Factor prices are functions of aggregate capital K', aggregate labor L', and the exogenous technology level Z'. To predict the capital stock K', agents need to know the dynamics of the distribution. They only use partial information about the distribution, namely its first I moments. We choose I = 1. Agents only consider the aggregate capital stock as a statistic for the distribution. As argued above, this assumption is warranted if agents of different wealth have approximately equal savings rates (except for the very poor, who do not contribute much to aggregate savings). Therefore, the value function of the agents, v(i, ε, a, Z, K), and the consumption function, c(i, ε, a, Z, K), are functions of the individual efficiency type i, employment status ε, asset holdings a, aggregate productivity Z, and aggregate capital stock K. The value function and the policy functions are both five-dimensional objects. This may impose some computational problems. For example, in
older versions of GAUSS, only two-dimensional objects can be stored.47 There are two ways to solve this problem. First, in our model, there is only a small number of efficiency types, i = 1, . . . , 5, two states of technology, Z ∈ {Z^g, Z^b}, and two employment statuses, ε ∈ {e, u}. Consequently, we can store the two-dimensional value matrices v(a, K; i, ε, Z) for the 5 × 2 × 2 = 20 different combinations of i, ε, and Z separately. This is how we proceed. If the number of states increases, of course, this procedure becomes cumbersome. In that case, you may want to store the value function in one matrix, reserving the first n_a rows for i = 1, Z = Z^g, ε = e, the next n_a rows for i = 2, Z = Z^g, ε = e, and so forth. In this second case, it is very convenient for the computation to write a subroutine that returns the value function v(a, K; i, ε, Z) for a state vector (i, ε, Z). For the initialization of the consumption function for each (i, ε, Z), we assume that the agents consume their respective income. We further initialize the distribution of assets, assuming that every agent holds equal wealth. The initial state of the economy is chosen at random. With probability 0.5, Z = Z^g; otherwise, the bad state Z = Z^b prevails. As we dispense with the first 100 simulated time periods, the initial choice of the distribution and productivity does not matter.

Step 3: We again impose a very simple law of motion for the capital stock. As in (9.26), we assume that the aggregate capital stock follows a log-linear law of motion in good and bad times, respectively:

ln K' = γ_0^g + γ_1^g ln K,  if Z = Z^g,
ln K' = γ_0^b + γ_1^b ln K,  if Z = Z^b.   (9.49)

We initialize the parameters as follows: γ_0^g = γ_0^b = 0.09 and γ_1^g = γ_1^b = 0.95. In the outer loop, we iterate over the law of motion presented by (9.49). We set the number of iterations to a maximum of n_q = 100 and stop as soon as the maximum of the absolute percentage difference between any of the four coefficients γ_0^x or γ_1^x, x ∈ {g, b}, falls below 0.01. In our simulation, we need 67 iterations over the law of motion. Since we simulate the dynamics of the shocks using a pseudorandom number generator, the number of necessary iterations in the outer loop is, of course, stochastic and may differ between runs.

Step 4: In this step, we compute the optimal next-period asset level a'(i, ε, a, Z, K) by value function iteration. Between grid points, we interpolate linearly.
47 More recent versions of GAUSS support n-dimensional arrays, but their use is less convenient than in other computer languages such as Python or MATLAB.
The maximization of the rhs of the Bellman equation is performed using the golden section search algorithm 15.4.1. We need to find the optimum for 50 × 5 × 5 × 2 × 2 = 5,000 grid points. The computation is much faster 1) if we compute and store the next-period value v(i', ε', a', Z', K') for all n_k values K'(K), where K' is computed from the dynamics (9.49), before we start iterating over i, ε, a, and Z, and 2) if we make use of both the monotonicity of the next-period asset level a'(a, ·) and the value function v(a, ·) with respect to a and the concavity of the rhs of the Bellman equation with respect to a'. In particular, we stop searching over the next-period asset grid a' if the rhs of the Bellman equation decreases, and we do not search for the optimal next-period asset level for values of a'(a_i, ·) below a'(a_{i−1}, ·) for a_i > a_{i−1}.

Step 5: To simulate the dynamics of the wealth distribution, we choose a sample of n_h = 5,000 households. We divide the households into 10 subsamples (i, ε), i = 1, . . . , 5, ε ∈ {e, u}. We know that the relative sizes of these subsamples are equal to N^i(Z) and 1 − N^i(Z) for Z = Z^g and Z = Z^b, respectively. We initialize the distribution such that each agent has equal wealth in period 1. In particular, the average wealth in period 1 is equal to the aggregate capital stock in the economy.48 The assets of the next period are computed with the help of the optimal decision rule a'(a, K; i, ε, Z) for each household. The aggregate capital stock of the economy is equal to average wealth in the economy. We further use a random number generator to find 1) the productivity level of the next period, Z', using the transition matrix π(Z'|Z), 2) the employment status of the next period, ε', using the transition matrix π^i((ε', Z')|(ε, Z)), and 3) the efficiency type of the individual using the income mobility matrix π(i'|i). In period t, we have a sample of 5,000 households with wealth holdings a_t and a distribution with mean K_t. The productivity level is equal to Z_t. The number of employed households of type i, for example, may not be equal to N^i(Z_t). For this reason, we choose a random number of agents and switch their employment status accordingly. We also may have to switch the efficiency type i. For this reason, we start by considering the households with efficiency i = 1 and ε = e. If their number is smaller than N^1(Z_t), we switch the missing number of households with i = 2 and ε = e to i = 1 and ε = e at random. Otherwise, we switch the surplus number of households with type i = 1 to type i = 2. We continue this
48 In the very first simulation, we use the aggregate capital stock as an initial guess that is computed from the steady state of the corresponding representative-agent model.
process for i = 1, . . . , 5 and ε ∈ {e, u}. By this procedure, agents of type i = 1 may not be switched to agents of type i = 4, for example. We judge this to be a reasonable imposition of the law of large numbers.49

Step 6: We divide the simulated time series of the aggregate capital stock {K_t}_{t=101}^{t=2,000} into two subsamples, with Z_t = Z^g and Z_t = Z^b, respectively. For the two subsamples, we estimate the coefficients γ_0 and γ_1 of equation (9.26) with the help of an OLS regression.

Step 7: We continue this iteration until the estimated OLS coefficients of the log-linear law of motion for the capital stock converge. As it turns out (Step 8), the fit of the regression is very accurate, with an R² close to one.

RESULTS. The economy with efficiency mobility behaves very similarly to that without efficiency mobility. For this reason, we concentrate on displaying the results for the former economy if not mentioned otherwise. The law of motion (9.49) is given by:

ln K' = 0.1042 + 0.981 ln K,  if Z = Z^g,
ln K' = 0.0915 + 0.983 ln K,  if Z = Z^b.   (9.50)

The stationary average aggregate capital stock amounts to K = 228 (see footnote 50). Total computational time amounts to 3 hours, 37 minutes on an Intel(R) Xeon(R) 2.90 GHz machine. The distribution of earnings among the employed agents is proportional to their efficiency type ζ_i and the wage rate w. Of course, the wage w is procyclical in our model. As home production is assumed to be constant over the business cycle, while the earnings of the employed agents increase during booms and decrease during recessions, the distribution of earnings over all workers is not constant over the business cycle. During booms, Z = Z^g, earnings are less concentrated and characterized by a Gini coefficient equal to 0.292. During a recession, the Gini coefficient of earnings increases to 0.304 as a larger share of workers has to rely upon home production; the share of unemployed increases from 8.0% to 11.4%. Since income from home production, w̄, is less than half of that of the worker with the
49 These problems, which arise from the fact that the law of large numbers does not hold in our Monte Carlo simulation, do not show up in the methods that we present in Section 11.3.4. In that section, we approximate the distribution function over the individual states by a piecewise linear function and simulate its dynamics. 50 For example, aggregate capital amounts to K = 229 in the economy without efficiency mobility, and the laws of motion for the capital stock are given by ln K' = 0.0970 + 0.982 ln K and ln K' = 0.0888 + 0.983 ln K in good and bad times, respectively.
lowest wage rate (efficiency type i = 1), the income share of the lowest income quintile drops during recession and is procyclical.51 Since earnings are positively correlated with wealth (and, hence, interest income),52 the concentration of income (earnings plus interest income) is stronger than that of earnings. In addition, the Gini coefficient of income is also more volatile than the Gini coefficient of earnings and varies between 0.296 and 0.322 over the cycle. The Lorenz curve of income is displayed in Figure 9.16. The income shares are computed as averages over 2,000 periods. Note that we are able to very closely replicate the empirical distribution of income.53 The income distribution implied by the model is almost identical to that in the US during the period 1946-84.
Figure 9.16 Lorenz Curve of Income (percent of income against percent of population; model vs. US economy, with the equal-distribution line for reference)
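For readers who want to reproduce such Lorenz curves and Gini coefficients from their own simulations, the following is a minimal sketch (our own helper functions; the random sample below is purely illustrative and not output of the model):

```python
import numpy as np

def gini(x):
    """Gini coefficient of a nonnegative cross-section x."""
    x = np.sort(np.asarray(x, dtype=float))
    n = x.size
    lorenz = np.cumsum(x) / x.sum()
    # trapezoid approximation of the area under the Lorenz curve
    area = (lorenz.sum() - lorenz[-1] / 2.0) / n
    return 1.0 - 2.0 * area

def lorenz_points(x, shares=(0.2, 0.4, 0.6, 0.8, 0.95)):
    """Cumulative income shares at selected population percentiles."""
    x = np.sort(np.asarray(x, dtype=float))
    cum = np.cumsum(x) / x.sum()
    n = x.size
    return {s: cum[int(s * n) - 1] for s in shares}

# Purely illustrative sample of 5,000 incomes (log-normal), not model output
rng = np.random.default_rng(0)
income = rng.lognormal(mean=0.0, sigma=0.5, size=5_000)
print(gini(income))
print(lorenz_points(income))
```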
Table 9.2 reports the cyclical behavior of income shares for the US and for the model economy with varying efficiency types. The empirical correlations of US output and income shares are taken from Table 2 in Castañeda et al. (1998b). The sample period, again, is 1948-86. The yearly output data are logged and detrended using a Hodrick-Prescott filter with 51
There is a partially offsetting effect from the behavior of the wage rate per efficiency unit, w, which falls during a recession, such that the wage gap between the employed and the unemployed, wζ_1 h(Z) − w̄, also declines during a recession. However, in the lowest income quintile, the effect of higher unemployment on the income share dominates. 52 The correlation coefficient of earnings and wealth amounts to 0.35. 53 The empirical values for the US income and wealth distributions during 1948-86 are provided in Tables 1 and 6 of Castañeda et al. (1998b), respectively.
a smoothing parameter μ = 100.54 The income share of the lower quintiles (0-60%) is procyclical, the income share of the fourth quintile and the next 15% (60-95%) is countercyclical, while the top 5% income share is acyclical.

Table 9.2 Correlation of Income Shares and Output

  Income Quintile              US      Model         Model
                                       mobility      no mobility
  lowest quintile (0-20%)      0.53     0.80          0.82
  second quintile (20-40%)     0.49     0.78          0.70
  third quintile (40-60%)      0.31    -0.73         -0.74
  fourth quintile (60-80%)    -0.29    -0.79         -0.79
  next 15% (80-95%)           -0.64    -0.81         -0.82
  top 5% (95-100%)             0.00    -0.80         -0.83
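A minimal sketch of the detrending step behind the correlations in Table 9.2 (our own implementation of the Hodrick-Prescott filter with weight μ = 100; the input series below are random placeholders rather than model output):

```python
import numpy as np

def hp_filter(y, mu=100.0):
    """Return the cyclical component of the series y (HP filter, weight mu)."""
    y = np.asarray(y, dtype=float)
    n = y.size
    D = np.zeros((n - 2, n))          # second-difference matrix
    for i in range(n - 2):
        D[i, i:i + 3] = (1.0, -2.0, 1.0)
    trend = np.linalg.solve(np.eye(n) + mu * (D.T @ D), y)
    return y - trend

rng = np.random.default_rng(1)
log_y = np.cumsum(0.01 * rng.standard_normal(250))   # placeholder: 250 annual log outputs
share_q1 = 0.05 + 0.001 * rng.standard_normal(250)   # placeholder: annual income share series

cycle = hp_filter(log_y, mu=100.0)
print(np.corrcoef(cycle, share_q1)[0, 1])             # correlation of income share with output
```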
In the third column of Table 9.2, we report the statistics computed from our simulation over 2,000 periods for the economy with time-varying efficiency types. Again, output is logged and detrended using the Hodrick-Prescott filter with μ = 100 to make it comparable to the empirical numbers. Therefore, we need to compute annual averages of output, income, and earnings shares for 2,000/8 = 250 years. The simulated correlation of income is only in good accordance with the empirical observations for the first and second income quintiles as well as for the 80-95% income percentile class. Two possible explanations for the rather poor fit of the other percentiles are that we do not allow for endogenous labor supply (which might generate more procyclical behavior of the third and fourth income quintiles) and that we are not very successful in replicating the wealth distribution (a better fit might generate more procyclical interest and profit income for the top 5% of the income distribution). The cyclical behavior of income shares in the economy without mobility is presented in the right column of Table 9.2 and is found to be almost identical to that in the economy with mobility. The most pronounced effect of income mobility on the distribution of the individual variables earnings, income, and wealth is on the concentration
54 We use the same weight μ as applied by Castañeda et al. (1998b). Ravn and Uhlig (2002) argue for using a weight parameter of 6.5 for annual data instead.
of wealth. There are two opposing effects of income mobility on wealth heterogeneity. In the economy with time-varying efficiency types, wealth-rich and income-rich agents of types i = 2, 3, 4, 5 accumulate higher savings for precautionary reasons in case they move down the income ladder. This effect, of course, increases wealth concentration in our economy, and we would therefore expect the Gini coefficient of wealth to be higher in the economy with efficiency mobility. On the other hand, agents of type i = 5, for example, might have had efficiency type i = 4 or even lower in previous periods and thus have accumulated less wealth than agents who have had efficiency type i = 5 forever. For this reason, wealth heterogeneity is lower in an economy with time-varying efficiency types. As it turns out, the latter effect dominates, and wealth heterogeneity is higher in the case of no mobility. In both economies, the endogenous wealth concentration is much lower than observed empirically, and the Gini coefficient of wealth only amounts to 0.274 (0.340) in the economy with varying efficiency types (no efficiency mobility).55 Figure 9.17 displays the Lorenz curve of our model economy with time-varying efficiency types and the US wealth distribution. In the next chapter, you will learn how we can improve the modeling of the wealth distribution.
Figure 9.17 Lorenz Curve of Wealth (percent of wealth against percent of population; model vs. US economy, with the equal-distribution line for reference)
55 The concentration of wealth is basically acyclical. The correlation of the wealth Gini coefficient with detrended log output amounts to -0.01.
Problems
Problem 9.1: Transition Dynamics in Example 9.3.1
1) Assume that agents use the first two moments to forecast future factor prices in Example 9.3.1. Show that the consideration of an additional moment does not result in much higher accuracy in the prediction of the factor prices than in the case of one moment.
2) Assume that the initial distribution of the economy described in Example 9.3.1 is given by the stationary distribution, and consider a policy that increases the replacement rate of unemployment insurance with respect to wages to 40%. Compute the transition dynamics, assuming that the income tax rate always adjusts to balance the budget. How does the wealth distribution change? Compute the Gini coefficients of the income and wealth distribution during the transition and in the new stationary state.
3) Compute the stationary state and the transition dynamics for the growth model of Example 9.3.1 with leisure. Use the utility function (9.14) with θ = 0.5. Apply the prediction function ln L' = ψ_0 + ψ_1 ln L for aggregate labor L.
4) Implement Algorithm 9.3.2 using weighted residual methods for the computation of the policy functions.
Problem 9.2: Aggregate Uncertainty
1) Assume that agents use the first two moments to forecast future factor prices in Example 9.4.1. Show that the consideration of an additional moment does not result in much higher accuracy in the prediction of the factor prices than in the case of one moment.
2) Assume that, in Example 9.4.1, the logarithm of technology, z := ln Z, follows the first-order autoregressive process z_t = ρ z_{t−1} + η_t with ρ = 0.9 and η_t ~ N(0, 0.01). Compute a 5-state Markov-chain approximation of the AR(1) process using Tauchen's and Rouwenhorst's methods. Compute the model of Example 9.4.1. How do the results change if you use 9 states instead of 5 states for the Markov-chain approximation?
3) Assume that the unemployment rate is not constant during booms or recessions. Assume that leisure is an argument of the utility function. How does the program Krusell_Smith_algo.g need to be adjusted?
Problem 9.3: The Shape of the Wealth Distribution In the standard neoclassical growth model with stochastic income considered in Example 9.4.1, the wealth distribution is too equal compared to the empirical
distribution. As suggested by Krusell and Smith (1998), the introduction of a stochastic discount factor helps to improve the modeling of the skewed wealth distribution. For this reason, assume that β can take on three values, 0.955, 0.96, and 0.965. The invariant distribution of βs is such that 80% of the population holds the middle value and 10% of the population holds each of the other two values. Furthermore, there is no transition from the bottom to the top value of β, and the transition matrix is given by

             [ 0.98  0.02  0    ]
Π(β'|β) =    [ 0.01  0.98  0.01 ]
             [ 0     0.02  0.98 ]

Compute the model with the help of the Krusell-Smith algorithm 9.4.1. Simulate the economy over 6,000 periods. Compute the average Gini coefficient of the wealth distribution in the last 5,000 periods and compare it with that in the model without discount factor uncertainty.
Problem 9.4: Costs of Business Cycles
1) Compute the gain in average expected lifetime utility from eliminating the business cycle fluctuations in the model of İmrohoroğlu (1989b) presented in Section 9.5.1.
2) Assume that there is perfect insurance in the economy described in Section 9.5.1. Each agent receives the average income in the economy. By how much is the utility gain (as measured in consumption equivalents) from the elimination of cyclical fluctuations reduced?
Problem 9.5: Dynamics of the Income Distribution
Compute the model of Section 9.5.2 with the same calibration except that π((ε', Z^g)|(ε, Z^b)) = π((ε', Z^b)|(ε, Z^b)) and π((ε', Z^b)|(ε, Z^g)) = π((ε', Z^g)|(ε, Z^g)). Note that for this calibration, next-period aggregate employment L' is not only a function of next-period aggregate productivity Z' but also of current-period productivity and employment, L' = L'(L, Z, Z'). Recompute the mean Gini coefficients of income and earnings and the correlation of income and earnings with output.
Chapter 10
Overlapping Generations Models with Perfect Foresight
10.1 Introduction
In this chapter, we introduce an additional source of heterogeneity. Agents not only differ with regard to their individual productivity or their wealth but also with regard to their age. First, you learn how to compute a simple overlapping generations (OLG) model in which each generation can be represented by a homogeneous household. Subsequently, we study the dynamics displayed by the typical Auerbach-Kotlikoff model. We will pay particular attention to the updating of the transition path for the aggregate variables and the computation of the steady state in a large-scale OLG model with more than one hundred endogenous variables. The previous two chapters concentrated on the computation of models based on the Ramsey model. In this chapter, we analyze overlapping generations models. The central difference between the OLG model and the Ramsey model is that there is a continuing turnover of the population.1 The lifetime is finite, and in every period, a new generation is born, and the
Finite lifetime and population turnover can also be considered in the neoclassical growth model by the introduction of a constant probability of death in each period. This model of perpetual youth is based on the work of Yaari (1965) and Blanchard (1985). However, it is difficult to replicate the empirical age structure of an economy in detail. Therefore, only a few quantitative DGE studies apply this framework. One of the few exceptions is Gertler (1999), which extends the perpetual youth model of Yaari (1965) and Blanchard (1985) to a two-period model with workers and retirees that nests the infinite-horizon and the 2-period OLG models as special cases. In his model, agents are born as workers in the first period of their life and become retirees with a constant probability. In the second period of life, agents again die with a constant probability. The model of Gertler (1999) is particularly suitable for the study of debt and redistribution between generations. Its setup is analytically most convenient and allows for easy aggregation over different cohorts if preferences are specified recursively. However, this tractability comes at a cost. The age
oldest generation dies.2 In such models, many cohorts coexist at any time. In the pioneering work on OLG models by Allais (1947), Samuelson (1958) and Diamond (1965), the number of coexisting cohorts only amounted to two, the young and working generation on the one hand and the old and retired generation on the other hand. In these early studies of simple OLG models, Samuelson (1958), and Diamond (1965) focused on the analysis of theoretical problems, i.e., whether there is a role for money and what the effects of national debt are, respectively.3 Subsequent research has been directed towards the study of large-scale numerical OLG models to evaluate the quantitative effects of economic policy. In such works, typically, cohorts are identified with the members of the population of the same age. One seminal work in this area is the study of dynamic fiscal policy by Auerbach and Kotlikoff (1987).4 In their work, the first cohort is identified as the 20-year-old cohort, which enters the labor market. Fifty-five different generations are distinguished such that at the end of age 74, all agents die. In their 55-period overlapping generations model with a representative household in each cohort, they report, among other findings, that a 60% benefit level of unfunded social security decreases welfare by approximately 5-6% of total wealth (depending on the financing of the social security expenditures). In addition to the early work by Auerbach and Kotlikoff (1987), subsequent authors have introduced various new elements into the study of overlapping generations, such as stochastic survival probabilities, bequests, or individual income mobility, to name but a few. The remaining part of this chapter is organized as follows. In Section 10.2, you are introduced to the basic life-cycle model with agedependent cohorts, and we compute the steady state. In Section 10.3, we consider the computation of the Laffer curve in the OLG model as an application. A particular emphasis will be placed on the provision of a good initial value for the computation of the steady state with the help of the Newton-Rhapson algorithm. In the next two sections, the transition between two steady states is examined, first in a simplified 6-period OLG distribution in this model follows from the assumption of constant survival probabilities in each period. Therefore, it is less able to match empirical age distributions of the population. 2 In Sections 10.3 and 10.5, we introduce stochastic survival in the overlapping generations model. In this case, some members of the interim cohorts between the youngest and the oldest also die. 3 Allais (1947) considered a pure exchange economy with fixed production (endowments). 4 Other early studies of life-cycle economies include Summers (1981), Auerbach et al. (1993), Evans (1983), or Hubbard and Judd (1987).
model in Section 10.4 and subsequently in a large-scale OLG model in Section 10.5 where we study the consequences of the demographic transition for the US economy. In the concluding Section 10.6, we evaluate the importance of OLG models in the study of macroeconomic problems. In this chapter, we focus on OLG models with perfect foresight both for the individual and the aggregate economy. OLG models with uncertainty will be considered in the next chapter.
10.2 The Steady State in OLG Models In this section, we solve an overlapping generations model without uncertainty. All agents in a given cohort are identical, and their behavior is analyzed by means of the behavior of a representative agent.
10.2.1 An Elementary Model
We use a 60-period overlapping generations model as an illustration. The periods correspond to years. The model is a much-simplified version of the economy studied by Auerbach and Kotlikoff (1987).5 Three sectors are distinguished: households, production, and the government.

HOUSEHOLDS. Every year, a generation of equal measure is born. The total measure of all generations is normalized to one. Their first period of life is period 1. A superscript s on a variable denotes the age of the generation, and a subscript t denotes time. For example, c_t^s and l_t^s denote consumption and labor supply of the s-year-old generation at time t. Households live T = T^W + T^R = 40 + 20 years. Consequently, the measure of each generation is 1/60. During their first T^W = 40 years as workers, agents supply labor l_t^s at age s in period t, enjoying leisure 1 − l_t^s. After T^W years, retirement is mandatory (l_t^s = 0 for s > T^W). Agents maximize lifetime utility at the beginning of age 1 in period t:

U_t = Σ_{s=1}^{T} β^{s−1} u(c_{t+s−1}^s, 1 − l_{t+s−1}^s),   (10.1)
5 For example, we do not consider different types of agents in a given cohort, and we model the tax and pension system in a very stylized way.
where β denotes the discount factor. Note that, in contrast to the discount factor β in the Ramsey model, β does not necessarily need to be below one in an OLG model to guarantee that lifetime utility is finite.6 Instantaneous utility is a function of both consumption c and leisure 1 − l:

u(c, 1 − l) = [ (c (1 − l)^γ)^{1−η} − 1 ] / (1 − η).   (10.2)
For this utility function, the coefficient of relative risk aversion amounts to R(c, l) = η − γ(1 − η).7 Agents are born without wealth, k_t^1 = 0, and do not leave bequests, k_t^{61} = 0. Since capital k is the only asset held by individuals, the terms capital and wealth will henceforth be used interchangeably. Agents receive income from capital k_t^s and labor l_t^s. The real budget constraint of the working agent is given by

k_{t+1}^{s+1} = (1 + r_t) k_t^s + (1 − τ_t) w_t l_t^s − c_t^s,   s = 1, . . . , T^W,   (10.3)

where r_t and w_t denote the real interest rate and the real wage rate in period t, respectively. Wage income in period t is taxed at rate τ_t. We can also interpret τ_t w_t l_t^s as the worker's social security contributions. During retirement, agents receive public pensions pen irrespective of their employment history, and the budget constraint of the retired worker is given by

k_{t+1}^{s+1} = (1 + r_t) k_t^s + pen − c_t^s,   s = T^W + 1, . . . , T^W + T^R.   (10.4)

Let

y_t^s := (1 − τ_t) w_t l_t^s + (1 + r_t) k_t^s   for s = 1, . . . , T^W,
y_t^s := pen + (1 + r_t) k_t^s                   for s = T^W + 1, . . . , T,   (10.5)
denote the income of the s-year-old household. Consider a worker of age s = 1, . . . , T^W. The Lagrangian function of this household at time t, whose remaining lifetime is T − s years, reads

L_t^s = u(c_t^s, 1 − l_t^s) + λ_t^s [ y_t^s − c_t^s − k_{t+1}^{s+1} ]
      + β u(c_{t+1}^{s+1}, 1 − l_{t+1}^{s+1}) + β λ_{t+1}^{s+1} [ y_{t+1}^{s+1} − c_{t+1}^{s+1} − k_{t+2}^{s+2} ] + . . .
      + β^{T−s} u(c_{t+T−s}^T, 1 − l_{t+T−s}^T) + β^{T−s} λ_{t+T−s}^T [ y_{t+T−s}^T − c_{t+T−s}^T ].
6 For restrictions on the size of β in economies with infinitely lived agents, see Deaton (1991). 7 Compare (1.52a).
Note that we have not included the capital stock for age T + 1, since with certain death at the end of age T and without a motive to leave bequests, the obvious optimal choice is k_{t+(T+1−s)}^{T+1} = 0. The first-order conditions are

∂L_t^s / ∂c_t^s = u_c(c_t^s, 1 − l_t^s) − λ_t^s = 0,   (10.6a)
∂L_t^s / ∂l_t^s = −u_{1−l}(c_t^s, 1 − l_t^s) + λ_t^s (1 − τ_t) w_t = 0,   (10.6b)
∂L_t^s / ∂k_{t+1}^{s+1} = −λ_t^s + β λ_{t+1}^{s+1} (1 + r_{t+1}) = 0.   (10.6c)
=γ
c ts
1 − l ts
= (1 − τ t )w t .
(10.7a)
Using (10.6a) for ages s and s + 1 to replace the respective Lagrange multipliers in (10.6c) yields s+1 s+1 uc (c t+1 , 1 − l t+1 ) 1 = [1 + r t+1 ] s s β uc (c t , 1 − l t ) s+1 −η s+1 γ(1−η) c t+1 1 − l t+1 = −η γ(1−η) [1 + r t+1 ] . c ts 1 − l ts
(10.7b)
The first-order conditions of the retired workers, s = T W + 1, . . . , T , are s given by (10.7b) and l t+T −s = 0. PRODUCTION. The production sector is identical to the one used in previous chapters. Firms are of measure one and produce output Yt in period t with aggregate labor L t and capital K t . Labor L t is paid the wage w t . Capital K t is hired at rate r t and depreciates at rate δ. Production Yt is characterized by constant returns to scale and assumed to be Cobb-Douglas: Yt = F (K t , L t ) = K tα L 1−α . t Profits are represented by Π t = Yt − w t L t − r t K t − δK t .
(10.8)
548
10 Overlapping Generations Models with Perfect Foresight
In a factor market equilibrium, factors are rewarded with their marginal product: w t = (1 − α)K tα L −α t , rt =
αK tα−1 L 1−α t
− δ.
(10.9a) (10.9b)
GOVERNMENT. The government uses revenues from taxing labor to finance its expenditures on social security: TR pen, (10.10) T where we exploit the facts that each cohort has mass 1/T and the number of retiree cohorts is equal to T R . Following a change in the provision of public pensions pen or in gross labor income w t L t , the labor income tax rate τ t adjusts to keep the government budget balanced. τt w t L t =
EQUILIBRIUM. The concept of equilibrium applied in this section uses a recursive representation of the consumer’s problem following Stokey et al. (1989). This specification turns out to be very amenable to one of the two solution methods described in this section. For this reason, let v s (kst ; K t , L t , pen) be the value of the objective function of the s-year-old agent. The value function v s (·), in particular, depends on individual wealth kst and the aggregate state variables K t and L t that determine the wage rate w t and the interest rate r t in period t via (10.9a) and (10.9b) and, in addition, on government policy pen, which implies the tax rate τ t with the help of the balanced budget (10.10). Furthermore, v s (·) depends on the age s of the household but not on calendar time t. v s (k t ; K t , L t , pen) is defined as the solution to the following dynamic programs:8 for workers of age s = 1, . . . , T W s s s+1 s+1 v s (kst ; K t , L t , pen) = max u c , 1 − l + β v k ; K , L , pen t+1 t+1 t t t+1 s s c t ,l t
(10.11a)
subject to (10.3); for retired households of age s = 1, . . . , T − 1 s s s+1 s+1 v s (kst ; K t , L t , pen) = max u c , 1 − l + β v k ; K , L , pen t+1 t+1 t t t+1 s ct
8
(10.11b)
Some authors drop the aggregate state and policy variables as arguments of the value function for notational convenience. We will proceed in the same way if appropriate.
10.2 The Steady State in OLG Models
549
subject to (10.4); and for members of the oldest generation v T (k tT ; K t , L t , pen) = u(c tT , 1).
(10.11c)
An equilibrium for a given government policy pen and initial distribu T tion of capital k0s s=1 is a collection of value functions v s (kst ; K t , L t , pen), individual policy rules for consumption c s (kst ; K t , L t , pen), labor supply l s (kst ; K t , L t , pen), and next-period capital ks+1 (kst ; K t , L t , pen), and relative prices of labor and capital, (w t , r t ), such that: 1. Individual and aggregate behavior are consistent: Lt = Kt =
TW s X l t
s=1 T X s=1
T kst T
,
(10.12a)
.
(10.12b)
The aggregate labor supply L t is equal to the sum of the labor supplies of each cohort, each of which is weighted by its measure 1/T = 1/60. Similarly, the aggregate capital supply is equal to the sum of the capital supplies of all cohorts. 2. Factor prices (w t , r t ) solve the firm’s optimization problem by satisfying (10.9a) and (10.9b). 3. Given factor prices (w t , r t ) and the government policy pen, the individual policy rules c ts (·), l ts (·), and kst+1 (·) solve the consumer’s dynamic program (10.11a)-(10.11c). 4. The goods market clears: Yt =
K tα L 1−α t
=
T X cs t
s=1
T
+ K t+1 − (1 − δ)K t .
(10.13)
5. The government budget (10.10) is balanced. CALIBRATION. Our model merely serves as an illustration. Therefore, we calibrate our model with the functional forms and parameters as commonly applied in DGE life-cycle models. Our benchmark case is characterized by the following calibration: η = 2, β = 0.99, α = 0.3, δ = 0.1, replacement pen rate of pensions with respect to average net wage income, r epl = (1−τ)w¯l = 0.3 (where ¯l denotes the average labor supply in the economy), T W = 40,
and T R = 20. γ is chosen to imply a steady-state labor supply of the working agents approximately equal to ¯l = 35% of available time and amounts to γ = 2.0.
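As a quick consistency check on this calibration (our own back-of-the-envelope calculation, not part of the book's programs), interpret l̄ as the average hours of the working cohorts. Then pen = repl·(1 − τ)·w·l̄ with repl = 0.3, aggregate labor is L = (T^W/T)·l̄, and the balanced budget (10.10) pins down the labor tax rate independently of prices:

τ w L = (T^R/T) pen  ⟹  τ T^W = T^R · repl · (1 − τ)  ⟹  τ = T^R · repl / (T^W + T^R · repl).

```python
# Back-of-the-envelope labor tax rate implied by the balanced budget (10.10)
TW, TR, repl = 40, 20, 0.3
tau = TR * repl / (TW + TR * repl)
print(tau)   # about 0.1304, consistent with the steady-state tax rate of 13.0% reported below
```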
10.2.2 Computational Methods In this section, we compute the steady state that is characterized by a 60 constant distribution of the capital stock over the generations, kst s=1 = s 60 60 k = ¯ks . In the steady state, the aggregate capital stock and t+1 s=1
s=1
aggregate employment are constant, K t = K and L t = L, respectively. As a consequence, prices w and r are also constant, and so are taxes τ. Therefore, in the steady state, the computation of the equilibrium is simplified, as for given aggregate capital stock K and employment L, the value function and the individual policy functions are only functions of age s and individual wealth ks . For notational convenience, we drop the time index t in this section and will only reintroduce it in the next section. The general solution algorithm for the computation of the steady state in our elementary OLG model is described by the following steps: Algorithm 10.2.1 (Computation of the Steady State in the OLG Model in Section 10.2.1) Purpose: Computation of the steady state. Steps: Step 1: Make initial guesses of the steady-state values of the aggregate capital stock K and employment L. Step 2: Compute the values w, r, and τ, which solve the firm’s Euler equations and the government budget. Step 3: Compute the optimal path for consumption, savings, and employment for the new-born generation by backward induction given the initial capital stock k1 = 0. Step 4: Compute the aggregate capital stock K and employment L. Step 5: Update K and L, and return to step 2 until convergence. In Step 3, the household’s optimization problem needs to be solved. For the computation of the individual policy functions, we can use the same two techniques that we applied in the solution of the Ramsey model in Chapter 1. 1) The Euler equations (10.7a) and (10.7b) provide a set of difference equations that determine the optimal time path of consumption over an individual’s lifetime. We are able to solve for the dynamics of the variables with the help of a nonlinear equations solver and call this method the direct computation of the steady-state in the following. 2) Dynamic programming (10.11a) delivers a policy function that relates the agent’s choice of current consumption to his stock of capital. In this case, the
policy function is computed over the whole individual state space at age s, s s [kmin , kma x ]. The first method involves the solution of a nonlinear equations problem and is only applicable in a model environment with a small number of different individuals as we solve the intertemporal optimization problem for all the different types in the OLG model separately. To understand this argument just remember how we introduced unemployment in the Ramsey model in Section 8.3. Consider our present OLG model and assume instead that the working-age households also faces the risk of unemployment and can become unemployed at each age s = 1, . . . , 40 with an exogenous probability. Starting at age s = 1, we have two kinds of households in the cohort, one employed and one unemployed receiving wage income and lower unemployment benefits, respectively. Accordingly, the two types of households will be characterized by different savings and accumulated capital stock k2,e and k2,u at the beginning of age s = 2, respectively. In the next period at age s = 2, we will have four kinds of households because some of the former employed with capital stock k2,e will become unemployed, while others stay employed. Similarly, among the formerly unemployed households with capital stock k2,u , some of the agents will become employed while the others remain unemployed. Since the savings of the individual are not constant over age,9 the wealth distribution of the individuals at the beginning of age s depends on the entire employment history and not only on the number of periods of unemployment. The individual who was unemployed at age s = 1 but employed at age s = 2 builds up different savings than an individual who was employed at age s = 1 and unemployed at age s = 2. Therefore, in this simple economy, there will be 240 = 1.1 · 1012 different individuals at age s = 40. This number is prohibitively high and we cannot compute the solution of the Euler equation for all these different types of households at age s = 40. If we consider OLG models with heterogeneity and idiosyncratic risk dynamic programming will often be the only workable procedure.10 However, it comes at a cost. In general, it is slower and less accurate than the computation of the policy function at a few specific points with the help of the Euler equations. Therefore, it is important to sort out whether 9
To see this argument notice that the household who was employed at age s = 1 will receive higher interest income at age s = 2 than the household who was unemployed at age s = 1. In addition, we will find out that labor supply is not constant over age. Therefore, labor income and, hence, savings of the employed households will also be age-dependent. 10 In Section 11.3.4, we present an alternative solution method in an OLG model with both individual and aggregate uncertainty.
idiosyncratic risk and income mobility are important for the economic problem that you study. For example, in Section 9.5.2, we found that idiosyncratic income risk does not significantly help to improve the modeling of the business cycle dynamics of the income distribution in the simple infinite-life model with heterogeneous agents in this section. As another application, Huggett et al. (2011) analyze an overlapping generations model with heterogeneity in initial abilities, wealth, and human capital and also consider idiosyncratic shocks to human capital that they estimate from US data. They find that initial endowments of human capital and wealth are more important for the explanation of inequality than are idiosyncratic shocks over the lifetime. In these two applications, for example, neglecting idiosyncratic risk might be acceptable. In other applications, for example, in the consideration of optimal pension policies11 where one of the main welfare-improving properties of social security systems stems from insurance against negative income shocks, neglecting stochastic individual income is not warranted. Next, we describe the two methods for the computation of the steady state in turn.
10.2.3 Direct Computation To illustrate the direct computation of the steady-state distribution, consider the first-order conditions of the working household with respect to labor supply and next-period capital stock, (10.7a) and (10.7b), respectively. Inserting the working household’s budget (10.3) into these two equations, we derive the following two steady-state equations for s = 1, . . . , T W − 1: (1 + r)ks + (1 − τ)wl s − ks+1 , 1 − ls −η (1 + r)ks+1 + (1 − τ)wl s+1 − ks+2
(1 − τ)w =γ 1 = β
((1 + r)ks + (1 − τ)wl s − ks+1 )−η γ(1−η) 1 − l s+1 × [1 + r] . (1 − l s )γ(1−η)
(10.14a) (10.14b)
Similarly, (10.14a) also holds for s = T W , while (10.14b) needs to be adjusted: 11
See the literature review in Section 10.6 at the end of this chapter.
(1 + r)k T
W
+1
+ pen − k T
553 W
+2
−η
1 = −η β (1 + r)k T W + (1 − τ)wl T W − k T W +1 1 × γ(1−η) [1 + r] . 1 − lTW
(10.15)
For the retired agent, the labor supply is zero, l s = 0, and the Euler equation is given by: −η (1 + r)ks+1 + pen − ks+2 1 = [1 + r] , (10.16) β ((1 + r)ks + pen − ks+1 )−η
for s = T W + 1, . . . , T − 1 = 41, . . . , 59. Recall that the optimal capital stock after death is also set equal to zero, k61 ≡ 0. The equations (10.14a)(10.16) for s = 1, . . . , 59 constitute a system of 59+40=99 equations in the s 40 59+40=99 unknowns {ks }60 s=2 and {l }s=1 . Therefore, we have the same type of problem that we already encountered in Section 1.2, where we considered the finite-horizon Ramsey model. We again need to compute a nonlinear system of equations in n unknowns, in our case with n = 99. However, as you may have learned by now, the computation of such largescale non-linear problems may become cumbersome. Therefore, we should make further use of the recursive structure of the problem. In the GAUSS program AK60_direct.g, we compute the solution of this problem. We know that agents are born without wealth at age 1, k1 = 0, and do not leave bequests. Therefore, k61 = 0. Let us start by providing an initial guess of the wealth in the last period of life, k60 . With the help of this initial guess and the retired worker’s first-order condition (10.16) at age s = 59, we are able to compute k59 . In this case, we only have to solve a nonlinear equation problem with one unknown. Having computed ks+1 and ks+2 for the retired agent, we simply iterate backwards and compute ks for s = 59, 58, . . . , 41. From (10.14a) for s = 40 and (10.15), we are able to compute l 40 and k40 . We continue to compute ks and l s with the help of the values l s+1 , ks+1 , and ks+2 found in the previous two iterations with the help of equations (10.14a) and (10.14b) until we obtain k1 and l 1 . If k1 = 0, we are finished. Otherwise, we need to update our guess for k60 and recompute the distribution of individual capital and labor supply, s 40 {ks }60 s=1 and {l }s=1 . Note, in particular, that we need to iterate backwards. We cannot start with a guess of k2 given k1 = 0 and iterate forwards in the presence of endogenous labor supply. With exogenous labor supply, we would also be able to find the optimal capital distribution with forward iteration (why do you think this might be?).
Finally, we need to mention how we update successive values for k^{60}. In AK60_direct.g, we apply the secant method that we present in Section 15.3.12 Successive values of k^{60} are found by

k_{i+2}^{60} = k_{i+1}^{60} − [ (k_{i+1}^{60} − k_i^{60}) / (k_{i+1}^1 − k_i^1) ] k_{i+1}^1,

where the subscript i denotes the number of the iteration. As the first two guesses for k^{60}, we choose the values k_1^{60} = 0.15 and k_2^{60} = 0.2.13 After 5 iterations, we find the absolute value of k^1 to be below 10^{−8}.
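The following Python sketch illustrates the logic of backward iteration combined with the secant update in a deliberately simplified setting: labor supply is held fixed at l̄, and prices, taxes, and pensions are treated as given placeholder numbers. It is our own illustration of the idea, not a reproduction of AK60_direct.g (which additionally solves the intratemporal condition (10.14a) for the labor supply and iterates on the aggregate variables):

```python
import numpy as np

# Illustrative parameter values (placeholders, not the calibrated equilibrium)
beta, eta = 0.99, 2.0
r, w, tau, pen = 0.015, 1.0, 0.13, 0.10
TW, TR, lbar = 40, 20, 0.35
T = TW + TR

def implied_k1(k60):
    """Iterate the household's Euler equation backwards from a guess of k^60.

    With fixed labor lbar, the Euler equation implies
    c^s = c^{s+1} * (beta*(1+r))**(-1/eta), and the budget constraint gives
    k^s = (c^s + k^{s+1} - y^s) / (1+r). The function returns the implied
    wealth at age 1, which must be zero in the solution.
    """
    k = np.zeros(T + 1)                    # k[s] = wealth at age s; k[T] is the guess k^60
    c = np.zeros(T + 1)
    k[T] = k60
    c[T] = (1.0 + r) * k60 + pen           # last period of life: k^{61} = 0
    g_back = (beta * (1.0 + r)) ** (-1.0 / eta)
    for s in range(T - 1, 0, -1):          # s = 59, ..., 1
        c[s] = c[s + 1] * g_back
        y = (1.0 - tau) * w * lbar if s <= TW else pen
        k[s] = (c[s] + k[s + 1] - y) / (1.0 + r)
    return k[1]

# Secant iteration on k^60 so that the implied initial wealth k^1 equals zero
ka, kb = 0.15, 0.20
fa, fb = implied_k1(ka), implied_k1(kb)
for it in range(50):
    kc = kb - fb * (kb - ka) / (fb - fa)   # secant update, as in the formula above
    ka, fa = kb, fb
    kb, fb = kc, implied_k1(kc)
    if abs(fb) < 1e-8:
        break
print(kb, fb, it)
```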
Figure 10.1 Wealth-Age Profile in the Standard OLG Model
60 The solution for ¯ks s=1 is displayed in Figure 10.1. As is typical for the life-cycle model, wealth ks increases until the first period of retirement at age s = T W + 1 and declines monotonically thereafter. At age T W + 1, the income drops and savings that affect the capital stock in the next period decline.14 The aggregate capital stock amounts to K = 0.940. The optimal labor supply is graphed in Figure 10.2. The labor supply declines with 12
Alternatively, you can also apply the Newton-Rhapson algorithm. You will be asked to perform this in Problem 10.1. 13 One way to come up with an initial value of k160 is by trial and error or a grid search. A more elaborate guess for the initialization of the individual policy variables will be derived in Section 10.3. 14 If we assume that labor productivity is age-specific and falls beyond a certain age, the wealth-age profile is found to peak prior to the end of the working life.
10.2 The Steady State in OLG Models
555
increasing age s because older agents hold higher stocks of capital.15 As a consequence, marginal utility of income declines in older age. The average working time amounts to 0.355, so that aggregate employment is equal to L = 0.237. The steady-state values of pensions, the interest rate, and taxes are given by pen = 0.0979, r = 1.42%, and τ = 13.0%, respectively. 0.370 0.365
¯l s
0.360 0.355 0.350 0.345 1
10
20
30
40
s Figure 10.2 Labor-Supply-Age Profile in the Standard OLG Model
This direct method of computing the policy functions only at single values of the state space is very fast and accurate. It is also applicable to more complex models with many types of agents and/or assets. In Problem 10.4, we ask you to solve an OLG model with 5 different types of agents in each generation and with a portfolio choice on capital and money. The basic challenge is to come up with a good initial guess. Therefore, you are asked to compute the solution for the model without money and just one 15
This behavior of mean working hours is not in accordance with empirical observations of Kaplan (2012) who estimates a hump-shaped labor-supply-age profile for the US economy. Labor supply peaks after 15-20 working years, then flattens and eventually decreases toward retirement. As the main missing factor in our model, we do not yet assume a humpshaped efficiency-age profile. According to empirical evidence provided by Hansen (1993) the hourly wage rate increases until mid-age (around age 50) before it declines toward retirement. We, therefore, will consider age-dependent efficiency in the OLG models of the subsequent sections. We, however, do not match the fact that labor supply on the extensive margin (labor-market participation) is also hump-shaped over the life-cycle as documented by Bick et al. (2018). To correct the OLG model setup for this shortcoming, one may introduce fixed labor market participation costs that varies by age as in Kitao (2014) and applied by Cooley and Henriksen (2018) to explain the slowdown in growth in Japan and the US as a consequence of an aging population.
representative agent in each generation and find the solution step by step using homotopy methods. Direct computation should be applied whenever possible. However, this will not always be feasible, such as in models with idiosyncratic uncertainty. In this case, the only feasible alternative solution method often consists of computing the policy functions over an interval of the state space. This method is described next.
10.2.4 Computation of the Policy Functions 60 To compute the age-wealth profile of the steady state, ¯ks s=1 , we may also compute the optimal policy function ks+1 (ks ; K, L, pen) for each cohort s s s over an interval ks ∈ [kmin , kmax ]. As we do not know the age-wealth profile in advance, we will begin by computing the policy functions for each s s age over the same interval [kmin , kmax ] = [kmin , kmax ]. In later iterations, we may adapt the state space for each cohort s.16 Having computed the 60 policy functions, it is easy to find the solution ¯ks s=1 . We simply start with k1 = ¯k1 and compute k2 (k1 ) = k2 (0) = ¯k2 . Similarly, we compute ¯ks+1 with the help of ks+1 (¯ks ; K ¯ , ¯L , pen) for s = 2, . . . , T − 1. There are various methods to compute the policy functions. We will discuss the value function iteration and projection methods that you have already encountered in Chapter 7 and Chapter 5, respectively, for the solution of the Ramsey problem. We discuss these methods in turn and start with value function iteration in an economy with finite lifetime. VALUE FUNCTION ITERATION. A straightforward method of approximating the value function v s (ks ; K, L, pen) at a point ks with s = 1, . . . , T involves tabulating it for a finite number nk of points on the state space starting in the last period of life, T , and iterating backwards in time to the period s = 1. The maximization occurs over the interval [kmin , kmax ], which, in particular, must contain the steady state, a point that is only determined as the outcome of our computation. In the last period T , the value function v T (k T ; K, L, pen) is given by (10.11c) with c T = (1 + r)k T + pen. For a given table of values for v s+1 (ks+1 ; K, L, pen) on the grid [kmin , kmax ], 16
The adaption of the age-specific asset grid may not be a feasible strategy in the case of heterogeneous agents within a given cohort, a problem that you will encounter in Section 10.4.
10.2 The Steady State in OLG Models
557
the approximate retired agent’s maximum at age s on the right-hand side of (10.11a)-(10.11c) can be found by choosing the largest value for v s+1 (ks+1 ; K, L, pen) given ks , which we store as ks+1 (ks ; K, L, pen). Together with the two neighboring points on the asset grid, we bracket the maximum and apply a golden section search to find the maximum of the Bellman equation (10.11a). To obtain values of the value function v s+1 (·) off gridpoints, we interpolate linearly or with the help of a cubic spline. At this point, we need to emphasize a crucial difference between finitehorizon and infinite-horizon problems. In contrast to value function iteration in infinite-horizon models, we know the value of the agent in the last period of his life, v 60 = u(c 60 , 1 − l 60 ), with c 60 = (1 + r)k60 + pen and l 60 = 0. As a consequence, we do not have the problem of providing an initial guess for the value function. This feature also holds for the other solution methods of finite-horizon problems, e.g., the projection method presented below. Given the value v 60 (ks ; K, L, pen) for ks ∈ [kmin , kmax ], we can find the value functions of the different cohorts, v s (·), s = 59, . . . , 1, with only one iteration. As a consequence, the computation of the policy functions is much faster in most applications with finite horizons than in infinite-horizon problems. Note, however, that the need for storage capacity increases as the number of policy functions is multiplied by the number of different age cohorts. The dynamic programming problem of the working agent (10.11a) involves the maximization over an additional control, labor supply l s . A standard procedure to solve this kind of problem consists of choosing the largest value over a grid on labor supply l s ∈ [l min , l max ]. As a consequence, the optimal next-period capital stock together with the optimal labor supply decision is found by iterating over a two-dimensional grid. For reasonable required accuracy, we often find this procedure to already imply prohibitive storage capacity and computing speed to be a useful method on personal computers. Instead, we only iterate over a one-dimensional grid of the capital stock and solve the household’s Euler equation (10.7a) and budget constraint (10.3) for given current and next-period capital stock (ks , ks+1 ). For our choice of the functional form for utility u(.), we can solve these two equations even directly for c s and l s for given ks and ks+1 . Note that this procedure does not restrict the controls c s and l s to lie on any grid. The solution with the help of the value function iteration is computed in the program AK60_value.g.17 Concerning our computation details, 17
We also provide the Python and Julia code for this application. The program AK60_ value.py is also described in much more detail in our Jupyter notebook tutorial that is
558
10 Overlapping Generations Models with Perfect Foresight
wealth is bounded below by kmin = 0, while maximum wealth is set equal to kma x = 5.0, which is found to never be binding.18 Furthermore, we choose an equispaced grid over the capital stock [kmin , kmax ] of nk = 50 points. The required storage capacity for the value function v s (ks ) and the two policy functions, c s (ks ) and l s (ks ), is equal to 2 × T × nk + T w × nk = 800 numbers. The user can choose between linear and cubic spline interpolation. The age-wealth profile computed with value function iteration is almost identical to that displayed in Figure 10.1. Our results are summarized in Table 10.1. Table 10.1 Computation of the Steady State of the OLG Model K
N
Runtime
Direct computation
0.940
0.237
00:01
Value function iteration — linear interpolation — cubic spline
0.950 0.941
0.238 0.236
00:31 01:10
Projection method
0.943
0.237
00:05
Method
Notes: Runtime is given in minutes:seconds on an Intel(R) Xeon(R) CPU running at 2.90 GHz.
The aggregate capital stock and aggregate employment amount to K = 0.950 (0.941) and L = 0.238 (0.236) for the linear (cubic spline) interpolation. Note that in the case of cubic spline interpolation between grid points, the aggregate capital stock diverges less from the aggregate capital stock found with the direct method described above. This difference is a good measure of accuracy because the latter solution can be expected to coincide with the true solution (in the case of direct computation of the steady-state distribution, the accuracy of the non-linear equation solution and the divergence of ¯k1 from zero are both less than 10−8 ). In addition, cubic spline interpolation is slower than linear interpolation because it available on our download page. The Julia code AK60_value.jl uses the OPTIM package by Mogensen and Riseth (2018). 18 In our model, we abstract from any inequality constraints such as c ≥ 0 or k ≥ 0 because these constraints do not bind (except in period 1 with ¯k1 = 0.0). In Section 10.5, we will consider a life-cycle model where the initial cohorts accumulate debt, ks < 0.
10.2 The Steady State in OLG Models
559
needs more function evaluations. While value function iteration with linear interpolation takes 31 seconds, cubic spline interpolation takes 1 minutes and 10 seconds.19 This finding echoes our results for the stochastic Ramsey presented in Table 7.3. PROJECTION METHODS. Alternatively, we compute the steady-state so 60 lution ¯ks s=1 with the help of projection methods that we introduce in Chapter 5. For this reason, we approximate the consumption function c s (ks ; K, L, pen), s = 1, . . . , 60, and the labor supply l s (ks ; K, L, pen), s = 1, . . . , 40, with Chebyshev polynomials of order nc and nl over the interval [kmin , kma x ], respectively: s
s
c (k ; K, L, pen) = l s (ks ; K, L, pen) =
nc X j=0 nl X j=0
c,s
γ j T j (ξ−1 (ks )), l,s
γ j T j (ξ−1 (ks )),
where ξ−1 (ks ) = (2ks − kmin − kmax )/(kmax − kmin ) is the linear transformation that maps ks ∈ [kmin , kmax ] into the domain of the Chebyshev polynomials [−1, 1].20 We choose orthogonal collocation to compute the c,s l,s coefficients γ j and γ j from equations (10.7a) and (10.7b). In the case of the retired worker, for example, we solve the system of nc + 1 nonlinear equations at the values z that are the nc + 1 (transformed) zeros of the Chebyshev polynomial Tnc . To solve the nonlinear equations problem, we use a quasi-Newton method. The initial guess for the coefficients γc,s and γl,s are the coefficients γc,s+1 and γl,s+1 for s < T + T R . For s = T , we are able to compute the exact values of c T at 2nc Chebyshev interpolation nodes because we know that the household consumes all its income and wealth in the last period. Therefore, we can approximate the function in period T by least squares with the help of Algorithm 13.8.1. For s = T W , we need to provide an initial guess of the coefficients for labor supply, γl,40 . We use an inelastic labor supply function, l 40 (ks ; K, L, pen) ≡ 0.3, to initialize the coefficients, again making use of Algorithm 13.8.1. 19
In our experience, Julia is much fast than GAUSS which, in turn, is faster than Python. Computational times for the computer program AK60_value amount to 8.8 seconds, 30.8 seconds and 10 minutes:43 seconds on Julia, GAUSS and Python, respectively. Note, however, that we have not made extensive use of methods to speed up the computational time such as multi-threading or parallelization. 20 See equation (13.28b) in Chapter 13.
560
10 Overlapping Generations Models with Perfect Foresight
The program AK60_proj.g computes the steady-state solution with the help of projection methods. Concerning our computational details, we c,s choose a degree of approximation nc equal to 3. Given this choice, γ3 and c,s c,s −6 −3 γ3 /γ2 are less than 10 and 10 , respectively, and we can be confident that our approximation is acceptable. Similarly, we choose nl = 3. We also choose the same interval over the state space, [kmin , kmax ] = [0, 5], as in the case of the value function iteration. The computed Chebyshev coefficients drop off nicely because the decision functions of the household can quite accurately be described by polynomial functions of small degree. In addition, all parameter values are exactly the same as in the case of value function iteration. The results from the solution of the steady-state distribution with the help of the projection method nearly coincide with those from the value function iteration with cubic spline interpolation (see Table 10.1). In the former, the aggregate capital stock amounts to K = 0.943, and employment is equal to L = 0.237. The optimal policy functions for consumption differ by less than 0.1% over the range [kmin , kmax ] = [0.0, 5.0] between these two methods. Importantly, however, the algorithm based on projection methods is 60 times faster than that based on value function iteration and takes only 5 seconds. At this point, let us reconsider the results in Table 10.1 and mention one word of caution. In accordance with the results presented in Table 10.1, we recommend using direct computation whenever possible. In Chapter 11, we introduce both individual and aggregate uncertainty. In these cases, direct computation is not feasible in most cases, and one might conjecture that the projection method is preferable to value function iteration with cubic splines due to the much shorter computational time. However, in many cases, the introduction of uncertainty requires the consideration of much larger intervals for the state space over which we have to approximate the policy function. In these cases, we often find that the quality of the projection method deteriorates,21 or it might be extremely difficult to come up with a good initial guess. In fact, the search for a good initial guess might be more time-consuming than the use of value function iteration methods, or the initial guess itself might only be found with the help of value function iteration. For this reason, we cannot offer a general recommendation for which method is preferable, and the researcher may 21
For example, when using projection methods, we may be unable to preserve the shape (e.g., the concavity) of the policy function.
10.3 The Laffer Curve
561
have to try different methods and find the most accurate and fastest one through trial and error.
10.3 The Laffer Curve In many cases, it is difficult to come up with an initial guess for the solution of the non-linear equations that characterize the equilibrium of a largescale overlapping generations model. In Section 10.2.3, we studied a simple OLG model with a Cobb-Douglas utility function so that we could solve the first-order conditions of the households for labor supply as a linear function of consumption. This allowed us to write the equilibrium conditions of the model as a function of the capital stock alone and to effortlessly come up with an initial guess for the solution of the model. In the following, we will consider a standard model for the analysis of fiscal policy for which the computation of the nonlinear equilibrium equations is a much more complicated and sophisticated task. The model extends the Ramsey model of Trabandt and Uhlig (2011) to an overlapping generations model with a public pay-as-you-go pension system building upon Heer et al. (2020). In addition to these authors, we reduce the length of the period from five years to one year so that the equilibrium conditions amount to solving a non-linear equations problem in 119 endogenous variables. We will apply the OLG model to compute the Laffer curve of the US economy which describes the government tax revenue as a function of the income tax rate. DEMOGRAPHICS AND TIMING. A period, t, corresponds to one year. At each period t, a new generation of households is born. Newborns have a real-life age of 20 denoted by s = 1. All generations retire at the end of age s = T W = 45 (corresponding to a real-life age of 64) and live up to a maximum age of s = T = 70 (real-life age 89). The number of periods during retirement is equal to T R = T − T W = 25. Let Nts denote the number of agents of age s at t. We denote the total population at t by Nt . At t, all agents of age s survive until age s + 1 with probability φ ts , where φ t0 = 1 and φ tT = 0. The number of the first cohort
562
10 Overlapping Generations Models with Perfect Foresight
Nt1 grows at a constant rate n. In a stationary equilibrium with constant survival probabilities φ s the total population grows at this rate, too.22 Let µst denote the share of the s-year-old cohort in the total population in period t, µst := Nts /Nt . We will only analyze stationary economies, so we can omit the time index, and the measure is constant. In equilibrium, the total population is given by Nt =
T X s=1
s
Nt µ =
T X s=1
Nts .
(10.17)
To numerically compute the stationary population shares µs , we simply observe that for constant survival probabilities φ s , the dynamics of the population cohorts are presented by s−1 Nts = φ s−1 Nt−1 .
Division of this equation by total population Nt and noting that Nt = (1 + n)Nt−1 results in Nts Nt
=φ
s−1
s−1 Nt−1 Nt−1 Nt−1 Nt
or 22
To formally derive this argument, consider an economy with two cohorts, T = 2. The first cohort has n children, so that 1 Nt+1 = (1 + n)Nt1 .
Furthermore, we know the survival probability from age 1 to age 2 is presented by φ 1 , so that 2 Nt+1 = φ 1 Nt1 .
Therefore, the total population at period t + 1 amounts to 1 2 Nt+1 = Nt+1 + Nt+1 = (1 + n)Nt1 + φ 1 Nt1 . 1 Substitution of Nt1 = Nt − φ 1 Nt−1 in this equation results in 1 Nt+1 = (1 + n) Nt − φ 1 Nt−1 + φ 1 Nt1 ,
or
1 Nt+1 = (1 + n)Nt + φ 1 Nt1 − (1 + n)Nt−1 = (1 + n)Nt . | {z } =0
It is straightforward to generalize this proof to a longer lifetime T > 2.
10.3 The Laffer Curve
µst = µs =
563
φ s−1 s−1 µ . 1+n
(10.18)
Therefore, for the numerical derivation of µs , we can simply initialize 1 s µ P =s 1, iterate over (10.18) for s = 2, . . . , T , and divide µ by the sum s µ to normalize the total number of population shares equal to one. HOUSEHOLDS. Each household comprises one (possibly retired) worker. Households maximize expected intertemporal utility at the beginning of age 1 in period t max
T X s=1
j−1 s s β s−1 Πsj=1 φ t+ j−2 u(c t+s−1 , l t+s−1 ) + v(g t+s−1 ) ,
(10.19)
where β > 0 denotes the discount factor. Instantaneous utility u(c, l) is specified as a function of consumption c and labor l as in Trabandt and Uhlig (2011):23
u(c, l) =
1+1/ϕ , ln c − κl
1 1−η
if η = 1,
η c 1−η 1 − κ(1 − η)l 1+1/ϕ − 1 , if η > 0 and η 6= 1.
(10.20)
During working life, the labor supply of the s-year-old household amounts to l s ≥ 0, s = 1, . . . , T W , while it is set to l s = 0 during retirement, for s = T W +1, . . . , T . Utility from government consumption, v(g t ), is additive, so government consumption per capita, g t , does not have any direct effect on household behavior (only indirectly through its effects on transfers and taxes). Let ¯y s denote the age-productivity of the s-year-old household, while A t denotes aggregate labor productivity. The age-productivity profile, { ¯y s }45 s=1 , is a hump-shaped function as estimated by Hansen (1993) and constant over time, while A t grows at rate gA. Accordingly, total labor income, w t A t ¯y s l ts , is the product of the wage rate per efficiency unit w t , labor productivity A t , the age-efficiency factor ¯y s , and working hours l ts . The retired household receives a lump-sum pension pen t that does not depend on age. Net non-capital income x st is represented by 23
Trabandt and Uhlig (2011) show in their Proposition 1 that this function is consistent with long-run growth and features a constant Frisch elasticity ϕ and a constant intertemporal elasticity of substitution, 1/η.
564
10 Overlapping Generations Models with Perfect Foresight
x st =
p (1 − τlt − τ t )w t A t ¯y s l ts pen
s = 1, . . . , T W ,
(10.21)
s = T W + 1, . . . , T.
t
The budget constraint of the household at age s = 1, . . . , T W is given by (1 + τct )c ts =x st + 1 + (1 − τkt )(r t − δ) kst (10.22) s+1 + R bt bst + t r t − ks+1 t+1 − b t+1 ,
where kst and bst denote the capital stock and government bonds of the syear-old cohort at the beginning of period t. The household is born without assets and leaves no bequests at the end of its life, implying k1t = k tT +1 = 0 and b1t = b tT +1 = 0. It receives interest income r t and R bt − 1 on capital and government bonds and pays income taxes on labor and capital income at the rates of τlt and τkt , respectively. Capital depreciation δkst is tax exempt. Consumption is taxed at the rate τct . The household also pays contributions p to the pension system at the rate τ t that are levied on its wage income. In addition, the household receives government transfers in the amount of t r t in period t. The Lagrangian function of a 1-year-old household in period t can be formulated as follows: L t =u(c t1 , l t1 ) + v(g t ) p + λ1t (1 − τlt − τ t )w t A t ¯y 1 l t1 + t r t − k2t+1 − b2t+1 − c t1 (1 + τct ) 2 2 + βφ t1 u(c t+1 , l t+1 ) + v(g t+1 )
p 2 + λ2t+1 (1 − τlt+1 − τ t+1 )w t+1 A t+1 ¯y 2 l t+1
+ (1 − τkt+1 )(r t+1 − δ)k2t+1 + k2t+1 + R bt+1 b2t+1 + t r t+1 3 3 2 c − k t+2 − b t+2 − c t+1 (1 + τ t+1 )
+β
2
2 φ t1 φ t+1
3 3 u(c t+2 , l t+2 )+
v(g t+2 ) + λ3t+2
...
+ ...
The first-order conditions of the 1-year-old household with respect to c t1 , l t1 , k2t+1 , and b2t+1 are as follows:
10.3 The Laffer Curve
565
η λ1t (1 + τct ) = (c t1 )−η 1 − κ(1 − η)(l t1 )1+1/ϕ , 1 p 1 l 1 λ t (1 − τ t − τ t )A t ¯y w t = κη 1 + (c t1 )1−η ϕ η−1 1 1/ϕ × 1 − κ(1 − η)(l t1 )1+1/ϕ (l t ) , 1 1 2 k λ t = βφ t λ t+1 1 + (1 − τ t+1 )(r t+1 − δ) , λ1t = βφ t1 λ2t+1 R bt+1 .
In a similar fashion, we can derive the first-order conditions of the syear-old household by taking the derivative of the Lagrangian L t−s+1 with s+1 respect to c ts , l ts , ks+1 t+1 , and b t+1 for the generation that is born in period t − s + 1: s =1, . . . , T : 0 =λst (1 + τct ) − (c ts )
−η
s =1, . . . , T w :
p
1 − κ(1 − η)(l ts )
0 =λst (1 − τlt − τ t )A t ¯y s w t − κη 1 +
1+1/ϕ η
s+1
1 (c ts )1−η ϕ
0 =λst − βφ ts λ t+1 1 + (1 − τkt+1 )(r t+1 − δ) , s =1, . . . , T − 1 :
0
=λst
, (10.23b)
η−1 s 1/ϕ × 1 − κ(1 − η)(l ts )1+1/ϕ (l t ) ,
s =1, . . . , T − 1 :
(10.23a)
(10.23c)
(10.23d)
b − βφ ts λs+1 t+1 R t+1 .
In equations (10.23a), (10.23c), and (10.23d), l ts = 0 for s ≥ T W + 1. Note that according to the first-order conditions (10.23c) and (10.23d), the (certain) real after-tax returns on both assets are equal, meaning that the individual household is indifferent between holding assets in the form of physical capital or government debt. If we had only one household that lived for two periods as in the simple two-period OLG model of Diamond (1965) and Samuelson (1962), this would pose no problem because in this case the only household with savings also needs to hold the assets in same amount as the aggregate assets.24 With many periods, however, the portfolio allocation is indeterminate. Therefore, we assume without loss of generality that each household holds the two assets in the same proportion 24
The same conclusion, of course, holds in the Ramsey model with a representative agent.
566
10 Overlapping Generations Models with Perfect Foresight
(which is equal to the share of the aggregate asset in total aggregate assets, e.g., K t /(K t + B t ) is the portfolio share of capital).25 PRODUCTION. The production technology is described by a Cobb-Douglas function: Yt = K tα (A t L t )1−α .
(10.24)
Capital depreciates at the rate δ, and A t grows at the exogenous rate gA: A t+1 = 1 + g A. At
(10.25)
Firms maximize profits Π t = Yt − r t K t − w t A t L t
(10.26)
resulting in the first-order conditions that the factor prices w t and r t in period t are equal to the marginal products of labor and capital: w t = (1 − α)K tα (A t L t )−α , r t = αK tα−1 (A t L t )1−α .
(10.27a) (10.27b)
GOVERNMENT. The government expenditures consist of public consumption G t , transfers Tr t , and interest on public debt B t . Government expenditures are financed by taxes Tt , debt, B t+1 − B t , and confiscated accidental bequests according to: G t + Tr t + R bt B t = B t+1 + Tt + Beq t .
(10.28)
Government consumption and transfers grow at the exogenous rate gA such G that, for example, g˜ t := A t Nt t denotes stationary government expenditures per capita. 25
Kaplan and Violante (2014) endogenize the portfolio allocation on illiquid and liquid assets in a life-cycle model without aggregate uncertainty. They assume that the adjustment of the illiquid asset (physical capital or equity) is costly, while the liquid asset (government bonds or money) can be altered without transaction costs. As a consequence, the return of the illiquid asset must be higher in equilibrium and, for example, can be matched with the equity premium or the shape of the portfolio share over the life-cycle. Braun and Ikeda (2021) apply these ideas to study the distributional effects of monetary policy where money is the liquid asset.
10.3 The Laffer Curve
567
Accidental bequests are collected from households that do not survive:26 Beq t =
T −1 X s=1
s+1 (1 − φ ts )Nts ks+1 + b t+1 t+1 .
(10.29)
Taxes are levied on consumption, interest income, and wage income: Tt = τct C t + τlt w t A t L t + τkt (r t − δ)K t
(10.30)
with aggregate consumption C t , labor L t , and capital K t presented as the sum of the individual variables, respectively: Ct = Lt = Kt =
T X s=1 TW X s=1 T X s=1
Nts c ts ,
(10.31a)
Nts ¯y s l ts ,
(10.31b)
Nts kst .
(10.31c)
In particular, aggregate labor L t is the sum over the individual effective labor supplies, ¯y s l ts . SOCIAL SECURITY. The social security authority runs a balanced budget: p
τt w t At L t =
T X
Nts pen t .
(10.32)
s=T W +1
GOODS MARKET EQUILIBRIUM. The goods market is in equilibrium, meaning that total production is equal to total demand: Yt = C t + G t + K t+1 − (1 − δ)K t .
(10.33)
STATIONARY EQUILIBRIUM. In stationary equilibrium, output, consumption, capital, debt, pensions, and transfers (per capita) all grow at the rate gA. For this reason, we define the following individual stationary variables: 26
Accidental bequests are derived in Appendix A.10.
568
10 Overlapping Generations Models with Perfect Foresight
v˜t :=
vt ˜ t := λ t Aη , for vt ∈ {c t , kst , bst , t r t , pen t } and λ t At
and aggregate stationary variables: V˜t :=
Vt Lt for Vt ∈ {B t , Beq t , K t , Tr t , Yt } and ˜L t := Nt A t Nt
implying the factor prices ˜ α−1 ˜L 1−α , r t = αK t t ˜ α ( ˜L t )−α . w t = (1 − α)K
(10.34a) (10.34b)
t
Consequently, non-capital income x˜ t is defined by p (1 − τlt − τ t )w t ¯y s l ts , s = 1, . . . , T W , s x˜ t = pg en t , s = T W + 1, . . . , T.
(10.35)
The budget constraint of the household at age s = 1, . . . , T is given by (1 + τct )˜c ts =˜ x st + 1 + (1 − τkt )(r t − δ) ˜kst + R bt ˜bst + ter t (10.36) − (1 + gA) ˜ks+1 + ˜bs+1 . t+1
t+1
The stationary first-order conditions of the s-year-old household are as follows: s =1, . . . , T : ˜ s (1 + τc ) − (˜c s ) 0 =λ t t t
−η
1 − κ(1 − η)(l ts )
1+1/ϕ η
(10.37a)
s =1, . . . , T w : (10.37b) p s l s ˜ (1 − τ − τ ) ¯y w t 0 =λ t t t η−1 1 − κη 1 + (˜c ts )1−η 1 − κ(1 − η)(l ts )1+1/ϕ (l ts )1/ϕ , ϕ s =1, . . . , T − 1 : ˜ s − βφ s λ ˜ s+1 1 + (1 − τk )(r t+1 − δ) , 0 =(1 + gA)η λ t t+1 t+1 s =1, . . . , T − 1 : ˜ s − βφ s λ ˜ s+1 R b . 0 =(1 + gA)η λ t t+1 t+1
(10.37c)
(10.37d)
The budget constraint of the government is presented by: ˜t + T ˜ ˜ ˜ fr t + R b B g G t t = (1 + g A)(1 + n) B t+1 + Tt + B eq t ,
(10.38)
10.3 The Laffer Curve
569
and the budget of the social security authority is balanced: τ w t ˜L t = p
T X
µs pg en t .
(10.39)
s=T W +1
Finally, the economy’s resource constraint in stationary variables is given by: ˜ t + (1 + gA)(1 + n)K ˜ t+1 − (1 − δ)K ˜t . Y˜t = C˜t + G
(10.40)
CALIBRATION. We calibrate the model with respect to characteristics of the US economy. The population parameters are taken from UN (2015). For the population growth rate, we use the average over the years 1990-2010 (to match the time horizon chosen by Trabandt and Uhlig (2011)), implying n=0.95%. We use the same procedure to compute the age-specific survival rates from the same data set. The survival probabilities φ s decline with age s and are presented in Figure 10.3. The calibration is implemented in the R GAUSS and MATLAB programs Laffer.g and Laffer.m, which can be downloaded together with the data input files from the book’s download page. The calibration parameters are summarized in Table 10.2. 1.00 0.98
φs
0.96 0.94 0.92 0.90 1
10
20
30
40
50
s
60
70
Figure 10.3 Survival Probabilities φ s in Benchmark Equilibrium
The productivity-age profile { ¯y s }45 s=1 is taken from Hansen (1993) and presented in Figure 10.4.27 The calibration of the production and pref27
The hump-shaped productivity-age profile is revisited by Rupert and Zanella (2015) who find that this kind of hump-shaped wage profiles can be found among workers that
570
10 Overlapping Generations Models with Perfect Foresight
¯y s
erence parameters follows Trabandt and Uhlig (2011). In particular, we set the production elasticity of capital α, the depreciation rate δ, and the annual growth rate of output gA equal to α = 0.35, δ = 8.3%, and gA = 2.0%. From these authors, we also take the parameters 1/η = 1/2 and ϕ = 1 for the intertemporal elasticity of substitution and the Frisch elasticity, as well as the preference parameter κ = 3.63, which implies an average labor supply of the US worker of 30.8% of available time in good accordance with the value estimated by Trabandt and Uhlig (2011). Finally, β = 1.0372 is chosen, meaning that the real interest on government bonds amounts to 4%. 1.15 1.10 1.05 1.00 0.95 0.90 0.85 0.80 0.75 0.70 0.65 0.60 0.55
1
5
10
15
20
25
30
35
40
45
s Figure 10.4 Productivity-Age Profile ¯y s
Following Trabandt and Uhlig (2011), the tax parameters are set to τl +τ p = 28%, τk = 36%, and τc = 5%, and the government consumption share in GDP is set equal to G/Y = 18%. The debt-output level amounted to B/Y = 63% during this period. The gross replacement rate of pensions with respect to wages, r epl = pen/(w¯l) = 35.2%, for 2014 are taken from the OECD (2015) (series: gross pension replacement rates for men, % of pre-retirement earnings). The social security rate τ p is computed endogenously and amounts to 8.87%. In addition, we find an endogenous value of the transfer-to-GDP ratio equal to Tr/Y = 5.18%. entered the labor market before 1960. For workers who entered the labor market after the 1960s they find that wages do not fall until these workers are in their late 60s.
10.3 The Laffer Curve
571
Table 10.2 Calibration of the Large-Scale OLG Model Parameter
Value
Description
n α δ gA 1/η ϕ κ
0.95% 0.35 8.3% 2.0% 1/2 1 3.63
β τl + τ p τk τc G/Y
1.037 28% 36% 5% 18%
B/Y r epl
63% 35.2%
population growth rate production elasticity of capital depreciation rate of capital growth rate of output intertemporal elasticity of substitution Frisch elasticity of labor supply preference parameter for weight of disutility from labor discount factor tax on labor income tax on capital income tax on consumption share of government spending in steadystate production debt-output ratio gross pension replacement rate
STEADY-STATE COMPUTATION. To compute the steady state, we solve a nonlinear equations problem in 119 variables consisting of the 69 individual asset levels, a˜s = ˜ks + ˜bs , s = 2, . . . , 70, (with a˜1 = 0), the 45 individual ˜ , ˜L , A, ˜ τp , labor supplies, l s , s = 1, . . . , 45, and the aggregate variables K fr. and T The system of nonlinear equations consists of the 69 Euler conditions (10.37c) and the 45 first-order conditions with respect to labor (10.37b) together with the aggregate conditions ˜= Ω ˜L =
70 X s=1 45 X
µs a˜s ,
(10.41a)
µs ¯y s l s ,
(10.41b)
s=1
˜ =Ω ˜, ˜ −B K
˜ ˜ − G, fr = T˜ + B g T eq − (R b − 1)B P70 g en s=46 µs p τp = , w ˜L
(10.41c) (10.41d) (10.41e)
572
10 Overlapping Generations Models with Perfect Foresight
where Ω denotes aggregate assets and pg en are computed using the definition of the replacement rate r epl: pg en = r epl × w¯l. All other variables, e.g., individual consumption, factor prices, and aggregate bequests and taxes, can be computed with the help of the 119 endogenous variables. For example, for the computation of individual consumption levels ˜c s , we can use the individual budget constraints. For the computation of the factor prices, we use the first-order conditions of the firms. We solve this nonlinear equations problem with a modified NewtonRhapson method as described in Algorithm 15.3.2. The main challenge for the solution is to come up with a good initial value for the individual and aggregate state variables. Therefore, we begin from a simple 46-period OLG model with exogenous labor where all cohorts except the last are workers, which consists of retirees. The exogenous labor supply is set equal to 0.3, and the initial value for the aggregate capital stock is set equal to 0.0499, which follows from a real interest rate of 4% and ˜ α−1 ˜L 1−α − δ = 0.04 r − δ = αK
with an initial guess for aggregate labor ˜L = 0.3 (the average age-efficiency is normalized to one). In addition, we use the approximation (1 + gA)η /R b for the discount factor β assuming R b = 1.04. The wealth-age profile for this model is represented by the blue line in Figure 10.5, corresponding to the case T R = 1. Thereafter, we add one additional cohort of retirees in each step and use the solution of the model in the previous step as an input for the initial value of the next step. The solution for the wealth-age profiles for these cases is again illustrated in Figure 10.5 for T R = 11 and T R = 21. Note that the wealth maximum is attained at later ages as the lifetime increases. In addition, maximum wealth declines with increasing retirement periods T R. Finally, we introduce endogenous labor into the model (green line) and recalibrate β such that the real interest rate on bonds after taxes is equal to 4%. Since our initial value of the discount factor β is below the calibrated value of β, savings increase again, and the wealth-age profile shifts up in the final case of the benchmark equilibrium (brown line in Figure 10.5). During these initial computations, we compute the solution
10.3 The Laffer Curve TR = 1
573
T R = 11
T R = 21
Endogenous Labor
Benchmark
2.4
˜ks + ˜bs
2.0 1.6 1.2 0.8 0.4 0.0
1
10
20
30
40
s
50
60
70
Figure 10.5 Wealth-Age Profile Approximation
for the individual optimization problem in an inner loop and update the aggregate capital variables in an outer loop with a dampening iterative scheme as described in Section 3.9 of Judd (1998), which helps to ensure convergence. For the final calibration and the computation of the steady states for different tax rates, we apply the modified Newton-Rhapson algorithm to the complete set of the 119 individual and aggregate equilibrium conditions. The computation time of the steady state and the results amounts to 1:05 (minutes:seconds) with the GAUSS program Laffer.g R and 1:06 (minutes:seconds) with the MATLAB program Laffer.m.28 The runtime is presented in Table 10.3. Our step-wise procedure for the computation of the steady state has an additional advantage. At each step, we can check whether the new solution makes intuitive sense or if there is an abrupt change in the policy functions or equilibrium values of the model. Therefore, we have supporting evidence that our computational procedure is correct if the solution converges “smoothly” to the final solution such as that for the wealth-age profile displayed in Figure 10.5. 28
R In general, we find that computational speed is lower in MATLAB than in GAUSS if we introduce additional loops into the program code. Aruoba and Fernández-Villaverde (2015) consider the solution of the stochastic neoclassical growth model with the help of value function iteration with grid search and compare different programming languages. R As one result, they find that MATLAB is 9-11 times slower than C++.
574
10 Overlapping Generations Models with Perfect Foresight Table 10.3 Computation of Laffer Curves: Runtime Method
Runtime GAUSS
R MATLAB
Benchmark Version
Direct computation
Laffer.g
Laffer.m
1:05
1:06
Computation of the Laffer Curves
Laffer_p.g
Laffer_p.m
Direct computation
1:13
0:32
With parallel processing
0:28
0:10
Notes: Runtime is given in minutes:seconds on an Intel(R) Xeon(R) CPU with 8 cores running at 2.90 GHz.
INDIVIDUAL POLICY FUNCTIONS. In Figures 10.6-10.8, the individual policy functions with respect to savings, labor supply, and consumption are displayed (in the steady state). The wealth-age profile in Figure 10.6 (corresponding to the brown curve in Figure 10.5) is hump-shaped, and maximum wealth a˜s = ˜ks + ˜bs is obtained at real-life age 58 (corresponding to s = 39). To smooth intertemporal utility over the lifetime, households accumulate savings for old age because pensions are below wage income. Since individual productivity ¯y s peaks at age 50 and declines thereafter, the households start to reduce their wealth prior to retirement. The labor-supply-age profile in Figure 10.7 mimics the profile of individual efficiency ¯y s over the working life. Therefore, it also displays kinks at ages 30 and 50. Labor supply l s , however, peaks prior to efficiency ¯y s due to the income effect. Since wealth increases until age 58, households decrease their labor supply accordingly. Individual consumption c s increases until retirement according to Figure 10.8. Consumption displays the same kinks as labor supply. When the household retires, its labor supply drops to zero, and its utility increases significantly ceteris paribus. To intertemporally smooth utility, therefore, the household reduces consumption at age 65. During retirement, the consumption profile is hump-shaped because the survival probabilities
10.3 The Laffer Curve
575
2.0
˜ks + ˜bs
1.6 1.2 0.8 0.4 0.0 1
10
20
30
40
50
70
60
s
Figure 10.6 Wealth-Age Profile with Age-Dependent Productivities
0.350 0.325
ls
0.300 0.275 0.250 0.225 0.200 1
5
10
15
20
25
30
35
40
45
s Figure 10.7 Labor-Supply-Age Profile in the Economy with Age-Dependent Productivities
and, therefore, the effective discount factor, βφ s , continue to decline. At age 78, βφ s drops below the gross interest rate on bonds, R b , such that the household decreases consumption in accordance with the Euler equation (10.23d). RESULTS. The model is applied to the computation of the Laffer curve for labor income taxation. Figure 10.9 presents the effects of an increase in the labor income tax rate τl + τ p on labor income tax revenue (blue line) and total tax revenue (black line). There are two different kinds of possible
576
10 Overlapping Generations Models with Perfect Foresight 0.250 0.225
˜c s
0.200 0.175 0.150 0.125 1
10
20
30
40
50
70
60
s
Figure 10.8 Consumption-Age Profile in the Economy with Age-Dependent Productivities
Laffer curve exercises. For the ‘s-Laffer curve’, government transfers Tr are varied to absorb (provide) the additional (missing) taxes, while for the ‘gLaffer curve’, transfers Tr are held constant, and government consumption G adjusts to balance the government budget (10.33). We analyze the ‘sLaffer curve’. In addition, we assume that the level of government debt B t (relative to labor productivity A t and total population Nt ) remains constant, while pensions are adjusted such that the replacement rate of pensions relative to wages remains at r epl = 35.2%. 0.125 0.100 0.075
Labor Income Tax Revenues Total Tax Revenues
0.050 0.025 20
30
40
50 l
τ +τ Figure 10.9 Laffer Curves
60 p
70
80
90
10.3 The Laffer Curve
577
The resulting profile of tax revenues is hump-shaped, and the maximum of total tax revenue (black line) occurs at a labor income tax rate of τl + τ p = 66.9%. The revenues from the labor income tax (blue line) peak at a slightly higher rate τl + τ p = 70.9% because the capital stock and, hence, capital income tax revenue decrease with higher labor income taxes. Moreover, total production and, hence, aggregate consumption decrease with higher labor income taxes. Note that we have graphed taxes as a function of the total levies on labor income consisting of both the wage tax τl and the pension contribution rate τ p . The corresponding maximal labor tax rate in the Ramsey model analyzed by Trabandt and Uhlig (2011) in Table 5 amounts to 63% and is slightly lower than that in our overlapping generations model. If we had used the assumption that pensions are fixed in level and that the replacement rate adjusts, the maximal labor tax rate would also be lower in our OLG model.29 PARALLEL COMPUTING. In the following, we consider how parallel computing can help to speed up the computation. To benefit from multi-core processing in the computation of the above problem, you need to have access to either GAUSS 16 (or a more recent version) or the parallel R computing toolbox in MATLAB . With parallel computing, you need to consider two primary issues. First, you have to break up the problem so that it can be run in parallel. Second, you need to avoid writing the same variables in different iterations of the same portions of the code at the same time. Therefore, you should only use local and not global variables in the section of the code that you compute in parallel. An introductory guide to parallelization in economics is provided by Fernández-Villaverde and Valencia (2018).30 Table 10.3 displays the runtime for our various computations on an Intel(R) Xeon(R) CPU with 8 cores running at 2.90 GHz computer. In the computation of the benchmark equilibrium, we cannot use parallel programming techniques. The computational times for this step-wise procedure explained in greater detail above amount to approximately 1 minute R with the programs Laffer in both GAUSS and MATLAB . In the compu29
You are encouraged to program the necessary changes in the GAUSS program Laffer.g R or MATLAB program Laffer.m. 30 R The authors review the use of parallelization in Julia, MATLAB , R, and Python, among others. They note that value function iterations are particularly amenable to the use of R parallelization. With respect to the parallel toolbox of MATLAB , however, they argue that it “suffers from limited performance and it is expensive to purchase”.
578
10 Overlapping Generations Models with Perfect Foresight
tation of the Laffer curve, we use a loop over the labor income tax rate that can easily be parallelized.31 Using just one processor, we observe a computational time of 1 minute 13 seconds with GAUSS and 32 seconds R with MATLAB . If we make use of parallelization (using 8 cores), the computational time drops by approximately two-thirds and amounts to R only 28 and 10 seconds, respectively. In case of MATLAB , however, you
R have to consider the additional time that MATLAB uses to set up a conR nection to the processors (which are called “workers” in MATLAB ). This time is independent of the latter use of parallelization and amounts to 20 seconds in the case of 8 workers on our machine. Therefore, we also find, in accordance with Fernández-Villaverde and Valencia (2018), that computational time can be significantly reduced with the help of parallel programming. In the present example, however, we need to mention a severe limitation in our computation. If we apply parallelization, we have to use the same initial value to solve the nonlinear system of equations that characterizes the equilibrium value (for different labor income tax rates) with the modified Newton-Rhapson method. As an initial value, we use the solution from the benchmark equilibrium with τl + τ p = 28%. When we try to compute the equilibrium for labor income taxes in excess of 75%, however, the computation breaks down R (in both GAUSS and MATLAB ). The initial value is not close enough to the solution. If we do not use parallel programming, however, we continuously increase the labor income tax rate and can use the solution from the last iteration as an initial value in each iteration. In this way, we can also compute the equilibrium values of the economy for labor income taxes that exceed 75%. In conclusion, even if we have a problem that is potentially amenable to parallel programming, additional problems might arise that keep us from using these techniques. Nevertheless, we encourage the reader to make use of parallel programming, in particular in the computation of transition in large-scale OLG models that rely upon the solution with the help of value function iteration. 31
R In the GAUSS and MATLAB programs, laffer_p.g and Laffer_p.m, you need to set the variable parallel in lines 23 and 11 equal to 1 (0) if you want to compute the Laffer curve (not) using parallelization. Both programs use the for-loop which can be parallelized.
10.4 The Transition Path
579
10.4 The Transition Path In their seminal work, Auerbach and Kotlikoff (1987) laid the groundwork for the modern analysis of dynamic fiscal policy. Typically, such works analyze the question of how a particular policy affects the welfare of different generations. For example, how a change in the pension replacement rate, i.e., the ratio of pensions to wage income, affects the lifetime utility of present and future cohorts of the population. In their analysis, Auerbach and Kotlikoff (1987) assume that the economy is in a steady state in period 0 that, for example, is characterized by a replacement rate of 30%. At the beginning of period 1, the government announces an unexpected change in pension policy, for example a decrease in the replacement rate to 20% that becomes effective in period t. Agents have perfect foresight and already adjust their behavior in period 1 and all subsequent periods. After a certain number of transition periods, the economy converges to the new steady state. The number of transition periods is taken as approximately 2-3 times the number of generations. Auerbach and Kotlikoff (1987), for example, assume in their 55-overlapping-generations model that the economy has reached the new steady state after 150 periods. In the following, we will study the computation of the transition path in a model with perfect foresight. First, we present a simple stylized 6-period model, which we have chosen for illustrative purposes. Subsequently, we describe our basic algorithm to solve this problem. The main insights also carry over to larger-scale models. If we consider a 6-period model, the transition is complete after some 20 periods, and we have to predict the time path of the aggregate variables in our model. In the 6-period model, these aggregate variables will be the aggregate capital stock and employment. Therefore, we will have to predict 40 values. For the most simple initial guess and a simple updating scheme of the transition path, the iteration over the time path of the aggregate variables will converge. However, this does not need to be the case in more complex models where we have to predict time paths consisting of several hundred or even thousand variables. In the next section, you will be introduced to a much more complex 70-period OLG model of the demographic transition. In this case, we need to apply much more sophisticated updating schemes for the transition path.32 In the second part of this section, we will therefore 32
In our own work, we have found examples where simple Newton-Raphson methods or linear updating schemes do not converge, for example, in the model of Heer and Irmen (2014).
580
10 Overlapping Generations Models with Perfect Foresight
introduce you to three different ways to update the transition path: the linear, Newton, and Broyden methods.
10.4.1 A Stylized 6-Period OLG Model In the spirit of Auerbach and Kotlikoff (1987), we compute the transition dynamics associated with a long-run once-and-for-all change in fiscal policy as in the following. In particular, we consider an unexpected change in the replacement rate from 30% to 20%, which is announced and becomes effective in period 1. Whereas Auerbach and Kotlikoff (1987) consider a 55-period model in their original work, we distinguish only 6 generations in our model. The periods in our model can be interpreted as decades. Of course, the main idea of the solution method is unaffected by this innocent assumption. During the first 4 decades, the agents are working, while during the last two decades of their life, they are retired. Otherwise, the model is exactly the same as that described in Section 10.2. For your convenience, we have summarized the description of the economy in Example 10.4.1. As we consider decades rather than years, we also need to adjust the calibration of the discount factor β and the depreciation rate δ. The new values are also summarized in Example 10.4.1. Example 10.4.1 Households live for 6 periods. Each generation is of measure 1/6. In the first 4 periods, they are working l hours; in the last two periods, they are retired and receive pensions. Households maximize lifetime utility at age 1 in period t: Ut =
6 X s=1
s s β s−1 u(c t+s−1 , 1 − l t+s−1 ).
Instantaneous utility is a function of both consumption c and leisure 1 − l: u(c, 1 − l) =
(c(1 − l)γ )1−η − 1 . 1−η
The working agent of age s faces the following budget constraint in period t: s s s ks+1 t+1 = (1 + r t )k t + (1 − τ t )w t l t − c t , s = 1, . . . , 4.
The budget constraint of the retired worker is given by s s ks+1 t+1 = (1 + r t )k t + pen t − c t , s = 5, 6
10.4 The Transition Path
581
with k1t = k7t = 0 and l t5 = l t6 = 0. The total time endowment is normalized to one and allocated to work l t and leisure, 1 − l t . Production Yt is characterized by constant returns to scale and assumed to be Cobb-Douglas: Yt = K tα L 1−α . t In a factor market equilibrium, factors are rewarded with their marginal product: w t = (1 − α)K tα L −α t ,
r t = αK tα−1 L 1−α − δ. t
Furthermore, the government budget is balanced in every period t: 1 pen t . 3 Note that the number of retirees relative to the total population amounts to 1/3. In equilibrium, individual and aggregate behavior are consistent, and the aggregate capital stock K t , employment L t , and consumption C t are equal to the sums of the respective individual variables, kst , l ts , and c ts , weighted by their measure 1/6: τt w t L t =
Lt = Kt = Ct =
4 X ls t
s=1
6
6 X ks t
s=1
6
6 X cs t
s=1
6
, , .
Moreover, the goods market clears: K tα L 1−α = C t + K t+1 − (1 − δ)K t . t
In period 0, the economy is in the steady state associated with the parameter values β = 0.90, η = 2.0, γ = 2.0, α = 0.3, and δ = 0.40 and a replacement rate of pen pensions relative to net wage earnings equal to r epl net = (1−τ)wt ¯l = 30%, where t t ¯l t is the average labor supply in the economy. In period t = 1, the government announces a change in the replacement rate to r epl net = 20%, which becomes instantaneously effective in period 1.
10.4.2 Computation of the Transition Path The Auerbach-Kotlikoff problem in Example 10.4.1 is solved in six basic steps that are described in the following Algorithm 10.4.1:
582
10 Overlapping Generations Models with Perfect Foresight
Algorithm 10.4.1 (Computation of the Transition Dynamics for the 6-Period OLG Model of Example 10.4.1) Purpose: Computation of the transition dynamics. Steps: Step 1: Choose the number of transition periods t c . Step 2: Compute the initial and final steady state solutions for periods t = 0 and t = t c + 1, respectively. Step 3: Provide an initial guess for the time path of the aggregate variables tc {(K t0 , L 0t )} t=1 . Step 4: Compute the transition path. tc Step 5: If the new value {(K t1 , L 1t )} t=1 is close to the starting value, stop. Otherwise, update the initial guess, and return to Step 4. Step 6: If the aggregate variables in period t c are not close to those in the new steady state, increase t c , and return to step 3 using the transition path from the last iteration in the formulation of an initial guess. In Step 1, we need to assume that the transition only lasts a finite number of periods to compute the transition. Typically, if T denotes the number of generations, researchers pick a number of transition periods approximately equal to 3 × T , which is usually found to be sufficient to guarantee convergence to the new steady state. We will choose t c = 20 model periods, corresponding to 200 years. As we will find, this number of periods is sufficiently high, and the transition will be complete. The computation is implemented in the program OLG6_trans.g. In Step 2, we compute the old and new steady states using the methods described in Section 10.2 above. In particular, for this simple problem, we are able to use direct computation. The wealth-age profile and the labor supply-age profiles for the two steady states are displayed in Figure 10.10. Note that savings increase in the new steady state because the government reduces pensions and the agents accumulate higher private savings for old age. Since the government reduces pensions, it is also able to cut wage taxes while keeping the government budget balanced. Taxes τ are reduced from 13.04% to 9.09%. Consequently, the labor supply is higher in the new steady state than in the old steady state. The aggregate capital stock and employment amount to 0.0667 (0.0590) and 0.234 (0.229) in the new (old) steady state with a replacement rate of 20% (30%), respectively. In Step 3, we provide a guess for the dynamics of the capital stock and tc employment, {(K t , L t )} t=1 . We know that K0 = 0.0590 (L0 = 0.229) and K t c +1 = 0.0667 (L t c +1 = 0.234). As an initial guess, we simply interpolate
10.4 The Transition Path
583
Old Steady State
New Steady State
0.125
0.400
0.100
0.375
ls
ks
0.075 0.050
0.350 0.325
0.025
0.300
0.000 1
2
3
4
5
6
1
2
3
4
Figure 10.10 Wealth-Age and Labor-Supply-Age Profiles in the New and in the Old Steady State
linearly between these values. Given the time path of the aggregate state variables, we can compute wages, interest rates, pensions, and the tax rate from the first-order conditions of the firm and the balanced budget constraint. COMPUTATION OF THE TRANSITION PATH FOR GIVEN FACTOR PRICES, TAXES, AND PENSIONS. In Step 4, we need to compute the transition between the old and the new steady state. For this reason, the two steady states need to be saddlepoint stable, an issue that we turn to in Section 11.3. Given the sequence of factor prices, taxes, and pensions, we can compute the capital stock and labor supply of the s-year-old household in period t, s = 1, . . . , 6, starting in the last period of the transition t c and going backward in time. As in the case for the computation of the steady state, we use the first-order conditions of the households to compute the capital stock and the labor supply of the household born in period t = 20, 19, . . . , 0, −1, −2, −3, −4: γ
s c t+s−1
s 1 − l t+s−1
= (1 − τ t+s−1 )w t+s−1 , s = 1, . . . , 4
(10.42a)
s+1 −η s+1 γ(1−η) (c t+s ) (1 − l t+s ) 1 = s (1 + r t+s ), s = 1, . . . , 5 s −η β (c t+s−1 ) (1 − l t+s−1 )γ(1−η) (10.42b)
with l t5 = l t6 = 0. Furthermore, we substitute consumption from the budget constraint
584
10 Overlapping Generations Models with Perfect Foresight s s c t+s−1 = (1 − τ t+s−1 )w t+s−1 l t+s−1 + (1 + r t+s−1 )kst+s−1 − ks+1 t+s
for s = 1, . . . , 4 and
s c t+s−1 = pen t+s−1 + (1 + r t+s−1 )kst+s−1 − ks+1 t+s
for s = 5, 6 and use k1t = k7t = 0 such that (10.42) is a system of 9 nonlinear 4 2 3 equations in the 9 unknowns (k2t , k3t+1 , k4t+2 , k5t+3 , k6t+4 , l t1 , l t+1 , l t+2 , l t+3 ). In the program OLG6_trans.g, this nonlinear equations system is coded in the procedure rftr and solved with the modified Newton-Rhapson method described in Section 15.3.2. The sequences of the factor prices, pensions, and income tax rates have to be specified as global variables. We can use this routine in all periods of the transition and in the steady state. For example, during our first iteration over the aggregate capital stock and labor supply, we use the time sequences 0 0 0 (K20 , K21 , . . . , K25 ) = (0.663, 0.667, . . . , 0.667)
and 0 0 0 (L20 , L21 , . . . , L25 ) = (0.2332, 0.2334, . . . , 0.2334),
where the values for the periods t = 21, . . . , 25 are equal to the new steady-state values. From these sequences, we compute the factor prices 0 25 0 25 {(w0t , r t0 )}25 t=20 , the tax rate {τ t } t=20 , and the pensions {pen t } t=20 . We store them as global variables and compute the policy of the household 4 1 2 6 1 born in period t = 20, (k20 , k21 , . . . , k25 , l20 , . . . , l23 ). We continue in the same way for the agents born in t = 19, . . . , 1. For the agents that are born prior to period 1, t = 0, . . . , −4, we need to modify our computation. As an example, let us consider the computation of the policy functions for the household born in period t = 0. We assumed that the change in policy is unexpected. Therefore, the agent does not know in period t = 0 that the policy change will come into effect in period t = 1 and, hence, that the factor prices, tax rates, and pensions will be different from the old steady-state values starting in period t = 1. Therefore, we cannot simply use the vectors (K00 , K10 , . . . , K50 ) and (L00 , L10 , . . . , L50 ) and the corresponding factor prices, tax rates, and pensions together with the system of non-linear equations (10.42) to compute the optimal allocation. In period 0, the household behaves exactly as the household in the old steady state. Accordingly, its savings and labor supply in the first period of its life are also equal to those of the 1-year-old household in the old steady state, k2 = 0.0373 and l 1 = 0.395. Therefore, we modify the
10.4 The Transition Path
585
procedure rftr in the program OLG6_trans.g and replace the two firstorder conditions (10.42a) and (10.42b) for s = 1 with the condition that k2 and l 1 are equal to the old steady-state values. We proceed in the same way for t = −1, . . . , −4. In Figure 10.11, we depict the wealth-age and labor-age profile for the household born in period t = −2 with the red line. At age s = 4 in period t = 1, it learns about the change in policy. It adjusts its labor supply and savings in this period. Consequently, l 4 and k5 are different from the old steady-state values that are depicted by the blue line. k4 is still determined by the household’s behavior in period 0. Household Born in t = −2
Old Steady State
0.400
0.100
0.375
ls
ks
0.075 0.050
0.350 0.325
0.025
0.300
0.000 1
2
3
4
5
6
1
2
3
4
Figure 10.11 Wealth-Age and Labor-Age Profiles in the Old Steady State and for the Household Born in Period t = −2
We stop our computation for the household that is born in period t = −4 because the households born in periods t = −5, −6, . . . do not have any effect on the aggregate capital stock and employment during the transition, as they are not alive in period 1. It is straightforward to compute the aggregate capital stock and employment in each period t = 1, . . . , 20 from the savings and labor supply of the households that are alive in period t: 1X s 1X s Kt = kt , L t = l . 6 s=1 6 s=1 t 6
4
Of course, we update the aggregate capital stock and employment for t, t + 1, . . . , t + 5 directly after the computation of the optimal savings and labor supply of the household born in period t so that we do not have to store the optimal policies for all generations.
586
10 Overlapping Generations Models with Perfect Foresight
The computation with the direct method is fast and accurate. The computation for the savings and labor supply of each generation takes only fractions of a second, and the accuracy is equal to that of the nonlinear equation solver (10−10 ). As we already mentioned in Section 10.2, the direct computation of the first-order conditions with nonlinear equations methods may not be feasible in more complex models. In the application in the next chapter, for example, we will not be able to use it but have to resort to the more time-consuming value function iteration method instead. UPDATING SCHEMES. In Step 5, we need to update the time path for the t c =20 aggregate variables {(K t , L t )} t=1 . We will consider three methods: 1) simple linear updating, 2) the Newton-Raphson method and 3) Broyden’s method. 1) With linear updating, we simply compute the new capital stock K t and employment L t as a weighted average of the old and the new value in iteration i over the transition path, e.g., K ti+1 = φK ti +(1− φ)K ti∗ , where K ti denotes the value of the capital stock used in the last iteration and K ti∗ is the value that is found in iteration i by averaging the individual capital stocks of the households alive in period t. In the program OLG6_trans.g, we choose φ = 0.8. Convergence of the time paths for the capital stock and employment {(K t , L t )}20 t=1 occurs after 14 iterations. The computation takes 9 hundredths of a second on a computer with Intel(R) Xeon(R) CPU running at 2.90 GHz.
0.066
Kt
0.064 0.062 0.060 0
5
10
15
20
t Figure 10.12 Transition from the Old to the New Steady State
The computed dynamics of the aggregate capital stock K t are displayed in Figure 10.12. Obviously, the economy has converged from the old to the
10.4 The Transition Path
587
¯ = 0.0590 to K = 0.0667. new steady-state aggregate capital stock, from K In period 0, the economy is in the old steady state. All agents have chosen their next-period capital stock k1s , s = 1, . . . , 6 under the assumption that there is no change in fiscal policy. Consequently, the capital stock of an s-year-old generation in period 1, k1s , is also equal to the capital stock of the s-year-old generation in period 0, k0s . Accordingly, the aggregate ¯ = K0 = K1 . Only in period 2 capital stock is equal in these two periods, K does the capital stock K t start to change. In period 20, the last period of the transition, the capital stock K20 is equal to 0.06670 and only diverges from the new steady-state value K = 0.06669 by 0.014%. 2) In Step 5, we search for the solution of the nonlinear equation
K1i∗ − K1i K2i∗ − K2i .. .
i∗ i K − K20 i i g(K1i , K2i , . . . , K20 , L1i , L2i , . . . , L20 ) = 20i∗ = 0. L1 − L1i i∗ L2 − L2i .. . i∗ i L20 − L20
To apply the Newton-Raphson algorithm, we have to compute the Jacobian matrix, which is a 40 × 40 matrix in our Example 10.4.1. Let xi denote i i 0 our stacked column vector (K1i , K2i , . . . , K20 , L1i , L2i , . . . , L20 ) and denote by i J(x ) the Jacobian matrix that results from the differentiation of the above equation g(xi ). In the next step, we update the time path for the aggregate variables according to (15.13) xi+1 = xi − J(xi )−1 g(xi ). In the program OLG6_trans.g, the solution for the aggregate variables is found after two iterations. The runtime amounts to 1.11 seconds because the computation of the Jacobian matrix is ‘relatively’ time-consuming. Our results are summarized in Table 10.4. 3) In many applications that are based upon the OLG framework, the number of generations and transition periods is much higher than in Example 10.4.1, and we may have to solve a system g(x) = 0 in several hundred variables. You will be introduced to such an application in the next section, where the dimension of the Jacobian matrix amounts to 1500 × 1500. In these cases, the computation time for the Jacobian matrix
588
10 Overlapping Generations Models with Perfect Foresight Table 10.4 Computation of the Transition Path: Runtime Method
Runtime
Iterations
Linear Update
0:09
14
Newton
1:11
2
— Jacobian matrix
0:64
2
— Steady-state derivatives
0:05
3
Broyden
Notes: Runtime is given in seconds:hundredths of seconds on a computer with Intel(R) Xeon(R) CPU running at 2.90 GHz. In the Broyden updating step, a first initialization of the Jacobian matrix is either provided by the Jacobian of the non-linear system describing the transition path or by using the derivatives of the nonlinear equations that are describing the final steady state for an approximation of the Jacobian.
becomes prohibitive, especially if we need to iterate many times in Step 5, not only twice as in the present Example 10.4.1. In these cases, we advocate for the Broyden algorithm that is described in greater detail in Section 15.3.2. This algorithm is identical to the Newton-Raphson algorithm except that you do not compute the Jacobian matrix in each step but rather use an approximation for the update. In the program OLG6_trans.g, you may choose between two different approaches to initialize the Jacobian matrix in the first iteration. In the first case, you compute the actual Jacobian matrix. In the second case, we use an initialization that has been suggested by Ludwig (2007). In particular, we assume that i) the choice of K ti (L it ) has an effect only on the capital stock K ti∗ (employment L i∗ t ) in the same period and ii) the effect is identical to that in the final steady state. Therefore, we set all elements off the diagonal in the Jacobian matrix equal to zero and initialize the elements on the diagonal with the partial derivative of the variable in the respective nonlinear equation that describes the final steady state. The two conditions for the final steady state in period 21 are given by: ∗ h1 (K21 , L21 ) K21 − K21 h(K21 , L21 ) = 2 = ∗ = 0. L21 − L21 h (K21 , L21 ) According to this equation, the capital stock K21 and employment L21 and associated factor prices w21 and r21 imply individual savings and labor
10.5 The Demographic Transition
589
∗ ∗ supply that add up to K21 and L21 . Therefore, we have to compute the 1 partial derivatives ∂ h (K21 , L21 )/∂ K21 and ∂ h2 (K21 , L21 )/∂ L21 .33 In Table 10.4, note that the Broyden algorithm is much faster than the Newton algorithm, especially if we use the steady-state derivatives to form an initial guess of the Jacobian matrix. In this case, the gain in speed is considerable, as we only have to find the derivatives of a twodimensional nonlinear equations system that is describing the steady state rather than the 40-dimensional nonlinear equations system that describes the transition. Convergence (as measured by the number of iterations over the transition path) is slower with the Broyden algorithm than with the Newton algorithm, but the slower convergence is usually outweighed by the gain in speed in the computation of the Jacobian matrix. As another alternative, many studies consider a variant of the GaussSeidel algorithm. In Section 15.3.2, we introduce you to the details of the Gauss-Seidel algorithm, and you are asked to apply this algorithm to the solution of the model in this section in Problem 10.7. In our experience, the Broyden algorithm seems to dominate the other updating schemes in more complex models in terms of convergence and robustness. This is also confirmed by findings of Ludwig (2007), who advocates a hybrid algorithm for the solution of complex OLG models with more complicated nonlinear transitional dynamics that combines the Gauss-Seidel and Broyden’s method.
10.5 The Demographic Transition In the following, we consider the transition dynamics between two steady states in a more sophisticated model. As an example, we analyze the demographic transition in an economy with 70 overlapping generations during the period 2015-2314 where the government unexpectedly decreases the replacement rate of pensions in the year 2015 once and for all from 35.2% to 30%. Prior to the policy change (in the year 2014) and after the transition (in the year 2315), we assume that the economy is in a stationary equilibrium implied by the demographics and pension policy prevailing in the respective year. The demographics are calibrated with respect to those in the United States. 33
In addition, you may also initialize the elements of the Jacobian matrix that describe the contemporaneous effects of K ti (L ti ) on L ti∗ (K ti∗ ) using the cross derivatives ∂ h1 (K21 , L21 )/∂ L21 and ∂ h2 (K21 , L21 )/∂ K21 .
590
10 Overlapping Generations Models with Perfect Foresight
In this model, we need to find the time path for a five-dimensional vector consisting of the capital stock K, employment L, government transfers t r and pensions pen j for two household types j = 1, 2 over a time horizon of 300 periods. In essence, we have to solve a nonlinear system of equations of 1500 variables. As one possible approach to this problem, we will propose Broyden’s method that you were introduced to in the previous section and that is also described in greater detail in Section 15.3.2. Let us first turn to the description of the model and then study the computation.
10.5.1 The Model We analyze the effects of a (very simplified) demographic transition on savings and factor prices in a model with overlapping generations and heterogeneous agents. We consider a transition period of 300 years (periods) from the initial year 2015 until the final year 2314. Beyond 2100, we assume that the demographic variables are constant. There are three sectors in the economy: households, firms, and the government. Workers accumulate savings for old age. Firms maximize profits. The government collects taxes and social security contributions and runs a balanced budget. DEMOGRAPHICS AND TIMING. The demographics are specified in the same way as in Section 10.3. A period, t, corresponds to one year. At each t, a new generation of households is born. Newborns have a real-life age of 20 denoted by s = 1. All generations work for 45 years until age s = T W , corresponding to real-life age 64, and retire thereafter. They live up to a maximum age s = T = 70 (corresponding to real-life age 89). At t, all agents of age s survive until age s + 1 with probability φ ts , where φ t0 = 1.0 and φ tT = 0.34 The survival probabilities in the initial state in the year 2014 and the final periods 2100-2315 are illustrated in Figure 10.13. The (projected) survival probabilities increase over time due to technological advances in the health sector. Let Nts and Nt denote the number of agents of age s at t and total population, respectively. We assume that the youngest cohort at age s = 1 will have 1 + n t children and identify the population growth rate with 34
For simplicity, we assume that survival probabilities for the s-aged agent are constant after 2100.
10.5 The Demographic Transition
591
1.00
φs
0.98 0.96
2014 2100
0.94 0.92 0.90 1
10
20
30
40
50
60
69
s Figure 10.13 Survival Probabilities in the Years 2014 and 2100
n t .35 For this reason, we will also apply the ‘medium’ forecast of the US population rate for the time period 1955-2100 provided by UN (2015). The time series is graphed in Figure 10.14 and is available as a download from our homepage together with the GAUSS computer code Demo_trans.g. Beyond 2100, we again assume that the population grows at the constant rate of 0.20% annually as forecasted by the UN for 2100. 1.75
Percent
1.50 1.25 1.00 0.75 0.50 0.25 1960
1980
2000
2020
2040
2060
2080
2100
Year Figure 10.14 US Population Growth Rate 1950-2100 (annual %)
To simplify our analysis, we assume that prior to the initial period of the transition (in the periods t < 1), the economy is in a stationary state with constant survival probabilities and population growth rate equal to those prevailing in 2014. 35
Of course, this is a simplifying assumption. In Chapter 11.2, we will derive the growth rate of the population of the newborns with s = 1 as a function of the population growth rate n t .
592
10 Overlapping Generations Models with Perfect Foresight
HOUSEHOLDS. Each household comprises one worker of efficiency type j. Its consumption and labor supply in period t at age s are denoted by s, j s, j c t and l t . The total time endowment is normalized to one, so leisure amounts to 1 − l. Households with permanent efficiency ε j maximize intertemporal utility at the beginning of age 1 in period t: max
T X s=1
s, j s, j i−1 β s−1 Πsi=1 φ t+i−2 u(c t+s−1 , 1 − l t+s−1 ),
(10.43)
where instantaneous utility u(c, 1 − l) is a function of consumption c and leisure 1 − l: 1−η c γ (1 − l)1−γ u(c, 1 − l) = , η > 0, γ ∈ (0, 1); (10.44) 1−η
here, β > 0 denotes the discount factor, γ the weight of consumption in the Cobb-Douglas utility function and η the coefficient of relative risk aversion.36 Households are heterogeneous with respect to age, s, individual labor efficiency, e(s, j), and wealth, a. We stipulate that an agent’s efficiency e(s, j) = ¯y s ε j depends on its age, s ∈ S ≡ {1, 2, ..., 45}, and its permanent efficiency type, ε j ∈ E ≡ {" 1 , " 2 }. The share of the permanent efficiencies in each cohort, Γ (ε j ), j = 1, 2, is constant over time. We choose the TW age-efficiency profile, { ¯y s }s=1 , in accordance with the US wage profile as illustrated in Figure 10.4. The permanent efficiency types ε1 and ε2 are meant to capture differences in education and ability. The net wage income in period t of an s-year-old household with efp s, j ficiency type j is given by (1 − τlt − τ t ) w t e(s, j) A t l t , where w t and A t denote the wage rate per efficiency unit in period t and aggregate productivity, respectively. Wage income is taxed at rate τlt . Furthermore, the p worker has to pay contributions to the pension system at rate τ t . A retired j worker does not work, l ts = 0 for s > T W , and receives pensions pen t that depend on his efficiency type j. Households are born without assets at the beginning of age s = 1; 1, j hence, a t = 0 for j = 1, 2. Parents do not leave bequests to their children, and all accidental bequests are confiscated by the government. We could have introduced bequests and a parent-child link into the model; however, this would greatly complicate the computation (see Heer (2001b)). As an alternative, we could have assumed the presence of perfect annuity 36
Compare (1.52b).
10.5 The Demographic Transition
593
markets as in Krueger and Ludwig (2007), for example. In this case, the end-of-period assets of the dying households are shared equally by the surviving members of the same cohort.37 s, j The household earns interest r t on its wealth a t ∈ R. Capital income is taxed at rate τkt . In addition, households receive lump-sum transfers t r t from the government. As a result, the budget constraint at t of an ss, j year-old household with productivity type j and wealth a t is represented by: s, j j p s, j 1s>T W × pen t + (1 − τlt − τ t )w t e(s, j)A t l t + 1 + (1 − τkt )r t a t s, j
s+1, j
+t r t = c t + a t+1 ,
where 1s>T W is an indicator function that takes value 1 if s > T w , i.e., if the household is retired, and 0 otherwise. FIRMS. At each t, firms produce output, Yt , with capital, K t , and aggregate effective labor, A t L t , according to the following constant-returns-to-scale production function: Yt = K tα (A t L t )1−α .
(10.45)
Productivity A t grows at the exogenous rate gA.38 Aggregate effective labor is defined as the sum of the individual effective labor supplies: Lt =
Tw X 2 X s=1 j=1
s, j
Nts Γ (ε j ) ¯y s ε j l t .
(10.46)
Profits are represented by Π t = Yt − r t K t − w t A t L t − δK t ,
and profit maximization gives rise to the first-order conditions: ∂ Yt K t α−1 = rt + δ = α , (10.47a) ∂ Kt At L t ∂ Yt Kt α = w t = (1 − α) , (10.47b) ∂ (A t L t ) At L t Again, w, r, and δ denote the real wage rate, the real interest rate, and the depreciation rate, respectively. 37
You are encouraged to make the necessary changes in the computer program Demo_ trans.g and compute the sensitivity of our results with respect to this assumption.
38
Heer and Irmen (2014) consider a model of the demographic transition with endogenous growth.
594
10 Overlapping Generations Models with Perfect Foresight
GOVERNMENT. The government collects income taxes Tt to finance its expenditures on government consumption G t and transfers Tr t . In addition, it confiscates all accidental bequests Beq t . The government budget is balanced in every period t: G t + Tr t = Tt + Beq t .
(10.48)
Government spending is a constant fraction of output: G t = g¯ Yt . Note that transfers are paid to all Nt households, Tr t = t r t Nt . In view of the tax rates τlt and τkt , the government’s tax revenues are as follows: Tt = τlt w t A t L t + τkt r t Ω t ,
(10.49)
where Ω t is aggregate wealth at t: Ωt =
T X 2 X s=1 j=1
s, j
Nts Γ (e j ) a t .
SOCIAL SECURITY. The social security system is a pay-as-you-go system. The social security authority collects contributions from the workers to finance its pension payments to the retired agents. Pensions pen t = (pen1t , pen2t ) are a constant fraction r epl of labor income of the productivity type j = 1, 2: j j pen t = r epl t × ε j w t ¯l t ,
(10.50)
j
where ¯l t denotes the average working hours of workers with productivity ε j , j = 1, 2, in period t. In equilibrium, the social security budget is balanced: T 2 X X s=T w +1 j=1
j
p
Nts Γ (e j ) pen t = τ t w t A t L t .
(10.51)
The gross replacement rate of pensions, r epl, is assumed to drop unexpectedly from r epl0 to r epl1 between periods t = 0 (corresponding to the year 2014) and t = 1 (corresponding to the year 2015) and remains p constant thereafter. The contribution rate τ t has to adjust to balance the 39 social security budget. 39
In Problem 10.6, you are also asked to compute the case in which 1) the social security p contribution rate τ t is constant, while r epl adjusts, and 2) the government increases the retirement age by 5 years for those agents born after 2020.
10.5 The Demographic Transition
595
GOODS MARKET EQUILIBRIUM. In equilibrium, aggregate production is equal to aggregate demand: Yt = C t + G t + I t ,
(10.52)
where investment I t is equal to I t = K t+1 − (1 − δ)K t . COMPETITIVE EQUILIBRIUM. In the competitive equilibrium, individual behavior is consistent with the aggregate behavior of the economy, firms maximize profits, households maximize intertemporal utility, and factor and goods markets are in equilibrium. To express the equilibrium in terms of stationary variables only, we have to divide aggregate quantities (with the exception of aggregate labor L t ) by A t Nt and individual variables (with the exception of individual labor l ts ) by A t . Therefore, we define the following stationary aggregate variables: ˜ t := K
Kt Beq t Tt Ωt ˜ t := g , B eq t := , T˜t := , Ω , A t Nt A t Nt A t Nt A t Nt
˜ t := G
Gt Ct Yt Lt , C˜t := , Y˜t := , ˜L t := A t Nt A t Nt A t Nt Nt
and stationary individual variables: ˜c ts
c ts
ast pen t t rt s := , pg en t := , a˜ t := , ter t := . At At At At
Let F t (s, j, a˜) denote the distribution of age s, efficiency type j and individual wealth a˜ in period t. p A competitive equilibrium for a government policy (τkt , τlt , τ t , g¯ , θ , ter t , 1 2 pg en ,g p en ) and initial distribution F0 (s, j, a˜) in period 0 corresponds to a price system, an allocation, and a sequence of aggregate productivity indicators {A t } that satisfy the following conditions: N
1. Population grows at the rate g N ,t = Nt+1 − 1. t 2. In the capital market equilibrium, aggregate wealth is equal to aggregate capital: ˜t . ˜t = K Ω
596
10 Overlapping Generations Models with Perfect Foresight
3. Households maximize intertemporal utility (10.43) subject to the budget constraint: s, j p s, j j 1s>T w × pg en t + (1 − τlt − τ t )w t e(s, j)l t + 1 + (1 − τkt )r t a˜ t + ter t s, j
s+1, j
= ˜c t + a˜ t+1 (1 + gA)
with l ts = 0 for t > T W . This gives rise to the two first-order conditions: s, j
1 − γ ˜c t p 0= − (1 − τlt − τ t )w t e(s, j), (10.53a) s, j γ 1−l t γ(1−η)−1 s, j s, j 0 = ˜c t (1 − l t )(1−γ)(1−η) − β(1 + gA)γ(1−η)−1 φ ts
× 1 + (1 − τkt+1 )r t+1
s+1, j
˜c t+1
γ(1−η)−1
s, j
4. 5.
6. 7. 8. 9.
s, j
(10.53b)
s+1, j
(1 − l t+1 )(1−γ)(1−η) .
Individual labor supply l t , consumption ˜c t , and optimal next-period s, j assets (˜ a0 ) t in period t are functions of the individual state variables j, s, and a˜ and depend on the period t. Firms maximize profits satisfying (10.47a) and (10.47b). g The aggregate variables labor supply ˜L t , bequests B eq t , consumption ˜ t , and transfers T fr t are equal to the sum of the individual C˜t , wealth Ω variables. The government budget (10.48) is balanced. The social security authority provides pensions pen j , j = 1, 2, as presented in (10.50) and runs a balanced budget (10.51). The goods market (10.52) clears. The distribution F t (s, j, a˜), j = 1, 2, evolves according to F t+1 (s, j, a˜0 ) = φ ts
Nt F t (s, j, a˜0−1 (s, j, a˜0 )), s = 1, . . . , 74, Nt+1
and for the newborns F t+1 (1, j, 0) =
1 Nt+1
Nt+1
× Γ (ε j ),
1 with Nt+1 = (1 + n t )Nt1 . Above, a˜0−1 (s, j, a˜0 ) denotes the inverse of the function a0 (s, j, a˜) with respect to its third argument a˜.40 Notice further that only the fraction φ s of the s-year-old households survive and that its share in total population shrinks by the factor Nt /Nt+1 due to population growth. 40
In particular, we assume that a˜0 (s, j, a˜) is invertible.
10.5 The Demographic Transition
597
10.5.2 Calibration The model is calibrated for the initial steady state in the year 2014 in accordance with the calibration in Section 10.3. The production and government parameters are chosen as in Trabandt and Uhlig (2011). In particular, we set the production elasticity of capital to α = 0.35, the depreciation rate to δ = 8.3%, and the economic growth rate gA to 2.0%, as presented in Table 10.2. Furthermore, the government share is equal to G/Y = 18.0%, the sum of the labor income tax and the social security contribution rates amount to τl +τ p = 28%, and the tax on capital income is set to τk = 36%. The gross replacement rate of pensions is set to r epl = 35.2%. With regard to the preference parameters, we again assume that the coefficient of relative risk aversion is equal to η = 2.0. The relative weight of consumption in utility, γ, is calibrated to 0.25 to imply an average labor supply equal to 0.30.41 As new parameters, we introduce the permanent efficiencies (ε1 , ε2 ) = (0.57, 1.43) following Storesletten et al. (2004). We also set the share of each efficiency group equal to 1/2, Γ (ε j ) = 0.5, j = 1, 2. By this choice, the Gini coefficients of gross income and wages are slightly below the empirical values reported by Díaz-Giménez et al. (1997) and amount to 0.335 and 0.252, respectively. As a consequence, the Gini coefficient of wealth is also too small and equals 0.485. Finally, we calibrate β such that the real interest rate amounts to 3%. Therefore, we must choose β = 1.0275. The transfers t r and the social security contribution rate τ p follow endogenously from the fiscal and social security budget constraints. The share of transfers in GDP is equal to -0.2%, while the social security contribution rate τ p amounts to 9.68%. 41
Note that the Cobb-Douglas utility function (10.44) only allows for the choice of one free parameter γ, while (10.20) provided two free parameters, κ and ϕ. Therefore, it is not possible to simultaneously match both the average labor supply and the Frisch labor supply elasticity by the choice of γ. It is straightforward to show (see, for example, Appendix 4.2 in Heer (2019)) that the Frisch labor supply elasticity amounts to ηl,w =
1 − γ(1 − η) 1 − l = 1.48 η l
in the case of the Cobb-Douglas utility function (10.44). Therefore, we choose a higher Frisch labor supply elasticity than in the model in Section 10.3.
598
10 Overlapping Generations Models with Perfect Foresight
10.5.3 Computation In the following paragraphs, we will describe the computation of the transition dynamics. We choose a time horizon of 300 years, which we may interpret as the years 2015-2314, corresponding to the periods t = 1, . . . , 300. We start with an initial guess of the time path for the aggregate 1 2 ˜ t , ˜L t , ter t , pg variables {(K en t , pg en t )}300 t=1 and iterate until it converges. The allocation in period t = 0 is given and corresponds to the initial steady state. The computation of the transition is very time-consuming. In each iteration, we have to compute the optimal allocation separately for all individual generations that are born in the years 1946, 1947, . . . , 2314 since each generation faces a different factor price sequence and hence chooses a different allocation. We also have to consider the generations born prior to 2015 because they are still alive in 2015, and their savings and labor supply contribute to the aggregate capital stock and employment, respectively. Therefore, we have to go back in time until 1946, when the oldest generation that is still alive in 2015 was born. The solution of the perfect foresight model with Broyden’s method is described by the following steps: Algorithm 10.5.1 (Computation of the Transition Dynamics for the Perfect Foresight OLG Model in Section 10.5 with a Broyden Algorithm) Purpose: Computation of the transition dynamics. Steps: Step 1: Compute the demographics. Step 2: Compute the initial and final steady-state solutions for the periods t = 0 and t = 301, respectively. Step 3: Provide an initial guess for the time path of the aggregate variables 1,0 2,0 ˜ 0 , ˜L 0 , ter 0 , pg {(K en t , pg en t )}300 t t t t=1 . Step 4: Compute the transition. 1,1 2,1 ˜ 1 , ˜L 1 , ter 1 , pg Step 5: If the new value {(K en t , pg en t )}300 t t t t=1 is close to the starting value, stop. Otherwise, update the initial guess with the help of Broyden’s method and return to Step 4. The algorithm is implemented in program Demo_trans.g. In Step 1, we first initialize the number of transition periods. In our case, we assume that the transition takes 300 periods. As mentioned above, you should
10.5 The Demographic Transition
599
choose a number that is at least three times higher than the maximum age of the households. As we will find, 300 periods is sufficient in our case. Next, we load the UN data on the (projected) demographic variables from the Excel file survival_probs_US.xls that is also provided as a download on our home page. The data must be transformed from 5-year values to annual values for both dimensions, age and time. For example, the survival probability of the 20-25 year olds in the period 2010-15 to ages 25-30 in the period 2015-20 is computed with the help of the UN data on life expectancy. We use cubic spline interpolation along both dimensions to convert these to annual data.42 With the help of the demographic variables, the population growth rate and the survival probabilities, we are able to compute the measures of the population and individual generations during the transition 2015-2314 and in the new steady state 2315. The total measure of the population is normalized to one in the initial steady state in 2014 corresponding to t = 0. In the following years, the sizes of the cohorts Nts are computed with the help of the following iteration over t = 1, . . . , 301: s−1 s−1 φ t−1 Nt−1 , s = 2, . . . , T s Nt = (10.54) 1 (1 + n t ) Nt−1 , s = 1. In Figure 10.15, we display the stationary distribution of the cohorts in the initial and final steady states, 2014 and 2315, that are associated with these survival probabilities. Note that due to the aging of the US population, the share of the young cohorts declines relative to those of the older cohorts, as the survival probabilities rise, and the population growth rate falls from 0.75% in 2014 to 0.20% in 2100 and beyond. As a consequence of the demographic transition, the labor force shrinks. Since fewer young workers are born and households are becoming older, the share of retired agents rises in parallel, and the burden of the pensions increases. We, therefore, have two simultaneous detrimental effects on public finances. The revenues from income taxes fall, and the expenditures on social security rise. The sustainability of public pensions is one of the main challenges facing modern industrialized countries in the coming decades. The dynamics of the labor force share are graphed in Figure 10.16. In our economy, we observe a considerable decline in the labor force share from 78.6% to 69.5%. In addition, the composition of the labor 42
Cubic spline interpolation is described in greater detail in Section 13.6.2.
Percent
600
10 Overlapping Generations Models with Perfect Foresight 2.2 2.0 1.8 1.6 1.4 1.2 1.0 0.8 0.6 0.4 0.2
Initial Steady State Final Steady State 1
10
20
30
40
50
70
60
Age Figure 10.15 Stationary Age Distribution, Initial and Final Steady State
force changes. In particular, the share of older agents in the labor force increases, and therefore, the average productivity of the worker also rises.
Percent
78 76 74 72 70 2015
2050
2100
2150
2200
2250
2314
Year Figure 10.16 Decline of the Labor Force Share during the Transition
Note that the labor force share continues to decline beyond 2100, even though we assume that survival probabilities and population growth rates are constant after 2100. Similarly, we observe that the population Nt continues to grow at a higher rate than n2100 = 0.20% for some time (not presented). At first sight, this result seems puzzling. To understand this point more clearly, we have to distinguish between the fertility rate n t and the population growth rate g tN , which, for simplification, we have used interchangeably thus far. However, for a non-stationary population, the two rates do not coincide. To understand the basic mechanism that is at work here, consider a model with two generations where the survival probability from age 1 to age 2 is equal to one and the fertility rate is equal
10.5 The Demographic Transition
601
to n, so that the young generation in period t has (1 + n) children at the end of the period that become the young generation in the next period, t + 1: 1 Nt+1 = (1 + n)Nt1 .
Let Nt denote the measure of the total population, which consists of the 1 measure of the agents born in period t − 1 (the old), Nt−1 , and those 1 born in period t (the young), Nt . Consequently, the total population is represented by 1 Nt = Nt1 + Nt−1 .
In period t = 0, the population is stationary. Until period t = 0, the fertility rate, n t = n > 0, is constant, implying the following: 1 1 1 1 N0 = N01 + N−1 = (1 + n)N−1 + N−1 = (2 + n)N−1 .
Similarly, 1 1 1 1 1 N−1 = N−1 + N−2 = (1 + n)N−2 + N−2 = (2 + n)N−2 .
Consequently, the population growth rate g0N is equal to the fertility rate n in period 0: 1 + g0N =
1 N−1 N0 = 1 = 1 + n0 = 1 + n. N−1 N−2
Let us assume that the fertility rate amounts to n in period t = 0 so that N1 = (1 + n)N01 + N01 while it drops to zero in period t = 1: N2 = N11 + N11 . As a consequence, the population growth rates g tN in periods t = 1 and t = 2 are still larger than zero: 1 + g1N = and
N01 N1 = 1 = 1 + n > 1, N0 N−1
602
10 Overlapping Generations Models with Perfect Foresight
1 + g2N =
1
2N1 N2 2 + 2n = = > 1. 1 N1 2+n (2 + n)N0
Percent
Only in period t = 3 does the population growth rate g tN drop to zero. Since we have 70 overlapping generations in our model, the transition of the labor force share is much more time-consuming, of course. Therefore, the labor force share does not stabilize until 2150. The corresponding dynamics of the old-age dependency ratio are presented in Figure 10.17.43 The population size of the retirees, 65-89 years old, relative to the work force, 20-64 years old, rises from 27.2% in 2015 (in our model) to 44.0% in the final steady state. Accordingly, 2.3 workers have to finance the pensions of one retiree in the pay-as-you-go public pension system in the long-run. 44 42 40 38 36 34 32 30 28 26 2015
2050
2100
2150
2200
2250
2314
Year Figure 10.17 Increase in the Old-Age Dependency Ratio During the Transition
In Step 2, we compute the initial and the final steady states with the help of direct computation. Therefore, we start with a simple model. We s, j first consider an OLG model with inelastic labor supply, l t = 0.3 for s ∈ {1, . . . , 45} and j ∈ {1, 2}. In this case, aggregate employment ˜L t is also exogenous and amounts to ˜L = 0.236 in the initial steady state. With the help of ˜L and ¯l = 0.30, we can compute the social contribution rate τ p from the social security budget. The steady state is computed in the routine getk in the program. Importantly, the system of nonlinear equations consists of two equations. The two unknowns are the aggregate ˜ and government transfers ter in the steady state. The two capital stock K 43
The old-age dependency ratio is a useful statistic for the study of the sustainability and financing of the social security system.
10.5 The Demographic Transition
603
equations reflect the conditions that the sum of the individual assets equals the aggregate capital stock and that the government budget is balanced. The computation of the individual asset allocations {˜ as, j }70 s=1 for given p ˜ , ˜L , τ , and transfers ter in the procedure getk is very fast, as it amounts K to the solution of a linear system of equations. To see this, consider the intertemporal first-order condition for the one-year-old agent and note that the terms involving the labor supply l 1 = l 2 = 0.3 drop out of the equation:44 (˜c 1 )γ(1−η)−1 = β(1 + gA)γ(1−η)−1 φ 1 1 + (1 − τk )r (˜c 2 )γ(1−η)−1 ,
which can be reduced to a linear equation in ˜c 1 and ˜c 2 : ˜c 1 = κ˜c 2 , with 1/(γ(1−η)−1) κ = (1 + gA) βφ 1 1 + (1 − τk )r .
Once we replace ˜c 1 and ˜c 2 with the help of the budget constraint, we find a linear (!) equation in a˜2 and a˜3 (˜ a1 = 0 is known). Similarly, we can find another 73 linear equations involving a˜3 , . . . , a˜70 by starting from the firstorder conditions for the agents aged s = 2, . . . , 69. The solution of a system of linear equations can be computed very fast via the LU decomposition described in Section 12.9. It takes only fractions of a second to solve our problem. We store the wealth-age profile as an initial guess for the direct computation of the steady state. To solve the system of nonlinear equations in the procedure getk, we ˜ and ter. Since we need to provide an initial guess for the two variables K do not have any a priori information on the equilibrium transfers in our ˜ , we model, we set the initial guess of ter equal to zero. For the variable K can use the household’s Euler equation and assume that the consumption profile is flat so that (except at age s = 45 when the labor supply falls to zero in the next period) the marginal utility of consumption is also constant. As a consequence, the Euler equation, 1/β = φ s (1 + (1 − τk )r), ˜ , if we provides us with a guess for the real interest rate r and, hence, K provide an initial value for β. We try the standard value β = 0.99 and approximate the survival probability by one so that our initial guess for ˜ is equal to 7.01. For this initial guess, however, the nonlinear equation K 44
For the ease of exposition, we drop the time index t = 0 from the variables ˜c , l, and φ and the efficiency type j ∈ {1, 2} from the variables ˜c and l.
604
10 Overlapping Generations Models with Perfect Foresight
solver is not able to find a solution for the procedure getk. This is a typical problem in the computation of OLG models. In Section 10.3, we discussed more elaborate ways to come up with an initial guess. In the present case, we only use a trial-and-error approach and consider different values for ˜ = 1.0 works fine. the capital stock.45 We find that K Next, we compute the steady state with elastic labor supply. The most straightforward computation procedure would be the solution of a system of nonlinear equations that consists of both the individual first-order conditions and aggregate equilibrium conditions with the help of a nonlinear equation routine, e.g., the Newton-Rhapson method. We followed this procedure in Section 10.3. In the present case, however, we follow a different approach. In a later step of the algorithm, we need to provide a guess on ß1 , and pen ß2 ˜ , ˜L , ter, pen how the aggregate equilibrium conditions for K depend on a small change in the steady-state values for these variables. Therefore, we need to specify our equilibrium conditions in a nested way. In the upper layer, we provide the five aggregate equilibrium conditions 1 2 ˜ , ˜L , ter, pg on the endogenous aggregate variables K en , and pg en . The five equilibrium conditions specified in the function getss consist of the fiscal budget, the two equilibrium conditions that aggregate capital and labor are equal to the sum of the respective individual variables, and the conditions under which the replacement ratios of the pensions for two efficiency types j = 1, 2 are equal to the policy parameter r epl. In the procedure getss, we compute endogenous factor prices, wages w t , and interest rates r t p together with the equilibrium social security contribution rate τ t . The latter follows from the budget of the social security authority. Subsequently, we solve the nonlinear equations problem of the individual optimization problem that is specified in the function Sys1, our second layer. As an initial guess for the solution of this system of nonlinear equations in the variables a˜s, j and l s, j , we use the solution for the model with exogenous labor supply as computed with the help of the procedure getk. Note that we define the aggregate variables, factor prices, transfers, and social security contributions over the 70-period lifetime of the household as a global variable in our program. In the case of the steady state, these sequences are constant, of course. When we call the routine Sys1, it computes the allocation of the household for any sequence of these variables over the lifetime of the household. In particular, we are also able to use it ˜ and We might have simply provided a grid of values over the aggregate capital stock K chosen the one that minimizes the absolute value of the function getk.
45
10.5 The Demographic Transition
605
for the computation of the household problem during the transition, when the aggregate variables are not vectors of constants. In the last calibration step, we choose β such that the real interest rate is equal to 3.0%. For this reason, we add a new variable, β, and a new aggregate equilibrium condition, r = 0.03, to the function getss, which we name calibbeta in the program Demo_trans.g. We have to use a tâtonnement process over the real interest rate r and approach its final value of 3% stepwise (in 10 equispaced steps), meaning that we iteratively compute the value of β for r = 8.92%, 8.26%, . . . , 3.00%. If we do not use 1 2 ˜ , ˜L , ter, pg the results for the endogenous variables (K en , pg en ) from the interim computations as an initial guess in the final step with r = 3.0%, the Newton-Rhapson algorithm does not converge. From this step, we derive β = 1.027, and we have finished the computation of the initial steady state. The computational time of the initial steady state amounts to 14 minutes 18 seconds on a machine with Intel(R) Xeon(R) CPU running at 2.90 GHz. The time-consuming part of this computation stems from the calibration of β and the necessary tâtonnement process over the interest rate. Given the value of β, the final steady state for 2315 is computed within seconds. At this point, let us mention one generalization. Thus far, we have assumed that we are in a steady state at the beginning of the transition. Of course, this does not need to be the case. For example, consider the case of the German economy that was experiencing a structural break in 1989 in the form of reunification with the formerly socialist part of the country. As a consequence, it may be somewhat far-fetched to assume that Germany had reached a steady state. Similarly, if we want to study the question of demographic transition in the US, the assumption of a steady state today would probably be an oversimplification because fertility and mortality rates have not been stationary in the US in recent decades. How can we compute the initial steady state in this case? As you will see shortly, to compute the transition, we need to know the allocation of all agents who are alive at the beginning of the transition. Accordingly, we need to know the sequence of the factor prices, tax rates, and social security contribution rates that are important for their allocation decision. Since our oldest agents are aged 70, we need to know the time series for the last 70 periods. If we are in a steady state, this does not pose a problem since all aggregate variables are constant. If we are out of the steady state, the issue is more problematic. One typical approach to solving this problem is used by Krueger and Ludwig (2007), among others. They study the demographic transition in the US (among other countries)
606
10 Overlapping Generations Models with Perfect Foresight
starting in 2000. They assume that the economy was in a steady state in 1950 and start the computation of the transition in that year, exogenously imposing the demographic development of the US population during the period 1950-2000. Even if the US were not in a steady state in 1950, the allocation in 2000 is hardly affected by this initialization because most of the households whose decisions depend on the behavior of the economy prior to 1950 are no longer alive. We, however, simplify the present exposition and computation by assuming that the economy is in steady state prior to 2015.46 In Step 3, we provide an initial guess for the time path of the aggre1,0 2,0 ˜ 0 , ˜L 0 , ter 0 , pg gate variables {(K en t , pg en t )}300 t t t t=1 . We assume a linear law of 1 2 ˜ ˜ motion for K , L , ter, pg en , and pg en , starting from the initial steady-state value in period t = 0 (corresponding to 2014) and reaching the new steady state in period t = 301 (corresponding to 2315).47 The initial guess for ˜ = K/(AN ), is graphed as a blue line in Figure 10.18. the capital stock K Given this sequence for the endogenous aggregate variables, we call the procedure getvaluetrans in the program Demo_trans.g in Step 4. 1,0 2,0 ˜ 0 , ˜L 0 , ter 0 , pg With the help of the aggregate variables {(K en t , pg en t )}300 t t t t=1 , we can compute all factor prices and the social security contribution rate that are necessary for the computation of the individual optimization problem. We start the computation of the individual policy functions in the last period of the transition, t = 300, and iterate backwards. In each 46
Let us also mention another complication that arises in the case of a method that calibrates the steady state with the characteristics of the economy in a period in the distant past, say in 1950. We take many of the calibration targets and parameters from the present period. For example, Trabandt and Uhlig (2011) use the period 1990-2010 for their tax and production parameters or the government share. Of course, we cannot use these observations to calibrate the steady state in 1950 but have to use the variables that prevail at that time. The official statistics on economic variables at that time, however, are less reliable than statistics based on modern data. If we would instead like to use the observations from the more recent period but still start our computation from a steady state in 1950, we would have to simulate the dynamics and compare the simulated time series during the period 1990-2010 with the respective empirical counterparts, e.g., if the simulated average labor supply in our model during the period 1990-2010 is equal to 0.30. Therefore, we have to calibrate the model based on the transition dynamics rather than the steady-state behavior. Since the simulation of the transition period can be very time-consuming, this procedure is very likely to be too cumbersome, and we do not know of any general-equilibrium study that uses this approach in a large-scale OLG model. 47 If we had assumed the existence of perfect annuity markets rather than postulating that accidental bequests are confiscated by the government, we could have dropped transfers ter t from the set of endogenous variables since they would have already been implied by ˜ t , ˜L t , and the fiscal budget constraint of the government. K
10.5 The Demographic Transition Initial Guess Iteration 4
607
Iteration 1 Iteration 5
Iteration 2 Iteration 6
Iteration 3
1.55 1.50
˜ K
1.45 1.40 1.35 1.30 2015
2050
2100
2150
2200
2250
2314
Year ˜ Figure 10.18 Convergence of Transition Path K
period, we compute the optimal policy functions for the households born in period t. Therefore, we assign the factor prices to global variables and call the routine sys1. In each period t = 1, . . . , 300, we have to aggregate the capital stock and labor supply of the generations currently alive. Therefore, when we consider a generation born in period t, we use s ˜st+s−1 its allocation {(˜ ast+s−1 , l t+s−1 )}s=70 s=1 and add the individual wealth a s ˜ t+s−1 and and effective labor supply ¯y s ε j l t+s−1 to the aggregate wealth Ω employment ˜L t+s−1 , respectively, for s = 1, . . . , 70. Of course, we have to multiply the individual variables by the measure of the generation with age-s productivity j that we computed in Step 2. As we iterate over the periods t, we do not have to store the allocation of the generation born in period t. We only use the allocation of the generation born in t as an initial guess for that born in t − 1 that we compute in the next iteration. When we compute the individual policy functions, we need to be careful in the periods prior to and after the transition. 1) When we consider the behavior of the s-year-old household born prior to period t = 1, we have to adjust the computation in the procedure Sys1 with respect to our assumption that the household behaves like the household in the initial steady state. Furthermore, we have assumed that the household does not expect the policy change in period t = 1 when the replacement rate of pensions with respect to net wage income is permanently decreased from 35.2% to 30.0%. Therefore, we simply add the conditional statement in the
608
10 Overlapping Generations Models with Perfect Foresight
procedure Sys1 that individual savings and labor supply of the s-year-old household with efficiency type j are equal to those values prevailing in the initial steady state during the periods t = 0, −1, −2 . . ., −68.48 2) When the transition is complete after 300 periods, we assume that the factor prices, pensions, and the social security contribution rate are equal to the corresponding values in the final steady state. From the aggregate consistency conditions and the government budget constraint we compute new sequences for our five state variables, say 1,i∗ 2,i∗ ˜ i∗ , ˜L i∗ , ˜t r i t , pg {(K en t , pg en t )}300 t t t=1 , where i = 1, 2, . . . denotes the iteration number and the asterisk ∗ indicates that the sequence derives from the iteration i policy functions of the agents. 1,i 2,i ˜ i , ˜L i , ter i , pg In Step 5, we update our prior guess {(K en t , pg en t )} t=300 t t t t=1 . This amounts to the solution of the 1500 equations
˜ i∗ − K ˜i K t t ˜L i∗ − ˜L i t t i∗ i ter − ter = 0, t = 1, . . . , 300. t t 1,i∗ 1,i pg en − pg en t 2,i∗ 2,i pg en t − pg en t
Standard Newton-Raphson methods will break down because we are attempting to compute the Jacobian matrix with a dimension of 1500 × 1500 in each step. A workable solution to this problem is Broyden’s algorithm as described in Section 15.3.2. To economize on computational time, we use (15.17) to update the inverse matrix of the Jacobian matrix. The only remaining problem is to find an initial value of the Jacobian matrix. As described in the previous section, we use the Jacobian matrix in the final steady state as an approximation for the Jacobian matrix in all periods.49 It is straightforward to compute the derivative of our function getvaluess 1 ˜ 301 , ˜L301 , ter 301 , pg with respect to the final steady-state values K en301 , and 2 pg en301 . Given the Jacobian matrix in the final steady state, we compute the Kronecker product of this matrix and an identity matrix of dimension 300 × 300. In essence, we assume that the aggregate variables in period 1 2 ˜ t , L t , ter t , pg t, (K en t , pg en t ), do not have an impact on the behavior in the other periods and that the economy behaves similarly in every period. In 48
The oldest household that we consider in the computation of the transition is the 70, j household born in period t = −68. Its last period of life is period t = 1, so its savings, a˜1 , still has an impact on the aggregate variables during the transition. 49 In particular, we are also using the final steady-state values of the cross derivatives ˜ i∗ − K ˜ i )/∂ L i , ∂ (K ˜ i∗ − K ˜ i )/∂ ˜t r i , . . . for the initialization. ∂ (K t t t t t t
10.5 The Demographic Transition
609
our problem, we find that this approximation of the Jacobian matrix J performs very satisfactorily and its computation is very fast, amounting to a matter of seconds. To improve convergence in our program Demo_trans.g, we also imply 1i 2i ˜ i , ˜L i , ter i , pg a line search over the values xi := {(K en t , pg en t )}300 t t t t=1 . Assume that we have computed the next step size d x = −J(xi )−1 f(xi ), with the help of the value returned by the routine getvaluetrans. We then apply Algorithm 15.3.3 from Chapter 15 to find an increase λ dx that improves our fit, 0 < λ ≤ 1. In case we do not find a decrease in the function value, we re-initialize our Jacobian matrix with our first initialization and return to Step 4. In our computation, however, this step is not necessary. We stop the computation as soon as we have found a solution, where the sum of the squared deviations of two successive values xi is less than 0.001. In this case, the maximum percentage difference between two individual ˜ i and ˜L i with the new values K ˜ i∗ and ˜L i∗ is approximately equal values of K t t t t to 0.01%. With Broyden’s method, we need 6 iterations. The computation is time-consuming and amounts to 1 hour 8 minutes on machine with Intel(R) Xeon(R) CPU running at 2.90 GHz. The convergence of the time ˜ i is presented path to its final value (the black line) for the capital stock K t in Figure 10.18 for the six iterations i = 1, . . . , 6. As you can see by simple inspection, we observe a smooth approximation of the final steady state. We also provide the option to compute the transition path with the help of a tâtonnement process. We find that an update parameter of ψ = 0.9 works sufficiently fast, meaning that the old solution is a linear interpolation of the old and the new solution with weights of 90% and 10%, respectively. In this case, we need 37 iterations, and the computational time is 2 hours 50 minutes.
10.5.4 Results In this section, we present our economic results. First, we discuss the individual policy functions in the initial and final steady states, 2014 and 2315, before we turn to the transition dynamics. Figure 10.19 displays the individual wealth-age profiles of the low- and high-efficiency households in the initial and final steady state. Obviously,
610
10 Overlapping Generations Models with Perfect Foresight
the savings of high-productivity workers are higher than those of lowproductivity workers because their labor income is also higher. In the initial and final steady state, both types of agents dissave during their first years, so wealth declines below zero, and the households accumulate debt.50 Note that the households accumulate higher lifetime savings in the final High Efficiency Households
1.75 1.50 1.25 1.00 0.75 0.50 0.25 0.00
4.0 3.5 3.0 2.5 2.0 1.5 1.0 0.5 0.0 −0.5
a˜s,2
a˜s,1
Low Efficieny Households
1 10 20 30 40 50 60 70
s
Initial Steady State
1 10 20 30 40 50 60 70
s
Final Steady State
s, j
Figure 10.19 Individual Wealth a˜t in the Initial and Final Steady State
steady state (red line) than in the initial steady state (blue line). Maximum wealth in the final steady state also peaks at a higher age (approximately age 58 versus age 55 in the initial steady state) and is approximately 10% larger than in the initial steady state for both types of households. There are various effects on savings behavior when we move from 2014 to 2315. First, the survival probabilities increase. Therefore, average lifetime increases, and households need to accumulate more savings for old age. In addition, we analyze a pension policy that reduces the gross replacement rate of pensions from 35.2% to 30.0%, so pensions decline from 0.0713 (lowefficiency type) and 0.1782 (high-efficiency type) to 0.0696 and 0.1697, respectively.51 Again, this effect increases savings ceteris paribus. On the other hand, the social security authority has to increase its contribution rate τ p from 9.68% to 13.1% because of the increase in the dependency ratio from 27.2% to 44.0%. Consequently, the tax wedge increases, and 50
In Section 11.2, we also consider the case in which we impose a credit constraint a ≥ 0 in a model with heterogeneous agents. 51 The decline in pensions is smaller than the decline in the replacement rate because both the wage and the average working hours increase between 2015 and 2316.
10.5 The Demographic Transition
611
the household receives a lower net labor income. The overall effect on savings is positive, so lifetime savings attain a higher maximum at age 58 in 2315 than in 2014. In Figure 10.20, we present the labor supply of low- and high-productivity workers in the initial and final steady state. By our choice of the utility function, high-productivity and low-productivity workers have approximately equal labor supply since the substitution and income effects of higher wages are of almost the same size. When we compare the initial (blue line) with the final (red line) steady state, we find that in the former, younger agents work somewhat longer, while in the latter, the older agents are supplying more labor to a much stronger extent. The total effect is a significant increase in the average working hours over the working life for both low- and high-efficiency workers. The strong increase is explained by the need to accumulate more savings for old age as the public pay-asyou-go pension is reduced. We also observe that the household substitutes labor less strongly intertemporally in the final steady state than in the initial steady state, meaning that labor drops less at the end of the working life. High Efficiency Households
Low Efficieny Households 0.35
0.35
0.30
0.30
l s,2
l s,1
0.40
0.25
0.25 0.20
0.20 1 5 10 15 20 25 30 35 40 45
1 5 10 15 20 25 30 35 40 45
s
s
Initial Steady State
Final Steady State s, j
Figure 10.20 Individual Labor Supply l t , Initial and Final Steady State
The transition dynamics of the aggregate state variables and the factor prices during 2014-2313 are illustrated in Figures 10.21-10.23. In the year 2014, the economy is in the old steady state. In the next year with the ˜ t is still predetermined and unexpected policy change, the capital stock K equal to the old steady-state value. Aggregate labor ˜L t , however, jumps
612
10 Overlapping Generations Models with Perfect Foresight
upwards because pension entitlements fall. Therefore, aggregate labor ˜L t increases (right panel of Figure 10.21), while capital per effective labor, k t = K t /(A t L t ), decreases initially (left panel of Figure 10.21).52 Aggregate Labor
Scaled Capital 6.75
0.240 0.235
6.25
˜L
˜ K
6.50 6.00 5.75 5.50 2014
0.230 0.225
2100
2200
Year
2313
0.220 2014
2100
2200
2313
Year
˜ t = K t /(A t L t ), and Figure 10.21 Convergence of Capital per Working Hour, K Aggregate Labor ˜L t
During the demographic transition, however, the declining labor force share slowly decreases total employment ˜L in our economy. Due to the changing composition of the population, the share of 40-60-year-old households with higher wealth initially increases strongly so that the capital stock per capita, K/(AN ), also rises initially. In the medium- to long-run, the higher share of retirees who dissave becomes a more dominating factor ˜ t is hump-shaped over on aggregate savings. Therefore, the capital stock K time as presented in Figure 10.18. In the long-run, however, aggregate ˜ t remains above the capital stock in the old steady-state.53 As a capital K consequence, the capital intensity, k = K/(AL), and wages increase, while the interest rate declines (in accordance with the hypothesis of a secular stagnation and a low real interest rate; see, e.g., Eggertsson et al. (2019)). The transition of the factor prices is illustrated in Figure 10.22. Note that the wage rate even overshoots its long-run value, while the real interest ˜ in the left panel of We display capital per (effective) working hour k rather than K Figure 10.21 as we would like to explain the behavior of the factor prices. The real interest rate r and the wage rate w are monotone functions of k, r = αkα−1 − δ and w = (1 − α)kα . ˜ / ˜L by dividing K ˜ (as presented in You can derive the dynamics of the variable k = K Figure 10.18, black line) by ˜L (as present in the right panel of Figure 10.21). 53 This observation also holds in the case of a constant replacement rate of 35.2%, albeit to a lesser extent. The reader is invited to also run the computer program for this case. With a constant replacement rate, the convergence of the Broyden algorithm is much faster, and the computational time amounts to only 21 minutes. 52
10.5 The Demographic Transition
613
rate undershoots its long-run value. This behavior is explained by the hump-shaped transition of the capital stock. Wage
Interest Rate
1.26 1.22
r
w
1.24 1.20 1.18 2014
2100
2200
2313
3.25 3.00 2.75 2.50 2.25 2.00 2014
2100
Year
2200
2313
Year
Figure 10.22 Convergence of Factor Prices w t and r t
The transition dynamics of the endogenous government variables, government transfers ter and the social security contribution rate τ p , are presented in Figure 10.23. The government has to lower the transfers (or, more accurately, has to increase its lump-sum taxes as ter < 0) because tax revenues decline in a population characterized by a larger share of retirees. The social security authority has to increase the social security contribution rate τ p to finance the larger amount of pensions to the retirees. Note that τ p does not stabilize until 2140 and mirrors the behavior of the dependency ratio during the transition, as depicted in Figure 10.17. Contribution Rate
Figure 10.23 Convergence of Government Variables t̃r_t and τ^p_t (left panel: transfers, ×10^−2; right panel: contribution rate in %; horizontal axes: years 2014-2313)
10.6 Conclusion

The OLG model is a natural framework to analyze life-cycle problems such as the provision of public pensions, endogenous fertility, or the accumulation of human capital and wealth. Moreover, it is the most appropriate model to examine the consequences of the demographic transition, for example, with respect to the sustainability of public finances or the effect of population growth on climate policies. In this concluding section, we provide a rather incomplete survey of some of the main applications in the literature that predominantly employ OLG models. We focus on 1) the distributions of income and wealth and 2) public finances, with a particular focus on the demographic transition.

1) The OLG model is the dominant modeling framework for studying questions of income and wealth dynamics and the determinants of savings. Empirically, income is found to depend on age. In particular, Hansen (1993) shows that the US age-productivity profile is hump-shaped such that the wage rate peaks at approximately age 50-52 and is approximately twice as high as at young age. Therefore, labor income is higher in the middle and near the end of the working life, meaning that income is less concentrated in an older economy with many workers in the 45-60 years-of-age cohorts. In addition, savings depend on both age and income, and one of the main motives for saving is the need to provide for old age. In this vein, Huggett and Ventura (1999) examine the determinants of savings and use a calibrated life-cycle model to investigate why high-income households as a group save a much larger fraction of income than do low-income households, as documented by cross-sectional US data. De Nardi and Fella (2017) also include bequests, entrepreneurship, medical expenses, and preference heterogeneity as possible explanatory factors for the observed heterogeneity in savings rates and, hence, wealth.

The life-cycle savings behavior of the households in the overlapping generations model is also reflected in a hump-shaped wealth-age profile that peaks close to the age of retirement. Therefore, in a young economy with only a few older workers, wealth is more concentrated than in an economy with many older workers or early retirees. As one of the most prominent and earliest articles in this literature, Huggett (1996) shows that the life-cycle model is able to reproduce the US wealth Gini coefficient and a significant fraction of the wealth inequality within age groups. Heer (2001b) and De Nardi and Yang (2016) also study the role of bequests in explaining observed wealth inequality, while De Nardi (2015)
provides a survey on the modeling of the wealth distribution in DGE models.

2) The demographic structure of the economy also has a significant impact on public finances. In particular, earnings are, as argued above, age-dependent, so tax revenues also depend on the (relative) size of the individual age cohorts in the total population. In addition, the ratio of the number of retirees to workers, which is measured by the so-called age-dependency ratio, is a crucial determinant of the levels of expenditures and revenues of the social security authority. In basically all industrialized OECD countries, pensions are predominantly distributed through a pay-as-you-go system, meaning that the current workers finance the pensions of current retirees through their social security contributions. Consequently, the US tax system and the US social security system have also attracted substantial attention in OLG analysis: İmrohoroğlu (1998) analyzes the effects of capital income taxation, İmrohoroğlu et al. (1998) evaluate the benefits of tax-favored retirement accounts, and Ventura (1999) considers the effects of a flat-rate versus a progressive income tax. The effects of social security and unemployment compensation are studied by İmrohoroğlu et al. (1995), Hubbard et al. (1995), Heer (2003b), Fehr et al. (2013), and Heer (2018), among others. İmrohoroğlu et al. (1995), for example, examine the effects of a change in public pensions on economic welfare in a 60-period OLG model with liquidity constraints and income uncertainty. The optimal replacement rate of both pensions and unemployment insurance with respect to gross wages is mostly found to be low in these studies, except for the very first period of unemployment. In particular, pensions are optimally provided at replacement rates below 30% and are lump-sum so that they redistribute among retirees.

The OLG framework is also the natural framework for studying questions related to the demographic transition. As the population ages, the pension system comes under pressure. De Nardi et al. (1999), Nishiyama and Smetters (2007), and Kitao (2014), among many others, examine different policy plans to cope with the transition. De Nardi et al. (1999) find that switching to a purely defined-contribution system is one of the few policies that raises the welfare of all generations. Nishiyama and Smetters (2007) and Kitao (2014) use so-called Hicksian compensation to evaluate different reform proposals. Therefore, they compute the transfer paid by future generations that is necessary to compensate all living generations to ensure that they do not suffer a welfare loss (as measured by their expected lifetime utility) from the pension reform policy. The government administers this compensation by the accumulation of debt that has to be repaid
by those born after the present period such that all future generations have the same lifetime utility (possibly adjusted for productivity growth). The change in the lifetime utility of future generations with respect to that under the present policy is called the Hicksian efficiency gain.54 Nishiyama and Smetters (2007) find that the welfare effects of a 50% privatization of social security depend on many factors, including the assumption of a closed economy or the progressivity of pensions. Kitao (2014) identifies a policy that reduces the benefit level of pensions (and, hence, the social security contributions needed to finance them) as the most efficient policy.

The sustainability of public finances is also closely related to the role of public debt. In the Great Recession of 2007-2008 and since, we have observed that debt levels (relative to GDP) have increased significantly in many OECD countries. For example, the US debt-GDP ratio increased from approximately 60% in 2005 to more than 100% by 2015.55 Similarly, debt (relative to GDP) has increased by 20 percentage points or more in many Eurozone countries, including France, Italy, Spain, and Greece. The most indebted OECD country, however, is Japan, which had accumulated debt equal to 238% of GDP as of 2015.

To study the role and sustainability of public debt, the OLG model, again, is the natural framework. While the standard neoclassical growth model is characterized by Ricardian equivalence such that the financing of government expenditures by debt or lump-sum taxes does not affect the equilibrium values of output, consumption, or employment, this observation does not hold for the OLG model.56 In addition, the demographic structure is important in the analysis of public debt sustainability since the largest part of the government revenues used to service public debt is provided by income taxes paid by workers, i.e., the young cohorts. As one example from the growing literature on this problem, Braun and Joines (2015) analyze the sustainability of Japanese public finances. They find that Japan will suffer a severe fiscal crisis if it continues its present pension and tax policies. Heer (2019) studies the financing of possible US pension reform proposals with public debt during the transition 2015-2060 and finds that debt levels can only be increased to approximately 215% of GDP by 2055 and would crowd out capital by approximately one-third. Beyond this threshold level, debt payments would become unsustainable, meaning that the government, even at the maximum level of income tax rates, would not be able to finance them.

The OLG model framework has also been successfully applied to many other areas of macroeconomic problems, including the study of business cycle fluctuations, the pricing of assets and equities, and optimal portfolio choice over the life-cycle.57 Business cycles will be the focus of attention in the next chapter. Let us reiterate that the aforementioned list of recent applications is only selective and by no means exhaustive.

54 Sometimes, the welfare gain is also referred to as a Kaldor-Hicks improvement.
55 The data are taken from the IMF World Economic Outlook database.
56 Barro (1974) showed that the means of financing public expenditures does not matter for the real allocation of the economy if the following holds: 1) families act as infinitely lived dynasties because of intergenerational altruism, 2) capital markets are perfect, and 3) the path of government expenditures is fixed.
57 Please see Ríos-Rull (1996), Brooks (2002), Storesletten et al. (2007), or Heer and Scharrer (2018), among others.
A.10 Derivation of Aggregate Bequests in (10.29)

To derive accidental bequests, sum up the individual budget constraints (10.22) weighted by the measure of s-year-old households to get

$$(1+\tau^c)\underbrace{\sum_{s=1}^{T}\mu^s_t N_t c^s_t}_{=C_t} = (1-\tau^l-\tau^p_t)A_t w_t \underbrace{\sum_{s=1}^{T^W}\mu^s_t N_t \bar{y}^s l^s_t}_{=L_t} + \underbrace{\sum_{s=T^W+1}^{T}\mu^s_t N_t\, pen_t}_{=Pen_t} + \left[1+(1-\tau^k)(r_t-\delta)\right]\underbrace{\sum_{s=1}^{T}\mu^s_t N_t k^s_t}_{=K_t} + R^b_t\underbrace{\sum_{s=1}^{T}\mu^s_t N_t b^s_t}_{=B_t} + \underbrace{\sum_{s=1}^{T}\mu^s_t N_t\, tr_t}_{=Tr_t} - \sum_{s=1}^{T-1}\mu^s_t N_t k^{s+1}_{t+1} - \sum_{s=1}^{T-1}\mu^s_t N_t b^{s+1}_{t+1}.$$

With the help of the social security authority's budget constraint, (10.32), we can eliminate pensions from this equation. Division by $A_t N_t$ results in

$$(1+\tau^c)\tilde{C}_t = (1-\tau^l)w_t\tilde{L}_t + \left[1+(1-\tau^k)(r_t-\delta)\right]\tilde{K}_t + R^b_t\tilde{B}_t + \widetilde{Tr}_t - (1+g_A)\sum_{s=1}^{T-1}\mu^s_t\tilde{k}^{s+1}_{t+1} - (1+g_A)\sum_{s=1}^{T-1}\mu^s_t\tilde{b}^{s+1}_{t+1}.$$

Since we assume constant returns to scale and perfect competition in factor and product markets, Euler's theorem holds,

$$\tilde{Y}_t = w_t\tilde{L}_t + r_t\tilde{K}_t,$$

such that

$$\tilde{C}_t = \tilde{Y}_t + (1-\delta)\tilde{K}_t + R^b_t\tilde{B}_t + \widetilde{Tr}_t - \tau^l w_t\tilde{L}_t - \tau^k(r_t-\delta)\tilde{K}_t - \tau^c\tilde{C}_t - (1+g_A)\sum_{s=1}^{T-1}\mu^s_t\tilde{k}^{s+1}_{t+1} - (1+g_A)\sum_{s=1}^{T-1}\mu^s_t\tilde{b}^{s+1}_{t+1}.$$

Inserting the goods market equilibrium

$$\tilde{Y}_t = \tilde{C}_t + \tilde{G}_t + (1+n)(1+g_A)\tilde{K}_{t+1} - (1-\delta)\tilde{K}_t$$

in the equation above, we derive, after some rearranging,

$$\tilde{G}_t + \widetilde{Tr}_t + R^b_t\tilde{B}_t - \underbrace{\left[\tau^l w_t\tilde{L}_t + \tau^k(r_t-\delta)\tilde{K}_t + \tau^c\tilde{C}_t\right]}_{=\tilde{T}_t} = -(1+n)(1+g_A)\tilde{K}_{t+1} + (1+g_A)\sum_{s=1}^{T-1}\mu^s\tilde{k}^{s+1}_{t+1} + (1+g_A)\sum_{s=1}^{T-1}\mu^s\tilde{b}^{s+1}_{t+1}.$$

Inserting the fiscal budget constraint (10.28) (after division by $A_t N_t$) into this equation, we derive

$$(1+n)(1+g_A)\tilde{B}_{t+1} + \widetilde{Beq}_t = -(1+n)(1+g_A)\tilde{K}_{t+1} + (1+g_A)\sum_{s=1}^{T-1}\mu^s\tilde{k}^{s+1}_{t+1} + (1+g_A)\sum_{s=1}^{T-1}\mu^s\tilde{b}^{s+1}_{t+1}.$$

We can solve for accidental bequests:

$$\begin{aligned}
\widetilde{Beq}_t &= (1+g_A)\sum_{s=1}^{T-1}\mu^s_t\tilde{k}^{s+1}_{t+1} - (1+n)(1+g_A)\tilde{K}_{t+1} + (1+g_A)\sum_{s=1}^{T-1}\mu^s_t\tilde{b}^{s+1}_{t+1} - (1+n)(1+g_A)\tilde{B}_{t+1},\\
&= (1+g_A)\sum_{s=1}^{T-1}\mu^s_t\tilde{k}^{s+1}_{t+1} - (1+n)(1+g_A)\sum_{s=1}^{T}\mu^s_{t+1}\tilde{k}^{s}_{t+1} + (1+g_A)\sum_{s=1}^{T-1}\mu^s_t\tilde{b}^{s+1}_{t+1} - (1+n)(1+g_A)\sum_{s=1}^{T}\mu^s_{t+1}\tilde{b}^{s}_{t+1},\\
&= (1+g_A)\sum_{s=1}^{T-1}\mu^s_t\left(\tilde{k}^{s+1}_{t+1}+\tilde{b}^{s+1}_{t+1}\right) - (1+n)(1+g_A)\sum_{s=1}^{T-1}\mu^{s+1}_{t+1}\left(\tilde{k}^{s+1}_{t+1}+\tilde{b}^{s+1}_{t+1}\right),\\
&= (1+g_A)\sum_{s=1}^{T-1}\mu^s_t\left(\tilde{k}^{s+1}_{t+1}+\tilde{b}^{s+1}_{t+1}\right) - (1+g_A)\sum_{s=1}^{T-1}\mu^s_t\phi^s\left(\tilde{k}^{s+1}_{t+1}+\tilde{b}^{s+1}_{t+1}\right),\\
&= (1+g_A)\sum_{s=1}^{T-1}\mu^s_t\left(1-\phi^s\right)\left(\tilde{k}^{s+1}_{t+1}+\tilde{b}^{s+1}_{t+1}\right),
\end{aligned}$$

where we used $\tilde{k}^1_{t+1} = \tilde{b}^1_{t+1} = 0$ to derive the third from the second equation and

$$N_{t+1}\mu^{s+1}_{t+1} = \phi^s N_t\mu^s_t, \quad s=1,\dots,T-1,$$

or, equivalently,

$$\mu^{s+1}_{t+1} = \frac{\mu^s_t\phi^s}{1+n},$$

to derive the fourth from the third equation. The last equation for $\widetilde{Beq}_t$ above is the stationary pendant of (10.29).
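In the computer programs, the stationary bequest formula is evaluated as a simple weighted sum over cohorts. The following Python fragment is a minimal sketch of that calculation; the cohort shares, survival probabilities, and asset choices below are illustrative placeholders, not the model's solution.

```python
import numpy as np

T = 70                                   # maximum age (illustrative)
g_A = 0.02                               # growth rate of aggregate productivity (illustrative)
mu = np.full(T, 1.0 / T)                 # cohort shares mu^s (placeholder values)
phi = np.linspace(1.0, 0.0, T)           # survival probabilities phi^s (placeholder values)
k_next = np.linspace(0.0, 10.0, T)       # next-period capital chosen at age s (placeholder)
b_next = np.linspace(0.0, 2.0, T)        # next-period bonds chosen at age s (placeholder)

# Stationary accidental bequests:
# Beq~ = (1+g_A) * sum_{s=1}^{T-1} mu^s * (1 - phi^s) * (k~^{s+1} + b~^{s+1})
beq = (1.0 + g_A) * np.sum(mu[:T-1] * (1.0 - phi[:T-1]) * (k_next[:T-1] + b_next[:T-1]))
print(beq)
```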
Problem 10.1: Computation of the 60-Period OLG Model with the Newton-Raphson Method

Recompute the illustrative example in Section 10.2.1 using direct computation with the help of the program AK60_direct.g. However, instead of the secant method, use the Newton-Raphson method. Therefore, write a routine that computes the capital stock k^1 for given k^60. Solve the nonlinear equation k^1(k^60) = 0.

Problem 10.2: Earnings-Related Pensions

Consider the steady state of the model in Section 10.2 with a constant wage and interest rate. Pension payments pen are assumed to be lump-sum irrespective of the individual's earnings history and contributions to social security. As a more realistic description of pension payments in modern industrial countries, let pension payments depend on average lifetime earnings. In addition, the government provides a minimum social security pension pen^min in old age. More formally, for an individual with earnings history {(l^s w)}_{s=1}^{T^W}, annual pension payments are calculated by the formula

$$pen = \varepsilon\, \frac{1}{T^W}\sum_{s=1}^{T^W} l^s w + pen^{min}, \qquad 0 \le \varepsilon < 1.$$

As a consequence, the individual state variables of the value function of the retired agent are given by individual wealth k and annual pension payments pen, while the individual state variables of the young agent are given by his wealth k and his accumulated earnings. Furthermore, the working agent maximizes his labor supply while taking the intertemporal effect on his entitlement to pension payments into account. Accordingly, his first-order condition with respect to labor supply in the steady state is given by

$$u_l(c^s, 1-l^s) = (1-\tau)\, w\, u_c(c^s, 1-l^s) + \beta^{T^W+1-s}\, \frac{\partial v^{T^W+1}\!\left(k^{T^W+1}, pen; K, N\right)}{\partial pen}\, \frac{w\,\varepsilon}{T^W},$$

where the second additive term on the right-hand side of the equation reflects the increase in old-age utility from an increase in labor supply through its effect on pensions. Compute the stationary equilibrium, and show that an initial increase in ε from 0 to 0.1 increases employment and welfare as measured by the value of the newborn generation v^1(0, pen; K, N).
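Before solving Problem 10.2, it may help to see the pension rule as code. This is only a Python sketch of the formula above with illustrative placeholder inputs; it is not a solution of the problem.

```python
import numpy as np

def pension(labor_history, w, eps, pen_min):
    """Earnings-related pension: pen = eps * (1/T_W) * sum_s l^s * w + pen_min."""
    T_W = len(labor_history)
    return eps * np.sum(np.asarray(labor_history) * w) / T_W + pen_min

# illustrative numbers only
l_hist = np.full(45, 0.30)   # hours worked in each of T_W = 45 working years
print(pension(l_hist, w=1.2, eps=0.1, pen_min=0.05))
```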
Problem 10.3: Laffer Curve

Recompute the Laffer curve from the model presented in Section 10.3 for the year 2050. The population variables, i.e., the survival probabilities and the population growth rate, are also provided as Excel file downloads from the book's homepage. Show that, as a consequence of aging, the maximum value of US tax revenues shrinks in absolute value. Consider the following two policies to restore the fiscal space: 1) a reduction in the replacement rate of pensions relative to wages to 30% and 2) an increase in the retirement age to 70 (corresponding to T^W = 50). Implement the necessary changes in the GAUSS or MATLAB computer programs Laffer.g or Laffer.m.
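The Laffer-curve exercise boils down to an outer loop over tax rates. The Python sketch below illustrates only that loop, with a stylized revenue function standing in for the steady-state solver of Laffer.g/Laffer.m; it is not the book's program.

```python
import numpy as np

def steady_state_revenue(tau_l, frisch=0.5):
    """Stylized stand-in for steady-state tax revenues at labor tax rate tau_l.
    In practice, replace this with a call to the model's steady-state solver."""
    labor = (1.0 - tau_l) ** frisch       # toy labor supply response to the tax rate
    return tau_l * labor                  # revenue = tax rate x (stylized) tax base

tau_grid = np.linspace(0.0, 0.99, 100)
revenues = np.array([steady_state_revenue(t) for t in tau_grid])
tau_star = tau_grid[np.argmax(revenues)]  # revenue-maximizing labor tax rate
print(f"peak of the stylized Laffer curve at tau_l = {tau_star:.2f}")
```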
Problem 10.4: Money-Age Distribution

In this problem, you will compute the money-age distribution and compare the model's cross-sectional correlation of money with income and wealth to its empirical counterparts. The model follows Heer et al. (2011).

Consider an economy with overlapping generations. The total measure of all households is equal to one. Households live a maximum of T = 60 years and face a stochastic survival probability φ^s from age s to s + 1 with φ^0 = 1. For the first 40 years, they are working, supplying one unit of labor inelastically, and for the remaining years, they are retired. Households differ with respect to age s and productivity type j = 1, ..., 5. Each productivity class j is of equal measure. Households hold two assets, real money m_t = M_t/P_t and capital k_t. The household with productivity type j maximizes lifetime utility:

$$U_t = \sum_{s=1}^{T}\beta^{s-1}\left(\prod_{i=0}^{s-1}\phi^i\right) u(c_{j,s}, m_{j,s}),$$

where β and c_{j,s} denote the discount factor and real consumption of the s-year-old with productivity type j, respectively. A role for money is introduced by assuming money in the utility function:

$$u(c, m) = \frac{\left(c^{\gamma} m^{1-\gamma}\right)^{1-\eta}}{1-\eta}.$$

Individual productivity e(s, j) = ȳ^s ε^j depends on household age, s ∈ {1, 2, ..., 40}, and efficiency type, ε^j ∈ {0.5, 0.75, 1, 1.25, 1.5}. The s-year-old agent receives income from capital k_{j,s} and labor e(s, j)w in each period s of his life. After retirement, agents do not work, e(s, j) = 0 for s ≥ 41. The budget constraint of the s-year-old household with productivity type j is given by:58

$$(1-\tau^k) r k_{j,s} + (1-\tau^l-\tau^p)\, w\, e(s,j)\,\bar{l} + pen(s,j) + tr + k_{j,s} + m_{j,s} = c_{j,s} + k_{j,s+1} + m_{j,s+1}(1+\pi) - Seign,$$

where Seign and π = P_t/P_{t−1} − 1 denote seigniorage and the inflation rate between two successive periods t−1 and t, respectively. Note that in the stationary equilibrium, π is a constant and equals the money growth rate. Real interest income is taxed at the rate τ^k. In addition, the households receive transfers tr from the government. l̄ = 1 denotes the exogenous working hours in the economy. Social security benefits pen(s, j) depend on the agent's age s and on his productivity type j as follows:

$$pen(s,j) = \begin{cases} 0, & \text{for } s < 41,\\ \theta\, (1-\tau^l-\tau^p)\, w\, \varepsilon^j \bar{l}, & \text{for } s \ge 41.\end{cases}$$

The average net replacement rate of pensions, repl^net, amounts to 30%.

Output is produced with capital K and effective labor L. Effective labor L is paid the wage w. Production Y is characterized by constant returns to scale and assumed to be Cobb-Douglas: Y = K^α L^{1−α}, with α = 0.35. In a factor market equilibrium, factors are rewarded with their marginal products:

$$w = (1-\alpha) K^{\alpha} L^{-\alpha}, \qquad r = \alpha K^{\alpha-1} L^{1-\alpha} - \delta.$$

Capital depreciates at the rate δ = 0.08.

The government consists of the fiscal and monetary authority. Nominal money grows at the exogenous rate µ:

$$\frac{M_{t+1}-M_t}{M_t} = \mu.$$

Seigniorage Seign = M_{t+1} − M_t is transferred lump-sum. The government uses revenues from taxing income and aggregate accidental bequests Beq to finance its expenditures on government consumption G, government transfers tr, and transfers to the one-year-old households m̃. We assume that the first-period money balances m̃ are financed by the government:

$$G + tr + \tilde{m} = \tau^k r K + \tau^l w L + Beq.$$

Transfers tr are distributed lump-sum to all households. Furthermore, the government provides social security benefits Pen that are financed by taxes on labor income: Pen = τ^p w L.

In a stationary equilibrium, the aggregate variables are equal to the sum of the individual variables, households maximize intertemporal utility, firms maximize profits, the factor and goods markets clear, and the government and social security budgets are balanced.

To compute the model, use the efficiency-age profile {ȳ^s}_{s=1}^{40} and the survival probabilities {φ^s}_{s=1}^{60} that you find in the programs Laffer.g or Laffer.m. The remaining parameters are calibrated as follows: The money endowment of the newborn generation is equal to 20% of the average money holdings in the economy. Furthermore, we set β = 1.01, η = 2.0, µ = 0.04, (τ^k, τ^l) = (0.429, 0.248), and G/Y = 0.19. Calibrate γ such that the velocity of money, PY/M, is equal to 6.0.

1) Compute the steady state. Use direct computation. First, compute the model without money or heterogeneous productivity. Then, introduce different productivity types before you also consider money. Graph the wealth-age and money-age profiles.
2) Compute the Gini coefficients of the wage, income, and capital distributions. Do they match the empirical numbers that you encountered in Chapter 8?
3) Compute the cross-sectional correlation of money with total income and capital. Compare your findings to the empirical values of 0.22 and 0.25 during the period 1994-2001 (see Heer et al. (2011)). Can you think of any reasons why the correlations are higher in our model? How should we improve upon the modeling of money demand in OLG models?

58 At the end of the final period, k_{j,T+1} = m_{j,T+1} = 0.
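The preference and pension specifications of Problem 10.4 translate directly into code. The following Python fragment is only a sketch of the two functional forms stated above; parameter values are left to the reader's calibration.

```python
def utility(c, m, gamma, eta):
    """Money-in-the-utility function: u(c, m) = (c^gamma * m^(1-gamma))^(1-eta) / (1-eta)."""
    return (c**gamma * m**(1.0 - gamma))**(1.0 - eta) / (1.0 - eta)

def pension(s, eps_j, theta, tau_l, tau_p, w, lbar=1.0):
    """Pension of an s-year-old of efficiency type eps_j: zero while working,
    proportional to net type-specific earnings in retirement (s >= 41)."""
    if s < 41:
        return 0.0
    return theta * (1.0 - tau_l - tau_p) * w * eps_j * lbar

# illustrative call with placeholder parameters
print(utility(c=0.5, m=0.1, gamma=0.9, eta=2.0), pension(45, 1.25, 0.3, 0.248, 0.1, 1.2))
```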
Problem 10.5: Recompute the 6-Period OLG Model from Example 10.4.1

1) Use value function iteration with cubic spline interpolation to compute the policy functions of the households.
2) Assume instead that households know the change in policy in period t 6 periods in advance. Compare the transition dynamics with the case in which the change in policy is unexpected.
Problem 10.6: Demographic Transition

Recompute the model of the demographic transition in Section 10.5 for the following three cases:
1) Assume that the government holds the social security contribution rate τ^p constant and that the net replacement rate of pensions adjusts to keep the government budget balanced.
2) The change in the population growth rate in period t = 0 is expected.
3) The government announces in period t = 20 that it increases the retirement age from T^W + 1 = 41 to T^W + 1 = 46 for all those agents that are born after t = 20. Again, distinguish between the two cases in which 1) the replacement rate and 2) the contribution rate remain constant during the transition.
How do you need to adjust the program Demo_trans.g? What are the effects on the transition paths of the factor prices?
Problem 10.7: Gauss-Seidel Algorithm

In their original work, Auerbach and Kotlikoff (1987) compute the transition with the help of the Gauss-Seidel algorithm. In this problem, we ask you to compute the model in Section 10.5 with the help of the Gauss-Seidel algorithm. Assume that the economy has reached the final steady state in period t = 200.
1) Compute the initial and the final steady states.
2) As an initial guess for the transition, specify a linear law of motion for $\{(\tilde{K}^0_t, \tilde{L}^0_t, \tilde{tr}^0_t, \widetilde{pen}^{1,0}_t, \widetilde{pen}^{2,0}_t)\}_{t=0}^{199}$.
3) Compute $(\tilde{K}^1_{199}, \tilde{L}^1_{199}, \tilde{tr}^1_{199}, \widetilde{pen}^{1,1}_{199}, \widetilde{pen}^{2,1}_{199})$ given that all other aggregate variables are equal to the initial values $\{(\tilde{K}^0_t, \tilde{L}^0_t, \tilde{tr}^0_t, \widetilde{pen}^{1,0}_t, \widetilde{pen}^{2,0}_t)\}_{t=0}^{198}$. Therefore, you have to write a routine that computes the savings and labor supply of all households that are alive in period t = 199 given the factor prices and transfers over their lifetime. As an input, you need to provide your initial value of $\{(\tilde{K}^0_t, \tilde{L}^0_t, \tilde{tr}^0_t, \widetilde{pen}^{1,0}_t, \widetilde{pen}^{2,0}_t)\}_{t=0}^{199}$.
4) Compute $(\tilde{K}^1_{198}, \tilde{L}^1_{198}, \tilde{tr}^1_{198}, \widetilde{pen}^{1,1}_{198}, \widetilde{pen}^{2,1}_{198})$ given the sequence $\{(\tilde{K}^0_t, \tilde{L}^0_t, \tilde{tr}^0_t, \widetilde{pen}^{1,0}_t, \widetilde{pen}^{2,0}_t)\}_{t=0}^{197}$ and $(\tilde{K}^1_{199}, \tilde{L}^1_{199}, \tilde{tr}^1_{199}, \widetilde{pen}^{1,1}_{199}, \widetilde{pen}^{2,1}_{199})$ in the same way as above.
5) Continue to compute $(\tilde{K}^1_t, \tilde{L}^1_t, \tilde{tr}^1_t, \widetilde{pen}^{1,1}_t, \widetilde{pen}^{2,1}_t)$, t = 197, ..., 1, and return to step 3) until convergence.
Compare the computational time with that of our Broyden algorithm.
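Problem 10.7 asks for a Gauss-Seidel sweep over the transition path. The following Python fragment is only a structural sketch under simplifying assumptions: it tracks a single aggregate (capital), and `implied_capital` is a hypothetical stand-in for the household block you are asked to write, not code from the book.

```python
import numpy as np

T = 200                      # transition length; the final steady state is reached in t = 200
lam = 0.5                    # damping weight on the update

# Initial guess: linear path between (illustrative) initial and final steady-state values.
K = np.linspace(6.7, 7.1, T)

def implied_capital(t, K_path):
    """Hypothetical household block: given the current guess of the aggregate path,
    return the capital implied by the savings of all cohorts alive in period t.
    Here it simply averages neighboring periods, purely for illustration."""
    lo, hi = max(t - 1, 0), min(t + 1, T - 1)
    return 0.5 * (K_path[lo] + K_path[hi])

for iteration in range(100):
    K_old = K.copy()
    # Gauss-Seidel sweep: update period by period from the end of the transition backward,
    # immediately reusing entries of K that were already updated in this sweep.
    for t in range(T - 2, 0, -1):
        K[t] = lam * implied_capital(t, K) + (1.0 - lam) * K[t]
    if np.max(np.abs(K - K_old)) < 1e-8:
        break
```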
Problem 10.8: Secular Decline in Labor Supply

Boppart and Krusell (2020) present evidence that hours worked have fallen steadily over the last 150 years in the US and approximately halved during this period. They formulate a utility function that is able to capture this long-run trend on a balanced growth path. As a special case of (1.41), consider the following utility function:

$$u(c, l) = \ln c + \xi \ln\left(1 - l\, c^{\frac{\psi}{1-\psi}}\right).$$

In this utility function with ψ > 0, the income effect (of higher wages) slightly outweighs the substitution effect on labor supply. For ψ = 0.20 and a growth rate of total factor productivity equal to 2%, consumption will grow by 1.6%, while labor will fall by 0.4% annually on a balanced growth path, in accordance with time series evidence from the US economy.

Recompute the model of the demographic transition in Section 10.5.1. Calibrate the parameter ξ so that average hours in the initial steady state in 2014 amount to 0.30. (What is the value of the Frisch labor supply elasticity?) Compare your results for the year 2050 with those from the benchmark model with the standard Cobb-Douglas utility function (10.44).
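The growth rates quoted in the problem follow directly from the balanced-growth restrictions implied by this utility function; as a quick check of the arithmetic for ψ = 0.20 and a productivity growth rate g_A = 2%:

$$g_c = (1-\psi)\, g_A = 0.8 \times 2\% = 1.6\%, \qquad g_l = -\psi\, g_A = -0.2 \times 2\% = -0.4\%.$$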
Chapter 11
OLG Models with Uncertainty
11.1 Introduction

In this chapter, we introduce both idiosyncratic and aggregate uncertainty into the overlapping generations (OLG) model. The methods that we will apply for the computation of these models are already familiar to you from previous chapters on the (stochastic) neoclassical growth model and will only be modified to allow for the more complex age structure of OLG models. In particular, we apply the perturbation methods from Chapter 3, the algorithm for the computation of the stationary distribution from Chapter 8, and the algorithm by Krusell and Smith (1998) from Chapter 9 for the solution of the nonstochastic steady state and the business cycle dynamics of the OLG model.

In the following, we first introduce individual stochastic productivity in the standard OLG model and, subsequently, aggregate stochastic productivity. In the first section, agents have different productivity types. Different from the traditional Auerbach-Kotlikoff models, agents are subject to idiosyncratic shocks and may change their productivity types at random. As a consequence, the direct computation of policies (i.e., solving the first-order and equilibrium conditions with the help of the Newton-Raphson algorithm or similar methods for all possible individual states) is no longer feasible, and we will resort to value function iteration.1 As an inter-
In fact, the choice of the way to compute individual policy functions is more involved. For the problem studied in Section 11.2.3, we recommend to use value function iteration as the first-order condition of the individual optimization problem contains the derivative of the value function. For the problem analysed in Section 11.3.4, we use a nonlinear equations solver to derive optimal policy functions over the discretized state space because of its higher speed.
esting application, we attempt to explain the empirically observed income and wealth heterogeneity. In the second part of this chapter, we introduce aggregate uncertainty and study the business cycle dynamics of the OLG model.
11.2 Overlapping Generations Models with Individual Uncertainty

One of the main aims of the heterogeneous-agent literature in the 1990s was to explain the high concentration of wealth. In the US economy, the distribution of wealth is characterized by a Gini coefficient equal to 0.78 according to estimates by Díaz-Giménez et al. (1997) and Budría Rodríguez et al. (2002). In the period 1990-2010, wealth inequality increased. For example, Quadrini and Ríos-Rull (2015) report a rise in the Gini coefficient of wealth from 0.80 to 0.85 during the period 1998-2010; during the Great Recession between 2006 and 2007, however, the top wealth quintile experienced the largest fall in net worth among all quintiles, as documented by Krueger et al. (2016). In particular, net worth of the top quintile fell by an annual rate of 4.9% during 2006-2010, while it actually increased for the second through fourth quintiles.

One main explanatory factor of the inequality in the wealth distribution, of course, is the unequal distribution of earnings. However, when we added heterogeneous productivity into the Ramsey model in Section 8.6, the model failed to replicate the observed wealth concentration. In the present chapter, we add another important determinant of the wealth distribution in addition to heterogeneous individual productivity: life-cycle savings. Agents accumulate wealth to finance consumption in old age. For this reason, we will consider an overlapping generations model in the following.2

Our OLG model for the study of the wealth distribution builds upon the model presented in Chapter 10.5. It is characterized by the following features:
As an alternative way to model life-cycle savings, Castañeda et al. (2003) consider the standard Ramsey model with heterogeneous productivity. In addition, they assume that agents retire and die with certain probabilities. In the former case, agents receive pensions that are lower than labor income. Krueger et al. (2016) also introduce a constant probability of retiring into the model of Krusell and Smith (1998) (using their model variant with preference heterogeneity) to study the impact of the Great Recession on the US wealth distribution and consumption expenditures.
11.2 Overlapping Generations Models with Individual Uncertainty
1) life-cycle savings,
2) uncertain earnings,
3) uncertain lifetime,
4) pay-as-you-go pensions, and
5) endogenous labor supply.
Uncertain earnings also generate additional wealth heterogeneity because income-rich agents increase their precautionary savings to insure against the misfortune of a fall in individual earnings. As a consequence, the discount rate, β^(−1) − 1, increases relative to the real interest rate r.3 Therefore, if the lifetime is certain, consumption increases over the lifetime, even into the final years of life. Empirically, however, the consumption-age profile is hump-shaped in the US. For this (and other) reasons, we also introduce stochastic survival to improve the model's quality.4 If agents have lower survival probabilities in old age, consumption is hump-shaped again because utility in future periods of life is discounted at a higher (effective) rate. Uncertain lifetime also helps us to explain the empirical distribution of wealth because households have an additional motive to accumulate precautionary savings to insure against the risk of longevity. In the presence of public pensions, however, the incentive to accumulate savings is particularly reduced among low-income households as their earnings are closer to the public pensions than in the case of the income-rich households. However, the quantitative effect of the public pay-as-you-go pension system on savings and, hence, wealth inequality depends sensitively on the progressivity of the pension schedule with respect to the individuals' accumulated pension contributions. Finally, endogenous labor supply helps to explain why wage inequality is smaller than earnings inequality in the US economy.

In the following OLG model, we first analyze the case with lump-sum pensions such that the dimension of the individual state space in continuous variables only amounts to one. We will compute the steady state for this model. The computational time of this problem is much higher than that in Chapter 10.5.1 due to the presence of idiosyncratic uncertainty and the use
Wickens (2011) analyzes the two-period OLG model and shows that capital accumulation can be either higher or lower than in the Ramsey model, meaning that the discount rate, 1/β − 1, may also be lower or higher than the real interest rate r in the OLG model. In the model of secular stagnation in Eggertsson et al. (2019), for example, the discount rate, 1/β − 1, amounts to 2%, while the real interest rate ranges between -1.5% and -2.0%. 4 Uncertain lifetime was already introduced in the model of the demographic transition in Section 10.5.1.
of value function iteration. In the second part, we introduce contribution-dependent pensions that will effectively expand the individual state space by one dimension. For these two economies, we will compute the stationary equilibrium only and compare the computational speed of the different programming languages Python, GAUSS, and Julia.
11.2.1 The Model

The model builds upon that described in Section 10.3 and extends it to idiosyncratic income risk. For your convenience, we present the full model in the following.

DEMOGRAPHICS. A period, t, corresponds to one year. At each period t, a new generation of households is born. Newborns have a real-life age of 20 denoted by s = 1. All generations retire at the end of age s = T^W = 45 (corresponding to a real-life age of 64) and live up to a maximum age of s = T = 70 (real-life age 89). The number of periods during retirement is equal to T^R = T − T^W = 25. Let N_t^s denote the number of agents of age s at t. We denote the total population at t by N_t. At t, all agents of age s survive until age s + 1 with probability φ_t^s, where φ_t^0 = 1 and φ_t^T = 0. In period t, the newborn cohort grows at rate n_t as described in Section 10.3:

$$N^1_{t+1} = (1 + n_t) N^1_t. \qquad (11.1)$$
HOUSEHOLDS. Each household comprises one (possibly retired) worker. Households maximize expected intertemporal utility at the beginning of age 1 in period t:5

$$\max\; E_t \sum_{s=1}^{T}\beta^{s-1}\left(\prod_{j=1}^{s}\phi^{j-1}_{t+j-2}\right)\left[u\!\left(c^s_{t+s-1}, 1-l^s_{t+s-1}\right) + \omega\!\left(g_{t+s-1}\right)\right], \qquad (11.2)$$
where β > 0 denotes the discount factor. Instantaneous utility u(c, 1 − l) is specified as a function of consumption c and leisure 1 − l: 5
In Appendix A.11, we consider the more general optimization problem of the s-year old household in period t. The focusing on the 1-year-old household allows for the use of a simpler notation.
11.2 Overlapping Generations Models with Individual Uncertainty
$$u(c, 1-l) = \frac{\left(c^{\gamma}(1-l)^{1-\gamma}\right)^{1-\eta}}{1-\eta}, \qquad (11.3)$$
where η denotes the coefficient of relative risk aversion6 and γ is the share of consumption in utility. During working life, s = 1, . . . , T W , the labor supply of the s-year-old household amounts to 0 ≤ l s ≤ l max , where we impose a constraint on the minimum and maximum working hours, 0 and l max , respectively. During retirement, l ts = 0 for s = T W + 1, . . . , T . Utility from government consumption, ω(g t ), is additive, so government consumption per capita, g t , does not have any direct effect on household behavior (only indirectly through its effects on transfers and taxes). Gross labor income of the s-year-old worker in period t, ε(s, e, θ )A t w t l ts , consists of the product of his idiosyncratic productivity ε(s, e, θ ), aggregate productivity A t , the wage per efficiency unit w t , and his working time l ts . Aggregate productivity A t grows at rate gA. The household’s idiosyncratic labor productivity ε(s, e, θ ) is stochastic and also depends on his age s according to ε(s, e, θ ) = eθ ¯y s . The shock θ follows a Markov process and takes only a finite number nθ of possible values in the set Θ = {θ1 = θ , . . . , θnθ = θ } with θi < θi+1 for i = 1, . . . , nθ − 1. Let Prob(θ 0 |θ ) denote the transition probability of the household with productivity θ in this period to become a household with productivity θ 0 in the next period for households aged s = 1, . . . , T W − 1. In old age, we assume without loss of generality that productivity remains constant. Further note that we assume that the transition matrix is independent of age s < T W and time-invariant.7 The shocks θ are independent across agents, and the law of large numbers holds (there is an infinite number of agents) so that there is no aggregate uncertainty. The labor productivity process is calibrated in detail below. In addition, the individual’s productivity depends on his permanent productivity type e ∈ {e1 , e2 }, which is chosen to reflect the difference in earnings that stems from different levels of education (high school/college) and an age component ¯y s , which is hump-shaped over the life-cycle. All households receive transfers t r t from the government. The worker p pays labor income taxes τlt and social security contribution τ t proportional 6
Remember from Section 1.5.3 that the intertemporal elasticity of substitution of consumption amounts to 1/(1 − γ(1 − η)) for this type of utility function. 7 Of course, this is a simplification. For example, Peterman and Sommer (2019) find that the transition probabilities of young workers out of employment (into unemployment) is higher than that of old workers in the US economy.
632
11 OLG Models with Uncertainty
to his labor income, ε(s, e, θ )A t w t l ts . The average accumulated earnings of the s+1-year old household at the beginning of period t +1 are summarized by the accounting variable at age s + 1, x s+1 t+1 as follows: x s+1 t+1
(s−1)x s +ε(s,e,θ )A t
=
x st (1 +
s
gA),
s t wt lt
, s = 1, . . . , T w s = T w + 1, . . . , T − 1,
(11.4)
with initial accumulated earnings equal to zero at the beginning of life, x 1t = 0. Note that workers do not accrue interest on their social security payments but that accumulated contributions in old age grow at the rate of aggregate productivity gA.8 In old age, the retired worker receives a pension. We distinguish two cases: 1) In the first case, pensions pen(x st ) are independent of the individual’s contribution (or rather earnings) x st , pen(x st ) ≡ pen t . 2) In the second case, pensions pen(x st ) depend on the average contributions over the life-cycle, and the variable x st enters the individual state vector as a second continuous state variable. In the United States, the pension system is (imperfectly) indexed to contributions but redistributes from those with high contributions to those with low contributions. Such a pension system is called a ‘progressive’ system and will be specified in greater detail in the next section. Pensions are assumed to not be subject to income taxes. Accordingly, net non-capital income y ts is represented by y ts =
p (1 − τlt − τ t )ε(s, e, θ )A t w t l ts , pen(x s ), t
by
s = 1, . . . , T W , s = T W + 1, . . . , T.
(11.5)
The budget constraint of the household at age s = 1, . . . , T is then given
s+1 s k s s b s (1+τct )c ts + ks+1 t+1 + b t+1 = y t +(1−τ t )(r t −δ)k t + k t +R t b t + t r t , (11.6)
where kst and bst denote the capital stock and government bonds of the syear-old agent at the beginning of period t. The household is born without assets and leaves no bequests at the end of its life, implying k1t = k tT +1 = 0 and b1t = b tT +1 = 0. The agent receives interest income r t and r tb = R bt − 1 on capital and government bonds and pays income taxes on labor and As a consequence, stationary contributions x˜ ts = x t /A t and, hence, pensions during old age, s = T W + 1, . . . , T , will be constant in steady state and do not decline during retirement. 8
11.2 Overlapping Generations Models with Individual Uncertainty
633
capital income at the rates of τlt and τkt , respectively. Capital depreciation δkst is tax exempt. Consumption is taxed at the rate τct . In addition, we impose a non-negativity constraint on assets, bst ≥ 0 and kst ≥ 0. If the household deceases, accidental bequests are collected by the government. TECHNOLOGY. Output is produced with the help of capital K t and effective labor L t according to the standard Cobb-Douglas function: Yt = K tα (A t L t )1−α .
(11.7)
Firms are competitive and maximize profits Π t = Yt − r t K t − w t A t L t such that factor prices are given by: w t = (1 − α)K tα (A t L t )−α , r t = αK tα−1 (A t L t )1−α .
(11.8a) (11.8b)
GOVERNMENT AND SOCIAL SECURITY. The government levies income taxes τlt and τkt on labor and capital income and taxes τct on consumption in period t. In addition, the government confiscates all accidental bequests Beq t . It pays aggregate transfers Tr t , provides a certain level G t of total public expenditures, and pays interest r tb on the accumulated public debt B t . In each period, the government budget is financed by issuing government debt: Tr t + G t + r tb B t − Ta x t − Beq t = B t+1 − B t ,
(11.9)
where taxes Ta x t are given by Ta x t = τlt A t L t w t + τkt (r t − δ)K t + τct C t ,
(11.10)
and C t denotes aggregate consumption. The government provides pay-as-you-go pensions to retirees that it finances with the contributions of workers. Let Pen t denote aggregate pension payments. The social security budget is assumed to balance: p
Pen t = τ t A t L t w t .
(11.11)
634
11 OLG Models with Uncertainty
HOUSEHOLDS’ OPTIMIZATION PROBLEM. In the following, we formulate the household optimization problem in recursive form. Therefore, we first make an assumption with respect to the portfolio choice of individuals. In equilibrium, as argued in Section 10.5, the allocation to the two assets is indeterminate because we have many agents, and the after-tax returns on both assets, K and B, are equal. Therefore, we make the innocuous assumption that all households hold both assets in the same proportion, which is equal to K/(K +B) and B/(K +B). In addition, we define household assets ast as the sum of the two individual assets, kst + bst . Let the state vectors be given by zst = (s, e, θ t , ast ) in case 1 and zst = (s, e, θ t , ast , x st ) in case 2. vt (zst ) denotes the value function of the s-year old household in period t.9 The optimization problem of the household is given by the Bellman equation s s s s+1 vt (zst ) = max u(c , 1 − l ) + ω(g ) + βφ E v (z ) , (11.12) t t t+1 t t t t+1 s s c t ,l t
subject to (11.4)-(11.6) and ast ≥ 0, and with the terminal condition vt z Tt +1 = 0. STATIONARY EQUILIBRIUM. In the following, we consider the equilibrium for case 1 in which pensions are lump-sum, and the individual state vector zst does not include average accumulated contributions x st . To express the equilibrium in terms of stationary variables, we have to divide individual variables (with the exception of individual labor supply l ts ) by aggregate productivity A t and aggregate variables (with the exception of effective labor L t ) by the product of aggregate productivity A t and the measure of the total population Nt . Therefore, we define the following stationary individual variables ˜c ts , ˜y ts , ˜kst , ˜bst , a˜st , and x˜ st : ˜c ts :=
c ts At
, ˜y ts :=
y ts At
, ˜kst :=
kst At
, ˜bst :=
bst At
, a˜st :=
ast At
, x˜ st :=
x st At
,
and stationary aggregate variables X˜ t =
Xt for X ∈ {Pen, Tr, G, B, Beq, Ta x, Y, K, C, Ω}. A t Nt
Stationary aggregate labor is defined as ˜L t = L t /Nt . Moreover, individual and aggregate government transfers are identical given that transfers are distributed lump-sum and in equal amount to all households: 9
Note that outside the steady state, the value function is not independent of the time period t.
11.2 Overlapping Generations Models with Individual Uncertainty
635
fr t = ter t . T The Bellman equation can be rewritten in stationary form using the stationary state vector ˜zst = (s, e, θ t , a˜st ):10 ( ) s γ s 1−γ 1−η (˜ c ) (1 − l ) t t ˜ s E t v˜t+1 ˜zs+1 , v˜t ˜zst = max + βφ t t+1 ˜c ts ,l ts 1−η (11.13)
subject to: 0 = ˜y ts + [1 + (1 − τkt )(r t − δ)]˜ ast + Ý t r t − (1 + τc )˜c ts
(11.14)
gA)˜ as+1 t+1 ,
− (1 + ¨ p (1 − τlt − τ t ) ¯y s eθ t w t l ts , s = 1, . . . , T W , s ˜y t = pg en t , s = T W + 1, . . . , T,
a˜st ≥ 0,
with the terminal condition v˜t (˜z Tt +1 ) = 0 and β˜ = (1 + g a )γ(1−η) β.11 To define the measures of households, we proceed as follows. Note that with respect to age s, efficiency type e, and productivity type θ we can divide the total population into T × 2 × nθ discrete subgroups. For each subgroup, identified by the tripple (s, e, θ ), let f (s, e, θ , ·) : [amin , ∞] → R+ so that Z a˜ F (s, e, θ , a˜) :=
amin
f (s, e, θ , a) d a
and F (., ., ., a˜) =
θnθ Z e2 X T X X s=1 e=e1 θ =θ1
a˜ amin
f (s, e, θ , a) d a.
10
See Appendix A.11 for the derivation of the stationary dynamic program of the household. In addition, vt zst s . v˜t ˜z t := (A t )γ(1−η) As we consider the case 1, pg en(˜ x ts ) = pg en t = pen t /A t are constant and equal for all retirees; in particular, they do not depend on accumulated earnings x˜ ts . Appendix A.11 derives the more general dynamic programming problem for the case 2.
11
636
11 OLG Models with Uncertainty
Accordingly, F (s, e, θ , a˜) is the mass of households in group (s, e, θ ) with assets a ∈ [amin , a˜] and F (·, ·, ·, a˜) the mass of the population with assets a ∈ [amin , a˜].12 For the computation of the distribution function F (·) we discretize the asset space. We denote the asset grid with na elements by A = {˜ a1 , a˜2 , . . . , a˜na } and the density function on this grid by f (s, e, θ , a˜).13 DEFINITION. In a stationary equilibrium with constant survival probabilities φ s , a constant population growth rate, g N , and constant public ˜ T fr, Pg policy (τc , τk , τl , τ p , G, en), prices and the distribution of the individual state variables f (s, e, θ , a˜s ) are constant and satisfy the following conditions:14 1. Total population Nt is equal to the sum of all cohorts: Nt =
T X s=1
Nts
with associated constant shares of the s-year-old cohorts µs = µst =
Nts Nt
.
(11.15)
2. Population Nt and the youngest cohort Nt1 grow at the same rates g N ,t =
Nt+1 Nt
− 1 and n t =
1 Nt+1
Nt1
− 1, respectively, implying:
Nt+1 − Nt = n. Nt 3. Households maximize their lifetime utility as described by the solution to the Bellman equation (11.13), implying the optimal policy functions 12
Note the slight abuse of notation: We introduced a˜ as scaled assets. To circumvent a further symbol for the upper bound of the interval, say, a¯˜, we temporarily dropped the tilde. 13 In the computation of the stationary equilibrium, we use a finer grid on the asset space for the discretization of the distribution function than that for the policy and value function. For example, this numerical device allows us to assess the accuracy of the policy function off grid points when we weight the Euler residuals by the measure of the households. We already described the choice of finer asset grids over the distribution than for the policy function (which follows Ríos-Rull (1999)) in Chapter 8. 14 In the stationary equilibrium, we can drop the time indices. Next-period variables are denoted with a prime, e.g. a˜0 denotes next-period wealth.
11.2 Overlapping Generations Models with Individual Uncertainty
637
a˜0 (˜zs ), ˜c (˜zs ), and l(˜zs ) for next-period wealth, consumption, and labor supply. 4. Firms maximize profits, implying the factor prices w and r: ˜ α ˜L −α , w = (1 − α)K ˜ α−1 ˜ 1−α
r = αK
L
(11.16a)
.
(11.16b)
5. Aggregate effective labor supply is equal to the sum of the individual effective labor supplies: ˜L =
na nθ X Tw X 2 X X s=1 j=1 iθ =1 ia =1
ε(s, e j , θiθ ) l(s, e j , θiθ , a˜ia ) f (s, e j , θiθ , a˜ia ). (11.17)
˜ is equal to the sum of the individual wealth levels: 6. Aggregate wealth Ω ˜= Ω
na nθ X T X 2 X X s=1 j=1 iθ =1 ia =1
a˜ia f (s, e j , θiθ , a˜ia ).
(11.18)
˜ and B ˜ are equal (no-arbitrage 7. The after-tax returns on the two assets K condition): r b = (1 − τkr )(r − δ).
(11.19)
˜+K ˜. ˜ =B Ω
(11.20)
8. In capital market equilibrium,
9. At the end of period t, the government collects accidental bequests from the s-year old households that do not survive from period t until period t + 1:15 na nθ X T X 2 X X g B eq = (1 − φ s )˜ a0 (s, e j , θiθ , a˜ia ) f (s, e j , θiθ , a˜ia ). (11.21) (1 + gA) s=2 j=1 i =1 i =1 θ
a
10. The goods markets clear: ˜ + (1 + gA)(1 + n)K ˜ 0 − (1 − δ)K ˜, Y˜ = C˜ + G
(11.22)
where aggregate consumption C˜ is the sum of individual consumption values: C˜ = 15
na nθ X T X 2 X X s=1 j=1 iθ =1 ia =1
˜c (s, e j , θiθ , a˜ia ) f (s, e j , θiθ , a˜ia ).
For the derivation of accidental bequests, compare Appendix A.10.
(11.23)
638
11 OLG Models with Uncertainty
11. The density function f (s, e, θ , a˜s ) (and the associated distribution function F (s, e, θ , a˜s )) of the per-capita variables (detrended by aggregate productivity A t ) are constant, f (·) = f 0 (·) (F (·) = F 0 (·)). The dynamics of the distribution function F (s, θ , e, a˜) evolves according to F 0 (s + 1, e, θ 0 , a˜s+1 ) =
X θ
Prob(θ 0 |θ )
φs F (s, e, θ , a˜s ), 1+n
s =1, . . . , T − 1,
where, on the right-hand side of the equation, we sum over all the productivity types θ in period t and wealth a˜s of the s-year old is given by16 a˜s = (˜ a0 )−1 (s, e, θ , a˜s+1 ). The distribution of the state variables (s, e, θ , a˜s ) among the newborn cohort is constant and is represented by: µ1 × ν(θ ) × π(e) if a˜1 ≥ 0, 1 F (1, e, θ , a˜ ) = 0 else, where ν(θ ) and π(e) denote the shares of the θ and e productivity types in the cohorts (assumed to be constant over age s).17
CALIBRATION. Our model is calibrated for the US economy for the year 2015. We assume that the US economy is in the steady state in this period.18 In the definition of (˜ a0 )−1 (·), we refer to the inverse function of a˜0 (s, e, θ , a˜s ) with respect s to the argument a˜ . For a˜0 = 0, the inverse function may not be well defined because there may be an interval [0, a˜s,0 ] on which the optimal next-period wealth is equal to a˜0 = 0 due to the credit constraint. In this case, we choose the upper boundary a˜s,0 as the value of the function (˜ a0 )−1 (with a slight abuse of notation). Note that a˜0 (s, e, θ , a˜s ) is a strictly monotone function of a˜s for values of a˜0 > 0. 17 In our definition of the stationary distribution dynamics, we choose to use 1) the continuous rather than the discretized version of F (·) and 2) apply the concept of the distribution rather than the density function. 1) Using the continuous rather than the discretized state variables helps us to avoid the problem in the formulation of the dynamics if the optimal next-period asset level a˜0 (s, e, θ , a˜s ) at a particular grid point a˜s = aia is not a grid point. We will discuss the numerical implementation of the dynamics in the next section. 2) By using F (·) rather than f (·), we can avoid the problem that, due to the credit constraint a˜s ≥ 0, the measure of the households with a˜s = 0 is non-zero. 18 Of course, this is not an innocuous assumption given that the Great Recession just took place during 2007-08. 16
11.2 Overlapping Generations Models with Individual Uncertainty
639
Periods correspond to years. Agents are born at real lifetime age 20, which corresponds to age s = 1. As stated above, they work T w = 45 years, corresponding to a real lifetime age of 64, and live a maximum life of 70 years (T R = 25), meaning that agents do not become older than real lifetime age 89. Our survival probabilities φ ts and population growth rates n t are taken from the UN (2015), which provides 5-year forecasts until the year 2100. We interpolate population data using cubic splines and assume that survival probabilities and the population growth rate are constant after 2100. In 2015, the population growth rate n amounted to 0.754%. For the discount factor, we choose the parameter value β = 1.011 in accordance with the empirical estimates of Hurd (1989), who explicitly accounts for mortality risk. In addition, we choose the coefficient of relative risk aversion η = 2.0. The preference parameter γ = 0.33 is calibrated such that the average labor supply ¯l is approximately 0.30.19 The model parameters are presented in Table 11.1. Table 11.1 Calibration of OLG Model with Idiosyncratic Uncertainty
19
Parameter
Value
Description
α δ gA η γ
0.35 8.3% 2.0% 2.0 0.33
β n τl + τ p τk τc G/Y
1.011 0.754% 28% 36% 5% 18%
B/Y r epl (e1 , e2 )
63% 35.2% (0.57, 1.43)
production elasticity of capital depreciation rate of capital growth rate of output coefficient of relative risk aversion preference parameter for utility weight of consumption discount factor population growth rate tax on labor income tax on capital income tax on consumption share of government spending in steadystate production debt-output ratio gross pension replacement rate permanent productivity types
We found this value by trial and error. As an initial guess, we took γ = 0.338 from p. 28 of Heer (2019) who considers the corresponding neoclassical growth model.
640
11 OLG Models with Uncertainty
As in Section 10.3, the tax parameters are set to τl + τ p = 28%, τk = 36%, and τc = 5%, and the government consumption share in GDP is set equal to G/Y = 18% following Trabandt and Uhlig (2011). The debtoutput level amounts to B/Y = 63%.20 The gross replacement rate of pensions with respect to wage income (of the household with unitary individual efficiency and average labor supply ¯l), r epl = pen/(w¯l) = 35.2%, for 2014 is taken from the OECD (2015) (series: gross pension replacement rates for men, % of pre-retirement earnings). In case 1 with lump-sum pensions, the endogenous social security rate τ p amounts to 7.58%. In addition, we find a government-transfer-to-GDP ratio equal to Tr/Y = 4.93%. Finally, we calibrate the labor efficiency of the s-year old household, ε(s, e, θ ) = θ e ¯y s . Following Krueger and Ludwig (2007), we choose the ne = 2 permanent efficiency types of the workers (e1 , e2 ) = (0.57, 1.43).21 The mean efficiency index ¯y s of the s-year-old worker is taken from Hansen (1993) and illustrated in Figure 10.4. The logarithm of the idiosyncratic productivity shock, ln θ , follows a Markov process. The first-order autoregressive process is given by: ln θ 0 = ρ ln θ + ζ,
(11.24)
where ξ ∼ N (0, σξ ) is distributed independently of age s. Huggett uses ρ = 0.96 and σξ2 = 0.045. Furthermore, we follow Huggett (1996) and choose a log-normal distribution of earnings for the 1-year old (corresponding to real-life age 20) with σ y1 = 0.38 and mean y 1 . As the log endowment of the initial generation of agents is normally distributed, the log efficiency of subsequent agents will continue to be normally distributed. This is a useful property of the earnings process, which has often been described as log-normal in the literature.22 20
In the first quarter of the year 2023, the debt-GDP ratio amounted to 118.6% in the US economy. Clearly, the assumption of a constant debt-GDP level in our calibration is not innocuous. Source of the data: US Office of Management and Budget and Federal Reserve Bank of St. Louis, Federal Debt: Total Public Debt as Percent of Gross Domestic Product [GFDEGDQ188S], retrieved from FRED, Federal Reserve Bank of St. Louis; https: //fred.stlouisfed.org/series/GFDEGDQ188S , July 18, 2023. 21 This calibration is in accordance with recent evidence presented by Heer and Rohrbacher (2021) according to which the weekly hourly earnings of college graduates are characterized by a skill premium of approximately 150% relative to those of the high school graduates since 1994. 22 The log-normal distribution, however, has an unrealistically thin top tail. As argued by Saez (2001), this property of the income distribution has important implications for the design of optimal income tax progressivity. In addition, Mankiw et al. (2009) note that
11.2 Overlapping Generations Models with Individual Uncertainty
641
We discretize the state space Θ = {θ1 , . . . , θnθ } using nθ = 5 values. The logarithm of the states θiθ , iθ = 1, . . . , nθ , are equally spaced and range from −mσ y1 to mσ y1 . We choose m = 1.0 such that the Gini coefficient of hourly wages amounts to 0.374, which we estimated for the US during the calibration period using PSID data. Our grid Θ, therefore, is presented by: Θ = {0.4688, 0.6847, 1.0000, 1.4605, 2.1332} with corresponding logarithmic values ln Θ = {−0.7576, −0.3788, 0.0000, 0.3788, 0.7576}. The probability of having productivity θiθ in the first period of life is computed by integrating the area under the normal distribution, implying the initial distribution among the 21-year-old agents for each permanent productivity type e j , j = 1, 2:
0.1783 0.2010 ν(θ ) = 0.2413 . 0.2010 0.1783 Each permanent efficiency type e j , j = 1, 2, has a share of 50% in each cohort. The transition probabilities are computed using Tauchen’s method as described in Algorithm 16.4.1. As a consequence, the efficiency index θ follows a finite Markov-chain with transition matrix: 0.7734 0.2210 0.0056 0.0000 0.0000 0.1675 0.6268 0.2011 0.0046 0.0000 0 Prob(θ |θ ) = 0.0037 0.1823 0.6281 0.1823 0.0033 . (11.25) 0.0000 0.0046 0.2011 0.6268 0.1675 0.0000 0.0000 0.0056 0.2210 0.7734 The pension function pg en(·) and the social security parameters are calibrated as follows. In case 1, we simply assume that pensions are lumpsum and that the ratio of pensions to average labor income is equal to the constant replacement rate of 35.2%. In case 2, we calibrate the PAYG pension system in closer accordance with the US pension system. Following the derivation of the ability distribution, which is central to the parameterization of our model, from the income distribution is ‘fraught with perils’.
642
11 OLG Models with Uncertainty
Huggett and Ventura (2000), pensions pg en(˜ x ) are modeled as a piecewise linear function of average past earnings: min pg en + 0.9˜ x if x˜ ≤ 0.2¯ x min pg en + 0.9(0.2¯ x) pg en(˜ x ) = +0.32(˜ x − 0.2¯ x) if 0.2¯ x < x˜ ≤ 1.24¯ x min en + 0.9(0.2¯ x) pg +0.32(1.24¯ x − 0.2¯ x ) + 0.15(˜ x − 1.24¯ x ) if x˜ > 1.24¯ x
(11.26)
where x¯ denotes the average value of accumulated earnings x˜ among min the retired in the economy. The lump-sum benefit pg en is set equal to 12.42% of GDP per capita in the model economy. Depending on the bracket in which the retired agent’s average earnings x˜ are situated, she receives 90% of the first 20% of x¯ , 32% of the next 104% of x¯ , and 15% of the remaining earnings (˜ x − 1.24¯ x ). Therefore, the marginal benefit rate declines with average earnings. We find that, in this case, the average replacement rate of pensions with respect to gross wage is higher than in the case of lump-sum pensions and amounts to 50.5%.
11.2.2 Computation of the Stationary Equilibrium In the following, we compute the stationary equilibrium for the economy with lump-sum pensions where the population parameters φ s and n are constant. We calibrate the model to match the parameters displayed in Table 11.1. The solution algorithm closely follows Algorithm 10.2.1 and consists of the following steps: Algorithm 11.2.1 (OLG Model with Idiosyncratic Uncertainty) Purpose: Computation and calibration of the stationary equilibrium in the OLG model with idiosyncratic productivity shocks. Steps: Step 1: Parameterize the model, and choose asset grids for the individual state space. Step 2: Make initial guesses of the steady-state values of the aggregate capital ˜ , labor ˜L , mean working hours ¯l, labor income taxes τl , the stock K social security contribution rate τ p , and government transfers ter.
11.2 Overlapping Generations Models with Individual Uncertainty
643
Step 3: Compute the values w and r that solve the firm’s Euler equation, and compute pg en. Step 4: Compute the household’s policy functions by backward induction using value function iteration. Step 5: Compute the optimal path for consumption, savings, and labor supply for the new-born generation by forward induction given the initial asset level a˜1 = 0 and distribution of idiosyncratic productivities e and θ . ˜ labor supply ˜L , mean working Step 6: Compute the aggregate savings Ω, ¯ Þ hours l, aggregate taxes Ta x, and transfers ter. Step 7: Update the aggregate variables, and return to Step 3 until convergence. Step 8: Update the asset grid of the individual state space if necessary, and return to Step 3 until convergence. The algorithm is implemented in the GAUSS program AK70_stoch_ inc.g. The computer code is also available on our download page in the programming languages Python and Julia.23 As presented in Table 11.2, the computational times vary between 27 minutes and 55 hours depending on 1) the computer language and 2) the interpolation mode for the policy function. In general, we find in our heterogeneous-agent applications that Python code is much slower than Julia or GAUSS; Python code is slower the more (nested) loops are part of the program. We will discuss computational speed in more detail below. In the first step of the algorithm, we parameterize the model as presented in Table 11.1. In addition, we choose an equispaced grid of na = 500 ˜ = {˜ points for the policy function over the individual asset space Ω a1 = min ma x a˜ , . . . , a˜na = a˜ } with the minimum and maximum asset levels equal to a˜ min = 0 and a˜ ma x = 20.0. The upper boundary point a˜ max is found with some trial and error such that the measure of households with wealth level equal and close to a max is zero, but some households hold wealth in the top quintile of the asset space. Since our algorithm restricts the optimal policy function a˜0 (·) to lie on the interval [˜ a min , a˜ max ], the behavior of the policy functions displays abrupt changes at the upper boundary of the asset space interval, and we want to restrict the evaluations of policy functions to a˜ a˜ ma x . For the distribution function, we will choose a finer grid of na g = 1, 000 points as discussed in Chapter 8. In case 2, we also have to specify a grid of n x points for accumulated earnings x˜ . We will consider 23
In addition to the code, you can find an extensive Jupyter manuscript on our web pages that explain the Python code in detail (line by line).
644
11 OLG Models with Uncertainty Table 11.2 Comparison of Runtime and Accuracy
Interpolation
linear
cubic
cubic
Grid points na na g
500 1,000
500 1,000
300 1,000
Aggregates ˜ K ˜L
1.486 0.3097
1.486 0.3097
1.484 0.3096
Accuracy Young Old
0.00065 0.00196
0.000074 0.000084
0.000088 0.000118
Runtime Julia GAUSS Python
1:29:56 27:38 32:49:37
1:32:43 1:16:34 55:30:33
45:37 51:56 48:17:04
Notes: Accuracy is measured by the mean absolute value of the Euler residuals for the young and old households. The Euler residual is evaluated at all nag = 1, 000 asset grid points over the distribution. Runtime is given in hours:minutes:seconds on an Intel(R) Xeon(R), 2.90 GHz.
this case separately in the next section and concentrate on case 1 with lump-sum pensions in the following. We also choose the discretization of the individual idiosyncratic productivity space Θ and the transition matrix of the Markov process, Prob(θ 0 |θ ), as described in the previous section on calibration. With the help of the stationary survival probabilities φ s and the population growth rate n, we can compute the stationary age distribution in the population. Let Nt denote total population and Nts denote the number of s-year old households in period t, while the the measure of the s-year old in period t, µst , is presented by (11.15). The sum of all measures µst is equal to one by definition: 70 X s=1
µst = 1.0.
(11.27)
To compute the stationary measure {µs }70 s=1 , we simply set up a vector with 70 entries in the computer program AK70_stoch_inc.g, initialize
11.2 Overlapping Generations Models with Individual Uncertainty
645
the measure of 1-year olds, µ1 , equal to one, and iterate over age s = 1, . . . , T − 1 as follows: µs+1 =
φs s µ. 1+n
(11.28)
This formula is derived from the following two equations: 1 Nt1 = (1 + n)Nt−1 ,
(11.29)
s−1 Nts = φ s−1 Nt−1 .
(11.30)
and
N
t Division of both equations by population size Nt and noting that 1+n = Nt−1 implies our formula. After we have computed all µs , s = 1, . . . , T , we normalize P thes measures s by dividing each measure µ by the sum of all measures, s µ , meaning that their sum is equal to one and the µs also present the share of s-year olds in the total population. The measure of the cohorts in our calibration declines monotonically with age s as presented in Figure 11.1.
0.020
µs
0.015 0.010 0.005 1
10
20
30
40
s
50
60
70
Figure 11.1 Measure µs of the s-Year-Old Cohort
In the second step, we provide an initial guess of the endogenous ag˜ , efficient labor ˜L , mean gregate variables: the aggregate capital stock K l working hours ¯l, labor income taxes τ , the social security contribution rate τ p , and government transfers ter. We assume that households work 30% of their time, ¯l = 0.30. Moreover, we simply assume that aggregate efficient labor is also equal to ˜L = 0.30. Of course, this is only an approximation in our economy. On the one hand, only 78% of the households are working.
646
11 OLG Models with Uncertainty
Since we normalize the total measure of households to one, ˜L should be lower for this reason. On the other hand, we observe that workers with higher productivity also work longer hours. Therefore, ˜L should be higher than the number of working hours. As it turns out, our initial guess of ˜L of 0.30 is close to its final value (=0.310). The convergence and the final result of our computation is insensitive to the initial guesses; however, computational time until convergence to the final values may depend on our initial choice. Using a real interest rate (net of depreciation) equal to r − δ = 3% and efficient labor supply ˜L = 0.30, we can compute the implied steady-state value of the capital stock from the first-order condition of the firm, implying ˜ = 1.708. In our model, aggregate capital is lower than aggregate savings, K ˜ < Ω, ˜ . In ˜ as part of the savings is invested in government bonds B K equilibrium, government bonds amount to 63% of GDP. Since the wealthGDP ratio is approximately 3.0 in the United States, government bonds amount to approximately 21% of wealth, and hence, physical capital is approximately 79% of wealth. Therefore, we obtain an initial guess for our ˜ . For the final remaining ˜ by using the approximation Ω ˜ ≈ 1.26K wealth Ω aggregate quantities, we need to provide a guess for government transfers ter, which we initialize with a small number, ter = 0.01. In Step 3, we use the firm’s first-order conditions (11.8) with respect to labor and capital to compute the wage w and interest rate r. In stationary variables, these equations can be expressed as follows: ˜ α ˜L t −α , w t = (1 − α)K t ˜ α−1 ˜L t 1−α . r t = αK t
(11.31a) (11.31b)
Stationary pensions are computed with the help of our calibration for the replacement rate r epl using pg en = r epl × w ¯l. The social security tax τ p follows from the social security budget. Noting that the share of retirees in total population is equal to 21.9%, we find the equilibrium social security tax using τ p = 0.219g p en/(w ˜L ). Moreover, we can compute τl from our l p calibration τ + τ = 28%. As a result, we have the values of all variables, (w, τl , τ p , r, r b , ter, pg en), that are needed as input for the computation of the individual policy functions a˜0 (·), ˜c (·), and l(·). In Step 4, we need to solve a finite-time dynamic programming problem to find the optimal policy function. Again, we use value function iteration with linear (or, alternatively, cubic) interpolation between grid points as described in more detail in Chapter 7. The finite value function problem is
11.2 Overlapping Generations Models with Individual Uncertainty
647
solved by starting in the last period of life, s = T , and iterating backwards in age until the first year of life, s = 1. The value function in the last period of life, T = T W + T R = 70, with given exogenous pensions pg en and interest rate r is represented by24 v˜ T (˜ a T ) = u(˜c T , 1) = with ˜c T =
(˜c T )γ
1−η
1−η
pg en + [1 + (1 − τk )(r − δ)]˜ a T + ter . 1 + τc
(11.32)
(11.33)
Retired agents at age T = 70 consume their total income consisting of pensions and interest income plus government transfers. Note that, during retirement, the value function only depends on age s and wealth a˜ but not on individual productivities θ and e as pensions are provided lump-sum. Therefore, we initialize the value function of the retirees in our computer code as an array vr[:,:] with T R × na zero entries. We also store the optimal policy function ˜c (T, a˜ia ) and optimal nextperiod assets a˜0 (T, a˜ia ) (=0) of the T -year-old household for all asset points a˜ia , ia = 1, . . . , na . In the next step of the value function iteration over age s = T, . . . , 1, we consider the retiree in his second-to-last period at retirement age T −1. We use golden section search to compute the optimal solution to the right-hand side of the Bellman equation:25
v˜
T −1
(˜ a
T −1
pg en + [1 + (1 − τk )(r − δ)]˜ a T −1 + ter − (1 + gA)˜ aT ) = max u ,1 a˜ T 1 + τc T −1 T T ˜ + βφ v˜ (˜ a ) .
In our computer code, the right-hand side of the Bellman equation is stored as a function of a˜0 and age s. To optimize it, we need to evaluate the value function at age s + 1 for the asset level a˜0 , which may not be a grid point stored in the array vr[:,:]. For this reason, we need to use interpolation to find the value. We will use either linear or cubic interpolation and compare them with respect to speed and accuracy below. 24
For the ease of exposition, we introduce a superscript s = T for age s in the value function v˜. 25 Notice that we can drop the expectational operator E from the right-hand side of the Bellman equation for the retired worker.
648
11 OLG Models with Uncertainty
The optimization step of the rhs of the Bellman equation (11.13) is performed using the golden section search algorithm described in Section 15.4.1. In the optimization, we need to take special care if the boundaries of the golden section search happen to coincide with either a˜ min = 0 or a˜ ma x . (Remember that we imposed the credit constraint a˜ ≥ 0.) Therefore, we have to evaluate whether we have a corner solution a˜0 = a˜ min (˜ a0 = a˜ ma x ). For example, we compare the rhs of the Bellman equation at a˜ min with that evaluated at a˜ min + eps, where ’eps’ is a small constant. If the value at a˜ min is larger than that at a˜ min + eps, we have a corner solution. Otherwise, we may apply golden section search again. At this point, let us mention that it is always a good idea to print intermediate results during the coding process. For example, you should start programming the inner loop over the value function prior to the outer loop over the aggregate variables. Once you have computed the value function and policy functions of the retiree for the first time, you should analyze their graphs for different ages and check whether their shapes are monotone and increasing and whether the policy functions are well-behaved at the boundaries of the state space where a˜0 ≥ 0 or a˜0 ≤ a ma x might be binding. In our web documentation of the Python code, for example, you will find the illustrations of ˜c (·), a˜0 (·), and v˜(·) as functions of a˜ at the first year of retirement at age s = T + 1, and all functions are smooth and monotone. Next, we turn to the value function iteration for the worker. The procedure is almost the same as in the case of the retiree. The worker also maximizes the right-hand side of the Bellman equation. However, different from the retiree, he also chooses optimal labor supply. There are different ways in which we can implement this two-dimensional optimization problem. We have chosen to break it up into two nested optimization problems. In the outer loop, we optimize over the next-period wealth a˜0 as in the case of the retiree using golden section search. In the inner loop, we compute the optimal labor supply given this and next-period wealth, a˜ and a˜0 , using the first-order condition of the worker with respect to his labor supply:26 (1 − τl − τ p )ε(s, e, θ )w 1 − γ ˜c = . 1 + τc γ 1−l
(11.34)
After the substitution of ˜c from the budget constraint of the worker, we can solve for the labor supply l of the worker: 26
For the derivation of the first-order condition, see Appendix A.12.
11.2 Overlapping Generations Models with Individual Uncertainty
l =γ−
(1 − γ) [1 + (1 − τk )(r − δ)]˜ a + ter − (1 + gA)˜ a0 (1 − τl − τ p )ε(s, e, θ )w
649
.
(11.35)
We also have to impose the constraints l ≥ 0 and l ≤ l max if l happens to lie outside the admissible space. Note that, in the present case, it is easy to compute the optimal labor supply since we can explicitly solve for optimal labor l. If we use a different utility function or consider contribution-based pensions, labor supply may only be computed implicitly, and we will have to use more sophisticated methods. In the case of a different utility function, we may have to solve a nonlinear equation problem. If we also consider contribution-based pensions, the computation of the optimal labor supply is more complicated, and we will discuss it in the next section of this chapter. The implementation of this nested optimization is straightforward. Different from the case of the retired household, however, we need to solve the value function iteration problem for the different productivity types θ and e because the worker’s labor supply and income depend upon them. Of course, the additional loops over the variables θiθ , iθ = 1, . . . , 5, and e j , j = 1, 2, considerably slow the computation. Therefore, the better modeling of the workers’ heterogeneity comes at the cost of reduced speed. To graphically evaluate the computation of the policy functions for the worker is difficult. Given that the number of all productivities and ages of the workers alone amounts to ne × nθ × T w = 450, we need to restrict our attention to the study of a random choice. In Figure 11.2, we graph the labor supply function of skilled and unskilled workers at age s = 21 with idiosyncratic productivity θ4 = 0.378 in the stationary equilibrium. Labor supply falls with higher wealth a˜. The high-skilled (blue curve) has a much higher labor supply than the low-skilled (red curve) for given wealth a˜ (but since the high-skilled workers hold higher wealth on average, the difference in working hours observed empirically is smaller).27 In addition, the lower boundary l ≥ 0 starts to bind for the low-skilled worker for a wealth level a˜ in excess of 11.0. To assess the accuracy of our policy function approximation, we study the Euler equation residual. Therefore, we compute the value of the Euler equation in implicit form using the policy functions for consumption ˜c and 27
Blundell et al. (2018) find a significant difference of both male and female employment at the intensive margin across skill groups. According to their Figure 4, for example, men aged 25-55 with college, high school, and no high school worked approximately 45, 43, and 41 hours per week on average during the period 1978-2007.
650
11 OLG Models with Uncertainty 0.5
Low Skilled High Skilled
l(21, e, θ4 , ·)
0.4 0.3 0.2 0.1 0.0
0
2
4
6
8
10
12
14
18
16
20
a˜ Figure 11.2 Labor Supply of the Low- and High-Skilled Workers with Idiosyncratic Productivity θ4
labor l.28 For the s-year-old worker with wealth level a˜ and productivity type ε(s, e, θ ), the Euler equation residual is defined by:29 R(˜ a) = 1 −
β(1 + gA
uc (˜c , 1 − l) γ(1−η)−1 ) (1 + r b )φ s E {u
c 0 , 1 − l 0 )} c (˜
,
(11.36)
where ˜c 0 and l 0 are the next-period consumption level and labor supply and expectations E are conditional on information at age s in period t (after observing the idiosyncratic productivity shock in this period). The righthand side of (11.36) presents the first-order condition of the household with respect to next-period wealth a˜0 . If R(˜ a) = 0, the Euler equation error is zero, and we have a perfect fit. As emphasized in previous chapters, the Euler equation residual is a popular measure of accuracy. In particular, Santos (2000) has shown that the accuracy of the numerical approximation of the policy function is of the same order of magnitude as the numerical error of the Euler equation residual.30 Therefore, we apply the Euler equation error as our criterion to get an idea about the goodness of fit. We use a finer grid of the asset level for the Euler equation residual than that for the policy function. We want to consider points off the policy function grid where we have to interpolate between grid points and accuracy might be lower. As our measure of fit, we 28
The Euler equation residual is considered in many others chapters of this book and defined in Section 1.7.2. 29 For the derivation of the Euler equation (11.36), ses Appendix A.12. 30 Note, however, that his results are derived for the stochastic neoclassical growth model where households are infinitely lived.
11.2 Overlapping Generations Models with Individual Uncertainty
651
use the average absolute Euler equation residual. Alternatively, you may weigh the Euler residuals with the measure of the s-year-old households with asset level a˜ia and productivity ε(s, e j , θiθ ), f (s, e j , θiθ , a˜ia ). To compute the Euler equation residual, we also need to define functions for the marginal utility of consumption for the present and the next period in our computer code. The evaluation of next-period marginal utility is very time-consuming because we need to interpolate the next-period policy functions for consumption and labor supply to derive c 0 and l 0 . Therefore, you should compute the Euler residual only when you start to set up the program to check for the accuracy of your computer code in the development stage and after the final outer loop over the aggregate variables when you want to determine whether the number of grid points and the mode of interpolation is sufficient. In the case of linear interpolation of the policy function, the Euler equation residuals are equal to 0.065% and 0.196% for the workers and retirees, respectively, as presented in Table 11.2. Given the accuracy of our golden section search routine (1e-5), this accuracy is admissible, and we are fairly certain that we have correctly programmed the value function iteration problem. The solution to the dynamic programming problem and that from the Euler equation (almost) coincide. If you want to increase accuracy, you need to either 1) consider more grid points na and/or 2) use cubic interpolation instead of linear interpolation. As is often the case in computational economics, you are faced with the trade-off between speed and accuracy. In the case of cubic interpolation and na = 500 grid points, the accuracy measure in form of the mean absolute Euler equation residual improves and decreases to 0.0074% and 0.0085% for the young and old workers, respectively, but computational time increases considerably in GAUSS and Python and only to a small extent in Julia (compare the first and second entry columns in Table 11.2). Evidently, we can achieve higher accuracy by exploiting the curvature of the value function. For a smaller number of grid points, na = 300, we find that, in the case of Julia, cubic rather than linear interpolation does not only provide higher accuracy as measured by the Euler equation residual (amounting to 0.0088% and 0.0118% for the worker and retiree) but also allows for a faster computation. As displayed in the third entry-column of Table 11.2, computational time drops from 1 hour and 29 minutes (linear interpolation with na = 500) to 45 minutes (cubic interpolation with na = 300). This observation, however, does not hold unanimously for all computer languages, e.g., for the computer languages Python and GAUSS in the present example. We, therefore, recommend that you program your
652
11 OLG Models with Uncertainty
code flexibly enough that you can easily switch between linear and cubic interpolation. In Step 5, we compute the endogenous wealth distribution f (s, e, θ , a˜) in each cohort s over the asset space [˜ a min , a˜ max ] for the ne permanent and nθ idiosyncratic productivity types. Over the asset space, we use an equispaced grid of na g = 1, 000 points, resulting in a total number of grid points (in case 1 with lump-sum pensions): T W × ne × nθ × na g + T R × nag = 45×2×5×1000+25×1000 = 475, 000. In the computer program AK70_stoch_inc.g, the distribution is stored in the variables gkw[s,e,θ ,˜ a] and gkr[s,˜ a] for the worker and retiree, respectively. In the case of the worker, gkw is a four-dimensional array, while, in the case of the retired household, gkr is only two-dimensional because the productivity type does not affect behavior in old age. We start the computation of the distribution function with the newborn generation at age s = 1 with zero wealth. The newborn generation has measure µ1 = 0.02118. Furthermore, we know the distribution of the idiosyncratic productivity at age 1, which is given by ν(θ ) (see the calibration section). Each permanent productivity has measure 1/2 in each cohort of workers. The distribution in the first period is initialized for e ∈ {e1 , e2 } as follows: 1 µ1 v(θ ) if a˜ = 0 gkw[1, e, θ , a˜] = 2 0 else. Given the distribution of the wealth level a˜ and productivity ε(s, e, θ ) at age s = 1, we can compute the distribution of (s, e, θ , a˜) at age s = 2 by using the optimal decision functions of the agents with respect to labor supply l, consumption ˜c , and next-period wealth a˜0 and the transition probabilities for the idiosyncratic productivities. The policy functions at age s = 1 were stored in the arrays lopt[1,e,θ ,˜ a], cwopt[1,e,θ ,˜ a] and awopt[1,e,θ ,˜ a] in Step 4. For expositional reasons, let us consider a specific example, e.g., a lowskilled worker with idiosyncratic productivity θ4 = 1.4605 and zero wealth a˜ = 0.0. His measure is equal to 0.00213 and given by the product of 1) the measure of the 1-year old, µ1 , 2) the share of workers with idiosyncratic productivity θ4 , ν(θ4 ), and 3) the share of low-skilled workers among all workers (equal to 1/2). His optimal next-period wealth is represented by awopt[1,e1 ,θ4 ,0]=0.008365. Therefore, we know that the households at age s = 1 in period t with measure equal to 0.00213 chooses to have
11.2 Overlapping Generations Models with Individual Uncertainty
653
next-period wealth a˜0 = 0.008365. Some households will die, so only φ 1 = 99.92% of the 1-year-old households survive. In addition, the share of each cohort decreases between period t and period t + 1 by the factor 1/(1 + n) = 0.9925 because the population grows. As a consequence, the measure of households in the next period that is implied by this type of worker amounts to 0.002129
φ1 = 0.002111. 1+n
In the computation of the dynamics, we also have to take care of the Markov transition matrix Prob(θ 0 |θ ) that describes the behavior of the stochastic part of idiosyncratic productivity. For example, for θ4 , we know from inspection of (11.25) that a household with productivity θ4 at age 1 becomes a household with productivity θiθ at age 2 with probabilities 0.0000, 0.0046, 0.2011, 0.6268, and 0.1675 for iθ = 1, . . . , 5. Therefore, we have to add the above measure to the five productivity levels θ1 − θ5 weighted by the respective probabilities.31 Similarly, we compute the dynamics of the households at age 1 over the complete productivity and asset space. In the next outer iteration, we increase the age s by one. We also have to discuss how we handle the case when the next-period asset a˜0 is not a grid point on the grid ‘ag’ for the distribution function. For example, in our example above, the next-period wealth amounts to a˜0 = 0.008365 and lies between the first two grid points a g1 = 0 and a g2 = 0.01998. In this case, we proceed as described in Chapter 8.4 and introduce a simple lottery: If the optimal next-period wealth a˜0 happens to lie between a g ia−1 and a g ia , a g ia−1 < a˜0 < a g ia , we simply assume that the nextperiod wealth level will be a g ia with probability (˜ a0 − a g ia−1 )/(a g ia − a g ia−1 ) and a g ia−1 with the complementary probability (a g ia − a˜0 )/(a g ia − a g ia−1 ). The computational time for Step 5 diverges significantly across the different computer languages. In essence, we have to compute five nested iterations in the five variables age s, permanent productivity e, idiosyncratic productivity θ , individual wealth a˜ and next-period idiosyncratic productivity θ 0 (from the outer to the inner loop) to compute the measures at the 475,000 grid points. We programmed the computation of the distribution in exactly the same way in the three programming languages Python, Julia, and GAUSS, meaning that we executed exactly the same number of operations (products and sums) and stored the numbers with 31
As a consequence, we have to iterate over five loops (s, e, θ , a˜, θ 0 ) to compute the distribution f (s, e, θ , a˜) at age s = 2, . . . , T w .
654
11 OLG Models with Uncertainty
the same accuracy (with a single precision of 8 digits). Python, in particular, proved to be particularly slow in the computation of Step 5 and needed 33 minutes. In comparison, Julia and GAUSS (8.2 and 11.3 seconds) were considerably faster.32 ˜ which is simply equal In Step 6, we update the aggregate variables Ω, to the sum of all savings, weighted by the measure of the household with ˜ , we state vector (s, e, θ , a˜). To compute the new aggregate capital stock K ˜ need to compute production Y first (using the old values of the capital ˜ is equal to stock and aggregate labor) and use the calibration that debt B ˜ ˜ ˜ 0.63Y to derive the value of B . Aggregate capital K is then implied by the capital market equilibrium: ˜ =Ω ˜. ˜ −B K
(11.37)
The labor market variables ˜L and ¯l are simply equal to the sum and the average of the individual variables weighted by their measures ‘gkw’ of the distribution function. The pension contribution rate τ p is computed with the help of the social security budget. To derive government transfers, we need to compute accidental bequests and total taxes first. The former variable is derived from the sum of all individual accidental bequests, weighted by the measure of the s-year old individual with wealth a˜ and productivity ε(s, e, θ ). Total taxes can be derived with the help of ˜ ˜ + τc C, Þ T a x = τl w ˜L + τk (r − δ)K
where aggregate consumption C˜ is equal to the sum of all individual consumption values (weighted by their measure). Given Walras’s law, we have one redundant equation in our model, which is represented by the goods market equilibrium (11.22): ˜ + [(1 + n)(1 + gA) − (1 − δ)] K ˜ α ˜L 1−α = C˜ + G ˜. Y˜ = K We have not yet used this equation in the computation of our solution. As an additional accuracy check of our computation, we should evaluate and compare the left-hand and right-hand sides of this equation. In our case, the deviation is less than 0.001 after the final iteration q = 28, and we can be quite confident that our computation is correct and has converged. In Step 7, we use this new value for the capital stock knew to update kbar using a weighted average of the new and the old values of the capital ˜ for the next outer stock (with weights 0.2 and 0.8, respectively) to derive K 32
In the case of Python and Julia, we also sped up the code using multi-threading.
11.2 Overlapping Generations Models with Individual Uncertainty
655
loop over the aggregate variables. The use of this extrapolation (using the value from the last iteration in the update) helps to stabilize the sequence so that the outer loop converges. We proceed in a similar way with the other aggregate variables ˜L , ¯l, τ p , τl , and ter and return to Step 2. We also save the values of the aggregate variables in each outer loop and find that ˜ is hump-shaped and smooth. the convergence of the capital stock K Finally, in Step 8, we use the distribution function f (s, e, θ , a˜) to evaluate whether the upper boundary a˜ max is chosen reasonably. If very close to a˜ ma x , all measures are equal to zero, we know that we have bracketed the ergodic distribution. If the highest wealth level a˜ with a strictly positive measure is reasonably close to a˜ max we have also chosen an efficient upper boundary. In addition, we study the Euler equation residuals to gauge the accuracy of the interpolation scheme (linear or cubic) and the number of grid points for the asset grid of the policy functions. If not, we have to adjust a˜ ma x and na and restart the outer loop in Step 3. In summary, we find that the coding for the computation of a largescale overlapping generations model with idiosyncratic productivity is a non-trivial task. The computer code may easily amount to several hundred lines. Often, we find a trade-off between accuracy and speed. The difference in speed is only caused to a small extent by our different coding techniques in the programming languages Julia, GAUSS, and Python.33 For example, in GAUSS and Python, we used our own routine for the golden section search, while, in Julia, we used a code provided by the package ‘optim’. The differences in speed become most evident in the computation of the distribution in Step 6 where we iterate over five nested loops (in the individual variables age s, permanent productivity e, idiosyncratic productivity θ , wealth a˜, and next-period idiosyncratic productivity θ 0 ). Here, the number of operations (additions and multiplications) is exactly the same in the code of all three programming languages, as is the accuracy that we use to store the numbers (1e-8). The difference in speed is impressive. In particular, GAUSS and Julia display approximately the same computational speed, while Python is considerably slower. In our experience, we find that in many applications of heterogeneous-agent models with overlapping generations, GAUSS and Julia outperform Python in terms of speed and that the difference is often crucial given the extensive runtimes.34 Consider the policy experiment where we would like to find 33
In fact, the syntax of most commands in the three languages is very similar. R For the same reason, we are also reluctant to write MATLAB code for the computation R of heterogeneous-agent OLG models (Heer (2019) provides MATLAB code for smalland medium-sized OLG models). As a faster alternative, Fortran and C++ code may be
34
656
11 OLG Models with Uncertainty
the optimal steady-state pension replacement rate in the present model (e.g., the highest lifetime utility of the average newborn) by searching over a fine grid of the replacement rate r epl. In Python, the computational time may become prohibitive in this particular example.
a¯s
RESULTS. Figure 11.3 displays the average wealth a¯s of the cohorts over the lifetime. As you know by now, the hump-shaped profile is typical for the life-cycle model. Households save for old age when their non-wealth income shrinks (pensions are below earnings). In the case of stochastic survival, however, individual wealth peaks prior to retirement because next-period utility is discounted at an increasing rate, 1/(φ s β) − 1, given that the survival probabilities φ s fall with age s. Average wealth amounts ˜ = 1.824, of which 81.5% is held in the form of physical capital, to Ω ˜ K = 1.486, and the remaining part is held in the form of government debt, ˜ = 0.338. The real interest on government bonds amounts to 2.77%. B
3.5 3.0 2.5 2.0 1.5 1.0 0.5 0.0 1
10
20
30
40
50
s
60
70
Figure 11.3 Wealth-Age Profile in the Stochastic OLG Model
Consumption as presented in Figure 11.4 peaks in the last period of the working life. At the first year of retirement, leisure jumps to 1.0 (retirees do considered. However, these languages come at a cost. 1) Learning the languages Fortran or C++ is more time-consuming than learning Python, Julia, or GAUSS. 2) Writing your code in Fortran or C++ is also more difficult than the composition of the code with the help of a high-productivity language such as Python or Julia. If the reader is interested in how to write Fortran code for the computation of large-scale OLG models, we recommend consulting the textbook by Fehr and Kindermann (2018).
11.2 Overlapping Generations Models with Individual Uncertainty
657
not work), and to smooth utility over time, consumption falls. Aggregate consumption C˜ amounts to 0.275, and its share in total demand is equal to 51.3% (government consumption and investment take up 18.0% and 30.7% of GDP, respectively). 0.4
¯c s
0.3 0.3 0.2 0.2
1
10
20
30
40
s
50
60
70
Figure 11.4 Consumption-Age Profile in the Stochastic OLG Model
Figure 11.5 presents average working hours ¯l s (averages among the s-year-old workers), which peak at s = 12 (according to real lifetime age 32) and decline thereafter. The labor-supply-age profile displays kinks at the same ages as the underlying efficiency-age profile ¯y s displayed in Figure 10.4. The initial increase in labor supply is caused by the increase in the age productivity ¯y s ; however, ¯y s peaks at age 52, while labor starts to decline prior to this age because of the wealth effect. Workers become wealth-richer with increasing age and, for this reason, reduce their labor supply ceteris paribus. As we noted above, the labor supply of high-skilled workers is larger than that of low-skilled workers so that efficient working hours also exceed working hours. Given a workforce share in total population equal to 78%, aggregate efficient labor amounts to ˜L = 0.310. While the process for the individual’s hourly wage rate is exogenous in our model, her labor supply and, hence, earnings are endogenous. The Lorenz curve for the earnings in the steady state of our model (red line) and for the US economy (blue line) are displayed in Figure 11.6.35 Evidently, the inequality in earnings for the model economy matches the empirical value in the US, although earnings are slightly less concentrated in the model than in the US economy. In our model, the (earnings-) poorest quintile of 35
The empirical data displayed in Figures 11.6 and 11.7 are taken from Tables 5 and 7 in Budría Rodríguez et al. (2002).
11 OLG Models with Uncertainty
¯l s
658 0.375 0.350 0.325 0.300 0.275 0.250 0.225 0.200 1
5
10
15
20
25
30
35
40
45
s Figure 11.5 Labor-Supply-Age Profile in the Stochastic OLG Model
the workers receives only 2.5% of total earnings, while the top quintile earns 53.4% of the total. According to the estimates of Budría Rodríguez et al. (2002), the corresponding quintiles received 0% and 60.2% in the US in 1996. The Gini coefficient of earnings amounts to 0.51 in the model compared to a value of 0.66 in the US economy in 1996.36 Earnings are more concentrated than hourly wages (with a Gini coefficient equal to 0.37) because the more productive workers supply more labor than the less productive workers in our model (as observed empirically). Figure 11.7 presents the Lorenz curves of wealth in the model (red line) and in the US economy (blue line). Wealth is more concentrated than earnings in our model. The (wealth-) poorest quintile holds no wealth at all, and 20% of the model population is credit-constrained. The top quintile of the wealth distribution holds 67.6% of total wealth. Both Budría Rodríguez et al. (2002) and Krueger et al. (2016) report that the lowest quintile does not hold any wealth in the US economy in 1996 and 2006, while the top quintile holds 81-82% in both years. The Gini coefficient of wealth amounts to 0.66 in our model, while empirically, the Gini coefficient is even higher and amounts to values close to 0.80 as reported by Budría Rodríguez et al. (2002) and Krueger et al. (2016). In summary, our model is able to replicate the fact that wealth is distributed more unequally than earnings. The OLG model also generates 36
Krueger et al. (2016) find lower inequality of earnings for the US economy in 2006 than Budría Rodríguez et al. (2002). Different from Budría Rodríguez et al. (2002), they use data from the Panel Study of Income Dynamics (PSID) rather than data from the Survey of Consumer Finances (SCF). For example, Krueger et al. (2016) find a Gini coefficient of earnings equal to 0.43. Therefore, our results for the inequality of earnings are in between those implied by the two different data sets in 1996 and 2006.
Proportion of Earnings
11.2 Overlapping Generations Models with Individual Uncertainty 1.0
659
Equal distribution Model US
0.8 0.6 0.4 0.2 0.0
0
0.2
0.4
0.6
0.8
1
Proportion of Households
Proportion of Wealth
Figure 11.6 Lorenz Curve of US and Model Earnings 1.0
Equal distribution Model US
0.8 0.6 0.4 0.2 0.0
0
0.2
0.4
0.6
0.8
1
Proportion of Households Figure 11.7 Lorenz Curve of US and Model Wealth
more wealth heterogeneity than the neoclassical model with heterogeneous productivity presented in Section 8.6. For this reason, the OLG model is a predominant prototype model when the researcher would like to study wealth distribution effects of economic and fiscal policies.37 The life-cycle savings motive seems to be a very natural and intuitive way to generate heterogeneity and also makes the wealth distribution dependent on demographic effects. For example, the present model helps to shed light 37
You have already encountered methods to create more wealth inequality in the neoclassical growth model with heterogeneous agents. Usually, however, these modeling choices impose exogenous sources of inequality such as differences in preference parameters to generate sufficient wealth heterogeneity. For example, the original paper by Krusell and Smith (1998) uses the assumptions that households are heterogeneous with respect to their discount factor β, which follows an AR(1)-process; however, the underlying parameter β cannot be observed empirically.
660
11 OLG Models with Uncertainty
on the question of how wealth inequality will behave in times of aging as is presently underway in the advanced industrial countries.38 There are numerous reasons why the endogenous wealth heterogeneity of our model is smaller than observed empirically:39 1. Earnings-related pensions: Pensions are not related to the earnings history of the recipient. If the earnings-rich agents receive larger pensions, one might suppose that wealth heterogeneity would also be higher. However, as earnings-poor agents also know that they will only receive small pensions, they will also save more for precautionary reasons.40 2. Asset-based means test of social security: We neglect any asset-based means tests of social security. Hubbard et al. (1995) show that, in the presence of social insurance programs with means tests, low-income households are likely to hold virtually no wealth across their lifetime. Unemployment and asset-based social insurance would imply a much higher proportion of agents with zero or near-zero wealth. 3. Unemployment: Furthermore, agents are subject to employment risk. Heer (2003b) studies a life-cycle economy with endogenous search unemployment. Working agents may lose their job at an exogenous rate; higher search effort increases the job finding probability, but searching for a job also causes a disutility for the agent. Heer (2003b) shows that the replacement rate of unemployment insurance has only a very small effect on wealth heterogeneity. Although income is redistributed from income-rich agents to income-poor workers with the help of unemployment insurance, higher unemployment insurance also increases endogenous unemployment, so that the number of unemployment recipients increases. As a consequence, the wealth Gini coefficient 38
You are asked to compute the distribution effects of aging in Problem 11.1. For a more comprehensive survey of wealth heterogeneity in quantitative general equilibrium models, see De Nardi (2015). Quadrini and Ríos-Rull (1997) present an early review of studies of wealth heterogeneity in computable general equilibrium models with uninsurable idiosyncratic exogenous shocks to earnings, distinguishing business ownership, higher rates of return on high asset levels, and changes in health and marital status, among others, as possible explanatory factors of wealth inequality. 40 In our own research, we have only encountered applications where the introduction of earnings-related benefits decreased wealth heterogeneity (as measured by the Gini coefficient). In the next section where we introductions earnings-related pensions, we will confirm this result. 39
11.2 Overlapping Generations Models with Individual Uncertainty
661
changes by less than one percentage point if the replacement ratio of unemployment insurance increases from 0% to 50% or even to 100%; for a replacement ratio exceeding 70%, wealth heterogeneity actually starts to increase again. 4. Bequests: In our model, we assumed that the government collects accidental bequests and that the households do not have any altruistic bequest motive. Empirically, however, parents care for their children and, in particular, affluent individuals leave bequests to their descendants. For example, Kessler and Masson (1989), considering France, find that only 36% of the households receive any inheritances and those who do are approximately 2.4 times richer than the representative household. Heer (2001b) considers an OLG model where parents leave altruistic and accidental bequests to their children. He, however, finds that bequests are able to explain only a small fraction of observed wealth heterogeneity. The main reasons are that i) poor agents may also receive bequests and ii) agents who expect a high inheritance in the future also spend more on consumption. Importantly, however, Heer (2001b) only considers intergenerational transfers of physical wealth, not transfers of human wealth. Rich parents may have rich children because they may invest in their college education, for example. Loury (1981) analyzes parental human capital investment in their offspring. The allocation of training and hence the earnings of the children depend on the distribution of earnings among the parents. Becker and Tomes (1979) present a model framework comprising both human and non-human capital transfers from parents to children. The introduction of human capital transfers in an OLG model to explain the observed wealth heterogeneity is analyzed in a general equilibrium by De Nardi and Yang (2016). In their model, parents pass on both bequests of wealth and inheritance of abilities. Importantly, they are able to match the skewness and the long tail of the distribution of wealth and bequests. 5. Credit limits in financial markets: In our model, agents are not allowed to borrow against anticipated bequests, implying a credit limit a ≥ 0. For lower binding constraints, a < 0, wealth heterogeneity increases as demonstrated by Huggett (1996). In particular, the proportion of agents holding zero and negative assets increases. 6. Entrepreneurship: In our model, entrepreneurship is missing. Quadrini (2000) provides a comparative analysis of differences in income and wealth between
662
11 OLG Models with Uncertainty
workers and entrepreneurs in the US economy. According to his results, the top wealth group contains a high share of entrepreneurs. Prominent examples of successful entrepreneurs who earned considerable wealth in present times include persons such as Bill Gates, Elon Musk, or Jeff Bezos who are associated with the firms Microsoft, Tesla, and Amazon, among others. Brüggemann (2021) points out that, in the Survey of Consumer Finances (2010), entrepreneurs constitute more than a third of the households in the top income percentile even though there share in total population only amounts to 7.4%. As one of the first quantitative studies in dynamic general equilibrium, Quadrini (2000) introduces entrepreneurship into the neoclassical growth model to explain the high concentration of wealth among the very rich agents. Cagetti and De Nardi (2009) introduce endogenous entrepreneurship in a dynamic life-cycle general equilibrium model to study the distribution and welfare effects of lower estate taxation.41 They find that, if other taxes were raised to compensate for the shortfall in fiscal revenues, most households would lose. In independent research, ˙Imrohoro˘ glu et al. (2018) and Brüggemann (2021) use a model based on Cagetti and De Nardi (2006) and Cagetti and De Nardi (2009)42 to study the optimal marginal tax rates of the top 1 percent and find welfare-maximizing rates in the vicinity of 60%.43 7. Stochastic health: Using data from the Assets and Health Dynamics of the Oldest (AHEAD), De Nardi et al. (2010) estimate that the average out-of-pocket annual medical expenditures increase from $ 1,100 at age 75 to $ 9,200 at age 95. In addition, both health expenditures and longevity are highly uncertain in old age, so households accumulate precautionary savings. 41
Cagetti and De Nardi (2009) like Krueger et al. (2016) use an adaption of the neoclassical growth model that is based upon work by Blanchard (1985) and Gertler (1999). Households are born as workers. With certain probabilities they first become retired before they decease. Deceased households are replaced by young (working) households. See also Footnote 1 in Chapter 10. 42 In addition to these authors, ˙Imrohoro˘ glu et al. (2018) introduce superstar entrepreneurs in their model, while Brüggemann (2021) allows for elastic labor supply of the workers and abstracts from intergenerational correlation of earnings or abilities. 43 Kindermann and Krueger (2022) who do not explicitly consider entrepreneurs find an even higher optimal marginal tax rate of approximately 80%; however, their modeling of the income productivity process assumes very high and transitory shocks so that the labor supply of the top income workers is rather inelastic due to the strong intertemporal substitution effect.
11.2 Overlapping Generations Models with Individual Uncertainty
663
One possible way to model the heterogeneity in health and, hence, precautionary savings, is presented by Jung and Tran (2016). In their OLG model, households also hold health capital that is modeled in a similar way as physical capital. In particular, health capital increases with medical services and depreciates with age. In addition, it is subject to a health shock. As a consequence, they are able to match the variance in health capital among the old in the US.44 Nevertheless, wealth inequality falls short of empirical values. As a policy application, Jung and Tran (2016) consider the U.S. health care reform 2010 in the form of the Affordable Care Act (ACA) also known as ’Obamacare’. As the main goal of this policy, the health coverage rate, especially among low-income groups, is targeted to increase. The ACA is found to redistribute income both from low to high health risk types and from high to low income groups.
11.2.3 Multi-Dimensional Individual State Space In this section, we add another dimension to the individual state space and study how the computational time is affected by the so-called ‘curse of dimensionality’. Computational speed increases nonlinearly with the number of continuous state variables. Therefore, with present computer technology, it is difficult to solve large-scale heterogeneous-agent OLG models with uncertainty once the number of continuous (individual) state variables exceeds two or three. In the following, we add contribution-dependent pensions following Huggett and Ventura (1999) and Kitao (2014). Therefore, accumulated earnings x˜ as described in (11.4) enter the individual state space as a new (continuous) variable. We will show below that the value and policy functions do not display substantial curvature with respect to the variable x˜ . As a consequence, it suffices to use only a few grid points n x over the asset space X = {˜ x 1 , . . . , x˜n x } in the approximation of the functions. Evidently, the severity of the curse of dimensionality critically depends on the functional relationship between the policy functions and the individual state variables and implies high computational costs in the presence of 44
In old age, for example, the health capital in the bottom three quartiles of the US health capital distribution diverges by -5.9%, -27.8% and -59.1% from that of the top quartile. We would like to thank Juergen Jung and Chung Tran for the provision of their estimates which is based upon data from the Medical Expenditure Panel Survey (MEPS).
664
11 OLG Models with Uncertainty
high curvature or discontinuity of the policy functions or its derivatives. Sources of discontinuity in the behavior of the policy functions are frequent and include, among others, credit constraints on assets, asset-based means tests, tax exemptions, or income eligibility thresholds for social insurance programs such as Medicaid. These kinds of constraints might result in kinks of or jumps in the policy functions that are difficult to approximate numerically. With the enlarged state space, Algorithm 11.2.1 for the computation of the stationary equilibrium in the large-scale OLG model still applies. However, in some steps, our implementation will slightly diverge from the model with lump-sum pensions, which we discuss in the following. The algorithm is implemented in GAUSS, Python, and Julia with corresponding program names AK70_prog_pen.g , AK70_prog_pen.py , and AK70_prog_pen.jl , respectively. In Step 1, we need to provide the parameters of the pension schedule as described in the calibration section. Note that in the model with contribution-dependent pensions, we need to update an additional aggregate variable, that for average accumulated earnings x¯ . As an initial value of x¯ , we compute the average wage income per worker with the help of ˜ and ˜L in Step 2. the initial values of K A noted above, we also need to compute the value and policy functions over an additional dimension of a continuous state variable in the form of accumulated earnings x˜ in our multi-dimensional optimization problem. For this reason, we discretize the state space with n x gridpoints with respect to the variable x˜ , X = {˜ x 1 , . . . , x˜n x }. The computation of this higher-dimensional problem, of course, is much slower than the problem studied in the previous section, and we need to be careful in the choice of the number of grid points. We choose na = 200 and n x = 10, which captures the curvature of the policy functions with respect to the state variables a˜ and x˜ . As we noted above, the policy functions are rather linear with respect to x˜ , and a coarse grid on the accumulated earnings x˜ is sufficient. Of course, you will only know this from experience or after the execution of the first test runs with the computer program for a new model. Thorough study of your policy functions in the setup of the code is therefore a vital component of your programming technique and implies substantial trial and error. In total, we compute the solution of the value function iteration in the inner loop the following number of times: T W × ne × nθ × na × n x + T r × na × n x = 950, 000.
11.2 Overlapping Generations Models with Individual Uncertainty
665
In the present model specification, we also have to specify another grid for labor supply. Different from the model with lump-sum pensions, our first-order condition with respect to labor supply is now represented by45 ∂ u(˜c , 1 − l) ∂ u(˜c , 1 − l) (1 − τl − τ p ) ¯y s eθ w = ∂ (1 − l) ∂ ˜c 1 + τc ∂ v˜(˜zz+1 ) ¯y s eθ w + β(1 + gA)γ(1−η)−1 φ s E t . ∂ x˜ 0 s
(11.38)
In comparison with (11.34), the term with the derivative of the value function with respect to next-period accumulated earnings, ∂ v˜(˜zs+1 )/∂ x˜ 0 , enters as a new additive term on the right-hand-side of equation (11.38). In essence, we have two options to compute the optimal labor supply in our model. 1) We may solve (11.38) using a nonlinear equations routine such as the Newton-Rhapson algorithm. Therefore, we have to evaluate the derivative of the value function with respect to its second argument, accumulated earnings x˜ . If we store the value function at n x grid points, this involves finding the derivative between grid points x˜ and, therefore, a higher-order interpolation method along this dimension. To see this point, assume instead that we only use linear interpolation of the value function between two neighboring grid points, x˜i and x˜i+1 . As a consequence, the derivative ∂ v˜0 (·)/∂ x˜ 0 is no longer continuous (it is constant between grid points and jumps at grid points) so that a solution to (11.38) may not be found. You are asked in Problem 11.2 to study this method. 2) We may specify a grid over labor, L = {l1 , . . . , l nl } with l1 = 0 and l nl = l max and nl grid points. We then use a simple search mechanism to compute the labor supply that implies the maximum value of the value function. We adopt this procedure in the following. For this reason, we choose a grid over labor with nl = 30 equispaced points. In Step 4, we, again use value function iteration to compute optimal policy functions and the value function at grid points. Different from the computational method in the previous section, we do not evaluate the right-hand side of the Bellman equation (11.13) between grid points but only use the simplest computational technique and choose the value a˜0 as the grid point a˜ia on A that implies the maximum value. We introduced this technique as ‘Simple Iterative Procedure’ in Section 7.2.46 While this technique is easy to program, the computational speed for given accuracy is usually much slower than value function iteration that optimizes between 45 46
(11.38) is derived in Appendix A.12. You are asked to solve this problem using intermediate values of the grid in Problem 11.2.
666
11 OLG Models with Uncertainty
grid points. Therefore, you have to consider the trade-off between the time you need to write the program and the execution time of the program. In addition, if you want to obtain a quick first impression about the solution, high accuracy may not be the most important criterion for your choice, and you may prefer a computational method that is fast to program. If the (slow) solution with low accuracy yields promising results, you can still switch to a more accurate and fast method. We implemented the maximization step at a grid point (˜ aia , x˜i x ) for an s-year old worker with permanent and idiosyncratic productivities, e j and θiθ , as follows.47 First, we compute the next-period values of accumulated contributions x˜ 0 for all points on the labor grid L . If any of the values for x˜ 0 happens to lie above the upper boundary x˜ max , we set it equal to x˜ ma x .48 Next, we compute present consumption ˜c implied by the nextperiod wealth a˜0 ∈ A and the nl possible values of the labor supply l ∈ L using the budget constraint (11.14). The result consists of a matrix with (na × nl ) entries. To compute the first additive term on the right-hand side of the Bellman equation (11.13), we pass the matrix on to our function utility(c,l) (in Python or Julia) or procedure utility(c,l) (in GAUSS) in the computer program. Note that, for this reason, you have to program the function/procedure utility(.,.) in such a way that it can use matrices as arguments and also returns a matrix. Usually, this command requires a special syntax depending on the computer language that you are using.49 In addition, we need to generate a matrix for the second input argument l by constructing a matrix with na rows where the row vectors are equal to the grid points on L . Next, we compute the second additive term on the rhs of the Bellman equation (11.13). Therefore, we need to interpolate the next-period value function at the values of x˜ 0 that do not lie on the grid X . Again, the computed values of the expected next-period value function (discounted by the factor (1+ gA)γ(1−η) βφ s ) are stored in a matrix. We add the two matrices that contain the values of u(˜c , l) and the discounted next-period value function v˜0 (˜ a0 , x˜ 0 ) and call the resulting matrix bellman 47
47 The optimization step for the retiree is comparatively simple because we do not need to optimize over the labor supply. For this reason, we do not describe it here in detail.
48 Of course, if the upper boundary x̃_max keeps binding during the final iteration over the aggregate variables, we need to increase x̃_max and rerun the program. To check for this problem, we plot the distribution of x̃ at the first year of retirement in the program AK70_prog_pen.g.
49 For example, if you are using Julia and specified a function u(c,l) for instantaneous utility from consumption c and labor l, you need to apply the command u.(c0,l0) to compute the matrix of instantaneous utilities for your (na × nl)-matrices c0 and l0.
Finally, we simply locate the maximum value in the matrix bellman using the particular command provided by the computer language, e.g., xmax = np.where(bellman == np.max(bellman)) in Python.
Let us pause at this point to reflect on our methodological choices. Why did we not consider the grid points on accumulated earnings, x̃′ ∈ X, rather than the grid points on labor supply, l ∈ L, to find the maximum of the right-hand side of the Bellman equation? This way, we would have avoided the time-consuming step of interpolating the next-period value function ṽ′(ã′, x̃′) along the x̃′-dimension because all points would lie on the grid. To illustrate this point, let us consider a particular numerical example from our program AK70_prog_pen.py. In particular, let us consider a 40-year-old with permanent productivity e_1 = 0.57 and idiosyncratic stochastic component θ_1 = 0.4688. The age component of this 40-year-old amounts to ȳ_40 = 1.0590. For a wage rate approximately equal to w = 1.3 in our stationary equilibrium, the wage per hour amounts to ε(s, e, θ)w ≈ 0.3679. Assume that we would like to compute the optimal labor supply with an accuracy of at least Δl = 0.02 (as for the present choice of numerical parameters). An increase in labor supply by 0.02 results in an increase of accumulated average earnings x̃ at age 41 by the amount ε(s, e, θ)w Δl / 40 = 4.60 · 10⁻⁵. As a consequence, we would have to choose a grid for x̃ with a distance between grid points equal to 4.60 · 10⁻⁵ or, given the lower and upper boundaries x̃_min = 0 and x̃_max = 3.0, a total number of grid points equal to n_x = 16,310. Of course, this high number would result in the breakdown of our algorithm since we would have to compute and store the solution to the value function iteration problem at the following number of points: T^w × n_e × n_θ × n_a × n_x + T^r × n_a × n_x = 1.6 · 10⁹.
Figure 11.8 displays the behavior of the policy functions for labor supply, l(·), and savings, ã′(·) − ã, as functions of individual wealth ã for different levels of accumulated earnings, x̃ ∈ {0, 0.33, 0.67}, in the stationary equilibrium.50 We randomly select a 20-year-old with permanent productivity e_2 = 1.43 and stochastic productivity θ_4 = 1.4605. Evidently, both labor supply and savings decrease monotonically with wealth ã.
50 Average accumulated earnings amount to x̄ = 0.4551.
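The matrix-based maximization step described above can be sketched in Python/NumPy as follows. The grid sizes, the simple utility specification, the stylized law of motion for average earnings, and all parameter values are illustrative assumptions made only for this sketch; they do not reproduce the code of AK70_prog_pen.py.

import numpy as np

# Illustrative grids and parameters (assumptions, not the calibration of the text)
na, nl, nx = 200, 30, 10
a_grid = np.linspace(0.0, 20.0, na)      # asset grid A, also used for a'
l_grid = np.linspace(0.0, 0.6, nl)       # labor grid L
x_grid = np.linspace(0.0, 3.0, nx)       # grid X for accumulated average earnings
gamma, eta = 0.29, 2.0
beta_eff = 0.95                          # stands in for (1+g_A)^(gamma(1-eta)) * beta * phi_s

def utility(c, l):
    # instantaneous utility; works element-wise on (na x nl) matrices
    return ((c**gamma * (1.0 - l)**(1.0 - gamma))**(1.0 - eta) - 1.0) / (1.0 - eta)

def maximize_bellman(a, x, wage_hour, R, ev_next):
    """One maximization step at the state (a, x); ev_next is the expected
    next-period value function on the grid A x X, an (na, nx) array."""
    # 1) next-period accumulated earnings for every labor choice, capped at x_max
    #    (simplified law of motion, an assumption of this sketch)
    x_next = np.minimum(x + wage_hour * l_grid / 45.0, x_grid[-1])          # (nl,)
    # 2) consumption for every (a', l) pair from a simplified budget constraint
    c = R * a + wage_hour * l_grid[None, :] - a_grid[:, None]               # (na, nl)
    feasible = c > 0.0
    c = np.where(feasible, c, 1e-10)
    # 3) utility matrix; infeasible choices receive a large negative value
    lmat = np.tile(l_grid, (na, 1))
    u = np.where(feasible, utility(c, lmat), -1e10)
    # 4) next-period value, linearly interpolated along the x-dimension
    v_interp = np.empty((na, nl))
    for il in range(nl):
        j = np.clip(np.searchsorted(x_grid, x_next[il]), 1, nx - 1)
        w = (x_next[il] - x_grid[j - 1]) / (x_grid[j] - x_grid[j - 1])
        v_interp[:, il] = (1.0 - w) * ev_next[:, j - 1] + w * ev_next[:, j]
    # 5) Bellman matrix and the grid point that maximizes it
    bellman = u + beta_eff * v_interp
    ia, il = np.unravel_index(np.argmax(bellman), bellman.shape)
    return a_grid[ia], l_grid[il], bellman[ia, il]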
Figure 11.8 Policy Functions as a Function of Wealth: labor supply l(20, e_2, θ_4, ã, x̃) (upper panel) and savings ã′(20, e_2, θ_4, ã, x̃) − ã (lower panel), plotted against wealth ã for x̃ ∈ {0, 0.33, 0.67}.
In Figure 11.9, we plot the policy functions as a function of accumulated earnings x̃ for three different levels of wealth, ã ∈ {0, 0.45, 0.95}, where the upper two values correspond to 26.3% and 55.4% of the average wealth Ω̃ in the economy. Clearly, the policy functions are rather flat with respect to the state variable x̃, and we can be quite confident that a coarse grid for accumulated earnings x̃ is sufficient. The accuracy of the policy functions as measured by the Euler residuals is much lower than in the case of methods that search for the optimum between grid points. For n_a = 200 and n_l = 30 grid points on the asset space A and the labor grid L, respectively, we find that the mean absolute Euler residuals of the young and the old amount to 3.62% and 3.63%, respectively, and are therefore higher by a factor of approximately 10–50 than those from the model with lump-sum pensions above.
Figure 11.9 Policy Functions as a Function of Accumulated Average Earnings: labor supply l(20, e_2, θ_4, ã, x̃) (upper panel) and savings ã′(20, e_2, θ_4, ã, x̃) − ã (lower panel), plotted against accumulated earnings x̃ for ã ∈ {0, 0.45, 0.95}.
Obviously, the accuracy is low and may be prohibitive if we would like to derive policy implications. To improve upon the accuracy, we increase the number of grid points on A and L to n_a = 400 and n_l = 60 once the aggregate state variables (K̃, L̃, t̃r, x̄) are close to convergence. In particular, we increase the number of grid points in Step 7 as soon as the percentage deviation of the aggregate capital stock K̃ falls below 0.1%. We continue to iterate over the aggregate capital stock until it converges again and deviates by less than 0.1% from its value in the previous outer iteration. By starting our program with a low accuracy that we increase only shortly prior to convergence, we economize considerably on computational time. In the final iteration over the aggregate variables, the absolute Euler residuals still average 3.27% and 1.65% for the worker and the retiree, respectively.
In Step 5, we compute the endogenous distribution of wealth and accumulated earnings, f(s, e, θ, ã, x̃), in each cohort s for the n_e permanent and n_θ idiosyncratic productivity types. Different from the model with lump-sum pensions, where we interpolated between grid points to find the optimal levels of ã′, we choose the same number of grid points n_a for individual wealth as in the computation of the policy functions. Since we require the next-period wealth ã′ to be a grid point on A, there is no need to choose a finer grid. The cumulative distributions of individual wealth ã in the total population, F(·, ·, ·, ã, ·), and of accumulated average earnings x̃ of a 46-year-old at the start of retirement, F(46, ·, ·, ·, x̃)/μ_46, are illustrated in Figure 11.10. The measure of households with wealth ã ≥ 17.5 is zero, so the upper boundary ã_max = 20.0 is chosen sufficiently high. Similarly, the measure of 46-year-old households with accumulated earnings x̃ ≥ 2.33 is equal to zero, so the upper boundary x̃_max = 3.0 is also chosen adequately. In addition, we checked the distribution of accumulated average earnings at different working ages (it is constant during retirement) and made the same observation. When you set up the program, it is advisable to examine the distribution of the individual states both at the beginning and in the final run of the program.
COMPUTATIONAL TIME. The computational time is summarized in Table 11.3. Using GAUSS, the computation of the optimal policy for given aggregates takes 31 minutes and 47 seconds, while the computation of the distribution is much faster and only requires 1 minute and 13 seconds. Total computational time is considerable, amounting to more than 1 day. Comparing the three languages, we find that Julia is approximately twice as slow as GAUSS, while Python is by far the slowest of the three computer languages considered. In fact, the total computational time for the Python code amounts to more than 6 days! This ranking is in good accordance with our own personal experience from the computation of large-scale OLG models with heterogeneous agents. In some applications, however, we have also experienced Julia code being faster than GAUSS code (for example, in the present model, in the computation of the distribution), so an unambiguous overall ranking of the two programming languages GAUSS and Julia with respect to computational speed is only tentative.
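A stylized sketch of the forward iteration on the cohort distribution in Step 5 might look as follows; the array dimensions, the index-valued policy functions, and the transition matrix of the idiosyncratic productivity are placeholders introduced only for this illustration.

import numpy as np

ne, ntheta, na, nx = 2, 5, 200, 10
pi_theta = np.full((ntheta, ntheta), 1.0 / ntheta)   # Markov matrix of theta (assumption)

def age_distribution(f_s, ia_pol, ix_pol, phi_s):
    """f_s: measure of cohort s over (e, theta, a, x);
    ia_pol, ix_pol: optimal grid indices of a' and x' for every state."""
    f_next = np.zeros_like(f_s)
    for ie in range(ne):
        for it in range(ntheta):
            for ia in range(na):
                for ix in range(nx):
                    mass = f_s[ie, it, ia, ix]
                    if mass == 0.0:
                        continue
                    ia_p = ia_pol[ie, it, ia, ix]
                    ix_p = ix_pol[ie, it, ia, ix]
                    # survivors keep e, draw theta', and move to the grid points (a', x')
                    for itp in range(ntheta):
                        f_next[ie, itp, ia_p, ix_p] += phi_s * mass * pi_theta[it, itp]
    return f_next

Because the optimal next-period wealth (and, in this sketch, also the next-period accumulated earnings) lies on the grid, no mass has to be split between neighboring grid points; the normalization by the cohort shares is handled outside the function.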
Figure 11.10 Cumulative Distribution Functions: the cumulative distribution of wealth in the total population, F(·, ·, ·, ã, ·) (upper panel), and the cumulative distribution of accumulated earnings of the 46-year-old cohort, F(46, ·, ·, ·, x̃)/μ_46 (lower panel).
RESULTS. Aggregate savings amount to Ω̃ = 1.713, implying an aggregate capital stock equal to K̃ = 1.376. Both values are slightly lower than the corresponding values found in the model with lump-sum pensions (with Ω̃ = 1.824 and K̃ = 1.486). There are several opposing effects of contribution-based pensions on aggregate savings. First, workers with high income will receive a higher pension and, hence, need to save less to provide for old age, so aggregate savings fall. Second, workers with low income need to increase their old-age savings. Third, with contribution-based pensions, the incentive to supply labor increases for all workers such that income and, hence, savings increase. In fact, aggregate labor in efficiency units amounts to L̃ = 0.327 and is higher than in the case of lump-sum pensions above (with L̃ = 0.310). Fourth, with our new calibration of the pension schedule, the average replacement rate of pensions increases from 35% (with lump-sum pensions) to 53%.
Table 11.3 Comparison of Runtime

            Policy     Distribution   Total
Julia       1:17:03    0:30           3:10:41:56
GAUSS       31:47      1:13           1:11:13:02
Python      2:28:12    24:38          6:01:17:04

Notes: Runtime is given in days:hours:minutes:seconds on a machine with an Intel(R) Xeon(R) CPU running at 2.90 GHz. The first and second entry columns report the time for one execution of Steps 4 and 5 of Algorithm 11.2.1, respectively, using na = 200, nx = 10, and nl = 30 grid points.
As a consequence, all households have to save less for old age, ceteris paribus. In our calibration, we find that the negative effects (the first and the fourth) dominate the positive effects (the second and the third) on savings.
Our main interest in the study of this multi-dimensional optimization problem is motivated by the question of whether contribution-based pensions help to explain the high wealth inequality observed empirically. Our hope was that providing higher pensions to retirees who earned high incomes during their working life would also increase wealth inequality. However, the Gini coefficient of wealth even decreases from 0.66 (with lump-sum pensions) to 0.64 (with earnings-dependent pensions), as the low- (high-) income workers have to save a higher (lower) share of their income for old age.
11.3 Overlapping Generations with Aggregate Uncertainty

In this section, we introduce aggregate uncertainty into the standard OLG model. One prominent application of OLG models with aggregate uncertainty is to consider the effects of aging and the public pension system on the equity premium.51 Having a higher share of older agents is likely to increase the returns from stocks relative to those from bonds. Old agents prefer to hold a large part of their wealth in the form of safe assets such as bonds because their (relatively safe) nonfinancial wealth in the form of discounted pensions is relatively small. Younger agents may prefer to invest predominantly in risky assets such as stocks because their total wealth mainly consists of discounted lifetime labor income that is characterized by relatively little risk.52
51 Please see also Section 5.7 of this book on the equity premium puzzle.
Aging may now increase the demand for bonds relative to stocks and thus increase the equity premium. If, however, public pension systems are changed from pay-as-you-go to fully funded systems, total savings may increase, and if the pension funds invest the additional savings primarily in the stock market, the equity premium may fall. Brooks (2002) quantitatively explores the impact of the baby boom on stock and bond returns in a model with four overlapping generations. He predicts a sharp rise in the equity premium when the baby boomers retire. As an important step toward answering this question in a more realistic setting, Storesletten et al. (2007) consider an OLG model with annual periods. They analyze the effects of idiosyncratic risk and life-cycle aspects of asset pricing.
Obviously, in models with aggregate uncertainty, we can no longer study the transition dynamics because the time path is stochastic. To see this point, assume that the US is in a steady state at present (period t = 0) and that there is a sudden, unexpected, and permanent decline in the fertility rate. If aggregate technology were a deterministic variable, we could compute the transition path just as in Section 10.4. In this case, agents would have to predict the time path of the factor prices. We would know that after some 200–300 periods the economy is close to the new steady state and that we may stop the computation. If technology is stochastic, however, agents can only form expectations about the time path of factor prices, and the number of possible time paths becomes infinite. Assume that technology can take only two different values. Even in this case, we would have to compute some 2^n different transition paths, with n denoting the number of periods. Given our experience, it is safe to assume that in a model with annual periods, n should be in the range of 200–300. The computational costs become unbearable.53 Therefore, we will confine the analysis of OLG models with aggregate uncertainty to the study of the dynamics around the steady state.
In the following, we describe two methods for the computation of the dynamics close to the steady state that you have already encountered in previous chapters of the book. First, we will consider models without idiosyncratic uncertainty.
52 There are numerous variables other than age, such as housing, that influence the portfolio decision of households.
53 As one possible solution to this problem, one can use Monte Carlo simulation techniques to compute multiple possible transition paths and the associated distribution of the factor prices during the transition. We will not pursue this method here.
In these models, we can compute the steady state by solving directly for the individual state variables with the help of nonlinear equation solvers. The method has been described in Section 10.2. Although the state space may be quite large and include hundreds of variables, it is often possible to apply first-order or even higher-order perturbation methods as described in Chapters 2 and 3 to compute the solution. In Section 11.3.1 below, we will use first-order perturbation to study the business cycle dynamics of an OLG model with 280 overlapping generations. Second, we consider an OLG model with both idiosyncratic and aggregate uncertainty in Section 11.3.4. As the most practical approach to the solution of such a problem, we advocate the algorithm of Krusell and Smith (1998) from Section 9.5.2. We will use the latter method to compute the business cycle dynamics of the income distribution and compare our results with those from the model with infinitely lived households from Section 9.5.2.
11.3.1 Perturbation Methods

There are few studies that apply first- or second-order perturbation methods to OLG models of large-scale economies. In his pioneering work, Ríos-Rull (1996) considers the dynamics in a stochastic life-cycle model and compares the real business cycle statistics in the OLG model with those in the standard neoclassical growth model.54 He finds that in the two models, the aggregate variables such as output, investment, and consumption are characterized by similar second moments. The observation that business cycle properties are identical in representative-agent and OLG models, however, does not hold in general. For example, Braun and Ikeda (2021), who study monetary policy over the life cycle in a stochastic New Keynesian OLG model, show that the sign, magnitude, and persistence of household consumption responses to monetary shocks critically depend on age. In our own work (Heer and Maußner (2012)), we consider the redistributive effects of inflation following an unanticipated monetary expansion. We find that unanticipated inflation that is caused by a monetary shock reduces income inequality. Heer and Scharrer (2018) consider the redistributive effects of a government demand shock and show that an unexpected increase in government spending decreases both income and wealth inequality.
54 Different from our algorithm in this section, he concentrates on the analysis of a Pareto-optimal economy and studies the problem of a central planner. In particular, he uses the LQ-approximation presented in Section 2.4 to compute the solution of this model.
In particular, and contrary to conventional wisdom, they show that debt rather than tax financing of additional government spending may also harm retirees.
11.3.2 The OLG Model with Quarterly Periods

In the following, we illustrate the numerical and analytical methods with the help of a 280-period OLG model that is described in Example 11.3.1. The model builds upon that considered in Section 10.3. However, while we neglect government debt, we subsequently consider quarterly rather than annual periods to study business cycle dynamics. In addition, we introduce both a technology shock ε_t^Z and a government demand shock ε_t^G into the model. In particular, both the (logarithmic) aggregate technology level Z_t and government consumption G_t follow AR(1)-processes.

Example 11.3.1
The model consists of three sectors: households, firms, and the government.

Demographics. The demographics are modeled with respect to the characteristics of the US economy in the year 2015. Total population N_t grows at the constant rate n:

N_{t+1} = (1 + n) N_t.   (11.39)

Households live a maximum of T = 280 periods (= quarters) corresponding to 70 years (real-life ages 20 to 89). They survive from age (quarter) s to age (quarter) s + 1 with probability φ_s (with φ_0 ≡ 1.0). Let μ_t^s denote the share of generation s in the total population in period t. We consider a stationary population that is characterized by constant μ^s, s = 1, ..., T. For the first T^W = 180 periods (= 45 years), households are working; for the last T − T^W = 100 periods (= 25 years), they are retired and receive pensions.

Households. Households maximize expected lifetime utility at age 1 in period t:

E_t \sum_{s=1}^{T} \beta^{s-1} \left( \prod_{j=1}^{s} \phi_{j-1} \right) u(c_{t+s-1}^s, 1 - l_{t+s-1}^s).
Instantaneous utility is a function of both consumption c and leisure 1 − l:

u(c, 1 - l) = \frac{\left( c^{\gamma} (1 - l)^{1-\gamma} \right)^{1-\eta} - 1}{1 - \eta}.
The working agent of age s with age-dependent productivity ȳ^s faces the following budget constraint in period t:

k_{t+1}^{s+1} = \left[ 1 + (1 - \tau^k) r_t \right] k_t^s + (1 - \tau^l - \tau_t^p) w_t A_t \bar{y}^s l_t^s + tr_t - (1 + \tau^c) c_t^s, \quad s = 1, \ldots, 180,   (11.40)
where r_t, w_t, and A_t denote the interest rate, the wage rate, and aggregate labor productivity in period t. The government imposes constant taxes τ^c, τ^k, and τ^l on consumption as well as on capital and labor income, while the social security contribution rate τ_t^p is time-dependent. In addition, the households receive government transfers tr_t. Households are born with zero wealth, k_t^1 = 0.
Retirees receive pensions pen_t and government transfers tr_t. Accordingly, their budget constraint is given by

k_{t+1}^{s+1} = \left[ 1 + (1 - \tau^k) r_t \right] k_t^s + pen_t + tr_t - (1 + \tau^c) c_t^s, \quad s = 181, \ldots, 280,   (11.41)

with k_t^{T+1} ≡ 0 and l_t^{T^W+1} = l_t^{T^W+2} = \ldots = l_t^T = 0.
Production. Goods and factor markets are competitive. Production Y_t is characterized by constant returns to scale and assumed to be Cobb-Douglas:

Y_t = Z_t K_t^{\alpha} (A_t L_t)^{1-\alpha},   (11.42)

where aggregate labor productivity A_t grows at an exogenous rate g_A:

A_{t+1} = (1 + g_A) A_t,   (11.43)

and technology ln Z_t follows the AR(1)-process:

\ln Z_t = \rho \ln Z_{t-1} + \epsilon_t^Z.

The technology innovation ε_t^Z is i.i.d., ε_t^Z ∼ N(0, σ_Z²). In a factor market equilibrium, factors are rewarded with their marginal product:

w_t = (1 - \alpha) Z_t K_t^{\alpha} (A_t L_t)^{-\alpha},   (11.44a)
r_t = \alpha Z_t K_t^{\alpha-1} (A_t L_t)^{1-\alpha} - \delta,   (11.44b)
where capital K_t depreciates at the quarterly rate δ and L_t denotes aggregate effective labor.

Public Sector. The government receives taxes from consumption and (labor and capital) income and collects accidental bequests Beq_t:55

Beq_t = \sum_{s=1}^{T-1} (1 - \phi_s) \mu_t^s N_t k_{t+1}^{s+1}.   (11.45)

Total government revenues are spent on government consumption G_t and lump-sum transfers to all households Tr_t. Stationary public consumption, G̃_t := G_t/(A_t N_t), is stochastic and follows an AR(1)-process:

\ln \tilde{G}_t = (1 - \rho_G) \ln \tilde{G} + \rho_G \ln \tilde{G}_{t-1} + \epsilon_t^G,   (11.46)

where ε_t^G is i.i.d., ε_t^G ∼ N(0, σ_G²). The government budget is balanced in every period t:

\tau^l w_t A_t L_t + \tau^k r_t K_t + \tau^c C_t + Beq_t = G_t + Tr_t,   (11.47)

where C_t denotes aggregate consumption.
The social security authority pays pen_t to all retirees and adjusts the social security contribution rate τ_t^p on wage income such that its budget is balanced in every period t:

Pen_t = \sum_{s=T^W+1}^{T} \mu_t^s N_t \, pen_t = \tau_t^p w_t A_t L_t.   (11.48)

55 (11.45) is derived in Appendix A.10.
Equilibrium. In equilibrium, individual and aggregate behavior are consistent:

L_t = \sum_{s=1}^{T^W} \mu_t^s N_t \bar{y}^s l_t^s,   (11.49a)
K_t = \sum_{s=1}^{T} \mu_t^s N_t k_t^s,   (11.49b)
\bar{l}_t = \frac{\sum_{s=1}^{T^W} \mu_t^s N_t l_t^s}{\sum_{s=1}^{T^W} \mu_t^s N_t},   (11.49c)
C_t = \sum_{s=1}^{T} \mu_t^s N_t c_t^s,   (11.49d)
Tr_t = \sum_{s=1}^{T} \mu_t^s N_t \, tr_t,   (11.49e)
and the goods market clears:

Z_t (A_t L_t)^{1-\alpha} K_t^{\alpha} = C_t + G_t + I_t,   (11.50)

with investment I_t:

I_t = K_{t+1} - (1 - \delta) K_t.   (11.51)
Calibration. The discount factor β = 0.99 is chosen in accordance with its value in the business cycle models presented in the first part of the book, while γ = 0.29 is set such that average labor supply amounts to approximately l̄ = 0.30. The remaining preference, production, and tax parameter values are chosen as in Section 10.3 (adjusted to quarterly rates): η = 2.0, α = 0.35, δ = 2.075%, g_A = 0.50%, τ^c = 5.0%, τ^k = 36.0%, and τ^l + τ^p = 28.0%. Government consumption in steady state is equal to 18% of GDP. Pensions pen are constant and calibrated such that the nonstochastic replacement rate of pensions relative to net wage earnings is equal to

repl^{net} = \frac{pen}{(1 - \tau^l - \tau^p) w A \bar{l}} = 49.4\%

in accordance with OECD (2019), where l̄ is the average labor supply in the nonstochastic steady state of the economy. The stochastic survival probabilities φ_s are calibrated for the US economy in the year 2015 using UN (2015); in addition, we set the annual population growth rate equal to n = 0.754%. The age-efficiency profile is taken from Hansen (1993) and interpolated to values in between years. The parameters of the AR(1)-process for (logarithmic) technology Z_t are set equal to ρ = 0.95 and σ_Z = 0.00763 as in Prescott (1986). Following Schmitt-Grohé and Uribe (2007), we use ρ_G = 0.87 and σ_G = 0.016 for the parameters of the autoregressive process for (logarithmic) government consumption G_t.
COMPUTATION. We solve for the local dynamics of the model with the help of perturbation methods as described in Chapters 2 and 3. In particular, we use the toolbox CoRRAM. Therefore, we first have to solve for the nonstochastic steady state and, second, provide the contemporaneous and dynamic equilibrium conditions in a procedure/function to CoRRAM. The GAUSS computer code AK280_perturb.g that is available as a download from our homepage computes Example 11.3.1.56 The total runtime amounts to 41 minutes on a machine with an Intel(R) Xeon(R) CPU running at 2.90 GHz, of which 39 minutes are required for the computation of the steady state versus 2 minutes for the computation of the dynamics.
NONSTOCHASTIC STEADY STATE. For the economy described in Example 11.3.1, we can compute the nonstochastic steady state with the help of the methods described in Section 10.2 by solving a large-scale nonlinear system of equations (in 464 variables). For this reason, we transform the model equations using the following individual stationary variables
\tilde{k}_t^s = \frac{k_t^s}{A_t}, \quad \tilde{c}_t^s = \frac{c_t^s}{A_t}, \quad \tilde{tr}_t = \frac{tr_t}{A_t}, \quad \widetilde{pen}_t = \frac{pen_t}{A_t},

and aggregate stationary variables
56 To run the program AK280_perturb.g, you first need to install the CoRRAM package. For a description of the package, see Section 3.5.
\tilde{G}_t = \frac{G_t}{A_t N_t}, \quad \widetilde{Tr}_t = \frac{Tr_t}{A_t N_t}, \quad \widetilde{Pen}_t = \frac{Pen_t}{A_t N_t}, \quad \tilde{K}_t = \frac{K_t}{A_t N_t}, \quad \widetilde{Beq}_t = \frac{Beq_t}{A_t N_t}, \quad \tilde{L}_t = \frac{L_t}{N_t},

with \tilde{tr}_t = \widetilde{Tr}_t in equilibrium. Using stationary variables, we can express the budget constraint of the households with the help of

(1 + \tau^c) \tilde{c}_t^s + (1 + g_A) \tilde{k}_{t+1}^{s+1} - \left[ 1 + (1 - \tau^k) r_t \right] \tilde{k}_t^s =
\begin{cases} (1 - \tau^l - \tau_t^p) w_t \bar{y}^s l_t^s + \tilde{tr}_t, & s = 1, \ldots, T^W, \\ \widetilde{pen}_t + \tilde{tr}_t, & s = T^W + 1, \ldots, T. \end{cases}   (11.52)

To solve the individual optimization model, we also need to derive the stationary first-order conditions of the household presented by
(1 - \tau^l - \tau_t^p) w_t \bar{y}^s (1 - l_t^s) = \frac{1 - \gamma}{\gamma} (1 + \tau^c) \tilde{c}_t^s, \quad s = 1, \ldots, T^W,   (11.53a)

\left[ (1 + g_A) \tilde{c}_t^s \right]^{\gamma(1-\eta)-1} \left( 1 - l_t^s \right)^{(1-\gamma)(1-\eta)} = \beta \phi_s E_t \left\{ \left[ 1 + (1 - \tau^k) r_{t+1} \right] \left( \tilde{c}_{t+1}^{s+1} \right)^{\gamma(1-\eta)-1} \left( 1 - l_{t+1}^{s+1} \right)^{(1-\gamma)(1-\eta)} \right\}, \quad s = 1, \ldots, T - 1.   (11.53b)
(11.53a) describes the optimal labor supply of the worker at age s = 1, ..., T^W, while (11.53b) presents the Euler condition of the household at age s = 1, ..., T − 1. According to (11.53a), the marginal utility from an additional unit of leisure is equal to the utility gain from the consumption that results from the wage of an additional unit of labor. (11.53b) equates the marginal utility gain from consuming one additional unit in the present period to the gain that results from saving it and consuming it in the next period.
The nonstochastic steady state is characterized by a constant technology level and constant government demand, Z_t = Z = 1 and G̃_t = G̃. Furthermore, all individual and aggregate variables are constant and are denoted by a variable without a time index. For example, k̃^s and K̃ denote the nonstochastic steady-state capital stock of the individual at age s and the nonstochastic steady-state aggregate capital stock, respectively.
The nonstochastic steady state is described by the 464 equations consisting of the 180 first-order conditions with respect to labor, (11.53a),
the 279 Euler equations, (11.53b), the fiscal budget constraint, (11.47), the social security budget, (11.48), and the 3 aggregate consistency conditions, (11.49a)–(11.49c). The 464 endogenous variables consist of the 279 individual capital stocks k̃^s, s = 2, ..., T (recall that k̃^1 = 0), the 180 individual labor supplies l^s, s = 1, ..., T^W, government transfers t̃r, the social security contribution rate τ^p, and the 3 aggregate variables K̃, L̃, and l̄. In addition, we eliminate consumption c̃^s from these equilibrium conditions with the help of the individual budget constraints, (11.52), the factor prices w and r with the help of the marginal products of capital and labor, (11.44a) and (11.44b), and pensions with the calibration of their replacement rate with respect to net wages, p̃en = repl^{net} × (1 − τ^l − τ^p) w l̄. The equilibrium conditions of the steady state are provided in the procedure steady_state_public(x) in the computer program AK280_perturb.g.
The main challenge in the solution of the nonlinear system of equations for the steady state is finding a good initial value such that the Newton-Raphson algorithm is able to locate the root. There is no general rule for how you should find such a value. Often, you need to start with a simpler problem and apply a trial-and-error procedure to arrive at the initial value for the final problem.57 This is how we proceed in the following.
We start with a simple OLG model where households only live for a maximum of 20 periods (= quarters) and work all their life. In addition, we assume that transfers are zero and neglect bequests and government consumption but include the tax rates in the household optimization problem. As a consequence, we do not solve a general but only a partial equilibrium model.58 To compute the solution to this initial model, we first compute the measures of the 20 cohorts by setting μ^1 = 1 and iterating over s = 1, ..., 19 as follows:
\mu^{s+1} = \frac{\phi_s}{1 + n} \mu^s.
We normalize the measures \{\mu^s\}_{s=1}^{20} by dividing by the sum of all measures such that \sum_s \mu^s = 1.0.
57 Please also read Section 10.3 for more on this topic.
58 We also experimented with other initial specifications of the economy consisting of models with a larger number of periods or endogenous labor supply. However, in this case, the Newton-Raphson algorithm did not converge for our initial guesses of the endogenous variables.
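The computation of the cohort measures just described amounts to only a few lines of code; the survival probabilities and the growth rate used below are placeholders, not the calibrated values.

import numpy as np

n = 0.00754 / 4.0                  # quarterly population growth rate (assumption)
phi = np.full(19, 0.999)           # survival probabilities phi_s, s = 1,...,19 (placeholders)

mu = np.empty(20)
mu[0] = 1.0                        # mu^1 = 1
for s in range(19):
    mu[s + 1] = phi[s] / (1.0 + n) * mu[s]
mu = mu / mu.sum()                 # normalize so that the measures sum to 1.0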
Next, we use an outer and an inner loop to compute the steady state in this partial equilibrium model (where the fiscal budget is not balanced). In the outer loop, we iterate over the aggregate variables (K̃, L̃, τ^p, l̄). The aggregate variables imply the factor prices that we need for the computation of the individual optimal policies. We initialize the average and aggregate effective labor supply with l̄ = 0.30 and L̃ = 0.30, respectively. Using the first-order condition of the firms with respect to their capital demand, (11.44b), we can derive the initial guess for the aggregate capital stock, K̃ = 1.256. The social security contribution rate is set to τ^p = 0% because we do not consider any retirees yet.
In the inner loop, we solve the individual household optimization problem for given exogenous factor prices w and r and tax rates (τ^c, τ^l, τ^p, τ^k). The optimal policy functions c^s, l^s, and k^{s+1} are computed as the solution of the procedure ss_exog_lab(x) in the program AK280_perturb.g. The 19 nonlinear equations in this procedure consist of the 19 Euler conditions, (11.53b). The 20 budget constraints, (11.52), are used to substitute for the variables c^s, s = 1, ..., 20, in these Euler equations. The exogenous labor supplies are all set equal to l^s = 0.30, s = 1, ..., 20. As an initial guess for the 19 endogenous variables k^s, s = 2, ..., 20, we assume a simple life-cycle profile of savings. In particular, savings k^s increase from k^1 = 0 to a maximum k^19 at age 19 and then fall by half at age 20. This profile is founded on our experience with life-cycle savings in OLG models from this and the previous chapter. We set the maximum individual capital stock to twice the average (= aggregate) capital stock K̃. Once we have computed the solution to the individual problem, we compute the new aggregate capital stock K̃ by aggregating over the individual capital stocks k^s using (11.49b). We update the old aggregate capital stock by 20% and continue to iterate over the outer loop. In the inner loop, we use the solution from the previous iteration as an initial guess for {k^s}_{s=2}^{20}. At this point, we do not need high accuracy (which is only needed in the final steady state with 280 cohorts and a fully specified general equilibrium model with a balanced government budget). Therefore, we only use 5 iterations over the aggregate variables to save on computational time.
Next, we add another period of life in each step and recompute the steady state. Each time, we continue to iterate 5 times over the aggregate variables and use the optimal policy functions found in the previous iteration. To provide a new guess for the capital stock in the newly added period at the end of life, we simply assume it to be half the value of the end-of-life wealth found in the last iteration with a shorter lifetime. Once we have completed the computation of the model with 180 cohorts of workers, we start to add cohorts of retirees one at a time.
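A minimal Python sketch of the inner loop described above, which solves the Euler equations of the 20-period start-up model for given factor prices and exogenous labor, might look as follows. The parameter values, the placeholder age-efficiency profile, the omission of the growth adjustment of detrended consumption, and the simple initial guess are assumptions of this sketch and do not reproduce the procedure ss_exog_lab(x).

import numpy as np
from scipy.optimize import root

T = 20
beta, eta, gamma = 0.99, 2.0, 0.29
tauc, tauk, taul = 0.05, 0.36, 0.28
gA = 0.005
w, r = 1.0, 0.01                       # factor prices from the outer loop (placeholders)
ybar = np.ones(T)                      # age-efficiency profile (placeholder)
lab = np.full(T, 0.30)                 # exogenous labor supply
phi = np.ones(T)                       # survival probabilities (placeholders)

def consumption(k):
    """Consumption c^s implied by the budget constraints; k = (k^2,...,k^20)."""
    kk = np.concatenate(([0.0], k, [0.0]))        # k^1 = k^21 = 0
    return ((1.0 + (1.0 - tauk) * r) * kk[:-1]
            + (1.0 - taul) * w * ybar * lab
            - (1.0 + gA) * kk[1:]) / (1.0 + tauc)

def euler_residuals(k):
    c = np.maximum(consumption(k), 1e-10)
    muc = c**(gamma * (1.0 - eta) - 1.0) * (1.0 - lab)**((1.0 - gamma) * (1.0 - eta))
    # 19 Euler equations; the growth correction of detrended consumption is omitted here
    return muc[:-1] - beta * phi[:-1] * (1.0 + (1.0 - tauk) * r) * muc[1:]

k0 = np.linspace(0.05, 1.0, T - 1)     # simple increasing initial guess
sol = root(euler_residuals, k0, method="hybr")
k_star = sol.x                          # individual capital stocks k^2,...,k^20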
In the outer loop over the aggregate variables, we therefore also need to compute the pensions that are calibrated with the help of the net replacement rate. In addition, we compute the social security contribution rate τ^p that balances the budget of the social security authority, (11.48).
In the next step, we compute the same model but with endogenous labor supply. The nonlinear system of equations now additionally contains the 180 first-order conditions with respect to labor, (11.53a). If we attempt to compute the solution with the help of the guess l^s = 0.30, s = 1, ..., 280, the Newton-Raphson algorithm does not converge. We therefore have to use a more incremental approach. We first endogenize l^1 and maintain the assumption of an exogenous labor supply l^s = 0.30 for the other cohorts s = 2, ..., 280. Next, we endogenize the labor supply of the second cohort, and so on.59
In the final model type, we include the government budget in the nonlinear system of equations and endogenize government transfers t̃r as a new variable. We initialize the variable t̃r at zero. We solve the complete nonlinear system of equations in all individual and aggregate variables in one loop.
Our solutions for the aggregate variables are as follows: K̃ = 8.477, G̃ = 0.148, L̃ = 0.235, l̄ = 0.300, Ỹ = 0.824, C̃ = 0.442, Ĩ = 0.234, τ^l = 18.21%, τ^p = 9.79%, and t̃r = 0.0311.60 The optimal policy functions of the cohorts s = 1, ..., 280 are displayed in Figure 11.11, which presents individual wealth k^s (upper-left panel), consumption c^s (upper-right panel), and labor supply l^s (bottom panel). Individual wealth k^s is hump-shaped and peaks around age 150 (corresponding to real-life age 57). Households start to save for old age beginning in period s = 35. Prior to this age, wage income is too low due to low age productivity ȳ^s, so they instead accumulate debt to finance consumption. Individual consumption c^s is also hump-shaped but peaks only in the last period of working life. The drop in consumption during retirement is explained by the increase in leisure and the utility-smoothing behavior of individuals. Working hours l^s are also hump-shaped but peak prior to age efficiency ȳ^s due to the wealth effect on labor supply.
59 The nonlinear system of equations is specified in the procedure ss_endog_lab in the program AK280_perturb.g.
60 We calibrated γ = 0.29 such that average labor supply is equal to l̄ = 0.300. We found this value of γ by using a grid over the parameter and computing the nonstochastic steady state in each case.
Figure 11.11 Steady-State Age Profiles of Capital, Consumption, and Working Hours in the OLG Model of Example 11.3.1: capital holdings k^s (upper-left panel), consumption c^s (upper-right panel), and working hours l^s (bottom panel), plotted against age s.
With the help of the individual policy functions and the measures μ^s of cohort s = 1, ..., 280, we can also compute inequality measures of the wealth and income distribution. We will use the Gini coefficient that, for example, is defined as follows in the case of the wealth distribution:61

Gini = 1 - \frac{\sum_{s=1}^{T} \mu^s (S_{s-1} + S_s)}{S_T},   (11.54)

where the accumulated wealth is defined by

S_s = \sum_{j=1}^{s} \mu^j k^j

with S_0 = 0. We also compute the Gini coefficient for gross income and earnings. Gross income amounts to wage income (pensions) plus interest income for the workers (retirees) and transfers. Earnings are simply equal to wage income.62

61 Compare (8.16).
62 In the computation of the Gini for earnings, we have to modify the computation of the Gini coefficient such that the shares of the workers sum up to 1.0.
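For reference, a direct implementation of (11.54) requires only a few lines; the only subtlety is that the formula presupposes that the cohorts are ordered by wealth, which the sketch below handles by sorting.

import numpy as np

def gini(mu, k):
    """Gini coefficient of wealth according to (11.54); mu are the cohort
    measures and k the corresponding wealth levels."""
    order = np.argsort(k)                   # cohorts must be ordered by wealth
    mu, k = mu[order], k[order]
    S = np.cumsum(mu * k)                   # accumulated wealth S_s with S_0 = 0
    S_prev = np.concatenate(([0.0], S[:-1]))
    return 1.0 - np.sum(mu * (S_prev + S)) / S[-1]

# Example: five equally large cohorts with wealth 0, 1, 2, 3, 4 yield a Gini of 0.4
print(gini(np.full(5, 0.2), np.arange(5.0)))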
The Gini coefficients of wealth, income, and earnings amount to 0.418, 0.201, and 0.121 in the nonstochastic steady state of the model and fall short of empirical values, as we only consider inter-cohort and not intra-cohort inequality.63
COMPUTATION OF THE DYNAMICS. Our main research question concerns the cyclical behavior of the aggregate variables and the (income and wealth) distribution in the economy. We use a first-order perturbation to solve for the dynamics and apply the notation described in Chapters 2–4. In the set of control variables in our dynamic system of equations, we also include GDP Ỹ together with the aggregate demand components consumption C̃, investment Ĩ, and government consumption G̃ as well as the factor prices and the Gini coefficients of wealth, income, and earnings because we would like to study their behavior in response to technology and government demand shocks.
As input into the solver package CoRRAM, we specify a procedure OLG_Eqs1 in our computer program AK280_perturb.g that contains a total of 755 (static and dynamic) equations in 280 state variables, 473 control variables, and two exogenous shocks, ln Z_t and ε_t^G. The state variables consist of the 279 individual capital stocks k̃_t^s, s = 2, ..., 280, and public consumption in the previous period, G̃_{t−1}. The inclusion of the latter variable among the state variables results from the stochastic autoregressive behavior of government consumption, (11.46). The control variables consist of the 280 individual consumptions c̃^s, s = 1, ..., 280, the 180 individual labor supplies l^s, s = 1, ..., 180, and the 13 aggregate variables Ỹ_t, K̃_t, L̃_t, C̃_t, Ĩ_t, G̃_t, w_t, r_t, τ_t^p, t̃r_t, Gini_t^{wealth}, Gini_t^{income}, and
Gini_t^{earnings}. In addition, we need to provide CoRRAM with the information that there are 194 static equations among the 755 equations, consisting of the 180 first-order conditions with respect to individual labor supply, the aggregate consistency conditions, the first-order conditions of the firms, the production function, the definitions of the Gini coefficients, the goods market equilibrium,64 the budget constraint of a household in its last period of life, s = 280, and the budget constraints of the fiscal and social security authorities.
63 You are asked to introduce within-cohort inequality in Problem 11.3.
64 We use the goods market equilibrium to compute investment Ĩ_t = Ỹ_t − C̃_t − G̃_t. As you know from Walras' law, one equilibrium equation in the model is redundant. In our case, we do not specify the aggregate capital accumulation, (1 + g_A)(1 + n)K̃_{t+1} = (1 − δ)K̃_t + Ĩ_t, in the procedure OLG_EQ1.
The dynamic system characterizing our model has 282 eigenvalues with absolute value less than one, which is exactly the number of predetermined variables (the number of state variables and exogenous shocks). Therefore, our economy is locally stable.65
11.3.3 Business Cycle Dynamics of Aggregates and Inequality

Following the procedure presented in Chapter 4, we study impulse responses and second moments in our business cycle model. In particular, we are interested in the cyclical distribution effects on income and wealth.
EFFECTS OF A TECHNOLOGY SHOCK. Figure 11.12 illustrates the effects of a technology shock in period 2 in the amount of one standard deviation, ε_2^Z = 0.76%. As a consequence, production Ỹ_t increases by 1.1% (black line in the upper-right panel). In addition, wages w_t (blue line in the bottom-left panel) and the interest rate r_t (not illustrated) both increase, and the households increase their individual labor supplies such that aggregate labor L̃_t increases by 0.46% (black line, bottom left). Due to the higher income, households increase both consumption and savings such that consumption C̃_t (blue line, top right) and investment Ĩ_t (red line, top right) increase by 0.5% and 2.8%, respectively. Note that both the qualitative and quantitative responses are in good, though not exact, accordance with those for the benchmark model presented in Figure 1.7 in Section 1.6. For example, the immediate responses of consumption and investment are approximately 0.5% and 3.5% in the representative-agent model.
The distribution effects of a technology shock are depicted in the bottom-right panel of Figure 11.12. The Gini coefficient of earnings (red line) falls by 0.6% on impact. The decline in inequality is explained by the stronger labor response of younger households to a wage increase compared to that of older households. Households at a young age have lower wealth and a longer remaining horizon over which to substitute labor intertemporally. As a consequence, earnings increase by a larger percentage at young than at old ages. The younger households subsequently also increase their savings to a larger extent than the older workers, such that wealth inequality slowly starts to decline.
65 See Laitner (1990) for a detailed analysis of local stability and determinacy in Auerbach-Kotlikoff models.
Figure 11.12 Impulse Responses to a Technology Shock in the OLG Model of Example 11.3.1: the TFP shock (upper-left panel), output, consumption, and investment (upper-right panel), the real wage and labor (bottom-left panel), and the Gini coefficients of wealth, income, and earnings (bottom-right panel), all in percent.
Note that wealth is predetermined, and hence the shock in period 2 does not affect the wealth distribution until period 3. Income inequality, however, increases. This property of our OLG business cycle model results from our assumption that pensions are constant. As the older retirees are among those with the lowest income (recall that pensions are below average net earnings, as expressed in the replacement rate of 49.4%), the increase in the income of the working households magnifies the income gap. In sum, the Gini coefficient of the income distribution rises by 0.3% on impact.
EFFECTS OF A GOVERNMENT DEMAND SHOCK. Figure 11.13 presents the impulse responses of government consumption G̃_t (upper-left panel), of consumption C̃_t, investment Ĩ_t, and output Ỹ_t (upper-right panel), of the wage rate w_t and aggregate labor L̃_t (bottom-left panel), and of the Gini coefficients (bottom-right panel)
to a government demand shock ε_t^G of one standard deviation in period 2, ε_2^G = 1.6%. Given the fiscal budget constraint, higher government consumption results in smaller government transfers t̃r_t. As a consequence of this negative income effect, households increase their labor supply such that aggregate labor L̃_t rises by 0.08%. Note that the quantitative effect on labor supply is relatively small. The marginal product of labor falls with higher aggregate labor, and the wage declines by 0.03%. On impact, the capital stock K̃_t is constant, and production increases by an even smaller percentage (equal to 0.05%) than labor. Private consumption C̃_t and savings fall due to the decline in income (lower transfers t̃r_t). Therefore, government consumption (partially) crowds out investment, which falls by 0.7%.
Figure 11.13 Impulse Responses to a Government Demand Shock in the OLG Model of Example 11.3.1: government demand (upper-left panel), output, consumption, and investment (upper-right panel), the real wage and labor (bottom-left panel), and the Gini coefficients of wealth, income, and earnings (bottom-right panel), all in percent.
The impulse responses are in close accordance with empirical observations. For example, Blanchard and Perotti (2002) find evidence in their VAR analysis of the US economy that a positive government consumption
shock increases GDP Ỹ_t and employment L̃_t, while investment Ĩ_t declines strongly. With respect to the responses of private consumption and real wages to a positive government demand shock, the empirical evidence is more mixed. Most studies, however, find a positive effect on consumption C̃_t, e.g., Blanchard and Perotti (2002), Galí, López-Salido, and Vallés (2007), and Ravn, Schmitt-Grohé, and Uribe (2012).66 While Rotemberg and Woodford (1992) find evidence that real wages also increase after a government spending shock, Monacelli, Perotti, and Trigari (2010) only find a statistically insignificant increase in the real wage for men.
The effect of a government demand shock on inequality is quantitatively much smaller (in absolute value) than that of a technology shock. As illustrated in the bottom-right panel of Figure 11.13, the Gini coefficient of earnings only falls by 0.06%, as younger workers increase their labor supply to a larger extent than older workers. Government transfers constitute a larger share of disposable income among younger workers than among older workers. Therefore, wealth inequality also increases because younger workers are able to accumulate less life-cycle savings over time. The effect on the income distribution is mixed. On the one hand, earnings among younger workers (who have lower age-dependent productivity ȳ^s and, hence, labor income) increase. On the other hand, transfers are reduced by an equal amount for all households, which affects income-poor households more significantly. In addition, the decline in wages and the corresponding increase in the interest rate67 favors the wealth-rich, who are also the income-rich in our economy (age efficiency ȳ^s peaks at real-life age 52, while wealth peaks at real-life age 57). As a consequence, income inequality is basically unaffected by a government demand shock, and the Gini coefficient of income remains almost constant.68
66 Heer (2019) shows that the qualitative response of private consumption to a government demand shock depends on the elasticity of substitution between private and public consumption in utility. For example, if private and public consumption are complements, C̃_t also increases in response to a positive government demand shock. In the present model, we simply assume that public consumption G̃_t does not enter utility.
67 Note, however, that the rise in the interest rate in our model is not in accordance with empirical evidence from the US economy. In particular, Auerbach et al. (2020) study regional effects of public defense spending and find declining interest rates in response to an expansive government demand shock. Moreover, Heer et al. (2018) show that stock returns in the consumption and investment goods sectors are uncorrelated with GDP growth.
68 Heer and Scharrer (2018) introduce adjustment costs of capital into a similar large-scale OLG model and show that, in this case, the return on capital may fall and income inequality declines after a shock to government consumption. They decompose the income responses for the individual age types and study the debt versus tax financing of additional government expenditures, among other topics.
SECOND MOMENTS. Table 11.4 presents the second moments of our aggregates and Gini coefficients in the form of the volatility s_x, the cross-correlations with output and labor, r_xY and r_xL, and the autocorrelation r_x. While the standard deviations of the aggregate demand components are in line with empirical evidence (e.g., investment is approximately three times as volatile as output, and consumption and labor are less volatile than output), we are unable to match the correlations of our main variables with output and employment. For example, consumption and investment are almost perfectly correlated with labor, while the empirical values reported by Heer (2019) are equal to 0.88 and 0.68 for the US economy during 1953–2014, respectively. In addition, real wages are empirically negatively correlated with employment and output (e.g., the correlations of real wages with employment and output amounted to -0.27 and -0.36 for the US economy), while the correlation is almost perfect in our model.
Table 11.4 Second Moments of the OLG Model of Example 11.3.1

Variable            s_x     r_xY     r_xL     r_x
Output Ỹ_t          1.39    1.00     0.98     0.72
Labor L̃_t           0.62    0.98     1.00     0.71
Consumption C̃_t     0.64    0.97     0.90     0.74
Investment Ĩ_t      3.80    0.96     0.92     0.71
Real wage w_t       0.79    0.99     0.95     0.73
Gini wealth         0.11    0.01     0.14     0.96
Gini income         0.38    1.00     0.97     0.72
Gini earnings       0.76   -0.99    -1.00     0.71
Gov demand G̃_t      2.02    0.05     0.17     0.67

Notes: s_x := standard deviation of the HP-filtered simulated series of variable x, r_xY := cross-correlation of variable x with output Ỹ, r_xL := cross-correlation of variable x with aggregate labor L̃, r_x := first-order autocorrelation of variable x. Time series are detrended using the Hodrick-Prescott filter with weight λ = 1600.
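The entries of such a table are obtained by HP-filtering the simulated series and computing standard deviations and correlations of the cyclical components. A minimal sketch with placeholder data (the hpfilter routine from statsmodels is one possible implementation of the filter) might look as follows:

import numpy as np
from statsmodels.tsa.filters.hp_filter import hpfilter

rng = np.random.default_rng(0)
log_y = np.cumsum(0.01 * rng.normal(size=400))      # placeholder for simulated log output
log_x = np.cumsum(0.01 * rng.normal(size=400))      # placeholder for another simulated log series

y_cyc, _ = hpfilter(log_y, lamb=1600)               # lamb = 1600 for quarterly data
x_cyc, _ = hpfilter(log_x, lamb=1600)

s_x = 100.0 * x_cyc.std()                           # volatility in percent
r_xy = np.corrcoef(x_cyc, y_cyc)[0, 1]              # cross-correlation with output
r_x = np.corrcoef(x_cyc[1:], x_cyc[:-1])[0, 1]      # first-order autocorrelation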
As documented in Section 9.5.2, the low (high) income quintiles display procyclical (anticyclical) income shares such that the Gini coefficient of
income is procyclical,69 whereas Guvenen et al. (2015) report that the last two recessions had little effect on the income distribution, with the exception of the top percentile, who were characterized by persistent and enormous losses. In our model, the inequality of income is perfectly procyclical, while the inequality of earnings is strongly anticyclical. The model also fails to replicate the cyclical pattern of the wealth distribution. As documented in Table 3 of Krueger et al. (2016), the shares in net worth increased among the bottom four quintiles of the wealth distribution and fell in the top quintile during the Great Recession between 2006 and 2010. This procyclical behavior of wealth inequality cannot be replicated with our simple model. The Gini coefficient of wealth is uncorrelated with output in our model.
COMPARISON OF OLG MODELS WITH QUARTERLY AND ANNUAL PERIODS. When we attempt to study questions of the business cycle dynamics of the income and wealth distribution, we are often confronted with the problem of obtaining high-frequency data. Most data on inequality measures and on percentiles of the income and wealth distribution in the US economy (and other countries) are only available at an annual frequency, if at all, for example in the form of the PSID panel data or data from the tax authorities in the case of the US economy. Therefore, it often makes sense to use a business cycle model with annual periods to map it to the appropriate data. In the following, we will study our OLG business cycle model at an annual frequency.70 Therefore, the parameterization has to be adjusted for the new period length. For example, we set the population growth rate to 0.754% and 0.754%/4 = 0.1885% in the cases of annual and quarterly periods, respectively.71 In the case of the quarterly autoregressive AR(1)-process for technology, we change the quarterly parameters, ρ_Z = 0.95 and σ_Z = 0.00763, to

\rho_{Z,annual} = (\rho_Z)^4 = 0.815 \quad \text{and} \quad \sigma_{Z,annual} = \sqrt{1 + (\rho_Z)^2 + (\rho_Z)^4 + (\rho_Z)^6}\,\sigma_Z = 0.0142.72

Similarly, we adjust the parameters of the autoregressive AR(1)-process for government consumption to ρ_G = 0.573 and σ_G = 0.0266. In addition, we choose the HP
69 In the next paragraph, we will also highlight that the Gini coefficient of income has a positive correlation with output (r_{Gini income,Y} = 0.25) at an annual frequency.
70 In the GAUSS program AK280_perturb.g, you simply have to set the period length to years rather than quarters in line 23 of the code.
71 Alternatively, one could choose 1.00754^{1/4} − 1 = 0.1880% instead.
72 The derivation of the formulas is relegated to Appendix A.13.
filter weight λ = 100 for annual data. For the lower frequency, the computational time of the program AK280_perturb.g decreases considerably, from 32 minutes (quarterly periods) to 1 minute (annual periods).
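The temporal aggregation used above is easily verified numerically: sampling the quarterly AR(1) every fourth quarter yields an AR(1) with coefficient ρ^4 and an innovation standard deviation scaled by the square-root term. The short check below only reproduces the numbers quoted in the text.

import numpy as np

rho_q, sigma_q = 0.95, 0.00763         # quarterly AR(1) parameters for technology
rho_a = rho_q**4                        # approximately 0.815
sigma_a = np.sqrt(1 + rho_q**2 + rho_q**4 + rho_q**6) * sigma_q   # approximately 0.0142

rho_g, sigma_g = 0.87, 0.016            # quarterly parameters for government consumption
print(rho_a, sigma_a,
      rho_g**4,                                                   # approximately 0.573
      np.sqrt(1 + rho_g**2 + rho_g**4 + rho_g**6) * sigma_g)      # approximately 0.0266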
Table 11.5 Comparison of Second Moments Across Studies

Variable            US data   Ríos-Rull (1996)   Example 11.3.1
Output Ỹ
  s_Y               2.23      1.65               1.78
  r_Y               0.50      0.47               0.46
Consumption C̃
  s_C               1.69      0.74               0.95
  s_C/s_Y           0.76      0.45               0.53
  r_CY              0.86      0.97               0.95
Investment Ĩ
  s_I               8.66      4.45               4.73
  s_I/s_Y           3.88      2.71               2.66
  r_IY              0.87      0.99               0.96
Labor L̃
  s_L               1.72      0.59               0.74
  s_L/s_Y           0.77      0.36               0.42
  r_LY              0.93      0.95               0.96
Gini Income
  s_Gini            1.05      –                  0.48
  s_Gini/s_Y        0.47      –                  0.27
  r_Gini,Y          0.25      –                  0.99

Notes: The entries for the US data are taken from Table 4 of Ríos-Rull (1996), with the exception of the Gini index of income (own calculations). s_x := standard deviation of the HP-filtered simulated series of variable x, r_xY := cross-correlation of variable x with output Y, r_x := first-order autocorrelation of variable x. Time series are detrended using the Hodrick-Prescott filter with weight λ = 100.
Table 11.5 reports the second moments of annual time series from the US economy, the model of Ríos-Rull (1996), and our own model in this section. The empirical data for the aggregates Ỹ_t, C̃_t, Ĩ_t, and L̃_t for the US economy are taken from Table 4 in Ríos-Rull (1996), who provides
HP-filtered data for the period 1956–1987. The estimates for the second moments of the Gini coefficient of income are calculated with the help of World Bank data for the period 1991–2018.73 The second entry column replicates the results of Table 5 in Ríos-Rull (1996). The model of Ríos-Rull (1996) is specified without a government sector, so there are no taxes, government spending, or public pensions. In addition, he assumes perfect annuities markets. Evidently, the model is able to explain a large part of the fluctuations in GDP with the help of only a technology shock because the standard deviation of output s_Y is equal to 1.65% in the model compared to 2.25% in the US economy. The model is also able to generate the same relative volatilities of consumption and investment with output, the autocorrelation of output r_Y, and the high correlations of the aggregates consumption C̃_t, investment Ĩ_t, and labor L̃_t with output Ỹ_t.
Our model with a government only mildly improves upon the matching of the business cycle facts by Ríos-Rull (1996). In particular, the government helps to increase the volatility of output Ỹ_t, consumption C̃_t, and labor L̃_t to a small extent. In our model, we also analyze the cyclical behavior of the income distribution. Empirically, the Gini coefficient of income is half as volatile as output, s_Gini/s_Y = 0.47, while we are only able to replicate 56% of this relative volatility. Apparently, allowing only for inter- but not intra-generational inequality and neglecting stochastic individual productivity results in too little volatility in income inequality. In addition, the Gini coefficient of income in the model is too procyclical as measured by the correlation with output (r_Gini,Y = 0.99) compared to its empirical value (r_Gini,Y = 0.25).
From this exercise, we might conclude that the business cycle dynamics of the OLG model compare closely with those of the standard Ramsey model. We might even go one step further and conjecture that the OLG model or even other heterogeneous-agent models have a limited role in studying business cycles because the much simpler neoclassical growth model is able to do so, except for the obvious questions of distributional effects on income, earnings, or wealth. However, we should be careful in drawing this conclusion. For example, Krueger et al. (2016) set up a model that adds retirees to the model of Krusell and Smith (1998) and is able to more closely replicate the US wealth distribution. In particular, they model the large share of the population at the bottom of the wealth distribution with zero or close-to-zero wealth. For this reason, they introduce preference
73 We retrieved the Gini Index for the United States [SIPOVGINIUSA, World Bank] from FRED, Federal Reserve Bank of St. Louis.
heterogeneity, unemployment insurance, and social security. As a consequence, aggregate consumption declines much more strongly in response to a large shock (as during the Great Recession), and the decline is 0.5 percentage points larger (in absolute value) than that in the representative-agent economy. Kaplan et al. (2018) find that the transmission mechanism of monetary policy for household consumption is different in models with heterogeneous agents in comparison with small- and medium-scale representative-agent New Keynesian models. There is also empirical evidence that highlights the importance of considering the distribution in business cycle models. For example, Brinca et al. (2016) study the effects of wealth inequality on fiscal multipliers in a VAR and find, in a sample of 15 OECD countries, that fiscal multipliers increase with the country's wealth Gini.
11.3.4 The Krusell-Smith Algorithm and Overlapping Generations The Algorithm proposed by Krusell and Smith (1998) that you learned about in Section 9.4 can also be applied to economies with finite lifetimes with some minor modifications.74 The individual state space is simply characterized by an additional dimension, which is age. Therefore, the simulation step becomes more time-consuming. However, we have not encountered any other limitations in the application of the Krusell-Smith algorithm to finite-lifetime economies in our experience. In particular, the goodness of fit for the law of motion for a particular functional form is almost identical to that in infinite-lifetime models. Given current computer technology, however, the algorithm is still very time-consuming. Below, we will present a simple example that takes us some 5 hours (41 hours) to compute with an Intel Pentium(R) M, 319 MHz machine using GAUSS (Python), even though technology as the only stochastic aggregate variable is assumed to take only two different values. Therefore, adding more stochastic aggregate variables may seriously exacerbate computational time. The economic analysis in this section is very closely related to that of the infinite-lifetime model in Section 9.5.2. Hereinafter, we study the business 74
In a more recent application, Kaplan et al. (2020), for example, study the effects of the housing boom and bust during the Great Recession. Therefore, they introduce multiple aggregate shocks to income, housing finance conditions, and beliefs about future housing demand in a large-scale OLG model with home ownership. The main driver of the volatility in house prices is found to be the shift in beliefs. The main transmission mechanism to the real economy is a wealth effect through the household balance sheet.
694
11 OLG Models with Uncertainty
cycle dynamics of the income distribution in an OLG model with aggregate uncertainty. For this reason, let us briefly review our results from Chapter 9. In Table 9.2, we present the empirical correlation between output and income shares as estimated by Castañeda et al. (1998b). The US income distribution is highly, but not perfectly, procyclical for the low income quintiles, countercyclical for the top 60-95%, and acyclical for the top 5%. In their model, cyclical fluctuations result from the stochastic technology level. During a boom, the number of unemployed workers decreases. As a consequence, the relative income share of the lower income quintiles rises relative to that of the higher income quintiles. However, the income shares are almost perfectly correlated with output, either positively or negatively. Therefore, we also fail to replicate the income dynamics of the very income rich, which is acyclical. In the following, we use a simple business cycle model with overlapping generations and elastic labor supply and improve upon the modeling of the cyclical income distribution dynamics in some aspects. Therefore, we consider a minor modification of Example 11.3.1. In particular, household productivity also includes a stochastic component. As we noted, this element of a model is difficult to integrate in the solution with perturbation methods, and thus we need to apply the Krusell-Smith algorithm for the solution of the model. Moreover, we also simplify the model of Example 11.3.1 with respect to three assumptions. 1) We consider only annual periods and 2) assume a simple two-state process for technology, while 3) government consumption is constant. The rest of the model is unchanged. In our model, the almost perfect correlation of the lower income quintiles with output is reduced as the high-productivity agents have a more elastic labor supply than their low-productivity counterparts.75 In addition, the share of the top quintile of the income earners is less anticyclical as in Castañeda et al. (1998b) and, for this reason, in better accordance with empirical observations because many of the income-rich agents in our model are wealth-rich agents close to and in early retirement. The economic mechanism is as follows. During an economic expansion, both wages and pensions increase. Pensions are tied to the current wage rate. However, workers also increase their labor supply, which is not possible for retired workers. Therefore, the income share of workers increases and is procyclical. Since the top income quintile contains both workers and 75
Heer and Maußner (2012) show that this does not need to be the case in the presence of progressive income taxation.
11.3 Overlapping Generations with Aggregate Uncertainty
695
retirees, the opposing cyclical effects on these groups result in a lower correlation of this income quintile with GDP. AN OLG MODEL OF THE INCOME DISTRIBUTION BUSINESS CYCLE DYNAMICS . In the following, we consider the model of Example 11.3.1 with annual periods and 70 overlapping generations subject to the following modifications: 1. Worker labor productivity ε(s, e, θ ) = eθ ¯y s depends on the agent’s permanent efficiency type e ∈ E = {e1 , e2 }, his idiosyncratic stochastic productivity θ ∈ Θ = {θ1 , θ2 }, and his age s ∈ S . This modeling of labor productivity has often been applied in DGE analysis for the following reasons: i) Differences in the permanent efficiency type e help to generate the wage heterogeneity that is observed empirically. Often, the wage gap between the two different productivity types e2 and e1 , e2 > e1 , is interpreted as the skill premium of college versus high-school graduates. ii) Workers will accumulate precautionary savings if they face idiosyncratic productivity risk θ . Therefore, the wealth distribution becomes more heterogenous in better accordance with reality. iii) The age-dependent component ¯y s helps to explain differences in the ageincome distribution that is important to explain the movement of the cross-sectional factor shares. In each period t, an equal measure of 1-year-old workers (corresponding to real-life age 20) of productivity types ε(1, e j , θiθ ), j = 1, 2,iθ = 1, 2, is born. During working age, s = 1, . . . , 45, the process for idiosyncratic productivity θ s is described by a Markov chain: θ θ π11 π12 0 0 π(θ |θ ) = Prob θ t+1 = θ |θ t = θ = . (11.55) πθ21 πθ22 Depending on his permanent efficiency type e, the agent receives pensions pen t (e) in old age that are financed by a social security tax p τ t on the workers’ wage income. Net labor income of a worker of type p (s, e, θ ) amounts to (1 − τlt − τ t )eθ t ¯y s A t w t l t , so his budget constraint reads as: s p l s s k ¯ ks+1 =(1 − τ − τ )eθ y A w l + 1 + (1 − τ )r kt t t t t t t+1 t + t r t − (1 + τc )c ts , s = 1, . . . , 45,
where, again, τlt , τkt , and τct denotes the wage, capital income and consumption tax rates in period t.
696
11 OLG Models with Uncertainty
At age 46 (corresponding to real-life age 65), households retire and face the budget constraint s k c s ks+1 t+1 = pen t (e) + 1 + (1 − τ )r t k t + t r t − (1 + τ )c t , s = 46, . . . , 70.
(11.56)
2. Aggregate production is represented by (11.42) where the stochastic component Z t , Z t ∈ {Z b , Z g }, follows a 2-state Markov process: Z Z π11 π12 π(Z 0 |Z) = Prob Z t+1 = Z 0 |Z t = Z = . (11.57) Z Z π21 π22 We associate Z b and Z g with a negative shock (recession) and a positive shock (boom), respectively. 3. Government consumption (relative to aggregate labor productivity A t ˜ t ≡ G t /(A t Nt ) = G. ˜ 76 and population Nt ) is constant, G 4. The government provides pensions to the retired agents. Pensions are proportional to the current-period net wage rate with the replacement rate being denoted by r epl net . In addition, we distinguish between two cases. Pensions are either lump-sum or depend on the permanent efficiency type e:
pen t (e) =
p
r epl net (1 − τlt − τ t )w t ¯l, p r epl net (1 − τlt − τ t )w t e¯l,
case 1: lump-sum, case 2: efficiency-dependent,
where ¯l denotes the average labor supply in the economy in the nonstochastic steady state (with Z ≡ 1). Therefore, pensions of the retired agents do not increase if the contemporary workers increase their labor supply.77 CALIBRATION. The calibration of the model parameters is summarized in Table 11.6. We choose the same values for the production, tax and demographic parameters as in the previous section. The production elasticity of capital amounts to α = 0.35. Capital depreciates at the rate δ = 8.3%, and economic growth gA amounts to 2.0%. The tax rates on labor income (including social security), τl + τ p , capital income, τk , and consumption, τc , are set at 28%, 36%, and 5%, respectively. Population grows at 0.754% annually. 76
In Problem 11.5, you are asked to solve the model with stochastic government demand. This rather innocuous assumption simplifies the computation of the individual policy functions. The household has to forecast next-period wages, interest rates, and pensions to derive its optimal savings, consumption, and labor supply. For our specification of the model, it does not need to forecast the average working hours in the next period.
77
11.3 Overlapping Generations with Aggregate Uncertainty
697
Table 11.6 Calibration of the OLG Model with Individual and Aggregate Uncertainty Parameter
Value
Description
α δ gA τl + τ p τk τc n η γ
0.35 8.3% 2.0% 28% 36% 5% 0.754% 2.0 0.29
β G/Y
1.011 18%
r epl net (e1 , e2 )
49.4% (0.57, 1.43)
production elasticity of capital depreciation rate of capital growth rate of output tax on labor income tax on capital income tax on consumption population growth rate coefficient of relative risk aversion preference parameter for utility weight of consumption discount factor share of government spending in steady-state production net pension replacement rate permanent productivity types
With respect to the preference parameters, we choose the coefficient of relative risk aversion η = 2.0 and the weight of consumption in utility γ = 0.29 such that average labor supply is approximately equal to 1/3. We set the discount factor β equal to 1.011 in accordance with estimates by Hurd (1989). The share of government consumption in GDP amounts to 18%. The replacement rate of average pensions relative to net wage earnings is equal ¯p en to r epl net = (1−τl −τtp )w ¯l = 49.4%. t
t
t
The Markov process (11.57) of the aggregate technology level is calibrated such that the average duration of one cycle is equal to 6 years: 2/3 1/3 0 π(Z |Z) = . (11.58) 1/3 2/3 Aggregate technology is chosen such that the mean Z¯ is equal to one and the annual standard deviation of output is approximately equal to 2%, implying (Z b , Z g ) = (0.98, 1.02).78 78
The standard deviation of annual HP-filtered output amounts to 2.4% in our model.
698
11 OLG Models with Uncertainty
The calibration of individual productivity ε(s, e, θ ) is chosen in accordance with Krueger and Ludwig (2007). In particular, we pick (e1 , e2 ) = (0.57, 1.43) such that the average productivity is one and the implied variance of labor income for the new entrants at age s = 1 is equal to the value reported by Storesletten et al. (2007). The annual persistence of the idiosyncratic component θ is chosen to be 0.98. In addition, idiosyncratic productivity has a conditional variance of 8%, implying (θ1 , θ2 ) = (0.727, 1.273), and 0.98 0.02 0 π(θ |θ ) = . (11.59) 0.02 0.98 Note that individual income is highly persistent. The age-efficiency ¯y s profile is taken from Hansen (1993).79 The calibration implies an average labor supply approximately equal to ¯l = 0.333 and a Gini coefficient of income (wealth) equal to 0.40 (0.62) in case 1 (lump-sum pensions) in good accordance with empirical observations, although the values are lower than those in most recent studies on the empirical wealth and income distribution.80 As noted previously, Budría Rodríguez et al. (2002) find a value of 0.55 (0.80) for the income Gini (wealth Gini) for the US economy. As the main reason for the underestimation of inequality in our model, we do not model the top percentile of the income distribution. COMPUTATION. To compute the OLG model with both individual and aggregate uncertainty, we use the algorithm of Krusell and Smith (1998). The Python and GAUSS programs OLG_Krusell_Smith.py and OLG_ Krusell_Smith.g implement the algorithm that is described by the following steps: Algorithm 11.3.1 (Krusell-Smith Algorithm for OLG Models) Purpose: Computation of the OLG model with individual and aggregate uncertainty Steps: Step 1: Compute the nonstochastic steady state with Z ≡ 1. Store the policy functions and the steady-state distribution of (s, e, θ , ˜k). 79
See Figure 10.4 in Section 10.3. In the case of efficiency-dependent pensions, the Gini coefficients of (gross) income and wealth drop to 0.39 and 0.56, respectively. 80
11.3 Overlapping Generations with Aggregate Uncertainty
699
Step 2: Choose an initial parameterized functional form for the law of mo˜ 0 = g K (Z, K ˜ ), agtion for the aggregate next-period capital stock K L ˜ ˜ gregate present-period employment L = g (Z, K ), and transfers ˜ ). fr = g Tr (Z, K T Step 3: Solve the consumer’s optimization problem as a function of the indi˜ ). vidual and aggregate state variables, (s, e, θ , ˜k; Z, K Step 4: Simulate the dynamics of the distribution function. Step 5: Use the time path for the distribution to estimate the law of motion ˜ 0 , ˜L , and T fr. for K Step 6: Iterate until the parameters converge. Step 7: Test the goodness of fit for the functional form g = (g K , g L , g Tr )0 using, for example, R2 . If the fit is satisfactory, stop; otherwise, choose a different functional form for g(·) and return to Step 3. STEP 1: COMPUTATION OF THE NONSTOCHASTIC STEADY STATE. The nonstochastic steady state can be computed with two nested loops. In the ˜ , ˜L , T fr, and ¯l. In outer loop, we iterate over the aggregate variables K the inner loop, we solve the individual’s utility maximization problem for given factor prices w and r and pensions pg en. In Section 11.2, we applied value function iteration to derive optimal savings, consumption, and labor supply in the overlapping generations model with idiosyncratic uncertainty. In the following, however, we use a nonlinear equations solver and apply it to the first-order condition in the form of the Euler equation at each grid point of the discretized state space instead. We find this method to be much faster in the present case, and speed is essential in the computation of this heterogeneous agent model. Let us start with the description of the outer loop. We initially pick a value for average working hours equal to ¯l = 0.30. Since the share of workers in the total population is equal to 78% in our calibration with respect to the characteristics of the US economy in 2015, we use a value of aggregate labor equal to ˜L = 0.78 × 0.30 = 0.234. Recall that the average (permanent, idiosyncratic, and age-dependent) productivities of workers, ε, θ , and y s , have been normalized to 1.0. We compute the aggregate capital stock with the help of the marginal product of capital, which we set fr = 0. In the outer equal to r = 2.0% as our initial guess. Finally, we set T ˜ , ˜L , T fr, ¯l) by 30% with respect to their new values. With loop, we update (K the help of the aggregate variables, we can compute factor prices w and r from the first-order conditions of the firms and pg en with the help of the net net replacement rate r epl .
700
11 OLG Models with Uncertainty
Next, we compute the optimal policy functions over the individual state space ˜z = (s, e, θ , ˜k) for given w, r, and pg en. Therefore, we have to discretize the state space with respect to the only continuous variable, individual wealth (or capital) ˜k. We specify an equispaced grid with nk = 50 points over the individual asset space ˜k with lower and upper boundaries ˜k min = 0 and ˜k ma x = 10.0. The upper value is approximately equal to five times the average wealth in the economy (in the nonstochastic steady state). We have to ascertain that in neither the nonstochastic steady state nor in the simulation do households choose a next-period capital stock at the upper boundary k ma x or above. If this were the case in our computation, we would have to respecify the boundaries and choose a wider interval. We iterate backwards over age starting in the last period of life where the next-period capital stock is equal to zero: ˜k0 (70, e, θ , ˜k) = 0 for all grid points (s, e j , θiθ , ˜kik ), for j = 1, 2, iθ = 1, 2 and ik = 1, . . . , 50. We iterate backwards over ages s = 69, 68, . . . , 1, solving for the nonlinear Euler equation residual at each grid point: r f (˜z) = uc (˜c (˜z), 1 − l(˜z)) − βφ s E t uc (c 0 (˜z0 ), 1 − l(˜z0 ))(1 + (1 − τk )r) !
=0
(11.60) with l= ˜c =
¨
n o ter+[1+(1−τk )r]˜k−(1+gA)˜k0 max 0, γ − (1 − γ) , s ≤ 45, (1−τl −τ p )eθ ¯y s w
0,
s > 45,
(11.61a)
(1 − τl − τ p )eθ ¯y s wl + ter + [1 + (1 − τk )r]˜k − (1 + gA)˜k0 . 1 + τc (11.61b)
If our solution of the residual function r f (˜z) = 0 implies a negative labor supply for the worker, l < 0, we set l = 0. We store the optimal policy functions ˜k0 (˜z), ˜c (˜z), and l(˜z) at each grid point for each age s = 69, . . . , 1. The computation of the solution for the nonlinear equation (11.60) merits some comments. First, we have to check whether our solution for the next-period capital stock is not the lower boundary value ˜k0 = 0. Therefore, we evaluate (11.60) at ˜k0 = 0. If r f (˜z) < 0, we know that ˜k0 = 0. Otherwise, we continue to find the inner solution for ˜k0 . Second, we find that the nonlinear equation solver is very sensitive with respect to the choice of the initial value ˜k0 supplied to the rootfinding routine for r f (˜z) = 0. Depending on the particular nonlinear
11.3 Overlapping Generations with Aggregate Uncertainty
701
equation solver applied in our programs (for example, in our Python or GAUSS codes, we use the modified Newton-Rhapson algorithm described in Section 13.5.1) we might have to test different starting values. In the present application, we specified five initial guesses for the optimal policy ˜k0 (s, e j , θi , ˜ki ): θ k 1. 2. 3. 4. 5.
˜k0 = 0, solution for ˜kik −1 found in the last iteration if ik > 1, solution ˜k0 (s + 1, e j , θiθ , ˜kik ) at age s + 1, ˜k0 = ˜k, computation of maximum possible value ˜k0 for small ˜c ≈ 0.001 and using half of this value.
From these values, we pick the one with the lowest absolute value for the residual function r f (˜z). Third, in the computation of r f (˜z) with the help of (11.60), we need to evaluate next-period consumption and labor supply at wealth level ˜k0 . In general, the next-period capital stock ˜k0 will not be a grid point. We used the stored policy functions of consumption and labor, ˜c 0 (˜z) and l 0 (˜z), at age s + 1 in the previous iteration and interpolate linearly between grid points. The computation is carried out in the functions/procedures rfold and rfyoung of the GAUSS/Python program OLG_Krusell_Smith for the retired and working households, respectively. Finally, we have to aggregate the individual capital levels ˜k, consumption ˜ , ˜L , T fr, ¯l). Therefore, we have to ˜c , and labor l to derive the aggregates (K compute the (nonstochastic) steady-state distribution f (s, e, θ , ˜k). For the computation of distribution f (˜z), we start at the newborn generation with age s = 1 and iterate forward over age. At age s = 1, the newborn cohort has measure µ1 = 0.02118 (recall that we normalized total population to have a measure of one).81 Each productivity type (e j , θiθ ), j = 1, 2, iθ = 1, 2, has equal measure such that 1 µ if ik = 1, f (1, e j , θiθ , ˜kik ) = 4 0 else. Given the distribution at age s, we update the distribution at age s + 1 using the optimal policy function ˜k0 (s, e j , θiθ , ˜kik ) and summing over all grid points at age s.82 If next-period wealth ˜k0 (s, e j , θiθ , ˜kik ) = kikk happens 81
We derive this measure using the US population growth rate and survival probabilities of the year 2015 and computing the stationary population. 82 When we start the iteration at age s to compute the measure at age s + 1, we initialize all measures f (s + 1, e j , θiθ , ˜kik ) with zero.
702
11 OLG Models with Uncertainty
to be a grid point (s, e j , θiθ , ˜kik ) (this will be the case if the credit constraint ˜k0 ≥ 0 is binding), we simply add the respective measure at age s to the measure at age s + 1, f (s + 1, e j , θ 0 , ˜kikk ), noting that only φ s of the households survive and that the size of the cohort in the total population shrinks by a factor of (1 + n) due to population growth. In particular, at age s, we iterate over the grid on (e j , θiθ , ˜kik ) for j = 1, 2, iθ = 1, 2 and ik = 1, . . . , nk as follows:83 SET f (s + 1, e j , θ 0 , ˜kikk ) = f (s + 1, e j , θ 0 , ˜kikk )
φs + π(θ 0 |θiθ ) × f (s, e j , θiθ , ˜kik ) × , 1+n
for θ 0 ∈ {θ1 , θ2 }.
If next-period wealth ˜k0 (s, e j , θiθ , ˜kik ) lies between the asset grid points kikk and kikk +1 , we add the measure in period s to the measures of these two points at age s + 1 according to:84 SET f (s + 1, e j ,θ 0 , ˜kikk ) = f (s + 1, e j , θ 0 , ˜kikk ) ˜ki +1 − ˜k0 φs kk + π(θ 0 |θiθ ) × f (s, e j , θiθ , ˜kik ) × , ˜ki +1 − ˜ki 1+n kk kk SET f (s + 1, e j ,θ 0 , ˜kikk +1 ) = f (s + 1, e j , θ 0 , ˜kikk +1 ) ˜k0 − ki φs kk + π(θ 0 |θiθ ) × f (s, e j , θiθ , ˜kik ) × , kikk +1 − kikk 1+n for θ 0 ∈ {θ1 , θ2 }.
Of course, this division of the measures is necessary due to our discretization of the distribution function and is only an approximation of the true distribution. Only if next-period capital ˜k0 (˜z) were a linear function of present-period capital ˜k would we find the mean capital stock of the discretized distribution f (˜z) to coincide with its exact value. At low levels of the capital stock ˜k, however, we find substantial curvature of the savings function ˜k0 (˜z).85 83
With the following ’SET’-statement, we describe pseudocode. The command ’SET x = y’ assigns the value ’y’ to the variable ’x’ in the step of the algorithm. 84 Note that the code can easily be parallelized in the individual loop over ages s = 1, . . . , 69. 85 For this reason, some economists use more grid points at the lower boundary of the interval [k min , k max ].
11.3 Overlapping Generations with Aggregate Uncertainty
703
Figure 11.14 presents the nonstochastic steady-state cumulative distribution of the individual capital stock ˜k in case 1 (lump-sum pensions). Note that the upper boundary k max = 10.0 is not binding; maximum individual wealth amounts to ˜k = 9.59 (˜k = 8.78) in case 1 (case 2 with efficiency-dependent pensions). In addition, we find that 21.2% (15.5%) of the households are credit-constrained with ˜k = 0. 1.0
F (˜k)
0.8 0.6 0.4 0.2 0
1
2
3
4
5
6
7
8
9
10
˜k Figure 11.14 Nonstochastic Steady-State Distribution of ˜k (case 1)
Now, we are able to sum up the capital stocks, consumption and labor supply of the individual households weighted by their measures ˜ , ˜L , ¯l). We derive f (s, e j , θiθ , ˜kik ) to determine the aggregate variables (K fr with the help of the fiscal budget (11.47). aggregate transfers T The optimal policy functions for the steady state are stored to use them as an initial guess for the policy functions in Step 3. Similarly, we save the nonstochastic steady state distribution and use it as the initial distribution for the simulation of the stochastic economy in Step 4. The computation of the nonstochastic steady state with this Euler equation method is very fast. As presented in Table 11.7, the runtime amounts to approximately 2 minutes (12 minutes) in our GAUSS (Python) program. If you compare these runtimes with those from Table 11.2 where we used value function iteration, the difference is dramatic (hours versus minutes). As is typical, we find that GAUSS is much faster than Python, in this particular application by a factor of 6. STEP 2: CHOOSE AN INITIAL PARAMETERIZED FUNCTIONAL FORM FOR THE ˜ 0 , ˜L , AND T fr. In the second step, we need to postulate AGGREGATES K the laws of motion for the next-period capital stock, employment, and
704
11 OLG Models with Uncertainty Table 11.7 Runtime: Krusell-Smith Algorithm and OLG Models Python
GAUSS
Nonstochastic steady state
11:53
2:19
Policy functions
39:03
14:18
1:13:52
3:52
1:21:40:18
5:18:40
Simulation (200 periods) Total
Notes: Runtime is given in days:hours:minutes:seconds on an Intel(R) Xeon(R), 2.90 GHz.
transfers:86 ˜ t+1 = g K (Z t , K ˜ t ), K ˜L t = g L (Z t , K ˜ t ), ˜ t ). fr t = g (Z t , K T Tr
(11.62a) (11.62b) (11.62c)
In light of the results obtained by Krusell and Smith (1998), we use a ˜ t+1 log-log specification for the dynamics of the aggregate capital stock K as a function of stochastic aggregate technology Z t and present-period ˜ t . For the contemporaneous employment ˜L t , we copy this capital stock K fr t , however, functional relationship. With respect to aggregate transfers T we use a function in levels rather than in logs given the observation that aggregate transfers are close to zero. Accordingly, we choose the following specification of g = (g K , g L , g Tr )0 : ˜ t + ωK,2 1 Z =Z b + ωK,3 1 Z =Z b ln K ˜ t , (11.63) ln K t+1 = ωK,0 + ωK,1 ln K t t ˜ t + ω L,2 1 Z =Z b + ω L,3 1 Z =Z b ln K ˜ t , (11.64) ln ˜L t = ω L,0 + ω L,1 ln K t t ˜ t + ω Tr,2 1 Z =Z b + ω Tr,3 1 Z =Z b K ˜t . fr t = ω Tr,0 + ω Tr,1 K T t t
(11.65)
˜ t+1 is only a function of present-period capital stock K ˜ t and Note, in particular, that K aggregate stochastic technology Z t . Therefore, employment ˜L t is not an aggregate state variable. This would be different if we assumed sticky employment, e.g., as resulting from employment search or other frictions in the labor market. 86
11.3 Overlapping Generations with Aggregate Uncertainty
705
Here, 1 Z t =Z b denotes an indicator function that takes the value one if the present-period technology level Z t is equal Z b and zero otherwise. As ˜ , ˜L , T fr}, equal to zero and choose initial values, we set ω x,1 -ω x,3 , x ∈ {K ˜ , ˜L , T ˜ t+1 , ˜L t , and T fr}, so that the values of K fr t are equal to ω x,0 , x ∈ {K ˜ ˜ fr. In the final their nonstochastic steady-state counterparts K , L , and T iteration in Step 6, we find the following solution (case 1):87
0.0412 −1.2484 0.0053 0.0175 0.0216 0.0056 ωK = , ωL = , ω Tr = . 0.9251 0.0841 −0.0030 −0.0070 −0.0048 0.0015 STEP 3: SOLVE THE CONSUMER’S OPTIMIZATION PROBLEM AS A FUNCTION ˜ ). OF THE INDIVIDUAL AND AGGREGATE STATE VARIABLES , (s, e, θ , ˜ k; Z, K In Step 3, we compute the individual policy functions as functions of the ˜ t+1 , individual and aggregate state variables for a given law of motion for K ˜L t , and T fr t . For this reason, we choose a rather loose equispaced grid for ˜ because the curvature of the policy function the aggregate capital stock K with respect to this argument is rather low. We find that nK = 7 points are sufficient. Furthermore, we choose 80% and 120% of the nonstochastic ˜ = 1.820, as the lower and upper steady-state aggregate capital stock, K boundary for this interval. In our simulations, the aggregate capital stock ˜ t ∈ [1.456, 2.184]. always remains within these boundaries, K We compute the policy functions in the same way as described in Step 2 above. However, we have to consider that factor prices are no longer constant but depend on the aggregate states in periods t and t +1. Consider the case in which we would like to compute the optimal policy functions at ˜ i , ) with Z t = Z iZ , i Z ∈ {b, g}, and K ˜ t = Ki . grid point (s, e j , θiθ , ˜kik ; Z iZ , K K K ˜ Aggregate employment in period t, L t , can be computed with the help of ˜ t ). Given (Z t , K ˜ t , ˜L t ), we are able to compute factor the function g L (Z t , K prices w t and r t from the first-order conditions of the firms. In addition, we can compute contemporaneous pensions pg en t with the help of the replacement rate. When we compute the residual function r f (z) with the help of (11.60), we need to evaluate the expected next-period marginal utility of consumps+1 tion c t+1 and the interest rate r t+1 . Since both aggregate and individual 87
In case 2, the solution is equal to ωK = (0.0385, 0.0176, 0.9256, −0.0070)0 , ω L = (−1.2647, 0.0228, −0.0770, 0.0040)0 , and ω Tr = (0.0030, 0.00560, −0.0022, 0.0015)0 .
706
11 OLG Models with Uncertainty
s+1 productivity, Z t+1 and θ t+1 ,88 are stochastic, we need to evaluate c t+1 and r t+1 in each case and weight the marginal utility of consumption by proba˜ t+1 bility π(Z t+1 |Z t )×π(θ t+1 |θ t ). We know that next-period capital stock K is predetermined from our choice in period t and can be computed with the help of g K (·). Given the realization of Z t+1 , we are able to compute ˜L t+1 with the help of g L (·) and, hence, interest rates r t+1 . Finally, to compute s+1 s+1 ˜c t+1 and l t+1 at age s + 1 with productivity θ t+1 and aggregates Z t+1 and ˜ t+1 , we interpolate ˜c (·) and l(·) bilinearly at (˜k t+1 , K ˜ t+1 ). K The computation of the optimal policies as functions of the individual and aggregate states is quite time-consuming because the total number of grid points amounts to:
T × ne × nθ × nk × n Z × nK = 70 × 2 × 2 × 50 × 2 × 7 = 28, 000. As presented in Table 11.7, computational time of Step 4 amounts to 14 minutes (GAUSS) and 39 minutes (Python). STEP 4: SIMULATE THE DYNAMICS OF THE DISTRIBUTION FUNCTION. Starting with the nonstochastic steady-state distribution as our initial distribution f0 (s, e, θ , ˜k), we compute the dynamics of f t (·) over 200 periods ˜ 0 , ˜L0 , t = 1, . . . , 200, using Algorithm 8.4.3. In period 0, the aggregates K f T r 0 are equal to their nonstochastic steady-state values. To choose Z0 , we use a random number generator of the uniform distribution on the interval [0, 1] and assign Z0 = Z b if the realization is below 0.5 and Z0 = Z g otherwise. We use a pseudo-random number generator to simulate the technology level {Z t }200 t=0 over 201 periods. Given the distribution in period t, f t (s, e, θ , ˜k), we can compute the next-period distribution, f t+1 (s, e, θ , ˜k), ˜ ) and l(s, e, θ , ˜k; Z, K ˜ ). with the help of the policy functions k0 (s, e, θ , ˜k; Z, K ˜ To compute the distribution in period t + 1, f t+1 (s, e, θ , k), given the distribution in period t, f t (s, e, θ , ˜k), we use the same method as described under Step 1 above. In particular, we set all measures f t+1 (·) equal to zero and initialize the measure of the 1-year-old cohort for each productivity type (e j , θiθ ), j = 1, 2, iθ = 1, 2: f t+1 (1, e j , θiθ , ˜kik ) = 88
µ1 4
0
if ik = 1, else.
We distinguish between θ t and θiθ in our notation. θ t is the idiosyncratic stochastic productivity of the worker in period t, while θ1 and θ2 are the elements of the state space Θ that constitute the possible set of realizations.
11.3 Overlapping Generations with Aggregate Uncertainty
707
Next, we iterate over all ages s = 1, . . . , 69 in period t. At each age s, we iterate over all grid points (e j , θiθ , ˜kik ) for j = 1, 2, iθ = 1, 2 and ik = 1, . . . , nk to compute the next-period wealth k0 (s, e j , θiθ , ˜kik ) and add the current-period measure f t (s, e j , θiθ , ˜kik ) to the next-period measure f t+1 (s + 1, ., ., .). If next-period wealth ˜k0 (s, e j , θiθ , ˜kik ) = kikk happens to be a grid point, we simply add the respective measure at age s to the measure at age s + 1, noting that only φ s of the households survive and that the size of the cohort in the total population declines by a factor of (1 + n) due to population growth: SET f t+1 (s + 1, e j , θ 0 , ˜kikk ) = f t+1 (s + 1, e j , θ 0 , ˜kikk ) + π(θ 0 |θiθ ) × f t (s, e j , θiθ , ˜kik ) ×
φs , 1+n
for θ 0 ∈ {θ1 , θ2 }.
If next-period wealth ˜k0 (s, e j , θiθ , ˜kik ) lies between the asset grid points kikk and kikk +1 , we add the measure in period s to the measures of these two points at age s + 1 according to: SET f t+1 (s + 1, e j ,θ 0 , ˜kikk ) = f t+1 (s + 1, e j , θ 0 , ˜kikk ) ˜ki +1 − ˜k0 φs kk + π(θ 0 |θiθ ) × f t (s, e j , θiθ , ˜kik ) × , ˜ki +1 − ˜ki 1+n kk kk SET f t+1 (s + 1, e j ,θ 0 , ˜kikk +1 ) = f t+1 (s + 1, e j , θ 0 , ˜kikk +1 ) ˜k0 − ki φs kk + π(θ 0 |θiθ ) × f t (s, e j , θiθ , ˜kik ) × , kikk +1 − kikk 1+n for θ 0 ∈ {θ1 , θ2 }.
We also use the distribution in period t, f t (s, e, θ , ˜k), to compute distributional measures for our results, i.e., the Gini coefficients of income and the income shares of the bottom four quintiles, the 80%–95% percentile, and the top 5% as well as GDP Y˜t . We save the time series for three aggregates ˜ t , ˜L t , and T cr t for which we specify the law of motion g = (g K , g L , g Tr )0 . K ˜ t , ˜L t , T cr t )}200 will be used in the next step. After these comThe series {(K t=0 putations, we no longer need to save f t (·) and can delete it to save memory space. The computational time for the simulation of 200 periods again depends crucially on the computer language that you apply. As presented in Table 11.7, it only took us 4 minutes to simulate 200 periods with GAUSS,
708
11 OLG Models with Uncertainty
while the Python code took more than 1 hour for the exact same number of operations.89 Let us emphasize one point here. Many studies on OLG models with aggregate uncertainty consider a sample of approximately 1,000 households for each generation and simulate their behavior. We find that this method has several disadvantages. First, it is very time-consuming. We instead advocate storing the actual distribution at the grid points (s, e, θ , ˜k) as in Algorithm 8.4.3. This procedure requires less storage capacity. Importantly, the computation of the next-period distribution is much faster than the simulation of some 1,000 households in each generation. Second, if we simulate the behavior of the household sample for each generation, we will have to use a random number generator to switch the agent’s type from θ to θ 0 . As we are only using some 1,000 agents, the law of large numbers does not need to hold, and the percentage of the agents with θ 0 = θ1 or θ 0 = θ2 is not equal to 50%. Therefore, during our simulation, we always have to adjust the number of agents with productivity θ 0 = θ1 (θ 0 = θ2 ) to one half in each generation, which involves some arbitrariness because we have to select some households whose productivity is changed ad hoc.90 ˜ 0 , ˜L , AND T fr AND GOOD STEPS 5-7: ESTIMATE THE LAW OF MOTION FOR K NESS OF FIT. We simply apply ordinary least squares (OLS) to estimate the ω coefficients in (11.63)–(11.65). We update the coefficients by 30% in each outer loop and stop the algorithm as soon as the maximum absolute change in ω·,K is below 0.001. In our last iteration, the R2 in the ˜ 0 , ˜L , and T fr exceeds 0.9999. Therefore, we can be three regressions of K confident that our postulated law of motion g(·) is satisfactory.91 The total computation of OLG_Krusell_Smith.g (OLG_Krusell_Smith.py) takes some 5 hours (2 days) on an Intel Pentium(R) M, 319 MHz machine in the case of GAUSS (Python). BUSINESS CYCLE DYNAMICS OF THE INCOME DISTRIBUTION. Figure 11.15 describes the behavior of our economy in the nonstochastic steady state. 89
We encourage a reader who uses Python to apply methods of parallelization, multithreading and/or Numba to accelerate program execution. 90 Please compare this with Section 9.4. 91 Compare our discussion of R2 as a measure of accuracy in the Krusell-Smith Algorithm 9.4.1 and its short-fall in Chapter 9.
11.3 Overlapping Generations with Aggregate Uncertainty
709
We graph the average wealth of the cohorts in the top row, average labor supply of the workers in the medium row, and average gross income in the bottom row. Agents accumulate savings until the age 36 (corresponding to real lifetime age 55) and dissave thereafter. Labor supply peaks prior to wealth (and age-specific productivity) at real lifetime age 30. With increasing wealth, households decrease their labor supply ceteris paribus. Total income which is defined as the sum of wage and interest income before taxes plus pensions and transfers peaks at real lifetime age 40. Our average-age profiles do not completely accord with empirical observations in Budría Rodríguez et al. (2002). Based on the 1998 data from the Survey of Consumer Finances they find that US household income, earnings, and wealth peak around ages 51-55, 51-55, and 61-65, respectively. As one possible explanation why households accumulate savings over a shorter time period and supply less labor in old age in our model than empirically, we conjecture that some important aspects are missing, e.g. the risk of large old-age medical expenses which might motivate households to save more and longer. Another possible reason for the shorter saving period in our model is the absence of an operative bequest motive. In order to compute the correlation of the income distribution with output, we simulate the dynamics of our economy over 150 periods in ˜ (Step 6).92 The time series of the outer loop over the law of motion for K production and the income shares of the bottom and fourth quintiles over the periods 51-100 that resulted from the final simulation are illustrated in Figure 11.16 for case 1 (lump-sum pensions).93 In the top row, we graph the dynamics of output. If the technology level jumps from Z b to Z g or vice versa, this is also instantaneously reflected in the movement of the production level. Of course, these abrupt changes in production are a direct consequence of our assumption that technology Z t can only take two different values. Notice that after an initial upward (downward) jump, production continues to increase (fall) as capital accumulates (decumulates) gradually. In the medium row, we graph the behavior of the bottom quintile of the income distribution. Evidently, this income share does not mimic the 92
We drop the first 50 periods from our time series so that the initialization of the distribution in period 0 has no effect on our results. Notice that this is necessary in our model as the nonstochastic steady state capital stock may diverge from the long-run average capital stock in the stochastic economy. 93 If you run the program OLG_Krusell_Smith, of course, the outcome might look differently since it depends on the realizations of Z t as the outcome of your random number generator.
710
11 OLG Models with Uncertainty
˜ks
Average Cohort Wealth 3.50 3.00 2.50 2.00 1.50 1.00 0.50 0.00 1
10
20
30
40
50
60
s
70
Average Cohort Labor Supply 0.40
¯l s
0.35 0.30 0.25 0.20 1
5
10
15
20
25
30
35
40
45
s Average Cohort Income 0.60
˜y s
0.50 0.40 0.30 0.20 10
20
30
40
50
s
60
70
Figure 11.15 Nonstochastic Steady-State Age Profiles
behavior of production closely and is negatively correlated with output. Consider the periods 74-82. In period 74, aggregate productivity is equal to Z t = Z g and consequently drops to Z t = Z b in the following periods 75-82. As a consequence, output Y˜t falls from 0.559 in period 74 to 0.529 in period 75 and continues to fall slowly until period 82. The income share of the poorest quintile increases between period 74 and 75, from 5.75% to 6.21%, and continues to slowly increase thereafter. The lowest
11.3 Overlapping Generations with Aggregate Uncertainty
711
Production 0.56 0.55 0.54 0.53 0.52 51
55
60
65
70
75
80
85
90
95
100
90
95
100
90
95
100
t
6.50 6.25 6.00 5.75 5.50 5.25 5.00
Income Share Bottom Quintile
·10−2
51
55
60
65
70
75
80
85
t Income Share Fourth Quintile 0.234 0.232 0.230 0.228 0.226 0.224 0.222 51
55
60
65
70
75
80
85
t Figure 11.16 Simulation Results
income quintile is predominantly composed of the young workers with the lowest idiosyncratic productivity. For example, productivity of the 1-year old worker with θ = θ1 and e = e1 amounts to only 25% of average productivity among workers, e1 · θ1 · ¯y 1 = 0.727 · 0.57 · 0.596 = 0.247. In the case of lump-sum pensions, wage income of the least productive worker is lower than pension income of the retirees so that the lowest income quintile does not contain retirees in case 1. If during a recession,
712
11 OLG Models with Uncertainty
stochastic aggregate technology Z t falls from 1.02 to 0.98, wages, interest rates, and pensions decrease by approximately 4% on impact. There are multiple opposing effects on the income share of the bottom quintile: 1. Government transfers drop during a recession. Since transfers constitute a larger component of total income among the households in the lowest income quintile than among the richer households their income share drops. 2. Workers decrease their labor supply (the substitution effect is stronger than the income effect); however, households in the lowest quintile decrease their labor supply by less than the high-productivity workers as the decline in transfers t r results in a stronger income effect among them. As a consequence, their labor income rises relative to that of the high-productivity workers (but falls relative to pension income). 3. Since the capital stock is predetermined, but employment decreases, the marginal product of capital declines stronger than the marginal product of labor on impact. Therefore, capital income drops and the relative income of the credit-constrained households in the lowest income quintile rises relative to that of the income-rich households. The total effect on the correlation of the bottom income share with GDP is negative and amounts to -0.17 as presented in Table 11.8. In the case of the fourth income share (60%-80%) illustrated in the bottom row of Figure 11.16, the comovement with GDP is less evident. During the extracted recession period 75-82, the income share of the quintile only starts to rise in period 76 (which is hard to discern in the Figure 11.16). The income share of the fourth quintile amounts to 23.3%, 23.3%, 22.4%, and 22.5% in the periods 73-76. Accordingly, the income share only drops with a lag of one period. Thereafter, it starts to rise again and moves in opposite direction of production. As presented in Table 11.8, the correlation is close to zero and amounts to 0.07. As one explanation for the initial persistence of the income share, this quintile contains many workers with high productivity e = e2 and affluent retirees. While the workers decrease their labor supply during the recession, the rich retirees cannot adjust their behavior along this margin. As a consequence, their relative income increases. As the recession continues, however, the composition of the households in the quintile changes.94 94
For example, the workers with the highest productivity prior to retirement are among the top income earners and may move to the fourth quintile during retirement. Since their savings are relatively smaller due to the long recession, their interest income is also much less than that of households who enter retirement after a boom period.
11.3 Overlapping Generations with Aggregate Uncertainty
713
Table 11.8 shows in detail the behavior of all income quintiles. In the first entry row, we display the empirical correlations of output with the 1st, 2nd, 3rd, and 4th income quintiles, and the 80-95% and 95-100% income groups for the US economy, respectively.95 In the second row, you find the values as resulting from the simulation of the most preferred model of Castañeda et al. (1998b). The last two lines display the values obtained from simulating our economy for the two cases that pensions are either lump-sum or proportional to the individual efficiency e. Obviously, both our specifications 1) and 2) have their strengths and weaknesses. The model 1) with the lump-sum pensions apparently works better in the explanation of the behavior of the first quintile, while the case 2) with pensions proportional to productivity e implies correlations of the fourth quintile in better accordance with empirical observations. Table 11.8 Cyclical Behavior of the Income Distribution 0-20%
20-40%
40-60%
60-80%
80-95%
95-100%
US
0.53
0.49
0.31
-0.29
-0.64
0.00
Castañeda et al. (1998)
0.95
0.92
0.73
-0.56
-0.90
-0.84
Our model case 1 case 2
-0.17 -0.59
0.17 0.78
0.37 0.25
0.19 -0.25
0.07 0.26
-0.30 -0.29
Notes: Entries in rows 1 and 2 are reproduced from Table 4 in Castañeda et al. (1998). Annual logarithmic output has been detrended using the Hodrick-Prescott filter with smoothing parameter λ = 100.
Comparing the results for our overlapping generations model with those from Castañeda et al. (1998b), we conclude that a combination of these two models seems to be a promising starting point to get a better explanation of the cyclical income distribution. Both unemployment and the demographic structure (with pensions) are important explanatory factors of the income distribution which need to be taken in consideration. The focus of the average cyclical patterns of the income shares, however, might hide the importance of specific historic episodes. Income is composed of both labor 95
The estimates are reproduced from Table 4 in Castañeda et al. (1998b).
714
11 OLG Models with Uncertainty
and capital income. As we noticed during the two most recent recessions, the Great Recession during 2007-2009 and the COVID-19 crisis during 2019-2021, capital income may behave very differently during a recession depending on the cause of the recession. For example, the stock market index SP 500 fell by 38% over the year 2008 during the housing crisis, but increased by 28% over the year 2019. In the former case, the crisis started in the financial markets (the subprime crisis), while the latter was caused by an epidemic affecting both (global) supply and demand. Therefore, capital income behaved diametrically different between these two recessions and an analysis that only considers averages might be misleading.
Appendix 11
715
A.11 Derivation of the Stationary Dynamic Program of the Household Section 11.2 presents an OLG model with individual uncertainty in an environment where all aggregate variables, including factor prices, evolve deterministically. In this Appendix, we first present the decision problem of a s-year old household in period t. Since this household is born into an economy with a growing population and a growing labor productivity, we derive the recursive formulation of this problem in scaled variables that exhibit zero growth over time. To formulate the intertemporal optimization problem more elegantly, we ins,s+i troduce the variable ψ t which describes the conditional survival probability of the s-year old household in period t to be alive at age s + i in period t + i. s,s+i The conditional survival probability ψ t can be computed with the help of the unconditional survival probabilities from age s + i − 1 to age s + i in a recursive way according to: 1 if i = 0, s,s+i ψt = s,s+i−1 s+i−1 ψt φ t+i−1 else. s+i T −s The household at age s = 1, . . . , T seeks sequences of consumption {c t+i }i=0 W s+i T −s 96 and labor supply {l t+i }i=0 that solve the problem: T −s s+i γ s+i 1−γ 1−η X (c t+i ) (1 − l t+i ) s i s,s+i max U t = E t β ψt , 1−η i=0
subject to,
s+i s+i+1 s+i k s (1 + τct+i )c t+i + ks+i+1 t+i+1 + b t+i+1 = y t+i + [1 + (1 − τ t+i )(r t+i − δ)]k t+i
+ R bt+i bs+i t+i + t r t+i ,
¨ s+i y t+i
=
x s+i+1 t+i+1
p
s+i (1 − τlt+i − τ t+i ) ¯y s+i eθ t+i A t+i w t+i l t+i , s+i pen(x t+i ),
¨ (s+i−1)x s+i +(1−τl t+i
=
(1 + gA)x s+i t+i ,
p s+i y s+i eθ t+i A t+i l t+i t+i −τ t+i ) ¯
s+i
,
i = 0, . . . , T W − s, i = T W − s + 1, . . . , T − s,
i = 0, . . . , T W − s, i = T W − s + 1, . . . , T.
The household is uncertain with respect to his individual productivity θ t+i but not with respect to tax rates, factor prices, and aggregate variables. In equilibrium, the return on capital and the return on bonds cannot differ from each other. Otherwise, the household would not willingly hold both assets in his portfolio. Accordingly, we cannot determine the share of government bonds in the household’s wealth ast = kst + bst . Using R bt = (1 + (1 − τkt )(r t − δ)) for all t, we can reformulate the household’s budget constraint to 96
We ignore the additive utility from government consumption and the credit constraint, as+i ≥ 0, for ease of exposition. t+i
716
11 OLG Models with Uncertainty s+i c s+i s+i+1 y t+i + 1 + (1 − τkt+i )(r t+i − δ) as+i t+i + t r t+i = (1 + τ t+i )c t+i + a t+i+1 .
Since aggregate labor productivity A t grows deterministically at the rate gA ≥ 0, individual consumption c ts , income y ts , wealth ast , and accumulated earnings x st will also grow over time. To remove this trend, we reformulate the decision problem in variables scaled by the level of aggregate labor productivity: ˜c ts :=
c ts
, ˜y ts :=
At
y ts At
, a˜st :=
ast At
, x˜ st :=
x st At
, ter t :=
t rt . At
In these variables, the objective function can be written as: U ts = E t
γ(1−η) At
(˜c ts )γ (1 − l ts )1−γ
1−η
1−η
+ βΨ ts,s+1 ((1 +
gA)A t )
γ(1−η)
+ β 2 Ψ ts,s+2 (1 + gA)2 A t
s+1 γ s+1 1−γ (˜c t+1 ) (1 − l t+1 )
1−η
1−η
s+2 γ s+2 1−γ 1−η (˜c t+2 ) (1 − l t+2 )
γ(1−η)
1−η
+ ... .
Defining the variables ˜st := u
U ts γ(1−η)
At
and β˜ := β(1 + gA)γ(1−η)
we can write the household’s objective function in terms of scaled variables as: ˜st u
= Et
X T −s i=0
˜i
β
Ψ ts,s+i
s+i γ s+i 1−γ (˜c t+i ) (1 − l t+i )
1−η
1−η
.
Employing the technique to derive a recursive definition of an infinite sum introduced in Section A.8 allows us to write the previous equation as: ˜st u
:=
(˜c ts )γ (1 − l ts )1−γ 1−η
1−η ˜ s Et u ˜s+1 + βφ t t+1 .
The individual state variables are age s, the permanent productivity type e, the idiosyncratic productivity shock θ t , wealth a˜st , and accumulated earnings x˜ st . We summarize these variables in the row vector ˜zst := s, e, θ t , a˜st , x˜ st . The value function, i.e., the function that solves the household’s optimization ˜ t , depends not only on the vector ˜zst but problem and returns the maximum of u additionally on factor prices in period t. To capture the latter, we denote the
Appendix 11
717
value function by v˜t (˜zst ). We are now in the position to present the recursive formulation of the household’s decision problem: ( ) 1−η (˜c ts )γ (1 − l ts )1−γ s s s+1 ˜ E t v˜t+1 ˜z v˜t ˜z t = max + βφ , (A.11.1) t t+1 ˜c ts ,l ts 1−η subject to:
0 = ˜y ts + [1 + (1 − τkt )(r t − δ)]˜ ast + Ý t r t − (1 + τct )˜c ts − (1 + gA)˜ as+1 t+1 , ¨ p (1 − τlt − τ t ) ¯y s eθ t w t l ts , s = 1, . . . , T W , s ˜y t = pg en(˜ x st ), s = T W + 1, . . . , T, x˜ s+1 t+1
¨ (s−1)˜x s + ¯y s eθ s w t
=
x˜ st ,
(1+gA )s
s t lt
, s = 1, . . . , T W ,
s = T W + 1, . . . , T.
718
11 OLG Models with Uncertainty
A.12 First-Order Conditions of the Stationary Dynamic Program (11.13) Before we derive the first-order conditions of the stationary dynamic programming problem (A.11.1) we simplify the problem further and consider the steady state of the economy where all scaled aggregate variables and factor prices are time invariant. Accordingly, we can suppress the time index in (A.11.1). In addition, we drop the age index from the state variable ˜z = (s, e, θ , a˜, x˜ ) and the individual variable ˜c , l, a˜, and x˜ . Moreover, we use the prime 0 to distinguish between the current period values of the individual state variables and their next period values so that ˜z0 = (s + 1, e, θ 0 , a˜0 , x˜ 0 ). Furthermore, we denote expectations with respect to next-period values given the current state ˜z by E˜z . Accordingly, the stationary dynamic program of the s-year old household reads: v˜(˜z) = max ˜c ,l
˜c γ (1 − l)1−γ 1−η
1−η
˜ s E˜z v˜(˜z0 ) + βφ
(A.12.1a)
subject to ˜y + [1 + (1 − τk )(r − δ)]˜ a + ter − (1 + τc )˜c , 1 + gA ¨ (1 − τl − τ p ) ¯y s eθ wl, s = 1, . . . , T W , ˜y = pg en(˜ x ), s = T W + 1, . . . , T
a˜0 =
0
x˜ =
¨ (s−1)˜x + ¯y s eθ wl x˜ ,
(1+gA )s
, s = 1, . . . , T W ,
s = T w + 1, . . . , T.
(A.12.1b)
(A.12.1c)
(A.12.1d)
Differentiating the rhs of the Bellman equation (A.12.1a) with respect to ˜c and l yields the following two first-order conditions:97 0=
∂ u(˜c , 1 − l) ˜ s ∂ v˜(˜z0 ) 1 + τc , − βφ E˜z ∂ ˜c ∂ a˜0 1 + gA
∂ u(˜c , 1 − l) ˜ s ∂ v˜(˜z0 ) (1 − τl − τ p ) ¯y s eθ w + βφ E˜z ∂ (1 − l) ∂ a˜0 1 + gA s 0 ˜ s E˜z ∂ v˜(˜z ) ¯y eθ w . + βφ ∂ x˜ 0 (1 + gA)s
0=−
(A.12.2) (A.12.3)
Differentiating the Bellman equation (A.12.1a) from left to right, ignoring the max operator and the dependence of a˜0 , ˜c , and l on the state vector ˜z (an application of the envelope theorem) we obtain 97
For ease of exposition, we use u(˜c , 1 − l) to denote the current-period utility function.
Appendix 12 ∂ v˜(˜z) ∂ a˜
719 0 k ˜ s E˜z ∂ v˜(˜z ) 1 + (1 − τ )(r − δ) βφ 0 ∂ a˜ 1 + gA
= (A.12.2)
=
∂ u(˜c , 1 − l) 1 + (1 − τk )(r − δ) . ∂ ˜c 1 + τc
Accordingly, condition (A.12.2) implies the familiar Euler equation98 ∂ u(˜c , 1 − l) ∂ u(˜c 0 , 1 − l 0 ) = β(1 + gA)γ(1−η)−1 φ s Ez [1 + (1 − (τk )0 )(r 0 − δ)]. ∂ ˜c ∂ ˜c 0 Using (A.12.2) in condition (A.12.3) yields: ∂ u(˜c , 1 − l) ∂ u(˜c , 1 − l) (1 − τl − τ p )θ ¯y s w + ∂ (1 − l) ∂ ˜c 1 + τc s 0 ˜ s E˜z ∂ v˜(z ) θ ¯y w , + βφ ∂ x˜ 0 (1 + gA)s
0=−
which is equation (11.38). Note that in the case where pension payments are independent of the employment history, the second term on the rhs of the previous equation is equal to zero and we obtain the static labor supply condition ∂ u(˜c ,1−l) ∂ (1−l) ∂ u(˜c ,1−l) ∂ ˜c
=
(1 − τl − τ p ) ¯y s eθ w . 1 + τc
Note in addition that the operator Ez defines the sum X E˜z v˜(s + 1, e, θ 0 , a˜0 , x˜ 0 ) = Prob θ 0 |θ v˜(s + 1, e, θ 0 , a˜0 , x˜ 0 ), θ0
since the discrete valued individual productivity θ ∈ {θ1 , . . . , θnθ } is the single stochastic variable in the model.
98
Note that ˜c 0 refers to scaled consumption of the s + 1 year old household and that in the stationary equilibrium r = r 0 and τk = (τk )0
720
11 OLG Models with Uncertainty
A.13 Derivation of the Parameters of the AR(1)-Process with Annual Periods In this Appendix, we derive the parameters (ρ, σ) that we were using for the AR(1)-process with annual periods in Section 11.3.1. In particular, we choose (ρ, σ) so that they correspond to the parameters of the AR(1)-process with quarterly periods, (ρ q , σq ) = (0.95, 0.00763). q Let z t denote the logarithm of the technology level in the model with quarterly periods that follows the AR(1)-process: q
q
q
q
q
q
q
q
q
q
q
q
z t+1 = ρ q z t + ε t+1 , q where ε t ∼ N 0, (σq )2 . Similarly, z t+2 = ρ q z t+1 + ε t+2 , z t+3 = ρ q z t+2 + ε t+3 , z t+4 = ρ q z t+3 + ε t+4 . Let z Ta denote the logarithm of the technology level in the corresponding model with annual periods that follows the AR(1)-process: z Ta +1 = ρz Ta + ε T +1 , where ε T ∼ N 0, σ2 . If we identify the technology level z q at the beginning of the quarters t, t + 4, t + 8 with the annual technology level z a at the beginning of the periods T , T + 1, T + 2, we find: q
q
q
z Ta +1 = z t+4 = ρ q z t+3 + ε t+4 , q q q = ρ q ρ q z t+2 + ε t+3 + ε t+4 , q
q
q
= (ρ q )2 z t+2 + ρ q ε t+3 + ε t+4 , q
q
q
q
= (ρ q )3 z t+1 + (ρ q )2 ε t+2 + ρ q ε t+3 + ε t+4 , q
q
q
q
q
q
q
q
q
= (ρ q )4 z t + (ρ q )3 ε t+1 + (ρ q )2 ε t+2 + ρ q ε t+3 + ε t+4 , = (ρ q )4 z Ta + (ρ q )3 ε t+1 + (ρ q )2 ε t+2 + ρ q ε t+3 + ε t+4 . Accordingly, we can make the following identifications: ρ = (ρ q )4 q
q
q
q
ε T = (ρ q )3 ε t+1 + (ρ q )2 ε t+2 + ρ q ε t+3 + ε t+4 . Therefore, var(ε) = σ2
= var εq (1 + ρ q + (ρ q )2 + (ρ q )3 ) = 1 + (ρ q )2 + (ρ q )4 + (ρ q )6 ) (σq )2 .
For (ρ q , σq ) = (0.95, 0.00763), we get ρ = 0.814 and σ = 0.0142.
Problems
721
Problem 11.1: Concentration of Wealth and the Natural Real Interest Rate
Recompute the model described in Section 11.2 for the following changes:
1) Borrowing constraint: Recompute the model for a less strict borrowing constraint where the agent can borrow up to the average net wage in the economy, $(1-\tau^l-\tau^p)w\bar l$. How does this affect the Gini coefficient of wealth?
2) Pay-as-you-go pensions: Compute the effect of higher public pensions on wealth heterogeneity. For this purpose, increase the replacement rate of pensions with respect to average gross wages to 50%.
3) Demographics: What is the effect of aging on the inequality of earnings, income, and wealth? Recompute the model for the population parameters for the year 2100 as estimated by UN (2015).
4) Perfect annuities markets: Assume that there exist perfect annuities markets in the economy. Financial intermediaries collect accidental bequests without transaction costs and redistribute them to those agents of the same age as the deceased. Consequently, the budget constraints of the households adjust as follows:
$$(1+\tau^c_t)c^s_t = y^s_t + \frac{1+(1-\tau^k_t)(r_t-\delta)}{\phi^s_t}k^s_t + \frac{R^b_t}{\phi^s_t}b^s_t + tr_t - k^{s+1}_{t+1} - b^{s+1}_{t+1}.$$
The government no longer collects accidental bequests, $Beq_t = 0$. Recompute the model of Section 11.2 and show that wealth inequality increases.
5) Bequests: Introduce bequests into the model following Eggertsson et al. (2019). Accordingly, lifetime utility (11.2) is replaced by
$$U^1_t := E_t\left\{\sum_{s=1}^T \beta^{s-1}\left(\prod_{j=1}^s \phi^{j-1}_{t+j-2}\right)\left[u(c_{t+s-1},l_{t+s-1}) + v(g_{t+s-1})\right] + \beta^T\left(\prod_{j=1}^T \phi^{j-1}_{t+j-2}\right)\omega\!\left(beq^T_{t+T}\right)\right\},$$
where $beq^T_{t+T}$ denotes the amount of bequests left per descendent at the end of age $T$. Assume that only bequests left after the maximum period of life $T$ are considered in utility and that all bequests are left to the cohorts at age $T-24$.
As only some of the households survive until the age $T$, not all members of the $(T-24)$-year-old cohort would receive a bequest. For simplicity, assume that the cohort of the $(T-24)$-year-old participates in a bequest insurance market and all households in this cohort receive the same bequest. In addition, assume that the utility from bequests is given by
$$\omega(beq) = \omega_0\,\frac{beq^{1-\eta}}{1-\eta}.$$
Calibrate the parameter $\omega_0$ such that total bequests are equal to 3.0% of output. In addition, assume that there exists a perfect annuities market as above. Recompute the model and evaluate the effects of perfect annuities markets and the bequest motive on 1) wealth inequality and 2) the real interest rate $r^b$. Show that these two model elements help to establish a much lower real interest rate $r^b$; a negative natural real interest rate serves as the central ingredient in the modeling of secular stagnation in Eggertsson et al. (2019).
6) Utility function: Consider the following function of instantaneous utility from consumption and labor taken from Trabandt and Uhlig (2011):
$$u(c,l) = \begin{cases} \ln c - \kappa\, l^{1+1/\varphi} & \text{if } \eta = 1,\\[4pt] \dfrac{1}{1-\eta}\left(c^{1-\eta}\left[1-\kappa(1-\eta)l^{1+1/\varphi}\right]^{\eta} - 1\right) & \text{if } \eta > 0 \text{ and } \eta\neq 1.\end{cases}$$
Choose the calibration parameters $\kappa = 3.63$ and $\varphi = 1.0$ as in Section 10.3. What are the effects of the utility function on wealth inequality (as measured by the Gini coefficient) and the real interest rate on bonds $r^b$? In light of these results, would you suggest that researchers should study the sensitivity of their dynamic general equilibrium results with respect to the specification of the instantaneous utility function?
Problem 11.2: Recompute the Model in Section 11.2.3
Different from the program AK70_prog_pen.g, optimize the right-hand side of the Bellman equation by choosing optimal next-period wealth $\tilde a'$ and labor supply $l$ between grid points. Use golden section search for the optimization with respect to wealth $\tilde a'$ and compute the optimal labor supply with the help of the first-order condition (11.38) using the Newton-Raphson method.
Problem 11.3: Business Cycle Dynamics of Distribution Measures Recompute the dynamics of Example 11.3.1 adding the following modifications:
1) In each cohort, there is an equal share of unskilled and skilled workers with permanent productivity $e\in\{0.53, 1.47\}$ so that the individual hourly gross wage of the $s$-year-old with permanent productivity $e$ amounts to $w_t\,e\,\bar y_s\,A_t$.
2) Assume that pensions are adjusted to the wage deviations from its steady-state value with a lag of one period according to
$$\frac{pgen_t - pgen}{pgen} = \frac{w_{t-1}-w}{w}.$$
3) Assume that the unskilled and skilled supply labor in the amounts $l^{low,s}_t$ and $l^{high,s}_t$ at age $s$ in period $t$, respectively. Use the nested CES production function suggested by Krusell et al. (2000):
$$Y_t = Z_t\left[\mu\left(L^{low}_t\right)^{\sigma} + (1-\mu)\left(\alpha K_t^{\rho} + (1-\alpha)\left(L^{high}_t\right)^{\rho}\right)^{\sigma/\rho}\right]^{1/\sigma},\qquad (P.11.3.1)$$
where $\sigma$ and $\rho$ govern the substitution elasticities between unskilled labor $L^{low}_t$, capital (equipment) $K_t$, and skilled labor $L^{high}_t$. If $\sigma > \rho$, capital is more complementary with skilled labor than with unskilled labor. Apply the parameterization of Krusell et al. (2000) with $1/(1-\rho) = 0.67$ and $1/(1-\sigma) = 1.67$.⁹⁹ Calibrate $\mu$ and $\alpha$ so that the wage share in GDP is equal to 64% and the skill premium of the high-skilled amounts to 150%. Re-compute the impulse response functions and second moments of the Gini coefficients for wealth, income, and earnings in each case.
Problem 11.4: Business Cycle Dynamics of the Income Distribution Consider the model described in Section 11.3.4. Recompute the model for quarterly frequencies. Be careful to recalibrate β and δ. What are the effects on business cycle statistics for the income shares?
Problem 11.5: Redistributive Effects of Cyclical Government Spending
Introduce stochastic government spending $G_t$ into the model in Section 11.3.4. Assume that government spending follows the AR(1)-process
⁹⁹ Maliar et al. (2022) re-estimate the capital-skill complementarity model of Krusell et al. (2000) for the period 1963-2017 and confirm the success of the model in capturing the empirical skill premium in the US economy. They project that inequality will continue to increase over the next two decades.
$$\ln\tilde G_t = \rho\ln\tilde G_{t-1} + (1-\rho)\ln\tilde G + \epsilon_t,$$
with $\epsilon\sim N(0,\sigma^2)$, $\rho = 0.7$, and $\sigma = 0.007$. Assume further that government expenditures are financed with a proportional tax on factor income and that the government budget balances in each period.
1) Reformulate the model.
2) Compute the nonstochastic steady state assuming that government expenditures amount to 18% of total production. What are the values of the nonstochastic steady-state transfers?
3) Discretize the AR(1)-process for government consumption choosing three values. Let the middle point correspond to the one in the nonstochastic steady state. Use the Markov-chain approximation algorithm from Section 16.4.
4) Compute the business cycle dynamics for the model. The state space consists of $(s, e, \theta, \tilde k;\, Z, \tilde K, \tilde G)$.
5) How does cyclical government spending affect the income distribution? Simulate a time series where government expenditures are increased above the steady-state level for one period and fall back to the steady-state level thereafter. Plot the impulse response functions of the gross income Gini index.
Part III
Numerical Methods
Chapter 12
Linear Algebra
12.1 Introduction This chapter covers some elementary and some relatively advanced but very useful concepts and techniques from linear algebra. Most of the elementary material here can be found in any undergraduate textbook on linear algebra, e.g., Lang (1986). For the more advanced subjects, Bronson (1989), Golub and Van Loan (1996), and Magnus and Neudecker (1999) are good references. In addition, many texts on econometrics review matrix algebra, e.g., Greene (2012) Appendix A, Judge et al. (1988) Appendix A, and Lütkepohl (2005) Appendix A.
12.2 Complex Numbers
A complex number $c$ is an object of the form $c = \alpha + i\beta$, where $\alpha$ and $\beta$ are real numbers, $\alpha,\beta\in\mathbb R$. The symbol $i$ designates the imaginary unit, whose square is defined as negative unity, i.e., $i = \sqrt{-1}$ or $i^2 = -1$. The set of all these numbers is denoted by the symbol $\mathbb C$. In the definition of $c$, the real number $\alpha$ is called the real part of the complex number $c$, and the real number $\beta$ is called the imaginary part. If we measure $\alpha$ on the abscissa and $\beta$ on the ordinate, $c$ is a point in the plane, sometimes called the Gaussian plane. Instead of representing $c$ with the pair $(\alpha,\beta)$, we can use polar coordinates. If $\theta$ is the angle (measured in radians) between the horizontal axis and the vector from the origin to the point $(\alpha,\beta)$, then $c = r(\cos\theta + i\sin\theta)$ as shown in Figure 12.1. According to the Pythagorean theorem, the length of the vector is equal to
Figure 12.1 Gaussian Plane (the complex number c = α + iβ is the point with abscissa α on the real axis, ordinate β on the imaginary axis, modulus r, and angle θ)
$r = \sqrt{\alpha^2+\beta^2}$. $r$ is called the modulus (or simply the absolute value $|c|$) of the complex number $c$. The complex conjugate of $c$ is denoted by $\bar c$ and given by $\bar c = \alpha - i\beta$. The addition and multiplication of the complex numbers $c_1 = \alpha_1 + i\beta_1$ and $c_2 = \alpha_2 + i\beta_2$ are defined by the following formulas:
$$c_1 + c_2 = \alpha_1+\alpha_2 + i(\beta_1+\beta_2),\qquad c_1c_2 = (\alpha_1+i\beta_1)(\alpha_2+i\beta_2) = (\alpha_1\alpha_2 - \beta_1\beta_2) + i(\alpha_1\beta_2+\alpha_2\beta_1).$$
In polar coordinates, the product $c_1c_2$ is given by¹
$$c_1c_2 = r_1r_2\left[\cos(\theta_1+\theta_2) + i\sin(\theta_1+\theta_2)\right].$$
Thus, the vector representing $c_1$ in the Gaussian plane is stretched by $r_2$ and rotated counterclockwise by $\theta_2$ radians. If $c_1 = c_2$, this implies that $c^2 = r^2[\cos(2\theta) + i\sin(2\theta)]$; more generally, for each $t\in\mathbb N$:
$$c^t = r^t[\cos(t\theta) + i\sin(t\theta)]. \qquad (12.1)$$
¹ This follows from the trigonometric formulas $\cos(\theta_1+\theta_2) = \cos\theta_1\cos\theta_2 - \sin\theta_1\sin\theta_2$ and $\sin(\theta_1+\theta_2) = \sin\theta_1\cos\theta_2 + \cos\theta_1\sin\theta_2$. See, e.g., Sydsæter et al. (1999) p. 15.
Since $\cos^2(x) + \sin^2(x) = 1$ for all $x\in\mathbb R$, this implies
$$\lim_{t\to\infty}|c^t| = \lim_{t\to\infty} r^t = 0\quad\text{for } r\in(0,1). \qquad (12.2)$$
Using the modulus as a measure of distance in C implies that the sequence c t converges to the origin of the Gaussian plane if the absolute value of r is less than one. Sometimes we say that a complex number is inside (on or outside) the unit circle. The unit circle is the circle around the origin of the Gaussian plane with a radius equal to one. Thus, complex numbers inside (on or outside) this circle have moduli less than (equal to or greater than) unity.
12.3 Vectors
A real (complex) vector of dimension $n$ is an $n$-tuple of numbers $x_i\in\mathbb R$ ($x_i\in\mathbb C$), $i = 1,2,\dots,n$, and it is denoted by
$$x = \begin{bmatrix} x_1\\ x_2\\ \vdots\\ x_n\end{bmatrix}.$$
The space of all $n$-tuples is $\mathbb R^n$ ($\mathbb C^n$). Vector addition and scalar multiplication are defined by
$$z = y + bx = \begin{bmatrix} y_1+bx_1\\ y_2+bx_2\\ \vdots\\ y_n+bx_n\end{bmatrix}.$$
12.4 Norms
Norms are measures of vector length. Since the distance between two vectors is given by the length of their difference, norms also define measures of distance. More formally, a norm on $\mathbb R^n$ (and similarly on $\mathbb C^n$) is a real-valued function $\|\cdot\| : \mathbb R^n\to\mathbb R_+$ with the following three properties:
1) $\|x\|\ge 0$ for all $x\in\mathbb R^n$, and $\|x\| = 0$ if and only if $x = 0\in\mathbb R^n$,
2) $\|ax\| = |a|\cdot\|x\|$ for all $x\in\mathbb R^n$ and $a\in\mathbb R$,
3) $\|x+y\|\le\|x\| + \|y\|$ for all $x,y\in\mathbb R^n$.   (12.3)
The most common examples of norms on $\mathbb R^n$ are:
1) The $\ell_\infty$ or sup norm: $\|x\|_\infty := \max_{1\le i\le n}|x_i|$, where $|x_i|$ denotes the absolute value of $x_i$.
2) The $\ell_2$ or Euclidean norm: $\|x\|_2 := \left(\sum_{i=1}^n x_i^2\right)^{1/2}$.
12.5 Linear Independence A set of n vectors xi , i = 1, 2, . . . n is linearly independent if and only if 0 = a1 x1 + a2 x2 + · · · + an xn
has only the solution a1 = a2 = · · · = an = 0. If B := {v1 , v2 , . . . , vn } is a set of linearly independent vectors in Rn , then any other vector x ∈ Rn can be represented as a linear combination x = a1 v1 + a2 v2 + · · · + an vn .
Moreover, B is called a basis of Rn .
12.6 Matrices
A real (complex) matrix $A$ with the typical element $a_{ij}\in\mathbb R$ ($a_{ij}\in\mathbb C$) is the following $n$-by-$m$ array of numbers:
$$A = (a_{ij}) := \begin{bmatrix} a_{11} & a_{12} & \dots & a_{1m}\\ a_{21} & a_{22} & \dots & a_{2m}\\ \vdots & \vdots & \ddots & \vdots\\ a_{n1} & a_{n2} & \dots & a_{nm}\end{bmatrix}.$$
$A$ is called a square matrix if $n = m$. Other special matrices encountered in the main text are the diagonal matrix (with $a_{ij} = 0$ for all $i\neq j$), the upper triangular matrix (with $a_{ij} = 0$ for all $i > j$), and the identity matrix (the diagonal matrix with $a_{ii} = 1$).²
If we consider the matrix $A = (a_{ij})$ as the 'long' row vector $[a_{11}, a_{21}, \dots, a_{n1}, a_{12}, \dots, a_{n2}, \dots, a_{1m}, \dots, a_{nm}]$, which lines up column 1, column 2, ..., column $m$ side by side, we may apply the definition of any vector norm to this vector to find the corresponding matrix norm. For instance, the $\ell_2$ norm of $A$ is
$$\|A\| = \left(\sum_{j=1}^m\sum_{i=1}^n a_{ij}^2\right)^{1/2}.$$
² In this text, we denote the identity matrix of dimension $n$ by $I_n$.
Matrix addition and scalar multiplication are defined componentwise:
$$C = A + dB = \begin{bmatrix} a_{11}+db_{11} & \dots & a_{1m}+db_{1m}\\ \vdots & \ddots & \vdots\\ a_{n1}+db_{n1} & \dots & a_{nm}+db_{nm}\end{bmatrix} \qquad (12.4)$$
for $A, B, C\in\mathbb R^{n\times m}$ and $d\in\mathbb R$. Thus, matrix addition obeys the following rules:
$$A + B = B + A, \qquad (12.5a)$$
$$A + (B + C) = (A + B) + C. \qquad (12.5b)$$
The product of two matrices, $A\in\mathbb R^{n\times m}$ and $B\in\mathbb R^{m\times k}$, is the $n\times k$ matrix $C = (c_{ij})$, which is defined by
$$c_{ij} = \sum_{l=1}^m a_{il}b_{lj}. \qquad (12.6)$$
For matrices $A$, $B$, $C$, and $D$ that have suitable dimensions, matrix multiplication satisfies the following rules:
$$A(B + C) = AB + AC, \qquad (12.7a)$$
$$A(BC) = (AB)C, \qquad (12.7b)$$
$$A(B + C)D = ABD + ACD. \qquad (12.7c)$$
Notably, however, matrix multiplication is generally not commutative; i.e., $AB\neq BA$ except in special cases. The determinant of a $2\times 2$ matrix $A$, denoted by either $|A|$ or $\det(A)$, is defined by
$$|A| = a_{11}a_{22} - a_{12}a_{21}. \qquad (12.8)$$
There is a recursive formula that can be used to compute the determinant of an arbitrary square matrix of dimension $n$. With an arbitrary row (e.g., $i$) or column (e.g., $j$), the formula is
$$|A| = \sum_{j=1}^n a_{ij}(-1)^{i+j}|A_{ij}| = \sum_{i=1}^n a_{ij}(-1)^{i+j}|A_{ij}|, \qquad (12.9)$$
where $A_{ij}$ is the square matrix of dimension $n-1$ obtained from $A$ by deleting the $i$-th row and the $j$-th column. This expansion gives the determinant of $A$ in terms of a sum of determinants of matrices of dimension $n-1$. The latter can be reduced further to determinants of matrices of dimension $n-2$ and so forth until the summands are $2\times 2$ matrices, computed according to equation (12.8). The following rules apply to the determinant of two matrices $A, B\in\mathbb R^{n\times n}$ and a scalar $c$:³
$$|cA| = c^n|A|, \qquad (12.10a)$$
$$|AB| = |A||B|. \qquad (12.10b)$$
The rank of an arbitrary $n\times m$ matrix $A$, denoted by $\mathrm{rank}(A)$, is the maximal number of linearly independent rows of $A$. This number is equal to the maximal number of linearly independent columns of $A$.
The transpose of $A\in\mathbb R^{n\times m}$, denoted by $A'$ or $A^T$, is the $m\times n$ matrix obtained when the rows and columns of $A$ are interchanged: $A^T = (a^T_{ij}) = (a_{ji})$.
In the case of a complex matrix $A\in\mathbb C^{n\times m}$, we use the superscript $H$ to denote the complex conjugate transpose of $A$: $A^H\in\mathbb C^{m\times n}$ is the matrix whose element $a^H_{ij}$ is the complex conjugate $\bar a_{ji}$ of the element $a_{ji}$ of $A$.
³ See, e.g., Lütkepohl (2005), p. 649.
The complex conjugate transpose is also referred to as the Hermitian transpose.
The inverse of a square matrix $A$ is denoted $A^{-1} = (a^{ij})$ (note that we use superscripts to indicate the typical element of an inverse matrix) and solves the problem $AA^{-1} = I$. If it exists, the inverse is unique and given by
$$a^{ij} = \frac{(-1)^{i+j}|A_{ji}|}{|A|}. \qquad (12.11)$$
If $|A| = 0$, the inverse does not exist. The expansion formula (12.9) implies that matrices with a row (or column) of zeros or with linearly dependent rows (or columns) have no inverse. In general, an invertible (noninvertible) matrix is called a nonsingular (singular) matrix.
Consider a partitioned matrix $A$ and its inverse $A^{-1}$ defined as:
$$A = \begin{bmatrix} A_{11} & A_{12}\\ A_{21} & A_{22}\end{bmatrix},\qquad A^{-1} = \begin{bmatrix} A^{11} & A^{12}\\ A^{21} & A^{22}\end{bmatrix}. \qquad (12.12)$$
If $A_{11}$ is invertible, the blocks of the inverse are related to the blocks of $A$ as follows:⁴
$$\begin{aligned} A^{11} &= A_{11}^{-1} + A_{11}^{-1}A_{12}A^{22}A_{21}A_{11}^{-1},\\ A^{12} &= -A_{11}^{-1}A_{12}A^{22},\\ A^{21} &= -A^{22}A_{21}A_{11}^{-1},\\ A^{22} &= \left(A_{22} - A_{21}A_{11}^{-1}A_{12}\right)^{-1}.\end{aligned} \qquad (12.13)$$
However, if the matrix $A_{22}$ is invertible, the relation between the blocks of $A$ and those of $A^{-1}$ is given by:⁵
$$\begin{aligned} A^{11} &= \left(A_{11} - A_{12}A_{22}^{-1}A_{21}\right)^{-1},\\ A^{12} &= -A^{11}A_{12}A_{22}^{-1},\\ A^{21} &= -A_{22}^{-1}A_{21}A^{11},\\ A^{22} &= A_{22}^{-1} + A_{22}^{-1}A_{21}A^{11}A_{12}A_{22}^{-1}.\end{aligned} \qquad (12.14)$$
A square matrix $A$ is symmetric if it equals its transpose: $A = A'$. The transpose operator obeys the following rules:
⁴ See, e.g., Sydsæter et al. (1999), equation (19.48), p. 129.
⁵ See, e.g., Sydsæter et al. (1999), equation (19.49), p. 130.
$$(A')' = A, \qquad (12.15a)$$
$$(A + B)' = A' + B', \qquad (12.15b)$$
$$(AB)' = B'A', \qquad (12.15c)$$
$$(A^{-1})' = (A')^{-1}. \qquad (12.15d)$$
A square matrix $A\in\mathbb R^{n\times n}$ is called orthogonal if $A^TA = I_n$. A complex square matrix $A\in\mathbb C^{n\times n}$ is called unitary if $A^HA = I_n$.
The trace of a square matrix $A$ is the sum of the elements of its main diagonal, i.e.,
$$\mathrm{tr}\,A = \sum_{i=1}^n a_{ii}. \qquad (12.16)$$
The trace operator satisfies the following rules:
$$\mathrm{tr}(A + B) = \mathrm{tr}(A) + \mathrm{tr}(B), \qquad (12.17a)$$
$$\mathrm{tr}(AB) = \mathrm{tr}(BA)\ \text{if } AB \text{ is a square matrix}, \qquad (12.17b)$$
$$\mathrm{tr}(A^T) = \mathrm{tr}(A). \qquad (12.17c)$$
The Kronecker product $\otimes$ of two matrices $A$ and $B$ is the following expression:
$$A\otimes B = \begin{bmatrix} a_{11}B & a_{12}B & \dots & a_{1m}B\\ a_{21}B & a_{22}B & \dots & a_{2m}B\\ \vdots & \vdots & \ddots & \vdots\\ a_{n1}B & a_{n2}B & \dots & a_{nm}B\end{bmatrix}. \qquad (12.18)$$
Assuming that the involved matrices have suitable dimensions, the Kronecker product satisfies the following rules:⁶
$$(A\otimes B)^T = A^T\otimes B^T, \qquad (12.19a)$$
$$A\otimes(B + C) = A\otimes B + A\otimes C, \qquad (12.19b)$$
$$(A\otimes B)(C\otimes D) = AC\otimes BD, \qquad (12.19c)$$
$$(A\otimes B)^{-1} = A^{-1}\otimes B^{-1}, \qquad (12.19d)$$
$$\mathrm{tr}(A\otimes B) = \mathrm{tr}(A)\,\mathrm{tr}(B). \qquad (12.19e)$$
The vec operator transforms a matrix $A\in\mathbb R^{n\times m}$ into an $nm\times 1$ vector by stacking the columns of $A$:
$$\mathrm{vec}(A) = [a_{11}, a_{21}, \dots, a_{n1}, a_{12}, a_{22}, \dots, a_{n2}, \dots, a_{1m}, a_{2m}, \dots, a_{nm}]'. \qquad (12.20)$$
The following rules apply to the vec operator:
$$\mathrm{vec}(A + B) = \mathrm{vec}(A) + \mathrm{vec}(B), \qquad (12.21a)$$
$$\mathrm{vec}(AB) = (I\otimes A)\,\mathrm{vec}(B) = (B'\otimes I)\,\mathrm{vec}(A), \qquad (12.21b)$$
$$\mathrm{vec}(ABC) = (C'\otimes A)\,\mathrm{vec}(B) = (I\otimes AB)\,\mathrm{vec}(C) = (C'B'\otimes I)\,\mathrm{vec}(A). \qquad (12.21c)$$
⁶ See, e.g., Lütkepohl (2005), p. 661.
12.7 Linear and Quadratic Forms
Let $a = (a_1, a_2, \dots, a_n)'$ and $x = (x_1, x_2, \dots, x_n)'$ denote two column vectors of dimension $n$. The dot product
$$z(x) = a'x = \sum_{i=1}^n a_ix_i \qquad (12.22)$$
with given $a$ is called a linear form. The column vector of partial derivatives of $z$ with respect to $x_i$, $i = 1,2,\dots,n$, denoted by $\nabla z$, is given by
$$\nabla z(x) := \frac{\partial\,a'x}{\partial x} = a = (a')'. \qquad (12.23)$$
Since $z = z' = x'a$, we also have
$$\frac{\partial\,x'a}{\partial x} = a. \qquad (12.24)$$
A direct application of these findings leads to the following two rules:
$$\frac{\partial\,u'Bx}{\partial x} = (u'B)' = B'u, \qquad (12.25a)$$
$$\frac{\partial\,u'Bx}{\partial u} = Bx, \qquad (12.25b)$$
where $u\in\mathbb R^m$, $B\in\mathbb R^{m\times n}$, and $x\in\mathbb R^n$.
Let $A = (a_{ij})$ denote an $n\times n$ square matrix and $x = (x_1, x_2, \dots, x_n)'$ an $n$-dimensional column vector. The expression
$$q(x) = x'Ax,\qquad q\in\mathbb R, \qquad (12.26)$$
is a quadratic form. If $q\ge 0$ ($q\le 0$) for each nonzero vector $x$, the matrix $A$ is positive (negative) semidefinite. If $q > 0$ ($q < 0$), $A$ is positive (negative) definite. Let $B\in\mathbb R^{m\times n}$, $x\in\mathbb R^n$, and $v = Bx$. Since
$$v'v = \sum_{i=1}^m v_i^2 = x'B'Bx\ge 0\quad\forall x,$$
the square matrix $A := B'B$ is clearly positive semidefinite. Using the rule for matrix multiplication given in (12.6), equation (12.26) can be written in several ways:
$$q(x) = \sum_{i=1}^n\sum_{j=1}^n a_{ij}x_ix_j = \sum_{i=1}^n a_{ii}x_i^2 + \sum_{i=1}^n\sum_{\substack{j=1\\ j\neq i}}^n a_{ij}x_ix_j = \sum_{i=1}^n a_{ii}x_i^2 + \sum_{i=1}^n\sum_{j=i+1}^n (a_{ij}+a_{ji})x_ix_j.$$
Setting $\tilde a_{ij} = \tilde a_{ji}\equiv(a_{ij}+a_{ji})/2$, it is obvious that we can represent the quadratic form $q(x)$ equivalently with the symmetric matrix $\tilde A := (\tilde a_{ij})$. Accordingly, it is easy to show that the column vector of the first partial derivatives of $q(x)$ with respect to $x_i$, $i = 1,2,\dots,n$, is given by
$$\nabla q(x) := \frac{\partial\,x'Ax}{\partial x} = (A + A')x = 2\tilde Ax. \qquad (12.27)$$
12.8 Eigenvalues and Eigenvectors
Let $A\in\mathbb R^{n\times n}$. A right eigenvector of $A$ to an eigenvalue $\lambda$ is a vector $v\neq 0$ that solves
$$Av = \lambda v\quad\Leftrightarrow\quad (A-\lambda I)v = 0. \qquad (12.28)$$
Similarly, the solution of $v'A = \lambda v'$ is called a left eigenvector of $A$. The system of $n$ linear equations (12.28) has nontrivial solutions $v\neq 0$ if the determinant $|A-\lambda I|$ vanishes. The condition $|A-\lambda I| = 0$ results in a polynomial of degree $n$ in $\lambda$. It is well known from the Fundamental Theorem of Algebra (see, e.g., Hirsch and Smale (1974) pp. 328ff.) that this polynomial has at least one root. The distinct roots $\lambda_i$, $i = 1,2,\dots,k$, $k\le n$, are the eigenvalues of the matrix $A$. The eigenvalues can be real or complex and may have algebraic multiplicity $m_i\ge 1$ with $n = \sum_{i=1}^k m_i$.⁷ All eigenvalues of unitary matrices have absolute values equal to one.
Solving equation (12.28) for a given $\lambda_i$ produces an associated eigenvector $v_i$. Thus, the eigenvectors are vectors that either stretch or shrink when multiplied by $A$. If $v_i$ solves (12.28) and $c$ is an arbitrary scalar, then $cv_i$ also solves (12.28). Therefore, eigenvectors are preserved under scalar multiplication and can be normalized to have unit length. The number of linearly independent eigenvectors to an eigenvalue is referred to as the geometric multiplicity of the respective eigenvalue.
There are two important relations between the elements of $A$ and its eigenvalues:
$$\sum_{i=1}^n \lambda_i = \sum_{i=1}^n a_{ii}, \qquad (12.29a)$$
$$\prod_{i=1}^n \lambda_i = |A|. \qquad (12.29b)$$
In other words, the sum of the eigenvalues of $A$ equals the trace of $A$, and the determinant of $A$ equals the product of the $n$ eigenvalues.
Note that equation (12.28) is a special case of
$$(A-\lambda I)^m v_m = 0$$
for $m = 1$. If there are nontrivial solutions $v_m$ for $m\ge 2$ but not for $m-1$, the vector $v_m$ is called a generalized right eigenvector of rank $m$ for the square matrix $A$. The space spanned by the (generalized) eigenvectors of $A$ is called the eigenspace of $A$. The eigenspace can be partitioned into three subspaces formed by generalized eigenvectors that belong to eigenvalues with
1) moduli less than one (stable eigenspace, $E^s$),
2) moduli equal to one (center eigenspace, $E^c$),
3) moduli greater than one (unstable eigenspace, $E^u$).
⁷ Accordingly, if we count the multiples of a root, the Fundamental Theorem of Algebra implies that the polynomial $|A-\lambda I| = 0$ has $n$ roots.
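As a quick numerical illustration of the relations in (12.29), the following minimal Python sketch (not part of the book's GAUSS/MATLAB code) computes the eigenvalues of a random matrix and checks that their sum and product match the trace and the determinant.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(5, 5))           # a generic real 5x5 matrix

lam, V = np.linalg.eig(A)             # eigenvalues and right eigenvectors

# (12.29a): the eigenvalues sum to the trace of A
print(np.isclose(lam.sum().real, np.trace(A)))            # True
# (12.29b): the eigenvalues multiply to the determinant of A
print(np.isclose(np.prod(lam).real, np.linalg.det(A)))    # True
# each column of V solves (A - lam*I)v = 0 up to floating-point error
print(np.allclose(A @ V, V @ np.diag(lam)))               # True
```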
12.9 Matrix Factorization Matrix factorizations or, synonymously, matrix decompositions play an important role in the solution of systems of linear difference equations. They are also used to solve systems of linear equations and least squares problems. In the following, we consider the Jordan, the Schur, the QZ, the LU, the Cholesky, the QR, and the singular value factorization.
12.9.1 Jordan Factorization
Consider the case of $n$ distinct real eigenvalues and the associated eigenvectors $v_1, v_2, \dots, v_n$ of a square matrix $A$. The matrix $V = [v_1, v_2, \dots, v_n]$ transforms $A$ into a diagonal matrix $\Lambda$ with the eigenvalues $\lambda_1, \lambda_2, \dots, \lambda_n$ on its main diagonal:
$$\Lambda = V^{-1}AV.$$
In the general case of real and complex eigenvalues, possibly with multiplicity $m_i > 1$, it may not be possible to diagonalize $A$. However, there exists a matrix $M$ (in general a complex matrix) of a set of linearly independent generalized eigenvectors that puts $A$ in the Jordan canonical form:
$$A = MJM^{-1},\qquad J = \begin{bmatrix} J_1 & 0 & \dots & 0\\ 0 & J_2 & \dots & 0\\ \vdots & \vdots & \ddots & \vdots\\ 0 & 0 & \dots & J_S\end{bmatrix}, \qquad (12.30)$$
where the Jordan blocks $J_i\in\mathbb C^{s_i\times s_i}$, $i = 1,2,\dots,S$, $n = \sum_{i=1}^S s_i$, are given by
$$J_i = \begin{bmatrix} \lambda_i & 1 & 0 & \dots & 0\\ 0 & \lambda_i & 1 & \dots & 0\\ \vdots & \vdots & \ddots & \ddots & \vdots\\ 0 & 0 & \dots & \lambda_i & 1\\ 0 & 0 & \dots & 0 & \lambda_i\end{bmatrix}.$$
The scalars $\lambda_i$ refer to eigenvalues of $A$ that are not necessarily distinct; i.e., an eigenvalue $\lambda_i$ may appear in more than one Jordan block, and the sizes $s_j$ of all blocks with $\lambda_j = \lambda_i$ sum to the algebraic multiplicity $m_i$. Notably, if $\lambda_i$ is a simple eigenvalue (i.e., has multiplicity $m_i = 1$), then $J_i = \lambda_i$. The Jordan blocks are determined uniquely. They can be ordered in $J$ according to the absolute values of the eigenvalues of $A$. There is also a real Jordan factorization of $A$ where each complex root $\lambda_j = \alpha_j + i\beta_j$ in $J_j$ is represented by a matrix
$$\begin{bmatrix} \alpha_j & -\beta_j\\ \beta_j & \alpha_j\end{bmatrix},$$
and the respective elements on the upper-right diagonal are replaced with two-dimensional identity matrices $I_2$.
Consider a matrix $A\in\mathbb R^{n\times n}$ whose $n$ eigenvalues $\lambda_i$ are all real and nonnegative (that is, $A$ is positive semidefinite). Let $\Lambda^{1/2} = (\sqrt{\lambda_i})$ be the diagonal matrix with the square roots of the eigenvalues along the main diagonal. Then, $A^{1/2} = V\Lambda^{1/2}V^{-1}$, because
$$A^{1/2}A^{1/2} = V\Lambda^{1/2}V^{-1}V\Lambda^{1/2}V^{-1} = V\Lambda V^{-1} = A.$$
It is easy to show by induction that for any $r = 1,2,\dots$
$$A^{1/r} = V\Lambda^{1/r}V^{-1},\qquad \Lambda^{1/r} = (\lambda_i^{1/r}). \qquad (12.31)$$
We use this definition of the root of a matrix in Section 9.5.2 to compute a 1/8-year transition matrix on the basis of a 5-year transition matrix.
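To illustrate equation (12.31), the sketch below (Python, for illustration only; the transition matrix shown is a hypothetical example, not the one used in Section 9.5.2) computes the one-period root of a 5-period transition matrix and checks that applying it five times recovers the original matrix.

```python
import numpy as np

P5 = np.array([[0.90, 0.10],      # hypothetical 5-year transition matrix
               [0.20, 0.80]])

lam, V = np.linalg.eig(P5)        # eigenvalues (here 1.0 and 0.7) and eigenvectors
P1 = V @ np.diag(lam ** (1 / 5)) @ np.linalg.inv(V)    # A^{1/r} = V Lambda^{1/r} V^{-1}

print(np.allclose(np.linalg.matrix_power(P1, 5), P5))  # True
print(P1)                                              # the implied 1-year transition matrix
```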
If the matrix $A$ is symmetric and positive semidefinite, the matrix $V$ has the property $VV^T = I_n$, and $A$ can be factored as
$$A = PP^T,\qquad P := V\Lambda^{1/2}. \qquad (12.32)$$
12.9.2 Schur Factorization
The Schur factorization of a square matrix $A$ is given by
$$A = TST^{-1}. \qquad (12.33)$$
The complex matrix S is an upper triangular matrix with the eigenvalues of A on the main diagonal. It is possible to choose T such that the eigenvalues appear in any desired order along the diagonal of S. The transformation matrix T is a unitary matrix: T −1 = T H .
12.9.3 QZ Factorization Consider the set of all matrices of the form C := A − λB, where A and B are (possibly complex) square matrices of dimension n and where the scalar λ may be real or complex. This set is called a (matrix) pencil. The eigenvalues of this pencil are the scalars λ that solve the generalized eigenvalue problem |A−λB| = 0. The generalized Schur or QZ factorization of a matrix pencil (A−λB) is given by (see, e.g., Golub and Van Loan (1996), Theorem 7.7.1, p. 377) S = U H BV, T = U H AV,
(12.34)
where S, T, U, V ∈ Cn×n . The matrices S and T are upper triangular matrices, and the matrices U and V are unitary. Unitary matrices have the property that their columns build an orthonormal basis of the complex space Cn such that U H U = I n and V H V = I n , where the superscript H denotes the Hermitian transpose of a given matrix. The eigenvalues of the pencil are λi = t ii /sii for sii 6= 0 and i = 1, . . . , n. If sii = 0 and t ii 6= 0, the eigenvalue µi = sii /t ii of the pencil |µA − B| = 0 is defined and equal
to zero. Thus, the case of sii ≈ 0 is not truly ill-conditioned and one may regard λi as an ‘infinite eigenvalue’. The matrices S, T , V , and U can be constructed such that the eigenvalues λi appear in ascending order with respect to their absolute values.
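A minimal Python sketch of the QZ factorization with ordered eigenvalues, using SciPy's ordqz (the small pencil below is an arbitrary illustration; in SciPy's notation $A = Q\,AA\,Z^H$ and $B = Q\,BB\,Z^H$, so $Q$ and $Z$ play the roles of $U$ and $V$, and $AA$, $BB$ those of $T$, $S$ in (12.34)):

```python
import numpy as np
from scipy.linalg import ordqz

# an arbitrary pencil A - lambda*B, used purely for illustration
A = np.array([[1.0, 0.3], [0.2, 0.5]])
B = np.array([[1.0, 0.0], [0.1, 1.0]])

# complex QZ with eigenvalues of modulus < 1 ordered first ('iuc' = inside unit circle)
AA, BB, alpha, beta, Q, Z = ordqz(A, B, sort='iuc', output='complex')

lam = alpha / beta                      # generalized eigenvalues t_ii / s_ii
print(np.abs(lam))                      # stable root(s) appear first

print(np.allclose(Q @ AA @ Z.conj().T, A))   # True: A is reproduced
print(np.allclose(Q @ BB @ Z.conj().T, B))   # True: B is reproduced
```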
12.9.4 LU and Cholesky Factorization
Consider a system of linear equations
$$\begin{aligned} a_{11}x_1 + a_{12}x_2 + \dots + a_{1n}x_n &= b_1,\\ a_{21}x_1 + a_{22}x_2 + \dots + a_{2n}x_n &= b_2,\\ &\ \ \vdots\\ a_{n1}x_1 + a_{n2}x_2 + \dots + a_{nn}x_n &= b_n\end{aligned}\quad\Leftrightarrow\quad Ax = b. \qquad (12.35)$$
We assume that the square matrix $A$ has full rank, i.e., that there are no linearly dependent rows or columns in $A$. In this case, it is possible to factorize $A$ as follows:
$$A = LU,\qquad L = \begin{bmatrix} 1 & 0 & 0 & \dots & 0\\ l_{21} & 1 & 0 & \dots & 0\\ l_{31} & l_{32} & 1 & \dots & 0\\ \vdots & \vdots & \vdots & \ddots & \vdots\\ l_{n1} & l_{n2} & l_{n3} & \dots & 1\end{bmatrix},\qquad U = \begin{bmatrix} u_{11} & u_{12} & u_{13} & \dots & u_{1n}\\ 0 & u_{22} & u_{23} & \dots & u_{2n}\\ 0 & 0 & u_{33} & \dots & u_{3n}\\ \vdots & \vdots & \vdots & \ddots & \vdots\\ 0 & 0 & 0 & \dots & u_{nn}\end{bmatrix}. \qquad (12.36)$$
If $A$ is symmetric and positive definite, its Cholesky factor is the lower triangular matrix $L$ that solves
$$LL' = A. \qquad (12.37)$$
Both the LU and the Cholesky factorization can be used to solve the linear system (12.35). Let $\tilde x := Ux$. Then, it is easy to solve the system $L\tilde x = b$ via forward substitution:
$$\tilde x_1 = b_1,\qquad \tilde x_2 = b_2 - l_{21}\tilde x_1,\qquad \tilde x_3 = b_3 - l_{31}\tilde x_1 - l_{32}\tilde x_2,\qquad\dots$$
Given the solution for $\tilde x$, the desired solution for $x$ can be obtained via backward substitution from $Ux = \tilde x$:
$$x_n = \frac{\tilde x_n}{u_{nn}},\qquad x_{n-1} = \frac{1}{u_{n-1,n-1}}\left(\tilde x_{n-1} - u_{n-1,n}x_n\right),\qquad x_{n-2} = \frac{1}{u_{n-2,n-2}}\left(\tilde x_{n-2} - u_{n-2,n-1}x_{n-1} - u_{n-2,n}x_n\right),\qquad\dots$$
The solution of a system of linear equations via its LU or Cholesky factorization is the strategy that underlies linear equation solvers. For instance, the LAPACK routine dgesv.for, the GAUSS command x=b/A, and the MATLAB command linsolve employ the LU factorization.
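As an illustration of this strategy (a Python sketch with SciPy, not the LAPACK/GAUSS/MATLAB calls named above), the following lines factor a matrix once and then reuse the factors via forward and backward substitution.

```python
import numpy as np
from scipy.linalg import lu_factor, lu_solve, solve_triangular

A = np.array([[4.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])
b = np.array([1.0, 2.0, 3.0])

lu, piv = lu_factor(A)           # one LU factorization (with partial pivoting)
x = lu_solve((lu, piv), b)       # forward and backward substitution
print(np.allclose(A @ x, b))     # True

# the same idea with an explicit Cholesky factor L L' = A (A is s.p.d. here)
L = np.linalg.cholesky(A)
y = solve_triangular(L, b, lower=True)        # forward substitution: L y = b
x2 = solve_triangular(L.T, y, lower=False)    # backward substitution: L' x = y
print(np.allclose(x, x2))                     # True
```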
12.9.5 QR Factorization
For a matrix $A\in\mathbb R^{n\times m}$, there is an orthogonal matrix $Q\in\mathbb R^{n\times n}$ and an upper triangular matrix $R\in\mathbb R^{n\times m}$ such that $A = QR$. If $n = m$ and $\mathrm{rank}(A) = m$, this factorization can also be used to solve the linear system $Ax = b$. Since $Q^TAx = Rx = Q^Tb =: c$, we obtain
$$\begin{bmatrix} r_{11} & r_{12} & \dots & r_{1n}\\ 0 & r_{22} & \dots & r_{2n}\\ \vdots & \vdots & \ddots & \vdots\\ 0 & 0 & \dots & r_{nn}\end{bmatrix}\begin{bmatrix} x_1\\ x_2\\ \vdots\\ x_n\end{bmatrix} = \begin{bmatrix} c_1\\ c_2\\ \vdots\\ c_n\end{bmatrix}.$$
This system can be solved for $x_i$ via backward substitution. In Section 3.2.3, we employ another version of the QR decomposition for the cases $n < m$ and $\mathrm{rank}(A) = n$. There is an $m\times m$ permutation matrix $P$ that interchanges the columns of $A$, an orthogonal matrix $Q\in\mathbb R^{n\times n}$, and an upper triangular, nonsingular matrix $R_1\in\mathbb R^{n\times n}$ such that:⁸
$$Q^TAP = \begin{bmatrix} R_1 & R_2\end{bmatrix},\qquad R_2\in\mathbb R^{n\times(m-n)}. \qquad (12.38)$$
⁸ See Golub and Van Loan (1996), p. 271.
12.9.6 Singular Value Decomposition
Any real matrix $A\in\mathbb R^{n\times m}$ can be factored into the following product:⁹
$$A = UDV^T,\qquad U\in\mathbb R^{n\times n},\ U^TU = I_n,\qquad V\in\mathbb R^{m\times m},\ V^TV = I_m, \qquad (12.39)$$
$$D = \begin{bmatrix}\Sigma\\ 0\end{bmatrix}\in\mathbb R^{n\times m}\ \text{if } n\ge m,\qquad D = \begin{bmatrix}\Sigma & 0\end{bmatrix}\in\mathbb R^{n\times m}\ \text{if } m\ge n,$$
$$\Sigma = \begin{bmatrix}\sigma_1 & 0 & \dots & 0\\ 0 & \sigma_2 & \dots & 0\\ \vdots & \vdots & \ddots & \vdots\\ 0 & 0 & \dots & \sigma_p\end{bmatrix},\qquad p = \min\{n,m\},\quad \sigma_1\ge\sigma_2\ge\dots\ge\sigma_p.$$
The entries of the diagonal matrix $\Sigma$ are the positive square roots of the eigenvalues of the matrix $A^TA$. The singular value decomposition can be used to compute the solution $b$ of the linear least squares problem
$$b = (X^TX)^{-1}X^Ty,\qquad X\in\mathbb R^{N\times K},\ y\in\mathbb R^{N\times 1},\ N\ge K,$$
without the need to compute the inverse of the matrix $X^TX$. Notably, for $N\ge K$, the matrix $D$ is given by $D = \begin{bmatrix}\Sigma\\ 0\end{bmatrix}\in\mathbb R^{N\times K}$. Using (12.39) with $X\equiv A$ yields
$$X^TX = VD^TU^TUDV^T = V\Sigma^T\Sigma V^T,$$
so that
$$b = V(D^TD)^{-1}V^TVD^TU^Ty = V\Sigma^{-1}U_1^Ty.$$
The entries of the diagonal matrix $\Sigma^{-1}$ are the inverse singular values $\sigma_i^{-1}$, and the matrix $U_1$ consists of the first $K$ columns of the matrix $U$.
⁹ See, e.g., Golub and Van Loan (1996), Theorem 2.5.2, p. 70.
The singular value decomposition may also be used to address the problem of multicollinearity, i.e., if some columns of the matrix $X$ are close to being a linear combination of other columns of $X$. In this case, $X^TX$ is a near-singular matrix, and small changes in the elements of $X$ can significantly change the elements of the coefficient vector $b$. An indicator of this problem is a large condition number of $X^TX$, which is measured as the ratio between the largest and the smallest singular value, i.e., by $\sigma_1/\sigma_K$. Judd et al. (2011) suggest mitigating this problem with a truncated principal components estimator.¹⁰ Consider the matrix of principal components $Z = XV$, whose columns are linear combinations of the columns of the data matrix $X$. The linear regression model with errors $\varepsilon\in\mathbb R^{N\times 1}$ can then be written as
$$y = Xb + \varepsilon = XVV^{-1}b + \varepsilon = Zd + \varepsilon,\qquad d = V^{-1}b.$$
Instead of considering all $K$ columns of the matrix $Z$, we use only the first $r$ columns,
$$Z_r := \begin{bmatrix} z_{11} & \dots & z_{1r}\\ \vdots & \ddots & \vdots\\ z_{N1} & \dots & z_{Nr}\end{bmatrix} = \begin{bmatrix}\sigma_1u_{11} & \dots & \sigma_ru_{1r}\\ \vdots & \ddots & \vdots\\ \sigma_1u_{N1} & \dots & \sigma_ru_{Nr}\end{bmatrix},$$
so that the condition number of $Z_r$ is given by $\sigma_1/\sigma_r$. Accordingly, we choose $r$ so that the condition number cannot exceed a given threshold that we are willing to tolerate. This procedure yields the following truncated least squares estimator:
$$b_r = V\Sigma_r^{-1}U_1^Ty,\qquad \Sigma_r^{-1} = \mathrm{diag}\!\left(\tfrac{1}{\sigma_1},\dots,\tfrac{1}{\sigma_r},0,\dots,0\right)\in\mathbb R^{K\times K}. \qquad (12.40)$$
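A minimal Python sketch of the truncated estimator in (12.40) (illustrative only; the threshold of 10,000 on the condition number and the data below are arbitrary assumptions):

```python
import numpy as np

def truncated_svd_ols(X, y, max_cond=1e4):
    """Least squares via the SVD, dropping principal components whose inclusion
    would push the condition number sigma_1/sigma_r above max_cond."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)    # X = U diag(s) V^T
    r = np.sum(s >= s[0] / max_cond)                    # number of retained components
    s_inv = np.zeros_like(s)
    s_inv[:r] = 1.0 / s[:r]
    return Vt.T @ (s_inv * (U.T @ y))                   # b_r = V Sigma_r^{-1} U_1' y

# nearly multicollinear regressors: x, x**2, and x**2 plus tiny noise
rng = np.random.default_rng(1)
x = np.linspace(0.0, 1.2, 100)
X = np.column_stack([x, x**2, x**2 + 1e-8 * rng.normal(size=x.size)])
y = X @ np.array([1.0, 2.0, 0.0]) + 0.01 * rng.normal(size=x.size)

print(truncated_svd_ols(X, y))
```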
It is beyond the scope of this text to address algorithms that compute any of the abovementioned factorizations.¹¹ Most programming languages provide suitable routines. LAPACK, which is freely available at http://www.netlib.org/lapack/lug/index.html, is a large collection of linear algebra routines written in Fortran. They are distributed with Intel's Fortran compiler. MATLAB provides an interface for these routines and has its own commands for various matrix factorizations. In addition to offering the QR, the LU, the Cholesky, and the singular value factorization, the more recent versions of GAUSS provide commands for the Schur and the QZ factorization with the ability to sort eigenvalues.
¹⁰ The principal components model is explained in more detail in Judge et al. (1988), Section 21.2.3.
¹¹ A good reference on this subject is Golub and Van Loan (1996).
Chapter 13
Function Approximation
13.1 Introduction There are numerous instances in which we need to approximate functions of one or several variables. Examples include the perturbation methods considered in Chapters 2-4, the discrete state space methods of Chapter 7, and the weighted residuals methods of Chapter 5. This chapter gathers the most useful tools for this purpose. First, we clarify the notion of function approximation in the next section. We distinguish between local and global methods. The local approximation of the policy functions of the canonical DSGE model in Chapters 2-4 relies on the combination of two tools: Taylor’s theorem in one or more dimensions and the implicit function theorem. We cover them in Sections 13.3 and 13.4, respectively. Global methods approximate a given function with a combination of simple basis functions. Various kinds of polynomials can serve as basis functions. Polynomials also play an important role in numerical integration, which we consider in Section 14.3. Section 13.5 considers monomials as basis functions and the related Lagrange interpolation. One way to overcome the poor performance of the Lagrange polynomial on equally spaced grids is spline interpolation, which employs low-order polynomials piecewise on the domain of a function. We use this technique to implement the finite element methods in Chapter 5 and consider it in Section 13.6. In Section 13.7, we introduce the family of orthogonal polynomials. Members of this family include the Hermite and Chebyshev polynomials. The former play an important role in the numerical approximation of expectations considered in Section 14.4.2. The latter are key to the spectral methods of Chapter 5, so we devote Section 13.8 to them.
In Section 13.9, we turn to the problem of approximating a function of several variables. We consider tensor product bases, complete bases, and the Smolyak polynomial on a sparse grid.
13.2 Function Spaces
The tools introduced in Chapter 12 focus on Euclidean space, i.e., $n$-tuples of real numbers $x := (x_1, x_2, \dots, x_n)^T\in\mathbb R^n$. With the rules of vector addition and scalar multiplication, Euclidean space is a vector space.¹ With one of the vector norms considered in Section 12.4, it is also a normed vector space. This allows one to define neighborhoods and limits. Functional analysis develops the same concepts on spaces whose elements are functions. Accordingly, it enables us to speak of a function $f_n$, $n\in\mathbb N$, approaching another function $f$ as its limit, which is the concern of function approximation. Our sketch of concepts is conducted without the mathematical rigor of textbooks on this subject because we require only a few definitions to understand the theorems that characterize the tools in this chapter.
The elements of a function space must have the same domain. Their other properties depend on the specific question at hand. As an example, consider all continuous functions $f : [a,b]\to\mathbb R$, which map the closed interval $[a,b]\subset\mathbb R$ to the real line $\mathbb R$. The common symbol for this set is $C^0([a,b],\mathbb R)$. For two members $f$ and $g$ of this set, define addition as
$$h(x) := f(x) + g(x),\quad x\in[a,b],$$
and scalar multiplication as
$$h(x) := af(x),\quad a\in\mathbb R,\ x\in[a,b],$$
so that continuity is preserved and the function $h : [a,b]\to\mathbb R$ is also a member of the set $C^0([a,b],\mathbb R)$. Furthermore, let $|x|$ denote the absolute value of $x\in[a,b]$, and consider the definition
$$\|f\|_1 := \int_a^b |f(x)|\,dx. \qquad (13.1)$$
$\|f\|_1$ satisfies the conditions for a norm (see (12.3)) and is known as the $L^1$ norm. The space $C^0([a,b],\mathbb R)$ with this norm, denoted by $L^1([a,b],\mathbb R)$,
¹ For the complete list of axioms that define a vector space, see, e.g., Lang (1997), p. 129.
is a normed vector space. A second norm on the space of continuous functions is the maximum or sup norm
$$\|f\|_\infty := \max_{x\in[a,b]}|f(x)|, \qquad (13.2)$$
which is used to define uniform convergence. We say that the sequence of functions $f_n$ converges uniformly to $f$ if for $\varepsilon > 0$ there exists $N$ such that for all $n > N$, $\|f_n - f\|_\infty < \varepsilon$.
In the section on orthogonal polynomials, we shall encounter the space of square integrable functions $f : [a,b]\to\mathbb R$, which are defined by the condition
$$\int_a^b |f(x)|^2\,dx < \infty. \qquad (13.3)$$
Similar to the inner product $x^Ty\in\mathbb R$ of two vectors $x, y\in\mathbb R^n$ in Euclidean $n$-space, there is an inner product denoted by
$$\langle f, g\rangle := \int_a^b f(x)g(x)\,dx.$$
This gives rise to the least-squares norm
$$\|f\|_2 := \sqrt{\int_a^b f(x)^2\,dx}. \qquad (13.4)$$
The vector space of functions with this norm is denoted by $L^2([a,b],\mathbb R)$.
13.3 Taylor's Theorem
Consider a function $f$ of a single variable $x\in U$, where $U$ is an open subset of $\mathbb R$. Taylor's theorem states the following:²
² Statements of this theorem appear in calculus textbooks and in most texts on mathematics for economists. See, e.g., Lang (1997), p. 111 for the single-variable and p. 408 for the n-variable case. Judd (1998) states the theorem for a single variable on p. 23. We employ his notation for the n-variable case on p. 239 in our statement below.
Theorem 13.3.1 Let $f : [a,b]\to\mathbb R$ be a $k+1$ times continuously differentiable function on $(a,b)$, let $\bar x$ be a point in $(a,b)$, and let $h\in\mathbb R$ with $\bar x + h\in[a,b]$. Then:
$$f(\bar x + h) = f(\bar x) + f^{(1)}(\bar x)h + f^{(2)}(\bar x)\frac{h^2}{2} + \dots + f^{(k)}(\bar x)\frac{h^k}{k!} + f^{(k+1)}(\xi)\frac{h^{k+1}}{(k+1)!},\qquad \xi\in(\bar x,\bar x+h).$$
In this statement, $f^{(i)}$ is the $i$-th derivative of $f$ evaluated at the point $\bar x$. The derivative that appears in the rightmost term of this formula is evaluated at some unknown point between $\bar x$ and $\bar x + h$. If we neglect this term, the formula approximates the function $f$ at $\bar x$, and the approximation error is of order $k+1$. By this we mean that the error is proportional to $h^{k+1}$, where the constant of proportionality is given by $C = f^{(k+1)}(\xi)/(k+1)!$. For the $n$-variable case, the theorem is:
Theorem 13.3.2 Let $X\subset\mathbb R^n$ be an open subset, $x := [x_1, x_2, \dots, x_n]^T\in X$, $h := [h_1, h_2, \dots, h_n]^T\in\mathbb R^n$ such that $x + th\in X$ for all $t\in[0,1]$. Assume that $f : X\to\mathbb R$ is $(k+1)$-times continuously differentiable. Then, there is a $\lambda\in[0,1]$ such that
$$\begin{aligned}
f(x+h) = f(x) &+ \sum_{i_1=1}^n \frac{\partial f(x)}{\partial x_{i_1}}h_{i_1} + \frac{1}{2}\sum_{i_1=1}^n\sum_{i_2=1}^n \frac{\partial^2 f(x)}{\partial x_{i_1}\partial x_{i_2}}h_{i_1}h_{i_2} + \dots\\
&+ \frac{1}{k!}\underbrace{\sum_{i_1=1}^n\cdots\sum_{i_k=1}^n}_{k\ \text{summation signs}} \frac{\partial^k f(x)}{\partial x_{i_1}\partial x_{i_2}\cdots\partial x_{i_k}}\underbrace{h_{i_1}h_{i_2}\cdots h_{i_k}}_{k\ \text{terms}}\\
&+ \frac{1}{(k+1)!}\sum_{i_1=1}^n\cdots\sum_{i_{k+1}=1}^n \frac{\partial^{k+1} f(x+\lambda h)}{\partial x_{i_1}\partial x_{i_2}\cdots\partial x_{i_{k+1}}}h_{i_1}h_{i_2}\cdots h_{i_{k+1}}.
\end{aligned}$$
An immediate corollary of this theorem is the quadratic approximation of $f$ at $x$, which follows from this theorem for $k = 2$. Let us arrange the first-order partial derivatives in the row vector
$$\nabla f(x)^T := \begin{bmatrix}\frac{\partial f(x)}{\partial x_1} & \dots & \frac{\partial f(x)}{\partial x_n}\end{bmatrix},$$
the gradient of $f$ at the point $x$, and the second-order partial derivatives in the matrix
$$H(x) := \left(\frac{\partial^2 f(x)}{\partial x_i\partial x_j}\right)_{i,j=1}^n = \begin{bmatrix}\frac{\partial^2 f(x)}{\partial x_1\partial x_1} & \dots & \frac{\partial^2 f(x)}{\partial x_1\partial x_n}\\ \vdots & \ddots & \vdots\\ \frac{\partial^2 f(x)}{\partial x_n\partial x_1} & \dots & \frac{\partial^2 f(x)}{\partial x_n\partial x_n}\end{bmatrix}, \qquad (13.5)$$
the Hessian matrix of $f$ at $x$. Then,
$$f(x+h)\approx f(x) + \nabla f(x)^Th + \frac{1}{2}h^TH(x)h, \qquad (13.6)$$
where the approximation error $\phi(h)$, with $\phi(0) = 0$, has the property
$$\lim_{\substack{h\to 0\\ h\neq 0}}\frac{\phi(h)}{\|h\|^2} = 0.$$
Similarly, the linear approximation is given by
$$f(x+h)\approx f(x) + \nabla f(x)^Th, \qquad (13.7)$$
where the error $\phi(h)$ now obeys
$$\lim_{\substack{h\to 0\\ h\neq 0}}\frac{\phi(h)}{\|h\|} = 0.$$
Consider a map $\mathbf f : X\to Y$ that maps points $x$ of the open subset $X\subset\mathbb R^n$ into points $y$ of the open subset $Y\subset\mathbb R^m$:
$$y = \mathbf f(x)\quad\Leftrightarrow\quad \begin{aligned} y_1 &= f^1(x_1, x_2, \dots, x_n),\\ y_2 &= f^2(x_1, x_2, \dots, x_n),\\ &\ \ \vdots\\ y_m &= f^m(x_1, x_2, \dots, x_n).\end{aligned} \qquad (13.8)$$
If we apply Taylor's theorem to each component of $\mathbf f$, we obtain the linear approximation of the nonlinear map $\mathbf f$:
$$y\approx \mathbf f(x) + J(x)h. \qquad (13.9)$$
The matrix
$$J(x) := \left(\frac{\partial f^i(x)}{\partial x_j}\right)_{\substack{i=1,\dots,m\\ j=1,\dots,n}} = \begin{bmatrix}\frac{\partial f^1(x)}{\partial x_1} & \dots & \frac{\partial f^1(x)}{\partial x_n}\\ \vdots & \ddots & \vdots\\ \frac{\partial f^m(x)}{\partial x_1} & \dots & \frac{\partial f^m(x)}{\partial x_n}\end{bmatrix} \qquad (13.10)$$
is called the Jacobian matrix of $\mathbf f$ at the point $x$.
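A short Python illustration of the linear and quadratic approximations (13.6)-(13.7) for a hypothetical two-variable function; the gradient and Hessian are coded by hand, so everything in the sketch is an assumption chosen for the example.

```python
import numpy as np

# f(x) = exp(x1) * x2**2, expanded around xbar = (0, 1)
f = lambda x: np.exp(x[0]) * x[1] ** 2
xbar = np.array([0.0, 1.0])

grad = np.array([np.exp(xbar[0]) * xbar[1] ** 2,        # df/dx1
                 2.0 * np.exp(xbar[0]) * xbar[1]])      # df/dx2
hess = np.array([[np.exp(xbar[0]) * xbar[1] ** 2, 2.0 * np.exp(xbar[0]) * xbar[1]],
                 [2.0 * np.exp(xbar[0]) * xbar[1], 2.0 * np.exp(xbar[0])]])

h = np.array([0.1, -0.05])
lin = f(xbar) + grad @ h
quad = lin + 0.5 * h @ hess @ h
print(f(xbar + h), lin, quad)   # the quadratic approximation is the closer one
```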
13.4 Implicit Function Theorem
In this book, many if not all systems of nonlinear equations that we have encountered are not given in explicit form $y = \mathbf f(x)$, as considered in the previous paragraph. Rather, the relation between $y\in\mathbb R^m$ and $x\in\mathbb R^n$ is implicitly defined via
$$\mathbf g(x, y) = 0_{m\times 1}, \qquad (13.11)$$
where $\mathbf g : U\to V$, $U$ is a subset of $\mathbb R^n\times\mathbb R^m$, and $V$ is a subset of $\mathbb R^m$. The implicit function theorem³ allows us to compute the derivative of $y = \mathbf f(x)$ at a solution $(\bar x,\bar y)$ of (13.11).
Theorem 13.4.1 (Implicit Function Theorem) Let $U$ be an open subset in a product $U_1\times U_2$, $U_1\subset\mathbb R^n$, $U_2\subset\mathbb R^m$, and let $\mathbf g : U\to V\subset\mathbb R^m$ be a $p$-times continuously differentiable map. Let $(\bar x,\bar y)\in U$ with $\bar x\in U_1$ and $\bar y\in U_2$. Let $\mathbf g(\bar x,\bar y) = 0_{m\times 1}$. Assume that $D_y(\mathbf g(\bar x,\bar y))\in\mathbb R^{m\times m}$ is invertible. Then, there exists an open ball $B\subset U_1$, centered at $\bar x\in U_1$, and a continuous map $\mathbf f : B\to U_2$ such that $\bar y = \mathbf f(\bar x)$ and $\mathbf g(x,\mathbf f(x)) = 0_{m\times 1}$ for all $x\in B$. If $B$ is a sufficiently small ball, then $\mathbf f$ is uniquely determined and $p$-times continuously differentiable.
The expression $D_y$ denotes the Jacobian matrix of $\mathbf g$ with respect to the elements of the vector $y$. This is an $m\times m$ matrix of partial derivatives:
$$D_y(\mathbf g(\bar x,\bar y)) = \left(\frac{\partial g^i(\bar x,\bar y)}{\partial y_j}\right)_{i,j=1}^m = \begin{bmatrix}\frac{\partial g^1(\bar x,\bar y)}{\partial y_1} & \dots & \frac{\partial g^1(\bar x,\bar y)}{\partial y_m}\\ \vdots & \ddots & \vdots\\ \frac{\partial g^m(\bar x,\bar y)}{\partial y_1} & \dots & \frac{\partial g^m(\bar x,\bar y)}{\partial y_m}\end{bmatrix}.$$
³ See, e.g., Lang (1997) p. 529, Theorem 5.4.
If this matrix is invertible (as required by Theorem 13.4.1), we obtain the Jacobian matrix of $\mathbf f$ at $\bar x$ by differentiating $\mathbf g(x,\mathbf f(x))$ with respect to $x$:
$$J(\bar x) := \mathbf f_x(\bar x) = -D_y^{-1}(\bar x,\bar y)D_x(\bar x,\bar y), \qquad (13.12)$$
where $D_x(\cdot)$ is defined analogously to $D_y(\cdot)$.
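A one-equation Python illustration of (13.12): for the hypothetical relation g(x, y) = x² + y² − 1 = 0, the implicit derivative −D_y⁻¹D_x is compared with a finite-difference approximation of the explicit solution.

```python
import numpy as np

g_x = lambda x, y: 2.0 * x        # partial derivative of g with respect to x
g_y = lambda x, y: 2.0 * y        # partial derivative of g with respect to y

xbar, ybar = 0.6, 0.8             # a point with g(xbar, ybar) = 0
dydx_ift = -g_x(xbar, ybar) / g_y(xbar, ybar)   # (13.12) in the scalar case

# finite-difference check using the explicit branch y = sqrt(1 - x^2)
eps = 1e-6
dydx_fd = (np.sqrt(1 - (xbar + eps) ** 2) - np.sqrt(1 - (xbar - eps) ** 2)) / (2 * eps)
print(dydx_ift, dydx_fd)          # both approximately -0.75
```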
13.5 Lagrange Interpolation
13.5.1 Polynomials and the Weierstrass Approximation Theorem
Suppose that we are given a function $f : [a,b]\to\mathbb R$ and want to approximate it by a simpler one. Unlike the Taylor series approximation at a point $x\in[a,b]$, we want to employ information from the shape of $f$ on its entire domain. The simpler function we shall consider is a polynomial of degree $n$ defined by
$$p_n(x) := c_0 + c_1x + c_2x^2 + \dots + c_nx^n,\qquad c_n\neq 0,\ x\in\mathbb R. \qquad (13.13)$$
The functions $1, x, x^2, \dots, x^n$ in this formula are called the monomials, and the $c_k\in\mathbb R$ are called the coefficients. Note that the degree $n$ of the polynomial is the largest exponent in this formula with a nonzero coefficient. The set of polynomials of degree at most $n$ builds a finite-dimensional vector space. The monomials span this space or, equivalently, build an $(n+1)$-dimensional base of this space. We can construct other bases $\{\varphi_k\}_{k=0}^n$ from the monomial base via
$$\begin{bmatrix}\varphi_0(x)\\ \varphi_1(x)\\ \vdots\\ \varphi_n(x)\end{bmatrix} = A\begin{bmatrix}1\\ x\\ \vdots\\ x^n\end{bmatrix}$$
for some given $(n+1)\times(n+1)$ nonsingular matrix $A$. One reason to employ polynomials for function approximation is given by the Weierstrass approximation theorem.⁴
Theorem 13.5.1 (Weierstrass Approximation Theorem) Let $[a,b]$ be a closed interval, and let $f$ be a continuous function on $[a,b]$. Then, $f$ can be
⁴ See, e.g., Lang (1997), Theorem 2.1, p. 287 and his definition of uniform convergence on p. 179.
uniformly approximated by polynomials; i.e., for each $\varepsilon > 0$, there exists $N$ such that for all $n\ge N$,
$$\|f(x) - p_n(x)\|_\infty < \varepsilon,\quad\forall x\in[a,b].$$
This theorem assures us that polynomials are useful for function approximation; however, it does not give us a way to find the appropriate polynomial. A simple requirement on $p_n(x)$ is that it passes through $n+1$ given points. Accordingly, suppose that we have decided to sample $n+1$ distinct points, called the nodes, $x_0 < x_1 < \dots < x_n$, from the interval $[a,b]$ and can compute the related ordinate values $y_i = f(x_i)$. Equivalently, we may be given $n+1$ data points $(x_i, y_i)$, $i = 0, 1, \dots, n$, which we want to approximate by the polynomial (13.13). This latter problem arises, e.g., in Chapter 7, where we know the value function on a grid but also want to evaluate the function between the grid points. The problem is to choose the coefficients $c_i$ of $p_n(x)$ so that $y_i = p_n(x_i)$, $i = 0, 1, \dots, n$. This gives rise to $n+1$ linear equations:
$$\underbrace{\begin{bmatrix} y_0\\ y_1\\ \vdots\\ y_n\end{bmatrix}}_{=:y} = \underbrace{\begin{bmatrix} 1 & x_0 & x_0^2 & \dots & x_0^n\\ 1 & x_1 & x_1^2 & \dots & x_1^n\\ \vdots & \vdots & \vdots & \ddots & \vdots\\ 1 & x_n & x_n^2 & \dots & x_n^n\end{bmatrix}}_{=:X}\underbrace{\begin{bmatrix} c_0\\ c_1\\ \vdots\\ c_n\end{bmatrix}}_{=:c}. \qquad (13.14)$$
Since the points $x_0, x_1, \dots, x_n$ are distinct, the $n+1$ rows of the matrix $X$ are linearly independent and the system has the unique solution
$$c = X^{-1}y. \qquad (13.15)$$
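A minimal Python sketch of this construction (illustrative only): build the matrix X in (13.14), solve for the coefficients c as in (13.15), and check the interpolation conditions.

```python
import numpy as np

f = lambda x: 1.0 / (1.0 + x**2)        # the Runge function used later in this section
nodes = np.linspace(-5.0, 5.0, 6)       # n + 1 = 6 equally spaced nodes
y = f(nodes)

X = np.vander(nodes, increasing=True)   # columns 1, x, x^2, ..., x^n
c = np.linalg.solve(X, y)               # coefficients of p_n(x), eq. (13.15)

p = lambda x: np.polyval(c[::-1], x)    # polyval expects the highest degree first
print(np.allclose(p(nodes), y))         # True: p_n passes through all nodes
print(p(0.5), f(0.5))                   # between the nodes the fit may be poor
```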
13.5.2 Lagrange Interpolating Polynomial
A second approach to finding the interpolating polynomial is Lagrange interpolation. Consider Figure 13.1 and the problem of computing the straight line from $(x_0, y_0)$ to $(x_1, y_1)$ that approximates the function $f(x)$ in the interval $[x_0, x_1]$. This is equivalent to solving the system
$$y_0 = c_0 + c_1x_0,\qquad y_1 = c_0 + c_1x_1$$
for $c_0$ and $c_1$, yielding
$$\hat f(x) := p_1(x) = f(x_0) + \frac{f(x_1)-f(x_0)}{x_1-x_0}(x-x_0) \equiv \underbrace{\frac{x_1-x}{x_1-x_0}}_{=:L_{1,0}(x)}y_0 + \underbrace{\frac{x-x_0}{x_1-x_0}}_{=:L_{1,1}(x)}y_1. \qquad (13.16)$$
This is the linear interpolation formula. It has the property of being shape preserving, which is important if we want to approximate a function with a known shape, e.g., the value function in Chapter 7. In particular, if $f$ is monotonically increasing and concave on $[a,b]$, then $\hat f(x)$ preserves these properties.
Figure 13.1 Linear Interpolation (the secant f̂(x) through the points (x₀, y₀) and (x₁, y₁) approximates f(x) on [x₀, x₁])
We use the second equation in (13.16) to introduce the Lagrange interpolating polynomial. First, note that both $L_{1,0}(x)$ and $L_{1,1}(x)$ as defined in (13.16) are polynomials of degree one, which explains the first index. Second, observe that the term $L_{1,0}$ is equal to one at the point $x = x_0$, while the term $L_{1,1}$ is equal to one at the point $x = x_1$, which explains the second index. Furthermore, $L_{1,0}(x_1) = 0$ and $L_{1,1}(x_0) = 0$. To generalize this formula to a polynomial of degree $n$ that passes through $n+1$ distinct points $(x_i, y_i)$, $i = 0, 1, \dots, n$, define the $k$-th Lagrange polynomial by
$$L_{n,k}(x) := \frac{(x-x_0)(x-x_1)\cdots(x-x_{k-1})(x-x_{k+1})\cdots(x-x_n)}{(x_k-x_0)(x_k-x_1)\cdots(x_k-x_{k-1})(x_k-x_{k+1})\cdots(x_k-x_n)}, \qquad (13.17)$$
so that $L_{n,k}(x_k) = 1$ and $L_{n,k}(x_i) = 0$ for $i\neq k$. Then, it is apparent that the polynomial
$$p_n(x) = L_{n,0}(x)y_0 + L_{n,1}(x)y_1 + \dots + L_{n,n}(x)y_n \qquad (13.18)$$
passes through the n + 1 points (x i , yi ). Note that the polynomial (13.13) with coefficients given by (13.15) and the Lagrange polynomial (13.18) are equivalent ways to write the solution of the interpolation problem.
13.5.3 Drawbacks
Lagrange polynomials have several drawbacks: They can oscillate near the end points of the interval $[a,b]$, low- and high-order monomials differ greatly in size, and bases from higher-order monomials are nearly multicollinear.
OSCILLATORY BEHAVIOR. First, consider the interpolation error, which is given by the term on the right-hand side of the following equation:⁵
$$f(x) - p_n(x) = \frac{f^{(n+1)}(\xi(x))}{(n+1)!}\prod_{i=0}^n (x-x_i),\qquad \xi(x)\in(a,b).$$
This error is mainly determined by the polynomial $l(x) = (x-x_0)\cdots(x-x_n)$. On equally spaced grids, this polynomial oscillates near the boundary of the interval $[a,b]$ for large $n$. Figure 13.2 illustrates the consequences of this behavior for the Runge function $f(x) := 1/(1+x^2)$ on the interval $[-5,5]$. The polynomial $p_n(x)$ employs the nodes $x_i = -5 + 10i/n$, $i = 0, 1, \dots, n$. Clearly, the higher-degree polynomial $p_{10}(x)$ performs worse than the polynomial $p_5(x)$ at both the left and right ends of the interval.⁶
A careful choice of the interpolation nodes $x_i$ may overcome this problem. Figure 13.3 compares the Lagrange polynomial $p_{10}(x)$ on an equally spaced grid with a Lagrange polynomial of the same order on a grid that consists of the 11 zeros of the Chebyshev polynomial $T_{11}(x)$ (see Section 13.8), shown as green dots. Note that there are more points near the left and right boundaries of the interval $[-5,5]$ and that the large swings around the true function disappear.
⁵ For this formula, see, e.g., Stoer and Bulirsch (2002), p. 49 or Quarteroni et al. (2007), equation (8.7) on p. 335.
⁶ See, e.g., Walter (2014), pp. 93f. and Quarteroni et al. (2007), p. 337 for similar examples.
Figure 13.2 Polynomial Approximation of the Runge Function on an Equally Spaced Grid (graphs of f(x) = 1/(1+x²), p₅(x), and p₁₀(x) on [−5, 5])
Figure 13.3 Polynomial Approximation of the Runge Function on Chebyshev Zeros (p₁₀(x) on an equally spaced grid versus p₁₀(x) interpolated at the zeros of T₁₁(x))
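The effect shown in Figures 13.2 and 13.3 is easy to reproduce. The following Python sketch (illustrative, not the book's plotting code) compares the maximum interpolation error of the degree-10 polynomial on the two grids.

```python
import numpy as np
from scipy.interpolate import BarycentricInterpolator

f = lambda x: 1.0 / (1.0 + x**2)
n = 10
x_eq = np.linspace(-5.0, 5.0, n + 1)                         # equally spaced nodes
k = np.arange(n + 1)
x_cheb = 5.0 * np.cos((2 * k + 1) * np.pi / (2 * (n + 1)))   # zeros of T_{11}, scaled to [-5, 5]

grid = np.linspace(-5.0, 5.0, 1001)
err_eq = np.max(np.abs(f(grid) - BarycentricInterpolator(x_eq, f(x_eq))(grid)))
err_cheb = np.max(np.abs(f(grid) - BarycentricInterpolator(x_cheb, f(x_cheb))(grid)))
print(err_eq, err_cheb)    # the error on the Chebyshev grid is much smaller
```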
SIZE DIFFERENCES AND NEAR MULTICOLLINEARITY. Figure 13.4 shows the graphs of several monomials over the interval [0, 1.2]. Within the interval [0, 1], the graphs are close together, and they separate quickly from each other to the right of the point x = 1. The large differences in size are an obstacle to root-finding and minimization algorithms that operate on algebraic polynomials. The similarity, in particular of the higher-order members of this family, means that the columns of the matrix X in equation (13.14) are nearly multicollinear so that the solution c is numerically unstable: Small differences in the elements of X can induce large changes in the elements of the coefficients c. An indicator of this problem is the condition number of X'X.⁷ Greene (2012) considers values in excess of 20 as indicative of a problem. For example, the matrix X = [x¹, x², x⁵, x¹⁰, x¹⁵] with 100 nodes equally spaced on the interval x ∈ [0, 1.2] has a condition number of over 208.
Figure 13.4 Monomials on [0,1.2] (graphs of x, x², x⁵, x¹⁰, and x¹⁵)
One way to address these drawbacks is spline interpolation, which is considered in the next section.
13.6 Spline Interpolation
Spline interpolation employs low-order polynomials to approximate the function $f : [a,b]\to\mathbb R$ on subintervals of $[a,b]$ and joins these local approximations to build $\hat f(x)$ on $[a,b]$. Figure 13.5 illustrates this principle. We choose $n+1$ points $x_i$, $i = 0, 1, \dots, n$, that divide the interval $[a,b]$ into $n$ subintervals, which are not necessarily of the same length. On each of them, a low-order polynomial
$$p_i(x) := c_{i0} + c_{i1}(x-x_i) + \dots + c_{in}(x-x_i)^n,\qquad x\in[x_i, x_{i+1}],$$
locally approximates the function $f$.
⁷ See Section 12.9.6 on the definition of this concept.
Figure 13.5 Spline Interpolation (the polynomials p₀(x), p₁(x), …, pₙ₋₁(x) approximate f on the subintervals [x₀, x₁], [x₁, x₂], …, [xₙ₋₁, xₙ], with a = x₀ and xₙ = b)
Usually, the order of the local polynomials is not larger than three. For ease of exposition, we use the first two, three, and four letters of the alphabet to represent the coefficients of a linear, a quadratic, and a cubic polynomial. Furthermore we take as given n + 1 pairs of observations (x i , yi ), yi = f (x i ).
13.6.1 Linear Splines
Linear polynomials are defined by
$$p_i(x) = a_i + b_i(x-x_i),\qquad x\in[x_i, x_{i+1}].$$
Their $2n$ coefficients are determined by two conditions:
1) At the nodes $x_i$, $i = 0, 1, \dots, n-1$, the polynomial agrees with the values of the function, $p_i(x_i) = y_i = f(x_i)$.
2) At the $n-1$ neighboring nodes, the polynomials agree: $p_i(x_{i+1}) = p_{i+1}(x_{i+1})$.
It is easy to see (and therefore left to the reader) that these conditions uniquely determine the parameters. For $i = 0, 1, \dots, n-1$, they are given by
$$a_i = y_i,\qquad b_i = \frac{y_{i+1}-y_i}{x_{i+1}-x_i},$$
so that on each subinterval, the polynomial pi (x) coincides with the linear algebraic polynomial encountered in the previous section (see equation (13.16)). In the GAUSS programming language, we implement linear splines with two procedures. LSpline_coef computes the coefficients and must be called first. Afterwards, the user can evaluate the spline repeatedly with
the procedure LSpline_eval. The latter function employs binary search, encoded in the procedure Find, to find the index i with the property x ∈ [x i , x i+1 ] for x ∈ [a, b] (see Figure 13.5). Linear splines preserve the continuity of a function, its monotonicity and its convexity. In many applications, for instance in root-finding and optimization, it is advantageous for the approximate function to be differentiable everywhere and not only within the subintervals. Quadratic splines, defined by pi (x) = ai + bi (x − x i ) + ci (x − x i )2 , x ∈ [x i , x i+1 ]. have one more degree of freedom that can be used to ensure that the derivatives of pi (x) and pi+1 (x) agree at neighboring nodes: bi + 2ci (x i+1 − x i ) = bi+1 . The problem with quadratic polynomials, however, is that these additional n − 1 conditions together with the 2n conditions from pi (x i ) = yi and pi (x i+1 ) = pi+1 (x i+1 ), i = 0, . . . , n, determine only 3n − 1 of the 3n parameters and it is difficult to argue for a specific additional condition. For instance, one can either supply a condition for the derivative at x 0 or at x n but not for both endpoints. Cubic splines do not suffer from this problem and are thus widely applied in interpolation.
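The logic of the two GAUSS procedures is easy to mirror in other languages. A minimal Python sketch (the function names below are ours, not the book's): compute the coefficients once, then evaluate with a binary search for the bracketing subinterval.

```python
import numpy as np

def lspline_coef(x, y):
    """Coefficients a_i = y_i and b_i = (y_{i+1}-y_i)/(x_{i+1}-x_i) of the linear spline."""
    return y[:-1], np.diff(y) / np.diff(x)

def lspline_eval(x, a, b, xi):
    """Evaluate the linear spline at xi; np.searchsorted performs the binary search."""
    i = np.clip(np.searchsorted(x, xi) - 1, 0, len(x) - 2)
    return a[i] + b[i] * (xi - x[i])

x = np.linspace(-5.0, 5.0, 11)
y = 1.0 / (1.0 + x**2)
a, b = lspline_coef(x, y)
print(lspline_eval(x, a, b, 0.3))   # approximate value of the Runge function at 0.3
```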
13.6.2 Cubic Splines
The cubic polynomial
$$p_i(x) := a_i + b_i(x-x_i) + c_i(x-x_i)^2 + d_i(x-x_i)^3,\qquad x\in[x_i, x_{i+1}],$$
has four free parameters. To determine their values, we impose the following conditions:
1) The approximation is exact at the grid points, $y_i = p_i(x_i)$:
$$y_i = a_i,\qquad i = 0,\dots,n.$$
2) At the $n-1$ neighboring nodes, the polynomials agree, $p_i(x_{i+1}) = p_{i+1}(x_{i+1})$:
$$a_i + b_i(x_{i+1}-x_i) + c_i(x_{i+1}-x_i)^2 + d_i(x_{i+1}-x_i)^3 = a_{i+1}.$$
3) At the $n-1$ neighboring nodes, the first and second derivatives agree:
$$p_i'(x_{i+1}) = p_{i+1}'(x_{i+1})\ \Rightarrow\ b_{i+1} = b_i + 2c_i(x_{i+1}-x_i) + 3d_i(x_{i+1}-x_i)^2,$$
$$p_i''(x_{i+1}) = p_{i+1}''(x_{i+1})\ \Rightarrow\ c_{i+1} = c_i + 3d_i(x_{i+1}-x_i).$$
These conditions yield $4n-2$ linear equations in the $4n$ unknowns $a_i, b_i, c_i, d_i$, leaving us two conditions short of fixing the coefficients. If we know the derivative of $f$ at both endpoints, we can supply the clamped boundary condition $f'(x_0) = p_1'(x_0)$ and $f'(x_n) = p_n'(x_n)$. In most applications, this information is not available. Thus, we may use the secant over the first and the last subinterval, giving the conditions $p_1'(x_0) = (y_1-y_0)/(x_1-x_0)$ and $p_n'(x_n) = (y_n-y_{n-1})/(x_n-x_{n-1})$. The respective cubic spline is called the secant Hermite spline. A third solution is the natural spline that sets the second derivative at the endpoints equal to zero: $p_1''(x_0) = p_n''(x_n) = 0$. Finally, the 'not-a-knot' condition demands that the third derivative agrees on the boundaries of the first and the last subinterval: $p_1'''(x_1) = p_2'''(x_1)$ and $p_{n-1}'''(x_{n-1}) = p_n'''(x_{n-1})$.
The coefficients of cubic splines are fast to compute since the set of equations sketched above is not only linear but can be reduced to a tridiagonal system in the coefficients $c_i$. We implement the cubic spline interpolation following Press et al. (1992), Section 3.3, in two GAUSS procedures. The procedure CSpline_coef computes the coefficients of the quadratic part of the spline from given vectors $x$ and $y$ that supply the data points $(x_i, y_i)$. It uses the not-a-knot condition. After a call to this function, we can use CSpline_eval, which returns the value of the spline at a given point $x\in[a,b]$. In MATLAB, the command griddedInterpolant implements linear and cubic splines. For the latter, it also uses the not-a-knot condition.
Figure 13.6 shows the linear and cubic spline approximations of the Runge function. The grid is determined from the same 11 values of $x\in[-5,5]$ employed in Section 13.5.3 to determine the Lagrange polynomial. A glance at Figure 13.3 should convince the reader that both splines are more appropriate for approximating this function than the Lagrange polynomials.
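For readers working outside GAUSS or MATLAB, SciPy's CubicSpline uses the same not-a-knot boundary condition by default; a small hedged Python sketch:

```python
import numpy as np
from scipy.interpolate import CubicSpline

x = np.linspace(-5.0, 5.0, 11)
y = 1.0 / (1.0 + x**2)

cs = CubicSpline(x, y, bc_type='not-a-knot')   # same boundary condition as CSpline_coef
grid = np.linspace(-5.0, 5.0, 1001)
print(np.max(np.abs(cs(grid) - 1.0 / (1.0 + grid**2))))   # maximum approximation error
```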
Figure 13.6 Spline Interpolation of the Runge Function (f(x) = 1/(1+x²) together with its linear and cubic spline approximations on [−5, 5])
13.7 Orthogonal Polynomials
13.7.1 Orthogonality in Euclidean Space
Recall from the geometry of Euclidean $n$-space that the angle $\theta$ between two vectors $x, y\in\mathbb R^n$ satisfies (see, e.g., Lang (1986), p. 25)
$$\cos\theta\,\|x\|_2\|y\|_2 = x^Ty.$$
Accordingly, two vectors are perpendicular or orthogonal, $\cos\theta = 0$, if their scalar product $\langle x, y\rangle$ vanishes (see Figure 13.7):
$$0 = \langle x, y\rangle := x^Ty.$$
Figure 13.7 Orthogonal Vectors (two vectors x and y at an angle of θ = 90°)
The orthogonal projection of a vector $z\in\mathbb R^n$ onto the space spanned by two linearly independent vectors $x, y\in\mathbb R^n$, therefore, is defined by
$$[x, y]^T(z-\hat z) = 0,\qquad \hat z = \gamma_1 x + \gamma_2 y,$$
and is illustrated in Figure 13.8.
Figure 13.8 Orthogonal Projection in Euclidean Space (the residual z − ẑ is perpendicular to the plane spanned by x and y)
A reader acquainted with the linear regression model will recall that this is the principle that determines the least-squares estimator. Given an n × m matrix X of observations of m ≤ n independent variables and an n-dimensional vector y of observations of one dependent variable, the vector of estimated coefficients b, say, has the property X 0 (y − X b) = 0m×1 .
If they are linearly independent, the columns x_i, i = 1, ..., m, of the matrix X form a basis for the m-dimensional subspace of all their linear combinations. If the basis consists of mutually orthogonal vectors of unit length, it is called an orthonormal basis. With appropriate changes, these definitions carry over to spaces of functions, to which we turn now.
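The following MATLAB fragment illustrates the projection property with simulated data; the design matrix, coefficients, and noise level are arbitrary choices for this illustration and are not taken from the text.

% Sketch: the least-squares estimator as an orthogonal projection.
rng(0);
X = randn(50, 3);                        % 50 observations of 3 regressors
y = X * [1; -2; 0.5] + 0.1 * randn(50, 1);
b = (X' * X) \ (X' * y);                 % solves the normal equations
check = X' * (y - X * b)                 % X'(y - Xb) = 0 up to rounding error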
13.7.2 Orthogonality in Function Spaces

Consider the space of functions C^k([a, b], R) that map the interval [a, b] to the real line R and that are k-times continuously differentiable. This
set is a vector space. Unlike Euclidean n-space, however, there is no finite-dimensional set of basis functions φ_i(x), i = 0, 1, ..., that allows us to represent f ∈ C^k([a, b], R) as a linear combination of them. This can be seen by taking the limit k → ∞ in the statement of Taylor's theorem 13.3.1. To motivate the notion of orthogonality in a function space, consider the following problem. Assume we want to approximate f ∈ C⁰([a, b], R) by a linear combination of base functions taken from some family of polynomials φ_k : [a, b] → R:

f̂_γ(x) := Σ_{k=0}^{K} γ_k φ_k(x).
Suppose further that there is a weight function, i.e., an integrable function with positive values almost everywhere⁸ on [a, b], w : [a, b] → R_{>0}, with a finite integral:

∫_a^b w(x) dx < ∞.
Our goal is to choose the parameters γ_k, k = 0, 1, ..., K, such that the weighted sum of the squared residuals R(γ, x) := f(x) − f̂_γ(x) over all x ∈ [a, b] reaches a minimum:⁹

min_γ ∫_a^b w(x) [ f(x) − Σ_{k=0}^{K} γ_k φ_k(x) ]² dx.   (13.19)

The first-order conditions for this problem are

0 = 2 ∫_a^b w(x) [ f(x) − Σ_{j=0}^{K} γ_j φ_j(x) ] φ_k(x) dx,  k = 0, 1, ..., K,

which may be rewritten as

∫_a^b w(x) f(x) φ_k(x) dx = Σ_{j=0}^{K} γ_j ∫_a^b w(x) φ_j(x) φ_k(x) dx,  k = 0, 1, ..., K.   (13.20)

⁸ Intuitively, the qualifier ‘almost everywhere’ allows w(x) to be nonpositive on a very small set of points. This set must be so small that its size – technically, its measure – equals zero.
⁹ This is called a continuous least squares approximation of f(x). We assume that this and the following integrals are finite.
If the integral on the rhs of (13.20) vanishes for k ≠ j and equals a constant c_k for k = j, it will be easy to compute γ_k from

γ_k = (1/c_k) ∫_a^b w(x) f(x) φ_k(x) dx.

This motivates the following definition of orthogonal polynomials: A set of functions (φ_0, φ_1, ...) is called orthogonal with respect to the weight function w if and only if:

∫_a^b w(x) φ_k(x) φ_j(x) dx = 0  for k ≠ j.   (13.21)

If, in addition, the integral equals unity for all k = j, the polynomials are called orthonormal.¹⁰
13.7.3 Orthogonal Interpolation

Polynomial interpolation on the zeros of orthogonal polynomials does not suffer from the oscillatory behavior encountered on equidistant grids in Section 13.5.3. In contrast, the following theorem (Theorem 6.5 in Mason and Handscomb (2003)) states that on the zeros of a family of orthogonal polynomials the polynomial (13.13) approximates a given function arbitrarily well in the least-squares norm:

Theorem 13.7.1 Given a function f ∈ C⁰([a, b], R), a system of polynomials {φ_k(x), k = 0, 1, ...} (with exact degree k) that are orthogonal with respect to w(x) on [a, b], and a polynomial p_n(x) that interpolates f(x) in the zeros of φ_{n+1}(x), then

lim_{n→∞} ∫_a^b w(x) ( f(x) − p_n(x) )² dx = 0.
¹⁰ Note the analogy with the definition of orthogonality in Euclidean n-space. If we define the scalar product of square integrable functions as in (5.3), condition (13.21) can be written as ⟨φ_k(x), φ_j(x)⟩_w = 0.
13.7.4 Families of Orthogonal Polynomials

Orthogonal polynomials satisfy a three-term recurrence relationship:¹¹

φ_k(x) = (a_k x + b_k) φ_{k−1}(x) − c_k φ_{k−2}(x).   (13.22)

For our purposes, two members of the family of orthogonal polynomials are of particular relevance: Hermite and Chebyshev polynomials. The former play an important role in the numerical approximation of agents' expectations discussed in Section 14.4. They are defined by (see, e.g., Boyd (2000), p. 505):

H_0(x) = 1,  H_1(x) = 2x,  H_k(x) = 2x H_{k−1}(x) − 2(k−1) H_{k−2}(x),   (13.23)

and are orthogonal with respect to the weight function w(x) := e^{−x²}:

∫_{−∞}^{∞} e^{−x²} H_k(x) H_j(x) dx = { √π 2^j j!  for k = j,  0  otherwise }.   (13.24)
Chebyshev polynomials are key to our implementation of the spectral methods of Chapter 5, and therefore, we consider them in more depth in the next section.
13.8 Chebyshev Polynomials

13.8.1 Definition

There are four kinds of Chebyshev polynomials (named after the Russian mathematician Pafnuty Lvovich Chebyshev (1821-1892)). Our focus here is on polynomials of the first kind.¹² The degree-n polynomial is defined on the domain z ∈ [−1, 1] by the formula¹³

T_n(z) := cos(n θ(z)),  θ(z) = cos⁻¹(z).   (13.25)

Using the properties of the cosine function on the interval θ ∈ [0, π] as
¹¹ See, e.g., Golub and Welsch (1969), equation (2.1).
¹² See Mason and Handscomb (2003), Section 1.2 for the second through fourth kind.
¹³ We use z rather than x as an argument of the polynomial to remind the reader that the domain of the Chebyshev polynomials is the interval [−1, 1] rather than the entire real line R.
Table 13.1 Tabulated Values of the Sine and Cosine Function

θ        sin(θ)          cos(θ)
0        0               1
π/8      √(2−√2)/2       √(2+√2)/2
π/4      √2/2            √2/2
3π/8     √(2+√2)/2       √(2−√2)/2
π/2      1               0
5π/8     √(2+√2)/2       −√(2−√2)/2
3π/4     √2/2            −√2/2
7π/8     √(2−√2)/2       −√(2+√2)/2
π        0               −1

Note: The analytic expressions in the entries of the table employ standard formulas for the sine and cosine functions. See, e.g., Sydsæter et al. (1999), pp. 14f. For instance, the value of x = cos(π/4) = √2/2 follows from x = sin(π/2 − π/4) = sin(π/4) = y and x² + y² = 1, so that x = √(1/2) = 0.5√2.
presented in Table 13.1, the degree-zero and the degree-one polynomial are equal to

T_0(z) = 1,   (13.26a)
T_1(z) = cos(cos⁻¹(z)) = z.   (13.26b)

Further members of this family of polynomials follow from the recurrence relation

T_{n+1}(z) = 2z T_n(z) − T_{n−1}(z).   (13.26c)
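As a quick numerical cross-check — our own illustration, not part of the book's code — the following MATLAB lines verify that the recurrence (13.26c) reproduces the direct definition (13.25); the degree n = 7 and the evaluation grid are arbitrary choices.

% Sketch: T_n from the recurrence (13.26c) versus the definition (13.25).
n = 7;  z = linspace(-1, 1, 201);
Told = ones(size(z));  T = z;                  % T_0 and T_1
for k = 2:n
    Tnew = 2 .* z .* T - Told;                 % T_k = 2 z T_{k-1} - T_{k-2}
    Told = T;  T = Tnew;
end
max(abs(T - cos(n * acos(z))))                 % ~ 1e-15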
To see this, recall the following relation between the cosine and sine functions:¹⁴

cos(x + y) = cos(x) cos(y) − sin(x) sin(y),
cos(x − y) = cos(x) cos(y) + sin(x) sin(y).

Therefore,

T_{n+1}(z) = cos((n + 1)θ(z)) = cos(nθ(z)) cos(θ(z)) − sin(nθ(z)) sin(θ(z)),   (13.27a)
T_{n−1}(z) = cos((n − 1)θ(z)) = cos(nθ(z)) cos(θ(z)) + sin(nθ(z)) sin(θ(z)).   (13.27b)

Adding equations (13.27a) and (13.27b) yields

T_{n+1}(z) + T_{n−1}(z) = 2 cos(θ(z)) cos(nθ(z)) = 2z T_n(z),

since cos(θ(z)) = z and cos(nθ(z)) = T_n(z).
Figure 13.9 Chebyshev Polynomials T1 through T5 .
Figure 13.9 displays the polynomials T1 through T5 . The graph of the degree-one polynomial is the 45◦ line in the plane [−1, 1] × [−1, 1]. The
¹⁴ See, e.g., Sydsæter et al. (1999), p. 15.
degree-two polynomial is a parabola with a vertex at (0, −1). The polynomials of degrees three and above oscillate between −1 and 1. In general, we will be concerned with functions whose domains are the closed interval [a, b], i.e., f : [a, b] → R. We employ the bijections

ξ : [−1, 1] → [a, b],  z ↦ x = ξ(z) := a + (1 + z)(b − a)/2,   (13.28a)
ξ⁻¹ : [a, b] → [−1, 1],  x ↦ z = ξ⁻¹(x) := 2(x − a)/(b − a) − 1,   (13.28b)

to map points between the two domains.
13.8.2 Zeros and Extrema

The cosine function has zeros in the interval [0, 2π] at θ_1 = π/2 and θ_2 = θ_1 + π and is periodic with period 2π. Therefore, its zeros occur at

θ_k = π/2 + (k − 1)π = ((2k − 1)/2) π,  k = 1, 2, ...,

and the Chebyshev polynomial T_n(z), n = 1, 2, ..., has its n zeros at the points

z_k' = cos( (2k − 1)π / (2n) ),  k = 1, 2, ..., n.   (13.29)

The extrema of the cosine function occur at kπ, k = 0, 1, ... (see Table 13.1). Accordingly, the n + 1 extrema of T_n(z), denoted by z̄_k, follow from the condition n cos⁻¹(z̄_k) = kπ and are given by

z̄_k = cos(kπ/n),  k = 0, 1, ..., n,   (13.30)

with T_n(z̄_k) = (−1)^k. They are often referred to as Gauss-Lobatto points (see, e.g., Boyd (2000), p. 570).
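A short MATLAB sketch (ours, with an arbitrary choice of n) computes the zeros (13.29) and the Gauss-Lobatto points (13.30) and checks the stated properties numerically:

% Sketch: zeros and extrema of T_n.
n      = 8;
k      = 1:n;
z_zero = cos((2*k - 1) * pi / (2*n));      % the n zeros of T_n, equation (13.29)
z_ext  = cos((0:n) * pi / n);              % the n+1 Gauss-Lobatto points, (13.30)
Tn     = @(z) cos(n * acos(z));            % definition (13.25)
max(abs(Tn(z_zero)))                       % ~ 1e-15: T_n vanishes at the zeros
Tn(z_ext)                                  % alternates between +1 and -1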
13.8.3 Orthogonality

The weight function of the Chebyshev polynomials is the function

w(z) := 1/√(1 − z²),

plotted in Figure 13.10. In particular, Chebyshev polynomials satisfy:¹⁵

∫_{−1}^{1} T_i(z) T_j(z) / √(1 − z²) dz = { π if i = j = 0,  π/2 if i = j = 1, 2, ...,  0 if i ≠ j }.   (13.31)
Figure 13.10 Weight Function of the Chebyshev Polynomials.
There is also a discrete version of this property. Let z_k' denote the n zeros of T_n(z), and consider two polynomials T_i(z) and T_j(z), i, j < n − 1. Then:

Σ_{k=1}^{n} T_i(z_k') T_j(z_k') = { n if i = j = 0,  n/2 if i = j = 1, 2, ...,  0 if i ≠ j }.   (13.32)

¹⁵ For a proof, see, e.g., Burden and Faires (2016), pp. 508f. or Mason and Handscomb (2003), p. 83.
Mason and Handscomb (2003), pp. 107f., prove this relation from the properties of the trigonometric functions sin(θ) and cos(θ). We will see in Section 14.3.2 on Gauss-Chebyshev integration that (13.32) is simply the exact Gauss-Chebyshev integral of (13.31). With respect to the extrema of T_n, the discrete orthogonality relation is (see Mason and Handscomb (2003), p. 109):

Σ_{k=0}^{n} (1/c_k) T_i(z̄_k) T_j(z̄_k) = { n if i = j = 0 or i = j = n,  n/2 if i = j = 1, 2, ..., n − 1,  0 if i ≠ j },   (13.33)

c_k = { 2 for k ∈ {0, n},  1 for k = 1, ..., n − 1 }.

13.8.4 Chebyshev Regression

We are now prepared to consider the approximation of a given function f : [a, b] → R by a linear combination of Chebyshev polynomials. Without loss of generality, we assume for the moment [a, b] ≡ [−1, 1] and define:

p_K(z) := γ_0 T_0(z) + γ_1 T_1(z) + γ_2 T_2(z) + ⋯ + γ_K T_K(z).   (13.34)
For γ_K ≠ 0, p_K(z) is a polynomial of degree K. We choose the coefficients γ_k, k = 0, ..., K, to minimize the weighted integral

I(γ_0, ..., γ_K) := ∫_{−1}^{1} [ f(z) − p_K(z) ]² / √(1 − z²) dz.

The first-order conditions of this problem are

∂I(γ_0, ..., γ_K)/∂γ_k = 0,  k = 0, 1, ..., K.

The respective system of equations is A γ = b with typical elements

a_{kj} := ∫_{−1}^{1} T_k(z) T_j(z) / √(1 − z²) dz,  b_k := ∫_{−1}^{1} f(z) T_k(z) / √(1 − z²) dz,  k, j = 0, 1, ..., K.
Using the orthogonality (13.31) and Gauss-Chebyshev integration (14.19) on the m ≥ K + 1 nodes of T_m(z) to evaluate the integrals yields the diagonal system

diag(π, π/2, ..., π/2) [γ_0, γ_1, ..., γ_K]ᵀ = [b_0, b_1, ..., b_K]ᵀ,  b_k = (π/m) Σ_{j=1}^{m} f(z_j') T_k(z_j'),  k = 0, ..., K,

so that the coefficients are given by:¹⁶

γ_k = (c_k/m) Σ_{j=1}^{m} f(z_j') T_k(z_j'),  c_k = { 1 for k = 0,  2 for k > 0 }.   (13.35)

An alternative version, derived from the integration formula (14.20), is:

γ̄_k = (c_k/m) Σ_{j=0}^{m} (1/d_j) f(z̄_j) T_k(z̄_j),  c_k = { 1 for k = 0,  2 for k > 0 },  d_j = { 2 for j ∈ {0, m},  1 for j ∈ {1, ..., m − 1} }.   (13.36)
We summarize the computation of the coefficients in equation (13.35) in terms of an algorithm:

Algorithm 13.8.1 (Chebyshev Regression)
Purpose: Approximate a continuous function f : [a, b] → R with a linear combination of Chebyshev polynomials.
Steps:
Step 1: Choose the degree K of the approximating Chebyshev polynomial. Compute m ≥ K + 1 Chebyshev interpolation nodes z_k', k = 1, ..., m, from (13.29).
Step 2: For k = 1, 2, ..., m, employ the bijection ξ(·) from equation (13.28a) and compute f(ξ(z_k')).
Step 3: Compute the Chebyshev coefficients:

γ_0 = (1/m) Σ_{j=1}^{m} f(ξ(z_j')),
γ_k = (2/m) Σ_{j=1}^{m} f(ξ(z_j')) T_k(z_j'),  k = 1, ..., K.

¹⁶ Some authors multiply the coefficient of T_0 by 0.5 so that the distinction of cases in (13.35) is unnecessary.
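A minimal MATLAB sketch of Algorithm 13.8.1 — not the book's Cheb_coef procedure — might look as follows; the Runge function, the interval, and the choices K = 10 and m = 12 are assumptions made for the example.

% Sketch of Algorithm 13.8.1: Chebyshev regression of f on [a,b].
f  = @(x) 1 ./ (1 + x.^2);             % example: the Runge function
a  = -5; b = 5; K = 10; m = 12;
zk = cos((2*(1:m)' - 1) * pi / (2*m)); % zeros of T_m, equation (13.29)
xk = a + (1 + zk) * (b - a) / 2;       % map to [a,b], equation (13.28a)
T  = cos(acos(zk) * (0:K));            % m x (K+1) matrix with entries T_k(z_j')
gamma = (2 / m) * T' * f(xk);          % equation (13.35) with c_k = 2 ...
gamma(1) = gamma(1) / 2;               % ... except c_0 = 1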
The algorithm is easy to implement. The GAUSS procedure Cheb_coef illustrates this point. Let T_{K,m} denote the (K + 1) × m matrix of base functions evaluated at the m Chebyshev zeros,

T_{K,m} := [ 1, 1, ..., 1;  z_1', z_2', ..., z_m';  T_2(z_1'), T_2(z_2'), ..., T_2(z_m');  ...;  T_K(z_1'), T_K(z_2'), ..., T_K(z_m') ],   (13.37)

let D be the (K + 1) × (K + 1) diagonal matrix

D := diag(1/m, 2/m, ..., 2/m),   (13.38)

and let y be the vector of function values

y := [ f(ξ(z_1')), f(ξ(z_2')), ..., f(ξ(z_m')) ]ᵀ.

Then, the vector of coefficients is given by

γ = D T_{K,m} y.   (13.39)
Theorem 13.7.1 implies that the Chebyshev polynomial (13.34) converges to the true function f in the least-squares norm. In practice, however, we cannot use an infinite series to approximate a given function. The Chebyshev truncation theorem (Boyd (2000), Theorem 6, p. 47) provides an estimate of the error:

Theorem 13.8.1 (Chebyshev Truncation Theorem) The error in approximating f(z) by

f_K(z) := Σ_{k=0}^{K} γ_k T_k(z)

is bounded by the sum of the absolute values of all neglected coefficients:

| f(z) − f_K(z) | ≤ Σ_{i=K+1}^{∞} |γ_i|  for all f(z), K, and z ∈ [−1, 1].

If we observe rapid convergence of the series {γ_k}_{k=0}^{K}, with γ_K being close to zero, we can be confident in ignoring the higher-order terms in (13.34). Boyd (2000), p. 51, summarizes this finding in his Rule-of-Thumb 2: “we can loosely speak of the last retained coefficient as being the truncation error”.
13.8.5 Chebyshev Evaluation

Suppose that we have computed the coefficients γ_0, γ_1, ..., γ_K of the Chebyshev polynomial (13.34) and want to evaluate it at some point x in the domain of the function f : [a, b] → R, which the polynomial approximates. The recursive definition (13.26c) implies an efficient method, which we present in terms of an algorithm (see Judd (1998), Algorithm 6.1):

Algorithm 13.8.2 (Chebyshev Evaluation)
Purpose: Evaluate a K-th degree Chebyshev polynomial at z, given the coefficients γ_0, ..., γ_K.
Steps:
Step 1: Initialize: set the elements of the (K + 1)-vector y = [y_1, ..., y_{K+1}]ᵀ equal to zero, and set y_1 = 1 and y_2 = z.
Step 2: For k = 2, 3, ..., K, compute: y_{k+1} = 2z y_k − y_{k−1}.
Step 3: Return p_K(z) = Σ_{k=0}^{K} γ_k y_{k+1}.

A simple way to implement this procedure for an entire vector of points x := [x_1, x_2, ..., x_n] consists of three steps:
1) Use (13.28b) and map each element of x to the corresponding element of the vector z := [ξ⁻¹(x_1), ξ⁻¹(x_2), ..., ξ⁻¹(x_n)].
2) Employ the recursion (13.26c) to construct the matrix

T_{K,n}(z) := [ 1, 1, ..., 1;  z_1, z_2, ..., z_n;  T_2(z_1), T_2(z_2), ..., T_2(z_n);  ...;  T_K(z_1), T_K(z_2), ..., T_K(z_n) ].

3) Compute

[ f̂(x_1), f̂(x_2), ..., f̂(x_n) ]ᵀ = T_{K,n}(z)ᵀ γ.
An even faster way to solve this problem is the following algorithm (see Corless and Fillion (2013), Algorithm 2.2):

Algorithm 13.8.3 (Clenshaw Algorithm)
Purpose: Evaluate a K-th degree Chebyshev polynomial at z, given the coefficients γ_0, ..., γ_K.
Steps:
Step 1: Initialize: set the K + 1 elements of the vector y = [y_1, ..., y_{K+1}]ᵀ equal to zero and replace y_K with γ_K.
Step 2: For k = K − 1, K − 2, ..., 1, compute y_k = γ_k + 2z y_{k+1} − y_{k+2} (with y_{K+1} = 0 from Step 1).
Step 3: Return p_K(z) = (γ_0 − y_2) + y_1 z.
If we count each multiplication and addition as one floating point operation (FLOP), steps 2 and 3 involve a total of 4n FLOPs. Steps 2 and 3 of Algorithm 13.8.2 involve a total of 5n − 3 FLOPs. Our GAUSS procedure Cheb_eval implements this algorithm.
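For readers who prefer MATLAB, a minimal sketch of the Clenshaw recursion — our own stand-in for Cheb_eval, not the book's procedure — is given below; it assumes the coefficients are stored in a vector gamma with gamma(k+1) = γ_k and accepts scalar or vector arguments z.

% Sketch: evaluate p(z) = sum_{k=0}^{K} gamma(k+1)*T_k(z) by the Clenshaw recursion.
function p = clenshaw(gamma, z)
    K  = numel(gamma) - 1;
    b1 = zeros(size(z));               % plays the role of b_{k+1}
    b2 = zeros(size(z));               % plays the role of b_{k+2}
    for k = K:-1:1
        b  = gamma(k+1) + 2 .* z .* b1 - b2;
        b2 = b1;
        b1 = b;
    end
    p = gamma(1) + z .* b1 - b2;
end

Used together with the regression sketch above, p = clenshaw(gamma, 2*(x - a)/(b - a) - 1) evaluates the fitted polynomial at points x in [a, b].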
13.8.6 Examples

Figure 13.11 illustrates the convergence of Chebyshev polynomials to the Runge function shown in Figures 13.3 and 13.6.
Figure 13.11 Approximation of the Runge Function with Chebyshev Polynomials
Note that the degree K = 25 polynomial still has many fewer free parameters than the cubic spline on the 10 subintervals shown in Figure 13.6. As a second example of the performance of Chebyshev polynomials, we consider a function with a kink:

f(x) := { 0 for x ∈ [0, 1),  x − 1 for x ∈ [1, 2] }.   (13.40)

This kind of function appears in economic problems with constraints. For instance, in Chapter 7, we consider a model with a binding constraint on investment. As another example, assume that agents supply labor elastically. If the wage is below unemployment compensation, they do not work and the labor supply is equal to zero. For a wage exceeding unemployment compensation, they supply labor, and if the substitution effect dominates the income effect, the labor supply increases with the wage rate. The optimal labor supply may look similar to the function in equation (13.40). Figure 13.12 shows that with increasing order, Chebyshev polynomials approach this function. The degree K = 55 polynomial has a maximal error of less than 0.0066 on an equally spaced grid of 200 points over the interval [0, 2].
Figure 13.12 Approximation of a Kinked Function with Chebyshev Polynomials
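The following MATLAB sketch (ours, not code shipped with the book) reproduces the flavor of this experiment: it fits a degree-55 Chebyshev polynomial to the kinked function (13.40) by Chebyshev regression and evaluates the maximal error on 200 equally spaced points; the node count m = 56 is an assumption.

% Sketch: Chebyshev regression of the kinked function (13.40) on [0,2].
g  = @(x) max(x - 1, 0);                     % equation (13.40)
a = 0; b = 2; K = 55; m = 56;
zk = cos((2*(1:m)' - 1) * pi / (2*m));       % zeros of T_m
xk = a + (1 + zk) * (b - a) / 2;
T  = cos(acos(zk) * (0:K));
gam = (2/m) * T' * g(xk);  gam(1) = gam(1)/2;
xg = linspace(a, b, 200)';
zg = 2 * (xg - a) / (b - a) - 1;
ph = cos(acos(zg) * (0:K)) * gam;            % evaluate p_K on the grid
max(abs(ph - g(xg)))                         % maximal approximation error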
13.9 Multivariate Extensions

The policy functions of DSGE models usually have more than one independent variable. This section considers the approximation of a given function f : X → R defined on a compact subset X ⊂ R^d of the d-dimensional space of real numbers. The next subsection employs a straightforward approach: we approximate f by Cartesian products of one-dimensional polynomials. However, this approach suffers from the curse of dimensionality; i.e., the number of coefficients of the respective polynomials grows exponentially with d. A smaller set is the complete set of polynomials. On this set, however, there are more nodes than required to determine the parameters of the collocation solution considered in Chapter 5, and there is no guidance for their selection. The Smolyak algorithm presented in Subsection 13.9.4 solves this problem. It selects collocation points on sparse grids, whose number grows only polynomially with the dimension d.
13.9.1 Tensor Product and Complete Polynomials

Suppose that we want to approximate the function

f : X ⊂ R^d → R,  x := [x_1, x_2, ..., x_d] ↦ f(x)
with polynomials. Let φ_k : [a, b] → R denote a univariate polynomial from some family of polynomials, and consider the d univariate polynomials p_{K_j}, j = 1, 2, ..., d:

p_{K_j}(x) := γ_0^j φ_0(x) + ⋯ + γ_{K_j}^j φ_{K_j}(x).

Let K = [K_1, ..., K_d]ᵀ denote the vector of univariate dimensions. The Cartesian product of these d polynomials, p_{K_1} × p_{K_2} × ⋯ × p_{K_d}, is the d-fold sum

p^K(x) := Σ_{l_1=0}^{K_1} ⋯ Σ_{l_d=0}^{K_d} γ_{l_1,l_2,...,l_d} φ_{l_1}(x_1) φ_{l_2}(x_2) ⋯ φ_{l_d}(x_d),  γ_{l_1,l_2,...,l_d} := Π_{j=1}^{d} γ_{l_j}^j.

For K_j = K, j = 1, ..., d, this polynomial has (1 + K)^d coefficients, a number that grows exponentially with the number of dimensions d. The single products φ_{l_1} ⋯ φ_{l_d} are members of the set

Φ := { Π_{j=1}^{d} φ_{l_j}(x_j) | l_j = 0, 1, ..., K },   (13.41)

which is an example of a d-fold tensor product base of polynomials. The complete set of polynomials has fewer members. An example of this set is built from the monomials involved in the Taylor series expansion of f according to Theorem 13.3.2 up to degree K:

P_K^d := { x_1^{k_1} x_2^{k_2} ⋯ x_d^{k_d} | Σ_{i=1}^{d} k_i = j, k_i ≥ 0, j = 0, 1, ..., K }.
For d = 2 and K = 2, this set has 6 members:

P_2^2 := { 1, x_1, x_2, x_1 x_2, x_1², x_2² }.

With φ_k(x) := x^k, the tensor product set has 9 members:

Φ = { 1, x_2, x_2², x_1, x_1 x_2, x_1 x_2², x_1², x_1² x_2, x_1² x_2² }.

The complete set of polynomials grows only polynomially in the dimension d. For instance, for K = 2, the set has 1 + d + d(d + 1)/2 elements (Judd (1998), Table 6.6).
13.9.2 Multidimensional Splines

A spline on a d-dimensional subset X ⊂ R^d consists of d-dimensional polynomials restricted to the elements of a d-dimensional grid over X. It is beyond the scope of this book to consider d-dimensional splines in general. Instead, we focus on linear and cubic splines in two dimensions. They are helpful in many of the applications considered in this book. Let X := [a_1, b_1] × [a_2, b_2] denote a rectangular subset of R², and let f : X → R denote a function we want to approximate with a two-dimensional spline on suitably chosen rectangles. To accomplish this, we subdivide both intervals as shown in Figure 13.13. The shaded rectangle corresponds to the subinterval [x_{1i}, x_{1,i+1}] × [x_{2j}, x_{2,j+1}].
Figure 13.13 Rectangular Grid
BILINEAR INTERPOLATION. A bilinear, tensor-product-base polynomial on this subinterval can be written as ij
p2 (x) := c0 + c1 (x 1 − x 1i ) + c2 (x 2 − x 2 j ) + c3 (x 1 − x 1i )(x 2 − x 2 j ). Alternatively, we can write the polynomial as a tensor product of Lagrange polynomials so that we need not compute the coefficients in a separate step. To develop the respective formula, we start with the univariate
780
13 Function Approximation
interpolation of x 1 at the points x 2 j and x 2 j+1 : fˆ(x 1 , x 2 j ) = L1,0 (x 1 ) f (x 1i , x 2 j ) + L1,1 (x 1 ) f (x 1i+1 , x 2 j ), fˆ(x 1 , x 2 j+1 ) = L1,0 (x 1 ) f (x 1i , x 2 j+1 ) + L1,1 (x 1 ) f (x 1i+1 , x 2 j+1 ), where17 x 1i+1 − x 1 , x 1i+1 − x 1i x 1 − x 1i L1,1 (x 1 ) = , x 1i+1 − x 1i
L1,0 (x 1 ) =
are the Lagrange polynomials k = 0 and k = 1 of the variable x 1 seen in equation (13.16). In the second and final step, we interpolate the variable x 2 on fˆ(x 1 , x 2 j ) and fˆ(x 1 , x 2 j+1 ), fˆ(x 1 , x 2 ) = L2,0 (x 2 ) fˆ(x 1 , x 2 j ) + L2,1 (x 2 ) fˆ(x 1 , x 2 j+1 ) giving the solution: fˆ(x 1 , x 2 ) =L1,0 (x 1 )L2,0 (x 2 ) f (x 1i , x 2 j ) + L1,1 (x 1 )L2,0 (x 2 ) f (x 1i+1 , x 2 j ) + L1,0 (x 1 )L2,1 (x 2 ) f (x 1i , x 2 j+1 )
(13.42)
+ L1,1 (x 1 )L2,1 (x 2 ) f (x 1i+1 , x 2 j+1 ). This procedure can be extended to more than two dimensions. Our GAUSS R procedure BLIP implements equation (13.42). MATLAB provides the command griddedinterpolant for linear and cubic spline interpolation for functions of more than one independent variable. CUBIC SPLINES. A cubic tensor product polynomial on the rectangle [x 1i , x 1i+1 ] × [x 2 j , x 2 j+1 ] ij
with coefficients ckl can be written as ij
p4 (x 1 , x 2 ) := 17
4 X 4 X k=1 l=1
ij
ckl (x 1 − x 1i )k−1 (x 2 − x 2 j )l−1 .
In a slight abuse of the notation introduced in (13.17), we use the first index to refer to the argument of the function y = f (x 1 , x 2 ) and ignore the dependence on the chosen rectangle (i, j).
13.9 Multivariate Extensions
781
Similar to bilinear interpolation, we can construct the cubic spline in two dimensions stepwise (see Press et al. (1992) pp. 120ff. ). Let n1 and n2 denote the numbers of grid points on [a1 , b1 ] and [a2 , b2 ], respectively. In the first step, we compute for each grid point x 1i ∈ [a1 , b1 ] the quadratic coefficients of the cubic spline on x 20 , x 21 , . . . , x 2n2 . They approximate the function f (x 1i , x 2 ). Next, we evaluate the n1 splines for a given value x 2 ∈ [a2 , b2 ]. This yields the points fˆ(x 10 , x 2 ), fˆ(x 11 , x 2 ), . . . , fˆ(x 1n1 , x 2 ). In the third and final step, we employ these values to compute the spline over x 10 , . . . , x 1n1 and evaluate this spline at x 1 . For the GAUSS programming environment, we implement this approach in two functions encoded in Fortran. They can be called via the GAUSS foreign language interface. The procedure CSpline2_coef must be called first. It solves the first step. Afterwards, the procedure CSpline3_eval provides the interpolated value.
13.9.3 Multidimensional Chebyshev Regression The coefficients of a tensor product Chebyshev polynomial follow from formulas similar to (13.35). We will demonstrate their derivation for the case of d = 2. For notational convenience, we assume the same number of polynomials K1 = K2 = K and (with the obvious change of coordinates, zi = ξ−1 (x i )) consider f (z1 , z2 ) on the square [−1, 1]2 : p
K,K
K X K X
(z) :=
k1 =0 k2 =0
γk1 ,k2 Tk1 (z1 )Tk2 (z2 ).
Let γ = [γ0,0 , . . . , γ0,K , γ1,0 , . . . , γ1,K , . . . , γK,K ] T denote the vector of coefficients. We wish to choose these to minimize the double integral: Z
1
Z 1
−1
−1
I(γ) :=
f (z1 , z2 ) − q
K X K X k1 =0 k2 =0
d z1 d z2 . q 1 − z12 1 − z22
The first-order conditions are: ∂ I(γ) = 0, r1 , r2 = 0, 1, . . . , K. ∂ γ r1 ,r2
2 γk1 ,k2 Tk1 (z1 )Tk2 (z2 )
782
13 Function Approximation
The respective system of linear equations is: Aγ = b. The vector b has the typical element: Z
Z
1
1
bi := −1
−1
Tr1 (z1 )Tr2 (z2 ) f (z1 , z2 ) d z1 d z2 , q q 1 − z12 1 − z22
i = 1 + r2 + r1 (1 + K).
The typical element of the matrix A is: Z
1
ai, j := −1
Tr1 (z1 )Tk1 (z1 ) q 1 − z12
Z
j = 1 + k2 + k1 (1 + K).
1 −1
Tr2 (z2 )Tk2 (z2 ) d z2 d z1 , q 1 − z22
Hence, from the orthogonality of the Chebyshev polynomials (13.31), π2 , for i = j = 1, 1 2 π , for i = j = 2, . . . , 3 + K, ai, j = 21 2 for i = j = 4 + K, . . . (1 + K)2 , 4π , 0, for i 6= j. We approximate the elements of the vector b by Gauss-Chebyshev sums on m ≥ 1 + K nodes of Tm (z): bi =
m m ππ X X Tr (z 0 )Tr (z 0 ) f (zk0 , zk0 ). 1 2 m m k =1 k =1 1 k1 2 k2 1
2
Therefore, the formula: 4 γ r1 ,r2 = c r1 c r2 ¨ 2, c r1 , c r2 = 1,
m m 1 X X Tr (z 0 )Tr (z 0 ) f (zl0 , zl0 ), 1 2 m2 k =1 k =1 1 k1 2 k2 1
2
for r1 , r2 = 0,
(13.43)
otherwise.
determines the coefficients of p K,K (z1 , z2 ). Similar to the univariate case, we can compute the coefficients from a matrix product. Let
13.9 Multivariate Extensions
783
0 f (z10 , z10 ) . . . f (z10 , zm ) γ0,0 . . . γ0,K .. .. .. , Γ := ... . . . ... F := . . . 0 0 0 f (zm , z10 ) . . . f (zm , zm ) γK,0 . . . γK,K denote the matrix of function values at the m2 nodes of Tm (z) and the coefficient matrix, respectively. Let the matrices TK,m and D be defined as in (13.37) and (13.38), respectively. Then, the coefficient matrix follows from: 0 Γ = DTK,m F TK,m D.
(13.44)
The obvious generalization of (13.43) to the d-dimensional case is: m m X 2d 1 X γ r1 ,...,rd = ··· Tr (z 0 ) · · · Trd (zk0 ) f (zk0 , . . . , zk0 ), d 1 d c r1 · · · c rd md k =1 k =1 1 k1 1 d ¨ 2, for r j = 0, j = 1, . . . , d, cr j = 1, otherwise.
(13.45)
13.9.4 The Smolyak Polynomial The seminal paper of Smolyak (1963) presents a method to construct a polynomial with a relatively small number of coefficients that interpolates a function of d variables.18 The collocation nodes form a sparse grid within the d-dimensional hypercube [−1, 1]d . The first application of Smolyak’s method to an economic problem was an article by Krueger and Kubler (2004). The presentation here follows Malin, Krueger, and Kubler (2011) and Judd, Maliar, Maliar, and Valero (2014). We begin with the construction of the grid. CONSTRUCTION OF THE GRID. Let G i , with G 1 = {0} denote the set that consists of the extrema of the Chebyshev polynomial of degree mi − 1, where mi := 2i−1 + 1, i = 2, 3, . . . . 18
(13.46)
Smolyak’s paper is written in Russian. We include the reference to give credit to the inventor of the method.
784
13 Function Approximation
Therefore, the first four unidimensional grids are (see also Table 13.1): G1 = 0 , G 2 = − 1, 0, 1 , p p G 3 = − 1, −0.5 2, 0, 0.5 2, 1 , § Æ Æ p p p G 4 = − 1, −0.5 2 + 2, −0.5 2, −0.5 2 − 2, 0, ª Æ Æ p p p 0.5 2 − 2, 0.5 2, 0.5 2 + 2, 1 . Note that G 1 ⊂ G 2 ⊂ G 3 ⊂ G 4 . In general, each of the sets G i = z¯1 , z¯2 , . . . , z¯mi
(13.47)
has an odd number of elements, and the locations of the extrema of Tmi −1 and Tmi imply G i−1 ⊂ G i . We need two more definitions before we can construct the multidimensional Smolyak grid. Let i := [i1 , i2 , . . . , id ] denote the vector of indices of unidimensional grids in dimensions 1 through d, and abbreviate the sum of its elements by |i| = i1 + i2 + · · · + id . Additionally, we introduce the integer λ = 0, 1, . . . . It determines the degree of exactness (or the level of the approximation) in the sense that the Smolyak polynomial introduced below can reproduce polynomials of degree at most λ (see Barthelmann et al. (2000), Theorem 4). The Smolyak grid of dimension d and level of approximation λ is defined as: [ H d,λ = G m i1 × G m i2 × · · · × G m i d . (13.48) |i|=d+λ
In words, it consists of the unions of Cartesian products of unidimensional grids with the property that the sum of the indices considered in the construction is equal to d + λ. For d = 1, the set H 1,λ is equal to the set G m1+λ . As an example, we construct the set H 2,2 on [−1, 1]2 . The union consists of the tensor products that satisfy i1 + i2 = 4. These are:
13.9 Multivariate Extensions
785
p p i1 = 1 and i2 = 3 : G 1 × G 3 = (0, −1); (0, −0.5 2); (0, 0); (0, 0.5 2); (0, 1) , i1 = 2 and i2 = 2 : G 2 × G 2 = (−1, −1); (−1, 0); (−1, 1); (0, −1); (0, 0); (0, 1); (1, −1); (1, 0); (1, 1) , p p i1 = 3 and i2 = 1 : G 3 × G 1 = (−1, 0); (−0.5 2, 0); (0, 0); (0.5 2, 0); (1, 0)}.
Therefore, H 2,2 is a set of 13 pairs of Chebyshev extrema: p p H 2,2 = (0, −1); (0, −0.5 2); (0, 0); (0, 0.5 2); (0, 1);
(13.49)
(−1, −1); (−1, 0); (−1, 1); (1, −1); (1, 0); p p (1, 1); (−0.5 2, 0); (0.5 2, 0) .
Figure 13.14 displays these pairs as circles together with the elements of the tensor product G 3 × G 3 as blue dots. The latter set consists of 25 elements, almost twice the number of elements of the Smolyak grid. 1.0
0.5
0.0
−0.5 −1.0 −1.0
−0.5
0.0
0.5
1.0
Figure 13.14 Tensor and Smolyak Grid
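As an illustration (our own sketch, not the book's ChebBase2.m), the Smolyak grid H^{2,2} can be assembled in MATLAB by forming the union in (13.48) directly; the rounding tolerance used to merge duplicate nodes is an implementation choice.

% Sketch: construct the Smolyak grid H^{2,2} as the union of the tensor
% products G^{i1} x G^{i2} with i1 + i2 = 4.
G = {0, [-1 0 1], cos((0:4)*pi/4)};         % G^1, G^2, G^3 from (13.46)-(13.47)
H = [];
for i1 = 1:3
    i2 = 4 - i1;
    [A, B] = ndgrid(G{i1}, G{i2});
    H = [H; A(:), B(:)];
end
H = unique(round(H, 12), 'rows');           % 13 distinct grid points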
INTERPOLATING POLYNOMIAL. The polynomial that interpolates a given function f : [−1, 1]d → R, z := [z1 , z2 , . . . , zd ] 7→ f (z)
786
13 Function Approximation
is a linear combination of polynomials, which are themselves sums of the tensor products of Chebyshev polynomials: i
p (z) :=
mi1 −1
X
l1 =0
···
mid −1
X
l d =1
αl1 ,...,l d Tl1 (z1 ) · · · Tl d (zd ).
(13.50)
The Smolyak polynomial combines the polynomials pi according to:19,20 p
d,λ
(z) :=
X
(−1)
d+λ−|i|
max{d,1+λ}≤|i|≤d+λ
d −1 d + λ − |i|
X
pi (z).
i1 +···+id =|i|
(13.51)
To understand these formulas, we consider the case of d = 2 and λ = 2. Therefore, we have to consider combinations of indices i1 and i2 that satisfy 3 = max{2, 1 + 2} ≤ i1 + i2 ≤ 4 = 2 + 2. The respective numbers of basis functions mi1 and mi2 follow from equation(13.46) and the weights attached to each combination satisfy: 1 4−i1 −i2 w i1 ,i2 = (−1) . 4 − i1 − i2
Accordingly, the single terms of the sum (13.51) are: i1 = 1 and i2 = 2 ⇒ mi1 = 1, mi2 i1 = 2 and i2 = 1 ⇒ mi1 = 3, mi2 i1 = 1 and i2 = 3 ⇒ mi1 = 1, mi2 i1 = 2 and i2 = 2 ⇒ mi3 = 1, mi2 i1 = 3 and i2 = 1 ⇒ mi5 = 1, mi2
1 = 2, w1,2 = (−1) = −1, 1 1 = 1, w2,1 = (−1)1 = −1, 1 1 = 5, w1,3 = (−1)0 = 1, 0 1 = 3, w2,2 = (−1)0 = 1, 0 1 = 1, w3,1 = (−1)0 = 1, 0 1
and the five respective polynomials are: 19 20
See Judd et al. (2014), equations (5)-(7). The symbol n n! = k k!(n − k)!
in this formula denotes the binomial coefficient, which gives the weight assigned to pi .
13.9 Multivariate Extensions
787
p1,2 (z1 , z2 ) = γ0,0 T0 (z1 )T0 (z2 ) + γ0,1 T0 (z1 )T1 (z2 ) + γ0,2 T0 (z1 )T2 (z2 ), p2,1 (z1 , z2 ) = γ0,0 T0 (z1 )T0 (z2 ) + γ1,0 T1 (z1 )T0 (z2 ) + γ2,0 T2 (z1 )T1 (z2 ), p1,3 (z1 , z2 ) = γ0,0 T0 (z1 )T0 (z2 ) + γ0,1 T0 (z1 )T1 (z2 ) + γ0,2 T0 (z1 )T2 (z2 ) + γ0,3 T0 (z1 )T3 (z2 ) + γ0,4 T0 (z1 )T4 (z2 ), p2,2 (z1 , z2 ) = γ0,0 T0 (z1 )T0 (z2 ) + γ0,1 T0 (z1 )T1 (z2 ) + γ0,2 T0 (z1 )T2 (z2 ) + γ1,0 T1 (z1 )T0 (z2 ) + γ1,1 T1 (z1 )T1 (z2 ) + γ1,2 T1 (z1 )T1 (z2 ) + γ2,0 T2 (z1 )T0 (z2 ) + γ2,1 T2 (z1 )T1 (z2 ) + γ2,2 T2 (z1 )T2 (z2 ), p
3,1
(z1 , z2 ) = γ0,0 T0 (z1 )T0 (z2 ) + γ1,0 T1 (z1 )T0 (z2 ) + γ2,0 T2 (z1 )T0 (z2 ) + γ3,0 T3 (z1 )T0 (z2 ) + γ4,0 T4 (z1 )T0 (z2 ).
Adding the five polynomials together with their weights leaves the final interpolating polynomial with 13 elements: p d=2,λ=2 (z) = γ0,0 T0 (z1 )T0 (z2 ) + γ0,1 T0 (z1 )T1 (z2 ) + γ0,2 T0 (z1 )T2 (z2 ) + γ0,3 T0 (z1 )T3 (z2 ) + γ0,4 T0 (z1 )T4 (z2 ) + γ1,0 T1 (z1 )T0 (z2 ) + γ1,1 T1 (z1 )T1 (z2 ) + γ1,2 T1 (z1 )T2 (z2 ) + γ2,0 T2 (z1 )T0 (z2 ) + γ2,1 T2 (z1 )T1 (z2 ) + γ2,2 T2 (z1 )T2 (z2 ) + γ3,0 T3 (z1 )T0 (z2 ) + γ4,0 T4 (z1 )T0 (z2 ). Note that the set H 2,2 presented in (13.49) also has 13 elements. Therefore, we can determine the 13 coefficients of the polynomial by solving the linear equation
f (¯ z11 , z¯21 ) T0 (¯ z11 )T0 (¯ z21 ) . . . T4 (¯ z11 )T0 (¯ z21 ) γ0,0 z12 , z¯22 ) T0 (¯ z12 )T0 (¯ z22 ) . . . T4 (¯ z12 )T0 (¯ z22 ) γ0,1 f (¯ = . , . . . . .. .. .. .. .. γ4,0 f (¯ z113 , z¯213 ) T0 (¯ z113 )T0 (¯ z213 ) . . . T4 (¯ z113 )T0 (¯ z213 ) (13.53) where (¯ z1i , z¯2i ), i = 1, . . . , 13 denote the elements of H 2,2 . Therefore, the difficult part of the implementation of the Smolyak polynomial is the construction of the matrix on the rhs of equation (13.53) for a given dimension R d and level of approximation λ. Our MATLAB function ChebBase2.m performs this task. It is based on the code that accompanies Judd, Maliar, Maliar, and Valero (2014).
788
13 Function Approximation
13.9.5 Neural Networks Neural networks provide an alternative to linear combinations of polynomials. A single-layer neural network is a nonlinear function of the form n X Φ(a, x) := h ai g(x i ) , i=1
where h and g are scalar functions. In the left panel of Figure 13.15, the first row of nodes represents the function g processing the inputs x i . The result is aggregated via summation, as indicated by the arrows toward the single node that represents the function h that delivers the final output y. In the single hidden-layer feed-forward network displayed in the right panel of Figure 13.15, the function g delivers its output to a second row of nodes. There, this input is processed by another function, say G, before it is aggregated and passed on to the function h. Single Layer x1
...
x2
Hidden-Layer Feedforward xn
x1
...
x2
...
xn
... ...
y
y
Figure 13.15 Neural Networks
Formally, the single hidden-layer feedforward network is given by: n ! m X X Φ(a, b, x) := h bjG ai j g(x i ) . j=1
i=1
The function G is called the hidden-layer activation function. A common choice for G is the sigmoid function G(x) =
1 . 1 + e−x
13.9 Multivariate Extensions
789
Neural networks are efficient functional forms for approximating multidimensional functions. Often, they require fewer parameters for a given accuracy than polynomial approximations.21 Duffy and McNelis (2001) solve the stochastic growth model by approximating the rhs of the Euler equation for capital by a function of capital and log TFP using either neural networks or polynomials and find that the former may be preferred to the latter. Lim and McNelis (2008) use neural networks throughout their book to solve increasingly complex models of small open economies. The deep-learning algorithm of Maliar et al. (2021) for solving dynamic economic models involves neural networks to perform model reduction and to handle multicollinearity. Fernández-Villaverde et al. (2023) employ neural networks to solve a heterogenous agent model with an occasionally binding constraint on the nominal interest rate set by the monetary authority.
21
See Sargent (1993), pp. 58f. and the literature cited there.
Chapter 14
Differentiation and Integration
14.1 Introduction There are numerous instances where we need analytic or numerical derivatives of functions. We require the former to derive the first-order conditions of optimization problems, while the latter are important ingredients in many algorithms. Think of the perturbation methods considered in Chapters 2-4 that rest on Taylor’s theorem 13.3.2 or Newton’s method to locate the zeros of a system of equations, which is encountered in Chapter 15. The next section presents the standard methods that numerically approximate first-order and second-order partial derivatives from finite differences. Finite differences are, however, not suitable to produce numerically reliable third- and higher-order derivatives. For those, computer algebra or automatic differentiation techniques are the methods of choice. They are usually available in specific toolboxes and outlined in Section 2.6.4. There are two reasons to include numerical integration in this text. The first and foremost is that we often must approximate expectations, either as part of an algorithm or to check the numerical precision of a certain method. Consider the weighted residuals methods of Chapter 5 and the computation of Euler residuals, as, e.g., in Section 4.4. Expectations are a special form of integral. More general integrals can be either part of a model’s equilibrium conditions or must be solved to implement an algorithm. Consider the Galerkin and the least squares versions of the spectral methods in Chapter 5. Section 14.3 covers numerical integration more generally, and Section 14.4 the approximation of expectations.
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2024 B. Heer and A. Maußner, Dynamic General Equilibrium Modeling, Springer Texts in Business and Economics, https://doi.org/10.1007/978-3-031-51681-8_14
791
792
14 Differentiation and Integration
14.2 Differentiation This section provides some background on numerical differentiation and presents several algorithms that approximate the Jacobian matrix of a vector-valued function and the Hessian matrix of a real-valued function, respectively. The related program code can be used in place of built-in routines, e.g., Gradp and Hessp in GAUSS or DFDJAC and DFDHES from the IMSL library of Fortran subroutines.
14.2.1 First-Order Derivatives DIFFERENCE FORMULAS. The basis of numerical derivative formulas is Taylor’s theorem. From Theorem 13.3.1, for k = 1 and some given point x¯ and h > 0 we can obtain: f (¯ x + h) = f (¯ x ) + f (1) (¯ x )h + f (2) (ξ)
h2 , ξ ∈ [¯ x , x¯ + h]. 2
(14.1)
Thus, we may approximate the first derivative by the formula DF D f (¯ x , h) :=
f (¯ x + h) − f (¯ x) . h
(14.2)
This is known as the forward difference formula. The approximation error is proportional to h, since (14.1) implies: D f (¯ x , h) − f (1) (¯ x ) = f (2) (ξ)/2 h. FD
Thus, the error is of first order. The backward difference formula is derived from Taylor’s theorem for −h in place of h. Its error is also of first order. Now, we consider Taylor’s theorem for k = 2, h > 0, −h, ξ1 ∈ [¯ x , x¯ + h], and ξ2 ∈ [¯ x − h, x¯ ]: h2 h3 + f (3) (ξ1 ) , 2 6 2 h h3 f (¯ x − h) = f (¯ x ) − f (1) (¯ x )h + f (2) (¯ x ) − f (3) (ξ2 ) . 2 6 f (¯ x + h) = f (¯ x ) + f (1) (¯ x )h + f (2) (¯ x)
(14.3a) (14.3b)
If we subtract the second line from the first, the quadratic term disappears and from h3 f (¯ x + h) − f (¯ x − h) = 2 f (1) (¯ x )h + f (3) (ξ1 ) + f (3) (ξ2 ) 6
14.2 Differentiation
793
we find the approximation DC D f (¯ x , h) :=
f (¯ x + h) − f (¯ x − h) 2h
(14.4)
known as the central difference formula. Letting C denote the maximum of ( f (3) (ξ1 ) + f (3) (ξ2 ))/6 in [¯ x − h, x¯ + h], we see that the approximation error is proportional to Ch2 and, thus, of second order. When we add equation (14.3a) to equation (14.3b) the first derivative terms cancel, and we obtain the central difference formula for the second derivative: DC2 D f (¯ x , h) :=
f (¯ x + h) + f (¯ x − h) − 2 f (¯ x) , 2 h
(14.5)
for which the approximation error is bounded by Ch and, thus, of first order. CHOICE OF h. From the previous discussion, it might seem to be a good idea to choose h as small as possible. However, recall the finite precision of computer arithmetic. Let us suppose that your computer is able to represent, say, the first 10 digits to the right of the decimal point of any floating point number correctly. If h is too small, the first and second terms in the numerator of equation (14.2) may differ only in the eleventh digit, and therefore, the computed derivative is highly unreliable. Let us suppose that the error in computing f (¯ x ) and f (¯ x + h) is ¯e and eh , respectively. At the best, ¯e and eh are smaller than the machine epsilon ε, i.e., the smallest positive number for which the statement 1 + ε > 1 is true on your machine. However, if f (x) is the result of complicated and involved computations, the actual error may be much larger. We want to find an upper bound on the total error E(δ, h) that results when we use f˜(¯ x ) := f (¯ x ) + ¯e and f˜(¯ x , h) := f (¯ x + h) + eh to compute DF D f (¯ x , h), where ¯e, eh ≤ δ for some δ ≥ ε. f˜(¯ x ) − f˜(¯ x + h) 0 E(δ, h) := f (¯ x) − h |¯e − eh | ≤ f 0 (¯ x ) − DF D f (¯ x , h) + , h } | {z } | {z ≤Ch
≤ Ch +
2δ , h
C :=
max
≤2δ/h 2
ξ∈[¯ x ,¯ x +h]
f (ξ) . 2
794
14 Differentiation and Integration
Thus, there is a trade-off between the truncation error, which results from considering only the first two terms of the Taylor polynomial, and the round-off error stemming from the finite precision arithmetic of computers. The former decreases with h, whereas the latter increases. Setting the derivative of this upper bound with respect to h to zero and solving for h gives the step size that provides the smallest upper bound: v t 2δ h∗ = . (14.6) C If we perform the same exercise with respect to the central difference formulas (14.4) and (14.5), we find that the optimal choice of h is 13 f (3) (ξ1 ) + f (3) (ξ2 ) 2δ ∗∗ h = , C := max . (14.7) ξ1 ,ξ2 ∈[¯ x ,¯ x +h] C 6
The constant C depends on the properties of the function f (x) and the point x¯ and is, thus, specific to the problem at hand. On the assumption that 2/C ≈ 1, a good choice of h is to set the step size for the central difference formula equal to h = ε1/3 . We employ this value in our procedures that approximate the partial derivatives of systems of equations. RICHARDSON’S EXTRAPOLATION. The derivation of the central difference formula shows that we can combine two approximations with truncation error being proportional to h to a new approximation for which the error is proportional to h2 . Richardson’s extrapolation exploits this observation. Consider, more generally, a formula Dk (h) that approximates some un¯ to the order of k and assume that the truncation error known value D admits a polynomial representation: ¯ = c1 hk + c2 hk+1 + c3 hk+3 + . . . . Dk (h) − D
Then, the approximation defined by Dk+1 (h) := Dk (h/2) +
Dk (h/2) − Dk (h) 2k − 1
has the truncation error 3c3 c2 ¯ =− Dk+1 − D hk+1 − k k+2 + . . . k 2(2 − 1) 4(2k − 1)
(14.8)
and is, thus, of order k + 1 (see, e.g., Burden and Faires (2016), p. 180). This statement can be proved by induction over k. Richardson’s extrapolation gives rise to the following algorithm:
14.2 Differentiation
795
Algorithm 14.2.1 (Richardson’s h Extrapolation) ¯ by a given formula D(h) so Purpose: Approximate some unknown value D that the truncation error is of the order of k + 1. Steps: Step 1: Initialize: Choose a step size h, a maximal order of approximation k, and a (k + 1) × (k + 1) matrix A with elements ai j of zeros. Set ε equal to the machine epsilon. Step 2: For i = 1, 2, . . . , k + 1, compute ai1 = D(h/2i−1 ). Step 3: For j = 1, 2, . . . , k and i = 1, 2, . . . , k + 1 − j, compute ai, j+1 = ai+1, j +
ai+1, j − ai, j 2j − 1
.
¯ Step 4: Return the element a1,k+1 as the estimate of D Alternatively, one may estimate the relative truncation error after each step j from r el er r j :=
2|ak+1− j, j+1 − ak+2− j, j |
|ak+1− j, j+1 | + |ak+2− j, j | + ε
and stop the loop over j = 2, . . . , k whenever this error starts to increase between j − 1 and j. The truncation error of the central difference formula (14.4) has the polynomial representation ¯ = c1 h2 + c2 h4 + c3 h6 + . . . . D(h) − D To obtain improved estimates for the first derivative from this formula, we must replace 2 j − 1 in Step 3 of the algorithm with 4 j − 1. COMPUTATION OF THE JACOBIAN. It is easy to apply the above results in computing the Jacobian matrix (13.10) of a vector-valued function f : Rn → Rm . By using the central difference formula (14.4), we may approximate the element ∂ f i (¯ x)/∂ x j by f (¯ x + e j h) − f (¯ x − e j h) ∂ f i (¯ x) ' , ∂ xj 2h
(14.9)
796
14 Differentiation and Integration
where e j is the unit (row) vector with one in the j–th position and zeros elsewhere. If the arguments x j , j = 1, . . . , n differ considerably in size, we set h proportional to x j using h j = ¯h max{|x j |, 1}.
(14.10)
R The GAUSS procedure CDJac and the MATLAB function CDJac.m as well as the Fortran subroutine CDJac employ equation (14.9) together with this p choice of h j (and ¯h = 3 ε, ε the machine epsilon as default) to compute the Jacobian of a user-supplied routine that evaluates f i (¯ x ), i = 1, 2, . . . , m
R at x¯ . The MATLAB function CDJacRE.m combines (14.9) and Algorithm 14.2.1 to compute Jacobian matrices with a smaller truncation error.
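For completeness, a minimal MATLAB sketch of a central-difference Jacobian in the spirit of (14.9)-(14.10) — our own stand-in, not the CDJac/CDJacRE routines shipped with the book — is given below; it assumes f maps a column vector in Rⁿ to a vector in Rᵐ.

% Sketch: central-difference Jacobian of f at x.
function J = cd_jacobian(f, x)
    n  = numel(x);  fx = f(x);  m = numel(fx);
    J  = zeros(m, n);
    hbar = eps^(1/3);                          % step size suggested in the text
    for j = 1:n
        h      = hbar * max(abs(x(j)), 1);     % scale h by the size of x_j, (14.10)
        e      = zeros(n, 1);  e(j) = h;
        J(:,j) = (f(x + e) - f(x - e)) / (2 * h);
    end
end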
14.2.2 Second-Order Derivatives Now, suppose that we want to compute the elements of the Hessian matrix (13.5) of f : Rn → R. There are two possibilities. Note that the Hessian matrix equals the Jacobian matrix of the gradient of f . Thus, if an analytic expression for the gradient of f is easy to program, one can use this as an input to a procedure that approximates the Jacobian. This gives a better approximation than the use of difference formulas for second partial derivatives.1 If this is not an option, one can apply the central difference formula for the second derivative of a function in one variable to compute the diagonal elements of H.2 This gives: f (¯ x + ei hi ) + f (¯ x − ei hi ) − 2 f (¯ x) ∂ 2 f (¯ x) ' . 2 ∂ xi∂ xi hi
(14.11)
We can improve upon this estimate by using Richardson’s extrapolation as defined in Algorithm 14.2.1. There are several choices for the off-diagonal elements of the Hessian matrix. We now show that a four-point formula provides a second-order truncation error. Let f i , f i j , and f i jk abbreviate the first-, second-, and ¯, and let hi , h j ∈ R>0 for some third-order partial derivatives at the point x i, j ∈ {1, 2, . . . , n}. Using Theorem (13.3.2) for k = 3 yields: 1
As we demonstrated above, the error of the central difference formula for the first derivative is of second order, whereas the error from the central difference formula for the second derivative is of first order. 2 In the following we use hi proportional to max{|x i |, 1} as in (14.10).
14.2 Differentiation
797
1 X X f i 1 i 2 h i1 h i2 2 i1 ∈{i, j} i2 ∈{i, j} X f i1 i2 i3 hi1 hi2 hi3 + C1
f (¯ x + ei hi + e j h j ) = f (¯ x) + f i hi + f j h j + +
1 X 6
X
i1 ∈{i, j} i2 ∈{i, j} i3 ∈{i, j}
1 X X f i 1 i 2 h i1 h i2 2 i1 ∈{i, j} i2 ∈{i, j} X f i1 i2 i3 hi1 hi2 hi3 + C2 ,
f (¯ x − ei hi − e j h j ) = f (¯ x) − f i hi − f j h j + −
1 X 6
X
i1 ∈{i, j} i2 ∈{i, j} i3 ∈{i, j}
where C1 and C2 are truncation errors that involve fourth-order terms. Adding both equations gives f (¯ x + ei hi + e j h j ) + f (¯ x − ei hi − e j h j )
= 2 f (¯ x) + f ii h2i + f j j h2j + ( f i j + f ji )hi h j + C1 + C2 .
(14.12)
so that the third-order terms drop out. Using the same expansions for f (¯ x − ei hi + e j h j ) and f (¯ x + ei hi − e j h j ) yields f (¯ x − ei hi + e j h j ) + f (¯ x + ei hi − e j h j )
= 2 f (¯ x) + f ii h2i + f j j h2j − ( f i j + f ji )hi h j + C3 + C4 .
(14.13)
Subtracting (14.13) from (14.12) and employing Young’s theorem3 , i.e. f i j = f ji , finally produces 1 fi j ' f (¯ x + ei hi + e j h j ) + f (¯ x − ei hi − e j h j ) 4hi h j (14.14) − f (¯ x − ei hi + e j h j ) − f (¯ x − ei hi + e j h j ) ,
with truncation error C1 + C2 − C3 − C4 C := . 4hi h j
Since the terms Ci , i = 1, . . . , 4 are of order four, this error is of order two. We can therefore employ Algorithm 14.2.1 with 4 j − 1 to reduce the truncation error further. We implement the formulas (14.11) and (14.14) in our GAUSS proR cedure CDHesse, the Fortran subroutine CDHesse, and the MATLAB R function CDHesse.m The MATLAB function CDHesseRE combines formulas (14.11) and (14.14) with Algorithm 14.2.1 to reduce the truncation error. 3
See, e.g., Theorem 1.1 on p. 372 of Lang (1997).
798
14 Differentiation and Integration
14.3 Numerical Integration 14.3.1 Newton-Cotes Formulas Basically, there are two different approaches to computing an integral Rb f (x) d x numerically.4 The first idea is to approximate the function f (x) a by piecewise polynomials and integrate the polynomials over subintervals of [a, b]. For example, the trapezoid rule evaluates the function f (x) at the end points x = a and x = b and uses the linear Lagrange polynomial P1 (x) =
x−b x −a f (a) + f (b) a−b b−a
(14.15)
to approximate f (x). Integration of P1 over [a, b] results in the formula Z
b
f (x) d x ≈
a
b−a [ f (a) + f (b)] . 2
(14.16)
More generally, we can divide the interval [a, b] into n equidistant subintervals of length h = (b − a)/n, n ∈ N and approximate f (x) on each of them with the Lagrange polynomial of order n. This gives Z
Z
b
f (x) d x ≈
a
bX n a k=0
f (a + kh)L k (x) d x.
Defining x = a + th, the Lagrange polynomial L k (x) presented in (13.17) becomes L k (t) =
n Y t−i . k − i i=0 k6=i
Using the change of variable formula presented below in (14.21) to replace x with t yields the Newton-Cotes formula Z
b a
4
f (x) d x ≈ h
n X k=0
Z w k f (a + kh), w k :=
nY n 0 i=0 k6=i
t−i d t. k−i
(14.17)
In fact, there is a third approach that we do not pursue here. It considers the related problem to solve an ordinary differential equation. See, e.g., Walter (2014), Section 6.2.4.
14.3 Numerical Integration
799
For n = 2, we obtain Simpson’s rule5 Z b h f (x) d x ≈ [ f (a) + 4 f (a + h) + f (b)] . 3 a
(14.18)
14.3.2 Gaussian Formulas The weights w k of the Newton-Cotes formula (14.17) ensure that the local approximation of f (x) is correct and the nodes x k follow from the arbitrary choice of n. The second approach, which we pursue in all quadrature applications of this book, is to choose both the weights w k and the nodes Rb x k optimally to provide a good approximation of a f (x) d x. It is obvious that we increase the degrees of freedom at our disposal if we choose both the nodes x k and the weights w k simultaneously rather than just the weights w k in order to obtain a good approximation. Essentially, the resulting Gaussian quadrature formulas have twice the order than that of the Newton-Cotes formulas for the same number of functions evaluations.6 GAUSS-CHEBYSHEV INTEGRATION. For the Chebyshev polynomials considered in Section 13.8, the following theorem provides the integration weights and nodes: Theorem 14.3.1 Let zk0 , k = 1, . . . , K denote the zeros of TK (z) and ωk =
π . K
The approximation Z1 I := −1
w(z) f (z) d z '
K X i=1
ωk f (zk0 ) =: I K
(14.19)
is exact for all polynomials of degree 2K − 1 or less.
Now, since the degree of the polynomial Tl (z) := Ti (z)T j (z), i, j < K − 1 from equation (13.31) is l = i + j < 2K − 2, Theorem 14.3.1 implies that (13.32) equals the exact integral (13.31). 5 6
For the computation of w k , k = 0, 1, 2 see Stoer and Bulirsch (2002), p. 147. Notice, however, that higher order does not always translate into higher accuracy.
800
14 Differentiation and Integration
For functions that are 2K times continuously differentiable on [−1, 1], the approximation error is given by:7 I − In =
π 22K−1
f 2K (ξ) , ξ ∈ [−1, 1]. (2K)!
The Gauss-Chebyshev quadrature formula based on the extrema of Tn (z) is8 π 1 1 In = f (¯ z0 ) + f (¯ z1 ) + · · · + f (¯ zK−1 ) + 2 f (¯ zK ) . (14.20) K 2 CHANGE OF VARIABLE TECHNIQUE. Gaussian quadrature formulas, such as the Gauss-Chebyshev formulas from the previous subsection, rest on specific weight functions and specific domains. Therefore, we must adapt those rules to approximate the integral of a specific function f on a given domain. The respective tool is the change of variable technique. We present it here for the general case of a function f : X → R that maps some bounded subset X ⊂ Rn of Euclidean n space to the real line. Let h : X → Y ⊂ Rn , x 7→ y := h(x)
define a one-to-one map between the bounded sets X and Y so that yi = hi (x 1 , . . . , x n ). Furthermore, let ∂ h1 (x) J(h)(x) :=
∂ x1
.. .
∂ hn (x) ∂ x1
... .. . ...
∂ h1 (x) ∂ xn
.. .
∂ hn (x) ∂ xn
denote the Jacobian matrix of h and assume that this matrix is nonsingular on X . If f is integrable on X and h differentiable on X , then the change of variables formula asserts:9 Z Z I(y) := Y 7
f (y) d y =
X
f (h(x))|J(h)(x)| d x.
(14.21)
See, e.g., Chawla (1968), equation (2) or Judd (1998), equation (7.2.4) on p. 259. See, e.g., Mason and Handscomb (2003), equation (8.32) on p. 208). 9 See, e.g., Judd (1998), Theorem 7.5.3, p. 275 or Sydsæter et al. (1999), formula 9.73, p. 54. 8
14.3 Numerical Integration
801
Note that we replace y with h(x) and additionally adjust the integral by the determinant of the Jacobian matrix |J(h)(x)|.10 In the univariate case considered in the previous subsection, the relation between z ∈ [−1, 1] and x ∈ [a, b] is given by the bijection (13.28a), which has the Jacobian matrix b−a ψ0 (z) = . 2 Accordingly, the integral of f : [a, b] → R equals Z b Z1 f (ψ(z)) p b−a f (x) d x = 1 − z2 d z. p 2 2 1−z a −1 p Applying the formula (14.19) to the function g(z) := f (ψ(z)) 1 − z 2 gives Z b K Ç π(b − a) X f (x) d x ≈ f (ψ(zk0 )) 1 − (zk0 )2 . (14.22) 2n a k=1 In the same way, we can use the formula (14.20) to approximate the integral of f (x) on the extrema of TK (z). The most natural way to extend this formula to d dimensions it through product rules. They approximate multiple integrals by multiple sums. For example, consider the integral of f : X → R, X ⊂ Rd over the ddimensional rectangle [a1 , b1 ] × · · · × [ad , bd ]. Its approximation by a product rule based on the Gauss-Chebyshev formula (14.22) is: Z b1 Z bd a1
···
f (x 1 , . . . , x d ) d x 1 · · · d x d
ad
m m πd (b1 − a1 ) · · · (bd − ad ) X X · · · f (ψ(zk0 ), . . . , ψ(zk0 )) (14.23) 1 d (2m)d k1 kd r r × 1 − (zk0 )2 . . . 1 − (zk0 )2 .
≈
1
d
14.3.3 Monomial Integration Formula NOTATION. Stroud (1971) provides quadrature formulas for Riemann integrals of a function f : X → R, X ⊂ Rn over various regions and for
10
For an intuitive explanation, see Judge et al. (1988), p. 31.
802
14 Differentiation and Integration
various weighting functions w(x). They take the form Z b1 Z bn J X ···
a1
an
f (x)w(x) d x 1 · · · d x n ≈
j=1
B j f (v j1 , v j2 , . . . , v jn ),
(14.24)
n
x ∈ X ⊂ R , B j ∈ R,
where the vector v j := (v j1 , v j2 , . . . , v jn ) ∈ Rn denotes the jth integration node. The rule (14.24) has degree d if it is exact for any combination of monomials g(z) :=
α α z1 1 z2 2
· · · znαn ,
α1 , α2 , . . . , αn ≥ 0, 0 ≤
n X i=1
αi ≤ d
but not exact for at least one polynomial of degree d + 1 (Stroud (1971), p. 3). INTEGRATION FORMULA. For the integrals of f over the hypercube Cn := [−1, 1]n (the n-fold Cartesian product of the closed interval [−1, 1]) and the weight function w(x) = 1, Z1 Z1 I Cn ( f ) :=
−1
···
−1
f (x 1 , x 2 , . . . , x n ) d x 1 · · · d x n
we reproduce two formulas. The degree d = 3 formula uses J = 2n + 1 integration nodes (see Stroud (1971), p. 230): I Cn ( f ) ≈ B0 f (0, 0, . . . , 0) + B1 B0 =
3−n n 1 2 , B1 = 2n . 3 6
n X i=1
( f (ei ) + f (−ei )) , (14.25)
The second formula is of degree d = 5, employing J = 2n + 2n + 1 integration (see Stroud (1971), p. 233): I Cn ( f ) ≈ B0 f (0, 0, . . . , 0) + B1 r=
v t2 5
, B0 =
n X i=1
( f (rei ) + f (−rei )) + B2
8 − 5n n 5 n 1 2 , B1 = 2 , B2 = . 9 18 9
X v∈V
f (v), (14.26)
In this formula, the symbol ei denotes the ith unit vector, i.e., the ndimensional vector with all elements equal to zero except its ith element,
14.4 Approximation of Expectations
803
which is equal to one. The set V consists of the 2n elements constructed from n draws of the set {−1, 1}. For instance, if n = 3 this set consists of the eight points v1 := (1, 1, 1),
v5 := (−1, 1, 1),
v2 := (1, 1, −1), v6 := (−1, 1, −1), v3 := (1, −1, 1), v7 := (−1, −1, 1),
v4 := (1, −1, −1), v8 := (−1, −1, −1).
If the domain of the function f is not Cn but X := [a1 , b1 ]×· · ·×[an , bn ], we can employ the same change of variable formula used in the previous subsection to map points in [−1, 1] to [ai , bi ].
14.4 Approximation of Expectations This section explains the numerical approximation of expectations with Gauss-Hermite and monomial integration or, interchangeably, quadrature formulas. We focus on Gaussian random variables, as explained in the next subsection. Afterwards, we introduce Gauss-Hermite integration. To overcome the curse of dimensionality in cases with many random variables, we present several monomial integration formulas in Subsection 14.3.3.
14.4.1 Expectation of a Function of Gaussian Random Variables Let us suppose that the vector of random variables x ∈ Rn is normally distributed with mean E(x) = µ and covariance matrix var(x) := E xx T = Σ so that the probability density function is given by n
1
1
f (x) = π− 2 |Σ|− 2 e− 2 (x−µ)
T
Σ−1 (x−µ)
.
(14.27)
Let g(x) denote some function of this random vector. The problem that we want to solve is to numerically approximate the expectation Z∞ Z∞ E(g(x)) :=
−∞
···
−∞
g(x) f (x) d x 1 · · · d x n .
(14.28)
We transform this problem to a simpler one that replaces the exponential in (14.27) with the weight function
804
14 Differentiation and Integration 2
2
2
w(x) = e−z1 −z2 −···−zn
for which there exist quadrature formulas. Let Ω denote the n × n square matrix with the property Σ = ΩΩ T . Since Σ is the covariance matrix of the random variables x 1 , . . . , x N , it is symmetric and positive definite. Accordingly, we have two choices for the matrix Ω. The first derives from the Jordan factorization of Σ and the second from the Cholesky factorization . The former is given Ω := PΛ1/2 (see equation (12.32)) and the latter by Ω := L (see equation 12.37). Given the matrix Ω, we can define the vector z as: z=
Ω−1 (x − µ) p 2
so that the exponent in (14.27) is given by − 12
p
−1 p 2z T Ω T Σ−1 2Ωz = −z T Ω T ΩΩ T Ωz
= −z T Ω T (Ω T )−1 Ω−1 Ωz = −z T z.
We can now papply the p change of variable technique. The Jacobian matrix of x = µ + 2Ωz is 2Ω with determinant:11 1
1
1
1
|Σ|− 2 = |ΩΩ T |− 2 = (|Ω||Ω T |)− 2 = (|Ω|2 )− 2 = |Ω|−1 . By using this in the formula (14.21), we can write expression (14.28) in terms of the vector of uncorrelated random variables z as: Z∞ Z∞ p T − 2n E(g(h(z))) := π ··· g 2Ωz + µ e−z z d z1 · · · d zn . (14.29) −∞
−∞
14.4.2 Gauss-Hermite Integration UNIVARIATE INTEGRATION FORMULA. Gauss-Hermite integration derives its weights ωi and integration nodes zk , k = 1, 2, . . . , m from the zeros of the Hermite polynomials introduced in Section 13.7.4. In the single 11
See (12.10) for these derivations.
14.4 Approximation of Expectations
805
variable case, let z ∈ R. The integral of g(z) with weight function e−z equals Z I(z) :=
∞
g(z)e
−∞
−z 2
2
p (m!) π 2m dz = ωk g(zk ) + m g (ξ), ξ ∈ R, 2 (2m)! k=1 m X
(14.30)
where g k (z) denotes the kth derivative of g at the point z. Accordingly, if g is a polynomial of degree 2m − 1 (such that g 2m (z) = 0), the remainder term vanishes and the formula is said to be exact of degree 2m − 1. Usingpthis formula and the change of variable formula (14.21) with x = µ + 2σz, the expectation (14.28) can be approximated by E(g(x)) = p ≈p
Z
1 2πσ 1
∞
g(x)e
−∞ m X
−
ωk g(µ +
(x−µ)2 2σ2
p
dx
p 2σzk ) 2σ,
2πσ k=1 m X p ω ˜ k g(µ + 2σzk ), ω ˜ k := p k . = ω π k=1
(14.31)
INTEGRATION NODES AND WEIGHTS. There exist tables with nodes and weights (see, e.g., Judd (1998), p. 262, Table 7.4). Instead, one can follow Golub and Welsch (1969) and compute any number of nodes and weights. They show that the nodes associated with Gaussian quadrature rules based on orthogonal polynomials ϕk (x) are equal to the eigenvalues of the tridiagonal matrix
α1 β1 0 J := .. . 0
β1 α2 β2 .. .
0 β2 α3 .. .
0 0 β3 .. .
... ... ... .. .
0 0 0 .. .
0 0 0 , β
0 0 0 . . . αm−1 m−1 0 0 0 0 . . . βm−1 αm
1 ck+1 2 bk αk := − , βk := , ak ak ak+1
where ak , bk , and ck are the coefficients introduced in equation (13.22). It follows from equation (13.23) that ai = 2, bi = 0, and ci = 2(i − 1) so that the matrix J simplifies to
806
14 Differentiation and Integration
0 β1 0 J := .. . 0
β1 0 β2 .. .
0 β2 0 .. .
0 0 β3 .. .
... ... ... .. .
0 0 0 .. .
0 0 0 ... 0 0 0 0 0 . . . βn−1
0 0 0 , β
1 i 2 βi := . 2
(14.32)
n−1
0
The weights are equal to the squared elements of the first component of the orthonormal eigenvectors of J times the integral Z∞ p 2 e−z d z = π. −∞
Accordingly, instead of using the weights ωk in the quadrature rule (14.31), ˜ k . It is easy to implement the comwe need to compute only the weights ω putation of Gauss-Hermite nodes and weights in matrix oriented languages R as our MATLAB function GH_NW.m demonstrates. MULTIVARIATE EXTENSION. Since the elements of the vector z are uncorrelated, we can evaluate the n-fold integral element wise and approximate each of the partial integrals by a weighted sum with nGH elements: Z
∞ −∞
g(z1 , . . . , zi , . . . zn )e
−zi2
d zi ≈
nGH X
ωik g(z1 , . . . , zik , . . . , zn ).
k=1
The entire integral (14.29) is then approximately equal to the n-fold sum n
E t (g(h(z))) ≈ π− 2
nGH X i1 =1
···
nGH X
ωi1 ωi2 · · · ωin
in =1
×g
p
(14.33)
2Ω(z1,i1 , . . . , zn,in ) T + µ .
R Our MATLAB function GH_quad.m implements equation (14.33). For a given matrix root Ω p and the desired number of integration nodes nGH , it constructs the nodes 2Ωzk and the respective products of weights.
14.4.3 Monomial Rules for Expectations For the weight function in (14.27),
14.4 Approximation of Expectations 2
2
2
w(z) := e−z1 ez2 · · · e−zn = e−z and the related integral Z∞ Z∞ I(z) :=
−∞
···
−∞
T
807
z
T
g(z)e−z z d z1 · · · d zn
Stroud (1971), p. 315, presents several monomial formulas (see also Section 14.3.3). We reproduce three of them. Degree 3, 2n Nodes. A formula of degree d = 3 with only 2n nodes is (Stroud (1971), p. 315): I(z) ≈
n n Æ π2 X Æ g( n/2ei ) + g(− n/2ei ) , 2n i=1
(14.34)
with ei as the ith unit vector. Accordingly, the expectation (14.29) is approximately equal to p 1 X p g nΩe j + µ + g − nΩe j + µ . 2n i=1 n
E(g(h(z))) ≈
(14.35)
Degree 3, 2n Nodes. The formula of degree d = 3 with 2n nodes is (Stroud (1971), p. 316): n π2 X Æ I(z) ≈ n f v 1/2 , 2 v∈V
(14.36)
where V is the set already encountered in equation (14.26) so that the expectation (14.29) is approximately equal to E(g(h(z))) ≈
1 X g (Ωv + µ) . 2n v∈V
(14.37)
Degree 5, 2n2+1 Nodes. Let (s1 , s2 ) represent p the result of two draws from the two-element set {s, −s} with s = (n + 2)/2, and denote by S the set of points obtained from replacing all possible combinations of two elements of the zero vector (0, 0, . . . , 0) with (s1 , s2 ). This set has 2n(n − 1) elements. For example, if n = 3, this set consists of the twelve points
808
14 Differentiation and Integration
s1 := (s, s, 0),
s5 := (s, 0, s),
s9 := (0, s, s),
s2 := (s, −s, 0), s6 := (s, 0, −s), s10 := (0, s, −s), s3 := (−s, s, 0), s7 := (−s, 0, s), s11 := (0, −s, s),
s4 := (−s, −s, 0), s8 := (−s, 0, −s), s12 := (0, −s, −s).
The formula of degree 5 with 2n + 1 nodes is (Stroud (1971), p. 317): n
2π 2 I(x) ≈ g(0, . . . , 0) n+2 n n Æ (4 − n)π 2 X Æ + g( (n + 2)/2e ) + g(− (n + 2)/2e ) , j j 2(n + 2)2 j=1 n X π2 + g(s). (n + 2)2 s∈S
(14.38)
With this rule, the expectation (14.29) is approximately equal to 2 g(µ) n+2 n p 4−n X + g(Ω n + 2e j + µ) 2(n + 2)2 j=1
E(g(h(z))) ≈
p 4−n X + g(−Ω n + 2e j + µ) 2(n + 2)2 j=1 X p 1 + g( 2Ωs + µ). (n + 2)2 s∈S n
(14.39)
DEGREE 5, 2N2+2N NODES This rule combines the 2n nodes from (14.34) with the set V from (14.36): I(x) ≈
n n Æ 4π 2 X Æ g (n + 2)/4e + g − (n + 2)/4e j j (n + 2)2 j=1 n (n − 2)2 π 2 X Æ + n g v (n + 2)/(2(n − 2)) . 2 (n + 2)2 v∈V
Accordingly, the rule approximates the expectation (14.29) with
(14.40)
14.4 Approximation of Expectations
E(g(h(z))) ≈
809
n X p p 4 g n + 2Ωe j / 2 + µ 2 (n + 2) j=1 n X p p 4 g − n + 21Ωe / 2 + µ j (n + 2)2 j=1 n−2 X Æ + n g Ωv (n + 2)/(n − 2) + µ . (14.41) 2 (n + 2)2 v∈V
+
R Our MATLAB function Mon_quad implements the integration formulas (14.34), (14.36), (14.38), and (14.40). Its arguments are the matrix root Ω and an indicator for the desired rule, which takes the values 1, 2, 3, 4. The function returns the integration nodes, say, z = x − µ, and the weights of the respective formula. Added to the vector µ, these nodes can be passed to a function that computes g(µ + z).
Chapter 15
Nonlinear Equations and Optimization
15.1 Introduction This chapter provides a brief introduction to numerical methods that locate the roots of systems of nonlinear equations or that find the extrema of real-valued functions defined on a subset of Euclidean n-space. In this book, the problem of solving nonlinear equations is a key part of the weighted residuals methods in Chapter 5 and the extended path algorithm 6.2.1 in Chapter 6. At several other instances, we just need to find the zero of a univariate, nonlinear function in a given interval. An example is to solve equation (7.17) for working hours L for two given values of the capital stock. Numerical optimization appears in Algorithm 6.3.1 in Chapter 6 if we employ nonlinear functions as basis functions and in Chapter 7 as part of the value function iteration (VI) method. There is an intimate relation between both kinds of problems. The rootfinding problem for the vector-valued function f := [ f 1 , . . . , f n ] : X →⊂ Rn is to find the vector x ∈ X ⊂ Rn that solves the system 0n×1 = f(x) ⇔ 0 = f i (x 1 , x 2 , . . . , x n ), i = 1, 2, . . . , n.
(15.1)
max f (x).
(15.2)
The optimization problem is to find the vector x ∈ X ⊂ Rn that maximizes (or minimizes) the function f : X → R within a set D ⊂ X : x∈D
Obviously, the solution to problem (15.1) is also the minimum of the squared sum
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2024 B. Heer and A. Maußner, Dynamic General Equilibrium Modeling, Springer Texts in Business and Economics, https://doi.org/10.1007/978-3-031-51681-8_15
811
812
15 Nonlinear Equations and Optimization
S(x) =
n X i=1
f i (x)
2
.
With one exception, Algorithm 15.4.6, we consider algorithms that are iterative in the sense that they proceed from a given initial point x0 ∈ X ∈ Rn in successive steps, and it is not guaranteed that they approach the final solution. The first issue, which we consider in the next section, is to develop stopping criteria for our iterative algorithms. Section 15.3 considers standard methods to solve nonlinear equations and Section 15.4 deals with numerical optimization. Still a good reference for the material covered in this chapter is Dennis and Schnabel (1983). More recent texts are Kochenderfer and Wheeler (2019) and Nocedal and Wright (2006). Nonlinear equation solvers and optimization routines are available in all programming languages, and many of them are available free of charge. Whenever possible, we employ the respective routines in our programs. For some of the algorithms presented in this chapter we provide implementations in GAUSS or Fortran.
15.2 Stopping Criteria for Iterative Algorithms Stopping criteria can be based on two questions:1 1) Have we solved the problem? 2) Have we ground to a halt? Let us consider the problem of finding the root of the system of nonlinear equations f(x) = 0. To answer the first question we must decide when f(x) is close to zero. To answer the second question we must decide when two successive points xs+1 and xs are close together so that we can reasonably assume the sequence is near its limit point. To tackle both problems, we need measures of distance, or more generally, vector norms as defined in Section (12.4). Given a vector norm kxk, ∗ we are speaking of a vector sequence {xs }∞ s=1 converging to a point x if s ∗ lims→∞ kx − x k = 0. A key property of a convergent series is the rate at which the series converges to its limit. We say that xs converges at rate q to x∗ if there exists a constant c ∈ [0, 1) and an integer ¯s such that 1
kxs+1 − x∗ k ≤ ckxs − x∗ kq for all s ≥ ¯s.
See Dennis and Schnabel (1983) p. 159.
(15.3)
15.2 Stopping Criteria for Iterative Algorithms
813
If q in (15.3) equals 1 (2), we say that the vector series converges linearly (quadratically). If there is a sequence {cs }∞ s=1 that converges to zero and satisfies kxs+1 − x∗ k ≤ cs kxs − x∗ k,
then we say the sequence (xs )∞ s=1 converges superlinearly. With these definitions at hand, we may accept xc as a solution of f(x) = 0 if it satisfies kf(xc )k < ε
(15.4)
for some choice of ε > 0. Care must be taken with respect to the scaling of f. For example, if f j ∈ [10−5 , 10−4 ] ∀ j and ε = 10−3 , any x causes us to stop. If the function values f j (x 1 , . . . , x n ), j = 1, . . . , n, differ greatly in magnitude, applying (15.4) may be overly restrictive. Therefore, before applying (15.4), the function argument x i , i = 1, . . . , n should be scaled so that the function values f j (x 1 , . . . , x n ) have approximately the same magnitude at points not near the root. An answer to the second question can be based on the rule |x is − x is+1 | 1 + |x is |
≤ ε ∀ i = 1, 2, . . . , n, ε ∈ R++ .
(15.5)
It assesses whether the change in the i–th coordinate of x is small relative to the magnitude of x is . To circumvent the possibility of x i ' 0, 1 + |x is | instead of |x is | is used in the denominator. However, if ∀i x i is much smaller than unity, this criterion indicates convergence too early. Therefore, if the typical value of x i , say, x˜i , is known, Dennis and Schnabel (1983) p. 160 recommend |x is − x is+1 |
max{|x is |, |˜ x i |}
≤ ε ∀i = 1, 2, . . . , n, ε ∈ R++ .
(15.6)
In some cases, for instance, in iterations over the value function, it is known that kxs+1 − x∗ k ≤ ckxs − x∗ k, 0 ≤ c < 1 for all s ≥ 1. Thus, the properties of norms given in (12.3) imply kxs − x∗ k ≤
kxs − xs+1 k . 1−c
814
15 Nonlinear Equations and Optimization
Using kxs − xs+1 k ≤ ε(1 − c), ε ∈ R++
(15.7)
as the stopping rule ensures that the error kxs − x∗ k in accepting xs as the solution is always bounded from above by ε. In Section 15.3.2, we present a globally convergent extension of the Newton-Raphson algorithm 15.3.2. This algorithm is based on finding the minimizer x∗ ∈ Rn of a real-valued function f (x). Therefore, in addition to the stopping criteria discussed thus far, we need criteria that tell us, when we are close to the minimum of f (x). A necessary condition for any minimum is f i (x∗ ) = 0,
i = 1, 2, . . . , n,
where f i (x∗ ) denotes the partial derivative of f with respect to its i-th argument evaluated at x∗ . Let ∇ f := [ f1 , f2 , . . . , f n ] T
denote the column vector of partial derivatives (the gradient of f ). Then, a natural choice to stop is, if at the k-th step k∇ f (xk )k ≤ ε
for a small positive number ε. However, this criterion is sensitive with respect to the scale of f . To see this, let us suppose f (x) are the costs of producing the quantities x i , i = 1, 2, . . . , n in US $. Now, if instead we measure costs in thousands of dollars, so that f˜ = S f f with S f = 10−3 , the algorithm already stops if S f k∇ f (xs )k ≤ ε. To circumvent this problem, we can use k∇ f (xk )k ≤ ε, max{| f˜|, | f (xk )|}
(15.8)
which is independent of the scale of f . We use max{| f˜|, | f |} instead of 1 + | f | in the denominator to allow for the typical value of f , f˜, to be much smaller than 1. However, (15.8) is not independent of the scale of x i , i = 1, 2, . . . , n. For instance, let f (x) := (S x x)2 , where S x is a scaling factor for x and we assume | f˜| ≡ 1 < (S x x k )2 . In this case, the lhs of (15.8) yields 2 . |S x x k |
15.3 Nonlinear Equations
815
Here, again, the algorithm stops the sooner the larger the scale – tons instead of kilos – for x. A measure that is invariant to both the scale of f and x i is the partial elasticity of f with respect to x i : f i (xk )x i . f (xk ) To account for either x ik ≈ 0 or f (xk ) ≈ 0, Dennis and Schnabel (1983) p. 160 recommend the following stopping criterion: f (xk ) max{|x k |, x˜ } i i i (15.9) < ε, ∀i = 1, 2, . . . , n. k ˜ max{| f (x )|, f }
15.3 Nonlinear Equations In the next subsection, we describe two well-known methods that locate the zero of a function of a single variable. Subsection 15.3.2 considers systems of nonlinear equations.
15.3.1 Single Equations BISECTION. Let us suppose we want to solve the equation f (x) = 0 for x ∈ [a, b]. If f is continuous and monotone in [a, b], and if f (a) and f (b) are of opposite sign, the intermediate value theorem tells us that there is a unique x ∗ ∈ [a, b] for which f (x ∗ ) = 0. The bisection method constructs a series of shrinking intervals I j , j = 1, 2, . . . that bracket the solution to any desired degree. Figure 15.1 illustrates this approach. The first interval is given by I1 = [a, p1 ] with p1 = a + (b − a)/2. Since f changes its sign in I1 , that is, f (a) f (p1 ) < 0, we know that x ∗ ∈ I1 . In the next step, we consider the smaller interval I2 = [a, p2 ] with p2 = a+(p1 −a)/2. At the boundaries of I2 , f has the same sign, f (a) f (p2 ) > 0. Thus, the zero must be to the right of p2 . For this reason, we now adjust the lower bound and choose I3 = [p2 , p2 +(p1 − p2 )/2]. Continuing in this way brings us closer and closer to x ∗ . The bisection method, which we summarize in Algorithm 15.3.1, is a derivative-free method and thus can be applied to problems where f is not differentiable. Our GAUSS procedure Bisec implements this algorithm.
816
15 Nonlinear Equations and Optimization y f (b)
f (x)
f (p1 ) 0
a
p2
x∗
p1
b
x
I2
f (a)
I1
Figure 15.1 Bisection Method
Algorithm 15.3.1 (Bisection) Purpose: Approximate the solution x ∗ of f (x) = 0 in I = [a, b]. Steps: Step 1: Initialize: Choose a tolerance ε1 , a parameter convergence criterion ε2 , a maximum number of iterations n, and set i = 1. Step 2: Compute f a = f (a). Step 3: Compute p = a + (b − a)/2. Step 4: Compute f p = f (p). Step 5: Check for convergence: If | f p| < ε1 or (b − a)/2 < ε2 or i = n stop. Step 6: Adjust the boundaries: If f a × f p > 0, replace a with p and f a with f p; else, replace b with p. Increase i by one and return to Step 3.
NEWTON-RAPHSON METHOD. Newton’s method, also known as the Newton-Raphson method, uses the linear approximation of f to locate the zero x ∗ . Since it converges quadratically, it is much faster than the bisection method. In Figure 15.2 the domain of the function f (x) is the set of nonnegative real numbers R+ . Let us consider the point x 0 . We approximate f linearly around x 0 . This gives g 0 (x) := f (x 0 ) + f 0 (x 0 )(x − x 0 ), where f 0 (x 0 ) is the slope of f at x 0 . The root x 10 of g 0 (x) is given by
15.3 Nonlinear Equations
817
y g 0 (x) f (x)
1
g (x)
x 10
x2
x ∗ x1
x
x0
Figure 15.2 Modified Newton-Raphson Method
0 = f (x 0 ) + f 0 (x 0 )(x 10 − x 0 )
⇒
x 10 = x 0 −
f (x 0 ) . f 0 (x 0 )
However, x 10 < 0 where f (x) is not defined. Hence, we choose a point x 1 > 0 between x 10 and x 0 . Approximating f at x 1 and solving for the root of g 1 (x) = f (x 1 ) + f 0 (x 1 )(x − x 1 ) gets us close to x ∗ . The method where one iterates over x s+1 = x s −
f (x s ) f 0 (x s )
(15.10)
until f (x s+1 ) ≈ 0 is called the Newton-Raphson method. The modified Newton-Raphson method takes care of regions where f is not defined and 0 backtracks from x s+1 along the direction f 0 (x s ) to a point x s+1 at which f can be evaluated. There are problems where it is impossible or very costly (in terms of computational time) to compute the derivative of f . For instance, in Section 8.5 we compute the stationary distribution of an exchange economy with credit constraints. In this problem, there is no analytical expression for the function that relates the economy’s interest rate to average asset holdings. In these situations, we use the slope of the secant that connects two points (x s , f (x s )) and (x s+1 , f (x s+1 )) in place of f 0 (x s ) in (15.10) (see Figure 15.3). This provides the secant method:
818
15 Nonlinear Equations and Optimization y
f (x)
ys
ys+1
x s+2
x ∗ x s+1
xs
x
Figure 15.3 Secant Method
x s+2 = x s −
x s+1 − x s f (x s ). f (x s+1 ) − f (x s )
(15.11)
Both the iterative scheme (15.10) and (15.11) converge to the solution x under suitable conditions.2 Furthermore, they are easily generalized to the multivariable framework. ∗
15.3.2 Multiple Equations Assume we want to solve a system of n equations in the unknowns x = [x 1 , x 2 , . . . , x n ]: 0 = f 1 (x 1 , x 2 , . . . , x n ), 0 = f 2 (x 1 , x 2 , . . . , x n ), ⇐⇒ 0 = f(x). (15.12) .. .. .=. 0 = f n (x 1 , x 2 , . . . , x n ), As in the single equation case, there are simple, derivative-free methods, as well as extensions of the Newton-Raphson and secant method. 2
See, e.g., DENNIS and SCHNABEL (1983), Theorem 2.4.3 and Theorem 2.6.3.
15.3 Nonlinear Equations
819
GAUSS-SEIDEL METHOD. This method solves a sequence of univariate equations 0 = f i (x 1 , . . . , x i−1 , x i , x i+1 , . . . , x n ) by considering only x i as unknown and the other n − 1 variables as given constants. It starts with a point xs = [x 1s , x 2s , . . . , x ns ] and obtains a new point xs+1 by solving 0 = f 1 (x 1s+1 , x 2s , . . . , x ns ), 0 = f 2 (x 1s+1 , x 2s+1 , . . . , x ns ), .. .. . = ., 0 = f n (x 1s+1 , x 2s+1 , . . . , x ns+1 ). This process is continued until two successive solutions xs+1 and xs are close together, as defined by either condition (15.5) or (15.6). Thus, the problem of solving n equations simultaneously is reduced to solving n single equations in one variable x is+1 . Depending on the nature of the functions f i these solutions may be either obtained analytically – if it is possible to write x is+1 = hi (xs−i ), where xs−i is the vector xs without its ith element – or by any of the methods considered in the previous subsection. Figure 15.4 illustrates that the sequence of points constructed in this way may well diverge from the true solution (x 1∗ , x 2∗ ). A set of sufficient conditions for the convergence of the Gauss-Seidel iterations is derived by Carnahan et al. (1969) p. 308. Assume there is an open ball N (x∗ , ε) of radius ε > 0 surrounding x∗ defined by its elements x ∈ N (x∗ , ε) satisfying |x i − x i∗ | < ε for all i = 1, 2, . . . , n and a positive number K < 1 so that the partial derivatives of f i satisfy X ∂ f i ∂fi ∂ x (x) + 1 − ∂ x (x) < K, ∀i = 1, 2, . . . , n j i j6=i for all x ∈ N (x∗ ). Then, the Gauss-Seidel iterations converge to x0 for each xs ∈ N (x∗ ).3 MODIFIED NEWTON-RAPHSON. The Newton-Raphson method considered in Subsection 15.3.1 is based on a linear approximation of f (x). In the multivariable case the linear approximation of f(x) at a point xs is 3
This condition derives from condition (5.37) in Carnahan et al. (1969) p. 308, if we define the functions F i (x) as F i (x) = x i − f i (x).
820
15 Nonlinear Equations and Optimization x2
f 2 (x 1 , x 2 ) = 0 (x 11 , x 21 )
(x 12 , x 21 )
x 2∗
x 20
(x 11 , x 20 )
(x 12 , x 22 )
f 1 (x 1 , x 2 ) = 0 x 1∗
x1
Figure 15.4 Gauss-Seidel Iterations
g(x) := f(xs ) + J(xs )w, w := (x − xs ),
with the Jacobian matrix J defined in (13.10). The zero of g(xs+1 ) is xs+1 = xs − J(xs )−1 f(xs ).
(15.13)
To establish the convergence of the sequence of iterations (15.13) we need a few more definitions. Let k · k denote a given vector or matrix norm, depending on the respective context.4 We define an open ball with center xs and radius r, N (xs , r), as the collection of all x ∈ Rn whose distances to xs are less than r: N (xs , r) := {x ∈ Rn : kx − xs k < r}.
The Jacobian matrix J(x) is a map of points in Rn to points in Rn×n . This ˜ ∈ N (xs , r) map is called Lipschitz on N (xs , r) with constant γ if for all x, x the following condition holds ˜k. kJ(x) − J(˜ x)k ≤ γkx − x
This is a stronger condition than the continuity of J. A sufficient condition for J to be Lipschitz on N (xs , r) with constant γ is that J is differentiable
4
See Section 12.4 on the definition of vector and matrix norms.
15.3 Nonlinear Equations
821
on the closure of N (xs , r). In the one-dimensional case f (x) = 0, this requires the function f to be twice continuously differentiable. The following theorem, taken from Dennis and Schnabel (1983) p. 90, states that the sequence of points x0 , x1 , . . . converges quadratically to x∗ , if x0 is sufficiently close to x∗ . Theorem 15.3.1 Let f : Rn → Rn be continuously differentiable in an open convex set D ⊂ Rn . Assume that there exists x∗ ∈ Rn and r, β > 0, such that N (x∗ , r) ⊂ D, f(x∗ ) = 0, J(x∗ )−1 exists with kJ(x∗ )−1 k ≤ β, and J is Lipschitz with constant γ on N (x∗ , r). Then, there exists ε > 0 such that for all x0 ∈ N (x∗ , ε), the sequence x1 , x2 , . . . generated by xs+1 = xs − J(xs )−1 f (xs ),
s = 0, 1, . . . ,
is well defined, converges to x∗ , and obeys kxs+1 − x∗ k ≤ βγkxs − x∗ k2 ,
s = 0, 1, . . . .
If the initial guess x0 is not as close to the final solution as required by this theorem, the algorithm may hit points for which f is not defined (as the point x 10 in Figure 15.2). To circumvent this case, we specify upper and ¯] such that f is well defined for all x ∈ [x, x ¯]. lower bounds [x, x Putting all pieces together provides Algorithm 15.3.2. We implemented the single equation version of this algorithm in the GAUSS procedure Fixp1. For the multiequation version, we provide two implementations. The procedure FixvMN1 requires no prior knowledge about the domain of f. However, the procedure that evaluates f(x) must return a GAUSS missing value code if it fails to compute f at x + w. The procedure then backtracks from x + w toward x. The procedure FixvMN2 is designed for situations where the user knows the boundaries (as in the application of the deterministic extended path approach of Section 6.2). Both routines compute xs+1 in equation (15.13) from the solution of the linear system J(xs )ws+1 = −f(xs ),
ws+1 = xs+1 − xs ,
via the LU factorization of the Jacobian matrix. Algorithm 15.3.2 (Modified Newton-Raphson) Purpose: Approximate the solution x∗ of (15.12). Steps: ¯]. Step 1: Initialize: choose x0 ∈ [x, x
822
15 Nonlinear Equations and Optimization
Step 2: Compute J(x0 ) the Jacobian matrix of f at x0 and solve J(x0 )w = ¯] choose λ ∈ (0, 1) such that x2 = −f(x0 ). If x1 = x0 + w ∈ / [x, x 0 1 ¯] and set x = x2 . x + λw ∈ [x, x Step 3: Check for convergence: if kf(x1 )k∞ < ε and/or |x i1 −x i0 |/(1+|x i0 |) ≤ ε ∀i for a given tolerance ε ∈ R++ stop, else set x0 = x1 and return to step 2.
BROYDEN’S SECANT UPDATE. In systems with many variables, the computation of the Jacobian matrix slows down the algorithm considerably and may even prohibit the use of the modified Newton-Raphson method. For instance, the problem presented in Section 10.4.2 involves 1200 variables, and each function evaluation requires several minutes. As a consequence, the computation of the Jacobian in Step 2 of Algorithm 15.3.2 with the help of numerical differentiation is a matter of days rather than hours or minutes. In general, for a system of n equations in n unknowns, Newton’s method requires at least 2n2 + n scalar functional evaluations in each step.5 Broyden’s method overcomes this problem. Instead of computing the Jacobian matrix at each step of the iterations, the method updates the most recent estimate of this matrix by using only n function evaluations. The method is an extension of the secant-method given in (15.11) to the multivariate case. Let As denote the estimate of the Jacobian matrix at step s of the iterations. ws+1 = xs+1 − xs is the step from the point xs to xs+1 , and ys+1 := f(xs+1 ) − f(xs ). The extension of the secant formula f 0 (x s ) ≈ ( f (x s+1 ) − f (x s ))/(x s+1 − x s ) to the n-variable case implies As+1 ws+1 = ys+1 .
(15.14)
However, this system of n equations does not uniquely determine the n2 unknown elements of As+1 . In fact, there is an n(n − 1)-dimensional space of matrices satisfying condition (15.14). As shown by Dennis and Schnabel (1983) pp. 170f., the additional condition T As+1 z = As z for all z satisfying xs+1 − xs z = 0 (15.15) minimizes the difference between two successive linear approximations of f(x) at xs and xs+1 subject to condition (15.14). The two conditions 5
2n2 evaluations to obtain the approximate Jacobian matrix with the help of the central difference formula (14.9) and n evaluations to compute f i (x), i = 1, 2, . . . , n.
15.3 Nonlinear Equations
823
(15.14) and (15.15) uniquely determine As+1 via the update formula s+1 y − As ws+1 (ws+1 ) T s+1 s A =A + . (15.16) (ws+1 ) T (ws+1 ) In some applications, even the initial computation of the Jacobian matrix J(x0 ) may be too time-consuming. In these cases, the identity matrix is often used as an initial guess for A0 .6 There are two ways to accelerate the iterations further. 1) Let us suppose we use the QR factorization of As to solve the linear system As ws+1 = −f(xs ). In this case, it is possible to update the matrix product QR instead of As . Since the QR factorization of an n×n matrix requires (4/3)n3 floating point operations (flops), whereas its QR update requires at a maximum 26n2 flops,7 this can save considerable time in systems with many unknowns n. 2) There is also an update formula for the inverse of As , denoted by (As )−1 , that derives from the Sherman-Morrison formula:8 (As+1 )−1 = (As )−1 s+1 w − (As )−1 ys+1 (ws+1 ) T (As )−1 + . (ws+1 ) T (As )−1 ys+1
(15.17)
Dennis and Schnabel (1983) Theorem 8.2.2 shows that Broyden’s algorithm converges superlinearly (see Section 15.2 on the different rates of convergence). However, the lower rate of convergence vis-à-vis the Newton-Raphson algorithm is usually outperformed by the faster computation of the secant update of the approximate Jacobian matrix. The secant method with an update of the QR factorization is implemented in the Fortran program hybrd1. This program is part of a freely available collection of routines named MINPACK that can be used to solve unconstrained optimization problems and to find the roots of a system of nonlinear equations. If the initial guess x0 is bad, it may happen that the Newton-Raphson iterations with or without Broyden’s secant approximation of the Jacobian matrix fail to converge to the solution x∗ . Next, we discuss two approaches that facilitate convergence: the line search and the trust region approach. One has to be careful, though, with the initialization of the Jacobian A0 as the sequence As does not need to converge to the true matrix J(x∗ ). In the computation of the demographic transition problem in Section 10.5, we allow for the respecification of the Jacobian if the algorithm fails to converge, and we use an alternative initialization. 7 See Golub and Van Loan (1996) p. 225 and p. 608. 8 See, for example, Dennis and Schnabel (1983) p. 188. 6
824
15 Nonlinear Equations and Optimization
LINE SEARCH. This strategy forces the algorithm to converge to the solution from any starting point in the domain of the vector valued function f. It is based on two observations: 1) The solution to f(x) = 0 is also a minimizer of T
g(x) := (1/2)f(x) f(x) = (1/2)
n X i=1
( f i (x 1 , . . . , x n ))2 .
2) The Newton-Raphson step at xs , w = −J(xs )−1 f(xs ), is a descent direction for g. To see the latter, note that the linear approximation (see equation (13.7)) of g at xs is given by gˆ (xs+1 ) ≈ g(xs ) + [∇g(xs )] T (xs+1 − xs ), | {z } w
where ∇g denotes the gradient of g, i.e., the column vector of the first partial derivatives of g, which equals: ∇g(xs ) = J(xs ) T f(xs ).
(15.18)
Therefore, g(xs+1 ) − g(xs ) ≈ [∇g(xs )] T w = f(xs ) T J(xs )(−J(xs )−1 f(xs )) = −f(xs ) T f(xs ) ≤ 0.
Thus, the idea is to move in the Newton-Raphson direction, then check whether going all the way actually reduces g. If not, we move back toward xs until we obtain a sufficient reduction in g. The details of this procedure are from Dennis and Schnabel (1983), who show that this algorithm converges to a minimum of g except in rare cases (see their Theorem 6.3.3 on p. 121). Let h(λ) := g(xs + λw) denote the restriction of g to the line through xs in the direction w. We look for a step of size λ ∈ (0, 1] that reduces g(xs ) at least by λα∇g(xs ) T w for a small α ∈ (0, 1/2), i.e.,
15.3 Nonlinear Equations
825
g(xs + λw) ≤ g(xs ) + λα[∇g(xs )] T w.
(15.19)
Dennis and Schnabel (1983) recommend α = 10−4 . First, we try the full Newton-Raphson step, and hence, set λ1 = 1. If λ1 fails to satisfy (15.19), we approximate h by a parabola, ˆh(λ) := aλ2 + bλ + c. Then, we choose λ2 as the minimizer of this function. This produces: λ2 = −
b . 2a
We obtain a and b from the following conditions: 0 ˆh(0) = c = g(xs ), a = h(1) − h(0) − h (0), ˆh(1) = a + b + c = g(xs + w), ⇒ b = h0 (0), c = h(0). ˆh0 (0) = b = ∇g(xs ) T w. Therefore: λ2 = −
−h0 (0) b = . 2a 2(h(1) − h(0) − h0 (0))
(15.20)
Note that λ2 < (1/2) if g(xs +w) > g(xs ) and λ2 = 1/[2(1−α)]. Since steps that are too small or too large can prevent the algorithm from converging to the minimum of g, we require λ2 ∈ [0.1, 0.5].9 If the quadratic approximation was not good, λ2 may still violate (15.19). In this case, we approximate h by a cubic function: ˆh(λ) := aλ3 + bλ2 + cλ + d. The parameters of this approximation must solve the following system of equations
9
ˆh(λ1 ) = aλ3 + bλ2 + cλ1 + d = h(λ1 ) = 1 1
g(xs + λ1 w), (15.21a)
ˆh(λ2 ) = aλ3 + bλ2 + cλ2 + d = h(λ2 ) = 2 2
g(xs + λ2 w), (15.21b)
ˆh0 (0) =
c=
h0 (0) =
ˆh(0) =
d=
h(0) =
See Dennis and Schnabel (1983) for examples.
∇g(xs ) T w,
(15.21c)
g(xs ), (15.21d)
826
15 Nonlinear Equations and Optimization
and the minimizer of ˆh(λ) is the solution to p −b + b2 − 3ac λ3 = . 3a
(15.22)
If α < (1/4), this solution is always real.10 Here, again we avoid too large or too small steps by restricting λ3 to λ3 ∈ [0.1λ2 , 0.5λ2 ].
If λ3 still violates (15.19), we approximate h at points xs , xs + λ2 w, and xs + λ3 w, solve (15.21) and (15.22) for λ4 and continue this procedure until λk satisfies (15.19). To prevent the line search from becoming trapped in an endless loop, we check at each step whether λk is larger than some minimal value λmin . We choose λmin so that λ < λmin implies convergence according to the parameter convergence criterion ε. For example, let us consider the convergence criterion (15.6), we define ∆i :=
|x is − x is+1 |
max{|x is |, |˜ x i |}
,
and ∆ = arg max{∆1 , ∆2 , . . . , ∆n }. Then, λmin = ε/∆. If the line search is used in a pure minimization routine, where (15.9) is used to stop the algorithm, λ < λmin should never occur. If it nevertheless does, this usually indicates that the ε used in (15.6) is too large relative to the ε used in (15.9). If λ < λmin occurs in a nonlinear equation solver, the calling program should verify whether the minimum of g as defined above is also a zero of f . Algorithm 15.3.3 summarizes the line search. Our GAUSS nonlinear equations solver employs this algorithm in the procedure MNRStep. Algorithm 15.3.3 (Line Search) Purpose: Find a step size that achieves a sufficient decrease in the value of a function to be minimized. Steps: Step 1: Initialize: Choose α = 10−4 , compute λmin , put λk = 1, and k = 1. Step 2: If λk satisfies (15.19) stop and return λk ; else, increase k by 1 and proceed to the next step. 10
See Dennis and Schnabel (1983) p. 129.
15.3 Nonlinear Equations
827
Step 3: If k = 2 solve (15.20) for λ2 , yet restrict the solution to the interval λ2 ∈ [0.1, 0.5]. If k > 2 solve (15.21) and (15.22) using the two most recent values of λ, say λk−1 and λk−2 , and restrict the solution to the interval λk ∈ [0.1λk−1 , 0.5λk−1 ]. In any case put λ = λk . If λ > λmin return to step 2, else stop and let the calling program know that no further decrease of g can be achieved within the given parameter tolerance ε. TRUST REGION. This approach specifies an open ball N (xs , δ) of radius δ at xs (the trust region) in which it assumes that the linear approximation of f is sufficiently good. Then, it computes a direction w that minimizes T gˆ (x) := 0.5 f(xs ) + J(xs )w [f(xs ) + J(xs )w subject to kwk2 ≤ δ. x2
xSD1 xSD2
x DL xN R2
xSD3
x δ
s
xN R1
N (xs , δ) x1
Figure 15.5 Dogleg Step
Figure 15.5 illustrates the approximate solution to this problem.11 If the Newton-Raphson step wN R from xs to xN R1 remains in N it is selected. If
11
See Dennis and Schnabel (1983) pp. 130ff.
828
15 Nonlinear Equations and Optimization
the Newton-Raphson step forces the algorithm to leave N (the point xN R2 in Figure 15.5), the steepest decent direction is considered. This direction is given by the gradient of g (see (15.18)). The algorithm first tries to minimize gˆ along this direction. If the point xS D1 = xs − µ∇g(xs ),
µ=
k∇g(xs )k22
[∇g(xs )] T [J(xs )] T J(xs )∇g(xs )
is outside N , the algorithm moves to the point xS D2 = xs −
δ ∇g(xs ) k∇g(xs )k2
on the boundary of N . Otherwise, the point x D L = xs + w D L ,
w D L = wS D + λ[wN R − wSD ],
wN R = −J(xs )−1 f(xs ), wS D = −µ∇g(xs ),
is selected. This point is on the intersection of the convex combination between the steepest decent point xSD3 and the Newton-Raphson point xN R2 with the boundary of N . The step from xs to this point is called the dogleg step, and the parameter λ is the positive root of the quadratic equation T 2 [wSD ] T wSD − δ2 0 = λs + λ wN R − wSD wSD + , α α T N R α := wN R − wS D w − wSD .
The initial radius of the trust region is usually set to a multiple, 1, 10, or 100, say, of kx0 k2 . This radius is shortened if it is not possible to reduce g sufficiently. Dennis and Schnabel (1983) pp. 143ff. recommend the line search algorithm to compute λ ∈ (0.1, 0.5) so that δs+1 = λδs . The implementation of the trust region approach in the Fortran program hybrd1 just sets δs+1 = 0.5δs , if the actual reduction of the function value ∆a := g(xs ) − g(xs+1 ) is less than 10% of the predicted reduction ∆ p := g(xs )− gˆ (xs+1 ). If ∆a ∈ (0.1, 0.5)∆ p , the trust radius is not changed. It is set to δs+1 = max{δs , 2kxs+1 − xs k} if ∆a ∈ (0.5, 0.9)∆ p . If the actual reduction amounts to ∆a ∈ (0.9, 1.1)∆ p the program doubles the radius. R The trust region approach is also part of the MATLAB function fsolve.
15.4 Numerical Optimization
829
15.4 Numerical Optimization There are some algorithms where we must find the extrema of a given function. Think of the nonlinear least squares problem as part of Algorithm 6.3.1 or think of the maximization step as part of Algorithm 7.3.1. In other algorithms, we are free to choose whether to solve the system of first-order conditions that characterizes the optimal solution or to employ numerical optimization tools. Sometimes one line of attack may work while the other performs poorly. Here, we describe three well-known tools from numerical optimization. The golden section search is a simple means of locating the maximizer of a single valued function in a given interval [a, b]. The GaussNewton method is tailored to nonlinear least squares problems, while the BFGS quasi-Newton method is suitable for a wide class of unconstrained minimization problems. Finally, we consider stochastic algorithms.12
15.4.1 Golden Section Search This method locates the maximum of a single peaked function f (x) in the interval I = [A, D]. The idea is to shrink the interval around the true maximizer x ∗ in successive steps until the midpoint of the remaining interval is a good approximation to x ∗ (see Figure 15.6). Assume we have two function evaluations at points B and C, respectively. It is obvious from Figure 15.6 that for f (C) > f (B) the maximum lies in the shorter interval [B, D]. However, if we observe f (B) > f (C), the maximum lies in the (also shorter) interval [A, C]. The question is, how should we choose B and C? There are two reasonable principles that guide our choice. First, we note that we do not know in advance whether we end up with [B, D] or [A, C]. Our aim is to reduce the interval as much as possible. The unfavorable case is to end up with the larger of the two intervals. We exclude this possibility by choosing B and C so that both intervals are of the same size:13
12
AC = BD ⇒ AB = C D.
(15.23)
Note that any of the algorithms introduced in the following subsections can be used to either maximize or minimize a given function f (x) of n variables, since the maximum of f (x) is equal to the minimum of − f (x). 13 The second equality follows from the fact that AD=AC+C D and that AD=AB+BD.
830
15 Nonlinear Equations and Optimization y
f (C) f (B)
f (x) A
B
x∗ C
A1
B1
D C1
x
D1
Figure 15.6 Golden Section Search
The second principle is to shrink the interval so that the next smaller interval, which is either [A1 , D1 ] = [B, D] or [A1 , D1 ] = [A, C], is a scaleddown version of [A, D]. Accordingly, let p :=
AC AD
,
(15.24a)
and 1−p =
AD − AC AD
=
CD AD
(15.24b)
denote the relative sizes of the subintervals [A, C] and [C, D]. Together with condition (15.23) definitions (15.24) imply the condition 1−p CD AB = = = p. p AC AC The solution of the quadratic equation (15.25) yields: p 5−1 ≈ 0.618. p= 2
(15.25)
(15.26)
15.4 Numerical Optimization
831
This number divides the interval [A, D] into the so-called golden sections. Accordingly, given A and D we choose B = A+ (1 − p)AD and C = A+ pAD. In all following steps, we keep this division. For instance, as in the case of Figure 15.6, if the next interval is [A1 , D1 ] = [B, D], we choose B1 = A1 + (1 − p)A1 D1 and C1 = A1 + pA1 D1 . To decide on the next interval [A2 , D2 ] we need one additional function evaluation, either at C1 if f (B) < f (C) or at B1 if f (B) > f (C). In summary, the following iterative scheme brackets the solution x ∗ : Algorithm 15.4.1 (Golden Section Search) Purpose: Find the maximizer of a single peaked function f (x) in the interval [x, x]. Steps: Step 1: Initialize: Set A = x, D = x and compute B = pA + (1 − p)D,
C = (1 − p)A + pD, p p = (+ 5 − 1)/2,
and store f (B) in f B and f (C) in f C. Step 2: If f B > f C replace D by C, C by B, and f C by f B. Find the new B from B = pC + (1 − p)A and store f (B) in f B. Otherwise: replace A by B, B by C, and f B by f C. Find the new C from C = pB + (1 − p)D and store f (C) in f C. Step 3: Check for convergence: if |D − A| < ε max{1, |B| + |C|} stop and return B; else, repeat the previous step. Our procedure GSS, available in GAUSS and Fortran, implements this algorithm. Its inputs are the pointer to the procedure that returns f (x) and the boundaries of the interval in which the maximum lies.
15.4.2 Gauss-Newton Method Algorithms that solve nonlinear least squares problems are adapted from procedures that solve the more general problem of finding the minimizer of a real-valued function. The solution that we propose is known as the damped Gauss-Newton method.14 To introduce this algorithm, we return 14
See Dennis and Schnabel (1983) Chapter 10.
832
15 Nonlinear Equations and Optimization
to the more common notion of seeking to minimize 1X ( yi − f (γ, xi ))2 , xi = (x i1 , x i2 , . . . , x in ). m i=1 m
S(γ) :=
(15.27)
with respect to the parameter vector γ = (γ1 , γ2 , . . . , γ p ) T for a given set of i = 1, . . . , m observations of a dependent variable yi and a vector xi of n independent variables. The minimizer γ∗ must solve the set of first-order conditions ∂f ∗ ∂S −2 X = ( yi − f (γ∗ , xi )) (γ , xi ) = 0, ∂ γj m i=1 ∂ γj m
(15.28)
j = 1, 2, . . . , p. Instead of solving this system of p nonlinear equations in γ, the simple Gauss-Newton method operates on a linearized minimization problem. Let us suppose we have an initial guess γs := [γs1 , . . . , γsp ] T . We consider the linear approximation of f at this vector:15 . p X ∂ f (γ s , xi ) f (γ, xi ) ≈ f (γ , xi ) + (γ j − γsj ). ∂ γ j j=1 s
The respective linear least squares problem is: 2 p m X X s ∂ f (γ , xi ) 1 yi − f (γs , xi ) − min (γ j − γsj ) . γ m ∂ γ j i=1 j=1 Its solution is provided by the well-known formula ¯ = (X¯ T X¯ )−1 X¯ T y ¯, γ ¯ = γ − γs , and X¯ is the ¯ = [ y1 − f (γs , x1 ), . . . , ym − f (γs , xm )] T , γ with y m × p matrix with typical element x¯ t j :=
∂ f (γs , x t ) . ∂ γj
The simple Gauss-Newton method chooses 15
See equation (13.7)
15.4 Numerical Optimization
833
¯ γs+1 = γs + (X¯ T X¯ )−1 X¯ T y | {z } =:¯ γ
¯ the sum of squares S is decreasing. To see as the next value of γ. Along γ this, note that ∇S(γ) :=
−2 T ¯ X¯ y m
is the (column) vector of partial derivatives of S evaluated at γ. Therefore, ¯= [∇S(γs )] T γ
−2 T ¯ X¯ (X¯ T X¯ )−1 X¯ T y ¯ < 0. y m |{z} | {z } |{z} =:z T
=:A
=:z
This follows from the fact that the matrix X¯ T X¯ and thus its inverse A are positive definite.16 If γs+1 is not the minimizer of S, f is linearized at the new value of γs+1 and the related linear least squares problem is solved again to deliver γs+2 . These steps are repeated until convergence. If the initial value of γ is not near the (local) minimizer, this method may fail to converge, much like the Newton-Raphson method considered in Algorithm 15.3.2. The damped Gauss-Newton method uses the line search from Algorithm 15.3.3 to force the iterations downhill toward a local minimum. Indeed, since the sum of squares (15.27) is bounded from below and since the gradient of a polynomial is continuously differentiable and thus Lipschitz, these iterations satisfy the conditions of Theorem 6.3.3 from Dennis and Schnabel (1983). As a consequence, using the damped Gauss-Newton method takes us to a local minimum of S(γ). We use the stopping rule (15.9) (see Section 15.2) to terminate the algorithm.17 In summary, the damped Gauss-Newton algorithm proceeds as follows: 16 17
See Section 12.7 on definite quadratic forms. Indeed, since Theorem 6.3.3. from Dennis and Schnabel (1983) establishes convergence
of ∇S(γs ) T (γs+1 − γs ) kγs+1 − γs k2
but not of γs , it makes sense to try criterion (15.9) first. Our line search procedure warns us if it is not possible to decrease S further, even if (15.9) is not met.
834
15 Nonlinear Equations and Optimization
Algorithm 15.4.2 (Damped Gauss-Newton) Purpose: Find the minimizer of the nonlinear least squares problem (15.27) Steps: Step 1: Initialize: Choose a vector γ1 and stopping criteria ε1 ∈ R++ and ε2 ∈ R++ , ε1 >> ε2 . Set s = 0. Step 2: Linearize f (γ, xi ) at γs and set ¯ = [ y1 − f (γs , x1 ), . . . , y T − f (γs , x t )] T , y ∂ f (γs , x t ) X¯ = (¯ x t j ) := . ∂ γj Step 3: Compute γs+1 : Solve the linear system ¯ = X¯ T ¯y X¯ T X¯ γ ¯ . Use Algorithm 15.3.3 with ε2 to find the step length λ and set for γ γs+1 = γs + λ¯ γ. Step 4: Check for convergence: Use criterion (15.9) with ε1 to see whether the algorithm is close to the minimizer. If so, stop. If not, and if the line search was successful, increase s by one and return to Step 2. Otherwise stop and report convergence to a nonoptimal point. We provide an implementation of this algorithm in Fortran in the subroutine GaussNewton. The procedure allows the user to either supply his or her own routine for the computation of the gradient of f or to use built-in forward difference methods (or our routines described in Section 14.2) to approximate the gradient. We note that the matrix X¯ is the Jacobian matrix of the vector-valued function f (γ, x1 ) f (γ, x2 ) . γ 7→ .. . f (γ, x T )
Thus, if you write a subroutine that returns for i = 1, . . . , T the number zi = f (γ, xi ) and pass this routine to another routine that approximates the Jacobian matrix of a vector-valued function, e.g., the gradp routine in GAUSS, the output of this routine is X¯ .
15.4 Numerical Optimization
835
15.4.3 Quasi-Newton In this section, we introduce the so-called BFGS method to locate the minimizer of a function of several variables. This method derives from Newton’s method, which we describe next. NEWTON’S METHOD. Let us suppose you want to minimize y = f (x) on an open subset U of Rn . Newton’s method solves this problem by considering the quadratic approximation (see equation (13.6)) fˆ(x0 + h) = f (x0 ) + [∇ f (x0 )] T h + 12 h T H(x0 )h. In this formula, ∇ f (x0 ) is the column vector of first partial derivatives of f with respect to x i , i = 1, 2, . . . , n, and H is the Hessian matrix of second partial derivatives. Minimizing fˆ with respect to the vector h requires the following first-order conditions to hold:18 ∇ f (x0 ) + H(x0 )h = 0. Solving for x1 = x0 + h provides the following iterative formula: x1 = x0 − H(x0 )−1 ∇ f (x0 ).
(15.29)
It is well known that iterations based on this formula converge quadratically to the minimizer x∗ of f (x) if the initial point x0 is sufficiently close to the solution x∗ .19 The second-order conditions for a local minimum require the Hessian matrix to be positive semidefinite in a neighborhood of x∗ .20 Furthermore, using ∇ f (x0 ) = −H(x0 )h in the quadratic approximation formula gives fˆ(x1 ) − f (x0 ) = −(1/2)h T H(x0 )h. Thus, if the Hessian is positive definite (see Section 12.6 on definite matrices), the Newton direction is always a decent direction. The computation of the Hessian matrix is time consuming. Furthermore, there is nothing that ensures this matrix to be positive definite far away 18
See Section 12.7 on the differentiation of linear and quadratic forms. This follows from Theorem 15.3.1, since the iterative scheme (15.29) derives from the Newton-Raphson method applied to the system of first-order conditions ∇ f (x) = 0. 20 See, e.g., Sundaram (1996) Theorem 4.3.
19
836
15 Nonlinear Equations and Optimization
from the solution. Therefore, the quasi-Newton methods tackle these problems by providing secant approximations to the Hesse matrix. In addition, they implement line search methods that direct the algorithm downhill and thus help to ensure almost global convergence. The secant method that has proven to be most successful was discovered independently by Broyden, Fletcher, Goldfarb, and Shanno in 1970. It is known as the BFGS update formula. BFGS SECANT UPDATE. The BFGS quasi-Newton method replaces the Hessian in (15.29) by a positive definite matrix H k that is updated at each iteration step k. The identity matrix I n can be used to initialize the sequence of matrices. Consider the following definitions: xk+1 − xk = −H k−1 ∇ f (xk ),
(15.30a)
wk := xk+1 − xk ,
(15.30b)
zk := ∇ f (xk+1 ) − ∇ f (xk ),
H k+1 := H k +
zk zkT
zkT wk
−
H k wk wkT H kT wkT H k wk
(15.30c) ,
(15.30d)
where the last line defines the BFGS update formula for the secant approximation of the Hessian matrix. The following theorem provides the foundation of the BFGS method:21 Theorem 15.4.1 Let f : Rn → R be twice continuously differentiable in an open convex set D ⊂ Rn and let H(x) be Lipschitz. Assume there exists x∗ ∈ D such that ∇ f (x∗ ) = 0 and H(x∗ ) is nonsingular and positive definite. Then there exist positive constants ε, δ such that if kx0 − x∗ k2 ≤ ε and kH0 − H(x∗ )k ≤ δ, then the positive definite secant update (15.30) is well defined, {xk }∞ remains in D and converges superlinearly to x∗ . k=1 Instead of updating the approximate Hessian H k , one can also start with a positive definite approximation of the inverse of H k , say Ak := H k−1 . The next iteration of xk is then given by xk+1 = xk − Ak ∇ f (xk ).
This involves only vector addition and matrix multiplication, whereas (15.29) requires the solution of a system of linear equations. The BFGS update formula for Ak is given by (see Press et al. (1992), p. 420): 21
See Theorems 9.1.2 and 9.3.1 of Dennis and Schnabel (1983).
15.4 Numerical Optimization
Ak+1 = Ak +
wk wkT
−
(Ak zk )(Azk ) T
wkT zk zkT Ak zk wk Ak zk uk := T − T . wk zk zk Ak zk
837
+ (zk Ak zk )uk ukT , (15.31)
Yet another approach is to use the fact that a positive definite matrix H k has a Cholesky factorization L k L kT = H k , where L k is a lower triangular matrix (see Section 12.9). By using this factorization, it is easy to solve the linear system (L k L kT )(xk+1 − xk ) = −∇ f (xk ) by forward and backward substitution (see Section 12.9). Thus, instead of updating H k , one may want to update L k . Goldfarb (1976) provides the details of this approach, which underlies the GAUSS routine QNewton. The BFGS iterations may be combined with line search (see Algorithm 15.3.3) to enhance global convergence. Indeed, if the Hessian matrix of f (x) (not its approximation!) is positive definite for all x ∈ Rn a Theorem due to Powell (see Dennis and Schnabel (1983) Theorem 9.51 on p. 211) establishes global convergence.22 Putting all pieces together provides Algorithm 15.4.3: Algorithm 15.4.3 (BFGS Quasi-Newton) Purpose: Minimize f (x) in U ⊂ Rn . Steps:
Step 1: Initialize: Choose x0 , stopping criteria ε1 ∈ R++ and ε2 ∈ R++ , ε1 >> ε2 , and either A0 = I n or H0 = I n . Set k = 0. Step 2: Compute the gradient ∇ f (xk ) and solve for wk either from H k wk = −∇ f (xk ) or from wk = −Ak ∇ f (xk ). Step 3: Use Algorithm 15.3.3 with ε2 to find the step length s, and put xk+1 = xk + swk . 22
This does not imply that a computer-coded algorithm does indeed converge. Finite precision arithmetic accounts for differences between the theoretical gradient, the theoretical value of f and the approximate Hessian.
838
15 Nonlinear Equations and Optimization
Step 4: Check for convergence: Use criterion (15.9) with ε1 to see whether the algorithm is close to the minimizer. If so, stop. If not, and if the line search was successful, proceed to Step 5. Otherwise stop and report convergence to a nonoptimal point. Step 5: Use either (15.30d) or (15.31) to obtain Ak+1 or H k+1 , respectively. Increase k by one and return to Step 2. Versions of the algorithm are available in the GAUSS command QNewton R R and in the MATLAB function fminunc, which is part of the MATLAB optimization toolbox.
15.4.4 Genetic Search Algorithms The Gauss-Newton as well as the BFGS quasi-Newton method start from a given initial guess and move downhill along the surface of the objective function until they approach a minimizer. Thus, they may not be able to find the global minimizer. Genetic search algorithms, instead, search the set of possible solutions globally. Accordingly, they are an alternative to gradient based methods in Step 4 of the weighted residuals Algorithm 5.2.1 and Step 2.3 of the stochastic simulation Algorithm 6.3.1. TERMINOLOGY. Genetic search algorithms use operators inspired by natural genetic variation and natural selection to evolve a set of candidate solutions to a given problem. The terminology used to describe genetic algorithms (GAs) is from biology. The set of candidate solutions is called a population, its members are referred to as chromosomes, and each iteration step results in a new generation of candidate solutions. In binary-coded GAs, chromosomes are represented by bit strings of a given length l. Each bit is either on (1) or off (0). In real-coded GAs, a chromosome is a point in an m-dimensional (m ≤ n) subspace of Rn . A chromosome’s fitness is its ability to solve the problem at hand. In most problems, the fitness is determined by a real-valued objective function that assigns higher numbers to better solutions. BASIC STRUCTURE. The evolution of a population of chromosomes consists of four stages: 1) the selection of parents,
15.4 Numerical Optimization
839
2) the creation of offspring (crossover), 3) the mutation of offspring, 4) and the final selection of those members of the family that survive to the next generation. The encoding of the problem (binary or floating point) and the operators used to perform selection, crossover, and mutation constitute a specific GA. The many different choices that can be made along these dimensions give rise to a variety of specific algorithms that are simple to describe and program. However, at the same time, this variety is a major obstacle to any general theory that is able to explain why and how these algorithms work.23 Intuitively, and very generally, one can think of GAs as contractive mappings operating on metric spaces whose elements are populations.24 A mapping f is contractive if the distance between f (x) and f ( y) is less than the distance between x and y. Under a contractive mapping an arbitrary initial population converges to a population where each chromosome achieves the same (maximum) fitness value that is the global solution to the problem at hand. The problem with this statement is that it gives no hint as to how fast this convergence takes place, and no guidance for whether specific operators accelerate or slow down convergence. Therefore, many insights into the usefulness of specific GAs come from simulation studies. In the following, we restrict ourselves to real-coded GAs. Our focus on this class of GAs is motivated by their prevalence in applied research of DSGE models. The methods presented in Chapter 5 and in Section 6.3 rest on the approximation of unknown functions by linear combinations of members of a family of polynomials. The problem is to find the parameters that constitute this approximation. Without prior knowledge about their domain, it is difficult to decide the length l of the binary strings, which determines the precision of the solution. Furthermore, using floating point numbers avoids the time-consuming translation to and from the binary alphabet. An additional advantage of real-coded GAs is their capacity for the local fine-tuning of the solutions.25 23
Mitchell (1996) as well as Michalewicz (1996) review the theoretical foundations of genetic algorithms. 24 See Michalewicz (1996) pp. 68ff. 25 Advantages and disadvantages of real-coded GAs vis-à-vis binary-coded GAs are discussed by Herrera et al. (1998). In their experiments most real-coded GAs are better than binarycoded GAs in minimizing a given function.
840
15 Nonlinear Equations and Optimization
CHOICE OF INITIAL POPULATION. The initial population of a real-coded GA is chosen at random. If there are no a priori restrictions on the candidate solutions, one can use a random number generator to perform this task. In our applications, we draw from the standard normal distribution. When we pass a randomly chosen chromosome to the routine that evaluates the candidate's fitness, it may happen that the chromosome violates the model's restrictions. For instance, in Algorithm 6.3.1, a time path may become infeasible. In this case, the program returns a negative number, and our initialization routine discards the respective chromosome. Alternatively, one may want to assign a very small fitness number to those chromosomes. After all, bad genes can mutate or generate reasonably good solutions in the crossover process.

SELECTION OF PARENTS. There are many different ways to choose parents from the old generation to produce offspring for the new generation. The most obvious and simplest approach is sampling with replacement, where two integers from the set 1, 2, . . . , n that index the n chromosomes of the population are drawn at random. More in the spirit of natural selection, where fitter individuals usually have a better chance to reproduce, is the concept of fitness-proportionate selection. Here, each chromosome i = 1, 2, . . . , n has a chance to reproduce according to its relative fitness p(i) = f(i)/∑_{j=1}^{n} f(j), where f(i) denotes the fitness of chromosome i. The following algorithm implements this selection principle:

Algorithm 15.4.4 (Fitness-Proportionate Selection)

Purpose: Choose a chromosome from the old generation for reproduction

Steps:

Step 1: For i = 1, 2, . . . , n, compute p(i) = f(i)/∑_{j=1}^{n} f(j).

Step 2: Use a random number generator that delivers random numbers uniformly distributed in [0, 1] and draw y ∈ [0, 1].

Step 3: For i = 1, 2, . . . , n, compute q(i) = ∑_{j=1}^{i} p(j). If q(i) ≥ y, select i and stop.

In small populations, the actual number of times an individual is selected as parent can be far from its expected value p(i). The concept of stochastic universal sampling avoids this possibility and gives each chromosome a chance to be selected as parent that is between the floor and the ceiling
of n·p(i).26 Rather than choosing one parent after the other, stochastic universal sampling selects n parents at a time. Each member of the old generation is assigned a slice on a roulette wheel, the size of the slice being proportionate to the chromosome's fitness f(i). There are n equally spaced pointers, and the wheel is spun. For instance, in Figure 15.7 the chromosome i = 1 with relative fitness p(1) is not selected, whereas chromosome 4 is selected twice.
Figure 15.7 Stochastic Universal Sampling
The next algorithm implements stochastic universal sampling (see Mitchell (1996), p. 167):

Algorithm 15.4.5 (Stochastic Universal Sampling)

Purpose: Choose n parents from the old generation for reproduction.

Steps:

Step 1: For i = 1, 2, . . . , n compute the relative fitness r(i) = f(i)/(∑_{j=1}^{n} f(j)/n) so that ∑_{i=1}^{n} r(i) = n.

26 The floor of x is the largest integer i1 with the property i1 ≤ x and the ceiling is the smallest integer i2 with the property x ≤ i2.
Step 2: Use a random number generator that delivers random numbers uniformly distributed in [0, 1] and draw y ∈ [0, 1].

Step 3: Put i = 1.

Step 4: Compute q(i) = ∑_{j=1}^{i} r(j).

Step 5: If q(i) > y, select i and increase y by 1.

Step 6: Repeat Step 5 until q(i) ≤ y.

Step 7: Terminate if i = n; otherwise, increase i by 1 and return to Step 4.

The major problem with both fitness-proportionate and stochastic universal sampling is 'premature convergence'. Early in the search process, the fitness variance in the population is high, and under both selection schemes, the small number of very fit chromosomes reproduces quickly. After a few generations, they and their descendants build a fairly homogeneous population that limits further exploration of the search space. A selection scheme that addresses this problem is sigma scaling. Let σ denote the standard deviation of fitness and f̄ = ∑_{i=1}^{n} f(i)/n the average fitness. Under sigma scaling, chromosome i is assigned a probability of reproduction according to

p(i) := 1 + (f(i) − f̄)/(2σ) if σ ≠ 0,
p(i) := 1 if σ = 0.

An addition to many selection methods is 'elitism': The best chromosome in the old generation replaces the worst chromosome in the new generation irrespective of whether it was selected for reproduction.27

CROSSOVER. In nature, the chromosomes of most species are arrayed in pairs. During sexual reproduction, these pairs split, and the child's chromosomes are the combination of the chromosomes of its two parents. Crossover operators mimic this process. Following Herrera et al. (1998), pp. 288ff., we describe some of these operators for real-coded GAs, where P1 = (p_{11}, p_{12}, . . . , p_{1m}) and P2 = (p_{21}, p_{22}, . . . , p_{2m}) denote the chromosomes of two parents, and C1 = (c_{11}, c_{12}, . . . , c_{1m}) and C2 = (c_{21}, c_{22}, . . . , c_{2m}) are their children.

1) Simple crossover: A position i = 1, 2, . . . , m − 1 is randomly chosen. The two children are:
C1 = (p_{11}, p_{12}, . . . , p_{1i}, p_{2,i+1}, . . . , p_{2m}),
C2 = (p_{21}, p_{22}, . . . , p_{2i}, p_{1,i+1}, . . . , p_{1m}).

27 See Mitchell (1996) for further selection schemes.
2) Shuffle crossover: For each position i = 1, 2, . . . , m draw a random number λ ∈ [0, 1]. If λ < 0.5, set c_{1i} = p_{1i} and c_{2i} = p_{2i}; otherwise set c_{1i} = p_{2i} and c_{2i} = p_{1i}.

3) Linear crossover: Three offspring are built according to

c_{1i} = (1/2)p_{1i} + (1/2)p_{2i},
c_{2i} = (3/2)p_{1i} − (1/2)p_{2i},
c_{3i} = −(1/2)p_{1i} + (3/2)p_{2i},

and the two most promising are retained for the next generation.

4) Arithmetical crossover: A scalar λ ∈ [0, 1] is randomly chosen (or given as a constant), and the chromosomes of child 1 and child 2 are built from

c_{1i} = λp_{1i} + (1 − λ)p_{2i},
c_{2i} = (1 − λ)p_{1i} + λp_{2i}.
5) BLX-α crossover: One child is generated, where for each i = 1, 2, . . . , m, the number c_i is randomly (uniformly) chosen from the interval [p_min − α∆, p_max + α∆], where p_max := max{p_{1i}, p_{2i}}, p_min := min{p_{1i}, p_{2i}}, and ∆ := p_max − p_min. Herrera et al. (1998) report good results for α = 0.5.

MUTATION. In nature, mutations, i.e., sudden changes in the genetic code, result either from copying mistakes during sexual reproduction or are triggered in living organisms by external forces, e.g., by radiation. In binary-coded GAs, the mutation operator randomly selects a position in a bit string and changes the respective bit from 0 to 1 or vice versa. Mutation operators designed for real-coded GAs also randomly select an element
of a chromosome and either add or subtract another randomly selected number. Nonuniform operators decrease this number from generation to generation toward zero and, thus, allow for the local fine-tuning of the candidate solutions. The experiments of Herrera et al. (1998) show that nonuniform mutation is very appropriate for real-coded GAs. In our algorithm, we use the following operator suggested by Michalewicz (1996), p. 128. Let c_i denote the i-th element in a child chromosome selected for mutation and c_i' the mutated element. The operator selects

c_i' = c_i + ∆(t) if a random binary digit is 0,
c_i' = c_i − ∆(t) if a random binary digit is 1,
∆(t) := y(1 − r^((1−t/T)^b)),     (15.32)
where y is the range of ci and r ∈ [0, 1] is a random number. t is the current generation and T the maximal number of iterations. If the range of the parameters is not known in advance y can be drawn from a standard normal distribution. The parameter b defines the degree of nonuniformity. Tests of real-coded GAs to solve various nonlinear programming problems with constraints employ b = 2 (Michalewicz (1996)) and b = 5 (Herrera et al. (1998)). FINAL SELECTION. Whether a final selection among children and parents is undertaken depends upon the choice of selection method of parents. If parents are chosen at random with replacement from generation P (t) one needs a final fitness tournament between parents and children to exert selection pressure. In this case, the initial heterogeneity in the population decreases quickly and reasonably good solutions emerge within a few generations. However, this tight selection pressure may hinder the algorithm from sampling the solution space more broadly so that only local optima are found. Therefore, there is a trade-off between tight selection and short runtime on the one hand and more precise solutions and a longer runtime on the other hand. IMPLEMENTATION. This sketch of the building blocks of GAs, which is by no means exhaustive, demonstrates that the researcher has many degrees of freedom in developing his or her own implementation. Therefore, it is a good idea to build on GAs that have performed well in previous work.
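The nonuniform mutation operator (15.32) is simple to program. The following Python fragment is a minimal sketch of it, not a reproduction of the book's GAUSS or Fortran routines; the function name and the default range y = 1 are our own illustrative choices.

    import numpy as np

    def nonuniform_mutation(c, t, T, y=1.0, b=2.0, rng=None):
        # Mutate one chromosome element c according to (15.32): t is the current
        # generation, T the maximal number of generations, y the range of c,
        # and b the degree of nonuniformity.
        rng = np.random.default_rng() if rng is None else rng
        r = rng.uniform()                                 # r in [0, 1]
        delta = y * (1.0 - r ** ((1.0 - t / T) ** b))
        sign = 1.0 if rng.integers(2) == 0 else -1.0      # random binary digit
        return c + sign * delta

Early in the search (t small), ∆(t) can be as large as y; as t approaches T, ∆(t) shrinks toward zero, which produces the local fine-tuning described above.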
Duffy and McNelis (2001) used a genetic algorithm to find the parameters of the approximate expectations function.28 They choose four parents at random (with replacement) from the old generation. With a probability of 0.95 the best two of the four have two children. With equal probability of selection, crossover is either arithmetical, single point, or shuffle. The probability of mutations in generation t, π(t), is given by π(t) = µ1 + µ2 /t,
(15.33)
where µ1 = 0.15 and µ2 = 0.33. Mutations are nonuniform, as given in (15.32) with b = 2, and there is a final fitness tournament between parents and children. The two members of the family with the best fitness pass to the new generation. In addition, the best member of the old generation replaces the worst member of the new generation (elitism). We provide implementations of this GA in the Fortran subroutine GSearch1 and in the MATLAB function GSearch1.m. The user can supply the following parameters:

• npop: the size of the population,
• ngen: the number of iterations (generations),
• probc: the probability of crossover,
• mu1: the first parameter in (15.33),
• mu2: the second parameter in (15.33),
• mu3: the parameter b in (15.32).
In the Fortran subroutine GSearch2 we provide a more flexible implementation of a GA. The user can choose between two selection methods: stochastic universal sampling and the method used in GSearch1. We do not provide an option for sigma scaling, since, from our experiments, we learned that a sufficiently high probability of mutation prohibits population heterogeneity from shrinking too fast. In addition to arithmetical, single point, and shuffle crossover, we allow for BLX-α and linear crossover. This is motivated by the good results obtained for these two operators in the experiments of Herrera et al. (1998). The user can decide either to use a single operator throughout or to apply all of the operators with equal chance of selection. The program uses the same mutation operator as Search1.f90. If stochastic universal sampling is used, there is no final fitness tournament. The two children always survive unless they provide invalid solutions. If this happens, they are replaced by their parents. 28 Another application of a GA to the stochastic growth model is the paper of Gomme (1998), who solves for the policy function. His procedure replaces the worst half of solutions with the best half, plus some noise.
The basic structure of both implementations is summarized in the following algorithm.

Algorithm 15.4.6 (Genetic Search Algorithm)

Purpose: Find the minimum of a user-defined objective function.

Steps:

Step 1: Initialize: Set t = 1. Choose at random an initial population of candidate solutions P(t) of size n.

Step 2: Find a new set of solutions P(t + 1): Repeat the following steps until the size of P(t + 1) is n:

Step 2.1: Select two parents from the old population P(t).
Step 2.2: Produce two offspring (crossover).
Step 2.3: Perform random mutation of the new offspring.
Step 2.4: Depending upon the selection method in Step 2.1, either evaluate the fitness of parents and offspring and retain the two fittest or pass the two children to the next generation.
Step 3: If t = ngen, terminate; otherwise, increase t by 1 and return to Step 2.
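To make the structure of Algorithm 15.4.6 concrete, the following Python sketch implements one simple variant: fitness-proportionate selection, arithmetical crossover, nonuniform mutation as in (15.32), and elitism. It is an illustration only, not a port of GSearch1 or GSearch2; the parameter names mirror the list above, and it assumes an even population size and shifts fitness values so that the selection probabilities are positive.

    import numpy as np

    def genetic_search(fitness, m, npop=50, ngen=200, probc=0.95, b=2.0, seed=0):
        # Maximize a user-supplied fitness function over chromosomes of length m.
        rng = np.random.default_rng(seed)
        pop = rng.standard_normal((npop, m))              # Step 1: random initial population
        fit = np.array([fitness(c) for c in pop])
        for t in range(1, ngen + 1):
            new_pop = np.empty_like(pop)
            prob = fit - fit.min() + 1e-12                # shift fitness to get probabilities
            prob = prob / prob.sum()
            for k in range(0, npop, 2):                   # Step 2: build the new population
                i, j = rng.choice(npop, size=2, p=prob)   # Step 2.1: fitness-proportionate selection
                if rng.uniform() < probc:                 # Step 2.2: arithmetical crossover
                    lam = rng.uniform()
                    c1 = lam * pop[i] + (1 - lam) * pop[j]
                    c2 = (1 - lam) * pop[i] + lam * pop[j]
                else:
                    c1, c2 = pop[i].copy(), pop[j].copy()
                for c in (c1, c2):                        # Step 2.3: nonuniform mutation (15.32)
                    pos = rng.integers(m)
                    delta = 1.0 - rng.uniform() ** ((1 - t / ngen) ** b)
                    c[pos] += delta if rng.integers(2) == 0 else -delta
                new_pop[k], new_pop[k + 1] = c1, c2       # Step 2.4: children pass to P(t+1)
            new_fit = np.array([fitness(c) for c in new_pop])
            worst = new_fit.argmin()                      # elitism: old best replaces new worst
            new_pop[worst], new_fit[worst] = pop[fit.argmax()], fit.max()
            pop, fit = new_pop, new_fit                   # Step 3: next generation
        return pop[fit.argmax()], fit.max()

    # Example: maximize f(c) = -sum((c - 1)^2), whose maximum is at c = (1, ..., 1)
    best, best_fit = genetic_search(lambda c: -np.sum((c - 1.0) ** 2), m=3)
    print(best, best_fit)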
Chapter 16
Difference Equations and Stochastic Processes
16.1 Introduction Stochastic processes are present in all chapters of Parts I and II of this book, either explicitly or behind the scenes. In the stochastic models of Part I, (vector) autoregressive processes (think of the processes that govern total factor productivity (TFP), government spending, and the money growth rate) are the driving forces of the economy. In the heterogeneous agent models of Part II, Markov chains govern idiosyncratic shocks to agents' labor earnings. The solutions of models with aggregate uncertainty are stochastic processes whose properties can be derived from those of the driving processes, as explained in Section 4.2 for perturbation solutions of representative agent models. Moreover, since we interpret (macroeconomic) time series as realizations of stochastic processes, the computation of time series objects, e.g., variances, auto- and cross-correlations, employed in the calibration and evaluation of models requires a basic understanding of stochastic processes. Accordingly, this chapter gathers some background material to which we refer in Parts I and II of this book. The building blocks of (vector) autoregressive stochastic processes are linear difference equations. We have also used the latter as approximations to systems of nonlinear difference equations, e.g., the system (2.3), which is part of the solution of the deterministic infinite horizon Ramsey model. Therefore, before we formally introduce stochastic processes in discrete time in Section 16.3, we briefly review difference equations in Section 16.2. Section 16.4 considers Markov chains and techniques that approximate continuously valued autoregressive processes by finite-state Markov chains that can be used in value function iteration.
Finally, Section 16.5 considers time series filters, notably the much employed Hodrick-Prescott filter, used to extract stochastic trends from nonstationary stochastic processes. If not stated otherwise, the material gathered here is found in econometric textbooks with a focus on time series analysis as, e.g., Hamilton (1994), Lütkepohl (2005), or Neusser (2016). Ljungqvist and Sargent (2018), Chapter 2 covers Markov chains and first-order stochastic difference equations.
16.2 Difference Equations Dynamic models are either formulated in terms of difference or differential equations. Since the models considered in this book are formulated in discrete time we review a few basic definitions and facts about difference equations.
16.2.1 Linear Difference Equations Let us consider a function x that maps t ∈ R into x(t) ∈ R. In practice, we do not observe economic variables x at every instant of time t. Most economic data are compiled at a yearly, quarterly, or monthly frequency. To account for that fact, we consider the function x only at equally spaced points in time: x(t), x(t + h), x(t + 2h), . . . , and usually normalize h ≡ 1. It is then common to write x t instead of x(t). The first difference of x t , ∆x t , is defined as ∆x t := x t − x t−1 , and further differences are computed according to ∆2 x t := ∆x t − ∆x t−1 = x t − 2x t−1 + x t−2 ,
∆³x_t := ∆²x_t − ∆²x_{t−1} = x_t − 3x_{t−1} + 3x_{t−2} − x_{t−3},
⋮
∆^n x_t := ∆^{n−1} x_t − ∆^{n−1} x_{t−1}.
A difference equation of order n relates the function x to its n differences. The simplest of these equations is
∆x_t = x_t − x_{t−1} = a x_{t−1},   a ∈ R.   (16.1)
In this equation, x t−1 and its first difference ∆x t are linearly related (only addition and scalar multiplication are involved). Furthermore, the coefficient a does not depend on t. Therefore, equation (16.1) is a firstorder linear difference equation with a constant coefficient. The unknown in this equation is the function x. For this reason, equation (16.1) is a functional equation. Let us assume we know the time t = 0 value x 0 . We can then determine all future (or past) values of x t by iterating forwards (or backwards) on (16.1): x 1 = λx 0 ,
λ := 1 + a,
x 2 = λx 1 = λ2 x 0 , x 3 = λx 2 = λ3 x 0 , ..., x t = λx t−1 = λ t x 0 . In most applications, we are interested in the limiting behavior of x as t → ∞. The previous derivations show that x approaches zero for every initial x 0 ∈ R if and only if |λ| < 1. This behavior is called asymptotic stability. Note that this result also applies to complex numbers a, x 0 ∈ C0 (see equation (12.2)). Now, we consider the generalization of (16.1) to n variables: x t = Ax t−1 ,
x := [x 1 , x 2 , . . . , x n ]0 ∈ Rn .
(16.2)
What are the properties of the n by n matrix A that ensure that x_t is asymptotically stable? To answer this question, we use the Jordan factorization A = M J M⁻¹ (see equation (12.30)) to transform (16.2) into a simpler system. First, we define new variables y_t = M⁻¹x_t. Second, we multiply (16.2) from the left by M⁻¹ to obtain

M⁻¹x_t = M⁻¹A x_{t−1} = M⁻¹A M y_{t−1},
y_t = (M⁻¹A M) y_{t−1},
y_t = J y_{t−1}.   (16.3)
Since J is a block-diagonal matrix, the new system is decoupled into K independent blocks:
y_{kt} = J_k y_{k,t−1}.

If the geometric multiplicity of the k-th eigenvalue equals 1, the size of the vector y_{kt} equals the algebraic multiplicity of the k-th eigenvalue. Otherwise, there are multiple Jordan blocks for the k-th eigenvalue, and the sizes of the corresponding y_{kt} sum up to its algebraic multiplicity.1 For instance, if A has n distinct real or complex eigenvalues λ_i, equation (16.3) simplifies to

y_t = diag(λ_1, λ_2, . . . , λ_n) y_{t−1}   ⇔   y_{it} = λ_i y_{i,t−1},   i = 1, 2, . . . , n.
Thus, it is obvious that the transformed system is asymptotically stable if the absolute value of all eigenvalues is less than unity. But if y_t converges toward the zero vector, then so does the vector x_t = M y_t. Although less obvious, this also holds in the general case where A may have multiple eigenvalues.2 Thus, we have the following theorem:

Theorem 16.2.1 The linear system of difference equations (16.2) is asymptotically stable,

lim_{t→∞} x_t = 0_{n×1},

if and only if every eigenvalue of A is less than unity in modulus.

Now, we suppose that only n_1 < n eigenvalues have a modulus less than unity while the remaining n_2 = n − n_1 eigenvalues exceed unity. Since we may choose M so that the stable eigenvalues appear in the first blocks of J, we can partition J into

J = [ P_{n_1×n_1}   0_{n_1×n_2} ;  0_{n_2×n_1}   Q_{n_2×n_2} ]

and y_t into y_t = [y_{1t}, y_{2t}]′, where y_{1t} = [y_{1,t}, y_{2,t}, . . . , y_{n_1,t}]′ and y_{2t} = [y_{n_1+1,t}, y_{n_1+2,t}, . . . , y_{n,t}]′.

1 The geometric multiplicity is the number of linearly independent eigenvectors to an eigenvalue λ. The algebraic multiplicity is the number of times the eigenvalue λ solves the eigenvalue problem (12.28). See Section 12.8.
2 See Murata (1977), p. 85.
so that (16.3) is given by y_{1t} = P y_{1,t−1} and y_{2t} = Q y_{2,t−1}. Since all the eigenvalues of P are inside the unit circle, the vector y_{1t} approaches the zero vector as t → ∞. The vector y_{2t}, however, is farther and farther displaced from the origin as t → ∞, since all the eigenvalues of Q are outside the unit circle. If we are free to choose the initial y_0, we put y_{20} = 0_{n_2×1}, and thus, we can ensure that y_t, and hence x_t = M y_t, converges to the zero vector as t → ∞. By choosing y_0 = [y_{10}, . . . , y_{n_1,0}, 0, . . . , 0]′, we restrict the system to the stable eigenspace of the matrix A.3
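Theorem 16.2.1 is easy to check numerically. The following Python fragment is a small illustration (the matrix A and the initial value are arbitrary choices of ours, not taken from the book): it verifies that all eigenvalues of A lie inside the unit circle and then iterates (16.2) forward.

    import numpy as np

    A = np.array([[0.9, 0.2],
                  [0.0, 0.5]])
    print(np.abs(np.linalg.eigvals(A)))    # both moduli are less than one

    x = np.array([10.0, -3.0])             # arbitrary initial value x_0
    for t in range(200):                   # iterate x_t = A x_{t-1}
        x = A @ x
    print(x)                               # numerically close to the zero vector

Replacing A with a matrix that has an eigenvalue outside the unit circle makes the iteration diverge for almost every initial value, in line with the saddle path discussion below.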
16.2.2 Nonlinear Difference Equations Let f i : Rn → R, i = 1, 2, . . . , n denote arbitrary differentiable functions. A system of n nonlinear first-order difference equations is defined by the following set of equations:
x_{1t} = f^1(x_{1,t−1}, x_{2,t−1}, . . . , x_{n,t−1}),
x_{2t} = f^2(x_{1,t−1}, x_{2,t−1}, . . . , x_{n,t−1}),
⋮
x_{nt} = f^n(x_{1,t−1}, x_{2,t−1}, . . . , x_{n,t−1}).   (16.4)
In a more compact notation, this can be written as x t = f(x t−1 ). We assume there is a point x∗ that satisfies x∗ = f(x∗ ). Such a point is called a fixed point, a rest point or, in more economic terms, a stationary equilibrium. What can be said about the asymptotic behavior of x t under the map f? Is there a relation between the linear system (16.2) and the nonlinear system? To deal with these questions, 3
Remember, from Section 12.8, this is the set spanned by the generalized eigenvectors of A associated with eigenvalues of modulus less than one.
we consider the linear approximation of f at x∗ (see equations (13.9) and (13.10) for details): f(x∗ + h) ≈ f(x∗ ) + J(x∗ )h, h = x − x∗ ,
where J(x∗) is the Jacobian matrix of f. Since x∗ = f(x∗), the linear approximation of the system of nonlinear difference equations (16.4) at x∗ is given by

x̄_t = J(x∗) x̄_{t−1},   x̄_t := x_t − x∗.   (16.5)
The relation between the nonlinear system and the linearized system is the subject of the following theorem:4 Theorem 16.2.2 (Hartman-Grobman) Let f : Rn → Rn be a C 1 diffeomorphism with a hyperbolic fixed point x∗ . Then, there exists a homeomorphism g defined on some neighborhood U of x∗ such that g(f(x)) = J(x∗ )g(x) for all x ∈ U.
A map f is a homeomorphism if
1) it is a continuous one-to-one map of some subset U ⊂ R^n onto the set Y ⊂ R^n, i.e., f(x1) = f(x2) ⇒ x1 = x2 and f(U) = Y,
2) with a continuous inverse f⁻¹.

If both f and f⁻¹ have continuous first derivatives, the homeomorphism f is a C¹ diffeomorphism. Finally, x̄ is a hyperbolic fixed point of f if the Jacobian matrix J(x̄) has no eigenvalues on the unit circle. Figure 16.1 illustrates Theorem 16.2.2. The image of the point x_t under the nonlinear map f is given by x_{t+1}. If we map x_{t+1} into the set g(U) = Y, we obtain y_{t+1}. However, we arrive at the same result if we first map x_t into Y via the nonlinear change of coordinates g and then apply the linear operator J(x∗) to y_t. Two maps that share this property are called topological conjugates. This allows us to infer the dynamics of x_t near x∗ from the dynamics of the linear system (16.5) near the origin. Since we already know that the linear system is asymptotically stable if all the eigenvalues of J(x∗) are inside the unit circle, we can conclude from Theorem 16.2.2 that under this condition, for all x_0 sufficiently close to x∗, the series x_t tends to x∗ as t → ∞. A fixed point with this property is called locally asymptotically stable.
See, e.g., Guckenheimer and Holmes (1986), p. 18, Theorem 1.4.1 for a statement and Palis Jr. and de Melo (1982), pp. 60ff. for a proof.
Figure 16.1 Topological Conjugacy
We can also extend our results with respect to the case where the matrix A in (16.2) has n_1 eigenvalues inside and n_2 = n − n_1 eigenvalues outside the unit circle to the nonlinear system of difference equations (16.4). For that purpose, we define two sets. The local stable manifold of x∗ is the set of all x ∈ U that are starting points for which the series x_t tends to x∗ as t → ∞:

W^s_loc(x∗) := {x ∈ U : lim_{t→∞} f^t(x) → x∗ and f^t(x) ∈ U ∀ t ≥ 0}.

Here, f^t(·) is recursively defined via f^t = f(f^{t−1}(·)). The local unstable manifold is the set of all x ∈ U that are starting points for which the series x_t tends to x∗ as we move backward in time:

W^u_loc(x∗) := {x ∈ U : lim_{t→−∞} f^t(x) → x∗ and f^t(x) ∈ U ∀ t ≤ 0}.

Together with Theorem 16.2.2, the next theorem shows that if we restrict the initial point x_0 to lie in the stable eigenspace of J(x∗), then for x_0 sufficiently close to x∗, the dynamics of the linear system mimics the dynamics of the nonlinear system on the local stable manifold W^s_loc(x∗):5
See, e.g., Guckenheimer and Holmes (1986) p. 18, Theorem 1.4.2 for a statement of this theorem and Palis Jr. and de Melo (1982) pp. 75ff. for a proof. Grandmont (2008) provides more detailed theorems that relate the solution of the nonlinear system (16.4) to the solution of the linear system (16.2).
Theorem 16.2.3 (Stable Manifold Theorem) Let f : R^n → R^n be a C¹ diffeomorphism with a hyperbolic fixed point x∗. Then, there are local stable and unstable manifolds W^s_loc, W^u_loc, respectively, tangent to the eigenspaces E^s(x∗), E^u(x∗) of J(x∗), respectively, and of corresponding dimensions.

Figure 16.2 illustrates this theorem. Note that any time path that is neither in the local stable nor in the local unstable manifold also diverges from x∗ as t → ∞, as the path represented by the red broken line.
Figure 16.2 Local Stable and Unstable Manifolds
16.2.3 Boundary Value Problems and Shooting Solving a system of difference or differential equations requires additional conditions to determine a single time path from the set of possible solutions. These conditions can specify 1) the initial state of the system or 2) a mixture of initial conditions for a subset of the variables and terminal conditions for the remaining variables. Case 1) is known as an initial value problem and case 2) as a boundary value problem. To see this, consider the second-order differential equation x''(t) + x(t) = 0, t, x(t) ∈ R.
It is easily verified by differentiation that both sin(t) and cos(t) satisfy this equation. However, sin(t) is the only function that satisfies the initial conditions x(0) = 0 and x'(0) = 1. On the contrary, cos(t) is the solution to the boundary value problem cos(0) = 1 and cos(π) = −1. Note further that we can reformulate the second-order equation as a first-order system of two differential equations in the variables x(t) and y(t) := x'(t):

z'(t) = [0  1; −1  0] z(t),   z(t) := [x(t), y(t)]′.
Accordingly, the initial condition that determines the sine function is z(0) = [0, 1]′, and the boundary conditions determining the cosine function are z_1(0) = 1 and z_1(π) = −1. In the case of a nonlinear system of difference equations (16.4) with a stable fixed point x∗, we can trace out the equilibrium path from every initial x_0 that is in the basin of x∗, i.e., the set {x_0 | lim_{t→∞} x_t = x∗}, by iterating over (16.4) with initial point x_0. It involves trial and error to find T ∈ Z such that ‖x_T − x∗‖ is small in some given vector norm. This forward shooting can also be used to approximate x∗ instead of solving the nonlinear system x∗ = f(x∗). Consider the case of a given terminal condition x_T. If the system (16.4) has an easy to compute inverse, we can iterate backward in time using x_t = f⁻¹(x_{t+1}) with initial value x_T. If an analytic inverse does not exist and if it is computationally burdensome to approximate the inverse of f(x_t), forward shooting is an alternative. Given T, we start with an arbitrary initial value x_0 and iterate forward to f^T(x_0). If this point is far from the given terminal value x_T (in a given vector norm ‖f^T(x_0) − x_T‖), we try a different value. The search for the appropriate initial value can be programmed as finding the zero of a nonlinear function g : R^n → R^n, x_0 ↦ g(x_0) := f^T(x_0) − x_T.
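The following Python sketch shows how the search for the appropriate initial value can be handed to a standard root finder. The map f below is a toy affine example of ours, chosen only so that the fragment runs; in applications, f would be the model's transition function, and scipy's root finder stands in for whatever nonlinear equation solver is at hand.

    import numpy as np
    from scipy.optimize import root

    def f(x):                                     # illustrative transition map
        return np.array([0.9 * x[0] + 0.1 * x[1],
                         0.2 * x[0] + 0.7 * x[1] + 0.05])

    def iterate(x0, T):                           # compute f^T(x0)
        x = np.asarray(x0, dtype=float)
        for _ in range(T):
            x = f(x)
        return x

    T = 5
    x_T = np.array([0.3, 0.4])                    # given terminal condition
    sol = root(lambda x0: iterate(x0, T) - x_T, x0=np.zeros(2))
    print(sol.x)                                  # initial value that hits x_T
    print(iterate(sol.x, T))                      # check: equals x_T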
Forward shooting is not applicable if the system (16.4) is saddle path stable, since we would have to start in the (local) stable manifold to approach the fixed point x∗. Saddle path stable equilibria belong to the class of boundary
value problems. As an example, consider the Ramsey model in Section 1.3.4. We know the initial capital stock K_0. However, iterations over the system (1.19) that start at K_0 and arbitrary values of consumption C_0 diverge from the stationary solution (K∗, C∗). Only initial values on the saddle path converge. However, we do not know this path in advance. What we know is the terminal value (K∗, C∗). Backward iterations (also known as reverse shooting) cannot start at the fixed point x∗ since x∗ = f⁻¹(x∗). Instead, we start backward iterations at a point x_T close to x∗ in the stable hyperplane. In this way, the path traced out by x_{T−t} = f^{−t}(x_T) stays close to the stable manifold. For instance, in Section 2.3.2, we compute the tangent to the saddle path of the Ramsey model shown in Figure 1.2. The solution is the linearized consumption function in (2.9a). For a small perturbation of the capital stock dK = K_T − K∗, the corresponding change in consumption that keeps the system close to the saddle path is dC = −(t_21/t_22) dK. To approximate the left wing of this path via backward iterations, we must set dK to a small negative number. Setting dK to a small positive value and iterating backward in time approximates the right wing of the saddle path. In Section 8.2.1, we use reverse shooting to compute the time path of aggregate capital and working hours in a heterogeneous agent model with Gorman preferences.
16.3 Stochastic Processes 16.3.1 Univariate Processes Let y_t denote the observation of a variable y at time t. The ordered list of observations from period t = 1 through period t = T, denoted by {y_t}_{t=1}^T, is called a time series. Econometricians interpret time series as realizations of a stochastic process {Y_t}_{t∈Z}. A stochastic process is an ordered sequence of random variables Y_t. If the realizations of Y_t belong to some interval on the real line, Y_t ∈ [a, b] ⊆ R, the process is continuously valued, and the interval [a, b] is called the state space of Y_t. A discrete-valued process has realizations from a countable number of values, Y_t ∈ {y_1, y_2, . . . }. What can we learn about a stochastic process from a given time series? Not very much, unless we put a number of restrictions on the underlying stochastic process. The most important one is stationarity. Strictly speaking, the process {Y_t}_{t∈Z} is stationary if the joint probability distribution of the collection
(Y_{t+j_1}, Y_{t+j_2}, . . . , Y_{t+j_n}) depends only on the n-tuple of integers (j_1, j_2, . . . , j_n) but not on the time index t. A process whose elements Y_t, t ∈ Z, share the same probability distribution, which is independent of the realizations of other members of the process Y_{t+s}, s ≠ 0, is called independently and identically distributed (iid). Obviously, iid processes are stationary. A less demanding assumption than stationarity is covariance stationarity. The process {Y_t}_{t∈Z} is covariance stationary if its unconditional mean µ_t := E(Y_t) is independent of time t and if the k = 0, 1, 2, . . . autocovariances

γ_k ≡ E(Y_t − µ)(Y_{t−k} − µ)
depend only on the integer k. We refer to µ as the first moment and to γ_k as the second moment of the process. Covariance stationarity, however, is not sufficient for a consistent estimation of µ and γ_k from a given time series {y_t}_{t=1}^T. The additional requirement is ergodicity. We provide no formal definition of this concept6 but note its consequence: if {Y_t}_{t∈Z} is ergodic and covariance stationary, the time series mean

ȳ = (1/T) ∑_{t=1}^{T} y_t   (16.6)
is a consistent estimate of the process mean µ and the sample autocovariance

c_k = (1/T) ∑_{t=k+1}^{T} (y_t − ȳ)(y_{t−k} − ȳ)   (16.7)

is a consistent estimate of the autocovariance γ_k. Note that c_0 defines the sample variance of the process so that

r_k = c_k / c_0,   k = 1, 2, . . .   (16.8)
is the sample autocorrelation coefficient between observations that are k periods apart. In the same way, we can estimate the covariance at lag k between two stationary and ergodic stochastic processes {Yt } t∈Z and {X t } t∈Z from the time series { y t } Tt=1 and {x t } Tt=1 :
6 We refer the interested reader to econometric textbooks. See, e.g., Davidson and MacKinnon (1993), p. 132.
c_k^{xy} := (1/T) ∑_{t=k+1}^{T} (y_t − ȳ)(x_{t−k} − x̄),   (16.9)

and compute the correlation coefficient between both series from

r_k^{xy} := c_k^{xy} / √(c_0^x c_0^y).   (16.10)
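The sample moments (16.6) to (16.10) are straightforward to compute. The following Python function is a minimal sketch of these formulas (the function name and the simulated AR(1) example are our own choices); note that it divides by T rather than by T − k, exactly as in (16.7) and (16.9).

    import numpy as np

    def sample_moments(y, x=None, k=1):
        # Sample mean (16.6), autocovariance (16.7), and autocorrelation (16.8);
        # with a second series x, also the cross moments (16.9) and (16.10).
        y = np.asarray(y, dtype=float)
        T = y.size
        ybar = y.mean()
        c0 = np.sum((y - ybar) ** 2) / T
        ck = np.sum((y[k:] - ybar) * (y[:-k] - ybar)) / T
        rk = ck / c0
        if x is None:
            return ybar, ck, rk
        x = np.asarray(x, dtype=float)
        xbar = x.mean()
        cxy = np.sum((y[k:] - ybar) * (x[:-k] - xbar)) / T
        rxy = cxy / np.sqrt(np.sum((x - xbar) ** 2) / T * c0)
        return ybar, ck, rk, cxy, rxy

    # Example: moments of a simulated AR(1) series with rho = 0.9
    rng = np.random.default_rng(1)
    y = np.zeros(500)
    for t in range(1, 500):
        y[t] = 0.9 * y[t - 1] + rng.standard_normal()
    print(sample_moments(y, k=1))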
16.3.2 Trends Most economic time series exhibit obvious trends, and thus, the underlying process {Y_t}_{t∈Z} cannot be stationary. If the researcher considers the time trend as deterministic, i.e., Y_t = α_1 + α_2 t + ε_t, where {ε_t}_{t∈Z} is ergodic and covariance stationary, the researcher must extract the trend from y_t before the first and second moments of ε_t are estimated. This is the case of trend stationary growth considered in Section 1.5.2. Instead of a deterministic trend, a researcher may consider a random walk with drift, Y_t = µ + Y_{t−1} + ε_t, as causing the trending behavior of the time series. In this case, first-differencing renders the process stationary, and accordingly, the first and second moments of ε_t can be estimated from ∆y_t = y_t − y_{t−1}. This is the case of difference stationary growth considered in Section 1.5.2. A more flexible way to remove the trend from a nonstationary time series is the popular Hodrick-Prescott (HP) filter. We show in Section 16.5.2 that its cyclical component ȳ_t = y_t − g_t does not change when a linear trend is added to the original series. Therefore, to treat artificial and actual time series in the same way, researchers usually apply the filter also to simulated time series, even if these are stationary by construction.
16.3.3 Multivariate Processes These concepts readily generalize to vector-valued stochastic processes. Let Y t ∈ Rn( y) denote an n( y) vector of random variables and {Y t } t∈Z
the ordered sequence of these vectors. The mean µ of this process is the unconditional expectation µ = E (Y t ) and its covariance matrix is given by Γk := E (Y t − µ) (Y t−k − µ) T . The elements on the main diagonal of this matrix are the autocovariances of the vector elements Yi t at lag k. The elements on the upper right are the covariances between Yi t and Y j t−k , i = 1, . . . n( y), and j = i + 1, . . . , n( y). The off-diagonal elements on the lower left are the covariances between Yi t and Y j t−k , i = 1, . . . , n( y) and j = 1, . . . i − 1. Therefore, Γ (k) = Γ (−k) T . A prominent example of a vector-stochastic process is the first-order vector autoregressive (VAR(1))-process Y t = a + AY t−1 + ε t .
(16.11)
The innovations ε_t ∈ R^{n(y)} are white noise with mean zero, i.e., E(ε_t) = 0_{n(y)×1}, E(ε_t ε_t^T) = Σ_ε, and E(ε_t ε_s^T) = 0_{n(y)×n(y)} for t ≠ s. This process is covariance stationary if all n(y) eigenvalues of the matrix A are within the unit circle.
16.4 Markov Processes Markov processes are an indispensable ingredient in dynamic stochastic general equilibrium (DSGE) models. They preserve the recursive structure that these models inherit from their deterministic relatives. In this section, we review a few results about these processes that we have used repeatedly in the development of solution methods and in applications. A stochastic process has the Markov property if the probability distribution of Yt+1 only depends upon the realization of Yt .
16.4.1 The First-Order Autoregressive Process An example of a Markov process is the first-order autoregressive (AR(1)) process
Y_t = (1 − ρ)Ȳ + ρ Y_{t−1} + ε_t,   ρ ∈ (−1, 1),   ε_t iid N(0, σ_ε²),   (16.12)
which we use consistently in this book. The random variables ε_t, the so-called innovations of this process, are iid draws from a normal distribution with mean E(ε_t) = 0 and variance E(ε_t²) = σ_ε².7 Given Y_t, the next period's shock Y_{t+1} is normally distributed with mean E(Y_{t+1}|Y_t) = (1 − ρ)Ȳ + ρY_t and variance var(Y_{t+1}|Y_t) = σ_ε². Since higher-order autoregressive processes can be reduced to first-order vector autoregressive processes, the first-order process plays a prominent role in the development of dynamic stochastic general equilibrium (DSGE) models. As an example, let us consider the second-order autoregressive process

Y_t = ρ_1 Y_{t−1} + ρ_2 Y_{t−2} + ε_t.
(16.13)
Defining X_t = Y_{t−1}, equation (16.13) can be written as

Y_{t+1} = ρ_1 Y_t + ρ_2 X_t + ε_t,
X_{t+1} = Y_t,

which is an example of the first-order vector autoregressive process (16.11) with Y_t := (Y_t, X_t)′, a = (0, 0)′, innovations ε_t := (ε_t, 0)′, and

A := [ρ_1  ρ_2; 1  0].
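A quick numerical check of the stationarity condition stated after (16.11) can be done by building the companion matrix A of an AR(2) process and inspecting its eigenvalues. The parameter values below are illustrative choices of ours, not taken from the book.

    import numpy as np

    rho1, rho2 = 1.3, -0.4                 # illustrative AR(2) coefficients
    A = np.array([[rho1, rho2],
                  [1.0,  0.0]])            # companion matrix of (16.13)
    print(np.abs(np.linalg.eigvals(A)))    # moduli 0.8 and 0.5: the process
                                           # is covariance stationary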
16.4.2 Markov Chains DEFINITION. Markov chains are discrete-valued Markov processes. A time-invariant, first-order Markov chain is characterized by three objects:

1) The column vector y = [y_1, y_2, . . . , y_n]′ summarizes the n different realizations of Y_t, t ∈ N.

2) The probability distribution of the initial date t = 0 is represented by the vector π_0 = [π_{01}, π_{02}, . . . , π_{0n}]′, where π_{0i} denotes the probability of the event Y_0 = y_i.

3) The dynamics of the process is represented by a transition matrix P = (p_{ij}) ∈ R^{n×n}, where p_{ij} denotes the probability of the event Y_{t+1} = y_j|Y_t = y_i, i.e., the probability that next period's state is y_j given that this period's state is y_i. Therefore, p_{ij} ≥ 0 and ∑_{j=1}^{n} p_{ij} = 1.
A more general definition restricts the innovations ε_t to a white noise process only, i.e., E(ε_t) = 0, E(ε_t²) = σ_ε², and E(ε_t ε_{t+s}) = 0 for s ≠ 0.
Thus, given Y_t = y_i, the conditional expectation of Y_{t+1} is

E(Y_{t+1}|Y_t = y_i) = ∑_{j=1}^{n} p_{ij} y_j,

and the conditional variance is

var(Y_{t+1}|Y_t = y_i) = ∑_{j=1}^{n} p_{ij} (y_j − p_i′y)²,

where p_i′ denotes the i-th row of P. The probability distribution of Y_t evolves according to

π_{t+1}′ = π_t′ P.
(16.14)
COMPUTATION OF THE ERGODIC DISTRIBUTION. The limit of (16.14) for t → ∞ is the time invariant, stationary, or ergodic distribution of the Markov chain (y, P, π_0). It is defined by

π′ = π′P  ⇔  (I − P′)π = 0.   (16.15)
Does this limit exist? And if it exists, is it independent of the initial distribution π_0? The answer to both questions is yes, if either all p_{ij} > 0 or if, for some integer k ≥ 1, all elements of the matrix P^k := P × P × · · · × P (k factors) are positive, i.e., (P^k)_{ij} > 0 for all (i, j). This latter condition states that it is possible to reach each state j in k steps from each state i.8 Obviously, this is a weaker condition than p_{ij} > 0 for all (i, j). As an example, we consider the transition matrix

P = [0.0  1.0; 0.9  0.1],

for which

P² = P × P = [0.90  0.10; 0.09  0.91].
8 See, e.g., Ljungqvist and Sargent (2018), Theorem 2.2.1 and Theorem 2.2.2.
We need to compute the invariant distribution in many applications. For instance, in Section 8.4, we must solve for the stationary distribution of employment to find the stationary distribution of assets. The states of the respective Markov chain are y_1 = e and y_2 = u, where e (u) denotes (un)employment, and π_{01} (π_{02} = 1 − π_{01}) is the probability that a randomly selected agent from the unit interval is employed in period t = 0. The transition matrix P is given by

P = [p_uu  p_ue; p_eu  p_ee] = [0.5000  0.5000; 0.0435  0.9565],

where p_uu (p_ue) denotes the probability that an unemployed agent stays unemployed (becomes employed). One obvious way to find the stationary distribution is to iterate over equation (16.14) until convergence is achieved. When we start with an arbitrary fraction of unemployed and employed agents of (0.5, 0.5) and iterate over (16.14), we obtain the sequence in Table 16.1, which converges quickly to (0.08, 0.92), the stationary probabilities of being (un)employed.

Table 16.1 Iterative Computation of the Ergodic Distribution

Iteration No.    πu         πe
0                0.500000   0.500000
1                0.271750   0.728250
2                0.167554   0.832446
3                0.119988   0.880012
4                0.098275   0.901725
5                0.088362   0.911638
10               0.080202   0.919798
20               0.080037   0.919963
Another procedure to compute the stationary distribution of a Markov chain is by means of Monte Carlo simulations. For the two-state chain of the previous example, this is easily done: We assume an initial state of employment, for example y_0 = e. We use a uniform random number generator with support [0, 1]. If the random number is less than 0.9565, y_1 = e; otherwise, the agent is unemployed in period 1, y_1 = u. In the next period, the agent is either employed or unemployed. If employed,
the agent remains employed if the random number of this period is less than 0.9565 and becomes unemployed otherwise. If unemployed, the agent remains unemployed if the random number of this period is less than 0.5 and becomes employed otherwise. We continue this process for T periods and count the number of times the agent is either employed or unemployed. The relative frequencies converge slowly to the ergodic distribution according to the Law of Large Numbers. In our computation, we obtain the simulation results displayed in Table 16.2.

Table 16.2 Simulation of a Markov Chain

Iteration No.    πu       πe
10               0.10     0.90
100              0.12     0.88
1000             0.063    0.937
10000            0.0815   0.9185
100000           0.0809   0.9191
500000           0.0799   0.9201
Notice that this procedure converges very slowly. Furthermore, if the Markov chain has more than n = 2 states, this becomes a very cumbersome procedure. For this reason, we usually employ a third, more direct way to compute the ergodic distribution. Observe that the definition of the invariant distribution (16.15) implies that π is an eigenvector of the matrix P′ associated with the eigenvalue of one, where π has been normalized so that ∑_{i=1}^{n} π_i = 1. Solving the eigenvalue problem for the matrix given above gives π_1 = 0.0800 and π_2 = 0.9200. An equivalent procedure uses the fact that the matrix I − P′ has rank n − 1 (given that P′ has rank n) and that the sum of the probabilities π_i, i = 1, . . . , n, must equal one. Therefore, the vector π must solve the following system of linear equations:
π′ [ p_{11} − 1    p_{12}        · · ·   p_{1,n−1}           1
     p_{21}        p_{22} − 1    · · ·   p_{2,n−1}           1
     ⋮             ⋮                     ⋮                   ⋮
     p_{n−1,1}     p_{n−1,2}     · · ·   p_{n−1,n−1} − 1     1
     p_{n1}        p_{n2}        · · ·   p_{n,n−1}           1 ]  =  (0, . . . , 0, 1).
We provide the procedure equivec to perform this task.
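The linear system above is easily set up on a computer. The following Python function is a sketch of this third, direct method (it is not the code of the GAUSS procedure equivec): the first n − 1 columns of the coefficient matrix are the corresponding columns of P − I, and the last column consists of ones.

    import numpy as np

    def ergodic_distribution(P):
        # Solve pi' M = (0, ..., 0, 1), where the first n-1 columns of M are
        # those of P - I and the last column is a column of ones.
        n = P.shape[0]
        M = np.hstack((P[:, :n - 1] - np.eye(n)[:, :n - 1], np.ones((n, 1))))
        rhs = np.zeros(n)
        rhs[-1] = 1.0
        return np.linalg.solve(M.T, rhs)

    P = np.array([[0.5000, 0.5000],
                  [0.0435, 0.9565]])
    print(ergodic_distribution(P))          # approximately (0.08, 0.92)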
MARKOV CHAIN APPROXIMATIONS OF AR(1)-PROCESSES In Section 1.4.3, we extend the value function iteration method from Section 1.3.3 to solve the stochastic Ramsey model when the amount of rainfall Z t is a finite state Markov chain. Empirically, however, the shift parameter of the production function resembles an AR(1)-process. In Section 7.3.2, we resolve this problem by approximating the AR(1)-process with a finite-state Markov chain. We next describe two algorithms that perform this task. Let us consider the process Yt+1 = ρYt + ε t ,
ρ ∈ (−1, 1), ε t iid N (0, σε2 ).
The mean and variance of this process are E(Y_t) = 0 and9

E(Y_t²) =: σ_Y² = σ_ε²/(1 − ρ²).   (16.16)
Tauchen (1986) proposes to choose a grid Y = [y_1, y_2, . . . , y_n] of equidistant points y_1 < y_2 < · · · < y_n, whose upper end point is a multiple, say λ, of the standard deviation of the autoregressive process, y_n = λσ_Y, and whose lower end point is y_1 = −y_n. For a given realization y_i ∈ Y, the variable y := ρy_i + ε is normally distributed with mean ρy_i and variance σ_ε². Let dy denote half of the distance between two consecutive grid points. The probability that y is in the interval [y_j − dy; y_j + dy] is given by

Prob(y_j − dy ≤ y ≤ y_j + dy) = π(y_j + dy) − π(y_j − dy),
where π(·) denotes the cumulative distribution function of the normal distribution with mean ρy_i and variance σ_ε². Equivalently, the variable v := (y − ρy_i)/σ_ε has a standard normal distribution. Thus, the probability of switching from state y_i to state y_j for j = 2, 3, . . . , n − 1, say p_{ij}, is approximated by the probability for y to lie in the interval [y_j − dy; y_j + dy], which is given by the area under the probability density function of the standard normal distribution in the interval

[(y_j − ρy_i − dy)/σ_ε, (y_j − ρy_i + dy)/σ_ε].

The probability of arriving at state y_1 is approximated by the probability for y less than y_1 + dy, which equals the area under the probability density in the interval [−∞, y_1 + dy]. Since ∑_j p_{ij} = 1, the probability of going from any state i to the upper bound y_n is simply p_{in} = 1 − ∑_{j=1}^{n−1} p_{ij}. We summarize this method in the following steps:

9 See, e.g., Hamilton (1994), p. 53. This result also follows from equation (4.5) for H_ww = ρ and Σ_ν = σ_ε².
Algorithm 16.4.1 (Tauchen)

Purpose: Finite state Markov chain approximation of the first-order autoregressive process

Steps:

Step 1: Compute the discrete approximation of the realizations: Let ρ and σ_ε denote the autoregressive parameter and the standard deviation of innovations, respectively. Select the size of the grid by choosing λ ∈ R_{++} so that y_1 = −λσ_ε/√(1 − ρ²). Choose the number of grid points n. Put step = −2y_1/(n − 1) and for i = 1, 2, . . . , n compute y_i = y_1 + (i − 1)·step.

Step 2: Compute the transition matrix P = (p_{ij}): Let π(·) denote the cumulative distribution function of the standard normal distribution. For i = 1, 2, . . . , n set

p_{i1} = π((y_1 − ρy_i)/σ_ε + step/(2σ_ε)),
p_{ij} = π((y_j − ρy_i)/σ_ε + step/(2σ_ε)) − π((y_j − ρy_i)/σ_ε − step/(2σ_ε)),   j = 2, 3, . . . , n − 1,
p_{in} = 1 − ∑_{j=1}^{n−1} p_{ij}.

Tauchen (1986) reports the results of Monte Carlo experiments that show that choosing n = 9 and λ = 3 gives an adequate representation of the underlying AR(1)-process. Our GAUSS procedure Markov_AR1_T in the file ToolBox.src implements the above algorithm. It takes ρ, σ_ε, λ, and n as input and returns the vector y = [y_1, y_2, . . . , y_n]′ and the transition matrix P. The method of Tauchen (1986) replicates the conditional distribution of the underlying AR(1)-process. Rouwenhorst (1995) proposes a method that targets the autocorrelation parameter ρ. Kopecky and Suen (2010) show that his method also matches the conditional and unconditional mean and variance of the underlying AR(1)-process. Galindev and Lkhagvasuren (2010) find that it outperforms Tauchen's method for highly persistent processes, i.e., when ρ is close to one. The Markov chain proposed by Rouwenhorst (1995) consists of n evenly spaced points over the interval √(n − 1)·[−σ_Y, σ_Y] (where σ_Y is given in equation (16.16)) and constructs the transition probability matrix P recursively, as demonstrated in the following algorithm.
Algorithm 16.4.2 (Rouwenhorst)

Purpose: Finite state Markov chain approximation of the first-order autoregressive process

Steps:

Step 1: Compute the discrete approximation of the realizations: Let ρ and σ_ε denote the autoregressive parameter and the standard deviation of innovations, respectively. Set y_1 = −σ_ε/√(1 − ρ²). Choose the number of grid points n. Put step = −2y_1/(n − 1) and for i = 1, 2, . . . , n compute y_i = y_1 + (i − 1)·step.

Step 2: Compute the transition matrix P = (p_{ij}): Let

P_2 = [p  1−p; 1−q  q]

with p = q = (1 + ρ)/2. For s = 3, 4, . . . , n, compute

P_s = p [P_{s−1}  0_{(s−1)×1}; 0_{1×(s−1)}  0] + (1 − p) [0_{(s−1)×1}  P_{s−1}; 0  0_{1×(s−1)}]
      + q [0  0_{1×(s−1)}; 0_{(s−1)×1}  P_{s−1}] + (1 − q) [0_{1×(s−1)}  0; P_{s−1}  0_{(s−1)×1}]

and multiply all but the first and the last row of P_s by 0.5.

We implement this algorithm in the GAUSS procedure Markov_AR1_R. The arguments of this procedure are n, ρ, and σ_ε. It returns y and P.
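To make Algorithm 16.4.1 concrete, the following Python function is an independent sketch of Tauchen's method; it is not a translation of the GAUSS procedure Markov_AR1_T, and the function name and the defaults n = 9 and λ = 3 simply follow the discussion above.

    import numpy as np
    from scipy.stats import norm

    def tauchen(rho, sigma_eps, lam=3.0, n=9):
        # Grid y and transition matrix P approximating Y' = rho*Y + eps,
        # eps ~ N(0, sigma_eps^2), as in Algorithm 16.4.1.
        sigma_y = sigma_eps / np.sqrt(1.0 - rho ** 2)
        y = np.linspace(-lam * sigma_y, lam * sigma_y, n)
        step = y[1] - y[0]
        P = np.empty((n, n))
        for i in range(n):
            z = (y - rho * y[i]) / sigma_eps          # standardized distances
            P[i, 0] = norm.cdf(z[0] + step / (2 * sigma_eps))
            P[i, 1:n - 1] = (norm.cdf(z[1:n - 1] + step / (2 * sigma_eps))
                             - norm.cdf(z[1:n - 1] - step / (2 * sigma_eps)))
            P[i, n - 1] = 1.0 - P[i, :n - 1].sum()
        return y, P

    y, P = tauchen(rho=0.9, sigma_eps=0.01)
    print(P.sum(axis=1))                              # each row sums to one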
16.5 Linear Filters 16.5.1 Definitions A filter transforms one time series or stochastic process {Yt } t∈Z to another series {X t } t∈Z . The purpose of filtering is best understood by viewing a time series not as a (possibly vector-valued) sequence of real numbers but as the superposition of waves of different periodicity. Filters aim at removing or amplifying waves of a particular periodicity from a given series. For instance, if we consider the business cycle as a periodic phenomenon with a periodicity between, say, two and eight years, we may want to remove from quarterly data all cycles with a periodicity below 4 and above 32 quarters.
In general, a linear filter is defined by its weights ψ_s as follows:

X_t = ∑_{s=−∞}^{∞} ψ_s Y_{t−s},   ∑_{s=−∞}^{∞} |ψ_s| < ∞.   (16.17)
The filter is called one-sided if ψ_s = 0 for all s < 0, and two-sided otherwise. The business cycle literature employs several filters. For instance, Baxter and King (1999) and Christiano and Fitzgerald (2003) construct band-pass filters that remove both short waves associated with seasonal movements and long waves that determine the trend. The most widely employed filter was proposed in a discussion paper by Robert Hodrick and Edward Prescott that circulated in the nineteen eighties (see Hodrick and Prescott (1980)) and was later published in Hodrick and Prescott (1997). Several studies point out that the filter induces spurious cycles, that it suffers from an end-point bias, and that the common choice of the smoothing parameter is at odds with statistical reasoning.10 Despite this discussion, we consider this filter in more detail in the next subsection since it has been used in numerous studies.
16.5.2 The HP-Filter

Let (y_t)_{t=1}^T denote the log of a time series that may be considered as a realization of a nonstationary stochastic process. The growth component (g_t)_{t=1}^T of this series as defined by the HP-Filter is the solution to the following minimization problem:

min_{(g_t)_{t=1}^T}  ∑_{t=1}^{T} (y_t − g_t)² + λ ∑_{t=2}^{T−1} [(g_{t+1} − g_t) − (g_t − g_{t−1})]².   (16.18)
The parameter λ must be chosen by the researcher. Its role can be easily seen by considering the two terms to the right of the minimization operator. If λ is equal to zero, the obvious solution to (16.18) is y_t = g_t, i.e., the growth component would be set equal to the original series. As λ becomes increasingly large, it is important to keep the second term as small as possible. Since this term equals the growth rate of the original series

10 See, among others, King and Rebelo (1993), Cogley and Nason (1995a), Ravn and Uhlig (2002), Mise et al. (2005), and for a more recent discussion Hamilton (2018) and Schüler (2018).
between two successive periods, the ultimate solution for λ → ∞ is a constant growth rate g. Thus, by choosing the size of the weight λ the filter returns anything between the original time series and a linear time trend. The first order conditions of the minimization problem imply the following system of linear equations: Ag = y,
(16.19)
where g = [g_1, g_2, . . . , g_T]′, y = [y_1, y_2, . . . , y_T]′, and A is the banded (pentadiagonal) matrix

[ 1+λ   −2λ    λ      0      0     · · ·   0      0      0
  −2λ   1+5λ   −4λ    λ      0     · · ·   0      0      0
  λ     −4λ    1+6λ   −4λ    λ     · · ·   0      0      0
  ⋮      ⋮      ⋮      ⋮      ⋮     ⋱      ⋮      ⋮      ⋮
  0      0      0      0     0     · · ·   1+6λ   −4λ    λ
  0      0      0      0     0     · · ·   −4λ    1+5λ   −2λ
  0      0      0      0     0     · · ·   λ      −2λ    1+λ ].

We note that A can be factored as11

A = I + λK′K,

where K is the (T − 2) × T second-difference matrix

[ 1   −2   1    0    0   · · ·   0    0    0
  0    1   −2   1    0   · · ·   0    0    0
  ⋮    ⋮    ⋮    ⋮    ⋮   ⋱      ⋮    ⋮    ⋮
  0    0    0    0    0   · · ·   1   −2    1 ],
which shows that A is positive definite.12 Linear algebra routines tailored to solve systems with banded matrices can be employed to solve the system (16.19). These methods require considerably less workspace than methods that operate on the matrix A. For instance, the LAPACK routine dpbsv 11
11 See Brandner and Neusser (1990), p. 5.
12 Recall from Section 12.7 that A is positive definite if the quadratic form q := x′Ax satisfies q > 0 for all nonzero vectors x. The matrix I + λK′K clearly satisfies this requirement since

x′[I + λK′K]x = ∑_{i=1}^{T} x_i² + λ ∑_i z_i²,   z := Kx.
uses the main and the two upper codiagonals of A, i.e., a 3 × T -matrix, whereas the general linear system solvers require the full T × T matrix A. Our implementation of the HP-Filter in the CoRRAM toolbox in the GAUSS procedure HPFilter uses the command bandsolpd to solve (16.19). The cyclical component of y, c = y − g = [I − A−1 ]y, remains unchanged if a linear time trend
a := [a_1 + a_2, a_1 + 2a_2, . . . , a_1 + T a_2]′

is added to the time series (y_t)_{t=1}^T. To see this, note that13

c = [I − A⁻¹][y + a] = [I − A⁻¹]y + [I − A⁻¹]a,  where [I − A⁻¹]a = 0.
The usual choice of the filter weight is λ = 1600 for quarterly data. This rests on the observation that with this choice the filter “removes all cycles longer than 32 quarters leaving shorter cycles unchanged” (Brandner and Neusser (1990) p. 7). For yearly data, Ravn and Uhlig (2002) propose λ = 6.5 whereas Baxter and King (1999) advocate for λ = 10.
13 This statement can be proven by noting that [I − A⁻¹]a = A⁻¹[A − I]a = A⁻¹λ(K′K)a = λA⁻¹K′(Ka) = 0, since Ka = 0 (the second difference of a linear trend vanishes).
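Readers who do not use the GAUSS routine HPFilter can solve (16.19) with any sparse or banded linear solver. The following Python fragment is a sketch of this computation, not the book's implementation; it builds K as a sparse second-difference matrix, forms A = I + λK′K, and solves for the growth component. The simulated random walk is an illustrative example of ours.

    import numpy as np
    from scipy import sparse
    from scipy.sparse.linalg import spsolve

    def hp_filter(y, lam=1600.0):
        # Solve (16.19): A g = y with A = I + lam * K'K, K the (T-2) x T
        # second-difference matrix; returns the growth and cyclical components.
        y = np.asarray(y, dtype=float)
        T = y.size
        K = sparse.diags([np.ones(T - 2), -2.0 * np.ones(T - 2), np.ones(T - 2)],
                         offsets=[0, 1, 2], shape=(T - 2, T))
        A = sparse.eye(T) + lam * (K.T @ K)
        g = spsolve(A.tocsc(), y)
        return g, y - g

    rng = np.random.default_rng(0)
    y = np.cumsum(0.005 + 0.01 * rng.standard_normal(200))   # random walk with drift
    g, c = hp_filter(y)
    print(c[:5])                                              # cyclical component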
Bibliography Acemo˘ glu, D. (2009). Introduction to Modern Economic Growth. Princeton, NJ and Oxford: Princeton University Press. Aghion, P. and P. Howitt (1992). A model of growth through creative destruction. Econometrica 60, 323–351. Aiyagari, R. S. (1994). Uninsured idiosyncratic risk and aggregate saving. Quarterly Journal of Economics 109(3), 659–684. Aiyagari, R. S. (1995). Optimal capital income taxation with incomplete markets, borrowing constraints, and constant discounting. Journal of Political Economy 103(6), 1158–1175. Aldrich, E. M., J. Fernández-Villaverde, A. R. Gallant, and J. F. Rubio-Ramírez (2011). Tapping the supercomputer under your desk: Solving dynamic equilibrium models with graphics processors. Journal of Economic Dynamics and Control 35, 386–393. Algan, Y., O. Allais, and W. J. Den Haan (2008). Solving heterogeneous-agent models with parameterized cross-sectional distributions. Journal of Economic Dynamics & Control 32(3), 875–908. Allais, M. (1947). Économie et Intérêt. Paris: Imprimerie Nationale. Amisano, G. and C. Giannini (1997). Topics in Structural VAR Econometrics (2nd ed.). Berlin: Springer. Andersson, J., J. Gillis, and M. Diehl (2018). User documentation for CasADi v3.44. Manuscript. http://casadi.sourceforge.net/v3.4.4/users_ guide/casadi-users_guide.pdf. Andolfatto, D. (1996). Business cycles and labor-market search. American Economic Review 86, 112–132. Andreasen, M. M. (2012). On the effects of rare disasters and uncertainty shocks for risk premia in non-linear DSGE models. Review of Economic Dynamics 15, 295–316. Andreasen, M. M., J. Fernández-Villaverde, and J. F. Rubio-Ramírez (2018). The pruned state-space system for non-linear DSGE models: Theory and empirical applications. Review of Economic Studies 85, 1–49. Antony, J. and A. Maußner (2012). A note on an extension of a class of solutions to dynamic programming problems arising in economic growth. Macroeconmic Dynamics 16, 472–476. Arrow, K. J. (1970). Essays in the Theory of Risk Bearing. Amsterdam: NorthHolland. Aruoba, S. B. and J. Fernández-Villaverde (2015). A comparison of programming languages in macroeconomics. Journal of Economic Dynamics and Control 58, 265–273. Aruoba, S. B., J. Fernández-Villaverde, and J. F. Rubio-Ramírez (2006). Comparing solution methods for dynamic equilibrium economies. Journal of Economic Dynamics & Control 30(12), 2477–2508. Atkinson, A. B., F. Bourguignon, and C. Morrisson (1992). Empirical Studies of Earnings Mobility. Chur: Harwood Academic Publishers. © The Author(s), under exclusive license to Springer Nature Switzerland AG 2024 B. Heer and A. Maußner, Dynamic General Equilibrium Modeling, Springer Texts in Business and Economics, https://doi.org/10.1007/978-3-031-51681-8
Auerbach, A. J., Y. Gorodnichenko, and D. Murphy (2020). Effects of fiscal policy on credit markets. AEA Papers and Proceedings 110, 119–124. Auerbach, A. J. and L. J. Kotlikoff (1987). Dynamic Fiscal Policy. New York: Cambridge University Press. Auerbach, A. J., L. J. Kotlikoff, and J. Skinner (1993). The efficiency gains from dynamic tax reform. International Economic Review 24, 81 – 100. Badel, A., M. Daly, M. Huggett, and M. Nybom (2018). Top earners: cross-country facts. Review of the Federal Reserve Bank of St. Louis 100(3), 237–57. Barro, R. J. (1974). Are government bonds net wealth? The Journal of Political Economy 82(6), 1095 – 1117. Barro, R. J. (2006). Rare disasters and asset markets in the twentieth century. Quarterly Journal of Economics 121, 823–866. Barro, R. J. and X. Sala-i Martin (2004). Economic Growth (2nd ed.). Cambridge, MA, London: MIT Press. Barro, R. J. and J. F. Ursúa (2008). Macroeconomic crises since 1870. Brookings Papers on Economic Activity 39(1), 255–335. Barro, R. J. and J. F. Ursúa (2012). Rare macroeconomic disasters. Annual Review of Economics 4, 89–109. Barthelmann, V., E. Novak, and K. Ritter (2000). High dimensional polynomial interpolation on sparse grids. Advances in Computational Mathematics 12, 273–288. Baxter, M. and R. G. King (1999). Measuring business cycles: Approximate bandpass filters for economic time series. Review of Economics and Statistics 81(4), 575–593. Becker, G. S. and N. Tomes (1979). An equilibrium theory of the distribution of income and intergenerational mobility. Journal of Political Economy 87, 575 – 593. Benhabib, J., R. Rogerson, and R. Wright (1991). Homework in macroeconomics: Household production and aggregate fluctuations. Journal of Political Economy 99(6), 1166–1187. Benhabib, J. and A. Rustichini (1994). A note on a new class of solutions to dynamic programming problems arising in economic growth. Journal of Economic Dynamics & Control 18(3-4), 1166–1187. Bertsekas, D. P. (2008). Nonlinear Programming (2nd ed.). Belmont, MA: Athena Scientific. Bewley, T. (1986). Stationary monetary equilibrium with a continuum of independently fluctuating consumers. In W. Hildenbrand and A. Mas-Colell (Eds.), Contributions to Mathematical Economics in Honor of Gérard Debreu, pp. 79–102. Amsterdam: North-Holland. Bick, A., N. Fuchs-Schündeln, and D. Lagakos (2018, January). How Do Hours Worked Vary with Income? Cross-Country Evidence and Implications. American Economic Review 108(1), 170–199. Bini, D. A., B. Iannazzo, and B. Meini (2012). Numerical Solution of Algebraic Riccati Equations. Society of Industrial and Applied Mathematics (SIAM).
Binning, A. (2013). Third-order approximation of dynamic models without the use of tensors. Working Paper 2013/13, Norges Bank. Blanchard, O. J. (1985). Debt, deficits, and finite horizons. Journal of Political Economy 93(2), 223 – 247. Blanchard, O. J. and C. M. Kahn (1980). The solution of linear difference models under rational expectations. Econometrica 48(5), 1305–1311. Blanchard, O. J. and R. Perotti (2002). An empirical characterization of the dynamic effects of changes in government spending and taxes on output. Quarterly Journal of Economics 177, 1329 – 1368. Blundell, R., R. Joyce, A. Norris Keiller, and J. P. Ziliak (2018). Income inequality and the labour market in Britain and the US. Journal of Public Economics 162(C), 48–62. Boldrin, M., L. J. Christiano, and J. D. M. Fisher (2001). Habit persistence, asset returns and the business cycle. American Economic Review 91(1), 149–166. Boppart, T. and P. Krusell (2020). Labor supply in the past, present, and future: A balanced-growth perspective. Journal of Political Economy 128(1), 118–157. Boyd, J. P. (2000). Chebyshev and Fourier Spectral Methods (2nd ed.). Mineola, NY: Dover Publications, Inc. Brandner, P. and K. Neusser (1990). Business cycles in open economies: Stylized facts for Austria and Germany. Working Paper 40, Austrian Institute for Economic Research. Braun, R. A. and D. Ikeda (2021). Monetary policy over the life cycle. FRB Atlanta Working Paper 2021-20a, Federal Reserve Bank of Atlanta. Braun, R. A. and D. H. Joines (2015). The implications of a graying Japan for government policy. Journal of Economic Dynamics and Control 57, 1–23. Braun, R. A. and T. Nakajima (2012). Uninsured countercyclical risk: An aggregation result and application to optimal monetary policy. Journal of the European Economic Association 10(6), 1450 – 1474. Brinca, P., H. A. Holter, P. Krusell, and L. Malafry (2016). Fiscal multipliers in the 21st century. Journal of Monetary Economics 77, 53–69. Brock, W. and L. Mirman (1972). Optimal economic growth and uncertainty: The discounted case. Journal of Economic Theory 4(3), 479–513. Bronson, R. (1989). Theory and Problems of Matrix Operations. New York: McGraw-Hill. Brooks, R. (2002). Asset-market effects of the baby boom and social-security reform. American Economic Review 92, 402 – 406. Browning, M., L. P. Hansen, and J. J. Heckman (1999). Micro data and general equilibrium models. In J. B. Taylor and M. Woodford (Eds.), Handbook of Macroeconomics, Vol. 1A, pp. 543–633. Amsterdam: North-Holland. Brüggemann, B. (2021). Higher taxes at the top: the role of entrepreneurs. American Economic Journal: Macroeconomics 13(3), 1–36. Budría Rodríguez, S., J. Díaz-Giménez, V. Quadrini, and J.-V. Ríos-Rull (2002). Updated facts on the U.S. distributions of earnings, income, and wealth. Quarterly Review Federal Reserve Bank of Minneapolis 26(3), 2–35.
Bibliography
873
Burden, R. L. and D. Faires (2016). Numerical Analysis (10th ed.). Pacific Grove, CA: Brooks/Cole, Cengage Learning. Burkhauser, R. V., D. Holtz-Eakin, and S. E. Rhody (1997). Labor earnings mobility and inequality in the United States and Germany during the growth years of the 1980s. International Economic Review 38(4), 775–794. Burnside, C. (1999). Real business cycle models: Linear approximation and GMM estimation. https://ideas.repec.org/c/dge/qmrbcd/76.html. Cagetti, M. and M. De Nardi (2006). Entrepreneurs, frictions and wealth. Journal of Political Economy 114(5), 835 – 870. Cagetti, M. and M. De Nardi (2009). Estate taxation, entrepreneurship, and wealth. American Economic Review 99(1), 85 – 111. Cai, Y., K. L. Judd, T. S. Lontzek, V. Michelangeli, and C.-L. Su (2017). A nonlinear programming method for dynamic programming. Macroeconomic Dynamics 21, 336–361. Caldara, D., J. Fernández-Villaverde, and J. F. Rubio-Ramírez (2012). Computing DSGE models with recursive preferences and stochastic volatility. Review of Economic Dynamics 15, 188–206. Calvo, G. A. (1983). Staggered prices in a utility-maximizing framework. Journal of Monetary Economics 12(3), 383–398. Canova, F. (2007). Methods for Applied Macroeconomic Research. Princeton, NJ: Princeton University Press. Carnahan, B., H. Luther, and J. O. Wilkes (1969). Applied Numerical Methods. New York: John Wiley & Sons. Caselli, F. and J. Ventura (2000). A representative consumer theory of distribution. American Economic Review 90(4), 909 – 926. Cass, D. (1965). Optimum growth in an aggregative model of capital accumulation. Review of Economic Studies 32(3), 233–240. Castañeda, A., J. Díaz-Giménez, and J.-V. Ríos-Rull (1998a). Earnings and wealth inequality and income taxation: Quantifying the tradeoffs of switching to a proportional income tax in the U.S. Working Paper 9814, Federal Reserve Bank of Cleveland. Castañeda, A., J. Díaz-Giménez, and J.-V. Ríos-Rull (1998b). Exploring the income distribution business cycle dynamics. Journal of Monetary Economics 42(1), 93–130. Castañeda, A., J. Díaz-Giménez, and J.-V. Ríos-Rull (2003). Accounting for the US inequality. Journal of Political Economy 111(4), 818–857. Caucutt, E. M., S. ˙Imrohoro˘ glu, and K. Kumar (2003). Growth and welfare analysis of tax progressivity in a heterogeneous-agent model. Review of Economic Dynamics 6(3), 546–577. Chari, V. V., P. J. Kehoe, and E. R. McGrattan (2007). Business cycle accounting. Econometrica 75(3), 781–836. Chatterjee, S. (1994). Transitional dynamics and the distribution of wealth in a neoclassical growth model. Journal of Public Economics 54(1), 97 – 119. Chawla, M. (1968). Error bounds for Gauss-Chebyshev quadrature formula of the closed type. Mathematics of Computation 22, 889–891.
874
Bibliography
Chiu, W. H. and E. Karni (1998). Endogenous adverse selection and unemployment insurance. Journal of Political Economy 106, 806– 827. Chow, G. C. (1997). Dynamic Economics. Optimization by the Lagrange Method. New York, Oxford: Oxford University Press. Christiano, L. J. and M. Eichenbaum (1992). Current real-business-cycle theories and aggreagte labor-market fluctuations. American Economic Review 82, 430– 450. Christiano, L. J., M. Eichenbaum, and C. L. Evans (1997). Sticky price and limited participation models of money: A comparison. European Economic Review 41(6), 1201–1249. Christiano, L. J., M. Eichenbaum, and C. L. Evans (1999). Monetary policy shocks: What have we learned and to what end? In J. B. Taylor and M. Woodford (Eds.), Handbook of Macroeconomics. Vol. 1A, pp. 65–148. Amsterdam: Elsevier. Christiano, L. J., M. Eichenbaum, and C. L. Evans (2005). Nominal rigidities and the dynamic effects of a shock to monetary policy. Journal of Political Economy 113, 1–45. Christiano, L. J., M. Eichenbaum, and M. Trabandt (2015). Unterstanding the Great Recession. American Economic Journal: Macroeconomics 7(1), 110–167. Christiano, L. J., M. Eichenbaum, and R. Vigfusson (2004). The response of hours to a technology shock: Evidence based on direct measures of technology. Journal of the European Economic Association 2, 381–395. Christiano, L. J. and J. D. M. Fisher (2000). Algorithms for solving dynamic models with occasionally binding constraints. Journal of Economic Dynamics & Control 24(8), 1179–1232. Christiano, L. J. and T. J. Fitzgerald (2003). The band pass filter. International Economic Review 44, 435–465. Christiano, L. J. and R. M. Todd (1996). Time to plan and aggregate fluctuations. Quarterly Review Federal Reserve Bank of Minneapolis 20(1), 14–27. Cogley, T. and J. M. Nason (1995a). Effects of the Hodrick-Prescott filter on trend and difference stationary time series, implications for business cycle research. Journal of Economic Dynamics and Control 19, 253–278. Cogley, T. and J. M. Nason (1995b). Output dynamics in real-business-cycle models. American Economic Review 85(3), 492–511. Cooley, T. F. and G. D. Hansen (1995). Money and the business cycle. In T. F. Cooley (Ed.), Frontiers of Business Cycle Research, pp. 175–216. Princeton, NJ: Princeton University Press. Cooley, T. F. and E. Henriksen (2018). The demographic deficit. Journal of Monetary Economics 93, 45 – 62. Cooley, T. F. and E. C. Prescott (1995). Economic growth and business cycles. In T. F. Cooley (Ed.), Frontiers of Business Cycle Research, pp. 1–38. Princeton, NJ: Princeton University Press. Corless, R. M. and N. Fillion (2013). A Graduate Introduction to Numerical Methods. New York: Springer. Correia, I., J. Neves, and S. T. Rebelo (1995). Business cycles in a small open economy. European Economic Review 39(6), 1089–1113.
Bibliography
875
Costain, J. (1997). Unemployment insurance with endogenous search intensity and precautionary saving. Universitat Pompeu Fabra Economics Working Paper 243. Davidson, R. and J. G. MacKinnon (1993). Estimation and Inference in Econometrics. New York, Oxford: Oxford University Press. Davidson, R. and J. G. MacKinnon (2004). Econometric Theory and Methods. New York, Oxford: Oxford University Press. De Nardi, M. (2015). Quantitative models of wealth inequality: A survey. Working Paper 21106, National Bureau of Economic Research (NBER). De Nardi, M. and G. Fella (2017). Saving and wealth inequality. Review of Economic Dynamics 26, 280–300. De Nardi, M., E. French, and J. B. Jones (2010). Why do the elderly save? The role of medical expenses. Journal of Political Economy 118(1), 39–75. De Nardi, M., S. ˙Imrohoro˘ glu, and T. J. Sargent (1999). Projected US demographics and social security. Review of Economic Dynamics 2(3), 575–615. De Nardi, M. and F. Yang (2016). Wealth inequality, family background, and estate taxation. Journal of Monetary Economics 77, 130–145. Deaton, A. (1991). Saving and liquidity constraints. Econometrica 59, 1221 – 1248. Debreu, G. (1954). Valuation equilibrium and Pareto optimum. Proceedings 7, National Academy of Sciences. DeJong, D. N. and C. Dave (2011). Structural Macroeconometrics (2nd ed.). Princeton, NJ: Princeton University Press. Den Haan, W. J. (1997). Solving dynamic models with aggregate shocks and heterogenous agents. Macroeconomic Dynamics 1(2), 335–386. Den Haan, W. J. (2010a). Assessing the accuracy of the aggregate law of motion in models with heterogeneous agents. Journal of Economic Dynamics and Control 34, 79 – 99. Den Haan, W. J. (2010b). Comparison of solutions to the incomplete markets model with aggregate uncertainty. Journal of Economic Dynamics and Control 34, 4 – 27. Den Haan, W. J. and J. De Wind (2012). Nonlinear and stable perturbation-based approximations. Journal of Economic Dynamics and Control 36, 1477–1497. Den Haan, W. J., K. L. Judd, and M. Juillard (2010). Computational suite of models with heterogeneous agents: Incomplete markets and aggregate uncertainty. Journal of Economic Dynamics and Control 34, 1 – 3. Den Haan, W. J. and A. Marcet (1990). Solving the stochastic growth model by parameterizing expectations. Journal of Business & Economic Statistics 8(1), 31–34. Den Haan, W. J., G. Ramey, and J. Watson (2000). Job destruction and propagation of shocks. American Economic Review 90(3), 482–498. Den Haan, W. J. and P. Rendahl (2010). Solving the incomplete markets model with aggregate uncertainty using explicit aggregation. Journal of Economic Dynamics and Control 34, 69 – 78.
876
Bibliography
Dennis, J. E. and R. B. Schnabel (1983). Numerical Methods for Unconstrained Optimization and Nonlinear Equations. Englewood Cliffs, NJ: Prentice-Hall. Diamond, P. A. (1965). National debt in a neoclassical growth model. American Economic Review 55(5), 1126–1150. Diamond, P. A. (1982). Wage determination and efficiency in search equilibrium. Review of Economic Studies 49, 212 – 227. Díaz-Giménez, J., V. Quadrini, and J.-V. Ríos-Rull (1997). Dimensions of inequality: Facts on the US distributions of earnings, income, and wealth. Quarterly Review Federal Reserve Bank of Minneapolis 21(2), 3–21. Dotsey, M. and P. Ireland (1996). The welfare costs of inflation in general equilibrium. Journal of Monetary Economics 37(1), 29–47. Duffy, J. and P. D. McNelis (2001). Approximating and simulating the stochastic growth model: Parameterized expectations, neural networks, and the genetic algorithm. Journal of Economic Dynamics & Control 25(9), 1273–1303. Eggertsson, G., N. R. Mehrotra, and J. A. Robbins (2019). A model of secular stagnation: Theory and qualitative evaluation. American Economic Journal: Macroeconomics 11(1), 1 – 48. Epstein, L. G. and S. E. Zin (1989). Substitution, risk aversion, and the temporal behavior of consumption and asset returns: A theoretical framework. Econometrica 57, 937–969. Erceg, C. J., D. W. Henderson, and A. T. Levin (2000). Optimal monetary policy with staggered wage and price contracts. Journal of Monetary Economics 46, 281–313. Erosa, A. and G. Ventura (2004). On inflation as a regressive consumption tax. Journal of Monetary Economics 49(4), 761–795. Evans, O. J. (1983). Tax policy, the interest elasticity of saving, and capital accumulation: Numerical analysis of theoretical models. American Economic Review 83, 398 – 410. Fair, R. C. and J. B. Taylor (1983). Solution and maximum likelihood estimation of dynamic nonlinear rational expectations models. Econometrica 51(4), 1169– 1185. Favero, C. (2001). Applied Macroeconometrics. Oxford: Oxford University Press. Fehr, H., M. Kallweit, and F. Kindermann (2013). Should pensions be progressive? European Economic Review 63, 94 – 116. Fehr, H. and F. Kindermann (2018). Introduction to Computational Economics Using Fortran. Oxford: Oxford University Press. Fehrle, D. and C. Heiberger (2020). The return on everything and the business cycle in production economies. Volkswirtschaftliche Diskussionsreihe 338, Universität Augsburg, Institut für Volkswirtschaftslehre. Fernández-Villaverde, J., S. Hurtado, and G. Nuño (2023). Financial frictions and the wealth distribution. Econometrica 91(3), 869–901. Fernández-Villaverde, J. and O. Levintal (2018). Solution methods for models with rare disasters. Quantitative Economics 9, 903–944.
Bibliography
877
Fernández-Villaverde, J., J. Marbet, G. Nuño, and O. Rachedi (2023). Inequality and the zero lower bound. Working Paper 31282, National Bureau of Economic Research (NBER). Fernández-Villaverde, J., J. F. Rubio-Ramírez, and F. Schorfheide (2016). Solution and estimation methods for DSGE models. In J. B. Taylor and H. Uhlig (Eds.), Handbook of Macroeconomics, Volume 2A, pp. 527–724. Amsterdam: NorthHolland. Fernández-Villaverde, J. and D. Z. Valencia (2018). A practical guide to parallelization in economics. Working Paper 24561, National Bureau of Economic Research (NBER). Finn, M. G. (1995). Variance properties of Solow’s productivity residual and their cyclical implications. Journal of Economic Dynamics & Control 19(5-7), 1249–1281. Francis, N. and V. A. Ramey (2005). Is the technology-driven real business cycle hypothesis dead? Shocks and aggregate fluctuations revisited. Journal of Monetary Economics 52, 1379–1399. Fuchs-Schündeln, N., D. Krueger, and M. Sommer (2010). Inequality trends for Germany in the last two decades: A tale of two countries. Review of Economic Dynamics 13(1), 103 – 132. Gagnon, J. E. (1990). Solving the stochastic growth model by deterministic extended path. Journal of Business & Economic Statistics 8(1), 35–36. Galí, J. (1999). Technology, employment, and the business cycle: Do technology schocks explain aggregate fluctuations? American Economic Review 89, 249– 271. Galí, J. (2015). Monetary Policy, Inflation, and the Business Cycle (2nd ed.). Princeton, NJ and Oxford: Princeton University Press. Galí, J., J. D. López-Salido, and J. Vallés (2007). Unterstanding the effects of government spending on consumption. Journal of the European Economic Association 5(1), 227–270. Galindev, R. and D. Lkhagvasuren (2010). Discretization of highly persistent correlated AR(1) shocks. Journal of Economic Dynamics and Control 34, 1260– 1276. García-Peñaloa, C. and S. J. Turnovsky (2011). Taxation and income distribution dynamics in a neoclassical growth model. Journal of Money, Credit and Banking 43(8), 1543 – 1577. Gertler, M. (1999). Government debt and social security in a life-cycle economy. Carnegie-Rochester Conference Series on Public Policy 50, 61 – 110. Goldfarb, D. (1976). Factorized variable metric methods for unconstrained optimization. Mathematics of Computation 30(136), 796–811. Golub, G. H. and C. F. Van Loan (1996). Matrix Computations (3rd ed.). Baltimore, MD and London: The Johns Hopkins University Press. Golub, G. H. and J. H. Welsch (1969). Calculation of Gauss quadrature rules. Mathematics of Computation 106, 221–230. Gomme, P. (1998). Evolutionary programming as a solution technique for the Bellman equation. Working Paper 9816, Federal Reserve Bank of Cleveland.
878
Bibliography
Gomme, P. and P. Klein (2011). Second-order approximation of dynamic models without the use of tensors. Journal of Economic Dynamics and Control 35, 604–615. Gomme, P. and P. Rupert (2007). Theory, measurement, and calibration of macroeconomic models. Journal of Monetary Economics 54(2), 460–497. Gorman, W. M. (1953). Community preference fields. Econometrica 21, 63 – 80. Gospodinov, N. and D. Lkhagvasuren (2014). A moment-matching method for approximating vector autoregessive processes by finite-state Markov chains. Journal of Applied Econometrics 29, 843–859. Gourio, F. (2012). Disaster risk and business cycles. American Economic Review 102, 2724–2766. Grandmont, J.-M. (2008). Nonlinear difference equations, bifurcations and chaos: An introduction. Research in Economics, 122–177. Greene, W. H. (2012). Econometric Analysis (7th ed.). Boston: Pearson. Greenwood, J., Z. Hercowitz, and G. W. Huffman (1988). Investment, capacity utilization, and the real business cycle. American Economic Review 78(3), 402–417. Grüner, H. P. and B. Heer (2000). Optimal flat-rate taxes on capital - a reexamination of Lucas’ supply side model. Oxford Economic Papers 52(2), 289– 305. Guckenheimer, J. and P. Holmes (1986). Nonlinear Oscillations, Dynamical Systems, and Bifurcations of Vector Fields (2nd ed.). New York: Springer-Verlag. Guerrieri, L. and M. Iacoviello (2015). Occbin: A toolkit for solving dynamic models with occasionally binding constraints easily. Journal of Monetary Economics 70(1), 22–38. Guvenen, F. (2009). An empirical investigation of labor income processes. Review of Economic Dynamics 12, 58–79. Guvenen, F., F. Karahan, S. Ozkan, and J. Song (2015). What do data on millions of U.S. workers reveal about life-cycle earnings risk? Econometrica 89(5), 2303–2339. Guvenen, F., S. Ozkan, and J. Song (2014). The nature of countercyclical income risk. Journal of Political Economy 122(3), 621–660. Hagedorn, M. and I. Manovskii (2008). The cyclical behavior of equilibrium unemployment and vacancies revisited. American Economic Review 98, 1692– 1706. Halmos, P. R. (1974). Measure Theory. New York: Springer-Verlag. Hamilton, J. D. (1994). Time Series Analysis. Princeton, NJ: Princeton University Press. Hamilton, J. D. (2018). Why you should never use the Hodrick-Prescott filter. Review of Economics and Statistics 100(5), 831–843. Hansen, G. D. (1985). Indivisible labor and the business cycle. Journal of Monetary Economics 16(3), 309–327. Hansen, G. D. (1993). The cyclical and secular behavior of the labor input: Comparing efficiency units and hours worked. Journal of Applied Econometrics 8(1), 71–80.
Bibliography
879
Hansen, G. D. and A. ˙Imrohoro˘ glu (1992). The role of unemployment insurance in an economy with liquidity constraints and moral hazard. Journal of Political Economy 100(1), 118–142. Hansen, L. P. and T. J. Sargent (2014). Recursive Models of Dynamic Linear Economies. Princeton and Oxford: Princeton University Press. Harris, M. (1987). Dynamic Economic Analysis. New York: Oxford University Press. Heathcote, J., F. Perri, and G. L. Violante (2010). Unequal we stand: An empirical analysis of economic inequality in the United States, 1967-2006. Review of Economic Dynamics 13, 15 – 51. Heathcote, J., K. Storesletten, and G. L. Violante (2010). The macroeconomic implications of rising wage inequality in the United States. Journal of Political Economy 118, 681 – 722. Heathcote, J., K. Storesletten, and G. L. Violante (2017). Optimal tax progressivity: An analytical framework. Quarterly Journal of Economics 132(4), 1693–1754. Heckman, J. J., L. Lochner, and C. Taber (1998). Explaining rising wage inequality: Explorations with a dynamic general equilibrium model of labor earnings with heterogeneous agents. Review of Economic Dynamics 1(1), 1–58. Heer, B. (2001a). On the welfare gain from stabilizing cyclical fluctuations. Applied Economic Letters 8(5), 331–334. Heer, B. (2001b). Wealth distribution and optimal inheritance taxation in life-cycle economies. Scandinavian Journal of Economics 103, 445 – 465. Heer, B. (2003a). Employment and welfare effects of a two-tier unemployment compensation system. International Tax and Public Finance 10, 147 – 168. Heer, B. (2003b). Welfare costs of inflation in a dynamic economy with search unemployment. Journal of Economic Dynamics and Control 28, 255 – 272. Heer, B. (2004). Nonsuperneutrality of money in the Sidrauski model with heterogenous agents. Economics Bulletin 5(8), 1–6. Heer, B. (2018). Optimal pensions in aging economies. B.E. Journal of Macroeconomics 18, 1–19. Heer, B. (2019). Public Economics. The Macroeconomic Perspective. Cham: Springer. Heer, B. and A. Irmen (2014). Population, pensions and endogenous economic growth. Journal of Economic Dynamics and Control 46, 50–72. Heer, B. and A. Maußner (2008). Computation of business cycle models: A comparison of numerical methods. Macroeconomic Dynamics 12(5), 641–663. Heer, B. and A. Maußner (2009a). Dynamic General Equilibrium Modelling. (2nd ed.). Berlin: Springer. Heer, B. and A. Maußner (2009b). Education briefing: Computation of businesscycle models with the generalized Schur method. Indian Growth and Development Review 12(5), 641–663. Heer, B. and A. Maußner (2011). Value function iteration as a solution method for the Ramsey model. Journal of Economics and Statistics 231(4), 494–515. Heer, B. and A. Maußner (2012). The burden of unanticipated inflation: Analysis of an overlapping generations model with progressive income taxation and staggered prices. Macroeconomic Dynamics 16, 278–308.
880
Bibliography
Heer, B. and A. Maußner (2013). Asset returns, the business cycle, and the labor market. German Economic Review 14(3), 372–397. Heer, B. and A. Maußner (2018). Projection methods and the curse of dimensionality. Journal of Mathematical Finance 8, 317–334. Heer, B., A. Maußner, and P. D. McNelis (2011). The money-age distribution: Empirical facts and the limits of three monetary models. Journal of Macroeconomics 33(3), 390–405. Heer, B., A. Maußner, and B. Süßmuth (2018). Cyclical asset returns in the consumption and investment goods sector. Review of Economic Dynamics 28, 1094–2025. Heer, B., V. Polito, and M. R. Wickens (2020). Population aging, social security and fiscal limits. Journal of Economic Dynamics and Control 116. https://doi.org/10.1016/j.jedc.2020.103913. Heer, B. and S. Rohrbacher (2021). Endogenous longevity and optimal tax progressivity. Journal of Health Economics 79. Heer, B. and C. Scharrer (2018). The age-specific burdens of short-run fluctuations in government spending. Journal of Economic Dynamics and Control 90, 45–75. Heer, B. and M. Trede (2003). Efficiency and distribution effects of a revenueneutral income tax reform. Journal of Macroeconomics 25(1), 87–107. Heiberger, C. (2017). Asset Prices, Epstein-Zin Utility, and Endogenous Economic Disasters. Ph. D. thesis, University of Augsburg. Heiberger, C. (2020). Labor market search, endogenous disasters and the equity premium puzzle. Journal of Economic Dynamics and Control 114. Heiberger, C., T. Klarl, and A. Maußner (2015). On the uniqueness of solutions to rational expectations models. Economic Letters 128, 14–16. Heiberger, C., T. Klarl, and A. Maußner (2017). On the numerical accuracy of first-order approximate solutions to DSGE models. Macroeconomic Dynamics 21, 1811–1826. Heiberger, C. and A. Maußner (2020). Perturbation solutions and welfare costs of business cycles in DSGE models. Journal of Economic Dynamics and Control 113. http://doi.org/10.1016/j.jedc.2019.103819 . Heiberger, C. and H. Ruf (2019). Epstein-Zin utility, asset prices, and the business cycle revisited. German Economic Review 20(4), 730–758. Herbst, E. P. and F. Schorfheide (2016). Bayesian Estimation of DSGE Models. Princeton and Oxford: Princeton University Press. Hercowitz, Z. and M. Sampson (1991). Output growth, the real wage, and employment fluctuations. American Economic Review 81(5), 1215–1237. Hernandez, K. (2013). A system reduction method to efficiently solve DSGE models. Journal of Economic Dynamics and Control 37, 571–576. Herrera, F., M. Lozano, and J. L. Verdegay (1998). Tackling real-coded genetic algorithms: Operators and tools for behavioural analysis. Artificial Intelligence Review 12(4), 265–319. Hirsch, M. W. and S. Smale (1974). Differential Equations, Dynamical Systems, and Linear Algebra. New York: Academic Press.
Bibliography
881
Hodrick, R. J. and E. C. Prescott (1980). Post-war U.S. business cycles: An empirical investigation. Discussion Paper 451, University of Warwick. Hodrick, R. J. and E. C. Prescott (1997). Postwar U.S. business cycles: An empirical investigation. Journal of Money, Credit and Banking 29(1), 1–16. Holland, A. and A. Scott (1998). The determinants of UK business cycles. Economic Journal 108, 1067–1092. Hubbard, R. G. and K. L. Judd (1987). Social security and individual welfare: Precautionary saving, borrowing constraints, and the payroll tax. American Economic Review 77(4), 630–646. Hubbard, R. G., J. Skinner, and S. P. Zeldes (1995). Precautionary saving and social insurance. Journal of Political Economy 103(2), 360–399. Huggett, M. (1993). The risk-free rate in heterogeneous-agent incompleteinsurance economies. Journal of Economic Dynamics & Control 17(5-6), 953– 969. Huggett, M. (1996). Wealth distribution in life-cycle economies. Journal of Monetary Economics 38(3), 469–494. Huggett, M. (1997). The one-sector growth model with idiosyncratic shocks: Steady states and dynamics. Journal of Monetary Economics 39, 385 – 403. Huggett, M. and S. Ospina (2001). Aggregate precautionary saving: When is the third derivative irrelevant? Journal of Monetary Economics 48(2), 373–3964. Huggett, M. and G. Ventura (1999). On the distributional effects of social security reform. Review of Economic Dynamics 2, 498 – 531. Huggett, M. and G. Ventura (2000). Understanding why high income households save more than low income households. Journal of Monetary Economics 45(2000), 361–397. Huggett, M., G. Ventura, and A. Yaron (2011). Sources of lifetime inequality. American Economic Review 101, 2923 – 2954. Hurd, M. D. (1989). Mortality risk and bequests. Econometrica 57(4), 799–813. Iliopulos, E., F. Langot, and T. Sopraseuth (2018). Welfare costs of fluctuations when labor market search interacts with financial frictions. Journal of Money, Credit and Banking 51, 2207–2237. ˙Imrohoro˘ glu, A. (1989a). Cost of business cycle with indivisibilities and liquidity constraints. Journal of Politial Economy 97, 1364–1383. ˙Imrohoro˘ glu, A. (1989b). Cost of business cycles with indivisibilities and liquidity constraints. Journal of Political Economy 97(6), 1364–1383. ˙Imrohoro˘ glu, A., S. ˙Imrohoro˘ glu, and D. H. Joines (1995). A life cycle analysis of social security. Economic Theory 6(1), 83–114. ˙Imrohoro˘ glu, A., S. ˙Imrohoro˘ glu, and D. H. Joines (1998). The effect of taxfavored retirement accounts on capital accumulation. American Economic Review 88, 749–768. ˙Imrohoro˘ glu, A., C. Kumru, and A. Nakornthab (2018). Revisiting top income taxes. Working Paper in Economics and Econometrics 660, Australian National University (ANU). ˙Imrohoro˘ glu, S. (1998). A quantitative study of capital income taxation. International Economic Review 39, 307–328.
882
Bibliography
Irmen, A. and A. Maußner (2017). A note on the characterization of the neoclassical production function. Macroeconomic Dynamics 21(7), 1827–1835. Jermann, U. J. (1998). Asset pricing in production economies. Journal of Monetary Economics 41(2), 257–275. Jin, H.-H. and K. L. Judd (2002). Perturbation methods for general dynamic stochastic models. Mimeo. Jordà, O., K. Knoll, D. Kuvshinov, M. Schularick, and A. M. Taylor (2019). The rate of return on everything, 1870-2015. Quarterly Journal of Economics 134(3), 1225–1298. Jordà, O., M. Schularick, and A. M. Taylor (2019). The total risk premium puzzle. Working Paper 25653, National Bureau of Economic Research (NBER). Judd, K. L. (1992). Projection methods for solving aggregate growth models. Journal of Economic Theory 58, 410–452. Judd, K. L. (1998). Numerical Methods in Economics. Cambridge, MA and London: MIT Press. Judd, K. L. and S.-M. Guu (1997). Asymptotic methods for aggregate growth models. Journal of Economic Dynamics and Control 21, 1025–1042. Judd, K. L., L. Maliar, and S. Maliar (2011). Numerically stable and accurate stochastic simulation approaches for solving dynamic economic models. Quantitative Economics 2, 173–210. Judd, K. L., L. Maliar, S. Maliar, and R. Valero (2014). Smolyak method for solving dynamic economic models: Lagrange interpolation, anisotropic grid and adaptive domain. Journal of Economic Dynamic and Control 44, 92–123. Judge, G. G., R. Carter Hill, W. E. Griffiths, and H. Lütkepohl (1988). Introduction to the Theory and Practice of Econometrics (2nd ed.). New York: John Wiley & Sons. Jung, J. and C. Tran (2016). Market inefficiency, insurance mandate and welfare: U.S. health care reform 2010. Review of Economic Dynamics 20, 132 – 159. Kågström, B. and P. Poromaa (1994). Lapack-style algorithms and software for solving the generalized Sylvester equation and estimating the separation between regular matrix pairs. Working Notes UT, CS-94-237, LAPACK. Kamien, M. I. and N. L. Schwartz (1981). Dynamic Optimization, The Calculus of Variations and Optimal Control in Economics and Management. Amsterdam: North-Holland. Kamihigashi, T. (2002). A simple proof of the necessity of the transversality condition. Economic Theory 20(2), 427–433. Kamihigashi, T. (2005). Necessity of the transversality condition for stochastic models with bounded or CRRA utility. Journal of Economic Dynamics and Control 29, 1313–1329. Kaplan, G. (2012). Inequality and the life cycle. Quantitative Economics 3, 471 – 525. Kaplan, G., K. Mitman, and G. L. Violante (2020). The housing boom and bust: Model meets evidence. Journal of Political Economy 128(9), 3285 – 3345. Kaplan, G., B. Moll, and G. L. Violante (2018). Monetary policy according to HANK. American Economic Review 108(3), 7697 – 743.
Bibliography
883
Kaplan, G. and G. L. Violante (2014). A model of the consumption response to fiscal stimulus payments. Econometrica 82(4), 1199 – 1239. Kessler, D. and A. Masson (1989). Bequest and wealth accumulation: Are some pieces of the puzzle missing? Journal of Economic Perspectives 3, 141 – 152. Keynes, J. M. (1930). F.P. Ramsey, Obituary. Economic Journal 40(157), 153–154. Kim, J., S. Kim, E. Schaumburg, and C. A. Sims (2008, 10). Calculating and using second-order accurate solutions of discrete time dynamic equilibrium models. Journal of Economic Dynamics and Control 32, 3397–3414. Kindermann, F. and D. Krueger (2022). High marginal tax rates on the top 1%? Lessons from a life cycle model with idiosyncratic income risk. American Economic Journal: Macroeconomics 14, 319–366. King, R. G., C. I. Plosser, and S. T. Rebelo (1988a). Production, growth and business cycles I. The basic neoclassical model. Journal of Monetary Economics 21(2-3), 195–232. King, R. G., C. I. Plosser, and S. T. Rebelo (1988b). Production, growth and business cycles II. New directions. Journal of Monetary Economics 21(2-3), 309–341. King, R. G. and S. T. Rebelo (1993). Low frequency filtering and real business cycles. Journal of Economic Dynamics and Control 17(1-2), 207–231. King, R. G. and M. W. Watson (1998). The solution of singular linear difference systems under rational expectations. International Economic Review 39(4), 1015–1026. Kirkby, R. (2017). A toolkit for value function iteration. Computational Economics 49, 1–15. Kitagawa, G. (1977). An algorithm for solving the matrix equation x = f x f t + s. International Journal of Control 25(5), 745–753. Kitao, S. (2014). Sustainable social security: Four options. Review of Economic Dynamics 17, 756 – 779. Kiyotaki, N. and J. Moore (1997). Credit cycles. Journal of Political Economy 105, 211 – 248. Klein, P. (2000). Using the generalized Schur form to solve a multivariate linear rational expectations model. Journal of Economic Dynamics and Control 24, 1405–1423. Kochenderfer, M. J. and T. A. Wheeler (2019). Algorithms for Optimization. Cambridge, MA and London: MIT Press. Kocherlakota, N. R. (1996). The equity premium: It’s still a puzzle. Journal of Economic Literature 34(1), 42–71. Koopmans, T. C. (1965). On the concept of optimal economic growth (reprinted in scientific papers of Tjalling C. Koopmans, pp. 485-547). In Pontifical Academy of Sciences (Ed.), The Econometric Approach to Development Planning, chap. 4, pp. 225–287. Amsterdam and Chicago: North-Holland and Rand McNally. Kopecky, K. A. and R. M. Suen (2010). Finite state Markov-chain approximations to highly persistent processes. Review of Economic Dynamics 13(3), 701–714. Kremer, J. (2001). Arbeitslosigkeit, Lohndifferenzierung und wirtschaftliche Entwicklung. Lohmar, Köln: Josef Eul Verlag.
884
Bibliography
Krueger, D. and F. Kubler (2004). Computing equilibrium in OLG models with stochastic production. Journal of Economic Dynamics and Control 28, 1411– 1436. Krueger, D. and A. Ludwig (2007). On the consequences of demographic change for rates of returns to capital, and the distribution of wealth and welfare. Journal of Monetary Economics 54, 49–87. Krueger, D. and A. Ludwig (2013). Optimal progressive labor income taxation and education subsidies when education decisions and intergenerational transfers are endogenous. American Economic Review 103, 496–501. Krueger, D., K. Mitman, and F. Perri (2016). Macroeconomics and household heterogeneity. In J. B. Taylor and H. Uhlig (Eds.), Handbook of Macroeconomics, pp. 843–921. Elsevier. Krueger, D., F. Perri, L. Pistaferrri, and G. L. Violante (2010). Cross-sectional facts for macroeconomists. Review of Economic Dynamics 13, 1–14. Krusell, P., L. E. Ohanian, J.-V. Ríos-Rull, and G. L. Violante (2000). Capital-skill complementarity and inequality: A macroeconomic analysis. Econometrica 68, 1029 – 1054. Krusell, P. and A. A. Smith (1998). Income and wealth heterogeneity in the macroeconomy. Journal of Political Economy 106(5), 867–896. Kuehn, L.-A., N. Petrosky-Nadeau, and L. Zhang (2012). An equilibrium asset pricing model with labor market search. Working Paper 17742, National Bureau of Economic Research (NBER). Kydland, F. E. (1984). Labor-force heterogeneity and the business cycle. CarnegieRochester Conference Series on Public Policy 21, 173 – 208. Kydland, F. E. and E. C. Prescott (1982). Time to build and aggregate fluctuations. Econometrica 50(6), 1345–1370. Laitner, J. (1990). Tax changes and phase diagrams for an overlapping generations model. Journal of Political Economy 98, 193 – 220. Lang, S. (1986). Introduction to Linear Algebra (2nd ed.). New York: Springer. Lang, S. (1993). Real and Functional Analysis (3rd ed.). New York: Springer. Lang, S. (1997). Undergraduate Analysis (2nd ed.). New York: Springer. Lillard, L. A. and R. J. Willis (1978). Dynamic aspects of earnings mobility. Econometrica 46(5), 985–1012. Lim, G. C. and P. D. McNelis (2008). Computational Macroeconomics for the Open Economy. Cambridge, MA and London: MIT Press. Ljungqvist, L. and T. J. Sargent (2018). Recursive Macroeconomic Theory (4th ed.). Cambridge, MA and London: MIT Press. Long, J. B. and C. I. Plosser (1983). Real business cycles. Journal of Political Economy 91(1), 36–69. Loury, G. C. (1981). Intergenerational transfers and the distribution of earnings. Econometrica 49, 843 – 867. Lucas, R. E. (1987). Models of Business Cycles. New York: Blackwell. Lucas, R. E. (1988). On the mechanics of economic development. Journal of Monetary Economics 22(1), 3–42.
Bibliography
885
Lucas, R. E. (1990). Supply-side economics: An analytical review. Oxford Economic Papers 42(2), 293–316. Lucke, B. (1998). Theorie und Empirie realer Konjunkturzyklen. Heidelberg: Physica-Verlag. Ludwig, A. (2007). The Gauss-Seidel-quasi-Newton method: A hybrid algorithm for solving dynamic economic models. Journal of Economic Dynamics and Control 31, 1610–1632. Luenberger, D. G. (1969). Optimization by Vector Space Methods. New York: John Wiley & Sons. Lütkepohl, H. (2005). New Introduction to Multiple Time Series Analysis. Berlin: Springer. MaCurdy, T. E. (1982). The use of time series processes to model the error structure of earnings in longitudinal data analysis. Journal of Econometrics 18(1), 83– 114. Magnus, J. R. and H. Neudecker (1999). Matrix Differential Calculus with Applications in Statistics and Econometrics (3rd ed.). Chichester: John Wiley &Sons. Maliar, L. and S. Maliar (2001). Heterogeneity in capital and skills in a neoclassical stochastic growth model. Journal of Economic Dynamics and Control 25(9), 1367–1397. Maliar, L. and S. Maliar (2003). The representative consumer in the neoclassical growth model with idiosyncratic shocks. Review of Economic Dynamics 6(2), 368–380. Maliar, L., S. Maliar, J. B. Taylor, and I. Tsener (2020). A tractable framework for analyzing a class of nonstationary Markov models. Quantitative Economics 11, 1289–1323. Maliar, L., S. Maliar, and I. Tsener (2022). Capital-skill complementarity and inequality: Twenty years after. Economics Letters 220, 110844. Maliar, L., S. Maliar, and F. Valli (2010). Solving the incomplete markets model with aggregate uncertainty using the Krusell-Smith algorithm. Journal of Economic Dynamics and Control 34, 42–49. Maliar, L., S. Maliar, and P. Winant (2019). Will artificial intelligence replace computational economists any time soon? CEPR discussion paper series DP 14024. Maliar, L., S. Maliar, and P. Winant (2021). Deep learning for solving dynamic economic models. Journal of Monetary Economics 122, 76–101. Malin, B. A., D. Krueger, and F. Kubler (2011). Solving the multi-country real business cycle model using a Smolyak-collocation method. Journal of Economic Dynamics and Control 35, 229–239. Mankiw, N. G., M. Weinzierl, and D. Yagan (2009). Optimal taxation in theory and practice. Journal of Economic Perspectives 23(4), 147 – 174. Marcet, A. and G. Lorenzoni (1999). The parameterized expectations approach: Some practical issues. In R. Marimon and A. Scott (Eds.), Computational Methods for the Study of Dynamic Economies, pp. 143–171. Oxford: Oxford University Press.
886
Bibliography
Mas-Colell, A., M. D. Whinston, and J. R. Green (1995). Microeconomic Theory. New York and Oxford: Oxford University Press. Mason, J. C. and D. Handscomb (2003). Chebyshev Polynomials. Boca Raton: Chapman & Hall/CRC. Maußner, A. (2004). Endogenous growth with nominal frictions. Journal of Economics 83(1), 1–47. Maußner, A. (2010). The analytics of New Keynesian Phillips curves. Volkswirtschaftliche Diskussionsreihe 313, Institut für Volkswirtschaftslehre der Universität Augsburg. Maußner, A. (2023). Computational macroeconomics. Lecture Notes, University of Augsburg. Maußner, A. and J. Spatz (2006). Determinants of business cycles in small scale macroeconomic models: The German case. Empirical Economics 31, 925–950. McGrattan, E. R. (1999). Application of weighted residual methods to dynamic economic models. In R. Marimon and A. Scott (Eds.), Computational Methods for the Study of Dynamic Economies, pp. 114–142. Oxford: Oxford University Press. Mehra, R. (Ed.) (2008). Handbook of the Equity Risk Premium. Amsterdam: Elsevier. Mehra, R. and E. C. Prescott (1985). The equity premium: A puzzle. Journal of Monetary Economics 15(2), 145–161. Merz, M. (1995). Search in the labor market and the real business cycle. Journal of Monetary Economics 36, 269–300. Meyer-Gohde, A. and J. Saecker (2022). Solving linear DSGE models with Newton methods. Goethe-Universität Frankfurt and Institute for Monetary and Financial Stability (IMFS), https://www.imfs-frankfurt.de/fileadmin/ user_upload/Meyer-Gohde/Publikationen/newton_DSGE.pdf . Michalewicz, Z. (1996). Genetic Algorithms + Data Structures = Evolution Programs (3rd ed.). Berlin, Heidelberg, New York: Springer. Miranda, M. J. and P. L. Fackler (2002). Applied Computational Economics and Finance. Cambridge, MA, London: MIT Press. Mise, E., T.-H. Kim, and P. Newbold (2005). On suboptimality of the HodrickPrescott filter at time series endpoints. Journal of Macroeconomics 27, 53–67. Mitchell, M. (1996). An Introduction to Genetic Algorithms. Cambridge, MA and London: MIT Press. Mogensen, P. K. and A. N. Riseth (2018). Optim: A mathematical optimization package for Julia. Journal of Open Source Software 3(24). Monacelli, T., R. Perotti, and A. Trigari (2010). Unemployment fiscal multipliers. Journal of Monetary Economics 57, 531 – 553. Mortensen, D. (1982). The matching process as a noncooperative bargaining game. In J. McCall (Ed.), The Economics of Information and Uncertainty, pp. 233–258. University Chicago Press. Murata, Y. (1977). Mathematics for Stability and Optimization of Economic Systems. New York: Academic Press.
Bibliography
887
Muth, J. F. (1961). Rational expectations and the theory of price movements. Econometrica 29(3), 315–335. Neidinger, R. D. (2010). Introduction to automatic differentiation and MATLAB object-oriented programming. SIAM Review 52(3), 545–563. Nelson, C. R. and C. I. Plosser (1982). Trends and random walks in macroeconomic time series: Some evidence and implications. Journal of Monetary Economics 10(2), 139–162. Neusser, K. (2016). Time Series Econometrics. Springer. Nishiyama, S. and K. Smetters (2007). Does social security privatization produce efficiency gains? Quarterly Journal of Economics 122, 1677 – 1719. Nocedal, J. and S. J. Wright (2006). Numerical Optimization (2nd ed.). New York: Springer. Nowak, U. and L. Weimann (1991). A family of Newton codes for systems of highly nonlinear equations. Technical Report TR-91-10, Konrad-Zuse-Zentrum für Informationstechnik Berlin. OECD (2015). Pensions at a Glance 2015: OECD and G20 indicators. Paris: OECD Publishing. OECD (2019). Pensions at a Glance 2019: OECD and G20 indicators. Paris: OECD Publishing. Ozlu, E. (1996). Aggregate economic fluctuations in endogenous growth models. Journal of Macroeconomics 18, 27–47. Palis Jr., J. and W. de Melo (1982). Geometric Theory of Dynamical Systems. New York: Springer. Peterman, W. B. and K. Sommer (2019). How well did social security mitigate the effects of the Great Recession? International Economic Review 60(3), 1433– 1466. Petrosky-Nadeau, N., L. Zhang, and L.-A. Kuehn (2018). Endogenous disasters. American Economic Review 108(8), 2212–2245. Piketty, T. (2014). Capital in the Twenty-First Century. Cambridge, MA: Harvard University Press. Piketty, T. and E. Saez (2003). Income inequality in the United States 1913-1998. Quarterly Journal of Economics 118(1), 1–39. Pissarides, C. (1985). Short-run dynamics of unemployment, vacancies, and real wages. American Economic Review 75, 676–690. Plosser, C. I. (1989). Understanding real business cycles. Journal of Economic Perspectives 3(3), 51–77. Pratt, J. W. (1964). Risk aversion in the small and in the large. Econometrica 32(1/2), 122–136. Prescott, E. C. (1986). Theory ahead of business cycle measurement. Quarterly Review Federal Reserve Bank of Minneapolis 10(4), 9–22. Press, W. H., S. A. Teukolsky, W. T. Veterling, and B. P. Flannery (1992). Numerical Recipes in FORTRAN: The Art of Scientific Computing. (2nd ed.). Cambridge, MA: Cambridge University Press.
888
Bibliography
Puterman, M. L. and S. L. Brumelle (1979). On the convergence of policy iteration in stationary dynamic programming. Mathematics of Operations Research 4(1), 60–69. Puterman, M. L. and M. C. Shin (1978). Modified policy iteration algorithms for discounted Markov decision problems. Management Science 24(11), 1127– 1237. Quadrini, V. (2000). Entrepreneurship, saving and social mobility. Review of Economic Dynamics 3(1), 1–40. Quadrini, V. and J.-V. Ríos-Rull (1997). Understanding the U.S. distribution of wealth. Federal Reserve Bank of Minneapolis Quarterly Review 21, 22–36. Quadrini, V. and J.-V. Ríos-Rull (2015). Inequality in macroeconomics. In A. Atkinson and F. Bourguignon (Eds.), Handbook of Income Distribution, pp. 1229–1302. North-Holland. Quarteroni, A., R. Sacco, and F. Saleri (2007). Numerical Mathematics (2nd ed.). Berlin: Springer. Radner, R. (1966). Optimal growth in a linear-logarithmic economy. International Economic Review 7(1), 1–33. Rall, L. B. and G. F. Corliss (1996). An introduction to automatic differentiation. In M. Bery, C. H. Bischof, G. F. Corliss, and A. Griewank (Eds.), Computational Differentiation: Techniques, Applications, and Tools, pp. 1–21. Philadelphia, PA: SIAM. Ramey, V. A. (2021). The macroeconomic consequences of infrastructure investment. In E. L. Glaeser and J. M. Poterba (Eds.), Economic Analysis and Infrastructure Investment, pp. 219–276. Chicago: University of Chicago Press. Ramsey, F. (1928). A mathematical theory of saving. Economic Journal 38(152), 543–559. Ravn, M. O., S. Schmitt-Grohé, and M. Uribe (2012). Consumption, government spending, and the real exchange rate. Journal of Monetary Economics 59, 215–234. Ravn, M. O. and H. Uhlig (2002). On adjusting the HP-filter for the frequency of observations. Review of Economics and Statistics 84(2), 371–380. Reiter, M. (2010). Solving the incomplete markets model with aggregate uncertainty by backward induction. Journal of Economic Dynamics and Control 34, 28 – 35. Rietz, T. A. (1988). The equity risk premium. Journal of Monetary Economics 22, 117–131. Ríos-Rull, J.-V. (1993). Working in the market, working at home, and the acquisition of skills: A general-equilibrium approach. American Economic Review 8(4), 893–907. Ríos-Rull, J.-V. (1996). Life-cycle economies and aggregate fluctuations. Review of Economic Studies 63, 465–489. Ríos-Rull, J.-V. (1999). Computation of equilibria in heterogenous-agent models. In R. Marimon and A. Scott (Eds.), Computational Methods for the Study of Dynamic Economies, pp. 238–264. Oxford: Oxford University Press.
Bibliography
889
Rogerson, R. and R. Shimer (2012). Search in macroeconomic models of the labor market. In Handbook of Labor Economics, Volume 4 A, pp. 619–700. San Diego: North-Holland. Rogerson, R., R. Shimer, and R. Wright (2005). Search-theoretic models of the labor market: A survey. Journal of Economic Literature 43(4), 959–988. Romer, P. (1986). Increasing returns and long run growth. Journal of Political Economy 94, 1002–1087. Romer, P. (1991). Increasing returns and new developments in the theory of growth. In W. A. Barnett, A. Mas-Colell, J. Gabszewicz, C. D’Aspremont, and B. Cornet (Eds.), Economic Equilibrium Theory and Applications, pp. 83–110. Cambridge MA: Cambridge University Press. Rotemberg, J. J. and M. Woodford (1992). Oligopolistic pricing and the effects of aggregate demand on economic activity. Journal of Political Economy 100, 1153 – 1207. Rouwenhorst, K. G. (1995). Asset pricing implications of equilibrium business cycle models. In T. F. Cooley (Ed.), Frontiers of Business Cycle Research, pp. 294–330. Princeton, NJ: Princeton University Press. Rupert, P. and G. Zanella (2015). Revisiting wage, earnings and hours profiles. Journal of Monetary Economics 72, 114 – 130. Saez, E. (2001). Using elasticities to derive optimal income tax rates. Review of Economic Studies 68(1), 205 – 229. Saez, E. and G. Zucman (2014). Wealth inequality in the United States since 1913: Evidence from capitalized income tax data. Working Paper 20625, National Bureau of Economic Research (NBER). Samuelson, P. A. (1958). An exact consumption-loan model of interest with or without the social contrivance of money. Journal of Political Economy 66(6), 467–482. Santos, M. S. (2000). Accuracy of numerical solutions using the Euler equation residuals. Econometrica 68(6), 1377– 1402. Sargent, T. J. (1987). Macroeconomic Theory (2nd ed.). London, Oxford, Boston, New York, San Diego: Academic Press. Sargent, T. J. (1993). Bounded Rationality in Macroeconomics. Oxford: Clarendon Press. Schäfer, H. and J. Schmidt (2009). Einkommensmobilität in Deutschland. IWTrends 36(2), 91 – 105. Schmitt-Grohé, S. and M. Uribe (2003). Closing small open economy models. Journal of International Economics 61, 163–185. Schmitt-Grohé, S. and M. Uribe (2004). Solving dynamic general equilibrium models using a second-order approximation to the policy function. Journal of Economic Dynamics and Control 28(4), 755–775. Schmitt-Grohé, S. and M. Uribe (2005). Optimal fiscal and monetary policy in a medium-scale macroeconomic model. NBER Macroeconomics Annual 20, 383–425.
890
Bibliography
Schmitt-Grohé, S. and M. Uribe (2006). Comparing two variants of Calvo-type wage stickiness. Working Paper 12740, National Bureau of Economic Research (NBER). Schmitt-Grohé, S. and M. Uribe (2007). Optimal simple and implementable monetary and fiscal rules. Journal of Monetary Economics 54, 1702–1725. Schüler, Y. S. (2018). On the cyclical properties of Hamilton’s regression filter. Discussion Paper 03/2018, Deutsche Bundesbank. Sergi, F. (2020). The standard narrative about DSGE models in central banks’ technical reports. The European Journal of the History of Economic Thought 27(2), 163–193. Shimer, R. (2005). The cyclical behavior of equilibrium unemployment and vacancies. American Economic Review 95, 25–49. Shorrocks, A. F. (1976). Income mobility and the Markov assumption. Economic Journal 86(343), 566–578. Sidrauski, M. (1967). Rational choice and patterns of growth in a monetary economy. American Economic Review 57(2), 534–544. Sieg, H. (2000). Estimating a dynamic model of household choices in the presence of income taxation. International Economic Review 43(3), 637–668. Sims, C. A. (2002). Solving linear rational expectations models. Computational Economics 20, 1–20. Smets, F. and R. Wouters (2003). An estimated dynamic stochastic general equilibrium model of the euro area. Journal of the European Economic Association 1(5), 1123–1175. Smets, F. and R. Wouters (2007). Shocks and frictions in U.S. business cycles: A Bayesian DSGE approach. American Economic Review 97, 586–606. Smolyak, S. A. (1963). Quadrature and interpolation formulas for tensor products of certain classes of functions. Dokl. Akad. Nauk SSSR 148, 1042–1045. Solow, R. M. (1956). A contribution to the theory of economic growth. Quarterly Journal of Economics 70, 65–94. Solow, R. M. (1988). Growth Theory: An Exposition. New York and Oxford: Oxford University Press. Starchurski, J. (2022). Economic Dynamics: Theory and Computation (2nd ed.). Cambridge, MA: MIT Press. Starr, R. M. (1997). General Equilibrium Theory: An Introduction. Cambridge, MA: Cambridge University Press. Stock, J. H. and M. W. Watson (2012). Introduction to Econometrics (3rd ed.). Boston: Pearson. Stoer, J. and R. Bulirsch (2002). Introduction to Numerical Analysis (3rd ed.). New York: Springer. Stokey, N. L., R. E. Lucas, and E. C. Prescott (1989). Recursive Methods in Economic Dynamics. Cambridge, MA and London: Harvard University Press. Storesletten, K., C. I. Telmer, and A. Yaron (2001). The welfare cost of business cycles revisited: Finite lives & cyclical variation in idiosyncratic risk. European Economic Review 45(7), 1311–1339.
Bibliography
891
Storesletten, K., C. I. Telmer, and A. Yaron (2004). Consumption and risk sharing over the business cycle. Journal of Monetary Economics 51, 609–633. Storesletten, K., C. I. Telmer, and A. Yaron (2007). Asset pricing with idiosyncratic risk and overlapping generations. Review of Economic Dynamics 10, 519–548. Stroud, A. (1971). Approximate Calculation of Multiple Integrals. Englewood Cliffs, NJ: Prentice-Hall. Summers, L. H. (1981). Capital taxation and accumulation in a life cycle growth model. American Economic Review 71, 533 – 544. Summers, L. H. (2014). U.S. economic prospects: Secular stagnation, hysteresis, and the zero lower bound. Business Economics 49(2), 65 – 73. Sundaram, R. K. (1996). A First Course in Optimization Theory. Cambridge, MA: Cambridge University Press. Swan, T. W. (1956). Economic growth and capital accumulation. Economic Record 32, 334–361. Swanson, E. T. (2012). Risk aversion and the labor margin in dynamic equilibrium models. American Economic Review 102(4), 1663–1691. Swanson, E. T. (2018). Risk aversion, risk premia, and the labor margin with generalized recursive preferences. Review of Economic Dynamics 28, 290–321. Swanson, E. T., G. Anderson, and A. T. Levin (2006). Higher-order perturbation solutions to dynamic, discrete time rational expecations models. Working Paper Series 2006-01, Federal Reserve Bank of San Fransisco. Sydsæter, K., A. Strøm, and P. Berck (1999). Economists’ Mathematical Manual (3rd ed.). Berlin: Springer. Takayama, A. (1985). Mathematical Economics (2nd ed.). Cambridge, MA: Cambridge University Press. Tauchen, G. (1986). Finite state Markov-chain approximations to univariate and vector autoregressions. Economic Letters 20(2), 177–181. Terry, S. J. and E. S. Knotek II (2011). Markov-chain approximations of vector autoregressions: Application of general multivariate-normal integration techniques. Economics Letters 110, 4–6. Trabandt, M. and H. Uhlig (2011). The Laffer curve revisited. Journal of Monetary Economics 58, 305 – 327. Trick, M. A. and S. E. Zin (1993). A linear programming approach to solving stochastic dynamic programs. Gisa working paper, Cargegie Mellon University, Tepper School of Business. https://EconPapers.repec.org/RePEc: cmu:gsiawp:4. Trick, M. A. and S. E. Zin (1997). Spline approximations to value functions. Macroeconomic Dynamics 1(1), 255–277. Uhlig, H. (1999). A toolkit for analysing nonlinear dynamic stochastic models easily. In R. Marimon and A. Scott (Eds.), Computational Methods for the Study of Dynamic Economies, pp. 30–61. Oxford: Oxford University Press. Uhlig, H. (2007). Explaining asset prices with external habits and wage rigidities in a DSGE model. American Economic Review, Papers & Proceedings 97, 239–243. UN (2015). World population prospects: The 2015 revision, Methodology of the United Nations population estimates and projections. Working Paper ESA/P/WP.
892
Bibliography
242, United Nations, Department of Economic and Soical Affairs, Population Division. Uribe, M. and S. Schmitt-Grohé (2017). Open Economy Macroeconomics. Princeton and Oxford: Princeton University Press. Ventura, G. (1999). Flat tax reform: A quantitative exploration. Journal of Economic Dynamics and Control 23, 1425–1458. Walsh, C. E. (2010). Monetary Theory and Policy (3rd ed.). Cambridge, MA and London: MIT Press. Walter, E. (2014). Numerical Methods and Optimization, A Consumer Guide. Cham: Springer. Weil, P. (1990). Nonexpected utility in macroeconomics. Quarterly Journal of Economics 105, 29–42. Wickens, M. R. (2011). Macroeconomic Theory: A Dynamic General Equilibrium Approach (2nd ed.). Princeton and Oxford: Princeton University Press. Woodford, M. (2003). Interest and Prices. Foundations of a Theory of Monetary Policy. Princeton, NJ and Oxford: Princeton University Press. Wright, B. D. and J. C. Williams (1984). The welfare effects of the introduction of storage. Quarterly Journal of Economics 99(1), 169–192. Yaari, M. E. (1965). Uncertain lifetime, life insurance, and the theory of the consumer. Review of Economic Studies 32, 137 – 150. Young, E. R. (2005). Approximate aggregation. Computing in Economics and Finance, Society for Computational Economics 141.
Name Index
Acemoğlu, Daron, xii, 12, 15, 25, 29, 426
Aghion, Philippe, 172
Aiyagari, Rao S., xi, 437, 469
Aldrich, Eric M., 370
Algan, Yann, 510
Allais, Maurice, 3, 544
Allais, Olivier, 510
Amisano, Gianni, 58
Anderson, Gary, 97
Andersson, Joel, 109
Andolfatto, David, 265
Andreasen, Martin M., 79, 97, 167, 175
Antony, Jürgen, 23
Arrow, Kenneth J., 39
Aruoba, S. Borağan, 66, 263, 379, 573
Atkinson, Anthony Barnes, 469
Auerbach, Alan J., 544, 545, 579, 580, 624, 688
Badel, Alejandro, 467
Barro, Robert J., 5, 41, 265, 275, 277, 295, 616
Barthelmann, Volker, 784
Baxter, Marianne, 59, 172, 867, 869
Becker, Gary S., 661
Bellman, Richard E., 14, 434
Benhabib, Jess, 22, 366
Berck, Peter, 45, 295, 728, 733, 767, 768, 800
Bertsekas, Dimitri P., 7
Bewley, T., xi
Bick, Alexander, 555
Bini, Dario A., 91
Binning, Andrew, 79, 97, 103–105, 119
Blanchard, Olivier J., 79, 543, 662, 687, 688
Blundell, Richard, 649
Boldrin, Michele, 526
Boppart, Timo, 37, 38, 70, 624
Bourguignon, François, 469
Boyd, John P., 241, 247, 766, 769, 773, 774
Brandner, Peter, 868, 869
Braun, R. Anton, 526, 566, 616, 674
Brinca, Pedro, 693
Brock, William, 5
Bronson, Richard, 727
Brooks, Robin, 617, 673
Browning, Martin, 55
Brüggemann, Bettina, 662
Brumelle, Shelby L., 376
Budría Rodríguez, Santiago, 422, 424, 531, 628, 657, 658, 698, 709
Bulirsch, Roland, 756, 799
Burden, Richard L., 770, 794
Burkhauser, Richard V., 468
Burnside, Craig, 160
Cagetti, Marco, 662
Cai, Yongyang, 393
Caldara, Dario, 288
Calvo, Guillermo A., 199
Canova, Fabio, xi, 51, 53, 58
Carnahan, Brice, 819
Carter Hill, R., 727, 744, 801
Caselli, Francesco, 428
Cass, David, 5
Castañeda, Ana, 471–473, 513, 514, 518, 526–528, 530, 531, 538, 539, 628, 694, 713
Caucutt, Elisabeth M., 471
Chari, Varadarajan V., 126
Chatterjee, Satyajit, 426, 428
Chawla, M.M., 800
Chebyshev, Pafnuty Lvovich, 766
Chiu, W. Henry, 431
Chow, Gregory C., 11
Christiano, Lawrence J., ix, x, 67, 172, 186, 187, 206, 212, 225, 235, 312, 336, 344, 350, 352, 429, 526, 867
Cogley, Timothy, 59, 867
Cooley, Thomas F., 51, 170, 355, 555
Corless, Robert M., 775
Corliss, George F., 108
Correia, Isabel, 312, 323, 327, 328
Costain, James, 431
Daly, Moira, 467
Dave, Chetan, xi, 51, 53
Davidson, Russell, 64, 857
de Melo, Welington, 852, 853
De Nardi, Mariacristina, 614, 615, 660–662
De Wind, Joris, 167
Deaton, Angus, 546
Debreu, Gerard, 44
DeJong, David N., xi, 51, 53
Den Haan, Wouter J., 167, 235, 265, 270, 344, 431, 437, 453, 490, 505, 510, 512, 514, 515
Dennis, John E., 812, 813, 815, 821–828, 831, 833, 836, 837
Diamond, Peter A., 264, 526, 544
Díaz-Giménez, Javier, 421, 422, 424, 464, 465, 467, 471–473, 513, 514, 518, 526–528, 530, 531, 538, 539, 597, 628, 657, 658, 694, 698, 709, 713
Diehl, Moritz, 109
Dotsey, Michael, 407
Duffy, John, 235, 344, 789, 845
Eggertsson, Gauti, 530, 612, 629, 721, 722
Eichenbaum, Martin, ix, x, 172, 187, 206, 212, 225, 312, 336, 350, 352, 429
Epstein, Larry G., 276, 285
Erceg, Christopher J., 188
Erosa, Andrés, 406, 408
Euler, Leonhard, 13
Evans, Charles L., ix, x, 187, 206, 212, 312, 350, 352
Evans, Owen J., 544
Fackler, Paul L., xi
Fair, Ray C., 311
Faires, Douglas, 770, 794
Favero, Carlo, 58
Fehr, Hans, xi, 615, 656
Fehrle, Daniel, 276, 278, 297, 302
Fella, Giulio, 614
Fernández-Villaverde, Jesús, 53, 66, 167, 175, 231, 236, 247, 263, 278, 283, 288, 290, 291, 343, 370, 379, 573, 577, 578, 789
Fillion, Nicolas, 775
Finn, Mary G., 307, 308
Fisher, Jonas D. M., 235, 344, 526
Fitzgerald, Terry J., 867
Flannery, Brian P., 761, 781, 836 Francis, Neville, 212 French, Eric, 662 Fuchs-Schündeln, Nicola, 466, 555 Gagnon, Joseph E., 311 Galí, Jordi, 212, 226, 688 Galindev, Ragchassuren, 397, 865 Gallant, A. Ronald, 370 García-Peñalosa, Cecilia, 415, 419, 423 Gertler, Mark, 543, 662 Giannini, Carlo, 58 Gillis, Joris, 109 Goldfarb, Donald, 837 Golub, Gene H., 727, 740, 742–744, 766, 805, 823 Gomme, Paul, 51, 97, 103, 119, 138, 845 Gorman, W. M., 426–428 Gorodnichenko, Yuriy, 688 Gospodinov, Nikolay, 390 Gourio, François, 276, 302, 304 Grandmont, Jean-Michel, 853 Green, Jerry R., 44, 426 Greene, William H., 94, 170, 339, 727, 758 Greenwood, Jeremy, 43, 327, 474 Griffiths, William E., 727, 744, 801 Grüner, Hans Peter, 364 Guckenheimer, John, 852, 853 Guerrieri, Luca, 398 Guu, Sy-Ming, 67 Guvenen, Fatih, 470, 690 Hagedorn, Marcus, 265, 270 Halmos, Paul R., 236 Hamilton, James D., 59, 161, 163, 170, 172, 178, 848, 864, 867 Handscomb, David, 765, 766, 770, 771, 800 Hansen, Gary D., 47, 56, 355, 414, 431, 481, 555, 563, 569, 614, 640, 678, 698 Hansen, Lars Peter, 55, 88
Harris, Milton, 15 Heathcote, Jonathan, 424, 469–471 Heckman, James J., 55, 469 Heer, Burkhard, xi, 66, 67, 128, 129, 252, 255, 263, 294, 308, 309, 364, 382, 385, 431, 456, 466, 471, 472, 476, 478, 482, 494, 525, 526, 561, 579, 592, 593, 597, 614–617, 621, 623, 639, 640, 655, 660, 661, 674, 688, 689, 694 Heiberger, Christopher, 92, 119, 120, 122, 123, 145, 265, 270, 274, 276, 278, 291, 294, 297, 302, 306 Henderson, Dale W., 188 Henriksen, Espen, 555 Herbst, Edward P., xi, 53 Hercowitz, Zvi, 22, 43, 327, 474 Hernandez, Kolver, 119, 123, 126 Herrera, Francisco, 839, 842–845 Hirsch, Morris W., 737 Hodrick, Robert J., 172, 867 Holland, Allison, 364 Holmes, Philip, 852, 853 Holter, Hans A., 693 Holtz-Eakin, Douglas, 468 Howitt, Peter, 172 Hubbard, Robert G., 469, 544, 615, 660 Huffman, Gregory W., 43, 327, 474 Huggett, Mark, xi, 437, 443, 445, 447, 457, 459, 467, 469, 481, 491, 525, 552, 614, 640, 642, 661, 663 Hurd, Michael D., 697 Hurtado, Samuel, 343 Iacoviello, Matteo, 398 Iannazzo, Bruno, 91 Ikeda, Daisuke, 566, 674 Iliopulos, Eleni, 526 İmrohoroğlu, Ayşe, 431, 437, 481, 542, 615
İmrohoroğlu, Selahattin, 437, 471, 615 Ireland, Peter, 407 Irmen, Andreas, 5, 579, 593 Jermann, Urban J., 276, 291, 294, 526 Jin, He-Hui, 79, 97 Joines, Douglas H., 437, 615, 616 Jones, John B., 662 Jordà, Òscar, 275, 276, 285 Joyce, Robert, 649 Judd, Kenneth L., xi, 67, 79, 81, 97, 231, 247, 248, 280, 336, 338, 339, 344, 345, 347, 369, 373, 393, 514, 544, 744, 749, 774, 778, 783, 786, 787, 800, 805 Judge, George G., 727, 744, 801 Juillard, Michel, 514 Jung, Juergen, 663 Kågström, Bo, 135 Kahn, Charles M., 79 Kallweit, Manuel, 615 Kamien, Morton I., 217 Kamihigashi, Takashi, 13, 28 Kaplan, Greg, 555, 566, 693 Karahan, Fatih, 470, 690 Karni, Edi, 431 Kehoe, Patrick J., 126 Kessler, Denis, 661 Keynes, John M., 5 Kim, Jinill, 79, 167 Kim, Sunghyun, 79, 167 Kim, Tae-Han, 867 Kindermann, Fabian, xi, 615, 656, 662 King, Robert G., 43, 47, 54, 55, 59, 70, 79, 92, 164, 172, 867, 869 Kirkby, Robert, 370 Kitagawa, Genshiro, 157 Kitao, Sagiri, 555, 615, 616, 663 Kiyotaki, Nobuhiro, 526 Klarl, Torben, 92, 119, 120, 122, 123
NAME INDEX Klein, Paul, 92, 97, 103, 119, 121, 138 Knoll, Katharina, 275, 276, 285 Knotek II, Edward S., 390 Kochenderfer, Mykel J., 812 Kocherlakota, Narayana R., 456 Koopmans, Tjalling C., 5 Kopecky, Karen A., 865 Kotlikoff, Laurence J., 544, 545, 579, 580, 624 Kremer, Jana, 375 Krueger, Dirk, 466, 469, 593, 605, 628, 640, 658, 662, 690, 692, 698, 783 Krusell, Per, 37, 38, 70, 437, 485, 490–492, 494, 505, 508–512, 515, 516, 542, 624, 627, 628, 659, 674, 692, 693, 698, 704, 723 Kubler, Felix, 783 Kuehn, Lars-Alexander, 265, 270, 274 Kumar, Krishna, 471 Kumru, Cagri, 662 Kuvshinov, Dmitry, 275, 276, 285 Kydland, Finn E., 47, 79, 181, 184, 186, 422 Lagakos, David, 555 Laitner, John, 685 Lang, Serge, 237, 292, 727, 748, 749, 752, 753, 762, 797 Langot, François, 526 Levin, Andrew T., 97, 188 Levintal, Oren, 278, 283, 290, 291 Lillard, Lee A., 469 Lim, Guay C., 343, 789 Ljungqvist, Lars, xii, 488, 848, 861 Lkhagvasuren, Damba, 390, 397, 865 Lochner, Lance, 469 Long, John B., 21 Lontzek, Thomas S., 393 López-Salido, J. David, 226, 688 Lorenzoni, Guido, 235, 344
NAME INDEX Loury, Glenn C., 661 Lozano, Manuel, 839, 842–845 Lucas, Robert E., xii, 12, 15, 25, 29, 172, 227, 364, 434, 525, 526, 548 Lucke, Bernd, 37, 55 Ludwig, Alexander, 469, 588, 589, 593, 605, 640, 698 Luenberger, David G., 236, 237, 239 Luther, H.A., 819 Lütkepohl, Helmut, 95, 157, 727, 732, 734, 744, 801, 848 MacKinnon, James G., 64, 857 MaCurdy, Thomas E., 469 Magnus, Jan R., 104, 727 Malafry, Laurence, 693 Maliar, Lilia, viii, 280, 311, 336, 338, 339, 343–345, 347, 422, 428, 429, 509, 723, 744, 783, 786, 787, 789 Maliar, Serguei, viii, 280, 311, 336, 338, 339, 343–345, 347, 422, 428, 429, 509, 723, 744, 783, 786, 787, 789 Malin, Benjamin A., 783 Mankiw, N. Gregory, 640 Manovskii, Iourii, 265, 270 Marbet, Joel, 789 Marcet, Albert, 235, 344 Mas-Colell, Andreu, 44, 426 Mason, John C., 765, 766, 770, 771, 800 Masson, André, 661 Maußner, Alfred, 5, 23, 37, 66, 67, 92, 119, 120, 122, 123, 128, 129, 145, 221, 252, 255, 263, 266, 294, 308, 309, 355, 382, 385, 621, 623, 674, 688, 694 McGrattan, Ellen R., 126, 231 McNelis, Paul D., 235, 343, 344, 621, 623, 789, 845 Mehra, Rajnish, 275, 456 Mehrotra, Neil R., 530, 612, 629, 721, 722
897 Meini, Beatrice, 91 Merz, Monika, 265, 306 Meyer-Gohde, Alexander, 119 Michalewicz, Zbigniew, 839, 844 Michelangeli, Valentina, 393 Miranda, Mario J., xi Mirman, Leonard, 5 Mise, Emi, 867 Mitchell, Melanie, 839, 841, 842 Mitman, Kurt, 628, 658, 662, 690, 692, 693 Mogensen, Patrick Kofod, 558 Moll, Benjamin, 693 Monacelli, Tommaso, 688 Moore, John, 526 Morrisson, Christian, 469 Mortensen, Dale, 264, 526 Murata, Yasuo, 850 Murphy, Daniel, 688 Muth, John F., 26 Nakajima, Tomoyuki, 526 Nakornthab, Arm, 662 Nason, James M., 59, 867 Neidinger, Richard D., 109 Nelson, Charles R., 34 Neudecker, Heinz, 104, 727 Neusser, Klaus, 848, 868, 869 Neves, Joao, 312, 323, 327, 328 Newbold, Paul, 867 Nishiyama, Shinichi, 615, 616 Nocedal, Jorge, 812 Norris Keiller, Agnes, 649 Novak, Erich, 784 Nowak, U., 323 Nuño, Galo, 343, 789 Nybom, Martin, 467 OECD, 570, 640, 678 Ohanian, Lee E., 723 Ospina, Sandra, 445 Ozkan, Serdar, 470, 690 Ozlu, Elvan, 227 Palis Jr., Jacob, 852, 853 Perotti, Roberto, 687, 688
Perri, Fabrizio, 424, 466, 470, 628, 658, 662, 690, 692 Peterman, William B., 631 Petrosky-Nadeau, Nicolas, 265, 270, 274 Piketty, Thomas, 465, 466 Pissarides, Christopher, 264, 526 Pistaferri, Luigi, 466 Plosser, Charles I., 21, 34, 43, 47, 54, 55, 70, 79 Polito, Vito, 561 Poromaa, Peter, 135 Pratt, John W., 39 Prescott, Edward C., xii, 12, 15, 25, 29, 47, 51, 52, 79, 170, 172, 181, 184, 186, 275, 434, 456, 548, 678, 867 Press, William H., 761, 781, 836 Puterman, Martin L., 376, 377 Quadrini, Vincenzo, 421, 422, 424, 464, 465, 467, 531, 597, 628, 657, 658, 660–662, 698, 709 Quarteroni, Alfio, 756 Rachedi, Omar, 789 Radner, Roy, 21 Rall, Louis B., 108 Ramey, Garey, 265, 270, 431 Ramey, Valerie A., 212, 229 Ramsey, Frank, 3, 4 Ravn, Morten O., 539, 688, 867, 869 Rebelo, Sergio T., 43, 47, 54, 55, 70, 79, 164, 312, 323, 327, 328, 867 Reiter, Michael, 510 Rendahl, Pontus, 510 Rhody, Stephen E., 468 Rietz, Thomas A., 275 Ríos-Rull, José-Víctor, 421, 422, 424, 437, 442, 464, 465, 467, 471–473, 501, 513, 514, 518, 526–528, 530, 531, 538, 539, 597, 628, 636, 657, 658, 694, 698, 709, 713
Riseth, Asbjørn Nilsen, 558 Ritter, Klaus, 784 Robbins, Jacob A., 530, 612, 629, 721, 722 Rogerson, Richard, 265, 366 Rohrbacher, Stefan, 640 Romer, Paul, 11, 172 Rotemberg, Julio J., 688 Rouwenhorst, K. Geert, 396, 511, 865 Rubio-Ramírez, Juan F., 53, 66, 167, 175, 231, 236, 247, 263, 288, 370 Ruf, Halvor, 291, 294 Rupert, Peter, 51, 569 Rustichini, Aldo, 22 Sacco, Riccardo, 756 Saecker, Johanna, 119 Saez, Emmanuel, 465, 466, 640 Sala-i-Martin, Xavier, 5, 41 Saleri, Fausto, 756 Sampson, Michael, 22 Samuelson, Paul A., 3, 544 Santos, Manuel S., 650 Sargent, Thomas J., xii, 88, 235, 488, 615, 789, 848, 861 Schäfer, Holger, 468 Scharrer, Christian, 526, 617, 674, 688 Schaumburg, Ernst, 79, 167 Schmidt, Jörg, 468 Schmitt-Grohé, Stephanie, 79, 92, 95, 97, 158, 187, 188, 196, 206, 215, 328, 334, 335, 526, 678, 688 Schnabel, Robert B., 812, 813, 815, 821–828, 831, 833, 836, 837 Schorfheide, Frank, xi, 53, 231, 236, 247 Schularick, Moritz, 275, 276, 285 Schüler, Yves S., 867 Schwartz, Nancy L., 217 Scott, Andrew, 364 Sergi, Francesco, 187
Shimer, Robert, 265 Shin, Moon Chirl, 377 Shorrocks, Anthony F., 469, 473 Sidrauski, Miguel, 482 Sieg, Holger, 479 Sims, Christopher A., 79, 92, 167 Skinner, Jonathan, 469, 544, 615, 660 Smale, Stephen, 737 Smets, Frank, ix, 135, 155, 187, 192, 196, 210, 214–216, 225, 227 Smetters, Kent, 615, 616 Smith, Anthony A., 437, 485, 490–492, 494, 505, 508–512, 515, 516, 542, 627, 628, 659, 674, 692, 693, 698, 704 Smolyak, S. A., 783 Solow, Robert M., 3, 33, 70 Sommer, Kamila, 631 Sommer, Mathias, 466 Song, Jae, 470, 690 Sopraseuth, Thepthida, 526 Spatz, Julius, 37 Stachurski, John, xi Starr, Ross M., 44 Stock, James H., 339 Stoer, Josef, 756, 799 Stokey, Nancy L., xii, 12, 15, 25, 29, 434, 548 Storesletten, Kjetil, 469, 471, 525, 597, 617, 673, 698 Strøm, Arne, 45, 295, 728, 733, 767, 768, 800 Stroud, A.H., 801, 802, 807, 808 Su, Che-Lin, 393 Suen, Richard M.H., 865 Summers, Lawrence H., 530, 544 Sundaram, Rangarajan K., 7, 835 Süßmuth, Bernd, 294, 688 Swan, Trevor W., 3 Swanson, Eric T., 43, 75, 97, 285 Sydsæter, Knut, 45, 295, 728, 733, 767, 768, 800
Taber, Christopher, 469 Takayama, Akira, 36 Tauchen, George, 396, 511, 864, 865 Taylor, Alan M., 275, 276, 285 Taylor, John B., 311 Telmer, Chris I., 525, 597, 617, 673, 698 Terry, Stephen J., 390 Teukolsky, Saul A., 761, 781, 836 Todd, Richard M., 186 Tomes, Nigel, 661 Trabandt, Mathias, 336, 474, 561, 563, 569, 570, 597, 606, 640, 722 Tran, Chung, 663 Trede, Mark, 466, 471, 472, 476, 478 Trick, Michael A., 391, 392, 397 Tsener, Inna, 311, 723 Turnovsky, Stephen J., 415, 419, 423 Uhlig, Harald, 92, 160, 308, 309, 474, 539, 561, 563, 569, 570, 597, 606, 640, 722, 867, 869 UN, 569, 591, 639, 678, 721 Uribe, Martin, 79, 92, 95, 97, 158, 187, 188, 196, 206, 215, 328, 334, 335, 526, 678 Ursúa, José F., 265, 275, 277, 295 Valencia, David Zarruk, 370, 577, 578 Valero, Rafel, 280, 783, 786, 787 Vallés, Javier, 226, 688 Valli, Fernando, 509 Van Loan, Charles F., 727, 740, 742–744, 823 Ventura, Gustavo, 406, 408, 469, 471, 478, 491, 552, 614, 615, 642, 663 Ventura, Jaume, 428 Verdegay, José L., 839, 842–845 Vetterling, William T., 761, 781, 836
Vigfusson, Robert, 212 Violante, Giovanni L., 424, 466, 469–471, 566, 693, 723 Walsh, Carl E., 187 Walter, Éric, 756, 798 Watson, Joel, 265, 270, 431 Watson, Mark W., 92, 339 Weil, Philippe, 276, 285 Weimann, L., 323 Weinzierl, Matthew, 640 Welsch, John H., 766, 805 Wheeler, Tim A., 812 Whinston, Michael D., 44, 426 Wickens, Michael R., 561, 629 Wilkes, James O., 819 Williams, Jeffrey C., 235 Willis, Robert J., 469 Winant, Pablo, viii, 343, 789 Woodford, Michael, 187, 688 Wouters, Ralf, ix, 135, 155, 187, 192, 196, 210, 214–216, 225, 227 Wright, Brian D., 235 Wright, Randall, 265, 366 Wright, Stephen J., 812 Yaari, Menahem E., 543 Yagan, Danny, 640 Yang, Fang, 614, 661 Yaron, Amir, 525, 552, 597, 617, 673, 698 Young, Eric R., 491 Zanella, Giulio, 569 Zeldes, Stephen P., 469, 615, 660 Zhang, Lu, 265, 270, 274 Ziliak, James P., 649 Zin, Stanley E., 276, 285, 391, 392, 397 Zucman, Gabriel, 465
Subject Index
Accidental bequests definition, 567 derivation, 618–619 Accumulated contributions, 632 Accuracy measures, see Euler equation residuals in the Krusell-Smith algorithm, 512–513, 515–516 Adjustment costs of capital, 206, 291, 327 Algebraic multiplicity, 737, 850 Annuity markets, 593, 721 Anticipated inflation, 358 Approximation methods, see Extended path, see Interpolation, see Perturbation, see Stochastic simulation, see Value function iteration, see Weighted residuals Arbitrage, 637 Asset asset market, incomplete, 456–463 asset-based means test, 660 constraint, 432 distribution, 433–456, 487 no-arbitrage condition, 637
Auerbach-Kotlikoff model, 543, 579–581, 627 local stability, 685 Autoregressive process AR(1), 52, 55, 317, 384, 469, 531, 675, 678, 859, 864, 865 AR(2), 473, 531, 860 ARMA(1,2), 469 VAR(1), 94, 156, 165, 320, 859, 860 Backward induction, 550, 553 Backward iteration, 115, 849 Balanced growth path, 33, 50, 70, 159, 311 of the small open economy, 329 Banking deposits, 350, 353 Banking sector, 350 Basis complete set of polynomials, 252, 261, 263, 778 of Rn , 730 orthonormal, 238, 740 tensor product, 251, 261, 263, 778 Basis functions, 246–248 Bellman equation, 10, 14, 65, 371, 399 in the OLG model, 557
in the search and matching model, 267 in the simple heterogeneous agent model, 434 in the stochastic LQ problem, 89 of the retired worker, 647 Benchmark business cycle model, 47–51, 172, 176–181, 233, 245, 250, 260–264, 321–323, 345–349, 401–403 Bequests and wealth inequality, 614 in the OLG model, 592, 721 BFGS, see Quasi-Newton method Bisection method, 815–816 Borrowing rate, 519 Boundary value problem, 854–856 Broyden algorithm, see Broyden’s method Broyden’s method, 586, 588–590, 598, 608, 609, 624, 822–823 Burn-in period, 165, 264, 323, 347, 403 Business cycle statistics in Germany, 60 in the benchmark model, 60 in the model with limited participation, 359 in the New Keynesian model, 216 in the OLG model, 689 in the small open economy, 334 in the US, 213 of the income distribution, 713 Calibration definition, 51–53 demographic transition, 597 income dynamics, 471, 531 of the benchmark model, 53–56 of the disaster risk model, 278 of the limited participation model, 355 of the New Keynesian model, 196 of the OLG model, 569
SUBJECT INDEX of the search and matching model, 270 of the small open economy model, 328 unemployment dynamics, 472 Calvo price staggering, 199–201 Canonical DSGE model, 93–95 Capital market equilibrium, 595, 637 international, 325 Capital utilization, 206, 307 Cash-in-advance, 350, 407 Certainty equivalence, 40, 91, 100, 319, 320, 336 Certainty equivalent, 40 CES production, 22 CES utility, 22 Change of variable technique, 254, 800–801, 803–805 Chebyshev coefficients, 560, 772 nodes, 559 polynomials, 559, 766 regression, 771–774, 781 truncation theorem, 773 Cholesky factorization, see Matrix factorization Cobb-Douglas technology, 71 Cobb-Douglas utility, 592 Coefficient of relative risk aversion, 39, 285, 431 Collocation, see Weighted residuals, collocation Complex numbers, 162, 727–729, 849 Composite function, 98, 103, 132, 133, 135, 136, 140, 143 Condition number, 744, 758 Conditional expectations, 26, 29, 90, 138, 234, 243, 383, 518 Consistency conditions aggregate, 437, 486, 489 Constraint asset constraint, 432, 457 budget constraint, 46, 430, 434
SUBJECT INDEX credit constraint, 432, 457 liquidity constraint, 518–525, 615 Consumption-age profile in the OLG model with uncertainty, 657 with age-dependent productivities, 574 Contraction mapping theorem, 373 Control variable, 89, 93, 183, 320, 393 Convergence distribution function, 447 linear, 376, 377, 813 nonlinear equations, 820 quadratic, 376, 813 rate of, 812 superlinear, 813 Covariance stationarity, 857 CRRA utility, 43 Cubic approximation, see Perturbation, solutions, third-order Cumulative distribution function, 434 Curse of dimensionality, 248, 255, 276, 343, 370, 777, 803 Demographic transition, 590–617 Density function, 434, 442, 449–451 invariant, 450, 522 Dependency ratio, see Old-age dependency ratio Derivative numerical, 110 symbolic, 106 Determinant, 731, 732, 737, 801, 804 Deterministic growth model, see Ramsey model, infinite-horizon deterministic Difference equations forward-looking, 128 linear, 82, 848–851
903 nonlinear, 94, 344, 851–854 stability, 850 stochastic, 121, 311, 321 Difference stationary growth, 33–34, 858 Differentiation automatic, 106, 108–110 numerical, 106, 110, 792–797 symbolic, 106, 110 Dirac measure, 240 Disaster risk model, 275–304 Discount factor, 10, 39, 55 in OLG model, 546 Discretization density function, 449–451 distribution function, 442–449 value function, 369 Distribution approximation of, 442–449, 452–456 ergodic, 444, 453, 454, 513 income, 465 invariant, 443, 452 money-age, 621 stationary, 434, 485, 552–556 US wealth, 420 wealth, 448, 465 Dunlop-Tarshis observation, 429 Dynamic programming, 13, 15, 28, 76, 369, 401, 520 Earnings distribution of, 463–473, 537, 628, 661 inequality, 463–473, 658 mobility, 463–473, 531 Economy, decentralized, 44–47, 429 Eigenspace, 85, 738, 851, 853, 854 Eigenvalue, 83, 320, 736–740, 805, 850, 863 Eigenvector, 94, 736–738, 806, 851, 863 Employment distribution, 508
904 history, 414, 546 mobility, 513 probability, 414, 436, 528 stationary, 438 status, 414, 431, 487, 488, 511 transition matrix, 436, 444, 493 transition probability, 486 Endogenous growth, 172, 227, 593 Equilibrium competitive, 429 decentralized, 429, 581 partial, 523 recursive competitive, 434 stationary, 430, 433–441, 475, 550–552, 620 Equity premium, 456 premium puzzle, 275, 308 Ergodic set, 245, 311, 323, 343, 439, 521 Euclidean space, 93, 246, 748, 762, 800 Euler equation, 13, 234, 317, 356, 432, 502, 553, 680 Euler equation residuals definition, 66 extended path, 323 perturbation, 177 stochastic simulation, 340, 347 VI, 382, 404 weighted residuals, 259, 263 Euler’s theorem, 45, 47, 618 Exchange economy, 457–459 Exponential function, class of, 453–456 Extended path, 64, 312–321, 324, 334, 398, 811, 821 Fertility rate, 600 Final goods producer, 188 Finite difference method, 791 Finite element method, see Weighted residuals Fixed point, 851 Fixed-point iteration, 340
SUBJECT INDEX Forward iteration, 115, 553, 849 Fourier transform, 165 Frequency domain analysis second moments, 159 Frisch elasticity, 42, 43, 310, 360, 563 Function space, 487, 748 Functional equation, 14, 28, 234, 243, 371, 849 Gauss-Chebyshev quadrature, 455 Gauss-Lobatto points, 769 Gauss-Newton algorithm, 344, 838 Gauss-Newton method, 829, 831–834 Gauss-Seidel algorithm, 589, 624, 818–819 Gaussian formulas, see Numerical integration Gaussian plane, 727, 729 Generalized stochastic simulation (GSS), see Stochastic simulation Genetic search algorithm, 340, 341, 838–846 Geometric multiplicity, 737, 850 German Socio-Economic Panel GSOEP, 472, 479 Gini coefficient definition, 421 earnings, 464, 538 income, 464, 479, 538, 698, 709 wages, 466 wealth, 464, 479, 480, 540, 614, 628, 658, 660, 698 Golden section search, 378, 382, 440, 460, 520, 722, 829–831 Goodness of fit R-squared (R2 ) in the KS algorithm, 512–513, 515–516 Gorman preferences, 425–429, 856 Gorman’s Aggregation Theorem, 426 Government
SUBJECT INDEX budget, 433, 475, 499, 548, 582, 677 policy, 433, 475, 488, 549, 579 revenues, 433, 475 Government spending shock in the New Keynesian model, 212 in the OLG model, 686 Grid search, 554 Habits in consumption, 206, 309 in leisure, 309 Hermite polynomial, see Polynomials Hermitian transpose, 121, 733, 740 Hessian matrix, 99, 109, 132, 135, 183, 751, 796, 835, 837 chain rule, 104 numerical approximation, 796 secant approximation, 836 Hicksian efficiency, 615 Hodrick-Prescott filter, 58–60, 160, 173, 178, 848, 858, 867–869 Home production, 366, 518, 529, 533 Homotopy, 315, 556 Howard’s improvement algorithm, see Policy function iteration HP-filter, see Hodrick-Prescott filter Human capital, 428 Idiosyncratic risk income, 470, 630 unemployment, 414 IES, 41, 43, 285, 563, 631 Ill-conditioned problem, 345, 741 Implicit function theorem, 63, 81, 82, 86, 96–98, 747, 752–753 Impulse response function, 56, 173–176 Income capital income, 436, 546, 615 current, 494 cyclical behavior of income shares, 527, 539
905 dispersion of, 464 distribution of, 415, 467, 485, 518, 526–540, 660 heterogeneity of, 464 mobility, 467, 527, 528, 539, 544 tax, 415, 431, 470, 485, 492, 615 Independently and identically distributed, 857 Indicator function, 432, 473 Inflation business-cycle statistics, 215 response to a TFP shock, 212 response to an interest rate shock, 212 Initial value problem (IVP) Perturbing of, 481 Insurance, incomplete, 456–459 Interest rate shock, 204, 331 Intermediate goods, 187 Interpolation bicubic, 780 bilinear, 395, 401, 403, 510, 779–780 Chebyshev, 772 cubic, 378, 388, 558, 623, 760–761 Lagrange, 747, 753–758 linear, 378, 380, 388, 439, 494, 558, 646, 759–760 multidimensional, 777–789 orthogonal, 239, 765–766 shape preserving, 755 Smolyak, 783–787 Intertemporal budget constraint, 426 Intertemporal elasticity of substitution, see IES Investment nonnegative, 397–401 volatility in the benchmark model, 60
906 Jacobian matrix, 83, 98, 109, 120, 254, 587, 589, 608, 609, 752, 804, 820, 823, 834, 852 numerical approximation, 795–796 Jordan factorization, see Matrix factorization Jump variable, 93, 320 Kaldor-Hicks improvement, 616 Karush-Kuhn-Tucker conditions, 7, 12, 353 Karush-Kuhn-Tucker theorem, 7, 8, 11, 399, 405 Kronecker product, 94, 390, 734 Krusell-Smith algorithm, 490–493, 693 Labor effective, 425, 479 productivity, 31, 469 Labor agency, 189 Labor demand, 62, 474 Labor supply, 61, 545, 549, 559, 620, 678 elasticity, 479 endogenous, 471, 500 Frisch elasticity, see Frisch elasticity Labor union, 190 Labor-augmenting technical progress, 32, 33, 35, 37, 44, 48, 54, 60, 172, 182, 186, 190, 195, 276, 325, 328, 351 Labor-supply-age profile in the OLG model with uncertainty, 658 in the standard OLG model, 555 with age-dependent productivities, 575 Laffer curve, 561–578 g-Laffer curve, 576 s-Laffer curve, 576 Lagrange polynomial, see Polynomials
SUBJECT INDEX Lagrangian function, 7, 182, 193, 207, 326, 352, 432, 546 Law of iterated expectations, 224, 405 Law of large numbers, 170, 440 Least squares approximation, 764 Least squares method linear, 231, 498 nonlinear, 339, 829 Leibniz’s rule, 408 Lending rate, 519 l’Hôpital’s rule, 41 Life-cycle model, see Overlapping generations model Life-cycle savings, see Wealth-age profile Lifetime utility in the OLG model, 545 in the Ramsey model, 7 Likelihood function, 127 Line search, 824–828, 833, 834, 836–838 Linear algebra, 119, 135, 377, 727–745 Linear approximation, see Perturbation, solutions, first-order Linear-quadratic model, 79, 88–91, 181 Liquidity effect, 350, 358 Log-linear technology, 21 Logarithmic preferences, 21 Lorenz curve earnings, 465, 657 income, 465, 538 wealth, 465, 540, 658 LU factorization, see Matrix factorization Lyapunov equation, 157 Machine epsilon, 793 Manifold, 853 Markov chain, 29, 250, 270, 383, 384, 390, 431, 471, 641, 860
SUBJECT INDEX approximation of AR(1)-process, 864–866 ergodic distribution, 439, 861 Markov process, 27, 29, 34, 55, 631, 859–866 Markov property, 859 Matching function, 268 Matrix chain rule, 97, 103–105, 119, 131, 139, 140, 142, 143 Matrix factorization Cholesky, 58, 741, 742, 804, 837 Jordan, 738–740, 804, 849 LU, 741, 742, 821 QR, 123, 126, 742, 823 QZ, 121, 122, 740 Schur, 85, 115, 124, 127, 740 singular value, 743 Method of undetermined coefficients, 24, 68, 77 Mobility German income, 468 German wage, 472 US earnings, 467 US income, 467 Modified policy iteration, 377, 379, 388 Moivre’s theorem, 160 Money in the utility function, 482 Sidrauski model, 482 superneutrality, 482 Monomials, see Polynomials Monopolistic competition, 187 Monte-Carlo integration, 341 Monte-Carlo simulation, 451–453, 862 Multicollinearity, 247, 343, 345, 744, 757, 789 Nash bargaining, 267 National accounts, 54 Natural spline, 761 Neoclassical growth model, see Ramsey model
907 Neural network, 337, 343, 345, 348, 349, 787–789 New Keynesian model, 186–216 New Keynesian Phillips curve, 219–222 Newton’s method, 376, 835–836 Newton-Raphson method, 402, 620, 722, 816–822 globally convergent extension, 824–828 No-arbitrage condition, 637 Nominal frictions, 204 Nonlinear equations, 815–828, see Bisection method, see Gauss-Seidel algorithm, see Line search, see Newton-Raphson method, see Trust region Nonlinear least squares, see Least squares method Norm, 81, 91, 237, 342, 376, 729, 748, 812, 820 L 1 , 748 L 2 , 749, 765, 773 sup, 730, 749 Numerical differentiation, see Differentiation Numerical integration, 232, 798–803 Gauss-Hermite, 177, 250, 262, 263, 271, 282, 283, 289, 296, 301, 323, 335, 339–341, 346, 357, 385, 395, 403, 804–806 Gaussian formulas, 799–801 monomial formulas, 801–803, 806–809 Newton-Cotes formulas, 798–799 Trapezoid rule, 798 Numerical optimization, 829–846, see Gauss-Newton method, see Genetic search algorithm, see Golden section search, see Newton’s method, see Quasi-Newton method
908 Old-age dependency ratio, 602 Open economy, see Small open economy model Optimization, see Numerical optimization Overlapping generations model, 3 aggregate uncertainty, 672–714 business cycle dynamics, 695–714 demographic transition, 589–617 individual uncertainty, 628–663 steady state, 545–578 transition dynamics, 579–589 with perfect foresight, 543–617 Parallel computing, 370, 395, 577–578 Partial information, 490–500, 534 Pay-as-you-go, 561 Pension system contributions-based, 632 pay-as-you-go, 561 progressive, 632 Pensions, 546 earnings-related, 620 lump-sum, 629 replacement rate, 579, 678 Periodic functions, 247 Perpetual youth model, 543 Perturbation, 63 framework, 79–110 implementation, 143–145 solutions first-order, 120–130, 684 second-order, 131–138 third-order, 139–143 Tools, 81–82 Phase diagram, 16–20 Phillips curve derivation, 219–222 Policy function choice of, 355 first-order approximation, 120 in the canonical DSGE model, 95
SUBJECT INDEX in the deterministic Ramsey model, 14 in the stochastic LQ model, 89–91 second-order approximation, 102, 131 third-order approximation, 139 Policy function iteration, 376–377, 381, 388 modified, 389 Polynomials Chebyshev, 337, 343, 559, 747, 756, 766–776, 781 Hermite, 337, 343, 747, 766 Lagrange, 747, 755, 756, 761, 779, 780, 798 monomial, 241, 245, 246, 252, 337, 343, 345, 356, 753, 757, 758, 778, 802 orthogonal, 762, 764–765 Smolyak, 783–787 Population growth rate in the US, 591 versus fertility rate, 600 Portfolio adjustment capital and foreign bonds, 333 Power spectrum, 162 Predetermined variable, 233, 320 Prediction error, 515 Preferences, see Utility function Price staggering, 199–201 Principle of optimality, 14, 434 Probability distribution, 522 Production function Cobb-Douglas, 21, 547 constant returns to scale, 5, 433, 547 marginal product, 5, 433, 489 properties, 5 Productivity growth, 32 Productivity-age profile, 570 Projection, see Weighted residuals Pruning, 166–169, 216 Public debt and the fiscal budget, 566
SUBJECT INDEX sustainability, 616 QR factorization, see Matrix factorization Quadratic approximation, see Perturbation, solutions, second-order Quasi-Newton method, 829, 836–838 BFGS, 559 QZ factorization, see Matrix factorization R-squared (R2 ) Accuracy measure in the KS algorithm, 512–513, 515–516 Ramsey model, 3, 4 finite-horizon, 6–7, 10, 312–315 infinite-horizon deterministic, 9–20, 82–88, 166, 315–316, 371, 379, 429 infinite-horizon stochastic, 26, 311, 316–319, 393–397 with nonnegative investment, 397–401 Random number generator, 451, 511 Random walk with drift, 858 Rank, 101, 123, 126, 732, 863 Rate of interest risk-free, 414, 456–463 risk-free rate puzzle, 414 Rational expectations, 26, 79, 119, 235, 399 Rationality, bounded, 492, 511 Recursive methods, xii, 371, 383, 389 Recursive utility, 10 Redistribution intergenerational, 543 of income, 414, 467 Regression in the Krusell-Smith algorithm, 498 linear, 339
nonlinear, 343, 344 Replacement rate of pensions, see Pensions unemployment compensation, 445 Residual function, 235, 240, 248–251 Returns to scale constant, 5, 547 Reverse shooting, 419, 856 Ricardian equivalence, 616 Riccati equation, 90 Richardson’s extrapolation, 109, 110, 794, 796 Risk aversion, 39–41, see CRRA utility Risk premium shock in the NK model, 225 Risk, idiosyncratic, 414, 469, 505, 525 Risk-free rate puzzle, see Rate of interest Root finding, 757, 760 Rouwenhorst method, 865 Runge function, 756, 757, 761, 775 Saddle path, 16–20, 115 Sampling stochastic universal, 841 with replacement, 840 Savings age-dependent, 551 aggregate capital, 646 and intertemporal substitution, 32 and portfolio adjustment, 333 behavior, 302, 495 definition, 70 function, 87, 496 idiosyncratic risk, 695 in the storage economy, 521 life-cycle, 681 lifetime, 610 of the (un)employed worker, 496 old-age, 574 precautionary, 430, 662
910 rate, 23, 447, 491 stocks, 265 Schur decomposition, see Schur factorization Schur factorization, see Matrix factorization Search and matching model, 264–275, 306 Secant Hermite spline, 761 Secant method, 462, 554, 620, 817 Second moments analytical frequency domain, 159–165 time domain, 156–158 Monte-Carlo simulation, 165–169 Seigniorage, 408, 484, 622 Sharpe ratio, 275, 293, 297, 303 Shooting, 854–856 Shooting method, 490 reverse, 419 Simpson’s rule, 799 Simulation, see Monte-Carlo simulation Simulation-based methods, 311–361 Singular value decomposition, see Matrix factorization Small open economy model, 323–335 Smolyak polynomial, see Polynomials Social security, 660 contributions, 546, 620 system, 491 Sparse matrix, 377, 381, 382 Sparse matrix methods, 442 Splines, see Interpolation, cubic, see Interpolation, linear Stability asymptotic, 95, 849 in the OLG model, 685 State space, 398 choice of, 245 discretization, 65, 369, 371, 442, 456, 556, 641
SUBJECT INDEX in the deterministic Ramsey model, 244–246 in the disaster risk model, 275 in the limited participation model, 349 in the search and matching model, 270 in the stochastic Ramsey model, 261 individual, 434, 475, 487, 534 solution methods, 63 State variable, 89, 93 Stationary distribution, 30 Stationary equilibrium, 97, 174, 851 Stationary solution, 96, 183, 233 Stationary variables, 38 Steady state deterministic, 97, 167 distribution, 552 nonstochastic, 97, 627, 678–684 transition to the new, 579 Stochastic growth model, see Ramsey model, infinite-horizon stochastic Stochastic process, 32, 294, 351, 393, 856–866 autoregressive, see Autoregressive process covariance stationary, 33, 95, 160, 172, 383, 857 difference stationary, 34, 52 trend stationary, 34 Stochastic simulation, 65, 245, 312, 336, 341–343 Stopping criteria, 382, 812–815 Sylvester equation, 128, 129 generalized, 135, 141, 144, 157 Symbolic differentiation, see Differentiation Tatonnement process, 573, 605, 609 Tax capital income, 577
SUBJECT INDEX consumption tax, 415, 464, 476 income, 415, 431, 480, 615 income tax reform, 470–479 labor income, 546, 583 revenues, 433, 475, 548 system, 615 Taylor’s theorem, 63, 80, 82, 100, 120, 131, 139, 747, 749–752, 764, 791, 792 Tensor notation, 98–102 Tensor product basis, 252, 778 Tightness of the labor market, 265 Time series, 848, 856 Time-to-build model, 181–186 Topological conjugates, 852 Trace, 138, 159, 734 Trade balance, 325 Transition matrix, 29, 250, 384, 389, 390, 431, 472, 493, 505, 641, 739, 860, 861, 865 conditional, 519 Transitional dynamics demographic transition, 589–613 in the heterogeneous-agent model, 489–505 in the OLG model, 579–589 Transpose, 732, 733 Transversality condition, 13, 17, 18, 20, 28 in model with Gorman preferences, 427 Trend stationary growth, 33–34, 858 Triangular matrices, 740 Truncation Chebyshev truncation theorem, 773 error, 774, 794 Trust region, 827–828 Uncertainty, 627 aggregate, 430, 672–714 individual, 628–663 state-dependent, 142
Undetermined coefficient, see Method of undetermined coefficients Unemployment compensation, 430, 433, 445, 473, 492, 660 cyclical, 527 duration of, 436 replacement rate, 436, 660 risk, 430 Uniform approximation, 249 Unintended bequests, see Accidental bequests Unit circle, 320, 330, 729, 851, 852 Unit root, 330, 334 Unitary matrix, 121, 734, 737, 740 Utility function CES, see CES utility Greenwood-Hercowitz-Huffman preferences, 43 instantaneous, 37, 499 King-Plosser-Rebelo preferences, 43 lifetime, see Lifetime utility Vacancy posting, 267 Value function, 15, 369–372, 374, 376–378, 383, 393, 399, 403, 495, 556 approximation, 755 concavity, 379 initial guess, 90 Value function iteration, 15, 65, 369, 396, 813 accuracy, 382 analytical solution, 68 deterministic models, 371–376 finite lifetime, 557 in heterogeneous-agent economy, 477 in the OLG model, 556 stochastic models, 383–391 stopping criteria, 382 VAR, see Autoregressive process Variational coefficient
912 effective labor, 479, 480 labor supply, 480 working hours, 479 Vec operator, 128, 134, 141, 157, 388, 734–735 Vector space, 236, 237, 238, 748, 749, 753, 764 Wage income, 432, 447, 496, 546, 678 Wage Phillips curve derivation, 223–224 Wage staggering, 201–204 Walras’s law, 654 Wealth average, 536 concentration of, 466, 480, 540, 628 distribution, 441, 446, 454, 479, 491, 510, 536, 614, 628 financial, 350 heterogeneity, 540, 628, 660 Wealth-age profile, 558, 582 in the OLG model with Uncertainty, 656 in the standard OLG model, 554 with age-dependent productivities, 575 Weierstrass approximation theorem, 753–754 Weight function, 238, 242, 244, 254, 764, 765, 766, 770, 802, 803, 805, 806 Weighted residuals, 64, 243–244, 264, 311, 747, 791, 811 collocation, 240, 251–253, 257, 259, 559 finite element method, 64, 241, 243, 245, 252, 257, 259, 747 Galerkin, 240, 253, 257, 259, 261, 264, 270 least squares, 240, 257, 259, 559 Smolyak collocation, 275, 280, 288, 296, 301 spectral method, 64, 241, 243
Weighted residuals Smolyak collocation, 252 Welfare analysis, 413 effects, 525 Welfare economics theorem, 44 White noise, 859 Workforce, 657 Young’s theorem, 73, 797 Zero lower bound on investment, see Ramsey model with nonnegative investment on nominal interest rate, 530 on wealth, see Asset, constraint