Spatial Microeconometrics
Spatial Microeconometrics introduces the reader to the basic concepts of spatial statistics, spatial econometrics and the spatial behavior of economic agents at the microeconomic level. Incorporating useful examples and presenting real data and datasets on real firms, the book takes the reader through the key topics in a systematic way. The book outlines the specificities of data that represent a set of interacting individuals with respect to traditional econometrics, which treats their locational choices as exogenous and their economic behavior as independent. In particular, the authors address the consequences of neglecting such important sources of information on statistical inference and how to improve the models’ predictive performance. The book presents the theory, clarifies the concepts and instructs the readers on how to perform their own analyses, describing in detail the codes which are necessary when using the statistical language R. The book is written by leading figures in the field and is completely up to date with the very latest research. It will be invaluable for graduate students and researchers in economic geography, regional science, spatial econometrics, spatial statistics and urban economics.
Giuseppe Arbia is full professor of economic statistics at the Faculty of Economics, Catholic University of the Sacred Heart, Rome, and lecturer at the University of Italian Switzerland in Lugano. Since 2006 he has been president of the Spatial Econometrics Association and since 2008 he has chaired the Spatial Econometrics Advanced Institute. He is also a member of many international scientific societies.
Giuseppe Espa is full professor in economic statistics at the Department of Economics and Management of the University of Trento and the LUISS “Guido Carli” University of Rome.
Diego Giuliani is associate professor in economic statistics at the Department of Economics and Management of the University of Trento. He works primarily on the use and development of statistical methods to analyze firm-level microgeographic data.
Routledge Advanced Texts in Economics and Finance
27 Regional Economics, Second Edition
Roberta Capello
28 Game Theory and Exercises
Gisèle Umbhauer
29 Innovation and Technology
Business and Economics Approaches
Nikos Vernardakis
30 Behavioral Economics, Third Edition
Edward Cartwright
31 Applied Econometrics
A Practical Guide
Chung-ki Min
32 The Economics of Transition
Developing and Reforming Emerging Economies
Edited by Ichiro Iwasaki
33 Applied Spatial Statistics and Econometrics
Data Analysis in R
Edited by Katarzyna Kopczewska
34 Spatial Microeconometrics
Giuseppe Arbia, Giuseppe Espa and Diego Giuliani
For more information about this series, please visit: www.routledge.com/ Routledge-Advanced-Texts-in-Economics-and-Finance/book-series/SE0757
Spatial Microeconometrics
Giuseppe Arbia, Giuseppe Espa and Diego Giuliani
First published 2021 by Routledge
2 Park Square, Milton Park, Abingdon, Oxon OX14 4RN
and by Routledge
52 Vanderbilt Avenue, New York, NY 10017
Routledge is an imprint of the Taylor & Francis Group, an informa business
© 2021 Giuseppe Arbia, Giuseppe Espa and Diego Giuliani
The right of Giuseppe Arbia, Giuseppe Espa and Diego Giuliani to be identified as authors of this work has been asserted by them in accordance with sections 77 and 78 of the Copyright, Designs and Patents Act 1988.
All rights reserved. No part of this book may be reprinted or reproduced or utilised in any form or by any electronic, mechanical, or other means, now known or hereafter invented, including photocopying and recording, or in any information storage or retrieval system, without permission in writing from the publishers.
Trademark notice: Product or corporate names may be trademarks or registered trademarks, and are used only for identification and explanation without intent to infringe.
British Library Cataloguing-in-Publication Data
A catalogue record for this book is available from the British Library
Library of Congress Cataloging-in-Publication Data
A catalog record has been requested for this book
ISBN: 978-1-138-83374-6 (hbk)
ISBN: 978-1-138-83375-3 (pbk)
ISBN: 978-1-315-73527-6 (ebk)
Typeset in Galliard by codeMantra
G.A.: To Pietro, my first grandson
G.E.: To the memory of my father, Salvatore
D.G.: To Serena
Contents

Foreword (Lung-Fei Lee)
Preface and acknowledgements

PART I
Introduction

1 Foundations of spatial microeconometrics modeling
  1.1 A micro-level approach to spatial econometrics
  1.2 Advantages of spatial microeconometric analysis
  1.3 Sources of spatial micro-data
  1.4 Sources of uncertainty in spatial micro-data
  1.5 Conclusions and plan of the book

PART II
Modeling the spatial behavior of economic agents in a given set of locations

2 Preliminary definitions and concepts
  2.1 Neighborhood and the W matrix
  2.2 Moran’s I and other spatial correlation measures
  2.3 The Moran scatterplot and local indicators of spatial correlation
  2.4 Conclusions

3 Basic cross-sectional spatial linear models
  3.1 Introduction
  3.2 Regression models with spatial autoregressive components
    3.2.1 Pure spatial autoregression
    3.2.2 The spatial error model
    3.2.3 The spatial lag model
    3.2.4 The spatial Durbin model
    3.2.5 The general spatial autoregressive model with spatial autoregressive error structure
  3.3 Test of residual spatial autocorrelation with explicit alternative hypotheses
  3.4 Marginal impacts
  3.5 Effects of spatial imperfections of micro-data
    3.5.1 Introduction
    3.5.2 Measurement error in spatial error models
    3.5.3 Measurement error in spatial lag models
  3.6 Problems in regressions on a spatial distance

4 Non-linear spatial models
  4.1 Non-linear spatial regressions
  4.2 Standard non-linear models
    4.2.1 Logit and probit models
    4.2.2 The tobit model
  4.3 Spatial probit and logit models
    4.3.1 Model specification
    4.3.2 Estimation
  4.4 The spatial tobit model
    4.4.1 Model specification
    4.4.2 Estimation
  4.5 Further non-linear spatial models
  4.6 Marginal impacts in spatial non-linear models

5 Space–time models
  5.1 Generalities
  5.2 Fixed and random effects models
  5.3 Random effects spatial models
  5.4 Fixed effect spatial models
  5.5 Estimation
    5.5.1 Introduction
    5.5.2 Maximum likelihood
      5.5.2.1 Likelihood procedures for random effect models
      5.5.2.2 Likelihood procedures for fixed effect models
    5.5.3 The generalized method of moments approach
      5.5.3.1 Generalized method of moments procedures for random effects models
      5.5.3.2 Generalized method of moments procedures for fixed effects models
  5.6 A glance at further approaches in spatial panel data modeling

PART III
Modeling the spatial locational choices of economic agents

6 Preliminary definitions and concepts in point pattern analysis
  6.1 Spatial point patterns of economic agents
  6.2 The hypothesis of complete spatial randomness
  6.3 Spatial point processes
    6.3.1 Homogeneous Poisson point process
    6.3.2 Aggregated point processes
      6.3.2.1 Inhomogeneous Poisson point processes
      6.3.2.2 Cox processes
      6.3.2.3 Poisson cluster point processes
    6.3.3 Regular point processes
  6.4 Classic exploratory tools and summary statistics for spatial point patterns
    6.4.1 Quadrat-based methods
    6.4.2 Distance-based methods

7 Models of the spatial location of individuals
  7.1 Ripley’s K-function
  7.2 Estimation of Ripley’s K-function
  7.3 Identification of spatial location patterns
    7.3.1 The CSR test
    7.3.2 Parameter estimation of the Thomas cluster process
    7.3.3 Parameter estimation of the Matérn cluster process
    7.3.4 Parameter estimation of the log-Gaussian Cox process

8 Points in a heterogeneous space
  8.1 Diggle and Chetwynd’s D-function
  8.2 Baddeley, Møller and Waagepetersen’s Kinhom-function
    8.2.1 Estimation of the Kinhom-function
    8.2.2 Inference for the Kinhom-function
  8.3 Measuring spatial concentration of industries: Duranton–Overman K-density and Marcon–Puech M-function
    8.3.1 Duranton and Overman’s K-density
    8.3.2 Marcon and Puech’s M-function

9 Space–time models
  9.1 Diggle, Chetwynd, Häggkvist and Morris’ space–time K-function
    9.1.1 Estimation of the space–time K-function
    9.1.2 Detecting space–time clustering of economic events
  9.2 Gabriel and Diggle’s STIK-function
    9.2.1 Estimation of the STIK-function and inference

PART IV
Looking ahead: modeling both the spatial location choices and the spatial behavior of economic agents

10 Firm demography and survival analysis
  10.1 Introduction
  10.2 A spatial microeconometric model for firm demography
    10.2.1 A spatial model for firm demography
      10.2.1.1 Introduction
      10.2.1.2 The birth model
      10.2.1.3 The growth model
      10.2.1.4 The survival model
    10.2.2 A case study
      10.2.2.1 Data description
      10.2.2.2 The birth model
      10.2.2.3 The growth model
      10.2.2.4 The survival model
    10.2.3 Conclusions
  10.3 A spatial microeconometric model for firm survival
    10.3.1 Introduction
    10.3.2 Basic survival analysis techniques
    10.3.3 Case study: The survival of pharmaceutical and medical device manufacturing start-up firms in Italy
      10.3.3.1 Data description
      10.3.3.2 Definition of the spatial microeconometric covariates
      10.3.3.3 Definition of the control variables
      10.3.3.4 Empirical results
  10.4 Conclusion

Appendices
  Appendix 1: Some publicly available spatial datasets
  Appendix 2: Creation of a W matrix and preliminary computations
  Appendix 3: Spatial linear models
  Appendix 4: Non-linear spatial models
  Appendix 5: Space–time models
  Appendix 6: Preliminary definitions and concepts in point pattern analysis
    Appendix 6.1: Point pattern datasets
    Appendix 6.2: Simulating point patterns
      Appendix 6.2.1: Homogeneous Poisson processes
      Appendix 6.2.2: Inhomogeneous Poisson processes
      Appendix 6.2.3: Cox processes
      Appendix 6.2.4: Poisson cluster processes
      Appendix 6.2.5: Regular processes
    Appendix 6.3: Quadrat-based analysis
    Appendix 6.4: Clark–Evans test
  Appendix 7: Models of the spatial location of individuals
    Appendix 7.1: K-function-based CSR test
    Appendix 7.2: Point process parameters estimation by the method of minimum contrast
  Appendix 8: Points in a heterogeneous space
    Appendix 8.1: D-function-based test of spatial interactions
    Appendix 8.2: Kinhom-function-based test of spatial interactions
    Appendix 8.3: Duranton–Overman K-density and Marcon–Puech M-function
  Appendix 9: Space–time models
    Appendix 9.1: Space–time K-function
    Appendix 9.2: Gabriel and Diggle’s STIK-function

Bibliography
Index
Foreword
Giuseppe Arbia, Giuseppe Espa and Diego Giuliani invited me to read their monograph on Spatial Microeconometrics and to write a foreword for it. As a theoretical econometrician who has worked on both microeconometrics and spatial econometrics, I am glad to have the chance to write such a foreword to their book. The three authors are statisticians, economists and educators. Their monograph on spatial microeconometrics covers important spatial economic issues and models. The book is divided into two halves. The first half introduces and provides insights into popular simultaneous-type equation models, such as the spatial autoregressive model (SAR), which capture contemporaneous interactions and spillover effects on economic outcomes of spatially located economic agents. The second half of the monograph concentrates on the location choices and spatial behavior of economic agents. The approach for the second half starts with spatial point processes. The merit of this monograph is that the authors provide for each spatial model an empirical application to illustrate the practical relevance of a spatial model specification and its implication of interactions via real data. While there are discussions on some regularity conditions for a model structure, the details are referred to the theoretical spatial econometric literature. Chapter 1 provides the foundations of spatial econometric models with microeconomic justification, even though the popular SAR model was introduced by statisticians in an attempt to extend autoregression in time series to capture spatial correlation and/or spillover effects across spatial units. In a time series, the influence of past activities on current activity runs in a single forward direction. For spatial interactions, however, there is in general no single forward direction of influence, because the outcome of a spatial unit might have influences on all direct neighbors in all spatial directions. In order to capture neighboring spatial units’ influences, a spatial network matrix Wn would typically be constructed to capture the possible, but relative, influence of neighbors. Whether neighboring and networking units have significant influence on a spatial unit would then be summarized by an additional coefficient of the networking factor. Chapter 2 of the monograph introduces neighboring structures and the construction of the spatial network matrix W. With cross-sectional data on
outcomes of spatial units, a preliminary interest is whether those outcomes are spatially correlated or not. A test statistic developed by Moran is the most useful statistic for testing purposes. Moran’s test statistic lays the statistical foundation for spatial statistics and econometrics. Formal regression models with spatial interactions on outcomes as well as spatial correlation in disturbances are introduced. A regression equation with spatially correlated disturbances extends the regression model with serially correlated disturbances to the spatial situation. The Moran test of spatial correlation in the regression residuals is shown to be a Lagrange multiplier test. For estimation, the monograph has focused on the method of maximum likelihood (ML) and the generalized method of moments (GMM). For the SAR model, in the presence of exogenous variables, the orthogonality condition of exogenous variables and disturbances naturally suggests the adoption of the two-stage least squares estimation method, as the exogenous variables would be valid instrumental variables. In a regression framework, a classical social science model includes spatial interactions which describe how neighbors’ exogenous characteristics influence the outcomes of a spatial unit. In spatial econometrics, an additional neighboring characteristic term is called a Durbin term. In a SAR model, one may include Durbin terms to capture directly the exogenous effects due to interactions with neighbors. The subsequent sections of Chapter 3 point out the economic implications in terms of the marginal impact of a small change in an exogenous variable on the possible outcome. The marginal effects and the multiplier effect are important, in particular, in SAR models, as spillover effects due to neighbors are the key feature of such models. The regression and SAR models in Chapter 3 are linear models as they are linear in the dependent variables and their (main) coefficients. Linear models have a computational advantage in estimation, but, as in the classical microeconometric literature, individual decisions might involve discrete or limited outcomes. Discrete outcomes are usually modeled with probit or logit models. Limited outcomes are formulated with a censored regression, which is known as a tobit model. In the classical microeconomics literature, individuals are assumed to make decisions keeping in mind only their self-interest, without taking into account any possible externality of their decisions. On the contrary, spatial microeconometrics allows for individual decisions taking into account the influence of possible decisions or actions of others in a game setting, even though those decisions might not take into account any externalities that they might generate. While in a SAR model outcomes are the results of optimized decisions of each individual in a game setting, the SAR equation can simply be a linear equation. But for discrete choice and tobit models, there are limit points with positive probability involved, so the resulting model cannot be specified properly with linear structures alone. Chapter 4 presents some of these non-linear models which take into account spillover effects. There are two types of probit and tobit SAR models discussed in the monograph. One is a latent SAR underlying process but with observable binary or censored indicators. As the likelihood function for such a model is computationally complicated (as it involves multiple
integrals), simplified estimation methods are discussed as they can be computationally tractable even if they might lose asymptotic efficiency for estimation. Other probit and tobit SAR models have a simultaneous structure in that the peer agents’ chosen alternatives or limited outcomes have a direct influence on an agent’s decision. The latent tobit SAR model can be computationally more attractive, but the estimation of the corresponding probit SAR would still be complicated. Furthermore, the corresponding discrete choice SAR model might have multiple Nash equilibria, and tractable estimation methods remain to be considered. The asymptotic theory for estimators of such non-linear models has been developed in the theoretical spatial econometric literature. Asymptotic theories extend those for non-linear time series to spatial mixing and near-epoch dependence processes. These theories have not been presented in this monograph, but readers can refer to existing publications. Due to the non-linear structures of the spatial probit and tobit models, the monograph points out the importance of using marginal impacts for understanding the economic implications of regressors in such models. After the presentation of popular linear and non-linear spatial models for cross-sectional data, Chapter 5 of the monograph considers panel data with space–time models. Cross-sectional models are static models. However, panel data models can incorporate dynamic adjustments and can identify effects of time-variant, but individual-invariant, explanatory variables. In a cross-sectional model, if an explanatory time variable stays constant for all units at each time, then its effect on outcomes cannot be identified because a cross-sectionally invariant variable could not be separately identified from the intercept term in a model. With panel data models, since such an explanatory variable will have different values over time, its effect can then be identified. Another important feature of a panel data model is its ability to capture and identify effects of overall unobserved individual factors which do not change over time. In a cross-sectional model, those unobservables could only be captured as a part of the overall disturbance. Relevant unobservables in a cross-sectional model would not be allowed to correlate with included regressors in order to identify their specific effects. However, in a panel data model, time-invariant unobservables can be treated as individual parameters for estimation, so their correlation with included regressors is allowed. That is the advantage of treating time-invariant, but unobserved, individual variables with a fixed effect in a panel data model. If those time-invariant unobservables were not correlated with included regressors, they could be treated as a random component in the overall disturbance, which results in a random components model. As in the usual panel regression model, a Hausman test can validate whether individual effects are correlated with included regressors or not. Chapter 5 presents both fixed effects and random effects space–time models. In a spatial panel model, spatial and time lagged dependent variables can capture “diffusion” effects, while a cross-sectional model cannot. In this chapter, the monograph discusses ML and GMM approaches. In a panel with a short time dimension, the GMM approach is of special interest as the initial lagged endogenous variable would not be easily dealt with in the ML approach.
The second half of this monograph investigates spatial location patterns of economic agents (firms) and the growth and survival of firms taking into account spillover effects of existing firms. Chapter 6 considers the micro-geographical distribution of economic events and activities from the spatial statistics point of view. A spatial point process generates locations of objects on a plane and thus can be used to analyze spatial point patterns. A spatial point process is characterized by an intensity function, which describes the expected number of points per unit area within an infinitesimal region centered at the generic point x. If the intensity function is a constant through space, the point process is stationary. The hypothesis of complete spatial randomness (CSR) assumes that points have been generated under stationarity and independence. The homogeneous Poisson point process is the basic process which represents the CSR hypothesis. Other point processes generate aggregated or regular point patterns. Aggregation of points arises because of true contagion or apparent contagion. Apparent contagion relaxes stationarity. An inhomogeneous Poisson process can lead to apparent contagion while a Poisson cluster process leads to true contagion. In a Cox process, the λ(x) of the inhomogeneous Poisson process is stochastic, and λ(x) and λ(y) at different locations x and y can be correlated. Another form of violation of the CSR hypothesis is spatial inhibition, which can be modeled with Matérn’s inhibition processes, the simple sequential inhibition process and the Strauss process. Traditional techniques provide formal tests for the CSR hypothesis. Chapter 7 introduces the K-function. Different spatial patterns, namely CSR and the presence of “clustering” or “inhibition”, can be captured by the K-function. In turn, one may introduce Monte Carlo procedures based on the K-function to test for CSR. The K-function and its subsequent variants and extensions in the next chapters are useful mainly to perform static analyses of the spatial distribution of economic agents’ locations in a continuous space. Chapter 8 extends the modeling framework to a heterogeneous economic space by introducing inhomogeneous K-functions, such as the D-function and the Kinhom-function, which are distance-based measures of spatial concentration of industries and provide tools to assess the statistical significance of spatial interactions. Both the K-density and the M-function are proposed as adaptations of the K-function to industrial agglomeration. Chapter 9 extends the framework for location analysis to data with space and time. It concerns dynamic point patterns by introducing a spatio-temporal K-function, which separates the spatial and temporal dynamics. This function can be used to detect and measure space–time clustering. It can also be extended into several possible diagnostic tools to detect independence between the spatial and temporal components of processes, as well as spatio-temporal clustering and spatio-temporal inhibition. Chapter 10 considers further behaviors of firms in space and time. Firms are created at random locations at some point in time; one then models the way they operate, grow and attract or repel other firms in their neighborhood. The authors formalize these processes of a firm with three model components, a birth model, a growth model and a death/survival model, which take into account the presence of spatial spillover effects, spatial externalities and spatial inhibition
among economic agents. The firm formation process is modeled as an inhomogeneous Poisson process with an intensity function λ(x) at location x driven by potential interaction effects of existing firms. The growth of a firm depends on its initial development and on the competitive influences of other firms in the neighborhood. The death/survival component models a death/survival process taking into account the competitive or cooperative influences of the other neighboring firms on the survival probability of a firm. Instead of discrete time for death/survival modeling, the final part of this chapter extends survival data (or failure time) models to describe the death/survival component of the behavior of firms taking into account externalities in a continuous-time setting. A case study with Italian data provides an illustrative example for this chapter.
Lung-Fei Lee
Ohio State University
Preface and acknowledgements
This book is devoted to discussing a class of statistical and econometric methods designed to analyze individual micro-data which are observed as points in the economic space, thus emphasizing the role of geographical relationships and other forms of network interaction between them. Classical spatial econometrics is a field which traditionally studies the specificity of data observed within discrete portions of space, such as counties or regions. It owes its increasing popularity to the fact that applications can be found not only in regional and urban economics but also in a very wide variety of scientific fields like agricultural and health economics, resources and energy economics, land use, economic development, innovation diffusion, transportation, public finance, industrial organization, political sciences, psychology, demography, managerial economics, education, history, labor, criminology and real estate to name only a small subset of them. Microeconometrics is a well-established field of research, which, however, generally neglects spatial and network relationships between economic agents. Spatial microeconometrics represents the subfield which joins the efforts of these two fields. Although a spatial microeconometric approach had already been suggested in the late 1980s, at that time appropriate models had not been developed, adequate empirical data were not available and computing power was still inadequate to treat them anyway. Now that theoretical models have been fully developed and computing power has increased dramatically, there are no more obstacles and the field is rapidly growing under the impulse of the widespread availability of detailed individual data linked to the advent of Big Data and of the increased demand for empirical studies associated with them. Indeed, the availability of detailed databases coming from new data sources (e.g. crowdsourcing, cell phones, web scraping, internet of things, drones) makes it possible to eventually abandon the rather unrealistic representative agent paradigm that dominated the scene during the twentieth century and to start thinking in a totally different way by considering “the economy as a self-organizing system, rather than a glorified individual” to express it in Alan Kirman’s words. This book aims to draw the boundaries of this challenging new branch of studies.
Going through the book, the reader will learn the specificities of treating data representing sets of interacting individuals with respect to the traditional microeconometric approach which treats their locational choices as exogenous and their economic behavior as independent. In particular, the reader will learn the consequences of neglecting such important sources of information on statistical inference and how to improve the models’ predictive performance by exploiting them. The book introduces the theory, formally derives the properties of the models and clarifies the various concepts, discussing examples based on freely accessible real data that can be used to replicate the analyses for a better understanding of the various topics. It also instructs the readers on how to perform their own analyses, describing in detail the codes which are necessary when using the free statistical language R. This is an important and distinctive feature of the book in that, following the description of the procedures contained in the Appendix, all the models described in the text can be immediately put into action using the datasets which are of direct interest for the reader. The book thus represents an essential reference for master’s and PhD students as well as academic researchers who are engaged in the econometric analysis of empirical data in many branches of economics and in other neighboring fields, such as environmental and epidemiological studies. We believe that our work fills a gap in the literature with only marginal overlap with other existing textbooks on general spatial econometrics which are either explicitly focused on the analysis of regional data or do not consider the issues connected with the point pattern of individual agents and their locational choices. The writing of this book has a long history that is perhaps worth telling briefly. Back in 1996 G.A. and G.E. wrote a book, entitled Statistica economica territoriale (Spatial Economic Statistics), published by the Italian publisher CEDAM (Arbia and Espa, 1996), that we like to think had a certain impact in the Italian academy and a certain role in diffusing those methodological practices among Italian researchers. The book was devoted to the econometric analysis of microdata, but it was limited to the study of the locational choices of individuals. At the time the two of us already had the idea of writing a second, more comprehensive, textbook which, in our minds, should also include the joint modeling of individuals’ location decisions and their interactions. However, the time was not right for such a project. Even if the two of us perceived clearly the importance of such a comprehensive approach, the literature was still scarce and so incredibly scattered in many diverse disciplinary journals that it seemed impossible to bring it all within a single unified presentation accessible to all scholars. Furthermore, too many methodological problems were still waiting for satisfactory answers. Indeed, most of the material we report in this book was still largely unwritten at the time and it materialized only after the turn of the new millennium. An important moment in the genesis of this book was the meeting with D.G., an enthusiastic student who at the time of the 1996 book was only 15 years old. After he obtained his doctorate under the supervision of G.E. at the University of Trento, the three of us started a fruitful cooperation that still
goes on and that generated, in the last decade, a stream of papers on this subject. During this period the original plan of the book came back into discussion and we all agreed that the time was now right to take up the challenge of writing it. What we had in mind was a book that could be used as a textbook for special topics in an econometrics course or in a course devoted specifically to spatial econometrics with an emphasis on individual spatial and network interaction. Despite the large class of models introduced from very different fields, we wanted to produce a book which was rather self-contained and whose understanding did not require any particular background beyond a working knowledge of elementary inferential statistics and econometrics at the level of an introductory academic course. The reader will judge if we achieved our aim. The work has been demanding due to the vast literature examined and it was carried out jointly by the three of us in Rome (G.A.) and Trento (G.E. and D.G.) where we are currently located. Part of the work, however, was developed when G.A. was visiting the University of Illinois at Urbana-Champaign, the universities of Sendai and Tsukuba in Japan, Stellenbosch University in South Africa, the Higher School of Economics in Moscow and the Centre for Entrepreneurship and Spatial Economics of Jönköping University in Sweden where he was invited to teach courses using some of the material reported here. We wish to take this chance to thank all these institutions for their interest in the subject and their warm hospitality. An acceleration towards the production of the final draft, however, was provided by the lockdown measures imposed in Italy from March 9th to May 3rd 2020 to limit the diffusion of SARS-CoV-2 during the pandemic. In those days, forced to stay at home for about two months with no teaching tasks to undertake and no distractions, we concentrated our efforts on the production of this book, trying to at least take some (small) advantage of the dramatic situation that was taking place around us. Even if this is perhaps not the best place for it, we feel it our duty to thank all the healthcare personnel for the incredible efforts they made in those days to contain the contagion even at the price, sometimes, of their own lives. Without them we would not be here today and their sacrifice can never be forgotten. As is common in these cases, the work has benefited from the comments and remarks received from a large number of people and we are happy to have here the chance to fulfil the pleasant duty of thanking all of them on the occasion of submitting our draft to the publisher. First of all, we wish to thank all the participants in the Spatial Econometrics Advanced Institute, a summer school held yearly in Rome since 2008 where G.A. and D.G. had a teaching role in recent years. The active presence of the students in class represented a great stimulus to collect the material in this book. In particular we wish to thank Giovanni Millo of Assicurazioni Generali (Trieste, Italy) who was first a student, and then for many years an instructor, at the summer school and who contributed substantially to the drafting of Chapter 5. Secondly, we would also like to thank Danila Filipponi, Simonetta Cozzi and Patrizia Cella of the Italian National Institute of Statistics in Rome for their
help and assistance in gathering the datasets used in some examples described in this book, and Maria Michela Dickson and Flavio Santi of Trento University, who carefully read previous drafts of the book and provided valuable comments and suggestions.
G.A.: I wish to dedicate this book to the newly born Pietro, my first grandson. I dedicated my first book to his mother Elisa back in 1989. Looking back on those years, it is sad to note how many of the people that were close to me at that time have passed away, first of all my beloved parents, Francesco and Giulia. However, Pietro is now here and his arrival gives a positive sense to the time that has passed by. Therefore, this is my welcome to him, with whom I was so lucky to spend the last month in our country house in Tuscany, alternating my last revisions of the book with the grandfather’s important duty of assisting him in his first steps and his childish games. Even if he is the “special guest” here, in my dedication I cannot forget to give my thanks to my beloved wife Paola and to our three grown-up children: Elisa, Francesco and Enrica, although I have no more hope that they will ever read any of my books.
G.E.: I wish to dedicate this book to the memory of my father Salvatore who has always supported me in all my choices and inspired my love for statistics. I would also like to acknowledge the love and constant support of my sons Guido and Massimo.
D.G.: I wish to dedicate this book to my wife and partner, Serena, for her unwavering support and encouragement during the completion of this project.
To all of the people mentioned above our thoughts are directed on this torrid and muggy day of an unusual mid-August, when everybody else both in Rome and Trento seems to be on the beach despite the pandemic alert and we are here writing what will, hopefully, be the last words of this book before it is published.
Rome, August 15th 2020
Assumption of the Virgin Mary
Part I
Introduction
1 Foundations of spatial microeconometrics modeling
1.1 A micro-level approach to spatial econometrics
This book is devoted to the spatial econometric analysis of individual micro-data observed as points in the economic space (Dubé and Legros, 2014), sometimes referred to as “spatial microeconometrics” (Arbia et al., 2016). This branch is rapidly emerging onto the stage of spatial econometrics, building upon results from various branches of spatial statistics (Diggle, 2003) and on the earlier contributions of Arbia and Espa (1996), Duranton and Overman (2005), Marcon and Puech (2003; 2009; 2010) and Arbia et al. (2008; 2010; 2014a; 2014b; 2015b). In a relatively recent paper Pinkse and Slade (2010) heavily criticized the current developments of spatial econometrics, observing:
The theory is in many ways in its infancy relative to the complexity of many applications (in sharp contrast to time-series econometrics, where the theory is well developed) … due to the fact that it is almost invariably directed by what appears to be the most obvious extension of what is currently available rather than being inspired by actual empirical applications.
and:
Many generic large sample results treat locations as both exogenous and fixed and assume that they are observations at particular locations of an underlying spatial process. … Economists have studied the locational choices of individuals … and of firms … but generally treat the characteristics of locales as given. The purpose of much spatial work, however, is to uncover the interaction among (authorities of) geographic units, who choose, e.g., tax rates to attract firms or social services to attract households. … An ideal model would marry the two; it would provide a model explaining both individuals’ location decisions and the action of, say, local authorities.
(Pinkse and Slade, 2010)
This new modeling strategy, which treats location as endogenous by taking into account simultaneously both individuals’ locational choices and their economic
decisions in their chosen location, represents the scope of the growing field of spatial microeconometrics. As a matter of fact, a spatial microeconometric approach (inconceivable until only a few decades ago) is now more and more feasible due to the increasing availability of very large geo-referenced databases in all fields of economic analysis. For instance, the US Census Bureau’s Longitudinal Business Database provides annual observations for every private-sector establishment with a payroll and includes approximately 4 million establishments and 70 million employees each year. Sourced from US tax records and Census Bureau surveys, the micro-records document the universe of establishments and firms characterized by their latitude–longitude spatial coordinates (Glaeser and Kerr, 2009). Examples of this kind can be increasingly found in all branches of economics including education, health economics, agricultural economics, labor economics, industrial economics, house prices, technological diffusion and many others. We will discuss them in the next section. The availability of these detailed geographical databases now makes it possible to model individuals’ economic behavior in space to gain information about economic trends at a regional or macro-level. A spatial microeconometric approach had already been suggested some 30 years ago by Durlauf (1989), at a time when data allowing this kind of approach were not yet available, appropriate models had not been developed and computing power was limited. Durlauf criticized mainstream macroeconomics, pointing out that “macroeconomic modeling currently relies upon the representative agent paradigm to describe the evolution of time series. There is a folk wisdom that heterogeneity of agents renders these models unsatisfactory approximations of the macroeconomy”. He then proceeded to describe a “lattice economy” where a “collection of agents are distributed across space and time” and “macroeconomy consists of many simple agents simultaneously interacting”. Durlauf (1989; 1999) suggested a parallel between physics and economic analysis. In particular he concentrated on the links existing between formal individual choice models and the formalism of statistical mechanics, which suggested that there are many useful tools that applied economists could borrow from physics. Just as, in statistical mechanics, models explain how a collection of atoms can exhibit the correlated behavior necessary to produce a magnet, in economics one may devise models aimed at explaining spatially interdependent behaviors. The basic idea in statistical mechanics, that the behavior of one atom is influenced by the behavior of other atoms located nearby, is indeed very similar to the hypothesis that forms the basis of all spatial econometric studies, namely that individual or collective decisions depend upon the decisions taken in other neighboring regions or by neighboring economic agents. According to Kirman (1992) the traditional approach considers “the aggregate behavior of the economy as though it were the behavior of a single representative agent”. However there is strong evidence that “heterogeneity and dispersion of agents’ characteristics may lead to regularity in aggregate behavior” and that “once we allow for interdependence … consistency between microeconomic characteristics and macroeconomic characteristics may be lost” and,
finally, “strong local random interacting agents who are a priori identical may produce macroeconomic irregularities”. Kirman concludes his work by stating that we must change our attitude and start thinking “of the economy as a self-organizing system, rather than a glorified individual”. Perhaps the most radical criticism in this respect is, however, presented by Danny Quah (1993), who states:
Modern macroeconomics concerns itself, almost by definition with substitution of consumption and production across time. The macroeconomist wishes to understand the dynamic of inflation and asset prices, output and employment, growth and business cycles. Whether in doing so, one uses ideas of search and nonconvexities, intertemporal substitution and real business cycles, sticky prices and wages, or dynamic externalities, one implicitly assumes that it is the variation in economic activity across time that is the most useful to analyse. But why must that variation be the most important?
In doing so the macroeconomist “almost exclusively focuses on aggregate (rather than disaggregate) shocks as the source of economic fluctuations” ignoring “rich cross-sectional evidence on economic behaviour” and losing “the ability to say anything about the rich heterogeneous observations on economic activity across space, industries, firms and agents”.
These criticisms should be distinguished from those implying the failure of aggregation to a representative agent (e.g. Forni and Lippi, 1997; Kirman, 1992). There, the researcher points out the inability to represent aggregate behaviour because of individual heterogeneity. Here I assert instead that it is individual heterogeneity that is more interesting even from the perspective of wishing to understand macroeconomic behaviour.
However, in introducing such concepts into the discussion and ignoring the empirical tools, “researchers have used empirical ideas that are altogether uninformative. Those econometricians who model dynamic adjustment have done so not because adjustment occurs only in time and not in space, but because time series methods are already readily available for the former and not the latter” (Quah, 1993). The quoted sentences can be considered in some sense the manifesto of spatial microeconometrics.
1.2 Advantages of spatial microeconometric analysis
The biggest advantage of a spatial microeconometric approach over orthodox spatial econometrics is the possibility of treating location and distances as endogenous, thus allowing the modeling of both economic variables and locational choices within the same methodological framework (Pinkse and Slade, 2010). Spatial microeconometrics presents many distinctive features with respect to orthodox spatial econometrics based on regional data and with respect to standard
microeconometrics. Concerning the general field of microeconometrics, Cameron and Trivedi report six distinctive features: (i) discreteness and nonlinearity, (ii) greater realism, (iii) greater information content, (iv) microeconomic foundations, (v) disaggregation and heterogeneity and (vi) dynamics (Cameron and Trivedi, 2005). The lack of theories to support regional econometric modeling (Pinkse and Slade, 2010; Corrado and Fingleton, 2012) is one of the deeper criticisms against spatial econometrics restrictively conceived, which can, at most, lead to the identification of technical relationships with little or no possibility of drawing causal inferences. On the contrary, a spatial microeconometric approach provides the possibility of identifying more realistic models because hypotheses about economic behavior are usually elicited from theories related to the individual choices of economic agents. The inconsistency between microeconomic theories and macro-relationships has long been discussed in the economic literature (Pesaran et al., 1987; Klein, 1946). As a matter of fact, a relationship estimated at an individual level, such as a production function, may be regarded as a behavioral relationship that, for the single firm, embodies a particular interpretation of the causal mechanism linking inputs to outputs. However, the same relationship at an aggregate level does not depend on profit maximization but purely on technological factors (Klein, 1946). The relatively cavalier fashion with which most empirical studies shift from one unit to the other has seldom been criticized in the literature (Green, 1964; Hannan, 1970; Haitovsky, 1973). Traditionally economists have been faced with this problem in the analysis of family budgets: if we estimate a linear consumption function on aggregate data the impact of income on consumption has nothing in common with the individual marginal propensity to consume (Modigliani and Brumberg, 1955; Stocker, 1982). The aggregation problem is a particularly relevant feature of the spatial econometrics of regional data that can be tackled by estimating models at a micro-geographical level. In fact, geographically aggregated data within discrete portions of space are based on arbitrary definitions of the spatial observational units, and, in this way, they introduce a statistical bias arising from the discretional characterization of space. This issue is very well known in the statistical literature, where it is referred to as the “modifiable areal unit problem” or MAUP (Arbia, 1989). The modifiable areal unit problem is more severe than the traditional modifiable unit problem (Yule and Kendall, 1950), because regional data are usually very irregular aggregations of individuals characterized by large differences in terms of the size and the shape of the various spatial units. The MAUP manifests itself in two ways: (i) the scale problem, dealing with the indeterminacy of any statistical measure with respect to changes in the level of aggregation of the data, and (ii) the aggregation problem, having to do with the indeterminacy of any statistical measures due to changes in the aggregation criterion at a given spatial scale. The effects of aggregation on standard econometric models are well known, dating back to the early contributions of Prais and Aitchinson (1954), Theil (1954), Zellner (1962), Cramer (1964) and Haitovsky (1973). More contributions were made by Barker and Pesaran (1989), Okabe and Tagashira
(1996), Tagashira and Okabe (2002) and Griffith et al. (2003). The main results found in the literature are that the estimators of the regression parameters have a larger variance when using aggregated rather than individual data, leading to false inferential conclusions and to the acceptance of models that should be discarded. Orcutt et al. (1968), through a microsimulation study, pointed out that “detailed study of the individual regression indicates a tendency to reject the null hypothesis more frequently than the usual sampling theory suggests. … Perhaps this is why economic theories are almost never rejected on the basis of empirical evidences.” Similar conclusions were reached by Arbia (1989), who considered a spatial random economy constituted by many interacting agents. He noticed that “even a small amount of autocorrelation between the individuals can produce the ecological fallacy effect”.1 The loss in efficiency due to aggregation depends on the grouping criterion and it is minimized when individuals are grouped so as to maximize the between-group variability. The effects of MAUP on different statistical measures, pioneered by Gehlke and Biehl (1934), Yule and Kendall (1950), Robinson (1950) and Openshaw and Taylor (1979), have been studied at length by Arbia (1989), who derived the formal relationship between Pearson’s correlation coefficient at the individual level and the same coefficient at the aggregate level when data are spatially correlated. Arbia and Petrarca (2011) presented a general framework for analyzing the effects of MAUP on spatial econometric models, showing that the efficiency loss, inherent in any aggregation process, is mitigated by the presence of a positive spatial correlation parameter and conversely exacerbated by the presence of a negative spatial correlation. This result is intuitive: positive spatial correlation implies aggregation between similar values, thus preserving variability, while negative spatial correlation implies aggregation between very different values, thus destroying variability and further inflating the variance of the estimators.
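To make the aggregation argument concrete, the following minimal R sketch compares the same slope estimate at the individual level and after the micro units have been grouped into coarser “regions”. All of the data are simulated and purely hypothetical, and the grid size, block size and true slope of 0.5 are arbitrary choices made for illustration only; the point is simply that the aggregated estimate stays centered on the true value while its standard error inflates, which is the efficiency loss discussed above.

```r
## Minimal sketch (hypothetical data): slope estimate at the micro level
## versus the same estimate after aggregating micro units into "regions".
set.seed(1)

n_side <- 20                              # 20 x 20 grid of micro units
grid   <- expand.grid(row = 1:n_side, col = 1:n_side)
n      <- nrow(grid)

x <- rnorm(n)                             # individual-level regressor
y <- 1 + 0.5 * x + rnorm(n)               # individual-level outcome, true slope 0.5

micro_fit <- lm(y ~ x)

## Aggregate into 4 x 4 blocks of cells (25 "regions" of 16 units each)
block <- interaction(ceiling(grid$row / 4), ceiling(grid$col / 4))
agg   <- data.frame(x = tapply(x, block, mean),
                    y = tapply(y, block, mean))
agg_fit <- lm(y ~ x, data = agg)

## The aggregated slope is still centred on 0.5, but its standard error
## is considerably larger: information is lost by averaging within regions.
summary(micro_fit)$coefficients["x", ]
summary(agg_fit)$coefficients["x", ]
```

Extending the sketch with spatially correlated regressors would, along the lines of Arbia and Petrarca (2011), show the loss being mitigated under positive spatial correlation and worsened under negative spatial correlation.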
1.3 Sources of spatial micro-data
The large availability of geo-coded data in many fields has enormously increased the potential applications of spatial microeconometrics, opening new possibilities for modeling the individual economic behavior in fields like education, industrial economics, hedonic house prices, health economics, agricultural economics, labor economics, business, crime, social networks and technological diffusion to name only a few. Indeed the advent of Big Data (Arbia, forthcoming) has brought a revolution in terms of data availability at an individual level, so that the sources of spatial micro-data are now no longer limited to archives, administrative records or panels, as they were in the recent past. In fact, they increasingly include alternative data sources such as satellite and aerial photographs, information obtained through drones, crowdsourcing, cell phones, web scraping, the internet of things (IOT) and many others. In particular, common examples of increasingly popular alternative spatial data sources are represented by crowdsourcing (data voluntarily collected by individuals), web scraping (data extracted from
websites and reshaped in a structured dataset) and the internet of things. These typologies of data with the addition of a spatial reference are commonly known as “volunteered geographic information” (VGI) (Goodchild, 2007). Crowdsourced data are common in many situations. An example is represented by data collected through smartphones in order to measure phenomena that are otherwise difficult to quantify precisely and quickly, such as food prices in developing countries (see e.g. Arbia et al., 2020) and epidemiological data (Crequit et al., 2018). The practice of extracting data from the web and using them in statistical analyses is also becoming more and more popular, such as collecting online prices in the real estate market (Beręsewicz, 2015; Boeing and Waddell, 2017; Arbia and Nardelli, 2020) or for consumer goods. A very good example in this respect is constituted by the Billion Prices Project (Cavallo and Rigobon, 2016), an academic initiative that collects prices from hundreds of online retailers around the world on a daily basis to conduct economic research. The internet of things consists of a system of interrelated computing devices which are provided with the ability to automatically transfer data over a network. An example is constituted by electronic devices to monitor the quality of the air in metropolitan areas (see e.g. ). There are two pieces of information that are needed in order to conduct a micro-level spatial econometric analysis. The first is derived from the traditional observation of attributes, while the second is the exact geographical location of the observed individuals and can take the form of UTM/GPS coordinates. Many spatial econometric methods are based on the possibility of accessing the exact individual locations and calculating inter-point distances between them. In many situations such information is obtained automatically in the process of data acquisition. For instance, in the case of crowdsourcing from cell phones, data are related to the coverage area of a cellular system, which is divided into non-overlapping cells. In areas where the cells are very dense the individual’s position can be assessed with a high degree of precision. Moreover, when collected from the internet of things, data are transmitted automatically, containing both the attribute information and the GPS coordinates. Conversely, when the coordinates cannot be observed directly, the process of geo-coding often implies converting addresses into geographical coordinates, such as in the case of web scraping house-price data from real estate companies. In this case the task can be automatically accomplished through the use of such programs as the Google Maps Geocoding API. Travel distances or times for a matrix of origins and destinations based on recommended routes from a start to an end point can be similarly obtained through the Google Maps Distance Matrix API.
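As a simple illustration of the kind of pre-processing this implies, the short R sketch below builds the full matrix of great-circle distances between a handful of geo-coded units from their longitude–latitude coordinates, using the haversine formula. The coordinates and the haversine_km helper are purely illustrative and not taken from any of the book’s datasets; in practice one would more often rely on dedicated packages such as sf or geosphere, or on the geocoding and routing services mentioned above.

```r
## Minimal sketch (hypothetical coordinates): once individuals are
## geo-coded, many of the methods in this book only require the matrix
## of inter-point distances between them.
haversine_km <- function(lon1, lat1, lon2, lat2, R = 6371) {
  to_rad <- pi / 180
  dlon <- (lon2 - lon1) * to_rad
  dlat <- (lat2 - lat1) * to_rad
  a <- sin(dlat / 2)^2 +
       cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * R * asin(pmin(1, sqrt(a)))          # great-circle distance in km
}

## Three illustrative firm locations (longitude, latitude)
firms <- data.frame(id  = c("A", "B", "C"),
                    lon = c(12.4964, 11.1217, 9.1900),
                    lat = c(41.9028, 46.0679, 45.4642))

## Full inter-point distance matrix in kilometres
D <- outer(seq_len(nrow(firms)), seq_len(nrow(firms)),
           function(i, j) haversine_km(firms$lon[i], firms$lat[i],
                                       firms$lon[j], firms$lat[j]))
dimnames(D) <- list(firms$id, firms$id)
round(D, 1)
```

A matrix of this kind is the raw ingredient both for the distance-based neighborhood (W) matrices discussed in Chapter 2 and for the distance-based point pattern methods of Part III.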
1.4 Sources of uncertainty in spatial micro-data
Having described briefly the major advantages of a micro-approach to spatial econometrics, let us now present the typical problems that emerge when we use micro-data and that are, in contrast, irrelevant when regional aggregated data are used in statistical analyses. When using regional data in spatial econometrics
almost invariably (i) the spatial units (regions) constitute the whole population, (ii) there are no missing data and (iii) the location of the observations is known exactly. Conversely, when we use individual geo-coded data, we encounter various forms of data imperfection that can mask the real phenomena up to the point of distorting dramatically the inferential conclusions: data are often based on a sample, some data may be missing and they may very frequently contain both attribute and locational errors. Dealing with spatial micro-data, there is still a certain degree of ambiguity in the literature on the concept of uncertainty and missing data. In order to clarify this issue, let us distinguish the case of missing data from the case of missing location. In practice we can encounter four different cases that must be distinguished because the consequences (and the solutions) are intuitively different in each situation. These will be discussed in turn in the present section. The first case is missing spatial data and spatial location, when both the location and some measurements are unknown. We know of the presence of some individuals in a certain area but not exactly where they are, and, furthermore, we do not have information about some or all of their characteristics. Some individuals are simply not observed in the study area. A second situation within this context is missing spatial data, when the location of individuals is known exactly, but we are unable to observe the characteristics of some or all of the individuals. This happens, for instance, when we know of the presence of a firm and its exact GPS location, but some or all information is missing at a certain moment of time (e.g., the number of employees or the production realized by the firm in that location). This case represents the traditional case of missing data as it has been treated at length in the statistical literature (Little, 1988; Little and Rubin, 2002; Rubin, 1976; Roderick and Rubin, 2007) where solutions have been suggested to replace the observations that are missing following different interpolating strategies (e.g., the expectation-maximization (EM) algorithm by Dempster et al. (1977) and multiple imputation methods (Rubin, 1987)). These approaches, however, do not adequately treat the nature of spatial data and do not suggest solutions to the problem of locating the information that is artificially recovered in space. A further cause of uncertainty is unintentional positional error: that is, when observations on individuals are available, but their location is missing or not known with certainty. For instance, we have a list of firms in a census tract and we also have observations on some of their statistical characteristics, but we do not know their exact address within the area. In this case it is common to assign the individual to the centroid of each area, but this procedure generates a positional error (see Cozzi and Filipponi, 2012, for the archive of firms managed by the Italian National Statistical Institute). In this situation, not only are the traditional statistical procedures proposed in the literature to minimize the fallacies produced by missing data useless, but the consequences of such positional errors on statistical modeling are also still largely unknown (see Bennett et al., 1984; Griffith et al., 1989). Finally, we can encounter the case of intentional positional error (Arbia et al., 2015a) where both location and measurement of the single individuals are
known, but the individuals' positions might be geo-masked before making them publicly available in order to preserve respondents' confidentiality. In Section 3.4.2 we will discuss some of the effects of these data imperfections on spatial econometric modeling. Further sources of error derive from measurement errors (Greene, 2018) and from the misalignment which might occur when data are collected at different levels of resolution (Mugglin et al., 2000; Banerjee and Gelfand, 2002; Madsen et al., 2008). Last but not least, a problem that is often overlooked when analyzing spatial micro-data is the fact that they often refer to individuals that are not observed in the whole population but only in a sample. When data are observed for the whole population we refer to them as a "mapped point pattern", whereas in the second case we refer to a "sample point pattern". In this respect, a common characteristic of many new unconventional data collection sources is the collection of sample data which lack any precise statistical sample design. In crowdsourcing, for instance, participation is generally voluntary, meaning that the population is self-selecting. A similar problem emerges when extracting data that were published on web platforms and social media without taking into account the process that led to their publication. This situation is described in statistics as "convenience sampling", in the presence of which, as is well known, no sound probabilistic inference is possible (Hansen et al., 1953), as, in general, all the optimal properties of the estimators are lost. More precisely, while in a formal sample design the choice of observations is suggested by a precise mechanism which allows the calculation of the probabilities of inclusion of each unit (and, hence, sound probabilistic inferences), with convenience collection no probability of inclusion can be calculated, thus giving rise to over- and under-representation of the sample units.
1.5 Conclusions and plan of the book In this first chapter we have introduced the main ideas on which spatial microeconometrics are grounded, and we discussed the advantages and the drawbacks connected with such an approach. The rest of the book is organized into three parts. Part II deals with methods and models for the spatial behavior of a single economic agent observed in a set of locations which are assumed to be exogenously given. This involves introducing some preliminary concepts in Chapter 2, discussing cross-section linear and non-linear models in Chapters 3 and 4, and dynamic space–time models in Chapter 5. In Part III, we will concentrate our attention on the position of the individual economic agent and discuss methods to model its spatial locational choices. After a preliminary chapter (Chapter 6) we will consider approaches to modeling the spatial location of the individual agent in a homogeneous (Chapter 7) and heterogeneous (Chapter 8) space. These approaches are extended to include the temporal dimension in Chapter 9. Part IV unites the methods discussed in Parts II and III, considering modeling strategies where both the spatial location and the spatial behavior of the
economic agents are considered endogenous, with a strong emphasis on the spatial aspects of firm demography and survival analysis. This discussion is reported in Chapter 10, which also concludes the book with a discussion of the many open questions and research challenges connected with the current and future development of this new discipline.
Note 1 The "ecological fallacy" is the extension of conclusions and relationships observed at an aggregated level to the level of individuals.
Part II
Modeling the spatial behavior of economic agents in a given set of locations
2
Preliminary definitions and concepts
2.1 Neighborhood and the W matrix
The classical linear regression model assumes normal, exogenous and spherical disturbances (Greene, 2018). However, when we observe a phenomenon in, say, n regions, non-sphericalness of the residuals may arise due to the presence of spatial autocorrelation and spatial heterogeneity among the stochastic terms, in which case the optimal properties of the ordinary least squares (OLS) estimators are lost. Before introducing various alternatives to the basic model, let us, however, introduce some preliminary concepts. In fact, we can intuitively define spatial correlation as a feature of data describing the fact that observations that are close together are more correlated than observations that are far apart (the "first law of geography" (Tobler, 1970)). However, a formal definition requires a clarification of the concept of "closeness". At the heart of spatial econometric methods is the definition of the so-called "weights matrix" (or "connectivity matrix"). The simplest of all definitions is the following:

nWn = | w11  …  w1n |
      | …   wij  …  |
      | wn1  …  wnn |   (2.1)

in which each generic element is defined as

wij = 1 if j ∈ N(i), and wij = 0 otherwise   (2.2)

N(i) being the set of neighbors of location i. By definition we have that wii = 0. Many different alternative definitions of the W matrix are possible. A first definition is based on an inverse function of the distance: wij = dij^(−α), α > 0, where often α = 2 due to the analogy with Newton's law of universal gravitation. This first definition, however, presents the disadvantage of producing very dense W matrices, an issue that can create computational problems with very large datasets.
A second definition considers a threshold distance (say d*) introduced to increase the sparseness of the W matrix, thus reducing the computational problems emerging when dealing with large datasets (see Figure 2.1b). We can then have simple binary matrices where

wij = 1 if dij < d*, and wij = 0 otherwise

or, alternatively, a combination with the inverse distance definition where

wij = dij^(−α) if dij < d*, and wij = 0 otherwise.

Finally, we can adopt a k-nearest-neighbors definition where

wij = 1 if j ∈ Nk(i), and wij = 0 otherwise

where Nk(i) is the set of the k nearest neighbors to point i (see Figure 2.1a). Quite often the W matrices are standardized to sum to unity in each row, an operation called "row standardization". In this case we have:

wij* = wij / Σ_{j=1}^n wij ;  wij* ∈ W*   (2.3)

This standardization may be very useful in some instances. For example, by using the standardized weights we can define the matrix product

L(y) = W*y   (2.4)
Figure 2.1 (a) K-nearest-neighbors contiguity criterion (k = 4): only the first k nearest neighbors are considered; (b) maximum threshold criterion: all points within a radius d* are considered neighbors of the point located in the center.

in which each single element is equal to:

L(yi) = Σ_{j=1}^n wij* yj = ( Σ_{j=1}^n wij yj ) / ( Σ_{j=1}^n wij ) = ( Σ_{j∈N(i)} yj ) / #N(i)   (2.5)
with #N(i) representing the cardinality of the set N(i). The term in Equation 2.5 represents the average of the variable y observed over all the individuals that are neighbors of individual i (according to the criterion chosen in defining W). It therefore assumes the meaning of the "spatially lagged value" of yi and for this reason is often indicated with the symbol L(y), by analogy with the lag operator in time-series analysis. This definition assumes no directional bias, in that all neighbors affect individual i in the same way (for alternatives, see Arbia, 1990; Arbia et al., 2013). An important aspect of W matrices is represented by their "density", defined as the percentage of non-zero entries, a value that ranges between 0 and (n − 1)/n when all off-diagonal entries are non-zero. The complement to 1 of the density is called the "sparsity" of the matrix. Dense W matrices should be avoided as they may involve severe computational problems in terms of computing time, storage and accuracy, especially when the sample size is very large (see Arbia et al., 2019b).
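As a practical illustration of these definitions, the following R sketch builds the alternative W matrices discussed above with the spdep package. The coordinates, the variable y, the threshold value and the number of neighbors are hypothetical placeholders chosen only for illustration.

# A minimal sketch, assuming a two-column matrix of point coordinates
# and an outcome vector y observed at the same points.
library(spdep)

set.seed(1)
coords <- cbind(runif(10), runif(10))   # hypothetical point locations
y      <- rnorm(10)                     # hypothetical variable

# (i) binary threshold-distance neighbors (here d* = 0.45)
nb_d   <- dnearneigh(coords, d1 = 0, d2 = 0.45)

# (ii) k-nearest-neighbors criterion (here k = 4)
nb_knn <- knn2nb(knearneigh(coords, k = 4))

# (iii) inverse squared distance weights within the threshold
dlist  <- nbdists(nb_d, coords)
glist  <- lapply(dlist, function(d) 1 / d^2)

# Row-standardized weights (Equation 2.3) and the spatial lag (Equations 2.4-2.5)
W_std  <- nb2listw(nb_d, style = "W", zero.policy = TRUE)
W_inv  <- nb2listw(nb_d, glist = glist, style = "W", zero.policy = TRUE)
lag_y  <- lag.listw(W_std, y)           # L(y) = W*y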
Example 2.1 Some simple examples of W matrices for sets of points are reported here. Consider, initially, a map of ten individual points (Figure 2.2). From this dataset we can build up four different W matrices employing the different neighborhood criteria illustrated above:
(i) The first W matrix can be built up as a simple binary threshold distance by fixing a conventional threshold at a distance of, say, 0.45 (R Command: {spdep}-dnearneigh) (Table 2.1). The matrix can then be row-standardized, obtaining Table 2.2.
(ii) The second alternative is to build up a W matrix using the nearest neighbor criterion (R Command: {spdep}-knearneigh). We obtain Table 2.3, which is standardized by definition.
(iii) As a third alternative we consider the squared inverse distance criterion. We first obtain the pair-wise distance matrix (Table 2.4) and then the W matrix is obtained by calculating in each entry the squared inverse distance (keeping 0 in the main diagonal) (Table 2.5), which can be row-standardized, obtaining Table 2.6.
(iv) Finally, we build up a combination of cases (ii) and (iii) using the squared inverse distance below a threshold.
Values of the index > 1 indicate negative spatial correlation and, finally, values of the index < 1 indicate positive spatial correlation. The expected value and the variance of the index under the null of no spatial correlation were derived by Geary (1954), who also proved normality, thus allowing the standard hypothesis testing procedures. Similarly to Moran's I, however, no explicit alternative hypothesis is specified.
2.3 The Moran scatterplot and local indicators of spatial correlation Both Moran’s I and Geary’s index are global measures in that they allow us to test for a spatial pattern over the study area as a whole. However, individual behaviors are typically heterogeneous in space. Therefore, it could happen that the presence of significant spatial correlation in smaller sub-sections of the study areas is not observed in a single global index due to the presence of other sub-sections displaying negative or no spatial correlation. A way of measuring this effect is to define local measures that look at the spatial correlation of the phenomenon within smaller partitions. This approach is referred to as the analysis of “local indicators of spatial association” (LISA) (Anselin, 1995). The LISA are exploratory tools that can be employed on data before starting the formal modelling through regression analysis to identify patterns of spatial correlation among the variables of interest. While many different local indicators can be defined (see Anselin, 1995), here we will present only two of them, namely the local version of Moran’s I and the Getis–Ord statistic. Local Moran’s I represents a decomposition of global Moran’s I (Anselin, 1995) and can be expressed as:
Ii = [ (Xi − X̄) Σj wij (Xj − X̄) ] / [ (1/(n − 1)) Σj (Xj − X̄)² ]   (2.14)
where w ij are the elements of a weight matrix defined in Section 2.1, X i (i = 1, 2, …, n) are the observed values of the variable X in location i and X is the sample mean of X. As is clear, Equation 2.14 represents the single addend of Equation 2.9 and basically constitutes the contribution of each individual unit to the global measure. Positive values of local Moran’s I indicate a clustering of high values in the neighborhood of high values or, alternatively, a clustering of low values in the neighborhood of low values. These are generally indicated as HH (for high–high, called “hot spots”) or LL (for low–low, called “cold spots”). Conversely, negative values of the local index indicate the presence
of spatial outliers, where there is a significant concentration of low values in the neighborhood of high values or, alternatively, an extra concentration of high values in the neighborhood of low values. These points are indicated respectively as HL or LH points. The significance of each local indicator can then be calculated either assuming asymptotic normality (by using the formal expressions derived for the expected value and the variance) or, alternatively, using Monte Carlo simulated sampling distributions.
Example 2.4 A sample output of the R procedure for local Moran's I (procedure localmoran) is reported in Table 2.12. The outcome of the procedure is a vector of values which represent the single contributions to global Moran's I (the elements Ii in Equation 2.14) with the associated significance level. Table 2.12 includes only a sample of rows from the output of local Moran's I related to the variable house price in the R dataset Boston already illustrated in Example 2.3 (R Command: {spdep}-localmoran). Significant local spatial correlations (significance below 0.05) are marked with an asterisk.

Table 2.12 Local Moran's I of house prices in Boston

Location   Ii              E(Ii)          Var(Ii)       z-score         Significance
1          −8.913755e−02   −0.001980198   0.037846512   −0.448013484    0.67
…          …               …              …             …               …
15         0.170           −0.002         0.025         1.090           0.137
16         −0.027          −0.002         0.022         0.621           0.267
17         −0.027          −0.002         0.025         −0.156          0.562
18         0.234           −0.002         0.026         1.450           0.073
19         0.098           −0.002         0.024         0.642           0.260
20         0.190           −0.002         0.023         1.265           0.102
21         0.367           −0.002         0.024         2.369           0.009 *
22         0.133           −0.002         0.026         0.831           0.203
23         0.312           −0.002         0.026         1.961           0.025 *
24         0.590           −0.002         0.026         2.218           0.013 *
25         0.287           −0.002         0.027         1.747           0.040 *
26         0.295           −0.002         0.025         1.883           0.030 *
…          …               …              …             …               …
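Output of this kind can be produced along the following lines; a minimal sketch, assuming the Boston data as shipped in the spData package (columns CMEDV, LON and LAT) and an arbitrary k-nearest-neighbors weight matrix, so the exact figures may differ from Table 2.12.

library(spdep)
library(spData)

data(boston, package = "spData")                 # boston.c: 506 census tracts
coords <- cbind(boston.c$LON, boston.c$LAT)      # tract coordinates
lw     <- nb2listw(knn2nb(knearneigh(coords, k = 4)), style = "W")

# Local Moran's I (Equation 2.14) for corrected median house values
li <- localmoran(boston.c$CMEDV, lw)
head(li)                                         # Ii, E(Ii), Var(Ii), z-score, p-value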
A second measure of local spatial correlation is based on the local concentration of values in the neighborhood of each individual. This second local index was introduced by Ord and Getis (1992) and it assumes the following expression:
Gi = Σj wij Xj / Σj Xj   (2.15)

where wij are the elements of the usual weight matrix. The index Gi is thus the share of the total amount of the variable X which is concentrated in the neighborhood of the i-th individual. In the same paper (Ord and Getis, 1992) the authors derived the expected value and variance of Gi so that formal testing procedures are available. A further paper (Getis and Ord, 1995) introduced a modified version of the statistic where the values are standardized to facilitate the interpretation. The modified index is expressed as:

Gi* = [ Σ_{j=1}^n wij Xj − X̄ Σ_{j=1}^n wij ] / { S √[ ( n Σ_{j=1}^n wij² − (Σ_{j=1}^n wij)² ) / (n − 1) ] }   (2.16)
where S is the standard deviation of the variable X. The modified G* is a Gaussian z-score so that hypothesis testing can be run straightforwardly. Positive values of the statistics indicate clustering of high values (coded as HH), while negative values indicate clustering of low values (coded as LL). A further tool is represented by the Moran scatterplot, a graphical exploratory tool introduced by Anselin (1995) that can help in identifying local
patterns of spatial correlation. It is obtained as a simple scatterplot which places the value of the variable (say X) on the horizontal axis and the corresponding spatially lagged value (say WX) on the vertical axis.

Figure 2.4 Moran scatterplot of house prices in Boston.
Example 2.5 As an example of a Moran scatterplot, Figure 2.4 considers again the house prices in the 506 locations reported in the dataset Boston already used in Examples 2.3 and 2.4. The global value of Moran's I is positive and highly significant (I = 20.1892, p-value < 2.2e − 16), as is evident from the increasing regression line drawn on the graph. However, in addition to this global information, the graph also shows the presence of a large number of outliers (points observed in quadrants 2 and 4) which relate to prices that are much higher (or much lower) than the average of the neighboring locations (R Command: {spdep}-moran.plot).
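A global Moran test and the corresponding Moran scatterplot can be obtained as sketched below; the weight matrix is again an arbitrary k-nearest-neighbors choice, so the reported statistics need not coincide exactly with those quoted in Example 2.5.

library(spdep)
library(spData)

data(boston, package = "spData")
coords <- cbind(boston.c$LON, boston.c$LAT)
lw     <- nb2listw(knn2nb(knearneigh(coords, k = 4)), style = "W")

# Global Moran's I test for the Boston house values
moran.test(boston.c$CMEDV, lw)

# Moran scatterplot: price against its spatial lag, with influential points flagged
moran.plot(boston.c$CMEDV, lw, xlab = "PRICE", ylab = "spatially lagged PRICE")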
2.4 Conclusions This chapter aimed to introduce the fundamental concepts of spatial analysis which will constitute the backbone of the rest of this part of the book and, in some sense of the whole book. We have introduced, in particular, the notion of the W matrix which is the fundamental tool in the spatial regression models which we will discuss in Chapters 3 and 4. The notion of spatial autocorrelation among regression residuals has also been approached by introducing various measures and hypothesis test statistics. Finally, we have also introduced some exploratory tools much used in the literature before facing the problem of specifying a plausible behavioral regression model which contemplates the presence of spatially interacting individual agents.
3
Basic cross-sectional spatial linear models
3.1 Introduction
This chapter discusses different specifications of linear spatial econometric models. In particular, Section 3.2 is devoted to a detailed presentation of the basic models belonging to the SARAR(1,1) (spatial autoregressive with additional autoregressive error structure) class. Section 3.3 introduces the associated tests for residual spatial correlation. Section 3.4 approaches the problem of quantifying the marginal effects in a spatial econometric linear model. In Section 1.4 we discussed the possible presence of spatial imperfections that may occur when dealing with micro-data. This topic is taken up again in Section 3.5, which discusses how these imperfections may affect spatial econometric regression analysis, with a particular emphasis on locational error and missing spatial data. Finally, Section 3.6 is devoted to the particular case, which occurs frequently in spatial microeconometrics, in which a distance is used as a predictor in a regression model, and to how data imperfections related to missing data or locational error may seriously undermine the estimation and hypothesis testing procedures.
3.2 Regression models with spatial autoregressive components
3.2.1 Pure spatial autoregression
The simplest of the models containing a spatial autoregressive component is the autopredictive model, where the dependent variable is regressed on its own spatial lag without including any further predictors. This specification is known in the literature as the "purely spatial autoregressive model" (SAR), which can be expressed by the following equation:

y = ρWy + ε
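In R, models of this kind can be fitted, for instance, with the spatialreg package; the following is a minimal sketch, assuming a response vector y and a row-standardized listw object W_std such as the one built in the earlier sketches, and using an intercept-only formula for the pure autoregression. It is one possible route, not the book's prescribed procedure.

library(spatialreg)

# Pure spatial autoregression: y regressed only on its own spatial lag
sar_pure <- lagsarlm(y ~ 1, listw = W_std)
summary(sar_pure)   # reports the estimate of rho and its significance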
This expression leads us to several interesting conclusions. First of all, with location error, heteroscedasticity is introduced into the model, due to the fact that Var[ui] in Equation 3.66 will, in general, be non-constant. Secondly, the attenuation effect is, obviously, null when θ* = 0, and it is also reduced to zero if points are scattered: in fact, when d → ∞, Var(u) → 0. It is, conversely, emphasized if points are clustered in space, in that it becomes larger when d → 0. Finally, the attenuation effect is a complex function of the maximum displacement distance θ* and of the pairwise distances between the observed points.
3.6 Problems in regressions on a spatial distance In many spatial microeconometric studies, the distance between each observed individual economic agent and a conspicuous point is often used as a predictor in a regression model. For instance, in health economics it is common practice to postulate a relationship between a health outcome for each individual (such as the effect of a health policy) and the individual’s distance from a clinic or a hospital. Similar examples may be found in labor economics, education, hedonic price studies and industrial economics to name but a few. In principle, there is no obstacle to using the OLS procedure to estimate the parameters of a regression with a distance used as an independent variable. However, as we have shown in Section 3.5, the presence of missing data or locational errors produces the effect that pairwise distances are biased upwards and produce a measurement error on
spatial autoregressive models. A similar effect can be observed when a distance is used as a regressor (see Arbia et al., 2015a). In this section we will report some useful theoretical results with the aim of illustrating the dangers that are hidden in this procedure. Let us consider the simple linear model with only the square of the distance as a regressor:

y = α + βd² + ε   (3.70)
which, although admittedly not very realistic, helps in illustrating the main points. In Equation 3.70, without loss of generality, we can assume that the conspicuous point is located at the origin of a square area, so that the squared distance between a generic point with coordinates (i, j) and the conspicuous point can be simply expressed as dij² = (i² + j²). Consider further the case of a uniform geo-masking (illustrated in Section 3.5) where the coordinates are displaced along a random angle and a random distance. The observed squared distance after geo-masking can now be expressed, using polar coordinates, as

d̃ij² = (i + θ cos δ)² + (j + θ sin δ)²

By defining the measurement error as uij = d̃ij² − dij², and similarly to the case discussed in Section 3.5, Arbia et al. (2015a) proved that the geo-masking produces an upward bias and a non-constant variance, respectively given by:

E(uij) = θ*²/3 ≠ 0   (3.71)

and

Var(uij) = (17/180) θ*⁴ + (2/3) θ*² dij²   (3.72)

So, consistently with the results reported in Section 3.5, the variance of the measurement error increases with the maximum displacement distance θ* and as we move away from the conspicuous point (the term dij). We can use this result to provide an explicit expression for the estimation variance and for the attenuation effect induced by the measurement error. We have, respectively:

Var(β̂) = [ β² ( (17/180) θ*⁴ + (2/3) θ*² dij² ) + σν² ] / ( n σ²d² )   (3.73)

and

plim(β̂) = β σ²d² / ( σu² + σ²d² ) = β σ²d² / ( (17/180) θ*⁴ + (2/3) θ*² dij² + σ²d² )   (3.74)

where σ²d² denotes the variance of the squared distances and σν² the variance of the regression disturbance.
(see again Arbia et al., 2015a), which leads to the intuitive result that the greater the maximum displacement distance in a geo-masking procedure, the lower the precision and the larger the attenuation effect. This result is useful for practical purposes because the data producers, before geo-masking the data, can use the appropriate expression when choosing the maximum location error (θ*) so as to limit the negative consequences on any subsequent econometric analysis. Furthermore, using Equation 3.74, the data producer could disclose to the end users and to the practitioners the level of attenuation that is expected given the chosen level of the geo-masking procedure. The results reported here for linear models were extended to the case when a distance is used as a regressor in a discrete choice model (Arbia et al., 2019b).
Example 3.7 In order to illustrate the effects described in this chapter, let us consider a set of simulated data where 100 individuals are observed in a unit square study area as shown in Figure 3.2. Taking these points as given, we have that in Equations 3.73 and 3.74 dij2 = 0.520151 (considering for operational reasons the mean of
all squared distances from the origin) and σ²d² = 0.1592879.

Figure 3.2 100 simulated individuals observed in a unit square study area.

Figure 3.3 Attenuation effect for Gaussian (lower curve) and uniform (upper curve) geo-masking as a function of the maximum displacement.

With reference to these artificial data, Figure 3.3 reports the behavior of the attenuation effect for Gaussian (lower curve) and uniform (upper curve) geo-masking for values of the maximum displacement distance θ* ranging between 0 and 1.44 (1.44 being the theoretical maximum possible distance in a unitary square). Two features emerge from the inspection of the graph. First,
the attenuation increases dramatically already at small levels of θ *. Secondly, the Gaussian geo-masking, other things being constant, produces more severe consequences on the estimation of β than the uniform geo-masking. This type of graph could be used by data producers to calibrate the optimal value of θ * and to communicate to the practitioners the resulting level of attenuation they should expect from a regression analysis.
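The attenuation curve for the uniform geo-masking case can be traced directly from Equation 3.74; the R sketch below does so for the artificial data of this example, using the values d̄² = 0.520151 and σ²d² = 0.1592879 reported above (the Gaussian curve would require the analogous moments for Gaussian displacement, which are not reproduced here).

# Attenuation factor plim(beta_hat)/beta under uniform geo-masking (Equation 3.74)
d2_bar    <- 0.520151     # mean squared distance from the origin (Example 3.7)
sigma2_d2 <- 0.1592879    # variance of the squared distances (Example 3.7)

attenuation_unif <- function(theta) {
  var_u <- (17 / 180) * theta^4 + (2 / 3) * theta^2 * d2_bar   # Equation 3.72
  sigma2_d2 / (var_u + sigma2_d2)
}

theta <- seq(0, 1.44, by = 0.01)
plot(theta, attenuation_unif(theta), type = "l",
     xlab = "theta", ylab = "attenuation")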
4
Non-linear spatial models
4.1 Non-linear spatial regressions In Chapter 3 we considered the case of linear spatial regressions. In this chapter we will extend the analysis to non-linear models. Non-linear models can emerge for different reasons. One important instance is when the dependent variable can assume only a limited number of discrete outcomes. There are many examples of cases where this modeling framework is useful in spatial microeconometrics. For instance, we might be interested in explaining patient’s choices in health economics or school choices in educational economics. But many other examples can be found: for instance, when studying the presence or absence of a certain technology in a set of firms in industrial economics, consumer choice regarding different shopping centers, in electoral behavior, in criminal behavior and in a large number of other situations. We refer to these cases as to “discrete choice modeling” (see Greene, 2018). A second important class of non-linear models emerges when the dependent variable is limited by censoring or truncation (Greene, 2018). Surprisingly, despite the interest of non-linear models from an applied perspective, in the spatial econometrics literature they have received comparatively less attention than the models presented in Chapter 3, partly because of the higher analytical complexity involved and the associated higher computational effort required even in moderately large samples. In the case of non-linear models the framework presented in Chapter 3 cannot be employed, and it needs to be adjusted to respect the statistical nature of the datasets employed. The various specifications of spatial non-linear models follow the general strategy used in the literature to deal with non-spatial non-linear models with a particular emphasis on the “logit”, “probit” and “tobit” specifications adapted to account for the presence of spatial dependence in the dependent variable. In what follows we will present some of the most popular spatial versions of these models. The interested reader is referred to the works of Beron and Vijverberg (2004), Fleming (2004), Smirnov (2010) and LeSage and Pace (2009) for more thorough reviews.
4.2 Standard non-linear models
4.2.1 Logit and probit models
We will start by introducing standard discrete choice specifications based on the idea of a utility function associated with a latent continuous variable y* which can be studied through a linear regression model:

y* = Xβ + ε,   ε ≅ i.i.d.   (4.1)

where the error terms are independent, with a distribution that can be differently specified. We imagine that if the utility variable y* exceeds a certain threshold, say a, then the associated binary variable y is equal to 1, and to 0 otherwise. Formally we can express this condition as y = I(y* > 0), with I(.) the indicator function such that I(a > 0) = 1 if a > 0 and 0 otherwise. When the utility y* is greater than 0, then the economic event materializes (such as the decision to go to a certain hospital or to enroll in a certain school). Equation 4.1 represents a basic linear regression model, but expressed in terms of an unobservable utility variable, y*. The phenomenon of interest can only be observed through the variable y. If we are interested in the probability that the event is realized we can calculate:

P(yi = 1 | X) = P(yi* > 0 | X) = P(Xiβ > ε | X) = P(ε < Xiβ | X) = F(Xiβ) = ∫_{−∞}^{μi} f(μ) dμ   (4.2)

and, similarly, for P(yi = 0 | X). With the symbol F(x) in Equation 4.2 we indicate the cumulative probability distribution function, with f(x) the associated density function such that f(x) = ∂F(x)/∂x, and with μ = Xβ the systematic component of the model, which in this literature is called the "index function" (Greene, 2018). If we specify different distribution functions in Equation 4.2 we obtain different discrete choice models (for a review, see Greene, 2018). The most popular are the probit and logit specifications. In particular, if in Equation 4.2 we define F(.) as the standardized normal probability distribution function, say F(.) = Φ, we obtain the probit model. If, conversely, we define F(.) as a standardized logistic distribution, say F(.) = Λ, we produce the logit model. The standardized logistic distribution has, by definition, zero expected value and a variance equal to π²/3. However, since the value of the dichotomous variable y depends only on the sign of y* and not on its absolute value, it is not affected by the amount of the variance, so that it is not a limitation to standardize it to 1. A popular estimation method for both the probit and the logit is the maximum likelihood approach. Indeed, the likelihood function can be easily built up as the probability of drawing a random sample, of size say n, from a sequence of independent Bernoulli variables. This is equal to:

L(β) = P(Y1 = y1, ..., Yn = yn | X) = P(Y1 = y1 | X) ... P(Yn = yn | X)   (4.3)
where Yi represents a random variable and yi its sample realization. In the standard setting, due to the hypothesis in Equation 4.1, we have:

L(β) = Π_{yi=1} F(Xiβ) Π_{yi=0} [1 − F(Xiβ)]   (4.4)

where Π_{yi=1} represents the product over all observations such that yi = 1, and similarly for Π_{yi=0}. Since y is dichotomous, we can express Equation 4.4 as:

L(β | X) = Π_{i=1}^n F(Xiβ)^yi [1 − F(Xiβ)]^(1−yi)   (4.5)

The log likelihood follows straightforwardly as:

l(β) = ln[L(β)] = Σ_{i=1}^n { yi ln F(Xiβ) + (1 − yi) ln[1 − F(Xiβ)] }   (4.6)
which can only be maximized numerically due to its non-linearity. From Equation 4.6 we obtain the score function:

∂l(β)/∂β = Σ_{i=1}^n [ yi fi/Fi + (1 − yi) (−fi)/(1 − Fi) ] xi = 0   (4.7)

First of all, let us assume that the probability distribution function has a logistic specification, which defines the logit model, and, furthermore, let us set in Equation 4.7 Fi = Λi = Λ(Xiβ). Equation 4.7 becomes (Greene, 2018):

∂l(β)/∂β = Σ_{i=1}^n (yi − Λi) Xi = 0   (4.8)

From Equation 4.8 we also obtain the elements of the Hessian matrix as:

∂²l(β)/∂β² = −Σ_{i=1}^n Λi (1 − Λi) Xi XiT   (4.9)
which forms the basis for the calculation of the Fisher information matrix to be used in confidence interval estimation and hypothesis testing. As an alternative, we can assume that the errors in Equation 4.1 are normally distributed, leading to a probit model. In this case the score function is equal to:

∂l(β)/∂β = Σ_{yi=0} [ −φi / (1 − Φi) ] Xi + Σ_{yi=1} [ φi / Φi ] Xi = 0   (4.10)

with φi = φ(Xiβ) the standard normal density function, such that φ(t) = ∂Φ(t)/∂t. The elements of the Hessian matrix in this case can be derived as:

∂²l(β)/∂β² = Σ_{i=1}^n −κi (κi + Xiβ) Xi XiT   (4.11)

with the term κi being equal to κi = (2yi − 1) φ[(2yi − 1) XiT β] / Φ[(2yi − 1) XiT β].
An important aspect of both logit and probit models concerns the interpretation of the parameters, a topic which we have already discussed in Section 3.8 when dealing with spatial linear models. Indeed, in both specifications the interpretation is not as straightforward as in the case of the (a-spatial) linear regression model, because in a non-linear model the impacts are not constant over the values of the variable X, and a whole function has to be considered. In this case, the marginal effect of a unitary increase of the independent variables on the binary dependent variable is not simply expressed by the regression coefficient, and it assumes, in general, the following expression:

∂E(y|X)/∂X = f(Xβ) β   (4.12)

which, due to the non-linearity of the model, depends on the observed value of the variable X. More explicitly, for the logit model this could be expressed as:

∂E(y|X)/∂X = Λ(Xβ) [1 − Λ(Xβ)] β   (4.13)

while for the probit model it assumes the following expression:

∂E(y|X)/∂X = φ(Xβ) β   (4.14)
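Average marginal effects based on Equations 4.13 and 4.14 can be computed directly from a fitted glm object; the sketch below uses simulated data purely for illustration, so all variable names and numbers are placeholders rather than part of the Baltimore example discussed later.

# A minimal sketch of Equations 4.13-4.14 with simulated data
set.seed(123)
n  <- 500
x1 <- rnorm(n); x2 <- rnorm(n)
y  <- rbinom(n, 1, plogis(0.5 + 1.0 * x1 - 0.8 * x2))
dat <- data.frame(y, x1, x2)

logit_fit  <- glm(y ~ x1 + x2, family = binomial(link = "logit"),  data = dat)
probit_fit <- glm(y ~ x1 + x2, family = binomial(link = "probit"), data = dat)

# Average marginal effects: mean of f(Xb) times the slope coefficients
Xb_logit  <- predict(logit_fit,  type = "link")
Xb_probit <- predict(probit_fit, type = "link")
ame_logit  <- mean(dlogis(Xb_logit)) * coef(logit_fit)[-1]    # Equation 4.13
ame_probit <- mean(dnorm(Xb_probit)) * coef(probit_fit)[-1]   # Equation 4.14
ame_logit; ame_probit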
4.2.2 The tobit model
A second source of non-linearity in the models can be the presence of censoring or truncation of the dependent variable. A general class of these models is represented by the tobit model (see Goldberger, 1964; Tobin, 1958). In contrast with the logit and probit specifications, in a tobit model the dependent variable is censored or truncated in some way (see Greene, 2018). A latent-variable linear model can then be specified, but in this case we do not observe the dependent variable, say y*, but rather only a variable y such that y = min(y*, c) if right-censored, or y = max(y*, c) if left-censored, with c the threshold. Usually the threshold is normalized by setting c = 0. In a truncated regression, data are missing beyond the censoring point.
The simpler standard tobit model (Type I) is expressed by the following equation:

y* = Xβ + ε   (4.15)

with ε ≈ i.i.d. N(0, σε²) and the usual notation for the rest. A similar notation can be used for left truncation. y* is a latent unobservable variable, while the observable variable is defined by y = y* if y* > c, and y = 0 if y* ≤ c. In deriving the likelihood function for the variable y notice that, according to the definition given above, we have to contemplate two cases: (i) y > 0 and (ii) y = 0. When y > 0 the density function for the single observation i is given by:

P(yi > 0) = (1/σ) φ( (yi − xiβ)/σ )   (4.16)

with φ indicating again the normal probability density function. When y ≤ 0, in contrast, the density function for the single observation i is given by:

P(yi = 0) = 1 − Φ( xiβ/σ )   (4.17)

with Φ the cumulative distribution function of the normal distribution. Let us now make use again of the indicator function Ii(yi* > 0) = Ii, for short. Then Equations 4.16 and 4.17 can be combined as:

P(yi) = [ (1/σ) φ( (yi − xiβ)/σ ) ]^Ii [ 1 − Φ( xiβ/σ ) ]^(1−Ii)   (4.18)
Under the hypothesis of independence, from Equation 4.15 we obtain straightforwardly the likelihood function as:

L(β, σ) = Π_{i=1}^n [ (1/σ) φ( (yi − xiβ)/σ ) ]^Ii [ 1 − Φ( xiβ/σ ) ]^(1−Ii)   (4.19)

and the log likelihood as:

l(β, σ) = Σ_{i=1}^n { Ii ln[ (1/σ) φ( (yi − xiβ)/σ ) ] + (1 − Ii) ln[ 1 − Φ( xiβ/σ ) ] }   (4.20)

that is used in estimation and hypothesis testing procedures.
Example 4.1
To illustrate the models presented here let us introduce a set of house-price data and house characteristics observed in Baltimore, Maryland in 1978. The data were prepared by Luc Anselin for the library spdep in R and were originally collected by Dubin (1992). The location of the houses are reported in Figure 4.1.
Figure 4.1 Locations of 211 houses in Baltimore.

Table 4.1 Output of a probit model explaining luxury houses in Baltimore as a function of number of storeys and square feet estimated with ML

Parameter           Estimated value   Standard error   z-test    p-value
Intercept           −1.624            0.842            −3.113    0.002
Number of storeys   −2.297            0.695            −3.308    0.001
Square feet         0.236             0.046            5.082     3.74e−07
AIC = 96.893
Table 4.2 Output of a logit model explaining luxury houses in Baltimore as a function of number of storeys and square feet estimated with ML

Parameter           Estimated value   Standard error   z-test    p-value
Intercept           −1.584            0.455            −3.479    0.0005
Number of storeys   −1.085            0.345            −3.144    0.001
Square feet         0.118             0.022            5.310     1.1e−07
AIC = 96.89
We estimate a probit and a logit model where a luxury house (defined as a house with a price higher than the 90th quantile) is expressed as function of two variables: square feet and number of storeys. The results obtained with the standard likelihood procedure are reported in Tables 4.1 and 4.2 for the probit and logit models (R Command: {stat}-glm.Probit). Results are obviously different in the two modeling frameworks, although the significance and the sign of the coefficients are in accordance in each model.
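Models of this kind can be fitted with glm; the following is a minimal sketch, assuming the Baltimore data as distributed with spdep/spData, whose relevant columns are taken here to be PRICE, NSTOR (number of storeys) and SQFT (square footage). The exact variable coding behind Tables 4.1 and 4.2 is not reported in the text, so the estimates may not match exactly.

library(spData)
data(baltimore, package = "spData")

# Binary outcome: "luxury" house, i.e. price above the 90th percentile
baltimore$luxury <- as.integer(baltimore$PRICE > quantile(baltimore$PRICE, 0.90))

probit_balt <- glm(luxury ~ NSTOR + SQFT, data = baltimore,
                   family = binomial(link = "probit"))
logit_balt  <- glm(luxury ~ NSTOR + SQFT, data = baltimore,
                   family = binomial(link = "logit"))
summary(probit_balt)   # compare with Table 4.1
summary(logit_balt)    # compare with Table 4.2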
Example 4.2 With the same dataset, we want to explain only the price of the luxury houses (with a censoring at the 90th quantile) as a function of the number of storeys and square feet using a tobit model of Type I, as shown in Table 4.3 (R command: {CensReg} CensReg).

Table 4.3 Output of a tobit model explaining the price of luxury houses in Baltimore as a function of number of storeys and square feet estimated with ML

Parameter           Estimated value   Standard error   t-test    p-value
Intercept           48.058            3.677            13.069    < 2e−16
Number of storeys   −17.001           2.228            −7.628    2.38e−14
Square feet         2.600             0.147            10.890    < 2e−16
AIC = 57897.88
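A sketch of the corresponding estimation with the censReg package is given below; the censoring point is set at the 90th percentile of prices, mirroring the example, but the exact specification behind Table 4.3 is not spelled out in the text, so this should be read as one plausible implementation rather than a reproduction of it.

library(censReg)
library(spData)
data(baltimore, package = "spData")

# Left-censor prices at the 90th percentile and fit a Type I tobit model
cens_point        <- quantile(baltimore$PRICE, 0.90)
baltimore$PRICE_c <- pmax(baltimore$PRICE, cens_point)   # censored response

tobit_balt <- censReg(PRICE_c ~ NSTOR + SQFT, left = cens_point,
                      data = baltimore)
summary(tobit_balt)   # compare with Table 4.3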
4.3 Spatial probit and logit models 4.3.1 Model specification In the spatial econometric literature the classical probit model has been adapted to account for spatial dependence in its versions as spatial lag or spatial error which we have reviewed in the case of linear models in Chapter 3. In particular,
by analogy with the spatial lag model presented in Section 3.3, the spatial lag probit model can be expressed as:

y* = λWy* + Xβ + ε
(4.21)
with ε | X ≈ i.i.d. N(0, 1), W the usual non-stochastic weight matrix, y* an n-by-1 vector of the latent variable Y, λ the spatial autoregressive coefficient, y the observed binary variable such that y = I(y* > 0), and X the matrix of regressors. Notice that the error variance can be normalized to 1 without loss of generality because only the sign of y* is relevant to determine y, and not its absolute value. The estimation of Equation 4.21 presents three major problems. First of all, by analogy with the case of the spatial lag specification for linear models described in Section 3.8, we have a problem of endogeneity generated by the correlation between the error and the spatially lagged value of y*. Secondly, the standard ML estimators are also inconsistent because of the heteroscedasticity induced by spatial dependence (Case, 1992; Pinkse and Slade, 1998). Thirdly, we also observe the inefficiency of the estimators as a consequence of the neglected information in the off-diagonal terms of the variance–covariance matrix (Fleming, 2004). Equation 4.21 can be expressed in its reduced form by solving for y*. We have:
−1
−1
Xβ + (I − λW ) ε = X *β + ε *
−1
(4.22) −1
with ε * = (I − λW ) ε and ε ≈ MVN (0, Ω) and X * = (I − λW ) X and assuming that all the diagonal elements of W are zero and that λ 0|X i* , wij yi* = P X *β + ε * > 0 | X i* , wij yi* *
X iT β + σ i X *T β Φ i Ωii
(
)
(
)
(4.32)
which can be used in an M-step where the following log likelihood is maximized: l = cos t −
1 1 ln Ω − µT µ 2 2
(
)
(4.33)
with Ω = (I − λW )−1 (I − λW )−T , µ = (I − λW ) yˆ * − X i β , and yˆ * is the vector of the predicted values of the latent variable derived in the E-step. Although theoretically sound, the EM approach has two drawbacks. First of all, it requires the estimation of the variance–covariance matrix Ω. The solution suggested by McMillen (1992) consisted of interpreting the probit model as a non-linear weighted least square model conditional on the spatial parameter (Amemiya, 1985), but this solution produces biased estimators (Fleming, 2004). Secondly, the estimation process can be computationally very slow and
Non-linear spatial models 67 inaccurate. Indeed, the calculation of the determinant of the matrix Ω needs to be repeated at each iteration of the M-step until convergence and in the presence of large n and dense W matrices this operation can be very long and inaccurate in that it needs to be based on approximations.
Example 4.3 Using again the dataset presented in Examples 4.1 and 4.2, we have the results shown in Table 4.4 (R command: {McSpatial}-spprobitml), where the parameter Table 4.4 Output of a spatial lag probit model explaining luxury houses in Baltimore as a function of number of storeys and square feet estimated with ML Parameter
Estimated value Standard error
z-test
p-value
Intercept Number of storeys Square feet
−1.674 −0.558
0.635 0.323
−2.635 −1.727
0.008 0.084
0.119
0.022
5.338
0.084
λ = 0.7941587
λ is positive, and the coefficients are consistent with those found in Example 4.1 for the a-spatial probit specification. Turning our attention to the spatial error probit model, Pinkse and Slade (1998) suggested a solution based on the generalized method of moments. Going back to Equation 4.6 the generic formulation of the log likelihood function of a probit model is: l (β , ρ)= ln [L (β , ρ)] =
n
∑ y {ln F (X β ) + (1 − y ) ln 1 − F (X β )} i
i
i
i
i=1
which, in the hypothesis of normal disturbances, can be expressed as:
(
l β , ρ|X ,Wy
n
*
XT β n XT β = yi lnΦ i + (1 − yi )lnΦ 1 − i Ωii Ωii i =1 i=1
) ∑
∑
(4.34)
Traditionally a GMM approach is based on the identification of a set of moment conditions based on the properties of the residuals. In this new context we introduce the notion of pseudo- (or generalized) errors, defined by: X T β X T β yi − Φ i φ i Ωii σ ii ui = X T β X T β Φ i 1 − Φ i Ωii Ωii
(4.35)
68 Spatial behavior of economic agents Let us now consider a set of k instruments arranged in a (n-by-k) matrix H. The instruments are exogenous by definition so that we can define the set of the moments condition:
(
)
E H T u = 0
(4.36)
If we indicate with the symbol hi the i-th row of a matrix of instruments H, the single i-th condition of Equation (4.36) can be expressed algebraically as: E hi
X T β X T β yi − Φ i φ i Ωii σ ii =0 X T β X T β i i 1 − Φ Φ Ωii Ωii
(4.37)
so that, eventually, if we set to zero the empirical analogue of Equation 4.37, we obtain the following system of moment equations:
n
m (β , ρ ) =
∑ i=1
X T β X T β yi − Φ i φ i Ωii σ ii hi =0 X T β X T β Φ i 1 − Φ i Ωii Ωii
(4.38)
If we adopt a generalized version of the method of moments, Equation 4.38 is replaced by the following minimization problem: m(β , ρ)T M −1m(β , ρ) = min
(4.39)
M being a positive definite matrix containing the weights assigned to each sample moments m(β , ρ). Pinkse and Slade (1998) proved that the GMM procedure provides consistent and asymptotically normal estimators and developed an explicit expression for their variance–covariance matrix using Newey and West’s approach (see Newey and West, 1987). The GMM estimator is distribution free so that we do not have to rely on the hypothesis of normality of residuals as in the maximum likelihood procedure. Furthermore, we do not need to calculate the determinant and the inverse of large n-by-n matrices, which, as noted in the previous section, constitutes the major computational problem connected with the use of the maximum likelihood procedure when estimating a spatial lag probit model. On the other side, Equation 4.35 cannot be minimized analytically, so the GMM estimators suffer from the computational problems connected with numerical optimization. Similarly to what happens when implementing the ML, this operation requires the evaluation of the variance–covariance matrix Ω repeatedly for each candidate value of the parameter ρ in a numerical search and this can become a formidable task even for moderately large sample due to the complex form of Ω. The technique described here could be adjusted also for a spatial error logit model.
Non-linear spatial models 69
Example 4.4 With the R command: {McSpatial}-gmmprobit, we can estimate the spatial lag version of the procedure introduced by Pinkse and Slade (1998). Using again the dataset described in Example 4.1 and building up the same model, we obtain Table 4.5, which shows that, while all parameters are still significant and of the same sign as in Examples 4.1 and 4.3, now also the parameter λ is positive and significant. Table 4.5 Output of a spatial lag probit model explaining luxury houses in Baltimore as a function of number of storeys and square feet estimated with GMM Parameter
Estimated value Standard error
z-test
p-value
Intercept Number of storeys Square feet
−1.074 −0.832
0.544 0.416
−1.974 −2.001
0.048 0.045
0.113
0.0226
5.005
5.56e−07
λ = 0.7162724 (p-value = 1.597265e − 06)
In order to estimate a spatial lag logit model, Klier and McMillen (2008a) proposed a linearized version of Pinkse and Slade’s GMM approach illustrated in the previous section which avoids the problem of inverting large matrices. Recalling Equation 4.22 the spatial lag logit model can be defined as: y * = λWy * +Xβ + ε
(
(4.40)
)
and ε | X ≈ Logistic 0, σ ε2 . In its reduced form this becomes: y * = (I − λW )
−1
−1
Xβ + (I − λW ) ε = X *β + ε *
(4.41)
*
where the transformed errors ε are heteroscedastic with a variance–covariance matrix: σ 1 0 ˆ = Σ 0
0 σ2
0
0
0 0
σn
0
(4.42)
In this context, due to the hypothesis that the errors are distributed according to the logistic law, the probability of success can be defined as: Pi = P ( yi = 1) =
(
exp X **β
(
)
1 + exp X **β
)
(4.43)
H = I −λ
70 Spatial behavior of economic agents −1 ^ −1 is the variable X transformed as in Equation 4.22, with X ** = (I − λW ) X ∑ but also normalized to account for the heteroscedasticity. In this setting, Klier and McMillen (2008a) define the generalized logit model as:
ui = yi − Pi
−1
(4.44)
and introduced their estimation procedure, by assuming an initial value for the parameters to be estimated in Equation 4.40. Let us call them δ0 = ( β0 , λ0 ). They then use these initial values in Equations 4.39 and 4.40 to calculate the initial value of the generalized residuals, call them u 0 = yi − Pi 0 with an intuitive notation. ∂P The next step involves the calculation of the gradient Gδ0 = i0 and to regress ∂δ0 −1 it onto a set of instruments, say H, defined by the transformation H = (I − λW ) WX ** with X ** defined as before. The outcome of this operation is an estimated value for the gradient which we will call Gˆ . This estimation is then employed in the following recursive expression:
δ0
δ1 = δ0 + (Gˆ 0T Gˆ 0 )−1Gˆ 0T u 0
(4.45)
which makes use of the initial value of δ0 and of the pseudo-residuals u 0 . Klier and McMillen (2008a) derived the explicit expressions for the gradients of β and λ , given by: G βi = Pi (1 − Pi )Z i**
(4.46)
and Z **β G λi = Pi (1 − Pi )H i β − i 2 Ξii σi
(4.47)
with Ξii the i-th diagonal element of matrix Ξ = (I − λW )−1W (I − λW )−1 (I − λW )−1. To make the method operational and avoid the computations involved by the inversion of the matrix (I − λW )−1 Klier and McMillen (2008a) proposed linearizing Equation 4.47 by a series expansion approximation around the starting point λ0 = 0 where no matrix inversion is required. Following this idea, and stopping at the first linear term of the expansion, we have: ui ≅ ui0 + G (δ − δ0 )
(
and let M = H T H
(
)
(4.48)
) −1. The objective function to be minimize thus becomes:
νT H H T H H T ν
(4.49)
X **
Non-linear spatial models 71 where v are the transformed generalized errors defined by:
νi = ui0 + Gδ0 −Gδ
(4.50)
In this way the procedure reduces to a two-stage least squares estimation of a standard non-spatial logit model. In general, the linearized model provides a good approximation for estimating the true parameters although with a certain loss of efficiency. Furthermore, when the true structure of the model is captured by the model, Klier and McMillen (2008a) show that the linearization provides accurate estimates when the parameter λ < 0.5 while it produces a upward bias when λ > 0.5. The procedure can be easily extended to deal with a spatial lag probit model instead of a logit model.
Example 4.5 Again we make use of the house price data presented in Example 4.1 to estimate a spatial lag probit model using the linearized version of the GMM procedure illustrated in this section. Results are reported in Table 4.6 for the probit model (R command: {McSpatial}-spprobit) and Table 4.7 for the logit model (R command: {McSpatial}-splogit). Although with some difference in the absolute value, the inferential conclusions are similar for the two models. In both cases the spatial correlation parameter λ is positive and highly significant, although for the second model only the Table 4.6 Output of a spatial lag probit model explaining luxury houses in Baltimore as a function of number of storeys and square feet estimated with a linearized version of GMM Parameter
Estimated value Standard error
z-test
p-value
Intercept Number of storeys Square feet
−0.255 −0.828
0.539 0.283
−0.474 −2.923
0.635 0.003
0.018
0.023
5.450
0.000
λ = 0.95393 (p-value = 0.000)
Table 4.7 Output of a spatial lag logit model explaining luxury houses in Baltimore as a function of number of storeys and square feet estimated with a linearized version of GMM Parameter
Estimated value Standard error
z-test
p-value
Intercept Number of storeys Square feet
0.149 −1.530
1.472 0.973
0.101 −1.572
0.919 0.115
0.171
0.074
2.296
0.021
λ = 1.04116 (p-value = 0.004)
72 Spatial behavior of economic agents variable “square feet” is significant. Furthermore, comparing these results with those reported in Examples 4.1, 4.3 and 4.4, we obtain similar results in terms of the sign of the coefficients, but notice that both the absolute value and the standard errors are rather different in the three estimation methods.
4.4 The spatial tobit model 4.4.1 Model specification Although spatial versions of the probit and logit models are probably the most commonly used non-linear models in the spatial literature, in the specific area of microeconometrics the tobit model also enjoys a certain popularity (Tobin, 1958; Amemiya, 1985). In recent years some studies (Flores-Lagunes and Schnier, 2012; Xu and Lee, 2015) extended the basic tobit model to consider spatial effects. Qu and Lee (2012) introduced into the literature, two distinct typologies of spatial lag tobit models namely: the “simultaneous spatial lag tobit model”, which is expressed as: n yi = max 0, λ wij y j + xi β + εi i =i
∑
(4.51)
and the “latent spatial lag tobit model”, which is expressed through the latent variable yi* defined by the equation: yi = max {0, yi* } where yi* = λ
n
∑w
* ij y j
+ x i β + εi
(4.52)
i =i
It is fair to say that, compared to the latent SAR Tobit model, there are fewer studies that use the simultaneous specification, possibly because, in this case, there is still no formal proof of asymptotic properties of the ML estimators. Qu and Lee (2012) also showed that the spatial lag tobit model can be motivated by two distinct branches of microeconomic theories. The first is the literature on peer effects from an exogenous social network in which the model represents a Nash equilibrium where each individual maximizes its utility. The second is related to the standard econometric modelling in cases where a large share of data can be zero. Apart from the Qu and Lee specification, LeSage (2000) and LeSage and Pace (2009) presented a Bayesian approach in the estimation of the latent SAR Tobit model. Similarly, Donfouet et al. (2012) and Autant-Bernard and LeSage (2011) also make us of a latent SAR Tobit model using Bayesian tools.
4.4.2 Estimation Qu and Lee (2013) showed that the log likelihood function of the latent spatial lag tobit model (Equation 4.52) can be expressed as:
Non-linear spatial models 73 N
l (λ, β , σ ) =
∑
I ( yi = 0) lnΦ(zi (θ ) −
I =1
1 ln 2πσ 2 2
−
(
1 2
n
n
) ∑I ( yi > 0) + ln I 2 − λW22
∑
i =1
I ( yi > 0) zi2 (θ )
(4.53)
i=1
where, in addition to the notation previously introduced, θ = ( λ , β ′, σ ) , W22 represents the submatrix of W which corresponds to yi >0, I2 is the identity matrix ( yi − λwijΥj − xi β ) with the same dimension and the term zi is given by zi = σ where Y = max(0, y*). Qu and Lee (2015) showed that Equation 4.53 is computationally tractable and can be maximized numerically to obtain the ML estimation. They also established the consistency and asymptotic normality of the ML estimation and proved through simulations their finite sample performance and the robustness of estimates under non-normal disturbances.
4.5 Further non-linear spatial models Apart from the non-linear models considered in this section, in the econometric literature we find various other specifications of discrete choice models including bivariate and multivariate probit and logit, ordered probit and logit, truncation, censoring, sample selection, models for count data and duration (see Greene, 2018). Some of these topics have been treated in the spatial context (e.g. Wang and Kockelman, 2009), but the field is still largely unexplored. From the estimation point of view in the literature, various alternatives have been introduced in an attempt to reduce the computational burden that can be very heavy even with datasets of a few thousand observations. For instance, LeSage and Pace (2009) suggest the use of a Gibbs sampler algorithm in order to estimate a spatial lag probit model. However, the approach does not eliminate the problem of the inversion of the weight matrix which is present in all the other methods. As a consequence, its use is limited to samples of few thousands of observations. They report that, in a simulation experiment with a sample size of 400 and 1,200 draws of the MCMC sampler and only m = 10 replications of the Gibbs sampler, the estimation procedure required 20 minutes with the computational time increasing proportionally to n. Thus for instance, if we increase the sample size up to n = 10,000, the time required increases up to about 9 hours. The procedure is obviously sensitive to m, but even reducing it to, say, m = 1 (at the expense of accuracy) the time required is still more than 1 hour. Beron and Vijverberg (2004) proposed a further alternative based on the GHK simulator to evaluate the n-dimension integral, but without succeeding to substantially reducing the computational burden. More recently Wang et al. (2013) and Arbia et al. (forthcoming) suggest the use of a partial bivariate likelihood.
4.6 Marginal impacts in spatial non-linear models
In Chapter 3.8 we discussed the issue of calculating the marginal impact produced on the dependent variable y by unitary changes in the independent variables of a linear spatial regression model. In Equation 4.14 we presented the analytical expression of the analogous marginal effects in a standard probit model with no spatial dependence. It is now possible to extend this analysis to the spatial probit model. The discussion which follows draws heavily on LeSage et al. (2011), who introduced this topic in the spatial literature. Let us consider the impact on y* in location i arising from a change in an independent variable X_k at location j (LeSage et al., 2011). By definition this is given by the derivative of the expected value of y*, which can be written as:

E(y*) = S(ρ)Xβ = η,  with S(ρ) = (I − ρW)^{-1}   (4.54)

a matrix of single entries S_{ij}(ρ). Let us now consider the derivative of E(y*) in location i with respect to the observation of variable X_k in location j, which can be written as ∂P(y_i = 1)/∂X_{k,j}. Now remember that from Equation 4.10 we can express P(y = 1) = F{S(ρ)Xβ} as the probability distribution function at the truncation point S(ρ)Xβ. Therefore we can express the marginal effect in algebraic terms as:

∂P(y_i = 1)/∂X_{k,j} = [∂F(η)/∂η] S_{ij}(ρ) β_k = f(η_i) S_{ij}(ρ) β_k   (4.55)

where f(·) represents the probability density function f(x) = ∂F(x)/∂x and β_k the coefficient of variable X_k in the probit model. Notice, incidentally, that if ρ = 0 then S(ρ) = I_n (and each off-diagonal entry S_{ij}(ρ), i ≠ j, equals 0), and Equation 4.55 reduces to the marginal effect of a standard a-spatial probit model, ∂P(y_i = 1)/∂X_{k,i} = [∂F(η)/∂η] β_k, where changes in the x-values of a neighboring location j have no influence on location i. LeSage et al. (2011) suggested representing all the marginal effects in matrix form by arranging on a diagonal matrix the vector of the probability density function evaluated at the predictions for each observation. Let us indicate this vector as d{f(η)} ≡ {f(η_i)} and the diagonal matrix which contains it with the symbol D{f(η)}. By construction D{f(η)} is symmetric with all the non-diagonal elements equal to 0. As a consequence we can re-express Equation 4.55 for the independent variable X_k in matrix form as:

∂P(y = 1)/∂X'_k = D{f(η)} S(ρ) I_n β_k   (4.56)

with I_n the n-by-n identity matrix. The matrix S(ρ) can now be expanded in power series as:

S(ρ) = I_n + ρW + ρ²W² + …   (4.57)

so that Equation 4.56 becomes:

∂P(y = 1)/∂X'_k = [D{f(η)} + ρD{f(η)}W + ρ²D{f(η)}W² + …] β_k   (4.58)

We can now derive the n-by-1 vector of total effects as the row summation of the matrix expressed in Equation 4.58. This is given by:

[∂P(y = 1)/∂X'_k] ι = [D{f(η)} + ρD{f(η)}W + ρ²D{f(η)}W² + …] ι β_k   (4.59)

ι being the unitary vector of dimension n. Hence, for a row-standardized W (so that Wι = ι):

[∂P(y = 1)/∂X'_k] ι = [D{f(η)}ι + ρD{f(η)}ι + ρ²D{f(η)}ι + …] β_k = D{f(η)} ι (1 − ρ)^{-1} β_k = d{f(η)} (1 − ρ)^{-1} β_k   (4.60)

In practical cases it is more useful to calculate some scalar measures, which can be derived by analogy with those described in Section 3.8 for linear models. In particular, the average total impact is the average of the vector of total effects, that is:

ATI = n^{-1} d{f(η)}^T ι (1 − ρ)^{-1} β_k   (4.61)

Similarly, the average direct impact is given by:

ADI = n^{-1} tr(∂P(y = 1)/∂X'_k) = n^{-1} (tr[D{f(η)}] + ρ tr[D{f(η)}W] + ρ² tr[D{f(η)}W²] + …) β_k   (4.62)

and, obviously, the average indirect impact can be derived as:

AII = ATI − ADI   (4.63)

Efficient methods to derive these summary measures are described in LeSage and Pace (2009). Similar summary measures can be derived for the spatial tobit models (see the reference manual of the marginal.effects function in the R package spatialprobit).
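As a hedged illustration, these summary impact measures can be obtained directly from a fitted spatial probit object, continuing the hypothetical sarprobit sketch of Section 4.5:

  # average direct, indirect and total impacts of the covariates, computed by
  # the spatialprobit package from the fitted SAR probit ('fit' is hypothetical)
  marginal.effects(fit)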
5
Space–time models1
5.1 Generalities
Although there are several possible alternative statistical modelling frameworks for space–time economic data (see Cressie and Wikle, 2011), the dominant approach in econometrics considers spatial panels of individuals (Baltagi, 2008; Wooldridge, 2002). Generally speaking, the term "panel data" is used to refer to a set of repeated observations on a cross-section of economic agents such as households, firms and so on. This can be achieved by surveying a number of individual economic agents over time. Panel data models have become widespread with the increasing availability of databases, surveys and other forms of data collection containing multiple observations on individual units being continuously updated. Web scraping and crowdsourcing surveys are typical examples of this type of data. Spatial panels are a special case of panel data where we also record the geographical position of individuals. Let us start by presenting the non-spatial panel data model which forms the basis for other more sophisticated frameworks. The basic model can be expressed as follows:

y_{it} = α + β^T x_{it} + ε_{it}   (5.1)

In Equation 5.1 the index i = 1, …, n refers to the individual economic agent, the index t = 1, …, T is the time index, x_{it} is a non-stochastic vector of observations of the independent variables for individual i at time t, ε_{it} is the innovation term, such that ε | X ∼ i.i.d. N(0, σ²_ε I_n), and α and β are parameters to be estimated. The double dimensionality of panel data allows for richer modeling possibilities with respect to the single cross-section or the single time series. Typically, microeconomic panels are characterized by a large number of cross-sectional units observed at few points in time (short panels) or by a limited number of relatively long time series (long panels). When we consider spatial panel data in particular, they typically fall into the category of the short panel.
5.2 Fixed and random effects models
In microeconometrics, panel data models are used to control for "unobserved heterogeneity" related to individual-specific, time-invariant characteristics which
are difficult or impossible to observe, although they can lead to biased and inefficient estimates of the parameters of interest if omitted. In order to model such hidden characteristics, we can assume that the error term in Equation 5.1 can be split into two components, with the first capturing individual behaviors that do not change over time. In this case Equation 5.1 becomes:

y_{it} = α + β^T x_{it} + u_{it} = α + β^T x_{it} + (μ_i + ε_{it})   (5.2)
where the error term (called the "composite error" u_{it}) is split into a term μ_i, which represents the error component typical of the individual observed in location i, and a term ε_{it}, which is called the "idiosyncratic component" and is usually assumed to satisfy ε | X ∼ i.i.d. N(0, σ²_ε I_n). The two components are assumed to be mutually independent. The previous framework is often referred to as the "unobserved effects model". The optimal strategy for estimating the model in Equation 5.2 is selected depending on the properties of the two error components. We can distinguish three basic models (Baltagi, 2008). First of all, if we do not have any individual component, we do not have estimation problems. A pooled OLS estimation is unbiased, consistent and the most efficient estimation criterion. This model is called the "pooling model" and will not be discussed in this chapter. Secondly, if the individual-specific component μ_i is uncorrelated with the independent variables, the overall error u_{it} is also uncorrelated with the independent variables and the OLS criterion leads to consistent estimators. However, the presence of a common error component over individuals introduces a correlation across the composite error terms, so OLS estimators become inefficient and we have to consider alternative estimators. This second situation is called in the literature "random effects". We will present these models in Section 5.3. Finally, if the individual component μ_i is correlated with the independent variables, the OLS estimators become inconsistent and we have to consider the sequence of terms μ_i as a further set of n parameters to be estimated, as if we had n different intercepts, say α_{it} = α_i, one for each individual although constant over time. This model is called a "fixed effects (or within) model" and it is usually estimated using OLS after an appropriate transformation which ensures consistency. We will show how to deal with these models in Section 5.4.
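As a hedged illustration, these three baseline (non-spatial) specifications can be estimated in R with the plm package; the data frame panel_df, the variable names and the index columns below are hypothetical placeholders.

  library(plm)
  # pooling, random effects and fixed effects (within) estimation of Equation 5.2
  pool <- plm(y ~ x1 + x2, data = panel_df, index = c("id", "year"), model = "pooling")
  re   <- plm(y ~ x1 + x2, data = panel_df, index = c("id", "year"), model = "random")
  fe   <- plm(y ~ x1 + x2, data = panel_df, index = c("id", "year"), model = "within")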
5.3 Random effects spatial models
The literature has recently considered panel regression models with a spatially lagged dependent variable or spatially autocorrelated disturbances, both in the context of fixed and random effects specifications (Lee and Yu, 2010b). As noted in the previous section, in a random effects specification, the unobserved individual effects are assumed to be uncorrelated with the other explanatory variables of the model and can therefore be safely treated as components of the error term. Within the context of random effects panel data models in the literature we can have two different formalizations when the model is specified as a spatial
error. Furthermore, the random effects model can be specified in a spatial lag version (see Chapter 3). Dealing with the spatial error version of a panel data model, in a first specification we start by assuming that the composite error u_{it} is split into a term μ_i, which is assumed to be μ_i ∼ i.i.d. N(0, σ²_μ), and a second error term ε_{it}, which is expressed in the form of a spatial error, such that, for each t = 1, …, T, we can write (Baltagi et al., 2003):

ε_t = ρWε_t + η_t   (5.3)
as in Section 3.2.2, with η_t a well-behaved independent normal error term, η ∼ i.i.d. N(0, σ²_η). With respect to the standard cross-sectional models, the formalization is complicated by the presence of two indices instead of one. For this reason, before proceeding any further, we will introduce some useful notation to simplify the formalization. First of all, in order to accommodate the two indices i and t, we will make use of the Kronecker product, indicated by the symbol ⊗, such that if A = (a_{11} a_{12}; a_{21} a_{22}) and C = (c_{11} c_{12}; c_{21} c_{22}), then A ⊗ C = (a_{11}C a_{12}C; a_{21}C a_{22}C); that is, each entry of A multiplies the whole matrix C. Secondly, we also define a matrix B such that B = (I_n − ρW), with I_n an n-by-n identity matrix, W the usual spatial weight matrix and ρ the spatial error dependence parameter. From Equation 5.3, in algebraic terms, we can write the idiosyncratic error component as ε_t = (I_n − ρW)^{-1} η_t. Given the symbolism introduced, in matrix terms this becomes:

ε = (I_T ⊗ B^{-1}) η   (5.4)
with η an nT-by-1 vector of the disturbances. As a consequence, the composite error term u_{it} = μ_i + ε_{it} can be written in matrix form as:

u = (i_T ⊗ I_n) μ + (I_T ⊗ B^{-1}) η   (5.5)
with i_T a vector of ones of dimension T and I_T the T-by-T identity matrix. Let us now define the matrix J_T = i_T i_T^T as a T-by-T matrix constituted by ones. It can be shown that the variance–covariance matrix of u can be written as:

Ω_{nT,u} = σ²_μ (J_T ⊗ I_n) + σ²_η (I_T ⊗ (B^T B)^{-1})   (5.6)
From which the derivation of the likelihood is straightforward although tedious.
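As a small illustrative aid (not from the book), the Kronecker-product algebra above is straightforward to reproduce in R; the following toy sketch builds the covariance matrix of Equation 5.6 for arbitrary parameter values and a tiny row-standardized weight matrix.

  # toy construction of Omega_u in Equation 5.6; all numerical values are arbitrary
  n <- 3; TT <- 2
  W <- matrix(c(0,   0.5, 0.5,
                0.5, 0,   0.5,
                0.5, 0.5, 0), n, n, byrow = TRUE)
  rho <- 0.4; sigma2_mu <- 1; sigma2_eta <- 0.5
  B  <- diag(n) - rho * W
  JT <- matrix(1, TT, TT)
  Omega_u <- sigma2_mu  * kronecker(JT, diag(n)) +
             sigma2_eta * kronecker(diag(TT), solve(t(B) %*% B))
  dim(Omega_u)   # an nT x nT matrix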
A second alternative specification for the spatial error was considered by Kapoor et al. (2007), where the authors assume that a spatial correlation structure can be applied not only to the idiosyncratic error, but to both error components. In this case the composite disturbance term, u = (i_T ⊗ I_n) μ + ε, is assumed to follow a spatial autoregressive structure (Chapter 3.2.2):

u = ρ(I_T ⊗ W) u + η   (5.7)

with all symbols already introduced. Following this second specification, the variance–covariance matrix of u can now be written as:

Ω_{nT,u} = (I_T ⊗ B^{-1}) Ω_ε (I_T ⊗ B^{-T})   (5.8)

where Ω_ε = (σ²_μ J_T + σ²_η I_T) ⊗ I_n is the typical variance–covariance matrix of a one-way error component model. Although similar, the economic meanings of the two spatial error specifications are very different. In the first model only the time-varying components diffuse spatially, while in the second the spatial spillovers also display a permanent component. Moving to a spatial lag version of a random effects panel data model, this can be written as a combination of a spatial filtering on the dependent variable y and a random effects structure for the disturbances. More formally, we have:
(I_T ⊗ A) y = Xβ + u   (5.9)

with A = (I_n − λW), and:

u = (i_T ⊗ μ) + η   (5.10)

where the (scaled) variance–covariance matrix of u is defined as Σ = φ(J_T ⊗ I_n) + I_{nT} and the parameter φ is defined as:

φ = σ²_μ / σ²_η   (5.11)

This parameter represents the ratio between the variance of the individual effect and the variance of the idiosyncratic error. From Equations 5.9 and 5.10, again, the likelihood can be easily derived.
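As a hedged sketch of how these random effects specifications can be estimated in R with the splm package (the data frame, variable names and the listw object lw are placeholders, and argument values should be checked against the package documentation):

  library(splm)
  # spatial lag with random effects (Equations 5.9-5.11)
  re_lag <- spml(y ~ x1 + x2, data = panel_df, index = c("id", "year"), listw = lw,
                 model = "random", lag = TRUE, spatial.error = "none")
  # spatial error with random effects: "b" = Baltagi et al. (2003), "kkp" = Kapoor et al. (2007)
  re_b   <- spml(y ~ x1 + x2, data = panel_df, index = c("id", "year"), listw = lw,
                 model = "random", lag = FALSE, spatial.error = "b")
  re_kkp <- spml(y ~ x1 + x2, data = panel_df, index = c("id", "year"), listw = lw,
                 model = "random", lag = FALSE, spatial.error = "kkp")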
5.4 Fixed effect spatial models
As observed in Section 5.2, if we relax the hypothesis that the individual effects are uncorrelated with the independent variables, then we fall into the situation known as fixed effects and the OLS strategy leads to inconsistent estimators. In this case the fixed individual effects can be eliminated using a procedure called "time-demeaning" (Wooldridge, 2002), which consists of subtracting from each observation its individual time mean. Following this approach we replace the original variables y and X in Equation 5.2 with the demeaned values given by:

ỹ_{it} = y_{it} − ȳ_i   (5.12)

with ȳ_i = (1/T) Σ_{t=1}^{T} y_{it} the time mean at location i, and similarly for X. This operation has the effect of removing the constant term from the regression. From a statistical point of view, the random effects hypothesis is associated with the idea of sampling individuals from an infinite population, which has led Elhorst (2009) to consider it practically irrelevant in spatial econometric contexts. However, the current literature tends to concentrate on the statistical properties of the individual effects, which in both cases are random variables. Hence the crucial distinction becomes whether we can assume the error to be correlated with the regressors or not. The Hausman (1978) test is the standard procedure to test the hypothesis of correlation (see Lee and Yu, 2012, for a thorough discussion on this topic).
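Continuing the hypothetical plm sketch of Section 5.2, the Hausman test can be run as follows; the splm package also provides a spatial analogue (sphtest) built on the same logic.

  # H0: the individual effects are uncorrelated with the regressors
  # (random effects consistent and efficient); rejection points to fixed effects
  phtest(fe, re)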
5.5 Estimation
5.5.1 Introduction
Spatial panel models, expressed in terms of either random or fixed effects, can be estimated employing a maximum likelihood or a generalized method of moments approach. Generally speaking, the pros and cons of the two methods are those already discussed when dealing with cross-sectional data (Section 3.2). The ML procedure is fully efficient, but it is also computationally more demanding and relies on the distributional hypothesis of residual normality. On the other hand, a GMM strategy is computationally easier and does not need any distributional assumption, so estimates are more distributionally robust. In this section we will discuss both approaches in some detail.
5.5.2 Maximum likelihood
The standard procedures developed for ML estimation of both spatial lag and spatial error panels are a combination of the standard time-demeaning technique of non-spatial panel data (Wooldridge, 2002) and the ML framework for cross-sectional data (see Chapter 3.2). Data are transformed through time-demeaning, which consists of subtracting the temporal mean from each observation in order to eliminate the individual effects. After time-demeaning, the standard spatial lag or spatial error estimators can be applied to the transformed data, so that the first-order conditions simplify to those of OLS, with an additional spatial filter on y in the spatial lag case.
5.5.2.1 Likelihood procedures for random effect models
As mentioned in Section 5.3, the random effects spatial error can be specified in two possible ways, depending on the interaction between the spatial autoregressive effect and the individual error components.
In the first spatial error specification (Baltagi et al., 2003), only the idiosyncratic error is spatially correlated, and the model can be expressed through the following three equations:

y = Xβ + u   (5.13)
u = (i_T ⊗ μ) + ε   (5.14)

and

ε = ρ(I_T ⊗ W) ε + η   (5.15)

with the scaled errors' covariance expressed by:

Σ_{SEM-RE} = J̄_T ⊗ [Tφ I_n + (B^T B)^{-1}] + E_T ⊗ (B^T B)^{-1}   (5.16)

having indicated with the symbols J̄_T = J_T/T and E_T = I_T − J̄_T. From Equation 5.16 it follows immediately that the likelihood function can be maximized in an estimation phase. In the second spatial error specification (Kapoor et al., 2007) the spatial model applies to both the individual and the idiosyncratic error component and can be expressed through the set of equations:
y = Xβ + u   (5.17)
u = (i_T ⊗ I_n) μ + ε   (5.18)

and

u = ρ(I_T ⊗ W) u + η   (5.19)

where the scaled errors' covariance to be substituted into the likelihood is:

Σ_{KKP} = (φ J_T + I_T) ⊗ (B^T B)^{-1}   (5.20)
Again, Equation 5.20 constitutes the basis for building up the likelihood to be maximized. Finally, the general likelihood function for the random effects spatial lag panel model combined with any error covariance structure Σ represents the panel data version of the spatial lag cross-sectional model discussed in Section 3.2. From Equations 5.9–5.11 we have:

l = const − (nT/2) ln(σ²_η) − (1/2) ln|Σ| + T ln|A| − (1/(2σ²_η)) [(I_T ⊗ A)y − Xβ]^T Σ^{-1} [(I_T ⊗ A)y − Xβ]   (5.21)
with Σ the composite error variance–covariance matrix. The likelihood is generally maximized using the iterative procedure suggested by Oberhofer and Kmenta (1974). Starting from an initial value for the spatial lag parameter λ and the error covariance parameters, we obtain estimates for β and σ²_η from the first order conditions:

β̂ = (X^T Σ^{-1} X)^{-1} X^T Σ^{-1} (I_T ⊗ A) y   (5.22)

and

σ̂²_η = (1/nT) [(I_T ⊗ A)y − Xβ̂]^T Σ^{-1} [(I_T ⊗ A)y − Xβ̂]   (5.23)
The likelihood reported in Equation 5.21 can be then concentrated and maximized with respect to the parameters contained in A and Σ. The estimated values are then used to update the expression for Σ−1 and the steps are repeated until convergence.
5.5.2.2 Likelihood procedures for fixed effect models
From a computational point of view, according to the framework introduced by Elhorst (2003), fixed effects estimation of spatial panel models is accomplished as a pooled estimation on time-demeaned data. Let us start by considering the panel data version of the spatial lag model described in Section 3.2. In this case, in order to estimate the parameters, we need to correct the likelihood of the pooled model by adding a spatial filter on y through the operator I_T ⊗ A, with A = (I_n − λW) and λ being, as usual, the spatial lag parameter (Elhorst, 2003). We also need to consider the explicit expression for the determinant of the spatial filter matrix I_T ⊗ A, which is equal to the determinant of A raised to the power T. The validity of Elhorst's procedure relies on the property that the time-demeaning transformation commutes with the spatial filter I_T ⊗ A, so that demeaning the spatially lagged data is equivalent to spatially lagging the demeaned data (see Mutl and Pfaffermayr, 2011; Kapoor et al., 2007). An efficient two-step iterative estimation procedure can be obtained as follows. First of all, let us consider the demeaned values ỹ and X̃ defined as in Equation 5.12. This operation has the effect of removing the constant term from the regression. Secondly, consider the residuals derived from the demeaned model with a further spatial filter on y, defined as:

η̃ = (I_T ⊗ A) ỹ − X̃β   (5.24)

Thirdly, we can derive the likelihood, concentrated with respect to β and σ²_η:

l = const − (nT/2) ln(η̃^T η̃) + T ln|A|   (5.25)
that can be maximized with respect to λ. The value of λ thus obtained is used in a generalized least squares step, imposing the following first order conditions:

β̂ = (X̃^T X̃)^{-1} X̃^T (I_T ⊗ A) ỹ   (5.26)

and

σ̂²_η = η̃^T η̃ / (nT)   (5.27)
In this way we obtain a new expression for the errors to be used in Equation 5.24. The procedure is then iterated until convergence. The same procedure can be easily adapted to the spatial error model specification. Again, an efficient two-step procedure can be based on concentrating the likelihood with respect to β and σ²_η, obtaining:

l = const − (nT/2) ln(η̃^T η̃) + T ln|B|   (5.28)

where the residuals from the demeaned model are now filtered using the following expression:

η̃ = (I_T ⊗ B)(ỹ − X̃β)   (5.29)
with B = (I_n − ρW). Equation 5.28 can then be maximized with respect to ρ. The value of ρ obtained from the maximization of Equation 5.28 can be used in a generalized least squares step, imposing the following first order conditions:

β̂ = (X̃*^T X̃*)^{-1} X̃*^T ỹ*   (5.30)

where X̃* = (I_T ⊗ B)X̃ and ỹ* = (I_T ⊗ B)ỹ are the spatially filtered demeaned variables, and

σ̂²_η = η̃^T η̃ / (nT)   (5.31)
obtaining new expressions for the errors to be used in Equation 5.29; the procedure can then be iterated until convergence. The methodology described can be easily extended to the SARAR specification (including both a spatial lag and a spatial error term), as shown for instance by Millo and Pasini (2010). According to Anselin et al. (2008), the operation of time-demeaning alters the properties of the joint distribution of the errors, introducing serial dependence. To solve this problem, Lee and Yu (2010a) suggest a different orthonormal transformation (for a discussion and simulation results, see Lee and Yu, 2010b; Millo and Piras, 2012).
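A hedged sketch of the corresponding fixed effects (within) ML estimation with splm, using the same placeholder objects as in the random effects sketch above:

  # fixed effects spatial lag and spatial error models estimated by ML
  fe_lag <- spml(y ~ x1 + x2, data = panel_df, index = c("id", "year"), listw = lw,
                 model = "within", lag = TRUE, spatial.error = "none")
  fe_sem <- spml(y ~ x1 + x2, data = panel_df, index = c("id", "year"), listw = lw,
                 model = "within", lag = FALSE, spatial.error = "b")
  summary(fe_lag)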
5.5.3 The generalized method of moments approach
Generalized method of moments estimators of spatial panel models use both spatial Cochrane–Orcutt transformations (Chapter 3.2), to filter out the spatial dependence, and the standard GLS or time-demeaning transformations (Section 5.5.2). The spatial Cochrane–Orcutt transformations are based on consistent estimates of the spatial parameters. Although the approach is rather general, we will follow the literature (Kapoor et al., 2007; Millo and Piras, 2012) and limit
ourselves to the spatial error model, while referring to the existing literature for the extension to a spatial lag or a SARAR version (Mutl and Pfaffermayr, 2011).
5.5.3.1 Generalized method of moments procedures for random effects models
Kapoor et al. (2007) extended to the panel case the generalized moments estimator suggested in Kelejian and Prucha (1999) for the cross-sectional model presented in Section 3.2.2. The authors present an estimation procedure which enables the estimation of the spatial parameter ρ of the error process and of the two variance components of the disturbance term, σ²_1 = σ²_η + Tσ²_μ and σ²_η respectively. They then introduce the following moment conditions:

E[ (1/(n(T − 1))) ε^T Q_0 ε ] = σ²_η
E[ (1/(n(T − 1))) ε̄^T Q_0 ε̄ ] = σ²_η (1/n) tr(W^T W)
E[ (1/(n(T − 1))) ε̄^T Q_0 ε ] = 0
E[ (1/n) ε^T Q_1 ε ] = σ²_1
E[ (1/n) ε̄^T Q_1 ε̄ ] = σ²_1 (1/n) tr(W^T W)
E[ (1/n) ε̄^T Q_1 ε ] = 0   (5.32)

where Q_0 = (I_T − J̄_T) ⊗ I_n is the (time-)demeaning matrix, so that Q_0 y = ỹ (see the previous section), and Q_1 = J̄_T ⊗ I_n = I_{nT} − Q_0. Furthermore, we define ε = u − ρū and ε̄ = ū − ρ(I_T ⊗ W)ū, with ū = (I_T ⊗ W)u. The estimator implied by Equation 5.32 is based on the fact that in a random effects model without a spatial lag of the dependent variable the OLS estimator of β is consistent; therefore the OLS residuals can be employed in the GMM procedure. While a first set of GMM estimators is based only on the first three equations, assigning equal weights to them, a second set of estimators makes use of all of the moment conditions and an optimal weighting scheme: the inverse of the variance–covariance matrix of the sample moments at the true parameter values under the assumption of normality (see Kapoor et al., 2007). A third set of estimators can also be defined making use of all moment conditions but with a simplified weighting scheme, and can be used when computational difficulties arise in calculating the elements of the asymptotic variance–covariance matrix of the sample moments.
While the estimated ρ can be used to perform a spatial Cochrane–Orcutt transformation, the individual effects are then removed by further pre-multiplying the data by I_{nT} − (1 − σ_η/σ_1) Q_1, using a standard approach in the panel data literature. In this case the feasible GLS estimator reduces to an OLS calculated on the "doubly" transformed model. Finally, small sample inference can be based on the following expression for the parameters' variance–covariance matrix:

Γ = (X*^T Ω_ε^{-1} X*)^{-1}   (5.33)

where the variables X* are the result of a spatial Cochrane–Orcutt transformation of the original model, and both X* and Ω_ε^{-1} depend on the estimated values of ρ, σ²_μ and σ²_1.
5.5.3.2 Generalized method of moments procedures for fixed effects models
If we choose to adopt a fixed effects model, we can exploit a modification of the above procedure. Mutl and Pfaffermayr (2011) noted that, under the fixed effects assumption, OLS estimators of the regression equation are no longer consistent and suggested replacing the OLS residuals with the spatial two-stage least squares within residuals. In the spatial error case, a simple within estimator will produce consistent estimates of the model parameters. Following this idea, we can reformulate the first three moment conditions of Kapoor et al. (2007), replacing the residuals with the within residuals, and then estimate the spatial parameter using the GMM restricted to the first three moment conditions. The model parameters are then obtained by OLS after a further spatial Cochrane–Orcutt transformation of the within-transformed variables.
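A hedged sketch of the GMM estimators with splm::spgm(), again with placeholder data and variable names (the moments argument selects among the weighting schemes discussed above; the exact option names should be checked in the package documentation):

  # random effects GMM (Kapoor et al., 2007) and fixed effects (within) GMM
  gm_re <- spgm(y ~ x1 + x2, data = panel_df, index = c("id", "year"), listw = lw,
                model = "random", lag = FALSE, spatial.error = TRUE,
                moments = "fullweights")
  gm_fe <- spgm(y ~ x1 + x2, data = panel_df, index = c("id", "year"), listw = lw,
                model = "within", lag = FALSE, spatial.error = TRUE)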
5.6 A glance at further approaches in spatial panel data modeling
The literature on spatial panel data models has grown rapidly in recent years, and the largest number of theoretical and applied papers published in spatial econometrics is now related to this topic (see Arbia, 2012 for a review). The basic frameworks presented here are only a brief account of a much larger field of research and applications. Excellent surveys in this area are provided by Baltagi and Pesaran (2007), the review article by Lee and Yu (2010b) and the book by Elhorst (2014), to which we refer the reader for more details. Furthermore, we limit ourselves to what are referred to in the literature as static panel data models, which do not include any time lag. When a time lag is included we deal with what are termed "dynamic spatial panel data models". This large class of models is further distinguished into "stable" and "unstable" models and includes the discussion of important issues such as spatial unit roots, spatial co-integration and explosive roots. Despite the interest in these models, they are not included in this book, where we want to keep the discussion as simple as possible. Furthermore, for this class of models, no pre-defined package is currently available in the language R. (For an exhaustive review of these topics see Lee and Yu, 2011; Baltagi et al., 2013; Bai and Li, 2018; Li et al., 2019; Orwat-Acedańska, 2019.)
Example 5.1: Insurance data
Although there is great interest in the current scientific debate on spatial panel data for micro-data, it is still difficult to find freely available datasets referring to single individual agents with which to test the methods discussed in this chapter, because most of the applications so far are still confined to regional aggregated data. However, just to give the reader a taste of the possible applications, in this example we will make use of regional data, but treating them as if they were individuals, considering the coordinates of the centroids of each area as point data. The dataset {splm}Insurance contains data on insurance consumption in the 103 Italian provinces from 1998 to 2002 and contains different variables, among which the "real per capita gross domestic product" and the "real per capita bank deposit". In what follows we will build up a model where we explain the per capita bank deposit as a function of per capita GDP. The locations of the 103 centroids are given in Figure 5.1. We start by estimating a pooled model with no spatial components with OLS, obtaining the results in Table 5.1. We then estimate the different specifications of spatial panel data models. In all cases we used a k-nearest neighbors spatial weight matrix with k = 3. We start by considering the results of the random effects models. Remember that, in the case of random effects, we have an extra parameter φ, described in Equation 5.11, which is the ratio between the variance of the individual effect and the variance of the idiosyncratic error. The first model is a panel model without spatial effects, which leads to the results estimated with maximum likelihood in Table 5.2, where all parameters including φ are significant.
Figure 5.1 Centroids of the 103 Italian provinces.
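A possible way to reproduce the main steps of this example in R is sketched below. The Insurance dataset ships with splm, but the column names used in the formula (deposits, gdp) and the coordinate matrix coord are placeholders to be replaced with the actual names in the data (check names(Insurance)).

  library(splm); library(spdep)
  data(Insurance, package = "splm")
  # 3-nearest-neighbour spatial weights built from the 103 province centroids
  # ('coord' is a 103 x 2 matrix of centroid coordinates obtained elsewhere)
  lw <- nb2listw(knn2nb(knearneigh(coord, k = 3)))
  fm <- deposits ~ gdp                                  # per capita bank deposits on per capita GDP
  re_lag <- spml(fm, data = Insurance, listw = lw, model = "random",
                 lag = TRUE,  spatial.error = "none")   # spatial lag RE (Table 5.3)
  re_sem <- spml(fm, data = Insurance, listw = lw, model = "random",
                 lag = FALSE, spatial.error = "b")      # Baltagi et al. spatial error RE (Table 5.4)
  summary(re_lag)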
Space–time models 87 The second model is a spatial lag random effects model estimated with maximum likelihood which leads to the results in Table 5.3. Notice that in this specification the spatial correlation parameter is positive and significant. The third model is a spatial error random effects model with the specification of Baltagi et al. (2003) estimated with maximum likelihood which leads to the results in Table 5.4. Table 5.1 Output of a pooled regression with no spatial components of the per capita bank deposit expressed as a function of per capita GDP in the 103 Italian provinces. Estimation method: maximum likelihood Parameter Intercept Real per capita GDP
−2088.1 0.615
Standard error
t-test
p-value
424.46 0.0179
−6.436 34.235
1.822e − 010 2.2e − 16
R 2 = 0.69556
Table 5.2 Output of a panel data random effect model with no spatial components of the per capita bank deposit expressed as a function of per capita GDP in the 103 Italian provinces. Estimation method: maximum likelihood

Parameter               Estimate   Standard error   t-test     p-value
Intercept               1346.5     999.68           13.4689    2.2e−16
Real per capita GDP     0.272      0.052            −5.217     1.816e−07

φ = 28.6664 (0.0001)
Table 5.3 Output of a spatial lag panel data random effect model of the per capita bank deposit expressed as a function of per capita GDP in the 103 Italian provinces. Estimation method: maximum likelihood

Parameter               Estimate   Standard error   t-test   p-value
Intercept               2078.1     756.98           2.745    0.0006
Real per capita GDP     0.1027     0.0456           2.541    0.011

φ = 14.627 (0.0002); λ = 0.546 (p-value < 2.2e−16)
Table 5.4 Output of a spatial error panel data random effect model of the per capita bank deposit expressed as a function of per capita GDP in the 103 Italian provinces. Specification of Baltagi et al. (2003). Estimation method: maximum likelihood

Parameter               Estimate    Standard error   t-test    p-value
Intercept               −186.733    608.442          −0.306    0.758
Real per capita GDP     0.506       0.033            15.219    0.
K(d) = πd²   (7.6)
Equation 7.6 can properly represent the null hypothesis of CSR and can be used as a benchmark to develop formal tests. Significant deviations from this benchmark indicate the alternative hypothesis of either spatial heterogeneity or spatial dependence, or both. In particular, if K(d) > πd² there are more points within a distance d from each point than expected under CSR. In case the observed point pattern represents the spatial distribution of economic agents in a homogeneous space, this circumstance implies that positive spatial interactions among agents, and hence clustering, are detected. On the other hand, if K(d) < πd² there are fewer points than expected within a distance d from each point, implying, for example, negative spatial interactions among economic agents that tend to repulse each other up to a distance d. To favor an easier interpretation, Besag (1977) proposed the linear transformation of K(d): L(d) = √(K(d)/π), where the linearization is obtained by dividing by π and computing the square root. Under this normalization, the null hypothesis of complete spatial randomness is represented by a straight line, because it is satisfied when L(d) = d. Other similar normalizations have followed since Besag (1977), such as L̄(d) = √(K(d)/π) − d (see e.g. Marcon and Puech, 2003), which makes the CSR hypothesis represented by 0.
The sampling distribution of K̂(d) under CSR is analytically tractable only in the case of simple study areas A. In particular, Lang and Marcon (2013) developed an analytical global test for K(d) against the CSR hypothesis that can be used for point patterns observed in a square A, while Marcon et al. (2013) extended it to a rectangular A. In the case of arbitrarily shaped study areas, it is common practice to make inferences about the CSR hypothesis by estimating the sampling distribution of K̂(d) by direct Monte Carlo simulations of the homogeneous Poisson process. Moreover, as compared with global tests, the Monte Carlo simulation approach has the advantage of detecting deviations from CSR at each distance d. As illustrated by Besag and Diggle (1977), this simple inferential procedure consists, firstly, of generating m point patterns according to the homogeneous Poisson process conditional on the same number of points of the observed pattern under analysis. Secondly, for each of the m simulated patterns, a different K̂(d) (or a transformation of it, such as L̂(d) or L̄(d)) can be computed. With the m resulting estimated functions, it is then possible to obtain the approximate m/(m + 1) × 100% confidence envelopes from the highest and lowest values of the K̂(d) (or L̂(d) and L̄(d)) functions computed from the m CSR simulated patterns. The graph of the estimated function and its corresponding confidence envelopes against d can represent a proper significance test. Indeed, if the observed curve of K̂(d) (or L̂(d) and L̄(d)) lies outside – above or below – the simulation envelopes at some distances d, we have indications of significant departures from CSR at those distances. As a way of illustrating the Monte Carlo-simulated confidence envelopes approach to testing the CSR hypothesis, Figure 7.1 shows the performance of the test in different paradigmatic empirical circumstances. In particular, the graphs in the figure depict the behavior of L̂(d) on the vertical axis and the distance d on the horizontal axis. In each graph, the solid line represents the observed function, while the shaded area is bounded by the corresponding upper and lower confidence envelopes for 999 realizations of a homogeneous Poisson process. It can be clearly noted that (unlike the quadrat-based CSR tests, which are conditioned on the arbitrary choice of a specific spatial scale) the K-function-based tests identify significant deviations from CSR (toward clustering or inhibition) at varying spatial scales simultaneously. In particular, this distance-based approach allows the detection of differing, even opposite, spatial patterns co-occurring within the same dataset. For example, the empirical situation depicted in Figure 7.1e is characterized by clustering at small distances and inhibition at relatively higher distances, due to the presence of small clusters of units located far apart from each other. Conversely, Figure 7.1f exhibits inhibition at small distances and clustering at relatively higher distances, due to the presence of a large concentration of units in the east part of the geographical area, within which the units keep apart from each other.
Figure 7.1 Performance of the K-function-based CSR test in different paradigmatic empirical circumstances: (a) complete spatial randomness; (b) concentration at d > 0.05; (c) inhibition at d < 0.07. In each panel the empirical L̂(d) (solid line) is plotted against d, together with the CSR simulation envelopes (shaded area).
Figure 7.1 (Continued): (d) clustering at 0.08 < d < 0.17; (e) concentration at 0.015 < d < 0.06 and inhibition at 0.10 < d < 0.12; (f) inhibition at d < 0.075 and concentration at d > 0.140.
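In R, this Monte Carlo envelope test is implemented in the spatstat package. A minimal sketch (run here on a simulated pattern, so every object is illustrative) is:

  library(spatstat)
  X <- rpoispp(100)                      # illustrative pattern in the unit square
  # envelopes from 999 CSR patterns conditional on the observed number of points
  env <- envelope(X, fun = Lest, nsim = 999,
                  simulate = expression(runifpoint(npoints(X), Window(X))))
  plot(env, . - r ~ r)                   # centred L-function: CSR is the horizontal zero line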
7.3.2 Parameter estimation of the Thomas cluster process
As we have seen in Chapter 6, the Thomas model (Thomas, 1949) is a Poisson cluster process that firstly generates a set of leader points according to a homogeneous Poisson process with intensity ρ and then, around each leader point, generates a random number of follower points. This process is characterized by the fact that the number of followers of each leader is a Poisson random variable with mean μ and the dispersion of followers with respect to their leaders' locations follows a radially symmetric normal distribution with standard deviation σ. Moreover, because of the assumption of stationarity, the theoretical first-order intensity of the process is λ = ρμ. Diggle (2003) shows that the theoretical K-function for the Thomas process is

K(d) = πd² + (1/ρ)[1 − exp(−d²/(4σ²))]   (7.7)
The process parameters, that is ρ, μ and σ, can then be estimated from the observed data by finding the optimal parameter values that ensure the closest match between the theoretical K-function, as expressed by Equation 7.7, and the empirical K-function estimated from the observed data. In order to find the optimal values, Diggle and Gratton (1984) proposed the "method of minimum contrast", subsequently generalized by Møller and Waagepetersen (2003), which essentially consists of minimizing the following general discrepancy criterion:

D(θ) = ∫₀^{d₀} | K̂(u)^q − K(u; θ)^q |^p du   (7.8)
where θ is the set of parameters to be estimated, K̂(u) indicates the observed value of the K-function computed from the data, K(u; θ) denotes the theoretical expected value of the K-function, and d₀, q and p are constants that need to be chosen. According to this method, θ is estimated as the value θ̂ that minimizes the criterion D(θ). A detailed discussion on how to specify the values for q, p and d₀ depending on the empirical situation under study is given by Diggle (2003). A typical choice for the first two constants is q = 1/4 and p = 2, so that D(θ) corresponds to the integrated squared difference between the fourth roots of the two K-functions (Waagepetersen, 2007). With respect to d₀, Diggle (2003) recommends that, for theoretical and practical reasons, its value should be substantially smaller than the spatial extension of the observed study area. We can then estimate the values of θ = (ρ, σ) of a Thomas process through the method of minimum contrast by using Equation 7.7 and hence minimizing

D(θ) = ∫₀^{d₀} { K̂(u)^{1/4} − [πu² + ρ^{-1}(1 − exp(−u²/(4σ²)))]^{1/4} }² du   (7.9)

Once the optimal value of θ = (ρ, σ) is obtained, the remaining parameter μ, the mean number of followers, can be derived from the estimated first-order intensity λ̂ as μ̂ = λ̂/ρ̂.
Example 7.1: The spatial location pattern of new pharmaceutical and medical device manufacturing firms in Veneto (Italy)
Let us consider an example of the K-function-based method for parameter estimation of a Thomas cluster process. The data we are using consist of the observed micro-geographical pattern of firm entries into the pharmaceutical and medical device manufacturing industry in Veneto, a region situated in the north-east of Italy with an area of 18,399 km². This dataset is a subset of an internationally comparable database of Italian firm demographics, managed by the Italian National Institute of Statistics (ISTAT), and contains the (normalized) geographical coordinates of the 333 newly created firms during the period between years 2004 and 2009. Figure 7.2 reports the spatial distribution of firms. As can be noted, new businesses seem to concentrate in some specific geographical areas, thus violating the CSR hypothesis. This is confirmed by the CSR test depicted in Figure 7.3. For simplicity, let us suppose that the industrial territory of Veneto is substantially homogeneous and that a theoretical economic model suggests that new firms in the pharmaceutical and medical device manufacturing sector should follow a leader–follower location pattern.

Figure 7.2 Locations of the 333 new firms in the pharmaceutical and medical device manufacturing sector in Veneto, 2004–2009.
Figure 7.3 Behavior of the empirical L̂(d) (continuous line) and the corresponding 99.9 per cent CSR confidence bands (shaded area) for the 333 new firms in the pharmaceutical and medical device manufacturing sector in Veneto, 2004–2009.
A Thomas cluster process can then be plausible as a data-generating model, which can be fitted using the method of minimum contrast and, in particular, by solving the minimization problem specified by Equation 7.9 through least squares estimation. The obtained estimates are ρ̂ = 76.0, σ̂ = 0.012 and μ̂ = 8.4. As a simple goodness-of-fit examination, we can rely on the graphical summary, depicted in Figure 7.4, that compares the behavior of L̂(d) (solid line) with confidence envelopes for 99 realizations of the fitted Thomas process (shaded area). The graph indicates that the fit is statistically adequate and that the observed data are compatible with the estimated model.
Figure 7.4 Goodness-of-fit of a Thomas process to the pharmaceutical and medical device manufacturing firms data using the empirical L̂(d) (continuous line) and confidence envelopes from 99 simulations of the fitted model (shaded area).
7.3.3 Parameter estimation of the Matérn cluster process
As we have seen in Chapter 6, the Matérn cluster process (Matérn, 1986) is a Poisson cluster process that, like the Thomas model, firstly generates a set of leader points according to a homogeneous Poisson process with intensity ρ and then, around each leader point, generates a random number of follower points. This process is characterized by the fact that the number of followers of each leader is a Poisson random variable with mean μ and the locations of followers with respect to their leaders' locations are independently and uniformly distributed inside a circle of radius R centered on the leader point. Even in this case, because of the assumption of stationarity, the theoretical first-order intensity of the process is λ = ρμ. Møller and Waagepetersen (2003) show that the theoretical K-function for the Matérn cluster process is

K(d) = πd² + (1/ρ) h(d/(2R))   (7.10)

where

h(z) = 2 + (1/π)[(8z² − 4) arccos(z) − 2 arcsin(z) + 4z √((1 − z²)³) − 6z √(1 − z²)]   if z ≤ 1
h(z) = 1   if z > 1
The process parameters ρ and R can be estimated from the observed data through the method of minimum contrast, by plugging Equation 7.10 into the discrepancy criterion expressed by Equation 7.8. In turn, the remaining estimate of the parameter μ can be derived from the estimated first-order intensity λ̂.
Example 7.2: The spatial location pattern of new pharmaceutical and medical device manufacturing firms in Veneto (continued)
In this example we try to assess whether the observed micro-geographical pattern of new firms in the pharmaceutical and medical device manufacturing industry, presented in Example 7.1, can be better described by the Matérn cluster process than by the Thomas model. In other words, we examine whether the dispersion of firms around the cluster centers follows a uniform rather than a normal distribution. The least squares estimation of the Matérn cluster process using Equation 7.8, with Equation 7.10 for specifying K(u; θ), gives the estimates ρ̂ = 76.5, R̂ = 0.024 and μ̂ = 8.3. The goodness-of-fit inspection, depicted in Figure 7.5, which compares the behavior of L̂(d) (solid line) with confidence envelopes for 99 realizations of the fitted Matérn cluster process (shaded area), indicates rejection of the model at small distances. This implies that the spatial distribution of the pharmaceutical and medical device manufacturing firms within the clusters is better modelled by the normal distribution.

Figure 7.5 Goodness-of-fit of a Matérn cluster process to the pharmaceutical and medical device manufacturing firms data using the empirical L̂(d) (continuous line) and confidence envelopes from 99 simulations of the fitted model (shaded area).
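The same hedged spatstat machinery used for the Thomas fit applies here, simply swapping the cluster model:

  # minimum contrast fit of a Matérn cluster process (placeholder pattern firms_ppp)
  fit_matern <- kppm(firms_ppp ~ 1, clusters = "MatClust", statistic = "K")
  plot(envelope(fit_matern, fun = Lest, nsim = 99), . - r ~ r)   # compare with Figure 7.5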
7.3.4 Parameter estimation of the log-Gaussian Cox process
The K-function can also be exploited, through the method of minimum contrast, to fit a different class of aggregated point processes. In particular, it has been proven to be especially efficient in estimating the parameters of a log-Gaussian Cox process. We recall from Chapter 6 that, essentially, this class of point processes assumes that the first-order intensity is a non-negative-valued process, {Λ(x): x ∈ ℝ²}, such that Λ(x) = exp{S(x)}, where S(x) is a stationary Gaussian random field with mean μ, variance σ² and covariance function γ(d). If we specify the covariance function according to a correlation function ρ(d), then λ = exp(μ + 0.5σ²) and γ(d) = exp{σ²ρ(d)} (Møller et al., 1998). As shown by Møller and Waagepetersen (2003), for the log-Gaussian Cox process the theoretical expected K-function takes the form

K(d) = 2π ∫₀^{d} u γ(u) du

In order to fit the process, a parametric covariance function needs to be specified. A common choice is the exponential covariance function γ(d) = σ² exp(−d/α), where α is a parameter that controls the scale of the spatial autocorrelation. In this framework, the method of minimum contrast is first used to find optimal values of the process parameters σ² and α. Secondly, the remaining estimate of the parameter μ can be derived from the estimated first-order intensity λ̂, as μ̂ = log λ̂ − 0.5σ̂².
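A hedged spatstat sketch of this minimum contrast fit (again with the placeholder pattern firms_ppp and the exponential covariance model):

  fit_lgcp <- kppm(firms_ppp ~ 1, clusters = "LGCP",
                   covmodel = list(model = "exponential"), statistic = "K")
  fit_lgcp   # reports the estimates of sigma^2 and of the scale parameter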
Note
1 This formula is preferable because of technical aspects related to the fact that the unbiased estimator for λ² is n(n − 1)/|A|² (Illian et al., 2008). This difference is obviously irrelevant if n is large.
8
Points in a heterogeneous space
This chapter extends the modeling framework presented in the previous chapter to the case of a heterogeneous economic space. It introduces the concepts of D-function, inhomogeneous K-function, K-density and M-function and illustrates the differences with the previous approach and its empirical relevance in practical cases.
8.1 Diggle and Chetwynd's D-function
As discussed in Chapter 7, Ripley's K-function statistic is a proper summary measure of the spatial characteristics of an observed point pattern dataset and a suitable inferential tool to test for the presence of spatial dependence, only under the assumption that the data-generating point process is stationary, say λ(x) = λ: that is, only if the territory can be considered essentially homogeneous. However, in real-case situations, the space where economic agents operate, and hence locate, is rarely homogeneous. For instance, firms may not locate in some areas due to the presence of legal and geophysical limitations or may locate in certain zones because of the influence of exogenous factors such as the presence of useful infrastructures, proximity to communication routes or more advantageous local taxation systems. In other words, referring to the statistical categories introduced in Chapter 6, the K-function cannot distinguish between apparent and true contagion. In order to relax the assumption of spatial homogeneity, Diggle and Chetwynd (1991) proposed a method, which is based on the K-function and the use of a case-control design setting, to detect the presence of significant actual spatial dependence while controlling for the spatial heterogeneity of the study area. Let us consider a set of cases located in a given study area, such as the firms belonging to a specific industry or the firms that exit the market within a given time period. Let us also consider a reference set of controls located in the same study area, such as the firms belonging to all other industries or all firms that still operate in the market. Under the working assumption that the unobserved exogenous factors of spatial heterogeneity affecting the location of cases and controls are substantially the same, we can detect positive (negative) spatial interactions among cases if the cases are more spatially concentrated (dispersed) than the controls. In order to measure the over-concentration (over-dispersion) of cases
with respect to controls, Diggle and Chetwynd (1991) defined the D-function statistic, that is:

D(d) = K_cases(d) − K_controls(d)   (8.1)

where K_cases(d) and K_controls(d) are the K-functions for the cases and controls patterns respectively. In this setting, the controls represent the reference pattern capturing spatial heterogeneity, and hence D(d) = 0 can properly represent the null hypothesis of absence of spatial interactions within an inhomogeneous space. Therefore, if at a given distance d, D(d) > 0, the cases are relatively more spatially concentrated than the controls, implying the presence of actual spatial interactions among cases. However, if at a given distance d, D(d) < 0, the cases are relatively less spatially concentrated than the controls, indicating spatial dispersion among cases. D(d) can be straightforwardly estimated from the observed case-control data with D̂(d) = K̂_cases(d) − K̂_controls(d), where K̂_cases(d) and K̂_controls(d) are the empirical K-functions, computed using Equation 7.4, for the cases and controls patterns respectively. In order to assess whether D̂(d) is significantly greater (lower) than zero and to test the null hypothesis of absence of relative spatial interaction, Diggle and Chetwynd (1991) refer to the hypothesis of "random labelling", introduced by Cuzick and Edwards (1990). Under this hypothesis, the observed points are labelled randomly as cases or controls. Since the hypothesis of random labelling is consistent with the hypothesis of no spatial interactions between cases, D(d) = 0 can be tested with a simple Monte Carlo procedure, which consists of generating m simulations, in each of which the observed "labels" identifying the cases and controls are randomly permuted amongst the observed locations. For each of the m simulated patterns, a different D̂(d) can be computed. With the m resulting estimated functions, it is then possible to obtain the approximate m/(m + 1) × 100% confidence envelopes from the highest and lowest values of the D̂(d) functions computed from the m randomly labelled patterns. The graph of the estimated function and its corresponding confidence envelopes against d can represent a proper significance test. Indeed, if the observed curve of D̂(d) lies outside – above or below – the simulation envelopes at some distances d, we have indications of significant departures from random labelling, and hence from the hypothesis of no spatial interactions, at those distances. As a way of illustration of the Monte Carlo-simulated confidence envelopes approach to testing the no-spatial-interactions hypothesis under spatial heterogeneity, Figure 8.1 shows the performance of the test in different paradigmatic empirical circumstances. In particular, the graphs in the figure depict the behavior of D̂(d) on the vertical axis and the distance d on the horizontal axis. In each graph, the solid line represents the observed function, while the shaded area is bounded by the corresponding upper and lower confidence envelopes for 999 randomly labelled patterns.
Figure 8.1 Performance of the D-function-based test of spatial interactions in different paradigmatic case-control settings: (a) no spatial interactions among cases (random labelling); (b) relative spatial clustering of cases; (c) relative spatial dispersion of cases. In each panel the empirical D̂(d) is plotted against d. The locations of the cases are represented by solid circles and the locations of the controls are represented by empty circles.
Figure 8.1a depicts a hypothetical empirical situation where the level of spatial interactions among cases is not significantly different from that among the controls, and hence the D-function-based test detects no spatial clustering or dispersion of cases. In Figure 8.1b the cases are significantly more concentrated than the controls; indeed, the D-function-based test detects significant relative spatial clustering. Figure 8.1c shows the opposite situation, in which the cases are significantly more dispersed than the controls, leading the D-function to deviate downwards from the null hypothesis of random labelling, thus revealing the presence of relative spatial dispersion among cases. It is important to note that the D-function can only be used to test for the presence of significant relative spatial concentration/dispersion but cannot be used to measure its level, as the values of the function cannot be meaningfully interpreted (Marcon and Puech, 2003). The D-function, in its original or slightly modified form, has been used widely in empirical economic studies. For example, Sweeney and Feser (1998) used this approach to assess whether the spatial distribution of firms in North Carolina depended upon their size; Feser and Sweeney (2000) focused on the spatial clustering of firms' industrial linkages; Marcon and Puech (2003) evaluated the geographical concentration of industries in France; Arbia et al. (2008) detected the existence of knowledge spillovers analyzing the spatial pattern of patents; and Kosfeld et al. (2011) studied the conditional concentration of German industries.
Example 8.1: The spatial location pattern of firms' exits from the hospitality market on the main island of Sicily
Let us consider an example of the use of the D-function to test for the presence of interactions in the spatial distribution of failed hospitality businesses on the main island of Sicily. The number of such businesses established in 2010 was 164, and we observed whether or not they ceased to operate during the period 2011 to 2015. The businesses consisted of the following types of accommodation: hotels, resorts, youth hostels, mountain retreats, holiday homes, farm stays, campgrounds, and holiday and other short-stay accommodation.1 Data provided by the Italian National Institute of Statistics (ISTAT). Figure 8.2 shows the spatial distribution of the 164 hospitality firms and distinguishes between the 109 that survived until 2015 (empty circles) and the 55 that failed (solid circles). As the map clearly suggests, the territory is highly heterogeneous, as hospitality firms tend to be located near the seaside. As a consequence, the spatial intensity of firms' locations cannot be realistically considered as constant.

Figure 8.2 Locations of the 164 hospitality businesses established in Sicily, 2010. The firms that survived until 2015 are represented by empty circles; the 55 that did not are represented by solid circles.

Suppose we are interested in assessing whether there is some spatial dynamic, such as a contagious effect or a local competition phenomenon, in the failure of firms. Since the spatial intensity of firms' locations is not constant, we cannot use the K-function to detect spatial dependence among firms' failures, as we may obtain spurious results in which spatial interactions are confounded with spatial heterogeneity. Under the reasonable assumption that the locations of the firms which survived and of the firms that failed were affected by the same spatial variation factors, we can detect significant spatial dependence in the death process of firms by verifying whether failed firms are more spatially concentrated or dispersed than surviving firms. We can then perform the D-function-based test treating the firms that exited the market and the firms that survived as cases and controls, respectively. Figure 8.3 shows the results of the test, revealing that, at small distances (below 9 km), failed firms are relatively spatially dispersed. This evidence suggests that failure in the hospitality industry in Sicily is not contagious, as failures tend to occur relatively far apart from each other. We thus have a preliminary indication that the death process of firms is characterized by spatial local competition, where the failure of a firm positively affects the survival of the other firms located nearby.

Figure 8.3 Behavior of the empirical D̂(d) (continuous line) and the corresponding 99.9 per cent random labelling confidence bands (shaded area) for the 55 failed hospitality firms in Sicily, 2011–2015; distances d are in metres.
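A possible spatstat implementation of this test is sketched below; firms_ppp stands for a marked point pattern whose marks take the (hypothetical) values "failed" and "survived", and the distance grid is illustrative.

  library(spatstat)
  rr <- seq(0, 40000, length.out = 256)        # distances in metres (illustrative range)
  Dhat <- function(X, ...) {
    Kcase <- Kest(subset(X, marks == "failed"),   r = rr, correction = "isotropic")
    Kctrl <- Kest(subset(X, marks == "survived"), r = rr, correction = "isotropic")
    eval.fv(Kcase - Kctrl)                       # D(d) = K_cases(d) - K_controls(d)
  }
  # random labelling envelopes: the marks are permuted over the observed locations
  env <- envelope(firms_ppp, Dhat, nsim = 999,
                  simulate = expression(rlabel(firms_ppp)))
  plot(env)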
8.2 Baddeley, Møller and Waagepetersen's K_inhom-function
Despite its effectiveness as a tool to assess the statistical significance of spatial interactions in a case-control setting, the D-function has some relevant limitations. First of all, its values are not straightforwardly interpretable and hence do not provide a measure of the degree of spatial dependence. Secondly, and more importantly, the D-function detects relative, not absolute, spatial interactions. It detects a positive (or negative) spatial dependence between cases when cases' locations are seen to be more aggregated (or dispersed) than the trend in the controls' pattern. This implies that the results of different D-functions cannot be compared, over the same distances, if the controls' patterns are not the same; indeed, if the controls change then the benchmark distribution capturing spatial heterogeneity changes as well, thus invalidating any comparisons (Marcon and Puech, 2017). Baddeley et al. (2000) introduced a function that overcomes these limits and allows the detection and measurement of absolute spatial dependence, within a spatially heterogeneous setting, while allowing comparisons between different datasets. In particular, they derived an inhomogeneous version of Ripley's K-function, called K_inhom, which can be seen as a generalization of the traditional measure, which is limited to homogeneous point processes, to the case of a general kind of inhomogeneous point processes. In Chapter 6 we have seen that an inhomogeneous point process is essentially a homogeneous Poisson process where the constant first-order intensity λ is replaced by a first-order intensity function varying over space (say λ(x)). According to Baddeley et al. (2000), an inhomogeneous Poisson process is also, in particular, "second-order intensity-reweighted stationary" if:

λ₂(x, y) = λ(x) λ(y) g(x − y)

where g(x − y) is a function that accounts for the level of spatial dependence between the arbitrary events x and y. In this kind of setting, the second-order intensity of the inhomogeneous Poisson process, at the point locations x and y, is the product of the first-order intensities at x and y multiplied by a spatial correlation factor. If there is no spatial interaction between the points of the process at locations x and y, then λ₂(x, y) = λ(x)λ(y) and g(x − y) = 1. As a consequence, when g(x − y) > 1 we have attraction, while if g(x − y) < 1 we have repulsion (or inhibition) between the two locations (Møller and Waagepetersen, 2007). If we further assume isotropy, g(⋅) depends only on the distance between points, d = ‖x − y‖, and hence λ₂(x, y) = λ₂(d) and g(x − y) = g(d). The scaled quantity g(d) = λ₂(x, y)/[λ(x)λ(y)] is sometimes referred to as the "pair correlation function" (Ripley, 1976, 1977). If λ(x) is bounded away from zero, the K_inhom-function of a second-order intensity-reweighted stationary and isotropic spatial point process is given by:

K_inhom(d) = 2π ∫₀^{d} u g(u) du,  with d > 0   (8.2)

(Baddeley et al., 2000; Diggle, 2003). In empirical economic analyses where the data-generating point process is inhomogeneous, the K_inhom-function properly quantifies the mean (global) level of spatial interactions between the economic agents (such as firms or consumers) up to each distance d, while controlling for the presence of spatial heterogeneity, as modelled by λ(x). In particular, if the data-generating point process is an inhomogeneous Poisson process without spatial interactions between agents, we have K_inhom(d) = πd². In contrast, when K_inhom(d) > πd² (or K_inhom(d) < πd²) the underlying point process tends to generate point patterns of agents that are relatively more aggregated (or more dispersed) than the ones that an inhomogeneous Poisson process with first-order intensity λ(x) and no spatial interactions would generate (Diggle et al., 2007). To ease the interpretation, we can exploit the same linear transformations proposed for Ripley's K-function, such as L_inhom(d) = √(K_inhom(d)/π) or L̄_inhom(d) = √(K_inhom(d)/π) − d, which make the null hypothesis of no spatial interactions represented by d and 0, respectively.
8.2.1 Estimation of Kinhom-function
Baddeley et al. (2000) have shown that, if λ(x) is known, a proper edge-corrected unbiased estimator of K_inhom(d) is given by:

K̂_inhom(d; λ) = (1/A) Σ_{i=1}^{n} Σ_{j≠i} w_ij I(d_ij ≤ d) / [λ(x_i) λ(x_j)]      (8.3)
where A is, as usual, the total surface of the study area, the term dij is still the Euclidean spatial distance between the ith and the jth observed points, and I (dij ≤ d ) continues to represent the indicator function such that I = 1 if dij ≤ d and 0 otherwise. As with the estimator of the homogeneous K-function
(see Section 7.2), due to the presence of edge effects arising from the bounded nature of the study area, we still need to introduce the adjustment factor w_ij in order to avoid potential biases in the estimates close to the boundary. In practical applications λ(x) is unknown, and usually we do not have a theoretical economic model that specifies a given functional form for it. Therefore, it has to be estimated. The literature has proposed both parametric and nonparametric approaches to the estimation. If nonparametric estimation is preferred, a popular method in this context is the kernel smoothing technique (see Silverman, 1986). Baddeley et al. (2000) proposed the use of a slightly modified version of Berman and Diggle's (1989) kernel estimator of λ(x), that is:

λ̂_h(x_i) = Σ_{j≠i} h^{-2} k((x_j − x_i)/h) / C_h(x_j)      (8.4)
where k(⋅) is a radially symmetric bivariate probability density function (typically chosen to be Gaussian), h represents the bandwidth (i.e. the parameter controlling the smoothness of the intensity surface) and C_h(x_j) = ∫ k_h(x_j − u) du is a factor that allows correction for the presence of edge effects. Baddeley et al. (2000) show that, with a careful choice of the bandwidth h, the estimator of the K_inhom-function, K̂_inhom(d; λ̂(d)), obtained by incorporating Equation 8.4 into Equation 8.3, can provide an approximately unbiased estimate of the inhomogeneous K-function. However, Diggle et al. (2007) showed that non-parametrically estimating both λ(x), which represents the spatial heterogeneity of the data-generating process, and K_inhom(d), which expresses the spatial interactions among events, using the same observed point pattern data may lead to spurious estimates. Indeed, we cannot really distinguish the contributions due to spatial heterogeneity or spatial dependence phenomena unless we rely on some assumptions about the nature of the underlying data-generating process. In some microeconomic applications, for example, it may be plausible to assume that the spatial scale of the first-order intensity is larger than the spatial scale of the second-order intensity. This assumption may then imply that the heterogeneity of the geographical space operates at a larger scale than the one characterizing spatial interactions amongst economic agents and, as a consequence, that these two characteristics of the underlying data-generating point process are separable (Diggle et al. 2007). This specific assumption may be realistic if prior knowledge or theoretical indications about the geographical extent of spatial interactions amongst economic agents are available, as may be the case, for example, when analyzing spatial competition among retailers (see e.g. Arbia et al., 2015b). Alternatively, following the example of Diggle (2003) among others, in those empirical circumstances where λ(x) can be considered as a function of exogenous geographically referenced variables expressing spatial heterogeneity, the values of λ(x) can be estimated using a parametric regression model. For example, in the context of micro-geographical location patterns of firms, these variables, which represent common factors shared by the firms located in the same area, could be the geographical coordinates of communication routes, the locations of useful
infrastructures or the positions of firms of different vertically related industries. A convenient specification of the model for λ(x), that describes the probability that an economic agent locates in x as a consequence of heterogeneous conditions, is the log-linear model:

λ(x) = exp{ Σ_{j=1}^{k} β_j z_j(x) }      (8.5)
where z_j(x) is one of k geographically referenced explanatory variables and β_j is the associated regression parameter. Assuming that the observed locations of the economic agents are the realization of an inhomogeneous Poisson process with intensity function λ(x), the model in Equation 8.5 can be fitted to the data using likelihood-based methods. Because of its advantages in terms of computational implementation, the model is commonly fitted by maximizing its pseudo-likelihood (Besag, 1975) through the Berman–Turner approximation (Berman and Turner, 1992). A clear and detailed discussion of the method is given by Baddeley and Turner (2000). As shown by Strauss and Ikeda (1990), maximum pseudo-likelihood estimation of a Poisson model is equivalent to maximum likelihood estimation. This equivalence makes it possible to test the goodness-of-fit of the model in Equation 8.5 using standard likelihood ratio criteria based on the χ² distribution.
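In practice, this parametric route can be followed with standard point-process software. The lines below are a minimal, hypothetical sketch (not the authors' own code) using the spatstat package in R; the point pattern X and the covariate pixel images z1 and z2 are placeholder objects.

# Minimal sketch: fitting the log-linear intensity model of Equation 8.5
# with spatstat. `X`, `z1` and `z2` are hypothetical placeholder objects.
library(spatstat)

# Inhomogeneous Poisson models fitted by maximum pseudo-likelihood
# (Berman-Turner device); for a Poisson model this coincides with ML.
fit0 <- ppm(X ~ 1)            # homogeneous benchmark
fit1 <- ppm(X ~ z1 + z2)      # log-linear in the covariates z1(x), z2(x)

summary(fit1)                    # estimated beta coefficients and Wald tests
anova(fit0, fit1, test = "Chi")  # likelihood ratio test of the covariates

# The fitted intensity surface lambda-hat(x) can be recovered with:
lambda.hat <- predict(fit1)

The anova() call performs the likelihood ratio test mentioned above, exploiting the equivalence between maximum pseudo-likelihood and maximum likelihood for Poisson models.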
8.2.2 Inference for Kinhom-function
Since the sampling distribution of K̂_inhom(d; λ̂(d)) is unknown, in order to assess the statistical significance of the test of K_inhom(d) against the hypothesis of no spatial interactions we need to rely on Monte Carlo-simulated confidence envelopes. In particular, we can rely on an inferential procedure that consists, firstly, in generating m point patterns according to an inhomogeneous Poisson process, conditional on the same number of points as the observed pattern under analysis and with λ(x) estimated parametrically or non-parametrically using Equation 8.4 or Equation 8.5. Secondly, for each of the m simulated patterns, a different K̂_inhom(d) (or a transformation of it, such as one of the two L̂_inhom(d) functions defined above) can be computed. With the m resulting estimated functions, it is then possible to obtain the approximate m/(m + 1) × 100% confidence envelopes from the highest and lowest values of the K̂_inhom(d) (or L̂_inhom(d)) functions computed from the m patterns simulated according to the null hypothesis. The graph of the estimated function and its corresponding confidence envelopes against d then provides a proper significance test. Indeed, if the observed curve of K̂_inhom(d) (or of its L̂_inhom(d) transformations) lies outside – above or below – the simulation envelopes at some distances d, we have an indication of significant departures from the hypothesis of no spatial interactions at those distances.
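As a minimal sketch of this Monte Carlo procedure, assuming a hypothetical point pattern X and the non-parametric route of Equation 8.4, the spatstat package can be used roughly as follows (data objects and bandwidth choices are purely illustrative):

# Minimal sketch of the envelope test for L_inhom(d) with spatstat.
# `X` is a hypothetical point pattern object.
library(spatstat)

# Leave-one-out kernel estimate of lambda(x), in the spirit of Equation 8.4,
# with Diggle's edge correction (shown here for inspection of the intensity)
lam.points <- density(X, sigma = bw.diggle(X), at = "points",
                      leaveoneout = TRUE, diggle = TRUE)

# Pixel image of the estimated intensity, used to simulate the null patterns
lam.im <- density(X, sigma = bw.diggle(X))

# Pointwise envelopes for L_inhom(d) under an inhomogeneous Poisson null:
# each simulated pattern has (approximately) the same expected number of points
env <- envelope(X, Linhom, sigma = bw.diggle(X),
                simulate = expression(rpoispp(lam.im)), nsim = 199)
plot(env, . - r ~ r)   # plot L_inhom(d) - d against d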
Example 8.2: Local competition of supermarkets in the municipality of Trento (Italy)
To illustrate the use of the K_inhom-function we refer to an example in which the aim is to verify the presence of a relevant pattern of spatial competition in the location of supermarkets within the city of Trento in 2004 (Figure 8.4a). The spatial intensity of the location process of supermarkets cannot be reasonably assumed to be constant as stores tend to locate as close as possible to the potential market demand, which is obviously not regularly distributed in the area of the city. Let us suppose that in this empirical circumstance we can use the number of households by census tract in 2004 (Figure 8.4b), which is the finest available level of spatial resolution for this variable, as a proxy for the spatial distribution of potential customers in the city of Trento. To assess the nature and level of spatial interactions among supermarkets while controlling for the spatial heterogeneity of the potential market demand and hence of the retail location opportunities, the K_inhom-function-based test can be performed by estimating λ(x) parametrically according to Equation 8.5 and using the number of households by census tract as a useful covariate capturing spatial heterogeneity. The resulting estimated spatial intensity is therefore given by the following estimated equation:
λ̂(x) = exp{−15.033 + 0.007 h(x)}
Figure 8.4 (a) locations of the 82 supermarkets in Trento in 2004; (b) quartile distribution of the number of households by census tract in 2004. Source: Italian National Institute of Statistics.
in which h(x) represents the number of households in the census tract where point x is located. The estimated model coefficient for h(x) is positive and significant (at the 5% level according to both the Wald and the likelihood ratio tests), thus implying that supermarkets tend to locate in the relatively more populated areas of the city. Having derived an estimate of λ(x), we can then estimate K̂_inhom(d) (and hence L̂_inhom(d)) using Equation 8.3. Figure 8.5 displays L̂_inhom(d) along with the confidence envelopes referring to the null hypothesis of absence of spatial dependence at the 99.9% confidence level. The graph seems to reveal a complex location phenomenon for the supermarkets in Trento which occurs at different spatial scales. In particular, a significant upward deviation of the estimated function relative to the confidence bands is evident at small distances (below 1.5 km), while a significant strong downward deviation occurs after 2.5 km.
Figure 8.5 Behavior of the empirical L̂_inhom(d) (continuous line) and the corresponding 99.9 per cent confidence bands under the null hypothesis of absence of spatial interactions (shaded area) for the 82 supermarkets in Trento, 2004.
This suggests that once we control for the spatial distribution of potential market demand, we find both positive and negative spatial externalities at play, which lead to small-scale spatial clusters of supermarkets (with an extension within 1.5 km) that tend to form at no less than 2.5 km from other clusters of supermarkets. In this example the K_inhom-function has helped to empirically disentangle spatial heterogeneity from the spatial dependence observed in the pattern of the supermarkets in Trento. The L_inhom-function reported in Figure 8.5 clearly shows how strong the tendency is for stores to locate in isolated small clusters because of a genuine interaction amongst economic agents, and not just because of a tendency to concentrate in the most populated areas. In other words, the L_inhom-function identifies situations of over-concentration at small distances and over-dispersion at relatively larger distances, as opposed to a non-constant underlying intensity. In this way, it clearly quantifies the level of spatial interactions which cannot simply be put down to exogenous factors which change smoothly over space, such as the potential market demand.
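Putting the two previous sketches together, an analysis in the spirit of this example could be coded in R roughly as follows, assuming hypothetical objects stores (the supermarket point pattern) and hh (a pixel image with the number of households by census tract); this is an illustrative sketch, not the code originally used for the example.

# Hypothetical sketch of the Example 8.2 pipeline with spatstat:
# `stores` is a ppp object with the supermarket locations and `hh` a
# pixel image (class "im") with the number of households by census tract.
library(spatstat)

fit <- ppm(stores ~ hh)     # log-linear intensity: lambda(x) = exp(b0 + b1*hh(x))
summary(fit)                # Wald test for the household coefficient

# Envelopes for L_inhom(d) - d, simulating from the fitted inhomogeneous
# Poisson model (spatial heterogeneity without interaction)
env <- envelope(fit, Linhom, lambda = predict(fit), nsim = 999)
plot(env, . - r ~ r, xlab = "d (metres)")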
8.3 Measuring spatial concentration of industries: Duranton–Overman K-density and Marcon–Puech M-function
Following the increasing availability of economic micro-geographical data, a growing stream of literature in the field of spatial economics and regional sciences has been focusing on developing distance-based measures of spatial concentration of industries that can overcome the methodological limits of the more traditional indices, such as the "location quotient" or the "Ellison–Glaeser index" (Ellison and Glaeser, 1997), which are based on discrete areal-level data. The most established and used distance-based measures of spatial clusters of firms are probably the K-density and the M-function, developed, respectively, by Duranton and Overman (2005) and Marcon and Puech (2010). Both measures have been proposed as adaptations of Ripley's K-function to the empirical contexts of industrial agglomeration. In particular, they have been developed in order to meet the methodological requirements formalized by Duranton and Overman (2005), which state that an ideal measure of spatial concentration of firms' plants should (i) allow comparison of the level of concentration among different industries, (ii) control for industrial concentration, (iii) control for spatial heterogeneity as represented by the spatial distribution of the whole manufacturing, (iv) provide the statistical significance of the estimated values and (v) be robust with respect to the spatial scale of data aggregation. As a matter of fact, Ripley's K-function does not meet, in particular, property iii because, as we have discussed in this chapter, it detects actual spatial clustering only under the assumption of spatial homogeneity. This assumption, however, does not hold, as the spatial intensity of firms' locations cannot realistically be considered constant. The K-density and the M-function are two alternative relative measures of spatial concentration of firms located in a heterogeneous space within the same industry.
8.3.1 Duranton and Overman's K-density
Duranton and Overman's K-density (2005), say K̂_density, is conceptually similar to the second-order intensity function λ2(x − y), introduced in Chapter 6, as it represents an estimator of the density distribution of the distances between pairs of firms' locations. As pointed out by Marcon and Puech (2017), K̂_density can also be viewed as an estimator of the probability density function of finding another firm located at a given distance d from a typical reference firm. Indeed, essentially, it computes the observed average number of pairs of firms at each distance and then applies a smoothing in order to obtain a continuous function that sums to 1. More precisely, K̂_density can be defined as:

K̂_density(d) = [1/(n(n − 1))] Σ_{i=1}^{n} Σ_{j≠i} k(d_ij, d; h),

where

k(d_ij, d; h) = [1/(h√(2π))] exp{−(d_ij − d)²/(2h²)}
is a Gaussian kernel smoother à la Silverman (1986) such that it reaches its maximum value when d_ij = d, while it decreases, as d_ij deviates from d, following a Gaussian distribution with a given standard deviation h, namely the bandwidth. Duranton and Overman (2005) suggest choosing the value of the bandwidth h according to the criterion proposed by Silverman (1986; see Section 3.4.2; Equation 3.31). Following a similar approach to that of random labelling for the D-function (see Section 8.1), to assess the statistical significance of K̂_density its values are compared to the confidence interval of the null hypothesis that firms are randomly reassigned to the actual observed firms' locations. This implies that the function does not have a closed-form benchmark value for the null hypothesis of absence of spatial clustering, which is instead derived by means of a Monte Carlo approach. In practice, at each simulation, the firms of a single sector of activity (or of any specific typology of interest) are randomly assigned over the locations of all firms of all sectors (or of all the possible typologies). The null hypothesis of absence of spatial interactions among the firms of the same sector (or the same typology) is then represented by the center of the Monte Carlo-simulated confidence envelopes. In this respect, similarly to the D-function, the K-density allows for the detection of relative, rather than absolute, spatial concentration of firms of a specific typology. Duranton and Overman (2005) have also proposed a weighted version of the function that also accounts for the size of firms, as measured for example by the number of employees, the value added, the capital, and so on. See Duranton and Overman (2005) for more details.
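As a minimal illustration of the formula, the following base R sketch computes K̂_density on hypothetical coordinates; the bandwidth is set by a simple Silverman-type rule of thumb, which is not necessarily the exact criterion of Section 3.4.2, and no boundary correction is applied.

# Minimal base-R sketch of the K-density: a Gaussian kernel smoother of the
# distribution of pairwise distances. `xy` is a hypothetical two-column matrix
# of plant coordinates (in metres).
Kdensity <- function(xy, d.grid, h = NULL) {
  dij <- as.vector(dist(xy))                        # all pairwise distances d_ij
  if (is.null(h)) h <- 1.06 * sd(dij) * length(dij)^(-1/5)  # Silverman-type bandwidth
  # average of the Gaussian kernels centered on the d_ij, evaluated at each d
  sapply(d.grid, function(d) mean(dnorm(d, mean = dij, sd = h)))
}

# Hypothetical usage: 98 plants scattered over a 50 km x 50 km window
set.seed(1)
xy <- cbind(runif(98, 0, 50000), runif(98, 0, 50000))
d.grid <- seq(0, 25000, by = 500)
plot(d.grid, Kdensity(xy, d.grid), type = "l",
     xlab = "d (metres)", ylab = "K-density")

Random-labelling envelopes for the null hypothesis can then be obtained by repeatedly reassigning the sector labels over the full set of locations and recomputing the function on each reassigned subset.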
In the last few years K-density has been used quite extensively in the field of spatial economics and regional sciences. We can cite, among the most notable empirical works, Duranton and Overman (2008), Klier and McMillen (2008b), Koh and Riedel (2014), Kerr and Kominers (2015) and Behrens and Bougna (2015).
8.3.2 Marcon and Puech's M-function
Marcon and Puech's (2010) M-function can be viewed as a distance-based version of the popular location quotient of regional industrial specialization. Indeed, in a framework where firms belong to different typologies (e.g. different industries), it consists of a function of the distance d that provides the relative frequency of firms of a specific typology of interest (e.g. a particular industry) that are located within d of a typical reference firm of the same type, compared to the same ratio computed with respect to all firms of all types. If the typical reference firm is indicated by i, another firm of the same type is denoted by j_c and another firm of any typology is represented by j, then the M-function can be estimated by:

M̂(d) = [ Σ_{i=1}^{n} ( Σ_{j_c≠i} I(d_{i j_c} ≤ d) w_{j_c} / Σ_{j≠i} I(d_ij ≤ d) w_j ) ] / [ Σ_{i=1}^{n} (W_c − w_i)/(W − w_i) ]

where w_{j_c} and w_j are the weights associated with firms j_c and j, respectively, and w_i is the weight of the reference firm i. Analogously, W_c and W are the total weight of the firms of the typology of interest and the total weight of all firms in the dataset, respectively. The weights can be specified using any useful variable representing the size of firms. If size is not taken into consideration, the weights are all set to 1. As with the location quotient, M̂(d) has a benchmark value of 1 at any distance. Therefore, if at a given distance d, M̂(d) is significantly greater than 1, we detect significant relative spatial concentration of firms of the chosen typology of interest compared to all other firms. Conversely, if at a given distance d, M̂(d) is significantly lower than 1, we detect significant relative spatial dispersion of firms of the chosen typology of interest compared to all other firms. The confidence interval for the null hypothesis of 1 can be derived through Monte Carlo simulations based on random reassignments of firms' typologies. Some notable empirical works that employed the M-function to analyze the spatial patterns of firms are those of Jensen and Michel (2011) and Moreno-Monroy and García (2016).
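As a hedged illustration, both distance-based measures are implemented in the dbmss R package; the snippet below sketches the computation of M̂(d) with Monte Carlo envelopes on hypothetical firm-level data (coordinates, industry labels and employment weights). The function and argument names (wmppp, Mhat, MEnvelope, ReferenceType, ...) follow the dbmss documentation as recalled here and should be checked against the installed version.

# Hedged sketch using the dbmss package (which also provides Kdhat/KdEnvelope
# for the Duranton-Overman K-density). The data frame `firms` is hypothetical.
library(dbmss)

set.seed(1)
firms <- data.frame(X = runif(1000), Y = runif(1000),
                    PointType   = sample(c("Metallurgy", "Other"), 1000,
                                         replace = TRUE, prob = c(0.1, 0.9)),
                    PointWeight = rpois(1000, 10) + 1)
X <- wmppp(firms)   # weighted, marked planar point pattern

r <- seq(0, 0.25, by = 0.01)
m.hat <- Mhat(X, r = r, ReferenceType = "Metallurgy")          # intra-type M(d)
m.env <- MEnvelope(X, r = r, NumberOfSimulations = 99, Alpha = 0.01,
                   ReferenceType = "Metallurgy")               # envelopes around M(d) = 1
plot(m.env)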
Example 8.3: The spatial pattern of single-plant metallurgy manufacturing firms in the province of Trento
As an example of the application of the K-density and M-function measures of spatial concentration, and their associated tests, let us suppose that we are interested in analyzing the spatial pattern of a set of micro-data on single-plant firms in the metallurgy manufacturing industry in the province of Trento. This dataset was collected in 2009 by the Italian National Institute of Statistics. The dataset reports, in particular, the full address and the five-digit activity sector reference – according to the NACE Rev. 2 European industrial classification system – of the 1,007 single-plant manufacturing firms operating in the area. On the basis of this information we can identify the 98 single-plant firms belonging to the metallurgy sector. Figure 8.6 shows the spatial distribution of these plants. In order to detect the genuine spatial concentration generated by the interactions among metallurgical firms, while controlling for the heterogeneity of their geographical territory, we use the K̂_density and M̂(d) functions, which consider the spatial distribution of all manufacturing (Figure 8.6a) as the null distribution. Figure 8.7 shows the behavior of the two estimated functions, along with the 99% confidence level simulation-based envelopes referring to the null hypothesis of absence of spatial dependence. Both measures reveal a significant strong tendency of metallurgical single-plant firms to cluster up to at least 25 km. As a cumulative function which detects spatial interactions up to a certain distance, M̂(d) is better at identifying the global pattern of spatial clustering.
Figure 8.6 Location of the 1,007 single-plant manufacturing firms in the province of Trento, 2009: (a) all manufacturing plants (1,007 observations); (b) plants from the metallurgy sector (98 observations). Source: Italian National Institute of Statistics.
Figure 8.7 Behavior of K̂_density and M̂(d) and the corresponding 99% confidence bands for the metallurgy sector in Trento, 2009.
However, as a density function which detects spatial interactions specifically at a certain distance, K̂_density is better at detecting the presence of local concentrations of firms at particular distances. Marcon and Puech (2010) show that K̂_density and M̂(d) are complementary rather than alternative approaches to measuring the spatial clustering of economic activities and, as a consequence, both measures should be used to provide a complete picture of the spatial pattern of interactions. See Marcon and Puech (2010) for an interesting discussion of this issue.
Note 1 The relevant NACE codes are I55100, I55201, I55202, I55203, I55204, I55205, I55300, I55902.
9
Space–time models
This chapter extends the modeling framework presented in Chapters 7 and 8 to the case of a dynamic point pattern. It introduces the concept of a spatiotemporal K-function which allows the separation of the spatial and temporal dynamics and their interactions. Its use is demonstrated in the context of firm demography.
9.1 Diggle, Chetwynd, Häggkvist and Morris' space–time K-function
Ripley's K-function and its various variants and extensions, presented in Chapters 7 and 8, can be conveniently used to perform an essentially static analysis of the spatial distribution of economic agents on a continuous space. For some microeconomic phenomena of interest, however, analyzing a micro-geographical pattern of events as it is observed in a given single moment of time can be limiting or, worse, misleading. The importance of considering the temporal dynamics in the analysis of spatial patterns of events is well explained by Getis and Boots' seminal work (1978), which defines a useful "framework for viewing spatial processes". The proposed conceptualization illustrates that without looking at the temporal evolution of a phenomenon of interest it is not possible to clearly detect the mechanism generating its spatial structure. More precisely, it is shown that even very different spatio-temporal point processes can lead to spatial patterns that look exactly the same. This implies that only phenomena with no increase or decrease of points over time can be represented as pure spatial processes (Getis and Boots, 1978) and, therefore, can be successfully analyzed while neglecting their time dimension. Diggle et al. (1995) proposed a dynamic extension of Ripley's K-function which allows the characterization of the second-order properties of a general homogeneous spatio-temporal point process. Let us consider that the observed dataset of interest is a "time-labelled spatial point pattern" where each point location also has a label indicating its time of occurrence. Let us also consider that this dataset is a realization of a spatio-temporal point process that generates countable sets of points (s_i, t_i) where s_i ∈ ℝ² is, following the notation used in Chapter 6, the spatial location of the ith event and t_i ∈ ℝ represents its time of
occurrence. If the spatio-temporal point process is homogeneous both in space and time, that is if it is characterized by a constant first-order intensity λ_ST, defined as the mean number of events per unitary spatial area per unit time interval, Diggle et al. (1995) suggest that its second-order properties can be defined by the space–time K-function:

K(d, t) = λ_ST^{-1} E[number of further points falling at a distance ≤ d and a time ≤ t from a typical point]      (9.1)
In Chapter 7, in the context of static analysis, we have seen that under the null hypothesis of complete spatial randomness, that is if the generating spatial point process is a homogeneous Poisson process, the static K-function can be written in a closed form and, in particular, K(d) = πd². Analogously, Diggle et al. (1995) show that under the null hypothesis of complete spatio-temporal randomness, that is if the locations and the corresponding times of occurrence are generated, respectively, by two independent homogeneous Poisson processes on ℝ² and ℝ, then

K(d, t) = 2πd²t      (9.2)
Equation 9.2 can be used as a benchmark to develop formal tests of absence of spatio-temporal interactions among economic agents. Alternatively, under the consideration that in practice the underlying spatio-temporal point process is observed in a finite spatial region, say A, and within a finite time interval, say (0, T), Diggle et al. (1995) suggest that the absence of spatio-temporal interactions among events is also consistent with the independence between the spatial and temporal component processes and hence with the following factorization of the space–time K-function:

K(d, t) = K_S(d) K_T(t)      (9.3)
In Equation 9.3, K_S(d) is Ripley's K-function, as defined in Equation 7.1, which we recall here for convenience:

K_S(d) = λ_S^{-1} E[number of further points falling at a distance ≤ d from a typical point]

and K_T(t) is the K-function of the temporal component process, which is defined as:

K_T(t) = λ_T^{-1} E[number of further points falling at a time interval ≤ t from a typical point]

where λ_S represents the constant spatial intensity, i.e. the mean number of points per unitary area, and similarly λ_T denotes the constant temporal intensity, that is, the mean number of points per unit time.
9.1.1 Estimation of space–time K-function
Diggle et al. (1995) found an approximately unbiased estimator for K(d, t) by extending Ripley's (1976) estimator for static spatial point processes – see Equation 7.4 – to the case of spatio-temporal point processes. In particular, referring to a time-labelled point pattern with (s_i, t_i): i = 1, 2, ..., n locations and corresponding time labels, observed within a spatial region A and a time interval (0, T), the approximately unbiased estimator for K(d, t) is:

K̂(d, t) = [A T / (n(n − 1))] Σ_{i=1}^{n} Σ_{j≠i} w_ij^{-1} v_ij^{-1} I(d_ij ≤ d) I(t_ij ≤ t)      (9.4)
where, as usual, d_ij is the spatial distance between the ith and the jth observations and I(d_ij ≤ d) represents the indicator function such that I = 1 if d_ij ≤ d and 0 otherwise. Analogously, t_ij is the time interval between the ith and the jth observations and I(t_ij ≤ t) represents the indicator function such that I = 1 if t_ij ≤ t and 0 otherwise. In addition, as in Equation 7.4, w_ij is an edge-effect correction factor corresponding to the proportion of the circumference of the circle centered on s_i, passing through s_j, which lies within the study region A. By analogy, in order to correct also for the potential temporal edge effects, v_ij is introduced and consists of the fraction of the time segment starting from t_i and passing through t_j that lies within the observed total duration time between 0 and T. Following the same logic, edge-effect adjusted approximately unbiased estimators for K_S(d) and K_T(t) are, respectively,

K̂_S(d) = [A / (n(n − 1))] Σ_{i=1}^{n} Σ_{j≠i} w_ij^{-1} I(d_ij ≤ d)

and

K̂_T(t) = [T / (n(n − 1))] Σ_{i=1}^{n} Σ_{j≠i} v_ij^{-1} I(t_ij ≤ t)

(Diggle et al., 1995).
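To make the estimators concrete, the following base R sketch computes K̂(d, t), K̂_S(d) and K̂_T(t) on hypothetical coordinates and times; for readability the edge-correction factors w_ij and v_ij are set to 1, so it is a simplified, approximate version of Equation 9.4.

# Minimal base-R sketch of the space-time K estimators (edge effects ignored).
# `xy` and `tt` are hypothetical coordinates (metres) and times (years).
st.Kfun <- function(xy, tt, d, t, area, Tlen) {
  n   <- length(tt)
  dij <- as.matrix(dist(xy));     diag(dij) <- Inf   # spatial distances d_ij
  tij <- abs(outer(tt, tt, "-")); diag(tij) <- Inf   # time separations t_ij
  Khat  <- area * Tlen / (n * (n - 1)) * sum(dij <= d & tij <= t)
  KShat <- area        / (n * (n - 1)) * sum(dij <= d)
  KThat <- Tlen        / (n * (n - 1)) * sum(tij <= t)
  c(K = Khat, KS = KShat, KT = KThat, D = Khat - KShat * KThat)
}

# Hypothetical usage: 200 events in a 10 km x 10 km area over 20 years
set.seed(1)
xy <- cbind(runif(200, 0, 10000), runif(200, 0, 10000))
tt <- runif(200, 0, 20)
st.Kfun(xy, tt, d = 1000, t = 2, area = 10000^2, Tlen = 20)

The last component, D, anticipates the diagnostic function D̂(d, t) introduced below.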
9.1.2 Detecting space–time clustering of economic events
With microdata regarding the occurrences of single economic events in space and time, such as the establishment of new firms or the sale of real estate, and under the assumption that the data-generating spatio-temporal point process is homogeneous, the space–time K-function can be used to detect and measure space–time clustering. Since the null hypothesis of complete spatio-temporal randomness can be expressed in terms of independence between the spatial and temporal component processes, and hence by Equation 9.3, one possible
diagnostic tool for space–time clustering of economic agents can be the function proposed by Diggle et al. (1995):

D̂(d, t) = K̂(d, t) − K̂_S(d) K̂_T(t)      (9.5)

This empirical summary statistic is proportional to the increased number of economic events occurring within spatial distance d and time interval t with respect to a process which is characterized by the same temporal and spatial characteristics but no space–time interaction. Therefore, the presence of space–time interactions can be detected by the emergence of peaks on the 3-dimensional surface of D̂(d, t) plotted against d and t. Diggle et al. (1995) have also proposed the following transformation of Equation 9.5 that provides relative quantities rather than absolute numbers:

D̂_0(d, t) = D̂(d, t) / {K̂_S(d) K̂_T(t)}      (9.6)
Equation 9.6 is indeed proportional to the relative increase in the occurrence of economic events within spatial distance d and time interval t with respect to a process with the same temporal and spatial characteristics, but no space–time interaction, and, in some empirical circumstances, can ease the interpretation. Like D̂(d, t), D̂_0(d, t) can also be plotted in a 3-dimensional graph against d and t to visualize and detect the interaction between the spatial and temporal component processes. It can be noted that, for any d and t, both D̂(d, t) and D̂_0(d, t) have a benchmark value of 0 representing the null hypothesis of no space–time interactions between economic events. Since the sampling distribution of D̂(d, t) is intractable, Diggle et al. (1995) have introduced a Monte Carlo simulation procedure to assess the statistical significance of the deviations of D̂(d, t) from 0. The procedure consists of performing a number of, say m, simulations in each of which the observed time "labels" are randomly permuted amongst the observed locations. For each of the m resulting simulated spatio-temporal point patterns, a different D̂(d, t) can be computed, thus obtaining the estimates D̂_i(d, t): i = 1, 2, ..., m. The variance of the distribution of these m estimates, say V̂(d, t), provides an unbiased estimate of the variance of D̂(d, t) under the null hypothesis of no space–time interaction (Diggle et al., 1995). The following standardized statistic:

R̂(d, t) = D̂(d, t) / √V̂(d, t)      (9.7)
can then be used as an inferential tool that allows the visualization of deviations from the null hypothesis. In particular, Diggle et al. (1995) suggest that the plot of Equation 9.7 against Kˆ S (d ) Kˆ T (t ) can be viewed analogously to the plot of the regression model’s standardized residuals against model predicted values. Indeed, if there is no significant dependence between the spatial and temporal
component processes then approximately 95% of the values of R̂(d, t) would lie within two standard errors (Diggle et al., 1995). As a consequence, a relevant number of values of R̂(d, t) outside the interval (−2, 2) would indicate a significant deviation from the null hypothesis and hence would reveal a potentially interesting form of space–time interactions among economic agents that could be better interpreted through the examination of the plot of D̂(d, t), or D̂_0(d, t),
against d and t. Finally, Diggle et al. (1995) also proposed a simple global Monte Carlo test for space–time interaction that consists of taking the actual observed sum of the R̂(d, t) values, that is:

u = Σ_d Σ_t R̂(d, t)

and ranking it with respect to the m analogous sums

u_i = Σ_d Σ_t R̂_i(d, t): i = 1, 2, ..., m
computed for the m Monte Carlo simulated spatio-temporal point patterns. A particularly extreme value of u among the u_1, ..., u_m values would constitute evidence of overall space–time interaction. For example, if u is ranked above (or below) 95 out of 100 values of u_i then the probability that the observed space–time interaction occurred by pure chance is less than 5%. As a way of illustration of this approach to the analysis and testing of space–time interactions among events, Figures 9.1, 9.2 and 9.3 show the behavior of D̂(d, t) and R̂(d, t) and the performance of the associated Monte Carlo test in three different paradigmatic empirical circumstances: no space–time interactions, spatio-temporal clustering and spatio-temporal inhibition. In all figures, the first graph (a) depicts a spatio-temporal point pattern where the events' locations are represented by circles with size proportional to the time of occurrence of the events; the second graph (b) is the plot of D̂(d, t) against d and t; and the third graph (c) is the plot of R̂(d, t) against K̂_S(d) K̂_T(t). In particular, Figure 9.1 refers to a stylized situation where economic agents tend to locate independently of where and when the other actors locate. Indeed, Figure 9.1b shows that D̂(d, t) fluctuates randomly around 0, and Figure 9.1c shows that almost all values of R̂(d, t) lie between −2 and 2. Figure 9.2 describes instead the case of space–time clustering where, as depicted by the map, the economic agents that form spatial clusters tend also to have similar times of occurrence. In this case, the function D̂(d, t) reveals a strong positive peak around d = 0.12 and t = 0.10 and the plot of R̂(d, t) indicates that a large number of values are greater than 2, indicating statistically significant spatio-temporal interactions. On the other hand, Figure 9.3 describes the situation of space–time inhibition where, as suggested by graph (a), economic agents tend to locate distancing each other both in space and time. We can indeed observe a significant steep negative peak in the plot of D̂(d, t).
Figure 9.1 Performance of the space–time K-function approach under complete spatio-temporal randomness (Monte Carlo test p-value = 0.816).
The space–time K-function has been used in several empirical economics-related studies. For example, Arbia et al. (2010) used this approach to detect the existence of space–time clustering of ICT firms in Rome; Kang (2010) analyzed the spatio-temporal distribution of firms in the Columbus metropolitan area (Ohio); and Conrow et al. (2015) studied the spatio-temporal relationship between alcohol sellers' locations and crime events in Buffalo, New York.
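As a hedged illustration of how this battery of diagnostics can be run in practice, the splancs R package implements the Diggle et al. (1995) estimator, the standard-error calculation, the diagnostic plots and the Monte Carlo permutation test. In the sketch below pts (an n × 2 coordinate matrix), times, the study polygon poly and the observation interval tlim are hypothetical placeholders, and the function names and argument order follow the splancs documentation as recalled here, so they should be verified against the installed version.

# Hedged sketch of the Diggle et al. (1995) space-time analysis with splancs.
# `pts`, `times`, `poly` and `tlim` are hypothetical data objects.
library(splancs)

s.grid <- seq(100, 5000, by = 100)   # spatial distances d (metres)
t.grid <- 1:10                       # time intervals t (years)

stk <- stkhat(pts, times, poly, tlim, s.grid, t.grid)   # K-hat(d,t), KS-hat(d), KT-hat(t)
se  <- stsecal(pts, times, poly, tlim, s.grid, t.grid)  # standard errors of D-hat(d,t)
stdiagn(pts, stk, se)   # plots of D-hat(d,t) and of the standardized residuals R-hat(d,t)

# Global Monte Carlo test based on random permutation of the time labels
mc <- stmctest(pts, times, poly, tlim, s.grid, t.grid, nsim = 999, quiet = TRUE)
pval <- (1 + sum(mc$t >= mc$t0)) / (1 + length(mc$t))
pval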
9.2 Gabriel and Diggle's STIK-function
Diggle et al.'s space–time K-function (1995) can properly describe and detect the spatio-temporal characteristics of an observed time-labelled point pattern dataset under the assumption that the data-generating spatio-temporal process is homogeneous, both in space and time. Nevertheless, the associated Monte Carlo test for space–time interaction, which is based on random permutations of
Figure 9.2 Performance of the space–time K-function approach under space–time clustering (Monte Carlo test p-value = 0.001).
the time labels t i : i = 1, 2,..., n , still provides valid inference even if the underlying data-generating process is inhomogeneous (Diggle et al., 1995). However, in order to deal explicitly with spatio-temporal inhomogeneity, Gabriel and Diggle (2009) proposed an extension of the space–time K-function to the context of inhomogeneous spatio-temporal point processes. Their proposed space–time inhomogeneous K-function (STIK-function) formally is a reduced measure of the second-order properties of an inhomogeneous spatio-temporal point process and, as we will show, can be efficiently used to measure and detect spatio-temporal clustering of economic events (e.g. firm entries or exits) while controlling for the heterogeneity of the territory and/or time period under study. Let us continue to assume that the observed time-labelled point pattern ( s i , t i ): i = 1, 2,..., n is a realization of a spatio-temporal point process. Let us consider, however, that now this process is inhomogeneous. Analogously to what we have seen for purely spatial point processes in Chapter 6, an inhomogeneous spatio-temporal point process is characterized by a non-constant first-order
Figure 9.3 Performance of the space–time K-function approach under space–time inhibition (Monte Carlo test p-value = 0.001).
intensity function, λ ( s , t ), that in a spatio-temporal setting can be defined as follows (Gabriel and Diggle, 2009):
λ(s, t) = lim_{|ds × dt| → 0} E[N(ds × dt)] / |ds × dt|      (9.8)

where ds × dt is an infinitesimally small spatio-temporal region containing the point (s, t), N(ds × dt) represents the number of points located in it and |ds × dt| is the volume of ds × dt. Equation 9.8 can therefore be interpreted as the expected number of points located within an infinitesimal region centered on the point (s, t). As a consequence, a relatively high (low) value of λ(s, t) implies a relatively high (low) expected concentration of points located around (s, t). In the context of the study of economic phenomena, λ(s, t) may describe spatio-temporal heterogeneity which occurs because exogenous factors lead economic agents to locate in certain geographical areas and during certain times.
For example, firms may not locate in some areas and periods of time due to the presence of legal and geophysical limitations, or may locate in certain zones during given periods because of the influence of exogenous factors such as the timely presence of useful infrastructures, the proximity to communication routes or more advantageous local taxation systems. According to Gabriel and Diggle (2009), the second-order intensity function can be generalized from the context of a purely spatial point process to the setting of a spatio-temporal point process using the following function:

λ2((s, t), (s′, t′)) = lim_{|ds × dt|, |ds′ × dt′| → 0} E[N(ds × dt) N(ds′ × dt′)] / (|ds × dt| |ds′ × dt′|)
where ( s , t ) and ( s ′, t ′) denote two distinct generic events in the spatio-temporal domain. Informally, λ2 (( s , t ) , ( s ′, t ′)) can be interpreted as the expected number of point events located in s ′ and occurred in time t ′ or located in s and occurred in time t. In the context of the analysis of the spatio-temporal distribution of economic agents, λ2 (( s , t ) , ( s ′, t ′)) may describe, for example, the spatio-temporal dependence occurring when the presence of an economic activity in an area, because of the working of spatial externalities, attracts other firms to locate in the same area relatively rapidly. In reality, space–time clusters of economic activities may be due to the joint action of spatio-temporal heterogeneity and spatio-temporal interactions among economic agents. However, in many empirical applications, although it is important to measure the agglomeration of economic activities properly, it is often important to disentangle heterogeneity from interactions or, in other words, to distinguish between exogenous and endogenous formation of clusters of economic agents. The STIK-function (which represents a different and more tractable measure, with respect to λ2 (⋅), of the second-order properties of a spatio-temporal point process), can be used to identify the endogenous effects of the spatio-temporal interactions among economic agents after adjusting for the exogenous effects of the spatio-temporal characteristics of the region under observation. It is useful to first introduce the pair correlation function (Gabriel and Diggle, 2009): g (( s , t ) , ( s ′, t ′)) =
λ2((s, t), (s′, t′)) / [λ(s, t) λ(s′, t′)]
which can be heuristically interpreted as a measure of the spatio-temporal association between (s, t) and (s′, t′). Indeed, in the case of no spatio-temporal interactions between (s, t) and (s′, t′) we have λ2((s, t), (s′, t′)) = λ(s, t) λ(s′, t′) and hence g((s, t), (s′, t′)) = 1. On the other hand, if λ2((s, t), (s′, t′)) > λ(s, t) λ(s′, t′) and g((s, t), (s′, t′)) > 1, we detect some form of attraction, while if λ2((s, t), (s′, t′)) < λ(s, t) λ(s′, t′) and g((s, t), (s′, t′)) < 1, we have some form of repulsion, or inhibition, between (s, t) and (s′, t′).
While extending the concept of second-order reweighted stationarity introduced by Baddeley et al. (2000) (see Chapter 8) from a pure spatial context to a spatio-temporal setting, Gabriel and Diggle (2009) define a spatio-temporal point process as second-order intensity reweighted stationary and isotropic if λ(s, t) is bounded away from zero and if g((s, t), (s′, t′)) depends only on the spatial distance u, between s and s′, and the time interval v, between t and t′. In the light of these concepts and definitions, the STIK-function of a second-order intensity reweighted stationary and isotropic spatio-temporal point process can be written as (Gabriel and Diggle, 2009):

K_ST(u, v) = 2π ∫_{−v}^{v} ∫_0^u g(u′, v′) u′ du′ dv′      (9.9)
with g(u, v) = λ2(u, v) / (λ(s, t) λ(s′, t′)). Gabriel and Diggle (2009) show that the STIK-function can be used to develop a formal test for the presence of spatio-temporal concentration because, under the null hypothesis of no spatio-temporal interactions, it is possible to compute the integral on the right-hand side of Equation 9.9 and hence to write K_ST(u, v) in a closed form. The inhomogeneous spatio-temporal Poisson process is the process which represents the benchmark of no spatio-temporal interactions in a heterogeneous environment, where the spatio-temporal heterogeneity is specified by a known first-order intensity function λ(s, t) (see Diggle, 2007). Gabriel and Diggle (2009) show that if the observed points (s_i, t_i): i = 1, 2, ..., n are a realization of an inhomogeneous spatio-temporal Poisson process with given λ(s, t), then

K_ST(u, v) = πu²v,  with u > 0 and v > 0      (9.10)
Equation 9.10 can properly represent the null hypothesis of no spatio-temporal interactions among events. Therefore, significant deviations from this benchmark indicate the alternative hypothesis of spatio-temporal dependence. In particular, if K_ST(u, v) > πu²v we detect positive dependence and then spatio-temporal clustering (a situation where events tend to attract each other both in time and space); in contrast, when K_ST(u, v) < πu²v we have, as usual, evidence of negative dependence and hence inhibition (in which events tend to distance each other both spatially and in time).
9.2.1 Estimation of STIK-function and inference
A significance test to verify whether economic agents tend to interact in space and time may consist of assessing if, for some u and v, the function K_ST(u, v) − πu²v, estimated on the observed spatio-temporal distribution of the economic agents (s_i, t_i): i = 1, 2, ..., n within a spatial region A and a time interval (0, T), is significantly greater than 0. Diggle and Gabriel (2009) have shown that an approximately unbiased estimator for K_ST(u, v) is
K̂_ST(u, v) = [1/(A T)] (n/n_v) Σ_{i=1}^{n_v} Σ_{j>i} w_ij^{-1} I(u_ij ≤ u) I(t_ij ≤ v) / [λ(s_i, t_i) λ(s_j, t_j)]      (9.11)
which has to be computed with the (s_i, t_i) sorted so that t_i < t_{i+1}. The notation in Equation 9.11 is coherent with that used for Equation 9.4. The new term n_v refers to the number of points for which t_i ≤ T − v, and the ratio n/n_v adjusts for the temporal edge effects (Diggle and Gabriel, 2009). We already know from Chapter 8 that in practical applications λ(s, t) in Equation 9.11 is unknown and that, in general, it is difficult to identify a theoretical economic model that specifies a given functional form for it. In most cases it has then to be estimated. In this respect, Gabriel and Diggle (2009) suggest using the working assumption that λ(s, t) can be factorized into the product of a separate spatial intensity, say m(s), and a separate temporal intensity, say μ(t): that is, that λ(s, t) = m(s) μ(t). According to this assumption, the separable effects are considered of first-order kind, and hence generated by the spatio-temporal heterogeneity, while the non-separable effects are considered of second-order kind, and hence a consequence of spatio-temporal interactions among events (Gabriel and Diggle, 2009). As in the purely spatial context considered in Chapter 8, both parametric and nonparametric approaches to the estimation of m(s) and μ(t) are available. If nonparametric estimation is preferred, a suitable procedure is kernel smoothing. Alternatively, if suitable additional information is available, one can still rely on parametric regression models where m(s) and μ(t) are specified as functions of sets of, respectively, geographically and temporally referenced variables capturing the effects of spatio-temporal heterogeneity. These variables should represent common contextual factors shared by the economic agents located in the same area during the same period of time. This would allow the inclusion of both individual traits and the context where the economic agents operate within a unified framework. In order to evaluate the statistical significance of the deviations of K̂_ST(u, v) − πu²v from 0, we can rely on Monte Carlo-simulated tolerance envelopes (Gabriel and Diggle, 2009). In particular, we can rely on an inferential procedure that consists of generating counterfactual spatio-temporal point patterns according to an inhomogeneous spatio-temporal Poisson process with first-order intensity λ̂(s, t) = m̂(s) μ̂(t), conditional upon the number of observed events.
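As a hedged sketch of how this estimation strategy could be implemented in R, the stpp package provides an implementation of Gabriel and Diggle's estimator (STIKhat), while the separable intensities m(s) and μ(t) can be estimated by kernel smoothing with spatstat and base R. All data objects below (xyt, s.region, t.region) are hypothetical placeholders, and the argument names are recalled from the packages' documentation, so they should be verified against the installed versions.

# Hedged sketch of the STIK-function estimation under the separability
# assumption lambda(s,t) = m(s) * mu(t). `xyt` is a hypothetical three-column
# matrix (x, y, time); `s.region` and `t.region` are the study polygon and
# the observation period.
library(stpp)
library(spatstat)

# m-hat(s_i): leave-one-out kernel estimate of the spatial intensity (Equation 8.4)
X  <- ppp(xyt[, 1], xyt[, 2], window = owin(poly = s.region))
ms <- density(X, at = "points", leaveoneout = TRUE)

# mu-hat(t_i): kernel density of the times, evaluated at the observed times
mu <- approx(density(xyt[, 3]), xout = xyt[, 3])$y

# lambda-hat(s_i, t_i): m-hat integrates to about n over space and mu-hat to 1
# over time, so their product integrates (roughly) to n, as required
lambda <- ms * mu

u <- seq(500, 5000, by = 500)   # spatial distances (metres)
v <- 1:20                       # time lags (years)
stik <- STIKhat(xyt, s.region = s.region, t.region = t.region,
                dist = u, times = v, lambda = lambda)

# The surface K-hat_ST(u,v) - pi*u^2*v can then be compared with tolerance
# envelopes built from simulations of an inhomogeneous Poisson process with
# the same estimated intensity (e.g. via stpp::rpp).
Dst <- stik$Khat - pi * outer(u^2, v)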
Example 9.1: Long run spatial dynamics of ICT firms in Rome, 1920–2005
Let us consider an example that illustrates the use of the STIK-function approach to the analysis of the spatio-temporal clustering of firms in the ICT sector in Rome. The data we are using consist of the locations of new ICT firms in Rome observed over a fairly long period, 1920 to 2005. In particular, the dataset reports the full address and the year of establishment of the 169 plants currently
operating in the area, thus disregarding the firms that exited the market before 2005. Figure 9.4 displays the spatial and temporal distribution of these firms. As can be noted, the economic activity seems to concentrate in some specific geographical areas (namely the main urban areas of the city) and in the most recent years. Due to the unavailability of other variables proxying for the contextual spatial and temporal heterogeneity, we may find it proper to estimate the separate temporal intensity μ(t) of the establishment of new firms using standard kernel density smoothing. As we already know from Chapter 8, the separate spatial intensity m(s) can be properly estimated using the spatial kernel estimator proposed by Baddeley et al. (2000) (see Equation 8.4). Figure 9.5 displays the resulting estimates for the spatial and temporal intensity functions.
Figure 9.4 Spatial (a) and temporal (b) distribution of the 169 new firms in the ICT sector in Rome, 1920–2005. Source: Industrial Union of Rome (UIR).
Figure 9.5 Estimated spatial intensity (a) and temporal intensity (b) for new firms in the ICT sector in Rome, 1920–2005.
Figure 9.6 Plot of the function K̂_ST(u, v) − πu²v for new firms in the ICT sector in Rome, 1920–2005, compared with the tolerance envelopes under the null hypothesis of no spatio-temporal interactions: shaded areas are those leading to rejection of the null hypothesis.
Now that we have an estimate of μ̂(t) and m̂(s), we can obtain K̂_ST(u, v) − πu²v and perform the test for spatio-temporal interactions among new firms while adjusting for an underlying spatial and temporal trend. According to Gabriel and Diggle (2009) the results of the test can be usefully visualized as in Figure 9.6. In particular, this kind of graph summarizes the comparison between the behavior of K̂_ST(u, v) − πu²v, which refers to the observed data, and the tolerance envelopes simulated according to the null hypothesis of no spatio-temporal interactions among economic agents: the shaded areas identify the values of spatial distance u and temporal distance v for which the empirical K̂_ST(u, v) − πu²v, describing the observed firm entries data, is above the 95th percentile of the K̂_ST(u, v) − πu²v (thus rejecting the null hypothesis) computed on 1,000 simulations of an inhomogeneous Poisson process with first-order intensity λ̂(s, t) = m̂(s) μ̂(t). Significant spatio-temporal clusters of ICT firms occur with a time lag of approximately 10 years up to a distance of about 5 km. This provides evidence of a phenomenon of space–time clustering which cannot be explained merely by spatio-temporal heterogeneity (an exogenous characteristic of the observed area and of the observed period, as described by μ̂(t) and m̂(s)), but is also due to some form of genuine interaction amongst economic agents.
Part IV
Looking ahead
Modeling both the spatial location choices and the spatial behavior of economic agents
10 Firm demography and survival analysis
10.1 Introduction
Part II of this book was devoted to methods and techniques for analyzing the spatial behavior of individual economic agents taking their locations as exogenously given. Part III was devoted to the methodologies used to study the locational choices of economic agents and their joint locations with respect to other agents. The growing field of spatial microeconometrics makes use of both streams of literature to build up a new modeling strategy which treats location as endogenous and takes into account simultaneously both individuals' locational choices and their economic decisions in the chosen location. The literature in this area is still relatively in its infancy (Dubé and Legros, 2014; Arbia et al., 2016) and rather scattered across a few articles. The aim of this chapter is to show through case studies how it is possible to join together, within a unified framework, the lessons learnt in the first chapters of this book and produce models in this direction. A first example of a spatial microeconometric model can be found in the literature related to firm demography and firm survival. The current state of the art in this area includes a vast variety of contributions and empirical methodologies, mainly for data aggregated at macro- or meso-territorial levels, in which the typical observations consist of administrative units such as regions, counties or municipalities. Comparatively less attention has been devoted to the development of a systematic approach to the analysis of individual micro-data where the observations are represented by the locations of each individual firm. The aggregated approach strongly limits the possibility of obtaining robust evidence about economic dynamics for two main reasons. The first is related to the modifiable areal unit problem (Arbia, 1989) mentioned in Chapter 1: with regional data we do not observe the dynamics of the single individual but only the dynamics of variables within arbitrary partitions of the territory. The second reason is that theoretical models of firm demography are based on behavioral models of single individual economic agents (Hopenhayn, 1992; Krueger, 2003; Lazear, 2005 among others), so that if we base our conclusions on regional aggregates we support the theoretical model only if we are ready to admit the restrictive and unrealistic assumption of the homogeneity of each firm's behavior
within the region. In the few remarkable cases where a genuine micro-approach was adopted, the results confirm the relevance of neighborhood effects and reveal interesting scenarios for future research. For instance, Igami (2011) shows that the introduction of a new large supermarket in one area increases the chance of failure for larger stores in the neighborhood, but increases the probability of survival of smaller incumbents. Analyzing a set of data collected in the food retailing sector, Borraz et al. (2014) show that the establishment of a supermarket in the neighborhood of a small store significantly increases the probability of the small store going out of business in the same year. A good way of overcoming these limitations is to simply remove the boundaries and analyze the economy of a continuous space. Economists often observe that economic activities are located in a continuous space and that "there is no particular reason to think that national boundaries define a relevant region" (Krugman 1991a; 1991b). So why should a regional boundary define a relevant region? Obviously we are not saying that boundaries should be ignored altogether, but only that we need to distinguish between the meanings of boundaries in different situations. In some instances boundaries can be classified as significant borders, that is, places where the economic conditions change abruptly because of some change, for example, in the tax system or transport costs. In other instances borders are irrelevant, where nothing actually happens from an economic standpoint. Starting from these considerations, we think that the shift of emphasis from a meso- to a micro-level is likely to bear interesting fruit. Krugman (1991a) has remarked that "if we want to understand differences in national growth rate, a good place to start is by examining differences in regional growth". Here we assert that a good way to understand regional economics is to begin by examining the micro-behavior of economic agents in the economic space, and so explore the micro-foundations of regional economics. After a model has been identified at the micro spatial level, we can always superimpose an administrative grid and examine the implied meso-scenario. In fact, phenomena in nature are encountered in continuous space and are developed over continuous time; it is only due to our limitations that we discretize phenomena in some way (and subsequently distort them by reducing the quantity of information). Apart from the motivations given in the previous sections, a more remote incentive to study the continuous properties of economic phenomena dates back to Leibniz and his famous quote: "natura non facit saltus".1 The same general idea has been adopted in time-series analysis with the development of continuous time econometrics and is providing significant contributions to many branches of economics (see Gandolfo 1990; Bergstrom 1990). The idea of continuous space modeling is not new in economic geography and spatial economics: it was present in Weber's studies of industrial location at the beginning of the twentieth century (Weber, 1909). More recently Beckmann (1970) and Beckmann and Puu (1985) analyzed equilibrium conditions of models defined in a continuous space. Griffith (1986) discusses a spatial demand curve based on a central place economic landscape defined on a continuous surface. Kaashoek and Paelinck (1994; 1996; 1998) derive the properties of a non-equilibrium
dynamic path of continuous space economic variables based on partial differential equation theory (John 1978; Toda 1989). However, these studies are all concerned with the theoretical properties of models, whereas we are interested in identifying models susceptible to statistical estimation and testing on the basis of existing data. There are many reasons why such an approach has not been adopted thus far. The most obvious are the lack of an appropriate statistical methodology, the lack of accurate data (often not available for confidentiality reasons) and the lack of appropriate computer technologies. However, the methods for analyzing spatial data on a continuous space now form a well-consolidated methodological body, as extensively discussed in this book. The availability of statistical data at the individual agent level has also increased considerably in recent times, due to the diffusion of spatially referenced administrative records, new Big Data sources, as discussed at length in Chapter 1.3, and the development of methods to conceal confidential data without seriously distorting the statistical information (see Cox 1980; Duncan and Lambert 1986; De Waal and Willenborg 1998; Willenborg and De Waal 1996; Arbia et al., 2016; Chapter 3.5). As a consequence, there no longer appear to be any technical obstacles to a microeconomic approach to regional problems. We will formalize such an approach in the next section.
10.2 A spatial microeconometric model for firm demography

10.2.1 A spatial model for firm demography

10.2.1.1 Introduction

In this section we will introduce a class of testable models to help explain the concentration of firms in space. The formalism is taken from Arbia (2001) and derives from a model proposed by Rathbun and Cressie (1994) for the spatial diffusion of vegetation. Our modelling framework also bears a particular resemblance to the methodology employed by Van Wissen (2000) to simulate the dynamics of firm demography in space. However, even if the set-up of the model is similar, we must emphasize that in Van Wissen's approach (as in other recent works on firm demography, see Bade and Nerlinger 2000; Van Dijk and Pellenbarg 2000) the aim is to model firm behavior within regions. Our goal, however, is to explain why a firm locates (develops and dies) at a certain point in space. Indeed, the stochastic point process theory presented in Part III is at the basis of possible firm demography models which account for the links of spatial interaction amongst individual economic agents. By treating the spatial distribution of economic activities as the result of a dynamic process which occurs in space and time, the observed micro-geographical patterns can be modeled as realizations of a marked space–time survival point process (see Chapter 6), where firms are created at some random location and at some point in time and then operate, grow and attract (or repel) other firms in their
neighborhood. Following the reductionist approach already exploited by Arbia (2001), in this section we formalize three different model components: (i) a birth model, (ii) a growth model and (iii) a death/survival model. Such an approach can prove very useful in that the model parameters can provide indications of how to validate the different paradigmatic economic-theoretical cases, such as the presence of spatial spillover effects, the effects of positive spatial externalities or hypotheses concerning spatial inhibition processes among economic agents.
10.2.1.2 The birth model

The first element of the comprehensive spatial microeconometric approach is constituted by an equation for the birth of new firms. In this methodological framework the observed spatial point pattern of new firms is assumed to be the realization of a point process conditional on the locations of the firms existing at that moment. In order to formalize our model, we rely on the spatial point process methodology (Diggle, 2003) introduced in Chapter 6. Within this framework, a spatial point process is considered as a stochastic mechanism that generates patterns of points on a planar map. The basic characteristic of a spatial point process is the intensity function, introduced in Equation 6.1 and denoted by the symbol λ(x). Thus, by definition, the higher λ(x), the higher the expected concentration of points around x (see Arbia et al., 2008). Following the modeling framework originally proposed in a seminal paper by Rathbun and Cressie (1994) and imported by Arbia (2001) into the field of regional economics, the formation process of new firms can be modeled as an inhomogeneous Poisson process (see Diggle, 2003) with intensity function λ(x) driven by the potential interaction effects of the existing firms. The values of λ(x) constitute a realization of a random function parametrically specified by the following model:
\lambda(x) = \exp\{\alpha + \beta_1 d(x) + \beta_2 W(x) + \beta_3' Z(x) + \Phi(x)\}   (10.1)
where α, β1, β2 and β3 are parameters to be estimated, d(x) indicates the distance of point x from a conspicuous point (see Section 3.6) and W(x) is a term measuring the sign and the intensity of the interaction between the firm located at point x and the other existing firms, incorporating the idea of non-constant spatial returns. A particular specification for the W(x) function was suggested by Arbia (2001). Furthermore, in Equation 10.1, Z represents a vector of independent variables assumed to be spatially heterogeneous (such as demand, unitary transport costs and regional policy instruments such as local taxation and incentives) that can influence the birth of economic activities in the long run. Finally, Φ(x) is the error term of the model, assumed to be spatially stationary and Gaussian with zero mean but non-zero spatial correlations. Due to the nature of the error term, the estimation of Equation 10.1 presents some problems that will be discussed more thoroughly in Section 10.2.2, where we discuss a numerical application of the model. Notice that Equation 10.1 can be seen as a continuous space version of Krugman's concentration model that avoids the problems associated with arbitrary geographical partitions (see Krugman, 1991a). An example of the formalization of Equation 10.1 was already presented in Example 8.1.
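As a simple illustration of this specification, the following R sketch simulates firm births from an inhomogeneous Poisson process with a log-linear intensity in the spirit of Equation 10.1, using the spatstat package. The window, the covariate, the coefficient values and all object names are purely hypothetical, and the interaction term W(x) and the error term Φ(x) are omitted for simplicity.

## A minimal sketch (hypothetical window, covariate and coefficients):
## simulating firm births from an inhomogeneous Poisson process with a
## log-linear intensity in the spirit of Equation 10.1.
library(spatstat)

set.seed(1)
win <- owin(c(0, 10), c(0, 10))           # hypothetical 10 km x 10 km region

lambda <- function(x, y) {
  d <- sqrt((x - 5)^2 + (y - 5)^2)        # d(x): distance from a conspicuous point
  Z <- x / 10                             # Z(x): a toy heterogeneous covariate
  exp(2 - 0.4 * d + 1.5 * Z)              # exp{alpha + beta1*d(x) + beta3*Z(x)}
}

births <- rpoispp(lambda, win = win)      # one realization of the birth process
plot(births, main = "Simulated firm births")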
10.2.1.3 The growth model

The second component of our model is the spatial growth dynamic of the firms. In this context, the growth of a single firm (proxied, for example, by the growth in the number of employees) is assumed to be a function of its stage of development at the beginning of the period and of the competitive (or cooperative) influences of the other firms located in the neighborhood. Following the framework suggested by Rathbun and Cressie (1994) and Arbia (2001), the growth of the ith firm can be modeled as follows:

g_i = \varphi_0 + \varphi_1 D(x_i) + \varphi_2 W_i + \varepsilon_i   (10.2)

where g_i represents the growth of firm i in a certain period of time, D(x_i) represents its dimension at the beginning of the period, and

W_i = \sum_{j=1}^{n} \Phi_{ij}   (10.3)

represents a measure of the level of spatial interaction between the firms, φ0, φ1 and φ2 are parameters to be estimated and the εi's are independently and normally distributed errors with zero mean and a finite variance.
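To illustrate how an interaction term of this kind can be computed in practice, the short R sketch below evaluates Wi for each firm from hypothetical coordinates and sizes, using Φij = Zj/dij, one of the functional forms discussed later in Section 10.2.2.3. All data and names are illustrative assumptions, not the authors' code.

## A minimal sketch (hypothetical data): the interaction term W_i of
## Equation 10.3 computed with Phi_ij = Z_j / d_ij, i.e. the size of firm j
## discounted by its distance from firm i.
set.seed(1)
n  <- 50
xy <- matrix(runif(2 * n, 0, 10), ncol = 2)   # firm coordinates (km)
Z  <- rpois(n, 5) + 1                         # firm sizes (number of employees)

W <- sapply(seq_len(n), function(i) {
  d <- sqrt(colSums((t(xy) - xy[i, ])^2))     # distances from firm i to all firms
  sum(Z[-i] / d[-i])                          # sum of Phi_ij over j != i
})

summary(W)                                     # distribution of the interaction term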
10.2.1.4 The survival model

The last component of the model is devoted to uncovering the death/survival process that, together with the other two components, brings the time dimension into the discussion and fully describes the whole spatial demographic phenomenon. The death/survival process is fitted to the firms existing at a given moment of time which have survived or ceased to operate in a certain time interval. According to the methodological framework proposed by Rathbun and Cressie (1994) and Arbia (2001), a death/survival process can be developed as a survival conditional probability model which will now be described. Let x_it denote the spatial coordinates of the firms which survive at time t. Let also Z_it be a measure of firm dimension at time t (such as the number of employees). Finally, let M_i(t) denote a survivorship indicator variable such that M_i(t) = 1 if the ith firm survives at time t and M_i(t) = 0 if it ceases its activity at time t. The survival probability of the ith firm can then be defined as:

p_i(t; \theta) \equiv P\{M_i(t; \theta) = 1 \mid M_i(t-1; \theta) = 1\}   (10.4)

where θ is a set of unknown parameters. By assuming that survival at time t is a function of the dimension of the firm at time t − 1 and of the competitive or cooperative influences of the other neighboring firms also operating at time t − 1, p_i(t; θ) is then modeled using the following space–time logistic regression (Rathbun and Cressie, 1994):

\ln \frac{p_i(t; \theta)}{1 - p_i(t; \theta)} = \theta_0 + \theta_1 D(x_i) + \theta_2 W_i   (10.5)

where, as in Equation 10.2,

W_i = \sum_{j=1}^{n} \Phi_{ij}\left(Z_{j,t-1},\; d_{ij} = \lVert x_{i,t-1} - x_{j,t-1} \rVert\right)   (10.6)

represents a measure of the level of spatial interaction of the ith firm, Φij(Z, d) is a known function which can be specified by the same functional forms proposed for the growth model (Equations 10.2 and 10.3), and θ0, θ1 and θ2 are the parameters to be estimated. Given a proper specification of Φij(Z, d), and hence given Equation 10.6, θ in Equation 10.5 can be estimated using the standard maximum likelihood procedure. This model aims to test the hypothesis that the presence of firms in one location can attract or inhibit the establishment of other firms in nearby locations.
10.2.2 A case study

10.2.2.1 Data description

In the past, most of the studies on firm demography were based on data on firm location aggregated at the level of geographical partitions such as regions, counties or countries, mainly due to data limitations. The scenario, however, is rapidly changing and the accessibility of micro-geographical data related to single economic agents is becoming more and more common in many applied studies. In particular, when dealing with firm demography, many existing official databases are enriched with detailed information related to a single firm, including its geographical coordinates and a set of relevant variables, such as production, capital and labor inputs, level of technology and many others. A good example of such an informative database is the Statistical Register of Active Enterprises, managed and updated by the Italian Statistical Institute. At the firm level, this database currently contains information about firm code, tax code, business name, sector of activity (according to the NACE classification), number
of employees, legal status (according to the current classification), class of sales, date of establishment and (if applicable) of termination. Drawn from such a rich database, the data employed in the case study analyzed in the present chapter refer to the geographical location and the number of employees of small retail food stores and big supermarkets2 also selling food products operating in the town of Trento between 2004 and 2007. The model presented here draws heavily on work by Arbia et al. (2014a; 2017) and consists of the three basic components discussed in Section 10.2.1. We can conceive the three models as describing different aspects of the phenomenon. Consequently (and consistently with Rathbun and Cressie, 1994), the three models are estimated separately rather than simultaneously, with no interaction effects, thus eliminating possible problems of simultaneity that could arise in the estimation methods and undermine their validity. More specifically, since the case study refers to food stores in Trento, the birth component model will be fitted to the observed spatial distribution of the 26 small retail food stores established after 2004 which were still active in 2007. Similarly, the growth component model will be fitted to the observed spatial pattern of the 72 small retail food stores established in 2004 or earlier which were always active in the period from 2004 to 2007. Finally, the death/survival component process will be fitted to the 229 small retail food stores which were established in 2004 or earlier and that either survived at least until 2007 or ceased to operate during the period 2004 to 2007. The three point patterns relative to birth, growth and survival are represented in Figure 10.1. In this methodological framework the observed spatial point pattern of the establishment of new small food stores is assumed to be the realization of a point process conditional on the locations of the small food stores and big supermarkets existing at that moment. Our model aims to test the hypothesis that big
Figure 10.1 Spatial point patterns of small retail food stores used to estimate: (a) the firm establishment process, 2004–2007; (b) the firm growth process, 2004–2007; (c) the firm survival process, 2004–2007.
supermarkets inhibit the establishment of small food stores in nearby locations, while, due to the presence of positive spatial externalities, the presence of small food stores attracts other small food stores (see Igami, 2011; Borraz et al., 2014).
10.2.2.2 The birth model

Following the modelling framework described in Section 3.4.2, the new firm formation process of small food stores during the period 2004 to 2007 is modeled as an inhomogeneous Poisson process with intensity function λ(x) driven by the potential interaction effects of the existing small food stores and big supermarkets. The values of λ(x) can thus be conceived as a realization of a random function parametrically specified by the following model:
\lambda(x) = \exp\{\alpha + \beta_{ss}\, n_{ss}(x) + \beta_{bs}\, n_{bs}(x) + \beta_{nh}\, n_{nh}(x) + \beta_{af}\, n_{af}(x)\}   (10.7)
where βss, βbs, βnh and βaf are the parameters to be estimated. The variables nss(x) and nbs(x) measure the overall number of employees of the small food stores and of the big supermarkets, respectively, existing from before 2005 and located around the arbitrary point x. The two additional variables nnh(x) and naf(x) control for other factors that can affect the spatial intensity of the formation process of new small food stores. Locational choices of firms, and in particular of retail activities, can strongly depend on the potential market demand. As a proxy for the spatial distribution of potential customers in the city of Trento we use the number of households by census tract in 2004, which is the finest available level of spatial resolution. In order to properly include these data in the modeling framework of Equation 10.7 we can build up a marked point pattern (see Chapter 6) where the points are the centroids of the census tracts and the associated marks represent the number of households per census tract (see Figure 10.2). We can then define the control variable nnh(x), which measures the number of households in 2004 that reside close to the arbitrary point x. However, the decision to open a new firm in a particular location is also affected by the spatial characteristics of the territory (such as the urban structure, the presence of useful infrastructure or environmental and administrative limits). To control for these unidentified sources of spatial heterogeneity, we can include the variable naf(x), which is constructed as the overall number of employees of all firms of all industries, operating from before 2005, located around the generic point x. The use of this specific control variable is motivated by the assumption that the main unobserved exogenous spatial factors affecting the locational choices of firms are common to all economic agents. The model is based on the working assumption that the locations of incumbent economic agents operating before 2005 are exogenously given. As already remarked by Arbia (2001), this hypothesis is consistent with Krugman's idea of "historical initial conditions" (Krugman, 1991a). The log-linear specification allows us to fit the model by maximizing the log pseudo-likelihood for λ(x) (Besag, 1975) based on the points x constituting the observed point pattern.
Figure 10.2 Number of households by census tract in 2004 in Trento: (a) as a census tract map of the quartile distribution (classes: under 14, 14–46, 46–123, over 123); (b) as a marked point pattern.
According to the current state of the art in the spatial statistics literature, the most efficient and versatile method of maximizing the log pseudo-likelihood (thus obtaining unbiased estimates of the parameters) is the technique proposed by Berman and Turner (1992). (For a clear and detailed discussion of the method, see Baddeley and Turner, 2000.) As shown, for example, by Strauss and Ikeda (1990), maximum pseudo-likelihood is equivalent to maximum likelihood in the case of a Poisson stochastic process. Therefore, it is possible to test the significance of the estimated model parameters by using standard formal likelihood ratio criteria based on the χ² distribution. The maximum pseudo-likelihood estimates of the parameters for the birth process model (Equation 10.7) are reported in Table 10.1. The significantly positive value of the estimate of βss indicates that the establishment of small food stores is positively dependent on the locations and sizes of the existing small food stores, while the significantly negative value of the estimate of βbs indicates that it is negatively dependent on the locations and sizes of the existing big supermarkets. Therefore, the probability of the establishment of new small food stores is higher in locations characterized by the presence of other existing small food stores, thus highlighting the presence of positive spatial externalities. On the other hand, this probability is lower in locations characterized by the presence of existing big supermarkets, which indicates the presence of negative spatial externalities. The model also reveals a positive significant relationship between the spatial intensity and the two proxies of market potential and urban structure (the parameters βnh and βaf, respectively).
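The Trento data are not reproduced here, but the fitting strategy can be sketched with the spatstat package, whose ppm() function implements the Berman–Turner device for maximizing the (pseudo)likelihood mentioned above. In the sketch below the point patterns, the employee-weighted covariate surfaces and all object names are hypothetical stand-ins for nss(x), nbs(x) and the observed births.

## A hedged sketch (hypothetical data): fitting a log-linear intensity such
## as Equation 10.7 by maximum pseudo-likelihood (Berman-Turner device) with
## spatstat's ppm(). Kernel surfaces mimic n_ss(x) and n_bs(x).
library(spatstat)

set.seed(1)
win <- owin(c(0, 5000), c(0, 5000))              # study window in meters (toy)

inc_ss <- runifpoint(70, win = win)              # hypothetical incumbent small stores
inc_bs <- runifpoint(8,  win = win)              # hypothetical incumbent supermarkets
emp_ss <- rpois(npoints(inc_ss), 3) + 1          # their numbers of employees
emp_bs <- rpois(npoints(inc_bs), 40) + 10

# employee-weighted kernel surfaces playing the role of n_ss(x) and n_bs(x)
n_ss <- density(inc_ss, weights = emp_ss, sigma = 500)
n_bs <- density(inc_bs, weights = emp_bs, sigma = 500)

births <- runifpoint(26, win = win)              # hypothetical new small stores

fit <- ppm(births ~ nss + nbs,                   # log-linear intensity in the covariates
           data = list(nss = n_ss, nbs = n_bs))
summary(fit)                                     # estimates, standard errors, z-tests
anova(ppm(births ~ 1), fit, test = "Chisq")      # likelihood ratio test (chi-squared)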
In the model considered, no measure is available to assess the goodness-of-fit playing a role similar to that of R² in standard OLS regression. To this aim, however, it is possible to rely on Monte Carlo simulations. The adequacy of the model to the observed data can indeed be visually assessed by comparing the behavior of the empirical inhomogeneous K-function (see Section 6.3.2.1) of the observed point pattern of the 26 new small retail food stores (see Figure 10.1a) with the behavior of the inhomogeneous K-function, in terms of confidence bands, derived from 999 simulations of the estimated model. Figure 10.3 shows the behavior of the empirical K̂I(d) − πd² calculated from the observed point pattern of the new small food stores against the upper and lower confidence bands calculated from 999 realizations of the estimated model. The benchmark value representative of a good fit is zero for each distance d. As can be noted, the empirical function tends to be close to zero and lies entirely within the confidence bands, thus indicating that the estimated model describes adequately the spatial birth phenomenon of small food stores.

Table 10.1 Estimates for the spatial establishment point process of new small food stores

Parameter   Estimate    Standard error   z-test
α           −18.900     0.8809
βss           0.074     0.0234           **
βbs          −0.028     0.0075           ***
βnh           0.004     0.0012           ***
βaf           0.001     0.0003           ***

** Significant at 5%; *** Significant at 1%.
Figure 10.3 Behavior of the empirical K̂I(d) − πd² (continuous line), plotted against distance d (in meters), and the corresponding 99.9% confidence bands (dashed lines).
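The Monte Carlo check illustrated in Figure 10.3 can be reproduced in spirit with spatstat's envelope() function, which simulates from a fitted model and computes simulation envelopes of the inhomogeneous K-function. The sketch below uses spatstat's built-in bei data purely for illustration (the Trento data are not public) and only 99 simulations instead of 999, to keep the run short.

## A hedged sketch: Monte Carlo goodness-of-fit bands for a fitted
## inhomogeneous Poisson model, in the spirit of Figure 10.3. Uses the
## built-in 'bei' data set, not the (non-public) Trento data.
library(spatstat)

fit <- ppm(bei ~ elev + grad, data = bei.extra)   # log-linear intensity in two covariates
env <- envelope(fit, Kinhom, nsim = 99)           # 99 simulations (999 in the text)

# plot K_I(d) - pi d^2, which should stay close to zero if the model fits well
plot(env, . - pi * r^2 ~ r, main = "Inhomogeneous K-function envelope")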
10.2.2.3 The growth model

The spatial growth dynamic of small food stores is modeled using the observed point pattern of small food stores that were established in 2004 or before and that survived for the whole period from 2005 to 2007 (see Figure 10.1b). As described in Equation 10.2, the growth of a single small food store is assumed to be a function of its stage of development at the beginning of the period and of the influences of the other food stores located in the neighborhood. Let {x_{ss,i} : i = 1, …, n} and {x_{bs,i} : i = 1, …, m} denote, respectively, the spatial coordinates of small food stores and of big supermarkets existing from at least 2004 and surviving until 2007. Let also Z04ss,i (Z04bs,i) and Z07ss,i (Z07bs,i) denote the number of employees of the ith small store (big supermarket) in 2004 and 2007, respectively, and let g_i = Z07ss,i / Z04ss,i be a proxy for the growth of the ith small store. Following the framework suggested by Rathbun and Cressie (1994), the growth of the ith small food store can be modeled as follows:

g_i = \alpha + \beta_Z Z04_{ss,i} + \beta_{ss} W_{ss,i} + \beta_{bs} W_{bs,i} + \varepsilon_i   (10.8)

where

W_{ss,i} \equiv \sum_{j=1}^{n} \Phi_{ij}\left(Z04_{ss,j},\; d_{ij} = \lVert x_{ss,i} - x_{ss,j} \rVert\right)   (10.9)

and

W_{bs,i} \equiv \sum_{j=1}^{m} \Phi_{ij}\left(Z04_{bs,j},\; d_{ij} = \lVert x_{ss,i} - x_{bs,j} \rVert\right)   (10.10)

measure the level of spatial interaction of the ith small food store with, respectively, the other small food stores and the big supermarkets, α, βZ, βss and βbs are the parameters to be estimated and the εi's are independently and normally distributed errors with zero mean and a finite variance. Relying on the idea that the level of spatial interaction with a neighboring economic activity should depend on the distance to that economic activity and on its size (here approximated by the number of employees), the two measures of spatial interaction reported in Equations 10.9 and 10.10 are derived by specifying the functional form of Φij(Z, d). Rathbun and Cressie (1994) proposed choosing between the following functional forms: Φij(Z, d) = 1/d, Φij(Z, d) = 1/d², Φij(Z, d) = Z/d, Φij(Z, d) = (Z/d)² and Φij(Z, d) = Z²/d. Having chosen a proper specification of Φij(Z, d), and given Equations 10.9 and 10.10, the parameters of the growth model in Equation 10.8 can be estimated using OLS. The OLS estimates of the parameters for the growth process model fitted to the data of the small food stores are reported in Table 10.2. In our analysis we specified the spatial interaction term as
Φij(Z, d) = Z²/d, which is the one maximizing the fit in terms of the coefficient of determination R².

Table 10.2 Estimates for the growth point process of new small food stores

Parameter   Full model            Restricted model
α            1.1600*** (0.1145)    1.1190*** (0.0622)
βZ          −0.0317 (0.0364)       –
βss          0.0446 (0.0987)       –
βbs          0.0000*** (0.0000)    0.0000*** (0.0000)
R²           0.6811                0.6921

*** Significant at 1%.

As can be seen, the parameters βZ and βss are not significant, thus implying that in our specific case the growth of small food stores is affected neither by their size at the beginning of the period nor by the closeness to other economic activities of the same typology. On the other hand, the parameter βbs is significant and positive (although very close to zero), thus indicating that spatial interactions with big supermarkets foster firm growth. In other words, growth rates are higher for small food stores located in the proximity of big supermarkets than for those that are far from them. This evidence suggests the presence of some sort of cooperative behavior amongst competitive economic agents. Similar results have been found by Igami (2011) and Borraz et al. (2014).
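The construction of the two interaction terms and the OLS estimation of Equation 10.8 are straightforward to code. The sketch below uses hypothetical coordinates and employment figures (the Trento micro-data are confidential) together with the Φij(Z, d) = Z²/d specification adopted in the text.

## A minimal sketch (hypothetical data): the growth model of Equation 10.8
## with Phi_ij(Z, d) = Z^2 / d, estimated by OLS as in the text.
set.seed(1)
n <- 72; m <- 10                                   # small stores and supermarkets (toy)
xy_ss  <- matrix(runif(2 * n, 0, 5000), ncol = 2)  # coordinates in meters
xy_bs  <- matrix(runif(2 * m, 0, 5000), ncol = 2)
Z04_ss <- rpois(n, 3) + 1                          # employees in 2004
Z04_bs <- rpois(m, 40) + 10
Z07_ss <- pmax(1, round(Z04_ss * runif(n, 0.7, 1.6)))
g      <- Z07_ss / Z04_ss                          # growth proxy g_i = Z07/Z04

phi <- function(Z, d) Z^2 / d                      # chosen interaction function

W_ss <- sapply(seq_len(n), function(i) {
  d <- sqrt(colSums((t(xy_ss) - xy_ss[i, ])^2))    # distances to the other small stores
  sum(phi(Z04_ss[-i], d[-i]))
})
W_bs <- sapply(seq_len(n), function(i) {
  d <- sqrt(colSums((t(xy_bs) - xy_ss[i, ])^2))    # distances to the supermarkets
  sum(phi(Z04_bs, d))
})

fit_growth <- lm(g ~ Z04_ss + W_ss + W_bs)         # Equation 10.8 by OLS
summary(fit_growth)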
10.2.2.4 The survival model

The death/survival process is fitted to the small food stores existing in 2004 which either survived or ceased to operate during the period 2005 to 2007 (Figure 10.1c). Therefore, the data of interest consist of the spatial point patterns of small food stores observed at times t = 2005, 2006, 2007. According to the methodological framework described in Section 10.2.1.4, a death/survival process can be developed as a survival conditional probability model which will now be described. Let {x_{ss,i}(t) : i = 1, …, n} and {x_{bs,i}(t) : i = 1, …, m} denote the spatial coordinates of small food stores and big supermarkets, respectively, which survive at time t. Let also Z_{ss,i,t} and Z_{bs,i,t} represent the number of employees of the ith small store and of the ith big supermarket, respectively, at time t. Finally, let M_i(t) denote a survivorship indicator variable such that M_i(t) = 1 if the ith small food store survives at time t and M_i(t) = 0 if the ith small food store ceases its activity at time t. The survival probability of the ith small food store can then be defined as described in Equation 10.4 by assuming that survival at time t is a function of the dimension of the small food store at time t − 1 and of the competitive or cooperative influences of the other neighboring small food stores and big supermarkets also operating at time t − 1. p_i(t; θ) is then modeled using Equation 10.5, which now becomes:

\log \frac{p_i(t; \theta)}{1 - p_i(t; \theta)} = \alpha + \beta_Z Z_{ss,i,t-1} + \beta_{ss} W_{ss,i,t-1} + \beta_{bs} W_{bs,i,t-1}   (10.11)

where

W_{ss,i,t-1} \equiv \sum_{j=1}^{n} \Phi_{ij}\left(Z_{ss,j,t-1},\; d_{ij} = \lVert x_{ss,i}(t-1) - x_{ss,j}(t-1) \rVert\right)   (10.12)

and

W_{bs,i,t-1} \equiv \sum_{j=1}^{m} \Phi_{ij}\left(Z_{bs,j,t-1},\; d_{ij} = \lVert x_{ss,i}(t-1) - x_{bs,j}(t-1) \rVert\right)   (10.13)
represent measures of the level of spatial interaction of the ith small food store with, respectively, the other small food stores and the big supermarkets, Φij(Z, d) is a known function which can be specified by the same functional forms already proposed for the growth model (see Equations 10.9 and 10.10) and θ ≡ (α, βZ, βss, βbs)′ is the vector of parameters to be estimated. By choosing a proper specification of Φij(Z, d), and given Equations 10.12 and 10.13, θ in Equation 10.11 can be estimated using the standard maximum likelihood procedure. For estimation purposes, first of all let us specify the spatial interaction term again as Φij(Z, d) = Z²/d, which is the specification that produces the best fit in terms of the value of the log likelihood. The ML estimates of the parameters for the death/survival model expressed in Equation 10.11 are shown in Table 10.3.

Table 10.3 Estimates for the death/survival point process of new small food stores

Parameter         Full model             Restricted model
α                  1.7220*** (0.4842)     2.3076*** (0.2345)
βZ                 0.1811 (0.1648)        –
βss               −0.0003*** (0.0000)    −0.0003*** (0.0000)
βbs                2.6880 (5.6140)        –
Log likelihood   −70.7685               −72.4076

*** Significant at 1%.

The results show that the parameters βZ and βbs are not significant, thus implying that the survivorship of small food stores is unaffected by their size and by the closeness of big supermarkets. Furthermore, since βss is significant and negative (although very close to zero), the estimated model indicates that spatial interactions with the other small food stores result in a relatively lower probability of survival. This evidence reinforces the conjecture of competitive behavior between small food stores.
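Since Equation 10.11 is a logistic regression on firm-year records, its maximum likelihood estimation can be sketched with glm(). The data frame below is entirely hypothetical: in the real application each row would carry the lagged employment and the lagged interaction terms computed as in Equations 10.12 and 10.13, and firm-years after an exit would be dropped.

## A hedged sketch (hypothetical firm-year data): maximum likelihood
## estimation of the space-time logistic survival model of Equation 10.11.
## M = 1 if the store survives in that year, 0 if it exits.
set.seed(1)
n_firms <- 229
dat <- data.frame(
  firm    = rep(1:n_firms, each = 3),
  year    = rep(2005:2007, times = n_firms),
  Z_lag   = rpois(3 * n_firms, 3) + 1,      # employees at t - 1 (toy values)
  Wss_lag = rgamma(3 * n_firms, 2, 1),      # interaction with small stores at t - 1
  Wbs_lag = rgamma(3 * n_firms, 2, 1)       # interaction with supermarkets at t - 1
)
dat$M <- rbinom(nrow(dat), 1, 0.93)         # toy survival indicator

fit_surv <- glm(M ~ Z_lag + Wss_lag + Wbs_lag,
                family = binomial(link = "logit"), data = dat)
summary(fit_surv)                            # ML estimates of (alpha, beta_Z, beta_ss, beta_bs)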
10.2.3 Conclusions

In this section we discussed a model-based approach to the analysis of the dynamics of firm demography based on micro-geographical data. We presented a methodology to estimate a three-equation model dealing respectively with firms' birth, growth and survival. We argue that decomposing firm demography processes into these three sub-processes allows us to uncover the relative importance of competitive and cooperative spatial interactions in determining the spatial distribution of economic activities. It is important to note that in the case study the data allowed only the study of the spatial distribution of firm entries, exits and incumbent firms. Thus the conclusions are limited to the evidence of significant spatial interactions amongst economic agents. In order to uncover the entire locational process of economic agents, it would be necessary to access a larger information set of structural variables, rather than just the mere geographical location of firms, such as, for instance, local demand, workforce skill and urban structure. However, this case study shows the potential of the proposed methodologies for studying the spatial microeconomic behavior of firms. A possible extension of the techniques presented here could include more comprehensive studies which model the role of spatial proximity in the process of co-agglomeration and in the analysis of the joint locational behavior of the different economic sectors. In this way the global pattern of firm location, growth and survival observed at the level of the economy as a whole would be modeled as the outcome of individual firm choices and their interaction in space and time.
10.3 A spatial microeconometric model for firm survival

10.3.1 Introduction

The presence of knowledge spillovers and shared human capital is at the heart of the Marshall–Arrow–Romer (MAR) externalities hypothesis (see Marshall, 1920; Arrow, 1962; Romer, 1986), which has been the source of a flood of scientific contributions produced in recent decades in the field of firm formation, agglomeration, growth and survival. According to the MAR hypothesis, similar firms located nearby increase the chance of human interaction, labor mobility and knowledge exchange, which in turn has an effect on firm creation, development and survival. Most of the earlier empirical contributions on knowledge externalities considered data aggregated at a regional level, mainly due to data limitations, leading to contrasting empirical results (Mansfield, 1995; Henderson, 2003; Rosenthal and Strange, 2003). In particular, the role of agglomeration economies has been considered to explain the establishment of firms at a
regional level and its effects on the growth of regional employment and regional production (Glaeser et al., 1992; Henderson, 2003). This is a further fruitful field for spatial microeconometric studies. The present section tries to bridge the gap between the literature on point patterns presented in Part III and that on survival models. In particular, our interest is in modeling the effects of spatial concentration and interaction on the probability of firm survival by incorporating spatial interaction effects amongst economic agents. We present a model for the probabilities of failure of individual firms, which makes use of a survival regression model and which takes into account both the spatial interactions among firms and the potential effects deriving from agglomeration.
10.3.2 Basic survival analysis techniques

This section is devoted to introducing a spatial microeconometric version of the basic Cox survival model (Cox, 1972). Before presenting such a model, however, it is useful to introduce briefly the major concepts related to non-spatial survival analysis. Survival data analysis (or "failure time data" analysis) concentrates on the analysis of data corresponding to the time elapsed between a time origin and the time of occurrence of an event of interest (a "failure"). When the time elapsed represents the response variable of a model, standard statistical methods cannot be employed. Perhaps the most striking feature of survival data is the presence of censoring: that is, the presence of incomplete observations, when the follow-up time is shorter than that necessary for an event to be observed. Censoring can arise due to time limits and other restrictions and makes it very hard to calculate even the simplest descriptive statistics, such as the mean or the median survival time. Furthermore, survival data often exhibit a positively skewed distribution with a high degree of asymmetry, which makes the normal distribution assumption unreasonable. Failure times can be considered empirical realizations of a positive random variable T related to time. In what follows we consider T as a continuous random variable characterized by a probability density function f(t) and a (cumulative) distribution function F(t) = \Pr[T \le t] = \int_0^t f(x)\,dx. The survival function S(t) can therefore be defined as the complementary function of F(t):

S(t) = 1 - F(t) = \Pr[T > t]   (10.14)

and represents the probability of surviving beyond t (Marubini and Valsecchi, 1995). Apart from the three above-mentioned functions (f(t), F(t), S(t)), two more functions are of interest when dealing with survival data. The first function,3 say λ(t), is called the "hazard function" and represents the instantaneous failure rate for an individual firm surviving to time t:

\lambda(t) = \lim_{\Delta t \to 0^+} \frac{\Pr\{t \le T < t + \Delta t \mid T \ge t\}}{\Delta t}   (10.15)
In Equation 10.15, λ(t)dt thus represents the probability that the event of interest occurs in the infinitesimal interval (t, t + dt), given survival up to time t (Marubini and Valsecchi, 1995). The second function, say Λ(t), is called the "cumulative hazard function" and represents the integral of the hazard function: Λ(t) = \int_0^t \lambda(x)\,dx. It is easy to show that λ(t) = f(t)/S(t) and that, therefore, Λ(t) = −log S(t). From this relationship it can be immediately verified that Λ(t) diverges, so that λ(t) is not a conditional density function. Parametric survival models are commonly specified by defining a plausible functional form for λ(t) from which S(t) and f(t) can be derived. The simplest distribution (which plays a central role in the analysis of survival and epidemiological data) is the exponential distribution (Marubini and Valsecchi, 1995), which assumes the hazard function to be constant through time (λ(t) = λ). The basic model can then be easily extended to include independent explanatory variables, which enable us to investigate the role of selected covariates taking into account the effect of confounding factors. If Y is a continuous response, regression models are commonly used to model its expectation E(Y). Since in the exponential distribution the expectation is 1/λ, an alternative way (Glasser, 1967) is to model the hazard as:

\lambda(t, x) = \lambda_0 \exp(b'x)   (10.16)
In Equation 10.16, x is a vector of k covariates including a constant term and b is a vector of unknown regression parameters to be estimated. Since b′x = b0 + b1x1 + … + bkxk, the term λ0 exp(b0) represents the failure rate in the reference category (that is, when x = 0). It is important to note that the model specified in Equation 10.16 relies on two basic assumptions: (i) the hazard function is independent of time and (ii) the covariates act in a multiplicative way on the baseline hazard. Therefore, if we consider two individuals characterized by covariate vectors x1 and x2, respectively, the hazard ratio

\frac{\lambda(t, x_2)}{\lambda(t, x_1)} = \exp\left[b'(x_2 - x_1)\right]   (10.17)

is independent of time. For this reason Equation 10.16 is called a "proportional hazard model". In a seminal paper, Cox (1972) introduced a regression model which is currently the most widely used in the analysis of censored survival data. In the Cox model, the hazard function depends on both time and covariates, but through two separate factors:

\lambda(t, x) = \lambda_0(t) \exp(b'x)   (10.18)
In Equation 10.18, the baseline hazard λ0(t) is arbitrarily defined (although it is assumed to be the same for all firms), while the covariates act in a multiplicative way on the baseline hazard. In this sense, the Cox model is a semi-parametric model where the hazards are proportional, since the hazard ratio, given by

\frac{\lambda_0(t) \exp(b'x_2)}{\lambda_0(t) \exp(b'x_1)} = \exp\left[b'(x_2 - x_1)\right]   (10.19)

is independent of time. An important difference from the parametric model (Equation 10.16) is in the form of the linear predictor b′x = b1x1 + … + bkxk, which does not include an intercept term. In terms of the inferential strategy, the parameter estimators of a Cox model and the significance tests are usually based on the partial likelihood technique (Cox, 1975). In the context of the exponential and Cox models, time is considered as measured on a continuous scale, so that the exact survival and censoring times are recorded in relatively fine units without ties (multiple identical survival times). However, data on the duration of firms are usually observed in discrete units of yearly length, so that we only know the time interval within which each event has occurred. In such cases, when many duration times are tied, the partial likelihood construction is not appropriate. Although some approximations of the exact marginal likelihood have been proposed, in particular those suggested by Breslow (1974) and Efron (1977), they are still inaccurate when the number of ties becomes high. In these cases a model for discrete times can be recommended. In a discrete-time model, each firm experiences a sequence of censorings at t1, t2, … and either fails or is finally censored in its last interval. The likelihood function over all the firms is a product of a Bernoulli likelihood for each firm in each interval. A discrete-time proportional hazard model can be fitted by treating the observations in each time interval as independent across intervals, including period-specific intercepts (as a set of dummy variables) and a complementary log-log link function (Rabe-Hesketh and Skrondal, 2008). The linear predictor includes the "baseline hazards" (without making any assumptions about their functional form) and the covariates, whose effects are assumed to be linear and additive on the logit scale. According to this model, the difference in the log odds between firms with different covariates is constant over time. This model, known as the "cloglog model", is the exact grouped-duration equivalent of the Cox model (for a formal derivation of the key link between the continuous-time Cox model and the discrete-time cloglog model, see the Appendix in Hess and Persson, 2012). Coefficient estimates obtained from these two model specifications should be identical if the true underlying model were indeed a Cox model (Hess and Persson, 2012). It is also possible to include in the cloglog model a random intercept for each region (Rabe-Hesketh and Skrondal, 2008) in order to accommodate dependence among the survival times of different firms located in the same small geographical area (regions of Italy), and to control for the presence of unobserved heterogeneity after conditioning on the covariates included in the regression model. The exponential of the random intercept is called a "shared frailty" due to the fact that it represents a region-specific disposition or "frailty" that is shared among firms nested in a region.
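A hedged sketch of this estimation strategy is given below: the discrete-time cloglog model is an ordinary binomial GLM with a complementary log-log link on firm-period records, and the shared-frailty variant adds a region-level random intercept, here fitted with lme4::glmer (one of several tools that can fit such models). All data, variable names and dimensions are hypothetical.

## A hedged sketch (hypothetical firm-period data): discrete-time proportional
## hazards via a complementary log-log model, with period dummies as baseline
## hazards and a region random intercept ("shared frailty").
library(lme4)

set.seed(1)
n_firms <- 500
dat <- data.frame(
  firm   = rep(1:n_firms, each = 5),
  period = factor(rep(1:5, times = n_firms)),                  # years since entry
  region = factor(rep(sample(1:20, n_firms, TRUE), each = 5)), # region of each firm
  size   = rep(rnorm(n_firms), each = 5),                      # a firm-level covariate
  exit   = rbinom(n_firms * 5, 1, 0.07)                        # 1 = exit in the interval
)

# cloglog model without frailty: a plain binomial GLM
fit_cll <- glm(exit ~ period + size,
               family = binomial(link = "cloglog"), data = dat)

# cloglog model with a region-level random intercept (shared frailty)
fit_frail <- glmer(exit ~ period + size + (1 | region),
                   family = binomial(link = "cloglog"), data = dat)
summary(fit_frail)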
It is important to observe that survival regression models have been widely used in the empirical literature to model firm survival (see Audretsch and Mahmood, 1995; Dunne et al., 2005 among others); however, not many attempts have been made thus far to explain how spatial interactions among individual firms affect their survival probabilities and to model the effects of spatial agglomeration on survival. In this chapter, we show how it is possible to augment the explanatory power of a survival regression model by explicitly taking into account spatial information while modeling the hazard function and the survival probabilities, by presenting the evidence of a case study.
10.3.3 Case study: The survival of pharmaceutical and medical-device manufacturing start-up firms in Italy

10.3.3.1 Data description

In this section we illustrate the use of a survival regression model augmented with the inclusion of micro-founded spatial covariates in order to assess the effects of agglomeration externalities generated by incumbent firms on the survival of start-up firms. We will show how this framework allows us to overcome the methodological pitfalls met by the agglomeration measures typically used in the current literature when addressing the problem of firm survival. The case study involves the 3,217 firms in the pharmaceutical and medical-device manufacturing industry in Italy which started their activity between 2004 and 2008.4 In order to assess the effects of agglomeration externalities on the survival of these firms, we can also use the data about the 10,572 incumbent firms of the same industry, established before 2004 and still surviving in 2009. This dataset is a subset of an internationally comparable database on Italian firm demography built up and managed by the Italian National Institute of Statistics, in accordance with the procedures suggested by the OECD and EUROSTAT and based on the statistical information contained in the National Business Registers. The business registers collect yearly a large set of information on the date of registration (firm entry) or deregistration (firm exit) for each business unit. However, this information does not purely represent firm demography, as registration and deregistration may also depend on non-demographic events such as changes of activity, mergers, break-ups, split-offs, take-overs and restructuring. Even if much of the literature on firm demography regularly makes use of data extracted from the business registers without any controls for the influence of non-demographic aspects, it should be noted that the mere observation of data from business registers does not allow a proper comparison at an international level due to a series of inconsistencies, such as different definitions, different units of observation and different national legal systems, to name but a few. In this chapter, we specifically exploit data based on the true firm entries and exits with the explicit aim of removing some of these inconsistencies. For each firm, our database currently contains, for the period 2004 to 2009, information about firm code, sub-sector of activity (according to the NACE classification), number of
employees, legal status (according to the current classification), date of establishment (if between 2004 and 2009), termination date (if the exit occurred before 2009) and the precise spatial location (in GMT longitude and latitude coordinates).
10.3.3.2 Definition of the spatial microeconometric covariates

In the empirical literature based on firm-level data (e.g. Staber 2001; Ferragina and Mazzotta, 2014 among others), the effects of agglomeration externalities on firm exit are typically assessed by regressing the probability (or hazard) of firm default on locational measures, such as industry specialization indices. The statistical significance, sign and magnitude of the associated estimated regression parameters are then used to assess the empirical evidence indicating whether agglomeration externalities play a significant role in firm survival. In this chapter, however, we argue that the locational measures commonly used by researchers (such as the location quotient or the Ellison–Glaeser index (Ellison and Glaeser, 1997)) might not be adequate, for at least three reasons. First of all, these measures are calculated on regional aggregates built on arbitrarily defined geographical units (such as provinces, regions or municipalities). Hence, they introduce a statistical bias arising from the discretionary definition of space (i.e. the modifiable areal unit problem bias; see Arbia, 1989). As evidence of this effect, Beaudry and Schiffauerova (2009) reviewed the relevant regional science literature and found that the emergence and intensity of agglomeration externalities are strictly dependent on the chosen level of spatial aggregation of the data. Secondly, the dependent variable (the hazard rate) is defined at the firm level while the locational measures are defined at the regional level. As a consequence, the regression model will necessarily be based on the implicit assumption that the behavior of firms is homogeneous within each region, which is certainly too restrictive and unrealistic in many empirical situations.5 Thirdly, the sampling distribution of the locational measures traditionally employed in the literature is unknown and, therefore, we cannot establish in a conclusive way whether the phenomenon is characterized by significant spatial concentration. In order to provide a solution to these problems, we develop a firm-level distance-based measure of spatial concentration to be included in the survival regression model, thus taking into account the presence of spatial effects in firm survival analysis. Furthermore, unlike the regional-level locational measures, which can only detect the presence of externalities at the regional level, the firm-level measures proposed here also allow us to clearly identify which firms benefit from MAR externalities, testing, for example, whether small firms benefit from them more than big firms. In order to build up a set of variables to capture MAR externalities, we rely on the well-established idea (see e.g. Glaeser et al., 1992) that the degree of
specialization of an industry matters more than its size. The rationale behind this hypothesis is that the degree of specialization can be seen as a proxy for the intensity and density of interaction among firms (Beaudry and Schiffauerova, 2009). In what follows we build up a firm-level distance-based measure of spatial concentration which is able to capture the start-up firm's potential for Marshall externalities generated by incumbents. This measure can be seen as an application of the idea of LISA (see Chapter 2.3) to the standard Ripley's K-function (Chapter 7.1), as introduced by Getis (1984). Basically, a local K-function is a statistical measure allowing us to assess spatial interactions among geo-referenced locations. Indeed, within the context of micro-geographical data identified by maps of point events in two-dimensional space (represented by their longitude/latitude coordinates), Getis' local K-function can be seen as an explorative tool that summarizes the characteristics of a spatial distribution of point events relative to the location of a given point event. In our particular case the event of interest is represented by the presence of start-up firms in a particular location and our modeling framework aims at testing statistically whether a given individual start-up firm is more likely to be localized in a clustering situation. For any given start-up firm i, located in a given geographical area, the local K-function can be defined as follows:

K_i(d) = \lambda^{-1}\, E\left[\sum_{j \ne i} I(d_{ij} \le d)\right]   (10.20)
where the term dij is the Euclidean distance between the ith start-up firm's and the jth incumbent firm's locations, I(dij ≤ d) represents the indicator function such that I = 1 if dij ≤ d and 0 otherwise, and λ represents the mean number of firms per unit area (a parameter called the "spatial intensity"). Therefore, λKi(d) can be interpreted as the expected number of further incumbent firms located up to a distance d from the ith start-up firm. The local K-function quantifies the degree of spatial interaction between the ith start-up firm and all other incumbent firms at each possible distance d, and hence can be exploited to develop a micro-based measure of spatial concentration. Turning now to the inferential aspects, following Getis (1984), a proper unbiased estimator of Ki(d) for a study area with n firms is given by:

\hat{K}_i(d) = \frac{|A|}{n - 1} \sum_{j \ne i} w_{ij}\, I(d_{ij} \le d)   (10.21)

where A is the study area and |A| denotes its surface area. Due to the presence of edge effects arising from the bounded nature of the study area, an adjustment factor, say wij, is introduced, thus avoiding potential biases in the estimates close to the boundary.6 The adjustment function wij represents the reciprocal of the proportion of the surface area of a circle centered on the ith start-up firm's location, passing through the jth incumbent firm's location, which lies within the area A (Boots and Getis, 1988).
As a final step, we use the function expressed in Equation 10.21 in order to obtain a measure of spatial concentration with a clear benchmark value allowing us to assess whether the ith start-up firm is located in an agglomerated industrial area. The most popular strategy in the literature (see e.g. Beaudry and Schiffauerova, 2009) has been to refer to a relative benchmark, in which an industry in a region is considered geographically concentrated (or dispersed) if it is over-represented (or under-represented) within the region with respect to the entire economy. A relative measure allows us to control for the presence of spatial heterogeneity in the study area and hence is able to identify spatial concentration due to genuine spatial interactions amongst economic agents (see e.g. Haaland et al., 1999; Espa et al., 2013). Following these considerations, a firm-level relative measure of spatial concentration for newly established economic activities in the health and pharmaceutical industry can be defined as:

RS_i(d) = \hat{K}_{i,\mathrm{sector}}(d) \,/\, \hat{K}_{i,\mathrm{all}}(d)   (10.22)
where K̂i,sector(d) is the local K-function estimated on the incumbent firms belonging to the same health and pharmaceutical sub-sector of activity as the ith start-up firm and K̂i,all(d) is the local K-function estimated on all incumbent firms of the entire health and pharmaceutical industry. If, at a given distance d, RSi(d) tends to be equal to 1, then the ith start-up firm is located in an area (a circle with radius d) where economic activities are randomly and independently located from each other, implying an absence of spatial interactions. When, at a given distance d, the functional expressed in Equation 10.22 is significantly greater than 1, then the ith start-up firm is located in a cluster with a spatial extension of d where the incumbent firms of its sub-sector of activity are more concentrated than all incumbent firms of the dataset, implying the presence of spatial concentration. For example, a value of RSi(d) = 2 indicates that, among the incumbent firms located within distance d from the ith firm, the expected number of incumbent firms belonging to the ith firm's sub-sector is two times the expected number of incumbent firms of the entire health and pharmaceutical industry. However, when, at a given distance d, RSi(d) is significantly lower than 1, the ith start-up firm is located in a dispersed area, where the incumbent firms of its sub-sector of activity are less concentrated than all incumbent firms of the dataset, implying the presence of spatial dispersion. For example, RSi(d) = 0.5 indicates that, among the incumbent firms located within distance d from the ith start-up firm, the expected number of incumbent firms belonging to the ith start-up firm's sub-sector is half the expected number of incumbent firms of the entire health and pharmaceutical industry. The function expressed in Equation 10.22 represents a relative measure of spatial concentration with the benchmark value (the case of random localization) represented by the spatial distribution of all health and pharmaceutical firms. Hence a specific sub-sector exhibits over-concentration (or over-dispersion) if its spatial distribution is more concentrated (or dispersed) than the spatial distribution of health and pharmaceutical firms as a whole. In order to use this function
as a proper measure of the level of spatial interactions among firms, however, it is necessary that the confounding exogenous factors of spatial heterogeneity (such as land regulation, topography lock-in, proximity to raw materials and land use policies) affect all the considered sub-sectors in the same way. Since all firms belong to the same health and pharmaceutical industry, we can reasonably assume that their locational choices are affected by common unobserved exogenous spatial factors. Therefore, in this empirical case, Equation 10.22 can suitably represent a micro-geographical, firm-level version of the location quotient that can be used to assess the presence of MAR externalities. In order to evaluate the significance of the values of RSi(d), a proper inferential framework needs to be introduced. However, since the exact distribution of RSi(d) is unknown, no exact statistical testing procedure can be adopted and we have to base our conclusions on Monte Carlo simulated confidence envelopes (Besag and Diggle, 1977). In practice, we generate n simulations in each of which the m incumbent firm locations are randomly labelled with the observed m sub-sector of activity "markers". Then, for each simulation, we calculate a different RSi(d) function. We are then able to obtain the approximate n/(n + 1) × 100% confidence envelopes from the highest and lowest values of the RSi(d) functions calculated from the n simulations under the null hypothesis. Finally, if the observed RSi(d) falls, at the given distance d, outside the envelopes – above or below – this will indicate a significant departure from the null hypothesis of absence of spatial interactions (RSi(d) = 1).
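The sketch below shows, for a single hypothetical start-up firm, how RSi(d) and its random-labelling envelope could be computed; the edge-correction factor wij is omitted for brevity and all data are simulated, so this is only an illustration of the logic of Equation 10.22 and of the Monte Carlo test, not the authors' implementation.

## A minimal sketch (hypothetical data, no edge correction): the relative
## measure RS_i(d) of Equation 10.22 for one start-up firm, with a
## random-labelling Monte Carlo envelope.
set.seed(1)
m       <- 400
inc_xy  <- matrix(runif(2 * m, 0, 100), ncol = 2)       # incumbent coordinates (km)
inc_sec <- sample(c("A", "B", "C"), m, replace = TRUE)  # sub-sector labels
new_xy  <- c(50, 50)                                    # the start-up firm's location
new_sec <- "A"                                          # its sub-sector

rs <- function(labels, d) {
  dist <- sqrt(colSums((t(inc_xy) - new_xy)^2))         # distances start-up -> incumbents
  same <- sum(dist <= d & labels == new_sec)            # same sub-sector incumbents within d
  all  <- sum(dist <= d)                                # all incumbents within d
  (same / mean(labels == new_sec)) / all                # ratio of the two local K-functions
}

d    <- 10
obs  <- rs(inc_sec, d)
sims <- replicate(99, rs(sample(inc_sec), d))           # random labelling of sub-sectors
c(RS = obs, lower = min(sims), upper = max(sims))       # outside the envelope => significant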
10.3.3.3 Definition of the control variables

We considered three establishment-specific control variables: the number of employees in each firm, the legal status of the firm and the geographical area in which it was located. More specifically, the number of employees is measured as the annual mean number of employees classified into three categories: (i) small firms with only 1 employee (2,496 firms, representing 77.6% of all the firms in the database), (ii) medium-sized firms with between 2 and 5 employees (681 firms, representing 21.2% of the dataset) and (iii) large firms with more than 5 employees (40 firms, representing 1.2% of the dataset). In terms of legal status the 3,217 firms belonged to three main categories: sole trader (2,454 firms, representing 75.4% of the dataset), partnerships (365 firms, representing 11.3% of the dataset) and companies (412 firms, representing 12.8% of the dataset). Finally the variable defining the geographical area consisted of four categories: north-west (798 firms, representing 24.8% of the dataset), north-east (350 firms, representing 10.9% of the dataset), center (901 firms, representing 28.0% of the dataset) and south and islands (1,168 firms, 36.3% of the dataset). According to the literature on spatial externalities (see e.g. Beaudry and Schiffauerova, 2009), other firm-level control covariates (such as proxies for the level of innovation and labor skills) should be taken into consideration but, because of data availability constraints, cannot be included in the model. However, since RSi(d) detects spatial interactions between firms belonging to the same narrow
sub-sectors of the health and pharmaceutical industry, with reasonably similar characteristics, these unobserved variables should not exert a relevant confounding effect. We also considered three region-specific control variables (defined at the second level of the European Nomenclature of Units for Territorial Statistics), namely the regional unemployment rate, the proportion of the regional population between 24 and 54 years old and the rural land price, in order to control for regional heterogeneity.
10.3.3.4 Empirical results

Table 10.4 contains the Kaplan–Meier estimates of the survival probability of the 3,217 start-up firms included in our database, observed in the period 2004 to 2008. The table shows that one year after being established around 94% of firms still survived, while after four years around 1 firm out of 4 had failed. In the end, after five years of observation, the estimated survival probability is around 72%. A graphical representation is given in Figure 10.4, which shows that during the first five years of activity the propensity to fail tends to be constant over time. Among the 3,217 start-up firms, 415 failed in the period 2004 to 2008. Since the sum of the survival times of all start-up firms is 6,444 years, a rough estimate of the annual hazard of failure is given by the ratio of these two figures: 415/6,444 = 0.0644 firms/year. In epidemiology this is called the "raw (or crude) incidence rate". It means that if we observe 10,000 firms for one year, we expect that 644 will fail during that period. One of the most striking features of the data reported in Table 10.4 is that the incidence rate calculated within each year of follow-up (the age-specific incidence rate) is fairly constant, so that the crude rate (0.0644) in this case can be considered a good synthesis of the data. When the incidence rate is constant we have the simplest and best-known survival model: the exponential model. Such a model is not suitable, for example, in many medical studies, since for humans the incidence rate cannot be considered constant across years. However, it can be a suitable model in other contexts, such as engineering or physics. If the exponential model is the correct one, it is possible to estimate the mean survival time by taking the reciprocal of the incidence rate; in our case we obtain about 15.5 years, a value that lies well outside the follow-up time, so that it must be considered an extrapolation.

Table 10.4 Kaplan–Meier estimates of the probability of survival of the 3,217 pharmaceutical and medical-device manufacturing start-up firms in Italy, 2004–2008

Time   Firms at risk   Firm exits   Survival probability   Lower 95% CI   Upper 95% CI   Incidence rate
1      3,217           198          0.938                  0.930          0.947          0.0615
2      1,595           109          0.874                  0.860          0.888          0.0683
3        917            64          0.813                  0.794          0.833          0.0698
4        496            32          0.761                  0.736          0.787          0.0645
5        219            12          0.719                  0.687          0.753          0.0548
Figure 10.4 Kaplan–Meier survival curves (survival probability against years from entry) for the 3,217 pharmaceutical and medical-device manufacturing start-up firms in Italy, 2004–2008.
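For readers who wish to reproduce this kind of calculation, a minimal sketch using the R package survival (which is not the package used elsewhere in this book) could look as follows; the data frame firms and its variables years (observed survival time, in years) and failed (1 if the firm exited, 0 if censored) are hypothetical names:
> library(survival)
> km <- survfit(Surv(years, failed) ~ 1, data = firms)
> summary(km)     # Kaplan-Meier survival estimates with 95% confidence intervals
> plot(km, xlab = "Years from entry", ylab = "Survival probability")
> # crude incidence rate: total exits divided by total firm-years at risk
> sum(firms$failed) / sum(firms$years)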
As explained in Section 10.4.2, in our empirical circumstances, due to the discrete nature of the time observations, the best specification of the hazard of firm exit appears to be the complementary log-log model with frailty. However, in order to assess the robustness of our results to model specification, we also consider three alternative models: (i) the exponential model, (ii) the Cox proportional hazards model and (iii) the complementary log-log model without frailty. In all four models we included the micro-founded spatial covariate of interest (the measure RSi(d)) classified into three categories, namely "dispersion" (when RSi(d) is significantly lower than 1, the baseline category), "independence" (when RSi(d) is not significantly different from 1) and "concentration" (when RSi(d) is significantly greater than 1). In order to avoid the imposition of a unique arbitrarily chosen spatial scale, and hence to control for MAUP effects, the models have been estimated using a large set of different distances d, ranging from 5 to 100 kilometers, at which RSi(d) has been computed. Table 10.5 shows the estimates of the complementary log-log model with and without frailty for two relevant
distances, 10 and 60 kilometers. Figure 10.5 summarizes the values of the estimated coefficients of RSi(d) for all models and distances. Table 10.5 indicates that the coefficient associated with "RSi(d): independence" is positive and highly significant in all cases, while the coefficient associated with "RSi(d): concentration" is never significant. According to how RSi(d) has been formalized, this implies that start-up firms located in a dispersed area (i.e. relatively far from the incumbent firms belonging to the same sub-sector) or in a cluster (i.e. relatively close to the incumbent firms belonging to the same sub-sector) tend to have a lower exit hazard than firms located independently of the incumbents. In other words, geographical proximity to incumbent firms, on the one hand, and spatial differentiation, on the other, decrease the risk of failure, thus highlighting the presence of both negative and positive MAR externalities. This evidence suggests that in the pharmaceutical and medical-device industry there are both competitive and cooperative behaviors amongst economic agents.
Figure 10.5 Estimated coefficients of the variable RSi(d) ("independence vs dispersion" and "concentration vs dispersion", on the scale of the logarithm of the hazard ratio) at distances from 5 to 100 kilometers, for the exponential model, the Cox model, the complementary log-log model and the complementary log-log mixed model.
Table 10.5 Complementary log-log model estimates (logarithm of hazard ratios) for the 3,217 pharmaceutical and medical-device manufacturing start-up firms in Italy, 2004–2008

                                   Cloglog model without frailty            Cloglog model with frailty
                                   d = 10 km          d = 60 km             d = 10 km          d = 60 km
RSi(d): dispersion                 Reference          Reference             Reference          Reference
RSi(d): independence               0.352*** (0.117)   0.464*** (0.131)      0.365*** (0.116)   0.474*** (0.128)
RSi(d): concentration              0.363 (0.316)      0.253 (0.190)         0.390 (0.315)      0.254 (0.190)
Employees: 1                       Reference          Reference             Reference          Reference
Employees: 2–5                     −0.137 (0.132)     −0.159 (0.132)        −0.147 (0.132)     −0.166 (0.132)
Employees: >5                      −1.160 (0.720)     −1.095 (0.717)        −1.178 (0.720)     −1.107 (0.717)
Legal status: sole trader          Reference          Reference             Reference          Reference
Legal status: partnership          0.364** (0.179)    0.380** (0.179)       0.360** (0.179)    0.380** (0.179)
Legal status: company              0.519*** (0.200)   0.487** (0.199)       0.530*** (0.200)   0.500** (0.199)
Region: north-west                 Reference          Reference             –                  –
Region: center                     0.061 (0.318)      0.152 (0.319)         –                  –
Region: south                      0.278 (0.242)      0.279 (0.241)         –                  –
Region: north-east                 0.363** (0.177)    0.380** (0.177)       –                  –
Population aged 25–54              10.651** (4.865)   10.476** (4.831)      10.778** (4.871)   10.395** (4.838)
Unemployment rate                  0.301 (0.309)      0.165 (0.313)         0.128 (0.150)      0.071 (0.152)
Rural land price                   −0.181 (0.826)     −0.280 (0.829)        −0.447 (0.514)     −0.568 (0.516)
Log incidence rate: first year
Log incidence rate: second year
Log incidence rate: third year
Log incidence rate: fourth year
Log incidence rate: fifth year
ρ                                  –                  –                     0.004              0.004
Wald χ2                            2970.42***         2963.94***            2776.39***         2779.61***
*** Significant at 1% level; ** significant at 5%; * significant at 10%; standard errors are in parentheses.
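The complementary log-log specification with a region-level frailty can be approximated, for instance, as a generalized linear mixed model with a cloglog link; the following is only a sketch under assumed names (a hypothetical firm-year data frame firmyears, an exit indicator exit and covariates mirroring those in Table 10.5) and is not the estimation code actually used for the results above:
> library(lme4)
> # firmyears: one row per firm and year at risk (hypothetical layout);
> # exit = 1 in the year of failure and 0 otherwise
> cll <- glmer(exit ~ factor(year) + rs_class + size + legal +
+              pop2554 + unemp + landprice + (1 | region),
+              data = firmyears, family = binomial(link = "cloglog"))
> summary(cll)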
All estimated coefficients associated with the control variables have the expected sign. In particular, they show that the risk of failure varies with legal status (partnerships and companies display a higher exit hazard than sole traders) and that there is heterogeneity between geographical areas. Interestingly, the coefficients of the incidence rates at the various years from the firm's entry have very similar values, indicating that in the first five years of activity the risk of failure of start-up firms is substantially constant over time. Finally, the parameter ρ, which accounts for the presence of a frailty effect, is not statistically significant. This means that the spatial variability and heterogeneity of the hazard rates are well captured by the variable RSi(d). Figure 10.5 displays the geographical dimension of MAR externalities by showing the way in which the coefficients associated with "RSi(d): independence" and "RSi(d): concentration" are affected by distance. The important evidence emerging from these results is that the way in which MAR externalities exert their effects on start-up firm survival in the first five years of activity strongly depends on the spatial dimension of the industrial site where the firms are located. Indeed, it can be seen that the two coefficients display a different magnitude depending on the value of the distance d. This implies that when we try to estimate the effect of MAR externalities on firm survival using region-level locational measures (thus referring to a fixed, arbitrarily defined spatial scale), what we estimate is the combined result of different effects exerting their influence at different spatial scales. This result confirms the value of using firm-level distance-based measures, such as the one proposed here, to better assess MAR externalities in their whole complexity. In particular, all the graphs in Figure 10.5 show that the relative advantage of dispersion over independence is positively related to the spatial scale, since the coefficient for "RSi(d): independence" increases steadily with distance. In contrast, the relative advantage of dispersion over concentration is negatively related to the spatial dimension of the industrial site, since the coefficient for "RSi(d): concentration" tends to decrease with distance. Therefore, Figure 10.5 unveils the action of Krugman's centrifugal and centripetal forces (Krugman, 1991a). Specifically, it shows that centrifugal factors dominate at large distances, making it advantageous to locate as far as possible from incumbent firms. On the other hand, centripetal forces are relevant at short distances, making it advantageous to locate close to incumbent firms. The opposite behaviors of the two lines in Figure 10.5 show that the choice of location to reduce the risk of failure and increase the probability of survival of a new pharmaceutical or medical-device manufacturing firm in Italy should take into consideration the distance from incumbent firms. The economic agent should choose the optimal distance from incumbents that avoids their competition but, at the same time, does not limit the opportunity to exploit MAR externalities that cannot occur at large distances. Figure 10.5 also shows that the four different models considered produce substantially the same results. There is therefore strong evidence that, at least in the first five years, the underlying model explaining the survival of start-up firms
can be considered to belong to the exponential family. This evidence is further corroborated by the Grambsch and Therneau (1994) proportional hazards test. Indeed, at all distances, the global chi-square test leads to the non-rejection of the proportional hazards assumption of the Cox regression. It is thus possible to conclude that, at least within the first five years of activity, experience does not help start-up firms to survive in the pharmaceutical and medical-device market.
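As an illustration of how the Grambsch–Therneau check can be carried out in R, here is a minimal sketch based on the survival package (data frame and variable names are hypothetical; this is not the code used to produce the results above):
> library(survival)
> # firms: hypothetical firm-level data with survival time 'years', exit indicator 'failed'
> # and the control variables discussed in the text
> cox <- coxph(Surv(years, failed) ~ rs_class + size + legal + area, data = firms)
> cox.zph(cox)    # Grambsch-Therneau test of the proportional hazards assumption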
10.4 Conclusion In this chapter we have discussed some pioneering models that try to capture simultaneously both the location choices of individual agents and their spatial behavior. In this sense we have tried to summarize the message of the whole book by showing how the spatial econometric methods presented in Part II can be integrated with the point pattern analysis literature of Part III to form a unique body of theory. The literature in this area is still relatively scarce and scattered, but it is easy to forecast that it will grow quickly in the near future, driven by the rapidly increasing availability of reliable micro-data and by the growing demand for more realistic testable behavioral models which involve space and spatial relationships. The methods and applications presented here are only a small example of the vast possibilities offered for the analysis of the microeconomic behavior of individual economic agents.
Notes
1 G.W. Leibniz, Nouveaux essais sur l'entendement humain (1703), bk 4, ch. 16.
2 The identification of these two kinds of economic activities has been made referring to the OECD/Eurostat classification scheme (NACE Rev 2).
3 Notice that in this section symbols are used with a different meaning with respect to those used elsewhere in the book.
4 This section draws heavily on Arbia et al., 2015b.
5 For a comprehensive discussion of the weaknesses of the regional-level location measures, see Duranton and Overman, 2005; Combes et al., 2008.
6 Firms located near the boundary of the study area may be close to unobserved firms located outside the study area. Therefore, for these firms, it may not be possible to count the actual number of further incumbent firms located up to a distance d. An estimator of the local K-function that does not account for this circumstance would tend to underestimate the actual degree of spatial interaction and hence would lead to biased estimates. For further details about the estimation biases generated by the presence of boundary limits, see Diggle, 2003.
Appendices
Appendix 1: Some publicly available spatial datasets The package spdep contains some datasets that are useful for additional practice. In particular, to load the dataset baltimore, type: > data(baltimore)
To visualize its content, type the command: > str(baltimore)
which shows the following variables: ‘data.frame’: 211 obs. of 17 variables: $ STATION: int 1 2 3 4 5 6 7 8 9 10 … $ PRICE: num 47 113 165 104.3 62.5 … $ NROOM: num 4 7 7 7 7 6 6 8 6 7 … $ DWELL: num 0 1 1 1 1 1 1 1 1 1 … $ NBATH: num 1 2.5 2.5 2.5 1.5 2.5 2.5 1.5 1 2.5 … $ PATIO: num 0 1 1 1 1 1 1 1 1 1 … $ FIREPL: num 0 1 1 1 1 1 1 0 1 1 … $ AC: num 0 1 0 1 0 0 1 0 1 1 … $ BMENT: num 2 2 3 2 2 3 3 0 3 3 … $ NSTOR: num 3 2 2 2 2 3 1 3 2 2 … $ GAR: num 0 2 2 2 0 1 2 0 0 2 … $ AGE: num 148 9 23 5 19 20 20 22 22 4 … $ CITCOU: num 0 1 1 1 1 1 1 1 1 1 … $ LOTSZ: num 5.7 279.5 70.6 174.6 107.8 … $ SQFT: num 11.2 28.9 30.6 26.1 22 … $ X: num 907 922 920 923 918 900 918 907 918 897 … $ Y: num 534 574 581 578 574 577 576 576 562 576 …
The last two variables are the spatial coordinates. To load the dataset boston, type: > data(boston)
which loads three objects: (i) boston.c, (ii) boston.soi and (iii) boston.utm.
In particular, the object (boston.c) contains 506 observations of the following 20 variables: $ TOWN: Factor w/ 92 levels "Arlington","Ashland", … : 54 77 77 46 46 46 69 69 69 69 … $ TOWNNO: int 0 1 1 2 2 2 3 3 3 3 … $ TRACT: int 2011 2021 2022 2031 2032 2033 2041 2042 2043 2044 … $ LON: num -71 -71 -70.9 -70.9 -70.9 … $ LAT: num 42.3 42.3 42.3 42.3 42.3 … $ MEDV: num 24 21.6 34.7 33.4 36.2 28.7 22.9 27.1 16.5 18.9 … $ CMEDV: num 24 21.6 34.7 33.4 36.2 28.7 22.9 22.1 16.5 18.9 … $ CRIM: num 0.00632 0.02731 0.02729 0.03237 0.06905 … $ ZN: num 18 0 0 0 0 0 12.5 12.5 12.5 12.5 … $ INDUS: num 2.31 7.07 7.07 2.18 2.18 2.18 7.87 7.87 7.87 7.87 … $ CHAS: Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 1 1 … $ NOX: num 0.538 0.469 0.469 0.458 0.458 0.458 0.524 0.524 0.524 0.524 … $ RM: num 6.58 6.42 7.18 7 7.15 … $ AGE: num 65.2 78.9 61.1 45.8 54.2 58.7 66.6 96.1 100 85.9 … $ DIS: num 4.09 4.97 4.97 6.06 6.06 … $ RAD: int 1 2 2 3 3 3 5 5 5 5 … $ TAX: int 296 242 242 222 222 222 311 311 311 311 … $ PTRATIO: num 15.3 17.8 17.8 18.7 18.7 18.7 15.2 15.2 15.2 15.2 … $ B: num 397 397 393 395 397 … $ LSTAT: num 4.98 9.14 4.03 2.94 5.33 …
including the longitude (LON) and latitude (LAT) of the point data that can be used to create the W matrix. Furthermore, the object boston.utm contains the universal transverse Mercator (UTM) coordinates of the points that can also be used to build up the weight matrix.
Appendix 2: Creation of a W matrix and preliminary computations Most of the R procedures that will be presented are contained in the package (spdep). When needed we will present other libraries. To install the package, type the command:
> install.packages("spdep")
for the first time and then at the beginning of each new session, call it back by typing: > library(spdep)
In order to create a W matrix, the starting point is always a system of point coordinates. If we have, say, the s1 and s2 coordinates of a map of n points, first of all we need to combine them together into an n-by-2 matrix with the command: > coordinates <- cbind(s1, s2) Starting from this matrix, spdep allows us to build distance-based neighbours (the object nbnear below) or k-nearest-neighbour sets (the objects knn and nbk) and to convert them into a spatial weights object W of class listw; the full sequence of commands is sketched below. A listw object can then be expressed as a standard matrix as: > listw2mat(W)
or, vice versa, a matrix of spatial weights, say M, can be converted into a listw object as: > mat2listw(M)
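For example, a possible sequence of commands to construct the objects nbnear, knn, nbk and W mentioned above is the following (the 0–10 distance band and the choice of k = 4 nearest neighbours are purely illustrative):
> coordinates <- cbind(s1, s2)                # n-by-2 matrix of point coordinates
> nbnear <- dnearneigh(coordinates, 0, 10)    # neighbours within a distance band (illustrative bounds)
> knn <- knearneigh(coordinates, k = 4)       # k-nearest-neighbour object (illustrative k)
> nbk <- knn2nb(knn)                          # convert it into a neighbours list of class nb
> W <- nb2listw(nbk, style = "W")             # row-standardized spatial weights of class listw
> listw2mat(W)                                # express the listw object as a standard matrix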
To calculate an inverse distance weight matrix, we proceed in a different way. First of all, we have to generate a matrix of distances between all pairs of points (for instance through the command D <- as.matrix(dist(coordinates))), then take the reciprocal of each element and set the diagonal terms to zero: > W <- 1/D > for(i in 1:dim(W)[1]) {W[i,i] = 0}
To calculate an inverse distance matrix with a threshold, we proceed as before to generate the inverse distance matrix W and then set to zero all of its elements corresponding to pairs of points that are further apart than the chosen threshold. The resulting matrix can be converted into a listw object (say, again called W) with mat2listw and used, for instance, to compute spatially lagged variables such as WX. Assuming that an ordinary least squares regression has been estimated and stored in an object called model1, we can then test its residuals for spatial autocorrelation by typing: > lm.morantest(model1, W)
which uses a W matrix contained in the object W. By default, the randomization option and the one-sided test are considered for the hypothesis testing. To change the default, introduce the options: > lm.morantest(model1, W, randomization=FALSE, alternative="two.sided")
which considers the hypothesis of normality and a two-sided alternative hypothesis of positive or negative spatial autocorrelation.
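For completeness, a minimal sketch of the whole sequence (the variables y, x and z and the listw object W follow the notation used above; the data are assumed to be already loaded in the R session):
> model1 <- lm(y ~ x + z)      # ordinary least squares regression
> lm.morantest(model1, W)      # Moran's I test on the OLS residuals (W must be a listw object)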
Appendix 3: Spatial linear models All the R commands related to the procedures described in this chapter are contained in the libraries (spdep) and spatialreg. We assume that we want to estimate the model y = β0 + β1x + β2z + ε (possibly with the addition of a spatial lag or a spatial error or both) using a weight matrix contained in the object W. We also assume that the observations of the variables y, x and z are stored in a file called filename. If the data are stored in the active R session, all the options including this specification can be omitted. The parameters of a purely autoregressive model and of the other spatial specifications considered in the text (stored below in objects called model0 to model7) can be estimated with the spatialreg functions sketched at the end of this appendix. The Lagrange multiplier specification tests can then be obtained by typing: > lm.LMtests(model1, listw=W, test="all")
which produces the LM tests with the spatial error or the spatial lag model as alternatives (LM SEM and LM LAG) and, in addition, the robust tests RLM SEM and RLM LAG discussed in the text. For the calculation of the impact measures, the spdep and spatialreg libraries contain a command for impacts in a spatial lag model. If we call such a model, say, model3, we can obtain the three impact measures (direct, indirect and total) by typing the command: > impacts(model3, listw=W)
When an estimation command requires an initial value for the spatial autoregressive parameter (say, rho0), rho0 can be any value such that |rho0| < 1. The robustness of the results with respect to different initial values should be tested in any practical circumstance.
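A minimal sketch of the main estimation functions of the spatialreg library is the following (the correspondence with the model numbers used above is only indicative):
> library(spdep)
> library(spatialreg)
> model0 <- lagsarlm(y ~ 1, listw = W)          # purely spatially autoregressive model
> model1 <- lm(y ~ x + z)                       # ordinary least squares benchmark
> model3 <- lagsarlm(y ~ x + z, listw = W)      # spatial lag model
> model4 <- errorsarlm(y ~ x + z, listw = W)    # spatial error model
> model5 <- sacsarlm(y ~ x + z, listw = W)      # model with both spatial lag and spatial error
> model6 <- lmSLX(y ~ x + z, listw = W)         # SLX model with spatially lagged regressors
> lm.LMtests(model1, listw = W, test = "all")   # LM and robust LM specification tests
> impacts(model3, listw = W)                    # direct, indirect and total impacts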
To deal with spatial point patterns we use the package (spatstat). A point pattern object, say ptsdata, is created from two vectors of point coordinates with the function ppp, as in the sketch below, and can then be displayed with the command plot(ptsdata).
The ppp function also allows us to deal with other types of data for the study area, such as pixel images. Type ?ppp to get further details.
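For instance, assuming that the coordinates are stored in the vectors xcoord and ycoord and that the study area is the unit square (an illustrative choice):
> library(spatstat)
> ptsdata <- ppp(x = xcoord, y = ycoord, window = owin(c(0, 1), c(0, 1)))
> summary(ptsdata)   # basic description of the point pattern
> plot(ptsdata)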
Appendix 6.2: Simulating point patterns The spatstat package contains functions to simulate the main point processes, such as the homogeneous and inhomogeneous Poisson processes, Cox processes, Poisson cluster processes, Matérn processes, simple sequential inhibition processes and Strauss processes.
Appendix 6.2.1: Homogeneous Poisson processes A homogeneous Poisson process with given constant first-order intensity in a given study area can be simulated using the function rpoispp, which requires at least the arguments lambda, that is the value of the first-order intensity, and win, that is the study area, which can be specified as in the ppp function. The result is a ppp object. For example, a realization of a homogeneous Poisson process with intensity 50 in a unit square can be obtained with: > csrpattern <- rpoispp(lambda=50, win=owin(c(0,1), c(0,1))) > plot(csrpattern)
Appendix 6.2.2: Inhomogeneous Poisson processes The function rpoispp can be used also to simulate inhomogeneous Poisson processes by making the argument lambda equal to a function or a pixel image, instead of a constant value, that provides the differing values of the first-order intensity. For example, if we type: > lambdaFun <- function(x, y) {x^2 + y^2} > inhompat <- rpoispp(lambda=lambdaFun, win=owin(c(0,4), c(0,4))) > plot(inhompat) we obtain a partial realization of an inhomogeneous Poisson process on a square of side 4 units with spatially varying intensity function λ(x1, x2) = x1² + x2².
Appendix 6.2.3: Cox processes The spatstat function rLGCP allows us to simulate the log-Gaussian Cox process. It makes use of at least three arguments: mu, var and model, which are the mean, variance and covariance function, respectively, of the underlying Gaussian process for the intensity. In order to specify the exponential covariance function, model has to be set equal to "exp"; otherwise, for instance, model="matern" gives the Whittle–Matérn specification. To see the complete list of the available functional forms for the covariance function type ?rLGCP. Depending on the choice of the covariance function, it is also possible to use the argument scale. For example, the following code allows us to obtain a partial realization of a log-Gaussian Cox process on a unit square with mean µ = 4, variance σ² = 0.25 and correlation function ρ(d) = exp{−d/0.2}: > LGCPpattern <- rLGCP(model="exp", mu=4, var=0.25, scale=0.2) > plot(LGCPpattern)
Appendix 6.2.4: Poisson cluster processes To simulate a Poisson cluster process with radially symmetric normal dispersion of followers, that is a Thomas model, the spatstat package provides us with the function rThomas, whose main arguments, kappa, scale and mu, indicate, respectively, the process parameters ρ, σ and µ. Therefore, for example, to obtain a realization of a Thomas process in a unit square with ρ = 25, σ = 0.025 and µ = 4, we can type: > ThomasPattern <- rThomas(kappa=25, scale=0.025, mu=4) > plot(ThomasPattern)
In a very similar way, to simulate a Poisson cluster process with uniform dispersion of followers, that is a Matérn cluster process, we can use the function rMatClust. To obtain a realization of a Matérn cluster process in a unit square with ρ = 25, R = 0.025 and µ = 4, we can type: > MatClustPattern <- rMatClust(kappa=25, scale=0.025, mu=4) > plot(MatClustPattern)
Appendix 6.2.5: Regular processes Spatstat allows us to simulate the main regular point processes. First of all, the spatstat functions rMaternI and rMaternII generate random point patterns according to the Matérn model I and Matérn model II inhibition processes, respectively. They both require, at least, the values for the intensity and inhibition distance parameters, represented by the arguments kappa and r, respectively. For example, regular point patterns with intensity 50 and inhibition distance 0.08 can be simulated with the commands sketched below. Finally, the function rStrauss generates realizations of the Strauss process.
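For instance (the Strauss parameters beta, gamma and R are chosen purely for illustration):
> MatPatternI <- rMaternI(kappa = 50, r = 0.08)     # Matern model I inhibition process
> plot(MatPatternI)
> MatPatternII <- rMaternII(kappa = 50, r = 0.08)   # Matern model II inhibition process
> plot(MatPatternII)
> StrPattern <- rStrauss(beta = 50, gamma = 0.2, R = 0.08)   # Strauss process (illustrative parameters)
> plot(StrPattern)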
Appendix 6.3: Quadrat-based analysis The fundamental spatstat function to perform quadrat count analysis of a spatial point pattern is the function quadratcount, which subdivides the study area of
a point pattern into quadrats and provides the count for each quadrat. Its essential inputs are the point pattern of interest X, which has to be in the form of an object of class ppp, and the pair of scalars nx and ny that define the nx-by-ny grid of quadrats. The output of the function is an object of class quadratcount, which essentially consists of an nx-by-ny contingency table whose elements are the quadrat counts. For example, to get the 4-by-4 grid of quadrat counts for the simulated point pattern data simdat, type: > QC <- quadratcount(X=simdat, nx=4, ny=4) > QC
              x
y              [0,2.5) [2.5,5) [5,7.5) [7.5,10]
  [7.5,10]          15      15      15        8
  [5,7.5)            8      19      17       10
  [2.5,5)            8      11      12        8
  [0,2.5)            8      15      17       11
Various functions dedicated to the class quadratcount are available. Among others, plot provides a graphical representation of quadrat counts, intensity computes the quadrat-based intensity and quadrat.test performs the quadrat-based chi-squared CSR test: > plot(QC) > intensity(QC) > quadrat.test(QC)
Appendix 6.4: Clark–Evans test The CSR test based on the Clark–Evans index can be performed with the function clarkevans.test, which simply requires the user to indicate the point pattern under study, X, as a ppp object. Among other options, the function also allows us, through the argument alternative, to choose the type of test: "two.sided", for the two-tailed test; "greater", for the upper-tailed test; and "less", for the lower-tailed test. For example, to perform the two-tailed Clark–Evans test for the simdat point pattern, type:
> clarkevans.test(X=simdat, alternative="two.sided")
Clark-Evans test
No edge correction
Z-test
data: simdat
R = 1.1175, p-value = 0.001603
alternative hypothesis: two-sided
Appendix 7: Models of the spatial location of individuals Appendix 7.1: K-function-based CSR test To perform a CSR test based on the K-function we can use the R function envelope from the (spatstat) package. This is a general command to compute simulated confidence envelopes of a summary function according to a given generating point process. By default, if the user does not specify which summary function should be considered and which point process should be simulated, envelope uses the K-function and the homogeneous Poisson process. As an example, let us see how to conduct the CSR test for a simulated point pattern dataset. First of all, we generate the point pattern ptsdata within a unit square study area and according to a homogeneous Poisson process with λ = 100; the simulated envelopes are then computed and stored in an object called CSRbands, which can be plotted with: > plot(CSRbands, fmla=sqrt(./pi) ~ r)
where the option fmla allows us to specify which transformation of the function has to be plotted. The string sqrt(./pi) ~ r indicates Besag's L transformation. The full sequence of commands is sketched below.
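For instance (nsim = 99 is an illustrative number of simulations):
> library(spatstat)
> set.seed(1234)
> ptsdata <- rpoispp(lambda = 100)                       # homogeneous Poisson pattern on the unit square
> CSRbands <- envelope(ptsdata, fun = Kest, nsim = 99)   # simulated envelopes of the K-function under CSR
> plot(CSRbands, fmla = sqrt(./pi) ~ r)                  # Besag's L transformation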
Appendix 7.2: Point process parameters estimation by the method of minimum contrast The spatstat functions thomas.estK, matclust.estK and lgcp.estK apply the method of minimum contrast using the K-function to estimate the parameters of the Thomas process, the Matérn cluster process and the log-Gaussian Cox process, respectively. Their usage is similar and, in its simpler form, only requires the user to indicate the dataset, as a ppp class object, to be fitted. For example, to fit the three processes to the simulated point pattern data simdat, we can use the commands sketched below.
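For instance (simdat is a point pattern dataset distributed with spatstat; all three fitting functions only need the ppp object as their first argument):
> library(spatstat)
> plot(simdat)
> fitThomas <- thomas.estK(simdat)        # Thomas process fitted by minimum contrast
> fitMatClust <- matclust.estK(simdat)    # Matern cluster process
> fitLGCP <- lgcp.estK(simdat)            # log-Gaussian Cox process
> fitThomas                               # print the estimated parameters (same for the other fits)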
To perform the CSR test based on the inhomogeneous K-function (Kinhom) with the first-order intensity estimated nonparametrically, a kernel estimate of the intensity (stored in an object called lambda) can be passed to the envelope function, producing an object KinhomBands that can be plotted, as before, with plot(KinhomBands, fmla=sqrt(./pi) ~ r); a sketch of these commands is given below.
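For instance, denoting by X the point pattern under study (the number of simulations is again illustrative):
> library(spatstat)
> lambda <- density(X, sigma = bw.diggle(X))           # kernel estimate of the first-order intensity
> KinhomBands <- envelope(X, fun = Kinhom,
+                         simulate = expression(rpoispp(lambda)), nsim = 99)
> plot(KinhomBands, fmla = sqrt(./pi) ~ r)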
Alternatively, in order to perform the Kinhom-function-based test with the first-order intensity function estimated by a parametric regression model, the function ppm can be used. For example, to estimate λ(x) with a model that assumes that the intensity depends on a quadratic trend in the spatial coordinates, we may run, for instance: > fit <- ppm(demopat2 ~ polynom(x, y, 2)) > lambda <- predict(fit) and then pass the estimated intensity to the envelope function as before. Simulated confidence envelopes for other distance-based measures of spatial concentration can be computed with dedicated functions whose results, stored here in the objects Kd and M, are displayed with: > plot(Kd) and > plot(M)
where, quite intuitively, the options NumberOfSimulations, Alpha and ReferenceType (used, for instance, by the envelope functions of the library (dbmss)) refer, respectively, to the number of simulations for the confidence envelopes, the level of significance and the typology of points for which we want to analyze the relative spatial concentration.
Appendix 9: Space–time models Appendix 9.1: Space–time K-function To estimate the Diggle, Chetwynd, Häggkvist and Morris space–time K-function and perform the associated test of space–time interactions, it is possible to use functions contained in the library (splancs) (Rowlingson and Diggle, 2015). To illustrate their use, we refer to an artificial, simulated, spatio-temporal point pattern dataset. First of all, we generate the spatial coordinates of the dataset according to a homogeneous Poisson process using the spatstat function rpoispp (here, for instance, with intensity 100): > library(spatstat) > points <- rpoispp(100) > plot(points)
Secondly, we simulate the associated time labels according to a Poisson distribution with mean equal to 10 using the function rpois, drawing one value for each point of the pattern: > times <- rpois(n=points$n, lambda=10) > hist(times)
The space–time K-function can be estimated using the splancs function stkhat, which requires the following inputs:
– pts, the spatial coordinates of the points in the form of a matrix where the rows identify the observations and the columns represent the horizontal and vertical coordinates, respectively;
– times, the vector of time labels associated to the points;
– poly, the study area of the point pattern as a matrix containing the spatial coordinates of its vertices;
– tlimits, the vector specifying the range of the time interval to be considered;
– s, the vector of spatial distances to be used for the estimation;
– tm, the vector of temporal distances to be used for the estimation.
If we consider a unit square as the study area, we can estimate the space–time K-function for the points point pattern with the times time labels by running the code sketched below.
The presence of space–time interaction can then be assessed with a Monte Carlo test, displaying the simulated values of the test statistic in a histogram and marking the observed value with the command abline(v=STtest$t0), as in the following sketch.
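For instance, assuming a unit square study area (the distance vectors are illustrative, and the Monte Carlo test is performed here with the splancs function stmctest, whose exact argument names should be checked in the package documentation):
> library(splancs)
> ptsData <- cbind(points$x, points$y)          # spatial coordinates as a two-column matrix
> area <- matrix(c(0, 0, 1, 0, 1, 1, 0, 1), ncol = 2, byrow = TRUE)   # vertices of the unit square
> s <- seq(0.01, 0.25, by = 0.01)               # spatial distances (illustrative)
> tm <- 1:10                                    # temporal distances (illustrative)
> STK <- stkhat(pts = ptsData, times = times, poly = area,
+               tlimits = range(times), s = s, tm = tm)
> STtest <- stmctest(pts = ptsData, times = times, poly = area,
+                    tlimits = range(times), s = s, tt = tm, nsim = 99)
> hist(STtest$t)                                # simulated values of the space-time test statistic
> abline(v = STtest$t0)                         # observed value of the statistic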
Appendix 9.2: Gabriel and Diggle's STIK-function Gabriel and Diggle's STIK-function can be estimated using the function STIKhat contained in the library (stpp). This function is characterized by essentially the same inputs as the function stkhat, but it additionally requires the user to provide lambda, a vector of values of the estimated first-order intensity function evaluated at the points of the pattern. For the purpose of illustration, let us refer, as in the previous paragraph, to a simulated example dataset, combining the simulated locations (points), the associated time labels (times) and the estimated intensity values (mhat) and collecting the space–time coordinates into a three-column object called stppData, as in the following sketch.
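For instance (the spatial kernel estimate is used here purely as an illustrative proxy for the space–time intensity at the data points, and the distance vectors are again illustrative):
> library(spatstat)
> library(stpp)
> set.seed(1234)
> points <- rpoispp(100)                        # simulated spatial locations
> times <- rpois(points$n, lambda = 10)         # simulated time labels
> mhat <- density(points, at = "points")        # estimated intensity at the data points (illustrative proxy)
> stppData <- cbind(x = points$x, y = points$y, t = times)
> u <- seq(0.01, 0.25, by = 0.01)               # spatial distances (illustrative)
> v <- 1:10                                     # temporal distances (illustrative)
> STIK <- STIKhat(xyt = stppData, dist = u, times = v, lambda = mhat)
> plotK(STIK)                                   # plot the estimated STIK-function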