Ronny Vallejos Felipe Osorio Moreno Bevilacqua
Spatial Relationships Between Two Georeferenced Variables With Applications in R
Ronny Vallejos Department of Mathematics Federico Santa María Technical University Valparaíso, Chile
Felipe Osorio Department of Mathematics Federico Santa María Technical University Valparaíso, Chile
Moreno Bevilacqua Faculty of Engineering and Sciences Universidad Adolfo Ibañez Viña del Mar, Chile
ISBN 978-3-030-56680-7
ISBN 978-3-030-56681-4 (eBook)
https://doi.org/10.1007/978-3-030-56681-4
© Springer Nature Switzerland AG 2020

This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed.

The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.

The publisher, the authors and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, express or implied, with respect to the material contained herein or for any errors or omissions that may have been made.

This Springer imprint is published by the registered company Springer Nature Switzerland AG. The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland.
To my lovely wife and son, Carmen and Ronny Javier.
Ronny Vallejos

To my children Vicente and Florencia, for their love.
Felipe Osorio

To Gianni and Agostina.
Moreno Bevilacqua
Preface
In this book we cover a wide range of topics that until now have been available only in scattered research papers. The material reflects some 35 years of research in spatial statistics and image processing. Our exposition keeps the mathematical and statistical background to a minimum; the technical material is placed in an appendix to facilitate readability. Each chapter contains a section with applications and R computations in which real datasets from different contexts (fisheries research, forest sciences, and agricultural sciences) are analyzed.

We trust that the book will be of interest both to those who are familiar with spatial statistics and to scientific researchers whose work involves the analysis of geostatistical data. For the first group, we recommend a fast reading of Chap. 1 and then the chapters of interest. For the second group, the preliminaries given in Chap. 1 are recommended as a prerequisite, especially because of the language and notation used throughout the book. The interdependence of the chapters is depicted below, where arrow lines indicate prerequisites.

Extensive effort was invested in the composition of the reference list for each chapter, which should guide readers to a wealth of available materials. Although our reference lists are extensive, many important papers that do not fit our presentation have been omitted. Other omissions and discrepancies are inevitable, and we apologize for their occurrence.

Many colleagues, students, and friends have helped our work on this book in several ways: through discussions that improved our understanding of specific subjects; by doing research with us in a number of collaborative projects; by providing constructive criticism on earlier versions of the manuscript; and by supporting us with enthusiasm to finish this project.
In particular, we would like to thank Aaron Ellison, Daniel Griffith, Andrew Rukhin, Wilfredo Palma, Manuel Galea, Emilio Porcu, Pedro Gajardo, Jonathan Acosta, Silvia Ojeda, Javier Pérez, Francisco Alfaro, Rogelio Arancibia, Carlos Schwarzenberg, Angelo Gárate, and Macarena O’Ryan.
[Diagram: interdependence of Chapters 1-8, with arrow lines indicating prerequisites.]
Ronny Vallejos' research was supported by Fondecyt grant 1120048, by the Advanced Center for Electrical and Electronics Engineering (FB-0008), and by internal grants at Universidad Técnica Federico Santa María, Chile. Felipe Osorio's work was supported by Fondecyt grant 1140580. Moreno Bevilacqua's research was partially supported by Fondecyt grant 1160280 and by the Millennium Science Initiative of the Ministry of Economy, Development, and Tourism, grant Millennium Nucleus Center for the Discovery of Structures in Complex Data, Chile.

During the period in which the book was written, Ronny Vallejos and Felipe Osorio were in the Departamento de Matemática at Universidad Técnica Federico Santa María, Valparaíso, Chile, and Moreno Bevilacqua was in the Departamento de Estadística at Universidad de Valparaíso, Chile.

We wish to end this preface by thanking our families, friends, and others who helped make us what we are today and thereby contributed to this book. In particular, we would like to thank Eva Hiripi of Springer for her constant support.

Valparaíso, Chile
May 2020
Ronny Vallejos
Felipe Osorio
Moreno Bevilacqua
Contents
1 Introduction
  1.1 Motivating Examples
    1.1.1 The Pinus Radiata Dataset
    1.1.2 The Murray Smelter Site Dataset
    1.1.3 Similarity Between Images
  1.2 Objective of the Book
  1.3 Layout of the Book
  1.4 Computation
  1.5 Preliminaries and Notation
    1.5.1 Spatial Processes
    1.5.2 Intrinsic Stationary Processes and the Variogram
    1.5.3 Estimation of the Variogram
    1.5.4 Kriging
    1.5.5 The Cross-Variogram
    1.5.6 Image Analysis
  1.6 Problems for the Reader
  References

2 The Modified t Test
  2.1 Introduction
  2.2 The Modified t-Test
  2.3 Estimation of the Effective Sample Size
  2.4 Applications and R Computations
    2.4.1 Application 1: Murray Smelter Site Revisited
    2.4.2 Application 2: Modified t-Test Between Images
  2.5 A Permutation Test Under Spatial Correlation
    2.5.1 Application 3: Permutation t-Test Between Images
  2.6 Assessing Correlation Between One Process and Several Others
    2.6.1 Application 4: Pinus Radiata Dataset Revisited
  2.7 Problems for the Reader
  References

3 A Parametric Test Based on Maximum Likelihood
  3.1 Introduction
    3.1.1 Parametric Bivariate Covariance Models
  3.2 A Parametric Test Based on ML Under Increasing Domain Asymptotics
  3.3 A Parametric Test Based on ML Under Fixed Domain Asymptotics
  3.4 Examples with the R Package GeoModels
    3.4.1 An Example of Test of Independence Using a Separable Matérn Model
    3.4.2 An Example of Test for Assessing the Correlation Using a Nonseparable Matérn Model
  3.5 Application to the Pinus Radiata Dataset
  3.6 Problems for the Reader
  References

4 Tjøstheim's Coefficient
  4.1 Measures of Association
  4.2 Definition of the Measure and Its Properties
  4.3 Applications and R Computations
    4.3.1 The R Function cor.spatial
    4.3.2 Application 1: Murray Smelter Site Revisited
    4.3.3 Application 2: Flammability of Carbon Nanotubes
  4.4 Problems for the Reader
  References

5 The Codispersion Coefficient
  5.1 Introduction
  5.2 The Codispersion Index
    5.2.1 Relationship Between Codispersion and Correlation
    5.2.2 Differentiating Correlation and Spatial Association
  5.3 Codispersion for Spatial Autoregressive Processes
  5.4 Asymptotic Results
  5.5 Hypothesis Testing
  5.6 Numerical Experiments
    5.6.1 Simulation of Spatial Autoregressive Models
    5.6.2 Simulation of Spatial Moving Average Models
    5.6.3 Coverage Probability
  5.7 The Codispersion Map
    5.7.1 The Map for Data Defined on a General Lattice
    5.7.2 The Map for Data Defined on a Regular Grid
  5.8 Applications and R Computations
    5.8.1 The R Function codisp
    5.8.2 Application 1: Flammability of Carbon Nanotubes Revisited
    5.8.3 Application 2: Comovement Between Time Series
    5.8.4 Application 3: Codispersion Maps for Images
    5.8.5 Computational Time Comparison
  5.9 Problems for the Reader
  References

6 A Nonparametric Coefficient
  6.1 Introduction
  6.2 Linear Smoothers and Kernel Functions
  6.3 A Nadaraya-Watson Codispersion Coefficient
  6.4 Asymptotic Results
  6.5 Selection of the Bandwidth
    6.5.1 Bandwidth Selection for the Semivariogram
    6.5.2 Bandwidth Selection for the Cross-Variogram
  6.6 Simulations
  6.7 R Computations and Applications
    6.7.1 The R Function Codisp.Kern
    6.7.2 Application: The Camg Dataset
  6.8 Problems for the Reader
  References

7 Association for More Than Two Processes
  7.1 Introduction
  7.2 The Codispersion Matrix
  7.3 Asymptotic Results and Examples
  7.4 Spectral Band Selection
  7.5 Applications and R Computations
  7.6 Final Comments
  7.7 Problems for the Reader
  References

8 Spatial Association Between Images
  8.1 Introduction
  8.2 The Structural Similarity Index
  8.3 Some Extensions
    8.3.1 The CQ Coefficient
    8.3.2 The CQmax Coefficient
    8.3.3 The CSIM Coefficient
  8.4 Numerical Experiments
    8.4.1 Performance of the Directional Contrast
    8.4.2 Performance of the Coefficients Under Distortions
  8.5 Applications and R Computations
    8.5.1 Application: Stochastic Resonance
  8.6 Problems for the Reader
  References

Appendix A: Proofs
Appendix B: Effective Sample Size
Appendix C: Solutions to Selected Problems
Index
Chapter 1
Introduction
1.1 Motivating Examples

The types of spatial data we describe in the following examples have been widely discussed in Cressie (1993) and Schabenberger and Gotway (2005) in the context of a single realization of a stochastic sequence. Since this book addresses spatial association between two stochastic sequences, we adopt some basic conventions that do not vary throughout the book. We denote the two random sequences as X(s) and Y(s) for s ∈ D ⊂ R², and the available information is the observations X(s1), ..., X(sn) and Y(s1), ..., Y(sn); that is, both variables have been measured at the same locations in space.
1.1.1 The Pinus Radiata Dataset

Pinus radiata, one of the most widely planted species in Chile, is planted in a wide array of soil types and regional climates. Two important measures of plantation development are dominant tree height and basal area. Research shows that these measures are correlated with the regional climate and local growing conditions (see Snowdon 2001).

The study site is located in the Escuadrón sector, south of Concepción, in the southern portion of Chile (36°54'S, 73°54'W), and has an area of 1244.43 hectares. In addition to mature stands, there is also interest in areas that contain young (i.e., four-year-old) stands of Pinus radiata. These areas have an average density of 1600 trees per hectare. The basal area and dominant tree height in the year of the plantation's establishment (1993, 1994, 1995, and 1996) were used to represent stand attributes. These variables were obtained from 200 m² circular sample plots and point-plant sample plots. For the latter, four quadrants were established around the sample point, and the four closest trees in each quadrant (16 trees in total) were then selected and measured. The samples were located systematically using a mean distance of 150 m between samples. The total number of plots available for this study was 468 (Fig. 1.1).

Fig. 1.1 Locations where the samples were taken

Figure 1.2 shows a simple bilinear interpolation and the corresponding contours for the two variables. The original georeferenced data do not lend themselves to a visual assessment of the sample correlation coefficient because it is challenging to train the human eye to capture two-dimensional patterns.

Fig. 1.2 a Bilinear interpolation of the tree basal areas; b bilinear interpolation of the tree heights

One could be tempted to compute the Pearson correlation coefficient for the two sequences by treating these variables as two simple columns. A scatterplot could then help to determine whether there is a linear trend between the basal area and height. It is interesting to emphasize that the human eye can usually be trained to estimate the value of the correlation coefficient from the information provided by a scatterplot between the variables of interest. However, when the data have been georeferenced in two-dimensional space, it is difficult to estimate a reasonable association between the variables. For the forest variables, a scatterplot between the basal area and height is displayed in Fig. 1.3, which shows a clear linear relationship; the correlation coefficient confirms the linear pattern (0.7021).

Fig. 1.3 Height versus basal area (468 observations)

Although the exploratory data analysis provides good initial insight into the real problem, the issue of how to take into account the possible spatial association of each variable has not yet been addressed. Thus, a primary objective in analyzing these data is to develop coefficients of spatial association between two georeferenced variables that take into account the existing spatial association within and between the variables.
1.1.2 The Murray Smelter Site Dataset

The dataset consists of soil samples collected in and around the vacant, industrially contaminated Murray smelter site (Utah, USA). This area was polluted by airborne emissions and the disposal of waste slag from the smelting process. A total of 253 locations were included in the study, and soil samples were taken from each location. Each georeferenced sample point is a pool composite of four closely adjacent soil samples in which the concentrations of the heavy metals arsenic (As) and lead (Pb) were determined. A detailed description of this dataset can be found in Griffith (2003) and Griffith and Paelinck (2011). For each location, the As and Pb attributes are shown in Fig. 1.4a, b.

Fig. 1.4 Locations of 253 geocoded aggregated surface soil samples collected in a 0.5 square mile area in Murray, Utah, and their measured concentrations of As and Pb. Of these, 173 were collected in a facility Superfund site, and 80 were collected in two of its adjacent residential neighborhoods located along the western and southern borders of the smelter site. a As measurements; b Pb measurements

The objective for these data is to assess the spatial association between As and Pb. Figure 1.4 shows that, in this case, the observations are clearly located on a nonrectangular grid in two-dimensional space. Again, the goal can be achieved by quantifying coefficients of spatial association or by constructing a suitable hypothesis test for the Pearson correlation coefficient ρ between As and Pb. A hypothesis test of the form H0 : ρ = 0 against H1 : ρ ≠ 0 can be stated under the assumption of normality for both variables (As and Pb). The test statistic is

    t = r √(n − 2) / √(1 − r²) = 11.5548,    (1.1)

where n = 253 and r = 0.5892. The p-value associated with the test is 2.2 × 10⁻¹⁶; thus, there is sufficient evidence to reject H0 at any significance level α > p.

In the previous analysis, we assumed that the correlation between the variables is constant, i.e., cor[X(s), Y(s)] = ρ for all s ∈ D. However, as we will see in the following chapters, this dataset and several others do not support this restriction. Instead, they exhibit a clear spatial association between the variables of interest.
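As a quick check, the classical test statistic in (1.1) can be reproduced in base R from the reported summary values (n = 253, r = 0.5892); this is a sketch that, like (1.1), ignores the spatial autocorrelation issue raised in later chapters.

```r
# Classical t-test for H0: rho = 0, from the summary values reported above.
# Note: this ignores spatial autocorrelation, which is precisely the issue
# that the following chapters address.
r <- 0.5892   # sample Pearson correlation between As and Pb
n <- 253      # number of sampled locations
t_stat  <- r * sqrt(n - 2) / sqrt(1 - r^2)  # approximately 11.55
p_value <- 2 * pt(-abs(t_stat), df = n - 2) # two-sided p-value, essentially 0
```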
The objective in analyzing these data is to develop suitable hypothesis testing methods to assess the significance of the spatial association between two spatial variables by considering the existing spatial association between them.
1.1.3 Similarity Between Images

With the rapid proliferation of digital imaging, image similarity assessment has become a fundamental issue in many applications in various fields of knowledge (Martens and Meesters 1998). Many indices that capture the similarity or dissimilarity between two digital images have received attention during the past decade. One important feature to consider is the capability of some coefficients to provide a better representation of human visual perception than the widely used mean squared error (MSE) (Wang et al. 2004).

Here, we introduce an example that uses real data to illustrate the dependence of spatial association on a particular direction in space, noting that the correlation coefficient (a crude measure of spatial association between two processes) cannot account for the directional association between two images. To this end, an original image (Lenna) of size 512 × 512 was taken from the USC-SIPI image database, http://sipi.usc.edu/database/ (see Fig. 1.5a). The image shown in Fig. 1.5a was processed by Algorithm 4.1 in Vallejos et al. (2015) to transform the original image into an image with a clear pattern in the direction h = (1, 1). The processed image is displayed in Fig. 1.5b.

Fig. 1.5 a Original image (Lenna); b image transformed in the direction h = (1, 1)

The correlation coefficient between the images shown in Fig. 1.5 is r = 0.6909. Clearly, the correlation coefficient does not capture the evident pattern observed by the human eye between the original and transformed images. In fact, the trend along the off-diagonal of the image in Fig. 1.5b is sufficient to decrease the correlation coefficient to 0.6909 even though the features of the original image are still present and detectable by the human eye. The objective in analyzing these data is to construct image similarity coefficients that can detect patterns in different directions in space and appropriately represent the human visual system.
1.2 Objective of the Book

The aim of this book is to gather in one place material that is otherwise spread throughout the literature. The book may be of interest to two types of readers. First, researchers from applied areas such as agriculture, soil science, forest science, environmental science, and engineering; for these and other users who are mainly interested in the applications, the book is organized so that the mathematical foundations of each chapter can be skipped. Second, for investigators interested in developing new techniques and methods to assess the significance of the correlation between two or more spatial processes defined on a two-dimensional plane, we include at the end of the book an appendix with the proofs of the results presented in the book and the mathematical details that support the expressions and equations only briefly explained in the main text. Although the book contains methods that were proposed approximately thirty years ago and are well known to readers working in spatial statistics and geostatistics, other methods in this book have been developed recently and are not yet available in a publication of this kind.
1.3 Layout of the Book This book is divided into three parts. The first part considers the association between two random fields from an hypothesis testing perspective (Chaps. 2 and 3). The second part is devoted to point estimation coefficients of association. These perspectives are developed in Chaps. 4–7. The third part considers the spatial association between two images (Chap. 8). Several applications are presented throughout the book. Each chapter ends with a set of theoretical and applied exercises. Most of the applied problems are related to real datasets, and it is expected that the reader will use R software to solve them.
1.4 Computation To illustrate the applicability of the methods exposed in this book, each chapter contains a section on R computations with practical applications. In most of the examples, we show how R software and the contributed packages SpatialPack and
1.4 Computation
7
GeoModels can be used to implement the techniques described in the corresponding chapter. The SpatialPack package is freely available from the R website www.rproject.org and the package GeoModels can be downloaded from the website https:// vmoprojs.github.io/GeoModels-page/.
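For reference, a minimal setup sketch: SpatialPack installs from CRAN, while the exact installation route for GeoModels may differ over time, so consult its website for current instructions.

```r
# Install the companion packages used throughout the book.
install.packages("SpatialPack")   # from CRAN
# GeoModels: see https://vmoprojs.github.io/GeoModels-page/ for the
# current installation instructions.
library(SpatialPack)
```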
1.5 Preliminaries and Notation

In this section, we provide the necessary material that will be used in subsequent chapters. Readers interested in practical applications with real datasets can skip the rest of this chapter and move ahead.
1.5.1 Spatial Processes

In this section, we introduce the basic notion of stochastic processes. Our goal is to define the mean, variance, and covariance functions of spatial processes. These concepts will be used in the subsequent chapters.

A stochastic process is a family or collection of random variables in a probability space. Let (Ω, F, P) be a probability space, and let D be an arbitrary index set. A stochastic process is a function X : (Ω, F, P) × D → R such that, for all s ∈ D, X(ω, s) is a random variable. In the sequel, we denote a stochastic process as X(s), for s ∈ D, or {X(s) : s ∈ D}.

This definition covers a variety of processes. For example, if D = Z, X(s) is a discrete time series. Similarly, a spatial process is a collection of random variables indexed by a set D ⊂ R^d. In the time series case, the realizations of the process are observations indexed by time, while in the spatial case they are indexed by locations in a region of R^d. Additionally, in the first case the index set is totally ordered, whereas in the spatial case it is only partially ordered. We denote the coordinates of a spatial process defined on a space of dimension d as s = (s1, ..., sd)ᵀ.

As an example, consider the process {X(s) : s ∈ Z} defined by

    X(s) = A cos(ηs + φ),    (1.2)

where A is a random variable independent of φ ∼ U(0, 2π), and η is a fixed constant. For the particular case A ∼ N(0, 1) and η = 1, 1000 observations from process (1.2) were generated. This realization of {X(s)} is displayed in Fig. 1.6.
Fig. 1.6 A realization from process (1.2)
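A realization like the one in Fig. 1.6 can be reproduced with a few lines of base R. This is a sketch under the stated assumptions (A ∼ N(0, 1), η = 1, 1000 observations), not the exact code used for the figure.

```r
# One realization of X(s) = A*cos(eta*s + phi), s = 1, ..., 1000 (Eq. 1.2),
# with A ~ N(0, 1) and phi ~ U(0, 2*pi) drawn once for the whole realization.
set.seed(1)
eta <- 1
A   <- rnorm(1)
phi <- runif(1, 0, 2 * pi)
s   <- 1:1000
x   <- A * cos(eta * s + phi)
plot(s, x, type = "l", xlab = "Time", ylab = "X")  # cf. Fig. 1.6
```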
As another example, consider the process {Z(x, y) : (x, y) ∈ Z²} defined by

    Z(x, y) = β1 x + β2 y + ε(x, y),    (1.3)

where {ε(x, y) : (x, y) ∈ Z²} is a collection of independent and identically distributed random variables with zero mean and variance σ². For β1 = 1, β2 = 1, ε(x, y) ∼ N(0, 1), and a grid of size 512 × 512, a realization of this process is shown in Fig. 1.7.

Fig. 1.7 a A realization from (1.3). b Image associated with (a)

The class of all random functions or stochastic processes is too large to enable methods that are suitable for all types of processes to be designed. In fact, most of the developments have been proposed for special cases. One important class of processes is characterized by the feature that their distributions do not change over time/space. We can summarize this property by saying that for any set of locations s1, ..., sn, the joint probability of {X(s1), ..., X(sn)} must remain the same if we shift each location by the same amount; in other words, the distribution of the process is invariant under translation of the coordinates. A process {X(s) : s ∈ D} with this property is called strictly stationary. Formally, a spatial process {X(s) : s ∈ D} is said to be strictly stationary if, for any set of locations s1, ..., sn ∈ D and for any h ∈ D, the joint distribution of {X(s1), ..., X(sn)} is identical to that of {X(s1 + h), ..., X(sn + h)}.

Strict stationarity is a severe requirement and can be relaxed by introducing a milder version that imposes conditions on only the first two moments of the process. The second-order condition, E[X²(s)] < ∞, for all
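A realization of (1.3) on a 512 × 512 grid can be generated directly; the sketch below assumes β1 = β2 = 1 and standard normal errors, as in the text.

```r
# Realization of Z(x, y) = beta1*x + beta2*y + eps(x, y) on a regular grid,
# with iid eps(x, y) ~ N(0, 1) (Eq. 1.3).
set.seed(1)
n <- 512
beta1 <- 1; beta2 <- 1
trend <- outer(1:n, 1:n, function(x, y) beta1 * x + beta2 * y)
Z <- trend + matrix(rnorm(n^2), n, n)  # linear trend plus white noise
image(1:n, 1:n, Z, xlab = "x", ylab = "y")  # cf. Fig. 1.7b
```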
s ∈ D, guarantees the existence of the mean, variance, and covariance functions (see Exercise 1.2), which we define, respectively, as μ(s) = E[X(s)], σ²(s) = var[X(s)], and C(s1, s2) = cov[X(s1), X(s2)]. A second-order process {X(s) : s ∈ D} is then said to be weakly stationary if the mean function is constant and the covariance between X(si) and X(sj) depends only on the difference si − sj, i.e.,

(i) E[X(s)] = μ, for all s ∈ D.
(ii) cov[X(si), X(sj)] = g(si − sj), for all si, sj ∈ D and for some function g.

More information about the function g(·) will be given later. From condition (ii), we see that the variance of a weakly stationary process is also constant (it does not depend on s). The covariance function can then be written as C(h) = cov[X(s), X(s + h)]. It should be emphasized that if a second-order process is strictly stationary, then it is also weakly stationary. The converse is not true (see Exercise 1.3). However, the normality assumption for the process guarantees that both notions of stationarity are equivalent.

Example 1.1 Consider the process given by Eq. (1.2) with A ∼ N(0, 1). Clearly, E[X(s)] = 0. Moreover,

    cov[X(s1), X(s2)] = E[A²/2] E[cos(η(s1 − s2)) + cos(η(s1 + s2) + 2φ)] = (1/2) cos(η(s1 − s2)).

Equivalently, C(h) = (1/2) cos(ηh). Thus, {X(s) : s ∈ Z} is a weakly stationary process.
The second-order property required for weak stationarity is crucial. There are examples of processes that are strictly stationary for which the second-order property does not hold (see Exercise 1.4).
1 Introduction
The covariance function of a weakly stationary process must satisfy the following properties (Schabenberger and Gotway 2005):

(1) C(0) ≥ 0.
(2) C(h) = C(−h).
(3) C(0) ≥ |C(h)|.
(4) If C_j(h) are valid covariance functions for j = 1, 2, ..., n, then Σ_{j=1}^n b_j C_j(h) is also a valid covariance function if b_j ≥ 0 for all j.
(5) If C_j(h) are valid covariance functions for j = 1, 2, ..., n, then Π_{j=1}^n C_j(h) is a valid covariance function.
(6) A valid covariance function in R^d is also valid in R^k for k < d.
(7) If C is a valid covariance function in R^d, then the positive-definiteness condition is satisfied, i.e.,

Σ_{i=1}^n Σ_{j=1}^n a_i a_j C(s_i − s_j) ≥ 0, for all s_i, s_j ∈ D and for all a_i, a_j ∈ R.
Example 1.2 (Intra-class correlation) Consider the process {X(s) : s ∈ D} with the covariance structure cov[X(si), X(sj)] = ρ, −1 ≤ ρ ≤ 1, si ≠ sj, and cov[X(si), X(si)] = 1, for i = 1, 2, ..., n, or in matrix notation Σ = (1 − ρ)I + ρJ, where each element of J is equal to unity. It is straightforward to check that Σ satisfies the positive-definiteness condition if −1/(n − 1) ≤ ρ ≤ 1.

Example 1.3 (Toeplitz structure) Consider the process {X(s) : s ∈ D} with the covariance structure cov[X(si), X(sj)] = ρ^|i−j|. It is easy to see that the corresponding covariance matrix Σ is positive definite if 0 < ρ < 1 (see Exercise 1.5).

Example 1.4 (Kronecker product structure) Consider the process {X(s) : s ∈ D} such that we form the matrix X = {X(s_ij)}, i, j = 1, ..., n. Assume that all vectors X_(j) = (X(s_1j), ..., X(s_nj))ᵀ, j = 1, ..., n, have the same covariance matrix Σ. Furthermore, suppose that all these vectors are independent. Then, the covariance matrix of vec(X) is I ⊗ Σ, which is in fact positive definite since Σ is a positive definite matrix.
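The positive-definiteness claims in Examples 1.2 to 1.4 can be verified numerically through eigenvalues; the values of n and ρ below are illustrative:

```r
# Eigenvalue check of the covariance structures in Examples 1.2-1.4
n <- 5; rho <- 0.4
Sigma1 <- (1 - rho) * diag(n) + rho * matrix(1, n, n)  # intra-class
Sigma2 <- rho^abs(outer(1:n, 1:n, "-"))                # Toeplitz
Sigma3 <- diag(3) %x% Sigma2                           # I (x) Sigma, Example 1.4
min(eigen(Sigma1)$values) > 0  # TRUE when -1/(n-1) <= rho <= 1
# the intra-class structure fails outside that interval (rho = -0.3 < -1/4):
bad <- (1 + 0.3) * diag(n) - 0.3 * matrix(1, n, n)
min(eigen(bad)$values)         # negative
```

The Kronecker product `%x%` preserves positive definiteness, which is exactly the statement at the end of Example 1.4.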
1.5.2 Intrinsic Stationary Processes and the Variogram

We have already introduced two classes of stationary processes. Now, through an example in a time series context, we introduce the idea of intrinsically stationary processes. Let us consider a random walk of the form Y(t) = Y(t − 1) + ε(t), t ∈ Z,
where the ε(t) are independent random variables with zero mean and variance σ². A straightforward calculation shows that var[Y(t)] = tσ², so {Y(t) : t ∈ Z} is not a second-order stationary process. However, the first differences Y(t + 1) − Y(t) are second-order stationary. This motivates the introduction of a new process {X(s) : s ∈ D} to study increments of the form X(s + h) − X(s) as a function of the separation h ∈ D. One way to measure how the dissimilarity between X(s + h) and X(s) evolves with separation h is to consider var[X(s + h) − X(s)]. In practice, a small variance is expected for points that are close in space, while a large variance is expected for points with a large separation in space, according to the first law of geography (Tobler 1970). A spatial process {X(s) : s ∈ D} is called intrinsically stationary if for all s ∈ D, E[X(s)] = μ, and

var[X(s + h) − X(s)] = 2γ(h)   (1.4)

is only a function of h. In such a case, the function 2γ(h) is called the variogram of the process and γ(h) is called the semivariogram. If E[X(s + h) − X(s)] = 0, then var[X(s + h) − X(s)] = 2γ(h) = E[X(s + h) − X(s)]². An immediate consequence is that a second-order stationary process is also an intrinsically stationary process. However, the converse is not true. We recall that the intrinsic stationarity condition requires the existence of the first and second moments of the increment X(s + h) − X(s); however, intrinsic stationarity does not provide any information about the distribution of the vector (X(s1), ..., X(sn)).

Example 1.5 (Wiener-Levy process) Let us define the process {X(t) : t ∈ Z} with X(0) = 0, and consider the process {Y(t) : t ∈ Z} such that Y(i) = 1 with probability 0.5 and Y(i) = −1 with probability 0.5. In addition, assume that Y(t) and Y(s) are independent for all s, t ∈ Z. Let

X(t) = Σ_{i=1}^t Y(i).

Clearly, for all t, E[X(t)] = 0 and var[X(t)] = t. Thus, the process {X(t) : t ∈ Z} is certainly not weakly stationary. However, E[X(t) − X(s)] = 0 and var[X(t) − X(s)] = t − s, for t > s. Thus, the intrinsic hypothesis (1.4) is satisfied.
For weakly stationary processes, there is a relationship between γ(h) and C(h):

2γ(h) = var[X(s + h) − X(s)] = var[X(s + h)] + var[X(s)] − 2 cov[X(s + h), X(s)] = 2(C(0) − C(h)).

Thus, γ(h) = C(0) − C(h).

Example 1.6 (Autoregressive process) Consider the process X(t) = φX(t − 1) + ε(t), where |φ| < 1 and the variables ε(t) are independent and identically distributed with mean zero and variance σ². Then, C(h) = σ²φ^|h|/(1 − φ²), C(0) = σ²/(1 − φ²), and γ(h) = σ²(1 − φ^|h|)/(1 − φ²). For φ = 0.25 and σ² = 1, the functions C(h) and γ(h) are plotted in Fig. 1.8; this is possible because h ∈ R. The covariance function in Fig. 1.8 illustrates a property of stationary ARMA processes in time series: the covariance function decays to zero at an exponential rate (Anderson 1971, p. 175). This elementary fact is not true for weakly stationary processes on the plane. The relationship between the functions C(·) and γ(·) shows that it is always possible to recover the semivariogram from C(·). However, to recover the covariance function in terms of the semivariogram, an additional condition called ergodicity must be assumed. A stationary spatial process X(·) can be ergodic in the mean or in the covariance function. Here, we skip the formal definitions because they are beyond the scope of this book. Instead, we encourage the reader to see Yaglom (1987) and Chiles and Delfiner
Fig. 1.8 C(h) (left) and γ (h) (right) for an AR(1) model
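The curves in Fig. 1.8 follow directly from the formulas in Example 1.6 with φ = 0.25 and σ² = 1, treating h as a real-valued lag in [0, 1]:

```r
# C(h) and gamma(h) for an AR(1) model with phi = 0.25, sigma^2 = 1
phi <- 0.25; sigma2 <- 1
C <- function(h) sigma2 * phi^h / (1 - phi^2)
gam <- function(h) C(0) - C(h)
h <- seq(0, 1, length.out = 101)
par(mfrow = c(1, 2))
plot(h, C(h), type = "l", ylab = "C(h)")
plot(h, gam(h), type = "l", ylab = expression(gamma(h)))
```

The plots make the complementary behavior explicit: C(h) decays exponentially while γ(h) = C(0) − C(h) grows toward the sill C(0).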
(1999) for a detailed discussion of ergodicity. The following theorem, stated by Adler (1981), helps to clarify the concept of ergodicity in this context.

Theorem 1.1 Let {X(s)} be a spatial process with covariance function C(h). If {X(s)} is ergodic, then C(h) → 0 as ∥h∥ → ∞.

The above theorem states that the covariance function vanishes as the separation between the points in space increases. Then (see Banerjee 2004),

C(h) = C(0) − γ(h) = lim_{∥u∥→∞} γ(u) − γ(h).
Example 1.7 Consider the process {X(s)} with the covariance structure given by C(h) = e^{−φ∥h∥}. Then, it is clear that C(h) → 0 as ∥h∥ → ∞, for φ > 0. This correlation structure is a particular case of a more general class called the exponential correlation family.

Example 1.8 (Continuation of Example 1.1) Consider again the process X(s) = A cos(ηs + φ), where A ∼ N(0, 1), η is a constant, φ ∼ U(0, 2π), and the variables A and φ are independent. The covariance function is C(h) = cos(ηh)/2. Clearly, C(h) does not tend to 0 as h → ∞. Thus, {X(s)} is not an ergodic process.

A valid semivariogram satisfies a negative-definiteness condition. Indeed, for any locations s1, s2, ..., sn ∈ D and any constants b1, b2, ..., bn ∈ R such that Σ_{i=1}^n b_i = 0, if γ(h) is a valid variogram, then

Σ_{i=1}^n Σ_{j=1}^n b_i b_j γ(s_i − s_j) ≤ 0.   (1.5)
For a complete discussion of the necessary conditions for a valid variogram, see Cressie (1993). A covariance function of a stationary spatial process is called isotropic if C(h) depends on h only through ∥h∥; otherwise, the covariance function is called anisotropic. The isotropy condition establishes that it is not possible to distinguish the correlation in one direction from that in another. Isotropic models are simple from an inference perspective. As an example, consider the spatial vector (X(s1), ..., X(sn))ᵀ ∼ N(μ, Σ), where μ ∈ Rⁿ and Σ is an n × n covariance matrix. If μ and Σ are unknown, the total number of real parameters to estimate is n + n². It is not common in statistical inference to estimate n + n² parameters from n observations; a model for which n observations are available to estimate n + n² parameters is referred to in statistics as a saturated model. In this sense, isotropic processes provide a class of parametric models for the covariance components with a small number of parameters. This produces an enormous simplification of the estimation process. Another advantage of isotropic models is that the plot of γ(h) versus h can be employed to inspect the features of the variogram. A process that is intrinsically stationary and isotropic is called homogeneous. As an example, consider the following isotropic semivariogram (with t = ∥h∥):
γ(t) = τ² + σ²t for t > 0 (with τ² > 0, σ² > 0), and γ(t) = 0 otherwise.

This semivariogram does not correspond to a weakly stationary process since γ(t) → ∞ as t → ∞. Another example of a process with a semivariogram that cannot be obtained from a covariance function is the power semivariogram described by

γ(t) = b|t|^p,   (1.6)

where 0 < p < 2 and b > 0. Fractional Brownian motion has a semivariogram of the form (1.6) (Wackernagel 2003, p. 406). Given a semivariogram γ(t), t = ∥h∥, the values lim_{t→0+} γ(t), lim_{t→∞} γ(t), and min_t{t : γ(t) = sill} are called, respectively, the nugget effect, sill, and range. For a typical variogram, the nugget effect, sill, and range are shown in Fig. 1.9.

Example 1.9 Consider the wave variogram described by the equation

γ(t) = τ² + σ²(1 − sin(φt)/(φt)) for t > 0, and γ(t) = 0 otherwise.

Then, lim_{t→0+} γ(t) = lim_{t→0+} [τ² + σ²(1 − sin(φt)/(φt))] = τ². Moreover, lim_{t→∞} γ(t) = τ² + σ². To find the value of the range, we have to solve the equation τ² + σ² = τ² + σ²(1 − sin(φt)/(φt)), which is equivalent to solving sin(φt) = 0. Thus, t = kπ/φ, k ∈ Z, and the range is equal to π/φ. The corresponding covariance function is given by C(t) = σ² sin(φt)/(φt).
Fig. 1.9 Behavior of a typical semivariogram model
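The nugget and sill limits in Example 1.9 can be checked numerically; the values of τ², σ², and φ below are illustrative:

```r
# Wave (hole-effect) semivariogram from Example 1.9
tau2 <- 0.5; sigma2 <- 2; phi <- 3
gam <- function(t) {
  ifelse(t > 0, tau2 + sigma2 * (1 - sin(phi * t) / (phi * t)), 0)
}
gam(1e-8)      # ~ tau2 (nugget)
gam(1e6)       # ~ tau2 + sigma2 (sill)
gam(pi / phi)  # the sill is attained exactly where sin(phi * t) = 0
```

Unlike monotone models, the wave semivariogram oscillates around the sill, touching it at the zeros of sin(φt).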
In the literature, there are several summaries of parametric semivariogram models and covariance functions; see, for example, Banerjee (2004), Chiles and Delfiner (1999), and Cressie (1993). The Matérn isotropic covariance function is given by (Handcock and Stein 1993)

MT_{ν,α}(t) = (1/(2^{ν−1} Γ(ν))) (t/α)^ν K_ν(t/α),   (1.7)

where K_ν is a modified Bessel function of the second kind of order ν, Γ is the gamma function, α > 0 is the range or distance parameter, and ν > 0 is the smoothness parameter. This class was originally introduced by Matérn (1960), but the model (1.7) was deduced earlier by Whittle (1954, Eq. 65). Alternative parameterizations of the Matérn class have been studied (e.g., Handcock and Stein 1993; Stein 1999). The differentiability of a Gaussian spatial process with a Matérn covariance function depends on the parameter ν. A discussion of this aspect can be found in Rasmussen and Williams (2006). Another interesting family is the piecewise polynomial covariance functions with compact support, where compact support refers to the fact that the covariance becomes zero when the distance between the points exceeds a certain threshold. This assumption helps substantially when the spatial dataset is large; in such a case, the covariance matrix becomes sparse and the estimation process, for example, is simplified. Covariance functions of this type will be given in detail in Sect. 3.1.1.
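Model (1.7) can be evaluated with R's built-in Bessel function besselK; the reduction to the exponential model at ν = 1/2 is a well-known special case and serves as a sanity check:

```r
# Matern isotropic covariance (1.7)
matern <- function(t, alpha, nu) {
  ifelse(t == 0, 1,
         (t / alpha)^nu * besselK(t / alpha, nu) / (2^(nu - 1) * gamma(nu)))
}
matern(1.3, alpha = 1, nu = 0.5)  # equals exp(-1.3)
```

Larger values of ν give smoother processes; ν = 1/2 recovers the exponential correlation of Example 1.7.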
1.5.3 Estimation of the Variogram

For n sampling sites s1, ..., sn, a natural and unbiased estimator of the semivariogram of a spatial process {X(s) : s ∈ D ⊂ R²} based on the method of moments is the empirical semivariogram (Matheron 1963) given by

γ̂(h) = (1/(2|N(h)|)) Σ_{N(h)} (X(si) − X(sj))²,   (1.8)

where N(h) = {(si, sj) : si − sj = h, 1 ≤ i, j ≤ n} and | · | denotes the cardinality of a set. As alternatives to the empirical semivariogram, robust estimators have been proposed (Cressie and Hawkins 1980; Genton 1998) to address the estimation of the semivariogram when there are outliers in the spatial sample. Furthermore, García-Soidán et al. (2004) proposed a nonparametric (Nadaraya-Watson type) estimator for the semivariogram defined as
γ̆(k) = [Σ_{i=1}^n Σ_{j=1}^n K((k − ∥si − sj∥)/h) (X(si) − X(sj))²] / [2 Σ_{i=1}^n Σ_{j=1}^n K((k − ∥si − sj∥)/h)],   (1.9)
where h represents a bandwidth parameter and K : R^d → R is a symmetric and strictly positive density function (see Sect. 6.3). For this estimator, García-Soidán (2007) established consistency and asymptotic normality under regularity conditions and addressed the inadequate behavior of the estimator (1.9) near the endpoints. The computation of the semivariogram for a given spatial sample can be achieved with several standard R packages, such as GeoModels, geoR, RandomFields, and sgeostat.
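As a transparent alternative to these packages, the moment estimator (1.8) with distance binning can be written in a few lines of base R; the simulated data, bin width, and distance range below are assumptions made for illustration:

```r
# Binned Matheron estimator of the semivariogram, base R sketch
set.seed(1)
n <- 100
coords <- cbind(runif(n), runif(n))
x <- rnorm(n)                                  # white noise: flat semivariogram
d <- as.matrix(dist(coords))                   # pairwise distances
sq <- outer(x, x, function(a, b) (a - b)^2) / 2
ut <- upper.tri(d)
bins <- cut(d[ut], breaks = seq(0, 0.8, by = 0.1))
gamma_hat <- tapply(sq[ut], bins, mean)        # estimate per distance class
round(gamma_hat, 2)
```

For spatially independent data the estimate should hover around the variance of the process at every lag, which is what the output shows here.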
1.5.4 Kriging

The final goal of a geostatistical analysis is to provide a predictor of the spatial process Y(·) at non-observed locations in space, say s0 ∈ D. Several methods can be used for spatial prediction depending on the nature of the problem. The methods developed for this purpose are called kriging in honor of Krige (1951), who was the first to propose a method to predict a spatial variable at a non-observed location; that method is currently known as ordinary kriging. Let us consider a model of the form Y(s) = X(s)ᵀβ + ε(s), s ∈ D, where X(·) is a vector of covariates, β is an unknown parameter vector, and the process ε(·) has a semivariogram denoted by γ(·). Let s0 be an arbitrary location in the domain D. Denote the value of the process at location s0 as Y(s0) and its predictor as Ŷ(s0). One way to determine Ŷ(s0) is to select an optimal criterion function between the true and predicted values; e.g., we can choose Ŷ(s0) to minimize the prediction error variance, var[Ŷ(s0) − Y(s0)]. Since the problem is still too general, additional restrictions must be imposed. For example, the predictor can be chosen within the class of linear and unbiased predictors; precisely, Ŷ(s0) = λᵀY and E[Ŷ(s0)] = E[Y(s0)], where Y = (Y(s1), ..., Y(sn))ᵀ, λ = (λ1, ..., λn)ᵀ, and s1, ..., sn are n points where the process Y(·) has been observed. If the semivariogram of the process Y(·) is known, the predictor is (Gelfand et al. 2010)

Ŷ(s0) = (γ + X(XᵀΓ⁻¹X)⁻¹(x0 − XᵀΓ⁻¹γ))ᵀ Γ⁻¹ Y,

where X = (X(s1), ..., X(sn))ᵀ, γ = (γ(s1 − s0), ..., γ(sn − s0))ᵀ, Γ is an n × n symmetric matrix whose ij-th element is Γij = γ(si − sj), and x0 = X(s0).
The kriging predictor is accompanied by the prediction error variance, called the universal kriging variance, given by

σ²(s0) = γᵀΓ⁻¹γ − (XᵀΓ⁻¹γ − x0)ᵀ(XᵀΓ⁻¹X)⁻¹(XᵀΓ⁻¹γ − x0).

In addition, if the process Y(·) is Gaussian, a (1 − α)100% confidence interval for Y(s0) can be constructed as

Ŷ(s0) ± z_{α/2} σ(s0),

where z_{α/2} is the upper α/2 quantile of the standard normal distribution and 0 < α < 1. Several useful comments and variants of the kriging methodology can be applied in different scenarios depending on the nature of the spatial data. Excellent discussions can be found, for instance, in Cressie (1993), Schabenberger and Gotway (2005), and Gelfand et al. (2010).
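The two formulas above translate almost verbatim into matrix code. The sketch below assumes an exponential semivariogram γ(h) = 1 − e^{−h} and toy data; with a constant covariate it reduces to ordinary kriging:

```r
# Universal kriging predictor and variance (toy data, assumed semivariogram)
set.seed(2)
n <- 30
coords <- cbind(runif(n), runif(n))
Y <- rnorm(n)
X <- matrix(1, n, 1)               # constant mean: ordinary kriging
gam <- function(h) 1 - exp(-h)     # assumed exponential semivariogram
Gamma <- gam(as.matrix(dist(coords)))
krige <- function(s0, x0 = 1) {
  g <- gam(sqrt(colSums((t(coords) - s0)^2)))
  GiX <- solve(Gamma, X); Gig <- solve(Gamma, g)
  A <- solve(crossprod(X, GiX), x0 - crossprod(X, Gig))
  lambda <- Gig + GiX %*% A        # kriging weights
  list(pred = drop(crossprod(lambda, Y)),
       var  = drop(crossprod(g, Gig) - crossprod(A, crossprod(X, GiX) %*% A)))
}
krige(c(0.5, 0.5))
```

Two properties worth noting: the weights sum to one in the ordinary kriging case, and predicting at an observed site reproduces the observed value with zero kriging variance.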
1.5.5 The Cross-Variogram

Consider two intrinsically stationary processes {X(s) : s ∈ D} and {Y(s) : s ∈ D} with semivariograms γ_X(·) and γ_Y(·). The cross-variogram is defined as

γ_XY(h) = (1/2) E[(X(s + h) − X(s))(Y(s + h) − Y(s))],   (1.10)

where, as before, h is a vector separating two locations such that s, s + h ∈ D. Additionally, the cross-covariogram is defined as

C_XY(h) = E[(X(s) − E[X(s)])(Y(s + h) − E[Y(s + h)])].   (1.11)
In this section, we denote the covariance functions of X and Y as C_X and C_Y, respectively. Note from Eqs. (1.10) and (1.11) the difference between the cross-variogram and the cross-covariogram: the latter involves the means of the two processes, whereas the cross-variogram captures the spatial variability of the product (X(s + h) − X(s))(Y(s + h) − Y(s)). From an estimation perspective, the cross-covariogram has more variability since the means of the processes must be estimated first; hence, there is a superposition of errors in the computation of the cross-covariogram that is difficult to quantify. Another interesting feature of the cross-covariogram is that the order in which the variables are considered matters. For example, consider the processes X(·) and Y(·) centered to have zero means; then,
C_XY(h) = E[X(s)Y(s + h)] = E[Y(s)X(s − h)] = C_YX(−h).   (1.12)
We recall that, in general, the identity C_XY(h) = C_XY(−h) does not hold. Using the Cauchy-Schwarz inequality, it is possible to construct coefficients of association that belong to the interval [−1, 1]. In fact, the coefficient

ρ̃_XY(h) = C_XY(h) / √(E[X(s) − E[X(s)]]² E[Y(s + h) − E[Y(s + h)]]²)   (1.13)

measures the correlation between X(s) and Y(s + h), similarly to the time series case. Furthermore, the codispersion coefficient, first introduced by Matheron (1965), is given by

ρ_XY(h) = γ_XY(h) / √(E[X(s + h) − X(s)]² E[Y(s + h) − Y(s)]²).   (1.14)
As we mentioned previously, the cross-variogram does not involve the means of the processes; thus, its estimate is not affected by the estimation of the means. Moreover, the cross-variogram does not assume finite variances. The relationship between the cross-variogram and the cross-covariogram (when the latter exists) is

γ_XY(h) = C_XY(0) − (1/2)[C_XY(h) + C_XY(−h)].   (1.15)
Clark et al. (1989) defined the pseudo cross-variogram as

γᶜ_YX(h) = (1/2) E[Y(s + h) − X(s)]².

Myers (1991) proposed the more general definition of the pseudo cross-variogram γᵖ_YX(·) as half of the variance of the difference:

γᵖ_YX(h) = (1/2) var[Y(s + h) − X(s)].

If the means differ, then γᶜ_YX(h) = γᵖ_YX(h) + (μ_X − μ_Y)². In contrast to the usual cross-variogram, the pseudo cross-variogram is not an even function; that is, in general, γᵖ_YX(h) ≠ γᵖ_YX(−h).

Proposition 1.1 Let {X(s)} and {Y(s)} be two intrinsically stationary processes. Then, the following three identities hold:

(i) γᵖ_XY(h) = γᵖ_YX(−h).
(ii) γᵖ_YX(h) = (1/2)C_X(0) + (1/2)C_Y(0) − C_XY(h).
(iii) γ_YX(h) = (1/2)γᵖ_YX(h) + (1/2)γᵖ_XY(h) − γᵖ_YX(0).
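The identity γ_YX(h) = (1/2)γᵖ_YX(h) + (1/2)γᵖ_XY(h) − γᵖ_YX(0), relating the cross-variogram to the pseudo cross-variogram, can be checked empirically on simulated jointly stationary series; the common-factor construction and the lag are illustrative choices:

```r
# Monte Carlo check of the cross-variogram / pseudo cross-variogram identity
set.seed(4)
n <- 1e5; h <- 1
f <- rnorm(n + h)
X <- f[1:n] + 0.5 * rnorm(n)             # two series driven by a common factor
Y <- f[(1 + h):(n + h)] + 0.5 * rnorm(n)
cross    <- mean((X[(1 + h):n] - X[1:(n - h)]) *
                 (Y[(1 + h):n] - Y[1:(n - h)])) / 2
pseudoYX <- mean((Y[(1 + h):n] - X[1:(n - h)])^2) / 2
pseudoXY <- mean((X[(1 + h):n] - Y[1:(n - h)])^2) / 2
pseudo0  <- mean((Y - X)^2) / 2
c(cross, (pseudoYX + pseudoXY) / 2 - pseudo0)  # approximately equal
```

For this construction the common value is close to −0.5; the pseudo cross-variograms at lags h and −h differ, illustrating that they are not even functions.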
Proof The proof of (i) is similar to that of (1.12). To prove (ii), note that γᵖ_YX(h) = (1/2)var[Y(s + h)] + (1/2)var[X(s)] − cov(Y(s + h), X(s)). The proof of (iii) is left to the reader (see Papritz et al. 1993).

Proposition 1.1 (iii) states that the cross-variogram can be expressed in terms of the pseudo cross-variogram. The converse is not true: to recover the pseudo cross-variogram from the cross-variogram, more restrictive stationarity conditions are required. We recall that, in general, the cross-covariogram is not an even function, whereas the cross-variogram satisfies γ_XY(0) = 0 and is an even function. Chiles and Delfiner (1999, pp. 328-329) provide a decomposition of the cross-covariogram into even and odd parts as follows:

C_XY(h) = (1/2)[C_XY(h) + C_XY(−h)] + (1/2)[C_XY(h) − C_XY(−h)].
Papritz et al. (1993) investigated the relationship between the slopes of the variogram and pseudo cross-variograms. By expanding the term var[Y (s + h) − X (s + h) + X (s + h) − X (s)], one obtains p
p
γY X (h) = γY X (0) + γ X (h) + cov[Y (s + h) − X (s + h), X (s + h) − X (s)]. (1.16) By dividing Eq. (1.16) by γ X (h) and applying the limit to both sides of the equation, we obtain p
p
γY X (h) γ (0) γ X (h) = lim Y X + lim ||h||→∞ γ X (h) ||h||→∞ γ X (h) ||h||→∞ γ X (h) lim
+ lim
||h||→∞
cov[Y (s + h) − X (s + h), X (s + h) − X (s)] . γ X (h) γ
p
(0)
γ X (h) is unbounded; then, lim||h||→∞ γYXX(h) = 0 and lim||h||→∞ γγ XX (h) = 1. (h) Using the Cauchy-Schwarz inequality, it is easy to establish that p 0 ≤ |cov[Y (s + h) − X (s + h), X (s + h) − X (s)]| ≤ 2 γY X (0)γ X (h).
Thus,

0 ≤ lim_{∥h∥→∞} |cov[Y(s + h) − X(s + h), X(s + h) − X(s)]|/γ_X(h) ≤ lim_{∥h∥→∞} 2√(γᵖ_YX(0)γ_X(h))/γ_X(h) = lim_{∥h∥→∞} 2√(γᵖ_YX(0)/γ_X(h)) = 0.

Hence, lim_{∥h∥→∞} γᵖ_YX(h)/γ_X(h) = 1. This implies that the slopes of γᵖ_YX(h) and γ_X(h) must be equal for large lag distances. Similarly, the variograms γ_X(h), γ_Y(h), γ_YX(h), and γᵖ_YX(h) must have equal slopes for large lag distances.
1.5.6 Image Analysis

There is an obvious difference in the treatment of images between spatial statistics and image processing. On the one hand, an image is considered to be a set of gray intensities that are realizations of a random process observed on the plane. On the other hand, an image is a vector corresponding to a vectorization of such gray intensities. Undoubtedly, treating an image as a vector has several advantages, such as the simplicity of the calculations and the efficiency of the computational implementations of certain algorithms. However, it may be more convenient to treat the image as a two-dimensional array of points because this structure maintains the information about relative positions and neighbors. In this section, we consider both notations according to the definitions given by Vallejos et al. (2016). Henceforth, we let R₊ denote the nonnegative real line, and we let R₊ᴺ denote the set of N-dimensional vectors with nonnegative components. An image is considered to be an element x ∈ R₊ᴺ. Alternatively, assume that the finite set of gray intensities {X(i, j) : 1 ≤ i ≤ n, 1 ≤ j ≤ m} can be arranged into an n × m matrix X such that the (i, j)th element is X(i, j), i.e., X ∈ M_{n×m}(R). Then, the vectorization of X is given by x = vec(X) = (x1, ..., xN)ᵀ. There is a variety of available routines and packages to analyze images in R. The package png has a routine to load images. For instance, to display the Rlogo_old.png image (an RGB image from our book website), we can use the following R code.

> library(png)
> library(RCurl)  # package to handle urls
> url <- "http://srb2gv.mat.utfsm.cl/files/img/Rlogo_old.png"
> img <- ...
> data(murray)  # load 'murray' dataset
> xpos <- murray$xpos
> ypos <- murray$ypos
> rad.As <- ...
> symbols(xpos, ypos, circles = rad.As, inches = 0.35,
+   bg = "gray", xlab = "", ylab = "")
> rad.Pb <- ...
> symbols(xpos, ypos, circles = rad.Pb, inches = 0.35,
+   bg = "gray", xlab = "", ylab = "")
This kind of analysis is crucial from an environmental perspective, as it allows efforts to be focused on corrective actions such as soil treatment or the removal of the affected area. Here are the commands needed to carry out the modified t-test for the Murray smelter site dataset.

# R code to compute the modified t-test
> coords <- murray[c("xpos", "ypos")]
> x <- murray$As
> y <- murray$Pb
> murray.test <- modified.ttest(x, y, coords)
> class(murray.test)
[1] "mod.ttest"
2 The Modified t Test
The results reported by modified.ttest have a format inspired by the t.test routine (available in the R package stats), which performs the standard t-test. The following is a typical output of the modified t-test.

> murray.test  # output for the modified t-test
Corrected Pearson's correlation for spatial autocorrelation
data: x and y ; coordinates: xpos and ypos
F-statistic: 81.949 on 1 and 154.0617 DF, p-value: 0
alternative hypothesis: true autocorrelation is not equal to 0
sample correlation: 0.5893
The information displayed when printing an object of class mod.ttest yields F = 81.9490, and the degrees of freedom of the F distribution for the numerator and denominator are 1 and 154.0617, respectively. In addition, the p-value is 0 and the sample correlation coefficient is r_XY = 0.5893. Thus, the null hypothesis of no spatial association between the processes is rejected at a 5% level of significance. Furthermore, the reduction in the sample size is 38.32%; indeed, we have M̂ = 156.0617, pointing out a strong spatial association between the random processes associated with the As and Pb measurements. In Chap. 5 we will introduce another technique to measure the association between those processes that enables us to obtain additional information about the range and direction in which the processes are correlated. The code summary(murray.test) provides further information, such as the upper boundaries for each of the thirteen (default) bins used in the computation of the modified t-test; for each class, the estimated Moran coefficient is also given for both variables (As and Pb).

> summary(murray.test)  # additional info using summary()
Corrected Pearson's correlation for spatial autocorrelation
data: x and y ; coordinates: xpos and ypos
F-statistic: 81.949 on 1 and 154.0617 DF, p-value: 0
alternative hypothesis: true autocorrelation is not equal to 0
sample correlation: 0.5893

   Upper Bounds Cardinality  Moran: x  Moran: y
1         424.4        1625  0.168696  0.190601
2         848.8        3590  0.056656  0.042300
3        1273.2        4651  0.003615  0.001929
4        1697.6        5061 -0.025936 -0.008734
5        2122.0        5181 -0.035670 -0.035700
6        2546.4        4274 -0.047373 -0.025357
7        2970.8        3285 -0.036903 -0.031196
8        3395.2        1925 -0.038192 -0.063536
9        3819.6        1194  0.019267 -0.061597
10       4244.0         635  0.059552  0.032902
11       4668.4         300  0.066867  0.013618
12       5092.8         129  0.073100  0.080474
13       5517.2          28  0.076299  0.123326
2.4 Applications and R Computations
Fig. 2.2 First band for a old and b new R logos
We remark that this dataset will be considered again when we describe a nonparametric way of addressing the association between two random fields based on ranks (see Chap. 4).
2.4.2 Application 2: Modified t-Test Between Images

We recall from the previous chapter that this methodology can easily be used to measure the association between two digital images. Quantifying the association and the degree of similarity between two images is a challenging problem; some methodologies will be discussed in Chap. 8. Here, we develop a methodology under the assumption that both stochastic processes X(s) and Y(s) are Gaussian random fields. We will not go over a goodness-of-fit test to check the normality assumption, which might not be supported by the data; consequently, the results discussed in this chapter must be considered with certain caution. An image can be understood as a realization of a stochastic process defined on a regular grid. Here, we carry out a comparison between the current version of the R logo, introduced in February 2016, and the previous version of the logo used in the period 2000-2016. The images considered in our analysis can be found on the website http://srb2gv.mat.utfsm.cl/datasets.html. The following R code yields Fig. 2.2 and the rectangular grid used in the computation of the modified t-test.

> library(png)
> library(RCurl)  # package to handle urls
> url <- ...

# create the regular grid
> rows <- ...
> cols <- ...

# plot R logos (Figure 2.2)
> par(pty = "s", mfrow = c(1, 2))
> image(t(oldRlogo[rev(rows), cols]), col = gray((0:32) / 32),
+   xaxt = "n", yaxt = "n")
> box()
> image(t(newRlogo[rev(rows), cols]), col = gray((0:32) / 32),
+   xaxt = "n", yaxt = "n")
> box()
In Chap. 1, it was emphasized that PNG images have several channels. In our analysis, we consider only the first band; without loss of generality, the rest of the bands can be analyzed with the same methodology. We skip them only because the computation of the test can be computationally expensive: the size of the images is 561 × 724, so the total number of pixels is n = 406,164. One interesting feature of the routine modified.ttest is that it does not require the storage of the n(n − 1)/2 distances; in our example, objects of 10.31 gigabytes would otherwise have to be allocated (one for each of the K bins). The following R code computes the modified t-test for the two R logos described above.

# coercing images into vectors
> x <- ...
> y <- ...
> logo.test <- ...
> logo.test
Corrected Pearson's correlation for spatial autocorrelation
data: x and y ; coordinates: xpos and ypos
F-statistic: 46.5685 on 1 and 117.0716 DF, p-value: 0
alternative hypothesis: true autocorrelation is not equal to 0
sample correlation: 0.5335
As expected, the hypothesis of no correlation between the old and new logos (Fig. 2.2a, b) is rejected. The sample correlation coefficient is r_XY = 0.5335, while the effective sample size is M̂ = 119.0716, yielding the statistic F = 46.5685 with 1 and 117.0716 degrees of freedom. We observe an enormous reduction of the effective sample size, 99.97%, which contributes to the rejection of H0 : ρ_XY = 0 and confirms the strong spatial association between the processes. Finally, we stress that the time required to carry out the modified t-test was 5 h and 30 min on a PC with an Intel Core i5 2.50 GHz × 4 processor and 8 GB of RAM.
2.5 A Permutation Test Under Spatial Correlation

Motivated by the difficulty of assessing the normality assumption of a random field, Viladomat et al. (2014) introduced a permutation-type test (see, for instance, Boos and Stefanski 2013, Chap. 12) to address the hypotheses described in (2.1) while keeping the distributional assumptions to a minimum. The procedure is quite intensive from a computational perspective. Although our discussion of this method is strongly based on the previously introduced modified t-test, the methodology can be adapted to any other test statistic. The main difficulty lies in recovering the inherent dependency of the spatial process. The procedure is based on randomly permuting the set of spatial locations {s1, ..., sn}; these permutations will be denoted as π(s). The interest in applying a permutation to the observed values of the process X(s) is due to the fact that X(π(s)) and Y(s) are independent. This allows the estimation of the distribution of the sample correlation coefficient r_XY under the null hypothesis H0 : ρ_XY = 0. One challenging aspect is to reconstruct the actual dependence that has been removed by the permutation. In order to keep the distributional assumptions to a minimum, Viladomat et al. (2014) considered Nadaraya-Watson type estimators to build an instrumental process Z(s) such that its estimated semivariogram is the best fit to the empirical semivariogram of the original process X(s). Based on a Monte Carlo simulation study, Viladomat et al. (2014) provided empirical evidence that X(s) and Z(s) have approximately the same correlation structure. Algorithm 2.5.1 describes the steps of the proposed randomized test and constructs B datasets (Z1, Y), ..., (ZB, Y).
Through the calculation of the sample correlation coefficients ri = cor(Zi, Y), a Monte Carlo estimate of the p-value associated with the hypothesis (2.1) can be obtained as

p̂ = (1/B) Σ_{i=1}^B I(|ri| > |r_XY|),   (2.12)
where r_XY represents the sample correlation coefficient for the observed processes X and Y, and I(·) denotes the indicator function. Once the random permutation has been carried out, the stage in which the spatial dependence is recovered is described by Algorithm 2.5.2. This procedure approximates the semivariogram of the original process X(s), which is estimated using a nonparametric alternative proposed by García-Soidán et al. (2004) for isotropic processes, defined through

γ̂_target(k) = Σ_{i=1}^n Σ_{j=1}^n w_ij (X(si) − X(sj))² / (2 Σ_{i=1}^n Σ_{j=1}^n w_ij),  k ≥ 0,   (2.13)
2 The Modified t Test
where w_{ij} = K_1((k - \|s_i - s_j\|)/h), K_1(·) is a one-dimensional kernel function, and h denotes the bandwidth. In particular, Viladomat et al. (2014) suggested using an adapted Gaussian kernel of the form K_1(u) = exp(-(2.68u)^2/2), such that its quantiles lie in the interval ±0.25h.

Algorithm 2.5.1: Permutation test under spatial correlation
input : Two spatial processes X and Y, the set of coordinates S_loc = {s1, ..., sn} ⊂ D ⊂ R², and the number of bootstrap replicates B.
output: Vector of sample correlations r = (r1, ..., rB) and the estimated p-value.
begin
    for i ∈ {1, ..., B} do
        Randomly permute the set of coordinates, π(S_loc)
        Apply the permutation to the process X, that is, X_π ← X(π(S_loc))
        Invoke Algorithm 2.5.2 to construct Z_i by smoothing the process X_π
        Compute the sample correlation coefficient, r_i ← cor(Z_i, Y)
    end
    Obtain p̂ using Equation (2.12)
    return r = (r1, ..., rB) and p̂
end
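The loop above can be sketched compactly. The snippet below is an illustrative Python translation (the book's implementations are in R); the `smooth` argument is a hypothetical stand-in for Algorithm 2.5.2, and when it is omitted the permuted values are used as-is, which yields a naive permutation test that ignores the spatial autocorrelation.

```python
import numpy as np

def permutation_pvalue(x, y, B=1000, smooth=None, rng=None):
    """Monte Carlo p-value of Eq. (2.12).

    `smooth` is a placeholder for Algorithm 2.5.2: it should map the
    permuted process onto one sharing the covariance structure of X.
    When omitted, the permuted values are used directly (a naive test
    that ignores spatial autocorrelation)."""
    rng = np.random.default_rng(rng)
    x = np.asarray(x, float)
    y = np.asarray(y, float)
    r_xy = np.corrcoef(x, y)[0, 1]              # observed correlation r_XY
    r = np.empty(B)
    for i in range(B):
        x_pi = rng.permutation(x)               # X(pi(s)), independent of Y(s)
        z = smooth(x_pi) if smooth is not None else x_pi
        r[i] = np.corrcoef(z, y)[0, 1]          # r_i = cor(Z_i, Y)
    p = float(np.mean(np.abs(r) > abs(r_xy)))   # Eq. (2.12)
    return r, p, r_xy
```

Since the permuted correlations can never exceed the perfect correlation of a process with itself, the estimated p-value is 0 when X and Y are identical, mirroring the behavior of the test under strong association.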
The construction of the smoothed process X_δ in Algorithm 2.5.2 is based on fitting a local regression model (Loader 1999) adapted to the spatial context. This means that for each of the locations s1, ..., sn we need to obtain

\[ X_\delta(s) = \frac{\sum_{\|s - s_i\| \le \delta} K_1(\|s - s_i\|/\delta)\, X_\pi(s_i)}{\sum_{\|s - s_i\| \le \delta} K_1(\|s - s_i\|/\delta)}, \tag{2.14} \]

where δ denotes the proportion of nearest neighbors considered in the fit and K_1(·) is a one-dimensional kernel function. In their numerical experiments, Viladomat et al. (2014) considered a Gaussian kernel for the fitted process in (2.14) and recommended using B = 1000 bootstrap replicates. They also recommended varying the tuning parameter δ, allowing the user to choose a grid of bandwidth values, called Δ. The goal of the least squares fit in Algorithm 2.5.2 is to provide a smoothed process whose semivariogram is closest to the target semivariogram given by γ̂_target; note that γ̂_target is an input of the algorithm and thus needs to be computed only once. Other recommendations for obtaining bootstrap samples in the context of spatial data can be found in García-Soidán et al. (2014). An outstanding feature of the permutation test proposed by Viladomat et al. (2014) is that a slight adaptation of the previously described algorithms leads to the local hypothesis testing problem H0j: ρ_XY(s_j) = 0, for an arbitrary location s_j ∈ D. Therefore, this methodology allows the construction of a map of p-values that identifies those regions where the correlation between the two processes X(s) and Y(s) may be strong.
Algorithm 2.5.2: Matching semivariograms
input : Permuted process X_π, set of coordinates S_loc = {s1, ..., sn} ⊂ D ⊂ R², estimated semivariogram γ̂_target(k) for the original process X, and Δ: the set of proportions of neighbors for the smoothing stage.
output: Smoothed process Z
begin
    foreach δ ∈ Δ do
        Construct the smoothed process X_δ through (2.14)
        Based on the process X_δ, compute γ̂_δ using (2.13)
        Obtain the coefficients (α̂_δ, β̂_δ) from the least squares regression of γ̂_target on γ̂_δ and store the residuals e_δ
    end
    Choose δ* ∈ Δ such that the residual sum of squares RSS_δ = ||e_δ||² is minimized
    Set Z ← |β̂_δ*|^{1/2} X_δ* + |α̂_δ*|^{1/2} U, with U ∼ N(0, I) simulated independently
    return Z
end
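A minimal sketch of the matching step, again in Python for illustration only. A binned Matheron-type semivariogram stands in for the kernel estimator (2.13), and the bandwidth is derived from the neighbor proportion δ as a fraction of the maximum inter-point distance; both simplifications are assumptions of this sketch, not part of the authors' implementation.

```python
import numpy as np

def semivariogram(z, d, lags, tol=0.5):
    """Binned empirical semivariogram, standing in for Eq. (2.13)."""
    sq = 0.5 * (z[:, None] - z[None, :]) ** 2
    return np.array([sq[(np.abs(d - k) <= tol) & (d > 0)].mean() for k in lags])

def match_semivariograms(x_pi, coords, gamma_target, lags, deltas, tol=0.5, rng=None):
    """Sketch of Algorithm 2.5.2: smooth the permuted process and add
    white noise so the result's semivariogram approximates gamma_target."""
    rng = np.random.default_rng(rng)
    d = np.linalg.norm(coords[:, None] - coords[None, :], axis=-1)
    best = None
    for delta in deltas:
        h = delta * d.max()                       # bandwidth from the proportion delta
        w = np.exp(-0.5 * (2.68 * d / h) ** 2)    # adapted Gaussian kernel K1
        x_delta = w @ x_pi / w.sum(axis=1)        # local-average smoother, cf. (2.14)
        g_delta = semivariogram(x_delta, d, lags, tol)
        A = np.column_stack([np.ones_like(g_delta), g_delta])
        coef, *_ = np.linalg.lstsq(A, gamma_target, rcond=None)  # gamma_target ~ a + b*g_delta
        rss = np.sum((gamma_target - A @ coef) ** 2)
        if best is None or rss < best[0]:
            best = (rss, coef, x_delta)
    _, (a, b), x_star = best
    # Z = |b|^(1/2) X_delta* + |a|^(1/2) U, so that gamma_Z ~ b*gamma_delta + a
    return np.sqrt(abs(b)) * x_star + np.sqrt(abs(a)) * rng.standard_normal(x_pi.size)
```

The final line exploits that variograms add linearly: the smoothed process contributes b·γ̂_δ and the white noise contributes the constant a at nonzero lags.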
2.5.1 Application 3: Permutation t-Test Between Images

We illustrate the nonparametric test based on permutations by revisiting the images analyzed in Sect. 2.4.2. As indicated above, this is a highly computationally intensive procedure. To alleviate this drawback, Viladomat et al. (2014) provided code that takes advantage of multicore functionality to parallelize some calculations (Calaway et al. 2017). Although an underlying motivation for this nonparametric test is the analysis of medium-sized datasets, we must keep in mind that the images in the experiment with the R logos are of dimensions 561 × 724; hence the sample size is n = 406,164, for which the memory requirements are excessive for today's personal computers. Therefore, the images were scaled to 10% of their original size, yielding images of size 56 × 72 and a new sample size of n = 4032. For comparison purposes we ran the modified t-test available in the R SpatialPack package, and both procedures lead to equivalent results. Using the procedure implemented in the routine modified.ttest for the scaled images, the sample correlation coefficient is r_XY = 0.5758, with test statistic F = 47.1408, which should be compared with a quantile of the F distribution with 1 and 95.0609 degrees of freedom. Thus we reject the null hypothesis H0: ρ_XY = 0 with a p-value of 0. We also applied the methodology proposed by Viladomat et al. (2014) with B = 1000 and bandwidths Δ = (0.1, 0.2, ..., 0.9), which leads to the estimated global p-value

\[ \widehat{p} = \frac{1}{B}\sum_{i=1}^{B} I(|r_i| > 0.5758) = 0. \]
Fig. 2.3 Map of p-values for the local correlations between the R logos
This is in agreement with the results based on the procedure proposed by Clifford et al. (1989). Furthermore, the map of p-values for the correlations between the R logos was obtained, showing a strong correlation between the images, as expected (see Fig. 2.3). For this dataset, the results obtained using the modified t-test and the nonparametric permutation test are equivalent. In fact, Viladomat et al. (2014) reported a simulation study in which the performance of Algorithms 2.5.1 and 2.5.2 is quite similar to that of the modified t-test, with a slight advantage of these algorithms in the power function for very specific scenarios. The methodology proposed by Viladomat et al. (2014) does not offer a mechanism for estimating the effective sample size. Additionally, for our numerical experiment, the modified.ttest routine took 2.17 s, while the code in Viladomat et al. (2014) took 4 min and 30 s. Because these implementations are written in languages with rather different characteristics (compiled C versus interpreted R), the user times are not directly comparable, but they give a crude idea of the performance of the algorithms in a practical context.
2.6 Assessing Correlation Between One Process and Several Others

In the context of spatial data, it is common to measure several processes of interest. In particular, a situation of practical importance involves determining whether a spatial process is correlated with several other processes, when all processes have been observed at the same locations in space. In this section, we present an interesting extension of the modified t-test which takes advantage of the relationship between the correlation coefficient and the coefficient of determination in the context of linear
regression, allowing the reuse of the methodology proposed by Dutilleul (1993) for the estimation of the effective sample size. Suppose that Z(s) = (Y(s), X(s)')' is a multivariate process, where s ∈ D, with X(s) = (X_1(s), ..., X_q(s))'. We assume that Z(s) is a multivariate stationary Gaussian process of dimension (q + 1). Based on the set of observations (Y(s_1), X(s_1)')', ..., (Y(s_n), X(s_n)')', the goal is to test the hypotheses

\[ H_0: \rho_{YX} = 0 \quad \text{against} \quad H_1: \rho_{YX} \ne 0, \tag{2.15} \]

where ρ_YX represents the multiple correlation coefficient (Anderson 2003, p. 38), whose sample version is

\[ R_{YX} = \left( \frac{ \mathbf{s}_{YX}' \mathbf{S}_{XX}^{-1} \mathbf{s}_{YX} }{ s_Y^2 } \right)^{1/2}. \]

Here we denote the sample covariance matrix of the observations of Z(s) as

\[ \mathbf{S}_{ZZ} = \begin{pmatrix} s_Y^2 & \mathbf{s}_{YX}' \\ \mathbf{s}_{YX} & \mathbf{S}_{XX} \end{pmatrix}. \]
Note that the definitions of the elements of S_ZZ are similar to those presented in Eqs. (2.4) and (2.5). In the context of multivariate analysis under normality, the hypothesis test defined in (2.15) has been studied by a number of authors (see, for example, Anderson 2003, Sect. 4.4.2), from which we know that, under H0, R²_YX ∼ Beta(q/2, (n − q − 1)/2). However, to test whether a spatial process is correlated with several others, Dutilleul et al. (2008) followed a procedure similar to that used by Dutilleul (1993) in the derivation of the modified t-test (see Eqs. 2.2 and 2.3). Thus, they proposed the test statistic

\[ F = \frac{R_{YX}^2/q}{(1 - R_{YX}^2)/(M - q - 1)}, \tag{2.16} \]

where the effective sample size is computed through M = 1 + q/E[R²_YX], and the distribution of the statistic in (2.16) under H0: ρ_YX = 0 can be approximated by an F distribution with q and M − q − 1 degrees of freedom. In order to estimate the effective sample size, Dutilleul et al. (2008) exploited the connection between the multiple correlation coefficient and the sample correlation coefficient between the observations of the process Y(s) and the predicted process Ŷ(s), say r_{YŶ}. As a matter of fact, R²_YX = r²_{YŶ}. This allows the use of the assumptions given in
Sect. 2.3 about the processes Y(s) and Ŷ(s) for estimating the effective sample size using the results available in Dutilleul (1993). Then

\[ \widehat{M} = 1 + 1/\widehat{E}[r_{Y\widehat{Y}}^2] = 1 + \frac{\operatorname{tr}(P\widehat{\Sigma}_Y)\operatorname{tr}(P\widehat{\Sigma}_{\widehat{Y}})}{\operatorname{tr}(P\widehat{\Sigma}_Y P\widehat{\Sigma}_{\widehat{Y}})}, \]

where Σ̂_Y and Σ̂_Ŷ can be estimated as described in Sect. 2.3. Thus, the distribution of the test statistic F given in (2.16) under H0 can be approximated by an F(q, M̂ − q − 1) distribution. This means that, for a hypothesis testing problem with level α, the critical region associated with the modified F-test is

\[ \frac{\widehat{M} - q - 1}{q}\,\frac{R_{YX}^2}{1 - R_{YX}^2} > F_{1-\alpha}(q, \widehat{M} - q - 1), \]

where F_{1−α}(q, M̂ − q − 1) denotes the upper quantile of order (1 − α)100% of the F distribution with q and M̂ − q − 1 degrees of freedom. The performance of this test was studied by Dutilleul et al. (2008) through a Monte Carlo simulation study.
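The statistic (2.16) is straightforward to compute once the (effective) sample size is available. The sketch below is illustrative Python, not SpatialPack's C implementation; M must be supplied externally (e.g., the estimate M̂).

```python
import numpy as np

def multiple_correlation_ftest(y, X, M):
    """Squared multiple correlation R^2_{YX} and the statistic of Eq. (2.16).

    M plays the role of the (effective) sample size; with M = n this is
    the classical F test for the multiple correlation coefficient."""
    y = np.asarray(y, float)
    X = np.asarray(X, float)
    n, q = X.shape
    S = np.cov(np.column_stack([y, X]), rowvar=False)   # covariance of Z = (Y, X')'
    s_yy, s_yx, S_xx = S[0, 0], S[0, 1:], S[1:, 1:]
    R2 = s_yx @ np.linalg.solve(S_xx, s_yx) / s_yy      # R^2_{YX}
    F = (R2 / q) / ((1.0 - R2) / (M - q - 1))           # Eq. (2.16)
    return R2, F
```

Note that R²_YX computed this way coincides with the coefficient of determination of an ordinary least squares fit of Y on X with intercept, which is the connection R²_YX = r²_{YŶ} exploited above.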
2.6.1 Application 4: Pinus Radiata Dataset Revisited

In Chap. 1 the pinus radiata dataset was used to motivate the need for measures that quantify the relationships between spatial variables. Another reason to study this type of variable is the abundance of natural resources. To exemplify the methodology described in this section, we will assume that the height of the trees (Y) can be influenced by the basal area (X1), the elevation (X2), and the slope (X3) of the terrain. In Cuevas et al. (2013) this dataset was used to study the relationship between pairs of variables using a nonparametric approach based on kernels, a proposal that will be discussed in detail in Chap. 6. The modified.Ftest routine available in the package SpatialPack implements the methodology proposed by Dutilleul et al. (2008), taking advantage of the C code underlying the routine modified.ttest. The following code fragment in R presents the results of the modified F-test for the pinus radiata dataset.

# load the pinus radiata dataset
> data(radiata)
# defining the response and predictor variables
> y <- ...
> x <- ...
> coords <- radiata[c("xpos", "ypos")]
# computing the modified F-test for spatial association
> radiata.test <- modified.Ftest(...)
> radiata.test$ESS
[1] 83.46113
An interesting piece of information returned as an element of the object created when the function modified.Ftest is invoked is the effective sample size, in this case M̂ = 83.4611. Even though the display of the function modified.Ftest is strongly inspired by the implementation used for the modified t-test, there is a slight difference. Internally, the routine fits a linear regression model (without intercept) so that the sample correlation coefficient between the response and the values predicted by the regression model is computed; here r_{YŶ} = 0.5917. Now, since R²_YX = r²_{YŶ}, evaluating (2.16) we have that

\[ F = \frac{83.4611 - 3 - 1}{3} \times \frac{0.5917^2}{1 - 0.5917^2} = 14.2708, \]

which must be compared with an upper quantile of the F distribution with 3 and 79.4611 degrees of freedom. As expected (see also Cuevas et al. 2013), the hypothesis H0: ρ_YX = 0 is rejected, indicating the presence of spatial association. More detailed information regarding the modified F-test for the pinus radiata variables can be obtained using the command:

# detailed output for the modified F-test
> summary(radiata.test)
Multiple correlation for assessing spatial autocorrelation
F-statistic: 14.2708 on 3 and 79.4611 DF, p-value: 0
alternative hypothesis: true multiple correlation is not equal to 0
sample correlation: 0.5917
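The reported F statistic can be reproduced directly from the quantities above; a quick arithmetic check (in Python, for illustration):

```python
# Reproducing the F statistic of the radiata example from the reported
# quantities: M-hat = 83.4611, q = 3 covariates, r_{Y,Yhat} = 0.5917.
M_hat, q, r = 83.4611, 3, 0.5917
R2 = r ** 2                                  # R^2_{YX} = r^2_{Y,Yhat}
F = (M_hat - q - 1) / q * R2 / (1 - R2)      # Eq. (2.16)
print(round(F, 4))  # agrees with the reported 14.2708 up to rounding of r
```

The small residual discrepancy comes from r being reported to only four decimal places.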
    Upper Bounds  Cardinality  Moran: y  Moran: predicted
 1         540.2         4768   0.51553         0.2404995
 2        1080.5        12697   0.24120         0.0576893
 3        1620.7        16823  -0.02857         0.0007995
 4        2161.0        18405  -0.13779        -0.0324859
 5        2701.2        17369  -0.02283        -0.0437018
 6        3241.5        14504   0.10583         0.0042594
 7        3781.7        11083   0.03649         0.0271991
 8        4322.0         7054  -0.21031        -0.0496320
 9        4862.2         3604  -0.40335        -0.1882064
10        5402.4         1905  -0.31022        -0.1053134
11        5942.7          786  -0.65077         0.0903762
12        6482.9          229  -0.81111         0.1518159
13        7023.2           51  -1.08289        -0.1897675
where, similarly to the modified t-test, when the summary function is applied to an object of class mod.Ftest, the result contains the upper bounds, the cardinalities, and the Moran indices, which allow one to obtain estimates of the covariance structures Σ̂_Y and Σ̂_Ŷ, respectively.
2.7 Problems for the Reader

2.1 Let R be a random variable with probability density function given by (2.6). Prove that E[R] = 1/(M − 1).

2.2 Suppose that Z ∼ N_p(μ, Ω) and let Q = Z'AZ be a quadratic form with A a symmetric matrix. Compute var[Z'AZ].

2.3 Verify Eq. (2.11).

2.4 Suppose that the processes X(s) and Y(s) have covariance matrices Σ_X = σ²_X[(1 − ρ_X)I + ρ_X J] and Σ_Y = σ²_Y[(1 − ρ_Y)I + ρ_Y J], respectively. Show that M̂ = n.

2.5 Consider the regression model Y = xβ + ε, where x ∈ Rⁿ, and assume that E[ε] = 0, cov[ε] = σ²I_n. Show that cor(Y, Ŷ) = cor(Y, x).

2.6 For the logos dataset in R (see Application 2, Sect. 2.4.2):
a. Generate a sample of size 300, without replacement, from the 561 × 724 = 406,164 observations and apply the modified F-test using as response variable the old R logo image, combining the 3 bands via the function as.raster, which is available in R. Use each band of the new R logo as a covariate.
b. Repeat the experiment described in a. 500 times and compute the empirical size associated with the modified F-test.
References Anderson, T. W. (2003). An introduction to multivariate statistical analysis (3rd ed.). New York: Wiley. Boos, D. D., & Stefanski, L. A. (2013). Essential statistical inference: Theory and methods. New York: Springer. Calaway, R., Revolution Analytics, & Weston, S. (2017). doMC: Foreach parallel adaptor for ‘parallel’. Package version 1.3.5. https://CRAN.R-project.org/package=doMC. Clifford, P., & Richardson, S. (1985). Testing the association between two spatial processes. Statistics and Decisions, Supp. Issue 2, 155–160. Clifford, P., Richardson, S., & Hémon, D. (1989). Assessing the significance of the correlation between two spatial processes. Biometrics, 45, 123–134. Cuevas, F., Porcu, E., & Vallejos, R. (2013). Study of spatial relationships between two sets of variables: A nonparametric approach. Journal of Nonparametric Statistics, 25, 695–714. Dutilleul, P. (1993). Modifying the t test for assessing the correlation between two spatial processes. Biometrics, 49, 305–314. Dutilleul, P., Pelletier, B., & Alpargu, G. (2008). Modified F tests for assessing the multiple correlation between one spatial process and several others. Journal of Statistical Planning and Inference, 138, 1402–1415. García-Soidán, P., Febrero, M., & Gonzáles, W. (2004). Nonparametric kernel estimation of an isotropic variogram. Journal of Statistical Planning and Inference, 121, 65–92.
García-Soidán, P., Menezes, R., & Rubiños, O. (2014). Bootstrap approaches for spatial data. Stochastic Environmental Research and Risk Assessment, 28, 1207–1219. Loader, C. (1999). Local regression and likelihood. New York: Springer. Moran, P. A. P. (1950). Notes on continuous stochastic phenomena. Biometrika, 37, 17–23. Richardson, S., & Clifford, P. (1991). Testing association between spatial processes. In A. Possolo (Ed.), Spatial statistics and imaging. Lecture Notes (Vol. 20, pp. 295–308). California, Hayward: Institute of Mathematical Statistics. Schott, J. R. (1997). Matrix analysis for statistics. New York: Wiley. Viladomat, J., Mazumder, R., McInturff, A., McCauley, D. J., & Hastie, T. (2014). Assessing the significance of global and local correlations under spatial autocorrelation: A nonparametric approach. Biometrics, 70, 409–418.
Chapter 3
A Parametric Test Based on Maximum Likelihood
3.1 Introduction

Assessing the significance of the correlation between the components of a bivariate random field is of great interest in the analysis of spatial data. In this chapter, testing the association between two georeferenced correlated variables is addressed for the components of a bivariate Gaussian random field, using the asymptotic distribution of the ML estimator of a general parametric class of bivariate covariance models. The study of the asymptotic properties of the ML estimators is complicated in this case because more than one asymptotic framework can be considered when observing a single realization. In particular, under infill (Cressie 1993) or fixed domain asymptotics, one supposes that the sampling domain is bounded and that the sampling set becomes increasingly dense. Under increasing domain asymptotics, the sampling domain increases with the number of observed data, and the distance between any two sampling locations is bounded away from zero. The asymptotic behavior of maximum likelihood estimators of the covariance parameters can be quite different under these two frameworks (Zhang and Zimmerman 2005). In this chapter, we present a parametric test based on the increasing domain asymptotics setting for a general class of bivariate covariance models. For a specific model, we also present a parametric test based on the fixed domain setting.
3.1.1 Parametric Bivariate Covariance Models

In Chap. 1 we denoted the cross-covariance function for two processes X and Y as C_XY. In order to simplify the notation for a multivariate vector, here we will denote the cross-covariance function between two components of a vector as C_ij, for i, j = 1, ..., p.
Let Z_i = {Z_i(s), s ∈ A ⊆ R^d}, i = 1, 2, be two Gaussian random fields and Z_12 = {Z_12(s) = (Z_1(s), Z_2(s))', s ∈ A ⊆ R^d} be a bivariate Gaussian random field. The assumption of Gaussianity implies that the first and second moments uniquely determine the finite-dimensional distributions. In particular, we shall assume weak stationarity throughout, so that the mean vector μ = E(Z_12(s)) is constant; without loss of generality we assume μ = (0, 0)'. Under the second-order stationarity assumption, var(Z_i(s)), i = 1, 2, is bounded and the covariance function between Z_12(s_1) and Z_12(s_2), for any pair s_1, s_2 in the spatial domain, is represented by a mapping C: R^d → M_{2×2} defined through

\[ C(h) = \left[ C_{ij}(h) \right]_{i,j=1}^{2} = \left[ \operatorname{cov}(Z_i(s_1), Z_j(s_2)) \right]_{i,j=1}^{2}, \qquad h = s_1 - s_2 \in A. \tag{3.1} \]
The function C(h) is called the bivariate covariance function. Here, M_{2×2} is the set of 2 × 2 symmetric positive definite matrices. The functions C_ii(h), i = 1, 2, are the marginal covariance functions of the Gaussian random fields Z_i, i = 1, 2, while C_ij(h) is called the cross-covariance function between Z_i and Z_j, for i, j = 1, 2 and i ≠ j, at spatial lag h. We recall that the cross-covariance function is in general not symmetric, i.e., C_ij(h) ≠ C_ji(h), or equivalently C_ij(h) ≠ C_ij(−h), for i, j = 1, 2 and i ≠ j (Wackernagel 1978). In the univariate setting, semivariograms are often the main focus in geostatistics and are defined as the variance of contrasts. Similarly, in the bivariate setting, the semivariogram matrix function can be defined as

\[ \Gamma(h) = \left[ \gamma_{ij}(h) \right]_{i,j=1}^{2} = \frac{1}{2} \left[ \operatorname{cov}\!\left(Z_i(s_1) - Z_i(s_2),\; Z_j(s_1) - Z_j(s_2)\right) \right]_{i,j=1}^{2}. \]
Under weak stationarity, the relation between the (cross-)semivariogram and the (cross-)covariance is given by γ_ij(h) = C_ij(0) − 0.5(C_ij(h) + C_ji(h)). Hereafter in this chapter we assume symmetry, that is, C_ij(h) = C_ji(h) for j ≠ i, and in this case the (cross-)semivariogram simplifies to

\[ \gamma_{ij}(h) = C_{ij}(0) - C_{ij}(h), \qquad i, j = 1, 2. \tag{3.2} \]
The mapping K: R^d → M_{2×2} defined through K(h) = [K_ij(h)]_{i,j=1}^2 with

\[ K_{ij}(h) = \frac{C_{ij}(h)}{\sqrt{C_{ii}(0)\,C_{jj}(0)}} \]

is called the bivariate correlation function, K_ii(h) being the marginal correlation functions of the Gaussian random fields Z_i, i = 1, 2, K_12(h) being the cross-correlation function between the fields Z_1 and Z_2, and K_12(0) expressing the marginal correlation between the two components. The mapping C (and, consequently, K) must be positive definite, which means that, for a given realization Z_N = (Z'_{1;N}, Z'_{2;N})',
where Z_{k;N} = (Z_k(s_1), ..., Z_k(s_N))', k = 1, 2, the (2N) × (2N) covariance matrix Σ := [Σ_ij]_{i,j=1}^2 with Σ_ij = [C_ij(s_l − s_m)]_{l,m=1}^N is positive semidefinite. We shall assume throughout that the mapping C comes from a parametric family of bivariate covariances {C_θ(·), θ ∈ Θ ⊆ R^p}, with Θ an arbitrary parametric space. Recent literature has been engaged in offering new models for bivariate covariances; for a thorough review the reader is referred to Genton and Kleiber (2015) and their exhaustive list of references. One of them is the linear model of coregionalization, which has been popular for over thirty years. It consists of representing the bivariate Gaussian field as a linear combination of r = 1, 2 independent univariate fields. The resulting bivariate covariance function takes the form

\[ C_\theta(h) = \left[ \sum_{k=1}^{r} \psi_{ik}\psi_{jk}\, R_{k;\psi_k}(h) \right]_{i,j=1}^{2}, \tag{3.3} \]
with A := [ψ_lm]_{l,m=1}^{2,r} a 2 × r matrix with full rank, and R_{k;ψ_k}(h), k = 1, ..., r, univariate parametric correlation models. Clearly, we have θ = (vec(A)', ψ_1, ..., ψ_q)'. Note that when r = 2, the marginal correlation is given by

\[ K_{12;\theta}(0) = \frac{\psi_{11}\psi_{21} + \psi_{12}\psi_{22}}{\sqrt{(\psi_{11}^2 + \psi_{12}^2)(\psi_{21}^2 + \psi_{22}^2)}}. \]
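The construction (3.3) and the marginal-correlation formula can be verified numerically. A small Python sketch, assuming exponential latent correlation models R_k (an arbitrary choice for illustration) and hypothetical values for the coregionalization matrix A:

```python
import numpy as np

def lmc_cov(h, A, scales):
    """Bivariate linear model of coregionalization, cf. Eq. (3.3),
    with exponential latent correlations R_k(h) = exp(-h / scales[k])
    (the exponential choice is only for illustration)."""
    R = np.exp(-h / np.asarray(scales))   # r latent correlations at lag h
    return (A * R) @ A.T                  # [sum_k psi_ik psi_jk R_k(h)]_{i,j}

# a full-rank 2 x 2 coregionalization matrix (hypothetical values)
A = np.array([[1.0, 0.5],
              [0.3, 0.8]])
C0 = lmc_cov(0.0, A, scales=[1.0, 2.0])            # lag-0 covariance matrix
k12 = C0[0, 1] / np.sqrt(C0[0, 0] * C0[1, 1])      # marginal correlation K_12(0)
```

Here k12 reproduces the closed-form expression above, and the lag-0 matrix A A' is positive definite whenever A has full rank.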
A criticism of this model, expressed by Gneiting et al. (2010), is that when ψ_ik ≠ 0 for each i, k, the smoothness of any component defaults to that of the roughest latent process. Another general parametric class, called separable, is obtained through the following bivariate covariance function:

\[ C_\theta(h) = \left[ \rho_{ij}\,\sigma_i \sigma_j\, R_\psi(h) \right]_{i,j=1}^{2}, \qquad \rho_{ii} = 1, \quad |\rho_{12}| < 1, \tag{3.4} \]

where R_ψ(h) is a univariate parametric correlation model and θ = (σ_1², σ_2², ψ, ρ_12)'. Here σ_i² > 0, i = 1, 2, are the marginal variance parameters and ρ_12 is the so-called colocated correlation parameter (Gneiting et al. 2010), expressing the marginal correlation between Z_1 and Z_2. This type of construction assumes that the two components of the bivariate Gaussian random field share the same correlation structure. Therefore, the model is not able to capture different spatial dependencies and/or smoothness in each of the component fields. A generalization of (3.4), which we call nonseparable and which allows one to overcome this drawback, is

\[ C_\theta(h) = \left[ \rho_{ij}\,\sigma_i \sigma_j\, R_{\psi_{ij}}(h) \right]_{i,j=1}^{2}, \qquad \rho_{ii} = 1, \tag{3.5} \]
where θ = (σ_1², σ_2², ψ_11, ψ_12, ψ_22, ρ_12)'. In this general approach, the difficulty lies in deriving conditions on the model parameters that result in a valid multivariate covariance model. For instance, Gneiting et al. (2010) proposed the model (3.5) with R(h) equal to the Matérn isotropic correlation model defined in Eq. (1.7). Putting together (3.5) and (1.7) we obtain the nonseparable bivariate Matérn:

\[ C_\theta(h) = \left[ \rho_{ij}\,\sigma_i \sigma_j\, \mathcal{MT}_{\nu_{ij},\alpha_{ij}}(h) \right]_{i,j=1}^{2}, \qquad \rho_{ii} = 1, \tag{3.6} \]

with θ = (σ_1², σ_2², ν_11, ν_12, ν_22, α_11, α_12, α_22, ρ_12)'. Gneiting et al. (2010) found a set of sufficient and necessary conditions on the colocated correlation parameter ρ_12 for the model (3.6) to be valid. For instance, one of these conditions is: if α_12^{-1} ≤ min(α_11^{-1}, α_22^{-1}), then (3.6) is valid if and only if ν_12 = 0.5(ν_1 + ν_2) and

\[ |\rho_{12}| \le \left( \frac{\alpha_{12}^2}{\alpha_{11}\alpha_{22}} \right)^{-d/2} \frac{\Gamma(\nu_{12})}{\{\Gamma(\nu_1)\Gamma(\nu_2)\}^{1/2}} \, \frac{\{\Gamma(\nu_1 + d/2)\,\Gamma(\nu_2 + d/2)\}^{1/2}}{\Gamma(\nu_{12} + d/2)} < 1. \]
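The validity bound above is easy to evaluate numerically. The Python sketch below encodes the bound as reconstructed here (treat it as a sketch, not a reference implementation); as a sanity check, with equal scale parameters it reduces to the classical parsimonious bivariate Matérn bound, e.g. √3/2 for ν_1 = 0.5, ν_2 = 1.5, d = 2.

```python
from math import gamma, sqrt

def matern_rho_bound(nu1, nu2, a11, a22, a12, d=2):
    """Upper bound for |rho_12| in the bivariate Matern validity
    condition, with nu12 = (nu1 + nu2) / 2, as reconstructed from the
    text above (an assumption of this sketch)."""
    nu12 = 0.5 * (nu1 + nu2)
    scale = (a12 ** 2 / (a11 * a22)) ** (-d / 2)
    return (scale
            * gamma(nu12) / sqrt(gamma(nu1) * gamma(nu2))
            * sqrt(gamma(nu1 + d / 2) * gamma(nu2 + d / 2)) / gamma(nu12 + d / 2))
```

For instance, matern_rho_bound(0.5, 1.5, 0.1, 0.1, 0.1, d=2) evaluates to √3/2 ≈ 0.866, so any |ρ_12| up to that value yields a valid model in that configuration.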
Fig. 3.1 Marginal and cross-correlation functions (C11, C22, C12) for a bivariate Matérn model, plotted against distance
Note that this condition shrinks the range of validity of ρ_12. In general, the nonseparable construction (3.5) leads to restrictions on the upper and lower bounds of the colocated parameter that can be more or less severe depending on the scale and smoothness parameters. Figure 3.1 shows a graphical representation of model (3.6) when σ_1² = σ_2² = 1, α_11 = 0.2/3, α_12 = 0.12/3, α_22 = 0.15/3, ν_1 = 0.5, ν_2 = 1.5, ν_12 = 0.5(ν_1 + ν_2), and ρ_12 = 0.4. Under this setting, the sample paths associated with the first and
Fig. 3.2 A realization of a bivariate Matérn model with different smoothness parameters
second components are 0 and 1 times differentiable, respectively. Figure 3.2 shows a simulated realization of a zero-mean bivariate Gaussian random field with covariance (3.6) under the same parameter setting. The difference between the first and second components in terms of smoothness is apparent. Another example of model (3.5) can be found in Daley et al. (2015), where, in this case, R(h) is a compactly supported correlation model of the generalized Wendland type, defined as (Gneiting 2002)

\[ GW_{\kappa,\mu,\beta}(h) = \begin{cases} \dfrac{1}{B(2\kappa,\, \mu + 1)} \displaystyle\int_{\|h\|/\beta}^{1} u\,\bigl(u^2 - (\|h\|/\beta)^2\bigr)^{\kappa - 1} (1 - u)^{\mu}\, du, & 0 \le \|h\| < \beta, \\ 0, & \|h\| \ge \beta, \end{cases} \tag{3.7} \]

where β > 0 is the compact support and κ > 0 is the smoothness parameter. For the case κ = 0 the class is defined as

\[ GW_{0,\mu,\beta}(h) = \begin{cases} (1 - \|h\|/\beta)^{\mu}, & 0 \le \|h\| < \beta, \\ 0, & \|h\| \ge \beta, \end{cases} \tag{3.8} \]

and the condition μ ≥ (d + 1)/2 + κ guarantees the validity of the model. As in the Matérn case, this class allows for a continuous parameterization of the smoothness of the underlying random field. Specifically, for a positive integer k, the sample paths of a Gaussian process are k times differentiable if and only if κ > k − 1/2 (Bevilacqua et al. 2018). From a computational point of view, the generalized Wendland model is interesting because compactly supported covariance functions lead to sparse covariance matrices, a very accessible and scalable approach when handling large spatial datasets. In fact, well-established algorithms for sparse matrices can be used when estimating the covariance parameters and/or predicting at unknown locations (see Furrer and Sain 2010 and the references therein).
Table 3.1 Generalized Wendland and Matérn models with increasing smoothness parameters κ and ν. SP(k) means that the sample paths of the associated Gaussian field are k times differentiable. Here r = ||h|| is the Euclidean norm and (·)₊ denotes the positive part

κ | GW_{κ,μ,1}(r) | ν | MT_{ν,1}(r) | SP(k)
0 | (1 − r)₊^μ | 0.5 | e^{−r} | 0
1 | (1 − r)₊^{μ+1} (1 + r(μ + 1)) | 1.5 | e^{−r}(1 + r) | 1
2 | (1 − r)₊^{μ+2} (1 + r(μ + 2) + r²(μ² + 4μ + 3)/3) | 2.5 | e^{−r}(1 + r + r²/3) | 2
3 | (1 − r)₊^{μ+3} (1 + r(μ + 3) + r²(2μ² + 12μ + 15)/5 + r³(μ³ + 9μ² + 23μ + 15)/15) | 3.5 | e^{−r}(1 + r + 6r²/15 + r³/15) | 3
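The agreement between the integral form (3.7) and the closed forms in Table 3.1 can be checked numerically. The Python sketch below does so for κ = 0, 1 with β = 1 (only these two cases are coded; the quadrature rule is a plain trapezoid, an assumption of this sketch):

```python
import numpy as np
from math import gamma

def gw_closed(kappa, mu, r):
    """Closed forms of GW_{kappa,mu,1} from Table 3.1 (kappa = 0 or 1)."""
    p = max(1.0 - r, 0.0)
    if kappa == 0:
        return p ** mu
    if kappa == 1:
        return p ** (mu + 1) * (1.0 + r * (mu + 1))
    raise NotImplementedError("only kappa = 0, 1 are sketched here")

def gw_integral(kappa, mu, r, n=200_001):
    """Direct numerical evaluation of the integral form (3.7) with beta = 1."""
    if r >= 1.0:
        return 0.0
    # B(2*kappa, mu + 1), the normalizing beta function
    b = gamma(2 * kappa) * gamma(mu + 1) / gamma(2 * kappa + mu + 1)
    u = np.linspace(r, 1.0, n)
    f = u * (u ** 2 - r ** 2) ** (kappa - 1) * (1.0 - u) ** mu
    # composite trapezoidal rule
    return float(np.sum((f[:-1] + f[1:]) * 0.5 * (u[1] - u[0]))) / b
```

For κ = 1 and μ = 4, for example, both routes give (1 − r)⁵(1 + 5r) to quadrature accuracy.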
In the univariate case, Bevilacqua et al. (2018) show that the kriging predictor with a generalized Wendland model is asymptotically (in the fixed domain sense) as efficient as the kriging predictor with a Matérn model. This implies that kriging can be performed with a compactly supported function without any loss of prediction efficiency under fixed domain asymptotics. In some special cases the Matérn and generalized Wendland models have well-known closed forms. Table 3.1 compares the generalized Wendland model for κ = 0, 1, 2, 3 with the Matérn model for ν = 0.5, 1.5, 2.5, 3.5, together with the associated degree of sample path differentiability, when the scale parameters are fixed equal to 1. Daley et al. (2015) provide sufficient conditions on the parameters ρ_12 and μ_ij, i, j = 1, 2, for the validity of the bivariate generalized Wendland model. From our perspective, a crucial benefit of the general class (3.5) with respect to (3.3) is that the colocated correlation parameter expresses the marginal correlation between the components, that is, K_{12;θ}(0) = ρ_12; when ρ_12 = 0, the components of the bivariate random field are independent. Hence, the colocated correlation parameter can be used to build a test of independence or, in general, to assess the significance of the correlation between the components of the random field. This problem is addressed in the next section. Note that in the general construction (3.5), nugget parameters are not considered. A useful parametrization of (3.5) taking the nugget parameters into account is the following:

\[ C_{ij;\theta}(h) = \begin{cases} \rho_{ij}\,(\sigma_i^2 + \tau_i^2)^{1/2} (\sigma_j^2 + \tau_j^2)^{1/2}, & h = 0, \\ \rho_{ij}\,\sigma_i \sigma_j\, R_{\psi_{ij}}(h), & \text{otherwise}, \end{cases} \tag{3.9} \]

for i, j = 1, 2, where ρ_ii = 1. This parametrization is different from the one adopted in Gneiting et al. (2010).
It can be shown that (3.9) is the covariance of a bivariate random field obtained as the sum of a bivariate Gaussian random field with covariance model (3.5) and a bivariate Gaussian white noise with a specific correlation between its components (see Exercise 3.1). If τ_1² = τ_2² = 0, then (3.9) reduces to (3.5).
Note that, using this specific parametrization, K_{12;θ}(0) = ρ_12 still holds. This implies that the colocated correlation parameter can be used to build a test of independence even in the presence of nugget effects. For notational simplicity, in what follows we consider covariance models with zero nuggets.
3.2 A Parametric Test Based on ML Under Increasing Domain Asymptotics

Since we are assuming that the state of truth is represented by some parametric family of bivariate covariances {C_θ(·), θ ∈ Θ ⊆ R^p}, we use the abuse of notation Σ_N(θ) for the covariance matrix Σ_N, in order to emphasize the dependence on the unknown parameter vector. Specifically, we assume that the parametric bivariate covariance model is of type (3.4) or (3.5), so that θ = (σ_1², σ_2², ψ', ρ_12)' or θ = (σ_1², σ_2², ψ'_11, ψ'_12, ψ'_22, ρ_12)', depending on whether a separable or a nonseparable bivariate covariance model is considered. For a realization Z_N from a bivariate Gaussian random field, the log-likelihood, up to an additive constant, can be written as

\[ l_N(\theta) = -\frac{1}{2}\log|\Sigma_N(\theta)| - \frac{1}{2}\, Z_N' \,[\Sigma_N(\theta)]^{-1} Z_N, \tag{3.10} \]
and θ̂_N := argmax_{θ∈Θ} l_N(θ) is the ML estimator of θ. Mardia and Marshall (1984) provide conditions in the univariate case for the consistency and asymptotic normality of the maximum likelihood estimator. Under suitable conditions, θ̂_N is consistent and asymptotically normal, with covariance matrix equal to the inverse of the Fisher information matrix, given by

\[ F_N(\theta) = \frac{1}{2}\left[ \operatorname{tr}\!\left( \Sigma_N(\theta)^{-1}\, \frac{\partial \Sigma_N(\theta)}{\partial \theta_i}\, \Sigma_N(\theta)^{-1}\, \frac{\partial \Sigma_N(\theta)}{\partial \theta_j} \right) \right]_{i,j=1}^{p}. \tag{3.11} \]

That is, θ̂_N →p θ and θ̂_N ≈ N(θ, F_N(θ)^{-1}) as N → ∞. Sufficient conditions for the asymptotic normality and weak consistency of the maximum likelihood estimator are given in Mardia and Marshall (1984) in the univariate case and in Bevilacqua et al. (2015) in the bivariate case. Mardia and Marshall (1984) assume, among other conditions, that the sample set grows (i.e., ||s_i − s_j|| ≥ c > 0) in such a way that the sampling domain increases in extent as N increases, i.e., the increasing domain setting. In general, it is not easy to check all the conditions because they are based, for instance, on the eigenvalues of the covariance matrix and its derivatives. Bevilacqua et al. (2015) show that these conditions are verified for a bivariate separable exponential model.
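The log-likelihood (3.10) can be evaluated directly for any valid bivariate covariance. The Python sketch below uses the separable Kronecker structure Σ_N = A ⊗ M (introduced for the separable model later in this section) with an exponential correlation; the exponential choice and the stacking order of Z_N are assumptions of this example, not a fixed convention of the book.

```python
import numpy as np

def separable_loglik(z, coords, sig2_1, sig2_2, alpha, rho12):
    """Log-likelihood (3.10) for a zero-mean bivariate Gaussian field with
    separable covariance Sigma_N = A (x) M, where M is an exponential
    correlation matrix M_ij = exp(-||s_i - s_j|| / alpha).

    z is assumed stacked as (Z_1(s_1..s_N), Z_2(s_1..s_N))'."""
    d = np.linalg.norm(coords[:, None] - coords[None, :], axis=-1)
    M = np.exp(-d / alpha)                                   # N x N correlation
    c = rho12 * np.sqrt(sig2_1 * sig2_2)
    A = np.array([[sig2_1, c], [c, sig2_2]])                 # 2 x 2 colocated block
    S = np.kron(A, M)                                        # (2N) x (2N) covariance
    sign, logdet = np.linalg.slogdet(S)
    return -0.5 * logdet - 0.5 * z @ np.linalg.solve(S, z)
```

A convenient check of the Kronecker structure is the determinant identity |A ⊗ M| = |A|^N |M|^2 for a 2 × 2 block A and an N × N matrix M, which the log-determinant term must satisfy.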
54
3 A Parametric Test Based on Maximum Likelihood
Then, as outlined in Bevilacqua et al. (2015), testing the independence or assessing the strength of the correlation between the two components of a bivariate Gaussian random field with covariance (3.4) or (3.5) leads to the following hypothesis testing problems:

    H0 : ρ12 = 0 versus H1 : ρ12 ≠ 0,   (3.12)

    H0 : ρ12 ≤ k versus H1 : ρ12 > k,
(3.13)
where k belongs to the valid parameter space of the bivariate correlation model. Given the maximum likelihood estimate θ̂_N = (σ̂1², σ̂2², ψ̂′, ρ̂12)′ or θ̂_N = (σ̂1², σ̂2², ψ̂′11, ψ̂′12, ψ̂′22, ρ̂12)′, these tests are based on the asymptotic null distribution

    (ρ̂12 − ρ12)/se(ρ̂12) ≈ N(0, 1),
(3.14)
where se(ρ̂12) denotes the standard error of ρ̂12, given by se(ρ̂12) = ([F_N(θ̂_N)⁻¹]_{ρ12})^{1/2}, and ρ12 = 0 or ρ12 = k, depending on whether test (3.12) or (3.13) is considered. Here, [F_N(θ̂_N)⁻¹]_{ρ12} is the element on the diagonal of F_N(θ̂_N)⁻¹ associated with the colocated correlation parameter. We consider a bivariate covariance model as in (3.5) in an increasing order of complexity:

• A separable model, that is, a bivariate model with covariance (3.4). In this case Σ_N(θ) = A ⊗ M, where

    A = [ σ1²       σ1σ2ρ12 ]
        [ σ1σ2ρ12   σ2²     ],

M = [R_ψ(‖s_i − s_j‖)]_{i,j=1}^N is a correlation matrix, and θ = (σ1², σ2², α, ρ12)′. Here we assume, without loss of generality, that ψ = α is a scalar parameter. In this case, Bevilacqua et al. (2015) give an explicit form for the Fisher information matrix:

    F_N(θ) = [ N(2−ρ12²)/(4σ1⁴(1−ρ12²))   −Nρ12²/(4σ1²σ2²(1−ρ12²))   tr(B)/(2σ1²)   −Nρ12/(2σ1²(1−ρ12²)) ]
             [ −                           N(2−ρ12²)/(4σ2⁴(1−ρ12²))   tr(B)/(2σ2²)   −Nρ12/(2σ2²(1−ρ12²)) ]
             [ −                           −                           tr(B²)         −tr(B)ρ12/(1−ρ12²)   ]
             [ −                           −                           −              N(1+ρ12²)/(1−ρ12²)²  ]   (3.15)

(the dashes denote the symmetric entries), and its inverse is given by
    F_N(θ)⁻¹ = [ σ1⁴(N tr(B²)+C)/(NC)   σ1²σ2²((tr B)²+2ρ12²C)/(NC)   −σ1² tr(B)/C   −σ1²ρ12(ρ12²−1)/N ]
               [ −                       σ2⁴(N tr(B²)+C)/(NC)          −σ2² tr(B)/C   −σ2²ρ12(ρ12²−1)/N ]
               [ −                       −                              N/C            0                 ]
               [ −                       −                              −              (ρ12²−1)²/N       ],
where B = M⁻¹(∂M/∂α) and C = N tr(B²) − (tr(B))². Then, in this case,

    [F_N(θ)⁻¹]_{ρ12} = (ρ12² − 1)²/N
(3.16)
and the associated standard error estimate is given by se(ρ̂12) = ([F_N(θ̂_N)⁻¹]_{ρ12})^{1/2}. It is interesting to note that, for the separable construction, the asymptotic distribution does not depend on the spatial correlation structure, i.e., on the choice of R.

• A nonseparable model as in Eq. (3.5). In this case,

    Σ(θ) = [ σ1²M11       σ1σ2ρ12M12 ]
           [ σ1σ2ρ12M21   σ2²M22     ],
where M_mn = [R_{ψmn}(‖s_i − s_j‖)]_{i,j=1}^N for m, n = 1, 2, with M12 = M21. First, note that when ρ12 = 0, ψ12 cannot be estimated. From a Fisher information perspective, this means that the Fisher information matrix is singular. For this reason, under this setting, the test (3.12) is not feasible, and the test (3.13) can be considered only for k ≠ 0. The inverse of the covariance matrix is given by
    Σ⁻¹(θ) = [ (1/σ1²)A1   −(ρ12/(σ1σ2))A1M12M22⁻¹ ]
             [ −            (1/σ2²)A2               ],

with A1 = (M11 − ρ12²M12²M22⁻¹)⁻¹ and A2 = (M22 − ρ12²M12²M11⁻¹)⁻¹. Without loss of generality, we assume that ψ_mn = α_mn is a scalar parameter, that is, θ = (σ1², σ2², α11, α12, α22, ρ12)′. Bevilacqua et al. (2015) give an explicit form for the Fisher information matrix F_N(θ) in this case. It turns out from the (rather complicated) expression of the Fisher information matrix that the asymptotic variance of the correlation parameter depends on the spatial dependence. Specifically, it is given by F_N(θ)_{ρ12} = f(E, L), where E = A2M11⁻¹M12², L = A1A2M12², and f(X, Y) = ρ12² tr(X) + tr(Y).
3.3 A Parametric Test Based on ML Under Fixed Domain Asymptotics

Equivalence and orthogonality of probability measures are useful tools when assessing the asymptotic properties of both prediction and estimation for Gaussian fields (Skorokhod and Yadrenko 1973; Ibragimov and Rozanov 1978; Stein 1999) under fixed domain asymptotics. Denote by P_i, i = 0, 1, two probability measures defined on the same measurable space {Ω, F}. P_0 and P_1 are called equivalent (denoted P_0 ≡ P_1) if P_1(A) = 1 for any A ∈ F implies P_0(A) = 1, and vice versa. On the other hand, P_0 and P_1 are orthogonal (denoted P_0 ⊥ P_1) if there exists an event A such that P_1(A) = 1 but P_0(A) = 0. In order to apply these concepts, we restrict the event A to the σ-algebra generated by the univariate random field Z = {Z(s), s ∈ D}, where D ⊂ R^d. We emphasize this restriction by saying that the two measures are equivalent on the paths of Z. Gaussian measures are completely characterized by their mean and covariance function. We write P(ρ) for a Gaussian measure with zero mean and covariance function ρ. It is well known that two Gaussian measures are either equivalent or orthogonal on the paths of Z (Ibragimov and Rozanov 1978). Stein (1988, 1990, 1999, 2004) provides conditions under which predictions under a misspecified covariance function are asymptotically efficient and mean square errors converge almost surely to their targets. Since Gaussian measures depend exclusively on their mean and covariance functions, practical evaluation of Stein's conditions translates into the requirement that the true and the misspecified covariances be compatible, i.e., that the induced Gaussian measures be equivalent. Under fixed domain asymptotics, no general results are available for the asymptotic properties of maximum likelihood estimators in the univariate case.
Yet, some results have been obtained when assuming that the covariance belongs to the Matérn or Generalized Wendland parametric families and the Gaussian field is defined on R^d, d = 1, 2, 3. For instance, if P(σ0² MT_{ν,α0}) and P(σ1² MT_{ν,α1}) are two zero-mean Gaussian probability measures with Matérn covariance functions sharing the same smoothness parameter ν, then, for any bounded infinite set D ⊂ R^d, d = 1, 2, 3, P(σ0² MT_{ν,α0}) ≡ P(σ1² MT_{ν,α1}) on the paths of Z if and only if (Zhang 2004)

    σ0²/α0^{2ν} = σ1²/α1^{2ν}.
(3.17)
From an inference point of view, the consequence of this result is that, under fixed domain asymptotics, only the so-called microergodic parameter (Stein 1999) σ²/α^{2ν} can be estimated consistently. A similar result can be found in Bevilacqua et al. (2018) for the Generalized Wendland model in Eq. (3.7).
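As a small numerical illustration of (3.17), the sketch below compares the microergodic parameter σ²/α^{2ν} of two Matérn parameterizations; the function name micro and the parameter values are our own assumptions.

```r
# Two Matern parameterizations sharing nu induce equivalent Gaussian
# measures iff their microergodic parameters coincide, Eq. (3.17).
micro <- function(sigma2, alpha, nu) sigma2 / alpha^(2 * nu)
nu <- 0.5
m0 <- micro(1, 0.2, nu)   # (sigma0^2, alpha0) = (1, 0.2)
m1 <- micro(2, 0.4, nu)   # (sigma1^2, alpha1) = (2, 0.4): same microergodic parameter
c(m0, m1)
```

Although (σ0², α0) and (σ1², α1) differ, both pairs give σ²/α^{2ν} = 5, so the two measures cannot be distinguished consistently under fixed domain asymptotics.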
Zhang and Cai (2015) generalized this result to bivariate Gaussian random fields with a separable bivariate Matérn covariance model. Specifically, let Z_12 = {Z_12(s) = (Z_1(s), Z_2(s))′, s ∈ D}, D ⊂ R^d, d = 1, 2, 3, be a zero-mean bivariate Gaussian random field with bivariate Matérn (BMT) covariance model defined in (3.6) in its separable version:

    BMT_η(h) = [ ρ_ij σ_i σ_j MT_{ν,α}(h) ]_{i,j=1}²,  ρ_ii = 1,  |ρ12| < 1,
(3.18)
with η = (σ1², σ2², ν, α, ρ12)′. Now let η_i = (σ²_{1,i}, σ²_{2,i}, ν, α_i, ρ_{12,i})′, i = 0, 1, and let P(BMT_{η0}) and P(BMT_{η1}) be two Gaussian measures. Zhang and Cai (2015) showed that if

    σ²_{1,0}/α0^{2ν} = σ²_{1,1}/α1^{2ν},   σ²_{2,0}/α0^{2ν} = σ²_{2,1}/α1^{2ν},   ρ_{12,0} = ρ_{12,1},
(3.19)
then, for any bounded infinite set D ⊂ R^d, d = 1, 2, 3, P(BMT_{η0}) ≡ P(BMT_{η1}) on the paths of Z_12. Velandia et al. (2017) show that, in the special case d = 1 and ν = 0.5, conditions (3.19) are also necessary for the equivalence of the two Gaussian measures. An immediate consequence of these results is that, under fixed domain asymptotics, only the three microergodic parameters σ1²/α, σ2²/α, and ρ12 can be estimated consistently. Given a realization Z_N from Z_12, let σ̂²_{1,N}, σ̂²_{2,N}, α̂_N, and ρ̂_{12,N} be the maximum likelihood estimators of the covariance parameters obtained by jointly maximizing (3.10), under some restriction of the parametric space (see Velandia et al. 2017 for the details). Velandia et al. (2017) show that σ̂²_{i,N}/α̂_N →p σ_i²/α, i = 1, 2, and ρ̂_{12,N} →p ρ12, and that the asymptotic distribution of the maximum likelihood estimator of the microergodic parameters is given by

    ( σ̂²_{1,N}/α̂_N )       ( ( σ1²/α )        [ 2(σ1²/α)²           2(ρ12σ1σ2/α)²       ρ12σ1²(1−ρ12²)/α ] )
    ( σ̂²_{2,N}/α̂_N ) ≈ N ( ( σ2²/α ), (1/N) [ 2(ρ12σ1σ2/α)²       2(σ2²/α)²           ρ12σ2²(1−ρ12²)/α ] )   (3.20)
    ( ρ̂_{12,N}      )       ( ρ12    )        [ ρ12σ1²(1−ρ12²)/α    ρ12σ2²(1−ρ12²)/α    (ρ12²−1)²        ] )
Note that the asymptotic variance of ρ̂_{12,N} under fixed domain asymptotics coincides with the asymptotic variance in the increasing domain setting in Eq. (3.16), i.e., (ρ12² − 1)²/N. This implies that, at least for the separable exponential model and d = 1, the asymptotic distribution of the maximum likelihood estimator of the colocated correlation parameter does not depend on the asymptotic framework, i.e., tests (3.12) and (3.13) lead to the same results irrespective of the asymptotic framework.
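A minimal sketch of how tests (3.12) and (3.13) are carried out in practice under the separable exponential model, using the standard error implied by (3.16); the numbers are illustrative, not output of GeoFit.

```r
# Wald-type statistics (3.14) with se = (1 - rho12.hat^2)/sqrt(N),
# i.e., the square root of (rho12^2 - 1)^2 / N from Eq. (3.16).
rho12.hat <- 0.25
N <- 500
se <- (1 - rho12.hat^2) / sqrt(N)
z.indep <- (rho12.hat - 0) / se            # test (3.12): H0 rho12 = 0
p.indep <- 2 * (1 - pnorm(abs(z.indep)))   # two-sided p-value
z.k <- (rho12.hat - 0.2) / se              # test (3.13) with k = 0.2
p.k <- 1 - pnorm(z.k)                      # one-sided p-value
```

With these illustrative numbers the independence hypothesis is strongly rejected, while H0 : ρ12 ≤ 0.2 is not.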
3.4 Examples with the R Package GeoModels

In this section we present some examples of the tests (3.12) and (3.13) with simulated data, using the bivariate Matérn model in Eq. (3.6). Simulation and maximum likelihood estimation for a bivariate Gaussian random field with specific covariance models are implemented through the GeoSim and GeoFit functions of the R package GeoModels (Bevilacqua and Morales 2018).
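Before turning to GeoModels, the following base-R sketch shows what simulating from the separable model (3.4) amounts to: drawing a normal vector with covariance Σ_N = A ⊗ M via a Cholesky factor. All names and parameter values are illustrative assumptions; GeoSim provides this functionality (and much more) directly.

```r
# Simulate (Z1, Z2) with covariance Sigma = A %x% M (a sketch).
set.seed(1)
n <- 50
coords <- cbind(runif(n), runif(n))
M <- exp(-as.matrix(dist(coords)) / 0.2)        # exponential correlation
A <- matrix(c(1, 0.5, 0.5, 1), 2, 2)            # sigma1^2 = sigma2^2 = 1, rho12 = 0.5
R <- chol(kronecker(A, M) + diag(1e-8, 2 * n))  # small jitter for numerical stability
z <- drop(t(R) %*% rnorm(2 * n))                # z ~ N(0, A %x% M)
Z1 <- z[1:n]
Z2 <- z[(n + 1):(2 * n)]
```

The sample correlation between Z1 and Z2 fluctuates around the colocated correlation parameter ρ12 = 0.5 used in A.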
3.4.1 An Example of Test of Independence Using a Separable Matérn Model

We first randomly select two hundred spatial location sites in the unit square:

> require(GeoModels)
> set.seed(89)     # hypothetical seed; the original value is not recoverable here
> N <- 200
> x <- runif(N)
> y <- runif(N)
> coords <- cbind(x, y)
...
# simple co-kriging map prediction for height data
> require(fields)
...
> as.numeric(1 - pnorm(test))
[1] 0.9695048
The p-value in this case is approximately 0.997; thus, there is no evidence against H0.
3.6 Problems for the Reader

3.1 Show that if {(Z1(s), Z2(s))′, s ∈ A ⊆ R^d} is a bivariate Gaussian random field with covariance function (3.5) and {(K1(s), K2(s))′, s ∈ A ⊆ R^d} is a bivariate Gaussian white noise such that V(K_i(s)) = τ_i² for i = 1, 2 and

    Cov(K1(s), K2(s)) = ρ12( (σ1² + τ1²)^{1/2} (σ2² + τ2²)^{1/2} − σ1σ2 ),

then, if Z_i ⊥ K_j, i, j = 1, 2, the bivariate random field {(Z1(s) + K1(s), Z2(s) + K2(s))′, s ∈ A ⊆ R^d} has covariance function (3.9).

3.2 Let Z_N be a realization from a bivariate Gaussian random field {(Z1(s), Z2(s))′, s ∈ A ⊆ R^d} with E(Z_i(s)) = μ_i and separable covariance given in Eq. (3.4). Then the log-likelihood function is given by

    l_N(γ) = −(1/2) log|Σ_N(θ)| − (1/2)(Z_N − μ)ᵀ[Σ_N(θ)]⁻¹(Z_N − μ),
(3.21)
where μ = (μ1, μ2)′ and γ = (μ′, θ′)′, with dim(θ) = q. Show that the associated Fisher information matrix is given by

    F*_N(γ) = [ (σ2²/d) 1′_N R_N(θ)⁻¹ 1_N    −(σ1σ2ρ12/d) 1′_N R_N(θ)⁻¹ 1_N    0′_q   ]
              [ −                             (σ1²/d) 1′_N R_N(θ)⁻¹ 1_N         0′_q   ]
              [ −                             −                                 F_N(θ) ],

where 1_N and 0_q are the vector of ones of length N and the zero vector of length q, respectively, d = σ1²σ2²(1 − ρ12²), and F_N(θ) is defined in Eq. (3.15).
References

Bevilacqua, M., & Morales, V. (2018). GeoModels: A package for geostatistical Gaussian and non-Gaussian data analysis. R package version 1.0.3-4. https://vmoprojs.github.io/GeoModelspage.
Bevilacqua, M., Vallejos, R., & Velandia, D. (2015). Assessing the significance of the correlation between the components of a bivariate Gaussian random field. Environmetrics, 26, 545–556.
Bevilacqua, M., Faouzi, T., Furrer, R., & Porcu, E. (2018). Estimation and prediction using generalized Wendland covariance functions under fixed domain asymptotics. Annals of Statistics (to appear).
Cressie, N. (1993). Statistics for spatial data. New York: Wiley.
Daley, D. J., Porcu, E., & Bevilacqua, M. (2015). Classes of compactly supported covariance functions for multivariate random fields. Stochastic Environmental Research and Risk Assessment, 29, 1249–1263.
Efron, B., & Hinkley, D. (1978). Assessing the accuracy of the maximum likelihood estimator: Observed versus expected Fisher information. Biometrika, 65, 457–482.
Furrer, R., & Sain, S. R. (2010). spam: A sparse matrix R package with emphasis on MCMC methods for Gaussian Markov random fields. Journal of Statistical Software, 36, 1–25.
Genton, M. G., & Kleiber, W. (2015). Cross-covariance functions for multivariate geostatistics. Statistical Science, 30, 147–163.
Gneiting, T. (2002). Compactly supported correlation functions. Journal of Multivariate Analysis, 83, 493–508.
Gneiting, T., Kleiber, W., & Schlather, M. (2010). Matérn cross-covariance functions for multivariate random fields. Journal of the American Statistical Association, 105, 1167–1177.
Ibragimov, I. A., & Rozanov, Y. A. (1978). Gaussian random processes. New York: Springer.
Mardia, K. V., & Marshall, R. J. (1984). Maximum likelihood estimation of models for residual covariance in spatial regression. Biometrika, 71, 135–146.
Nelder, J. A., & Mead, R. (1965). A simplex algorithm for function minimization. Computer Journal, 7, 308–313.
Skorokhod, A. V., & Yadrenko, M. I. (1973). On absolute continuity of measures corresponding to homogeneous Gaussian fields. Theory of Probability and Its Applications, 71, 135–146.
Stein, M. (1988). Asymptotically efficient prediction of a random field with a misspecified covariance function. The Annals of Statistics, 16, 55–63.
Stein, M. (1990). Uniform asymptotic optimality of linear predictions of a random field using an incorrect second order structure. The Annals of Statistics, 18, 850–872.
Stein, M. (1999). Interpolation of spatial data. Some theory for kriging. New York: Springer.
Stein, M. (2004). Equivalence of Gaussian measures for some nonstationary random fields. Journal of Statistical Planning and Inference, 123, 1–11.
Velandia, D., Bachoc, F., Bevilacqua, M., Gendre, X., & Loubes, J. M. (2017). Maximum likelihood estimation for a bivariate Gaussian process under fixed domain asymptotics. Electronic Journal of Statistics, 11, 2978–3007.
Wackernagel, H. (1978). Multivariate geostatistics: An introduction with applications. New York: Springer.
Zhang, H. (2004). Inconsistent estimation and asymptotically equivalent interpolations in model-based geostatistics. Journal of the American Statistical Association, 99, 250–261.
Zhang, H., & Cai, W. (2015). When doesn't cokriging outperform kriging? Statistical Science, 30, 176–180.
Zhang, H., & Zimmerman, D. L. (2005). Towards reconciling two asymptotic frameworks in spatial statistics. Biometrika, 92, 921–936.
Chapter 4
Tjøstheim’s Coefficient
In the previous chapters, we studied the association between two georeferenced sequences from a hypothesis testing perspective. In the following three chapters, we focus on some coefficients of spatial association. These coefficients are not simple modifications of the correlation coefficient, but the underlying idea of their construction relies on the properties of the inner product from which the correlation coefficient was designed.
4.1 Measures of Association

Let (Ω, F, P) be a probability space and denote by L²(Ω, F, P) the set of all real-valued random variables on (Ω, F, P) that have finite second moments. Define the inner product of two random variables X and Y as

    ⟨X, Y⟩ = E[XY].
(4.1)
We call ‖X‖₂ = ⟨X, X⟩^{1/2} the norm associated with L² (or the norm induced by L²). Thus, the distance between X and Y is d(X, Y) = ‖X − Y‖₂. An inner product space is called a Hilbert space if it is complete as a metric space. This means that Cauchy sequences are convergent. The L² space is a Hilbert space. For the proof, see, for instance, Fristedt and Gray (1997).
© Springer Nature Switzerland AG 2020 R. Vallejos et al., Spatial Relationships Between Two Georeferenced Variables, https://doi.org/10.1007/978-3-030-56681-4_4
70
4 Tjøstheim’s Coefficient
For two second-order random variables X and Y, the Cauchy-Schwarz inequality states that E[XY]² ≤ E[X²]E[Y²]. Then, the covariance between X and Y can be written using the inner product as cov[X, Y] = ⟨X − E[X], Y − E[Y]⟩. Similarly, the correlation coefficient is

    cor[X, Y] = ⟨ (X − E[X])/(E[(X − E[X])²])^{1/2}, (Y − E[Y])/(E[(Y − E[Y])²])^{1/2} ⟩.

Because of the Cauchy-Schwarz inequality,

    |E[(X − E[X])(Y − E[Y])]| ≤ (E[(X − E[X])²])^{1/2} (E[(Y − E[Y])²])^{1/2},

or equivalently, |cov[X, Y]| ≤ (var[X] var[Y])^{1/2}. Thus, the correlation coefficient satisfies |cor[X, Y]| ≤ 1. Once an inner product has been defined in a suitable space of random variables, a coefficient of association between X and Y can be defined as the angle between them:

    ρ = ⟨X, Y⟩ / (‖X‖₂ · ‖Y‖₂).

The generalization to two random vectors is straightforward. Let X = (X1, ..., Xn)′ and Y = (Y1, ..., Yn)′ be two random vectors, and let x and y be two realizations of X and Y, respectively. Then, using the usual inner product of R^n, we have that

    ⟨x, y⟩ = x′y = Σ_{i=1}^n x_i y_i.

In this context, the Cauchy-Schwarz inequality allows one to establish the sample correlation coefficient as

    r = ⟨x − x̄1_n, y − ȳ1_n⟩ / (‖x − x̄1_n‖ · ‖y − ȳ1_n‖) = Σ_i (x_i − x̄)(y_i − ȳ) / ( Σ_i (x_i − x̄)² Σ_i (y_i − ȳ)² )^{1/2},

and thus, |r| ≤ 1.
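The identities above can be checked in a few lines of base R: the sample correlation is exactly the inner product of the centered, normalized realizations. The data values below are illustrative.

```r
# The sample correlation as an angle between centered vectors,
# checked against R's cor().
x <- c(1.2, 0.7, 3.1, 2.4, 1.9)
y <- c(0.9, 1.1, 2.8, 2.0, 2.2)
xc <- x - mean(x)
yc <- y - mean(y)
r <- sum(xc * yc) / sqrt(sum(xc^2) * sum(yc^2))   # <x - xbar 1, y - ybar 1> normalized
```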
The inference for ρ for a bivariate normal distribution and further discussions can be found in Anderson (2003).
4.2 Definition of the Measure and Its Properties

Tjøstheim (1978) introduced a nonparametric coefficient to measure the association between two spatial sequences. This measure is useful even for two ordinal variables. It constitutes an extension of standard rank-order correlation coefficients such as Spearman's rho and Kendall's tau (Glick 1982). Consider again two spatial processes X(s) and Y(s) defined on a two-dimensional space, that is, s = (s1, s2)′. Define the function

    c(u) = 0 if u < 0,  1 if u > 0,  1/2 if u = 0.   (4.2)

Then, the rank R_X(s_i) of process X(s) at the point s_i is defined as

    R_X(s_i) = Σ_{j=1}^n c(X(s_i) − X(s_j)),   (4.3)

and similarly for R_Y(s_i). We define the s1 coordinate corresponding to the rank i of X(s) as F_X. Then,

    F_X(i) = Σ_{j=1}^n s_{j1} δ(i, R_X(s_j)),   (4.4)
where δ(i, j) is the Kronecker delta and s_i = (s_{i1}, s_{i2}), i = 1, ..., n. The s1 coordinate corresponding to the rank i of Y(s) is

    G_X(i) = Σ_{j=1}^n s_{j1} δ(i, R_Y(s_j)).   (4.5)
Similarly, we define the s2 coordinate corresponding to the rank i of X(s) as F_Y. Then,

    F_Y(i) = Σ_{j=1}^n s_{j2} δ(i, R_X(s_j)).   (4.6)

Finally, the s2 coordinate corresponding to the rank i of Y(s) is

    G_Y(i) = Σ_{j=1}^n s_{j2} δ(i, R_Y(s_j)).   (4.7)
Tjøstheim’s coefficient is defined as
i [(FX (i) −
A =
i [(FX (i) −
FX
)2
F X )(FY (i) − F Y ) + (G X (i) − G X )(G Y (i) − G Y )] , (4.8) + (G X (i) − G X )2 ] i [(FY (i) − F Y )2 + (G Y (i) − G Y )2 ]
where F X = i FX (i)/n, F Y = i FY (i)/n and similarly for G X and G Y . Tjøstheim (1978) proved that if X and Y are two vectors of n independent random variables and X, Y are also independent, then the variance of A is given by var(A) =
(
2 2 + 2( i si1 si2 )2 + ( i si2 ) 2 2 2 . (n − 1)( i si1 + i si2 )
2 2 i si1 )
(4.9)
A discussion about the advantages and disadvantages of Tjøstheim's coefficient and some extensions are available in Hubert and Golledge (1982). Applications of Tjøstheim's coefficient can be found in Cliff and Ord (1981), Hubert et al. (1985), Rukhin and Vallejos (2008), and Lai et al. (2010), among others. Previous work demonstrating the poor performance of classic correlation coefficients for spatially dependent data was reported in Bivand (1980) and Glick (1982).

Example 4.1 Consider the five locations s1 = (2, 1), s2 = (1, 2), s3 = (2, 2), s4 = (3, 2), and s5 = (2, 3) shown in Fig. 4.1. Assume that two processes have been observed on these locations such that X(2, 1) = 7, X(1, 2) = 4, X(2, 2) = 6, X(3, 2) = 2, X(2, 3) = 0, Y(2, 1) = 8, Y(1, 2) = 0, Y(2, 2) = 7, Y(3, 2) = 1, and Y(2, 3) = 5. Then,

    R_X(s1) = c(0) + c(X(s1) − X(s2)) + c(X(s1) − X(s3)) + c(X(s1) − X(s4)) + c(X(s1) − X(s5))
            = 1/2 + 1 + 1 + 1 + 1 = 4.5.

Fig. 4.1 Locations on the plane of the five points considered in Example 4.1
Similarly, R_X(s2) = 2.5, R_X(s3) = 3.5, R_X(s4) = 1.5, R_X(s5) = 0.5, R_Y(s1) = 4.5, R_Y(s2) = 0.5, R_Y(s3) = 3.5, R_Y(s4) = 1.5, and R_Y(s5) = 2.5. At this point, we emphasize that the ranks should be natural numbers between 1 and 5 for X and Y. Here, we obtained different results because we are exemplifying formula (4.3). In this case, the sums defining F_X(i) and G_X(i) vanish because none of the ranks are integers. In practice, to avoid this situation, statistical software packages have suitable routines to handle the tie issue. For the remainder of the example, we used the rank function in R with the option first to generate new ranks for X and Y. Consequently, R_X(s1) = 5, R_X(s2) = 3, R_X(s3) = 4, R_X(s4) = 2, R_X(s5) = 1, R_Y(s1) = 5, R_Y(s2) = 1, R_Y(s3) = 4, R_Y(s4) = 2, R_Y(s5) = 3, F_X = (2, 3, 1, 2, 2), F_Y = (3, 2, 2, 2, 1), G_X = (1, 3, 2, 2, 2), and G_Y = (2, 2, 3, 2, 1). Then, Tjøstheim's coefficient for this dataset is A = 0.
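Equations (4.2)–(4.9) can be verified directly on the data of Example 4.1 with a few lines of base R. This is a didactic sketch following the definitions as stated, not the optimized cor.spatial routine of SpatialPack.

```r
# Tjostheim's coefficient on the data of Example 4.1.
s <- cbind(c(2, 1, 2, 3, 2),   # s1 coordinates of the five sites
           c(1, 2, 2, 2, 3))   # s2 coordinates
X <- c(7, 4, 6, 2, 0)
Y <- c(8, 0, 7, 1, 5)
RX <- rank(X, ties.method = "first")
RY <- rank(Y, ties.method = "first")
# coordinates indexed by rank, as in Eqs. (4.4)-(4.7)
FX <- s[order(RX), 1]; GX <- s[order(RY), 1]
FY <- s[order(RX), 2]; GY <- s[order(RY), 2]
num <- sum((FX - mean(FX)) * (FY - mean(FY))) +
       sum((GX - mean(GX)) * (GY - mean(GY)))
den <- sqrt(sum((FX - mean(FX))^2 + (GX - mean(GX))^2) *
            sum((FY - mean(FY))^2 + (GY - mean(GY))^2))
A <- num / den                  # 0 for these data, as in Example 4.1
# variance (4.9), computed with centered coordinates
sc <- scale(s, center = TRUE, scale = FALSE)
varA <- (sum(sc[, 1]^2)^2 + 2 * sum(sc[, 1] * sc[, 2])^2 + sum(sc[, 2]^2)^2) /
        ((nrow(s) - 1) * (sum(sc[, 1]^2) + sum(sc[, 2]^2))^2)
```

For these five sites the centered coordinates give var(A) = 8/64 = 0.125.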
4.3 Applications and R Computations

4.3.1 The R Function cor.spatial

The coefficient introduced by Tjøstheim (1978) is implemented through the cor.spatial function in the R package SpatialPack (Osorio and Vallejos 2020). This procedure handles the possible ties that can occur in the observed values through the option ties.method = "first". In the computation of Tjøstheim's coefficient, the coordinates of the ranks defined in Eqs. (4.4)–(4.7) are first centered and then computed using R commands, whereas the computation of A in Eq. (4.8) is performed in C. Internally, the calculations are optimized by calling level 1 routines from BLAS (Lawson et al. 1979). The procedure also returns var(A) as the attribute "variance".
4.3.2 Application 1: Murray Smelter Site Revisited

Here, we consider the Murray smelter site dataset described in Sect. 1.1.2. The R code

> coords <- murray[c("xpos", "ypos")]
> x <- ...
> y <- ...
> murray.cor <- cor.spatial(x, y, coords)
> murray.cor
[1] 0.53494
attr(,"variance")
[1] 0.00354
provides Tjøstheim’s coefficient and its variance. For the Murray dataset, the value of this coefficient is 0.5349, and its variance is 0.0035. This nonparametric correlation
coefficient is far from the values obtained with other methods, thereby providing evidence of a positive spatial autocorrelation and a spatial codispersion in all cases larger than 0.46. These results are consistent with the findings reported by Rukhin and Vallejos (2008) in a series of Monte Carlo simulations in which the correlation coefficient, Tjøstheim’s coefficient, and the codispersion coefficient were compared in terms of bias and mean square error. For large values of the correlation between the processes, Tjøstheim’s coefficient was the most biased one.
4.3.3 Application 2: Flammability of Carbon Nanotubes

To illustrate a practical application of Tjøstheim's coefficient with data measured on a rectangular grid, an example related to the flammability of polymers is considered here. The flame-retardant property of clay polymer nanocomposites improves the physical and flammability properties of polymers (see Kashiwagi et al. 2005). The distribution of this nanotube was examined by optical microscopy. This distribution is believed to mainly depend on the distance from the top surface to the location of the polymer matrix (polymethyl methacrylate). The collected dataset consists of four 512 × 512 images plotted in Fig. 4.2. Images (a) and (b) were taken at the same distance from the main polymer matrix, and images (c) and (d) were both taken at the same distance from the main polymer matrix but closer than images (a) and (b). Rukhin and Vallejos (2008) used the codispersion coefficient to address the association between all pairs of images from Fig. 4.2. Here, we computed Tjøstheim's coefficient to explore the association between these images. Using the function cor.spatial, we obtained that the value of the coefficient between images (a) and (b) is 0.335, that between (a) and (c) is 0.016, and that between (c) and (d) is 0.346. Although the results do not stress the strong association between images taken at the same distance, the coefficient preserves the expected patterns. The arguments of the function cor.spatial were obtained from a file called nanotubes, which contains six columns arranged in such a way that the first four columns are the variables and the last two columns are the x and y coordinates.

> nanotubes   # large dataset with 262144 rows
    a  b  c  d xpos ypos
1  78 64 29 28    1    1
2  67 55 28 30    1    2
3  37 32 30 29    1    3
4  38 40 31 27    1    4
5  33 35 28 27    1    5
...
Then, the coefficients were obtained by running the code

> coords <- nanotubes[c("xpos", "ypos")]
> cor.spatial(nanotubes$a, nanotubes$b, coords)
> cor.spatial(nanotubes$a, nanotubes$c, coords)
> cor.spatial(nanotubes$c, nanotubes$d, coords)

The power of the test under the alternative ρ = ρ0 > 0 is given by β = P(Q0 > z_α | H_A). Note that

    β = P[Q0 > z_α | H_A] = P[ √M(ρ̂_XY(h) − ρ0)/v > z_α − √M ρ0/v ]
      ≈ P( Z > z_α − √M ρ0/v )
      = 1 − Φ( z_α − √M ρ0/v ).

Because v is generally unknown, we can use any consistent estimator of v, for example, v̂, and compute

    β̂ ≈ 1 − Φ( z_α − √M ρ0/v̂ ).
(5.26)
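A one-line R sketch of the approximate power (5.26); the sample size M, the alternative ρ0, and the plug-in estimate v̂ are illustrative assumptions.

```r
# Approximate power (5.26) of the one-sided test based on Q0.
power_hat <- function(M, rho0, v.hat, alpha = 0.05) {
  1 - pnorm(qnorm(1 - alpha) - sqrt(M) * rho0 / v.hat)
}
p100 <- power_hat(M = 100, rho0 = 0.10, v.hat = 0.99)
p400 <- power_hat(M = 400, rho0 = 0.10, v.hat = 0.99)  # power grows with M
```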
Now, consider the test H0 : ρ = 0 versus H A : ρ > 0.
(5.27)
Let r be the sample correlation coefficient computed over M² pixels. It is well known (Anderson 2003, Chap. 4) that for i.i.d. random variables X and Y, r has the null distribution with p.d.f.

    f(r) = (1 − r²)^{(M²−4)/2} / B(1/2, (M² − 2)/2),  |r| ≤ 1,
94
5 The Codispersion Coefficient
where B is the Beta function. Critical values of r are usually obtained from t-tables because (M² − 2)^{1/2} r/(1 − r²)^{1/2} has a t-distribution with M² − 2 degrees of freedom (Anderson 2003). We denote the power function of the test (5.27) as β_ρ.

Example 5.4 To explore the power of tests (5.25) and (5.27) for spatial models, we consider

    X(i, j) = φ1 X(i − 1, j) + φ2 X(i, j − 1) + ε1(i, j),   (5.28)

    Y(i, j) = ψ1 Y(i − 1, j) + ψ2 Y(i, j − 1) + ε2(i, j),   (5.29)

where (i, j) ∈ Z² and the parameters φ1, φ2, ψ1, and ψ2 belong to the stationary region described by Basu and Reinsel (1993). Then, for s = (i, j), the process Z(s) = (X(s), Y(s))′ has the convergent representation

    Z(s) = Σ_{t∈T} A(t)ε(s − t),   (5.30)

where the ε(s) are independent random vectors with mean 0 and covariance matrix Σ, with T = Z₊², and

    A(t) = A(k, l) = C(k + l, k) diag(φ1ᵏφ2ˡ, ψ1ᵏψ2ˡ) = diag(α_t, β_t),

where C(k + l, k) denotes the binomial coefficient. Because condition (5.30) is satisfied, Theorem 5.1 holds, and we recall that the asymptotic variance can be written as

    v_h² = 1 − ρ_XY(h)² (Σ_t α_t β_t)² / (Σ_t α_t² Σ_t β_t²).
We ran simulation experiments to observe the behavior of the power functions of the similarity coefficient and the correlation coefficient. To compute the variance of the similarity coefficient, we use the results given by Rukhin (2006, Sect. 6). Setting, for h = (h1, h2), x and y such that 1 + x⁴ + y⁴ − 2x² − 2y² − 2x²y² > 0,

    H_h(x, y) = Σ_{k=0}^∞ Σ_{l=0}^∞ [ (k + l)! (k + l + h1 + h2)! / ( k! l! (k + h1)! (l + h2)! ) ] x^{2k+h1} y^{2l+h2},

we have

    Σ_t α_t β_t = 2H0(√(φ1ψ1), √(φ2ψ2)) − H_h(√(φ1ψ1), √(φ2ψ2)) × [ (φ1/ψ1)^{h1/2}(φ2/ψ2)^{h2/2} + (ψ1/φ1)^{h1/2}(ψ2/φ2)^{h2/2} ],
(5.31)
5.5 Hypothesis Testing
95
    Σ_t α_t² = 2[ H0(φ1, φ2) − H_h(φ1, φ2) ],

    Σ_t β_t² = 2[ H0(ψ1, ψ2) − H_h(ψ1, ψ2) ],
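The double series defining H_h(x, y) converges quickly in the region 1 + x⁴ + y⁴ − 2x² − 2y² − 2x²y² > 0, so it can be evaluated by simple truncation. The function below is a sketch; the truncation level K is our own assumption.

```r
# Truncated evaluation of the double series H_h(x, y) of Eq. (5.31).
Hh <- function(x, y, h1, h2, K = 60) {
  k <- rep(0:K, each = K + 1)
  l <- rep(0:K, times = K + 1)
  sum(exp(lfactorial(k + l) + lfactorial(k + l + h1 + h2) -
          lfactorial(k) - lfactorial(l) -
          lfactorial(k + h1) - lfactorial(l + h2)) *
      x^(2 * k + h1) * y^(2 * l + h2))
}
H0.val <- Hh(0.3, 0.2, 0, 0)   # (x, y) = (0.3, 0.2) lies inside the convergence region
H1.val <- Hh(0.3, 0.2, 1, 0)
```

Factorials are handled on the log scale (lfactorial) to avoid overflow in the coefficients before they are damped by the powers of x and y.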
where the function H is given by (5.31). To estimate the standard deviation of ρ̂_XY(h), we constructed a function to evaluate (5.31). For example, for the simulation study conducted in Sect. 5.6, with h1 = 1, h2 = 0, φ1 = ψ1 = 0.3, φ2 = ψ2 = 0.2,
Fig. 5.3 Power functions associated with tests (5.25) and (5.27) when the simulated data were generated from processes (5.28) and (5.29) for a significance level α = 0.05. (a) h1 = 1, h2 = 0; (b) h1 = 0, h2 = 1; (c) h1 = 1, h2 = 1
one obtains Σ_t α_t β_t = 0.0207, Σ_t α_t² = 0.0549, and Σ_t β_t² = 0.7305. Then, for ρ = 0.05, v = 0.9971; for ρ = 0.10, v = 0.9904; and for ρ = 0.15, v = 0.9784. The results for the power functions are displayed in Fig. 5.3.
5.6 Numerical Experiments

In this section, we report the results of two Monte Carlo simulation studies that illustrate the behavior of the codispersion coefficient as an estimator of the correlation between two georeferenced variables.
5.6.1 Simulation of Spatial Autoregressive Models

Consider X and Y as in (5.28) and (5.29) with the correlation structure (5.4). In the Monte Carlo simulation, two sets of parameters and several values for ρ were used: φ1 = 0.3, φ2 = 0.2, ψ1 = 0.3, ψ2 = 0.2; φ1 = 0.3, φ2 = 0.2, ψ1 = 0.6, ψ2 = 0.1; and ρ = 0, 0.1, 0.3, 0.5, 0.7, and 0.9. One thousand replicates were obtained for each case of the two sets of parameters. The average values and the standard deviations of the estimates over these replicates are given in Table 5.2 and in Rukhin and Vallejos (2008). The simulations were performed for a grid size of 30 × 30. One then has E[X(s)Y(s)] = c, where the multiple c can be determined. Hence, all the simulation averages reported in Table 5.2 were corrected by the factor 1/c. From Table 5.2, we see that the similarity coefficient and the correlation coefficient are comparable and less biased than Tjøstheim's coefficient. The bias of Tjøstheim's coefficient increases as ρ increases, and the standard deviation of the similarity coefficient is smaller than the standard deviation of Tjøstheim's coefficient when ρ ≥ 0.5. The standard deviations of the codispersion coefficient and the correlation coefficient decrease as ρ increases. All estimators underestimate the true value. In general, the performance of the codispersion and correlation coefficients is better than that of Tjøstheim's coefficient for ρ ≠ 0.
5.6.2 Simulation of Spatial Moving Average Models

Here, we consider the models

    X(i, j) = ε1(i, j) − θ1 ε1(i − 1, j) − θ2 ε1(i, j − 1),
(5.32)
    Y(i, j) = ε2(i, j) − η1 ε2(i − 1, j) − η2 ε2(i, j − 1),
(5.33)
with correlation structure (5.4). It follows from (5.1) that
Table 5.2 The means and standard deviations of ρ̂_XY(h), r, and ρ̂_T for ρ = 0, 0.1, and 0.3; ρ̂_T represents Tjøstheim's coefficient

ρ     φ1    φ2    ψ1    ψ2    h1   h2   ρ̂_XY      s.d.(ρ̂_XY)   r        s.d.(r)   ρ̂_T      s.d.(ρ̂_T)
0     0.3   0.2   0.3   0.2   1    0    −0.0006    0.0384       0.0002   0.0389    0.0028    0.0391
0     0.3   0.2   0.3   0.2   0    1    0.0008     0.0416       0.0004   0.0388    0.0027    0.0391
0     0.3   0.2   0.3   0.2   1    1    0.0012     0.0405       0.0006   0.0394    0.0028    0.0391
0     0.3   0.2   0.6   0.1   1    0    −0.0008    0.0365       0.0003   0.0421    0.0054    0.0397
0     0.3   0.2   0.6   0.1   0    1    0.0006     0.0454       0.0002   0.0421    0.0054    0.0397
0     0.3   0.2   0.6   0.1   1    1    0.0012     0.0434       0.0005   0.0426    0.0054    0.0397
0.1   0.3   0.2   0.3   0.2   1    0    0.0990     0.0411       0.1008   0.0423    0.0466    0.0406
0.1   0.3   0.2   0.3   0.2   0    1    0.1009     0.0424       0.1006   0.0417    0.0466    0.0406
0.1   0.3   0.2   0.3   0.2   1    1    0.0992     0.0443       0.1010   0.0427    0.0466    0.0406
0.1   0.3   0.2   0.6   0.1   1    0    0.0955     0.0382       0.0941   0.0449    0.0453    0.0421
0.1   0.3   0.2   0.6   0.1   0    1    0.0941     0.0462       0.0941   0.0446    0.0453    0.0421
0.1   0.3   0.2   0.6   0.1   1    1    0.0929     0.0469       0.0944   0.0389    0.0453    0.0421
0.3   0.3   0.2   0.3   0.2   1    0    0.2989     0.0378       0.3006   0.0389    0.1343    0.1343
0.3   0.3   0.2   0.3   0.2   0    1    0.3006     0.0390       0.3004   0.0383    0.1343    0.0405
0.3   0.3   0.2   0.3   0.2   1    1    0.2990     0.0408       0.3007   0.0393    0.1343    0.0405
0.3   0.3   0.2   0.6   0.1   1    0    0.2878     0.0352       0.2793   0.0416    0.1246    0.0401
0.3   0.3   0.2   0.6   0.1   0    1    0.2806     0.0428       0.2792   0.0413    0.1246    0.0401
0.3   0.3   0.2   0.6   0.1   1    1    0.2796     0.0434       0.2795   0.0421    0.1246    0.0401
    ρ_XY(h1, h2) =
      ρ[ 2(1 + θ1η1 + θ2η2) + θ1 + η1 ] / ( 2[ (1 + θ1² + θ2² + θ1)(1 + η1² + η2² + η1) ]^{1/2} ),  h1 = 1, h2 = 0,
      ρ[ 2(1 + θ1η1 + θ2η2) + θ2 + η2 ] / ( 2[ (1 + θ1² + θ2² + θ2)(1 + η1² + η2² + η2) ]^{1/2} ),  h1 = 0, h2 = 1,
      ρ(1 + θ1η1 + θ2η2) / [ (1 + θ1² + θ2²)(1 + η1² + η2²) ]^{1/2},  h1 ≥ 1, h2 ≥ 1.
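The lag-(1, 0) case can be checked numerically from the moving average weights themselves, computing the covariances implied by (5.32)–(5.33) and assembling the variogram-based codispersion of Eq. (5.1); the function name and parameter values below are illustrative.

```r
# Codispersion at h = (1, 0) for the MA models (5.32)-(5.33),
# built from the implied covariances (a sketch).
codisp10 <- function(theta, eta, rho) {
  cX0  <- 1 + theta[1]^2 + theta[2]^2             # var of X (unit noise variance)
  cY0  <- 1 + eta[1]^2 + eta[2]^2                 # var of Y
  cXY0 <- rho * (1 + theta[1] * eta[1] + theta[2] * eta[2])
  cXh  <- -theta[1]                               # cov(X(s + h), X(s))
  cYh  <- -eta[1]                                 # cov(Y(s + h), Y(s))
  cXYh  <- -rho * theta[1]                        # cov(X(s + h), Y(s))
  cXYmh <- -rho * eta[1]                          # cov(X(s - h), Y(s))
  (2 * cXY0 - cXYh - cXYmh) / (2 * sqrt((cX0 - cXh) * (cY0 - cYh)))
}
same <- codisp10(c(0.3, 0.5), c(0.3, 0.5), 0.7)
```

When the two processes share the same weights, the cross-variogram is proportional to the common variogram, so the codispersion reduces to ρ itself, as the call above confirms.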
We performed a simulation study to observe the performance of ρ̂_XY, r, and ρ̂_T for several values of ρ and three sets of parameters: (i) θ1 = 0.3, θ2 = 0.5, η1 = 0.3, η2 = 0.5; (ii) θ1 = 0.3, θ2 = 0.5, η1 = 0.2, η2 = 0.4; and (iii) θ1 = 0.3, θ2 = 0.5, η1 = 0.1, η2 = 0.6. The simulation results show patterns similar to those of the autoregressive case discussed in the previous section. Tjøstheim's coefficient is strongly biased, and the performance of the codispersion coefficient is similar to that of the correlation coefficient. However, the sample codispersion coefficient appears to do a good job of estimating its theoretical value. In all cases, the standard deviation of the codispersion coefficient is smaller than the standard deviation of the correlation coefficient. For large values of ρ (ρ ≥ 0.7), the standard deviation of the codispersion coefficient is the smallest; when ρ = 0.9, even the standard deviation of the correlation coefficient reported for smaller values of ρ exceeds the standard deviation of the codispersion coefficient. In all cases, the coefficients are biased downward.
98
5 The Codispersion Coefficient
5.6.3 Coverage Probability

To construct an approximate confidence interval for ρXY(h), h ∈ Z+, we use Theorem 5.1:

$$\hat{\rho}_{XY}(h) \pm z_{\alpha/2}\,\sqrt{\frac{v_h}{M}},$$

where z_{α/2} is the (1 − α/2) quantile of the standard normal distribution. To study the performance of this interval, we conducted a simulation study for several values of ρ, with 10000 simulation runs from two stationary models of the form

$$X(t) = \phi_1 X(t-1) + \epsilon_1(t), \qquad Y(t) = \phi_2 Y(t-1) + \epsilon_2(t),$$

where ε1(·) and ε2(·) are two white noise sequences such that var[ε1(t)] = σ², var[ε2(t)] = τ², and

$$\operatorname{cov}[\epsilon_1(s), \epsilon_2(t)] = \begin{cases}\rho\sigma\tau, & s = t,\\ 0, & s \neq t,\end{cases}$$

for |ρ| < 1. The estimated coverage probability (ECP) was compared with the theoretical (known) coverage probability, written as CP. For ρ ≤ 0.3, the ECP is between 0.88 and 0.94; for larger values of ρ, however, the ECP ranges from 0.08 to 0.96. The worst case found corresponds to φ1 = −0.5, φ2 = 0.3, and ρ = 0.7. The codispersion coefficient is biased for large values of ρ, especially when |φ1 − φ2| is large. For the parameter pairs φ1 = 0.8, φ2 = 0.5 and φ1 = 0.4, φ2 = 0.2, we have 0.90 < ECP < 0.96; hence, in these cases, all the estimated coverage probabilities are accurate. We did not observe any clear pattern as M increases; M = 50 was sufficient to obtain accurate estimates. The same procedure was repeated for the spatial models (5.28) and (5.29). We studied the coverage probability of a confidence interval for ρXY of the form

$$\hat{\rho}_{XY}(h) \pm z_{\alpha/2}\,\sqrt{\frac{v_h}{M}}, \qquad (5.34)$$
where vh is the variance of the sample codispersion coefficient. From Table 5.3, we observe that the behavior of the ECP is similar to that of the one-dimensional case; unlike in the one-dimensional case, however, the ECP is not strongly affected by large values of ρ.
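The one-dimensional simulation design above can be sketched in a few lines of base R. This is a minimal sketch, not the code used for the book's tables: it assumes σ = τ = 1, uses the lag-1 sample codispersion (the cosine of the angle between the first-difference vectors), and reports the Monte Carlo mean and standard deviation of the estimator rather than the ECP, since the closed-form variance v_h of Theorem 5.1 is not reproduced here.

```r
set.seed(1)
phi1 <- 0.3; phi2 <- 0.2; rho <- 0.5   # example parameter values
M <- 400                               # series length
nrep <- 1000                           # simulation runs (10000 in the text)

# Sample codispersion at lag h = 1 for two time series
codisp1 <- function(x, y) {
  dx <- diff(x); dy <- diff(y)
  sum(dx * dy) / sqrt(sum(dx^2) * sum(dy^2))
}

est <- replicate(nrep, {
  # cross-correlated white noise: cov[e1(t), e2(t)] = rho, sigma = tau = 1
  e <- matrix(rnorm(2 * M), ncol = 2) %*% chol(matrix(c(1, rho, rho, 1), 2))
  x <- as.numeric(stats::filter(e[, 1], phi1, method = "recursive"))  # AR(1)
  y <- as.numeric(stats::filter(e[, 2], phi2, method = "recursive"))  # AR(1)
  codisp1(x, y)
})
c(mean = mean(est), sd = sd(est))
```

For values of φ1 and φ2 that are close to each other, the Monte Carlo mean is close to ρ, in line with the simulation results described in the text.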
Table 5.3 Coverage probability of (5.34) for φ1 = 0.3, φ2 = 0.2, ψ1 = 0.3, ψ2 = 0.2, CP = 95

M    ρ     h1   h2   ECP
10   0     1    0    93.5
10   0     0    1    90.6
10   0     1    1    91.7
10   0.1   1    0    93.3
10   0.1   0    1    91.2
10   0.1   1    1    91.3
10   0.3   1    0    92.7
10   0.3   0    1    92.6
10   0.3   1    1    91.5
10   0.5   1    0    83.0
10   0.5   0    1    82.7
10   0.5   1    1    81.9
15   0     1    0    93.0
15   0     0    1    91.4
15   0     1    1    91.0
15   0.1   1    0    92.8
15   0.1   0    1    90.7
15   0.1   1    1    91.8
15   0.3   1    0    92.2
15   0.3   0    1    91.3
15   0.3   1    1    90.1
15   0.5   1    0    86.0
15   0.5   0    1    85.2
15   0.5   1    1    85.4
30   0     1    0    92.1
30   0     0    1    91.4
30   0     1    1    91.2
30   0.1   1    0    91.4
30   0.1   0    1    91.1
30   0.1   1    1    91.0
30   0.3   1    0    90.1
30   0.3   0    1    90.7
30   0.3   1    1    90.8
30   0.5   1    0    83.0
30   0.5   0    1    82.8
30   0.5   1    1    82.9
5.7 The Codispersion Map

Similar to the variogram map described by Isaaks and Srivastava (1989), the codispersion map is a set of estimates of the codispersion coefficient in many different directions on the plane. The main goal is to obtain a summary of the codispersion values for angles in the interval [0, π] and radii varying in the range ]0, dmax], where dmax depends on the type of spatial data (i.e., whether the data are defined on a rectangular or a general grid). In other words, a plot of the codispersion map summarizes the information about the spatial association between two sequences radially on the plane, inscribing the map in a semicircle of radius dmax.
5.7.1 The Map for Data Defined on a General Lattice

The construction of the codispersion map for spatial data defined on a general grid can be summarized in three steps. Step 1 consists of selecting the maximum distance dmax. The second step defines the points h_k at which the estimates will be evaluated; these points are chosen uniformly on the interval ]0, dmax]. The third step chooses angles from the interval [0, π] to fix the directions that will be used in the estimation. In practice, for each of the selected directions, the semivariograms are estimated for the previously selected distances h_k. Algorithm 5.7.1 presents the computational procedure to estimate the values of the codispersion and the plotting procedure that yields the codispersion map, based on the equation

$$\hat{\rho}_{XY}(h_k) = \frac{\hat{\gamma}_{X+Y}(h_k) - \hat{\gamma}_X(h_k) - \hat{\gamma}_Y(h_k)}{2\sqrt{\hat{\gamma}_X(h_k)\,\hat{\gamma}_Y(h_k)}}. \qquad (5.35)$$
It should be stressed that in Algorithm 5.7.1, the notation used for the estimated semivariogram is γ̂X(h[k], α[i]), highlighting the dependence of the estimate on the angle. This fact is crucial to restrict the search for the points belonging to each Nk. The computational implementation of the semivariograms relies on the numerical routines provided by the R package geoR (Ribeiro and Diggle 2001). Once the estimates in all defined directions are available, the algorithm fits a rectangular grid to the estimates, and bilinear interpolation is used to fill the rectangular grid. The R code to generate such a codispersion map can be found on the website http://srb2gv.mat.utfsm.cl/Rcodes.html.
Algorithm 5.7.1: Algorithm to construct the codispersion map for spatial data defined on nonrectangular grids.
input : Two spatial sequences X and Y and the corresponding set of coordinates Sloc = {s1, . . . , sn} ⊂ D ⊂ R²
output: A codispersion map between the variables X and Y
begin
    dmax ← half of the maximum distance among the locations
    B ← 13
    h ← {dmax/B, 2dmax/B, . . . , dmax}
    α ← {0, 0.01, 0.02, . . . , π}
    circ = (xcirc, ycirc) ← Cartesian coordinates of the pairs (h[k], α[i]), k = 1, . . . , B, i = 1, . . . , |α|
    z ← array of length |α| · B, where |α| is the length of α
    for i = 1, 2, . . . , |α| do
        for k = 1, 2, . . . , B do
            z[k + (i − 1)B] ← [γ̂X+Y(h[k], α[i]) − γ̂X(h[k], α[i]) − γ̂Y(h[k], α[i])] / (2 √(γ̂X(h[k], α[i]) γ̂Y(h[k], α[i])))
        end
    end
    return plot(interpolate(xcirc, ycirc, z, n)), displaying the interpolated cells inscribed in a semicircle of diameter n
end
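The three steps above can be sketched in base R using an omnidirectional simplification of Eq. (5.35), with distance bins only and no angular search. The function names below are ours and the book's implementation relies on geoR instead; this sketch only illustrates how the three empirical semivariograms combine into the codispersion.

```r
# Binned empirical semivariogram: gamma_hat(h_k) = mean of 0.5*(z_i - z_j)^2
# over all pairs whose distance falls in the k-th bin.
semivariogram <- function(z, coords, breaks) {
  d <- as.matrix(dist(coords))
  g <- 0.5 * outer(z, z, "-")^2
  keep <- upper.tri(d)              # each pair counted once
  bin <- cut(d[keep], breaks)      # pairs beyond max(breaks) are dropped
  tapply(g[keep], bin, mean)
}

# Omnidirectional codispersion at the bin midpoints, via Eq. (5.35)
codisp_map_1d <- function(x, y, coords, nbins = 13) {
  dmax <- max(dist(coords)) / 2    # half of the maximum distance
  breaks <- seq(0, dmax, length.out = nbins + 1)
  gx  <- semivariogram(x, coords, breaks)
  gy  <- semivariogram(y, coords, breaks)
  gxy <- semivariogram(x + y, coords, breaks)
  (gxy - gx - gy) / (2 * sqrt(gx * gy))
}

set.seed(2)
coords <- cbind(runif(200), runif(200))   # irregular locations
x <- rnorm(200)
y <- x + rnorm(200, sd = 0.5)             # spatially associated variable
round(codisp_map_1d(x, y, coords), 3)
```

The directional map replaces the single `cut` by a joint classification of distance and angle, exactly as in Algorithm 5.7.1.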
5.7.2 The Map for Data Defined on a Regular Grid

In the case of a rectangular grid, the estimation of the codispersion coefficient is much simpler. The implemented estimation is based on Eq. (5.10). The domain is the set of locations {1, . . . , n} × {1, . . . , m}, where n and m, respectively, denote the numbers of rows and columns of the grid. The simplicity arises from the fact that the distance between two contiguous points is always the same. The possible directions considered are those that satisfy h = (h1, h2), with h1 ∈ {−m + 1, . . . , m − 1} and h2 ∈ {0, . . . , n − 1}. To ensure a sufficient number of differences in a particular direction, we impose the restrictions |h1| ≤ min(n, m)/3 and h2 ≤ min(n, m)/3. The observations X(sk) and Y(sk), k = 1, . . . , nm, are described in the matrix notation used for images, i.e., X(j, i) and Y(j, i), (j, i) ∈ {1, . . . , n} × {1, . . . , m}. Contrary to the usual convention, we use the index j for the row and i for the column, and position (1, 1) is located at the top-left corner of the grid. Given the observations of the processes X(·) and Y(·) on a finite rectangular grid of size n × m, and considering the direction h = (h1, h2) with |h1| ≤ m − 1 and 0 ≤ h2 ≤ n − 1, the estimator of the codispersion (5.10) can be written as
$$\hat{\rho}_{XY}(h) = \frac{\displaystyle\sum_{(j,i)\in N(h)} U(j,i)}{\sqrt{\displaystyle\sum_{(j,i)\in N(h)} V(j,i) \sum_{(j,i)\in N(h)} W(j,i)}}, \qquad (5.36)$$
where U(j, i) = (X(j, i) − X(j + h1, i + h2))(Y(j, i) − Y(j + h1, i + h2)), V(j, i) = (X(j, i) − X(j + h1, i + h2))², W(j, i) = (Y(j, i) − Y(j + h1, i + h2))², and N(h) = {(j, i) : (j + h1, i + h2) ∈ Sloc}. We recall that the positive vertical axis points downward. Algorithm 5.7.2 computes the codispersion map in the manner described above. From a practical perspective, the algorithm was written in C and can be run from R through the .Call interface. Thus, the R code is simply a wrapper that internally runs the routines developed in C. The R and C routines, together with directions for compiling these files, can be found on the website http://spatialpack.mat.utfsm.cl/codispmap/.
Algorithm 5.7.2: Algorithm to construct the codispersion map for spatial data defined on rectangular grids.
input : Two matrices X and Y
output: A codispersion map between the variables X and Y
begin
    n ← number of rows
    m ← number of columns
    hmax ← largest integer ≤ min(n, m)/3
    h ← set of directions to be considered in the map
    for i ∈ {1, . . . , nrows(h)} do
        z[h[i, 1], h[i, 2]] ← ρ̂XY(h[i, 1], h[i, 2])
    end
    return plot(z)
end
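A base-R sketch of the single-direction estimator (5.36) and of the loop in Algorithm 5.7.2 follows. The function names are ours, and the book's implementation is written in C and called from R via .Call; this version is for illustration only, with j indexing rows and (1, 1) at the top-left corner, as in the text.

```r
# Codispersion estimator (5.36) for one direction h = (h1, h2):
# h1 shifts rows, h2 shifts columns.
codisp_dir <- function(X, Y, h1, h2) {
  n <- nrow(X); m <- ncol(X)
  j <- which(1:n + h1 >= 1 & 1:n + h1 <= n)   # valid rows
  i <- which(1:m + h2 >= 1 & 1:m + h2 <= m)   # valid columns
  dX <- X[j, i, drop = FALSE] - X[j + h1, i + h2, drop = FALSE]
  dY <- Y[j, i, drop = FALSE] - Y[j + h1, i + h2, drop = FALSE]
  sum(dX * dY) / sqrt(sum(dX^2) * sum(dY^2))
}

# Map over all admissible directions, |h1| <= min(n, m)/3, 0 <= h2 <= min(n, m)/3
codisp_map <- function(X, Y) {
  hmax <- floor(min(dim(X)) / 3)
  H <- expand.grid(h1 = -hmax:hmax, h2 = 0:hmax)
  H <- H[!(H$h1 == 0 & H$h2 == 0), ]          # skip the null direction
  cbind(H, rho = mapply(function(a, b) codisp_dir(X, Y, a, b), H$h1, H$h2))
}
```

Plotting `rho` against (h1, h2), e.g. with `image` after reshaping, reproduces the map display returned by Algorithm 5.7.2.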
5.8 Applications and R Computations

5.8.1 The R Function codisp

The codisp function computes the codispersion coefficient for general (nonrectangular) grids according to its definition in Eq. (5.10). This function uses the C code developed for the function modified.ttest, which was discussed in Chap. 2. The output is an object of class "codisp", a list whose components have a structure similar to that of the elements defined for the modified t-test. The component coef is a vector of size nclass that contains the values of the codispersion coefficient ρ̂XY(h_k) for each of the previously defined strata. The information associated with the upper bounds of the strata is returned in the component card of the output object. A generic function has been written to appropriately print the results obtained by codisp. Additionally, ρ̂XY(h_k) versus h_k can be plotted using the plot method.
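A minimal usage sketch of codisp on simulated data is shown below. The data are artificial, and the call follows the description above, with the default number of distance strata.

```r
library(SpatialPack)

set.seed(3)
coords <- cbind(runif(150), runif(150))   # irregular spatial locations
x <- rnorm(150)
y <- 0.8 * x + rnorm(150, sd = 0.6)       # variable associated with x

z <- codisp(x, y, coords)   # codispersion per distance stratum
z                            # print method: coefficients and strata
plot(z)                      # rho_hat(h_k) versus h_k
```

Because x and y are positively associated and spatially unstructured here, the coefficients fluctuate around a positive level across the strata.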
5.8.2 Application 1: Flammability of Carbon Nanotubes Revisited

To illustrate a practical application of the codispersion coefficient with data measured on a rectangular grid, we consider an example related to the flammability of polymers. Rukhin and Vallejos (2008) used the codispersion coefficient as a measure of the closeness of two sample polymers: they fitted spatial autoregressive processes to each image and computed the estimated codispersion coefficient between all pairs of the images shown in Fig. 4.2. The spatial association between the pairs (ac), (ad), (bc), and (bd) was almost null, whereas the spatial association between the pairs (ab) and (cd) was as high (greater than 0.8) as expected. These conclusions were obtained from the codispersion coefficient ρ̂XY(h) evaluated in the directions h = (1, 0), h = (0, 1), and h = (1, 1) (see Rukhin and Vallejos 2008). Here, we use the values provided by the function codisp to generate an omnidirectional plot of ρ̂(‖h‖) against ‖h‖, in the same manner that the omnidirectional variogram is plotted in spatial statistics. This provides additional information about the codispersion range, and it is sometimes possible to analyze the shape of the codispersion curve. Figure 5.4 shows the omnidirectional codispersion coefficient between the image pairs (ab) and (cd). The values of the codispersion are greater than 0.88 for images (a) and (b) and greater than 0.93 for images (c) and (d), which supports the findings of Rukhin and Vallejos (2008). Additionally, the highest values of the codispersion are attained at a distance of 250 units. Figure 5.5 shows the codispersion for the pairs of images (ac), (ad), (bc), and (bd); in all cases, the values are close to zero. The function took the following computational times to compute the codispersion coefficients (considering the thirteen bins needed to yield the codispersion
Fig. 5.4 Codispersion coefficient between the pairs of images (ab) (left) and (cd) (right)
Fig. 5.5 Codispersion coefficient between pairs of images (ac) (top left), (ad) (top right), (bc) (bottom left), and (bd) (bottom right)
104
5 The Codispersion Coefficient
plot): (ab) 5 h 45 min 23 s; (ac) 5 h 44 min 8 s; (ad) 5 h 43 min 53 s; (bc) 5 h 40 min 42 s; (bd) 5 h 41 min 2 s; (cd) 5 h 41 min 43 s. All computations were performed on a PC with a Core 2 Quad Q8400 2.66 GHz processor and 8 GB of DDR2 800 MHz RAM.
5.8.3 Application 2: Comovement Between Time Series

The codispersion coefficient was introduced as a measure of comovement between two time series by Vallejos (2008). This coefficient can be used in time series analysis to study how well two chronological sequences move together. When two processes are defined on a set D ⊂ R¹, the codispersion coefficient is called the comovement coefficient; it shares a number of the standard properties of the correlation coefficient and is interpretable as the cosine of the angle between the vectors formed by the first differences of the sample series. As in the case of the classic correlation, a comovement coefficient of +1 indicates that the sample functions/processes being compared are rescaled/retranslated versions of one another. Similarly, a profile matched with its reflection across the time axis gives a comovement of −1. SpatialPack can be used to quantify the comovement between two time series. For illustrative purposes, a real data example is presented here. The dataset consists of two time series representing the monthly deaths from bronchitis, emphysema, and asthma in the UK for 1974–1979: X represents male deaths, while Y represents female deaths during the same period. The whole dataset is described in Diggle (1990, Table A.3) and is available in R as the series mdeaths and fdeaths. The R code begins by loading the package and the data:

> library(SpatialPack)
> x <- mdeaths
> y <- fdeaths
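The lag-1 comovement can also be computed directly from its definition as the cosine of the angle between the first-difference vectors. The following base-R sketch (the helper name comovement is ours, not SpatialPack's) uses the built-in series mdeaths and fdeaths:

```r
# Comovement (one-dimensional codispersion) at lag h = 1: the cosine of
# the angle between the first-difference vectors of the two series.
comovement <- function(x, y) {
  dx <- diff(x)
  dy <- diff(y)
  sum(dx * dy) / sqrt(sum(dx^2) * sum(dy^2))
}

# Monthly UK deaths from bronchitis, emphysema, and asthma, 1974-1979
x <- as.numeric(mdeaths)  # males
y <- as.numeric(fdeaths)  # females
comovement(x, y)          # positive: the two series move together
```

The strong shared seasonality of the two mortality series yields a large positive comovement, consistent with the interpretation given above.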